CN113379738A - Method and system for detecting and positioning epidemic trees based on images - Google Patents
- Publication number: CN113379738A
- Application number: CN202110821397.0A
- Authority: CN (China)
- Prior art keywords: point, image, target, rpn, network
- Legal status: Pending (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06T7/0002 — Image analysis; inspection of images, e.g. flaw detection
- G06N3/045 — Neural networks; architecture; combinations of networks
- G06N3/08 — Neural networks; learning methods
- G06T7/70 — Image analysis; determining position or orientation of objects or cameras
- G06T2207/10016 — Image acquisition modality: video; image sequence
- G06T2207/20081 — Special algorithmic details: training; learning
Abstract
The invention provides an image-based method and system for detecting and positioning epidemic trees (pine trees infected by pine wilt disease). A multi-scale candidate region fusion network is designed, in which the feature map produced by ResNet18 feeds the RPN that detects small targets (S-RPN), the feature map produced by ResNet32 feeds the RPN that detects medium targets (M-RPN), and the feature map produced by ResNet50 feeds the RPN that detects large targets (L-RPN). Non-maximum suppression (NMS) is then applied to remove redundant candidate regions. This network addresses the difficulty that existing deep-neural-network target detection models have with multi-scale detection of epidemic trees in pine forests. The invention further constructs a geometric model for three-dimensional positioning of epidemic trees from the camera imaging principle of the pan-tilt monitoring tower and digital elevation model (DEM) data, which accurately locates an epidemic tree in three dimensions. The method is simple and fast, and improves the monitoring and early-warning capability for epidemic trees in pine forests.
Description
Technical Field
The invention belongs to the technical field of image detection and positioning, and particularly relates to a deep-learning-based epidemic tree detection network with multi-scale candidate region fusion and a geometric model for three-dimensional positioning of epidemic trees.
Background
Pine wilt disease (pine wood nematode disease) is one of the pine diseases most damaging to forestry in China and is known as the "cancer of pine trees". Only 3-5 years may pass from the infection of a single pine tree to the death of an entire pine forest; an infected pine is called an epidemic tree. Detecting epidemic trees is therefore the first task of pine forest protection. The most notable feature distinguishing an infected pine from a healthy one is that the needles of the entire crown turn yellow-brown or reddish-brown, whereas the crown needles of a healthy pine are green. Existing methods for detecting epidemic trees in pine forests mainly comprise: 1) ground survey, in which forest protection personnel visit the site, record the coordinate positions of individual epidemic trees and collect samples; this has high accuracy and reliability but high labor and time costs and low efficiency; 2) satellite remote sensing, through which current forestry monitoring mostly covers large pine forest areas; because the satellite revisit period is long, imaging is affected by cloud and fog, and the spatial resolution of remote sensing images is limited, monitoring individual epidemic trees is currently difficult; 3) unmanned aerial vehicle (UAV) detection, which is limited by the UAV's endurance and cannot monitor a large pine forest area continuously in real time; 4) pan-tilt observation tower monitoring, in which a camera mounted at a high point of a pan-tilt observation tower shoots the pine forest at low altitude near the ground; the captured video has high spatial resolution, and single RGB frames extracted from it can be detected and analyzed, making this currently an effective method for detecting epidemic trees in pine forests.
With the great progress of deep learning target detection, image-based methods can detect targets of interest through deep neural network models. Convolutional neural networks (CNNs) can automatically extract features, especially high-level semantic features, and have achieved good results in remote sensing target detection. However, when the camera of the pan-tilt observation tower photographs a pine forest, the scene mixes distant and near views, and the size and position of a pine crown in the high-resolution image vary greatly with its distance from the pan-tilt. A conventional CNN-based detection network uses a single candidate box scale, so its results degrade when epidemic trees of different sizes must be detected. Moreover, pixel coordinates of epidemic trees detected only at the image level are of little use for monitoring; the spatial geographic coordinates of the epidemic trees relative to the mountain terrain must be determined accurately before detection can actually support the discovery and protection of epidemic trees.
Disclosure of Invention
The invention aims to provide an image-based epidemic tree detection and positioning method for pan-tilt observation tower imagery, solving two problems: existing deep-neural-network target detection models struggle with multi-scale detection of epidemic trees in pine forests, and they cannot accurately obtain the geographic position of an epidemic tree.
The technical scheme of the invention is as follows:
An image-based epidemic tree detection and positioning method comprises the following steps:
step 1: video preprocessing: and extracting a single-frame image from a video shot by the holder observation tower, and then carrying out image preprocessing and labeling to manufacture a data set for network model training, verification and testing.
Step 2: and (3) constructing a multi-scale candidate region fusion network, and training the model by the data set obtained in the step (1).
Step 3: input the image under test into the trained multi-scale candidate region fusion detection network to obtain the detection result, then calculate the specific geographic three-dimensional coordinates of the epidemic tree from the three-dimensional positioning geometric model.
Further, step 2 comprises the following substeps:
Step 2-1: first preprocess the input image of the multi-scale candidate region fusion network, then scale it to a fixed 224 × 224 pixels, and finally feed the 224 × 224 image together with its enlarged and reduced versions as the inputs of the network.
Step 2-2: the 224 × 224 image is input to a convolutional neural network (ResNet32), the 2× enlarged image to a ResNet18, and the 2× reduced image to a ResNet50, yielding the corresponding feature maps.
Step 2-3: obtain epidemic tree candidate regions from the feature maps. The region proposal network (RPN) that detects small targets uses the feature map from ResNet18 and is called S-RPN; the RPN that detects medium targets uses the feature map from ResNet32 (M-RPN); the RPN that detects large targets uses the feature map from ResNet50 (L-RPN). Since the regions generated by the three RPNs may overlap, non-maximum suppression (NMS) is applied to remove redundant candidate regions.
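The NMS step described above can be sketched as follows. This is only an illustration of greedy non-maximum suppression over the pooled S-RPN/M-RPN/L-RPN proposals, not the patent's code; the function names are mine.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(proposals, thresh=0.5):
    """Greedy NMS over (box, score) proposals pooled from the three RPNs:
    keep the highest-scoring box, drop every remaining box that overlaps
    it by more than `thresh`, and repeat."""
    order = sorted(proposals, key=lambda p: p[1], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [p for p in order if iou(best[0], p[0]) < thresh]
    return keep
```

Because the three RPNs propose boxes for the same trees at different scales, pooling their outputs before a single NMS pass is what removes the cross-scale duplicates.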
Step 2-4: train the multi-scale candidate region fusion detection network on a deep learning machine with the data set obtained in step 1, adjusting parameters repeatedly until the network converges, then save the trained network parameters.
Further, step 3 comprises the following substeps:
Step 3-1: first obtain the two-dimensional pixel coordinates of the target on the image with the target detection algorithm of step 2, and introduce a regular-grid digital elevation model (DEM) to obtain the terrain, i.e. the three-dimensional coordinates of the pine forest.
Step 3-2: establish a geometric relation model from the digital elevation model (DEM) and the imaging principle of the camera. Its core is the pixel coordinates of the target on the image, determined by the detection algorithm; from the proportional relation between the distances of a point to the boundaries of the reference window, solve for the corresponding point A(x_A, y_A, z_A) of the target on the reference window.
Step 3-3: following the idea of the Janus visibility algorithm, calculate the three-dimensional coordinates of the k-th division point N_k(x_Nk, y_Nk, z_Nk) on the line of sight from viewpoint C toward point P′.
Step 3-4: from the division-point coordinates obtained in step 3-3, interpolate the terrain elevation at the k-th division point from the elevations of the four surrounding grid nodes by distance weighting:

Z_Nk = ( Σ_{i=1}^{n} Z_i / d_i ) / ( Σ_{i=1}^{n} 1 / d_i )

where n = 4, Z_i is the elevation of the i-th grid node surrounding the k-th division point, and d_i is the distance from that grid node to the interpolation point.
Step 3-5: scan all division points along the camera's line of sight CP′ and compare the DEM elevation with the corresponding line-of-sight elevation: if the DEM elevation at a point is below the elevation of the corresponding division point on the sight line, move to the next point; stop at the first point whose DEM elevation exceeds the division-point elevation and return the coordinates of that obstacle point, which are the three-dimensional coordinates (x_B, y_B, z_B) of the epidemic tree target B.
In the invention, detecting and positioning epidemic trees in pan-tilt observation tower video proceeds in three stages: first, the pan-tilt observation tower video data set is used to train the constructed multi-scale candidate region fusion network, applying a learning-rate fine-tuning strategy during training to improve detection accuracy; then the test image is fed to the trained network to obtain the two-dimensional coordinates of the epidemic tree target in the image; finally, the specific three-dimensional geographic coordinates of the target are obtained from its two-dimensional image coordinates and the three-dimensional positioning geometric model.
The invention has the beneficial effects that:
Aiming at video shot by a pan-tilt observation tower over a pine forest, the invention provides a multi-scale candidate region fusion network for epidemic tree detection. By analysing the relation between the camera imaging principle and the relevant camera parameters, and combining them with the characteristics of a digital elevation model (DEM), a three-dimensional positioning method for epidemic trees in pine forests is given. The invention overcomes the difficulty that existing deep-neural-network detection models have with multi-scale detection of epidemic trees, and locates the geographic position of the epidemic tree. The method is simple and fast, and improves the monitoring and early-warning capability for epidemic trees in pine forests.
Drawings
FIG. 1 is a framework diagram of the multi-scale candidate region fusion network of the present invention;
FIG. 2 is the geometric model for three-dimensional positioning of the epidemic tree.
Detailed Description
The following describes in further detail a specific embodiment of the present invention with reference to fig. 1 and 2.
As shown in fig. 1, the image-based epidemic tree detection and positioning method proceeds as follows:
Step 1: video preprocessing: extract single frames from video shot by the pan-tilt observation tower, then preprocess and label the images to produce a data set for network model training, validation and testing.
Step 1-1: extract single frames from the video shot by the pan-tilt observation tower, then clean the data manually, mainly by removing duplicate, corrupted and useless images.
Step 1-2: following the guidance of forestry pest and disease experts, annotate the cleaned images with an image annotation tool for deep learning target detection. The shuffled data set is divided into a training set, a validation set and a test set at a chosen ratio (e.g. 7:2:1).
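The shuffle-and-split step can be sketched as follows. This is a minimal illustration (the function name and the fixed seed are my own choices for reproducibility, not from the patent):

```python
import random

def split_dataset(samples, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle the labelled samples and split them into
    training / validation / test subsets at the given ratio (e.g. 7:2:1)."""
    items = list(samples)
    random.Random(seed).shuffle(items)   # seeded shuffle for reproducibility
    n = len(items)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]       # remainder goes to the test set
    return train, val, test
```

Splitting after a single shuffle keeps the three subsets disjoint while preserving the overall class mix on average.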
Step 2: construct the multi-scale candidate region fusion network and train the model on the data set obtained in step 1.
The step 2 comprises the following substeps:
Step 2-1: first preprocess the input image of the multi-scale candidate region fusion network with data enhancement such as flipping, rotation, colour transformation and added Gaussian noise. Then scale the enhanced image to a fixed 224 × 224 pixels, and finally feed the 224 × 224 image together with its enlarged and reduced versions as the inputs of the network.
Step 2-2: input the 224 × 224 image to a convolutional neural network (ResNet32), the 2× enlarged image to a ResNet18, and the 2× reduced image to a ResNet50, obtaining the corresponding feature maps;
Step 2-3: derive the candidate regions from the feature maps. The region proposal network (RPN) that detects small targets uses the feature map from ResNet18 and is called S-RPN; the RPN for medium targets uses the feature map from ResNet32 (M-RPN); the RPN for large targets uses the feature map from ResNet50 (L-RPN). Since the regions generated by the three RPNs may overlap, non-maximum suppression (NMS) is applied to remove redundant candidate regions. For each RPN model, every anchor is first assigned a positive or negative label. Positive labels follow two rules: (1) the anchor with the highest IoU with a ground-truth box; or (2) any anchor whose IoU with a ground-truth box exceeds 0.7. Negative labels are anchors whose IoU with every ground-truth box is below 0.3. Anchors with IoU in the range 0.3-0.7 are neither positive nor negative and play no part in RPN training. A multi-task loss is then adopted as the loss function for model training, defined as

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)
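The anchor-labelling rules above can be sketched as follows. This is an illustration rather than the patent's code; boxes are corner tuples (x1, y1, x2, y2) and all names are mine:

```python
def iou(a, b):
    """Intersection-over-union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gts, pos=0.7, neg=0.3):
    """1 = positive, 0 = negative, -1 = ignored (IoU between 0.3 and 0.7)."""
    ious = [[iou(a, g) for g in gts] for a in anchors]
    # rule (1): for every ground-truth box, the anchor with the highest IoU
    forced = {max(range(len(anchors)), key=lambda i: ious[i][j])
              for j in range(len(gts))}
    labels = []
    for i, row in enumerate(ious):
        best = max(row)
        if i in forced or best > pos:   # rule (2): IoU above 0.7
            labels.append(1)
        elif best < neg:
            labels.append(0)
        else:
            labels.append(-1)           # neither label; excluded from training
        
    return labels
```

Rule (1) guarantees every ground-truth tree has at least one positive anchor even when no anchor reaches the 0.7 threshold.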
where i is the index of an anchor within a mini-batch, p_i is the predicted probability that the candidate region generated by anchor i is a target region, and p_i* is the ground-truth label of that region (0 for negative, 1 for positive). The classification loss L_cls and the regression loss L_reg are normalized by N_cls and N_reg respectively and balanced by the weight λ. A log loss is used for L_cls and a smooth L1 loss for L_reg. In the bounding box regression, the centre coordinates and the width and height of the predicted box are regressed separately; the predicted box t_i and the ground-truth box t_i* are parameterized as

t_x = (x − x_a)/w_a,  t_y = (y − y_a)/h_a,  t_w = log(w/w_a),  t_h = log(h/h_a)
t_x* = (x* − x_a)/w_a,  t_y* = (y* − y_a)/h_a,  t_w* = log(w*/w_a),  t_h* = log(h*/h_a)

where x and y are the centre coordinates of the predicted box and w and h its width and height; x_a, y_a are the centre coordinates of the anchor and w_a, h_a its width and height; x*, y* are the centre coordinates of the ground-truth box and w*, h* its width and height.
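This parameterization matches the standard Faster R-CNN box encoding; a minimal sketch (the function names are mine, not from the patent):

```python
import math

def encode_box(box, anchor):
    """Regression targets (t_x, t_y, t_w, t_h) for a box and an anchor,
    both given as centre/size tuples (x, y, w, h)."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha,
            math.log(w / wa), math.log(h / ha))

def decode_box(t, anchor):
    """Inverse mapping: recover the box from its regression targets."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (xa + tx * wa, ya + ty * ha,
            wa * math.exp(tw), ha * math.exp(th))
```

Normalizing offsets by the anchor size and regressing log-scale factors keeps the targets comparable across the small, medium and large anchors of the three RPNs.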
Step 2-4: train the multi-scale candidate region fusion detection network on a deep learning machine with the data set obtained in step 1, adjusting parameters repeatedly until the network converges, then save the trained network parameters.
Step 3: input the image under test into the trained multi-scale candidate region fusion detection network to obtain the detection result, then calculate the specific geographic three-dimensional coordinates of the epidemic tree from the three-dimensional positioning geometric model. As shown in fig. 2, this comprises the following substeps:
Step 3-1: first obtain the two-dimensional pixel coordinates of the target on the image with the target detection algorithm of step 2, and introduce a regular-grid digital elevation model (DEM) to obtain the terrain, i.e. the three-dimensional coordinates of the pine forest.
Step 3-2: establish a geometric relation model from the digital elevation model (DEM) and the imaging principle of the camera, as shown in fig. 2. Its core is the pixel coordinates of the target on the image, determined by the detection algorithm; from the proportional relation between the distances of a point in the reference window to its boundaries, solve for the corresponding point of the target on the reference window.
First, let the distance between the camera C and the reference window HEFG be R, the spatial position of the camera be C(x_C, y_C, z_C), the pitch angle α, the azimuth angle β, and let the lens target-surface size, focal length f, field angle θ and the other camera parameters be known. Using spatial analytic geometry, first solve the model for the centre of the reference window to obtain its midpoint S(x_S, y_S, z_S); then solve the model for the boundary points to obtain the four corner points H(x_H, y_H, z_H), E(x_E, y_E, z_E), F(x_F, y_F, z_F) and G(x_G, y_G, z_G). Then, from the pixel coordinates of the target determined by the detection algorithm and the proportional relation between the distances of the target's corresponding point on the reference window to the four boundaries, solve for the corresponding point A(x_A, y_A, z_A) of the target on the reference window.
Step 3-3: from the two spatial points C(x_C, y_C, z_C) and A(x_A, y_A, z_A), the equation of the line CA is

(x − x_C)/(x_A − x_C) = (y − y_C)/(y_A − y_C) = (z − z_C)/(z_A − z_C)

from which the intersection point P′(x_P′, y_P′, z_P′) of the line of sight CP′ with the plane xOy (where z_P′ = 0) is obtained.
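The intersection with the plane xOy follows directly by parameterizing the line; a minimal sketch (the function name is mine; it assumes the sight line is not parallel to the plane, i.e. z_A ≠ z_C):

```python
def intersect_xoy(C, A):
    """Point P' where the line through camera C and window point A meets
    the plane z = 0: parameterize X = C + t*(A - C) and solve
    z_C + t*(z_A - z_C) = 0 for t."""
    t = C[2] / (C[2] - A[2])      # undefined if z_A == z_C (horizontal line)
    return (C[0] + t * (A[0] - C[0]),
            C[1] + t * (A[1] - C[1]),
            0.0)
```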
Following the idea of the Janus visibility algorithm, compute the translation amounts Δx and Δy of the x and y coordinates from viewpoint C to point P′, take Δmax = max{Δx, Δy}, and divide the sight line CP′ into n = int(Δmax/m) equal segments according to the resolution m of the regular-grid DEM. Let N_k denote the k-th division point (k = 1, 2, …). By the proportionality theorem for parallel lines dividing segments, the per-segment increments in the x, y and z directions between C and P′ are (x_P′ − x_C)/n, (y_P′ − y_C)/n and (z_P′ − z_C)/n respectively, so the three-dimensional coordinates of the k-th division point N_k(x_Nk, y_Nk, z_Nk) are

x_Nk = x_C + k(x_P′ − x_C)/n,  y_Nk = y_C + k(y_P′ − y_C)/n,  z_Nk = z_C + k(z_P′ − z_C)/n.
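The equal division of the sight line can be sketched as follows (names are mine; it assumes the DEM resolution m and the two endpoints are known):

```python
def divide_sight_line(C, P, m):
    """Divide the sight line C -> P' into n = int(max(dx, dy)/m) equal
    segments (m: DEM grid resolution) and return the division points
    N_1 .. N_n, where N_k = C + (k/n) * (P - C)."""
    dx, dy = abs(P[0] - C[0]), abs(P[1] - C[1])
    n = max(1, int(max(dx, dy) / m))            # at least one segment
    step = tuple((P[i] - C[i]) / n for i in range(3))
    points = [tuple(C[i] + k * step[i] for i in range(3))
              for k in range(1, n + 1)]
    return points, n
```

Tying n to the DEM resolution ensures roughly one division point per grid cell along the line of sight.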
Step 3-4: from the division-point coordinates obtained in step 3-3, interpolate the terrain elevation at the k-th division point from the elevations of the four surrounding grid nodes by distance weighting:

Z_Nk = ( Σ_{i=1}^{n} Z_i / d_i ) / ( Σ_{i=1}^{n} 1 / d_i )

where n = 4, Z_i is the elevation of the i-th grid node surrounding the k-th division point, and d_i is the distance from that grid node to the interpolation point.
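The distance-weighted interpolation of step 3-4 can be sketched as follows. Weights of 1/d_i are one common choice for inverse-distance weighting; the function name is mine:

```python
import math

def idw_elevation(nodes, point):
    """Distance-weighted elevation at `point` = (x, y), interpolated from
    the surrounding DEM grid nodes given as [(x, y, z), ...]."""
    num = den = 0.0
    for x, y, z in nodes:
        d = math.hypot(x - point[0], y - point[1])
        if d == 0.0:          # the point coincides with a grid node
            return z
        num += z / d
        den += 1.0 / d
    return num / den
```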
Step 3-5: scan all division points along the camera's line of sight CP′ and compare the DEM elevation with the corresponding line-of-sight elevation: if the DEM elevation at a point is below the elevation of the corresponding division point on the sight line, move to the next point; stop at the first point whose DEM elevation exceeds the division-point elevation and return the coordinates of that obstacle point, which are the three-dimensional coordinates (x_B, y_B, z_B) of the epidemic tree target B.
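The scan in step 3-5 can be sketched as follows (illustrative only; `dem_elevation` stands in for the interpolation of step 3-4, and all names are mine):

```python
def first_obstacle(points, dem_elevation):
    """Walk the division points along the sight line CP'; at the first
    point whose DEM elevation exceeds the line-of-sight elevation,
    return (x, y, z_dem) as the target position B; None if unobstructed."""
    for x, y, z_line in points:
        z_dem = dem_elevation(x, y)
        if z_dem > z_line:        # terrain rises above the sight line here
            return (x, y, z_dem)
    return None
```

The first point where the terrain surface crosses above the descending sight line is exactly where the ray from the camera strikes the ground, i.e. the located tree.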
A further embodiment is an image-based epidemic tree detection and positioning system, comprising:
and the preprocessing module is used for extracting a single-frame image from the video shot by the holder observation tower, preprocessing and labeling the image, and making a data set for network model training, verification and testing.
And the network construction training module is used for constructing a multi-scale candidate region fusion network and training the model on the data set obtained by preprocessing.
And the detection positioning module is used for inputting the image to be tested into the trained multi-scale candidate region fusion detection network to obtain an output result, and calculating the specific geographical three-dimensional coordinates of the epidemic wood according to the epidemic wood three-dimensional positioning geometric model.
In particular, the network construction training module is configured to perform the following steps:
step 2-1: preprocess the input image of the multi-scale candidate region fusion network, then scale it to a fixed 224 × 224 pixels, and finally feed the 224 × 224 image together with its enlarged and reduced versions as the inputs of the network;
step 2-2: input the 224 × 224 image to a convolutional neural network (ResNet32), the 2× enlarged image to a ResNet18, and the 2× reduced image to a ResNet50, obtaining the corresponding feature maps;
step 2-3: derive candidate regions from the feature maps: the region proposal network (RPN) for small targets uses the ResNet18 feature map (S-RPN); the RPN for medium targets uses the ResNet32 feature map (M-RPN); the RPN for large targets uses the ResNet50 feature map (L-RPN); remove redundant candidate regions with non-maximum suppression (NMS);
step 2-4: train the multi-scale candidate region fusion network on a deep learning machine with the preprocessed data set, adjusting parameters repeatedly until the network converges, then save the trained network parameters.
Specifically, the detection positioning module is configured to perform the following steps:
step 3-1: obtain the two-dimensional pixel coordinates of the target on the image with the target detection algorithm, and introduce a regular-grid digital elevation model (DEM) to obtain the terrain, i.e. the three-dimensional coordinates of the pine forest;
step 3-2: establish a geometric relation model from the digital elevation model (DEM) and the imaging principle of the camera; from the pixel coordinates of the target determined by the detection algorithm and the proportional relation between the distances of a point to the boundaries of the reference window, solve for the corresponding point A(x_A, y_A, z_A) of the target on the reference window;
step 3-3: following the idea of the Janus visibility algorithm, calculate the three-dimensional coordinates of the k-th division point N_k(x_Nk, y_Nk, z_Nk) on the line of sight from viewpoint C toward point P′;
step 3-4: from the division-point coordinates obtained in step 3-3, interpolate the terrain elevation at the k-th division point from the elevations of the four surrounding grid nodes by distance weighting:

Z_Nk = ( Σ_{i=1}^{n} Z_i / d_i ) / ( Σ_{i=1}^{n} 1 / d_i )

where n = 4, Z_i is the elevation of the i-th grid node surrounding the k-th division point, and d_i is the distance from that grid node to the interpolation point;
step 3-5: scan all division points along the camera's line of sight CP′ and compare the DEM elevation with the corresponding line-of-sight elevation: if the DEM elevation at a point is below the elevation of the corresponding division point, move to the next point; stop at the first point whose DEM elevation exceeds the division-point elevation and return the coordinates of that obstacle point, which are the three-dimensional coordinates (x_B, y_B, z_B) of the epidemic tree target B.
Claims (10)
1. An image-based epidemic tree detection and positioning method, characterized by comprising the following steps:
step 1: video preprocessing: extract single frames from video shot by the pan-tilt observation tower, preprocess and label the images, and produce a data set for network model training, validation and testing;
step 2: construct a multi-scale candidate region fusion network and train the model on the data set obtained in step 1;
step 3: input the image under test into the trained multi-scale candidate region fusion detection network to obtain the detection result, and calculate the specific geographic three-dimensional coordinates of the epidemic tree from the epidemic tree three-dimensional positioning geometric model.
2. The image-based epidemic wood detection and positioning method according to claim 1, characterized in that step 2 comprises the following specific steps:
Step 2-1: preprocess the input image of the multi-scale candidate region fusion network, then scale the image to a fixed size of 224 × 224 pixels, and finally take the 224 × 224 pixel image together with its enlarged and reduced versions as the inputs of the multi-scale candidate region fusion network;
Step 2-2: input the 224 × 224 pixel image into a convolutional neural network (ResNet32), the image enlarged by a factor of two into a convolutional neural network (ResNet18), and the image reduced by half into a convolutional neural network (ResNet50), obtaining the corresponding feature maps;
Step 2-3: derive different candidate regions from the feature maps: a region proposal network (RPN) on the ResNet18 feature map detects candidate regions of small targets (S-RPN), an RPN on the ResNet32 feature map detects medium targets (M-RPN), and an RPN on the ResNet50 feature map detects large targets (L-RPN); redundant candidate regions are reduced with non-maximum suppression (NMS);
Step 2-4: train the multi-scale candidate region fusion network on a deep-learning machine with the data set obtained in step 1, repeatedly adjusting the parameters until the network converges, and save the trained network parameters.
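The non-maximum suppression named in step 2-3 can be sketched as the standard greedy NMS over scored boxes; the numpy version below is an illustration, not code from the patent.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.7):
    """Greedy NMS over [x1, y1, x2, y2] boxes.
    Returns indices of kept boxes, highest score first."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]      # candidates sorted by descending score
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        # intersection of the top-scoring box with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]   # drop boxes overlapping too much
    return keep
```

Each of the three RPN branches (S-RPN, M-RPN, L-RPN) would apply such a pass to its own proposals before fusion.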
3. The image-based epidemic wood detection and positioning method according to claim 2, characterized in that the preprocessing of the input image of the multi-scale candidate region fusion network in step 2-1 includes data-enhancement processing such as image flipping, rotation, color transformation and the addition of Gaussian noise.
4. The image-based epidemic wood detection and positioning method according to claim 2, characterized in that reducing the redundant candidate regions using non-maximum suppression (NMS) in step 2-3 specifically comprises the following steps:
For each RPN model, each anchor box is first assigned a positive or negative label. Positive labels follow two principles: (1) the anchor with the highest IoU (intersection over union) with the ground-truth region; or (2) any anchor whose IoU with the ground-truth region is higher than 0.7. Negative labels are anchors whose IoU with the ground-truth region is below 0.3. Anchors with IoU between 0.3 and 0.7 belong to neither label and take no part in the training of the RPN model;
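The labeling rule above can be sketched as follows, assuming an anchors-by-ground-truth IoU matrix has already been computed; `label_anchors` is a hypothetical helper for illustration, not part of the patent.

```python
import numpy as np

def label_anchors(ious, hi=0.7, lo=0.3):
    """Assign each anchor a label: 1 (positive), 0 (negative), -1 (ignored).
    `ious` is an (num_anchors x num_ground_truths) IoU matrix."""
    ious = np.asarray(ious, dtype=float)
    best = ious.max(axis=1)             # best IoU of each anchor with any ground truth
    labels = np.full(len(ious), -1)     # default: ignored (IoU between lo and hi)
    labels[best > hi] = 1               # principle (2): IoU above 0.7
    labels[best < lo] = 0               # negative: IoU below 0.3
    labels[ious.argmax(axis=0)] = 1     # principle (1): highest-IoU anchor per ground truth
    return labels
```

Principle (1) is applied last so that every ground-truth region keeps at least one positive anchor even when no IoU exceeds 0.7.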
Then a multitask loss function is adopted as the loss function of model training, defined as:

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)

where i is the index of an anchor in a mini-batch, p_i is the predicted probability that the candidate region generated by the i-th anchor is a target region, and p_i* is the ground-truth label of that region, with 0 for negative and 1 for positive. The classification loss L_cls and the regression loss L_reg above are normalized by N_cls and N_reg respectively and balanced with the weight λ; a log loss is used for the classification loss L_cls and a smooth L1 loss for the regression loss L_reg. In the bounding-box regression, the center coordinates and the width and height of the prediction box are regressed separately; the prediction box t_i and the ground-truth box t_i* are each defined as follows.
t_x = (x − x_a)/w_a, t_y = (y − y_a)/h_a, t_w = log(w/w_a), t_h = log(h/h_a)
t_x* = (x* − x_a)/w_a, t_y* = (y* − y_a)/h_a, t_w* = log(w*/w_a), t_h* = log(h*/h_a)

where x and y denote the center coordinates of the prediction box, and w and h its width and height; x_a and y_a denote the center coordinates of the anchor, and w_a and h_a its width and height; x* and y* denote the center coordinates of the ground-truth box, and w* and h* its width and height.
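This parameterization matches the standard Faster R-CNN box encoding and can be sketched directly; the helper names below are illustrative.

```python
import math

def encode_box(box, anchor):
    """Encode a box (center-form x, y, w, h) relative to an anchor,
    following the t_x, t_y, t_w, t_h parameterization above."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha,
            math.log(w / wa), math.log(h / ha))

def decode_box(t, anchor):
    """Invert the encoding to recover the predicted box."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (xa + tx * wa, ya + ty * ha,
            wa * math.exp(tw), ha * math.exp(th))
```

The log-space width and height keep the regression targets scale-invariant, which is why the same encoding serves the S-RPN, M-RPN and L-RPN branches.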
5. The image-based epidemic wood detection and positioning method according to claim 1, characterized in that step 3 comprises the following specific steps:
Step 3-1: obtain the two-dimensional pixel coordinates of the target on the image with the target detection algorithm of step 2, and introduce a regular-grid Digital Elevation Model (DEM) to obtain the three-dimensional terrain coordinates of the pine forest;
Step 3-2: establish a geometric relation model from the Digital Elevation Model (DEM) and the imaging principle of the camera; the core is that the target detection algorithm determines the pixel coordinates of the target on the image, after which the corresponding coordinate A(x_A, y_A, z_A) of the target point on the reference window is obtained by solving, using the proportional relation between the distances from any point in the reference window to its boundaries;
Step 3-3: following the idea of the Janus visibility algorithm, calculate the three-dimensional coordinates of the k-th bisection point N_k(x_Nk, y_Nk, z_Nk) between the viewpoint C and the point P';
Step 3-4: from the bisection-point coordinates obtained in step 3-3 and the elevation data of the four surrounding grid nodes, compute the terrain elevation corresponding to the k-th bisection point by distance weighting, as follows:

Z_Nk = ( Σ_{i=1}^{n} Z_i/d_i ) / ( Σ_{i=1}^{n} 1/d_i )

where n = 4, Z_i is the elevation of the i-th grid node surrounding the grid cell of the k-th bisection point, and d_i is the distance from that grid node to the interpolation point;
Step 3-5: scan all bisection points along the sight line CP' from the camera to the target point, comparing the elevation obtained from the DEM with the corresponding sight-line elevation: if the DEM elevation at a point is smaller than the elevation of the corresponding bisection point on the sight line, proceed to the next point, until the first point whose DEM elevation is larger than that of its corresponding bisection point is reached; the coordinates of this obstacle point, i.e. the three-dimensional coordinates (x_B, y_B, z_B) of the epidemic-wood target B, are returned.
6. The image-based epidemic wood detection and positioning method according to claim 5, characterized in that step 3-2 is specifically as follows:
First, the distance between the camera C and the reference window HEFG is set to R, the spatial position of the camera to C(x_c, y_c, z_c), the pitch angle to α and the azimuth angle to β; with lens parameters such as the target-surface size, the focal length f and the field angle θ, spatial analytic geometry is used first to solve the model for the coordinates of the center point of the reference window, giving S(x_s, y_s, z_s); the model for the coordinates of each boundary point of the established reference window is then solved to obtain the four boundary points H(x_H, y_H, z_H), E(x_E, y_E, z_E), F(x_F, y_F, z_F) and G(x_G, y_G, z_G). Then, from the pixel coordinates of the target point on the image determined by the target detection algorithm and the proportional relation between the distances from the corresponding point on the reference window to the four boundaries, the model for the coordinates of the corresponding point of the target point on the reference window is solved to obtain A(x_A, y_A, z_A).
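The proportional-relation step can be sketched as a bilinear interpolation of the four boundary points of the reference window. The corner ordering assumed below (H top-left, E top-right, G bottom-left, F bottom-right) is an assumption made for illustration, since the claim does not fix it.

```python
import numpy as np

def pixel_to_window_point(u, v, img_w, img_h, H, E, F, G):
    """Map pixel (u, v) of an img_w x img_h image to the 3-D point A on the
    reference window, by bilinear interpolation of the boundary points.
    Assumed corner order: H top-left, E top-right, G bottom-left, F bottom-right."""
    s, t = u / img_w, v / img_h        # proportional distances to the boundaries
    H, E, F, G = (np.asarray(p, dtype=float) for p in (H, E, F, G))
    top = (1 - s) * H + s * E          # point on the top edge
    bottom = (1 - s) * G + s * F       # point on the bottom edge
    return (1 - t) * top + t * bottom  # A(x_A, y_A, z_A)
```

For a target detected at the image center, this returns the window center S, consistent with the center-point model solved first in the claim.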
7. The image-based epidemic wood detection and positioning method according to claim 5, characterized in that step 3-3 specifically comprises: from the positions of the two spatial coordinates C(x_c, y_c, z_c) and A(x_A, y_A, z_A), the spatial equation of the line CA is obtained as follows:

(x − x_c)/(x_A − x_c) = (y − y_c)/(y_A − y_c) = (z − z_c)/(z_A − z_c)

Further, the coordinates (x_P', y_P', z_P') of the intersection point P' of the sight line CP' with the plane xOy are obtained.
Following the idea of the Janus visibility algorithm, the translations Δx and Δy of the x and y coordinates from the viewpoint C to the point P' are first computed and maxΔ = max{Δx, Δy} is taken; according to the resolution m of the adopted regular-grid DEM, the sight line CP' is divided into n = int(maxΔ/m) equal parts, with the k-th bisection point denoted N_k (k = 1, 2, …, n). After this n-fold bisection, the single translation increments between point C and point P' in the x, y and z coordinate directions are (x_P' − x_C)/n, (y_P' − y_C)/n and z_C/n respectively, so the three-dimensional coordinates of the k-th bisection point N_k(x_Nk, y_Nk, z_Nk) are:

x_Nk = x_C + k(x_P' − x_C)/n, y_Nk = y_C + k(y_P' − y_C)/n, z_Nk = z_C − k·z_C/n
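The bisection of the sight line CP' can be sketched as follows; this is a minimal illustration in which the patent's n = int(maxΔ/m) is assumed to be precomputed and passed in as an argument.

```python
def bisection_points(C, P, n):
    """Three-dimensional coordinates of the n bisection points N_k
    (k = 1..n) along the sight line from viewpoint C to ground point P'.
    With P' on the plane xOy (z = 0), the z increment reduces to -z_C/n."""
    xC, yC, zC = C
    xP, yP, zP = P
    return [(xC + k * (xP - xC) / n,
             yC + k * (yP - yC) / n,
             zC + k * (zP - zC) / n) for k in range(1, n + 1)]
```

The DEM elevation at each N_k would then be interpolated from the four surrounding grid nodes (step 3-4) and compared against z_Nk (step 3-5).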
8. An image-based epidemic wood detection and positioning system, characterized by comprising:
a preprocessing module for extracting single-frame images from video shot by a pan-tilt observation tower, preprocessing and labeling the images, and producing a data set for network model training, verification and testing;
a network construction and training module for constructing a multi-scale candidate region fusion network and training the model on the data set obtained by preprocessing;
a detection and positioning module for inputting the image to be tested into the trained multi-scale candidate region fusion detection network to obtain the output result, and for calculating the specific geographical three-dimensional coordinates of the epidemic wood according to the epidemic-wood three-dimensional positioning geometric model.
9. The image-based epidemic wood detection and positioning system of claim 8, wherein the network construction and training module is configured to perform the following steps:
Step 2-1: preprocess the input image of the multi-scale candidate region fusion network, then scale the image to a fixed size of 224 × 224 pixels, and finally take the 224 × 224 pixel image together with its enlarged and reduced versions as the inputs of the multi-scale candidate region fusion network;
Step 2-2: input the 224 × 224 pixel image into a convolutional neural network (ResNet32), the image enlarged by a factor of two into a convolutional neural network (ResNet18), and the image reduced by half into a convolutional neural network (ResNet50), obtaining the corresponding feature maps;
Step 2-3: derive different candidate regions from the feature maps: a region proposal network (RPN) on the ResNet18 feature map detects candidate regions of small targets (S-RPN), an RPN on the ResNet32 feature map detects medium targets (M-RPN), and an RPN on the ResNet50 feature map detects large targets (L-RPN); redundant candidate regions are reduced with non-maximum suppression (NMS);
Step 2-4: train the multi-scale candidate region fusion network on a deep-learning machine with the data set obtained by preprocessing, repeatedly adjusting the parameters until the network converges, and save the trained network parameters.
10. The image-based epidemic wood detection and positioning system of claim 8, wherein the detection and positioning module is configured to perform the following steps:
Step 3-1: obtain the two-dimensional pixel coordinates of the target on the image with the target detection algorithm, and introduce a regular-grid Digital Elevation Model (DEM) to obtain the three-dimensional terrain coordinates of the pine forest;
Step 3-2: establish a geometric relation model from the Digital Elevation Model (DEM) and the imaging principle of the camera; the core is that the target detection algorithm determines the pixel coordinates of the target on the image, after which the corresponding coordinate A(x_A, y_A, z_A) of the target point on the reference window is obtained by solving, using the proportional relation between the distances from any point in the reference window to its boundaries;
Step 3-3: following the idea of the Janus visibility algorithm, calculate the three-dimensional coordinates of the k-th bisection point N_k(x_Nk, y_Nk, z_Nk) between the viewpoint C and the point P';
Step 3-4: from the bisection-point coordinates obtained in step 3-3 and the elevation data of the four surrounding grid nodes, compute the terrain elevation corresponding to the k-th bisection point by distance weighting, as follows:

Z_Nk = ( Σ_{i=1}^{n} Z_i/d_i ) / ( Σ_{i=1}^{n} 1/d_i )

where n = 4, Z_i is the elevation of the i-th grid node surrounding the grid cell of the k-th bisection point, and d_i is the distance from that grid node to the interpolation point;
Step 3-5: scan all bisection points along the sight line CP' from the camera to the target point, comparing the elevation obtained from the DEM with the corresponding sight-line elevation: if the DEM elevation at a point is smaller than the elevation of the corresponding bisection point on the sight line, proceed to the next point, until the first point whose DEM elevation is larger than that of its corresponding bisection point is reached; the coordinates of this obstacle point, i.e. the three-dimensional coordinates (x_B, y_B, z_B) of the epidemic-wood target B, are returned.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110821397.0A CN113379738A (en) | 2021-07-20 | 2021-07-20 | Method and system for detecting and positioning epidemic trees based on images |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113379738A true CN113379738A (en) | 2021-09-10 |
Family
ID=77582501
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110821397.0A Pending CN113379738A (en) | 2021-07-20 | 2021-07-20 | Method and system for detecting and positioning epidemic trees based on images |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113379738A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113870254A (en) * | 2021-11-30 | 2021-12-31 | 中国科学院自动化研究所 | Target object detection method and device, electronic equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109376637A (en) * | 2018-10-15 | 2019-02-22 | 齐鲁工业大学 | Passenger number statistical system based on video monitoring image processing |
CN109685067A (en) * | 2018-12-26 | 2019-04-26 | 江西理工大学 | A kind of image, semantic dividing method based on region and depth residual error network |
CN109977963A (en) * | 2019-04-10 | 2019-07-05 | 京东方科技集团股份有限公司 | Image processing method, unit and computer-readable medium |
CN110909615A (en) * | 2019-10-28 | 2020-03-24 | 西安交通大学 | Target detection method based on multi-scale input mixed perception neural network |
CN111027547A (en) * | 2019-12-06 | 2020-04-17 | 南京大学 | Automatic detection method for multi-scale polymorphic target in two-dimensional image |
CN111091105A (en) * | 2019-12-23 | 2020-05-01 | 郑州轻工业大学 | Remote sensing image target detection method based on new frame regression loss function |
CN111160249A (en) * | 2019-12-30 | 2020-05-15 | 西北工业大学深圳研究院 | Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion |
Non-Patent Citations (3)
Title |
---|
JUNLING HOU 等: "Multi-Scale Proposal Regions Fusion Network for Detection and 3D Localization of the Infected Trees", 《2021 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (ICAIIC)》, pages 93 - 98 * |
ZHU Lichao et al.: "Analysis of the conversion accuracy and efficiency from planar images to DQG pixels", Surveying and Mapping & Spatial Geographic Information, vol. 30, no. 1, pages 173 - 176 * |
QIN Feifei et al.: "A synchronous tracking algorithm for remote video of forest fires based on a digital elevation model", Journal of Zhejiang A&F University, vol. 29, no. 5, pages 917 - 922 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103337052B (en) | Automatic geometric correcting method towards wide cut remote sensing image | |
CN110956187A (en) | Unmanned aerial vehicle image plant canopy information extraction method based on ensemble learning | |
Wang et al. | Bottle detection in the wild using low-altitude unmanned aerial vehicles | |
CN112347895A (en) | Ship remote sensing target detection method based on boundary optimization neural network | |
CN115240093B (en) | Automatic power transmission channel inspection method based on visible light and laser radar point cloud fusion | |
CN113610905B (en) | Deep learning remote sensing image registration method based on sub-image matching and application | |
Hu et al. | Research on a single-tree point cloud segmentation method based on UAV tilt photography and deep learning algorithm | |
CN111967337A (en) | Pipeline line change detection method based on deep learning and unmanned aerial vehicle images | |
CN112907520A (en) | Single tree crown detection method based on end-to-end deep learning method | |
Zhang et al. | Deep learning based object distance measurement method for binocular stereo vision blind area | |
CN117409339A (en) | Unmanned aerial vehicle crop state visual identification method for air-ground coordination | |
JP2023530449A (en) | Systems and methods for air and ground alignment | |
CN115240089A (en) | Vehicle detection method of aerial remote sensing image | |
CN104751451B (en) | Point off density cloud extracting method based on unmanned plane low latitude high resolution image | |
Jiang et al. | Learned local features for structure from motion of uav images: A comparative evaluation | |
Xinmei et al. | Passive measurement method of tree height and crown diameter using a smartphone | |
CN115115954A (en) | Intelligent identification method for pine nematode disease area color-changing standing trees based on unmanned aerial vehicle remote sensing | |
CN114581307A (en) | Multi-image stitching method, system, device and medium for target tracking identification | |
CN113379738A (en) | Method and system for detecting and positioning epidemic trees based on images | |
CN111476167B (en) | One-stage direction remote sensing image target detection method based on student-T distribution assistance | |
CN110580468B (en) | Single wood structure parameter extraction method based on image matching point cloud | |
CN117392382A (en) | Single tree fruit tree segmentation method and system based on multi-scale dense instance detection | |
CN116994029A (en) | Fusion classification method and system for multi-source data | |
CN116385477A (en) | Tower image registration method based on image segmentation | |
CN114694022A (en) | Spherical neighborhood based multi-scale multi-feature algorithm semantic segmentation method |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20210910 |