CN113837927A - Standing tree height measuring system and method based on machine vision - Google Patents

Standing tree height measuring system and method based on machine vision

Info

Publication number: CN113837927A
Application number: CN202111084045.8A
Authority: CN (China)
Prior art keywords: tree, camera, image, tree height, machine vision
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 华蓓, 黄汝维, 曾朝燕
Current and original assignee: Guangxi University
Application filed by Guangxi University
Priority to CN202111084045.8A; publication of CN113837927A
Classifications

    (CPC classes under G — Physics; G06 — Computing; G06T — image data processing or generation; G06N — computing arrangements based on specific computational models)
    • G06T 7/80 — Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06N 3/045 — Combinations of networks
    • G06N 3/08 — Learning methods
    • G06T 3/06
    • G06T 5/80
    • G06T 7/11 — Region-based segmentation
    • G06T 7/70 — Determining position or orientation of objects or cameras

Abstract

The invention discloses a standing tree height measuring method based on machine vision, which comprises the following steps: first, an Android smart phone camera is calibrated with a calibration method that accounts for nonlinear distortion, the camera's nonlinear distortion parameters and internal and external parameters are extracted, and point-operation-based perspective distortion correction is applied to the standing tree image to be measured. Tree image information is acquired with a portable smart phone, and the image processing stage combines machine vision techniques with the Mask R-CNN algorithm, which has not previously been applied to tree height measurement, to rapidly segment the tree contour from the image with high generality and applicability. The method is highly general, simple to operate and low in cost, and can efficiently obtain an actual tree height measurement.

Description

Standing tree height measuring system and method based on machine vision
Technical Field
The invention relates to the technical field of tree height measurement, in particular to a standing tree height measurement system and method based on machine vision.
Background
In the current forest resource investigation process, and especially in tree height measurement, most surveyors still use traditional methods: data are read and recorded manually, which is labor-intensive and time-consuming.
With the continuous progress of science and technology, precision instruments such as electronic total stations, theodolites and tree measuring guns have greatly advanced forestry resource investigation. However, these instruments are expensive, some require professional operating skills, and their use is easily limited by the environment. For non-forestry professionals they are difficult to use, and they are inconvenient in actual measurement work.
Disclosure of Invention
The invention provides a standing tree height measuring system and method based on machine vision, and aims to solve the problems in the background technology.
In order to achieve the purpose, the invention adopts the following technical scheme:
a standing tree height measuring method based on machine vision comprises the following steps:
firstly, calibrating an android smart phone camera by adopting a camera calibration method with nonlinear distortion, extracting a camera nonlinear distortion parameter and internal and external parameters, and simultaneously performing perspective distortion correction based on point operation on a standing tree image to be measured, thereby providing powerful support for acquiring more accurate tree height characteristic point pixel values and constructing a better tree height measurement model;
secondly, training a tree image segmentation model by adopting a Mask R-CNN algorithm based on machine vision, processing the corrected tree image contour, extracting feature points related to tree height measurement from the corrected tree image contour, and obtaining a pixel value of a tree height difference value, so that the precision and universality of extraction of the tree image contour are improved;
thirdly, according to the imaging principle of the pinhole camera model, utilizing the acquired internal and external parameters, distortion parameters and tree height characteristic point pixel values of the mobile phone camera to construct a tree height measurement model, and calculating and finally obtaining tree height data of the target tree to be measured;
and fourthly, developing a single-plant standing tree height measurement prototype APP based on an Android smart phone platform, rapidly acquiring tree images through a smart phone camera, timely inputting required parameters, and finally timely obtaining tree height measurement results in the APP.
As a further improvement of the technical scheme: in the first step, a camera imaging model first needs to be established. The camera imaging model comprises the conversion between the coordinate systems involved in camera imaging, a conversion formula between real object points (X, Y, Z) and pixel points (u, v), and the relationship between M(X_w, Y_w, Z_w) and (u, v).
As a further improvement of the technical scheme: in the first step, when the camera calibration method with nonlinear distortion is used to calibrate the Android smart phone camera, the calibration board should be sized so that its area is at least one half of the available pixel area, and a checkerboard calibration board should be used.
As a further improvement of the technical scheme: in the first step, the calibrated imaging model is written in homogeneous coordinate and matrix form:

$$ s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = M_1 M_2 \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} $$

wherein M_1 is the obtained camera calibration intrinsic parameter matrix, M_2 is the obtained camera calibration extrinsic parameter matrix, d_x is the corresponding physical dimension of a pixel in the x-axis direction, and d_y is the corresponding physical dimension in the y-axis direction.
As a further improvement scheme of the technical scheme: in the first step, the steps of extracting the nonlinear distortion parameters and the internal and external parameters of the camera comprise homography relation and parameter constraint, nonlinear Levenberg-Marquardt algorithm and distortion optimization.
As a further improvement of the technical scheme: in the first step, point-operation-based perspective distortion correction is applied to the standing tree image to be measured using the correction formulas (equations (18) and (19) of the detailed description; the formula image is not reproduced here),
wherein x and y denote the scene imaging plane in the ideal state, i.e. the image point coordinates on the ideal image; x' and y' denote the scene imaging plane in the real scene, i.e. the image point coordinates on the actual distorted image; f_y is the focal length of the camera lens along the ordinate axis; f_x is the focal length of the camera lens along the abscissa axis; and β is the included angle between the real object imaging plane and the optical axis.
As a further improvement scheme of the technical scheme: in the second step, the Mask R-CNN algorithm comprises a prediction part algorithm and a training part algorithm.
As a further improvement of the technical scheme: in the third step, the tree height of the tree is computed by the tree height formula (the formula image is not reproduced here), wherein y' is the pixel difference from the highest point to the lowest point of the tree contour in the image; f_y is obtained from the smart phone camera calibration; θ is the phone inclination angle obtained from the phone's direction sensor; and PA_1, of length L, is the distance from the phone to the target tree to be measured. The tree height H is in meters (m), the angles are in degrees (°), and y' and f_y are in pixels.
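Since the publication's formula image is not reproduced, the following is a sketch of a standard pinhole/tilt geometry that is consistent with the variables listed above (y', f_y, θ, L); it is an illustrative reconstruction, not necessarily the patent's exact formula:

```python
import math

def tree_height(v_top, v_bottom, fy, theta_deg, L):
    """Estimate tree height H (m) at horizontal distance L (m).
    v_top / v_bottom: pixel offsets of tree top and base from the
    principal point (positive upward); fy: focal length in pixels;
    theta_deg: phone tilt from the horizontal. Illustrative standard
    pinhole geometry, NOT the patent's exact (unreproduced) formula."""
    theta = math.radians(theta_deg)
    ang_top = theta + math.atan(v_top / fy)     # ray elevation to the tree top
    ang_bot = theta + math.atan(v_bottom / fy)  # ray elevation to the tree base
    return L * (math.tan(ang_top) - math.tan(ang_bot))

# Level camera (theta = 0) 10 m away; the tree spans -500..+500 px
# around the principal point with fy = 1000 px.
H = tree_height(500.0, -500.0, 1000.0, 0.0, 10.0)
print(round(H, 3))  # 10.0
```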
As a further improvement scheme of the technical scheme: the method is applied to the smart phone.
A standing tree height measuring system based on machine vision comprises a shooting unit, a horizontal distance input unit and a data processing unit, wherein the shooting unit is used for shooting the tree to be measured, and the shot tree picture needs to include the whole of the single tree;
the horizontal distance input unit is used for inputting the horizontal distance from the mobile phone to the tree to be detected;
and the data processing unit is used for calculating the measured tree height and segmenting the processed tree image.
Compared with the prior art, the invention has the beneficial effects that:
aiming at the problems of high time and labor cost, low measurement efficiency, difficulty in carrying equipment and instruments, inconvenience in operation and the like in the conventional forestry resource investigation and measurement, the tree image information is acquired by using a portable smart phone, and a Mask R-CNN algorithm which is not applied in the tree height measurement direction is adopted in the tree image processing part in combination with machine vision knowledge so as to realize the rapid segmentation of the outline of the tree image, so that the tree image has high universality and applicability; the method is high in universality, simple to operate and low in cost, and can efficiently obtain an actual tree height measurement value.
The foregoing description is only an overview of the technical solutions of the present invention. In order to make these solutions more clearly understood and implementable in accordance with the contents of the description, the following detailed description is given with reference to the preferred embodiments of the present invention and the accompanying drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a graph showing a relationship between coordinate systems in a pinhole model in a machine vision-based standing tree height measurement system and method according to the present invention;
FIG. 2 is a graph showing the geometric relationship between the actual scene imaging plane and the ideal scene imaging plane in the standing tree height measurement system and method based on machine vision according to the present invention;
FIG. 3 is a three-dimensional modeling image after camera calibration in a machine vision-based standing tree height measurement system and method provided by the invention;
FIG. 4 is a graph of the average pixel error of each calibration image in a machine vision based standing tree height measurement system and method of the present invention;
FIG. 5 is a diagram of a proposed box intercepting a common feature layer and resize in a machine vision-based stumpage tree height measurement system and method of the present invention;
FIG. 6 is a schematic diagram of a tree height measurement model in a machine vision-based standing tree height measurement system and method according to the present invention;
fig. 7 is a diagram of main development tasks of APP in the standing tree height measurement system and method based on machine vision according to the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the accompanying drawings, which are provided by way of illustration only and are not intended to limit the scope of the invention. Advantages and features of the present invention will become apparent from the following description and from the claims. It is to be noted that the drawings are in a very simplified form and not to precise scale, serving only to conveniently and clearly illustrate the embodiments of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Referring to fig. 1 to 7, in an embodiment of the present invention, a standing tree height measuring method based on machine vision includes the following steps:
firstly, calibrating an android smart phone camera by adopting a camera calibration method with nonlinear distortion, extracting a camera nonlinear distortion parameter and internal and external parameters, and simultaneously performing perspective distortion correction based on point operation on a standing tree image to be measured, thereby providing powerful support for acquiring more accurate tree height characteristic point pixel values and constructing a better tree height measurement model;
secondly, training a tree image segmentation model by adopting a Mask R-CNN algorithm based on machine vision, processing the corrected tree image contour, extracting feature points related to tree height measurement from the corrected tree image contour, and obtaining a pixel value of a tree height difference value, so that the precision and universality of extraction of the tree image contour are improved;
thirdly, according to the imaging principle of the pinhole camera model, utilizing the acquired internal and external parameters, distortion parameters and tree height characteristic point pixel values of the mobile phone camera to construct a tree height measurement model, and calculating and finally obtaining tree height data of the target tree to be measured;
and fourthly, developing a single-plant standing tree height measurement prototype APP based on an Android smart phone platform, rapidly acquiring tree images through a smart phone camera, timely inputting required parameters, and finally timely obtaining tree height measurement results in the APP.
Specifically, in the first step, a camera imaging model is first established. The camera imaging model includes the conversion between the coordinate systems involved in camera imaging, a conversion formula between real object points (X, Y, Z) and pixel points (u, v), and the relationship between M(X_w, Y_w, Z_w) and (u, v);
(1) conversion between coordinate systems involved in camera imaging
The actual imaging process of the camera involves coordinate transformations between four coordinate systems: the image coordinate system, describing coordinates on the photographic imaging plane; the world coordinate system, describing the location of an object in the real world; the pixel coordinate system, describing the positions of pixel points in the generated photo; and the camera coordinate system, describing the position of the camera. Camera imaging follows the principle of similar triangles. A real object point M(X_w, Y_w, Z_w) is given in the world coordinate system. In an ideal imaging process, the object point M is projected onto the image coordinate system as the pixel point m(x_u, y_u). In practice, however, the camera lens is affected by its manufacturing process and produces a certain distortion, so the actual pixel point corresponding to the object point M in the pinhole camera model lies at m'(x_d, y_d), as shown in fig. 1;
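The chain of transforms just described (world → camera → pixel) can be sketched numerically; the intrinsic and extrinsic values below are illustrative only, not values from the patent:

```python
import numpy as np

# Illustrative intrinsic matrix (focal lengths and principal point in pixels).
K = np.array([[1000.0,    0.0, 320.0],
              [   0.0, 1000.0, 240.0],
              [   0.0,    0.0,   1.0]])

# Illustrative extrinsics: identity rotation, camera 5 m from the world origin.
R = np.eye(3)
t = np.array([[0.0], [0.0], [5.0]])

def project(K, R, t, Mw):
    """Project a world point Mw (shape (3,)) to pixel coordinates (u, v)."""
    Mc = R @ Mw.reshape(3, 1) + t   # world -> camera coordinates
    uv1 = K @ Mc                    # camera -> homogeneous pixel coordinates
    uv1 /= uv1[2, 0]                # divide out the scale factor s
    return uv1[0, 0], uv1[1, 0]

# A point offset 1 m along the world Y axis, 5 m from the camera:
u, v = project(K, R, t, np.array([0.0, 1.0, 0.0]))
print(u, v)  # 320.0 440.0 — a 1 m offset at 5 m range spans 200 px
```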
(2) conversion formula of real object point (X, Y, Z) and pixel point (u, v)
Based on the imaging characteristic of coplanar points, a conversion formula between a real object point (X, Y, Z) in the three-dimensional world coordinate system and a pixel point (u, v) in the two-dimensional pixel coordinate system can be derived from the pinhole model combined with similar-triangle relationships. The pixel coordinates of the two-dimensional image can be written in matrix form as m = [u v]^T, and the coordinates of the three-dimensional scene point as M = [X Y Z]^T. Appending an element of value 1 to each vector, the augmented vectors of m and M, i.e. their homogeneous coordinates, can be expressed as

$$ \tilde{m} = [u \; v \; 1]^T \quad \text{and} \quad \tilde{M} = [X \; Y \; Z \; 1]^T $$

The relationship between the image projection point m and the three-dimensional object point M is given by formula (1):

$$ s\,\tilde{m} = K\,[R \; T]\,\tilde{M} \qquad (1) $$
(3) Relationship between M(X_w, Y_w, Z_w) and (u, v)
In formula (1), s is a scale factor; the intrinsic and extrinsic parameter matrices of the camera are written as K and [R T]; d_x is the physical size of a pixel of the pixel coordinate plane in the x-axis direction; d_y is the physical size in the y-axis direction; the inclination angle between the two coordinate axes is represented by the parameter c; the principal point coordinate is (u_0, v_0); and f is the camera lens focal length. The relationship between a point M(X_w, Y_w, Z_w) in the three-dimensional world coordinate system and a point (u, v) in the pixel coordinate system is expressed by formula (2):

$$ s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f/d_x & c & u_0 \\ 0 & f/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} [R \; T] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} \qquad (2) $$
the coordinate transformation relation among the four coordinate systems in the imaging model realizes the one-to-one correspondence of the two-dimensional pixel coordinate and the three-dimensional world coordinate, and simultaneously provides powerful support for the deduction calculation of the internal and external parameters of the camera in the subsequent steps.
Specifically, in the first step, when the camera calibration method with nonlinear distortion is used to calibrate the Android smart phone camera, the calibration board is sized so that its area is at least one half of the available pixel area, and a checkerboard calibration board is used. The checkerboard calibration board used in the actual calibration work of the invention is printed on A3 paper, with 9 rows and 9 columns of squares, each square 30 mm wide.
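For illustration, the world coordinates of the inner corner points of such a board (a board of 9 × 9 squares has 8 × 8 inner corners, at 30 mm spacing), as typically fed to a calibration routine such as OpenCV's calibrateCamera, can be generated as follows. This is a sketch, not code from the patent:

```python
import numpy as np

ROWS, COLS = 8, 8      # inner corners of a 9x9-square checkerboard
SQUARE_MM = 30.0       # square width stated in the text

# World coordinates of the corners on the board plane (Z = 0).
objp = np.zeros((ROWS * COLS, 3), dtype=np.float64)
objp[:, :2] = np.mgrid[0:COLS, 0:ROWS].T.reshape(-1, 2) * SQUARE_MM

print(objp.shape)   # (64, 3)
print(objp[COLS])   # first corner of the second row: [0., 30., 0.]
```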
Specifically, in the first step, the nonlinear distortion correction model can be described by formula (3):

$$ x = x_u + \delta_x(x, y), \qquad y = y_u + \delta_y(x, y) \qquad (3) $$

In the formula, the nonlinear distortion values are denoted δ_x and δ_y and depend on the position of the image point in the image; (x, y) are the actual image point coordinates; (x_u, y_u) are the ideal image point coordinates in the linear pinhole model.
Aiming at the characteristics of a camera of a smart phone, tangential distortion and radial distortion are introduced:
equation (4) is a tangential distortion model function, in which higher order terms are ignored:
Figure BDA0003262180600000082
equation (5) is a radial distortion model function, in which the higher order terms are ignored:
Figure BDA0003262180600000083
obtaining an aberration correction function model by combining the equations (3), (4) and (5), wherein p1 and p2 are nonlinear tangential aberration coefficients, and k1 and k2 are nonlinear radial aberration coefficients, specifically the following equation (6),
Figure BDA0003262180600000084
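The combined radial-plus-tangential model can be sketched as a small function. The coefficient values are illustrative, and, as is common in forward evaluation, the distortion terms are evaluated at the ideal normalized coordinates (an assumption, since the model above mixes actual and ideal coordinates):

```python
import numpy as np

def distort(xu, yu, k1, k2, p1, p2):
    """Apply radial (k1, k2) and tangential (p1, p2) distortion to
    ideal normalized image coordinates. Higher-order terms ignored,
    as in the text; coefficient values below are illustrative."""
    r2 = xu**2 + yu**2
    radial = 1 + k1 * r2 + k2 * r2**2
    x = xu * radial + 2 * p1 * xu * yu + p2 * (r2 + 2 * xu**2)
    y = yu * radial + p1 * (r2 + 2 * yu**2) + 2 * p2 * xu * yu
    return x, y

print(distort(0.0, 0.0, -0.3, 0.1, 1e-3, 1e-3))  # (0.0, 0.0): no distortion at the principal point
x, y = distort(0.5, 0.0, -0.3, 0.1, 0.0, 0.0)
print(x)  # 0.5 * (1 - 0.3*0.25 + 0.1*0.0625) = 0.465625
```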
for the physical coordinate system of the image, the intersection point of the image plane and the coordinate optical axis should ideally be at the midpoint of the image, but in practice there will be deviations due to the manufacturing process of the camera.
If the origin O of the image physical coordinate system (x, y) corresponds to the coordinate (u_0, v_0) in the pixel coordinate system (u, v), then each imaging unit on the imaging sensor, i.e. each pixel point of the image, has a corresponding physical dimension d_x in the x-axis direction and d_y in the y-axis direction. In practice, however, the side lengths of each physical pixel cannot be kept equal for process reasons.
The pixel points of the image satisfy the following conversion relationship between the pixel coordinate system and the image physical coordinate system, as shown in formula (7):

$$ u = \frac{x}{d_x} + u_0, \qquad v = \frac{y}{d_y} + v_0 \qquad (7) $$
Rewriting into homogeneous coordinate and matrix form gives formula (8), wherein M_1 is the obtained camera calibration intrinsic parameter matrix and M_2 is the obtained camera calibration extrinsic parameter matrix, comprising a rotation matrix and a translation matrix:

$$ s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = M_1 M_2 \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} \qquad (8) $$
Specifically, in the first step, the steps of extracting the nonlinear distortion parameters and the internal and external parameters of the camera comprise homography relation and parameter constraint, nonlinear Levenberg-Marquardt algorithm and distortion optimization;
A homography relation exists between the calibration board plane and the image; meanwhile, certain constraint conditions exist between the internal and external parameters of the phone camera, and a correspondence exists between the pixel coordinates of the imaging plane and the three-dimensional coordinates of the checkerboard calibration board. Using these relationships, an initial estimate of the camera's internal and external parameters can be derived, after which the nonlinear Levenberg-Marquardt algorithm (L-M algorithm for short) is used to iteratively refine them.
(1) homography relationships and parametric constraints
In order to calibrate the camera accurately, the key step is to compute the internal and external parameters of the camera well. In the world coordinate system, if the Z coordinate is 0, then:

$$ s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\,[r_1 \; r_2 \; t] \begin{bmatrix} X_w \\ Y_w \\ 1 \end{bmatrix} $$

The homography matrix is H = K [r_1 r_2 t], where r_1 and r_2 are columns of the rotation matrix and t is the translation vector. Writing H = [h_1 h_2 h_3] and applying the constraint conditions on the intrinsic parameters yields formula (9):

$$ h_1^T K^{-T} K^{-1} h_2 = 0, \qquad h_1^T K^{-T} K^{-1} h_1 = h_2^T K^{-T} K^{-1} h_2 \qquad (9) $$
the parameter calculation process can be described as that firstly a closed solution is obtained by calculation, then the camera parameter matrix of the initial estimation is calculated by using the closed solution, then the camera external parameter matrix is deduced and calculated, at the moment, the nonlinear optimization solution of the maximum likelihood estimation is obtained, then the radial distortion is considered, and finally the learned value is obtained.
(2) L-M algorithm nonlinear optimization
From the analysis it is known that, assuming the pixel point noise obeys the same distribution, the maximum likelihood estimate is obtained by minimizing formula (10):

$$ \sum_{i=1}^{n} \sum_{j=1}^{m} \left\| m_{ij} - \hat{m}(K, R_i, t_i, M_j) \right\|^2 \qquad (10) $$

where m is the number of corner points on each calibration board template picture and n is the number of calibration board template pictures. Minimizing formula (10) is a nonlinear optimization; in the concrete process the parameters to be estimated must be updated continuously through iteration. In the calibration of the phone camera, because the number of calibration board template pictures is large, the L-M algorithm is used to keep the iteration efficient without degrading the camera calibration quality.
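As an illustration of a Levenberg-Marquardt iteration (here via SciPy's least_squares with method='lm', standing in for the calibration residuals; the model and data are synthetic, not from the patent):

```python
import numpy as np
from scipy.optimize import least_squares

# Synthetic observations of a 1-D model y = a * exp(b * x), standing in
# for the reprojection residuals minimized during calibration.
x = np.linspace(0.0, 1.0, 30)
a_true, b_true = 2.0, -1.5
y = a_true * np.exp(b_true * x)

def residuals(p):
    a, b = p
    return a * np.exp(b * x) - y   # residual vector to minimize

# method='lm' selects the Levenberg-Marquardt algorithm used in the text.
fit = least_squares(residuals, x0=[1.0, 0.0], method='lm')
print(fit.x)  # close to (2.0, -1.5)
```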
(3) Distortion optimization
Introducing distortion into the L-M optimization and correcting for it can obviously improve the precision of the calibration result. Based on the distortion model determined by formulas (3) to (7) and the obtained camera parameters, the optimization of the distortion model can be converted into a least-squares problem.
The distortion parameters are derived by combining the three-dimensional world coordinates of the checkerboard corner points with the corresponding two-dimensional image coordinates. Taking the obtained camera parameters as initial estimates, the nonlinear L-M algorithm is used to minimize the objective function F, and finally more accurate camera parameters are computed. With n checkerboard calibration board pictures in total and m corner points per picture, the calibration parameters are optimized by residual minimization, and the objective function F is given by formula (11):

$$ F = \sum_{i=1}^{n} \sum_{j=1}^{m} \left\| m_{ij} - \hat{m}(K, k_1, k_2, p_1, p_2, R_i, T_i, M_j) \right\|^2 \qquad (11) $$

In formula (11), M_j is a model point in the world coordinate system; R_i and T_i are the extrinsic parameters of the i-th calibration template picture; m is the number of control points obtained from the i-th calibration template picture; and m̂_ij is the projection of the point M_j onto the i-th calibration template picture.
Specifically, when a tree is actually photographed, the ideal imaging plane is obtained from the real imaging plane only after a rotation through a certain angle, so the captured tree image contains perspective geometric distortion. The image coordinates under the distorted condition and the ideal condition can be converted into each other, but this requires perspective geometric distortion correction as a basis. When a scene is photographed with a phone camera in practice, a geometric relationship exists between the actual scene imaging plane and the ideal scene imaging plane, as shown in fig. 2. In the same optical process, x and y denote the scene imaging plane in the ideal state, i.e. the image point coordinates on the ideal image, and x' and y' denote the scene imaging plane in the real scene, i.e. the image point coordinates on the actual distorted image. Under ideal imaging conditions the optical axis of the smart phone camera lens is always perpendicular to the imaging plane. In an actual shooting scene, however, the generated image is distorted by factors such as the shooting angle, and the optical axis is no longer perpendicular to the actual scene imaging plane, i.e. the angle β ≠ 90°. Analyzing the geometric relationships in fig. 2 yields formulas (12), (13) and (14):
[Formulas (12), (13) and (14): geometric relationships derived from fig. 2; the formula images are not reproduced here.]
In the above formulas, the real object imaging plane intersects the optical axis, and l denotes the length from this intersection point to the real object point a'; L_1 is the distance from the ideal object imaging plane to the camera lens; f_y is the focal length of the camera lens along the ordinate axis; β is the included angle between the actual object imaging plane and the optical axis; and h is the distance from the actual object point a' to the optical axis. From formulas (12), (13) and (14), the actual distorted coordinate y and the ideal coordinate y' satisfy a definite relationship, given by formula (15):

[Formula (15): the formula image is not reproduced here.]
Because the x coordinate axis is always perpendicular to the optical axis, a plane passing through the actual object point a' and perpendicular to the optical axis is found. The distance from this plane to the lens is L, and L determines the ratio of the pixel coordinate x on the ideal object imaging plane to the distorted pixel coordinate x' on the actual object imaging plane, as shown in formula (16):

[Formula (16): the formula image is not reproduced here.]
From formulas (13), (14) and (16), the conversion relationship between the ideal image coordinate x and the actual image coordinate x' is derived, as shown in formula (17):

[Formula (17): the formula image is not reproduced here.]
the coordinate conversion relationship between the object point of the ideal image and the object point of the actual image can be derived by combining the equations (16) and (17), as shown in the equations (18) and (19):
Figure BDA0003262180600000123
Figure BDA0003262180600000124
and correcting the distorted image by adopting Python and combining with an image processing function of an OpenCV (open CV) library, recovering partial information of the tree image, analyzing and knowing from a geometric relation graph 2 between an actual scene imaging surface and an ideal scene imaging surface, wherein the pixel coordinate of an image object point in an ideal state is (x ', y'), and deducing and calculating by combining with derivation formulas (18) and (19) to obtain the pixel coordinate (x, y) with distortion correction in the actual image. But the resulting pixel coordinates x 'and y' are not integer pixel values at this time;
to improve the image correction accuracy, interpolation algorithms are used to process the distorted image coordinates (x, y), which results in pixel gray values, but this value is not an integer. After all target pixels are processed by an interpolation operation method, pixel gray levels are obtained, the gray levels actually belong to ideal pixels of all selected areas, then assignment processing is carried out, and a perspective distortion correction model is operated in combination with a point to correct the shot tree image.
When the camera calibration experiment is carried out, the calibration board pictures should be captured with the phone at different angles and positions. To ensure the validity of the experimental result, at least 3 calibration board pictures must be processed to obtain a definite solution, and 10 to 20 pictures per group are generally optimal. Following the camera calibration precision evaluation method, 20 checkerboard calibration board images are selected for each calibration group in this invention.
in the calibration process, a 9 x 9 checkerboard calibration board is used, with each square measuring 30 mm x 30 mm. Pictures of the checkerboard calibration board are taken from different angles and positions with a Lenovo L38041 mobile phone, corner point detection is then performed, and the corner coordinates are extracted;
before and after the distortion correction processing of the checkerboard camera calibration board images, the images are stretched so as to eliminate a certain amount of image distortion;
from the calibrated checkerboard calibration board images, the three-dimensional spatial relationship between the calibration board and the mobile phone camera is constructed, as shown in fig. 3:
the pixel error of each corner point is calculated to obtain the statistics of the average pixel error of the mobile phone camera calibration; as shown in fig. 4, the average pixel error over all corner points in the image is 0.31 pixel.
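The average pixel error statistic can be computed from the detected corner coordinates and the corners reprojected through the calibrated model; a minimal NumPy sketch (array names are illustrative):

```python
import numpy as np

def mean_reprojection_error(detected, reprojected):
    """Average Euclidean pixel distance between detected corner
    coordinates and corners reprojected with the calibrated camera
    model. Both arrays have shape (N, 2)."""
    d = np.asarray(detected, float) - np.asarray(reprojected, float)
    return float(np.mean(np.linalg.norm(d, axis=1)))
```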
four groups of 20 calibration board images each are prepared for the experiments; corner points are extracted from the checkerboard calibration board images, and the results obtained after mobile phone camera calibration and distortion correction optimization are shown in Table 1.
TABLE 1 Camera calibration results (table shown as an image in the original)
Specifically, in the second step, the Mask R-CNN algorithm comprises a prediction part algorithm and a training part algorithm. Once trained, the tree image segmentation model is generally applicable to extracting the contour of a single tree image and automatically outputs the pixel difference of the tree height feature points.
The prediction part algorithm comprises the following steps:
(1) feature extraction
The backbone feature extraction network of the Mask R-CNN algorithm is a combination of ResNet-101 (a 101-layer deep residual network) and an FPN (feature pyramid network).
The official COCO input shape of 1024x1024 is used; its length and width are halved successively two, three, four and five times, and the compressed results are used to construct the feature pyramid structure, providing support for the next stage of processing.
The compressed feature layers in the backbone feature extraction network are named C2, C3, C4 and C5 according to the number of compressions. The effective feature layers fed to the classifier and mask networks corresponding to C2, C3, C4 and C5 are P2, P3, P4 and P5; further processing of P5 gives P6. In subsequent operations, the RPN proposal box network operates on P2, P3, P4, P5 and P6, and the proposal boxes are then decoded to obtain the final prediction boxes. Meanwhile, to obtain the image semantic segmentation information inside each prediction box, a mask semantic segmentation network is used to process the obtained effective feature layers.
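The spatial sizes of the effective feature layers follow from the repeated halving of the input; a small sketch (function name assumed for illustration):

```python
def fpn_level_sizes(input_size=1024, levels=(2, 3, 4, 5, 6)):
    """Side length of each effective feature layer P2..P6, where
    level k has a stride of 2**k relative to the input image."""
    return {"P%d" % k: input_size // (2 ** k) for k in levels}
```

For a 1024x1024 input this gives 256, 128, 64, 32 and 16, matching the feature layer shapes listed in the text.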
(2) Obtaining a suggestion box and decoding
To obtain the proposal boxes, the effective feature layers P2, P3, P4, P5 and P6 are used, all five sharing the same RPN proposal box network. Parameters are then adjusted according to the obtained prior boxes to determine whether an object exists inside them. The RPN proposal box network in the Mask R-CNN algorithm is similar to that in the Faster R-CNN algorithm. The process of obtaining the proposal boxes is as follows:
in the first step, a 3x3 convolution with 512 channels is performed.
In the second step, convolutions with anchors_per_location x 4 and anchors_per_location x 2 channels are performed, respectively.
The convolution with anchors_per_location x 4 channels predicts the adjustment of each prior box at each grid point of the effective feature layer.
The convolution with anchors_per_location x 2 channels determines whether the prediction box at each grid point of the effective feature layer contains an object.
Assuming an input picture of shape 1024x1024x3, the effective feature layers have shapes 256x256x256, 128x128x256, 64x64x256, 32x32x256 and 16x16x256 from large to small. The idea is to divide the input image into grids of different sizes; by default three prior boxes are placed at each grid point, so the prior boxes appear densely packed on the image. The total number of prior boxes at this point is 196608 + 49152 + 12288 + 3072 + 768 = 261,888. When the shape of the input image differs, the number of prior boxes changes accordingly.
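The prior-box total above can be verified directly (3 prior boxes at every grid point of the five feature layers); a one-function sketch:

```python
def total_prior_boxes(sizes=(256, 128, 64, 32, 16), anchors_per_location=3):
    """Total prior boxes over all effective feature layers: each
    grid point carries anchors_per_location prior boxes."""
    return sum(s * s * anchors_per_location for s in sizes)
```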
The prior boxes represent the position and size of certain boxes, but their representation capability is limited and cannot cover boxes of arbitrary position and size, so adjustment is needed.
The result of the anchors_per_location x 4 convolution adjusts these prior boxes on the image, yielding new boxes.
The anchors_per_location x 2 convolution determines whether each new box obtained above contains an object.
Here anchors_per_location is the number of prior boxes held at each grid point, and the 4 encodes the adjustment of the box center and width/height.
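The adjustment carried by the anchors_per_location x 4 channels is conventionally applied as center and width/height deltas; a hedged NumPy sketch of this standard R-CNN parameterization (the patent does not spell out the exact encoding it uses):

```python
import numpy as np

def apply_box_deltas(boxes, deltas):
    """Refine prior boxes [y1, x1, y2, x2] with predicted deltas
    [dy, dx, log(dh), log(dw)], the usual R-CNN parameterization."""
    boxes = np.asarray(boxes, float)
    deltas = np.asarray(deltas, float)
    h = boxes[:, 2] - boxes[:, 0]
    w = boxes[:, 3] - boxes[:, 1]
    cy = boxes[:, 0] + 0.5 * h
    cx = boxes[:, 1] + 0.5 * w
    cy += deltas[:, 0] * h          # shift center by a fraction of height
    cx += deltas[:, 1] * w          # shift center by a fraction of width
    h *= np.exp(deltas[:, 2])       # rescale height
    w *= np.exp(deltas[:, 3])       # rescale width
    return np.stack([cy - 0.5 * h, cx - 0.5 * w,
                     cy + 0.5 * h, cx + 0.5 * w], axis=1)
```

Zero deltas leave a prior box unchanged, which is the sanity check usually applied to this decoding.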
(3) Utilizing the advice boxes-RoI Align
As shown in fig. 5: several effective feature layers are first obtained, where each point on a feature layer condenses the features of a certain region of the picture to be processed. The proposal boxes then crop these effective feature layers, and the cropped content is resized.
The cropped content is resized to 7x7x256 in the classifier model and to 14x14x256 in the mask model.
Based on the size of the suggestion box, it can be determined to which feature layer the suggestion box belongs.
In the processing of the classifier model, a region of size 7x7x256 is obtained through RoI Align; this region is processed by a 7x7 convolution with 1024 channels and a 1x1 convolution with 1024 channels, which simulate two fully connected layers of 1024 units. The result is then fully connected to num_classes and num_classes x 4 outputs, where num_classes classifies the object inside the proposal box and num_classes x 4 carries the parameter adjustment of the proposal box.
In the processing of the mask model, the resized local feature layer is processed by four 3x3 convolutions with 256 channels, one deconvolution, and finally a convolution with num_classes channels, whose result classifies each pixel point. The final output has shape 28x28xnum_classes and gives the class of each pixel.
(4) Decoding of prediction frame and obtaining of mask semantic segmentation information
The actual decoding process of the prediction box is designed as follows:
First, take out the proposal boxes that do not belong to the background and whose score exceeds the configured threshold.
Second, combine the prediction result of the classifier model with the proposal boxes to obtain the final position of each prediction box.
Third, to prevent repeated detection, perform non-maximum suppression according to the model scores and the final positions of the prediction boxes.
This yields final prediction boxes that are more accurate than the earlier proposal boxes; these are the regions cropped by the mask model. After the mask model crops these regions, the pixel points inside are classified and the image semantic segmentation result is obtained.
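The non-maximum suppression step can be sketched in plain NumPy (the IoU threshold of 0.5 is illustrative; the text leaves the value unspecified):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression on boxes [y1, x1, y2, x2];
    returns indices of kept boxes, highest score first."""
    boxes = np.asarray(boxes, float)
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        # IoU of the current best box against the remaining boxes
        yy1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        xx1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        yy2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        xx2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, yy2 - yy1) * np.maximum(0, xx2 - xx1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]  # drop heavy overlaps
    return keep
```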
The training part algorithm
To obtain the proposal box prediction result on the effective feature layer, a 3x3 convolution is followed by a convolution with anchors_per_location x 1 channels and a convolution with anchors_per_location x 4 channels.
In the Mask R-CNN algorithm, the number of prior boxes anchors_per_location defaults to 3, and the prior boxes at each grid point of the effective feature layer are to be predicted. The two convolutions are analyzed as follows:
① the prediction result of the anchors_per_location x 4 convolution is the change of each prior box;
② the prediction result of the anchors_per_location x 1 convolution is whether the proposal box contains an object.
When the Mask R-CNN algorithm trains the model, a loss function defined on the proposal box network predictions must be computed. The picture to be processed is input into the current Mask R-CNN proposal box network, which outputs the proposal box results that need encoding.
The encoding function here converts the true position information of a box into the format of the Mask R-CNN proposal box prediction results; in other words, the prior box prediction and proposal box prediction corresponding to each real box must be found. Decoding obtains the real box from the proposal box prediction; encoding obtains the proposal box prediction from the real box.
(1) Training of the classifier model
In the Mask R-CNN algorithm, the proposal boxes must be adjusted to obtain the final prediction boxes; for the classifier model, the proposal boxes play the role of prior boxes.
The coincidence degree (intersection-over-union, IoU) between all real boxes and proposal boxes is calculated, and the proposal box samples are screened by this value: proposal boxes with coincidence degree greater than 0.5 are positive samples, and those with coincidence degree less than 0.5 are negative samples. Adjusting the proposal boxes then amounts to encoding the real boxes, and the encoding must correspond to the proposal boxes.
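The coincidence degree here is the standard intersection-over-union; a small sketch of the 0.5 screening (function names are illustrative):

```python
def iou(a, b):
    """Intersection-over-union of two boxes [y1, x1, y2, x2]."""
    y1, x1 = max(a[0], b[0]), max(a[1], b[1])
    y2, x2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, y2 - y1) * max(0, x2 - x1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def split_samples(proposals, gt_box, threshold=0.5):
    """Proposals overlapping the real box above the threshold are
    positive samples; the rest are negative samples."""
    pos = [p for p in proposals if iou(p, gt_box) > threshold]
    neg = [p for p in proposals if iou(p, gt_box) <= threshold]
    return pos, neg
```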
(2) Training of mask model
During training, the proposal box network is used to crop the common feature layer inside the mask model. Because the cropping of the feature layer can differ considerably from the cropping of the real box, the relative position of the cropped box and the real box must be computed in order to obtain correct image semantic segmentation information.
(3) Production of training data sets
Data sets were prepared using LabelMe. First, 250 tree images were taken on campus with a smartphone; the collected tree images were then annotated one by one with the LabelMe tool, accurately outlining the contour of each tree during annotation;
images annotated with LabelMe then undergo conversion from LabelMe format to a data set, i.e. json_to_dataset.
Specifically, in the third step, the construction of the single standing tree height model is introduced in detail according to the imaging principle of the pinhole camera model. The tree height measurement model is built as shown in fig. 6. FG is the handset device; α is the field angle; OA1 is the line of sight; OP is the height of the device above the ground; the distance from the phone to the target tree to be measured, measured with a tape measure, is L, i.e. PA1; the straight line OM is the optical axis of the camera; the actual tree height is A1A3; ideally, the tree and the phone are parallel to each other and both perpendicular to the optical axis, and the tree height is then AA2; fy, the focal length along the phone's ordinate axis, is obtained after smartphone camera calibration; y' is the pixel difference from the highest point to the lowest point of the tree contour in the image; θ is the phone tilt angle obtained from the phone's orientation sensor. The parallel relationships are AA2 // FG // y'y''; the perpendicular relationships are: FG is perpendicular to OM, OP is perpendicular to OH, and OM is perpendicular to FG, AA2 and y'y''. Therefore angle MOA1 = 0.5α, and since angle POG + angle GPN = 90° and angle NOM + angle GPN = 90°, angle POG = angle NOM.
After distortion correction by the phone, the tree image is transformed from the actual image A1A3 to the image AA2, so AA2 = A1A3. The image of AA2 on the camera imaging plane is the segment y'y''; since Oy0 = fy and AA2 = A1A3, AM = A3M and A1M = A2M. From the relationships in the model, equations (20) to (23) can be derived as follows:
AA2/OM=y'y''/fy (20)
ON=PA1 (21)
OM=ON*cosθ (22)
OM=PA1*cosθ (23)
since A1A3 is the required real tree height H, i.e. formula (24)
H=A1A3 (24)
The final standing tree height calculation formula (25) can be obtained from the formulas (20) to (24)
H=PA1*cosθ*y'y''/fy (25)
In the above equation (25), the length unit is meters (m), the angle unit is degrees (°), and y'y'' and fy are in pixels. In the calculation, the pixel units cancel, so the tree height H finally obtained is in meters (m).
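Equation (25) reduces to a one-line computation; a sketch assuming the form H = L·cosθ·y'/fy implied by the model's definitions, where y' is the pixel difference of the tree contour (values below are illustrative):

```python
import math

def tree_height(L, theta_deg, y_pixels, fy_pixels):
    """H = L * cos(theta) * y' / fy : the pixel units of y' and fy
    cancel, so H comes out in the same unit as L (meters)."""
    return L * math.cos(math.radians(theta_deg)) * y_pixels / fy_pixels
```

With the phone held level (θ = 0) at 10 m from a tree whose contour spans exactly fy pixels, the formula returns a tree height of 10 m.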
A standing tree height measuring system based on machine vision comprises a shooting unit, a horizontal distance input unit and a data processing unit, wherein the shooting unit is used for photographing the tree to be measured, and the photographed tree picture must contain the whole of a single tree;
the horizontal distance input unit is used for inputting the horizontal distance from the mobile phone to the tree to be detected;
and the data processing unit is used for calculating the measured tree height and segmenting the processed tree image.
The method for measuring the height of a single standing tree is based on a smartphone, and a prototype tree height measurement APP is developed on the Android platform. The main development language for smartphone camera calibration, standing tree image distortion correction, the Mask R-CNN image processing algorithm, and the construction and calculation of the tree height measurement model is Python; the measurement APP interface design and the related API calls mainly involve Java. The system uses the native Android application development mode and, combined with Android system components such as Activity, Service, Broadcast Receiver and Content Provider, realizes the functions required for standing tree height measurement.
The Android front end of the tree height measurement APP uses the lightweight framework OkHttp; the back end uses the lightweight Web application framework Flask [37], which is lighter, more flexible and safer than comparable frameworks.
The main tasks of the Android front-end development are the design of the tree height measurement interface, calling the phone camera and the system photo album, uploading the tree images and parameters to the server under the OkHttp framework, and receiving and displaying the tree height measurement result. The main task of the Python back-end development is to design and implement, under the Flask framework, the routing interfaces corresponding to the front end, to complete the Mask R-CNN tree image segmentation and the tree height calculation, and to return the tree height result to the front end for display, as shown in fig. 7.
Tree height measurement process:
opening the tree height measurement APP: the measurer must stand on the same horizontal ground as the tree to be measured; clicking the photographing button makes the system call the phone camera to photograph the tree, where the photographed picture must contain the whole of a single tree and the background should be as uncluttered as possible; meanwhile, the system automatically records the phone tilt angle output by the orientation sensor at the moment of shooting;
clicking the picture selection button confirms and displays the tree image to be measured, and the horizontal distance from the phone to the tree is entered in the input box;
clicking the upload button makes the system upload the image of the tree to be measured, the tilt angle at shooting time and the horizontal distance from the phone to the tree to the server for subsequent processing; meanwhile, the interface shows prompts for waiting, uploading and successful upload;
after a successful upload, clicking the measurement button and waiting about 4 seconds for the back-end algorithm, the interface reports that the measurement succeeded and displays the measured tree height and the segmented tree image.
And (3) verifying and analyzing the tree height measurement result:
(1) tree height measurement error analysis
Twenty sample trees were randomly selected and measured to verify the algorithm experimentally.
Table 2 records the measured data of the tree height experiment (table shown as an image in the original).
A Swedish Vertex IV (60°) ultrasonic hypsometer was selected to measure the height of the 20 standing trees. Each sample tree was measured 3 times with the ultrasonic hypsometer and the average was taken as the true tree height. The 20 trees were then measured with the calibrated Lenovo L38041 phone: the distance L from the phone to the target tree was measured with a tape measure, and the target tree was then photographed and measured with the phone.
During the measurement, care was taken to keep the target tree and the measurer on the same horizontal plane. The comparison of tree height measurements is shown in Table 2; the relative error of the tree height measurements is below 6.5%. In conclusion, the standing tree height measurement method meets the resource inventory requirements of precision forestry and digital forestry.
(2) Tree height measurement stability analysis:
a target tree to be measured on the Guangxi University campus was randomly selected and measured 10 times each with the Swedish Vertex IV (60°) ultrasonic hypsometer and with the APP measurement method of the invention. The standard deviation Isd of the 10 measurements is used as the evaluation index of tree height measurement precision and is calculated with formula (26):
Isd = sqrt( Σ(xi − x̄)² / (n − 1) ) (26)
where n is the number of observations of the tree, here n = 10; xi is the i-th tree height measurement; and x̄ is the mean of the tree heights.
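Formula (26) is the standard deviation of the repeated measurements; a sketch assuming the usual sample form with n − 1 in the denominator:

```python
import math

def sample_std(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
```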
The tree height measurement results are recorded in Table 3: the standard deviation of the ultrasonic hypsometer measurements is 0.553, while that of the APP measurements is 0.0031, indicating that the tree height measurement method studied in the invention is the more stable of the two.
TABLE 3 Tree height measurement stability analysis (table shown as an image in the original)
The foregoing is merely a preferred embodiment of the invention and is not intended to limit the invention in any manner; those skilled in the art can readily practice the invention as shown and described in the drawings and detailed description herein; however, those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiments as a basis for designing or modifying other structures for carrying out the same purposes of the present invention without departing from the scope of the invention as defined by the appended claims; meanwhile, any changes, modifications, and evolutions of the equivalent changes of the above embodiments according to the actual techniques of the present invention are still within the protection scope of the technical solution of the present invention.

Claims (10)

1. A standing tree height measuring method based on machine vision is characterized by comprising the following steps:
firstly, calibrating an android smart phone camera by adopting a camera calibration method with nonlinear distortion, extracting a camera nonlinear distortion parameter and internal and external parameters, and simultaneously performing perspective distortion correction based on point operation on a standing tree image to be measured, thereby providing powerful support for acquiring more accurate tree height characteristic point pixel values and constructing a better tree height measurement model;
secondly, training a tree image segmentation model by adopting a Mask R-CNN algorithm based on machine vision, processing the corrected tree image contour, extracting feature points related to tree height measurement from the corrected tree image contour, and obtaining a pixel value of a tree height difference value, so that the precision and universality of extraction of the tree image contour are improved;
thirdly, according to the imaging principle of the pinhole camera model, utilizing the acquired internal and external parameters, distortion parameters and tree height characteristic point pixel values of the mobile phone camera to construct a tree height measurement model, and calculating and finally obtaining tree height data of the target tree to be measured;
and fourthly, developing a prototype APP for single-plant standing tree height measurement on the Android smartphone platform: tree images are acquired quickly through the smartphone camera, the required parameters are entered, and the tree height measurement result is promptly displayed in the APP.
2. The standing tree height measuring method based on machine vision according to claim 1, characterized in that in the first step, a camera imaging model is first established; the camera imaging model comprises the conversions between the coordinate systems involved in camera imaging, and the conversion formulas between real object points (X, Y, Z) and pixel points (u, v) and between M(Xw, Yw, Zw) and (u, v).
3. The machine vision-based standing tree height measuring method according to claim 1, wherein in the first step, when the android smartphone camera is calibrated by the camera calibration method with nonlinear distortion, the calibration board is sized so that its area is at least one half of the usable pixel area, and a checkerboard calibration board is selected as the board type.
4. The standing tree height measurement method based on machine vision according to claim 1, wherein in the first step, the nonlinear distortion correction model formula is:
[Formula of claim 4, shown as an image in the original]
wherein M1 is the obtained camera calibration intrinsic parameter matrix, M2 is the obtained camera calibration extrinsic parameter matrix, dx is the physical size of a pixel along the x-axis, and dy is the physical size of a pixel along the y-axis.
5. The machine vision-based standing tree height measuring method according to claim 1, characterized in that in the first step, the extraction of the camera nonlinear distortion parameters and the intrinsic and extrinsic parameters comprises homography, parameter constraints, the nonlinear Levenberg-Marquardt algorithm and distortion optimization.
6. The standing tree height measuring method based on machine vision according to claim 1, characterized in that in the first step, the perspective distortion correction based on point operation is performed on the standing tree image to be measured by adopting a formula
[Formula of claim 6, shown as an image in the original]
wherein x and y represent the scene imaging plane in the ideal state, i.e. the image point coordinates on the ideal image; x' and y' represent the scene imaging plane in the real scene, i.e. the image point coordinates on the actual distorted image; fy is the focal length of the camera lens along the ordinate axis; fx is the focal length of the camera lens along the abscissa axis; and β is the included angle between the imaging plane of the real object and the optical axis.
7. The standing tree height measurement method based on machine vision according to claim 1, wherein in the second step, the Mask R-CNN algorithm includes a prediction part algorithm and a training part algorithm.
8. The machine vision-based standing tree height measuring method according to claim 1, characterized in that in the third step, the tree height of the tree is
H=PA1*cosθ*y'/fy
wherein y' is the pixel difference from the highest point to the lowest point of the tree contour in the image; fy is obtained after smartphone camera calibration; θ is the phone tilt angle obtained from the phone's orientation sensor; and PA1, the distance from the phone to the target tree to be measured, is L. The length unit of the tree height H is meters (m), the angle unit is degrees (°), and y' and fy are in pixels.
9. The standing tree height measuring method based on machine vision according to any one of claims 1-8, wherein the method is applied to a smart phone.
10. A standing tree height measuring system based on machine vision, characterized by comprising a shooting unit, a horizontal distance input unit and a data processing unit, wherein the shooting unit is used for photographing the tree to be measured, and the photographed tree picture must contain the whole of a single tree;
the horizontal distance input unit is used for inputting the horizontal distance from the mobile phone to the tree to be detected;
and the data processing unit is used for calculating the measured tree height and segmenting the processed tree image.
CN202111084045.8A 2021-09-14 2021-09-14 Standing tree height measuring system and method based on machine vision Pending CN113837927A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111084045.8A CN113837927A (en) 2021-09-14 2021-09-14 Standing tree height measuring system and method based on machine vision


Publications (1)

Publication Number Publication Date
CN113837927A true CN113837927A (en) 2021-12-24

Family

ID=78959376


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109285145A (en) * 2018-08-12 2019-01-29 浙江农林大学 The more plants of standing tree height measurement methods based on smart phone
CN110060200A (en) * 2019-03-18 2019-07-26 阿里巴巴集团控股有限公司 Perspective image transform method, device and equipment


Non-Patent Citations (4)

Title
WU XINMEI等: "Passive Measurement Method of Tree Height and Crown Diameter Using a Smartphone", 《IEEE ACCESS》 *
司晨冉等: "一种基于Mask R-CNN和分水岭算法的岩石颗粒图像分割方法", 《水电能源科学》 *
李亚东等: "Android智能手机树高测量APP开发与试验", 《中南林业科技大学学报》 *
高莉平等: "应用智能终端的立木高度测量方法", 《东北林业大学学报》 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211224