CN114898218A - Monocular lookout tower monitoring video target space positioning method of high-dimensional overdetermined equation - Google Patents

Info

Publication number
CN114898218A
CN114898218A
Authority
CN
China
Prior art keywords
camera
view
space
ground
rotation angle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210711311.3A
Other languages
Chinese (zh)
Inventor
尹烁
戚知晨
阮婧
殷海军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NANJING GUOTU INFORMATION INDUSTRY CO LTD
Original Assignee
NANJING GUOTU INFORMATION INDUSTRY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NANJING GUOTU INFORMATION INDUSTRY CO LTD
Priority to CN202210711311.3A
Publication of CN114898218A
Legal status: Pending (Current)

Classifications

    • G06V 20/10 Scenes; Scene-specific elements: Terrestrial scenes
    • G06N 20/20 Machine learning: Ensemble learning
    • G06V 10/242 Image preprocessing: Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
    • G06V 10/245 Image preprocessing: Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/52 Context or environment of the image: Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Abstract

The invention discloses a method for spatially positioning targets in monocular watchtower surveillance video based on a high-dimensional over-determined equation system. From the angle data returned in real time by the pan-tilt unit of the monocular watchtower camera and the geospatial coordinates of ground features in the video picture, the method constructs an over-determined equation system that realizes bidirectional mapping between view space and geographic space and meets positioning-accuracy requirements, solving the problem of real-time bidirectional positioning between video targets monitored by the monocular watchtower camera and geographic space. The method offers strong applicability, a wide positioning range, high positioning accuracy, and real-time data return, and has high application value in scenarios such as urban security and forest fire prevention.

Description

Monocular lookout tower monitoring video target space positioning method of high-dimensional overdetermined equation
Technical Field
The invention belongs to the technical field of spatial positioning of surveillance-video targets, and particularly relates to a method for spatially positioning targets in monocular watchtower surveillance video based on a high-dimensional over-determined equation.
Background
Surveillance video contains rich visual information but lacks geospatial attributes; the spatial expression of surveillance-video targets has therefore long been a key point, and a difficulty, in fusing surveillance video with geographic information.
Monocular PTZ (pan-tilt-zoom) camera surveillance video is currently the most widespread form of video surveillance. Traditionally, monocular PTZ surveillance video is fused with the geographic scene in two ways: visual perspective transformation and pixel feature matching. Visual perspective transformation uses a perspective transformation matrix and its inverse to project video into geographic space and back, and places high demands on the accuracy of the camera's physical position coordinates and attitude parameters. At present a large number of monocular watchtower installations suffer from missing geospatial coordinates and attitude parameters that are only relative values, so the approach is difficult to adapt to actual engineering work and its practical uptake is limited. Pixel feature matching (for example SIFT + RANSAC) matches pixel features between the video and pictures of the geographic scene: a three-dimensional geographic scene must be rendered on a computer in real time and framed by a simulated scene camera, and feature matching is solved between the rendered scene and the actually acquired video, which depends on real-time three-dimensional rendering and high-quality scene data that remote field sites rarely provide. The invention is therefore a high-precision positioning method suited to long distances in the field and is of significant value for engineering application.
Disclosure of Invention
Purpose of the invention: the invention aims to construct a high-dimensional over-determined equation system from the angle data returned in real time by the pan-tilt unit of a monocular watchtower camera and the geospatial coordinates of ground features in the video picture, and to realize bidirectional mapping between view space and geographic space, thereby solving the problem of real-time bidirectional positioning between video targets monitored by the monocular watchtower camera and geographic space, extending the positioning range, and improving positioning accuracy.
Technical solution: the monocular watchtower surveillance-video target spatial positioning method of the invention, based on a high-dimensional over-determined equation, comprises the following steps:
Step 1, construct view-ground reference points: adjust the camera attitude so that the centre of the camera picture is aligned with a ground feature with a clear boundary, find the same feature on the map, record the camera attitude and the feature's geospatial coordinates at that moment, and take the pair as a view-ground reference point, denoted P(x, y, z, p, t).
Step 2, construct the monocular camera view-ground bidirectional mapping model: the model is in essence a one-to-one functional relationship between the camera rotation angle (p, t) and the geospatial coordinates (x, y, z), and decomposes into a ground-to-view mapping model from geospatial coordinates to camera rotation angle and a view-to-ground mapping model from camera rotation angle to geospatial coordinates; the specific steps are as follows:
Step 2.1, construct the coordinate conversion from (x, y, z) to (p, t), establish the corresponding high-dimensional over-determined equation system, substitute the view-ground reference points, and solve for the unknowns to obtain the ground-to-view mapping model from geospatial coordinates to camera rotation angle.
Step 2.2, construct the coordinate conversion from (p, t) to (x, y, z), establish the corresponding high-dimensional over-determined equation system, substitute the view-ground reference points, and solve for the unknowns to obtain the view-to-ground mapping model from camera rotation angle to geospatial coordinates.
Step 3, error evaluation and model optimization: substitute the geospatial coordinates of the view-ground reference points into the model to obtain the corresponding rotation angles, compute the ground-to-view conversion error, and adjust or delete points with large errors until the root-mean-square error meets the positioning requirement; likewise, substitute the rotation angles of the reference points into the model to obtain the corresponding geospatial coordinates, compute the view-to-ground conversion error, and adjust or delete points with large errors until the root-mean-square error meets the positioning requirement.
Step 4, ground-to-view conversion: select any point P within the map positioning range, obtain its geospatial coordinates (x, y, z), substitute them into the ground-to-view positioning model, and compute the corresponding camera rotation angle (p′, t). Because the ground-to-view conversion model carries a certain systematic error, a geometric correction is applied to obtain a higher-precision rotation angle (p, t); the camera attitude is then remotely controlled over the network so that the target point lies at the centre of the monitoring picture, completing the ground-to-view conversion.
Step 5, view-to-ground conversion: select any target point in the camera monitoring picture, acquire the camera extrinsic parameters (p, t) that centre the target (calibrated with a simple pinhole model method), substitute (p, t) into the monocular camera view-ground bidirectional mapping model, solve the corresponding geospatial coordinates (x, y, z), and mark them on the map, completing the view-to-ground conversion.
Further, in step 1, no fewer than 14 groups of view-ground reference points should be collected to meet the solving requirement for the unknowns of the high-dimensional over-determined equations, and the reference points should be distributed as uniformly as possible in all directions around the camera to balance positioning accuracy across the positioning range.
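As a minimal illustration of the data collected in step 1, the sketch below defines a reference-point record and the minimum-count check; the field names and Python representation are assumptions of this sketch, not part of the patent.

    # A minimal sketch of a view-ground reference point as collected in step 1.
    # Field names are illustrative assumptions.
    from typing import List, NamedTuple

    class ViewGroundRef(NamedTuple):
        x: float  # geospatial easting of the ground feature
        y: float  # geospatial northing of the ground feature
        z: float  # elevation of the ground feature
        p: float  # camera pan angle (deg) that centres the feature
        t: float  # camera tilt angle (deg) that centres the feature

    def check_reference_set(refs: List[ViewGroundRef]) -> None:
        # Step 1 requires no fewer than 14 well-distributed reference points.
        if len(refs) < 14:
            raise ValueError("at least 14 view-ground reference points are needed")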
Further, in step 2, the monocular camera view-ground bidirectional mapping model is in essence a one-to-one functional relationship between the camera rotation angle (p, t) and the geospatial coordinates (x, y, z), and decomposes into a ground-to-view mapping model from geospatial coordinates to camera rotation angle and a view-to-ground mapping model from camera rotation angle to geospatial coordinates; the specific steps are as follows:
Step 2.1, construct the coordinate conversion from (x, y, z) to (p, t) and establish the corresponding high-dimensional over-determined equation system, equation (1); in matrix form it becomes equation (2).
[Equations (1) and (2) appear only as images in the original publication and are not reproduced here.]
Substitute any 13 groups of the view-ground reference points collected in step 1 into equation (2) and solve the matrix by least squares to obtain the ground-to-view mapping model from geospatial coordinates to camera rotation angle.
Step 2.2, construct the coordinate conversion from (p, t) to (x, y, z) and establish the corresponding high-dimensional over-determined equation system, equation (3); in matrix form it becomes equation (4).
[Equations (3) and (4) appear only as images in the original publication and are not reproduced here.]
Substitute any 14 groups of the view-ground reference points collected in step 1 into equation (4) and solve the matrix by least squares to obtain the view-to-ground mapping model from camera rotation angle to geospatial coordinates.
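Because equations (1) to (4) survive only as images, the exact basis functions cannot be recovered from this text. The following sketch of steps 2.1 and 2.2 therefore assumes simple polynomial bases (13 terms for the ground-to-view direction and 14 for view-to-ground, matching the reference-point counts quoted above) and solves each over-determined system by least squares as the method prescribes; only the least-squares structure is taken from the patent.

    # Hedged sketch of steps 2.1 and 2.2. The polynomial bases are ASSUMED
    # stand-ins for the unpublished equations (1)-(4).
    import numpy as np

    def basis_xyz(x, y, z):
        # Assumed 13-term basis, matching the 13 unknowns implied by
        # "any 13 groups of reference points" in step 2.1.
        return np.array([1.0, x, y, z, x*x, y*y, z*z,
                         x*y, y*z, x*z, x**3, y**3, z**3])

    def basis_pt(p, t):
        # Assumed 14-term basis for step 2.2 ("any 14 groups").
        return np.array([1.0, p, t, p*p, p*t, t*t,
                         p**3, p*p*t, p*t*t, t**3,
                         p**4, p**3*t, p*t**3, t**4])

    def fit_mapping(basis, inputs, outputs):
        # Least-squares solution C of the over-determined system A @ C = B.
        A = np.array([basis(*u) for u in inputs])
        B = np.array(outputs)
        C, *_ = np.linalg.lstsq(A, B, rcond=None)
        return C

    def build_models(refs):
        # refs: list of view-ground reference points (x, y, z, p, t).
        xyz = [(x, y, z) for x, y, z, p, t in refs]
        pt = [(p, t) for x, y, z, p, t in refs]
        C_g2v = fit_mapping(basis_xyz, xyz, pt)  # ground-to-view model
        C_v2g = fit_mapping(basis_pt, pt, xyz)   # view-to-ground model
        return C_g2v, C_v2g

Prediction is then a basis evaluation followed by a matrix product; for example, basis_xyz(x, y, z) @ C_g2v yields the predicted rotation angle (p, t).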
Further, in step 3, the view-to-ground conversion error is measured as the distance between the reference point's geospatial coordinates and the predicted coordinates, and the ground-to-view conversion error is measured as the deviation between the reference point's rotation angle and the predicted rotation angle.
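As an illustration of the step-3 evaluation loop, the sketch below computes both root-mean-square errors over the reference set and deletes the worst-fitting point; predict_v2g and predict_g2v are assumed wrappers around the fitted models of the previous sketch, and the repeat-until-the-RMSE-passes control flow is left to the caller.

    # Sketch of the step-3 error evaluation. predict_v2g(p, t) -> (x, y, z)
    # and predict_g2v(x, y, z) -> (p, t) wrap the fitted models above.
    import numpy as np

    def rmse(residuals):
        return float(np.sqrt(np.mean(np.square(residuals))))

    def view_to_ground_rmse(predict_v2g, refs):
        # Error metric: distance between reference and predicted coordinates.
        return rmse([np.linalg.norm(np.subtract(predict_v2g(p, t), (x, y, z)))
                     for x, y, z, p, t in refs])

    def ground_to_view_rmse(predict_g2v, refs):
        # Error metric: deviation between reference and predicted rotation angles.
        return rmse([np.linalg.norm(np.subtract(predict_g2v(x, y, z), (p, t)))
                     for x, y, z, p, t in refs])

    def drop_worst_point(predict_v2g, refs):
        # The adjust-or-delete policy, simplified here to deleting the point
        # with the largest residual; the model is then refitted and re-checked.
        errs = [np.linalg.norm(np.subtract(predict_v2g(p, t), (x, y, z)))
                for x, y, z, p, t in refs]
        worst = int(np.argmax(errs))
        return refs[:worst] + refs[worst + 1:]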
Further, in step 4, after the camera rotation angle (p′, t) is computed from the ground-to-view model, a geometric correction is applied: starting from an arbitrary camera attitude (p₀, t₀), the three-dimensional spatial relation between the camera and the selected point is constructed, and the positional relation between the selected-point coordinates (x, y, z) and the camera coordinates (x₀, y₀, z₀) is evaluated, reducing the systematic error introduced by the ground-to-view conversion model and yielding the higher-precision camera rotation angle (p, t) corresponding to the geospatial coordinates. The p value is computed by equation (5).
[Equation (5) appears only as an image in the original publication.]
Here S denotes the distance between the selected-point coordinates (x, y, z) and the camera coordinates (x₀, y₀, z₀).
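Since equation (5) is likewise published only as an image, the sketch below derives pan and tilt from the standard three-dimensional relation between the camera and the selected point described above; the angle conventions (north-referenced pan, up-positive tilt, degrees) are assumptions of this sketch rather than the patent's stated formula.

    # Hedged sketch of the step-4 geometric correction: pan/tilt of the ray
    # from the camera (x0, y0, z0) to the selected point (x, y, z).
    import math

    def geometric_pan_tilt(x, y, z, x0, y0, z0):
        dx, dy, dz = x - x0, y - y0, z - z0
        s = math.sqrt(dx*dx + dy*dy + dz*dz)            # S: camera-to-point distance
        pan = math.degrees(math.atan2(dx, dy)) % 360.0  # assumed measured from north
        tilt = math.degrees(math.asin(dz / s))          # assumed positive upwards
        return pan, tilt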
Further, in step 5, a perspective projection model method is adopted: with the camera intrinsics fixed, several calibration images are acquired at different angles and the relationship between the camera intrinsics and the extrinsics of each calibration image is determined; through linear model analysis the optimized camera-parameter solution for any image can be computed, and a maximum-likelihood nonlinear refinement yields the required camera parameters (p, t).
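The patent does not spell out the projection equations, so the following is only a sketch of the front end of step 5: given camera intrinsics recovered from the calibration images (for example with OpenCV's calibrateCamera) and a clicked pixel, it estimates the pan/tilt increments that bring the target to the picture centre under a small-rotation pinhole assumption.

    # Hedged sketch for step 5: pan/tilt increments that centre a target pixel,
    # assuming pinhole geometry with intrinsics (fx, fy, cx, cy) from calibration.
    import math

    def pixel_to_delta_pan_tilt(u, v, fx, fy, cx, cy):
        d_pan = math.degrees(math.atan2(u - cx, fx))   # horizontal offset -> pan
        d_tilt = math.degrees(math.atan2(v - cy, fy))  # vertical offset -> tilt
        return d_pan, d_tilt

The target's rotation angle (p, t) is the current pan/tilt plus these increments; that pair is then substituted into the view-to-ground mapping model.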
Advantageous effects: compared with the prior art, the invention has the following advantages:
(1) By studying the data characteristics and positioning conditions of monocular watchtower video surveillance, the invention for the first time associates the camera rotation angle with geographic coordinates to establish a mapping relationship. Remote control of the camera through linkage of the monitoring picture and map data realizes accurate long-range monitoring of targets, greatly improving the supervision capability of monocular high-altitude watchtower video and serving applications such as urban security and forest fire prevention.
(2) The invention constructs a monocular camera view-ground bidirectional mapping model based on high-dimensional over-determined equations, achieving high-precision bidirectional view-ground positioning over a large range from a limited number of identified view-ground reference points.
Drawings
FIG. 1 is a flow chart of the method of the invention
FIG. 2 is a distribution diagram of the view-ground reference points
FIG. 3 is an error analysis chart
FIG. 4 is a schematic diagram of the camera rotation angle in space
FIG. 5 shows the ground-to-view conversion effect
FIG. 6 is a schematic diagram of the camera calibration images
FIG. 7 shows the view-to-ground conversion effect
Detailed Description
The technical solution of the invention is described in detail below, but the scope of protection of the invention is not limited to the embodiments.
As shown in Fig. 1, the method of this embodiment for spatially positioning targets in monocular watchtower surveillance video using high-dimensional over-determined equations constructs a bidirectional mapping from view-ground reference points and outputs positioning parameters, realizing bidirectional view-ground positioning. The specific steps are as follows:
Step 1, construct view-ground reference points: adjust the camera attitude so that the centre of the camera picture is aligned with a ground feature with a clear boundary, find the same feature on the map, and record the camera attitude and the feature's geospatial coordinates at that moment as view-ground reference points, denoted P₁(x₁, y₁, z₁, p₁, t₁), P₂(x₂, y₂, z₂, p₂, t₂), …, Pₙ(xₙ, yₙ, zₙ, pₙ, tₙ). No fewer than 14 groups of reference points are collected, distributed as uniformly as possible in all directions around the camera to balance positioning accuracy across the positioning range. Fig. 2 shows the distribution of the view-ground reference points; the shaded portion is the occluded area invisible to the camera, in which no reference points can be selected.
Step 2, construct the monocular camera view-ground bidirectional mapping model: the model is in essence a one-to-one functional relationship between the camera rotation angle (p, t) and the geospatial coordinates (x, y, z), and decomposes into a ground-to-view mapping model from geospatial coordinates to camera rotation angle and a view-to-ground mapping model from camera rotation angle to geospatial coordinates; the specific steps are as follows:
Step 2.1, construct the coordinate conversion from (x, y, z) to (p, t) and establish the corresponding high-dimensional over-determined equation system, equation (1); in matrix form it becomes equation (2).
[Equations (1) and (2) appear only as images in the original publication and are not reproduced here.]
Substitute any 13 groups of the view-ground reference points collected in step 1 into equation (2) and solve the matrix by least squares to obtain the ground-to-view mapping model from geospatial coordinates to camera rotation angle.
Step 2.2, construct the coordinate conversion from (p, t) to (x, y, z) and establish the corresponding high-dimensional over-determined equation system, equation (3); in matrix form it becomes equation (4).
[Equations (3) and (4) appear only as images in the original publication and are not reproduced here.]
Substitute any 14 groups of the view-ground reference points collected in step 1 into equation (4) and solve the matrix by least squares to obtain the view-to-ground mapping model from camera rotation angle to geospatial coordinates.
Step 3, error evaluation and model optimization: substitute the geospatial coordinates of the view-ground reference points into the model to obtain the corresponding rotation angles, compute the ground-to-view conversion error, and adjust or delete points with large errors until the root-mean-square error meets the positioning requirement; likewise, substitute the rotation angles of the reference points into the model to obtain the corresponding geospatial coordinates, compute the view-to-ground conversion error, and adjust or delete points with large errors until the root-mean-square error meets the positioning requirement. The view-to-ground conversion error is measured as the distance between the reference point's geospatial coordinates and the predicted coordinates; the ground-to-view conversion error is measured as the deviation between the reference point's rotation angle and the predicted rotation angle. As shown in Fig. 3, two algorithms were adopted for comparison experiments: single-point positioning, a common watchtower video positioning method widely applied in forest fire prevention, which exhausts candidate target positions along a three-dimensional terrain profile; and random-forest fitting, a high-accuracy machine-learning algorithm whose core idea is ensemble learning and which, by aggregating many decision trees, obtains an unbiased estimate of the generalization error and can be applied to regression. Fig. 3(a) shows the distribution of predicted geographic coordinates: the single-point positioning algorithm's prediction error is large, mainly because it demands high elevation accuracy and is affected by the installation tilt of the pan-tilt unit, with the high-error areas symmetrically distributed. In Fig. 3(b), as the distance between the verification point and the tower increases, the random-forest error grows gradually and the single-point positioning error distribution is unstable, whereas the proposed algorithm's error is stably distributed with a downward trend, indicating that the proposed mapping model is unaffected by prediction distance and is superior for view-to-ground fitting. In Fig. 3(c), the random-forest prediction error on the pan value is larger because the number of reference points is limited; random-forest fitting is better suited to large-data prediction scenarios, and in areas where reference points are sparse the proposed algorithm's error also tends to rise. Finally, the proposed algorithm is clearly superior to the other algorithms in computational efficiency.
Table 1 lists the verification-point prediction errors computed by the view-ground mapping models after error evaluation and model optimization, where the view-to-ground error is measured as the distance d-error between the reference point's geographic coordinates and the predicted coordinates, and the ground-to-view error as the predicted rotation-angle deviations p-error and t-error.
Table 1. Results of the positioning experiment
[Table 1 appears only as an image in the original publication.]
Step 4, ground-to-view conversion: select any point P within the map positioning range, obtain its geospatial coordinates (x, y, z), substitute them into the ground-to-view positioning model, and compute the corresponding camera rotation angle (p′, t). Because the ground-to-view conversion model carries a certain systematic error, a geometric correction is applied: starting from an arbitrary camera attitude (p₀, t₀), the three-dimensional spatial relation between the camera and the selected point is constructed, and the positional relation between the selected-point coordinates (x, y, z) and the camera coordinates (x₀, y₀, z₀) is evaluated, reducing the systematic error and yielding the higher-precision camera rotation angle (p, t) corresponding to the geospatial coordinates. The p value is computed by equation (5).
[Equation (5) appears only as an image in the original publication.]
Here S denotes the distance between the selected-point coordinates (x, y, z) and the camera coordinates (x₀, y₀, z₀).
Controlling the camera attitude according to this rotation angle places the target point at the centre of the monitoring picture, completing the ground-to-view conversion. Fig. 5 shows the ground-to-view conversion effect: taking the camera as the centre, 16 points were selected uniformly around it according to a rule, the camera rotation angles were computed, and the ground features corresponding to the map points were marked in the video picture. All 16 predicted positions lie within the picture and more than 60 percent are close to the picture centre, showing good positioning performance.
Step 5, view-to-ground conversion: select any target point in the camera monitoring picture. Using the pinhole model method with the camera intrinsics fixed, several calibration images are collected at different angles, the relationship between the intrinsics and the extrinsics of each calibration image is determined, an optimized camera-parameter solution for any image is computed through linear model analysis, and a maximum-likelihood nonlinear refinement yields the required camera parameters (p, t); Fig. 6 is a schematic diagram of the camera calibration images. Substituting (p, t) into the monocular camera view-ground bidirectional mapping model solves the corresponding geospatial coordinates (x, y, z), which are marked on the map, completing the view-to-ground conversion. Fig. 7 shows the view-to-ground conversion effect. View-to-ground conversion can be widely applied to target monitoring and positioning, adding three-dimensional information to two-dimensional target-detection results: a pond electronic fence drawn in the video picture is projected point by point into the map, and the predicted outline basically matches the pond's contour.
In conclusion, the invention constructs an over-determined equation system from the angle data returned in real time by the pan-tilt unit of the monocular watchtower camera and the geospatial coordinates of ground features in the video picture, realizing bidirectional view-ground mapping that meets positioning-accuracy requirements and solving the problem of real-time bidirectional positioning between video targets monitored by the monocular watchtower camera and geographic space, with strong applicability, a wide positioning range, high positioning accuracy, and real-time data return. The invention does not depend on accurate camera extrinsic information, such as whether the camera is mounted level or whether its geographic coordinates are absolutely accurate, and adapts well to positioning in complex outdoor environments. The positioning distance can exceed 500 metres with positioning accuracy within 20 metres, giving the method high application value in scenarios such as urban security and forest fire prevention.

Claims (6)

1. A monocular watchtower surveillance-video target spatial positioning method based on a high-dimensional over-determined equation, characterized in that: based on high-dimensional over-determined equations, a camera view-ground mapping model is constructed through calibration of a small number of reference points, realizing high-precision bidirectional positioning between target point locations in a large-range field video picture and geospatial coordinates; the method comprises the following steps:
Step 1, construct view-ground reference points: adjust the camera attitude so that the centre of the camera picture is aligned with a ground feature with a clear boundary, find the same feature on the map, and record the feature's geospatial coordinates together with the camera rotation angles Pan and Tilt at that moment as a view-ground reference point, denoted P(x, y, z, p, t).
Step 2, construct the monocular camera view-ground bidirectional mapping model: the model is in essence the functional correspondence between the camera rotation angle (p, t) and the geospatial coordinates (x, y, z), and decomposes into a ground-to-view mapping model from geospatial coordinates to camera rotation angle and a view-to-ground mapping model from camera rotation angle to geospatial coordinates; the specific steps are as follows:
Step 2.1, construct the coordinate conversion from (x, y, z) to (p, t), establish the corresponding high-dimensional over-determined equation system, substitute the view-ground reference points, and solve for the unknowns to obtain the ground-to-view mapping model from geospatial coordinates to camera rotation angle.
Step 2.2, construct the coordinate conversion from (p, t) to (x, y, z), establish the corresponding high-dimensional over-determined equation system, substitute the view-ground reference points, and solve for the unknowns to obtain the view-to-ground mapping model from camera rotation angle to geospatial coordinates.
Step 3, error evaluation and model optimization: substitute the geospatial coordinates of the view-ground reference points into the model to obtain the corresponding rotation angles, compute the ground-to-view conversion error, and adjust or delete points with large errors until the root-mean-square error meets the positioning requirement; likewise, substitute the rotation angles of the reference points into the model to obtain the corresponding geospatial coordinates, compute the view-to-ground conversion error, and adjust or delete points with large errors until the root-mean-square error meets the positioning requirement.
Step 4, ground-to-view conversion: select any point P within the map positioning range, obtain its geospatial coordinates (x, y, z), substitute them into the ground-to-view positioning model, and compute the corresponding camera rotation angle (p′, t). Because the ground-to-view conversion model carries a certain systematic error, a geometric correction is applied to obtain a higher-precision rotation angle (p, t); the camera attitude is then remotely controlled over the network so that the target point lies at the centre of the monitoring picture, completing the ground-to-view conversion.
Step 5, view-to-ground conversion: select any target point in the camera monitoring picture; from the target point's pixel coordinates, acquire through a perspective projection model the camera rotation-angle parameters (p, t) at which the target lies at the picture centre; substitute (p, t) into the monocular camera view-ground bidirectional mapping model, solve the corresponding geospatial coordinates (x, y, z), and mark them on the map, completing the view-to-ground conversion.
2. The monocular watchtower surveillance-video target spatial positioning method based on a high-dimensional over-determined equation according to claim 1, characterized in that: in step 1, no fewer than 14 groups of view-ground reference points are collected, meeting the solving requirement for the unknowns of the high-dimensional over-determined equations, and the reference points are distributed as uniformly as possible in all directions around the camera to balance positioning accuracy across the positioning range.
3. The monocular watchtower surveillance-video target spatial positioning method based on a high-dimensional over-determined equation according to claim 1, characterized in that: in step 2, the monocular camera view-ground bidirectional mapping model is in essence the functional correspondence between the camera rotation angle (p, t) and the geospatial coordinates (x, y, z), and decomposes into a ground-to-view mapping model from geospatial coordinates to camera rotation angle and a view-to-ground mapping model from camera rotation angle to geospatial coordinates; the specific steps are as follows:
Step 3.1, the ground-to-view mapping model from geospatial coordinates to camera rotation angle: construct the coordinate conversion from (x, y, z) to (p, t) and establish the corresponding high-dimensional over-determined equation system, equation (1); in matrix form it becomes equation (2).
[Equations (1) and (2) appear only as images in the original publication and are not reproduced here.]
Substitute any 13 groups of the view-ground reference points collected in step 1 into equation (2) and solve the matrix by least squares to obtain the ground-to-view mapping model from geospatial coordinates to camera rotation angle.
Step 3.2, the view-to-ground mapping model from camera rotation angle to geospatial coordinates: construct the coordinate conversion from (p, t) to (x, y, z) and establish the corresponding high-dimensional over-determined equation system, equation (3); in matrix form it becomes equation (4).
[Equations (3) and (4) appear only as images in the original publication and are not reproduced here.]
Substitute any 14 groups of the view-ground reference points collected in step 1 into equation (4) and solve the matrix by least squares to obtain the view-to-ground mapping model from camera rotation angle to geospatial coordinates.
4. The monocular watchtower surveillance-video target spatial positioning method based on a high-dimensional over-determined equation according to claim 1, characterized in that: in step 3, the view-to-ground conversion error is measured as the distance between the reference point's geospatial coordinates and the predicted coordinates, and the ground-to-view conversion error is measured as the deviation between the reference point's rotation angle and the predicted rotation angle.
5. The monocular watchtower surveillance-video target spatial positioning method based on a high-dimensional over-determined equation according to claim 1, characterized in that: in step 4, after the camera rotation angle (p′, t) is computed from the ground-to-view model, a geometric correction is applied: starting from an arbitrary camera attitude (p₀, t₀), the three-dimensional spatial relation between the camera and the selected point is constructed, and the positional relation between the selected-point coordinates (x, y, z) and the camera coordinates (x₀, y₀, z₀) is evaluated, reducing the systematic error introduced by the ground-to-view conversion model. The p value is computed by equation (5).
[Equation (5) appears only as an image in the original publication.]
Here S denotes the distance between the selected-point coordinates (x, y, z) and the camera coordinates (x₀, y₀, z₀).
6. The monocular watchtower surveillance-video target spatial positioning method based on a high-dimensional over-determined equation according to claim 1, characterized in that: in step 5, according to the perspective projection model, with the camera intrinsics fixed, several calibration images are acquired at different angles and the relationship between the camera intrinsics and the extrinsics of each calibration image is determined; through linear model analysis the optimized camera-parameter solution for any image can be computed, and a maximum-likelihood nonlinear refinement yields the required camera parameters (p, t).
CN202210711311.3A 2022-06-22 2022-06-22 Monocular lookout tower monitoring video target space positioning method of high-dimensional overdetermined equation Pending CN114898218A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210711311.3A CN114898218A (en) 2022-06-22 2022-06-22 Monocular lookout tower monitoring video target space positioning method of high-dimensional overdetermined equation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210711311.3A CN114898218A (en) 2022-06-22 2022-06-22 Monocular lookout tower monitoring video target space positioning method of high-dimensional overdetermined equation

Publications (1)

Publication Number Publication Date
CN114898218A (en) 2022-08-12

Family

ID=82728241

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210711311.3A Pending CN114898218A (en) 2022-06-22 2022-06-22 Monocular lookout tower monitoring video target space positioning method of high-dimensional overdetermined equation

Country Status (1)

Country Link
CN (1) CN114898218A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226838A (en) * 2013-04-10 2013-07-31 福州林景行信息技术有限公司 Real-time spatial positioning method for mobile monitoring target in geographical scene
CN106803270A (en) * 2017-01-13 2017-06-06 西北工业大学深圳研究院 Unmanned aerial vehicle platform is based on many key frames collaboration ground target localization method of monocular SLAM
WO2020155616A1 (en) * 2019-01-29 2020-08-06 浙江省北大信息技术高等研究院 Digital retina-based photographing device positioning method
CN111915678A (en) * 2020-07-17 2020-11-10 哈尔滨工程大学 Underwater monocular vision target depth positioning fusion estimation method based on depth learning
CN114332385A (en) * 2021-11-23 2022-04-12 南京国图信息产业有限公司 Monocular camera target detection and spatial positioning method based on three-dimensional virtual geographic scene

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116433756A (en) * 2023-06-15 2023-07-14 浪潮智慧科技有限公司 Surface object space analysis method, device and medium of monocular camera
CN116433756B (en) * 2023-06-15 2023-08-18 浪潮智慧科技有限公司 Surface object space analysis method, device and medium of monocular camera

Similar Documents

Publication Publication Date Title
CN105096382B (en) A kind of method and device that real-world object information is associated in video monitoring image
CN109523471B (en) Method, system and device for converting ground coordinates and wide-angle camera picture coordinates
CN109685855B (en) Camera calibration optimization method under road cloud monitoring platform
CN104463969B (en) A kind of method for building up of the model of geographical photo to aviation tilt
CN109520500A (en) One kind is based on the matched accurate positioning of terminal shooting image and streetscape library acquisition method
CN109191533B (en) Tower crane high-altitude construction method based on fabricated building
CN113390514B (en) Three-dimensional infrared temperature measurement method based on multi-sensor array
CN112419425B (en) Anti-disturbance high-precision camera group measuring method for structural deformation measurement
CN113902809A (en) Method for jointly calibrating infrared camera and laser radar
CN109297426B (en) Large-scale precision industrial equipment deformation and servo angle detection method
CN113077519A (en) Multi-phase external parameter automatic calibration method based on human skeleton extraction
CN114898218A (en) Monocular lookout tower monitoring video target space positioning method of high-dimensional overdetermined equation
CN108154535B (en) Camera calibration method based on collimator
US11703820B2 (en) Monitoring management and control system based on panoramic big data
CN106643735A (en) Indoor positioning method and device and mobile terminal
CN110991306A (en) Adaptive wide-field high-resolution intelligent sensing method and system
CN111240617B (en) Video delivery method and system based on three-dimensional map and environment monitoring method and system
CN116152471A (en) Factory safety production supervision method and system based on video stream and electronic equipment
CN112347974A (en) Human head posture estimation algorithm and operator working state recognition system
CN116755104A (en) Method and equipment for positioning object based on three points and two lines
CN116228888A (en) Conversion method and system for geographic coordinates and PTZ camera coordinates
CN108592789A (en) A kind of steel construction factory pre-assembly method based on BIM and machine vision technique
CN115326025A (en) Binocular image measuring and predicting method for sea waves
CN105635698B (en) A method of Optimal Supervisory Control position is generated according to monitoring objective position
CN113487677B (en) Outdoor medium-long distance scene calibration method based on multi-PTZ camera with random distributed configuration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination