CN112509056B - Dynamic battlefield environment real-time path planning system and method - Google Patents
- Publication number
- CN112509056B CN112509056B CN202011373402.8A CN202011373402A CN112509056B CN 112509056 B CN112509056 B CN 112509056B CN 202011373402 A CN202011373402 A CN 202011373402A CN 112509056 B CN112509056 B CN 112509056B
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T7/85 — Stereo camera calibration
- G01C21/005 — Navigation with correlation of navigation data from several sources, e.g. map or contour matching
- G01C21/20 — Instruments for performing navigational calculations
- G06N3/045 — Neural networks; combinations of networks
- G06T17/05 — Three-dimensional [3D] modelling; geographic models
- G06T5/70 — Image enhancement or restoration; denoising, smoothing
- G06T2207/10012 — Image acquisition modality: stereo images
- G06T2207/10032 — Image acquisition modality: satellite or aerial image; remote sensing
Abstract
The invention discloses a dynamic battlefield environment real-time path planning system and method. The system comprises a map reconstruction module, a visual perception module, an intelligent decision module and a UI interaction module. The UI interaction module realizes the mutual calling of parameters and information exchange among the different modules. The map reconstruction module takes images shot by an unmanned aerial vehicle as input and generates a real-time digital earth surface model. The visual perception module extracts the geographic entities present in the images shot by the unmanned aerial vehicle, generates a grid map from the real-time digital earth surface model, and sets barrier marks on the grid map. The intelligent decision module, combined with a DSAS algorithm, finds an optimal path from the starting point of the environment to the destination that bypasses the barriers, realizing real-time path planning. The invention can provide highly real-time command and control information for unmanned equipment in a dynamic battlefield environment, thereby improving the autonomy of the unmanned equipment.
Description
Technical Field
The invention relates to the field of path planning, in particular to a dynamic battlefield environment real-time path planning system and method.
Background
Information plays an increasingly important role in daily life and can significantly improve the efficiency and quality of work; used as the intelligent control system of unmanned equipment, it can greatly improve the flexibility of the system through a flexible unmanned flight platform. With the rapid development of electronics and computer technology in recent years, low-cost unmanned aerial vehicles have matured, high-speed image processing has gradually been applied to various tasks, and realizing efficient intelligent control through low-cost unmanned aerial vehicles has become a flexible and efficient means. However, traditional unmanned aerial vehicle map software systems have neither a real-time mapping function nor autonomous capability, and their processing speed is low, so a real-time path planning system with real-time mapping and high processing speed is required.
Disclosure of Invention
In order to solve the problems, the invention provides a dynamic battlefield environment real-time path planning system and a dynamic battlefield environment real-time path planning method, which can provide high-real-time command control information for unmanned equipment in a dynamic battlefield environment, so that the autonomy capability of the unmanned equipment is improved.
To achieve this purpose, the invention adopts the following technical scheme:
a dynamic battlefield environment real-time path planning system, comprising: the system comprises a map reconstruction module, a visual perception module, an intelligent decision module and a UI interaction module;
the UI interaction module realizes mutual calling and information connection of parameters among different modules;
the map reconstruction module takes an image shot by the unmanned aerial vehicle as input to generate a real-time digital earth surface model;
the visual perception module is used for extracting geographic entities existing in images shot by the unmanned aerial vehicle, generating a grid map from the real-time digital earth surface model, and setting barrier marks on the grid map;
the intelligent decision module, combined with a DSAS algorithm, searches for an optimal path that bypasses the barriers from the starting point of the environment to the destination, realizing real-time path planning.
Optionally, the map reconstructing module includes: the visual SLAM takes aerial images shot by an unmanned aerial vehicle on line as input, generates 3D point cloud for each frame of image in real time, generates local DSM in real time by combining the 3D point cloud with texture information of the image, and fuses the local DSM to the global DSM to generate a real-time digital earth surface model.
Optionally, the visual SLAM system includes feature extraction, tracking, mapping, and looping;
the feature extraction is used for extracting feature points in the aerial image and calculating a descriptor;
the tracking completes the feature matching by calculating the distance between the feature point descriptors of the current frame and the previous frame, and then recovers the pose change of the current frame relative to the previous frame through the feature matching;
the mapping searches for more feature matches between the current frame and previous frames, recovers more 3D points through triangulation, and performs local BA optimization on the current frame, the co-observed associated frames and the co-observed map points to acquire more accurate camera poses and 3D map point coordinates;
the looping detects whether the camera has returned to a previously visited place, and adds new constraints according to the detected loop to correct the corresponding camera poses and 3D map point coordinates.
Optionally, the image shot by the unmanned aerial vehicle is used as input to generate a real-time digital earth surface model; a target detection algorithm is used to extract the geographic entities present in the image; a pre-trained AlexNet convolutional neural network is used to extract the deep-level features of the geographic entities; dimension reduction is performed on the feature vector of each geographic entity to generate a grid map; barrier marks are set at the corresponding positions in the grid map according to the type of ground object; and, combined with the DSAS algorithm, an optimal path that bypasses the barriers is found from the starting point of the environment to the destination, realizing real-time path planning.
Optionally, generating the real-time digital earth surface model from the images shot by the unmanned aerial vehicle comprises: using visual SLAM to take the aerial images shot online by the unmanned aerial vehicle as input and generate a 3D point cloud for each frame in real time; removing noise points from the 3D point cloud by iterative filtering; performing Delaunay triangulation on the feature points of the denoised 3D point cloud to generate a 2D mesh; projecting the generated 2D mesh into the denoised 3D point cloud to generate a 3D mesh; fusing the texture information of the aerial image onto the 3D mesh to generate a local DSM; dividing the local DSM into tiles; and fusing the local DSM into the global DSM with a multiband algorithm to generate the real-time digital earth surface model.
Optionally, the extracting, by using the target detection algorithm, the geographic entity present in the image includes dividing the input image into S × S grids, each grid being responsible for detecting all objects falling into its own region, where each grid predicts 5 values, and predicts coordinates (x, y) of a center position of a prediction box of the obtained object, width and height (w, h) of the object, and confidence that the object belongs to a certain class, where if the prediction box contains a detection object, pr is 1, otherwise, pr is 0, and in the prediction stage, the probability of the class is a product of a class conditional probability and the predicted confidence:
optionally, the geographic entity comprises a relatively stationary geographic entity.
In the DSAS algorithm, besides coordinate information representing the spatial position of the unmanned aerial vehicle, each track node also contains pointers to its parent node and to all of its child nodes. When a new obstacle is detected in front of the current flight path, the DSAS algorithm deletes the track nodes covered by the threat area together with all of their descendant nodes, reinserts their parent nodes into the OPEN table, calculates the cost values of the newly expanded nodes, and continues the search process until a new optimal flight path for the unmanned aerial vehicle is obtained.
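The patent gives no pseudocode for DSAS, so the following is only a minimal sketch of the mechanism just described, on a 4-connected grid: a plain A* search builds the parent tree, and when a new barrier covers part of the current track, the threatened node and all its descendants along the path are discarded, the search resumes from the surviving parent, and a new optimal track is returned. All function names, the grid representation, and the use of A* as the underlying search are assumptions for illustration.

```python
import heapq

def astar(grid, start, goal):
    """Plain A* on a 4-connected grid; cell value 1 marks a barrier.
    Returns the node list from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    open_heap = [(0, start)]          # the OPEN table: (cost estimate, node)
    g = {start: 0}
    parent = {start: None}
    while open_heap:
        _, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = []
            while cur is not None:    # walk the parent pointers back
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        x, y = cur
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < rows and 0 <= ny < cols and grid[nx][ny] == 0:
                ng = g[cur] + 1
                if ng < g.get((nx, ny), float("inf")):
                    g[(nx, ny)] = ng
                    parent[(nx, ny)] = cur
                    h = abs(nx - goal[0]) + abs(ny - goal[1])  # Manhattan
                    heapq.heappush(open_heap, (ng + h, (nx, ny)))
    return None

def replan_on_obstacle(grid, path, new_obstacles, goal):
    """Mimic the DSAS idea: when a new barrier threatens the current track,
    drop the threatened node and all its descendants on the path, keep the
    surviving ancestors, and continue the search from their last node."""
    for x, y in new_obstacles:
        grid[x][y] = 1
    cut = next((i for i, p in enumerate(path) if grid[p[0]][p[1]] == 1), None)
    if cut is None:
        return path                       # old track is still valid
    prefix = path[:cut]                   # surviving ancestor nodes
    tail = astar(grid, prefix[-1], goal)  # resume search from the parent
    return prefix[:-1] + tail if tail else None
```

On an empty 5 × 5 grid a track from (0, 0) to (4, 4) has 9 nodes; blocking a node in its middle forces a 2-step detour while the prefix before the blocked node is reused unchanged.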
Compared with the prior art, the technical progress of the invention is as follows:
Compared with traditional computer vision methods, the invention has high processing capacity and real-time processing capability: more than 30 frames of images can be processed per second, the map data can reach more than ten million point-cloud points, and highly real-time command and control information can be provided for unmanned equipment in a dynamic battlefield environment, thereby improving the autonomy of the unmanned equipment.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
In the drawings:
FIG. 1 is a block diagram of the present invention.
Fig. 2 is a diagram of a descriptor of SIFT keypoints.
FIG. 3 is a schematic diagram of a matrix in Bundle Adjustment.
FIG. 4 is a flow chart of the present invention.
Fig. 5 is a specific network structure diagram of the YOLO model.
Fig. 6 is a specific network flow diagram of the YOLO model.
Detailed Description
Embodiments of the present invention will be described below with reference to the accompanying drawings. These specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
As shown in FIG. 1, the present invention discloses a dynamic battlefield environment real-time path planning system, which comprises a map reconstruction module, a visual perception module, an intelligent decision module and a UI interaction module;
each module of the invention is decoupled, and the parameters among different modules are mutually called and connected with information through the UI interaction module. The UI interface module is a part for interacting with a user and comprises several functions of building a project, loading the project, detecting a target and training a model, and in the newly built project, the user can determine the name of the project, the position of the project, a picture sequence, a label category and a used detection algorithm. A load item is an item previously created by a load user. And finishing the result of the constructed image by target detection. Training the model to retrain the model using the corrected labels. And right clicking the display path planning list in the interface, wherein the UI interface supports the user to modify the incorrect prediction result, and the corrected label data is used for automatically updating the parameter weight of the model again, so that the precision of the model is continuously improved in the use process, and each module of the system can be independently called and tested by virtue of the modular design, and new modules can be conveniently and continuously added.
The map reconstruction module takes an image shot by the unmanned aerial vehicle as input to generate a real-time digital earth surface model;
the visual perception module is used for extracting geographic entities existing in images shot by the unmanned aerial vehicle, generating a grid map from the real-time digital earth surface model, and setting barrier marks on the grid map;
the intelligent decision module, combined with a DSAS algorithm, searches for an optimal path that bypasses the barriers from the starting point of the environment to the destination, realizing real-time path planning.
Specifically, the map reconstruction module includes: the visual SLAM takes aerial images shot by an unmanned aerial vehicle on line as input, generates 3D point cloud for each frame of image in real time, generates local DSM in real time by combining the 3D point cloud with texture information of the image, and fuses the local DSM to the global DSM to generate a real-time digital earth surface model.
The visual SLAM system includes: feature extraction, tracking, mapping and looping;
the feature extraction is used for extracting feature points in the aerial image and calculating a descriptor;
the tracking completes feature matching by calculating the distance between the feature-point descriptors of the current frame and the previous frame, and then recovers the pose change of the current frame relative to the previous frame from the feature matches;
the mapping searches for more feature matches between the current frame and previous frames, recovers more 3D points through triangulation, and performs local BA optimization on the current frame, the co-observed associated frames and the co-observed map points to obtain more accurate camera poses and 3D map point coordinates;
the looping detects whether the camera has returned to a previously visited place, and adds new constraints according to the detected loop to correct the corresponding camera poses and 3D map point coordinates.
Specifically, the unmanned aerial vehicle carries a camera that acquires aerial images at 30 frames/s and transmits them to the ground station in real time. To recover the pose of the camera and the 3D structure of the environment from the aerial images, feature points are extracted from the images; these are the images of landmark points in the environment and, from the image's perspective, the representative parts of the image.
After the feature points are detected, feature-point matching judges whether two feature points in two frames are images of the same landmark point. Simply detecting the positions of the feature points is therefore not enough; other information must be extracted for matching between feature points. A feature point generally consists of two parts: the key point (Key-point), containing information such as position, size, direction and score; and the descriptor (Descriptor), the pixel information around the key point.
To preserve rotational invariance, the direction information of the key point is added when the feature-point descriptor is calculated. The detection of feature points is actually the detection of key points; common detection methods include Harris, FAST, SIFT and SURF key points. The FAST key-point algorithm is simple: if, among the 16 pixels on a circle of radius 3 centered at a pixel p, there are N consecutive pixels whose values are all greater or all less than the value of p by a certain threshold, then p is a key point, where N is usually 12 or 9. FAST key points have no direction; the ORB feature improves the FAST algorithm by constructing an image pyramid for scale invariance and by using the gray-scale centroid method to obtain angle information. In this embodiment, the Speeded-Up Robust Features (SURF) key-point detection method is used: since the construction of the DoG pyramid and the extremum search in SIFT key-point detection are time-consuming, SURF improves the extremum detection of SIFT and locates the extremum points using the local maxima of the Hessian-matrix determinant, greatly accelerating key-point detection.
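The FAST segment test just described can be sketched in a few lines of pure Python. This is an illustrative implementation of the criterion from the paragraph above (N consecutive circle pixels all brighter or all darker than the center by a threshold), not a tuned detector; the default threshold is an invented value.

```python
def fast_corner(img, x, y, thresh=20, n=9):
    """FAST segment test: pixel (x, y) is a key point if, on the 16-pixel
    circle of radius 3 around it, there are n consecutive pixels that are
    all brighter than img[y][x] + thresh or all darker than img[y][x] - thresh."""
    # offsets of the 16 circle pixels, clockwise from the top
    circle = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2),
              (1, 3), (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1),
              (-2, -2), (-1, -3)]
    p = img[y][x]
    # classify each circle pixel: +1 brighter, -1 darker, 0 similar
    signs = [1 if img[y + dy][x + dx] > p + thresh
             else -1 if img[y + dy][x + dx] < p - thresh
             else 0
             for dx, dy in circle]
    # look for n consecutive identical nonzero signs on the circular ring
    doubled = signs + signs
    run, prev = 0, 0
    for s in doubled:
        run = run + 1 if s != 0 and s == prev else (1 if s != 0 else 0)
        prev = s
        if run >= n:
            return True
    return False
```

A flat image yields no key point, while a dark center pixel surrounded by a bright half-plane passes the test.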
The descriptor in this embodiment uses the SIFT feature descriptor, which is computed centered on the feature point. As shown in fig. 2, in the surrounding neighborhood the coordinate axes are rotated by the angle of the feature point's principal direction; an 8 × 8 window is taken centered on the key point, and every 4 × 4 pixels form a block; the gradient magnitude and gradient direction of each pixel are obtained (the arrow direction represents the gradient direction of the pixel and its length the gradient magnitude); a Gaussian window is then used to weight the gradient magnitudes; finally, a gradient histogram over 8 directions is drawn for each 4 × 4 block and the accumulated value in each gradient direction is calculated, forming vector information in 8 directions, so that the final descriptor is a 4 × 4 × 8 = 128-dimensional floating-point vector.
SIFT features are invariant to scale, rotation, illumination and so on, and are extremely stable feature points, but their calculation is much slower than SURF; using SIFT features alone cannot guarantee the real-time performance of the system, yet with GPU acceleration SIFT can fully reach real-time speed.
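The basic unit of the SIFT descriptor described above, an 8-direction gradient histogram for one pixel block, can be sketched as follows (4 × 4 such blocks of 8 bins give the 128-D vector); the Gaussian weighting is omitted for brevity, and the function name is invented for illustration.

```python
import math

def orientation_histogram(mag, ang, bins=8):
    """Accumulate per-pixel gradient magnitudes into direction bins.
    mag/ang: per-pixel gradient magnitude and angle (radians) of one block."""
    hist = [0.0] * bins
    width = 2 * math.pi / bins            # angular width of each bin
    for m, a in zip(mag, ang):
        b = int((a % (2 * math.pi)) // width) % bins
        hist[b] += m                      # magnitude-weighted vote
    return hist
```

For example, gradients at angles 0, π/2 and π land in bins 0, 2 and 4 of an 8-bin histogram.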
Feature matching matches the feature points of the same landmark on different images. Matches are obtained by calculating the distances between descriptors: for a feature point on the first frame image, the matching feature point on the second frame image is found. SIFT feature-point descriptors are 128-dimensional floating-point vectors, and the distance between two descriptors is their vector distance. When the number of feature points is large, calculating the distances one by one is slow and real-time matching cannot be achieved. To reach real-time matching speed, the camera motion can be predicted with a motion model to obtain an initial camera pose change, and an epipolar search based on this initial pose change accelerates the matching of feature points. Bag-of-words (BoW) or fast nearest-neighbour (FLANN) algorithms may also be employed to speed up matching. Map initialization: when no map point cloud is known yet, the map point cloud is generated from the first two frames. The fundamental matrix and the homography matrix can be calculated from the matched feature points between the two frames, and the pose transformation between them is obtained by decomposing these matrices. Because of noise and mismatches, the RANSAC (random sample consensus) algorithm is adopted when calculating the fundamental and homography matrices; both are computed simultaneously, and the matrix with the smaller projection error is selected to decompose the pose. To guarantee the quality of the initial map, a certain distance and a certain rotation angle are required between the two frames.
If the number of matched inliers between the two frames is too small, initialization is judged to have failed, and two new frames are selected for initialization. The distance between the two frames is calculated by converting their GPS coordinates to the Earth-Centered Earth-Fixed (ECEF) rectangular coordinate system and is taken as the initial scale of the map.
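The descriptor-distance matching described above can be sketched as a brute-force nearest-neighbour search with Lowe's ratio test, which rejects ambiguous matches. This is a toy stand-in: the descriptors here are short vectors instead of 128-D SIFT descriptors, and the ratio value is an invented default.

```python
import math

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b,
    keeping a match only if it is clearly better than the second-best one.
    desc_a, desc_b: lists of equal-length float vectors."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    matches = []
    for i, da in enumerate(desc_a):
        ranked = sorted(range(len(desc_b)), key=lambda j: dist(da, desc_b[j]))
        best, second = ranked[0], ranked[1]
        # Lowe's ratio test: best distance must beat ratio * second-best
        if dist(da, desc_b[best]) < ratio * dist(da, desc_b[second]):
            matches.append((i, best))
    return matches
```

In practice a k-d tree or the FLANN library replaces the brute-force loop, exactly as the paragraph above notes.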
Map initialization recovers 3D map points using 2D-2D epipolar geometry; once map points exist, the camera pose and new 3D map points can be recovered by solving the 3D-2D PnP problem. Although algorithms such as PnP can solve the camera pose and triangulation can acquire the depth of the feature points, the currently solved pose and depth are only rough values because of noise. To obtain more accurate pose and depth values, the current estimates can be used as initial values and refined further with a nonlinear optimization method.
The mathematical expression for imaging with a pinhole camera is:

s_i p_i = K T P_i

where P_i = [X_i, Y_i, Z_i]^T is the three-dimensional coordinate of the i-th map point, p_i = [u_i, v_i]^T is the pixel coordinate of the projection of the i-th map point on the current frame, s_i is the depth value corresponding to P_i, K is the camera intrinsic matrix of the current frame, and T is the pose of the current frame. T can be written in Lie-algebra form as T = exp(ξ^).

In practice, because of noise, an error e_i exists between the true projected pixel coordinate u_i of P_i and the theoretical value; the true pixel coordinates with noise are recorded as the observed values. When the error satisfies a Gaussian distribution, a least-squares problem can be constructed:

ξ* = argmin_ξ (1/2) Σ_i || u_i − (1/s_i) K exp(ξ^) P_i ||²

Let the derivative of e_i with respect to ξ be the Jacobian J(ξ). The key step of gradient-descent-type methods in nonlinear optimization is the derivation of J(ξ). First, convert the world coordinates into the camera coordinate system:

C_i = exp(ξ^) P_i = T P_i = [X'_i, Y'_i, Z'_i]^T

The camera model may then be represented as:

u' = f_x X'_i / Z'_i + c_x,   v' = f_y Y'_i / Z'_i + c_y

where [u', v']^T is the predicted value of the pixel coordinate. Eliminating s_i with s_i = Z'_i gives the error term e_i = u_i − [u', v']^T. Taking C_i as an intermediate variable and applying the chain rule of derivation:

∂e_i/∂ξ = −(∂[u', v']^T/∂C_i)(∂C_i/∂ξ)

where:

∂[u', v']^T/∂C_i = [ f_x/Z'_i, 0, −f_x X'_i/Z'_i² ; 0, f_y/Z'_i, −f_y Y'_i/Z'_i² ]

The partial derivative of C_i with respect to ξ is equivalent to a derivative with respect to the Lie algebra and can be obtained by constructing a perturbation model: left-multiply by a perturbation δξ and take the derivative of the increment of T P_i with respect to δξ:

∂(T P_i)/∂δξ = [ I, −C_i^ ]

Rearranging, the partial derivative of the error term with respect to ξ is:

J(ξ) = ∂e_i/∂ξ = −[ f_x/Z'_i, 0, −f_x X'_i/Z'_i² ; 0, f_y/Z'_i, −f_y Y'_i/Z'_i² ] [ I, −C_i^ ]

The negative sign in front of the Jacobian matrix arises because the error in this derivation is defined as the observed value minus the predicted value; if the error were defined as predicted minus observed, the negative sign could be omitted. Performing Cholesky decomposition on Σ^{-1}:

Σ^{-1} = L L^T

and substituting e_i^T Σ^{-1} e_i = (L^T e_i)^T (L^T e_i), the least-squares problem can then be written as:

ξ* = argmin_ξ (1/2) Σ_i || L^T e_i ||²

For iterative optimization, the Gauss-Newton algorithm is taken as an example here for simplicity. The idea of Gauss-Newton is to perform a first-order Taylor expansion of the error term:

e(ξ + Δξ) ≈ e(ξ) + J(ξ) Δξ

where J(ξ) is the derivative (Jacobian) of e(ξ) with respect to ξ. Expanding the squared two-norm:

(1/2) || e + J Δξ ||² = (1/2)( e^T e + 2 e^T J Δξ + Δξ^T J^T J Δξ )

Setting the derivative of the above with respect to Δξ equal to zero yields the incremental equation:

J^T J Δξ = −J^T e

The increment Δξ can be solved from the incremental equation, realizing iterative optimization. When the Gauss-Newton or Levenberg-Marquardt method is used with the weighted error above, the error term becomes L^T (u_i − (1/s_i) K exp(ξ^) P_i) and the Jacobian becomes L^T J(ξ).
A key frame is a frame selected from the local ordinary frames as a representative of them. When the camera is stationary, it keeps capturing nearly the same image; needless to say, if all images were taken as objects of back-end optimization, unnecessary computation would be added. Therefore, the primary role of key frames is a compromise between back-end computational cost and accuracy. Key-frame extraction generally considers the following aspects:
whether enough frames have passed in time since the previous key frame;
whether the distance in space from the previous key frame is sufficiently far;
whether the number of points co-observed with the last key frame is within a proper range: too many co-observations indicate a high overlap rate, while too few will reduce the tracking accuracy.
In this embodiment, besides the 3 points above, special requirements are made on the rotation of the aerial images, mainly because when the yaw angle of the unmanned aerial vehicle (camera) changes (rotation around the z axis), the captured aerial image changes very little. Therefore, a key frame is added when the pitch or roll angle of the unmanned aerial vehicle (camera) exceeds a certain angle, but not when only the yaw angle changes.
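The key-frame criteria above, including the special yaw rule, can be combined into a single decision function. All threshold values below are invented defaults for illustration; the patent does not specify them.

```python
def need_keyframe(frames_since_kf, dist_to_kf, covisible_ratio,
                  d_pitch, d_roll, d_yaw,
                  min_frames=20, min_dist=5.0, min_covis=0.3, max_angle=15.0):
    """Return True when a new key frame should be inserted, combining the
    time, distance, co-visibility and rotation criteria described above."""
    if frames_since_kf >= min_frames:     # enough frames since the last KF
        return True
    if dist_to_kf >= min_dist:            # moved far enough in space
        return True
    if covisible_ratio < min_covis:       # overlap with last KF too low
        return True
    # rotation rule: pitch/roll beyond the limit forces a new key frame,
    # but a pure yaw change (d_yaw) deliberately does not
    if abs(d_pitch) > max_angle or abs(d_roll) > max_angle:
        return True
    return False
```

Note that d_yaw is accepted but never tested, encoding the observation that a yaw-only rotation barely changes the aerial image.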
When the current frame is tracked against the previous key frame, tracking loss can occur because few feature points are detected, few feature points are matched, or the calculated pose deviation is too large. Relocalization after a loss is crucial for improving the robustness of the whole system. In this system, when GPS is available, the GPS is used to find several frames close to the current frame as candidate frames; a key frame with good feature matching is then found among the candidates, and PnP is solved again to resume tracking and calculate the pose. When GPS is unavailable, a bag of words (BoW) is used to find the candidate frames, and likewise a key frame with good feature matching is selected from them for tracking.
The bag of words was originally used for text classification, representing a text as a feature vector. It was later adopted in computer vision for image classification, image retrieval, and similar tasks. Taking SIFT features as an example: a large number of SIFT features is first extracted from the images; the features are then clustered layer by layer (e.g. with k-means) according to the distances between their 128-dimensional floating-point descriptors; and the classes at the bottom layer become the words of a dictionary. Counting the frequency of each dictionary word in an image yields a vector, completing the conversion from image to vector. Comparing the vector distance between two images approximates their similarity, and the few frames with the smallest vector distance to the current frame become relocalization candidates.
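As a rough illustration of the bag-of-words idea, the sketch below builds a tiny flat k-means vocabulary and compares L1 distances between word histograms. A real system such as the one described here uses a hierarchical k-means tree over 128-D SIFT descriptors; all names and sizes below are illustrative.

```python
import numpy as np

def build_vocab(descriptors, k=8, iters=10, seed=0):
    """Tiny flat k-means vocabulary over descriptor vectors."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.argmin(((descriptors[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = descriptors[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def bow_vector(descriptors, centers):
    """Word-frequency histogram of one image, L1-normalised."""
    words = np.argmin(((descriptors[:, None] - centers) ** 2).sum(-1), axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()

def bow_distance(v1, v2):
    """L1 distance between two bag-of-words vectors (smaller = more similar)."""
    return float(np.abs(v1 - v2).sum())
```

Relocalization candidates would then be the key frames whose `bow_vector` is closest to that of the current frame.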
During map initialization, the pose is obtained from the epipolar constraint between two frames, and an initial map point cloud is then generated by triangulation. During tracking, the pose of each new (current) frame is obtained by repeatedly solving and optimizing PnP from matches between the current frame's feature points and the map point cloud. However, as the camera moves away from the region where the map point cloud was initialized, fewer and fewer feature points of the new frame match the map point cloud, until PnP can no longer be solved and tracking is lost. To ensure the continuity of the system, new map points are therefore continuously added after each current frame is processed. Because the conditions for creating map points and adding key frames are deliberately relaxed, a very strict screening mechanism is used afterwards to remove redundant key frames and map points that are hard to track.
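The triangulation used to create map points can be illustrated with a standard linear (DLT) two-view triangulation. This is a generic sketch under the usual pinhole model, not the patent's exact implementation; `P1` and `P2` are the 3×4 projection matrices of the two frames.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """DLT triangulation of one map point from two projection matrices
    and the matched pixel observations x1, x2 (normalised coordinates)."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)        # null vector of A is the point
    X = Vt[-1]
    return X[:3] / X[3]                # homogeneous -> Euclidean map point
```

With noisy matches the SVD gives the least-squares point; the result would then be refined inside the projection-error optimization described below.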
During tracking, one part of the current frame's feature points is matched against the feature points of the previous key frame, and PnP is solved using the map points corresponding to those matched features. The feature points that match the previous key frame thus already have corresponding map points, and new map points are generated from the remaining feature points that have no matching map point. Those points are matched against feature points in key frames that share a co-visibility relationship with the current frame (excluding the previous key frame already used for tracking). The epipolar line of each feature point on the matched key frame is computed from the epipolar constraint equation, and the match is judged correct only if the matched feature lies near that line; if it lies farther than a certain threshold from the epipolar line, the match is treated as a mismatch and re-matching is attempted in other key frames. Because the current frame must be matched against the features of several co-visible key frames, the computation is large and matching is slow. To speed it up, the bag of words introduced above is used: matching is attempted only between features carrying the same word label in the co-visible key frames. Once the matched feature points are obtained, new map points are computed by triangulation.
Because the conditions for adding map points and key frames are loose, redundant key frames and low-confidence map points are deleted to reduce the optimization workload. Deletion condition for a key frame: 90% of the map points it observes are simultaneously observed by more than three other key frames. Deletion condition for a map point: fewer than three key frames observe it simultaneously.
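The two deletion rules can be written directly as predicates. This is a sketch; the data structures (a list of point ids per key frame and an observation count per point) are hypothetical stand-ins for the real map bookkeeping.

```python
def redundant_keyframe(kf_map_points, observers_per_point,
                       min_others=3, ratio=0.9):
    """Cull a key frame when >= 90% of its map points are also seen by
    more than `min_others` other key frames (thresholds from the text)."""
    covered = sum(1 for p in kf_map_points
                  if observers_per_point[p] - 1 >= min_others)
    return covered >= ratio * len(kf_map_points)

def weak_map_point(num_observing_keyframes, min_obs=3):
    """Cull a map point observed by fewer than three key frames."""
    return num_observing_keyframes < min_obs
```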
So far, the pose of the current frame and the coordinates of new map points have only been estimated under one-to-one constraints: the pose of the current frame is obtained by projecting map points into it, solving PnP, and optimizing the projection error, while each new map point is obtained by triangulating features matched between the current frame and a co-visible frame. In fact, once the map is updated, any map point may be observed jointly by multiple frames. The projection errors of all map points onto every key frame that observes them can therefore be optimized jointly. The process is similar to BA: compared with equations 2-100, it constructs a least-squares projection problem spanning multiple frames:
where χij indicates whether the i-th map point is visible in the j-th camera: χij = 1 if it is observed, otherwise 0. The variable to be optimized is θ = (ξ1, …, ξM, P1, …, PN), with θ ∈ R^(6M+3N).
The incremental equation of the Levenberg-Marquardt optimization method is:

[J(θ)ᵀ J(θ) + λI] δθ = -J(θ)ᵀ (u - f(θ))

where u is the stacked observation vector and f(θ) the corresponding prediction.
The Jacobian matrix J(θ) is sparse. Fig. 3(a) shows the Jacobian when only 4 map points are observed simultaneously by 3 key frames. It can be decomposed into two parts:
J(θ) = [J_C, J_P]
Each small block of J_C is the derivative of one projection error with respect to one key-frame pose, a 2 × 6 matrix given by equation 2-98. Each small block of J_P is the derivative of one projection error with respect to one map point, a 2 × 3 matrix obtained by the chain rule and substitution of the formula above:
where the derivative of the camera-frame coordinate with respect to Pi is the rotation R. In addition, each small block and the corresponding reprojection error are weighted by the factor L obtained from the Cholesky factorization of the information matrix Σ in the equation above. From the Jacobian J(θ) the matrix [J(θ)ᵀJ(θ) + λI] can be formed, shown in Fig. 3(b). It is a symmetric sparse matrix that, following the block structure of the Jacobian, can be partitioned as:
Thus the optimization equation becomes:
Multiplying both sides of the formula by the Schur-complement elimination matrix achieves marginalization:
The first equation of the resulting system is linear and can be solved directly for all camera pose increments δC; substituting δC into the second equation yields all map point increments δP. By computing these optimization-variable increments at each step, accurate camera poses and map point coordinates are estimated iteratively. To bound the computational cost of the mapping part, only local optimization is performed: only the poses of the current frame and of the key frames sharing a co-visibility relationship with it, together with the map points they observe, are optimized. In addition, GPS information is fused into the local optimization as a constraint term to obtain a more accurate camera pose: the cost function is formed by adding the GPS constraint to the optimization above, and minimizing it yields more accurate camera poses and point cloud coordinates.
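The marginalization step can be illustrated on a dense toy system: eliminate the point block via the Schur complement, solve for the camera increments δC, then back-substitute for the point increments δP. In a real BA the point block H_pp is block-diagonal and is inverted block by block; this dense version is only a sketch.

```python
import numpy as np

def schur_solve(H_cc, H_cp, H_pp, b_c, b_p):
    """Solve [[Hcc, Hcp], [Hcp^T, Hpp]] [dC; dP] = -[bc; bp] by
    marginalising the point block (dense toy version)."""
    H_pp_inv = np.linalg.inv(H_pp)            # block-diagonal in real BA
    S = H_cc - H_cp @ H_pp_inv @ H_cp.T       # Schur complement
    rhs = -b_c + H_cp @ H_pp_inv @ b_p
    dC = np.linalg.solve(S, rhs)              # camera increments first
    dP = H_pp_inv @ (-b_p - H_cp.T @ dC)      # back-substitute for points
    return dC, dP
```

The point of the decomposition is that S is only as large as the camera block, so the expensive solve never touches the (much larger) point dimension.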
2.5 Loop detection and correction. After a loop is detected and corrected, and additionally every certain number of frames, an independent thread optimizes all map points and key frames (global optimization).
During tracking, camera pose estimation is a step-by-step recursive process: the pose of the current frame is solved by tracking the previous key frame. In fact every per-frame pose solution carries errors, and these errors accumulate as tracking proceeds. An effective way to eliminate the accumulated error is loop detection. Loop detection determines whether the camera has returned to a previously visited position; if a loop is detected, the constraint between the current frame and the loop frame is added to the back-end optimization to correct the error.
The most direct way to detect a loop is to match the current frame against all previous key frames and take the best-matching key frame as the loop frame. But as key frames keep accumulating, this matching becomes slower and more expensive. To reduce the computation, when GPS is available it is used to find several key frames close to the current frame as candidates; the current frame is matched against those candidates and the best match is taken as the loop frame. When GPS is unavailable, the bag-of-words approach is used to speed up matching: the words corresponding to the descriptors of the image's feature points are looked up in the dictionary tree, the word counts form a bag-of-words vector, and the similarity between two pictures is estimated by the distance between their vectors. The distances between the current frame's bag-of-words vector and those of all key frames are computed, the frames with the smallest distances are taken as candidates, and the best-matching candidate becomes the loop frame.
Today most drones provide fairly accurate GPS information, so detecting loops with GPS works well. With the bag-of-words method, however, an environment containing similar-looking scenes can easily yield a false loop whose correction is destructive to the optimization. A stricter selection strategy for loop candidate frames is therefore designed: 1) loop detection runs only every 10 frames, to prevent redundant computation; 2) the bag-of-words distance between a candidate and the current frame must be smaller than Dm, the maximum bag-of-words distance between the current frame and all of its connected key frames; 3) a loop is accepted only if candidates are detected in 3 consecutive detections; the 3rd frame is then matched against the loop candidates and the best-matching key frame becomes the loop frame. The matching constraints between the current frame and the loop frame are brought into the global optimization to correct the accumulated error.
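The three-rule candidate gate might be sketched as follows. The streak bookkeeping and the return convention are illustrative assumptions, not the patent's exact control flow.

```python
def accept_loop_candidate(frame_id, cand_dist, covis_dists, streak,
                          interval=10, needed_streak=3):
    """Apply the three rules from the text: check only every `interval`
    frames, require the candidate's BoW distance to beat Dm (the max
    distance to all co-visible key frames), and demand `needed_streak`
    consecutive detections. Returns (accepted, updated_streak)."""
    if frame_id % interval != 0:
        return False, 0
    d_max = max(covis_dists)          # Dm from rule 2
    if cand_dist >= d_max:
        return False, 0               # candidate not convincing; reset
    streak += 1
    return streak >= needed_streak, streak
```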
The visual perception module and the intelligent decision module work together: a target detection algorithm extracts the geographic entities present in the images; a pre-trained AlexNet convolutional neural network extracts their depth-level features; the feature vector of each geographic entity is reduced in dimension; a grid map is generated from the real-time digital surface model, with obstacle marks set at the positions corresponding to each ground-object type; and, combined with the DSAS algorithm, an optimal path that bypasses the obstacles is searched from the start location of the environment to the end point, achieving real-time path planning.
The invention also provides a dynamic battlefield environment real-time path planning method, which comprises the following steps:
As shown in fig. 4, images shot by the unmanned aerial vehicle are taken as input to generate a real-time digital surface model; a target detection algorithm extracts the geographic entities present in the images; a pre-trained AlexNet convolutional neural network extracts their depth-level features; the feature vector of each geographic entity is reduced in dimension and a grid map is generated; obstacle marks are set at the corresponding positions in the grid map according to the ground-object types; and, combined with the DSAS algorithm, an optimal path that bypasses the obstacles is searched from the start location of the environment to the end point, achieving real-time path planning.
Specifically, after the real-time digital surface model is generated, a target detection algorithm extracts the geographic entities present in the images. For computational efficiency, this embodiment adopts a YOLO-based entity detector but restricts the range of detected entities: only relatively static geographic entities such as houses and bridges are detected, while volatile entities such as pedestrians and vehicles are not. After the geographic entities are extracted, a pre-trained AlexNet convolutional neural network extracts their depth-level features. Because the dimensionality of each entity's feature vector is too high and strongly affects computational efficiency, the vectors are reduced in dimension by a Gaussian random mapping method, and obstacle marks are set at the corresponding positions in the grid map according to the ground-object types, providing the data for subsequent path planning.
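The Gaussian random mapping used for dimensionality reduction can be sketched in a few lines. Output dimension and seed below are illustrative; by the Johnson-Lindenstrauss lemma, pairwise distances between the projected feature vectors are approximately preserved.

```python
import numpy as np

def gaussian_random_projection(features, out_dim, seed=0):
    """Project high-dimensional entity features (e.g. AlexNet outputs)
    to `out_dim` dimensions with a Gaussian random matrix."""
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(out_dim), size=(d, out_dim))
    return features @ R
```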
After the grid map for path planning is obtained, a Dynamic Sparse A* Search (DSAS) algorithm is adopted to search for an optimal path that bypasses the obstacles from the start of the environment to the end point, realizing real-time path planning.
In the DSAS algorithm, each track node contains, besides the coordinates describing its spatial position, pointers to its parent node and to all of its child nodes. When a new obstacle is detected ahead of the current track, DSAS deletes every track node inside the threat coverage area together with all of its descendant nodes, re-inserts the parent node into the OPEN list, then computes the cost of the newly expanded nodes as described above and continues the search until a new optimal track for the aircraft is obtained.
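The node structure and the pruning step described above might look like this sketch. Names are illustrative, and the cost computation and OPEN-list ordering of the full search are omitted.

```python
class PathNode:
    """DSAS track node: spatial coordinate plus parent/children pointers."""
    def __init__(self, coord, parent=None):
        self.coord, self.parent, self.children = coord, parent, []
        if parent:
            parent.children.append(self)

def prune_threatened(node, open_list):
    """On detecting a new obstacle covering `node`, delete the node and
    all of its descendants, then re-insert its parent into the OPEN list
    so the search can resume from there."""
    stack = [node]
    while stack:                      # detach the whole subtree
        n = stack.pop()
        stack.extend(n.children)
        n.children = []
    if node.parent:
        node.parent.children.remove(node)
        if node.parent not in open_list:
            open_list.append(node.parent)
    return open_list
```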
Using visual SLAM, the aerial images shot by the unmanned aerial vehicle are taken as input and a 3D point cloud is generated in real time for each frame of aerial image. Noise points are removed from the 3D point cloud by iterative filtering; Delaunay triangulation is performed on the feature points of the filtered cloud to generate a 2D mesh; the generated 2D mesh is projected into the filtered 3D point cloud to generate a 3D mesh; the texture information of the aerial image is fused into the 3D mesh to generate a local DSM; the local DSM is segmented into tiles; and the tiles are fused into the global DSM by a multi-band algorithm to generate the real-time digital surface model.
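As a much-simplified stand-in for the mesh-and-texture fusion pipeline, the sketch below rasterizes a filtered point cloud into a max-height DSM grid. This is an illustration of the DSM concept only, not the patent's Delaunay/multi-band method.

```python
import numpy as np

def rasterize_dsm(points, cell=1.0):
    """Rasterise an Nx3 point cloud into a grid that keeps the maximum
    height observed in each ground cell (a toy DSM)."""
    xy = np.floor(points[:, :2] / cell).astype(int)
    xy -= xy.min(axis=0)                     # shift grid to origin
    w, h = xy.max(axis=0) + 1
    dsm = np.full((w, h), -np.inf)           # -inf marks empty cells
    for (i, j), z in zip(xy, points[:, 2]):
        dsm[i, j] = max(dsm[i, j], z)
    return dsm
```

A grid map for planning can then be derived by thresholding cell heights or, as in the text, by marking cells that contain detected obstacle entities.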
The geographic entities present in the images are extracted using a target detection algorithm; in this embodiment the algorithm is You Only Look Once (YOLO). Unlike two-stage algorithms, YOLO removes the candidate-box generation part (the region proposal stage): the whole pipeline extracts features directly, and box regression and classification are completed inside a single branch-free model, making detection roughly 10 times faster than two-stage algorithms.
Specifically, as shown in fig. 5, the input picture is divided into S × S grid cells, and each cell is responsible for detecting the objects that fall into its own region. Each cell predicts 5 values per box: the center coordinates (x, y) of the predicted box, its width and height (w, h), and a confidence. The confidence reflects both the certainty that the region contains an object and the accuracy of the prediction box: if the prediction box contains a detection object, Pr(object) is 1, otherwise Pr(object) is 0. In the prediction phase, the score for a class is the product of the class conditional probability and the prediction confidence:
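The grid-cell assignment and the prediction-stage class score can be written out directly (cell size 448/7 as in the text; function names are illustrative):

```python
def yolo_cell(x, y, img_size=448, S=7):
    """Grid cell responsible for an object centred at pixel (x, y)."""
    return int(x * S / img_size), int(y * S / img_size)

def class_scores(cond_class_probs, confidence):
    """Class-specific score = Pr(class | object) * box confidence,
    where the confidence encodes Pr(object) * IoU."""
    return [p * confidence for p in cond_class_probs]
```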
the detailed flow chart of the network is shown in fig. 6, and the network includes 24 convolutional layers, 2 fully-connected layers, and 5 pooling layers. The output of the network is a 7 x 30 dimensional vector: 7 × 7 represents a grid that divides the input into 7 × 7; 30 includes object type and width, position, and confidence information of each of the two prediction frames, and the size of the input image is not changeable because of the inclusion of the full link layer, and must be 448 × 448.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.
Claims (5)
1. A dynamic battlefield environment real-time path planning system, comprising: the map reconstruction module, the visual perception module, the intelligent decision module and the UI interaction module;
the UI interaction module realizes mutual calling and information connection of parameters among different modules;
the map reconstruction module includes: the visual SLAM takes aerial images shot by an unmanned aerial vehicle on line as input, generates 3D point cloud for each frame of image in real time, generates local DSMs in real time by combining the 3D point cloud with texture information of the image, and fuses the local DSMs to a global DSM to generate a real-time digital earth surface model;
the visual perception module is used for extracting geographic entities existing in images shot by the unmanned aerial vehicle, generating a grid map from the real-time digital earth surface model, and setting barrier marks on the grid map;
the intelligent decision module is combined with a DSAS algorithm to realize that an optimal path capable of bypassing the barrier is searched from the beginning of the environment to reach the end point, so that real-time path planning is realized;
the visual SLAM system comprises feature extraction, tracking, mapping, and loop closing;
the feature extraction is used for extracting feature points in the aerial image and calculating a descriptor;
the tracking completes feature matching by calculating the distance between the feature-point descriptors of the current frame and the previous frame, and recovers the pose change of the current frame relative to the previous frame from the matches, wherein obtaining the pose of the previous frame comprises:
the mathematical expression for the imaging of a pinhole camera is:
wherein Pi = [Xi, Yi, Zi] is the three-dimensional coordinate of the i-th map point, pi = [ui, vi] is the pixel coordinate of the projection of the i-th map point on the current frame, si is the depth value corresponding to Pi, K is the camera intrinsic matrix of the current frame, and T is the pose of the current frame, where T can be written in Lie-algebra form:
in practice, due to the existence of noise, an error ei exists between the real projected pixel coordinate ui of Pi and the theoretical value pi; the real, noisy pixel coordinate is recorded as the observed value, and when the error satisfies a Gaussian distribution:
a least squares problem can be constructed, namely:
noting the derivative of ei with respect to ξ as Jacobian J (ξ), the world coordinate system is first down-converted to camera coordinate system as:
C i =exp(ξ^)P i =TP i =[X i ′,Y i ′,Z′ i ] T
the camera model may be represented as:
here, [u′, v′]ᵀ is the predicted value of the pixel coordinate, obtained by eliminating si via si = Z′:
taking Ci as an intermediate variable in the error-term formula, the chain rule gives:
wherein:
the partial derivative of Ci with respect to ξ amounts to differentiating with respect to a Lie-algebra element; a perturbation model completes the derivation: the perturbation δξ is left-multiplied onto ξ, and the derivative of the increment of TPi with respect to the perturbation term δξ is computed:
the partial derivative of the rearrangement error term with respect to ξ can be found as:
the negative sign in front of the Jacobian matrix arises because, in the derivation, the error is defined as the observed value minus the predicted value; performing a Cholesky factorization of Σ⁻¹ in the formula, namely:
Σ⁻¹ = LᵀL
substituting into the above formula gives:
the formula for the least squares method can then be written as:
when performing iterative optimization, taking the Gauss-Newton algorithm as an example for simplicity, a first-order Taylor expansion of the error term gives:
where J (ξ) is the derivative of e (ξ) with respect to ξ (Jacobian), which can be:
expanding the squared two-norm gives:
setting the derivative of the above equation with respect to δξ to zero gives:
an incremental equation can be obtained:
the increment delta xi can be solved by an increment equation, so that iterative optimization is realized;
more feature matches are searched between the current frame and the preceding frames, more 3D points are recovered by triangulation, and local BA optimization is performed on the current frame, its co-observing associated frames, and the commonly observed map points to obtain more accurate camera poses and 3D map point coordinates;
the loop closing detects whether the camera has returned to a previously visited place, and adds a new constraint according to the detected loop to correct the corresponding camera poses and 3D map point coordinates.
2. A dynamic battlefield environment real-time path planning method is characterized in that: the method comprises the following steps:
images shot by an unmanned aerial vehicle are taken as input to generate a real-time digital surface model; a target detection algorithm extracts the geographic entities present in the images; a pre-trained AlexNet convolutional neural network extracts their depth-level features; the feature vector of each geographic entity is reduced in dimension and a grid map is generated; obstacle marks are set at the corresponding positions in the grid map according to the ground-object types; and, combined with the DSAS algorithm, an optimal path that bypasses the obstacles is searched from the start location of the environment to the end point, realizing real-time path planning;
the real-time digital surface model is generated by taking the images shot by the unmanned aerial vehicle as input to a visual SLAM, wherein the visual SLAM comprises feature extraction, tracking, mapping, and loop closing;
the feature extraction is used for extracting feature points in the aerial image and calculating a descriptor;
the tracking completes feature matching by calculating the distance between the feature-point descriptors of the current frame and the previous frame, and recovers the pose change of the current frame relative to the previous frame from the matches, wherein obtaining the pose of the previous frame comprises:
the mathematical expression for the imaging of a pinhole camera is:
wherein Pi = [Xi, Yi, Zi] is the three-dimensional coordinate of the i-th map point, pi = [ui, vi] is the pixel coordinate of the projection of the i-th map point on the current frame, si is the depth value corresponding to Pi, K is the camera intrinsic matrix of the current frame, and T is the pose of the current frame, where T can be written in Lie-algebra form:
in practice, due to the existence of noise, an error ei exists between the real projected pixel coordinate ui of Pi and the theoretical value pi; the real, noisy pixel coordinate is recorded as the observed value, and when the error satisfies a Gaussian distribution:
a least squares problem can be constructed, namely:
noting the derivative of ei with respect to ξ as Jacobian J (ξ), the world coordinate system is first down-converted to camera coordinate system as:
C i =exp(ξ^)P i =TP i =[X i ′,Y i ′,Z i ′] T
the camera model may be represented as:
here, [u′, v′]ᵀ is the predicted value of the pixel coordinate, obtained by eliminating si via si = Z′:
taking Ci as an intermediate variable in the error-term formula, the chain rule gives:
wherein:
the partial derivative of Ci with respect to ξ amounts to differentiating with respect to a Lie-algebra element; a perturbation model completes the derivation: the perturbation δξ is left-multiplied onto ξ, and the derivative of the increment of TPi with respect to the perturbation term δξ is computed:
the partial derivative of the rearrangement error term with respect to xi can be derived as:
the negative sign in front of the Jacobian matrix arises because, in the derivation, the error is defined as the observed value minus the predicted value; performing a Cholesky factorization of Σ⁻¹ in the formula, namely:
Σ⁻¹ = LᵀL
bringing the above formula into available:
the formula for the least squares method can then be written as:
when performing iterative optimization, taking the Gauss-Newton algorithm as an example for simplicity, a first-order Taylor expansion of the error term gives:
where J (ξ) is the derivative of e (ξ) with respect to ξ (Jacobian), which can be:
the single two-norm is expanded to:
finding the derivative of the above equation with respect to ξ equal zero yields:
an incremental equation can be obtained:
the increment delta xi can be solved by an increment equation, so that iterative optimization is realized;
a 3D point cloud is generated in real time for each frame of aerial image; noise points are removed from the 3D point cloud by iterative filtering; Delaunay triangulation is performed on the feature points of the filtered cloud to generate a 2D mesh; the generated 2D mesh is projected into the filtered 3D point cloud to generate a 3D mesh; the texture information of the aerial image is fused into the 3D mesh to generate a local DSM; the local DSM is segmented into tiles; and the tiles are fused into the global DSM by a multi-band algorithm to generate the real-time digital surface model.
3. The method of claim 2, wherein the dynamic battlefield environment real-time path planning method comprises the following steps: extracting the geographic entities present in the image with the target detection algorithm comprises dividing the input image into S × S grid cells, each cell being responsible for detecting the objects that fall into its own region; each cell predicts 5 values: the center coordinates (x, y) of the predicted box of the object, the width and height (w, h) of the object, and the confidence that the object belongs to a certain class; if the prediction box contains a detection object, Pr(object) is 1, otherwise Pr(object) is 0; and in the prediction stage, the probability of each class is the product of the class conditional probability and the prediction confidence:
4. the method of claim 3, wherein the dynamic battlefield environment real-time path planning method comprises the following steps: the geographic entity comprises a relatively stationary geographic entity.
5. The dynamic battlefield environment real-time path planning method according to claim 3, wherein: in the DSAS algorithm, each track node contains, besides the coordinate information representing its spatial position, pointers to its parent node and to all of its child nodes; when a new obstacle is detected ahead of the current track, the DSAS algorithm deletes the track nodes inside the threat coverage area and all of their descendant nodes, re-inserts the parent node into the OPEN list, then computes the cost value of the newly expanded nodes, and continues the search process until a new optimal track of the unmanned aerial vehicle is obtained.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011373402.8A CN112509056B (en) | 2020-11-30 | 2020-11-30 | Dynamic battlefield environment real-time path planning system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112509056A CN112509056A (en) | 2021-03-16 |
CN112509056B true CN112509056B (en) | 2022-12-20 |
Family
ID=74968697
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011373402.8A Active CN112509056B (en) | 2020-11-30 | 2020-11-30 | Dynamic battlefield environment real-time path planning system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112509056B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113296514B (en) * | 2021-05-24 | 2022-09-27 | 南开大学 | Local path optimization method and system based on sparse banded structure |
CN115107904B (en) * | 2022-07-12 | 2023-09-05 | 武汉理工大学 | Reconfigurable wheel-track universal distributed driving unmanned vehicle and control method thereof |
CN116842127B (en) * | 2023-08-31 | 2023-12-05 | 中国人民解放军海军航空大学 | Self-adaptive auxiliary decision-making intelligent method and system based on multi-source dynamic data |
CN117191048B (en) * | 2023-11-07 | 2024-01-05 | 北京四象爱数科技有限公司 | Emergency path planning method, equipment and medium based on three-dimensional stereopair |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105843223A (en) * | 2016-03-23 | 2016-08-10 | 东南大学 | Mobile robot three-dimensional mapping and obstacle avoidance method based on space bag of words model |
CN108827306A (en) * | 2018-05-31 | 2018-11-16 | 北京林业大学 | A kind of unmanned plane SLAM navigation methods and systems based on Multi-sensor Fusion |
CN109242862A (en) * | 2018-09-08 | 2019-01-18 | 西北工业大学 | A kind of real-time digital surface model generation method |
CN109828578A (en) * | 2019-02-22 | 2019-05-31 | 南京天创电子技术有限公司 | A kind of instrument crusing robot optimal route planing method based on YOLOv3 |
CN109959377A (en) * | 2017-12-25 | 2019-07-02 | 北京东方兴华科技发展有限责任公司 | A kind of robot navigation's positioning system and method |
CN110675307A (en) * | 2019-08-19 | 2020-01-10 | 杭州电子科技大学 | Implementation method of 3D sparse point cloud to 2D grid map based on VSLAM |
CN110673632A (en) * | 2019-09-27 | 2020-01-10 | 中国船舶重工集团公司第七0九研究所 | Unmanned aerial vehicle autonomous obstacle avoidance method and device based on visual SLAM |
CN111367318A (en) * | 2020-03-31 | 2020-07-03 | 华东理工大学 | Dynamic obstacle environment navigation method and device based on visual semantic information |
Non-Patent Citations (1)
Title |
---|
Online trajectory planning for helicopter nap-of-the-earth flight based on the DSAS algorithm; Chen Jie et al.; 《飞机力学》 (Aircraft Mechanics); 2016-04-30; pp. 86-89 * |
Also Published As
Publication number | Publication date |
---|---|
CN112509056A (en) | 2021-03-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112509056B (en) | Dynamic battlefield environment real-time path planning system and method | |
CN108242079B (en) | VSLAM method based on multi-feature visual odometer and graph optimization model | |
CN109631855B (en) | ORB-SLAM-based high-precision vehicle positioning method | |
CN113345018B (en) | Laser monocular vision fusion positioning mapping method in dynamic scene | |
US10269148B2 (en) | Real-time image undistortion for incremental 3D reconstruction | |
Liu et al. | Seqlpd: Sequence matching enhanced loop-closure detection based on large-scale point cloud description for self-driving vehicles | |
CN104200523B (en) | A kind of large scene three-dimensional rebuilding method for merging additional information | |
US20180315232A1 (en) | Real-time incremental 3d reconstruction of sensor data | |
CN110827415A (en) | All-weather unknown environment unmanned autonomous working platform | |
CN113506318B (en) | Three-dimensional target perception method under vehicle-mounted edge scene | |
Ding et al. | Vehicle pose and shape estimation through multiple monocular vision | |
CN113593017A (en) | Method, device and equipment for constructing surface three-dimensional model of strip mine and storage medium | |
Li et al. | Review of vision-based Simultaneous Localization and Mapping | |
Agostinho et al. | A practical survey on visual odometry for autonomous driving in challenging scenarios and conditions | |
CN114088081B (en) | Map construction method for accurate positioning based on multistage joint optimization | |
Ding et al. | Persistent stereo visual localization on cross-modal invariant map | |
Saleem et al. | Neural network-based recent research developments in SLAM for autonomous ground vehicles: A review | |
Liu et al. | Comparison of 2D image models in segmentation performance for 3D laser point clouds | |
Gökçe et al. | Recognition of dynamic objects from UGVs using Interconnected Neuralnetwork-based Computer Vision system | |
Lu et al. | Vision-based localization methods under GPS-denied conditions | |
Voodarla et al. | S-BEV: Semantic birds-eye view representation for weather and lighting invariant 3-dof localization | |
Xu et al. | Uav image geo-localization by point-line-patch feature matching and iclk optimization | |
Zhang et al. | GPS-assisted aerial image stitching based on optimization algorithm | |
Li et al. | RF-LOAM: Robust and Fast LiDAR Odometry and Mapping in Urban Dynamic Environment | |
Li et al. | Intelligent vehicle localization and navigation based on intersection fingerprint roadmap (IRM) in underground parking lots |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |