CN112509056A - Dynamic battlefield environment real-time path planning system and method - Google Patents

Dynamic battlefield environment real-time path planning system and method

Info

Publication number
CN112509056A
Authority
CN
China
Prior art keywords
real
map
time
path planning
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011373402.8A
Other languages
Chinese (zh)
Other versions
CN112509056B (en)
Inventor
张西山
连光耀
李会杰
闫鹏程
孙江生
袁祥波
连云峰
代冬升
李雅峰
李万领
钟华
付久长
王子林
宋秦松
郭伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
32181 Troops of PLA
Original Assignee
32181 Troops of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 32181 Troops of PLA filed Critical 32181 Troops of PLA
Priority to CN202011373402.8A priority Critical patent/CN112509056B/en
Publication of CN112509056A publication Critical patent/CN112509056A/en
Application granted granted Critical
Publication of CN112509056B publication Critical patent/CN112509056B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/80 - Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85 - Stereo camera calibration
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/005 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 with correlation of navigation data from several sources, e.g. map or contour matching
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 - Instruments for performing navigational calculations
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05 - Geographic models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/70 - Denoising; Smoothing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10004 - Still image; Photographic image
    • G06T2207/10012 - Stereo images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10032 - Satellite or aerial image; Remote sensing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Automation & Control Theory (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a dynamic battlefield environment real-time path planning system and method. The system comprises a map reconstruction module, a visual perception module, an intelligent decision module and a UI interaction module. The UI interaction module realizes mutual calling and information exchange of parameters among the different modules; the map reconstruction module takes images shot by an unmanned aerial vehicle as input and generates a real-time digital earth surface model; the visual perception module extracts the geographic entities present in the images shot by the unmanned aerial vehicle, generates a grid map from the real-time digital earth surface model, and sets barrier marks on the grid map; the intelligent decision module, combined with a DSAS algorithm, searches for an optimal path that bypasses the barriers from the environment start point to the end point, realizing real-time path planning. The invention can provide command and control information with high real-time performance for unmanned equipment in a dynamic battlefield environment, thereby improving the autonomy of the unmanned equipment.

Description

Dynamic battlefield environment real-time path planning system and method
Technical Field
The invention relates to the field of path planning, in particular to a dynamic battlefield environment real-time path planning system and a dynamic battlefield environment real-time path planning method.
Background
Information plays an increasingly important role in daily life and can markedly improve the efficiency and quality of work; used as the intelligent control system of unmanned equipment, it can greatly improve system flexibility through an agile unmanned flight platform. With the rapid development of electronic and computer technologies in recent years, low-cost unmanned aerial vehicles have matured, high-speed image processing is gradually being applied to a variety of tasks, and efficient intelligent control realized through low-cost unmanned aerial vehicles has become a flexible and effective means. However, conventional unmanned aerial vehicle mapping software does not provide real-time map building or autonomous capability, and its processing speed is low, so a real-time path planning system with a real-time map building function and a high processing speed is required.
Disclosure of Invention
To solve the above problems, the invention provides a dynamic battlefield environment real-time path planning system and method, which can provide command and control information with high real-time performance for unmanned equipment in a dynamic battlefield environment, thereby improving the autonomy of the unmanned equipment.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a dynamic battlefield environment real-time path planning system, comprising: the map reconstruction module, the visual perception module, the intelligent decision module and the UI interaction module;
the UI interaction module realizes mutual calling and information connection of parameters among different modules;
the map reconstruction module takes an image shot by the unmanned aerial vehicle as input to generate a real-time digital earth surface model;
the visual perception module is used for extracting geographic entities existing in images shot by the unmanned aerial vehicle, generating a grid map from the real-time digital earth surface model, and setting barrier marks on the grid map;
the intelligent decision module, combined with a DSAS algorithm, searches for an optimal path that bypasses the barriers from the environment start point to the end point, realizing real-time path planning.
Optionally, the map reconstructing module includes: the visual SLAM takes aerial images shot by an unmanned aerial vehicle on line as input, generates 3D point cloud for each frame of image in real time, generates local DSM in real time by combining the 3D point cloud with texture information of the image, and fuses the local DSM to the global DSM to generate a real-time digital earth surface model.
Optionally, the visual SLAM system includes feature extraction, tracking, mapping, and looping;
the feature extraction is used for extracting feature points in the aerial image and calculating a descriptor;
the tracking completes feature matching by calculating the distance between feature point descriptors of a current frame and a previous frame, and restores the pose change of the current frame relative to the previous frame through the feature matching;
the mapping searches for more feature matches between the current frame and previous frames, recovers more 3D points through triangulation, and performs local BA optimization on the current frame, its co-observed associated frames and the co-observed map points to obtain more accurate camera poses and 3D map-point coordinates;
the loop is used for detecting whether the camera returns to a place which is previously visited or not, and adding new constraint according to the detected loop to correct the corresponding camera pose and the 3D map point coordinate.
Optionally, an image shot by the unmanned aerial vehicle is used as input to generate the real-time digital earth surface model; a target detection algorithm is used to extract the geographic entities present in the image; a pre-trained AlexNet convolutional neural network is used to extract depth-level features of the geographic entities; dimension reduction is performed on the feature vector of each geographic entity and a grid map is generated; barrier marks are set at the corresponding positions in the grid map according to the types of the geographic entities; and, combined with a DSAS algorithm, an optimal path that bypasses the barriers is searched from the environment start point to the end point, realizing real-time path planning.
Optionally, generating the real-time digital earth surface model with images shot by the unmanned aerial vehicle as input includes: using visual SLAM with the aerial images shot online by the unmanned aerial vehicle as input, generating a 3D point cloud for each frame of aerial image in real time, removing noise from the 3D point cloud by iterative filtering, performing Delaunay triangulation on the feature points of the filtered 3D point cloud to generate a 2D mesh, projecting the generated 2D mesh onto the filtered 3D point cloud to generate a 3D mesh, fusing the texture information of the aerial image onto the 3D mesh to generate a local DSM, dividing the local DSM into tiles, and fusing the tiles into the global DSM through a multi-band algorithm to generate the real-time digital earth surface model.
Optionally, extracting the geographic entities present in the image with the target detection algorithm includes dividing the input image into S × S grid cells, each cell being responsible for detecting all objects falling into its own region, where each cell predicts 5 values: the coordinates (x, y) of the center of the predicted box of the object, the width and height (w, h) of the object, and the confidence that the object belongs to a certain class; if the predicted box contains a detected object, Pr(Object) = 1, otherwise Pr(Object) = 0; in the prediction stage, the class probability is the product of the class conditional probability and the predicted confidence:
$$ \Pr(\mathrm{Class}_i \mid \mathrm{Object}) \times \Pr(\mathrm{Object}) \times \mathrm{IOU}^{\mathrm{truth}}_{\mathrm{pred}} = \Pr(\mathrm{Class}_i) \times \mathrm{IOU}^{\mathrm{truth}}_{\mathrm{pred}} $$
optionally, the geographic entity comprises a relatively stationary geographic entity.
In the DSAS algorithm, each track node contains, besides the coordinate information representing its spatial position, pointers to its parent node and to all of its child nodes. When a new obstacle is detected in front of the current track, the DSAS algorithm deletes the track nodes covered by the threat and all of their descendant nodes, reinserts their parent nodes into the OPEN table, computes the cost values of the newly expanded nodes, and continues the search process until a new optimal track of the unmanned aerial vehicle is obtained.
Compared with the prior art, the invention has the technical progress that:
compared with the traditional computer vision method, the method has the advantages that the processing capacity is high, the real-time processing capacity is realized, more than 30frames of images can be processed per second, the map data can reach more than ten million point clouds, and high real-time command control information can be provided for the unmanned equipment in a dynamic battlefield environment, so that the autonomous capacity of the unmanned equipment is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
In the drawings:
FIG. 1 is a block diagram of the present invention.
Fig. 2 is a diagram of a descriptor of SIFT keypoints.
FIG. 3 is a schematic diagram of a matrix in Bundle Adjustment.
FIG. 4 is a flow chart of the present invention.
Fig. 5 is a specific network structure diagram of the YOLO model.
Fig. 6 is a specific network flow diagram of the YOLO model.
Detailed Description
The following specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
As shown in FIG. 1, the invention discloses a dynamic battlefield environment real-time path planning system, which comprises:
The map reconstruction module, the visual perception module, the intelligent decision module and the UI interaction module;
each module of the invention is decoupled, and the parameters among different modules are mutually called and connected with information through the UI interaction module. The UI interface module is a part for interacting with a user and comprises several functions of creating a project, loading the project, detecting a target and training a model, and in the newly created project, the user can determine the name of the project, the position of the project, a picture sequence, a label category and a used detection algorithm. A load item is an item previously created by a load user. And finishing the result of the constructed image by target detection. Training the model is to retrain the model using the corrected labels. And right clicking the display path planning list in the interface, wherein the UI interface supports the user to modify the incorrect prediction result, and the corrected label data is used for automatically updating the parameter weight of the model again, so that the precision of the model is continuously improved in the use process, and each module of the system can be independently called and tested by virtue of the modular design, and new modules can be conveniently and continuously added.
The map reconstruction module takes an image shot by the unmanned aerial vehicle as input to generate a real-time digital earth surface model;
the visual perception module is used for extracting geographic entities existing in images shot by the unmanned aerial vehicle, generating a grid map from the real-time digital earth surface model, and setting barrier marks on the grid map;
the intelligent decision module, combined with a DSAS algorithm, searches for an optimal path that bypasses the barriers from the environment start point to the end point, realizing real-time path planning.
Specifically, the map reconstruction module includes: the visual SLAM takes aerial images shot by an unmanned aerial vehicle on line as input, generates 3D point cloud for each frame of image in real time, generates local DSM in real time by combining the 3D point cloud with texture information of the image, and fuses the local DSM to the global DSM to generate a real-time digital earth surface model.
The visual SLAM system includes feature extraction, tracking, mapping and loop closing;
the feature extraction is used for extracting feature points in the aerial image and calculating a descriptor;
tracking and calculating the distance between feature point descriptors of a current frame and a previous frame to complete feature matching, and recovering the pose change of the current frame relative to the previous frame through the feature matching;
establishing a map, searching more feature matches in a current frame and previous frames, recovering more 3D points through triangulation, and performing local BA optimization on the current frame, a commonly observed associated frame and commonly observed map points to obtain more accurate camera pose and 3D map point coordinates;
the loop is used for detecting whether the camera returns to a place which is previously visited or not, and a new constraint is added according to the detected loop to correct the corresponding camera pose and the 3D map point coordinate.
Specifically, the unmanned aerial vehicle carries a camera that acquires aerial images at 30 frames/s and transmits them to the ground station in real time. To recover the pose of the camera and the 3D structure of the environment from the aerial images, feature points are extracted from the images; they are the images of landmark points in the environment and, from the image point of view, are the representative parts of an image.
After the feature points are detected, feature-point matching determines whether two feature points in two frames are images of the same landmark point, so simply detecting the positions of the feature points is not enough and additional information must be extracted for matching between feature points. A feature point generally consists of two parts: the key point (Key-point), containing information such as position, size, direction and score; and the descriptor (Descriptor), describing the pixel information around the key point.
To maintain rotation invariance, the direction information of the key point is added when the feature-point descriptor is computed. Feature-point detection is in fact key-point detection; common detection methods include Harris, FAST, SIFT and SURF key points. The FAST key-point test is simple: on a circle of radius 3 centered at a pixel p, if there are N consecutive pixels (N is usually 12 or 9) whose intensities are all greater than, or all less than, the intensity of p by a given threshold, then p is a key point. FAST key points have no orientation; the ORB feature improves the FAST algorithm by building an image pyramid for scale invariance and using the grey-level centroid method to obtain orientation information. In this embodiment, the SURF (Speeded-Up Robust Features) key-point detection method is used: because building the DoG pyramid and searching for extrema in SIFT key-point detection is time-consuming, SURF improves the extremum detection of SIFT and locates extremum points with the local maxima of the Hessian-matrix determinant, which greatly accelerates key-point detection.
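The key-point detectors discussed above are available in OpenCV; a minimal sketch follows (an illustration, not the patent's implementation: SURF lives in the contrib package opencv-contrib-python and may be disabled in some builds, and the file name is hypothetical).

```python
# Minimal key-point detection sketch with OpenCV (illustrative only).
import cv2

img = cv2.imread("aerial_frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file name

# FAST: corner test on a 16-pixel circle of radius 3 around each candidate pixel.
fast = cv2.FastFeatureDetector_create(threshold=20)
fast_kps = fast.detect(img, None)

# SURF: extrema of the Hessian determinant (faster than SIFT's DoG pyramid).
# Requires opencv-contrib-python; may be unavailable in some builds.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
surf_kps = surf.detect(img, None)

print(len(fast_kps), "FAST keypoints,", len(surf_kps), "SURF keypoints")
```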
The descriptor used in this embodiment is the SIFT feature descriptor. As shown in FIG. 2, a neighborhood centered on the feature point is selected and the coordinate axes are rotated by the angle of the main direction of the feature point; an 8 × 8 window centered on the key point is taken, every 4 × 4 block of pixels forms a sub-window, and the gradient magnitude and direction of each pixel are computed (the arrow direction represents the gradient direction of a pixel and its length the gradient magnitude). A Gaussian window is then used to weight the gradient magnitudes, and finally a gradient histogram over 8 directions is drawn for each 4 × 4 block; the accumulated value of each gradient direction forms an 8-direction vector, so that the descriptor is a 4 × 4 × 8 = 128-dimensional floating-point vector.
SIFT features are invariant to scale, rotation, illumination and so on and are extremely stable feature points, but their computation is much slower than SURF, so using SIFT alone cannot guarantee the real-time performance of the system; with GPU acceleration, however, SIFT can fully reach real-time speed.
Feature matching matches the feature points of the same landmark on different images. Matches are obtained by computing the distances between different feature-point descriptors: for a feature point on the first image, the matching feature point on the second image is found. SIFT descriptors are 128-dimensional floating-point vectors, and the distance between two descriptors is their vector distance. When the number of feature points is large, computing the distances one by one is slow and real-time matching cannot be achieved. To reach real-time matching speed, the camera motion can be predicted from a motion model to obtain an initial camera pose change, and an epipolar search based on this initial pose change accelerates feature matching. A bag-of-words (BoW) or fast approximate nearest-neighbour (FLANN) algorithm can also be used to speed up matching. For map initialization, when no map point cloud is known yet, the map point cloud is generated from the first two frames: the fundamental matrix and the homography matrix are computed from the matched feature points between the two frames, and the pose transformation between the two frames is obtained by decomposing them. Because of noise and mismatches, the RANSAC (random sample consensus) algorithm is used when computing the fundamental and homography matrices; both matrices are computed and the one with the smaller projection error is decomposed to obtain the pose. To guarantee the quality of the initial map, a certain distance and rotation angle are required between the two frames; when the number of matched inliers between the two frames is too small, initialization is judged to have failed and two new frames are selected. The distance between the two frames is computed by converting their GPS coordinates to the Earth-Centered, Earth-Fixed (ECEF) rectangular coordinate system and is used as the initial scale of the map.
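A minimal OpenCV sketch of the matching and two-view initialization steps described above follows; the frame file names and the ratio-test threshold are illustrative assumptions, and GPS-based scale recovery is omitted.

```python
# Descriptor matching and two-view initialization sketch (illustrative only).
import cv2
import numpy as np

img1 = cv2.imread("frame_a.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file names
img2 = cv2.imread("frame_b.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# FLANN nearest-neighbour matching with Lowe's ratio test.
flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
matches = flann.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]

pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# Estimate both two-view models with RANSAC; the one with the smaller projection
# error (or more inliers) would be decomposed to recover the relative pose.
F, inliers_F = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
H, inliers_H = cv2.findHomography(pts1, pts2, cv2.RANSAC, 1.0)
```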
Map initialization is to recover 3D map points by using 2D-2D epipolar geometry, and after map points exist, the camera pose and the 3D map points can be recovered by solving the problem of 3D-2D PnP. Although the algorithms of PnP and the like can solve the pose of the camera and acquire the depth of the feature point by triangulation, the pose and depth information currently solved are both rough values due to the presence of noise. In order to obtain more accurate pose and depth values, the current pose and depth values can be used as initial values, and the accurate pose and depth values are further obtained by using a nonlinear optimization method.
The mathematical expression for imaging of a known pinhole camera is:
$$ s_i \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = K\, T\, P_i $$
where Pi = [Xi, Yi, Zi]^T is the three-dimensional coordinate of the ith map point, [ui, vi]^T is the pixel coordinate of its projection on the current frame, si is the depth value corresponding to that projection, K is the camera intrinsic matrix of the current frame, and T is the pose of the current frame. T can be written in the form of a Lie algebra:
$$ T = \exp(\xi^{\wedge}), \qquad \xi \in \mathfrak{se}(3) $$
In practice, because of noise, an error ei exists between the true projected pixel coordinate ui of Pi and its theoretical projection. The true, noisy pixel coordinate is taken as the observation, and the error is assumed to satisfy a Gaussian distribution:
$$ e_i = u_i - \frac{1}{s_i} K \exp(\xi^{\wedge}) P_i, \qquad e_i \sim N(0, \Sigma) $$
a least squares problem can be constructed, namely:
$$ \xi^{*} = \arg\min_{\xi} \; \frac{1}{2} \sum_{i} \left\| u_i - \frac{1}{s_i} K \exp(\xi^{\wedge}) P_i \right\|^{2} $$
Denote the derivative of ei with respect to ξ by the Jacobian J(ξ); the key step of the gradient-descent-type methods used in nonlinear optimization is the derivation of J(ξ). The map point is first transformed from the world coordinate system to the camera coordinate system:
$$ C_i = \exp(\xi^{\wedge}) P_i = T P_i = [X'_i,\; Y'_i,\; Z'_i]^{T} $$
the camera model may be represented as:
$$ s_i \begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} = K\, C_i $$
Here [u′, v′]^T is the predicted value of the pixel coordinate. Eliminating si with si = Z′i gives:
$$ u' = f_x \frac{X'_i}{Z'_i} + c_x, \qquad v' = f_y \frac{Y'_i}{Z'_i} + c_y $$
Taking Ci as an intermediate variable in the error term and applying the chain rule gives:
$$ \frac{\partial e_i}{\partial \xi} = \frac{\partial e_i}{\partial C_i} \frac{\partial C_i}{\partial \xi} $$
wherein:
$$ \frac{\partial e_i}{\partial C_i} = -\begin{bmatrix} \frac{f_x}{Z'_i} & 0 & -\frac{f_x X'_i}{Z_i'^{2}} \\ 0 & \frac{f_y}{Z'_i} & -\frac{f_y Y'_i}{Z_i'^{2}} \end{bmatrix} $$
The partial derivative of Ci with respect to ξ is a derivative with respect to the Lie algebra and can be obtained with a perturbation model: left-multiply ξ by a perturbation δξ and differentiate TPi with respect to the perturbation δξ:
$$ \frac{\partial (T P_i)}{\partial \delta\xi} = \begin{bmatrix} I & -(T P_i)^{\wedge} \end{bmatrix} = \begin{bmatrix} I & -C_i^{\wedge} \end{bmatrix} $$
Combining the two factors, the partial derivative of the error term with respect to ξ is:
$$ J(\xi) = \frac{\partial e_i}{\partial \delta\xi} = -\begin{bmatrix} \frac{f_x}{Z'_i} & 0 & -\frac{f_x X'_i}{Z_i'^{2}} \\ 0 & \frac{f_y}{Z'_i} & -\frac{f_y Y'_i}{Z_i'^{2}} \end{bmatrix} \begin{bmatrix} I & -C_i^{\wedge} \end{bmatrix} $$
The negative sign in front of the Jacobian matrix arises because the error is defined as the observed value minus the predicted value; if the error were defined as the predicted minus the observed value, the negative sign could be omitted. Performing a Cholesky decomposition of Σ−1 gives:
$$ \Sigma^{-1} = L^{T} L $$
Substituting this into the error norm yields:
$$ \| e_i \|_{\Sigma^{-1}}^{2} = e_i^{T} \Sigma^{-1} e_i = (L e_i)^{T} (L e_i) = \| L e_i \|^{2} $$
the formula for the least squares method can then be written as:
$$ \xi^{*} = \arg\min_{\xi} \; \frac{1}{2} \sum_{i} \left\| L \left( u_i - \frac{1}{s_i} K \exp(\xi^{\wedge}) P_i \right) \right\|^{2} $$
For the iterative optimization, the Gauss-Newton algorithm is taken as an example here for simplicity. The idea of Gauss-Newton is to perform a first-order Taylor expansion of the error term:
$$ e(\xi + \delta\xi) \approx e(\xi) + J(\xi)\, \delta\xi $$
where J(ξ) is the derivative (Jacobian) of e(ξ) with respect to ξ. The problem becomes finding the increment δξ that minimizes:
$$ \delta\xi^{*} = \arg\min_{\delta\xi} \; \frac{1}{2} \left\| e(\xi) + J(\xi)\, \delta\xi \right\|^{2} $$
Expanding the squared two-norm gives:
$$ \frac{1}{2} \left\| e + J \delta\xi \right\|^{2} = \frac{1}{2} \left( e^{T} e + 2\, e^{T} J \delta\xi + \delta\xi^{T} J^{T} J \delta\xi \right) $$
Setting the derivative of the above expression with respect to δξ to zero yields:
$$ J^{T} e + J^{T} J\, \delta\xi = 0 $$
an incremental equation can be obtained:
$$ J^{T} J\, \delta\xi = -J^{T} e $$
The increment δξ is solved from this incremental equation, realizing the iterative optimization. When the Gauss-Newton or Levenberg-Marquardt method is used with the weighted error, the residual becomes L(ui − (1/si) K exp(ξ^) Pi) and the Jacobian becomes L·J(ξ).
A key frame is a frame selected from the local ordinary frames as a representative of that local segment. When the camera is stationary, it keeps capturing nearly the same image; if all images were used as objects of the back-end optimization, unnecessary computation would be added. The main role of key frames is therefore to balance the computational load of the back-end optimization against accuracy. Key-frame extraction generally considers the following aspects:
whether the number of frames temporally distant from the previous key frame is enough;
whether the distance in space from the previous key frame is sufficiently far;
the observation quality is controlled to a certain range with the common observation point of the last key frame. Too many co-observations indicate a higher overlap rate, and too few will reduce the tracking accuracy.
In this embodiment, in addition to the above 3 points, a special requirement is placed on the rotation of the aerial images, mainly because when the yaw angle of the unmanned aerial vehicle (camera) changes (rotation about the z axis), the captured aerial image changes little. Therefore, a key frame is added when the pitch or roll angle of the unmanned aerial vehicle (camera) changes beyond a certain angle, while no key frame is added when only the yaw angle changes.
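A possible form of such a key-frame decision, combining the criteria above, is sketched below; the thresholds and the attributes assumed on the frame objects (index, position, pitch, roll, set of observed map points) are illustrative, not values from the patent.

```python
# Illustrative key-frame decision; thresholds and frame attributes are assumptions.
import numpy as np

def need_keyframe(frame, last_kf, min_frames=20, min_dist=2.0,
                  min_covis=0.3, max_covis=0.9, angle_thresh_deg=15.0):
    enough_time = frame.idx - last_kf.idx >= min_frames
    far_enough = np.linalg.norm(frame.position - last_kf.position) >= min_dist
    covis = len(frame.map_points & last_kf.map_points) / max(len(last_kf.map_points), 1)
    covis_ok = min_covis <= covis <= max_covis
    # Yaw alone barely changes a nadir aerial image, so only pitch/roll count here.
    d_pitch = abs(frame.pitch - last_kf.pitch)
    d_roll = abs(frame.roll - last_kf.roll)
    rotated = max(d_pitch, d_roll) >= np.deg2rad(angle_thresh_deg)
    return (enough_time or far_enough or rotated) and covis_ok
```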
When the current frame is tracked against the previous key frame, tracking can be lost because few feature points are detected, few feature points are matched, or the computed pose deviation is too large. To increase the redundancy of the whole system, relocalization after a tracking loss is crucial. In the system described here, when GPS is available, several frames close to the current frame are found with GPS as candidate frames; a key frame with good feature matching is then found among the candidates and PnP is solved again to recover tracking and compute the pose. When GPS is not available, a bag of words (BoW) is used to find candidate frames, and likewise a key frame with good feature matching is found among them for tracking.
The bag of words was originally used for text classification, representing a text by a feature vector; later it was applied in computer vision to image classification, image retrieval and similar tasks. Taking SIFT features as an example, a large number of SIFT features are first extracted from images, the features are then clustered layer by layer (e.g. with k-means) according to the distances of their 128-dimensional floating-point descriptors, and the classes at the bottom layer are taken as the words of the dictionary. Each image is then represented by a vector of the frequencies of the dictionary words it contains, completing the conversion from image to vector. By comparing the vector distance between different images, the similarity between images can be approximated, and the few frames with the smallest vector distance from the current frame can be used as relocalization candidate frames.
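A toy sketch of this bag-of-words construction follows, using a flat k-means vocabulary instead of a hierarchical dictionary tree; vocabulary size and other parameters are arbitrary.

```python
# Toy bag-of-words: cluster SIFT descriptors into a vocabulary with k-means,
# represent each image as a normalized word histogram, and compare histograms.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_vocab(descriptor_list, k=500):
    all_des = np.vstack(descriptor_list).astype(np.float32)
    return MiniBatchKMeans(n_clusters=k, n_init=3).fit(all_des)

def bow_vector(des, vocab):
    words = vocab.predict(des.astype(np.float32))
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(np.float32)
    return hist / (np.linalg.norm(hist) + 1e-12)

# Candidate relocalization/loop frames are those whose BoW vectors are closest
# to the current frame's vector, e.g. by Euclidean distance.
```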
In the map initialization process, the pose is obtained by utilizing two frames of epipolar constraints, and then an initial map point cloud is generated through triangulation. In the tracking process, the pose of a new frame (current frame) can be obtained by continuously solving the PnP through the matched feature points of the current frame and the map point cloud and optimizing the PnP. However, as the camera is far away from the area where the map point cloud is initialized, the feature points of the new frame are less and less matched with the map point cloud, and finally, the PnP cannot be solved and the tracking is lost. Therefore, to ensure the continuity of the system, new map points are continuously added after the current frame is calculated. Since the map point creation and key frame addition conditions are relaxed, a very strict screening mechanism is used to remove redundant key frames and map points which are not easy to track.
And in the tracking process, one part of all the feature points of the current frame is used for matching the feature points of the previous key frame, and then the PnP is solved by using the map points corresponding to the feature points of the previous key frame and the feature points of the current frame. So that the feature points matching the previous key frame have corresponding map points, and new map points are generated from these feature points not matching the map points. Feature points that do not have matching map points match feature points in keyframes that have a common view relationship with the current frame (except for the last keyframe utilized in the tracking process). And calculating epipolar lines of the feature points on the matched key frames by using an epipolar constraint equation, and judging whether the matching is correct or not by calculating whether the matched feature points are near the epipolar lines. If the distance of the matched feature point from the epipolar line exceeds a certain threshold, a mismatch is identified and re-matched in other keyframes. Since the current frame needs to be matched with the features of a plurality of co-view key frames, the matching time is long due to large calculation amount. To speed up the matching, using the bag of words described above, the matching is only done in features that have the same word label in the co-view keyframe. After the matched feature points are obtained, new map points can be calculated by utilizing triangulation.
Because the adding conditions of map points and key frames are relatively loose, redundant key frames and map points with low confidence are deleted to reduce the optimization workload. Deletion condition for a key frame: 90% of the map points observed by this key frame are simultaneously observed by more than three other key frames. Deletion condition for a map point: the map point is simultaneously observed by fewer than three key frames.
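Expressed as code, these two culling rules might look like the following sketch (the observer bookkeeping on key frames and map points is an assumed data structure):

```python
# Sketch of the culling rules stated above; data structures are assumptions.
def redundant_keyframe(kf):
    # Redundant if >= 90% of its map points are also observed by more than
    # three other key frames.
    seen_elsewhere = sum(1 for mp in kf.map_points if len(mp.observers - {kf}) > 3)
    return seen_elsewhere >= 0.9 * max(len(kf.map_points), 1)

def weak_map_point(mp):
    # Removed if observed by fewer than three key frames.
    return len(mp.observers) < 3
```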
So far, only one-to-one constraints have been considered for the pose of the current frame and the coordinates of new map points: the pose of the current frame is estimated by solving PnP and optimizing the projection error of map points projected into the current frame, and a new map point is obtained by triangulating the feature points matched between the current frame and a co-visible frame. In fact, after the map is updated, any map point may be observed by multiple frames together, so the projection errors of all map points onto the key frames that observe them can be optimized jointly. The optimization process is similar to BA; compared with the single-frame formulation above, the least-squares problem now sums the projections of multiple frames:
$$ \theta^{*} = \arg\min_{\theta} \; \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} \chi_{ij} \left\| u_{ij} - \frac{1}{s_{ij}} K \exp(\xi_j^{\wedge}) P_i \right\|^{2} $$
where χij indicates whether the ith map point is visible in the jth camera: its value is 1 if visible and 0 otherwise; θ = (ξ1, ..., ξM, P1, ..., PN) is the variable to be optimized, θ ∈ R^(6M+3N).
The incremental equation for the Levenberg-Marquardt optimization method is as follows:
$$ \left[ J(\theta)^{T} J(\theta) + \lambda I \right] \delta\theta = -J(\theta)^{T} \left( u - f(\theta) \right) $$
The Jacobian matrix J(θ) is sparse; FIG. 3(a) shows the Jacobian when 4 map points are observed simultaneously by 3 key frames. It can be decomposed into two parts:
$$ J(\theta) = \begin{bmatrix} J_C & J_P \end{bmatrix} $$
Each block of J_C is the derivative of a projection error with respect to one key-frame pose, the 2 × 6 matrix derived in the single-frame case above. Each block of J_P is the derivative of a projection error with respect to one map point, a 2 × 3 matrix obtained by the chain rule:
$$ \frac{\partial e}{\partial P_i} = \frac{\partial e}{\partial C_i} \frac{\partial C_i}{\partial P_i} = \frac{\partial e}{\partial C_i}\, R $$
since Ci = R Pi + t, so the derivative of Ci with respect to Pi is simply R. In addition, every block and the reprojection error itself are multiplied by the matrix L obtained from the Cholesky decomposition of the information matrix Σ−1 above. From the Jacobian J(θ), the matrix [J(θ)^T J(θ) + λI] of the incremental equation can be formed, as shown in FIG. 3(b). This matrix is symmetric and sparse and, following the partition of the Jacobian, can be written in block form:
$$ J(\theta)^{T} J(\theta) + \lambda I = \begin{bmatrix} B & E \\ E^{T} & C \end{bmatrix} $$
thus, the equation for optimization can be:
$$ \begin{bmatrix} B & E \\ E^{T} & C \end{bmatrix} \begin{bmatrix} \delta C \\ \delta P \end{bmatrix} = \begin{bmatrix} v \\ w \end{bmatrix} $$
schur compensation is multiplied by the left and right of the above formula, so that marginalization is realized:
$$ \begin{bmatrix} B - E C^{-1} E^{T} & 0 \\ E^{T} & C \end{bmatrix} \begin{bmatrix} \delta C \\ \delta P \end{bmatrix} = \begin{bmatrix} v - E C^{-1} w \\ w \end{bmatrix} $$
the first equation in the equation set of the above formula is a linear equation, and all the camera pose increments δ C can be directly solved. All map point increments δ P can be solved by substituting δ C into the second expression of the system of equations. And the accurate camera pose and map point coordinates can be continuously estimated in an iterative manner by obtaining the increment of the optimization variable of each step. In order to avoid excessive calculation amount in the mapping part, only local optimization is performed, namely, only the poses of the current frame and the key frame with the common-view relationship and the coordinates of map points observed by the current frame and the key frame are optimized. In addition, GPS information is fused in local optimization to serve as a constraint item so as to obtain a more accurate camera pose. The cost function is obtained by adding GPS constraint in the optimization method, and more accurate camera pose and point cloud coordinates can be obtained by optimizing the cost function.
$$ \theta^{*} = \arg\min_{\theta} \; \frac{1}{2} \sum_{i,j} \chi_{ij} \left\| u_{ij} - \frac{1}{s_{ij}} K \exp(\xi_j^{\wedge}) P_i \right\|^{2} + \frac{1}{2} \sum_{j} \left\| t_j - t_j^{GPS} \right\|^{2} $$
where tj is the translation part of the jth key-frame pose and tj^GPS is its GPS measurement.
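The marginalization above reduces to a small amount of linear algebra; the following dense NumPy sketch shows the back-substitution (a real implementation exploits the block-diagonal structure of C instead of inverting it densely):

```python
# Solve the block system via the Schur complement (marginalizing map points).
import numpy as np

def solve_schur(B, E, C, v, w):
    C_inv = np.linalg.inv(C)                          # block-diagonal in practice
    S = B - E @ C_inv @ E.T                           # Schur complement of C
    delta_c = np.linalg.solve(S, v - E @ C_inv @ w)   # camera-pose increments
    delta_p = C_inv @ (w - E.T @ delta_c)             # map-point increments
    return delta_c, delta_p
```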
After a loop is detected and corrected, and at regular intervals of frames, all map points and key frames are optimized (global optimization); this is started in a separate thread.
In the tracking process, the estimation of the pose of the camera is a gradual recursion process, namely the pose of the previous key frame is tracked to solve the pose of the current frame. In fact, each frame pose solution has errors, and the errors are accumulated continuously as the tracking is performed continuously. One effective way to eliminate the accumulated error is to perform loop detection. The loop detection is used to determine whether the camera has returned to a previously-visited position, and if loop is detected, the constraints between the current frame and the loop frame can be added to the back-end optimization to correct the error.
When loop detection is actually performed, the most direct method is to match all previous key frames with the current frame and take the best-matching key frame as the loop frame. However, as key frames accumulate, matching becomes slower and the computation larger. To reduce the computation, when GPS is available, several key frames close to the current frame are found with GPS as candidate frames, the current frame is matched against the candidates, and the best-matching frame is taken as the loop frame. When GPS is not available, the bag-of-words approach is used to speed up matching: the words corresponding to the descriptors of the picture's feature points are found in the dictionary tree, and the word counts form a bag-of-words vector. The similarity between two pictures is estimated by comparing the distance between their bag-of-words vectors; the distances between the bag-of-words vector of the current frame and those of all key frames are computed, the frames with the smallest distances are taken as candidate frames, and finally the best-matching candidate frame is taken as the loop frame.
Most drones today provide relatively accurate GPS information, so the GPS-based loop-detection method is well suited. With the bag-of-words method, however, when the environment itself contains similar scenes it is easy to detect a wrong loop, which makes the optimization destructive. A stricter selection strategy for loop candidate frames is therefore designed: 1) loop detection is performed every 10 frames to avoid redundant computation; 2) the maximum distance Dm between the bag-of-words vector of the current frame and those of all its connected key frames is computed, and the distance between a candidate frame's bag-of-words vector and the current frame's must be smaller than Dm; 3) a loop is considered detected only if candidate frames are found in 3 consecutive detections. The 3rd frame is then matched against the loop candidate frames, the best-matching key frame is taken as the loop frame, and the matching constraints between the current frame and the loop frame are added to the global optimization to correct the accumulated error.
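A sketch of this candidate-selection strategy is given below; the frame attributes (bag-of-words vector, connected key frames) and the bookkeeping of consecutive detections are assumptions for illustration.

```python
# Sketch of the loop-candidate selection rules above (BoW-distance variant).
import numpy as np

def loop_candidates(cur, keyframes, history, frame_gap=10):
    if cur.idx % frame_gap != 0:                      # rule 1: check every 10 frames
        return []
    dm = max(np.linalg.norm(cur.bow - kf.bow) for kf in cur.connected_keyframes)
    cands = [kf for kf in keyframes
             if kf not in cur.connected_keyframes
             and np.linalg.norm(cur.bow - kf.bow) < dm]   # rule 2: closer than Dm
    history.append(bool(cands))
    if len(history) >= 3 and all(history[-3:]):       # rule 3: 3 consecutive detections
        return cands
    return []
```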
The visual perception module and the intelligent decision module work together: a target detection algorithm extracts the geographic entities present in the images; a pre-trained AlexNet convolutional neural network extracts depth-level features of the geographic entities; the feature vector of each geographic entity is reduced in dimension; a grid map is generated from the real-time digital earth surface model; barrier marks are set at the corresponding positions in the grid map according to the types of the ground objects; and, combined with a DSAS algorithm, an optimal path that bypasses the barriers is searched from the environment start point to the end point, realizing real-time path planning.
The invention also provides a dynamic battlefield environment real-time path planning method, which comprises the following steps:
As shown in FIG. 4, images shot by the unmanned aerial vehicle are taken as input to generate a real-time digital earth surface model; a target detection algorithm is used to extract the geographic entities present in the images; a pre-trained AlexNet convolutional neural network extracts depth-level features of the geographic entities; the feature vector of each geographic entity is reduced in dimension and a grid map is generated; barrier marks are set at the corresponding positions in the grid map according to the types of the geographic entities; and, combined with the DSAS algorithm, an optimal path that bypasses the barriers is searched from the environment start point to the end point, realizing real-time path planning.
Specifically, after the real-time digital earth surface model is generated, the geographic entities present in the images are extracted with a target detection algorithm. Considering computational efficiency, this embodiment adopts a YOLO-based entity detection method, but the range of detected entities is constrained: only relatively static geographic entities such as houses and bridges are detected, while transient entities such as pedestrians and vehicles are outside the detection range. After the geographic entities are extracted, a pre-trained AlexNet convolutional neural network is used to extract their depth-level features. Because the feature vector of each geographic entity has a very high dimension, which greatly affects computational efficiency, the feature vectors are reduced in dimension with a Gaussian random projection method; barrier marks are then set at the corresponding positions in the grid map according to the types of the ground objects, providing data for the subsequent path planning.
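One possible sketch of this feature-extraction and dimension-reduction step is shown below, using torchvision's pre-trained AlexNet and scikit-learn's Gaussian random projection; the preprocessing, output dimension and the weights argument (assumes torchvision >= 0.13) are illustrative, not settings from the patent.

```python
# Entity-feature sketch: AlexNet convolutional features per detected entity crop,
# reduced by Gaussian random projection before building the grid map.
import torch
import torchvision
from torchvision import transforms
from sklearn.random_projection import GaussianRandomProjection

alexnet = torchvision.models.alexnet(weights="IMAGENET1K_V1").eval()
prep = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

def entity_features(crops):            # crops: list of PIL images of detected entities
    with torch.no_grad():
        batch = torch.stack([prep(c) for c in crops])
        feats = alexnet.features(batch)            # deep convolutional feature maps
        return feats.flatten(1).numpy()            # one high-dimensional vector per entity

proj = GaussianRandomProjection(n_components=128)  # target dimension is an assumption
# low_dim = proj.fit_transform(entity_features(crops))
```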
After the grid map for path planning is obtained, a Dynamic Sparse A* Search (DSAS) algorithm is used to search, from the environment start point to the end point, for an optimal path that bypasses the obstacles, realizing real-time path planning.
In the DSAS algorithm, each track node contains, besides the coordinate information representing its spatial position, pointers to its parent node and to all of its child nodes. When a new obstacle is detected in front of the current track, the DSAS algorithm deletes the track nodes covered by the threat and all of their descendant nodes, reinserts their parent nodes into the OPEN table, then computes the cost values of the newly expanded nodes as described above, and continues the search process until a new optimal track of the aircraft is obtained.
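The pruning-and-replanning behaviour described above can be sketched as follows; the grid representation, unit step costs, Manhattan heuristic and the way threatened nodes are identified are illustrative assumptions rather than the patent's DSAS implementation.

```python
# Sketch of the DSAS re-planning step: track nodes keep parent/child pointers;
# when a new obstacle covers part of the tree, the affected subtrees are pruned
# and their parents go back onto the OPEN list for re-expansion.
import heapq

class Node:
    def __init__(self, pos, parent=None, g=0.0):
        self.pos, self.parent, self.children, self.g = pos, parent, [], g

def heuristic(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])     # Manhattan distance (assumption)

def expand(node, grid):
    for d in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        p = (node.pos[0] + d[0], node.pos[1] + d[1])
        if 0 <= p[0] < len(grid) and 0 <= p[1] < len(grid[0]) and grid[p[0]][p[1]] == 0:
            child = Node(p, parent=node, g=node.g + 1.0)
            node.children.append(child)
            yield child

def search(open_list, grid, goal):
    # Best-first search on (g + h); duplicate-state checking is omitted for brevity.
    while open_list:
        _, _, node = heapq.heappop(open_list)
        if node.pos == goal:
            return node
        for child in expand(node, grid):
            heapq.heappush(open_list, (child.g + heuristic(child.pos, goal), id(child), child))
    return None

def prune_and_replan(open_list, threatened, grid, goal):
    # Delete every node covered by the new threat together with its descendants,
    # re-insert the parents into OPEN, and continue the search.
    for node in threatened:
        if node.parent is not None:
            node.parent.children.remove(node)
            heapq.heappush(open_list, (node.parent.g + heuristic(node.parent.pos, goal),
                                       id(node.parent), node.parent))
    return search(open_list, grid, goal)
```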
The method comprises the steps of using an image shot by an unmanned aerial vehicle as an input to generate a real-time digital earth surface model, using an aerial image shot by the unmanned aerial vehicle on line as an input by using a visual SLAM, generating a 3D point cloud for each frame of aerial image in real time, removing noise points of the 3D point cloud by adopting iterative filtering, executing Delaunay triangulation on feature points of the 3D point cloud after the noise points are filtered to generate a 2D mesh, projecting the generated 2D mesh into the 3D point cloud after the noise points are filtered to generate a 3D mesh, fusing texture information of the aerial image into the 3D mesh to generate a local DSM, dividing the local DSM into tiles, and fusing the tiles to the global DSM through a multiband algorithm to generate the real-time digital earth surface model.
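One step of this pipeline, filtering the point cloud and building the triangulated mesh, can be sketched with SciPy's Delaunay triangulation as follows (texture fusion and multi-band blending into the global DSM are omitted, and the simple height filter stands in for the iterative filtering):

```python
# Local-mesh sketch: filter the 3D point cloud, triangulate its (x, y) footprint,
# and lift the 2D mesh back onto the 3D points.
import numpy as np
from scipy.spatial import Delaunay

def local_mesh(points_3d, k_sigma=2.0):
    pts = np.asarray(points_3d)
    # Crude statistical noise filter on height (stand-in for iterative filtering).
    z_mean, z_std = pts[:, 2].mean(), pts[:, 2].std()
    pts = pts[np.abs(pts[:, 2] - z_mean) < k_sigma * z_std]
    tri = Delaunay(pts[:, :2])        # 2D triangulation of the ground-plane projection
    faces = tri.simplices             # each face indexes three 3D vertices -> 3D mesh
    return pts, faces
```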
The target detection algorithm used in this embodiment to extract the geographic entities present in the images is You Only Look Once (YOLO). Unlike two-stage algorithms, YOLO removes the candidate-box generation part (the region proposal stage): the whole pipeline directly extracts features, and candidate-box regression and classification are completed in a single model without branches, so the detection speed is about 10 times higher than that of two-stage algorithms.
Specifically, as shown in fig. 5, the input picture is divided into S × S grids, and each grid is responsible for detecting all objects falling into its own region. Each grid predicts 5 values, and the coordinates (x, y) of the center position of the predicted frame of the object, the width and height (w, h) of the object, and the confidence that the object belongs to a certain class are respectively predicted. Wherein, the confidence degree refers to the confidence degree that the region contains the object and the accuracy of the prediction box. If the prediction box contains the detection object, then Pr (object) is 1. Otherwise, Pr (object) is 0. In the prediction phase, the probability of a class is the product of the class conditional probability and the confidence of the prediction:
$$ \Pr(\mathrm{Class}_i \mid \mathrm{Object}) \times \Pr(\mathrm{Object}) \times \mathrm{IOU}^{\mathrm{truth}}_{\mathrm{pred}} = \Pr(\mathrm{Class}_i) \times \mathrm{IOU}^{\mathrm{truth}}_{\mathrm{pred}} $$
the detailed flow chart of the network is shown in fig. 6, and the network includes 24 convolutional layers, 2 fully-connected layers, and 5 pooling layers. The output of the network is a 7 x 30 dimensional vector: 7 × 7 represents a grid that divides the input into 7 × 7; 30 includes object type and width, position, and confidence information of each of the two prediction frames, and the size of the input image is not changeable because of the inclusion of the full link layer, and must be 448 × 448.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments described in the foregoing embodiments, or equivalent changes may be made to some of the features of the embodiments. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (7)

1. A dynamic battlefield environment real-time path planning system, comprising: the map reconstruction module, the visual perception module, the intelligent decision module and the UI interaction module;
the UI interaction module realizes mutual calling and information connection of parameters among different modules;
the map reconstruction module includes: the visual SLAM takes aerial images shot by an unmanned aerial vehicle on line as input, generates 3D point cloud for each frame of image in real time, generates local DSM in real time by combining the 3D point cloud with texture information of the image, and fuses the local DSM to a global DSM to generate a real-time digital earth surface model;
the visual perception module is used for extracting geographic entities existing in images shot by the unmanned aerial vehicle, generating a grid map from the real-time digital earth surface model, and setting barrier marks on the grid map;
the intelligent decision module is combined with a DSAS algorithm to search, from the environment start point to the end point, for an optimal path that bypasses the barriers, realizing real-time path planning.
2. The dynamic battlefield environment real-time path planning system of claim 1, wherein: the visual SLAM system comprises feature extraction, tracking, mapping and loop closing;
the feature extraction is used for extracting feature points in the aerial image and calculating a descriptor;
the tracking completes feature matching by calculating the distance between feature point descriptors of a current frame and a previous frame, and restores the pose change of the current frame relative to the previous frame through the feature matching;
the mapping searches for more feature matches between the current frame and previous frames, recovers more 3D points through triangulation, and performs local BA optimization on the current frame, its co-observed associated frames and the co-observed map points to obtain more accurate camera poses and 3D map-point coordinates;
the loop is used for detecting whether the camera returns to a place which is previously visited or not, and adding new constraint according to the detected loop to correct the corresponding camera pose and the 3D map point coordinate.
3. A dynamic battlefield environment real-time path planning method is characterized in that: the method comprises the following steps:
the method comprises the steps of generating a real-time digital earth surface model by taking an image shot by an unmanned aerial vehicle as input, extracting geographic entities existing in the image by using a target detection algorithm, extracting depth level features of the geographic entities by using a pre-trained AlexNet convolutional neural network, performing dimension reduction processing on feature vectors of each geographic entity to generate a grid map, setting barrier marks at corresponding positions in the grid map according to the types of the ground objects, starting from an environment starting place, searching for an optimal path capable of bypassing the barriers to reach an end point by combining a DSAS algorithm, and realizing real-time path planning.
4. The dynamic battlefield environment real-time path planning method of claim 3, wherein: generating the real-time digital earth surface model with images shot by the unmanned aerial vehicle as input comprises using visual SLAM with the aerial images shot online by the unmanned aerial vehicle as input, generating a 3D point cloud for each frame of aerial image in real time, removing noise from the 3D point cloud by iterative filtering, performing Delaunay triangulation on the feature points of the filtered 3D point cloud to generate 2D meshes, projecting the generated 2D meshes onto the filtered 3D point cloud to generate 3D meshes, fusing the texture information of the aerial images onto the 3D meshes to generate local DSMs, dividing the local DSMs into tiles, and fusing the tiles into the global DSM through a multi-band algorithm to generate the real-time digital earth surface model.
5. The dynamic battlefield environment real-time path planning method of claim 3, wherein: extracting the geographic entities present in the image with the target detection algorithm comprises dividing the input image into S × S grid cells, each cell being responsible for detecting all objects falling into its own region, wherein each cell predicts 5 values: the coordinates (x, y) of the center of the predicted box of the object, the width and height (w, h) of the object, and the confidence that the object belongs to a certain class; if the predicted box contains a detected object, Pr(Object) = 1, otherwise Pr(Object) = 0; and in the prediction stage, the class probability is the product of the class conditional probability and the predicted confidence:
$$ \Pr(\mathrm{Class}_i \mid \mathrm{Object}) \times \Pr(\mathrm{Object}) \times \mathrm{IOU}^{\mathrm{truth}}_{\mathrm{pred}} = \Pr(\mathrm{Class}_i) \times \mathrm{IOU}^{\mathrm{truth}}_{\mathrm{pred}} $$
6. The dynamic battlefield environment real-time path planning method of claim 5, wherein: the geographic entities comprise relatively stationary geographic entities.
7. The dynamic battlefield environment real-time path planning method of claim 3, wherein: in the DSAS algorithm, each track node contains, besides the coordinate information representing its spatial position, pointers to its parent node and to all of its child nodes; when a new obstacle is detected in front of the current track, the DSAS algorithm deletes the track nodes covered by the threat and all of their descendant nodes, reinserts their parent nodes into the OPEN table, then computes the cost values of the newly expanded nodes, and continues the search process until a new optimal track of the unmanned aerial vehicle is obtained.
CN202011373402.8A 2020-11-30 2020-11-30 Dynamic battlefield environment real-time path planning system and method Active CN112509056B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011373402.8A CN112509056B (en) 2020-11-30 2020-11-30 Dynamic battlefield environment real-time path planning system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011373402.8A CN112509056B (en) 2020-11-30 2020-11-30 Dynamic battlefield environment real-time path planning system and method

Publications (2)

Publication Number Publication Date
CN112509056A true CN112509056A (en) 2021-03-16
CN112509056B CN112509056B (en) 2022-12-20

Family

ID=74968697

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011373402.8A Active CN112509056B (en) 2020-11-30 2020-11-30 Dynamic battlefield environment real-time path planning system and method

Country Status (1)

Country Link
CN (1) CN112509056B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105843223A (en) * 2016-03-23 2016-08-10 东南大学 Mobile robot three-dimensional mapping and obstacle avoidance method based on space bag of words model
CN109959377A (en) * 2017-12-25 2019-07-02 北京东方兴华科技发展有限责任公司 A kind of robot navigation's positioning system and method
CN108827306A (en) * 2018-05-31 2018-11-16 北京林业大学 A kind of unmanned plane SLAM navigation methods and systems based on Multi-sensor Fusion
CN109242862A (en) * 2018-09-08 2019-01-18 西北工业大学 A kind of real-time digital surface model generation method
CN109828578A (en) * 2019-02-22 2019-05-31 南京天创电子技术有限公司 A kind of instrument crusing robot optimal route planing method based on YOLOv3
CN110675307A (en) * 2019-08-19 2020-01-10 杭州电子科技大学 Implementation method of 3D sparse point cloud to 2D grid map based on VSLAM
CN110673632A (en) * 2019-09-27 2020-01-10 中国船舶重工集团公司第七0九研究所 Unmanned aerial vehicle autonomous obstacle avoidance method and device based on visual SLAM
CN111367318A (en) * 2020-03-31 2020-07-03 华东理工大学 Dynamic obstacle environment navigation method and device based on visual semantic information

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
BJOERN SONDERMANN et al.: "Semantic Environment Perception, Localization and Mapping", 2015 Third International Conference on Artificial Intelligence, Modelling and Simulation, 31 December 2015 (2015-12-31) *
WANG Chenjie et al.: "UAV visual SLAM collaborative mapping and navigation" (无人机视觉SLAM协同建图与导航), 《测绘学报》 (Acta Geodaetica et Cartographica Sinica), no. 06, 15 June 2020 (2020-06-15) *
CHEN Jie et al.: "Online trajectory planning for helicopter nap-of-the-earth flight based on the DSAS algorithm" (基于DSAS算法的直升机贴地飞行在线航迹规划), 《飞机力学》, 30 April 2016 (2016-04-30), pages 86-89 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113050643A (en) * 2021-03-19 2021-06-29 京东鲲鹏(江苏)科技有限公司 Unmanned vehicle path planning method and device, electronic equipment and computer readable medium
CN113296514A (en) * 2021-05-24 2021-08-24 南开大学 Local path optimization method and system based on sparse banded structure
CN113296514B (en) * 2021-05-24 2022-09-27 南开大学 Local path optimization method and system based on sparse banded structure
CN115107904A (en) * 2022-07-12 2022-09-27 武汉理工大学 Reconfigurable wheel-track universal distributed driving unmanned vehicle and control method thereof
CN115107904B (en) * 2022-07-12 2023-09-05 武汉理工大学 Reconfigurable wheel-track universal distributed driving unmanned vehicle and control method thereof
CN116842127A (en) * 2023-08-31 2023-10-03 中国人民解放军海军航空大学 Self-adaptive auxiliary decision-making intelligent method and system based on multi-source dynamic data
CN116842127B (en) * 2023-08-31 2023-12-05 中国人民解放军海军航空大学 Self-adaptive auxiliary decision-making intelligent method and system based on multi-source dynamic data
CN117191048A (en) * 2023-11-07 2023-12-08 北京四象爱数科技有限公司 Emergency path planning method, equipment and medium based on three-dimensional stereopair
CN117191048B (en) * 2023-11-07 2024-01-05 北京四象爱数科技有限公司 Emergency path planning method, equipment and medium based on three-dimensional stereopair

Also Published As

Publication number Publication date
CN112509056B (en) 2022-12-20

Similar Documents

Publication Publication Date Title
CN112509056B (en) Dynamic battlefield environment real-time path planning system and method
CN109631855B (en) ORB-SLAM-based high-precision vehicle positioning method
CN110827415B (en) All-weather unknown environment unmanned autonomous working platform
CN113345018B (en) Laser monocular vision fusion positioning mapping method in dynamic scene
CN113359810B (en) Unmanned aerial vehicle landing area identification method based on multiple sensors
US10269148B2 (en) Real-time image undistortion for incremental 3D reconstruction
US20180315232A1 (en) Real-time incremental 3d reconstruction of sensor data
US20180315221A1 (en) Real-time camera position estimation with drift mitigation in incremental structure from motion
WO2023184968A1 (en) Structured scene visual slam method based on point line surface features
CN113506318B (en) Three-dimensional target perception method under vehicle-mounted edge scene
CN104200523A (en) Large-scale scene three-dimensional reconstruction method for fusion of additional information
Li et al. Review of vision-based Simultaneous Localization and Mapping
Ding et al. Laser map aided visual inertial localization in changing environment
Agostinho et al. A practical survey on visual odometry for autonomous driving in challenging scenarios and conditions
CN114088081B (en) Map construction method for accurate positioning based on multistage joint optimization
Saleem et al. Neural network-based recent research developments in SLAM for autonomous ground vehicles: A review
Liu et al. Comparison of 2D image models in segmentation performance for 3D laser point clouds
Gökçe et al. Recognition of dynamic objects from UGVs using Interconnected Neuralnetwork-based Computer Vision system
Lu et al. Vision-based localization methods under GPS-denied conditions
Tang et al. Place recognition using line-junction-lines in urban environments
Voodarla et al. S-BEV: Semantic birds-eye view representation for weather and lighting invariant 3-dof localization
CN114926637A (en) Garden map construction method based on multi-scale distance map and point cloud semantic segmentation
CN111383354B (en) SFM-based three-dimensional point cloud orientation correction method
Zhang et al. Object depth measurement from monocular images based on feature segments
Yang et al. Object detection with head direction in remote sensing images based on rotational region CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant