WO2020186678A1 - Three-dimensional map constructing method and apparatus for unmanned aerial vehicle, computer device, and storage medium - Google Patents

Three-dimensional map constructing method and apparatus for unmanned aerial vehicle, computer device, and storage medium

Info

Publication number
WO2020186678A1
Authority
WO
WIPO (PCT)
Prior art keywords
video frame
target
transformation matrix
feature
dimensional
Prior art date
Application number
PCT/CN2019/097745
Other languages
French (fr)
Chinese (zh)
Inventor
周翊民
龚亮
吴庆甜
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Publication of WO2020186678A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05 Geographic models

Definitions

  • the present invention relates to the field of computer technology, in particular to a method, device, computer equipment and storage medium for constructing a three-dimensional map of an unmanned aerial vehicle.
  • an embodiment of the present invention provides a method for constructing a three-dimensional map of a drone, the method including:
  • the three-dimensional point cloud map is combined with the target object information to obtain a three-dimensional point cloud map containing the target object information.
  • an embodiment of the present invention provides a device for constructing a three-dimensional map of a drone, the device including:
  • the extraction module is used to obtain the video frame images taken by the camera, and extract the feature points in each video frame image
  • the matching module is used to use the color histogram and scale-invariant feature transformation hybrid matching algorithm to match the feature points between the video frame images to obtain the feature point matching pairs between the video frame images;
  • a calculation module configured to calculate the pose transformation matrix between the video frame images according to the feature point matching pair between the video frame images
  • a determining module configured to determine the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix
  • the conversion module is used to convert the 3D coordinates of the feature points in the video frame image to the world coordinate system according to the 3D coordinates corresponding to the video frame image and the corresponding pose transformation matrix to obtain a 3D point cloud map;
  • the detection module is configured to use the video frame image as the input of the target detection model to obtain target information in the video frame image detected by the target detection model;
  • the combining module is used to combine the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
  • an embodiment of the present invention provides a computer device including a memory and a processor, the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the following steps:
  • the three-dimensional point cloud map is combined with the target object information to obtain a three-dimensional point cloud map containing the target object information.
  • an embodiment of the present invention provides a computer-readable storage medium storing a computer program, and when the computer program is executed by a processor, the processor executes the following steps:
  • the three-dimensional point cloud map is combined with the target object information to obtain a three-dimensional point cloud map containing the target object information.
  • the above-mentioned UAV 3D map construction method, device, computer equipment and storage medium match the feature points between the video frame images by using a color histogram and scale-invariant feature transformation hybrid matching algorithm, which can improve the accuracy and real-time performance of feature point matching.
  • through the target detection model, the target in the video frame image is identified and detected, and the target information is combined with the 3D point cloud map to obtain a 3D point cloud map containing the object information, so that the established 3D point cloud map contains more information.
  • that is, the accuracy of 3D map construction is improved by the hybrid matching of color histograms and scale-invariant feature transforms, and combining the map with the target information obtained by the target detection model makes the 3D point cloud map contain richer content, which provides support for subsequent optimal path planning.
  • Figure 1 is a flowchart of a method for constructing a three-dimensional map of a drone in an embodiment
  • Fig. 2 is a schematic diagram of a method for constructing a three-dimensional map of a drone in an embodiment
  • FIG. 3 is a schematic diagram of combining color histogram and SIFT feature matching in an embodiment
  • Figure 4 is a schematic diagram of training and prediction of a drone target detection model based on deep learning in an embodiment
  • Fig. 5 is a structural block diagram of a device for constructing a three-dimensional map of a drone in an embodiment
  • Fig. 6 is a structural block diagram of a device for constructing a three-dimensional map of a drone in another embodiment
  • FIG. 7 is a structural block diagram of a device for constructing a three-dimensional map of a drone in another embodiment
  • Figure 8 is an internal structure diagram of a computer device in an embodiment.
  • a method for constructing a three-dimensional map of a drone is proposed.
  • the method for constructing a three-dimensional map of a drone is applied to a drone or a terminal or server connected to the drone.
  • in this embodiment, application on a drone is taken as an example; the method specifically includes the following steps:
  • Step 102 Obtain video frame images captured by the camera, and extract feature points in each video frame image.
  • feature points can be simply understood as more prominent points in the image, such as contour points, bright spots in a darker area, and dark spots in a brighter area.
  • the camera of the drone may be an RGB-D camera; the color image and the depth image obtained by shooting are acquired and aligned in time, and the feature points are then extracted from the color image. Feature extraction for the feature points can use the color histogram and the scale-invariant feature transform.
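  • As a rough illustration of this step, the sketch below assumes OpenCV with SIFT support and a depth image already registered to the color image; helper names such as extract_features are illustrative and not from the patent. It extracts SIFT keypoints from the color frame and attaches a small local color histogram and a depth value to each keypoint.

```python
import cv2
import numpy as np

def extract_features(color_img, depth_img, patch=15):
    """Extract SIFT keypoints/descriptors plus a local HSV color histogram per keypoint.

    Assumes color_img (BGR) and depth_img (uint16, millimeters) are already
    time-aligned and registered to the same viewpoint.
    """
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(
        cv2.cvtColor(color_img, cv2.COLOR_BGR2GRAY), None)

    hsv = cv2.cvtColor(color_img, cv2.COLOR_BGR2HSV)
    h, w = depth_img.shape[:2]
    half = patch // 2
    color_hists, depths = [], []
    for kp in keypoints:
        x, y = int(round(kp.pt[0])), int(round(kp.pt[1]))
        x0, x1 = max(0, x - half), min(w, x + half + 1)
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        roi = hsv[y0:y1, x0:x1]
        # 2D hue-saturation histogram describing the local "color" of the point
        hist = cv2.calcHist([roi], [0, 1], None, [16, 16], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        color_hists.append(hist)
        depths.append(float(depth_img[y, x]) / 1000.0)  # meters
    return keypoints, descriptors, color_hists, np.array(depths)
```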
  • Step 104 Use a color histogram and a scale-invariant feature transform hybrid matching algorithm to match the feature points between the video frame images to obtain a feature point matching pair between the video frame images.
  • the color histogram matching algorithm focuses on the matching of color features
  • the scale-invariant feature transform (SIFT) focuses on the matching of shape features. Therefore, the color histogram matching algorithm and the scale-invariant feature transform are mixed; that is, the "color" of the color histogram is combined with the "shape" of the SIFT algorithm, which improves the accuracy of feature recognition and of feature point matching.
  • it is also beneficial to improve the real-time performance of recognition, thereby helping to improve the real-time performance and accuracy of subsequent 3D point cloud map generation.
  • feature matching is performed according to the features of the feature points to obtain feature point matching pairs between the video frame images. Since the drone is constantly flying, the same point in real space appears at different positions in different video frames. By acquiring the features of the feature points in the preceding and following video frames and then matching according to those features, the positions of the same real-space point in different video frames are obtained.
  • two adjacent video frame images are obtained, the features of multiple feature points are extracted from the previous video frame image and the next video frame image, and the features of the feature points are then matched; the matching feature points in the previous video frame image and the following video frame image form feature point matching pairs.
  • for example, the feature points in the previous video frame image are P1, P2, P3, ..., Pn, and the corresponding matching feature points in the next video frame image are Q1, Q2, Q3, ..., Qn; then P1 and Q1, P2 and Q2, and P3 and Q3 each form a feature point matching pair.
  • Feature point matching can use brute-force matching or the Fast Library for Approximate Nearest Neighbors (FLANN) algorithm for feature matching.
  • the Fast Nearest Neighbor algorithm judges whether the ratio between the closest matching distance and the next-closest matching distance exceeds a set threshold. If it exceeds the preset threshold, it is determined that the matching is successful, thereby reducing mismatched point pairs.
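  • A minimal sketch of this ratio-test idea, using OpenCV's FLANN matcher on SIFT descriptors; the 0.7 ratio and other parameter values are illustrative choices, not taken from the patent.

```python
import cv2

def match_descriptors(desc_prev, desc_next, ratio=0.7):
    """Match SIFT descriptors between two frames with FLANN + a ratio test."""
    index_params = dict(algorithm=1, trees=5)   # 1 = FLANN_INDEX_KDTREE
    search_params = dict(checks=50)
    flann = cv2.FlannBasedMatcher(index_params, search_params)
    knn = flann.knnMatch(desc_prev, desc_next, k=2)

    good = []
    for pair in knn:
        if len(pair) < 2:
            continue
        best, second = pair
        # keep the pair only when the best match is clearly better than the runner-up
        if best.distance < ratio * second.distance:
            good.append(best)
    return good  # each DMatch links a feature in the previous frame to one in the next
```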
  • Step 106 Calculate the pose transformation matrix between the video frame images according to the feature point matching pair between the video frame images.
  • the pose transformation matrix between the video frame images can be calculated according to the correspondence between the positions.
  • Step 108 Determine the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix.
  • the three-dimensional coordinates corresponding to the video frame image refer to the three-dimensional coordinates corresponding to the camera in the drone.
  • the three-dimensional coordinates of any video frame image can be calculated according to the transformation relationship.
  • the three-dimensional coordinates corresponding to a video frame image actually refer to the three-dimensional coordinates of the location where the camera was when it captured that video frame image.
  • Step 110 Convert the three-dimensional coordinates of the feature points in the video frame image to the world coordinate system according to the three-dimensional coordinates corresponding to the video frame image and the corresponding pose transformation matrix to obtain a three-dimensional point cloud map.
  • the coordinates of the feature points in the video frame image are likewise in the camera coordinate system; in order to convert the coordinates of the feature points into the world coordinate system, the transformation is performed according to the pose transformation matrix to obtain the three-dimensional coordinates of the feature points in world coordinates, thereby obtaining a three-dimensional point cloud map.
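  • For illustration, the sketch below assumes a standard pinhole camera model with intrinsics fx, fy, cx, cy and a camera-to-world pose given by a rotation R_wc and translation t_wc (names are hypothetical); it back-projects a pixel with depth into camera coordinates and then transforms the point into the world frame.

```python
import numpy as np

def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with depth (meters) into the camera frame."""
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

def camera_to_world(p_cam, R_wc, t_wc):
    """Transform a camera-frame point into the world frame: p_w = R_wc @ p_cam + t_wc."""
    return R_wc @ p_cam + t_wc

# Example: accumulate world-frame points of matched features into the cloud
# cloud = [camera_to_world(pixel_to_camera(u, v, d, fx, fy, cx, cy), R_wc, t_wc)
#          for (u, v, d) in frame_feature_pixels]
```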
  • Step 112 The video frame image is used as the input of the target detection model, and the target object information in the video frame image detected by the target detection model is obtained.
  • the target detection model is obtained by pre-training, and the target detection model is used to detect the target object appearing in the video frame image, for example, a car. Since the video frame image may contain multiple objects, if the category of each object needs to be recognized, multiple target detection models need to be trained accordingly. After the target detection model is trained, the video frame image is used as the input of the target detection model, and the target object and the location of the target object in the video frame image can be detected.
  • Step 114 Combine the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
  • the feature points corresponding to the target can be determined, and the target information corresponding to the feature points can be marked in the three-dimensional point cloud map, so that the established three-dimensional point cloud map has a richer amount of information.
  • the target detection model is used for local perception, and the construction of the 3D point cloud map is based on global perception, which combines global perception and local perception, thereby increasing the richness of the 3D point cloud map.
  • the above-mentioned UAV 3D map construction method uses a color histogram and a scale-invariant feature transformation hybrid matching algorithm to match feature points between video frame images, which can improve the accuracy and real-time performance of feature point matching.
  • through the target detection model, the target in the video frame image is identified and detected, and the target information is combined with the 3D point cloud map to obtain a 3D point cloud map containing the object information, so that the established 3D point cloud map contains more information.
  • that is, the accuracy of 3D map construction is improved by the hybrid matching of color histograms and scale-invariant feature transforms, and combining the map with the target information obtained by the target detection model makes the 3D point cloud map contain richer content, which provides support for subsequent optimal path planning and improves the intelligence of the drone's environment perception.
  • a schematic diagram of a method for constructing a three-dimensional map of a UAV includes two parts: global perception and local perception.
  • in global perception, a hybrid structural framework of color histogram and SIFT features is used for matching, and then positioning and 3D point cloud map construction are performed.
  • Local perception uses the target detection model to identify the target in the video frame image. Finally, the two are combined to obtain a three-dimensional point cloud map containing the target information.
  • the use of a color histogram and a scale-invariant feature transformation hybrid matching algorithm to match feature points between video frame images to obtain feature point matching pairs between video frame images includes: using the color histogram feature matching algorithm to match the feature points between the video frame images to obtain a first matching pair set; and using the scale-invariant feature transform matching algorithm to further match the matching points in the first matching pair set to obtain the target feature point matching pairs.
  • the color histogram is used for preliminary feature point matching to obtain the first matching pair set, and then the scale-invariant feature transform matching algorithm is used to further match the matching points in the first matching pair set to obtain the target feature point matching pair .
  • the matching of the color histogram adopts the Bhattacharyya distance calculation, or the correlation distance calculation.
  • as shown in FIG. 3, which is a schematic diagram of the combination of color histogram and SIFT feature matching in an embodiment, the two are in a cascade relationship.
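  • A sketch of one way such a cascade could look (illustrative only, not the patent's implementation): candidate matches are first gated by the Bhattacharyya distance between local color histograms, and the survivors are then confirmed by SIFT descriptor distance with a ratio test; the thresholds are assumed values.

```python
import cv2
import numpy as np

def cascade_match(hists_a, desc_a, hists_b, desc_b,
                  hist_thresh=0.4, ratio=0.7):
    """Cascade matching sketch: a color-histogram gate followed by SIFT confirmation.

    Stage 1: for every feature in frame A keep only frame-B features whose local
    color histogram is close (small Bhattacharyya distance) -> first matching set.
    Stage 2: among those candidates, pick the best SIFT-descriptor match and apply
    a ratio test -> target feature point matching pairs.
    """
    matches = []
    for i, (hist_a, da) in enumerate(zip(hists_a, desc_a)):
        # Stage 1: color gate
        candidates = [j for j, hist_b in enumerate(hists_b)
                      if cv2.compareHist(hist_a, hist_b,
                                         cv2.HISTCMP_BHATTACHARYYA) < hist_thresh]
        if len(candidates) < 2:
            continue
        # Stage 2: SIFT descriptor distance among surviving candidates
        dists = sorted((np.linalg.norm(da - desc_b[j]), j) for j in candidates)
        (d0, j0), (d1, _) = dists[0], dists[1]
        if d0 < ratio * d1:
            matches.append((i, j0))
    return matches
```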
  • calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images includes: obtaining the three-dimensional coordinates of each feature point in the feature point matching pairs; calculating the converted three-dimensional coordinates obtained by converting the three-dimensional coordinates of the feature points in one video frame image into another video frame image; obtaining the target three-dimensional coordinates corresponding to the matching feature points in the other video frame image; and calculating the pose transformation matrix according to the converted three-dimensional coordinates and the target three-dimensional coordinates.
  • the three-dimensional coordinates of each feature point are obtained.
  • the three-dimensional coordinates are obtained from the color image and the depth image taken by the RGB-D camera.
  • the color image is used to identify the x and y values of the feature point, and the depth image is used to obtain the corresponding z value.
  • the feature point matching pairs are regarded as two sets: the set of feature points in the first video frame image is P = {P_i ∈ R³ | i = 1, 2, ..., N}, and the set of feature points in the second video frame image is Q = {Q_i ∈ R³ | i = 1, 2, ..., N}. The error between the two point sets is taken as the cost function, and the corresponding rotation matrix R and translation vector t are calculated by minimizing it:
  • min over R, t of (1/2) · Σ_{i=1..N} ‖P_i − (R·Q_i + t)‖²
  • where R and t are the rotation matrix and translation vector respectively.
  • the steps of the iterative closest point algorithm are:
  • the constrained rotation matrix and translation vector can be expressed by an unconstrained Lie algebra, and the number of feature points whose error distance is less than the set threshold, that is, the number of inliers, is recorded. If the error distance E_d calculated in step 3) is less than the threshold and the number of inliers is greater than the set threshold, or if the number of iterations reaches the set limit, the iteration ends; otherwise, return to step 1) for the next iteration.
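  • As a rough, self-contained sketch of this kind of iteration (not the patent's exact procedure): given matched 3D point sets, the rigid transform minimizing the cost function above can be computed in closed form with an SVD, and a simple ICP loop re-estimates correspondences and repeats until the error or the iteration count reaches its limit.

```python
import numpy as np

def best_fit_transform(P, Q):
    """Closed-form R, t minimizing sum ||P_i - (R Q_i + t)||^2 (SVD / Kabsch)."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (Q - cQ).T @ (P - cP)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # avoid a reflection
        Vt[2, :] *= -1
        R = Vt.T @ U.T
    t = cP - R @ cQ
    return R, t

def icp(P, Q, R0=np.eye(3), t0=np.zeros(3), max_iter=30, tol=1e-4):
    """Simple point-to-point ICP; R0, t0 can come from an initial (e.g. IMU) estimate."""
    R, t = R0, t0
    for _ in range(max_iter):
        Q_moved = (R @ Q.T).T + t
        # nearest-neighbour correspondences from transformed Q to P
        d2 = ((P[None, :, :] - Q_moved[:, None, :]) ** 2).sum(-1)
        E_d = np.sqrt(d2.min(axis=1)).mean()   # mean error distance under current R, t
        if E_d < tol:
            break
        idx = d2.argmin(axis=1)
        R, t = best_fit_transform(P[idx], Q)
    return R, t
```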
  • the target detection model is obtained based on deep learning model training; before the video frame image is used as the input of the target detection model to obtain the detected target object output by the target detection model, It also includes: acquiring training video image samples, the training video image samples including positive samples and negative samples, the positive samples include a target and a position mark of the target in the video image; according to the training video The image samples train the target detection model to obtain a trained target detection model.
  • the target detection model is obtained by training with a deep learning model.
  • in order to train the target detection model, the training video image samples are first obtained, and positive samples and negative samples are set.
  • a positive sample is a video image that contains the target and the position mark of the target in the video image; through training, a target detection model that can detect the target is obtained.
  • as shown in FIG. 4, in one embodiment, the training and prediction of the UAV target detection model based on deep learning is divided into two parts: preprocessing and real-time detection.
  • to detect targets in real time, pre-processing operations are first performed on the data collected by the drone: the collected video stream is divided into video frame images, the targets in the images are marked, and the data are divided into training and test data sets. A deep learning framework is used to train the model, and the saved model is then applied to the video stream returned by the platform to complete real-time target detection.
  • A small drone carrier equipped with an industrial camera is used to extensively sample a large amount of video data of the scene from the drone's perspective; the recognition targets of the drone are determined, and the required targets are marked in the acquired video data.
  • The preprocessed data are used to train the neural network model, and the model parameters are adjusted until the training results meet the convergence conditions. The trained model is saved for subsequent target detection and loaded onto the drone, and target detection experiments are conducted with the drone while the model is continuously adjusted and optimized.
  • the deep learning model adopts the YOLOv3 network structure (whose backbone is also called Darknet-53) and is a fully convolutional network, including the introduction of a residual structure, that is, ResNet-style skip connections, with a large number of residual network blocks.
  • Convolutions with a stride of 2 are used for down-sampling, while up-sampling and route operations allow detection to be performed at three scales within one network structure.
  • Dimension clusters are used as anchor boxes to predict bounding boxes, the sum of squared error loss is used during training, and the objectness score of each bounding box is predicted through logistic regression. If a prior bounding box is not the best but overlaps the object to be detected by more than a certain threshold, that prediction is ignored.
  • With a threshold of 0.5, the system assigns only one bounding box prior to each object to be detected. If a prior bounding box is not assigned to the object, it incurs no loss for coordinate or class prediction. Each box uses multi-label classification to predict the classes the bounding box may contain, and binary cross-entropy loss is used for class prediction during training.
  • the YOLOv3 lightweight target detection neural network structure is applied to the UAV platform, which improves the ability of real-time target recognition under the limited computing power of the UAV.
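  • As one possible way (not necessarily the authors' implementation) to run a trained YOLOv3 model on the returned video stream, OpenCV's DNN module can load Darknet weights; the file names, input size and the 0.5 confidence threshold below are illustrative.

```python
import cv2
import numpy as np

# Hypothetical file names for the trained model
net = cv2.dnn.readNetFromDarknet("yolov3-uav.cfg", "yolov3-uav.weights")
layer_names = net.getUnconnectedOutLayersNames()

def detect_objects(frame, conf_thresh=0.5, nms_thresh=0.4):
    """Run YOLOv3 on one video frame and return (class_id, confidence, box) tuples."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(layer_names)          # predictions at the 3 detection scales

    boxes, confidences, class_ids = [], [], []
    for output in outputs:
        for det in output:                      # det = [cx, cy, bw, bh, objectness, class scores...]
            scores = det[5:]
            class_id = int(np.argmax(scores))
            conf = float(det[4] * scores[class_id])
            if conf > conf_thresh:
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                confidences.append(conf)
                class_ids.append(class_id)
    keep = cv2.dnn.NMSBoxes(boxes, confidences, conf_thresh, nms_thresh)
    return [(class_ids[i], confidences[i], boxes[i]) for i in np.array(keep).flatten()]
```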
  • the combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information includes: acquiring the detected target object in the video frame image Location; determining a matching feature point according to the target location; marking the target information to the three-dimensional point cloud map according to the feature point.
  • according to the position of the detected target object, the feature points matching the target position are determined, and the target information matching those feature points is marked on the three-dimensional point cloud map.
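  • A sketch of this combining step with illustrative names (not the patent's code): feature points whose pixel coordinates fall inside a detected bounding box are tagged with the detected class in the point cloud.

```python
def label_point_cloud(points_world, keypoints, detections):
    """Attach target-object information to 3D map points.

    points_world: list of 3D points (world frame), one per feature point
    keypoints:    matching 2D pixel locations (u, v) of those feature points
    detections:   list of (class_id, confidence, (x, y, w, h)) from the detector
    Returns a list of (point, class_id or None) pairs.
    """
    labeled = []
    for p_w, (u, v) in zip(points_world, keypoints):
        label = None
        for class_id, conf, (x, y, w, h) in detections:
            if x <= u <= x + w and y <= v <= y + h:
                label = class_id            # feature point lies inside this target's box
                break
        labeled.append((p_w, label))
    return labeled
```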
  • the method further includes: obtaining measurement data measured by an inertial measurement unit; and calculating an initial pose transformation matrix between video frames according to the measurement data.
  • the calculating of the pose transformation matrix between the video frame images according to the feature point matching pairs then includes: calculating the target pose transformation matrix between the video frames according to the initial pose transformation matrix and the feature point matching pairs between the video frame images.
  • an inertial measurement unit is a device that measures the three-axis attitude angle (or angular velocity) and acceleration of an object.
  • the inertial measurement unit is used as the inertial parameter measurement device of the UAV.
  • the device includes a three-axis gyroscope, a three-axis accelerometer, and a three-axis magnetometer.
  • the UAV can directly read the measurement data measured by the inertial measurement unit.
  • the measurement data includes: angular velocity, acceleration, and magnetometer data.
  • the UAV's pose transformation matrix can be directly calculated based on the measurement data; however, because the inertial measurement unit accumulates errors, the pose transformation matrix obtained in this way is not accurate enough.
  • the pose transformation matrix includes a rotation matrix R and a translation vector t.
  • the initial pose transformation matrix corresponding to the measurement data is calculated by using a complementary filtering algorithm. After the initial pose transformation matrix is obtained, it is used as the initial matrix, and the Iterative Closest Point (ICP) algorithm is used to calculate the target pose transformation matrix between the video frames according to the feature point matching pairs between the video frame images.
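  • A toy sketch of one common complementary-filter form, given here as a generic example rather than the patent's specific filter: integrated gyroscope angles are blended with accelerometer tilt angles to obtain the roll and pitch used for the initial pose estimate.

```python
import numpy as np

def complementary_filter(roll, pitch, gyro, accel, dt, alpha=0.98):
    """One update step: blend integrated gyro angles with accelerometer tilt angles."""
    # integrate angular rate (rad/s) for the high-frequency part
    roll_gyro = roll + gyro[0] * dt
    pitch_gyro = pitch + gyro[1] * dt
    # accelerometer gives a drift-free but noisy low-frequency tilt estimate
    ax, ay, az = accel
    roll_acc = np.arctan2(ay, az)
    pitch_acc = np.arctan2(-ax, np.sqrt(ay ** 2 + az ** 2))
    roll = alpha * roll_gyro + (1 - alpha) * roll_acc
    pitch = alpha * pitch_gyro + (1 - alpha) * pitch_acc
    return roll, pitch
```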
  • the method further includes: calculating the amount of motion between the current video frame and the previous key frame; if the amount of motion is greater than the preset threshold, taking the current video frame as a key frame; when the current video frame is a key frame, matching the current video frame with the key frames in the existing key frame library, and if there is a key frame in the key frame library that matches the current video frame, taking the current video frame as the loop frame; and optimizing and updating the corresponding pose transformation matrix according to the loop frame to obtain an updated pose transformation matrix;
  • the determining of the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix then includes: determining the three-dimensional coordinates corresponding to each video frame image according to the updated pose transformation matrix.
  • the calculation complexity can be reduced by extracting key frames. The captured video frames are relatively dense; for example, generally about 30 frames are captured within one second, so the similarity between adjacent frames is very high, or they may even be identical, and processing every frame would undoubtedly increase the computational complexity. Therefore, the complexity can be reduced by extracting key frames. Specifically, the first video frame is taken as a key frame, and the amount of motion between the current video frame and the previous key frame is then calculated; if the amount of motion falls within the set threshold range, the current frame is selected as a key frame, where the calculation formula of the amount of motion is:
  • E_m = ω1 · (|t_x| + |t_y| + |t_z|) + ω2 · (|θ_x| + |θ_y| + |θ_z|)
  • where E_m represents the measure of the amount of motion; t_x, t_y and t_z are the three translational components of the translation vector t; θ_x, θ_y and θ_z denote the rotation angles about the three axes; and ω1 and ω2 are the balance weights of translation and rotation respectively. For the visual field captured by the camera, rotation brings about larger scene changes more easily than translation, so the value of ω2 is larger than ω1, and the specific values should be adjusted according to the actual situation.
  • the loop detection method is used to optimize and update the obtained pose transformation matrix.
  • a closed loop detection algorithm is used for loop detection. After the loop detection is performed, the target pose transformation matrix is updated and optimized according to the loop detection result to obtain a more accurate pose transformation matrix, which is called "updated pose transformation matrix" for distinction. Determine the three-dimensional coordinates corresponding to each video frame image according to the updated pose transformation matrix.
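  • A simplified sketch of checking a new key frame against the key frame library (a generic descriptor-matching approach, not the patent's specific closed-loop algorithm): count ratio-test matches against each stored key frame and declare a loop when enough matches are found.

```python
import cv2

def find_loop_frame(desc_current, keyframe_library, min_matches=40, ratio=0.7):
    """Return the index of a matching key frame in the library, or None.

    keyframe_library: list of SIFT descriptor arrays, one per stored key frame.
    """
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
    for idx, desc_kf in enumerate(keyframe_library):
        knn = flann.knnMatch(desc_current, desc_kf, k=2)
        good = [m for m, n in (p for p in knn if len(p) == 2)
                if m.distance < ratio * n.distance]
        if len(good) >= min_matches:
            return idx          # loop detected: the current frame revisits this key frame
    return None
```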
  • a three-dimensional map construction device for drones is provided, which includes:
  • the extraction module 502 is configured to obtain video frame images taken by the camera, and extract feature points in each video frame image
  • the matching module 504 is configured to use a color histogram and a scale-invariant feature transform hybrid matching algorithm to match feature points between video frame images to obtain feature point matching pairs between video frame images;
  • the calculation module 506 is configured to calculate the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images;
  • the determining module 508 is configured to determine the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix
  • the conversion module 510 is configured to convert the 3D coordinates of the feature points in the video frame image to the world coordinate system according to the 3D coordinates corresponding to the video frame image and the corresponding pose transformation matrix to obtain a 3D point cloud map;
  • the detection module 512 is configured to use the video frame image as the input of the target detection model, and obtain target information in the video frame image detected by the target detection model;
  • the combining module 514 is configured to combine the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
  • the matching module 504 is further configured to use a color histogram feature matching algorithm to match feature points between video frame images to obtain a first set of matching pairs; use a scale-invariant feature transform matching algorithm to The first matching pair further matches the matching points in the set to obtain the target feature point matching pair.
  • the calculation module 506 is further configured to obtain the three-dimensional coordinates of each feature point in the feature point matching pair; calculate the conversion obtained by converting the three-dimensional coordinates of the feature point in one video frame image to another video frame image Three-dimensional coordinates; obtain the target three-dimensional coordinates corresponding to the corresponding matching feature points in the another video frame image; calculate the pose transformation matrix according to the converted three-dimensional coordinates and the target three-dimensional coordinates.
  • the target detection model is obtained based on deep learning model training;
  • the above-mentioned three-dimensional map construction device for drones further includes: a training module for obtaining training video image samples, the training video image samples including positive Samples and negative samples, the positive sample includes a target and a position mark of the target in the video image; training the target detection model according to the training video image sample to obtain a trained target detection model.
  • the combining module 514 is also used to obtain the target position of the detected target in the video frame image; determine the matching feature points according to the target position; and mark the object category information on the three-dimensional point cloud map according to the feature points.
  • the above-mentioned three-dimensional map construction device for drones further includes:
  • the initial calculation module 505 is configured to obtain measurement data measured by the inertial measurement unit, and calculate an initial pose transformation matrix between video frames according to the measurement data;
  • the calculation module is further configured to calculate the target pose transformation matrix between the video frames according to the initial pose transformation matrix and the feature point matching pairs between the video frame images.
  • the above-mentioned three-dimensional map construction device for drones further includes:
  • the key frame determination module 516 is used to calculate the amount of motion between the current video frame and the previous key frame. If the amount of motion is greater than the preset threshold, the current video frame is used as the key frame.
  • the loopback frame determination module 518 is configured to match the current video frame with the key frames in the existing key frame library when the current video frame is a key frame, and if there is a key frame in the key frame library that matches the current video frame, use the current video frame as a loop frame.
  • the optimization module 520 is configured to optimize and update the corresponding pose transformation matrix according to the loop frame to obtain an updated pose transformation matrix.
  • the determining module 508 is further configured to determine the three-dimensional coordinates corresponding to each video frame image according to the updated pose transformation matrix.
  • Fig. 8 shows an internal structure diagram of a computer device in an embodiment.
  • the computer equipment can be a drone, or a terminal or server connected to the drone.
  • the computer device includes a processor, a memory, and a network interface connected through a system bus.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device stores an operating system and may also store a computer program.
  • when executed by the processor, the computer program can enable the processor to implement the method for constructing a three-dimensional map of the drone.
  • a computer program may also be stored in the internal memory, and when the computer program is executed by the processor, the processor can execute the method for constructing a three-dimensional map of the UAV.
  • the network interface is used to communicate with an external device.
  • FIG. 8 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • the specific computer device may include more or fewer parts than shown in the figure, or combine some parts, or have a different arrangement of parts.
  • the method for constructing a three-dimensional map of an unmanned aerial vehicle can be implemented in the form of a computer program, and the computer program can be run on a computer device as shown in FIG. 8.
  • the memory of the computer equipment can store the various program templates that make up the UAV 3D map construction device, for example, the extraction module 502, the matching module 504, the calculation module 506, the determination module 508, the conversion module 510, the detection module 512, and the combination module 514.
  • a computer device includes a memory and a processor.
  • the memory stores a computer program.
  • the processor executes the following steps: acquiring the video frame images captured by a camera, and extracting the feature points in each video frame image; using a color histogram and scale-invariant feature transformation hybrid matching algorithm to match the feature points between the video frame images to obtain feature point matching pairs between the video frame images; calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images; determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix; converting the three-dimensional coordinates of the feature points in the video frame images to the world coordinate system according to the three-dimensional coordinates corresponding to the video frame images and the corresponding pose transformation matrices to obtain a three-dimensional point cloud map; using the video frame image as the input of the target detection model to obtain the target object information in the video frame image detected by the target detection model; and combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
  • the use of a color histogram and a scale-invariant feature transformation hybrid matching algorithm to match feature points between video frame images to obtain feature point matching pairs between video frame images includes: using the color histogram feature matching algorithm to match the feature points between the video frame images to obtain a first matching pair set; and using the scale-invariant feature transform matching algorithm to further match the matching points in the first matching pair set to obtain the target feature point matching pairs.
  • calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images includes: obtaining the three-dimensional coordinates of each feature point in the feature point matching pairs; calculating the converted three-dimensional coordinates obtained by converting the three-dimensional coordinates of the feature points in one video frame image into another video frame image; obtaining the target three-dimensional coordinates corresponding to the matching feature points in the other video frame image; and calculating the pose transformation matrix according to the converted three-dimensional coordinates and the target three-dimensional coordinates.
  • the target detection model is obtained based on deep learning model training; before the video frame image is used as the input of the target detection model to obtain the detected target object output by the target detection model, It also includes: acquiring training video image samples, the training video image samples including positive samples and negative samples, the positive samples include a target and a position mark of the target in the video image; according to the training video The image samples train the target detection model to obtain a trained target detection model.
  • the combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information includes: acquiring the detected target object in the video frame image Location; determining a matching feature point according to the target location; marking the object category information to the three-dimensional point cloud map according to the feature point.
  • when the computer program is executed by the processor, it is also used to perform the following steps: obtaining measurement data measured by the inertial measurement unit; calculating an initial pose transformation matrix between video frames based on the measurement data; and said calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images includes: calculating the target pose transformation matrix between the video frames according to the initial pose transformation matrix and the feature point matching pairs between the video frame images.
  • the current video frame is taken as the loop frame, and the corresponding pose transformation matrix is optimized and updated according to the loop frame to obtain an updated pose transformation matrix.
  • a computer-readable storage medium storing a computer program.
  • the processor executes the following steps: acquiring the video frame images taken by a camera, and extracting the feature points in each video frame image; using a color histogram and scale-invariant feature transformation hybrid matching algorithm to match the feature points between the video frame images to obtain feature point matching pairs between the video frame images; calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images; determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix; converting the three-dimensional coordinates of the feature points in the video frame images to the world coordinate system according to the three-dimensional coordinates corresponding to the video frame images and the corresponding pose transformation matrices to obtain a three-dimensional point cloud map; using the video frame image as the input of the target detection model to obtain the target object information in the video frame image detected by the target detection model; and combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
  • the use of a color histogram and a scale-invariant feature transformation hybrid matching algorithm to match feature points between video frame images to obtain feature point matching pairs between video frame images includes: using the color histogram feature matching algorithm to match the feature points between the video frame images to obtain a first matching pair set; and using the scale-invariant feature transform matching algorithm to further match the matching points in the first matching pair set to obtain the target feature point matching pairs.
  • calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images includes: obtaining the three-dimensional coordinates of each feature point in the feature point matching pairs; calculating the converted three-dimensional coordinates obtained by converting the three-dimensional coordinates of the feature points in one video frame image into another video frame image; obtaining the target three-dimensional coordinates corresponding to the matching feature points in the other video frame image; and calculating the pose transformation matrix according to the converted three-dimensional coordinates and the target three-dimensional coordinates.
  • the target detection model is obtained based on deep learning model training; before the video frame image is used as the input of the target detection model to obtain the detected target object output by the target detection model, It also includes: acquiring training video image samples, the training video image samples including positive samples and negative samples, the positive samples include a target and a position mark of the target in the video image; according to the training video The image samples train the target detection model to obtain a trained target detection model.
  • the combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information includes: acquiring the detected target object in the video frame image Location; determining a matching feature point according to the target location; marking the object category information to the three-dimensional point cloud map according to the feature point.
  • when the computer program is executed by the processor, it is also used to perform the following steps: obtaining measurement data measured by the inertial measurement unit; calculating an initial pose transformation matrix between video frames based on the measurement data; and said calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images includes: calculating the target pose transformation matrix between the video frames according to the initial pose transformation matrix and the feature point matching pairs between the video frame images.
  • the current video frame is taken as the loop frame, and the corresponding pose transformation matrix is optimized and updated according to the loop frame to obtain an updated pose transformation matrix.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Abstract

The present application relates to a three-dimensional map constructing method for an unmanned aerial vehicle. Said method comprises: acquiring video frame images photographed by a camera, extracting feature points in each of the video frame images; using a color histogram and scale-invariant feature transformation hybrid matching algorithm to perform matching on the feature points, to obtain feature point matching pairs; performing calculation according to the feature point matching pairs, to obtain an attitude transformation matrix; determining, according to the attitude transformation matrix, three-dimensional coordinates corresponding to each video frame image; converting the three-dimensional coordinates of the feature points in the video frame images into a world coordinate system, to obtain a three-dimensional point cloud map; taking the video frame images as an input to a target detection model, to obtain target object information; and combining the three-dimensional point cloud map with the target object information, to obtain a three-dimensional point cloud map containing the target object information. The method improves the real-time performance and accuracy of the construction of a three-dimensional point cloud map, containing rich information. In addition, further provided are a three-dimensional map constructing apparatus for an unmanned aerial vehicle, a computer device and a storage medium.

Description

UAV three-dimensional map construction method, device, computer equipment and storage medium

Technical field

The present invention relates to the field of computer technology, in particular to a method, device, computer equipment and storage medium for constructing a three-dimensional map of an unmanned aerial vehicle.

Background technique

With the development of science and technology, UAVs are becoming smaller and more intelligent, and their flight space has expanded to jungles, cities and even buildings. Environmental perception is the basis of a UAV's understanding of the environment, navigation, planning and behavioral decision-making during its work. The most important goal of environment perception is to build a complete three-dimensional map, and then plan and navigate routes based on the three-dimensional map.

The construction of traditional 3D maps is either low in accuracy or low in real-time performance, and the constructed maps carry relatively little information, which affects subsequent path planning and navigation.
Summary of the invention

Therefore, it is necessary to address the above problems and provide a method, device, computer equipment, and storage medium for building a three-dimensional UAV map that can meet good real-time requirements while ensuring high accuracy and that contains a large amount of information.

In the first aspect, an embodiment of the present invention provides a method for constructing a three-dimensional map of a drone, the method including:

obtaining the video frame images taken by the camera, and extracting the feature points in each video frame image;

using a color histogram and scale-invariant feature transformation hybrid matching algorithm to match the feature points between the video frame images, to obtain feature point matching pairs between the video frame images;

calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images;

determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix;

converting the 3D coordinates of the feature points in the video frame images to the world coordinate system according to the 3D coordinates corresponding to the video frame images and the corresponding pose transformation matrices, to obtain a 3D point cloud map;

using the video frame image as an input of a target detection model to obtain target information in the video frame image detected by the target detection model;

combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
In the second aspect, an embodiment of the present invention provides a device for constructing a three-dimensional map of a drone, the device including:

an extraction module, used to obtain the video frame images taken by the camera, and extract the feature points in each video frame image;

a matching module, used to apply the color histogram and scale-invariant feature transformation hybrid matching algorithm to match the feature points between the video frame images to obtain the feature point matching pairs between the video frame images;

a calculation module, configured to calculate the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images;

a determining module, configured to determine the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix;

a conversion module, used to convert the 3D coordinates of the feature points in the video frame images to the world coordinate system according to the 3D coordinates corresponding to the video frame images and the corresponding pose transformation matrices, to obtain a 3D point cloud map;

a detection module, configured to use the video frame image as the input of the target detection model to obtain target information in the video frame image detected by the target detection model;

a combining module, used to combine the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
In a third aspect, an embodiment of the present invention provides a computer device including a memory and a processor, the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the following steps:

obtaining the video frame images taken by the camera, and extracting the feature points in each video frame image;

using a color histogram and scale-invariant feature transformation hybrid matching algorithm to match the feature points between the video frame images, to obtain feature point matching pairs between the video frame images;

calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images;

determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix;

converting the 3D coordinates of the feature points in the video frame images to the world coordinate system according to the 3D coordinates corresponding to the video frame images and the corresponding pose transformation matrices, to obtain a 3D point cloud map;

using the video frame image as an input of a target detection model to obtain target information in the video frame image detected by the target detection model;

combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program, and when the computer program is executed by a processor, the processor executes the following steps:

obtaining the video frame images taken by the camera, and extracting the feature points in each video frame image;

using a color histogram and scale-invariant feature transformation hybrid matching algorithm to match the feature points between the video frame images, to obtain feature point matching pairs between the video frame images;

calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images;

determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix;

converting the 3D coordinates of the feature points in the video frame images to the world coordinate system according to the 3D coordinates corresponding to the video frame images and the corresponding pose transformation matrices, to obtain a 3D point cloud map;

using the video frame image as an input of a target detection model to obtain target information in the video frame image detected by the target detection model;

combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
The above-mentioned UAV 3D map construction method, device, computer equipment and storage medium match the feature points between the video frame images by using a color histogram and scale-invariant feature transformation hybrid matching algorithm, which can improve the accuracy and real-time performance of feature point matching. In addition, the target in the video frame image is identified and detected through the target detection model, and the target information is combined with the 3D point cloud map to obtain a 3D point cloud map containing the object information, so that the established 3D point cloud map contains more information. That is, the accuracy of 3D map construction is improved by the hybrid matching of color histograms and scale-invariant feature transforms, and combining the map with the target information obtained by the target detection model makes the 3D point cloud map contain richer content, which provides support for subsequent optimal path planning.
Description of the drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the drawings in the following description are only some embodiments of the present invention. For those of ordinary skill in the art, without creative work, other drawings can be obtained based on the structures shown in these drawings.

Figure 1 is a flowchart of a method for constructing a three-dimensional map of a drone in an embodiment;

Figure 2 is a schematic diagram of a method for constructing a three-dimensional map of a drone in an embodiment;

Figure 3 is a schematic diagram of combining color histogram and SIFT feature matching in an embodiment;

Figure 4 is a schematic diagram of training and prediction of a drone target detection model based on deep learning in an embodiment;

Figure 5 is a structural block diagram of a device for constructing a three-dimensional map of a drone in an embodiment;

Figure 6 is a structural block diagram of a device for constructing a three-dimensional map of a drone in another embodiment;

Figure 7 is a structural block diagram of a device for constructing a three-dimensional map of a drone in yet another embodiment;

Figure 8 is an internal structure diagram of a computer device in an embodiment.
具体实施方式detailed description
为了使本发明的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本发明进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本发明,并不用于限定本发明。In order to make the objectives, technical solutions, and advantages of the present invention clearer, the following further describes the present invention in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, but not to limit the present invention.
As shown in Figure 1, a method for constructing a three-dimensional map for an unmanned aerial vehicle (UAV) is proposed. The method is applied to a UAV, or to a terminal or server connected to the UAV; this embodiment takes application to the UAV as an example. The method specifically includes the following steps:
步骤102,获取相机拍摄得到的视频帧图像,提取每个视频帧图像中的特征点。Step 102: Obtain video frame images captured by the camera, and extract feature points in each video frame image.
其中,特征点可以简单理解为图像中比较显著的点,如轮廓点、较暗区域中的亮点,较亮区域中的暗点等。在一个实施例中,无人机的相机可以采用RGB-D相机,获取拍摄得到的彩色图像和深度图像,并将获取到的彩色图像和深度图像进行时间上对齐,然后提取彩色图像中的特征点,特征点的特征提取可以采用颜色直方图和尺度不变特征变换进行特征提取。Among them, feature points can be simply understood as more prominent points in the image, such as contour points, bright spots in a darker area, and dark spots in a brighter area. In one embodiment, the camera of the drone may use an RGB-D camera to obtain the color image and depth image obtained by shooting, and align the obtained color image and depth image in time, and then extract the features in the color image Point, feature point feature extraction can use color histogram and scale-invariant feature transformation for feature extraction.
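For illustration, the following is a minimal sketch of how a feature point's camera-frame 3D coordinates can be recovered from an aligned color/depth pair once the pixel location of the feature has been found; the pinhole intrinsics fx, fy, cx, cy and the function name are assumptions made for the example, not values from this disclosure. Python with NumPy-style indexing is assumed here and in the later sketches.

```python
def back_project(u, v, depth_image, fx, fy, cx, cy):
    """Map a pixel (u, v) plus its aligned depth reading to camera-frame XYZ."""
    z = depth_image[v, u]          # depth value at the pixel (row v, column u)
    x = (u - cx) * z / fx          # pinhole model: x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return x, y, z
```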
步骤104,采用颜色直方图和尺度不变特征变换混合匹配算法对视频帧图像之间的特征点进行匹配,得到视频帧图像之间的特征点匹配对。Step 104: Use a color histogram and a scale-invariant feature transform hybrid matching algorithm to match the feature points between the video frame images to obtain a feature point matching pair between the video frame images.
The color histogram matching algorithm focuses on matching color features, while the scale-invariant feature transform (SIFT) focuses on matching shape features. Mixing the color histogram matching algorithm with the scale-invariant feature transform therefore combines the "color" of the color histogram with the "shape" of the SIFT algorithm, which improves the accuracy of feature recognition and of feature point matching, and also helps the real-time performance of recognition, thereby improving the real-time performance and accuracy of the subsequent three-dimensional point cloud map generation.
在提取到每个视频帧图像中的特征点后,根据特征点的特征进行特征匹配,得到视频帧图像之间的特征点匹配对。由于无人机是在不断地飞行中,所以真实空间中的同一点在不同视频帧图像中的位置不同,通过获取前后视频帧中特征点的特征,然后根据特征进行匹配,得到真实空间中的同一点在不同视频帧中的位置。After the feature points in each video frame image are extracted, feature matching is performed according to the features of the feature points to obtain feature point matching pairs between the video frame images. Since the drone is constantly flying, the position of the same point in the real space is different in different video frames. By acquiring the features of the feature points in the front and rear video frames, and then matching according to the features, the real space is obtained. The position of the same point in different video frames.
In one embodiment, two adjacent video frame images are obtained, the features of multiple feature points are extracted from the previous video frame image and the following video frame image, and the features of the feature points are then matched, so that the matching feature points in the previous and following video frame images form feature point matching pairs. For example, if the feature points in the previous video frame image are P1, P2, P3, ..., Pn and the correspondingly matched feature points in the following video frame image are Q1, Q2, Q3, ..., Qn, then P1 and Q1 form a feature point matching pair, P2 and Q2 form a feature point matching pair, P3 and Q3 form a feature point matching pair, and so on. The feature point matching can use brute-force matching (Brute Force) or the fast approximate nearest neighbor (FLANN) algorithm; the fast approximate nearest neighbor matching judges whether the ratio between the closest matching distance and the second-closest matching distance passes a set threshold, and only matches that pass the threshold test are accepted, which reduces mismatched point pairs.
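A minimal sketch of this matching stage is given below, assuming OpenCV's SIFT implementation and a FLANN-based matcher with a conventional ratio test between the nearest and second-nearest distances; the function name and the ratio value of 0.7 are illustrative choices, not parameters from this disclosure.

```python
import cv2

def match_sift_features(prev_gray, curr_gray, ratio=0.7):
    """Match SIFT feature points between two consecutive grayscale frames."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(prev_gray, None)
    kp2, des2 = sift.detectAndCompute(curr_gray, None)

    # FLANN with a KD-tree index for floating-point SIFT descriptors
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
    knn = flann.knnMatch(des1, des2, k=2)

    # Keep a match only when the nearest distance is clearly better than the
    # second-nearest one, which discards ambiguous (mismatched) pairs.
    good = [m for m, n in knn if m.distance < ratio * n.distance]
    return kp1, kp2, good
```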
步骤106,根据视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵。Step 106: Calculate the pose transformation matrix between the video frame images according to the feature point matching pair between the video frame images.
其中,在确定了特征点在视频帧图像中的位置后,就可以根据位置之间的对应关系计算得到视频帧图像之间的位姿变换矩阵。Among them, after determining the position of the feature point in the video frame image, the pose transformation matrix between the video frame images can be calculated according to the correspondence between the positions.
步骤108,根据位姿变换矩阵确定每个视频帧图像对应的三维坐标。Step 108: Determine the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix.
The three-dimensional coordinates corresponding to a video frame image refer to the three-dimensional coordinates of the camera on the UAV. Once the pose transformation matrices between video frame images are known, the three-dimensional coordinates of any video frame image can be calculated from the transformation relations; the three-dimensional coordinates corresponding to a video frame image are in fact the coordinates of the three-dimensional point where the camera was located when it captured that frame.
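As a sketch of this step, the relative pose transformations can be chained to give each frame's camera pose (and hence the camera's 3D position when that frame was captured); the convention that each relative matrix maps the next frame's camera coordinates into the previous frame's camera coordinates is an assumption made for the example.

```python
import numpy as np

def accumulate_poses(relative_transforms):
    """relative_transforms[k]: 4x4 matrix mapping points from frame k+1's
    camera frame into frame k's camera frame."""
    poses = [np.eye(4)]                    # the first frame defines the origin
    for T_rel in relative_transforms:
        poses.append(poses[-1] @ T_rel)    # camera-to-world pose of the next frame
    return poses                           # poses[k][:3, 3] is the camera position
```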
步骤110,根据视频帧图像对应的三维坐标和相应的位姿变换矩阵将视频帧图像中的特征点的三维坐标转换到世界坐标系下,得到三维点云地图。Step 110: Convert the three-dimensional coordinates of the feature points in the video frame image to the world coordinate system according to the three-dimensional coordinates corresponding to the video frame image and the corresponding pose transformation matrix to obtain a three-dimensional point cloud map.
Since the three-dimensional coordinates corresponding to each video frame image are expressed in that frame's camera coordinate system, the coordinates of the feature points in the video frame image are also in the camera coordinate system. To bring all feature point coordinates into the world coordinate system, the conversion is performed according to the pose transformation matrices, giving the three-dimensional coordinates of the feature points in world coordinates and thus the three-dimensional point cloud map.
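The conversion itself can be sketched as follows, assuming each frame's camera-to-world pose is stored as a 4x4 homogeneous matrix and the feature points are given as an N x 3 array in the camera frame; the names are illustrative.

```python
import numpy as np

def camera_to_world(points_cam, T_wc):
    """points_cam: N x 3 array in the camera frame; T_wc: 4x4 camera-to-world pose."""
    homo = np.hstack([points_cam, np.ones((points_cam.shape[0], 1))])
    return (T_wc @ homo.T).T[:, :3]

def build_point_cloud(frames):
    """frames: iterable of (points_cam, T_wc) pairs, one entry per video frame."""
    cloud = [camera_to_world(points_cam, T_wc) for points_cam, T_wc in frames]
    return np.vstack(cloud)                # accumulated world-frame point cloud
```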
步骤112,将视频帧图像作为目标检测模型的输入,获取目标检测模型检测得到的视频帧图像中的目标物信息。Step 112: The video frame image is used as the input of the target detection model, and the target object information in the video frame image detected by the target detection model is obtained.
其中,预先训练得到目标检测模型,目标检测模型用于检测视频帧图像中出现的目标物,比如,汽车。由于视频帧图像中可能包含有多个物体,如果需要识别得到每个物体的类别,则相应地需要训练得到多个目标检测模型。在训练得到目标检测模型后,将视频帧图像作为目标检测模型的输入,就可以检测得到视频帧图像中的目标物以及目标物所在的位置。Among them, the target detection model is obtained by pre-training, and the target detection model is used to detect the target object appearing in the video frame image, for example, a car. Since the video frame image may contain multiple objects, if the category of each object needs to be recognized, multiple target detection models need to be trained accordingly. After the target detection model is trained, the video frame image is used as the input of the target detection model, and the target object and the location of the target object in the video frame image can be detected.
步骤114,将三维点云地图与目标物信息结合,得到包含有目标物信息的三维点云地图。Step 114: Combine the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
After the target object information in the video frame image is obtained, the feature points corresponding to the target object can be determined by matching against the feature points on the three-dimensional point cloud map, and the target object information corresponding to those feature points is annotated onto the three-dimensional point cloud map, so that the constructed map carries a richer amount of information. The target detection model provides local perception, while the construction of the three-dimensional point cloud map is based on global perception; combining global and local perception increases the richness of the three-dimensional point cloud map.
In the above method for constructing a three-dimensional map for a UAV, the feature points between video frame images are matched with a hybrid matching algorithm that combines a color histogram with the scale-invariant feature transform, which improves both the accuracy and the real-time performance of feature point matching. In addition, the target detection model recognizes and detects the targets in the video frame images, and the target object information is combined with the three-dimensional point cloud map to obtain a three-dimensional point cloud map containing object information, so that the constructed map carries richer information. In other words, the hybrid color histogram and scale-invariant feature transform matching improves the accuracy of three-dimensional map construction, and combining the map with the target object information recognized by the target detection model gives the three-dimensional point cloud map richer content, providing support for subsequent optimal path planning and raising the level of intelligence of the UAV's environment perception.
如图2所示,在一个实施例中,无人机三维地图构建方法的示意图,包括:全局感知和局部感知两个部分。全局感知中采用颜色直方图和SIFT特征进行混合的结构框架进行匹配,然后进行定位以及三维点云地图的构建。局部感知采用目标检测模型对视频帧图像中的目标物进行识别,最后,将两者结合,得到包含有目标物信息的三维点云地图。As shown in Fig. 2, in one embodiment, a schematic diagram of a method for constructing a three-dimensional map of a UAV includes two parts: global perception and local perception. In the global perception, a mixed structural framework of color histogram and SIFT features is used for matching, and then positioning and 3D point cloud map construction are performed. Local perception uses the target detection model to identify the target in the video frame image. Finally, the two are combined to obtain a three-dimensional point cloud map containing the target information.
In one embodiment, using the color histogram and scale-invariant feature transform hybrid matching algorithm to match feature points between video frame images, and obtaining feature point matching pairs between the video frame images, includes: matching the feature points between the video frame images with a color histogram feature matching algorithm to obtain a first set of matching pairs; and further matching the matching points in the first set of matching pairs with a scale-invariant feature transform matching algorithm to obtain the target feature point matching pairs.
其中,先采用颜色直方图进行初步的特征点匹配,得到第一匹配对集合,然后采用尺度不变特征变换匹配算法对第一匹配对集合中的匹配点进行进一步匹配,得到目标特征点匹配对。在一个实施例中,颜色直方图的匹配采用Bhattacharyya距离计算,或者采用Correlation距离计算。如图3所示,为一个实施例中,颜色直方图与SIFT特征匹配的结合示意图,两者为串级的关系。Among them, the color histogram is used for preliminary feature point matching to obtain the first matching pair set, and then the scale-invariant feature transform matching algorithm is used to further match the matching points in the first matching pair set to obtain the target feature point matching pair . In one embodiment, the matching of the color histogram adopts Bhattacharyya distance calculation, or adopts Correlation distance calculation. As shown in FIG. 3, it is a schematic diagram of the combination of color histogram and SIFT feature matching in an embodiment, and the two are in a cascade relationship.
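The cascade can be sketched as below, assuming a small color patch around each keypoint is described by an HSV histogram and compared with the Bhattacharyya distance before SIFT matching refines the surviving candidates; the bin counts and distance threshold are illustrative.

```python
import cv2

def hist_distance(patch_a, patch_b, bins=(8, 8, 8)):
    """Bhattacharyya distance between the HSV color histograms of two patches."""
    hists = []
    for patch in (patch_a, patch_b):
        hsv = cv2.cvtColor(patch, cv2.COLOR_BGR2HSV)
        h = cv2.calcHist([hsv], [0, 1, 2], None, list(bins),
                         [0, 180, 0, 256, 0, 256])
        hists.append(h)
    return cv2.compareHist(hists[0], hists[1], cv2.HISTCMP_BHATTACHARYYA)

def prefilter_pairs(patches_a, patches_b, max_dist=0.3):
    """First matching-pair set: candidate pairs whose color histograms are close."""
    return [(i, j)
            for i, pa in enumerate(patches_a)
            for j, pb in enumerate(patches_b)
            if hist_distance(pa, pb) < max_dist]
```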
In one embodiment, calculating the pose transformation matrix between video frame images from the feature point matching pairs between the video frame images includes: obtaining the three-dimensional coordinates of each feature point in the feature point matching pairs; calculating the converted three-dimensional coordinates obtained by converting the three-dimensional coordinates of the feature points in one video frame image into another video frame image; obtaining the target three-dimensional coordinates corresponding to the matching feature points in the other video frame image; and calculating the pose transformation matrix from the converted three-dimensional coordinates and the target three-dimensional coordinates.
After the feature point matching pairs are determined, the three-dimensional coordinates of each feature point are obtained. The three-dimensional coordinates come from the color image and the depth image captured by the RGB-D camera: the color image gives the x and y values of a feature point, and the depth image gives the corresponding z value. For two video frame images, the matched feature points are treated as two point sets; the set of feature points in the first video frame image is $\{P \mid P_i \in \mathbb{R}^3,\ i = 1, 2, \ldots, N\}$ and the set in the second video frame image is $\{Q \mid Q_i \in \mathbb{R}^3,\ i = 1, 2, \ldots, N\}$. The error between the two point sets is taken as the cost function, and the corresponding rotation matrix R and translation vector t are obtained by minimizing this cost function, which can be expressed as:

$$\min_{R,\,t}\ \frac{1}{2} \sum_{i=1}^{N} \left\| Q_i - (R P_i + t) \right\|^2$$

where R and t are the rotation matrix and the translation vector, respectively. The steps of the iterative closest point algorithm are:

1) For each point $P_i$, find the corresponding closest point in Q, denoted $Q_i$;

2) Find the transformation R and t that minimize the above cost;

3) Apply the rigid-body transformation with R and t to the point set P to obtain the new point set $P_i' = R P_i + t$, and compute the error distance between the new point set and the point set Q:

$$E_d = \frac{1}{N} \sum_{i=1}^{N} \left\| Q_i - P_i' \right\|^2$$

In practice, the constrained rotation matrix and translation vector can be represented with an unconstrained Lie algebra, and the number of feature points whose error distance is smaller than a set threshold, i.e. the number of inliers, is recorded. If the error distance $E_d$ computed in step 3) is smaller than its threshold and the number of inliers is larger than its threshold, or if the number of iterations reaches the set limit, the iteration ends; otherwise, return to step 1) for the next iteration.
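One iteration of this procedure can be sketched as below, using the standard SVD (Kabsch) solution for the least-squares rigid transform between matched N x 3 point sets; this is a generic formulation rather than code from this disclosure.

```python
import numpy as np

def estimate_rigid_transform(P, Q):
    """argmin_{R,t} sum_i ||Q_i - (R P_i + t)||^2 for matched N x 3 point sets."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                         # correct a possible reflection
    t = cq - R @ cp
    return R, t

def icp_step(P, Q):
    """Estimate (R, t), transform P rigidly, and report the error distance E_d."""
    R, t = estimate_rigid_transform(P, Q)
    P_new = (R @ P.T).T + t
    E_d = np.mean(np.sum((Q - P_new) ** 2, axis=1))
    return R, t, P_new, E_d
```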
在一个实施例中,所述目标检测模型是基于深度学习模型训练得到的;在所述将所述视频帧图像作为目标检测模型的输入,获取所述目标检测模型输出的检测得到目标物之前,还包括:获取训练视频图像样本,所述训练视频图像样本包括正样本和负样本,所述正样本中包括有目标物以及所述目标物在所述视频图像中位置标记;根据所述训练视频图像样本对所述目标检测模型进行训练,得到训练好的目标检测模型。In one embodiment, the target detection model is obtained based on deep learning model training; before the video frame image is used as the input of the target detection model to obtain the detected target object output by the target detection model, It also includes: acquiring training video image samples, the training video image samples including positive samples and negative samples, the positive samples include a target and a position mark of the target in the video image; according to the training video The image samples train the target detection model to obtain a trained target detection model.
The target detection model is trained with a deep learning model. To train it, training video image samples are first obtained and positive and negative samples are set; a positive sample is a video image that contains a target object together with a mark of the target object's position in the video image, and training learns a target detection model capable of detecting the target object. As shown in Figure 4, in one embodiment the training and prediction of the deep-learning-based UAV target detection model is divided into two parts: preprocessing and real-time detection. For real-time target detection, the data collected by the UAV is first preprocessed: the collected video stream is split into individual video frame images, the targets in the images are labeled as samples and divided into training and test data sets, the model is trained with a deep learning framework, and the saved model is then applied to the video stream returned by the platform to complete real-time detection of the targets.
A small UAV carrier equipped with an industrial camera is used to sample a large amount of video data covering scenes from the UAV's viewpoint. The targets the UAV needs to recognize are determined and labeled in the acquired video data, the neural network model is trained with the preprocessed data, and the model parameters are adjusted until the training results satisfy the convergence conditions. The trained model is saved for subsequent target detection and loaded onto the UAV, and the model is continuously adjusted and optimized through target detection experiments with the UAV.
In a specific embodiment, the deep learning model adopts the YOLOv3 network structure (also called Darknet-53), a fully convolutional network. It introduces residual structures, i.e. ResNet-style skip connections, and makes extensive use of residual network characteristics. Downsampling is performed with stride-2 convolutions, and upsampling and route operations are used so that detection is performed at three scales within one network. Dimension clustering is used to obtain anchor boxes for predicting bounding boxes; the sum of squared error losses is used during training, and an objectness score is predicted for each bounding box by logistic regression. If a prior bounding box is not the best one but overlaps the object to be detected by more than a set threshold, that prediction is ignored. With a threshold of 0.5, the system assigns only one bounding box to each object; if a prior bounding box is not assigned to an object, it contributes no loss to the coordinate or class predictions. Each box uses multi-label classification to predict the classes the bounding box may contain, and binary cross-entropy loss is used for class prediction during training. Applying the lightweight YOLOv3 target detection network to the UAV platform improves real-time target recognition under the limited computing power of the UAV.
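Running such a trained detector on each returned frame can be sketched with OpenCV's DNN module as below; the configuration and weight file names are placeholders, and the 416 x 416 input size and thresholds are common YOLOv3 defaults rather than values fixed by this disclosure.

```python
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")   # placeholder files
out_names = net.getUnconnectedOutLayersNames()

def detect_objects(frame, conf_thresh=0.5, nms_thresh=0.4):
    """Return (class_id, score, [x, y, w, h]) for each detection in the frame."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    boxes, scores, class_ids = [], [], []
    for out in net.forward(out_names):
        for det in out:               # det = [cx, cy, bw, bh, objectness, class scores...]
            cls_scores = det[5:]
            cls = int(np.argmax(cls_scores))
            conf = float(cls_scores[cls])
            if conf > conf_thresh:
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                scores.append(conf)
                class_ids.append(cls)
    keep = cv2.dnn.NMSBoxes(boxes, scores, conf_thresh, nms_thresh)
    return [(class_ids[i], scores[i], boxes[i]) for i in np.array(keep).flatten()]
```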
In one embodiment, combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information includes: obtaining the target position of the detected target object in the video frame image; determining the matching feature points according to the target position; and annotating the target object information onto the three-dimensional point cloud map according to the feature points.
According to the position of the detected target object in the video frame image and the positions of the feature points in the video frame image, the target object information matching each feature point is determined and annotated onto the three-dimensional point cloud map, yielding a three-dimensional point cloud map with a richer amount of information.
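A minimal sketch of this annotation step follows, assuming each detection carries a class label and an image bounding box, and each map point remembers the pixel position it was observed at; the data layout is an assumption made for illustration.

```python
def annotate_point_cloud(detections, features):
    """detections: list of (label, (x, y, w, h)) image boxes.
    features: list of dicts with pixel position 'uv' and world point 'xyz'."""
    annotated = []
    for label, (x, y, w, h) in detections:
        for f in features:
            u, v = f["uv"]
            if x <= u <= x + w and y <= v <= y + h:
                annotated.append((f["xyz"], label))   # attach the label to the map point
    return annotated
```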
In one embodiment, the method further includes: obtaining measurement data measured by an inertial measurement unit, and calculating an initial pose transformation matrix between video frames from the measurement data. Calculating the pose transformation matrix between video frame images from the feature point matching pairs between the video frame images then includes: calculating the target pose transformation matrix between video frames from the initial pose transformation matrix and the feature point matching pairs between the video frame images.
An inertial measurement unit (IMU) is a device that measures an object's three-axis attitude angles (or angular rates) and acceleration. The inertial measurement unit serves as the UAV's inertial parameter measurement device and contains a three-axis gyroscope, a three-axis accelerometer and a three-axis magnetometer. The UAV can directly read the measurement data of the inertial measurement unit, including angular velocity, acceleration and magnetometer data. After the measurement data is obtained, the UAV's pose transformation matrix can be calculated directly from it; however, because the inertial measurement unit accumulates error, the pose transformation matrix obtained this way is not accurate enough. To distinguish it from the subsequently optimized pose transformation matrix, the pose transformation matrix calculated directly from the measurement data is called the "initial pose transformation matrix". The pose transformation matrix includes a rotation matrix R and a translation vector t. In one embodiment, the initial pose transformation matrix corresponding to the measurement data is calculated with a complementary filtering algorithm. After the initial pose transformation matrix is obtained, it is used as the initial matrix, and the iterative closest point (ICP) algorithm calculates the target pose transformation matrix between video frames from the feature point matching pairs between the video frame images. Using the initial pose transformation matrix obtained from the inertial measurement unit as the initial matrix helps speed up the computation.
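A minimal complementary-filter sketch is given below for the attitude part of that initial estimate, blending integrated gyroscope rates with the gravity direction from the accelerometer; the gain alpha and the data layout are assumptions, and the magnetometer (for yaw) is omitted for brevity.

```python
import numpy as np

def complementary_filter(gyro, accel, dt, prev_rp, alpha=0.98):
    """gyro: (wx, wy, wz) in rad/s; accel: (ax, ay, az); prev_rp: (roll, pitch)."""
    # propagate the previous estimate with the gyroscope rates
    roll_g = prev_rp[0] + gyro[0] * dt
    pitch_g = prev_rp[1] + gyro[1] * dt
    # gravity direction observed by the accelerometer
    roll_a = np.arctan2(accel[1], accel[2])
    pitch_a = np.arctan2(-accel[0], np.hypot(accel[1], accel[2]))
    # blend the smooth but drifting gyro path with the noisy but drift-free accel cue
    roll = alpha * roll_g + (1 - alpha) * roll_a
    pitch = alpha * pitch_g + (1 - alpha) * pitch_a
    return roll, pitch
```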
In one embodiment, after calculating the pose transformation matrix between video frame images from the feature point matching pairs between the video frame images, the method further includes: calculating the amount of motion between the current video frame and the previous key frame, and taking the current video frame as a key frame if the amount of motion is greater than a preset threshold; when the current video frame is a key frame, matching the current video frame against the key frames in the existing key frame library, and taking the current video frame as a loop-closure frame if a matching key frame exists in the key frame library; and optimizing and updating the corresponding pose transformation matrices according to the loop-closure frame to obtain updated pose transformation matrices. Determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix then includes: determining the three-dimensional coordinates corresponding to each video frame image according to the updated pose transformation matrices.
To reduce the complexity of the subsequent optimization, key frames can be extracted to reduce the amount of computation. The captured video frames are dense (typically about 30 frames can be captured per second), so the similarity between consecutive frames is very high, sometimes they are identical, and processing every frame would needlessly increase the computational complexity. The complexity can therefore be reduced by extracting key frames. Specifically, the first video frame is taken as a key frame; the amount of motion between the current video frame and the previous key frame is then computed, and the current frame is selected as a key frame when the amount of motion exceeds the set threshold. The amount of motion is computed as:

$$E_m = \omega_1 \left( |t_x| + |t_y| + |t_z| \right) + \omega_2 \left( |\phi| + |\theta| + |\psi| \right)$$

where $E_m$ is the measure of the amount of motion, $t_x, t_y, t_z$ are the three translation components of the translation vector t, and $\phi, \theta, \psi$ are the Euler angles of the inter-frame rotation, which can be obtained from the rotation matrix. $\omega_1$ and $\omega_2$ are balancing weights for the translational and rotational motion; for the visual field captured by the camera, rotation brings about large scene changes more easily than translation, so $\omega_2$ is taken larger than $\omega_1$, with the specific values adjusted to the situation.
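The key-frame test can be sketched as below, assuming the relative pose to the previous key frame has already been split into a translation vector and inter-frame Euler angles; the weights and threshold are illustrative values only.

```python
import numpy as np

def motion_metric(t, euler, w1=1.0, w2=2.0):
    """E_m = w1 * sum(|t|) + w2 * sum(|euler|); w2 > w1 since rotation changes the scene more."""
    return w1 * np.sum(np.abs(t)) + w2 * np.sum(np.abs(euler))

def is_new_keyframe(t, euler, threshold=0.3):
    """Keep the current frame as a key frame when its motion exceeds the threshold."""
    return motion_metric(t, euler) > threshold
```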
在提取了关键帧后,采用回环检测的方法对得到的位姿变换矩阵进行优化更新。在一个实施例中,采用闭环检测算法进行回环检测。在进行回环检测后,根据回环检测结果对目标位姿变换矩阵进行更新优化,得到更准确的位姿变换矩阵,为了区分,称为“更新位姿变换矩阵”。根据更新位姿变换矩阵确定每个视频帧图像对应的三维坐标。After the key frames are extracted, the loop detection method is used to optimize and update the obtained pose transformation matrix. In one embodiment, a closed loop detection algorithm is used for loop detection. After the loop detection is performed, the target pose transformation matrix is updated and optimized according to the loop detection result to obtain a more accurate pose transformation matrix, which is called "updated pose transformation matrix" for distinction. Determine the three-dimensional coordinates corresponding to each video frame image according to the updated pose transformation matrix.
如图5所示,提出了一种无人机三维地图构建装置,该装置包括:As shown in Figure 5, a three-dimensional map construction device for drones is proposed, which includes:
提取模块502,用于获取相机拍摄得到的视频帧图像,提取每个视频帧图像中的特征点;The extraction module 502 is configured to obtain video frame images taken by the camera, and extract feature points in each video frame image;
匹配模块504,用于采用颜色直方图和尺度不变特征变换混合匹配算法对视频帧图像之间的特征点进行匹配,得到视频帧图像之间的特征点匹配对;The matching module 504 is configured to use a color histogram and a scale-invariant feature transform hybrid matching algorithm to match feature points between video frame images to obtain feature point matching pairs between video frame images;
计算模块506,用于根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵;The calculation module 506 is configured to calculate the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images;
确定模块508,用于根据所述位姿变换矩阵确定每个视频帧图像对应的三维坐标;The determining module 508 is configured to determine the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix;
转换模块510,用于根据视频帧图像对应的三维坐标和相应的位姿变换矩阵将视频帧图像中的特征点的三维坐标转换到世界坐标系下,得到三维点云地图;The conversion module 510 is configured to convert the 3D coordinates of the feature points in the video frame image to the world coordinate system according to the 3D coordinates corresponding to the video frame image and the corresponding pose transformation matrix to obtain a 3D point cloud map;
检测模块512,用于将所述视频帧图像作为目标检测模型的输入,获取所述目标检测模型检测得到的视频帧图像中的目标物信息;The detection module 512 is configured to use the video frame image as the input of the target detection model, and obtain target information in the video frame image detected by the target detection model;
结合模块514,用于将所述三维点云地图与所述目标物信息结合,得到包含有目标物信息的三维点云地图。The combining module 514 is configured to combine the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
在一个实施例中,所述匹配模块504还用于采用颜色直方图特征匹配算法对视频帧图像之间的特征点进行匹配,得到第一匹配对集合;采用尺度不变特征变换匹配算法对所述第一匹配对集合中的匹配点进行进一步匹配得到目标特征点匹配对。In one embodiment, the matching module 504 is further configured to use a color histogram feature matching algorithm to match feature points between video frame images to obtain a first set of matching pairs; use a scale-invariant feature transform matching algorithm to The first matching pair further matches the matching points in the set to obtain the target feature point matching pair.
在一个实施例中,计算模块506还用于获取所述特征点匹配对中每个特征点的三维坐标;计算将一个视频帧图像中特征点的三维坐标转换到另一视频帧图像得到的转换三维坐标;获取所述另一视频帧图像中相应匹配的特征点对应的目标三维坐标;根据所述转换三维坐标和所述目标三维坐标计算得到位姿变换矩阵。In an embodiment, the calculation module 506 is further configured to obtain the three-dimensional coordinates of each feature point in the feature point matching pair; calculate the conversion obtained by converting the three-dimensional coordinates of the feature point in one video frame image to another video frame image Three-dimensional coordinates; obtain the target three-dimensional coordinates corresponding to the corresponding matching feature points in the another video frame image; calculate the pose transformation matrix according to the converted three-dimensional coordinates and the target three-dimensional coordinates.
在一个实施例中,所述目标检测模型是基于深度学习模型训练得到的;上述无人机三维地图构建装置还包括:训练模块,用于获取训练视频图像样本,所述训练视频图像样本包括正样本和负样本,所述正样本中包括有目标物以及所述目标物在所述视频图像中位置标记;根据所述训练视频图像样本对所述目标检测模型进行训练,得到训练好的目标检测模型。In one embodiment, the target detection model is obtained based on deep learning model training; the above-mentioned three-dimensional map construction device for drones further includes: a training module for obtaining training video image samples, the training video image samples including positive Samples and negative samples, the positive sample includes a target and a position mark of the target in the video image; training the target detection model according to the training video image sample to obtain a trained target detection model.
在一个实施例中,所述结合模块514还用于获取检测得到的目标物在视频帧图像中的目标位置;根据所述目标位置确定与之匹配的特征点;根据所述特征点将所述物体类别信息标注到所述三维点云地图。In one embodiment, the combining module 514 is also used to obtain the target position of the detected target in the video frame image; determine the matching feature point according to the target position; according to the feature point, the The object category information is marked on the three-dimensional point cloud map.
如图6所示,在一个实施例中,上述无人机三维地图构建装置还包括:As shown in FIG. 6, in one embodiment, the above-mentioned three-dimensional map construction device for drones further includes:
初始计算模块505,用于获取惯性测量单元测量得到的测量数据,根据所述测量数据计算得到视频帧之间的初始位姿变换矩阵;The initial calculation module 505 is configured to obtain measurement data measured by the inertial measurement unit, and calculate an initial pose transformation matrix between video frames according to the measurement data;
所述计算模块还用于包括:根据所述初始位姿变换矩阵和所述视频帧图像之间的特征点匹配对计算得到视频帧之间的目标位姿变换矩阵。The calculation module is further configured to include: calculating the target pose transformation matrix between the video frames according to the feature point matching pair between the initial pose transformation matrix and the video frame image.
如图7所示,在一个实施例中,上述无人机三维地图构建装置还包括:As shown in FIG. 7, in one embodiment, the above-mentioned three-dimensional map construction device for drones further includes:
关键帧确定模块516,用于计算当前视频帧与前一关键帧之间的运动量,若运动量大于预设阈值,则将当前视频帧作为关键帧。The key frame determination module 516 is used to calculate the amount of motion between the current video frame and the previous key frame. If the amount of motion is greater than the preset threshold, the current video frame is used as the key frame.
回环帧确定模块518,用于当所述当前视频帧为关键帧时,将当前视频帧与之前的关键帧库中的关键帧进行匹配,若所述关键帧库中存在与当前视频帧匹配的关键帧,则将当前视频帧作为回环帧。The loopback frame determination module 518 is configured to match the current video frame with a key frame in the previous key frame library when the current video frame is a key frame, and if there is a key frame in the key frame library that matches the current video frame Key frame, the current video frame is used as a loopback frame.
优化模块520,用于根据所述回环帧对相应的位姿变换矩阵进行优化更新,得到更新位姿变换矩阵。The optimization module 520 is configured to optimize and update the corresponding pose transformation matrix according to the loop frame to obtain an updated pose transformation matrix.
所述确定模块508还用于根据所述更新位姿变换矩阵确定每个视频帧图像对应的三维坐标。The determining module 508 is further configured to determine the three-dimensional coordinates corresponding to each video frame image according to the updated pose transformation matrix.
图8示出了一个实施例中计算机设备的内部结构图。该计算机设备可以是无人机、或与无人机连接的终端或服务器。如图8所示,该计算机设备包括通过系统总线连接的处理器、存储器、和网络接口。其中,存储器包括非易失性存储介质和内存储器。该计算机设备的非易失性存储介质存储有操作系统,还可存储有计算机程序,该计算机程序被处理器执行时,可使得处理器实现无人机三维地图构建方法。该内存储器中也可储存有计算机程序,该计算机程序被 处理器执行时,可使得处理器执行无人机三维地图构建方法。网络接口用于与外接进行通信。本领域技术人员可以理解,图8中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。Fig. 8 shows an internal structure diagram of a computer device in an embodiment. The computer equipment can be a drone, or a terminal or server connected to the drone. As shown in FIG. 8, the computer device includes a processor, a memory, and a network interface connected through a system bus. Among them, the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program. When the computer program is executed by the processor, the processor can enable the processor to implement the method for constructing a three-dimensional map of the drone. A computer program may also be stored in the internal memory, and when the computer program is executed by the processor, the processor can execute the method for constructing a three-dimensional map of the UAV. The network interface is used to communicate with an external device. Those skilled in the art can understand that the structure shown in FIG. 8 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. The specific computer device may Including more or fewer parts than shown in the figure, or combining some parts, or having a different arrangement of parts.
在一个实施例中,本申请提供的无人机三维地图构建方法可以实现为一种计算机程序的形式,计算机程序可在如图8所示的计算机设备上运行。计算机设备的存储器中可存储组成该无人机三维地图构建装置的各个程序模板。比如,提取模块502,匹配模块504,计算模块506,确定模块508,转换模块510,检测模块512和结合模块514。In an embodiment, the method for constructing a three-dimensional map of an unmanned aerial vehicle provided in this application can be implemented in the form of a computer program, and the computer program can be run on a computer device as shown in FIG. 8. The memory of the computer equipment can store various program templates that make up the UAV 3D map construction device. For example, the extraction module 502, the matching module 504, the calculation module 506, the determination module 508, the conversion module 510, the detection module 512, and the combination module 514.
一种计算机设备,包括存储器和处理器,所述存储器存储有计算机程序,所述计算机程序被所述处理器执行时,使得所述处理器执行如下步骤:获取相机拍摄得到的视频帧图像,提取每个视频帧图像中的特征点;采用颜色直方图和尺度不变特征变换混合匹配算法对视频帧图像之间的特征点进行匹配,得到视频帧图像之间的特征点匹配对;根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵;根据所述位姿变换矩阵确定每个视频帧图像对应的三维坐标;根据视频帧图像对应的三维坐标和相应的位姿变换矩阵将视频帧图像中的特征点的三维坐标转换到世界坐标系下,得到三维点云地图;将所述视频帧图像作为目标检测模型的输入,获取所述目标检测模型检测得到的视频帧图像中的目标物信息;将所述三维点云地图与所述目标物信息结合,得到包含有目标物信息的三维点云地图。A computer device includes a memory and a processor. The memory stores a computer program. When the computer program is executed by the processor, the processor executes the following steps: acquiring video frame images captured by a camera, and extracting The feature points in each video frame image; the color histogram and the scale-invariant feature transformation hybrid matching algorithm are used to match the feature points between the video frame images to obtain the feature point matching pairs between the video frame images; according to the The feature point matching pair between the video frame images is calculated to obtain the pose transformation matrix between the video frame images; the three-dimensional coordinates corresponding to each video frame image are determined according to the pose transformation matrix; the three-dimensional coordinates corresponding to the video frame images and The corresponding pose transformation matrix converts the three-dimensional coordinates of the feature points in the video frame image to the world coordinate system to obtain a three-dimensional point cloud map; use the video frame image as the input of the target detection model to obtain the target detection model detection The target object information in the obtained video frame image; combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
在一个实施例中,所述采用颜色直方图和尺度不变特征变换混合匹配算法对视频帧图像之间的特征点进行匹配,得到视频帧图像之间的特征点匹配对,包括:采用颜色直方图特征匹配算法对视频帧图像之间的特征点进行匹配,得到第一匹配对集合;采用尺度不变特征变换匹配算法对所述第一匹配对集合中的匹配点进行进一步匹配得到目标特征点匹配对。In one embodiment, the use of a color histogram and a scale-invariant feature transformation hybrid matching algorithm to match feature points between video frame images to obtain feature point matching pairs between video frame images includes: using color histograms The graph feature matching algorithm matches the feature points between the video frame images to obtain the first matching pair set; the scale-invariant feature transform matching algorithm is used to further match the matching points in the first matching pair set to obtain the target feature points Matching pair.
在一个实施例中,根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵,包括:获取所述特征点匹配对中每个特征点的三维坐标;计算将一个视频帧图像中特征点的三维坐标转换到另一视频帧图像得到的转换三维坐标;获取所述另一视频帧图像中相应匹配的特征点对应的目标三维坐标;根据所述转换三维坐标和所述目标三维坐标计算得到位姿变换矩阵。In one embodiment, calculating the pose transformation matrix between the video frame images according to the feature point matching pair between the video frame images includes: obtaining the three-dimensional coordinates of each feature point in the feature point matching pair; Calculate the converted three-dimensional coordinates obtained by converting the three-dimensional coordinates of the feature points in one video frame image to another video frame image; obtain the target three-dimensional coordinates corresponding to the corresponding matching feature points in the other video frame image; The coordinates and the three-dimensional coordinates of the target are calculated to obtain a pose transformation matrix.
在一个实施例中,所述目标检测模型是基于深度学习模型训练得到的;在所述将所述视频帧图像作为目标检测模型的输入,获取所述目标检测模型输出的检测得到目标物之前,还包括:获取训练视频图像样本,所述训练视频图像样本包括正样本和负样本,所述正样本中包括有目标物以及所述目标物在所述 视频图像中位置标记;根据所述训练视频图像样本对所述目标检测模型进行训练,得到训练好的目标检测模型。In one embodiment, the target detection model is obtained based on deep learning model training; before the video frame image is used as the input of the target detection model to obtain the detected target object output by the target detection model, It also includes: acquiring training video image samples, the training video image samples including positive samples and negative samples, the positive samples include a target and a position mark of the target in the video image; according to the training video The image samples train the target detection model to obtain a trained target detection model.
在一个实施例中,所述将所述三维点云地图与所述目标物信息结合,得到包含有目标物信息的三维点云地图,包括:获取检测得到的目标物在视频帧图像中的目标位置;根据所述目标位置确定与之匹配的特征点;根据所述特征点将所述物体类别信息标注到所述三维点云地图。In one embodiment, the combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information includes: acquiring the detected target object in the video frame image Location; determining a matching feature point according to the target location; marking the object category information to the three-dimensional point cloud map according to the feature point.
在一个实施例中,所述计算机程序被所述处理器处理时,还用于执行以下步骤:获取惯性测量单元测量得到的测量数据;根据所述测量数据计算得到视频帧之间的初始位姿变换矩阵;所述根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵,包括:根据所述初始位姿变换矩阵和所述视频帧图像之间的特征点匹配对计算得到视频帧之间的目标位姿变换矩阵。In one embodiment, when the computer program is processed by the processor, it is also used to perform the following steps: obtain measurement data measured by the inertial measurement unit; calculate the initial pose between video frames based on the measurement data Transformation matrix; said calculating the pose transformation matrix between the video frame images according to the feature point matching pair between the video frame images, including: according to the initial pose transformation matrix and the video frame image The feature point matching pair is calculated to obtain the target pose transformation matrix between the video frames.
在一个实施例中,在所述根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵之后,所述计算机程序被所述处理器处理时,还用于执行以下步骤:计算当前视频帧与前一关键帧之间的运动量,若运动量大于预设阈值,则将当前视频帧作为关键帧;当所述当前视频帧为关键帧时,将当前视频帧与之前的关键帧库中的关键帧进行匹配,若所述关键帧库中存在与当前视频帧匹配的关键帧,则将当前视频帧作为回环帧;根据所述回环帧对相应的位姿变换矩阵进行优化更新,得到更新位姿变换矩阵;所述根据所述位姿变换矩阵确定每个视频帧图像对应的三维坐标,包括:根据所述更新位姿变换矩阵确定每个视频帧图像对应的三维坐标。In one embodiment, after the pose transformation matrix between the video frame images is calculated according to the feature point matching pair between the video frame images, when the computer program is processed by the processor, the Perform the following steps: calculate the amount of motion between the current video frame and the previous key frame, if the amount of motion is greater than a preset threshold, use the current video frame as a key frame; when the current video frame is a key frame, set the current video frame Match with the key frame in the previous key frame library. If there is a key frame matching the current video frame in the key frame library, the current video frame is taken as the loop frame; the corresponding pose is transformed according to the loop frame The matrix is optimized and updated to obtain an updated pose transformation matrix; the determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix includes: determining the corresponding image of each video frame according to the updated pose transformation matrix Three-dimensional coordinates.
一种计算机可读存储介质,存储有计算机程序,所述计算机程序被处理器执行时,使得所述处理器执行如下步骤:获取相机拍摄得到的视频帧图像,提取每个视频帧图像中的特征点;采用颜色直方图和尺度不变特征变换混合匹配算法对视频帧图像之间的特征点进行匹配,得到视频帧图像之间的特征点匹配对;根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵;根据所述位姿变换矩阵确定每个视频帧图像对应的三维坐标;根据视频帧图像对应的三维坐标和相应的位姿变换矩阵将视频帧图像中的特征点的三维坐标转换到世界坐标系下,得到三维点云地图;将所述视频帧图像作为目标检测模型的输入,获取所述目标检测模型检测得到的视频帧图像中的目标物信息;将所述三维点云地图与所述目标物信息结合,得到包含有目标物信息的三维点云地图。A computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the processor executes the following steps: acquiring video frame images taken by a camera, and extracting features in each video frame image Point; use color histogram and scale-invariant feature transformation hybrid matching algorithm to match feature points between video frame images to obtain feature point matching pairs between video frame images; according to the feature points between the video frame images The matching pair is calculated to obtain the pose transformation matrix between the video frame images; the three-dimensional coordinates of each video frame image are determined according to the pose transformation matrix; the three-dimensional coordinates corresponding to the video frame images and the corresponding pose transformation matrix The three-dimensional coordinates of the feature points in the frame image are converted to the world coordinate system to obtain a three-dimensional point cloud map; the video frame image is used as the input of the target detection model to obtain the target in the video frame image detected by the target detection model Object information; combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
在一个实施例中,所述采用颜色直方图和尺度不变特征变换混合匹配算法对视频帧图像之间的特征点进行匹配,得到视频帧图像之间的特征点匹配对,包括:采用颜色直方图特征匹配算法对视频帧图像之间的特征点进行匹配,得 到第一匹配对集合;采用尺度不变特征变换匹配算法对所述第一匹配对集合中的匹配点进行进一步匹配得到目标特征点匹配对。In one embodiment, the use of a color histogram and a scale-invariant feature transformation hybrid matching algorithm to match feature points between video frame images to obtain feature point matching pairs between video frame images includes: using color histograms The graph feature matching algorithm matches the feature points between the video frame images to obtain the first matching pair set; the scale-invariant feature transform matching algorithm is used to further match the matching points in the first matching pair set to obtain the target feature points Matching pair.
在一个实施例中,根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵,包括:获取所述特征点匹配对中每个特征点的三维坐标;计算将一个视频帧图像中特征点的三维坐标转换到另一视频帧图像得到的转换三维坐标;获取所述另一视频帧图像中相应匹配的特征点对应的目标三维坐标;根据所述转换三维坐标和所述目标三维坐标计算得到位姿变换矩阵。In one embodiment, calculating the pose transformation matrix between the video frame images according to the feature point matching pair between the video frame images includes: obtaining the three-dimensional coordinates of each feature point in the feature point matching pair; Calculate the converted three-dimensional coordinates obtained by converting the three-dimensional coordinates of the feature points in one video frame image to another video frame image; obtain the target three-dimensional coordinates corresponding to the corresponding matching feature points in the other video frame image; The coordinates and the three-dimensional coordinates of the target are calculated to obtain a pose transformation matrix.
在一个实施例中,所述目标检测模型是基于深度学习模型训练得到的;在所述将所述视频帧图像作为目标检测模型的输入,获取所述目标检测模型输出的检测得到目标物之前,还包括:获取训练视频图像样本,所述训练视频图像样本包括正样本和负样本,所述正样本中包括有目标物以及所述目标物在所述视频图像中位置标记;根据所述训练视频图像样本对所述目标检测模型进行训练,得到训练好的目标检测模型。In one embodiment, the target detection model is obtained based on deep learning model training; before the video frame image is used as the input of the target detection model to obtain the detected target object output by the target detection model, It also includes: acquiring training video image samples, the training video image samples including positive samples and negative samples, the positive samples include a target and a position mark of the target in the video image; according to the training video The image samples train the target detection model to obtain a trained target detection model.
在一个实施例中,所述将所述三维点云地图与所述目标物信息结合,得到包含有目标物信息的三维点云地图,包括:获取检测得到的目标物在视频帧图像中的目标位置;根据所述目标位置确定与之匹配的特征点;根据所述特征点将所述物体类别信息标注到所述三维点云地图。In one embodiment, the combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information includes: acquiring the detected target object in the video frame image Location; determining a matching feature point according to the target location; marking the object category information to the three-dimensional point cloud map according to the feature point.
在一个实施例中,所述计算机程序被所述处理器处理时,还用于执行以下步骤:获取惯性测量单元测量得到的测量数据;根据所述测量数据计算得到视频帧之间的初始位姿变换矩阵;所述根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵,包括:根据所述初始位姿变换矩阵和所述视频帧图像之间的特征点匹配对计算得到视频帧之间的目标位姿变换矩阵。In one embodiment, when the computer program is processed by the processor, it is also used to perform the following steps: obtain measurement data measured by the inertial measurement unit; calculate the initial pose between video frames based on the measurement data Transformation matrix; said calculating the pose transformation matrix between the video frame images according to the feature point matching pair between the video frame images, including: according to the initial pose transformation matrix and the video frame image The feature point matching pair is calculated to obtain the target pose transformation matrix between the video frames.
在一个实施例中,在所述根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵之后,所述计算机程序被所述处理器处理时,还用于执行以下步骤:计算当前视频帧与前一关键帧之间的运动量,若运动量大于预设阈值,则将当前视频帧作为关键帧;当所述当前视频帧为关键帧时,将当前视频帧与之前的关键帧库中的关键帧进行匹配,若所述关键帧库中存在与当前视频帧匹配的关键帧,则将当前视频帧作为回环帧;根据所述回环帧对相应的位姿变换矩阵进行优化更新,得到更新位姿变换矩阵;所述根据所述位姿变换矩阵确定每个视频帧图像对应的三维坐标,包括:根据所述更新位姿变换矩阵确定每个视频帧图像对应的三维坐标。In one embodiment, after the pose transformation matrix between the video frame images is calculated according to the feature point matching pair between the video frame images, when the computer program is processed by the processor, the Perform the following steps: calculate the amount of motion between the current video frame and the previous key frame, if the amount of motion is greater than a preset threshold, use the current video frame as a key frame; when the current video frame is a key frame, set the current video frame Match with the key frame in the previous key frame library. If there is a key frame matching the current video frame in the key frame library, the current video frame is taken as the loop frame; the corresponding pose is transformed according to the loop frame The matrix is optimized and updated to obtain an updated pose transformation matrix; the determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix includes: determining the corresponding image of each video frame according to the updated pose transformation matrix Three-dimensional coordinates.
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一非易失性计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施 例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by instructing relevant hardware through a computer program. The program can be stored in a non-volatile computer readable storage medium. Here, when the program is executed, it may include the processes of the above-mentioned method embodiments. Wherein, any reference to memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. As an illustration and not a limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous chain Channel (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), etc.
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。The technical features of the above embodiments can be combined arbitrarily. In order to make the description concise, all possible combinations of the technical features in the above embodiments are not described. However, as long as there is no contradiction between the combinations of these technical features, they should It is considered as the range described in this specification.
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对本申请专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。The above-mentioned embodiments only express several implementation manners of this application, and their descriptions are more specific and detailed, but they should not be construed as limiting the scope of this application. It should be pointed out that for those of ordinary skill in the art, without departing from the concept of this application, several modifications and improvements can be made, and these all fall within the protection scope of this application. Therefore, the scope of protection of the patent of this application shall be subject to the appended claims.

Claims (10)

  1. A method for constructing a three-dimensional map for an unmanned aerial vehicle, wherein the method comprises:
    obtaining video frame images captured by a camera, and extracting feature points in each video frame image;
    matching feature points between the video frame images by using a hybrid matching algorithm combining a color histogram and scale-invariant feature transform, to obtain feature point matching pairs between the video frame images;
    calculating a pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images;
    determining three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix;
    converting the three-dimensional coordinates of the feature points in the video frame images into a world coordinate system according to the three-dimensional coordinates corresponding to the video frame images and the corresponding pose transformation matrices, to obtain a three-dimensional point cloud map;
    using the video frame images as an input of a target detection model, and obtaining target object information in the video frame images detected by the target detection model;
    combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
  2. The method according to claim 1, wherein the matching feature points between the video frame images by using the hybrid matching algorithm combining the color histogram and scale-invariant feature transform, to obtain the feature point matching pairs between the video frame images, comprises:
    matching the feature points between the video frame images by using a color histogram feature matching algorithm to obtain a first matching pair set;
    further matching the matching points in the first matching pair set by using a scale-invariant feature transform matching algorithm to obtain target feature point matching pairs.
  3. The method according to claim 1, wherein the calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images comprises:
    obtaining three-dimensional coordinates of each feature point in the feature point matching pairs;
    calculating converted three-dimensional coordinates obtained by converting the three-dimensional coordinates of the feature points in one video frame image into another video frame image;
    obtaining target three-dimensional coordinates corresponding to the correspondingly matched feature points in the other video frame image;
    calculating the pose transformation matrix according to the converted three-dimensional coordinates and the target three-dimensional coordinates.
  4. The method according to claim 1, wherein the target detection model is obtained by training based on a deep learning model;
    before the using the video frame images as the input of the target detection model and obtaining the target object detected and output by the target detection model, the method further comprises:
    obtaining training video image samples, the training video image samples comprising positive samples and negative samples, the positive samples comprising a target object and a position mark of the target object in the video image;
    training the target detection model according to the training video image samples to obtain a trained target detection model.
  5. The method according to claim 1, wherein the combining the three-dimensional point cloud map with the target object information to obtain the three-dimensional point cloud map containing the target object information comprises:
    obtaining a target position of the detected target object in the video frame image;
    determining feature points matching the target position;
    annotating the object category information onto the three-dimensional point cloud map according to the feature points.
  6. The method according to claim 1, wherein the method further comprises:
    obtaining measurement data measured by an inertial measurement unit;
    calculating an initial pose transformation matrix between video frames according to the measurement data;
    wherein the calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images comprises:
    calculating a target pose transformation matrix between the video frames according to the initial pose transformation matrix and the feature point matching pairs between the video frame images.
  7. The method according to claim 1, wherein after the calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images, the method further comprises:
    calculating an amount of motion between a current video frame and a previous key frame, and if the amount of motion is greater than a preset threshold, using the current video frame as a key frame;
    when the current video frame is a key frame, matching the current video frame with key frames in a previous key frame library, and if a key frame matching the current video frame exists in the key frame library, using the current video frame as a loop closure frame;
    optimizing and updating the corresponding pose transformation matrix according to the loop closure frame to obtain an updated pose transformation matrix;
    wherein the determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix comprises: determining the three-dimensional coordinates corresponding to each video frame image according to the updated pose transformation matrix.
  8. An apparatus for constructing a three-dimensional map for an unmanned aerial vehicle, wherein the apparatus comprises:
    an extraction module, configured to obtain video frame images captured by a camera and extract feature points in each video frame image;
    a matching module, configured to match feature points between the video frame images by using a hybrid matching algorithm combining a color histogram and scale-invariant feature transform, to obtain feature point matching pairs between the video frame images;
    a calculation module, configured to calculate a pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images;
    a determining module, configured to determine three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix;
    a conversion module, configured to convert the three-dimensional coordinates of the feature points in the video frame images into a world coordinate system according to the three-dimensional coordinates corresponding to the video frame images and the corresponding pose transformation matrices, to obtain a three-dimensional point cloud map;
    a detection module, configured to use the video frame images as an input of a target detection model and obtain target object information in the video frame images detected by the target detection model;
    a combining module, configured to combine the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
  9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein when the computer program is executed by the processor, the processor is caused to perform the steps of the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the processor is caused to perform the steps of the method according to any one of claims 1 to 7.
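
Editor's illustrative sketches. The claims above recite a visual mapping pipeline with several concrete algorithmic steps; the short sketches below illustrate, in a non-authoritative way, how such steps are commonly arranged. None of them is asserted to be the applicant's implementation.

Claims 1 and 2 describe a coarse-to-fine hybrid matcher: a color-histogram comparison first builds a candidate matching set, and scale-invariant feature transform (SIFT) descriptors then refine it. The sketch below is one minimal way to arrange such a matcher with OpenCV; the patch size, histogram bin count, correlation threshold, and ratio-test value are illustrative assumptions, not values taken from the application.

import cv2
import numpy as np

def patch_hist(img_bgr, pt, half=16, bins=8):
    """Normalised BGR color histogram of a small patch around a keypoint."""
    x, y = int(round(pt[0])), int(round(pt[1]))
    h, w = img_bgr.shape[:2]
    patch = img_bgr[max(0, y - half):min(h, y + half),
                    max(0, x - half):min(w, x + half)]
    hist = cv2.calcHist([patch], [0, 1, 2], None, [bins] * 3,
                        [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def hybrid_match(img1, img2, hist_thresh=0.6, ratio=0.75):
    """Coarse color-histogram candidates, refined by SIFT descriptor matching."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY), None)
    kp2, des2 = sift.detectAndCompute(cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY), None)
    hists1 = [patch_hist(img1, k.pt) for k in kp1]
    hists2 = [patch_hist(img2, k.pt) for k in kp2]

    # Coarse stage: color-histogram similarity builds the first matching set.
    candidates = []
    for i, h1 in enumerate(hists1):
        sims = [cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL) for h2 in hists2]
        j = int(np.argmax(sims))
        if sims[j] > hist_thresh:
            candidates.append((i, j))

    # Fine stage: keep candidates whose SIFT descriptors also agree
    # (nearest-neighbour distance ratio test over the second image).
    matches = []
    for i, j in candidates:
        d = np.linalg.norm(des1[i] - des2, axis=1)
        best, second = np.partition(d, 1)[:2]
        if j == int(np.argmin(d)) and best < ratio * second:
            matches.append((kp1[i].pt, kp2[j].pt))
    return matches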
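
Claim 3 computes the pose transformation matrix from the three-dimensional coordinates of matched feature points. A standard closed-form solution for the rigid transform that aligns two sets of corresponding 3-D points is the SVD-based (Kabsch/Umeyama) method; the sketch below implements that generic technique and is offered only as an illustration of the kind of computation involved.

import numpy as np

def rigid_transform_3d(src, dst):
    """Least-squares rigid transform (R, t) with dst ~= R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3-D feature-point coordinates.
    Returns a 4x4 homogeneous pose transformation matrix.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)      # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # guard against a reflection solution
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

# Usage: T = rigid_transform_3d(points_in_frame_k, points_in_frame_k_plus_1)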
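
The map-building step of claim 1 converts per-frame feature-point coordinates into the world coordinate system by applying the accumulated pose transforms. A minimal sketch, assuming 4x4 homogeneous matrices where each entry of relative_poses maps points of frame i into frame i-1 and the first camera frame is taken as the world frame:

import numpy as np

def accumulate_point_cloud(frame_points, relative_poses):
    """Fuse per-frame 3-D feature points into one world-frame point cloud.

    frame_points:   list of (N_i, 3) arrays, points in each camera frame.
    relative_poses: list of 4x4 matrices T_i mapping frame i into frame i-1
                    (one fewer entry than frame_points).
    """
    cloud = []
    T_world = np.eye(4)                       # frame 0 defines the world frame
    for i, pts in enumerate(frame_points):
        if i > 0:
            T_world = T_world @ relative_poses[i - 1]    # chain camera poses
        homo = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coordinates
        cloud.append((T_world @ homo.T).T[:, :3])
    return np.vstack(cloud)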
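
Claims 4 and 5 attach semantic information to the map: a deep-learning detector, trained on positive samples carrying position marks and on negative samples, returns object detections, and feature points associated with a detected target carry the object information into the point cloud. The sketch below shows only the annotation step of claim 5; the detections argument stands in for the output of a hypothetical run_detector(frame) call, and the dictionary layout is an assumption made for illustration.

import numpy as np

def annotate_cloud(keypoints_2d, points_world, detections):
    """Attach object labels to map points whose image positions fall inside
    a detected bounding box.

    keypoints_2d: (N, 2) pixel coordinates of the frame's feature points.
    points_world: (N, 3) world coordinates of the same feature points.
    detections:   list of dicts such as {"label": "car", "box": (x1, y1, x2, y2)}.
    Returns a list of (x, y, z, label) tuples.
    """
    labelled = []
    for (u, v), xyz in zip(keypoints_2d, points_world):
        for det in detections:
            x1, y1, x2, y2 = det["box"]
            if x1 <= u <= x2 and y1 <= v <= y2:   # point lies inside the box
                labelled.append((*xyz, det["label"]))
                break
    return labelled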
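
Claim 6 uses inertial measurement unit data to supply an initial pose transformation matrix that the visual feature matches then refine. One common, illustrative way to form such an initial guess is to integrate gyroscope rates and accelerations over the inter-frame interval; the propagation below is deliberately simplified (first-order rotation update, no bias estimation, no gravity compensation) and is an assumption, not the application's own fusion scheme.

import numpy as np

def imu_initial_pose(gyro, accel, dt):
    """Small IMU dead-reckoning step between two video frames.

    gyro:  (M, 3) angular rates (rad/s) sampled between the frames.
    accel: (M, 3) linear accelerations (m/s^2) in the body frame.
    dt:    sampling interval of the IMU (s).
    Returns a 4x4 initial pose transformation matrix.
    """
    R = np.eye(3)
    v = np.zeros(3)
    p = np.zeros(3)
    for w, a in zip(gyro, accel):
        # First-order rotation update via the skew-symmetric matrix of w * dt.
        wx, wy, wz = w * dt
        Omega = np.array([[0, -wz, wy], [wz, 0, -wx], [-wy, wx, 0]])
        R = R @ (np.eye(3) + Omega)
        acc_start = R @ a                # body acceleration in the start frame
        p = p + v * dt + 0.5 * acc_start * dt ** 2
        v = v + acc_start * dt
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, p
    return T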
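
Claim 7 promotes a frame to a key frame when its motion relative to the previous key frame exceeds a preset threshold, and flags a loop closure frame when the new key frame matches one already in the key frame library. In the sketch below, the motion measure (translation norm plus rotation angle), the thresholds, and the cosine-similarity loop test are all illustrative assumptions; how the per-frame descriptor is computed is left open.

import numpy as np

def motion_amount(T_rel):
    """Scalar motion between two poses: translation norm plus rotation angle."""
    trans = np.linalg.norm(T_rel[:3, 3])
    angle = np.arccos(np.clip((np.trace(T_rel[:3, :3]) - 1.0) / 2.0, -1.0, 1.0))
    return trans + angle

def update_keyframes(T_rel, frame_desc, keyframes, motion_thresh=0.5,
                     loop_thresh=0.8):
    """Key-frame selection and loop-closure check in the spirit of claim 7.

    T_rel:      4x4 pose of the current frame relative to the last key frame.
    frame_desc: a global descriptor vector of the current frame.
    keyframes:  list of descriptors of previously accepted key frames (mutated).
    Returns (is_keyframe, loop_index_or_None).
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    if motion_amount(T_rel) <= motion_thresh:
        return False, None                    # not enough motion: not a key frame
    loop_idx = None
    for i, desc in enumerate(keyframes):
        if cosine(frame_desc, desc) > loop_thresh:
            loop_idx = i                      # revisited place: loop closure frame
            break
    keyframes.append(frame_desc)
    return True, loop_idx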
PCT/CN2019/097745 2019-03-19 2019-07-25 Three-dimensional map constructing method and apparatus for unmanned aerial vehicle, computer device, and storage medium WO2020186678A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910209625.1 2019-03-19
CN201910209625.1A CN110047142A (en) 2019-03-19 2019-03-19 No-manned plane three-dimensional map constructing method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2020186678A1 true WO2020186678A1 (en) 2020-09-24

Family

ID=67273899

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/097745 WO2020186678A1 (en) 2019-03-19 2019-07-25 Three-dimensional map constructing method and apparatus for unmanned aerial vehicle, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN110047142A (en)
WO (1) WO2020186678A1 (en)

Families Citing this family (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110047142A (en) * 2019-03-19 2019-07-23 中国科学院深圳先进技术研究院 No-manned plane three-dimensional map constructing method, device, computer equipment and storage medium
CN110487274B (en) * 2019-07-30 2021-01-29 中国科学院空间应用工程与技术中心 SLAM method and system for weak texture scene, navigation vehicle and storage medium
CN112393720B (en) * 2019-08-15 2023-05-30 纳恩博(北京)科技有限公司 Target equipment positioning method and device, storage medium and electronic device
CN110490131B (en) * 2019-08-16 2021-08-24 北京达佳互联信息技术有限公司 Positioning method and device of shooting equipment, electronic equipment and storage medium
CN110543917B (en) * 2019-09-06 2021-09-28 电子科技大学 Indoor map matching method by utilizing pedestrian inertial navigation track and video information
CN110580703B (en) * 2019-09-10 2024-01-23 广东电网有限责任公司 Distribution line detection method, device, equipment and storage medium
CN110602456A (en) * 2019-09-11 2019-12-20 安徽天立泰科技股份有限公司 Display method and system of aerial photography focus
CN112241010A (en) * 2019-09-17 2021-01-19 北京新能源汽车技术创新中心有限公司 Positioning method, positioning device, computer equipment and storage medium
CN110660134B (en) * 2019-09-25 2023-05-30 Oppo广东移动通信有限公司 Three-dimensional map construction method, three-dimensional map construction device and terminal equipment
CN110705574B (en) * 2019-09-27 2023-06-02 Oppo广东移动通信有限公司 Positioning method and device, equipment and storage medium
CN110728245A (en) * 2019-10-17 2020-01-24 珠海格力电器股份有限公司 Optimization method and device for VSLAM front-end processing, electronic equipment and storage medium
CN110880187B (en) * 2019-10-17 2022-08-12 北京达佳互联信息技术有限公司 Camera position information determining method and device, electronic equipment and storage medium
CN110826448B (en) * 2019-10-29 2023-04-07 中山大学 Indoor positioning method with automatic updating function
CN111105454B (en) * 2019-11-22 2023-05-09 北京小米移动软件有限公司 Method, device and medium for obtaining positioning information
CN111009012B (en) * 2019-11-29 2023-07-28 四川沃洛佳科技有限公司 Unmanned aerial vehicle speed measuring method based on computer vision, storage medium and terminal
CN111145339B (en) * 2019-12-25 2023-06-02 Oppo广东移动通信有限公司 Image processing method and device, equipment and storage medium
CN111105695B (en) * 2019-12-31 2022-11-25 智车优行科技(上海)有限公司 Map making method and device, electronic equipment and computer readable storage medium
CN111199584B (en) * 2019-12-31 2023-10-20 武汉市城建工程有限公司 Target object positioning virtual-real fusion method and device
CN111462029B (en) * 2020-03-27 2023-03-03 阿波罗智能技术(北京)有限公司 Visual point cloud and high-precision map fusion method and device and electronic equipment
CN113853577A (en) * 2020-04-28 2021-12-28 深圳市大疆创新科技有限公司 Image processing method and device, movable platform and control terminal thereof, and computer-readable storage medium
CN111311685B (en) * 2020-05-12 2020-08-07 中国人民解放军国防科技大学 Motion scene reconstruction unsupervised method based on IMU and monocular image
CN111586360B (en) * 2020-05-14 2021-09-10 佳都科技集团股份有限公司 Unmanned aerial vehicle projection method, device, equipment and storage medium
CN111814731B (en) * 2020-07-23 2023-12-01 科大讯飞股份有限公司 Sitting posture detection method, device, equipment and storage medium
CN114119885A (en) * 2020-08-11 2022-03-01 中国电信股份有限公司 Image feature point matching method, device and system and map construction method and system
CN112215714A (en) * 2020-09-08 2021-01-12 北京农业智能装备技术研究中心 Rice ear detection method and device based on unmanned aerial vehicle
CN112419375B (en) * 2020-11-18 2023-02-03 青岛海尔科技有限公司 Feature point matching method and device, storage medium and electronic device
CN112613107A (en) * 2020-12-26 2021-04-06 广东电网有限责任公司 Method and device for determining construction progress of tower project, storage medium and equipment
CN112819889A (en) * 2020-12-30 2021-05-18 浙江大华技术股份有限公司 Method and device for determining position information, storage medium and electronic device
CN112634370A (en) * 2020-12-31 2021-04-09 广州极飞科技有限公司 Unmanned aerial vehicle dotting method, device, equipment and storage medium
CN114842156A (en) * 2021-02-01 2022-08-02 华为技术有限公司 Three-dimensional map construction method and device
CN112966718B (en) * 2021-02-05 2023-12-19 深圳市优必选科技股份有限公司 Image recognition method and device and communication equipment
CN112819892B (en) * 2021-02-08 2022-11-25 北京航空航天大学 Image processing method and device
CN112950667B (en) * 2021-02-10 2023-12-22 中国科学院深圳先进技术研究院 Video labeling method, device, equipment and computer readable storage medium
CN112907550B (en) * 2021-03-01 2024-01-19 创新奇智(成都)科技有限公司 Building detection method and device, electronic equipment and storage medium
CN112950715A (en) * 2021-03-04 2021-06-11 杭州迅蚁网络科技有限公司 Visual positioning method and device for unmanned aerial vehicle, computer equipment and storage medium
CN112991448B (en) * 2021-03-22 2023-09-26 华南理工大学 Loop detection method, device and storage medium based on color histogram
CN113326769B (en) * 2021-05-28 2022-11-29 北京三快在线科技有限公司 High-precision map generation method, device, equipment and storage medium
CN113628286B (en) * 2021-08-09 2024-03-22 咪咕视讯科技有限公司 Video color gamut detection method, device, computing equipment and computer storage medium
CN113673388A (en) * 2021-08-09 2021-11-19 北京三快在线科技有限公司 Method and device for determining position of target object, storage medium and equipment
CN113793414A (en) * 2021-08-17 2021-12-14 中科云谷科技有限公司 Method, processor and device for establishing three-dimensional view of industrial field environment
CN115729250A (en) * 2021-09-01 2023-03-03 中移(成都)信息通信科技有限公司 Flight control method, device and equipment of unmanned aerial vehicle and storage medium
CN114743116A (en) * 2022-04-18 2022-07-12 蜂巢航宇科技(北京)有限公司 Barracks patrol scene-based unattended special load system and method
CN114596363B (en) * 2022-05-10 2022-07-22 北京鉴智科技有限公司 Three-dimensional point cloud marking method and device and terminal
CN117115414B (en) * 2023-10-23 2024-02-23 西安羚控电子科技有限公司 GPS-free unmanned aerial vehicle positioning method and device based on deep learning
CN117395377B (en) * 2023-12-06 2024-03-22 上海海事大学 Multi-view fusion-based coastal bridge sea side safety monitoring method, system and medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104835115A (en) * 2015-05-07 2015-08-12 中国科学院长春光学精密机械与物理研究所 Imaging method for aerial camera, and system thereof
CN106485655A (en) * 2015-09-01 2017-03-08 张长隆 A kind of taken photo by plane map generation system and method based on quadrotor
CN106097304B (en) * 2016-05-31 2019-04-23 西北工业大学 A kind of unmanned plane real-time online ground drawing generating method
CN106595659A (en) * 2016-11-03 2017-04-26 南京航空航天大学 Map merging method of unmanned aerial vehicle visual SLAM under city complex environment
CN109073385A (en) * 2017-12-20 2018-12-21 深圳市大疆创新科技有限公司 A kind of localization method and aircraft of view-based access control model
CN108692661A (en) * 2018-05-08 2018-10-23 深圳大学 Portable three-dimensional measuring system based on Inertial Measurement Unit and its measurement method
CN108648240B (en) * 2018-05-11 2022-09-23 东南大学 Non-overlapping view field camera attitude calibration method based on point cloud feature map registration
CN109410316B (en) * 2018-09-21 2023-07-07 达闼机器人股份有限公司 Method for three-dimensional reconstruction of object, tracking method, related device and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663391A (en) * 2012-02-27 2012-09-12 安科智慧城市技术(中国)有限公司 Image multifeature extraction and fusion method and system
US20170053538A1 (en) * 2014-03-18 2017-02-23 Sri International Real-time system for multi-modal 3d geospatial mapping, object recognition, scene annotation and analytics
CN108932475A (en) * 2018-05-31 2018-12-04 中国科学院西安光学精密机械研究所 A kind of Three-dimensional target recognition system and method based on laser radar and monocular vision
CN108303099A (en) * 2018-06-14 2018-07-20 江苏中科院智能科学技术应用研究院 Autonomous navigation method in unmanned plane room based on 3D vision SLAM
CN109146935A (en) * 2018-07-13 2019-01-04 中国科学院深圳先进技术研究院 A kind of point cloud registration method, device, electronic equipment and readable storage medium storing program for executing
CN110047142A (en) * 2019-03-19 2019-07-23 中国科学院深圳先进技术研究院 No-manned plane three-dimensional map constructing method, device, computer equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115375870A (en) * 2022-10-25 2022-11-22 杭州华橙软件技术有限公司 Loop detection optimization method, electronic equipment and computer readable storage device
CN115375870B (en) * 2022-10-25 2023-02-10 杭州华橙软件技术有限公司 Loop detection optimization method, electronic equipment and computer readable storage device

Also Published As

Publication number Publication date
CN110047142A (en) 2019-07-23

Similar Documents

Publication Publication Date Title
WO2020186678A1 (en) Three-dimensional map constructing method and apparatus for unmanned aerial vehicle, computer device, and storage medium
CN109974693B (en) Unmanned aerial vehicle positioning method and device, computer equipment and storage medium
JP7326720B2 (en) Mobile position estimation system and mobile position estimation method
CN110047108B (en) Unmanned aerial vehicle pose determination method and device, computer equipment and storage medium
CN111862201B (en) Deep learning-based spatial non-cooperative target relative pose estimation method
CN106845430A (en) Pedestrian detection and tracking based on acceleration region convolutional neural networks
CN114842365B (en) Unmanned aerial vehicle aerial photography target detection and identification method and system
CN106815323B (en) Cross-domain visual retrieval method based on significance detection
CN110969648B (en) 3D target tracking method and system based on point cloud sequence data
Cepni et al. Vehicle detection using different deep learning algorithms from image sequence
Vaquero et al. Dual-branch CNNs for vehicle detection and tracking on LiDAR data
US20220292715A1 (en) Method and apparatus for estimating pose of device
CN111812978B (en) Cooperative SLAM method and system for multiple unmanned aerial vehicles
Ribeiro et al. Underwater place recognition in unknown environments with triplet based acoustic image retrieval
CN104463962B (en) Three-dimensional scene reconstruction method based on GPS information video
CN115861860B (en) Target tracking and positioning method and system for unmanned aerial vehicle
CN112818837B (en) Aerial photography vehicle weight recognition method based on attitude correction and difficult sample perception
Wang et al. Online drone-based moving target detection system in dense-obstructer environment
CN114119757A (en) Image processing method, apparatus, device, medium, and computer program product
Chen et al. Towards bio-inspired place recognition over multiple spatial scales
Kasebi et al. Hybrid navigation based on GPS data and SIFT-based place recognition using Biologically-inspired SLAM
Yin et al. M2F2-RCNN: Multi-functional faster RCNN based on multi-scale feature fusion for region search in remote sensing images
Schwaiger et al. Ultrafast object detection on high resolution sar images
CN117523428B (en) Ground target detection method and device based on aircraft platform
An et al. Research on map matching of lidar/vision sensor for automatic driving aided positioning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19920105

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19920105

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.03.2022)
