WO2020186678A1 - Three-dimensional map constructing method and apparatus for unmanned aerial vehicle, computer device, and storage medium - Google Patents

Three-dimensional map constructing method and apparatus for unmanned aerial vehicle, computer device, and storage medium

Info

Publication number
WO2020186678A1
Authority
WO
WIPO (PCT)
Prior art keywords
video frame
target
transformation matrix
feature
dimensional
Prior art date
Application number
PCT/CN2019/097745
Other languages
French (fr)
Chinese (zh)
Inventor
周翊民
龚亮
吴庆甜
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Publication of WO2020186678A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05 Geographic models

Definitions

  • the present invention relates to the field of computer technology, in particular to a method, device, computer equipment and storage medium for constructing a three-dimensional map of an unmanned aerial vehicle.
  • an embodiment of the present invention provides a method for constructing a three-dimensional map of a drone, the method including:
  • the three-dimensional point cloud map is combined with the target object information to obtain a three-dimensional point cloud map containing the target object information.
  • an embodiment of the present invention provides a device for constructing a three-dimensional map of a drone, the device including:
  • the extraction module is used to obtain the video frame images taken by the camera, and extract the feature points in each video frame image
  • the matching module is used to use the color histogram and scale-invariant feature transformation hybrid matching algorithm to match the feature points between the video frame images to obtain the feature point matching pairs between the video frame images;
  • a calculation module configured to calculate the pose transformation matrix between the video frame images according to the feature point matching pair between the video frame images
  • a determining module configured to determine the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix
  • the conversion module is used to convert the 3D coordinates of the feature points in the video frame image to the world coordinate system according to the 3D coordinates corresponding to the video frame image and the corresponding pose transformation matrix to obtain a 3D point cloud map;
  • the detection module is configured to use the video frame image as the input of the target detection model to obtain target information in the video frame image detected by the target detection model;
  • the combining module is used to combine the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
  • an embodiment of the present invention provides a computer device including a memory and a processor, the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the following steps:
  • the three-dimensional point cloud map is combined with the target object information to obtain a three-dimensional point cloud map containing the target object information.
  • an embodiment of the present invention provides a computer-readable storage medium storing a computer program, and when the computer program is executed by a processor, the processor executes the following steps:
  • the three-dimensional point cloud map is combined with the target object information to obtain a three-dimensional point cloud map containing the target object information.
  • the above-mentioned UAV 3D map construction method, device, computer equipment and storage medium match the feature points between the video frame images by using a color histogram and scale-invariant feature transformation hybrid matching algorithm, which can improve the accuracy and real-time performance of feature point matching.
  • through the target detection model, the target in the video frame image is identified and detected, and the target information is combined with the 3D point cloud map to obtain a 3D point cloud map containing the object information, so that the established 3D point cloud map contains more information.
  • that is, the accuracy of 3D map construction is improved by the hybrid matching of color histograms and scale-invariant feature transforms, and combining the map with the target information obtained by the target detection model makes the 3D point cloud map contain richer content, which provides support for subsequent optimal path planning.
  • Figure 1 is a flowchart of a method for constructing a three-dimensional map of a drone in an embodiment
  • Fig. 2 is a schematic diagram of a method for constructing a three-dimensional map of a drone in an embodiment
  • FIG. 3 is a schematic diagram of combining color histogram and SIFT feature matching in an embodiment
  • Figure 4 is a schematic diagram of training and prediction of a drone target detection model based on deep learning in an embodiment
  • Fig. 5 is a structural block diagram of a device for constructing a three-dimensional map of a drone in an embodiment
  • Fig. 6 is a structural block diagram of a device for constructing a three-dimensional map of a drone in another embodiment
  • FIG. 7 is a structural block diagram of a device for constructing a three-dimensional map of a drone in another embodiment
  • Figure 8 is an internal structure diagram of a computer device in an embodiment.
  • a method for constructing a three-dimensional map of a drone is proposed.
  • the method for constructing a three-dimensional map of a drone is applied to a drone or a terminal or server connected to the drone.
  • in this embodiment, application on a drone is taken as an example; the method specifically includes the following steps:
  • Step 102 Obtain video frame images captured by the camera, and extract feature points in each video frame image.
  • feature points can be simply understood as more prominent points in the image, such as contour points, bright spots in a darker area, and dark spots in a brighter area.
  • the camera of the drone may be an RGB-D camera; the color image and the depth image obtained by shooting are acquired and aligned in time, and the feature points are then extracted from the color image. Feature extraction for the feature points can use the color histogram and the scale-invariant feature transform.
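  • As a rough illustration of this step, the sketch below assumes OpenCV with SIFT support and a depth image already registered to the color image; helper names such as extract_features are illustrative and not from the patent. It extracts SIFT keypoints from the color frame and attaches a small local color histogram and a depth value to each keypoint.

```python
import cv2
import numpy as np

def extract_features(color_img, depth_img, patch=15):
    """Extract SIFT keypoints/descriptors plus a local HSV color histogram per keypoint.

    Assumes color_img (BGR) and depth_img (uint16, millimeters) are already
    time-aligned and registered to the same viewpoint.
    """
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(
        cv2.cvtColor(color_img, cv2.COLOR_BGR2GRAY), None)

    hsv = cv2.cvtColor(color_img, cv2.COLOR_BGR2HSV)
    h, w = depth_img.shape[:2]
    half = patch // 2
    color_hists, depths = [], []
    for kp in keypoints:
        x, y = int(round(kp.pt[0])), int(round(kp.pt[1]))
        x0, x1 = max(0, x - half), min(w, x + half + 1)
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        roi = hsv[y0:y1, x0:x1]
        # 2D hue-saturation histogram describing the local "color" of the point
        hist = cv2.calcHist([roi], [0, 1], None, [16, 16], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        color_hists.append(hist)
        depths.append(float(depth_img[y, x]) / 1000.0)  # meters
    return keypoints, descriptors, color_hists, np.array(depths)
```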
  • Step 104 Use a color histogram and a scale-invariant feature transform hybrid matching algorithm to match the feature points between the video frame images to obtain a feature point matching pair between the video frame images.
  • the color histogram matching algorithm focuses on the matching of color features
  • the scale-invariant feature transform (SIFT) focuses on the matching of shape features. Therefore, the color histogram matching algorithm and the scale-invariant feature transform are mixed; that is, the "color" of the color histogram is combined with the "shape" of the SIFT algorithm, which improves the accuracy of feature recognition and of feature point matching.
  • it is also beneficial to improve the real-time performance of recognition, thereby helping to improve the real-time performance and accuracy of subsequent 3D point cloud map generation.
  • feature matching is performed according to the features of the feature points to obtain feature point matching pairs between the video frame images. Since the drone is constantly flying, the same point in real space appears at different positions in different video frames. By acquiring the features of the feature points in the preceding and following video frames and then matching according to those features, the positions of the same real-space point in different video frames are obtained.
  • two adjacent video frame images are obtained, the features of multiple feature points are extracted from the previous video frame image and the next video frame image, and the features of the feature points are then matched; the matching feature points in the previous video frame image and the following video frame image form feature point matching pairs.
  • for example, the feature points in the previous video frame image are P1, P2, P3, ..., Pn, and the corresponding matching feature points in the next video frame image are Q1, Q2, Q3, ..., Qn; then P1 and Q1, P2 and Q2, and P3 and Q3 each form a feature point matching pair.
  • Feature point matching can use brute-force matching or the Fast Library for Approximate Nearest Neighbors (FLANN) algorithm for feature matching.
  • the Fast Nearest Neighbor algorithm judges whether the ratio between the closest matching distance and the next-closest matching distance exceeds a set threshold. If it exceeds the preset threshold, it is determined that the matching is successful, thereby reducing mismatched point pairs.
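  • A minimal sketch of this ratio-test idea, using OpenCV's FLANN matcher on SIFT descriptors; the 0.7 ratio and other parameter values are illustrative choices, not taken from the patent.

```python
import cv2

def match_descriptors(desc_prev, desc_next, ratio=0.7):
    """Match SIFT descriptors between two frames with FLANN + a ratio test."""
    index_params = dict(algorithm=1, trees=5)   # 1 = FLANN_INDEX_KDTREE
    search_params = dict(checks=50)
    flann = cv2.FlannBasedMatcher(index_params, search_params)
    knn = flann.knnMatch(desc_prev, desc_next, k=2)

    good = []
    for pair in knn:
        if len(pair) < 2:
            continue
        best, second = pair
        # keep the pair only when the best match is clearly better than the runner-up
        if best.distance < ratio * second.distance:
            good.append(best)
    return good  # each DMatch links a feature in the previous frame to one in the next
```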
  • Step 106 Calculate the pose transformation matrix between the video frame images according to the feature point matching pair between the video frame images.
  • the pose transformation matrix between the video frame images can be calculated according to the correspondence between the positions.
  • Step 108 Determine the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix.
  • the three-dimensional coordinates corresponding to the video frame image refer to the three-dimensional coordinates corresponding to the camera in the drone.
  • the three-dimensional coordinates of any video frame image can be calculated according to the transformation relationship.
  • the three-dimensional coordinates corresponding to a video frame image actually refer to the three-dimensional coordinates of the location where the camera was when it captured that video frame image.
  • Step 110 Convert the three-dimensional coordinates of the feature points in the video frame image to the world coordinate system according to the three-dimensional coordinates corresponding to the video frame image and the corresponding pose transformation matrix to obtain a three-dimensional point cloud map.
  • the coordinates of the feature points in the video frame image are likewise in the camera coordinate system; in order to convert the coordinates of the feature points into the world coordinate system, the transformation is performed according to the pose transformation matrix to obtain the three-dimensional coordinates of the feature points in world coordinates, thereby obtaining a three-dimensional point cloud map.
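  • For illustration, the sketch below assumes a standard pinhole camera model with intrinsics fx, fy, cx, cy and a camera-to-world pose given by a rotation R_wc and translation t_wc (names are hypothetical); it back-projects a pixel with depth into camera coordinates and then transforms the point into the world frame.

```python
import numpy as np

def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with depth (meters) into the camera frame."""
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

def camera_to_world(p_cam, R_wc, t_wc):
    """Transform a camera-frame point into the world frame: p_w = R_wc @ p_cam + t_wc."""
    return R_wc @ p_cam + t_wc

# Example: accumulate world-frame points of matched features into the cloud
# cloud = [camera_to_world(pixel_to_camera(u, v, d, fx, fy, cx, cy), R_wc, t_wc)
#          for (u, v, d) in frame_feature_pixels]
```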
  • Step 112 The video frame image is used as the input of the target detection model, and the target object information in the video frame image detected by the target detection model is obtained.
  • the target detection model is obtained by pre-training, and the target detection model is used to detect the target object appearing in the video frame image, for example, a car. Since the video frame image may contain multiple objects, if the category of each object needs to be recognized, multiple target detection models need to be trained accordingly. After the target detection model is trained, the video frame image is used as the input of the target detection model, and the target object and the location of the target object in the video frame image can be detected.
  • Step 114 Combine the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
  • the feature points corresponding to the target can be determined, and the target information corresponding to the feature points can be marked in the three-dimensional point cloud map, so that the established three-dimensional point cloud map has a richer amount of information.
  • the target detection model is used for local perception, and the construction of the 3D point cloud map is based on global perception, which combines global perception and local perception, thereby increasing the richness of the 3D point cloud map.
  • the above-mentioned UAV 3D map construction method uses a color histogram and a scale-invariant feature transformation hybrid matching algorithm to match feature points between video frame images, which can improve the accuracy and real-time performance of feature point matching.
  • through the target detection model, the target in the video frame image is identified and detected, and the target information is combined with the 3D point cloud map to obtain a 3D point cloud map containing the object information, so that the established 3D point cloud map contains more information.
  • that is, the accuracy of 3D map construction is improved by the hybrid matching of color histograms and scale-invariant feature transforms, and combining the map with the target information obtained by the target detection model makes the 3D point cloud map contain richer content, which provides support for subsequent optimal path planning and improves the intelligence of the drone's environment perception.
  • a schematic diagram of a method for constructing a three-dimensional map of a UAV includes two parts: global perception and local perception.
  • in global perception, a hybrid structural framework of color histogram and SIFT features is used for matching, and then positioning and 3D point cloud map construction are performed.
  • Local perception uses the target detection model to identify the target in the video frame image. Finally, the two are combined to obtain a three-dimensional point cloud map containing the target information.
  • the use of a color histogram and a scale-invariant feature transformation hybrid matching algorithm to match feature points between video frame images to obtain feature point matching pairs between video frame images includes: using the color histogram feature matching algorithm to match the feature points between the video frame images to obtain a first matching pair set; and using the scale-invariant feature transform matching algorithm to further match the matching points in the first matching pair set to obtain the target feature point matching pairs.
  • the color histogram is used for preliminary feature point matching to obtain the first matching pair set, and then the scale-invariant feature transform matching algorithm is used to further match the matching points in the first matching pair set to obtain the target feature point matching pair .
  • the matching of the color histogram adopts the Bhattacharyya distance calculation, or the correlation distance calculation.
  • as shown in FIG. 3, which is a schematic diagram of the combination of color histogram and SIFT feature matching in an embodiment, the two are in a cascade relationship.
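  • A sketch of one way such a cascade could look (illustrative only, not the patent's implementation): candidate matches are first gated by the Bhattacharyya distance between local color histograms, and the survivors are then confirmed by SIFT descriptor distance with a ratio test; the thresholds are assumed values.

```python
import cv2
import numpy as np

def cascade_match(hists_a, desc_a, hists_b, desc_b,
                  hist_thresh=0.4, ratio=0.7):
    """Cascade matching sketch: a color-histogram gate followed by SIFT confirmation.

    Stage 1: for every feature in frame A keep only frame-B features whose local
    color histogram is close (small Bhattacharyya distance) -> first matching set.
    Stage 2: among those candidates, pick the best SIFT-descriptor match and apply
    a ratio test -> target feature point matching pairs.
    """
    matches = []
    for i, (hist_a, da) in enumerate(zip(hists_a, desc_a)):
        # Stage 1: color gate
        candidates = [j for j, hist_b in enumerate(hists_b)
                      if cv2.compareHist(hist_a, hist_b,
                                         cv2.HISTCMP_BHATTACHARYYA) < hist_thresh]
        if len(candidates) < 2:
            continue
        # Stage 2: SIFT descriptor distance among surviving candidates
        dists = sorted((np.linalg.norm(da - desc_b[j]), j) for j in candidates)
        (d0, j0), (d1, _) = dists[0], dists[1]
        if d0 < ratio * d1:
            matches.append((i, j0))
    return matches
```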
  • calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images includes: obtaining the three-dimensional coordinates of each feature point in the feature point matching pairs; calculating the converted three-dimensional coordinates obtained by converting the three-dimensional coordinates of the feature points in one video frame image into another video frame image; obtaining the target three-dimensional coordinates corresponding to the matching feature points in the other video frame image; and calculating the pose transformation matrix according to the converted three-dimensional coordinates and the target three-dimensional coordinates.
  • the three-dimensional coordinates of each feature point are obtained.
  • the three-dimensional coordinates are obtained from the color image and the depth image taken by the RGB-D camera.
  • the color image is used to identify the x and y values of the feature point, and the depth image is used to obtain the corresponding z value.
  • the feature point matching pairs are regarded as two sets: the set of feature points in the first video frame image is P = {P_i ∈ R³ | i = 1, 2, ..., N}, and the set of feature points in the second video frame image is Q = {Q_i ∈ R³ | i = 1, 2, ..., N}. The error between the two point sets is taken as the cost function, and the corresponding rotation matrix R and translation vector t are calculated by minimizing it:
  • min over R, t of (1/2) · Σ_{i=1..N} ‖P_i − (R·Q_i + t)‖²
  • where R and t are the rotation matrix and translation vector respectively.
  • the steps of the iterative closest point algorithm are:
  • the constrained rotation matrix and translation vector can be expressed by an unconstrained Lie algebra, and the number of feature points whose error distance is less than the set threshold, that is, the number of inliers, is recorded. If the error distance E_d calculated in step 3) is less than the threshold and the number of inliers is greater than the set threshold, or if the number of iterations reaches the set limit, the iteration ends; otherwise, return to step 1) for the next iteration.
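  • As a rough, self-contained sketch of this kind of iteration (not the patent's exact procedure): given matched 3D point sets, the rigid transform minimizing the cost function above can be computed in closed form with an SVD, and a simple ICP loop re-estimates correspondences and repeats until the error or the iteration count reaches its limit.

```python
import numpy as np

def best_fit_transform(P, Q):
    """Closed-form R, t minimizing sum ||P_i - (R Q_i + t)||^2 (SVD / Kabsch)."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (Q - cQ).T @ (P - cP)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # avoid a reflection
        Vt[2, :] *= -1
        R = Vt.T @ U.T
    t = cP - R @ cQ
    return R, t

def icp(P, Q, R0=np.eye(3), t0=np.zeros(3), max_iter=30, tol=1e-4):
    """Simple point-to-point ICP; R0, t0 can come from an initial (e.g. IMU) estimate."""
    R, t = R0, t0
    for _ in range(max_iter):
        Q_moved = (R @ Q.T).T + t
        # nearest-neighbour correspondences from transformed Q to P
        d2 = ((P[None, :, :] - Q_moved[:, None, :]) ** 2).sum(-1)
        E_d = np.sqrt(d2.min(axis=1)).mean()   # mean error distance under current R, t
        if E_d < tol:
            break
        idx = d2.argmin(axis=1)
        R, t = best_fit_transform(P[idx], Q)
    return R, t
```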
  • the target detection model is obtained based on deep learning model training; before the video frame image is used as the input of the target detection model to obtain the detected target object output by the target detection model, It also includes: acquiring training video image samples, the training video image samples including positive samples and negative samples, the positive samples include a target and a position mark of the target in the video image; according to the training video The image samples train the target detection model to obtain a trained target detection model.
  • the target detection model is obtained by training with a deep learning model.
  • in order to train the target detection model, the training video image samples are first obtained, and positive samples and negative samples are set.
  • a positive sample is a video image that contains the target and the position mark of the target in the video image; through training, a target detection model that can detect the target is obtained.
  • as shown in FIG. 4, in one embodiment, the training and prediction of the UAV target detection model based on deep learning is divided into two parts: preprocessing and real-time detection.
  • to detect targets in real time, pre-processing operations are first performed on the data collected by the drone: the collected video stream is divided into video frame images, the targets in the images are marked, and the data are divided into training and test data sets. A deep learning framework is used to train the model, and the saved model is then applied to the video stream returned by the platform to complete real-time target detection.
  • A small drone carrier equipped with an industrial camera is used to extensively sample a large amount of video data of the scene from the drone's perspective; the recognition targets of the drone are determined, and the required targets are marked in the acquired video data.
  • The preprocessed data are used to train the neural network model, and the model parameters are adjusted until the training results meet the convergence conditions. The trained model is saved for subsequent target detection and loaded onto the drone, and target detection experiments are conducted with the drone while the model is continuously adjusted and optimized.
  • the deep learning model adopts the YOLOv3 network structure (whose backbone is also called Darknet-53) and is a fully convolutional network, including the introduction of a residual structure, that is, ResNet-style skip connections, with a large number of residual network blocks.
  • Convolutions with a stride of 2 are used for down-sampling, while up-sampling and route operations allow detection to be performed at three scales within one network structure.
  • Dimension clusters are used as anchor boxes to predict bounding boxes, the sum of squared error loss is used during training, and the objectness score of each bounding box is predicted through logistic regression. If a prior bounding box is not the best but overlaps the object to be detected by more than a certain threshold, that prediction is ignored.
  • With a threshold of 0.5, the system assigns only one bounding box prior to each object to be detected. If a prior bounding box is not assigned to the object, it incurs no loss for coordinate or class prediction. Each box uses multi-label classification to predict the classes the bounding box may contain, and binary cross-entropy loss is used for class prediction during training.
  • the YOLOv3 lightweight target detection neural network structure is applied to the UAV platform, which improves the ability of real-time target recognition under the limited computing power of the UAV.
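  • As one possible way (not necessarily the authors' implementation) to run a trained YOLOv3 model on the returned video stream, OpenCV's DNN module can load Darknet weights; the file names, input size and the 0.5 confidence threshold below are illustrative.

```python
import cv2
import numpy as np

# Hypothetical file names for the trained model
net = cv2.dnn.readNetFromDarknet("yolov3-uav.cfg", "yolov3-uav.weights")
layer_names = net.getUnconnectedOutLayersNames()

def detect_objects(frame, conf_thresh=0.5, nms_thresh=0.4):
    """Run YOLOv3 on one video frame and return (class_id, confidence, box) tuples."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(layer_names)          # predictions at the 3 detection scales

    boxes, confidences, class_ids = [], [], []
    for output in outputs:
        for det in output:                      # det = [cx, cy, bw, bh, objectness, class scores...]
            scores = det[5:]
            class_id = int(np.argmax(scores))
            conf = float(det[4] * scores[class_id])
            if conf > conf_thresh:
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                confidences.append(conf)
                class_ids.append(class_id)
    keep = cv2.dnn.NMSBoxes(boxes, confidences, conf_thresh, nms_thresh)
    return [(class_ids[i], confidences[i], boxes[i]) for i in np.array(keep).flatten()]
```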
  • the combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information includes: acquiring the detected target object in the video frame image Location; determining a matching feature point according to the target location; marking the target information to the three-dimensional point cloud map according to the feature point.
  • according to the position of the detected target object, the feature points matching the target position are determined, and the target information matching those feature points is marked on the three-dimensional point cloud map.
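  • A sketch of this combining step with illustrative names (not the patent's code): feature points whose pixel coordinates fall inside a detected bounding box are tagged with the detected class in the point cloud.

```python
def label_point_cloud(points_world, keypoints, detections):
    """Attach target-object information to 3D map points.

    points_world: list of 3D points (world frame), one per feature point
    keypoints:    matching 2D pixel locations (u, v) of those feature points
    detections:   list of (class_id, confidence, (x, y, w, h)) from the detector
    Returns a list of (point, class_id or None) pairs.
    """
    labeled = []
    for p_w, (u, v) in zip(points_world, keypoints):
        label = None
        for class_id, conf, (x, y, w, h) in detections:
            if x <= u <= x + w and y <= v <= y + h:
                label = class_id            # feature point lies inside this target's box
                break
        labeled.append((p_w, label))
    return labeled
```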
  • the method further includes: obtaining measurement data measured by an inertial measurement unit; and calculating an initial pose transformation matrix between video frames according to the measurement data.
  • the calculating of the pose transformation matrix between the video frame images according to the feature point matching pairs then includes: calculating the target pose transformation matrix between the video frames according to the initial pose transformation matrix and the feature point matching pairs between the video frame images.
  • an inertial measurement unit is a device that measures the three-axis attitude angle (or angular velocity) and acceleration of an object.
  • the inertial measurement unit is used as the inertial parameter measurement device of the UAV.
  • the device includes a three-axis gyroscope, a three-axis accelerometer, and a three-axis magnetometer.
  • the UAV can directly read the measurement data measured by the inertial measurement unit.
  • the measurement data includes: angular velocity, acceleration, and magnetometer data.
  • the UAV's pose transformation matrix can be directly calculated based on the measurement data; however, because the inertial measurement unit accumulates errors, the pose transformation matrix obtained in this way is not accurate enough.
  • the pose transformation matrix includes a rotation matrix R and a translation vector t.
  • the initial pose transformation matrix corresponding to the measurement data is calculated by using a complementary filtering algorithm. After the initial pose transformation matrix is obtained, it is used as the initial matrix, and the Iterative Closest Point (ICP) algorithm is used to calculate the target pose transformation matrix between the video frames according to the feature point matching pairs between the video frame images.
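  • A toy sketch of one common complementary-filter form, given here as a generic example rather than the patent's specific filter: integrated gyroscope angles are blended with accelerometer tilt angles to obtain the roll and pitch used for the initial pose estimate.

```python
import numpy as np

def complementary_filter(roll, pitch, gyro, accel, dt, alpha=0.98):
    """One update step: blend integrated gyro angles with accelerometer tilt angles."""
    # integrate angular rate (rad/s) for the high-frequency part
    roll_gyro = roll + gyro[0] * dt
    pitch_gyro = pitch + gyro[1] * dt
    # accelerometer gives a drift-free but noisy low-frequency tilt estimate
    ax, ay, az = accel
    roll_acc = np.arctan2(ay, az)
    pitch_acc = np.arctan2(-ax, np.sqrt(ay ** 2 + az ** 2))
    roll = alpha * roll_gyro + (1 - alpha) * roll_acc
    pitch = alpha * pitch_gyro + (1 - alpha) * pitch_acc
    return roll, pitch
```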
  • the method further includes: calculating the amount of motion between the current video frame and the previous key frame; if the amount of motion is greater than the preset threshold, taking the current video frame as a key frame; when the current video frame is a key frame, matching the current video frame with the key frames in the existing key frame library, and if there is a key frame in the key frame library that matches the current video frame, taking the current video frame as the loop frame; and optimizing and updating the corresponding pose transformation matrix according to the loop frame to obtain an updated pose transformation matrix;
  • the determining of the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix then includes: determining the three-dimensional coordinates corresponding to each video frame image according to the updated pose transformation matrix.
  • the calculation complexity can be reduced by extracting key frames. The captured video frames are relatively dense; for example, generally about 30 frames are captured within one second, so the similarity between adjacent frames is very high, or they may even be identical, and processing every frame would undoubtedly increase the computational complexity. Therefore, the complexity can be reduced by extracting key frames. Specifically, the first video frame is taken as a key frame, and the amount of motion between the current video frame and the previous key frame is then calculated; if the amount of motion falls within the set threshold range, the current frame is selected as a key frame, where the calculation formula of the amount of motion is:
  • E_m = ω1 · (|t_x| + |t_y| + |t_z|) + ω2 · (|θ_x| + |θ_y| + |θ_z|)
  • where E_m represents the measure of the amount of motion; t_x, t_y and t_z are the three translational components of the translation vector t; θ_x, θ_y and θ_z denote the rotation angles about the three axes; and ω1 and ω2 are the balance weights of translation and rotation respectively. For the visual field captured by the camera, rotation brings about larger scene changes more easily than translation, so the value of ω2 is larger than ω1, and the specific values should be adjusted according to the actual situation.
  • the loop detection method is used to optimize and update the obtained pose transformation matrix.
  • a closed loop detection algorithm is used for loop detection. After the loop detection is performed, the target pose transformation matrix is updated and optimized according to the loop detection result to obtain a more accurate pose transformation matrix, which is called "updated pose transformation matrix" for distinction. Determine the three-dimensional coordinates corresponding to each video frame image according to the updated pose transformation matrix.
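  • A simplified sketch of checking a new key frame against the key frame library (a generic descriptor-matching approach, not the patent's specific closed-loop algorithm): count ratio-test matches against each stored key frame and declare a loop when enough matches are found.

```python
import cv2

def find_loop_frame(desc_current, keyframe_library, min_matches=40, ratio=0.7):
    """Return the index of a matching key frame in the library, or None.

    keyframe_library: list of SIFT descriptor arrays, one per stored key frame.
    """
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
    for idx, desc_kf in enumerate(keyframe_library):
        knn = flann.knnMatch(desc_current, desc_kf, k=2)
        good = [m for m, n in (p for p in knn if len(p) == 2)
                if m.distance < ratio * n.distance]
        if len(good) >= min_matches:
            return idx          # loop detected: the current frame revisits this key frame
    return None
```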
  • a three-dimensional map construction device for drones is provided, which includes:
  • the extraction module 502 is configured to obtain video frame images taken by the camera, and extract feature points in each video frame image
  • the matching module 504 is configured to use a color histogram and a scale-invariant feature transform hybrid matching algorithm to match feature points between video frame images to obtain feature point matching pairs between video frame images;
  • the calculation module 506 is configured to calculate the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images;
  • the determining module 508 is configured to determine the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix
  • the conversion module 510 is configured to convert the 3D coordinates of the feature points in the video frame image to the world coordinate system according to the 3D coordinates corresponding to the video frame image and the corresponding pose transformation matrix to obtain a 3D point cloud map;
  • the detection module 512 is configured to use the video frame image as the input of the target detection model, and obtain target information in the video frame image detected by the target detection model;
  • the combining module 514 is configured to combine the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
  • the matching module 504 is further configured to use a color histogram feature matching algorithm to match feature points between video frame images to obtain a first set of matching pairs; use a scale-invariant feature transform matching algorithm to The first matching pair further matches the matching points in the set to obtain the target feature point matching pair.
  • the calculation module 506 is further configured to obtain the three-dimensional coordinates of each feature point in the feature point matching pair; calculate the conversion obtained by converting the three-dimensional coordinates of the feature point in one video frame image to another video frame image Three-dimensional coordinates; obtain the target three-dimensional coordinates corresponding to the corresponding matching feature points in the another video frame image; calculate the pose transformation matrix according to the converted three-dimensional coordinates and the target three-dimensional coordinates.
  • the target detection model is obtained based on deep learning model training;
  • the above-mentioned three-dimensional map construction device for drones further includes: a training module for obtaining training video image samples, the training video image samples including positive Samples and negative samples, the positive sample includes a target and a position mark of the target in the video image; training the target detection model according to the training video image sample to obtain a trained target detection model.
  • the combining module 514 is also used to obtain the target position of the detected target in the video frame image; determine the matching feature points according to the target position; and mark the object category information on the three-dimensional point cloud map according to the feature points.
  • the above-mentioned three-dimensional map construction device for drones further includes:
  • the initial calculation module 505 is configured to obtain measurement data measured by the inertial measurement unit, and calculate an initial pose transformation matrix between video frames according to the measurement data;
  • the calculation module is further configured to calculate the target pose transformation matrix between the video frames according to the initial pose transformation matrix and the feature point matching pairs between the video frame images.
  • the above-mentioned three-dimensional map construction device for drones further includes:
  • the key frame determination module 516 is used to calculate the amount of motion between the current video frame and the previous key frame. If the amount of motion is greater than the preset threshold, the current video frame is used as the key frame.
  • the loopback frame determination module 518 is configured to match the current video frame with the key frames in the existing key frame library when the current video frame is a key frame, and if there is a key frame in the key frame library that matches the current video frame, use the current video frame as a loop frame.
  • the optimization module 520 is configured to optimize and update the corresponding pose transformation matrix according to the loop frame to obtain an updated pose transformation matrix.
  • the determining module 508 is further configured to determine the three-dimensional coordinates corresponding to each video frame image according to the updated pose transformation matrix.
  • Fig. 8 shows an internal structure diagram of a computer device in an embodiment.
  • the computer equipment can be a drone, or a terminal or server connected to the drone.
  • the computer device includes a processor, a memory, and a network interface connected through a system bus.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device stores an operating system and may also store a computer program.
  • when executed by the processor, the computer program can enable the processor to implement the method for constructing a three-dimensional map of the drone.
  • a computer program may also be stored in the internal memory, and when the computer program is executed by the processor, the processor can execute the method for constructing a three-dimensional map of the UAV.
  • the network interface is used to communicate with an external device.
  • FIG. 8 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • the specific computer device may include more or fewer parts than shown in the figure, or combine some parts, or have a different arrangement of parts.
  • the method for constructing a three-dimensional map of an unmanned aerial vehicle can be implemented in the form of a computer program, and the computer program can be run on a computer device as shown in FIG. 8.
  • the memory of the computer equipment can store the various program templates that make up the UAV 3D map construction device, for example, the extraction module 502, the matching module 504, the calculation module 506, the determination module 508, the conversion module 510, the detection module 512, and the combination module 514.
  • a computer device includes a memory and a processor.
  • the memory stores a computer program.
  • the processor executes the following steps: acquiring the video frame images captured by a camera, and extracting the feature points in each video frame image; using a color histogram and scale-invariant feature transformation hybrid matching algorithm to match the feature points between the video frame images to obtain feature point matching pairs between the video frame images; calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images; determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix; converting the three-dimensional coordinates of the feature points in the video frame images to the world coordinate system according to the three-dimensional coordinates corresponding to the video frame images and the corresponding pose transformation matrices to obtain a three-dimensional point cloud map; using the video frame image as the input of the target detection model to obtain the target object information in the video frame image detected by the target detection model; and combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
  • the use of a color histogram and a scale-invariant feature transformation hybrid matching algorithm to match feature points between video frame images to obtain feature point matching pairs between video frame images includes: using the color histogram feature matching algorithm to match the feature points between the video frame images to obtain a first matching pair set; and using the scale-invariant feature transform matching algorithm to further match the matching points in the first matching pair set to obtain the target feature point matching pairs.
  • calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images includes: obtaining the three-dimensional coordinates of each feature point in the feature point matching pairs; calculating the converted three-dimensional coordinates obtained by converting the three-dimensional coordinates of the feature points in one video frame image into another video frame image; obtaining the target three-dimensional coordinates corresponding to the matching feature points in the other video frame image; and calculating the pose transformation matrix according to the converted three-dimensional coordinates and the target three-dimensional coordinates.
  • the target detection model is obtained based on deep learning model training; before the video frame image is used as the input of the target detection model to obtain the detected target object output by the target detection model, It also includes: acquiring training video image samples, the training video image samples including positive samples and negative samples, the positive samples include a target and a position mark of the target in the video image; according to the training video The image samples train the target detection model to obtain a trained target detection model.
  • the combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information includes: acquiring the detected target object in the video frame image Location; determining a matching feature point according to the target location; marking the object category information to the three-dimensional point cloud map according to the feature point.
  • when the computer program is executed by the processor, it is also used to perform the following steps: obtaining measurement data measured by the inertial measurement unit; calculating an initial pose transformation matrix between video frames based on the measurement data; and said calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images includes: calculating the target pose transformation matrix between the video frames according to the initial pose transformation matrix and the feature point matching pairs between the video frame images.
  • the current video frame is taken as the loop frame, and the corresponding pose transformation matrix is optimized and updated according to the loop frame to obtain an updated pose transformation matrix.
  • a computer-readable storage medium storing a computer program.
  • the processor executes the following steps: acquiring the video frame images taken by a camera, and extracting the feature points in each video frame image; using a color histogram and scale-invariant feature transformation hybrid matching algorithm to match the feature points between the video frame images to obtain feature point matching pairs between the video frame images; calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images; determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix; converting the three-dimensional coordinates of the feature points in the video frame images to the world coordinate system according to the three-dimensional coordinates corresponding to the video frame images and the corresponding pose transformation matrices to obtain a three-dimensional point cloud map; using the video frame image as the input of the target detection model to obtain the target object information in the video frame image detected by the target detection model; and combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
  • the use of a color histogram and a scale-invariant feature transformation hybrid matching algorithm to match feature points between video frame images to obtain feature point matching pairs between video frame images includes: using the color histogram feature matching algorithm to match the feature points between the video frame images to obtain a first matching pair set; and using the scale-invariant feature transform matching algorithm to further match the matching points in the first matching pair set to obtain the target feature point matching pairs.
  • calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images includes: obtaining the three-dimensional coordinates of each feature point in the feature point matching pairs; calculating the converted three-dimensional coordinates obtained by converting the three-dimensional coordinates of the feature points in one video frame image into another video frame image; obtaining the target three-dimensional coordinates corresponding to the matching feature points in the other video frame image; and calculating the pose transformation matrix according to the converted three-dimensional coordinates and the target three-dimensional coordinates.
  • the target detection model is obtained based on deep learning model training; before the video frame image is used as the input of the target detection model to obtain the detected target object output by the target detection model, It also includes: acquiring training video image samples, the training video image samples including positive samples and negative samples, the positive samples include a target and a position mark of the target in the video image; according to the training video The image samples train the target detection model to obtain a trained target detection model.
  • the combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information includes: acquiring the detected target object in the video frame image Location; determining a matching feature point according to the target location; marking the object category information to the three-dimensional point cloud map according to the feature point.
  • when the computer program is executed by the processor, it is also used to perform the following steps: obtaining measurement data measured by the inertial measurement unit; calculating an initial pose transformation matrix between video frames based on the measurement data; and said calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images includes: calculating the target pose transformation matrix between the video frames according to the initial pose transformation matrix and the feature point matching pairs between the video frame images.
  • the current video frame is taken as the loop frame, and the corresponding pose transformation matrix is optimized and updated according to the loop frame to obtain an updated pose transformation matrix.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Abstract

The present application relates to a three-dimensional map constructing method for an unmanned aerial vehicle. Said method comprises: acquiring video frame images photographed by a camera, extracting feature points in each of the video frame images; using a color histogram and scale-invariant feature transformation hybrid matching algorithm to perform matching on the feature points, to obtain feature point matching pairs; performing calculation according to the feature point matching pairs, to obtain an attitude transformation matrix; determining, according to the attitude transformation matrix, three-dimensional coordinates corresponding to each video frame image; converting the three-dimensional coordinates of the feature points in the video frame images into a world coordinate system, to obtain a three-dimensional point cloud map; taking the video frame images as an input to a target detection model, to obtain target object information; and combining the three-dimensional point cloud map with the target object information, to obtain a three-dimensional point cloud map containing the target object information. The method improves the real-time performance and accuracy of the construction of a three-dimensional point cloud map, containing rich information. In addition, further provided are a three-dimensional map constructing apparatus for an unmanned aerial vehicle, a computer device and a storage medium.

Description

UAV three-dimensional map construction method, device, computer equipment and storage medium

Technical field

The present invention relates to the field of computer technology, in particular to a method, device, computer equipment and storage medium for constructing a three-dimensional map of an unmanned aerial vehicle.

Background technique

With the development of science and technology, UAVs are becoming smaller and more intelligent, and their flight space has expanded to jungles, cities and even buildings. Environmental perception is the basis of a UAV's understanding of the environment, navigation, planning and behavioral decision-making during its work. The most important goal of environment perception is to build a complete three-dimensional map, and then plan and navigate routes based on the three-dimensional map.

The construction of traditional 3D maps is either low in accuracy or low in real-time performance, and the constructed maps carry relatively little information, which affects subsequent path planning and navigation.
Summary of the invention

Therefore, it is necessary to address the above problems and provide a method, device, computer equipment, and storage medium for building a three-dimensional UAV map that can meet good real-time requirements while ensuring high accuracy and that contains a large amount of information.

In the first aspect, an embodiment of the present invention provides a method for constructing a three-dimensional map of a drone, the method including:

obtaining the video frame images taken by the camera, and extracting the feature points in each video frame image;

using a color histogram and scale-invariant feature transformation hybrid matching algorithm to match the feature points between the video frame images, to obtain feature point matching pairs between the video frame images;

calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images;

determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix;

converting the 3D coordinates of the feature points in the video frame images to the world coordinate system according to the 3D coordinates corresponding to the video frame images and the corresponding pose transformation matrices, to obtain a 3D point cloud map;

using the video frame image as an input of a target detection model to obtain target information in the video frame image detected by the target detection model;

combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
In the second aspect, an embodiment of the present invention provides a device for constructing a three-dimensional map of a drone, the device including:

an extraction module, used to obtain the video frame images taken by the camera, and extract the feature points in each video frame image;

a matching module, used to apply the color histogram and scale-invariant feature transformation hybrid matching algorithm to match the feature points between the video frame images to obtain the feature point matching pairs between the video frame images;

a calculation module, configured to calculate the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images;

a determining module, configured to determine the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix;

a conversion module, used to convert the 3D coordinates of the feature points in the video frame images to the world coordinate system according to the 3D coordinates corresponding to the video frame images and the corresponding pose transformation matrices, to obtain a 3D point cloud map;

a detection module, configured to use the video frame image as the input of the target detection model to obtain target information in the video frame image detected by the target detection model;

a combining module, used to combine the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
In a third aspect, an embodiment of the present invention provides a computer device including a memory and a processor, the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the following steps:

obtaining the video frame images taken by the camera, and extracting the feature points in each video frame image;

using a color histogram and scale-invariant feature transformation hybrid matching algorithm to match the feature points between the video frame images, to obtain feature point matching pairs between the video frame images;

calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images;

determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix;

converting the 3D coordinates of the feature points in the video frame images to the world coordinate system according to the 3D coordinates corresponding to the video frame images and the corresponding pose transformation matrices, to obtain a 3D point cloud map;

using the video frame image as an input of a target detection model to obtain target information in the video frame image detected by the target detection model;

combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program, and when the computer program is executed by a processor, the processor executes the following steps:

obtaining the video frame images taken by the camera, and extracting the feature points in each video frame image;

using a color histogram and scale-invariant feature transformation hybrid matching algorithm to match the feature points between the video frame images, to obtain feature point matching pairs between the video frame images;

calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images;

determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix;

converting the 3D coordinates of the feature points in the video frame images to the world coordinate system according to the 3D coordinates corresponding to the video frame images and the corresponding pose transformation matrices, to obtain a 3D point cloud map;

using the video frame image as an input of a target detection model to obtain target information in the video frame image detected by the target detection model;

combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
The above-mentioned UAV 3D map construction method, device, computer equipment and storage medium match the feature points between the video frame images by using a color histogram and scale-invariant feature transformation hybrid matching algorithm, which can improve the accuracy and real-time performance of feature point matching. In addition, the target in the video frame image is identified and detected through the target detection model, and the target information is combined with the 3D point cloud map to obtain a 3D point cloud map containing the object information, so that the established 3D point cloud map contains more information. That is, the accuracy of 3D map construction is improved by the hybrid matching of color histograms and scale-invariant feature transforms, and combining the map with the target information obtained by the target detection model makes the 3D point cloud map contain richer content, which provides support for subsequent optimal path planning.
Description of the drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the drawings in the following description are only some embodiments of the present invention. For those of ordinary skill in the art, without creative work, other drawings can be obtained based on the structures shown in these drawings.

Figure 1 is a flowchart of a method for constructing a three-dimensional map of a drone in an embodiment;

Figure 2 is a schematic diagram of a method for constructing a three-dimensional map of a drone in an embodiment;

Figure 3 is a schematic diagram of combining color histogram and SIFT feature matching in an embodiment;

Figure 4 is a schematic diagram of training and prediction of a drone target detection model based on deep learning in an embodiment;

Figure 5 is a structural block diagram of a device for constructing a three-dimensional map of a drone in an embodiment;

Figure 6 is a structural block diagram of a device for constructing a three-dimensional map of a drone in another embodiment;

Figure 7 is a structural block diagram of a device for constructing a three-dimensional map of a drone in yet another embodiment;

Figure 8 is an internal structure diagram of a computer device in an embodiment.
具体实施方式detailed description
为了使本发明的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本发明进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本发明,并不用于限定本发明。In order to make the objectives, technical solutions, and advantages of the present invention clearer, the following further describes the present invention in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, but not to limit the present invention.
As shown in Figure 1, a method for constructing a three-dimensional map for an unmanned aerial vehicle (UAV) is proposed. The method is applied to a UAV, or to a terminal or server connected to the UAV; this embodiment takes application to the UAV as an example. The method specifically includes the following steps:
步骤102,获取相机拍摄得到的视频帧图像,提取每个视频帧图像中的特征点。Step 102: Obtain video frame images captured by the camera, and extract feature points in each video frame image.
其中,特征点可以简单理解为图像中比较显著的点,如轮廓点、较暗区域中的亮点,较亮区域中的暗点等。在一个实施例中,无人机的相机可以采用RGB-D相机,获取拍摄得到的彩色图像和深度图像,并将获取到的彩色图像和深度图像进行时间上对齐,然后提取彩色图像中的特征点,特征点的特征提取可以采用颜色直方图和尺度不变特征变换进行特征提取。Among them, feature points can be simply understood as more prominent points in the image, such as contour points, bright spots in a darker area, and dark spots in a brighter area. In one embodiment, the camera of the drone may use an RGB-D camera to obtain the color image and depth image obtained by shooting, and align the obtained color image and depth image in time, and then extract the features in the color image Point, feature point feature extraction can use color histogram and scale-invariant feature transformation for feature extraction.
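For illustration, the following is a minimal sketch of how a feature point's camera-frame 3D coordinates can be recovered from an aligned color/depth pair once the pixel location of the feature has been found; the pinhole intrinsics fx, fy, cx, cy and the function name are assumptions made for the example, not values from this disclosure. Python with NumPy-style indexing is assumed here and in the later sketches.

```python
def back_project(u, v, depth_image, fx, fy, cx, cy):
    """Map a pixel (u, v) plus its aligned depth reading to camera-frame XYZ."""
    z = depth_image[v, u]          # depth value at the pixel (row v, column u)
    x = (u - cx) * z / fx          # pinhole model: x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return x, y, z
```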
步骤104,采用颜色直方图和尺度不变特征变换混合匹配算法对视频帧图像之间的特征点进行匹配,得到视频帧图像之间的特征点匹配对。Step 104: Use a color histogram and a scale-invariant feature transform hybrid matching algorithm to match the feature points between the video frame images to obtain a feature point matching pair between the video frame images.
The color histogram matching algorithm focuses on matching color features, while the scale-invariant feature transform (SIFT) focuses on matching shape features. Mixing the color histogram matching algorithm with the scale-invariant feature transform therefore combines the "color" of the color histogram with the "shape" of the SIFT algorithm, which improves the accuracy of feature recognition and of feature point matching, and also helps the real-time performance of recognition, thereby improving the real-time performance and accuracy of the subsequent three-dimensional point cloud map generation.
在提取到每个视频帧图像中的特征点后,根据特征点的特征进行特征匹配,得到视频帧图像之间的特征点匹配对。由于无人机是在不断地飞行中,所以真实空间中的同一点在不同视频帧图像中的位置不同,通过获取前后视频帧中特征点的特征,然后根据特征进行匹配,得到真实空间中的同一点在不同视频帧中的位置。After the feature points in each video frame image are extracted, feature matching is performed according to the features of the feature points to obtain feature point matching pairs between the video frame images. Since the drone is constantly flying, the position of the same point in the real space is different in different video frames. By acquiring the features of the feature points in the front and rear video frames, and then matching according to the features, the real space is obtained. The position of the same point in different video frames.
In one embodiment, two adjacent video frame images are obtained, the features of multiple feature points are extracted from the previous video frame image and the following video frame image, and the features of the feature points are then matched, so that the matching feature points in the previous and following video frame images form feature point matching pairs. For example, if the feature points in the previous video frame image are P1, P2, P3, ..., Pn and the correspondingly matched feature points in the following video frame image are Q1, Q2, Q3, ..., Qn, then P1 and Q1 form a feature point matching pair, P2 and Q2 form a feature point matching pair, P3 and Q3 form a feature point matching pair, and so on. The feature point matching can use brute-force matching (Brute Force) or the fast approximate nearest neighbor (FLANN) algorithm; the fast approximate nearest neighbor matching judges whether the ratio between the closest matching distance and the second-closest matching distance passes a set threshold, and only matches that pass the threshold test are accepted, which reduces mismatched point pairs.
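A minimal sketch of this matching stage is given below, assuming OpenCV's SIFT implementation and a FLANN-based matcher with a conventional ratio test between the nearest and second-nearest distances; the function name and the ratio value of 0.7 are illustrative choices, not parameters from this disclosure.

```python
import cv2

def match_sift_features(prev_gray, curr_gray, ratio=0.7):
    """Match SIFT feature points between two consecutive grayscale frames."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(prev_gray, None)
    kp2, des2 = sift.detectAndCompute(curr_gray, None)

    # FLANN with a KD-tree index for floating-point SIFT descriptors
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
    knn = flann.knnMatch(des1, des2, k=2)

    # Keep a match only when the nearest distance is clearly better than the
    # second-nearest one, which discards ambiguous (mismatched) pairs.
    good = [m for m, n in knn if m.distance < ratio * n.distance]
    return kp1, kp2, good
```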
步骤106,根据视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵。Step 106: Calculate the pose transformation matrix between the video frame images according to the feature point matching pair between the video frame images.
其中,在确定了特征点在视频帧图像中的位置后,就可以根据位置之间的对应关系计算得到视频帧图像之间的位姿变换矩阵。Among them, after determining the position of the feature point in the video frame image, the pose transformation matrix between the video frame images can be calculated according to the correspondence between the positions.
步骤108,根据位姿变换矩阵确定每个视频帧图像对应的三维坐标。Step 108: Determine the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix.
The three-dimensional coordinates corresponding to a video frame image refer to the three-dimensional coordinates of the camera on the UAV. Once the pose transformation matrices between video frame images are known, the three-dimensional coordinates of any video frame image can be calculated from the transformation relations; the three-dimensional coordinates corresponding to a video frame image are in fact the coordinates of the three-dimensional point where the camera was located when it captured that frame.
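As a sketch of this step, the relative pose transformations can be chained to give each frame's camera pose (and hence the camera's 3D position when that frame was captured); the convention that each relative matrix maps the next frame's camera coordinates into the previous frame's camera coordinates is an assumption made for the example.

```python
import numpy as np

def accumulate_poses(relative_transforms):
    """relative_transforms[k]: 4x4 matrix mapping points from frame k+1's
    camera frame into frame k's camera frame."""
    poses = [np.eye(4)]                    # the first frame defines the origin
    for T_rel in relative_transforms:
        poses.append(poses[-1] @ T_rel)    # camera-to-world pose of the next frame
    return poses                           # poses[k][:3, 3] is the camera position
```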
步骤110,根据视频帧图像对应的三维坐标和相应的位姿变换矩阵将视频帧图像中的特征点的三维坐标转换到世界坐标系下,得到三维点云地图。Step 110: Convert the three-dimensional coordinates of the feature points in the video frame image to the world coordinate system according to the three-dimensional coordinates corresponding to the video frame image and the corresponding pose transformation matrix to obtain a three-dimensional point cloud map.
Since the three-dimensional coordinates corresponding to each video frame image are expressed in that frame's camera coordinate system, the coordinates of the feature points in the video frame image are also in the camera coordinate system. To bring all feature point coordinates into the world coordinate system, the conversion is performed according to the pose transformation matrices, giving the three-dimensional coordinates of the feature points in world coordinates and thus the three-dimensional point cloud map.
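The conversion itself can be sketched as follows, assuming each frame's camera-to-world pose is stored as a 4x4 homogeneous matrix and the feature points are given as an N x 3 array in the camera frame; the names are illustrative.

```python
import numpy as np

def camera_to_world(points_cam, T_wc):
    """points_cam: N x 3 array in the camera frame; T_wc: 4x4 camera-to-world pose."""
    homo = np.hstack([points_cam, np.ones((points_cam.shape[0], 1))])
    return (T_wc @ homo.T).T[:, :3]

def build_point_cloud(frames):
    """frames: iterable of (points_cam, T_wc) pairs, one entry per video frame."""
    cloud = [camera_to_world(points_cam, T_wc) for points_cam, T_wc in frames]
    return np.vstack(cloud)                # accumulated world-frame point cloud
```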
步骤112,将视频帧图像作为目标检测模型的输入,获取目标检测模型检测得到的视频帧图像中的目标物信息。Step 112: The video frame image is used as the input of the target detection model, and the target object information in the video frame image detected by the target detection model is obtained.
其中,预先训练得到目标检测模型,目标检测模型用于检测视频帧图像中出现的目标物,比如,汽车。由于视频帧图像中可能包含有多个物体,如果需要识别得到每个物体的类别,则相应地需要训练得到多个目标检测模型。在训练得到目标检测模型后,将视频帧图像作为目标检测模型的输入,就可以检测得到视频帧图像中的目标物以及目标物所在的位置。Among them, the target detection model is obtained by pre-training, and the target detection model is used to detect the target object appearing in the video frame image, for example, a car. Since the video frame image may contain multiple objects, if the category of each object needs to be recognized, multiple target detection models need to be trained accordingly. After the target detection model is trained, the video frame image is used as the input of the target detection model, and the target object and the location of the target object in the video frame image can be detected.
步骤114,将三维点云地图与目标物信息结合,得到包含有目标物信息的三维点云地图。Step 114: Combine the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
After the target object information in the video frame image is obtained, the feature points corresponding to the target object can be determined by matching against the feature points on the three-dimensional point cloud map, and the target object information corresponding to those feature points is annotated onto the three-dimensional point cloud map, so that the constructed map carries a richer amount of information. The target detection model provides local perception, while the construction of the three-dimensional point cloud map is based on global perception; combining global and local perception increases the richness of the three-dimensional point cloud map.
In the above method for constructing a three-dimensional map for a UAV, the feature points between video frame images are matched with a hybrid matching algorithm that combines a color histogram with the scale-invariant feature transform, which improves both the accuracy and the real-time performance of feature point matching. In addition, the target detection model recognizes and detects the targets in the video frame images, and the target object information is combined with the three-dimensional point cloud map to obtain a three-dimensional point cloud map containing object information, so that the constructed map carries richer information. In other words, the hybrid color histogram and scale-invariant feature transform matching improves the accuracy of three-dimensional map construction, and combining the map with the target object information recognized by the target detection model gives the three-dimensional point cloud map richer content, providing support for subsequent optimal path planning and raising the level of intelligence of the UAV's environment perception.
如图2所示,在一个实施例中,无人机三维地图构建方法的示意图,包括:全局感知和局部感知两个部分。全局感知中采用颜色直方图和SIFT特征进行混合的结构框架进行匹配,然后进行定位以及三维点云地图的构建。局部感知采用目标检测模型对视频帧图像中的目标物进行识别,最后,将两者结合,得到包含有目标物信息的三维点云地图。As shown in Fig. 2, in one embodiment, a schematic diagram of a method for constructing a three-dimensional map of a UAV includes two parts: global perception and local perception. In the global perception, a mixed structural framework of color histogram and SIFT features is used for matching, and then positioning and 3D point cloud map construction are performed. Local perception uses the target detection model to identify the target in the video frame image. Finally, the two are combined to obtain a three-dimensional point cloud map containing the target information.
In one embodiment, using the color histogram and scale-invariant feature transform hybrid matching algorithm to match feature points between video frame images, and obtaining feature point matching pairs between the video frame images, includes: matching the feature points between the video frame images with a color histogram feature matching algorithm to obtain a first set of matching pairs; and further matching the matching points in the first set of matching pairs with a scale-invariant feature transform matching algorithm to obtain the target feature point matching pairs.
其中,先采用颜色直方图进行初步的特征点匹配,得到第一匹配对集合,然后采用尺度不变特征变换匹配算法对第一匹配对集合中的匹配点进行进一步匹配,得到目标特征点匹配对。在一个实施例中,颜色直方图的匹配采用Bhattacharyya距离计算,或者采用Correlation距离计算。如图3所示,为一个实施例中,颜色直方图与SIFT特征匹配的结合示意图,两者为串级的关系。Among them, the color histogram is used for preliminary feature point matching to obtain the first matching pair set, and then the scale-invariant feature transform matching algorithm is used to further match the matching points in the first matching pair set to obtain the target feature point matching pair . In one embodiment, the matching of the color histogram adopts Bhattacharyya distance calculation, or adopts Correlation distance calculation. As shown in FIG. 3, it is a schematic diagram of the combination of color histogram and SIFT feature matching in an embodiment, and the two are in a cascade relationship.
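The cascade can be sketched as below, assuming a small color patch around each keypoint is described by an HSV histogram and compared with the Bhattacharyya distance before SIFT matching refines the surviving candidates; the bin counts and distance threshold are illustrative.

```python
import cv2

def hist_distance(patch_a, patch_b, bins=(8, 8, 8)):
    """Bhattacharyya distance between the HSV color histograms of two patches."""
    hists = []
    for patch in (patch_a, patch_b):
        hsv = cv2.cvtColor(patch, cv2.COLOR_BGR2HSV)
        h = cv2.calcHist([hsv], [0, 1, 2], None, list(bins),
                         [0, 180, 0, 256, 0, 256])
        hists.append(h)
    return cv2.compareHist(hists[0], hists[1], cv2.HISTCMP_BHATTACHARYYA)

def prefilter_pairs(patches_a, patches_b, max_dist=0.3):
    """First matching-pair set: candidate pairs whose color histograms are close."""
    return [(i, j)
            for i, pa in enumerate(patches_a)
            for j, pb in enumerate(patches_b)
            if hist_distance(pa, pb) < max_dist]
```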
In one embodiment, calculating the pose transformation matrix between video frame images from the feature point matching pairs between the video frame images includes: obtaining the three-dimensional coordinates of each feature point in the feature point matching pairs; calculating the converted three-dimensional coordinates obtained by converting the three-dimensional coordinates of the feature points in one video frame image into another video frame image; obtaining the target three-dimensional coordinates corresponding to the matching feature points in the other video frame image; and calculating the pose transformation matrix from the converted three-dimensional coordinates and the target three-dimensional coordinates.
After the feature point matching pairs are determined, the three-dimensional coordinates of each feature point are obtained. The three-dimensional coordinates come from the color image and the depth image captured by the RGB-D camera: the color image gives the x and y values of a feature point, and the depth image gives the corresponding z value. For two video frame images, the matched feature points are treated as two point sets; the set of feature points in the first video frame image is $\{P \mid P_i \in \mathbb{R}^3,\ i = 1, 2, \ldots, N\}$ and the set in the second video frame image is $\{Q \mid Q_i \in \mathbb{R}^3,\ i = 1, 2, \ldots, N\}$. The error between the two point sets is taken as the cost function, and the corresponding rotation matrix R and translation vector t are obtained by minimizing this cost function, which can be expressed as:

$$\min_{R,\,t}\ \frac{1}{2} \sum_{i=1}^{N} \left\| Q_i - (R P_i + t) \right\|^2$$

where R and t are the rotation matrix and the translation vector, respectively. The steps of the iterative closest point algorithm are:

1) For each point $P_i$, find the corresponding closest point in Q, denoted $Q_i$;

2) Find the transformation R and t that minimize the above cost;

3) Apply the rigid-body transformation with R and t to the point set P to obtain the new point set $P_i' = R P_i + t$, and compute the error distance between the new point set and the point set Q:

$$E_d = \frac{1}{N} \sum_{i=1}^{N} \left\| Q_i - P_i' \right\|^2$$

In practice, the constrained rotation matrix and translation vector can be represented with an unconstrained Lie algebra, and the number of feature points whose error distance is smaller than a set threshold, i.e. the number of inliers, is recorded. If the error distance $E_d$ computed in step 3) is smaller than its threshold and the number of inliers is larger than its threshold, or if the number of iterations reaches the set limit, the iteration ends; otherwise, return to step 1) for the next iteration.
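One iteration of this procedure can be sketched as below, using the standard SVD (Kabsch) solution for the least-squares rigid transform between matched N x 3 point sets; this is a generic formulation rather than code from this disclosure.

```python
import numpy as np

def estimate_rigid_transform(P, Q):
    """argmin_{R,t} sum_i ||Q_i - (R P_i + t)||^2 for matched N x 3 point sets."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                         # correct a possible reflection
    t = cq - R @ cp
    return R, t

def icp_step(P, Q):
    """Estimate (R, t), transform P rigidly, and report the error distance E_d."""
    R, t = estimate_rigid_transform(P, Q)
    P_new = (R @ P.T).T + t
    E_d = np.mean(np.sum((Q - P_new) ** 2, axis=1))
    return R, t, P_new, E_d
```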
在一个实施例中,所述目标检测模型是基于深度学习模型训练得到的;在所述将所述视频帧图像作为目标检测模型的输入,获取所述目标检测模型输出的检测得到目标物之前,还包括:获取训练视频图像样本,所述训练视频图像样本包括正样本和负样本,所述正样本中包括有目标物以及所述目标物在所述视频图像中位置标记;根据所述训练视频图像样本对所述目标检测模型进行训练,得到训练好的目标检测模型。In one embodiment, the target detection model is obtained based on deep learning model training; before the video frame image is used as the input of the target detection model to obtain the detected target object output by the target detection model, It also includes: acquiring training video image samples, the training video image samples including positive samples and negative samples, the positive samples include a target and a position mark of the target in the video image; according to the training video The image samples train the target detection model to obtain a trained target detection model.
The target detection model is trained with a deep learning model. To train it, training video image samples are first obtained and positive and negative samples are set; a positive sample is a video image that contains a target object together with a mark of the target object's position in the video image, and training learns a target detection model capable of detecting the target object. As shown in Figure 4, in one embodiment the training and prediction of the deep-learning-based UAV target detection model is divided into two parts: preprocessing and real-time detection. For real-time target detection, the data collected by the UAV is first preprocessed: the collected video stream is split into individual video frame images, the targets in the images are labeled as samples and divided into training and test data sets, the model is trained with a deep learning framework, and the saved model is then applied to the video stream returned by the platform to complete real-time detection of the targets.
A small UAV carrier equipped with an industrial camera is used to sample a large amount of video data covering scenes from the UAV's viewpoint. The targets the UAV needs to recognize are determined and labeled in the acquired video data, the neural network model is trained with the preprocessed data, and the model parameters are adjusted until the training results satisfy the convergence conditions. The trained model is saved for subsequent target detection and loaded onto the UAV, and the model is continuously adjusted and optimized through target detection experiments with the UAV.
In a specific embodiment, the deep learning model adopts the YOLOv3 network structure (also called Darknet-53), a fully convolutional network. It introduces residual structures, i.e. ResNet-style skip connections, and makes extensive use of residual network characteristics. Downsampling is performed with stride-2 convolutions, and upsampling and route operations are used so that detection is performed at three scales within one network. Dimension clustering is used to obtain anchor boxes for predicting bounding boxes; the sum of squared error losses is used during training, and an objectness score is predicted for each bounding box by logistic regression. If a prior bounding box is not the best one but overlaps the object to be detected by more than a set threshold, that prediction is ignored. With a threshold of 0.5, the system assigns only one bounding box to each object; if a prior bounding box is not assigned to an object, it contributes no loss to the coordinate or class predictions. Each box uses multi-label classification to predict the classes the bounding box may contain, and binary cross-entropy loss is used for class prediction during training. Applying the lightweight YOLOv3 target detection network to the UAV platform improves real-time target recognition under the limited computing power of the UAV.
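Running such a trained detector on each returned frame can be sketched with OpenCV's DNN module as below; the configuration and weight file names are placeholders, and the 416 x 416 input size and thresholds are common YOLOv3 defaults rather than values fixed by this disclosure.

```python
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")   # placeholder files
out_names = net.getUnconnectedOutLayersNames()

def detect_objects(frame, conf_thresh=0.5, nms_thresh=0.4):
    """Return (class_id, score, [x, y, w, h]) for each detection in the frame."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    boxes, scores, class_ids = [], [], []
    for out in net.forward(out_names):
        for det in out:               # det = [cx, cy, bw, bh, objectness, class scores...]
            cls_scores = det[5:]
            cls = int(np.argmax(cls_scores))
            conf = float(cls_scores[cls])
            if conf > conf_thresh:
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                scores.append(conf)
                class_ids.append(cls)
    keep = cv2.dnn.NMSBoxes(boxes, scores, conf_thresh, nms_thresh)
    return [(class_ids[i], scores[i], boxes[i]) for i in np.array(keep).flatten()]
```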
In one embodiment, combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information includes: obtaining the target position of the detected target object in the video frame image; determining the matching feature points according to the target position; and annotating the target object information onto the three-dimensional point cloud map according to the feature points.
According to the position of the detected target object in the video frame image and the positions of the feature points in the video frame image, the target object information matching each feature point is determined and annotated onto the three-dimensional point cloud map, yielding a three-dimensional point cloud map with a richer amount of information.
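A minimal sketch of this annotation step follows, assuming each detection carries a class label and an image bounding box, and each map point remembers the pixel position it was observed at; the data layout is an assumption made for illustration.

```python
def annotate_point_cloud(detections, features):
    """detections: list of (label, (x, y, w, h)) image boxes.
    features: list of dicts with pixel position 'uv' and world point 'xyz'."""
    annotated = []
    for label, (x, y, w, h) in detections:
        for f in features:
            u, v = f["uv"]
            if x <= u <= x + w and y <= v <= y + h:
                annotated.append((f["xyz"], label))   # attach the label to the map point
    return annotated
```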
In one embodiment, the method further includes: obtaining measurement data measured by an inertial measurement unit, and calculating an initial pose transformation matrix between video frames from the measurement data. Calculating the pose transformation matrix between video frame images from the feature point matching pairs between the video frame images then includes: calculating the target pose transformation matrix between video frames from the initial pose transformation matrix and the feature point matching pairs between the video frame images.
An inertial measurement unit (IMU) is a device that measures an object's three-axis attitude angles (or angular rates) and acceleration. The inertial measurement unit serves as the UAV's inertial parameter measurement device and contains a three-axis gyroscope, a three-axis accelerometer and a three-axis magnetometer. The UAV can directly read the measurement data of the inertial measurement unit, including angular velocity, acceleration and magnetometer data. After the measurement data is obtained, the UAV's pose transformation matrix can be calculated directly from it; however, because the inertial measurement unit accumulates error, the pose transformation matrix obtained this way is not accurate enough. To distinguish it from the subsequently optimized pose transformation matrix, the pose transformation matrix calculated directly from the measurement data is called the "initial pose transformation matrix". The pose transformation matrix includes a rotation matrix R and a translation vector t. In one embodiment, the initial pose transformation matrix corresponding to the measurement data is calculated with a complementary filtering algorithm. After the initial pose transformation matrix is obtained, it is used as the initial matrix, and the iterative closest point (ICP) algorithm calculates the target pose transformation matrix between video frames from the feature point matching pairs between the video frame images. Using the initial pose transformation matrix obtained from the inertial measurement unit as the initial matrix helps speed up the computation.
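A minimal complementary-filter sketch is given below for the attitude part of that initial estimate, blending integrated gyroscope rates with the gravity direction from the accelerometer; the gain alpha and the data layout are assumptions, and the magnetometer (for yaw) is omitted for brevity.

```python
import numpy as np

def complementary_filter(gyro, accel, dt, prev_rp, alpha=0.98):
    """gyro: (wx, wy, wz) in rad/s; accel: (ax, ay, az); prev_rp: (roll, pitch)."""
    # propagate the previous estimate with the gyroscope rates
    roll_g = prev_rp[0] + gyro[0] * dt
    pitch_g = prev_rp[1] + gyro[1] * dt
    # gravity direction observed by the accelerometer
    roll_a = np.arctan2(accel[1], accel[2])
    pitch_a = np.arctan2(-accel[0], np.hypot(accel[1], accel[2]))
    # blend the smooth but drifting gyro path with the noisy but drift-free accel cue
    roll = alpha * roll_g + (1 - alpha) * roll_a
    pitch = alpha * pitch_g + (1 - alpha) * pitch_a
    return roll, pitch
```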
In one embodiment, after calculating the pose transformation matrix between video frame images from the feature point matching pairs between the video frame images, the method further includes: calculating the amount of motion between the current video frame and the previous key frame, and taking the current video frame as a key frame if the amount of motion is greater than a preset threshold; when the current video frame is a key frame, matching the current video frame against the key frames in the existing key frame library, and taking the current video frame as a loop-closure frame if a matching key frame exists in the key frame library; and optimizing and updating the corresponding pose transformation matrices according to the loop-closure frame to obtain updated pose transformation matrices. Determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix then includes: determining the three-dimensional coordinates corresponding to each video frame image according to the updated pose transformation matrices.
To reduce the complexity of the subsequent optimization, key frames can be extracted to reduce the amount of computation. The captured video frames are dense (typically about 30 frames can be captured per second), so the similarity between consecutive frames is very high, sometimes they are identical, and processing every frame would needlessly increase the computational complexity. The complexity can therefore be reduced by extracting key frames. Specifically, the first video frame is taken as a key frame; the amount of motion between the current video frame and the previous key frame is then computed, and the current frame is selected as a key frame when the amount of motion exceeds the set threshold. The amount of motion is computed as:

$$E_m = \omega_1 \left( |t_x| + |t_y| + |t_z| \right) + \omega_2 \left( |\phi| + |\theta| + |\psi| \right)$$

where $E_m$ is the measure of the amount of motion, $t_x, t_y, t_z$ are the three translation components of the translation vector t, and $\phi, \theta, \psi$ are the Euler angles of the inter-frame rotation, which can be obtained from the rotation matrix. $\omega_1$ and $\omega_2$ are balancing weights for the translational and rotational motion; for the visual field captured by the camera, rotation brings about large scene changes more easily than translation, so $\omega_2$ is taken larger than $\omega_1$, with the specific values adjusted to the situation.
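The key-frame test can be sketched as below, assuming the relative pose to the previous key frame has already been split into a translation vector and inter-frame Euler angles; the weights and threshold are illustrative values only.

```python
import numpy as np

def motion_metric(t, euler, w1=1.0, w2=2.0):
    """E_m = w1 * sum(|t|) + w2 * sum(|euler|); w2 > w1 since rotation changes the scene more."""
    return w1 * np.sum(np.abs(t)) + w2 * np.sum(np.abs(euler))

def is_new_keyframe(t, euler, threshold=0.3):
    """Keep the current frame as a key frame when its motion exceeds the threshold."""
    return motion_metric(t, euler) > threshold
```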
在提取了关键帧后,采用回环检测的方法对得到的位姿变换矩阵进行优化更新。在一个实施例中,采用闭环检测算法进行回环检测。在进行回环检测后,根据回环检测结果对目标位姿变换矩阵进行更新优化,得到更准确的位姿变换矩阵,为了区分,称为“更新位姿变换矩阵”。根据更新位姿变换矩阵确定每个视频帧图像对应的三维坐标。After the key frames are extracted, the loop detection method is used to optimize and update the obtained pose transformation matrix. In one embodiment, a closed loop detection algorithm is used for loop detection. After the loop detection is performed, the target pose transformation matrix is updated and optimized according to the loop detection result to obtain a more accurate pose transformation matrix, which is called "updated pose transformation matrix" for distinction. Determine the three-dimensional coordinates corresponding to each video frame image according to the updated pose transformation matrix.
如图5所示,提出了一种无人机三维地图构建装置,该装置包括:As shown in Figure 5, a three-dimensional map construction device for drones is proposed, which includes:
提取模块502,用于获取相机拍摄得到的视频帧图像,提取每个视频帧图像中的特征点;The extraction module 502 is configured to obtain video frame images taken by the camera, and extract feature points in each video frame image;
匹配模块504,用于采用颜色直方图和尺度不变特征变换混合匹配算法对视频帧图像之间的特征点进行匹配,得到视频帧图像之间的特征点匹配对;The matching module 504 is configured to use a color histogram and a scale-invariant feature transform hybrid matching algorithm to match feature points between video frame images to obtain feature point matching pairs between video frame images;
计算模块506,用于根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵;The calculation module 506 is configured to calculate the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images;
确定模块508,用于根据所述位姿变换矩阵确定每个视频帧图像对应的三维坐标;The determining module 508 is configured to determine the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix;
转换模块510,用于根据视频帧图像对应的三维坐标和相应的位姿变换矩阵将视频帧图像中的特征点的三维坐标转换到世界坐标系下,得到三维点云地图;The conversion module 510 is configured to convert the 3D coordinates of the feature points in the video frame image to the world coordinate system according to the 3D coordinates corresponding to the video frame image and the corresponding pose transformation matrix to obtain a 3D point cloud map;
检测模块512,用于将所述视频帧图像作为目标检测模型的输入,获取所述目标检测模型检测得到的视频帧图像中的目标物信息;The detection module 512 is configured to use the video frame image as the input of the target detection model, and obtain target information in the video frame image detected by the target detection model;
结合模块514,用于将所述三维点云地图与所述目标物信息结合,得到包含有目标物信息的三维点云地图。The combining module 514 is configured to combine the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
在一个实施例中,所述匹配模块504还用于采用颜色直方图特征匹配算法对视频帧图像之间的特征点进行匹配,得到第一匹配对集合;采用尺度不变特征变换匹配算法对所述第一匹配对集合中的匹配点进行进一步匹配得到目标特征点匹配对。In one embodiment, the matching module 504 is further configured to use a color histogram feature matching algorithm to match feature points between video frame images to obtain a first set of matching pairs; use a scale-invariant feature transform matching algorithm to The first matching pair further matches the matching points in the set to obtain the target feature point matching pair.
在一个实施例中,计算模块506还用于获取所述特征点匹配对中每个特征点的三维坐标;计算将一个视频帧图像中特征点的三维坐标转换到另一视频帧图像得到的转换三维坐标;获取所述另一视频帧图像中相应匹配的特征点对应的目标三维坐标;根据所述转换三维坐标和所述目标三维坐标计算得到位姿变换矩阵。In an embodiment, the calculation module 506 is further configured to obtain the three-dimensional coordinates of each feature point in the feature point matching pair; calculate the conversion obtained by converting the three-dimensional coordinates of the feature point in one video frame image to another video frame image Three-dimensional coordinates; obtain the target three-dimensional coordinates corresponding to the corresponding matching feature points in the another video frame image; calculate the pose transformation matrix according to the converted three-dimensional coordinates and the target three-dimensional coordinates.
在一个实施例中,所述目标检测模型是基于深度学习模型训练得到的;上述无人机三维地图构建装置还包括:训练模块,用于获取训练视频图像样本,所述训练视频图像样本包括正样本和负样本,所述正样本中包括有目标物以及所述目标物在所述视频图像中位置标记;根据所述训练视频图像样本对所述目标检测模型进行训练,得到训练好的目标检测模型。In one embodiment, the target detection model is obtained based on deep learning model training; the above-mentioned three-dimensional map construction device for drones further includes: a training module for obtaining training video image samples, the training video image samples including positive Samples and negative samples, the positive sample includes a target and a position mark of the target in the video image; training the target detection model according to the training video image sample to obtain a trained target detection model.
在一个实施例中,所述结合模块514还用于获取检测得到的目标物在视频帧图像中的目标位置;根据所述目标位置确定与之匹配的特征点;根据所述特征点将所述物体类别信息标注到所述三维点云地图。In one embodiment, the combining module 514 is also used to obtain the target position of the detected target in the video frame image; determine the matching feature point according to the target position; according to the feature point, the The object category information is marked on the three-dimensional point cloud map.
如图6所示,在一个实施例中,上述无人机三维地图构建装置还包括:As shown in FIG. 6, in one embodiment, the above-mentioned three-dimensional map construction device for drones further includes:
初始计算模块505,用于获取惯性测量单元测量得到的测量数据,根据所述测量数据计算得到视频帧之间的初始位姿变换矩阵;The initial calculation module 505 is configured to obtain measurement data measured by the inertial measurement unit, and calculate an initial pose transformation matrix between video frames according to the measurement data;
所述计算模块还用于包括:根据所述初始位姿变换矩阵和所述视频帧图像之间的特征点匹配对计算得到视频帧之间的目标位姿变换矩阵。The calculation module is further configured to include: calculating the target pose transformation matrix between the video frames according to the feature point matching pair between the initial pose transformation matrix and the video frame image.
如图7所示,在一个实施例中,上述无人机三维地图构建装置还包括:As shown in FIG. 7, in one embodiment, the above-mentioned three-dimensional map construction device for drones further includes:
关键帧确定模块516,用于计算当前视频帧与前一关键帧之间的运动量,若运动量大于预设阈值,则将当前视频帧作为关键帧。The key frame determination module 516 is used to calculate the amount of motion between the current video frame and the previous key frame. If the amount of motion is greater than the preset threshold, the current video frame is used as the key frame.
回环帧确定模块518,用于当所述当前视频帧为关键帧时,将当前视频帧与之前的关键帧库中的关键帧进行匹配,若所述关键帧库中存在与当前视频帧匹配的关键帧,则将当前视频帧作为回环帧。The loopback frame determination module 518 is configured to match the current video frame with a key frame in the previous key frame library when the current video frame is a key frame, and if there is a key frame in the key frame library that matches the current video frame Key frame, the current video frame is used as a loopback frame.
优化模块520,用于根据所述回环帧对相应的位姿变换矩阵进行优化更新,得到更新位姿变换矩阵。The optimization module 520 is configured to optimize and update the corresponding pose transformation matrix according to the loop frame to obtain an updated pose transformation matrix.
所述确定模块508还用于根据所述更新位姿变换矩阵确定每个视频帧图像对应的三维坐标。The determining module 508 is further configured to determine the three-dimensional coordinates corresponding to each video frame image according to the updated pose transformation matrix.
图8示出了一个实施例中计算机设备的内部结构图。该计算机设备可以是无人机、或与无人机连接的终端或服务器。如图8所示,该计算机设备包括通过系统总线连接的处理器、存储器、和网络接口。其中,存储器包括非易失性存储介质和内存储器。该计算机设备的非易失性存储介质存储有操作系统,还可存储有计算机程序,该计算机程序被处理器执行时,可使得处理器实现无人机三维地图构建方法。该内存储器中也可储存有计算机程序,该计算机程序被 处理器执行时,可使得处理器执行无人机三维地图构建方法。网络接口用于与外接进行通信。本领域技术人员可以理解,图8中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。Fig. 8 shows an internal structure diagram of a computer device in an embodiment. The computer equipment can be a drone, or a terminal or server connected to the drone. As shown in FIG. 8, the computer device includes a processor, a memory, and a network interface connected through a system bus. Among them, the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program. When the computer program is executed by the processor, the processor can enable the processor to implement the method for constructing a three-dimensional map of the drone. A computer program may also be stored in the internal memory, and when the computer program is executed by the processor, the processor can execute the method for constructing a three-dimensional map of the UAV. The network interface is used to communicate with an external device. Those skilled in the art can understand that the structure shown in FIG. 8 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. The specific computer device may Including more or fewer parts than shown in the figure, or combining some parts, or having a different arrangement of parts.
在一个实施例中,本申请提供的无人机三维地图构建方法可以实现为一种计算机程序的形式,计算机程序可在如图8所示的计算机设备上运行。计算机设备的存储器中可存储组成该无人机三维地图构建装置的各个程序模板。比如,提取模块502,匹配模块504,计算模块506,确定模块508,转换模块510,检测模块512和结合模块514。In an embodiment, the method for constructing a three-dimensional map of an unmanned aerial vehicle provided in this application can be implemented in the form of a computer program, and the computer program can be run on a computer device as shown in FIG. 8. The memory of the computer equipment can store various program templates that make up the UAV 3D map construction device. For example, the extraction module 502, the matching module 504, the calculation module 506, the determination module 508, the conversion module 510, the detection module 512, and the combination module 514.
一种计算机设备,包括存储器和处理器,所述存储器存储有计算机程序,所述计算机程序被所述处理器执行时,使得所述处理器执行如下步骤:获取相机拍摄得到的视频帧图像,提取每个视频帧图像中的特征点;采用颜色直方图和尺度不变特征变换混合匹配算法对视频帧图像之间的特征点进行匹配,得到视频帧图像之间的特征点匹配对;根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵;根据所述位姿变换矩阵确定每个视频帧图像对应的三维坐标;根据视频帧图像对应的三维坐标和相应的位姿变换矩阵将视频帧图像中的特征点的三维坐标转换到世界坐标系下,得到三维点云地图;将所述视频帧图像作为目标检测模型的输入,获取所述目标检测模型检测得到的视频帧图像中的目标物信息;将所述三维点云地图与所述目标物信息结合,得到包含有目标物信息的三维点云地图。A computer device includes a memory and a processor. The memory stores a computer program. When the computer program is executed by the processor, the processor executes the following steps: acquiring video frame images captured by a camera, and extracting The feature points in each video frame image; the color histogram and the scale-invariant feature transformation hybrid matching algorithm are used to match the feature points between the video frame images to obtain the feature point matching pairs between the video frame images; according to the The feature point matching pair between the video frame images is calculated to obtain the pose transformation matrix between the video frame images; the three-dimensional coordinates corresponding to each video frame image are determined according to the pose transformation matrix; the three-dimensional coordinates corresponding to the video frame images and The corresponding pose transformation matrix converts the three-dimensional coordinates of the feature points in the video frame image to the world coordinate system to obtain a three-dimensional point cloud map; use the video frame image as the input of the target detection model to obtain the target detection model detection The target object information in the obtained video frame image; combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
在一个实施例中,所述采用颜色直方图和尺度不变特征变换混合匹配算法对视频帧图像之间的特征点进行匹配,得到视频帧图像之间的特征点匹配对,包括:采用颜色直方图特征匹配算法对视频帧图像之间的特征点进行匹配,得到第一匹配对集合;采用尺度不变特征变换匹配算法对所述第一匹配对集合中的匹配点进行进一步匹配得到目标特征点匹配对。In one embodiment, the use of a color histogram and a scale-invariant feature transformation hybrid matching algorithm to match feature points between video frame images to obtain feature point matching pairs between video frame images includes: using color histograms The graph feature matching algorithm matches the feature points between the video frame images to obtain the first matching pair set; the scale-invariant feature transform matching algorithm is used to further match the matching points in the first matching pair set to obtain the target feature points Matching pair.
在一个实施例中,根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵,包括:获取所述特征点匹配对中每个特征点的三维坐标;计算将一个视频帧图像中特征点的三维坐标转换到另一视频帧图像得到的转换三维坐标;获取所述另一视频帧图像中相应匹配的特征点对应的目标三维坐标;根据所述转换三维坐标和所述目标三维坐标计算得到位姿变换矩阵。In one embodiment, calculating the pose transformation matrix between the video frame images according to the feature point matching pair between the video frame images includes: obtaining the three-dimensional coordinates of each feature point in the feature point matching pair; Calculate the converted three-dimensional coordinates obtained by converting the three-dimensional coordinates of the feature points in one video frame image to another video frame image; obtain the target three-dimensional coordinates corresponding to the corresponding matching feature points in the other video frame image; The coordinates and the three-dimensional coordinates of the target are calculated to obtain a pose transformation matrix.
在一个实施例中,所述目标检测模型是基于深度学习模型训练得到的;在所述将所述视频帧图像作为目标检测模型的输入,获取所述目标检测模型输出的检测得到目标物之前,还包括:获取训练视频图像样本,所述训练视频图像样本包括正样本和负样本,所述正样本中包括有目标物以及所述目标物在所述 视频图像中位置标记;根据所述训练视频图像样本对所述目标检测模型进行训练,得到训练好的目标检测模型。In one embodiment, the target detection model is obtained based on deep learning model training; before the video frame image is used as the input of the target detection model to obtain the detected target object output by the target detection model, It also includes: acquiring training video image samples, the training video image samples including positive samples and negative samples, the positive samples include a target and a position mark of the target in the video image; according to the training video The image samples train the target detection model to obtain a trained target detection model.
在一个实施例中,所述将所述三维点云地图与所述目标物信息结合,得到包含有目标物信息的三维点云地图,包括:获取检测得到的目标物在视频帧图像中的目标位置;根据所述目标位置确定与之匹配的特征点;根据所述特征点将所述物体类别信息标注到所述三维点云地图。In one embodiment, the combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information includes: acquiring the detected target object in the video frame image Location; determining a matching feature point according to the target location; marking the object category information to the three-dimensional point cloud map according to the feature point.
在一个实施例中,所述计算机程序被所述处理器处理时,还用于执行以下步骤:获取惯性测量单元测量得到的测量数据;根据所述测量数据计算得到视频帧之间的初始位姿变换矩阵;所述根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵,包括:根据所述初始位姿变换矩阵和所述视频帧图像之间的特征点匹配对计算得到视频帧之间的目标位姿变换矩阵。In one embodiment, when the computer program is processed by the processor, it is also used to perform the following steps: obtain measurement data measured by the inertial measurement unit; calculate the initial pose between video frames based on the measurement data Transformation matrix; said calculating the pose transformation matrix between the video frame images according to the feature point matching pair between the video frame images, including: according to the initial pose transformation matrix and the video frame image The feature point matching pair is calculated to obtain the target pose transformation matrix between the video frames.
在一个实施例中,在所述根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵之后,所述计算机程序被所述处理器处理时,还用于执行以下步骤:计算当前视频帧与前一关键帧之间的运动量,若运动量大于预设阈值,则将当前视频帧作为关键帧;当所述当前视频帧为关键帧时,将当前视频帧与之前的关键帧库中的关键帧进行匹配,若所述关键帧库中存在与当前视频帧匹配的关键帧,则将当前视频帧作为回环帧;根据所述回环帧对相应的位姿变换矩阵进行优化更新,得到更新位姿变换矩阵;所述根据所述位姿变换矩阵确定每个视频帧图像对应的三维坐标,包括:根据所述更新位姿变换矩阵确定每个视频帧图像对应的三维坐标。In one embodiment, after the pose transformation matrix between the video frame images is calculated according to the feature point matching pair between the video frame images, when the computer program is processed by the processor, the Perform the following steps: calculate the amount of motion between the current video frame and the previous key frame, if the amount of motion is greater than a preset threshold, use the current video frame as a key frame; when the current video frame is a key frame, set the current video frame Match with the key frame in the previous key frame library. If there is a key frame matching the current video frame in the key frame library, the current video frame is taken as the loop frame; the corresponding pose is transformed according to the loop frame The matrix is optimized and updated to obtain an updated pose transformation matrix; the determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix includes: determining the corresponding image of each video frame according to the updated pose transformation matrix Three-dimensional coordinates.
一种计算机可读存储介质,存储有计算机程序,所述计算机程序被处理器执行时,使得所述处理器执行如下步骤:获取相机拍摄得到的视频帧图像,提取每个视频帧图像中的特征点;采用颜色直方图和尺度不变特征变换混合匹配算法对视频帧图像之间的特征点进行匹配,得到视频帧图像之间的特征点匹配对;根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵;根据所述位姿变换矩阵确定每个视频帧图像对应的三维坐标;根据视频帧图像对应的三维坐标和相应的位姿变换矩阵将视频帧图像中的特征点的三维坐标转换到世界坐标系下,得到三维点云地图;将所述视频帧图像作为目标检测模型的输入,获取所述目标检测模型检测得到的视频帧图像中的目标物信息;将所述三维点云地图与所述目标物信息结合,得到包含有目标物信息的三维点云地图。A computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the processor executes the following steps: acquiring video frame images taken by a camera, and extracting features in each video frame image Point; use color histogram and scale-invariant feature transformation hybrid matching algorithm to match feature points between video frame images to obtain feature point matching pairs between video frame images; according to the feature points between the video frame images The matching pair is calculated to obtain the pose transformation matrix between the video frame images; the three-dimensional coordinates of each video frame image are determined according to the pose transformation matrix; the three-dimensional coordinates corresponding to the video frame images and the corresponding pose transformation matrix The three-dimensional coordinates of the feature points in the frame image are converted to the world coordinate system to obtain a three-dimensional point cloud map; the video frame image is used as the input of the target detection model to obtain the target in the video frame image detected by the target detection model Object information; combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
在一个实施例中,所述采用颜色直方图和尺度不变特征变换混合匹配算法对视频帧图像之间的特征点进行匹配,得到视频帧图像之间的特征点匹配对,包括:采用颜色直方图特征匹配算法对视频帧图像之间的特征点进行匹配,得 到第一匹配对集合;采用尺度不变特征变换匹配算法对所述第一匹配对集合中的匹配点进行进一步匹配得到目标特征点匹配对。In one embodiment, the use of a color histogram and a scale-invariant feature transformation hybrid matching algorithm to match feature points between video frame images to obtain feature point matching pairs between video frame images includes: using color histograms The graph feature matching algorithm matches the feature points between the video frame images to obtain the first matching pair set; the scale-invariant feature transform matching algorithm is used to further match the matching points in the first matching pair set to obtain the target feature points Matching pair.
在一个实施例中,根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵,包括:获取所述特征点匹配对中每个特征点的三维坐标;计算将一个视频帧图像中特征点的三维坐标转换到另一视频帧图像得到的转换三维坐标;获取所述另一视频帧图像中相应匹配的特征点对应的目标三维坐标;根据所述转换三维坐标和所述目标三维坐标计算得到位姿变换矩阵。In one embodiment, calculating the pose transformation matrix between the video frame images according to the feature point matching pair between the video frame images includes: obtaining the three-dimensional coordinates of each feature point in the feature point matching pair; Calculate the converted three-dimensional coordinates obtained by converting the three-dimensional coordinates of the feature points in one video frame image to another video frame image; obtain the target three-dimensional coordinates corresponding to the corresponding matching feature points in the other video frame image; The coordinates and the three-dimensional coordinates of the target are calculated to obtain a pose transformation matrix.
在一个实施例中,所述目标检测模型是基于深度学习模型训练得到的;在所述将所述视频帧图像作为目标检测模型的输入,获取所述目标检测模型输出的检测得到目标物之前,还包括:获取训练视频图像样本,所述训练视频图像样本包括正样本和负样本,所述正样本中包括有目标物以及所述目标物在所述视频图像中位置标记;根据所述训练视频图像样本对所述目标检测模型进行训练,得到训练好的目标检测模型。In one embodiment, the target detection model is obtained based on deep learning model training; before the video frame image is used as the input of the target detection model to obtain the detected target object output by the target detection model, It also includes: acquiring training video image samples, the training video image samples including positive samples and negative samples, the positive samples include a target and a position mark of the target in the video image; according to the training video The image samples train the target detection model to obtain a trained target detection model.
在一个实施例中,所述将所述三维点云地图与所述目标物信息结合,得到包含有目标物信息的三维点云地图,包括:获取检测得到的目标物在视频帧图像中的目标位置;根据所述目标位置确定与之匹配的特征点;根据所述特征点将所述物体类别信息标注到所述三维点云地图。In one embodiment, the combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information includes: acquiring the detected target object in the video frame image Location; determining a matching feature point according to the target location; marking the object category information to the three-dimensional point cloud map according to the feature point.
在一个实施例中,所述计算机程序被所述处理器处理时,还用于执行以下步骤:获取惯性测量单元测量得到的测量数据;根据所述测量数据计算得到视频帧之间的初始位姿变换矩阵;所述根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵,包括:根据所述初始位姿变换矩阵和所述视频帧图像之间的特征点匹配对计算得到视频帧之间的目标位姿变换矩阵。In one embodiment, when the computer program is processed by the processor, it is also used to perform the following steps: obtain measurement data measured by the inertial measurement unit; calculate the initial pose between video frames based on the measurement data Transformation matrix; said calculating the pose transformation matrix between the video frame images according to the feature point matching pair between the video frame images, including: according to the initial pose transformation matrix and the video frame image The feature point matching pair is calculated to obtain the target pose transformation matrix between the video frames.
在一个实施例中,在所述根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵之后,所述计算机程序被所述处理器处理时,还用于执行以下步骤:计算当前视频帧与前一关键帧之间的运动量,若运动量大于预设阈值,则将当前视频帧作为关键帧;当所述当前视频帧为关键帧时,将当前视频帧与之前的关键帧库中的关键帧进行匹配,若所述关键帧库中存在与当前视频帧匹配的关键帧,则将当前视频帧作为回环帧;根据所述回环帧对相应的位姿变换矩阵进行优化更新,得到更新位姿变换矩阵;所述根据所述位姿变换矩阵确定每个视频帧图像对应的三维坐标,包括:根据所述更新位姿变换矩阵确定每个视频帧图像对应的三维坐标。In one embodiment, after the pose transformation matrix between the video frame images is calculated according to the feature point matching pair between the video frame images, when the computer program is processed by the processor, the Perform the following steps: calculate the amount of motion between the current video frame and the previous key frame, if the amount of motion is greater than a preset threshold, use the current video frame as a key frame; when the current video frame is a key frame, set the current video frame Match with the key frame in the previous key frame library. If there is a key frame matching the current video frame in the key frame library, the current video frame is taken as the loop frame; the corresponding pose is transformed according to the loop frame The matrix is optimized and updated to obtain an updated pose transformation matrix; the determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix includes: determining the corresponding image of each video frame according to the updated pose transformation matrix Three-dimensional coordinates.
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一非易失性计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施 例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by instructing relevant hardware through a computer program. The program can be stored in a non-volatile computer readable storage medium. Here, when the program is executed, it may include the processes of the above-mentioned method embodiments. Wherein, any reference to memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. As an illustration and not a limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous chain Channel (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), etc.
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。The technical features of the above embodiments can be combined arbitrarily. In order to make the description concise, all possible combinations of the technical features in the above embodiments are not described. However, as long as there is no contradiction between the combinations of these technical features, they should It is considered as the range described in this specification.
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对本申请专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。The above-mentioned embodiments only express several implementation manners of this application, and their descriptions are more specific and detailed, but they should not be construed as limiting the scope of this application. It should be pointed out that for those of ordinary skill in the art, without departing from the concept of this application, several modifications and improvements can be made, and these all fall within the protection scope of this application. Therefore, the scope of protection of the patent of this application shall be subject to the appended claims.

Claims (10)

  1. A method for constructing a three-dimensional map for an unmanned aerial vehicle, wherein the method comprises:
    obtaining video frame images captured by a camera, and extracting feature points in each video frame image;
    matching feature points between the video frame images by using a hybrid matching algorithm combining a color histogram and scale-invariant feature transform, to obtain feature point matching pairs between the video frame images;
    calculating a pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images;
    determining three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix;
    converting the three-dimensional coordinates of the feature points in the video frame images into a world coordinate system according to the three-dimensional coordinates corresponding to the video frame images and the corresponding pose transformation matrices, to obtain a three-dimensional point cloud map;
    using the video frame images as an input of a target detection model, and obtaining target object information in the video frame images detected by the target detection model;
    combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
  2. The method according to claim 1, wherein the matching feature points between the video frame images by using the hybrid matching algorithm combining the color histogram and scale-invariant feature transform, to obtain the feature point matching pairs between the video frame images, comprises:
    matching the feature points between the video frame images by using a color histogram feature matching algorithm to obtain a first matching pair set;
    further matching the matching points in the first matching pair set by using a scale-invariant feature transform matching algorithm to obtain target feature point matching pairs.
  3. The method according to claim 1, wherein the calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images comprises:
    obtaining three-dimensional coordinates of each feature point in the feature point matching pairs;
    calculating converted three-dimensional coordinates obtained by converting the three-dimensional coordinates of the feature points in one video frame image into another video frame image;
    obtaining target three-dimensional coordinates corresponding to the correspondingly matched feature points in the other video frame image;
    calculating the pose transformation matrix according to the converted three-dimensional coordinates and the target three-dimensional coordinates.
  4. The method according to claim 1, wherein the target detection model is obtained by training based on a deep learning model;
    before the using the video frame images as the input of the target detection model and obtaining the target object detected and output by the target detection model, the method further comprises:
    obtaining training video image samples, the training video image samples comprising positive samples and negative samples, the positive samples comprising a target object and a position mark of the target object in the video image;
    training the target detection model according to the training video image samples to obtain a trained target detection model.
  5. The method according to claim 1, wherein the combining the three-dimensional point cloud map with the target object information to obtain the three-dimensional point cloud map containing the target object information comprises:
    obtaining a target position of the detected target object in the video frame image;
    determining feature points matching the target position;
    annotating the object category information onto the three-dimensional point cloud map according to the feature points.
  6. The method according to claim 1, wherein the method further comprises:
    obtaining measurement data measured by an inertial measurement unit;
    calculating an initial pose transformation matrix between video frames according to the measurement data;
    wherein the calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images comprises:
    calculating a target pose transformation matrix between the video frames according to the initial pose transformation matrix and the feature point matching pairs between the video frame images.
  7. The method according to claim 1, wherein after the calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images, the method further comprises:
    calculating an amount of motion between a current video frame and a previous key frame, and if the amount of motion is greater than a preset threshold, using the current video frame as a key frame;
    when the current video frame is a key frame, matching the current video frame with key frames in a previous key frame library, and if a key frame matching the current video frame exists in the key frame library, using the current video frame as a loop closure frame;
    optimizing and updating the corresponding pose transformation matrix according to the loop closure frame to obtain an updated pose transformation matrix;
    wherein the determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix comprises: determining the three-dimensional coordinates corresponding to each video frame image according to the updated pose transformation matrix.
  8. An apparatus for constructing a three-dimensional map for an unmanned aerial vehicle, wherein the apparatus comprises:
    an extraction module, configured to obtain video frame images captured by a camera and extract feature points in each video frame image;
    a matching module, configured to match feature points between the video frame images by using a hybrid matching algorithm combining a color histogram and scale-invariant feature transform, to obtain feature point matching pairs between the video frame images;
    a calculation module, configured to calculate a pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images;
    a determining module, configured to determine three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix;
    a conversion module, configured to convert the three-dimensional coordinates of the feature points in the video frame images into a world coordinate system according to the three-dimensional coordinates corresponding to the video frame images and the corresponding pose transformation matrices, to obtain a three-dimensional point cloud map;
    a detection module, configured to use the video frame images as an input of a target detection model and obtain target object information in the video frame images detected by the target detection model;
    a combining module, configured to combine the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
  9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein when the computer program is executed by the processor, the processor is caused to perform the steps of the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the processor is caused to perform the steps of the method according to any one of claims 1 to 7.
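
Editor's illustrative sketches. The claims above recite a visual mapping pipeline with several concrete algorithmic steps; the short sketches below illustrate, in a non-authoritative way, how such steps are commonly arranged. None of them is asserted to be the applicant's implementation.

Claims 1 and 2 describe a coarse-to-fine hybrid matcher: a color-histogram comparison first builds a candidate matching set, and scale-invariant feature transform (SIFT) descriptors then refine it. The sketch below is one minimal way to arrange such a matcher with OpenCV; the patch size, histogram bin count, correlation threshold, and ratio-test value are illustrative assumptions, not values taken from the application.

import cv2
import numpy as np

def patch_hist(img_bgr, pt, half=16, bins=8):
    """Normalised BGR color histogram of a small patch around a keypoint."""
    x, y = int(round(pt[0])), int(round(pt[1]))
    h, w = img_bgr.shape[:2]
    patch = img_bgr[max(0, y - half):min(h, y + half),
                    max(0, x - half):min(w, x + half)]
    hist = cv2.calcHist([patch], [0, 1, 2], None, [bins] * 3,
                        [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def hybrid_match(img1, img2, hist_thresh=0.6, ratio=0.75):
    """Coarse color-histogram candidates, refined by SIFT descriptor matching."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY), None)
    kp2, des2 = sift.detectAndCompute(cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY), None)
    hists1 = [patch_hist(img1, k.pt) for k in kp1]
    hists2 = [patch_hist(img2, k.pt) for k in kp2]

    # Coarse stage: color-histogram similarity builds the first matching set.
    candidates = []
    for i, h1 in enumerate(hists1):
        sims = [cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL) for h2 in hists2]
        j = int(np.argmax(sims))
        if sims[j] > hist_thresh:
            candidates.append((i, j))

    # Fine stage: keep candidates whose SIFT descriptors also agree
    # (nearest-neighbour distance ratio test over the second image).
    matches = []
    for i, j in candidates:
        d = np.linalg.norm(des1[i] - des2, axis=1)
        best, second = np.partition(d, 1)[:2]
        if j == int(np.argmin(d)) and best < ratio * second:
            matches.append((kp1[i].pt, kp2[j].pt))
    return matches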
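
Claim 3 computes the pose transformation matrix from the three-dimensional coordinates of matched feature points. A standard closed-form solution for the rigid transform that aligns two sets of corresponding 3-D points is the SVD-based (Kabsch/Umeyama) method; the sketch below implements that generic technique and is offered only as an illustration of the kind of computation involved.

import numpy as np

def rigid_transform_3d(src, dst):
    """Least-squares rigid transform (R, t) with dst ~= R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3-D feature-point coordinates.
    Returns a 4x4 homogeneous pose transformation matrix.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)      # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # guard against a reflection solution
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

# Usage: T = rigid_transform_3d(points_in_frame_k, points_in_frame_k_plus_1)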
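
The map-building step of claim 1 converts per-frame feature-point coordinates into the world coordinate system by applying the accumulated pose transforms. A minimal sketch, assuming 4x4 homogeneous matrices where each entry of relative_poses maps points of frame i into frame i-1 and the first camera frame is taken as the world frame:

import numpy as np

def accumulate_point_cloud(frame_points, relative_poses):
    """Fuse per-frame 3-D feature points into one world-frame point cloud.

    frame_points:   list of (N_i, 3) arrays, points in each camera frame.
    relative_poses: list of 4x4 matrices T_i mapping frame i into frame i-1
                    (one fewer entry than frame_points).
    """
    cloud = []
    T_world = np.eye(4)                       # frame 0 defines the world frame
    for i, pts in enumerate(frame_points):
        if i > 0:
            T_world = T_world @ relative_poses[i - 1]    # chain camera poses
        homo = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coordinates
        cloud.append((T_world @ homo.T).T[:, :3])
    return np.vstack(cloud)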
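
Claims 4 and 5 attach semantic information to the map: a deep-learning detector, trained on positive samples carrying position marks and on negative samples, returns object detections, and feature points associated with a detected target carry the object information into the point cloud. The sketch below shows only the annotation step of claim 5; the detections argument stands in for the output of a hypothetical run_detector(frame) call, and the dictionary layout is an assumption made for illustration.

import numpy as np

def annotate_cloud(keypoints_2d, points_world, detections):
    """Attach object labels to map points whose image positions fall inside
    a detected bounding box.

    keypoints_2d: (N, 2) pixel coordinates of the frame's feature points.
    points_world: (N, 3) world coordinates of the same feature points.
    detections:   list of dicts such as {"label": "car", "box": (x1, y1, x2, y2)}.
    Returns a list of (x, y, z, label) tuples.
    """
    labelled = []
    for (u, v), xyz in zip(keypoints_2d, points_world):
        for det in detections:
            x1, y1, x2, y2 = det["box"]
            if x1 <= u <= x2 and y1 <= v <= y2:   # point lies inside the box
                labelled.append((*xyz, det["label"]))
                break
    return labelled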
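
Claim 6 uses inertial measurement unit data to supply an initial pose transformation matrix that the visual feature matches then refine. One common, illustrative way to form such an initial guess is to integrate gyroscope rates and accelerations over the inter-frame interval; the propagation below is deliberately simplified (first-order rotation update, no bias estimation, no gravity compensation) and is an assumption, not the application's own fusion scheme.

import numpy as np

def imu_initial_pose(gyro, accel, dt):
    """Small IMU dead-reckoning step between two video frames.

    gyro:  (M, 3) angular rates (rad/s) sampled between the frames.
    accel: (M, 3) linear accelerations (m/s^2) in the body frame.
    dt:    sampling interval of the IMU (s).
    Returns a 4x4 initial pose transformation matrix.
    """
    R = np.eye(3)
    v = np.zeros(3)
    p = np.zeros(3)
    for w, a in zip(gyro, accel):
        # First-order rotation update via the skew-symmetric matrix of w * dt.
        wx, wy, wz = w * dt
        Omega = np.array([[0, -wz, wy], [wz, 0, -wx], [-wy, wx, 0]])
        R = R @ (np.eye(3) + Omega)
        acc_start = R @ a                # body acceleration in the start frame
        p = p + v * dt + 0.5 * acc_start * dt ** 2
        v = v + acc_start * dt
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, p
    return T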
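
Claim 7 promotes a frame to a key frame when its motion relative to the previous key frame exceeds a preset threshold, and flags a loop closure frame when the new key frame matches one already in the key frame library. In the sketch below, the motion measure (translation norm plus rotation angle), the thresholds, and the cosine-similarity loop test are all illustrative assumptions; how the per-frame descriptor is computed is left open.

import numpy as np

def motion_amount(T_rel):
    """Scalar motion between two poses: translation norm plus rotation angle."""
    trans = np.linalg.norm(T_rel[:3, 3])
    angle = np.arccos(np.clip((np.trace(T_rel[:3, :3]) - 1.0) / 2.0, -1.0, 1.0))
    return trans + angle

def update_keyframes(T_rel, frame_desc, keyframes, motion_thresh=0.5,
                     loop_thresh=0.8):
    """Key-frame selection and loop-closure check in the spirit of claim 7.

    T_rel:      4x4 pose of the current frame relative to the last key frame.
    frame_desc: a global descriptor vector of the current frame.
    keyframes:  list of descriptors of previously accepted key frames (mutated).
    Returns (is_keyframe, loop_index_or_None).
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    if motion_amount(T_rel) <= motion_thresh:
        return False, None                    # not enough motion: not a key frame
    loop_idx = None
    for i, desc in enumerate(keyframes):
        if cosine(frame_desc, desc) > loop_thresh:
            loop_idx = i                      # revisited place: loop closure frame
            break
    keyframes.append(frame_desc)
    return True, loop_idx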
PCT/CN2019/097745 2019-03-19 2019-07-25 Three-dimensional map constructing method and apparatus for unmanned aerial vehicle, computer device, and storage medium WO2020186678A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910209625.1 2019-03-19
CN201910209625.1A CN110047142A (en) 2019-03-19 2019-03-19 No-manned plane three-dimensional map constructing method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2020186678A1 true WO2020186678A1 (en) 2020-09-24

Family

ID=67273899

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/097745 WO2020186678A1 (en) 2019-03-19 2019-07-25 Three-dimensional map constructing method and apparatus for unmanned aerial vehicle, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN110047142A (en)
WO (1) WO2020186678A1 (en)

Families Citing this family (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110047142A (en) * 2019-03-19 2019-07-23 中国科学院深圳先进技术研究院 No-manned plane three-dimensional map constructing method, device, computer equipment and storage medium
CN110487274B (en) * 2019-07-30 2021-01-29 中国科学院空间应用工程与技术中心 SLAM method and system for weak texture scene, navigation vehicle and storage medium
CN112393720B (en) * 2019-08-15 2023-05-30 纳恩博(北京)科技有限公司 Target equipment positioning method and device, storage medium and electronic device
CN110490131B (en) * 2019-08-16 2021-08-24 北京达佳互联信息技术有限公司 Positioning method and device of shooting equipment, electronic equipment and storage medium
CN110543917B (en) * 2019-09-06 2021-09-28 电子科技大学 Indoor map matching method by utilizing pedestrian inertial navigation track and video information
CN110580703B (en) * 2019-09-10 2024-01-23 广东电网有限责任公司 Distribution line detection method, device, equipment and storage medium
CN110602456A (en) * 2019-09-11 2019-12-20 安徽天立泰科技股份有限公司 Display method and system of aerial photography focus
CN112241010A (en) * 2019-09-17 2021-01-19 北京新能源汽车技术创新中心有限公司 Positioning method, positioning device, computer equipment and storage medium
CN110660134B (en) * 2019-09-25 2023-05-30 Oppo广东移动通信有限公司 Three-dimensional map construction method, three-dimensional map construction device and terminal equipment
CN110705574B (en) * 2019-09-27 2023-06-02 Oppo广东移动通信有限公司 Positioning method and device, equipment and storage medium
CN110728245A (en) * 2019-10-17 2020-01-24 珠海格力电器股份有限公司 Optimization method and device for VSLAM front-end processing, electronic equipment and storage medium
CN110880187B (en) * 2019-10-17 2022-08-12 北京达佳互联信息技术有限公司 Camera position information determining method and device, electronic equipment and storage medium
CN110826448B (en) * 2019-10-29 2023-04-07 中山大学 Indoor positioning method with automatic updating function
CN111105454B (en) * 2019-11-22 2023-05-09 北京小米移动软件有限公司 Method, device and medium for obtaining positioning information
CN111009012B (en) * 2019-11-29 2023-07-28 四川沃洛佳科技有限公司 Unmanned aerial vehicle speed measuring method based on computer vision, storage medium and terminal
CN111145339B (en) * 2019-12-25 2023-06-02 Oppo广东移动通信有限公司 Image processing method and device, equipment and storage medium
CN111105695B (en) * 2019-12-31 2022-11-25 智车优行科技(上海)有限公司 Map making method and device, electronic equipment and computer readable storage medium
CN111199584B (en) * 2019-12-31 2023-10-20 武汉市城建工程有限公司 Target object positioning virtual-real fusion method and device
CN111462029B (en) * 2020-03-27 2023-03-03 阿波罗智能技术(北京)有限公司 Visual point cloud and high-precision map fusion method and device and electronic equipment
CN113853577A (en) * 2020-04-28 2021-12-28 深圳市大疆创新科技有限公司 Image processing method and device, movable platform and control terminal thereof, and computer-readable storage medium
CN111311685B (en) * 2020-05-12 2020-08-07 中国人民解放军国防科技大学 Motion scene reconstruction unsupervised method based on IMU and monocular image
CN111586360B (en) * 2020-05-14 2021-09-10 佳都科技集团股份有限公司 Unmanned aerial vehicle projection method, device, equipment and storage medium
CN111814731B (en) * 2020-07-23 2023-12-01 科大讯飞股份有限公司 Sitting posture detection method, device, equipment and storage medium
CN114119885A (en) * 2020-08-11 2022-03-01 中国电信股份有限公司 Image feature point matching method, device and system and map construction method and system
CN112215714A (en) * 2020-09-08 2021-01-12 北京农业智能装备技术研究中心 Rice ear detection method and device based on unmanned aerial vehicle
CN112419375B (en) * 2020-11-18 2023-02-03 青岛海尔科技有限公司 Feature point matching method and device, storage medium and electronic device
CN112613107A (en) * 2020-12-26 2021-04-06 广东电网有限责任公司 Method and device for determining construction progress of tower project, storage medium and equipment
CN112819889A (en) * 2020-12-30 2021-05-18 浙江大华技术股份有限公司 Method and device for determining position information, storage medium and electronic device
CN112634370A (en) * 2020-12-31 2021-04-09 广州极飞科技有限公司 Unmanned aerial vehicle dotting method, device, equipment and storage medium
CN114842156A (en) * 2021-02-01 2022-08-02 华为技术有限公司 Three-dimensional map construction method and device
CN112966718B (en) * 2021-02-05 2023-12-19 深圳市优必选科技股份有限公司 Image recognition method and device and communication equipment
CN112819892B (en) * 2021-02-08 2022-11-25 北京航空航天大学 Image processing method and device
CN112950667B (en) * 2021-02-10 2023-12-22 中国科学院深圳先进技术研究院 Video labeling method, device, equipment and computer readable storage medium
CN112907550B (en) * 2021-03-01 2024-01-19 创新奇智(成都)科技有限公司 Building detection method and device, electronic equipment and storage medium
CN112950715A (en) * 2021-03-04 2021-06-11 杭州迅蚁网络科技有限公司 Visual positioning method and device for unmanned aerial vehicle, computer equipment and storage medium
CN112991448B (en) * 2021-03-22 2023-09-26 华南理工大学 Loop detection method, device and storage medium based on color histogram
CN113326769B (en) * 2021-05-28 2022-11-29 北京三快在线科技有限公司 High-precision map generation method, device, equipment and storage medium
CN113628286B (en) * 2021-08-09 2024-03-22 咪咕视讯科技有限公司 Video color gamut detection method, device, computing equipment and computer storage medium
CN113673388A (en) * 2021-08-09 2021-11-19 北京三快在线科技有限公司 Method and device for determining position of target object, storage medium and equipment
CN113793414A (en) * 2021-08-17 2021-12-14 中科云谷科技有限公司 Method, processor and device for establishing three-dimensional view of industrial field environment
CN115729250A (en) * 2021-09-01 2023-03-03 中移(成都)信息通信科技有限公司 Flight control method, device and equipment of unmanned aerial vehicle and storage medium
CN114743116A (en) * 2022-04-18 2022-07-12 蜂巢航宇科技(北京)有限公司 Barracks patrol scene-based unattended special load system and method
CN114596363B (en) * 2022-05-10 2022-07-22 北京鉴智科技有限公司 Three-dimensional point cloud marking method and device and terminal
CN117115414B (en) * 2023-10-23 2024-02-23 西安羚控电子科技有限公司 GPS-free unmanned aerial vehicle positioning method and device based on deep learning
CN117395377B (en) * 2023-12-06 2024-03-22 上海海事大学 Multi-view fusion-based coastal bridge sea side safety monitoring method, system and medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104835115A (en) * 2015-05-07 2015-08-12 中国科学院长春光学精密机械与物理研究所 Imaging method for aerial camera, and system thereof
CN106485655A (en) * 2015-09-01 2017-03-08 张长隆 A kind of taken photo by plane map generation system and method based on quadrotor
CN106097304B (en) * 2016-05-31 2019-04-23 西北工业大学 A kind of unmanned plane real-time online ground drawing generating method
CN106595659A (en) * 2016-11-03 2017-04-26 南京航空航天大学 Map merging method of unmanned aerial vehicle visual SLAM under city complex environment
CN109073385A (en) * 2017-12-20 2018-12-21 深圳市大疆创新科技有限公司 A kind of localization method and aircraft of view-based access control model
CN108692661A (en) * 2018-05-08 2018-10-23 深圳大学 Portable three-dimensional measuring system based on Inertial Measurement Unit and its measurement method
CN108648240B (en) * 2018-05-11 2022-09-23 东南大学 Non-overlapping view field camera attitude calibration method based on point cloud feature map registration
CN109410316B (en) * 2018-09-21 2023-07-07 达闼机器人股份有限公司 Method for three-dimensional reconstruction of object, tracking method, related device and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663391A (en) * 2012-02-27 2012-09-12 安科智慧城市技术(中国)有限公司 Image multifeature extraction and fusion method and system
US20170053538A1 (en) * 2014-03-18 2017-02-23 Sri International Real-time system for multi-modal 3d geospatial mapping, object recognition, scene annotation and analytics
CN108932475A (en) * 2018-05-31 2018-12-04 中国科学院西安光学精密机械研究所 A kind of Three-dimensional target recognition system and method based on laser radar and monocular vision
CN108303099A (en) * 2018-06-14 2018-07-20 江苏中科院智能科学技术应用研究院 Autonomous navigation method in unmanned plane room based on 3D vision SLAM
CN109146935A (en) * 2018-07-13 2019-01-04 中国科学院深圳先进技术研究院 A kind of point cloud registration method, device, electronic equipment and readable storage medium storing program for executing
CN110047142A (en) * 2019-03-19 2019-07-23 中国科学院深圳先进技术研究院 No-manned plane three-dimensional map constructing method, device, computer equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115375870A (en) * 2022-10-25 2022-11-22 杭州华橙软件技术有限公司 Loop detection optimization method, electronic equipment and computer readable storage device
CN115375870B (en) * 2022-10-25 2023-02-10 杭州华橙软件技术有限公司 Loop detection optimization method, electronic equipment and computer readable storage device

Also Published As

Publication number Publication date
CN110047142A (en) 2019-07-23

Similar Documents

Publication Publication Date Title
WO2020186678A1 (en) Three-dimensional map constructing method and apparatus for unmanned aerial vehicle, computer device, and storage medium
CN109974693B (en) Unmanned aerial vehicle positioning method and device, computer equipment and storage medium
JP7326720B2 (en) Mobile position estimation system and mobile position estimation method
CN110047108B (en) Unmanned aerial vehicle pose determination method and device, computer equipment and storage medium
CN111862201B (en) Deep learning-based spatial non-cooperative target relative pose estimation method
CN106845430A (en) Pedestrian detection and tracking based on acceleration region convolutional neural networks
CN114842365B (en) Unmanned aerial vehicle aerial photography target detection and identification method and system
CN106815323B (en) Cross-domain visual retrieval method based on significance detection
CN110969648B (en) 3D target tracking method and system based on point cloud sequence data
Cepni et al. Vehicle detection using different deep learning algorithms from image sequence
Vaquero et al. Dual-branch CNNs for vehicle detection and tracking on LiDAR data
US20220292715A1 (en) Method and apparatus for estimating pose of device
CN111812978B (en) Cooperative SLAM method and system for multiple unmanned aerial vehicles
Ribeiro et al. Underwater place recognition in unknown environments with triplet based acoustic image retrieval
CN104463962B (en) Three-dimensional scene reconstruction method based on GPS information video
CN115861860B (en) Target tracking and positioning method and system for unmanned aerial vehicle
CN112818837B (en) Aerial photography vehicle weight recognition method based on attitude correction and difficult sample perception
Wang et al. Online drone-based moving target detection system in dense-obstructer environment
CN114119757A (en) Image processing method, apparatus, device, medium, and computer program product
Chen et al. Towards bio-inspired place recognition over multiple spatial scales
Kasebi et al. Hybrid navigation based on GPS data and SIFT-based place recognition using Biologically-inspired SLAM
Yin et al. M2F2-RCNN: Multi-functional faster RCNN based on multi-scale feature fusion for region search in remote sensing images
Schwaiger et al. Ultrafast object detection on high resolution sar images
CN117523428B (en) Ground target detection method and device based on aircraft platform
An et al. Research on map matching of lidar/vision sensor for automatic driving aided positioning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19920105

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19920105

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.03.2022)
