CN110047108B - Unmanned aerial vehicle pose determination method and device, computer equipment and storage medium - Google Patents
- Publication number: CN110047108B
- Application number: CN201910172148.6A
- Authority: CN (China)
- Prior art keywords: video frame, pose, transformation matrix, frame images, pose transformation
- Legal status: Active (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06T1/0014 — Image feed-back for automatic industrial control, e.g. robot with camera
- G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
- G06T7/80 — Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
- G06T2207/10016 — Video; Image sequence
Abstract
The application relates to an unmanned aerial vehicle (UAV) pose determination method, which comprises the following steps: acquiring inertial measurement data measured by an inertial measurement unit in the UAV and video frame images captured by a camera; determining a pose transformation matrix between every two video frame images according to the inertial measurement data and the video frame images; acquiring two video frame images, taking the two video frame images and the pose transformation matrix between them as the input of a pose optimization model, and acquiring the optimized target pose transformation matrix output by the pose optimization model; and determining the pose of the UAV according to the target pose transformation matrix. The method improves both the accuracy and the real-time performance of UAV pose estimation. A UAV pose determination apparatus, a computer device, and a storage medium are also provided.
Description
Technical Field
The invention relates to the field of computer technology, and in particular to a method and apparatus for determining the pose of an unmanned aerial vehicle, a computer device, and a storage medium.
Background
With the development of science and technology, unmanned aerial vehicles are becoming increasingly miniaturized and intelligent, and their flight space has expanded to jungles, cities, and even the interiors of buildings. Because this flight space is complex and changeable, mainstream GPS-based integrated navigation systems cannot work normally indoors or in unknown environments without GPS signals. The most important step in achieving autonomous navigation is estimating the pose of the unmanned aerial vehicle, but traditional pose estimation methods either have low accuracy or require heavy computation, resulting in poor real-time performance.
Disclosure of Invention
In view of the above, it is necessary to provide an unmanned aerial vehicle pose determination method, apparatus, computer device, and storage medium that achieve high accuracy with a small computational workload.
In a first aspect, an embodiment of the present invention provides an unmanned aerial vehicle pose determination method, where the method includes:
acquiring inertial measurement data measured by an inertial measurement unit in an unmanned aerial vehicle and video frame images captured by a camera;
determining a pose transformation matrix between every two video frame images according to the inertial measurement data and the video frame images;
acquiring two video frame images, taking the two video frame images and the pose transformation matrix between them as the input of a pose optimization model, and acquiring the optimized target pose transformation matrix output by the pose optimization model;
and determining the pose of the unmanned aerial vehicle according to the target pose transformation matrix.
In a second aspect, an embodiment of the present invention provides an unmanned aerial vehicle pose determination apparatus, where the apparatus includes:
an acquisition module, used for acquiring inertial measurement data measured by an inertial measurement unit in the unmanned aerial vehicle and video frame images captured by a camera;
a matrix determination module, used for determining a pose transformation matrix between every two video frame images according to the inertial measurement data and the video frame images;
an optimization module, used for acquiring two video frame images, taking the two video frame images and the pose transformation matrix between them as the input of a pose optimization model, and acquiring the optimized target pose transformation matrix output by the pose optimization model;
and a pose determination module, used for determining the pose of the unmanned aerial vehicle according to the target pose transformation matrix.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory and a processor, where the memory stores a computer program, and the computer program, when executed by the processor, causes the processor to execute the following steps:
acquiring inertial measurement data measured by an inertial measurement unit in an unmanned aerial vehicle and video frame images captured by a camera;
determining a pose transformation matrix between every two video frame images according to the inertial measurement data and the video frame images;
acquiring two video frame images, taking the two video frame images and the pose transformation matrix between them as the input of a pose optimization model, and acquiring the optimized target pose transformation matrix output by the pose optimization model;
and determining the pose of the unmanned aerial vehicle according to the target pose transformation matrix.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the processor is caused to execute the following steps:
acquiring inertial measurement data measured by an inertial measurement unit in an unmanned aerial vehicle and video frame images captured by a camera;
determining a pose transformation matrix between every two video frame images according to the inertial measurement data and the video frame images;
acquiring two video frame images, taking the two video frame images and the pose transformation matrix between them as the input of a pose optimization model, and acquiring the optimized target pose transformation matrix output by the pose optimization model;
and determining the pose of the unmanned aerial vehicle according to the target pose transformation matrix.
According to the unmanned aerial vehicle pose determination method and apparatus, the computer device, and the storage medium, the inertial measurement data and the visual data (video frame images) are first fused to obtain the pose transformation matrix between the video frame images. The pose transformation matrix is then further optimized by the pose optimization model to obtain the target pose transformation matrix, and the pose of the unmanned aerial vehicle is determined according to the optimized target pose transformation matrix. This greatly improves the accuracy of unmanned aerial vehicle pose estimation; moreover, only a small amount of calculation is required, which greatly reduces the computational workload and improves the real-time performance of pose estimation.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from the structures shown in these drawings without creative effort.
FIG. 1 is a flow diagram of an unmanned aerial vehicle pose determination method in one embodiment;
FIG. 2 is a schematic diagram of an unmanned aerial vehicle pose determination method in one embodiment;
FIG. 3 is a block diagram of an unmanned aerial vehicle pose determination apparatus in one embodiment;
FIG. 4 is a block diagram of an unmanned aerial vehicle pose determination apparatus in another embodiment;
FIG. 5 is a block diagram of an unmanned aerial vehicle pose determination apparatus in yet another embodiment;
FIG. 6 is a diagram illustrating the internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in FIG. 1, a method for determining the pose of an unmanned aerial vehicle is provided. The method can be applied to an unmanned aerial vehicle, or to a terminal or server connected to the unmanned aerial vehicle; in this embodiment, it is described as applied to the unmanned aerial vehicle. The method specifically includes the following steps:
and 102, acquiring inertial measurement data obtained by measurement of an inertial measurement unit in the unmanned aerial vehicle and video frame images obtained by shooting of a camera.
An Inertial Measurement Unit (IMU) is a device that measures the three-axis attitude angles (or angular velocities) and accelerations of an object. Serving as the inertial parameter measurement device of the unmanned aerial vehicle, it comprises a three-axis gyroscope, a three-axis accelerometer, and a three-axis magnetometer. The unmanned aerial vehicle can directly read the measured data of the inertial measurement unit, which include angular velocity, acceleration, magnetometer data, and the like. The video frame images are images captured in real time by the unmanned aerial vehicle through its camera. In one embodiment, the camera is a stereo vision camera, such as an RGB-D camera.
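For illustration, one reading of the measured data described above might be represented as follows (a hypothetical data layout; the field names and values are assumptions, not taken from the patent):

```python
from dataclasses import dataclass


@dataclass
class ImuSample:
    """One reading from the inertial measurement unit (illustrative layout)."""
    gyro: tuple       # angular velocity about x, y, z (rad/s), from the gyroscope
    accel: tuple      # linear acceleration along x, y, z (m/s^2), from the accelerometer
    mag: tuple        # magnetic field along x, y, z (uT), from the magnetometer
    timestamp: float  # time of the reading, in seconds


# An example reading for a UAV hovering nearly level:
sample = ImuSample(gyro=(0.01, -0.02, 0.0),
                   accel=(0.0, 0.0, 9.81),
                   mag=(22.0, 5.0, -43.0),
                   timestamp=0.004)
```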
Step 104: determine a pose transformation matrix between every two video frame images according to the inertial measurement data and the video frame images.
The pose transformation matrix between every two video frame images is obtained by fusing the inertial measurement data measured by the inertial measurement unit with the visual data collected by the camera. After the video frame images are acquired, the feature points in each video frame image are extracted, and feature matching is performed on these feature points to obtain feature point matching pairs between the video frame images. The pose transformation matrix between every two video frame images is then calculated from the matched feature point pairs and the inertial measurement data. The pose transformation matrix consists of a rotation matrix R and a translation vector t.
Step 106: acquire two video frame images, take the two video frame images and the pose transformation matrix between them as the input of a pose optimization model, and acquire the optimized target pose transformation matrix output by the pose optimization model.
To improve the accuracy of pose estimation, the pose transformation matrix between the video frame images determined from the inertial measurement data and the video frame images is further optimized. The two video frame images and the pose transformation matrix corresponding to them are taken as the input of the pose optimization model, and the target pose transformation matrix obtained after the pose optimization model optimizes the pose transformation matrix is acquired.
Step 108: determine the pose of the unmanned aerial vehicle according to the target pose transformation matrix.
The pose refers to the position and attitude of the unmanned aerial vehicle and is generally expressed by six quantities: the position is expressed by three-dimensional space coordinates (x, y, z), and the attitude by the angles (φ, θ, ψ) about the three coordinate axes, where φ is the roll angle, the angle of rotation about the X axis; θ is the pitch angle, the angle of rotation about the Y axis; and ψ is the yaw angle, the angle of rotation about the Z axis. The target pose transformation matrix describes the relative pose between every two video frames; once it has been calculated, the pose of the unmanned aerial vehicle can be calculated from the initial pose of the unmanned aerial vehicle and the target pose transformation matrices between the video frame images.
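The chaining of relative transforms described above can be sketched in a few lines of numpy (an illustrative sketch, not code from the patent): each pair-wise transformation, built from a rotation R and translation t, is multiplied onto the previous pose.

```python
import numpy as np


def make_transform(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation R and a 3-vector t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def accumulate_pose(initial_pose, transforms):
    """Chain the per-frame-pair transforms onto the initial pose:
    pose_k = pose_0 @ T_1 @ T_2 @ ... @ T_k."""
    pose = initial_pose.copy()
    for T in transforms:
        pose = pose @ T
    return pose
```

The translation column of the accumulated matrix gives the current position, and its rotation block gives the current attitude.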
According to this unmanned aerial vehicle pose determination method, the inertial measurement data and the visual data (video frame images) are first fused to obtain the pose transformation matrix between the video frame images. The pose transformation matrix is then further optimized by the pose optimization model to obtain the target pose transformation matrix, and the pose of the unmanned aerial vehicle is determined according to the optimized target pose transformation matrix. This greatly improves the accuracy of unmanned aerial vehicle pose estimation; moreover, only a small amount of calculation is required, which greatly reduces the computational workload and improves the real-time performance of pose estimation.
In one embodiment, as shown in FIG. 2, a schematic diagram of the unmanned aerial vehicle pose determination method is given. First, the inertial measurement data measured by the inertial measurement unit and the video frame images captured by the camera are acquired, and the two are fused to calculate the pose transformation matrix. Then, two video frame images and the pose transformation matrix corresponding to them are input into the pose optimization model, which outputs the target pose transformation matrix, and the pose of the unmanned aerial vehicle is determined according to the target pose transformation matrix.
In one embodiment, the pose optimization model is trained based on a convolutional neural network model. Before the two video frame images and the pose transformation matrix between them are input into the pose optimization model, the method further comprises: acquiring a training video frame image pair sample set, where the set comprises a plurality of training video frame image pair samples, and each sample comprises two training video frame images and the pose transformation matrix between them, determined from the inertial measurement data and the training video frame images; acquiring the expected pose transformation matrix corresponding to each training video frame image pair sample; and training the pose optimization model by taking the training video frame image pair samples as its input and the expected pose transformation matrices as its expected output, so as to obtain the trained pose optimization model.
The training video frame image pair sample set comprises a plurality of training video frame image pair samples, and each sample comprises two training video frame images and the pose transformation matrix between them. The pose transformation matrix is calculated from the inertial measurement data measured by the inertial measurement unit and the video frame images captured by the camera. The pose optimization model is trained based on a convolutional neural network model; the convolutional neural network comprises convolutional layers, pooling layers, nonlinear mapping layers, fully connected layers, and a softmax layer. The network extracts and compares the features of the two images, and then learns, from the input pose transformation matrix and the expected pose transformation matrix, how to optimize the pose transformation matrix according to the image features so as to obtain the target pose transformation matrix.
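The patent does not disclose the exact network architecture, but the input described above, namely two frames plus the pose transformation matrix between them, could be packaged for such a network as follows (the 6-channel image layout and the 12-value pose vector are assumptions for illustration only):

```python
import numpy as np


def pack_training_sample(img_a, img_b, pose):
    """Stack two RGB frames channel-wise and flatten the 3x4 [R|t] block
    of the 4x4 pose transformation matrix.

    This is a hypothetical input layout for the pose optimization model;
    the patent does not specify how the network consumes its inputs.
    """
    assert img_a.shape == img_b.shape and img_a.shape[2] == 3
    image_tensor = np.concatenate([img_a, img_b], axis=2)  # H x W x 6
    pose_vector = pose[:3, :].reshape(-1)                  # 12 values: R (9) + t (3)
    return image_tensor, pose_vector
```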
In one embodiment, after acquiring the inertial measurement data measured by the inertial measurement unit in the drone and the video frame images captured by the camera, the method further includes: taking a video frame image as the input of a control type model and acquiring the control type output by the control type model, where the control type is one of left-turn control, right-turn control, and straight-going control; and adjusting the flight direction of the unmanned aerial vehicle according to the control type.
In order to realize autonomous navigation from the video frame images, a video frame image captured by the camera is taken as the input of the control type model, and the control type output by the model is obtained. The control types are left-turn control, right-turn control, and straight-going control, and the flight direction of the unmanned aerial vehicle is adjusted in real time according to the control type. The control type model identifies the control type corresponding to the current state of the unmanned aerial vehicle, which makes it convenient to adjust its direction accordingly. The control type model is trained based on a convolutional neural network model using training images of different deviation types: an image in which the unmanned aerial vehicle deviates to the left of the correct direction is labeled as right-turn control, an image in which it deviates to the right is labeled as left-turn control, and an image within the range of the correct direction is labeled as straight-going control. The model is trained by taking each training image as the input of the control type model and the corresponding label as the expected output, yielding the trained control type model.
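The labeling rule described above can be sketched as a small helper (the sign convention for the deviation and the ±10° straight-going band are illustrative assumptions, not values from the patent):

```python
def control_type_from_heading(deviation_deg, straight_band_deg=10.0):
    """Map the UAV's heading deviation to a control label.

    Convention (assumed): deviation_deg > 0 means the nose points right of
    the desired track, so a left turn is needed; deviation_deg < 0 means it
    points left, so a right turn is needed. Within the band, go straight.
    """
    if abs(deviation_deg) <= straight_band_deg:
        return "straight"
    return "left" if deviation_deg > 0 else "right"
```

In training-set preparation, this rule would assign the label that each captured image is paired with.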
In one embodiment, determining the pose transformation matrix between two video frame images according to the inertial measurement data and the video frame images includes: calculating an initial pose transformation matrix between the video frame images according to the measurement data; extracting the feature points in each video frame image and performing feature matching on the feature points to obtain feature point matching pairs between the video frame images; and calculating the pose transformation matrix between the video frame images according to the initial pose transformation matrix and the feature point matching pairs between the video frame images.
After the measurement data measured by the inertial measurement unit are obtained, a pose transformation matrix of the unmanned aerial vehicle can be calculated directly from the measurement data; however, because the inertial measurement unit accumulates errors, this matrix is not accurate enough. To distinguish it from the subsequently optimized pose transformation matrix, the matrix calculated directly from the measurement data is called the initial pose transformation matrix. In one embodiment, the initial pose transformation matrix corresponding to the measurement data is calculated using a complementary filtering algorithm. In one embodiment, the initial pose transformation matrix between video frame images refers to the initial pose transformation matrix between adjacent video frames; that is, the pose transformation matrix between each pair of adjacent video frames is calculated.
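As a sketch of the complementary filtering mentioned above (a single-axis, single-step version; the gain of 0.98 is an illustrative assumption, the patent does not give filter parameters):

```python
def complementary_filter(angle, gyro_rate, accel_angle, dt, alpha=0.98):
    """One step of a complementary filter for a single attitude angle.

    The gyro-integrated angle is trusted at high frequency (it is smooth but
    drifts), while the accelerometer-derived angle is trusted at low
    frequency (it is noisy but drift-free); alpha blends the two.
    """
    return alpha * (angle + gyro_rate * dt) + (1.0 - alpha) * accel_angle
```

Iterating this step over successive IMU samples yields drift-corrected roll and pitch estimates, from which the initial pose transformation matrix can be assembled.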
Feature points are extracted from the color image; they can be simply understood as the more prominent points in the image, such as contour points, bright points in darker areas, and dark points in lighter areas. Feature extraction may use ORB features; ORB detects feature points with the FAST (Features from Accelerated Segment Test) algorithm. The core idea of FAST is to find salient points by comparing a pixel with the pixels surrounding it: if it differs from most of them, it is considered a feature point. Other features may of course be employed, such as HOG features or LBP features. The HOG (Histogram of Oriented Gradients) feature is a feature descriptor used for object detection in computer vision and image processing, constructed by calculating and accumulating histograms of gradient directions over local regions of an image. LBP (Local Binary Pattern) is an operator used to describe the local texture features of an image. After the feature points in each video frame image are extracted, feature matching is performed according to the features of the feature points to obtain feature point matching pairs between the video frame images. Because the unmanned aerial vehicle is flying continuously, the same point in real space appears at different positions in different video frame images; by acquiring the features of the feature points in consecutive video frames and matching them, the positions of the same real-space point in the different frames are obtained.
In one embodiment, two adjacent video frame images are acquired, the features of a number of feature points are extracted from the previous and the subsequent video frame image, and the features are matched to obtain the matched feature points in the two images, forming feature point matching pairs. For example, if the feature points in the previous video frame image are P1, P2, P3, …, Pn and the corresponding matched feature points in the subsequent video frame image are Q1, Q2, Q3, …, Qn, then P1 and Q1 form a feature point matching pair, as do P2 and Q2, and P3 and Q3. Feature matching can use a brute-force matching algorithm or the fast approximate nearest neighbour (FLANN) algorithm; the latter checks whether the ratio of the nearest matching distance to the second-nearest matching distance is below a set threshold, and only then judges the match successful, which reduces mismatched point pairs.
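The ratio test just described can be sketched as follows (a brute-force numpy version for clarity; a real FLANN matcher uses approximate index structures such as k-d trees, and the 0.7 ratio is an illustrative threshold):

```python
import numpy as np


def ratio_test_match(desc_a, desc_b, ratio=0.7):
    """Nearest-neighbour descriptor matching with a distance-ratio test.

    For each descriptor in desc_a, the best match in desc_b is accepted only
    if best_dist < ratio * second_best_dist; ambiguous matches are dropped.
    desc_b must contain at least two descriptors.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches
```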
The initial pose transformation matrix is taken as the initial estimate, and the pose transformation matrix is then calculated from the transformation relationship between the feature point matching pairs. Using the initial pose transformation matrix as the initial estimate greatly reduces the computational complexity and improves both the positioning speed and the accuracy.
In one embodiment, performing feature matching on the feature points to obtain feature point matching pairs between video frame images includes: matching the feature points between the video frame images using a color histogram feature matching algorithm to obtain a first matching pair set; and further matching the points in the first matching pair set using a scale-invariant feature transform matching algorithm to obtain the target feature point matching pairs.
The color histogram matching algorithm focuses on matching color features, while the Scale-Invariant Feature Transform (SIFT) focuses on matching shape features. Combining the two, that is, the color of the color histogram with the shape of the SIFT algorithm, improves the accuracy of feature identification and thus of feature point matching, improves the real-time performance of identification, and improves the real-time performance and accuracy of the subsequently generated three-dimensional point cloud map. Specifically, a color histogram is first used for preliminary feature point matching to obtain a first matching pair set, and a scale-invariant feature transform matching algorithm is then used to further match the points in the first matching pair set to obtain the target feature point matching pairs.
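The two-stage matching can be sketched as follows (a histogram-intersection prefilter followed by descriptor-distance refinement; the similarity threshold and histogram binning are illustrative assumptions, and plain descriptor vectors stand in for SIFT):

```python
import numpy as np


def histogram_similarity(patch_a, patch_b, bins=8):
    """Histogram-intersection similarity between two intensity patches (0..255)."""
    h_a, _ = np.histogram(patch_a, bins=bins, range=(0, 256))
    h_b, _ = np.histogram(patch_b, bins=bins, range=(0, 256))
    h_a = h_a / max(h_a.sum(), 1)
    h_b = h_b / max(h_b.sum(), 1)
    return float(np.minimum(h_a, h_b).sum())


def two_stage_match(patches_a, patches_b, descs_a, descs_b, hist_thresh=0.5):
    """Stage 1: keep only candidates whose colour histograms agree.
    Stage 2: among the candidates, pick the smallest descriptor distance."""
    matches = []
    for i, (patch, desc) in enumerate(zip(patches_a, descs_a)):
        candidates = [j for j, other in enumerate(patches_b)
                      if histogram_similarity(patch, other) >= hist_thresh]
        if not candidates:
            continue  # no colour-compatible counterpart in the other frame
        dists = [float(np.linalg.norm(descs_b[j] - desc)) for j in candidates]
        matches.append((i, candidates[int(np.argmin(dists))]))
    return matches
```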
In one embodiment, the calculating a pose transformation matrix between video frame images according to the initial pose transformation matrix and the feature point matching pairs between the video frame images includes: acquiring the three-dimensional coordinates of each feature point in the feature point matching pair; calculating a three-dimensional coordinate obtained by converting the three-dimensional coordinate of the feature point in one video frame image into another video frame image by taking the initial pose transformation matrix between the video frame images as an initial value; acquiring a target three-dimensional coordinate corresponding to the corresponding matched feature point in the other video frame image; and calculating to obtain a pose transformation matrix according to the converted three-dimensional coordinates and the target three-dimensional coordinates.
After the feature point matching pairs are determined, the three-dimensional coordinates of each feature point are obtained. The three-dimensional coordinates can be obtained from the color image and the depth image acquired by the RGB-D camera: the color image is used to identify the x and y values of a feature point, and the depth image gives the corresponding z value. For two video frame images, the matched feature points are treated as two point sets: the set of feature points in the first video frame image is $P = \{ p_i \in \mathbb{R}^3 \mid i = 1, 2, \dots, N \}$ and the set in the second video frame image is $Q = \{ q_i \in \mathbb{R}^3 \mid i = 1, 2, \dots, N \}$. Taking the error between the two point sets as the cost function, the corresponding rotation matrix R and translation vector t are obtained by minimizing the cost function:

$$\min_{R, t} \; \frac{1}{2} \sum_{i=1}^{N} \left\| q_i - (R p_i + t) \right\|^2$$

where R and t are the rotation matrix and the translation vector, respectively. The steps of the iterative closest point algorithm are as follows:
1) For each point $p_i$ in P, find the closest point in Q and denote it $q_i$;
2) Solve for the transformation R and t that minimize the formula above;
3) Apply the rigid-body transformation R, t to the point set P to obtain a new point set $P' = \{ p_i' = R p_i + t \mid i = 1, 2, \dots, N \}$, and calculate the error distance between the new point set and the point set Q:

$$E_d = \frac{1}{N} \sum_{i=1}^{N} \left\| p_i' - q_i \right\|$$
in actual operation, the rotation matrix and the translation vector with constraint conditions can be represented by an unconstrained lie algebra, and the number of characteristic points with error distances smaller than a set threshold value, namely the number of inner points, is recorded. If the error distance E calculated in step 3) is not zerodIf the internal point is smaller than the threshold and the internal point is larger than the set threshold, or if the iteration frequency reaches the set threshold, the iteration is finished; if not, go to step 1) to carry out the next iteration. According to the method, the initial pose matrix obtained through calculation is used as the initial value of iteration, so that the iteration speed is improved, the calculation speed is improved, and the robustness is high.
As shown in fig. 3, in one embodiment, a pose determination apparatus for a drone includes:
an obtaining module 302, configured to obtain inertial measurement data obtained by measurement by an inertial measurement unit in an unmanned aerial vehicle and a video frame image obtained by shooting with a camera;
a matrix determination module 304, configured to determine a pose transformation matrix between every two video frame images according to the inertial measurement data and the video frame images;
an optimizing module 306, configured to obtain two video frame images, use the pose transformation matrix between the two video frame images and the two video frame images as an input of a pose optimization model, and obtain an optimized target pose transformation matrix output by the pose optimization model;
and a pose determination module 308, configured to determine the pose of the drone according to the target pose transformation matrix.
As shown in fig. 4, in one embodiment, the pose optimization model is trained based on a convolutional neural network model; the above-mentioned device still includes:
a training sample obtaining module 310, configured to obtain a training video frame image pair sample set, where the training video frame image pair sample set includes: a plurality of training video frame image pair samples, the training video frame image pair samples comprising: the pose transformation matrix is determined by inertia measurement data and the training video frame images;
an expected acquisition module 312, configured to acquire an expected pose transformation matrix corresponding to the sample from the training video frame image;
the training module 314 is configured to use the training video frame image pair samples as inputs of the pose optimization model, and use the expected pose transformation matrix as an expected output to train the pose optimization model, so as to obtain a trained pose optimization model.
As shown in fig. 5, in an embodiment, the unmanned aerial vehicle pose determination apparatus further includes:
a type determining module 316, configured to take the video frame image as the input of a control type model and obtain the control type output by the control type model, where the control type is one of left-turn control, right-turn control, and straight-ahead control;
an adjusting module 318, configured to adjust the flight direction of the unmanned aerial vehicle according to the control type.
In one embodiment, the matrix determination module is further configured to calculate an initial pose transformation matrix between the video frame images according to the inertial measurement data; acquire the video frame images shot by the camera, extract feature points in each video frame image, and perform feature matching on the feature points to obtain feature point matching pairs between the video frame images; and calculate the pose transformation matrix between the video frame images according to the initial pose transformation matrix and the feature point matching pairs between the video frame images.
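As a hedged sketch of how an initial pose transform between two frames might be obtained from the inertial samples (the patent does not specify its integration scheme; simple Euler integration with gravity already removed is assumed here, and all names are illustrative):

```python
import numpy as np

def skew(w):
    """3x3 skew-symmetric matrix of a 3-vector."""
    x, y, z = w
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def expm_so3(phi):
    """Rodrigues' formula: exponential map from so(3) to SO(3)."""
    theta = np.linalg.norm(phi)
    if theta < 1e-12:
        return np.eye(3)
    K = skew(phi / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def integrate_imu(gyro, accel, dt):
    """Euler-integrate gyroscope (rad/s) and accelerometer (m/s^2) samples
    into a 4x4 homogeneous initial pose transform between two frames."""
    R = np.eye(3)
    v = np.zeros(3)
    p = np.zeros(3)
    for w, a in zip(gyro, accel):
        R = R @ expm_so3(w * dt)   # rotate by the gyro increment
        v = v + R @ a * dt         # accumulate velocity (gravity assumed removed)
        p = p + v * dt             # accumulate position
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = p
    return T
```

This initial transform would then be refined with the feature-point matching pairs, as the embodiment describes.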
In one embodiment, the matrix determination module is further configured to match feature points between video frame images by using a color histogram feature matching algorithm to obtain a first matching pair set; and further matching the matching points in the first matching pair set by adopting a scale invariant feature transformation matching algorithm to obtain target feature point matching pairs.
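The coarse-to-fine matching described here can be sketched with numpy: a cheap color-histogram comparison first prunes candidate matches, then a SIFT-style descriptor distance selects the final pair. The intersection measure, the threshold, and all names are assumptions, not the patent's exact algorithm:

```python
import numpy as np

def hist_similarity(h1, h2):
    """Histogram intersection in [0, 1] for normalized color histograms."""
    return float(np.minimum(h1, h2).sum())

def two_stage_match(hists_a, hists_b, desc_a, desc_b, hist_thresh=0.5):
    """Stage 1: keep candidates whose color histograms agree.
    Stage 2: pick the nearest neighbour by descriptor (L2) distance."""
    matches = []
    for i, (h, d) in enumerate(zip(hists_a, desc_a)):
        cand = [j for j, hb in enumerate(hists_b)
                if hist_similarity(h, hb) >= hist_thresh]
        if not cand:
            continue
        j_best = min(cand, key=lambda j: np.linalg.norm(d - desc_b[j]))
        matches.append((i, j_best))
    return matches
```

The first stage keeps the comparison cheap; only the surviving candidates pay the cost of the descriptor distance.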
In one embodiment, the matrix determination module is further configured to obtain three-dimensional coordinates of each feature point in the feature point matching pair; calculating a three-dimensional coordinate obtained by converting the three-dimensional coordinate of the feature point in one video frame image into another video frame image by taking the initial pose transformation matrix between the video frame images as an initial value; acquiring a target three-dimensional coordinate corresponding to the corresponding matched feature point in the other video frame image; and calculating to obtain a pose transformation matrix according to the converted three-dimensional coordinates and the target three-dimensional coordinates.
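This 3-D-to-3-D step is a point-set alignment problem. One standard closed-form solver is the SVD-based Kabsch algorithm, shown below as an assumed sketch; the patent's method instead iterates from the initial pose transformation matrix, but the objective — R, t minimizing the distance between transformed and target three-dimensional coordinates — is the same:

```python
import numpy as np

def align_points(src, dst):
    """Return (R, t) best aligning (N, 3) src onto dst in least squares."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)            # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    # correct a possible reflection so that det(R) = +1
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cd - R @ cs
    return R, t
```

Given the matched three-dimensional coordinates from both frames, `align_points` yields the rotation and translation of the pose transformation matrix directly.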
FIG. 6 is a diagram illustrating the internal structure of a computer device in one embodiment. The computer device may be a drone, or a terminal or server connected to a drone. As shown in fig. 6, the computer device includes a processor, a memory, and a network interface connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the unmanned aerial vehicle pose determination method. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the unmanned aerial vehicle pose determination method. The network interface is used for communicating with an external device. Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, the unmanned aerial vehicle pose determination method provided by the present application may be implemented in the form of a computer program executable on a computer device as shown in fig. 6. The memory of the computer device can store the program modules forming the unmanned aerial vehicle pose determination apparatus, such as the acquisition module 302, the matrix determination module 304, the optimization module 306, and the pose determination module 308.
A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of: acquiring inertial measurement data obtained by measurement of an inertial measurement unit in an unmanned aerial vehicle and a video frame image obtained by shooting of a camera; determining a pose transformation matrix between every two video frame images according to the inertial measurement data and the video frame images; acquiring two video frame images, taking the two video frame images and the pose transformation matrix between the two video frame images as the input of a pose optimization model, and acquiring an optimized target pose transformation matrix output by the pose optimization model; and determining the pose of the unmanned aerial vehicle according to the target pose transformation matrix.
In one embodiment, the pose optimization model is trained based on a convolutional neural network model; before the two video frame images and the pose transformation matrix between the two video frame images are used as the input of the pose optimization model, the computer program, when executed by the processor, further causes the processor to perform the following steps: acquiring a training video frame image pair sample set, where the training video frame image pair sample set includes a plurality of training video frame image pair samples, each training video frame image pair sample comprising two training video frame images and a pose transformation matrix determined from inertial measurement data and the training video frame images; acquiring the expected pose transformation matrix corresponding to each training video frame image pair sample; and taking the training video frame image pair samples as the input of the pose optimization model and the expected pose transformation matrix as the expected output to train the pose optimization model, obtaining the trained pose optimization model.
In one embodiment, after the inertial measurement data measured by the inertial measurement unit in the drone and the video frame image captured by the camera are acquired, the computer program, when executed by the processor, further causes the processor to perform the following steps: taking the video frame image as the input of a control type model and acquiring the control type output by the control type model, where the control type is one of left-turn control, right-turn control, and straight-ahead control; and adjusting the flight direction of the unmanned aerial vehicle according to the control type.
In one embodiment, the determining a pose transformation matrix between two video frame images according to the inertial measurement data and the video frame images includes: calculating to obtain an initial pose transformation matrix between video frame images according to the measurement data; acquiring video frame images shot by a camera, extracting feature points in each video frame image, and performing feature matching on the feature points to obtain feature point matching pairs among the video frame images; and calculating to obtain a pose transformation matrix between the video frame images according to the initial pose transformation matrix and the feature point matching pairs between the video frame images.
In one embodiment, the obtaining a feature point matching pair between video frame images by performing feature matching on feature points includes: matching the feature points between the video frame images by adopting a color histogram feature matching algorithm to obtain a first matching pair set; and further matching the matching points in the first matching pair set by adopting a scale invariant feature transformation matching algorithm to obtain target feature point matching pairs.
In one embodiment, the calculating a pose transformation matrix between video frame images according to the initial pose transformation matrix and the feature point matching pairs between the video frame images includes: acquiring the three-dimensional coordinates of each feature point in the feature point matching pair; calculating a three-dimensional coordinate obtained by converting the three-dimensional coordinate of the feature point in one video frame image into another video frame image by taking the initial pose transformation matrix between the video frame images as an initial value; acquiring a target three-dimensional coordinate corresponding to the corresponding matched feature point in the other video frame image; and calculating to obtain a pose transformation matrix according to the converted three-dimensional coordinates and the target three-dimensional coordinates.
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
acquiring inertial measurement data obtained by measurement of an inertial measurement unit in an unmanned aerial vehicle and a video frame image obtained by shooting of a camera; determining a pose transformation matrix between every two video frame images according to the inertial measurement data and the video frame images; acquiring two video frame images, taking the two video frame images and the pose transformation matrix between the two video frame images as the input of a pose optimization model, and acquiring an optimized target pose transformation matrix output by the pose optimization model; and determining the pose of the unmanned aerial vehicle according to the target pose transformation matrix.
In one embodiment, the pose optimization model is trained based on a convolutional neural network model; before the two video frame images and the pose transformation matrix between the two video frame images are used as the input of the pose optimization model, the computer program, when executed by the processor, further causes the processor to perform the following steps: acquiring a training video frame image pair sample set, where the training video frame image pair sample set includes a plurality of training video frame image pair samples, each training video frame image pair sample comprising two training video frame images and a pose transformation matrix determined from inertial measurement data and the training video frame images; acquiring the expected pose transformation matrix corresponding to each training video frame image pair sample; and taking the training video frame image pair samples as the input of the pose optimization model and the expected pose transformation matrix as the expected output to train the pose optimization model, obtaining the trained pose optimization model.
In one embodiment, after the inertial measurement data measured by the inertial measurement unit in the drone and the video frame image captured by the camera are acquired, the computer program, when executed by the processor, further causes the processor to perform the following steps: taking the video frame image as the input of a control type model and acquiring the control type output by the control type model, where the control type is one of left-turn control, right-turn control, and straight-ahead control; and adjusting the flight direction of the unmanned aerial vehicle according to the control type.
In one embodiment, the determining a pose transformation matrix between two video frame images according to the inertial measurement data and the video frame images includes: calculating to obtain an initial pose transformation matrix between video frame images according to the measurement data; acquiring video frame images shot by a camera, extracting feature points in each video frame image, and performing feature matching on the feature points to obtain feature point matching pairs among the video frame images; and calculating to obtain a pose transformation matrix between the video frame images according to the initial pose transformation matrix and the feature point matching pairs between the video frame images.
In one embodiment, the obtaining a feature point matching pair between video frame images by performing feature matching on feature points includes: matching the feature points between the video frame images by adopting a color histogram feature matching algorithm to obtain a first matching pair set; and further matching the matching points in the first matching pair set by adopting a scale invariant feature transformation matching algorithm to obtain target feature point matching pairs.
In one embodiment, the calculating a pose transformation matrix between video frame images according to the initial pose transformation matrix and the feature point matching pairs between the video frame images includes: acquiring the three-dimensional coordinates of each feature point in the feature point matching pair; calculating a three-dimensional coordinate obtained by converting the three-dimensional coordinate of the feature point in one video frame image into another video frame image by taking the initial pose transformation matrix between the video frame images as an initial value; acquiring a target three-dimensional coordinate corresponding to the corresponding matched feature point in the other video frame image; and calculating to obtain a pose transformation matrix according to the converted three-dimensional coordinates and the target three-dimensional coordinates.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium; when executed, the program can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, any combination of them should be considered within the scope of this specification as long as it contains no contradiction.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.
Claims (9)
1. An unmanned aerial vehicle pose determination method, the method comprising:
acquiring inertial measurement data obtained by measurement of an inertial measurement unit in an unmanned aerial vehicle and a video frame image obtained by shooting of a camera, wherein the camera is a binocular vision camera;
determining a pose transformation matrix between every two video frame images according to the inertial measurement data and the video frame images;
acquiring two video frame images, taking the two video frame images and the pose transformation matrix between the two video frame images as the input of a pose optimization model, and acquiring an optimized target pose transformation matrix output by the pose optimization model, wherein the pose optimization model is obtained by training as follows: acquiring a training video frame image pair sample set, the training video frame image pair sample set comprising a plurality of training video frame image pair samples, each training video frame image pair sample comprising two training video frame images and a pose transformation matrix determined from inertial measurement data and the training video frame images; acquiring the expected pose transformation matrix corresponding to each training video frame image pair sample; and taking the training video frame image pair samples as the input of the pose optimization model and the expected pose transformation matrix as the expected output to train the pose optimization model, obtaining the trained pose optimization model; the pose optimization model is trained based on a convolutional neural network model: the convolutional neural network extracts and compares the features of the two images and, by learning from the input pose transformation matrix and the expected pose transformation matrix, optimizes the pose transformation matrix according to the image features to obtain the target pose transformation matrix;
and determining the pose of the unmanned aerial vehicle according to the target pose transformation matrix.
2. The method of claim 1, further comprising, after acquiring the inertial measurement data measured by the inertial measurement unit and the video frame image captured by the camera in the drone:
taking the video frame image as the input of a control type model, and acquiring the control type output by the control type model, wherein the control type is one of left-turn control, right-turn control, and straight-ahead control;
and adjusting the flight direction of the unmanned aerial vehicle according to the control type.
3. The method of claim 1, wherein said determining a pose transformation matrix between two video frame images from said inertial measurement data and said video frame images comprises:
calculating to obtain an initial pose transformation matrix between video frame images according to the inertial measurement data;
acquiring video frame images shot by a camera, extracting feature points in each video frame image, and performing feature matching on the feature points to obtain feature point matching pairs among the video frame images;
and calculating to obtain a pose transformation matrix between the video frame images according to the initial pose transformation matrix and the feature point matching pairs between the video frame images.
4. The method of claim 3, wherein the obtaining a feature point matching pair between the video frame images by performing feature matching on the feature points comprises:
matching the feature points between the video frame images by adopting a color histogram feature matching algorithm to obtain a first matching pair set;
and further matching the matching points in the first matching pair set by adopting a scale invariant feature transformation matching algorithm to obtain target feature point matching pairs.
5. The method according to claim 3, wherein the calculating a pose transformation matrix between video frame images according to the initial pose transformation matrix and the feature point matching pairs between the video frame images comprises:
acquiring the three-dimensional coordinates of each feature point in the feature point matching pair;
calculating a three-dimensional coordinate obtained by converting the three-dimensional coordinate of the feature point in one video frame image into another video frame image by taking the initial pose transformation matrix between the video frame images as an initial value;
acquiring a target three-dimensional coordinate corresponding to the corresponding matched feature point in the other video frame image;
and calculating to obtain a pose transformation matrix according to the converted three-dimensional coordinates and the target three-dimensional coordinates.
6. An unmanned aerial vehicle position and orientation determination device, the device comprising:
the acquisition module is used for acquiring inertial measurement data obtained by measurement of an inertial measurement unit in the unmanned aerial vehicle and video frame images obtained by shooting of a camera;
the matrix determination module is used for determining a pose transformation matrix between every two video frame images according to the inertial measurement data and the video frame images;
the optimization module is used for acquiring two video frame images, taking the two video frame images and the pose transformation matrix between the two video frame images as the input of a pose optimization model, and acquiring the optimized target pose transformation matrix output by the pose optimization model, wherein the pose optimization model is obtained by training as follows: acquiring a training video frame image pair sample set, the training video frame image pair sample set comprising a plurality of training video frame image pair samples, each training video frame image pair sample comprising two training video frame images and a pose transformation matrix determined from inertial measurement data and the training video frame images; acquiring the expected pose transformation matrix corresponding to each training video frame image pair sample; and taking the training video frame image pair samples as the input of the pose optimization model and the expected pose transformation matrix as the expected output to train the pose optimization model, obtaining the trained pose optimization model; the pose optimization model is trained based on a convolutional neural network model: the convolutional neural network extracts and compares the features of the two images and, by learning from the input pose transformation matrix and the expected pose transformation matrix, optimizes the pose transformation matrix according to the image features to obtain the target pose transformation matrix;
and the pose determination module is used for determining the pose of the unmanned aerial vehicle according to the target pose transformation matrix.
7. The apparatus according to claim 6, wherein the pose optimization model is trained based on a convolutional neural network model; the device further comprises:
a training sample acquisition module, configured to acquire a training video frame image pair sample set, where the training video frame image pair sample set includes: a plurality of training video frame image pair samples, the training video frame image pair samples comprising: the pose transformation matrix is determined by inertia measurement data and the training video frame images;
the expected acquisition module is used for acquiring an expected pose transformation matrix corresponding to the sample by the training video frame image;
and the training module is used for taking the training video frame image pair samples as the input of the pose optimization model and taking the expected pose transformation matrix as the expected output to train the pose optimization model so as to obtain the trained pose optimization model.
8. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 5.
9. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910172148.6A CN110047108B (en) | 2019-03-07 | 2019-03-07 | Unmanned aerial vehicle pose determination method and device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110047108A CN110047108A (en) | 2019-07-23 |
CN110047108B true CN110047108B (en) | 2021-05-25 |
Family
ID=67274579
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910172148.6A Active CN110047108B (en) | 2019-03-07 | 2019-03-07 | Unmanned aerial vehicle pose determination method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110047108B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110728245A (en) * | 2019-10-17 | 2020-01-24 | 珠海格力电器股份有限公司 | Optimization method and device for VSLAM front-end processing, electronic equipment and storage medium |
CN112037282B (en) * | 2020-09-04 | 2021-06-15 | 北京航空航天大学 | Aircraft attitude estimation method and system based on key points and skeleton |
CN112686950B (en) * | 2020-12-04 | 2023-12-15 | 深圳市优必选科技股份有限公司 | Pose estimation method, pose estimation device, terminal equipment and computer readable storage medium |
CN113587934B (en) * | 2021-07-30 | 2024-03-19 | 深圳市普渡科技有限公司 | Robot, indoor positioning method and device and readable storage medium |
CN114800499B (en) * | 2022-04-20 | 2023-08-25 | 北京三快在线科技有限公司 | Pose adjustment method and device, computer readable storage medium and electronic equipment |
CN116758157B (en) * | 2023-06-14 | 2024-01-30 | 深圳市华赛睿飞智能科技有限公司 | Unmanned aerial vehicle indoor three-dimensional space mapping method, system and storage medium |
CN117115414B (en) * | 2023-10-23 | 2024-02-23 | 西安羚控电子科技有限公司 | GPS-free unmanned aerial vehicle positioning method and device based on deep learning |
CN117419690B (en) * | 2023-12-13 | 2024-03-12 | 陕西欧卡电子智能科技有限公司 | Pose estimation method, device and medium of unmanned ship |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102435188A (en) * | 2011-09-15 | 2012-05-02 | 南京航空航天大学 | Monocular vision/inertia autonomous navigation method for indoor environment |
CN104463953A (en) * | 2014-11-11 | 2015-03-25 | 西北工业大学 | Three-dimensional reconstruction method based on inertial measurement unit and RGB-D sensor |
CN106017463A (en) * | 2016-05-26 | 2016-10-12 | 浙江大学 | Aircraft positioning method based on positioning and sensing device |
CN108665496A (en) * | 2018-03-21 | 2018-10-16 | 浙江大学 | An end-to-end deep-learning-based semantic simultaneous localization and mapping method
CN109211241A (en) * | 2018-09-08 | 2019-01-15 | 天津大学 | An autonomous UAV positioning method based on visual SLAM
CN109345588A (en) * | 2018-09-20 | 2019-02-15 | 浙江工业大学 | A Tag-based six-degree-of-freedom pose estimation method
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105353772B (en) * | 2015-11-16 | 2018-11-09 | 中国航天时代电子公司 | A visual servo control method for locating a maneuvering target from a UAV |
CN107808407B (en) * | 2017-10-16 | 2020-12-18 | 亿航智能设备(广州)有限公司 | Binocular camera-based unmanned aerial vehicle vision SLAM method, unmanned aerial vehicle and storage medium |
CN109035334A (en) * | 2018-06-27 | 2018-12-18 | 腾讯科技(深圳)有限公司 | Pose determination method and apparatus, storage medium and electronic device |
CN109270954A (en) * | 2018-10-30 | 2019-01-25 | 西南科技大学 | A gesture-recognition-based unmanned aerial vehicle interaction system and control method |
CN109341694A (en) * | 2018-11-12 | 2019-02-15 | 哈尔滨理工大学 | An autonomous localization and navigation method for a mobile detection robot |
- 2019-03-07: CN application CN201910172148.6A filed, granted as patent CN110047108B (status: active)
Non-Patent Citations (4)
Title |
---|
"VINet: Visual-Inertial Odometry as a Sequence-to-Sequence Learning Problem";Ronald Clark et al.;《arXiv》;20170402;第1-3页,图2 * |
"基于卷积神经网络的无人机循迹方法";陈思锐 等;《长春理工大学学报(自然科学版)》;20180228(第1期);第110-113页 * |
"基于视觉的微小型四旋翼飞行机器人位姿估计与导航研究";郑伟;《中国博士学位论文全文数据库 信息科技辑》;20141015(第10期);全文 * |
"基于计算机视觉的无人飞行器位姿估计方法研究";苏俊钦;《中国优秀硕士学位论文全文数据库 工程科技Ⅱ辑》;20190115(第01期);全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN110047108A (en) | 2019-07-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110047108B (en) | Unmanned aerial vehicle pose determination method and device, computer equipment and storage medium | |
CN109974693B (en) | Unmanned aerial vehicle positioning method and device, computer equipment and storage medium | |
WO2020186678A1 (en) | Three-dimensional map constructing method and apparatus for unmanned aerial vehicle, computer device, and storage medium | |
CN110136199B (en) | Camera-based vehicle positioning and mapping method and device | |
Zhao et al. | Detection, tracking, and geolocation of moving vehicle from uav using monocular camera | |
JP7326720B2 (en) | Mobile position estimation system and mobile position estimation method | |
CN108955718B (en) | Visual odometer and positioning method thereof, robot and storage medium | |
CN109141396B (en) | Unmanned aerial vehicle pose estimation method with fusion of auxiliary information and random sampling consistency algorithm | |
US10895458B2 (en) | Method, apparatus, and system for determining a movement of a mobile platform | |
CN109472828B (en) | Positioning method, positioning device, electronic equipment and computer readable storage medium | |
CN106529538A (en) | Method and device for positioning aircraft | |
CN112837352B (en) | Image-based data processing method, device and equipment, automobile and storage medium | |
CN109544629A (en) | Camera pose determines method and apparatus and electronic equipment | |
CN111738032B (en) | Vehicle driving information determination method and device and vehicle-mounted terminal | |
Müller et al. | Squeezeposenet: Image based pose regression with small convolutional neural networks for real time uas navigation | |
CN112219225A (en) | Positioning method, system and movable platform | |
CN115861860B (en) | Target tracking and positioning method and system for unmanned aerial vehicle | |
CN110472092B (en) | Geographical positioning method and system of street view picture | |
CN117115252A (en) | Bionic ornithopter space pose estimation method based on vision | |
CN116109949A (en) | Global positioning method and device of unmanned aerial vehicle, electronic equipment and storage medium | |
Wang et al. | Online drone-based moving target detection system in dense-obstructer environment | |
CN114119757A (en) | Image processing method, apparatus, device, medium, and computer program product | |
Wu et al. | Precise geometric correction and robust mosaicking for airborne lightweight optical butting infrared imaging system | |
CN111811501A (en) | Trunk feature-based unmanned aerial vehicle positioning method, unmanned aerial vehicle and storage medium | |
Ge et al. | A fast mosaicking method for small UAV image sequence using a small number of ground control points |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||