WO2022126529A1 - Positioning method and device, unmanned aerial vehicle, and storage medium - Google Patents

Positioning method and device, unmanned aerial vehicle, and storage medium

Info

Publication number
WO2022126529A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
description information
key point
generation layer
information
Prior art date
Application number
PCT/CN2020/137313
Other languages
English (en)
Chinese (zh)
Inventor
梁湘国
杨健
蔡剑钊
Original Assignee
深圳市大疆创新科技有限公司
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to PCT/CN2020/137313 priority Critical patent/WO2022126529A1/fr
Priority to CN202080069130.4A priority patent/CN114556425A/zh
Publication of WO2022126529A1 publication Critical patent/WO2022126529A1/fr

Classifications

    • G06T 7/74 - Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G01C 21/00 - Navigation; Navigational instruments not provided for in groups G01C 1/00 - G01C 19/00
    • G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; salient regional features
    • G06V 20/17 - Terrestrial scenes taken from planes or by drones
    • G06V 20/40 - Scenes; scene-specific elements in video content

Definitions

  • The present application relates to the field of visual return-to-home navigation, and in particular to a positioning method and device, an unmanned aerial vehicle, and a storage medium.
  • UAVs are unmanned aircraft operated by radio remote-control equipment and on-board program control devices; alternatively, they can be operated fully or intermittently by on-board computers in an autonomous manner. During flight, a UAV is often beyond visual line of sight, so automatic return-to-home is necessary to ensure its safety.
  • During automatic return, the UAV needs to locate its current position relatively quickly and accurately, which is a considerable challenge for a small-sized device such as a UAV.
  • the present application provides a positioning method, device, unmanned aerial vehicle and storage medium, which can be used for faster and more accurate positioning.
  • A first aspect of the present application provides a positioning method applied to a movable platform that includes a vision sensor. The method includes: acquiring first image description information and first key point description information of historical images collected by the vision sensor, and acquiring first position information of the movable platform at the time each historical image was collected; acquiring a current image collected by the vision sensor, and obtaining second image description information and second key point description information of the current image based on a feature extraction model; determining matching results between a plurality of the historical images and the current image based on the first image description information and first key point description information of the historical images and the second image description information and second key point description information of the current image; and determining second position information of the movable platform according to the matching results and the first position information of the historical images.
  • A second aspect of the present application provides a positioning device, including a memory, a processor, and a vision sensor. The memory stores a computer program; the vision sensor collects historical images and a current image; and the processor invokes the computer program to implement the following steps: acquiring first image description information and first key point description information of the historical images collected by the vision sensor, and acquiring first position information of the movable platform at the time each historical image was collected; acquiring the current image collected by the vision sensor, and obtaining second image description information and second key point description information of the current image based on a feature extraction model; determining matching results between a plurality of the historical images and the current image based on the first image description information and first key point description information of the historical images and the second image description information and second key point description information of the current image; and determining second position information of the movable platform at the time the current image was collected, according to the matching results and the first position information of the historical images.
  • a third aspect of the present application is to provide an unmanned aerial vehicle, comprising: a body and the positioning device described in the second aspect.
  • A fourth aspect of the present application provides a computer-readable storage medium storing program instructions, the program instructions being used to implement the method described in the first aspect.
  • The positioning method provided by the present application is applied to a movable platform that includes a vision sensor.
  • The method includes: acquiring first image description information and first key point description information of historical images collected by the vision sensor, and acquiring first position information of the movable platform when the historical images were collected; acquiring the current image collected by the vision sensor and obtaining second image description information and second key point description information of the current image based on a feature extraction model; determining matching results between multiple historical images and the current image based on the two kinds of description information of the historical images and of the current image; and determining, from the matching results and the first position information of the historical images, the second position information of the movable platform when the current image was captured.
  • Because the second image description information and the second key point description information of the current image are obtained at the same time from one model, the efficiency of obtaining the two kinds of description information is improved.
  • The two kinds of description information can also be determined more accurately, which further shortens positioning time, improves positioning accuracy, and satisfies the real-time requirements of both descriptor acquisition and positioning.
  • This method can also be applied to movable platforms such as UAVs, helping the UAV return home more smoothly.
  • In addition, fusion training can be performed for the second image description information and the second key point description information, that is, a single feature extraction model is trained.
  • In this way, the acquisition of the second image description information and the second key point description information benefits from better global performance.
  • the embodiments of the present application also provide a device, an unmanned aerial vehicle, and a storage medium based on the method, all of which can achieve the above effects.
  • FIG. 1 is a schematic flowchart of a positioning method provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a feature extraction model provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a positioning device provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a positioning device provided by an embodiment of the present application.
  • In the process of automatic return, the UAV needs to locate its current position relatively quickly and accurately, which is especially demanding for a small-sized movable platform such as a UAV.
  • Visual return-to-home navigation can be divided into two tasks, key frame matching and key point matching. Considering time efficiency, key frame matching can use the BoW (Bag of Words) model; although its time efficiency is high, its effect is not ideal.
  • Key point matching can use ORB (Oriented FAST and Rotated BRIEF), a fast feature point extraction and description algorithm.
  • Therefore, the embodiments of the present application provide a method that generates the descriptors of key frames and key points in the same network model and reconstructs the network structure for positioning.
  • FIG. 1 is a schematic flowchart of a positioning method provided by an embodiment of the present invention
  • the method 100 provided by an embodiment of the present application can be applied to a movable platform, such as an unmanned aerial vehicle and an intelligent mobile robot, and the movable platform includes a visual sensor.
  • the method 100 includes the following steps:
  • In this way, the second image description information and the second key point description information of the current image can be obtained at the same time, which improves the efficiency of obtaining the two kinds of description information.
  • The two kinds of description information can also be determined more accurately, which further shortens positioning time, improves positioning accuracy, and satisfies the real-time requirements of both descriptor acquisition and positioning.
  • This method can also be applied to movable platforms such as UAVs, helping the UAV return home smoothly and ensuring its safety.
  • The method 100 can also be applied to movable platforms or movable devices other than drones, such as sweeping robots, enabling these platforms or devices to automatically return to a home point or to their original location.
  • The vision sensor is an instrument that uses optical components and an imaging device to obtain image information of the external environment. It can be mounted on the movable platform and used to obtain information about the platform's surroundings, such as an image of the environment around the UAV's current geographic location.
  • Historical images are images obtained by the movable platform during movement, such as the external environment images obtained by the UAV during the normal navigation phase.
  • The external environment images obtained during the normal navigation phase can be used as historical images and referenced to determine the UAV's current geographic location during the return flight.
  • the first image description information refers to information representing the characteristics of the image, such as image descriptors.
  • The image can be a key frame image captured during movement, in which case the first image description information can be called a key frame descriptor.
  • the first key point description information refers to information representing the features of key points in the image, such as key point descriptors in the image.
  • the key point may be a corner or edge in the image.
  • the first location information refers to the geographic location where the movable platform is located when the movable platform obtains the corresponding historical image.
  • the current geographic location can be determined by a positioning device of the movable platform, such as GPS (Global Positioning System, global positioning system).
  • The pose of the movable platform, which may also be called orientation information, can also be obtained at this time.
  • The above-mentioned first image description information and first key point description information can be obtained by the feature extraction model described below, or by other methods such as the SIFT (scale-invariant feature transform) algorithm or the SuperPoint algorithm. It should be noted that these other methods are relatively complex, and although a small-sized movable platform can still run them, the real-time performance is not ideal. For historical images, however, real-time acquisition is not required, and they may be acquired at time intervals.
  • For example, when the UAV is navigating normally in the air, it can obtain images of the external environment through the camera of the vision sensor mounted on it, that is, the historical images.
  • the vision sensor transmits the acquired historical images to the drone for image processing.
  • the vision sensor can acquire historical images in real time, or it can acquire historical images with time intervals.
  • The vision sensor or the UAV can also determine whether an acquired historical image is a key frame image according to key-frame determination rules, and the vision sensor then decides whether to send the historical image to the UAV.
  • Alternatively, the drone processes historical images only after the key frames have been determined.
  • the UAV can obtain image descriptors of historical images and key point descriptors in the image, such as corner descriptors, through the following feature extraction model or SIFT algorithm.
  • The current image refers to an image obtained by the movable platform at its current geographic location during the return flight or return-to-origin process.
  • the second image description information and the second key point description information are of the same nature as the first image description information and the first key point description information in the foregoing step 101, and will not be repeated here.
  • Since the second image description information and the second key point description information are obtained based on the same model (i.e., the feature extraction model), information acquisition is more efficient and the real-time requirement on descriptor acquisition can be met; and because the model is trained jointly rather than separately, the overall performance is better.
  • The feature extraction model includes a feature extraction layer, an image description information generation layer and a key point information generation layer; the feature extraction layer is used to extract the shared feature information of the current image based on a convolutional network; the image description information generation layer is used to generate the second image description information based on the shared feature information; and the key point information generation layer is used to generate the second key point description information based on the shared feature information.
  • FIG. 2 shows a schematic diagram of the structure of the feature extraction model.
  • the feature extraction model includes a feature extraction layer 201 , an image description information generation layer 203 and a key point information generation layer 202 .
  • the feature extraction layer is used to extract the common feature information of the current image based on the convolutional network, including: the feature extraction layer is used to extract the common feature information based on multiple convolutional layers in the convolutional network.
  • the convolution layer may perform convolution on the input historical image or current image, that is, the real image 204 , to obtain convolved image feature information, that is, shared feature information.
  • The key point information generation layer is used to generate the second key point description information based on the shared feature information; specifically, the key point information generation layer extracts the second key point description information from the shared feature information using one convolutional layer and bilinear upsampling, where the number of convolution kernels of that convolutional layer is equal to the dimensionality of the second key point description information.
  • the keypoint information generation layer 202 may include a convolutional layer, and the number of convolution kernels in the convolutional layer is the same as that of the second keypoint description information. After bilinear upsampling, the key point descriptor 2021 in the current image can be obtained.
  • The image description information generation layer is configured to generate the second image description information based on the shared feature information; specifically, the image information generation layer extracts the second image description information from the shared feature information through two convolutional layers and a NetVLAD (Net Vector of Locally Aggregated Descriptors) layer. As shown in FIG. 2, the image description information generation layer 203 includes two convolutional layers and a NetVLAD layer, thereby generating the current image descriptor 2031.
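  • The application does not give concrete layer sizes, so the following PyTorch-style sketch is only an illustration of the described structure (a shared stride-2 convolutional backbone, a one-convolution key point branch with bilinear upsampling, and a two-convolution branch followed by a simplified NetVLAD pooling). Channel counts, strides, the grayscale input and the cluster count are assumptions, not values from the application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLAD(nn.Module):
    """Simplified NetVLAD pooling layer (cluster count is an assumption)."""
    def __init__(self, num_clusters=16, dim=128):
        super().__init__()
        self.conv = nn.Conv2d(dim, num_clusters, kernel_size=1)   # soft cluster assignment
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))

    def forward(self, x):                                          # x: (B, C, H, W)
        soft_assign = self.conv(x).flatten(2).softmax(dim=1)       # (B, K, H*W)
        x_flat = x.flatten(2)                                      # (B, C, H*W)
        # residuals between local features and cluster centroids, weighted by assignment
        residual = x_flat.unsqueeze(1) - self.centroids.unsqueeze(0).unsqueeze(-1)
        vlad = (residual * soft_assign.unsqueeze(2)).sum(-1)       # (B, K, C)
        return F.normalize(vlad.flatten(1), dim=1)                 # global image descriptor

class FeatureExtractionModel(nn.Module):
    def __init__(self, desc_dim=128):
        super().__init__()
        # Shared feature extraction layer: fast downsampling with stride-2 convolutions.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Key point branch: one convolution; kernel count equals the descriptor dimension.
        self.kp_conv = nn.Conv2d(128, desc_dim, 3, padding=1)
        # Image description branch: two convolutions followed by NetVLAD pooling.
        self.img_convs = nn.Sequential(
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
        )
        self.netvlad = NetVLAD(dim=128)

    def forward(self, image):                      # image: (B, 1, H, W), grayscale assumed
        shared = self.backbone(image)              # shared feature information
        kp_map = self.kp_conv(shared)              # low-resolution key point descriptor map
        kp_desc = F.interpolate(kp_map, scale_factor=8,
                                mode='bilinear', align_corners=False)
        img_desc = self.netvlad(self.img_convs(shared))
        return img_desc, kp_desc
```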
  • Regarding key point descriptor generation: although it is described above that the SIFT algorithm can be used to obtain descriptors, and its effect is good, the SIFT algorithm has high complexity and cannot run in real time on embedded devices, so its real-time performance during return flight is poor. In addition, SIFT has not been specially improved for large scale and large viewing-angle changes and is not well suited to visual return. Other traditional methods generally either perform poorly or have high time complexity, and cannot meet the descriptor requirements of visual return.
  • SuperPoint is a better model at present.
  • This model can obtain the positions of key points in an image and the key point descriptors at the same time, but the model is relatively large and difficult to run in real time on embedded devices; moreover, because its training data is generated through homography transformations, it cannot simulate well the actual usage scenarios of visual return navigation.
  • By contrast, the single feature extraction model described above can run in real time on the movable platform, that is, on an embedded device.
  • the feature extraction layer can extract the features of the current image or historical images by fast downsampling with a convolutional layer with a stride of 2, which can reduce computing power consumption.
  • This model structure reduces the computational complexity of the network as much as possible while preserving the quality of the extracted descriptors, and combines the information of image features and point features, that is, the shared feature information, to generate key point descriptors and image descriptors in the same network model.
  • This not only makes full use of what images and key points have in common, but also saves a great deal of repeated feature computation. For example, as mentioned above, when the drone's signal is interrupted by environmental factors such as weather, or there is a problem with the GPS positioning device, the return flight can be triggered automatically. During the return flight, the UAV can obtain the current image in real time through the vision sensor and send it for processing.
  • the current image can be input into the feature extraction model.
  • The current image first passes through the feature extraction layer of the model, where the shared feature information of the current image is obtained through its convolutional layers. This shared feature information is then sent to the image description information generation layer and the key point information generation layer respectively.
  • the image description information generation layer obtains the current image descriptor through two convolution layers and NetVLAD layers.
  • the key point information generation layer receives the shared feature information, it obtains the key point descriptor in the current image through a convolution layer and bilinear upsampling.
  • In some embodiments, the image information generation layer extracts the second image description information from the shared feature information through the two convolutional layers and the NetVLAD layer as follows: it first extracts second image description information of floating-point data type, and then converts the floating-point second image description information into second image description information of Boolean data type.
  • As shown in FIG. 2, the image description information generation layer 203 obtains the current image descriptor 2031 through the two convolutional layers and the NetVLAD layer; the current image descriptor 2031 obtained at this point is of floating-point data type and is then converted into the current image descriptor 2032 of Boolean data type.
  • The same consideration applies to the key point information generation layer, so the data type of the key point descriptors in that layer can also be converted from floating point to Boolean.
  • In some embodiments, the key point information generation layer extracts the second key point description information from the shared feature information through one convolutional layer and bilinear upsampling as follows: it first extracts second key point description information of floating-point data type, and then converts the floating-point second key point description information into second key point description information of Boolean data type.
  • After receiving the shared feature information, the key point information generation layer 202 obtains the key point descriptor 2021 in the current image through one convolutional layer and bilinear upsampling.
  • the key point descriptor 2021 obtained at this time is of the floating point data type, and then the key point descriptor 2021 of the floating point data type is converted into the key point descriptor 2022 of the Boolean data type.
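  • A rough sketch of this float-to-Boolean conversion is given below; the application does not specify the binarization rule, so a simple sign/threshold test is assumed here.

```python
import numpy as np

def binarize_descriptor(desc_float: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Convert a floating-point descriptor into a Boolean (binary) descriptor.

    Each dimension is mapped to True/False by comparing against a threshold;
    the binary form needs far less storage and allows XOR-based matching.
    """
    return desc_float > threshold

# Example: a 128-dimensional float descriptor becomes 128 Boolean values,
# which can be packed into 16 bytes with np.packbits for compact storage.
float_desc = np.random.randn(128).astype(np.float32)
bool_desc = binarize_descriptor(float_desc)
packed = np.packbits(bool_desc)          # 16-byte binary descriptor
```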
  • This also differs from the SuperPoint algorithm, which first computes all key point descriptors and then reads out the descriptors of the key points according to their positions.
  • Here, the key point descriptors are obtained directly by bilinear upsampling, and because bilinear upsampling is performed only at the key point positions, the amount of computation during use can be greatly reduced.
  • In some embodiments, the key point information generation layer extracts the second key point description information from the shared feature information based on one convolutional layer and bilinear upsampling as follows: the positions of the key points in the current image are determined; the key point information generation layer obtains downsampled information of the shared feature information through one convolutional layer; and the key point information generation layer directly upsamples, via bilinear upsampling, the information at the corresponding positions in the downsampled information to obtain the second key point description information.
  • The position of a key point refers to where the key point lies in the image. Key points are obtained per grid region of the current image, for example one per 16x16-pixel cell, from which the position of each key point in the image can be determined.
  • Obtaining the descriptor directly by bilinear upsampling not only avoids the cost of the deconvolution upsampling used in other learning-based methods, but also, in actual training and use, upsampling only at the key point positions greatly reduces time consumption. A sketch of this sampling step is given below.
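  • A hedged sketch of sampling descriptors only at key point locations, using torch's grid_sample for the bilinear interpolation; the downsampling factor and normalization convention are assumptions.

```python
import torch
import torch.nn.functional as F

def sample_keypoint_descriptors(desc_map: torch.Tensor,
                                keypoints: torch.Tensor) -> torch.Tensor:
    """Bilinearly interpolate a low-resolution descriptor map only at key point positions.

    desc_map:  (1, C, Hc, Wc) output of the key point branch convolution.
    keypoints: (N, 2) pixel coordinates (x, y) in the full-resolution image.
    Returns:   (N, C) descriptors, L2-normalized.
    """
    _, C, Hc, Wc = desc_map.shape
    H, W = Hc * 8, Wc * 8                        # assumed downsampling factor of 8
    # Map pixel coordinates to the [-1, 1] range expected by grid_sample.
    grid = keypoints.clone().float()
    grid[:, 0] = grid[:, 0] / (W - 1) * 2 - 1
    grid[:, 1] = grid[:, 1] / (H - 1) * 2 - 1
    grid = grid.view(1, 1, -1, 2)                # (1, 1, N, 2)
    desc = F.grid_sample(desc_map, grid, mode='bilinear', align_corners=True)
    desc = desc.view(C, -1).t()                  # (N, C)
    return F.normalize(desc, dim=1)
```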
  • The feature extraction model described above is created through model training. Because the model is a multi-task branch network, it can be trained step by step. First, the key point training set is used to train the model while the parameters of the image description information generation layer, which determines the image descriptor (either the current image descriptor or the historical image descriptor), are initially fixed. When the model has been trained until the loss no longer decreases significantly, the parameters of the key point information generation layer and of the feature extraction layer can be determined. Then the image matching training set is used to train the image description information generation layer and determine its final parameters.
  • Alternatively, the image description information generation layer can be trained first, which also determines the parameters of the feature extraction layer, and the key point information generation layer is trained afterwards.
  • However, a model trained in this order is slightly less accurate than one trained in the order described above.
  • To shorten the training time of the entire model, the model can be trained on another platform, such as a server or a computer, and then transplanted to the movable platform after training is completed.
  • the model can also be trained on the movable platform.
  • Specifically, the initial feature extraction layer is trained with the first training data, and the trained feature extraction layer becomes the feature extraction layer in the feature extraction model.
  • The first training data includes image point pairs corresponding to the same spatial point, each image point pair appearing in different real images of the same visual scene.
  • At the same time, the initial key point information generation layer is trained with the first training data, and the trained key point information generation layer becomes the key point information generation layer in the feature extraction model.
  • the first training data is the training data in the above-mentioned key point training set.
  • The structure of the initial key point information generation layer is the same as that of the trained key point information generation layer; only the parameters differ, and for the initial layer they are the initial parameters.
  • The training process is the ordinary training process of a network model and will not be repeated here. It is only noted that an image point pair corresponds to a spatial point in the same three-dimensional space.
  • An image point pair is derived from two different real images of the same visual scene, such as two real images of the same location taken from different angles or at different image acquisition positions.
  • the acquisition method of the first training data including the above-mentioned image point pairs is as follows:
  • For different visual scenes, obtain real images from different angles in each visual scene; for each visual scene, build a three-dimensional spatial model from the real images corresponding to the different angles; based on the similarity between spatial points, select spatial points from the three-dimensional spatial model and obtain the real image point pairs corresponding to each selected spatial point in the real images; and select real image point pairs according to the similarity between their collection positions, using the selected real image point pairs as key point pairs to obtain the first training data.
  • the acquisition of real images from different angles may be:
  • the UAV may acquire real images according to the size of the flight height and the attitude angle.
  • The UAV is used to collect downward-looking real images, that is, the image data.
  • Data that are too similar are eliminated according to the similarity of the collected images.
  • real data can be provided during model training and testing.
  • the real data can include a large number of collected real images and matching feature points in the real images, which can also be called key points.
  • The process of constructing a three-dimensional spatial model may be as follows: for at least two real images (two, three, four, five, etc.) of the same visual scene obtained as above, an SFM (structure from motion) modeling method is used to build the three-dimensional spatial model. After the model is established, each real 3D point in the spatial model corresponds to 2D points in at least two real images, thereby forming a 2D point pair. To improve the generalization ability of the model, robust feature descriptions can be extracted when processing different types of key points.
  • the embodiment of the present application can use a variety of different types of key points to construct a three-dimensional model through SFM.
  • The key point types may include, but are not limited to, SIFT-type key points (key points or corner points obtained by the SIFT algorithm), FAST (Features from Accelerated Segment Test)-type key points (obtained by the FAST feature point detection algorithm), ORB-type key points (obtained by the ORB algorithm), and Harris-type key points (obtained by the Harris algorithm).
  • The 3D points corresponding to the 2D point pairs obtained through the above process will contain many 3D points that lie close together, especially when some area of the image is particularly rich in texture, in which case a large number of 3D points corresponding to that area will appear. This affects the balanced distribution of the training data, so the points need to be filtered.
  • the screening process is as follows:
  • A 3D point set S can first be defined to hold the filtered 3D points. The 3D points generated above are traversed so that the similarity of any two 3D points retained after filtering is no greater than a threshold; in other words, the similarity of any two 3D points placed in the set S is no greater than a threshold.
  • the similarity algorithm can be determined by the Euclidean distance algorithm.
  • A set P may also be defined as the set of candidate 3D points; before screening, all the generated 3D points can be placed in P as candidate 3D points. The candidates in P are traversed, and for each candidate the similarity with every 3D point already in S, for example the Euclidean distance d, is calculated. If the similarity with every point in S satisfies the threshold, the corresponding 3D point from P is added to S, so that the similarity of any two 3D points in S is no greater than the threshold; in this way the points in S are not overly similar and the data remain balanced. If S is empty at the time a candidate is screened, the candidate 3D point from P may be added to S directly. A sketch of this greedy filtering is given below.
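  • A short sketch of the greedy filtering step, interpreting the similarity criterion as requiring the Euclidean distance between any two selected points to reach a chosen threshold; the threshold value is an assumption.

```python
import numpy as np

def filter_3d_points(candidates: np.ndarray, min_dist: float = 1.0) -> np.ndarray:
    """Greedily select 3D points so that no two selected points are overly similar.

    candidates: (N, 3) array of 3D points from the SFM model (the candidate set P).
    min_dist:   minimum allowed Euclidean distance between any two selected points.
    Returns the filtered set S as an (M, 3) array.
    """
    selected = []                                  # the set S
    for p in candidates:                           # traverse the candidate set P
        if not selected:
            selected.append(p)                     # S is empty: accept the first candidate
            continue
        d = np.linalg.norm(np.asarray(selected) - p, axis=1)
        if np.all(d >= min_dist):                  # not overly similar to any point in S
            selected.append(p)
    return np.asarray(selected)
```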
  • the selection of spatial points is completed, and the corresponding 2D point pairs need to be screened, that is, the corresponding real image point pairs are selected.
  • the spatial points in the three-dimensional spatial model have a corresponding relationship with the real image points in the real image used to construct the three-dimensional spatial model.
  • the spatial points that have been screened also have corresponding real image point pairs, that is, 2D point pairs.
  • Each filtered 3D point corresponds to 2D points in multiple views (the real images from different perspectives used to construct the spatial 3D model). To increase the difficulty of the dataset and improve the accuracy and generality of the model, only the hardest pair of matching 2D points is kept for each 3D point.
  • Specifically, once the 3D point set S is obtained, consider any 3D point m in S: its corresponding 2D points under different viewing angles form a set T, and the poses of the image acquisition device (mounted on the movable platform), such as a camera, under each viewing angle in T form a set Q, each pose corresponding to one image acquisition device.
  • From Q, the similarity between the corresponding image acquisition device (camera) positions, for example the Euclidean distance, is calculated; the two camera positions with the largest Euclidean distance are found, the corresponding two 2D points in T are kept, and the remaining 2D points are discarded.
  • The set S is traversed in this way, the unique 2D point pair corresponding to each 3D point in S is determined, and all the filtered 2D point pairs constitute the set T.
  • The positions of the two cameras (i.e., the image acquisition devices) with the largest Euclidean distance are the two least similar positions, so the 2D point pairs obtained in this way are the hardest.
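  • A sketch of the hardest-pair selection for one 3D point; the data structures are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def hardest_2d_pair(observations):
    """Keep only the hardest 2D point pair for one 3D point.

    observations: list of (point2d, camera_position) tuples, one per view in which
                  the 3D point is visible.
    Returns the pair of 2D points whose camera positions are farthest apart.
    """
    best_pair, best_dist = None, -1.0
    for (p_i, c_i), (p_j, c_j) in combinations(observations, 2):
        d = np.linalg.norm(np.asarray(c_i) - np.asarray(c_j))  # camera-position distance
        if d > best_dist:
            best_dist, best_pair = d, (p_i, p_j)
    return best_pair
```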
  • the first training data can be obtained.
  • The first training data can be divided by difficulty into three categories: easy, normal, and difficult.
  • For the sets S and T obtained above, since each 3D point in S corresponds to a 2D point pair in T, each group consisting of a 3D point m and its corresponding 2D points x and y constitutes a sample n, and the difficulty score L of each sample n is calculated according to formula (1) below.
  • La represents the angle ∠xpy formed at the 3D point p (i.e., m) by the 2D point pair in sample n; Ld represents the spatial distance between the positions of the image acquisition devices (e.g., cameras) corresponding to 2D point x and 2D point y; Lq represents the quaternion angle between the poses of the image acquisition devices (e.g., cameras) corresponding to the 2D points.
  • Weight parameters λ1, λ2 and λ3 are introduced for these terms, and according to the final difficulty score L the first training data is divided into easy, normal, and difficult.
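  • Formula (1) is not reproduced in this text; given the three terms and the weights λ1, λ2 and λ3 just introduced, a weighted linear combination of the following assumed form would be consistent with the description:

```latex
L = \lambda_1 L_a + \lambda_2 L_d + \lambda_3 L_q \qquad (1)
```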
  • Knowing the difficulty level of the first training data from this division allows the training of subsequent models to be controlled more accurately, in particular whether the model can cover many application scenarios and whether descriptors can be obtained accurately in different application scenarios.
  • The first training data can also be adjusted according to difficulty, so that the difficulty of the samples meets the requirements of model training. As noted above, it is also desirable to further reduce the storage space used by the descriptors and the time needed to measure the distance between descriptors.
  • To this end, a loss function for the Boolean descriptor can be added, and under the combined action of multiple loss functions the model can finally output image descriptors of Boolean data type and key point descriptors of Boolean data type.
  • Their dimensionality is much smaller than that of traditional feature descriptors, and their effect is also better.
  • Binary descriptors of Boolean data type are output directly from the feature extraction model, which makes subsequent retrieval and matching of descriptors more convenient.
  • In some embodiments, training the initial key point information generation layer with the first training data to generate the trained key point information generation layer includes: adding a loss function of Boolean data type to the loss function of floating-point data type in the initial key point information generation layer; and training the initial key point information generation layer with the first training data under both the floating-point loss function and the Boolean loss function, to generate the trained key point information generation layer.
  • That is, the loss function of this layer is changed from a single floating-point loss function to a combination in which a Boolean-type loss function is added to the floating-point loss function, forming multiple loss functions.
  • A floating-point loss function alone could also train the model, but the descriptors obtained during training would then be of floating-point type. Therefore, the Boolean-type loss function is added to the floating-point loss function as the loss of this layer, and the layer is trained with the first training data to obtain the trained layer.
  • the trained feature extraction layer can also be obtained.
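  • The application does not state the exact form of either loss; the following is a hypothetical sketch of such a combined objective, using a contrastive-style matching term for the floating-point descriptors plus a binarization term, with an assumed weight.

```python
import torch
import torch.nn.functional as F

def keypoint_descriptor_loss(desc_a, desc_b, match_labels, w_bool=0.1):
    """Hypothetical combined loss: a floating-point matching term plus a Boolean term.

    desc_a, desc_b: (N, D) descriptors of corresponding key points in an image pair.
    match_labels:   (N,) float tensor, 1 for true matches, 0 for non-matches.
    The Boolean term pushes each dimension toward +/-1 so that thresholding to a
    Boolean descriptor loses little information.  Both terms and the weight w_bool
    are assumptions, not values given in the application.
    """
    # Floating-point term: contrastive-style loss on descriptor distance.
    dist = F.pairwise_distance(desc_a, desc_b)
    margin = 1.0
    float_loss = (match_labels * dist.pow(2) +
                  (1 - match_labels) * F.relu(margin - dist).pow(2)).mean()

    # Boolean term: penalize values far from +/-1 (binarization loss).
    bool_loss = ((desc_a.abs() - 1).pow(2).mean() +
                 (desc_b.abs() - 1).pow(2).mean())

    return float_loss + w_bool * bool_loss
```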
  • the image description information generation layer can be trained.
  • In some embodiments, the method 100 may further include: based on the trained feature extraction layer, training the initial image description information generation layer with second training data and generating the trained image description information generation layer as the image description information generation layer in the feature extraction model, where the second training data includes key-frame image matching pairs and information indicating whether each key-frame image matching pair belongs to the same visual scene.
  • the second training data may be acquired in the following manner: acquiring real images, determining real image matching pairs from the real images based on the classification model, and using them as key frame image matching pairs, and determining whether each real image matching pair belongs to the same vision scene, so as to obtain the second training data.
  • The classification model may be a model for matching real images; it can determine real image matching pairs that belong to the same visual scene and real image matching pairs that do not, for example two real images that match and belong to the same location.
  • the model can be BoW.
  • Specifically, multiple real images in multiple different visual scenes can be obtained by the drone in actual visual-return flight scenarios.
  • The real images are then input into the BoW model to obtain the matching pairs of real images in the same visual scene and the matching pairs of real images in different scenes determined by the model.
  • the model can determine matching pairs by scoring.
  • the real image matching pairs with scores higher than the threshold are regarded as real image matching pairs in the same visual scene, that is, positive sample training data.
  • the real image matching pairs whose scores are lower than the threshold are regarded as real image matching pairs that are not in the same visual scene, that is, the negative sample training data.
  • the second training data can be obtained.
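  • A hedged sketch of building these positive/negative key-frame pairs from BoW similarity scores; the scoring callable and the threshold value are placeholders, not the application's actual values.

```python
def build_keyframe_pairs(images, bow_score, threshold=0.6):
    """Label candidate key-frame pairs as positive or negative samples by BoW score.

    images:    list of real images collected during flight.
    bow_score: callable (img_a, img_b) -> similarity score from a BoW model.
    Returns (positive_pairs, negative_pairs) for the second training data.
    """
    positive_pairs, negative_pairs = [], []
    for i in range(len(images)):
        for j in range(i + 1, len(images)):
            score = bow_score(images[i], images[j])
            if score >= threshold:          # judged to be the same visual scene
                positive_pairs.append((i, j, 1))
            else:                           # judged to be different scenes
                negative_pairs.append((i, j, 0))
    return positive_pairs, negative_pairs
```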
  • In addition, random candidate matching pairs can be added, that is, candidate matching pairs are generated by randomly selecting from the collected real images. After the candidate matching pairs are generated, whether they contain errors or problems is further determined manually. When there are problems or errors, especially ones caused by the classification model, valuable negative-sample training data can be obtained to improve the model's ability.
  • the method 100 further includes: by displaying the real image matching pairs, in response to a user's determination operation, determining whether the real image matching pairs belong to the same visual scene, thereby acquiring the second training data.
  • Specifically, the images corresponding to the matching pairs can be shown side by side, as matching pairs, on a display device such as a display screen.
  • The corresponding feature points between the two real images can also be displayed and connected by lines.
  • annotations are made by workers (i.e. users).
  • The annotation can include the following cases: same, not the same, and uncertain, which can be represented by "0", "1", and "2" respectively.
  • Matching pairs manually marked as uncertain can be eliminated and are not used as second training data; the others, that is, the matching pairs marked "0" and the matching pairs marked "1", are used as the second training data.
  • In some embodiments, the method 100 further includes: randomly selecting real image matching pairs from the real images as key-frame image matching pairs; and, by displaying the randomly selected real image matching pairs and responding to a user's determination operation, determining whether each randomly selected real image matching pair belongs to the same visual scene, so as to obtain the second training data.
  • The selected negative-sample training data also involves the question of difficulty.
  • By manually labeling on top of the BoW model, more valuable negative-sample training data can be found (that is, pairs whose scenes are similar and are mistakenly identified by BoW as matching pairs belonging to the same visual scene), which helps to train a more robust network.
  • Since the images of the matching pairs are obtained by the UAV from actual visual-return flight scenes, they fully reflect the changes in viewing angle and scale encountered in the visual return task.
  • the initial image description information generation layer can be trained, and the specific training process will not be repeated. Finally, the trained image description information generation layer can be obtained.
  • Likewise, a loss function for the Boolean descriptor can be added here, and under the combined action of multiple loss functions the image descriptor of Boolean data type and the key point descriptor of Boolean data type can finally be output.
  • Their dimensionality is much smaller than that of traditional feature descriptors, and their effect is also better.
  • The binary descriptor of Boolean data type is output directly from the feature extraction model, which makes subsequent retrieval and matching of descriptors more convenient.
  • That is, the image description information generation layer outputs the second image description information as a binary descriptor of Boolean data type.
  • In some embodiments, training the initial image description information generation layer with the second training data to generate the trained image description information generation layer includes: adding a loss function of Boolean data type to the loss function of floating-point data type in the initial image description information generation layer; and training the initial image description information generation layer with the second training data under both the floating-point loss function and the Boolean loss function, to generate the trained image description information generation layer.
  • That is, the initial image description information generation layer is trained on the second training data, on top of the trained feature extraction layer, using the floating-point loss function together with the Boolean loss function. In this way the feature extraction model can be trained completely.
  • the network of this feature extraction model is a multi-task branch network
  • a step-by-step training method can be adopted during training.
  • the first training data can be used to train the model, that is, the initial key point information generation layer is trained.
  • the parameters of the initial key point information generation layer and the initial feature extraction layer are fixed to obtain the key point information generation layer and the feature extraction layer.
  • use the second training data to train the initial image description information generation layer to obtain the image description information generation layer.
  • The first training data is used first because it is obtained from the spatial three-dimensional model and is therefore completely correct data.
  • After this stage, the shared layer of the model, that is, the feature extraction layer, is already a good feature extraction layer.
  • The second training data is then used for training, which avoids the influence of worker annotation errors on the network and thus yields a better image description information generation layer.
  • the entire network can be fine-tuned using training data that contains both keypoints and keyframe images.
  • In some embodiments, the method 100 further includes: adjusting the feature extraction layer, the image description information generation layer and/or the key point information generation layer in the feature extraction model through third training data, where the third training data includes key-frame image matching pairs and key point matching pairs within those key-frame image matching pairs.
  • the third training data can be determined in the following way: when the number of real image point pairs in the two real images is greater than the threshold, the two real images and the corresponding real image point pairs are used as the key frame image matching pair and key point matching pairs to obtain the third training data.
  • As described above, a three-dimensional spatial model can be constructed; after the model is established, each real 3D point in the spatial model corresponds to 2D points in at least two real images, forming a 2D point pair, that is, each 2D point pair belongs to one 3D point of the spatial 3D model.
  • When the number of real image point pairs shared by two real images is greater than the threshold, the two real images and the 2D point pairs in them can be used as part of the third training data; the third training data can contain multiple such pairs of real images, each pair with its corresponding 2D point pairs.
  • The model trained with the first training data and the second training data is then fine-tuned with the third training data, that is, the parameters of the feature extraction layer, the image description information generation layer and/or the key point information generation layer in the trained feature extraction model are fine-tuned. This will not be repeated here.
  • the fine-tuned model can be used. If the model is trained on a mobile platform, it can be used directly. If the model is trained on a terminal, such as a server or a computer, the trained final model can be transplanted to the mobile platform.
  • the corresponding information can be combined according to the order of the key points in the image, so as to perform subsequent matching.
  • the method 100 further includes: combining the corresponding multiple second key point description information into a vector according to the sequence of multiple key points in the current image.
  • That is, the corresponding descriptors can be assembled into a vector according to the order of the key points in the current image, for subsequent matching; likewise, the descriptors of a historical image can be assembled into a vector according to the order of its key points, for subsequent matching. A sketch of this assembly is given below.
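  • A minimal sketch of the assembly step; the ordering convention (row-major by key point position) is an assumption.

```python
import numpy as np

def assemble_descriptor_vector(keypoints, descriptors):
    """Concatenate key point descriptors into one vector, ordered by key point position.

    keypoints:   list of (x, y) positions in the image.
    descriptors: list of per-key-point descriptor arrays, aligned with keypoints.
    """
    order = sorted(range(len(keypoints)),
                   key=lambda i: (keypoints[i][1], keypoints[i][0]))  # row-major order
    return np.concatenate([np.asarray(descriptors[i]) for i in order])
```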
  • The image description information is used to find, among the plurality of historical images, a first type of historical image whose scene is similar to that of the current image; the key point description information is then used to search, within the first type of historical image, for key points that match the key points of the current image. The matching result includes the matching relationship between key points of the current image and key points of the historical image.
  • the image description information is used to roughly pair the images, and based on this, one or more historical images (the first type of historical images) that more closely match the current image scene are obtained. What is related to positioning is the matching relationship of key points.
  • The key points of the current image and of the historical image can then be matched further to obtain the key point matching relationship, that is, which key point in the current image matches which key point in the historical image.
  • The position information of the key points in the historical image can be considered accurate; based on this position information and on the matching relationship between key points in the current image and key points in the historical image, the subsequent positioning is performed.
  • Specifically, the UAV obtains the image descriptor of each of the above historical images and its key point descriptors (or the vector composed of the key point descriptors), as well as the image descriptor of the current image and its key point descriptors (or the vector composed of them).
  • The image descriptor and key point descriptor vector corresponding to the current image can then be compared with the image descriptors and key point descriptor vectors corresponding to the multiple historical images.
  • the comparison result that is, the matching result, can be determined through a similarity algorithm.
  • the comparison result may be that the current image may be exactly the same as one of the historical images, or partially the same, that is, similar.
  • the similarity can be obtained according to the similarity algorithm to determine whether the similarity is greater than the similarity threshold. When it is greater than the similarity threshold, it can be determined that the matching result is a matching. Otherwise, it is a mismatch.
  • the above similarity algorithm may include Hamming distance, Euclidean distance, and the like.
  • the above-mentioned image descriptors and key point descriptors can be Boolean descriptors.
  • When the distance between Boolean descriptors is measured with the similarity algorithm, only an XOR operation is needed to obtain the similarity, for example the Hamming distance, which greatly speeds up the distance computation between descriptors and further reduces time consumption.
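  • A short sketch of XOR-based Hamming distance matching on bit-packed Boolean descriptors; the similarity threshold value is an assumption.

```python
import numpy as np

def hamming_distance(packed_a: np.ndarray, packed_b: np.ndarray) -> int:
    """Hamming distance between two bit-packed Boolean descriptors via XOR."""
    return int(np.unpackbits(np.bitwise_xor(packed_a, packed_b)).sum())

def is_match(packed_a, packed_b, n_bits=128, sim_threshold=0.9) -> bool:
    """Declare a match when the fraction of identical bits exceeds the threshold."""
    similarity = 1.0 - hamming_distance(packed_a, packed_b) / n_bits
    return similarity > sim_threshold

# Example with two 128-bit descriptors packed into 16 bytes each.
a = np.packbits(np.random.rand(128) > 0.5)
b = a.copy()
print(is_match(a, b))   # True: identical descriptors
```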
  • In this way, the UAV determines which historical image the current image is identical to, or meets the similarity threshold with, so that the geographic location of the current image can be determined based on the geographic location of that historical image.
  • The determined geographic location may be an absolute geographic location of the current image, that is, a location in a geographic coordinate system, or a location relative to the historical image.
  • the movable platform After determining the position of the current image, the movable platform can go back according to the position.
  • In some embodiments, the method 100 may further include: determining the attitude of the movable platform according to the position deviation between the corresponding first position information and second position information in the matching result; and, according to the attitude, moving the movable platform from the second position to the first position to realize automatic return.
  • That is, the attitude of the drone is determined and adjusted according to the deviation between the two positions above, so that the drone can move from the second position to the first position and thereby return home.
  • FIG. 3 is a schematic structural diagram of a positioning device according to an embodiment of the present invention
  • the device 300 can be applied to a movable platform, such as an unmanned aerial vehicle, an intelligent mobile robot, etc.
  • the movable platform includes a visual sensor.
  • the apparatus 300 can perform the above-mentioned positioning method.
  • the apparatus 300 includes: a first obtaining module 301 , a second obtaining module 302 , a first determining module 303 and a second determining module 304 .
  • the functions of each module are described in detail below:
  • the first obtaining module 301 is configured to obtain first image description information and first key point description information of historical images collected by the visual sensor, and obtain first position information of the movable platform when collecting the historical images.
  • the second obtaining module 302 is configured to obtain the current image collected by the visual sensor, and obtain second image description information and second key point description information of the current image based on the feature extraction model.
  • The first determination module 303 is configured to determine the matching result between the plurality of historical images and the current image based on the first image description information and the first key point description information of the historical images, and the second image description information and the second key point description information of the current image.
  • The second determination module 304 is configured to determine, according to the matching result and the first position information of the historical images, the second position information of the movable platform at the time the current image is collected.
  • Optionally, the feature extraction model includes a feature extraction layer, an image description information generation layer and a key point information generation layer; the feature extraction layer is used to extract the common feature information of the current image based on a convolutional network; the image description information generation layer is used to generate the second image description information based on the common feature information; and the key point information generation layer is used to generate the second key point description information based on the common feature information.
  • the feature extraction layer is used to extract common feature information based on multiple convolutional layers in the convolutional network.
  • The image description information generation layer is used to extract the second image description information from the common feature information through two convolution layers and a NetVLAD layer.
  • Specifically, the image description information generation layer extracts second image description information of the floating point data type from the common feature information through the two convolution layers and the NetVLAD layer, and then converts it into second image description information of the Boolean data type.
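  • The sketch below illustrates such an image-description branch in PyTorch (a simplified NetVLAD without per-cluster intra-normalisation; the channel sizes, cluster count, descriptor dimension and class name are assumptions, not values from the patent). It maps the common feature map through two convolution layers, pools it into a global float descriptor, and binarises it by sign.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageDescriptorHead(nn.Module):
    """Illustrative image-description branch: two conv layers over the shared
    feature map, NetVLAD-style pooling, then sign binarisation to Boolean."""
    def __init__(self, in_ch=128, mid_ch=128, clusters=16, cluster_dim=32):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, mid_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(mid_ch, cluster_dim, 3, padding=1)
        # NetVLAD parameters: soft-assignment conv and learnable cluster centres
        self.assign = nn.Conv2d(cluster_dim, clusters, 1)
        self.centres = nn.Parameter(torch.randn(clusters, cluster_dim))

    def forward(self, shared_feat):                     # (B, in_ch, H, W)
        x = F.relu(self.conv1(shared_feat))
        x = self.conv2(x)                               # (B, D, H, W) local features
        a = F.softmax(self.assign(x), dim=1)            # (B, K, H, W) soft assignment
        x = x.flatten(2)                                # (B, D, N)
        a = a.flatten(2)                                # (B, K, N)
        # residuals of local features to cluster centres, weighted and summed
        vlad = torch.einsum('bkn,bdn->bkd', a, x) - a.sum(-1, keepdim=True) * self.centres
        vlad = F.normalize(vlad.flatten(1), dim=1)      # float image descriptor
        return vlad, (vlad > 0)                         # float and Boolean versions
```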
  • The key point information generation layer is used to extract the second key point description information from the common feature information based on one convolution layer and bilinear upsampling, where the number of convolution kernels of the convolution layer is the same as the dimension of the second key point description information.
  • Specifically, the key point information generation layer extracts second key point description information of the floating point data type from the common feature information through the convolution layer and bilinear upsampling, and then converts it into second key point description information of the Boolean data type.
  • The second acquisition module 302 is used to determine the positions of the key points in the current image; the key point information generation layer obtains downsampling information of the common feature information through one convolution layer, and directly upsamples, by bilinear upsampling, only the information at the corresponding positions in the downsampling information, so as to obtain the second key point description information.
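  • A possible reading of this step is sketched below in PyTorch (the class name, channel sizes and descriptor dimension are assumptions): one convolution layer produces a low-resolution descriptor map, and bilinear interpolation is applied only at the key-point coordinates via grid_sample instead of upsampling the whole map, before sign binarisation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeypointDescriptorHead(nn.Module):
    """Illustrative key-point branch: one conv layer over the shared feature map,
    then bilinear interpolation only at the detected key-point positions."""
    def __init__(self, in_ch=128, desc_dim=256):
        super().__init__()
        # the number of kernels equals the key-point descriptor dimension
        self.conv = nn.Conv2d(in_ch, desc_dim, 3, padding=1)

    def forward(self, shared_feat, keypoints_xy, image_hw):
        """keypoints_xy: (B, N, 2) pixel coordinates in the full-resolution image."""
        desc_map = self.conv(shared_feat)              # low-resolution descriptor map
        h, w = image_hw
        # normalise pixel coordinates to [-1, 1] as required by grid_sample
        grid = keypoints_xy.clone().float()
        grid[..., 0] = grid[..., 0] / (w - 1) * 2 - 1
        grid[..., 1] = grid[..., 1] / (h - 1) * 2 - 1
        grid = grid.unsqueeze(2)                       # (B, N, 1, 2)
        desc = F.grid_sample(desc_map, grid, mode='bilinear',
                             align_corners=True)       # (B, C, N, 1)
        desc = desc.squeeze(-1).permute(0, 2, 1)       # (B, N, C) float descriptors
        desc = F.normalize(desc, dim=-1)
        return desc, (desc > 0)                        # float and Boolean versions
```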
  • The apparatus 300 further includes: a combining module, configured to combine the multiple corresponding pieces of second key point description information into a vector according to the order of the multiple key points in the current image.
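  • Concatenation in key-point order is all this combination amounts to; a minimal helper (names assumed) could look like this, taking the Boolean per-key-point descriptors already sorted in the order the key points appear in the current image.

```python
import numpy as np

def combine_keypoint_descriptors(keypoint_descs):
    """keypoint_descs: iterable of Boolean key-point descriptors, already in the
    order the key points appear in the current image."""
    # one flat vector per image, preserving key-point order
    return np.concatenate([np.asarray(d, dtype=bool) for d in keypoint_descs])
```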
  • The apparatus 300 further includes: a third determining module, configured to determine the posture of the movable platform according to the position deviation between the corresponding first position information and the corresponding second position information in the matching result; and a moving module, configured to move the movable platform from the second position to the first position according to the posture, so as to realize the automatic return of the movable platform.
  • The apparatus 300 further includes: a training module, configured to train the initial feature extraction layer through the first training data and generate a trained feature extraction layer as the feature extraction layer in the feature extraction model.
  • The first training data includes image point pairs corresponding to the same spatial point, where the image point pairs appear in different real images of the same visual scene.
  • The training module is further configured to train the initial key point information generation layer through the first training data and generate a trained key point information generation layer as the key point information generation layer in the feature extraction model.
  • The second acquisition module 302 is further used to acquire, for different visual scenes, real images from different angles in each visual scene. The apparatus 300 further includes: a creation module, configured to build, for each visual scene, a three-dimensional spatial model according to the real images corresponding to the different angles; and a selection module, configured to select spatial points from the three-dimensional spatial model based on the similarity between spatial points and obtain the real image point pair corresponding to each selected spatial point in the real images. The selection module is further configured to select real image point pairs according to the similarity between the collection positions of the real image point pairs, and use the selected real image point pairs as the key point pairs to obtain the first training data.
  • The training module includes: an adding unit, configured to add a loss function of the Boolean data type to the loss function of the floating point data type in the initial key point information generation layer; and a training unit, configured to train the initial key point information generation layer through the first training data, the loss function of the floating point data type and the loss function of the Boolean data type, and generate the trained key point information generation layer.
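  • The patent does not give the exact form of either loss, so the sketch below is only one plausible instantiation (the names, the contrastive form of the floating-point loss and the sign-regression form of the Boolean loss are all assumptions): a float loss on descriptor distances of matching/non-matching point pairs, plus a binarisation penalty that pushes components toward ±1 so that little is lost when converting to Boolean.

```python
import torch
import torch.nn.functional as F

def descriptor_losses(desc_a, desc_b, is_match, margin=1.0, binar_weight=0.1):
    """desc_a, desc_b: (B, D) float descriptors of paired points;
    is_match: (B,) tensor, 1 if the pair corresponds to the same spatial point."""
    d = F.pairwise_distance(desc_a, desc_b)
    # floating-point loss: pull matching pairs together, push others apart
    float_loss = is_match * d.pow(2) + (1 - is_match) * F.relu(margin - d).pow(2)
    # Boolean loss: push each component toward +1 / -1 so that sign binarisation
    # preserves as much of the descriptor as possible
    binar_loss = (desc_a - desc_a.sign()).pow(2).mean() + (desc_b - desc_b.sign()).pow(2).mean()
    return float_loss.mean() + binar_weight * binar_loss
```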
  • The training module is also used to train the initial image description information generation layer through the second training data based on the trained feature extraction layer, and generate a trained image description information generation layer as the image description information generation layer in the feature extraction model, wherein the second training data includes key frame image matching pairs and information indicating whether each key frame image matching pair belongs to the same visual scene.
  • The second acquisition module 302 is further configured to: acquire real images, determine real image matching pairs from the real images based on a classification model as the key frame image matching pairs, and determine whether each real image matching pair belongs to the same visual scene, so as to obtain the second training data.
  • The apparatus 300 further includes: a third determination module, configured to determine whether the real image matching pairs belong to the same visual scene by displaying the real image matching pairs and responding to the user's determination operation, thereby acquiring the second training data.
  • The selection module is further configured to randomly select real image matching pairs from the real images as the key frame image matching pairs; the third determination module is configured to determine, by displaying the randomly selected real image matching pairs and responding to the user's determination operation, whether the randomly selected real image matching pairs belong to the same visual scene, so as to obtain the second training data.
  • The adding unit is also used to add a loss function of the Boolean data type to the loss function of the floating point data type in the initial image description information generation layer; the training unit is also used to train the initial image description information generation layer, based on the trained feature extraction layer, through the second training data, the loss function of the floating point data type and the loss function of the Boolean data type, and generate the trained image description information generation layer.
  • The apparatus 300 further includes: an adjustment module, configured to adjust the feature extraction layer, the image description information generation layer and/or the key point information generation layer in the feature extraction model through the third training data, where the third training data includes key frame image matching pairs and the key point matching pairs in the key frame image matching pairs.
  • The selection module is also used to: when the number of real image point pairs shared by two real images is greater than a threshold, use the two real images and the corresponding real image point pairs as a key frame image matching pair and key point matching pairs, so as to obtain the third training data.
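  • The selection rule is a simple count threshold; a minimal sketch follows (the function name and the threshold value of 30 are assumptions).

```python
def select_third_training_data(image_pairs, point_pairs_per_image_pair, min_pairs=30):
    """Keep an image pair as a key-frame matching pair only when it shares more
    than `min_pairs` real image point pairs; those point pairs then serve as
    the key-point matching pairs."""
    keyframe_pairs, keypoint_pairs = [], []
    for pair, pts in zip(image_pairs, point_pairs_per_image_pair):
        if len(pts) > min_pairs:
            keyframe_pairs.append(pair)
            keypoint_pairs.append(pts)
    return keyframe_pairs, keypoint_pairs
```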
  • the structure of the positioning apparatus 300 shown in FIG. 3 may be implemented as an electronic device, and the electronic device may be a positioning device, such as a movable platform.
  • As shown in FIG. 4, the positioning device 400 may include: one or more processors 401, one or more memories 402 and a visual sensor 403.
  • the visual sensor 403 is used to collect historical images and current images.
  • the memory 402 is used to store a program that supports the electronic device to execute the positioning method provided in the embodiments shown in FIG. 1 to FIG. 2 .
  • the processor 401 is configured to execute programs stored in the memory 402 .
  • The program includes one or more computer instructions which, when executed by the processor 401, can implement the steps of the positioning method described above.
  • Optionally, the feature extraction model includes a feature extraction layer, an image description information generation layer and a key point information generation layer; the feature extraction layer is used to extract the common feature information of the current image based on a convolutional network; the image description information generation layer is used to generate the second image description information based on the common feature information; and the key point information generation layer is used to generate the second key point description information based on the common feature information.
  • the feature extraction layer is used to extract common feature information based on multiple convolutional layers in the convolutional network.
  • The image description information generation layer is used to extract the second image description information from the common feature information through two convolution layers and a NetVLAD layer.
  • Specifically, the image description information generation layer extracts second image description information of the floating point data type from the common feature information through the two convolution layers and the NetVLAD layer, and then converts it into second image description information of the Boolean data type.
  • The key point information generation layer is used to extract the second key point description information from the common feature information based on one convolution layer and bilinear upsampling, where the number of convolution kernels of the convolution layer is the same as the dimension of the second key point description information.
  • Specifically, the key point information generation layer extracts second key point description information of the floating point data type from the common feature information through the convolution layer and bilinear upsampling, and then converts it into second key point description information of the Boolean data type.
  • The processor 401 is further configured to determine the positions of the key points in the current image; the key point information generation layer obtains downsampling information of the common feature information through one convolution layer, and directly upsamples, by bilinear upsampling, only the information at the corresponding positions in the downsampling information, so as to obtain the second key point description information.
  • The processor 401 is further configured to combine the multiple corresponding pieces of second key point description information into a vector according to the order of the multiple key points in the current image.
  • the processor 401 is further configured to: determine the posture of the movable platform according to the position deviation between the corresponding first position information and the corresponding second position information in the matching result; according to the posture, the movable platform moves from the second position to the first position for automatic return of the movable platform.
  • The processor 401 is further configured to: train the initial feature extraction layer through the first training data and generate a trained feature extraction layer as the feature extraction layer in the feature extraction model, where the first training data includes image point pairs corresponding to the same spatial point and the image point pairs appear in different real images of the same visual scene; and train the initial key point information generation layer through the first training data to generate a trained key point information generation layer as the key point information generation layer in the feature extraction model.
  • The processor 401 is further configured to: acquire, for different visual scenes, real images from different angles in each visual scene; build, for each visual scene, a three-dimensional spatial model according to the real images corresponding to the different angles; select spatial points from the three-dimensional spatial model based on the similarity between spatial points, and obtain the real image point pair corresponding to each selected spatial point in the real images; and select real image point pairs according to the similarity between the collection positions of the real image point pairs, using the selected real image point pairs as the key point pairs to obtain the first training data.
  • The processor 401 is specifically configured to: add a loss function of the Boolean data type to the loss function of the floating point data type in the initial key point information generation layer; and train the initial key point information generation layer through the first training data, the loss function of the floating point data type and the loss function of the Boolean data type, to generate the trained key point information generation layer.
  • The processor 401 is further configured to: based on the trained feature extraction layer, train the initial image description information generation layer through the second training data and generate a trained image description information generation layer as the image description information generation layer in the feature extraction model, wherein the second training data includes key frame image matching pairs and information indicating whether each key frame image matching pair belongs to the same visual scene.
  • The processor 401 is further configured to: acquire real images, determine real image matching pairs from the real images based on a classification model as the key frame image matching pairs, and determine whether each real image matching pair belongs to the same visual scene, so as to obtain the second training data.
  • the processor 401 is further configured to: by displaying the real image matching pairs, in response to the user's determination operation, determine whether the real image matching pairs belong to the same visual scene, thereby acquiring the second training data.
  • The processor 401 is further configured to: randomly select real image matching pairs from the real images as the key frame image matching pairs; and determine, by displaying the randomly selected real image matching pairs and responding to the user's determination operation, whether the randomly selected real image matching pairs belong to the same visual scene, so as to obtain the second training data.
  • The processor 401 is specifically configured to: add a loss function of the Boolean data type to the loss function of the floating point data type in the initial image description information generation layer; and, based on the trained feature extraction layer, train the initial image description information generation layer through the second training data, the loss function of the floating point data type and the loss function of the Boolean data type, to generate the trained image description information generation layer.
  • The processor 401 is further configured to: adjust the feature extraction layer, the image description information generation layer and/or the key point information generation layer in the feature extraction model through the third training data, where the third training data includes key frame image matching pairs and the key point matching pairs in the key frame image matching pairs.
  • The processor 401 is further configured to: when the number of real image point pairs shared by two real images is greater than a threshold, use the two real images and the corresponding real image point pairs as a key frame image matching pair and key point matching pairs, so as to obtain the third training data.
  • An embodiment of the present invention provides a computer-readable storage medium in which program instructions are stored, the program instructions being used to implement the methods described above with reference to FIG. 1 to FIG. 2.
  • An embodiment of the present invention provides an unmanned aerial vehicle. Specifically, the unmanned aerial vehicle includes a body and the positioning device shown in FIG. 4, the positioning device being provided on the body.
  • In the several embodiments provided in this application, it should be understood that the disclosed apparatus (for example, the related detection apparatus such as an IMU) and method may be implemented in other manners.
  • the embodiments of the remote control device described above are only illustrative.
  • The division of the modules or units is only a logical function division; in actual implementation there may be other division methods, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, and the indirect coupling or communication connection of the remote control device or unit may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer processor to perform all or part of the steps of the methods described in the various embodiments of the present invention.
  • The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to a positioning method and device, an unmanned aerial vehicle, and a storage medium. The method is applied to a movable platform, the movable platform comprising a visual sensor (403). The method comprises: acquiring first image description information and first key point description information of historical images collected by the visual sensor, and acquiring first position information corresponding to the movable platform; acquiring the current image collected by the visual sensor, and acquiring second image description information and second key point description information of the current image based on a feature extraction model; determining, based on the description information corresponding to the historical images and the description information corresponding to the current image, a matching result between the plurality of historical images and the current image; and determining, according to the matching result and the first position information, second position information of the movable platform when the current image is collected. The efficiency of acquiring the two pieces of description information is improved, and the two pieces of description information can also be determined relatively accurately, so that positioning takes even less time and positioning accuracy is improved.
PCT/CN2020/137313 2020-12-17 2020-12-17 Procédé et dispositif de positionnement, et véhicule aérien sans pilote et support de stockage WO2022126529A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/137313 WO2022126529A1 (fr) 2020-12-17 2020-12-17 Procédé et dispositif de positionnement, et véhicule aérien sans pilote et support de stockage
CN202080069130.4A CN114556425A (zh) 2020-12-17 2020-12-17 定位的方法、设备、无人机和存储介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/137313 WO2022126529A1 (fr) 2020-12-17 2020-12-17 Procédé et dispositif de positionnement, et véhicule aérien sans pilote et support de stockage

Publications (1)

Publication Number Publication Date
WO2022126529A1 true WO2022126529A1 (fr) 2022-06-23

Family

ID=81667972

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/137313 WO2022126529A1 (fr) 2020-12-17 2020-12-17 Procédé et dispositif de positionnement, et véhicule aérien sans pilote et support de stockage

Country Status (2)

Country Link
CN (1) CN114556425A (fr)
WO (1) WO2022126529A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116858215A (zh) * 2023-09-05 2023-10-10 武汉大学 一种ar导航地图生成方法及装置
CN118097796A (zh) * 2024-04-28 2024-05-28 中国人民解放军联勤保障部队第九六四医院 一种基于视觉识别的姿态检测分析系统及方法

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114677444B (zh) * 2022-05-30 2022-08-26 杭州蓝芯科技有限公司 一种优化的视觉slam方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103256931A (zh) * 2011-08-17 2013-08-21 清华大学 无人机的可视导航系统
WO2015143615A1 (fr) * 2014-03-24 2015-10-01 深圳市大疆创新科技有限公司 Procédé et appareil de correction de l'état d'un aéronef en temps réel
US20160132057A1 (en) * 2013-07-09 2016-05-12 Duretek Inc. Method for constructing air-observed terrain data by using rotary wing structure
CN107209854A (zh) * 2015-09-15 2017-09-26 深圳市大疆创新科技有限公司 用于支持顺畅的目标跟随的系统和方法
CN110139038A (zh) * 2019-05-22 2019-08-16 深圳市道通智能航空技术有限公司 一种自主环绕拍摄方法、装置以及无人机

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103256931A (zh) * 2011-08-17 2013-08-21 清华大学 无人机的可视导航系统
US20160132057A1 (en) * 2013-07-09 2016-05-12 Duretek Inc. Method for constructing air-observed terrain data by using rotary wing structure
WO2015143615A1 (fr) * 2014-03-24 2015-10-01 深圳市大疆创新科技有限公司 Procédé et appareil de correction de l'état d'un aéronef en temps réel
CN107209854A (zh) * 2015-09-15 2017-09-26 深圳市大疆创新科技有限公司 用于支持顺畅的目标跟随的系统和方法
CN110139038A (zh) * 2019-05-22 2019-08-16 深圳市道通智能航空技术有限公司 一种自主环绕拍摄方法、装置以及无人机

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116858215A (zh) * 2023-09-05 2023-10-10 武汉大学 一种ar导航地图生成方法及装置
CN116858215B (zh) * 2023-09-05 2023-12-05 武汉大学 一种ar导航地图生成方法及装置
CN118097796A (zh) * 2024-04-28 2024-05-28 中国人民解放军联勤保障部队第九六四医院 一种基于视觉识别的姿态检测分析系统及方法

Also Published As

Publication number Publication date
CN114556425A (zh) 2022-05-27

Similar Documents

Publication Publication Date Title
Piasco et al. A survey on visual-based localization: On the benefit of heterogeneous data
Li et al. Dual-resolution correspondence networks
Kendall et al. Posenet: A convolutional network for real-time 6-dof camera relocalization
Laskar et al. Camera relocalization by computing pairwise relative poses using convolutional neural network
WO2022126529A1 (fr) Procédé et dispositif de positionnement, et véhicule aérien sans pilote et support de stockage
Zeng et al. 3dmatch: Learning local geometric descriptors from rgb-d reconstructions
Lim et al. Real-time image-based 6-dof localization in large-scale environments
CN111652934B (zh) 定位方法及地图构建方法、装置、设备、存储介质
US8442307B1 (en) Appearance augmented 3-D point clouds for trajectory and camera localization
CN107967457A (zh) 一种适应视觉特征变化的地点识别与相对定位方法及系统
Xia et al. Loop closure detection for visual SLAM using PCANet features
CN107329962B (zh) 图像检索数据库生成方法、增强现实的方法及装置
CN113298934B (zh) 一种基于双向匹配的单目视觉图像三维重建方法及系统
JP7430243B2 (ja) 視覚的測位方法及び関連装置
Vishal et al. Accurate localization by fusing images and GPS signals
CN114926747A (zh) 一种基于多特征聚合与交互的遥感图像定向目标检测方法
Müller et al. Squeezeposenet: Image based pose regression with small convolutional neural networks for real time uas navigation
CN112861808B (zh) 动态手势识别方法、装置、计算机设备及可读存储介质
CN110969648A (zh) 一种基于点云序列数据的3d目标跟踪方法及系统
CN111368733B (zh) 一种基于标签分布学习的三维手部姿态估计方法、存储介质及终端
Alam et al. A review of recurrent neural network based camera localization for indoor environments
CN113592015B (zh) 定位以及训练特征匹配网络的方法和装置
Drobnitzky et al. Survey and systematization of 3D object detection models and methods
Álvarez-Tuñón et al. Monocular visual simultaneous localization and mapping:(r) evolution from geometry to deep learning-based pipelines
JP7336653B2 (ja) ディープラーニングを利用した屋内位置測位方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20965545

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20965545

Country of ref document: EP

Kind code of ref document: A1