WO2022126529A1 - Positioning method and device, and unmanned aerial vehicle and storage medium - Google Patents

Positioning method and device, and unmanned aerial vehicle and storage medium Download PDF

Info

Publication number
WO2022126529A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
description information
key point
generation layer
information
Prior art date
Application number
PCT/CN2020/137313
Other languages
French (fr)
Chinese (zh)
Inventor
梁湘国
杨健
蔡剑钊
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2020/137313
Priority to CN202080069130.4A (CN114556425A)
Publication of WO2022126529A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/17 Terrestrial scenes taken from planes or by drones
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content

Definitions

  • the present application relates to the field of visual return navigation, and in particular, to a positioning method, device, unmanned aerial vehicle and storage medium.
  • UAVs are unmanned aircraft operated by radio remote control equipment and on-board program control devices. Alternatively, UAVs can be operated fully or intermittently autonomously by on-board computers. Since a UAV often flies beyond visual line of sight, automatic return to home is essential to ensure its safety.
  • During automatic return, the UAV needs to locate its current position quickly and accurately. Doing so on small-sized equipment such as a UAV is, however, challenging.
  • the present application provides a positioning method, device, unmanned aerial vehicle and storage medium, which can be used for faster and more accurate positioning.
  • A first aspect of the present application provides a positioning method. The method is applied to a movable platform that includes a vision sensor, and includes: acquiring first image description information and first key point description information of historical images collected by the vision sensor, and acquiring first position information of the movable platform when the historical images were collected; acquiring a current image collected by the vision sensor, and obtaining second image description information and second key point description information of the current image based on a feature extraction model; determining matching results between a plurality of the historical images and the current image based on the first image description information and first key point description information of the historical images and the second image description information and second key point description information of the current image; and determining, according to the matching results and the first position information of the historical images, second position information of the movable platform when the current image was collected.
  • A second aspect of the present application provides a positioning device, including a memory, a processor and a visual sensor. The memory is used to store a computer program; the visual sensor is used to collect historical images and a current image; and the processor invokes the computer program to implement the following steps: acquiring first image description information and first key point description information of the historical images collected by the visual sensor, and acquiring first position information of the movable platform when the historical images were collected; acquiring the current image collected by the visual sensor, and obtaining second image description information and second key point description information of the current image based on a feature extraction model; determining matching results between a plurality of the historical images and the current image based on the first image description information and first key point description information of the historical images and the second image description information and second key point description information of the current image; and determining, according to the matching results and the first position information of the historical images, second position information of the movable platform when the current image was collected.
  • a third aspect of the present application is to provide an unmanned aerial vehicle, comprising: a body and the positioning device described in the second aspect.
  • A fourth aspect of the present application provides a computer-readable storage medium storing program instructions, where the program instructions are used to implement the method described in the first aspect.
  • The positioning method provided by the present application is applied to a movable platform that includes a visual sensor.
  • The method includes: acquiring first image description information and first key point description information of historical images collected by the visual sensor, and acquiring first position information of the movable platform when the historical images were collected; acquiring the current image collected by the visual sensor, and obtaining second image description information and second key point description information of the current image based on the feature extraction model; determining the matching results of multiple historical images and the current image based on the first image description information and first key point description information as well as the second image description information and second key point description information of the current image; and determining, according to the matching results and the first position information of the historical images, the second position information of the movable platform when the current image was captured.
  • In this way, the second image description information and the second key point description information of the current image can be obtained at the same time, which improves the efficiency of obtaining the two pieces of description information.
  • The two pieces of description information can also be determined more accurately, which further saves positioning time, improves positioning accuracy, and meets the real-time requirements both for acquiring the description information and for positioning.
  • this method can also be applied to movable platforms such as UAVs, which can help the UAVs to return home more smoothly.
  • In addition, fusion training can be performed for the second image description information and the second key point description information, that is, a single feature extraction model is trained, so that the acquisition of the second image description information and the second key point description information achieves better overall performance.
  • the embodiments of the present application also provide a device, an unmanned aerial vehicle, and a storage medium based on the method, all of which can achieve the above effects.
  • FIG. 1 is a schematic flowchart of a positioning method provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a feature extraction model provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a positioning device provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a positioning device provided by an embodiment of the present application.
  • During automatic return, the current position needs to be located quickly and accurately, which is especially demanding for a small-sized movable platform such as a UAV.
  • For visual return navigation, the task can be divided into key frame matching and key point matching. Considering time efficiency, key frame matching can use the BoW (Bag of Words) method; although its time efficiency is high, its effect is not ideal.
  • Key point matching can use the ORB (Oriented FAST and Rotated BRIEF) algorithm, a fast feature point extraction and description algorithm.
  • the embodiments of the present application provide a method of generating the descriptors of key frames and key points in the same network model, and reconstructing the network structure for positioning.
  • FIG. 1 is a schematic flowchart of a positioning method provided by an embodiment of the present invention
  • the method 100 provided by an embodiment of the present application can be applied to a movable platform, such as an unmanned aerial vehicle and an intelligent mobile robot, and the movable platform includes a visual sensor.
  • the method 100 includes the following steps:
  • In this way, the second image description information and the second key point description information of the current image can be obtained at the same time, which improves the efficiency of obtaining the two pieces of description information.
  • The two pieces of description information can also be determined more accurately, which further saves positioning time, improves positioning accuracy, and meets the real-time requirements both for acquiring the description information and for positioning.
  • this method can also be applied to movable platforms such as UAVs, which can help the UAVs to return home smoothly and ensure the safety of UAVs.
  • The method 100 can be applied to a movable platform; besides drones, it can also involve other movable platforms or movable devices, such as sweeping robots, enabling these platforms or devices to automatically return to home or to their original location.
  • The visual sensor refers to an instrument that uses optical components and an imaging device to obtain image information of the external environment. It can be arranged on the movable platform and used to obtain information about the external environment of the movable platform, such as an image of the environment around the UAV's current geographic location.
  • Historical images refer to the historical images obtained by the movable platform during the moving process, such as the external environment images obtained by the UAV during the normal navigation phase.
  • The external environment images obtained during the normal navigation phase can be used as historical images and referenced to determine the current geographic location of the UAV during the return flight.
  • the first image description information refers to information representing the characteristics of the image, such as image descriptors.
  • the image can be a key frame image in the moving process, which can be called a key frame descriptor.
  • the first key point description information refers to information representing the features of key points in the image, such as key point descriptors in the image.
  • the key point may be a corner or edge in the image.
  • the first location information refers to the geographic location where the movable platform is located when the movable platform obtains the corresponding historical image.
  • the current geographic location can be determined by a positioning device of the movable platform, such as GPS (Global Positioning System, global positioning system).
  • In addition, the pose of the movable platform, which may also be called orientation information, can be obtained when the historical image is collected.
  • The above-mentioned first image description information and first key point description information can be obtained by the feature extraction model described below, or by other acquisition methods, such as the SIFT (scale-invariant feature transform) algorithm or the SuperPoint algorithm. It should be noted that these other methods are relatively complex; although a small-sized movable platform can still run such complex algorithms, their real-time performance is not ideal. For the historical images this is acceptable, because the historical images do not have to be acquired and processed in real time and may be acquired at time intervals.
  • For example, when the UAV is navigating normally in the air, it can obtain images of the external environment through the camera of the vision sensor mounted on the UAV; these are the historical images.
  • the vision sensor transmits the acquired historical images to the drone for image processing.
  • the vision sensor can acquire historical images in real time, or it can acquire historical images with time intervals.
  • The vision sensor or the UAV can also determine, according to key frame determination rules, whether an acquired historical image is a key frame image, and the vision sensor then decides whether to send that historical image to the UAV.
  • the drone processes historical images after determining keyframes.
  • the UAV can obtain image descriptors of historical images and key point descriptors in the image, such as corner descriptors, through the following feature extraction model or SIFT algorithm.
  • The current image refers to an image obtained by the movable platform at its current geographic location during the return flight or return-to-origin process.
  • the second image description information and the second key point description information are of the same nature as the first image description information and the first key point description information in the foregoing step 101, and will not be repeated here.
  • Since the second image description information and the second key point description information are obtained from the same model (i.e., the feature extraction model), information acquisition is more efficient and the real-time requirement for obtaining descriptors during use is met; and because the model is trained as a whole rather than as separately trained parts, the overall effect is better.
  • The feature extraction model includes a feature extraction layer, an image description information generation layer and a key point information generation layer. The feature extraction layer is used to extract shared feature information of the current image based on a convolutional network; the image description information generation layer is used to generate the second image description information based on the shared feature information; and the key point information generation layer is used to generate the second key point description information based on the shared feature information.
  • FIG. 2 shows a schematic diagram of the structure of the feature extraction model.
  • the feature extraction model includes a feature extraction layer 201 , an image description information generation layer 203 and a key point information generation layer 202 .
  • the feature extraction layer is used to extract the common feature information of the current image based on the convolutional network, including: the feature extraction layer is used to extract the common feature information based on multiple convolutional layers in the convolutional network.
  • the convolution layer may perform convolution on the input historical image or current image, that is, the real image 204 , to obtain convolved image feature information, that is, shared feature information.
  • The key point information generation layer is used to generate the second key point description information based on the shared feature information. Specifically, the key point information generation layer extracts the second key point description information from the shared feature information based on one convolution layer and bilinear upsampling, where the number of convolution kernels of this convolution layer is the same as the number of dimensions of the second key point description information.
  • As shown in FIG. 2, the key point information generation layer 202 may include one convolution layer whose number of convolution kernels equals the number of dimensions of the second key point description information. After bilinear upsampling, the key point descriptor 2021 in the current image can be obtained.
  • The image description information generation layer is configured to generate the second image description information based on the shared feature information. Specifically, the image information generation layer extracts the second image description information from the shared feature information through two convolution layers and a NetVLAD layer. As shown in FIG. 2, the image description information generation layer 203 may include two convolution layers and a NetVLAD (Net Vector of Locally Aggregated Descriptors) layer, thereby generating the current image descriptor 2031. A minimal sketch of this two-branch structure is given below.
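  • The following sketch illustrates the shared-backbone, two-branch structure. The channel sizes, descriptor dimension, number of NetVLAD clusters and the single-channel (grayscale) input are assumptions for illustration; the patent does not fix these values, and this is not the exact patented implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureExtractionModel(nn.Module):
    """Shared feature extraction layer with a key point branch and an image
    descriptor branch, roughly following the structure of FIG. 2."""

    def __init__(self, desc_dim=128, vlad_clusters=16):
        super().__init__()
        # Feature extraction layer: stride-2 convolutions for fast downsampling.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Key point branch: one conv layer whose kernel count equals the key
        # point descriptor dimension, followed by bilinear upsampling.
        self.kp_conv = nn.Conv2d(128, desc_dim, 3, padding=1)
        # Image descriptor branch: two conv layers followed by NetVLAD-style pooling.
        self.img_convs = nn.Sequential(
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
        )
        self.vlad_centers = nn.Parameter(torch.randn(vlad_clusters, 128))
        self.vlad_assign = nn.Conv2d(128, vlad_clusters, 1)

    def forward(self, image):
        shared = self.backbone(image)              # shared feature information
        # Key point branch: coarse descriptor map; in deployment only the key
        # point positions are bilinearly sampled (see the later sketch).
        kp_map = self.kp_conv(shared)
        kp_desc = F.interpolate(kp_map, size=image.shape[-2:],
                                mode='bilinear', align_corners=False)
        # Image branch: NetVLAD-style aggregation into one global descriptor.
        x = self.img_convs(shared)
        soft = torch.softmax(self.vlad_assign(x), dim=1)          # (B, K, H, W)
        x, soft = x.flatten(2), soft.flatten(2)                   # (B, C, N), (B, K, N)
        residual = x.unsqueeze(1) - self.vlad_centers[None, :, :, None]
        vlad = (soft.unsqueeze(2) * residual).sum(-1)             # (B, K, C)
        img_desc = F.normalize(vlad.flatten(1), dim=1)
        return kp_desc, img_desc
```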
  • Regarding key point descriptor generation: although it is mentioned above that the SIFT algorithm can be used to obtain descriptors, and its effect is good, SIFT has high complexity and cannot run in real time on embedded devices, so its real-time performance during return flight is poor. In addition, SIFT has not been specially improved for large-scale and large-angle changes and is not well suited to visual return. Other traditional methods generally either have a poor effect or high time complexity and cannot meet the descriptor requirements of visual return.
  • SuperPoint is a better model at present.
  • This model can obtain the key point positions in the image and the key point descriptors at the same time, but the model is relatively large and difficult to run in real time on embedded devices; and because its training data is generated through homography transformations, it cannot simulate well the actual usage scenarios in visual return navigation.
  • In contrast, the single feature extraction model described above can meet the requirement of running in real time on the movable platform, that is, on an embedded device.
  • the feature extraction layer can extract the features of the current image or historical images by fast downsampling with a convolutional layer with a stride of 2, which can reduce computing power consumption.
  • This model structure reduces the computational complexity of the network as much as possible while ensuring the effect of descriptor extraction, and combines image-level and point-level features, that is, the shared feature information, to generate key point descriptors and image descriptors in the same network model. This not only makes full use of what images and key points have in common, but also saves a lot of repeated feature computation.
  • For example, as mentioned above, when the drone's signal is interrupted by environmental factors such as weather, or there is a problem with the GPS positioning device, the return flight can be triggered automatically. During the return flight, the UAV can obtain the current image in real time through the vision sensor and process it.
  • the current image can be input into the feature extraction model.
  • the feature extraction layer in the model is first passed, and the common feature information of the current image is obtained through the convolution layer in the feature extraction layer. Then this shared feature information is sent to the image description information generation layer and the key point information generation layer respectively.
  • the image description information generation layer obtains the current image descriptor through two convolution layers and NetVLAD layers.
  • the key point information generation layer receives the shared feature information, it obtains the key point descriptor in the current image through a convolution layer and bilinear upsampling.
  • The image information generation layer is used to extract the second image description information from the shared feature information through the two convolution layers and the NetVLAD layer. Specifically, the image information generation layer first extracts second image description information of the floating point data type from the shared feature information through the two convolution layers and the NetVLAD layer, and then converts the second image description information of the floating point data type into second image description information of the Boolean data type.
  • As shown in FIG. 2, the image description information generation layer 203 obtains the current image descriptor 2031 through the two convolution layers and the NetVLAD layer; the current image descriptor 2031 obtained at this point is of the floating point data type and is then converted into the current image descriptor 2032 of the Boolean data type.
  • the above problem also exists for the key point information generation layer, so the data type of the key point descriptors in this layer can also be converted from the floating point data type to the Boolean data type.
  • The key point information generation layer is used to extract the second key point description information from the shared feature information through one convolution layer and bilinear upsampling. Specifically, the key point information generation layer first extracts second key point description information of the floating point data type from the shared feature information through one convolution layer and bilinear upsampling, and then converts it into second key point description information of the Boolean data type.
  • the keypoint information generation layer 202 after receiving the shared feature information, the keypoint information generation layer 202 obtains the keypoint descriptor 2021 in the current image through a convolutional layer and bilinear upsampling.
  • The key point descriptor 2021 obtained at this point is of the floating point data type, and it is then converted into the key point descriptor 2022 of the Boolean data type; a minimal binarization sketch is given below.
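  • The following is a minimal sketch of the floating point to Boolean conversion. Thresholding each dimension at zero and packing the bits are assumptions for illustration; the patent does not specify the binarization rule.

```python
import numpy as np

def to_boolean_descriptor(float_desc):
    """Binarize a floating point descriptor into a Boolean descriptor by
    thresholding each dimension at zero (assumed rule)."""
    return np.asarray(float_desc) > 0

def pack_descriptor(bool_desc):
    """Pack the Boolean descriptor into uint8 bytes so it occupies little
    storage and can later be compared with XOR-based Hamming distance."""
    return np.packbits(bool_desc.astype(np.uint8))
```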
  • Unlike the SuperPoint algorithm, which first obtains all the key point descriptors and then looks up the descriptors according to the key point positions, here the key point descriptors are obtained directly by bilinear upsampling, and bilinear upsampling is performed only at the key point positions, which greatly reduces the amount of computation during use.
  • That is, the key point information generation layer extracts the second key point description information from the shared feature information based on one convolution layer and bilinear upsampling as follows: the positions of the key points in the current image are determined; the key point information generation layer obtains downsampled information of the shared feature information through the convolution layer; and the key point information generation layer directly upsamples, through bilinear upsampling, only the information at the corresponding positions in the downsampled information to obtain the second key point description information.
  • The position of a key point refers to where the key point lies in the image. Key points are obtained within grid cells of a certain size in the current image, for example 16x16 pixels, from which the position of each key point in the image can be determined.
  • In this way, the descriptor is obtained directly by bilinear upsampling, which avoids the cost of the deconvolution upsampling used in other learning-based methods; and in actual training and use, upsampling only at the key point positions greatly reduces time consumption, as sketched below.
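  • The following sketch illustrates sampling the downsampled descriptor map only at the key point positions with bilinear interpolation. The downsampling stride and array shapes are assumptions for illustration.

```python
import numpy as np

def sample_descriptors(desc_map, keypoints, stride=8):
    """Bilinearly interpolate the coarse descriptor map only at key point
    positions instead of upsampling the whole map.
    desc_map: (C, h, w) map produced by the key point branch convolution;
    keypoints: (N, 2) array of (x, y) positions in the full-resolution image;
    stride: assumed total downsampling factor of the feature extraction layer."""
    C, h, w = desc_map.shape
    descs = np.empty((len(keypoints), C), dtype=desc_map.dtype)
    for i, (x, y) in enumerate(keypoints):
        fx, fy = x / stride, y / stride
        x0 = min(int(np.floor(fx)), w - 1)
        y0 = min(int(np.floor(fy)), h - 1)
        x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
        ax, ay = fx - x0, fy - y0
        descs[i] = ((1 - ax) * (1 - ay) * desc_map[:, y0, x0] +
                    ax * (1 - ay) * desc_map[:, y0, x1] +
                    (1 - ax) * ay * desc_map[:, y1, x0] +
                    ax * ay * desc_map[:, y1, x1])
    return descs
```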
  • The above feature extraction model is obtained through model training. Because the model is a multi-task branch network, it can be trained step by step. First, the key point training set is used to train the model while the parameters of the image description information generation layer, which produces the image descriptor (whether the current image descriptor or the historical image descriptor), are initially fixed. When the loss no longer decreases significantly, the parameters of the key point information generation layer and of the feature extraction layer can be fixed. Then the image matching training set is used to train the image description information generation layer and determine its final parameters.
  • the image description information generation layer can also be trained first, so that the parameters of the feature extraction layer can also be determined, and then the key point information generation layer is trained.
  • the model trained in this way is slightly less accurate than the model trained by the above training method.
  • To reduce the training time of the entire model, the model can be trained on another platform, such as a server or a computer, and transplanted to the movable platform after training is completed.
  • the model can also be trained on the movable platform.
  • Specifically, the initial feature extraction layer is trained using first training data, and the trained feature extraction layer is used as the feature extraction layer in the feature extraction model. The first training data includes image point pairs corresponding to the same spatial points, each image point pair appearing in different real images of the same visual scene.
  • Based on the trained feature extraction layer, the initial key point information generation layer is trained using the first training data, and the trained key point information generation layer is used as the key point information generation layer in the feature extraction model.
  • the first training data is the training data in the above-mentioned key point training set.
  • the structure of the initial keypoint information generation layer is the same as that of the trained keypoint information generation layer. Only the parameters are different. For the initial keypoint information generation layer, the parameters are the initial parameters.
  • the training process is the training process of the network model, which will not be repeated here. It is only explained: the image point pair may be the image point pair corresponding to the spatial point in the same three-dimensional space.
  • the image point pair is derived from two images that are represented as different real images of the same visual scene, like two real images of the same location but at different angles, or at different image acquisition locations.
  • the acquisition method of the first training data including the above-mentioned image point pairs is as follows:
  • For different visual scenes, real images are obtained from different angles in each visual scene; for each visual scene, a three-dimensional spatial model is built from the real images corresponding to the different angles; spatial points are selected from the spatial three-dimensional model based on the similarity between spatial points, and the real image point pairs corresponding to each selected spatial point in the real images are obtained; the real image point pairs are then selected according to the similarity between the collection positions of the real image point pairs, and the selected real image point pairs are used as key point pairs to obtain the first training data.
  • the acquisition of real images from different angles may be:
  • the UAV may acquire real images according to the size of the flight height and the attitude angle.
  • the UAV is used to collect the real image of the downward view, that is, the image data.
  • the data that is too similar is eliminated according to the similarity of the collected images.
  • real data can be provided during model training and testing.
  • the real data can include a large number of collected real images and matching feature points in the real images, which can also be called key points.
  • The process of constructing the three-dimensional spatial model may be as follows: for at least two real images (two, three, four, five, etc.) of the same visual scene obtained above, an SFM (Structure from Motion) modeling method is used to build the spatial three-dimensional model. After the model is established, each real 3D point in the spatial three-dimensional model corresponds to 2D points in at least two real images, thereby forming a 2D point pair. In order to improve the generalization ability of the model, robust feature descriptions can be extracted when processing different types of key points.
  • the embodiment of the present application can use a variety of different types of key points to construct a three-dimensional model through SFM.
  • The key point types may include, but are not limited to, SIFT-type key points (key points or corner points obtained by the SIFT algorithm), FAST (Features from Accelerated Segment Test) type key points (key points or corner points obtained by the FAST algorithm), ORB-type key points (key points or corner points obtained by the ORB algorithm), and Harris-type key points (key points or corner points obtained by the Harris algorithm).
  • The 3D points obtained through the above process will contain many points that lie close to one another; in particular, when a certain area of the image is especially rich in texture, a large number of 3D points corresponding to that area will appear, which affects the balanced distribution of the training data, so the points need to be filtered.
  • the screening process is as follows:
  • A 3D point set S can be defined first to hold the filtered 3D points. The 3D points generated above are traversed and added to S in such a way that the similarity of any two 3D points retained in S is within the threshold.
  • The similarity can be measured by the Euclidean distance.
  • A set P may also be defined as the set of candidate 3D points; before screening, all the generated 3D points are placed in P as candidates. The candidates in P are traversed, pairs of points whose similarity satisfies the threshold are placed into S, and for each remaining candidate in P the similarity to every 3D point already in S, that is, the Euclidean distance d, is computed.
  • A corresponding 3D point in P is added to S only if the similarity of any two 3D points in S still satisfies the threshold, so that the points in S are not overly similar and the data stays balanced. If S is empty after screening, a candidate 3D point from P may be added to S, where the candidate may be any of the 3D points generated above. A sketch of this greedy screening is given below.
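  • The following is a minimal sketch of the screening, under one plausible reading of the rule above: a candidate 3D point is kept only if it is not too close (in Euclidean distance) to any point already retained in S, so that densely textured regions do not dominate. The distance threshold is an assumption.

```python
import numpy as np

def screen_3d_points(candidates, min_dist=1.0):
    """Greedy screening: keep a set S of 3D points such that every pair of
    retained points is at least min_dist apart (Euclidean distance).
    candidates: (N, 3) array of 3D points reconstructed by SFM."""
    kept = []
    for p in candidates:
        if all(np.linalg.norm(p - q) > min_dist for q in kept):
            kept.append(p)
    return np.asarray(kept)
```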
  • the selection of spatial points is completed, and the corresponding 2D point pairs need to be screened, that is, the corresponding real image point pairs are selected.
  • the spatial points in the three-dimensional spatial model have a corresponding relationship with the real image points in the real image used to construct the three-dimensional spatial model.
  • the spatial points that have been screened also have corresponding real image point pairs, that is, 2D point pairs.
  • Each screened 3D point corresponds to 2D points in multiple views (the real images from different perspectives used to construct the spatial three-dimensional model). In order to increase the difficulty of the dataset and improve the accuracy and generality of the model, only the hardest pair of matching 2D points is kept for each 3D point.
  • Specifically, given the screened 3D point set S, for any 3D point m in S, its corresponding 2D points under different viewing angles form a set T, and the poses of the image acquisition device (set on the movable platform), such as a camera, under each viewing angle in T form a set Q, where each pose corresponds to one image acquisition device view.
  • For the set Q, the similarity between the positions of the corresponding image acquisition devices, such as the Euclidean distance, is computed; the two camera positions with the largest Euclidean distance are found, the corresponding 2D points in T are kept, and the remaining 2D points are discarded.
  • the set S is traversed, and the unique 2D point pair corresponding to each 3D point in the set S is determined, and all the filtered 2D point pairs constitute the set T.
  • The positions of the two cameras (i.e., the image acquisition devices) with the largest Euclidean distance are the two least similar positions, so the 2D point pairs obtained in this way are the most difficult; a sketch of this selection follows.
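  • The following sketch keeps, for one 3D point, only the 2D observation pair whose camera positions are farthest apart. The data layout (a list of (2D point, camera position) observations) is an assumption for illustration.

```python
import numpy as np

def hardest_pair(observations):
    """Return the pair of 2D points taken from the two views whose camera
    positions have the largest Euclidean distance (the 'hardest' pair).
    observations: list of (point2d, camera_position) tuples for one 3D point."""
    best, best_d = None, -1.0
    for i in range(len(observations)):
        for j in range(i + 1, len(observations)):
            d = np.linalg.norm(np.asarray(observations[i][1]) -
                               np.asarray(observations[j][1]))
            if d > best_d:
                best_d = d
                best = (observations[i][0], observations[j][0])
    return best
```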
  • the first training data can be obtained.
  • the first training data can be divided according to different difficulties, and divided into three categories: simple, general, and difficult.
  • For the sets S and T obtained above, since each 3D point in S corresponds to a 2D point pair in T, each group of a corresponding 3D point m, 2D point x and 2D point y can be defined as a sample n, and the difficulty score L of each sample n is calculated according to formula (1).
  • In formula (1), La represents the angle ∠xpy formed at the 3D point by the 2D point pair of sample n; Ld represents the spatial distance between the positions of the image acquisition devices (such as cameras) corresponding to 2D point x and 2D point y; and Lq represents the quaternion angle between the poses of the corresponding image acquisition devices.
  • Weight parameters ω1, ω2 and ω3 are introduced, and according to the final difficulty score L the first training data is divided into easy, normal and difficult; a sketch of this scoring and binning is given below.
  • Based on this division, the difficulty level of the first training data is known, so the subsequent model training can be controlled more precisely, in particular whether the model can cover many application scenarios and whether descriptors can be obtained accurately in different application scenarios.
  • In addition, the first training data can be adjusted according to the degree of difficulty, so that the difficulty of the samples meets the requirements of model training.
  • As can be seen from the foregoing, in order to further reduce the storage space used by the descriptors and the time needed to measure the distance between descriptors, a loss function for the Boolean descriptor can be added; under the combined action of multiple loss functions, an image descriptor of the Boolean data type and a key point descriptor of the Boolean data type are finally output.
  • Their dimension is much smaller than that of traditional feature descriptors, and their effect is also better.
  • binary descriptors of Boolean data type are directly output from the feature extraction model, which is more convenient for subsequent retrieval and matching of descriptors.
  • That is, training the initial key point information generation layer with the first training data to generate the trained key point information generation layer includes: adding a loss function of the Boolean data type to the loss function of the floating point data type in the initial key point information generation layer; and training the initial key point information generation layer through the first training data, the loss function of the floating point data type and the loss function of the Boolean data type, to generate the trained key point information generation layer.
  • In other words, the loss function of this layer is changed from a single loss function of the floating point data type to that loss function plus a loss function of the Boolean data type, forming multiple loss functions.
  • A loss function of the floating point data type alone could also train the model, but the descriptors obtained by such a model would be of the floating point data type. Therefore, the loss function of the Boolean data type is added to the loss function of the floating point data type as the loss function of this layer, and the layer is trained with the first training data to obtain the trained layer. A sketch of such a combined loss is given below.
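  • The sketch below shows one possible combined loss. The triplet form of the floating point loss, the margin and the weighting of the Boolean (binarization) term are assumptions; the patent only states that a Boolean-type loss is added to the floating point loss.

```python
import torch
import torch.nn.functional as F

def descriptor_loss(anchor, positive, negative, margin=1.0, w_bool=0.1):
    """Floating point descriptor loss (here a triplet margin loss) plus an
    added binarization loss that pushes each dimension toward +1/-1, so the
    trained descriptors can be thresholded into Boolean descriptors."""
    float_loss = F.triplet_margin_loss(anchor, positive, negative, margin=margin)
    bool_loss = sum(((d.abs() - 1.0) ** 2).mean()
                    for d in (anchor, positive, negative))
    return float_loss + w_bool * bool_loss
```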
  • the trained feature extraction layer can also be obtained.
  • the image description information generation layer can be trained.
  • Specifically, the method 100 may further include: based on the trained feature extraction layer, training the initial image description information generation layer using second training data, and using the trained image description information generation layer as the image description information generation layer in the feature extraction model, where the second training data includes key-frame image matching pairs and information indicating whether each key-frame image matching pair belongs to the same visual scene.
  • the second training data may be acquired in the following manner: acquiring real images, determining real image matching pairs from the real images based on the classification model, and using them as key frame image matching pairs, and determining whether each real image matching pair belongs to the same vision scene, so as to obtain the second training data.
  • The classification model may be a model for matching real images; it can determine real image matching pairs that belong to the same visual scene and pairs that do not, such as two real images of the same location.
  • the model can be BoW.
  • multiple real images in multiple different visual scenes can be obtained by the drone in the actual flight scene of the visual return flight.
  • use the BoW model to input real images into the model to obtain matching pairs of real images in the same visual scene and matching pairs of real images in different scenes determined by the model.
  • the model can determine matching pairs by scoring.
  • the real image matching pairs with scores higher than the threshold are regarded as real image matching pairs in the same visual scene, that is, positive sample training data.
  • the real image matching pairs whose scores are lower than the threshold are regarded as real image matching pairs that are not in the same visual scene, that is, the negative sample training data.
  • In this way, the second training data can be obtained; a sketch of this score-thresholding step is given below.
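  • The following is a minimal sketch of the score-thresholding step; the bow_score function and the threshold values are assumptions, and the resulting pairs would still be confirmed or corrected manually as described next.

```python
def build_keyframe_pairs(images, bow_score, pos_thr=0.8, neg_thr=0.3):
    """Propose key-frame image matching pairs from BoW similarity scores:
    pairs above pos_thr are candidate same-scene (positive) pairs, pairs
    below neg_thr are candidate different-scene (negative) pairs."""
    positives, negatives = [], []
    for i in range(len(images)):
        for j in range(i + 1, len(images)):
            score = bow_score(images[i], images[j])
            if score >= pos_thr:
                positives.append((i, j))
            elif score <= neg_thr:
                negatives.append((i, j))
    return positives, negatives
```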
  • In addition, random candidate matching pairs can be added, that is, candidate matching pairs are randomly selected from the collected real images. After the candidate matching pairs are generated, whether they contain errors or problems is further determined manually. When problems or errors are found, especially ones caused by the classification model, valuable negative sample training data can be obtained to improve the model's ability.
  • the method 100 further includes: by displaying the real image matching pairs, in response to a user's determination operation, determining whether the real image matching pairs belong to the same visual scene, thereby acquiring the second training data.
  • Specifically, the images corresponding to the matching pairs can be displayed through a display device, such as a display screen, with the two images of each matching pair displayed together.
  • The corresponding feature points between the two real images can also be displayed and connected by lines.
  • annotations are made by workers (i.e. users).
  • the annotation can include the following situations: same, not same, and indeterminate. The same can be represented by "0", the difference can be represented by "1", and the uncertainty can be represented by "2".
  • Matching pairs that are manually marked as uncertain can be eliminated and not used as the second training data. Others are used as the second training data, that is, the matching pairs marked with "0" and the matching pairs marked with "1" are used as the second training data.
  • the method 100 further includes: randomly selecting real image matching pairs from real images as key frame image matching pairs; by displaying the randomly selected real image matching pairs, in response to a user's determination operation, determining the randomly selected real images Whether the matching pair belongs to the same visual scene, so as to obtain the second training data.
  • Selecting negative sample training data is also a difficult problem.
  • By manually labeling on top of the BoW model, more valuable negative sample training data can be found (that is, pairs whose scenes are similar and are mistakenly identified by BoW as matching pairs belonging to the same visual scene), which helps to train a more robust model network.
  • Since the images of the matching pairs are obtained by the UAV in actual visual-return flight scenes, they fully reflect the changes of perspective and scale in the visual return task.
  • the initial image description information generation layer can be trained, and the specific training process will not be repeated. Finally, the trained image description information generation layer can be obtained.
  • the loss function of the Boolean descriptor can be added, and under the combined action of multiple loss functions, the image descriptor of the Boolean data type and the key point descriptor of the Boolean data type can finally be output.
  • Its dimension is much smaller than that of traditional feature descriptors, and its effect is also better.
  • the binary descriptor of Boolean data type is directly output from the feature extraction model, which is more convenient for the retrieval and matching of subsequent descriptors.
  • the binary descriptor of the Boolean data type of the second image description information is output from the image description information generation layer.
  • Specifically, training the initial image description information generation layer with the second training data to generate the trained image description information generation layer includes: adding a loss function of the Boolean data type to the loss function of the floating point data type in the initial image description information generation layer; and training the initial image description information generation layer through the second training data, the loss function of the floating point data type and the loss function of the Boolean data type, to generate the trained image description information generation layer.
  • That is, the initial image description information generation layer is trained on the basis of the trained feature extraction layer, using the loss function of the floating point data type together with the loss function of the Boolean data type and the second training data. In this way, the whole feature extraction model can be trained.
  • Since the network of this feature extraction model is a multi-task branch network, a step-by-step training method can be adopted during training.
  • the first training data can be used to train the model, that is, the initial key point information generation layer is trained.
  • the parameters of the initial key point information generation layer and the initial feature extraction layer are fixed to obtain the key point information generation layer and the feature extraction layer.
  • use the second training data to train the initial image description information generation layer to obtain the image description information generation layer.
  • The reason for using the first training data first is that the first training data is obtained from the spatial three-dimensional model and is therefore completely correct data. After this step, the shared layer of the model, that is, the feature extraction layer, is already a good feature extraction layer.
  • The second training data can then be used for training, which avoids the influence of worker annotation errors on the shared network and yields a better image description information generation layer.
  • Finally, the entire network can be fine-tuned using training data that contains both key points and key frame images; a sketch of this staged training schedule is given below.
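  • The sketch below shows one way to organize the staged schedule as three optimizer stages, reusing the attribute names of the FeatureExtractionModel sketch given earlier; the optimizer type and learning rates are assumptions.

```python
import torch

def stage1_optimizer(model, lr=1e-3):
    """Stage 1: train the feature extraction layer and the key point branch
    on the first training data (key point pairs)."""
    params = list(model.backbone.parameters()) + list(model.kp_conv.parameters())
    return torch.optim.Adam(params, lr=lr)

def stage2_optimizer(model, lr=1e-3):
    """Stage 2: freeze the shared layers and the key point branch, then train
    only the image description branch on the second training data."""
    for p in list(model.backbone.parameters()) + list(model.kp_conv.parameters()):
        p.requires_grad = False
    params = (list(model.img_convs.parameters()) + [model.vlad_centers] +
              list(model.vlad_assign.parameters()))
    return torch.optim.Adam(params, lr=lr)

def stage3_optimizer(model, lr=1e-4):
    """Stage 3: unfreeze everything and fine-tune the whole network on the
    third training data (key frames plus key points)."""
    for p in model.parameters():
        p.requires_grad = True
    return torch.optim.Adam(model.parameters(), lr=lr)
```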
  • Specifically, the method 100 further includes: adjusting the feature extraction layer, the image description information generation layer and/or the key point information generation layer in the feature extraction model through third training data, where the third training data includes key-frame image matching pairs and key point matching pairs within those key-frame image matching pairs.
  • the third training data can be determined in the following way: when the number of real image point pairs in the two real images is greater than the threshold, the two real images and the corresponding real image point pairs are used as the key frame image matching pair and key point matching pairs to obtain the third training data.
  • As mentioned above, a three-dimensional spatial model can be constructed; after the model is established, each real 3D point in the spatial three-dimensional model corresponds to 2D points in at least two real images, thereby forming a 2D point pair, that is, a 2D point pair belonging to one 3D point of the spatial three-dimensional model.
  • The two real images and the 2D point pairs in them can be used as the third training data; the third training data can contain multiple pairs of real images, each pair having corresponding 2D point pairs.
  • In other words, the model trained with the first training data and the second training data is fine-tuned using the third training data: the parameters of the feature extraction layer, the image description information generation layer and/or the key point information generation layer in the trained feature extraction model are fine-tuned. This will not be repeated here.
  • the fine-tuned model can be used. If the model is trained on a mobile platform, it can be used directly. If the model is trained on a terminal, such as a server or a computer, the trained final model can be transplanted to the mobile platform.
  • the corresponding information can be combined according to the order of the key points in the image, so as to perform subsequent matching.
  • the method 100 further includes: combining the corresponding multiple second key point description information into a vector according to the sequence of multiple key points in the current image.
  • That is, the corresponding descriptors can be combined into a vector according to the order of the key points in the current image for subsequent matching; similarly, the corresponding descriptors of a historical image are combined into a vector according to the order of the key points in that historical image.
  • During matching, the image description information is used to find, among the plurality of historical images, a first type of historical image whose scene is similar to that of the current image; the key point description information is then used to search, within the first type of historical image, for key points matching the key points of the current image, and the matching result includes the matching relationship between key points of the current image and key points in the historical image.
  • the image description information is used to roughly pair the images, and based on this, one or more historical images (the first type of historical images) that more closely match the current image scene are obtained. What is related to positioning is the matching relationship of key points.
  • Then the key points in the current image and in the first type of historical image can be further matched to obtain the key point matching relationship, that is, a match between a key point in the current image and a key point in the historical image.
  • The position information of the key points in the historical image can be considered accurate; based on it and on the matching relationship between key points of the current image and key points of the historical image, positioning can be performed.
  • Specifically, the UAV obtains the image descriptor of the above-mentioned historical image and its key point descriptors (or the vector composed of the key point descriptors), as well as the image descriptor and key point descriptors (or descriptor vector) of the current image.
  • The image descriptor of the current image and its key point descriptors (or descriptor vector) can then be compared with the image descriptors of multiple historical images and their key point descriptors (or descriptor vectors).
  • the comparison result that is, the matching result, can be determined through a similarity algorithm.
  • the comparison result may be that the current image may be exactly the same as one of the historical images, or partially the same, that is, similar.
  • the similarity can be obtained according to the similarity algorithm to determine whether the similarity is greater than the similarity threshold. When it is greater than the similarity threshold, it can be determined that the matching result is a matching. Otherwise, it is a mismatch.
  • the above similarity algorithm may include Hamming distance, Euclidean distance, and the like.
  • the above-mentioned image descriptors and key point descriptors can be Boolean descriptors.
  • When the Boolean descriptor is used, measuring the distance between descriptors with the similarity algorithm requires only an XOR operation (for example to obtain the Hamming distance), which greatly speeds up the distance computation and further reduces time consumption; a sketch follows.
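  • The following is a sketch of the XOR-based comparison of packed Boolean descriptors; the match threshold is an assumption.

```python
import numpy as np

def hamming_distance(packed_a, packed_b):
    """Hamming distance between two Boolean descriptors packed into uint8
    arrays (e.g. with numpy.packbits): XOR, then count the set bits."""
    return int(np.unpackbits(np.bitwise_xor(packed_a, packed_b)).sum())

def is_match(packed_a, packed_b, max_dist=64):
    """Two descriptors are taken to match when their Hamming distance is
    below an assumed threshold."""
    return hamming_distance(packed_a, packed_b) <= max_dist
```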
  • the UAV determines which historical image the current image is the same as or meets the similarity threshold, so as to determine the geographic location to which the current image belongs based on the geographic location to which the historical image belongs.
  • the determined geographic location may be an absolute geographic location of the current image, that is, a geographic location based on a geographic location coordinate system or a geographic location relative to a historical image.
  • the movable platform After determining the position of the current image, the movable platform can go back according to the position.
  • Specifically, the method 100 may further include: determining the posture of the movable platform according to the position deviation between the corresponding first position information and second position information in the matching result; and, according to the posture, moving the movable platform from the second position to the first position to realize the automatic return of the movable platform.
  • That is, the posture of the drone is determined and adjusted according to the deviation between the two positions, so that the drone can move from the second position to the first position and thereby realize the return flight; a minimal sketch follows.
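  • The sketch below only illustrates the geometric step: the displacement from the current (second) position back to the stored (first) position gives the direction of motion, and a yaw angle can be derived from its horizontal components. The coordinate convention is an assumption.

```python
import numpy as np

def return_step(first_position, second_position):
    """Compute the displacement the movable platform should follow to move
    from the second position back to the first position, plus a yaw heading
    from the horizontal (x, y) components (assumed x-east, y-north frame)."""
    delta = (np.asarray(first_position, dtype=float) -
             np.asarray(second_position, dtype=float))
    yaw = np.arctan2(delta[1], delta[0])
    return delta, yaw
```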
  • FIG. 3 is a schematic structural diagram of a positioning device according to an embodiment of the present invention
  • the device 300 can be applied to a movable platform, such as an unmanned aerial vehicle, an intelligent mobile robot, etc.
  • the movable platform includes a visual sensor.
  • the apparatus 300 can perform the above-mentioned positioning method.
  • the apparatus 300 includes: a first obtaining module 301 , a second obtaining module 302 , a first determining module 303 and a second determining module 304 .
  • the functions of each module are described in detail below:
  • the first obtaining module 301 is configured to obtain first image description information and first key point description information of historical images collected by the visual sensor, and obtain first position information of the movable platform when collecting the historical images.
  • the second obtaining module 302 is configured to obtain the current image collected by the visual sensor, and obtain second image description information and second key point description information of the current image based on the feature extraction model.
  • The first determining module 303 is configured to determine the matching results between the plurality of historical images and the current image based on the first image description information and the first key point description information of the historical images, and the second image description information and the second key point description information of the current image.
  • The second determining module 304 is configured to determine the second position information of the movable platform when the current image is collected, according to the matching results and the first position information of the historical images.
  • The feature extraction model includes a feature extraction layer, an image description information generation layer and a key point information generation layer. The feature extraction layer is used to extract common feature information of the current image based on a convolutional network; the image description information generation layer is used to generate the second image description information based on the common feature information; and the key point information generation layer is used to generate the second key point description information based on the common feature information.
  • the feature extraction layer is used to extract common feature information based on multiple convolutional layers in the convolutional network.
  • The image description information generation layer is used to extract the second image description information from the common feature information through two convolution layers and a NetVLAD layer.
  • Specifically, the image description information generation layer is used to extract second image description information of the floating point data type from the common feature information through the two convolution layers and the NetVLAD layer, and to convert the second image description information of the floating point data type into second image description information of the Boolean data type.
  • The key point information generation layer is used to extract the second key point description information from the common feature information based on one convolution layer and bilinear upsampling, where the number of convolution kernels of the convolution layer is the same as the number of the second key point description information.
  • Specifically, the key point information generation layer is used to extract second key point description information of the floating point data type from the common feature information through the convolution layer and bilinear upsampling, and to convert the second key point description information of the floating point data type into second key point description information of the Boolean data type.
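  • As an illustration of an assumed implementation detail (the original text does not specify how the conversion is done), converting a floating point descriptor into a Boolean descriptor could be as simple as thresholding each element, for example at zero:

```python
import numpy as np

def binarize_descriptor(float_desc: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Convert a floating point descriptor into a Boolean descriptor by elementwise thresholding.

    The zero threshold is an assumption; any fixed or learned threshold could be used instead.
    """
    return (float_desc > threshold).astype(np.uint8)

float_desc = np.array([0.3, -1.2, 0.0, 2.4, -0.1], dtype=np.float32)
bool_desc = binarize_descriptor(float_desc)  # -> array([1, 0, 0, 1, 0], dtype=uint8)
```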
  • The second obtaining module 302 is further used to determine the positions of the key points in the current image; the key point information generation layer is used to obtain down-sampled information of the common feature information through one convolution layer, and to directly up-sample, by bilinear upsampling, the information at the positions corresponding to the key points in the down-sampled information, so as to obtain the second key point description information.
  • the apparatus 300 further includes: a combining module, configured to combine the corresponding multiple second key point description information into a vector according to the sequence of the multiple key points in the current image.
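  • As a small illustration (shapes and values assumed), combining the per-key-point descriptors into a single vector in key point order could simply concatenate them:

```python
import numpy as np

# Assumed: 3 key points in image order, each with a 4-dimensional Boolean descriptor.
keypoint_descriptors = [
    np.array([1, 0, 1, 1], dtype=np.uint8),
    np.array([0, 0, 1, 0], dtype=np.uint8),
    np.array([1, 1, 0, 0], dtype=np.uint8),
]
combined_vector = np.concatenate(keypoint_descriptors)  # length 3 * 4 = 12
```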
  • The device 300 further includes: a third determining module, configured to determine the posture of the movable platform according to the position deviation between the corresponding first position information and the corresponding second position information in the matching result; and a moving module, configured to move the movable platform from the second position to the first position according to the posture, so as to realize the automatic return of the movable platform.
  • The device 300 further includes: a training module, used for training the initial feature extraction layer through first training data and generating the trained feature extraction layer as the feature extraction layer in the feature extraction model, wherein the first training data includes image point pairs corresponding to the same spatial point, the image point pairs being represented in different real images of the same visual scene.
  • The training module is further used to train the initial key point information generation layer through the first training data, and to generate the trained key point information generation layer as the key point information generation layer in the feature extraction model.
  • The second obtaining module 302 is further used for acquiring, for different visual scenes, real images from different angles in each visual scene. The device 300 further includes: a creation module, used for building, for each visual scene, a spatial three-dimensional model according to the real images of the corresponding different angles; and a selection module, used for selecting spatial points from the spatial three-dimensional model based on the similarity between spatial points and obtaining the real image point pair corresponding to each selected spatial point in the real images, and for selecting real image point pairs according to the similarity between the collection positions of the real image point pairs and using the selected real image point pairs as key point pairs to obtain the first training data.
  • The training module includes: an adding unit, used for adding a loss function of the Boolean data type to the loss function of the floating point data type in the initial key point information generation layer; and a training unit, used for training the initial key point information generation layer through the first training data, the loss function of the floating point data type and the loss function of the Boolean data type, and generating the trained key point information generation layer.
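  • A sketch of combining a floating point descriptor loss with an additional Boolean-oriented loss term is given below; the contrastive form of the float loss and the saturation penalty used as the Boolean-type loss are assumptions for illustration, not the loss functions prescribed by the original text.

```python
import torch
import torch.nn.functional as F

def descriptor_losses(desc_a, desc_b, match_label, margin=1.0, boolean_weight=0.1):
    """Combined training loss for a descriptor branch.

    desc_a, desc_b : (N, D) floating point descriptors of candidate point pairs.
    match_label    : (N,) tensor with 1 for matching pairs and 0 for non-matching pairs.
    """
    dist = F.pairwise_distance(desc_a, desc_b)
    # Floating-point-type loss: pull matching pairs together, push non-matching pairs apart.
    float_loss = (match_label * dist.pow(2)
                  + (1 - match_label) * F.relu(margin - dist).pow(2)).mean()
    # Boolean-type loss: encourage descriptor entries to saturate towards -1/+1 so that
    # thresholding them into Boolean descriptors loses little information.
    boolean_loss = (1.0 - torch.tanh(desc_a).abs()).mean() + (1.0 - torch.tanh(desc_b).abs()).mean()
    return float_loss + boolean_weight * boolean_loss
```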
  • The training module is also used for: training the initial image description information generation layer through second training data based on the trained feature extraction layer, and generating the trained image description information generation layer as the image description information generation layer in the feature extraction model, wherein the second training data includes key frame image matching pairs and information indicating whether each key frame image matching pair belongs to the same visual scene.
  • The second obtaining module 302 is further used for: acquiring real images, determining real image matching pairs from the real images based on a classification model as the key frame image matching pairs, and determining whether each real image matching pair belongs to the same visual scene, so as to obtain the second training data.
  • The apparatus 300 further includes: a third determination module, configured to display the real image matching pairs and, in response to a determination operation of the user, determine whether the real image matching pairs belong to the same visual scene, thereby obtaining the second training data.
  • The selection module is further configured to randomly select real image matching pairs from the real images as key frame image matching pairs; the third determination module is configured to display the randomly selected real image matching pairs and, in response to the user's determination operation, determine whether the randomly selected real image matching pairs belong to the same visual scene, so as to obtain the second training data.
  • The adding unit is also used for adding a loss function of the Boolean data type to the loss function of the floating point data type in the initial image description information generation layer; the training unit is also used to train the initial image description information generation layer, based on the trained feature extraction layer, through the second training data, the loss function of the floating point data type and the loss function of the Boolean data type, so as to generate the trained image description information generation layer.
  • The apparatus 300 further includes: an adjustment module, configured to adjust the feature extraction layer, the image description information generation layer and/or the key point information generation layer in the feature extraction model through third training data, wherein the third training data includes key frame image matching pairs and key point matching pairs within the key frame image matching pairs.
  • The selection module is also used for: when the number of real image point pairs between two real images is greater than a threshold, using the two real images and the corresponding real image point pairs as a key frame image matching pair and key point matching pairs, so as to obtain the third training data.
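  • A sketch of this selection rule follows; the threshold value and the data layout are assumptions made only for illustration.

```python
def select_third_training_data(image_pairs, point_pairs_per_image_pair, threshold=50):
    """Keep an image pair as a key frame matching pair only if it shares enough point pairs.

    image_pairs                : list of (image_a, image_b) tuples.
    point_pairs_per_image_pair : list of lists of corresponding point pairs, one list per image pair.
    threshold                  : assumed minimum number of shared real image point pairs.
    """
    third_training_data = []
    for (img_a, img_b), point_pairs in zip(image_pairs, point_pairs_per_image_pair):
        if len(point_pairs) > threshold:
            third_training_data.append({
                "key_frame_pair": (img_a, img_b),
                "key_point_pairs": point_pairs,
            })
    return third_training_data
```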
  • the structure of the positioning apparatus 300 shown in FIG. 3 may be implemented as an electronic device, and the electronic device may be a positioning device, such as a movable platform.
  • the positioning device 400 may include: one or more processors 401 , one or more memories 402 and a visual sensor 403 .
  • the visual sensor 403 is used to collect historical images and current images.
  • the memory 402 is used to store a program that supports the electronic device to execute the positioning method provided in the embodiments shown in FIG. 1 to FIG. 2 .
  • the processor 401 is configured to execute programs stored in the memory 402 .
  • The program includes one or more computer instructions, wherein the one or more computer instructions, when executed by the processor 401, can implement the following steps: acquiring first image description information and first key point description information of the historical images collected by the visual sensor 403, and acquiring first position information of the movable platform when the historical images were collected; acquiring the current image collected by the visual sensor 403, and acquiring second image description information and second key point description information of the current image based on a feature extraction model; determining matching results between a plurality of historical images and the current image based on the first image description information and the first key point description information of the historical images, and the second image description information and the second key point description information of the current image; and determining, according to the matching results and the first position information of the historical images, second position information of the movable platform when the current image was collected.
  • The feature extraction model includes a feature extraction layer, an image description information generation layer and a key point information generation layer. The feature extraction layer is used to extract common feature information of the current image based on a convolutional network; the image description information generation layer is used to generate the second image description information based on the common feature information; and the key point information generation layer is used to generate the second key point description information based on the common feature information.
  • the feature extraction layer is used to extract common feature information based on multiple convolutional layers in the convolutional network.
  • The image description information generation layer is used to extract the second image description information from the common feature information through two convolution layers and a NetVLAD layer.
  • Specifically, the image description information generation layer is used to extract second image description information of the floating point data type from the common feature information through the two convolution layers and the NetVLAD layer, and to convert the second image description information of the floating point data type into second image description information of the Boolean data type.
  • The key point information generation layer is used to extract the second key point description information from the common feature information based on one convolution layer and bilinear upsampling, where the number of convolution kernels of the convolution layer is the same as the number of the second key point description information.
  • Specifically, the key point information generation layer is used to extract second key point description information of the floating point data type from the common feature information through the convolution layer and bilinear upsampling, and to convert the second key point description information of the floating point data type into second key point description information of the Boolean data type.
  • The processor 401 is further configured to determine the positions of the key points in the current image; the key point information generation layer is used to obtain down-sampled information of the common feature information through one convolution layer, and to directly up-sample, by bilinear upsampling, the information at the corresponding positions in the down-sampled information, so as to obtain the second key point description information.
  • the processor 401 is further configured to: combine the corresponding multiple second key point description information into a vector according to the sequence of the multiple key points in the current image.
  • the processor 401 is further configured to: determine the posture of the movable platform according to the position deviation between the corresponding first position information and the corresponding second position information in the matching result; according to the posture, the movable platform moves from the second position to the first position for automatic return of the movable platform.
  • The processor 401 is further configured to: train the initial feature extraction layer by using the first training data, and generate the trained feature extraction layer as the feature extraction layer in the feature extraction model, wherein the first training data includes image point pairs corresponding to the same spatial point, the image point pairs being represented in different real images of the same visual scene; and train the initial key point information generation layer through the first training data, and generate the trained key point information generation layer as the key point information generation layer in the feature extraction model.
  • The processor 401 is further configured to: for different visual scenes, obtain real images from different angles in each visual scene; for each visual scene, build a spatial three-dimensional model according to the real images corresponding to the different angles; based on the similarity between spatial points, select spatial points from the spatial three-dimensional model and obtain the real image point pair corresponding to each selected spatial point in the real images; and, according to the similarity between the collection positions of the real image point pairs, select real image point pairs and use the selected real image point pairs as key point pairs to obtain the first training data.
  • The processor 401 is specifically configured to: add a loss function of the Boolean data type to the loss function of the floating point data type in the initial key point information generation layer; and train the initial key point information generation layer through the first training data, the loss function of the floating point data type and the loss function of the Boolean data type, so as to generate the trained key point information generation layer.
  • The processor 401 is further configured to: based on the trained feature extraction layer, train the initial image description information generation layer by using the second training data, and generate the trained image description information generation layer as the image description information generation layer in the feature extraction model, wherein the second training data includes key frame image matching pairs and information indicating whether each key frame image matching pair belongs to the same visual scene.
  • The processor 401 is further configured to: obtain real images, determine real image matching pairs from the real images based on the classification model as the key frame image matching pairs, and determine whether each real image matching pair belongs to the same visual scene, thus obtaining the second training data.
  • the processor 401 is further configured to: by displaying the real image matching pairs, in response to the user's determination operation, determine whether the real image matching pairs belong to the same visual scene, thereby acquiring the second training data.
  • The processor 401 is further configured to: randomly select real image matching pairs from the real images as key frame image matching pairs; and, by displaying the randomly selected real image matching pairs and in response to a user's determination operation, determine whether the randomly selected real image matching pairs belong to the same visual scene, so as to obtain the second training data.
  • The processor 401 is specifically configured to: add a loss function of the Boolean data type to the loss function of the floating point data type in the initial image description information generation layer; and, based on the trained feature extraction layer, train the initial image description information generation layer through the second training data, the loss function of the floating point data type and the loss function of the Boolean data type, so as to generate the trained image description information generation layer.
  • The processor 401 is further configured to: adjust the feature extraction layer, the image description information generation layer and/or the key point information generation layer in the feature extraction model through the third training data, wherein the third training data includes key frame image matching pairs and key point matching pairs within the key frame image matching pairs.
  • The processor 401 is further configured to: when the number of real image point pairs between two real images is greater than the threshold, use the two real images and the corresponding real image point pairs as a key frame image matching pair and key point matching pairs, so as to obtain the third training data.
  • An embodiment of the present invention provides a computer-readable storage medium, in which program instructions are stored, and the program instructions are used to implement the methods described above with reference to FIG. 1 to FIG. 2.
  • An embodiment of the present invention provides an unmanned aerial vehicle; specifically, the unmanned aerial vehicle includes: a body and a positioning device as shown in FIG. 4 , and the positioning device is provided on the body.
  • It should be understood that the disclosed related detection apparatus (for example, an IMU) and method may be implemented in other manners.
  • the embodiments of the remote control device described above are only illustrative.
  • The division of the modules or units is only a logical functional division; in actual implementation, there may be other division methods, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, and the indirect coupling or communication connection of the remote control device or unit may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • The technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer processor to perform all or part of the steps of the methods described in the various embodiments of the present invention.
  • The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)

Abstract

A positioning method and device, and an unmanned aerial vehicle and a storage medium. The method is applied to a movable platform, wherein the movable platform comprises a visual sensor (403). The method comprises: acquiring first image description information and first key point description information of historical images collected by the visual sensor, and acquiring first position information of the movable platform when the historical images were collected; acquiring the current image collected by the visual sensor, and acquiring second image description information and second key point description information of the current image on the basis of a feature extraction model; on the basis of the description information corresponding to the historical images and the description information corresponding to the current image, determining a matching result between the plurality of historical images and the current image; and, according to the matching result and the first position information, determining second position information of the movable platform when the current image is collected. The efficiency of acquiring the two pieces of description information is improved, and the two pieces of description information can also be relatively accurately determined, such that positioning time is further saved and the positioning precision is improved.

Description

Positioning Method, Device, Unmanned Aerial Vehicle and Storage Medium

Technical Field

The present application relates to the field of visual return navigation, and in particular, to a positioning method, a positioning device, an unmanned aerial vehicle and a storage medium.

Background

UAVs are unmanned aircraft operated by radio remote control equipment and self-contained program control devices; alternatively, UAVs can also be operated fully or intermittently autonomously by on-board computers. Since a UAV is often beyond visual range during flight, automatic return-to-home is quite necessary to ensure the safety of the UAV.
During automatic return-to-home, the UAV needs to locate its current position relatively quickly and accurately, and achieving such fast and accurate positioning on a small-sized device such as a UAV is very important.

Summary of the Invention

The present application provides a positioning method, a device, an unmanned aerial vehicle and a storage medium, which can be used for relatively fast and relatively accurate positioning.
A first aspect of the present application provides a positioning method. The method is applied to a movable platform that includes a vision sensor, and includes: acquiring first image description information and first key point description information of historical images collected by the vision sensor, and acquiring first position information of the movable platform when the historical images were collected; acquiring a current image collected by the vision sensor, and acquiring second image description information and second key point description information of the current image based on a feature extraction model; determining matching results between a plurality of the historical images and the current image based on the first image description information and the first key point description information of the historical images, and the second image description information and the second key point description information of the current image; and determining, according to the matching results and the first position information of the historical images, second position information of the movable platform when the current image was collected.
A second aspect of the present application provides a positioning device, including: a memory, a processor and a vision sensor. The memory is used to store a computer program; the vision sensor is used to collect historical images and a current image; and the processor invokes the computer program to implement the following steps: acquiring first image description information and first key point description information of the historical images collected by the vision sensor, and acquiring first position information of the movable platform when the historical images were collected; acquiring the current image collected by the vision sensor, and acquiring second image description information and second key point description information of the current image based on a feature extraction model; determining matching results between a plurality of the historical images and the current image based on the first image description information and the first key point description information of the historical images, and the second image description information and the second key point description information of the current image; and determining, according to the matching results and the first position information of the historical images, second position information of the movable platform when the current image was collected.
A third aspect of the present application provides an unmanned aerial vehicle, including a body and the positioning device described in the second aspect.

A fourth aspect of the present application provides a computer-readable storage medium, in which program instructions are stored, and the program instructions are used to implement the method described in the first aspect.
The present application provides a positioning method applied to a movable platform that includes a vision sensor. The method includes: acquiring first image description information and first key point description information of historical images collected by the vision sensor, and acquiring first position information of the movable platform when the historical images were collected; acquiring a current image collected by the vision sensor, and acquiring second image description information and second key point description information of the current image based on a feature extraction model; determining matching results between a plurality of historical images and the current image based on the first image description information and the first key point description information of the historical images, and the second image description information and the second key point description information of the current image; and determining, according to the matching results and the first position information of the historical images, second position information of the movable platform when the current image was collected. By acquiring the second image description information and the second key point description information of the current image based on the feature extraction model, both pieces of description information can be obtained at the same time, which improves the efficiency of obtaining them and allows them to be determined relatively accurately, thereby further saving positioning time, improving positioning accuracy, and satisfying the real-time requirements of acquiring the two pieces of description information and of positioning. At the same time, the method is also applicable to movable platforms such as UAVs, helping a UAV return home relatively smoothly.

Correspondingly, for the feature extraction model, fusion training can be performed for the second image description information and the second key point description information during model training; that is, by training this single feature extraction model, both the second image description information and the second key point description information can be obtained, which improves the global performance.

In addition, the embodiments of the present application further provide a device, an unmanned aerial vehicle and a storage medium based on the method, all of which can achieve the above effects.
Description of the Drawings

The drawings described herein are used to provide further understanding of the present application and constitute a part of the present application. The schematic embodiments of the present application and their descriptions are used to explain the present application and do not constitute an improper limitation of the present application. In the drawings:

FIG. 1 is a schematic flowchart of a positioning method provided by an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a feature extraction model provided by an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a positioning apparatus provided by an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a positioning device provided by an embodiment of the present application.
Detailed Description

In order to make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present invention. The terms used in the specification of the present invention are for the purpose of describing specific embodiments only and are not intended to limit the present invention.

To facilitate understanding of the technical solutions and technical effects of the present application, the prior art is briefly described below:
As described above, when the UAV needs to return home, it needs to locate its current position relatively quickly and accurately, which is especially demanding for a small-sized movable platform such as a UAV.

In the prior art, visual return navigation can divide the work into a key frame matching task and a key point matching task. Considering time efficiency, key frame matching can use the BoW (Bag of Words) approach; although this approach is time-efficient, its effect is not ideal. In the key point matching task, the ORB (Oriented FAST and Rotated BRIEF, a fast feature point extraction and description algorithm) descriptor is commonly used, but the ORB descriptor performs poorly under the large-scale, large-angle viewpoint changes that frequently occur in visual return navigation. To further improve time efficiency, the embodiments of the present application generate the descriptors of key frames and key points in the same network model and reconstruct the network structure accordingly for positioning.

Some embodiments of the present invention are described in detail below with reference to the accompanying drawings. The following embodiments and the features in the embodiments may be combined with each other provided there is no conflict between them.
FIG. 1 is a schematic flowchart of a positioning method provided by an embodiment of the present invention. The method 100 provided by the embodiments of the present application can be applied to a movable platform, such as an unmanned aerial vehicle or an intelligent mobile robot, and the movable platform includes a vision sensor. The method 100 includes the following steps:

101: Acquire first image description information and first key point description information of historical images collected by the vision sensor, and acquire first position information of the movable platform when the historical images were collected.

102: Acquire the current image collected by the vision sensor, and acquire second image description information and second key point description information of the current image based on a feature extraction model.

103: Determine matching results between a plurality of historical images and the current image based on the first image description information and the first key point description information of the historical images, and the second image description information and the second key point description information of the current image.

104: Determine, according to the matching results and the first position information of the historical images, second position information of the movable platform when the current image was collected.
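As a high-level illustrative sketch of how steps 101 to 104 could fit together, the Python pseudocode below is provided for illustration only; the helper functions image_similarity, match_keypoints and estimate_position, the record layout and the threshold value are all assumptions rather than the claimed implementation.

```python
def locate_on_return(feature_model, history, current_image, similarity_threshold=0.9):
    """Minimal positioning pipeline sketch.

    history : list of records gathered during the outbound flight (step 101), each with
              'image_desc', 'keypoint_descs' and 'position' (the first position information).
    """
    # Step 102: one forward pass of the feature extraction model yields both descriptions.
    cur_image_desc, cur_keypoint_descs = feature_model(current_image)

    # Step 103: match the current image against every historical image.
    best_record, best_score = None, -1.0
    for record in history:
        score = image_similarity(cur_image_desc, record["image_desc"])  # hypothetical helper
        if score > similarity_threshold and score > best_score:
            best_record, best_score = record, score

    if best_record is None:
        return None  # no historical image matched

    # Step 104: refine the position using key point correspondences together with the
    # first position information recorded with the matched historical image.
    matches = match_keypoints(cur_keypoint_descs, best_record["keypoint_descs"])  # hypothetical
    return estimate_position(best_record["position"], matches)                    # hypothetical
```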
By acquiring the second image description information and the second key point description information of the current image based on the feature extraction model, both pieces of description information can be obtained at the same time, which improves the efficiency of obtaining them and also allows them to be determined relatively accurately, thereby further saving positioning time, improving positioning accuracy, and satisfying the real-time requirements of acquiring the two pieces of description information and of positioning. At the same time, the method is also applicable to movable platforms such as UAVs, helping a UAV return home relatively smoothly and ensuring its safety.

It should be noted that the method 100 can be applied to a movable platform; in addition to UAVs, this may also be another movable platform or movable device, such as a sweeping robot, so that these movable platforms or devices can automatically return home or automatically return to an original location.

The above steps are described in detail below.
101: Acquire first image description information and first key point description information of historical images collected by the vision sensor, and acquire first position information of the movable platform when the historical images were collected.

The vision sensor is an instrument that uses optical elements and an imaging device to acquire image information of the external environment. It can be arranged inside the movable platform and is used to acquire information about the external environment of the movable platform, for example an image of the external environment at the current geographic location of the UAV.

Historical images are images acquired by the movable platform during its movement, for example the external environment images acquired by the UAV during its normal flight phase. When the UAV starts to return home, the external environment images acquired during the normal flight phase can be used as historical images for reference, in order to determine the current geographic location of the UAV during the return flight.
The first image description information is information that characterizes image features, such as an image descriptor. When the image is a key frame image from the movement process, this descriptor may also be called a key frame descriptor. The first key point description information is information that characterizes the features of key points in the image, such as key point descriptors; a key point may be a corner, an edge, or the like in the image.

The first position information is the geographic location of the movable platform at the time the corresponding historical image was acquired. It can be determined by a positioning device of the movable platform, such as GPS (Global Positioning System). In addition to the geographic location, the attitude of the movable platform, which may also be called orientation information, can also be acquired, so that the pose of the movable platform can be determined.
The above first image description information and first key point description information can be obtained through the feature extraction model described below. They can also be obtained in other ways, such as the SIFT (Scale-Invariant Feature Transform) algorithm or the SuperPoint algorithm. It should be noted that, although these other methods involve relatively complex algorithms and a small-sized movable platform may be poorly suited to them, such algorithms can still be run, just with less-than-ideal real-time performance. For historical images, however, real-time acquisition is not required, and the historical images may be acquired at time intervals.

For example, while the UAV is flying normally, it can, from the start of normal flight, acquire images of the external environment (that is, historical images) through the camera of the vision sensor mounted on the UAV. The vision sensor transmits the acquired historical images to the UAV for image processing. The vision sensor may acquire historical images in real time or at time intervals. In addition, the vision sensor or the UAV may determine, according to the rules for determining key frames, whether an acquired historical image is a key frame image; the vision sensor may then decide whether to send the historical image to the UAV, or the UAV may process the historical image after determining the key frames. The UAV can obtain the image descriptor of a historical image and the key point descriptors in that image, such as corner descriptors, through the feature extraction model described below or through the SIFT algorithm, among others.
102: Acquire the current image collected by the vision sensor, and acquire second image description information and second key point description information of the current image based on the feature extraction model.

The current image is an image acquired by the movable platform at its current geographic location during the return flight or return movement. Its second image description information and second key point description information are essentially the same as the first image description information and the first key point description information in step 101 above, and are not repeated here.

It should be noted that, since the second image description information and the second key point description information are obtained based on the same model (that is, the feature extraction model), this not only improves the efficiency of information acquisition and meets the requirement of obtaining descriptors in real time, but also allows the model to be trained by fusion training rather than separate training, resulting in better global performance.
The feature extraction model includes a feature extraction layer, an image description information generation layer and a key point information generation layer. The feature extraction layer is used to extract common feature information of the current image based on a convolutional network; the image description information generation layer is used to generate the second image description information based on the common feature information; and the key point information generation layer is used to generate the second key point description information based on the common feature information.

FIG. 2 shows a schematic structural diagram of the feature extraction model. The feature extraction model includes a feature extraction layer 201, an image description information generation layer 203 and a key point information generation layer 202. The feature extraction layer 201 may contain a convolutional network that includes multiple convolution layers.

Specifically, the feature extraction layer extracts the common feature information of the current image based on multiple convolution layers in the convolutional network. In FIG. 2, four convolution layers may be included; these convolution layers convolve the input historical image or current image, that is, the real image 204, to obtain the convolved image feature information, that is, the common feature information.

Specifically, the key point information generation layer generates the second key point description information based on the common feature information as follows: the key point information generation layer extracts the second key point description information from the common feature information based on one convolution layer and bilinear upsampling, where the number of convolution kernels of the convolution layer is the same as the number of the second key point description information. As shown in FIG. 2, the key point information generation layer 202 may include one convolution layer whose number of convolution kernels is the same as the number of the second key point description information; bilinear upsampling then yields the key point descriptors 2021 of the current image.

Specifically, the image description information generation layer generates the second image description information based on the common feature information as follows: the image description information generation layer extracts the second image description information from the common feature information through two convolution layers and a NetVLAD layer. As shown in FIG. 2, the image description information generation layer 203 may include two convolution layers and a NetVLAD (Net Vector of Locally Aggregated Descriptors) layer, thereby generating the current image descriptor 2031.
It should be noted that, regarding key point descriptor generation, although it was mentioned above that descriptors can be obtained with the SIFT algorithm with good results, the SIFT algorithm is computationally complex and cannot run in real time on embedded devices; such poor real-time performance is problematic for the return flight. Moreover, the SIFT algorithm has not been specifically improved for large-scale, large-angle changes and is not suitable for use in visual return navigation. Other traditional methods generally also suffer from poor results or high time complexity and cannot meet the requirements on descriptors in visual return navigation.

In addition, among learning-based key point descriptor generation methods, SuperPoint is currently a good model; it can obtain the positions of key points in the image and the key point descriptors at the same time, but the model is relatively large and difficult to run in real time on embedded devices, and because its training data is generated through homography transformations, it cannot simulate the actual usage scenarios of visual return navigation well. In the embodiment of the present application, the single feature extraction model described above meets the requirement of real-time operation on the movable platform, that is, on the embedded device. The feature extraction layer can perform fast downsampling through convolution layers with a stride of 2 to extract features of the current image or historical image, which reduces computing power consumption. This model structure reduces the computational complexity of the network as much as possible while ensuring the quality of the extracted descriptors, and, by combining image feature and point feature information (that is, the common feature information), it generates the key point descriptors and the image descriptor in the same network model, which not only makes full use of what images and key points have in common, but also saves a large amount of time otherwise spent on repeatedly computing features. For example, as described above, when the UAV encounters environmental factors such as weather that interrupt the signal, or when the GPS positioning device fails, the return flight can be triggered autonomously and automatically. During the return flight, the UAV can acquire the current image in real time through the vision sensor and send it to the UAV for processing. After receiving the current image, the UAV can input it into the feature extraction model. For any current image, the image first passes through the feature extraction layer of the model, and the common feature information of the current image is obtained through the convolution layers in the feature extraction layer. This common feature information is then sent to the image description information generation layer and the key point information generation layer respectively. After receiving the common feature information, the image description information generation layer obtains the current image descriptor through the two convolution layers and the NetVLAD layer; after receiving the common feature information, the key point information generation layer obtains the key point descriptors of the current image through one convolution layer and bilinear upsampling.
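The following PyTorch sketch is an illustration of this kind of shared-backbone, two-branch structure. The channel widths, descriptor sizes, layer names, and the use of global average pooling as a simplified stand-in for a full NetVLAD layer are all assumptions made for illustration, not the patented architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureExtractionSketch(nn.Module):
    """Illustrative shared backbone with an image-descriptor branch and a key point branch."""

    def __init__(self, keypoint_dim=128, image_dim=256):
        super().__init__()
        # Shared feature extraction layer: stride-2 convolutions for fast downsampling.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Key point branch: one convolution whose output channels equal the descriptor size.
        self.keypoint_conv = nn.Conv2d(128, keypoint_dim, 3, padding=1)
        # Image descriptor branch: two convolutions followed by a simple aggregation step
        # (used here in place of NetVLAD) and a projection to the image descriptor size.
        self.image_convs = nn.Sequential(
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.image_proj = nn.Linear(128, image_dim)

    def forward(self, image):
        shared = self.backbone(image)                        # common feature information
        keypoint_map = self.keypoint_conv(shared)            # coarse per-location descriptors
        # Dense variant: upsample the whole descriptor map back to the input resolution;
        # a sparse variant would instead sample only at the key point positions.
        keypoint_descs = F.interpolate(keypoint_map, scale_factor=16,
                                       mode="bilinear", align_corners=False)
        pooled = self.image_convs(shared).mean(dim=(2, 3))   # simplified global aggregation
        image_desc = self.image_proj(pooled)                 # current image descriptor
        return image_desc, keypoint_descs

# Example forward pass on an assumed 256x256 input.
model = FeatureExtractionSketch()
img_desc, kp_descs = model(torch.randn(1, 3, 256, 256))
```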
In addition, since the image descriptor and the key point descriptors of the current image need to be obtained in real time, these descriptors are used in large quantities. Because most descriptors currently obtained are of the floating point data type, they not only occupy a large amount of space but also take a long time to compare, which consumes considerable resources on an embedded device. Therefore, in order to improve resource utilization and reduce the occupied space and memory consumption, the descriptors of the floating point data type can be converted into descriptors of the Boolean data type.

Specifically, the image description information generation layer extracts the second image description information from the common feature information through the two convolution layers and the NetVLAD layer as follows: the image description information generation layer extracts second image description information of the floating point data type from the common feature information through the two convolution layers and the NetVLAD layer, and then converts the second image description information of the floating point data type into second image description information of the Boolean data type.

For example, as described above and shown in FIG. 2, after receiving the common feature information, the image description information generation layer 203 obtains the current image descriptor 2031 through the two convolution layers and the NetVLAD layer. The current image descriptor 2031 obtained at this point is of the floating point data type, and it is then converted into the current image descriptor 2032 of the Boolean data type.

Correspondingly, the same problem exists for the key point information generation layer, so the data type of the key point descriptors in this layer can also be converted from the floating point data type to the Boolean data type.

Specifically, the key point information generation layer extracts the second key point description information from the common feature information through one convolution layer and bilinear upsampling as follows: the key point information generation layer extracts second key point description information of the floating point data type from the common feature information through one convolution layer and bilinear upsampling, and then converts the second key point description information of the floating point data type into second key point description information of the Boolean data type. For example, as described above and shown in FIG. 2, after receiving the common feature information, the key point information generation layer 202 obtains the key point descriptors 2021 of the current image through one convolution layer and bilinear upsampling. The key point descriptors 2021 obtained at this point are of the floating point data type, and they are then converted into key point descriptors 2022 of the Boolean data type.
Unlike the SuperPoint algorithm, which first computes descriptors for all positions and only then looks up the descriptors of the key points according to their positions, the embodiment of the present application obtains the key point descriptors in the key point information generation layer directly by bilinear upsampling after a single convolution layer. Because bilinear upsampling is performed only at the key point positions, the amount of computation in use can be greatly reduced.
Specifically, the key point information generation layer being used to extract the second key point description information from the shared feature information based on one convolution layer and bilinear upsampling includes: determining the positions of the key points in the current image; the key point information generation layer obtaining, through one convolution layer, down-sampled information of the shared feature information; and the key point information generation layer directly up-sampling, through bilinear upsampling, the information at the corresponding positions in the down-sampled information to obtain the second key point description information.
Here, the position of a key point refers to where the key point lies in the image, that is, the image region to which the key point belongs. For example, if each key point corresponds to a region of 16*16 pixels, the position of the key point in the image can be determined from that region.
For example, as described above, in order to further improve the time efficiency of the model, key points are obtained within fixed grid cells of the current image, such as 16x16-pixel cells. After the key point information generation layer has down-sampled the feature map of the current image to 1/16 of its original size with one convolution layer, the descriptors are obtained directly by bilinear upsampling. Descriptors obtained in this way avoid the cost of the deconvolution-based upsampling used in other learning-based methods, and because only the key point positions are up-sampled during actual training and use, the time consumption can be greatly reduced.
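A minimal sketch of sampling descriptors only at the key point positions (assuming a PyTorch-style pipeline; the tensor shapes, the normalization convention and the function name are assumptions, not the patented implementation):

    import torch
    import torch.nn.functional as F

    def sample_keypoint_descriptors(desc_map: torch.Tensor, keypoints_xy: torch.Tensor,
                                    image_size: tuple) -> torch.Tensor:
        # desc_map:     (1, C, H/16, W/16) down-sampled descriptor map from one conv layer
        # keypoints_xy: (K, 2) key point pixel coordinates (x, y) in the full-resolution image
        # image_size:   (H, W) of the full-resolution image
        h, w = image_size
        grid = keypoints_xy.clone().float()
        grid[:, 0] = 2.0 * grid[:, 0] / (w - 1) - 1.0   # normalize x to [-1, 1]
        grid[:, 1] = 2.0 * grid[:, 1] / (h - 1) - 1.0   # normalize y to [-1, 1]
        grid = grid.view(1, 1, -1, 2)                   # (1, 1, K, 2)
        sampled = F.grid_sample(desc_map, grid, mode='bilinear', align_corners=True)
        return sampled.view(desc_map.shape[1], -1).t()  # (K, C), one descriptor per key point

Only K interpolations are performed instead of up-sampling the whole descriptor map, which is where the saving over deconvolution-based upsampling comes from.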
The above feature extraction model is created through model training. Because the model is a multi-task branch network, step-by-step training can be adopted. The model is first trained with the key point training set, while the parameters of the image description information generation layer, which is used to determine image descriptors (either current image descriptors or historical image descriptors), are initially held fixed. When the training loss no longer decreases significantly, the parameters of the key point information generation layer and of the feature extraction layer can be determined. The obtained image matching training set is then used to train the image description information generation layer and determine its final parameters.
It should be noted that the image description information generation layer may alternatively be trained first, which also determines the parameters of the feature extraction layer, with the key point information generation layer trained afterwards. However, a model trained in this order is slightly less accurate than one trained in the order described above.
In addition, in order to reduce the training time of the whole model, the model can be trained on another platform, for example a server or a computer, and ported to the movable platform after training is complete. Of course, if the performance of the movable platform is sufficient to support training, the model can also be trained on the movable platform itself.
Specifically, the initial feature extraction layer is trained with first training data to generate a trained feature extraction layer, which serves as the feature extraction layer of the feature extraction model. The first training data includes image point pairs corresponding to the same spatial point, the two points of each pair coming from different real images that represent the same visual scene. The initial key point information generation layer is also trained with the first training data to generate a trained key point information generation layer, which serves as the key point information generation layer of the feature extraction model.
The first training data is the training data of the above-mentioned key point training set. The structure of the initial key point information generation layer is the same as that of the trained key point information generation layer; only the parameters differ. For the initial key point information generation layer, the parameters are the initial parameters.
The training process itself is a standard network training process and is not repeated here. It is only noted that an image point pair may be the pair of image points corresponding to one point in three-dimensional space. The two points of the pair come from two images that are different real images of the same visual scene, for example two real images of the same place taken from different angles or from different image acquisition positions.
The first training data containing the above image point pairs can be acquired as follows:
Specifically, for each of a number of different visual scenes, real images are acquired from different angles. For each visual scene, a three-dimensional spatial model is constructed from the real images corresponding to the different angles. Based on the similarity between spatial points, spatial points are selected from the three-dimensional spatial model, and for each selected spatial point the corresponding real image point pair in the real images is obtained. The real image point pairs are then selected according to the similarity between their acquisition positions, and the selected real image point pairs are used as key point pairs, thereby obtaining the first training data.
Real images from different angles can be acquired as follows. For example, as described above, the UAV can collect real images according to its flight altitude and attitude angle, for instance collecting downward-looking real images (image data) in a targeted manner at low, medium and high flight altitudes and at small, medium and large attitude angles. At the same time, in order to speed up subsequent model training and improve the balance of the data distribution, images that are too similar are removed according to the similarity between the collected images.
In this way, the data covers the large viewpoint and scale changes of the application scenarios encountered during UAV flight. Real data can also be provided for model training and testing; this real data can contain a large number of collected real images together with the matching feature points, also called key points, in those real images.
The process of constructing the three-dimensional spatial model may be as follows. For example, as described above, for the at least two real images (two, three, four, five, and so on) of the same visual scene collected above, an SFM (Structure from Motion) modeling method is used to construct the three-dimensional spatial model. Once the model is built, each real 3D point in the three-dimensional spatial model corresponds to 2D points in at least two real images, which form a 2D point pair. In order to improve the generalization ability of the model, so that robust feature descriptions can be extracted when processing different categories of key points, the embodiment of the present application can use several different kinds of key points to construct the three-dimensional model through SFM. The key point types may include, but are not limited to, SIFT-type key points (key points or corner points obtained by the SIFT algorithm), FAST (Features from Accelerated Segment Test)-type key points (key points or corner points obtained by the FAST algorithm), ORB-type key points (key points or corner points obtained by the ORB algorithm), and Harris-type key points (key points or corner points obtained by the Harris algorithm). More general training data is obtained as a result.
The 3D points corresponding to the 2D point pairs obtained through the above process will contain many 3D points that lie close together, especially when a certain area of an image is particularly rich in texture, in which case a large number of 3D points corresponding to that area appear. This affects the balance of the training data distribution, so the points need to be filtered. The filtering process is as follows:
A set S of 3D points, containing the 3D points after filtering, can first be defined. The generated 3D points are traversed so that the similarity between any two of the filtered 3D points is less than or equal to a threshold, that is, the similarity between any two 3D points in the set S is less than or equal to a threshold. The similarity can be determined with a Euclidean-distance algorithm.
Alternatively, a set P of candidate 3D points can be defined. Before filtering, all generated 3D points are placed in the set P as candidate 3D points. Any two 3D points in P whose similarity is less than or equal to a threshold are first determined, which can be done by traversal; if their similarity is less than or equal to the threshold, the two 3D points are put into the set S. The similarity between each remaining point in P and every 3D point in S, namely the Euclidean distance d, is then computed, and if d does not exceed the set threshold α, the corresponding 3D point in P is added to the set S, so that the similarity between any two 3D points in S is less than or equal to a threshold and the 3D points in S are not overly similar, giving balanced data. If the set S is empty after filtering, a candidate 3D point of P can be added to S, where the candidate 3D points can be the 3D points generated above.
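A minimal sketch of this spatial-point filtering (assuming the intent is to keep only points that are sufficiently far apart for data balance; the function name, the direction of the distance comparison and the threshold value are assumptions):

    import numpy as np

    def filter_3d_points(candidates: np.ndarray, alpha: float) -> np.ndarray:
        # Greedily build the set S so that no two kept 3D points are closer than alpha.
        # candidates: (N, 3) array of candidate 3D points (the set P)
        # alpha:      Euclidean distance threshold between kept points
        kept = []  # the set S
        for p in candidates:
            # Keep p only if it is at least alpha away from every point already in S.
            if all(np.linalg.norm(p - q) >= alpha for q in kept):
                kept.append(p)
        return np.asarray(kept)

    # Usage with hypothetical points reconstructed by SFM.
    points = np.random.rand(1000, 3) * 50.0
    S = filter_3d_points(points, alpha=1.0)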
After the 3D points have been filtered, that is, after the selection of spatial points is complete, the corresponding 2D point pairs still need to be filtered, that is, the corresponding real image point pairs are selected. It should be understood that, as described above, once the three-dimensional spatial model has been created, the spatial points in the model correspond to real image points in the real images used to construct the model; therefore, after the spatial points have been filtered, each remaining spatial point still has its corresponding real image point pair, that is, its 2D point pair.
Since each filtered 3D point corresponds to 2D points in multiple views (real images from the different viewpoints used to construct the three-dimensional spatial model), in order to increase the difficulty of the data set and improve the accuracy and generality of the model, only the most difficult pair of matching 2D points is kept for each 3D point. With the 3D point set S obtained through the above process, for any 3D point m in S, its corresponding 2D points under different views form a set T, and the poses of the image acquisition devices (mounted on the movable platform), such as cameras, for the views in T form a set Q; it should be understood that each pose corresponds to one image acquisition device, such as a camera. The set Q is traversed, and the similarity between the positions of the corresponding image acquisition devices, such as the Euclidean distance, is computed; the two camera positions with the largest Euclidean distance are found, the corresponding 2D points in T are kept, and the remaining 2D points are discarded. The set S is traversed in this way to determine the unique 2D point pair corresponding to each 3D point in S, and all filtered 2D point pairs form the set T. It should be understood that the two camera (image acquisition device) positions with the largest Euclidean distance are the two least similar positions, so the resulting 2D point pair is the most difficult one. The first training data is thus obtained.
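A sketch of this hardest-pair selection (the data layout and the function name are assumptions): for each 3D point, the pair of observations whose camera centers are farthest apart is kept.

    import numpy as np
    from itertools import combinations

    def hardest_2d_pair(observations):
        # observations: list of (point_2d, camera_position) tuples for one 3D point,
        # where point_2d is an (x, y) pixel coordinate and camera_position has length 3.
        # Returns the two 2D points whose camera positions are farthest apart.
        best_pair, best_dist = None, -1.0
        for (p1, c1), (p2, c2) in combinations(observations, 2):
            d = np.linalg.norm(np.asarray(c1) - np.asarray(c2))
            if d > best_dist:
                best_dist, best_pair = d, (p1, p2)
        return best_pair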
However, in order to better characterize the effect of the trained model, the first training data can be divided by difficulty into three categories: easy, normal and difficult. For the sets S and T obtained above, since each 3D point in S corresponds to a 2D point pair in T, each corresponding group consisting of a 3D point m, a 2D point x and a 2D point y constitutes a sample n, and a difficulty score L is computed for each sample n according to the following formula (1).
L = λ1·La + λ2·Ld + λ3·Lq        (1)
Here, La denotes the angle ∠xpy formed at the 3D point by the 2D point pair of sample n, Ld denotes the spatial distance between the positions of the image acquisition devices, such as cameras, corresponding to 2D point x and 2D point y, and Lq denotes the quaternion angle between the poses of the image acquisition devices, such as cameras, corresponding to the 2D points. To make the division more reasonable, the weight parameters λ1, λ2 and λ3 are introduced. According to the final difficulty score L, the first training data is divided into easy, normal and difficult.
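A minimal sketch of how such a score might be computed (the quaternion-angle formula and the default weights are assumptions for illustration):

    import numpy as np

    def difficulty_score(p3d, cam_x, cam_y, quat_x, quat_y, lambdas=(1.0, 1.0, 1.0)):
        # p3d:            3D point m
        # cam_x, cam_y:   camera positions of the two views observing the point
        # quat_x, quat_y: unit quaternions of the two camera poses
        l1, l2, l3 = lambdas
        # La: angle at the 3D point between the two viewing rays (the angle ∠xpy).
        vx, vy = cam_x - p3d, cam_y - p3d
        cos_a = np.dot(vx, vy) / (np.linalg.norm(vx) * np.linalg.norm(vy))
        La = np.arccos(np.clip(cos_a, -1.0, 1.0))
        # Ld: spatial distance between the two camera positions.
        Ld = np.linalg.norm(cam_x - cam_y)
        # Lq: quaternion angle between the two camera orientations.
        Lq = 2.0 * np.arccos(np.clip(abs(np.dot(quat_x, quat_y)), 0.0, 1.0))
        return l1 * La + l2 * Ld + l3 * Lq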
It should be noted that the division of the first training data makes its difficulty known, so that the training of subsequent models can be controlled more accurately, in particular whether the model can cover many application scenarios and whether descriptors can be obtained accurately under different application scenarios. The first training data can also be adjusted according to the difficulty, so that the difficulty of the samples meets the requirements of model training. As described above, in order to further reduce the storage space used by the descriptors and the time needed to measure the distance between descriptors, the embodiment of the present application can also add a loss function for the Boolean descriptors, so that under the combined effect of multiple loss functions the model finally outputs image descriptors of the Boolean data type and key point descriptors of the Boolean data type. Since descriptors of the Boolean data type have far fewer dimensions than traditional feature descriptors, their effect is also better than that of traditional feature descriptors. In addition, directly outputting binary descriptors of the Boolean data type from the feature extraction model makes subsequent descriptor retrieval and matching more convenient.
Specifically, training the initial key point information generation layer with the first training data to generate the trained key point information generation layer includes: adding a loss function of the Boolean data type to the loss function of the floating-point data type in the initial key point information generation layer; and training the initial key point information generation layer with the first training data, the loss function of the floating-point data type and the loss function of the Boolean data type, to generate the trained key point information generation layer.
As described above, before the key point information generation layer is trained, its loss function can be changed from the loss function of the floating-point data type alone to the floating-point loss function with a loss function of the Boolean data type added, forming a multiple loss function. It should be understood that model training can also be carried out with only the floating-point loss function, but the descriptors produced by such a model are descriptors of the floating-point data type. Therefore, the Boolean loss function is added on top of the floating-point loss function as the loss function of this layer, and the layer is trained with the first training data to obtain the trained layer. The trained feature extraction layer can be obtained in the same way.
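The text does not specify the form of the Boolean loss term, so the sketch below is only one plausible way to assemble such a multiple loss (the binarization penalty, which pushes descriptor entries toward ±1 so that thresholding loses little information, and the weight are assumptions):

    import torch

    def multi_loss(float_desc_loss: torch.Tensor, descriptors: torch.Tensor,
                   boolean_weight: float = 0.1) -> torch.Tensor:
        # float_desc_loss: the existing floating-point descriptor loss (e.g. a matching loss)
        # descriptors:     (K, D) raw descriptor outputs of the branch
        # Assumed Boolean (binarization) term: drive each entry toward -1 or +1.
        boolean_loss = ((descriptors.abs() - 1.0) ** 2).mean()
        return float_desc_loss + boolean_weight * boolean_loss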
On this basis, after the key point information generation layer and the feature extraction layer have been trained, the image description information generation layer can be trained.
Specifically, the method 100 may further include: based on the trained feature extraction layer, training the initial image description information generation layer with second training data to generate a trained image description information generation layer, which serves as the image description information generation layer of the feature extraction model, where the second training data includes key frame image matching pairs and information indicating whether each key frame image matching pair belongs to the same visual scene.
The second training data may be acquired as follows: real images are acquired; based on a classification model, real image matching pairs are determined from the real images and used as key frame image matching pairs; and whether each real image matching pair belongs to the same visual scene is determined, thereby obtaining the second training data.
The classification model may be a model that matches real images and can determine, among the real images, the real image matching pairs that belong to the same visual scene and those that do not, for example two real images of the same place. The model may be a BoW model.
For example, as described above, multiple real images of multiple different visual scenes can be acquired by the UAV in actual flight scenarios of visual return navigation. The real images are then input into the BoW model, which determines the real image matching pairs belonging to the same visual scene and those not belonging to the same scene. The model can determine the matching pairs by scoring: real image matching pairs whose score is higher than a threshold are taken as matching pairs of the same visual scene, that is, positive sample training data, and real image matching pairs whose score is lower than the threshold are taken as matching pairs of different visual scenes, that is, negative sample training data. The second training data can thus be obtained.
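A schematic of this scoring step (the BoW scoring interface and the threshold are assumptions; any off-the-shelf bag-of-words scorer could play this role):

    def label_pairs_by_bow(image_pairs, bow_score, threshold=0.6):
        # image_pairs: iterable of (image_a, image_b) tuples
        # bow_score:   callable returning a similarity score in [0, 1] for two images
        positives, negatives = [], []
        for img_a, img_b in image_pairs:
            score = bow_score(img_a, img_b)
            if score > threshold:
                positives.append((img_a, img_b, 1))   # same visual scene
            else:
                negatives.append((img_a, img_b, 0))   # different visual scenes
        return positives, negatives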
In order to improve the capability of the model and its generality, random candidate matching pairs can also be added after the BoW model has made its determination, that is, candidate matching pairs are randomly drawn from the collected real images. After the candidate matching pairs have been generated, whether these matching pairs contain errors or problems is further determined manually. When problems or errors exist, especially ones caused by the classification model, valuable negative sample training data can be obtained to improve the model.
Specifically, the method 100 further includes: displaying the real image matching pairs and, in response to a user's confirmation operation, determining whether each real image matching pair belongs to the same visual scene, thereby obtaining the second training data.
As described above, after the real image matching pairs and the candidate matching pairs are obtained, or after only the real matching pairs are obtained, the images corresponding to the matching pairs can be shown on a display device, such as a display screen, in the form of matching pairs. For example, when a matching pair is displayed, in addition to the two corresponding real images, the corresponding feature points between the two real images can also be displayed and connected with lines. The pairs are then annotated by workers (that is, users). The annotation can cover the following cases: same, different and uncertain, which may be represented by "0", "1" and "2" respectively. Matching pairs manually annotated as uncertain can be removed and not used as second training data; the others, that is, the matching pairs annotated as "0" and those annotated as "1", are used as the second training data.
Specifically, the method 100 further includes: randomly selecting real image matching pairs from the real images as key frame image matching pairs; displaying the randomly selected real image matching pairs and, in response to a user's confirmation operation, determining whether the randomly selected real image matching pairs belong to the same visual scene, thereby obtaining the second training data.
Since this has been described above, it is not repeated here.
It should be noted that when matching pairs are selected with the BoW model alone, obtaining good negative sample training data is a relatively difficult problem. By adding manual annotation on top of the BoW model, valuable negative sample training data can be found (that is, pairs whose scenes are similar and which the BoW model wrongly determined to belong to the same visual scene), which helps to train a more robust model network. In addition, since the images of the matching pairs can be acquired by the UAV in actual flight scenarios of visual return navigation, they fully reflect the viewpoint and scale changes encountered in the visual return task.
After the second training data has been obtained, the initial image description information generation layer can be trained; the specific training process is not repeated here. The trained image description information generation layer is finally obtained.
As described above, in order to further reduce the storage space used by the descriptors and the time needed to measure the distance between descriptors, the embodiment of the present application can also add a loss function for the Boolean descriptors, so that under the combined effect of multiple loss functions the model finally outputs image descriptors of the Boolean data type and key point descriptors of the Boolean data type. Since descriptors of the Boolean data type have far fewer dimensions than traditional feature descriptors, their effect is also better than that of traditional feature descriptors. In addition, directly outputting binary descriptors of the Boolean data type from the feature extraction model makes subsequent descriptor retrieval and matching more convenient, for example outputting the binary, Boolean-data-type descriptor of the second image description information from the image description information generation layer.
Specifically, based on the trained feature extraction layer, training the initial image description information generation layer with the second training data to generate the trained image description information generation layer includes: adding a loss function of the Boolean data type to the loss function of the floating-point data type in the initial image description information generation layer; and, based on the trained feature extraction layer, training the initial image description information generation layer with the second training data, the loss function of the floating-point data type and the loss function of the Boolean data type, to generate the trained image description information generation layer.
Since this has been described above, it is not repeated here. It is only noted that the initial image description information generation layer is trained with the second training data, based on the trained feature extraction layer, the loss function of the floating-point data type and the loss function of the Boolean data type. The feature extraction model can thus be trained in full.
Because the network of this feature extraction model is a multi-task branch network, step-by-step training can be adopted. The model can first be trained with the first training data, that is, the initial key point information generation layer is trained; when the loss no longer decreases significantly, the parameters of the initial key point information generation layer and the initial feature extraction layer are fixed, yielding the key point information generation layer and the feature extraction layer. The second training data is then used to train the initial image description information generation layer, yielding the image description information generation layer.
Since the embodiment of the present application obtains the second image description information and the second key point description information simultaneously through the same feature extraction model, the first training data is used first during training because it is obtained from the three-dimensional spatial model and is therefore completely correct. After training with the first training data, the common layer of the model, namely the feature extraction layer, is already a fairly good feature extraction layer.
The second training data annotated by the workers is then used to train the initial image description information generation layer. Training with the second training data only after the shared feature extraction layer has been trained with the first training data avoids the influence of worker annotation errors on the network, yielding a better image description information generation layer.
It should be noted that the order of the above training steps can also be reversed, that is, training with the second training data first and then with the first training data. This is not repeated here.
In order to adjust the model more precisely, training data containing both key points and key frame images can further be used to fine-tune the whole network.
Specifically, after the feature extraction model has been trained, the method 100 further includes: adjusting the feature extraction layer, the image description information generation layer and/or the key point information generation layer of the feature extraction model with third training data, where the third training data includes key frame image matching pairs and the key point matching pairs within those key frame image matching pairs.
The third training data can be determined as follows: when the number of real image point pairs shared by two real images is greater than a threshold, the two real images and the corresponding real image point pairs are used as a key frame image matching pair and its key point matching pairs, thereby obtaining the third training data.
As described above, a three-dimensional spatial model can be constructed; once the model is built, each real 3D point in the model corresponds to 2D points in at least two real images, forming a 2D point pair, that is, each 2D point pair belongs to one 3D point of the three-dimensional spatial model. When two real images share multiple 2D point pairs whose number is greater than a threshold, the two real images and the 2D point pairs between them can be used as third training data; the third training data can contain multiple pairs of real images, each pair with its corresponding 2D point pairs.
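A sketch of this selection criterion (the data layout, the function name and the threshold are assumptions):

    from collections import defaultdict

    def build_third_training_data(point_observations, min_pairs=50):
        # point_observations: dict mapping a 3D point id to a list of (image_id, point_2d) observations
        # Returns a dict mapping (image_id_a, image_id_b) to the list of shared 2D point pairs.
        pairs = defaultdict(list)
        for _, obs in point_observations.items():
            for i in range(len(obs)):
                for j in range(i + 1, len(obs)):
                    (img_a, pt_a), (img_b, pt_b) = obs[i], obs[j]
                    if img_a > img_b:
                        img_a, img_b, pt_a, pt_b = img_b, img_a, pt_b, pt_a
                    pairs[(img_a, img_b)].append((pt_a, pt_b))
        # Keep only image pairs that share more than min_pairs 2D point pairs.
        return {k: v for k, v in pairs.items() if len(v) > min_pairs}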
After the third training data has been obtained, the model trained with the first training data and the second training data is fine-tuned with the third training data, that is, the parameters of the feature extraction layer, the image description information generation layer and/or the key point information generation layer of the trained feature extraction model are fine-tuned. This is not repeated here.
The fine-tuned model can then be used. If the model was trained on the movable platform, it can be used directly; if the model was trained on a terminal, such as a server or a computer, the trained final model can be ported to the movable platform.
As described above, after the second key point description information has been obtained, the corresponding pieces of information can be combined according to the order of the key points in the image for subsequent matching.
Specifically, the method 100 further includes: combining the corresponding pieces of second key point description information into one vector according to the order of the multiple key points in the current image.
For example, as described above, after the UAV has obtained the multiple key point descriptors of the current image through the feature extraction model, the corresponding descriptors can be combined into one vector in the order of the key points in the current image for subsequent matching.
Correspondingly, for the first key point description information, the corresponding descriptors can also be combined into one vector in the order of the key points in the historical image for subsequent matching.
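A minimal sketch of this combination step (the row-major ordering convention and the array shapes are assumptions; the point is only that both images use the same ordering):

    import numpy as np

    def combine_descriptors(keypoints: np.ndarray, descriptors: np.ndarray) -> np.ndarray:
        # keypoints:   (K, 2) array of (x, y) key point coordinates
        # descriptors: (K, D) array of descriptors, one row per key point
        # Sort key points by row, then by column, and concatenate their descriptors.
        order = np.lexsort((keypoints[:, 0], keypoints[:, 1]))
        return descriptors[order].reshape(-1)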
103: Based on the first image description information and the first key point description information of the historical images, and the second image description information and the second key point description information of the current image, determine the matching results between the multiple historical images and the current image.
The image description information is used to find, among the multiple historical images, historical images of a first type whose scene is similar to that of the current image, and the key point description information is used to find, in the historical images of the first type, key points that match the key points of the current image. The matching result includes the matching relationship between the key points of the current image and the key points in the historical images.
In other words, the image description information is used to roughly pair the images, and on this basis one or more historical images whose scene better matches the current image (the historical images of the first type) are obtained. What matters for positioning is the matching relationship of the key points: based on the key point description information, the key points of the current image and of the historical images can be further matched to obtain the key point matching relationship, that is, the matching relationship between a key point in the current image and a key point in a historical image.
The position information of the key points in the historical images can be regarded as accurate; therefore, from the position information of a key point in a historical image and the matching relationship between a key point in the current image and that key point, the position information of the key point in the current image can be obtained.
For example, as described above, the UAV obtains the image descriptors and the key point descriptors (or the vectors composed of key point descriptors) of the historical images, as well as the image descriptor and the key point descriptors (or the vector composed of key point descriptors) of the current image. The image descriptor and key point descriptors (or descriptor vector) of the current image can be compared with the image descriptors and key point descriptors (or descriptor vectors) of the multiple historical images, and the comparison result, that is, the matching result, can be determined with a similarity algorithm.
When the current image is compared with the historical images, the comparison result may be that the current image is identical to one of the historical images, or only partly identical, that is, similar. When the images are merely similar, the similarity can be obtained with the similarity algorithm and compared against a similarity threshold: if the similarity is greater than the threshold, the matching result is determined to be a match; otherwise, it is a mismatch.
It should be noted that the above similarity algorithm may include the Hamming distance, the Euclidean distance, and the like.
In addition, as described above, the above image descriptors and key point descriptors may be Boolean descriptors. When the distance between Boolean descriptors is measured with a similarity algorithm, only an XOR operation is needed to obtain the similarity, such as the Hamming distance, which greatly accelerates the computation of the distance between the corresponding descriptors and further reduces the time consumption.
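A minimal sketch of this XOR-based comparison (assuming the Boolean descriptors are packed into byte arrays as in the earlier sketch; the function names are assumptions):

    import numpy as np

    def hamming_distance(desc_a: np.ndarray, desc_b: np.ndarray) -> int:
        # Hamming distance between two packed Boolean descriptors via XOR and bit counting.
        return int(np.unpackbits(np.bitwise_xor(desc_a, desc_b)).sum())

    def best_match(query: np.ndarray, candidates: list) -> int:
        # Index of the candidate descriptor closest to the query.
        distances = [hamming_distance(query, c) for c in candidates]
        return int(np.argmin(distances))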
104: Determine, according to the matching result and the first position information of the historical images, second position information of the movable platform at the time the current image was acquired.
For example, as described above, the UAV determines, according to the above matching result, which historical image the current image is identical to or meets the similarity threshold with, and then determines the geographic location of the current image from the geographic location of that historical image. The determined geographic location may be the absolute geographic location of the current image, that is, a location referenced to a geographic coordinate system, or a location relative to the historical image. When the two images are not completely identical, for example there is an angle change but they belong to the same visual scene, three-dimensional spatial modeling can be performed with the two images, and the different angles or different positions of the two images can be determined from the resulting three-dimensional spatial model, thereby determining the position of the current image.
After the position of the current image has been determined, the movable platform can return according to that position.
Specifically, the method 100 may further include: determining the attitude of the movable platform according to the position deviation between the corresponding first position information and the corresponding second position information in the matching result; and, according to the attitude, moving the movable platform from the second position to the first position, so as to realize the automatic return of the movable platform.
For example, as described above, the attitude of the UAV is determined and adjusted according to the deviation between the above two positions, so that the UAV can move from the second position to the first position and thus return.
FIG. 3 is a schematic structural diagram of a positioning apparatus provided by an embodiment of the present invention. The apparatus 300 can be applied to a movable platform, for example an unmanned aerial vehicle or an intelligent mobile robot, where the movable platform includes a visual sensor. The apparatus 300 can perform the positioning method described above. The apparatus 300 includes: a first acquisition module 301, a second acquisition module 302, a first determination module 303 and a second determination module 304. The functions of each module are described in detail below:
The first acquisition module 301 is configured to acquire first image description information and first key point description information of historical images acquired by the visual sensor, and to acquire first position information of the movable platform at the time the historical images were acquired.
The second acquisition module 302 is configured to acquire the current image acquired by the visual sensor, and to acquire second image description information and second key point description information of the current image based on the feature extraction model.
The first determination module 303 is configured to determine the matching results between the multiple historical images and the current image based on the first image description information and the first key point description information of the historical images, and the second image description information and the second key point description information of the current image.
The second determination module 304 is configured to determine, according to the matching results and the first position information of the historical images, second position information of the movable platform at the time the current image was acquired.
Specifically, the feature extraction model includes a feature extraction layer, an image description information generation layer and a key point information generation layer. The feature extraction layer is configured to extract shared feature information of the current image based on a convolutional network; the image description information generation layer is configured to generate the second image description information based on the shared feature information; and the key point information generation layer is configured to generate the second key point description information based on the shared feature information.
Specifically, the feature extraction layer is configured to extract the shared feature information based on multiple convolution layers in the convolutional network.
Specifically, the image description information generation layer is configured to extract the second image description information from the shared feature information through two convolution layers and a NetVLAD layer.
Specifically, the image description information generation layer is configured to extract second image description information of the floating-point data type from the shared feature information through the two convolution layers and the NetVLAD layer, and to convert the second image description information of the floating-point data type into second image description information of the Boolean data type.
Specifically, the key point information generation layer is configured to extract the second key point description information from the shared feature information based on one convolution layer and bilinear upsampling, where the number of convolution kernels of the convolution layer is the same as the number of pieces of second key point description information.
Specifically, the key point information generation layer is configured to extract second key point description information of the floating-point data type from the shared feature information through one convolution layer and bilinear upsampling, and to convert the second key point description information of the floating-point data type into second key point description information of the Boolean data type.
Specifically, the second acquisition module 302 is configured to determine the positions of the key points in the current image; the key point information generation layer is configured to obtain down-sampled information of the shared feature information through one convolution layer, and to directly up-sample, through bilinear upsampling, the information at the corresponding positions in the down-sampled information to obtain the second key point description information.
In addition, the apparatus 300 further includes: a combining module, configured to combine the corresponding pieces of second key point description information into one vector according to the order of the multiple key points in the current image.
In addition, the apparatus 300 further includes: a third determination module, configured to determine the attitude of the movable platform according to the position deviation between the corresponding first position information and the corresponding second position information in the matching result; and a moving module, configured to move the movable platform from the second position to the first position according to the attitude, so as to realize the automatic return of the movable platform.
In addition, the apparatus 300 further includes: a training module, configured to train the initial feature extraction layer with the first training data and generate a trained feature extraction layer as the feature extraction layer of the feature extraction model, where the first training data includes image point pairs corresponding to the same spatial point, the two points of each pair coming from different real images representing the same visual scene; the training module is further configured to train the initial key point information generation layer with the first training data and generate a trained key point information generation layer as the key point information generation layer of the feature extraction model.
In addition, the second acquisition module 302 is configured to acquire, for different visual scenes, real images from different angles in each visual scene. The apparatus 300 further includes: a creation module, configured to construct, for each visual scene, a three-dimensional spatial model from the real images corresponding to the different angles; and a selection module, configured to select spatial points from the three-dimensional spatial model based on the similarity between spatial points, obtain the real image point pair corresponding to each selected spatial point in the real images, select the real image point pairs according to the similarity between their acquisition positions, and use the selected real image point pairs as key point pairs, thereby obtaining the first training data.
具体的,训练模块,包括:增加单元,用于在初始关键点信息生成层中的浮点数据类型的损失函数上增加布尔数据类型的损失函数;训练单元,用于通过第一训练数据、浮点数据类型的损失函数以及布尔数据类型的损失函数,对初始关键点信息生成层进行训练,生成训练后的关键点信息生成层。Specifically, the training module includes: an adding unit for adding a loss function of Boolean data type to the loss function of floating point data type in the initial key point information generation layer; a training unit for passing the first training data, floating point data The loss function of point data type and the loss function of Boolean data type are used to train the initial key point information generation layer, and generate the key point information generation layer after training.
此外,训练模块,还用于:基于训练后的特征提取层,通过第二训练数据对初始图像描述信息生成层进行训练,生成训练后的图像描述信息生成层,作为特征提取模型中的图像描述信息生成层,其中,第二训练数据包括关键帧图像匹配对以及表示每个关键帧图像匹配对是否属于同一视觉场景的信息。In addition, the training module is also used for: training the initial image description information generation layer through the second training data based on the trained feature extraction layer, and generating the trained image description information generation layer as the image description in the feature extraction model The information generation layer, wherein the second training data includes key-frame image matching pairs and information indicating whether each key-frame image matching pair belongs to the same visual scene.
此外,第二获取模块302,还包括:获取真实图像,基于分类模型,从真实图像中确定出真实图像匹配对,作为关键帧图像匹配对,并确定各个真实图像匹配对的是否属于同一视觉场景,从而获取到第二训练数据。In addition, the second acquisition module 302 further includes: acquiring real images, determining real image matching pairs from the real images based on the classification model, as key frame image matching pairs, and determining whether each real image matching pair belongs to the same visual scene , so as to obtain the second training data.
此外,该装置300还包括:第三确定模块,用于通过展示真实图像匹配对,响应于用户的确定操作,确定真实图像匹配对是否属于同一视觉场景,从而获取到第二训练数据。In addition, the apparatus 300 further includes: a third determination module, configured to determine whether the real image matching pairs belong to the same visual scene by displaying the real image matching pairs in response to the user's determination operation, thereby acquiring the second training data.
具体的,选择模块,还用于:从真实图像中随机选择真实图像匹配对,作为关键帧图像匹配对;第三确定模块,用于通过展示随机选择的真实图像 匹配对,响应于用户的确定操作,确定随机选择的真实图像匹配对是否属于同一视觉场景,从而获取到第二训练数据。Specifically, the selection module is further configured to: randomly select real image matching pairs from the real images as key frame image matching pairs; the third determining module is configured to respond to the user's determination by displaying the randomly selected real image matching pairs operation to determine whether the randomly selected matching pairs of real images belong to the same visual scene, so as to obtain the second training data.
具体的,增加单元,还用于:在初始图像描述信息生成层中的浮点数据类型的损失函数上增加布尔数据类型的损失函数;训练单元,还用于基于训练后的特征提取层,通过所述第二训练数据、浮点数据类型的损失函数以及布尔数据类型的损失函数,对初始图像描述信息生成层进行训练,生成训练后的图像描述信息生成层。Specifically, the adding unit is also used for: adding a loss function of boolean data type to the loss function of floating point data type in the initial image description information generation layer; the training unit is also used to extract the layer based on the features after training, through The second training data, the loss function of the floating point data type, and the loss function of the Boolean data type are used to train the initial image description information generation layer to generate a trained image description information generation layer.
此外,在训练完特征提取模型后,该装置300还包括:调整模块,用于通过第三训练数据,对特征提取模型中的特征提取层,图像描述信息生成层和/或关键点信息生成层进行调整,第三训练数据包括关键帧图像匹配对以及关键帧图像匹配对中的关键点匹配对。In addition, after the feature extraction model is trained, the apparatus 300 further includes: an adjustment module, configured to perform a feature extraction layer, an image description information generation layer and/or a key point information generation layer in the feature extraction model through the third training data After adjustment, the third training data includes key frame image matching pairs and key point matching pairs in the key frame image matching pairs.
此外，选择模块，还用于：当两个真实图像中具有的真实图像点对数量大于阈值的情况下，则将两个真实图像以及对应的真实图像点对作为关键帧图像匹配对以及关键点匹配对，从而得到第三训练数据。In addition, the selection module is further configured to: when the number of real image point pairs shared by two real images is greater than a threshold, use the two real images and the corresponding real image point pairs as a key frame image matching pair and key point matching pairs, so as to obtain the third training data.
在一个可能的设计中,图3所示定位的装置300的结构可实现为一电子设备,该电子设备可以是定位的设备,如可移动平台。如图4所示,该定位的设备400可以包括:一个或多个处理器401、一个或多个存储器402以及视觉传感器403。其中,视觉传感器403,用于采集的历史图像以及当前图像。存储器402用于存储支持电子设备执行上述图1-图2所示实施例中提供的定位的方法的程序。处理器401被配置为用于执行存储器402中存储的程序。具体的,程序包括一条或多条计算机指令,其中,一条或多条计算机指令被处理器401执行时能够实现如下步骤:In a possible design, the structure of the positioning apparatus 300 shown in FIG. 3 may be implemented as an electronic device, and the electronic device may be a positioning device, such as a movable platform. As shown in FIG. 4 , the positioning device 400 may include: one or more processors 401 , one or more memories 402 and a visual sensor 403 . Among them, the visual sensor 403 is used to collect historical images and current images. The memory 402 is used to store a program that supports the electronic device to execute the positioning method provided in the embodiments shown in FIG. 1 to FIG. 2 . The processor 401 is configured to execute programs stored in the memory 402 . Specifically, the program includes one or more computer instructions, wherein the one or more computer instructions can implement the following steps when executed by the processor 401:
运行存储器402中存储的计算机程序以实现：获取视觉传感器采集的历史图像的第一图像描述信息和第一关键点描述信息，并获取采集历史图像时的可移动平台的第一位置信息；获取视觉传感器采集的当前图像，并基于特征提取模型获取当前图像的第二图像描述信息和第二关键点描述信息；基于历史图像的所述第一图像描述信息和所述第一关键点描述信息，以及当前图像的第二图像描述信息和第二关键点描述信息，确定多张历史图像与当前图像的匹配结果；根据匹配结果和历史图像的第一位置信息，确定采集当前图像时可移动平台的第二位置信息。The processor runs the computer program stored in the memory 402 to implement: acquiring first image description information and first key point description information of the historical images collected by the vision sensor, and acquiring first position information of the movable platform when the historical images were collected; acquiring the current image collected by the vision sensor, and acquiring second image description information and second key point description information of the current image based on the feature extraction model; determining matching results between the multiple historical images and the current image based on the first image description information and the first key point description information of the historical images and the second image description information and the second key point description information of the current image; and determining, according to the matching results and the first position information of the historical images, second position information of the movable platform when the current image is collected.
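By way of a non-limiting illustration only (the helper names, the top-k value and the Hamming-distance matching below are assumptions introduced for this sketch, not limitations of the disclosed embodiments), the matching-and-positioning flow executed by the processor could be organized roughly as follows in Python:

    import numpy as np

    def hamming_distance(a, b):
        # a, b: Boolean descriptor arrays of equal length
        return int(np.count_nonzero(a != b))

    def localize(current_desc, current_kp_desc, keyframes, top_k=3):
        # keyframes: list of dicts with 'img_desc', 'kp_desc', 'position'
        # (first image description / first key point descriptions / first position)
        # 1) coarse matching on whole-image descriptions
        scored = sorted(keyframes,
                        key=lambda kf: hamming_distance(current_desc, kf['img_desc']))
        candidates = scored[:top_k]
        # 2) fine matching: count key point descriptors with a close match
        def kp_matches(kps_a, kps_b, max_dist=64):
            return sum(1 for da in kps_a
                       if kps_b and min(hamming_distance(da, db) for db in kps_b) <= max_dist)
        best = max(candidates, key=lambda kf: kp_matches(current_kp_desc, kf['kp_desc']))
        # 3) the second position is derived from the matched keyframe's first position
        return best['position']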
具体的，特征提取模型包括特征提取层，图像描述信息生成层和关键点信息生成层；特征提取层用于基于卷积网络提取当前图像的共用特征信息；图像描述信息生成层用于基于共用特征信息生成第二图像描述信息；关键点信息生成层用于基于共用特征信息生成第二关键点描述信息。Specifically, the feature extraction model includes a feature extraction layer, an image description information generation layer and a key point information generation layer; the feature extraction layer is configured to extract common feature information of the current image based on a convolutional network; the image description information generation layer is configured to generate the second image description information based on the common feature information; and the key point information generation layer is configured to generate the second key point description information based on the common feature information.
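Purely as an illustrative sketch of such a three-part model (channel counts and layer sizes are assumptions, and the NetVLAD aggregation is replaced here by simple average pooling for brevity), one possible PyTorch structure is:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FeatureExtractionModel(nn.Module):
        def __init__(self, desc_dim=256, kp_dim=128):
            super().__init__()
            # shared feature extraction layer: several convolutional layers
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            )
            # image description information generation layer: two convolutional layers
            # (a NetVLAD-style aggregation would follow in the described design)
            self.img_head = nn.Sequential(
                nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
                nn.Conv2d(128, desc_dim, 1),
            )
            # key point information generation layer: one convolutional layer whose
            # number of kernels equals the key point descriptor dimension
            self.kp_head = nn.Conv2d(128, kp_dim, 1)

        def forward(self, image):
            shared = self.backbone(image)                      # common feature information
            img_desc = F.normalize(self.img_head(shared).mean(dim=(2, 3)), dim=1)
            kp_map = self.kp_head(shared)
            # bilinear upsampling back to input resolution gives per-pixel descriptors
            kp_desc = F.interpolate(kp_map, size=image.shape[2:], mode='bilinear',
                                    align_corners=False)
            return img_desc, kp_desc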
具体的,特征提取层用于基于卷积网络中的多个卷积层提取共用特征信息。Specifically, the feature extraction layer is used to extract common feature information based on multiple convolutional layers in the convolutional network.
具体的,图像信息生成层用于通过两层卷积层以及NetVLAD层,从共用特征信息提取第二图像描述信息。Specifically, the image information generation layer is used to extract the second image description information from the shared feature information through the two convolution layers and the NetVLAD layer.
具体的，图像信息生成层用于通过两层卷积层以及NetVLAD层，从共用特征信息提取到浮点数据类型的第二图像描述信息；图像信息生成层用于将浮点数据类型的第二图像描述信息转换为布尔数据类型的第二图像描述信息。Specifically, the image information generation layer is configured to extract second image description information of the floating-point data type from the common feature information through the two convolutional layers and the NetVLAD layer; and the image information generation layer is configured to convert the second image description information of the floating-point data type into second image description information of the Boolean data type.
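A minimal sketch of such a floating-point-to-Boolean conversion (the zero threshold and the Hamming-distance comparison are assumptions; any fixed binarization rule could serve) is:

    import numpy as np

    def to_boolean_descriptor(float_desc):
        # binarize: each dimension becomes True/False according to its sign
        return np.asarray(float_desc) > 0.0

    float_desc = np.random.randn(256).astype(np.float32)
    bool_desc = to_boolean_descriptor(float_desc)          # 256 bits instead of 256 floats
    other = to_boolean_descriptor(np.random.randn(256))
    distance = np.count_nonzero(bool_desc != other)        # cheap Hamming-distance matching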
具体的，关键点信息生成层用于基于一层卷积层以及双线性上采样，从共用特征信息提取第二关键点描述信息，一层卷积层的卷积核数量与第二关键点描述信息的数量相同。Specifically, the key point information generation layer is configured to extract the second key point description information from the common feature information based on one convolutional layer and bilinear upsampling, and the number of convolution kernels of the one convolutional layer is the same as the number of the second key point description information.
具体的，关键点信息生成层用于通过一层卷积层以及双线性上采样，从共用特征信息提取到浮点数据类型的第二关键点描述信息；关键点信息生成层用于将浮点数据类型的第二关键点描述信息转换为布尔数据类型的第二关键点描述信息。Specifically, the key point information generation layer is configured to extract second key point description information of the floating-point data type from the common feature information through one convolutional layer and bilinear upsampling; and the key point information generation layer is configured to convert the second key point description information of the floating-point data type into second key point description information of the Boolean data type.
此外，处理器401，还用于：确定当前图像中关键点的位置；关键点信息生成层用于通过一层卷积层得到所述共用特征信息的下采样信息；关键点信息生成层用于通过双线性上采样直接对下采样信息中对应位置的信息进行上采样，得到第二关键点描述信息。In addition, the processor 401 is further configured to: determine positions of key points in the current image; the key point information generation layer is configured to obtain down-sampled information of the common feature information through one convolutional layer; and the key point information generation layer is configured to directly up-sample, through bilinear upsampling, the information at the corresponding positions in the down-sampled information, to obtain the second key point description information.
此外,处理器401,还用于:根据当前图像中的多个关键点的顺序,将对应的多个第二关键点描述信息组合到一个向量中。In addition, the processor 401 is further configured to: combine the corresponding multiple second key point description information into a vector according to the sequence of the multiple key points in the current image.
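Illustratively (the key point coordinates, map sizes and flattening step below are assumptions for this sketch), the two preceding paragraphs can be combined: the down-sampled descriptor map is sampled only at the detected key point positions by bilinear interpolation, and the per-key-point descriptors are then concatenated in key point order into a single vector:

    import torch
    import torch.nn.functional as F

    def sample_kp_descriptors(kp_map, keypoints_xy, image_size):
        # kp_map: (1, C, h, w) down-sampled descriptor map from the key point head
        # keypoints_xy: (N, 2) pixel coordinates (x, y) of key points in the full image
        H, W = image_size
        norm = keypoints_xy.clone().float()
        norm[:, 0] = norm[:, 0] / (W - 1) * 2 - 1          # x -> [-1, 1]
        norm[:, 1] = norm[:, 1] / (H - 1) * 2 - 1          # y -> [-1, 1]
        grid = norm.view(1, -1, 1, 2)                       # (1, N, 1, 2)
        sampled = F.grid_sample(kp_map, grid, mode='bilinear', align_corners=True)
        return sampled.squeeze(3).squeeze(0).t()            # (N, C): one descriptor per key point

    kp_map = torch.randn(1, 128, 60, 80)                    # down-sampled common-feature output
    keypoints = torch.tensor([[320.0, 240.0], [100.0, 50.0]])
    descs = sample_kp_descriptors(kp_map, keypoints, image_size=(480, 640))
    combined = descs.flatten()                               # one vector, ordered by key point order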
具体的,处理器401,还用于:根据匹配结果中对应的第一位置信息与对应的第二位置信息的位置偏差,确定可移动平台的姿态;根据姿态,可移动平台从第二位置移动至第一位置,以实现可移动平台的自动返回。Specifically, the processor 401 is further configured to: determine the posture of the movable platform according to the position deviation between the corresponding first position information and the corresponding second position information in the matching result; according to the posture, the movable platform moves from the second position to the first position for automatic return of the movable platform.
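As a hedged illustration of the return step (the world-frame coordinates and the yaw/distance command interface are assumptions; an actual flight controller interface may differ), one step of flying back from the second position toward the matched first position could be computed as:

    import math

    def return_step(current_pos, target_pos):
        # current_pos / target_pos: (x, y, z) second and first position information
        dx = target_pos[0] - current_pos[0]
        dy = target_pos[1] - current_pos[1]
        dz = target_pos[2] - current_pos[2]
        yaw = math.atan2(dy, dx)                    # heading back toward the recorded position
        distance = math.sqrt(dx * dx + dy * dy + dz * dz)
        return yaw, distance                        # handed to the flight controller

    yaw, dist = return_step(current_pos=(10.0, 4.0, 30.0), target_pos=(0.0, 0.0, 30.0))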
此外，处理器401，还用于：通过第一训练数据，对初始特征提取层进行训练，生成训练后的特征提取层，作为特征提取模型中的特征提取层；第一训练数据包括对应于同一空间点的图像点对，该图像点对在表示为同一视觉场景的不同对应真实图像中；通过第一训练数据，对初始关键点信息生成层进行训练，生成训练后的关键点信息生成层，作为特征提取模型中的关键点信息生成层。In addition, the processor 401 is further configured to: train the initial feature extraction layer by using the first training data, and generate a trained feature extraction layer as the feature extraction layer in the feature extraction model, where the first training data includes image point pairs corresponding to the same spatial point, the image point pairs being located in different corresponding real images representing the same visual scene; and train the initial key point information generation layer by using the first training data, and generate a trained key point information generation layer as the key point information generation layer in the feature extraction model.
此外，处理器401，还用于：针对不同的视觉场景，获取每个视觉场景下的不同角度的真实图像；针对每个视觉场景，根据对应不同角度的真实图像，构建空间三维模型；基于空间点之间的相似度，从空间三维模型中选择空间点，以及获得选择后的每个空间点在真实图像中对应的真实图像点对；根据真实图像点对的采集位置之间的相似度，对真实图像点对进行选择，将选择出来的真实图像点对作为关键点对，从而得到第一训练数据。In addition, the processor 401 is further configured to: for different visual scenes, acquire real images of different angles in each visual scene; for each visual scene, construct a spatial three-dimensional model according to the real images corresponding to the different angles; select spatial points from the spatial three-dimensional model based on the similarity between spatial points, and obtain, for each selected spatial point, the corresponding real image point pair in the real images; and select the real image point pairs according to the similarity between the acquisition positions of the real image point pairs, and use the selected real image point pairs as key point pairs, so as to obtain the first training data.
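Purely as an illustrative outline of this data-selection logic (the data structures, thresholds and the direction of the similarity tests are assumptions; the 3D reconstruction itself, e.g. structure-from-motion, is assumed to be done by external tooling), the first training data could be assembled like this:

    import itertools

    def select_first_training_data(scene_models, point_similarity, pose_similarity,
                                   point_sim_thresh=0.8, pose_sim_thresh=0.9):
        # scene_models: one entry per visual scene, each a list of spatial points;
        # every point is {'xyz': ..., 'observations': [(image_id, camera_position, (u, v)), ...]}
        # point_similarity / pose_similarity: caller-supplied similarity functions in [0, 1]
        pairs = []
        for points in scene_models:
            kept = []
            for p in points:
                # keep spatial points sufficiently dissimilar from those already kept
                if all(point_similarity(p, q) < point_sim_thresh for q in kept):
                    kept.append(p)
            for p in kept:
                for (im_a, pos_a, uv_a), (im_b, pos_b, uv_b) in itertools.combinations(p['observations'], 2):
                    # keep only point pairs whose acquisition positions are not too similar
                    if pose_similarity(pos_a, pos_b) < pose_sim_thresh:
                        pairs.append(((im_a, uv_a), (im_b, uv_b)))
        return pairs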
此外，处理器401，具体用于：在初始关键点信息生成层中的浮点数据类型的损失函数上增加布尔数据类型的损失函数；通过第一训练数据、浮点数据类型的损失函数以及布尔数据类型的损失函数，对初始关键点信息生成层进行训练，生成训练后的关键点信息生成层。In addition, the processor 401 is specifically configured to: add a loss function of the Boolean data type to the loss function of the floating-point data type in the initial key point information generation layer; and train the initial key point information generation layer by using the first training data, the loss function of the floating-point data type and the loss function of the Boolean data type, to generate a trained key point information generation layer.
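As a hedged sketch of adding a Boolean (binarization) loss on top of a floating-point loss (the concrete loss terms and weighting below are assumptions, not the disclosed loss functions), training could combine the two terms as follows:

    import torch
    import torch.nn.functional as F

    def combined_descriptor_loss(desc_a, desc_b, bool_weight=0.1):
        # desc_a, desc_b: floating-point descriptors of matching key points, shape (N, D)
        float_loss = F.mse_loss(desc_a, desc_b)                  # matching descriptors stay close
        # Boolean loss: push each dimension away from the binarization boundary (0),
        # so the later float-to-Boolean conversion loses little information
        bool_loss = (1.0 - torch.tanh(desc_a.abs())).mean() + (1.0 - torch.tanh(desc_b.abs())).mean()
        return float_loss + bool_weight * bool_loss

    a = torch.randn(32, 128, requires_grad=True)
    b = torch.randn(32, 128, requires_grad=True)
    loss = combined_descriptor_loss(a, b)
    loss.backward()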
此外，处理器401，还用于：基于训练后的特征提取层，通过第二训练数据对初始图像描述信息生成层进行训练，生成训练后的图像描述信息生成层，作为特征提取模型中的图像描述信息生成层，其中，第二训练数据包括关键帧图像匹配对以及表示每个关键帧图像匹配对是否属于同一视觉场景的信息。In addition, the processor 401 is further configured to: based on the trained feature extraction layer, train the initial image description information generation layer by using the second training data, and generate a trained image description information generation layer as the image description information generation layer in the feature extraction model, where the second training data includes key frame image matching pairs and information indicating whether each key frame image matching pair belongs to the same visual scene.
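Illustratively (the binary-cross-entropy formulation over cosine similarity is an assumption made for this sketch only), the image description information generation layer could be trained from key frame image matching pairs labelled as same / different visual scene as follows:

    import torch
    import torch.nn.functional as F

    def image_description_loss(desc_a, desc_b, same_scene):
        # desc_a, desc_b: (B, D) global image descriptions of key frame image matching pairs
        # same_scene: (B,) float tensor, 1.0 if the pair belongs to the same visual scene
        sim = F.cosine_similarity(desc_a, desc_b)            # in [-1, 1]
        prob_same = ((sim + 1.0) / 2.0).clamp(1e-6, 1 - 1e-6)
        return F.binary_cross_entropy(prob_same, same_scene)

    da = F.normalize(torch.randn(8, 256, requires_grad=True), dim=1)
    db = F.normalize(torch.randn(8, 256, requires_grad=True), dim=1)
    labels = torch.randint(0, 2, (8,)).float()
    loss = image_description_loss(da, db, labels)
    loss.backward()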
此外，处理器401，还用于：获取真实图像，基于分类模型，从真实图像中确定出真实图像匹配对，作为关键帧图像匹配对，并确定各个真实图像匹配对是否属于同一视觉场景，从而获取到第二训练数据。In addition, the processor 401 is further configured to: acquire real images, determine real image matching pairs from the real images based on a classification model as key frame image matching pairs, and determine whether each real image matching pair belongs to the same visual scene, so as to obtain the second training data.
此外,处理器401,还用于:通过展示真实图像匹配对,响应于用户的确定操作,确定真实图像匹配对是否属于同一视觉场景,从而获取到第二训练数据。In addition, the processor 401 is further configured to: by displaying the real image matching pairs, in response to the user's determination operation, determine whether the real image matching pairs belong to the same visual scene, thereby acquiring the second training data.
此外，处理器401，还用于：从真实图像中随机选择真实图像匹配对，作为关键帧图像匹配对；通过展示随机选择的真实图像匹配对，响应于用户的确定操作，确定随机选择的真实图像匹配对是否属于同一视觉场景，从而获取到第二训练数据。In addition, the processor 401 is further configured to: randomly select real image matching pairs from the real images as key frame image matching pairs; and display the randomly selected real image matching pairs and, in response to a user's confirmation operation, determine whether the randomly selected real image matching pairs belong to the same visual scene, so as to obtain the second training data.
具体的，处理器401，具体用于：在初始图像描述信息生成层中的浮点数据类型的损失函数上增加布尔数据类型的损失函数；基于训练后的特征提取层，通过第二训练数据、浮点数据类型的损失函数以及布尔数据类型的损失函数，对初始图像描述信息生成层进行训练，生成训练后的图像描述信息生成层。Specifically, the processor 401 is specifically configured to: add a loss function of the Boolean data type to the loss function of the floating-point data type in the initial image description information generation layer; and, based on the trained feature extraction layer, train the initial image description information generation layer by using the second training data, the loss function of the floating-point data type and the loss function of the Boolean data type, to generate a trained image description information generation layer.
此外，在训练完特征提取模型后，处理器401，还用于：通过第三训练数据，对特征提取模型中的特征提取层，图像描述信息生成层和/或关键点信息生成层进行调整，第三训练数据包括关键帧图像匹配对以及关键帧图像匹配对中的关键点匹配对。In addition, after the feature extraction model is trained, the processor 401 is further configured to: adjust the feature extraction layer, the image description information generation layer and/or the key point information generation layer in the feature extraction model by using third training data, where the third training data includes key frame image matching pairs and key point matching pairs in the key frame image matching pairs.
此外，处理器401，还用于：当两个真实图像中具有的真实图像点对数量大于阈值的情况下，则将两个真实图像以及对应的真实图像点对作为关键帧图像匹配对以及关键点匹配对，从而得到第三训练数据。In addition, the processor 401 is further configured to: when the number of real image point pairs shared by two real images is greater than a threshold, use the two real images and the corresponding real image point pairs as a key frame image matching pair and key point matching pairs, so as to obtain the third training data.
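A minimal sketch of that selection rule (the threshold value and the input data structure are assumptions for illustration):

    def build_third_training_data(image_pairs, threshold=50):
        # image_pairs: list of (image_a, image_b, point_pairs), where point_pairs is the
        # list of real image point pairs shared by the two real images
        third_data = []
        for image_a, image_b, point_pairs in image_pairs:
            if len(point_pairs) > threshold:
                # the two images become a key frame image matching pair, and their shared
                # point pairs become the key point matching pairs
                third_data.append(((image_a, image_b), point_pairs))
        return third_data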
另外，本发明实施例提供了一种计算机可读存储介质，存储介质为计算机可读存储介质，该计算机可读存储介质中存储有程序指令，程序指令用于实现上述图1-图2的方法。In addition, an embodiment of the present invention provides a computer-readable storage medium; the storage medium is a computer-readable storage medium, the computer-readable storage medium stores program instructions, and the program instructions are used to implement the methods shown in FIG. 1 to FIG. 2 above.
本发明实施例提供的一种无人机;具体的,该无人机包括:机体以及图4所示的定位的设备,定位的设备设置在机体上。An embodiment of the present invention provides an unmanned aerial vehicle; specifically, the unmanned aerial vehicle includes: a body and a positioning device as shown in FIG. 4 , and the positioning device is provided on the body.
以上各个实施例中的技术方案、技术特征在互不相冲突的情况下均可以单独，或者进行组合，只要未超出本领域技术人员的认知范围，均属于本申请保护范围内的等同实施例。The technical solutions and technical features in the above embodiments may be used alone or in combination provided they do not conflict with one another; as long as they do not go beyond the cognitive scope of those skilled in the art, they all belong to equivalent embodiments falling within the protection scope of the present application.
在本发明所提供的几个实施例中，应该理解到，所揭露的相关检测装置(例如：IMU)和方法，可以通过其它的方式实现。例如，以上所描述的遥控装置实施例仅仅是示意性的，例如，所述模块或单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，例如多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口，遥控装置或单元的间接耦合或通信连接，可以是电性，机械或其它的形式。In the several embodiments provided by the present invention, it should be understood that the disclosed related detection apparatus (for example, an IMU) and method may be implemented in other manners. For example, the embodiments of the remote control apparatus described above are merely illustrative; for example, the division of the modules or units is only a division of logical functions, and there may be other division manners in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, remote control apparatuses or units, and may be in electrical, mechanical or other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
另外,在本发明各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得计算机处理器(processor)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器(ROM，Read-Only Memory)、随机存取存储器(RAM，Random Access Memory)、磁盘或者光盘等各种可以存储程序代码的介质。If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer processor to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
以上所述仅为本发明的实施例，并非因此限制本发明的专利范围，凡是利用本发明说明书及附图内容所作的等效结构或等效流程变换，或直接或间接运用在其他相关的技术领域，均同理包括在本发明的专利保护范围内。The above descriptions are merely embodiments of the present invention and are not intended to limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made by using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included in the patent protection scope of the present invention.
最后应说明的是：以上各实施例仅用以说明本发明的技术方案，而非对其限制；尽管参照前述各实施例对本发明进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分或者全部技术特征进行等同替换；而这些修改或者替换，并不使相应技术方案的本质脱离本发明各实施例技术方案的范围。Finally, it should be noted that the above embodiments are merely intended to describe the technical solutions of the present invention rather than to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements to some or all of the technical features therein, and such modifications or replacements do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (42)

  1. 一种定位的方法,其特征在于,所述方法应用于可移动平台,所述可移动平台包括视觉传感器,包括:A positioning method, characterized in that the method is applied to a movable platform, and the movable platform includes a vision sensor, including:
    获取所述视觉传感器采集的多张历史图像的第一图像描述信息和第一关键点描述信息,并获取采集所述历史图像时的所述可移动平台的第一位置信息;obtaining first image description information and first key point description information of a plurality of historical images collected by the visual sensor, and obtaining first position information of the movable platform when collecting the historical images;
    获取所述视觉传感器采集的当前图像,并基于特征提取模型获取所述当前图像的第二图像描述信息和第二关键点描述信息;acquiring the current image collected by the visual sensor, and acquiring the second image description information and the second key point description information of the current image based on the feature extraction model;
    基于所述历史图像的所述第一图像描述信息和所述第一关键点描述信息，以及所述当前图像的所述第二图像描述信息和所述第二关键点描述信息，确定多张所述历史图像与所述当前图像的匹配结果；determining matching results between the multiple historical images and the current image based on the first image description information and the first key point description information of the historical images, and the second image description information and the second key point description information of the current image;
    根据所述匹配结果和所述历史图像的所述第一位置信息,确定采集所述当前图像时所述可移动平台的第二位置信息。According to the matching result and the first position information of the historical image, the second position information of the movable platform when the current image is collected is determined.
  2. 根据权利要求1所述的方法,其特征在于,所述特征提取模型包括特征提取层,图像描述信息生成层和关键点信息生成层;The method according to claim 1, wherein the feature extraction model comprises a feature extraction layer, an image description information generation layer and a key point information generation layer;
    所述特征提取层用于基于卷积网络提取所述当前图像的共用特征信息;The feature extraction layer is used to extract the common feature information of the current image based on the convolutional network;
    所述图像描述信息生成层用于基于所述共用特征信息生成所述第二图像描述信息;The image description information generation layer is configured to generate the second image description information based on the common feature information;
    所述关键点信息生成层用于基于所述共用特征信息生成第二关键点描述信息。The key point information generation layer is configured to generate second key point description information based on the common feature information.
  3. 根据权利要求2所述的方法,其特征在于,所述特征提取层用于基于卷积网络提取所述当前图像的共用特征信息,包括:The method according to claim 2, wherein the feature extraction layer is configured to extract common feature information of the current image based on a convolutional network, comprising:
    所述特征提取层用于基于卷积网络中的多个卷积层提取所述共用特征信息。The feature extraction layer is used for extracting the common feature information based on multiple convolutional layers in the convolutional network.
  4. 根据权利要求2所述的方法,其特征在于,所述图像描述信息生成层用于基于所述共用特征信息生成所述第二图像描述信息,包括:The method according to claim 2, wherein the image description information generation layer is configured to generate the second image description information based on the common feature information, comprising:
    所述图像信息生成层用于通过两层卷积层以及NetVLAD层,从所述共用特征信息提取所述第二图像描述信息。The image information generation layer is configured to extract the second image description information from the common feature information through two convolution layers and a NetVLAD layer.
  5. 根据权利要求4所述的方法,其特征在于,所述图像信息生成层用于通过两层卷积层以及NetVLAD层,从所述共用特征信息提取所述第二图像描述信息,包括:The method according to claim 4, wherein the image information generation layer is configured to extract the second image description information from the common feature information through two convolution layers and a NetVLAD layer, comprising:
    所述图像信息生成层用于通过两层卷积层以及NetVLAD层,从所述共用特征信息提取到浮点数据类型的第二图像描述信息;The image information generation layer is used to extract the second image description information of the floating point data type from the shared feature information through two layers of convolution layers and the NetVLAD layer;
    所述图像信息生成层用于将浮点数据类型的第二图像描述信息转换为布尔数据类型的第二图像描述信息。The image information generation layer is used for converting the second image description information of the floating point data type into the second image description information of the Boolean data type.
  6. 根据权利要求2所述的方法,其特征在于,所述关键点信息生成层用于基于所述共用特征信息生成第二关键点描述信息,包括:The method according to claim 2, wherein the key point information generation layer is configured to generate second key point description information based on the common feature information, comprising:
    所述关键点信息生成层用于基于一层卷积层以及双线性上采样，从共用特征信息提取第二关键点描述信息，所述一层卷积层的卷积核数量与第二关键点描述信息的数量相同。The key point information generation layer is configured to extract the second key point description information from the common feature information based on one convolutional layer and bilinear upsampling, and the number of convolution kernels of the one convolutional layer is the same as the number of the second key point description information.
  7. 根据权利要求6所述的方法,其特征在于,所述关键点信息生成层用于通过一层卷积层以及双线性上采样,从共用特征信息提取第二关键点描述信息,包括:The method according to claim 6, wherein the key point information generation layer is configured to extract the second key point description information from the shared feature information through a convolution layer and bilinear upsampling, comprising:
    所述关键点信息生成层用于通过一层卷积层以及双线性上采样,从共用特征信息提取到浮点数据类型的第二关键点描述信息;The key point information generation layer is used to extract the second key point description information of the floating point data type from the shared feature information through a layer of convolution layer and bilinear upsampling;
    所述关键点信息生成层用于将浮点数据类型的第二关键点描述信息转换为布尔数据类型的第二关键点描述信息。The key point information generation layer is used to convert the second key point description information of the floating point data type into the second key point description information of the Boolean data type.
  8. 根据权利要求2所述的方法,其特征在于,所述关键点信息生成层用于基于一层卷积层以及双线性上采样,从共用特征信息提取第二关键点描述信息,包括:The method according to claim 2, wherein the key point information generation layer is configured to extract the second key point description information from the shared feature information based on one convolution layer and bilinear upsampling, comprising:
    确定当前图像中关键点的位置;Determine the location of key points in the current image;
    所述关键点信息生成层用于通过一层卷积层得到所述共用特征信息的下采样信息;The key point information generation layer is used to obtain down-sampling information of the shared feature information through a convolution layer;
    所述关键点信息生成层用于通过双线性上采样直接对所述下采样信息中对应位置的信息进行上采样,得到所述第二关键点描述信息。The key point information generation layer is used for directly up-sampling the information of the corresponding position in the down-sampling information through bilinear up-sampling to obtain the second key point description information.
  9. 根据权利要求2所述的方法,其特征在于,所述方法还包括:The method according to claim 2, wherein the method further comprises:
    根据所述当前图像中的多个关键点的顺序,将对应的多个第二关键点描述信息组合到一个向量中。According to the sequence of the multiple key points in the current image, the corresponding multiple second key point description information is combined into a vector.
  10. 根据权利要求1所述的方法,其特征在于,所述方法还包括:The method according to claim 1, wherein the method further comprises:
    根据所述匹配结果中对应的第一位置信息与对应的第二位置信息的位置偏差,确定可移动平台的姿态;Determine the posture of the movable platform according to the position deviation between the corresponding first position information and the corresponding second position information in the matching result;
    根据所述姿态，所述可移动平台从所述第二位置移动至第一位置，以实现所述可移动平台的自动返回。Based on the posture, the movable platform moves from the second position to the first position to achieve automatic return of the movable platform.
  11. 根据权利要求2所述的方法,其特征在于,所述方法还包括:The method according to claim 2, wherein the method further comprises:
    通过第一训练数据，对初始特征提取层进行训练，生成训练后的特征提取层，作为特征提取模型中的特征提取层；所述第一训练数据包括对应于同一空间点的图像点对，该图像点对在表示为同一视觉场景的不同对应真实图像中；training an initial feature extraction layer by using first training data, and generating a trained feature extraction layer as the feature extraction layer in the feature extraction model, wherein the first training data includes image point pairs corresponding to the same spatial point, the image point pairs being located in different corresponding real images representing the same visual scene;
    通过所述第一训练数据,对初始关键点信息生成层进行训练,生成训练后的关键点信息生成层,作为特征提取模型中的关键点信息生成层。Through the first training data, the initial key point information generation layer is trained, and the trained key point information generation layer is generated as the key point information generation layer in the feature extraction model.
  12. 根据权利要求11所述的方法,其特征在于,所述方法还包括:The method according to claim 11, wherein the method further comprises:
    针对不同的视觉场景,获取每个视觉场景下的不同角度的真实图像;For different visual scenes, obtain real images from different angles under each visual scene;
    针对每个视觉场景,根据对应不同角度的真实图像,构建空间三维模型;For each visual scene, build a three-dimensional spatial model according to the real images corresponding to different angles;
    基于空间点之间的相似度,从所述空间三维模型中选择空间点,以及获得选择后的每个空间点在真实图像中对应的真实图像点对;Based on the similarity between the spatial points, selecting spatial points from the three-dimensional spatial model, and obtaining a real image point pair corresponding to each selected spatial point in the real image;
    根据真实图像点对的采集位置之间的相似度,对真实图像点对进行选择,将选择出来的真实图像点对作为关键点对,从而得到第一训练数据。According to the similarity between the collection positions of the real image point pairs, the real image point pairs are selected, and the selected real image point pairs are used as key point pairs, thereby obtaining the first training data.
  13. 根据权利要求11所述的方法,其特征在于,所述通过所述第一训练数据,对初始关键点信息生成层进行训练,生成训练后的关键点信息生成层,包括:The method according to claim 11, wherein the first training data is used to train an initial key point information generation layer to generate a trained key point information generation layer, comprising:
    在所述初始关键点信息生成层中的浮点数据类型的损失函数上增加布尔 数据类型的损失函数;A loss function of Boolean data type is added to the loss function of floating point data type in the initial key point information generation layer;
    通过所述第一训练数据、浮点数据类型的损失函数以及布尔数据类型的损失函数,对所述初始关键点信息生成层进行训练,生成训练后的关键点信息生成层。The initial key point information generation layer is trained by using the first training data, the loss function of the floating point data type, and the loss function of the Boolean data type, and a trained key point information generation layer is generated.
  14. 根据权利要求2所述的方法,其特征在于,所述方法还包括:The method according to claim 2, wherein the method further comprises:
    基于训练后的特征提取层,通过所述第二训练数据对初始图像描述信息生成层进行训练,生成训练后的图像描述信息生成层,作为特征提取模型中的图像描述信息生成层,其中,所述第二训练数据包括关键帧图像匹配对以及表示每个关键帧图像匹配对是否属于同一视觉场景的信息。Based on the trained feature extraction layer, the initial image description information generation layer is trained by the second training data, and the trained image description information generation layer is generated as the image description information generation layer in the feature extraction model. The second training data includes key-frame image matching pairs and information indicating whether each key-frame image matching pair belongs to the same visual scene.
  15. 根据权利要求14所述的方法,其特征在于,所述方法还包括:The method of claim 14, wherein the method further comprises:
    获取真实图像,基于分类模型,从所述真实图像中确定出真实图像匹配对,作为关键帧图像匹配对,并确定各个真实图像匹配对的是否属于同一视觉场景,从而获取到第二训练数据。Acquire a real image, determine a real image matching pair from the real image based on the classification model, and use it as a key frame image matching pair, and determine whether each real image matching pair belongs to the same visual scene, thereby obtaining the second training data.
  16. 根据权利要求15所述的方法,其特征在于,所述方法还包括:The method of claim 15, wherein the method further comprises:
    通过展示所述真实图像匹配对,响应于用户的确定操作,确定所述真实图像匹配对是否属于同一视觉场景,从而获取到第二训练数据。By displaying the real image matching pairs, in response to the user's determination operation, it is determined whether the real image matching pairs belong to the same visual scene, thereby acquiring second training data.
  17. 根据权利要求15所述的方法,其特征在于,所述方法还包括:The method of claim 15, wherein the method further comprises:
    从所述真实图像中随机选择真实图像匹配对,作为关键帧图像匹配对;Randomly select real image matching pairs from the real images as key frame image matching pairs;
    通过展示随机选择的真实图像匹配对,响应于用户的确定操作,确定随机选择的真实图像匹配对是否属于同一视觉场景,从而获取到第二训练数据。By displaying the randomly selected real image matching pairs, in response to the user's determination operation, it is determined whether the randomly selected real image matching pairs belong to the same visual scene, thereby acquiring the second training data.
  18. 根据权利要求14所述的方法,其特征在于,所述基于训练后的特征提取层,通过所述第二训练数据对初始图像描述信息生成层进行训练,生成训练后的图像描述信息生成层,包括:The method according to claim 14, wherein, based on the trained feature extraction layer, the initial image description information generation layer is trained by the second training data, and the trained image description information generation layer is generated, include:
    在所述初始图像描述信息生成层中的浮点数据类型的损失函数上增加布尔数据类型的损失函数;adding a loss function of boolean data type to the loss function of floating point data type in the initial image description information generation layer;
    基于训练后的特征提取层，通过所述第二训练数据、浮点数据类型的损失函数以及布尔数据类型的损失函数，对所述初始图像描述信息生成层进行训练，生成训练后的图像描述信息生成层。based on the trained feature extraction layer, training the initial image description information generation layer by using the second training data, the loss function of the floating-point data type and the loss function of the Boolean data type, to generate a trained image description information generation layer.
  19. 根据权利要求14所述的方法,其特征在于,在训练完所述特征提取模型后,所述方法还包括:The method according to claim 14, wherein after training the feature extraction model, the method further comprises:
    通过第三训练数据，对所述特征提取模型中的特征提取层，图像描述信息生成层和/或关键点信息生成层进行调整，所述第三训练数据包括关键帧图像匹配对以及关键帧图像匹配对中的关键点匹配对。adjusting the feature extraction layer, the image description information generation layer and/or the key point information generation layer in the feature extraction model by using third training data, wherein the third training data includes key frame image matching pairs and key point matching pairs in the key frame image matching pairs.
  20. 根据权利要求12所述的方法,其特征在于,所述方法还包括:The method of claim 12, wherein the method further comprises:
    当两个真实图像中具有的真实图像点对数量大于阈值的情况下,则将所述两个真实图像以及对应的真实图像点对作为关键帧图像匹配对以及关键点匹配对,从而得到第三训练数据。When the number of real image point pairs in the two real images is greater than the threshold, the two real images and the corresponding real image point pairs are used as key frame image matching pairs and key point matching pairs, so as to obtain a third training data.
  21. 一种定位的设备,包括:存储器、处理器以及视觉传感器;A positioning device, comprising: a memory, a processor and a vision sensor;
    所述存储器,用于存储计算机程序;the memory for storing computer programs;
    所述视觉传感器,用于采集的历史图像以及当前图像;The visual sensor is used to collect historical images and current images;
    所述处理器调用所述计算机程序,以实现如下步骤:The processor invokes the computer program to implement the following steps:
    获取所述视觉传感器采集的多张历史图像的第一图像描述信息和第一关键点描述信息,并获取采集所述历史图像时的所述可移动平台的第一位置信息;obtaining first image description information and first key point description information of a plurality of historical images collected by the visual sensor, and obtaining first position information of the movable platform when collecting the historical images;
    获取所述视觉传感器采集的当前图像,并基于特征提取模型获取所述当前图像的第二图像描述信息和第二关键点描述信息;acquiring the current image collected by the visual sensor, and acquiring the second image description information and the second key point description information of the current image based on the feature extraction model;
    基于所述历史图像的所述第一图像描述信息和所述第一关键点描述信息，以及所述当前图像的所述第二图像描述信息和所述第二关键点描述信息，确定多张所述历史图像与所述当前图像的匹配结果；determining matching results between the multiple historical images and the current image based on the first image description information and the first key point description information of the historical images, and the second image description information and the second key point description information of the current image;
    根据所述匹配结果和所述历史图像的所述第一位置信息,确定采集所述当前图像时所述可移动平台的第二位置信息。According to the matching result and the first position information of the historical image, the second position information of the movable platform when the current image is collected is determined.
  22. 根据权利要求21所述的设备,其特征在于,所述特征提取模型包括特征提取层,图像描述信息生成层和关键点信息生成层;The device according to claim 21, wherein the feature extraction model comprises a feature extraction layer, an image description information generation layer and a key point information generation layer;
    所述特征提取层用于基于卷积网络提取所述当前图像的共用特征信息;The feature extraction layer is used to extract the common feature information of the current image based on the convolutional network;
    所述图像描述信息生成层用于基于所述共用特征信息生成所述第二图像描述信息;The image description information generation layer is configured to generate the second image description information based on the common feature information;
    所述关键点信息生成层用于基于所述共用特征信息生成第二关键点描述信息。The key point information generation layer is configured to generate second key point description information based on the common feature information.
  23. 根据权利要求22所述的设备,其特征在于,所述特征提取层用于基于卷积网络中的多个卷积层提取所述共用特征信息。The device according to claim 22, wherein the feature extraction layer is configured to extract the common feature information based on a plurality of convolutional layers in a convolutional network.
  24. 根据权利要求22所述的设备,其特征在于,所述图像信息生成层用于通过两层卷积层以及NetVLAD层,从所述共用特征信息提取所述第二图像描述信息。The device according to claim 22, wherein the image information generation layer is configured to extract the second image description information from the common feature information through two convolution layers and a NetVLAD layer.
  25. 根据权利要求24所述的设备,其特征在于,所述图像信息生成层用于通过两层卷积层以及NetVLAD层,从所述共用特征信息提取到浮点数据类型的第二图像描述信息;The device according to claim 24, wherein the image information generation layer is used to extract the second image description information of the floating point data type from the shared feature information through two convolution layers and a NetVLAD layer;
    所述图像信息生成层用于将浮点数据类型的第二图像描述信息转换为布尔数据类型的第二图像描述信息。The image information generation layer is used for converting the second image description information of the floating point data type into the second image description information of the Boolean data type.
  26. 根据权利要求22所述的设备，其特征在于，所述关键点信息生成层用于基于一层卷积层以及双线性上采样，从共用特征信息提取第二关键点描述信息，所述一层卷积层的卷积核数量与第二关键点描述信息的数量相同。The device according to claim 22, wherein the key point information generation layer is configured to extract the second key point description information from the common feature information based on one convolutional layer and bilinear upsampling, and the number of convolution kernels of the one convolutional layer is the same as the number of the second key point description information.
  27. 根据权利要求26所述的设备，其特征在于，所述关键点信息生成层用于通过一层卷积层以及双线性上采样，从共用特征信息提取到浮点数据类型的第二关键点描述信息；The device according to claim 26, wherein the key point information generation layer is configured to extract second key point description information of the floating-point data type from the common feature information through one convolutional layer and bilinear upsampling;
    所述关键点信息生成层用于将浮点数据类型的第二关键点描述信息转换为布尔数据类型的第二关键点描述信息。The key point information generation layer is used to convert the second key point description information of the floating point data type into the second key point description information of the Boolean data type.
  28. 根据权利要求22所述的设备,其特征在于,所述处理器,还用于:The device according to claim 22, wherein the processor is further configured to:
    确定当前图像中关键点的位置;Determine the position of key points in the current image;
    所述关键点信息生成层用于通过一层卷积层得到所述共用特征信息的下采样信息;The key point information generation layer is used to obtain down-sampling information of the shared feature information through a convolution layer;
    所述关键点信息生成层用于通过双线性上采样直接对所述下采样信息中对应位置的信息进行上采样,得到所述第二关键点描述信息。The key point information generation layer is used for directly up-sampling the information of the corresponding position in the down-sampling information through bilinear up-sampling to obtain the second key point description information.
  29. 根据权利要求22所述的设备，其特征在于，所述处理器，还用于：根据所述当前图像中的多个关键点的顺序，将对应的多个第二关键点描述信息组合到一个向量中。The device according to claim 22, wherein the processor is further configured to: combine the corresponding multiple pieces of second key point description information into one vector according to the sequence of the multiple key points in the current image.
  30. 根据权利要求21所述的设备,其特征在于,所述处理器,还用于:The device according to claim 21, wherein the processor is further configured to:
    根据所述匹配结果中对应的第一位置信息与对应的第二位置信息的位置偏差,确定可移动平台的姿态;Determine the posture of the movable platform according to the position deviation between the corresponding first position information and the corresponding second position information in the matching result;
    根据所述姿态，所述可移动平台从所述第二位置移动至第一位置，以实现所述可移动平台的自动返回。Based on the posture, the movable platform moves from the second position to the first position to achieve automatic return of the movable platform.
  31. 根据权利要求22所述的设备，其特征在于，所述处理器，还用于：通过第一训练数据，对初始特征提取层进行训练，生成训练后的特征提取层，作为特征提取模型中的特征提取层；所述第一训练数据包括对应于同一空间点的图像点对，该图像点对在表示为同一视觉场景的不同对应真实图像中；The device according to claim 22, wherein the processor is further configured to: train an initial feature extraction layer by using first training data, and generate a trained feature extraction layer as the feature extraction layer in the feature extraction model, wherein the first training data includes image point pairs corresponding to the same spatial point, the image point pairs being located in different corresponding real images representing the same visual scene;
    通过所述第一训练数据,对初始关键点信息生成层进行训练,生成训练后的关键点信息生成层,作为特征提取模型中的关键点信息生成层。Through the first training data, the initial key point information generation layer is trained, and the trained key point information generation layer is generated as the key point information generation layer in the feature extraction model.
  32. 根据权利要求31所述的设备,其特征在于,所述处理器,还用于:针对不同的视觉场景,获取每个视觉场景下的不同角度的真实图像;The device according to claim 31, wherein the processor is further configured to: for different visual scenes, obtain real images from different angles in each visual scene;
    针对每个视觉场景,根据对应不同角度的真实图像,构建空间三维模型;For each visual scene, build a three-dimensional spatial model according to the real images corresponding to different angles;
    基于空间点之间的相似度,从所述空间三维模型中选择空间点,以及获得选择后的每个空间点在真实图像中对应的真实图像点对;Based on the similarity between the spatial points, selecting spatial points from the three-dimensional spatial model, and obtaining a real image point pair corresponding to each selected spatial point in the real image;
    根据真实图像点对的采集位置之间的相似度,对真实图像点对进行选择,将选择出来的真实图像点对作为关键点对,从而得到第一训练数据。According to the similarity between the collection positions of the real image point pairs, the real image point pairs are selected, and the selected real image point pairs are used as key point pairs, thereby obtaining the first training data.
  33. 根据权利要求31所述的设备,其特征在于,所述处理器,具体用于: 在所述初始关键点信息生成层中的浮点数据类型的损失函数上增加布尔数据类型的损失函数;The device according to claim 31, wherein the processor is specifically configured to: add a loss function of Boolean data type to the loss function of floating point data type in the initial key point information generation layer;
    通过所述第一训练数据、浮点数据类型的损失函数以及布尔数据类型的损失函数,对所述初始关键点信息生成层进行训练,生成训练后的关键点信息生成层。The initial key point information generation layer is trained by using the first training data, the loss function of the floating point data type, and the loss function of the Boolean data type, and a trained key point information generation layer is generated.
  34. 根据权利要求32所述的设备，其特征在于，所述处理器，还用于：基于训练后的特征提取层，通过所述第二训练数据对初始图像描述信息生成层进行训练，生成训练后的图像描述信息生成层，作为特征提取模型中的图像描述信息生成层，其中，所述第二训练数据包括关键帧图像匹配对以及表示每个关键帧图像匹配对是否属于同一视觉场景的信息。The device according to claim 32, wherein the processor is further configured to: based on the trained feature extraction layer, train an initial image description information generation layer by using the second training data, and generate a trained image description information generation layer as the image description information generation layer in the feature extraction model, wherein the second training data includes key frame image matching pairs and information indicating whether each key frame image matching pair belongs to the same visual scene.
  35. 根据权利要求34所述的设备,其特征在于,所述处理器,还用于:获取真实图像,基于分类模型,从所述真实图像中确定出真实图像匹配对,作为关键帧图像匹配对,并确定各个真实图像匹配对的是否属于同一视觉场景,从而获取到第二训练数据。The device according to claim 34, wherein the processor is further configured to: obtain a real image, and determine a real image matching pair from the real image based on a classification model, as a key frame image matching pair, And it is determined whether each real image matching pair belongs to the same visual scene, so as to obtain the second training data.
  36. 根据权利要求35所述的设备,其特征在于,所述处理器,还用于:通过展示所述真实图像匹配对,响应于用户的确定操作,确定所述真实图像匹配对是否属于同一视觉场景,从而获取到第二训练数据。The device according to claim 35, wherein the processor is further configured to: by displaying the real image matching pairs, in response to a user's determination operation, determine whether the real image matching pairs belong to the same visual scene , so as to obtain the second training data.
  37. 根据权利要求35所述的设备,其特征在于,所述处理器,还用于:从所述真实图像中随机选择真实图像匹配对,作为关键帧图像匹配对;The device according to claim 35, wherein the processor is further configured to: randomly select real image matching pairs from the real images as key frame image matching pairs;
    通过展示随机选择的真实图像匹配对,响应于用户的确定操作,确定随机选择的真实图像匹配对是否属于同一视觉场景,从而获取到第二训练数据。By displaying the randomly selected real image matching pairs, in response to the user's determination operation, it is determined whether the randomly selected real image matching pairs belong to the same visual scene, thereby acquiring the second training data.
  38. 根据权利要求34所述的设备,其特征在于,所述处理器,具体用于:在所述初始图像描述信息生成层中的浮点数据类型的损失函数上增加布尔数据类型的损失函数;The device according to claim 34, wherein the processor is specifically configured to: add a loss function of Boolean data type to the loss function of floating point data type in the initial image description information generation layer;
    基于训练后的特征提取层，通过所述第二训练数据、浮点数据类型的损失函数以及布尔数据类型的损失函数，对所述初始图像描述信息生成层进行训练，生成训练后的图像描述信息生成层。based on the trained feature extraction layer, train the initial image description information generation layer by using the second training data, the loss function of the floating-point data type and the loss function of the Boolean data type, to generate a trained image description information generation layer.
  39. 根据权利要求34所述的设备，其特征在于，在训练完所述特征提取模型后，所述处理器，还用于：通过第三训练数据，对所述特征提取模型中的特征提取层，图像描述信息生成层和/或关键点信息生成层进行调整，所述第三训练数据包括关键帧图像匹配对以及关键帧图像匹配对中的关键点匹配对。The device according to claim 34, wherein after the feature extraction model is trained, the processor is further configured to: adjust the feature extraction layer, the image description information generation layer and/or the key point information generation layer in the feature extraction model by using third training data, wherein the third training data includes key frame image matching pairs and key point matching pairs in the key frame image matching pairs.
  40. 根据权利要求32所述的设备，其特征在于，所述处理器，还用于：当两个真实图像中具有的真实图像点对数量大于阈值的情况下，则将所述两个真实图像以及对应的真实图像点对作为关键帧图像匹配对以及关键点匹配对，从而得到第三训练数据。The device according to claim 32, wherein the processor is further configured to: when the number of real image point pairs shared by two real images is greater than a threshold, use the two real images and the corresponding real image point pairs as a key frame image matching pair and key point matching pairs, so as to obtain third training data.
  41. 一种无人机,其特征在于,包括:机体以及如权利要求21-40所述的设备。An unmanned aerial vehicle, characterized by comprising: an airframe and the device according to claims 21-40.
  42. 一种计算机可读存储介质，其特征在于，所述存储介质为计算机可读存储介质，该计算机可读存储介质中存储有程序指令，所述程序指令用于实现权利要求1-20中任意一项所述的定位的方法。A computer-readable storage medium, wherein the storage medium is a computer-readable storage medium, the computer-readable storage medium stores program instructions, and the program instructions are used to implement the positioning method according to any one of claims 1 to 20.
PCT/CN2020/137313 2020-12-17 2020-12-17 Positioning method and device, and unmanned aerial vehicle and storage medium WO2022126529A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/137313 WO2022126529A1 (en) 2020-12-17 2020-12-17 Positioning method and device, and unmanned aerial vehicle and storage medium
CN202080069130.4A CN114556425A (en) 2020-12-17 2020-12-17 Positioning method, positioning device, unmanned aerial vehicle and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/137313 WO2022126529A1 (en) 2020-12-17 2020-12-17 Positioning method and device, and unmanned aerial vehicle and storage medium

Publications (1)

Publication Number Publication Date
WO2022126529A1 true WO2022126529A1 (en) 2022-06-23

Family

ID=81667972

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/137313 WO2022126529A1 (en) 2020-12-17 2020-12-17 Positioning method and device, and unmanned aerial vehicle and storage medium

Country Status (2)

Country Link
CN (1) CN114556425A (en)
WO (1) WO2022126529A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114677444B (en) * 2022-05-30 2022-08-26 杭州蓝芯科技有限公司 Optimized visual SLAM method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103256931A (en) * 2011-08-17 2013-08-21 清华大学 Visual navigation system of unmanned planes
US20160132057A1 (en) * 2013-07-09 2016-05-12 Duretek Inc. Method for constructing air-observed terrain data by using rotary wing structure
WO2015143615A1 (en) * 2014-03-24 2015-10-01 深圳市大疆创新科技有限公司 Method and apparatus for correcting aircraft state in real time
CN107209854A (en) * 2015-09-15 2017-09-26 深圳市大疆创新科技有限公司 For the support system and method that smoothly target is followed
CN110139038A (en) * 2019-05-22 2019-08-16 深圳市道通智能航空技术有限公司 It is a kind of independently to surround image pickup method, device and unmanned plane

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116858215A (en) * 2023-09-05 2023-10-10 武汉大学 AR navigation map generation method and device
CN116858215B (en) * 2023-09-05 2023-12-05 武汉大学 AR navigation map generation method and device
CN118097796A (en) * 2024-04-28 2024-05-28 中国人民解放军联勤保障部队第九六四医院 Gesture detection analysis system and method based on visual recognition

Also Published As

Publication number Publication date
CN114556425A (en) 2022-05-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20965545; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20965545; Country of ref document: EP; Kind code of ref document: A1)