WO2023179342A1 - 重定位方法及相关设备 - Google Patents

重定位方法及相关设备 (Relocation method and related device)

Info

Publication number
WO2023179342A1
Authority
WO
WIPO (PCT)
Prior art keywords
image frame
current image
frame
feature point
feature
Prior art date
Application number
PCT/CN2023/079654
Other languages
English (en)
French (fr)
Inventor
郭亨凯
Original Assignee
北京字跳网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司 filed Critical 北京字跳网络技术有限公司
Publication of WO2023179342A1 publication Critical patent/WO2023179342A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/269Analysis of motion using gradient-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30244Camera pose

Definitions

  • the present disclosure relates to the field of computer vision technology, and in particular to a relocation method, device, electronic equipment, storage medium and program product.
  • Simultaneous Localization and Mapping (SLAM) means that a robot equipped with specific sensors estimates the pose of those sensors during motion and simultaneously models the surrounding environment, without prior information about the environment.
  • When the sensors are mainly cameras, SLAM can be called visual SLAM (VSLAM).
  • SLAM technology has been researched and developed for more than thirty years, and researchers have done a great deal of work. In the past ten years, with the development of computer vision, VSLAM has gained favor in academia and industry due to its advantages such as low hardware cost, light weight and high precision.
  • SLAM technology has been widely used in various augmented reality applications, such as plane detection and plane tracking.
  • However, due to the presence of noise, the above plane tracking results may contain errors.
  • At the same time, the incremental frame-to-frame matching adopted by SLAM technology may also cause errors to accumulate, so that the plane tracking results drift after a period of use. Therefore, how to eliminate the accumulation of errors during the plane tracking process of SLAM has become one of the key issues that SLAM technology needs to solve.
  • In view of this, embodiments of the present disclosure provide a relocation method that can accurately determine the camera pose during plane tracking and eliminate the accumulation of errors in the plane tracking process, thereby ensuring the accuracy of plane tracking.
  • According to some embodiments of the present disclosure, the above relocation method may include: in response to determining that a current image frame satisfies a relocation condition, obtaining feature points of the current image frame and descriptors of each feature point; based on the feature points of the current image frame and the descriptors of each feature point, performing feature matching between the current image frame and each saved key frame respectively, to obtain feature point pairs after matching the current image frame with each key frame; determining the matching degree between the current image frame and each key frame based on the feature point pairs; determining the key frame with the highest matching degree with the current image frame as a target key frame; and replacing the camera pose corresponding to the current image frame with the camera pose corresponding to the target key frame.
  • Based on the above relocation method, embodiments of the present disclosure further provide a relocation device, including:
  • a first feature point acquisition module configured to acquire the feature points of the current image frame and the descriptors of each feature point in response to determining that the relocation condition is met;
  • a first feature matching module, configured to perform feature matching between the current image frame and each saved key frame respectively, based on the feature points of the current image frame and the descriptors of each feature point, to obtain feature point pairs after matching the current image frame with each key frame;
  • a matching degree determination module, configured to determine the matching degree between the current image frame and each key frame based on the feature point pairs;
  • a target key frame determination module, configured to determine the key frame with the highest matching degree with the current image frame as the target key frame; and
  • a pose replacement module is configured to replace the camera pose corresponding to the current image frame with the camera pose corresponding to the target key frame.
  • embodiments of the present disclosure also provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • When the processor executes the program, the above-mentioned relocation method is implemented.
  • Embodiments of the present disclosure also provide a non-transitory computer-readable storage medium that stores computer instructions, the computer instructions being used to cause a computer to perform the relocation method described above.
  • An embodiment of the present disclosure also provides a computer program product, which includes computer program instructions.
  • When the computer program instructions are run on a computer, they cause the computer to perform the above relocation method.
  • It can be seen from the above that, during repeated camera motion, the camera pose may drift due to error accumulation, resulting in drift of the plane tracking results as well.
  • With the relocation method and device provided by the present disclosure, when the camera moves back to the pose corresponding to a previously saved key frame, that key frame can be accurately identified, and the camera pose corresponding to that key frame can be used to replace the camera pose corresponding to the current image frame.
  • The camera pose is thereby directly pulled back to the camera pose corresponding to the previously saved key frame, which eliminates the error accumulation in the plane tracking process, solves the drift problem of plane tracking caused by error accumulation, and ensures the accuracy of plane tracking.
  • Figure 1 shows the implementation process of retaining key frames in the relocation method according to some embodiments of the present disclosure
  • Figure 2 shows the implementation process of camera pose relocation based on retained key frames according to some embodiments of the present disclosure
  • Figure 3 shows a specific implementation process for determining the matching degree between the second image frame and the key frame according to some embodiments of the present disclosure
  • Figure 4 shows a schematic diagram of the internal structure of a relocation device according to some embodiments of the present disclosure
  • Figure 5 shows a schematic diagram of the internal structure of a relocation device according to other embodiments of the present disclosure.
  • FIG. 6 shows a more specific hardware structure diagram of an electronic device provided by this embodiment.
  • As mentioned above, SLAM technology uses incremental frame-to-frame matching for plane tracking.
  • During plane tracking, the camera pose corresponding to each image frame in a video can be obtained.
  • Specifically, during plane tracking, feature extraction may first be performed on the current image frame to obtain multiple feature points of the current image frame and the descriptors of each feature point; the feature points of the current image frame are then matched with the feature points of its previous image frame; next, the mapping relationship between the feature points of the current image frame and those of the previous image frame is determined based on the feature matching results, the camera pose corresponding to the current image frame is determined from this mapping relationship, and the planes in the image frame are further tracked.
  • The above mapping relationship may be, for example, a homography matrix between the two image frames or a fundamental matrix between the two image frames. Due to the presence of noise, the camera pose and plane tracking results obtained for each image frame through the above method may contain errors. And since these results are obtained from the relationship between the current image frame and its previous image frame, after the above plane tracking process runs for a period of time, errors may accumulate, resulting in serious drift of the plane tracking results.
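By way of illustration only, the following minimal sketch shows the incremental frame-to-frame step described above: features of the current frame are matched to the previous frame, a homography is estimated from the matched points, and candidate relative camera motions are recovered from it. Python with OpenCV and NumPy, the camera intrinsic matrix K and the function name are assumptions for illustration; the disclosure does not prescribe any particular library.

```python
import cv2
import numpy as np

def relative_motion_from_frames(prev_gray, curr_gray, K):
    """Illustrative frame-to-frame step: match features between consecutive
    frames, estimate a homography from the matched points, and decompose it
    into candidate relative camera motions.  K is the 3x3 camera intrinsic
    matrix (an assumed input; the disclosure does not specify it)."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    if des1 is None or des2 is None:
        return None  # not enough texture to extract features

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    if len(matches) < 4:
        return None  # a homography needs at least 4 correspondences

    pts_prev = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts_curr = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Robust homography between the previous frame and the current frame.
    H, _ = cv2.findHomography(pts_prev, pts_curr, cv2.RANSAC, 3.0)
    if H is None:
        return None  # the mapping could not be estimated for this frame

    # Candidate (rotation, translation, plane normal) decompositions of H.
    _, rotations, translations, normals = cv2.decomposeHomographyMat(H, K)
    return H, rotations, translations, normals
```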
  • some embodiments of the present disclosure provide a relocation method that can accurately determine the camera pose during the plane tracking process, eliminate error accumulation in the plane tracking process, and ensure the accuracy of the plane tracking.
  • the above relocation method can be implemented by a planar tracking device.
  • the above-mentioned planar tracking device may be an electronic device with computing capabilities.
  • the above-mentioned planar tracking device can also display an interactive interface that can interact with the user through the display screen, thereby providing the user with video or image processing functions.
  • the relocation method described in the embodiment of the present disclosure is usually executed after plane tracking of the current image frame, and mainly includes two parts. Among them, the first part is to retain key frames; the second part is to reposition the camera pose based on the retained key frames. The content of the above two parts will be described in detail below.
  • Figure 1 shows the implementation process of retaining key frames in the relocation method according to the embodiment of the present disclosure. As shown in Figure 1, the method may include the following steps:
  • In step 102, in response to determining that the first image frame satisfies the key frame preliminary screening condition, the feature points of the first image frame and the descriptors of each feature point are obtained.
  • the above-mentioned first image frame refers to any image frame in the video that currently requires planar tracking, that is, it represents the current image frame to be processed. For convenience of description, it is called the first image frame in this embodiment.
  • The above key frame preliminary screening condition is a preset condition for starting the key-frame retention part of the operations. That is, when the current image frame is determined to satisfy the key frame preliminary screening condition, the key-frame retention operation is started and the subsequent process is performed; if the current image frame is determined not to satisfy the key frame preliminary screening condition, the subsequent process is not executed.
  • In some embodiments of the present disclosure, the key frame preliminary screening condition may include: determining that the differences between the camera pose corresponding to the first image frame and the camera poses corresponding to the saved key frames are all greater than a preset pose difference threshold.
  • The above pose difference threshold may include a distance difference threshold and a viewing-angle difference threshold. Specifically, if it is determined that the distance between the camera pose corresponding to the first image frame and the camera pose corresponding to any saved key frame exceeds the distance difference threshold and/or that their viewing-angle difference exceeds the viewing-angle difference threshold, it can be determined that the differences between the camera pose corresponding to the first image frame and the camera poses corresponding to the saved key frames are greater than the preset pose difference threshold; that is, the first image frame satisfies the key frame preliminary screening condition. This case applies to automatic key frame selection by the machine. Usually, the initial image frame of a video can also be automatically set as the first key frame.
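A minimal sketch of this pose-difference pre-screening, assuming Python with NumPy. The pose representation (a rotation matrix plus a camera center), the threshold values, and the reading that the difference must hold against every saved key frame are illustrative assumptions, not part of the original disclosure.

```python
import numpy as np

def pose_difference_exceeds(R1, c1, R2, c2,
                            dist_thresh=0.2, angle_thresh_deg=15.0):
    """Return True if two camera poses differ by more than the distance
    and/or viewing-angle thresholds.  R1, R2 are 3x3 rotation matrices and
    c1, c2 are camera centers in world coordinates; threshold values are
    illustrative."""
    distance = np.linalg.norm(np.asarray(c1, dtype=float) - np.asarray(c2, dtype=float))
    # Angle of the relative rotation between the two orientations.
    cos_angle = np.clip((np.trace(R1.T @ R2) - 1.0) / 2.0, -1.0, 1.0)
    angle_deg = np.degrees(np.arccos(cos_angle))
    return distance > dist_thresh or angle_deg > angle_thresh_deg

def satisfies_keyframe_prescreening(current_pose, saved_keyframe_poses):
    """Pre-screening check: the current frame qualifies only if its pose
    differs sufficiently from every saved key frame pose.  Poses are
    (R, c) tuples; an empty key frame list lets the first frame through."""
    if not saved_keyframe_poses:
        return True  # e.g. the initial frame of a video becomes the first key frame
    Rc, cc = current_pose
    return all(pose_difference_exceeds(Rc, cc, Rk, ck)
               for (Rk, ck) in saved_keyframe_poses)
```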
  • In other embodiments of the present disclosure, the key frame preliminary screening condition may include: detecting that the user taps the screen of the planar tracking device. This case applies to manual key frame selection.
  • When watching a video through the screen of the planar tracking device, the user can manually determine where the key frames are and, upon determining that the currently displayed image frame is a key frame, tap the screen of the planar tracking device to start the key-frame retention operation.
  • the camera pose corresponding to the above-mentioned first image frame can be obtained through the aforementioned plane tracking process, and will not be described again here.
  • Specifically, in step 102 the planar tracking device may use any computer-vision image feature extraction method to perform feature extraction on the first image frame, so as to obtain the feature points of the first image frame and the descriptors of each feature point.
  • For example, the planar tracking device may use the Scale-Invariant Feature Transform (SIFT) algorithm, the ORB (Oriented FAST and Rotated BRIEF) algorithm, the Speeded-Up Robust Features (SURF) algorithm, or other methods to perform feature extraction on the first image frame and obtain the feature points of the first image frame and the descriptors of each feature point.
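A minimal feature-extraction sketch for the step above. OpenCV is assumed purely for illustration; the disclosure only requires some feature extractor (SIFT, ORB, SURF, ...) and does not mandate a particular implementation.

```python
import cv2

def extract_features(image_bgr, method="ORB"):
    """Extract feature points and descriptors from an image frame."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    if method == "SIFT":
        detector = cv2.SIFT_create()               # float descriptors
    else:
        detector = cv2.ORB_create(nfeatures=2000)  # binary descriptors
    keypoints, descriptors = detector.detectAndCompute(gray, None)
    return keypoints, descriptors
```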
  • In some other embodiments of the present disclosure, after the feature points of the first image frame and the descriptors of each feature point have been obtained, it may further be determined whether the number of feature points of the first image frame is less than a preset feature point number threshold. In response to determining that the number of feature points of the first image frame is less than the feature point number threshold, it can be determined that the first image frame is not a key frame, and the process ends; in response to determining that the number of feature points of the first image frame is greater than or equal to the feature point number threshold, the following step 104 can be performed.
  • In step 104, feature matching is performed between the first image frame and the saved reference image frame based on the feature points of the first image frame and the descriptors of each feature point, to obtain matched feature point pairs.
  • the above-mentioned reference image frame may be an image frame before the above-mentioned first image frame that is processed and saved by the above-mentioned planar tracking device.
  • the reference image frame may be an image frame preceding the first image frame.
  • the reference image frame may also be a key frame preceding the first image frame.
  • each feature point pair includes a feature point in the first image frame and a feature point in the reference image frame that corresponds to the feature point in the first image frame.
  • the above-mentioned planar tracking device can perform feature matching based on the descriptors of each feature point.
  • the above-mentioned planar tracking device may also use an optical flow tracking algorithm to track the feature points in the above-mentioned first image frame to the feature points in the above-mentioned reference image frame. This disclosure does not limit the feature matching method specifically adopted in step 104 above.
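A short sketch of descriptor-based matching for this step, assuming OpenCV. The function name and the brute-force matcher are illustrative choices; the disclosure leaves the specific matching method open (e.g. optical-flow tracking is equally allowed).

```python
import cv2

def match_descriptors(des_a, des_b, binary=True):
    """Match two sets of descriptors (as in step 104): brute-force matching
    with cross-checking, Hamming distance for binary descriptors (e.g. ORB)
    and L2 distance for float descriptors (e.g. SIFT)."""
    norm = cv2.NORM_HAMMING if binary else cv2.NORM_L2
    matcher = cv2.BFMatcher(norm, crossCheck=True)
    matches = matcher.match(des_a, des_b)
    # Each DMatch links a feature in the first set (queryIdx) to one in the second (trainIdx).
    return sorted(matches, key=lambda m: m.distance)
```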
  • In step 106, the homography matrix between the first image frame and the reference image frame is estimated based on the matched feature point pairs.
  • the above-mentioned planar tracking device may determine the homography matrix between the above-mentioned first image frame and the above-mentioned reference image frame through a random sampling consensus algorithm (Random Sample Consensus, RANSAC).
  • RANSAC is an algorithm first proposed by Fischler and Bolles in 1981. This algorithm calculates the mathematical model parameters of the data based on a sample data set containing abnormal data. Currently, the RANSAC algorithm is often used to find the best matching model in computer vision matching problems. Corresponding to the embodiment of the present disclosure, the best matching model obtained through the RANSAC algorithm using the above matched feature point pairs is the homography matrix described in this embodiment.
  • Specifically, the process of determining the homography matrix between the first image frame and the reference image frame through the RANSAC algorithm may include: first, taking the set of the above feature point pairs as a set P; then, randomly selecting 4 feature point pairs from the set P and estimating a model M from the selected 4 feature point pairs; next, for each remaining feature point pair in the set P, calculating its distance to the model M.
  • When the distance exceeds a set first threshold, the feature point pair is considered an outlier (outer point); when the distance does not exceed this threshold, the feature point pair is considered an inlier (inner point). After all remaining feature point pairs in the set P have been evaluated, the number mi of inliers corresponding to the model M is recorded. After repeating the above process k times, the model M corresponding to the largest mi is selected as the final result. Of course, if after repeating the above process k times the mi corresponding to every model M is less than another set second threshold, the estimation is considered to have failed; that is, the homography matrix between the first image frame and the reference image frame cannot be obtained.
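A sketch mirroring the RANSAC procedure described above (the set P, 4 random pairs, model M, inlier count mi, k repetitions, and a second threshold that signals failure). The iteration count, both thresholds, and the use of OpenCV and NumPy are assumptions for illustration.

```python
import cv2
import numpy as np

def ransac_homography(pts_a, pts_b, k=500, first_thresh=3.0, second_thresh=30):
    """Estimate a homography from matched point pairs following the RANSAC
    procedure described above.  pts_a, pts_b: corresponding points, shape (N, 2).
    Returns (H, inlier_count), or (None, 0) when estimation fails."""
    pts_a = np.asarray(pts_a, dtype=np.float32)
    pts_b = np.asarray(pts_b, dtype=np.float32)
    n = len(pts_a)
    if n < 4:
        return None, 0

    best_H, best_inliers = None, 0
    rng = np.random.default_rng()
    ones = np.ones((n, 1), dtype=np.float32)
    for _ in range(k):
        idx = rng.choice(n, size=4, replace=False)
        try:
            # Model M: the exact homography defined by the 4 sampled pairs.
            M = cv2.getPerspectiveTransform(pts_a[idx], pts_b[idx])
        except cv2.error:
            continue  # degenerate sample (e.g. collinear points)
        # Project all points of the set P through M and measure the transfer error.
        proj = np.hstack([pts_a, ones]) @ M.T
        w = np.where(np.abs(proj[:, 2:3]) < 1e-12, 1e-12, proj[:, 2:3])
        err = np.linalg.norm(proj[:, :2] / w - pts_b, axis=1)
        inliers = int(np.sum(err < first_thresh))   # "mi" in the description above
        if inliers > best_inliers:
            best_H, best_inliers = M, inliers

    if best_inliers < second_thresh:
        return None, 0  # estimation failed: no candidate model has enough inliers
    return best_H, best_inliers
```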
  • In step 108, in response to determining that the homography matrix can be estimated, the first image frame is determined to be a key frame, and the feature points of the first image frame, the descriptors of each feature point, and the camera pose corresponding to the first image frame are recorded.
  • the above-mentioned step 108 may further include: in response to determining that the above-mentioned homography matrix cannot be estimated, it may be determined that the above-mentioned first image frame is not a key frame, and the above-mentioned process ends.
  • Through the method shown in Figure 1, a series of key frames can be determined from the image frames of the video.
  • These key frames usually correspond to relatively key camera poses; for example, there is usually a certain distance and/or viewing-angle difference between the camera poses corresponding to these key frames. In this way, in subsequent operations, the camera pose can be relocated using these recorded key frames.
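A compact orchestration of the key-frame retention flow (steps 102 to 108), reusing the illustrative helpers sketched above (extract_features, match_descriptors, ransac_homography, satisfies_keyframe_prescreening). The feature-count threshold and the key-frame record layout are assumptions, not elements of the original disclosure.

```python
import numpy as np

MIN_FEATURES = 100
keyframes = []   # each entry: keypoints, descriptors and the camera pose (R, c)

def maybe_retain_keyframe(frame_bgr, camera_pose, reference_frame):
    """Pre-screen the frame, extract features, match against the reference
    frame, try to estimate a homography, and record a key frame on success."""
    if not satisfies_keyframe_prescreening(
            camera_pose, [(kf["R"], kf["c"]) for kf in keyframes]):
        return False
    kps, des = extract_features(frame_bgr)
    if des is None or len(kps) < MIN_FEATURES:
        return False                      # too few feature points: not a key frame
    matches = match_descriptors(des, reference_frame["descriptors"])
    if len(matches) < 4:
        return False
    pts_cur = np.float32([kps[m.queryIdx].pt for m in matches])
    pts_ref = np.float32([reference_frame["keypoints"][m.trainIdx].pt for m in matches])
    H, _ = ransac_homography(pts_cur, pts_ref)
    if H is None:
        return False                      # the homography cannot be estimated
    R, c = camera_pose
    keyframes.append({"keypoints": kps, "descriptors": des, "R": R, "c": c})
    return True
```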
  • Figure 2 shows the implementation process of camera pose relocation based on retained key frames according to an embodiment of the present disclosure. As shown in Figure 2, the method may include the following steps:
  • In step 202, in response to determining that the second image frame satisfies the relocation condition, the feature points of the second image frame and the descriptors of each feature point are obtained.
  • In the embodiments of the present disclosure, the second image frame refers to any image frame in the video that currently requires plane tracking; that is, it represents the current image frame to be processed. For convenience of description, it is called the second image frame in this embodiment. It should be noted that when an image frame satisfies both the key frame preliminary screening condition and the relocation condition, the second image frame and the first image frame are the same image frame. In other cases, the second image frame and the first image frame may not be the same image frame.
  • the above relocation conditions are preset initial conditions for starting relocation.
  • the above-mentioned relocation conditions may include: the number of plane tracking failures between image frames exceeds a preset plane tracking failure threshold.
  • As mentioned above, during plane tracking each image frame needs to be feature-matched with its previous image frame, and camera pose estimation and plane tracking are then performed based on the matched feature point pairs. If the camera pose cannot be estimated during this plane tracking process, the plane tracking of that image frame has failed, and the number of plane tracking failures can be increased by one. In this case, the camera pose corresponding to the previous image frame can be used as the camera pose corresponding to that image frame; that is, the image is assumed to be still.
  • In the embodiments of the present disclosure, if, up to the current image frame (that is, the second image frame), the recorded number of plane tracking failures exceeds the preset plane tracking failure threshold, the relocation condition can be considered to be met.
  • In addition, after relocation, the recorded number of plane tracking failures may also be cleared to zero.
  • In some other embodiments of the present disclosure, the relocation condition may further include: the plane tracking error of the adjacent image frames of the second image frame being less than a preset plane tracking error threshold. It should be noted that during plane tracking the error of the plane tracking result is also evaluated, yielding a plane tracking error. Generally, the blurrier an image frame is, the larger the plane tracking error will be. When the plane tracking error of the adjacent image frames of the second image frame is less than the preset plane tracking error threshold, it means that the image of the current second image frame is not blurred, and camera pose relocation can be performed on the second image frame.
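A hedged sketch of the relocation trigger described above. The counter, both threshold values, and the way the neighbor tracking error is obtained are assumptions for illustration.

```python
PLANE_TRACK_FAIL_THRESH = 5      # illustrative value
PLANE_TRACK_ERROR_THRESH = 2.0   # illustrative value

class RelocationTrigger:
    def __init__(self):
        self.fail_count = 0

    def on_frame_tracked(self, pose_estimated, neighbor_tracking_error):
        """Update the failure counter and decide whether relocation should run."""
        if not pose_estimated:
            # Plane tracking failed for this frame: the previous pose is kept
            # (the image is assumed still) and the failure is counted.
            self.fail_count += 1
        return (self.fail_count > PLANE_TRACK_FAIL_THRESH
                and neighbor_tracking_error < PLANE_TRACK_ERROR_THRESH)

    def on_relocated(self):
        # After a relocation, the recorded failure count may be cleared.
        self.fail_count = 0
```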
  • After determining that the above relocation conditions are met, the planar tracking device obtains the feature points of the current second image frame and the descriptors of each feature point.
  • the above-mentioned planar tracking device will use the same method as the above-mentioned step 102 to obtain the feature points of the first image frame and the descriptors of each feature point to obtain the feature points of the above-mentioned second image frame and Descriptors for each feature point.
  • For example, if in step 102 the planar tracking device uses the SIFT algorithm to obtain the feature points of the first image frame and the descriptors of each feature point, then in the current step 202 the planar tracking device will also use the SIFT algorithm to obtain the feature points of the second image frame and the descriptors of each feature point.
  • As another example, if in step 102 the planar tracking device directly obtains the feature points of the first image frame and the descriptors of each feature point that were obtained during the plane tracking process, then in the current step 202 the planar tracking device will also directly obtain the feature points of the second image frame and the descriptors of each feature point obtained during the plane tracking process.
  • In step 204, based on the feature points of the second image frame and the descriptors of each feature point, feature matching is performed between the second image frame and each saved key frame respectively, to obtain the second feature point pairs after matching the second image frame with each key frame.
  • In the embodiments of the present disclosure, each second feature point pair includes a feature point in the second image frame and a feature point in the key frame that corresponds to that feature point in the second image frame.
  • the above-mentioned planar tracking device can perform feature matching based on the descriptors of each feature point.
  • the above-mentioned planar tracking device may also use an optical flow tracking algorithm to track the feature points in the above-mentioned second image frame to the feature points in the above-mentioned key frames. This disclosure does not limit the feature matching method specifically adopted in the above step 204.
  • In step 206, the matching degree between the second image frame and each key frame is determined based on the second feature point pairs. For each key frame, the specific process of determining the matching degree between the second image frame and the key frame based on the second feature point pairs may be as shown in Figure 3 and includes the following steps.
  • In step 302, a homography matrix between the second image frame and the key frame is determined based on the second feature point pairs.
  • the above-mentioned planar tracking device may also determine the homography matrix between the above-mentioned second image frame and the above-mentioned key frame through a RANSAC algorithm.
  • the specific method is as mentioned above and will not be repeated here.
  • In step 304, the number of second feature point pairs that satisfy the relationship reflected by the homography matrix is determined.
  • As mentioned above, the RANSAC algorithm finds the best matching model from a sample data set containing abnormal data.
  • Since the sample data set it uses contains abnormal data, not all sample pairs can satisfy the best matching model obtained by the RANSAC algorithm.
  • The samples that satisfy the obtained best matching model are usually called inliers or interior points, while the samples that do not satisfy the obtained best matching model are usually called outliers or exterior points.
  • the best matching model obtained through the RANSAC algorithm using the above matched feature point pairs is the homography matrix in this embodiment.
  • Therefore, in this step, the number of feature point pairs, among all the matched feature point pairs, that satisfy the relationship reflected by the homography matrix can be determined; that is, the number of inliers can be determined.
  • In step 306, the number of such feature point pairs is used as the matching degree between the second image frame and the key frame.
  • Those skilled in the art will understand that the more feature point pairs satisfy the relationship reflected by the homography matrix, the higher the matching degree between the second image frame and the key frame. For example, the feature point correspondences on two image frames captured by the camera at the same position and from the same shooting angle should all satisfy the transformation relationship reflected by the homography matrix obtained from the two image frames.
  • In contrast, for two image frames captured by the camera at completely different positions or from completely different shooting angles, the number of feature point pairs satisfying the transformation relationship reflected by the homography matrix obtained from the two frames will be relatively small. Therefore, in the embodiments of the present disclosure, the number of such feature point pairs is used as the matching degree between the second image frame and the key frame.
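A minimal sketch of the matching-degree computation in steps 302 to 306, assuming OpenCV's RANSAC-based homography estimator and its inlier mask; the reprojection threshold is an illustrative value.

```python
import cv2
import numpy as np

def matching_degree(pts_frame, pts_keyframe, reproj_thresh=3.0):
    """Estimate the homography between the current frame and a key frame from
    their matched point pairs, then use the number of pairs consistent with it
    (the inliers) as the matching degree."""
    if len(pts_frame) < 4:
        return 0
    H, inlier_mask = cv2.findHomography(np.float32(pts_frame),
                                        np.float32(pts_keyframe),
                                        cv2.RANSAC, reproj_thresh)
    if H is None:
        return 0
    return int(inlier_mask.ravel().sum())   # number of inlier feature point pairs
```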
  • In step 208, the key frame with the highest matching degree with the second image frame is determined as the target key frame.
  • It can be seen that, through the above method, the key frame with the highest matching degree with the second image frame can be determined from all key frames as the target key frame. Generally, the smaller the difference between two camera poses, the higher the matching degree between the images captured at those poses should be. Therefore, through the above method, the key frame whose camera pose differs least from the camera pose corresponding to the second image frame can be found among all key frames; that is, during repeated camera motion, when the camera moves back to the pose at which one of the key frames was captured, that key frame can be determined by the above method.
  • In step 210, the camera pose corresponding to the second image frame is replaced with the camera pose corresponding to the target key frame.
  • It can be seen that, during repeated camera motion, the camera pose may drift due to error accumulation, causing the plane tracking results to drift as well.
  • With the above method, when the camera moves back to the pose at which one of the key frames was captured, that key frame can be determined, and the camera pose corresponding to that key frame is used to replace the camera pose corresponding to the second image frame.
  • The camera pose is thereby directly pulled back to the camera pose corresponding to the previously saved key frame, which eliminates the error accumulation in the plane tracking process, solves the drift problem of plane tracking caused by error accumulation, and ensures the accuracy of plane tracking.
  • In some other embodiments of the present disclosure, before step 208 the above method may further include: determining whether the matching degrees between the second image frame and each key frame are all less than a preset matching degree threshold; in response to determining that the matching degrees between the second image frame and each key frame are all less than the preset matching degree threshold, indicating that the relocation has failed, ending the above process; and in response to determining that the matching degrees between the second image frame and the key frames are not all less than the preset matching degree threshold, continuing to perform step 208.
  • In the above embodiment, when the matching degrees between the second image frame and each key frame are all less than the preset matching degree threshold, it means that the second image frame does not match any of the key frames, so there is no need to replace the camera pose.
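A sketch of the relocation flow in steps 202 to 210, reusing the illustrative helpers above (extract_features, match_descriptors, matching_degree) and the assumed key-frame record layout; the matching-degree threshold is an assumed value.

```python
MATCH_DEGREE_THRESH = 30   # illustrative value

def relocate(frame_bgr, keyframes):
    """Return the camera pose (R, c) of the best-matching key frame, or None
    if relocation fails because no key frame matches well enough."""
    kps, des = extract_features(frame_bgr)
    if des is None:
        return None
    best_pose, best_degree = None, 0
    for kf in keyframes:
        matches = match_descriptors(des, kf["descriptors"])
        if len(matches) < 4:
            continue
        pts_cur = [kps[m.queryIdx].pt for m in matches]
        pts_key = [kf["keypoints"][m.trainIdx].pt for m in matches]
        degree = matching_degree(pts_cur, pts_key)
        if degree > best_degree:
            best_degree, best_pose = degree, (kf["R"], kf["c"])
    if best_degree < MATCH_DEGREE_THRESH:
        return None            # relocation failed: every matching degree is too low
    # The caller replaces the current frame's camera pose with this pose.
    return best_pose
```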
  • the methods in the embodiments of the present disclosure can be executed by a single device, such as a computer or server.
  • the method of this embodiment can also be applied in a distributed scenario, and is completed by multiple devices cooperating with each other.
  • one device among the multiple devices can only perform one or more steps in the method of the embodiment of the present disclosure, and the multiple devices will interact with each other to complete all the steps. method described.
  • Figure 4 shows a schematic diagram of the internal structure of a relocation device according to some embodiments of the present disclosure.
  • the repositioning device shown in Figure 4 may be located in the above-mentioned planar tracking device.
  • the above-mentioned relocation device may include:
  • the first feature point acquisition module 402 is configured to obtain the feature points of the current image frame and the descriptors of each feature point in response to determining that the relocation condition is met;
  • The first feature matching module 404 is configured to perform feature matching between the current image frame and each saved key frame respectively, based on the feature points of the current image frame and the descriptors of each feature point, to obtain feature point pairs after matching the current image frame with each key frame;
  • Matching degree determination module 406 configured to determine the matching degree of the current image frame and each key frame based on the feature point pair;
  • the target key frame determination module 408 is used to determine the key frame with the highest matching degree with the current image frame as the target key frame.
  • the pose replacement module 410 is configured to replace the camera pose corresponding to the current image frame with the camera pose corresponding to the target key frame.
  • the above-mentioned matching degree determination module 406 may include:
  • a homography matrix determination unit configured to, for each key frame, determine the homography matrix between the current image frame and the key frame based on the matched feature point pair between the current image frame and the key frame;
  • an interior point number determination unit used to determine the number of feature point pairs that satisfy the relationship reflected by the homography matrix among the feature point pairs
  • a matching degree determining unit configured to use the number of feature point pairs as a matching degree between the current image frame and the key frame.
  • Figure 5 shows a schematic diagram of the internal structure of a relocation device according to other embodiments of the present disclosure.
  • As shown in Figure 5, in addition to the above-mentioned first feature point acquisition module 402, first feature matching module 404, matching degree determination module 406, target key frame determination module 408 and pose replacement module 410, the relocation device may further include:
  • the second feature point acquisition module 502 is configured to acquire the feature points of the current image frame and the descriptors of each feature point in response to determining that the key frame preliminary screening conditions are met;
  • the second feature matching module 504, configured to perform feature matching between the current image frame and the saved reference image frame based on the feature points of the current image frame and the descriptors of each feature point, to obtain matched second feature point pairs;
  • Homography matrix estimation module 506 configured to estimate the homography matrix between the current image frame and the reference image frame according to the second feature point pair;
  • the key frame determination module 508, configured to, in response to determining that the homography matrix can be estimated, determine that the current image frame is a key frame, and record the feature points of the current image frame, the descriptors of each feature point, and the camera pose corresponding to the current image frame.
  • the present disclosure also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • When the processor executes the program, the relocation method described in any of the above embodiments is implemented.
  • Figure 6 shows a more specific hardware structure diagram of an electronic device provided by this embodiment.
  • the device may include: a processor 2010, a memory 2020, an input/output interface 2030, a communication interface 2040 and a bus 2050.
  • the processor 2010, the memory 2020, the input/output interface 2030 and the communication interface 2040 implement communication connections between each other within the device through the bus 2050.
  • The processor 2010 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute relevant programs to implement the technical solutions provided by the embodiments of this specification.
  • the memory 2020 can be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static storage device, dynamic storage device, etc.
  • the memory 2020 can store operating systems and other application programs. When the technical solutions provided by the embodiments of this specification are implemented through software or firmware, the relevant program codes are stored in the memory 2020 and called and executed by the processor 2010.
  • the input/output interface 2030 is used to connect the input/output module to realize information input and output.
  • The input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions.
  • Input devices can include keyboards, mice, touch screens, microphones, various sensors, etc., and output devices can include monitors, speakers, vibrators, indicator lights, etc.
  • the communication interface 2040 is used to connect a communication module (not shown in the figure) to realize communication interaction between this device and other devices.
  • the communication module can realize communication through wired means (such as USB, network cable, etc.) or wireless means (such as mobile network, WIFI, Bluetooth, etc.).
  • Bus 2050 includes a path that carries information between various components of the device (eg, processor 2010, memory 2020, input/output interface 2030, and communication interface 2040).
  • Although the above device only shows the processor 2010, the memory 2020, the input/output interface 2030, the communication interface 2040 and the bus 2050, in a specific implementation the device may also include other components necessary for normal operation.
  • the above-mentioned device may only include components necessary to implement the embodiments of this specification, and does not necessarily include all components shown in the drawings.
  • Based on the same inventive concept, the present disclosure also provides a non-transitory computer-readable storage medium that stores computer instructions, the computer instructions being used to cause a computer to execute the relocation method described in any of the above embodiments.
  • the computer-readable media in this embodiment include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology.
  • Information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • The computer instructions stored in the storage medium of the above embodiments are used to cause the computer to execute the relocation method described in any of the above embodiments, and have the beneficial effects of the corresponding method embodiments, which will not be described again here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a relocation method, including: in response to determining that a current image frame satisfies a relocation condition, obtaining feature points of the current image frame and descriptors of each feature point; based on the feature points of the current image frame and the descriptors of each feature point, performing feature matching between the current image frame and each saved key frame respectively, to obtain feature point pairs after matching the current image frame with each key frame; determining a matching degree between the current image frame and each key frame based on the feature point pairs; determining the key frame with the highest matching degree with the current image frame as a target key frame; and replacing the camera pose corresponding to the current image frame with the camera pose corresponding to the target key frame. Based on the above relocation method, the present disclosure further provides a relocation apparatus, an electronic device, a storage medium, and a program product.

Description

重定位方法及相关设备
本申请要求2022年3月25日递交的,标题为“重定位方法及相关设备”、申请号为CN202210306850.9的中国发明专利申请的优先权。
技术领域
本公开涉及计算机视觉技术领域,尤其涉及一种重定位方法、装置、电子设备、存储介质及程序产品。
背景技术
同时定位与地图构建(Simultaneous Localization and Mapping,SLAM)是指机器人搭载特定传感器,在没有环境先验信息的情况下,于运动过程中估计传感器的位姿并同时对周围环境建模。在上述传感器主要为相机时,则可以将SLAM称之为视觉SLAM(VSLAM)。SLAM技术已经研究和发展了三十多年,研究人员已经做了大量的工作,近十年来,随着计算机视觉的发展,VSLAM以其硬件成本低廉、轻便、高精度等优势获得了学术界和工业界的青睐。
目前,SLAM技术已经广泛应用于各种增强现实的应用中,例如,平面检测和平面跟踪等。然而,由于噪声的存在,上述平面跟踪结果可能会存在误差。同时,SLAM技术采用的渐近式的帧间匹配方式还可能会造成误差的积累,从而导致在使用一段时间后平面跟踪结果存在漂移。因此,如何在SLAM的平面跟踪过程中消除误差的积累成为SLAM技术需要解决的关键问题之一。
发明内容
有鉴于此,本公开的实施例提供一种重定位方法,可以在平面跟 踪过程中,准确确定相机位姿,消除平面跟踪过程的误差积累,从而保证平面跟踪的准确度。
根据本公开的一些实施例,上述重定位方法可以包括:响应于确定当前图像帧满足重定位条件,获取所述当前图像帧的特征点以及各特征点的描述子;基于所述当前图像帧的特征点以及各特征点的描述子,将所述当前图像帧分别与已保存的每一个关键帧进行特征匹配,分别得到所述当前图像帧与每一个关键帧匹配后的特征点对;基于所述特征点对分别确定所述当前图像帧与每一个关键帧的匹配度;将与所述当前图像帧匹配度最高的关键帧确定为目标关键帧;以及使用所述目标关键帧对应的相机位姿替换所述当前图像帧对应的相机位姿。
基于上述重定位方法,本公开的实施例提供了一种重定位装置,包括:
第一特征点获取模块,用于响应于确定满足重定位条件,获取当前图像帧的特征点以及各特征点的描述子;
第一特征匹配模块,用于基于所述当前图像帧的特征点以及各特征点的描述子,将所述当前图像帧分别与已保存的每一个关键帧进行特征匹配,分别得到所述当前图像帧与每一个关键帧匹配后的特征点对;
匹配度确定模块,用于基于所述特征点对分别确定所述当前图像帧与每一个关键帧的匹配度;
目标关键帧确定模块,用于将与所述当前图像帧匹配度最高的关键帧确定为目标关键帧;以及
位姿替换模块,用于使用所述目标关键帧对应的相机位姿替换所述当前图像帧对应的相机位姿。
此外,本公开的实施例还提供了一种电子设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现上述重定位方法。
本公开的实施例还提供了一种非暂态计算机可读存储介质,所述非暂态计算机可读存储介质存储计算机指令,所述计算机指令用于使 计算机执行上述重定位方法。
本公开的实施例还提供了一种计算机程序产品,包括计算机程序指令,当所述计算机程序指令在计算机上运行时,使得计算机执行上述重定位方法。
从上述内容可以看出,在相机的反复运动过程中,相机位姿可能会因误差积累而产生漂移,从而导致平面跟踪结果也存在漂移。而通过本公开提供的重定位方法和装置,当相机运动回到已保存的一个关键帧对应的位姿时可以准确确定到这个关键帧,并利用这个关键帧对应的相机位姿替换当前图像帧对应的相机位姿,从而直接将相机位姿拉回到之前已经保存的关键帧对应的相机位姿上,以消除平面跟踪过程的误差积累,解决由于误差积累所造成的平面跟踪的漂移问题,保证平面跟踪的准确度。
附图说明
为了更清楚地说明本公开或相关技术中的技术方案,下面将对实施例或相关技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本公开的实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1显示了本公开一些实施例所述的重定位方法中保留关键帧部分的实现流程;
图2显示了本公开一些实施例所述的基于保留的关键帧进行相机位姿重定位的实现流程;
图3显示了本公开一些实施例所述的确定第二图像帧与关键帧的匹配度的具体实现流程;
图4显示了本公开一些实施例所述的重定位装置的内部结构示意图;
图5显示了本公开另一些实施例所述的重定位装置的内部结构示意图;以及
图6示出了本实施例所提供的一种更为具体的电子设备硬件结构示意图。
具体实施方式
为使本公开的目的、技术方案和优点更加清楚明白,以下结合具体实施例,并参照附图,对本公开进一步详细说明。
需要说明的是,除非另外定义,本公开实施例使用的技术术语或者科学术语应当为本公开所属领域内具有一般技能的人士所理解的通常意义。本公开实施例中使用的“第一”、“第二”以及类似的词语并不表示任何顺序、数量或者重要性,而只是用来区分不同的组成部分。“包括”或者“包含”等类似的词语意指出现该词前面的元件或者物件涵盖出现在该词后面列举的元件或者物件及其等同,而不排除其他元件或者物件。“连接”或者“相连”等类似的词语并非限定于物理的或者机械的连接,而是可以包括电性的连接,不管是直接的还是间接的。“上”、“下”、“左”、“右”等仅用于表示相对位置关系,当被描述对象的绝对位置改变后,则该相对位置关系也可能相应地改变。
如前所述,SLAM技术采用渐近式的帧间匹配方式进行平面跟踪。在进行平面跟踪的过程中可以得到一段视频中各图像帧对应的相机位姿。具体地,在平面跟踪过程中,可以首先对当前图像帧进行特征提取,得到当前图像帧的多个特征点以及各特征点的描述子;再将当前图像帧的特征点与其前一图像帧的特征点进行匹配;然后,再根据特征匹配结果确定当前图像帧和其前一图像帧特征点之间的映射关系,并根据这种映射关系确定当前图像帧对应的相机位姿,并进一步跟踪图像帧中的平面等。上述映射关系例如可以是两个图像帧之间的单应性矩阵或者是两个图像帧之间的基础矩阵等等。而由于噪声的存在,通过上述方法得到每一图像帧对应相机位姿以及平面跟踪等结果均可能会存在误差。且由于上述结果均是根据当前图像帧与其前一图像帧之间的关系得到的,因此,在上述平面跟踪过程运行一段时间之 后,还可能会造成误差的积累,从而导致在使用一段时间后平面跟踪结果将存在严重漂移。
为此,本公开的一些实施例提供了一种重定位方法,可以在平面跟踪过程中准确确定相机位姿,消除平面跟踪过程的误差积累,保证平面跟踪的准确度。需要说明的是,在本公开的实施例中,上述重定位方法可以由一平面跟踪设备实现。在本公开的实施例中,上述平面跟踪设备可以是具有计算能力的电子设备。上述平面跟踪设备还可以通过显示屏幕显示可以与用户进行交互的交互界面,从而为用户提供视频或图像处理的功能。
本公开实施例所述的重定位方法通常在对当前图像帧进行平面跟踪之后执行,主要包括两个部分的内容。其中,第一部分的内容是保留关键帧;第二部分的内容是基于保留的关键帧进行相机位姿的重定位。下面就分别对上述两个部分的内容进行详细说明。
图1显示了本公开实施例所述的重定位方法中保留关键帧部分的实现流程。如图1所示,该方法可以包括如下步骤:
在步骤102,响应于确定第一图像帧满足关键帧初筛条件,获取上述第一图像帧的特征点以及各特征点的描述子。
在本公开的实施例中,上述第一图像帧是指当前需要进行平面跟踪的视频中的任意一个图像帧,也即代表待处理的当前图像帧。为了描述方便,在本实施例中将其称为第一图像帧。
此外,上述关键帧初筛条件是预先设置的用于启动保留关键帧部分操作的条件,也即在确定当前图像帧满足上述关键帧初筛条件时,则启动保留关键帧的操作,执行后续的流程;而如果确定当前图像帧不满足上述关键帧初筛条件,则不执行后续的流程。
在本公开的一些实施例中,上述关键帧初筛条件可以包括:确定上述第一图像帧所对应的相机位姿与已保存的关键帧所对应的相机位姿之间的差异均大于预先设定的位姿差异阈值。上述位姿差异阈值可以包括距离差异阈值以及视角差异阈值。具体地,如果确定上述第一图像帧所对应的相机位姿与已保存的任意关键帧所对应的相机位 姿的距离超过上述距离差异阈值和/或其视角差异超过上述视角差异阈值,则可以确定上述第一图像帧所对应的相机位姿与已保存的关键帧所对应的相机位姿之间的差异均大于预先设定的位姿差异阈值,也即上述第一图像帧满足上述关键帧初筛条件。这种情况适用于机器自动选择关键帧的情况。通常,还可以将一段视频的初始图像帧自动设置为第一个关键帧。
在本公开的另一些实施例中,上述关键帧初筛条件可以包括:检测到用户点击上述平面跟踪设备的屏幕。这种情况适用于人工选择关键帧的情况。用户在通过上述平面跟踪设备的屏幕观看一段视频时,可以人工判断关键帧所在的位置,并在确定当前显示的图像帧为关键帧时,选择点击屏幕跟踪设备的屏幕,以启动保留关键帧的操作。
需要说明的是,上述第一图像帧所对应的相机位姿可以通过前述的平面跟踪过程获得,在此不再赘述。
此外,具体地,在本公开的实施例中,在上述步骤102,上述平面跟踪设备可以采用任意一种计算机视觉的图像特征提取方法对上述第一图像帧进行特征提取,以获取上述第一图像帧的特征点以及各特征点的描述子。例如,上述平面跟踪设备可以采用尺度不变特征变换(Scale-invariant feature transform,SIFT)算法、ORB(Oriented FAST and Rotated BRIEF)算法以及加速版的具有鲁棒特性的特征算法(Speed Up Robust Features,SURF)等等方法对上述第一图像帧进行特征提取,以获取第一图像帧的特征点以及各特征点的描述子。本公开对上述步骤102具体采用的特征提取方法不进行限定。
在本公开的另一些实施例中,如果之前在对上述第一图像帧进行平面跟踪时已经提取并记录了上述第一图像帧的特征点以及各特征点的描述子,也可以直接读取已记录的上述第一图像帧的特征点以及各特征点的描述子,而无需重新对上述第一图像帧进行特征提取。
在本公开的另一些实施例中,在获取了上述第一图像帧的特征点以及各特征点的描述子之后,还可以进一步确定上述第一图像帧的特征点的数量是否小于预先设定的特征点数量阈值。响应于确定上述第 一图像帧的特征点的数量小于上述特征点数量阈值,可以确定上述第一图像帧不是关键帧,并结束上述流程;响应于确定上述第一图像帧的特征点的数量大于或等于上述特征点数量阈值,可以继续执行下面的步骤104。
在步骤104,基于上述第一图像帧的特征点以及各特征点的描述子将上述第一图像帧与已保存的参考图像帧进行特征匹配,得到匹配后的特征点对。
在本公开的实施例中,上述参考图像帧可以为由上述平面跟踪设备处理并保存的在上述第一图像帧之前的图像帧。例如,上述参考图像帧可以是上述第一图像帧的前一个图像帧。又例如,上述参考图像帧还可以是上述第一图像帧的前一个关键帧。
在本公开的实施例中,每个特征点对均包含了上述第一图像帧中的一个特征点还包含了上述参考图像帧中的一个与上述第一图像帧中特征点对应的特征点。具体地,上述平面跟踪设备可以根据各个特征点的描述子来进行特征匹配。在本公开的另一些实施例中,上述平面跟踪设备还可以采用光流跟踪算法将上述第一图像帧中的特征点跟踪到上述参考图像帧中的特征点。本公开对上述步骤104具体采用的特征匹配方法不进行限定。
在步骤106,根据上述匹配后的特征点对估计上述第一图像帧与上述参考图像帧之间的单应矩阵。
在本公开的实施例中,上述平面跟踪设备可以通过随机抽样一致算法(Random Sample Consensus,RANSAC)来确定上述第一图像帧与上述参考图像帧之间的单应矩阵。
RANSAC是由Fischler和Bolles于1981年最先提出的算法。该算法根据一组包含异常数据的样本数据集,计算出数据的数学模型参数。目前,RANSAC算法在计算机视觉的匹配问题中通常被用来寻找最佳的匹配模型。对应于本公开的实施例,利用上述匹配后的特征点对通过RANSAC算法得到的最佳匹配模型就是本实施例中所述的单应矩阵。具体地,上述通过RANSAC算法来确定上述第一图像帧与 上述参考图像帧之间的单应矩阵的过程可以包括:首先,将上述特征点对组成的集合作为集合P;然后,从该集合P中随机选择4组特征点对,并基于所选择的4组特征点对估计出一个模型M;接下来,对于该集合P中剩余的特征点对,分别计算每个特征点对与上述模型M的距离,当距离超过一个设定的第一阈值时,则认为该特征点对是局外点或外点;当距离不超过这个设定的阈值时,则认为该特征点对是局内点或内点;在计算完该集合P中剩余的特征点对后,记录该模型M所对应的内点的个数mi。再接下来,将上面过程重复k次后,选择最大mi所对应的模型M作为最终结果。当然,如果将上面过程重复k次后,所有模型M所对应的mi均小于另一个设定的第二阈值时,则认为估计失败,也即无法获得上述第一图像帧与上述参考图像帧之间的单应矩阵。
在步骤108,响应于确定可估计出上述单应矩阵,确定上述第一图像帧为关键图像帧,并记录上述第一图像帧的特征点、各个特征点的描述子以及上述第一图像帧对应的相机位姿。
在本公开的实施例中,上述步骤108还可以进一步包括:响应于确定无法估计出上述单应矩阵,可以确定上述第一图像帧不是关键帧,并结束上述流程。
通过上述图1所述的方法,可以从视频的各个图像帧中确定一系列的关键帧,这些关键帧通常对应一些相对关键的相机位姿,例如,这些关键帧所对应的相机位姿之间通常会具有一定的距离和/或视角差异。如此,在后续的操作中,利用这些记录的关键帧,则可以对相机位姿进行重定位。
图2显示了本公开实施例所述的基于保留的关键帧进行相机位姿重定位的实现流程。如图2所示,该方法可以包括如下步骤:
在上述步骤202,响应于确定第二图像帧满足重定位条件,获取上述第二图像帧的特征点以及各特征点的描述子。
在本公开的实施例中,上述第二图像帧是指当前需要进行平面跟踪的视频中的任意一个图像帧,也即代表待处理的当前图像帧。为了 描述方便,在本实施例中将其称为第二图像帧。需要说明的是,在一个图像帧同时满足关键帧初筛条件以及重定位条件时,上述第二图像帧与上述第一图像帧是同一个图像帧。在其他情况下,上述第二图像帧与上述第一图像帧也可以不是同一个图像帧。
上述重定位条件是预先设置的用于启动重定位的初始条件。在本公开的一些实施例中,上述重定位条件可以包括:图像帧之间平面跟踪失败的次数超过预先设定的平面跟踪失败阈值。
如前所述,在平面跟踪过程中,一个图像帧需要与其前一图像帧之间进行特征匹配,然后根据匹配得到的特征点对进行相机位姿估计和平面跟踪。而如果上述平面跟踪过程中,无法估计出相机位姿,则说明对上述图像帧的平面跟踪失败,此时可将平面跟踪失败次数加1次。在这种情况下,可以使用其前一图像帧对应的相机位姿作为该图像帧对应的相机位姿,也即假定图像是静止的。在本公开的实施例中,如果直至当前的图像帧,也即第二图像帧,记录的平面跟踪失败的次数超过预先设定的平面跟踪失败阈值,则可以认为满足重定位条件。此外,在本公开的实施例中,在重定位之后,还可以将记录的平面跟踪失败的次数清零。
在本公开的另一些实施例中,上述重定位条件还可以进一步包括:上述第二图像帧的相邻图像帧平面跟踪误差小于预先设置的平面跟踪误差阈值。需要说明的是,在进行平面跟踪的过程中,还会对平面跟踪结果的误差进行评估,得到平面跟踪的误差。通常,图像帧的画面越模糊平面跟踪的误差将越大,当上述第二图像帧的相邻图像帧平面跟踪误差小于预先设置的平面跟踪误差阈值时,则说明当前第二图像帧的画面不模糊,可以在上述第二图像帧进行相机位姿重定位。
在确定满足上述重定位条件后,上述平面跟踪设备将获取当前第二图像帧的特征点以及各特征点的描述子。
特别地,在本公开的实施例中,上述平面跟踪设备将采用与上述步骤102中获取第一图像帧的特征点以及各特征点的描述子相同的方法获取上述第二图像帧的特征点以及各特征点的描述子。
例如,如果在上述步骤102,上述平面跟踪设备采用SIFT算法获取上述第一图像帧的特征点以及各特征点的描述子,则在当前步骤202,上述平面跟踪设备也将采用SIFT算法获取上述第二图像帧的特征点以及各特征点的描述子。又例如,如果在上述步骤102,上述平面跟踪设备直接获取在平面跟踪过程中得到的上述第一图像帧的特征点以及各特征点的描述子,则在当前步骤202,上述平面跟踪设备也将直接获取在平面跟踪过程中得到的上述第二图像帧的特征点以及各特征点的描述子。
在步骤204,基于上述第二图像帧的特征点以及各特征点的描述子,将上述第二图像帧分别与已保存的每一个关键帧进行特征匹配,分别得到上述第二图像帧与每一个关键帧匹配后的第二特征点对。
在本公开的实施例中,每个特征点对包含了上述第二图像帧中的一个特征点还包含了关键帧中的一个与上述第一图像帧中特征点对应的特征点。具体地,上述平面跟踪设备可以根据各个特征点的描述子来进行特征匹配。在本公开的另一些实施例中,上述平面跟踪设备还可以采用光流跟踪算法将上述第二图像帧中的特征点跟踪到上述各个关键帧中的特征点。本公开对上述步骤204具体采用的特征匹配方法不进行限定。
在步骤206,基于上述第二特征点对分别确定上述第二图像帧与每一个关键帧的匹配度。
在本公开的实施例中,针对每个关键帧,基于上述第二特征点对确定上述第二图像帧与上述关键帧的匹配度的具体实现过程可以如图3所示,包括如下步骤:
在步骤302,根据上述第二特征点对确定上述第二图像帧与上述关键帧之间的单应矩阵。
在本公开的实施例中,上述平面跟踪设备也可以通过RANSAC算法来确定上述第二图像帧与上述关键帧之间的单应矩阵。具体方法如前所述,在此不再重复说明。
在步骤304,确定上述第二特征点对中满足上述单应矩阵所反映 关系的特征点对的数量。
如前所述,RANSAC算法是根据一组包含异常数据的样本数据集来寻找最佳的匹配模型的算法,而由于其所使用的样本数据集包含异常数据,因此,并不是所有的样本对都能满足通过RANSAC算法得到的最佳匹配模型,其中,满足所得到的最佳匹配模型的样本通常被称为局内点或内点,而不满足所得到的最佳匹配模型的样本通常被称为局外点或外点。对应于本公开的实施例,在上述步骤304中,利用上述匹配后的特征点对通过RANSAC算法得到的最佳匹配模型就是本实施例中的单应矩阵。而且,可以理解,并不是所有的特征点都满足上述单应矩阵所示的变换关系。因此,在本步骤中,可以确定上述所有匹配后特征点对中满足上述单应矩阵所反映关系的特征点对的数量,也就是确定局内点的数量。
在步骤306,将上述特征点对的数量作为上述第二图像帧与上述关键帧的匹配度。
本领域的技术人员可以理解,满足上述单应矩阵所反映关系的特征点对的数量越多,说明上述第二图像帧和上述关键帧的匹配度越高。例如,相机在相同位置以及通过相同拍摄视角拍摄得到的两个图像帧上的特征点对应当均满足根据这两个图像帧获得的单应矩阵所反映变化关系。而相机在完全不同位置或通过完全不同拍摄视角拍摄得到的两个图像帧上满足根据这两个图像帧获得的单应矩阵所反映变化关系的特征点对的数量就会相对较少。因此,在本公开的实施例中,将上述特征点对的数量作为上述第二图像帧与上述关键帧的匹配度。
在步骤208,将与上述第二图像帧匹配度最高的关键帧确定为目标关键帧。
由此可以看出,通过上述方法,可以从所有关键帧中确定一个与上述第二图像帧匹配度最高的关键帧作为目标关键帧。
通常,本领域技术人员可以理解,当相机位姿差异越小时其所拍摄的图像之间的匹配度应该越大。因此,通过上述方法,可以从所有 关键帧中找的其对应相机位姿与上述第二图像帧所对应相机位姿差异最小的关键帧。也就是在相机的反复运动过程中,当相机运动回到拍摄其中一个关键帧的位姿时可以通过上述方法确定到这一个关键帧。
在步骤210,使用上述目标关键帧对应的相机位姿替换上述第二图像帧对应的相机位姿。
可以看出,在相机的反复运动过程中,相机位姿可能会因误差积累而产生漂移,从而导致平面跟踪结果也存在漂移。通过上述方法,当相机运动回到拍摄其中一个关键帧的位姿时可以确定到这个关键帧,并利用这个关键帧对应的相机位姿替换上述第二图像帧对应的相机位姿,从而直接将相机位姿拉回到之前已经保存的关键帧对应的相机位姿,以消除平面跟踪过程的误差积累,解决由于误差积累所造成的平面跟踪的漂移问题,保证平面跟踪的准确度。
在本公开的另一些实施例中,上述方法在上述步骤208之前还可以进一步包括:确定上述第二图像帧与每一个关键帧的匹配度是否均小于预先设定的匹配度阈值;响应于确定上述第二图像帧与每一个关键帧的匹配度均小于预先设定的匹配度阈值,表明重定位失败,结束上述流程;响应于确定上述第二图像帧与每一个关键帧的匹配度不均小于预先设定的匹配度阈值,继续执行上述步骤208。
在上述实施例中,当上述第二图像帧与每一个关键帧的匹配度均小于预先设定的匹配度阈值时,说明上述第二图像帧与上述每一个关键帧均不匹配,因此,无需进行相机位姿的替换。
需要说明的是,本公开实施例的方法可以由单个设备执行,例如一台计算机或服务器等。本实施例的方法也可以应用于分布式场景下,由多台设备相互配合来完成。在这种分布式场景的情况下,这多台设备中的一台设备可以只执行本公开实施例的方法中的某一个或多个步骤,这多台设备相互之间会进行交互以完成所述的方法。
需要说明的是,上述对本公开的一些实施例进行了描述。其它实施例在所附权利要求书的范围内。在一些情况下,在权利要求书中记 载的动作或步骤可以按照不同于上述实施例中的顺序来执行并且仍然可以实现期望的结果。另外,在附图中描绘的过程不一定要求示出的特定顺序或者连续顺序才能实现期望的结果。在某些实施方式中,多任务处理和并行处理也是可以的或者可能是有利的。
基于同一发明构思,与上述任意实施例方法相对应的,本公开还提供了一种重定位装置。图4显示了本公开一些实施例所述的重定位装置的内部结构示意图。图4所示的重定位装置可以位于中上述平面跟踪设备中。如图4所示,上述重定位装置可以包括:
第一特征点获取模块402,用于响应于确定满足重定位条件,获取当前图像帧的特征点以及各特征点的描述子;
第一特征匹配模块404,用于基于所述当前图像帧的特征点以及各特征点的描述子,将所述当前图像帧分别与已保存的每一个关键帧进行特征匹配,分别得到所述当前图像帧与每一个关键帧匹配后的特征点对;
匹配度确定模块406,用于基于所述特征点对分别确定所述当前图像帧与每一个关键帧的匹配度;
目标关键帧确定模块408,用于将与所述当前图像帧匹配度最高的关键帧确定为目标关键帧;以及
位姿替换模块410,用于使用所述目标关键帧对应的相机位姿替换所述当前图像帧对应的相机位姿。
在本公开的实施例中,上述匹配度确定模块406可以包括:
单应矩阵确定单元,用于针对每一个关键帧,根据所述当前图像帧与所述关键帧匹配后的特征点对确定所述当前图像帧与所述关键帧之间的单应矩阵;
内点数量确定单元,用于确定所述特征点对中满足所述单应矩阵所反映关系的特征点对的数量;以及
匹配度确定单元,用于将所述特征点对的数量作为所述当前图像帧与所述关键帧的匹配度。
图5显示了本公开另一些实施例所述的重定位装置的内部结构示 意图。如图5所示,上述重定位装置除了包括上述第一特征点获取模块402、第一特征匹配模块404、匹配度确定模块406、目标关键帧确定模块408以及位姿替换模块410之外,还可以进一步包括:
第二特征点获取模块502,用于响应于确定满足关键帧初筛条件,获取所述当前图像帧的特征点以及各特征点的描述子;
第二特征匹配模块504,用于基于所述当前图像帧的特征点以及各特征点的描述子,将所述当前图像帧与已保存的参考图像帧进行特征匹配,得到匹配后的第二特征点对;
单应矩阵估计模块506,用于根据所述第二特征点对估计所述当前图像帧与所述参考图像帧之间的单应矩阵;以及
关键帧确定模块508,用于响应于确定可估计出所述单应矩阵,确定所述当前图像帧为关键帧,并记录所述当前图像帧的特征点、各个特征点的描述子以及所述当前图像帧对应的相机位姿。
上述各个模块的具体实现可以参考前述方法以及附图,在此不再重复说明。为了描述的方便,描述以上装置时以功能分为各种模块分别描述。当然,在实施本公开时可以把各模块的功能在同一个或多个软件和/或硬件中实现。
上述实施例的装置用于实现前述任一实施例中相应的重定位方法,并且具有相应的方法实施例的有益效果,在此不再赘述。
基于同一发明构思,与上述任意实施例方法相对应的,本公开还提供了一种电子设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现上任意一实施例所述的重定位方法。
图6示出了本实施例所提供的一种更为具体的电子设备硬件结构示意图,该设备可以包括:处理器2010、存储器2020、输入/输出接口2030、通信接口2040和总线2050。其中处理器2010、存储器2020、输入/输出接口2030和通信接口2040通过总线2050实现彼此之间在设备内部的通信连接。
处理器2010可以采用通用的CPU(Central Processing Unit,中央 处理器)、微处理器、应用专用集成电路(Application Specific Integrated Circuit,ASIC)、或者一个或多个集成电路等方式实现,用于执行相关程序,以实现本说明书实施例所提供的技术方案。
存储器2020可以采用ROM(Read Only Memory,只读存储器)、RAM(Random Access Memory,随机存取存储器)、静态存储设备,动态存储设备等形式实现。存储器2020可以存储操作系统和其他应用程序,在通过软件或者固件来实现本说明书实施例所提供的技术方案时,相关的程序代码保存在存储器2020中,并由处理器2010来调用执行。
输入/输出接口2030用于连接输入/输出模块,以实现信息输入及输出。输入输出/模块可以作为组件配置在设备中(图中未示出),也可以外接于设备以提供相应功能。其中输入设备可以包括键盘、鼠标、触摸屏、麦克风、各类传感器等,输出设备可以包括显示器、扬声器、振动器、指示灯等。
通信接口2040用于连接通信模块(图中未示出),以实现本设备与其他设备的通信交互。其中通信模块可以通过有线方式(例如USB、网线等)实现通信,也可以通过无线方式(例如移动网络、WIFI、蓝牙等)实现通信。
总线2050包括一通路,在设备的各个组件(例如处理器2010、存储器2020、输入/输出接口2030和通信接口2040)之间传输信息。
需要说明的是,尽管上述设备仅示出了处理器2010、存储器2020、输入/输出接口2030、通信接口2040以及总线2050,但是在具体实施过程中,该设备还可以包括实现正常运行所必需的其他组件。此外,本领域的技术人员可以理解的是,上述设备中也可以仅包含实现本说明书实施例方案所必需的组件,而不必包含图中所示的全部组件。
上述实施例的电子设备用于实现前述任一实施例中相应的重定位方法,并且具有相应的方法实施例的有益效果,在此不再赘述。
基于同一发明构思,与上述任意实施例方法相对应的,本公开还提供了一种非暂态计算机可读存储介质,所述非暂态计算机可读存储 介质存储计算机指令,所述计算机指令用于使所述计算机执行如上任一实施例所述的重定位方法。
本实施例的计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带,磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。
上述实施例的存储介质存储的计算机指令用于使所述计算机执行如上任一实施例所述的任务处理方法,并且具有相应的方法实施例的有益效果,在此不再赘述。
所属领域的普通技术人员应当理解:以上任何实施例的讨论仅为示例性的,并非旨在暗示本公开的范围(包括权利要求)被限于这些例子;在本公开的思路下,以上实施例或者不同实施例中的技术特征之间也可以进行组合,步骤可以以任意顺序实现,并存在如上所述的本公开实施例的不同方面的许多其它变化,为了简明它们没有在细节中提供。
另外,为简化说明和讨论,并且为了不会使本公开实施例难以理解,在所提供的附图中可以示出或可以不示出与集成电路(IC)芯片和其它部件的公知的电源/接地连接。此外,可以以框图的形式示出装置,以便避免使本公开实施例难以理解,并且这也考虑了以下事实,即关于这些框图装置的实施方式的细节是高度取决于将要实施本公开实施例的平台的(即,这些细节应当完全处于本领域技术人员的理解范围内)。在阐述了具体细节(例如,电路)以描述本公开的示例性实施例的情况下,对本领域技术人员来说显而易见的是,可以在没 有这些具体细节的情况下或者这些具体细节有变化的情况下实施本公开实施例。因此,这些描述应被认为是说明性的而不是限制性的。
尽管已经结合了本公开的具体实施例对本公开进行了描述,但是根据前面的描述,这些实施例的很多替换、修改和变型对本领域普通技术人员来说将是显而易见的。例如,其它存储器架构(例如,动态RAM(DRAM))可以使用所讨论的实施例。
本公开实施例旨在涵盖落入所附权利要求的宽泛范围之内的所有这样的替换、修改和变型。因此,凡在本公开实施例的精神和原则之内,所做的任何省略、修改、等同替换、改进等,均应包含在本公开的保护范围之内。

Claims (17)

  1. A relocation method, comprising:
    in response to determining that a current image frame satisfies a relocation condition, obtaining feature points of the current image frame and descriptors of each feature point;
    based on the feature points of the current image frame and the descriptors of each feature point, performing feature matching between the current image frame and each saved key frame respectively, to obtain feature point pairs after matching the current image frame with each key frame;
    determining a matching degree between the current image frame and each key frame based on the feature point pairs;
    determining the key frame with the highest matching degree with the current image frame as a target key frame; and
    replacing the camera pose corresponding to the current image frame with the camera pose corresponding to the target key frame.
  2. The relocation method according to claim 1, wherein the relocation condition comprises: the number of plane tracking failures between image frames exceeding a preset plane tracking failure threshold.
  3. The relocation method according to claim 2, wherein the relocation condition further comprises: a plane tracking error of an adjacent image frame of the current image frame being less than a preset plane tracking error threshold.
  4. The relocation method according to claim 1, wherein determining the matching degree between the current image frame and each key frame based on the feature point pairs after matching the current image frame with each key frame comprises:
    performing, for each key frame respectively:
    determining a homography matrix between the current image frame and the key frame according to the feature point pairs;
    determining the number of feature point pairs, among the feature point pairs, that satisfy the relationship reflected by the homography matrix; and
    using the number of feature point pairs as the matching degree between the current image frame and the key frame.
  5. The relocation method according to claim 1, further comprising:
    determining whether the matching degrees between the current image frame and each key frame are all less than a preset matching degree threshold; and
    in response to determining that the matching degrees between the current image frame and each key frame are all less than the matching degree threshold, determining that the relocation has failed, and ending the current process.
  6. The relocation method according to claim 1, further comprising:
    in response to determining that the current image frame satisfies a key frame preliminary screening condition, obtaining the feature points of the current image frame and the descriptors of each feature point;
    based on the feature points of the current image frame and the descriptors of each feature point, performing feature matching between the current image frame and a saved reference image frame, to obtain matched second feature point pairs;
    estimating a homography matrix between the current image frame and the reference image frame according to the second feature point pairs; and
    in response to determining that the homography matrix can be estimated, determining that the current image frame is a key frame, and recording the feature points of the current image frame, the descriptors of each feature point, and the camera pose corresponding to the current image frame.
  7. The relocation method according to claim 6, wherein the key frame preliminary screening condition comprises: detecting that a user taps a screen of a planar tracking device; or determining that the differences between the camera pose corresponding to the current image frame and the camera poses corresponding to each of the key frames are all greater than a preset pose difference threshold.
  8. The relocation method according to claim 6, further comprising:
    in response to determining that the number of feature points of the current image frame is less than a preset feature point number threshold, determining that the current image frame is not a key frame, and ending the current process; or
    in response to determining that the homography matrix cannot be estimated, determining that the current image frame is not a key frame, and ending the current process.
  9. The relocation method according to claim 1 or 6, wherein obtaining the feature points of the current image frame and the descriptors of each feature point comprises:
    performing feature extraction on the current image frame using the scale-invariant feature transform (SIFT) algorithm, the ORB algorithm, or the speeded-up robust features (SURF) algorithm, to obtain the feature points of the current image frame and the descriptors of each feature point; or
    reading the recorded feature points of the current image frame and the descriptors of each feature point.
  10. The relocation method according to claim 1, wherein performing feature matching between the current image frame and each saved key frame respectively comprises: tracking the feature points in the current image frame to the feature points in each key frame using an optical flow tracking algorithm.
  11. The relocation method according to claim 6, wherein performing feature matching between the current image frame and the saved reference image frame comprises: tracking the feature points in the current image frame to the feature points in the reference image frame using an optical flow tracking algorithm.
  12. A relocation apparatus, comprising:
    a first feature point acquisition module, configured to obtain feature points of a current image frame and descriptors of each feature point in response to determining that a relocation condition is satisfied;
    a first feature matching module, configured to perform feature matching between the current image frame and each saved key frame respectively based on the feature points of the current image frame and the descriptors of each feature point, to obtain feature point pairs after matching the current image frame with each key frame;
    a matching degree determination module, configured to determine a matching degree between the current image frame and each key frame based on the feature point pairs;
    a target key frame determination module, configured to determine the key frame with the highest matching degree with the current image frame as a target key frame; and
    a pose replacement module, configured to replace the camera pose corresponding to the current image frame with the camera pose corresponding to the target key frame.
  13. The relocation apparatus according to claim 12, wherein the matching degree determination module comprises:
    a homography matrix determination unit, configured to, for each key frame, determine a homography matrix between the current image frame and the key frame according to the feature point pairs after matching the current image frame with the key frame;
    an inlier number determination unit, configured to determine the number of feature point pairs, among the feature point pairs, that satisfy the relationship reflected by the homography matrix; and
    a matching degree determination unit, configured to use the number of feature point pairs as the matching degree between the current image frame and the key frame.
  14. The relocation apparatus according to claim 12, further comprising:
    a second feature point acquisition module, configured to obtain the feature points of the current image frame and the descriptors of each feature point in response to determining that a key frame preliminary screening condition is satisfied;
    a second feature matching module, configured to perform feature matching between the current image frame and a saved reference image frame based on the feature points of the current image frame and the descriptors of each feature point, to obtain matched second feature point pairs;
    a homography matrix estimation module, configured to estimate a homography matrix between the current image frame and the reference image frame according to the second feature point pairs; and
    a key frame determination module, configured to, in response to determining that the homography matrix can be estimated, determine that the current image frame is a key frame, and record the feature points of the current image frame, the descriptors of each feature point, and the camera pose corresponding to the current image frame.
  15. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, the relocation method according to any one of claims 1-8 is implemented.
  16. A non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the relocation method according to any one of claims 1-8.
  17. A computer program product, comprising computer program instructions which, when run on a computer, cause the computer to perform the relocation method according to any one of claims 1-8.
PCT/CN2023/079654 2022-03-25 2023-03-03 重定位方法及相关设备 WO2023179342A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210306850.9A CN116862979A (zh) 2022-03-25 2022-03-25 重定位方法及相关设备
CN202210306850.9 2022-03-25

Publications (1)

Publication Number Publication Date
WO2023179342A1 true WO2023179342A1 (zh) 2023-09-28

Family

ID=88099912

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/079654 WO2023179342A1 (zh) 2022-03-25 2023-03-03 重定位方法及相关设备

Country Status (2)

Country Link
CN (1) CN116862979A (zh)
WO (1) WO2023179342A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170278231A1 (en) * 2016-03-25 2017-09-28 Samsung Electronics Co., Ltd. Device for and method of determining a pose of a camera
CN108596976A (zh) * 2018-04-27 2018-09-28 腾讯科技(深圳)有限公司 相机姿态追踪过程的重定位方法、装置、设备及存储介质
CN108615247A (zh) * 2018-04-27 2018-10-02 深圳市腾讯计算机系统有限公司 相机姿态追踪过程的重定位方法、装置、设备及存储介质
CN110533694A (zh) * 2019-08-30 2019-12-03 腾讯科技(深圳)有限公司 图像处理方法、装置、终端及存储介质
CN111429517A (zh) * 2020-03-23 2020-07-17 Oppo广东移动通信有限公司 重定位方法、重定位装置、存储介质与电子设备
CN114120301A (zh) * 2021-11-15 2022-03-01 杭州海康威视数字技术股份有限公司 一种位姿确定方法、装置及设备

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170278231A1 (en) * 2016-03-25 2017-09-28 Samsung Electronics Co., Ltd. Device for and method of determining a pose of a camera
CN108596976A (zh) * 2018-04-27 2018-09-28 腾讯科技(深圳)有限公司 相机姿态追踪过程的重定位方法、装置、设备及存储介质
CN108615247A (zh) * 2018-04-27 2018-10-02 深圳市腾讯计算机系统有限公司 相机姿态追踪过程的重定位方法、装置、设备及存储介质
CN110533694A (zh) * 2019-08-30 2019-12-03 腾讯科技(深圳)有限公司 图像处理方法、装置、终端及存储介质
CN111429517A (zh) * 2020-03-23 2020-07-17 Oppo广东移动通信有限公司 重定位方法、重定位装置、存储介质与电子设备
CN114120301A (zh) * 2021-11-15 2022-03-01 杭州海康威视数字技术股份有限公司 一种位姿确定方法、装置及设备

Also Published As

Publication number Publication date
CN116862979A (zh) 2023-10-10

Similar Documents

Publication Publication Date Title
CN109242913B (zh) 采集器相对参数的标定方法、装置、设备和介质
CN110322500B (zh) 即时定位与地图构建的优化方法及装置、介质和电子设备
CN108805917B (zh) 空间定位的方法、介质、装置和计算设备
EP3175427B1 (en) System and method of pose estimation
US10110913B2 (en) Motion estimation using hybrid video imaging system
US20150103183A1 (en) Method and apparatus for device orientation tracking using a visual gyroscope
US9536321B2 (en) Apparatus and method for foreground object segmentation
US9756261B2 (en) Method for synthesizing images and electronic device thereof
CN110335317B (zh) 基于终端设备定位的图像处理方法、装置、设备和介质
US20150310617A1 (en) Display control device and display control method
US10204445B2 (en) Information processing apparatus, method, and storage medium for determining a failure of position and orientation measurement of an image capturing device
CN110349212B (zh) 即时定位与地图构建的优化方法及装置、介质和电子设备
US11042984B2 (en) Systems and methods for providing image depth information
US20150154455A1 (en) Face recognition with parallel detection and tracking, and/or grouped feature motion shift tracking
US10762713B2 (en) Method for developing augmented reality experiences in low computer power systems and devices
CN109472824B (zh) 物品位置变化检测方法及装置、存储介质、电子设备
CN110956131B (zh) 单目标追踪方法、装置及系统
JP2019106008A (ja) 推定装置、推定方法、及び推定プログラム
WO2023179342A1 (zh) 重定位方法及相关设备
CN113763466A (zh) 一种回环检测方法、装置、电子设备和存储介质
CN112085842A (zh) 深度值确定方法及装置、电子设备和存储介质
TWI694719B (zh) 影像處理方法,電子裝置及非暫態電腦可讀取儲存媒體
CN113760087A (zh) 命中点位置信息的确定方法和装置
US8463037B2 (en) Detection of low contrast for image processing
CN115210758A (zh) 运动模糊稳健的图像特征匹配

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23773593

Country of ref document: EP

Kind code of ref document: A1