WO2019205865A1 - Relocation method, apparatus, device and storage medium for a camera pose tracking process - Google Patents

Relocation method, apparatus, device and storage medium for a camera pose tracking process

Info

Publication number
WO2019205865A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature point
initial
node
camera
target
Prior art date
Application number
PCT/CN2019/079768
Other languages
English (en)
French (fr)
Inventor
林祥凯
凌永根
暴林超
刘威
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to EP19792167.9A (EP3786892B1)
Publication of WO2019205865A1
Priority to US16/915,825 (US11481923B2)

Classifications

    • G06T7/75: Determining position or orientation of objects or cameras using feature-based methods involving models
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06F16/9027: Indexing; Data structures therefor; Storage structures (Trees)
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/579: Depth or shape recovery from multiple images from motion
    • G06T7/74: Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G06V10/757: Matching configurations of points or features
    • G06V20/20: Scenes; Scene-specific elements in augmented reality scenes
    • G06T2207/20016: Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T2207/30244: Camera pose

Definitions

  • the embodiments of the present application relate to the field of augmented reality, and in particular to a relocation method, apparatus, device, and storage medium for a camera pose tracking process.
  • Visual SLAM (Simultaneous Localization and Mapping) refers to the technology of estimating the motion of a body that carries a camera, without prior information about the environment, while building a model of the environment during the motion.
  • SLAM can be used in the field of AR (Augmented Reality), robotics and unmanned driving.
  • the first frame image captured by the camera is usually used as a marker image (Anchor).
  • the device tracks the feature points commonly shared between the current image and the mark image, and calculates the pose change of the camera in the real world according to the change of the feature point position between the current image and the mark image.
  • feature points in the current image may be lost (Lost), so that the tracking cannot be continued.
  • the current image needs to be relocated using the SLAM relocation method.
  • the embodiment of the present application provides a relocation method, device, device and storage medium for a camera attitude tracking process.
  • the technical solution is as follows:
  • a relocation method of a camera attitude tracking process is provided, which is applied to a device having a camera for sequentially performing camera attitude tracking of a plurality of marker images, the method comprising:
  • relocating to obtain the target pose parameter of the camera according to the initial pose parameter and the pose change amount.
  • a repositioning device for a camera attitude tracking process which is applied to a device having a camera for sequentially performing camera attitude tracking of a plurality of marker images
  • the apparatus comprises:
  • An image acquisition module configured to acquire a current image acquired after the i-th mark image of the plurality of mark images, i>1;
  • An information acquiring module configured to acquire an initial feature point and an initial pose parameter of the first one of the plurality of mark images when the current image meets a relocation condition
  • a feature point tracking module configured to perform feature point tracking on the current image with respect to the initial feature points of the first marker image, to obtain a plurality of sets of matching feature point pairs;
  • a feature point screening module configured to filter the plurality of sets of matching feature point pairs according to constraint conditions to obtain the filtered matching feature point pairs;
  • a calculation module configured to calculate, according to the filtered matching feature point pairs, a pose change amount when the camera changes from the initial pose parameter to the target pose parameter;
  • a relocation module configured to reposition the target pose parameter of the camera according to the initial pose parameter and the pose change amount.
  • an electronic device including a memory and a processor
  • At least one instruction is stored in the memory, the at least one instruction being loaded and executed by the processor to implement a relocation method in a camera pose tracking process as described above.
  • a computer-readable storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the relocation method in a camera pose tracking process as described above.
  • relocation can thus be implemented in an Anchor-SLAM algorithm that tracks a plurality of consecutive marker images, thereby reducing the possibility of interruption of the tracking process.
  • Since the relocation process relocates the current image relative to the first marker image, the cumulative error generated by the tracking process over the plurality of marker images can also be eliminated, thereby solving the problem that the SLAM relocation method in the related art is not applicable to this modified SLAM algorithm.
  • In addition, the matching feature point pairs used to calculate the pose change are selected by filtering the multiple sets of matching feature point pairs according to the constraint conditions.
  • On the one hand, the matching speed is improved; on the other hand, since the selected feature point pairs are feature point pairs with better matching accuracy, the matching precision can also be improved.
  • FIG. 1 is a schematic diagram of a scenario of an AR application scenario provided by an exemplary embodiment of the present application
  • FIG. 2 is a schematic diagram of a scenario of an AR application scenario provided by an exemplary embodiment of the present application
  • FIG. 3 is a schematic diagram of the principle of an Anchor-Switching AR System algorithm provided by an exemplary embodiment of the present application
  • FIG. 4 is a structural block diagram of an electronic device provided by an exemplary embodiment of the present application.
  • FIG. 5 is a flowchart of a method for relocating a camera attitude tracking process according to an exemplary embodiment of the present application
  • FIG. 6 and FIG. 7 are schematic diagrams of images in which a positioning error occurs in an AR application scenario provided by an exemplary embodiment of the present application
  • FIG. 8 is a flowchart of a method for relocating a camera attitude tracking process according to an exemplary embodiment of the present application.
  • FIG. 9 is a schematic diagram of a pyramid image provided by an exemplary embodiment of the present application.
  • FIG. 10 is a flowchart of a method for relocating a camera attitude tracking process according to an exemplary embodiment of the present application.
  • FIG. 11 is a flowchart of a method for relocating a camera pose tracking process provided by an exemplary embodiment of the present application.
  • FIG. 12 is a flowchart of a method for relocating a camera attitude tracking process according to an exemplary embodiment of the present application.
  • FIG. 13 is a flowchart of a method for relocating a camera attitude tracking process according to an exemplary embodiment of the present application.
  • FIG. 14 is a schematic diagram of the principle of a polar line constraint provided by an exemplary embodiment of the present application.
  • FIG. 15 is a flowchart of a method for relocating a camera attitude tracking process according to an exemplary embodiment of the present application.
  • FIG. 16 is a schematic diagram of the principle of a feature point tracking process provided by an exemplary embodiment of the present application.
  • FIG. 17 is a flowchart of a method for relocating a camera pose tracking process provided by an exemplary embodiment of the present application.
  • FIG. 18 is a schematic diagram showing the principle of a rasterized screening feature point process provided by an exemplary embodiment of the present application.
  • FIG. 19 is a flowchart of a method for relocating a camera attitude tracking process according to an exemplary embodiment of the present application.
  • FIG. 20 is a block diagram of a relocation device of a camera pose tracking process provided by an exemplary embodiment of the present application
  • FIG. 21 is a block diagram of an electronic device provided by an exemplary embodiment of the present application.
  • AR Augmented Reality
  • Virtual elements include, but are not limited to, images, video, and 3D models.
  • the goal of AR technology is to overlay the virtual world on the real world on the screen and to interact with it.
  • the camera pose parameters include a rotation matrix and a displacement vector, the rotation matrix is used to characterize the rotation angle of the camera in the real world, and the displacement vector is used to characterize the displacement distance of the camera in the real world.
  • the device adds a virtual character image to the image captured by the camera.
  • As the position and pose of the camera change in the real world, the image captured by the camera changes and the display orientation of the virtual character changes accordingly; this simulates the effect that the virtual character stays still in the image while the camera changes its position and pose as it captures the image and the virtual character, presenting the user with a realistic three-dimensional picture.
  • Anchor-Switching AR System: an AR system that determines the camera pose parameters in a natural scene based on camera pose tracking connected over multiple marker images (Anchors), and then superimposes the virtual world on the images captured by the camera according to the camera pose parameters.
  • IMU Inertial Measurement Unit
  • an IMU consists of three single-axis accelerometers and three single-axis gyros.
  • the accelerometers are used to detect the acceleration signals of an object on each coordinate axis of the three-dimensional coordinate system, from which the displacement vector is calculated; the gyroscopes are used to detect the rotation matrix of the object in the three-dimensional coordinate system.
  • the IMU includes a gyroscope, an accelerometer, and a geomagnetic sensor.
  • the three-dimensional coordinate system is established as follows:
  • 1. the X-axis is defined by the vector product Y × Z; at the current position of the device, it points east along a direction tangent to the ground;
  • 2. the Y-axis, at the current position of the device, points toward the north pole of the earth's magnetic field along a direction tangent to the ground;
  • 3. the Z-axis points toward the sky and is perpendicular to the ground.
  • the present application provides a relocation method suitable for the Anchor-Switching AR System algorithm.
  • the Anchor-Switching AR System algorithm divides the camera's motion process into at least two tracking processes for tracking. Each tracking process corresponds to the respective marker image.
  • When the tracking effect of the current image relative to the i-th marker image is worse than a preset condition (for example, the number of feature points that can be matched is less than a preset threshold), the previous image of the current image is determined as the (i+1)-th marker image, and the (i+1)-th tracking process is started, where i is a positive integer.
  • FIG. 3 is a schematic diagram showing the principle of an Anchor-Switching AR System algorithm provided by an exemplary embodiment of the present application.
  • In the real world there is an object 320; the device 340 provided with a camera is moved by the user, and during the movement multiple frames of images 1-6 including the object 320 are captured.
  • the device determines image 1 as the first marker image (born-anchor or born-image) and records the initial pose parameter, which may be acquired by the IMU; then image 2 is tracked by feature points relative to image 1, and the pose parameter of the camera when image 2 is captured is calculated according to the initial pose parameter and the feature point tracking result.
  • image 3 is tracked relative to image 1, and the pose parameter of the camera when image 3 is captured is calculated according to the initial pose parameter and the feature point tracking result.
  • image 4 is tracked relative to image 1, and the pose parameter of the camera when image 4 is captured is calculated according to the initial pose parameter and the feature point tracking result.
  • image 5 is then tracked relative to image 1; if the feature point tracking effect is worse than the preset condition (for example, the number of matched feature points is small), image 4 is determined as the second marker image, and image 5 is tracked by feature points relative to image 4. The displacement change of the camera between capturing image 4 and image 5 is calculated and, combined with the displacement change between image 1 and image 4 and the initial pose parameter, the pose parameter of the camera when capturing image 5 is obtained. Image 6 is then tracked relative to image 4, and so on: whenever the feature point tracking effect of the current image deteriorates, the previous frame of the current image can be determined as a new marker image, and feature point tracking is performed again after switching to the new marker image.
  • Optionally, feature point tracking may employ an algorithm based on the visual odometry principle, such as a feature point method or a direct method.
  • the Anchor-Switching AR System tracking process may suffer loss (Lost). The loss phenomenon means that not enough feature points can be matched in the current image, resulting in tracking failure.
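  • For illustration only, the following is a minimal sketch of such an anchor-switching tracking loop. It assumes a hypothetical helper track_features(marker, frame) that returns the matched feature point pairs and the pose estimated from them, and the match-count threshold is an assumed value.

```python
def track_with_anchor_switching(frames, min_matches=50):
    """Schematic anchor-switching loop: each frame is tracked against the current
    marker image; when too few feature points can still be matched, the previous
    frame becomes the next marker image (anchor switch)."""
    marker = frames[0]          # first marker image (born-anchor)
    previous = frames[0]
    poses = []
    for frame in frames[1:]:
        matches, pose = track_features(marker, frame)    # hypothetical helper
        if len(matches) < min_matches:
            marker = previous                             # switch to a new anchor
            matches, pose = track_features(marker, frame)
        # the per-anchor pose would still have to be chained with the marker's own
        # pose to obtain the pose in the first marker's reference frame (omitted)
        poses.append(pose)
        previous = frame
    return poses
```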
  • the device includes a processor 420, a memory 440, a camera 460, and an IMU 480.
  • Processor 420 includes one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
  • the processor 420 is configured to execute at least one of instructions, code, code segments, and programs stored in the memory 440.
  • the processor 420 is electrically connected to the memory 440.
  • processor 420 is coupled to memory 440 via a bus.
  • Memory 440 stores one or more instructions, code, code segments, and/or programs. The instructions, code, code segments and/or programs, when executed by processor 420, are used to implement the SLAM relocation method provided in the following embodiments.
  • the processor 420 is also electrically coupled to the camera 460.
  • processor 420 is coupled to camera 460 via a bus.
  • Camera 460 is a sensor device having image acquisition capabilities. Camera 460 may also be referred to as a camera, a photosensitive device, and the like. Camera 460 has the ability to continuously acquire images or acquire images multiple times.
  • camera 460 is located inside or outside the device.
  • the camera 460 is a monocular camera.
  • the processor 420 is also electrically connected to the IMU 480.
  • the IMU 480 is configured to acquire the pose parameters of the camera every predetermined time interval, and record the time stamp of each set of pose parameters at the time of acquisition.
  • the camera's pose parameters include: displacement vector and rotation matrix. Among them, the rotation matrix acquired by IMU480 is relatively accurate, and the displacement vector acquired may have a large error due to the actual environment.
  • Referring to FIG. 5, a flowchart of a method for relocating a camera pose tracking process provided by an exemplary embodiment of the present application is shown.
  • This embodiment is exemplified by the application of the relocation method to the apparatus shown in FIG. 4 for performing camera attitude tracking of a plurality of marker images in sequence.
  • the method includes:
  • Step 502 Acquire a current image acquired after the i-th mark image in the plurality of mark images.
  • the camera in the device collects a frame image at a preset time interval to form an image sequence.
  • the camera acquires a frame of image forming image sequence according to a preset time interval during motion (translation and/or rotation).
  • the device determines the first frame image in the image sequence (or one of the first few frames that meets a predetermined condition) as the first marker image, performs feature point tracking of subsequently acquired images with respect to the first marker image, and calculates the camera pose parameter according to the feature point tracking result.
  • If the feature point tracking effect of the current frame image is worse than a preset condition, the previous frame of the current frame image is determined as the second marker image, subsequently acquired images are tracked by feature points with respect to the second marker image, and the camera pose parameter is calculated according to the feature point tracking result, and so on.
  • the device can sequentially perform camera attitude tracking of a plurality of marked images in sequence.
  • When the camera is in the i-th tracking process corresponding to the i-th marker image, the camera captures the current image.
  • the current image is a certain frame image acquired after the i-th mark image, where i is an integer greater than one.
  • Step 504 Acquire initial feature points and initial pose parameters of the first one of the plurality of mark images when the current image meets the relocation condition;
  • the initial pose parameter is used to indicate the camera pose when the camera captures the first marker image.
  • the device determines if the current image meets the relocation criteria.
  • the relocation condition is used to indicate that the tracking process of the current image with respect to the i-th marker image fails, or that the accumulated error in the historical tracking process is higher than a preset condition.
  • Optionally, the device performs tracking of the current image relative to the i-th marker image; if there is no feature point in the current image that matches the i-th marker image, or the number of feature points in the current image that match the i-th marker image is less than a first number, it is determined that the tracking process of the current image with respect to the i-th marker image fails and the relocation condition is met.
  • Optionally, when the device determines that the number of frames between the current image and the last relocated image is greater than a second number, it determines that the accumulated error in the historical tracking process is higher than the preset condition; or, when the device determines that the number of marker images between the i-th marker image and the first marker image is greater than a third number, it determines that the accumulated error in the historical tracking process is higher than the preset condition.
  • This embodiment does not limit the specific condition content of the relocation condition.
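  • The two kinds of triggers described above can be combined into a single check; the sketch below is only illustrative, and all thresholds (the first, second and third numbers) are assumed values.

```python
def meets_relocation_condition(matched_feature_count, frames_since_last_relocation,
                               markers_since_first, min_matches=20,
                               max_frames=100, max_markers=5):
    """Return True when relocation should be attempted: either tracking against the
    i-th marker image failed, or the accumulated error is assumed to be too large."""
    tracking_failed = matched_feature_count < min_matches           # first number
    drift_suspected = (frames_since_last_relocation > max_frames    # second number
                       or markers_since_first > max_markers)        # third number
    return tracking_failed or drift_suspected
```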
  • the device then attempts to track the current image relative to the first marker image. At this point, the device obtains the cached initial feature points and initial pose parameters of the first marker image.
  • the initial feature point is a feature point extracted from the first marker image, and the initial feature point may be plural, such as 10-500.
  • the initial pose parameter is used to indicate the camera pose when the camera captures the first marker image.
  • the initial pose parameters include a rotation matrix R and a displacement vector T, and the initial pose parameters can be acquired by the IMU.
  • Step 506 Perform feature point tracking on the current image relative to the initial feature point of the first marker image to obtain multiple sets of matching feature point pairs.
  • each set of matching feature point pairs includes two initial feature points and target feature points that match each other.
  • the feature point tracking can use a visual odometer-based tracking algorithm, which is not limited in this application.
  • In one embodiment, feature point tracking uses the KLT (Kanade-Lucas-Tomasi) optical flow tracking algorithm; in another embodiment, feature point tracking is performed based on the ORB (Oriented FAST and Rotated BRIEF) feature descriptor.
  • the specific algorithm for feature point tracking is not limited in this application, and the feature point tracking process may adopt a feature point method or a direct method.
  • Optionally, the device performs feature point extraction on the first marker image to obtain N initial feature points; the device further performs feature point extraction on the current image to obtain M candidate feature points; and then the M candidate feature points are matched one by one against the N initial feature points to determine at least one set of matching feature point pairs.
  • Each set of matching feature point pairs includes: an initial feature point and a target feature point.
  • the initial feature point is a feature point on the first marker image, and the target feature point is a candidate feature point having the highest matching degree with the initial feature point on the current image.
  • the number of initial feature points is greater than or equal to the number of matching feature point pairs.
  • the number of initial feature points is 450, and the matching feature point pairs are 320 groups.
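  • As a rough illustration of this extraction-and-matching step, the sketch below uses OpenCV ORB features and brute-force Hamming matching; it is a plain, unaccelerated variant (the word-bag acceleration is described later), and the feature budget is an assumed value.

```python
import cv2

def track_against_first_marker(first_marker_image, current_image, max_features=500):
    """Extract ORB feature points on both images and match every candidate feature
    point of the current image against the initial feature points of the marker."""
    orb = cv2.ORB_create(nfeatures=max_features)
    kp1, desc1 = orb.detectAndCompute(first_marker_image, None)  # initial feature points
    kp2, desc2 = orb.detectAndCompute(current_image, None)       # candidate feature points
    if desc1 is None or desc2 is None:
        return []
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc1, desc2)
    # each DMatch links one initial feature point to its best-matching target feature point
    return [(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt) for m in matches]
```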
  • Step 508: Filter the plurality of sets of matching feature point pairs according to the constraint conditions to obtain the filtered matching feature point pairs.
  • the terminal can select at least four more accurate pairs of matching feature point pairs for subsequent calculation.
  • Optionally, the terminal filters the sets of matching feature point pairs according to the constraint conditions to obtain the filtered matching feature point pairs.
  • Constraints are used to constrain the matching accuracy of matching feature point pairs.
  • the constraint includes at least one of the following three conditions:
  • the matching uniqueness is a condition for indicating that the target feature point is a feature point that the initial feature point uniquely matches.
  • for two two-dimensional images captured from different views, the corresponding point of a matching point on the other view should lie on the corresponding epipolar line; that is, the matching feature point pairs in the two two-dimensional images should satisfy the epipolar constraint test condition.
  • the epipolar constraint test condition is used to detect whether the epipolar constraint is satisfied between the target feature point and the initial feature point.
  • a large number of feature points may be in a dense area.
  • the region representative constraint is used to pick out representative target feature points in a local region of the current image.
  • Step 510 Calculate a pose change amount when the camera changes from an initial pose parameter to a target pose parameter according to the matched matching feature point pair;
  • the target pose parameter is used to indicate a camera pose when the current image is acquired.
  • Optionally, the device calculates the homography matrix homography between the two frame images according to at least four selected matching feature point pairs (initial feature points and target feature points), and decomposes the homography matrix homography to obtain the pose change amounts R_relocalize and T_relocalize when the camera changes from the initial pose parameter to the target pose parameter.
  • the homography matrix describes the mapping relationship between two planes. If the feature points in the natural scene (real environment) fall on the same physical plane, the motion estimation can be performed through the homography matrix.
  • Optionally, the device decomposes the homography matrix, which is calculated from the at least four pairs of matching feature points by RANSAC, to obtain the rotation matrix R_relocalize and the translation vector T_relocalize.
  • R relocalize is the rotation matrix when the camera changes from the initial pose parameter to the target pose parameter
  • T relocalize is the displacement vector when the camera changes from the initial pose parameter to the target pose parameter
  • Step 512: the target pose parameter is obtained by relocation according to the initial pose parameter and the pose change amount.
  • the device transforms the initial pose parameter by the pose change amount, and the target pose parameter obtained by relocation is the camera pose when the camera captures the current image.
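  • The composition of the initial pose with the pose change can be sketched as follows. The patent only states that the initial pose parameter is transformed by the pose change amount; the multiplication order shown here is one common convention and is an assumption.

```python
import numpy as np

def relocalized_pose(R_init, T_init, R_relocalize, T_relocalize):
    """Apply the pose change obtained from the homography decomposition on top of the
    initial pose of the first marker image (assumed composition order)."""
    R_target = R_relocalize @ R_init
    T_target = R_relocalize @ T_init + T_relocalize
    return R_target, T_target
```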
  • the terminal determines the current image as the i+1th mark image.
  • the terminal continues feature point tracking based on the (i+1)th tag image.
  • the terminal may continue to generate the i+2th marker image, the i+3th marker image, the i+4th marker image, and the like according to the subsequent feature point tracking situation, and so on.
  • For the subsequent tracking process, refer to the tracking content shown in FIG. 3 above.
  • In summary, by relocating the current image relative to the first marker image when the current image meets the relocation condition, the relocation method provided by this embodiment implements relocation in the Anchor-Switching AR System algorithm that continuously tracks a plurality of marker images, thereby reducing the possibility of interruption of the tracking process and solving the problem that the SLAM relocation method in the related art is not suitable for relocation in the AR field.
  • Since the relocation process repositions the current image relative to the first marker image, and the first marker image can be considered free of cumulative error, this embodiment can also eliminate the cumulative error generated by the tracking process over multiple marker images.
  • For example, the Anchor-Switching AR System algorithm is applied to an AR game, the camera captures a physical keyboard on a table, and the device superimposes a virtual character on the Enter key of the physical keyboard according to the camera pose parameter.
  • If the relocation technique is not used, a tracking error is generated after a period of time; when the device calculates the position of the virtual character according to the erroneous camera pose parameter, a significant drift occurs and the virtual character drifts to the position of the space bar, as shown in FIG. 6. If the relocation technique is adopted, the accumulated error is eliminated after relocation succeeds, and when the position of the virtual character is calculated according to the more accurate camera pose parameter, the virtual character can remain unchanged near the Enter key.
  • Since the first marker image is usually the first frame image captured by the camera and is also the marker image used in the relocation process, the first marker image needs to be preprocessed in order to improve the success rate of feature point matching.
  • As shown in FIG. 8, before step 502, the method further includes the following steps:
  • Step 501a recording an initial pose parameter corresponding to the first marker image
  • the IMU is set in the device, and the camera's pose parameters and time stamps are collected periodically by the IMU.
  • the pose parameters include a rotation matrix and a displacement vector, and the timestamp is used to represent the acquisition time of the pose parameter.
  • the rotation matrix acquired by the IMU is relatively accurate.
  • the shooting time of each frame of image is recorded at the same time.
  • the device queries and records the initial pose parameters of the camera when taking the first marker image based on the shooting time of the first marker image.
  • Step 501b obtaining n pyramid images with different scales corresponding to the first marker image, where n is an integer greater than one;
  • the device also extracts the initial feature points in the first marker image.
  • the feature extraction algorithm used by the device to extract feature points may be the FAST (Features from Accelerated Segment Test) detection algorithm, the Shi-Tomasi corner detection algorithm, the Harris corner detection algorithm, the SIFT (Scale-Invariant Feature Transform) algorithm, or the ORB (Oriented FAST and Rotated BRIEF) algorithm.
  • An ORB feature point includes a FAST corner point (key-point) and a BRIEF descriptor (Binary Robust Independent Elementary Feature Descriptor).
  • the FAST corner point refers to the location of the ORB feature point in the image.
  • the FAST corner point mainly detects the obvious change of the local pixel gray scale, and is known for its fast speed.
  • the idea of the FAST corner is: if a pixel differs greatly from its neighborhood pixels (it is too bright or too dark), the pixel may be a corner point.
  • the BRIEF descriptor is a binary vector that describes the information of the pixels around the key-point in a manually designed way.
  • the description vector of the BRIEF descriptor consists of a number of 0's and 1's, where 0's and 1's encode the size relationship of two pixels near the FAST corner.
  • Since the ORB feature is fast to compute, it is suitable for implementation on mobile devices. However, the ORB feature descriptor has no scale invariance, while the scale change when a user holds the camera to capture images is very obvious; the user is likely to observe the scene corresponding to the first marker image from a very far or very close distance. In an optional implementation, the device therefore generates n pyramid images of different scales for the first marker image.
  • the pyramid image refers to an image obtained by scaling the first marker image by a preset ratio. Taking the pyramid image including the four-layer image as an example, the first marker image is scaled according to the scaling ratios of 1.0, 0.8, 0.6, and 0.4, and four images of different scales are obtained.
  • Step 501c extracting initial feature points for each pyramid image, and recording two-dimensional coordinates of the initial feature points when the pyramid image is scaled to the original size.
  • the device extracts feature points for each layer of pyramid image and calculates an ORB feature descriptor. For the feature points extracted on the pyramid image that is not the original scale (1.0), after the pyramid image is scaled to the original scale, the two-dimensional coordinates of each feature point on the pyramid image of the original scale are recorded.
  • the feature points and two-dimensional coordinates on these pyramid images can be called layer-keypoint.
  • Optionally, there are at most 500 feature points on each layer of the pyramid image.
  • the feature points on each pyramid image are determined as initial feature points.
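  • A minimal sketch of this layer-keypoint extraction with OpenCV ORB is given below; the scaling ratios and the per-level feature budget follow the example values above.

```python
import cv2
import numpy as np

def pyramid_initial_features(marker_image, scales=(1.0, 0.8, 0.6, 0.4), per_level=500):
    """Extract ORB feature points on several rescaled copies of the first marker image
    and record their 2D coordinates at the original scale (layer-keypoints)."""
    orb = cv2.ORB_create(nfeatures=per_level)
    points, descriptors = [], []
    for s in scales:
        resized = cv2.resize(marker_image, None, fx=s, fy=s, interpolation=cv2.INTER_AREA)
        keypoints, desc = orb.detectAndCompute(resized, None)
        if desc is None:
            continue
        # map each keypoint back to the coordinates it has on the original-size image
        points.extend([(kp.pt[0] / s, kp.pt[1] / s) for kp in keypoints])
        descriptors.append(desc)
    return points, (np.vstack(descriptors) if descriptors else None)
```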
  • When the current image has a large scale and the high-frequency details on the current image are clearly visible, the current image will have a higher matching score with a pyramid image of a lower layer (such as the original image).
  • When the current image has a small scale and only the low-frequency information on the current image is visible, the current image will have a higher matching score with a pyramid image of a higher layer.
  • Referring to FIG. 9, the first marker image has three pyramid images 91, 92 and 93.
  • the pyramid image 91 is located at the first layer of the pyramid and has the smallest scale of the three images; the pyramid image 92 is located at the second layer of the pyramid and has the intermediate scale; the pyramid image 93 is located at the third layer of the pyramid and has the largest scale of the three images.
  • When the current image 94 is tracked by feature points relative to the first marker image, the device can match the current image 94 against the feature points extracted from the three pyramid images respectively. Since the scales of the pyramid image 93 and the current image 94 are closer, the feature points extracted from the pyramid image 93 will have a higher matching score.
  • In this embodiment, pyramid images of multiple scales are constructed for the first marker image, and the initial feature points on each pyramid image are extracted for the subsequent feature point tracking process; the feature points on multiple scales are matched together.
  • In this way, the scale of the first marker image is adjusted automatically, achieving scale invariance.
  • The following description is directed to the feature point tracking process shown in step 506.
  • the computational complexity of an ordinary feature point tracking process is N × M matching operations.
  • the terminal performs matching acceleration based on the word bag model.
  • BoW (Bag of Words).
  • For example, an article may have 10,000 words, of which there may be only 500 different words, each appearing a different number of times. The word bag is like a bag, and each bag contains all the occurrences of the same word; this constitutes a way of representing text that does not take grammar or word order into account.
  • an image is usually expressed in terms of feature points and feature descriptors of the feature points. If the feature descriptor of the feature point is regarded as a word, the corresponding word bag model can be constructed.
  • step 506 includes the following sub-steps, as shown in FIG. 10:
  • Step 506a: cluster the initial feature points into a first node tree by using the word bag model, where each parent node of the first node tree includes K child nodes and each node includes the initial feature points clustered into the same class;
  • each ORB feature point includes a FAST corner point (key-point) and a BRIEF descriptor.
  • the BRIEF descriptor characterizes the initial feature point and can be used for clustering.
  • the BoW in this embodiment can use the DBoW2 library, an open source software library developed by Lopez et al. at the University of Zaragoza.
  • the device clusters a plurality of initial feature points into the first node tree through the word bag model.
  • the device first uses all the initial feature points as the root node of the first node tree and clusters them into K categories through the word bag model to form the first-layer nodes, where each node includes the initial feature points belonging to the same class; then the feature points in any one of the first-layer nodes are clustered into K categories to form the K child nodes of that node, and so on, until the feature points in any node of the L-th layer are clustered into K categories to form the K child nodes of that node.
  • the clustering algorithm uses a K-means clustering algorithm, and the K-means clustering algorithm can use the features extracted from the images in the training set to train.
  • Step 506b extracting candidate feature points in the current image
  • Optionally, the device extracts the candidate feature points in the current image in the same manner as the initial feature points in the first marker image.
  • the feature extraction algorithm used by the device to extract feature points may be the FAST (Features from Accelerated Segment Test) detection algorithm, the Shi-Tomasi corner detection algorithm, the Harris corner detection algorithm, the SIFT (Scale-Invariant Feature Transform) algorithm, or the ORB (Oriented FAST and Rotated BRIEF) algorithm.
  • An ORB feature point includes a FAST corner point (key-point) and a BRIEF descriptor (Binary Robust Independent Elementary Feature Descriptor).
  • the SIFT feature can also be extracted.
  • the embodiment of the present application does not limit this; it is only required that the same type of feature is extracted for the first marker image and the current image.
  • Step 506c: cluster the candidate feature points into a second node tree by using the word bag model, where each parent node of the second node tree includes K child nodes and each node includes the candidate feature points clustered into the same class;
  • candidate feature points are represented by ORB feature points.
  • Each ORB feature point includes a FAST corner point (key-point) and a BRIEF descriptor.
  • the BRIEF descriptor characterizes the candidate feature point and can be used for clustering.
  • the BoW in this embodiment can use the DBoW2 library, an open source software library developed by Lopez et al. at the University of Zaragoza.
  • the device clusters a plurality of candidate feature points into the second node tree through the word bag model.
  • the device first uses all the candidate feature points as the root node of the second node tree and clusters them into K categories through the word bag model to form the first-layer nodes, where each node includes the candidate feature points belonging to the same class; then the feature points in any one of the first-layer nodes are clustered into K categories to form the K child nodes of that node, and so on, until the feature points in any node of the L-th layer are clustered into K categories to form the K child nodes of that node.
  • the clustering algorithm uses a K-means clustering algorithm, and the K-means clustering algorithm can use the features extracted from the images in the training set to train.
  • Step 506d: Perform feature point tracking between the i-th first node in the forward index of the first node tree and the i-th second node in the forward index of the second node tree to obtain multiple sets of matching feature point pairs.
  • the forward index refers to a sequence when traversing in a depth-first traversal order or a breadth-first traversal order.
  • the i-th first node and the i-th second node are nodes located at the same position in the two node trees.
  • the i-th first node is the third node in the third-layer node on the first node tree
  • the i-th second node is the third node in the third-layer node on the second node tree.
  • the i-th first node is an intermediate node in the first node tree
  • the i-th second node is an intermediate node in the second node tree
  • the intermediate node is a node between the root node and the leaf nodes. If the i-th first/second node is the root node, the computational complexity is not reduced compared with the ordinary feature point tracking process; if the i-th first/second node is a leaf node, correctly matched feature points may be missed.
  • the method reduces the matching search range to approximately N × (M / K^L), thereby achieving exponentially accelerated matching.
  • For example, there are N initial feature points on the first marker image, and the N initial feature points are clustered into the first node tree; there are M target feature points on the current image (to be matched with the N initial feature points), where M ≤ N, and the M target feature points are clustered into the second node tree.
  • Optionally, the third-layer nodes of the two node trees (counting from the root node) are used as the index layer. For each node of the index layer, the feature set Sa corresponding to the first node is found through the forward index of the first node tree, the feature set Sb corresponding to the second node is found through the forward index of the second node tree, and feature matching is calculated between Sa and Sb.
  • the number of target feature points on the current image is the same or smaller, so the matching is reduced to matching between two small sets (each containing a few to dozens of feature points).
  • In summary, the relocation method provided by this embodiment separately clusters the feature points of the two images into two node trees based on the word bag model, and uses nodes at the same position in the two node trees to narrow the matching range during feature point matching, thereby accelerating the feature point tracking process, enabling faster feature point tracking of the current image relative to the first marker image and a faster relocation.
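  • As an illustration of the idea (not the DBoW2 implementation itself), the sketch below uses a single clustering level to stand in for the index layer of the node tree: descriptors of both images are assigned to the words of an assumed pre-trained vocabulary (K cluster centres), and only descriptors under the same word are compared.

```python
import numpy as np
import cv2

def assign_to_words(descriptors, vocabulary):
    """Assign each ORB descriptor to its nearest vocabulary word. `vocabulary` is an
    assumed pre-trained (K, 32) array of cluster centres, e.g. obtained by running
    k-means on descriptors extracted from a training image set."""
    d = descriptors.astype(np.float32)
    v = vocabulary.astype(np.float32)
    dists = ((d[:, None, :] - v[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def match_within_nodes(desc_marker, desc_current, vocabulary):
    """Compare only descriptors that fall under the same vocabulary node, which narrows
    the brute-force search from roughly N*M to about N*M/K comparisons."""
    words_marker = assign_to_words(desc_marker, vocabulary)
    words_current = assign_to_words(desc_current, vocabulary)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    pairs = []
    for node in np.unique(words_marker):
        idx_a = np.where(words_marker == node)[0]
        idx_b = np.where(words_current == node)[0]
        if idx_b.size == 0:
            continue
        for m in matcher.match(desc_marker[idx_a], desc_current[idx_b]):
            pairs.append((int(idx_a[m.queryIdx]), int(idx_b[m.trainIdx])))
    return pairs
```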
  • The following describes the process of screening the matching feature point pairs according to the constraint conditions in step 508. The multiple sets of matching feature point pairs can be filtered in the following three aspects.
  • the same initial feature point may have multiple candidate feature points in the target image, each candidate feature point has a matching degree with the initial feature point, and the top-ranked candidate feature point is generally determined as the feature point matching the initial feature point.
  • In the feature point matching process, two candidate feature points whose matching degrees are very close can easily appear, for example when there are repeated patterns on a tablecloth; two candidate feature points with very close matching degrees have a high probability of causing a matching error. That is to say, such a matching feature point pair is likely to be matched incorrectly because the match is not unique, and should be deleted.
  • the matching uniqueness condition therefore requires that the top-ranked candidate feature point (the target feature point) of each initial feature point is separated by a sufficient margin from the second-ranked candidate feature point, that is, the target feature point is the feature point that uniquely matches the initial feature point; otherwise, the pair of matching feature point pairs is discarded.
  • step 508 may optionally include the following steps:
  • Step 5081: for the initial feature point in any pair of matching feature point pairs, acquire the target feature point and the second-ranked feature point that match the initial feature point, where the target feature point is the feature point whose matching degree ranks first among the plurality of candidate feature points matching the initial feature point, and the second-ranked feature point is the feature point whose matching degree ranks second among the plurality of candidate feature points matching the initial feature point;
  • Step 5082 It is detected whether a difference between the first matching degree and the second ranked matching degree is greater than a preset threshold
  • the preset threshold is 80%.
  • the first matching degree is the matching degree between the target feature point and the initial feature point;
  • the second matching degree is the matching degree between the second-ranked feature point and the initial feature point.
  • Step 5083 When the difference between the first matching degree and the second ranked matching degree is greater than a preset threshold, determining that the target feature point is the filtered target feature point;
  • In this case the target feature point is the feature point that uniquely matches the initial feature point, and the screening condition is met.
  • the set of matching feature point pairs is determined as a filtered matching feature point pair, or screening under other constraint conditions is continued.
  • Step 5084 discard the set of matching feature point pairs when the difference between the first matching degree and the second ranked matching degree is less than a preset threshold.
  • the matching feature point pair is likely to have a matching error, and the set of matching feature point pairs should be discarded.
  • In summary, the relocation method provided in this embodiment filters the matching feature point pairs according to the matching uniqueness test and can filter out matching feature point pairs with a high probability of matching error, thereby ensuring that the filtered matching feature point pairs conform to matching uniqueness and improving the calculation accuracy of the subsequent relocation process.
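  • The patent states the test in terms of matching degree, with 80% given as an example threshold; for binary ORB descriptors an equivalent distance-based formulation (keep a match only when the best distance is clearly smaller than the second-best) can be sketched as follows.

```python
import cv2

def uniqueness_filtered_matches(desc_marker, desc_current, ratio=0.8):
    """Keep a match only if its best candidate is clearly better than the second-best
    candidate, so the target feature point uniquely matches the initial feature point."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    raw = matcher.knnMatch(desc_marker, desc_current, k=2)
    kept = []
    for candidates in raw:
        if len(candidates) < 2:
            continue
        best, second = candidates
        if best.distance < ratio * second.distance:  # unambiguous: keep the best match
            kept.append(best)
        # otherwise the two candidates are too close, discard the pair
    return kept
```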
  • the epipolar constraint means that the corresponding points of the matching points on other views are located on the corresponding polar lines.
  • FIG. 14 is a schematic diagram of the principle of the epipolar constraint. For a three-dimensional point X on a plane in the real world, let its observation point on the left imaging plane be X1 and its observation point on the right imaging plane be X2; the two observation points satisfy the relation X2 = R · X1 + T,
  • where R is the rotation matrix between the two camera poses and T is the displacement vector between the two camera poses.
  • Taking the cross product of both sides with T gives:
  • T × X2 = T × (R · X1); taking the dot product of both sides with X2 then yields X2ᵀ · (T × (R · X1)) = 0, that is, X2ᵀ · F · X1 = 0, where the basic matrix F = [T]× · R.
  • For any set of matching points, the above basic-matrix constraint must hold, and at least 8 sets of matching feature point pairs are needed to calculate the basic matrix F. Therefore, after at least 8 sets of matching feature point pairs are filtered out (for example, 8 matching feature point pairs that satisfy the matching uniqueness condition), a basic matrix F is fitted and the epipolar error is verified by the RANSAC method, so as to eliminate matches whose matching score is high but whose geometric position is incorrect, thereby ensuring geometric consistency.
  • step 508 may optionally include the following steps:
  • Step 508A fitting a base matrix by at least 8 sets of matching feature point pairs, the base matrix is used to fit a polar line constraint condition between the first marker image and the current image;
  • At least 8 sets of matching feature points selected by matching the uniqueness test condition are used as feature point pairs of the fitting base matrix.
  • Optionally, RANSAC calculation is performed using the at least 8 sets of matching feature points to compute the homography matrix between the first marker image and the current image; the homography matrix is decomposed to obtain a rotation matrix and a displacement vector, and the basic matrix F is fitted by multiplying the rotation matrix and the displacement vector.
  • Step 508B: for any matching feature point pair, calculate the product X2ᵀ · F · X1 of the two-dimensional coordinates of the target feature point, the basic matrix, and the two-dimensional coordinates of the initial feature point;
  • where X2 is the two-dimensional coordinate of the target feature point in the current image,
  • X1 is the two-dimensional coordinate of the initial feature point in the first marker image,
  • and F is the basic matrix fitted in the previous step.
  • Step 508C detecting whether the product is smaller than an error threshold
  • In theory, the product should be zero; however, due to errors, the product is not exactly zero. Therefore, an error threshold can be set in advance, and when the product is less than the error threshold, the initial feature point and the target feature point are considered to conform to the epipolar constraint.
  • If the product is less than the error threshold, step 508D is entered; if the product is greater than or equal to the error threshold, step 508E is entered.
  • Step 508D: when the product is less than the error threshold, determine the matching feature point pair as a filtered matching feature point pair.
  • When the product is less than the error threshold, the screening condition is considered to be met, and the pair of matching feature point pairs is determined as a filtered matching feature point pair, or screening under other constraint conditions is continued.
  • Step 508E discarding the set of matching feature point pairs when the product is greater than or equal to the error threshold.
  • In summary, the relocation method provided by this embodiment filters the matching feature point pairs according to the epipolar constraint test and can filter out matching feature point pairs that do not conform to the geometric relationship, thereby ensuring that the filtered matching feature point pairs satisfy the epipolar constraint and improving the calculation accuracy of the subsequent relocation process.
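  • A sketch of this screening step with OpenCV is shown below. Unlike the text above, which fits the basic matrix from the decomposed homography, the sketch simply estimates a fundamental matrix with RANSAC as a stand-in; the algebraic error threshold is an assumed value.

```python
import numpy as np
import cv2

def epipolar_filter(pts_marker, pts_current, err_threshold=0.01):
    """Fit a fundamental matrix F with RANSAC (needs at least 8 pairs) and keep only
    the pairs whose algebraic epipolar error |x2^T * F * x1| is below a threshold."""
    pts1 = np.asarray(pts_marker, dtype=np.float64)
    pts2 = np.asarray(pts_current, dtype=np.float64)
    if len(pts1) < 8:
        return []
    F, _ = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
    if F is None:
        return []
    kept = []
    for (x1, y1), (x2, y2) in zip(pts1, pts2):
        p1 = np.array([x1, y1, 1.0])
        p2 = np.array([x2, y2, 1.0])
        if abs(p2 @ F @ p1) < err_threshold:  # epipolar constraint approximately satisfied
            kept.append(((x1, y1), (x2, y2)))
    return kept
```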
  • the feature points used to calculate the homography matrix in the relocation calculation process need to be spaced sufficiently far apart, preferably spread as widely as possible over the marker image, which makes them more representative. Therefore, the region representativeness constraint refers to selecting a representative target feature point in each local region of the current image.
  • step 508 may optionally include the following sub-steps:
  • Step 508a rasterizing the current image to obtain a plurality of grid regions
  • Optionally, the device rasterizes the current image according to a preset grid size, dividing the current image into multiple grid regions that do not overlap each other.
  • Step 508b For any grid area in which the target feature points exist in the plurality of grid areas, filter the target feature points having the highest matching degree in the grid area;
  • the target feature points in the plurality of matching feature point pairs are dispersed in the plurality of grid regions, and the target feature points in each of the grid regions may be zero or more.
  • For any grid region in which target feature points exist, the target feature point with the highest matching degree in that grid region is selected.
  • the left figure shows a preset grid through which the current image is rasterized to obtain a plurality of grid regions.
  • the target feature points in each grid area are filtered to select the target feature points with the highest matching degree.
  • Step 508c: the matching feature point pair corresponding to the target feature point with the highest matching degree is determined as a filtered matching feature point pair.
  • Optionally, for each grid region, the matching feature point pair in which the selected target feature point is located is determined as a filtered matching feature point pair.
  • Optionally, the matching degree between each target feature point and its corresponding initial feature point is obtained, and the matching feature point pair containing the target feature point with the highest matching degree is determined as a filtered matching feature point pair.
  • the relocation method selects a target feature point having the highest matching degree in each grid area as a representative target feature point in the grid area.
  • the representative target feature points can uniquely represent their grid regions, so that the homography matrix homography calculated during the relocation process is more robust; moreover, the number of grid regions limits the maximum number of feature points used when calculating the homography, thereby ensuring the computation speed of the homography calculation.
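  • A minimal sketch of this region-representative screening is given below; the cell size is an assumed value, and `matches` is assumed to be a list of (marker point, current-image point, matching score) triples with a higher score meaning a better match.

```python
def grid_filter(matches, cell_size=40):
    """For every grid cell of the current image, keep only the matched pair whose
    target feature point has the highest matching score inside that cell."""
    best_per_cell = {}
    for pt_marker, pt_current, score in matches:
        cell = (int(pt_current[1]) // cell_size, int(pt_current[0]) // cell_size)
        if cell not in best_per_cell or score > best_per_cell[cell][2]:
            best_per_cell[cell] = (pt_marker, pt_current, score)
    return list(best_per_cell.values())
```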
  • The following describes the pose change calculation process for the camera pose shown in step 510.
  • After obtaining the filtered matching feature point pairs, the device inputs the multiple pairs of matching feature point pairs (initial feature points and target feature points) into the RANSAC algorithm and calculates the homography matrix homography of the current image relative to the first marker image.
  • Through a decomposition algorithm, the homography matrix homography is decomposed to obtain the rotation matrix R_relocalize and the translation vector T_relocalize, from which the target pose parameter of the camera when acquiring the current image is obtained.
  • step 510 optionally includes the following sub-steps:
  • Step 510a calculating a homography matrix of the camera during a camera pose change process according to the selected plurality of matched feature point pairs;
  • the device inputs multiple sets of matching feature point pairs (initial feature points and target feature points) into the ransac algorithm, and calculates a homography matrix homography of the current image relative to the first marker image.
  • Step 510b calculating a projected feature point of the initial feature point on the current image by the homography matrix
  • the device selects initial feature points with matching target feature points from all initial feature points, and calculates projected feature points of each initial feature point on the current image.
  • a projected feature point of each initial feature point on the current image is obtained.
  • Step 510c calculating a projection error between the projected feature point and the target feature point
  • a projection error between the projected feature point and the target feature point corresponding to the initial feature point is calculated.
  • When the distance between the projected feature point and the target feature point is less than the distance error threshold, the target feature point is considered an inlier; when the distance between the projected feature point and the target feature point is greater than the distance error threshold, the target feature point is considered an outlier. Then, the device counts the ratio of the number of outliers to the total number of target feature points.
  • Step 510d When the projection error is less than the preset threshold, the homography matrix is decomposed to obtain the pose change amounts R relocalize and T relocalize when the camera changes from the initial pose parameter to the target pose parameter.
  • the preset threshold is 50%.
  • When the ratio of outliers to the total number of target feature points is less than 50%, the device decomposes the homography matrix to obtain the pose change amounts R_relocalize and T_relocalize when the camera changes from the initial pose parameter to the target pose parameter; when the ratio of outliers to the total number of target feature points is greater than 50%, the homography matrix obtained in this calculation is considered unreliable and the result is discarded.
  • It should be noted that the statistical process shown in step 510b and step 510c is an optional step.
  • In summary, the relocation method provided in this embodiment can verify the homography matrix by counting the number of outliers and abandon the current result when the verification fails, thereby ensuring the accuracy of the homography matrix calculation and, in turn, the accuracy of the relocation result.
  • In an illustrative example, the relocation method of the camera pose tracking process described above may be used in an AR program, by which the camera pose on the electronic device can be tracked in real time according to real-world scene information, and the display position of the AR element in the AR application is adjusted and modified according to the tracking result.
  • Taking the AR program running on the mobile phone shown in FIG. 1 or FIG. 2 as an example, when a still cartoon character standing on a book needs to be displayed, no matter how the user moves the mobile phone, the display position of the cartoon character only needs to be modified according to the camera pose change on the mobile phone, so that the standing position of the cartoon character on the book remains unchanged.
  • FIG. 20 is a structural block diagram of a relocating device of a camera attitude tracking process provided by an exemplary embodiment of the present application.
  • the relocation device can be implemented as all or part of an electronic device by software, hardware, or a combination of both.
  • the electronic device is configured to sequentially perform camera attitude tracking of a plurality of marker images, the apparatus comprising:
  • the image acquisition module 2010 is configured to acquire a current image acquired after the i-th mark image of the plurality of mark images, i>1;
  • the information acquiring module 2020 is configured to acquire an initial feature point and an initial pose parameter of the first one of the plurality of mark images when the current image meets the relocation condition;
  • the feature point tracking module 2030 is configured to perform feature point tracking on the current image with respect to the initial feature point of the first marker image to obtain a plurality of sets of matching feature point pairs;
  • The feature point screening module 2040 is configured to filter the plurality of sets of matching feature point pairs according to the constraint condition, to obtain the filtered matching feature point pairs;
  • The calculating module 2050 is configured to calculate, according to the filtered matching feature point pairs, a pose change amount when the camera changes from the initial pose parameter to the target pose parameter;
  • the relocation module 2060 is configured to reposition the target pose parameter of the camera according to the initial pose parameter and the pose change amount.
  • the constraint comprises at least one of the following conditions:
  • the target feature point is a feature point that uniquely matches the initial feature point;
  • the initial feature point and the target feature point satisfy an epipolar constraint;
  • the target feature point is the feature point with the highest matching degree in its grid region, where the grid region is an area obtained by rasterizing the current image.
  • the constraint includes that the target feature point is a feature point that uniquely matches the initial feature point;
  • In an optional embodiment, the feature point screening module 2040 is configured to acquire, for the initial feature point in any one of the matching feature point pairs, the target feature point and the second-level feature point that match the initial feature point.
  • The target feature point is the feature point ranked first in matching degree among the plurality of candidate feature points that match the initial feature point, and the second-level feature point is the feature point ranked second in matching degree among the plurality of candidate feature points that match the initial feature point; when the difference between the first-ranked matching degree and the second-ranked matching degree is greater than a preset threshold, the target feature point is determined to be the filtered target feature point.
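In OpenCV terms this uniqueness check resembles a ratio test on the two nearest matches; the sketch below is an analogous formulation in which a smaller descriptor distance corresponds to a higher matching degree, so the "difference greater than a preset threshold" condition becomes "best distance below 80% of the second-best distance". The names and the 0.8 margin are assumptions, not values taken from the patent.

```python
import cv2

def unique_matches(desc_current, desc_marker, margin=0.8):
    # For each candidate feature point in the current image, find its two closest
    # initial feature points (Hamming distance on ORB descriptors). Keep the best
    # match only when it is clearly better than the second-best one; otherwise
    # the pair is discarded as ambiguous.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    kept = []
    for pair in matcher.knnMatch(desc_current, desc_marker, k=2):
        if len(pair) < 2:
            continue
        best, second = pair
        if best.distance < margin * second.distance:
            kept.append(best)
    return kept
```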
  • In an optional embodiment, the constraint includes that the initial feature point and the target feature point satisfy an epipolar constraint.
  • The feature point screening module 2040 is configured to calculate, for any one of the matching feature point pairs, the product of the two-dimensional coordinates of the initial feature point, the fundamental matrix, and the two-dimensional coordinates of the target feature point; the fundamental matrix is used to fit the epipolar constraint between the first marker image and the current image; when the product is less than an error threshold, the matching feature point pair is determined to be the filtered matching feature point pair.
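A minimal sketch of this epipolar check, assuming 2D point arrays and using cv2.findFundamentalMat to fit the fundamental matrix with RANSAC from at least 8 pairs, is shown below; the error threshold is illustrative.

```python
import cv2
import numpy as np

def epipolar_filter(pts_initial, pts_target, err_thresh=0.01):
    # Fit a fundamental matrix with RANSAC (needs at least 8 pairs), then keep only
    # the pairs whose epipolar residual |x2^T F x1| is below the error threshold.
    p1 = np.asarray(pts_initial, np.float32)
    p2 = np.asarray(pts_target, np.float32)
    F, _ = cv2.findFundamentalMat(p1, p2, cv2.FM_RANSAC)
    x1 = np.hstack([p1, np.ones((len(p1), 1), np.float32)])  # homogeneous coordinates
    x2 = np.hstack([p2, np.ones((len(p2), 1), np.float32)])
    residual = np.abs(np.sum(x2 * (x1 @ F.T), axis=1))       # x2^T F x1 per pair
    return residual < err_thresh
```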
  • In an optional embodiment, the constraint includes that the target feature point is the feature point with the highest matching degree in its grid region.
  • The feature point screening module 2040 is configured to rasterize the current image to obtain a plurality of grid regions; for any grid region of the plurality of grid regions in which target feature points exist, select the target feature point with the highest matching degree in that grid region; and determine the matching feature point pair corresponding to the target feature point with the highest matching degree as the filtered matching feature point pair.
  • In an optional embodiment, the feature point tracking module 2030 is configured to cluster the initial feature points into a first node tree by using a bag-of-words model, where each parent node of the first node tree includes K child nodes and each node includes initial feature points clustered into the same class; extract candidate feature points in the current image, and cluster the candidate feature points into a second node tree by using the bag-of-words model, where each parent node of the second node tree includes K child nodes and each node includes candidate feature points clustered into the same class; and perform feature point tracking between the i-th first node in the forward index of the first node tree and the i-th second node in the forward index of the second node tree, to obtain the plurality of sets of matching feature point pairs.
  • the i-th first node is an intermediate node in the first node tree
  • the i-th second node is an intermediate node in the second node tree
  • the intermediate node is a node located between the root node and the leaf node.
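The following sketch illustrates the idea of node-level matching with a one-level vocabulary rather than a full bag-of-words tree: ORB descriptors from the marker image are clustered with k-means, descriptors of both images are assigned to the nearest cluster, and only descriptors that fall into the same node are matched against each other. Treating binary descriptors as float vectors for k-means is a simplification made for this example; the node count K and all names are assumptions.

```python
import cv2
import numpy as np

def assign_to_nodes(descriptors, centers):
    # Assign each descriptor to its nearest vocabulary center (one tree level, K nodes).
    d = np.linalg.norm(descriptors.astype(np.float32)[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def node_level_matching(desc_marker, desc_current, K=10):
    # Build a one-level vocabulary from the marker image descriptors with k-means;
    # a real bag-of-words tree repeats this clustering recursively at every level.
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, _, centers = cv2.kmeans(desc_marker.astype(np.float32), K, None, criteria,
                               3, cv2.KMEANS_PP_CENTERS)
    ids_marker = assign_to_nodes(desc_marker, centers)
    ids_current = assign_to_nodes(desc_current, centers)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = []  # (marker descriptor index, current descriptor index, distance)
    for k in range(K):
        idx_m = np.where(ids_marker == k)[0]
        idx_c = np.where(ids_current == k)[0]
        if len(idx_m) == 0 or len(idx_c) == 0:
            continue
        # Only descriptors in the same node are compared with each other, which
        # shrinks the search range compared with brute-force matching.
        for m in matcher.match(desc_current[idx_c], desc_marker[idx_m]):
            pairs.append((idx_m[m.trainIdx], idx_c[m.queryIdx], m.distance))
    return pairs
```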
  • In an optional embodiment, the calculating module 2050 is configured to calculate a homography matrix of the camera during the camera pose change process according to the filtered plurality of matching feature point pairs, and to decompose the homography matrix to obtain the pose change amounts R relocalize and T relocalize when the camera changes from the initial pose parameter to the target pose parameter.
  • In an optional embodiment, the calculating module 2050 is configured to calculate projected feature points of the initial feature points on the current image by using the homography matrix; calculate the projection error between the projected feature points and the target feature points; and, when the projection error is less than a preset threshold, perform the step of decomposing the homography matrix to obtain the pose change amounts R relocalize and T relocalize when the camera changes from the initial pose parameter to the target pose parameter.
  • FIG. 21 is a block diagram showing the structure of an electronic device 2100 provided by an exemplary embodiment of the present application.
  • The electronic device 2100 can be: a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, or a desktop computer.
  • Electronic device 2100 may also be referred to as a user device, a portable electronic device, a laptop electronic device, a desktop electronic device, and the like.
  • the electronic device 2100 includes a processor 2101 and a memory 2102.
  • the processor 2101 can include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
  • The processor 2101 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array).
  • the processor 2101 may also include a main processor and a coprocessor.
  • The main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state.
  • In some embodiments, the processor 2101 can be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display.
  • the processor 2101 may further include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
  • The memory 2102 can include one or more computer-readable storage media, which can be non-transitory. The memory 2102 can also include high-speed random access memory, as well as non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 2102 is configured to store at least one instruction, the at least one instruction being executed by the processor 2101 to implement the relocation method in the camera pose tracking process provided by the method embodiments of the present application.
  • the electronic device 2100 can also optionally include: a peripheral device interface 2103 and at least one peripheral device.
  • the processor 2101, the memory 2102, and the peripheral device interface 2103 may be connected by a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 2103 via a bus, signal line or circuit board.
  • the peripheral device includes at least one of a radio frequency circuit 2104, a touch display screen 2105, a camera 2106, an audio circuit 2107, a positioning component 2108, and a power source 2109.
  • the peripheral device interface 2103 can be used to connect at least one peripheral device associated with an I/O (Input/Output) to the processor 2101 and the memory 2102.
  • In some embodiments, the processor 2101, the memory 2102, and the peripheral device interface 2103 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 2101, the memory 2102, and the peripheral device interface 2103 can be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the RF circuit 2104 is configured to receive and transmit an RF (Radio Frequency) signal, also referred to as an electromagnetic signal.
  • the RF circuit 2104 communicates with the communication network and other communication devices via electromagnetic signals.
  • the radio frequency circuit 2104 converts the electrical signal into an electromagnetic signal for transmission, or converts the received electromagnetic signal into an electrical signal.
  • the radio frequency circuit 2104 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like.
  • the radio frequency circuit 2104 can communicate with other electronic devices via at least one wireless communication protocol.
  • the wireless communication protocols include, but are not limited to, the World Wide Web, a metropolitan area network, an intranet, generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks.
  • the radio frequency circuit 2104 may further include an NFC (Near Field Communication) related circuit, which is not limited in this application.
  • the display 2105 is for displaying a UI (User Interface).
  • the UI can include graphics, text, icons, video, and any combination thereof.
  • When the display 2105 is a touch display, it also has the ability to collect touch signals on or above the surface of the display 2105.
  • the touch signal can be input to the processor 2101 as a control signal for processing.
  • the display 2105 can also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards.
  • In some embodiments, there may be one display screen 2105, disposed on the front panel of the electronic device 2100; in other embodiments, there may be at least two display screens 2105, respectively disposed on different surfaces of the electronic device 2100 or in a folded design.
  • the display screen 2105 can be a flexible display screen disposed on a curved surface or a folded surface of the electronic device 2100. Even the display screen 2105 can be set to a non-rectangular irregular pattern, that is, a profiled screen.
  • the display 2105 can be made of a material such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
  • Camera component 2106 is used to capture images or video.
  • camera assembly 2106 includes a front camera and a rear camera.
  • the front camera is placed on the front panel of the electronic device and the rear camera is placed on the back of the electronic device.
  • In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blur function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting or other fused shooting functions.
  • camera assembly 2106 can also include a flash.
  • the flash can be a monochrome temperature flash or a two-color temperature flash.
  • the two-color temperature flash is a combination of a warm flash and a cool flash that can be used for light compensation at different color temperatures.
  • the audio circuit 2107 can include a microphone and a speaker.
  • The microphone is used to collect sound waves of the user and the environment, and convert the sound waves into electrical signals to be input to the processor 2101 for processing, or input to the radio frequency circuit 2104 for voice communication.
  • the microphones may be multiple, and are respectively disposed at different parts of the electronic device 2100.
  • the microphone can also be an array microphone or an omnidirectional acquisition microphone.
  • the speaker is then used to convert electrical signals from the processor 2101 or the RF circuit 2104 into sound waves.
  • the speaker can be a conventional film speaker or a piezoelectric ceramic speaker.
  • the audio circuit 2107 can also include a headphone jack.
  • the positioning component 2108 is configured to locate the current geographic location of the electronic device 2100 to implement navigation or LBS (Location Based Service).
  • the positioning component 2108 can be a positioning component based on a GPS (Global Positioning System) of the United States, a Beidou system of China, or a Galileo system of Russia.
  • the power source 2109 is used to power various components in the electronic device 2100.
  • the power source 2109 can be an alternating current, a direct current, a disposable battery, or a rechargeable battery.
  • the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery.
  • a wired rechargeable battery is a battery that is charged by a wired line
  • a wireless rechargeable battery is a battery that is charged by a wireless coil.
  • the rechargeable battery can also be used to support fast charging technology.
  • electronic device 2100 also includes one or more sensors 2110.
  • the one or more sensors 2110 include, but are not limited to, an acceleration sensor 2111, a gyro sensor 2112, a pressure sensor 2113, a fingerprint sensor 2114, an optical sensor 2115, and a proximity sensor 2116.
  • the acceleration sensor 2111 can detect the magnitude of the acceleration on the three coordinate axes of the coordinate system established by the electronic device 2100.
  • the acceleration sensor 2111 can be used to detect components of gravity acceleration on three coordinate axes.
  • the processor 2101 can control the touch display 2105 to display the user interface in a landscape view or a portrait view according to the gravity acceleration signal collected by the acceleration sensor 2111.
  • the acceleration sensor 2111 can also be used for the acquisition of game or user motion data.
  • the gyro sensor 2112 can detect the body direction and the rotation angle of the electronic device 2100, and the gyro sensor 2112 can cooperate with the acceleration sensor 2111 to collect the 3D action of the user on the electronic device 2100. Based on the data collected by the gyro sensor 2112, the processor 2101 can implement functions such as motion sensing (such as changing the UI according to the user's tilting operation), image stabilization at the time of shooting, game control, and inertial navigation.
  • the pressure sensor 2113 may be disposed on a side border of the electronic device 2100 and/or a lower layer of the touch display screen 2105.
  • When the pressure sensor 2113 is disposed on the side frame of the electronic device 2100, the user's holding signal on the electronic device 2100 can be detected, and the processor 2101 performs left/right hand recognition or a shortcut operation according to the holding signal collected by the pressure sensor 2113.
  • When the pressure sensor 2113 is disposed on the lower layer of the touch display screen 2105, the processor 2101 controls the operability controls on the UI interface according to the user's pressure operation on the touch display screen 2105.
  • the operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
  • the fingerprint sensor 2114 is configured to collect the fingerprint of the user, and the processor 2101 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 2114, or the fingerprint sensor 2114 identifies the identity of the user according to the collected fingerprint. Upon identifying that the identity of the user is a trusted identity, the processor 2101 authorizes the user to perform related sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying and changing settings, and the like.
  • the fingerprint sensor 2114 can be disposed on the front, back, or side of the electronic device 2100. When the physical button or vendor logo is provided on the electronic device 2100, the fingerprint sensor 2114 can be integrated with the physical button or the manufacturer logo.
  • Optical sensor 2115 is used to collect ambient light intensity.
  • the processor 2101 can control the display brightness of the touch display 2105 based on the ambient light intensity acquired by the optical sensor 2115. Illustratively, when the ambient light intensity is high, the display brightness of the touch display 2105 is raised; when the ambient light intensity is low, the display brightness of the touch display 2105 is lowered.
  • the processor 2101 can also dynamically adjust the shooting parameters of the camera assembly 2106 according to the ambient light intensity acquired by the optical sensor 2115.
  • The proximity sensor 2116, also referred to as a distance sensor, is typically disposed on the front panel of the electronic device 2100.
  • the proximity sensor 2116 is used to collect the distance between the user and the front of the electronic device 2100.
  • In one embodiment, when the proximity sensor 2116 detects that the distance between the user and the front of the electronic device 2100 gradually decreases, the processor 2101 controls the touch display 2105 to switch from the screen-on state to the screen-off state; when the proximity sensor 2116 detects that the distance between the user and the front of the electronic device 2100 gradually increases, the processor 2101 controls the touch display 2105 to switch from the screen-off state to the screen-on state.
  • A person skilled in the art can understand that the structure shown in FIG. 21 does not constitute a limitation on the electronic device 2100, which may include more or fewer components than those illustrated, combine some components, or adopt a different component arrangement.
  • The present application further provides a computer-readable storage medium, where the storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the relocation method in the camera pose tracking process provided by the foregoing method embodiments.
  • the present application also provides a computer program product that, when run on an electronic device, causes the electronic device to perform the relocation method in the camera pose tracking process described in the various method embodiments above.
  • A person skilled in the art may understand that all or part of the steps of the foregoing embodiments may be implemented by hardware, or may be implemented by a program instructing related hardware; the program may be stored in a computer-readable storage medium.
  • the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Algebra (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)

Abstract

本申请公开了一种相机姿态追踪过程的重定位方法、装置、设备及存储介质,属于AR领域。所述方法包括:获取所述多个标记图像中第i个标记图像后采集的当前图像;当所述当前图像符合重定位条件时,获取所述多个标记图像中的第一个标记图像的初始特征点和初始位姿参数;将所述当前图像相对于所述第一个标记图像进行特征点追踪,得到多组匹配特征点对;对所述多组匹配特征点对按照约束条件进行筛选,得到筛选后的匹配特征点对;根据所述筛选后的匹配特征点对,计算所述相机从所述初始位姿参数改变至目标位姿参数时的位姿变化量;根据所述初始位姿参数和所述位姿变化量,重定位得到所述相机的所述目标位姿参数。

Description

相机姿态追踪过程的重定位方法、装置、设备及存储介质
本申请要求于2018年04月27日提交的申请号为201810393563.X、发明名称为“相机姿态追踪过程的重定位方法、装置、设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及增强现实领域,特别涉及一种相机姿态追踪过程的重定位方法、装置、设备及存储介质。
背景技术
视觉SLAM(simultaneous Localization and mapping,同时定位与地图构建)是指搭载相机的主体,在没有环境先验信息的情况下,于运动过程中建立环境的模型,同时估计自己的运动的技术。SLAM可以应用在AR(Augmented Reality,增强现实)领域、机器人领域和无人驾驶领域中。
以单目视觉SLAM为例,通常将相机采集的第一帧图像作为标记图像(Anchor)。在相机后续采集到当前图像时,设备对当前图像与标记图像之间共同具有的特征点进行追踪,根据当前图像与标记图像之间的特征点位置变化计算得到相机在现实世界中的位姿变化。但某些场景下会发生当前图像中的特征点丢失(Lost),无法继续追踪的情况。此时,需要使用SLAM重定位方法对当前图像进行重定位。
发明内容
本申请实施例提供了一种相机姿态追踪过程的重定位方法、装置、设备及存储介质。所述技术方案如下:
根据本申请的一个方面,提供了一种相机姿态追踪过程的重定位方法,应用于具有相机的设备中,所述设备用于按序执行多个标记图像的相机姿态追踪,所述方法包括:
获取所述多个标记图像中第i个标记图像后采集的当前图像,i>1;
当所述当前图像符合重定位条件时,获取所述多个标记图像中的第一个标记图像的初始特征点和初始位姿参数;
将所述当前图像相对于所述第一个标记图像的所述初始特征点进行特征点追踪,得到多组匹配特征点对;对所述多组匹配特征点对按照约束条件进行筛选,得到筛选后的匹配特征点对;
根据所述筛选后的匹配特征点对,计算所述相机从所述初始位姿参数改变至目标位姿参数时的位姿变化量;
根据所述初始位姿参数和所述位姿变化量,重定位得到所述相机的所述目标位姿参数。
根据本申请的另一方面,提供了一种相机姿态追踪过程的重定位装置,应用于具有相机的设备中,所述设备用于按序执行多个标记图像的相机姿态追踪,所述应用于具有相机的设备中,所述设备用于按序执行多个标记图像的相机姿态追踪,所述装置包括:
图像获取模块,用于获取所述多个标记图像中第i个标记图像后采集的当前图像,i>1;
信息获取模块,用于当所述当前图像符合重定位条件时,获取所述多个标记图像中的第一个标记图像的初始特征点和初始位姿参数;
特征点追踪模块,用于将所述当前图像相对于所述第一个标记图像的所述初始特征点进行特征点追踪,得到多组匹配特征点对;
特征点筛选模块,用于对所述多组匹配特征点对按照约束条件进行筛选,得到筛选后的匹配特征点对;
计算模块,用于根据所述筛选后的匹配特征点对,计算所述相机从所述初始位姿参数改变至目标位姿参数时的位姿变化量;
重定位模块,用于根据所述初始位姿参数和所述位姿变化量,重定位得到所述相机的所述目标位姿参数。
根据本申请的另一方面,提供了一种电子设备,所述电子设备包括存储器和处理器;
所述存储器中存储有至少一条指令,所述至少一条指令由所述处理器加载并执行以实现如上所述的相机姿态追踪过程中的重定位方法。
根据本申请的另一方面,提供了一种计算机可读存储介质,所述存储介质中存储有至少一条指令,所述至少一条指令由处理器加载并执行以实现如上所述的相机姿态追踪过程中的重定位方法。
本申请实施例提供的技术方案带来的有益效果至少包括:
通过在当前图像符合重定位条件时,将当前图像与第一个标记图像进行重定位,能够在对连续多个标记图像进行追踪的Anchor-SLAM算法中实现重定位,从而减少了追踪过程中断的可能性,由于重定位过程是将当前图像相对于第一个标记图像进行重定位,所以还能消除多个标记图像的追踪过程所产生的累积误差,从而解决相关技术中的SLAM重定位方法并不适用于变种后的SLAM算法的问题。
同时,通过对多组特征点匹配按照约束条件进行筛选得到筛选后的匹配特征点对,利用筛选后的匹配特征点对计算位姿变化量。一方面,由于减少了匹配过程中需要计算的特征点对,所以提高了匹配速度;另一方面,由于筛选出的特征点对是匹配准确性更好的特征点对,因此能够提高匹配精度。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请一个示例性实施例提供的AR应用场景的场景示意图;
图2是本申请一个示例性实施例提供的AR应用场景的场景示意图;
图3是本申请一个示例性实施例提供的Anchor-Switching AR System算法的原理示意图;
图4是本申请一个示例性实施例提供的电子设备的结构框图;
图5是本申请一个示例性实施例提供的相机姿态追踪过程的重定位方法的流程图;
图6和图7是本申请一个示例性实施例提供的AR应用场景中出现定位错误的图像示意图;
图8是本申请一个示例性实施例提供的相机姿态追踪过程的重定位方法的流程图;
图9是本申请一个示例性实施例提供的金字塔图像的示意图;
图10是本申请一个示例性实施例提供的相机姿态追踪过程的重定位方法的流程图;
图11是本申请一个示例性实施例提供的相机姿态追踪过程的重定位方法的流程图;
图12是本申请一个示例性实施例提供的相机姿态追踪过程的重定位方法的流程图;
图13是本申请一个示例性实施例提供的相机姿态追踪过程的重定位方法的流程图;
图14是本申请一个示例性实施例提供的极线约束条件的原理示意图;
图15是本申请一个示例性实施例提供的相机姿态追踪过程的重定位方法的流程图;
图16是本申请一个示例性实施例提供的特征点追踪过程的原理示意图;
图17是本申请一个示例性实施例提供的相机姿态追踪过程的重定位方法的流程图;
图18是本申请一个示例性实施例提供的栅格化筛选特征点过程的原理示意图;
图19是本申请一个示例性实施例提供的相机姿态追踪过程的重定位方法的流程图;
图20是本申请一个示例性实施例提供的相机姿态追踪过程的重定位装置的框图;
图21是本申请一个示例性实施例提供的电子设备的框图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
首先对本申请涉及的若干个名词进行简介:
AR(Augmented Reality,增强现实):一种在相机采集图像的过程中,实时地计算相机在现实世界(或称三维世界、真实世界)中的相机姿态参数,根据该相机姿态参数在相机采集的图像上添加虚拟元素的技术。虚拟元素包括但不限于:图像、视频和三维模型。AR技术的目标是在屏幕上把虚拟世界套接在现实世界上进行互动。该相机姿态参数包括旋转矩阵和位移向量,旋转矩阵用于表征相机在现实世界中发生的旋转角度,位移向量用于表征相机在现实世界中发生的位移距离。
例如,参见图1和参见图2,设备在相机拍摄到的图像中添加了一个虚拟人物形象。随着相机在现实世界中的运动,相机拍摄到的图像会发生变化,虚拟人物的拍摄方位也发生变化,模拟出了虚拟人物在图像中静止不动,而相机随着位置和姿态的变化同时拍摄图像和虚拟人物的效果,为用户呈现了一幅真实立体的画面。
Anchor-Switching AR System:是基于连接多个标记图像(Anchor)的相机姿态追踪来确定在自然场景下的相机姿态参数,进而根据相机姿态参数在相机采集的图像上叠加虚拟世界的AR系统。
IMU(Inertial Measurement Unit,惯性测量单元):是用于测量物体的三轴姿态角(或角速率)以及加速度的装置。一般的,一个IMU包含了三个单轴的加速度计和三个单轴的陀螺,加速度计用于检测物体在三维坐标系中每个坐标轴上的加速度信号,进而计算得到位移向量;而陀螺用于检测物体在三维坐标系中的旋转矩阵。可选地,IMU包括陀螺仪、加速度计和地磁传感器。
示意性的,三维坐标系的建立方式为:1、X轴使用向量积Y*Z来定义,在X轴在设备当前的位置上,沿与地面相切的方向指向东方;2、Y轴在设备当前的位置上,沿与地面相切的方向指向地磁场的北极;3、Z轴指向天空并垂直于地面。
在AR(Augmented Reality,增强现实)领域进行相机姿态追踪时,比如使用手机拍摄桌 面进行AR游戏的场景,由于AR使用场景存在其场景特殊性,通常会对现实世界中的某个固定平面进行持续性拍摄(比如某个桌面或墙面),直接使用相关技术中的SLAM重定位方法的效果较差,尚需提供一种适用于AR领域的重定位解决方案。
本申请提供了一种适用于Anchor-Switching AR System算法的重定位方法。Anchor-Switching AR System算法在确定相机姿态的过程中,将相机的运动过程划分为至少两段追踪过程进行追踪,每段追踪过程对应各自的标记图像。示意性的,当第i个标记图像对应的追踪过程中,当当前图像相对于第i个标记图像的追踪效果差于预设条件(比如能够匹配到的特征点少于预设阈值)时,将当前图像的上一个图像确定为第i+1个标记图像,开启第i+1段追踪过程。其中,i为正整数。示意性的参考图3,其示出了本申请一个示例性实施例提供的Anchor-Switching AR System算法的原理示意图。在现实世界中存在物体320,设置有相机的设备340被用户手持进行移动,在移动过程中拍摄得到包括物体320的多帧图像1-6。设备将图像1确定为第1个标记图像(born-anchor或born-image)并记录初始位姿参数,该初始位姿参数可以是IMU采集的,然后将图像2相对于图像1进行特征点追踪,根据初始位姿参数和特征点追踪结果计算出相机在拍摄图像2时的位姿参数;将图像3相对于图像1进行特征点追踪,根据初始位姿参数和特征点追踪结果计算出相机在拍摄图像3时的位姿参数;将图像4相对于图像1进行特征点追踪,根据初始位姿参数和特征点追踪结果计算出相机在拍摄图像4时的位姿参数。
然后,将图像5相对于图像1进行特征点追踪,如果特征点追踪效果差于预设条件(比如匹配的特征点数量较少),则将图像4确定为第2个标记图像,将图像5相对于图像4进行特征点追踪,计算出相机在拍摄图像4至图像5之间的位移变化量,再结合相机在拍摄图像4至图像1之间的位移变化量以及初始位姿参数,计算出相机在拍摄图像5时的位姿参数。然后再将图像6相对于图像4进行特征点追踪,依次类推,若当前图像的特征点追踪效果变差时,即可将当前图像的上一帧图像确定为新的标记图像,切换新的标记图像后重新进行特征点追踪。
可选地,特征点追踪可以采用基于视觉里程计原理的算法,比如特征点法或直接法。但是若相机在追踪过程中处于发生较为剧烈的运动、朝向强光源、朝向白色墙壁等各种异常场景时,上述Anchor-Switching AR System追踪过程可能会发生丢失(Lost)现象。丢失现象是指在当前图像中无法匹配到足够多的特征点,导致追踪失败。
参考图4,其示出了本申请一个示例性实施例提供的设备的结构框图。该设备包括:处理器420、存储器440、相机460和IMU 480。
处理器420包括一个或多个处理核心,比如4核心处理器、8核心处理器等。处理器420用于执行存储器440中存储的指令、代码、代码片段和程序中的至少一种。
处理器420与存储器440电性相连。可选地,处理器420通过总线与存储器440相连。存储器440存储有一个或多个指令、代码、代码片段和/或程序。该指令、代码、代码片段和/或程序在被处理器420执行时,用于实现如下实施例中提供的SLAM重定位方法。
处理器420还与相机460电性相连。可选地,处理器420通过总线与相机460相连。相机460是具有图像采集能力的传感器件。相机460还可称为摄像头、感光器件等其它名称。相机460具有连续采集图像或多次采集图像的能力。可选地,相机460设置在设备内部或设备外部。可选地,该相机460是单目相机。
处理器420还与IMU480电性相连。可选地,IMU480用于每隔预定时间间隔采集相机的位姿参数,并记录每组位姿参数在采集时的时间戳。相机的位姿参数包括:位移向量和旋转矩阵。其中,IMU480采集的旋转矩阵相对准确,采集的位移向量受实际环境可能会有较大的误差。
参考图5,其示出了本申请一个示例性实施例提供的相机姿态追踪过程的重定位方法的流程图。本实施例以该重定位方法应用于图4所示的设备中来举例说明,该设备用于按序执行多个标记图像的相机姿态追踪。该方法包括:
步骤502,获取多个标记图像中第i个标记图像之后采集的当前图像;
设备内的相机按照预设时间间隔采集一帧帧图像,形成图像序列。可选地,相机是在运动(平移和/或旋转)过程中,按照预设时间间隔采集一帧帧图像形成图像序列。
可选地,设备将图像序列中的第一帧图像(或前几帧图像中符合预定条件的一帧图像)确定为第一个标记图像,将后续采集的图像相对于第一个标记图像进行特征点追踪,并根据特征点追踪结果计算相机的相机姿态参数;若当前帧图像的特征点追踪效果差于预设条件时,将当前帧图像的上一帧图像确定为第二个标记图像,将后续采集的图像相对于第二个标记图像进行特征点追踪,并根据特征点追踪结果计算相机的相机姿态参数,依次类推。设备可以按序进行连续多个标记图像的相机姿态追踪。
当处于第i个标记图像对应的第i个追踪过程时,相机会采集到当前图像。当前图像是第i个标记图像之后采集的某一帧图像,其中,i为大于1的整数。
步骤504,当当前图像符合重定位条件时,获取多个标记图像中的第一个标记图像的初始特征点和初始位姿参数;
其中,初始位姿参数用于指示相机采集第一个标记图像时的相机姿态。
设备会确定当前图像是否符合重定位条件。重定位条件用于指示当前图像相对于第i个标记图像的追踪过程失败,或者,重定位条件用于指示历史追踪过程中的累积误差已经高于预设条件。
在一个可选的实施例中,设备对当前图像相对于第i个标记图像进行追踪,若当前图像中不存在与第i个标记图像匹配的特征点,或者,当前图像中与第i个标记图像匹配的特征点少于第一数量时,确定当前图像相对于第i个标记图像的追踪过程失败,符合重定位条件。
在另一个可选的实施例中,设备确定当前图像与上一次重定位的图像之间的帧数大于第二数量时,确定历史追踪过程中的累积误差已经高于预设条件,或者,设备确定第i个标记图像和第一个标记图像之间的标记图像数量大于第三数量时,确定历史追踪过程中的累计误差已经高于预设条件。
本实施例对重定位条件的具体条件内容不加以限定。
当当前图像符合重定位条件时,设备尝试将当前图像相对于第一个标记图像进行特征点追踪。此时,设备获取缓存的第一个标记图像中的初始特征点以及初始位姿参数。
初始特征点是从第一个标记图像上提取到的特征点,初始特征点可以是多个,比如10-500个。该初始位姿参数用于指示相机采集第一个标记图像时的相机姿态。可选地,初始位姿参数包括旋转矩阵R和位移向量T,初始位姿参数可以由IMU采集得到。
步骤506,将当前图像相对于第一个标记图像的初始特征点进行特征点追踪,得到多组匹配特征点对;
可选地,每组匹配特征点对中包括两个互相匹配的初始特征点和目标特征点。
特征点追踪可采用基于视觉里程计的追踪算法,本申请对此不加以限定。在一个实施例中,特征点追踪采用KLT(Kanade-Lucas)光流追踪算法;在另一个实施例中,特征点追踪采用基于ORB(Oriented FAST and Rotated BRIEF,快速特征点提取和描述)算法提取的ORB特征描述子进行特征点跟踪。本申请对特征点追踪的具体算法不加以限定,特征点追踪过程可以采用特征点法或直接法。
在一个实施例中,设备对第一个标记图像进行特征点提取,得到N个初始特征点;设备还对当前图像进行特征点提取,得到M个候选特征点;然后将M个候选特征点逐一与N个初始特征点进行匹配,确定出至少一组匹配特征点对。每组匹配特征点对包括:一个初始特征点和一个目标特征点。初始特征点是第1个标记图像上的特征点,目标特征点是当前图像上与该初始特征点匹配度最高的候选特征点。
可选地,初始特征点的数量大于或等于匹配特征点对的数量。比如,初始特征点的数量是450个,匹配特征点对为320组。
步骤508,对多组匹配特征点对按照约束条件进行筛选,得到筛选后的匹配特征点对;
由于在重定位计算过程中,只需要至少四组匹配特征点对就能完成计算,因此当存在多组匹配特征点对可以使用时,终端可以挑选出较为准确的至少四组匹配特征点对进行后续计算。
可选地,终端按照约束条件对多组匹配特征点对进行筛选,得到筛选后的匹配特征点对。约束条件用于对匹配特征点对的匹配准确性进行约束。约束条件包括如下三个条件中的至少一个:
1、匹配唯一性约束条件;
匹配唯一性是用于指示目标特征点是该初始特征点唯一匹配的特征点的条件。
2、极线约束检验条件;
由于不同角度拍摄的两张二维图像是对现实世界中同一个三维环境进行拍摄得到的,因此两张二维图像中的匹配点在不同视图上的对应点应当位于相应的极线上,也即不同的两张二维图像中的匹配特征点对应该满足极线约束检验条件。
极线约束检验条件用于检测目标特征与初始特征点之间是否满足极线约束。
3、区域代表性约束条件。
在特征点匹配过程中,可能会出现大量特征点处于一个密集区域内的现象。理想情况下,计算两个图像之间的单应性矩阵homography时的需要有足够的距离。区域代表性约束条件用于在当前图像的局部区域中挑选出具有代表性的目标特征点。
步骤510,根据筛选后的匹配特征点对,计算相机从初始位姿参数改变至目标位姿参数时的位姿变化量;
可选地,目标位姿参数用于指示在采集当前图像时的相机姿态。
可选地,设备根据筛选后的至少四组匹配特征点对(初始特征点和目标特征点)计算两帧图像之间的单应性矩阵homography;对单应性矩阵homography进行分解,得到相机从初始位姿参数改变至目标位姿参数时的位姿变化量R relocalize和T relocalize
单应性矩阵描述了两个平面之间的映射关系,若自然场景(现实环境)中的特征点都落在同一物理平面上,则可以通过单应性矩阵进行运动估计。当存在至少四对相匹配的初始特征点和目标特征点时,设备通过ransac对该至少四对匹配特征点所计算得到的单应性矩阵进 行分解,得到旋转矩阵R relocalize和平移向量T relocalize
其中,R relocalize是相机从初始位姿参数改变至目标位姿参数时的旋转矩阵,T relocalize是相机从初始位姿参数改变至目标位姿参数时的位移向量。
步骤512,根据初始位姿参数和位姿变化量,重定位得到目标位姿参数。
设备将初始位姿参数利用位姿变化量进行变换后,重定位得到目标位姿参数,从而计算得到相机在采集当前图像时的相机姿态。
可选地,在对当前图像重定位成功时,终端将当前图像确定为第i+1个标记图像。
终端基于第i+1个标记图像继续进行特征点追踪。终端根据后续的特征点追踪情况,还可以继续生成第i+2个标记图像、第i+3个标记图像、第i+4个标记图像等等,以此类推不再赘述。相关过程可参考上述图3所示的追踪内容
综上所述,本实施例提供的重定位方法,通过在当前图像符合重定位条件时,将当前图像与第一个标记图像进行重定位,能够在连续多个标记图像进行追踪的Anchor-Switching AR system算法中实现重定位,从而减少了追踪过程中断的可能性,从而解决相关技术中的SLAM重定位方法并不适用于AR领域中重定位问题。
另外,由于重定位过程是将当前图像相对于第一个标记图像进行重定位,第一个标记图像可以认为是没有累积误差的,所以本实施例还能消除多个标记图像的追踪过程所产生的累积误差。
结合参考图6和图7,假设Anchor-Switching AR System算法应用于AR游戏领域,相机拍摄到的桌子上有一个物理键盘,由设备根据相机姿态参数在物理键盘的回车键上叠加一个虚拟小人。若未采用重定位技术,则在一段时间后会产生跟踪误差,设备根据存在误差的相机姿态参数计算虚拟小人的位置时产生了明显的漂移,虚拟小人漂移到了空格键的位置,如图6所示。若采用了重定位技术,则在重定位成功后消除了累计误差,根据较为准确的相机姿态参数计算虚拟小人的位置时,虚拟小人能够保持在回车键附近不变。
以下对上述重定位方法的若干个阶段进行介绍:
预处理阶段:
在基于图5所示的可选实施例中,由于第一个标记图像通常是相机拍摄的第一帧图像,也是重定位过程使用的当前图像,出于提高特征点匹配的成功率的目的,需要对第一个标记图像进行预处理。如图8所示,步骤502之前还包括如下步骤:
步骤501a,记录第一个标记图像对应的初始位姿参数;
设备中设置有IMU,通过IMU定时采集相机的位姿参数以及时间戳。位姿参数包括旋转矩阵和位移向量,时间戳用于表示位姿参数的采集时间。可选地,IMU采集的旋转矩阵是较为准确的。
设备中的相机采集每帧图像时,同时记录有每帧图像的拍摄时间。设备根据第一个标记图像的拍摄时间,查询并记录相机在拍摄第一个标记图像时的初始位姿参数。
步骤501b,获取第一个标记图像对应的n个尺度不同的金字塔图像,n为大于1的整数;
设备还提取第一个标记图像中的初始特征点。可选地,设备提取特征点时采用的特征提取算法可以为FAST(Features from Accelerated Segment Test,加速段测试特征点)检测算法、Shi-Tomasi(史托马西)角点检测算法、Harris Corner Detection(Harris角点检测)算法、SIFT (Scale-Invariant Feature Transform,尺度不变特征转换)算法、ORB(Oriented FAST and Rotated BRIEF,快速特征点提取和描述)算法等。
由于SIFT特征的实时计算难度较大,为了保证实时性,设备可以提取第一个标记图像中的ORB特征点。一个ORB特征点包括FAST角点(Key-point)和BRIER描述子(Binary Robust Independent Elementary Feature Descirptor)两部分。
FAST角点是指该ORB特征点在图像中所在的位置。FAST角点主要检测局部像素灰度变化明显的地方,以速度快著称。FAST角点的思想时:如果一个像素与邻域的像素差别较大(过亮或过暗),则该像素可能是一个角点。
BRIEF描述子是一个二进制表示的向量,该向量按照某种人为设计的方式描述了该关键点周围像素的信息。BRIEF描述子的描述向量由多个0和1组成,这里的0和1编码了FAST角点附近的两个像素的大小关系。
由于ORB特征的计算速度较快,因此适用于移动设备上实施。但由于ORB特征描述子没有尺度不变性,用户手持相机采集图像时的尺度变化又很明显,用户很可能在很远或很近的尺度下观测到第一个标记图像对应的画面,在一个可选的实现中,设备为第一个标记图像生成n个尺度不同的金字塔图像。
金字塔图像是指对第一个标记图像按照预设比例进行缩放后的图像。以金字塔图像包括四层图像为例,按照缩放比例1.0、0.8、0.6、0.4将第一个标记图像进行缩放后,得到四张不同尺度的图像。
步骤501c,对每个金字塔图像提取初始特征点,并记录初始特征点在金字塔图像缩放至原始尺寸时的二维坐标。
设备对每一层金字塔图像都提取特征点并计算ORB特征描述子。对于不是原始尺度(1.0)的金字塔图像上提取的特征点,将该金字塔图像按照缩放比例放大到原始尺度后,记录每个特征点在原始尺度的金字塔图像上的二维坐标。这些金字塔图像上的特征点以及二维坐标,可称为layer-keypoint。在一个例子中,每层金字塔图像上的特征点最多有500个特征点。
对于第一个标记图像,将每个金字塔图像上的特征点确定为初始特征点。在后续特征点追踪过程中,若当前图像的尺度很大,当前图像上的高频细节都清晰可见,则当前图像与层数较低的金字塔图像(比如原始图像)会有更高的匹配分数;反之,若当前图像的尺度很小,当前图像上只能看到模糊的低频信息,则当前图像与层数较高的金字塔图像有更高的匹配分数。
在如图9所示出的例子中,第一个标记图像具有三个金字塔图像91、92和93,金字塔图像91位于金字塔的第一层,具有三个图像中的最小尺度;金字塔图像92位于金字塔的第二层,具有三个图像中的中间尺度;金字塔图像93位于金字塔的第三层,具有三个图像中的最大尺度,若当前图像94相对于第一个标记图像进行特征点追踪时,设备可以将当前图像94分别与三个金字塔图像中提取的特征点进行匹配,由于金字塔图像93和当前图像94的尺度更接近,则金字塔图像93中提取的特征点具有更高的匹配分数。
本实施例通过对第一个标记图像设置多个尺度的金字塔图像,并进而提取每层金字塔图像上的初始特征点用于后续的特征点追踪过程,通过多个尺度上的特征点共同匹配,自动调节了第一个标记图像的尺度,实现了尺度不变性。
特征点追踪阶段:
在基于图5所示的可选实施例中,对于步骤506所示出的特征点追踪过程。假设第一个标记图像中的初始特征点是N个,当前图像中的候选特征点是M个,则正常的特征点追踪过程的计算复杂度是N m次。为了减少特征点追踪过程的计算复杂度,终端基于词袋模型进行匹配加速。BoW(Bag of Words,词袋模型)是自然语言处理领域经常使用的一个概念。以文本为例,一篇文章可能有一万个词,其中可能只有500个不同的单词,每个词出现的次数各不相同。词袋就像一个个袋子,每个袋子里装着同样的词。这构成了一种文本的表示方式。这种表示方式不考虑文法以及词的顺序。在计算机视觉领域,图像通常以特征点以及该特征点的特征描述子来表达。如果把该特征点的特征描述子看做单词,那么能构建出相应的词袋模型。
此时,步骤506包括如下子步骤,如图10所示:
步骤506a,通过词袋模型将初始特征点聚类至第一节点树,第一节点树的每个父亲节点包括K个孩子节点,每个节点中包括被聚类至同一类的初始特征点;
可选地,初始特征点采用ORB特征点来表示。每个ORB特征点包括:FAST角点(Key-point)和BRIER描述子。BRIER描述子能够表征初始特征点的特征,该特征能够用于进行聚类。
本实施例中的BoW可使用DBoW2库,DBoW2库是University of Zara里的Lopez等人开发的开源软件库。设备通过词袋模型将多个初始特征点聚类至第一节点树。
可选地,如图11所示,设备先将多个初始特征点作为第一节点树的根节点,通过词袋模型将多个初始特征点聚类为K个分类构成第一层节点,每个节点中包括属于同一类的初始特征点;然后,第一层节点中的任意一个节点再聚类为K个分类,构成该节点的K个孩子节点,依此类推,设备将第L层节点中的任意一个节点再聚类为K个分类,构成该节点的K个孩子节点。可选地,聚类算法采用K-means聚类算法,该K-means聚类算法可以采用训练集中的图像提取到的特征进行训练。
步骤506b,提取当前图像中的候选特征点;
设备还提取第一个标记图像中的初始特征点。可选地,设备提取特征点时采用的特征提取算法可以为FAST(Features from Accelerated Segment Test,加速段测试特征点)检测算法、Shi-Tomasi(史托马西)角点检测算法、Harris Corner Detection(Harris角点检测)算法、SIFT(Scale-Invariant Feature Transform,尺度不变特征转换)算法、ORB(Oriented FAST and Rotated BRIEF,快速特征点提取和描述)算法等。
由于SIFT特征的实时计算难度较大,为了保证实时性,设备可以提取第一个标记图像中的ORB特征点。一个ORB特征点包括FAST角点(Key-point)和BRIER描述子(Binary Robust Independent Elementary Feature Descirptor)两部分。当然在设备计算能力足够时,也可以提取SIFT特征,本申请实施例对此不加以限定,只需要对第一个标记图像和当前图像提取相同类型的特征即可。
步骤506c,通过词袋模型将候选特征点聚类至第二节点树,第二节点树的每个父亲节点包括K个孩子节点,每个节点中包括被聚类至同一类的候选特征点;
可选地,候选特征点采用ORB特征点来表示。每个ORB特征点包括:FAST角点(Key-point)和BRIER描述子。BRIER描述子能够表征候选特征点的特征,该特征能够用于进行聚类。
本实施例中的BoW可使用DBoW2库,DBoW2库是University of Zara里的Lopez等人 开发的开源软件库。设备通过词袋模型将多个候选特征点聚类至第二节点树。
可选地,设备先将多个候选特征点作为第二节点树的根节点,通过词袋模型将多个候选特征点聚类为K个分类构成第一层节点,每个节点中包括属于同一类的候选特征点;然后,第一层节点中的任意一个节点再聚类为K个分类,构成该节点的K个孩子节点,依此类推,设备将第L层节点中的任意一个节点再聚类为K个分类,构成该节点的K个孩子节点。可选地,聚类算法采用K-means聚类算法,该K-means聚类算法可以采用训练集中的图像提取到的特征进行训练。
步骤506d,将第一节点树中的正向索引中的第i个第一节点,与第二节点树中的正向索引中的第i个第二节点进行特征点追踪,得到多组匹配特征点对。
可选地,正向索引是指以深度优先遍历顺序或广度优先遍历顺序进行遍历时的顺序。第i个第一节点和第i个第二节点是两个节点树上位置相同的节点。比如,第i个第一节点是第一节点树上的第三层节点中第3个节点,则第i个第二节点是第二节点树上的第三层节点中第3个节点。
可选地,第i个第一节点是第一节点树中的中间节点,第i个第二节点是第二节点树中的中间节点,中间节点是位于根节点和叶子节点之间的节点。若第i个第一/第二节点是根节点,则计算复杂度与正常的特征点追踪过程相比没有得到简化;若第i个第一/第二节点是叶子节点,则有可能会错失正确匹配的特征点。设第i个第一节点和第i个第二节点是节点树上的第L层,第一个标记图像上有N个特征点,当前图像上有M个特征点,每个父亲节点有K个孩子节点,则本方法将搜索点的范围减少至(N)^(M/(K^L)),从而实现指数级的加速匹配。
在一个示意性的例子中,如图12所示,第一个标记图像上有N个初始特征点,将N个初始特征点聚类至第一节点树;当前图像上有M个目标特征点(与M个初始特征点匹配),M≤N,将M个目标特征点聚类至第二节点树。将两个节点树中的第三层节点(从根节点往下数)作为索引层,对于索引层的每个节点,找出A的正向索引中第一节点对应的特征集合Sa,找出B的正向索引中第二节点对应的特征集合Sb,在Sa和Sb中计算特征匹配。由于第一节点和第二节点上属于同一类的初始特征点大约为几个至几十个,当前图像上的目标特征点的数量相同或更少,因此匹配次数缩减为两个集合(拥有几个至几十个特征点)的匹配。
综上所述,本实施例提供的重定位方法,通过基于词袋模型将两个图像上的特征点分别聚类至两个节点树,利用两个节点树上相同位置的节点来缩小特征点匹配时的匹配范围,从而实现对特征点追踪过程的加速,能够更加快速地实现当前图像相对于第一个标记图像的特征点追踪,从而实现更快地重定位效果。
特征点筛选阶段:
由于在对第一个标记图像进行特征点提取时,通过不同尺度的金字塔图像提取了大量的特征。因此无论是通过正常的特征点追踪过程,还是上述可选实施例中的基于词袋加速的特征点追踪过程,最终得到的多组匹配特征点对中都会存在大量的错误匹配。对于Anchor-SLAM系统来讲,由于是通过分解两个图像对应的单应性矩阵homography来计算相机的旋转矩阵和平移向量,因此最少只需要4组匹配特征点对即可,多余的点反而会在ransac时造成不必要的误差。因此,实际计算过程并不需要太多组匹配特征点对,而是需要特别准确的少量组匹配特征点对即可。在基于图5的可选实施例中,步骤508所示出的按照约束条 件对多组匹配特征点对进行筛选的过程中。可选采用如下三个方向对多组匹配特征点对进行筛选。
1、匹配唯一性检验;
同一个初始特征点在目标图像中可能存在多个候选特征点,每个候选特征点与初始特征点之间存在匹配度,通常将排名第一的候选特征点确定与该初始特征点匹配的目标特征点。但在特征点匹配过程中,很容易出现两个匹配度非常接近的候选特征点,比如桌布上存在重复的花纹图案,这两个匹配度非常接近的候选特征点会有很大几率造成错误的匹配。也即这类型的匹配特征点对很可能出现意外而匹配失误,不具有唯一性,理应删除。
因此,匹配唯一性条件要求每个初始特征点的排名第一的候选特征点(目标特征点)与排名第二的候选特征点有一定的距离,也即目标特征点是与初始特征点唯一匹配的特征点,否则放弃该组匹配特征点对。
参考图13,此时步骤508可选包括如下步骤:
步骤5081,对于任一组匹配特征点对中的初始特征点,获取与初始特征点匹配的目标特征点和次一级特征点,目标特征点是与初始特征点匹配的多个候选特征点中匹配度排名第一的特征点,次一级特征点是与初始特征点匹配的多个候选特征点中匹配度排名第二的特征点;
步骤5082,检测排名第一的匹配度和排名第二的匹配度之间的差值是否大于预设阈值;
可选地,预设阈值是80%。设排名第一的匹配度(目标特征点与初始特征点之间的匹配度)为X,排名第二的匹配度(次一级特征点与初始特征点之间的匹配度)为Y,则检测X-Y是否大于80%X;若大于80%则进入步骤5083,若小于80%则进入步骤5084。
步骤5083,当排名第一的匹配度和排名第二的匹配度之间的差值大于预设阈值时,确定目标特征点是筛选后的目标特征点;
当排名第一的匹配度和排名第二的匹配度的差值大于预设阈值,则目标特征点是该初始特征点唯一匹配的特征点,符合筛选条件。将该组匹配特征点对确定为筛选后的匹配特征点对,或者,继续进行其它约束条件的筛选。
步骤5084,当排名第一的匹配度和排名第二的匹配度之间的差值小于预设阈值时,丢弃该组匹配特征点对。
当排名第一的匹配度和排名第二的匹配度的差值小于预设阈值,则该组匹配特征点对很有可能存在匹配失误,应当丢弃该组匹配特征点对。
综上所述,本实施例提供的重定位方法,通过按照匹配唯一性检验来筛选匹配特征点组,能够将存在较大匹配失误可能性的匹配特征点对滤除,从而保证筛选后的匹配特征点组符合匹配唯一特性,从而提高后续重定位过程中的计算准确性。
2、极线约束检验;
由于特征点的局部性,在多组匹配特征点对中可能会出现匹配度很高且满足匹配唯一性,但是几何位置上明显不满足要求的错误匹配。这种几何关系可以通过极线约束来约束。
极线约束(epipolar constraint)是指匹配点在其它视图上的对应点位于相应的极线上。对于Anchor-Switching AR System系统,由于每一帧图像都是同一个相机在不同相机姿态下拍摄,因此正确的匹配特征点对必然会满足极限约束。
图14是极限约束的原理示意图,在现实世界的平面上存在三维点x,则左成像平面上存 在观测点x 1,右成像平面上存在观测点x 2,则必然满足如下关系:
X 2=R*X 1+T;
其中,R为两个相机姿态之间的旋转矩阵,T为两个相机姿态之间的位移向量。
两边同时叉乘T可得
T×X 2=T×R*X 1
两边左乘X 2,则等式为0,进而得到:
X 2*T×X 2=0=X 2*T×R*X 1
令T*R为所求的基础矩阵F,则:
X 2×R*X 1=0;
显然,对于任意的一组匹配点,必然有如上的基础矩阵的限制,需要最少8组匹配特征点对即可计算出基础矩阵F。因此,在筛选出至少8组匹配特征点对(比如符合匹配唯一性的8组匹配特征点对)后,通过ransac的方法拟合出一个基础矩阵F验证极线误差,从而排除掉那些匹配分数高但是几何坐标不正确的点,从而保证几何一致性。
参考图15,此时步骤508可选包括如下步骤:
步骤508A,通过至少8组匹配特征点对拟合出基础矩阵,该基础矩阵用于拟合第一个标记图像和当前图像之间的极线约束条件;
可选地,通过匹配唯一性检验条件筛选出的至少8组匹配特征点,用于作为拟合基础矩阵的特征点对。
可选地,通过至少8组匹配特征点进行ransac的计算,计算得到第一个标记图像和当前图像之间的单应性矩阵,对单应性矩阵进行分解后得到旋转矩阵和位移向量。将旋转矩阵和位移向量相乘后,拟合出基础矩阵F。
步骤508B,对于任一个匹配特征点对,计算初始特征点的二维坐标、基础矩阵以及目标特征点的二维坐标之间的乘积;
对于任一个候选的匹配特征点对,按照如下公式进行计算:
X 2*F*X 1
其中,X 2是目标特征点在当前图像中的二维坐标,X 1是初始特征点在第一个标记图像中的二维坐标,F是上一步骤中拟合出的基础矩阵。
步骤508C,检测乘积是否小于误差阈值;
理想情况下,该乘积应当为零。但由于误差的存在,该乘积不完全为零。因此可以预先设置一个误差阈值,当该乘积属于误差阈值之内时,认为初始特征点和目标特征点之间符合极线约束。
若乘积小于误差阈值,则进入步骤508D;若乘积大于或等于误差阈值,则进入步骤508E。
步骤508D,当乘积小于误差阈值时,确定匹配特征点对是筛选后的匹配特征点对。
当乘积小于误差阈值时认为符合筛选条件,将该组匹配特征点对确定为筛选后的匹配特征点对,或者,继续进行其它约束条件的筛选。
步骤508E,当乘积大于或等于误差阈值时,丢弃该组匹配特征点对。
综上所述,本实施例提供的重定位方法,通过按照极线约束检验来筛选匹配特征点组,能够将不符合几何位置的匹配特征点对进行滤除,从而保证筛选后的匹配特征点组符合极线约束特性,从而提高后续重定位过程中的计算准确性
3、区域代表性约束
在特征点匹配过程中,目标图像上可能存在同一个密集区域内出现大量目标特征点的情况。特别是因为不同尺度的金字塔图像上提取的所有初始特征点都会放缩到原始尺度上,因此有更大几率出现在一个小范围内有好几个不同尺度下的目标特征点与初始特征点相匹配的情况。如图16所示,设左侧图像是第一个标记图像(born anchor或born image),右侧图像是当前图像。由于相机在采集当前图像时很靠近现实场景,因此只能与第一个标记图像上的局部区域匹配成功,此时所有匹配特征点对都集中出现在第一个标记图像上的一个局部区域内,再加上尺度金字塔,使得该局部区域内的匹配更加不具有代表性。
理想情况下,在重定位计算过程中用于计算单应性矩阵的特征点需要有足够的距离,最好是在标记图像上分布越远越好,这样的点更具有代表性。因此区域代表性约束是指在当前图像的各个局部区域中挑选出每个局部区域内具有代表性的目标特征点。
在基于图5的一个可选实施例中,提出了基于栅格的筛选方法。如图17所示,此时步骤508可选包括如下子步骤:
步骤508a,将当前图像进行栅格化处理,得到多个栅格区域;
设备按照预设的栅格大小,将当前图像进行栅格化处理。将当前图像划分为多个互不重叠的栅格区域。
步骤508b,对于多个栅格区域中存在目标特征点的任一栅格区域,筛选出该栅格区域中具有最高匹配度的目标特征点;
多个匹配特征点对中的目标特征点会分散在多个栅格区域中,每个栅格区域中的目标特征点可能为零到多个。对于存在目标特征点的任一栅格区域,会筛选出该栅格区域中具有最高匹配度的目标特征点。如图18所示,左图示出了预设栅格,通过该预设栅格对当前图像进行栅格化,得到多个栅格区域。对每个栅格区域中的目标特征点进行筛选,筛选出具有最高匹配度的目标特征点。
步骤508c,将具有最高匹配度的目标特征点对应的匹配特征点对,确定为筛选后的匹配特征点对。
当栅格区域中存在一个目标特征点时,将该目标特征点所在的匹配特征点对确定为筛选后的匹配特征点对。
当栅格区域中存在两个以上目标特征点时,获取每个目标特征点与对应初始特征点之间的匹配度,将具有最高匹配度的目标特征点确定为筛选后的匹配特征点对。
综上所述,本实施例提供的重定位方法,通过在每个栅格区域中筛选出具有最高匹配度的目标特征点,作为该栅格区域中具有代表性的目标特征点。该具有代表性的目标特征点能够唯一地代表当前的栅格区域,这样在重定位过程中计算出的单应性矩阵homography具有更好的鲁棒性,同时通过栅格区域的数量能够限制计算单应性矩阵homography时的最大数量,从而保证了计算homography时的计算速度。
重定位计算过程:
在基于图5所示的可选实施例中,对于步骤510所示出的相机姿态的位姿变化量计算过程。设备得到筛选后的多组匹配特征点对之后,将多组匹配特征点对(初始特征点和目标特征点)输入至ransac的算法中,计算得到当前图像相对于第一个标记图像的单应性矩阵homography,通过IMU中的分解算法对单应性矩阵homography可以分解得到旋转矩阵 R relocalize和平移向量T relocalize,也即相机在采集当前图像时的目标位姿参数。
如图19所示,步骤510可选包括如下子步骤:
步骤510a,根据筛选后的多组匹配特征点对,计算相机在相机姿态改变过程时的单应性矩阵;
设备将多组匹配特征点对(初始特征点和目标特征点)输入至ransac算法中,计算得到当前图像相对于第一个标记图像的单应性矩阵homography
步骤510b,通过单应性矩阵计算初始特征点在当前图像上的投影特征点;
设备从所有初始特征点中筛选出具有相匹配的目标特征点的初始特征点,计算每个初始特征点在当前图像上的投影特征点。可选地,将每个初始特征点与单应性矩阵homography相乘后,得到每个初始特征点在当前图像上的投影特征点。
步骤510c,计算投影特征点和目标特征点之间的投影误差;
对于每个初始特征点,计算与该初始特征点对应的投影特征点和目标特征点之间的投影误差。当投影特征点和目标特征点之间的距离小于距离误差时,认为该目标特征点为inlier(内点);当投影特征点和目标特征点之间的距离大于距离误差时,认为该目标特征点为outlier(外点)。然后,设备统计外点数量占所有目标特征点的总数量的比例。
步骤510d,当投影误差小于预设阈值时,对单应性矩阵进行分解,得到相机从初始姿态参数改变至目标姿态参数时的位姿变化量R relocalize和T relocalize
可选地,预设阈值为50%。
当处于outlier的点占目标特征点的总数量的点的比例小于50%时,认为本次计算得到单应性矩阵是可靠的,设备对单应性矩阵进行分解,得到相机从初始姿态参数改变至目标姿态参数时的位姿变化量R relocalize和T relocalize;当处于outlier的点占目标特征点的总数量的点的比例大于50%时,认为本次计算得到单应性矩阵是不可靠的,放弃本次结果。
需要说明的是,步骤510b和步骤510c所示出的统计过程是可选步骤,
综上所述,本实施例提供的重定位方法,能够通过统计outlier的个数来对单应性矩阵进行校验,当校验失败时放弃本次结果,从而保证单应性矩阵的计算准确性,进而保证重定位结果的计算准确性。
在一个示意性的例子中,上述相机姿态追踪过程的重定位方法可以用于AR程序中,通过该重定位方法能够实时根据现实世界的场景信息,对电子设备上的相机姿态进行追踪,并根据追踪结果调整和修改AR应用程序中的AR元素的显示位置。以图1或图2所示的运行在手机上的AR程序为例,当需要显示一个站立在书籍上的静止卡通人物时,不论用户如何移动该手机,只需要根据该手机上的相机姿态变化修改该卡通人物的显示位置,即可使该卡通人物在书籍上的站立位置保持不变。
以下为本申请的装置实施例,对于装置实施例中未详细描述的技术细节,请参考上述方法实施例中的描述,本文不再一一赘述。
请参考图20,其示出了本申请一个示例性实施例提供的相机姿态追踪过程的重定位装置的结构框图。该重定位装置可以通过软件、硬件或者两者的结合实现成为电子设备的全部或一部分。所述电子设备用于按序执行多个标记图像的相机姿态追踪,所述装置包括:
图像获取模块2010,用于获取所述多个标记图像中第i个标记图像后采集的当前图像,i>1;
信息获取模块2020,用于当所述当前图像符合重定位条件时,获取所述多个标记图像中的第一个标记图像的初始特征点和初始位姿参数;
特征点追踪模块2030,用于将所述当前图像相对于所述第一个标记图像的所述初始特征点进行特征点追踪,得到多组匹配特征点对;
特征点筛选模块2040,用于对所述多组匹配特征点对按照约束条件进行筛选,得到筛选后的匹配特征点对;
计算模块2050,用于根据所述筛选后的匹配特征点对,计算所述相机从所述初始位姿参数改变至目标位姿参数时的位姿变化量;
重定位模块2060,用于根据所述初始位姿参数和所述位姿变化量,重定位得到所述相机的所述目标位姿参数。
在一个可选的实施例中,所述约束条件包括如下条件中的至少一个:
所述目标特征点是与所述初始特征点唯一匹配的特征点;
所述初始特征点和所述目标特征点满足极线约束;
所述目标特征点是所在栅格区域上匹配度最高的特征点,所述栅格区域是将所述当前图像进行栅格化后得到的区域。
在一个可选的实施例中,所述约束条件包括所述目标特征点是与所述初始特征点唯一匹配的特征点;
所述特征点筛选模块2040,用于对于任一组所述匹配特征点对中的所述初始特征点,获取与所述初始特征点匹配的所述目标特征点和次一级特征点,所述目标特征点是与所述初始特征点匹配的多个候选特征点中匹配度排名第一的特征点,所述次一级特征点是与所述初始特征点匹配的多个候选特征点中匹配度排名第二的特征点;当所述排名第一的匹配度和所述排名第二的匹配度之间的差值大于预设阈值时,确定所述目标特征点是所述筛选后的目标特征点。
在一个可选的实施例中,所述约束条件包括所述初始特征点和所述目标特征点满足极线约束;
所述特征点筛选模块2040,用于对于任一个所述匹配特征点对,计算所述初始特征点的二维坐标、基础矩阵以及所述目标特征点的二维坐标之间的乘积;所述基础矩阵用于拟合所述第一个标记图像和所述当前图像之间的极线约束条件;当所述乘积小于误差阈值时,确定所述匹配特征点对是所述筛选后的匹配特征点对。
在一个可选的实施例中,所述约束条件包括所述目标特征点是所在栅格区域上匹配度最高的特征点;
所述特征点筛选模块2040,用于将所述当前图像进行栅格化处理,得到多个栅格区域;对于所述多个栅格区域中存在目标特征点的任一栅格区域,筛选出所述栅格区域中具有最高匹配度的目标特征点;将所述具有最高匹配度的目标特征点对应的匹配特征点对,确定为所述筛选后的匹配特征点对。
在一个可选的实施例中,所述特征点追踪模块2030,用于通过词袋模型将所述初始特征点聚类至第一节点树,所述第一节点树的每个父亲节点包括K个孩子节点,每个节点中包括被聚类至同一类的初始特征点;提取所述当前图像中的候选特征点,通过所述词袋模型将所述候选特征点聚类至第二节点树,所述第二节点树的每个父亲节点包括K个孩子节点,每个节点中包括被聚类至同一类的候选特征点;将所述第一节点树中的正向索引中的第i个第一 节点,与所述第二节点树中的正向索引中的第i个第二节点进行特征点追踪,得到多组匹配特征点对。
在一个可选的实施例中,所述第i个第一节点是所述第一节点树中的中间节点,所述第i个第二节点是所述第二节点树中的中间节点,所述中间节点是位于根节点和叶子节点之间的节点。
在一个可选的实施例中,所述计算模块2050,用于根据所述筛选后的多组匹配特征点对,计算所述相机在相机姿态改变过程时的单应性矩阵;对所述单应性矩阵进行分解,得到所述相机从所述初始姿态参数改变至目标姿态参数时的位姿变化量R relocalize和T relocalize。
在一个可选的实施例中,所述计算模块2050,用于通过所述单应性矩阵计算所述初始特征点在所述当前图像上的投影特征点;计算所述投影特征点和所述目标特征点之间的投影误差;当所述投影误差小于预设阈值时,执行所述对所述单应性矩阵进行分解,得到所述相机从所述初始姿态参数改变至目标姿态参数时的位姿变化量R relocalize和T relocalize的步骤。
图21示出了本申请一个示例性实施例提供的电子设备2100的结构框图。该电子设备2100可以是:智能手机、平板电脑、MP3播放器(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器、笔记本电脑或台式电脑。电子设备2100还可能被称为用户设备、便携式电子设备、膝上型电子设备、台式电子设备等其他名称。
通常,电子设备2100包括有:处理器2101和存储器2102。
处理器2101可以包括一个或多个处理核心,比如4核心处理器、8核心处理器等。处理器2101可以采用DSP(Digital Signal Processing,数字信号处理)、FPGA(Field-Programmable Gate Array,现场可编程门阵列)、PLA(Programmable Logic Array,可编程逻辑阵列)中的至少一种硬件形式来实现。处理器2101也可以包括主处理器和协处理器,主处理器是用于对在唤醒状态下的数据进行处理的处理器,也称CPU(Central Processing Unit,中央处理器);协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中,处理器2101可以在集成有GPU(Graphics Processing Unit,图像处理器),GPU用于负责显示屏所需要显示的内容的渲染和绘制。一些实施例中,处理器2101还可以包括AI(Artificial Intelligence,人工智能)处理器,该AI处理器用于处理有关机器学习的计算操作。
存储器2102可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是非暂态的。存储器2102还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器2102中的非暂态的计算机可读存储介质用于存储至少一个指令,该至少一个指令用于被处理器2101所执行以实现本申请中方法实施例提供的相机姿态追踪过程的重定位方法。
在一些实施例中,电子设备2100还可选包括有:外围设备接口2103和至少一个外围设备。处理器2101、存储器2102和外围设备接口2103之间可以通过总线或信号线相连。各个外围设备可以通过总线、信号线或电路板与外围设备接口2103相连。示意性的,外围设备包括:射频电路2104、触摸显示屏2105、摄像头2106、音频电路2107、定位组件2108和电源2109中的至少一种。
外围设备接口2103可被用于将I/O(Input/Output,输入/输出)相关的至少一个外围设备连接到处理器2101和存储器2102。在一些实施例中,处理器2101、存储器2102和外围设 备接口2103被集成在同一芯片或电路板上;在一些其他实施例中,处理器2101、存储器2102和外围设备接口2103中的任意一个或两个可以在单独的芯片或电路板上实现,本实施例对此不加以限定。
射频电路2104用于接收和发射RF(Radio Frequency,射频)信号,也称电磁信号。射频电路2104通过电磁信号与通信网络以及其他通信设备进行通信。射频电路2104将电信号转换为电磁信号进行发送,或者,将接收到的电磁信号转换为电信号。可选地,射频电路2104包括:天线系统、RF收发器、一个或多个放大器、调谐器、振荡器、数字信号处理器、编解码芯片组、用户身份模块卡等等。射频电路2104可以通过至少一种无线通信协议来与其它电子设备进行通信。该无线通信协议包括但不限于:万维网、城域网、内联网、各代移动通信网络(2G、3G、4G及5G)、无线局域网和/或WiFi(Wireless Fidelity,无线保真)网络。在一些实施例中,射频电路2104还可以包括NFC(Near Field Communication,近距离无线通信)有关的电路,本申请对此不加以限定。
显示屏2105用于显示UI(User Interface,用户界面)。该UI可以包括图形、文本、图标、视频及其它们的任意组合。当显示屏2105是触摸显示屏时,显示屏2105还具有采集在显示屏2105的表面或表面上方的触摸信号的能力。该触摸信号可以作为控制信号输入至处理器2101进行处理。此时,显示屏2105还可以用于提供虚拟按钮和/或虚拟键盘,也称软按钮和/或软键盘。在一些实施例中,显示屏2105可以为一个,设置电子设备2100的前面板;在另一些实施例中,显示屏2105可以为至少两个,分别设置在电子设备2100的不同表面或呈折叠设计;在再一些实施例中,显示屏2105可以是柔性显示屏,设置在电子设备2100的弯曲表面上或折叠面上。甚至,显示屏2105还可以设置成非矩形的不规则图形,也即异形屏。显示屏2105可以采用LCD(Liquid Crystal Display,液晶显示屏)、OLED(Organic Light-Emitting Diode,有机发光二极管)等材质制备。
摄像头组件2106用于采集图像或视频。可选地,摄像头组件2106包括前置摄像头和后置摄像头。通常,前置摄像头设置在电子设备的前面板,后置摄像头设置在电子设备的背面。在一些实施例中,后置摄像头为至少两个,分别为主摄像头、景深摄像头、广角摄像头、长焦摄像头中的任意一种,以实现主摄像头和景深摄像头融合实现背景虚化功能、主摄像头和广角摄像头融合实现全景拍摄以及VR(Virtual Reality,虚拟现实)拍摄功能或者其它融合拍摄功能。在一些实施例中,摄像头组件2106还可以包括闪光灯。闪光灯可以是单色温闪光灯,也可以是双色温闪光灯。双色温闪光灯是指暖光闪光灯和冷光闪光灯的组合,可以用于不同色温下的光线补偿。
音频电路2107可以包括麦克风和扬声器。麦克风用于采集用户及环境的声波,并将声波转换为电信号输入至处理器2101进行处理,或者输入至射频电路2104以实现语音通信。出于立体声采集或降噪的目的,麦克风可以为多个,分别设置在电子设备2100的不同部位。麦克风还可以是阵列麦克风或全向采集型麦克风。扬声器则用于将来自处理器2101或射频电路2104的电信号转换为声波。扬声器可以是传统的薄膜扬声器,也可以是压电陶瓷扬声器。当扬声器是压电陶瓷扬声器时,不仅可以将电信号转换为人类可听见的声波,也可以将电信号转换为人类听不见的声波以进行测距等用途。在一些实施例中,音频电路2107还可以包括耳机插孔。
定位组件2108用于定位电子设备2100的当前地理位置,以实现导航或LBS(Location Based Service,基于位置的服务)。定位组件2108可以是基于美国的GPS(Global Positioning  System,全球定位系统)、中国的北斗系统或俄罗斯的伽利略系统的定位组件。
电源2109用于为电子设备2100中的各个组件进行供电。电源2109可以是交流电、直流电、一次性电池或可充电电池。当电源2109包括可充电电池时,该可充电电池可以是有线充电电池或无线充电电池。有线充电电池是通过有线线路充电的电池,无线充电电池是通过无线线圈充电的电池。该可充电电池还可以用于支持快充技术。
在一些实施例中,电子设备2100还包括有一个或多个传感器2110。该一个或多个传感器2110包括但不限于:加速度传感器2111、陀螺仪传感器2112、压力传感器2113、指纹传感器2114、光学传感器2115以及接近传感器2116。
加速度传感器2111可以检测以电子设备2100建立的坐标系的三个坐标轴上的加速度大小。比如,加速度传感器2111可以用于检测重力加速度在三个坐标轴上的分量。处理器2101可以根据加速度传感器2111采集的重力加速度信号,控制触摸显示屏2105以横向视图或纵向视图进行用户界面的显示。加速度传感器2111还可以用于游戏或者用户的运动数据的采集。
陀螺仪传感器2112可以检测电子设备2100的机体方向及转动角度,陀螺仪传感器2112可以与加速度传感器2111协同采集用户对电子设备2100的3D动作。处理器2101根据陀螺仪传感器2112采集的数据,可以实现如下功能:动作感应(比如根据用户的倾斜操作来改变UI)、拍摄时的图像稳定、游戏控制以及惯性导航。
压力传感器2113可以设置在电子设备2100的侧边框和/或触摸显示屏2105的下层。当压力传感器2113设置在电子设备2100的侧边框时,可以检测用户对电子设备2100的握持信号,由处理器2101根据压力传感器2113采集的握持信号进行左右手识别或快捷操作。当压力传感器2113设置在触摸显示屏2105的下层时,由处理器2101根据用户对触摸显示屏2105的压力操作,实现对UI界面上的可操作性控件进行控制。可操作性控件包括按钮控件、滚动条控件、图标控件、菜单控件中的至少一种。
指纹传感器2114用于采集用户的指纹,由处理器2101根据指纹传感器2114采集到的指纹识别用户的身份,或者,由指纹传感器2114根据采集到的指纹识别用户的身份。在识别出用户的身份为可信身份时,由处理器2101授权该用户执行相关的敏感操作,该敏感操作包括解锁屏幕、查看加密信息、下载软件、支付及更改设置等。指纹传感器2114可以被设置电子设备2100的正面、背面或侧面。当电子设备2100上设置有物理按键或厂商Logo时,指纹传感器2114可以与物理按键或厂商Logo集成在一起。
光学传感器2115用于采集环境光强度。在一个实施例中,处理器2101可以根据光学传感器2115采集的环境光强度,控制触摸显示屏2105的显示亮度。示意性的,当环境光强度较高时,调高触摸显示屏2105的显示亮度;当环境光强度较低时,调低触摸显示屏2105的显示亮度。在另一个实施例中,处理器2101还可以根据光学传感器2115采集的环境光强度,动态调整摄像头组件2106的拍摄参数。
接近传感器2116,也称距离传感器,通常设置在电子设备2100的前面板。接近传感器2116用于采集用户与电子设备2100的正面之间的距离。在一个实施例中,当接近传感器2116检测到用户与电子设备2100的正面之间的距离逐渐变小时,由处理器2101控制触摸显示屏2105从亮屏状态切换为息屏状态;当接近传感器2116检测到用户与电子设备2100的正面之间的距离逐渐变大时,由处理器2101控制触摸显示屏2105从息屏状态切换为亮屏状态。
本领域技术人员可以理解,图21中示出的结构并不构成对电子设备2100的限定,可以 包括比图示更多或更少的组件,或者组合某些组件,或者采用不同的组件布置。
本申请还提供一种计算机可读存储介质,所述存储介质中存储有至少一条指令、至少一段程序、代码集或指令集,所述至少一条指令、所述至少一段程序、所述代码集或指令集由所述处理器加载并执行以实现上述方法实施例提供的相机姿态追踪过程中的重定位方法。
本申请还提供了一种计算机程序产品,当其在电子设备上运行时,使得电子设备执行上述各个方法实施例所述的相机姿态追踪过程中的重定位方法。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上所述仅为本申请的较佳实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (20)

  1. 一种相机姿态追踪过程的重定位方法,其特征在于,应用于具有相机的设备中,所述设备用于按序执行多个标记图像的相机姿态追踪,所述方法包括:
    获取所述多个标记图像中第i个标记图像后采集的当前图像,i>1;
    当所述当前图像符合重定位条件时,获取所述多个标记图像中的第一个标记图像的初始特征点和初始位姿参数;
    将所述当前图像相对于所述第一个标记图像的所述初始特征点进行特征点追踪,得到多组匹配特征点对;对所述多组匹配特征点对按照约束条件进行筛选,得到筛选后的匹配特征点对;
    根据所述筛选后的匹配特征点对,计算所述相机从所述初始位姿参数改变至目标位姿参数时的位姿变化量;
    根据所述初始位姿参数和所述位姿变化量,重定位得到所述相机的所述目标位姿参数。
  2. 根据权利要求1所述的方法,其特征在于,所述约束条件包括如下条件中的至少一个:
    所述目标特征点是与所述初始特征点唯一匹配的特征点;
    所述初始特征点和所述目标特征点满足极线约束;
    所述目标特征点是所在栅格区域上匹配度最高的特征点,所述栅格区域是将所述当前图像进行栅格化后得到的区域。
  3. 根据权利要求2所述的方法,其特征在于,所述约束条件包括所述目标特征点是与所述初始特征点唯一匹配的特征点;
    所述对所述多组匹配特征点对按照约束条件进行筛选,得到筛选后的匹配特征点对,包括:
    对于任一组所述匹配特征点对中的所述初始特征点,获取与所述初始特征点匹配的所述目标特征点和次一级特征点,所述目标特征点是与所述初始特征点匹配的多个候选特征点中匹配度排名第一的特征点,所述次一级特征点是与所述初始特征点匹配的多个候选特征点中匹配度排名第二的特征点;
    当所述排名第一的匹配度和所述排名第二的匹配度之间的差值大于预设阈值时,确定所述目标特征点是所述筛选后的目标特征点。
  4. 根据权利要求2所述的方法,其特征在于,所述约束条件包括所述初始特征点和所述目标特征点满足极线约束;
    所述对所述多组匹配特征点对按照约束条件进行筛选,得到筛选后的匹配特征点对,包括:
    对于任一个所述匹配特征点对,计算所述初始特征点的二维坐标、基础矩阵以及所述目标特征点的二维坐标之间的乘积;所述基础矩阵用于拟合所述第一个标记图像和所述当前图像之间的极线约束条件;
    当所述乘积小于误差阈值时,确定所述匹配特征点对是所述筛选后的匹配特征点对。
  5. 根据权利要求2所述的方法,其特征在于,所述约束条件包括所述目标特征点是所在栅格区域上匹配度最高的特征点;
    所述对所述多组匹配特征点对按照约束条件进行筛选,得到筛选后的匹配特征点对,包括:
    将所述当前图像进行栅格化处理,得到多个栅格区域;
    对于所述多个栅格区域中存在目标特征点的任一栅格区域,筛选出所述栅格区域中具有最高匹配度的目标特征点;
    将所述具有最高匹配度的目标特征点对应的匹配特征点对,确定为所述筛选后的匹配特征点对。
  6. 根据权利要求1至5任一所述的方法,其特征在于,所述将所述当前图像相对于所述第一个标记图像的所述初始特征点进行特征点追踪,得到多组匹配特征点对,包括:
    通过词袋模型将所述初始特征点聚类至第一节点树,所述第一节点树的每个父亲节点包括K个孩子节点,每个节点中包括被聚类至同一类的初始特征点;
    提取所述当前图像中的候选特征点,通过所述词袋模型将所述候选特征点聚类至第二节点树,所述第二节点树的每个父亲节点包括K个孩子节点,每个节点中包括被聚类至同一类的候选特征点;
    将所述第一节点树中的正向索引中的第i个第一节点,与所述第二节点树中的正向索引中的第i个第二节点进行特征点追踪,得到多组匹配特征点对。
  7. 根据权利要求6所述的方法,其特征在于,所述第i个第一节点是所述第一节点树中的中间节点,所述第i个第二节点是所述第二节点树中的中间节点,所述中间节点是位于根节点和叶子节点之间的节点。
  8. 根据权利要求1至5任一所述的方法,其特征在于,所述根据所述初始位姿参数和所述位姿变化量,重定位得到所述相机的所述目标位姿参数,包括:
    根据所述筛选后的多组匹配特征点对,计算所述相机在相机姿态改变过程时的单应性矩阵;
    对所述单应性矩阵进行分解,得到所述相机从所述初始姿态参数改变至目标姿态参数时的位姿变化量R relocalize和T relocalize
  9. 根据权利要求8所述的方法,其特征在于,所述根据所述筛选后的多组匹配特征点对,计算所述相机在相机姿态改变过程时的单应性矩阵之后,还包括:
    通过所述单应性矩阵计算所述初始特征点在所述当前图像上的投影特征点;
    计算所述投影特征点和所述目标特征点之间的投影误差;
    当所述投影误差小于预设阈值时,执行所述对所述单应性矩阵进行分解,得到所述相机从所述初始姿态参数改变至目标姿态参数时的位姿变化量R relocalize和T relocalize的步骤。
  10. 一种相机姿态追踪过程的重定位装置,其特征在于,应用于具有相机的设备中,所述设备用于按序执行多个标记图像的相机姿态追踪,所述装置包括:
    图像获取模块,用于获取所述多个标记图像中第i个标记图像后采集的当前图像,i>1;
    信息获取模块,用于当所述当前图像符合重定位条件时,获取所述多个标记图像中的第一个标记图像的初始特征点和初始位姿参数;
    特征点追踪模块,用于将所述当前图像相对于所述第一个标记图像的所述初始特征点进行特征点追踪,得到多组匹配特征点对;
    特征点筛选模块,用于对所述多组匹配特征点对按照约束条件进行筛选,得到筛选后的匹配特征点对;
    计算模块,用于根据所述筛选后的匹配特征点对,计算所述相机从所述初始位姿参数改变至目标位姿参数时的位姿变化量;
    重定位模块,用于根据所述初始位姿参数和所述位姿变化量,重定位得到所述相机的所述目标位姿参数。
  11. 根据权利要求10所述的装置,其特征在于,所述约束条件包括如下条件中的至少一个:
    所述目标特征点是与所述初始特征点唯一匹配的特征点;
    所述初始特征点和所述目标特征点满足极线约束;
    所述目标特征点是所在栅格区域上匹配度最高的特征点,所述栅格区域是将所述当前图像进行栅格化后得到的区域。
  12. 根据权利要求11所述的装置,其特征在于,所述约束条件包括所述目标特征点是与所述初始特征点唯一匹配的特征点;
    所述特征点筛选模块,用于对于任一组所述匹配特征点对中的所述初始特征点,获取与所述初始特征点匹配的所述目标特征点和次一级特征点,所述目标特征点是与所述初始特征点匹配的多个候选特征点中匹配度排名第一的特征点,所述次一级特征点是与所述初始特征点匹配的多个候选特征点中匹配度排名第二的特征点;当所述排名第一的匹配度和所述排名第二的匹配度之间的差值大于预设阈值时,确定所述目标特征点是所述筛选后的目标特征点。
  13. 根据权利要求11所述的装置,其特征在于,所述约束条件包括所述初始特征点和所述目标特征点满足极线约束;
    所述特征点筛选模块,用于对于任一个所述匹配特征点对,计算所述初始特征点的二维坐标、基础矩阵以及所述目标特征点的二维坐标之间的乘积;所述基础矩阵用于拟合所述第一个标记图像和所述当前图像之间的极线约束条件;当所述乘积小于误差阈值时,确定所述匹配特征点对是所述筛选后的匹配特征点对。
  14. 根据权利要求11所述的装置,其特征在于,所述约束条件包括所述目标特征点是所在栅格区域上匹配度最高的特征点;
    所述特征点筛选模块,用于将所述当前图像进行栅格化处理,得到多个栅格区域;对于 所述多个栅格区域中存在目标特征点的任一栅格区域,筛选出所述栅格区域中具有最高匹配度的目标特征点;将所述具有最高匹配度的目标特征点对应的匹配特征点对,确定为所述筛选后的匹配特征点对。
  15. 根据权利要求10至14任一所述的装置,其特征在于,
    所述特征点追踪模块,用于通过词袋模型将所述初始特征点聚类至第一节点树,所述第一节点树的每个父亲节点包括K个孩子节点,每个节点中包括被聚类至同一类的初始特征点;提取所述当前图像中的候选特征点,通过所述词袋模型将所述候选特征点聚类至第二节点树,所述第二节点树的每个父亲节点包括K个孩子节点,每个节点中包括被聚类至同一类的候选特征点;将所述第一节点树中的正向索引中的第i个第一节点,与所述第二节点树中的正向索引中的第i个第二节点进行特征点追踪,得到多组匹配特征点对。
  16. 根据权利要求15所述的装置,其特征在于,所述第i个第一节点是所述第一节点树中的中间节点,所述第i个第二节点是所述第二节点树中的中间节点,所述中间节点是位于根节点和叶子节点之间的节点。
  17. 根据权利要求10至14任一所述的装置,其特征在于,
    所述计算模块2,用于根据所述筛选后的多组匹配特征点对,计算所述相机在相机姿态改变过程时的单应性矩阵;对所述单应性矩阵进行分解,得到所述相机从所述初始姿态参数改变至目标姿态参数时的位姿变化量R relocalize和T relocalize
  18. 根据权利要求17所述的装置,其特征在于,
    所述计算模块2050,用于通过所述单应性矩阵计算所述初始特征点在所述当前图像上的投影特征点;计算所述投影特征点和所述目标特征点之间的投影误差;当所述投影误差小于预设阈值时,执行所述对所述单应性矩阵进行分解,得到所述相机从所述初始姿态参数改变至目标姿态参数时的位姿变化量R relocalize和T relocalize的步骤。
  19. 一种电子设备,其特征在于,所述电子设备包括存储器和处理器;
    所述存储器中存储有至少一条指令,所述至少一条指令由所述处理器加载并执行以实现如权利要求1至9任一所述的相机姿态追踪过程中的重定位方法。
  20. 一种计算机可读存储介质,其特征在于,所述存储介质中存储有至少一条指令,所述至少一条指令由处理器加载并执行以实现如权利要求1至9任一所述的相机姿态追踪过程中的重定位方法。
PCT/CN2019/079768 2018-04-27 2019-03-26 相机姿态追踪过程的重定位方法、装置、设备及存储介质 WO2019205865A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19792167.9A EP3786892B1 (en) 2018-04-27 2019-03-26 Method, device and apparatus for repositioning in camera orientation tracking process, and storage medium
US16/915,825 US11481923B2 (en) 2018-04-27 2020-06-29 Relocalization method and apparatus in camera pose tracking process, device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810393563.XA CN108615248B (zh) 2018-04-27 2018-04-27 相机姿态追踪过程的重定位方法、装置、设备及存储介质
CN201810393563.X 2018-04-27

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/915,825 Continuation US11481923B2 (en) 2018-04-27 2020-06-29 Relocalization method and apparatus in camera pose tracking process, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2019205865A1 true WO2019205865A1 (zh) 2019-10-31

Family

ID=63661366

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/079768 WO2019205865A1 (zh) 2018-04-27 2019-03-26 相机姿态追踪过程的重定位方法、装置、设备及存储介质

Country Status (4)

Country Link
US (1) US11481923B2 (zh)
EP (1) EP3786892B1 (zh)
CN (1) CN108615248B (zh)
WO (1) WO2019205865A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111862150A (zh) * 2020-06-19 2020-10-30 杭州易现先进科技有限公司 图像跟踪的方法、装置、ar设备和计算机设备
CN112181141A (zh) * 2020-09-23 2021-01-05 北京市商汤科技开发有限公司 Ar定位的方法、装置、电子设备及存储介质
CN112233252A (zh) * 2020-10-23 2021-01-15 上海影谱科技有限公司 一种基于特征匹配与光流融合的ar目标跟踪方法及系统
CN113223184A (zh) * 2021-05-26 2021-08-06 北京奇艺世纪科技有限公司 一种图像处理方法、装置、电子设备及存储介质
CN115272491A (zh) * 2022-08-12 2022-11-01 哈尔滨工业大学 双目ptz相机动态自标定方法
CN117292085A (zh) * 2023-11-27 2023-12-26 浙江大学 一种支持三维建模的实体交互控制方法及其装置

Families Citing this family (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108876854B (zh) 2018-04-27 2022-03-08 腾讯科技(深圳)有限公司 相机姿态追踪过程的重定位方法、装置、设备及存储介质
CN110555883B (zh) 2018-04-27 2022-07-22 腾讯科技(深圳)有限公司 相机姿态追踪过程的重定位方法、装置及存储介质
CN110544280B (zh) 2018-05-22 2021-10-08 腾讯科技(深圳)有限公司 Ar系统及方法
CN109919998B (zh) * 2019-01-17 2021-06-29 中国人民解放军陆军工程大学 卫星姿态确定方法、装置和终端设备
CN109903343B (zh) * 2019-02-28 2023-05-23 东南大学 一种基于惯性姿态约束的特征匹配方法
CN110009681B (zh) * 2019-03-25 2021-07-30 中国计量大学 一种基于imu辅助的单目视觉里程计位姿处理方法
KR102143349B1 (ko) * 2019-03-27 2020-08-11 엘지전자 주식회사 이동 로봇의 제어 방법
CN110298884B (zh) * 2019-05-27 2023-05-30 重庆高开清芯科技产业发展有限公司 一种适于动态环境中单目视觉相机的位姿估计方法
CN110310333B (zh) * 2019-06-27 2021-08-31 Oppo广东移动通信有限公司 定位方法及电子设备、可读存储介质
US11256949B2 (en) * 2019-06-28 2022-02-22 Intel Corporation Guided sparse feature matching via coarsely defined dense matches
CN112150405A (zh) * 2019-06-28 2020-12-29 Oppo广东移动通信有限公司 一种图像质量分析方法及装置、存储介质
CN110706257B (zh) * 2019-09-30 2022-07-22 北京迈格威科技有限公司 有效特征点对的识别方法、相机状态的确定方法及装置
US11138760B2 (en) * 2019-11-06 2021-10-05 Varjo Technologies Oy Display systems and methods for correcting drifts in camera poses
CN111127497B (zh) * 2019-12-11 2023-08-04 深圳市优必选科技股份有限公司 一种机器人及其爬楼控制方法和装置
CN113033590A (zh) * 2019-12-25 2021-06-25 杭州海康机器人技术有限公司 图像特征匹配方法、装置、图像处理设备及存储介质
US11397869B2 (en) * 2020-03-04 2022-07-26 Zerofox, Inc. Methods and systems for detecting impersonating social media profiles
CN111563922B (zh) * 2020-03-26 2023-09-26 北京迈格威科技有限公司 视觉定位方法、装置、电子设备及存储介质
CN111784675A (zh) * 2020-07-01 2020-10-16 云南易见纹语科技有限公司 物品纹理信息处理的方法、装置、存储介质及电子设备
CN111757100B (zh) * 2020-07-14 2022-05-31 北京字节跳动网络技术有限公司 相机运动变化量的确定方法、装置、电子设备和介质
CN111950642B (zh) * 2020-08-17 2024-06-21 联想(北京)有限公司 一种重定位方法及电子设备
CN112164114B (zh) * 2020-09-23 2022-05-20 天津大学 一种基于天际线匹配的室外主动相机重定位方法
CN112272188B (zh) * 2020-11-02 2022-03-11 重庆邮电大学 一种电商平台数据隐私保护的可搜索加密方法
CN113012194B (zh) * 2020-12-25 2024-04-09 深圳市铂岩科技有限公司 目标追踪方法、装置、介质和设备
CN112651997B (zh) * 2020-12-29 2024-04-12 咪咕文化科技有限公司 地图构建方法、电子设备和存储介质
US11865724B2 (en) * 2021-04-26 2024-01-09 Ubkang (Qingdao) Technology Co., Ltd. Movement control method, mobile machine and non-transitory computer readable storage medium
CN113177974B (zh) * 2021-05-19 2024-07-12 上海商汤临港智能科技有限公司 一种点云配准方法、装置、电子设备及存储介质
CN113392909B (zh) * 2021-06-17 2022-12-27 深圳市睿联技术股份有限公司 数据处理方法、数据处理装置、终端及可读存储介质
CN113221926B (zh) * 2021-06-23 2022-08-02 华南师范大学 一种基于角点优化的线段提取方法
CN113223007A (zh) * 2021-06-28 2021-08-06 浙江华睿科技股份有限公司 视觉里程计的实现方法、装置及电子设备
CN113673321A (zh) * 2021-07-12 2021-11-19 浙江大华技术股份有限公司 目标重识别方法、目标重识别装置及计算机可读存储介质
CN114697553A (zh) * 2022-03-30 2022-07-01 浙江大华技术股份有限公司 设备的预置位调整方法和装置、存储介质及电子设备
CN115446834B (zh) * 2022-09-01 2024-05-28 西南交通大学 一种基于占据栅格配准的车底巡检机器人单轴重定位方法
CN116433887B (zh) * 2023-06-12 2023-08-15 山东鼎一建设有限公司 基于人工智能的建筑物快速定位方法

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120038616A (ko) * 2010-10-14 2012-04-24 한국전자통신연구원 Method and system for providing markerless immersive augmented reality
CN106885574A (zh) * 2017-02-15 2017-06-23 北京大学深圳研究生院 Simultaneous localization and mapping method for a monocular vision robot based on a re-tracking strategy
CN106934827A (zh) * 2015-12-31 2017-07-07 杭州华为数字技术有限公司 Three-dimensional scene reconstruction method and apparatus

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009048516A (ja) * 2007-08-22 2009-03-05 Sony Corp Information processing apparatus, information processing method, and computer program
CN102118561B (zh) * 2010-05-27 2013-09-11 周渝斌 System and method for detecting camera movement in a surveillance system
CN102435172A (zh) * 2011-09-02 2012-05-02 北京邮电大学 Visual positioning system and visual positioning method for a spherical robot
KR101926563B1 (ko) * 2012-01-18 2018-12-07 삼성전자주식회사 Method and apparatus for camera tracking
CN104680516B (zh) * 2015-01-08 2017-09-29 南京邮电大学 Method for acquiring a high-quality feature matching set for images
CN105141912B (zh) * 2015-08-18 2018-12-07 浙江宇视科技有限公司 Method and device for relocating signal lights
CN105069809B (zh) * 2015-08-31 2017-10-03 中国科学院自动化研究所 Camera positioning method and system based on planar hybrid markers
US10152825B2 (en) * 2015-10-16 2018-12-11 Fyusion, Inc. Augmenting multi-view image data with synthetic objects using IMU and image data
CN106595601B (zh) * 2016-12-12 2020-01-07 天津大学 Accurate six-degree-of-freedom camera pose relocalization method without hand-eye calibration
CN107301661B (zh) * 2017-07-10 2020-09-11 中国科学院遥感与数字地球研究所 High-resolution remote sensing image registration method based on edge point features
CN107481265B (zh) * 2017-08-17 2020-05-19 成都通甲优博科技有限责任公司 Target relocalization method and apparatus

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120038616A (ko) * 2010-10-14 2012-04-24 한국전자통신연구원 Method and system for providing markerless immersive augmented reality
CN106934827A (zh) * 2015-12-31 2017-07-07 杭州华为数字技术有限公司 Three-dimensional scene reconstruction method and apparatus
CN106885574A (zh) * 2017-02-15 2017-06-23 北京大学深圳研究生院 Simultaneous localization and mapping method for a monocular vision robot based on a re-tracking strategy

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3786892A4 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111862150A (zh) * 2020-06-19 2020-10-30 杭州易现先进科技有限公司 Image tracking method and apparatus, AR device, and computer device
CN112181141A (zh) * 2020-09-23 2021-01-05 北京市商汤科技开发有限公司 AR positioning method and apparatus, electronic device, and storage medium
CN112181141B (zh) * 2020-09-23 2023-06-23 北京市商汤科技开发有限公司 AR positioning method and apparatus, electronic device, and storage medium
CN112233252A (zh) * 2020-10-23 2021-01-15 上海影谱科技有限公司 AR target tracking method and system based on fusion of feature matching and optical flow
CN112233252B (zh) * 2020-10-23 2024-02-13 上海影谱科技有限公司 AR target tracking method and system based on fusion of feature matching and optical flow
CN113223184A (zh) * 2021-05-26 2021-08-06 北京奇艺世纪科技有限公司 Image processing method and apparatus, electronic device, and storage medium
CN113223184B (zh) * 2021-05-26 2023-09-05 北京奇艺世纪科技有限公司 Image processing method and apparatus, electronic device, and storage medium
CN115272491A (zh) * 2022-08-12 2022-11-01 哈尔滨工业大学 Dynamic self-calibration method for a binocular PTZ camera
CN117292085A (zh) * 2023-11-27 2023-12-26 浙江大学 Tangible interaction control method supporting three-dimensional modeling, and apparatus therefor
CN117292085B (zh) * 2023-11-27 2024-02-09 浙江大学 Tangible interaction control method supporting three-dimensional modeling, and apparatus therefor

Also Published As

Publication number Publication date
US20200327695A1 (en) 2020-10-15
CN108615248A (zh) 2018-10-02
CN108615248B (zh) 2022-04-05
EP3786892A1 (en) 2021-03-03
EP3786892A4 (en) 2022-01-26
US11481923B2 (en) 2022-10-25
EP3786892B1 (en) 2024-05-01

Similar Documents

Publication Publication Date Title
WO2019205865A1 (zh) Relocalization method, apparatus, device, and storage medium for a camera pose tracking process
WO2019205842A1 (zh) Relocalization method, apparatus, and storage medium for a camera pose tracking process
CN110544280B (zh) AR system and method
WO2019205853A1 (zh) Relocalization method, apparatus, device, and storage medium for a camera pose tracking process
CN108596976B (zh) Relocalization method, apparatus, device, and storage medium for a camera pose tracking process
US11481982B2 (en) In situ creation of planar natural feature targets
WO2019205851A1 (zh) Pose determination method and apparatus, smart device, and storage medium
CN108876854B (zh) Relocalization method, apparatus, device, and storage medium for a camera pose tracking process
CN111738220A (zh) Three-dimensional human pose estimation method, apparatus, device, and medium
CN109947886B (zh) Image processing method and apparatus, electronic device, and storage medium
WO2019205850A1 (zh) Pose determination method and apparatus, smart device, and storage medium
CN114303120A (zh) Virtual keyboard
CN110310329A (zh) Method for operating a display device, information processing system, and non-transitory storage medium
CN110148178B (zh) Camera positioning method and apparatus, terminal, and storage medium
CN108830186B (zh) Content extraction method, apparatus, device, and storage medium for text images
CN108682037B (zh) Relocalization method, apparatus, device, and storage medium for a camera pose tracking process
US20220398775A1 (en) Localization processing service
CN114170349A (zh) Image generation method and apparatus, electronic device, and storage medium
CN112150560A (zh) Method and apparatus for determining a vanishing point, and computer storage medium
KR20240005953A (ko) Reducing startup time of augmented reality experiences
TWI779332B (zh) Augmented reality system and method for anchoring and displaying virtual objects

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 19792167

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE WIPO information: entry into national phase

Ref document number: 2019792167

Country of ref document: EP