WO2020259248A1 - Depth information-based pose determination method and device, medium, and electronic apparatus - Google Patents

Depth information-based pose determination method and device, medium, and electronic apparatus Download PDF

Info

Publication number
WO2020259248A1
Authority
WO
WIPO (PCT)
Prior art keywords
key frame
error
feature points
map
current frame
Prior art date
Application number
PCT/CN2020/094461
Other languages
French (fr)
Chinese (zh)
Inventor
王宇鹭
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Publication of WO2020259248A1 publication Critical patent/WO2020259248A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757Matching configurations of points or features

Definitions

  • the present disclosure relates to the field of computer vision technology, and in particular to a method for determining a pose based on depth information, a device for determining a pose based on depth information, a computer-readable storage medium, and electronic equipment.
  • SLAM: Simultaneous Localization And Mapping
  • the present disclosure provides a method for determining a pose based on depth information, a device for determining a pose based on depth information, a computer-readable storage medium, and electronic equipment, thereby improving, at least to a certain extent, the problem of low tracking accuracy in existing SLAM methods.
  • a method for determining a pose based on depth information, including: acquiring a current frame image of a scene through a camera, and acquiring depth information of the current frame image; extracting feature points from the current frame image, determining feature points with valid depth information as three-dimensional feature points, and determining feature points with invalid depth information as two-dimensional feature points; and matching the two-dimensional feature points and the three-dimensional feature points respectively with local map points of the scene to construct a first error function.
  • the first error function includes a two-dimensional error term and a three-dimensional error term.
  • the two-dimensional error term is the error between the successfully matched two-dimensional feature point and the local map point.
  • the three-dimensional error term is the error between the successfully matched three-dimensional feature point and the local map point; by calculating the minimum value of the first error function, the pose parameter of the camera in the current frame is determined.
  • a device for determining a pose based on depth information, including: an image acquisition module for acquiring a current frame image of a scene through a camera and acquiring depth information of the current frame image;
  • a feature point extraction module for extracting feature points from the current frame image, determining feature points with valid depth information as three-dimensional feature points, and determining feature points with invalid depth information as two-dimensional feature points;
  • a function construction module for matching the two-dimensional feature points and the three-dimensional feature points respectively with local map points to construct a first error function, where the first error function includes a two-dimensional error term and a three-dimensional error term;
  • the two-dimensional error term is the error between a successfully matched two-dimensional feature point and a local map point, and the three-dimensional error term is the error between a successfully matched three-dimensional feature point and a local map point;
  • a pose determination module for determining the pose parameters of the camera in the current frame by calculating the minimum value of the first error function.
  • a computer-readable storage medium having a computer program stored thereon, and the computer program, when executed by a processor, implements the pose determination method of the first aspect and possible implementations thereof.
  • an electronic device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform, by executing the executable instructions, the pose determination method of the first aspect and possible implementations thereof.
  • the feature points extracted from the image are divided into two-dimensional feature points and three-dimensional feature points, and the two types of feature points are matched with local map points to construct a two-dimensional error term and a three-dimensional error term, so as to establish the first error function; the pose parameters of the camera in the current frame are obtained by optimizing the minimum value of the first error function.
  • introducing depth information to establish an error function as a basis for determining pose parameters can improve the accuracy of pose parameters in SLAM and increase tracking accuracy.
  • the feature points are classified, and error terms are constructed respectively.
  • Compared with optimizing all feature points through a single error term, classifying the feature points and constructing separate error terms gives this exemplary embodiment higher flexibility and pertinence, and can reduce the influence of invalid or wrong depth information on the results; in addition, it can also improve the long-term stability and robustness of the SLAM method.
  • FIG. 1 shows a schematic diagram of the architecture of a SLAM system in this exemplary embodiment
  • Fig. 2 shows a flow chart of a method for determining a pose based on depth information in this exemplary embodiment
  • Fig. 3 shows a sub-flow chart of a method for determining a pose based on depth information in this exemplary embodiment
  • Fig. 4 shows a flow chart of a SLAM method in this exemplary embodiment
  • Fig. 5 shows a structural block diagram of a device for determining a pose based on depth information in this exemplary embodiment
  • Fig. 6 shows a computer-readable storage medium for implementing the above method in this exemplary embodiment
  • Fig. 7 shows an electronic device for implementing the above-mentioned method in this exemplary embodiment.
  • Example embodiments will now be described more fully with reference to the accompanying drawings.
  • the example embodiments can be implemented in various forms, and should not be construed as being limited to the examples set forth herein; on the contrary, providing these embodiments makes the present disclosure more comprehensive and complete, and fully conveys the concept of the example embodiments to those skilled in the art.
  • the described features, structures or characteristics may be combined in one or more embodiments in any suitable way.
  • the exemplary embodiments of the present disclosure first provide a method for determining a pose based on depth information, which is mainly applied in a SLAM scene to determine the pose of a camera.
  • Figure 1 shows the SLAM system architecture diagram of the application environment of this method.
  • the SLAM system 100 may include: a scene 101, a movable camera 102, a movable depth sensor 103, and a computing device 104.
  • the scene 101 is a realistic scene to be modeled, such as an interior, a courtyard, and a street.
  • the camera 102 and the depth sensor 103 can be integrated.
  • the camera 102 is a planar camera and the depth sensor 103 is a TOF (Time Of Flight) sensor arranged alongside it; or the camera 102 and the depth sensor 103 are two cameras that form a binocular camera; or the depth sensor 103 is an infrared light device that, together with the camera 102, forms a structured light camera.
  • the camera 102 and the depth sensor 103 can move within the scene 101 to collect the image of the scene 101 and its depth information.
  • FIG. 1 shows that the camera 102 and the depth sensor 103 are set on a movable robot.
  • the user can also hold a mobile phone or wear smart glasses and the like and move within the scene 101, where the mobile phone or smart glasses has a camera 102 and a depth sensor 103 built in.
  • the computing device 104 can be a terminal computer or server, etc., which is connected to the camera 102 and the depth sensor 103 for data interaction.
  • the camera 102 and the depth sensor 103 send the collected images and their depth information to the computing device 104, and the computing device 104 performs processing and analysis to achieve positioning and modeling in SLAM.
  • the SLAM system 100 shown in FIG. 1 is only an example, and there can be several variations.
  • the camera 102, the depth sensor 103, and the computing device 104 can be integrated into one device, for example a robot with a built-in camera 102, depth sensor 103, and computing device 104, which can take pictures while moving in the scene 101 and process the photos to realize positioning and modeling; the number of devices is not limited to the situation shown in FIG. 1.
  • devices not shown in Figure 1 can also be added, such as an IMU (Inertial Measurement Unit) matched with the camera 102 to help determine the pose of the camera 102, or a projection device that generates a virtual projection in the scene 101 with which the camera 102, the user, or the robot can interact.
  • IMU: Inertial Measurement Unit
  • Before the SLAM process starts, the scene is modeled from scratch, and no scene images have been collected yet. After the SLAM process starts, the camera moves in the scene with the user or robot, collecting scene images while moving to form a continuous stream of image frames that is sent to the computing device in real time. After obtaining a certain number of image frames, the computing device can initialize the map model of the scene, which usually covers only a small part of the scene or differs from the actual scene. After that, each time the camera collects a frame of image, the computing device can update and optimize the map model based on that image (the map model may also be updated and optimized only when key frames are filtered out), adding map points that are not yet in the map model or correcting the positions of existing map points. When updating and optimizing the map model, the camera's pose parameters need to be determined, which is a necessary part of SLAM; only after the camera's pose parameters are determined can the images collected by the camera be matched into the three-dimensional map model to update and optimize it.
  • This exemplary embodiment is an improved method for how to determine the pose parameters of the camera in each frame.
  • the execution subject of this exemplary embodiment may be the computing device 104 therein.
  • Fig. 2 shows a process of this exemplary embodiment, which may include the following steps S210 to S240:
  • Step S210 Acquire a current frame image of the scene through the camera, and acquire depth information of the current frame image.
  • The computing device analyzes each frame of image collected by the camera, and the current frame image is the latest frame of image collected by the camera.
  • Depth information is collected at the same time as the current frame image. For example, if a planar camera and a depth sensor are used to capture the scene image, the depth information of each pixel in the image, usually a depth value, can be obtained; if a binocular camera is used to capture the scene image, the depth information of each pixel can be obtained by a triangulation algorithm; if a structured light camera is used to capture the scene image, an infrared dot matrix can project an infrared light signal onto the scene, and after the reflected signal is received, the depth information is calculated from the change of the infrared light, and so on.
  • Step S220 Extract feature points from the current frame image, determine feature points with valid depth information as three-dimensional feature points, and determine feature points with invalid depth information as two-dimensional feature points.
  • the feature points are representative and highly recognizable points or regions in the image, such as corner points, edges, and some blocks in the image.
  • This exemplary embodiment can use the ORB algorithm (Oriented FAST and Rotated BRIEF) to extract feature points.
  • FAST: Features from Accelerated Segment Test
  • BRIEF: Binary Robust Independent Elementary Features
  • Algorithms such as FAST, SIFT (Scale-Invariant Feature Transform), and SURF (Speeded Up Robust Features) can also be used to extract feature points; alternatively, target detection can be performed on the current frame image, and certain feature points can be extracted on the detected edge contours of objects, and so on.
  • the extracted feature points are pixels in the current frame image, which have depth information.
  • In this exemplary embodiment, considering the capability limitations of depth detection components such as depth sensors and binocular cameras, it is impossible to accurately detect the depth information of objects that are too close to or too far from the depth sensor, and the handling of black or highly reflective materials and of scenes with large lighting changes is poor, so the depth information of the current frame image may contain invalid pixel depth values. Therefore, based on whether the depth information is valid, the feature points can be divided into three-dimensional feature points and two-dimensional feature points.
  • Feature points with invalid depth information are two-dimensional feature points; because their depth information is invalid, only their two-dimensional coordinates (that is, plane coordinates) in the current frame image are retained.
  • Feature points with valid depth information are three-dimensional feature points; in addition to the plane coordinates, they also have a third coordinate in the depth direction, whose value is usually the depth value. A classification sketch is given below.
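  • To make the classification of step S220 concrete, the following is a minimal sketch (not the patent's reference implementation) that extracts ORB feature points with OpenCV and splits them into two-dimensional and three-dimensional feature points according to depth validity; the simple range check used here is an illustrative placeholder for the detection methods discussed below.

```python
# A minimal sketch (not the patent's reference implementation) of step S220:
# extract ORB feature points with OpenCV and split them into two-dimensional and
# three-dimensional feature points according to depth validity. The depth map is
# assumed to be aligned with the image.
import cv2
import numpy as np

def classify_feature_points(gray, depth, valid_range=(0.5, 3.0)):
    """gray: HxW uint8 image; depth: HxW float32 depth map in metres.
    Returns (two_d_points, three_d_points). The (0.5, 3.0) range is an assumed
    TOF working range, not a value taken from the disclosure."""
    orb = cv2.ORB_create(nfeatures=1000)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    two_d_points, three_d_points = [], []
    if descriptors is None:
        return two_d_points, three_d_points
    for kp, desc in zip(keypoints, descriptors):
        u, v = int(round(kp.pt[0])), int(round(kp.pt[1]))
        d = float(depth[v, u])
        if np.isfinite(d) and valid_range[0] <= d <= valid_range[1]:
            # valid depth: keep the plane coordinates plus the depth value
            three_d_points.append((kp.pt[0], kp.pt[1], d, desc))
        else:
            # invalid depth: keep only the plane coordinates
            two_d_points.append((kp.pt[0], kp.pt[1], desc))
    return two_d_points, three_d_points
```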
  • Regarding how to detect whether depth information is valid, the detection methods and standards used can depend on the actual situation, which is not limited by the present disclosure; several specific detection method examples are provided below.
  • When the depth sensor cannot accurately detect the depth information of an object, it can output the depth value of the corresponding part as an invalid value or an abnormal value.
  • For example, the depth detection range of a TOF sensor is usually 0.5 to 3 meters. If the distance from an object to the TOF sensor is outside this range, the time of flight (the time difference between the transmitted and received signals) sensed by the TOF sensor exceeds the upper or lower limit, and the depth value of the object can be recorded as an invalid value or as the upper or lower limit.
  • In such cases the depth value is not credible and is invalid information; conversely, if the depth value is a normal value within the detection range, it is valid information.
  • Alternatively, all feature points of each object in the current frame image can be detected uniformly, with the object as the unit: the depth value span of the object (that is, the maximum depth value minus the minimum depth value) is detected, and if it is within the normal range, the depth information of all feature points of the object is valid.
  • For example, a chair is detected from the current frame image, and 10 feature points (including corner points, edge points, etc.) are extracted from its contour; the minimum depth value of the 10 feature points is subtracted from the maximum depth value to obtain the depth value span of the chair, which is regarded as the thickness of the chair in the depth direction. By setting the thickness range of each object in advance, for example 0.5 to 2 meters for a chair, it is judged whether the above depth value span is within this range; if so, the depth information of the 10 feature points is all valid, otherwise it is all invalid. A sketch of such a validity check is given below.
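  • The two detection examples above can be summarized in a small sketch. The sensor range and the per-class thickness table below are illustrative assumptions, not values prescribed by the disclosure, and invalid readings are assumed to arrive as non-finite or out-of-range depth values.

```python
# A minimal sketch of the two validity checks described above.
import numpy as np

SENSOR_RANGE_M = (0.5, 3.0)                  # assumed TOF detection range
OBJECT_THICKNESS_M = {"chair": (0.5, 2.0)}   # assumed per-class thickness range

def depth_is_valid(d):
    """Per-pixel check: the depth value must be a normal value inside the range."""
    return np.isfinite(d) and SENSOR_RANGE_M[0] < d < SENSOR_RANGE_M[1]

def object_depths_are_valid(depths, label):
    """Per-object check: the depth-value span of all feature points on one
    detected object must fall within the preset thickness range for its class."""
    depths = np.asarray(depths, dtype=float)
    if len(depths) == 0 or not np.all(np.isfinite(depths)):
        return False
    span = depths.max() - depths.min()       # thickness along the depth direction
    lo, hi = OBJECT_THICKNESS_M.get(label, (0.0, np.inf))
    return lo <= span <= hi
```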
  • Step S230: The two-dimensional feature points and the three-dimensional feature points are respectively matched with local map points of the scene to construct a first error function.
  • the local map point refers to the map point of the scene detected before the current frame within a local range centered on the shooting area of the current frame image, and the map point refers to the point added in the map model of the scene.
  • In SLAM, a certain number of key frames are usually selected from the continuous frame images; these are representative frames selected to reduce information redundancy in the modeling process. Usually, one key frame can be selected every certain number of frames, or a key frame can be extracted when the image content changes greatly.
  • the local map point may be a map point that appeared in the previous key frame and the common view key frame of the previous key frame.
  • The previous key frame is the key frame closest to the current frame; common view means that the contents of two images are similar or share a common field of view (Field of View, FOV), indicating that the two images were taken of nearby or overlapping areas of the scene.
  • FOV: Field of View
  • This exemplary embodiment can detect whether the feature points of another key frame and those of the previous key frame are the same points; if the number of identical feature points exceeds a certain ratio, that key frame is a common view key frame of the previous key frame.
  • The matching method can include several exemplary approaches: performing feature description of the feature points and the local map points, for example through ORB, BRIEF and other algorithms, and judging whether a feature point and a local map point match according to descriptor similarity; down-sampling the local map points so that their number equals the number of feature points in the current frame image, and then matching the point cloud of feature points against the point cloud of local map points with the ICP (Iterative Closest Point) algorithm; or, based on target detection, matching the feature points extracted from the current frame image against the object models among the local map points on a per-object basis, where all feature points in a successfully matched object are matched with the local map points in the corresponding object model. A descriptor-matching sketch is given below.
  • ICP: Iterative Closest Point
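  • As an illustration of the first matching strategy (descriptor similarity), the following sketch matches ORB descriptors of current-frame feature points against descriptors assumed to be stored with the local map points; this storage detail and the ratio test are assumptions made for the example, not details stated in the disclosure.

```python
# A minimal sketch of descriptor-similarity matching between current-frame
# feature points and local map points.
import cv2

def match_features_to_map(frame_desc, map_desc, ratio=0.75):
    """frame_desc, map_desc: uint8 arrays of shape (N, 32) and (M, 32) of ORB
    descriptors. Returns (frame_idx, map_idx) pairs of candidate matches."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = []
    for candidates in matcher.knnMatch(frame_desc, map_desc, k=2):
        if len(candidates) < 2:
            continue
        best, second = candidates
        if best.distance < ratio * second.distance:  # ratio test rejects ambiguous matches
            pairs.append((best.queryIdx, best.trainIdx))
    return pairs
```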
  • the first error function can be constructed based on the matched feature points and the local map point pairs:
  • Loss1 is the first error function
  • Loss2D is the two-dimensional error term
  • Loss3D is the three-dimensional error term
  • P_2D represents the set of two-dimensional feature points, and e_{i,k} represents the error between any one of them and the corresponding local map point
  • P_3D represents the set of three-dimensional feature points, and e_{j,k} represents the error between any one of them and the corresponding local map point
  • k represents the current frame.
  • A robust kernel function ρ() can be added to formula (1) to reduce the influence of mismatches on the final result, as shown below:
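  • Since the original formula images are not reproduced here, a plausible LaTeX reconstruction of formulas (1) and (2) from the symbol definitions above is:

```latex
\mathrm{Loss}_1 = \mathrm{Loss}_{2D} + \mathrm{Loss}_{3D}
              = \sum_{i \in P_{2D}} \lVert e_{i,k} \rVert^{2}
              + \sum_{j \in P_{3D}} \lVert e_{j,k} \rVert^{2} \tag{1}

\mathrm{Loss}_1 = \sum_{i \in P_{2D}} \rho\!\left(\lVert e_{i,k} \rVert^{2}\right)
              + \sum_{j \in P_{3D}} \rho\!\left(\lVert e_{j,k} \rVert^{2}\right) \tag{2}
```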
  • the two-dimensional error term may be the reprojection error between the successfully matched two-dimensional feature point and the local map point
  • The three-dimensional error term may be the ICP (Iterative Closest Point, also rendered as iterated nearest neighbor) error between the successfully matched three-dimensional feature point and the local map point.
  • Taking the two-dimensional error term as an example, the reprojection error has the following relationship (see the reconstruction after the symbol descriptions below):
  • In the formula, w represents the world coordinate system; the pose parameter of the camera in the current frame represents the rotation and translation parameters used to convert the actual scene from the world coordinate system to the plane of the current frame image; the plane coordinates of the two-dimensional feature point i in the current frame image and the world coordinates of the local map point corresponding to the two-dimensional feature point i enter the error term; and π() represents the projection of the three-dimensional local map point to the image plane (here, the plane of the current frame image). Therefore, formula (3) represents the plane-coordinate error between the local map point, after being reprojected onto the plane of the current frame image, and the corresponding two-dimensional feature point.
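  • A plausible reconstruction of the reprojection error of formula (3), writing the pose of the current frame k as T_{k,w} (world to camera), the plane coordinates of the two-dimensional feature point i as x_i, and the world coordinates of its matched local map point as X^w_i (this notation is assumed for presentation, as the original symbols are rendered as images), is:

```latex
e_{i,k} = x_i - \pi\!\left(T_{k,w}\, X^{w}_{i}\right) \tag{3}
```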
  • an information matrix can also be added to the above formula (1) or (2) to measure the observation uncertainty of feature points, as shown below:
  • The information matrix of the three-dimensional feature point j indicated in the formula is related to the noise performance of the camera itself.
  • it is equivalent to performing a weighted calculation on the feature points at different positions, which can improve the accuracy of the first error function.
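  • A plausible reconstruction of the information-matrix-weighted form, with Λ_i and Λ_j denoting the information matrices of feature points i and j, is:

```latex
\mathrm{Loss}_1 = \sum_{i \in P_{2D}} \rho\!\left(e_{i,k}^{\top} \Lambda_{i}\, e_{i,k}\right)
              + \sum_{j \in P_{3D}} \rho\!\left(e_{j,k}^{\top} \Lambda_{j}\, e_{j,k}\right)
```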
  • Step S240 Determine the pose parameter of the camera in the current frame by calculating the minimum value of the first error function.
  • the error function can also be called an optimization function, a constraint function, etc., which are used to optimize the corresponding variable parameters.
  • the first error function is used to optimize the pose parameters of the camera in the current frame. Taking formula (5) as an example, the relationship is as follows:
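  • In the notation assumed above, the optimization relationship can plausibly be written as:

```latex
T_{k,w}^{*} = \arg\min_{T_{k,w}} \mathrm{Loss}_1
```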
  • The condition for iterative convergence may be that the iteration reaches a certain number of rounds, or that the decrease in the value of the first error function over two consecutive rounds of iteration is lower than a predetermined value, and so on. A solver sketch is given below.
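  • The following is a minimal sketch of such an iterative minimization, using an assumed data layout and a generic SciPy least-squares solver rather than the patent's optimizer: the 6-DoF pose is parameterized as a rotation vector plus a translation, the two-dimensional error term is a reprojection residual, and the three-dimensional error term is a point-to-point residual, combined under a robust Huber loss.

```python
# A minimal sketch of minimizing the first error function over the camera pose.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(pose, K, pts2d, map2d_world, pts3d_cam, map3d_world):
    """pose = [rx, ry, rz, tx, ty, tz] mapping world coordinates into the camera frame."""
    R = Rotation.from_rotvec(pose[:3]).as_matrix()
    t = pose[3:]
    res = []
    for x, Xw in zip(pts2d, map2d_world):          # two-dimensional error term
        ph = K @ (R @ Xw + t)                      # project the matched local map point
        res.extend(ph[:2] / ph[2] - x)             # reprojection residual
    for p, Xw in zip(pts3d_cam, map3d_world):      # three-dimensional error term
        res.extend((R @ Xw + t) - p)               # point-to-point (ICP-style) residual
    return np.asarray(res)

def estimate_pose(K, pts2d, map2d_world, pts3d_cam, map3d_world, init=None):
    x0 = np.zeros(6) if init is None else init
    sol = least_squares(residuals, x0, loss="huber",
                        args=(K, pts2d, map2d_world, pts3d_cam, map3d_world))
    return sol.x   # optimized pose parameters of the camera in the current frame
```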
  • the feature points extracted in the image are divided into two-dimensional feature points and three-dimensional feature points, and the two types of feature points are respectively matched with local map points.
  • the pose parameters of the camera in the current frame are obtained.
  • introducing depth information to establish an error function as a basis for determining pose parameters can improve the accuracy of pose parameters in SLAM and increase tracking accuracy.
  • the feature points are classified, and error terms are constructed respectively.
  • Compared with optimizing all feature points through a single error term, classifying the feature points and constructing separate error terms gives this exemplary embodiment higher flexibility and pertinence, and can reduce the influence of invalid or wrong depth information on the results; in addition, it can also improve the long-term stability and robustness of the SLAM method.
  • the first error function may also include an inertial measurement error term, which is the error between the IMU and the visual signal unit.
  • the visual signal unit refers to a unit for positioning and modeling through visual signals (mainly images), which mainly includes a camera, and may also include a depth sensor, a computer, etc., which are matched with the camera.
  • the first error function can be as follows:
  • the first error function can also be:
  • e_{IMU,k} is the inertial measurement error term, which represents the error between the IMU and the visual signal unit at the current frame; the accompanying symbol represents the information matrix of the IMU.
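  • A plausible reconstruction of the first error function with the inertial measurement error term added (corresponding to the formulas referenced above) is:

```latex
\mathrm{Loss}_1 = \mathrm{Loss}_{2D} + \mathrm{Loss}_{3D}
              + e_{\mathrm{IMU},k}^{\top}\, \Lambda_{\mathrm{IMU},k}\, e_{\mathrm{IMU},k}
```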
  • the inertial measurement error term is set in the first error function, and the IMU signal can be used as a basis parameter for pose optimization to further improve the accuracy of the pose parameters.
  • In an optional implementation, the alignment of the IMU and the visual signal unit may include the following steps: obtaining the gyroscope bias of the IMU by minimizing the error between the rotation parameter in the IMU pre-integration and the rotation parameter measured by the visual signal unit; obtaining the gravitational acceleration of the IMU by minimizing the error between the position parameter in the IMU pre-integration and the position parameter measured by the visual signal unit; and aligning the IMU and the visual signal unit based on the gyroscope bias and the gravitational acceleration of the IMU.
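  • A standard visual-inertial initialization formulation consistent with these steps (shown as a plausible sketch rather than the patent's exact formulas) estimates the gyroscope bias b_g from rotation consistency and the gravity g (together with the velocities v_i) from position consistency between the IMU pre-integration terms ΔR_{i,i+1}, Δp_{i,i+1} and the camera poses (R_i, p_i):

```latex
\min_{b_g} \sum_{i} \left\lVert \operatorname{Log}\!\left( \Delta R_{i,i+1}(b_g)^{\top}\, R_i^{\top} R_{i+1} \right) \right\rVert^{2}

\min_{g,\,v} \sum_{i} \left\lVert \Delta p_{i,i+1}(b_g)
  - R_i^{\top}\!\left( p_{i+1} - p_i - v_i \Delta t_i - \tfrac{1}{2}\, g\, \Delta t_i^{2} \right) \right\rVert^{2}
```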
  • The above steps can be performed in the initialization phase of SLAM, that is, the IMU and the visual signal unit are aligned during initialization; during subsequent tracking, the above steps can also be performed to continuously optimize and adjust the alignment state of the IMU and the visual signal unit, so as to further improve tracking accuracy.
  • SLAM may also include a key frame processing thread (or called a map modeling thread, a map reconstruction thread, etc.).
  • The key frame processing thread may determine the current frame as a new key frame and update the map model of the scene according to the new key frame and its pose parameters. Through the pose parameters, the new key frame can be converted into the world coordinate system and matched with the existing scene map model, the positions of map points in the map model can be optimized and corrected, new map points can be added, abnormal map points can be deleted, and so on.
  • not every frame is processed as a key frame.
  • Specifically, the following steps can be performed: when it is determined that the current frame meets a first preset condition, the current frame is determined as a new key frame and the map model of the scene is updated; when it is determined that the current frame does not meet the first preset condition, the current frame is determined as a normal frame, and the next frame is processed.
  • the first preset condition may include:
  • The current frame is more than a preset number of frames away from the previous key frame; the preset number of frames can be set according to experience or actual application requirements. For example, if the gap exceeds 15 frames, the current frame is a new key frame.
  • the disparity between the current frame and the previous key frame exceeds the preset value.
  • the disparity is the opposite concept of the common view, which represents the degree of difference between the areas captured by the two frames. The greater the difference, the lower the common view, and the greater the parallax;
  • The preset value can be set according to experience or actual application requirements; for example, it can be set to 15%, so that when the disparity between the current frame and the previous key frame exceeds 15%, the current frame is a new key frame.
  • The number of feature points successfully tracked in the current frame is less than a preset number; in this case, the current frame is a new key frame. The number of successfully tracked feature points can be determined by counting the two-dimensional feature points whose error with the local map points is less than a first threshold and the three-dimensional feature points whose error with the local map points is less than a second threshold, and summing the two counts.
  • the first threshold, the second threshold, and the preset number can be set according to experience or actual application requirements.
  • the above three conditions can also be combined arbitrarily.
  • For example, when the current frame is more than the preset number of frames away from the previous key frame and the disparity between the current frame and the previous key frame exceeds the preset value, the current frame is determined as the new key frame; this is not limited in the present disclosure. A sketch of such a decision is given below.
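  • The sketch below combines the three criteria into one keyframe decision; the concrete thresholds (15 frames, 15% disparity, 50 tracked points) are illustrative assumptions, not values mandated by the disclosure.

```python
# A minimal sketch of the first preset condition for selecting a new key frame.
def is_new_keyframe(frame_id, last_kf_id, disparity, n_tracked,
                    max_gap=15, max_disparity=0.15, min_tracked=50):
    if frame_id - last_kf_id > max_gap:    # too many frames since the previous key frame
        return True
    if disparity > max_disparity:          # view differs too much from the previous key frame
        return True
    if n_tracked < min_tracked:            # too few successfully tracked feature points
        return True
    return False
```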
  • the existing map points can be updated, and the pose parameters of the key frames can be further optimized.
  • This process is shown in Figure 3, and specifically includes the following steps S310 to S340:
  • Step S310 Obtain a new key frame and other key frames associated with the new key frame to form a key frame set.
  • The other key frames associated with the new key frame may be: the M key frames closest to the new key frame, and the N common view key frames of the new key frame, where M and N are preset positive integers that can be set according to experience or actual application requirements. There may of course be repeated frames between the M key frames and the N common view key frames; the union of the two parts is taken to obtain the key frame set, denoted F_key. Alternatively, key frames that have other association relationships with the new key frame can also be formed into the key frame set F_key.
  • Step S320 Obtain all the map points that have appeared in the key frame set to form a map point set.
  • Step S330: A second error function is constructed based on the key frame set, the pose parameters of each key frame, and the map point set.
  • the second error function includes a reprojection error term, which is the sum of the reprojection error from any map point in the map point set to any key frame in the key frame set, which can be expressed as follows:
  • e_{o,p} represents the reprojection error from any map point p in P_map to any key frame o in F_key.
  • ⁇ () can also be added to the second error function, then:
  • An inter-frame inertial measurement error term can also be set, which is the sum of the IMU errors between any two adjacent key frames i and i+1 in the key frame set, as follows:
  • the IMU information matrix between key frames i and i+1 is also added, which can further optimize the second error function.
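  • A plausible reconstruction of the second error function from these descriptions (the original formula images are not reproduced), with e_{o,p} the reprojection error of map point p in key frame o and e_{IMU,i,i+1} the IMU error between adjacent key frames i and i+1, is:

```latex
\mathrm{Loss}_2 = \sum_{p \in P_{map}} \sum_{o \in F_{key}} \rho\!\left(\lVert e_{o,p} \rVert^{2}\right)
              + \sum_{i \in F_{key}} e_{\mathrm{IMU},i,i+1}^{\top}\, \Lambda_{\mathrm{IMU},i,i+1}\, e_{\mathrm{IMU},i,i+1}
```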
  • Step S340: By calculating the minimum value of the second error function, the pose parameters of each key frame in the key frame set and the coordinates of each map point in the map point set are optimized to update the map model.
  • the optimization solution can be as follows:
  • X_p is the world coordinate of any map point p in P_map, and the pose parameter of any key frame q in F_key (written T_q in the reconstruction below) is the other optimization variable.
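  • The joint optimization of step S340 can then plausibly be written as:

```latex
\{X_p^{*},\, T_q^{*}\} = \arg\min_{\{X_p\},\,\{T_q\}} \mathrm{Loss}_2, \qquad p \in P_{map},\ q \in F_{key}
```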
  • abnormal map points can be deleted from existing map points.
  • the map points that meet the second preset condition can be used as abnormal map points, from Deleted from the map model.
  • the second preset condition can include any of the following:
  • The preset error threshold can be set according to experience or actual application requirements. When calculating the reprojection error, all key frames in the key frame set can be used, or only the key frames in which the map point p is actually projected can be used.
  • For example, the key frame processing thread can predict the number of key frames in which p should be successfully tracked; this number is multiplied by a preset ratio less than or equal to 1, and the result is used to measure whether the tracking is abnormal. The ratio represents the allowable degree of deviation and can be set according to experience or actual application requirements, for example 0.5. The judgment relationship is as follows:
  • T1 is the first threshold
  • R is the preset ratio
  • Pre() is the prediction function.
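  • One plausible reading of the judgment relationship, with N_track(p) the number of key frames in F_key that successfully track map point p and T_err the preset error threshold on the mean reprojection error, is:

```latex
N_{\mathrm{track}}(p) < R \cdot \mathrm{Pre}(p)
\qquad \text{or} \qquad
\frac{1}{|F_{key}|} \sum_{o \in F_{key}} \lVert e_{o,p} \rVert > T_{\mathrm{err}}
```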
  • Further, new map points can be added. Specifically, if there is a feature point in the new key frame that does not match any local map point (or any map point in the map point set), that feature point can be considered to not yet exist in the map model. The feature point can then be matched with the feature points of other key frames in the key frame set; if the matching is successful, a point pair is obtained, which can be regarded as the projection of the same scene point in two different frames. Triangulating the point pair recovers its three-dimensional coordinates in the scene, giving a new map point that can be added to the map model.
  • Alternatively, the feature points of each key frame in the key frame set are matched with the map points in the map point set, and the unmatched feature points are regarded as unknown points; using the pose parameters, such a feature point can be mapped to the world coordinate system to calculate its position and added to the map model as a new map point. Even if the position of the map point deviates, it can be continuously optimized during the processing of subsequent frames. A triangulation sketch is given below.
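  • A minimal sketch of recovering a new map point by triangulating a matched feature point pair between two key frames with OpenCV is shown below; the projection matrices are assumed to be built from the key frames' optimized pose parameters and the camera intrinsics.

```python
# A minimal triangulation sketch for adding a new map point.
import cv2
import numpy as np

def triangulate_new_map_point(P1, P2, pt1, pt2):
    """P1, P2: 3x4 projection matrices K[R|t] of the two key frames.
    pt1, pt2: matched pixel coordinates (u, v) in key frames 1 and 2.
    Returns the 3D world coordinates of the new map point."""
    x1 = np.asarray(pt1, dtype=float).reshape(2, 1)
    x2 = np.asarray(pt2, dtype=float).reshape(2, 1)
    X_h = cv2.triangulatePoints(P1, P2, x1, x2)   # 4x1 homogeneous coordinates
    return (X_h[:3] / X_h[3]).ravel()
```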
  • SLAM may also include a loopback detection thread, which is used to perform loopback detection for new keyframes to optimize the map model globally.
  • In an optional implementation, the feature points in a key frame are first converted into a dictionary description through a pre-trained visual bag-of-words model; the dictionary similarity between the key frame and previous key frames is then calculated, and if the similarity reaches a certain threshold, the key frame is considered a candidate loop frame; geometric verification is then performed on the candidate loop frame, that is, the matched points should satisfy the corresponding geometric relationship, and if the geometric verification passes, it is considered a loop frame and the map model is globally optimized.
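  • The following sketch illustrates the dictionary-similarity test with a flat visual vocabulary (assumed to have been trained beforehand, for example by clustering ORB descriptors); real systems typically use a hierarchical vocabulary tree, which this simplification omits.

```python
# A minimal sketch of bag-of-words similarity for loop candidate detection.
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """descriptors: (N, 32) ORB descriptors; vocabulary: (K, 32) visual words."""
    d = descriptors.astype(np.float32)
    v = vocabulary.astype(np.float32)
    words = np.argmin(np.linalg.norm(d[:, None, :] - v[None, :, :], axis=2), axis=1)
    hist = np.bincount(words, minlength=len(v)).astype(np.float32)
    return hist / (np.linalg.norm(hist) + 1e-12)   # L2-normalized histogram

def is_loop_candidate(hist_new, hist_old, threshold=0.8):
    """Dictionary similarity test; candidates still require geometric verification."""
    return float(hist_new @ hist_old) >= threshold
```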
  • Fig. 4 shows the flow of a SLAM method of this exemplary embodiment, including three parts respectively executed by a tracking thread, a key frame processing thread, and a loopback detection thread, which are specifically as follows:
  • The tracking thread executes step S411 to collect the current frame image and its depth information; then executes step S412 to extract feature points from the current frame image; then executes step S413 to classify the feature points according to the depth information to obtain two-dimensional feature points and three-dimensional feature points; then executes step S414 to match the two-dimensional feature points and the three-dimensional feature points with the local map points respectively to construct the first error function; then executes step S415 to optimize the pose parameters of the current frame by calculating the minimum value of the first error function; finally, step S416 is executed to determine whether the current frame meets the first preset condition: if not, processing of the next frame begins, and if so, the current frame is treated as a new key frame and added to the key frame queue.
  • The flow of the tracking thread then ends.
  • The key frame processing thread executes step S421 to add the new key frame to the key frame queue; then executes step S422 to construct the second error function; then executes step S423 to optimize the poses of multiple nearby key frames (key frames close to the current frame) and the positions of the map points by calculating the minimum value of the second error function.
  • Then step S424 is executed to determine whether the IMU and the visual signal unit are aligned; if not, step S425 is performed for alignment. Before alignment, it can also be judged whether there is obvious parallax among a certain number of nearby key frames: if so, alignment can be performed, and if not, alignment is considered impossible and the alignment step is skipped. Then step S426 is executed to delete abnormal map points in the map model, and step S427 is executed to add new map points to the map model.
  • the flow of the key frame processing thread ends.
  • The loopback detection thread can perform global optimization on the basis of the local optimization of the key frame processing thread. Specifically, step S431 is first performed for loopback detection; if a loopback candidate frame is found, step S432 is performed for geometric verification; and if the geometric verification passes, step S433 is executed to optimize the map model globally.
  • The pose determination device 500 may include: an image acquisition module 510 for acquiring a current frame image of the scene through a camera and acquiring the depth information of the current frame image; a feature point extraction module 520 for extracting feature points from the current frame image, determining feature points with valid depth information as three-dimensional feature points, and determining feature points with invalid depth information as two-dimensional feature points; a function construction module 530 for matching the two-dimensional feature points and the three-dimensional feature points with local map points respectively to construct a first error function, where the first error function includes a two-dimensional error term and a three-dimensional error term, the two-dimensional error term is the error between a successfully matched two-dimensional feature point and a local map point, and the three-dimensional error term is the error between a successfully matched three-dimensional feature point and a local map point; and a pose determination module 540 for determining the pose parameters of the camera in the current frame by calculating the minimum value of the first error function.
  • the error between the successfully matched two-dimensional feature point and the local map point may be a reprojection error
  • the error between the successfully matched three-dimensional feature point and the local map point may be the iterative nearest neighbor error
  • the local map points may include: the previous key frame and the map points that appeared in the common view key frame of the previous key frame; wherein, the previous key frame is the key closest to the current frame frame.
  • the first error function may also include an inertial measurement error term, which is the error between the IMU and the visual signal unit.
  • the visual signal unit includes a camera.
  • In an optional implementation, the pose determination device 500 may further include an IMU alignment module configured to: obtain the gyroscope bias of the IMU by minimizing the error between the rotation parameter in the IMU pre-integration and the rotation parameter measured by the visual signal unit; obtain the gravitational acceleration of the IMU by minimizing the error between the position parameter in the IMU pre-integration and the position parameter measured by the visual signal unit; and align the IMU and the visual signal unit based on the gyroscope bias and gravitational acceleration of the IMU.
  • the pose determining apparatus 500 may further include: a map update module, configured to determine the current frame as a new key frame, and update the map model of the scene according to the new key frame and the pose parameters.
  • In an optional implementation, the map update module may also be used to determine the current frame as a new key frame when it is determined that the current frame meets the first preset condition, and to determine the current frame as a normal frame and process the next frame when it is determined that the current frame does not meet the first preset condition.
  • The first preset condition may include any one or a combination of the following: the current frame is more than a preset number of frames away from the previous key frame, where the previous key frame is the key frame closest to the current frame; the disparity between the current frame and the previous key frame exceeds a preset value; or the number of feature points successfully tracked in the current frame is less than a preset number, where the number of successfully tracked feature points is determined by counting the two-dimensional feature points whose error with the local map points is less than a first threshold and the three-dimensional feature points whose error with the local map points is less than a second threshold, and taking the sum of the two counts.
  • In an optional implementation, the map update module may include: a key frame acquisition unit for acquiring the new key frame and other key frames associated with the new key frame to form a key frame set; a map point acquisition unit for acquiring all map points that have appeared in the key frame set to form a map point set; a second function construction unit for constructing a second error function based on the key frame set, the pose parameters of each key frame, and the map point set, where the second error function includes a reprojection error term that is the sum of the reprojection errors from the map points in the map point set to the key frames in the key frame set; and an optimization processing unit for optimizing the pose parameters of each key frame in the key frame set and the coordinates of each map point in the map point set by calculating the minimum value of the second error function, so as to update the map model.
  • the second error function may also include an inter-frame inertial measurement error term, which is the sum of errors between any two adjacent key frames of the IMU in the key frame set.
  • other key frames associated with the new key frame may include: M key frames closest to the new key frame, and N common view key frames of the new key frame; wherein, M and N are preset positive integers.
  • In an optional implementation, the map update module may further include a map point deletion unit configured to delete from the map model the map points in the map point set that meet a second preset condition; the second preset condition may include: the number of key frames in the key frame set in which the map point is successfully tracked is less than the predicted number multiplied by a preset ratio, where the preset ratio is less than or equal to 1; or the mean reprojection error of the map point over the key frames in the key frame set is greater than a preset error threshold.
  • In an optional implementation, the map update module may further include a map point adding unit configured to, if there is a feature point in the new key frame that does not match any local map point, match that feature point with the feature points of other key frames in the key frame set, and perform triangulation calculation according to the matching result to obtain a new map point to add to the map model.
  • the pose determination device 500 may further include: a loop detection module, configured to perform loop detection for the new key frame, so as to globally optimize the map model.
  • Exemplary embodiments of the present disclosure also provide a computer-readable storage medium on which is stored a program product capable of implementing the above-mentioned method of this specification.
  • Various aspects of the present disclosure can also be implemented in the form of a program product that includes program code; when the program product runs on a terminal device, the program code is used to cause the terminal device to execute the steps according to various exemplary embodiments of the present disclosure described in the "Exemplary Methods" section of this specification.
  • Referring to Fig. 6, a program product 600 for implementing the above method according to an exemplary embodiment of the present disclosure is described; it may adopt a portable compact disk read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer.
  • CD-ROM compact disk read-only memory
  • the program product of the present disclosure is not limited thereto.
  • the readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, device, or device.
  • the program product can adopt any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • The readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the readable medium can be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the foregoing.
  • the program code for performing the operations of the present disclosure can be written in any combination of one or more programming languages.
  • The programming languages include object-oriented programming languages, such as Java and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computing device, partly on the user's device, executed as an independent software package, partly on the user's computing device and partly executed on the remote computing device, or entirely on the remote computing device or server Executed on.
  • The remote computing device can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computing device (for example, via the Internet using an Internet service provider).
  • LAN local area network
  • WAN wide area network
  • Exemplary embodiments of the present disclosure also provide an electronic device capable of implementing the above method.
  • the electronic device 700 according to this exemplary embodiment of the present disclosure is described below with reference to FIG. 7.
  • the electronic device 700 shown in FIG. 7 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.
  • the electronic device 700 may be in the form of a general-purpose computing device.
  • the components of the electronic device 700 may include, but are not limited to: the aforementioned at least one processing unit 710, the aforementioned at least one storage unit 720, a bus 730 connecting different system components (including the storage unit 720 and the processing unit 710), and a display unit 740.
  • the storage unit 720 stores program codes, and the program codes can be executed by the processing unit 710 so that the processing unit 710 executes the steps according to various exemplary embodiments of the present disclosure described in the above-mentioned "Exemplary Method" section of this specification.
  • the processing unit 710 may execute the method steps shown in FIG. 2, FIG. 3, or FIG. 4, etc.
  • the storage unit 720 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 721 and/or a cache storage unit 722, and may further include a read-only storage unit (ROM) 723.
  • RAM random access storage unit
  • ROM read-only storage unit
  • the storage unit 720 may also include a program/utility tool 724 having a set of (at least one) program module 725.
  • Such program modules 725 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
  • The bus 730 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, a graphics acceleration port, a processing unit, or a local bus using any of a variety of bus structures.
  • The electronic device 700 may also communicate with one or more external devices 800 (such as keyboards, pointing devices, Bluetooth devices, etc.), with one or more devices that enable a user to interact with the electronic device 700, and/or with any device (such as a router or modem) that enables the electronic device 700 to communicate with one or more other computing devices. This communication can be performed through an input/output (I/O) interface 750.
  • the electronic device 700 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) through the network adapter 760.
  • networks for example, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet
  • The network adapter 760 communicates with other modules of the electronic device 700 through the bus 730. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with the electronic device 700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
  • The exemplary embodiments described herein can be implemented by software, or by combining software with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) execute the method according to the exemplary embodiments of the present disclosure.
  • a computing device which may be a personal computer, a server, a terminal device, or a network device, etc.
  • Although several modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory.
  • the features and functions of two or more modules or units described above may be embodied in one module or unit.
  • the features and functions of a module or unit described above can be further divided into multiple modules or units to be embodied.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A depth information-based pose determination method and device, a storage medium, and an electronic apparatus. The method comprises: acquiring, by means of a camera, a current frame image associated with a scene, and acquiring depth information of the current frame image (S210); extracting feature points from the current frame image, determining a feature point having valid depth information to be a three-dimensional feature point, and determining a feature point having invalid depth information to be a two-dimensional feature point (S220); respectively performing matching on the basis of the two-dimensional feature point, the three-dimensional feature point and a local map point to construct a first error function (S230); and determining a pose parameter of the camera on the basis of the current frame by calculating the minimum value of the first error function (S240). The invention can improve tracking precision of SLAM.

Description

Depth Information-Based Pose Determination Method and Device, Medium, and Electronic Apparatus
This application claims priority to the Chinese patent application No. CN201910580095.1, titled "Depth information-based pose determination method, device, medium and electronic apparatus", filed with the Chinese Patent Office on June 28, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer vision technology, and in particular to a depth information-based pose determination method, a depth information-based pose determination device, a computer-readable storage medium, and an electronic apparatus.
Background
SLAM (Simultaneous Localization And Mapping) is a method in which a terminal device moves through a scene and collects images of the scene while simultaneously determining its own pose and modeling the scene. It is a fundamental technology in fields such as AR (Augmented Reality) and robotics.
Most existing SLAM methods align a visual signal unit (such as a camera) with an IMU (Inertial Measurement Unit) to determine the pose of the camera in real time, so as to reconstruct the scene captured by the camera into a scene model. However, this approach is limited by the accuracy and latency of the IMU: relatively accurate pose information can be obtained over a short period, while long-term use produces serious drift, making it impossible to track the camera accurately, which is detrimental to scene modeling.
It should be noted that the information disclosed in the above Background section is only intended to enhance the understanding of the background of the present disclosure, and therefore may include information that does not constitute prior art known to those of ordinary skill in the art.
Summary of the Invention
The present disclosure provides a depth information-based pose determination method, a depth information-based pose determination device, a computer-readable storage medium, and an electronic apparatus, thereby alleviating, at least to a certain extent, the problem of low tracking accuracy in existing SLAM methods.
Other characteristics and advantages of the present disclosure will become apparent from the following detailed description, or may be learned in part through practice of the present disclosure.
According to a first aspect of the present disclosure, a depth information-based pose determination method is provided, including: acquiring a current frame image of a scene through a camera, and acquiring depth information of the current frame image; extracting feature points from the current frame image, determining feature points with valid depth information as three-dimensional feature points, and determining feature points with invalid depth information as two-dimensional feature points; matching the two-dimensional feature points and the three-dimensional feature points respectively against local map points of the scene to construct a first error function, the first error function including a two-dimensional error term and a three-dimensional error term, the two-dimensional error term being the error between successfully matched two-dimensional feature points and local map points, and the three-dimensional error term being the error between successfully matched three-dimensional feature points and local map points; and determining the pose parameter of the camera at the current frame by calculating the minimum value of the first error function.
According to a second aspect of the present disclosure, a depth information-based pose determination device is provided, including: an image acquisition module, configured to acquire a current frame image of a scene through a camera and acquire depth information of the current frame image; a feature point extraction module, configured to extract feature points from the current frame image, determine feature points with valid depth information as three-dimensional feature points, and determine feature points with invalid depth information as two-dimensional feature points; a function construction module, configured to match the two-dimensional feature points and the three-dimensional feature points respectively against local map points to construct a first error function, the first error function including a two-dimensional error term and a three-dimensional error term, the two-dimensional error term being the error between successfully matched two-dimensional feature points and local map points, and the three-dimensional error term being the error between successfully matched three-dimensional feature points and local map points; and a pose determination module, configured to determine the pose parameter of the camera at the current frame by calculating the minimum value of the first error function.
According to a third aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the pose determination method of the above first aspect and its possible implementations.
According to a fourth aspect of the present disclosure, an electronic apparatus is provided, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute, via the executable instructions, the pose determination method of the above first aspect and its possible implementations.
The present disclosure has the following beneficial effects:
Based on the depth information of the current frame image, the feature points extracted from the image are divided into two-dimensional feature points and three-dimensional feature points, and the two types of feature points are separately matched against local map points to construct a two-dimensional error term and a three-dimensional error term, thereby establishing the first error function; by optimizing and solving for the minimum value of the first error function, the pose parameter of the camera at the current frame is obtained. On the one hand, introducing depth information to establish the error function as a basis for pose determination can improve the accuracy of the pose parameters in SLAM and increase the tracking precision. On the other hand, classifying the feature points according to whether their depth information is valid and constructing separate error terms offers higher flexibility and specificity than optimizing all feature points through a single error term, and can reduce the influence of invalid or erroneous depth information on the results; in addition, it can also improve the stability and robustness of the SLAM method during long-term operation.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief Description of the Drawings
The accompanying drawings herein are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure, and together with the specification serve to explain the principles of the present disclosure. Obviously, the drawings in the following description are only some embodiments of the present disclosure, and other drawings can be obtained from them by those of ordinary skill in the art without creative effort.
Fig. 1 shows a schematic architecture diagram of a SLAM system in this exemplary embodiment;
Fig. 2 shows a flowchart of a depth information-based pose determination method in this exemplary embodiment;
Fig. 3 shows a sub-flowchart of a depth information-based pose determination method in this exemplary embodiment;
Fig. 4 shows a flowchart of a SLAM method in this exemplary embodiment;
Fig. 5 shows a structural block diagram of a depth information-based pose determination device in this exemplary embodiment;
Fig. 6 shows a computer-readable storage medium for implementing the above method in this exemplary embodiment;
Fig. 7 shows an electronic apparatus for implementing the above method in this exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in various forms and should not be construed as being limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be more thorough and complete, and will fully convey the concepts of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in one or more embodiments in any suitable manner.
The exemplary embodiments of the present disclosure first provide a depth information-based pose determination method, which is mainly applied in SLAM scenes to determine the pose of a camera. Fig. 1 shows the architecture of the SLAM system in which this method is applied. As shown in Fig. 1, the SLAM system 100 may include: a scene 101, a movable camera 102, a movable depth sensor 103, and a computing device 104. The scene 101 is a real scene to be modeled, such as an indoor space, a courtyard, or a street. The camera 102 and the depth sensor 103 may be integrated, for example: the camera 102 is a plane camera and the depth sensor 103 is a TOF (Time Of Flight) sensor arranged beside it; or the camera 102 and the depth sensor 103 are two cameras forming a binocular camera; or the depth sensor 103 is an infrared light device which, together with the camera 102, forms a structured light camera. The camera 102 and the depth sensor 103 can move within the scene 101 to collect images of the scene 101 and their depth information. Fig. 1 shows the camera 102 and the depth sensor 103 mounted on a movable robot; alternatively, a user may move within the scene 101 holding a mobile phone or wearing smart glasses, with the camera 102 and the depth sensor 103 built into the phone or glasses. The computing device 104 may be a terminal computer or a server, etc., which is communicatively connected with the camera 102 and the depth sensor 103 for data interaction; the camera 102 and the depth sensor 103 send the collected images and their depth information to the computing device 104, and the computing device 104 performs processing and analysis to realize the localization and modeling in SLAM.
It should be noted that the SLAM system 100 shown in Fig. 1 is only an example, and several variations are possible. For instance, the camera 102, the depth sensor 103, and the computing device 104 may be integrated into one device, such as a robot with a built-in camera 102, depth sensor 103, and computing device 104, which can take pictures while moving within the scene 101 and process the pictures to realize localization and modeling. The number of devices is also not limited to the situation shown in Fig. 1; for example, multiple cameras may be provided (such as 3 or 4 cameras on a mobile phone), or a computing device cluster composed of multiple servers may be set up to process a large number of scene images by means of cloud computing, and so on. Devices not shown in Fig. 1 may also be added, such as an IMU (Inertial Measurement Unit) matched with the camera 102 to help determine the pose of the camera 102, or a projection device that generates a virtual projection within the scene 101 to interact with the camera 102, the user, or the robot.
At the start of SLAM, the scene is modeled from scratch, and no scene images have been collected yet. After the SLAM process starts, the camera follows the user or the robot as it moves within the scene, collecting scene images while moving and forming a continuous stream of image frames that is sent to the computing device in real time. After obtaining a certain number of frames, the computing device can initialize the map model of the scene, which usually covers only a small part of the scene or differs from the actual scene. Thereafter, for each frame of image collected by the camera, the computing device can update and optimize the map model based on that image (the map model may, of course, also be updated and optimized only when key frames are selected), adding map points that are not yet in the map model, correcting the positions of existing map points, and so on. When updating and optimizing the map model, the pose parameters of the camera need to be determined first, which is a necessary step in SLAM: only after the pose parameters of the camera are determined can the images collected by the camera be matched into the three-dimensional map model so as to update and optimize it.
This exemplary embodiment proposes an improved method for determining the pose parameters of the camera at each frame. Based on the SLAM system 100 of Fig. 1, the execution subject of this exemplary embodiment may be the computing device 104 therein. Fig. 2 shows a flow of this exemplary embodiment, which may include the following steps S210 to S240:
Step S210: acquire a current frame image of the scene through the camera, and acquire depth information of the current frame image.
Each time the camera collects a frame of image, the computing device analyzes that frame; the current frame image is the latest frame collected by the camera. The depth information is collected at the same time as the current frame image. For example, when a plane camera plus a depth sensor is used to capture the scene image, the depth information of each pixel in the image, usually a depth value, can be obtained; when a binocular camera is used to capture the scene image, the depth information of each pixel can be obtained through a triangulation algorithm; when a structured light camera is used to capture the scene image, an infrared dot matrix can project infrared light signals onto the scene, and after the reflected signals are received, the depth information is calculated from the changes of the infrared light, and so on.
Step S220: extract feature points from the current frame image, determine feature points with valid depth information as three-dimensional feature points, and determine feature points with invalid depth information as two-dimensional feature points.
Feature points are representative, highly recognizable points or regions in the image, such as corners, edges, and certain blocks. This exemplary embodiment may use the ORB algorithm (Oriented FAST and Rotated BRIEF, i.e., oriented FAST (Features from Accelerated Segment Test) combined with rotated BRIEF (Binary Robust Independent Elementary Features)) to extract and describe feature points; algorithms such as FAST, SIFT (Scale-Invariant Feature Transform), or SURF (Speeded Up Robust Features) may also be used to extract feature points; alternatively, object detection may be performed on the current frame image and a certain number of feature points may be extracted on the detected object contours, and so on.
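For illustration only, the following minimal Python sketch shows one way the ORB-based feature extraction mentioned above could be performed with OpenCV; the number of features and the use of a grayscale input are assumptions chosen for the example rather than requirements of the method.

```python
import cv2

def extract_orb_features(gray_image, n_features=1000):
    """Detect ORB key points and compute their binary descriptors.

    gray_image: single-channel uint8 image of the current frame.
    Returns (keypoints, descriptors); descriptors is an Nx32 uint8 array.
    """
    orb = cv2.ORB_create(nfeatures=n_features)
    keypoints, descriptors = orb.detectAndCompute(gray_image, None)
    return keypoints, descriptors
```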
The extracted feature points are pixels in the current frame image and carry depth information. In this exemplary embodiment, considering the capability limits of depth detection components such as depth sensors and binocular cameras, the depth of objects that are too close to or too far from the depth sensor cannot be detected accurately, and objects made of black or highly reflective materials, or scenes with large illumination changes, are handled poorly; thus the depth information of the current frame image may contain invalid pixel depth values. Therefore, based on whether the depth information is valid, the feature points can be divided into three-dimensional feature points and two-dimensional feature points: a feature point with invalid depth information is a two-dimensional feature point, for which only its two-dimensional coordinates (i.e., plane coordinates) in the current frame image are retained since its depth information is invalid; a feature point with valid depth information is a three-dimensional feature point, which, in addition to its two-dimensional coordinates in the current frame image, also has a third coordinate in the depth direction, whose value is usually the depth value.
When detecting whether the depth information is valid, the main question is whether the depth information of each feature point accurately reflects the actual situation of the photographed object. Based on this principle, the detection methods and criteria used may vary for images of different types and different scenes, which is not limited by the present disclosure; several specific examples of detection methods are provided below.
(1) When the depth sensor cannot accurately detect the depth information of an object, it may output the depth value of the corresponding part as an invalid or abnormal value. For example, the depth detection range of a TOF sensor is usually 0.5 to 3 meters; if the distance from the object to the TOF sensor is outside this range, the TOF (the time difference between the transmitted and received signals) sensed by the TOF sensor exceeds the upper or lower limit, and the depth value of the object may be recorded as an invalid value or as the upper or lower limit value, so that depth value is untrustworthy and constitutes invalid information; conversely, if the depth value is a normal value within the detection range, it is valid information.
(2) For feature points extracted based on object detection, all feature points of each object in the current frame image can be checked together, object by object, by examining the span of the object's depth values (i.e., the maximum depth value minus the minimum depth value); if the span is within a normal range, the depth information of all feature points of that object is valid. For example: a chair is detected in the current frame image, and 10 feature points (including corner points, edge points, etc.) are extracted from the contour of the chair; subtracting the minimum depth value from the maximum depth value among the 10 feature points gives the depth-value span of the chair, which is regarded as the thickness of the chair in the depth direction. By setting a thickness range for each kind of object in advance, e.g., 0.5 to 2 meters for a chair, it can be judged whether the above depth-value span falls within this range; if so, the depth information of all 10 feature points is valid, otherwise it is all invalid.
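A minimal sketch of splitting extracted feature points into two-dimensional and three-dimensional sets is given below, assuming a per-pixel depth map and the TOF-style validity check of example (1); the 0.5–3 m range, the depth-map layout, and the helper name are illustrative assumptions rather than part of the disclosed method.

```python
import numpy as np

def split_by_depth_validity(keypoints, depth_map, valid_range=(0.5, 3.0)):
    """Classify feature points as 2D (invalid depth) or 3D (valid depth).

    keypoints: list of cv2.KeyPoint from the current frame.
    depth_map: HxW array of depth values in meters (0 or NaN where unknown).
    Returns (points_2d, points_3d): lists of (u, v) and (u, v, d) tuples.
    """
    points_2d, points_3d = [], []
    lo, hi = valid_range
    for kp in keypoints:
        u, v = int(round(kp.pt[0])), int(round(kp.pt[1]))
        d = float(depth_map[v, u])
        if np.isfinite(d) and lo <= d <= hi:
            points_3d.append((kp.pt[0], kp.pt[1], d))   # depth valid: keep it as 3rd coord
        else:
            points_2d.append((kp.pt[0], kp.pt[1]))      # depth invalid: plane coords only
    return points_2d, points_3d
```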
Step S230: match the two-dimensional feature points and the three-dimensional feature points respectively against local map points of the scene to construct a first error function.
Local map points refer to map points of the scene that have been detected before the current frame within a local range centered on the region captured by the current frame image, where a map point is a point that has already been added to the map model of the scene. When SLAM collects scene images, a certain number of key frames are usually selected from the continuous frames; these are representative frames selected to reduce information redundancy in the modeling process, and typically one frame can be selected as a key frame at fixed frame intervals, or a key frame can be extracted when the image content changes substantially. In this exemplary embodiment, the local map points may be the map points that have appeared in the previous key frame and in the co-visible key frames of the previous key frame. The previous key frame is the key frame closest to the current frame; co-visibility means that the contents of two frames are highly similar, or that they share a common field of view (FOV), indicating that the regions captured by the two frames overlap to a high degree and have a co-visibility relationship, one frame being a co-visible frame of the other. This exemplary embodiment can check whether other key frames share the same feature points with the previous key frame; if the number of shared feature points exceeds a certain proportion, the other key frame is a co-visible key frame of the previous key frame. Alternatively, the degree of co-visibility between each other key frame and the previous key frame can be determined according to the number of shared feature points, and a certain number of other key frames can be selected in descending order of co-visibility as the co-visible key frames of the previous key frame. After determining the previous key frame and its co-visible key frames, the union of their map points is taken, and the resulting map points are the local map points.
After the local map points are obtained, the two-dimensional feature points and three-dimensional feature points in the current frame image are matched against the local map points respectively; if a feature point and a local map point are judged to be the same point in the scene, the match is successful. Several exemplary matching methods are possible: describing the feature points and the local map points with feature descriptors, e.g., via algorithms such as ORB or BRIEF, and determining whether a feature point and a local map point match according to the similarity of the descriptors; down-sampling the local map points to the same number as the feature points in the current frame image, and then matching the point cloud of feature points against the point cloud of local map points with the ICP (Iterative Closest Point) algorithm; or, for feature points extracted from the current frame image based on object detection, matching object by object against the object models among the local map points, and matching all feature points of a successfully matched object with the local map points of the corresponding object model.
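As an illustration of the descriptor-based matching option, the sketch below matches ORB descriptors of the current frame against descriptors stored with the local map points using a Hamming-distance brute-force matcher with a ratio test; the ratio threshold and the assumption that each local map point carries a representative descriptor are illustrative choices, not prescriptions of the method.

```python
import cv2

def match_to_local_map(frame_descriptors, map_descriptors, ratio=0.75):
    """Match current-frame feature descriptors to local-map-point descriptors.

    Both inputs are Nx32 uint8 ORB descriptor arrays. Returns a list of
    (frame_index, map_point_index) pairs that pass Lowe's ratio test.
    """
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    knn_matches = matcher.knnMatch(frame_descriptors, map_descriptors, k=2)
    pairs = []
    for m in knn_matches:
        if len(m) == 2 and m[0].distance < ratio * m[1].distance:
            pairs.append((m[0].queryIdx, m[0].trainIdx))
    return pairs
```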
After the matching, the first error function can be constructed based on the matched pairs of feature points and local map points:
$\mathrm{Loss1}=\mathrm{Loss2D}+\mathrm{Loss3D}=\sum_{i\in P_{2D}} e_{i,k}+\sum_{j\in P_{3D}} e_{j,k}$  (1)
where Loss1 is the first error function, Loss2D is the two-dimensional error term, and Loss3D is the three-dimensional error term; $P_{2D}$ denotes the set of two-dimensional feature points and $e_{i,k}$ denotes the error between any such point $i$ and its corresponding local map point; $P_{3D}$ denotes the set of three-dimensional feature points and $e_{j,k}$ denotes the error between any such point $j$ and its corresponding local map point; $k$ denotes the current frame.
In an optional implementation, a robust kernel function ρ(·) can be added to formula (1) to reduce the influence of mismatches on the final result, as follows:
$\mathrm{Loss1}=\sum_{i\in P_{2D}} \rho\left(e_{i,k}\right)+\sum_{j\in P_{3D}} \rho\left(e_{j,k}\right)$  (2)
In an optional implementation, the two-dimensional error term may be the reprojection error between successfully matched two-dimensional feature points and local map points, and the three-dimensional error term may be the ICP (iterative closest point) error between successfully matched three-dimensional feature points and local map points, i.e., the following relations hold:
$e_{i,k}=x_{i,k}-\pi\!\left(T_{k}^{w} X_{i}^{w}\right)$  (3)
$e_{j,k}=X_{j,k}-T_{k}^{w} X_{j}^{w}$  (4)
where $w$ denotes the world coordinate system; $T_{k}^{w}$ is the pose parameter of the camera at the current frame $k$, representing the rotation and translation used to transform the actual scene from the world coordinate system onto the plane of the current frame image; $x_{i,k}$ is the plane coordinate of two-dimensional feature point $i$ in the current frame image; $X_{i}^{w}$ is the world coordinate of the local map point corresponding to two-dimensional feature point $i$; and $\pi(\cdot)$ denotes the projection of a three-dimensional local map point onto the image plane (here, the plane of the current frame image). Therefore, formula (3) represents the plane-coordinate error between a local map point reprojected onto the plane of the current frame image and the corresponding two-dimensional feature point. Similarly, in formula (4), $X_{j,k}$ is the three-dimensional coordinate of three-dimensional feature point $j$ in the current frame image (including depth information, i.e., a coordinate in the three-dimensional camera coordinate system), and $X_{j}^{w}$ is the world coordinate of the local map point corresponding to three-dimensional feature point $j$, which is transformed into the camera coordinate system through $T_{k}^{w}$, and the coordinate error with respect to $X_{j,k}$ is then computed.
In an optional implementation, an information matrix can also be added to the above formula (1) or (2) to measure the observation uncertainty of the feature points, as follows:
$\mathrm{Loss1}=\sum_{i\in P_{2D}} \rho\!\left(e_{i,k}^{T}\,\Sigma_{i,k}^{-1}\,e_{i,k}\right)+\sum_{j\in P_{3D}} \rho\!\left(e_{j,k}^{T}\,\Sigma_{j,k}^{-1}\,e_{j,k}\right)$  (5)
where $\Sigma_{i,k}$ is the information matrix of two-dimensional feature point $i$, expressed in the form of a covariance matrix, and $\Sigma_{j,k}$ is the information matrix of three-dimensional feature point $j$. The information matrix is related to the noise characteristics of the camera itself; here it amounts to a weighted computation of the feature points at different positions, which can improve the accuracy of the first error function.
Step S240: determine the pose parameter of the camera at the current frame by calculating the minimum value of the first error function.
In SLAM, an error function may also be called an optimization function, a constraint function, etc., and is used to optimize and solve for the corresponding variable parameters. In this exemplary embodiment, the first error function is used to optimize the pose parameter of the camera at the current frame. Taking formula (5) as an example, the following relation holds:
$T_{k}^{w}=\arg\min_{T_{k}^{w}} \mathrm{Loss1}$  (6)
By performing nonlinear optimization on the first error function, the pose parameter $T_{k}^{w}$ is obtained after multiple iterations. The condition for the iteration to converge may be that the iteration reaches a certain number of rounds, or that the decrease of the first error function over two consecutive rounds of iteration falls below a predetermined value, etc.
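To illustrate how such a minimization could be set up, the following Python sketch represents the pose as a rotation vector plus translation and stacks the two-dimensional reprojection residuals of formula (3) and the three-dimensional point residuals of formula (4) for a robust nonlinear least-squares solver. The use of SciPy, the Huber loss, and the intrinsics handling are assumptions for the example, not the exact solver of the method.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def pose_residuals(pose, pts2d, map2d, pts3d, map3d, K):
    """Stack 2D reprojection errors and 3D point-to-point errors.

    pose:  6-vector [rx, ry, rz, tx, ty, tz] (world -> camera).
    pts2d: Nx2 pixel coords of 2D feature points; map2d: Nx3 world coords.
    pts3d: Mx3 camera-frame coords of 3D feature points (from pixel + depth);
           map3d: Mx3 world coords of their matched local map points.
    K:     3x3 camera intrinsic matrix.
    """
    R = Rotation.from_rotvec(pose[:3]).as_matrix()
    t = pose[3:]
    # 2D term: reproject map points into the image and compare pixel coords.
    cam2d = (R @ map2d.T).T + t
    proj = (K @ cam2d.T).T
    proj = proj[:, :2] / proj[:, 2:3]
    res2d = (pts2d - proj).ravel()
    # 3D term: transform map points into the camera frame and compare 3D coords.
    cam3d = (R @ map3d.T).T + t
    res3d = (pts3d - cam3d).ravel()
    return np.concatenate([res2d, res3d])


def solve_pose(initial_pose, pts2d, map2d, pts3d, map3d, K):
    """Minimize the first error function with a Huber robust kernel."""
    result = least_squares(
        pose_residuals, initial_pose, loss="huber", f_scale=1.0,
        args=(pts2d, map2d, pts3d, map3d, K))
    return result.x
```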
Based on the above, in this exemplary embodiment, according to the depth information of the current frame image, the feature points extracted from the image are divided into two-dimensional feature points and three-dimensional feature points, and the two types of feature points are separately matched against local map points to construct a two-dimensional error term and a three-dimensional error term, thereby establishing the first error function; by optimizing and solving for the minimum value of the first error function, the pose parameter of the camera at the current frame is obtained. On the one hand, introducing depth information to establish the error function as a basis for pose determination can improve the accuracy of the pose parameters in SLAM and increase the tracking precision. On the other hand, classifying the feature points according to whether their depth information is valid and constructing separate error terms offers higher flexibility and specificity than optimizing all feature points through a single error term, and can reduce the influence of invalid or erroneous depth information on the results; in addition, it can also improve the stability and robustness of the SLAM method during long-term operation.
In an optional implementation, if the IMU and the visual signal unit are aligned in advance (also referred to as alignment, registration, fusion, coupling, etc.), the first error function may further include an inertial measurement error term, which is the error between the IMU and the visual signal unit. The visual signal unit refers to the unit that performs localization and modeling through visual signals (mainly images), mainly including the camera, and possibly also a depth sensor, a computer, and the like used with the camera. The first error function may be as follows:
$\mathrm{Loss1}=\sum_{i\in P_{2D}} e_{i,k}+\sum_{j\in P_{3D}} e_{j,k}+e_{IMU,k}$  (7)
When the robust kernel function ρ(·) and the information matrices are introduced, the first error function may also be:
$\mathrm{Loss1}=\sum_{i\in P_{2D}} \rho\!\left(e_{i,k}^{T}\,\Sigma_{i,k}^{-1}\,e_{i,k}\right)+\sum_{j\in P_{3D}} \rho\!\left(e_{j,k}^{T}\,\Sigma_{j,k}^{-1}\,e_{j,k}\right)+\rho\!\left(e_{IMU,k}^{T}\,\Sigma_{IMU,k}^{-1}\,e_{IMU,k}\right)$  (8)
In formulas (7) and (8), $e_{IMU,k}$ is the inertial measurement error term, representing the error between the IMU and the visual signal unit at the current frame, and $\Sigma_{IMU,k}$ denotes the information matrix of the IMU. Setting an inertial measurement error term in the first error function allows the IMU signal to serve as an additional reference parameter for pose optimization, further improving the accuracy of the pose parameters.
In this exemplary embodiment, the alignment of the IMU and the visual signal unit may include the following steps: obtaining the gyroscope bias of the IMU by minimizing the error between the rotation parameters in the pre-integration of the IMU and the rotation parameters measured by the visual signal unit; obtaining the gravitational acceleration of the IMU by minimizing the error between the position parameters in the pre-integration of the IMU and the position parameters measured by the visual signal unit; and aligning the IMU and the visual signal unit based on the gyroscope bias and the gravitational acceleration of the IMU. The above steps can be performed in the initialization phase of SLAM, i.e., the IMU and the visual signal unit are aligned during initialization; thereafter, during the tracking process, the above steps can also be performed to continuously optimize and adjust the alignment state of the IMU and the visual signal unit, so as to further improve the tracking accuracy.
The above tracking and pose determination process is usually executed by the tracking thread in SLAM. In addition, SLAM may also include a key frame processing thread (also called a map modeling thread, map reconstruction thread, etc.). In an optional implementation, the key frame processing thread may determine the current frame as a new key frame and update the map model of the scene according to the new key frame and its pose parameters. Through the pose parameters, the new key frame can be transformed into the world coordinate system and matched against the existing scene map model, so as to optimize and correct the positions of map points in the map model, add new map points, or delete abnormal map points, etc.
In an optional implementation, not every frame is processed as a key frame. After step S240, the following steps may be performed: when it is judged that the current frame satisfies a first preset condition, the current frame is determined as a new key frame and the map model of the scene is updated; when it is judged that the current frame does not satisfy the first preset condition, the current frame is determined as an ordinary frame and processing of the next frame begins. The first preset condition may include:
The number of frames between the current frame and the previous key frame exceeds a preset number of frames; the preset number of frames can be set according to experience or actual application requirements, e.g., if more than 15 frames have passed, the current frame is a new key frame.
The disparity between the current frame and the previous key frame exceeds a preset difference; disparity is the opposite concept of co-visibility and represents the degree of difference between the regions captured by the two frames: the greater the difference, the lower the co-visibility and the larger the disparity. The preset difference can be set according to experience or actual application requirements, for example 15%; when the disparity between the current frame and the previous key frame exceeds 15%, the current frame is a new key frame.
Counting, in the current frame image, the number of two-dimensional feature points whose error with respect to local map points is smaller than a first threshold and the number of three-dimensional feature points whose error with respect to local map points is smaller than a second threshold; the sum of the two numbers is the number of feature points successfully tracked in the current frame, and if it is smaller than a preset number, the current frame is a new key frame. The first threshold, the second threshold, and the preset number can be set according to experience or actual application requirements.
It should be noted that the above three conditions can also be combined arbitrarily; for example, when the current frame is more than the preset number of frames away from the previous key frame and the disparity between the current frame and the previous key frame exceeds the preset difference, the current frame is a new key frame; the present disclosure does not limit this.
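A minimal sketch of this key frame decision is given below; the threshold values (15 frames, 15% disparity, 50 tracked points) are illustrative assumptions, and the disparity measure is left to the caller.

```python
def is_new_keyframe(frame_gap, disparity, n_tracked_2d, n_tracked_3d,
                    max_gap=15, max_disparity=0.15, min_tracked=50):
    """Return True if the current frame should become a new key frame.

    frame_gap:     frames elapsed since the previous key frame.
    disparity:     fraction describing how different the current frame is
                   from the previous key frame (higher = less co-visible).
    n_tracked_2d:  2D feature points whose error to a local map point is
                   below the first threshold.
    n_tracked_3d:  3D feature points whose error is below the second threshold.
    """
    if frame_gap > max_gap:
        return True
    if disparity > max_disparity:
        return True
    if (n_tracked_2d + n_tracked_3d) < min_tracked:
        return True
    return False
```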
After a new key frame is determined, it can be added to the key frame queue, and the key frame processing thread processes the key frames in the queue one by one to update the map model of the scene. How the map model is updated is explained below in three aspects.
In the first aspect, existing map points can be updated while the pose parameters of key frames are further optimized; referring to Fig. 3, this specifically includes the following steps S310 to S340:
Step S310: obtain the new key frame and other key frames associated with the new key frame to form a key frame set.
The other key frames associated with the new key frame may be: the M key frames closest to the new key frame, and the N co-visible key frames of the new key frame, where M and N are preset positive integers that can be set according to experience or actual application requirements; of course, the M key frames and the N co-visible key frames may overlap, and the union of the two parts is taken to obtain the key frame set, denoted F_key. Alternatively, key frames having other association relationships with the new key frame may also be formed into the key frame set F_key.
Step S320: obtain all map points that have appeared in the key frame set to form a map point set.
In other words, the union of the map points of all key frames in F_key is taken to form the map point set, denoted P_map.
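The sketch below shows one way steps S310 and S320 could be realized; it assumes each key frame object exposes the ids of the map points it observes and a co-visibility query, and these attribute names are purely illustrative assumptions.

```python
def build_local_sets(new_kf, all_keyframes, M=5, N=10):
    """Form the key frame set F_key and the map point set P_map.

    new_kf:        the new key frame.
    all_keyframes: previously accepted key frames, ordered by time.
    Assumes each key frame has .map_point_ids (set of observed map point ids)
    and .covisible(n) returning its n most co-visible key frames.
    """
    nearest = all_keyframes[-M:]                # M key frames closest to the new one
    covisible = new_kf.covisible(N)             # N co-visible key frames
    f_key = {new_kf, *nearest, *covisible}      # union removes duplicates
    p_map = set()
    for kf in f_key:
        p_map |= kf.map_point_ids               # union of observed map points
    return f_key, p_map
```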
Step S330: construct a second error function based on the key frame set, the pose parameters of each key frame in it, and the map point set.
The second error function includes a reprojection error term, which is the sum of the reprojection errors from any map point in the map point set to any key frame in the key frame set, and can be expressed as follows:
$\mathrm{Loss2}=\sum_{p\in P_{map}}\sum_{o\in F_{key}} e_{o,p}$  (9)
where $e_{o,p}$ denotes the reprojection error of any map point $p$ in $P_{map}$ onto any key frame $o$ in $F_{key}$. Further, a robust kernel function ρ(·) may also be added to the second error function:
$\mathrm{Loss2}=\sum_{p\in P_{map}}\sum_{o\in F_{key}} \rho\!\left(e_{o,p}\right)$  (10)
In an optional implementation, in order to improve the accuracy of the second error function, an inter-frame inertial measurement error term may also be set, which is the sum of the IMU errors between any two adjacent key frames i and i+1 in the key frame set, as follows:
$\mathrm{Loss2}=\sum_{p\in P_{map}}\sum_{o\in F_{key}} \rho\!\left(e_{o,p}\right)+\sum_{i} \rho\!\left(e_{IMU,i,i+1}^{T}\,\Sigma_{IMU,i,i+1}^{-1}\,e_{IMU,i,i+1}\right)$  (11)
In formula (11), the IMU information matrix between key frames i and i+1 is also added, which can further refine the second error function.
Step S340: by calculating the minimum value of the second error function, optimize the pose parameters of each key frame in the key frame set and the coordinates of each map point in the map point set, so as to update the map model.
The optimization can be expressed as follows:
$\left\{X_{p},\,T_{q}^{w}\right\}=\arg\min_{X_{p},\,T_{q}^{w}} \mathrm{Loss2}$  (12)
where $X_{p}$ is the world coordinate of any map point $p$ in $P_{map}$, and $T_{q}^{w}$ is the pose parameter of any key frame $q$ in $F_{key}$; by optimizing and solving for these two sets of parameters, the map point coordinates can be optimized and corrected, and the map model updated.
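The reprojection error $e_{o,p}$ underlying formulas (9) to (12) can be sketched as below, reusing the projection convention of the earlier pose-optimization sketch; the per-key-frame pose representation (rotation matrix plus translation) and the intrinsic matrix K are assumptions made for illustration.

```python
import numpy as np

def reprojection_error(map_point_w, kf_R, kf_t, kf_obs_uv, K):
    """Pixel-space error of one map point p on one key frame o (e_{o,p}).

    map_point_w: 3-vector, world coordinate X_p of the map point.
    kf_R, kf_t:  rotation matrix and translation of the key frame pose.
    kf_obs_uv:   2-vector, where the point was observed in that key frame.
    K:           3x3 camera intrinsic matrix.
    """
    cam = kf_R @ map_point_w + kf_t
    proj = K @ cam
    uv = proj[:2] / proj[2]
    return np.linalg.norm(kf_obs_uv - uv)

# Loss2 of formula (9) would then be the sum of these errors over every
# (key frame, map point) observation pair drawn from F_key and P_map.
```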
In the second aspect, abnormal map points among the existing map points can be deleted. Specifically, based on the key frame set and the map point set established above, map points satisfying a second preset condition are treated as abnormal map points and deleted from the map model. The second preset condition may include either of the following:
If the mean reprojection error of map point p over the key frames in the key frame set is greater than a preset error threshold, then p is an abnormal map point. The preset error threshold can be set according to experience or actual application requirements. When calculating the reprojection error, all key frames in the key frame set may be used, or only the key frames onto which p projects may be used.
If the number of key frames in the key frame set on which map point p is successfully tracked is smaller than the predicted number multiplied by a preset ratio, then p is an abnormal map point. Being successfully tracked means that the reprojection error of map point p on a key frame is smaller than a certain value, e.g., smaller than the aforementioned first threshold. Based on the position of p and the pose parameters of each key frame, the key frame processing thread can predict the number of key frames on which p should be tracked successfully; this number is multiplied by a preset ratio less than or equal to 1, and the result is used to judge whether the tracking is abnormal. The preset ratio represents the allowed degree of deviation and can be set according to experience or actual application requirements, for example 0.5. The judgment relation is as follows:
$\sum_{o\in F_{key}} \mathrm{II}\!\left(e_{o,p}<T1\right) < R\cdot \mathrm{Pre}(p)$  (13)
where II(·) is the indicator function, taking the value 1 when the condition inside the parentheses is true and 0 when it is false; T1 is the first threshold, R is the preset ratio, and Pre(·) denotes the prediction function.
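A sketch of the second preset condition follows, combining the mean-reprojection-error test and the tracked-key-frame count of formula (13); the error threshold, the first threshold T1, and the way the predicted count is supplied are assumptions provided by the caller for illustration.

```python
def is_abnormal_map_point(errors, predicted_count, err_threshold=2.0,
                          t1=1.0, ratio=0.5):
    """Decide whether a map point should be removed from the map model.

    errors:          reprojection errors e_{o,p} of the point on the key
                     frames of F_key (only frames where it projects).
    predicted_count: Pre(p), the number of key frames on which the point
                     is expected to be tracked successfully.
    """
    if not errors:
        return True
    mean_error = sum(errors) / len(errors)
    if mean_error > err_threshold:              # first condition: mean error too large
        return True
    tracked = sum(1 for e in errors if e < t1)  # II(e_{o,p} < T1)
    if tracked < ratio * predicted_count:       # formula (13)
        return True
    return False
```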
In the third aspect, new map points can be added. Specifically, if a feature point in the new key frame does not match any local map point (or any map point in the map point set), it can be considered that this feature point does not yet exist in the map model; the feature point can then be matched against the feature points of the other key frames in the key frame set. If the matching succeeds, a pair of feature points is obtained, which can be regarded as the projections of the same point in the scene onto two different frames; triangulation is performed on this point pair to recover its three-dimensional coordinates in the scene, yielding a new map point that can be added to the map model.
It should be added that the above method can actually be applied to every key frame: the feature points of each key frame in the key frame set are matched against the map points in the map point set, the unmatched feature points form an unknown point set, and the feature points in the unknown point set are then matched pairwise (without replacement, i.e., once a point pair is matched, both points are removed from the set, so one point will never be matched to two or more points); each matched point pair is triangulated to generate a new map point.
In an optional implementation, if a feature point in the new key frame does not match any local map point, since the feature point has depth information, i.e., three-dimensional image coordinates, it can be mapped into the world coordinate system according to the pose parameters of the key frame and its real position can be computed, so that it is added to the map model as a new map point. Even if the position of this map point is somewhat inaccurate, it can be continuously refined in the processing of subsequent frames.
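The depth-based alternative for adding a new map point can be sketched as follows, back-projecting a pixel with valid depth through the intrinsics and then transforming it into the world frame with the key frame pose; the camera-to-world pose convention used here is an assumption for the example.

```python
import numpy as np

def pixel_depth_to_world(u, v, depth, K, R_wc, t_wc):
    """Lift a feature point with valid depth to a world-frame map point.

    (u, v):      pixel coordinates in the key frame image.
    depth:       depth value of the pixel in meters.
    K:           3x3 camera intrinsic matrix.
    R_wc, t_wc:  camera-to-world rotation and translation of the key frame.
    """
    pixel_h = np.array([u, v, 1.0])
    point_cam = depth * (np.linalg.inv(K) @ pixel_h)   # back-project into camera frame
    point_world = R_wc @ point_cam + t_wc              # transform into world frame
    return point_world
```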
In addition to the above tracking thread and key frame processing thread, SLAM may also include a loop closure detection thread, which performs loop closure detection on new key frames so as to perform global optimization of the map model. Specifically, for a new key frame, the feature points in the key frame are converted into a dictionary description through a pre-trained visual bag-of-words model; the dictionary similarity between this key frame and previous key frames is then computed, and if the similarity reaches a certain threshold, the key frame is regarded as a candidate loop frame; geometric verification is then performed on the candidate loop frame, i.e., the matched points should satisfy the corresponding geometric relationship, and if the geometric verification passes, it is regarded as a loop frame and global optimization of the map model is performed.
Fig. 4 shows the flow of a SLAM method of this exemplary embodiment, including three parts executed respectively by the tracking thread, the key frame processing thread, and the loop closure detection thread, as follows:
The tracking thread executes step S411 to collect the current frame image and its depth information; then executes step S412 to extract feature points from the current frame image; then executes step S413 to classify the feature points according to the depth information to obtain two-dimensional feature points and three-dimensional feature points; then executes step S414 to match the two-dimensional feature points and the three-dimensional feature points respectively against local map points and construct the first error function; then executes step S415 to optimize and solve for the pose parameters of the current frame by calculating the minimum value of the first error function; and finally executes step S416 to judge whether the current frame satisfies the first preset condition: if not, processing of the next frame begins; if so, the current frame is taken as a new key frame and added to the key frame queue. The flow of the tracking thread then ends.
In response to the current frame being taken as a new key frame, the key frame processing thread executes step S421 to add the new key frame to the key frame queue; then executes step S422 to construct the second error function; then executes step S423 to optimize the poses of multiple nearby key frames (key frames close to the current frame) and the positions of map points by calculating the minimum value of the second error function, which is a local optimization of the map model; after the local optimization, step S424 is executed to judge whether the IMU and the visual signal unit are aligned, and if not, step S425 is executed to perform the alignment; before the alignment, it can also be judged whether there is obvious disparity among a certain number of nearby key frames, and if so, the alignment can be performed, otherwise the alignment is considered infeasible and the alignment step is skipped; then step S426 is executed to delete abnormal map points from the map model; then step S427 is executed to add new map points to the map model. The flow of the key frame processing thread then ends.
On the basis of the local optimization performed by the key frame processing thread, the loop closure detection thread can perform global optimization, specifically including: first executing step S431 to perform loop closure detection; if the frame is a candidate loop frame, executing step S432 to perform geometric verification; and if the geometric verification passes, executing step S433 to perform global optimization of the map model.
本公开的示例性实施方式还提供了一种基于深度信息的位姿确定装置,如图5所示,该位姿确定装置500可以包括:图像获取模块510,用于通过相机获取关于场景的当前帧图像,并获取当前帧图像的深度信息;特征点提取模块520,用于从当前帧图像提取特征点,将深度信息有效的特征点确定为三维特征点,将深度信息无效的特征点确定为二维特征点;函数构建模块530,用于将二维特征点和三维特征点分别与局部地图点进行匹配,以构建第一误差函数,第一误差函数包括二维误差项和三维误差项,二维误差项为匹配成功的二维特征点与局部地图点之间的误差,三维误差项为匹配成功的三维特征点与局部地图点之间的误差;位姿确定模块540,用于通过计算第一误差函数的最小值,确定相机在当前帧的位姿参数。Exemplary embodiments of the present disclosure also provide a pose determination device based on depth information. As shown in FIG. 5, the pose determination device 500 may include: an image acquisition module 510 for acquiring current information about the scene through a camera. Frame image and obtain the depth information of the current frame image; the feature point extraction module 520 is used to extract feature points from the current frame image, determine feature points with valid depth information as three-dimensional feature points, and determine feature points with invalid depth information as Two-dimensional feature points; the function construction module 530 is used to match two-dimensional feature points and three-dimensional feature points with local map points respectively to construct a first error function, the first error function includes a two-dimensional error term and a three-dimensional error term, The two-dimensional error term is the error between the successfully matched two-dimensional feature point and the local map point, and the three-dimensional error term is the error between the successfully matched three-dimensional feature point and the local map point; the pose determination module 540 is used to calculate The minimum value of the first error function determines the pose parameters of the camera in the current frame.
在一种可选的实施方式中,匹配成功的二维特征点与局部地图点之间的误差可以是重投影误差,匹配成功的三维特征点与局部地图点之间的误差可以是迭代最近邻误差。In an optional implementation, the error between the successfully matched two-dimensional feature point and the local map point may be a reprojection error, and the error between the successfully matched three-dimensional feature point and the local map point may be the iterative nearest neighbor error.
在一种可选的实施方式中,局部地图点可以包括:上一关键帧以及上一关键帧的共视关键帧中出现过的地图点;其中,上一关键帧为距离当前帧最近的关键帧。In an optional implementation manner, the local map points may include: the previous key frame and the map points that appeared in the common view key frame of the previous key frame; wherein, the previous key frame is the key closest to the current frame frame.
在一种可选的实施方式中,如果位姿确定装置500预先对IMU和视觉信号单元进行对准,则第一误差函数还可以包括惯性测量误差项,为IMU和视觉信号单元之间的误差,视觉信号单元包括相机。In an alternative embodiment, if the pose determination device 500 aligns the IMU and the visual signal unit in advance, the first error function may also include an inertial measurement error term, which is the error between the IMU and the visual signal unit. , The visual signal unit includes a camera.
在一种可选的实施方式中,位姿确定装置500还可以包括:IMU对准模块,用于 通过计算IMU的预积分中的旋转参数和视觉信号单元测量的旋转参数之间的误差最小值,得到IMU的陀螺仪偏置,通过计算IMU的预积分中的位置参数和视觉信号单元测量的位置参数之间的误差最小值,得到IMU的重力加速度,并基于IMU的陀螺仪偏置和重力加速度,将IMU和视觉信号单元进行对准。In an optional embodiment, the pose determination device 500 may further include: an IMU alignment module, configured to calculate the minimum error between the rotation parameter in the pre-integration of the IMU and the rotation parameter measured by the visual signal unit , Get the gyroscope bias of the IMU, calculate the minimum error between the position parameter in the pre-integration of the IMU and the position parameter measured by the visual signal unit to get the gravity acceleration of the IMU, and based on the gyroscope bias and gravity of the IMU Acceleration, align the IMU and the visual signal unit.
在一种可选的实施方式中,位姿确定装置500还可以包括:地图更新模块,用于将当前帧确定为新的关键帧,根据新的关键帧以及位姿参数更新场景的地图模型。In an optional implementation manner, the pose determining apparatus 500 may further include: a map update module, configured to determine the current frame as a new key frame, and update the map model of the scene according to the new key frame and the pose parameters.
In an optional implementation, the map update module may further be configured to determine the current frame as a new key frame when it is judged that the current frame satisfies a first preset condition, and to determine the current frame as an ordinary frame and proceed to the processing of the next frame when it is judged that the current frame does not satisfy the first preset condition. The first preset condition may include any one or a combination of the following: the current frame is more than a preset number of frames away from the previous key frame, the previous key frame being the key frame closest to the current frame; the parallax between the current frame and the previous key frame exceeds a preset difference; the number of feature points successfully tracked in the current frame is less than a preset number, where the number of successfully tracked feature points is determined as follows: count the number of two-dimensional feature points whose error with respect to the local map points is less than a first threshold and the number of three-dimensional feature points whose error with respect to the local map points is less than a second threshold; the sum of the two counts is the number of successfully tracked feature points.
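A minimal decision sketch of this condition is given below; the frame gap, parallax difference, feature count, and the first and second thresholds are illustrative placeholder values, not values specified by the present disclosure.

```python
def is_new_keyframe(frames_since_last_kf, parallax, errors_2d, errors_3d,
                    max_gap=20, max_parallax=10.0, min_tracked=50,
                    thresh_2d=2.0, thresh_3d=0.05):
    """Return True if the current frame should become a new key frame."""
    # successfully tracked features: 2D points whose error is below the first
    # threshold plus 3D points whose error is below the second threshold
    tracked = sum(e < thresh_2d for e in errors_2d) \
            + sum(e < thresh_3d for e in errors_3d)
    return (frames_since_last_kf > max_gap
            or parallax > max_parallax
            or tracked < min_tracked)
```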
In an optional implementation, the map update module may include: a key frame acquisition unit, configured to acquire the new key frame and other key frames associated with the new key frame to form a key frame set; a map point acquisition unit, configured to acquire all map points that have appeared in the key frame set to form a map point set; a second function construction unit, configured to construct a second error function based on the key frame set, the pose parameters of each key frame in the set, and the map point set, where the second error function includes a reprojection error term that is the sum of the reprojection errors from any map point in the map point set to any key frame in the key frame set; and an optimization processing unit, configured to optimize the pose parameters of each key frame in the key frame set and the coordinates of each map point in the map point set by calculating the minimum value of the second error function, so as to update the map model.
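For illustration, the reprojection error term of the second error function can be sketched as below; the observation dictionary, the intrinsic matrix K, and all names are assumptions added here rather than elements of the disclosure.

```python
import numpy as np

def second_error(K, keyframe_poses, map_points, observations):
    """Sum of squared reprojection errors of every map point in the map point
    set over every key frame in the key frame set that observes it.
    observations[(i, j)] is the observed pixel of map point j in key frame i;
    keyframe_poses[i] is the (R, t) pose of key frame i."""
    total = 0.0
    for (i, j), pixel in observations.items():
        R, t = keyframe_poses[i]
        p_cam = R @ map_points[j] + t
        uv = (K @ p_cam)[:2] / p_cam[2]
        total += np.sum((uv - pixel) ** 2)
    return total
```

Jointly minimizing this cost over the key-frame poses and the map-point coordinates, for example with a general non-linear least-squares or graph-optimization library, corresponds to the local optimization that updates the map model.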
In an optional implementation, the second error function may further include an inter-frame inertial measurement error term, which is the sum of the IMU errors between any two adjacent key frames in the key frame set.
In an optional implementation, the other key frames associated with the new key frame may include: the M key frames closest to the new key frame, and the N co-visible key frames of the new key frame, where M and N are preset positive integers.
In an optional implementation, the map update module may further include a map point deletion unit, configured to delete from the map model the map points in the map point set that satisfy a second preset condition, where the second preset condition may include: the number of key frames in the key frame set in which the map point is successfully tracked is less than a predicted number multiplied by a preset ratio, the preset ratio being less than or equal to 1; or the mean reprojection error of the map point over the key frames in the key frame set is greater than a preset error threshold.
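A hedged sketch of such a culling test, with purely illustrative placeholder values for the preset ratio and the error threshold:

```python
def should_delete_map_point(tracked_kf_count, predicted_count,
                            reproj_errors, ratio=0.5, err_thresh=3.0):
    """Second preset condition: delete a map point if it was tracked in too
    few key frames of the set, or if its mean reprojection error over the
    key frames of the set is too large."""
    mean_err = sum(reproj_errors) / len(reproj_errors) if reproj_errors else 0.0
    return tracked_kf_count < predicted_count * ratio or mean_err > err_thresh
```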
In an optional implementation, the map update module may further include a map point adding unit, configured to, when the new key frame contains a feature point that does not match any local map point, match that feature point with the feature points of other key frames in the key frame set, and perform triangulation according to the matching result to obtain a new map point to be added to the map model.
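As a sketch of the triangulation step, a standard linear (DLT) triangulation of one matched feature pair could look as follows; the 3x4 projection matrices of the two key frames and all variable names are assumptions introduced for illustration.

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one new map point from a feature match
    between the new key frame and another key frame; P1 and P2 are the
    3x4 projection matrices K[R|t] of the two key frames."""
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # homogeneous -> Euclidean map point
```

The resulting point is expressed in the same world frame as the projection matrices and can then be added to the map model as a new map point.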
In an optional implementation, the pose determination device 500 may further include a loop closure detection module, configured to perform loop closure detection on the new key frame so as to globally optimize the map model.
The specific details of the modules/units of the above device have already been described in detail in the method embodiments; for details of solutions not disclosed here, reference may be made to the method embodiments, and they are therefore not repeated.
Those skilled in the art can understand that various aspects of the present disclosure may be implemented as a system, a method, or a program product. Therefore, various aspects of the present disclosure may be embodied in the following forms: a complete hardware implementation, a complete software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software, which may be collectively referred to herein as a "circuit", "module", or "system".
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium on which a program product capable of implementing the above method of this specification is stored. In some possible implementations, various aspects of the present disclosure may also be implemented in the form of a program product that includes program code; when the program product runs on a terminal device, the program code is used to cause the terminal device to execute the steps according to the various exemplary embodiments of the present disclosure described in the "Exemplary Methods" section of this specification.
Referring to FIG. 6, a program product 600 for implementing the above method according to an exemplary embodiment of the present disclosure is described; it may take the form of a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited thereto. In this document, a readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
The program product may adopt any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
The program code contained on the readable medium may be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the above.
The program code for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In cases involving a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, via the Internet using an Internet service provider).
Exemplary embodiments of the present disclosure also provide an electronic device capable of implementing the above method. The electronic device 700 according to this exemplary embodiment of the present disclosure is described below with reference to FIG. 7. The electronic device 700 shown in FIG. 7 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 7, the electronic device 700 may take the form of a general-purpose computing device. The components of the electronic device 700 may include, but are not limited to: the above-mentioned at least one processing unit 710, the above-mentioned at least one storage unit 720, a bus 730 connecting different system components (including the storage unit 720 and the processing unit 710), and a display unit 740.
The storage unit 720 stores program code, which can be executed by the processing unit 710 so that the processing unit 710 executes the steps according to the various exemplary embodiments of the present disclosure described in the "Exemplary Methods" section of this specification. For example, the processing unit 710 may execute the method steps shown in FIG. 2, FIG. 3, or FIG. 4.
The storage unit 720 may include readable media in the form of volatile storage units, such as a random access memory (RAM) 721 and/or a cache memory 722, and may further include a read-only memory (ROM) 723.
The storage unit 720 may also include a program/utility 724 having a set of (at least one) program modules 725, which include but are not limited to: an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.
The bus 730 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 700 may also communicate with one or more external devices 800 (such as keyboards, pointing devices, Bluetooth devices, etc.), with one or more devices that enable a user to interact with the electronic device 700, and/or with any device (such as a router, modem, etc.) that enables the electronic device 700 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 750. The electronic device 700 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 760. As shown in the figure, the network adapter 760 communicates with the other modules of the electronic device 700 through the bus 730. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Through the description of the above embodiments, those skilled in the art will readily understand that the exemplary embodiments described here may be implemented by software, or by software combined with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal apparatus, a network device, etc.) to execute the method according to the exemplary embodiments of the present disclosure.
In addition, the above-mentioned drawings are merely schematic illustrations of the processing included in the method according to the exemplary embodiments of the present disclosure and are not intended to be limiting. It is easy to understand that the processing shown in the above drawings does not indicate or limit the temporal order of these processes. In addition, it is also easy to understand that these processes may be executed, for example, synchronously or asynchronously in multiple modules.
It should be noted that although several modules or units of a device for action execution are mentioned in the above detailed description, this division is not mandatory. In fact, according to the exemplary embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided into multiple modules or units.
Those skilled in the art will easily conceive of other embodiments of the present disclosure after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptive changes of the present disclosure that follow the general principles of the present disclosure and include common knowledge or customary technical means in the technical field not disclosed in the present disclosure. The specification and the embodiments are to be regarded as exemplary only, and the true scope and spirit of the present disclosure are pointed out by the claims.
It should be understood that the present disclosure is not limited to the precise structures that have been described above and shown in the drawings, and various modifications and changes can be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (14)

  1. A method for determining a pose based on depth information, characterized by comprising:
    acquiring a current frame image of a scene through a camera, and acquiring depth information of the current frame image;
    extracting feature points from the current frame image, determining feature points with valid depth information as three-dimensional feature points, and determining feature points with invalid depth information as two-dimensional feature points;
    matching the two-dimensional feature points and the three-dimensional feature points respectively with local map points of the scene to construct a first error function, the first error function comprising a two-dimensional error term and a three-dimensional error term, the two-dimensional error term being the error between successfully matched two-dimensional feature points and local map points, and the three-dimensional error term being the error between successfully matched three-dimensional feature points and local map points;
    determining pose parameters of the camera in the current frame by calculating a minimum value of the first error function.
  2. The method according to claim 1, characterized in that, if an inertial measurement unit and a visual signal unit are aligned in advance, the first error function further comprises an inertial measurement error term, which is the error between the inertial measurement unit and the visual signal unit, the visual signal unit comprising the camera.
  3. The method according to claim 2, characterized in that the aligning of the inertial measurement unit and the visual signal unit comprises:
    obtaining a gyroscope bias of the inertial measurement unit by calculating a minimum error between a rotation parameter in a pre-integration of the inertial measurement unit and a rotation parameter measured by the visual signal unit;
    obtaining a gravitational acceleration of the inertial measurement unit by calculating a minimum error between a position parameter in the pre-integration of the inertial measurement unit and a position parameter measured by the visual signal unit;
    aligning the inertial measurement unit and the visual signal unit based on the gyroscope bias and the gravitational acceleration of the inertial measurement unit.
  4. The method according to any one of claims 1-3, characterized in that the method further comprises:
    determining the current frame as a new key frame, and updating a map model of the scene according to the new key frame and the pose parameters.
  5. The method according to claim 4, characterized in that, after the pose parameters of the camera in the current frame are determined, the method further comprises:
    when it is judged that the current frame satisfies a first preset condition, determining the current frame as a new key frame;
    when it is judged that the current frame does not satisfy the first preset condition, determining the current frame as an ordinary frame, and proceeding to the processing of the next frame;
    the first preset condition comprising any one or a combination of the following:
    the current frame is more than a preset number of frames away from the previous key frame, the previous key frame being the key frame closest to the current frame;
    the parallax between the current frame and the previous key frame exceeds a preset difference;
    the number of feature points successfully tracked in the current frame is less than a preset number, the number of successfully tracked feature points being determined by the following method:
    counting the number of the two-dimensional feature points whose error with respect to the local map points is less than a first threshold and the number of the three-dimensional feature points whose error with respect to the local map points is less than a second threshold, the sum of the two numbers being the number of successfully tracked feature points.
  6. The method according to claim 4, characterized in that the updating of the map model of the scene according to the new key frame and the pose parameters comprises:
    acquiring the new key frame and other key frames associated with the new key frame to form a key frame set;
    acquiring all map points that have appeared in the key frame set to form a map point set;
    constructing a second error function based on the key frame set, the pose parameters of each key frame therein, and the map point set, the second error function comprising a reprojection error term, the reprojection error term being the sum of reprojection errors from any map point in the map point set to any key frame in the key frame set;
    optimizing the pose parameters of each key frame in the key frame set and the coordinates of each map point in the map point set by calculating a minimum value of the second error function, so as to update the map model.
  7. The method according to claim 6, characterized in that the second error function further comprises an inter-frame inertial measurement error term, which is the sum of the errors of an inertial measurement unit between any two adjacent key frames in the key frame set.
  8. The method according to claim 6, characterized in that the other key frames associated with the new key frame comprise: the M key frames closest to the new key frame, and the N co-visible key frames of the new key frame, wherein M and N are preset positive integers.
  9. The method according to claim 6, characterized in that, when the map model is updated, map points in the map point set that satisfy a second preset condition are further deleted from the map model, wherein the second preset condition comprises:
    the number of key frames in which the map point is successfully tracked in the key frame set is less than a predicted number multiplied by a preset ratio, the preset ratio being less than or equal to 1; or
    the mean value of the reprojection errors of the map point over the key frames in the key frame set is greater than a preset error threshold.
  10. The method according to claim 6, characterized in that, when the map model is updated, if a feature point that does not match the local map points exists in the new key frame, the feature point is matched with the feature points of other key frames in the key frame set, and triangulation is performed according to the matching result to obtain a new map point to be added to the map model.
  11. The method according to claim 4, characterized in that the method further comprises:
    performing loop closure detection on the new key frame so as to globally optimize the map model.
  12. A device for determining a pose based on depth information, characterized by comprising:
    an image acquisition module, configured to acquire a current frame image of a scene through a camera and to acquire depth information of the current frame image;
    a feature point extraction module, configured to extract feature points from the current frame image, determine feature points with valid depth information as three-dimensional feature points, and determine feature points with invalid depth information as two-dimensional feature points;
    a function construction module, configured to match the two-dimensional feature points and the three-dimensional feature points respectively with local map points to construct a first error function, the first error function comprising a two-dimensional error term and a three-dimensional error term, the two-dimensional error term being the error between successfully matched two-dimensional feature points and local map points, and the three-dimensional error term being the error between successfully matched three-dimensional feature points and local map points;
    a pose determination module, configured to determine pose parameters of the camera in the current frame by calculating a minimum value of the first error function.
  13. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1-11.
  14. An electronic device, characterized by comprising:
    a processor; and
    a memory for storing executable instructions of the processor;
    wherein the processor is configured to execute the method according to any one of claims 1-11 by executing the executable instructions.
PCT/CN2020/094461 2019-06-28 2020-06-04 Depth information-based pose determination method and device, medium, and electronic apparatus WO2020259248A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910580095.1A CN110349213B (en) 2019-06-28 2019-06-28 Pose determining method and device based on depth information, medium and electronic equipment
CN201910580095.1 2019-06-28

Publications (1)

Publication Number Publication Date
WO2020259248A1 true WO2020259248A1 (en) 2020-12-30

Family

ID=68177312

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/094461 WO2020259248A1 (en) 2019-06-28 2020-06-04 Depth information-based pose determination method and device, medium, and electronic apparatus

Country Status (2)

Country Link
CN (1) CN110349213B (en)
WO (1) WO2020259248A1 (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085842B (en) * 2019-06-14 2024-04-09 北京京东乾石科技有限公司 Depth value determining method and device, electronic equipment and storage medium
CN110349213B (en) * 2019-06-28 2023-12-12 Oppo广东移动通信有限公司 Pose determining method and device based on depth information, medium and electronic equipment
CN110866977B (en) * 2019-10-31 2023-06-16 Oppo广东移动通信有限公司 Augmented reality processing method, device, system, storage medium and electronic equipment
CN112923916B (en) * 2019-12-06 2022-12-06 杭州海康机器人股份有限公司 Map simplifying method and device, electronic equipment and machine-readable storage medium
CN113034538B (en) * 2019-12-25 2023-09-05 杭州海康威视数字技术股份有限公司 Pose tracking method and device of visual inertial navigation equipment and visual inertial navigation equipment
CN111105462B (en) * 2019-12-30 2024-05-28 联想(北京)有限公司 Pose determining method and device, augmented reality equipment and readable storage medium
CN113191174B (en) * 2020-01-14 2024-04-09 北京京东乾石科技有限公司 Article positioning method and device, robot and computer readable storage medium
CN111292365B (en) * 2020-01-23 2023-07-25 抖音视界有限公司 Method, apparatus, electronic device and computer readable medium for generating depth map
CN111310654B (en) * 2020-02-13 2023-09-08 北京百度网讯科技有限公司 Map element positioning method and device, electronic equipment and storage medium
CN111311588B (en) * 2020-02-28 2024-01-05 浙江商汤科技开发有限公司 Repositioning method and device, electronic equipment and storage medium
CN111462107B (en) * 2020-04-10 2020-10-30 视研智能科技(广州)有限公司 End-to-end high-precision industrial part shape modeling method
CN111652933B (en) * 2020-05-06 2023-08-04 Oppo广东移动通信有限公司 Repositioning method and device based on monocular camera, storage medium and electronic equipment
CN111784778B (en) * 2020-06-04 2022-04-12 华中科技大学 Binocular camera external parameter calibration method and system based on linear solving and nonlinear optimization
CN111623773B (en) * 2020-07-17 2022-03-04 国汽(北京)智能网联汽车研究院有限公司 Target positioning method and device based on fisheye vision and inertial measurement
CN111833403B (en) * 2020-07-27 2024-05-31 闪耀现实(无锡)科技有限公司 Method and apparatus for spatial localization
CN111951262B (en) * 2020-08-25 2024-03-12 杭州易现先进科技有限公司 VIO error correction method, device, system and electronic device
CN112348889B (en) * 2020-10-23 2024-06-07 浙江商汤科技开发有限公司 Visual positioning method, and related device and equipment
CN112686953A (en) * 2020-12-21 2021-04-20 北京三快在线科技有限公司 Visual positioning method and device based on inverse depth parameter and electronic equipment
CN113256718B (en) * 2021-05-27 2023-04-07 浙江商汤科技开发有限公司 Positioning method and device, equipment and storage medium
CN113506369A (en) * 2021-07-13 2021-10-15 阿波罗智能技术(北京)有限公司 Method, apparatus, electronic device, and medium for generating map
CN113535875A (en) * 2021-07-14 2021-10-22 北京百度网讯科技有限公司 Map data expansion method, map data expansion device, electronic apparatus, map data expansion medium, and program product
CN113689485B (en) * 2021-08-25 2022-06-07 北京三快在线科技有限公司 Method and device for determining depth information of unmanned aerial vehicle, unmanned aerial vehicle and storage medium
CN113781563B (en) * 2021-09-14 2023-10-24 中国民航大学 Mobile robot loop detection method based on deep learning
CN114331915B (en) * 2022-03-07 2022-08-05 荣耀终端有限公司 Image processing method and electronic device
CN114750147B (en) * 2022-03-10 2023-11-24 深圳甲壳虫智能有限公司 Space pose determining method and device of robot and robot
CN115919461B (en) * 2022-12-12 2023-08-08 之江实验室 SLAM-based surgical navigation method
CN116386016B (en) * 2023-05-22 2023-10-10 杭州睿影科技有限公司 Foreign matter treatment method and device, electronic equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102589571B (en) * 2012-01-18 2014-06-04 西安交通大学 Spatial three-dimensional vision-computing verification method
EP3451288A1 (en) * 2017-09-04 2019-03-06 Universität Zürich Visual-inertial odometry with an event camera
CN109658449B (en) * 2018-12-03 2020-07-10 华中科技大学 Indoor scene three-dimensional reconstruction method based on RGB-D image

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104933755A (en) * 2014-03-18 2015-09-23 华为技术有限公司 Static object reconstruction method and system
CN108369741A (en) * 2015-12-08 2018-08-03 三菱电机株式会社 Method and system for registration data
CN107869989A (en) * 2017-11-06 2018-04-03 东北大学 A kind of localization method and system of the fusion of view-based access control model inertial navigation information
CN109345588A (en) * 2018-09-20 2019-02-15 浙江工业大学 A kind of six-degree-of-freedom posture estimation method based on Tag
CN110335316A (en) * 2019-06-28 2019-10-15 Oppo广东移动通信有限公司 Method, apparatus, medium and electronic equipment are determined based on the pose of depth information
CN110349213A (en) * 2019-06-28 2019-10-18 Oppo广东移动通信有限公司 Method, apparatus, medium and electronic equipment are determined based on the pose of depth information

Cited By (57)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686961B (en) * 2020-12-31 2024-06-04 杭州海康机器人股份有限公司 Correction method and device for calibration parameters of depth camera
CN112686961A (en) * 2020-12-31 2021-04-20 杭州海康机器人技术有限公司 Method and device for correcting calibration parameters of depth camera
CN112785705A (en) * 2021-01-21 2021-05-11 中国科学技术大学 Pose acquisition method and device and mobile equipment
CN112785705B (en) * 2021-01-21 2024-02-09 中国科学技术大学 Pose acquisition method and device and mobile equipment
CN112802185A (en) * 2021-01-26 2021-05-14 合肥工业大学 Endoscope image three-dimensional reconstruction method and system facing minimally invasive surgery space perception
CN112802185B (en) * 2021-01-26 2022-08-02 合肥工业大学 Endoscope image three-dimensional reconstruction method and system facing minimally invasive surgery space perception
CN112862803A (en) * 2021-02-26 2021-05-28 中国人民解放军93114部队 Infrared imaging SLAM method and device based on edge and feature point fusion
CN112862803B (en) * 2021-02-26 2023-09-26 中国人民解放军93114部队 Infrared imaging SLAM method and device based on edge and feature point fusion
CN112927308B (en) * 2021-03-26 2023-09-26 鹏城实验室 Three-dimensional registration method, device, terminal and computer readable storage medium
CN113034596B (en) * 2021-03-26 2022-05-13 浙江大学 Three-dimensional object detection and tracking method
CN113034596A (en) * 2021-03-26 2021-06-25 浙江大学 Three-dimensional object detection and tracking method
CN112927308A (en) * 2021-03-26 2021-06-08 鹏城实验室 Three-dimensional registration method, device, terminal and computer readable storage medium
CN113052898A (en) * 2021-04-08 2021-06-29 四川大学华西医院 Point cloud and strong-reflection target real-time positioning method based on active binocular camera
CN113052898B (en) * 2021-04-08 2022-07-12 四川大学华西医院 Point cloud and strong-reflection target real-time positioning method based on active binocular camera
CN113094462A (en) * 2021-04-30 2021-07-09 腾讯科技(深圳)有限公司 Data processing method and device and storage medium
CN113094462B (en) * 2021-04-30 2023-10-24 腾讯科技(深圳)有限公司 Data processing method and device and storage medium
CN113345017A (en) * 2021-05-11 2021-09-03 香港理工大学深圳研究院 Method for assisting visual SLAM by using mark
CN113345017B (en) * 2021-05-11 2022-09-20 香港理工大学深圳研究院 Method for assisting visual SLAM by using mark
CN113240806A (en) * 2021-05-13 2021-08-10 深圳市慧鲤科技有限公司 Information processing method, information processing device, electronic equipment and storage medium
CN113420590A (en) * 2021-05-13 2021-09-21 北京航空航天大学 Robot positioning method, device, equipment and medium in weak texture environment
CN113420590B (en) * 2021-05-13 2022-12-06 北京航空航天大学 Robot positioning method, device, equipment and medium in weak texture environment
CN113382365A (en) * 2021-05-21 2021-09-10 北京索为云网科技有限公司 Pose tracking method and device of mobile terminal
CN113361365A (en) * 2021-05-27 2021-09-07 浙江商汤科技开发有限公司 Positioning method and device, equipment and storage medium
CN113487741B (en) * 2021-06-01 2024-05-28 中国科学院自动化研究所 Dense three-dimensional map updating method and device
CN113487741A (en) * 2021-06-01 2021-10-08 中国科学院自动化研究所 Dense three-dimensional map updating method and device
CN113362358A (en) * 2021-06-02 2021-09-07 东南大学 Robust pose estimation method based on instance segmentation in dynamic scene
CN113324542B (en) * 2021-06-07 2024-04-12 北京京东乾石科技有限公司 Positioning method, device, equipment and storage medium
CN113324542A (en) * 2021-06-07 2021-08-31 北京京东乾石科技有限公司 Positioning method, device, equipment and storage medium
CN113432593A (en) * 2021-06-25 2021-09-24 北京华捷艾米科技有限公司 Centralized synchronous positioning and map construction method, device and system
CN113591865B (en) * 2021-07-28 2024-03-26 深圳甲壳虫智能有限公司 Loop detection method and device and electronic equipment
CN113591865A (en) * 2021-07-28 2021-11-02 深圳甲壳虫智能有限公司 Loop detection method and device and electronic equipment
CN115700507A (en) * 2021-07-30 2023-02-07 北京小米移动软件有限公司 Map updating method and device
CN115700507B (en) * 2021-07-30 2024-02-13 北京小米移动软件有限公司 Map updating method and device
CN113609985B (en) * 2021-08-05 2024-02-23 诺亚机器人科技(上海)有限公司 Object pose detection method, detection device, robot and storable medium
CN113609985A (en) * 2021-08-05 2021-11-05 诺亚机器人科技(上海)有限公司 Object pose detection method, detection device, robot and storage medium
CN113744308B (en) * 2021-08-06 2024-02-20 高德软件有限公司 Pose optimization method, pose optimization device, electronic equipment, medium and program product
CN113744308A (en) * 2021-08-06 2021-12-03 高德软件有限公司 Pose optimization method, pose optimization device, electronic device, pose optimization medium, and program product
CN113838129A (en) * 2021-08-12 2021-12-24 高德软件有限公司 Method, device and system for obtaining pose information
CN113838129B (en) * 2021-08-12 2024-03-15 高德软件有限公司 Method, device and system for obtaining pose information
CN113793414A (en) * 2021-08-17 2021-12-14 中科云谷科技有限公司 Method, processor and device for establishing three-dimensional view of industrial field environment
CN113850293A (en) * 2021-08-20 2021-12-28 北京大学 Positioning method based on multi-source data and direction prior joint optimization
CN113784026A (en) * 2021-08-30 2021-12-10 鹏城实验室 Method, apparatus, device and storage medium for calculating position information based on image
CN113884025A (en) * 2021-09-16 2022-01-04 河南垂天智能制造有限公司 Additive manufacturing structure optical loopback detection method and device, electronic equipment and storage medium
CN113884025B (en) * 2021-09-16 2024-05-03 河南垂天智能制造有限公司 Method and device for detecting optical loop of additive manufacturing structure, electronic equipment and storage medium
CN113870428A (en) * 2021-09-29 2021-12-31 北京百度网讯科技有限公司 Scene map generation method, related device and computer program product
CN114202579A (en) * 2021-11-01 2022-03-18 东北大学 Real-time multi-body SLAM system oriented to dynamic scene
CN113936042A (en) * 2021-12-16 2022-01-14 深圳佑驾创新科技有限公司 Target tracking method and device and computer readable storage medium
CN113936042B (en) * 2021-12-16 2022-04-05 深圳佑驾创新科技有限公司 Target tracking method and device and computer readable storage medium
CN114812540A (en) * 2022-06-23 2022-07-29 深圳市普渡科技有限公司 Picture construction method and device and computer equipment
CN115578432A (en) * 2022-09-30 2023-01-06 北京百度网讯科技有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN115375870A (en) * 2022-10-25 2022-11-22 杭州华橙软件技术有限公司 Loop detection optimization method, electronic equipment and computer readable storage device
CN115375870B (en) * 2022-10-25 2023-02-10 杭州华橙软件技术有限公司 Loop detection optimization method, electronic equipment and computer readable storage device
CN116030136B (en) * 2023-03-29 2023-06-09 中国人民解放军国防科技大学 Cross-view visual positioning method and device based on geometric features and computer equipment
CN116030136A (en) * 2023-03-29 2023-04-28 中国人民解放军国防科技大学 Cross-view visual positioning method and device based on geometric features and computer equipment
CN117746381A (en) * 2023-12-12 2024-03-22 北京迁移科技有限公司 Pose estimation model configuration method and pose estimation method
CN117893693A (en) * 2024-03-15 2024-04-16 南昌航空大学 Dense SLAM three-dimensional scene reconstruction method and device
CN117893693B (en) * 2024-03-15 2024-05-28 南昌航空大学 Dense SLAM three-dimensional scene reconstruction method and device

Also Published As

Publication number Publication date
CN110349213A (en) 2019-10-18
CN110349213B (en) 2023-12-12

Similar Documents

Publication Publication Date Title
WO2020259248A1 (en) Depth information-based pose determination method and device, medium, and electronic apparatus
CN110335316B (en) Depth information-based pose determination method, device, medium and electronic equipment
CN110322500B (en) Optimization method and device for instant positioning and map construction, medium and electronic equipment
US10948297B2 (en) Simultaneous location and mapping (SLAM) using dual event cameras
WO2019170164A1 (en) Depth camera-based three-dimensional reconstruction method and apparatus, device, and storage medium
Chen et al. Rise of the indoor crowd: Reconstruction of building interior view via mobile crowdsourcing
JP5722502B2 (en) Planar mapping and tracking for mobile devices
CN111127524A (en) Method, system and device for tracking trajectory and reconstructing three-dimensional image
WO2015135323A1 (en) Camera tracking method and device
US20210274358A1 (en) Method, apparatus and computer program for performing three dimensional radio model construction
CN113674416B (en) Three-dimensional map construction method and device, electronic equipment and storage medium
CN111709973B (en) Target tracking method, device, equipment and storage medium
JP2019075082A (en) Video processing method and device using depth value estimation
WO2021136386A1 (en) Data processing method, terminal, and server
EP3274964B1 (en) Automatic connection of images using visual features
CN110349212B (en) Optimization method and device for instant positioning and map construction, medium and electronic equipment
WO2022174711A1 (en) Visual inertial system initialization method and apparatus, medium, and electronic device
Nousias et al. Large-scale, metric structure from motion for unordered light fields
TW202244680A (en) Pose acquisition method, electronic equipment and storage medium
WO2023015938A1 (en) Three-dimensional point detection method and apparatus, electronic device, and storage medium
CN113610702B (en) Picture construction method and device, electronic equipment and storage medium
CN112258647B (en) Map reconstruction method and device, computer readable medium and electronic equipment
CN110849380B (en) Map alignment method and system based on collaborative VSLAM
JP2014102805A (en) Information processing device, information processing method and program
Laskar et al. Robust loop closures for scene reconstruction by combining odometry and visual correspondences

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20831213

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20831213

Country of ref document: EP

Kind code of ref document: A1
