WO2019233090A1 - Method and apparatus for simultaneous localization and mapping - Google Patents
Method and apparatus for simultaneous localization and mapping
- Publication number: WO2019233090A1
- Application: PCT/CN2018/124786 (CN2018124786W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- map
- view
- point
- camera
- large field
- Prior art date
Classifications
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/74—Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/248—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
- G06T7/579—Depth or shape recovery from multiple images from motion
- G06T7/593—Depth or shape recovery from multiple images from stereo images
- G06T3/047—Fisheye or wide-angle transformations
- G06T5/80—Geometric correction
- G06T19/003—Navigation within 3D models or images
- G06T2207/10004—Still image; Photographic image
- G06T2207/10012—Stereo images
- G06T2207/10021—Stereoscopic video; Stereoscopic image sequence
- G06T2207/30181—Earth observation
- G06T2207/30184—Infrastructure
- G06T2207/30244—Camera pose
Definitions
- The invention relates to the field of simultaneous positioning and mapping, and in particular to simultaneous positioning and mapping based on a large field of view camera.
- Simultaneous Localization and Mapping (SLAM) is a technology that tracks the motion of a robot in real time while simultaneously building a map of the surrounding environment to achieve positioning and navigation.
- The camera used in traditional SLAM is a perspective camera, i.e., a pinhole camera. Because its field of view is limited, consecutive images may share too few common features, which can cause the SLAM algorithm to lose tracking. Compared with the pinhole camera used in traditional SLAM, a large field of view camera covers a much wider field of view, and it has therefore received extensive research attention.
- One approach is to apply a traditional de-distortion method to the large field of view image obtained by the large field of view camera, and then treat the de-distorted image as a normal image and apply traditional SLAM technology to realize simultaneous positioning and mapping.
- This solution is simple and easy to implement, but the traditional de-distortion method loses a large portion of the viewing angle and cannot make full use of the wide viewing angle of a large field of view camera.
- The other approach performs SLAM processing directly on large field of view images, without distortion correction, based on the imaging model of the large field of view camera. That is, features are extracted and processed directly on the uncorrected large field of view image. Features extracted this way may be affected by image distortion, and the complex imaging model of the large field of view camera makes the optimization extremely complicated, thereby degrading the performance of the system.
- the purpose of this application is to provide a method for simultaneous positioning and mapping.
- Based on a multi-virtual pinhole camera model, this method can de-distort the large field of view image acquired by the large field of view camera, and then perform simultaneous positioning and mapping based on the de-distorted image.
- the present application provides a method for simultaneous positioning and mapping.
- The method includes: obtaining a large field of view image through a large field of view camera; obtaining a de-distorted image corresponding to the large field of view image based on a multi-virtual pinhole camera model; and determining the pose of the large field of view camera and constructing a map based on the de-distorted image.
- the multi-virtual pinhole camera model includes at least two virtual pinhole cameras with different orientations, and a camera center of the at least two virtual pinhole cameras with different orientations coincides with a camera center of the large field of view camera.
- the large field of view camera is a monocular large field of view camera.
- Determining the pose of the large field of view camera and constructing a map based on the de-distorted image includes an initialization step, which includes: obtaining a de-distorted image corresponding to a first time and a de-distorted image corresponding to a second time; determining the feature points at which the de-distorted image corresponding to the first time and the de-distorted image corresponding to the second time match each other; and constructing an initial map based on the mutually matched feature points.
- Constructing an initial map based on the matched feature points includes: determining a direction vector corresponding to a first feature point based on the feature point in the de-distorted image corresponding to the first time and the camera center of the large field of view camera at the first time; determining a direction vector corresponding to a second feature point based on the matched feature point in the de-distorted image corresponding to the second time and the camera center of the large field of view camera at the second time; triangulating the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point to determine a map point corresponding to the feature points; and constructing an initial map based on the map points.
- the large field of view camera is a monocular large field of view camera.
- Determining the pose of the large field of view camera and constructing a map based on the de-distorted image includes a global bundle optimization step, which includes: for each key large field of view frame in the map, projecting each map point associated with the key large field of view frame into the multi-virtual pinhole camera model to obtain a reprojection point of the map point in the model; determining the reprojection error of the map point from its reprojection point and the feature point corresponding to the map point; determining a total reprojection error from the reprojection errors of all map points associated with the key large field of view frames; and, based on the total reprojection error, updating the poses of the key large field of view frames and the positions of all map points associated with them.
- the large field of view camera is a monocular large field of view camera.
- Determining the pose of the large field of view camera and constructing a map based on the de-distorted image includes a tracking step, which includes: for each map point associated with the current large field of view frame, projecting the map point into the multi-virtual pinhole camera model to obtain a reprojection point of the map point in the model; determining the reprojection error of the map point from the reprojection point and the feature point corresponding to the map point; determining a total reprojection error from the reprojection errors of all map points associated with the current large field of view frame; and updating the pose of the current large field of view frame based on the total reprojection error.
- the large field of view camera is a monocular large field of view camera.
- Determining the pose of the large field of view camera and constructing a map based on the de-distorted image includes a mapping step, which includes: determining the feature points at which the current large field of view frame and its reference frame match each other; determining a direction vector corresponding to a first feature point based on the feature point of the current large field of view frame and the camera center of the current large field of view camera; determining a direction vector corresponding to a second feature point based on the matched feature point of the reference frame and the camera center of the large field of view camera corresponding to the reference frame; triangulating the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point to determine a map point corresponding to the feature points; and constructing a map based on the map points.
- The mapping step further includes a local bundle optimization step.
- The local bundle optimization step includes: for each key large field of view frame in the local map, projecting each map point associated with the key large field of view frame into the multi-virtual pinhole camera model to obtain a reprojection point of the map point in the model; determining the reprojection error of the map point from the reprojection point and the feature point corresponding to the map point; determining a total reprojection error from the reprojection errors of the map points associated with all the key large field of view frames; and updating the poses of the key large field of view frames and the positions of all map points associated with them.
- the large field of view camera is a binocular large field of view camera.
- The method includes: obtaining a left field of view image and a right field of view image through the binocular large field of view camera; obtaining a left de-distorted image corresponding to the left field of view image based on a first multi-virtual pinhole camera model; obtaining a right de-distorted image corresponding to the right field of view image based on a second multi-virtual pinhole camera model; and determining the pose of the binocular large field of view camera and constructing a map based on the left de-distorted image and the right de-distorted image.
- The first multi-virtual pinhole camera model includes at least two virtual pinhole cameras with different orientations whose camera centers coincide with the camera center of the left camera of the binocular large field of view camera; the second multi-virtual pinhole camera model includes at least two virtual pinhole cameras with different orientations whose camera centers coincide with the camera center of the right camera of the binocular large field of view camera.
- Determining the pose of the binocular large field of view camera and constructing a map based on the left de-distorted image and the right de-distorted image includes an initialization step, which includes: determining the feature points at which the left de-distorted image and the right de-distorted image match each other; and constructing an initial map based on the mutually matched feature points.
- Determining the feature points at which the left de-distorted image and the right de-distorted image match each other includes: determining, in the right de-distorted image, the epipolar line corresponding to a feature point in the left de-distorted image, and searching along the epipolar line for the matching feature point.
- The epipolar line is a polyline composed of multiple line segments.
- Constructing an initial map based on the matched feature points includes: determining a direction vector corresponding to a first feature point based on the feature point in the left de-distorted image and the camera center of the left camera of the binocular large field of view camera; determining a direction vector corresponding to a second feature point based on the matched feature point in the right de-distorted image and the camera center of the right camera of the binocular large field of view camera; triangulating, based on the baseline of the binocular large field of view camera, the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point to determine a map point corresponding to the feature points; and constructing an initial map based on the map points.
- Determining the pose of the binocular large field of view camera and constructing a map based on the left de-distorted image and the right de-distorted image includes a global bundle optimization step.
- The global bundle optimization step includes: for each key binocular image frame in the map, projecting a map point associated with the key binocular image frame into the first multi-virtual pinhole camera model to obtain a reprojection point of the map point in the first multi-virtual pinhole camera model, determining the reprojection error of the map point from that reprojection point and the feature point corresponding to the map point, and determining a left reprojection error from the reprojection errors of the map points associated with all the key binocular image frames; or projecting a map point associated with the key binocular image frames into the second multi-virtual pinhole camera model to obtain a reprojection point of the map point in the second multi-virtual pinhole camera model, determining the reprojection error of the map point from that reprojection point and the corresponding feature point, and determining a right reprojection error accordingly; and updating, based on the left and right reprojection errors, the poses of the key binocular image frames and the positions of the map points associated with them.
- the determining a pose of the binocular large field of view camera and constructing a map based on the left de-distorted image and the right de-distorted image includes a tracking step.
- The tracking step includes: for each map point associated with the current binocular image frame, projecting the map point into the first multi-virtual pinhole camera model to obtain a reprojection point of the map point in the first multi-virtual pinhole camera model, determining the reprojection error of the map point from that reprojection point and the feature point corresponding to the map point, and determining a left reprojection error from the reprojection errors of all the map points associated with the current binocular image frame; or projecting the map point into the second multi-virtual pinhole camera model to obtain a reprojection point of the map point in the second multi-virtual pinhole camera model, determining the reprojection error of the map point from that reprojection point and the corresponding feature point, and determining a right reprojection error accordingly; and updating the pose of the current binocular image frame based on the left and right reprojection errors.
- determining the pose of the binocular large-field-of-view camera and constructing a map based on the left de-distorted image and the right de-distorted image includes a mapping step.
- The mapping step includes: determining the feature points at which the current left de-distorted image and the current right de-distorted image match each other; determining a direction vector corresponding to a first feature point based on the feature point of the current left de-distorted image and the camera center of the left camera of the current binocular large field of view camera; determining a direction vector corresponding to a second feature point based on the feature point of the current right de-distorted image and the camera center of the right camera of the current binocular large field of view camera; triangulating the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point to determine a map point corresponding to the feature points; and constructing a map based on the map points.
- The mapping step further includes a local bundle optimization step.
- The local bundle optimization step includes: for each key binocular image frame in the local map, projecting a map point associated with the key binocular image frame into the first multi-virtual pinhole camera model to obtain a reprojection point of the map point in the first multi-virtual pinhole camera model, determining the reprojection error of the map point from that reprojection point and the feature point corresponding to the map point, and determining a left reprojection error from the reprojection errors of the map points associated with all the key binocular image frames; or projecting a map point associated with the key binocular image frames into the second multi-virtual pinhole camera model to obtain a reprojection point of the map point in the second multi-virtual pinhole camera model, determining the reprojection error of the map point from that reprojection point and the corresponding feature point, and determining a right reprojection error accordingly; and updating the poses of the key binocular image frames in the local map and the positions of the map points associated with them based on the left and right reprojection errors.
- determining the pose of the large field of view camera and constructing a map based on the de-distortion image includes a closed-loop detection processing step.
- The closed-loop detection processing step includes: when the current large field of view frame is a key large field of view frame, determining a closed-loop large field of view frame in the map database that is similar to the current large field of view frame; determining the feature points at which the current large field of view frame and the closed-loop large field of view frame match each other; and, for each matched feature point in the current large field of view frame, transforming the map point associated with that feature point into the coordinate system of the multi-virtual pinhole camera model corresponding to the closed-loop large field of view frame, and then projecting it onto the imaging plane of the model to obtain the reprojection point of the map point in the closed-loop large field of view frame.
- Key large field of view frames that have a common-view relationship with the current large field of view frame, together with the map points associated with them, are also adjusted in the closed-loop correction.
- the at least two different orientations include a front orientation, an upward orientation, a downward orientation, a left orientation, or a right orientation of the cube.
- An aspect of the present application provides an apparatus for simultaneous positioning and mapping.
- The apparatus includes at least one storage device storing a set of instructions, and at least one processor in communication with the at least one storage device.
- The at least one processor is configured to cause the apparatus for simultaneous positioning and mapping to: obtain a large field of view image through a large field of view camera; obtain a de-distorted image corresponding to the large field of view image based on a multi-virtual pinhole camera model; and determine the pose of the large field of view camera and construct a map based on the de-distorted image.
- the multiple virtual pinhole camera model includes at least two virtual pinhole cameras with different orientations, and a camera center of the at least two virtual pinhole cameras with different orientations coincides with a camera center of the large field of view camera.
- FIG. 1 illustrates a system for simultaneous positioning and mapping according to some embodiments of the present application
- FIG. 2 shows a flowchart of a method for simultaneous positioning and mapping according to some embodiments of the present application
- FIG. 3 illustrates a multiple virtual pinhole camera model including two orientations according to some embodiments of the present application
- FIG. 4 illustrates a multi-virtual pinhole camera model including five orientations according to some embodiments of the present application
- FIG. 5 shows a schematic diagram of distortion removal based on a multiple virtual pinhole camera model according to some embodiments of the present application
- FIG. 6 illustrates an original monocular fisheye image, a monocular fisheye image after traditional de-distortion, and a monocular fisheye image after de-distortion using the method of the present disclosure, according to some embodiments of the present application;
- FIG. 7 illustrates an original binocular fisheye image and the corresponding binocular fisheye image after traditional de-distortion, according to some embodiments of the present application
- FIG. 8 shows a flowchart of determining a camera pose and constructing a map according to some embodiments of the present application
- FIG. 9 shows a schematic diagram of a map point constructed by a monocular large field of view camera according to some embodiments of the present application.
- FIG. 10 is a schematic diagram of polar line search of a binocular large field of view camera according to some embodiments of the present application.
- FIG. 11 is a schematic diagram illustrating a map point constructed by a binocular large field of view camera according to some embodiments of the present application.
- The flowcharts used in the present disclosure illustrate operations implemented by a system according to some embodiments of the present disclosure. It should be clearly understood that the operations of a flowchart need not be implemented in order; they may be performed in reverse order or simultaneously, and one or more other operations may be added to, or removed from, a flowchart.
- One aspect of the present disclosure relates to a method of simultaneous positioning and mapping.
- the method includes de-distorting the large-field-of-view image acquired by the large-field-of-view camera into a de-distorted image based on the multi-virtual pinhole camera model; determining the pose of the large-field-of-view camera and constructing a map based on the de-distorted image.
- the multiple virtual pinhole camera model includes at least two virtual pinhole cameras with different orientations, and a camera center of the at least two virtual pinhole cameras with different orientations coincides with a camera center of the large field of view camera.
- FIG. 1 illustrates a system for simultaneous positioning and mapping according to some embodiments of the present application.
- the system 100 for simultaneous positioning and mapping can acquire a large field of view image and execute a method of simultaneous positioning and mapping.
- For the method of simultaneous positioning and mapping, reference may be made to the descriptions of FIG. 2 through FIG. 11.
- the system 100 for simultaneous positioning and mapping may include a large field of view camera 101 and a device 102 for simultaneous positioning and mapping.
- the large-field-of-view camera 101 and the device 102 for simultaneous positioning and mapping may be installed as a whole or separately.
- the large field of view camera 101 is used to acquire a fish-eye image of a scene.
- The large-field-of-view camera 101 may be a fisheye camera, a catadioptric camera, or a panoramic imaging camera.
- the large-field-of-view camera 101 may be a monocular large-field-of-view camera, a binocular large-field-of-view camera, or a multi-eye large-field-of-view camera.
- the large field of view camera 101 includes a monocular fisheye camera and a binocular fisheye camera.
- the left camera of a binocular fisheye camera is called the left eye; the right camera of the binocular fisheye camera is called the right eye.
- the image acquired by the left eye is called a left fisheye image (left field of view image), and the image acquired by the right eye is called a right fisheye image (right field of view image).
- the device 102 for simultaneous positioning and mapping is an exemplary computing device that can perform the method of simultaneous positioning and mapping.
- the device 102 for simultaneous positioning and mapping may include a COM port 150 to facilitate data communication.
- the device 102 for simultaneously positioning and mapping may further include a processor 120 in the form of one or more processors for executing computer instructions.
- Computer instructions may include, for example, routines, programs, objects, components, data structures, procedures, modules, and functions that perform the specific functions described herein.
- The processor 120 may determine a de-distorted image of the fisheye image based on the multi-virtual pinhole camera model.
- The processor 120 may determine the pose of the large-field-of-view camera 101 and construct a map based on the de-distorted image.
- the processor 120 may include one or more hardware processors, such as a microcontroller, microprocessor, reduced instruction set computer (RISC), application specific integrated circuit (ASIC), application-specific instruction-set Processor (ASIP), Central Processing Unit (CPU), Graphics Processing Unit (GPU), Physical Processing Unit (PPU), Microcontroller Unit, Digital Signal Processor (DSP), Field Programmable Gate Array (FPGA), Advanced RISC machine (ARM), programmable logic device (PLD), any circuit or processor capable of performing one or more functions, etc., or any combination thereof.
- The device 102 for simultaneous positioning and mapping may include an internal communication bus 110, program storage, and different forms of data storage (e.g., a disk 170, a read-only memory (ROM) 130, or a random access memory (RAM) 140).
- The device 102 may also include program instructions stored in the ROM 130, the RAM 140, and/or other types of non-transitory storage media to be executed by the processor 120.
- The methods and/or processes of the present application may be implemented as program instructions.
- The device 102 also includes an I/O component 160 that supports input/output between the computer and other components (e.g., user interface elements).
- The device 102 for simultaneous positioning and mapping can also receive programs and data through network communication.
- The device 102 in this application may also include multiple processors; therefore, the operations and/or method steps disclosed in this application may be performed by one processor, as described in this disclosure, or jointly by multiple processors.
- For example, if the processor 120 of the device 102 executes steps A and B, it should be understood that steps A and B may also be performed jointly or separately by two different processors (e.g., a first processor executes step A and a second processor executes step B, or the first and second processors jointly execute steps A and B).
- FIG. 2 shows a flowchart of a method for simultaneous positioning and mapping according to some embodiments of the present application.
- The process 200 may be implemented as a set of instructions in a non-transitory storage medium of the device 102 for simultaneous positioning and mapping.
- The device 102 can execute the set of instructions and accordingly perform the steps of the process 200.
- The process 200 may include one or more additional operations not described herein, and/or omit one or more of the operations described. Furthermore, the order of operations shown in FIG. 2 and described below is not limiting.
- the device 102 for simultaneous positioning and mapping can acquire a large field of view image through the large field of view camera 101.
- When the large field of view camera 101 is a monocular large field of view camera, it acquires a single large field of view image; when the large field of view camera 101 is a binocular large field of view camera, it acquires large field of view images including a left field of view image and a right field of view image.
- the device 102 for simultaneous positioning and mapping may obtain a de-distortion image corresponding to the large field-of-view image based on a multiple virtual pinhole camera model.
- the above multiple virtual pinhole camera model may include at least two virtual pinhole cameras with different orientations, and a camera center of the at least two virtual pinhole cameras with different orientations coincides with a camera center of the monocular large field of view camera.
- The device 102 for simultaneous positioning and mapping may obtain a left de-distorted image corresponding to the left field of view image based on the first multi-virtual pinhole camera model, and obtain a right de-distorted image corresponding to the right field of view image based on the second multi-virtual pinhole camera model.
- the first multiple virtual pinhole camera model and the second multiple virtual pinhole camera model may be the same or different.
- The first multi-virtual pinhole camera model may include at least two virtual pinhole cameras with different orientations whose camera centers coincide with the camera center of the left camera of the large field of view camera 101; the second multi-virtual pinhole camera model may include at least two virtual pinhole cameras with different orientations whose camera centers coincide with the camera center of the right camera of the large field of view camera 101.
- FIG. 3 illustrates a multiple virtual pinhole camera model including two orientations, according to some embodiments of the present application.
- the orientations of the two virtual pinhole cameras are at an angle of 90 degrees, and the camera center coincides with the camera center of the large field of view camera at point C.
- FIG. 4 illustrates a multi-virtual pinhole camera model including five orientations according to some embodiments of the present application.
- the multi-virtual pinhole camera model includes a virtual pinhole camera with a total of 5 orientations: forward, upward, downward, left, and right.
- The camera centers of the five virtual pinhole cameras coincide with the camera center of the large field of view camera at point C.
- The above de-distortion method is called a cubemap-based de-distortion method (hereinafter referred to as the cube model).
- The device 102 for simultaneous positioning and mapping may project the large field of view image (or the left field of view image, or the right field of view image) onto the multi-virtual pinhole camera model (or the first or second multi-virtual pinhole camera model) to obtain projection images for the at least two differently oriented virtual pinhole cameras, and then unfold those projection images to obtain the de-distorted image corresponding to the large field of view image.
- FIG. 5 shows a schematic diagram of de-distortion based on a multi-virtual pinhole camera model according to some embodiments of the present application.
- the first multiple virtual pinhole camera model and the left field of view image are taken as examples.
- Point A is the camera center of the left camera of the binocular large field of view camera, and points B, C, and D are exemplary pixels in the left field of view image.
- The first multi-virtual pinhole camera model 510 is a cube model and includes virtual pinhole cameras with five orientations: the front, upward, downward, left, and right orientations of the cube. The camera centers of the five virtual pinhole cameras coincide at point A.
- The left field of view image is projected onto the imaging planes of the five differently oriented virtual pinhole cameras of the first multi-virtual pinhole camera model 510, yielding five projection images with different orientations. Unfolding the five projection images produces the left de-distorted image.
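- As an illustration of this projection-and-unfold process, the following minimal sketch renders one virtual pinhole face by sampling a fisheye image. It is a sketch under stated assumptions, not the patent's implementation: the equidistant fisheye model (radius = f·θ), the 90-degree face field of view, the axis convention, the face size, and the focal length are all illustrative.

```python
import numpy as np
import cv2

# Orientations of the five virtual pinhole cameras of the cube model
# (front, up, down, left, right), assuming x right, y down, z forward.
FACE_ROTATIONS = {
    "front": np.eye(3),
    "up":    cv2.Rodrigues(np.array([ np.pi / 2, 0.0, 0.0]))[0],
    "down":  cv2.Rodrigues(np.array([-np.pi / 2, 0.0, 0.0]))[0],
    "left":  cv2.Rodrigues(np.array([0.0, -np.pi / 2, 0.0]))[0],
    "right": cv2.Rodrigues(np.array([0.0,  np.pi / 2, 0.0]))[0],
}

def render_face(fisheye_img, R, face_size=512, f_fisheye=300.0):
    """Render one virtual pinhole face by sampling the fisheye image.

    Assumes an equidistant fisheye model (radius = f_fisheye * theta,
    where theta is the angle between a ray and the optical axis); a real
    system would use the camera's calibrated projection model instead.
    """
    h, w = fisheye_img.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    half = face_size / 2.0  # focal length giving a 90-degree face FOV
    u, v = np.meshgrid(np.arange(face_size), np.arange(face_size))
    # Ray of each face pixel in the face frame, rotated into the fisheye
    # frame (all cameras share one center; only the orientations differ).
    rays = np.stack([(u - half) / half, (v - half) / half,
                     np.ones_like(u, dtype=float)], axis=-1) @ R.T
    cos_theta = np.clip(rays[..., 2] / np.linalg.norm(rays, axis=-1), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    phi = np.arctan2(rays[..., 1], rays[..., 0])
    r = f_fisheye * theta  # equidistant projection into the fisheye image
    map_x = (cx + r * np.cos(phi)).astype(np.float32)
    map_y = (cy + r * np.sin(phi)).astype(np.float32)
    return cv2.remap(fisheye_img, map_x, map_y, cv2.INTER_LINEAR)
```

- Rendering all five faces and arranging them (e.g., in a cross layout) then yields the unfolded de-distorted image described above.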
- FIG. 6 shows an original monocular fisheye image, a monocular fisheye image after traditional de-distortion, and a monocular fisheye image after de-distortion using the method of the present disclosure, according to some embodiments of the present application.
- Figure 610 shows a large field of view image obtained with a monocular fisheye camera. It can be seen that the large field of view image has a wider field of view than the image obtained by an ordinary camera, but the entire image has spatial distortion, and the distortion is greater as it is farther from the center of the image.
- Figure 620 shows a de-distortion image obtained by performing a de-distortion process on the large-field-of-view image using a conventional de-distortion method.
- The angle of view of an image obtained by an ordinary camera is generally about 80 degrees, while the angle of view of figure 620 is 100 degrees. Although this is an improvement over an ordinary camera, much of the viewing angle of the image before de-distortion is still lost; as a result, a map covering all the perspectives of the large field of view image cannot be constructed.
- Figure 630 shows a large field of view de-distorted image produced by de-distortion and unfolding based on the five-orientation multi-virtual pinhole camera model according to an embodiment of the present invention, i.e., a de-distorted image obtained with the cube model. As shown, figure 630 retains all perspectives of the large field of view image; SLAM based on this de-distorted image can construct a map that includes all of the original perspective content.
- FIG. 7 illustrates an original binocular fisheye image and the corresponding binocular fisheye image after traditional de-distortion, according to some embodiments of the present application.
- images 701 and 702 are respectively the original left fisheye image and right fisheye image obtained by the large field of view camera 101 in the real world.
- Images 703 and 704 are the left and right de-distorted images, respectively, obtained by the traditional de-distortion method; as with the traditionally de-distorted monocular image described above, their angle of view is only about 100 degrees. It can be seen that, for the large-angle-of-view images acquired by the large field of view camera 101, the de-distortion method provided in this application can effectively remove image distortion while retaining the large angle of view.
- In step 230, the device 102 for simultaneous positioning and mapping may determine the pose of the large field of view camera and construct a map based on the de-distorted image.
- The device 102 for simultaneous positioning and mapping may extract feature points of the de-distorted image and construct a corresponding large field of view frame based on the extracted feature points; it then determines the pose of the monocular large field of view camera and builds a map based on the large field of view frame.
- Alternatively, the pose of the camera motion may be estimated and a map constructed directly from the pixel intensity information in the large field of view de-distorted image, without computing key points and descriptors.
- The large field of view de-distorted image obtained by the above de-distortion method based on the multi-virtual pinhole camera model retains all the perspectives of the original large field of view image. Simultaneous positioning and mapping can therefore exploit the rich common features between large field of view images, yielding more efficient positioning and more accurate maps. At the same time, the above method avoids the extra calculation cost that the complex projection model of the large field of view camera would otherwise impose on the system.
- The device 102 for simultaneous positioning and mapping may extract feature points of the left and right de-distorted images and construct a corresponding binocular image frame based on the extracted feature points; it then determines the pose of the binocular large field of view camera and builds a map based on the binocular image frame.
- Because the large field of view frame (or binocular image frame) includes the information of all feature points in the corresponding de-distorted image (or left and right de-distorted images), the pose of the large field of view camera 101 can be tracked and a map built accordingly.
- The device 102 for simultaneous positioning and mapping may scale the de-distorted image (or the left and right de-distorted images) to obtain a corresponding image pyramid. Corner points are extracted from the image at each scale of the pyramid, and a descriptor is computed for each corner point.
- The corner points and the descriptors together constitute the feature points of the image.
- the corner points are highly recognizable and representative regions in the image, and are used to represent position information of the feature points in the image.
- Descriptors can be represented by vectors that describe the information of the pixels around a corner point, and are designed so that feature points with similar appearance have similar descriptors.
- Feature points are extracted for a de-distorted image (or left de-distorted image, right de-distorted image), and a corresponding large field-of-view frame (or binocular image frame) is constructed based on the extracted feature points.
- the large field of view frame (or binocular image frame) includes all feature points in the corresponding de-distortion image (or left de-distortion image, right de-distortion image).
- After the large field of view frame (or binocular image frame) is constructed, the pixel data of the corresponding de-distorted image (or left and right de-distorted images) can be discarded, thereby saving storage space and reducing system power consumption.
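- A minimal sketch of this pyramid-and-descriptor feature extraction, using OpenCV's ORB, which internally builds the scale pyramid, detects corners at each level, and computes binary descriptors; the parameter values here are illustrative assumptions, not values from the patent.

```python
import cv2

def extract_features(dedistorted_img, n_features=2000, scale=1.2, n_levels=8):
    """Extract corner points and descriptors from a de-distorted image.

    ORB builds an image pyramid internally (n_levels scales, factor
    `scale`), detects FAST corners at each scale, and computes
    BRIEF-style binary descriptors, mirroring the corner/descriptor
    split described above.
    """
    orb = cv2.ORB_create(nfeatures=n_features, scaleFactor=scale,
                         nlevels=n_levels)
    gray = cv2.cvtColor(dedistorted_img, cv2.COLOR_BGR2GRAY)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    # A large field of view frame can then keep only these feature points
    # and discard the pixel data, as noted above, to save storage.
    return keypoints, descriptors
```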
- For a more detailed description of step 230, see FIG. 8 and the related descriptions.
- the process 200 may further include making the left-eye and right-eye optical axes of the large-field-of-view camera 101 parallel.
- the device 102 for simultaneous positioning and mapping may adjust the virtual optical axes of the left and right eyes of the binocular fisheye camera through a binocular camera calibration program so that the virtual optical axes of the two are parallel.
- FIG. 8 shows a flowchart of determining a camera pose and constructing a map according to some embodiments of the present application.
- The process 230 may be implemented as a set of instructions in a non-transitory storage medium of the device 102 for simultaneous positioning and mapping.
- The device 102 can execute the set of instructions and accordingly perform the steps of the process 230.
- The process 230 may include one or more additional operations not described herein, and/or omit one or more of the operations described. Furthermore, the order of operations shown in FIG. 8 and described below is not limiting.
- In step 810, the device 102 for simultaneous positioning and mapping may perform an initialization step, which constructs an initial map.
- The device 102 for simultaneous positioning and mapping can acquire two de-distorted images (or large field of view frames) at two different times, determine the feature points at which the two images (or frames) match each other, and construct an initial map based on the matched feature points.
- The device 102 may obtain a de-distorted image (or large field of view frame) corresponding to a first time and a de-distorted image (or large field of view frame) corresponding to a second time, determine the feature points at which the two match each other, and construct an initial map based on the mutually matched feature points.
- the large field of view frame corresponding to the first time and the large field of view frame corresponding to the second time may be the current large field of view frame and the reference large field of view frame.
- The current large field of view frame and the reference large field of view frame may be consecutive frames, or there may be one or more frames between them. A certain parallax needs to exist between the current frame and the reference frame to ensure successful initialization.
- Based on the multi-virtual pinhole camera model (for example, the model shown in FIG. 4), the device 102 for simultaneous positioning and mapping may decompose the de-distorted image (or large field of view frame) corresponding to the first time and the de-distorted image (or large field of view frame) corresponding to the second time into sub-field frames, each corresponding to one virtual pinhole camera. Thus, for each virtual pinhole camera, a pair of corresponding sub-field frames is obtained: one from the de-distorted image (or large field of view frame) corresponding to the first time and one from that corresponding to the second time. The matching feature points are determined by performing inter-frame matching on the two sub-field frames.
- Constructing the initial map based on the matched feature points includes: determining a direction vector corresponding to a first feature point based on the feature point in the de-distorted image corresponding to the first time and the camera center of the large field of view camera at the first time; determining a direction vector corresponding to a second feature point based on the matched feature point in the de-distorted image corresponding to the second time and the camera center of the large field of view camera at the second time; triangulating the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point to determine a map point corresponding to the feature points; and constructing an initial map based on the map points.
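- A minimal sketch of the direction-vector computation used in this step: a feature pixel observed on one virtual pinhole face is back-projected with that face's intrinsics and rotated into the frame of the shared camera center. The intrinsics matrix K and the face rotation R_face are illustrative inputs assumed to be known from the model construction.

```python
import numpy as np

def direction_vector(pixel, K, R_face):
    """Unit bearing vector of a feature observed on one virtual pinhole face.

    pixel  : (u, v) feature location on the face image.
    K      : 3x3 intrinsics of the virtual pinhole camera.
    R_face : rotation from the face frame to the large field of view
             camera frame.

    All faces share the large field of view camera center, so the returned
    vector emanates from that common center, which is what triangulating
    two such vectors requires.
    """
    u, v = pixel
    ray = R_face @ (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    return ray / np.linalg.norm(ray)
```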
- The device 102 for simultaneous positioning and mapping may decompose the reference large field of view frame F1 into sub-field frames F11, F12, F13, F14, and F15, each corresponding to one virtual pinhole camera, based on the multi-virtual pinhole camera model.
- the current large field of view frame F2 is also decomposed into subfield frames F21, F22, F23, F24, and F25 corresponding to each virtual pinhole camera based on the multi-virtual pinhole camera model.
- the sub-field frames F11 and F21 correspond to the forward-facing virtual pinhole camera
- the sub-field frames F12 and F22 correspond to the upward-facing virtual pinhole camera
- the sub-field frames F13 and F23 correspond to the downward-facing virtual pinhole camera
- sub-field frames F14 and F24 correspond to left-facing virtual pinhole cameras
- sub-field frames F15 and F25 correspond to right-facing virtual pinhole cameras.
- The feature points at which the current large field of view frame and the reference large field of view frame match each other are determined by performing inter-frame matching on the sub-field frame pairs F11 and F21, F12 and F22, F13 and F23, F14 and F24, and F15 and F25; triangulation based on the corresponding direction vectors then constructs new map points.
- the sub-field frames F11 and F21 are taken as examples to describe the inter-frame matching.
- The feature points of the sub-field frames F11 and F21 are matched, and it is checked whether the number of matched feature-point pairs is greater than or equal to an initialization threshold. If it is smaller than the threshold, initialization fails. If the number of matched feature-point pairs exceeds the threshold, the essential matrix between the two frames is computed from the direction vectors of the matched feature-point pairs, for example using Random Sample Consensus (RANSAC).
- the initialization threshold indicates the minimum number of feature point pairs required to initialize the map.
- A default value, such as 100, can be used directly, or the threshold can be set by the user in advance.
- the relative pose between the current large field of view frame and the reference large field of view frame is obtained by decomposing the essential matrix, and the relative pose can be represented by a pose matrix.
- The three-dimensional coordinates of the map point corresponding to each matched feature-point pair, that is, the position of the map point, are then obtained by triangulating the matched feature-point pairs using the relative pose between the current large field of view frame and the reference large field of view frame.
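- A sketch of this two-frame relative pose estimation with OpenCV's RANSAC-based essential matrix routines. These routines expect normalized image coordinates from a single pinhole view, so the sketch assumes all matches come from one virtual pinhole camera (e.g., the front face, where z > 0); matches spread across several faces would need a bearing-vector solver instead.

```python
import numpy as np
import cv2

def relative_pose_from_bearings(d1, d2):
    """Estimate the relative pose from matched bearing vectors with RANSAC.

    d1, d2 : Nx3 arrays of matched direction vectors from the two frames,
             assumed to come from one virtual pinhole camera (z > 0).
    """
    # Convert bearings to normalized image coordinates (x/z, y/z).
    p1 = d1[:, :2] / d1[:, 2:3]
    p2 = d2[:, :2] / d2[:, 2:3]
    K = np.eye(3)  # identity intrinsics: coordinates already normalized
    E, inliers = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC,
                                      threshold=1e-3)
    # Decompose E and keep the (R, t) putting triangulated points in front
    # of both cameras; t is recovered only up to scale in the monocular case.
    _, R, t, _ = cv2.recoverPose(E, p1, p2, K, mask=inliers)
    return R, t, inliers
```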
- point O1 is the camera center of the virtual pinhole camera corresponding to the sub-field frame F11
- point O2 is the camera center of the virtual pinhole camera corresponding to the sub-field frame F21
- p1 and p2 are the matched feature points.
- The three-dimensional coordinates of the map point, that is, the position of point P, can then be determined by triangulating the directions O1p1 and O2p2.
- In practice, the vectors O1p1 and O2p2 may not intersect exactly. The coordinates of the point P that minimize the error can then be obtained by the least-squares method.
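- A minimal sketch of that least-squares solution, using the midpoint method: the point minimizing the summed squared distance to two (possibly skew) rays has a closed form. Here o1, o2 are the two camera centers and d1, d2 the matched direction vectors from the text above.

```python
import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Least-squares point P for two skew rays o_i + t_i * d_i.

    Minimizing the sum of squared distances from P to both rays gives the
    3x3 linear system  sum_i (I - d_i d_i^T) P = sum_i (I - d_i d_i^T) o_i.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in ((o1, d1), (o2, d2)):
        d = d / np.linalg.norm(d)
        M = np.eye(3) - np.outer(d, d)  # projector onto the ray's normal space
        A += M
        b += M @ np.asarray(o, dtype=float)
    return np.linalg.solve(A, b)
```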
- the distance between O1 and O2 has a great influence on the error of triangulation.
- If the distance is too short, that is, if the camera translation is too small, the angular error in observing point P causes a large depth error; if it is too long, the overlap between the scenes is much smaller, which makes feature matching difficult. Therefore, a certain parallax needs to exist between the current large field of view frame and the reference large field of view frame. If the two selected frames do not meet this requirement, initialization fails; the two frames are discarded and initialization is restarted.
- the initial map points are constructed based on the three-dimensional coordinates of the map points obtained from the triangulation.
- the three-dimensional coordinates are used as the coordinates of the map points, and the descriptors of the feature points corresponding to the three-dimensional coordinates are used as the descriptors of the map points.
- When the large field of view camera 101 is a binocular large field of view camera, the device 102 for simultaneous positioning and mapping may perform the monocular initialization steps described above; it may also construct the initial map from the feature points at which the left and right de-distorted images captured at the same time match each other. That is, the device 102 may determine the feature points at which the left de-distorted image and the right de-distorted image match each other, and build an initial map based on the mutually matched feature points.
- The device 102 for simultaneous positioning and mapping may determine, in the right de-distorted image, the epipolar line corresponding to a feature point in the left de-distorted image, and then search along the epipolar line for the feature point matching the one in the left de-distorted image.
- The epipolar line is a polyline composed of multiple line segments.
- FIG. 10 shows a schematic diagram of the epipolar search of a binocular large field of view camera according to some embodiments of the present application.
- the left de-distorted image 1010 has an epipolar line 1001
- the right de-distorted image 1020 has an epipolar line 1002.
- The feature point that matches a feature point of the left de-distorted image 1010 must lie on the epipolar line 1002, and the feature point that matches a feature point of the right de-distorted image 1020 must lie on the epipolar line 1001. Therefore, the feature points at which the left and right de-distorted images match each other can be found quickly through epipolar search.
- The epipolar lines 1001 and 1002 are polylines of three segments each: two inclined line segments and one horizontal line segment.
- The left de-distorted image 1010 and the right de-distorted image 1020 retain all perspectives of the left and right fisheye images, respectively; simultaneous positioning and mapping based on them can build a map that includes all of the original perspective content.
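- A sketch of how such a piecewise epipolar line can be computed. It assumes the left feature's bearing has already been rotated into the right camera frame, that the face rotations and the virtual pinhole intrinsics K are known, and that each returned line is afterwards clipped to its face in the unfolded image; this illustrates the geometry rather than reproducing the patent's algorithm.

```python
import numpy as np

def epipolar_lines_per_face(d_left, t_lr, face_rotations, K):
    """Epipolar line coefficients on each right-eye virtual pinhole face.

    d_left : bearing of a left-eye feature, expressed in the right camera
             frame (already rotated by the known left-to-right rotation).
    t_lr   : baseline translation between the two eyes.

    Any matching right bearing d lies on the epipolar plane n . d = 0 with
    n = t_lr x d_left. On a face with rotation R_f, a pixel p maps to the
    bearing R_f K^-1 p, so pixels on the line satisfy l . (u, v, 1) = 0
    with l = K^-T R_f^T n. Clipping each line to its face bounds yields
    the multi-segment polyline seen in the unfolded de-distorted image.
    """
    n = np.cross(t_lr, d_left)
    K_inv_T = np.linalg.inv(K).T
    return {name: K_inv_T @ (R.T @ n) for name, R in face_rotations.items()}
```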
- Constructing a map based on the matched feature points includes: first, determining a direction vector corresponding to the first feature point based on the feature point in the left de-distorted image and the camera center of the left camera of the large field of view camera 101; second, determining a direction vector corresponding to the second feature point based on the matched feature point in the right de-distorted image and the camera center of the right camera of the large field of view camera 101; third, triangulating, based on the baseline of the binocular fisheye camera, the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point to determine the map point corresponding to the feature points; and finally, constructing a map based on the map points.
- FIG. 11 shows a schematic diagram of constructing a map point with a binocular large field of view camera according to some embodiments of the present application.
- a map point in front of the large field of view camera 101 is taken as an example.
- Point O1 is the camera center of the left camera of the large field of view camera 101; connecting the feature point in the left de-distorted image with O1 gives the direction vector corresponding to the first feature point. Point O2 is the camera center of the right camera of the large field of view camera 101; connecting the matched feature point in the right de-distorted image with O2 gives the direction vector corresponding to the second feature point.
- The direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point may be unit vectors.
- the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point intersect at a point E, yielding a line segment O1E and a line segment O2E, respectively.
- Connecting the point O1 and the point O2 gives a line segment O1O2, whose length is b (that is, the baseline of the large field of view camera 101).
- the line segment O1O2 forms a triangle with the line segment O1E and the line segment O2E. Solving this triangle gives the length d1 of the line segment O1E, the length d2 of the line segment O2E, the angle α between O1O2 and O1E, and the angle β between O1O2 and O2E, from which the coordinates of the point E are obtained.
- the map point E is transformed from the coordinate system of the large field of view camera 101 to the world coordinate system. Then, a map is constructed based on the position of the point E in the world coordinate system.
- the device 102 for simultaneous positioning and mapping may perform triangulation based on the following formulas.
- formulas (1), (2), and (3) follow from the law of sines and the law of cosines applied to the triangle O1-O2-E (the formulas were not legible in the source text and are reconstructed here from the definitions above):

$$\frac{d_1}{\sin\beta}=\frac{d_2}{\sin\alpha}=\frac{b}{\sin(\pi-\alpha-\beta)} \tag{1}$$

$$d_1=\frac{b\,\sin\beta}{\sin(\alpha+\beta)} \tag{2} \qquad d_2=\frac{b\,\sin\alpha}{\sin(\alpha+\beta)} \tag{3}$$
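- A minimal sketch of this triangulation, assuming the baseline O1O2 lies along the +x axis of the working frame and that v1 and v2 are unit direction vectors (the function name and frame convention are assumptions):

```python
import numpy as np

def triangulate_baseline(v1, v2, b):
    """Triangulate point E from two unit direction vectors and baseline b.

    v1: unit vector from the left camera center O1 toward the point.
    v2: unit vector from the right camera center O2 toward the point.
    Assumes the baseline O1->O2 is the +x axis of the working frame.
    """
    baseline_dir = np.array([1.0, 0.0, 0.0])
    alpha = np.arccos(np.clip(np.dot(v1, baseline_dir), -1, 1))   # angle at O1
    beta = np.arccos(np.clip(np.dot(v2, -baseline_dir), -1, 1))   # angle at O2
    gamma = np.pi - alpha - beta                                  # angle at E
    d1 = b * np.sin(beta) / np.sin(gamma)    # |O1E|, formula (2)
    # d2 = b * np.sin(alpha) / np.sin(gamma) # |O2E|, formula (3), if needed
    return d1 * v1  # point E expressed relative to O1
```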
- the newly constructed map points of both the monocular large field of view camera and the binocular large field of view camera need to undergo association processing.
- a map point can be observed by multiple key large field of view frames.
- each key large field of view frame in which a map point is observed is associated with that map point, and the specific feature point on the key large field of view frame that corresponds to the map point is recorded.
- the constructed initial map includes the above-mentioned two key large field-of-view frames and the above-mentioned initial map points, and information about the association relationship between them.
- the initialization step further includes: when the number of matched feature point pairs exceeds an initialization threshold, constructing vectors based on a bag-of-words model from the two key large field of view frames, and
- adding the vectors based on the bag-of-words model to a map database.
- clustering is performed on various image features; for example, eyes, noses, ears, mouths, and various edges and corners form different feature classes. Suppose there are 10,000 classes. For each large field of view frame, one can determine which classes it contains, marking 1 for present and 0 for absent, so that the frame can be represented by a 10,000-dimensional vector. The similarity of different large field of view frames can then be judged by comparing their vectors based on the bag-of-words model.
- the map database is used to store vectors based on the bag-of-words model constructed from key large field of view frames.
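- As a small illustrative sketch of this representation (the vector size, set-based input, and cosine similarity are assumptions consistent with the description above):

```python
import numpy as np

def bow_vector(frame_feature_classes, num_classes=10000):
    """Binary bag-of-words vector: 1 if a feature class is present."""
    v = np.zeros(num_classes, dtype=np.uint8)
    v[list(frame_feature_classes)] = 1
    return v

def bow_similarity(v1, v2):
    """Cosine similarity between two bag-of-words vectors."""
    n = np.linalg.norm(v1) * np.linalg.norm(v2)
    return float(np.dot(v1, v2)) / n if n > 0 else 0.0

# usage: frames sharing many feature classes score close to 1
sim = bow_similarity(bow_vector({3, 87, 421}), bow_vector({3, 87, 999}))
```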
- the device 102 for simultaneous positioning and mapping may perform a global bundle optimization step.
- Global bundle optimization optimizes all key large field of view frames (or key binocular image frames) and all map points in the map currently established by SLAM (hereinafter referred to as the current map).
- global bundle optimization may be performed on the initial map constructed in step 810, that is, on the above map containing only two key large field of view frames and the initial map points. It can be understood that, in addition to the initial map, global bundle optimization can be performed on the current map at any time during map construction.
- the purpose of bundle optimization is to adjust the poses of the key large field of view frames (or key binocular image frames) and the positions of the map points so as to minimize the reprojection errors of the map points on those key frames, thereby optimizing the established map.
- the device 102 for simultaneous positioning and mapping may project each map point associated with a key large field of view frame in the map into the multi-virtual pinhole camera model to obtain the reprojection point of the map point in the model; determine the reprojection error of each map point from its reprojection point in the model and the feature point corresponding to the map point; determine an overall reprojection error from the reprojection errors of the map points associated with all the key large field of view frames; and, based on that reprojection error, update the poses of the key large field of view frames and the positions of all map points associated with them.
- the pose of a frame (for example, a key large field of view frame) is the pose of the large field of view camera 101 at the moment the camera acquired that frame.
- a map point is first transformed into the coordinate system of its corresponding virtual pinhole camera (for example, the forward-facing virtual pinhole camera) and then projected onto the imaging plane of that virtual pinhole camera to obtain the reprojection point of the map point. Since the multi-virtual pinhole camera model used here is the same model as the one used in the large field of view image de-distortion processing in step 220, the imaging plane corresponds to the sub-view frame obtained by decomposing the key large field of view frame onto the forward-facing virtual pinhole camera.
- the re-projection point can be understood as an observation value of the map point based on the pose of the sub-view frame.
- the reprojection error of the map point is determined according to the feature points associated with the map point on the key large field of view frame (that is, the feature points on which the map point is obtained by triangulation) and the reprojection point of the map point. In the ideal situation where there is no error in the map established by SLAM, the reprojection error is zero. However, since real world conditions inevitably introduce errors such as measurement errors, reprojection errors cannot be completely eliminated. SLAM optimizes the maps by minimizing the reprojection errors.
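- A minimal sketch of how such a reprojection error might be computed; the face-selection rule (largest forward z), the function name, and the data layout are assumptions for illustration:

```python
import numpy as np

def reprojection_error(p_world, T_cw, face_rotations, K, u_observed):
    """Reprojection error of one map point in a multi-virtual pinhole model.

    p_world: 3D map point in world coordinates.
    T_cw: 4x4 pose of the key frame (world -> camera model coordinates).
    face_rotations: list of 3x3 rotations, model frame -> each face frame.
    K: 3x3 intrinsic matrix shared by the virtual pinhole cameras.
    u_observed: observed pixel of the associated feature point (2-vector).
    """
    p_cam = (T_cw @ np.append(p_world, 1.0))[:3]
    # pick the face in front of which the point lies (largest +z depth)
    best = max(face_rotations, key=lambda R: (R @ p_cam)[2])
    p_face = best @ p_cam
    if p_face[2] <= 0:
        return None  # the point is not observable by any face
    uv = (K @ (p_face / p_face[2]))[:2]
    return uv - u_observed  # 2D residual; its norm is the error
```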
- the device 102 for simultaneous positioning and mapping may project each map point associated with a key binocular image frame in the map into a first multi-virtual pinhole camera model to obtain the reprojection point of the map point in the first model; determine the reprojection error of the map point from its reprojection point in the first multi-virtual pinhole camera model and the feature point corresponding to the map point; and determine a left reprojection error from the reprojection errors of the map points associated with all the key binocular image frames.
- likewise, the device 102 for simultaneous positioning and mapping may project the map points associated with the key binocular image frames into a second multi-virtual pinhole camera model to obtain the reprojection points of the map points in the second model; determine the reprojection error of each map point from its reprojection point in the second multi-virtual pinhole camera model and the feature point corresponding to the map point; and determine a right reprojection error from the reprojection errors of the map points associated with all the key binocular image frames.
- the device 102 for simultaneous positioning and mapping may update the pose of the key binocular image frame and the positions of all map points associated with the key binocular image frame based on the left reprojection error, the right reprojection error, or the sum of the two.
- Specifically, for a monocular map point, the device 102 may update the pose of the key binocular image frame and the positions of all its associated map points based on the left reprojection error or the right reprojection error; for a binocular map point, it may update them based on the sum of the left and right reprojection errors.
- the device 102 for simultaneous positioning and mapping may determine a loss function based on the reprojection errors (for example, the left reprojection error, the right reprojection error, or the sum of the left and right reprojection errors).
- the loss is then minimized iteratively using optimization methods such as the Gauss-Newton method or the Levenberg-Marquardt method: the gradients with respect to the pose of the key large field of view frame (or key binocular image frame) and the positions of its associated map points are solved, and the pose and the map point positions are updated along those gradients,
- so that the current map reaches the optimal state with the smallest reprojection error.
- the above bundle optimization is based on the multi-virtual pinhole camera model which, as in the large field of view image de-distortion processing, converts the complex projection model of the large field of view camera into several simple virtual pinhole camera projection models. This avoids the complex optimization processing brought by the complex projection model of the large field of view camera, thereby improving system processing performance.
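- As a simplified sketch of such an iterative refinement, here restricted to a single frame pose under a plain pinhole projection (the use of SciPy's Levenberg-Marquardt solver and the rotation-vector parameterization are assumptions; a full bundle optimization would also include the map point positions as variables):

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(x, points_w, observations, K):
    """Stacked reprojection residuals for one pose x = (rotvec, t)."""
    R = Rotation.from_rotvec(x[:3]).as_matrix()
    t = x[3:]
    res = []
    for p, u in zip(points_w, observations):
        p_c = R @ p + t          # world -> camera (points assumed in front)
        uv = (K @ (p_c / p_c[2]))[:2]
        res.extend(uv - u)
    return np.asarray(res)

def refine_pose(x0, points_w, observations, K):
    """Minimize the total reprojection error with Levenberg-Marquardt."""
    sol = least_squares(residuals, x0, args=(points_w, observations, K),
                        method="lm")
    return sol.x  # refined (rotation vector, translation)
```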
- the device 102 for simultaneous positioning and mapping may perform a tracking step.
- the tracking step optimizes the pose of the current large field of view camera by minimizing the reprojection error of the map point on the current large field of view frame (or the current binocular image frame).
- Step 830 may be performed at any time during map construction, for example, based on the initial map constructed in the above initialization step 810 or the map optimized based on the above-mentioned global bundle optimization step 820.
- through the tracking step, SLAM continuously tracks the pose of the large field of view camera as it moves.
- the device 102 for simultaneous positioning and mapping may project each map point associated with the current large field of view frame into the multi-virtual pinhole camera model to obtain the reprojection point of the map point in the model; determine the reprojection error of the map point from its reprojection point in the model and the feature point corresponding to the map point; determine an overall reprojection error from the reprojection errors of the map points associated with the current large field of view frame; and, based on that reprojection error, update the pose of the current large field of view frame.
- the device 102 for simultaneous positioning and mapping may perform the following three sub-steps to complete the tracking step.
- Tracking sub-step 1: determine a reference large field of view frame for the current large field of view frame.
- the previous large field of view frame of the current large field of view frame is determined as the reference large field of view frame.
- a key large field of view frame with the highest degree of co-viewing with the current large field of view frame in the local map is selected as the reference large field of view frame.
- the local map may include all key large field of view frames and all map points in the current map. If the current map is the initialized map, the local map is the current map, including the initial two key large field of view frames and the map points associated with them.
- alternatively, the local map includes the at least N key large field of view frames in the current map that have the highest degree of co-viewing with the current large field of view frame, together with the map points associated with those at least N key large field of view frames, where N is an integer greater than two; N may directly use a default value, such as 10, or may be preset by the user.
- a key large field of view frame with the highest degree of co-viewing with the previous large field of view frame of the current large field of view frame in the local map is selected as the reference large field of view frame.
- the current large field of view frame and its previous large field of view frame generally have a high degree of common viewing, so the reference large field of view frame of the current frame can be selected according to the previous frame.
- since the previous large field of view frame has already been processed, the key large field of view frame with the highest degree of co-viewing with it is easier to select, which facilitates the smooth operation of the SLAM method.
- the reference large field of view frame is determined through global matching.
- a vector based on the bag-of-words model is constructed based on the current large field of view frame.
- the map database established in the initialization step 810 is used to obtain a key large field of view frame that matches the current large field of view frame as a reference large field of view frame.
- the current large field of view frame and the previous large field of view frame are matched to obtain a matched feature point pair. If the number of matched feature point pairs is greater than the tracking threshold, it is determined that the previous large field of view frame of the current large field of view frame is the reference large field of view frame.
- the tracking threshold indicates the minimum number of feature point pairs required to track the pose of the camera with a large field of view, and a default setting value, such as 20, may be directly used, or may be preset by a user.
- if the number of feature point pairs matched between the current large field of view frame and the previous large field of view frame is not greater than the tracking threshold, the key large field of view frame in the local map having the highest degree of co-viewing with the current large field of view frame (or with its previous large field of view frame) is selected, and the current large field of view frame is matched against that key frame to obtain matched feature point pairs. If the number of matched feature point pairs is greater than the tracking threshold, that key large field of view frame is determined as the reference large field of view frame.
- the reference large field of view frame is determined through global matching. The specific determination process is as described above. For brevity, it will not be repeated here.
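- The cascade of fallbacks above might be sketched as follows; `match_count`, `covisibility`, and `global_query` are hypothetical callables standing in for the feature matching, co-viewing count, and bag-of-words database lookup described in the text:

```python
def select_reference_frame(current, prev_frame, key_frames,
                           match_count, covisibility, global_query,
                           tracking_threshold=20):
    """Pick a reference frame for tracking, with fallbacks.

    match_count(a, b): number of matched feature point pairs.
    covisibility(a, b): number of map points observed by both frames.
    global_query(frame): best-matching key frame from the bag-of-words
                         map database (global matching).
    """
    if match_count(current, prev_frame) > tracking_threshold:
        return prev_frame
    # fall back to the key frame that shares the most map points
    best = max(key_frames, key=lambda kf: covisibility(kf, prev_frame))
    if match_count(current, best) > tracking_threshold:
        return best
    # last resort: global matching via the bag-of-words map database
    return global_query(current)
```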
- Tracking sub-step 2: determine the pose of the current large field of view frame, based on the multi-virtual pinhole camera model, from the current large field of view frame and the reference large field of view frame determined above.
- the pose of the current large field of view frame is determined by determining the relative pose between the current large field of view frame and the reference large field of view frame.
- the current large field of view frame is decomposed into sub-view frames corresponding to each virtual pinhole camera based on the multi-virtual pinhole camera model, and the same operation is performed on the reference large field of view frame.
- for each virtual pinhole camera, a pair of corresponding sub-view frames is thus obtained.
- the sub-view frame pair with the largest number of matched feature point pairs is selected.
- the two sub-view frames in that pair are matched against each other (inter-frame matching) to obtain the relative pose between them.
- the specific inter-frame matching process of the sub-view frame is consistent with the inter-frame matching process in the initialization step 810. For brevity, details are not described herein again.
- since the camera center of each virtual pinhole camera coincides with the camera center of the large field of view camera, there is only a fixed rotation angle between each virtual pinhole camera and the large field of view camera in the multi-virtual pinhole camera model.
- the rotation angle of each virtual pinhole camera corresponds to a certain rotation matrix. Therefore, the pose matrix of the large-view frame can be transformed into the pose matrix of the sub-view frame by the corresponding rotation matrix. Conversely, the pose matrix of the sub-view frame can also be transformed into the pose matrix of the large-view frame by the corresponding rotation matrix.
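- A minimal sketch of this pose conversion, assuming world-to-camera 4x4 pose matrices (the function names and conventions are illustrative assumptions):

```python
import numpy as np

def subframe_pose(T_large, R_fixed):
    """Pose of a sub-view frame from the large field of view frame pose.

    T_large: 4x4 pose of the large field of view frame (world -> camera).
    R_fixed: 3x3 fixed rotation from the large-FOV camera frame to the
             virtual pinhole camera frame (no translation, since the
             camera centers coincide).
    """
    T_rot = np.eye(4)
    T_rot[:3, :3] = R_fixed
    return T_rot @ T_large

def largeframe_pose(T_sub, R_fixed):
    """Inverse conversion: sub-view frame pose back to large-FOV pose."""
    T_rot = np.eye(4)
    T_rot[:3, :3] = R_fixed.T  # the inverse of a rotation is its transpose
    return T_rot @ T_sub
```

Because there is no translation between the two frames, the conversion is a pure rotation, which is what makes switching between the large field of view frame and its sub-view frames cheap.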
- the above solution uses the multi-virtual pinhole camera model to convert pose determination based on the complex projection model of the large field of view camera into pose calculation based on simple virtual pinhole camera projection models, which greatly simplifies the SLAM algorithm for large field of view cameras and significantly improves performance.
- Tracking sub-step 3: update the pose of the current large field of view frame obtained in tracking sub-step 2.
- for each matched feature point, the map point associated with that feature point is transformed, based on the multi-virtual pinhole camera model, into the coordinate system of the corresponding virtual pinhole camera of the current large field of view frame. Then, the map point is projected onto the imaging plane of that virtual pinhole camera to obtain the reprojection point of the map point in the current large field of view frame.
- as an example, consider processing based on the multi-directional virtual pinhole camera model shown in FIG. 4.
- a matching feature point in the reference large field of view frame is on the imaging plane of the left-facing virtual pinhole camera.
- suppose the map point associated with that feature point, transformed based on the multi-virtual pinhole camera model, falls in the coordinate system of the forward-facing virtual pinhole camera of the current large field of view frame.
- the reprojection point of the map point is obtained on the imaging plane of the forward virtual pinhole camera of the current large field of view frame.
- that is, at the pose of the reference large field of view frame, the map point is observed by the left-facing virtual pinhole camera of the multi-virtual pinhole camera model, while at the pose of the current large field of view frame, the map point is observed by the forward-facing virtual pinhole camera of the model.
- the reprojection error of the map point is determined from the reprojection point and the matching feature point in the current large field of view frame. The pose of the current large field of view frame is then updated according to the reprojection errors of the map points associated with all matching feature points in the current large field of view frame.
- the calculation of the reprojection error in this step and the processing of updating the pose of the current large field of view frame according to the reprojection error are consistent with the processing method in the global bundle optimization of step 820, and for the sake of brevity, they are not repeated here.
- the device 102 for simultaneous positioning and mapping may project each map point associated with the current binocular image frame into a first multi-virtual pinhole camera model to obtain the reprojection point of the map point in the first model; determine the reprojection error of the map point from its reprojection point in the first multi-virtual pinhole camera model and the feature point corresponding to the map point; and determine a left reprojection error from the reprojection errors of the map points associated with the current binocular image frame.
- likewise, the device 102 for simultaneous positioning and mapping may project the map point into a second multi-virtual pinhole camera model to obtain the reprojection point of the map point in the second model; determine the reprojection error of the map point from its reprojection point in the second multi-virtual pinhole camera model and the feature point corresponding to the map point; and determine a right reprojection error from the reprojection errors of the map points associated with the current binocular image frame.
- the device 102 for simultaneous positioning and mapping may update the pose of the current binocular image frame based on the left reprojection error, the right reprojection error, or the sum of the two. For example, for a monocular map point, the device 102 may update the pose of the current binocular image frame based on the left reprojection error or the right reprojection error; for a binocular map point, the device 102 may update the pose of the current binocular image frame based on the sum of the left and right reprojection errors.
- the device 102 for simultaneous positioning and mapping may solve the left reprojection error, the right reprojection error, or the sum of the left and right reprojection errors to determine the pose increment of the large field of view camera 101,
- and then determine the current pose of the large field of view camera 101 by combining the pose increment with prior information.
- the prior information may be the pose of the large-field-of-view camera 101 in the previous frame, or the sum of the pose of the large-field-of-view camera 101 in the previous frame and the pose increment of the previous frame.
- the pose increment of the previous frame is the pose increment between the pose of the large field of view camera 101 in the previous frame and the pose of the large field of view camera 101 in the previous two frames.
- the device 102 for simultaneous positioning and mapping may calculate the left reprojection error and/or the right reprojection error using the following formulas and solve for the pose increment.
- Formula (7) may be written as follows (the symbols were not legible in the source text and are reconstructed here from the definitions below, up to perspective division):

$$u = K\,R_i\,T_{cw}\,P \tag{7}$$

- P represents a map point in the world coordinate system; $T_{cw}$ represents the coordinate transformation matrix, which converts the map point P from the world coordinate system to the coordinate system of the multi-virtual pinhole camera model; $R_i$ represents the rotation that transforms the map point P from the coordinate system of the multi-virtual pinhole camera model to the coordinate system of one face of the model; K represents the camera matrix of the pinhole camera corresponding to each face of the multi-virtual pinhole camera model, which contains camera parameters such as the image center and focal length; u represents the reprojection point of the map point P on one face of the multi-virtual pinhole camera model.
- formula (7) can be further expressed as formula (8) by writing the projection in stages: the map point is first transformed into the coordinate system of the multi-virtual pinhole camera model, then into the coordinate system of one of its faces, and finally projected through the camera matrix K.
- P2 represents the projection point of the map point P on the coordinate system of the multiple virtual pinhole camera model
- P1 represents the projection point of the point P2 on the coordinate system of one surface of the multiple virtual pinhole camera model.
- the Jacobian matrix of u with respect to the camera pose is obtained by the chain rule and involves the skew-symmetric matrix of P2.
- the Jacobian matrix of u with respect to the map point P can be determined similarly; it involves the rotation component of the coordinate transformation matrix.
- the device 102 for simultaneous positioning and mapping can determine the left reprojection error of the large field of view camera 101 based on formulas (7), (8), (9), and (10), and thereby determine the pose of the large field of view camera 101.
- similarly, the device 102 for simultaneous positioning and mapping can determine the right reprojection error of the large field of view camera 101, and then determine the pose of the large field of view camera 101 based on the right reprojection error or on the sum of the left and right reprojection errors.
- the right reprojection error can be determined by formula (11), in which one term represents the reprojection point of the map point P on one face of the second multi-virtual pinhole camera model, another term represents the offset of the left camera relative to the right camera of the large field of view camera 101, and b indicates the baseline length of the large field of view camera 101.
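- A minimal sketch of the right-side residual under these definitions; the rectified-style assumption that the right camera sits at an offset b along the +x axis of the left model frame, as well as the function name, are illustrative assumptions:

```python
import numpy as np

def right_reprojection_error(p_world, T_cw_left, R_face, K, b, u_right):
    """Right reprojection error for a binocular rig (sketch of formula (11)).

    The map point is expressed in the left camera model frame, and the
    right camera is assumed offset by the baseline b along the x axis.
    """
    p_left = (T_cw_left @ np.append(p_world, 1.0))[:3]
    p_right = p_left - np.array([b, 0.0, 0.0])  # left-to-right offset
    p_face = R_face @ p_right                   # into one face's frame
    uv = (K @ (p_face / p_face[2]))[:2]
    return uv - u_right  # 2D residual against the observed right feature
```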
- the device 102 for simultaneous positioning and mapping may perform a mapping step (or a map update step).
- the mapping step starts from the current map and expands it as the large field of view camera moves; in other words, the mapping step inserts new map points into the current map.
- the mapping step 840 may be performed after the tracking step 830.
- the tracking step 830 is used to determine the pose of the large field of view camera at the current moment, that is, the pose of the camera as it moves.
- the device 102 for simultaneous positioning and mapping may determine feature points where the current large field of view frame and its reference frame match each other; determine the direction vector corresponding to the first feature point based on the feature point of the current large field of view frame and the camera center of the large field of view camera at the current moment; determine the direction vector corresponding to the second feature point based on the matched feature point of the reference frame and the camera center of the large field of view camera corresponding to the reference frame; triangulate the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point to determine the map point corresponding to the feature points; and construct a map based on the map points.
- the device 102 for simultaneous positioning and mapping may perform the following three sub-steps to complete the mapping step.
- Mapping sub-step 1: determine whether the current large field of view frame is a key large field of view frame.
- since the large field of view camera collects data continuously, performing a map update operation on every acquired large field of view frame would require an enormous amount of computation. Therefore, some large field of view frames regarded as important can be selected as key large field of view frames, and map update operations are then performed based on the key large field of view frames.
- any conventional or future-developed technique may be used to determine the key large field of view frames. For example, starting from the initial key large field of view frames, one key frame may be selected every ten frames, that is, the 11th, 21st, 31st, and so on. As another example, a large field of view frame with suitable parallax relative to the previous key large field of view frame may be selected as a key large field of view frame.
- if the current large field of view frame is a key large field of view frame, processing continues with mapping sub-step 2, which performs map update processing according to the current large field of view frame.
- otherwise, processing continues with mapping sub-step 3, which performs map point association processing on the current large field of view frame.
- Mapping sub-step 2: when the current large field of view frame is a key large field of view frame, map update processing is performed according to the current large field of view frame.
- the key large field of view frame is decomposed into sub-view frames corresponding to each virtual pinhole camera based on the multi-virtual pinhole camera model, and the same is done for its reference frame.
- for each virtual pinhole camera, a pair of corresponding sub-view frames is thus obtained, and new map points are constructed by performing inter-frame matching on the two sub-view frames.
- a vector based on a bag of words model may be used to accelerate the matching between feature points.
- for feature point pairs matched via the bag-of-words model, it is further tested whether they satisfy the epipolar constraint.
- the three-dimensional coordinate point of the new map point is obtained by triangulation based on the feature point pair.
- the inter-frame matching processing of the sub-view frames here and the process of obtaining the three-dimensional coordinates of the new map points by triangulation based on the feature point pairs are consistent with the corresponding processing in the initialization step 810; for brevity, they are not repeated here.
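- As an illustrative sketch of the epipolar-constraint test mentioned above, using normalized camera coordinates and the essential matrix (the tolerance value and function names are assumptions):

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def passes_epipolar_constraint(x1, x2, R, t, eps=1e-2):
    """Check the epipolar constraint x2^T E x1 ~= 0 for a candidate pair.

    x1, x2: matched points in normalized camera coordinates (3-vectors
            with z = 1); R, t: relative pose between the two sub-frames.
    """
    E = skew(t) @ R  # essential matrix
    return abs(x2 @ E @ x1) < eps
```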
- after a new map point is constructed, it is transformed into a map point in the world coordinate system based on the pose of the current large field of view frame and inserted into the current map, and the current large field of view frame is likewise inserted into the current map.
- the coordinate system of the first key large field of view frame used to construct the map during initialization is used as the world coordinate system.
- transformations between the camera coordinate system and the world coordinate system are therefore needed.
- a new bag-of-words-model-based vector is constructed according to the current large field of view frame and the new bag-of-words-based vector is added to the above-mentioned map database.
- with the map database, feature point matching can be accelerated using vectors based on the bag-of-words model, thereby improving the efficiency of SLAM tracking and mapping.
- Mapping sub-step 3: when the current large field of view frame is not a key large field of view frame, map point association processing is performed on the current large field of view frame.
- for each map point in the local map, the map point is transformed into the coordinate system of the corresponding virtual pinhole camera of the current large field of view frame, based on the multi-virtual pinhole camera model and according to the pose of the current large field of view frame.
- the map point is then projected onto the imaging plane of the virtual pinhole camera to obtain a reprojection point of the map point in the current large field of view frame. If the projection fails, the map point cannot be observed from the pose of the current large field of view frame. If the projection is successful, it indicates that the map point can be observed from the pose of the current large field of view frame, and a reprojection point of the map point is obtained.
- the feature point that best matches the map point is associated with the map point. It can be understood that through this step, the current large field of view frame and the map points that can be observed from the pose of the current large field of view frame are associated. In this way, when processing the next large field of view frame, the current large field of view frame can be used as the previous large field of view frame of the next large field of view frame for tracking processing. This makes SLAM tracking more consistent, more accurate positioning, and more accurate maps.
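- A minimal sketch of this association step; `project` is a hypothetical callable returning the reprojection pixel of a map point in the current frame or None on failure, and the search radius and descriptor threshold are illustrative assumptions:

```python
import numpy as np

def associate_map_points(map_points, map_descs, features, feat_descs,
                         project, radius=5.0, max_desc_dist=40):
    """Associate local-map points with features of the current frame.

    project(p): reprojection pixel of map point p in the current frame,
                or None when the projection fails (point not observable).
    """
    associations = {}
    for i, p in enumerate(map_points):
        u = project(p)
        if u is None:
            continue  # not observable from the current pose
        best_j, best_d = -1, max_desc_dist
        for j, (kp, desc) in enumerate(zip(features, feat_descs)):
            if np.linalg.norm(np.asarray(kp, float) - u) <= radius:
                d = int(np.unpackbits(
                    np.bitwise_xor(desc, map_descs[i])).sum())
                if d < best_d:
                    best_j, best_d = j, d
        if best_j >= 0:
            associations[i] = best_j  # map point i <-> feature best_j
    return associations
```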
- the device 102 for simultaneous positioning and mapping may perform the above-mentioned mapping steps of the monocular large-field-of-view camera; alternatively, it may build the map based on feature points that match each other in the current left and right de-distorted images.
- the device 102 for simultaneous positioning and mapping may determine the feature points where the current left de-distorted image and the current right de-distorted image match each other; determine the direction vector corresponding to the first feature point based on the feature point of the current left de-distorted image and the camera center of the left camera of the current binocular large field of view camera; determine the direction vector corresponding to the second feature point based on the matched feature point of the current right de-distorted image and the camera center of the right camera of the current binocular large field of view camera; triangulate the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point to determine the map point corresponding to the feature points; and build a map based on the map points.
- the device 102 for simultaneous positioning and mapping may refer to the related description in the initialization step 810 to determine the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point, and to perform the triangulation.
- the mapping step 840 may further include local bundle optimization.
- the purpose of local bundle optimization is to adjust the poses of the key large field of view frames (or key binocular image frames) and the positions of the map points in the local map so as to minimize the reprojection errors of the map points on those key frames, thereby optimizing the established map.
- for each map point associated with a key large field of view frame, the map point is transformed into the coordinate system of the corresponding virtual pinhole camera based on the multi-virtual pinhole camera model, and then projected onto the imaging plane of that virtual pinhole camera to obtain the reprojection point of the map point.
- the reprojection error of the map point is determined according to the feature point associated with the map point and the reprojection point of the map point.
- the poses of the key large field of view frame and the positions of all map points associated with the key large field of view frame are updated.
- the process of bundle optimization in this step is consistent with the process in the above-mentioned global bundle optimization step 820, and for the sake of brevity, it will not be repeated here.
- the bundle optimization process for each key binocular image frame in the local map is as follows.
- the map points associated with the key binocular image frame are projected into a second multi-virtual pinhole camera model to obtain the reprojection points of the map points in the second model; the reprojection error of each map point is determined from its reprojection point in the second multi-virtual pinhole camera model and the feature point corresponding to the map point; and the right reprojection error is determined from the reprojection errors of the map points associated with all the key binocular image frames.
- based on the left reprojection error, the right reprojection error, or the sum of the left and right reprojection errors, the pose of the key binocular image frame and the positions of all map points associated with the key binocular image frame are updated.
- the device 102 for simultaneous positioning and mapping may perform a closed-loop detection processing step.
- the closed-loop detection processing steps of the monocular large-field-of-view camera and the binocular large-field-of-view camera may be the same.
- the following takes the closed-loop detection processing of the monocular large-field of view camera as an example.
- a vector based on a bag-of-words model is used to detect, in the current map database, a closed-loop large field of view frame similar to the current large field of view frame.
- a matching feature point pair between the closed-loop large-view frame and the current large-view frame is determined.
- a vector based on a bag-of-words model can be used to accelerate the matching between feature points.
- the map point associated with the feature point is transformed into the coordinate system of the corresponding virtual pinhole camera of the closed loop large field of view frame based on the multiple virtual pinhole camera model.
- the map point is then projected onto the imaging plane of the virtual pinhole camera to obtain a re-projection point of the map point in the closed-loop large field of view frame.
- a first re-projection error is determined according to the re-projection point and a matching feature point in the closed-loop large field of view frame.
- the first cumulative reprojection error is determined according to the first reprojection error of all matching feature points in the current large field of view frame.
- similarly, for each matched feature point in the closed-loop large field of view frame, the map point associated with that feature point is transformed, based on the multi-virtual pinhole camera model, into the coordinate system of the corresponding virtual pinhole camera of the current large field of view frame, and then projected onto the imaging plane of that virtual pinhole camera to obtain the reprojection point of the map point in the current large field of view frame.
- a second re-projection error is determined according to the re-projection point and a matching feature point in the current large field of view frame.
- a second cumulative reprojection error is determined according to the second reprojection error of all matching feature points in the closed-loop large field of view frame.
- a loss function is determined according to the first cumulative reprojection error and the second cumulative reprojection error.
- the similarity transformation matrix between the current large field of view frame and the closed-loop large field of view frame is optimized by minimizing the loss function.
- the optimized similarity transformation is then used to correct the key large field of view frames in the current map that have a common-view relationship with the current large field of view frame, together with the map points associated with them.
- if the number of common map points observed by two large field of view frames is larger than a common-view relationship threshold, the two large field of view frames are considered to have a common-view relationship.
- the common-view relationship threshold indicates the minimum number of common map points required for judging that two key large-view frames have a common-view relationship.
- the default setting value may be directly used, such as 20, or may be preset by the user.
- the poses of the key large field of view frames and the positions of the map points associated with them are corrected through the similarity transformation matrix, which completes the closed-loop detection process.
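- A sketch of how the two cumulative reprojection errors might be assembled into the loss being minimized; the `reproject_in_loop` and `reproject_in_current` callables are hypothetical placeholders for projections that apply the candidate similarity transform:

```python
import numpy as np

def loop_closure_loss(sim_params, pairs_fwd, pairs_bwd,
                      reproject_in_loop, reproject_in_current):
    """Loss for loop-closure correction (illustrative sketch).

    sim_params: parameters of the similarity transform being optimized.
    pairs_fwd:  (map_point, feature_pixel) pairs observed in the
                closed-loop large field of view frame.
    pairs_bwd:  (map_point, feature_pixel) pairs observed in the
                current large field of view frame.
    """
    e1 = sum(np.linalg.norm(reproject_in_loop(p, sim_params) - u) ** 2
             for p, u in pairs_fwd)     # first cumulative error
    e2 = sum(np.linalg.norm(reproject_in_current(p, sim_params) - u) ** 2
             for p, u in pairs_bwd)     # second cumulative error
    return e1 + e2  # minimized to solve the similarity transform
```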
- the closed-loop detection process further includes further optimizing the poses of all key large field-of-view frames in the current map and the positions of all map points through pose-graph optimization.
- the closed-loop detection process further includes finding and eliminating redundant key frames and map points to save system storage space while avoiding redundant computing operations.
- Steps 810 to 850 in the above embodiment provide an implementation of step 230 of a large field of view SLAM based on a multi-virtual pinhole camera model.
- alternatively, any conventional or future-developed large field of view SLAM method can be adopted.
- the above-mentioned optimization update processing for reprojection error calculation based on the multiple virtual pinhole camera model may be replaced with the optimization update processing based on the unit direction vector error calculation.
- the calculation based on the unit direction vector error achieves a final optimization goal by minimizing a difference between a unit direction vector corresponding to a map point and a unit direction vector corresponding to a feature point associated with the map point.
- the optimization target loss can be the distance between the unit direction vectors, the angle between the unit vectors, or another index describing the vector error.
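- A minimal sketch of these two error indices (the function name and the choice of chordal distance as the non-angular variant are assumptions):

```python
import numpy as np

def direction_vector_error(v_map, v_feat, use_angle=True):
    """Error between the unit direction vector of a map point and that of
    its associated feature point."""
    v1 = v_map / np.linalg.norm(v_map)
    v2 = v_feat / np.linalg.norm(v_feat)
    if use_angle:
        # angle between the two unit vectors, in radians
        return float(np.arccos(np.clip(np.dot(v1, v2), -1.0, 1.0)))
    return float(np.linalg.norm(v1 - v2))  # chordal (vector) distance
```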
- to streamline the disclosure, the present application sometimes combines various features into a single embodiment, drawing, or description thereof.
- conversely, the present application sometimes disperses various features across multiple embodiments.
- this does not mean that a combination of these features is necessary; those skilled in the art may well extract some of these features as a separate embodiment when reading this application.
- the embodiments in this application can also be understood as an integration of multiple secondary embodiments, and the content of each secondary embodiment may be less than all the features of a single embodiment disclosed above.
- numbers expressing quantities or properties used to describe and claim certain embodiments of the present application should be understood as modified in some cases by the terms "about", "approximately" or "substantially". For example, unless otherwise stated, "about", "approximately" or "substantially" may mean a ±20% variation in the value it describes. Accordingly, in some embodiments, the numerical parameters set forth in the written description and appended claims are approximations that can vary depending on the desired properties sought by a particular embodiment. In some embodiments, the numerical parameters should be interpreted in light of the number of reported significant digits and by applying common rounding techniques. Although the numerical ranges and parameters set forth in some embodiments of this application are broad approximations, the numerical values given in the specific examples are reported as precisely as possible.
Claims (20)
- A method of simultaneous localization and mapping, characterized in that the method comprises: acquiring a large field of view image through a large field of view camera; obtaining, based on a multi-virtual pinhole camera model, a de-distorted image corresponding to the large field of view image; and determining the pose of the large field of view camera and constructing a map based on the de-distorted image; wherein the multi-virtual pinhole camera model comprises at least two virtual pinhole cameras with different orientations, and the camera centers of the at least two differently oriented virtual pinhole cameras coincide with the camera center of the large field of view camera.
- The method of simultaneous localization and mapping of claim 1, characterized in that the large field of view camera is a monocular large field of view camera; and determining the pose of the large field of view camera and constructing a map based on the de-distorted image comprises an initialization step, the initialization step comprising: acquiring a de-distorted image corresponding to a first moment and a de-distorted image corresponding to a second moment; determining feature points where the de-distorted image corresponding to the first moment and the de-distorted image corresponding to the second moment match each other; and constructing an initial map based on the mutually matched feature points.
- The method of simultaneous localization and mapping of claim 2, characterized in that constructing an initial map based on the mutually matched feature points comprises: determining a direction vector corresponding to a first feature point based on the feature point in the de-distorted image corresponding to the first moment and the camera center of the large field of view camera at the first moment; determining a direction vector corresponding to a second feature point based on the matched feature point in the de-distorted image corresponding to the second moment and the camera center of the large field of view camera at the second moment; triangulating the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point to determine the map point corresponding to the feature points; and constructing an initial map based on the map points.
- The method of simultaneous localization and mapping of claim 1, characterized in that the large field of view camera is a monocular large field of view camera; and determining the pose of the large field of view camera and constructing a map based on the de-distorted image comprises a global bundle optimization step, the global bundle optimization step comprising: for each key large field of view frame in the map, projecting each map point associated with the key large field of view frame into the multi-virtual pinhole camera model to obtain the reprojection point of the map point in the multi-virtual pinhole camera model; determining the reprojection error of the map point according to its reprojection point in the multi-virtual pinhole camera model and the feature point corresponding to the map point; determining a reprojection error according to the reprojection errors of the map points associated with all the key large field of view frames; and, based on the reprojection error, updating the poses of the key large field of view frames and the positions of all map points associated with the key large field of view frames.
- The method of simultaneous localization and mapping of claim 1, characterized in that the large field of view camera is a monocular large field of view camera; and determining the pose of the large field of view camera and constructing a map based on the de-distorted image comprises a tracking step, the tracking step comprising: for each map point associated with the current large field of view frame, projecting the map point into the multi-virtual pinhole camera model to obtain the reprojection point of the map point in the multi-virtual pinhole camera model; determining the reprojection error of the map point according to its reprojection point in the multi-virtual pinhole camera model and the feature point corresponding to the map point; determining a reprojection error according to the reprojection errors of the map points associated with the current large field of view frame; and, based on the reprojection error, updating the pose of the current large field of view frame.
- The method of simultaneous localization and mapping of claim 1, characterized in that the large field of view camera is a monocular large field of view camera; and determining the pose of the large field of view camera and constructing a map based on the de-distorted image comprises a mapping step, the mapping step comprising: determining feature points where the current large field of view frame and its reference frame match each other; determining a direction vector corresponding to a first feature point based on the feature point of the current large field of view frame and the current camera center of the large field of view camera; determining a direction vector corresponding to a second feature point based on the matched feature point of the reference frame and the camera center of the large field of view camera corresponding to the reference frame; triangulating the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point to determine the map point corresponding to the feature points; and constructing a map based on the map points.
- The method of simultaneous localization and mapping of claim 6, characterized in that the mapping step further comprises a local bundle optimization step, the local bundle optimization step comprising: for each key large field of view frame in the local map, projecting each map point associated with the key large field of view frame into the multi-virtual pinhole camera model to obtain the reprojection point of the map point in the multi-virtual pinhole camera model; determining the reprojection error of the map point according to its reprojection point in the multi-virtual pinhole camera model and the feature point corresponding to the map point; determining a reprojection error according to the reprojection errors of the map points associated with all the key large field of view frames; and, according to the reprojection error, updating the poses of the key large field of view frames and the positions of all map points associated with each key large field of view frame.
- The method of simultaneous localization and mapping of claim 1, characterized in that the large field of view camera is a binocular large field of view camera; the method comprises: acquiring a left field of view image and a right field of view image through the binocular large field of view camera; obtaining, based on a first multi-virtual pinhole camera model, a left de-distorted image corresponding to the left field of view image; obtaining, based on a second multi-virtual pinhole camera model, a right de-distorted image corresponding to the right field of view image; and determining the pose of the binocular large field of view camera and constructing a map based on the left de-distorted image and the right de-distorted image; wherein the first multi-virtual pinhole camera model comprises at least two differently oriented virtual pinhole cameras whose camera centers coincide with the camera center of the left camera of the binocular large field of view camera, and the second multi-virtual pinhole camera model comprises at least two differently oriented virtual pinhole cameras whose camera centers coincide with the camera center of the right camera of the binocular large field of view camera.
- The method of simultaneous localization and mapping of claim 8, characterized in that determining the pose of the binocular large field of view camera and constructing a map based on the left de-distorted image and the right de-distorted image comprises an initialization step, the initialization step comprising: determining feature points where the left de-distorted image and the right de-distorted image match each other; and constructing an initial map based on the mutually matched feature points.
- The method of simultaneous localization and mapping of claim 9, characterized in that determining feature points where the left de-distorted image and the right de-distorted image match each other comprises: determining the epipolar line in the right de-distorted image corresponding to a feature point in the left de-distorted image; and searching along the epipolar line for a feature point matching the feature point in the left de-distorted image; wherein the epipolar line is a polyline of multiple line segments.
- The method of simultaneous localization and mapping of claim 9, characterized in that constructing an initial map based on the mutually matched feature points comprises: determining a direction vector corresponding to a first feature point based on the feature point in the left de-distorted image and the camera center of the left camera of the binocular large field of view camera; determining a direction vector corresponding to a second feature point based on the matched feature point in the right de-distorted image and the camera center of the right camera of the binocular large field of view camera; triangulating, based on the baseline of the binocular large field of view camera, the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point to determine the map point corresponding to the feature points; and constructing an initial map based on the map points.
- The method of simultaneous localization and mapping of claim 8, characterized in that determining the pose of the binocular large field of view camera and constructing a map based on the left de-distorted image and the right de-distorted image comprises a global bundle optimization step, the global bundle optimization step comprising: for each key binocular image frame in the map, projecting the map points associated with the key binocular image frame into a first multi-virtual pinhole camera model to obtain the reprojection points of the map points in the first multi-virtual pinhole camera model; determining the reprojection error of each map point according to its reprojection point in the first multi-virtual pinhole camera model and the feature point corresponding to the map point; and determining a left reprojection error according to the reprojection errors of the map points associated with all the key binocular image frames; or projecting the map points associated with the key binocular image frame into a second multi-virtual pinhole camera model to obtain the reprojection points of the map points in the second multi-virtual pinhole camera model; determining the reprojection error of each map point according to its reprojection point in the second multi-virtual pinhole camera model and the feature point corresponding to the map point; and determining a right reprojection error according to the reprojection errors of the map points associated with all the key binocular image frames; and, based on the left reprojection error, the right reprojection error, or the sum of the left and right reprojection errors, updating the pose of the key binocular image frame and the positions of all map points associated with the key binocular image frame.
- The method of simultaneous localization and mapping of claim 8, characterized in that determining the pose of the binocular large field of view camera and constructing a map based on the left de-distorted image and the right de-distorted image comprises a tracking step, the tracking step comprising: for each map point associated with the current binocular image frame, projecting the map point into a first multi-virtual pinhole camera model to obtain the reprojection point of the map point in the first multi-virtual pinhole camera model; determining the reprojection error of the map point according to its reprojection point in the first multi-virtual pinhole camera model and the feature point corresponding to the map point; and determining a left reprojection error according to the reprojection errors of the map points associated with the current binocular image frame; or projecting the map point into a second multi-virtual pinhole camera model to obtain the reprojection point of the map point in the second multi-virtual pinhole camera model; determining the reprojection error of the map point according to its reprojection point in the second multi-virtual pinhole camera model and the feature point corresponding to the map point; and determining a right reprojection error according to the reprojection errors of the map points associated with the current binocular image frame; and, based on the left reprojection error, the right reprojection error, or the sum of the left and right reprojection errors, updating the pose of the current binocular image frame.
- The method of simultaneous localization and mapping of claim 8, characterized in that determining the pose of the binocular large field of view camera and constructing a map based on the left de-distorted image and the right de-distorted image comprises a mapping step, the mapping step comprising: determining feature points where the current left de-distorted image and the current right de-distorted image match each other; determining a direction vector corresponding to a first feature point based on the feature point of the current left de-distorted image and the camera center of the left camera of the current binocular large field of view camera; determining a direction vector corresponding to a second feature point based on the feature point of the current right de-distorted image and the camera center of the right camera of the current binocular large field of view camera; triangulating the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point to determine the map point corresponding to the feature points; and constructing a map based on the map points.
- The method of simultaneous localization and mapping of claim 14, characterized in that the mapping step further comprises a local bundle optimization step, the local bundle optimization step comprising: for each key binocular image frame in the local map, projecting the map points associated with the key binocular image frame into a first multi-virtual pinhole camera model to obtain the reprojection points of the map points in the first multi-virtual pinhole camera model; determining the reprojection error of each map point according to its reprojection point in the first multi-virtual pinhole camera model and the feature point corresponding to the map point; and determining a left reprojection error according to the reprojection errors of the map points associated with all the key binocular image frames; or projecting the map points associated with the key binocular image frame into a second multi-virtual pinhole camera model to obtain the reprojection points of the map points in the second multi-virtual pinhole camera model; determining the reprojection error of each map point according to its reprojection point in the second multi-virtual pinhole camera model and the feature point corresponding to the map point; and determining a right reprojection error according to the reprojection errors of the map points associated with all the key binocular image frames; and, based on the left reprojection error, the right reprojection error, or the sum of the left and right reprojection errors, updating the pose of the key binocular image frame and the positions of all map points associated with the key binocular image frame.
- The method of simultaneous localization and mapping of claim 1, characterized in that determining the pose of the large field of view camera and constructing a map based on the de-distorted image comprises a closed-loop detection processing step, the closed-loop detection processing step comprising: when the current large field of view frame is a key large field of view frame, determining a closed-loop large field of view frame in the map database that is similar to the current large field of view frame; determining feature points where the current large field of view frame and the closed-loop large field of view frame match each other; for each matched feature point in the current large field of view frame, transforming the map point associated with that feature point into the coordinate system of the multi-virtual pinhole camera model corresponding to the closed-loop large field of view frame, projecting it onto the imaging plane of the multi-virtual pinhole camera model to obtain the reprojection point of the map point in the closed-loop large field of view frame, and determining a first reprojection error according to the reprojection point and the matched feature point in the closed-loop large field of view frame; determining a first cumulative reprojection error according to the first reprojection errors of all matched feature points in the current large field of view frame; for each matched feature point in the closed-loop large field of view frame, transforming the map point associated with that feature point into the coordinate system of the multi-virtual pinhole camera model corresponding to the current large field of view frame, projecting it onto the imaging plane of the multi-virtual pinhole camera model to obtain the reprojection point of the map point in the current large field of view frame, and determining a second reprojection error according to the reprojection point and the matched feature point in the current large field of view frame; determining a second cumulative reprojection error according to the second reprojection errors of all matched feature points in the closed-loop large field of view frame; and using the first cumulative reprojection error and the second cumulative reprojection error to correct the key large field of view frames in the map that have a common-view relationship with the current large field of view frame, together with the map points associated with them.
- The method of simultaneous localization and mapping of claim 1, characterized in that the at least two different orientations comprise: the front, up, down, left, or right orientation of a cube.
- A device for simultaneous localization and mapping, comprising: at least one storage device including a set of instructions; and at least one processor in communication with the at least one storage device, wherein, when executing the set of instructions, the at least one processor is configured to cause the device for simultaneous localization and mapping to: acquire a large field of view image through a large field of view camera; obtain, based on a multi-virtual pinhole camera model, a de-distorted image corresponding to the large field of view image; and determine the pose of the large field of view camera and construct a map based on the de-distorted image; wherein the multi-virtual pinhole camera model comprises at least two virtual pinhole cameras with different orientations, and the camera centers of the at least two differently oriented virtual pinhole cameras coincide with the camera center of the large field of view camera.
- The device for simultaneous localization and mapping of claim 18, characterized in that the large field of view camera is a monocular large field of view camera; and in order to determine the pose of the large field of view camera and construct a map based on the de-distorted image, the at least one processor is further configured to cause the device for simultaneous localization and mapping to perform an initialization step, the initialization step comprising: acquiring a de-distorted image corresponding to a first moment and a de-distorted image corresponding to a second moment; determining feature points where the de-distorted image corresponding to the first moment and the de-distorted image corresponding to the second moment match each other; and constructing an initial map based on the mutually matched feature points.
- A non-transitory computer-readable medium comprising a computer program product, the computer program product comprising instructions that cause a computing device to: acquire a large field of view image through a large field of view camera; obtain, based on a multi-virtual pinhole camera model, a de-distorted image corresponding to the large field of view image; and determine the pose of the large field of view camera and construct a map based on the de-distorted image; wherein the multi-virtual pinhole camera model comprises at least two virtual pinhole cameras with different orientations, and the camera centers of the at least two differently oriented virtual pinhole cameras coincide with the camera center of the large field of view camera.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019572827A JP7096274B2 (ja) | 2018-06-07 | 2018-12-28 | Method and device for performing self-position estimation and environment map creation simultaneously |
KR1020197039024A KR102367361B1 (ko) | 2018-06-07 | 2018-12-28 | Method and device for localization and simultaneous mapping |
US16/627,768 US11017545B2 (en) | 2018-06-07 | 2018-12-28 | Method and device of simultaneous localization and mapping |
EP18921621.1A EP3806036A4 (en) | 2018-06-07 | 2018-12-28 | METHOD AND DEVICE FOR SIMULTANEOUS LOCALIZATION AND MAPPING |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810578095.3A CN108776976B (zh) | 2018-06-07 | 2018-06-07 | Method, system and storage medium for simultaneous localization and mapping |
CN201810578095.3 | | |
CN201811401646.5 | | |
CN201811401646.5A CN111210476B (zh) | 2018-11-22 | 2018-11-22 | Method and device for simultaneous localization and mapping |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019233090A1 true WO2019233090A1 (zh) | 2019-12-12 |
Family
ID=68770787
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/124786 WO2019233090A1 (zh) | 2018-06-07 | 2018-12-28 | 一种同时定位与建图的方法及装置 |
Country Status (5)
Country | Link |
---|---|
US (1) | US11017545B2 (zh) |
EP (1) | EP3806036A4 (zh) |
JP (1) | JP7096274B2 (zh) |
KR (1) | KR102367361B1 (zh) |
WO (1) | WO2019233090A1 (zh) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2020287875A1 (en) * | 2019-06-07 | 2021-12-23 | Pictometry International Corp. | Using spatial filter to reduce bundle adjustment block size |
US11595568B2 (en) * | 2020-02-18 | 2023-02-28 | Occipital, Inc. | System for generating a three-dimensional scene of a physical environment |
CN113345032B (zh) * | 2021-07-07 | 2023-09-15 | 北京易航远智科技有限公司 | Initialization and mapping method and system based on highly distorted wide-angle camera images |
CN113465617B (zh) * | 2021-07-08 | 2024-03-19 | 上海汽车集团股份有限公司 | Map construction method and apparatus, and electronic device |
CN113506369A (zh) * | 2021-07-13 | 2021-10-15 | 阿波罗智能技术(北京)有限公司 | Method, apparatus, electronic device and medium for generating a map |
CN113781573B (zh) * | 2021-07-19 | 2024-04-23 | 长春理工大学 | Visual odometry method based on a binocular catadioptric panoramic camera |
CN116468786B (zh) * | 2022-12-16 | 2023-12-26 | 中国海洋大学 | Point-line-based semantic SLAM method for dynamic environments |
CN116009559B (zh) * | 2023-03-24 | 2023-06-13 | 齐鲁工业大学(山东省科学院) | Inspection robot and detection method for the inner wall of a water conveyance pipeline |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100776215B1 (ko) * | 2005-01-25 | 2007-11-16 | 삼성전자주식회사 | Apparatus and method for estimating the position of a moving body and generating a map using an upward-looking image, and computer-readable recording medium storing a computer program for controlling the apparatus |
JP2008102620A (ja) | 2006-10-17 | 2008-05-01 | Toyota Motor Corp | Image processing device |
KR101423139B1 (ko) * | 2012-06-19 | 2014-07-28 | 한양대학교 산학협력단 | Method for recognizing position and generating a map using three-dimensional straight lines, and moving body applying the method |
US10203762B2 (en) * | 2014-03-11 | 2019-02-12 | Magic Leap, Inc. | Methods and systems for creating virtual and augmented reality |
US10852838B2 (en) * | 2014-06-14 | 2020-12-01 | Magic Leap, Inc. | Methods and systems for creating virtual and augmented reality |
KR101592740B1 (ko) * | 2014-07-24 | 2016-02-15 | 현대자동차주식회사 | Apparatus and method for correcting image distortion of a wide-angle camera for vehicles |
KR101666959B1 (ko) * | 2015-03-25 | 2016-10-18 | ㈜베이다스 | Image processing apparatus having an automatic correction function for images acquired from a camera, and method thereof |
- 2018
- 2018-12-28 KR KR1020197039024A patent/KR102367361B1/ko active IP Right Grant
- 2018-12-28 EP EP18921621.1A patent/EP3806036A4/en active Pending
- 2018-12-28 JP JP2019572827A patent/JP7096274B2/ja active Active
- 2018-12-28 WO PCT/CN2018/124786 patent/WO2019233090A1/zh unknown
- 2018-12-28 US US16/627,768 patent/US11017545B2/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130182894A1 (en) * | 2012-01-18 | 2013-07-18 | Samsung Electronics Co., Ltd. | Method and apparatus for camera tracking |
CN106846467A (zh) * | 2017-01-23 | 2017-06-13 | 阿依瓦(北京)技术有限公司 | Physical scene modeling method and system based on optimization of each camera position |
CN107862744A (zh) * | 2017-09-28 | 2018-03-30 | 深圳万图科技有限公司 | Three-dimensional modeling method for aerial images and related products |
CN108776976A (zh) * | 2018-06-07 | 2018-11-09 | 驭势科技(北京)有限公司 | Method, system and storage medium for simultaneous localization and mapping |
Non-Patent Citations (1)
Title |
---|
See also references of EP3806036A4 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2021140780A (ja) * | 2020-02-28 | 2021-09-16 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Computer-implemented method and apparatus, electronic device, storage medium and computer program for map creation |
US11417014B2 (en) | 2020-02-28 | 2022-08-16 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for constructing map |
JP7150917B2 (ja) | 2020-02-28 | 2022-10-11 | Beijing Baidu Netcom Science Technology Co., Ltd. | Computer-implemented method and apparatus, electronic device, storage medium and computer program for map creation |
CN111461998A (zh) * | 2020-03-11 | 2020-07-28 | 中国科学院深圳先进技术研究院 | Environment reconstruction method and device |
CN112509047A (zh) * | 2020-12-10 | 2021-03-16 | 北京地平线信息技术有限公司 | Image-based pose determination method and apparatus, storage medium and electronic device |
Also Published As
Publication number | Publication date |
---|---|
JP7096274B2 (ja) | 2022-07-05 |
EP3806036A4 (en) | 2022-03-16 |
KR20200014858A (ko) | 2020-02-11 |
KR102367361B1 (ko) | 2022-02-23 |
US11017545B2 (en) | 2021-05-25 |
EP3806036A1 (en) | 2021-04-14 |
US20210082137A1 (en) | 2021-03-18 |
JP2021505979A (ja) | 2021-02-18 |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | ENP | Entry into the national phase | Ref document number: 2019572827; Country of ref document: JP; Kind code of ref document: A
 | ENP | Entry into the national phase | Ref document number: 20197039024; Country of ref document: KR; Kind code of ref document: A
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18921621; Country of ref document: EP; Kind code of ref document: A1
 | NENP | Non-entry into the national phase | Ref country code: DE
 | ENP | Entry into the national phase | Ref document number: 2018921621; Country of ref document: EP; Effective date: 20210111