WO2019233090A1 - Method and apparatus for simultaneous positioning and mapping - Google Patents

Method and apparatus for simultaneous positioning and mapping

Info

Publication number
WO2019233090A1
WO2019233090A1 PCT/CN2018/124786
Authority
WO
WIPO (PCT)
Prior art keywords
map
view
point
camera
large field
Prior art date
Application number
PCT/CN2018/124786
Other languages
English (en)
French (fr)
Inventor
王亚慧
蔡少骏
Original Assignee
驭势科技(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201810578095.3A external-priority patent/CN108776976B/zh
Priority claimed from CN201811401646.5A external-priority patent/CN111210476B/zh
Application filed by 驭势科技(北京)有限公司 filed Critical 驭势科技(北京)有限公司
Priority to JP2019572827A priority Critical patent/JP7096274B2/ja
Priority to EP18921621.1A priority patent/EP3806036A4/en
Priority to KR1020197039024A priority patent/KR102367361B1/ko
Priority to US16/627,768 priority patent/US11017545B2/en
Publication of WO2019233090A1 publication Critical patent/WO2019233090A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/003Navigation within 3D models or images
    • G06T3/047
    • G06T5/80
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/579Depth or shape recovery from multiple images from motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • G06T2207/10012Stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • G06T2207/10021Stereoscopic video; Stereoscopic image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30181Earth observation
    • G06T2207/30184Infrastructure
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30244Camera pose

Definitions

  • The invention relates to the field of simultaneous localization and mapping (SLAM), and in particular to simultaneous localization and mapping based on a large field of view camera.
  • Simultaneous Localization and Mapping is a technology that tracks the movement of a robot in real time and simultaneously builds a map of the surrounding environment to achieve positioning and navigation.
  • the camera used in traditional SLAM is a perspective camera or a pinhole camera. Due to the limited field-of-view of the camera, there are insufficient common features between the acquired images, which may cause the SLAM algorithm to lose track. Compared with the pinhole camera used in traditional SLAM, the large-field-of-view camera has a wider field of view, so it has received extensive research and attention.
  • Existing approaches fall into two categories. One is to apply a traditional de-distortion method to the large-field-of-view image obtained by the large-field-of-view camera, and then to treat the de-distorted image as a normal image and apply traditional SLAM to achieve simultaneous positioning and mapping.
  • This technical solution is simple and easy to implement, but the traditional de-distortion method leads to a large loss of viewing angle and cannot make full use of the wide viewing angle of a large field of view camera.
  • The other is to perform SLAM directly on large-field-of-view images without distortion correction, based on the large-field-of-view camera imaging model. That is, features are extracted and processed directly on the uncorrected large field of view image. The features extracted in this way may be affected by image distortion, and the complex imaging model of the large-field-of-view camera makes the optimization extremely complicated, thereby affecting the performance of the system.
  • The purpose of this application is to provide a method for simultaneous positioning and mapping.
  • Based on a multi-virtual pinhole camera model, the method de-distorts the large-field-of-view image acquired by a large-field-of-view camera, and then performs simultaneous positioning and mapping based on the de-distorted image.
  • the present application provides a method for simultaneous positioning and mapping.
  • The method includes: obtaining a large field of view image through a large field of view camera; obtaining a de-distortion image corresponding to the large field of view image based on a multi-virtual pinhole camera model; and determining the pose of the large field of view camera and constructing a map based on the de-distortion image.
  • the multi-virtual pinhole camera model includes at least two virtual pinhole cameras with different orientations, and a camera center of the at least two virtual pinhole cameras with different orientations coincides with a camera center of the large field of view camera.
  • the large field of view camera is a monocular large field of view camera.
  • The determining the pose of the large-field-of-view camera and constructing a map based on the de-distortion image includes an initialization step, which includes: obtaining a de-distortion image corresponding to a first time and a de-distortion image corresponding to a second time; determining the feature points at which the de-distortion image corresponding to the first time and the de-distortion image corresponding to the second time match each other; and constructing an initial map based on the mutually matched feature points.
  • The constructing an initial map based on the matched feature points includes: determining a direction vector corresponding to the first feature point based on the feature point in the de-distortion image corresponding to the first moment and the camera center of the large-field-of-view camera at the first moment; determining a direction vector corresponding to the second feature point based on the matched feature point in the de-distortion image corresponding to the second moment and the camera center of the large-field-of-view camera at the second moment; triangulating the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point to determine a map point corresponding to the feature points; and constructing an initial map based on the map point.
  • the large field of view camera is a monocular large field of view camera.
  • The determining the pose of the large-field-of-view camera and constructing a map based on the de-distortion image includes a global bundle optimization step, which includes: for each key large-field-of-view frame in the map, projecting each map point associated with the key large-field-of-view frame into the multi-virtual pinhole camera model to obtain a re-projection point of the map point in the multi-virtual pinhole camera model; determining the re-projection error of the map point according to the re-projection point of the map point in the multi-virtual pinhole camera model and the feature point corresponding to the map point; determining a re-projection error according to the re-projection errors of all map points associated with the key large-field-of-view frame; and, based on the re-projection error, updating the pose of the key large-field-of-view frame and the positions of all map points associated with the key large-field-of-view frame.
  • the large field of view camera is a monocular large field of view camera.
  • The determining the pose of the large field of view camera and constructing a map based on the de-distortion image includes a tracking step, which includes: for each map point associated with the current large field of view frame, projecting the map point into the multi-virtual pinhole camera model to obtain a re-projection point of the map point in the multi-virtual pinhole camera model; determining the re-projection error of the map point according to the re-projection point of the map point in the multi-virtual pinhole camera model and the feature point corresponding to the map point; determining a re-projection error according to the re-projection errors of all the map points associated with the current large field of view frame; and, based on the re-projection error, updating the pose of the current large field of view frame.
  • the large field of view camera is a monocular large field of view camera.
  • The determining the pose of the large-field-of-view camera and constructing a map based on the de-distortion image includes a mapping step, which includes: determining the feature points at which the current large-field-of-view frame and its reference frame match each other; determining a direction vector corresponding to the first feature point based on the feature point of the current large-field-of-view frame and the camera center of the large-field-of-view camera corresponding to the current frame; determining a direction vector corresponding to the second feature point based on the matched feature point of the reference frame and the camera center of the large-field-of-view camera corresponding to the reference frame; triangulating the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point to determine a map point corresponding to the feature points; and constructing a map based on the map point.
  • the mapping step further includes a local bundling optimization step.
  • The local bundle optimization step includes: for each key large-field-of-view frame in the local map, projecting each map point associated with the key large-field-of-view frame into the multi-virtual pinhole camera model to obtain a re-projection point of the map point in the multi-virtual pinhole camera model; determining the re-projection error of the map point according to the re-projection point of the map point in the multi-virtual pinhole camera model and the feature point corresponding to the map point; determining a re-projection error according to the re-projection errors of all map points associated with the key large-field-of-view frame; and, based on the re-projection error, updating the pose of the key large-field-of-view frame and the positions of all map points associated with the key large-field-of-view frame.
  • the large field of view camera is a binocular large field of view camera.
  • The method includes: obtaining a left field of view image and a right field of view image through the binocular large field of view camera; obtaining a left de-distortion image corresponding to the left field of view image based on a first multi-virtual pinhole camera model; obtaining a right de-distortion image corresponding to the right field of view image based on a second multi-virtual pinhole camera model; and determining a pose of the binocular large field of view camera and constructing a map based on the left de-distortion image and the right de-distortion image.
  • The first multi-virtual pinhole camera model includes at least two virtual pinhole cameras of different orientations, and the camera centers of the at least two virtual pinhole cameras of different orientations coincide with the camera center of the left camera of the binocular large-field-of-view camera;
  • the second multi-virtual pinhole camera model includes at least two virtual pinhole cameras of different orientations, and the camera centers of the at least two virtual pinhole cameras of different orientations coincide with the camera center of the right camera of the binocular large-field-of-view camera.
  • The determining the pose of the binocular large-field-of-view camera and constructing a map based on the left de-distortion image and the right de-distortion image includes an initialization step, which includes: determining the feature points at which the left de-distorted image and the right de-distorted image match each other; and constructing an initial map based on the mutually matched feature points.
  • The determining the feature points at which the left de-distortion image and the right de-distortion image match each other includes: determining, in the right de-distortion image, the epipolar line corresponding to a feature point in the left de-distortion image; and searching along the epipolar line for the feature point that matches the feature point in the left de-distortion image.
  • the epipolar line is a multi-line segment polyline.
  • The constructing an initial map based on the matched feature points includes: determining a direction vector corresponding to the first feature point based on the feature point in the left de-distorted image and the camera center of the left camera of the binocular large-field-of-view camera; determining a direction vector corresponding to the second feature point based on the matched feature point in the right de-distorted image and the camera center of the right camera of the binocular large-field-of-view camera; triangulating, based on the baseline of the binocular large-field-of-view camera, the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point to determine a map point corresponding to the feature points; and constructing an initial map based on the map point.
  • the determining the pose of the binocular large-field-of-view camera and constructing a map based on the left de-distorted image and the right de-distorted image includes a global bundling optimization step.
  • The global bundle optimization step includes: for each key binocular image frame in the map, projecting each map point associated with the key binocular image frame into the first multi-virtual pinhole camera model to obtain a re-projection point of the map point in the first multi-virtual pinhole camera model; determining the re-projection error of the map point according to the re-projection point of the map point in the first multi-virtual pinhole camera model and the feature point corresponding to the map point; determining a left re-projection error according to the re-projection errors of the map points associated with all the key binocular image frames; or projecting a map point associated with the key binocular image frame into the second multi-virtual pinhole camera model to obtain a re-projection point of the map point in the second multi-virtual pinhole camera model, determining the re-projection error of the map point according to that re-projection point and the feature point corresponding to the map point, and determining a right re-projection error accordingly.
  • the determining a pose of the binocular large field of view camera and constructing a map based on the left de-distorted image and the right de-distorted image includes a tracking step.
  • The tracking step includes: for each map point associated with the current binocular image frame, projecting the map point into the first multi-virtual pinhole camera model to obtain a re-projection point of the map point in the first multi-virtual pinhole camera model; determining the re-projection error of the map point according to the re-projection point of the map point in the first multi-virtual pinhole camera model and the feature point corresponding to the map point; determining a left re-projection error according to the re-projection errors of all the map points associated with the current binocular image frame; or projecting the map point into the second multi-virtual pinhole camera model to obtain a re-projection point of the map point in the second multi-virtual pinhole camera model, determining the re-projection error of the map point according to that re-projection point and the corresponding feature point, and determining a right re-projection error accordingly.
  • determining the pose of the binocular large-field-of-view camera and constructing a map based on the left de-distorted image and the right de-distorted image includes a mapping step.
  • The mapping step includes: determining the feature points at which the current left de-distortion image and the current right de-distortion image match each other; determining a direction vector corresponding to the first feature point based on the feature point of the current left de-distortion image and the camera center of the left camera of the current binocular large-field-of-view camera; determining a direction vector corresponding to the second feature point based on the feature point of the current right de-distortion image and the camera center of the right camera of the current binocular large-field-of-view camera; triangulating the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point to determine a map point corresponding to the feature points; and constructing a map based on the map point.
  • the mapping step further includes a local bundling optimization step.
  • The local bundle optimization step includes: for each key binocular image frame in the local map, projecting each map point associated with the key binocular image frame into the first multi-virtual pinhole camera model to obtain a re-projection point of the map point in the first multi-virtual pinhole camera model; determining the re-projection error of the map point according to the re-projection point of the map point in the first multi-virtual pinhole camera model and the feature point corresponding to the map point; determining a left re-projection error according to the re-projection errors of the map points associated with all the key binocular image frames; or projecting a map point associated with the key binocular image frame into the second multi-virtual pinhole camera model to obtain a re-projection point of the map point in the second multi-virtual pinhole camera model, and determining the re-projection error of the map point according to that re-projection point and the feature point corresponding to the map point, so as to determine a right re-projection error.
  • determining the pose of the large field of view camera and constructing a map based on the de-distortion image includes a closed-loop detection processing step.
  • The closed-loop detection processing step includes: when the current large-field-of-view frame is a key large-field-of-view frame, determining a closed-loop large-field-of-view frame in the map database that is similar to the current large-field-of-view frame; determining the feature points at which the current large-field-of-view frame and the closed-loop large-field-of-view frame match each other; and, for each matched feature point in the current large-field-of-view frame, transforming the map point associated with the feature point into the coordinate system of the multi-virtual pinhole camera model corresponding to the closed-loop large-field-of-view frame and then projecting it onto the imaging plane of the multi-virtual pinhole camera model to obtain the re-projection point of the map point in the closed-loop large-field-of-view frame.
  • the current large-view frame has a key large-view frame with a common view relationship and
  • the at least two different orientations include a front orientation, an upward orientation, a downward orientation, a left orientation, or a right orientation of the cube.
  • An aspect of the present application provides an apparatus for simultaneous positioning and mapping.
  • the apparatus includes at least one storage device, the storage device includes a set of instructions, and at least one processor in communication with the at least one storage device.
  • The at least one processor is configured to cause the apparatus for simultaneous positioning and mapping to: obtain a large field of view image through a large field of view camera; obtain a de-distortion image corresponding to the large field of view image based on a multi-virtual pinhole camera model; and determine a pose of the large field of view camera and construct a map based on the de-distortion image.
  • the multiple virtual pinhole camera model includes at least two virtual pinhole cameras with different orientations, and a camera center of the at least two virtual pinhole cameras with different orientations coincides with a camera center of the large field of view camera.
  • FIG. 1 illustrates a system for simultaneous positioning and mapping according to some embodiments of the present application
  • FIG. 2 shows a flowchart of a method for simultaneous positioning and mapping according to some embodiments of the present application
  • FIG. 3 illustrates a multiple virtual pinhole camera model including two orientations according to some embodiments of the present application
  • FIG. 4 illustrates a multi-virtual pinhole camera model including five orientations according to some embodiments of the present application
  • FIG. 5 shows a schematic diagram of distortion removal based on a multiple virtual pinhole camera model according to some embodiments of the present application
  • FIG. 6 illustrates an original monocular fisheye image, a monocular fisheye image after traditional de-distortion, and a monocular fisheye image after de-distortion using the method of the present disclosure, according to some embodiments of the present application;
  • FIG. 7 illustrates an original binocular fisheye image and a conventional binocular fisheye image after de-distortion, according to some embodiments of the present application
  • FIG. 8 shows a flowchart of determining a camera pose and constructing a map according to some embodiments of the present application
  • FIG. 9 shows a schematic diagram of a map point constructed by a monocular large field of view camera according to some embodiments of the present application.
  • FIG. 10 is a schematic diagram of polar line search of a binocular large field of view camera according to some embodiments of the present application.
  • FIG. 11 is a schematic diagram illustrating a map point constructed by a binocular large field of view camera according to some embodiments of the present application.
  • The flowcharts used in the present disclosure illustrate operations implemented by a system according to some embodiments of the present disclosure. It should be understood that the operations of a flowchart need not be implemented in the order shown; operations may be performed in a different order or simultaneously. In addition, one or more other operations may be added to a flowchart, and one or more operations may be removed from it.
  • One aspect of the present disclosure relates to a method of simultaneous positioning and mapping.
  • the method includes de-distorting the large-field-of-view image acquired by the large-field-of-view camera into a de-distorted image based on the multi-virtual pinhole camera model; determining the pose of the large-field-of-view camera and constructing a map based on the de-distorted image.
  • the multiple virtual pinhole camera model includes at least two virtual pinhole cameras with different orientations, and a camera center of the at least two virtual pinhole cameras with different orientations coincides with a camera center of the large field of view camera.
  • FIG. 1 illustrates a system for simultaneous positioning and mapping according to some embodiments of the present application.
  • the system 100 for simultaneous positioning and mapping can acquire a large field of view image and execute a method of simultaneous positioning and mapping.
  • For the method of simultaneous positioning and mapping, reference may be made to the descriptions of FIG. 2 to FIG. 11.
  • the system 100 for simultaneous positioning and mapping may include a large field of view camera 101 and a device 102 for simultaneous positioning and mapping.
  • the large-field-of-view camera 101 and the device 102 for simultaneous positioning and mapping may be installed as a whole or separately.
  • the large field of view camera 101 is used to acquire a fish-eye image of a scene.
  • the large-field-of-view camera 101 may be a fish-eye camera, a refracting camera, or a panoramic imaging camera.
  • the large-field-of-view camera 101 may be a monocular large-field-of-view camera, a binocular large-field-of-view camera, or a multi-eye large-field-of-view camera.
  • the large field of view camera 101 includes a monocular fisheye camera and a binocular fisheye camera.
  • the left camera of a binocular fisheye camera is called the left eye; the right camera of the binocular fisheye camera is called the right eye.
  • the image acquired by the left eye is called a left fisheye image (left field of view image), and the image acquired by the right eye is called a right fisheye image (right field of view image).
  • the device 102 for simultaneous positioning and mapping is an exemplary computing device that can perform the method of simultaneous positioning and mapping.
  • the device 102 for simultaneous positioning and mapping may include a COM port 150 to facilitate data communication.
  • the device 102 for simultaneously positioning and mapping may further include a processor 120 in the form of one or more processors for executing computer instructions.
  • Computer instructions may include, for example, routines, programs, objects, components, data structures, procedures, modules, and functions that perform the specific functions described herein.
  • The processor 120 may determine a de-distorted image of the fish-eye image based on the multi-virtual pinhole camera model.
  • The processor 120 may determine the pose of the large-field-of-view camera 101 and construct a map based on the de-distorted image.
  • the processor 120 may include one or more hardware processors, such as a microcontroller, microprocessor, reduced instruction set computer (RISC), application specific integrated circuit (ASIC), application-specific instruction-set Processor (ASIP), Central Processing Unit (CPU), Graphics Processing Unit (GPU), Physical Processing Unit (PPU), Microcontroller Unit, Digital Signal Processor (DSP), Field Programmable Gate Array (FPGA), Advanced RISC machine (ARM), programmable logic device (PLD), any circuit or processor capable of performing one or more functions, etc., or any combination thereof.
  • The device 102 for simultaneous positioning and mapping may include an internal communication bus 110, program storage, and different forms of data storage (e.g., a disk 170, a read-only memory (ROM) 130, or a random access memory (RAM) 140).
  • the device 102 for simultaneously positioning and mapping may also include program instructions stored in ROM 130, RAM 140, and / or other types of non-transitory storage media to be executed by processor 120.
  • the methods and / or processes of the present application may be implemented as program instructions.
  • the device 102 for simultaneous positioning and mapping also includes an I / O component 160 that supports input / output between the computer and other components (eg, user interface elements).
  • The device 102 for simultaneous positioning and mapping can also receive programming and data through network communication.
  • The device 102 for simultaneous positioning and mapping in this application may also include multiple processors; therefore, the operations and/or method steps disclosed in this application may be performed by one processor, as described in this disclosure, or jointly by multiple processors.
  • For example, if the processor 120 of the device 102 for simultaneous positioning and mapping executes steps A and B, it should be understood that steps A and B may also be performed, jointly or separately, by two different processors (e.g., the first processor performs step A and the second processor performs step B, or the first and second processors perform steps A and B together).
  • FIG. 2 shows a flowchart of a method for simultaneous positioning and mapping according to some embodiments of the present application.
  • the process 200 may be implemented as a set of instructions in a non-transitory storage medium in the device 102 that simultaneously locates and maps.
  • the device 102 for positioning and mapping at the same time can execute the set of instructions and can execute the steps in the process 200 accordingly.
  • The process 200 may include one or more additional operations not described herein, and/or omit one or more of the operations described herein. Furthermore, the order of operations shown in FIG. 2 and described below is not intended to be limiting.
  • the device 102 for simultaneous positioning and mapping can acquire a large field of view image through the large field of view camera 101.
  • When the large field of view camera 101 is a monocular large field of view camera, the monocular large field of view camera acquires a large field of view image; when the large field of view camera 101 is a binocular large field of view camera, the binocular large field of view camera acquires large field of view images including a left field of view image and a right field of view image.
  • the device 102 for simultaneous positioning and mapping may obtain a de-distortion image corresponding to the large field-of-view image based on a multiple virtual pinhole camera model.
  • the above multiple virtual pinhole camera model may include at least two virtual pinhole cameras with different orientations, and a camera center of the at least two virtual pinhole cameras with different orientations coincides with a camera center of the monocular large field of view camera.
  • The device 102 for simultaneous positioning and mapping may obtain a left de-distorted image corresponding to the left field of view image based on the first multi-virtual pinhole camera model, and obtain a right de-distorted image corresponding to the right field of view image based on the second multi-virtual pinhole camera model.
  • the first multiple virtual pinhole camera model and the second multiple virtual pinhole camera model may be the same or different.
  • the first multi-virtual pinhole camera model may include at least two virtual pinhole cameras of different orientations, and the camera centers of the at least two virtual pinhole cameras of different orientations coincide with the left-purpose camera center of the large field of view camera 101;
  • the second multi-virtual pinhole camera model may include at least two virtual pinhole cameras with different orientations, and the camera centers of the at least two virtual pinhole cameras with different orientations coincide with the right-view camera center of the large field of view camera 101.
  • FIG. 3 illustrates a multiple virtual pinhole camera model including two orientations, according to some embodiments of the present application.
  • the orientations of the two virtual pinhole cameras are at an angle of 90 degrees, and the camera center coincides with the camera center of the large field of view camera at point C.
  • FIG. 4 illustrates a multi-virtual pinhole camera model including five orientations according to some embodiments of the present application.
  • the multi-virtual pinhole camera model includes a virtual pinhole camera with a total of 5 orientations: forward, upward, downward, left, and right.
  • The camera centers of the five virtual pinhole cameras and the camera center of the large field of view camera coincide at point C.
  • The above-mentioned de-distortion method is called the cubemap-based de-distortion method (hereinafter referred to as the cube model).
  • The device 102 for simultaneous positioning and mapping may project the large field of view image (or left field of view image, right field of view image) onto the multi-virtual pinhole camera model (or the first multi-virtual pinhole camera model, the second multi-virtual pinhole camera model), obtain the projection images of the at least two virtual pinhole cameras with different orientations, and unfold the projection images of the at least two virtual pinhole cameras with different orientations to obtain the de-distorted image corresponding to the large field of view image.
  • Referring to FIG. 5, a schematic diagram of de-distortion based on a multi-virtual pinhole camera model according to some embodiments of the present application is shown.
  • the first multiple virtual pinhole camera model and the left field of view image are taken as examples.
  • Point A is the camera center of the left eye of the binocular large field of view camera, and points B, C, and D are exemplary pixels in the left field of view image.
  • The first multi-virtual pinhole camera model 510 is a cube model and includes virtual pinhole cameras of five orientations, namely the front, upward, downward, left, and right orientations of the cube. The camera centers of the five virtual pinhole cameras coincide at point A.
  • The left field of view image is projected onto the imaging planes of the five differently oriented virtual pinhole cameras of the first multi-virtual pinhole camera model 510. Accordingly, five differently oriented projection images can be obtained; by unfolding the five projection images, the left de-distorted image can be obtained.
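As a concrete illustration of this projection-and-unfolding step, the sketch below renders one virtual pinhole face from a fisheye image. It is a minimal sketch under stated assumptions, not the patent's implementation: an equidistant fisheye model (r = f_fish·θ) is assumed, the parameters face_size, f_pin, and f_fish are illustrative, and the face rotations are only one possible assignment of the five orientations.

```python
import numpy as np

def face_rotations():
    """One possible assignment of the five virtual pinhole orientations (illustrative)."""
    def rot_x(a):
        c, s = np.cos(a), np.sin(a)
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    def rot_y(a):
        c, s = np.cos(a), np.sin(a)
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    half = np.pi / 2
    return {"front": np.eye(3), "up": rot_x(-half), "down": rot_x(half),
            "left": rot_y(-half), "right": rot_y(half)}

def render_face(fisheye, R, face_size=400, f_pin=200.0, f_fish=300.0):
    """Back-project each pixel of one virtual pinhole face into a ray, rotate the ray
    into the fisheye camera frame, and sample the fisheye image (equidistant model)."""
    h, w = fisheye.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    u, v = np.meshgrid(np.arange(face_size), np.arange(face_size))
    x = (u - face_size / 2.0) / f_pin
    y = (v - face_size / 2.0) / f_pin
    rays = np.stack([x, y, np.ones_like(x)], axis=-1)     # rays in the virtual camera frame
    rays = rays @ R.T                                     # rotate into the fisheye camera frame
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    theta = np.arccos(np.clip(rays[..., 2], -1.0, 1.0))   # angle from the optical axis
    phi = np.arctan2(rays[..., 1], rays[..., 0])
    r = f_fish * theta                                    # equidistant fisheye projection
    src_u = (cx + r * np.cos(phi)).round().astype(int)
    src_v = (cy + r * np.sin(phi)).round().astype(int)
    valid = (src_u >= 0) & (src_u < w) & (src_v >= 0) & (src_v < h)
    face = np.zeros((face_size, face_size) + fisheye.shape[2:], dtype=fisheye.dtype)
    face[valid] = fisheye[src_v[valid], src_u[valid]]
    return face

# The five rendered faces, once unfolded side by side, form the cube-model de-distorted image:
# faces = {name: render_face(fisheye_image, R) for name, R in face_rotations().items()}
```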
  • FIG. 6 shows an original monocular fisheye image, a monocular fisheye image after traditional de-distortion, and a monocular fisheye image after de-distortion using the method of the present disclosure, according to some embodiments of the present application.
  • Figure 610 shows a large field of view image obtained with a monocular fisheye camera. It can be seen that the large field of view image has a wider field of view than the image obtained by an ordinary camera, but the entire image has spatial distortion, and the distortion is greater as it is farther from the center of the image.
  • Figure 620 shows a de-distortion image obtained by performing a de-distortion process on the large-field-of-view image using a conventional de-distortion method.
  • The angle of view of an image obtained by an ordinary camera is generally about 80 degrees, while the angle of view of image 620 is about 100 degrees.
  • Although this is an improvement over an ordinary camera, image 620 still loses a large part of the angle of view compared to the image before de-distortion. As a result, a map covering all the perspectives of the large field of view image cannot be constructed.
  • Image 630 is a large field of view de-distortion image obtained by de-distorting and unfolding with the five-orientation multi-virtual pinhole camera model according to an embodiment of the present invention, that is, a de-distortion image obtained with the cube model. As shown, image 630 retains all perspectives of the large field of view image. SLAM based on this de-distorted large field of view image can construct a map that includes all of the original perspective content.
  • FIG. 7 illustrates an original binocular fisheye image and a conventional binocular fisheye image after de-distortion, according to some embodiments of the present application.
  • images 701 and 702 are respectively the original left fisheye image and right fisheye image obtained by the large field of view camera 101 in the real world.
  • The images 703 and 704 are the left and right de-distorted images, respectively, obtained by traditional de-distortion.
  • The images 601 and 602 are single images obtained by traditional de-distortion methods, and their angle of view is only about 100 degrees. It can be seen that, for the large-angle-of-view images acquired by the large field of view camera 101, the de-distortion method provided in this application can effectively prevent image distortion while retaining a large angle of view.
  • the device 102 for simultaneous positioning and mapping may determine the pose of the large field of view camera and construct a map based on the de-distortion image.
  • The device 102 for simultaneous positioning and mapping may extract feature points of the de-distorted image and construct a corresponding large field of view frame based on the extracted feature points, and then determine the pose of the monocular large field of view camera and build a map based on the large field of view frame.
  • Alternatively, the camera pose may be estimated and a map constructed directly from the pixel brightness information in the de-distorted large field of view image, without calculating key points and descriptors.
  • The de-distorted large field of view image obtained by the above de-distortion method based on the multi-virtual pinhole camera model retains all the perspectives of the original large field of view image. Therefore, simultaneous positioning and mapping can be performed based on the rich common features between large field of view images to obtain more efficient positioning and a more accurate map. At the same time, the above method avoids the extra computational cost that the complex projection model of a large field of view camera would otherwise impose on the system.
  • The device 102 for simultaneous positioning and mapping may extract feature points of the left and right de-distorted images and construct a corresponding binocular image frame based on the extracted feature points, and then determine the pose of the binocular large field of view camera and build a map based on the binocular image frame.
  • Since the large field of view frame (or binocular image frame) includes the information of all feature points in the de-distorted image (or left and right de-distorted images), the pose of the large field of view camera 101 can be tracked and a map can be built accordingly.
  • The device 102 for simultaneous positioning and mapping may scale the de-distorted image (or the left and right de-distorted images) to obtain the corresponding image pyramid. Corner points are extracted from each scale of the image pyramid and descriptors are calculated.
  • the corner points and the descriptors constitute characteristic points of the image.
  • the corner points are highly recognizable and representative regions in the image, and are used to represent position information of the feature points in the image.
  • Descriptors can be represented by vectors and are used to describe the information of the pixels around a corner point. Descriptors are designed on the principle that feature points with similar appearance should have similar descriptors.
  • Feature points are extracted for a de-distorted image (or left de-distorted image, right de-distorted image), and a corresponding large field-of-view frame (or binocular image frame) is constructed based on the extracted feature points.
  • the large field of view frame (or binocular image frame) includes all feature points in the corresponding de-distortion image (or left de-distortion image, right de-distortion image).
  • After the large field of view frame (or binocular image frame) is constructed, the pixel data of the corresponding de-distorted image (or left and right de-distorted images) can be discarded, thereby saving storage space and reducing system power consumption.
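A minimal sketch of this feature-extraction step is shown below. The patent does not mandate a particular detector or descriptor; OpenCV's ORB is used here purely as an illustration, and it builds the scale pyramid internally via scaleFactor and nlevels.

```python
import cv2

# Illustrative parameters; the patent does not fix these values.
orb = cv2.ORB_create(nfeatures=2000, scaleFactor=1.2, nlevels=8)

def build_frame(dedistorted_gray):
    """Extract pyramid corner points and descriptors backing a 'large field of view frame'."""
    keypoints, descriptors = orb.detectAndCompute(dedistorted_gray, None)
    # Once the frame is built from (keypoints, descriptors), the pixel data of the
    # de-distorted image can be discarded, as described above.
    return keypoints, descriptors
```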
  • For a more detailed description of step 230, see FIG. 8 and the related descriptions.
  • the process 200 may further include making the left-eye and right-eye optical axes of the large-field-of-view camera 101 parallel.
  • the device 102 for simultaneous positioning and mapping may adjust the virtual optical axes of the left and right eyes of the binocular fisheye camera through a binocular camera calibration program so that the virtual optical axes of the two are parallel.
  • FIG. 8 shows a flowchart of determining a camera pose and constructing a map according to some embodiments of the present application.
  • the process 230 may be implemented as a set of instructions in a non-transitory storage medium in the device 102 that simultaneously locates and maps.
  • The device 102 for simultaneous positioning and mapping can execute the set of instructions and can accordingly execute the steps in the process 230.
  • The process 230 may include one or more additional operations not described herein, and/or omit one or more of the operations described herein. Furthermore, the order of operations shown in FIG. 8 and described below is not intended to be limiting.
  • the device 102 for simultaneous positioning and mapping may perform an initialization step, which may construct an initial map.
  • The device 102 for simultaneous positioning and mapping can acquire two de-distorted images (or large field of view frames) at two different times, determine the feature points at which the two images (or frames) match each other, and construct an initial map based on the matched feature points.
  • Specifically, the device 102 for simultaneous positioning and mapping may obtain a de-distorted image (or large field of view frame) corresponding to a first moment and a de-distorted image (or large field of view frame) corresponding to a second moment; determine the feature points at which the de-distorted image (or large field of view frame) corresponding to the first moment and the de-distorted image (or large field of view frame) corresponding to the second moment match each other; and construct an initial map based on the mutually matched feature points.
  • the large field of view frame corresponding to the first time and the large field of view frame corresponding to the second time may be the current large field of view frame and the reference large field of view frame.
  • the current large field of view frame and the reference large field of view frame may be continuous frames, or there may be one or more interval frames between the two. A certain parallax needs to exist between the current large field of view frame and the reference large field of view frame to ensure smooth initialization.
  • The device 102 for simultaneous positioning and mapping may, based on the multi-virtual pinhole camera model (for example, the multi-virtual pinhole camera model shown in FIG. 4), decompose the de-distorted image (or large field of view frame) corresponding to the first moment and the de-distorted image (or large field of view frame) corresponding to the second moment into sub-field-of-view frames corresponding to each virtual pinhole camera. Therefore, for each virtual pinhole camera, two corresponding sub-field-of-view frames are obtained, one from the de-distorted image (or large field of view frame) corresponding to the first moment and one from the de-distorted image (or large field of view frame) corresponding to the second moment. Matching feature points are determined by performing inter-frame matching on the two sub-field-of-view frames.
  • Constructing the initial map based on the matched feature points includes: determining a direction vector corresponding to the first feature point based on the feature point in the de-distorted image corresponding to the first moment and the camera center of the large field of view camera at the first moment; determining a direction vector corresponding to the second feature point based on the matched feature point in the de-distorted image corresponding to the second moment and the camera center of the large field of view camera at the second moment; triangulating the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point to determine a map point corresponding to the feature points; and constructing an initial map based on the map point.
  • The device 102 for simultaneous positioning and mapping may decompose the reference large field of view frame F1 into sub-field-of-view frames F11, F12, F13, F14, and F15 corresponding to each virtual pinhole camera based on the multi-virtual pinhole camera model.
  • the current large field of view frame F2 is also decomposed into subfield frames F21, F22, F23, F24, and F25 corresponding to each virtual pinhole camera based on the multi-virtual pinhole camera model.
  • the sub-field frames F11 and F21 correspond to the forward-facing virtual pinhole camera
  • the sub-field frames F12 and F22 correspond to the upward-facing virtual pinhole camera
  • the sub-field frames F13 and F23 correspond to the downward-facing virtual pinhole camera
  • sub-field frames F14 and F24 correspond to left-facing virtual pinhole cameras
  • sub-field frames F15 and F25 correspond to right-facing virtual pinhole cameras.
  • The feature points at which the current large field of view frame and the reference large field of view frame match each other are determined by performing inter-frame matching on the sub-field-of-view frame pairs F11 and F21, F12 and F22, F13 and F23, F14 and F24, and F15 and F25. The matching of the sub-field-of-view frames determines the feature points at which the two large field of view frames match each other; triangulation is then performed based on the corresponding direction vectors to construct new map points.
  • the sub-field frames F11 and F21 are taken as examples to describe the inter-frame matching.
  • the feature points of the sub-view frames F11 and F21 are matched, and it is detected whether the number of matched feature point pairs is greater than or equal to the initialization threshold. If it is smaller than the initialization threshold, the initialization fails. If the number of matched feature point pairs exceeds the initialization threshold, for example, Random Sample Consensus (RANSAC) is used to calculate the essential matrix between two frames based on the direction vectors of the matched feature point pairs.
  • the initialization threshold indicates the minimum number of feature point pairs required to initialize the map.
  • A default value, such as 100, can be used directly, or it can be set by the user in advance.
  • the relative pose between the current large field of view frame and the reference large field of view frame is obtained by decomposing the essential matrix, and the relative pose can be represented by a pose matrix.
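A hedged sketch of this step for a single pair of matched sub-field-of-view frames (e.g. F11 and F21) is given below. The patent estimates the essential matrix from the matched direction vectors; here, as a simplification, OpenCV's point-based routines are applied to the matched pixel coordinates seen by one virtual pinhole camera, whose intrinsic matrix K_virtual is an assumed input. The threshold value 100 mirrors the example initialization threshold mentioned above.

```python
import cv2

INIT_THRESHOLD = 100  # minimum number of matched feature point pairs (example value)

def relative_pose(pts1, pts2, K_virtual):
    """Estimate the relative pose between two sub-field-of-view frames with RANSAC.

    pts1, pts2: (N, 2) arrays of matched pixel coordinates in the same virtual pinhole camera.
    """
    if len(pts1) < INIT_THRESHOLD:
        return None  # initialization fails
    E, inliers = cv2.findEssentialMat(pts1, pts2, K_virtual,
                                      method=cv2.RANSAC, prob=0.999, threshold=1.0)
    # Decompose the essential matrix into the relative rotation R and unit-scale translation t.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K_virtual, mask=inliers)
    return R, t
```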
  • The three-dimensional coordinates of the map point corresponding to each matched feature point pair, that is, the position of the map point, are then triangulated from the matched feature point pair and the relative pose between the current large field of view frame and the reference large field of view frame.
  • point O1 is the camera center of the virtual pinhole camera corresponding to the sub-field frame F11
  • Point O2 is the camera center of the virtual pinhole camera corresponding to the sub-field-of-view frame F21.
  • Points p1 and p2 are the matched feature points.
  • By triangulating the direction vectors O1p1 and O2p2, the three-dimensional coordinates of the map point, that is, the position of point P, can be determined.
  • In practice, the vectors O1p1 and O2p2 may not intersect.
  • In that case, the coordinates of point P that minimize the error can be obtained using the least squares method.
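A minimal sketch of this least-squares step, assuming the two rays are given by camera centers and unit direction vectors in a common frame: the midpoint of the common perpendicular between the two rays is the point that minimizes the summed squared distance to both rays.

```python
import numpy as np

def triangulate_midpoint(O1, d1, O2, d2):
    """Least-squares map point from two (possibly non-intersecting) rays O1 + s*d1 and O2 + t*d2."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    A = np.array([[d1 @ d1, -d1 @ d2],
                  [d1 @ d2, -d2 @ d2]])
    b = np.array([(O2 - O1) @ d1, (O2 - O1) @ d2])
    s, t = np.linalg.solve(A, b)     # parameters of the closest points on each ray
    P1 = O1 + s * d1
    P2 = O2 + t * d2
    return 0.5 * (P1 + P2)           # midpoint estimate of the map point P
```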
  • the distance between O1 and O2 has a great influence on the error of triangulation.
  • If the distance is too short, that is, if the camera translation is too small, the angular error observed at point P will cause a large depth error. If the distance is too long, the overlapping part of the scene becomes much smaller, which makes feature matching difficult. Therefore, a certain parallax needs to exist between the current large field of view frame and the reference large field of view frame. If the two selected large field of view frames do not meet the requirements, the initialization fails; the two frames are discarded and initialization is attempted again.
  • the initial map points are constructed based on the three-dimensional coordinates of the map points obtained from the triangulation.
  • the three-dimensional coordinates are used as the coordinates of the map points, and the descriptors of the feature points corresponding to the three-dimensional coordinates are used as the descriptors of the map points.
  • The device 102 for simultaneous positioning and mapping may perform the above initialization steps of the monocular large field of view camera; alternatively, it may build the initial map based on the feature points at which the left and right de-distorted images at the same moment match each other.
  • The device 102 for simultaneous positioning and mapping may determine the feature points at which the left and right de-distorted images match each other, and build an initial map based on the mutually matched feature points.
  • Specifically, the device 102 for simultaneous positioning and mapping may determine, in the right de-distorted image, the epipolar line corresponding to a feature point in the left de-distorted image, and then search along the epipolar line for the feature point that matches the feature point in the left de-distorted image.
  • the epipolar line is a multi-line segment polyline.
  • Referring to FIG. 10, a schematic diagram of the epipolar line search of a binocular large field of view camera according to some embodiments of the present application is shown.
  • the left de-distorted image 1010 has an epipolar line 1001
  • the right de-distorted image 1020 has an epipolar line 1002.
  • The feature points that match the feature points of the left de-distorted image 1010 must lie on the epipolar line 1002.
  • The feature points that match the feature points of the right de-distorted image 1020 must lie on the epipolar line 1001. Therefore, the feature points at which the left de-distorted image and the right de-distorted image match each other can be found quickly through epipolar search.
  • The epipolar lines 1001 and 1002 are each a three-segment polyline, consisting of two inclined line segments and one horizontal line segment.
  • the left de-distorted image 1010 and the right de-distorted image 1020 retain all perspectives of the left fisheye image and the right fisheye image, respectively. Simultaneous localization and mapping based on the left de-distorted image 1010 and the right de-distorted image 1020 can build a map including all the original perspective content.
  • The above constructing a map based on the matched feature points includes: first, determining a direction vector corresponding to the first feature point based on the feature point in the left de-distorted image and the camera center of the left eye of the large field of view camera 101; second, determining a direction vector corresponding to the second feature point based on the matched feature point in the right de-distorted image and the camera center of the right eye of the large field of view camera 101; third, triangulating the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point based on the baseline of the binocular fisheye camera to determine a map point corresponding to the feature points; and finally, constructing a map based on the map point.
  • Referring to FIG. 11, a schematic diagram of a map point constructed by a binocular large field of view camera according to some embodiments of the present application is shown.
  • a map point in front of the large field of view camera 101 is taken as an example.
  • Point O1 is the camera center of the left eye of the large field of view camera 101; connecting the feature point in the left de-distorted image with point O1 gives the direction vector corresponding to the first feature point.
  • Point O2 is the camera center of the right eye of the large field of view camera 101; connecting the matched feature point in the right de-distorted image with point O2 gives the direction vector corresponding to the second feature point.
  • The direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point may be unit vectors.
  • the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point intersect at a point E, to obtain a line segment O1E and a line segment O2E, respectively.
  • Connect the point O1 and the point O2 to obtain a line segment O1O2 and the length of the line segment O1O2 is b (that is, the baseline of the large field of view camera 101).
  • The line segment O1O2 forms a triangle with the line segments O1E and O2E. Solving this triangle gives the length d1 of the line segment O1E, the length d2 of the line segment O2E, the angle α between O1O2 and O1E, and the angle β between O1O2 and O2E, from which the coordinates of point E in the camera coordinate system can be determined.
  • the map point E is transformed from the coordinate system of the large field of view camera 101 to the world coordinate system. Then, a map is constructed based on the position of the point E in the world coordinate system.
  • the device 102 for simultaneous positioning and mapping may perform triangulation based on the following formula.
  • Formulas (1), (2), and (3) are obtained from the law of sines and the law of cosines.
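The formulas themselves are not reproduced in this excerpt. A hedged reconstruction from the triangle O1O2E described above, with baseline length b and the angles α and β between the baseline and the rays O1E and O2E respectively, would be:

```latex
% Hedged reconstruction of the triangulation relations; alpha, beta, d1, d2, b as defined above.
\begin{aligned}
\angle O_1 E O_2 &= \pi - \alpha - \beta, \\
d_1 &= \frac{b\,\sin\beta}{\sin(\alpha+\beta)}, \qquad
d_2 = \frac{b\,\sin\alpha}{\sin(\alpha+\beta)}, \\
\mathbf{E} &= \mathbf{O}_1 + d_1\,\hat{\mathbf{v}}_1,
\end{aligned}
```

where \(\hat{\mathbf{v}}_1\) denotes the unit direction vector corresponding to the first feature point in the coordinate system of the large field of view camera 101.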
  • For both the monocular large field of view camera and the binocular large field of view camera, the newly constructed map points and the key large field of view frames need to be associated with each other.
  • a map point can be observed by multiple key large field of view frames.
  • Each key large field of view frame in which a map point is observed is associated with that map point, and the feature point on the key large field of view frame that corresponds to the map point is also recorded.
  • the constructed initial map includes the above-mentioned two key large field-of-view frames and the above-mentioned initial map points, and information about the association relationship between them.
  • The initialization step further includes: when the number of matched feature point pairs exceeds the initialization threshold, constructing vectors based on a bag-of-words (Bag of Words) model from the two key large field of view frames and adding the vectors based on the bag-of-words model to a map database.
  • In the bag-of-words model, clustering is performed on various image features; for example, eyes, noses, ears, mouths, and the edges and corners of various features belong to different feature classes. Assume there are 10,000 classes. For each large field of view frame, one can analyze which classes it contains, marking a class with 1 if it is present and 0 if it is not, so that the large field of view frame can be represented by a 10,000-dimensional vector. Different large field of view frames can then be compared for similarity through their vectors based on the bag-of-words model.
  • the map database is used to store vectors based on the bag-of-words model constructed from key large field of view frames.
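  • As a rough illustration of the bag-of-words idea described above, the sketch below builds a binary presence vector over a fixed vocabulary of feature classes for each frame and compares two frames by cosine similarity; the 10,000-class vocabulary, the word indices, and the similarity measure are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

VOCAB_SIZE = 10_000  # assumed number of feature classes (visual words)

def bow_vector(word_ids):
    """Binary bag-of-words vector: 1 if the frame contains a feature of the
    class, 0 otherwise. word_ids are the vocabulary indices of the frame's
    feature descriptors (e.g. obtained by nearest-neighbour quantisation)."""
    v = np.zeros(VOCAB_SIZE, dtype=np.float32)
    v[np.asarray(word_ids, dtype=int)] = 1.0
    return v

def bow_similarity(v1, v2):
    """Cosine similarity between two bag-of-words vectors (1.0 = identical)."""
    denom = np.linalg.norm(v1) * np.linalg.norm(v2)
    return float(np.dot(v1, v2) / denom) if denom > 0 else 0.0

# Toy usage: two frames sharing most of their visual words look similar.
frame_a = bow_vector([3, 17, 256, 4095, 9001])
frame_b = bow_vector([3, 17, 256, 4095, 1234])
print(bow_similarity(frame_a, frame_b))
```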
  • the device 102 that simultaneously locates and maps may perform a global bundling optimization step.
  • Global bundle optimization optimizes all key large field of view frames (or key binocular image frames) and all map points in the map currently established by SLAM (hereinafter referred to as the current map).
  • For example, global bundle optimization is performed on the initial map constructed in step 810, that is, on the map that contains only the two key large field of view frames and the initial map points. It can be understood that, in addition to the initial map, global bundle optimization can be performed on the current map at any time during the map construction process.
  • The purpose of bundle optimization is to minimize the reprojection error of the map points in the constructed map on the key large field of view frames (or key binocular image frames) by fine-tuning the poses of the key large field of view frames (or key binocular image frames) and the positions of the map points, thereby optimizing the established map.
  • For the monocular large field of view camera, the device 102 for simultaneous positioning and mapping may, for each key large field of view frame in the map, project each map point associated with that key large field of view frame into the multi-virtual pinhole camera model to obtain the reprojection point of the map point in the multi-virtual pinhole camera model; determine the reprojection error of each map point according to its reprojection point in the multi-virtual pinhole camera model and the feature point corresponding to the map point; determine an overall reprojection error from the reprojection errors of the map points associated with all the key large field of view frames; and, based on this reprojection error, update the poses of the key large field of view frames and the positions of all map points associated with the key large field of view frames.
  • Note that, in this application, the pose of a frame (for example, the pose of a key large field of view frame) refers to the pose of the large field of view camera 101 at the moment the large field of view camera 101 acquired that frame; for brevity, it is simply called the pose of the frame.
  • Taking the five-orientation multi-virtual pinhole camera model shown in FIG. 4 as an example, a map point is transformed into the coordinate system of the corresponding virtual pinhole camera based on the multi-virtual pinhole camera model (for example, the virtual pinhole camera corresponding to the map point is the forward-facing virtual pinhole camera), and is then projected onto the imaging plane of that forward-facing virtual pinhole camera to obtain the reprojection point of the map point.
  • Since the multi-virtual pinhole camera model used here is the same model as the one used in the large field of view image de-distortion processing in step 220, this imaging plane corresponds to the sub-view frame obtained by decomposing the key large field of view frame onto the forward-facing virtual pinhole camera.
  • the re-projection point can be understood as an observation value of the map point based on the pose of the sub-view frame.
  • the reprojection error of the map point is determined according to the feature points associated with the map point on the key large field of view frame (that is, the feature points on which the map point is obtained by triangulation) and the reprojection point of the map point. In the ideal situation where there is no error in the map established by SLAM, the reprojection error is zero. However, since real world conditions inevitably introduce errors such as measurement errors, reprojection errors cannot be completely eliminated. SLAM optimizes the maps by minimizing the reprojection errors.
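  • To make the reprojection error just described concrete, the following sketch projects a map point through a simple multi-virtual pinhole (cube) model with forward, left, right, up and down faces and measures the pixel residual against the observed feature point; the face rotations, the intrinsics K, and the world-to-camera pose convention are illustrative assumptions, not the patent's exact definitions.

```python
import numpy as np

# Fixed rotations from the large-FOV camera frame to each virtual pinhole
# camera ("face" of the cube); the forward face is the identity.
FACE_ROTATIONS = {
    "front": np.eye(3),
    "right": np.array([[0, 0, -1], [0, 1, 0], [1, 0, 0]], dtype=float),
    "left":  np.array([[0, 0, 1], [0, 1, 0], [-1, 0, 0]], dtype=float),
    "up":    np.array([[1, 0, 0], [0, 0, 1], [0, -1, 0]], dtype=float),
    "down":  np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]], dtype=float),
}

def reprojection_error(point_w, R_cw, t_cw, K, observed_uv, face):
    """Reprojection residual of one map point on one key frame.

    point_w     -- 3D map point in world coordinates
    R_cw, t_cw  -- pose of the frame: world -> large-FOV camera frame
    K           -- 3x3 intrinsics shared by all virtual pinhole cameras
    observed_uv -- pixel of the feature point associated with the map point
    face        -- which virtual pinhole camera observes the point
    """
    p_cam = R_cw @ point_w + t_cw              # into the large-FOV camera frame
    p_face = FACE_ROTATIONS[face] @ p_cam      # into the chosen virtual camera
    if p_face[2] <= 0:                         # behind that face: no projection
        return None
    uv_h = K @ (p_face / p_face[2])            # pinhole projection
    return uv_h[:2] - np.asarray(observed_uv)  # residual in pixels

# Toy usage with 90-degree virtual cameras of focal length 300 px.
K = np.array([[300.0, 0, 300.0], [0, 300.0, 300.0], [0, 0, 1.0]])
err = reprojection_error(np.array([0.2, -0.1, 5.0]), np.eye(3),
                         np.zeros(3), K, (312.0, 294.0), "front")
print(err)  # [0. 0.] for this consistent toy observation
```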
  • For the binocular large field of view camera, the device 102 for simultaneous positioning and mapping may, for each key binocular image frame in the map, project each map point associated with that key binocular image frame into the first multi-virtual pinhole camera model to obtain the reprojection point of the map point in the first multi-virtual pinhole camera model; determine the reprojection error of the map point according to its reprojection point in the first multi-virtual pinhole camera model and the feature point corresponding to the map point; and determine a left reprojection error from the reprojection errors of the map points associated with all the key binocular image frames.
  • Alternatively, the device 102 for simultaneous positioning and mapping may project the map points associated with the key binocular image frames into the second multi-virtual pinhole camera model to obtain their reprojection points in the second multi-virtual pinhole camera model; determine the reprojection error of each map point according to its reprojection point in the second multi-virtual pinhole camera model and the feature point corresponding to the map point; and determine a right reprojection error from the reprojection errors of the map points associated with all the key binocular image frames.
  • Further, the device 102 for simultaneous positioning and mapping may update the pose of each key binocular image frame and the positions of all map points associated with it based on the left reprojection error, the right reprojection error, or the sum of the two. Specifically, for a monocular map point, the device 102 for simultaneous positioning and mapping may update the pose of the key binocular image frame and the positions of all map points associated with the binocular image frame based on the left reprojection error or the right reprojection error.
  • For a binocular map point, the device 102 for simultaneous positioning and mapping may update the pose of the key binocular image frame and the positions of all map points associated with the binocular image frame based on the sum of the left reprojection error and the right reprojection error.
  • The device 102 for simultaneous positioning and mapping may determine a loss function based on the reprojection errors (for example, the left reprojection error, the right reprojection error, or the sum of the left reprojection error and the right reprojection error).
  • After the loss function is obtained, gradient-based methods such as the Gauss-Newton method or the Levenberg-Marquardt method may optionally be used to iteratively solve for the gradients corresponding to the poses of the key large field of view frames (or key binocular image frames) and the positions of the map points associated with them, and to update those poses and positions with the corresponding gradients, so that the current map finally reaches the optimal state with the smallest reprojection error.
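  • As a minimal illustration of this iterative optimization, the sketch below runs a few Gauss-Newton steps on a toy problem in which only the camera translation is refined so that the reprojection residuals of a handful of known 3D points shrink; the numerical Jacobian, the translation-only parametrization, and the pinhole intrinsics are simplifying assumptions (a full implementation would update all keyframe poses on SE(3) and all map point positions, typically with Levenberg-Marquardt).

```python
import numpy as np

K = np.array([[300.0, 0, 320.0], [0, 300.0, 240.0], [0, 0, 1.0]])

def residuals(t_cw, points_w, observed_uv):
    """Stacked reprojection residuals for a camera at translation t_cw
    (rotation fixed to identity in this toy example)."""
    res = []
    for P, uv in zip(points_w, observed_uv):
        p = P + t_cw                      # world -> camera (R = I)
        proj = K @ (p / p[2])
        res.append(proj[:2] - uv)
    return np.concatenate(res)

def gauss_newton(t0, points_w, observed_uv, iters=10):
    t = t0.astype(float)
    for _ in range(iters):
        r = residuals(t, points_w, observed_uv)
        # Numerical Jacobian of the residual vector w.r.t. the 3 parameters.
        J = np.zeros((r.size, 3))
        eps = 1e-6
        for j in range(3):
            dt = np.zeros(3); dt[j] = eps
            J[:, j] = (residuals(t + dt, points_w, observed_uv) - r) / eps
        # Gauss-Newton step: solve (J^T J) dx = -J^T r.
        dx = np.linalg.solve(J.T @ J, -J.T @ r)
        t += dx
    return t

# Toy data: three map points observed from the true translation [0.1, -0.05, 0].
pts = [np.array([0.5, 0.2, 4.0]), np.array([-0.3, 0.1, 5.0]), np.array([0.0, -0.4, 3.0])]
t_true = np.array([0.1, -0.05, 0.0])
obs = [(K @ ((P + t_true) / (P + t_true)[2]))[:2] for P in pts]
print(gauss_newton(np.zeros(3), pts, obs))  # converges towards t_true
```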
  • The above bundle optimization is based on the same multi-virtual pinhole camera model as the large field of view image de-distortion processing and converts the complex projection model of the large field of view camera into several virtual pinhole camera projection models. This avoids the complex optimization processing brought by the complex projection model of the large field of view camera and thereby improves system processing performance.
  • the device 102 that simultaneously locates and maps may perform a tracking step.
  • The tracking step optimizes the pose of the current large field of view camera by minimizing the reprojection errors of map points on the current large field of view frame (or the current binocular image frame). In the tracking step, only the pose of the current large field of view camera is optimized; the poses of the large field of view camera at other moments and the positions of the map points remain unchanged.
  • Step 830 may be performed at any time during map construction; for example, based on the initial map constructed in the initialization step 810 above or on the map optimized in the global bundle optimization step 820 above, SLAM continues to track the pose of the moving large field of view camera according to newly acquired large field of view frames (or binocular image frames).
  • For the monocular large field of view camera, the device 102 for simultaneous positioning and mapping may project each map point associated with the current large field of view frame into the multi-virtual pinhole camera model to obtain the reprojection point of the map point in the multi-virtual pinhole camera model; determine the reprojection error of the map point according to its reprojection point in the multi-virtual pinhole camera model and the feature point corresponding to the map point; determine an overall reprojection error from the reprojection errors of the map points associated with the current large field of view frame; and, based on this reprojection error, update the pose of the current large field of view frame.
  • the device 102 for simultaneous positioning and mapping may perform the following three sub-steps to complete the tracking step.
  • Tracking sub-step 1: determine a reference large field of view frame for the current large field of view frame.
  • Optionally, the previous large field of view frame of the current large field of view frame is determined as the reference large field of view frame.
  • Optionally, the key large field of view frame in the local map with the highest degree of co-visibility with the current large field of view frame is selected as the reference large field of view frame.
  • Here, when the number of key large field of view frames in the current map is smaller than N, the local map includes all key large field of view frames and all map points in the current map, where N is an integer greater than 2. N may directly use a default value, such as 10, or may be preset by the user. If the current map is the initialized map, the local map is the current map, including the initial two key large field of view frames and the map points associated with them.
  • When the number of key large field of view frames in the current map is not smaller than N, the local map includes at least N key large field of view frames in the current map with the highest degree of co-visibility with the current large field of view frame, and the map points associated with those at least N key large field of view frames.
  • Optionally, the key large field of view frame in the local map with the highest degree of co-visibility with the previous large field of view frame of the current large field of view frame is selected as the reference large field of view frame.
  • The current large field of view frame and its previous large field of view frame generally have a high degree of co-visibility, so the reference large field of view frame of the current large field of view frame can be selected according to the latter.
  • Compared with the key large field of view frame having the highest co-visibility with the just-created current large field of view frame, the key large field of view frame having the highest co-visibility with its previous large field of view frame is easier to select, which is beneficial to the smooth implementation of the SLAM method.
  • Optionally, the reference large field of view frame is determined through global matching.
  • First, a vector based on the bag-of-words model is constructed from the current large field of view frame.
  • Then, the map database established in the initialization step 810 is queried with this vector to obtain a key large field of view frame that matches the current large field of view frame, which serves as the reference large field of view frame.
  • the current large field of view frame and the previous large field of view frame are matched to obtain a matched feature point pair. If the number of matched feature point pairs is greater than the tracking threshold, it is determined that the previous large field of view frame of the current large field of view frame is the reference large field of view frame.
  • the tracking threshold indicates the minimum number of feature point pairs required to track the pose of the camera with a large field of view, and a default setting value, such as 20, may be directly used, or may be preset by a user.
  • If the number of feature point pairs matched between the current large field of view frame and its previous large field of view frame is not greater than the tracking threshold, the key large field of view frame in the local map with the highest degree of co-visibility with the current large field of view frame or with its previous large field of view frame is selected, and the current large field of view frame is matched against that key large field of view frame to obtain matched feature point pairs. If the number of matched feature point pairs is greater than the tracking threshold, that key large field of view frame is determined as the reference large field of view frame.
  • If the number of feature point pairs matched between the current large field of view frame and that key large field of view frame is still not greater than the tracking threshold, the reference large field of view frame is determined through global matching. The specific determination process is as described above and, for brevity, is not repeated here. This selection cascade is sketched below.
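  • The sketch below summarizes the cascade of tracking sub-step 1; the function names, matching helpers, and data structures are illustrative assumptions rather than the patent's interface.

```python
TRACKING_THRESHOLD = 20  # minimum matched feature point pairs (default from the text)

def select_reference_frame(current, previous, local_map, map_database,
                           match_frames, best_covisible_keyframe, query_bow):
    """Cascade for choosing the reference large field of view frame.

    match_frames(a, b)            -> list of matched feature point pairs
    best_covisible_keyframe(f, m) -> key frame in local map m with the highest
                                     co-visibility with frame f (or None)
    query_bow(frame, db)          -> key frame retrieved by bag-of-words lookup
    """
    # 1. Try the previous large field of view frame.
    if len(match_frames(current, previous)) > TRACKING_THRESHOLD:
        return previous

    # 2. Try the key frame that shares the most map points with the current
    #    frame (or with its previous frame) in the local map.
    candidate = (best_covisible_keyframe(current, local_map)
                 or best_covisible_keyframe(previous, local_map))
    if (candidate is not None
            and len(match_frames(current, candidate)) > TRACKING_THRESHOLD):
        return candidate

    # 3. Fall back to global matching against the bag-of-words map database.
    return query_bow(current, map_database)
```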
  • Tracking sub-step 2: determine the pose of the current large field of view frame based on the multi-virtual pinhole camera model, using the current large field of view frame and the reference large field of view frame determined above.
  • In one example, the pose of the current large field of view frame is determined by determining the relative pose between the current large field of view frame and the reference large field of view frame.
  • The current large field of view frame is decomposed into sub-view frames corresponding to each virtual pinhole camera based on the multi-virtual pinhole camera model, and the same operation is performed on the reference large field of view frame, so that two corresponding sub-view frames are obtained for each virtual pinhole camera.
  • From the sub-view frame pairs corresponding to the different virtual pinhole cameras, the pair with the largest number of matched feature point pairs is selected.
  • The two sub-view frames in that pair are matched against each other (inter-frame matching) to obtain the relative pose between them.
  • the specific inter-frame matching process of the sub-view frame is consistent with the inter-frame matching process in the initialization step 810. For brevity, details are not described herein again.
  • Since the camera center of each virtual pinhole camera coincides with the camera center of the large field of view camera, there is a fixed rotation between each virtual pinhole camera in the multi-virtual pinhole camera model and the large field of view camera.
  • The rotation of each virtual pinhole camera corresponds to a specific rotation matrix. Therefore, the pose matrix of the large field of view frame can be transformed into the pose matrix of a sub-view frame through the corresponding rotation matrix; conversely, the pose matrix of a sub-view frame can be transformed into the pose matrix of the large field of view frame through the corresponding rotation matrix, as sketched below.
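  • The fixed-rotation relationship described above amounts to a single matrix product; the sketch below converts a large field of view frame pose into the pose of one virtual pinhole camera's sub-view frame and back, assuming 4x4 homogeneous pose matrices that map world coordinates into the camera frame (the conventions are illustrative).

```python
import numpy as np

def subview_pose_from_frame_pose(T_cam_world, R_face_cam):
    """Pose of a virtual pinhole camera's sub-view frame.

    T_cam_world -- 4x4 pose of the large-FOV frame (world -> large-FOV camera)
    R_face_cam  -- fixed 3x3 rotation from the large-FOV camera frame to the
                   virtual pinhole camera frame (the camera centers coincide)
    """
    T_face_cam = np.eye(4)
    T_face_cam[:3, :3] = R_face_cam          # pure rotation, zero translation
    return T_face_cam @ T_cam_world          # world -> virtual pinhole camera

def frame_pose_from_subview_pose(T_face_world, R_face_cam):
    """Inverse direction: recover the large-FOV frame pose from a sub-view pose."""
    T_cam_face = np.eye(4)
    T_cam_face[:3, :3] = R_face_cam.T
    return T_cam_face @ T_face_world

# Toy usage with the left-facing virtual camera of a cube-like model.
R_left = np.array([[0, 0, 1], [0, 1, 0], [-1, 0, 0]], dtype=float)
T = np.eye(4); T[:3, 3] = [0.5, 0.0, 1.0]
T_sub = subview_pose_from_frame_pose(T, R_left)
print(np.allclose(frame_pose_from_subview_pose(T_sub, R_left), T))  # True
```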
  • The above solution uses the multi-virtual pinhole camera model to convert pose determination based on the complex projection model of the large field of view camera into pose calculation based on simple virtual pinhole camera projection models, which greatly simplifies the large field of view SLAM algorithm and significantly improves its performance.
  • Tracking sub-step 3: update the pose of the current large field of view frame obtained in tracking sub-step 2 above.
  • According to the feature point pairs matched between the current large field of view frame and the reference large field of view frame, for each matched feature point in the reference large field of view frame, the map point associated with that feature point is transformed, based on the multi-virtual pinhole camera model, into the coordinate system of the corresponding virtual pinhole camera of the current large field of view frame. The map point is then projected onto the imaging plane of that virtual pinhole camera to obtain the reprojection point of the map point in the current large field of view frame.
  • In one example, there is a large parallax between the current large field of view frame and the reference large field of view frame, and processing is performed based on the five-orientation multi-virtual pinhole camera model shown in FIG. 4.
  • A certain matched feature point in the reference large field of view frame lies on the imaging plane of the left-facing virtual pinhole camera.
  • After being transformed based on the multi-virtual pinhole camera model, the map point associated with that feature point falls into the coordinate system of the forward-facing virtual pinhole camera of the current large field of view frame, and the reprojection point of the map point is obtained on the imaging plane of the forward-facing virtual pinhole camera of the current large field of view frame.
  • It can be understood that, at the pose of the reference large field of view frame, this map point is observed by the left-facing virtual pinhole camera of the multi-virtual pinhole camera model, whereas at the pose of the current large field of view frame it is observed by the forward-facing virtual pinhole camera of the model.
  • The reprojection error of the map point is determined according to the reprojection point and the matched feature point in the current large field of view frame. The pose of the current large field of view frame is then updated according to the reprojection errors of the map points associated with all matched feature points in the reference large field of view frame.
  • the calculation of the reprojection error in this step and the processing of updating the pose of the current large field of view frame according to the reprojection error are consistent with the processing method in the global bundle optimization of step 820, and for the sake of brevity, they are not repeated here.
  • For the binocular large field of view camera, the device 102 for simultaneous positioning and mapping may project each map point associated with the current binocular image frame into the first multi-virtual pinhole camera model to obtain the reprojection point of the map point in the first multi-virtual pinhole camera model; determine the reprojection error of the map point according to its reprojection point in the first multi-virtual pinhole camera model and the feature point corresponding to the map point; and determine a left reprojection error from the reprojection errors of the map points associated with the current binocular image frame.
  • Alternatively, the device 102 for simultaneous positioning and mapping may project each such map point into the second multi-virtual pinhole camera model to obtain its reprojection point in the second multi-virtual pinhole camera model; determine the reprojection error of the map point according to its reprojection point in the second multi-virtual pinhole camera model and the feature point corresponding to the map point; and determine a right reprojection error from the reprojection errors of the map points associated with the current binocular image frame.
  • Further, the device 102 for simultaneous positioning and mapping may update the pose of the current binocular image frame based on the left reprojection error, the right reprojection error, or the sum of the two. For example, for a monocular map point, the device 102 may update the pose of the current binocular image frame based on the left reprojection error or the right reprojection error; for a binocular map point, the device 102 may update the pose of the current binocular image frame based on the sum of the left reprojection error and the right reprojection error.
  • In some embodiments, the device 102 for simultaneous positioning and mapping may solve the left reprojection error, the right reprojection error, or the sum of the left reprojection error and the right reprojection error to determine a pose increment of the large field of view camera 101, and may then determine the current pose of the large field of view camera 101 based on the pose increment and prior information.
  • the prior information may be the pose of the large-field-of-view camera 101 in the previous frame, or the sum of the pose of the large-field-of-view camera 101 in the previous frame and the pose increment of the previous frame.
  • the pose increment of the previous frame is the pose increment between the pose of the large field of view camera 101 in the previous frame and the pose of the large field of view camera 101 in the previous two frames.
  • In some embodiments, the device 102 for simultaneous positioning and mapping may calculate the left reprojection error and/or the right reprojection error using the following formulas and solve for the pose increment.
  • Formula (7) is expressed as follows:
  • In formula (7), P represents a map point in the world coordinate system; a coordinate transformation matrix converts the map point P from the world coordinate system to the coordinate system of the multi-virtual pinhole camera model; a rotation transforms the map point P from the coordinate system of the multi-virtual pinhole camera model to the coordinate system of one face of the model; K represents the camera matrix of the pinhole camera corresponding to each face of the multi-virtual pinhole camera model, which contains camera parameters such as the image center and focal length; and u represents the reprojection point of the map point P on one face of the multi-virtual pinhole camera model.
  • formula (7) can be further expressed as formula (8).
  • P2 represents the point obtained by transforming the map point P into the coordinate system of the multi-virtual pinhole camera model,
  • and P1 represents the point obtained by transforming P2 into the coordinate system of one face of the multi-virtual pinhole camera model.
  • Here, the expression involves the Jacobian matrix of u with respect to the camera pose and the skew-symmetric matrix of P2.
  • The Jacobian matrix of u with respect to the map point P can be determined as follows:
  • where the rotation component of the coordinate transformation matrix appears in the expression.
  • Based on formulas (7), (8), (9) and (10), the device 102 for simultaneous positioning and mapping can determine the left reprojection error of the large field of view camera 101 and, from it, determine the pose of the large field of view camera 101.
  • Similarly, the device 102 for simultaneous positioning and mapping can determine the right reprojection error of the large field of view camera 101, and then determine the pose of the large field of view camera 101 based on the right reprojection error or on the sum of the left reprojection error and the right reprojection error.
  • For example, the right reprojection error can be determined by formula (11), in which one symbol represents the reprojection point of the map point P on one face of the second multi-virtual pinhole camera model, another represents the offset of the left camera relative to the right camera of the large field of view camera 101, and b indicates the baseline length of the large field of view camera 101.
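  • Since formulas (7) through (11) appear as images in the published text, the sketch below shows, under illustrative assumptions, how left and right reprojection residuals of the kind described here can be formed for a stereo rig whose right camera is offset from the left by the baseline b along the x axis; the face rotation, intrinsics, and sign conventions are assumptions, not the patent's exact formulas.

```python
import numpy as np

def stereo_reprojection_residuals(P_world, T_left_world, K, R_face, b,
                                  uv_left_obs, uv_right_obs):
    """Left and right pixel residuals of one map point.

    P_world      -- map point in world coordinates
    T_left_world -- 4x4 pose: world -> left large-FOV camera frame
    K            -- shared pinhole intrinsics of the virtual cameras
    R_face       -- fixed rotation selecting one face of the virtual model
    b            -- stereo baseline (right camera at +b along x of the left)
    """
    P_h = np.append(P_world, 1.0)
    p_left = (T_left_world @ P_h)[:3]            # into the left camera frame
    p_right = p_left - np.array([b, 0.0, 0.0])   # into the right camera frame

    def face_project(p):
        q = R_face @ p
        return (K @ (q / q[2]))[:2] if q[2] > 0 else None

    uv_l, uv_r = face_project(p_left), face_project(p_right)
    res_l = None if uv_l is None else uv_l - np.asarray(uv_left_obs)
    res_r = None if uv_r is None else uv_r - np.asarray(uv_right_obs)
    return res_l, res_r

# Toy usage: forward face, identity pose, 0.12 m baseline.
K = np.array([[300.0, 0, 320.0], [0, 300.0, 240.0], [0, 0, 1.0]])
res = stereo_reprojection_residuals(np.array([0.3, 0.1, 6.0]), np.eye(4), K,
                                    np.eye(3), 0.12, (335.0, 245.0), (329.0, 245.0))
print(res)  # both residuals are [0. 0.] for this consistent toy observation
```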
  • the device 102 that simultaneously locates and maps may perform a mapping step (or a map update step).
  • The mapping step can, based on the current map, expand the map as the large field of view camera moves. In other words, the mapping step inserts new map points into the current map.
  • The mapping step 840 may be performed after the tracking step 830: for a newly acquired large field of view frame, the tracking step 830 is first used to determine its pose, that is, the pose of the large field of view camera's motion at the current moment.
  • For the monocular large field of view camera, the device 102 for simultaneous positioning and mapping may determine the feature points at which the current large field of view frame and its reference frame match each other; determine the direction vector corresponding to the first feature point based on the feature point of the current large field of view frame and the current camera center of the large field of view camera; determine the direction vector corresponding to the second feature point based on the matched feature point of the reference frame and the camera center of the large field of view camera corresponding to the reference frame; triangulate the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point to determine the map point corresponding to the feature points; and construct the map based on the map point.
  • the device 102 for simultaneous positioning and mapping may perform the following three sub-steps to complete the mapping step.
  • Mapping sub-step 1: determine whether the current large field of view frame is a key large field of view frame.
  • Since the large field of view camera collects data continuously, performing a map update for every acquired large field of view frame would require a huge amount of computation. Therefore, some large field of view frames that are considered important can be selected as key large field of view frames, and the map update operations are then performed based on these key large field of view frames.
  • The key large field of view frames can be determined using any conventional or future-developed technique. For example, starting from the initial key large field of view frames, one key large field of view frame may be selected every ten large field of view frames, that is, the 11th, 21st, 31st and so on are selected as key large field of view frames. As another example, a large field of view frame with suitable parallax relative to the previous key large field of view frame may be selected as the key large field of view frame. A simple sketch of such selection policies follows.
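  • The two example policies just mentioned (a fixed frame interval, or sufficient parallax with respect to the previous key frame) can be expressed as a small predicate; the interval of ten, the parallax proxy based on camera translation, and the threshold below are illustrative assumptions.

```python
import numpy as np

def is_keyframe(frame_index, last_keyframe_index, t_current, t_last_keyframe,
                interval=10, min_parallax_m=0.2):
    """Decide whether the current large field of view frame becomes a key frame.

    Policy A: promote every `interval`-th frame after the last key frame.
    Policy B: promote the frame once the camera has translated far enough from
              the last key frame (a crude stand-in for "suitable parallax").
    """
    by_interval = (frame_index - last_keyframe_index) >= interval
    by_parallax = np.linalg.norm(np.asarray(t_current) -
                                 np.asarray(t_last_keyframe)) >= min_parallax_m
    return by_interval or by_parallax

# Toy usage: the 11th frame after the last key frame qualifies by interval.
print(is_keyframe(11, 1, [0.05, 0, 0], [0, 0, 0]))  # True
```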
  • If the current large field of view frame is a key large field of view frame, mapping sub-step 2 is performed, in which map update processing is carried out according to the current large field of view frame.
  • If the current large field of view frame is not a key large field of view frame, mapping sub-step 3 is performed, in which map point association processing is carried out on the current large field of view frame.
  • Mapping sub-step 2: for the case where the current large field of view frame is a key large field of view frame, map update processing is performed according to the current large field of view frame.
  • The key large field of view frame is decomposed into sub-view frames corresponding to each virtual pinhole camera based on the multi-virtual pinhole camera model, and the same is done for its reference frame.
  • Thus, two corresponding sub-view frames are obtained for each virtual pinhole camera, and new map points are constructed by performing inter-frame matching on the two sub-view frames.
  • a vector based on a bag of words model may be used to accelerate the matching between feature points.
  • For feature point pairs matched via the bag-of-words model, it is further tested whether they satisfy the epipolar constraint.
  • The three-dimensional coordinates of the new map point are then obtained by triangulation based on the feature point pair.
  • The inter-frame matching of the sub-view frames here and the process of obtaining the three-dimensional coordinates of new map points by triangulation based on the feature point pairs are consistent with the corresponding processing in the initialization step 810 and, for brevity, are not repeated here.
  • After a new map point is constructed, it is transformed into a map point in the world coordinate system based on the pose of the current large field of view frame and inserted into the current map, and the current large field of view frame is likewise inserted into the current map.
  • In some embodiments, the coordinate system of the first key large field of view frame used to construct the map during initialization is used as the world coordinate system.
  • A transformation between the camera coordinate system and the world coordinate system is therefore required.
  • In addition, a new vector based on the bag-of-words model is constructed according to the current large field of view frame, and this new bag-of-words vector is added to the above-mentioned map database.
  • Based on the map database, feature point matching can be accelerated using the bag-of-words vectors, thereby improving the efficiency of SLAM tracking and mapping.
  • Mapping sub-step 3: for the case where the current large field of view frame is not a key large field of view frame, map point association processing is performed on the current large field of view frame.
  • For each map point in the local map, the map point is transformed into the coordinate system of the corresponding virtual pinhole camera of the current large field of view frame based on the multi-virtual pinhole camera model and according to the pose of the current large field of view frame.
  • the map point is then projected onto the imaging plane of the virtual pinhole camera to obtain a reprojection point of the map point in the current large field of view frame. If the projection fails, the map point cannot be observed from the pose of the current large field of view frame. If the projection is successful, it indicates that the map point can be observed from the pose of the current large field of view frame, and a reprojection point of the map point is obtained.
  • Then, the feature point of the current large field of view frame that best matches the map point is associated with the map point. It can be understood that, through this step, the current large field of view frame is associated with the map points that can be observed from its pose. In this way, when the next large field of view frame is processed, the current large field of view frame can serve as the previous large field of view frame of that next frame for tracking processing. This makes SLAM tracking more consistent, positioning more accurate, and the map more accurate.
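  • A rough sketch of this association step is given below: each local-map point is projected into the current frame with the already-estimated pose, and the nearby feature whose descriptor is closest is linked to the map point; the search radius, descriptor distance, and data layout are illustrative assumptions.

```python
import numpy as np

def associate_map_points(map_points, project_to_current, features,
                         search_radius=8.0, max_desc_dist=50.0):
    """Associate local-map points with features of the current frame.

    map_points         -- list of dicts {"pos": 3-vector, "desc": descriptor array}
    project_to_current -- callable returning the reprojection pixel of a 3D
                          point in the current frame, or None if projection fails
    features           -- list of dicts {"uv": 2-vector, "desc": descriptor array}
    """
    associations = {}
    for i, mp in enumerate(map_points):
        uv = project_to_current(mp["pos"])
        if uv is None:                       # not observable from this pose
            continue
        best_j, best_dist = None, max_desc_dist
        for j, feat in enumerate(features):
            if np.linalg.norm(np.asarray(feat["uv"]) - uv) > search_radius:
                continue                     # only consider features near the reprojection
            d = np.linalg.norm(feat["desc"] - mp["desc"])
            if d < best_dist:
                best_j, best_dist = j, d
        if best_j is not None:
            associations[i] = best_j         # map point i observed as feature j
    return associations
```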
  • For the binocular large field of view camera, the device 102 for simultaneous positioning and mapping may perform the mapping steps described above for the monocular large field of view camera; it may also build the map based on the feature points at which the left and right de-distorted images acquired at the same moment match each other.
  • As an example, the device 102 for simultaneous positioning and mapping may determine the feature points at which the current left de-distorted image and the current right de-distorted image match each other; determine the direction vector corresponding to the first feature point based on the feature point of the current left de-distorted image and the camera center of the left camera of the current binocular large field of view camera; determine the direction vector corresponding to the second feature point based on the matched feature point of the current right de-distorted image and the camera center of the right camera of the current binocular large field of view camera; triangulate the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point to determine the map point corresponding to the feature points; and build the map based on the map point.
  • The device 102 for simultaneous positioning and mapping may refer to the related description in the initialization step 810 to determine the direction vector corresponding to the first feature point and the direction vector corresponding to the second feature point, and to perform the triangulation.
  • the mapping step 840 may further include local bundle optimization.
  • The purpose of local bundle optimization is to minimize the reprojection errors of the map points on the key large field of view frames (or key binocular image frames) in the local map by fine-tuning the poses of those key frames and the positions of the map points in the local map, thereby optimizing the established map.
  • For each key large field of view frame in the local map and each map point associated with it, the map point is transformed into the coordinate system of the corresponding virtual pinhole camera based on the multi-virtual pinhole camera model and is then projected onto the imaging plane of that virtual pinhole camera to obtain the reprojection point of the map point.
  • the reprojection error of the map point is determined according to the feature point associated with the map point and the reprojection point of the map point.
  • the poses of the key large field of view frame and the positions of all map points associated with the key large field of view frame are updated.
  • the process of bundle optimization in this step is consistent with the process in the above-mentioned global bundle optimization step 820, and for the sake of brevity, it will not be repeated here.
  • the bundle optimization process for each key binocular image frame in the local map is as follows.
  • The map points associated with the key binocular image frame are projected into the first multi-virtual pinhole camera model to obtain their reprojection points in the first multi-virtual pinhole camera model; the reprojection error of each map point is determined according to its reprojection point in the first multi-virtual pinhole camera model and the feature point corresponding to the map point; and a left reprojection error is determined from the reprojection errors of the map points associated with all the key binocular image frames. Alternatively, the map points associated with the key binocular image frame are projected into the second multi-virtual pinhole camera model to obtain their reprojection points in the second multi-virtual pinhole camera model; the reprojection error of each map point is determined according to its reprojection point in the second multi-virtual pinhole camera model and the feature point corresponding to the map point; and a right reprojection error is determined from the reprojection errors of the map points associated with all the key binocular image frames.
  • Based on the left reprojection error, the right reprojection error, or the sum of the left reprojection error and the right reprojection error, the pose of the key binocular image frame and the positions of all map points associated with the key binocular image frame are updated.
  • the device 102 for simultaneous positioning and mapping may perform a closed-loop detection processing step.
  • the closed-loop detection processing steps of the monocular large-field-of-view camera and the binocular large-field-of-view camera may be the same.
  • the following takes the closed-loop detection processing of the monocular large-field of view camera as an example.
  • When the current large field of view frame is a key large field of view frame, a vector based on the bag-of-words model is used to detect, in the current map database, a closed-loop large field of view frame that is similar to the current large field of view frame.
  • a matching feature point pair between the closed-loop large-view frame and the current large-view frame is determined.
  • a vector based on a bag-of-words model can be used to accelerate the matching between feature points.
  • For each matched feature point in the current large field of view frame, the map point associated with that feature point is transformed, based on the multi-virtual pinhole camera model, into the coordinate system of the corresponding virtual pinhole camera of the closed-loop large field of view frame.
  • the map point is then projected onto the imaging plane of the virtual pinhole camera to obtain a re-projection point of the map point in the closed-loop large field of view frame.
  • a first re-projection error is determined according to the re-projection point and a matching feature point in the closed-loop large field of view frame.
  • the first cumulative reprojection error is determined according to the first reprojection error of all matching feature points in the current large field of view frame.
  • For each matched feature point in the closed-loop large field of view frame, the map point associated with that feature point is transformed, based on the multi-virtual pinhole camera model, into the coordinate system of the corresponding virtual pinhole camera of the current large field of view frame, and is then projected onto the imaging plane of that virtual pinhole camera to obtain the reprojection point of the map point in the current large field of view frame.
  • a second re-projection error is determined according to the re-projection point and a matching feature point in the current large field of view frame.
  • a second cumulative reprojection error is determined according to the second reprojection error of all matching feature points in the closed-loop large field of view frame.
  • a loss function is determined according to the first cumulative reprojection error and the second cumulative reprojection error.
  • the similarity transformation matrix is optimized by minimizing the loss.
  • The correction is applied to the key large field of view frames in the current map that have a common-view relationship with the current large field of view frame, and to the map points associated with them.
  • If the number of common map points observed by two large field of view frames is larger than the common-view relationship threshold, the two frames are considered to have a common-view relationship.
  • the common-view relationship threshold indicates the minimum number of common map points required for judging that two key large-view frames have a common-view relationship.
  • the default setting value may be directly used, such as 20, or may be preset by the user.
  • the poses of the key large field of view frames and the positions of the map points associated with the key large field of view frames are corrected through the similarity transformation matrix. This completes the closed-loop detection process.
  • the closed-loop detection process further includes further optimizing the poses of all key large field-of-view frames in the current map and the positions of all map points through pose-graph optimization.
  • the closed-loop detection process further includes finding and eliminating redundant key frames and map points to save system storage space while avoiding redundant computing operations.
  • Steps 810 to 850 in the above embodiment provide an implementation of step 230 of a large field of view SLAM based on a multi-virtual pinhole camera model.
  • Alternatively, any conventional or future-developed large field of view SLAM method may be adopted.
  • For example, the optimization update processing based on reprojection error calculation with the multi-virtual pinhole camera model described above may be replaced with optimization update processing based on unit direction vector error calculation.
  • The calculation based on the unit direction vector error achieves the final optimization goal by minimizing the difference between the unit direction vector corresponding to a map point and the unit direction vector corresponding to the feature point associated with that map point.
  • The optimization target (loss) can be the distance between the unit direction vectors, the angle between the unit vectors, or another index describing the vector error.
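  • As a small illustration of this alternative error, the sketch below compares the unit direction vector pointing from the camera center to a map point with the unit direction vector of the associated feature point and reports both their Euclidean distance and their angle; which index to minimize is left open in the text, so both are shown, and the bearing convention is an assumption.

```python
import numpy as np

def unit_direction_errors(p_map_cam, f_feature):
    """Errors between the map point's direction and the feature's direction.

    p_map_cam -- map point expressed in the camera coordinate system
    f_feature -- unit direction vector of the associated feature point
                 (e.g. its back-projected bearing)
    """
    f_map = p_map_cam / np.linalg.norm(p_map_cam)     # unit direction to the point
    dist = np.linalg.norm(f_map - f_feature)          # vector distance
    angle = np.arccos(np.clip(np.dot(f_map, f_feature), -1.0, 1.0))  # radians
    return dist, angle

print(unit_direction_errors(np.array([0.1, 0.0, 2.0]),
                            np.array([0.0, 0.0, 1.0])))
```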
  • the present application sometimes combines various features in a single embodiment, a drawing, or a description thereof.
  • the present application disperses various features in multiple embodiments of the present invention.
  • this is not to say that a combination of these features is necessary, and those skilled in the art are likely to extract some of these features as a separate embodiment when reading this application.
  • The embodiments in this application can also be understood as the integration of multiple secondary embodiments, where the content of each secondary embodiment includes fewer than all of the features of a single previously disclosed embodiment.
  • Numbers expressing quantities or properties used to describe and claim certain embodiments of the present application are to be understood as being modified in some cases by the terms “about”, “approximately” or “substantially”. For example, unless otherwise stated, “about”, “approximately” or “substantially” may mean a ±20% change in the value that it describes. Accordingly, in some embodiments, the numerical parameters set forth in the written description and appended claims are approximations that can vary depending on the desired properties sought to be obtained by a particular embodiment. In some embodiments, the numerical parameters should be interpreted based on the number of significant digits reported and by applying common rounding techniques. Although the numerical ranges and parameters set forth in some embodiments of this application are broad approximations, the numerical values listed in the specific examples are reported as precisely as possible.

Abstract

A simultaneous localization and mapping method and device, and a non-transitory computer-readable medium. The simultaneous localization and mapping method includes: acquiring a large field of view image by means of a large field of view camera (210); obtaining, based on a multi-virtual pinhole camera model, a de-distorted image corresponding to the large field of view image (220); and determining the pose of the large field of view camera and constructing a map based on the de-distorted image (230). The multi-virtual pinhole camera model includes at least two virtual pinhole cameras with different orientations, and the camera centers of the at least two virtual pinhole cameras with different orientations coincide with the camera center of the large field of view camera.

Description

一种同时定位与建图的方法及装置 技术领域
本发明涉及同时定位与建图领域,尤其涉及一种基于大视场相机的同时定位与建图领域。
背景技术
同时定位与建图(Simultaneous Localization And Mapping,简称SLAM)是一种通过实时跟踪机器人运动并在此过程中同时建立周围环境地图以达到定位导航等目标的技术。
传统的SLAM使用的相机为透视相机(perspective camera)或者称为针孔相机(pinhole camera)。由于相机的视角(Field-of-View)有限,所获取的图像间存在的共有特征不足,可能导致SLAM算法跟踪丢失。相对于传统SLAM使用的针孔相机,大视场相机具有更大的视角,因而得到了广泛的研究和关注。
现有的基于大视场图像的SLAM技术方案主要有两种。
一种是先对大视场相机获得的大视场图像使用传统的去畸变方法进行去畸变处理,再把去畸变后的图像作为普通图像使用传统SLAM技术实现同时定位与建图。这种技术方案简单易行,但是传统的去畸变方法将导致损失很多的视角,不能充分利用大视场相机的广视角。
另一种是基于大视场相机成像模型直接针对未做畸变校正的大视场图像进行SLAM处理。即直接在未做畸变校正的大视场图像上提取特征进行处理。这种技术方案提取的特征可能会受到图像畸变的影响,同时复杂的大视场相机成像模型将导致优化变得异常复杂,从而影响系统的性能。
因此,迫切需要一种新的SLAM技术,能够保留大视场相机的所有视野同时避免图像畸变的影响,同时又能兼顾对景深的探测、定位、建图。
发明内容
本申请的目的在于提供一种同时定位与建图的方法。该方法可以,基于多虚拟针孔相机模型,将大视场相机获取的大视场图像去畸变;根据去畸变后的图像进行同时 定位和建图。
本申请一方面提供一种同时定位与建图的方法。所述方法包括:通过大视场相机获取大视场图像;基于多虚拟针孔相机模型,得到所述大视场图像对应的去畸变图像;基于所述去畸变图像,确定所述大视场相机的位姿并构建地图。其中,所述多虚拟针孔相机模型包括至少两个不同朝向的虚拟针孔相机,且所述至少两个不同朝向的虚拟针孔相机的相机中心与所述大视场相机的相机中心重合。
在一些实施例中,所述大视场相机为单目大视场相机。所述基于所述去畸变图像,确定所述大视场相机的位姿并构建地图包括初始化步骤,所述初始化步骤包括:获取第一时刻对应的去畸变图像和第二时刻对应的去畸变图像;确定所述第一时刻对应的去畸变图像和所述第二时刻对应的去畸变图像互相匹配特征点;基于所述互相匹配的特征点构建初始地图。
在一些实施例中,所述基于所述互相匹配的特征点构建初始地图包括:基于所述第一时刻对应的去畸变图像中的特征点和所述第一时刻时所述大视场相机的相机中心,确定第一特征点对应的方向向量;基于所述第二时刻对应的去畸变图像中匹配的特征点和所述第二时刻时所述大视场相机的相机中心,确定第二特征点对应的方向向量;对所述第一特征点对应的方向向量和所述第二特征点对应的方向向量进行三角测量,确定所述特征点对应的地图点;基于所述地图点构建初始地图。
在一些实施例中,所述大视场相机为单目大视场相机。所述基于所述去畸变图像,确定所述大视场相机的位姿并构建地图包括全局捆集优化步骤,所述全局捆集优化步骤包括:对于所述地图中的每个关键大视场帧,将所述关键大视场帧关联的每个地图点投影到多虚拟针孔相机模型中,得到所述地图点在所述多虚拟针孔相机模型中的重投影点;根据所述地图点在所述多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述关键大视场帧关联的地图点的重投影误差确定重投影误差;基于所述重投影误差,更新所述关键大视场帧的位姿以及与所述关键大视场帧关联的所有地图点的位置。
在一些实施例中,所述大视场相机为单目大视场相机。所述基于所述去畸变图像,确定所述大视场相机的位姿并构建地图包括跟踪步骤,所述跟踪步骤包括:对于当前大视场帧关联的每个地图点,将所述地图点投影到多虚拟针孔相机模型中,得到所述地图点在所述多虚拟针孔相机模型中的重投影点;根据所述地图点在所述多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据 所有所述当前大视场帧关联的地图点的重投影误差确定重投影误差;基于所述重投影误差,更新所述当前大视场帧的位姿。
在一些实施例中,所述大视场相机为单目大视场相机。所述基于所述去畸变图像,确定所述大视场相机的位姿并构建地图包括建图步骤,所述建图步骤包括:确定当前大视场帧及其参考帧互相匹配的特征点;基于所述当前大视场帧的特征点和当前所述大视场相机的相机中心,确定第一特征点对应的方向向量;基于所述参考帧匹配的特征点和所述参考帧对应的所述大视场相机的相机中心,确定第二特征点对应的方向向量;对所述第一特征点对应的方向向量和所述第二特征点对应的方向向量进行三角测量,确定所述特征点对应的地图点;基于所述地图点构建地图。
在一些实施例中,所述建图步骤进一步包括局部捆集优化步骤。所述局部捆集优化步骤包括:对于局部地图中的每个关键大视场帧,将所述关键大视场帧关联的每个地图点投影到多虚拟针孔相机模型中,得到所述地图点在所述多虚拟针孔相机模型中的重投影点;根据所述地图点在所述多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述关键大视场帧关联的地图点的重投影误差确定重投影误差;根据所述重投影误差,更新所述关键大视场帧的位姿以及与该关键大视场帧关联的所有地图点的位置。
在一些实施例中,所述大视场相机为双目大视场相机。所述方法包括:通过所述双目大视场相机获取左视场图像和右视场图像;基于第一多虚拟针孔相机模型,得到所述左视场图像对应的左去畸变图像;基于第二多虚拟针孔相机模型,得到所述右视场图像对应的右去畸变图像;基于所述左去畸变图像和所述右去畸变图像,确定所述双目大视场相机的位姿并构建地图。其中,所述第一多虚拟针孔相机模型包括至少两个不同朝向的虚拟针孔相机,且所述至少两个不同朝向的虚拟针孔相机的相机中心与所述双目大视场相机的左侧相机的相机中心重合;所述第二多虚拟针孔相机模型包括至少两个不同朝向的虚拟针孔相机,且所述至少两个不同朝向的虚拟针孔相机的相机中心与所述双目大视场相机的右侧相机的相机中心重合。
在一些实施例中,所述基于所述左去畸变图像和所述右去畸变图像。确定所述双目大视场相机的位姿并构建地图包括初始化步骤,所述初始化步骤包括:确定所述左去畸变图像和所述右去畸变图像互相匹配的特征点;基于所述互相匹配的特征点构建初始地图。
在一些实施例中,所述确定所述左去畸变图像和所述右去畸变图像互相匹配的 特征点包括:确定所述左去畸变图像中的特征点在所述右去畸变图像中对应的极线;在所述极线上搜索与所述左去畸变图像中的特征点匹配的特征点。其中,所述极线为多线段折线。
在一些实施例中,所述基于所述互相匹配的特征点构建初始地图包括:基于所述左去畸变图像中的特征点和所述双目大视场相机的左侧相机的相机中心,确定第一特征点对应的方向向量;基于所述右去畸变图像中匹配的特征点和所述双目大视场相机的右侧相机的相机中心,确定第二特征点对应的方向向量;基于所述双目大视场相机的基线,对所述第一特征点对应的方向向量和所述第二特征点对应的方向向量进行三角测量,确定所述特征点对应的地图点;基于所述地图点构建初始地图。
在一些实施例中,所述基于所述左去畸变图像和所述右去畸变图像,确定所述双目大视场相机的位姿并构建地图包括全局捆集优化步骤。所述全局捆集优化步骤包括:对于所述地图中的每个关键双目图像帧,将所述关键双目图像帧关联的地图点投影到第一多虚拟针孔相机模型中,得到所述地图点在所述第一多虚拟针孔相机模型中的重投影点;根据所述地图点在所述第一多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述关键双目图像帧关联的地图点的重投影误差确定左重投影误差;或将所述关键双目图像帧关联的地图点投影到第二多虚拟针孔相机模型中,得到所述地图点在所述第二多虚拟针孔相机模型中的重投影点;根据所述地图点在所述第二多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述关键双目图像帧关联的地图点的重投影误差确定右重投影误差;基于所述左重投影误差、所述右重投影误差或所述左重投影误差和所述右重投影误差之和,更新所述关键双目图像帧的位姿以及与所述关键双目图像帧关联的所有地图点的位置。
在一些实施例中,所述基于所述左去畸变图像和所述右去畸变图像,确定所述双目大视场相机的位姿并构建地图包括跟踪步骤。所述跟踪步骤包括:对于当前双目图像帧关联的每个地图点,将所述地图点投影到第一多虚拟针孔相机模型中,得到所述地图点在所述第一多虚拟针孔相机模型中的重投影点;根据所述地图点在所述第一多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述当前双目图像帧关联的地图点的重投影误差确定左重投影误差;或将所述地图点投影到第二多虚拟针孔相机模型中,得到所述地图点在所述第二多虚拟针孔相机模型中的重投影点;根据所述地图点在所述第二多虚拟针孔相机模型中的重投影点与所 述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述当前双目图像帧关联的地图点的重投影误差确定右重投影误差;基于所述左重投影误差、所述右重投影误差或所述左重投影误差和所述右重投影误差之和,更新所述当前双目图像帧的位姿。
在一些实施例中,所述基于所述左去畸变图像和所述右去畸变图像,确定所述双目大视场相机的位姿并构建地图包括建图步骤。所述建图步骤包括:确定当前左去畸变图像和当前右去畸变图像互相匹配的特征点;基于所述当前左去畸变图像的特征点和当前所述双目大视场相机的左侧相机的相机中心,确定第一特征点对应的方向向量;基于所述当前右去畸变图像的特征点和当前所述双目大视场相机的右侧相机的相机中心,确定第二特征点对应的方向向量;对所述第一特征点对应的方向向量和所述第二特征点对应的方向向量进行三角测量,确定所述特征点对应的地图点;基于所述地图点构建地图。
在一些实施例中,所述建图步骤进一步包括局部捆集优化步骤。所述局部捆集优化步骤包括:对于局部地图中的每个关键双目图像帧,将所述关键双目图像帧关联的地图点投影到第一多虚拟针孔相机模型中,得到所述地图点在所述第一多虚拟针孔相机模型中的重投影点;根据所述地图点在所述第一多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述关键双目图像帧关联的地图点的重投影误差确定左重投影误差;或将所述关键双目图像帧关联的地图点投影到第二多虚拟针孔相机模型中,得到所述地图点在所述第二多虚拟针孔相机模型中的重投影点;根据所述地图点在所述第二多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述关键双目图像帧关联的地图点的重投影误差确定右重投影误差;基于所述左重投影误差、所述右重投影误差或所述左重投影误差和所述右重投影误差之和,更新所述关键双目图像帧的位姿以及与所述关键双目图像帧关联的所有地图点的位置。
在一些实施例中,所述基于所述去畸变图像,确定所述大视场相机的位姿并构建地图包括闭环检测处理步骤。所述闭环检测处理步骤包括:当当前大视场帧是关键大视场帧时,确定地图数据库中与所述当前大视场帧相似的闭环大视场帧;确定所述当前大视场帧与所述闭环大视场帧互相匹配的特征点;针对所述当前大视场帧中每个匹配的特征点,将该特征点关联的地图点变换到所述闭环大视场帧对应的多虚拟针孔相机模型的坐标系中,再投影到所述多虚拟针孔相机模型的成像平面上,得到该地图点在所述闭环大视场帧中的重投影点,根据该重投影点与所述闭环大视场帧中匹配的特征点确定 第一重投影误差;根据所述当前大视场帧中所有匹配的特征点的第一重投影误差确定第一累计重投影误差;针对所述闭环大视场帧中每个匹配的特征点,将该特征点关联的地图点变换到所述当前大视场帧对应的多虚拟针孔相机模型的坐标系中,再投影到所述多虚拟针孔相机模型的成像平面上,得到该地图点在所述当前大视场帧中的重投影点,根据该重投影点与所述当前大视场帧中匹配的特征点确定第二重投影误差;根据所述闭环大视场帧中所有匹配的特征点的第二重投影误差确定第二累计重投影误差;利用所述第一累计重投影误差和所述第二累计重投影误差,对地图中与所述当前大视场帧具有共视关系的关键大视场帧以及与其关联的地图点进行校正。
在一些实施例中,所述至少两个不同朝向包括:立方体的前朝向、上朝向、下朝向、左朝向或右朝向。
本申请一方面提供一种同时定位与建图的装置,所述装置包括至少一个存储设备,所述存储设备包括一组指令;以及与所述至少一个存储设备通信的至少一个处理器。其中,当执行所述一组指令时,所述至少一个处理器用于使所述同时定位与建图的装置:通过大视场相机获取大视场图像;基于多虚拟针孔相机模型,得到所述大视场图像对应的去畸变图像;基于所述去畸变图像,确定所述大视场相机的位姿并构建地图。所述多虚拟针孔相机模型包括至少两个不同朝向的虚拟针孔相机,且所述至少两个不同朝向的虚拟针孔相机的相机中心与所述大视场相机的相机中心重合。
本申请中另外的特征将部分地在下面的描述中阐述。通过该阐述,使以下附图和实施例叙述的内容对本领域普通技术人员来说变得显而易见。本申请中的发明点可以通过实践或使用下面讨论的详细示例中阐述的方法、手段及其组合来得到充分阐释。
附图说明
以下附图详细描述了本申请中披露的示例性实施例。其中相同的附图标记在附图的若干视图中表示类似的结构。本领域的一般技术人员将理解这些实施例是非限制性的、示例性的实施例,附图仅用于说明和描述的目的,并不旨在限制本公开的范围,其他方式的实施例也可能同样的完成本申请中的发明意图。应当理解,附图未按比例绘制。其中:
图1示出了根据本申请的一些实施例所示的同时定位与建图的系统;
图2示出了根据本申请的一些实施例所示的同时定位与建图的方法的流程图;
图3示出了根据本申请的一些实施例所示的包含两个朝向的多虚拟针孔相机模 型;
图4示出了根据本申请的一些实施例所示的包含五个朝向的多虚拟针孔相机模型;
图5示出了根据本申请的一些实施例所示的基于多虚拟针孔相机模型去畸变的示意图;
图6示出了根据本申请的一些实施例所示的原始单目鱼眼图像、传统去畸变后的单目鱼眼图像,以及利用本公开的方法去畸变后的单目鱼眼图像;
图7示出了根据本申请的一些实施例所示的原始双目鱼眼图像和传统去畸变后的双目鱼眼图像;
图8示出了根据本申请的一些实施例所示的确定相机位姿和构建地图的流程图;
图9示出了根据本申请的一些实施例所示的单目大视场相机构建地图点的示意图;
图10示出了根据本申请的一些实施例所示的双目大视场相机极线搜索的示意图;
图11示出了根据本申请的一些实施例所示的双目大视场相机构建地图点的示意图。
具体实施方式
以下描述提供了本申请的特定应用场景和要求,目的是使本领域技术人员能够制造和使用本申请中的内容。对于本领域技术人员来说,对所公开的实施例的各种局部修改是显而易见的,并且在不脱离本公开的精神和范围的情况下,可以将这里定义的一般原理应用于其他实施例和应用。因此,本公开不限于所示的实施例,而是与权利要求一致的最宽范围。
这里使用的术语仅用于描述特定示例实施例的目的,而不是限制性的。比如,除非上下文另有明确说明,这里所使用的,单数形式“一”,“一个”和“该”也可以包括复数形式。当在本说明书中使用时,术语“包括”、“包含”和/或“含有”意思是指所关联的整数,步骤、操作、元素和/或组件存在,但不排除一个或多个其他特征、整数、步骤、操作、元素、组件和/或在该系统/方法中可以添加其他特征、整数、步骤、操作、元素、组件。
考虑到以下描述,本公开的这些特征和其他特征、以及结构的相关元件的操作和功能、以及部件的组合和制造的经济性可以得到明显提高。参考附图,所有这 些形成本公开的一部分。然而,应该清楚地理解,附图仅用于说明和描述的目的,并不旨在限制本公开的范围。
本公开中使用的流程图示出了根据本公开中的一些实施例的系统实现的操作。应该清楚地理解,流程图的操作可以不按顺序实现。相反,操作可以以反转顺序或同时实现。此外,可以向流程图添加一个或多个其他操作。可以从流程图中移除一个或多个操作。
本公开的一个方面涉及一种同时定位与建图的方法。具体地,该方法包括,基于多虚拟针孔相机模型,将大视场相机获取的大视场图像去畸变得到去畸变图像;基于去畸变图像确定大视场相机的位姿并构建地图。所述多虚拟针孔相机模型包括至少两个不同朝向的虚拟针孔相机,且所述至少两个不同朝向的虚拟针孔相机的相机中心与所述大视场相机的相机中心重合。
图1示出了根据本申请的一些实施例所示的同时定位与建图的系统。
同时定位与建图的系统100可以获取大视场图像并执行同时定位与建图的方法。所述同时定位与建图的方法可以参考图2至图11的描述。
如图所示,同时定位和建图的系统100可以包括大视场相机101和同时定位和建图的装置102。所述大视场相机101和同时定位和建图的装置102可以作为整体安装,也可以分别安装。处于陈述本公开的发明点方便,本公开中的大视场相机以鱼眼相机为例。
大视场相机101用于获取景物的鱼眼图像。在一些实施例中,大视场相机101可以为鱼眼相机、反射折射相机、全景成像相机。在一些实施例中,大视场相机101可以为单目大视场相机、双目大视场相机或多目大视场相机。
作为示例,大视场相机101包括单目鱼眼相机和双目鱼眼相机。双目鱼眼相机的左侧相机称为左目;双目鱼眼相机的右侧相机称为右目。左目获取的图像称为左鱼眼图像(左视场图像),右目获取的图像称为右鱼眼图像(右视场图像)。
同时定位和建图的装置102为可以执行同时定位和建图的方法的示例性计算设备。
在一些实施例中,同时定位和建图的装置102可以包括COM端口150,以便于数据通信。同时定位和建图的装置102还可以包括处理器120,处理器120以一个或多个处理器的形式,用于执行计算机指令。计算机指令可以包括例如执行本文描述的特定功能的例程,程序,对象,组件,数据结构,过程,模块和功能。例如, 处理器120可以基于多虚拟针孔相机模型确定鱼眼图像的去畸变图像。又例如,处理器120可以基于去畸变图像确定大视场相机101的位姿并构建地图。
在一些实施例中,处理器120可以包括一个或多个硬件处理器,例如微控制器,微处理器,精简指令集计算机(RISC),专用集成电路(ASIC),特定于应用的指令-集处理器(ASIP),中央处理单元(CPU),图形处理单元(GPU),物理处理单元(PPU),微控制器单元,数字信号处理器(DSP),现场可编程门阵列(FPGA),高级RISC机器(ARM),可编程逻辑器件(PLD),能够执行一个或多个功能的任何电路或处理器等,或其任何组合。
在一些实施例中,同时定位和建图的装置102可以包括内部通信总线110,程序存储和不同形式的数据存储(例如,磁盘170,只读存储器(ROM)130,或随机存取存储器(RAM)140)。同时定位和建图的装置102还可以包括存储在ROM 130,RAM 140和/或将由处理器120执行的其他类型的非暂时性存储介质中的程序指令。本申请的方法和/或过程可以作为程序指令实现。同时定位和建图的装置102还包括I/O组件160,支持计算机和其他组件(例如,用户界面元件)之间的输入/输出。同时定位和建图的装置102还可以通过网络通信接收编程和数据。
仅仅为了说明问题,在本申请中同时定位和建图的装置102中仅描述了一个处理器。然而,应当注意,本申请中同时定位和建图的装置102还可以包括多个处理器,因此,本申请中披露的操作和/或方法步骤可以如本公开所述的由一个处理器执行,也可以由多个处理器联合执行。例如,如果在本申请中同时定位和建图的装置102的处理器120执行步骤A和步骤B,则应该理解,步骤A和步骤B也可以由信息处理中的两个不同处理器联合或分开执行(例如,第一处理器执行步骤A,第二处理器执行步骤B,或者第一和第二处理器共同执行步骤A和B)。
图2示出了根据本申请的一些实施例所示的同时定位与建图的方法的流程图。流程200可以实施为同时定位和建图的装置102中的非临时性存储介质中的一组指令。同时定位和建图的装置102可以执行该一组指令并且可以相应地执行流程200中的步骤。
以下呈现的所示流程200的操作,旨在是说明性的而非限制性的。在一些实施例中,流程200在实现时可以添加一个或多个未描述的额外操作,和/或删减一个或多个此处所描述的操作。此外,图2中所示的和下文描述的操作的顺序并不对此加以限制。
在210中,同时定位和建图的装置102可以通过大视场相机101获取大视场图像。
当大视场相机101为单目大视场相机时,单目大视场相机获取大视场图像;当大视场相机101为双目大视场相机时,双目大视场相机获取大视场图像,包括左视场图像和右视场图像。
在220中,同时定位和建图的装置102可以基于多虚拟针孔相机模型,得到所述大视场图像对应的去畸变图像。
当大视场相机101为单目大视场相机时,同时定位和建图的装置102可以基于多虚拟针孔相机模型,得到所述大视场图像对应的去畸变图像。
上述多虚拟针孔相机模型可以包括至少两个不同朝向的虚拟针孔相机,且所述至少两个不同朝向的虚拟针孔相机的相机中心与所述单目大视场相机的相机中心重合。
当大视场相机101为双目大视场相机时,同时定位和建图的装置102可以基于第一多虚拟针孔相机模型,得到所述左视场图像对应的左去畸变图像;基于第二多虚拟针孔相机模型,得到所述右视场图像对应的右去畸变图像。所述第一多虚拟针孔相机模型和所述第二多虚拟针孔相机模型可以相同或不同。
上述第一多虚拟针孔相机模型可以包括至少两个不同朝向的虚拟针孔相机,所述至少两个不同朝向的虚拟针孔相机的相机中心与大视场相机101的左目的相机中心重合;上述第二多虚拟针孔相机模型可以包括至少两个不同朝向的虚拟针孔相机,所述至少两个不同朝向的虚拟针孔相机的相机中心与大视场相机101的右目的相机中心重合。
作为示例,图3示出了根据本申请的一些实施例所示的包含两个朝向的多虚拟针孔相机模型。两个虚拟针孔相机的朝向为90度夹角,且相机中心与大视场相机的相机中心重合于C点。
作为示例,图4示出了根据本申请的一些实施例所示的包含五个朝向的多虚拟针孔相机模型。如图所示,该多虚拟针孔相机模型包括立方体的前向、上朝向、下朝向、左朝向以及右朝向共5个朝向的虚拟针孔相机。5个虚拟针孔相机的相机中心与大视场相机的相机中心重合于C点。此时,上述去畸变方法称为基于立方体映射的去畸变方法(cubemap-based undistortion method)(以下简称Cube或者立方体模型)。
具体地,同时定位和建图的装置102可以将大视场图像(或左视场图像、右视场图像)投影到多虚拟针孔相机模型(或第一多虚拟针孔相机模型、第二多虚拟针孔相机模型)中,得到至少两个不同朝向的虚拟针孔相机的投影图,展开所述至少两个不同朝向的虚拟针孔相机的投影图,即可得到所述左鱼眼图像对应的去畸变图像。
参考图5,其示出了根据本申请的一些实施例所示的基于多虚拟针孔相机模型去畸变的示意图。以下,以第一多虚拟针孔相机模型和左视场图像为示例。
点A为双目大视场相机的左目的相机中心,点B、点C和点D为左视场图像中示例性的像素。第一多虚拟针孔相机510为立方体模型,包括五个朝向的虚拟针孔相机,分别为立方体的前朝向、上朝向、下朝向、左朝向及右朝向。所述五个朝向的虚拟针孔相机的相机中心重合于点A。
如图所示,将左视场图像投影到第一多虚拟针孔相机模型510的五个不同朝向的虚拟针孔相机的成像平面上。相应地,可以得到五个不同朝向的投影图。将所述五个不同朝向的投影图展开,即可得到左去畸变图像。
参考图6,其示出了根据本申请的一些实施例所示的原始单目鱼眼图像、传统去畸变后的单目鱼眼图像,以及利用本公开的方法去畸变后的单目鱼眼图像。
图610为通过单目鱼眼相机获取的一个大视场图像。可以看出,该大视场图像比普通相机获得的图像有更为宽广的视场,但是整个图像存在空间畸变,离图像中心越远畸变越大。
图620为通过传统的去畸变方法对该大视场图像进行去畸变处理后得到去畸变图像。普通相机获得的图像视角一般约为80度,图620的视角为100度。虽然相比普通相机获得的图像视角有了提升,但相对于未去畸变处理前的图像还是损失了很多视角。由此,无法构建包括大视场图像的所有视角的地图。
图630根据本发明一个实施例的基于5个朝向的多虚拟针孔相机模型去畸变展开的大视场去畸变图像,即通过立方体模型得到的去畸变图像。如图所示,图630保留了大视场图像的所有视角。基于该大视场去畸变图像进行SLAM能够构建包括原有所有视角内容的地图。
图7示出了根据本申请的一些实施例所示的原始双目鱼眼图像和传统去畸变后的双目鱼眼图像。
如图所示,图像701和图像702分别为大视场相机101在真实世界获取的原 始左鱼眼图像和右鱼眼图像。图像703和图像704分别为传统去畸变后的左去畸变图像和右去畸变图像。
作为与通过立方体模型处理后的左去畸变图像和右去畸变图像(图中均为示出)的对比,图像601和图像602经过传统去畸变方法获得的单张图像,图像的纵向和横向的视角均仅为100度。可见,对于大视场相机101获取的大视角图像,本申请提供的去畸变方法可以在保留大视角的同时,有效地防止图像畸变。
在230中,同时定位和建图的装置102可以基于所述去畸变图像,确定所述大视场相机的位姿并构建地图。
在一些实施例中,对于单目大视场相机,同时定位和建图的装置102可以提取去畸变图像的特征点,并基于所提取的特征点构建对应的大视场帧;然后基于所述大视场帧确定单目大视场相机的位姿并构建地图。
可选地,通过提取大视场去畸变图像的特征点,也即大视场去畸变图像的关键点和描述子,基于大视场去畸变图像的特征点跟踪相机运动的位姿并构建地图。可选地,直接根据大视场去畸变图像中的像素亮度信息估计相机的运动的位姿并构建地图,不用计算关键点和描述子。
通过上述基于多虚拟针孔相机模型去畸变的方法得到的大视场去畸变图像保留了原始大视场图像的所有视角。由此可以基于大视场图像间丰富的共有特征进行同时定位和建图,获得更高效的定位和更准确的地图。与此同时,上述方法还避免了大视场相机复杂的投影模型给系统带来的额外计算成本。
在一些实施例中,对于双目大视场相机,同时定位和建图的装置102可以提取左去畸变图像和右去畸变图像的特征点,并基于所提取的特征点构建对应的双目图像帧;然后基于所述双目图像帧确定双目大视场相机的位姿并构建地图。
因为大视场帧(或双目图像帧)中包括去畸变图像(或左去畸变图像、右去畸变图像)中的全部特征点的信息,所以能够依此跟踪大视场相机101运动的位姿并构建地图。
作为示例,同时定位和建图的装置102可以对上述去畸变图像(或左去畸变图像、右去畸变图像)进行缩放,获取该去畸变图像(或左去畸变图像、右去畸变图像)对应的图像金字塔(Image Pyramid)。在该图像金字塔的各尺度图像中提取角点并计算描述子。由所述角点和描述子构成图像的特征点。所述角点是图像中辨识度高且具有代表性的区域,用于表示特征点在图像中的位置信息。描述子可以 用向量表示,用于描述角点周围像素的信息。描述子可以按照外观相似的特征点应该有相似的描述子设计。
针对去畸变图像(或左去畸变图像、右去畸变图像)提取其特征点,并基于所提取的特征点构建对应的大视场帧(或双目图像帧)。该大视场帧(或双目图像帧)包括对应的去畸变图像(或左去畸变图像、右去畸变图像)中所有的特征点。大视场帧(或双目图像帧)构建完成后,可将该大视场帧(或双目图像帧)对应的去畸变图像(或左去畸变图像、右去畸变图像)的像素数据丢弃,从而节约存储空间,降低系统功耗。
更多关于步骤230的描述参加图8及其相关描述。
需要注意的是,当大视场相机101为双目大视场相机时,双目大视场相机的左目和右目的光轴可能不平行。因此,流程200可以进一步包括,使大视场相机101的左目和右目的光轴平行。例如,同时定位和建图的装置102可以通过双目相机标定程序调整双目鱼眼相机左目和右目的虚拟光轴,使得两者的虚拟光轴平行。
图8示出了根据本申请的一些实施例所示的确定相机位姿和构建地图的流程图。流程230可以实施为同时定位和建图的装置102中的非临时性存储介质中的一组指令。同时定位和建图的装置102可以执行该一组指令并且可以相应地执行流程200中的步骤。
以下呈现的所示流程230的操作,旨在是说明性的而非限制性的。在一些实施例中,流程230在实现时可以添加一个或多个未描述的额外操作,和/或删减一个或多个此处所描述的操作。此外,图8中所示的和下文描述的操作的顺序并不对此加以限制。
在810中,同时定位和建图的装置102可以执行初始化步骤,所述初始化步骤可以构建初始地图。
对于单目大视场相机,同时定位和建图的装置102可以获取两个不同时刻的去畸变图像(或大视场帧);确定所述两个不同时刻的去畸变图像(或大视场帧)互相匹配的特征点;基于所述互相匹配的特征点构建初始地图。
作为示例,同时定位和建图的装置102可以获取第一时刻对应的去畸变图像(或大视场帧)和第二时刻对应的去畸变图像(或大视场帧);确定所述第一时刻对应的去畸变图像(或大视场帧)和所述第二时刻对应的去畸变图像(或大视场帧)互相匹配的特征点;基于所述互相匹配的特征点构建初始地图。
在一些实施例中,第一时刻对应的大视场帧和第二时刻对应的大视场帧可以为当前大视场帧和参考大视场帧。所述当前大视场帧和参考大视场帧可以是连续帧,也可以在二者中间具有一个或多个间隔帧。当前大视场帧和参考大视场帧之间需要存在一定的视差,以保证初始化的顺利进行。
在一些实施例中,同时定位和建图的装置102可以基于多虚拟针孔相机模型(例如,图4所示的多虚拟针孔相机模型),将第一时刻对应的去畸变图像(或大视场帧)和第二时刻对应的去畸变图像(或大视场帧)分解为分别对应每个虚拟针孔相机的子视场帧。从而对于每个虚拟针孔相机得到与之对应的两个子视场帧,这两个子视场帧分别来自第一时刻对应的去畸变图像(或大视场帧)和第二时刻对应的去畸变图像(或大视场帧)。通过对这两个子视场帧进行帧间匹配来确定互相匹配的特征点。
在一些实施例中,基于互相匹配的特征点构建初始地图包括:基于所述第一时刻对应的去畸变图像中的特征点和所述第一时刻时所述大视场相机的相机中心,确定第一特征点对应的方向向量;基于所述第二时刻对应的去畸变图像中匹配的特征点和所述第二时刻时所述大视场相机的相机中心,确定第二特征点对应的方向向量;对所述第一特征点对应的方向向量和所述第二特征点对应的方向向量进行三角测量,确定所述特征点对应的地图点;基于所述地图点构建初始地图。
具体地,同时定位和建图的装置102可以把参考大视场帧F1基于该多虚拟针孔相机模型分解为分别对应每个虚拟针孔相机的子视场帧F11、F12、F13、F14和F15;把当前大视场帧F2也基于该多虚拟针孔相机模型分解为分别对应每个虚拟针孔相机的子视场帧F21、F22、F23、F24和F25。其中,子视场帧F11和F21对应于前向的虚拟针孔相机,子视场帧F12和F22对应于上朝向的虚拟针孔相机,子视场帧F13和F23对应于下朝向的虚拟针孔相机,子视场帧F14和F24对应于左朝向的虚拟针孔相机,子视场帧F15和F25对应于右朝向的虚拟针孔相机。通过对子视场帧F11和F21、F12和F22、F13和F23、F14和F24以及F15和F25进行帧间匹配来确定当前大视场帧和参考大视场帧互相匹配的特征点。[这里,子视场帧的匹配用于确定两个视场帧互相匹配的特征点,进而基于方向向量进行三角测量来构建新的地图点。这样描述是否正确]
下面以子视场帧F11和F21为例对帧间匹配进行说明。
首先,对子视场帧F11和F21的特征点进行匹配,检测匹配的特征点对的个 数是否大于或等于初始化阈值,若小于初始化阈值,则初始化失败。如果匹配的特征点对的个数超过初始化阈值,运用例如随机抽样一致算法(Random Sample Consensus,简称RANSAC)基于匹配的特征点对的方向向量计算两帧之间的本质(Essential)矩阵。其中,初始化阈值表示初始化构建地图所需的最少的特征点对的数量,可以直接使用默认设置值,如100,也可由用户预先设置。
然后通过分解本质矩阵得到当前大视场帧与参考大视场帧之间的相对位姿,所述相对位姿可以通过位姿矩阵表示。根据当前大视场帧与参考大视场帧之间的相对位姿对匹配的特征点对进行三角测量得到该特征点对对应的地图点的三维坐标,也即地图点的位置。
如图9所示,O1点是子视场帧F11对应的虚拟针孔相机的相机中心,O2点是子视场帧F12对应的虚拟针孔相机的相机中心,p1和p2是匹配的特征点。通过向量O1p1的方向和向量O2p2的方向,就可以确定地图点的三维坐标也即P点的位置。在SLAM中由于噪声的影响,向量O1p1和向量O2p2有可能没有交点,这个时候可以使用例如最小二乘法求取使误差最小的P点的坐标。O1和O2之间的距离对三角测量的误差影响很大。距离太短,也就是说相机的平移太小时,对P点观测的角度误差会导致较大的深度误差。而距离太远,场景的重叠部分会少很多,使特征匹配变得困难。因此,当前大视场帧和参考大视场帧之间需要存在一定的视差。如果选取的两个大视场帧不满足要求,初始化失败,放弃这两个大视场帧并重新初始化。
最后基于上述三角测量得到地图点的三维坐标构建初始的地图点。其中,以该三维坐标作为地图点的坐标,以该三维坐标对应的特征点的描述子作为地图点的描述子。
对于双目大视场相机,同时定位和建图的装置102可以执行上述单目大视场相机的初始化步骤;也可以基于同一时刻的左去畸变图像和右去畸变图像互相匹配的特征点来构建初始地图。
作为示例,同时定位和建图的装置102可以确定左去畸变图像和右去畸变图像互相匹配的特征点;基于所述互相匹配的特征点构建初始地图。
在一些实施例中,同时定位和建图的装置102可以确定左去畸变图像中的特征点在右去畸变图像中对应的极线;然后在所述极线上搜索与所述左去畸变图像中的特征点匹配的特征点。其中,所述极线为多线段折线。
参考图10,其示出了根据本申请的一些实施例所示的双目大视场相机极线搜索的示意图。左去畸变图像1010上有极线1001,右去畸变图像1020上有极线1002。与左去畸变图像1010的特征点相匹配的特征点一定位于极线1002中。相反,与右去畸变图像1020的特征点相匹配的特征点一定位于极线1001中。因此,通过极线搜索即可快速地找到左去畸变图像和右去畸变图像互相匹配的特征点。
如图所示,极线1001和极线1002为三线段折线,包括两条倾斜的线段和一条水平的线段。
如图所示,左去畸变图像1010和右去畸变图像1020分别保留了左鱼眼图像和右鱼眼图像的所有视角。基于左去畸变图像1010和右去畸变图像1020进行同时定位和建图能够构建包括原有所有视角内容的地图。
在一些实施例中,上述基于所述互相匹配的特征点构建地图包括:首先,基于左去畸变图像中的特征点和大视场相机101左目的相机中心,确定第一特征点对应的方向向量;其次,基于右去畸变图像中匹配的特征点和大视场相机101右目的相机中心,确定第二特征点对应的方向向量;再次,基于所述双目鱼眼相机的基线,对所述第一特征点对应的方向向量和所述第二特征点对应的方向向量进行三角测量,确定所述特征点对应的地图点;最后,基于所述地图点构建地图。
参考图11,其示出了根据本申请的一些实施例所示的双目大视场相机构建地图点的示意图。以下,以大视场相机101前面的地图点为例。
点O1为大视场相机101左目的相机中心,连接左去畸变图像中的特征点和点O1得到第一特征点对应的方向向量。点O2为大视场相机101右目的相机中心,连接右去畸变图像中匹配的特征点和点O2得到第二特征点对应的方向向量。在一些实施例中,第一特征点对应的方向向量和第二特征点对应的方向向量可以为单位化后的向量。
所述第一特征点对应的方向向量和所述第二特征点对应的方向向量相交于点E,分别得到线段O1E和线段O2E。连接点O1和点O2得到线段O1O2,线段O1O2的长度为b(即大视场相机101的基线)。线段O1O2与线段O1E和线段O2E形成三角形。对所述三角形求解,得出线段O1E的长度为d1,线段O2E的长度为d2,O1O2与线段O1E的夹角为,O1O2与线段O2E的夹角为,进而得出特征点对应的地图点E的坐标。再结合大视场相机101的当前位姿,将地图点E从大视场相机101的坐标系转化到世界坐标系。然后,根据世界坐标系中的点E的位置构建地图。
具体地,同时定位和建图的装置102可以基于以下公式进行三角测量。首先基于正余弦定理得到公式(1)、(2)和(3)。
Figure PCTCN2018124786-appb-000001
公式(1)
Figure PCTCN2018124786-appb-000002
公式(2)
Figure PCTCN2018124786-appb-000003
公式(3)
合并公式(2)和(3),可以得到公式(4),表述如下:
d 1cosα 1+d 2cosα 2=b,公式(4)
合并公式(1)和公式(4),得到公式(5),表示如下:
Figure PCTCN2018124786-appb-000004
公式(5)
同时,结合公式(6),对公式(1)、(2)、(3)、(4)、(5)求解,可以得到d1和d2。
Figure PCTCN2018124786-appb-000005
公式(6)
在一些实施例中,对于单目大视场相机和双目大视场相机新构建的地图点,还需要进行关联处理。一个地图点可被多个关键大视场帧观测到,将观测到这个地图点的关键大视场帧与这个地图点进行关联,同时记录关键大视场帧上哪一个特征点与这个地图点有关联,也即哪一个特征点可以用于测量以得到这个地图点。对于上述初始化得到的地图点,需要关联初始化创建的两个关键大视场帧,同时记录这两个关键大视场帧上哪一个特征点与该地图点有关联。
所构建的初始地图,包括上述两个关键大视场帧和上述初始的地图点,以及它们之间的关联关系的信息。
在一些实施例中,初始化步骤还包括:对于匹配的特征点对的个数超过初始化阈值的情况,根据上述两个关键大视场帧构建基于词袋(Bag of Word)模型的向量,并将所述基于词袋模型的向量加入到地图数据库。在词袋模型中,根据各种图像特征进行聚类。比如说,眼睛、鼻子、耳朵、嘴、各种特征的边缘和角等等为不同的特征类。假设有10000个类,对于每一个大视场帧,可以分析它含有哪几个类,以1表示有,以0表示没有。那么,这个大视场帧就可用一个10000维的向量 来表达。对于不同的大视场帧,可以通过比较它们基于词袋模型的向量来判断它们的相似程度。所述地图数据库用于存储根据关键大视场帧构建的基于词袋模型的向量。
在820中,同时定位和建图的装置102可以执行全局捆集优化步骤。全局捆集优化对SLAM当前建立的地图(以下简称当前地图)中的所有关键大视场帧(或关键双目图像帧)和所有地图点进行优化。例如,对步骤810构建的初始地图进行全局捆集优化,即对上述只有两个关键大视频帧和地图点的地图进行全局捆集优化。可以理解,除了对初始地图,还可以在构建地图过程中的任意时刻,对该时刻当前的地图执行该全局捆集优化。捆集优化的目的在于通过微调地图中关键大视场帧(或关键双目图像帧)的位姿以及地图点的位置,最小化SLAM所构建地图中的地图点在关键大视场帧(或关键双目图像帧)上的重投影误差,由此优化所建立的地图。
对于单目大视场相机,同时定位和建图的装置102可以将地图中的关键大视场帧关联的每个地图点投影到多虚拟针孔相机模型中,得到每个地图点在所述多虚拟针孔相机模型中的重投影点;根据每个地图点在所述多虚拟针孔相机模型中的重投影点与该地图点对应的特征点,确定每个地图点的重投影误差;根据所有所述关键大视场帧关联的地图点的重投影误差确定重投影误差;基于所述重投影误差,更新所述关键大视场帧的位姿以及与所述关键大视场帧关联的所有地图点的位置。
需要注意的是,本申请中,帧的位姿(例如,关键大视场帧的位姿)为大视场相机101获取该帧的时刻大视场相机101运动的位姿,为了简洁,称其为帧的位姿。
以图4所示的5个朝向的多虚拟针孔相机模型为例,基于该多虚拟针孔相机模型将地图点变换到对应的虚拟针孔相机的坐标系中,例如与该地图点对应的虚拟针孔相机为前向的虚拟针孔相机,再投影到该前向的虚拟针孔相机的成像平面上以得到该地图点的重投影点。由于这里使用的多虚拟针孔相机模型与步骤220中大视场图像去畸变处理所使用的多虚拟针孔相机模型是同一个模型,因此该成像平面对应于该关键大视场帧分解到前向的虚拟针孔相机的子视场帧。该重投影点可以理解为基于该子视场帧的位姿对该地图点的观测值。根据该地图点在该关键大视场帧上关联的特征点(即三角测量得到该地图点所依据的特征点)以及该地图点的重投影点确定该地图点的重投影误差。在SLAM所建立地图不存在误差的理想状况下,重 投影误差为0。但是因为现实条件下不可避免地会引入测量误差等误差,导致重投影误差不能被完全消除,SLAM通过最小化该重投影误差来最优化所建立的地图。
对于双目大视场相机,同时定位和建图的装置102可以将地图中的关键双目图像帧关联的每个地图点投影到第一多虚拟针孔相机模型中,得到所述地图点在所述第一多虚拟针孔相机模型中的重投影点;根据所述地图点在所述第一多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述关键双目图像帧关联的地图点的重投影误差确定左重投影误差。
可替代地,同时定位和建图的装置102可以将所述关键双目图像帧关联的地图点投影到第二多虚拟针孔相机模型中,得到所述地图点在所述第二多虚拟针孔相机模型中的重投影点;根据所述地图点在所述第二多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述关键双目图像帧关联的地图点的重投影误差确定右重投影误差。
进一步地,同时定位和建图的装置102可以基于左重投影误差、右重投影误差或两者之和,更新关键双目图像帧的位姿以及与所述关键双目图像帧关联的所有地图点的位置。具体地,对于单目地图点,同时定位和建图的装置102可以基于左重投影误差或右重投影误差,更新关键双目图像帧的位姿以及双目图像帧关联的所有地图点的位置;对于双目地图点,同时定位和建图的装置102可以基于左重投影误差和右重投影误差之和,更新关键双目图像帧的位姿以及双目图像帧关联的所有地图点的位置。
在一些实施例中，同时定位和建图的装置102可以根据重投影误差（例如，左重投影误差、右重投影误差、左重投影误差和右重投影误差之和）确定损失函数。在得到损失函数之后，可选地，通过高斯牛顿法（Gauss-Newton）、列文伯格-马夸尔特法（Levenberg-Marquardt）等迭代优化方法求解关键大视场帧（或关键双目图像帧）的位姿以及与其关联的地图点的位置各自对应的梯度，通过各自对应的梯度更新关键大视场帧（或关键双目图像帧）的位姿以及与其关联的地图点的位置，最终使得当前的地图达到重投影误差最小的最优状态。
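作为根据重投影误差构成的损失函数进行迭代优化的一个通用示意，下面给出一段采用数值雅可比的高斯牛顿迭代的Python草图（并非本申请公开的实现；实际系统通常使用解析雅可比并在李代数上参数化位姿，residual_fn等名称均为示例性假设）：

```python
import numpy as np

def gauss_newton(residual_fn, x0, n_iters=10, eps=1e-6):
    """通用高斯-牛顿迭代: 最小化 ||r(x)||^2, 雅可比用数值差分近似.

    residual_fn: 给定参数向量 x(例如位姿与地图点位置的增量参数),
                 返回所有重投影误差拼成的一维残差向量 r(x).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        r = residual_fn(x)
        # 数值雅可比 J[i, j] = d r_i / d x_j
        J = np.zeros((r.size, x.size))
        for j in range(x.size):
            dx = np.zeros_like(x)
            dx[j] = eps
            J[:, j] = (residual_fn(x + dx) - r) / eps
        # 正规方程 (J^T J) δ = -J^T r, 更新 x <- x + δ
        delta = np.linalg.solve(J.T @ J + 1e-9 * np.eye(x.size), -J.T @ r)
        x = x + delta
        if np.linalg.norm(delta) < 1e-8:
            break
    return x
```

列文伯格-马夸尔特法在此基础上给正规方程加入随迭代自适应调整的阻尼项，以在高斯牛顿法与梯度下降之间平滑过渡。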
上述捆集优化基于与大视场图像去畸变处理一样的多虚拟针孔相机模型,把复杂的大视场相机的投影模型转换为多个虚拟针孔相机投影模型。由此避免了大视场相机复杂的投影模型带来的复杂优化处理,从而提升了系统处理性能。
在830中，同时定位和建图的装置102可以执行跟踪步骤。所述跟踪步骤通过最小化地图点在当前大视场帧（或当前双目图像帧）上的重投影误差来优化当前大视场相机的位姿。在跟踪步骤中，仅优化当前大视场相机的位姿，而其他时刻的大视场相机的位姿以及地图点的位置保持不变。步骤830可以在地图构建的任意时刻进行，例如基于上述初始化步骤810构建的初始地图或基于上述全局捆集优化步骤820优化后的地图，SLAM持续根据新的大视场帧（或双目图像帧）跟踪大视场相机运动的位姿。
对于单目大视场相机,同时定位和建图的装置102可以将当前大视场帧关联的每个地图点投影到多虚拟针孔相机模型中,得到所述地图点在所述多虚拟针孔相机模型中的重投影点;根据所述地图点在所述多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述当前大视场帧关联的地图点的重投影误差确定重投影误差;基于所述重投影误差,更新所述当前大视场帧的位姿。
在一些实施例中，同时定位和建图的装置102可以执行以下三个子步骤完成跟踪步骤。
跟踪子步骤1:确定当前大视场帧的参考大视场帧。
可选地,确定当前大视场帧的前一大视场帧为参考大视场帧。
可选地,选取局部地图中与当前大视场帧共视程度最高的关键大视场帧作为参考大视场帧。其中,对于当前地图中的关键大视场帧的数量小于N的情况,局部地图包括当前地图中所有的关键大视场帧以及所有的地图点,其中N为大于2的整数。N可以直接使用默认设置值,如10,也可由用户预先设置。如当前地图是初始化得到的地图,局部地图即为当前地图,包括初始的两个关键大视场帧以及与其关联的地图点。对于当前地图中的关键大视场帧的数量不小于N的情况,局部地图包括当前地图中与当前大视场帧共视程度最高的至少N个关键大视场帧以及与所述至少N个关键大视场帧关联的地图点。
可选地,选取局部地图中与当前大视场帧的前一大视场帧共视程度最高的关键大视场帧作为参考大视场帧。当前大视场帧和其前一大视场帧之间通常具有较高的共视程度,所以可以根据后者选取当前大视场帧的参考大视场帧。和与刚刚创建的当前大视场帧共视程度最高的关键大视场帧相比,与其前一大视场帧共视程度最高的关键大视场帧更易于选取。由此,利于SLAM方法的顺利实现。
可选地，通过全局匹配确定参考大视场帧。首先，根据当前大视场帧构建基于词袋模型的向量。然后，根据该基于词袋模型的向量查询初始化步骤810建立的地图数据库，以获取与当前大视场帧匹配的关键大视场帧，以作为参考大视场帧。
在一个示例中,对当前大视场帧和其前一大视场帧进行匹配,得到匹配的特征点对。如果匹配的特征点对的数量大于跟踪阈值,确定当前大视场帧的前一大视场帧为参考大视场帧。其中,跟踪阈值表示跟踪大视场相机的位姿所需的最少的特征点对的数量,可以直接使用默认设置值,如20,也可由用户预先设置。
如果当前大视场帧和其前一大视场帧匹配的特征点对的数量不大于跟踪阈值,选取局部地图中与当前大视场帧或其前一大视场帧共视程度最高的关键大视场帧,对当前大视场帧和该关键大视场帧进行匹配,得到匹配的特征点对。如果匹配的特征点对的数量大于跟踪阈值,确定该关键大视场帧为参考大视场帧。
如果当前大视场帧和该关键大视场帧匹配的特征点对的数量不大于跟踪阈值,通过全局匹配确定参考大视场帧。具体确定过程如前所述,为了简洁,这里不再赘述。
由此,可以获得当前大视场帧的更具参考性的参考大视场帧,使得SLAM跟踪更为准确,建图更为高效。
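下面用一段示意性的Python代码概括上述逐级选取参考大视场帧的过程（仅为草图；match、best_covisible、database.query等均为示例性假设的回调或接口，并非本申请公开的实现）：

```python
def select_reference_frame(current, previous, local_map, database,
                           match, best_covisible, track_thresh=20):
    """跟踪子步骤1的示意草图: 逐级选取当前大视场帧的参考大视场帧.

    match(a, b)            -> 返回两帧匹配的特征点对列表(由调用方提供, 示例性假设)
    best_covisible(map, f) -> 返回局部地图中与帧 f 共视程度最高的关键帧或 None
    database.query(f)      -> 按词袋模型向量返回与帧 f 全局最相似的关键帧
    """
    # 1) 先尝试前一大视场帧
    if len(match(current, previous)) > track_thresh:
        return previous
    # 2) 再尝试局部地图中与当前帧或其前一帧共视程度最高的关键大视场帧
    keyframe = best_covisible(local_map, current) or \
               best_covisible(local_map, previous)
    if keyframe is not None and len(match(current, keyframe)) > track_thresh:
        return keyframe
    # 3) 最后通过词袋模型做全局匹配
    return database.query(current)
```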
跟踪子步骤2:根据当前大视场帧和上述确定的参考大视场帧,基于多虚拟针孔相机模型确定当前大视场帧的位姿。在一个示例中,通过确定当前大视场帧与参考大视场帧之间的相对位姿,来确定当前大视场帧的位姿。
把当前大视场帧基于多虚拟针孔相机模型分解为分别对应每个虚拟针孔相机的子视场帧,对参考大视场帧也执行同样操作。从而对于每个虚拟针孔相机得到与之对应的两个子视场帧。从分别对应于不同虚拟针孔相机的子视场帧对中,选取匹配的特征点对的数量最多的子视场帧对。通过对该子视场帧对中的两个子视场帧进行帧间匹配获得它们之间的相对位姿。具体的子视场帧的帧间匹配过程与初始化步骤810中的帧间匹配处理一致,为了简洁,这里不再赘述。
由于每个虚拟针孔相机的相机中心与大视场相机的相机中心是重合的,因此多虚拟针孔相机模型中的每个虚拟针孔相机与大视场相机之间存在固定的旋转角度。每个虚拟针孔相机的旋转角度对应一个确定的旋转矩阵。由此,大视场帧的位姿矩阵可以通过对应的旋转矩阵变换为子视场帧的位姿矩阵。反之,子视场帧的位姿矩阵也可以通过对应的旋转矩阵变换为大视场帧的位姿矩阵。
上述方案通过多虚拟针孔相机模型，把基于复杂的大视场相机投影模型的位姿求取转换为基于简单的虚拟针孔相机投影模型的位姿求取，使得大视场SLAM的算法极大简化，性能显著提升。
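为便于理解大视场帧位姿与子视场帧位姿之间通过固定旋转矩阵的相互转换，下面给出一段示意性的Python代码（仅为草图，矩阵约定为世界系到相机系的变换，函数名为示例性假设）：

```python
import numpy as np

def fov_pose_to_subview_pose(T_cw_fov, R_i):
    """把大视场帧的位姿矩阵变换为某个虚拟针孔相机(子视场帧)的位姿矩阵.

    T_cw_fov: 4x4, 世界系到大视场相机系的变换
    R_i     : 3x3, 大视场相机系到该虚拟针孔相机系的固定旋转(相机中心重合, 无平移)
    """
    T_i = np.eye(4)
    T_i[:3, :3] = R_i
    return T_i @ T_cw_fov            # 子视场帧位姿 = 固定旋转叠加在大视场帧位姿之上

def subview_pose_to_fov_pose(T_cw_sub, R_i):
    """反向变换: 由子视场帧位姿恢复大视场帧位姿."""
    T_i = np.eye(4)
    T_i[:3, :3] = R_i
    return np.linalg.inv(T_i) @ T_cw_sub
```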
跟踪子步骤3:更新上述跟踪子步骤2获得的当前大视场帧的位姿。
根据当前大视场帧和参考大视场帧之间匹配的特征点对,针对参考大视场帧中每个匹配的特征点,将该特征点关联的地图点基于多虚拟针孔相机模型变换到当前大视场帧的、对应的虚拟针孔相机的坐标系中。然后,再将该地图点投影到该虚拟针孔相机的成像平面上,以得到该地图点在当前大视场帧中的重投影点。
在一个示例中,当前大视场帧和参考大视场帧之间存在较大视差。基于如图4所示5个朝向的多虚拟针孔相机模型进行处理。参考大视场帧中某个匹配的特征点在左朝向的虚拟针孔相机的成像平面上。该特征点关联的地图点基于多虚拟针孔相机模型进行变换后,对应到当前大视场帧的前向的虚拟针孔相机的坐标系中。在当前大视场帧的前向的虚拟针孔相机的成像平面上得到该地图点的重投影点。可以理解,在参考大视场帧的位姿通过多虚拟针孔相机模型中的左朝向的虚拟针孔相机可以观察到该地图点,而在当前大视场帧的位姿通过多虚拟针孔相机模型中的前向的虚拟针孔相机可以观察到该地图点。
根据该重投影点与当前大视场帧中匹配的特征点确定该地图点的重投影误差。根据参考大视场帧中所有匹配的特征点关联的地图点的重投影误差来更新当前大视场帧的位姿。本步骤中的重投影误差计算以及根据重投影误差更新当前大视场帧的位姿的处理与步骤820全局捆集优化中的处理方法一致,为了简洁,这里不再赘述。
通过进一步优化更新当前大视场帧的位姿,提升当前大视场帧的位姿的可信度,减小跟踪误差。使得SLAM跟踪更为准确,建图更为高效。
对于双目大视场相机,同时定位和建图的装置102可以将当前双目图像帧关联的每个地图点投影到第一多虚拟针孔相机模型中,得到所述地图点在所述第一多虚拟针孔相机模型中的重投影点;根据所述地图点在所述第一多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述当前双目图像帧关联的地图点的重投影误差确定左重投影误差。
可替代地，同时定位和建图的装置102可以将所述地图点投影到第二多虚拟针孔相机模型中，得到所述地图点在所述第二多虚拟针孔相机模型中的重投影点；根据所述地图点在所述第二多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点，确定所述地图点的重投影误差；根据所有所述当前双目图像帧关联的地图点的重投影误差确定右重投影误差。
进一步地,同时定位和建图的装置102可以基于左重投影误差、右重投影误差或两者之和,更新当前双目图像帧的位姿。例如,对于单目地图点,同时定位和建图的装置102可以基于左重投影误差或右重投影误差,更新当前双目图像帧的位姿;对于双目地图点,同时定位和建图的装置102可以基于左重投影误差和右重投影误差之和,更新当前双目图像帧的位姿。
具体地,同时定位和建图的装置102可以求解左重投影误差、右重投影误差,或左重投影误差与右重投影误差之和确定大视场相机101的位姿增量;然后,结合先验信息,确定大视场相机101的当前位姿。
在一些实施例中,上述先验信息可以为前一帧大视场相机101的位姿,或前一帧大视场相机101的位姿与前一帧位姿增量之和。所述前一帧位姿增量为前一帧大视场相机101的位姿与前两帧大视场相机101的位姿之间的位姿增量。
在一些实施例中，同时定位和建图的装置102可以通过下述多个公式计算左重投影误差和/或右重投影误差，并求解位姿增量。公式(7)表示如下：
$u = K\,R_i\,T_{cw}\,P$，公式(7)
其中，P表示世界坐标系中的地图点，可以表示为$P=(X, Y, Z)^T$；$T_{cw}$表示坐标转换矩阵，可以将地图点P从世界坐标系转化到多虚拟针孔相机模型的坐标系上；$R_i$表示旋转向量，可以将地图点P从多虚拟针孔相机模型的坐标系转化到所述多虚拟针孔相机模型一个面的坐标系上；K表示虚拟多针孔相机的每个面对应的针孔相机的摄像机矩阵，该矩阵中包含了相机的参数，例如图像的中心以及焦距的信息；u表示地图点P在所述多虚拟针孔相机模型一个面上的重投影点。
由以上描述可知,公式(7)可以进一步表述为公式(8)。
$u = K\,P_1$，公式(8)
其中，$P_2 = T_{cw}\,P$表示地图点P在多虚拟针孔相机模型的坐标系上的投影点；$P_1 = R_i\,P_2$表示点$P_2$在所述多虚拟针孔相机模型一个面的坐标系上的投影点。
因此,可以根据链规则导出u到相机位姿的雅克比矩阵。如公式(9)所示:
$J_\xi = \dfrac{\partial u}{\partial \xi} = \dfrac{\partial u}{\partial P_1}\,R_i\,\bigl[\,I \quad -P_2^{\wedge}\,\bigr]$，公式(9)
其中，$J_\xi$表示u到相机位姿的雅克比矩阵，$P_2^{\wedge}$表示$P_2$的斜对称矩阵。
根据公式(9),可以确定地图点P的雅克比矩阵,表示如下:
$J_P = \dfrac{\partial u}{\partial P} = \dfrac{\partial u}{\partial P_1}\,R_i\,R_{cw}$，公式(10)
其中，$J_P$表示地图点P的雅克比矩阵；$R_{cw}$表示坐标变换矩阵$T_{cw}$的旋转分量。
对于大视场相机101,同时定位和建图的装置102可以基于公式(7)、(8)、(9)和(10)确定大视场相机101的左重投影误差并确定大视场相机101的位姿。
应当理解的是,基于同样的原理,同时定位和建图的装置102可以确定大视场相机101的右重投影误差;然后,基于所述右重投影误差或左重投影误差与右重投影误差之和确定大视场相机101的位姿。
具体地，可通过公式(11)确定右重投影误差。其中，$u_r$表示地图点P在第二多虚拟针孔相机模型一个面上的重投影点；$t_{lr}$表示大视场相机101左目相对于右目的偏移量；b表示大视场相机101的基线长度，且$\lVert t_{lr}\rVert = b$。
$u_r = K\,R_i\,\bigl(T_{cw}\,P + t_{lr}\bigr)$，公式(11)
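为便于理解按上文重构的公式(8)、(9)、(10)计算重投影误差及雅克比矩阵的思路，下面给出一段示意性的Python代码（仅为草图而非本申请公开的实现；位姿增量的排列约定、归一化方式等均为示例性假设）：

```python
import numpy as np

def skew(v):
    """3 维向量的反对称(斜对称)矩阵 v^."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def reproj_residual_and_jacobians(P_w, R_cw, t_cw, R_i, K, u_obs):
    """计算单个地图点的重投影误差及雅克比矩阵(按上文公式(8)(9)(10)的思路).

    返回: 残差 r (2,), 对位姿增量[δt, δθ]的雅克比 J_pose (2x6),
          对地图点坐标的雅克比 J_point (2x3).
    """
    P2 = R_cw @ P_w + t_cw                  # 地图点变换到多虚拟针孔相机模型坐标系
    P1 = R_i @ P2                           # 再旋转到所选虚拟针孔相机(某个面)坐标系
    X, Y, Z = P1
    fx, fy = K[0, 0], K[1, 1]
    u_proj = np.array([fx * X / Z + K[0, 2], fy * Y / Z + K[1, 2]])
    r = u_proj - u_obs                      # 重投影误差
    du_dP1 = np.array([[fx / Z, 0, -fx * X / Z**2],
                       [0, fy / Z, -fy * Y / Z**2]])
    J_pose = du_dP1 @ R_i @ np.hstack([np.eye(3), -skew(P2)])   # 对应公式(9)的链式法则
    J_point = du_dP1 @ R_i @ R_cw                                # 对应公式(10)
    return r, J_pose, J_point
```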
在840中,同时定位和建图的装置102可以执行建图步骤(或称为地图更新步骤)。建图步骤可以在当前的地图的基础上,随着大视场相机的运动,扩张地图。换言之,建图步骤将插入新的地图点到当前的地图中。可选地,建图步骤840可以在跟踪步骤830之后进行,对于当前大视场帧(或当前双目图像帧),通过跟踪步骤830确定其位姿,也即确定了当前时刻大视场相机运动的位姿。
对于单目大视场相机,同时定位和建图的装置102可以确定当前大视场帧及其参考帧互相匹配的特征点;基于所述当前大视场帧的特征点和当前所述大视场相机的相机中心,确定第一特征点对应的方向向量;基于所述参考帧匹配的特征点和所述参考帧对应的所述大视场相机的相机中心,确定第二特征点对应的方向向量;对所述第一特征点对应的方向向量和所述第二特征点对应的方向向量进行三角测量,确定所述特征点对应的地图点;基于所述地图点构建地图。
在一些实施例中,同时定位和建图的装置102可以执行以下三个子步骤完成建图步骤。
建图子步骤1:确定当前大视场帧是否是关键大视场帧。
由于大视场相机在持续运动中采集数据，对获得的每个大视场帧进行地图更新操作会带来巨大的计算量。因此，可以选择某些认为重要的大视场帧作为关键大视场帧，再根据关键大视场帧进行地图更新操作。如何确定关键大视场帧可以采用任何常规的或未来开发的技术。例如基于初始的关键大视场帧，每间隔10个大视场帧选1个为关键大视场帧。即选择第11个、第21个、第31个……为关键大视场帧。又例如，选取与上一个关键大视场帧有合适的视差的大视场帧为关键大视场帧。
对于当前大视场帧是关键大视场帧的情况,继续地图更新子步骤2,根据当前大视场帧进行地图更新处理。对于当前大视场帧不是关键大视场帧的情况,继续地图更新子步骤3,对当前大视场帧进行地图点关联处理。
建图子步骤2:对于当前大视场帧是关键大视场帧的情况,根据当前大视场帧进行地图更新处理。
针对局部地图中的每个关键大视场帧,把该关键大视场帧基于多虚拟针孔相机模型分解为分别对应每个虚拟针孔相机的子视场帧,对当前大视场帧也执行同样操作。从而对于每个虚拟针孔相机得到与之对应的两个子视场帧,并通过对该两个子视场帧进行帧间匹配来构建新的地图点。
可选地,在对该两个子视场帧进行帧间匹配的过程中可以采用基于词袋模型的向量加速特征点之间的匹配。对于通过词袋模型匹配的特征点对,进一步检测其是否符合对极约束。对符合对极约束的特征点对,通过基于该特征点对的三角测量得到新的地图点的三维坐标点。
这里子视场帧的帧间匹配处理以及通过基于特征点对的三角测量得到新的地图点的三维坐标点的过程,与初始化步骤810中的对应处理一致,为了简洁,这里不再赘述。
构建新的地图点后,基于当前大视场帧的位姿把该新的地图点变换为世界坐标系中的地图点并插入到当前的地图,并把当前大视场帧插入到当前的地图。一般把初始化中用于构建地图的第一个关键大视场帧的坐标系作为世界坐标系。之后对于新的地图点,需要进行相机坐标系与世界坐标系的变换。
本领域的普通技术人员可以理解,当前地图通过不断插入新的地图点和新的关键大视场帧在逐步“成长”完善。
可选地,根据当前大视场帧构建新的基于词袋模型的向量并将所述新的基于词袋模型的向量加入到上述地图数据库。根据地图数据库可以进行基于词袋模型的向量加速特征点之间的匹配,从而提高SLAM跟踪和建图的效率。
建图子步骤3:对于当前大视场帧不是关键大视场帧的情况,对当前大视场帧进行地图点关联处理。
针对局部地图中的每个地图点,根据当前大视场帧的位姿把该地图点基于多虚拟针孔相机模型变换到当前大视场帧的、对应的虚拟针孔相机的坐标系中。再把该地图点投影到该虚拟针孔相机的成像平面上,以得到该地图点在当前大视场帧中的重投影点。如果投影失败,表示从当前大视场帧的位姿观测不到该地图点。如果投影成功,表示从当前大视场帧的位姿可以观测到该地图点,获得该地图点的重投影点。在当前大视场帧的所有特征点中,选择该重投影点附近的特征点中、与该地图点最匹配的特征点与该地图点进行关联。可以理解,通过这一步骤,关联了当前大视场帧以及从该当前大视场帧的位姿可以观测到的地图点。如此,在对下一大视场帧处理时,该当前大视场帧即可作为下一大视场帧的前一大视场帧用于跟踪处理。使得SLAM跟踪更为连贯,定位更为精准,构建的地图更为准确。
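下面用一段示意性的Python代码概括上述地图点关联处理（仅为草图；project_fn、描述子距离度量与各阈值均为示例性假设，实际的ORB类描述子通常使用汉明距离）：

```python
import numpy as np

def associate_map_points(frame_keypoints, frame_descriptors, local_map_points,
                         project_fn, radius=8.0, desc_thresh=50.0):
    """建图子步骤3的示意草图: 把局部地图点与当前(非关键)大视场帧的特征点关联.

    project_fn(P)    -> 地图点在当前大视场帧中的重投影点像素坐标, 投影失败返回 None
    frame_keypoints  -> 像素坐标数组列表; frame_descriptors -> 对应的描述子向量列表
    local_map_points -> [(P_world, descriptor), ...]
    返回: {地图点索引: 特征点索引}
    """
    associations = {}
    for i, (P, map_desc) in enumerate(local_map_points):
        u = project_fn(P)
        if u is None:                        # 投影失败: 当前位姿观测不到该地图点
            continue
        best_j, best_dist = -1, np.inf
        for j, kp in enumerate(frame_keypoints):
            if np.linalg.norm(kp - u) > radius:      # 只在重投影点附近搜索特征点
                continue
            dist = np.linalg.norm(np.asarray(frame_descriptors[j], float) -
                                  np.asarray(map_desc, float))
            if dist < best_dist:
                best_j, best_dist = j, dist
        if best_j >= 0 and best_dist < desc_thresh:
            associations[i] = best_j                 # 关联最匹配的特征点
    return associations
```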
对于双目大视场相机,同时定位和建图的装置102可以执行上述单目大视场相机的建图步骤;也可以基于同一时刻的左去畸变图像和右去畸变图像互相匹配的特征点来构建地图。
对于后者,同时定位和建图的装置102可以确定当前左去畸变图像和当前右去畸变图像互相匹配的特征点;基于所述当前左去畸变图像的特征点和当前所述双目大视场相机的左侧相机的相机中心,确定第一特征点对应的方向向量;基于所述当前右去畸变图像的特征点和当前所述双目大视场相机的右侧相机的相机中心,确定第二特征点对应的方向向量;对所述第一特征点对应的方向向量和所述第二特征点对应的方向向量进行三角测量,确定所述特征点对应的地图点;基于所述地图点构建地图。
在一些实施例中,对于双目大视场相机,同时定位和建图的装置102可以参考初始化步骤810中的相关描述确定上述第一特征点对应的方向向量和第二特征点对应的方向向量,并进行三角测量。
如果当前大视场帧(或当前双目图像帧)是关键大视场帧(或关键双目图像帧),建图步骤840可以进一步包括局部捆集优化。局部捆集优化的目的在于通过微调局部地图中关键大视场帧(或关键双目图像帧)的位姿以及地图点的位置,最小化局部地图中的地图点在关键大视场帧(或关键双目图像帧)上的重投影误差,由此优化所建立的地图。
对于单目大视场相机,针对局部地图中的每个关键大视场帧的捆集优化处理如下。
针对与该关键大视场帧关联的每个地图点,基于所述多虚拟针孔相机模型将该地图点变换到对应的虚拟针孔相机的坐标系中,再投影到该虚拟针孔相机的成像平面上,以得到该地图点的重投影点。并根据该地图点关联的特征点与该地图点的重投影点确定该地图点的重投影误差。根据与该关键大视场帧关联的所有地图点的重投影误差,更新该关键大视场帧的位姿以及与该关键大视场帧关联的所有地图点的位置。本步骤中的捆集优化处理过程和上述全局捆集优化步骤820中的处理过程一致,为了简洁,这里不再赘述。
对于双目大视场相机,针对局部地图中的每个关键双目图像帧的捆集优化处理如下。
将所述关键双目图像帧关联的地图点投影到第一多虚拟针孔相机模型中,得到所述地图点在所述第一多虚拟针孔相机模型中的重投影点;根据所述地图点在所述第一多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述关键双目图像帧关联的地图点的重投影误差确定左重投影误差。
可替代地,将所述关键双目图像帧关联的地图点投影到第二多虚拟针孔相机模型中,得到所述地图点在所述第二多虚拟针孔相机模型中的重投影点;根据所述地图点在所述第二多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述关键双目图像帧关联的地图点的重投影误差确定右重投影误差。
进一步地,基于所述左重投影误差、所述右重投影误差或所述左重投影误差和所述右重投影误差之和,更新所述关键双目图像帧的位姿以及与所述关键双目图像帧关联的所有地图点的位置。
在850中,同时定位和建图的装置102可以执行闭环检测处理步骤。单目大视场相机和双目大视场相机的闭环检测处理步骤可以是相同的,以下以单目大视场相机的闭环检测处理为示例。
对于当前大视场帧是关键大视场帧的情况,基于词袋模型的向量检测当前的地图数据库中的、与当前大视场帧相似的闭环大视场帧。
确定该闭环大视场帧与当前大视场帧之间匹配的特征点对。可选地，可以采用基于词袋模型的向量加速特征点之间的匹配。
根据该闭环大视场帧与当前大视场帧之间匹配的特征点对通过相似变换算子(Sim3Solver)以及RANSAC算法计算该闭环大视场帧与当前大视场帧之间的相似变换矩阵。
针对当前大视场帧中每个匹配的特征点,将该特征点关联的地图点基于多虚拟针孔相机模型变换到该闭环大视场帧的、对应的虚拟针孔相机的坐标系中。再将该地图点投影到该虚拟针孔相机的成像平面上,以得到该地图点在该闭环大视场帧中的重投影点。根据该重投影点与该闭环大视场帧中匹配的特征点确定第一重投影误差。根据当前大视场帧中所有匹配的特征点的第一重投影误差确定第一累计重投影误差。
针对该闭环大视场帧中每个匹配的特征点,将该特征点关联的地图点基于多虚拟针孔相机模型变换到当前大视场帧的、对应的虚拟针孔相机的坐标系中。再投影到该虚拟针孔相机的成像平面上,以得到该地图点在当前大视场帧中的重投影点。根据该重投影点与当前大视场帧中匹配的特征点确定第二重投影误差。根据该闭环大视场帧中所有匹配的特征点的第二重投影误差确定第二累计重投影误差。
根据所述第一累计重投影误差和所述第二累计重投影误差确定损失函数。通过最小化损失优化上述相似变换矩阵。
为了消除闭环过程中累计的误差,需要对当前地图中与当前大视场帧具有共视关系的关键大视场帧以及与其关联的地图点进行校正。首先获取与当前大视场帧具有共视关系的关键大视场帧。两个大视场帧观察到的公共的地图点的数量大于共视关系阈值表示这两个大视场帧具有共视关系。其中,共视关系阈值表示判断两个关键大视场帧具有共视关系所需的最少的公共的地图点的数量,可以直接使用默认设置值,如20,也可由用户预先设置。然后通过上述相似变换矩阵校正这些关键大视场帧的位姿以及与这些关键大视场帧所关联的地图点的位置。到此闭环检测处理完成。
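为便于理解利用优化后的相似变换校正共视关键大视场帧位姿及其关联地图点位置的过程，下面给出一段示意性的Python代码（仅为草图而非本申请公开的实现；数据结构与命名均为示例性假设，校正后各帧对原地图点的重投影方向保持一致）：

```python
import numpy as np

def apply_sim3_correction(keyframe_poses, map_points, s, R, t):
    """闭环校正的示意草图: 用相似变换 (s, R, t) 校正共视关键帧位姿与地图点位置.

    相似变换把漂移坐标映射到闭环一致坐标: P' = s * R @ P + t
    keyframe_poses: {帧id: (R_cw, t_cw)}   map_points: {点id: P_w}
    """
    new_points = {pid: s * R @ P + t for pid, P in map_points.items()}
    new_poses = {}
    for fid, (R_cw, t_cw) in keyframe_poses.items():
        c = -R_cw.T @ t_cw               # 原相机中心
        c_new = s * R @ c + t            # 校正后的相机中心
        R_new = R_cw @ R.T               # 校正后的旋转(世界系 -> 相机系)
        new_poses[fid] = (R_new, -R_new @ c_new)
    return new_poses, new_points
```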
随着大视场相机的运动,跟踪计算的大视场相机的位姿以及三角测量得到的地图点的位置,都是有误差的。即使使用局部捆集优化或全局捆集优化去优化,仍然会存在累积误差。通过上述闭环检测处理,可以有效消除累计误差,从而使得SLAM构建的地图更加准确。
可选地，闭环检测处理还包括通过姿态图优化(pose-graph optimization)对当前地图中所有关键大视场帧的位姿以及所有地图点的位置进行进一步优化。可选地，闭环检测处理还包括寻找并消除冗余的关键帧和地图点，以节约系统存储空间同时避免冗余的运算操作。
上述实施例中的步骤810至850给出了一种基于多虚拟针孔相机模型的大视场SLAM的步骤230的实施方式。可以理解,基于步骤220获取的去畸变图像,可以采用任何常规或未来开发的大视场SLAM方法。例如,上述基于多虚拟针孔相机模型进行重投影误差计算进行的优化更新处理,可以替换为基于单位方向向量误差计算进行优化更新处理。所述基于单位方向向量误差计算通过最小化地图点所对应的单位方向向量与该地图点关联的特征点对应的单位方向向量之间的差异达到最终的优化目标。所优化的目标损失可以是单位方向向量之间的距离,或单位向量之间的夹角,或可以是描述向量误差的其他指标。
最后,需要说明的是,本申请中提到的“左”和“右”,例如,“左目”、“右目”、“左鱼眼图像”、“右鱼眼图像”、“左去畸变图像”、“右去畸变图像”、“左重投影误差”、“右重投影误差”仅为了说明的目的,并不限定本申请保护的范围。
综上所述,在阅读本详细公开内容之后,本领域技术人员可以明白,前述详细公开内容可以仅以示例的方式呈现,并且可以不是限制性的。尽管这里没有明确说明,本领域技术人员可以理解本申请意图囊括对实施例的各种合理改变,改进和修改。这些改变,改进和修改旨在由本公开提出,并且在本公开的示例性实施例的精神和范围内。
此外,本申请中的某些术语已被用于描述本公开的实施例。例如,“一个实施例”,“实施例”和/或“一些实施例”意味着结合该实施例描述的特定特征,结构或特性可以包括在本公开的至少一个实施例中。因此,可以强调并且应当理解,在本说明书的各个部分中对“实施例”或“一个实施例”或“替代实施例”的两个或更多个引用不一定都指代相同的实施例。此外,特定特征,结构或特性可以在本公开的一个或多个实施例中适当地组合。
应当理解，在本公开的实施例的前述描述中，为了帮助理解一个特征，出于简化本公开的目的，本申请有时将各种特征组合在单个实施例、附图或其描述中。或者，本申请有时将各种特征分散在本发明的多个实施例中。然而，这并不是说这些特征的组合是必须的，本领域技术人员在阅读本申请的时候完全有可能将其中一部分特征提取出来作为单独的实施例来理解。也就是说，本申请中的实施例也可以理解为多个次级实施例的整合。而每个次级实施例的内容即使少于单个前述公开实施例的所有特征，也是成立的。
在一些实施方案中，表达用于描述和要求保护本申请的某些实施方案的数量或性质的数字应理解为在某些情况下通过术语“约”、“近似”或“基本上”修饰。例如，除非另有说明，否则“约”、“近似”或“基本上”可表示其描述的值的±20%变化。因此，在一些实施方案中，书面描述和所附权利要求书中列出的数值参数是近似值，其可以根据特定实施方案试图获得的所需性质而变化。在一些实施方案中，数值参数应根据报告的有效数字的数量并通过应用普通的舍入技术来解释。尽管阐述本申请一些实施方案宽泛范围的数值范围和参数是近似值，但具体实施例中均已尽可能精确地列出数值。
本文引用的每个专利、专利申请、专利申请的出版物和其他材料（例如文章、书籍、说明书、出版物、文件、物品等），均出于所有目的通过引用将其全部内容结合于此；但与其相关的任何起诉文件历史、其中与本文件不一致或相冲突的内容，以及任何可能对现在或以后与本文件相关联的权利要求的最宽范围具有限制性影响的内容除外。举例来说，如果所结合的材料中术语的描述、定义和/或使用与本文件中相关术语的描述、定义和/或使用之间存在任何不一致或冲突，以本文件中的术语的描述、定义和/或使用为准。
最后，应理解，本文公开的申请的实施方案是对本申请的实施方案的原理的说明。其他修改后的实施例也在本申请的范围内。因此，本申请披露的实施例仅仅作为示例而非限制。本领域技术人员可以根据本申请中的实施例采取替代配置来实现本申请中的发明。因此，本申请的实施例不限于申请中被精确地描述过的那些实施例。

Claims (20)

  1. 一种同时定位与建图的方法,其特征在于,所述方法包括:
    通过大视场相机获取大视场图像;
    基于多虚拟针孔相机模型,得到所述大视场图像对应的去畸变图像;
    基于所述去畸变图像,确定所述大视场相机的位姿并构建地图;
    其中,所述多虚拟针孔相机模型包括至少两个不同朝向的虚拟针孔相机,且所述至少两个不同朝向的虚拟针孔相机的相机中心与所述大视场相机的相机中心重合。
  2. 如权利要求1所述的同时定位与建图的方法,其特征在于,所述大视场相机为单目大视场相机;所述基于所述去畸变图像,确定所述大视场相机的位姿并构建地图包括初始化步骤,所述初始化步骤包括:
    获取第一时刻对应的去畸变图像和第二时刻对应的去畸变图像;
    确定所述第一时刻对应的去畸变图像和所述第二时刻对应的去畸变图像互相匹配的特征点；
    基于所述互相匹配的特征点构建初始地图。
  3. 如权利要求2所述的同时定位与建图的方法,其特征在于,所述基于所述互相匹配的特征点构建初始地图包括:
    基于所述第一时刻对应的去畸变图像中的特征点和所述第一时刻时所述大视场相机的相机中心,确定第一特征点对应的方向向量;
    基于所述第二时刻对应的去畸变图像中匹配的特征点和所述第二时刻时所述大视场相机的相机中心,确定第二特征点对应的方向向量;
    对所述第一特征点对应的方向向量和所述第二特征点对应的方向向量进行三角测量,确定所述特征点对应的地图点;
    基于所述地图点构建初始地图。
  4. 如权利要求1所述的同时定位与建图的方法,其特征在于,所述大视场相机为单目大视场相机;所述基于所述去畸变图像,确定所述大视场相机的位姿并构建地图包括全局捆集优化步骤,所述全局捆集优化步骤包括:
    对于所述地图中的每个关键大视场帧,
    将所述关键大视场帧关联的每个地图点投影到多虚拟针孔相机模型中，得到所述地图点在所述多虚拟针孔相机模型中的重投影点；根据所述地图点在所述多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点，确定所述地图点的重投影误差；根据所有所述关键大视场帧关联的地图点的重投影误差确定重投影误差；
    基于所述重投影误差,更新所述关键大视场帧的位姿以及与所述关键大视场帧关联的所有地图点的位置。
  5. 如权利要求1所述的同时定位与建图的方法,其特征在于,所述大视场相机为单目大视场相机;所述基于所述去畸变图像,确定所述大视场相机的位姿并构建地图包括跟踪步骤,所述跟踪步骤包括:
    对于当前大视场帧关联的每个地图点,
    将所述地图点投影到多虚拟针孔相机模型中,得到所述地图点在所述多虚拟针孔相机模型中的重投影点;根据所述地图点在所述多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述当前大视场帧关联的地图点的重投影误差确定重投影误差;
    基于所述重投影误差,更新所述当前大视场帧的位姿。
  6. 如权利要求1所述的同时定位与建图的方法,其特征在于,所述大视场相机为单目大视场相机;所述基于所述去畸变图像,确定所述大视场相机的位姿并构建地图包括建图步骤,所述建图步骤包括:
    确定当前大视场帧及其参考帧互相匹配的特征点;
    基于所述当前大视场帧的特征点和当前所述大视场相机的相机中心,确定第一特征点对应的方向向量;
    基于所述参考帧匹配的特征点和所述参考帧对应的所述大视场相机的相机中心,确定第二特征点对应的方向向量;
    对所述第一特征点对应的方向向量和所述第二特征点对应的方向向量进行三角测量,确定所述特征点对应的地图点;
    基于所述地图点构建地图。
  7. 如权利要求6所述的同时定位与建图的方法,其特征在于,所述建图步骤进一步包括局部捆集优化步骤,所述局部捆集优化步骤包括:
    对于局部地图中的每个关键大视场帧,
    将所述关键大视场帧关联的每个地图点投影到多虚拟针孔相机模型中,得到所述地图点在所述多虚拟针孔相机模型中的重投影点;根据所述地图点在所述多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述关键大视场帧关联的地图点的重投影误差确定重投影误差;
    根据所述重投影误差,更新所述关键大视场帧的位姿以及与该关键大视场帧关联的所有地图点的位置。
  8. 如权利要求1所述的同时定位与建图的方法,其特征在于,所述大视场相机为双目大视场相机;所述方法包括:
    通过所述双目大视场相机获取左视场图像和右视场图像;
    基于第一多虚拟针孔相机模型,得到所述左视场图像对应的左去畸变图像;
    基于第二多虚拟针孔相机模型,得到所述右视场图像对应的右去畸变图像;
    基于所述左去畸变图像和所述右去畸变图像,确定所述双目大视场相机的位姿并构建地图;
    其中,所述第一多虚拟针孔相机模型包括至少两个不同朝向的虚拟针孔相机,且所述至少两个不同朝向的虚拟针孔相机的相机中心与所述双目大视场相机的左侧相机的相机中心重合;
    所述第二多虚拟针孔相机模型包括至少两个不同朝向的虚拟针孔相机,且所述至少两个不同朝向的虚拟针孔相机的相机中心与所述双目大视场相机的右侧相机的相机中心重合。
  9. 如权利要求8所述的同时定位与建图的方法,其特征在于,所述基于所述左去畸变图像和所述右去畸变图像,确定所述双目大视场相机的位姿并构建地图包括初始化步骤,所述初始化步骤包括:
    确定所述左去畸变图像和所述右去畸变图像互相匹配的特征点;
    基于所述互相匹配的特征点构建初始地图。
  10. 如权利要求9所述的同时定位与建图的方法,其特征在于,所述确定所述左去畸变图像和所述右去畸变图像互相匹配的特征点包括:
    确定所述左去畸变图像中的特征点在所述右去畸变图像中对应的极线;
    在所述极线上搜索与所述左去畸变图像中的特征点匹配的特征点;
    其中,所述极线为多线段折线。
  11. 如权利要求9所述的同时定位与建图的方法,其特征在于,所述基于所述互相匹配的特征点构建初始地图包括:
    基于所述左去畸变图像中的特征点和所述双目大视场相机的左侧相机的相机中心,确定第一特征点对应的方向向量;
    基于所述右去畸变图像中匹配的特征点和所述双目大视场相机的右侧相机的相机中心,确定第二特征点对应的方向向量;
    基于所述双目大视场相机的基线,对所述第一特征点对应的方向向量和所述第二特征点对应的方向向量进行三角测量,确定所述特征点对应的地图点;
    基于所述地图点构建初始地图。
  12. 如权利要求8所述的同时定位与建图的方法,其特征在于,所述基于所述左去畸变图像和所述右去畸变图像,确定所述双目大视场相机的位姿并构建地图包括全局捆集优化步骤,所述全局捆集优化步骤包括:
    对于所述地图中的每个关键双目图像帧,
    将所述关键双目图像帧关联的地图点投影到第一多虚拟针孔相机模型中,得到所述地图点在所述第一多虚拟针孔相机模型中的重投影点;根据所述地图点在所述第一多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述关键双目图像帧关联的地图点的重投影误差确定左重投影误差;或
    将所述关键双目图像帧关联的地图点投影到第二多虚拟针孔相机模型中,得到所述地图点在所述第二多虚拟针孔相机模型中的重投影点;根据所述地图点在所述第二多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述关键双目图像帧关联的地图点的重投影误差确定右重投影误差;
    基于所述左重投影误差、所述右重投影误差或所述左重投影误差和所述右重投影误差之和,更新所述关键双目图像帧的位姿以及与所述关键双目图像帧关联的所有地图点的位置。
  13. 如权利要求8所述的同时定位与建图的方法,其特征在于,所述基于所述左去畸变图像和所述右去畸变图像,确定所述双目大视场相机的位姿并构建地图包括跟踪步骤,所述跟踪步骤包括:
    对于当前双目图像帧关联的每个地图点,
    将所述地图点投影到第一多虚拟针孔相机模型中,得到所述地图点在所述第一多虚拟针孔相机模型中的重投影点;根据所述地图点在所述第一多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述当前双目图像帧关联的地图点的重投影误差确定左重投影误差;或
    将所述地图点投影到第二多虚拟针孔相机模型中,得到所述地图点在所述第二多虚拟针孔相机模型中的重投影点;根据所述地图点在所述第二多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述当前双目图像帧关联的地图点的重投影误差确定右重投影误差;
    基于所述左重投影误差、所述右重投影误差或所述左重投影误差和所述右重投影误差之和,更新所述当前双目图像帧的位姿。
  14. 如权利要求8所述的同时定位与建图的方法,其特征在于,所述基于所述左去畸变图像和所述右去畸变图像,确定所述双目大视场相机的位姿并构建地图包括建图步骤,所述建图步骤包括:
    确定当前左去畸变图像和当前右去畸变图像互相匹配的特征点;
    基于所述当前左去畸变图像的特征点和当前所述双目大视场相机的左侧相机的相机中心,确定第一特征点对应的方向向量;
    基于所述当前右去畸变图像的特征点和当前所述双目大视场相机的右侧相机的相机中心,确定第二特征点对应的方向向量;
    对所述第一特征点对应的方向向量和所述第二特征点对应的方向向量进行三角测量,确定所述特征点对应的地图点;
    基于所述地图点构建地图。
  15. 如权利要求14所述的同时定位与建图的方法,其特征在于,所述建图步骤进一步包括局部捆集优化步骤,所述局部捆集优化步骤包括:
    对于局部地图中的每个关键双目图像帧,
    将所述关键双目图像帧关联的地图点投影到第一多虚拟针孔相机模型中,得到所述地图点在所述第一多虚拟针孔相机模型中的重投影点;根据所述地图点在所述第一多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点,确定所述地图点的重投影误差;根据所有所述关键双目图像帧关联的地图点的重投影误差确定左重投影误差;或
    将所述关键双目图像帧关联的地图点投影到第二多虚拟针孔相机模型中，得到所述地图点在所述第二多虚拟针孔相机模型中的重投影点；根据所述地图点在所述第二多虚拟针孔相机模型中的重投影点与所述地图点对应的特征点，确定所述地图点的重投影误差；根据所有所述关键双目图像帧关联的地图点的重投影误差确定右重投影误差；
    基于所述左重投影误差、所述右重投影误差或所述左重投影误差和所述右重投影误差之和,更新所述关键双目图像帧的位姿以及与所述关键双目图像帧关联的所有地图点的位置。
  16. 如权利要求1所述的同时定位与建图的方法,其特征在于,所述基于所述去畸变图像,确定所述大视场相机的位姿并构建地图包括闭环检测处理步骤,所述闭环检测处理步骤包括:
    当当前大视场帧是关键大视场帧时,确定地图数据库中与所述当前大视场帧相似的闭环大视场帧;
    确定所述当前大视场帧与所述闭环大视场帧互相匹配的特征点;
    针对所述当前大视场帧中每个匹配的特征点,将该特征点关联的地图点变换到所述闭环大视场帧对应的多虚拟针孔相机模型的坐标系中,再投影到所述多虚拟针孔相机模型的成像平面上,得到该地图点在所述闭环大视场帧中的重投影点,根据该重投影点与所述闭环大视场帧中匹配的特征点确定第一重投影误差;
    根据所述当前大视场帧中所有匹配的特征点的第一重投影误差确定第一累计重投影误差;
    针对所述闭环大视场帧中每个匹配的特征点,将该特征点关联的地图点变换到所述当前大视场帧对应的多虚拟针孔相机模型的坐标系中,再投影到所述多虚拟针孔相机模型的成像平面上,得到该地图点在所述当前大视场帧中的重投影点,根据该重投影点与所述当前大视场帧中匹配的特征点确定第二重投影误差;
    根据所述闭环大视场帧中所有匹配的特征点的第二重投影误差确定第二累计重投影误差;
    利用所述第一累计重投影误差和所述第二累计重投影误差,对地图中与所述当前大视场帧具有共视关系的关键大视场帧以及与其关联的地图点进行校正。
  17. 如权利要求1所述的同时定位与建图的方法,其特征在于,所述至少两个不同朝向包括:立方体的前朝向、上朝向、下朝向、左朝向或右朝向。
  18. 一种同时定位与建图的装置,包括:
    至少一个存储设备,所述存储设备包括一组指令;以及
    与所述至少一个存储设备通信的至少一个处理器,其中,当执行所述一组指令时,所述至少一个处理器用于使所述同时定位与建图的装置:
    通过大视场相机获取大视场图像;
    基于多虚拟针孔相机模型,得到所述大视场图像对应的去畸变图像;
    基于所述去畸变图像,确定所述大视场相机的位姿并构建地图;
    其中,所述多虚拟针孔相机模型包括至少两个不同朝向的虚拟针孔相机,且所述至少两个不同朝向的虚拟针孔相机的相机中心与所述大视场相机的相机中心重合。
  19. 如权利要求18所述的同时定位与建图的装置,其特征在于,所述大视场相机为单目大视场相机;为了基于所述去畸变图像,确定所述大视场相机的位姿并构建地图,所述至少一个处理器进一步用于使所述同时定位与建图的装置执行初始化步骤,所述初始化步骤包括:
    获取第一时刻对应的去畸变图像和第二时刻对应的去畸变图像;
    确定所述第一时刻对应的去畸变图像和所述第二时刻对应的去畸变图像互相匹配的特征点；
    基于所述互相匹配的特征点构建初始地图。
  20. 一种包括计算机程序产品的非暂时性计算机可读介质,所述计算机程序产品包括一些指令,所述指令使计算设备:
    通过大视场相机获取大视场图像;
    基于多虚拟针孔相机模型,得到所述大视场图像对应的去畸变图像;
    基于所述去畸变图像,确定所述大视场相机的位姿并构建地图;
    其中,所述多虚拟针孔相机模型包括至少两个不同朝向的虚拟针孔相机,且所述至少两个不同朝向的虚拟针孔相机的相机中心与所述大视场相机的相机中心重合。
PCT/CN2018/124786 2018-06-07 2018-12-28 一种同时定位与建图的方法及装置 WO2019233090A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2019572827A JP7096274B2 (ja) 2018-06-07 2018-12-28 自己位置推定と環境マップ作成を同時に行う方法及び装置
EP18921621.1A EP3806036A4 (en) 2018-06-07 2018-12-28 METHOD AND DEVICE FOR SIMULTANEOUS LOCALIZATION AND MAPPING
KR1020197039024A KR102367361B1 (ko) 2018-06-07 2018-12-28 위치 측정 및 동시 지도화 방법 및 장치
US16/627,768 US11017545B2 (en) 2018-06-07 2018-12-28 Method and device of simultaneous localization and mapping

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201810578095.3A CN108776976B (zh) 2018-06-07 2018-06-07 一种同时定位与建图的方法、系统及存储介质
CN201810578095.3 2018-06-07
CN201811401646.5 2018-11-22
CN201811401646.5A CN111210476B (zh) 2018-11-22 2018-11-22 一种同时定位与建图的方法及装置

Publications (1)

Publication Number Publication Date
WO2019233090A1 true WO2019233090A1 (zh) 2019-12-12

Family

ID=68770787

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/124786 WO2019233090A1 (zh) 2018-06-07 2018-12-28 一种同时定位与建图的方法及装置

Country Status (5)

Country Link
US (1) US11017545B2 (zh)
EP (1) EP3806036A4 (zh)
JP (1) JP7096274B2 (zh)
KR (1) KR102367361B1 (zh)
WO (1) WO2019233090A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA3139467A1 (en) * 2019-06-07 2020-12-10 David R. Nilosek Using spatial filter to reduce bundle adjustment block size
US11595568B2 (en) * 2020-02-18 2023-02-28 Occipital, Inc. System for generating a three-dimensional scene of a physical environment
CN113345032B (zh) * 2021-07-07 2023-09-15 北京易航远智科技有限公司 一种基于广角相机大畸变图的初始化建图方法及系统
CN113465617B (zh) * 2021-07-08 2024-03-19 上海汽车集团股份有限公司 一种地图构建方法、装置及电子设备
CN113506369A (zh) * 2021-07-13 2021-10-15 阿波罗智能技术(北京)有限公司 用于生成地图的方法、装置、电子设备和介质
CN113781573B (zh) * 2021-07-19 2024-04-23 长春理工大学 一种基于双目折反射全景相机的视觉里程计方法
CN116468786B (zh) * 2022-12-16 2023-12-26 中国海洋大学 一种面向动态环境的基于点线联合的语义slam方法
CN116009559B (zh) * 2023-03-24 2023-06-13 齐鲁工业大学(山东省科学院) 一种输水管道内壁巡检机器人及检测方法

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100776215B1 (ko) 2005-01-25 2007-11-16 삼성전자주식회사 상향 영상을 이용한 이동체의 위치 추정 및 지도 생성장치 및 방법과 그 장치를 제어하는 컴퓨터 프로그램을저장하는 컴퓨터로 읽을 수 있는 기록 매체
JP2008102620A (ja) 2006-10-17 2008-05-01 Toyota Motor Corp 画像処理装置
KR101423139B1 (ko) * 2012-06-19 2014-07-28 한양대학교 산학협력단 3차원 직선을 이용하여 위치를 인식하고 지도를 생성하는 방법 및 그 방법에 따른 이동체
US10203762B2 (en) * 2014-03-11 2019-02-12 Magic Leap, Inc. Methods and systems for creating virtual and augmented reality
US10852838B2 (en) * 2014-06-14 2020-12-01 Magic Leap, Inc. Methods and systems for creating virtual and augmented reality
KR101592740B1 (ko) * 2014-07-24 2016-02-15 현대자동차주식회사 차량용 광각카메라의 영상 왜곡 보정 장치 및 방법
KR101666959B1 (ko) * 2015-03-25 2016-10-18 ㈜베이다스 카메라로부터 획득한 영상에 대한 자동보정기능을 구비한 영상처리장치 및 그 방법

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130182894A1 (en) * 2012-01-18 2013-07-18 Samsung Electronics Co., Ltd. Method and apparatus for camera tracking
CN106846467A (zh) * 2017-01-23 2017-06-13 阿依瓦(北京)技术有限公司 基于每个相机位置优化的实体场景建模方法和系统
CN107862744A (zh) * 2017-09-28 2018-03-30 深圳万图科技有限公司 航空影像三维建模方法及相关产品
CN108776976A (zh) * 2018-06-07 2018-11-09 驭势科技(北京)有限公司 一种同时定位与建图的方法、系统及存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3806036A4 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021140780A (ja) * 2020-02-28 2021-09-16 ベイジン バイドゥ ネットコム サイエンス アンド テクノロジー カンパニー リミテッド 地図作成のためのコンピュータ実施方法及び装置、電子機器、記憶媒体並びにコンピュータプログラム
US11417014B2 (en) 2020-02-28 2022-08-16 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for constructing map
JP7150917B2 (ja) 2020-02-28 2022-10-11 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド 地図作成のためのコンピュータ実施方法及び装置、電子機器、記憶媒体並びにコンピュータプログラム
CN111461998A (zh) * 2020-03-11 2020-07-28 中国科学院深圳先进技术研究院 一种环境重建方法及装置
CN112509047A (zh) * 2020-12-10 2021-03-16 北京地平线信息技术有限公司 基于图像的位姿确定方法、装置、存储介质及电子设备

Also Published As

Publication number Publication date
US11017545B2 (en) 2021-05-25
KR102367361B1 (ko) 2022-02-23
KR20200014858A (ko) 2020-02-11
EP3806036A1 (en) 2021-04-14
US20210082137A1 (en) 2021-03-18
JP7096274B2 (ja) 2022-07-05
EP3806036A4 (en) 2022-03-16
JP2021505979A (ja) 2021-02-18


Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2019572827

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20197039024

Country of ref document: KR

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18921621

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018921621

Country of ref document: EP

Effective date: 20210111