WO2022002150A1 - Method and apparatus for constructing a visual point cloud map


Info

Publication number
WO2022002150A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
key frame
feature points
map
matching feature
Application number
PCT/CN2021/103653
Other languages
English (en)
French (fr)
Inventor
易雨亭
李建禹
龙学雄
党志强
Original Assignee
杭州海康机器人技术有限公司
Application filed by 杭州海康机器人技术有限公司
Publication of WO2022002150A1

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00-G01C19/00
    • G01C21/26 - Navigation; Navigational instruments not provided for in groups G01C1/00-G01C19/00 specially adapted for navigation in a road network
    • G01C21/28 - Navigation; Navigational instruments not provided for in groups G01C1/00-G01C19/00 specially adapted for navigation in a road network with correlation of data from several navigational instruments
    • G01C21/30 - Map- or contour-matching
    • G01C21/32 - Structuring or formatting of map data
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval of structured data, e.g. relational data
    • G06F16/29 - Geographical information databases
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/60 - Type of objects
    • G06V20/62 - Text, e.g. of license plates, overlay texts or captions on TV images

Definitions

  • the present application relates to the field of navigation and positioning, and in particular, to a method and device for constructing a visual point cloud map.
  • a visual point cloud map is one type of map that can be constructed.
  • the visual point cloud map describes the visual information, pose and other information of points in the environment through a set of three-dimensional points in space. Therefore, two types of data are needed to construct a visual point cloud map: key frames and map points.
  • key frames describe the visual information of the environment.
  • map points describe the pose of points in the environment.
  • a collection formed by a large number of map points constitutes a point cloud.
  • SLAM means that the robot starts from an unknown position in an unknown environment, locates its own position and pose during movement by repeatedly observing map features, and then incrementally builds a map according to its own position, so as to achieve simultaneous localization and map construction.
  • In SLAM-based map construction, in terms of input, there is no input before the robot moves; once the robot starts to move, raw sensor data is input. In terms of output, the estimated pose and the estimated map are output. It can be seen that, in the related art, the robot is positioned on the map while a new map is established or a known map is improved. This is similar to placing a person in an unfamiliar city and letting the person become familiar with the city. Based on the above, the SLAM map construction of the related art couples the mapping problem and the positioning problem together, so that mapping and positioning affect each other.
  • the embodiments of the present application provide a method and apparatus for constructing a visual point cloud map, so as to avoid the influence of positioning on the mapping.
  • Feature extraction is performed on the source image frames collected in the space where the map is to be constructed, to obtain the feature points of the source image frames;
  • the point cloud formed by the set of map points of all key frames is the first visual point cloud map.
  • feature extraction is performed on the source image frames collected in the space of the map to be constructed to obtain feature points of the source image frames, further comprising:
  • the method further includes:
  • the least squares method is used to perform graph optimization on the poses of the key frames, and/or the spatial position information of the map points is optimized according to the reprojection error, to obtain the second visual point cloud map.
  • performing image preprocessing on the source image frame to obtain the target image frame including:
  • the source image frame is de-distorted to obtain a de-distorted image
  • the performing stretching processing on the foreground image includes:
  • when the pixel value of the foreground image is less than or equal to the set minimum gray value, the pixel value of the foreground image is set to the minimum value of the pixel value range;
  • when the pixel value of the foreground image is greater than the minimum gray value and less than the set maximum gray value, a pixel value in a certain proportion to the maximum pixel value is taken as the pixel value of the foreground image; the certain proportion is the ratio of the difference between the pixel value of the foreground image and the minimum gray value to the difference between the maximum gray value and the minimum gray value;
  • when the pixel value of the foreground image is greater than or equal to the maximum gray value, the pixel value of the foreground image is set to the maximum value of the pixel value range;
  • the feature extraction is performed based on the target image frame to obtain the feature points of the target image frame, including:
  • the feature points in each grid are arranged in descending order according to their response values, and the first Q feature points are retained to obtain the filtered feature points; where Q is determined according to the number of feature points in the target image frame, the set upper limit of the total number of feature points, and the total number of feature points in the grid;
  • Feature descriptors are calculated separately for each feature point after screening.
  • the Q is determined according to the number of feature points in the target image frame, the set upper limit of the total number of feature points, and the total number of feature points in the grid, including: Q is the quotient of the set upper limit of the total number of feature points divided by the number of feature points in the target image frame, multiplied by the total number of feature points in the grid, with the result rounded down.
  • performing inter-frame tracking on the source image frame to determine key frames including:
  • the key frame condition satisfies at least one of the following conditions:
  • the number of matching feature points is greater than the set first threshold
  • the spatial distance from the previous key frame is greater than the set second threshold
  • the spatial angle from the previous keyframe is greater than the set third threshold.
  • the source image frame is an image frame originating from a monocular camera and being on the same plane;
  • the calculation of the spatial position information of the matching feature points in the current key frame includes:
  • the x coordinate is: the ratio of the product of the pixel abscissa of the matching feature point in the current key frame and the camera installation height to the camera focal length;
  • the y coordinate is: the ratio of the product of the pixel ordinate of the matching feature point in the current key frame and the camera installation height to the camera focal length;
  • the z coordinate is: camera installation height.
  • the source image frame is an image frame originating from a monocular camera and not on the same plane;
  • the calculation of the spatial position information of the matching feature points in the current key frame includes:
  • according to the pixel coordinates of at least 8 pairs of matching feature points, each pair consisting of a matching feature point in the current key frame and the corresponding matching feature point in the previous key frame, the essential matrix between the current key frame and the previous key frame is obtained;
  • for each matching feature point, according to the relative pose between the current key frame and the previous key frame and the triangulation relationship, at least the depth value of the matching feature point in the current key frame is obtained; according to the depth value of the matching feature point, the spatial position information of the matching feature point is obtained.
  • obtaining the essential matrix of the current key frame and the previous key frame includes:
  • obtaining at least the depth value of the matching feature point in the current key frame includes:
  • according to the relationship that the product of the depth value of the matching feature point in the current key frame and its normalized plane coordinates equals the product of the depth value of the matching feature point in the previous key frame, the rotation matrix of the relative pose, and the normalized plane coordinates of the matching feature point in the previous key frame, plus the translation matrix of the relative pose, the depth value of the matching feature point in the current key frame is obtained from the rotation matrix and translation matrix of the relative pose between the current key frame and the previous key frame and the normalized plane coordinates of the matching feature point in the two key frames;
  • the spatial position information of the matching feature point is obtained, including:
  • the x coordinate is: the product of the pixel abscissa of the normalized plane of the matching feature point in the current key frame and the depth value of the matching feature point;
  • the y coordinate is: the product of the pixel ordinate of the normalized plane of the matching feature point in the current key frame and the depth value of the matching feature point;
  • the z coordinate is: camera focal length.
  • the source image frame is a binocular image frame originating from a binocular camera and not on the same plane;
  • the image preprocessing is performed on the source image frame to obtain the target image frame, including:
  • the feature extraction based on the target image frame to obtain the feature points of the target image frame includes: extracting the feature points of the first target image frame and the feature points of the second target image frame respectively;
  • the judging whether the target image frame is the first frame includes: judging whether the binocular target image frame is the first frame; if so, using any frame in the binocular target image frame as a key frame; otherwise, determining according to the key frame conditions whether any frame in the target image frame is a key frame;
  • the calculation of the spatial position information of the matching feature points in the current key frame includes:
  • the x-coordinate of the matching feature point in the current key frame is: the product of the pixel abscissa of the normalized plane of the matching feature point in the current key frame and the binocular baseline length, divided by the absolute value of the difference between the pixel abscissa of the normalized plane of the matching feature point in the current key frame and the pixel abscissa of the normalized plane of the matching feature point in the second frame;
  • the y-coordinate of the matching feature point in the current key frame is: the product of the pixel ordinate of the normalized plane of the matching feature point in the current key frame and the binocular baseline length, divided by the absolute value of the difference between the pixel abscissa of the normalized plane of the matching feature point in the current key frame and the pixel abscissa of the normalized plane of the matching feature point in the second frame;
  • the z-coordinate of the matching feature point in the current key frame is: the product of the camera focal length and the binocular baseline length, divided by the absolute value of the difference between the pixel abscissa of the normalized plane of the matching feature point in the current key frame and the pixel abscissa of the normalized plane of the matching feature point in the second frame.
  • the least squares method is used to perform graph optimization on the poses of the keyframes, including:
  • a second objective function for key frame pose graph optimization is constructed, and the closed-loop constraint is used as the constraint, and the least squares method is used to solve the pose of the key frame when the second objective function achieves the minimum value.
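  • As an illustration of the least-squares pose graph optimization with a closed-loop constraint described above, the following is a minimal sketch that assumes planar key-frame poses (x, y, yaw) and a handful of made-up relative-pose constraints; scipy.optimize.least_squares and all numeric values are assumptions of this sketch, not the patent's exact second objective function.

```python
# Minimal 2D pose-graph optimization sketch (assumed example): key-frame poses are
# (x, y, yaw); residuals compare predicted relative poses against odometry-style
# constraints and one closed-loop constraint.
import numpy as np
from scipy.optimize import least_squares

def wrap(a):
    """Wrap an angle into (-pi, pi]."""
    return (a + np.pi) % (2.0 * np.pi) - np.pi

def relative_pose(p_i, p_j):
    """Pose of key frame j expressed in the frame of key frame i."""
    c, s = np.cos(p_i[2]), np.sin(p_i[2])
    dx, dy = p_j[0] - p_i[0], p_j[1] - p_i[1]
    return np.array([c * dx + s * dy, -s * dx + c * dy, wrap(p_j[2] - p_i[2])])

# (i, j, measured relative pose of key frame j seen from key frame i)
constraints = [
    (0, 1, np.array([1.0, 0.0, np.pi / 2])),   # inter-frame tracking (odometry)
    (1, 2, np.array([1.0, 0.0, np.pi / 2])),
    (2, 3, np.array([1.0, 0.0, np.pi / 2])),
    (3, 0, np.array([1.1, 0.1, np.pi / 2])),   # closed-loop constraint (noisy)
]

def residuals(x):
    poses = x.reshape(-1, 3)
    res = [poses[0]]                           # pin the first key frame at the origin
    for i, j, z in constraints:
        e = relative_pose(poses[i], poses[j]) - z
        e[2] = wrap(e[2])
        res.append(e)
    return np.concatenate(res)

x0 = np.zeros(4 * 3)                           # rough initial key-frame poses
result = least_squares(residuals, x0)          # least-squares pose graph solve
print(result.x.reshape(-1, 3))                 # optimized key-frame poses
```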
  • identifying the closed-loop key frame in the first visual point cloud map according to the artificial mark or key frame similarity calculation including:
  • key frames that carry the same identification and are collected at different times are taken as closed-loop key frames;
  • the relative poses between the closed-loop keyframes are calculated based on the closed-loop keyframes, as closed-loop constraints, including:
  • the matching feature points between the closed-loop key frames are calculated;
  • according to the relationship that the pixel coordinate matrix of a matching feature point in the first closed-loop key frame equals the product of the rotation matrix of the relative pose between the first closed-loop key frame and the second closed-loop key frame and the pixel coordinate matrix of the matching feature point in the second closed-loop key frame, plus the translation matrix of the relative pose, the relative pose is calculated and taken as an initial value;
  • the second objective function constructed for the optimization of the pose graph of the key frame is constrained by the closed-loop constraint, and the least squares method is used to solve the pose of the key frame when the second objective function obtains the minimum value, including:
  • the calculating whether the similarity between the two key frames is greater than a set similarity threshold includes:
  • for each node of the first layer, the feature points belonging to the node are clustered into k categories to obtain the next layer of nodes;
  • for each node of the next layer, the step of clustering the feature points belonging to the node into k categories to obtain the next layer of nodes is repeated, until the final leaf layer is reached, so as to obtain a visual dictionary; the visual dictionary includes N word feature points and is a k-ary tree of depth d;
  • the leaf layer includes the word feature points of the visual dictionary;
  • k, d, and N are all natural numbers, and N is the total number of word feature points in the visual dictionary;
  • the weight of each word feature point is calculated, and the key frame is described as a set whose elements are word feature points and their weights;
  • the set includes N elements;
  • optimizing the spatial location information of the map points according to the reprojection error includes:
  • a third objective function of the re-projection error is constructed
  • the initial value of the reprojection error is: the difference between the pixel position of the map point in the key frame and the reprojection position of the map point in the image;
  • the re-projected position of the map point in the image is obtained according to the camera internal parameters, the pose of the key frame, and the spatial position information of the map point.
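  • As a concrete reading of the reprojection error just described, a small sketch follows; the pinhole intrinsic matrix K and the example values are assumptions of this sketch, not taken from the patent.

```python
# Sketch of the reprojection error: the observed pixel position of a map point in
# a key frame minus the position obtained by re-projecting the map point through
# the camera intrinsics K and the key-frame pose (R, t). K is an assumed example.
import numpy as np

K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])

def reprojection_error(pixel_uv, map_point_xyz, R, t):
    cam = R @ np.asarray(map_point_xyz, float) + np.asarray(t, float)  # camera frame
    proj = K @ cam
    reprojected = proj[:2] / proj[2]                 # perspective division
    return np.asarray(pixel_uv, float) - reprojected # residual to be minimized
```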
  • the embodiment of the present application also provides a device for constructing a visual point cloud map, including a first visual point cloud map construction unit, the unit comprising:
  • the feature extraction module is used to perform feature extraction on the source image frame collected in the space to be constructed to obtain the feature points of the source image frame;
  • the map point generation module is used to track the source image frame between frames, determine the key frame, match the feature points in the current key frame with the feature points in the previous key frame, obtain the matching feature points of the current key frame, and calculate The spatial position information of the matching feature points in the current key frame, and the spatial position information of the matching feature points is used as the map point information of the current key frame,
  • the point cloud formed by the set of map points of all key frames is the first visual point cloud map.
  • An embodiment of the present application further provides an electronic device, including a memory and a processor, where the memory stores executable computer instructions, and the processor is configured to execute the instructions stored in the memory, so as to implement the steps of any of the above-described methods for constructing a visual point cloud map.
  • Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a processor, the steps of any of the above-mentioned methods for constructing a visual point cloud map are implemented.
  • An embodiment of the present application further provides a computer program, which implements the steps of any of the above-mentioned construction methods for a visual point cloud map when the computer program is executed by a processor.
  • the feature extraction is performed on the image frames collected in the space where the map is to be constructed, and the spatial location information of the matching feature points is obtained through inter-frame matching, and the matching feature points are used as map points.
  • the set of map points of all key frames constitutes a visual point cloud map, which realizes the generation and description of three-dimensional points in the physical environment.
  • the process of constructing the map separates the mapping and the positioning, and effectively removes the mutual influence between the mapping and the positioning.
  • the map construction method provided by the embodiments of the present application has better adaptability and stability.
  • the accuracy of the map is improved.
  • the map can be corrected in time without losing the initial map data, which enhances the scalability of the map construction and is conducive to integration with the improved map construction method.
  • FIG. 1 is a schematic flowchart of constructing a map based on image data collected by a monocular camera according to Embodiment 1 of the present application.
  • FIG. 2 is a schematic diagram of feature point screening provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of constructing a map based on front-view image data collected by a monocular camera according to Embodiment 2 of the present application.
  • FIG. 4 is a schematic flowchart of constructing a map based on image data collected by a binocular camera according to Embodiment 3 of the present application.
  • FIG. 5 is a schematic diagram of accumulated errors provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of optimizing a first visual point cloud map according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a visual dictionary provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of an apparatus for constructing a visual point cloud map according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of an image preprocessing module provided by an embodiment of the present application.
  • a visual point cloud map is obtained through feature extraction and feature point matching of inter-frame tracking.
  • pose graph optimization is performed through closed-loop constraints, and/or map point optimization is performed through reprojection errors to improve the accuracy of the map.
  • the constructed visual point cloud map includes at least key frame pose information and spatial position information of map points, wherein each map point may also have feature point descriptor information.
  • the embodiment of the present application provides a method for constructing a visual point cloud map.
  • the method for constructing a visual point cloud map can be applied to a robot or a server connected to the robot, which is not limited.
  • the construction method of the visual point cloud map includes:
  • Feature extraction is performed on the source image frames collected in the space where the map is to be constructed, to obtain the feature points of the source image frames;
  • the point cloud formed by the set of map points of all key frames is the first visual point cloud map.
  • the process of constructing the map separates the mapping and the positioning, and effectively removes the mutual influence between the mapping and the positioning.
  • the map construction method provided by the embodiments of the present application has better adaptability and stability.
  • In the following, image data collected by a monocular camera, where the image data is a ground texture image, is taken as an example for description. It should be understood that, in this embodiment of the present application, the image data may be simply referred to as an image or an image frame, and the image frame is not limited to ground texture images; other types of image frames are also applicable.
  • FIG. 1 is a schematic flowchart of constructing a map based on image data collected by a monocular camera according to Embodiment 1 of the present application.
  • the construction process of the map can include the following three stages: image preprocessing, feature extraction, and inter-frame tracking. Optionally, for each image frame, the following steps are performed:
  • Step 101: take the collected image frame as a source image frame, and preprocess the source image frame to obtain a target image frame, so that feature points can be extracted from the image frame.
  • the visual point cloud map is a ground texture map
  • the texture information in the ground texture image frame needs to be extracted. Therefore, the purpose of preprocessing the ground texture image frame is to obtain an image frame that is mainly based on texture information, so that feature points containing texture information can be extracted.
  • step 101 can be refined into the following steps:
  • Step 1011: perform de-distortion processing on the source image frame according to the distortion coefficient of the camera to obtain the de-distorted image frame I(u, v), where u and v represent pixel coordinates, and I(u, v) represents the pixel value at the pixel coordinate (u, v).
  • Step 1012: perform image filtering on the de-distorted image frame I(u, v) to obtain a background image frame I_b(u, v).
  • Step 1012 may be to perform Gaussian filtering on the dedistorted image frame I(u, v), where the size of the Gaussian filter kernel may be set to 45 ⁇ 45.
  • step 1012 can be expressed mathematically as:
  • I_b(u, v) = G ⊗ I(u, v);
  • where G is the filter kernel of the image filtering;
  • I_b(u, v) is the background image frame, that is, the filtered image frame;
  • I(u, v) is the de-distorted image frame.
  • the image filter kernel (such as the above-mentioned Gaussian filter kernel) may be set relatively large, so that the filtered image frame is as close to the real background image frame as possible.
  • the de-distorted image frame I(u, v) can be inverted first; expressed mathematically, the inversion is: pixel maximum value - I(u, v).
  • for example, when the pixel maximum value is 255, the inversion operation is: 255 - I(u, v).
  • the above-mentioned texture area is the area where the feature points in the image frame are located. If the brightness of the texture area in the source image frame is lower than the preset brightness threshold, the de-distorted image frame I(u, v) can be inverted to obtain an inverted image frame, and the inverted image frame is then filtered to obtain the background image frame I_b(u, v).
  • Step 1013: subtract the background image frame from the de-distorted image frame to obtain the texture-based foreground image frame I_f(u, v).
  • the foreground image frame obtained in the above step 1013 can be expressed mathematically as:
  • I_f(u, v) = I(u, v) - I_b(u, v);
  • where I_f(u, v) is the foreground image frame;
  • I_b(u, v) is the background image frame, that is, the filtered image frame;
  • I(u, v) is the de-distorted image frame.
  • Step 1014 stretching the foreground image frame to obtain the target image frame.
  • the texture information in the captured image frame is weak, and the pixel values (grayscale) of the texture area are mostly distributed in a narrow grayscale interval. Therefore, in this step, the pixel value of the foreground image frame is stretched to the pixel value range, and the grayscale interval of the pixel value of the foreground image frame is enlarged.
  • the pixel value range may be a range that a pixel point can actually take, that is, 0 to 255.
  • the gray value is the pixel value.
  • step 1014 may be:
  • when the pixel value of the foreground image is less than or equal to the minimum gray value, the pixel value of the foreground image is set to the minimum value of the pixel value range, for example, 0;
  • when the pixel value of the foreground image is greater than the minimum gray value and less than the maximum gray value, the contrast of the pixel value of the foreground image is increased: the pixel value of the foreground image is set to a pixel value in a certain proportion to the maximum pixel value, where the proportion is the ratio of the difference between the pixel value of the foreground image and the minimum gray value to the difference between the maximum gray value and the minimum gray value;
  • when the pixel value of the foreground image is greater than or equal to the maximum gray value, the pixel value of the foreground image is set to the maximum value of the pixel value range, for example, 255.
  • the above-mentioned maximum gray value and minimum gray value may be values preset by the user according to actual needs.
  • the maximum gray value is 200, 220, etc.
  • the minimum gray value is 50, 100, etc.
  • step 1014 can be expressed mathematically as follows, where the stretched foreground image frame I_f'(u, v) is:
  • I_f'(u, v) = 0, when I_f(u, v) <= I_min;
  • I_f'(u, v) = 255 × (I_f(u, v) - I_min) / (I_max - I_min), when I_min < I_f(u, v) < I_max;
  • I_f'(u, v) = 255, when I_f(u, v) >= I_max;
  • where I_f'(u, v) is the target image frame, that is, the stretched foreground image frame;
  • I_f(u, v) is the foreground image frame;
  • I_min is the minimum gray value, I_max is the maximum gray value, and the pixel value range is 0 to 255.
  • the pixel value of the foreground image is the pixel value of one pixel in the foreground image frame. For each pixel in the foreground image frame:
  • the pixel value of the pixel point is the minimum value within the pixel value range
  • the manner of increasing the contrast may be: taking a pixel value that is in a preset ratio to the maximum value of the pixel value range as the pixel value of the pixel point.
  • the preset ratio may be: the ratio of the first difference value and the second difference value, the first difference value is the difference between the pixel value of the pixel point and the minimum gray value, and the second difference value is the maximum gray value and the minimum gray value. difference between grayscale values.
  • the pixel value of the pixel point is the maximum value within the pixel value range.
  • Image frame preprocessing is beneficial to improve the contrast of image frames. In some environments with weak texture information, it is beneficial to improve the contrast of texture areas, so as to extract more feature points.
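  • As an illustration of steps 1011 to 1014, the following sketch strings the preprocessing together with OpenCV; the camera intrinsics, distortion coefficients, gray thresholds and the image file name are assumed example values, not parameters from the patent.

```python
# Preprocessing sketch: undistort, estimate the background with a large Gaussian
# kernel, subtract it to get the texture foreground, then stretch the gray levels.
import cv2
import numpy as np

K = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])  # assumed intrinsics
dist = np.array([-0.25, 0.08, 0.0, 0.0, 0.0])                # assumed distortion

def preprocess(src, i_min=50, i_max=200, invert=False):
    # Step 1011: de-distortion -> I(u, v)
    img = cv2.undistort(src, K, dist)
    if invert:                         # optional inversion for dark texture areas
        img = 255 - img
    # Step 1012: a large Gaussian kernel approximates the background I_b(u, v)
    background = cv2.GaussianBlur(img, (45, 45), 0)
    # Step 1013: foreground I_f = I - I_b (texture information, clipped at 0)
    foreground = cv2.subtract(img, background)
    # Step 1014: stretch gray levels in (i_min, i_max) to the full 0-255 range
    f = foreground.astype(np.float32)
    stretched = np.clip((f - i_min) * 255.0 / (i_max - i_min), 0, 255)
    return stretched.astype(np.uint8)

# "ground_texture.png" is an assumed file name for a grayscale source image frame.
target = preprocess(cv2.imread("ground_texture.png", cv2.IMREAD_GRAYSCALE))
```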
  • Step 102 Extract feature points based on the current target image frame to convert image information into feature information to obtain a feature point set of the current target image frame.
  • ORB (Oriented FAST and Rotated BRIEF)
  • SIFT (Scale-Invariant Feature Transform)
  • SURF (Speeded-Up Robust Features)
  • step 102 may include:
  • Step 1021: based on the target image frame, adopt the FAST (Features from Accelerated Segment Test) algorithm to perform feature detection and obtain FAST feature points.
  • Step 1022 Screen the FAST feature points to effectively control the scale of the feature points.
  • the target image frame may be divided into a certain number of grids, as shown in FIG. 2, which is a schematic diagram of feature point screening provided by an embodiment of the present application.
  • the target image frame is divided into a plurality of grids. The number of grids is set according to actual needs.
  • the feature points in each grid are arranged in descending order according to the response values of the FAST feature points, and the first Q feature points are retained, where Q is determined according to the number of feature points in the target image frame, the set upper limit of the total number of feature points, and the total number of feature points in the grid.
  • the number of feature points retained by different grids can be different or the same.
  • for example, if the upper limit of the total number of feature points is set to 100 and the number of feature points in the target image frame is 2000, then Q is determined for each grid according to the number of feature points in the target image frame (2000), the set upper limit of the total number of feature points (100), and the number of feature points in that grid.
  • Step 1023: for each screened FAST feature point, determine the direction of the FAST feature point; that is, calculate the centroid within a radius r around the feature point, and take the vector from the feature point coordinates to the centroid as the direction of the feature point.
  • the filtered FAST feature points are the first Q feature points retained above.
  • the above step 1023 may be: for each FAST feature point screened out, calculate the centroid of all FAST feature points within the range with the feature point as the center and r as the radius, and form a vector from the FAST feature point to the centroid as the FAST Orientation of feature points.
  • Step 1024 Calculate a feature descriptor of a binary string for each of the filtered FAST feature points, so as to obtain feature point information in the current target image frame.
  • feature descriptors such as rBRIEF and oBRIEF may be used to represent feature point information.
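  • A sketch of steps 1021 to 1024 with OpenCV follows: FAST detection, per-grid retention of the top-Q responses, and binary descriptors computed by ORB. The grid size, the upper limit on the total number of feature points, and the specific formula used for Q are assumptions of this sketch.

```python
# Feature extraction sketch: FAST detection, grid-based screening, ORB descriptors.
import cv2
import numpy as np

def extract_features(target, grid=(8, 8), max_total=100):
    # target: grayscale image (e.g., the output of the preprocessing sketch above)
    h, w = target.shape
    kps = cv2.FastFeatureDetector_create().detect(target)
    n_frame = max(len(kps), 1)

    # Step 1022: bucket keypoints into grid cells and keep the first Q per cell,
    # sorted by response; here Q is taken proportional to the cell's share of
    # max_total (an assumed reading of the patent's Q rule).
    cells = {}
    for kp in kps:
        cx, cy = int(kp.pt[0] * grid[1] / w), int(kp.pt[1] * grid[0] / h)
        cells.setdefault((cy, cx), []).append(kp)
    kept = []
    for cell_kps in cells.values():
        cell_kps.sort(key=lambda k: k.response, reverse=True)
        q = int(len(cell_kps) * max_total / n_frame)
        kept.extend(cell_kps[: max(q, 1)])

    # Steps 1023-1024: compute binary (rBRIEF-style) descriptors with ORB; a fuller
    # implementation would also compute the centroid-based orientation per keypoint.
    orb = cv2.ORB_create()
    kept, desc = orb.compute(target, kept)
    return kept, desc
```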
  • Step 103 inter-frame tracking, to match the feature points in the previous and subsequent frames, calculate the coordinates of the matched feature points in the world coordinate system, and save them as map points with three-dimensional space position information.
  • inter-frame tracking, which matches the feature points in the preceding and following frames, may be referred to as inter-frame matching for short.
  • the coordinates in the world coordinate system can also be called space coordinates.
  • Step 1031 determine whether the current target image frame is the first frame; if so, use the target image frame as a key frame; otherwise, perform step 1032 to perform inter-frame matching to determine whether the current target image frame is a key frame.
  • Step 1032 matching the current target image frame with the previous key frame, namely:
  • for any feature point i of the current target image frame, calculate whether the matching degree between the descriptor of feature point i in the current target image frame and the descriptor of feature point i in the previous key frame is less than the set matching threshold; if so, it is determined that the two feature points match; otherwise, it is determined that the two feature points do not match.
  • the matching degree can be described by the Hamming distance, and the matching threshold is the Hamming distance threshold.
  • the matching threshold size can be set according to actual needs.
  • the feature point i in the current target image frame is the same point in the space corresponding to the feature point i in the previous key frame. If the matching degree between the feature point i in the current target image frame and the descriptor of the feature point i in the previous key frame is less than the set matching threshold, then the feature point i in the current target image frame and the descriptor in the previous key frame The feature point i is matched, and the feature point i is the matching feature point of the current target image frame.
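  • The Hamming-distance matching of step 1032 can be sketched as follows; the threshold value and the descriptor arrays are assumed examples, and real descriptors would come from step 1024.

```python
# Descriptor matching sketch: Hamming distance between binary descriptors,
# accepted only below a set matching threshold.
import cv2
import numpy as np

def match_to_previous_keyframe(desc_cur, desc_key, hamming_threshold=40):
    """Return index pairs (i_cur, i_key) whose Hamming distance is below threshold."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc_cur, desc_key)
    return [(m.queryIdx, m.trainIdx) for m in matches
            if m.distance < hamming_threshold]

# usage with placeholder binary descriptor sets (random values, so matches below
# the threshold are unlikely; with real ORB output these pairs are the matching
# feature points of the current target image frame)
rng = np.random.default_rng(1)
d_cur = rng.integers(0, 256, (500, 32), dtype=np.uint8)
d_key = rng.integers(0, 256, (480, 32), dtype=np.uint8)
pairs = match_to_previous_keyframe(d_cur, d_key)
```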
  • Step 1033 according to the key frame condition, judge whether the current target image frame is a key frame; if so, take the current target image frame as a key frame, and perform step 1034 to perform map update based on the key frame; otherwise, do not perform map update .
  • the key frame condition may be that the number of matching feature points is greater than the set first threshold.
  • step 1033 may be: when the number of matching feature points of the current target image frame is greater than the set first threshold, it may be determined that the current target image frame is a key frame;
  • a keyframe condition can also be one of the following:
  • the spatial distance from the previous key frame is greater than the set second threshold
  • the spatial angle from the previous key frame is greater than the set third threshold;
  • the above spatial distance is: the distance between the current position and the previous position.
  • the above space angle is: the angle the robot rotates from the previous position to the current position.
  • the current position is the position of the robot when the current target image frame is collected
  • the previous position is the position of the robot when the last key frame was collected.
  • the order of step 1032 and step 1033 may also be reversed; that is, whether the frame is a key frame is determined first, and then the matching feature points are determined.
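  • A compact sketch of the key-frame test of step 1033 follows; the threshold values are assumed examples.

```python
# Key-frame decision of step 1033: the frame is a key frame when at least one
# of the three conditions holds. All thresholds below are assumed values.
import math

def is_key_frame(num_matches, dist_to_prev_kf, angle_to_prev_kf,
                 first_threshold=30, second_threshold=0.5,
                 third_threshold=math.radians(15)):
    return (num_matches > first_threshold
            or dist_to_prev_kf > second_threshold
            or angle_to_prev_kf > third_threshold)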
  • Step 1034 based on the current key frame, calculate the coordinates of each matched feature point (referred to as matching feature point), and save it as map point information;
  • each map point corresponds to three-dimensional space position information.
  • the three-dimensional spatial position information is referred to as the spatial position information, which is the coordinates of the map point in the world coordinate system.
  • the point where the map point is projected in the image frame is the feature point.
  • the map point information may include spatial location information.
  • the map point information may further include: a key frame collected at the spatial coordinates indicated by the map point information, and the posture of the robot when the key frame is collected.
  • the coordinates of any matching feature point of the current key frame in the world coordinate system can be used to project the matching feature point onto the image plane through the external parameters of the camera, to obtain its pixel coordinates in the image coordinate system.
  • the coordinates of the matching feature points in the world coordinate system are the spatial position information of the matching feature points.
  • ground texture image frames are in the same plane: the distance between each spatial point and the plane where the lens of the monocular camera is located is the same.
  • the space point is the point in the world coordinate system corresponding to the pixel point in the ground texture image frame.
  • the pixel coordinates of the matching feature points of the current key frame can be projected into the world coordinate system through the external parameters of the camera to obtain the spatial position information of the matching feature points.
  • the above step 1034 may be:
  • the x coordinate is the ratio of the product of the pixel abscissa u of the matching feature point i of the current key frame and the camera installation height to the camera focal length
  • the y coordinate is the ratio of the product of the pixel ordinate v of the matching feature point i of the current key frame and the camera installation height to the camera focal length
  • the z coordinate is the camera mount height.
  • the spatial position information of the matching feature point i can be expressed mathematically as:
  • x = u × H / f, y = v × H / f, z = H;
  • where H is the installation height of the camera;
  • f is the focal length of the camera;
  • u and v are the pixel coordinates of the matching feature point i in the image coordinate system;
  • x and y are the coordinates in the world coordinate system.
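  • The relation above can be written as a one-line helper; the installation height H and focal length f below are assumed example values.

```python
# Downward-looking monocular case: map a matched pixel (u, v) to a map point
# using x = u*H/f, y = v*H/f, z = H. H (metres) and f (pixels) are assumed values.
def ground_map_point(u, v, H=0.30, f=700.0):
    return u * H / f, v * H / f, H

print(ground_map_point(u=320.0, v=240.0))   # example matching feature point
```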
  • Steps 101 to 103 are repeatedly performed until all source image frames are processed, and a first visual point cloud map composed of a large number of map points is obtained.
  • Steps 101 to 103 are repeatedly performed to obtain a large amount of map point information, and one map point information can identify one map point in one world coordinate system.
  • a large number of map point information can identify a large number of map points, and combined with key frames, constitute the first visual point cloud map.
  • the embodiment of the present application provides a method for constructing a visual point cloud map.
  • for each key frame, feature point matching is performed with the adjacent key frame, and the three-dimensional coordinates of map points are generated based on the pixel coordinates of the matched feature points, so as to obtain a visual point cloud map.
  • in the process of map construction, there is no need to determine the positioning information of the robot, which avoids the influence of the positioning information on the map construction.
  • the technical solutions provided by the embodiments of the present application avoid the problem of discrete (ie discontinuous) map information due to the distance between map points, and realize continuous The construction of map points enables continuous positioning in positioning applications without jumping problems.
  • the image data is collected by a monocular camera
  • the collected image frames are image frames that are not on the same plane as an example for description.
  • a monocular camera adopts a forward-looking installation, that is, the robot captures image frames through the forward-looking camera.
  • the spatial point is the point in the world coordinate system corresponding to the pixel point in the image frame.
  • Fig. 3 is a schematic flow chart of constructing a map based on the front-view image data collected by the monocular camera provided in the second embodiment of the present application. For each image frame, perform the following steps:
  • Step 301 Perform de-distortion processing on the source image frame according to the distortion coefficient of the camera to obtain the de-distorted image frame I(u, v), where u and v represent pixel coordinates, and I(u, v) represent the pixel in the image frame The pixel value at the coordinate.
  • the acquired image frame is used as the source image frame, and then the source image frame is de-distorted according to the distortion coefficient of the camera to obtain the de-distorted image frame I(u, v).
  • Step 302: determine whether the pixel value of each pixel in the de-distorted image frame is greater than the set first pixel threshold; if so, perform an inversion operation on the pixels whose pixel values are greater than the first pixel threshold, and then perform image filtering on the de-distorted image frame after the inversion operation; otherwise, directly perform image filtering on the de-distorted image frame I(u, v) to obtain the background image frame I_b(u, v).
  • the above step 302 may be: for each pixel in the de-distorted image frame, determine whether the pixel value of the pixel is greater than the set first pixel threshold; if it is greater than the first pixel threshold, perform an inversion operation on the pixel ; If it is less than or equal to the first pixel threshold, the pixel point does not need to be inverted.
  • then, image filtering is performed on the processed de-distorted image frame to obtain the background image frame I_b(u, v).
  • Step 303: subtract the background image frame from the de-distorted image frame to obtain the foreground image frame I_f(u, v).
  • the foreground image frame obtained in the above step 303 can be expressed mathematically as:
  • I_f(u, v) = I(u, v) - I_b(u, v).
  • Step 304: determine whether the pixel values of the foreground image frame I_f(u, v) are uniformly distributed; if so, take the foreground image frame as the target image frame; otherwise, stretch the foreground image frame to obtain the target image frame.
  • the stretching process in step 304 is the same as that in step 1014 .
  • in this embodiment of the application, if the pixel values of the pixels in the foreground image frame are uniformly distributed over the range 0 to 255, the foreground image frame is determined to be of high image quality, the pixel value distribution of I_f(u, v) is considered uniform, and the foreground image frame is used as the target image frame; if the pixel values of the pixels in the foreground image frame are distributed in a narrow grayscale interval, for example in the interval 100 to 150, the foreground image frame is determined to be of low image quality, the pixel value distribution of I_f(u, v) is considered non-uniform, and the foreground image frame is stretched to obtain the target image frame.
  • the image stretching process is not performed for high image quality, and the image stretching process is performed for low image quality, so that the image stretching process is selectively processed according to the image quality, which reduces the burden on the device.
  • Step 305 Extract feature points based on the current target image frame to convert image information into feature information to obtain a feature point set of the current target image frame.
  • This step 305 is the same as step 102 .
  • Step 306: judge whether the current target image frame is the first frame; if so, take the current target image frame as a key frame and then proceed to step 310; otherwise, perform step 307 for inter-frame matching to determine whether the current target image frame is a key frame.
  • Step 307 matching the current target image frame with the previous key frame, namely:
  • for any feature point i of the current target image frame, calculate whether the matching degree between the descriptor of feature point i in the current target image frame and the descriptor of feature point i in the previous key frame is less than the set matching threshold; if so, it is determined that the two feature points match, and feature point i is a matching feature point of the current target image frame; otherwise, it is determined that the two feature points do not match.
  • the matching degree can be described by the Hamming distance, and the matching threshold is the Hamming distance threshold.
  • Step 308 judge whether the current target image frame is a key frame according to the key frame condition; if so, take the current target image frame as a key frame, and execute step 309 to perform map update based on the key frame; otherwise, do not perform map update , and directly execute step 310.
  • the number of matching feature points is greater than the set first threshold
  • the spatial distance from the previous key frame is greater than the set second threshold
  • the spatial angle from the previous keyframe is greater than the set third threshold.
  • Step 309 Calculate the coordinates of each matching feature point based on the current key frame, and save it as map point information.
  • the obtained current map information includes: unupdated map point information and updated map point information.
  • each map point corresponds to three-dimensional spatial position information.
  • the eight-point method is used to calculate the essential matrix between the two image frames according to the pixel coordinates of the matching feature points, and singular value decomposition (SVD) is performed on the essential matrix to obtain the relative pose of the camera.
  • the above-mentioned relative pose of the camera is the relative pose between the two image frames.
  • the camera is installed on the robot, so the relative pose of the camera can be understood as: the relative pose of the robot between the positions where the two image frames are collected.
  • the coordinates of each matching feature point in step 309 can be calculated according to the following steps:
  • Step 3091: according to the relationship satisfied by the essential matrix E, the normalized plane coordinates p_1 of matching feature point i in the current key frame, and the normalized plane coordinates p_2 of matching feature point i in the previous key frame, namely that for any matching feature point the product of the transpose of the normalized plane coordinates of the matching feature point in the previous key frame, the essential matrix, and the normalized plane coordinates of the matching feature point in the current key frame is equal to 0, solve for the essential matrix E.
  • the solution of the essential matrix E can be expressed mathematically as: p_2^T · E · p_1 = 0.
  • the essential matrix E is a 3 ⁇ 3 matrix that reflects the relationship between the representation of the image point of a point P in the space in the camera coordinate system under different viewing angle cameras.
  • the function of the essential matrix E is: a point on the first image frame is multiplied by the essential matrix, and the result is the epipolar line of the point on the second image frame.
  • the normalized plane coordinates p_1 of matching feature point i in the current key frame and the normalized plane coordinates p_2 of matching feature point i in the previous key frame correspond to the same point in space, that is, p_1 and p_2 correspond to the same point in the world coordinate system, and p_1 and p_2 form a pair of matching feature points.
  • the essential matrix can be solved by substituting the normalized plane coordinates of eight pairs of matching feature points. Among them, the matching feature point i of the current key frame and the matching feature point i of the previous key frame form a pair of matching feature points.
  • Step 3092 Perform SVD on the essential matrix E to obtain the relative pose between the current key frame and the previous key frame, that is, the relative pose of the camera, including the translation matrix t and the rotation matrix R.
  • Step 3093: based on the principle of triangulation, the depth value s_1 of matching feature point i in the current key frame and the depth value s_2 of matching feature point i in the previous key frame satisfy: s_1 · p_1 = s_2 · R · p_2 + t (Equation 1).
  • by left-multiplying both sides of Equation 1 by the antisymmetric matrix of p_1, s_2 can be obtained, and the obtained s_2 is substituted into Equation 1 to obtain s_1.
  • R represents the rotation matrix
  • t represents the translation matrix
  • p 1 is the normalized plane coordinate of the matching feature point i of the current key frame
  • p 2 is the normalized plane coordinate of the matching feature point i of the previous key frame.
  • Step 3094 according to the depth value s 1 of the matching feature point i of the current key frame, calculate the coordinates of the matching feature point i of the current key frame in the world coordinate system, which can be:
  • the x coordinate is: the product of the pixel abscissa of the normalized plane of the matching feature point i in the current key frame and the depth value of the matching feature point;
  • the y coordinate is: the product of the pixel ordinate of the normalized plane of the matching feature point i in the current key frame and the depth value of the matching feature point;
  • the z coordinate is: camera focal length.
  • f is the focal length of the camera to convert the normalized plane coordinates to the imaging plane.
  • u 1 is the abscissa of the pixel in the normalized plane coordinates
  • v 1 is the ordinate of the pixel in the normalized plane coordinates
  • s 1 is the depth value of the matching feature point i of the current key frame.
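  • The following sketch reproduces the spirit of steps 3091 to 3094 with OpenCV's standard routines (essential matrix estimation, SVD-based pose recovery, triangulation). The intrinsics and the synthetic point data are assumptions of this sketch, and OpenCV's generic triangulation is used in place of the per-point formula above.

```python
# Sketch of essential-matrix estimation and triangulation between two key frames.
import numpy as np
import cv2

K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
rng = np.random.default_rng(0)

def project(points_3d, R, t):
    """Project Nx3 world points into pixel coordinates for a camera at (R, t)."""
    cam = R @ points_3d.T + t
    pix = K @ cam
    return (pix[:2] / pix[2]).T

# Synthetic scene: points in front of both key frames; previous frame at identity,
# current frame rotated slightly and translated along x (assumed test data).
points_3d = np.column_stack([rng.uniform(-1, 1, 30),
                             rng.uniform(-1, 1, 30),
                             rng.uniform(4, 8, 30)])
R_true = cv2.Rodrigues(np.array([0.0, 0.05, 0.0]))[0]
t_true = np.array([[0.2], [0.0], [0.0]])
pts_prev = project(points_3d, np.eye(3), np.zeros((3, 1)))   # previous key frame
pts_cur = project(points_3d, R_true, t_true)                 # current key frame

# Steps 3091-3092: estimate E from >= 8 matched points, decompose it into R, t.
E, _ = cv2.findEssentialMat(pts_prev, pts_cur, K, method=cv2.RANSAC)
E = E[:3]                                  # guard: keep a single 3x3 solution
_, R, t, _ = cv2.recoverPose(E, pts_prev, pts_cur, K)   # x_cur ~ R x_prev + t

# Steps 3093-3094: triangulate the matched points to get candidate map points.
# Note: for a monocular camera, t (and hence the map) is recovered only up to scale.
P_prev = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_cur = K @ np.hstack([R, t])
X_h = cv2.triangulatePoints(P_prev, P_cur, pts_prev.T, pts_cur.T)
map_points = (X_h[:3] / X_h[3]).T
```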
  • Step 310: determine whether the processing of the source image frames is completed; if so, end; otherwise, process the next source image frame and return to step 301, until all the source image frames are processed and a first visual point cloud map composed of a large number of map points is obtained.
  • An embodiment of the present application provides a method for constructing a visual point cloud map based on image frames of different planes collected by a forward-looking camera.
  • feature point matching is performed with adjacent key frames.
  • map construction there is no need to determine the positioning information of the robot, so that the map construction and positioning are separated, and the stability of the constructed map and the adaptability to complex environments are improved.
  • the image data is collected by a binocular camera as an example for illustration, and the collected image frames are image frames that are not on the same plane.
  • FIG. 4 is a schematic flowchart of constructing a map based on image data collected by a binocular camera according to Embodiment 3 of the present application.
  • one camera of the binocular camera is used as the first camera;
  • the other camera of the binocular camera is used as the second camera.
  • the following steps are performed:
  • Step 401 Preprocess the first source image frame and the second source image frame to obtain a current binocular target image frame, including the first target image frame and the second target image frame;
  • the first source image frame and the second source image frame may be preprocessed in parallel, or the first source image frame and the second source image frame may be preprocessed in series, respectively, and no limited.
  • the above step 401 may be: preprocessing the first source image frame to obtain the first target image frame, and preprocessing the second source image frame to obtain the second target image frame.
  • the first target image frame and the second target image frame constitute the current binocular target image frame.
  • the preprocessing in step 401 is the same as that in steps 301 to 304.
  • Step 402 based on the current binocular target image frame, extract the feature points of the first target image frame and the feature points of the second target image frame respectively, to convert the image information into feature information, and obtain the feature points of the current binocular target image frame gather.
  • Step 403 determine whether the current binocular target image frame is the first binocular image frame; if so, take any frame in the current binocular target image frame as a key frame, and execute step 406; otherwise, execute step 404, proceed Inter-frame matching to determine whether any frame in the current binocular target image frame is a key frame.
  • Step 404: in order to improve matching efficiency, any target image frame in the current binocular target image frame may be matched with the previous key frame to obtain the matching feature points of that target image frame.
  • the matching method in step 404 is the same as that in step 307.
  • Step 405 according to the key frame condition, judge whether any target image frame in the current binocular target image frame is a key frame; if so, then use the target image frame as the key frame of the current binocular target image frame, and execute step 406, to update the map based on this keyframe; otherwise, do not update the map.
  • when at least one of the following conditions is satisfied, the target image frame is a key frame:
  • the number of matching feature points is greater than the set first threshold
  • the spatial distance from the previous key frame is greater than the set second threshold
  • the spatial angle from the previous keyframe is greater than the set third threshold.
  • the target image frame of the key frame determined in step 405 and the target image frame of the matching feature points extracted in step 404 are the same target image frame.
  • Step 406 based on the first matching feature point in the current key frame, search for the second matching feature point that is successfully matched in the current binocular target image, calculate the coordinates of the first matching feature point, and save it as map point information.
  • the obtained current map information includes: unupdated map point information and updated map point information.
  • each map point corresponds to three-dimensional space position information, that is, map point coordinates.
  • the second matching feature point is a feature point in a target image frame other than the current key frame in the current binocular target image frame, and the matching degree between the second matching feature point and the first matching feature point is less than the set matching threshold.
  • the above step 406 may be: based on the first matching feature point in the current key frame, search the current binocular target image frame, obtain a second matching feature point matching the first matching feature point, and based on the second matching feature point The coordinates of the first matching feature point are calculated, and the coordinates of the first matching feature point are stored as map point information.
  • the calculation process of the coordinates of any matching feature point (first matching feature point) i in the current key frame is as follows:
  • the x coordinate is: the product of the pixel abscissa of the normalized plane of the matching feature point in the current key frame and the binocular baseline length, divided by the absolute value of the difference between the pixel abscissa of the normalized plane of the matching feature point in the current key frame and the pixel abscissa of the normalized plane of the matching feature point in the second frame;
  • the y coordinate is: the product of the pixel ordinate of the normalized plane of the matching feature point in the current key frame and the binocular baseline length, divided by the absolute value of the difference between the pixel abscissa of the normalized plane of the matching feature point in the current key frame and the pixel abscissa of the normalized plane of the matching feature point in the second frame;
  • the z coordinate is: the product of the camera focal length and the binocular baseline length, divided by the absolute value of the difference between the pixel abscissa of the normalized plane of the matching feature point in the current key frame and the pixel abscissa of the normalized plane of the matching feature point in the second frame.
  • where (u_1, v_1) are the pixel coordinates of the normalized plane of the matching feature point in the first frame (that is, the current key frame), and (u_2, v_2) are the pixel coordinates of the normalized plane of the matching feature point in the second frame;
  • f represents the focal length of the camera, and b represents the length of the binocular baseline.
  • the matching feature points of the first frame and the matching feature points of the second frame correspond to the same point in the world coordinate system.
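  • The three relations above can be transcribed directly into a helper; the focal length, baseline, and example coordinates are assumed values.

```python
# Binocular case of Embodiment 3: x = u1*b/|u1-u2|, y = v1*b/|u1-u2|,
# z = f*b/|u1-u2|. f (focal length) and b (baseline) below are assumed values.
def stereo_map_point(u1, v1, u2, f=700.0, b=0.12):
    disparity = abs(u1 - u2)
    if disparity == 0:
        raise ValueError("zero disparity: the matched point cannot be triangulated")
    return u1 * b / disparity, v1 * b / disparity, f * b / disparity

print(stereo_map_point(u1=340.0, v1=250.0, u2=300.0))   # one first/second frame match
```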
  • Steps 401 to 406 are repeatedly performed until all source binocular image frames are processed, and a first visual point cloud map composed of a large number of map points is obtained.
  • the embodiment of the present application provides a method for constructing a visual point cloud map based on binocular image frames, which uses the binocular image frames to obtain the spatial coordinates of the matching feature points, so the calculation is simple.
  • moreover, during map construction there is no need to determine the positioning information of the robot, so that map construction and positioning are separated, which improves the stability of the constructed map and its adaptability to complex environments.
  • the method of generating map points based on continuous matching between image frames will generate cumulative errors. As the moving distance of the robot increases, the above-mentioned cumulative error will become larger and larger.
  • FIG. 5 is a schematic diagram of the accumulated error provided by the embodiment of the present application.
  • T 1 and T 19 are near the same position, but the calculated trajectory is not near the same position due to the accumulated error.
  • the least squares method can be used to optimize by constructing closed-loop constraints.
  • FIG. 6 is a schematic flowchart of optimizing a first visual point cloud map according to an embodiment of the present application.
  • the optimization method may include: closed-loop point identification, closed-loop constraint calculation, and map optimization.
  • the map optimization includes pose graph optimization and/or map point optimization, as follows.
  • Step 601 Identify key frames with closed-loop constraints in the first visual point cloud map through manual marking or key frame similarity calculation.
  • in the first implementation, manual marking is adopted: during image data acquisition, unique identification patterns are arranged in the environment, so that a closed loop is generated between key frames that capture the same identification at different times.
  • This method has the advantage of high reliability.
  • the second implementation is the natural-identification method, that is, key frame similarity calculation: whether a closed loop occurs is judged by calculating whether the similarity between two key frames is greater than a set similarity threshold.
  • the similarity includes the similarity in the distribution of feature points and the similarity of image pixels.
  • Keyframes with closed-loop constraints in the first visual point cloud map are identified through keyframe similarity calculation, which can include:
  • at the root node, the k-means clustering algorithm (k-means) is used to cluster all feature points into k categories, thus obtaining the first layer of nodes.
  • for each node of the first layer, the feature points belonging to that node are clustered into k categories to obtain the next layer, and so on until the leaf layer.
  • the leaf layer contains the word feature points of the dictionary.
  • FIG. 7 is a schematic diagram of a visual dictionary provided by an embodiment of the present application.
  • each hollow circle represents a node, and the connecting line between the two nodes represents the path when searching for a certain feature point.
  • for a key frame in which the total number of feature points is n and the word feature point ω i appears c i times in the visual dictionary, the weight η i of that word feature point is expressed as: η i = c i / n.
  • any key frame A can be described as a set whose elements are the word feature points ω i and their weights, and the mathematical expression is:
  • A = { (ω 1 , η 1 ), (ω 2 , η 2 ), ... (ω N , η N ) }
  • N is the total number of feature points in the visual dictionary.
  • according to the above visual dictionary, the similarity S between any two key frames A and B can be described in L1-norm form, where v Ai is an element of the set describing key frame A according to the visual dictionary, v Bi is an element of the set describing key frame B according to the visual dictionary, and N is the total number of feature points in the visual dictionary.
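  • The exact L1-norm score is not reproduced in the text above; the sketch below assumes the commonly used form S(A,B) = 2·Σ(|vAi| + |vBi| − |vAi − vBi|) over the word-weight vectors, with each weight taken as occurrence count divided by the number of feature points in the key frame. Function names and the toy dictionary size are illustrative:

```python
import numpy as np

def word_weights(word_counts, n_features, n_words):
    """Describe a key frame as a weight vector over the N dictionary words.

    word_counts : dict {word_id: occurrence count c_i in this key frame}
    n_features  : total number of feature points n in the key frame
    n_words     : total number N of word feature points in the dictionary
    """
    v = np.zeros(n_words)
    for word_id, c in word_counts.items():
        v[word_id] = c / n_features      # eta_i = c_i / n (assumed TF weight)
    return v

def l1_similarity(v_a, v_b):
    """Assumed L1-norm similarity between two key-frame descriptions."""
    return 2.0 * np.sum(np.abs(v_a) + np.abs(v_b) - np.abs(v_a - v_b))

# toy example with a 5-word dictionary
v_a = word_weights({0: 3, 2: 1}, n_features=4, n_words=5)
v_b = word_weights({0: 2, 2: 2}, n_features=4, n_words=5)
print(l1_similarity(v_a, v_b))   # compare against the set similarity threshold
```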
  • Step 602 Calculate the closed-loop constraints based on the key frames determined to have closed-loop constraints (hereinafter referred to as closed-loop key frames for short).
  • the above step 602 may include:
  • Step 6021 based on the closed-loop key frame, calculate the matching feature points in the closed-loop key frame:
  • P = { p 1 , p 2 , ... p m }, P′ = { p′ 1 , p′ 2 , ... p′ m }
  • where P is the set of m matching feature points in the first closed-loop key frame A, and P′ is the set of m matching feature points in the second closed-loop key frame B;
  • p i and p′ i are pixel coordinates.
  • the pixel coordinates may also be referred to as a pixel coordinate matrix.
  • the first closed-loop keyframe and the second closed-loop keyframe are closed-loop keyframes.
  • the above matching degree calculation may be to calculate the Hamming distance between the descriptors of the two feature points. If the Hamming distance is less than the set Hamming threshold, it is determined that the two feature points match.
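  • A minimal sketch of this Hamming-distance matching for binary descriptors (for example 256-bit ORB descriptors stored as 32 uint8 bytes); the brute-force search and the threshold value are illustrative assumptions:

```python
import numpy as np

def hamming(d1, d2):
    """Hamming distance between two binary descriptors given as uint8 arrays."""
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())

def match_features(desc_a, desc_b, hamming_threshold=50):
    """For each descriptor in key frame A, find the closest one in key frame B.

    desc_a, desc_b : (n, 32) uint8 arrays of 256-bit descriptors.
    Returns (index_in_A, index_in_B) pairs whose distance is below the
    set Hamming threshold.
    """
    matches = []
    for i, da in enumerate(desc_a):
        dists = [hamming(da, db) for db in desc_b]
        j = int(np.argmin(dists))
        if dists[j] < hamming_threshold:
            matches.append((i, j))
    return matches

# toy example with random descriptors
rng = np.random.default_rng(0)
A = rng.integers(0, 256, size=(5, 32), dtype=np.uint8)
B = A.copy()                       # identical descriptors -> distance 0
print(match_features(A, B))
```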
  • Step 6022 Calculate the inter-frame motion information between the two closed-loop key frames according to the matching feature points in the closed-loop key frames, that is, calculate the relative pose between the two closed-loop key frames; this relative pose represents the accumulated error.
  • given that any matching feature point in the closed-loop key frames satisfies p i = R · p′ i + t, where (R, t) is the relative pose between the two closed-loop key frames,
  • (R, t) reflects the closed-loop constraint between the two closed-loop key frames, and the relative pose can be calculated through the above relationship as the initial value; i is a natural number, 1 ≤ i ≤ m, and p i , p′ i are pixel coordinates.
  • for the above closed-loop constraint, the least squares method can be used to solve it, for example with the nonlinear LM (Levenberg-Marquardt) optimization algorithm: a first objective function accumulating the pixel position errors of all matching feature points in the closed-loop key frames is constructed, for example ζ* = argmin ζ (1/2) Σ i=1..m ‖ p i − exp(ζ^) · p′ i ‖², the pixel coordinate matrices of all matching feature points are substituted into it, and the ζ that minimizes the first objective function is solved iteratively, thereby obtaining R and t;
  • where ζ is the Lie algebra representation of (R, t), p i and p′ i are pixel coordinates, and m is the number of matching feature points in the closed-loop key frames.
  • the above-mentioned pixel position information is pixel coordinates.
  • Step 603 according to the closed-loop constraint, optimize the map points in the first visual point cloud map.
  • the optimization of the map points in the first visual point cloud map may include: pose graph optimization and map point optimization.
  • the pose graph optimization is processed according to step 6031
  • the map point optimization is processed according to step 6032 .
  • Step 6031 and step 6032 may be performed in any order.
  • alternatively, only one of steps 6031 and 6032 may be performed, that is, only pose graph optimization or only map point optimization.
  • Step 6031 given the Lie algebra representation ζ i of the pose T i of any key frame i and the Lie algebra representation ζ j of the pose T j of any key frame j, the relative pose error e ij can be expressed as e ij = ln( T ij⁻¹ · T i⁻¹ · T j )^∨;
  • the symbol ∧ denotes the skew-symmetric (anti-symmetric) matrix operator;
  • the symbol ∨ denotes the inverse operation of the skew-symmetric matrix operator;
  • T ij represents the relative pose between key frame i and key frame j;
  • ζ ij represents the inter-frame relative Lie algebra representation between key frame i and key frame j.
  • the pose of the key frame is the pose of the camera (or robot) when the key frame is collected.
  • hence a second objective function for the pose graph optimization of the key frames is constructed, for example min ζ (1/2) Σ ⟨i,j⟩∈ε e ij^T · Ω · e ij;
  • where Ω is the weight of the error term, ε is the key frame set, and e ij represents the relative pose error between key frame i and key frame j;
  • the measured relative pose error between key frame i and key frame j is substituted into the second objective function as the initial value, the relative pose between the closed-loop key frames obtained in step 6022 is used as the constraint, and the Gauss-Newton algorithm or the LM algorithm is used to iteratively solve for the Lie algebra representations ζ i and ζ j of the poses of key frames i and j at which the second objective function attains its minimum.
  • the accumulated errors determined according to the closed-loop keyframes are distributed to each keyframe, thereby correcting the pose of the keyframes.
  • Step 6032 According to the pose T i of any key frame i, the pixel position z ij at which the coordinate y j of the three-dimensional map point j is observed in key frame i is used to construct the reprojection error e ij = z ij − ẑ ij;
  • where ẑ ij denotes the position at which the map point is reprojected into the image frame, obtained from x = K · [I 3×3 0 3×1] · T i · [y j ; 1] and ẑ ij = [x 1 / x 3 , x 2 / x 3 ]^T;
  • I is the identity matrix, [I 3×3 0 3×1 ] constitutes a 3×4 matrix, T i is a 4×4 matrix, [y j ; 1] is a 4×1 matrix, and K is the camera intrinsic parameter matrix;
  • 0 3×1 is [0 0 0]^T;
  • x is a homogeneous representation of pixel coordinates, and x 1 , x 2 and x 3 denote the three entries of x.
  • the above can be understood as: based on the pose T i of key frame i, the pixel position z ij of the coordinate y j of map point j in key frame i is determined, and the reprojection error e ij is then constructed according to the pixel position z ij.
  • the coordinate y j of the three-dimensional map point j is the coordinate of the map point j in the world coordinate system
  • the pixel position z ij represents the pixel coordinate of the map point j in the key frame i
  • a third objective function of the reprojection error is constructed, for example min y (1/2) Σ i Σ j e ij^T · Ω · e ij;
  • where Ω is the weight of the error term and j denotes the map point;
  • e ij represents the reprojection error.
  • the reprojection error obtained above from the pose T i of key frame i, the coordinates of map point j, the camera intrinsic parameters, and the pixel coordinates of map point j in key frame i is substituted into the third objective function as the initial value; the Gauss-Newton algorithm or the LM algorithm is then used to iteratively solve for the coordinate y j of the three-dimensional map point j at which the third objective function attains its minimum, thereby correcting the three-dimensional spatial position information of map point j.
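  • To illustrate how the reprojection error of step 6032 can be evaluated, the sketch below projects a map point y j through the key-frame pose T i and the intrinsic matrix K and subtracts the observed pixel position z ij; the homogeneous bookkeeping follows the 3×4/4×4 matrix shapes described above, and the variable names and example values are illustrative assumptions:

```python
import numpy as np

def reprojection_error(K, T_i, y_j, z_ij):
    """Reprojection error e_ij = z_ij - zhat_ij for map point j in key frame i.

    K    : 3x3 camera intrinsic matrix.
    T_i  : 4x4 pose of key frame i (world -> camera).
    y_j  : 3-vector, map point coordinates in the world frame.
    z_ij : 2-vector, observed pixel position of the map point in key frame i.
    """
    P = np.hstack([np.eye(3), np.zeros((3, 1))])      # [I_3x3 0_3x1], a 3x4 matrix
    y_h = np.append(y_j, 1.0)                         # homogeneous 4x1 map point
    x = K @ P @ T_i @ y_h                             # homogeneous pixel coordinates
    z_hat = x[:2] / x[2]                              # [x1/x3, x2/x3]
    return z_ij - z_hat

# toy example: identity pose, f = 500, principal point (320, 240)
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
T = np.eye(4)
print(reprojection_error(K, T, np.array([0.1, -0.2, 2.0]), np.array([345.0, 190.0])))
```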
  • the pose T i of the key frame i may be the pose optimized in step 6031.
  • the optimized pose of the key frame and/or the optimized coordinates of the map point are saved as the map information of the visual point cloud.
  • the second visual point cloud map is obtained through the pose optimization of the key frame and/or the optimization of the coordinates of the map points.
  • the mapping process is separated into: a processing stage of constructing an independent first visual point cloud map, and a processing stage of obtaining a second visual point cloud map through closed-loop constraint calculation and map optimization.
  • Each processing stage has a corresponding output map saved. Even if the mapping is not ideal, the original data in the previous processing stage is also saved. This makes building maps more extensible and easier to integrate with various improved map building methods.
  • FIG. 8 is a schematic diagram of an apparatus for constructing a visual point cloud map provided by an embodiment of the present application.
  • the apparatus includes: a first visual point cloud map construction unit 801, a closed-loop unit 802, a map optimization unit 803, and an IO (Input Output, input output) unit 804 for reading and saving map files.
  • the source image frame from the outside is input to the first visual point cloud map construction unit 801; the first visual point cloud map construction unit 801 is used to generate the first visual point cloud map; the closed-loop unit 802 is used to add closed-loop constraints to the first visual point cloud map generated by the first visual point cloud map construction unit 801; the map optimization unit 803 is configured to perform key frame pose graph optimization and map point optimization on the first visual point cloud map based on the closed-loop constraints.
  • the first visual point cloud map construction unit 801 may include:
  • the image preprocessing module 8011 is used to preprocess the source image frame
  • Feature extraction module 8012 for converting the image information of the preprocessed source image frame into feature information
  • the map point generation module 8013 is used to perform inter-frame tracking on the source image frame, determine the key frame, match the feature point in the current key frame with the feature point in the previous key frame, and obtain the matching feature point of the current key frame; Calculate the spatial position information of the matching feature points in the current key frame, and use the spatial position information of the matching feature points as the map point information of the current key frame;
  • map point generation module 8013 can specifically be used to perform inter-frame tracking on the source image frame, determine the key frame, calculate the spatial position information of the matching feature points in the current key frame, and use the spatial position information of the matching feature points as the current key frame. Map point information.
  • the point cloud formed by the set of map points of all key frames is the first visual point cloud map.
  • the closed loop unit 802 includes:
  • the closed-loop key frame identification module 8021 is used to identify the closed-loop key frame in the first visual point cloud map according to the artificial mark or key frame similarity calculation;
  • the closed-loop constraint calculation module 8022 is used to calculate the relative pose between the closed-loop key frames based on the closed-loop key frames, as a closed-loop constraint; construct a second objective function for key frame pose graph optimization; and use the least squares method to solve for the pose of the key frame at which the second objective function attains its minimum value.
  • the map optimization unit 803 includes a key frame pose graph optimization module 8031 and/or a map point optimization module 8032;
  • the key frame pose graph optimization module 8031 is used to optimize the pose of the key frame based on the first visual point cloud map, according to the closed-loop key frames with closed-loop constraints, and use the least squares method to obtain the second visual point cloud. map;
  • the map point optimization module 8032 is configured to optimize the spatial position information of the map points based on the first visual point cloud map and according to the reprojection error to obtain the second visual point cloud map.
  • FIG. 9 is a schematic diagram of an image preprocessing module provided by an embodiment of the present application.
  • the image preprocessing module may include:
  • the image de-distortion sub-module is used to de-distort the source image frame according to the distortion coefficient of the camera to obtain the de-distorted image frame;
  • the image filtering sub-module is used to perform image filtering on the dedistorted image frame to obtain the background image frame;
  • the image difference sub-module is used to subtract the background image frame from the de-distorted image frame to obtain the foreground image frame;
  • the image stretching sub-module is used to stretch the foreground image frame to obtain the target image frame.
  • the mapping process is separated into an independent first visual point cloud map construction unit, a closed-loop unit and a map optimization unit; there is no coupling relationship between the units, and each processing stage has a corresponding output map that is saved, so that even if the mapping result is unsatisfactory, the original data of the previous stage is preserved; this makes the scheme highly extensible and easy to integrate with various improved methods.
  • the embodiment of the present application also provides a visual point cloud map construction device, the device includes: a first visual point cloud map construction unit, the unit includes:
  • the feature extraction module is used to perform feature extraction on the source image frame collected in the space to be constructed to obtain the feature points of the source image frame;
  • the map point generation module is used to track the source image frame between frames and determine the key frame; match the feature points in the current key frame with the feature points in the previous key frame to obtain the matching feature points of the current key frame; calculate The spatial position information of the matching feature points in the current key frame, and the spatial position information of the matching feature points is used as the map point information of the current key frame;
  • the point cloud formed by the set of map points of all key frames is the first visual point cloud map.
  • a feature extraction module, which can be specifically used for: performing image preprocessing on the source image frame to obtain a target image frame, and performing feature extraction based on the target image frame to obtain the feature points of the target image frame;
  • the map point generation module is also used to, based on the first visual point cloud map, perform graph optimization on the poses of the key frames according to the closed-loop key frames with closed-loop constraints using the least squares method, and/or optimize the spatial position information of the map points according to the reprojection error, to obtain the second visual point cloud map.
  • a feature extraction module, which can be specifically used for:
  • de-distorting the source image frame according to the distortion coefficients of the camera to obtain a de-distorted image; filtering the de-distorted image (after inverting the pixels whose values are greater than the first pixel threshold, if any) to obtain a background image; subtracting the background image from the de-distorted image to obtain a foreground image; and, if the pixel values of the foreground image are not evenly distributed, stretching the foreground image to obtain the target image frame;
  • a feature extraction module which can be specifically used for:
  • if the pixel value of the foreground image is less than or equal to the set minimum gray value, the pixel value of the foreground image is set to the minimum value of the pixel value range;
  • if the pixel value of the foreground image is greater than the minimum gray value and less than the set maximum gray value, a pixel value in a certain proportion to the maximum pixel value is taken as the pixel value of the foreground image, where the proportion is the ratio of the difference between the foreground pixel value and the minimum gray value to the difference between the maximum gray value and the minimum gray value;
  • if the pixel value of the foreground image is greater than or equal to the maximum gray value, the pixel value of the foreground image is set to the maximum value of the pixel value range (a sketch of this stretching follows these conditions);
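  • A minimal NumPy sketch of the piecewise stretching described above, assuming an 8-bit pixel value range of 0–255; the gray-value bounds are illustrative parameters:

```python
import numpy as np

def stretch_foreground(img, i_min=50, i_max=200):
    """Piecewise linear stretch of a foreground image to the 0-255 range.

    Pixels <= i_min go to 0, pixels >= i_max go to 255, and pixels in
    between are scaled in proportion to (value - i_min) / (i_max - i_min).
    """
    img = img.astype(np.float32)
    out = (img - i_min) / float(i_max - i_min) * 255.0
    out = np.clip(out, 0.0, 255.0)          # handles the two saturated cases
    return out.astype(np.uint8)

# example: a synthetic low-contrast foreground image
foreground = np.random.default_rng(1).integers(90, 160, size=(4, 4)).astype(np.uint8)
print(stretch_foreground(foreground, i_min=50, i_max=200))
```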
  • the feature points in each grid cell are arranged in descending order of feature point response value, and the first Q feature points are retained to obtain the filtered feature points; wherein Q is determined according to the number of feature points in the target image frame, the set upper limit of the total number of feature points, and the total number of feature points in the grid cell;
  • Feature descriptors are calculated separately for each feature point after screening.
  • the Q is determined according to the number of feature points in the target image frame, the set upper limit of the total number of feature points, and the total number of feature points in the grid, including: dividing the number of feature points in the target image frame by the set upper limit of the total number of feature points to obtain a quotient, combining the quotient with the total number of feature points in the grid, and rounding the result down.
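  • As a concrete illustration of the per-grid filtering, the sketch below keeps the Q strongest responses in each grid cell; following the worked example in the description (one feature retained out of every total/limit features per cell), Q is computed here as floor(cell_count / (total_count / limit)), which is an interpretation of the wording above rather than a definitive formula, and the helper names are illustrative:

```python
import math
from collections import defaultdict

def filter_by_grid(points, grid_size, total_limit):
    """Keep the strongest feature points per grid cell.

    points      : list of (u, v, response) tuples for one target image frame.
    grid_size   : side length of a grid cell in pixels.
    total_limit : set upper limit on the total number of feature points.
    """
    cells = defaultdict(list)
    for u, v, resp in points:
        cells[(int(u // grid_size), int(v // grid_size))].append((u, v, resp))

    n_total = len(points)
    kept = []
    for cell_points in cells.values():
        # assumed interpretation: keep one of every (n_total / total_limit) points
        q = int(math.floor(len(cell_points) / (n_total / float(total_limit))))
        cell_points.sort(key=lambda p: p[2], reverse=True)   # descending response
        kept.extend(cell_points[:max(q, 0)])
    return kept

# toy example: 200 points, limit 20 -> roughly one point kept per 10 in a cell
pts = [(float(i % 40) * 8, float(i // 40) * 8, float(i % 7)) for i in range(200)]
print(len(filter_by_grid(pts, grid_size=80, total_limit=20)))
```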
  • a map point generation module, which can be specifically used for: for each target image frame, judging whether the target image frame is the first frame; if so, using the target image frame as a key frame; otherwise, determining whether the target image frame is a key frame according to the key frame conditions;
  • wherein the key frame conditions satisfy at least one of the following:
  • the number of matching feature points is greater than the set first threshold
  • the spatial distance from the previous key frame is greater than the set second threshold
  • the spatial angle from the previous keyframe is greater than the set third threshold.
  • the source image frame is an image frame originating from a monocular camera and on the same plane;
  • a map point generation module, which can be specifically used to calculate, for each matching feature point:
  • the x coordinate is: the ratio of the product of the pixel abscissa of the matching feature point in the current key frame and the camera installation height to the camera focal length;
  • the y coordinate is: the ratio of the product of the pixel ordinate of the matching feature point in the current key frame and the camera installation height to the camera focal length;
  • the z coordinate is: camera installation height.
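  • For the ground-plane monocular case just listed, a minimal sketch of the coordinate computation, where H is taken as the camera installation height and f as the focal length; the function name and example values are illustrative assumptions:

```python
def ground_plane_map_point(u, v, camera_height, focal_length):
    """Map-point coordinates for a same-plane (ground texture) monocular setup.

    u, v          : pixel coordinates of the matching feature point in the
                    current key frame.
    camera_height : installation height H of the camera.
    focal_length  : camera focal length f (in pixels).
    """
    x = u * camera_height / focal_length
    y = v * camera_height / focal_length
    z = camera_height
    return (x, y, z)

# example: a feature at pixel (120, -80) with H = 0.3 m and f = 600 px
print(ground_plane_map_point(120.0, -80.0, 0.3, 600.0))
```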
  • the source image frame is an image frame originating from a monocular camera and not on the same plane;
  • a map point generation module, which can be specifically used for:
  • obtaining the essential matrix of the current key frame and the previous key frame according to the pixel coordinates of at least 8 pairs of matching feature points, each pair consisting of a matching feature point in the current key frame and a matching feature point in the previous key frame; performing singular value decomposition on the essential matrix to obtain the relative pose between the current key frame and the previous key frame;
  • for each matching feature point: according to the relative pose between the current key frame and the previous key frame and the triangulation relation, at least the depth value of the matching feature point in the current key frame is obtained; the spatial position information of the matching feature point is then obtained from that depth value.
  • the map point generation module can be specifically used for: for any matching feature point, obtaining the essential matrix by substituting the pixel coordinates of 8 pairs of matching feature points into the relation that the product of the transposed matrix of the normalized plane coordinates of the matching feature point in the previous key frame, the essential matrix, and the matrix of the normalized plane coordinates of the matching feature point in the current key frame equals 0;
  • the map point generation module can also be specifically used for:
  • obtaining the depth value of the matching feature point in the current key frame based on the relation s 1 · p 1 = s 2 · R · p 2 + t, that is, the product of the depth value s 1 of the matching feature point in the current key frame and the matrix p 1 of its normalized plane coordinates equals the sum of the product of the depth value s 2 of the matching feature point in the previous key frame, the rotation matrix R in the relative pose and the matrix p 2 of the normalized plane coordinates of the matching feature point in the previous key frame, plus the translation matrix t in the relative pose; the depth value in the current key frame is solved from the rotation and translation matrices in the relative pose between the current key frame and the previous key frame and the normalized plane coordinate matrices of the matching feature point in the two frames (see the sketch after the coordinate list below);
  • the x coordinate is: the product of the pixel abscissa of the normalized plane of the matching feature point in the current key frame and the depth value of the matching feature point;
  • the y coordinate is: the product of the pixel ordinate of the normalized plane of the matching feature point in the current key frame and the depth value of the matching feature point;
  • the z coordinate is: camera focal length.
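  • The depth relation above, s1·p1 = s2·R·p2 + t on normalized-plane homogeneous coordinates, can be solved in a least-squares sense as sketched below; the linear stacking used here is one common way to recover the two depths and is an assumption, not the exact solver of the embodiment:

```python
import numpy as np

def triangulate_depths(p1, p2, R, t):
    """Solve s1 * p1 = s2 * R @ p2 + t for the depths s1, s2.

    p1, p2 : normalized-plane homogeneous coordinates [u, v, 1] of the
             matching feature point in the current and previous key frame.
    R, t   : rotation matrix and translation vector of the relative pose.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # stack the relation as A @ [s1, s2]^T = t  with  A = [p1, -R @ p2]
    A = np.column_stack([p1, -(R @ p2)])
    depths, *_ = np.linalg.lstsq(A, np.asarray(t, dtype=float), rcond=None)
    s1, s2 = depths
    return s1, s2

# toy example: pure translation along x, point about 2 m in front of the camera
R = np.eye(3)
t = np.array([0.1, 0.0, 0.0])
p2 = np.array([0.0, 0.0, 1.0])          # previous frame sees the point on axis
p1 = np.array([0.05, 0.0, 1.0])         # current frame, shifted toward +u
s1, s2 = triangulate_depths(p1, p2, R, t)
print(s1, s2)                            # both approximately 2.0
```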
  • the source image frame is a binocular image frame originating from a binocular camera and not on the same plane;
  • a feature extraction module, which can be specifically used for: performing image preprocessing on the first source image frame from the first camera and the second source image frame from the second camera respectively, to obtain a first target image frame and a second target image frame as the binocular target image frame; and extracting the feature points of the first target image frame and of the second target image frame respectively;
  • the map point generation module can specifically be used to: determine whether the binocular target image frame is the first frame; if so, use any frame of the binocular target image frame as a key frame; otherwise, determine whether any frame of the target image frame is a key frame according to the key frame conditions;
  • the x-coordinate of the matching feature point in the current key frame is: the product of the pixel abscissa of the normalized plane of the matching feature point in the current key frame and the length of the binocular baseline, divided by the normalized value of the matching feature point in the current key frame The absolute value of the difference between the pixel abscissa of the normalization plane and the pixel abscissa of the normalization plane of the matching feature point in the second frame;
  • the y-coordinate of the matching feature point in the current key frame is: the product of the pixel ordinate of the normalized plane of the matching feature point in the current key frame and the length of the binocular baseline, divided by the normalized value of the matching feature point in the current key frame The absolute value of the difference between the pixel abscissa of the normalization plane and the pixel abscissa of the normalization plane of the matching feature point in the second frame;
  • the z-coordinate of the matching feature point in the current key frame is: the product of the camera focal length and the binocular baseline length, divided by the pixel abscissa of the normalized plane of the matching feature point in the current key frame and the matching feature point in the second frame The absolute value of the difference between the pixel abscissas of the normalized plane.
  • map point generation module which can be used for:
  • a second objective function for key frame pose graph optimization is constructed, and the closed-loop constraint is used as the constraint, and the least squares method is used to solve the pose of the key frame when the second objective function achieves the minimum value.
  • map point generation module which can be used for:
  • the keyframes with the same identification are collected in different times as closed-loop keyframes
  • the matching feature points in the closed-loop key frame are calculated
  • for any matching feature point in the closed-loop key frames, the relative pose is calculated as an initial value from the relation that the pixel coordinate matrix of the matching feature point in the first closed-loop key frame equals the product of the rotation matrix in the relative pose between the first closed-loop key frame and the second closed-loop key frame and the pixel coordinate matrix of the second closed-loop key frame, plus the translation matrix in the relative pose;
  • map point generation module which can be used for:
  • for each node of the first layer, the feature points belonging to that node are clustered into k categories to obtain the nodes of the next layer;
  • this step of clustering the feature points belonging to each node of the next layer into k categories to obtain the nodes of the following layer is repeated until the final leaf layer, obtaining a visual dictionary; the visual dictionary includes N feature points and is a tree with a branching factor of k;
  • the leaf layer includes the word feature points in the visual dictionary
  • k, d, and N are all natural numbers, and N is the total number of feature points in the visual dictionary
  • for any key frame, the weight of each word feature point is calculated according to the total number of feature points in the key frame and the number of occurrences of that word feature point, and the key frame is described as a set whose elements are the word feature points and their weights;
  • the set includes N elements;
  • map point generation module which can be used for:
  • a third objective function of the re-projection error is constructed
  • the initial value of the reprojection error is: the difference between the pixel position of the map point in the key frame and the reprojection position of the map point in the image;
  • the re-projected position of the map point in the image is obtained according to the camera internal parameters, the pose of the key frame, and the spatial position information of the map point.
  • the process of constructing the map separates the mapping and the positioning, and effectively removes the mutual influence between the mapping and the positioning.
  • the map construction method provided by the embodiments of the present application has better adaptability and stability.
  • the accuracy of the map is improved.
  • the map can be corrected in time without losing the initial map data, which enhances the scalability of the map construction and is conducive to integration with the improved map construction method.
  • Embodiments of the present application further provide a visual point cloud map construction device, including a memory and a processor, where the memory stores executable computer instructions, and the processor is configured to execute the instructions stored in the memory, so as to implement any of the above The steps of the construction method of the visual point cloud map.
  • the memory may include RAM (Random Access Memory, random access memory), and may also include NVM (Non-Volatile Memory, non-volatile memory), such as at least one disk storage.
  • the memory may also be at least one storage device located away from the aforementioned processor.
  • the processor can be a general-purpose processor, including CPU (Central Processing Unit, central processing unit), NP (Network Processor, network processor), etc.; it can also be DSP (Digital Signal Processing, digital signal processor), ASIC (Application Specific Integrated Circuit), FPGA (Field-Programmable Gate Array, Field Programmable Gate Array) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
  • An embodiment of the present application further provides an electronic device, including a memory and a processor, the memory stores executable computer instructions, and the processor is configured to execute the instructions stored in the memory, so as to implement any of the above-mentioned visual point cloud maps steps of the build method.
  • the electronic device can be a robot or a server connected to the robot.
  • Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a processor, the steps of any of the above-described construction methods for a visual point cloud map are implemented.
  • An embodiment of the present application further provides a computer program, which implements the steps of any of the above-mentioned construction methods for a visual point cloud map when the computer program is executed by a processor.
  • as for the apparatus, device and storage medium embodiments, since they are basically similar to the method embodiments, the description is relatively simple, and for related parts reference may be made to the description of the method embodiments.

Abstract

一种视觉点云地图的构建方法及装置,方法包括:对待建地图的空间所采集的源图像帧,进行特征提取,得到源图像帧的特征点;对源图像帧进行帧间跟踪,确定关键帧;将当前关键帧中的特征点与上一关键帧中的特征点进行匹配,得到当前关键帧的匹配特征点;计算当前关键帧中匹配特征点的空间位置信息,将匹配特征点的空间位置信息作为当前关键帧的地图点信息,其中,所有关键帧的地图点集合所构成的点云为第一视觉点云地图。构建地图的过程将建图和定位进行分离,有效地除去了建图与定位的相互影响,对于在复杂多变的环境中,具有更好的适应性和稳定性;装置与方法相对应。

Description

一种视觉点云地图的构建方法、装置
本申请要求于2020年6月30日提交中国专利局、申请号为20201061570.6发明名称为“一种视觉点云地图的构建方法、装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及导航定位领域,特别地,涉及一种视觉点云地图的构建方法、装置。
背景技术
构建地图与定位是即时定位与建图(Simultaneous Localization And Mapping,SLAM)研究中的重点技术,而构建地图是实现定位的前提条件,地图的好坏直接影响到定位的精度。视觉点云地图是所构建的一种地图。视觉点云地图通过空间中的三维点集,描述环境中点的视觉、位姿等信息,故而,构建视觉点云地图需要两类数据信息:关键帧和地图点,其中,关键帧描述环境中点的视觉,地图点描述环境中点的位姿。其中,由大量地图点形成的集合构成了点云。
SLAM是指:机器人从未知环境的未知位置出发,在运动过程中,通过重复观测到的地图特征,定位自身位置和姿态,再根据自身位置增量式的构建地图,从而达到同时定位和地图构建的目的。
基于SLAM的地图构建,从输入而言,在机器人运动之前,没有输入,机器人开始运动的时候,有传感器原始数据输入;从输出而言,输出估计位姿和估计地图。可见,相关技术中在建立新地图或者改进已知地图的同时,在该地图上定位机器人。这类似于把一个人放到陌生的城市,让这个人熟悉该城市的过程。基于上述可知,相关技术的SLAM的地图构建将建图问题和定位问题耦合在一起,建图和定位二者相互影响。
发明内容
本申请实施例提供了一种视觉点云地图的构建方法、装置,以避免定位对建图的影响。
本申请实施例提供的一种视觉点云地图的构建方法是这样实现的:
对待建地图的空间所采集的源图像帧,进行特征提取,得到源图像帧特征点;
对源图像帧进行帧间跟踪,确定关键帧;
将当前关键帧中的特征点与上一关键帧中的特征点进行匹配,得到当前关键帧的匹配特征点;
计算当前关键帧中匹配特征点的空间位置信息,将匹配特征点的空间位置信息作为当前关键帧的地图点信息;
其中,所有关键帧的地图点集合所构成的点云为第一视觉点云地图。
可选的,所述对待建地图的空间所采集的源图像帧,进行特征提取,得到源图像帧特征点,进一步包括:
对源图像帧进行图像预处理,得到目标图像帧;
基于目标图像帧进行特征提取,得到目标图像帧的特征点;
该方法进一步包括:
基于第一视觉点云地图,根据存在闭环约束的闭环关键帧,采用最小二乘法,对关键帧位姿进行图优化,和/或,根据重投影误差,对地图点的空间位置信息进行优化,得到第二视觉点云地图。
可选的,所述对源图像帧进行图像预处理,得到目标图像帧,包括:
根据相机的畸变系数,对源图像帧进行去畸变处理,得到去畸变图像;
判断去畸变图像中各个像素点的像素值是否大于第一像素阈值;如果是,则将去畸变图像中像素值大于第一像素阈值的像素点进行取反操作,然后对取反后的去畸变图像进行图像滤波,得到背景图像;否则,将去畸变图像进行图像滤波,得到背景图像;
用去畸变图像减去背景图像,得到前景图像;
判断前景图像中的像素值是否分布均匀;如果均匀,则将该前景图像作为目标图像帧;否则,对前景图像进行拉伸处理,得到目标图像帧。
可选的,所述对前景图像进行拉伸处理,包括:
若前景图像像素值小于等于设定的最小灰度值时,将该前景图像像素值取值为像素取值范围内的最小值;
若前景图像像素值大于最小灰度值、且小于设定的最大灰度值时,按照与像素最大值成一定比例的像素值作为该前景图像像素值;所述一定比例为前景图像像素值与最小灰度值之差与最大灰度值与最小灰度值之差的比值;
若前景图像像素值大于等于最大灰度值时,将该前景图像像素值取值为像素取值范围内的最大值;
所述基于目标图像帧进行特征提取,得到目标图像帧的特征点,包括:
对目标图像帧进行特征检测,得到特征点;
将目标图像帧划分成一定数量的网格;
对于任一网格中的特征点,将网格内的特征点按特征点响应值降序排列,保留前Q个特征点,得到筛选后的特征点;其中,Q根据目标图像帧中特征点的数量和设定的特征点总数上限、该网格中的特征点总数确定;
对筛选后的各特征点,分别计算特征描述符。
可选的,所述Q根据目标图像帧中特征点的数量和设定的特征点总数上限、该网格中的特征点总数确定,包括:Q为目标图像帧中特征点的数量除以设定的特征点总数上限之商,乘以网格中的特征点总数后的结果向下取整得到。
可选的,所述对源图像帧进行帧间跟踪,确定关键帧,包括:
对于每一目标图像帧:判断该目标图像帧是否为首帧;如果是,则将该目标图像帧作为关键帧;否则,根据关键帧条件确定该目标图像帧是否为关键帧;
其中,所述关键帧条件至少满足以下条件之一:
匹配特征点数量大于设定的第一阈值;
与上一关键帧之间的空间距离大于设定的第二阈值;
与上一关键帧之间的空间角度大于设定的第三阈值。
可选的,所述源图像帧为来源于单目相机、且为同一平面的图像帧;
所述计算当前关键帧中匹配特征点的空间位置信息,包括:
对于每一匹配特征点:
x坐标为:当前关键帧中该匹配特征点的像素横坐标与相机安装高度的乘积结果与相机焦距的比值;
y坐标为:当前关键帧中该匹配特征点的像素纵坐标与相机安装高度的乘积结果与相机焦距的比值;
z坐标为:相机安装高度。
可选的,所述源图像帧为来源于单目相机、且为非同一平面的图像帧;
所述计算当前关键帧中匹配特征点的空间位置信息,包括:
根据由当前关键帧中匹配特征点与上一关键帧中匹配特征点组成的至少8对匹配特征点的像素坐标,得到当前关键帧与上一关键帧的本质矩阵;
对本质矩阵进行奇异值分解,得到当前关键帧与上一关键帧之间的相对位姿;
对于每一匹配特征点:根据当前关键帧与上一关键帧之间的相对位姿,按照三角化计算关系,至少得到当前关键帧中该匹配特征点的深度值;根据当前关键帧中该匹配特征点的深度值,得到该匹配特征点的空间位置信息。
可选的,所述根据由当前关键帧中匹配特征点与上一关键帧中匹配特征点组成的至少8对匹配特征点的像素坐标,得到当前关键帧与上一关键帧的本质矩阵,包括:
对于任一匹配特征点:
根据上一关键帧中该匹配特征点的归一化平面坐标的转置矩阵、本质矩阵、当前关键帧中该匹配特征点的归一化平面坐标的矩阵之乘积等于0的关系,代入8对匹配特征点的像素坐标,得到本质矩阵;
所述根据当前关键帧与上一关键帧之间的相对位姿,按照三角化计算关系,至少得到当前关键帧中该匹配特征点的深度值,包括:
基于当前关键帧中该匹配特征点的深度值与该匹配特征点的归一化平面坐标的矩阵之乘积等于,上一关键帧中该匹配特征点的深度值、相对位姿中的旋转矩阵、以及上一关键帧中该匹配特征点的归一化平面坐标的矩阵之乘积与相对位姿中的平移矩阵之和,根据当前关键帧与上一关键帧之间的相对位姿中的旋转矩阵和平移矩阵、当前关键帧和上一关键帧中该匹配特征点的归一化平面坐标的矩阵,得到当前关键帧中该匹配特征点的深度值;
所述根据当前关键帧中该匹配特征点的深度值,得到该匹配特征点的空间位置信息,包括:
x坐标为:当前关键帧中该匹配特征点的归一化平面的像素横坐标与该匹配特征点的深度值的乘积;
y坐标为:当前关键帧中该匹配特征点的归一化平面的像素纵坐标与该匹配特征点的深度值的乘积;
z坐标为:相机焦距。
可选的,所述源图像帧为来源于双目相机、且为非同一平面的双目图像帧;
所述对源图像帧进行图像预处理,得到目标图像帧,包括:
对来自第一目相机的第一源图像帧、来自第二目相机的第二源图像帧分别进行图像预处理,得到第一目标图像帧和第二目标图像帧,作为双目目标图像帧;
所述基于目标图像帧进行特征提取,得到目标图像帧的特征点,包括:分别提取第一目标图像帧的特征点和第二目标图像帧的特征点;
所述判断该目标图像帧是否为首帧,包括:判断双目目标图像帧是否为首帧;如果是,则将该双目目标图像帧中的任一帧作为关键帧;否则,根据关键帧条件确定该目标图像帧中的任一帧是否为关键帧;
所述计算当前关键帧中匹配特征点的空间位置信息,包括:
对于当前关键帧中每一匹配特征点:
将当前关键帧作为当前双目目标图像帧中的第一帧,将该双目目标图像帧中的另一目标图像帧作为第二帧,将第一帧中该匹配特征点与第二帧中的特征点进行匹配;如果匹配成功,得到第二帧中的匹配特征点,则:
当前关键帧中该匹配特征点的x坐标为:当前关键帧中该匹配特征点的归一化平面的像素横坐标与双目基线长度的乘积,除以当前关键帧中该匹配特征点的归一化平面的像素横坐标与第二帧中匹配特征点的归一化平面的像素横坐标之差的绝对值;
当前关键帧中该匹配特征点的y坐标为:当前关键帧中该匹配特征点的归一化平面的像素纵坐标与双目基线长度的乘积,除以当前关键帧中该匹配特征点的归一化平面的像素横坐标与第二帧中匹配特征点的归一化平面的像素横坐标之差的绝对值;
当前关键帧中该匹配特征点的z坐标为:相机焦距与双目基线长度的乘积,除以当前关键帧中该匹配特征点的归一化平面的像素横坐标与第二帧中匹配特征点的归一化平面的像素横坐标之差的绝对值。
可选的,所述根据存在闭环约束的闭环关键帧,采用最小二乘法,对关键帧位姿进行图优化,包括:
根据人工标记或关键帧相似度计算,识别出第一视觉点云地图中的闭环关键帧;
基于闭环关键帧,计算闭环关键帧之间的相对位姿,作为闭环约束;
构造用于关键帧位姿图优化的第二目标函数,以闭环约束为约束,采用最小二乘法,求解使得第二目标函数取得最小值时的关键帧的位姿。
可选的,所述根据人工标记或关键帧相似度计算,识别出第一视觉点云地图中的闭环关键帧,包括:
将不同次地采集到相同标识的关键帧作为闭环关键帧;
或者,
计算两关键帧间的相似度是否大于设定的相似度阈值;如果是,则判定该两关键帧为闭环关键帧,其中,相似度包括:特征点分布上的相似度和图像像素的相似度;
所述基于闭环关键帧,计算闭环关键帧之间的相对位姿,作为闭环约束,包括:
基于闭环关键帧,计算该闭环关键帧中的匹配特征点;
对于该闭环关键帧中的任一匹配特征点,根据第一闭环关键帧中该匹配特征点的像素坐标矩阵等于,第一闭环关键帧和第二闭环关键帧之间的相对位姿中的旋转矩阵与第二闭环关键帧的像素坐标矩阵的乘积加上相对位姿中的平移矩阵的关系,计算得到相对位姿,作为初始值;
构建累计闭环关键帧中的所有匹配特征点的像素位置信息误差的第一目标函数,代入所有匹配特征点的像素坐标矩阵,迭代求解使得第一目标函数取得最小值时的相对位姿;
所述构造用于关键帧位姿图优化的第二目标函数,以闭环约束为约束,采用最小二乘法,求解使得第二目标函数取得最小值时的关键帧的位姿,包括:
根据任一第一关键帧的位姿和任一第二关键帧的位姿之间的相对位姿的误差,构建累计该第一关键帧和该第二关键帧的相对位姿的误差的第二目标函数;
以该第一关键帧和第二关键帧之间的相对位姿误差作为初始值,以所述闭环约束为约束,迭代求解使得第二目标函数取得最小值时的第一关键帧的位姿和第二关键帧的位姿。
可选的,所述计算两关键帧间的相似度是否大于设定的相似度阈值,包括:
在根节点,用k均值聚类算法将所有特征点聚成k类,得到第一层节点;
对于第一层的每个节点,将属于该节点的特征点聚成k类,得到下一层节点;
对于下一层的每个节点,将属于该节点的特征点聚成k类,得到下一层节点;重复执行所述对于下一层的每个节点,将属于该节点的特征点聚成k类,得到下一层节点的步骤,直至最后的叶子层,得到视觉字典,该视觉字典包括N个特征点、且每次分叉为k的树;
其中,从根节点到叶子层共计d层,叶子层中包括视觉字典中的单词特征点;k、d、N均为自然数,N为视觉字典中特征点的总数;
对于任一关键帧,根据该关键帧中所有特征点数量以及任一单词特征点出现的次数,计算该单词特征点的权重,将该关键帧描述为以各个单词特征点及其权重为元素的集合,该集合包括有N个元素;
根据第一关键帧所描述的集合中的所有元素和第二关键帧所描述的集合中的所有元素,计算第一关键帧与第二关键帧的相似度;
若相似度大于设定的相似度阈值,判定两关键帧之间存在闭环约束。
可选的,所述根据重投影误差,对地图点的空间位置信息进行优化,包括:
对任一关键帧,根据该关键帧的位姿所采集到任一地图点在该关键帧中的像素位置所存在的重投影误差,构建重投影误差的第三目标函数;
以重投影误差初始值,迭代求解使得第三目标函数取得最小值时的地图点的空间位置信息;
其中,重投影误差初始值为:该地图点在该关键帧中的像素位置与该地图点重投影在图像中的位置之差;
所述该地图点重投影在图像中的位置根据相机内参、该关键帧的位姿、该地图点的空间位置信息得到。
本申请实施例还提供了一种视觉点云地图的构建装置,包括第一视觉点云地图构建单元,该单元包括:
特征提取模块,用于对待建地图的空间所采集的源图像帧,进行特征提取,得到源图像帧特征点;
地图点生成模块,用于对源图像帧进行帧间跟踪,确定关键帧,将当前关键帧中的特征点与上一关键帧中的特征点进行匹配,得到当前关键帧的匹配特征点,计算当前关键帧中匹配特征点的空间位置信息,将匹配特征点的空间位置信息作为当前关键帧的地图点信息,
其中,所有关键帧的地图点集合所构成的点云为第一视觉点云地图。
本申请实施例还提供了一种电子设备,包括存储器和处理器,所述存储器存储有可执行的计算机指令,所述处理器被配置执行所述存储器中存储的指令,以实现上述任一所述视觉点云地图的构建方法的步骤。
本申请实施例还提供了一种计算机可读存储介质,所述存储介质内存储有计算机程序,所述计算机程序被处理器执行时实现上述任一所述视觉点云地图的构建方法的步骤。
本申请实施例还提供了一种计算机程序,所述计算机程序被处理器执行时实现上述任一所述视觉点云地图的构建方法的步骤。
本申请实施例提供的视觉点云地图的构建方法,通过对待建地图的空间所采集的图像帧进行特征提取,通过帧间匹配,获取匹配特征点的空间位置信息,将匹配特征点作为地图点,得到由所有关键帧的地图点集合所构成的视觉点云地图,实现了物理环境中三维点的生成和描述。
本申请实施例中,构建地图的过程将建图和定位进行分离,有效地除去了建图与定位的相互影响。在复杂多变的环境中,本申请实施例提供的地图构建方法具有更好的适应性和稳定性。
此外,由于点云地图具有连续性,相比于由地图节点所构成的地图,能实现连续的定位,避免了定位过程中的跳变问题,降低了重定位发生的概率。
进一步地,通过对地图关键帧的位姿图优化和/或地图点优化,提高了地图的精度。在发生错误闭环情况,能够及时对地图进行修正,且不会丢失初始地图数据,这使得构建地图的扩展性增强,有利于与改进的地图构建方法进行融合。
附图说明
图1为本申请实施例一提供的基于单目相机所采集的图像数据构建地图的一种流程示意图。
图2为本申请实施例提供的特征点筛选的一种示意图。
图3为本申请实施例二提供的基于单目相机所采集的前视图像数据构建地图的一种流程示意图。
图4为本申请实施例三提供的基于双目相机所采集的图像数据构建地图的一种流程示意图。
图5为本申请实施例提供的累计误差的一种示意图。
图6为本申请实施例提供的对第一视觉点云地图进行优化的一种流程示意图。
图7为本申请实施例提供的视觉字典的一种示意图。
图8为本申请实施例提供的视觉点云地图的构建装置的一种示意图。
图9为本申请实施例提供的图像预处理模块的一种示意图。
具体实施方式
为了使本申请的目的、技术手段和优点更加清楚明白,以下结合附图对本申请做进一步详细说明。
本申请实施例中,基于采集的图像数据,通过特征提取和帧间跟踪的特征点匹配,获得视觉点云地图。可选地,通过闭环约束进行位姿图优化,和/或通过重投影误差进行地图点优化,以提高地图的精度。所构建的视觉点云地图至少包括关键帧位姿信息和地图点的空间位置信息,其中,每个地图点还可以具有特征点描述符信息。
为避免定位对建图的影响,本申请实施例提供了一种视觉点云地图的构建方法,该视觉点云地图的构建方法可以应用于机器人或与机器人连接的服务器,对此不进行限定。该视觉点云地图的构建方法包括:
对待建地图的空间所采集的源图像帧,进行特征提取,得到源图像帧特征点;
对源图像帧进行帧间跟踪,确定关键帧;
将当前关键帧中的特征点与上一关键帧中的特征点进行匹配,得到当前关键帧的匹配特征点;
计算当前关键帧中匹配特征点的空间位置信息,将匹配特征点的空间位置信息作为当前关键帧的地图点信息;
其中,所有关键帧的地图点集合所构成的点云为第一视觉点云地图。
本申请实施例提供的技术方案中,构建地图的过程将建图和定位进行分离,有效地除去了建图与定位的相互影响。在复杂多变的环境中,本申请实施例提供的地图构建方法具有更好的适应性和稳定性。
此外,由于点云地图具有连续性,相比于由地图节点所构成的地图,能实现连续的定位,避免了定位过程中的跳变问题,降低了重定位发生的概率。
实施例一
为便于理解,在本申请实施例中,以图像数据是由单目相机采集、图像数据为地面纹理图像为例来说明。所应理解的是,本申请实施例中,图像数据可以简称为图像或图像帧,图像帧可不限于地面纹理图像,其它类型的图像帧也可适用。
参见图1所示,图1为本申请实施例一提供的基于单目相机所采集的图像数据构建地图的一种流程示意图。该地图的构建过程可以包括以下三个阶段:图像预处理、特征提取、以及帧间跟踪。可选地,对于每一图像帧,执行如下步骤:
步骤101,将所采集的图像帧作为源图像帧,对源图像帧进行预处理,得到目标图像帧,以便于提取图像帧中的特征点。例如,视觉点云地图为地面纹理地图时,需要提取地面纹理图像帧中的纹理信息,因此,对地面纹理图像帧所进行的预处理的目的是:得到以纹理信息为主的图像帧,以便提取包括纹理信息的特征点。
可选的,上述步骤101可以细化为如下步骤:
步骤1011,根据相机的畸变系数对源图像帧进行去畸变处理,得到去畸变图像帧I(u,v),其中,u、v表示像素坐标,I(u,v)表示去畸变图像帧中该像素坐标(u,v)处的像素值。
步骤1012,对去畸变图像帧I(u,v)进行图像滤波,得到背景图像帧I b(u,v)。
例如,图像滤波为高斯滤波。步骤1012可以为,对去畸变图像帧I(u,v)进行高斯滤波,其中,高斯滤波核大小可以设置为45×45。
上述步骤1012可以用数学式表达为:
I b(u,v)=G×I(u,v);
其中,G为图像滤波的滤波核,I b(u,v)为背景图像帧,即滤波后的图像帧;I(u,v)为去畸变图像帧;
本申请实施例中,图像滤波核(如上述高斯滤波核)可以设置的比较大,使得滤波后的图像帧尽可能的接近真实的背景图像帧。
可选的,若纹理区域为图像帧中的较暗部分,可先将去畸变图像帧I(u,v)进行取反操作,用数学式表达为:像素最大值-I(u,v)。例如,像素最大值为255,则取反操作为:255-I(u,v)。
上述纹理区域为图像帧中特征点所在的区域。若源图像帧中纹理区域的亮度低于预设亮度阈值,则可对去畸变图像帧I(u,v)进行取反操作,得到取反后的图像帧,之后,对取反后的图像帧进行图像滤波,得到背景图像帧I b(u,v)。
步骤1013,用去畸变图像帧减去背景图像帧,得到以纹理信息为主的前景图像帧I f(u,v)。上述步骤1013中求得前景图像帧可以用数学式表达为:
I f(u,v)=I(u,v)-I b(u,v);
其中,I f(u,v)为前景图像帧,I b(u,v)为背景图像帧,即滤波后的图像帧;I(u,v)为去畸变图像帧。
步骤1014,对前景图像帧进行拉伸,得到目标图像帧。
通常情况下,采集的图像帧中纹理信息较弱,纹理区域的像素值(灰度)大多分布在狭窄的灰度区间。因此,在该步骤中,将前景图像帧的像素值拉伸到像素取值范围上,扩大前景图像帧的像素值的灰度区间。
本申请实施例中,像素取值范围可以为像素点实际能够取值的范围,即0~255。灰度值即为像素值。
一个可选的实施例中,上述步骤1014可以为:
当前景图像像素值小于等于最小灰度值时,将该前景图像像素值取值为像素取值范围内的最小值,例如,像素最小值为0;
当前景图像像素值大于最小灰度值、且小于最大灰度值时,增加该前景图像像素值的对比度。可选地,可以按照与像素最大值成一定比例的像素值作为该前景图像像素值。可选地,上述比例可以为:前景图像像素值与最小灰度值之差与最大灰度值与最小灰度值之差的比值。
当前景图像像素值大于等于最大灰度值时,将该前景图像像素值取值为像素取值范围内的最大值,例如,像素最大值为255。
上述最大灰度值和最小灰度值可以为用户根据实际需求预先设定的值。例如最大灰度值为200、220等,最小灰度值为50、100等
这种情况下,上述步骤1014可以用数学式表达为:
拉伸后的前景图像帧I f'(u,v)表示为:
I f'(u,v)=0，若I f(u,v)≤I min；I f'(u,v)=255×(I f(u,v)-I min)/(I max-I min)，若I min<I f(u,v)<I max；I f'(u,v)=255，若I f(u,v)≥I max。
其中,I f'(u,v)为目标图像帧,即拉伸后的前景图像帧,I f(u,v)表示前景图像帧,I min为最小灰度值,I max为最大灰度值,在上式中,像素取值范围为0~255。
本申请实施例中,前景图像像素值为前景图像帧中一个像素点的像素值。对于前景图像帧中每个像素点:
当该像素点的像素值小于等于最小灰度值时,将该像素点的像素值取值为像素取值范围内的最小值;
当该像素点的像素值大于最小灰度值、且小于最大灰度值时,增加该像素点的像素值的对比度。可选的,增加对比度方式可以为:将与像素取值范围的最大值成预设比例的像素值作为该像素点的像素值。其中,预设比例可以为:第一差值和第二差值的比值,第一差值为该像素点的像素值与最小灰度值之差,第二差值为最大灰度值与最小灰度值之差。
当该像素点的像素值大于等于最大灰度值时,将该像素点的像素值取值为像素取值范围内的最大值。
图像帧预处理有利于提升图像帧的对比度,在一些纹理信息较弱的环境,有利于提高纹理区域的对比度,从而提取到更多的特征点。
步骤102,基于当前目标图像帧提取特征点,以将图像信息转换为特征信息,得到当前目标图像帧的特征点集合。
在本步骤中,可采用ORB(Oriented FAST and Rotated BRIEF,面向加速分段测试特征和二进制鲁棒独立的基本特征)、SIFT(Scale invariant feature Transform,尺度不变特征变换)、SURF(Speeded Up Robust Features,加速稳健特征)等算法提取特征点。
以ORB算法为例,基于一目标图像帧,上述步骤102可以包括:
步骤1021,基于目标图像帧,采用FAST(Features from Accelerated Segment Test,加速分段测试特征)算法进行特征检测,得到FAST特征点。
步骤1022,对FAST特征点进行筛选,以有效控制特征点的规模。
为了保证特征点分布均匀的同时,尽可能筛选出显著的特征点,一个可选的实施例中,可以将目标图像帧划分成一定数量的网格,如图2所示,图2为本申请实施例提供的特征点筛选示意图。图2中,将目标图像帧划分成多个的网格。网格的数量根据实际需求进行设定。
所有的特征点按网格进行筛选,可以为:
针对任一网格,将该网格内的特征点按FAST特征点响应值降序排列,保留前Q个特征点,其中,Q根据一目标图像帧中特征点的数量和所设的特征点总数上限、以及该网格中的特征点总数确定。不同网格所保留的特征点数可以不同,也可以相同。
例如,一目标图像帧中,特征点总数上限设定为100个,该目标图像帧中特征点的数量为2000个,则根据该目标图像帧中特征点的数量(2000个)和特征点总数上限(100个),可以确定2000/100=20,即该目标图像帧中每20个特征点选出一个。如果该目标图像帧的某网格中有20个特征点,则该网格保留的特征点为1,即,Q=1。
上述Q的确定,用数学式表达可以为:
Q=⌊n grid/(n total/N max)⌋
其中,符号⌊ ⌋表示向下取整,n total为目标图像帧中特征点的数量,N max为设定的特征点总数上限,n grid为该网格中的特征点总数。
步骤1023,对筛选出来的每个FAST特征点,确定该FAST特征点的方向,也就是,计算特征点以r为半径范围内的质心,特征点坐标到质心形成一个向量作为该特征点的方向。
筛选出来的FAST特征点即为上述保留前Q个特征点。
上述步骤1023可以为:对筛选出来的每个FAST特征点,计算以特征点为圆心,以r为半径的范围内所有FAST特征点的质心,将该FAST特征点到质心形成一个向量作为该FAST特征点的方向。
步骤1024,对筛选出来的每个FAST特征点,计算一个二进制串的特征描述符,从而得到当前该目标图像帧中的特征点信息。
本申请实施例中,可以采用rBRIEF、oBRIEF等特征描述符表示特征点信息。
步骤103,帧间跟踪,以对前后帧中的特征点进行匹配,计算匹配特征点在世界坐标系下的坐标,作为具有三维空间位置信息的地图点保存。
上述帧间跟踪,以对前后帧中的特征点进行匹配,可以简称为帧间匹配。在世界坐标系下的坐标又可以称为空间坐标。
在该步骤103中,对当前已提取特征点的目标图像帧:
步骤1031,判断当前目标图像帧是否为首帧;如果是,则将该目标图像帧作为关键帧;否则,则执行步骤1032,进行帧间匹配,以确定当前目标图像帧是否为关键帧。
步骤1032,将当前目标图像帧与上一关键帧进行匹配,即:
对于当前目标图像帧的任一特征点i,计算当前目标图像帧中的特征点i与上一关键帧中特征点i的描述符之间的匹配度是否小于设定的匹配阈值;如果是,则判定两特征点匹配;否则,判定该两特征点不匹配。
其中,匹配度可以采用汉明距离来描述,匹配阈值为汉明距离阈值。匹配阈值大小可以根据实际需求进行设定。
当前目标图像帧中的特征点i与上一关键帧中特征点i对应空间中同一点。如果当前目标图像帧中的特征点i与上一关键帧中特征点i的描述符之间的匹配度小于设定的匹配阈值,则当前目标图像帧中的特征点i与上一关键帧中特征点i匹配,特征点i为当前目标图像帧的匹配特征点。
步骤1033,根据关键帧条件判断当前目标图像帧是否为关键帧;如果是,则将当前目标图像帧作为关键帧,执行步骤1034,以基于该关键帧进行地图更新;否则,则不进行地图更新。
在该步骤1033中,关键帧条件可以为匹配特征点数量大于设定的第一阈值。这种情况下,步骤1033可以为:当当前目标图像帧的匹配特征点数量大于设定的第一阈值时,可以判定当前目标图像帧为关键帧;
关键帧条件还可以是以下条件之一:
与上一关键帧之间的空间距离大于设定的第二阈值;
与上一关键帧之间的空间角度大于设定的第三阈值;
上述空间距离为:当前位置与上一位置之间的距离。上述空间角度为:从上一位置至当前位置,机器人所旋转的角度。当前位置为采集当前目标图像帧时机器人的位置,上一位置为采集上一关键帧时机器人的位置。
当关键帧条件为匹配特征点数量大于设定的第一阈值之外的条件时,步骤1033与步骤1032进行对调,即:先确定关键帧,然后再确定匹配特征点。
步骤1034,基于当前关键帧,计算各个匹配的特征点(简称为匹配特征点)的坐标,并作为地图点信息保存;
由于每一当前关键帧与上一关键帧的匹配特征点不完全相同,这样,上一关键帧中与当前关键帧匹配的特征点的坐标会被该步骤1034的计算结果更新,而与当前关键帧未匹配的特征点的坐标则未被更新,从而使得得到的当前地图信息包括:未更新的地图点信息和已更新的地图点信息。其中,每个地图点对应有三维空间位置信息。三维空间位置信息简称为空间位置信息,即为地图点在世界坐标系下的坐标。地图点投影在图像帧中的点即为特征点。
地图点信息可以包括空间位置信息。地图点信息还可以包括:在该地图点信息所指示的空间坐标处采集的关键帧,以及采集该关键帧时机器人的姿态。
在本申请实施例中,鉴于单目相机所采集的地面纹理图像帧处于同一平面,例如,单目相机安装于机器人底部,故而,采集图像帧时,当前关键帧的任一匹配特征点在世界坐标系下的坐标可以通过相机的外参,将当前关键帧的匹配特征点投影到图像平面上,得到图像坐标系下的像素坐标。匹配特征点在世界坐标系下的坐标即为匹配特征点的空间位置信息。
上述地面纹理图像帧处于同一平面可以理解为:各个空间点与单目相机的镜头所在平面的距离相同。空间点为地面纹理图像帧中像素点对应的世界坐标系下的点。
基于此,在采集到图像帧后,可以通过相机的外参,将当前关键帧的匹配特征点的像素坐标投影到世界坐标系下,得到匹配特征点的空间位置信息。以当前关键帧的匹配特征点i为例,上述步骤1034可以为:
x坐标为当前关键帧的匹配特征点i的像素横坐标u与相机安装高度的乘积结果与相机焦距的比值,
y坐标为当前关键帧的匹配特征点i的像素纵坐标v与相机安装高度的乘积结果与相机焦距的比值,
z坐标为相机安装高度。
可选的,匹配特征点i的空间位置信息可以用数学式表达为:
x=u×H/f
y=v×H/f
z=H
其中,H为相机的安装高度,f为相机的焦距,u和v为匹配特征点i在图像坐标系中的像素坐标,x和y为世界坐标系下的坐标。
反复执行步骤101~103,直至所有的源图像帧处理完毕,得到由大量地图点构成的第一视觉点云地图。
反复执行步骤101~103,得到大量地图点信息,一个地图点信息可以标识出一个世界坐标系下的一个地图点。大量地图点信息可标识出大量地图点,结合关键帧,构成了第一视觉点云地图。
本申请实施例提供了一种视觉点云地图的构建方法,通过对所采集的源图像帧的后处理,以相邻关键帧进行特征点匹配,基于匹配的特征点的像素坐标生成地图点的三维坐标,从而得到视觉点云地图。本申请实施例中,在地图构建过程中,无需确定机器人的定位信息,避免了定位信息对地图构建的影响。相比于基于机器人的定位信息构建地图节点的地图构建方式,本申请实施例提供的技术方案,避免了由于地图点之间有间距而导致地图信息离散(即不连续)的问题,实现了连续地图点的构建,使得定位应用中,能实现连续的定位,无跳变问题存在。
实施例二
在本申请实施例中,以图像数据是由单目相机采集、所采集的图像帧为非同一平面的图像帧为例来说明。例如,单目相机采用前视安装,即,机器人通过前视相机采集图像帧。
上述图像帧为非同一平面可以理解为:各个空间点与单目相机的镜头所在平面的距离不同。空间点为图像帧中像素点对应的世界坐标系下的点。
参见图3所示,图3为本申请实施例二提供的基于单目相机所采集的前视图像数据构建地图的一种 流程示意图。对于每一图像帧,执行如下步骤:
步骤301,根据相机的畸变系数对源图像帧进行去畸变处理,得到去畸变图像帧I(u,v),其中,u、v表示像素坐标,I(u,v)表示图像帧中该像素坐标处的像素值。
上述步骤301中,将所采集的图像帧作为源图像帧,进而根据相机的畸变系数对源图像帧进行去畸变处理,得到去畸变图像帧I(u,v)。
步骤302,判断去畸变图像帧中各个像素点的像素值是否大于设定的第一像素阈值;如果是,则将像素值大于第一像素阈值的像素点进行取反操作,然后对进行了取反操作后的去畸变图像帧进行滤波;否则,直接对去畸变图像帧I(u,v),进行图像滤波,得到背景图像帧I b(u,v)。
上述步骤302可以为:对于去畸变图像帧中每个像素点,判断该像素点的像素值是否大于设定的第一像素阈值;如果大于第一像素阈值,则对该像素点进行取反操作;如果小于等于第一像素阈值,则无需对该像素点进行取反操作。在对去畸变图像帧中的所有像素点均执行了上述判断,并基于判断结果对去畸变图像帧进行处理之后,对处理之后的去畸变图像帧进行图像滤波,得到背景图像帧I b(u,v)。
步骤303,用去畸变图像帧减去背景图像帧,得到前景图像帧I f(u,v)。上述步骤303中求得前景图像帧可以用数学式表达为:
I f(u,v)=I(u,v)-I b(u,v)。
步骤304,判断前景图像帧I f(u,v)的像素值是否分布均匀;如果是,则将前景图像帧作为目标图像帧;否则,则对前景图像帧进行拉伸,得到目标图像帧,步骤304中的拉伸处理与步骤1014相同。
本申请实施例中,若前景图像帧中像素点的像素值在0~255区间上分布均匀,则确定该前景图像帧的图像质量较高,前景图像帧I f(u,v)的像素值分布均匀,将该前景帧作为目标图像帧;若前景图像帧中像素点的像素值分布在一个狭窄的灰度区间上,例如前景图像帧中像素点的像素值分布在100~150区间上,则确定前景图像帧的图像质量较低,前景图像帧I f(u,v)的像素值分布不均匀,对前景图像帧进行拉伸,得到目标图像帧。
在本步骤中,使得高图像质量不进行图像拉伸处理,而低图像质量进行图像拉伸处理,从而使得图像拉伸处理根据图像质量进行选择性处理,降低了设备的负担。
步骤305,基于当前目标图像帧提取特征点,以将图像信息转换为特征信息,得到当前目标图像帧的特征点集合。
在本步骤中,可采用ORB、SIFT、SIFT的高效改良版SURF等算法提取特征点。该步骤305与步骤102相同。
步骤306,判断当前目标图像帧是否为首帧;如果是,则将当前目标图像帧作为关键帧,然后返回步骤310;否则,则执行步骤307,进行帧间匹配,以确定当前目标图像帧是否为关键帧。
步骤307,将当前目标图像帧与上一关键帧进行匹配,即:
对于当前目标图像帧的任一特征点i,计算当前目标图像帧中的特征点i与上一关键帧中特征点i的描述符之间的匹配度是否小于设定的匹配阈值;如果是,则判定两特征点匹配,特征点i为当前目标图像帧的匹配特征点;否则,判定该两特征点不匹配。
其中,匹配度可以采用汉明距离来描述,匹配阈值为汉明距离阈值。
步骤308,根据关键帧条件判断当前目标图像帧是否为关键帧;如果是,则将当前目标图像帧作为关键帧,执行步骤309,以基于该关键帧进行地图更新;否则,则不进行地图更新,直接执行步骤310。
本申请实施例中,当满足以下关键帧条件之一时,判定当前目标图像帧为关键帧:
匹配特征点数量大于设定的第一阈值;
与上一关键帧之间的空间距离大于设定的第二阈值;
与上一关键帧之间的空间角度大于设定的第三阈值。
步骤309,基于当前关键帧,计算各个匹配特征点的坐标,并作为地图点信息保存。这样,得到的当前地图信息包括:未更新的地图点信息和已更新的地图点信息。其中,每个地图点对应有三维空间位 置信息。
在本申请实施例中,鉴于单目相机所采集的图像帧处于非同一平面,故而,根据匹配特征点的像素坐标,采用八点法计算两图像帧之间的本质矩阵,对本质矩阵进行SVD(Singular Value Decomposition,奇异值分解),得到相机相对位姿,然后基于三角化计算原理,根据两图像帧之间的相对位姿,采用最小二乘法,计算任一匹配特征点i在世界坐标系下的坐标。
上述相机相对位姿即为两图像帧之间的相对位姿。相机安装在机器人上,因此,相机相对位姿又可以理解为:机器人在采集两图像帧的位置之间的相对位姿。
可选的,步骤309中各个匹配特征点的坐标,可以按照以下步骤计算:
步骤3091,根据本质矩阵E和当前关键帧的匹配特征点i的归一化平面坐标p 1、上一关键帧的匹配特征点i的归一化平面坐标p 2满足:对于任一匹配特征点,上一关键帧中该匹配特征点的归一化平面坐标的转置矩阵、本质矩阵、当前关键帧中该匹配特征点的归一化平面坐标的矩阵之乘积等于0的关系,求解本质矩阵E。本质矩阵E的求解可以用数学式表达为:
p 2 T·E·p 1=0
其中,本质矩阵E是反映空间中一点P的像点在不同视角相机下相机坐标系中的表示之间的关系,为3×3矩阵。本质矩阵E的作用是:第一图像帧上的一个点被本质矩阵相乘,其结果为此点在第二图像帧上的对极线。
当前关键帧的匹配特征点i的归一化平面坐标p 1=[u 1,v 1,1] T,上一关键帧匹配特征点i的归一化平面坐标p 2=[u 2,v 2,1] T。
p 1和p 2对应空间中的同一点,即p 1和p 2对应世界坐标系下的同一点,p 1和p 2为一对匹配特征点。
在本质矩阵E的求解算法中,代入八对匹配特征点的归一化平面坐标,可求解出本质矩阵。其中,当前关键帧的匹配特征点i与上一关键帧的匹配特征点i组成一对匹配特征点。
步骤3092,对本质矩阵E进行SVD,得到当前关键帧与上一关键帧之间的相对位姿,即相机的相对位姿,包括平移矩阵t和旋转矩阵R。
步骤3093,基于三角化计算原理,当前关键帧的匹配特征点i的深度值s 1、上一关键帧的匹配特征点i的深度值s 2满足:
s 1p 1=s 2Rp 2+t;
采用最小二乘法,可以求解得出s 1和s 2
或者,
将上式两边同时乘以p 1的反对称矩阵p 1^,可得:
s 1p 1^p 1=0=s 2p 1^Rp 2+p 1^t;
由此可求得s 2,将求得的s 2代入式1,得到s 1
上式中,R表示旋转矩阵,t表示平移矩阵,p 1为当前关键帧的匹配特征点i的归一化平面坐标,p 2为上一关键帧匹配特征点i的归一化平面坐标。
步骤3094,根据当前关键帧的匹配特征点i的深度值s 1,计算当前关键帧的匹配特征点i在世界坐标系下的坐标,可以为:
x坐标为:当前关键帧中该匹配特征点i的归一化平面的像素横坐标与该匹配特征点的深度值的乘积;
y坐标为:当前关键帧中该匹配特征点i的归一化平面的像素纵坐标与该匹配特征点的深度值的乘积;
z坐标为:相机焦距。
数学式表示为:
x=s 1u 1
y=s 1v 1
z=f。
其中,f为相机焦距,以将归一化平面坐标转化为成像平面。u 1为归一化平面坐标中的像素横坐标,v 1为归一化平面坐标中的像素纵坐标,s 1为当前关键帧的匹配特征点i的深度值。
步骤310,判断源图像帧是否处理完毕;如果是,则结束;否则,处理下一源图像帧,返回执行步骤301,直至所有的源图像帧处理完毕,得到由大量地图点构成的第一视觉点云地图。
本申请实施例提供了一种基于前视相机采集的非同一平面的图像帧的视觉点云地图的构建方法,通过对所采集的源图像帧的后处理,以相邻关键帧进行特征点匹配,基于匹配特征点的像素坐标生成地图点的三维坐标,从而得到视觉点云地图。本申请实施例中,在地图构建过程中,无需确定机器人的定位信息,使得地图构建与定位分离,提高了构建的地图的稳定性和对复杂环境的适应能力。
实施例三
在本申请实施例中,以图像数据是由双目相机采集为例来说明,所采集的图像帧为非同一平面的图像帧。
参见图4所示,图4为本申请实施例三提供的基于双目相机所采集的图像数据构建地图的一种流程示意图。以双目相机中的一目相机为第一目相机,双目相机中的另一目相机为第二目相机。对于每一双目图像帧,即,同一时间来自第一目相机的第一源图像帧和来自第二目相机的第二源图像帧,执行如下步骤:
步骤401,对第一源图像帧和第二源图像帧进行预处理,得到当前双目目标图像帧,包括第一目标图像帧和第二目标图像帧;。
在该步骤中,可以并行地对第一源图像帧和第二源图像帧进行预处理,也可以分别串行地对第一源图像帧和第二源图像帧进行预处理,对此不进行限定。
上述步骤401可以为:对第一源图像帧进行预处理,得到第一目标图像帧,对第二源图像帧进行预处理,得到第二目标图像帧。第一目标图像帧和第二目标图像帧构成当前双目目标图像帧。
步骤401中的预处理与步骤301~步骤304相同。
步骤402,基于当前双目目标图像帧,分别提取第一目标图像帧的特征点和第二目标图像帧的特征点,以将图像信息转换为特征信息,得到当前双目目标图像帧的特征点集合。
在本步骤中,可采用ORB、SIFT、SURF等算法提取特征点。该步骤中,第一目标图像帧和第二目标图像帧的特征点提取过程与步骤102相同。
步骤403,判断当前双目目标图像帧是否为首个双目图像帧;如果是,则将当前双目目标图像帧中的任一帧作为关键帧,执行步骤406;否则,则执行步骤404,进行帧间匹配,以确定当前双目目标图像帧中的任一帧是否为关键帧。
步骤404,为了提高匹配的效率,可将当前双目目标图像帧中的任一目标图像帧与上一关键帧进行匹配,可以得到该目标图像帧的匹配特征点,
步骤404中的匹配方式与步骤307相同。
步骤405,根据关键帧条件判断当前双目目标图像帧中的任一目标图像帧是否为关键帧;如果是,则将该目标图像帧作为当前双目目标图像帧的关键帧,执行步骤406,以基于该关键帧进行地图更新;否则,则不进行地图更新。
本申请实施例中,当满足以下关键帧条件之一时,判定该目标图像帧为关键帧:
匹配特征点数量大于设定的第一阈值;
与上一关键帧之间的空间距离大于设定的第二阈值;
与上一关键帧之间的空间角度大于设定的第三阈值。
步骤405中确定关键帧的目标图像帧与步骤404中提取匹配特征点的目标图像帧为同一目标图像帧。
步骤406,基于当前关键帧中的第一匹配特征点,搜索当前双目目标图像中匹配成功的第二匹配特征点,计算第一匹配特征点的坐标,并作为地图点信息保存。这样,得到的当前地图信息包括:未更新的地图点信息和已更新的地图点信息。其中,每个地图点对应有三维空间位置信息,即地图点坐标。
本步骤中,第二匹配特征点为当前双目目标图像帧中除当前关键帧外的一帧目标图像帧中的特征点,并且第二匹配特征点与第一匹配特征点的匹配度小于设定的匹配阈值。基于此,上述步骤406可以为:基于当前关键帧中的第一匹配特征点,搜索当前双目目标图像帧,获得与第一匹配特征点匹配的第二匹配特征点,基于第二匹配特征点计算第一匹配特征点的坐标,并第一匹配特征点的坐标作为地图点信息保存。
在本申请实施例中,当前关键帧中任一匹配特征点(第一匹配特征点)i的坐标的计算过程如下:
将当前关键帧作为当前双目目标图像帧中的第一帧,该帧中的匹配特征点已通过步骤404得到;将该双目目标图像帧中的另一目标图像帧作为第二帧;将第一帧中的匹配特征点i与第二帧中的特征点j进行匹配,即,计算匹配特征点i与特征点j的描述符之间的匹配度,基于匹配度,确定匹配特征点i与特征点j是否匹配;如果匹配,得到第二帧中的第二匹配特征点(即特征点j),则匹配特征点i的地图点坐标为:
x坐标为:当前关键帧中该匹配特征点的归一化平面的像素横坐标与双目基线长度的乘积,除以当前关键帧中该匹配特征点的归一化平面的像素横坐标与第二帧中匹配特征点的归一化平面的像素横坐标之差的绝对值;
y坐标为:当前关键帧中该匹配特征点的归一化平面的像素纵坐标与双目基线长度的乘积,除以当前关键帧中该匹配特征点的归一化平面的像素横坐标与第二帧中匹配特征点的归一化平面的像素横坐标之差的绝对值;
z坐标为:相机焦距与双目基线长度的乘积,除以当前关键帧中该匹配特征点的归一化平面的像素横坐标与第二帧中匹配特征点的归一化平面的像素横坐标之差的绝对值。
数学式表示为:
x=u 1×b/|u 1-u 2|
y=v 1×b/|u 1-u 2|
z=f×b/|u 1-u 2|
其中,(u 1,v 1)为第一帧(即当前关键帧)的匹配特征点的归一化平面的像素坐标,(u 2,v 2)为第二帧的匹配特征点的归一化平面的像素坐标,f表示相机焦距,b表示双目基线长度。
上述第一帧的匹配特征点和第二帧的匹配特征点对应世界坐标系下的同一点。
如果不匹配,则放弃匹配特征点i的坐标计算。
反复执行步骤401~406,直至所有的源双目图像帧处理完毕,得到由大量地图点构成的第一视觉点云地图。
本申请实施例提供了一种基于双目图像帧的视觉点云地图的构建方法,利用双目图像帧获得匹配特征点的空间坐标,计算简单。并且,在地图构建过程中,无需确定机器人的定位信息,使得地图构建与定位分离,提高了构建的地图的稳定性和对复杂环境的适应能力。
鉴于第一视觉点云地图是通过图像帧间的连续匹配、不断记录生成的地图点而得到,基于图像帧间连续匹配的地图点生成方式会产生累计误差。随着机器人运动距离的增加,上述累计误差会越来越大。
参见图5所示,图5为本申请实施例提供的累计误差的一种示意图。图5中,T i表示采集第i图像 帧时机器人的位置,i=0,1,…,20,左侧客观的真实轨迹,右侧为计算得到的轨迹。其中,T 1和T 19在同一位置附近,但计算得到的轨迹,因为累计误差而导致不在同一位置附近。为了消除累计误差,提高第一视觉点云地图中地图点的空间位置信息的精度,可以通过构建闭环约束,采用最小二乘法进行优化。
参见图6所示,图6为本申请实施例提供的对第一视觉点云地图进行优化的一种流程示意图。该优化方法可以包括:闭环点识别、闭环约束计算、地图优化。其中,地图优化包括位姿图优化和/或地图点优化,如下。
步骤601,通过人工标记或关键帧相似度计算,识别出第一视觉点云地图中存在闭环约束的关键帧。
实施方式之一,采用人工标记的方法:图像数据采集时在环境中布置唯一的标识图案,以使得不同次地采集到相同标识的关键帧之间产生闭环。该方式具有可靠性高的优点。
实施方式之二,自然标识的方法,即关键帧相似度计算的方法:通过计算两关键帧间的相似度是否大于设定的相似度阈值,来判断是否发生闭环。其中,相似度包括特征点分布上的相似度和图像像素的相似度。
通过关键帧相似度计算,识别出第一视觉点云地图中存在闭环约束的关键帧,可以包括:
一、构建自然特征的视觉字典。
例如,若多图像帧中共存在N个特征点,构建一个深度为d,每次分叉为k的树,则构建视觉字典的流程如下:
在根节点,用k均值聚类算法(k-means)把所有特征点聚成k类,这样得到了第一层节点。
对第一层的每个节点,把属于该节点的特征点再聚成k类,得到下一层。
依此类推,直至叶子层,其中,从根节点到叶子层共计d层。叶子层即为字典中的单词特征点。
参见图7所示,图7为本申请实施例提供的视觉字典的一种示意图。图7中,每个空心圆表示一个节点,两个节点间的连线表示查找某一特征点时的路径。
二、对于一图像帧(即关键帧)中所有特征点数量为n,单词特征点ω i在视觉字典中出现的次数为c i,该单词特征点ω i的权重表示为:
η i=c i/n
根据视觉字典,任一关键帧A的描述可以为以各个单词特征点ω i及其权重为元素的集合,数学式表达为:
A={(ω 1,η 1),(ω 2,η 2),……(ω N,η N)}
其中,N为视觉字典中特征点的总数。
四、根据上述视觉字典,可采用L1范数形式对任一两关键帧A和关键帧B之间的相似度S进行描述,如下:
S(A,B)=2∑ i=1..N(|v Ai|+|v Bi|-|v Ai-v Bi|)
其中,v Ai为根据视觉字典所描述的关键帧A的集合中的元素,v Bi为根据字典模型所描述的关键帧B的集合中的元素,N为视觉字典中特征点的总数。
若相似度S大于设定的相似度阈值,判定两帧之间存在闭环约束。
步骤602,基于判定为存在闭环约束的关键帧(后文简称为闭环关键帧),计算闭环约束。上述步骤602可以包括:
步骤6021,基于闭环关键帧,计算该闭环关键帧中的匹配特征点:
若第一闭环关键帧A和第二闭环关键帧B存在闭环,计算第一闭环关键帧A中的任一特征点i和第二闭环关键帧B中的任一特征点j的匹配度;若匹配度小于设定的匹配度阈值,则判定该两特征点匹配,如此反复地对每个特征点进行匹配,得到m个匹配特征点。可以用数学式表达为:
P={p 1,p 2……p m},P′={p 1′,p 2′……p m′}
其中,P为第一闭环关键帧A中的m个匹配特征点集合,P为第二闭环关键帧B中m个匹配特征点集合,p i、p′ i为像素坐标。本申请实施例中,像素坐标又可以称为像素坐标矩阵。
第一闭环关键帧和第二闭环关键帧为闭环关键帧。
上述匹配度计算可以是计算两特征点的描述符之间的汉明距离。若汉明距离小于设定的汉明阈值,则判定为两特征点匹配。
步骤6022,根据闭环关键帧中的匹配特征点,计算两闭环关键帧之间的帧间运动信息,即,计算两闭环关键帧之间的相对位姿,该相对位姿表征了累计误差。
鉴于闭环关键帧中的任一匹配特征点满足以下关系:
p i=Rp′ i+t
其中,(R,t)为两闭环关键帧之间的相对位姿。(R,t)反映了两闭环关键帧之间的闭环约束,通过上述关系可计算得到相对位姿,作为初始值;i为自然数,且,1≤i≤m,p i、p′ i为像素坐标。
对于上述闭环约束,可采用最小二乘法求解,例如,采用非线性优化的LM(Levenberg-Marquardt,列文伯格-马夸尔特)算法求解。通过构造第一目标函数,代入闭环关键帧中所有匹配特征点的像素位置信息,迭代求解使得第一目标函数取得最小值时的ζ,从而求得R和t。可以用数学式表达为:
ζ*=argmin ζ (1/2)∑ i=1..m‖p i-exp(ζ^)p′ i‖ 2
其中,ζ为(R,t)的李代数表示,p i、p′ i为像素坐标,m为闭环关键帧中匹配特征点的数量。上述像素位置信息为像素坐标。
步骤603,根据闭环约束,对第一视觉点云地图中的地图点进行优化。
对第一视觉点云地图中的地图点的优化可以包括:位姿图优化和地图点优化。其中,位姿图优化是按照步骤6031处理,地图点优化按照步骤6032处理。步骤6031与步骤6032无先后顺序。
所应理解的是,步骤6031与步骤6032还可以选择性的执行其中任一步骤的优化过程。例如,仅进行位姿图优化,或者,仅进行地图点优化。
步骤6031,鉴于任一关键帧i的位姿T i的李代数表示ζ i和任一关键帧j的位姿T j的李代数表示ζ j之间的相对位姿的误差e ij可以表示为:
e ij=ln(T ij -1·T i -1·T j) ∨=ln(exp((-ζ ij)^)·exp((-ζ i)^)·exp(ζ j^)) ∨
其中,符号∧表示反对称矩阵,符号∨表示反对称矩阵的逆运算,T ij表示关键帧i和关键帧j之间的相对位姿,ζ ij表示关键帧i和关键帧j之间的帧间相对李代数表示。其中,关键帧的位姿为采集该关键帧时相机(或机器人)的位姿。
故而,构造用于关键帧的位姿图优化的第二目标函数:
min ζ (1/2)∑ ⟨i,j⟩∈ε e ij T·Ω·e ij
其中,Ω为误差项的权重,ε为关键帧集合,e ij表示关键帧i和关键帧j之间的相对位姿的误差。
在第二目标函数中代入测量得到的关键帧i和关键帧j之间的相对位姿的误差,作为初始值,以步骤6022求得的闭环关键帧之间的相对位姿为约束,采用高斯-牛顿算法或LM算法,迭代求解使得第二目标函数取得最小值时的关键帧i的位姿T i的李代数表示ζ i和关键帧j位姿T j的李代数表示ζ j
这样,根据闭环关键帧所确定的累积误差被分配至各个关键帧中,从而修正了关键帧的位姿。
步骤6032,根据任一关键帧i的位姿T i采集到三维地图点j的坐标y j在关键帧i中的像素位置z ij, 构造重投影误差e ij
e ij=z ij-ẑ ij
其中,ẑ ij表示地图点重投影在图像帧中的位置,为:
x=K·[I 3×3 0 3×1]·T i·[y j;1]
ẑ ij=[x 1/x 3,x 2/x 3] T
其中,I为单位矩阵,[I 3×3 0 3×1]构成3×4的矩阵,T i为4×4的矩阵,[y j;1]为4×1的矩阵,K为相机内参。0 3×1为[0 0 0] T。x为像素坐标的齐次表示,x 1、x 2和x 3表示x内的三个数。
上述根据任一关键帧i的位姿T i采集到三维地图点j的坐标y j在关键帧i中的像素位置z ij,构造重投影误差e ij,可以理解为,基于关键帧i的位姿T i,确定地图点j的坐标y j在关键帧i中的像素位置z ij,进而根据像素位置z ij,构造重投影误差e ij。其中,三维地图点j的坐标y j为地图点j在世界坐标系下的坐标,像素位置z ij表示地图点j在关键帧i中的像素坐标,
ẑ ij表示地图点j重投影在关键帧i中的像素坐标。
构造重投影误差的第三目标函数:
min y (1/2)∑ i∈ε∑ j e ij T·Ω·e ij
其中,Ω为误差项的权重,j为地图点,e ij表示重投影误差。
在第三目标函数中代入上述根据关键帧i的位姿T i、地图点j的坐标、相机内参、以及地图点j在关键帧i中的像素坐标所得到的重投影误差,并作为初始值,采用高斯-牛顿算法或LM算法,迭代求解使得第三目标函数取得最小值时的三维地图点j的坐标y j,从而对地图点j的三维空间位置信息进行修正。
在该步骤6032中,可选地,关键帧i的位姿T i可以为经过步骤6031优化后的位姿。
将优化后的关键帧的位姿和/或优化后的地图点的坐标作为视觉点云的地图信息保存。这样,通过关键帧的位姿优化和/或地图点坐标的优化,得到第二视觉点云地图。
本申请实施例中,将建图过程分离为:独立的第一视觉点云地图构建的处理阶段,以及通过闭环约束计算和地图优化,获得第二视觉点云地图的处理阶段。每一处理阶段都有对应的输出地图保存,即使发生建图不理想的情况,也保存了上一处理阶段中的原始数据。这使得构建地图的扩展性增强,方便与各种改进的地图构建方法进行融合。
参见图8所示,图8为本申请实施例提供的视觉点云地图的构建装置的一种示意图。该装置包括:第一视觉点云地图构建单元801、闭环单元802、地图优化单元803、以及用于实现地图文件的读取和保存的IO(Input Output,输入输出)单元804。
其中,来自外部的源图像帧输入至第一视觉点云地图构建单元801;第一视觉点云地图构建单元801用于生成的第一视觉点云地图;闭环单元802用于向第一视觉点云地图构建单元801生成的第一视觉点云地图添加闭环约束;地图优化单元803用于基于闭环约束对第一视觉点云地图进行关键帧位姿图优化和地图点优化。
可选地,第一视觉点云地图构建单元801可以包括:
图像预处理模块8011,用于对源图像帧进行预处理;
特征提取模块8012,用于将预处理后的源图像帧的图像信息转换为特征信息;
地图点生成模块8013,用于对源图像帧进行帧间跟踪,确定关键帧,将当前关键帧中的特征点与上一关键帧中的特征点进行匹配,得到当前关键帧的匹配特征点;计算当前关键帧中匹配特征点的空间位置信息,将匹配特征点的空间位置信息作为当前关键帧的地图点信息;
上述地图点生成模块8013,具体可以用于对源图像帧进行帧间跟踪,确定关键帧,计算当前关键帧中匹配特征点的空间位置信息,将匹配特征点的空间位置信息作为当前关键帧的地图点信息。
其中,所有关键帧的地图点集合所构成的点云为第一视觉点云地图。
闭环单元802包括:
闭环关键帧识别模块8021,用于根据人工标记或关键帧相似度计算,识别出第一视觉点云地图中的闭环关键帧;
闭环约束计算模块8022,用于基于闭环关键帧,计算闭环关键帧之间的相对位姿,作为闭环约束;构造用于关键帧位姿图优化的第二目标函数,采用最小二乘法,求解使得第二目标函数取得最小值时的关键帧的位姿。
地图优化单元803包括关键帧位姿图优化模块8031和/或地图点优化模块8032;
其中,关键帧位姿图优化模块8031,用于基于第一视觉点云地图,根据存在闭环约束的闭环关键帧,采用最小二乘法,对关键帧位姿进行图优化,得到第二视觉点云地图;
地图点优化模块8032,用于基于第一视觉点云地图,根据重投影误差,对地图点的空间位置信息进行优化,得到第二视觉点云地图。
参见图9所示,图9为本申请实施例提供的图像预处理模块的一种示意图。该图像预处理模块可以包括:
图像去畸变子模块,用于根据相机的畸变系数对源图像帧进行去畸变处理,得到去畸变图像帧;
图像滤波子模块,用于将去畸变图像帧进行图像滤波,得到背景图像帧;
图像差分子模块,用于用去畸变图像帧减去背景图像帧,得到前景图像帧;
图像拉伸子模块,用于对前景图像帧进行拉伸处理,得到目标图像帧。
本申请实施例中,将建图过程分离为独立的第一视觉点云地图构建单元、闭环单元和地图优化单元,各单元间无耦合关系,每一处理阶段都有对应的输出地图保存,即使发生建图不理想的情况,也保存了上一过程中的原始数据;扩展性强,方便与各种改进方法进行融合。
本申请实施例还提供了一种视觉点云地图的构建装置,该装置包括:第一视觉点云地图构建单元,该单元包括:
特征提取模块,用于对待建地图的空间所采集的源图像帧,进行特征提取,得到源图像帧特征点;
地图点生成模块,用于对源图像帧进行帧间跟踪,确定关键帧;将当前关键帧中的特征点与上一关键帧中的特征点进行匹配,得到当前关键帧的匹配特征点;计算当前关键帧中匹配特征点的空间位置信息,将匹配特征点的空间位置信息作为当前关键帧的地图点信息;
其中,所有关键帧的地图点集合所构成的点云为第一视觉点云地图。
可选的,特征提取模块,具体可以用于:
对源图像帧进行图像预处理,得到目标图像帧;
基于目标图像帧进行特征提取,得到目标图像帧的特征点;
地图点生成模块,还用于基于第一视觉点云地图,根据存在闭环约束的闭环关键帧,采用最小二乘法,对关键帧位姿进行图优化,和/或,根据重投影误差,对地图点的空间位置信息进行优化,得到第二视觉点云地图。
可选的,特征提取模块,具体可以用于:
根据相机的畸变系数,对源图像帧进行去畸变处理,得到去畸变图像;
判断去畸变图像中各个像素点的像素值是否大于第一像素阈值;如果是,则将去畸变图像中像素值 大于第一像素阈值的像素点进行取反操作,然后对取反后的去畸变图像进行图像滤波,得到背景图像;否则,将去畸变图像进行图像滤波,得到背景图像;
用去畸变图像减去背景图像,得到前景图像;
判断前景图像中的像素值是否分布均匀;如果均匀,则将该前景图像作为目标图像帧;否则,对前景图像进行拉伸处理,得到目标图像帧。
可选的,特征提取模块,具体可以用于:
若前景图像像素值小于等于设定的最小灰度值时,将该前景图像像素值取值为像素取值范围内的最小值;
若前景图像像素值大于最小灰度值、且小于设定的最大灰度值时,按照与像素最大值成一定比例的像素值作为该前景图像像素值;所述一定比例为前景图像像素值与最小灰度值之差与最大灰度值与最小灰度值之差的比值;
若前景图像像素值大于等于最大灰度值时,将该前景图像像素值取值为像素取值范围内的最大值;
对目标图像帧进行特征检测,得到特征点;
将目标图像帧划分成一定数量的网格;
对于任一网格中的特征点,将网格内的特征点按特征点响应值降序排列,保留前Q个特征点,得到筛选后的特征点;其中,Q根据目标图像帧中特征点的数量和设定的特征点总数上限、该网格中的特征点总数确定;
对筛选后的各特征点,分别计算特征描述符。
可选的,所述Q根据目标图像帧中特征点的数量和设定的特征点总数上限、该网格中的特征点总数确定,包括:Q为目标图像帧中特征点的数量除以设定的特征点总数上限之商,乘以网格中的特征点总数后的结果向下取整得到。
可选的,地图点生成模块,具体可以用于:
对于每一目标图像帧:判断该目标图像帧是否为首帧;如果是,则将该目标图像帧作为关键帧;否则,根据关键帧条件确定该目标图像帧是否为关键帧;
其中,所述关键帧条件至少满足以下条件之一:
匹配特征点数量大于设定的第一阈值;
与上一关键帧之间的空间距离大于设定的第二阈值;
与上一关键帧之间的空间角度大于设定的第三阈值。
可选的,源图像帧为来源于单目相机、且为同一平面的图像帧;
地图点生成模块,具体可以用于:
对于每一匹配特征点:
x坐标为:当前关键帧中该匹配特征点的像素横坐标与相机安装高度的乘积结果与相机焦距的比值;
y坐标为:当前关键帧中该匹配特征点的像素纵坐标与相机安装高度的乘积结果与相机焦距的比值;
z坐标为:相机安装高度。
可选的,所述源图像帧为来源于单目相机、且为非同一平面的图像帧;
地图点生成模块,具体可以用于:
根据由当前关键帧中匹配特征点与上一关键帧中匹配特征点组成的至少8对匹配特征点的像素坐标,得到当前关键帧与上一关键帧的本质矩阵;
对本质矩阵进行奇异值分解,得到当前关键帧与上一关键帧之间的相对位姿;
对于每一匹配特征点:根据当前关键帧与上一关键帧之间的相对位姿,按照三角化计算关系,至少得到当前关键帧中该匹配特征点的深度值;根据当前关键帧中该匹配特征点的深度值,得到该匹配特征点的空间位置信息。
可选的,地图点生成模块,具体可以用于:
对于任一匹配特征点:
根据上一关键帧中该匹配特征点的归一化平面坐标的转置矩阵、本质矩阵、当前关键帧中该匹配特征点的归一化平面坐标的矩阵之乘积等于0的关系,代入8对匹配特征点的像素坐标,得到本质矩阵;
此外,地图点生成模块,还具体可以用于:
基于当前关键帧中该匹配特征点的深度值与该匹配特征点的归一化平面坐标的矩阵之乘积等于,上一关键帧中该匹配特征点的深度值、相对位姿中的旋转矩阵、以及上一关键帧中该匹配特征点的归一化平面坐标的矩阵之乘积与相对位姿中的平移矩阵之和,根据当前关键帧与上一关键帧之间的相对位姿中的旋转矩阵和平移矩阵、当前关键帧和上一关键帧中该匹配特征点的归一化平面坐标的矩阵,得到当前关键帧中该匹配特征点的深度值;
这种情况下,x坐标为:当前关键帧中该匹配特征点的归一化平面的像素横坐标与该匹配特征点的深度值的乘积;
y坐标为:当前关键帧中该匹配特征点的归一化平面的像素纵坐标与该匹配特征点的深度值的乘积;
z坐标为:相机焦距。
可选的,所述源图像帧为来源于双目相机、且为非同一平面的双目图像帧;
特征提取模块,具体可以用于:
对来自第一目相机的第一源图像帧、来自第二目相机的第二源图像帧分别进行图像预处理,得到第一目标图像帧和第二目标图像帧,作为双目目标图像帧;
分别提取第一目标图像帧的特征点和第二目标图像帧的特征点;
地图点生成模块,具体可以用于:判断双目目标图像帧是否为首帧;如果是,则将该双目目标图像帧中的任一帧作为关键帧;否则,根据关键帧条件确定该目标图像帧中的任一帧是否为关键帧;
对于当前关键帧中每一匹配特征点:
将当前关键帧作为当前双目目标图像帧中的第一帧,将该双目目标图像帧中的另一目标图像帧作为第二帧,将第一帧中该匹配特征点与第二帧中的特征点进行匹配;如果匹配成功,得到第二帧中的匹配特征点,则:
当前关键帧中该匹配特征点的x坐标为:当前关键帧中该匹配特征点的归一化平面的像素横坐标与双目基线长度的乘积,除以当前关键帧中该匹配特征点的归一化平面的像素横坐标与第二帧中匹配特征点的归一化平面的像素横坐标之差的绝对值;
当前关键帧中该匹配特征点的y坐标为:当前关键帧中该匹配特征点的归一化平面的像素纵坐标与双目基线长度的乘积,除以当前关键帧中该匹配特征点的归一化平面的像素横坐标与第二帧中匹配特征点的归一化平面的像素横坐标之差的绝对值;
当前关键帧中该匹配特征点的z坐标为:相机焦距与双目基线长度的乘积,除以当前关键帧中该匹配特征点的归一化平面的像素横坐标与第二帧中匹配特征点的归一化平面的像素横坐标之差的绝对值。
可选的,地图点生成模块,具体可以用于:
根据人工标记或关键帧相似度计算,识别出第一视觉点云地图中的闭环关键帧;
基于闭环关键帧,计算闭环关键帧之间的相对位姿,作为闭环约束;
构造用于关键帧位姿图优化的第二目标函数,以闭环约束为约束,采用最小二乘法,求解使得第二目标函数取得最小值时的关键帧的位姿。
可选的,地图点生成模块,具体可以用于:
将不同次地采集到相同标识的关键帧作为闭环关键帧;
或者,
计算两关键帧间的相似度是否大于设定的相似度阈值;如果是,则判定该两关键帧为闭环关键帧,其中,相似度包括:特征点分布上的相似度和图像像素的相似度;
基于闭环关键帧,计算该闭环关键帧中的匹配特征点;
对于该闭环关键帧中的任一匹配特征点,根据第一闭环关键帧中该匹配特征点的像素坐标矩阵等于,第一闭环关键帧和第二闭环关键帧之间的相对位姿中的旋转矩阵与第二闭环关键帧的像素坐标矩阵的乘积加上相对位姿中的平移矩阵的关系,计算得到相对位姿,作为初始值;
构建累计闭环关键帧中的所有匹配特征点的像素位置信息误差的第一目标函数,代入所有匹配特征点的像素坐标矩阵,迭代求解使得第一目标函数取得最小值时的相对位姿;
根据任一第一关键帧的位姿和任一第二关键帧的位姿之间的相对位姿的误差,构建累计该第一关键帧和该第二关键帧的相对位姿的误差的第二目标函数;
以该第一关键帧和第二关键帧之间的相对位姿误差作为初始值,以所述闭环约束为约束,迭代求解使得第二目标函数取得最小值时的第一关键帧的位姿和第二关键帧的位姿。
Optionally, the map point generation module may specifically be configured to:
at the root node, cluster all feature points into k classes with the k-means clustering algorithm to obtain the first-layer nodes;
for each node of the first layer, cluster the feature points belonging to that node into k classes to obtain the nodes of the next layer;
for each node of the next layer, cluster the feature points belonging to that node into k classes to obtain the nodes of the following layer; repeat this step until the final leaf layer is reached, obtaining a visual dictionary, the visual dictionary being a tree that contains N feature points and branches k ways at each level;
wherein there are d layers in total from the root node to the leaf layer, the leaf layer contains the word feature points of the visual dictionary, k, d and N are all natural numbers, and N is the total number of feature points in the visual dictionary;
for any key frame, compute the weight of any word feature point from the total number of feature points in that key frame and the number of occurrences of that word feature point, and describe the key frame as a set whose elements are the word feature points and their weights, the set containing N elements;
compute the similarity between a first key frame and a second key frame from all elements of the set describing the first key frame and all elements of the set describing the second key frame;
if the similarity is greater than the preset similarity threshold, determine that a loop-closure constraint exists between the two key frames. (A minimal bag-of-words sketch follows.)
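A minimal bag-of-words sketch, assuming scikit-learn's KMeans for the vocabulary tree, feature descriptors stacked into a float array, term-frequency weights, and an L1-based similarity score; the branching factor, depth and scoring rule are illustrative choices rather than those of the embodiment:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocab_tree(descriptors, k=10, depth=3):
    # Recursively cluster descriptors into a k-ary tree of the given depth;
    # the leaves play the role of the "word" feature points of the dictionary.
    words = []

    def split(desc, level):
        if len(desc) == 0:
            return
        if level == depth or len(desc) <= k:
            words.append(desc.mean(axis=0))
            return
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(desc)
        for c in range(k):
            split(desc[labels == c], level + 1)

    split(np.asarray(descriptors, dtype=np.float32), 0)
    return np.vstack(words)                      # N words x descriptor dimension

def bow_vector(descriptors, words):
    # Describe a key frame by word weights: occurrences / number of features.
    descriptors = np.asarray(descriptors, dtype=np.float32)
    dist = np.linalg.norm(descriptors[:, None, :] - words[None, :, :], axis=2)
    counts = np.bincount(dist.argmin(axis=1), minlength=len(words))
    return counts / max(len(descriptors), 1)

def similarity(v1, v2):
    # L1-style score in [0, 1]; compare against the preset similarity threshold.
    a = v1 / (v1.sum() + 1e-9)
    b = v2 / (v2.sum() + 1e-9)
    return 1.0 - 0.5 * np.abs(a - b).sum()
```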
Optionally, the map point generation module may specifically be configured to:
for any key frame, construct a third objective function of the reprojection error according to the reprojection error that exists, at the pose of the key frame, for the pixel position in the key frame of any map point observed from that key frame;
with the initial value of the reprojection error, iteratively solve for the spatial position information of the map point that minimizes the third objective function;
wherein the initial value of the reprojection error is the difference between the pixel position of the map point in the key frame and the position at which the map point is reprojected into the image;
the position at which the map point is reprojected into the image is obtained from the camera intrinsic parameters, the pose of the key frame and the spatial position information of the map point. (A minimal refinement sketch follows.)
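A minimal sketch of refining a single map point by minimizing its accumulated reprojection error over the key frames that observe it, assuming scipy, a pinhole intrinsic matrix K and 4x4 world-to-camera key frame poses; this per-point least-squares step stands in for a full bundle adjustment:

```python
import numpy as np
from scipy.optimize import least_squares

def reproject(point_w, T_cw, K):
    # Project a 3D world point into the image: intrinsics * (key frame pose * point).
    p_c = T_cw[:3, :3] @ point_w + T_cw[:3, 3]
    uvw = K @ p_c
    return uvw[:2] / uvw[2]

def refine_map_point(point_init, observations, K):
    # observations: list of (T_cw, observed_pixel) over key frames seeing the point.
    def residuals(p):
        return np.concatenate([reproject(p, T, K) - uv for T, uv in observations])

    sol = least_squares(residuals, np.asarray(point_init, dtype=float))
    return sol.x        # optimized spatial position of the map point
```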
In the embodiments of the present application, map construction and localization are separated during the map-building process, which effectively removes their mutual influence. In complex and changing environments, the map construction method provided by the embodiments of the present application therefore has better adaptability and stability.
In addition, because a point cloud map is continuous, it enables continuous localization compared with a map composed of map nodes, avoiding jumps during localization and reducing the probability of relocalization.
Further, the accuracy of the map is improved by pose graph optimization of the map key frames and/or map point optimization. When an erroneous loop closure occurs, the map can be corrected in time without losing the initial map data, which enhances the extensibility of map construction and facilitates integration with improved map construction methods.
An embodiment of the present application further provides a device for constructing a visual point cloud map, including a memory and a processor, wherein the memory stores executable computer instructions and the processor is configured to execute the instructions stored in the memory, so as to implement the steps of any of the above methods for constructing a visual point cloud map.
The memory may include a RAM (Random Access Memory) and may also include an NVM (Non-Volatile Memory), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor) or the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
An embodiment of the present application further provides an electronic device, including a memory and a processor, wherein the memory stores executable computer instructions and the processor is configured to execute the instructions stored in the memory, so as to implement the steps of any of the above methods for constructing a visual point cloud map.
The electronic device may be a robot, or a server connected to a robot.
An embodiment of the present application further provides a computer-readable storage medium, wherein a computer program is stored in the storage medium, and the computer program, when executed by a processor, implements the steps of any of the above methods for constructing a visual point cloud map.
An embodiment of the present application further provides a computer program which, when executed by a processor, implements the steps of any of the above methods for constructing a visual point cloud map.
As the device/network-side device/storage medium embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
Herein, relational terms such as first and second are used merely to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes that element.
The above description is merely of preferred embodiments of the present application and is not intended to limit the present application; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims (18)

  1. A method for constructing a visual point cloud map, the method comprising:
    performing feature extraction on source image frames collected in a space to be mapped, to obtain feature points of the source image frames;
    performing inter-frame tracking on the source image frames to determine key frames;
    matching the feature points in a current key frame with the feature points in a previous key frame to obtain matching feature points of the current key frame; and
    calculating spatial position information of the matching feature points in the current key frame, and taking the spatial position information of the matching feature points as map point information of the current key frame;
    wherein a point cloud formed by a set of the map points of all key frames is a first visual point cloud map.
  2. The method according to claim 1, wherein performing feature extraction on the source image frames collected in the space to be mapped to obtain the feature points of the source image frames further comprises:
    performing image preprocessing on a source image frame to obtain a target image frame; and
    performing feature extraction based on the target image frame to obtain feature points of the target image frame;
    the method further comprising:
    based on the first visual point cloud map, performing graph optimization on key frame poses by a least squares method according to loop-closure key frames subject to loop-closure constraints, and/or optimizing the spatial position information of the map points according to reprojection errors, to obtain a second visual point cloud map.
  3. The method according to claim 2, wherein performing image preprocessing on the source image frame to obtain the target image frame comprises:
    performing de-distortion processing on the source image frame according to distortion coefficients of a camera, to obtain a de-distorted image;
    determining whether the pixel value of each pixel in the de-distorted image is greater than a first pixel threshold; if so, inverting the pixels of the de-distorted image whose pixel values are greater than the first pixel threshold and then applying image filtering to the inverted de-distorted image to obtain a background image; otherwise, applying image filtering to the de-distorted image to obtain the background image;
    subtracting the background image from the de-distorted image to obtain a foreground image; and
    determining whether pixel values in the foreground image are evenly distributed; if so, taking the foreground image as the target image frame; otherwise, applying stretching to the foreground image to obtain the target image frame.
  4. The method according to claim 3, wherein applying stretching to the foreground image comprises:
    if a foreground-image pixel value is less than or equal to a preset minimum grey value, setting that pixel value to the minimum of the pixel value range;
    if a foreground-image pixel value is greater than the minimum grey value and less than a preset maximum grey value, setting that pixel value to a certain proportion of the maximum pixel value, the certain proportion being the ratio of the difference between the foreground-image pixel value and the minimum grey value to the difference between the maximum grey value and the minimum grey value; and
    if a foreground-image pixel value is greater than or equal to the maximum grey value, setting that pixel value to the maximum of the pixel value range;
    and wherein performing feature extraction based on the target image frame to obtain the feature points of the target image frame comprises:
    performing feature detection on the target image frame to obtain feature points;
    dividing the target image frame into a certain number of grid cells;
    for the feature points in any grid cell, sorting the feature points in the cell in descending order of feature-point response value and keeping the top Q feature points, to obtain filtered feature points, wherein Q is determined from the number of feature points in the target image frame, a preset upper limit on the total number of feature points, and the total number of feature points in that cell; and
    computing a feature descriptor for each filtered feature point.
  5. The method according to claim 4, wherein determining Q from the number of feature points in the target image frame, the preset upper limit on the total number of feature points, and the total number of feature points in the cell comprises: obtaining Q by dividing the number of feature points in the target image frame by the preset upper limit on the total number of feature points, multiplying the quotient by the total number of feature points in the cell, and rounding the result down.
  6. The method according to claim 2, wherein performing inter-frame tracking on the source image frames to determine the key frames comprises:
    for each target image frame: determining whether the target image frame is the first frame; if so, taking the target image frame as a key frame; otherwise, determining whether the target image frame is a key frame according to key frame conditions;
    wherein the key frame conditions require that at least one of the following be satisfied:
    the number of matching feature points is greater than a preset first threshold;
    the spatial distance to the previous key frame is greater than a preset second threshold; and
    the spatial angle to the previous key frame is greater than a preset third threshold.
  7. The method according to claim 6, wherein the source image frames are image frames that originate from a monocular camera and lie in the same plane;
    and wherein calculating the spatial position information of the matching feature points in the current key frame comprises, for each matching feature point:
    the x coordinate is the product of the pixel abscissa of the matching feature point in the current key frame and the camera mounting height, divided by the camera focal length;
    the y coordinate is the product of the pixel ordinate of the matching feature point in the current key frame and the camera mounting height, divided by the camera focal length; and
    the z coordinate is the camera mounting height.
  8. The method according to claim 6, wherein the source image frames are image frames that originate from a monocular camera and do not lie in the same plane;
    and wherein calculating the spatial position information of the matching feature points in the current key frame comprises:
    obtaining an essential matrix between the current key frame and the previous key frame from the pixel coordinates of at least eight pairs of matching feature points, each pair consisting of a matching feature point in the current key frame and a matching feature point in the previous key frame;
    performing singular value decomposition on the essential matrix to obtain a relative pose between the current key frame and the previous key frame; and
    for each matching feature point: from the relative pose between the current key frame and the previous key frame, obtaining at least a depth value of the matching feature point in the current key frame by a triangulation relation; and obtaining the spatial position information of the matching feature point from its depth value in the current key frame.
  9. The method according to claim 8, wherein obtaining the essential matrix between the current key frame and the previous key frame from the pixel coordinates of the at least eight pairs of matching feature points comprises:
    for any matching feature point:
    obtaining the essential matrix by substituting the pixel coordinates of eight pairs of matching feature points into the relation that the product of the transpose of the normalized-plane coordinate matrix of the matching feature point in the previous key frame, the essential matrix, and the normalized-plane coordinate matrix of the matching feature point in the current key frame equals zero;
    wherein obtaining at least the depth value of the matching feature point in the current key frame from the relative pose between the current key frame and the previous key frame by the triangulation relation comprises:
    obtaining the depth value of the matching feature point in the current key frame from the rotation matrix and translation matrix of the relative pose between the current key frame and the previous key frame and from the normalized-plane coordinate matrices of the matching feature point in the current key frame and the previous key frame, based on the relation that the product of the depth value of the matching feature point in the current key frame and its normalized-plane coordinate matrix equals the product of the depth value of the matching feature point in the previous key frame, the rotation matrix of the relative pose and the normalized-plane coordinate matrix of the matching feature point in the previous key frame, plus the translation matrix of the relative pose;
    and wherein obtaining the spatial position information of the matching feature point from its depth value in the current key frame comprises:
    the x coordinate is the product of the normalized-plane pixel abscissa of the matching feature point in the current key frame and the depth value of the matching feature point;
    the y coordinate is the product of the normalized-plane pixel ordinate of the matching feature point in the current key frame and the depth value of the matching feature point; and
    the z coordinate is the camera focal length.
  10. The method according to claim 6, wherein the source image frames are binocular image frames that originate from a binocular camera and do not lie in the same plane;
    wherein performing image preprocessing on the source image frames to obtain the target image frames comprises:
    performing image preprocessing separately on a first source image frame from a first camera and a second source image frame from a second camera, to obtain a first target image frame and a second target image frame as a binocular target image frame pair;
    wherein performing feature extraction based on the target image frames to obtain the feature points of the target image frames comprises: extracting the feature points of the first target image frame and the feature points of the second target image frame, respectively;
    wherein determining whether the target image frame is the first frame comprises: determining whether the binocular target image frame pair is the first frame; if so, taking either frame of the binocular pair as a key frame; otherwise, determining whether either frame of the pair is a key frame according to the key frame conditions;
    and wherein calculating the spatial position information of the matching feature points in the current key frame comprises, for each matching feature point in the current key frame:
    taking the current key frame as the first frame of the current binocular pair and the other target image frame of the pair as the second frame, and matching the matching feature point in the first frame against the feature points in the second frame; if the matching succeeds and a matching feature point in the second frame is obtained, then:
    the x coordinate of the matching feature point in the current key frame is the product of its normalized-plane pixel abscissa in the current key frame and the binocular baseline length, divided by the absolute difference between its normalized-plane pixel abscissa in the current key frame and the normalized-plane pixel abscissa of the matching feature point in the second frame;
    the y coordinate of the matching feature point in the current key frame is the product of its normalized-plane pixel ordinate in the current key frame and the binocular baseline length, divided by the absolute difference between its normalized-plane pixel abscissa in the current key frame and the normalized-plane pixel abscissa of the matching feature point in the second frame; and
    the z coordinate of the matching feature point in the current key frame is the product of the camera focal length and the binocular baseline length, divided by the absolute difference between its normalized-plane pixel abscissa in the current key frame and the normalized-plane pixel abscissa of the matching feature point in the second frame.
  11. The method according to any one of claims 2 to 10, wherein performing graph optimization on the key frame poses by the least squares method according to the loop-closure key frames subject to loop-closure constraints comprises:
    identifying the loop-closure key frames in the first visual point cloud map according to manual markers or key frame similarity computation;
    based on the loop-closure key frames, computing the relative pose between the loop-closure key frames as a loop-closure constraint; and
    constructing a second objective function for key frame pose graph optimization, and, taking the loop-closure constraint as a constraint, solving by the least squares method for the key frame poses that minimize the second objective function.
  12. The method according to claim 11, wherein identifying the loop-closure key frames in the first visual point cloud map according to manual markers or key frame similarity computation comprises:
    taking key frames in which the same marker is captured at different times as loop-closure key frames;
    or,
    computing whether the similarity between two key frames is greater than a preset similarity threshold; if so, determining the two key frames to be loop-closure key frames, the similarity including similarity in feature point distribution and similarity in image pixels;
    wherein computing, based on the loop-closure key frames, the relative pose between the loop-closure key frames as the loop-closure constraint comprises:
    based on the loop-closure key frames, computing the matching feature points in the loop-closure key frames;
    for any matching feature point in the loop-closure key frames, computing the relative pose, as an initial value, from the relation that the pixel coordinate matrix of the matching feature point in a first loop-closure key frame equals the product of the rotation matrix of the relative pose between the first loop-closure key frame and a second loop-closure key frame and the pixel coordinate matrix of the second loop-closure key frame, plus the translation matrix of the relative pose; and
    constructing a first objective function that accumulates the pixel position information errors of all matching feature points in the loop-closure key frames, substituting the pixel coordinate matrices of all matching feature points, and iteratively solving for the relative pose that minimizes the first objective function;
    and wherein constructing the second objective function for key frame pose graph optimization and, taking the loop-closure constraint as a constraint, solving by the least squares method for the key frame poses that minimize the second objective function comprises:
    according to the error of the relative pose between the pose of any first key frame and the pose of any second key frame, constructing a second objective function that accumulates the relative pose errors of the first key frame and the second key frame; and
    taking the relative pose error between the first key frame and the second key frame as an initial value and the loop-closure constraint as a constraint, iteratively solving for the poses of the first key frame and the second key frame that minimize the second objective function.
  13. The method according to claim 12, wherein computing whether the similarity between two key frames is greater than the preset similarity threshold comprises:
    at a root node, clustering all feature points into k classes with a k-means clustering algorithm to obtain first-layer nodes;
    for each node of the first layer, clustering the feature points belonging to that node into k classes to obtain nodes of the next layer;
    for each node of the next layer, clustering the feature points belonging to that node into k classes to obtain nodes of the following layer; repeating this step until the final leaf layer is reached, to obtain a visual dictionary, the visual dictionary being a tree that contains N feature points and branches k ways at each level;
    wherein there are d layers in total from the root node to the leaf layer, the leaf layer contains the word feature points of the visual dictionary, k, d and N are all natural numbers, and N is the total number of feature points in the visual dictionary;
    for any key frame, computing the weight of any word feature point from the total number of feature points in that key frame and the number of occurrences of that word feature point, and describing the key frame as a set whose elements are the word feature points and their weights, the set containing N elements;
    computing the similarity between a first key frame and a second key frame from all elements of the set describing the first key frame and all elements of the set describing the second key frame; and
    if the similarity is greater than the preset similarity threshold, determining that a loop-closure constraint exists between the two key frames.
  14. The method according to any one of claims 2 to 10, wherein optimizing the spatial position information of the map points according to the reprojection errors comprises:
    for any key frame, constructing a third objective function of the reprojection error according to the reprojection error that exists, at the pose of the key frame, for the pixel position in the key frame of any map point observed from that key frame;
    with an initial value of the reprojection error, iteratively solving for the spatial position information of the map point that minimizes the third objective function;
    wherein the initial value of the reprojection error is the difference between the pixel position of the map point in the key frame and the position at which the map point is reprojected into the image; and
    the position at which the map point is reprojected into the image is obtained from camera intrinsic parameters, the pose of the key frame and the spatial position information of the map point.
  15. A device for constructing a visual point cloud map, wherein the device comprises a first visual point cloud map construction unit, the unit comprising:
    a feature extraction module, configured to perform feature extraction on source image frames collected in a space to be mapped, to obtain feature points of the source image frames; and
    a map point generation module, configured to perform inter-frame tracking on the source image frames to determine key frames; match the feature points in a current key frame with the feature points in a previous key frame to obtain matching feature points of the current key frame; and calculate spatial position information of the matching feature points in the current key frame, taking the spatial position information of the matching feature points as map point information of the current key frame;
    wherein a point cloud formed by a set of the map points of all key frames is a first visual point cloud map.
  16. An electronic device, comprising a memory and a processor, wherein the memory stores executable computer instructions and the processor is configured to execute the instructions stored in the memory, to implement the steps of the method for constructing a visual point cloud map according to any one of claims 1 to 14.
  17. A computer-readable storage medium, wherein a computer program is stored in the storage medium, and the computer program, when executed by a processor, implements the steps of the method for constructing a visual point cloud map according to any one of claims 1 to 14.
  18. A computer program, wherein the computer program, when executed by a processor, implements the steps of the method for constructing a visual point cloud map according to any one of claims 1 to 14.
PCT/CN2021/103653 2020-06-30 2021-06-30 一种视觉点云地图的构建方法、装置 WO2022002150A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010615170.6A CN111795704B (zh) 2020-06-30 2020-06-30 一种视觉点云地图的构建方法、装置
CN202010615170.6 2020-06-30

Publications (1)

Publication Number Publication Date
WO2022002150A1 true WO2022002150A1 (zh) 2022-01-06

Family

ID=72809796

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/103653 WO2022002150A1 (zh) 2020-06-30 2021-06-30 一种视觉点云地图的构建方法、装置

Country Status (2)

Country Link
CN (1) CN111795704B (zh)
WO (1) WO2022002150A1 (zh)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111311756A (zh) * 2020-02-11 2020-06-19 Oppo广东移动通信有限公司 增强现实ar显示方法及相关装置
CN114529705A (zh) * 2022-04-22 2022-05-24 山东捷瑞数字科技股份有限公司 一种三维引擎编辑器的界面布局处理方法
CN114745533A (zh) * 2022-02-28 2022-07-12 杭州小伴熊科技有限公司 一种空间关键点数据采集极值定准方法及系统
CN116030136A (zh) * 2023-03-29 2023-04-28 中国人民解放军国防科技大学 基于几何特征的跨视角视觉定位方法、装置和计算机设备
CN116147618A (zh) * 2023-01-17 2023-05-23 中国科学院国家空间科学中心 一种适用动态环境的实时状态感知方法及系统
CN116452776A (zh) * 2023-06-19 2023-07-18 国网浙江省电力有限公司湖州供电公司 基于视觉同步定位与建图系统的低碳变电站场景重建方法
CN116567166A (zh) * 2023-07-07 2023-08-08 广东省电信规划设计院有限公司 一种视频融合方法、装置、电子设备及存储介质
CN116681733A (zh) * 2023-08-03 2023-09-01 南京航空航天大学 一种空间非合作目标近距离实时位姿跟踪方法
CN116883251A (zh) * 2023-09-08 2023-10-13 宁波市阿拉图数字科技有限公司 基于无人机视频的图像定向拼接与三维建模方法
CN117635875A (zh) * 2024-01-25 2024-03-01 深圳市其域创新科技有限公司 一种三维重建方法、装置及终端
CN117635875B (zh) * 2024-01-25 2024-05-14 深圳市其域创新科技有限公司 一种三维重建方法、装置及终端

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111795704B (zh) * 2020-06-30 2022-06-03 杭州海康机器人技术有限公司 一种视觉点云地图的构建方法、装置
CN113761091B (zh) * 2020-11-27 2024-04-05 北京京东乾石科技有限公司 闭环检测方法、装置、电子设备、系统和存储介质
CN112614185B (zh) * 2020-12-29 2022-06-21 浙江商汤科技开发有限公司 地图构建方法及装置、存储介质
CN112767546B (zh) * 2021-01-22 2022-08-02 湖南大学 移动机器人基于双目图像的视觉地图生成方法
CN113063424B (zh) * 2021-03-29 2023-03-24 湖南国科微电子股份有限公司 一种商场内导航方法、装置、设备及存储介质
CN113515536B (zh) * 2021-07-13 2022-12-13 北京百度网讯科技有限公司 地图的更新方法、装置、设备、服务器以及存储介质
CN113674411A (zh) * 2021-07-29 2021-11-19 浙江大华技术股份有限公司 基于位姿图调整的建图方法及相关设备
CN113536024B (zh) * 2021-08-11 2022-09-09 重庆大学 一种基于fpga的orb_slam重定位特征点检索加速方法
CN113670293A (zh) * 2021-08-11 2021-11-19 追觅创新科技(苏州)有限公司 地图构建方法及装置
CN114088099A (zh) * 2021-11-18 2022-02-25 北京易航远智科技有限公司 基于已知地图的语义重定位方法、装置、电子设备及介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109671120A (zh) * 2018-11-08 2019-04-23 南京华捷艾米软件科技有限公司 一种基于轮式编码器的单目slam初始化方法及系统
US20190206116A1 (en) * 2017-12-28 2019-07-04 Beijing Jingdong Shangke Information Technology Co., Ltd. System and method for monocular simultaneous localization and mapping
CN110378345A (zh) * 2019-06-04 2019-10-25 广东工业大学 基于yolact实例分割模型的动态场景slam方法
US20200005487A1 (en) * 2018-06-28 2020-01-02 Ubtech Robotics Corp Ltd Positioning method and robot using the same
CN111322993A (zh) * 2018-12-13 2020-06-23 杭州海康机器人技术有限公司 一种视觉定位方法和装置
CN111795704A (zh) * 2020-06-30 2020-10-20 杭州海康机器人技术有限公司 一种视觉点云地图的构建方法、装置

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330373A (zh) * 2017-06-02 2017-11-07 重庆大学 一种基于视频的违章停车监控系统
CN107341814B (zh) * 2017-06-14 2020-08-18 宁波大学 基于稀疏直接法的四旋翼无人机单目视觉测程方法
CN107369183A (zh) * 2017-07-17 2017-11-21 广东工业大学 面向mar的基于图优化slam的跟踪注册方法及系统
CN109887029A (zh) * 2019-01-17 2019-06-14 江苏大学 一种基于图像颜色特征的单目视觉里程测量方法
CN110570453B (zh) * 2019-07-10 2022-09-27 哈尔滨工程大学 一种基于双目视觉的闭环式跟踪特征的视觉里程计方法
CN110533722B (zh) * 2019-08-30 2024-01-12 的卢技术有限公司 一种基于视觉词典的机器人快速重定位方法及系统
CN110782494A (zh) * 2019-10-16 2020-02-11 北京工业大学 一种基于点线融合的视觉slam方法
CN111325842B (zh) * 2020-03-04 2023-07-28 Oppo广东移动通信有限公司 地图构建方法、重定位方法及装置、存储介质和电子设备

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190206116A1 (en) * 2017-12-28 2019-07-04 Beijing Jingdong Shangke Information Technology Co., Ltd. System and method for monocular simultaneous localization and mapping
US20200005487A1 (en) * 2018-06-28 2020-01-02 Ubtech Robotics Corp Ltd Positioning method and robot using the same
CN109671120A (zh) * 2018-11-08 2019-04-23 南京华捷艾米软件科技有限公司 一种基于轮式编码器的单目slam初始化方法及系统
CN111322993A (zh) * 2018-12-13 2020-06-23 杭州海康机器人技术有限公司 一种视觉定位方法和装置
CN110378345A (zh) * 2019-06-04 2019-10-25 广东工业大学 基于yolact实例分割模型的动态场景slam方法
CN111795704A (zh) * 2020-06-30 2020-10-20 杭州海康机器人技术有限公司 一种视觉点云地图的构建方法、装置

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WANG, XIAOTONG: "Design of Localization Algorithm for Sweeping Robot Based on Monocular Vision", INFORMATION SCIENCE AND TECHNOLOGY, CHINESE MASTER’S THESES FULL-TEXT DATABASE, no. 12, 15 January 2019 (2019-01-15), pages 1 - 108, XP055883460, ISSN: 1674-0246 *
ZHANG, JUNJIE: "Three-dimensional Map Construction and Application Based on Visual SLAM", BASIC SCIENCES, CHINA MASTER’S THESES FULL-TEXT DATABASE, no. 1, 15 January 2020 (2020-01-15), pages 1 - 76, XP055883454, ISSN: 1674-0246 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111311756A (zh) * 2020-02-11 2020-06-19 Oppo广东移动通信有限公司 增强现实ar显示方法及相关装置
CN114745533A (zh) * 2022-02-28 2022-07-12 杭州小伴熊科技有限公司 一种空间关键点数据采集极值定准方法及系统
CN114745533B (zh) * 2022-02-28 2024-05-07 杭州小伴熊科技有限公司 一种空间关键点数据采集极值定准方法及系统
CN114529705A (zh) * 2022-04-22 2022-05-24 山东捷瑞数字科技股份有限公司 一种三维引擎编辑器的界面布局处理方法
CN114529705B (zh) * 2022-04-22 2022-07-19 山东捷瑞数字科技股份有限公司 一种三维引擎编辑器的界面布局处理方法
CN116147618B (zh) * 2023-01-17 2023-10-13 中国科学院国家空间科学中心 一种适用动态环境的实时状态感知方法及系统
CN116147618A (zh) * 2023-01-17 2023-05-23 中国科学院国家空间科学中心 一种适用动态环境的实时状态感知方法及系统
CN116030136A (zh) * 2023-03-29 2023-04-28 中国人民解放军国防科技大学 基于几何特征的跨视角视觉定位方法、装置和计算机设备
CN116452776B (zh) * 2023-06-19 2023-10-20 国网浙江省电力有限公司湖州供电公司 基于视觉同步定位与建图系统的低碳变电站场景重建方法
CN116452776A (zh) * 2023-06-19 2023-07-18 国网浙江省电力有限公司湖州供电公司 基于视觉同步定位与建图系统的低碳变电站场景重建方法
CN116567166B (zh) * 2023-07-07 2023-10-17 广东省电信规划设计院有限公司 一种视频融合方法、装置、电子设备及存储介质
CN116567166A (zh) * 2023-07-07 2023-08-08 广东省电信规划设计院有限公司 一种视频融合方法、装置、电子设备及存储介质
CN116681733A (zh) * 2023-08-03 2023-09-01 南京航空航天大学 一种空间非合作目标近距离实时位姿跟踪方法
CN116681733B (zh) * 2023-08-03 2023-11-07 南京航空航天大学 一种空间非合作目标近距离实时位姿跟踪方法
CN116883251A (zh) * 2023-09-08 2023-10-13 宁波市阿拉图数字科技有限公司 基于无人机视频的图像定向拼接与三维建模方法
CN116883251B (zh) * 2023-09-08 2023-11-17 宁波市阿拉图数字科技有限公司 基于无人机视频的图像定向拼接与三维建模方法
CN117635875A (zh) * 2024-01-25 2024-03-01 深圳市其域创新科技有限公司 一种三维重建方法、装置及终端
CN117635875B (zh) * 2024-01-25 2024-05-14 深圳市其域创新科技有限公司 一种三维重建方法、装置及终端

Also Published As

Publication number Publication date
CN111795704B (zh) 2022-06-03
CN111795704A (zh) 2020-10-20

Similar Documents

Publication Publication Date Title
WO2022002150A1 (zh) 一种视觉点云地图的构建方法、装置
WO2022002039A1 (zh) 一种基于视觉地图的视觉定位方法、装置
CN109684924B (zh) 人脸活体检测方法及设备
Kim et al. Recurrent transformer networks for semantic correspondence
CN113012212B (zh) 一种基于深度信息融合的室内场景三维点云重建方法和系统
CN111780764B (zh) 一种基于视觉地图的视觉定位方法、装置
CN108960211B (zh) 一种多目标人体姿态检测方法以及系统
Heo et al. Joint depth map and color consistency estimation for stereo images with different illuminations and cameras
CN107633226B (zh) 一种人体动作跟踪特征处理方法
CN112967341B (zh) 基于实景图像的室内视觉定位方法、系统、设备及存储介质
CN111445459B (zh) 一种基于深度孪生网络的图像缺陷检测方法及系统
WO2019041660A1 (zh) 人脸去模糊方法及装置
CN113592911B (zh) 表观增强深度目标跟踪方法
CN110458235B (zh) 一种视频中运动姿势相似度比对方法
CN112163588A (zh) 基于智能进化的异源图像目标检测方法、存储介质及设备
CN114119739A (zh) 一种基于双目视觉的手部关键点空间坐标获取方法
CN113361542A (zh) 一种基于深度学习的局部特征提取方法
Liu et al. Regularization based iterative point match weighting for accurate rigid transformation estimation
CN110175954A (zh) 改进的icp点云快速拼接方法、装置、电子设备及存储介质
Zou et al. Microarray camera image segmentation with Faster-RCNN
CN111709317B (zh) 一种基于显著性模型下多尺度特征的行人重识别方法
CN112614167A (zh) 一种结合单偏光与正交偏光图像的岩石薄片图像对齐方法
CN111127353A (zh) 一种基于块配准和匹配的高动态图像去鬼影方法
Li et al. Research on hybrid information recognition algorithm and quality of golf swing
Wang Three-Dimensional Image Recognition of Athletes' Wrong Motions Based on Edge Detection.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21834676

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21834676

Country of ref document: EP

Kind code of ref document: A1