CN115619851A

CN115619851A - Anchor point-based VSLAM (visual simultaneous localization and mapping) back-end optimization method, device, medium, equipment and vehicle

Info

Publication number: CN115619851A
Application number: CN202211177297.XA
Authority: CN
Inventors: 杜振东; 林伟; 徐慧明; 宋健雄; 孙思聪
Original assignee: Uisee Technologies Beijing Co Ltd
Current assignee: Uisee Technologies Beijing Co Ltd
Priority date: 2022-09-26
Filing date: 2022-09-26
Publication date: 2023-01-17

Abstract

The present disclosure relates to an anchor point-based VSLAM back-end optimization method, apparatus, medium, device, and vehicle. The method comprises: acquiring a plurality of alignment anchor points and a plurality of images collected by a camera, where the timestamps of the alignment anchor points correspond respectively to the timestamps of the images and the alignment anchor points are pose data generated by other positioning sources; generating map points and key frames based on the images, each key frame corresponding to an alignment anchor point; and, if a local back-end optimization condition is met, performing VSLAM back-end optimization based on the alignment anchor points corresponding to the key frames. The technical solution introduces the alignment anchor points into the VSLAM back-end optimization step, uses these higher-precision anchor poses as prior ground-truth poses for the corresponding key frames, and adds pose constraints between the key frames and the alignment anchor points. This improves back-end optimization accuracy and therefore VSLAM mapping accuracy, and addresses the problem of large and continuously growing accumulated error.

Description

Anchor point-based VSLAM (visual simultaneous localization and mapping) back-end optimization method, device, medium, equipment and vehicle

Technical Field

The disclosure relates to the technical field of computer vision, and in particular to an anchor point-based VSLAM back-end optimization method, device, medium, equipment and vehicle.

Background

When computer vision is applied to unmanned driving or assisted driving scenarios, it can be used for positioning. Currently, mainstream positioning technology is generally multi-sensor fusion, and the positioning sources participating in the fusion mainly include Global Positioning System (GPS) positioning, LiDAR-based LSLAM positioning, purely vision-based VSLAM, and visual-inertial odometry (VIO) using an inertial measurement unit (IMU). Among them, VSLAM has developed a relatively mature framework over many years, as shown in fig. 1.

In the related art, the sensors used for VSLAM are mainly monocular cameras, binocular (stereo) cameras, and RGB-D cameras. RGB-D cameras provide depth information, but owing to hardware limitations their perceivable depth range is limited, so they are rarely used in unmanned and assisted driving scenarios. Monocular and binocular cameras are currently the common VSLAM sensors in unmanned driving, and the corresponding positioning technologies are called monocular VSLAM and binocular VSLAM, respectively. Both usually include a front end that generates a local map and a back end that optimizes it, in order to reduce accumulated error and obtain globally consistent trajectories and maps. Back-end optimization is usually performed on the key frames and map points of the local map generated by the front end, and iteratively reduces the accumulated error using the constraints between key frames and map points, the mutual constraints among key frames, and loop closure information. However, because of noise, the accuracy of back-end optimization is limited: a large accumulated error remains, and it grows as the map grows.

Disclosure of Invention

To solve the above technical problem or at least partially solve the above technical problem, the present disclosure provides an anchor point-based VSLAM backend optimization method, apparatus, medium, device, and vehicle.

The present disclosure provides an anchor point-based VSLAM backend optimization method, including:

acquiring a plurality of alignment anchor points and a plurality of images collected by a camera, where the timestamps of the alignment anchor points correspond respectively to the timestamps of the images, and the alignment anchor points are pose data generated by other positioning sources;

generating map points and key frames based on the images; each key frame corresponds to an alignment anchor point;

and if the local back-end optimization condition is met, performing VSLAM back-end optimization based on the alignment anchor points corresponding to the key frames.

Optionally, the local back-end optimization condition comprises at least one of a first condition and a second condition;

the first condition includes: the number of newly added key frames is greater than a preset number threshold, where the number of newly added key frames is the number of key frames added since the last local back-end optimization;

the second condition includes: the pose error is greater than a preset error threshold, where the pose error is determined from the pose of the current key frame and the pose of the corresponding anchor point.

Optionally, the performing VSLAM backend optimization includes:

performing local back-end optimization based on the alignment anchor points;

judging whether all the images are processed;

if not, returning to the step of generating map points and key frames based on the images;

and if all the images have been processed, performing global back-end optimization based on the alignment anchor points.

Optionally, performing the local back-end optimization or performing the global back-end optimization includes:

optimizing, by means of graph optimization, both the reprojection error and the error between each key frame and its corresponding alignment anchor point.

Optionally, performing the local back-end optimization or performing the global back-end optimization includes:

determining a cost function corresponding to the VSLAM observation model;

minimizing the cost function through iterative optimization;

where the cost function is:

$$\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{n}\left\|e_{ij}\right\|^{2}=\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{n}\left\|z_{ij}-h(x_i,y_j)\right\|^{2}+\frac{1}{2}\sum_{i=1}^{m}\left\|x_i-ax_i\right\|^{2}$$

where $e_{ij}$ denotes the accumulated cost, $x_i$ denotes the position of the camera for the current image frame, $y_j$ denotes the corresponding map point, $z_{ij}$ denotes the observation data collected by the camera for that map point, $h(\cdot)$ denotes the observation model, and $ax_i$ denotes the position of the corresponding alignment anchor point.

Optionally, the pose error comprises at least one of a position error, a yaw angle error, a pitch angle error, and a roll angle error, calculated as follows:

$$E_{pos}=\sqrt{(x_i-x_{ai})^2+(y_i-y_{ai})^2+(z_i-z_{ai})^2}$$

$$E_{yaw}=yaw_i-yaw_{ai}$$

$$E_{pitch}=pitch_i-pitch_{ai}$$

$$E_{roll}=roll_i-roll_{ai}$$

where $(x_i, y_i, z_i, yaw_i, pitch_i, roll_i)$ represents the pose of the current key frame and $(x_{ai}, y_{ai}, z_{ai}, yaw_{ai}, pitch_{ai}, roll_{ai})$ represents the pose of the corresponding alignment anchor point; $E_{pos}$, $E_{yaw}$, $E_{pitch}$, and $E_{roll}$ represent the position error, yaw angle error, pitch angle error, and roll angle error, respectively, between the current key frame and its corresponding alignment anchor point.

The present disclosure also provides an anchor point-based VSLAM backend optimization device, including:

the acquisition module is used for acquiring a plurality of alignment anchor points and a plurality of images collected by the camera, where the timestamps of the alignment anchor points correspond respectively to the timestamps of the images, and the alignment anchor points are pose data generated by other positioning sources;

a generation module for generating map points and keyframes based on at least the images; each key frame corresponds to an alignment anchor point;

and the optimization module is used for performing VSLAM back-end optimization based on the alignment anchor points corresponding to the key frames after determining that the local back-end optimization condition is met.

The present disclosure also provides a computer-readable storage medium having stored thereon a computer program for performing the steps of any of the above-described methods.

The present disclosure also provides an electronic device, including: a processor; a memory for storing the processor-executable instructions; the processor is configured to read the executable instructions from the memory and execute the executable instructions to implement the steps of any one of the above methods.

The present disclosure also provides a vehicle including any one of the above-described electronic devices.

Compared with the prior art, the technical scheme provided by the disclosure has the following advantages:

The anchor point-based VSLAM back-end optimization method provided by the present disclosure comprises: acquiring a plurality of alignment anchor points and a plurality of images collected by a camera, where the timestamps of the alignment anchor points correspond respectively to the timestamps of the images and the alignment anchor points are pose data generated by other positioning sources; generating map points and key frames based on the images, each key frame corresponding to an alignment anchor point; and, if the local back-end optimization condition is met, performing VSLAM back-end optimization based on the alignment anchor points corresponding to the key frames. Because the alignment anchor points are pose data generated by other positioning sources, they can give the images and key frames with corresponding timestamps more accurate pose data. VSLAM back-end optimization is therefore performed in combination with the alignment anchor points: the higher-precision anchor poses are introduced into the back-end optimization step as prior ground-truth poses of the corresponding key frames, and pose constraints between the key frames and the alignment anchor points are added. This improves back-end optimization accuracy and VSLAM mapping accuracy, and addresses the problem of large and continuously growing accumulated error.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

In order to more clearly illustrate the embodiments or technical solutions in the prior art of the present disclosure, the drawings used in the description of the embodiments or prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.

Fig. 1 is a schematic flowchart framework diagram of a VSLAM according to an embodiment of the present disclosure;

Fig. 2 is a schematic diagram of a graph optimization manner provided in the related art;

Fig. 3 is a schematic flowchart of an anchor point-based VSLAM back-end optimization method according to an embodiment of the present disclosure;

Fig. 4 is a schematic diagram of a graph optimization manner provided by an embodiment of the present disclosure;

Fig. 5 is a schematic flowchart of another anchor point-based VSLAM back-end optimization method according to an embodiment of the present disclosure;

Fig. 6 is a schematic structural diagram of an anchor point-based VSLAM back-end optimization apparatus according to an embodiment of the present disclosure;

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced otherwise than as described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.

The disclosed embodiments relate generally to positioning technology for unmanned driving. With the development of computer vision technology, unmanned driving has attracted widespread attention as one of the typical application scenarios of the new national infrastructure initiative, and among its key technologies, positioning occupies an extremely important position. As shown in fig. 1, the VSLAM framework generally includes acquiring sensor data, front-end visual odometry (the front end), back-end nonlinear optimization (back-end optimization), loop closure detection, and mapping.

In the front-end visual odometry step, information is extracted from the images acquired by the camera, the camera motion between adjacent images is estimated, and a local map is generated. Front-end visual odometry methods can be classified, according to whether features are extracted, into feature-point methods and direct methods. Because of noise, the front-end visual odometry accumulates error continuously; the back-end optimization step therefore receives the camera poses at different moments (i.e., the camera poses corresponding to different images) together with loop closure information, and optimizes them to reduce the accumulated error and obtain globally consistent trajectories and maps.

The currently common back-end optimization approach is nonlinear optimization. It operates mainly on the key frames and map points (i.e., the reconstructed three-dimensional points) generated by the front end, and iteratively reduces the overall error using the constraints between key frames and map points, the constraints among key frames, and loop closure information. However, because of noise, a back end that relies solely on these intrinsic VSLAM constraints still leaves large errors in the optimized map, and these errors grow as the map grows.

Back-end optimization can also be implemented by graph optimization, as illustrated in fig. 2, which shows the map points and key frames generated by the front-end visual odometry. Map points {y1, y2, y3, ..., yn} are drawn as dots and their true positions {ry1, ry2, ry3, ..., ryn} as squares; key frames {x1, x2, x3, ..., xm} are drawn as triangles and their true poses {rx1, rx2, rx3, ..., rxm} as hexagons. The map point positions and key frame poses form the vertices of the graph. Because of noise, each key frame's observation of a map point carries a reprojection error, which forms an edge of the graph, shown as dashed line L1 in fig. 2.

Back-end optimization by graph optimization thus means optimizing repeatedly so that the total error over all edges is minimized and the positions of the map points and poses of the key frames approach their true values. Because graph optimization relies only on the map points and key frames generated by the front-end visual odometry and the internal constraints between them, the accuracy of the optimized map is limited; and when the front end gives wrong initial key frame poses or map point positions (for example, because of feature mismatches), the back end cannot correct them well. Moreover, this back-end optimization inevitably leaves accumulated error: although loop closure detection can correct a looped map to some extent, considerable residual error remains, and for open trajectories that do not form loops, the accumulated error is even larger.

To address this problem, the present disclosure provides an anchor point-based VSLAM back-end optimization method. An anchor point in the embodiments of the present disclosure refers to trajectory data (i.e., pose data) generated by other positioning sources, such as GPS, LSLAM, or fused positioning, and may be generated offline or online, which is not limited herein. In the embodiments, anchor points are characterized by very high pose accuracy. By aligning the anchor points with the images by timestamp, each image (and each key frame selected from the images) is assigned more accurate pose data. Introducing the anchor points into the VSLAM back end therefore adds constraints between key frames and anchor points to the back-end optimization, which improves back-end accuracy and reduces accumulated error. It also improves the accuracy of the visual map built by VSLAM, and this accuracy does not degrade as the map grows, i.e., high accuracy is maintained for maps of any size, which addresses the problem of continuously growing accumulated error. The method is also applicable to maps built from various trajectories and achieves high accuracy whether or not the trajectory forms a loop.

The technical solution provided by the embodiments of the present disclosure mainly addresses how to improve the accuracy of the map obtained by VSLAM back-end optimization; specifically, how to improve back-end optimization accuracy for VSLAM that uses a feature-point front end and a graph-optimization back end.

The following provides an exemplary description of a method, an apparatus, a medium, a device, and a vehicle for anchor point-based VSLAM backend optimization according to embodiments of the present disclosure with reference to the accompanying drawings.

Exemplarily, fig. 3 is a schematic flowchart of a VSLAM backend optimization method based on anchor points according to an embodiment of the present disclosure. Referring to fig. 3, the method may include the steps of:

S110: acquiring a plurality of alignment anchor points and a plurality of images collected by the camera.

The timestamps of the alignment anchor points correspond respectively to the timestamps of the images, and the alignment anchor points are pose data generated by other positioning sources, which may include, for example, GPS, LSLAM, or fused positioning. The alignment anchor points have higher pose accuracy than poses estimated from the images, so they can provide each image with more accurate, metric-scale pose data at the corresponding time.

In some embodiments, the other positioning sources and the camera may differ in at least one of sampling time and sampling rate, so the timestamps of the initial anchor points generated by the other positioning sources do not correspond to the timestamps of the images. Therefore, before this step, the method may further include interpolating the initial anchor points by timestamp to obtain anchor points whose timestamps each correspond to an image, i.e., aligned anchor points. In the embodiments of the present disclosure, the timestamps of the camera images are used as the reference, and the pose data from the other positioning sources are interpolated (for example, linearly) to obtain the alignment anchor point for each image. This yields more accurate pose data for every image in preparation for S120 and S130.
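The timestamp alignment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names `wrap_angle` and `interpolate_anchors` and the 6-DoF tuple layout `(x, y, z, yaw, pitch, roll)` are hypothetical. Positions are interpolated linearly, as the text suggests; each Euler angle is interpolated along the shorter arc, which is only an approximation (rotations are more commonly interpolated with quaternion SLERP). Image timestamps outside the source time range are skipped rather than extrapolated.

```python
import bisect
import math
from typing import Dict, Sequence, Tuple

Pose = Tuple[float, ...]  # (x, y, z, yaw, pitch, roll), angles in radians


def wrap_angle(a: float) -> float:
    """Wrap an angle in radians to (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def interpolate_anchors(src_times: Sequence[float], src_poses: Sequence[Pose],
                        image_times: Sequence[float]) -> Dict[float, Pose]:
    """Return an alignment anchor for every image timestamp covered by the source data.

    src_times must be sorted ascending; src_poses[k] is the source pose at src_times[k].
    """
    aligned: Dict[float, Pose] = {}
    for t in image_times:
        k = bisect.bisect_left(src_times, t)
        if k < len(src_times) and src_times[k] == t:
            aligned[t] = tuple(src_poses[k])  # exact timestamp match
            continue
        if k == 0 or k == len(src_times):
            continue  # outside the source time range: no anchor for this image
        t0, t1 = src_times[k - 1], src_times[k]
        s = (t - t0) / (t1 - t0)  # interpolation fraction in [0, 1]
        p0, p1 = src_poses[k - 1], src_poses[k]
        position = [a + s * (b - a) for a, b in zip(p0[:3], p1[:3])]
        angles = [wrap_angle(a + s * wrap_angle(b - a)) for a, b in zip(p0[3:], p1[3:])]
        aligned[t] = tuple(position + angles)
    return aligned


gps_times = [0.0, 0.1, 0.2]
gps_poses = [(0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
             (1.0, 0.0, 0.0, 0.2, 0.0, 0.0),
             (2.0, 0.0, 0.0, 0.4, 0.0, 0.0)]
anchors = interpolate_anchors(gps_times, gps_poses, [0.05, 0.15, 0.25])
print(anchors)  # 0.25 lies outside the GPS time range, so it gets no anchor
```

Using the image timestamps as the outer loop mirrors the text's choice of the camera as the time reference: every output key is an image timestamp.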

Illustratively, taking three-dimensional space as an example, an alignment anchor point may contain pose data for up to 6 degrees of freedom, i.e., x, y, z, yaw, pitch, and roll. Not all 6 degrees of freedom are necessarily used in subsequent steps. For example, for an unmanned vehicle, 3 degrees of freedom may be used, e.g., x, y, and yaw; for a drone, all 6 degrees of freedom may be used.

In some embodiments, generating initial anchor points from GPS may include: directly extracting the east, north, height, yaw, pitch, and roll data from the GPS data, converting the global coordinates into local coordinates (i.e., unifying the data into the same coordinate system), and generating the GPS anchor points accordingly.

In some embodiments, generating initial anchor points from a LiDAR may include: building an LSLAM map through mapping and localization based on the point cloud collected by the LiDAR, and generating the corresponding LiDAR initial anchor points at the same time.

S120: generating map points and key frames based on the images.

Each key frame corresponds to an alignment anchor point. The key frames are images selected from all the images, i.e., the set of key frames is a subset of the set of images; since every image has a corresponding alignment anchor point, every key frame also has one.

In this step, the images and alignment anchor points are used as input for VSLAM mapping, which generates map points and key frames, each key frame carrying its corresponding alignment anchor point.
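Because key frames are a subset of the images, attaching an anchor to each key frame reduces to a timestamp lookup. The sketch below assumes a hypothetical `Keyframe` record and a hypothetical selection rule (every third image becomes a key frame); real front ends usually select key frames by criteria such as tracking quality or parallax, which this passage does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Pose = Tuple[float, ...]


@dataclass
class Keyframe:
    timestamp: float
    anchor: Pose  # alignment anchor inherited from the source image


def select_keyframes(image_times: List[float], anchors: Dict[float, Pose],
                     every: int = 3) -> List[Keyframe]:
    """Pick key frames from the images (hypothetical rule) and attach their anchors."""
    keyframes = []
    for idx, t in enumerate(image_times):
        if t not in anchors:
            continue  # defensive check: an image without an aligned anchor is not used
        if idx % every == 0:
            keyframes.append(Keyframe(timestamp=t, anchor=anchors[t]))
    return keyframes


image_times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
anchors = {t: (10 * t, 0.0, 0.0) for t in image_times}  # placeholder anchor poses
kfs = select_keyframes(image_times, anchors)
print([kf.timestamp for kf in kfs])
```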

S130: if the local back-end optimization condition is met, performing VSLAM back-end optimization based on the alignment anchor points corresponding to the key frames.

The local back-end optimization condition determines whether VSLAM back-end optimization should be performed, i.e., whether local back-end optimization is triggered. If the condition is met, VSLAM back-end optimization is performed based on the alignment anchor points corresponding to the key frames, as described in detail below; if it is not met, no back-end optimization is performed, and the process returns to VSLAM mapping to keep accumulating map points and key frames.

In some embodiments, the local back-end optimization condition comprises at least one of a first condition and a second condition. The first condition is that the number of newly added key frames, i.e., the number of key frames added since the last local back-end optimization, is greater than a preset number threshold. The second condition is that the pose error, determined from the pose of the current key frame and the pose of the corresponding anchor point, is greater than a preset error threshold.

Specifically, the local back-end optimization condition is met when either or both of the first condition and the second condition are satisfied.

In the first condition, the preset number threshold determines whether enough new key frames have been added. If the number of newly added key frames is greater than the threshold, enough key frames have been added since the last local back-end optimization; the local back-end optimization condition is then met, and local back-end optimization is performed again.

In the second condition, the anchor pose is the pose data of the alignment anchor point, and the preset error threshold measures whether the pose error is too large. If the pose error is greater than the threshold, the pose of the current key frame deviates too far from the corresponding anchor pose; the local back-end optimization condition is then met, and local back-end optimization is performed again.

In some embodiments, the pose error comprises at least one of a position error, a yaw angle error, a pitch angle error, and a roll angle error, calculated as follows:

$$E_{pos}=\sqrt{(x_i-x_{ai})^2+(y_i-y_{ai})^2+(z_i-z_{ai})^2}$$

$$E_{yaw}=yaw_i-yaw_{ai}$$

$$E_{pitch}=pitch_i-pitch_{ai}$$

$$E_{roll}=roll_i-roll_{ai}$$

where $(x_i, y_i, z_i, yaw_i, pitch_i, roll_i)$ represents the pose of the current key frame and $(x_{ai}, y_{ai}, z_{ai}, yaw_{ai}, pitch_{ai}, roll_{ai})$ represents the pose of the corresponding alignment anchor point; $E_{pos}$, $E_{yaw}$, $E_{pitch}$, and $E_{roll}$ represent the position error, yaw angle error, pitch angle error, and roll angle error, respectively, between the current key frame and its corresponding alignment anchor point.

With reference to the 6 degrees of freedom of three-dimensional space described above, the position error may be calculated jointly from x, y, and z, corresponding to a distance error in three-dimensional space; in other embodiments, it may be calculated from one or two of x, y, and z to obtain the distance error in those degrees of freedom. The angle errors are calculated separately for each of the three rotational degrees of freedom.

In some embodiments, for moving bodies that generally do not pitch or roll significantly, such as autonomous vehicles, the pose error mainly includes the position error and the yaw angle error; for moving bodies that can rotate freely, such as drones, the pose error includes the position error and all three angle errors.
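The pose-error formulas and the vehicle/drone distinction can be combined in a short sketch. The names `angle_diff`, `pose_error`, and the `planar` switch are hypothetical. Unlike the signed differences in the formulas, the sketch wraps each angle difference into (-pi, pi] and returns magnitudes, on the assumption that the errors are compared against positive thresholds; the planar (vehicle) case uses only x, y, and yaw, one of the options the text describes.

```python
import math
from typing import Dict, Sequence


def angle_diff(a: float, b: float) -> float:
    """Signed difference a - b wrapped to (-pi, pi]."""
    return math.atan2(math.sin(a - b), math.cos(a - b))


def pose_error(kf: Sequence[float], anchor: Sequence[float], planar: bool = True) -> Dict[str, float]:
    """Pose error between a key frame and its alignment anchor.

    Both poses are (x, y, z, yaw, pitch, roll). planar=True models a ground vehicle
    (x, y, yaw); planar=False models a freely rotating body such as a drone (all 6 DoF).
    """
    dx, dy, dz = (kf[k] - anchor[k] for k in range(3))
    if planar:
        return {"pos": math.hypot(dx, dy), "yaw": abs(angle_diff(kf[3], anchor[3]))}
    return {
        "pos": math.sqrt(dx * dx + dy * dy + dz * dz),
        "yaw": abs(angle_diff(kf[3], anchor[3])),
        "pitch": abs(angle_diff(kf[4], anchor[4])),
        "roll": abs(angle_diff(kf[5], anchor[5])),
    }


kf_pose = (3.0, 4.0, 1.0, 0.10, 0.02, -0.01)
anchor_pose = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
print(pose_error(kf_pose, anchor_pose))                # vehicle: pos = 5.0
print(pose_error(kf_pose, anchor_pose, planar=False))  # drone: pos = sqrt(26)
```

Wrapping matters near the +/-pi boundary: a key frame yaw of 3.1 rad and an anchor yaw of -3.1 rad differ by about 0.083 rad, not 6.2 rad.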

It can be understood that when at least one of the position error and the three angle errors exceeds the corresponding preset error threshold, it indicates that the error deviation is large, and the back-end optimization is required.
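The trigger for local back-end optimization (first condition or second condition) can be expressed as a small predicate. The function name and the threshold values in the usage lines are hypothetical placeholders; the text only says the thresholds are preset. The `errors` argument is assumed to be a dictionary mapping error names (for example `"pos"`, `"yaw"`) to non-negative magnitudes.

```python
from typing import Dict


def local_backend_triggered(new_keyframes: int, errors: Dict[str, float],
                            count_threshold: int, error_thresholds: Dict[str, float]) -> bool:
    """Return True if local back-end optimization should run.

    First condition: more new key frames than count_threshold since the last local optimization.
    Second condition: any pose error component exceeds its preset threshold.
    """
    first = new_keyframes > count_threshold
    second = any(errors[name] > limit for name, limit in error_thresholds.items() if name in errors)
    return first or second


# Hypothetical thresholds: 10 key frames, 0.5 m position error, 0.05 rad yaw error.
thresholds = {"pos": 0.5, "yaw": 0.05}
print(local_backend_triggered(4, {"pos": 0.2, "yaw": 0.01}, 10, thresholds))   # False
print(local_backend_triggered(12, {"pos": 0.2, "yaw": 0.01}, 10, thresholds))  # True
print(local_backend_triggered(4, {"pos": 0.2, "yaw": 0.08}, 10, thresholds))   # True
```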

In some embodiments, performing back-end optimization may include performing local back-end optimization and global back-end optimization to obtain a globally consistent and high-precision map.

Exemplarily, on the basis of fig. 1, performing VSLAM backend optimization in S130 may specifically include the following steps:

performing local back-end optimization based on the alignment anchor points;

judging whether all the images are processed;

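if not, returning to generate map points and key frames based on the images;

and if all the images have been processed, performing global back-end optimization based on the alignment anchor points.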

Specifically, local back-end optimization is performed based on the alignment anchor points, and it is then determined whether all the images have been processed. If not, the process returns to VSLAM mapping based on the images and alignment anchor points to generate map points and key frames; once all the images have been processed, global back-end optimization is performed based on the alignment anchor points to further unify the global scale.
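The ordering of local optimization, the "all images processed?" check, and the final global pass can be sketched as a driver loop. Every callable here (`make_keyframe`, `triggered`, `local_opt`, `global_opt`) is a hypothetical hook standing in for the front end, the trigger condition, and the two optimizers; the sketch shows control flow only, not the optimization itself.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Anchor = Tuple[float, ...]
KeyframeList = List[Tuple[float, Anchor]]


def run_anchor_backend(
    image_times: Sequence[float],
    anchors: Dict[float, Anchor],
    make_keyframe: Callable[[float], bool],
    triggered: Callable[[int, KeyframeList], bool],
    local_opt: Callable[[KeyframeList], None],
    global_opt: Callable[[KeyframeList], None],
) -> KeyframeList:
    keyframes: KeyframeList = []
    new_since_local = 0
    for t in image_times:  # loop until all images are processed
        if make_keyframe(t):  # front end decides whether this image becomes a key frame
            keyframes.append((t, anchors[t]))  # each key frame carries its alignment anchor
            new_since_local += 1
        if keyframes and triggered(new_since_local, keyframes):
            local_opt(keyframes)  # local back-end optimization with anchor constraints
            new_since_local = 0
    global_opt(keyframes)  # all images processed: global back-end optimization
    return keyframes


log: List[str] = []
times = [0.1 * k for k in range(6)]
result = run_anchor_backend(
    times,
    {t: (t, 0.0, 0.0) for t in times},
    make_keyframe=lambda t: True,          # every image is a key frame in this toy run
    triggered=lambda n, kfs: n > 2,        # hypothetical count-only trigger
    local_opt=lambda kfs: log.append(f"local@{len(kfs)}"),
    global_opt=lambda kfs: log.append(f"global@{len(kfs)}"),
)
print(log)  # ['local@3', 'local@6', 'global@6']
```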

It should be noted that, the local backend optimization and the global backend optimization may use the same optimization manner, or may use different optimization manners, which is not limited herein, and the specific optimization manner is described in the following text.

In some embodiments, performing local back-end optimization or performing global back-end optimization may specifically include: optimizing, by means of graph optimization, both the reprojection error and the error between each key frame and its corresponding alignment anchor point.

In the embodiments of the present disclosure, VSLAM back-end optimization is performed by graph optimization with the poses of the alignment anchor points as references, so that the positions of the map points and the poses of the key frames move closer to their true values, which helps reduce accumulated error and makes the optimized map more accurate.

Exemplarily, fig. 4 is a schematic diagram of a graph optimization manner provided by an embodiment of the present disclosure. Fig. 4 builds on fig. 2, and the elements shared with fig. 2 are not described again. Unlike fig. 2, fig. 4 introduces anchor poses: {a1, a2, a3, ..., an} represent the poses of the alignment anchor points corresponding to the images (i.e., key frames), and dashed line L2 represents the edges formed by the errors between key frame poses and anchor poses. By using graph optimization to minimize the reprojection error and the key-frame-to-anchor pose error simultaneously, the overall error can be greatly reduced: the key frame poses are pulled toward the anchor poses and thus toward the true poses, which in turn brings the map point positions closer to their true positions.
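The graph of fig. 4 can be represented by a simple container with two edge types: reprojection edges (dashed line L1) between key frames and map points, and anchor edges (dashed line L2) between each key frame and its alignment anchor. The class and field names are hypothetical; graph optimization libraries such as g2o or GTSAM provide production versions of this structure, where an anchor edge would typically be modeled as a unary prior on the key frame vertex.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec = Tuple[float, ...]


@dataclass
class AnchoredGraph:
    keyframes: Dict[int, Vec] = field(default_factory=dict)   # vertex: key frame pose estimate x_i
    map_points: Dict[int, Vec] = field(default_factory=dict)  # vertex: map point estimate y_j
    reprojection_edges: List[Tuple[int, int, Vec]] = field(default_factory=list)  # (i, j, z_ij): line L1
    anchor_edges: Dict[int, Vec] = field(default_factory=dict)  # i -> fixed anchor pose a_i: line L2

    def add_keyframe(self, i: int, pose: Vec, anchor: Vec) -> None:
        self.keyframes[i] = pose
        self.anchor_edges[i] = anchor  # exactly one anchor edge per key frame

    def add_map_point(self, j: int, position: Vec) -> None:
        self.map_points[j] = position

    def add_observation(self, i: int, j: int, z: Vec) -> None:
        if i not in self.keyframes or j not in self.map_points:
            raise KeyError("both vertices must exist before adding an edge")
        self.reprojection_edges.append((i, j, z))


g = AnchoredGraph()
for i in range(4):  # four key frames, matching the n = 4 used in the figures
    g.add_keyframe(i, pose=(float(i), 0.0, 0.0), anchor=(float(i) + 0.01, 0.0, 0.0))
for j in range(4):
    g.add_map_point(j, position=(float(j), 2.0, 5.0))
for i in range(4):
    for j in range(4):
        g.add_observation(i, j, z=(320.0, 240.0))  # placeholder pixel observations
print(len(g.keyframes) + len(g.map_points), "vertices,",
      len(g.reprojection_edges), "L1 edges,", len(g.anchor_edges), "L2 edges")
```

The anchor values stay fixed during optimization; only the key frame and map point vertices are variables.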

It should be noted that fig. 2 and fig. 4 only exemplarily show 4 different points, that is, the value of n is 4, and in other embodiments, the value of n may also be set to any other value, which is not limited herein.

In some embodiments, performing local backend optimization, or performing global backend optimization, comprises:

determining a cost function corresponding to the VSLAM observation model;

through iterative optimization, the cost function tends to be minimum;

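where the cost function is:

$$\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{n}\left\|e_{ij}\right\|^{2}=\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{n}\left\|z_{ij}-h(x_i,y_j)\right\|^{2}+\frac{1}{2}\sum_{i=1}^{m}\left\|x_i-ax_i\right\|^{2}$$

where $e_{ij}$ denotes the accumulated cost, $x_i$ denotes the position of the camera for the current image frame, $y_j$ denotes the corresponding map point, $z_{ij}$ denotes the observation data collected by the camera for that map point, $h(\cdot)$ denotes the observation model, and $ax_i$ denotes the position of the corresponding alignment anchor point.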

Specifically, the VSLAM problem involves two basic models, a motion model and an observation model, each of which can be written as an equation. The embodiments of the present disclosure focus only on the observation model, whose equation can be expressed as:

$$z_{k,j}=h(y_j,x_k,v_{k,j})$$

where x represents the pose of the camera, i.e., the pose of the key frame; y represents the position of a map point observable by the camera, i.e., the coordinates of the map point; v represents the observation noise; and z represents the observation data, i.e., the projection of the map point onto the image. The equation means that when the camera at pose $x_k$ observes map point $y_j$, it produces observation data $z_{k,j}$ with observation noise $v_{k,j}$.
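For a pinhole camera, the observation function h maps a map point to pixel coordinates. The sketch below assumes a standard pinhole model with a hypothetical intrinsic matrix K and a world-to-camera pose (R_cw, t_cw); the disclosure does not specify the camera model or this pose convention. The noise v is modeled, as an assumption, as zero-mean Gaussian pixel noise, so the observation error z - h is just that noise for a perfectly known pose and point.

```python
import numpy as np


def h(y_world: np.ndarray, R_cw: np.ndarray, t_cw: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project map point y into pixel coordinates for a camera with world-to-camera pose (R_cw, t_cw)."""
    p_cam = R_cw @ y_world + t_cw  # map point in camera coordinates
    if p_cam[2] <= 0:
        raise ValueError("map point is behind the camera")
    uvw = K @ (p_cam / p_cam[2])   # normalize by depth, then apply intrinsics
    return uvw[:2]


def observe(y_world, R_cw, t_cw, K, noise_std=0.0, rng=None):
    """z = h(y, x) + v, with v drawn as zero-mean Gaussian pixel noise (an assumption)."""
    z = h(y_world, R_cw, t_cw, K)
    if noise_std > 0:
        rng = rng if rng is not None else np.random.default_rng()
        z = z + rng.normal(0.0, noise_std, size=2)
    return z


K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)   # camera at the world origin
y = np.array([1.0, 0.0, 5.0])   # map point 5 m ahead, 1 m to the side
print(h(y, R, t, K))            # [420. 240.]
z = observe(y, R, t, K, noise_std=1.0, rng=np.random.default_rng(0))
print("observation error z - h:", z - h(y, R, t, K))
```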

VSLAM back-end optimization solves the observation model by iterative optimization: a cost function is defined and updated repeatedly along the direction of gradient descent, so that its value keeps decreasing and the estimates of x and y approach their true values. Based on the observation-model equation, the least-squares method is chosen for the solution, and the cost function can be expressed as:

wherein the actual observation data z is due to the presence of observation noise _ij Not exactly equal to the theoretical value of observation h (x) _i ,y _j ) The two correspond to the observed data error. Based on the above, the equation form of the cost function can be obtained by adding all the observation data errors and performing least square solution.

In the embodiment of the present disclosure, alignment anchors are introduced into the back-end optimization. Therefore, in addition to the error term from the observation-model equation, an error term between the camera position $x_i$ and the position $ax_i$ of its corresponding alignment anchor is added; the cost function of the embodiment is the sum of these two error terms, as shown in the following equation:
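Assuming unit weights on both terms (the text does not fix the relative weighting), adding the keyframe-to-anchor term to the least-squares observation term gives:

$$\min_{x,\,y} \; J(x, y) = \frac{1}{2} \sum_{i} \sum_{j} \left\| z_{ij} - h(x_i, y_j) \right\|^2 + \frac{1}{2} \sum_{i} \left\| x_i - \mathrm{ax}_i \right\|^2$$

Since every keyframe corresponds to one alignment anchor, the second sum runs over all keyframes in the optimization.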

It can be understood that this cost function is nonlinear and therefore needs to be minimized by nonlinear optimization methods.
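To make the nonlinear character concrete, the sketch below minimizes a toy 2-D version of such a cost with plain gradient descent. It deliberately swaps in simplifications that are assumptions, not the disclosed method: range measurements instead of image reprojections, position-only keyframes, exact anchors, unit weights, and gradient descent rather than the Gauss-Newton or Levenberg-Marquardt solvers usually used in VSLAM back ends.

```python
import math


def cost(kfs, lms, obs, anchors, w=1.0):
    """J = 1/2 * sum (z_ij - ||y_j - x_i||)^2 + w/2 * sum ||x_i - ax_i||^2 (toy 2-D model)."""
    J = 0.0
    for (i, j), z in obs.items():
        J += 0.5 * (z - math.dist(kfs[i], lms[j])) ** 2       # observation term
    for i, a in enumerate(anchors):
        J += 0.5 * w * math.dist(kfs[i], a) ** 2              # keyframe-anchor term
    return J


def gradient_descent(kfs, lms, obs, anchors, w=1.0, step=0.05, iters=2000):
    kfs = [list(p) for p in kfs]
    lms = [list(p) for p in lms]
    for _ in range(iters):
        gk = [[0.0, 0.0] for _ in kfs]
        gl = [[0.0, 0.0] for _ in lms]
        for (i, j), z in obs.items():
            dx, dy = lms[j][0] - kfs[i][0], lms[j][1] - kfs[i][1]
            r = math.hypot(dx, dy)
            e = z - r                                          # observation error
            # d(e^2/2)/dy_j = -e * (d / r);  d(e^2/2)/dx_i = +e * (d / r)
            gl[j][0] -= e * dx / r
            gl[j][1] -= e * dy / r
            gk[i][0] += e * dx / r
            gk[i][1] += e * dy / r
        for i, (ax, ay) in enumerate(anchors):
            gk[i][0] += w * (kfs[i][0] - ax)                   # pull x_i toward its anchor
            gk[i][1] += w * (kfs[i][1] - ay)
        for p, g in zip(kfs + lms, gk + gl):                   # simultaneous update
            p[0] -= step * g[0]
            p[1] -= step * g[1]
    return kfs, lms


true_kfs = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
true_lms = [(0.0, 2.0), (1.5, 2.5), (3.0, 2.0)]
obs = {(i, j): math.dist(k, l) for i, k in enumerate(true_kfs) for j, l in enumerate(true_lms)}
anchors = true_kfs                                            # toy: anchors are exact
init_kfs = [(1.1 * i, 0.1 * i) for i in range(4)]             # drifted keyframes
init_lms = [(0.2, 2.1), (1.6, 2.3), (3.2, 2.2)]
kfs, lms = gradient_descent(init_kfs, init_lms, obs, anchors)
print(cost(init_kfs, init_lms, obs, anchors), cost(kfs, lms, obs, anchors))
```

The anchor term gives every keyframe a fixed reference, so the drift built into init_kfs is removed rather than redistributed.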

In some embodiments, fig. 5 is a schematic flowchart of another anchor-based VSLAM back-end optimization method provided by the present disclosure, in which the anchors are generated online. Referring to fig. 5, the method may include:

S201, collecting data from the camera and at least one other positioning source simultaneously.

Here, "simultaneously" means that the acquisition periods are the same, while the sampling rates and sampling times need not be completely consistent. In this step, data from the camera and from at least one other positioning source are acquired over the same time period. The other positioning sources may include, for example, radar or GPS: the radar collects point cloud data, the GPS collects GPS data, and the camera collects images. All three kinds of data carry timestamps so that they can be aligned in subsequent steps.

S202, generating initial anchors based on at least one of the other positioning sources.

Specifically, GPS anchors can be extracted from the GPS data and used as the initial anchors; alternatively, LSLAM anchors, i.e., the initial anchors corresponding to the radar, can be generated by mapping and localization based on the radar point cloud data and used as the initial anchors.

In other embodiments, the initial anchors may also be generated from data acquired by other, higher-precision positioning sources, which is not limited herein.

It can be understood that when the pose data obtained directly from a positioning source are relative poses, they can be converted into absolute pose data, i.e., pose data in global coordinates, by combining them with data from another positioning source such as GPS, thereby obtaining the initial anchors.
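A minimal planar sketch of this conversion follows. It assumes 2-D poses (x, y, yaw), assumes the global pose of the relative frame's origin is already known from GPS in a local metric frame, and uses the hypothetical function name compose_se2; a 3-D system would compose full rigid-body transforms instead.

```python
import math


def compose_se2(origin, rel):
    """Return origin (+) rel: a pose given relative to `origin`, expressed globally.

    Poses are (x, y, yaw) with yaw in radians.
    """
    ox, oy, oyaw = origin
    rx, ry, ryaw = rel
    c, s = math.cos(oyaw), math.sin(oyaw)
    x = ox + c * rx - s * ry            # rotate the relative offset into the global frame
    y = oy + s * rx + c * ry
    yaw = math.atan2(math.sin(oyaw + ryaw), math.cos(oyaw + ryaw))  # wrap to [-pi, pi]
    return (x, y, yaw)


# Hypothetical example: relative-frame origin at (100, 50) with heading pi/2 (from GPS).
gps_origin = (100.0, 50.0, math.pi / 2)
relative_poses = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.0, math.pi / 2)]
initial_anchors = [compose_se2(gps_origin, p) for p in relative_poses]
```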

S203, interpolating the initial anchors to obtain alignment anchors corresponding to the camera data.

Because the camera and the other positioning sources, such as GPS (global positioning system) and radar, do not sample at exactly the same rates or instants, their timestamps do not coincide. In this step, anchors in one-to-one correspondence with the camera data timestamps, i.e., the alignment anchors, are obtained by interpolation.
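A minimal interpolation sketch follows. It assumes anchors stored as (x, y, z, yaw) tuples with sorted timestamps, linear interpolation for position, shortest-arc interpolation for yaw, and no extrapolation outside the anchor time range; the function name is hypothetical, and a full 3-D implementation would typically interpolate orientation with quaternion slerp instead.

```python
import bisect
import math


def interpolate_anchor(anchor_times, anchor_poses, t):
    """Interpolate an (x, y, z, yaw) anchor at camera timestamp t.

    anchor_times must be sorted ascending. Returns None when t lies outside the
    anchor time range (no extrapolation, an assumed policy).
    """
    if not anchor_times or t < anchor_times[0] or t > anchor_times[-1]:
        return None
    k = bisect.bisect_left(anchor_times, t)
    if anchor_times[k] == t:
        return tuple(anchor_poses[k])
    t0, t1 = anchor_times[k - 1], anchor_times[k]
    p0, p1 = anchor_poses[k - 1], anchor_poses[k]
    s = (t - t0) / (t1 - t0)                      # interpolation fraction in [0, 1]
    x, y, z = (a + s * (b - a) for a, b in zip(p0[:3], p1[:3]))
    # Take the shorter way around the circle, e.g. 3.0 rad -> -3.0 rad passes through pi.
    dyaw = math.atan2(math.sin(p1[3] - p0[3]), math.cos(p1[3] - p0[3]))
    yaw = p0[3] + s * dyaw
    yaw = math.atan2(math.sin(yaw), math.cos(yaw))
    return (x, y, z, yaw)


# Anchors at 10 Hz, camera frames at other instants (hypothetical data).
anchor_times = [0.0, 0.1, 0.2]
anchor_poses = [(0.0, 0.0, 0.0, 3.0), (1.0, 0.0, 0.0, -3.0), (2.0, 0.0, 0.0, -3.0)]
camera_times = [0.05, 0.15, 0.25]
aligned = [interpolate_anchor(anchor_times, anchor_poses, t) for t in camera_times]
```

The camera frame at 0.25 s falls outside the anchor range and receives no alignment anchor under this policy.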

S204, VSLAM mapping.

Specifically, for monocular VSLAM, mapping can be performed based on the images and the alignment anchors to generate map points and keyframes; for binocular VSLAM, mapping can be performed based on the images alone to generate map points and keyframes. The VSLAM mapping process may include extracting feature points from the images and performing mapping with a feature-point-based method.

It should be noted that in the embodiment of the present disclosure, monocular VSLAM mapping is combined with the alignment anchors, which provide scale information and global coordinates during the VSLAM process; this helps recover the true scale and improves both the success rate and the accuracy.
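One simple way to read scale off the alignment anchors, given as an illustrative assumption rather than the disclosed procedure, is to compare the path lengths of the up-to-scale monocular trajectory and the anchor trajectory over the same keyframes. A full similarity alignment, such as the Umeyama method, would additionally recover the rotation and translation into global coordinates.

```python
import math


def estimate_scale(vslam_positions, anchor_positions):
    """Scale factor s such that s * (monocular VSLAM path length) ~ anchor path length.

    Both lists hold positions of the same keyframes, in the same order.
    """
    if len(vslam_positions) != len(anchor_positions) or len(vslam_positions) < 2:
        raise ValueError("need two equally long trajectories with at least two poses")
    vslam_len = sum(math.dist(p, q) for p, q in zip(vslam_positions, vslam_positions[1:]))
    anchor_len = sum(math.dist(p, q) for p, q in zip(anchor_positions, anchor_positions[1:]))
    if vslam_len == 0.0:
        raise ValueError("VSLAM trajectory has zero length; scale is unobservable")
    return anchor_len / vslam_len


vslam = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]      # up to scale
anchors = [(10.0, 5.0, 0.0), (12.0, 5.0, 0.0), (14.0, 5.0, 0.0)]  # metric
print(estimate_scale(vslam, anchors))  # 4.0
```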

S205, is the local back-end optimization condition met?

Specifically, it is determined whether the local back-end optimization condition is met. If it is met (Y), back-end optimization is performed, i.e., S206 and the subsequent steps are executed; if it is not met (N), the process returns to S204.

S206, performing local back-end optimization based on the alignment anchors.

Specifically, local back-end optimization is performed with the alignment anchors serving as the reference for the keyframe poses.

S207, have all images been processed?

Specifically, it is determined whether all images have been processed. If so, the subsequent global back-end optimization step S208 is performed; if not, i.e., unprocessed images remain, the process returns to the VSLAM mapping step S204 and continues processing images until all of them have been processed, after which global back-end optimization is performed.

It should be noted that "all images" in this step refers to the images from which the front-end odometry generates map points and keyframes, namely the image corresponding to the successfully initialized keyframe and all images after it in time order; images before successful initialization may be excluded. In addition, for approaches that also build the map backward from the successfully initialized keyframe, "all images" further includes the images before that keyframe.

S208, performing global back-end optimization based on the alignment anchors.

Specifically, global back-end optimization is performed with the alignment anchors serving as the reference for the keyframe poses.

It should be noted that global back-end optimization and local back-end optimization may use the same optimization method and differ only in scope. Specifically, local back-end optimization takes as its optimization objects the current keyframe and a small number of nearby keyframes that share a covisibility relation with it, and solves for a local solution; a later local back-end optimization may modify the results of earlier ones. Global back-end optimization takes all keyframes as its optimization objects and solves for a global solution with a single consistent global scale.
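The difference in scope can be sketched as a choice of which keyframes enter the optimization. The covisibility score used here (number of shared map points) and the parameters min_shared and max_neighbors are assumptions for illustration.

```python
def local_window(current_kf, kf_points, min_shared=1, max_neighbors=None):
    """Keyframes for local back-end optimization: the current keyframe plus covisible ones.

    kf_points maps keyframe id -> set of observed map point ids. Neighbours are ranked by
    the number of shared map points (most covisible first, ties broken by id).
    """
    current = kf_points[current_kf]
    scored = sorted(
        (-len(current & pts), kf)
        for kf, pts in kf_points.items()
        if kf != current_kf and len(current & pts) >= min_shared
    )
    neighbors = [kf for _, kf in scored][:max_neighbors]
    return [current_kf] + neighbors


def global_window(kf_points):
    """Keyframes for global back-end optimization: all of them."""
    return sorted(kf_points)


kf_points = {1: {1, 2}, 2: {1, 2, 3}, 3: {2, 3, 4}, 4: {3, 4}, 5: {9}}
print(local_window(4, kf_points), global_window(kf_points))  # [4, 3, 2] [1, 2, 3, 4, 5]
```

Keyframe 5 shares no map points with keyframe 4, so it only enters the optimization at the global stage.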

S209, generating a VSLAM map.

Specifically, an optimized map is obtained, and further, the generated VSLAM map may be saved.
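The control flow of S204 to S209 can be sketched as follows. Every callable is a hypothetical placeholder for the corresponding step, and the sketch assumes that once the last image has been mapped, the flow proceeds to global optimization even if the local condition was not met for that image.

```python
def run_mapping(images, build_map, local_condition_met, local_optimize,
                global_optimize, save_map):
    """Control flow of S204-S209; every callable is a hypothetical placeholder."""
    for image in images:            # S207: repeat until all images are processed
        build_map(image)            # S204: VSLAM mapping (map points, keyframes)
        if local_condition_met():   # S205: local back-end optimization condition
            local_optimize()        # S206: local optimization with anchor constraints
    global_optimize()               # S208: global optimization with anchor constraints
    return save_map()               # S209: generate (and optionally save) the map


# Dummy steps that only record the order in which they run.
log = []
state = {"new_keyframes": 0}


def build_map(image):
    log.append(f"map {image}")
    state["new_keyframes"] += 1


def local_condition_met():
    return state["new_keyframes"] >= 2


def local_optimize():
    log.append("local")
    state["new_keyframes"] = 0


result = run_mapping(["img0", "img1", "img2"], build_map, local_condition_met,
                     local_optimize, lambda: log.append("global"), lambda: "vslam map")
```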

It can be understood that the method shown in fig. 5 differs from conventional VSLAM mapping in that data from at least one other positioning source, for example at least one sensor such as radar or GPS, must be acquired synchronously; at least one of a LIDAR anchor and a GPS anchor is generated accordingly and used as the pose reference during VSLAM mapping. In other embodiments, the anchors may also be generated offline from data acquired by other positioning sources and then used in the VSLAM mapping process, which is not described in detail or limited herein.

The anchor-based VSLAM back-end optimization method provided by the embodiment of the present disclosure introduces alignment anchors and fully exploits their higher precision to improve the accuracy of back-end optimization. As a result, the overall error in VSLAM mapping no longer grows with the map size but depends on the precision of the alignment anchors, which can significantly reduce the accumulated error of VSLAM mapping. Specifically, the higher-precision alignment anchors serve as prior true poses of the keyframes, and pose constraints between the keyframes and the alignment anchors are added to the back-end optimization, which improves mapping accuracy. In addition, the alignment anchors provide the true scale, so the loop-closure detection of conventional VSLAM mapping can be omitted and the scale ambiguity of monocular VSLAM mapping is resolved.

On the basis of the foregoing embodiments, the embodiments of the present disclosure further provide an anchor point-based VSLAM backend optimization apparatus, which is configured to execute any one of the methods provided in the foregoing embodiments, and can achieve corresponding beneficial effects.

Exemplarily, fig. 6 is a schematic structural diagram of an anchor-based VSLAM back-end optimization apparatus according to an embodiment of the present disclosure. Referring to fig. 6, the apparatus 300 includes: an obtaining module 310, configured to obtain a plurality of alignment anchors and a plurality of images collected by a camera, where the timestamps of the alignment anchors correspond one-to-one to the timestamps of the images and the alignment anchors are pose data generated using other positioning sources; a generating module 320, configured to generate map points and keyframes based at least on the images, each keyframe corresponding to one alignment anchor; and an optimization module 330, configured to perform VSLAM back-end optimization based on the alignment anchors corresponding to the keyframes after determining that the local back-end optimization condition is met.

The anchor-based VSLAM back-end optimization apparatus provided by the embodiment of the present disclosure introduces alignment anchors through the cooperation of its functional modules. The alignment anchors are pose data generated from other positioning sources and provide more accurate poses for the image frames with corresponding timestamps. The apparatus therefore performs VSLAM back-end optimization in combination with the alignment anchors: the higher-precision alignment anchors serve as prior true poses of the corresponding keyframes, and pose constraints between the keyframes and the alignment anchors are added during back-end optimization. This improves the accuracy of back-end optimization and of VSLAM mapping, and addresses the problem of a large and continuously growing accumulated error.

In some embodiments, the local back-end optimization condition comprises at least one of a first condition and a second condition. The first condition is that the number of newly added keyframes is greater than a preset number threshold, where the number of newly added keyframes is the number of keyframes added since the last local back-end optimization. The second condition is that the pose error is greater than a preset error threshold, where the pose error is determined from the current keyframe pose and the corresponding anchor pose.
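A minimal check of these conditions, assuming hypothetical threshold parameters (None disables a condition) and a scalar pose error:

```python
def local_optimization_due(new_keyframes, pose_error, max_new_keyframes=None,
                           max_pose_error=None):
    """Return True when at least one enabled condition holds.

    First condition:  keyframes added since the last local optimization > max_new_keyframes.
    Second condition: pose error between current keyframe and its anchor > max_pose_error.
    Passing None disables a condition; the parameter names are hypothetical.
    """
    if max_new_keyframes is None and max_pose_error is None:
        raise ValueError("enable at least one condition")
    first = max_new_keyframes is not None and new_keyframes > max_new_keyframes
    second = max_pose_error is not None and pose_error > max_pose_error
    return first or second


print(local_optimization_due(12, 0.1, max_new_keyframes=10, max_pose_error=0.5))  # True
```

Combining the conditions with a logical OR matches the "at least one of" wording: either enough new keyframes or a large enough deviation from the anchor triggers a local optimization.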

In some embodiments, on the basis of fig. 6, the optimization module 330 is configured to perform VSLAM back-end optimization and is specifically configured for:

performing local back-end optimization based on the alignment anchor points;

judging whether all the images have been processed;

if not, returning to the step of generating map points and key frames based on the images;

if so, performing global back-end optimization based on the alignment anchor points.

In some embodiments, the optimization module 330 is configured to perform local back-end optimization or global back-end optimization, and is specifically configured for:

In some embodiments, the optimization module 330 is configured to perform local backend optimization or perform global backend optimization, and the optimization module 330 is specifically configured to:

determining a cost function corresponding to the VSLAM observation model;

minimizing the cost function through iterative optimization;

wherein the cost function is:

where $e_{ij}$ represents the accumulated cost, $x_i$ represents the position of the camera for the current image frame, $y_j$ represents the corresponding map point, $z_{ij}$ represents the observation of that map point collected by the camera, and $ax_i$ represents the position of the corresponding alignment anchor.

In some embodiments, the pose error comprises at least one of a position error, a yaw angle error, a pitch angle error, and a roll angle error; the pose error is calculated by the following formula:

$$E_{\mathrm{yaw}} = \mathrm{yaw}_i - \mathrm{yaw}_{ai}$$

$$E_{\mathrm{pitch}} = \mathrm{pitch}_i - \mathrm{pitch}_{ai}$$

$$E_{\mathrm{roll}} = \mathrm{roll}_i - \mathrm{roll}_{ai}$$
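A sketch of these error terms follows. Two points are assumptions: the text gives no formula for the position error, so the Euclidean distance is used, and the angle differences are wrapped to [-pi, pi] so that yaw values just either side of 180 degrees yield a small error rather than one close to 360 degrees.

```python
import math


def wrap_angle(a):
    """Wrap an angle difference in radians to [-pi, pi] (assumed convention)."""
    return math.atan2(math.sin(a), math.cos(a))


def pose_error(kf, anchor):
    """kf, anchor: dicts with keys x, y, z, yaw, pitch, roll (angles in radians)."""
    position = math.dist((kf["x"], kf["y"], kf["z"]),
                         (anchor["x"], anchor["y"], anchor["z"]))  # assumed Euclidean
    return {
        "position": position,
        "yaw": wrap_angle(kf["yaw"] - anchor["yaw"]),          # E_yaw = yaw_i - yaw_ai
        "pitch": wrap_angle(kf["pitch"] - anchor["pitch"]),    # E_pitch = pitch_i - pitch_ai
        "roll": wrap_angle(kf["roll"] - anchor["roll"]),       # E_roll = roll_i - roll_ai
    }


kf = {"x": 1.0, "y": 2.0, "z": 2.0, "yaw": 3.1, "pitch": 0.02, "roll": 0.0}
anchor = {"x": 0.0, "y": 0.0, "z": 0.0, "yaw": -3.1, "pitch": 0.0, "roll": 0.0}
err = pose_error(kf, anchor)
```

Here the raw yaw difference is 6.2 rad, but the wrapped error is about -0.083 rad, which reflects how close the two headings actually are.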

It can be understood that the apparatus shown in fig. 6 can implement any of the methods provided in the foregoing embodiments and has the corresponding beneficial effects, which can be understood with reference to the foregoing description and are not repeated here.

On the basis of the foregoing embodiments, fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. Referring to fig. 7, the electronic device 400 includes: a processor 420; and a memory 410 for storing instructions executable by the processor 420. The processor 420 is configured to read the executable instructions from the memory 410 and execute them to implement the steps of any of the methods provided in the foregoing embodiments, with the corresponding beneficial effects, which are not repeated here.

Processor 420 may be, among other things, a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the computer to perform desired functions.

Memory 410 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. Non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by processor 420 to implement the method steps of the various embodiments of the present application described above and/or other desired functions.

In addition to the above-described methods and electronic devices, embodiments of the present application may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the method steps of the various embodiments of the present application.

Program code for carrying out operations of the embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.

Furthermore, the disclosed embodiments may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by the processor 420, cause the processor 420 to perform the method steps of the various embodiments of the present application.

A computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

On the basis of the foregoing embodiment, an embodiment of the present disclosure further provides a vehicle, which includes the foregoing electronic device, and has corresponding beneficial effects, and for avoiding repeated description, details are not repeated herein.

In other embodiments, the vehicle further comprises an autonomous driving system, which includes, but is not limited to, a perception module, a decision module, and a chassis execution module.

It is noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

The previous description is only for the purpose of describing particular embodiments of the present disclosure, so as to enable those skilled in the art to understand or implement the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A VSLAM back-end optimization method based on anchor points is characterized by comprising the following steps:

2. The method of claim 1, wherein the local back-end optimization condition comprises at least one of a first condition and a second condition;

3. The method of claim 1 or 2, wherein the performing VSLAM backend optimization comprises:

performing local back-end optimization based on the alignment anchor points;

judging whether all the images are processed;

4. The method of claim 3, wherein the performing local backend optimization or the performing global backend optimization comprises:

5. The method of claim 3, wherein the performing local backend optimization or the performing global backend optimization comprises:

determining a cost function corresponding to the VSLAM observation model;

minimizing the cost function through iterative optimization;

wherein the cost function is:

where $e_{ij}$ represents the accumulated cost, $x_i$ represents the position of the camera for the current image frame, $y_j$ represents the corresponding map point, $z_{ij}$ represents the observation of that map point collected by the camera, and $ax_i$ represents the position of the corresponding alignment anchor.

6. The method according to claim 2, wherein the pose error includes at least one of a position error, a yaw angle error, a pitch angle error, and a roll angle error; the pose error is calculated by adopting the following formula:

$$E_{\mathrm{yaw}} = \mathrm{yaw}_i - \mathrm{yaw}_{ai}$$

$$E_{\mathrm{pitch}} = \mathrm{pitch}_i - \mathrm{pitch}_{ai}$$

$$E_{\mathrm{roll}} = \mathrm{roll}_i - \mathrm{roll}_{ai}$$

7. An anchor-based VSLAM backend optimization device, comprising:

8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for performing the steps of the method according to any of claims 1-6.

9. An electronic device, comprising: a processor; a memory for storing the processor-executable instructions; the processor configured to read the executable instructions from the memory and execute the executable instructions to implement the steps of the method according to any one of claims 1-6.

10. A vehicle characterized by comprising the electronic apparatus of claim 9.