CN115128655B

CN115128655B - Positioning method and device for automatic driving vehicle, electronic equipment and storage medium

Info

Publication number: CN115128655B
Application number: CN202211059514.5A
Authority: CN
Inventors: 张康宁
Original assignee: Zhidao Network Technology Beijing Co Ltd
Current assignee: Zhidao Network Technology Beijing Co Ltd
Priority date: 2022-08-31
Filing date: 2022-08-31
Publication date: 2022-12-02
Anticipated expiration: 2042-08-31
Also published as: CN115128655A

Abstract

The application discloses a positioning method and device for an automatic driving vehicle, electronic equipment and a storage medium. The method comprises: acquiring a monocular image, the corresponding combined navigation positioning data and the positioning state of the combined navigation positioning data; determining the camera pose of the monocular image according to the monocular image, the combined navigation positioning data and the positioning state of the combined navigation positioning data; determining key points in the monocular image, and determining three-dimensional landmark points corresponding to the key points according to the key points in the monocular image; and optimizing the camera pose of the monocular image and the three-dimensional landmark points by using a preset optimization algorithm, so as to obtain a positioning result of the automatic driving vehicle according to the optimization result. By combining a monocular visual odometer with the positioning state of the combined navigation positioning data to optimize the camera pose and the three-dimensional landmark points in the actual scene, the application can achieve accurate positioning when satellite positioning signals fluctuate or are lost, and ensures the robustness of automatic driving vehicle positioning.

Description

Positioning method and device for automatic driving vehicle, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of automatic driving technologies, and in particular, to a method and an apparatus for positioning an automatic driving vehicle, an electronic device, and a storage medium.

Background

In the field of automatic driving, a combined navigation positioning device composed of GPS (Global Positioning System)/RTK (Real-Time Kinematic, i.e. real-time dynamic differential positioning) + IMU (Inertial Measurement Unit) is generally used to provide absolute positioning information, including absolute position and attitude, for an automatic driving vehicle. The automatic driving vehicle then performs real-time path planning and driving decision-making according to the absolute positioning information.

However, in practical scenarios, the combined navigation and positioning device composed of the GPS/RTK + IMU also has some significant problems:

1) GPS/RTK is greatly affected by the quality of satellite positioning signals. For example, in an urban scene, areas with dense high-rise buildings can block the satellite positioning signals, and the signals can be lost under large bridges and in tunnels, so that positioning accuracy and positioning stability are greatly affected;

2) When satellite positioning signals are interfered with for a long time, GPS/RTK cannot compensate for the accumulated drift error of the IMU, so the precision of the combined navigation positioning decreases and the positioning error grows over time, causing positioning drift.

Disclosure of Invention

The embodiment of the application provides a positioning method and device of an automatic driving vehicle, electronic equipment and a storage medium, so as to ensure the positioning accuracy and the positioning robustness of the automatic driving vehicle.

The embodiment of the application adopts the following technical scheme:

In a first aspect, an embodiment of the present application provides a positioning method for an autonomous vehicle, where the method includes:

acquiring a monocular image, corresponding combined navigation positioning data and a positioning state of the combined navigation positioning data;

determining the camera pose of the monocular image according to the monocular image, the combined navigation positioning data and the positioning state of the combined navigation positioning data;

determining key points in the monocular image, and determining three-dimensional landmark points corresponding to the key points according to the key points in the monocular image;

and optimizing the camera pose of the monocular image and the three-dimensional landmark points by using a preset optimization algorithm so as to obtain a positioning result of the automatic driving vehicle according to the optimization result.

Optionally, before acquiring the monocular image and the corresponding combined navigation positioning data, and the positioning state of the combined navigation positioning data, the method further comprises:

acquiring a monocular image in an initialization stage, and determining whether the monocular image in the initialization stage is a key frame image;

if so, adding the monocular image in the initialization stage into a preset sliding window;

determining the camera pose of the monocular image at the initialization stage in the preset sliding window according to the monocular image at the initialization stage in the preset sliding window, the combined navigation positioning data and the positioning state of the combined navigation positioning data;

determining three-dimensional landmark points corresponding to the key points by utilizing a triangulation algorithm according to the key points of the monocular image at the initialization stage in the preset sliding window;

and optimizing the camera pose of the monocular image in the initialization stage in the preset sliding window and the three-dimensional landmark point by using the preset optimization algorithm so as to complete initialization according to an optimization result.

Optionally, the determining, according to the monocular image and the combined navigation positioning data in the initialization stage in the preset sliding window and the positioning state of the combined navigation positioning data, the camera pose of the monocular image in the initialization stage in the preset sliding window includes:

under the condition that the number of the monocular images in the initialization stage in the preset sliding window reaches the size of the preset sliding window, determining whether the positioning state of the combined navigation positioning data corresponding to the monocular images in each frame of the initialization stage in the preset sliding window meets the initialization condition;

if the positioning states of the combined navigation positioning data corresponding to the monocular images of each frame of the initialization stage in the preset sliding window all meet the initialization condition, determining the camera pose of the monocular images of each frame of the initialization stage according to the combined navigation positioning data corresponding to the monocular images of each frame of the initialization stage and a preset external parameter;

otherwise, the preset sliding window is emptied.

Optionally, the determining, according to the monocular image, the combined navigation positioning data, and the positioning state of the combined navigation positioning data, the camera pose of the monocular image comprises:

if the positioning state of the combined navigation positioning data is an available state, determining the camera pose of the monocular image according to the combined navigation positioning data and a preset external parameter;

and if the positioning state of the combined navigation positioning data is an unavailable state, acquiring a previous frame of monocular image corresponding to the monocular image, and determining the camera pose of the monocular image according to the relative motion of the monocular image and the previous frame of monocular image.

Optionally, the determining the key points in the monocular image and determining the three-dimensional landmark points corresponding to the key points according to the key points in the monocular image includes:

performing histogram equalization processing on the monocular image to obtain a processed monocular image;

determining key points in the processed monocular image by using a preset tracking algorithm;

filtering the key points in the processed monocular image by using a preset filtering strategy to obtain the key points in the filtered monocular image;

and determining the three-dimensional landmark points of the monocular image by utilizing a triangulation algorithm based on the key points in the filtered monocular image.

Optionally, the filtering the key points in the processed monocular image by using a preset filtering strategy to obtain the key points in the filtered monocular image includes:

filtering key points in the processed monocular image by using a reverse optical flow tracking algorithm and a fundamental matrix algorithm; and/or,

and generating a dynamic target mask according to the region of interest in the processed monocular image and the semantic segmentation result of the monocular image, and filtering key points in the processed monocular image by using the dynamic target mask.

Optionally, the optimizing the camera pose of the monocular image and the three-dimensional landmark point by using a preset optimization algorithm to obtain a positioning result of the autonomous vehicle according to the optimization result includes:

taking the camera pose of the monocular image as pose prior constraint, and determining the weight of the pose prior constraint;

if the camera pose of the monocular image is obtained based on the combined navigation positioning data, determining the weight of the pose prior constraint as a first weight;

if the camera pose of the monocular image is obtained based on the previous frame of monocular image, determining the weight of pose prior constraint as a second weight;

optimizing the camera pose of each frame of monocular image in a preset sliding window and a corresponding three-dimensional landmark point according to the pose prior constraint, the weight of the pose prior constraint and the minimized reprojection error;

wherein the first weight is greater than the second weight.

Optionally, the optimization result includes a three-dimensional landmark point in the local map, and after the camera pose of the monocular image and the three-dimensional landmark point are optimized by using a preset nonlinear optimization algorithm to obtain a positioning result of the autonomous vehicle according to the optimization result, the method further includes:

and adjusting the three-dimensional landmark points in the local map by using a preset adjusting strategy, wherein the preset adjusting strategy comprises at least one of the following strategies:

deleting invalid three-dimensional landmark points in the local map;

deleting three-dimensional landmark points whose number of observations in a preset sliding window is less than a first preset count threshold and which are not observed in the monocular image of the current frame;

deleting three-dimensional landmark points which are not observed by any frame monocular image in a preset sliding window;

and setting as fixed the three-dimensional landmark points whose number of observations in the preset sliding window is larger than a second preset count threshold.
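As a non-limiting illustration, the four adjusting strategies listed above could be combined roughly as follows. This is only a sketch: the dictionary layout of a landmark (`valid`, `observed_in`, `fixed`), the function name `adjust_local_map`, and the threshold values `min_obs` and `fix_obs` are hypothetical and are not specified by the application.

```python
def adjust_local_map(landmarks, current_frame_id, min_obs=2, fix_obs=5):
    """Return the landmarks that survive the adjusting strategies.

    landmarks: dict id -> {"valid": bool,
                           "observed_in": set of frame ids in the sliding window,
                           "fixed": bool}
    min_obs / fix_obs stand in for the first and second preset thresholds
    (hypothetical values).
    """
    kept = {}
    for lm_id, lm in landmarks.items():
        n_obs = len(lm["observed_in"])
        if not lm["valid"]:
            continue  # strategy 1: delete invalid landmark points
        if n_obs == 0:
            continue  # strategy 3: not observed by any frame in the window
        if n_obs < min_obs and current_frame_id not in lm["observed_in"]:
            continue  # strategy 2: rarely observed and not seen in the current frame
        # strategy 4: frequently observed landmark points are held fixed
        kept[lm_id] = dict(lm, fixed=lm["fixed"] or n_obs > fix_obs)
    return kept
```

The function returns a new dictionary rather than modifying the input, so the caller can compare the map before and after adjustment.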

In a second aspect, an embodiment of the present application further provides a positioning device for an autonomous vehicle, where the device includes:

the acquisition unit is used for acquiring the monocular image, the corresponding combined navigation positioning data and the positioning state of the combined navigation positioning data;

a first determining unit, configured to determine a camera pose of the monocular image according to the monocular image, the combined navigation positioning data, and a positioning state of the combined navigation positioning data;

the second determining unit is used for determining key points in the monocular image and determining three-dimensional landmark points corresponding to the key points according to the key points in the monocular image;

and the first optimization unit is used for optimizing the camera pose of the monocular image and the three-dimensional landmark points by using a preset optimization algorithm so as to obtain a positioning result of the automatic driving vehicle according to the optimization result.

In a third aspect, an embodiment of the present application further provides an electronic device, including:

a processor; and

a memory arranged to store computer executable instructions that, when executed, cause the processor to perform any of the methods described above.

In a fourth aspect, embodiments of the present application further provide a computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform any of the methods described above.

The at least one technical scheme adopted by the embodiment of the application can achieve the following beneficial effects. According to the positioning method of the automatic driving vehicle, the monocular image, the corresponding combined navigation positioning data and the positioning state of the combined navigation positioning data are first acquired; the camera pose of the monocular image is then determined according to the monocular image, the combined navigation positioning data and the positioning state of the combined navigation positioning data; key points in the monocular image are then determined, and three-dimensional landmark points corresponding to the key points are determined according to the key points in the monocular image; finally, the camera pose of the monocular image and the three-dimensional landmark points are optimized by using a preset optimization algorithm, so as to obtain a positioning result of the automatic driving vehicle according to the optimization result. By combining the monocular visual odometer with the positioning state of the combined navigation positioning data to optimize the camera pose and the three-dimensional landmark points in the actual scene, the positioning method can achieve accurate positioning when satellite positioning signals fluctuate or are lost, and ensures the robustness of automatic driving vehicle positioning.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a schematic flow chart illustrating a method for locating an autonomous vehicle according to an embodiment of the present disclosure;

FIG. 2 is a schematic structural diagram of a positioning device of an autonomous vehicle according to an embodiment of the present disclosure;

FIG. 3 is a schematic structural diagram of an electronic device in an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only a few embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.

The embodiment of the present application provides a method for locating an autonomous vehicle, and as shown in fig. 1, provides a schematic flow chart of the method for locating an autonomous vehicle in the embodiment of the present application, where the method at least includes the following steps S110 to S140:

step S110, acquiring a monocular image, corresponding combined navigation positioning data and a positioning state of the combined navigation positioning data.

The positioning method of the automatic driving vehicle is mainly realized based on a monocular vision odometer technology, wherein the monocular vision odometer estimates the camera pose of each frame according to the change of a monocular image shot by a monocular camera.

In the actual positioning stage of the automatic driving vehicle, a monocular image acquired by a monocular camera on the automatic driving vehicle needs to be acquired first, wherein the monocular image mainly refers to a road image in a certain view field range in front of the vehicle. Furthermore, it is also necessary to acquire the combined navigation positioning data and the positioning state of the combined navigation positioning data at the corresponding time instant, which may be determined, for example, from the RTK differential state.

Because the data acquisition frequency of the camera is different from that of the combined navigation equipment, and the data acquisition frequency of the combined navigation equipment is generally higher, the monocular image and the positioning data originally output by the combined navigation equipment can be processed in a time synchronization manner, for example, the combined navigation positioning data can be interpolated by adopting an interpolation method, so that the combined navigation positioning data corresponding to the image moment can be obtained, and the calculation precision of the subsequent camera pose can be improved.
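As a non-limiting illustration of the time synchronization described above, the higher-rate navigation samples could be linearly interpolated to the image timestamp. The sketch below assumes a simplified planar sample layout `(t, x, y, yaw)` and a hypothetical function name `interpolate_nav`; the application does not prescribe a particular interpolation method or data layout.

```python
import math
from bisect import bisect_left


def interpolate_nav(nav_samples, image_time):
    """Linearly interpolate (t, x, y, yaw) samples to image_time.

    nav_samples must be sorted by timestamp. Returns (x, y, yaw), or None
    when image_time lies outside the sampled interval (no extrapolation).
    """
    times = [s[0] for s in nav_samples]
    if not times or image_time < times[0] or image_time > times[-1]:
        return None
    i = bisect_left(times, image_time)
    if times[i] == image_time:
        return tuple(nav_samples[i][1:])
    t0, x0, y0, yaw0 = nav_samples[i - 1]
    t1, x1, y1, yaw1 = nav_samples[i]
    a = (image_time - t0) / (t1 - t0)
    # Interpolate yaw along the shortest arc so wrap-around at +/-pi is handled.
    dyaw = math.atan2(math.sin(yaw1 - yaw0), math.cos(yaw1 - yaw0))
    return (x0 + a * (x1 - x0), y0 + a * (y1 - y0), yaw0 + a * dyaw)
```

A full implementation would interpolate the complete attitude (for example with quaternion interpolation) rather than yaw alone; the planar form is used here only to keep the sketch short.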

And step S120, determining the camera pose of the monocular image according to the monocular image, the combined navigation positioning data and the positioning state of the combined navigation positioning data.

The camera pose can be estimated and optimized differently depending on the positioning state of the combined navigation positioning data. For example, when the combined navigation positioning data is accurate, it can provide more accurate prior information for determining and optimizing the camera pose. Taking the positioning state of the combined navigation positioning data into account throughout the determination and optimization of the camera pose can therefore greatly improve the calculation precision of the camera pose, and in turn the positioning precision of the automatic driving vehicle.

Step S130, determining key points in the monocular image, and determining three-dimensional landmark points corresponding to the key points according to the key points in the monocular image.

In the embodiment of the application, key points in the monocular image also need to be determined, where a key point can be regarded as a two-dimensional pixel point corresponding to a key target extracted from the road image. The key points in the monocular image are then further converted into three-dimensional landmark points in the world coordinate system. The absolute position information of the three-dimensional landmark points provides more constraints and references for the optimization of the camera pose, so that the calculation accuracy of the camera pose is further improved.

And S140, optimizing the camera pose of the monocular image and the three-dimensional landmark points by using a preset optimization algorithm so as to obtain a positioning result of the automatic driving vehicle according to the optimization result.

According to the embodiment of the application, the camera pose and the three-dimensional landmark points obtained in the previous steps can be optimized by using a nonlinear optimization algorithm or the like, so that the positioning result of the automatic driving vehicle is obtained according to the optimization result. The optimization result can include an optimized camera pose. Since the external parameter transformation matrix between the camera and the vehicle body can be obtained by calibration in advance, the optimized camera pose can be converted into the current vehicle body pose based on the external parameters from the camera to the vehicle body, and the positioning result of the automatic driving vehicle can then be determined according to the current vehicle body pose. For example, the current vehicle body pose can be further fused with positioning data output by other sensors to obtain a final positioning result.
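As a non-limiting illustration of the conversion from the optimized camera pose to the vehicle body pose, the sketch below uses 4x4 homogeneous transforms stored as nested lists. The names `t_world_cam` (camera pose in the world frame) and `t_body_cam` (pre-calibrated camera pose in the vehicle body frame) are assumptions introduced for the example; the application only states that such an external parameter transformation exists.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(t):
    """Invert a rigid transform [R | p]: the inverse is [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[0] + [p[0]], r_t[1] + [p[1]], r_t[2] + [p[2]],
            [0.0, 0.0, 0.0, 1.0]]


def camera_to_body_pose(t_world_cam, t_body_cam):
    """T_world_body = T_world_cam * inverse(T_body_cam)."""
    return mat_mul(t_world_cam, invert_rigid(t_body_cam))
```

The opposite direction used elsewhere in the description (vehicle body pose from the combined navigation positioning data to camera pose) is simply `mat_mul(t_world_body, t_body_cam)` under the same convention.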

The nonlinear optimization algorithm may be, for example, a BA (Bundle Adjustment) optimization algorithm. Which optimization algorithm is specifically adopted can be flexibly selected by a person skilled in the art according to actual requirements, and is not specifically limited herein.

According to the positioning method of the automatic driving vehicle, the monocular visual odometer is combined with the positioning state of the combined navigation positioning data to optimize the camera pose and the three-dimensional landmark points in the actual scene, so that accurate positioning can be achieved when satellite positioning signals fluctuate or are lost, and the robustness of automatic driving vehicle positioning is ensured.

In some embodiments of the present application, prior to acquiring the monocular image and the corresponding combined navigation positioning data, and the positioning state of the combined navigation positioning data, the method further comprises: acquiring a monocular image in an initialization stage, and determining whether the monocular image in the initialization stage is a key frame image; if so, adding the monocular image in the initialization stage into a preset sliding window; determining the camera pose of the monocular image at the initialization stage in the preset sliding window according to the monocular image at the initialization stage in the preset sliding window, the combined navigation positioning data and the positioning state of the combined navigation positioning data; determining three-dimensional landmark points corresponding to the key points by utilizing a triangulation algorithm according to the key points of the monocular image at the initialization stage in the preset sliding window; and optimizing the camera pose of the monocular image in the initialization stage in the preset sliding window and the three-dimensional landmark point by using the preset optimization algorithm so as to complete initialization according to an optimization result.

According to the embodiment of the application, initialization processing can be performed before real-time positioning of the automatic driving vehicle. In the initialization stage, it is first determined whether an acquired monocular image is a key frame image. The first frame monocular image can be directly used as a key frame image. A subsequently acquired monocular image can be judged according to the parallax between image frames: for example, if the parallax between two frames of monocular images is larger than a preset parallax threshold, which indicates that sufficient relative motion exists between the two frames, the current frame monocular image can be used as a key frame image; otherwise, it is not used as a key frame image. The determined key frame images are maintained in a preset sliding window, the size of which can be flexibly set according to actual requirements and is not specifically limited herein.
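As a non-limiting illustration, the parallax-based key frame judgment could be sketched as follows. The sketch assumes that parallax is measured as the mean pixel displacement of key points matched between the last key frame and the current frame, and that the threshold is 10 pixels; both the measure and the value are hypothetical, since the application only requires a comparison against a preset parallax threshold.

```python
import math


def mean_parallax(ref_pts, curr_pts):
    """Mean pixel displacement of key points matched between two frames."""
    if not ref_pts:
        return 0.0
    return sum(math.dist(p, c) for p, c in zip(ref_pts, curr_pts)) / len(ref_pts)


def is_key_frame(last_key_pts, curr_pts, parallax_threshold_px=10.0):
    """Decide whether the current frame should become a key frame.

    last_key_pts is None for the very first frame, which is accepted directly.
    """
    if last_key_pts is None:
        return True
    return mean_parallax(last_key_pts, curr_pts) > parallax_threshold_px
```

Frames accepted by `is_key_frame` would then be appended to the preset sliding window.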

For the monocular images in the preset sliding window, on the one hand, the camera pose of each monocular image can be calculated by combining the combined navigation positioning data and the positioning state of the combined navigation positioning data; on the other hand, the three-dimensional landmark points corresponding to the key points can be calculated by using a triangulation algorithm according to the key points of the monocular images in the preset sliding window. During triangulation, multi-frame triangulation can be performed on all key points in the preset sliding window whose number of observations is not smaller than a preset observation count threshold. Multi-frame triangulation is the process of solving the position of a 3D point in the world coordinate system when the 2D observation coordinates in n frames of images, the camera intrinsic matrix and the camera pose of each frame are known, where n is not less than 2. The preset observation count threshold can therefore be set to 2: if a key point is observed at least twice in the whole preset sliding window, the three-dimensional landmark point obtained by triangulating the key point continuously observed across multiple frames has a smaller error and more reliable accuracy.
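As a non-limiting illustration of multi-frame triangulation, the sketch below finds the 3D point closest, in the least-squares sense, to all viewing rays of a key point. This ray-intersection (midpoint) formulation is one possible technique chosen for brevity and is not stated by the application; the function names, the `(fx, fy, cx, cy)` intrinsics tuple and the observation layout are likewise assumptions.

```python
import math


def pixel_to_ray(pixel, intrinsics, r_world_cam):
    """Unit viewing ray of a pixel, expressed in the world frame."""
    fx, fy, cx, cy = intrinsics
    d_cam = [(pixel[0] - cx) / fx, (pixel[1] - cy) / fy, 1.0]
    d = [sum(r_world_cam[i][k] * d_cam[k] for k in range(3)) for i in range(3)]
    n = math.sqrt(sum(v * v for v in d))
    return [v / n for v in d]


def solve3(a, b):
    """Solve a 3x3 linear system by Cramer's rule; None if near-singular."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(a)
    if abs(d) < 1e-9:
        return None
    out = []
    for col in range(3):
        m = [[b[r] if c == col else a[r][c] for c in range(3)] for r in range(3)]
        out.append(det(m) / d)
    return out


def triangulate(observations, intrinsics, min_observations=2):
    """Triangulate one landmark from n >= min_observations frames.

    observations: list of (r_world_cam, cam_center_world, pixel).
    Returns [x, y, z] in the world frame, or None when triangulation fails.
    """
    if len(observations) < min_observations:
        return None
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for r_wc, center, pixel in observations:
        d = pixel_to_ray(pixel, intrinsics, r_wc)
        for i in range(3):
            for j in range(3):
                # (I - d d^T) projects onto the plane orthogonal to the ray.
                p_ij = (1.0 if i == j else 0.0) - d[i] * d[j]
                a[i][j] += p_ij
                b[i] += p_ij * center[j]
    return solve3(a, b)
```

A `None` result corresponds to a degenerate case such as parallel rays, for which no reliable landmark point can be obtained.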

In addition, the embodiment of the application can also mark as invalid points the three-dimensional landmark points whose triangulation is abnormal (which can be determined directly from the triangulation result, for example, when the triangulation result is none) or which lie beyond a certain distance range of the camera field of view. The three-dimensional landmark points marked as invalid points can then be uniformly deleted after the subsequent optimization is completed, which avoids errors caused by invalid points participating in the optimization and also saves a certain amount of computing power.

Finally, nonlinear optimization such as BA is performed on the camera poses of all monocular images in the preset sliding window and the corresponding three-dimensional landmark points. During the optimization, the camera pose of the first frame is kept fixed, and the camera poses of the other frames and the three-dimensional landmark points are optimized by gradient descent using the pose prior constraint and the minimized reprojection error. After the optimization is completed, the camera pose of each frame and the three-dimensional landmark points are updated, finally yielding the initialized camera pose of each frame and the local map.
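As a non-limiting illustration of the quantity being minimized, the sketch below evaluates, for a single frame, a weighted pose prior term plus the squared reprojection error of the landmark points observed in that frame. It is a simplification under stated assumptions: the prior is applied to the translation only, the pose is given as a world-to-camera rotation and translation, a pinhole model with intrinsics `(fx, fy, cx, cy)` is used, and the function names are hypothetical. The solver that actually minimizes this cost over all frames in the window is not shown.

```python
def project(point_w, r_cam_world, t_cam_world, intrinsics):
    """Pinhole projection of a world point into pixel coordinates."""
    fx, fy, cx, cy = intrinsics
    p = [sum(r_cam_world[i][k] * point_w[k] for k in range(3)) + t_cam_world[i]
         for i in range(3)]
    return (fx * p[0] / p[2] + cx, fy * p[1] / p[2] + cy)


def frame_cost(pose, prior_t, prior_weight, landmarks, observations, intrinsics):
    """Weighted pose prior term plus squared reprojection error for one frame.

    pose: (r_cam_world, t_cam_world); prior_t: translation of the prior pose.
    landmarks: dict id -> (x, y, z); observations: dict id -> (u, v).
    """
    r, t = pose
    prior = prior_weight * sum((t[i] - prior_t[i]) ** 2 for i in range(3))
    reproj = 0.0
    for lm_id, (u, v) in observations.items():
        pu, pv = project(landmarks[lm_id], r, t, intrinsics)
        reproj += (pu - u) ** 2 + (pv - v) ** 2
    return prior + reproj
```

In line with the weighting described in the summary, `prior_weight` would take a larger value when the prior pose comes from the combined navigation positioning data than when it comes from the previous frame of monocular image.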

In some embodiments of the present application, the determining, according to the monocular image and the combined navigation positioning data of the initialization stage in the preset sliding window and the positioning state of the combined navigation positioning data, the camera pose of the monocular image of the initialization stage in the preset sliding window includes: determining whether the positioning state of the combined navigation positioning data corresponding to the monocular images of each frame of initialization stage in the preset sliding window meets the initialization condition or not under the condition that the number of the monocular images of the initialization stage in the preset sliding window reaches the size of the preset sliding window; if the positioning state of the combined navigation positioning data corresponding to the monocular images of each frame initialization stage in the preset sliding window meets the initialization condition, determining the camera pose of the monocular images of each frame initialization stage according to the combined navigation positioning data corresponding to the monocular images of each frame initialization stage and a preset external parameter; otherwise, the preset sliding window is emptied.

The error caused by the camera pose calculated based on the monocular vision odometer is an accumulated result, so the accuracy of the camera pose determined in the initialization stage influences the calculation accuracy of the subsequent camera pose to a great extent.

Specifically, in the initialization stage, it is first determined whether the number of key frames in the preset sliding window reaches the size of the preset sliding window. If so, it may be further determined whether the positioning state of the combined navigation positioning data corresponding to each frame of image in the preset sliding window meets the initialization condition. The monocular visual odometer needs a good positioning state of the combined navigation positioning data to complete initialization. Therefore, if the positioning state of the combined navigation positioning data is an available state, for example, the RTK differential state is a fixed solution, which indicates that the positioning accuracy of the combined navigation positioning data is high, the initialization condition may be considered to be met.

When the positioning state of the combined navigation positioning data corresponding to each frame of image in the preset sliding window meets the initialization condition, the vehicle body pose in the combined navigation positioning data corresponding to each frame of monocular image can be converted into the camera pose of that frame based on the pre-calibrated external parameters from the camera to the vehicle body. That is, the camera pose of each frame of monocular image calculated in the initialization stage of the monocular visual odometer is derived from combined navigation positioning data with higher positioning precision, which ensures the calculation precision of the camera pose in the initialization stage and provides a more accurate basis for the subsequent optimization of the camera pose.
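As a non-limiting illustration, the initialization check described above could be sketched as follows. The frame layout (a dictionary with `nav_pose` and `nav_state`), the function name `try_initialize` and the two callbacks are hypothetical; the only behavior taken from the description is that initialization proceeds when the window is full and every frame meets the initialization condition, and that the window is emptied otherwise.

```python
def try_initialize(window, window_size, is_state_ok, nav_to_camera_pose):
    """Return the camera poses of all frames in the window, or None.

    window: list of dicts with keys "nav_pose" and "nav_state".
    is_state_ok: callable deciding whether a positioning state meets the
                 initialization condition (e.g. RTK fixed solution).
    nav_to_camera_pose: callable applying the preset external parameters.
    """
    if len(window) < window_size:
        return None  # keep collecting key frames
    if all(is_state_ok(frame["nav_state"]) for frame in window):
        return [nav_to_camera_pose(frame["nav_pose"]) for frame in window]
    window.clear()  # condition not met: empty the window and start over
    return None
```

Passing the state check and the pose conversion as callables keeps the sketch independent of how positioning states and poses are actually represented.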

In some embodiments of the present application, the determining the camera pose of the monocular image from the monocular image and the combined navigation positioning data, and the positioning state of the combined navigation positioning data, comprises: if the positioning state of the combined navigation positioning data is an available state, determining the camera pose of the monocular image according to the combined navigation positioning data and a preset external parameter; and if the positioning state of the combined navigation positioning data is an unavailable state, acquiring a previous frame monocular image corresponding to the monocular image, and determining the camera pose of the monocular image according to the relative motion of the monocular image and the previous frame monocular image.

In the actual positioning stage, different strategies can be adopted to determine the camera pose depending on the positioning state of the combined navigation positioning data. If the positioning state of the combined navigation positioning data is an available state, the positioning accuracy of the combined navigation positioning data is high, and the vehicle body pose in the combined navigation positioning data corresponding to each frame of monocular image can be converted into the camera pose of that frame based on the preset external parameters from the camera to the vehicle body, which are calibrated in advance.

If the positioning state of the combined navigation positioning data is an unavailable state, the positioning accuracy of the combined navigation positioning data is low at this time and the camera pose cannot be calculated from that data. In this case, the camera pose of the current frame monocular image can be estimated with the help of the previous frame monocular image. For example, in a scene of uniform or approximately uniform motion, the relative motion between the two preceding frames can be used directly as the relative motion between the current frame and the previous frame, and an initial value of the current-frame camera pose can then be estimated by combining it with the camera pose of the previous frame. The final camera pose of the current frame is then solved by using a PnP (Perspective-n-Point) algorithm, a method for solving motion from 3D-to-2D point correspondences that describes how to estimate the camera pose when a number of 3D space points and their projected positions in the image are known.
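The initial-value step for the uniform-motion case can be sketched as follows. This is only an illustration of a constant-velocity prediction, assuming camera poses are 4x4 camera-to-world matrices; the function name and the example poses are hypothetical, and the predicted pose would serve only as the starting point handed to a PnP solver.

```python
import numpy as np

def predict_pose_constant_velocity(T_world_prev2, T_world_prev):
    """Predict the current camera pose by repeating the last relative motion.

    T_world_prev2: pose of the frame before the previous frame (4x4).
    T_world_prev:  pose of the previous frame (4x4).
    """
    # Relative motion from frame k-2 to frame k-1, expressed in frame k-2.
    T_rel = np.linalg.inv(T_world_prev2) @ T_world_prev
    # Assume the same motion happens again between frame k-1 and frame k.
    return T_world_prev @ T_rel

# Hypothetical example: the camera advances 1 m along its z axis per frame.
T_prev2 = np.eye(4)
T_prev = np.eye(4)
T_prev[2, 3] = 1.0
T_pred = predict_pose_constant_velocity(T_prev2, T_prev)
print(T_pred[:3, 3])
```

The prediction is deliberately cheap; its accuracy matters less than giving the PnP refinement a starting point close to the true pose.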

In some embodiments of the present application, the determining key points in the monocular image and determining three-dimensional landmark points corresponding to the key points according to the key points in the monocular image includes: performing histogram equalization processing on the monocular image to obtain a processed monocular image; determining key points in the processed monocular image by using a preset tracking algorithm; filtering the key points in the processed monocular image by using a preset filtering strategy to obtain the key points in the filtered monocular image; and determining the three-dimensional landmark points of the monocular image by utilizing a triangulation algorithm based on the key points in the filtered monocular image.

When key points in the monocular image are extracted, histogram equalization can first be performed on the monocular image to reduce the influence of illumination changes, and the key points in the processed image are then determined by using a preset tracking algorithm such as an optical flow tracking algorithm. The key points may belong to any target of potential interest in the image, but not all of them are suitable for the subsequent triangulation processing. The extracted key points can therefore be filtered with a certain filtering strategy, for example by removing key points with large tracking errors or key points on dynamic objects, so that the remaining key points have higher robustness and tracking accuracy. Finally, multi-frame estimation is performed on the filtered key points in the current preset sliding window that have not yet been triangulated and have reached a certain number of observations, to obtain new three-dimensional landmark points; three-dimensional landmark points whose triangulation is abnormal or which lie beyond a certain distance range of the camera field of view can be marked as invalid points.
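The triangulation step can be illustrated with a linear (DLT) triangulation over several views. The application does not state which triangulation algorithm is used, so this is an assumed technique; the camera intrinsics, the projection matrices, the depth limit `max_depth_m` and all function names are hypothetical.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X with a 3x4 projection matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate_dlt(projections, pixels):
    """Linear triangulation of one point from two or more views."""
    rows = []
    for P, (u, v) in zip(projections, pixels):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.stack(rows))
    X_h = vt[-1]
    return X_h[:3] / X_h[3]

def is_valid_landmark(X_cam, max_depth_m=100.0):
    """Mark points behind the camera or too far away as invalid (assumed rule)."""
    return 0.0 < X_cam[2] < max_depth_m

# Hypothetical two-view setup: identical intrinsics, second camera 1 m to the right.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P1 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, -0.2, 8.0])
pixels = [project(P0, X_true), project(P1, X_true)]
X_est = triangulate_dlt([P0, P1], pixels)
print(X_est, is_valid_landmark(X_est))
```

With noisy key points the linear estimate is usually only a starting value that a later reprojection-error optimization refines.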

In some embodiments of the application, the filtering the key points in the processed monocular image with a preset filtering policy to obtain the key points in the filtered monocular image includes: filtering key points in the processed monocular image by using a reverse optical flow tracking algorithm and a basic matrix algorithm; and/or generating a dynamic target mask according to the region of interest in the processed monocular image and the semantic segmentation result of the monocular image, and filtering key points in the processed monocular image by using the dynamic target mask.

On one hand, when filtering the key points in the monocular image, the embodiment of the application can determine the matching errors of the key points by adopting a reverse optical flow tracking algorithm and a basic matrix algorithm, so that the key points with larger matching errors or wrong tracking can be filtered, and the tracking accuracy of the key points is improved.
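The two checks can be sketched on already-tracked points. This illustration assumes that forward tracking, backward tracking and the fundamental matrix estimate have been produced elsewhere; the thresholds, the function names and the sample coordinates are hypothetical.

```python
import numpy as np

def forward_backward_mask(pts_prev, pts_back, max_error_px=1.0):
    """Keep points whose backward-tracked position returns close to the start."""
    err = np.linalg.norm(pts_prev - pts_back, axis=1)
    return err < max_error_px

def epipolar_mask(pts_prev, pts_cur, F, max_dist_px=1.0):
    """Keep matches whose current point lies near its epipolar line F @ x_prev."""
    ones = np.ones((len(pts_prev), 1))
    x0 = np.hstack([pts_prev, ones])
    x1 = np.hstack([pts_cur, ones])
    lines = x0 @ F.T                      # one epipolar line per match
    num = np.abs(np.sum(lines * x1, axis=1))
    den = np.linalg.norm(lines[:, :2], axis=1)
    return num / den < max_dist_px

# Hypothetical fundamental matrix of a sideways translation:
# epipolar lines are horizontal, so matched points must share the same row.
F = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
pts_prev = np.array([[100.0, 50.0], [200.0, 80.0], [300.0, 120.0]])
pts_cur = np.array([[110.0, 50.0], [215.0, 80.2], [310.0, 140.0]])
pts_back = np.array([[100.1, 50.0], [205.0, 80.0], [300.0, 120.2]])

keep = forward_backward_mask(pts_prev, pts_back) & epipolar_mask(pts_prev, pts_cur, F)
print(keep)
```

In this example the second point fails the backward-tracking check and the third fails the epipolar check, so only the first match survives.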

On the other hand, the key points of the dynamic target are greatly influenced by the target motion, so that the robustness of the key points of the dynamic target is poor, and the key points of the dynamic target in the monocular image can be filtered. For filtering the key points of the dynamic target, a Region of Interest (ROI) intercepted from the monocular image and a semantic segmentation result of the monocular image may be used to generate a dynamic target mask (mask), and then the dynamic target mask is used to filter the key points tracked to the dynamic target, thereby improving the robustness of the key points. The semantic segmentation result of the monocular image is mainly a result obtained by segmenting a dynamic object such as a vehicle, a pedestrian and the like in the image, and can be obtained based on the existing semantic segmentation model based on deep learning training, which is not specifically limited herein.
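A dynamic target mask of this kind can be sketched with array operations. The application does not specify how the region of interest and the segmentation result are combined, so this sketch assumes the mask is the set of dynamic-class pixels inside the ROI; the class identifiers, the ROI layout `(top, bottom, left, right)` and the function names are hypothetical.

```python
import numpy as np

DYNAMIC_CLASS_IDS = (1, 2)  # hypothetical labels: 1 = vehicle, 2 = pedestrian

def build_dynamic_mask(seg_labels, roi):
    """Boolean mask that is True on dynamic-object pixels inside the ROI."""
    top, bottom, left, right = roi
    in_roi = np.zeros(seg_labels.shape, dtype=bool)
    in_roi[top:bottom, left:right] = True
    return in_roi & np.isin(seg_labels, DYNAMIC_CLASS_IDS)

def filter_keypoints(keypoints, dynamic_mask):
    """Drop key points (x, y) that fall on the dynamic target mask."""
    h, w = dynamic_mask.shape
    cols = np.clip(np.round(keypoints[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.round(keypoints[:, 1]).astype(int), 0, h - 1)
    return keypoints[~dynamic_mask[rows, cols]]

# Hypothetical 6x8 label image with one vehicle blob.
seg = np.zeros((6, 8), dtype=int)
seg[2:4, 3:6] = 1
mask = build_dynamic_mask(seg, roi=(0, 6, 0, 8))
keypoints = np.array([[1.0, 1.0], [4.0, 2.0], [7.0, 5.0]])
kept = filter_keypoints(keypoints, mask)
print(kept)
```

Only the key point at (4, 2), which lies on the vehicle blob, is removed.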

In addition, in order to ensure the number of the three-dimensional landmark points and the subsequent optimization effect, the number of the key points in the monocular image needs to meet the requirement of a preset number threshold, and the number of the key points remaining in the current frame monocular image after the filtering process may not meet the requirement of the preset number threshold.

In some embodiments of the present application, the optimizing the camera pose of the monocular image and the three-dimensional landmark point by using a preset optimization algorithm to obtain a positioning result of the autonomous vehicle according to the optimization result includes: taking the camera pose of the monocular image as pose prior constraint, and determining the weight of the pose prior constraint; if the camera pose of the monocular image is obtained based on the combined navigation positioning data, determining the weight of the pose prior constraint as a first weight; if the camera pose of the monocular image is obtained based on the previous frame of monocular image, determining the weight of the pose prior constraint as a second weight; optimizing the camera pose of each frame of monocular image in a preset sliding window and a corresponding three-dimensional landmark point according to the pose prior constraint, the weight of the pose prior constraint and the minimized reprojection error; wherein the first weight is greater than the second weight.

The optimization process of the actual positioning stage is substantially the same as that of the initialization stage. The main difference is that, in the actual positioning stage, the camera pose of the monocular image is not necessarily obtained while the combined navigation positioning signal is good. For example, in the foregoing embodiment, when the positioning state of the combined navigation positioning data is an unavailable state, the camera pose of the current frame monocular image is estimated from the camera pose of the previous frame monocular image. The accuracy or reliability of a camera pose determined in the unavailable state is lower than that of a camera pose determined in the available state. Camera poses from different sources therefore differ in importance as pose prior constraints in the optimization process, and different weights can be assigned to them so that the optimization of the camera pose and the three-dimensional landmark points is adjusted adaptively.

Therefore, the embodiment of the application further reflects the influence of the positioning state of the combined navigation positioning data on the optimization process of the camera pose and the three-dimensional landmark point, and under the condition of good combined navigation positioning signals, the camera pose converted based on the combined navigation positioning data is more trusted, namely the weight of pose prior constraint is given to be larger, and under the condition of poor combined navigation positioning signals, the weight of the pose prior constraint is given to be smaller. Therefore, the accuracy of camera pose optimization is further improved, and the positioning accuracy of the automatic driving vehicle is ensured.
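The effect of the two weights can be illustrated with a simplified cost function. This sketch assumes a translation-only pose prior for brevity (a full implementation would also constrain rotation), and the weight values, function names and test data are hypothetical; the application only requires that the first weight be greater than the second.

```python
import numpy as np

FIRST_WEIGHT = 10.0   # hypothetical: prior pose came from combined navigation data
SECOND_WEIGHT = 1.0   # hypothetical: prior pose came from the previous frame

def prior_weight(pose_source):
    """Select the prior weight from the source of the camera pose."""
    return FIRST_WEIGHT if pose_source == "navigation" else SECOND_WEIGHT

def reprojection_residuals(K, R_cw, t_cw, landmarks, pixels):
    """Stacked pixel residuals of landmarks projected with pose (R_cw, t_cw)."""
    cam_pts = landmarks @ R_cw.T + t_cw
    proj = cam_pts @ K.T
    return (proj[:, :2] / proj[:, 2:3] - pixels).ravel()

def total_cost(K, R_cw, t_cw, landmarks, pixels, t_prior, pose_source):
    """Reprojection error plus a weighted (translation-only) pose prior term."""
    r_reproj = reprojection_residuals(K, R_cw, t_cw, landmarks, pixels)
    r_prior = t_cw - t_prior
    return r_reproj @ r_reproj + prior_weight(pose_source) * (r_prior @ r_prior)

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
landmarks = np.array([[0.5, -0.2, 8.0], [-1.0, 0.3, 6.0], [0.2, 0.1, 10.0]])
R = np.eye(3)
t_candidate = np.array([0.1, 0.0, 0.0])
# Pixels consistent with the candidate pose, so only the prior term contributes.
pixels = reprojection_residuals(K, R, t_candidate, landmarks, np.zeros((3, 2))).reshape(-1, 2)
t_prior = np.zeros(3)

cost_nav = total_cost(K, R, t_candidate, landmarks, pixels, t_prior, "navigation")
cost_prev = total_cost(K, R, t_candidate, landmarks, pixels, t_prior, "previous_frame")
print(cost_nav, cost_prev)
```

The same 0.1 m deviation from the prior is penalized more heavily when the prior comes from combined navigation data, so the optimizer stays closer to that prior.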

It should be noted that, in the actual positioning stage and the initialization stage of the present application, the camera pose and the three-dimensional landmark point are optimized by means of the preset sliding window, but in the actual positioning stage, after the optimization is completed by using the relevant information of the current frame monocular image, it may be determined whether the current frame monocular image is a key frame, that is, whether the current frame monocular image can be added into the preset sliding window.

The reason is as follows. For any frame of monocular image, if it is determined to be a key frame, it is added directly to the preset sliding window and the earliest frame in the preset sliding window is removed; if not, the frame is discarded. Therefore, if the key-frame judgment were made immediately after the current frame monocular image is acquired, the current frame would be discarded whenever it is not a key frame, and the key point information it contains could not be used for subsequent triangulation processing, causing a certain loss of information. If the strategy of optimizing first and judging afterwards is adopted, a new three-dimensional landmark point may still be obtained through triangulation processing even if the current frame monocular image is not a key frame, which provides more reference information for the optimization process.
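The optimize-first, judge-afterwards ordering can be sketched with a bounded queue. The key-frame criterion is not given in this passage, so the parallax threshold used here is purely hypothetical, as are the window size, the frame fields and the function names; the `optimize` callback is a stand-in that only records which frames it received.

```python
from collections import deque

WINDOW_SIZE = 3  # hypothetical size of the preset sliding window

def process_frame(window, frame, optimize, is_key_frame):
    """Optimize with the current frame first, then decide whether to keep it."""
    # 1) The current frame takes part in the optimization even if it is
    #    later found not to be a key frame.
    result = optimize(list(window) + [frame])
    # 2) Key-frame judgment happens afterwards; a deque with maxlen drops
    #    the earliest frame automatically when a new one is appended.
    if is_key_frame(frame):
        window.append(frame)
    return result

used_ids = []

def optimize(frames):
    ids = [f["id"] for f in frames]
    used_ids.append(ids)
    return ids

def is_key_frame(frame):
    return frame["parallax"] > 10.0  # hypothetical criterion

window = deque(maxlen=WINDOW_SIZE)
for frame_id, parallax in enumerate([20.0, 5.0, 15.0, 30.0, 12.0]):
    process_frame(window, {"id": frame_id, "parallax": parallax}, optimize, is_key_frame)

print([f["id"] for f in window], used_ids)
```

Frame 1 is never stored in the window, yet it still contributes to one optimization pass, which is the information the judge-first ordering would lose.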

In some embodiments of the present application, the optimization result includes three-dimensional landmark points in a local map, and after optimizing the camera pose of the monocular image and the three-dimensional landmark points by using a preset non-linear optimization algorithm to obtain a positioning result of the autonomous vehicle according to the optimization result, the method further includes: adjusting the three-dimensional landmark points in the local map by using a preset adjustment strategy, wherein the preset adjustment strategy comprises at least one of the following: deleting invalid three-dimensional landmark points in the local map; deleting three-dimensional landmark points that are observed in the preset sliding window fewer times than a first preset number threshold and are not observed in the current frame monocular image; deleting three-dimensional landmark points that are not observed by any frame of monocular image in the preset sliding window; and setting three-dimensional landmark points whose number of observations in the preset sliding window is greater than a second preset number threshold to be fixed.

The optimization result of the embodiment of the application can comprise the camera pose after each frame is optimized in the monocular vision odometer and the three-dimensional landmark point in the local map.

After each optimization, the embodiment of the present application may further adjust the three-dimensional landmark point in the current local map by using a preset adjustment strategy, where the preset adjustment strategy may include, for example:

1) Deleting invalid points among the three-dimensional landmark points. In the foregoing embodiment, three-dimensional landmark points whose triangulation is abnormal or which lie beyond a certain range of the camera field of view are marked as invalid points after the triangulation processing; these invalid points affect the optimization precision of the camera pose and can therefore be deleted;

2) Deleting three-dimensional landmark points that have been observed fewer times than a first preset number threshold in the current preset sliding window and are not observed in the current frame monocular image. Such points have been observed continuously only a few times, and only in earlier frames of the preset sliding window rather than in the current frame; for example, a point is observed in only two frames of monocular images and can no longer be observed in the current frame. Such points have relatively large errors and contribute little to optimizing the current-frame camera pose, so they can be deleted;

3) Deleting three-dimensional landmark points that are not observed by any frame in the current preset sliding window. For example, when a new monocular image enters the preset sliding window, the earliest monocular image is removed, but the three-dimensional landmark points corresponding to the removed image are not deleted at that moment; those that are no longer observed by any frame in the current preset sliding window can then be deleted;

4) Setting three-dimensional landmark points whose number of observations in the current preset sliding window is greater than a second preset number threshold to be fixed. Such points are observed continuously across multiple frames of monocular images, so their error can be considered small, and they need not be optimized in the subsequent optimization process.
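The four adjustment strategies above can be sketched as one pass over the local map. This is a minimal illustration, assuming each landmark records the identifiers of the frames that observed it and that the current frame belongs to the sliding window; the `Landmark` class, the threshold values `min_obs` and `fix_obs`, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Landmark:
    landmark_id: int
    observing_frames: set = field(default_factory=set)
    valid: bool = True
    fixed: bool = False

def adjust_local_map(landmarks, window_frames, current_frame, min_obs=3, fix_obs=5):
    """Apply the four adjustment strategies and return the landmarks that remain."""
    window = set(window_frames)
    kept = []
    for lm in landmarks:
        obs_in_window = lm.observing_frames & window
        if not lm.valid:
            continue  # strategy 1: invalid point
        if len(obs_in_window) < min_obs and current_frame not in lm.observing_frames:
            continue  # strategy 2: rarely observed and lost in the current frame
        if not obs_in_window:
            continue  # strategy 3: no longer observed by any frame in the window
        if len(obs_in_window) > fix_obs:
            lm.fixed = True  # strategy 4: well observed, exclude from optimization
        kept.append(lm)
    return kept

window_frames = [3, 4, 5, 6, 7, 8, 9]
landmarks = [
    Landmark(0, {7, 8, 9}, valid=False),
    Landmark(1, {3, 4}),
    Landmark(2, {1, 2}),
    Landmark(3, {3, 4, 5, 6, 7, 8, 9}),
    Landmark(4, {8, 9}),
]
remaining = adjust_local_map(landmarks, window_frames, current_frame=9)
print([(lm.landmark_id, lm.fixed) for lm in remaining])
```

Landmark 4 survives despite having only two observations because it is still visible in the current frame, while landmark 3 is kept and frozen.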

It should be noted that the optimization of the camera pose in the present application is a continuous process. When the satellite positioning signal meets the initialization condition, initialization of the monocular visual odometer is triggered, an accurate camera pose for the initialization stage is obtained by converting the combined navigation positioning data, and the subsequent real-time positioning is performed on the basis of this accurate initialization. In the real-time positioning process, if the current satellite positioning signal is good, the camera pose is still obtained by converting the combined navigation positioning data and is used as the pose prior constraint, so that a more accurate optimized pose is obtained; if the current satellite positioning signal is poor, the pose is optimized by the monocular visual odometer, so that a relatively accurate optimized pose can still be obtained.

In summary, the positioning method of the autonomous vehicle uses the monocular visual odometer to solve the problem of positioning drift of GPS/RTK + IMU combined navigation when the satellite positioning signal is blocked or lost. On the one hand, when the satellite positioning signal is good, the camera pose is updated and the three-dimensional landmark points are estimated based on the combined navigation positioning data. On the other hand, when the satellite positioning signal fluctuates or is lost, the current camera pose is obtained by an optimization solution based on the three-dimensional landmark points and the camera pose of the previous frame, from which the current vehicle body pose is obtained. The positioning accuracy and positioning robustness of the autonomous vehicle are thereby guaranteed.

The embodiment of the present application further provides a positioning device 200 for an autonomous vehicle, as shown in fig. 2, which provides a schematic structural diagram of the positioning device for an autonomous vehicle in the embodiment of the present application, where the device 200 at least includes: an obtaining unit 210, a first determining unit 220, a second determining unit 230, and a first optimizing unit 240, wherein:

an obtaining unit 210, configured to obtain a monocular image and corresponding combined navigation positioning data, and a positioning state of the combined navigation positioning data;

a first determining unit 220, configured to determine a camera pose of the monocular image according to the monocular image and the combined navigation positioning data, and a positioning state of the combined navigation positioning data;

a second determining unit 230, configured to determine a key point in the monocular image, and determine a three-dimensional landmark point corresponding to the key point according to the key point in the monocular image;

and the first optimization unit 240 is configured to optimize the camera pose of the monocular image and the three-dimensional landmark point by using a preset optimization algorithm, so as to obtain a positioning result of the autonomous vehicle according to the optimization result.

In some embodiments of the present application, the apparatus further comprises: a third determining unit, configured to acquire a monocular image in the initialization stage and determine whether the monocular image in the initialization stage is a key frame image; an adding unit, configured to add the monocular image in the initialization stage into a preset sliding window if it is a key frame image; a fourth determining unit, configured to determine a camera pose of the monocular image in the initialization stage in the preset sliding window according to the monocular image in the initialization stage in the preset sliding window, the combined navigation positioning data, and a positioning state of the combined navigation positioning data; a fifth determining unit, configured to determine, according to key points of the monocular image in the initialization stage in the preset sliding window, three-dimensional landmark points corresponding to the key points by using a triangulation algorithm; and a second optimization unit, configured to optimize the camera pose of the monocular image in the initialization stage in the preset sliding window and the three-dimensional landmark points by using the preset optimization algorithm, so as to complete initialization according to an optimization result.

In some embodiments of the application, the fourth determining unit is specifically configured to: determining whether the positioning state of the combined navigation positioning data corresponding to the monocular images of each frame of initialization stage in the preset sliding window meets the initialization condition or not under the condition that the number of the monocular images of the initialization stage in the preset sliding window reaches the size of the preset sliding window; if the positioning state of the combined navigation positioning data corresponding to the monocular images of each frame initialization stage in the preset sliding window meets the initialization condition, determining the camera pose of the monocular images of each frame initialization stage according to the combined navigation positioning data corresponding to the monocular images of each frame initialization stage and a preset external parameter; otherwise, the preset sliding window is emptied.

In some embodiments of the present application, the first determining unit 220 is specifically configured to: if the positioning state of the combined navigation positioning data is an available state, determining the camera pose of the monocular image according to the combined navigation positioning data and a preset external parameter; and if the positioning state of the combined navigation positioning data is an unavailable state, acquiring a previous frame of monocular image corresponding to the monocular image, and determining the camera pose of the monocular image according to the relative motion of the monocular image and the previous frame of monocular image.

In some embodiments of the present application, the second determining unit 230 is specifically configured to: performing histogram equalization processing on the monocular image to obtain a processed monocular image; determining key points in the processed monocular image by using a preset tracking algorithm; filtering the key points in the processed monocular image by using a preset filtering strategy to obtain the key points in the filtered monocular image; and determining the three-dimensional landmark points of the monocular image by utilizing a triangularization algorithm based on the key points in the filtered monocular image.

In some embodiments of the present application, the second determining unit 230 is specifically configured to: filtering key points in the processed monocular image by using a reverse optical flow tracking algorithm and a basic matrix algorithm; and/or generating a dynamic target mask according to the region of interest in the processed monocular image and the semantic segmentation result of the monocular image, and filtering key points in the processed monocular image by using the dynamic target mask.

In some embodiments of the present application, the first optimization unit 240 is specifically configured to: taking the camera pose of the monocular image as pose prior constraint, and determining the weight of the pose prior constraint; if the camera pose of the monocular image is obtained based on the combined navigation positioning data, determining the weight of the pose prior constraint as a first weight; if the camera pose of the monocular image is obtained based on the previous frame of monocular image, determining the weight of the pose prior constraint as a second weight; optimizing the camera pose of each frame of monocular image in a preset sliding window and a corresponding three-dimensional landmark point according to the pose prior constraint, the weight of the pose prior constraint and the minimized reprojection error; wherein the first weight is greater than the second weight.

In some embodiments of the present application, the optimization result includes three-dimensional landmark points in a local map, and the apparatus further includes: an adjusting unit, configured to adjust the three-dimensional landmark points in the local map by using a preset adjustment strategy, where the preset adjustment strategy includes at least one of: deleting invalid three-dimensional landmark points in the local map; deleting three-dimensional landmark points that are observed in the preset sliding window fewer times than a first preset number threshold and are not observed in the current frame monocular image; deleting three-dimensional landmark points that are not observed by any frame of monocular image in the preset sliding window; and setting three-dimensional landmark points whose number of observations in the preset sliding window is greater than a second preset number threshold to be fixed.

It can be understood that the positioning device for an autonomous vehicle can implement the steps of the positioning method for an autonomous vehicle provided in the foregoing embodiments, and the explanations regarding the positioning method for an autonomous vehicle are applicable to the positioning device for an autonomous vehicle, and are not repeated herein.

Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to fig. 3, at the hardware level, the electronic device includes a processor, and optionally further includes an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a random-access memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.

The processor, the network interface, and the memory may be connected to each other by an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 3, but this does not indicate only one bus or one type of bus.

The memory is used for storing programs. In particular, a program may include program code comprising computer operating instructions. The memory may include both an internal memory and a non-volatile memory, and provides instructions and data to the processor.

The processor reads the corresponding computer program from the non-volatile memory into the memory and runs the computer program to form the positioning device of the automatic driving vehicle on a logic level. The processor is used for executing the program stored in the memory and is specifically used for executing the following operations:

The method performed by the positioning device of the autonomous vehicle disclosed in the embodiment of fig. 1 of the present application may be applied to or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Such a processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as random-access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware.

The electronic device may further execute the method executed by the positioning apparatus of the autonomous vehicle in fig. 1, and implement the functions of the positioning apparatus of the autonomous vehicle in the embodiment shown in fig. 1, which are not described herein again in this application embodiment.

Embodiments of the present application also provide a computer-readable storage medium storing one or more programs, where the one or more programs include instructions, which when executed by an electronic device including a plurality of application programs, enable the electronic device to perform the method performed by the positioning apparatus of an autonomous vehicle in the embodiment shown in fig. 1, and are specifically configured to perform:

and optimizing the camera pose of the monocular image and the three-dimensional landmark point by using a preset optimization algorithm so as to obtain a positioning result of the automatic driving vehicle according to an optimization result.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus comprising the element.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A method of locating an autonomous vehicle, wherein the method comprises:

optimizing the camera pose of the monocular image and the three-dimensional landmark points by using a preset optimization algorithm so as to obtain a positioning result of the automatic driving vehicle according to the optimization result;

the determining the camera pose of the monocular image according to the monocular image, the combined navigation positioning data, and the positioning state of the combined navigation positioning data comprises:

if the positioning state of the combined navigation positioning data is an unavailable state, acquiring a previous frame monocular image corresponding to the monocular image, and determining the camera pose of the monocular image according to the relative motion of the monocular image and the previous frame monocular image;

the optimizing the camera pose of the monocular image and the three-dimensional landmark point by using a preset optimization algorithm comprises the following steps:

determining the weight of pose prior constraint according to the camera poses of the monocular images from different sources;

and optimizing the camera pose of the monocular image and the three-dimensional landmark points by using a preset optimization algorithm according to the weight of the pose prior constraint.
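As an illustration only, the pose-source selection recited in claim 1 might be sketched as follows. The sketch assumes planar poses (x, y, yaw), assumes that the combined navigation pose is used directly when its positioning state is available, and ignores the camera-to-vehicle extrinsics; the function names and source labels are hypothetical and do not appear in the application.

```python
import math


def compose_se2(pose, delta):
    """Apply a relative motion (expressed in the previous frame) to a planar pose."""
    x, y, yaw = pose
    dx, dy, dyaw = delta
    return (
        x + math.cos(yaw) * dx - math.sin(yaw) * dy,
        y + math.sin(yaw) * dx + math.cos(yaw) * dy,
        yaw + dyaw,
    )


def select_camera_pose(nav_pose, nav_available, prev_pose, relative_motion):
    """Return (pose, source) for the current monocular image.

    Assumption: when the combined navigation positioning state is available,
    its pose is taken as the camera pose. Otherwise the pose is propagated
    from the previous frame using the estimated relative motion.
    """
    if nav_available:
        return nav_pose, "combined_navigation"
    return compose_se2(prev_pose, relative_motion), "previous_frame"
```

The returned source label is what a later step could use to choose the weight of the pose prior constraint.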

2. The method of claim 1, wherein prior to acquiring the monocular image and the corresponding combined navigational positioning data, and the positioning status of the combined navigational positioning data, the method further comprises:

3. The method of claim 2, wherein the determining the camera pose of the monocular image of the initialization stage in the preset sliding window according to the monocular image of the initialization stage in the preset sliding window and the combined navigation positioning data, and the positioning state of the combined navigation positioning data comprises:

otherwise, the preset sliding window is emptied.

4. The method of claim 1, wherein the determining key points in the monocular image and the three-dimensional landmark points corresponding to the key points from the key points in the monocular image comprises:

and determining the three-dimensional landmark points of the monocular image by utilizing a triangulation algorithm based on the key points in the filtered monocular image.
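The claim does not specify which triangulation algorithm is used. As a minimal sketch, the midpoint method below recovers a three-dimensional landmark point from two viewing rays of the same key point. The camera centers and ray directions are assumed to be already expressed in a common world frame, and the function name is hypothetical.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Midpoint triangulation of two rays c1 + s*d1 and c2 + t*d2.

    Returns the midpoint of the shortest segment between the rays, or None
    when the rays are (nearly) parallel or the point lies behind a camera.
    """
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None  # parallel rays carry no depth information
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    if s <= 0 or t <= 0:
        return None  # closest point is behind at least one camera
    p1 = [o + s * k for o, k in zip(c1, d1)]
    p2 = [o + t * k for o, k in zip(c2, d2)]
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))
```

For example, a camera at the origin looking along (0, 0, 1) and a second camera at (1, 0, 0) looking along (-1, 0, 5) both observe the point (0, 0, 5), which the function returns.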

5. The method of claim 4, wherein the filtering the key points in the processed monocular image by using a preset filtering strategy to obtain the key points in the filtered monocular image comprises:

filtering key points in the processed monocular image by using a reverse optical flow tracking algorithm and a fundamental matrix algorithm; and/or

6. The method of claim 1, wherein the optimizing the camera pose of the monocular image and the three-dimensional landmark points using a preset optimization algorithm to obtain a positioning result of the autonomous vehicle according to the optimization result comprises:

if the camera pose of the monocular image is obtained based on the previous frame of monocular image, determining the weight of the pose prior constraint as a second weight;

wherein the first weight is greater than the second weight.
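To show why the first weight is made larger than the second, the following sketch reduces the optimization to a single scalar state. It assumes that the first weight applies to poses obtained from the combined navigation positioning data, and the numeric weight values, source labels, and function names are illustrative assumptions rather than values from the application.

```python
# Illustrative values only; the application requires FIRST_WEIGHT > SECOND_WEIGHT.
FIRST_WEIGHT = 10.0   # assumed: pose prior from combined navigation data
SECOND_WEIGHT = 1.0   # pose prior propagated from the previous frame


def prior_weight(source):
    """Map the source of the camera pose to the weight of its prior constraint."""
    return FIRST_WEIGHT if source == "combined_navigation" else SECOND_WEIGHT


def fuse_scalar(prior, visual_estimate, weight):
    """Minimize weight*(x - prior)**2 + (x - visual_estimate)**2 in closed form."""
    return (weight * prior + visual_estimate) / (weight + 1.0)
```

With a prior of 0.0 and a visual estimate of 1.0, the larger weight keeps the optimized state close to the prior, while the smaller weight lets the visual residuals move it further. In a full implementation the same weighting would scale the pose prior residual inside a nonlinear least-squares problem over camera poses and landmark points.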

7. The method of claim 1, wherein the optimization results include three-dimensional landmark points in a local map, and after optimizing the camera pose of the monocular image and the three-dimensional landmark points using a preset non-linear optimization algorithm to obtain a positioning result of the autonomous vehicle according to the optimization results, the method further comprises:

deleting invalid three-dimensional landmark points in the local map;

8. A positioning device of an autonomous vehicle, wherein the device comprises:

a first determining unit, configured to determine a camera pose of the monocular image according to the monocular image and the combined navigation positioning data, and a positioning state of the combined navigation positioning data;

the first optimization unit is used for optimizing the camera pose of the monocular image and the three-dimensional landmark points by using a preset optimization algorithm so as to obtain a positioning result of the automatic driving vehicle according to the optimization result;

the first determining unit is specifically configured to:

if the positioning state of the combined navigation positioning data is an unavailable state, acquiring a previous frame of monocular image corresponding to the monocular image, and determining the camera pose of the monocular image according to the relative motion of the monocular image and the previous frame of monocular image;

the first optimization unit is specifically configured to:

determining pose prior constraint weights according to camera poses of the monocular images from different sources;

9. An electronic device, comprising:

a processor; and

a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the method of any of claims 1 to 7.

10. A computer readable storage medium storing one or more programs which, when executed by an electronic device comprising a plurality of applications, cause the electronic device to perform the method of any of claims 1-7.