CN116295457A - Vehicle vision positioning method and system based on two-dimensional semantic map - Google Patents
- Publication number: CN116295457A (application number CN202211649741.3A)
- Authority: CN (China)
- Legal status: Granted
Classifications
- G01C21/28 — Navigation; navigational instruments specially adapted for navigation in a road network, with correlation of data from several navigational instruments
- G01C21/3804 — Electronic maps specially adapted for navigation; creation or updating of map data
- G06V10/82 — Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
- G06V20/56 — Scenes; scene-specific elements: context or environment of the image exterior to a vehicle, by using sensors mounted on the vehicle
- Y02T10/40 — Climate change mitigation technologies related to transportation: engine management systems
Abstract
The invention provides a vehicle vision positioning method and system based on a two-dimensional semantic map, comprising the following steps: acquiring current sensor pose data and camera image data; performing image perception on the camera image data to obtain visual semantic perception elements; acquiring a two-dimensional semantic map of the environment where the current vehicle is located to obtain map elements; obtaining the heights of the map elements in the two-dimensional semantic map according to the visual semantic perception elements and constructing three-dimensional coordinates of map element sampling points; carrying out mixed data association between the three-dimensional coordinates of the map element sampling points and the visual semantic perception elements; and carrying out nonlinear optimization iteration on the mixed data association result to obtain the optimal current pose, completing the visual positioning of the vehicle. The invention utilizes a lightweight two-dimensional semantic map without elevation information, registers the visual perception result with the semantic map through a hybrid matching mode combining two-dimensional distance transformation and three-dimensional matching, and minimizes the two-dimensional and three-dimensional errors through nonlinear optimization, so that high-precision positioning is achieved.
Description
Technical Field
The invention relates to the technical field of automatic driving, in particular to a vehicle vision positioning method and system based on a two-dimensional semantic map, and provides a corresponding computer terminal and a computer-readable storage medium.
Background
When an automatic driving vehicle executes tasks, the pose (including position and orientation) output by the positioning module is essential information for modules such as planning and control. Currently, three schemes are mainly adopted for automatic driving vehicle positioning: the first combines data from pose sensors such as a high-precision differential GPS and inertial navigation to estimate the vehicle's own pose; the second is based on an existing point cloud map, obtaining local point cloud data through a laser radar or a camera and registering it with the global point cloud map to estimate the pose; the third is based on an existing semantic map, perceiving the surrounding environment from camera images or laser radar point clouds and comparing it with a global high-precision semantic map to estimate rotation and translation.
However, the first solution requires a GPS ground base station as well as expensive inertial navigation equipment, and its performance degrades when the GPS signal is blocked in non-open areas; the second solution needs a dense point cloud map, whose construction, storage, transmission and updating costs are high, so it cannot be applied to large-scale scenes; the third solution, which aligns the local visual perception result with a high-precision semantic map for pose estimation, generally has the following problems:
1. The existing visual semantic locating schemes are all based on three-dimensional high-precision semantic maps, which must contain the longitude, latitude and altitude of map elements. However, three-dimensional high-precision maps are limited by regulations and laws on geographic-information confidentiality, and, considering the difficulty of acquiring accurate height information, are difficult to build and apply at large scale in the automatic driving field;
2. the existing visual semantic locating schemes need explicit data association between semantic perception and map elements, are prone to mismatching where the local perception is not perfectly consistent with the map, and their locating effect is poor;
3. with the development of neural networks, visual perception can output three-dimensional depth information of semantic elements, but the existing visual positioning schemes are based only on two-dimensional image perception results and do not fully utilize the output of the perception network.
Disclosure of Invention
The invention provides a vehicle vision positioning method and system based on a two-dimensional semantic map, and provides a corresponding computer terminal and a computer readable storage medium.
According to one aspect of the present invention, there is provided a vehicle visual localization method based on a two-dimensional semantic map, comprising:
Acquiring current sensor pose data and camera image data;
performing image sensing on the camera image data to obtain visual semantic sensing elements;
acquiring a two-dimensional semantic map of the environment where the current vehicle is located, and obtaining map elements;
according to the visual semantic perception elements, the heights of map elements in the two-dimensional semantic map are obtained, and three-dimensional coordinates of map element sampling points are constructed;
carrying out mixed data association on the three-dimensional coordinates of the map element sampling points and the visual semantic perception elements;
and carrying out nonlinear optimization iteration based on the mixed data association result to obtain the optimal current pose, and completing the visual positioning of the vehicle.
Optionally, the acquiring current sensor pose data and camera image data, wherein the acquiring the sensor pose data includes:
and acquiring position data and attitude data by adopting a global pose sensor or a relative pose sensor to obtain the sensor pose data.
Optionally, the image sensing of the camera image data to obtain a visual semantic sensing element includes:
performing image sensing on the obtained camera image data by using a deep learning neural network to obtain visual semantic sensing elements; wherein:
The visual semantic perception element comprises: semantic information and geometric information; wherein the semantic information includes: category label information and basic semantic attribute information of the road elements, including: the sign type, the dashed/solid attribute of the lane line and the color attribute of the lane line; the geometric information includes: geometric information of the two-dimensional image space and geometric information of the three-dimensional space.
Optionally, the acquiring the two-dimensional semantic map of the environment where the current vehicle is located includes:
inquiring a two-dimensional semantic map in a specified search range around the position based on position data in the sensor pose data to obtain a plurality of map elements around the current position of the vehicle; wherein:
the map element includes: lane lines, stop lines, road signs, lamp posts and signboards; the lane lines are necessary map elements and are used for providing stable transverse pose constraint for pose estimation; one or more of the other map elements are optional map elements for providing longitudinal pose constraints for pose estimation.
Optionally, the obtaining the height of the map element in the two-dimensional semantic map according to the visual semantic perception element, and constructing the three-dimensional coordinates of the map element sampling point includes:
Estimating the road surface by adopting a robust estimation mode according to the visual semantic perception elements;
and converting the coordinates of the map elements in the two-dimensional semantic map into a local coordinate system, and calculating the three-dimensional coordinates of the map element sampling points with heights by combining the estimation results of the road surface.
Optionally, the performing hybrid data association on the three-dimensional coordinates of the map element sampling points and the visual semantic perception elements includes:
the visual semantic perception element and the three-dimensional coordinates of the map element sampling points are clearly associated in a three-dimensional space, and abnormal map elements in the map elements are removed;
performing fuzzy association based on distance transformation on the three-dimensional coordinates of the visual semantic perception element and the map element sampling points in a two-dimensional image space;
and combining the clear association result and the fuzzy association result to obtain a mixed data association result.
Optionally, the explicitly associating the visual semantic perception element with the three-dimensional coordinates of the map element sampling point in the three-dimensional space, and removing the abnormal map element at the same time includes:
calculating a three-dimensional matching error between the visual semantic perception element and the map element;
According to the three-dimensional matching error, calculating a three-dimensional space optimal matching pair, and carrying out clear association;
and finding out invalid map elements and eliminating the map elements.
Optionally, the performing fuzzy association based on distance transformation on the three-dimensional coordinates of the visual semantic perception element and the map element sampling point in the two-dimensional image space includes:
performing distance transformation on the visual semantic perception elements to obtain a distance transformation graph; wherein, in the distance transformation graph, each pixel value represents the distance from the pixel point to the nearest target, and the larger the pixel value is, the farther the distance is;
and projecting the three-dimensional coordinates of the map element sampling points into a two-dimensional image space by adopting a projective transformation model, calculating the pixel coordinates of the pixel points corresponding to the map element sampling points after projection, obtaining the pixel values corresponding to the pixel coordinates in the distance transformation graph, obtaining the nearest targets of the pixel points, and carrying out fuzzy association.
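As a non-authoritative sketch of this fuzzy-association step, the distance transformation graph and the projection of map sampling points can be realised with SciPy and NumPy; the function names, the pinhole camera model and the use of `distance_transform_edt` are assumptions of this illustration, not the patent's implementation:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def build_distance_map(target_mask):
    """Distance transformation of a binary perception mask: each pixel
    holds the Euclidean distance to the nearest target pixel, so a
    larger value means farther from any perceived element."""
    # distance_transform_edt measures distance to the nearest zero,
    # so feed it the inverted mask.
    return distance_transform_edt(target_mask == 0)

def project_samples(K, T_cam_local, pts_local):
    """Project Nx3 map-element sampling points (local frame) into pixel
    coordinates with a pinhole projective transformation model."""
    pts_h = np.hstack([pts_local, np.ones((len(pts_local), 1))])
    pts_cam = (T_cam_local @ pts_h.T).T[:, :3]
    uv = (K @ pts_cam.T).T
    return uv[:, :2] / uv[:, 2:3]   # normalise by depth
```

Looking up the distance map at the projected pixel coordinates then yields, for every sampling point, the distance to its nearest perceived target, which is exactly the fuzzy-association score described above.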
Optionally, performing nonlinear optimization iteration based on the mixed data association result to obtain an optimal current pose, including:
and constructing two-dimensional and three-dimensional errors by taking the current pose as an estimated state quantity and taking the mixed data association result as an error item by adopting a nonlinear least square method, and carrying out iterative estimation to obtain the optimal current pose.
Wherein, constructing the two-dimensional and three-dimensional errors as error terms from the data association result and obtaining the optimal current pose by iterative estimation comprises the following steps:
constructing a nonlinear optimization model, wherein the nonlinear optimization model takes rotation and translation of the current pose as state quantity to be estimated and takes the current sensor pose data as an initial value of the state quantity;
for a three-dimensional clear correlation result in the mixed data correlation result, adding a three-dimensional matching error into the three-dimensional clear correlation result as a residual error item in the nonlinear optimization model;
for the two-dimensional fuzzy association result in the mixed data association result, adding the pixel value of the pixel coordinate of the pixel point corresponding to the projected map element sampling point in the distance transformation chart as another residual error item in the nonlinear optimization model;
and based on a nonlinear optimization model added with two residual terms, iteratively solving the least square problem of the nonlinear optimization model, and estimating the optimal state of the current pose.
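As an illustrative sketch (not the patent's implementation), the two residual terms can be stacked into one nonlinear least-squares problem; here the state is reduced to a planar pose (tx, ty, yaw) and SciPy's `least_squares` plays the role of the iterative solver:

```python
import numpy as np
from scipy.ndimage import map_coordinates
from scipy.optimize import least_squares

def pose_residuals(state, map_pts, percep_pts, dist_map, fuzzy_pts, w2d=1.0):
    """Stacked residual vector for the mixed data association result.

    state      : (tx, ty, yaw) planar pose to be estimated
    map_pts    : Nx2 map samples with an explicit (clear) match
    percep_pts : Nx2 matched perception points
    dist_map   : HxW distance transformation graph
    fuzzy_pts  : Mx2 map samples scored only through the distance map
    """
    tx, ty, yaw = state
    R = np.array([[np.cos(yaw), -np.sin(yaw)],
                  [np.sin(yaw),  np.cos(yaw)]])
    # residual term 1: matching error of the clear association
    r_clear = ((map_pts @ R.T) + [tx, ty] - percep_pts).ravel()
    # residual term 2: distance-map value at each projected sampling point
    proj = (fuzzy_pts @ R.T) + [tx, ty]
    r_fuzzy = w2d * map_coordinates(dist_map, [proj[:, 1], proj[:, 0]],
                                    order=1, mode='nearest')
    return np.concatenate([r_clear, r_fuzzy])

# iterative estimation, starting from the (noisy) sensor pose as initial value:
# best_pose = least_squares(pose_residuals, x0_sensor_pose, args=(...)).x
```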
According to another aspect of the present invention, there is provided a vehicle vision positioning system based on a two-dimensional semantic map, comprising:
the data acquisition module is used for acquiring current sensor pose data and camera image data;
The perception element acquisition module is used for carrying out image perception on the camera image data to obtain visual semantic perception elements;
the map element acquisition module is used for acquiring a two-dimensional semantic map of the environment where the current vehicle is located to obtain map elements;
the three-dimensional coordinate construction module is used for acquiring the height of the map element in the two-dimensional semantic map according to the visual semantic perception element and constructing the three-dimensional coordinates of the map element sampling points;
the mixed data association module is used for carrying out mixed data association on the three-dimensional coordinates of the map element sampling points and the visual semantic perception elements;
and the vehicle vision positioning module is used for carrying out nonlinear optimization iteration on the mixed data association result to obtain the optimal current pose and finish vehicle vision positioning.
According to a third aspect of the present invention there is provided a computer terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program being operable to perform the method of any one of the above, or to run the system of the above.
According to a fourth aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor is operable to perform a method of any of the above, or to run a system as described above.
Due to the adoption of the technical scheme, compared with the prior art, the invention has at least one of the following beneficial effects:
the vehicle vision positioning method and system based on the two-dimensional semantic map provided by the invention can perform high-precision positioning by only using the lightweight semantic map (two-dimensional semantic map), reduce the construction, storage, transmission and updating costs of the map required by positioning, are easier to obtain, and are suitable for large-scale establishment and application in the field of automatic driving.
The vehicle vision positioning method and system based on the two-dimensional semantic map provided by the invention only use the two-dimensional semantic map, do not require map elements to carry elevation information, and can more easily meet the confidentiality requirements and legal requirements of various countries and various scenes, so that the automatic driving positioning function based on the high-precision map can rapidly cover a larger area range.
According to the vehicle vision positioning method and system based on the two-dimensional semantic map provided by the invention, mixed data association and nonlinear pose optimization are carried out by combining two-dimensional image-space distance transformation with three-dimensional space matching, and both clear association and fuzzy association are performed between local perception elements and map elements, so that various road scenes can be handled, and accurate positioning results can be obtained especially in scenes with uneven lane line shapes, separation, merging, intersections and the like.
The vehicle vision positioning method and system based on the two-dimensional semantic map combines the two-dimensional image coordinates and the three-dimensional height information output by the neural network, so as to realize a more stable and accurate positioning function.
According to the vehicle vision positioning method and system based on the two-dimensional semantic map, only a low-precision pose sensor and a camera are used, so that high-precision positioning is completed at low cost and commercial popularization can proceed more quickly.
According to the vehicle vision positioning method and system based on the two-dimensional semantic map, the road elements are perceived through the neural network, so that a three-dimensional perception result can be generated, the accuracy is higher and more stable, and the prediction process of the neural network has better real-time performance.
According to the vehicle vision positioning method and system based on the two-dimensional semantic map, map elements and perception elements are matched through three-dimensional space and distance transformation, so that the calculated amount is greatly reduced, and the vehicle vision positioning method and system based on the two-dimensional semantic map have higher stability.
According to the vehicle vision positioning method and system based on the two-dimensional semantic map, provided by the invention, the two-dimensional and three-dimensional results of vision perception are fully utilized, a local lane line curve is not required to be established in an additional mode, the flow is simpler and more convenient, and the effect is more stable under complex lane line shapes and scenes.
The vehicle vision positioning method and system based on the two-dimensional semantic map provided by the invention can handle all road scenes based on the more direct minimization of two-dimensional and three-dimensional matching errors; based on data association combining three-dimensional matching and two-dimensional distance transformation, curved and even irregularly-shaped lane lines can be processed, and all observation information in the field of view is fully utilized.
The vehicle vision positioning method and system based on the two-dimensional semantic map can use the camera and the low-precision pose sensor, and is lower in cost and calculation amount, and the obtained semantic information is more abundant.
The vehicle vision positioning method and system based on the two-dimensional semantic map, provided by the invention, combine the three-dimensional space matching and the mixed data association under the two-dimensional distance transformation, do not need to establish a local map, and can also ensure the positioning stability when the local environment perception is not completely consistent with the high-precision semantic map (for example, the geographic scene is changed and the map is not updated in time).
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
FIG. 1 is a workflow diagram of a vehicle visual localization method based on a two-dimensional semantic map according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the image distance transformation effect according to a preferred embodiment of the present invention, comprising (a) lane line perception sampling points drawn on the camera image and (b) the correspondingly generated distance transformation graph.
Fig. 3 is a schematic diagram of the components of a vehicle vision positioning system based on a two-dimensional semantic map according to an embodiment of the present invention.
Detailed Description
The following describes embodiments of the present invention in detail: the embodiment is implemented on the premise of the technical scheme of the invention, and detailed implementation modes and specific operation processes are given. It should be noted that variations and modifications can be made by those skilled in the art without departing from the spirit of the invention, which falls within the scope of the invention.
The embodiment of the invention provides a vehicle vision positioning method based on a two-dimensional semantic map, which utilizes a lightweight two-dimensional semantic map without elevation information, registers the visual perception result with the semantic map through a hybrid matching mode combining two-dimensional distance transformation and three-dimensional matching, and performs high-precision positioning by minimizing the two-dimensional and three-dimensional errors through nonlinear optimization, thereby realizing visual positioning for unmanned vehicles.
As shown in fig. 1, the vehicle visual positioning method based on the two-dimensional semantic map provided in this embodiment may include:
s1, acquiring current sensor pose data and camera image data;
s2, performing image sensing on the camera image data to obtain visual semantic sensing elements;
s3, acquiring a two-dimensional semantic map of the environment where the current vehicle is located, and obtaining map elements;
s4, acquiring the height of the map element in the two-dimensional semantic map according to the visual semantic perception element, and constructing the three-dimensional coordinates of the map element sampling points;
s5, carrying out mixed data association on the three-dimensional coordinates of the map element sampling points and the visual semantic perception elements;
and S6, carrying out nonlinear optimization iteration on the mixed data association result to obtain the optimal current pose, and completing the visual positioning of the vehicle.
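The flow S1–S6 above can be summarised as the following skeleton; every function name here is a placeholder injected as a callable, not an API defined by the patent:

```python
def localize(sensor_pose, image, map_db,
             perceive, query_map, lift_to_3d, hybrid_associate, optimize_pose):
    """Run one visual-positioning cycle over the sub-steps S2-S6
    (S1, data acquisition, supplies sensor_pose and image)."""
    percep = perceive(image)                       # S2: image perception
    map_elems = query_map(map_db, sensor_pose)     # S3: 2-D semantic map query
    samples_3d = lift_to_3d(map_elems, percep)     # S4: heights + 3-D sampling points
    assoc = hybrid_associate(samples_3d, percep)   # S5: mixed data association
    return optimize_pose(sensor_pose, assoc)       # S6: nonlinear optimization
```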
In a preferred embodiment of S1, acquiring current sensor pose data may include:
and acquiring current position data and attitude data by adopting a global pose sensor or a relative pose sensor to obtain the current sensor pose data.
In one specific application example, the global pose sensor employs a low-precision global pose sensor, such as a single-point GPS, and the sensor pose data is obtained directly.
In one specific application example, the relative pose sensor employs a low-precision relative pose sensor, such as a consumer-grade IMU and a vehicle wheel-speed odometer. Further, in a scene where the initial global pose is known, a low-precision relative pose sensor is used, and the sensor pose data is obtained from the known initial global pose and the sensor's relative pose data.
In a preferred embodiment of S2, performing image sensing on the camera image data to obtain visual semantic sensing elements may include:
performing image sensing on the obtained camera image data by using a deep learning neural network to obtain visual semantic sensing elements of the road elements; wherein:
the visual semantic perception element comprises: semantic information and geometric information; wherein the semantic information includes: category label information and basic semantic attribute information of the road elements, including: the dashed/solid attribute of the lane line, the color attribute of the lane line and the sign category; the geometric information includes: geometric information of the two-dimensional image space and geometric information of the three-dimensional space, such as spatial position coordinates and point/line/plane shapes.
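A minimal container matching this description might look as follows (the field names are illustrative, not defined by the patent):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class PerceptionElement:
    """One visual semantic perception element: semantic information
    (category label plus basic attributes) and geometric information
    (2-D image-space and 3-D-space geometry)."""
    label: str                                      # e.g. "lane_line", "sign"
    attributes: Dict[str, str] = field(default_factory=dict)  # dashed/solid, color, sign category
    points_2d: List[Tuple[float, float]] = field(default_factory=list)         # pixel coordinates
    points_3d: List[Tuple[float, float, float]] = field(default_factory=list)  # 3-D coordinates
```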
In a preferred embodiment of S3, acquiring the two-dimensional semantic map of the environment in which the current vehicle is located may include:
Inquiring a two-dimensional semantic map in a specified search range around a position based on position data in sensor pose data to obtain a plurality of map elements around the current position of the vehicle; wherein:
the map element includes: common road elements such as lane lines, stop lines, road signs, lamp posts, signboards and the like, wherein the lane lines are essential elements, are most common on a structured road and are used for providing stable transverse pose constraint for pose estimation; the other elements are optional elements, including at least one of them, for providing longitudinal pose constraints for pose estimation.
In a preferred embodiment of S4, according to the visual semantic perception element, acquiring the height of the map element in the two-dimensional semantic map, and constructing the three-dimensional coordinates of the map element sampling point may include:
s41, estimating the road surface by adopting a robust estimation mode according to the visual semantic perception elements;
s42, converting coordinates of map elements in the two-dimensional semantic map into a local coordinate system, and calculating three-dimensional coordinates of map element sampling points with heights by combining the estimation result of the road surface.
In a specific application example, the transformation mode of transforming the coordinates of the map elements into the local coordinate system may be a multiplication transformation matrix.
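For instance (a sketch under the assumption of homogeneous coordinates; the names are illustrative), attaching the estimated road-surface height and multiplying by a 4x4 transformation matrix yields the three-dimensional sampling-point coordinates in the local frame:

```python
import numpy as np

def map_samples_to_local(T_local_world, pts_2d, heights):
    """Lift 2-D map-element sampling points to 3-D using the estimated
    road-surface heights, then convert them into the local coordinate
    system by multiplying a homogeneous transformation matrix."""
    pts = np.column_stack([np.asarray(pts_2d), heights, np.ones(len(heights))])
    return (T_local_world @ pts.T).T[:, :3]
```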
In a specific application example of S41, the road surface may be estimated by RANSAC and variants thereof.
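A plain RANSAC plane fit is one possible realisation of this robust road-surface estimation (a hypothetical helper; the patent equally allows RANSAC variants):

```python
import numpy as np

def ransac_plane(points, n_iters=200, tol=0.05, seed=None):
    """Fit a road-surface plane n.p + d = 0 to Nx3 candidate ground
    points with plain RANSAC, ignoring outliers such as points on
    vehicles or curbs."""
    rng = np.random.default_rng(seed)
    best_inliers, best_model = 0, None
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-9:               # degenerate (collinear) sample
            continue
        n /= norm
        d = -n @ sample[0]
        inliers = np.count_nonzero(np.abs(points @ n + d) < tol)
        if inliers > best_inliers:
            best_inliers, best_model = inliers, (n, d)
    return best_model
```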
In a preferred embodiment of S5, performing the mixed data association on the three-dimensional coordinates of the map element sampling points and the visual semantic perception elements may include:
S51, explicitly associating the visual semantic perception elements with the three-dimensional coordinates of the map element sampling points in three-dimensional space, and removing abnormal map elements from the map elements;
S52, performing fuzzy association based on distance transformation between the visual semantic perception elements and the three-dimensional coordinates of the map element sampling points in the two-dimensional image space;
S53, combining the explicit association result and the fuzzy association result to obtain a mixed data association result.
In a preferred embodiment of S51, explicitly associating the visual semantic perception element with the three-dimensional coordinates of the map element sampling points in the three-dimensional space, and removing the abnormal map element may include:
S511, calculating a three-dimensional matching error between the visual semantic perception elements and the map elements;
S512, calculating the optimal matching pairs in three-dimensional space according to the three-dimensional matching error, and performing explicit association;
S513, finding invalid map elements and eliminating them.
In a specific application example of S511, the matching error may be designed based on the characteristics of the various kinds of semantic information in the visual semantic perception elements. For example: for lane lines, the transverse distance between the two lines may be considered; for road markings, the intersection-over-union of their rectangular envelope boxes may be considered; and for lamp posts, the point-to-point distance between their grounding points may be considered. The specific matching error terms in each case can be flexibly designed according to the output form of semantic perception and the storage form of the elements in the high-precision semantic map.
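The three error designs named above can be sketched as follows (a hypothetical illustration: `lateral_distance`, `rect_iou`, and `point_distance` are names introduced here, and the polyline/rectangle representations are assumptions about the perception and map storage forms):

```python
import numpy as np

def lateral_distance(line_pts_a, line_pts_b):
    """Lane-line error: mean lateral (x) offset between two polylines,
    sampling polyline A at polyline B's longitudinal (y) stations --
    a simple stand-in for a curve-to-curve distance."""
    xa = np.interp(line_pts_b[:, 1], line_pts_a[:, 1], line_pts_a[:, 0])
    return float(np.mean(np.abs(xa - line_pts_b[:, 0])))

def rect_iou(a, b):
    """Road-marking error: intersection-over-union of two axis-aligned
    rectangles given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def point_distance(p, q):
    """Lamp-post error: Euclidean distance between the two grounding points."""
    return float(np.linalg.norm(np.asarray(p, float) - np.asarray(q, float)))
```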
In a specific application example of S512, a graph matching scheme may be used to obtain the optimal matching pairs, for example, the Hungarian matching algorithm.
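For instance, with SciPy's `linear_sum_assignment` (one implementation of this matching step; the cost matrix below is hypothetical), the optimal matching pairs can be obtained as:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical 3x3 cost matrix: rows are perception elements, columns are
# map elements, each entry a three-dimensional matching error. The
# Hungarian algorithm returns the assignment with minimum total cost.
cost = np.array([[0.2, 1.5, 2.0],
                 [1.4, 0.1, 1.8],
                 [2.2, 1.7, 0.3]])
rows, cols = linear_sum_assignment(cost)
pairs = list(zip(rows.tolist(), cols.tolist()))
print(pairs)                   # optimal perception-to-map pairing
print(cost[rows, cols].sum())  # total matching error of the pairing
```

Matching pairs whose individual error remains large after assignment can then be rejected in S513.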
In a specific application example of S513, the invalid map elements are found and eliminated by combining the matching errors with the geometric priors of the objects.
In a preferred embodiment of S52, performing the distance-transformation-based fuzzy association between the visual semantic perception elements and the three-dimensional coordinates of the map element sampling points in the two-dimensional image space may include:
S521, performing distance transformation on the visual semantic perception elements to obtain a distance transformation graph; in the distance transformation graph, each pixel value represents the distance from that pixel to the nearest target, and a larger pixel value indicates a greater distance;
S522, projecting the three-dimensional coordinates of the map element sampling points into the two-dimensional image space with a projective transformation model, calculating the pixel coordinates of the projected map element sampling points, reading the pixel values at those pixel coordinates in the distance transformation graph to obtain the distance from each projected point to its nearest target, and performing fuzzy association.
In a specific application example of S521, distance transformation is performed on the lane lines; in the resulting lane-line distance transformation graph, each pixel value represents the distance from that pixel to the nearest lane line, and a higher brightness (i.e., a larger pixel value) indicates that the pixel is farther from the lane line. As shown in fig. 2, an example of the distance transformation corresponding to a lane-line perception result is given, in which (a) shows the lane-line perception sampling points drawn on the camera image, and (b) shows the correspondingly generated distance transformation graph.
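A small illustration of such a lane-line distance transformation, using SciPy's Euclidean distance transform on a toy 5×5 binary perception mask (the mask is hypothetical; a real implementation would operate on the full-resolution segmentation):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Binary perception mask: 1 where a lane-line pixel was detected, 0 elsewhere.
mask = np.zeros((5, 5), dtype=np.uint8)
mask[:, 2] = 1  # a vertical lane line in image column 2

# distance_transform_edt measures the distance to the nearest zero pixel,
# so the mask is inverted: each entry of dist then holds the Euclidean
# distance from that pixel to the nearest lane-line pixel.
dist = distance_transform_edt(1 - mask)
print(dist)
```

Every row of `dist` reads `[2, 1, 0, 1, 2]`: zero on the lane line, growing with distance from it, exactly the brightness pattern described above.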
In a specific application example of S522, each pixel obtained by projecting the high-precision semantic map elements only needs to be fuzzily associated with the visual semantic perception elements; that is, only the distance from that pixel to the nearest target needs to be considered, without deciding which specific target it is associated with. For continuous elements of non-fixed shape such as lane lines, exact point-to-point matching cannot be achieved, and fuzzy association better handles shape changes, irregular shapes, partial perception, and cases where the perception is not fully consistent with the map.
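The projective transformation model of S522 can be sketched as a standard pinhole projection (the intrinsic matrix K below is hypothetical, and lens distortion is ignored for brevity):

```python
import numpy as np

def project_points(pts_cam, K):
    """Project 3D points in the camera frame (N x 3, z pointing forward)
    to pixel coordinates with a pinhole intrinsic matrix K."""
    pts = np.asarray(pts_cam, dtype=float)
    uv = (K @ pts.T).T
    return uv[:, :2] / uv[:, 2:3]  # divide by depth

# Hypothetical intrinsics: fx = fy = 500, principal point (320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
print(project_points([[1.0, 0.5, 10.0]], K))  # -> [[370., 265.]]
```

The resulting pixel coordinates are then used to read the distance transformation graph, which directly yields the fuzzy-association distance for that sampling point.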
In a preferred embodiment of S6, performing nonlinear optimization iteration on the mixed data association result to obtain an optimal current pose may include:
and constructing a two-dimensional and three-dimensional error by taking the current pose as an estimated state quantity and taking a mixed data associated result as an error item by adopting a nonlinear least square method, and carrying out iterative estimation to obtain the optimal current pose.
Further, adopting the nonlinear least squares method, taking the current pose as the estimated state quantity, constructing the two-dimensional and three-dimensional errors from the mixed data association result as error terms, and obtaining the optimal current pose by iterative estimation may include:
S61, constructing a nonlinear optimization model that takes the rotation and translation of the current pose as the state quantity to be estimated and takes the current sensor pose data as the initial value of the state quantity;
S62, for the three-dimensional explicit association results among the mixed data association results, adding the three-dimensional matching errors as residual terms in the nonlinear optimization model;
S63, for the two-dimensional fuzzy association results among the mixed data association results, adding the pixel values, in the distance transformation graph, at the pixel coordinates of the projected map element sampling points as further residual terms in the nonlinear optimization model;
S64, iteratively solving the least squares problem of the nonlinear optimization model with both residual terms added, and estimating the optimal state of the current pose.
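A simplified sketch of the S61–S64 iteration using SciPy's `least_squares`, reduced to a planar pose (x, y, yaw) and point-to-point residuals only, with synthetic data standing in for the associated perception and map elements (the two-dimensional distance-transform residual term is omitted for brevity):

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(pose, map_pts, perceived_pts):
    """S62-style matching residuals (2D here for brevity): transform the
    map points by the candidate pose and compare with the associated
    perception points."""
    x, y, yaw = pose
    R = np.array([[np.cos(yaw), -np.sin(yaw)],
                  [np.sin(yaw),  np.cos(yaw)]])
    pred = map_pts @ R.T + [x, y]
    return (pred - perceived_pts).ravel()

# Ground-truth pose, used only to synthesize perception data for this sketch.
true_pose = np.array([1.0, -0.5, 0.1])
map_pts = np.array([[0., 0.], [5., 0.], [5., 3.], [0., 3.]])
R = np.array([[np.cos(true_pose[2]), -np.sin(true_pose[2])],
              [np.sin(true_pose[2]),  np.cos(true_pose[2])]])
perceived = map_pts @ R.T + true_pose[:2]

# S64: iteratively solve the least squares problem from a rough initial pose.
sol = least_squares(residuals, x0=[0., 0., 0.], args=(map_pts, perceived))
print(sol.x)  # converges to approximately [1.0, -0.5, 0.1]
```

In the full method the state is a 6-DoF pose, the initial value comes from the low-precision sensor, and a second residual block reads interpolated values from the distance transformation graph.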
The technical scheme provided by the embodiment of the invention is further described below with reference to the accompanying drawings.
The vehicle vision positioning method based on the two-dimensional semantic map provided by the embodiment of the invention comprises the following steps of:
step 1, collecting current sensor pose data and image data output by a camera, wherein the sensor pose data comprises position and attitude data output by a low-precision pose sensor. The low-precision pose sensor may be a low-precision global pose sensor, such as a single-point GPS, from which the global pose is obtained directly as the sensor pose data; alternatively, when the initial global pose is known, a low-precision relative pose sensor, such as a consumer-grade IMU (inertial measurement unit) or a vehicle wheel speed sensor, may be used, and a rough global pose is calculated from the initial global pose and the relative pose data as the sensor pose data.
And 2, performing image perception on the image data obtained in step 1 based on a deep learning neural network to obtain visual semantic perception results for road elements such as lane lines, signboards, and lamp posts, wherein the visual semantic perception results comprise semantic information and geometric information. The semantic information comprises the category labels and basic semantic attributes of the road elements, such as the signboard category and the dashed/solid and color attributes of lane lines; the geometric information includes geometric information of the two-dimensional image space and geometric information of three-dimensional space. The result of semantic perception is hereinafter referred to as a perception element.
And 3, inquiring a two-dimensional semantic map (high-precision semantic map) in a specified search range near the position according to the vehicle position data obtained in the step 1, and obtaining map elements of a plurality of two-dimensional semantic maps (high-precision semantic maps) around the vehicle, which are hereinafter referred to as map elements. The map elements comprise common road elements such as lane lines, stop lines, road signs, lamp posts, signboards and the like, wherein: the lane lines are essential elements, are most common on a structured road, and can provide stable transverse pose constraint for pose estimation; the other elements are optional elements, at least one of which may provide a longitudinal pose constraint for pose estimation.
And 4, estimating the heights of the map elements in the two-dimensional semantic map to obtain the three-dimensional coordinates of the map element sampling points. Further, the method comprises the following steps:
and 4.1, estimating the road surface by adopting a robust estimation mode according to the perception element obtained in the step 2. Implementations may estimate the road surface by RANSAC and variants thereof.
And 4.2, converting the coordinates of the map elements obtained in the step 3 into a local coordinate system, and calculating the three-dimensional coordinates of the map element sampling points with heights by combining the road surface estimated in the step 4.1.
And 5, associating the three-dimensional coordinates of the sensing element obtained in the step 2 and the map element around the vehicle obtained in the step 4 by combining a mixed data association mode of three-dimensional space and two-dimensional space distance transformation. Further, the method comprises the following steps:
step 5.1, the three-dimensional coordinates of the sensing element obtained in step 2 and the map element sampling point obtained in step 4 are explicitly associated in a three-dimensional space based on semantic information and three-dimensional matching errors (distances between the sensing element and the map element). At the same time, some abnormal map elements are removed. Further, the method comprises the following steps:
and 5.1.1, calculating the three-dimensional matching error between the sensing element and the map element. In particular implementations, the three-dimensional matching error may be designed based on characteristics of various semantic elements, such as: the lane lines may take into account the lateral distance between two lines in three-dimensional space, the pavement marker may take into account the intersection ratio of its rectangular envelope, and the light pole may take into account the three-dimensional point-to-point distance between its ground points. The specific matching error item of each class instance can be flexibly designed according to the output form of the sensing element and the storage form of the map element. In a specific application example, the output form of the sensing element of the lane line and the storage form of the map element are taken as a curve, so that the matching error item is designed as a three-dimensional space line-to-line error; while other elements may be in the form of points, rectangles, line segments, etc., the corresponding match error term may be designed as a point-to-point, rectangle-to-rectangle, line segment-to-line segment error.
And 5.1.2, calculating the optimal matching pairs according to the matching errors calculated in step 5.1.1. Specific implementations may employ a graph matching scheme, such as the Hungarian matching algorithm.
And 5.1.3, finding invalid map elements and eliminating them. Implementations may combine the matching errors calculated in step 5.1.1 with the geometric priors of the objects.
And 5.2, performing fuzzy association based on distance transformation on the sensing element obtained in the step 2 and the three-dimensional coordinates of the map element sampling points obtained in the step 4 in a two-dimensional image space. Further, the method comprises the following steps:
and 5.2.1, performing distance transformation on the sensing elements obtained in the step 2 to obtain a distance transformation graph. In the distance map, each pixel value represents the distance of the pixel to the nearest target, and a larger pixel value indicates a greater distance. For example, in a lane-line distance map, each pixel value represents the closest lane-line distance from the pixel point, and a higher luminance (i.e., a larger pixel value) indicates that the pixel point is farther from the lane line. Fig. 2 shows an exemplary diagram of distance transformation corresponding to lane line perception results.
And 5.2.2, projecting the three-dimensional coordinates of the map element sampling points obtained in step 4 into the two-dimensional image space through a projective transformation model, calculating the corresponding projected pixel coordinates, and reading the pixel values at those coordinates in the distance transformation graph obtained in step 5.2.1. Each pixel obtained by projecting the high-precision semantic map elements only needs to be fuzzily associated with the perception result: only the distance from that point to the nearest target needs to be considered, without deciding which specific target it is associated with. For continuous elements of non-fixed shape such as lane lines, exact point-to-point matching cannot be achieved, and fuzzy association better handles shape changes, irregular shapes, partial perception, and cases where the perception is not fully consistent with the map.
And 6, under a nonlinear optimization framework, based on the least squares method, taking the current pose as the estimated state quantity and the data association obtained in step 5 as error terms, constructing two-dimensional and three-dimensional errors and iteratively estimating the optimal pose. Further, the method comprises the following steps:
and 6.1, constructing a nonlinear optimization model, taking rotation and translation of the current pose as state quantity to be estimated, and taking the rough global pose obtained in the step 1 as an initial value of the state quantity. Relevant optimization parameters such as matrix solver, gradient descent mode and robust kernel function are set. Wherein the nonlinear optimization framework can adopt a mainstream nonlinear optimization open source library, such as G2O, CERES and the like.
And 6.2, adding the geometric matching error of each matching pair in the three-dimensional matching result obtained by the explicit association in step 5.1 as a residual term in the nonlinear optimization problem. The design of the geometric matching error term is the same as that of the matching error term in step 5.1.1.
And 6.3, for the distance transformation graph obtained in the fuzzy association of step 5.2 and the projected pixel coordinates of the map elements, adding the pixel value of each projected pixel coordinate in the distance transformation graph as another residual term in the nonlinear optimization problem.
And 6.4, iteratively solving a least square problem for the nonlinear optimization problem, and estimating the optimal state of the current pose.
It should be noted that the distance transformation graph obtained in step 5.2.1 is continuously smooth, and the pixel value at any floating-point coordinate can be obtained by interpolation on the graph. The distance transformation residual term added in step 6.3 is therefore also continuously smooth, as is its derivative with respect to the state quantity, which is friendly to the iterative solution of the nonlinear least squares problem.
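The interpolation mentioned here can be sketched as plain bilinear interpolation over the distance transformation graph (the function name and the 2×2 patch are illustrative):

```python
import numpy as np

def bilinear(dist_map, u, v):
    """Bilinearly interpolate a distance-transform map at a floating-point
    pixel coordinate (u = column, v = row), so that the fuzzy-association
    residual and its derivative stay continuous during optimization."""
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    du, dv = u - u0, v - v0
    d = dist_map
    return ((1 - du) * (1 - dv) * d[v0, u0] + du * (1 - dv) * d[v0, u0 + 1] +
            (1 - du) * dv * d[v0 + 1, u0] + du * dv * d[v0 + 1, u0 + 1])

# Hypothetical 2x2 patch of a distance map.
patch = np.array([[0.0, 1.0],
                  [2.0, 3.0]])
print(bilinear(patch, 0.5, 0.5))  # 1.5: the average of the four neighbors
```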
Based on the technical scheme provided by the embodiment of the invention, the following is further described:
A high-precision semantic map is a high-precision map containing only sparse key elements, in which the map elements carry the semantic information of objects. A high-precision semantic map for automatic driving generally comprises elements such as lane lines, road edges, signboards, and traffic lights; the element semantic information may comprise information such as lane line attributes, colors, and signboard types, and the element geometric information may be expressed as parametric equations, sparse sampling points, and the like. The semantic information of the semantic map provides an important prior for automatic driving positioning, planning, and control tasks, occupies little space, and is convenient to store and transmit.
SLAM (simultaneous localization and mapping) takes data from sensors such as cameras, inertial navigation units, and radars as input, estimates the position of the sensor in the environment, and can build a local map. A SLAM framework comprises technologies such as feature extraction, data association, calibration, and state estimation.
The neural network is a network system formed by connecting a large number of basic units and having strong learning ability, and is widely used in the fields of automatic driving, natural language processing and the like.
Distance transformation is an image transformation algorithm that converts an image into another representation, with the pixel value of each pixel in the transformed image representing the distance of the point to the nearest particular object.
Nonlinear optimization is a method for estimating the optimal state of a nonlinear system based on a least square method under the influence of noise.
RANSAC is a random sample consensus algorithm used to iteratively estimate a mathematical model in noisy data.
The hungarian algorithm is a combined optimization algorithm for solving task allocation problems in polynomial time, and is commonly used for solving the optimal matching relationship between two groups of targets.
The embodiment of the invention provides a vehicle vision positioning system based on a two-dimensional semantic map.
As shown in fig. 3, the vehicle vision positioning system based on the two-dimensional semantic map provided by this embodiment may include:
the data acquisition module is used for acquiring current sensor pose data and camera image data;
the perception element acquisition module is used for carrying out image perception on the camera image data to obtain visual semantic perception elements;
the map element acquisition module is used for acquiring a two-dimensional semantic map of the environment where the current vehicle is located to obtain map elements;
the three-dimensional coordinate construction module acquires the height of the map element in the two-dimensional semantic map according to the visual semantic perception element and constructs the three-dimensional coordinate of the map element sampling point;
the mixed data association module is used for carrying out mixed data association on the three-dimensional coordinates of the map element sampling points and the visual semantic perception elements;
and the vehicle vision positioning module is used for carrying out nonlinear optimization iteration on the mixed data association result to obtain the optimal current pose and finish the vehicle vision positioning.
It should be noted that, the steps in the method provided by the present invention may be implemented by using corresponding modules in the system, and those skilled in the art may refer to a technical solution of the method to implement the composition of the system, that is, the embodiment in the method may be understood as a preferred embodiment for constructing the system, which is not described herein.
An embodiment of the present invention provides a computer terminal including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the program, is operative to perform the method or operate the system of any of the foregoing embodiments of the present invention.
Optionally, the memory is used for storing a program. The memory may include volatile memory, such as random-access memory (RAM), e.g., static random-access memory (SRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM); the memory may also include non-volatile memory, such as flash memory. The memory is used to store computer programs (e.g., application programs and functional modules implementing the methods described above), computer instructions, and the like, which may be stored in one or more memories in a partitioned manner and may be invoked by the processor.
And a processor for executing the computer program stored in the memory to implement the steps in the method or the modules of the system according to the above embodiments. Reference may be made in particular to the description of the previous method and system embodiments.
The processor and the memory may be separate structures or may be integrated structures that are integrated together. When the processor and the memory are separate structures, the memory and the processor may be connected by a bus coupling.
An embodiment of the present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, is operative to perform the method of any of the above embodiments of the present invention or to run the system of any of the above embodiments of the present invention.
According to the vehicle vision positioning method and system based on the two-dimensional semantic map provided by the embodiments of the invention, high-precision positioning can be performed using only a lightweight semantic map (the two-dimensional semantic map), which reduces the construction, storage, transmission, and update costs of the map required for positioning, makes the map easier to obtain, and suits large-scale deployment in the automatic driving field. The map elements are not required to carry elevation information, so the security and legal requirements of various countries and scenes are more easily satisfied, and the automatic driving positioning function based on high-precision maps can quickly cover a larger area. The method combines distance transformation in the two-dimensional image space with three-dimensional space matching to perform mixed data association (explicit association and fuzzy association) between the local perception elements and the map elements, together with nonlinear optimization of the pose, so it can cope with various road scenes and obtain accurate positioning results, especially in scenes where lane lines are uneven in shape, split, merge, or intersect. Combining the two-dimensional image coordinates and the three-dimensional height information output by the neural network achieves a more stable and accurate positioning function. Perceiving the road elements with a neural network yields three-dimensional perception results with higher accuracy and stability, and the prediction process of the neural network has good real-time performance. Matching the map elements and the perception elements through three-dimensional space and distance transformation greatly reduces the computational load and improves stability. The two-dimensional and three-dimensional results of visual perception are fully utilized without separately building local lane-line curves, so the pipeline is simpler and more stable under complex lane-line shapes and scenes. All road scenes can be handled based on the more direct minimization of two-dimensional and three-dimensional matching errors; based on the association of three-dimensional matching and two-dimensional distance-transformation data, curved and even irregularly shaped lane lines can be handled, and all observation information within the field of view is fully utilized. High-precision positioning can be completed at lower cost with only a camera and a low-precision pose sensor, enabling faster commercial deployment with lower cost, lower computation, and richer semantic information. Finally, the mixed data association combining three-dimensional space matching and two-dimensional distance transformation does not require building a local map, and positioning stability is maintained when the local environment perception is not fully consistent with the high-precision semantic map (for example, when the geographic scene changes and the map has not been updated in time).
Those skilled in the art will appreciate that the invention provides a system and its individual devices that can be implemented entirely by logic programming of method steps, in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc., in addition to the system and its individual devices being implemented in pure computer readable program code. Therefore, the system and various devices thereof provided by the present invention may be considered as a hardware component, and the devices included therein for implementing various functions may also be considered as structures within the hardware component; means for achieving the various functions may also be considered as being either a software module that implements the method or a structure within a hardware component.
The foregoing embodiments of the present invention are not all well known in the art.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the claims without affecting the spirit of the invention.
Claims (13)
1. A vehicle vision positioning method based on a two-dimensional semantic map, comprising:
Acquiring current sensor pose data and camera image data;
performing image sensing on the camera image data to obtain visual semantic sensing elements;
acquiring a two-dimensional semantic map of the environment where the current vehicle is located, and obtaining map elements;
according to the visual semantic perception elements, the heights of map elements in the two-dimensional semantic map are obtained, and three-dimensional coordinates of map element sampling points are constructed;
carrying out mixed data association on the three-dimensional coordinates of the map element sampling points and the visual semantic perception elements;
and carrying out nonlinear optimization iteration based on the mixed data association result to obtain the optimal current pose, and completing the visual positioning of the vehicle.
2. The two-dimensional semantic map-based vehicle vision positioning method of claim 1, wherein the acquiring current sensor pose data and camera image data, wherein the acquiring sensor pose data comprises:
and acquiring position data and attitude data by adopting a global pose sensor or a relative pose sensor to obtain the sensor pose data.
3. The vehicle visual positioning method based on a two-dimensional semantic map according to claim 1, wherein the performing image sensing on the camera image data to obtain visual semantic sensing elements comprises:
Performing image sensing on the obtained camera image data by using a deep learning neural network to obtain visual semantic sensing elements; wherein:
the visual semantic perception element comprises: semantic information and geometric information; wherein the semantic information includes: category label information and basic semantic attribute information of the road elements, including: the signboard category, the lane line dashed/solid attribute, and the lane line color attribute; the geometric information includes: geometric information of a two-dimensional image space and geometric information of a three-dimensional space.
4. The vehicle visual positioning method based on the two-dimensional semantic map according to claim 1, wherein the acquiring the two-dimensional semantic map of the environment in which the current vehicle is located comprises:
inquiring a two-dimensional semantic map in a specified search range around the position based on position data in the sensor pose data to obtain a plurality of map elements around the current position of the vehicle; wherein:
the map element includes: lane lines, stop lines, road signs, lamp posts and signboards; the lane lines are necessary elements and are used for providing stable transverse pose constraint for pose estimation; one or more of the other map elements are optional map elements for providing longitudinal pose constraints for pose estimation.
5. The vehicle vision positioning method based on a two-dimensional semantic map according to claim 1, wherein the obtaining the height of the map element in the two-dimensional semantic map according to the vision semantic perception element, and constructing the three-dimensional coordinates of the map element sampling points, comprises:
estimating the pavement by adopting a robust estimation mode according to the visual semantic perception elements;
and converting the coordinates of the map elements in the two-dimensional semantic map into a local coordinate system, and calculating the three-dimensional coordinates of the map element sampling points with heights by combining the estimation results of the road surface.
6. The two-dimensional semantic map-based vehicle visual localization method of claim 1, wherein the performing hybrid data association on the three-dimensional coordinates of the map element sampling points and the visual semantic perception elements comprises:
explicitly associating the visual semantic perception elements with the three-dimensional coordinates of the map element sampling points in three-dimensional space, and removing abnormal map elements from the map elements;
performing fuzzy association based on distance transformation between the visual semantic perception elements and the three-dimensional coordinates of the map element sampling points in the two-dimensional image space;
and combining the explicit association result and the fuzzy association result to obtain a mixed data association result.
7. The two-dimensional semantic map-based vehicle vision positioning method according to claim 6, wherein the explicitly associating the vision semantic perception element with the three-dimensional coordinates of the map element sampling points in the three-dimensional space while rejecting abnormal map elements comprises:
calculating a three-dimensional matching error between the visual semantic perception element and the map element;
according to the three-dimensional matching error, calculating the optimal matching pairs in three-dimensional space, and performing explicit association;
and finding out invalid map elements and eliminating the map elements.
8. The two-dimensional semantic map-based vehicle vision positioning method according to claim 6, wherein the performing distance-transformation-based fuzzy association between the visual semantic perception elements and the three-dimensional coordinates of the map element sampling points in two-dimensional image space comprises:
performing distance transformation on the visual semantic perception elements to obtain a distance transformation map, wherein each pixel value in the distance transformation map represents the distance from that pixel to the nearest target, a larger pixel value indicating a greater distance;
and projecting the three-dimensional coordinates of the map element sampling points into the two-dimensional image space by means of a projective transformation model, calculating the pixel coordinates of each projected map element sampling point, reading the pixel value at those pixel coordinates in the distance transformation map to obtain the distance from the pixel to the nearest target, and performing the fuzzy association.
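Claim 8 can be illustrated with SciPy's Euclidean distance transform. The function name `fuzzy_associate` and the use of a pinhole model with intrinsics `K` and a world-to-camera transform `(R, t)` are assumptions for illustration; the patent only requires "a projective transformation model":

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def fuzzy_associate(semantic_mask, map_pts_3d, K, R, t):
    """Distance-transformation-based fuzzy association in image space.

    semantic_mask: (H, W) binary mask of perceived semantic pixels (1 = target).
    map_pts_3d: (N, 3) map element sampling points in the world frame.
    K: 3x3 camera intrinsics; R, t: world-to-camera rotation and translation.
    Returns, per sampling point, the distance (pixels) to the nearest target.
    """
    # Each pixel of the distance map holds the distance to the nearest target.
    dist_map = distance_transform_edt(semantic_mask == 0)

    # Projective transformation: world -> camera -> pixel coordinates.
    pc = R @ map_pts_3d.T + t.reshape(3, 1)   # camera-frame coordinates
    uv = (K @ pc)[:2] / pc[2]                 # perspective division
    u = np.clip(np.round(uv[0]).astype(int), 0, dist_map.shape[1] - 1)
    v = np.clip(np.round(uv[1]).astype(int), 0, dist_map.shape[0] - 1)
    return dist_map[v, u]                     # fuzzy-association residuals
```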
9. The two-dimensional semantic map-based vehicle vision positioning method according to claim 1, wherein the performing nonlinear optimization iteration based on the hybrid data association result to obtain the optimal current pose comprises:
using a nonlinear least squares method with the current pose as the state quantity to be estimated, constructing two-dimensional and three-dimensional errors from the hybrid data association result as error terms, and performing iterative estimation to obtain the optimal current pose.
10. The two-dimensional semantic map-based vehicle vision positioning method according to claim 9, wherein the using a nonlinear least squares method with the current pose as the state quantity to be estimated, constructing two-dimensional and three-dimensional errors from the hybrid data association result as error terms, and performing iterative estimation to obtain the optimal current pose comprises:
constructing a nonlinear optimization model, wherein the nonlinear optimization model takes the rotation and translation of the current pose as the state quantity to be estimated and takes the current sensor pose data as the initial value of the state quantity;
for the three-dimensional explicit association result in the hybrid data association result, adding the three-dimensional matching error to the nonlinear optimization model as one residual term;
for the two-dimensional fuzzy association result in the hybrid data association result, adding the pixel value, in the distance transformation map, at the pixel coordinates of each projected map element sampling point to the nonlinear optimization model as another residual term;
and iteratively solving the least squares problem of the nonlinear optimization model with the two residual terms added, to estimate the optimal state of the current pose.
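The two-residual-term optimization of claim 10 can be sketched with `scipy.optimize.least_squares`. The planar pose parameterization `[yaw, tx, ty]`, the function names, and the externally supplied `project` callback are all illustrative assumptions, not the patent's actual formulation:

```python
import numpy as np
from scipy.optimize import least_squares

def optimize_pose(x0, assoc_3d, dist_map, map_pts, project):
    """Jointly minimize the 3D explicit and 2D fuzzy residual terms.

    x0: initial pose [yaw, tx, ty] taken from the current sensor pose data.
    assoc_3d: list of (perceived_xyz, map_xyz) explicit association pairs.
    dist_map: distance-transformation image supplying the fuzzy residuals.
    map_pts: (N, 3) map element sampling points used by the fuzzy term.
    project: function(pose, pts) -> (N, 2) pixel coordinates (assumed given).
    """
    def residuals(x):
        yaw, tx, ty = x
        c, s = np.cos(yaw), np.sin(yaw)
        Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        r = []
        # Residual term 1: 3D matching error of the explicit associations.
        for p, m in assoc_3d:
            r.extend(Rz @ m + np.array([tx, ty, 0.0]) - p)
        # Residual term 2: distance-map value at each projected map point.
        uv = np.round(project(x, map_pts)).astype(int)
        u = np.clip(uv[:, 0], 0, dist_map.shape[1] - 1)
        v = np.clip(uv[:, 1], 0, dist_map.shape[0] - 1)
        r.extend(dist_map[v, u])
        return np.asarray(r)

    # Iteratively solve the least squares problem from the initial pose.
    return least_squares(residuals, x0).x
```

In practice the fuzzy term would use an interpolated (differentiable) distance map; the nearest-pixel lookup here keeps the sketch short.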
11. A vehicle vision positioning system based on a two-dimensional semantic map, comprising:
the data acquisition module is used for acquiring current sensor pose data and camera image data;
the perception element acquisition module is used for carrying out image perception on the camera image data to obtain visual semantic perception elements;
The map element acquisition module is used for acquiring a two-dimensional semantic map of the environment where the current vehicle is located to obtain map elements;
the three-dimensional coordinate construction module is used for acquiring the height of the map elements in the two-dimensional semantic map according to the visual semantic perception elements and constructing the three-dimensional coordinates of the map element sampling points;
the mixed data association module is used for carrying out mixed data association on the three-dimensional coordinates of the map element sampling points and the visual semantic perception elements;
and the vehicle vision positioning module is used for carrying out nonlinear optimization iteration on the mixed data association result to obtain the optimal current pose and finish vehicle vision positioning.
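The module decomposition of claim 11 might be organized as the following skeleton; the class name, method names, and constructor parameters are all hypothetical:

```python
class VehicleVisualLocalizer:
    """Pipeline mirroring the claimed system modules (illustrative only)."""

    def __init__(self, perception, map_db, associator, optimizer):
        self.perception = perception   # perception element acquisition module
        self.map_db = map_db           # map element acquisition / 3D construction
        self.associator = associator   # hybrid data association module
        self.optimizer = optimizer     # vehicle vision positioning module

    def localize(self, sensor_pose, image):
        elements = self.perception(image)                    # semantic elements
        map_elems = self.map_db.query(sensor_pose)           # 2D semantic map
        pts3d = self.map_db.lift_to_3d(map_elems, elements)  # sampled 3D coords
        assoc = self.associator(pts3d, elements)             # hybrid association
        return self.optimizer(sensor_pose, assoc)            # optimal pose
```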
12. A computer terminal comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, is operable to perform the method of any one of claims 1-10 or to run the system of claim 11.
13. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any one of claims 1-10 or runs the system of claim 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211649741.3A CN116295457B (en) | 2022-12-21 | 2022-12-21 | Vehicle vision positioning method and system based on two-dimensional semantic map |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116295457A true CN116295457A (en) | 2023-06-23 |
CN116295457B CN116295457B (en) | 2024-05-24 |
Family
ID=86813832
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211649741.3A Active CN116295457B (en) | 2022-12-21 | 2022-12-21 | Vehicle vision positioning method and system based on two-dimensional semantic map |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116295457B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105976402A (en) * | 2016-05-26 | 2016-09-28 | 同济大学 | Real scale obtaining method of monocular vision odometer |
CN107545582A (en) * | 2017-07-04 | 2018-01-05 | 深圳大学 | Video multi-target tracking and device based on fuzzy logic |
CN111126182A (en) * | 2019-12-09 | 2020-05-08 | 苏州智加科技有限公司 | Lane line detection method, lane line detection device, electronic device, and storage medium |
CN112258600A (en) * | 2020-10-19 | 2021-01-22 | 浙江大学 | Simultaneous positioning and map construction method based on vision and laser radar |
CN113537208A (en) * | 2021-05-18 | 2021-10-22 | 杭州电子科技大学 | Visual positioning method and system based on semantic ORB-SLAM technology |
CN113920198A (en) * | 2021-12-14 | 2022-01-11 | 纽劢科技(上海)有限公司 | Coarse-to-fine multi-sensor fusion positioning method based on semantic edge alignment |
CN114565674A (en) * | 2022-03-03 | 2022-05-31 | 江苏集萃清联智控科技有限公司 | Pure visual positioning method and device for urban structured scene of automatic driving vehicle |
CN114964236A (en) * | 2022-05-25 | 2022-08-30 | 重庆长安汽车股份有限公司 | Mapping and vehicle positioning system and method for underground parking lot environment |
CN115409910A (en) * | 2021-05-28 | 2022-11-29 | 阿里巴巴新加坡控股有限公司 | Semantic map construction method, visual positioning method and related equipment |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117132727A (en) * | 2023-10-23 | 2023-11-28 | 光轮智能(北京)科技有限公司 | Map data acquisition method, computer readable storage medium and electronic device |
CN117132727B (en) * | 2023-10-23 | 2024-02-06 | 光轮智能(北京)科技有限公司 | Map data acquisition method, computer readable storage medium and electronic device |
CN117330097A (en) * | 2023-12-01 | 2024-01-02 | 深圳元戎启行科技有限公司 | Vehicle positioning optimization method, device, equipment and storage medium |
CN117330097B (en) * | 2023-12-01 | 2024-05-10 | 深圳元戎启行科技有限公司 | Vehicle positioning optimization method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN116295457B (en) | 2024-05-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Liu et al. | High definition map for automated driving: Overview and analysis | |
JP6862409B2 (en) | Map generation and moving subject positioning methods and devices | |
US11360216B2 (en) | Method and system for positioning of autonomously operating entities | |
Wolcott et al. | Visual localization within lidar maps for automated urban driving | |
CN116295457B (en) | Vehicle vision positioning method and system based on two-dimensional semantic map | |
US11340632B2 (en) | Georeferenced trajectory estimation system | |
Ghallabi et al. | LIDAR-Based road signs detection For Vehicle Localization in an HD Map | |
US9989969B2 (en) | Visual localization within LIDAR maps | |
Cappelle et al. | Virtual 3D city model for navigation in urban areas | |
Qu et al. | Landmark based localization in urban environment | |
Chen et al. | Milestones in autonomous driving and intelligent vehicles—part ii: Perception and planning | |
Xiao et al. | Monocular vehicle self-localization method based on compact semantic map | |
CN111351502B (en) | Method, apparatus and computer program product for generating a top view of an environment from a perspective view | |
CN113989450A (en) | Image processing method, image processing apparatus, electronic device, and medium | |
WO2020264222A1 (en) | Image-based keypoint generation | |
Zang et al. | Accurate vehicle self-localization in high definition map dataset | |
WO2020156923A2 (en) | Map and method for creating a map | |
Jeong et al. | Hdmi-loc: Exploiting high definition map image for precise localization via bitwise particle filter | |
Zhu et al. | Fusing GNSS/INS/vision with a priori feature map for high-precision and continuous navigation | |
Tao et al. | Automated processing of mobile mapping image sequences | |
Guan et al. | Detecting visually salient scene areas and deriving their relative spatial relations from continuous street-view panoramas | |
CN112652062A (en) | Point cloud map construction method, device, equipment and storage medium | |
CN114187357A (en) | High-precision map production method and device, electronic equipment and storage medium | |
CN111833443A (en) | Landmark position reconstruction in autonomous machine applications | |
Chiang et al. | Multifusion schemes of INS/GNSS/GCPs/V-SLAM applied using data from smartphone sensors for land vehicular navigation applications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |