CN116026344A - Automobile positioning method and system, storage medium and electronic equipment - Google Patents

Automobile positioning method and system, storage medium and electronic equipment

Info

Publication number: CN116026344A
Application number: CN202211563938.5A
Authority: CN (China)
Prior art keywords: processing, loop, data, automobile, nonlinear optimization
Inventor: 廖学聪 (Liao Xuecong)
Applicant and current assignee: Guangzhou Chenchuang Technology Development Co., Ltd.
Other languages: Chinese (zh)
Legal status: Pending

Classifications

    • Y02T10/40 Engine management systems (Y — general tagging of new technological developments; Y02T — climate change mitigation technologies related to transportation; Y02T10/00 — road transport of goods or passengers; Y02T10/10 — internal combustion engine [ICE] based vehicles)

Abstract

The invention relates to the technical field of automobile positioning, in particular to an automobile positioning method and system, a storage medium and electronic equipment. The automobile positioning method is applied to an automobile positioning system and comprises the following steps: step 1, preprocessing image information output by a monocular camera; step 2, performing front-end visual odometer processing and loop-closure detection processing on the preprocessed data; step 3, performing back-end nonlinear optimization processing on the data from the front-end visual odometer and from the loop-closure processing; and step 4, constructing a positioning map from the back-end-optimized data. The invention can build a map on the move using limited resources while locating the automobile's current position, bringing convenience to people's lives.

Description

Automobile positioning method and system, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of automobile positioning, in particular to an automobile positioning method and system, a storage medium and electronic equipment.
Background
With the rapid development of industrial automation, there is an increasing demand for accurate positioning and navigation of automobiles. However, there is still a lack of a method for generating a moving map using limited resources while locating the current location of the car.
Disclosure of Invention
In order to solve the above problems, a main object of the present invention is to provide an automobile positioning method that can build a map on the move using limited resources while simultaneously locating the vehicle's current position.
In order to achieve the above purpose, the invention adopts the following technical scheme:
the automobile positioning method is applied to an automobile positioning system and comprises the following steps of:
step 1, preprocessing image information output by a monocular camera;
step 2, front-end visual odometer processing and loop detection processing are carried out on the preprocessed data;
step 3, performing back-end nonlinear optimization processing on the data from the front-end visual odometer processing and from the loop-closure processing;
and step 4, constructing a positioning map from the data after back-end nonlinear optimization.
Further, the preprocessing is ORB feature extraction and matching based on visual sensors.
Further, the visual odometer processing estimates the camera motion trajectory between two adjacent images and determines a local map.
Further, the loop detection processing performs a pass of feature matching between any two images and determines whether the two images are associated according to how their feature points match; if they are associated, a loop exists, otherwise no loop exists.
Further, if a loop exists, an initial estimate of the vehicle pose is made.
Further, if no loop exists, loop detection and correction processing is carried out, and the corrected data is then passed to the back-end nonlinear optimization processing.
The present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method described above.
The invention also provides an electronic device characterized by comprising: a processor, a memory, and a program, wherein the program is stored in the memory and configured to be executed by the processor, the program comprising instructions for performing the method as described above.
The invention also provides an automobile positioning system which performs the method as described above.
Further, the system includes:
the camera imaging model module is used for preprocessing the image information output by the monocular camera;
the front-end visual odometer module is used for performing front-end visual odometer processing on the preprocessed data;
the loop detection module is used for carrying out loop detection processing on the preprocessed data;
the back-end nonlinear optimization module is used for performing back-end nonlinear optimization on the data from the front-end visual odometer processing and from the loop-closure processing;
and the map building module is used for building a positioning map according to the data after the back-end nonlinear optimization processing.
The invention has the beneficial effects that:
the automobile positioning method is applied to an automobile positioning system, and comprises the following steps: step 1, preprocessing image information output by a monocular camera; step 2, front-end visual odometer processing and loop detection processing are carried out on the preprocessed data; step 3, performing back-end nonlinear optimization processing on the data processed by the front-end visual odometer and the data processed by the loop-back processing; and 4, constructing a positioning map according to the data after the nonlinear optimization processing of the rear end. The invention can use limited resources to generate the moving map, and simultaneously locate the current position of the automobile, thereby bringing convenience to life of people.
Drawings
FIG. 1 is a flow chart of a method for locating an automobile according to the present invention.
Fig. 2 is a flowchart of the method for positioning an automobile according to the present invention.
Fig. 3 is a schematic diagram of a similar triangle in an embodiment of the invention.
Fig. 4 is a schematic diagram of a pixel coordinate system in an embodiment of the invention.
Fig. 5 is a flow chart of a visual odometer in an embodiment of the invention.
Fig. 6 is a schematic diagram of image features in an embodiment of the invention.
FIG. 7 is a schematic diagram of a K-ary dictionary in an embodiment of the present invention.
Fig. 8 is a schematic diagram of epipolar search in an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and detailed description, wherein it is to be understood that, on the premise of no conflict, the following embodiments or technical features may be arbitrarily combined to form new embodiments.
Referring to fig. 1 and 2, the present invention provides an automobile positioning method applied to an automobile positioning system. The system constructs a map, updated in real time, from the motion pose of a monocular camera in three-dimensional world coordinates, and simultaneously estimates the position of the automobile in that map from the camera's motion between two-dimensional image coordinates, thereby achieving the purpose of automobile positioning.
The automobile positioning method comprises the following steps:
step 1, preprocessing image information output by a monocular camera;
step 2, front-end visual odometer processing and loop detection processing are carried out on the preprocessed data;
step 3, performing back-end nonlinear optimization processing on the data from the front-end visual odometer processing and from the loop-closure processing;
and step 4, constructing a positioning map from the data after back-end nonlinear optimization.
Further, the preprocessing is ORB feature extraction and matching based on visual sensors.
Further, the visual odometer processing estimates the camera motion trajectory between two adjacent images and determines a local map.
Further, the loop detection processing performs a pass of feature matching between any two images and determines whether the two images are associated according to how their feature points match; if they are associated, a loop exists, otherwise no loop exists.
Further, if a loop exists, an initial estimate of the vehicle pose is made.
Further, if no loop exists, loop detection and correction processing is carried out, and the corrected data is then passed to the back-end nonlinear optimization processing.
The present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method described above.
The invention also provides an electronic device characterized by comprising: a processor, a memory, and a program, wherein the program is stored in the memory and configured to be executed by the processor, the program comprising instructions for performing the method as described above.
The invention also provides an automobile positioning system which performs the method as described above. The operating system used to develop the system is Ubuntu 16.04, and the sensor used is a monocular camera. The development process has a definite purpose and clear logic, with no particular key technical difficulty.
Further, the system comprises:
the camera imaging model module is used for preprocessing the image information output by the monocular camera;
the front-end visual odometer module is used for performing front-end visual odometer processing on the preprocessed data;
the loop detection module is used for carrying out loop detection processing on the preprocessed data;
the back-end nonlinear optimization module is used for performing back-end nonlinear optimization on the data from the front-end visual odometer processing and from the loop-closure processing;
and the map building module is used for building a positioning map according to the data after the back-end nonlinear optimization processing.
The functions of the modules are described as follows:
1. camera imaging model module (please refer to fig. 3 and 4)
The monocular camera maps coordinate points in the three-dimensional world onto the two-dimensional image plane, a process that can be described by a mapping model. The current mainstream model is the pinhole camera model, which simplifies the camera to pinhole imaging and describes how a point in three-dimensional space projects onto the image plane of an ideal pinhole camera.
The pinhole model geometry is modeled first. Let O-x-y-z be the camera coordinate system, where the z-axis points to the front of the camera, the positive x-axis points right, the positive y-axis points down, O is the camera optical center (the pinhole of the pinhole model), and the focal length f is the distance between the physical imaging plane and the pinhole. A real-world space point P = [X, Y, Z]^T projects through the pinhole O to an imaging point P' = [X', Y', Z']^T on the physical imaging plane O'-x'-y'. From the similar-triangle relationship of fig. 3 (with the image flipped to the virtual plane in front of the camera), there is:

X' = f·X/Z,  Y' = f·Y/Z   (1)
the relationship of the points P to P' in three-dimensional space can be obtained, and the image can be more accurately characterized by a pixel coordinate system.
The plane of the pixel coordinate system is the imaging plane of the monocular camera; the origin is at the upper-left of the image, the u-axis is parallel to the x-axis, the v-axis is parallel to the y-axis, and the unit of the pixel coordinate system is the pixel, i.e. resolution. Between the pixel coordinate system and the imaging coordinate system there is a scaling of α on the x-axis and β on the y-axis, together with a shift of the origin by [c_x, c_y]^T. The relation between the pixel point [u, v]^T and the imaging coordinates is then:

u = α·X' + c_x   (2)

v = β·Y' + c_y   (3)

Writing f_x for α·f and f_y for β·f, the relation between the world point P and the pixel point P_uv is obtained:

u = f_x·X/Z + c_x   (4)

v = f_y·Y/Z + c_y   (5)

The matrix form of the above is:

Z·[u, v, 1]^T = [ f_x 0 c_x ; 0 f_y c_y ; 0 0 1 ]·[X, Y, Z]^T = K·P   (6)
In the above formula, [u, v, 1]^T is the homogeneous coordinate of the pixel point P_uv, and K is the internal-reference (intrinsic) matrix of the monocular camera. What is studied here is the coordinate of the point P in the world coordinate system (denoted P_w), whereas P in the formula above is a coordinate in the monocular camera coordinate system (O-x-y-z). Assuming the pose of the monocular camera is represented by a rotation matrix R and a displacement vector t, the formula for obtaining P_uv from P_w is:

Z·P_uv = K·(R·P_w + t) = K·T·P_w   (7)

Here P_w is the homogeneous four-vector [X, Y, Z, 1]^T, and T·P_w gives the point in the camera frame; dividing by the depth Z gives the normalized coordinates [X/Z, Y/Z, 1]^T, so the formula is adjusted to:

P_uv = (1/Z)·K·T·P_w   (8)
The above equation is also called the observation equation. The fourth-order (4×4) matrix T is the extrinsic matrix of the monocular camera, and it varies with the camera pose (R, t). T is the pose variable to be estimated, while the camera intrinsic matrix K is fixed when the camera leaves the factory and can be obtained with camera-calibration software.
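As a worked illustration of the observation equation above, the sketch below (a minimal numpy version; the intrinsic values fx = fy = 500, cx = 320, cy = 240 are made up for the example) projects a world point into pixel coordinates:

```python
import numpy as np

def project(P_w, K, R, t):
    """Project a 3-D world point to pixel coordinates (pinhole model)."""
    P_c = R @ P_w + t            # world frame -> camera frame (extrinsics)
    assert P_c[2] > 0, "point must lie in front of the camera"
    uv1 = K @ (P_c / P_c[2])     # divide by depth Z, then apply intrinsics K
    return uv1[:2]               # pixel coordinates (u, v)

# Illustrative intrinsics; a real K comes from calibration software
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R, t = np.eye(3), np.zeros(3)    # camera placed at the world origin
uv = project(np.array([1.0, 0.5, 5.0]), K, R, t)
```

Note that the division by Z happens before applying K, exactly as in the normalized-coordinate form of the observation equation.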
2. front-end visual odometer module (please refer to fig. 5 and 6)
The visual odometer section corresponds to the front end of the system. It estimates the motion pose of the monocular camera incrementally and refines it using optimization techniques. A visual odometer system consists of a specific camera arrangement, a software architecture and a hardware platform that together generate the camera pose at each moment. The mainstream way to estimate camera pose is feature-based: different interest points are extracted and tracked by means of vectors describing the local area around each key point. This approach relies on image texture and is generally not suitable for texture-free or low-texture environments.
The visual odometer mainly comprises modules for image key-point extraction, descriptor computation, feature matching and pose estimation. As shown in fig. 5, at each iteration the visual odometer obtains the grayscale information of successive frames and then performs feature extraction and feature description on it. Before extracting image feature points there is generally an image preprocessing step to reduce the adverse effects of overexposure, insufficient light and the like on the monocular camera; after preprocessing, the feature-point extraction algorithm has better stability and noise immunity.
Image key-point extraction relies mainly on feature-point extraction, where feature points specifically refer to corners, edges and blocks in the image, as shown in fig. 6. The image-extraction algorithm used in this system is the ORB feature-extraction algorithm, a fast method for extracting and describing feature points: its feature-extraction part is developed from the FAST algorithm, and its feature-point description part improves on BRIEF.
The detection flow of the feature-extraction algorithm is as follows. First a pixel p is selected in the image; assume its brightness is Ip and set a threshold T. Then 16 pixels on a circle of radius 3 around p are examined; if N consecutive points on the circle are all brighter than Ip + T or all darker than Ip − T, the pixel p is taken as a feature point. Next a principal direction is added to each feature point by the gray-centroid method: the moments of the neighborhood B of a feature point p are defined as in formula (9); from the moments the centroid of the image block can be found (formula (10)); connecting the geometric center O of the image block to the centroid C gives a direction vector, so the direction of each feature point is defined as in formula (11):

m_pq = Σ_{x,y∈B} x^p·y^q·I(x,y),  p, q ∈ {0, 1}   (9)

C = (m_10/m_00, m_01/m_00)   (10)

θ = arctan(m_01/m_10)   (11)
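The gray-centroid direction of formulas (9)–(11) can be sketched in a few lines of plain Python (atan2 is used instead of a bare arctan so the quadrant is preserved; the helper name and toy patches are illustrative):

```python
import math

def patch_orientation(patch):
    """Principal direction of an image patch from its first-order moments."""
    m10 = m01 = 0.0
    for y, row in enumerate(patch):
        for x, intensity in enumerate(row):
            m10 += x * intensity      # first moment in x
            m01 += y * intensity      # first moment in y
    return math.atan2(m01, m10)       # theta: direction of the centroid
```

A patch whose brightness mass lies along +x yields θ = 0, and one whose mass lies along +y yields θ = π/2.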
The feature-point description algorithm is based on the original BRIEF algorithm with rotation invariance added. BRIEF compares the gray values p(x) and p(y) of any two pixels in the image with a binary test:

τ(p; x, y) = 1 if p(x) < p(y), and 0 otherwise   (12)

Choosing n point pairs arbitrarily generates a binary string; the generated descriptor is:

f_n(p) = Σ_{i=1}^{n} 2^{i−1}·τ(p; x_i, y_i)   (13)

Here n is generally 128, 256 or 512, and the lower the correlation between features the better. ORB introduces a 2×n matrix collecting the n binary test points (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) to solve BRIEF's lack of rotation invariance. The matrix is defined as follows:

S = [ x_1 x_2 ... x_n ; y_1 y_2 ... y_n ]   (14)
to further increase efficiency, ORB utilizes a neighborhood direction θ and a corresponding rotation matrix R θ S is a modified version S θ
Figure BDA0003985569640000071
And θ is the principal direction we find for the feature points. After adding the direction, obtaining the characteristic point descriptor with the rotation angle:
g n (p,θ):=f n (p)|(x i ,y i )∈S θ (16)
Feature matching is a critical step in the visual odometer: it solves the data-association problem between adjacent image frames. Accurately matching descriptors between images, or between images and the map, greatly reduces the burden of subsequent operations such as pose estimation and optimization. The feature-matching method adopted in this system is brute-force matching, the most direct and simple way to compute similarity between features; it is fast enough to match in real time, ensuring responsiveness during map construction and positioning. The ORB feature descriptor is binary, and the Hamming distance is the number of positions at which two bit strings differ, so the similarity between two feature points is measured by Hamming distance: the smaller the distance, the higher the similarity. For each feature point in the previous frame, the feature point with the highest similarity in the next frame, i.e. the smallest Hamming distance, is found and compared to decide whether the match is correct.
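The brute-force Hamming matching just described can be sketched as follows (descriptors are modeled as Python integers for brevity; a real ORB descriptor is a 256-bit string):

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def brute_force_match(desc_prev, desc_next):
    """For each descriptor of the previous frame, pick the next-frame
    descriptor with the smallest Hamming distance."""
    matches = []
    for i, d in enumerate(desc_prev):
        j = min(range(len(desc_next)), key=lambda j: hamming(d, desc_next[j]))
        matches.append((i, j, hamming(d, desc_next[j])))
    return matches
```

In practice a rejection rule (for example, discarding matches whose distance exceeds some multiple of the minimum distance) would be layered on top of this to filter outliers.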
3. back-end nonlinear optimization module
Back-end optimization mainly addresses the noise problem in the positioning and mapping process. In practice, no matter how accurate the camera, there is some noise; inexpensive monocular cameras have larger measurement error, and some are also affected by magnetic fields and temperature. Therefore, beyond estimating camera motion from images, we need to care how much noise is present, how that noise propagates from one moment to the next, and how much confidence we can place in the current estimate. The main problem in back-end optimization is how to estimate the state of the whole system from this noisy data, together with the uncertainty of that estimate — the maximum-a-posteriori (MAP) problem. The state here includes not only the car's own trajectory but also the map. In this system the front end performs image feature extraction, matching and initial pose estimation, while the back end uses filtering and nonlinear-optimization algorithms; under the same computational budget, nonlinear optimization obtains the better result. Taking the camera pose at each moment and the 3D world coordinates of the map points as the optimization variables, and the bundle-adjustment reprojection error as the loss function, the optimal estimate can be computed, yielding the most plausible global view of the camera poses and all map points at all times.
Camera motion is three-dimensional rigid motion, consisting of rotation and displacement. The rotation matrix R belongs to the special orthogonal group SO(3), and the camera displacement is represented by a three-dimensional vector t. Camera motion can then be represented by a fourth-order (4×4) matrix T of the special Euclidean group SE(3):

T = [ R t ; 0^T 1 ] ∈ SE(3)   (17)

Letting T_i denote the pose at time i and T_{i,i+1} the rigid motion between times i and i+1, the pose at time i+1 is:

T_{i+1} = T_i·T_{i,i+1}   (18)
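Formula (18) is ordinary multiplication of 4×4 homogeneous transforms; a small numpy check with two pure translations (the values are illustrative):

```python
import numpy as np

def make_T(R, t):
    """Assemble the 4x4 SE(3) transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Chaining incremental motions: two pure translations compose by vector addition
T_i = make_T(np.eye(3), np.array([1.0, 0.0, 0.0]))
T_inc = make_T(np.eye(3), np.array([0.0, 2.0, 0.0]))
T_next = T_i @ T_inc
```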
However, rotation matrices and transformation matrices are not closed under addition: for any two rotation matrices R_1 and R_2, their matrix sum is in general no longer a rotation matrix. The Lie algebra, a differentiable numeric representation of the camera pose, solves this problem. SO(3) and SE(3) are both Lie groups, each with its own Lie algebra describing the group's local properties: SO(3) corresponds to the Lie algebra so(3), and the Lie algebra of SE(3) is written se(3).
Assume the camera pose at time t is T_t with corresponding Lie algebra element ξ_t. At time t+1 a motion ΔT, with corresponding Lie algebra element Δξ, is applied, so the pose at time t+1 is:

T_{t+1} = ΔT·T_t   (19)

Writing ξ_{t+1} for the Lie algebra element corresponding to T_{t+1}, the following relation holds:

T_{t+1} = exp(ξ_{t+1}^∧) = exp(Δξ^∧)·exp(ξ_t^∧)   (20)
From the BCH (Baker–Campbell–Hausdorff) formula, when Δξ is small the product of exponentials can be approximated linearly:

ln(exp(Δξ^∧)·exp(ξ^∧))^∨ ≈ J_l(ξ)^{−1}·Δξ + ξ   (21)

Conversely, addition on the Lie algebra satisfies the following relation, with J_l and J_r the left and right Jacobians:

exp((ξ + Δξ)^∧) = exp((J_l·Δξ)^∧)·exp(ξ^∧) = exp(ξ^∧)·exp((J_r·Δξ)^∧)   (22)
Assume a point p = [X, Y, Z, 1]^T in space (homogeneous coordinates) is transformed by T to T·p, where the Lie algebra element corresponding to T is ξ. The derivative of T·p with respect to ξ can be converted into the following limit:

∂(T·p)/∂ξ = lim_{δξ→0} [exp((ξ + δξ)^∧)·p − exp(ξ^∧)·p] / δξ   (23)

Expanding and simplifying the above gives the derivative of the motion equation with respect to the Lie algebra:

∂(T·p)/∂ξ = [ I  −(R·p + t)^∧ ; 0^T  0^T ]·J_l   (24)
However, this contains the relatively complicated left Jacobian J_l. A simpler derivative can be obtained from the perturbation model, which left-multiplies a small perturbation δξ and differentiates with respect to it:

∂(T·p)/∂(δξ) = lim_{δξ→0} [exp(δξ^∧)·exp(ξ^∧)·p − exp(ξ^∧)·p] / δξ = [ I  −(R·p + t)^∧ ; 0^T  0^T ]   (25)

The result is a 4×6 matrix. Compared with differentiating directly with respect to the Lie algebra, the perturbation model omits the Jacobian J_l, which makes it more practical.
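The perturbation-model Jacobian can be verified numerically: build a pose with an se(3) exponential map and compare the analytic top block [I | −(R·p+t)^∧] against finite differences of a left perturbation. This is a sketch assuming numpy; `exp_se3` is a standard Rodrigues-based implementation, not code from the patent:

```python
import numpy as np

def hat(v):
    """so(3) hat operator: 3-vector -> skew-symmetric matrix."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def exp_se3(xi):
    """exp of xi = [rho, phi] in se(3): Rodrigues for R, left Jacobian for t."""
    rho, phi = xi[:3], xi[3:]
    th = np.linalg.norm(phi)
    if th < 1e-12:                       # small-angle fallback
        R, J = np.eye(3) + hat(phi), np.eye(3)
    else:
        a = phi / th
        R = (np.cos(th) * np.eye(3) + (1 - np.cos(th)) * np.outer(a, a)
             + np.sin(th) * hat(a))
        J = (np.sin(th) / th * np.eye(3)
             + (1 - np.sin(th) / th) * np.outer(a, a)
             + (1 - np.cos(th)) / th * hat(a))
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, J @ rho
    return T

rng = np.random.default_rng(0)
T = exp_se3(0.3 * rng.normal(size=6))    # a random but valid SE(3) pose
p = np.array([0.5, -1.0, 2.0, 1.0])      # homogeneous point
q = (T @ p)[:3]                          # q = R p + t
J_analytic = np.hstack([np.eye(3), -hat(q)])   # top 3x6 block, columns [rho | phi]
eps = 1e-6
J_numeric = np.zeros((3, 6))
for k in range(6):
    d = np.zeros(6)
    d[k] = eps                           # left-perturb along axis k
    J_numeric[:, k] = ((exp_se3(d) @ T @ p)[:3] - q) / eps
```

The translational columns come out as the identity and the rotational columns as −q^∧, matching the closed form without ever touching J_l.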
4. Loop detection module (please refer to FIG. 7)
In the front-end visual odometer, since only correlation between adjacent image frames is considered, an accumulated error may be generated, resulting in unreliable long-term estimation results. In other words, a globally consistent track and map cannot be constructed. To account for long-term accumulated errors, closed loop detection may be employed to reduce error accumulation.
Most loop detection is appearance-based, comparing the similarity between images. With a feature-point method — say describing an image with SIFT features — each SIFT vector is 128-dimensional and each image may contain on the order of 1000 SIFT features, so computing image similarity directly on the feature points is very expensive. Feature points are therefore not used directly; a bag-of-words (BoW) model is generally used instead. The BoW model builds a visual dictionary by extracting image features and clustering them, so the dictionary-generation problem is equivalent to a clustering problem. Clustering is common in unsupervised machine learning, where the machine finds structure in the data by itself, and BoW dictionary generation is one instance of it.
This system uses the K-means algorithm for clustering. Suppose feature points have been extracted from a large number of images — say n of them. The K-means algorithm then gathers the n feature points into m words (m far smaller than n) as follows:
1. randomly select m center points;
2. for each sample, compute its distance to every center point and assign it to the class of the nearest one;
3. recompute the center point of each class;
4. if the change in every center point is very small, the algorithm has converged — exit; otherwise return to step 2.
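A toy one-dimensional version of the four steps above, in plain Python (a real dictionary clusters 256-bit ORB descriptors, not scalars):

```python
import random

def kmeans(points, k, iters=20):
    """1-D K-means following the four steps above."""
    centers = random.sample(points, k)            # step 1: random initial centers
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                          # step 2: assign to nearest center
            i = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[i].append(p)
        new = [sum(c) / len(c) if c else centers[i]
               for i, c in enumerate(clusters)]   # step 3: recompute centers
        if new == centers:                        # step 4: converged -> exit
            break
        centers = new
    return sorted(centers)
```

On two well-separated clusters such as {0, 1, 2} and {10, 11, 12}, any distinct initialization converges to the cluster means 1.0 and 11.0 within a few iterations.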
K-means clustering thus yields a dictionary of m words. Because m is large, and generating a picture's bag of words requires looking up the word corresponding to each feature point, the dictionary is represented with a tree structure to speed up word lookup. The dictionary tree can be derived by an extended K-means algorithm, as follows:
1. at the root node, cluster all samples into k classes with K-means, giving the first layer;
2. for each node of the first layer, re-cluster the samples belonging to that node into k classes, giving the next layer;
3. and so on, finally obtaining the leaf layer. The leaves are the words.
A k-ary tree of depth q can accommodate k^q words. Matching a feature point to its word requires at most k comparisons at each of the q levels, i.e. on the order of q·k comparisons in total, whereas linear matching over the whole dictionary can take up to k^q comparisons.
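The capacity/lookup trade-off of the k-ary vocabulary tree, in numbers (k = 10, q = 5 are arbitrary example values):

```python
k, q = 10, 5                 # branching factor and tree depth (example values)
capacity = k ** q            # a k-ary tree of depth q holds k**q leaf words
tree_lookup = q * k          # tree search: k center comparisons at each of q levels
linear_lookup = capacity     # linear search worst case: every word in the dictionary
```

Here 100,000 words are searchable in about 50 comparisons instead of 100,000, which is why the tree representation is used.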
5. map building module
The front-end visual odometer and bundle adjustment in fact estimate and optimize the positions of landmarks. The requirements for mapping, however, vary with the application. Here the upper layer is an automobile system, so global positioning on the road is desired. Positioning is a basic function of a map, but beyond that it is desirable to keep the map, so that it need not be rebuilt from scratch every time: the environment is modeled once, and the automobile can still localize itself in that map the next time it is started. If the automobile is to reach a specific location, a path over the map must be computed by navigation, meaning the automobile can plan a path in the map and find an optimal path between any two map points. For this, we need to know which places in the map are passable and which are not.
These requirements call for dense stereo reconstruction from the monocular camera. In this method, epipolar search and block matching are needed to determine where a pixel of one image appears in another image. Once the pixel's position in each image is known, its depth is determined by triangulation over many observations; as measurements accumulate, the depth estimate gradually converges from a highly uncertain quantity to a stable value. This is the depth-filtering technique.
Referring to fig. 8, the camera O_1 on the left of the figure observes a pixel p_1, but its depth is unknown, so the depth is assumed to lie within some range d. The spatial point corresponding to the pixel is therefore distributed along a line segment. From the other viewpoint O_2, the projection of this line segment also forms a line on the image plane — the epipolar line. This line can be determined when the motion between the two monocular cameras is known. To determine which point on the epipolar line is the match of p_1, the block-matching technique is needed, which improves distinguishability to some extent.
The main flow of block matching is as follows: first take a small patch around p_1, denoted A, and patches along the epipolar line, denoted B_i, i = 1, ..., n. Then compute the difference between the patches; this system computes the correlation of two patches through NCC (Normalized Cross-Correlation):

S(A, B)_NCC = Σ_{i,j} A(i,j)·B(i,j) / √( Σ_{i,j} A(i,j)² · Σ_{i,j} B(i,j)² )   (26)
A correlation close to 0 indicates the two patches are dissimilar, and a correlation close to 1 indicates they are similar. After computing the similarity of A with each B_i along the epipolar line, an NCC distribution along the line is obtained. This distribution may have several peaks, so the true corresponding point must be found through a depth filter.
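Formula (26) as a direct sketch over two equal-sized patches, with plain Python lists standing in for image blocks:

```python
import math

def ncc(A, B):
    """Normalized cross-correlation of two equal-sized patches.
    This is the plain (not zero-mean) variant that the formula states."""
    num = sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))
    den = math.sqrt(sum(a * a for row in A for a in row)
                    * sum(b * b for row in B for b in row))
    return num / den
```

Identical patches score 1 and non-overlapping ones score 0; practical systems often prefer the zero-mean NCC variant for robustness to brightness changes.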
The filter in this system is a depth filter under the Gaussian-distribution assumption. Suppose the depth d of a pixel obeys:

P(d) = N(μ, σ²)   (27)

Whenever new data arrives, the depth is observed again. Assume this observation is also Gaussian:

P(d_obs) = N(μ_obs, σ_obs²)   (28)
The product of Gaussian distributions is still Gaussian, so the original distribution of d can be updated with the observed information by information fusion. Let the fused distribution of d be N(μ_fuse, σ_fuse²); from the product of the two Gaussians it can be derived that:

μ_fuse = (σ_obs²·μ + σ²·μ_obs) / (σ² + σ_obs²),  σ_fuse² = (σ²·σ_obs²) / (σ² + σ_obs²)   (29)
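The fusion step of formula (29) is short enough to make concrete:

```python
def fuse(mu, var, mu_obs, var_obs):
    """Product-of-Gaussians depth update: returns the fused mean and variance."""
    s = var + var_obs
    return (var_obs * mu + var * mu_obs) / s, var * var_obs / s
```

Fusing N(1, 1) with an observation N(3, 1) gives N(2, 0.5): the mean moves to the average of equally confident estimates, and the variance shrinks, so each observation can only sharpen the estimate.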
the μ then needs to be calculated by geometric relationships obs ,σ obs . In the upper graph, p is found 1 Corresponding p 2 Dots, thereby observing p 1 Is considered p 1 The corresponding three-dimensional point is P. Thereby can record O 1 P is P, O 1 O 2 Translation t, O for monocular camera 2 P is denoted as a. The left and right angles below the triangle formed are denoted as α, β from left to right. Now consider the polar line l 2 There is an error in the size of one pixel, so that the angle β becomes β ', and p becomes p', and the angle above the triangle is noted as γ.
First, list the geometric relations among these quantities:
a=p-t (30)
α=arccos<p,t> (31)
β=arccos<a,-t> (32)
Perturbing p2 by one pixel produces a variation δβ in β. Assuming the monocular focal length is f:

δβ = arctan(1/f) (33)
so that:
β'=β+δβ (34)
γ=π-α-β' (35)
From the law of sines, the magnitude of p' can be found as:

||p'|| = ||t|| sin β' / sin γ (36)
From this, the depth uncertainty caused by an error of a single pixel can be determined. If the block matching along the epipolar search is considered to have an error of only one pixel, one can set:

σ_obs = ||p|| - ||p'|| (37)

If the epipolar-search uncertainty is larger than one pixel, it can be amplified according to the same formula. In practical engineering, the depth estimate can be considered to have converged once the uncertainty falls below a threshold.
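The geometric propagation of a one-pixel matching error to depth uncertainty, following equations (30)-(37), can be sketched as follows. The vector arguments and the function name are illustrative assumptions:

```python
import numpy as np

def depth_uncertainty(p, t, f):
    """Depth uncertainty sigma_obs caused by a one-pixel epipolar
    matching error. p = O1P, t = O1O2 (camera translation), f = focal length.

    Angles are taken between normalized vectors, matching
    alpha = arccos<p, t> and beta = arccos<a, -t> in the text."""
    a = p - t                                  # eq. (30): O2P
    alpha = np.arccos(np.dot(p, t) / (np.linalg.norm(p) * np.linalg.norm(t)))
    beta = np.arccos(np.dot(a, -t) / (np.linalg.norm(a) * np.linalg.norm(t)))
    delta_beta = np.arctan(1.0 / f)            # one-pixel perturbation, eq. (33)
    beta_prime = beta + delta_beta             # eq. (34)
    gamma = np.pi - alpha - beta_prime         # eq. (35)
    # law of sines, eq. (36): side t is opposite gamma, p' opposite beta'
    p_prime_norm = np.linalg.norm(t) * np.sin(beta_prime) / np.sin(gamma)
    return np.linalg.norm(p) - p_prime_norm    # sigma_obs, eq. (37)
```

A longer focal length makes the one-pixel angular error smaller, so the resulting depth uncertainty shrinks.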
In summary, the complete process of estimating dense depth is as follows:
1. assume that the depth of every pixel satisfies some initial Gaussian distribution;
2. when new data arrive, determine the position of the projection point by epipolar search and block matching;
3. compute the depth and its uncertainty after triangulation according to the geometric relationship;
4. fuse the current observation into the previous estimate; if it has converged, stop the calculation, otherwise return to step 2.
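The four steps above can be sketched as one update pass over a per-pixel Gaussian depth map. The `observe_fn` callback standing in for epipolar search, block matching, and triangulation (steps 2-3) is a hypothetical interface, not something specified in the text:

```python
import numpy as np

def update_depth_map(depth_mu, depth_sigma2, observe_fn, min_sigma2=1e-4):
    """One iteration of the dense depth estimation loop.

    depth_mu, depth_sigma2 : per-pixel Gaussian depth estimates (step 1)
    observe_fn(u, v)       : returns (d_obs, sigma2_obs) from epipolar
                             search + triangulation, or None on failure
                             (hypothetical stand-in for steps 2-3)
    min_sigma2             : convergence threshold (step 4)
    """
    h, w = depth_mu.shape
    for v in range(h):
        for u in range(w):
            if depth_sigma2[v, u] < min_sigma2:    # already converged, skip
                continue
            obs = observe_fn(u, v)
            if obs is None:                        # block match failed
                continue
            d_obs, s2_obs = obs
            mu, s2 = depth_mu[v, u], depth_sigma2[v, u]
            denom = s2 + s2_obs                    # Gaussian fusion (step 4)
            depth_mu[v, u] = (s2_obs * mu + s2 * d_obs) / denom
            depth_sigma2[v, u] = (s2 * s2_obs) / denom
    return depth_mu, depth_sigma2
```

Calling this repeatedly as new frames arrive drives each pixel's variance toward the threshold, at which point its depth is fixed.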
The above is only a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Those skilled in the art can practice the invention as shown in the drawings and described above, and many modifications, adaptations, and variations are possible in light of the above teachings without departing from the scope of the invention. Any equivalent changes, modifications, and developments of the above embodiments based on the essential technology of the present invention likewise fall within the scope of the present invention.

Claims (10)

1. An automobile positioning method, characterized in that it is applied to an automobile positioning system and comprises the following steps:
step 1, preprocessing the image information output by a monocular camera;
step 2, performing front-end visual odometry processing and loop detection processing on the preprocessed data;
step 3, performing back-end nonlinear optimization processing on the data after the front-end visual odometry processing and the loop processing;
step 4, constructing a positioning map from the data after the back-end nonlinear optimization processing.
2. The automobile positioning method according to claim 1, wherein the preprocessing is ORB feature extraction and matching based on a vision sensor.
3. The automobile positioning method according to claim 1, wherein the visual odometry processing estimates the camera motion trajectory between two adjacent images and determines a local map.
4. The automobile positioning method according to claim 1, wherein the loop detection processing performs a pass of feature matching between any two images and determines from the matching of feature points whether the two images are associated; if an association exists, a loop exists, otherwise no loop exists.
5. The automobile positioning method according to claim 4, wherein, if a loop exists, an initial estimate of the vehicle pose is made.
6. The automobile positioning method according to claim 4, wherein, if no loop exists, loop detection and correction processing is performed, and the data after the loop detection and correction processing is subjected to the back-end nonlinear optimization processing.
7. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program is executable by a processor to perform the method of any one of claims 1-6.
8. An electronic device, characterized by comprising: a processor, a memory, and a program, wherein the program is stored in the memory and configured to be executed by the processor, the program comprising instructions for performing the method of any one of claims 1-6.
9. An automobile positioning system, characterized in that the automobile positioning system performs the method of any one of claims 1-6.
10. The automobile positioning system of claim 9, characterized in that the system comprises:
a camera imaging model module, configured to preprocess the image information output by the monocular camera;
a front-end visual odometry module, configured to perform front-end visual odometry processing on the preprocessed data;
a loop detection module, configured to perform loop detection processing on the preprocessed data;
a back-end nonlinear optimization module, configured to perform back-end nonlinear optimization on the data after the front-end visual odometry processing and the loop processing;
and a map building module, configured to construct a positioning map from the data after the back-end nonlinear optimization processing.
CN202211563938.5A 2022-12-07 2022-12-07 Automobile positioning method and system, storage medium and electronic equipment Pending CN116026344A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211563938.5A CN116026344A (en) 2022-12-07 2022-12-07 Automobile positioning method and system, storage medium and electronic equipment


Publications (1)

Publication Number Publication Date
CN116026344A true CN116026344A (en) 2023-04-28

Family

ID=86080460



Legal Events

Date Code Title Description
PB01 Publication