CN111260538B - Positioning method and vehicle-mounted terminal based on a long-baseline binocular fisheye camera - Google Patents

Positioning method and vehicle-mounted terminal based on a long-baseline binocular fisheye camera

Info

Publication number
CN111260538B
Authority
CN
China
Prior art keywords
image
dimensional
points
scale
dimensional feature
Prior art date
Legal status
Active
Application number
CN201811468765.2A
Other languages
Chinese (zh)
Other versions
CN111260538A (en)
Inventor
刘一龙
徐抗
谢国富
Current Assignee
Beijing Momenta Technology Co ltd
Original Assignee
Beijing Momenta Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Momenta Technology Co ltd
Priority to CN201811468765.2A
Publication of CN111260538A
Application granted
Publication of CN111260538B
Legal status: Active
Anticipated expiration

Classifications

    • G06T3/047
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85 Stereo camera calibration

Abstract

The invention discloses a positioning method and a vehicle-mounted terminal based on a long-baseline binocular fisheye camera, wherein the method comprises the following steps: acquiring a plurality of first two-dimensional feature points and a plurality of first three-dimensional map points of a first image; within a preset scale range, updating the current scale of the map in a set order and step length, and updating the plurality of first three-dimensional map points according to the current scale to obtain a plurality of second three-dimensional map points; calculating, for each updated scale, the average of the normalized cross-correlation values of the two-dimensional projections of the plurality of second three-dimensional map points on the first image and a second image; projecting the plurality of second three-dimensional map points corresponding to the largest average normalized cross-correlation value onto the second image to obtain a plurality of second two-dimensional feature points matched with the plurality of first two-dimensional feature points; and determining the spatial position of the moving target based on the matching relation between the plurality of first two-dimensional feature points and the plurality of second two-dimensional feature points.

Description

Positioning method and vehicle-mounted terminal based on a long-baseline binocular fisheye camera
Technical Field
The invention relates to the field of automatic driving, and in particular to a binocular vision-based positioning method and a vehicle-mounted terminal.
Background
Matching-point search between binocular images aims to determine, within the region where the contents of the two images overlap, multiple pairs of matching points expressed in pixel coordinates, such that each pair of matching points corresponds to the same three-dimensional point in the three-dimensional world.
Existing binocular positioning techniques fall mainly into two categories: 1) methods based on descriptor matching; 2) direct methods based on image gray-level information. Existing methods are generally suited to scenes in which the camera centers are close together. In a long-baseline scene, the image-content overlap usually occupies only a small area, located where edge distortion is pronounced; at the same time the image features differ greatly, and extreme conditions such as occlusion arise. Existing methods cannot guarantee the number and quality of matching points, which leads to inaccurate positioning.
Disclosure of Invention
The invention provides a binocular vision-based positioning method and a vehicle-mounted terminal, which are used to overcome at least one of the above problems in the prior art.
According to a first aspect of the embodiments of the present invention, there is provided a binocular vision-based positioning method, comprising the following steps:
acquiring a plurality of first two-dimensional feature points and a plurality of first three-dimensional map points of a first image, wherein the first image is acquired through a first image acquisition unit of a binocular image acquisition device, and the binocular image acquisition device is arranged on a moving target;
updating the current scale of the map according to the set order and step length in a preset scale range, and updating a plurality of first three-dimensional map points according to the current scale to obtain a plurality of second three-dimensional map points;
calculating the average value of normalized cross-correlation values of two-dimensional projections of a plurality of second three-dimensional map points at each updated scale on the first image and the second image, wherein the second image is acquired by a second image acquisition unit of the binocular image acquisition device, and the acquisition time of the second image is the same as that of the first image;
projecting a plurality of second three-dimensional map points corresponding to the largest average value of the normalized cross-correlation values to the second image to obtain a plurality of second two-dimensional feature points matched with the plurality of first two-dimensional feature points;
and determining the spatial position of the moving target based on the matching relation of the plurality of first two-dimensional feature points and the plurality of second two-dimensional feature points.
Optionally, the binocular image acquisition device is a long baseline binocular fisheye camera.
Optionally, the method further comprises the following steps:
and updating the scale of the map according to the scale corresponding to the maximum average value of the normalized cross-correlation values.
Optionally, the plurality of first two-dimensional feature points and the plurality of first three-dimensional map points of the first image are acquired by a monocular vision odometer.
Optionally, updating the current scale of the map according to the set order and step length within the preset scale range includes:
within the preset scale range, the current scale is initialized to its maximum possible value, and then, until it reaches the minimum value, the next value of the current scale is obtained by the following steps:
acquiring a three-dimensional map point with the minimum depth relative to a first camera in the plurality of first three-dimensional map points;
reducing the current scale of the map by a set amount, and updating the three-dimensional map point with the minimum depth according to the reduced scale;
projecting the updated three-dimensional map points to the second image to obtain the change direction of the projection of the three-dimensional map points with the minimum depth on the second image;
and stepping a distance of one pixel along the change direction, and taking the ratio of the resulting depth to the original depth as the new scale of the map.
Optionally, said calculating an average of normalized cross-correlation values of two-dimensional projections of a plurality of said second three-dimensional map points at each updated scale over said first image and second image comprises:
projecting each first two-dimensional feature point corresponding to each second three-dimensional map point, together with the n x n square pixel points around it, onto the second image according to the updated depth value to obtain n x n pixel positions on the second image, on the assumption that the depth within a small local area of the image is the same, where n is a positive integer;
calculating a normalized cross-correlation value NCC of the two-dimensional projection of each of the second three-dimensional map points on the first image and the second image according to the following formula:

$$\mathrm{NCC}(x_1,x_2)=\frac{\sum_{i=1}^{n\times n}\left(I_1(x_1,i)-\bar I_1(x_1)\right)\left(I_2(x_2,i)-\bar I_2(x_2)\right)}{\sqrt{\sum_{i=1}^{n\times n}\left(I_1(x_1,i)-\bar I_1(x_1)\right)^2\sum_{i=1}^{n\times n}\left(I_2(x_2,i)-\bar I_2(x_2)\right)^2}}$$

wherein $x_1$ is the two-dimensional feature point projected on the first image by any one of the plurality of second three-dimensional map points, and $x_2$ is the two-dimensional feature point projected on the second image by the same map point; $I_1$ and $I_2$ are the first image and the second image respectively; $i$ is the index of the n x n pixels around the two-dimensional feature point; $I_1(x_1,i)$ denotes the $i$-th pixel in the n x n region around $x_1$ on the first image, and $I_2(x_2,i)$ denotes the $i$-th pixel in the n x n region around $x_2$ on the second image; $\bar I_1(x_1)$ denotes the average of the n x n pixel values around $x_1$ on the first image, and $\bar I_2(x_2)$ denotes the average of the n x n pixel values around $x_2$ on the second image;
and obtaining, according to the obtained plurality of NCC values, the average of the normalized cross-correlation values of the two-dimensional projections of the plurality of second three-dimensional map points on the first image and the second image.
According to a second aspect of an embodiment of the present invention, there is provided a vehicle-mounted terminal including:
the acquisition module is configured to acquire a plurality of first two-dimensional feature points and a plurality of first three-dimensional map points of a first image, wherein the first image is acquired by a first image acquisition unit of a binocular image acquisition device, and the binocular image acquisition device is arranged on a moving target;
the map point updating module is configured to update the current scale of the map according to the set sequence and step length in a preset scale range, and update the plurality of first three-dimensional map points according to the current scale to obtain a plurality of second three-dimensional map points;
a calculation module configured to calculate an average value of normalized cross-correlation values of two-dimensional projections of the plurality of second three-dimensional map points at each updated scale on the first image and a second image, wherein the second image is acquired by a second image acquisition unit of the binocular image acquisition device, and the acquisition time of the second image is the same as the acquisition time of the first image;
the matching module is configured to project a plurality of second three-dimensional map points corresponding to the largest average value of the normalized cross-correlation values to the second image to obtain a plurality of second two-dimensional feature points matched with the plurality of first two-dimensional feature points;
and the positioning module is configured to determine the spatial position of the moving target based on the matching relation between the plurality of first two-dimensional feature points and the plurality of second two-dimensional feature points.
Optionally, the above vehicle-mounted terminal further includes:
and the scale updating module is configured to update the scale of the map according to the scale corresponding to the maximum average value of the normalized cross-correlation values.
Optionally, the map point updating module includes:
the scale adjusting unit is configured to initialize the current scale to its maximum possible value within a preset scale range; until the current scale reaches the minimum value, its next value is the scale value determined by the step scale acquisition unit:
a depth minimum point acquisition unit configured to acquire a three-dimensional map point having a minimum depth with respect to a first camera among a plurality of the first three-dimensional map points;
a depth minimum point updating unit configured to reduce a current scale of the map by a set amount, and update a three-dimensional map point with the minimum depth according to the reduced scale;
a change direction determining unit configured to project the updated three-dimensional map point onto the second image, to obtain a change direction of the projection of the three-dimensional map point with the minimum depth on the second image;
the step scale acquisition unit is configured to step a distance of one pixel along the change direction, and take the ratio of the resulting depth to the original depth as the new scale of the map.
Optionally, the computing module includes:
the projection unit is configured to project each first two-dimensional feature point corresponding to each second three-dimensional map point, together with the n x n square pixel points around it, onto the second image according to the updated depth value to obtain n x n pixel positions on the second image, on the assumption that the depth within a small local area of the image is the same, where n is a positive integer;
an NCC calculating unit configured to calculate a normalized cross-correlation value NCC of the two-dimensional projection of each of the second three-dimensional map points on the first image and the second image according to the following formula:

$$\mathrm{NCC}(x_1,x_2)=\frac{\sum_{i=1}^{n\times n}\left(I_1(x_1,i)-\bar I_1(x_1)\right)\left(I_2(x_2,i)-\bar I_2(x_2)\right)}{\sqrt{\sum_{i=1}^{n\times n}\left(I_1(x_1,i)-\bar I_1(x_1)\right)^2\sum_{i=1}^{n\times n}\left(I_2(x_2,i)-\bar I_2(x_2)\right)^2}}$$

wherein $x_1$ is the two-dimensional feature point projected on the first image by any one of the plurality of second three-dimensional map points, and $x_2$ is the two-dimensional feature point projected on the second image by the same map point; $I_1$ and $I_2$ are the first image and the second image respectively; $i$ is the index of the n x n pixels around the two-dimensional feature point; $I_1(x_1,i)$ denotes the $i$-th pixel in the n x n region around $x_1$ on the first image, and $I_2(x_2,i)$ denotes the $i$-th pixel in the n x n region around $x_2$ on the second image; $\bar I_1(x_1)$ denotes the average of the n x n pixel values around $x_1$ on the first image, and $\bar I_2(x_2)$ denotes the average of the n x n pixel values around $x_2$ on the second image;
and an average value unit configured to obtain an average value of normalized cross-correlation values of two-dimensional projections of the plurality of second three-dimensional map points on the first image and the second image according to the obtained plurality of NCC values.
According to the embodiments of the invention, the two-dimensional feature points and three-dimensional map points of the first image are acquired first. Because the depth of each three-dimensional map point obtained from a single frame is inaccurate while the relative depth relation is comparatively accurate, a uniform scale correction moves every three-dimensional map point to an accurate three-dimensional position; the updated three-dimensional map points are then projected onto the second image, where they reach the correct two-dimensional positions, yielding the best matching points between the first image and the second image and improving the positioning accuracy of the moving target. Compared with the prior art, the embodiments of the invention obtain better matches when the overlap between viewing angles is small, distortion is possibly severe, exposure differs considerably, or occlusion occurs; at the same time, compared with matching all feature points independently as in existing methods, this reduces the computation required for matching, makes the result less susceptible to a few mismatches, and improves positioning robustness.
The innovative points of the embodiments of the present invention include, but are not limited to, the following:
1. The two-dimensional feature points and three-dimensional map points originally estimated by the monocular visual odometer (VO) are used as initial values for matching. In the course of realizing the invention, the inventors found that the depths of the three-dimensional map points obtained by the VO are inaccurate, but their relative depth relation is comparatively accurate, so a scale correction can move them to three-dimensional positions of higher accuracy. This provides a reliable basis for matching the image feature points while reducing the amount of computation and improving matching efficiency, and is one of the innovation points of the embodiments of the invention.
2. The method for acquiring two-dimensional feature point matches of an image under a given scale: an optimal scale value is estimated for the image acquired by camera 1, so that under its effect the three-dimensional points reach their true positions and their projections on the image acquired by camera 2 reach the correct two-dimensional positions, which improves the accuracy of image feature point matching.
3. The scale updating strategy in the image feature point matching process improves matching efficiency by computing and selecting the scale stepping direction, and is one of the innovation points of the embodiments of the invention.
4. The image feature point matching method provided by the embodiments of the invention can robustly estimate the best matching feature points under a long-baseline, high-distortion configuration (such as a fisheye camera), and can simultaneously output a correction for the visual odometer scale, which is fed back to the visual odometer for error correction and provides further optimization constraints. This is one of the innovation points of the invention.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of binocular vision-based positioning according to one embodiment of the present invention;
FIG. 2 is a flow chart of a matching method according to an embodiment of the present invention;
fig. 3 is a block diagram of an in-vehicle terminal according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without any inventive effort, are intended to be within the scope of the invention.
It should be noted that the terms "comprising" and "having" and any variations thereof in the embodiments of the present invention and the accompanying drawings are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention discloses a binocular vision-based positioning and vehicle-mounted terminal. The following will describe in detail.
FIG. 1 is a flow chart of binocular vision-based positioning according to one embodiment of the present invention. The method is applied to vehicle-mounted terminals such as vehicle-mounted computers and vehicle-mounted industrial personal computers (IPC), which is not limited in the embodiments of the invention. The vehicle-mounted terminal is connected with the sensors of the vehicle and receives and processes the data acquired by each sensor. As shown in fig. 1, the binocular vision-based positioning comprises the following steps:
s110, acquiring a plurality of first two-dimensional feature points and a plurality of first three-dimensional map points of a first image, wherein the first image is acquired through a first image acquisition unit of a binocular image acquisition device, and the binocular image acquisition device is arranged on a moving target.
In the embodiments of the present invention, the two image acquisition units of the binocular image acquisition device may be cameras; for convenience of description, "camera" hereinafter refers to an image acquisition unit unless otherwise specified. The binocular image acquisition device may be arranged on any of the front, rear, left and right sides of the vehicle. Optionally, the binocular image acquisition device may be a long-baseline binocular fisheye camera: the field of view (FOV) of a fisheye camera is larger, so the target image captured by a single fisheye camera covers as much of the vehicle's surroundings as possible, which improves the completeness of observation, in turn improves the completeness of the constructed local map, and increases the amount of information it contains.
In one implementation, the first plurality of two-dimensional feature points and the first plurality of three-dimensional map points of the first image are acquired by a monocular vision odometer.
In the course of implementing the present invention, the inventors found that the two-dimensional feature points and three-dimensional map points for a given camera (such as camera 1, corresponding to the first image acquisition unit above) can be obtained with an existing monocular visual odometer (VO), and that this information provides reliable clues for matching, because these map points have the following characteristics:
(1) When projected towards camera 1, each three-dimensional point projects onto its corresponding two-dimensional feature point;
(2) The depth of each map point is inaccurate, but the relative depth relation is comparatively accurate; that is, a uniform scale correction can move all of the points to comparatively accurate three-dimensional positions at the same time;
(3) Scaling the points along the projection rays of camera 1 does not change the fact that they still project onto their original two-dimensional feature points.
Based on these characteristics, we only need to estimate one optimal scale value such that, under its effect, these three-dimensional points reach their true positions while their projections onto camera 2 reach the correct two-dimensional positions. This is one of the innovative points of the present invention.
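Characteristic (3) can be checked in a few lines. The sketch below is a minimal illustration under assumed pinhole intrinsics (the patent's cameras are fisheye, but the argument holds for any central projection): scaling a map point along its ray through the optical center of camera 1 leaves its projection on camera 1 unchanged.

```python
import numpy as np

# Illustrative pinhole intrinsics (assumed values, not from the patent).
K = np.array([[400.0,   0.0, 320.0],
              [  0.0, 400.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(K, p):
    """Project a 3D point given in camera coordinates to pixel coordinates."""
    uvw = K @ p
    return uvw[:2] / uvw[2]

p = np.array([1.0, 0.5, 4.0])       # a map point in camera-1 coordinates
for s in (0.5, 1.0, 2.0):           # candidate scale corrections
    print(s, project(K, s * p))     # the pixel is identical for every s
```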
And S120, updating the current scale of the map according to the set sequence and step length within a preset scale range, and updating the plurality of first three-dimensional map points according to the current scale to obtain a plurality of second three-dimensional map points.
s130, calculating an average value of normalized cross-correlation values of two-dimensional projections of a plurality of second three-dimensional map points under each updated scale on the first image and the second image, wherein the second image is acquired through a second image acquisition unit of the binocular image acquisition device, and the acquisition time of the second image is the same as that of the first image.
In one implementation, the updating of the current scale of the map in the set order and step length within the preset scale range specifically includes:
within the preset scale range, the current scale is initialized to its maximum possible value, and then, until it reaches the minimum value, the next value of the current scale is obtained by the following steps:
acquiring a three-dimensional map point with the minimum depth relative to a first camera in the plurality of first three-dimensional map points;
reducing the current scale of the map by a set amount (which can be chosen empirically in a specific implementation), and updating the three-dimensional map point with the minimum depth according to the reduced scale;
projecting the updated three-dimensional map points to the second image to obtain the change direction of the projection of the three-dimensional map points with the minimum depth on the second image;
and stepping a distance of one pixel along the change direction, and taking the ratio of the resulting depth to the original depth as the new scale of the map.
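As a rough illustration of this stepping rule, the sketch below approximates the one-pixel step by linear extrapolation instead of inverting the projection; `project_to_cam2` is an assumed stand-in for the calibrated (fisheye) projection into the second camera, and `points_cam1` holds the first three-dimensional map points in camera-1 coordinates.

```python
import numpy as np

def next_scale(points_cam1, scale, project_to_cam2, shrink=1e-3):
    """One scale-update step: choose the next scale so that the projection of
    the nearest map point moves about one pixel on the second image."""
    nearest = min(points_cam1, key=lambda p: p[2])    # minimum depth w.r.t. camera 1
    x0 = project_to_cam2(scale * nearest)             # projection at the current scale
    x1 = project_to_cam2((scale - shrink) * nearest)  # after a tiny scale reduction
    step = x1 - x0                                    # direction of change on image 2
    norm = np.linalg.norm(step)
    if norm == 0.0:                                   # degenerate: projection did not move
        return scale - shrink, step
    pixels_per_scale = norm / shrink                  # projection speed w.r.t. scale
    return scale - 1.0 / pixels_per_scale, step / norm
```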
In one implementation, the calculating the average of normalized cross-correlation values of the two-dimensional projections of the plurality of second three-dimensional map points at each updated scale over the first image and the second image includes:
projecting each first two-dimensional feature point corresponding to each second three-dimensional map point, together with the n x n square pixel points around it, onto the second image according to the updated depth value to obtain n x n pixel positions on the second image, on the assumption that the depth within a small local area of the image is the same, where n is a positive integer;
calculating a normalized cross-correlation value NCC of the two-dimensional projection of each of the second three-dimensional map points on the first image and the second image according to the following formula:

$$\mathrm{NCC}(x_1,x_2)=\frac{\sum_{i=1}^{n\times n}\left(I_1(x_1,i)-\bar I_1(x_1)\right)\left(I_2(x_2,i)-\bar I_2(x_2)\right)}{\sqrt{\sum_{i=1}^{n\times n}\left(I_1(x_1,i)-\bar I_1(x_1)\right)^2\sum_{i=1}^{n\times n}\left(I_2(x_2,i)-\bar I_2(x_2)\right)^2}}$$

wherein $x_1$ is the two-dimensional feature point projected on the first image by any one of the plurality of second three-dimensional map points, and $x_2$ is the two-dimensional feature point projected on the second image by the same map point; $I_1$ and $I_2$ are the first image and the second image respectively; $i$ is the index of the n x n pixels around the two-dimensional feature point; $I_1(x_1,i)$ denotes the $i$-th pixel in the n x n region around $x_1$ on the first image, and $I_2(x_2,i)$ denotes the $i$-th pixel in the n x n region around $x_2$ on the second image; $\bar I_1(x_1)$ denotes the average of the n x n pixel values around $x_1$ on the first image, and $\bar I_2(x_2)$ denotes the average of the n x n pixel values around $x_2$ on the second image;
and obtaining, according to the obtained plurality of NCC values, the average of the normalized cross-correlation values of the two-dimensional projections of the plurality of second three-dimensional map points on the first image and the second image.
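A minimal numpy sketch of this NCC computation follows. It simplifies in two stated ways: patches are sampled at nearest-integer pixel positions with no bounds checking, whereas the method above projects each of the n x n neighbours individually through the camera model to obtain the positions on the second image.

```python
import numpy as np

def ncc(patch1, patch2):
    """Normalized cross-correlation of two equally sized pixel patches."""
    a = patch1 - patch1.mean()
    b = patch2 - patch2.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def patch(img, x, n):
    """n x n patch of img centered at pixel x = (u, v); n is assumed odd,
    sampling is nearest-neighbor and bounds are not checked."""
    c, r = int(round(x[0])), int(round(x[1]))
    h = n // 2
    return img[r - h:r + h + 1, c - h:c + h + 1].astype(np.float64)

def mean_ncc(img1, img2, pts1, pts2, n=7):
    """Average NCC over all matched projections: the score of one candidate scale."""
    scores = [ncc(patch(img1, x1, n), patch(img2, x2, n))
              for x1, x2 in zip(pts1, pts2)]
    return float(np.mean(scores))
```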
And S140, projecting a plurality of second three-dimensional map points corresponding to the maximum average value of the normalized cross-correlation values to the second image to obtain a plurality of second two-dimensional feature points matched with the plurality of first two-dimensional feature points.
In the implementation of the invention, the two-dimensional characteristic points on the first image and the two-dimensional characteristic points on the second image can be used as the best matching points of the first image and the second image.
And S150, determining the spatial position of the moving target based on the matching relation of the plurality of first two-dimensional feature points and the plurality of second two-dimensional feature points.
In one implementation, a plurality of three-dimensional space coordinate points can be determined from the plurality of first two-dimensional feature points and the plurality of second two-dimensional feature points matched at the current moment, and the current spatial position of the moving target can then be determined in any of several ways, for example from a pre-calibrated spatial coordinate system or from image landmarks.
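The exact positioning computation is left open here ("a pre-calibrated spatial coordinate system or image landmarks"); as one illustrative possibility, a matched pixel pair can be triangulated once both cameras' rays are known from calibration. The midpoint sketch below is an assumed, generic technique, not the patent's prescribed method.

```python
import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint triangulation: the point halfway between the closest points
    of two rays o + t * d (camera centers o, ray directions d)."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    w = o1 - o2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w, d2 @ w
    denom = a * c - b * b
    if denom < 1e-12:                      # rays (almost) parallel
        raise ValueError("rays do not intersect reliably")
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    return 0.5 * ((o1 + t1 * d1) + (o2 + t2 * d2))
```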
According to the embodiments of the invention, the two-dimensional feature points and three-dimensional map points of the first image are acquired first. Because the depth of each three-dimensional map point obtained from a single frame is inaccurate while the relative depth relation is comparatively accurate, a uniform scale correction moves every three-dimensional map point to an accurate three-dimensional position; the updated three-dimensional map points are then projected onto the second image, where they reach the correct two-dimensional positions, yielding the best matching points between the first image and the second image, from which the spatial position of the moving target is determined. Compared with the prior art, the embodiments of the invention obtain better matches when the overlap between viewing angles is small, distortion is possibly severe, exposure differs considerably, or occlusion occurs; at the same time, compared with matching all feature points independently as in existing methods, this reduces the computation required for matching and makes the result less susceptible to a few mismatches.
In one implementation, the method further includes the steps of:
and updating the scale of the map according to the scale corresponding to the largest average of the normalized cross-correlation values; for example, the updated scale can be fed back to the visual odometer for error correction, thereby providing better optimization constraints.
The embodiments of the invention make full use of the characteristics and effective information of the visual odometer and provide a completely new image feature point matching method, which can robustly estimate the best match under a long-baseline, high-distortion configuration (such as a fisheye camera) while outputting a correction for the visual odometer scale.
FIG. 2 is a flow chart of a matching method according to an embodiment of the present invention; as shown, the matching method includes:
1. Obtaining two-dimensional feature points and three-dimensional map points from the image of camera 1 through monocular VO
2. Initializing the scale to the maximum possible value and then looping the following steps until it reaches the minimum value:
a) Applying the current scale to all original three-dimensional map points to obtain updated three-dimensional positions of the map points
b) Calculating a normalized cross-correlation value (NCC) of two-dimensional projections of the same three-dimensional map point between two cameras
i. Assuming that the depth within a small local area of the image is the same, projecting the two-dimensional feature point corresponding to each map point, together with the n x n square pixel points around it, to camera 2 according to the updated depth values to obtain n x n pixel positions on the image of camera 2
ii. Calculating NCC values:

$$\mathrm{NCC}(x_1,x_2)=\frac{\sum_{i=1}^{n\times n}\left(I_1(x_1,i)-\bar I_1(x_1)\right)\left(I_2(x_2,i)-\bar I_2(x_2)\right)}{\sqrt{\sum_{i=1}^{n\times n}\left(I_1(x_1,i)-\bar I_1(x_1)\right)^2\sum_{i=1}^{n\times n}\left(I_2(x_2,i)-\bar I_2(x_2)\right)^2}}$$

wherein $x_1$ is the two-dimensional feature point projected on the first image by any one of the plurality of second three-dimensional map points, and $x_2$ is the two-dimensional feature point projected on the second image by the same map point; $I_1$ and $I_2$ are the first image and the second image respectively; $i$ is the index of the n x n pixels around the two-dimensional feature point; $I_1(x_1,i)$ denotes the $i$-th pixel in the n x n region around $x_1$ on the first image, and $I_2(x_2,i)$ denotes the $i$-th pixel in the n x n region around $x_2$ on the second image; $\bar I_1(x_1)$ denotes the average of the n x n pixel values around $x_1$ on the first image, and $\bar I_2(x_2)$ denotes the average of the n x n pixel values around $x_2$ on the second image;
iii. Averaging the NCC values of all three-dimensional points and recording the result
c) Stepping the scale along the epipolar line
i. Finding the point of the smallest depth (nearest to camera 1) among all three-dimensional points
ii. Reducing its scale by a very small amount and projecting the new point onto camera 2 to obtain the direction of change of its projection on camera 2, i.e. the tangent to the epipolar line
iii. Stepping a distance of one pixel in this direction, and taking the ratio of the depth at this point to the original depth as the new scale
3. After the loop completes, outputting the matches with the largest mean NCC value and the corresponding scale value
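Tying the flow together, the sketch below runs the whole loop. `project_to_cam2`, `mean_ncc` and `next_scale` stand for the routines sketched earlier, and `vo_points` / `vo_features` for the monocular VO outputs; all names are illustrative rather than the patent's API.

```python
def match_with_scale_sweep(vo_points, vo_features, img1, img2,
                           project_to_cam2, mean_ncc, next_scale,
                           scale_max, scale_min):
    """Sweep the scale from largest to smallest, score each candidate by the
    mean NCC of the projected matches, and keep the best-scoring scale."""
    best_score, best_scale, best_matches = -1.0, None, None
    scale = scale_max                                    # 2. start from the maximum
    while scale > scale_min:
        scaled = [scale * p for p in vo_points]          # a) apply the current scale
        pts2 = [project_to_cam2(p) for p in scaled]      #    project onto camera 2
        score = mean_ncc(img1, img2, vo_features, pts2)  # b) mean NCC at this scale
        if score > best_score:
            best_score, best_scale = score, scale
            best_matches = list(zip(vo_features, pts2))
        scale, _ = next_scale(vo_points, scale, project_to_cam2)  # c) epipolar step
    return best_matches, best_scale                      # 3. best matches + scale
```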
The embodiments of the invention make full use of the characteristics and effective information of the visual odometer and provide a brand-new matching method, which can robustly estimate the best match under a long-baseline, high-distortion fisheye camera configuration while outputting a correction for the visual odometer scale. The result can be fed back to the visual odometer for error correction and to provide further optimization constraints. This is one of the innovation points of the invention.
In accordance with the above-described method embodiments, FIG. 3 shows a block diagram of an in-vehicle terminal according to one embodiment; as shown in the figure, the in-vehicle terminal 300 includes:
an acquisition module 310 configured to acquire a plurality of first two-dimensional feature points and a plurality of first three-dimensional map points of a first image, wherein the first image is acquired by a first image acquisition unit of a binocular image acquisition apparatus, the binocular image acquisition apparatus being disposed on a moving object;
the map point updating module 320 is configured to update a current scale of the map according to a set order and a step length within a preset scale range, and update a plurality of first three-dimensional map points according to the current scale to obtain a plurality of second three-dimensional map points;
a calculation module 330 configured to calculate an average of normalized cross-correlation values of two-dimensional projections of the plurality of second three-dimensional map points at each updated scale on the first and second images;
a matching module 340, configured to project the plurality of second three-dimensional map points corresponding to the largest average of the normalized cross-correlation values onto the second image, so as to obtain a plurality of second two-dimensional feature points matched with the plurality of first two-dimensional feature points;
a positioning module 350, configured to determine a spatial position where the moving target is located based on a matching relationship between the plurality of first two-dimensional feature points and the plurality of second two-dimensional feature points.
In an implementation manner, the vehicle-mounted terminal further includes:
and the scale updating module is configured to update the scale of the map according to the scale corresponding to the maximum average value of the normalized cross-correlation values.
In one implementation, the map point updating module includes:
the scale adjusting unit is configured to initialize the current scale to its maximum possible value within a preset scale range; until the current scale reaches the minimum value, its next value is the scale value determined by the step scale acquisition unit:
a depth minimum point acquisition unit configured to acquire a three-dimensional map point having a minimum depth with respect to a first camera among a plurality of the first three-dimensional map points;
a depth minimum point updating unit configured to reduce a current scale of the map by a set amount, and update a three-dimensional map point with the minimum depth according to the reduced scale;
a change direction determining unit configured to project the updated three-dimensional map point onto the second image, to obtain a change direction of the projection of the three-dimensional map point with the minimum depth on the second image;
the step scale acquisition unit is configured to step a distance of one pixel along the change direction, and take the ratio of the resulting depth to the original depth as the new scale of the map.
In one implementation, the computing module includes:
the projection unit is configured to project each first two-dimensional feature point corresponding to each second three-dimensional map point, together with the n x n square pixel points around it, onto the second image according to the updated depth value to obtain n x n pixel positions on the second image, on the assumption that the depth within a small local area of the image is the same, where n is a positive integer;
an NCC calculating unit configured to calculate a normalized cross-correlation value NCC of the two-dimensional projection of each of the second three-dimensional map points on the first image and the second image according to the following formula:

$$\mathrm{NCC}(x_1,x_2)=\frac{\sum_{i=1}^{n\times n}\left(I_1(x_1,i)-\bar I_1(x_1)\right)\left(I_2(x_2,i)-\bar I_2(x_2)\right)}{\sqrt{\sum_{i=1}^{n\times n}\left(I_1(x_1,i)-\bar I_1(x_1)\right)^2\sum_{i=1}^{n\times n}\left(I_2(x_2,i)-\bar I_2(x_2)\right)^2}}$$

wherein $x_1$ is the two-dimensional feature point projected on the first image by any one of the plurality of second three-dimensional map points, and $x_2$ is the two-dimensional feature point projected on the second image by the same map point; $I_1$ and $I_2$ are the first image and the second image respectively; $i$ is the index of the n x n pixels around the two-dimensional feature point; $I_1(x_1,i)$ denotes the $i$-th pixel in the n x n region around $x_1$ on the first image, and $I_2(x_2,i)$ denotes the $i$-th pixel in the n x n region around $x_2$ on the second image; $\bar I_1(x_1)$ denotes the average of the n x n pixel values around $x_1$ on the first image, and $\bar I_2(x_2)$ denotes the average of the n x n pixel values around $x_2$ on the second image;
and an average value unit configured to obtain an average value of normalized cross-correlation values of two-dimensional projections of the plurality of second three-dimensional map points on the first image and the second image according to the obtained plurality of NCC values.
According to the embodiments of the invention, the two-dimensional feature points and three-dimensional map points of the first image are acquired first. Because the depth of each three-dimensional map point obtained from a single frame is inaccurate while the relative depth relation is comparatively accurate, a uniform scale correction moves every three-dimensional map point to an accurate three-dimensional position; the updated three-dimensional map points are then projected onto the second image, where they reach the correct two-dimensional positions, yielding the best matching points between the first image and the second image. Compared with the prior art, the embodiments of the invention obtain better matches when the overlap between viewing angles is small, distortion is possibly severe, exposure differs considerably, or occlusion occurs; at the same time, compared with matching all feature points independently as in existing methods, this reduces the computation required for matching and makes the result less susceptible to a few mismatches.
Those of ordinary skill in the art will appreciate that: the drawing is a schematic diagram of one embodiment and the modules or flows in the drawing are not necessarily required to practice the invention.
Those of ordinary skill in the art will appreciate that: the modules in the apparatus of the embodiments may be distributed in the apparatus of the embodiments according to the description of the embodiments, or may be located in one or more apparatuses different from the present embodiments with corresponding changes. The modules of the above embodiments may be combined into one module, or may be further split into a plurality of sub-modules.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A binocular vision-based positioning method, characterized by comprising the following steps:
acquiring a plurality of first two-dimensional feature points and a plurality of first three-dimensional map points of a first image, wherein the first image is acquired through a first image acquisition unit of a binocular image acquisition device, and the binocular image acquisition device is arranged on a moving target;
updating the current scale of the map according to the set order and step length in a preset scale range, and updating a plurality of first three-dimensional map points according to the current scale to obtain a plurality of second three-dimensional map points;
projecting each first two-dimensional feature point corresponding to each second three-dimensional map point, together with the n x n square pixel points around it, onto a second image according to the updated depth value to obtain n x n pixel positions on the second image, on the assumption that the depth within a small local area of the image is the same, where n is a positive integer;
calculating a normalized cross-correlation value NCC of the two-dimensional projection of each of the second three-dimensional map points on the first image and the second image according to the following formula:

$$\mathrm{NCC}(x_1,x_2)=\frac{\sum_{i=1}^{n\times n}\left(I_1(x_1,i)-\bar I_1(x_1)\right)\left(I_2(x_2,i)-\bar I_2(x_2)\right)}{\sqrt{\sum_{i=1}^{n\times n}\left(I_1(x_1,i)-\bar I_1(x_1)\right)^2\sum_{i=1}^{n\times n}\left(I_2(x_2,i)-\bar I_2(x_2)\right)^2}}$$

wherein $x_1$ is the two-dimensional feature point projected on the first image by any one of the plurality of second three-dimensional map points, and $x_2$ is the two-dimensional feature point projected on the second image by the same map point; $I_1$ and $I_2$ are the first image and the second image respectively; $i$ is the index of the n x n pixels around the two-dimensional feature point; $I_1(x_1,i)$ denotes the $i$-th pixel in the n x n region around $x_1$ on the first image, and $I_2(x_2,i)$ denotes the $i$-th pixel in the n x n region around $x_2$ on the second image; $\bar I_1(x_1)$ denotes the average of the n x n pixel values around $x_1$ on the first image, and $\bar I_2(x_2)$ denotes the average of the n x n pixel values around $x_2$ on the second image;
obtaining, according to the obtained plurality of NCC values, the average of the normalized cross-correlation values of the two-dimensional projections of the plurality of second three-dimensional map points on the first image and the second image, wherein the second image is acquired by a second image acquisition unit of the binocular image acquisition device, and the acquisition time of the second image is the same as the acquisition time of the first image;
projecting a plurality of second three-dimensional map points corresponding to the largest average value of the normalized cross-correlation values to the second image to obtain a plurality of second two-dimensional feature points matched with the plurality of first two-dimensional feature points;
and determining the spatial position of the moving target based on the matching relation of the plurality of first two-dimensional feature points and the plurality of second two-dimensional feature points.
2. The binocular vision based positioning method of claim 1, wherein the binocular image acquisition apparatus is a long baseline binocular fisheye camera.
3. The binocular vision based positioning method of any one of claims 1-2, further comprising the steps of:
and updating the scale of the map according to the scale corresponding to the maximum average value of the normalized cross-correlation values.
4. The binocular vision based positioning method of claim 1, wherein the plurality of first two-dimensional feature points and the plurality of first three-dimensional map points of the first image are acquired by a monocular vision odometer.
5. The binocular vision based positioning method of claim 1, wherein the updating the current scale of the map in a set order and step size within a preset scale range comprises:
within the preset scale range, the current scale is initialized to its maximum possible value, and then, until it reaches the minimum value, the next value of the current scale is obtained by the following steps:
acquiring a three-dimensional map point with the minimum depth relative to a first camera in the plurality of first three-dimensional map points;
reducing the current scale of the map by a set amount, and updating the three-dimensional map point with the minimum depth according to the reduced scale;
projecting the updated three-dimensional map points to the second image to obtain the change direction of the projection of the three-dimensional map points with the minimum depth on the second image;
and stepping a distance of one pixel along the change direction, and taking the ratio of the resulting depth to the original depth as the new scale of the map.
6. A vehicle-mounted terminal, characterized by comprising:
the acquisition module is configured to acquire a plurality of first two-dimensional feature points and a plurality of first three-dimensional map points of a first image, wherein the first image is acquired by a first image acquisition unit of a binocular image acquisition device, and the binocular image acquisition device is arranged on a moving target;
the map point updating module is configured to update the current scale of the map according to the set sequence and step length in a preset scale range, and update the plurality of first three-dimensional map points according to the current scale to obtain a plurality of second three-dimensional map points;
a computing module configured to:
projecting each first two-dimensional feature point corresponding to each second three-dimensional map point, together with the n x n square pixel points around it, onto a second image according to the updated depth value to obtain n x n pixel positions on the second image, on the assumption that the depth within a small local area of the image is the same, where n is a positive integer;
calculating a normalized cross-correlation value NCC of the two-dimensional projection of each of the second three-dimensional map points on the first image and the second image according to the following formula:

$$\mathrm{NCC}(x_1,x_2)=\frac{\sum_{i=1}^{n\times n}\left(I_1(x_1,i)-\bar I_1(x_1)\right)\left(I_2(x_2,i)-\bar I_2(x_2)\right)}{\sqrt{\sum_{i=1}^{n\times n}\left(I_1(x_1,i)-\bar I_1(x_1)\right)^2\sum_{i=1}^{n\times n}\left(I_2(x_2,i)-\bar I_2(x_2)\right)^2}}$$

wherein $x_1$ is the two-dimensional feature point projected on the first image by any one of the plurality of second three-dimensional map points, and $x_2$ is the two-dimensional feature point projected on the second image by the same map point; $I_1$ and $I_2$ are the first image and the second image respectively; $i$ is the index of the n x n pixels around the two-dimensional feature point; $I_1(x_1,i)$ denotes the $i$-th pixel in the n x n region around $x_1$ on the first image, and $I_2(x_2,i)$ denotes the $i$-th pixel in the n x n region around $x_2$ on the second image; $\bar I_1(x_1)$ denotes the average of the n x n pixel values around $x_1$ on the first image, and $\bar I_2(x_2)$ denotes the average of the n x n pixel values around $x_2$ on the second image;
obtaining, according to the obtained plurality of NCC values, the average of the normalized cross-correlation values of the two-dimensional projections of the plurality of second three-dimensional map points on the first image and the second image, wherein the second image is acquired by a second image acquisition unit of the binocular image acquisition device, and the acquisition time of the second image is the same as the acquisition time of the first image;
the matching module is configured to project a plurality of second three-dimensional map points corresponding to the largest average value of the normalized cross-correlation values to the second image to obtain a plurality of second two-dimensional feature points matched with the plurality of first two-dimensional feature points;
and the positioning module is configured to determine the spatial position of the moving target based on the matching relation of the plurality of first two-dimensional feature points and the plurality of second two-dimensional feature points.
7. The in-vehicle terminal according to claim 6, characterized by further comprising:
and the scale updating module is configured to update the scale of the map according to the scale corresponding to the maximum average value of the normalized cross-correlation values.
8. The vehicle-mounted terminal according to claim 6, wherein the map point updating module includes:
the scale adjusting unit is configured to initialize the current scale to its maximum possible value within a preset scale range; until the current scale reaches the minimum value, its next value is the scale value determined by the step scale acquisition unit:
a depth minimum point acquisition unit configured to acquire a three-dimensional map point having a minimum depth with respect to a first camera among a plurality of the first three-dimensional map points;
a depth minimum point updating unit configured to reduce a current scale of the map by a set amount, and update a three-dimensional map point with the minimum depth according to the reduced scale;
a change direction determining unit configured to project the updated three-dimensional map point onto the second image, to obtain a change direction of the projection of the three-dimensional map point with the minimum depth on the second image;
the step scale acquisition unit is configured to step a distance of one pixel along the change direction, and take the ratio of the resulting depth to the original depth as the new scale of the map.
CN201811468765.2A 2018-12-03 2018-12-03 Positioning method and vehicle-mounted terminal based on a long-baseline binocular fisheye camera Active CN111260538B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811468765.2A CN111260538B (en) 2018-12-03 2018-12-03 Positioning method and vehicle-mounted terminal based on a long-baseline binocular fisheye camera


Publications (2)

Publication Number Publication Date
CN111260538A CN111260538A (en) 2020-06-09
CN111260538B true CN111260538B (en) 2023-10-03

Family

ID=70952085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811468765.2A Active CN111260538B (en) 2018-12-03 2018-12-03 Positioning method and vehicle-mounted terminal based on a long-baseline binocular fisheye camera

Country Status (1)

Country Link
CN (1) CN111260538B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112907671B (en) * 2021-03-31 2022-08-02 深圳市慧鲤科技有限公司 Point cloud data generation method and device, electronic equipment and storage medium
CN114119761B (en) * 2022-01-28 2022-06-14 杭州宏景智驾科技有限公司 Multi-camera motor vehicle positioning method and device, electronic equipment and storage medium


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10706582B2 (en) * 2012-09-17 2020-07-07 Nec Corporation Real-time monocular structure from motion
US10062005B2 (en) * 2015-03-17 2018-08-28 Teledyne Scientific & Imaging, Llc Multi-scale correspondence point matching using constellation of image chips
CN106157307B (en) * 2016-06-27 2018-09-11 浙江工商大学 A kind of monocular image depth estimation method based on multiple dimensioned CNN and continuous CRF

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103575267A (en) * 2012-07-19 2014-02-12 霍尼韦尔国际公司 Method of correlating images with terrain elevation maps for navigation
CN103247040A (en) * 2013-05-13 2013-08-14 北京工业大学 Layered topological structure based map splicing method for multi-robot system
CN105469405A (en) * 2015-11-26 2016-04-06 清华大学 Visual ranging-based simultaneous localization and map construction method
CN107705333A (en) * 2017-09-21 2018-02-16 歌尔股份有限公司 Space-location method and device based on binocular camera
WO2020038386A1 (en) * 2018-08-22 2020-02-27 杭州萤石软件有限公司 Determination of scale factor in monocular vision-based reconstruction
CN109887019A (en) * 2019-02-19 2019-06-14 北京市商汤科技开发有限公司 A kind of binocular ranging method and device, equipment and storage medium
WO2020168716A1 (en) * 2019-02-19 2020-08-27 北京市商汤科技开发有限公司 Binocular matching method and apparatus, and device and storage medium
CN110288710A (en) * 2019-06-26 2019-09-27 Oppo广东移动通信有限公司 A kind of processing method of three-dimensional map, processing unit and terminal device

Also Published As

Publication number Publication date
CN111260538A (en) 2020-06-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220301

Address after: 100083 unit 501, block AB, Dongsheng building, No. 8, Zhongguancun East Road, Haidian District, Beijing

Applicant after: BEIJING MOMENTA TECHNOLOGY Co.,Ltd.

Address before: Room 28, 4 / F, block a, Dongsheng building, No. 8, Zhongguancun East Road, Haidian District, Beijing 100089

Applicant before: BEIJING CHUSUDU TECHNOLOGY Co.,Ltd.

GR01 Patent grant