CN113409391A - Visual positioning method and related device, equipment and storage medium


Info

Publication number
CN113409391A
Authority
CN
China
Prior art keywords
pose
image frame
current image
candidate
plane
Prior art date
Legal status
Granted
Application number
CN202110711230.9A
Other languages
Chinese (zh)
Other versions
CN113409391B (en)
Inventor
王求元
Current Assignee
Zhejiang Shangtang Technology Development Co Ltd
Original Assignee
Zhejiang Shangtang Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Shangtang Technology Development Co Ltd
Priority to CN202110711230.9A
Publication of CN113409391A
Application granted
Publication of CN113409391B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00 Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01 Indexing scheme relating to G06F3/01
    • G06F2203/012 Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30244 Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The application discloses a visual positioning method and a related device, equipment and storage medium, wherein the visual positioning method comprises the following steps: acquiring a current image frame captured by a shooting device; acquiring a first reference attitude, wherein the first reference attitude is an attitude of the shooting device relative to a reference plane and corresponding to the shooting moment of the current image frame; adjusting the first reference attitude by utilizing the offset between the reference plane and a preset plane in the world coordinate system to obtain a second reference attitude; and determining the final pose of the current image frame in the world coordinate system based on the second reference pose and the image information of the current image frame and other image frames, wherein the shooting time of the other image frames is before that of the current image frame. By means of the scheme, the accuracy of the final pose is improved.

Description

Visual positioning method and related device, equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a visual positioning method, and a related apparatus, device, and storage medium.
Background
Computer vision technologies such as Augmented Reality (AR) and Virtual Reality (VR) are current hotspot technologies. Using a camera as an input device and processing its images with image algorithms, the surrounding environment can be digitized, providing an interactive experience with the real environment. Visual localization is an important application of AR and VR technology: by acquiring images shot by a device, the pose information of the device can be obtained.
However, in existing visual positioning technology, pose information is obtained by directly performing positioning calculation on the image frames shot by the device, and the calculated pose is often inaccurate due to errors in that positioning calculation.
Therefore, how to improve the accuracy of the visual positioning has very important significance.
Disclosure of Invention
The application provides a visual positioning method, a related device, equipment and a storage medium.
A first aspect of the present application provides a visual positioning method, including: acquiring a current image frame captured by a shooting device; acquiring a first reference attitude, wherein the first reference attitude is an attitude of the shooting device relative to a reference plane and corresponding to the shooting moment of the current image frame; adjusting the first reference attitude by utilizing the offset between the reference plane and a preset plane in the world coordinate system to obtain a second reference attitude; and determining the final pose of the current image frame in the world coordinate system based on the second reference pose and the image information of the current image frame and other image frames, wherein the shooting time of the other image frames is before that of the current image frame.
Therefore, by acquiring the offset between the reference plane and the preset plane in the world coordinate system, the first reference attitude information can be adjusted based on the offset to obtain the second reference attitude information, and the second reference attitude information is used as the reference information of the attitude of the current image frame relative to the preset plane in the world coordinate system, so that the final attitude of the current image frame in the world coordinate system can be optimized by using the second reference attitude information, the accuracy of the final attitude is improved, and accurate positioning is realized.
Before the first reference attitude is adjusted by using the offset between the reference plane and the preset plane in the world coordinate system to obtain the second reference attitude, the method further includes: acquiring a first pose of the first historical image frame in a world coordinate system, and acquiring a third reference pose, wherein the third reference pose is a pose of the shooting device corresponding to the shooting time of the first historical image frame and relative to a reference plane; and obtaining an offset by using the posture in the first posture and the third reference posture, wherein the posture in the first posture is a posture relative to a preset plane.
Therefore, by obtaining the attitudes with respect to the preset plane and the reference plane corresponding to the same time, the offset between the reference plane and the preset plane in the world coordinate system can be obtained using the attitudes of the preset plane and the reference plane.
The first reference attitude is detected by a sensing device fixed relative to the shooting device, and the difference between the detection time of the first reference attitude and the shooting time of the current image frame does not exceed a first preset time difference; the third reference posture is detected by a sensing device fixed relative to the shooting device, and the difference between the detection time of the third reference posture and the shooting time of the first historical image frame does not exceed a second preset time difference.
Therefore, by selecting the first reference attitude or the third reference attitude that satisfies the first preset time difference or the second preset time difference, it can be considered that the first reference attitude is the attitude information of the photographing device at the photographing time of the current image frame at this time, and the third reference attitude is the attitude information of the photographing device at the photographing time of the first history image frame, thereby obtaining the reference attitude corresponding to the photographing time of the current image frame.
The first pose is determined based on the positioning auxiliary image, wherein the preset plane is a plane where the positioning auxiliary image is located; obtaining an offset by using the pose in the first pose and the third reference pose includes: and taking the ratio between the posture in the first posture and the third reference posture as the offset.
Therefore, by taking the ratio, the offset amount can be obtained.
The above acquiring the first pose of the first historical image frame in the world coordinate system includes: determining a first transformation parameter between the first historical image frame and the positioning auxiliary image based on a first matching point pair between the first historical image frame and the positioning auxiliary image, and obtaining the first pose by using the first transformation parameter; or determining a second transformation parameter between the first historical image frame and a second historical image frame based on a second matching point pair between the first historical image frame and the second historical image frame, and obtaining the first pose by using the second transformation parameter and a third transformation parameter between the second historical image frame and the positioning auxiliary image, wherein the second historical image frame is captured before the first historical image frame.
Thus, the first pose may be derived based on a first pair of matching points between the first history image frame and the positioning assistance image, or a second pair of matching points between the first history image frame and the second history image frame and a third transformation parameter between the second history image frame and the positioning assistance image.
The determining the final pose of the current image frame in the world coordinate system based on the second reference pose and the image information of the current image frame and other image frames includes: and determining a final pose based on the second reference pose and photometric errors between the current image frame and other image frames.
Therefore, by calculating photometric errors between the current image frame and other image frames and reducing the errors by using the second reference pose, the accuracy of the final pose can be improved.
Wherein, the determining the final pose based on the second reference pose and the photometric error between the current image frame and the other image frames includes: at least one first candidate pose is obtained, and a first candidate pose is selected as a final pose based on the second reference pose and first pixel value differences between the current image frame and other image frames.
Therefore, the first candidate pose is restrained and optimized by utilizing the second reference pose, and the accurate first candidate pose can be obtained when the first candidate pose is solved.
The first candidate pose is determined based on the initial pose of the current image frame in the world coordinate system, and the initial pose is determined based on photometric errors between the current image frame and other image frames; before determining the final pose based on the second reference pose and photometric errors between the current image frame and other image frames, the method further comprises: determining a spatial point corresponding to the first characteristic point in the other image frames by using the second poses of the other image frames in the world coordinate system; selecting a first candidate pose as a final pose based on the second reference pose and the first pixel value difference between the current image frame and the other image frames, comprising: and determining a second feature point corresponding to the first candidate pose from the current image frame based on each first candidate pose and each space point, acquiring a first pixel value difference between the first feature point and the second feature point, and selecting the first candidate pose as a final pose based on the first pixel value difference and a pose difference between the second reference pose and the first candidate pose.
Therefore, by using the corresponding points of the three-dimensional points in the determined space in the other image frames and in the current image frame, it is possible to obtain a more accurate first candidate pose by calculating the difference of the pixel values and by using the data of the gyroscope (second reference pose) to constrain and optimize the first candidate pose.
Before determining the final pose based on the second reference pose and photometric errors between the current image frame and other image frames, the method further comprises the following steps: acquiring at least one second candidate pose, and selecting a second candidate pose as an initial pose based on second pixel value differences between the current image frame and other image frames; and/or selecting a first candidate pose as the final pose based on the first pixel value difference and a pose difference between the second reference pose and the first candidate pose, comprising: selecting a first candidate pose corresponding to a second feature point whose first pixel value difference and attitude difference meet first preset requirements as the final pose.
Therefore, by determining the corresponding points of the three-dimensional points in the space in other image frames and in the current image frame, a more accurate initial pose can be obtained by a difference method for calculating the second pixel value. In addition, the first candidate poses meeting the preset requirements are screened, so that relatively accurate pose information can be obtained.
Wherein the second candidate pose is determined based on the second pose; and/or selecting a second candidate pose as the initial pose based on second pixel value differences between the current image frame and other image frames, including: and selecting a second candidate pose corresponding to a second feature point with a second pixel value difference meeting a second preset requirement as an initial pose.
Therefore, a relatively accurate initial pose can be obtained by screening the second candidate poses meeting the preset requirements.
A second aspect of the present application provides a visual positioning apparatus comprising: an acquisition module, a first pose determination module, an adjustment module and a second pose determination module; the acquisition module is used for acquiring a current image frame captured by a shooting device of the apparatus; the first pose determination module is used for acquiring a first reference pose, wherein the first reference pose is a pose of the apparatus corresponding to the shooting moment of the current image frame and relative to a reference plane; the adjustment module is used for adjusting the first reference pose by utilizing the offset between the reference plane and a preset plane in the world coordinate system to obtain a second reference pose; and the second pose determination module is used for determining the final pose of the current image frame in the world coordinate system based on the second reference pose and the image information of the current image frame and other image frames, wherein the shooting time of the other image frames is before the current image frame.
A third aspect of the present application provides an electronic device, which includes a memory and a processor coupled to each other, wherein the processor is configured to execute program instructions stored in the memory to implement the visual positioning method in the first aspect.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon program instructions, which when executed by a processor, implement the visual positioning method of the first aspect.
According to the scheme, the offset between the reference plane and the preset plane in the world coordinate system is obtained, the first reference attitude information can be adjusted based on the offset, and the second reference attitude information is obtained and is used as the reference information of the attitude of the current image frame relative to the preset plane in the world coordinate system, so that the final attitude of the current image frame in the world coordinate system can be optimized by using the second reference attitude information, the accuracy of the final attitude is improved, and accurate positioning is achieved.
Drawings
FIG. 1 is a schematic flow chart of a first embodiment of a visual positioning method of the present application;
FIG. 2 is a schematic flow chart of a second embodiment of the visual positioning method of the present application;
FIG. 3 is a schematic flow chart of a third embodiment of the visual positioning method of the present application;
FIG. 4 is a schematic diagram of a frame of an embodiment of the visual positioning apparatus of the present application;
FIG. 5 is a block diagram of an embodiment of an electronic device of the present application;
FIG. 6 is a block diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship. Further, the term "plurality" herein means two or more than two.
Referring to fig. 1, fig. 1 is a schematic flow chart of a visual positioning method according to a first embodiment of the present application.
Specifically, the method may include the steps of:
step S11: a current image frame captured by a capturing device is acquired.
It is understood that the shooting device may be any device capable of capturing images, such as a mobile phone, a tablet computer, a notebook computer, etc. The current image frame is captured by the shooting device at the current moment.
In some embodiments, in order to achieve positioning of the shooting device, the shooting device may be used to shoot a target plane, and then the method of this embodiment is used to obtain, based on the captured image frame, a pose of the shooting device relative to the target plane when the image frame is shot, that is, a pose of the shooting device in a world coordinate system established based on the target plane when the image frame is shot. The target plane may be understood as a preset plane of the world coordinate system, such as the XOY plane or the YOZ plane, and the world coordinate system is thereby constructed. In some application scenarios, a certain point of the target plane may be taken as the origin of the world coordinate system, such as but not limited to the midpoint of the corresponding target plane portion in the image frame, or the midpoint of the positioning assistance image in the target plane. In some embodiments, the target plane may be any plane, for example, a horizontal plane or a plane other than a vertical plane, so that the pose of the camera with respect to this other plane can be obtained.
Step S12: a first reference pose is acquired.
The first reference attitude is an attitude of the photographing device corresponding to a photographing time of the current image frame and relative to the reference plane. The first reference attitude is, for example, rotation information of the photographing apparatus, i.e., rotation information of the photographing apparatus with respect to a reference plane.
In some implementations, the first reference pose is detected by a sensing device that is fixed relative to the shooting device, and thus the pose information obtained by the sensing device can be considered to be the pose information of the shooting device. The sensor may be integrated within the shooting device, so that the sensing device is fixed relative to the shooting device; for example, a sensor within a mobile phone or a tablet computer. The sensor may also be external to the shooting device and arranged on the same platform as the shooting device; for example, the shooting device and the sensor may both be arranged in one electronic device, such as an unmanned aerial vehicle or a mobile robot. The sensing device is, for example, a gyroscope.
In some implementation scenarios, the difference between the detection time of the first reference pose and the capturing time of the current image frame does not exceed a first preset time difference, and thus the first reference pose may be considered as the pose of the shooting device corresponding to the capturing time of the current image frame. The first preset time difference is, for example, 20 ms or 15 ms, and the specific value can be set according to requirements. In a specific implementation scenario, when there are a plurality of detection times whose difference from the shooting time of the current image frame does not exceed the first preset time difference, the detection time closest to the shooting time of the current image frame may be selected to obtain the first reference pose. Since the difference between the detection time and the capturing time of the current image frame does not exceed the first preset time difference, the first reference attitude may be considered to be the attitude information of the device at the capturing time of the current image frame. The sensing device is, for example, a gyroscope of the shooting device. The reference plane is, for example, a certain plane determined based on the data of the gyroscope, for example, a horizontal plane or a vertical plane.
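As a minimal illustration of this timestamp matching (the function and variable names below are assumptions for the sketch, not taken from the patent), a sensor reading can be associated with an image frame as follows:

```python
# Sketch: pick the sensing-device (e.g. gyroscope) attitude whose timestamp is closest
# to the image capture time, but only if it lies within the preset time difference.
def select_reference_attitude(capture_time, sensor_readings, max_time_diff=0.02):
    """sensor_readings: list of (timestamp, attitude) tuples; times in seconds."""
    candidates = [(abs(ts - capture_time), att)
                  for ts, att in sensor_readings
                  if abs(ts - capture_time) <= max_time_diff]
    if not candidates:
        return None  # no reading close enough to the capture moment
    return min(candidates, key=lambda c: c[0])[1]
```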
Step S13: and adjusting the first reference attitude by using the offset between the reference plane and a preset plane in the world coordinate system to obtain a second reference attitude.
The preset plane in the world coordinate system is, for example, the XOY plane of the world coordinate system, or the XOZ plane, the YOZ plane, or the like. In one implementation scenario, the preset plane is the XOY plane of the world coordinate system. In a specific implementation scenario, if the pose information of the shooting device is acquired based on an image frame obtained by shooting a positioning auxiliary image on the target plane, the plane where the positioning auxiliary image is located may be the preset plane.
After the first reference attitude is obtained, it means that the attitude of the photographing apparatus with respect to the reference plane has been obtained. At this time, the offset between the reference plane and the other plane may be obtained, and the first reference posture is adjusted by using the offset, so as to obtain a second reference posture of the photographing device relative to the other plane, that is, to obtain the posture of the photographing device relative to the other plane, that is, the rotation information.
In the present embodiment, the other plane is a preset plane in the world coordinate system, and thus the second reference posture can be regarded as rotation information of the photographing device with respect to the preset plane of the world coordinate system.
In one implementation scenario, since the first reference attitude is detected by the gyroscope and the reference plane is a certain plane determined based on the gyroscope, the second reference attitude obtained by adjusting the first reference attitude by using the offset between the reference plane and the preset plane in the world coordinate system can also be regarded as the rotation amount required for transforming the reference plane to the preset plane.
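A minimal sketch of this adjustment, assuming the attitudes and the offset are represented as 3x3 rotation matrices (this representation is an assumption of the sketch; the patent only speaks of attitudes and an offset):

```python
import numpy as np

# Step S13 as a rotation-matrix composition: the second reference attitude is the
# first reference attitude (relative to the reference plane) corrected by the offset
# between the reference plane and the preset plane of the world coordinate system.
def adjust_reference_attitude(offset, first_reference_attitude):
    return offset @ first_reference_attitude

# If the reference plane already coincides with the preset plane, the offset is the
# identity and the attitude is returned unchanged.
R_first = np.eye(3)
R_second = adjust_reference_attitude(np.eye(3), R_first)
```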
Step S14: and determining the final pose of the current image frame in the world coordinate system based on the second reference pose and the image information of the current image frame and other image frames.
In one implementation scenario, the capture time of the other image frames precedes the current image frame. The other image frames may differ from the current image frame by n frames, n being greater than or equal to 0. In one implementation scenario, n is 0, and the other image frame is the frame immediately preceding the current image frame. In one implementation scenario, when n is greater than 0, the other image frame may be an earlier frame preceding the current image frame.
The image information of the current image frame or other image frames is, for example, information about feature points on the image, and the information about the feature points can be used to calculate the pose information of the camera. The feature points can be obtained by extracting features of the image frame through a feature extraction algorithm. In the present application, the pose information may include attitude information and position information of the shooting device, which will not be repeated later. In one specific implementation scenario, the attitude information of the camera is orientation information, which can also be understood as rotation information, and the position information is the three-dimensional coordinates of the camera in the world coordinate system, or a three-dimensional vector from the origin to the position of the camera.
In the disclosed embodiment, the final pose is the pose of the current image frame in the world coordinate system.
In an implementation scenario, the image information of the current image frame and the image information of other image frames can first be used, with an image processing algorithm, to obtain first relative pose change information between the two image frames. Then, an initial final pose is obtained based on the first relative pose change information and the pose information of the other image frames relative to the world coordinate system, which is calculated using the other image frames. Alternatively, the pose of the image frame closest to the current frame may be taken directly as the initial final pose; for example, the pose of the previous image frame can be taken as the initial final pose of the current image frame. The initial final pose comprises rotation amount information relative to the preset plane of the world coordinate system, and the final pose of the current image frame in the world coordinate system is determined by calculating and optimizing with this rotation amount information, the second reference pose information, and the image information of the current image frame and other image frames.
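As one hedged illustration of the first option above (composing the relative pose change with the pose of the other image frame), assuming world-to-camera poses written as rotation matrices and translation vectors (the names and the convention are assumptions of the sketch):

```python
import numpy as np

# Initial final pose of the current image frame from (i) the relative pose change
# (R_rel, t_rel) from the other image frame to the current image frame and (ii) the
# pose (R_other, t_other) of the other image frame in the world coordinate system,
# assuming the world-to-camera convention X_cam = R @ X_world + t.
def initial_final_pose(R_rel, t_rel, R_other, t_other):
    R_init = R_rel @ R_other
    t_init = R_rel @ t_other + t_rel
    return R_init, t_init
```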
Therefore, the offset between the reference plane and the preset plane in the world coordinate system is obtained, the first reference attitude information can be adjusted based on the offset to obtain the second reference attitude information, and the second reference attitude information is used as the reference information of the attitude of the current image frame relative to the preset plane in the world coordinate system, so that the final attitude of the current image frame in the world coordinate system can be optimized by using the second reference attitude information, the accuracy of the final attitude is improved, and accurate positioning is realized.
In an implementation scenario, based on the second reference pose and image information of the current image frame and other image frames, determining a final pose of the current image frame in a world coordinate system may specifically be: and determining a final pose based on the second reference pose and photometric errors between the current image frame and other image frames.
In a specific implementation scenario, the relative pose change between the current image frame and other image frames is first obtained and the photometric error between the current image frame and other image frames is calculated; then the final pose of the current image frame is obtained by using the relative pose change; then the second reference pose is used as a constraint factor to optimize and reduce the photometric error as much as possible, and finally the final pose of the current image frame in the world coordinate system is obtained.
In a specific implementation scenario, the pose information of the current image frame in the world coordinate system is firstly obtained as an initial final pose, then the photometric error between the current image frame and other image frames is obtained by using the pose information of other image frames in the world coordinate system, then the photometric error is optimized and reduced as much as possible by using the second reference pose as a constraint factor, and finally the final pose of the current image frame in the world coordinate system is obtained.
Therefore, by calculating photometric errors between the current image frame and other image frames and reducing the errors by using the second reference pose, the accuracy of the final pose can be improved.
Referring to fig. 2, fig. 2 is a flowchart illustrating a visual positioning method according to a second embodiment of the present application. This embodiment is a further extension of the first embodiment, and specifically, before the step of "obtaining the first reference posture" mentioned in the step S12, the following steps may be further performed to obtain the offset between the reference plane and the preset plane in the world coordinate system.
Step S21: and acquiring a first pose of the first historical image frame in a world coordinate system, and acquiring a third reference pose.
In one implementation scenario, the first pose is determined based on the positioning assistance image, wherein the predetermined plane is a plane where the positioning assistance image is located, at this time, the center of the positioning assistance image is at an origin of a world coordinate system, a horizontal axis of the positioning assistance image is parallel to an X axis of the world coordinate system, a vertical axis of the positioning assistance image is parallel to a Y axis of the world coordinate system, and a Z axis of the world coordinate system is perpendicular to the positioning assistance image plane.
In one implementation scenario, the first pose may be obtained by performing image detection using the positioning assistance image through an image registration technique.
In one implementation scenario, the third reference pose may be a pose of the camera with respect to the reference plane corresponding to the capturing time of the first historical image frame. The third reference attitude is detected by a sensing device (e.g., a gyroscope) fixed relative to the camera.
In one implementation scenario, the difference between the detection time of the third reference pose and the capturing time of the first historical image frame does not exceed a second preset time difference. In this way, the third reference attitude can be considered to correspond to the same moment as the first pose.
In one implementation scenario, a first transformation parameter between the first history image frame and the positioning assistance image may be determined based on a first matching point pair between the first history image frame and the positioning assistance image, and a first pose may be derived using the first transformation parameter.
The first transformation parameter is for example a homography matrix of the first historical image frame with respect to the positioning assistance image. The first pose may be obtained from the homography matrix using an image processing algorithm, such as the PnP (Perspective-n-Point) algorithm.
In the present embodiment, the feature points used for feature extraction based on the image frame can be considered to lie in the same plane as the positioning assistance image. In the present application, the feature points extracted from the image frames may include feature points obtained by feature extraction of a series of image frames in an image pyramid established based on the image frames. By respectively extracting features of the first historical image frame and the positioning auxiliary image, a first feature point corresponding to the first historical image frame and a second feature point corresponding to the positioning auxiliary image can be obtained. The number of feature points is not particularly limited. The feature extraction algorithm is, for example, the FAST (Features from Accelerated Segment Test) algorithm, the SIFT (Scale-Invariant Feature Transform) algorithm, the ORB (Oriented FAST and Rotated BRIEF) algorithm, and the like. In one implementation scenario, the feature extraction algorithm is the ORB algorithm. After the feature points are obtained, a feature representation corresponding to each feature point is also obtained, the feature representation being, for example, a feature vector. Therefore, each feature point has a feature representation corresponding to it.
By calculating the matching degree of each first feature point and each second feature point, a series of matching point pairs can be obtained, and then matching point pairs with a high matching degree can be selected as first matching point pairs. The degree of matching between a first feature point and a second feature point may be calculated as the distance between the feature representations of the two feature points, and a closer distance may be considered a better match. Then, a first transformation parameter between the first historical image frame and the positioning assistance image may be determined based on the obtained series of first matching point pairs by using an image registration algorithm, and a first pose may be obtained by using the first transformation parameter. The image registration algorithm is for example a grayscale- and template-based algorithm or a feature-based matching method. For example, with the feature-based matching method, a certain number of matching point pairs between the image to be registered and the target image may be obtained, and then the transformation parameters of the first historical image frame and the positioning auxiliary image are calculated by using the random sample consensus (RANSAC) algorithm, thereby achieving registration of the images.
In this way, a first pose may be derived based on a first matching point pair between the first historical image frame and the positioning assistance image.
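A rough sketch of this feature-based registration, using OpenCV as one possible implementation (the patent does not prescribe a library; `history_frame_gray` and `assist_image_gray` are assumed grayscale inputs):

```python
import cv2
import numpy as np

# Extract ORB features from the first historical image frame and the positioning
# assistance image, match them, and estimate the first transformation parameter
# (a homography) with RANSAC.
orb = cv2.ORB_create()
kp_f, des_f = orb.detectAndCompute(history_frame_gray, None)
kp_t, des_t = orb.detectAndCompute(assist_image_gray, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_f, des_t), key=lambda m: m.distance)[:100]

src = np.float32([kp_f[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp_t[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

H_first, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
```

The first pose can then be recovered from `H_first`, e.g. by decomposing the homography or by applying a PnP solver to the inlier correspondences.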
In one implementation scenario, a second transformation parameter between the first historical image frame and the second historical image frame may be determined based on a second matching point pair between the first historical image frame and the second historical image frame, and the first pose may be obtained using the second transformation parameter and a third transformation parameter between the second historical image frame and the positioning assistance image.
In this implementation scenario, the second historical image frame precedes the first historical image frame.
The specific process of obtaining the second matching point pair may refer to the specific description of obtaining the first matching point pair, and is not described herein again.
The third transformation parameter is for example a homography matrix of the second historical image frame with respect to the positioning assistance image. Then, by using an image processing algorithm, such as the PnP algorithm, the pose information when the second historical image frame was captured can be obtained based on the homography matrix. As to how to obtain the third transformation parameter between the second historical image frame and the positioning assistance image, reference may be made to the above detailed description of obtaining the first transformation parameter, which is not repeated here.
Thus, the first pose is obtained by using the second transformation parameter between the first and second historical image frames and the third transformation parameter between the second historical image frame and the positioning assistance image. In an implementation scenario, the process of obtaining the second transformation parameter may be obtained by calculating a matching point pair composed of feature points extracted from target images included in the first history image frame and the second history image frame.
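A minimal sketch of the chained case described above, assuming all transformation parameters are 3x3 homography matrices (the variable names are illustrative):

```python
import numpy as np

# If H_12 maps points of the first historical image frame to the second historical image
# frame (second transformation parameter) and H_2T maps the second historical image frame
# to the positioning assistance image (third transformation parameter), their composition
# maps the first historical image frame to the positioning assistance image, from which
# the first pose can then be recovered as in the direct case.
def compose_transformations(H_2T, H_12):
    H_1T = H_2T @ H_12
    return H_1T / H_1T[2, 2]   # normalize the homography scale
```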
In one implementation scenario, the specific process of obtaining the first transformation parameter of the first historical image frame relative to the positioning assistance image or obtaining the third transformation parameter of the second historical image frame relative to the positioning assistance image may include the following steps 1 and 2.
Step 1: one of the sets of first matching point pairs is selected as a target matching point pair.
In this embodiment, the feature point obtained by performing feature extraction on the positioning assistance image is defined as a third feature point, and the feature point obtained by performing feature extraction on the basis of the first history image frame or the second history image frame is defined as a fourth feature point. In one implementation scenario, the matching degree between the third feature point and the fourth feature point may be calculated to obtain the target matching point pair.
Then, a group of first matching point pairs is selected as the target matching point pair. In the selection, the selection may be started from the most matched pair. In the target matching point pair, the third feature point is the first matching point, and the fourth feature point is the second matching point.
Step 2: and obtaining a homography matrix corresponding to the target matching point pair based on the direction information of the target matching point pair.
The direction information of the target matching point pair represents the rotation angle of the first historical frame image with respect to the positioning assistance image or the rotation angle of the second historical frame image with respect to the positioning assistance image. Specifically, a first image region centered on the first matching point may first be extracted from the positioning assistance image, and a second image region centered on the second matching point may be extracted from the first historical image frame or the second historical image frame. Then, a first deflection angle of the first image region and a second deflection angle of the second image region are determined. Finally, the first transformation parameter or the third transformation parameter is obtained based on the first deflection angle and the second deflection angle; specifically, the corresponding transformation parameter may be obtained based on the direction information of the target matching point pair and the pixel coordinate information of the first matching point and the second matching point in the target matching point pair.
In one implementation scenario, the first deflection angle is a directional angle between a line connecting the centroid of the first image region and the center of the first image region and a predetermined direction (e.g., an X-axis of a world coordinate system). The second deflection angle is a directed included angle between a connecting line of the centroid of the second image area and the center of the second image area and the preset direction.
In another implementation scenario, the first deflection angle θ can be directly obtained by the following equation:
θ=arctan(∑yI(x,y),∑xI(x,y)) (1)
in the above formula (1), (x, y) represents the offset of a pixel point in the first image region with respect to the center of the first image region, I(x, y) represents the pixel value of the pixel point, and Σ represents the summation symbol, the summation range being the pixel points in the first image region. Similarly, the second deflection angle can also be calculated in the same way.
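A small sketch of equation (1), computing the deflection angle of an image region from its intensity centroid (assuming the region is given as a 2D NumPy array of pixel values):

```python
import numpy as np

# Deflection angle theta = arctan(sum(y * I(x, y)), sum(x * I(x, y))), where (x, y)
# are pixel offsets relative to the center of the image region.
def deflection_angle(region):
    h, w = region.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    xs -= (w - 1) / 2.0          # x offsets relative to the region center
    ys -= (h - 1) / 2.0          # y offsets relative to the region center
    m01 = np.sum(ys * region)    # sum of y * I(x, y)
    m10 = np.sum(xs * region)    # sum of x * I(x, y)
    return np.arctan2(m01, m10)
```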
In one embodiment, the transformation parameters between the first historical frame image or the second historical frame image and the positioning assistance image may be obtained through the following steps a and b.
Step a: an angular difference between the first deflection angle and the second deflection angle is obtained.
The angular difference is, for example, the difference between the first deflection angle and the second deflection angle.
In one implementation scenario, equation (2) for calculating the angular difference is as follows:
θ = θ_T - θ_F    (2)

where θ is the angular difference, θ_T is the first deflection angle (T denotes the positioning assistance image), and θ_F is the second deflection angle (F denotes the first historical frame image or the second historical frame image).
Step b: and obtaining a first candidate transformation parameter based on the angle difference and the scale corresponding to the first matching point pair.
The first candidate transformation parameter is, for example, a homography matrix of correspondence between the first or second historical frame image and the positioning assistance image. The homography matrix is calculated as follows:
H = H_l · H_s · H_R · H_r    (3)

where H is the homography matrix between the positioning assistance image and the first historical frame image or the second historical frame image, i.e. the first candidate transformation parameter; H_r represents the translation of the first historical frame image or the second historical frame image relative to the positioning assistance image; H_s represents the scale corresponding to the first matching point pair, i.e. the scale information when the positioning assistance image is zoomed; H_R represents the rotation of the first historical frame image or the second historical frame image relative to the positioning assistance image; and H_l represents the translation reset applied after the translation.
In order to obtain the angular difference, the above equation (3) may be converted to obtain equation (4).
H = T(x_F, y_F) · S(s) · R(θ) · T(-x_T, -y_T)    (4)

where T(·,·) denotes a translation matrix, S(s) a scaling matrix and R(θ) a rotation matrix; (x_T, y_T) are the pixel coordinates of the first matching point on the positioning assistance image; (x_F, y_F) are the pixel coordinates of the second matching point on the first historical frame image or the second historical frame image; s is the scale corresponding to the first matching point pair, i.e. the scale corresponding to the point (x_T, y_T); and θ is the angular difference.
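As a hedged sketch of equations (3)-(4) (the exact factorization is reconstructed from the description above, so the form of the translation factors is an assumption), the candidate homography for a single matching point pair can be assembled as:

```python
import numpy as np

# Build H = H_l * H_s * H_R * H_r for one matching point pair: translate the first
# matching point (x_T, y_T) to the origin, rotate by the angular difference theta,
# scale by s, then translate to the second matching point (x_F, y_F).
def candidate_homography(pt_T, pt_F, s, theta):
    xT, yT = pt_T   # first matching point on the positioning assistance image
    xF, yF = pt_F   # second matching point on the historical frame image
    Hr = np.array([[1.0, 0.0, -xT], [0.0, 1.0, -yT], [0.0, 0.0, 1.0]])
    HR = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
    Hs = np.diag([s, s, 1.0])
    Hl = np.array([[1.0, 0.0, xF], [0.0, 1.0, yF], [0.0, 0.0, 1.0]])
    return Hl @ Hs @ HR @ Hr
```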
Therefore, the rotation angle of the first historical frame image or the second historical frame image relative to the positioning auxiliary image is obtained by calculating the direction information of the target matching point pair, so that the rotation angle information can be used for obtaining the transformation parameters between the first historical frame image or the second historical frame image and the positioning auxiliary image, and the corresponding transformation parameters can be calculated by using fewer matching point pairs.
Step S22: and obtaining an offset by using the posture in the first posture and the third reference posture, wherein the posture in the first posture is a posture relative to a preset plane.
The posture in the first posture is a posture relative to a preset plane, that is, rotation amount information relative to the preset plane. Thus, an offset can be derived based on the pose in the first pose and the third reference pose.
In one implementation scenario, the ratio between the pose in the first pose and the third reference pose may be taken as the offset. In this way, the offset can be obtained by taking the ratio.
In one implementation scenario, let the attitude in the first pose be R_1, the third reference attitude be R_2, and δ denote the offset; the offset is then calculated by the following formula (5):
δ = R_1 · (R_2)^(-1)    (5)
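A one-line sketch of formula (5), assuming both attitudes are expressed as 3x3 rotation matrices:

```python
import numpy as np

# Offset between the reference plane and the preset plane: the attitude relative to the
# preset plane composed with the inverse of the attitude relative to the reference plane.
def plane_offset(R_1, R_2):
    return R_1 @ np.linalg.inv(R_2)   # for rotation matrices, inv(R_2) == R_2.T
```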
therefore, by obtaining the attitudes with respect to the preset plane and the reference plane corresponding to the same time, the offset between the reference plane and the preset plane in the world coordinate system can be obtained using the attitudes of the preset plane and the reference plane.
In an implementation scenario, the feature points obtained by feature extraction through the feature extraction algorithm mentioned in the above embodiments may all be considered to be located on the same plane as the target image.
Referring to fig. 3, fig. 3 is a flowchart illustrating a visual positioning method according to a third embodiment of the present application. In this implementation, the final pose is pose information of the current image frame in the world coordinate system, that is, pose information of the photographing device in the world coordinate system when photographing the current image frame.
Step S31: a current image frame captured by a capturing device of a capturing device is acquired.
For a detailed description of this step, please refer to step S11 above.
Step S32: a first reference attitude is acquired, wherein the first reference attitude is an attitude of the photographing device corresponding to a photographing time of the current image frame and relative to a reference plane.
For a detailed description of this step, please refer to step S12 above.
Step S33: and adjusting the first reference attitude by using the offset between the reference plane and a preset plane in the world coordinate system to obtain a second reference attitude.
For a detailed description of this step, please refer to step S13 above.
Step S34: and acquiring at least one first candidate pose, and selecting a first candidate pose as a final pose based on the second reference pose and the first pixel value difference between the current image frame and other image frames.
In the embodiment of the present disclosure, the first candidate pose is pose information of the current image frame in the world coordinate system. The first candidate poses may be a plurality of first candidate poses obtained based on an image processing algorithm; alternatively, the pose of the image frame closest to the current image frame may be directly selected as a candidate first candidate pose, and then a plurality of first candidate poses may be generated by using an iterative optimization method.
On the basis of the first candidate poses, one first candidate pose can be selected as the final pose based on each first candidate pose, the second reference pose, and the first pixel value difference between the current image frame and other image frames. The first pixel value difference between the current image frame and the other image frames may be the pixel value difference between a pixel point on the current image frame and the corresponding pixel point on the other image frames. For example, suppose there is a three-dimensional point A in space, whose projection on the current image frame is a_1 and whose projection on the other image frames is a_2; then the pixel point a_1 is the pixel point of the current image frame corresponding to the point a_2 of the other image frames. At the same time, the first candidate pose is further optimized using the second reference pose and the attitude in the first candidate pose.
In one implementation scenario, the following equation (6) may be utilized to select a first candidate pose as the final pose:

C = α · Σ_p ( I_cur( π( K·(R·X_p + t) ) ) - I(x_p) )² + β · d(R_ref, R)    (6)

where C is the final error information; {R, t} is the first candidate pose, R being the rotation amount information (which may also be called the rotation amount or orientation) and t the translation amount information; R_ref is the second reference attitude, and d(R_ref, R) is the attitude difference between the second reference attitude R_ref and the attitude R in the first candidate pose; the spatial three-dimensional point X_p is the spatial point corresponding to a fifth feature point in the other image frames; π(K·(R·X_p + t)) is the sixth feature point obtained, based on the first candidate pose, by projecting X_p onto the current image frame, K being the intrinsic matrix of the shooting device; I_cur(π(K·(R·X_p + t))) is the pixel value of the sixth feature point on the current image frame; I(x_p) is the pixel value of the fifth feature point on the other image frames, and their difference is the first pixel value difference; Σ_p indicates that the first pixel value differences are calculated and summed over the corresponding points (the fifth feature points and the sixth feature points) of the current image frame and the other image frames; and α, β are the weighting parameters of the two constraint terms, whose proportion can be set according to actual use.
A plurality of first candidate poses are generated by using an iterative optimization method, and the first candidate pose corresponding to the minimum final error information C is selected.
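A minimal sketch of evaluating the cost of equation (6) for one first candidate pose (the variable names, the nearest-pixel lookup and the use of a rotation angle as the attitude-difference measure are assumptions of the sketch):

```python
import numpy as np

# Final error for one candidate pose (R, t): alpha-weighted sum of squared first pixel
# value differences plus beta-weighted attitude difference to the second reference
# attitude R_ref. X_points are the spatial points of the fifth feature points and
# ref_pixels their pixel values I(x_p) on the other image frames; K is the intrinsic matrix.
def final_error(R, t, R_ref, X_points, ref_pixels, current_image, K, alpha, beta):
    photometric = 0.0
    for X_p, I_p in zip(X_points, ref_pixels):
        uvw = K @ (R @ X_p + t)                      # project X_p with the candidate pose
        u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]      # sixth feature point on the current frame
        I_cur = float(current_image[int(round(v)), int(round(u))])
        photometric += (I_cur - float(I_p)) ** 2     # first pixel value difference
    dR = R_ref @ R.T                                 # attitude difference as a rotation
    angle = np.arccos(np.clip((np.trace(dR) - 1.0) / 2.0, -1.0, 1.0))
    return alpha * photometric + beta * angle ** 2
```

The candidate pose with the smallest returned value is kept as the final pose.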
In one implementation scenario, the second reference attitude R_ref can be calculated by the following formula (7):

R_ref = δ · R_sensor    (7)

where R_sensor is the attitude detected by the sensing device, R_ref is the rotation amount information with respect to the preset plane, and δ is the offset obtained by the above formula (5).

In formula (6), R_sensor is detected by the sensing device, e.g. data obtained by gyroscope detection, and the rotation amount information R_ref relative to the preset plane is then obtained by using the offset. Since the rotation amount information in the calculated first candidate pose is also rotation amount information relative to the preset plane, the two should theoretically be the same. Therefore, this can be used as constraint information to optimize the first candidate pose.
Therefore, the first candidate pose is constrained by the second reference pose, and a more accurate first candidate pose can be obtained when the first candidate pose is iteratively optimized.
In a disclosed embodiment, after the final error information is obtained, the first candidate pose corresponding to the second feature point of which the final error information meets the first preset requirement is selected as the final pose. The first preset requirement may be set as required, and is not limited herein. In one implementation scenario, if the first pixel value difference and the pose difference are calculated by the above formula (6), the first candidate pose information corresponding to C that meets the preset requirement is selected as the final pose. Therefore, relatively accurate pose information can be obtained by screening the first candidate poses meeting the preset requirements.
In a disclosed embodiment, before the step S33 is executed, a step of determining a spatial point corresponding to the fifth feature point in the other image frame by using the second pose in the world coordinate system of the other image frame may be executed to obtain a spatial point corresponding to the fifth feature point in the other image frame. In the embodiments of the present disclosure, the fifth feature point is the first feature point mentioned in the claims.
The second poses of other image frames in the world coordinate system may be calculated based on an image registration algorithm, or obtained by using a visual tracking algorithm, which is not limited herein. After the second pose is obtained, the depth value of the spatial point corresponding to the fifth feature point in the space can be calculated, and then the three-dimensional coordinates of the spatial point can be calculated, so that the position of the spatial point can be determined. Thereby, a spatial point corresponding to the fifth feature point in a certain number of other image frames may be determined.
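A hedged sketch of recovering such a spatial point (the pose convention and the availability of a depth value are assumptions of the sketch; the depth estimation itself is not shown):

```python
import numpy as np

# Back-project a fifth feature point at pixel (u, v) with depth d, observed in another
# image frame whose second pose is (R_2, t_2) in the world coordinate system, assuming
# the world-to-camera convention X_cam = R_2 @ X_world + t_2; K is the intrinsic matrix.
def back_project(u, v, d, K, R_2, t_2):
    p_cam = d * np.linalg.inv(K) @ np.array([u, v, 1.0])   # point in the camera frame
    return R_2.T @ (p_cam - t_2)                           # point in world coordinates
```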
In the case of obtaining the fifth feature point, the "selecting a first candidate pose as the final pose based on the second reference pose and the first pixel value difference between the current image frame and the other image frames" mentioned in step S34 may specifically be: and determining a sixth feature point corresponding to the first candidate pose from the current image frame based on each first candidate pose and the space point, and selecting a first candidate pose as a final pose based on the first pixel value difference and the pose difference between the second reference pose and the first candidate pose. In the embodiment of the present disclosure, the sixth feature point is the second feature point in the claims.
After the three-dimensional coordinates of the space point corresponding to the fifth feature point in the other image frames, the second poses of the other image frames in the world coordinate system, and the first candidate poses of the current frame image in the world coordinate system are obtained, the sixth feature point corresponding to the space point can be determined in the current image frame in a projection mode. The sixth feature point is a point on the current image corresponding to the fifth feature point on the other image frame.
Then, a first pixel value difference may be obtained based on the fifth feature point and the sixth feature point, specifically, the first pixel value difference may be obtained based on the pixel value of the fifth feature point and the pixel value of the sixth feature point. Finally, a first candidate pose may be selected as the final pose based on the first pixel value difference and the pose difference between the second reference pose and the first candidate pose. The specific calculation method may refer to the above equation (6).
Therefore, by using the corresponding points, in the other image frames and in the current image frame, of the determined three-dimensional spatial points, a more accurate first candidate pose can be selected by calculating the pixel value differences.
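The selection of the final pose could be sketched as below, assuming that formula (6) combines the summed first pixel value differences with a weighted pose difference to the second reference pose and that the candidate with the smallest combined error is kept; the weight, the pose_distance metric, and all function names are illustrative assumptions rather than the patent's exact formulation.

```python
import numpy as np

def project(K, R_cw, t_cw, X):
    """Project a spatial point X (world coordinates) into the image using a
    world-to-camera pose (R_cw, t_cw) and the intrinsic matrix K."""
    p = K @ (R_cw @ X + t_cw)
    return p[0] / p[2], p[1] / p[2]

def select_final_pose(candidates, points_3d, ref_pixel_values, current_image,
                      second_reference_pose, K, pose_distance, weight=1.0):
    """Select the first candidate pose with the smallest combined error:
    summed first pixel value differences plus a weighted pose difference to
    the second reference pose (a stand-in for the patent's formula (6))."""
    best_pose, best_error = None, float("inf")
    for R_cw, t_cw in candidates:                              # each first candidate pose
        pixel_error = 0.0
        for X, ref_value in zip(points_3d, ref_pixel_values):  # fifth feature points
            u, v = project(K, R_cw, t_cw, X)                   # sixth feature point
            ui, vi = int(round(u)), int(round(v))
            if 0 <= vi < current_image.shape[0] and 0 <= ui < current_image.shape[1]:
                pixel_error += (float(current_image[vi, ui]) - float(ref_value)) ** 2
        error = pixel_error + weight * pose_distance((R_cw, t_cw), second_reference_pose)
        if error < best_error:
            best_pose, best_error = (R_cw, t_cw), error
    return best_pose
```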
In a disclosed embodiment, the first candidate pose is determined based on an initial pose of the current image frame in a world coordinate system. That is, a series of first candidate poses can be obtained based on the initial pose and an iterative optimization method, and then a final pose is selected from the series of first candidate poses.
In one implementation scenario, the initial pose is determined based on photometric errors between the current image frame and other image frames. Namely, the initial pose can be obtained by combining a photometric error equation and an iterative optimization method.
In one implementation scenario, before performing the above-mentioned "determining a final pose based on photometric errors between the second reference pose, the current image frame and other image frames", an initial pose may be obtained by performing the following step 1.
Step 1: at least one second candidate pose is acquired, and a second candidate pose is selected as an initial pose based on second pixel value differences between the current image frame and other image frames.
For example, the second candidate pose may be pose information of another image frame with respect to the world coordinate system. The second candidate poses may be a plurality of poses calculated based on an image processing algorithm, or the pose of the image frame whose pose information is closest to that of the other image frames may be directly selected as a second candidate pose. Then, a plurality of second candidate poses can be generated by using an iterative optimization method. In an implementation scenario, the second candidate pose may be determined based on the second pose; specifically, a plurality of second candidate poses may be generated by using an iterative optimization method based on the second pose.
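One possible (assumed) way to generate a plurality of second candidate poses from the second pose is to perturb it with small random rotation and translation increments and score each perturbation afterwards; the sketch below uses SciPy's rotation utilities, and the perturbation magnitudes are arbitrary illustrative values, not quantities specified by the disclosure.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def generate_second_candidate_poses(R0, t0, n=50, rot_sigma_deg=2.0,
                                     trans_sigma=0.02, seed=0):
    """Generate second candidate poses by randomly perturbing the second pose
    (R0, t0); the perturbation magnitudes are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    candidates = [(R0, t0)]                      # keep the unperturbed second pose as well
    for _ in range(n):
        delta_R = Rotation.from_rotvec(
            np.deg2rad(rot_sigma_deg) * rng.standard_normal(3)).as_matrix()
        delta_t = trans_sigma * rng.standard_normal(3)
        candidates.append((delta_R @ R0, t0 + delta_t))
    return candidates
```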
On this basis, one second candidate pose may then be selected as the initial pose based on each second candidate pose and the second pixel value differences between the current image frame and the other image frames. A second pixel value difference between the current image frame and another image frame is the pixel value difference between a pixel point on the current image frame and the corresponding pixel point on the other image frame. For example, suppose there is a three-dimensional point B in space whose projection on the current image frame is B1 and whose projection on the other image frame is B2; then the pixel point B1 on the current image frame is the pixel point corresponding to the pixel point B2 on the other image frame.
In one implementation scenario, one second candidate pose may be selected as the initial pose by the following equation (8):

C = \sum_{p} \left\| \hat{I}\!\left(\hat{x}_{p}\right) - I\!\left(x_{p}\right) \right\|^{2}, \qquad \hat{x}_{p} = \pi\!\left(K\left(\hat{R} X_{p} + \hat{t}\right)\right) \qquad (8)

where C is the second pixel value difference; \hat{T} = [\hat{R} \mid \hat{t}] is the second candidate pose, \hat{R} being the rotation amount information and \hat{t} the translation amount information; the spatial three-dimensional point X_{p} is the spatial point corresponding to the fifth feature point in the other image frame, determined by using the second pose; \hat{x}_{p} is the sixth feature point obtained by projecting X_{p} onto the current image frame, with \pi(\cdot) denoting the perspective division from homogeneous to pixel coordinates; \hat{I}(\hat{x}_{p}) is the pixel value of the sixth feature point on the current image frame; K is the internal parameter (intrinsic) matrix of the shooting device; I(x_{p}) is the pixel value of the fifth feature point on the other image frame; and \sum_{p} indicates that the second pixel value differences are computed and summed over the corresponding points (the fifth feature points and the sixth feature points) of the current image frame and the other image frames. A plurality of second candidate poses \hat{T} are generated by using an iterative optimization method, and the second candidate pose with the minimum second pixel value difference C is selected as the initial pose.
In an implementation scenario, after the second pixel value differences are obtained, the second candidate pose whose second pixel value difference over the corresponding second feature points satisfies the second preset requirement may be selected as the initial pose. The second preset requirement may be set as required and is not limited herein. If the second pixel value difference is calculated through the above formula (8), the second candidate pose corresponding to the value of C that meets the preset requirement (for example, the minimum C) is selected as the initial pose. Therefore, by screening the second candidate poses that meet the preset requirement, relatively accurate pose information can be obtained.
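Putting equation (8) and this screening step together, a sketch of evaluating the second pixel value difference C for each second candidate pose and keeping the one with the minimum C might look as follows; nearest-pixel sampling and all function names are assumptions made for illustration only.

```python
import numpy as np

def photometric_cost(candidate, points_3d, ref_intensities, current_image, K):
    """Second pixel value difference C of equation (8) for one second candidate pose."""
    R_hat, t_hat = candidate
    C = 0.0
    for X_p, I_xp in zip(points_3d, ref_intensities):    # fifth feature points
        p = K @ (R_hat @ X_p + t_hat)
        u, v = p[0] / p[2], p[1] / p[2]                   # projected sixth feature point
        ui, vi = int(round(u)), int(round(v))
        if 0 <= vi < current_image.shape[0] and 0 <= ui < current_image.shape[1]:
            C += (float(current_image[vi, ui]) - float(I_xp)) ** 2
    return C

def select_initial_pose(candidates, points_3d, ref_intensities, current_image, K):
    """Select the second candidate pose with the minimum C as the initial pose."""
    return min(candidates,
               key=lambda c: photometric_cost(c, points_3d, ref_intensities,
                                              current_image, K))
```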
Therefore, by calculating the second pixel value differences, an initial pose that meets the requirements is obtained; then, based on this initial pose, the final error information is obtained by combining the detection data of the gyroscope (i.e., the second reference pose) with the photometric error, so that a final pose meeting the requirements can be obtained. By utilizing the correction provided by the gyroscope data, a final pose with higher accuracy can be obtained.
Referring to fig. 4, fig. 4 is a schematic frame diagram of an embodiment of a visual positioning apparatus according to the present application. The visual positioning apparatus 40 includes an acquisition module 41, a first pose determination module 42, an adjustment module 43, and a second pose determination module 44. The acquisition module 41 is configured to acquire a current image frame captured by the shooting device; the first pose determination module 42 is configured to acquire a first reference pose, where the first reference pose is a pose of the shooting device corresponding to the shooting time of the current image frame and relative to the reference plane; the adjustment module 43 is configured to adjust the first reference pose by using an offset between the reference plane and a preset plane in the world coordinate system to obtain a second reference pose; and the second pose determination module 44 is configured to determine a final pose of the current image frame in the world coordinate system based on the second reference pose and image information of the current image frame and other image frames, where the shooting times of the other image frames are before that of the current image frame.
Before the adjustment module 43 adjusts the first reference pose by using the offset between the reference plane and the preset plane in the world coordinate system to obtain the second reference pose, an offset acquisition module is configured to acquire a first pose of a first historical image frame in the world coordinate system and to acquire a third reference posture, where the third reference posture is a posture of the shooting device corresponding to the shooting time of the first historical image frame and relative to the reference plane; and to obtain the offset by using the posture in the first pose and the third reference posture, where the posture in the first pose is a posture relative to the preset plane.
The first reference attitude is detected by a sensing device fixed relative to the shooting device, and the difference between the detection time of the first reference attitude and the shooting time of the current image frame does not exceed a first preset time difference; and/or the third reference posture is detected by a sensing device fixed relative to the shooting device, and the difference between the detection time of the third reference posture and the shooting time of the first historical image frame does not exceed a second preset time difference.
The first pose is determined based on the positioning auxiliary image, where the preset plane is the plane in which the positioning auxiliary image is located; and/or the offset acquisition module being configured to obtain the offset by using the posture in the first pose and the third reference posture includes: taking the ratio between the posture in the first pose and the third reference posture as the offset.
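If the postures are represented as rotation matrices, the "ratio" between the posture in the first pose and the third reference posture can be read as a relative rotation; the sketch below assumes that representation (the disclosure does not fix one) and uses hypothetical function names.

```python
import numpy as np

def attitude_offset(R_first_pose_attitude, R_third_reference_attitude):
    """Offset between the reference plane and the preset plane, taken as the
    relative rotation ('ratio') of the attitude in the first pose to the
    third reference attitude; the rotation-matrix representation is an assumption."""
    return R_first_pose_attitude @ R_third_reference_attitude.T

def adjust_first_reference(R_first_reference_attitude, R_offset):
    """Adjust a first reference attitude with the offset to obtain the second
    reference attitude (mirrors the role of the adjustment module)."""
    return R_offset @ R_first_reference_attitude

if __name__ == "__main__":
    # Illustrative check: identical attitudes yield an identity offset.
    R_off = attitude_offset(np.eye(3), np.eye(3))
    print(np.allclose(R_off, np.eye(3)))  # True
```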
The offset acquisition module being configured to acquire the first pose of the first historical image frame in the world coordinate system includes: determining a first transformation parameter between the first historical image frame and the positioning auxiliary image based on a first matching point pair between the first historical image frame and the positioning auxiliary image, and obtaining the first pose by using the first transformation parameter; or determining a second transformation parameter between the first historical image frame and a second historical image frame based on a second matching point pair between the first historical image frame and the second historical image frame, and obtaining the first pose by using the second transformation parameter and a third transformation parameter between the second historical image frame and the positioning auxiliary image, where the second historical image frame is located before the first historical image frame.
The second pose determination module 44 is configured to determine a final pose of the current image frame in the world coordinate system based on the second reference pose and image information of the current image frame and other image frames, and includes: and determining a final pose based on the second reference pose and photometric errors between the current image frame and other image frames.
The second pose determination module 44 is configured to determine a final pose based on the second reference pose and photometric errors between the current image frame and other image frames, and includes: at least one first candidate pose is obtained, and a first candidate pose is selected as a final pose based on the second reference pose and first pixel value differences between the current image frame and other image frames.
The first candidate pose is determined based on initial pose information of the current image frame in a world coordinate system, and the initial pose information is determined based on photometric errors between the current image frame and other image frames; and/or before the second pose determination module 44 is configured to determine the final pose based on the second reference pose and the photometric error between the current image frame and the other image frames, the second pose determination module 44 is further configured to determine the spatial point corresponding to the first feature point in the other image frames by using the second poses of the other image frames in the world coordinate system. In addition, the second pose determination module 44 is configured to select a first candidate pose as a final pose based on the second reference pose and the first pixel value difference between the current image frame and the other image frames, including: and determining a second feature point corresponding to the first candidate pose from the current image frame based on each first candidate pose and the space point, obtaining a first pixel value difference between the first feature point and the second feature point, and selecting the first candidate pose as a final pose based on the first pixel value difference and a pose difference between the second reference pose and the first candidate pose.
Before the second pose determination module 44 is configured to determine the final pose based on the second reference pose and the photometric errors between the current image frame and the other image frames, the second pose determination module 44 is further configured to obtain at least one second candidate pose, and select a second candidate pose as the initial pose based on the second pixel value difference between the current image frame and the other image frames; the second pose determination module 44 is configured to select a first candidate pose as a final change pose based on the first pixel value difference and the pose difference between the second reference pose and the first candidate pose, including: and selecting a first candidate pose corresponding to a second feature point with the first pixel value difference and the posture difference meeting first preset requirements as a final change pose.
The second candidate pose is determined based on the second pose. The second pose determination module 44 being configured to select a second candidate pose as the initial pose based on the second pixel value differences between the current image frame and other image frames includes: selecting a second candidate pose corresponding to a second feature point whose second pixel value difference meets a second preset requirement as the initial pose.
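Purely as an illustration of how the modules of the visual positioning apparatus 40 might be wired together, the following skeleton mirrors modules 41 to 44 plus the offset acquisition module; every class and method name is assumed for illustration and is not defined by the disclosure.

```python
class VisualPositioningApparatus:
    """Illustrative skeleton mirroring modules 41-44; all signatures are assumptions."""

    def __init__(self, acquisition, first_pose, offset_acquisition, adjustment, second_pose):
        self.acquisition = acquisition              # module 41
        self.first_pose = first_pose                # module 42
        self.offset_acquisition = offset_acquisition
        self.adjustment = adjustment                # module 43
        self.second_pose = second_pose              # module 44

    def locate(self, other_frames):
        frame = self.acquisition.get_current_frame()
        first_ref = self.first_pose.get_first_reference_pose(frame.timestamp)
        offset = self.offset_acquisition.get_offset()
        second_ref = self.adjustment.adjust(first_ref, offset)
        return self.second_pose.determine_final_pose(second_ref, frame, other_frames)
```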
Referring to fig. 5, fig. 5 is a schematic frame diagram of an embodiment of an electronic device according to the present application. The electronic device 50 includes a memory 51 and a processor 52 coupled to each other, and the processor 52 is configured to execute program instructions stored in the memory 51 to implement the steps of any of the embodiments of the visual positioning method described above. In a specific implementation scenario, the electronic device 50 may include, but is not limited to, a microcomputer and a server; in addition, the electronic device 50 may also include a mobile device such as a notebook computer or a tablet computer, which is not limited herein.
In particular, the processor 52 is configured to control itself and the memory 51 to implement the steps of any of the embodiments of the visual positioning method described above. The processor 52 may also be referred to as a CPU (Central Processing Unit). The processor 52 may be an integrated circuit chip having signal processing capabilities. The processor 52 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 52 may be jointly implemented by a plurality of integrated circuit chips.
The scheme can be beneficial to improving the accuracy of image registration.
Referring to fig. 6, fig. 6 is a block diagram illustrating an embodiment of a computer-readable storage medium according to the present application. The computer readable storage medium 60 stores program instructions 601 capable of being executed by a processor, the program instructions 601 being for implementing the steps of any of the embodiments of the visual localization method described above.
According to the above scheme, by obtaining the offset between the reference plane and the preset plane in the world coordinate system, the first reference attitude can be adjusted based on the offset to obtain the second reference attitude, which serves as reference information for the attitude of the current image frame relative to the preset plane in the world coordinate system. The final pose of the current image frame in the world coordinate system can thus be optimized by using the second reference attitude, improving the accuracy of the final pose and realizing accurate positioning.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to each other, and for brevity, will not be described again herein.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a module or a unit is merely one type of logical division, and an actual implementation may have another division, for example, a unit or a component may be combined or integrated with another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on network elements. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

Claims (13)

1. A method of visual localization, comprising:
acquiring a current image frame obtained by shooting by a shooting device;
acquiring a first reference attitude, wherein the first reference attitude is an attitude of the photographing device corresponding to a photographing time of the current image frame and relative to a reference plane;
adjusting the first reference attitude by utilizing the offset between the reference plane and a preset plane in a world coordinate system to obtain a second reference attitude;
determining a final pose of the current image frame in the world coordinate system based on the second reference pose and image information of the current image frame and other image frames, wherein shooting time of the other image frames is before the current image frame.
2. The method of claim 1, wherein before the adjusting the first reference pose using the offset between the reference plane and a predetermined plane in the world coordinate system to obtain the second reference pose, the method further comprises:
acquiring a first pose of a first historical image frame in the world coordinate system, and acquiring a third reference pose, wherein the third reference pose is a pose of the shooting device corresponding to the shooting time of the first historical image frame and relative to a reference plane;
and obtaining the offset by using the posture in the first pose and a third reference posture, wherein the posture in the first pose is a posture relative to the preset plane.
3. The method according to claim 1 or 2, characterized in that said first reference attitude is detected by a sensing device fixed with respect to said capturing device, the detection instant of said first reference attitude being not more than a first preset time difference from the capturing instant of said current image frame;
and/or the third reference posture is detected by a sensing device fixed relative to the shooting device, and the difference between the detection time of the third reference posture and the shooting time of the first historical image frame does not exceed a second preset time difference.
4. The method according to claim 2 or 3, wherein the first pose is determined based on a positioning assistance image, wherein the preset plane is a plane in which the positioning assistance image is located;
and/or obtaining the offset by using the posture in the first pose and a third reference posture, wherein the obtaining of the offset comprises:
and taking the ratio between the posture in the first pose and the third reference posture as the offset.
5. The method of claim 4, wherein the acquiring a first pose of a first historical image frame in the world coordinate system comprises:
determining a first transformation parameter between the first history image frame and the positioning auxiliary image based on a first matching point pair between the first history image frame and the positioning auxiliary image, and obtaining the first pose by using the first transformation parameter; or,
determining a second transformation parameter between the first history image frame and a second history image frame based on a second matching point pair between the first history image frame and the second history image frame, and obtaining the first pose by using the second transformation parameter and a third transformation parameter between the second history image frame and the positioning auxiliary image, wherein the second history image frame is positioned before the first history image frame.
6. The method of any one of claims 1 to 5, wherein the determining a final pose of the current image frame in the world coordinate system based on the second reference pose, the image information of the current image frame and other image frames comprises:
determining the final pose based on the second reference pose, photometric errors between the current image frame and the other image frames.
7. The method of claim 6,
said determining said final pose based on said second reference pose, photometric errors between said current image frame and said other image frames, comprising:
at least one first candidate pose is obtained, and one first candidate pose is selected as the final pose based on the second reference pose and a first pixel value difference between the current image frame and the other image frames.
8. The method of claim 7, wherein the first candidate pose is determined based on an initial pose of the current image frame in a world coordinate system, the initial pose being determined based on photometric errors between the current image frame and the other image frames;
and/or, prior to said determining the final pose based on the second reference pose, photometric errors between the current image frame and the other image frames, the method further comprises:
determining a spatial point corresponding to a first feature point in the other image frame by using a second pose of the other image frame in the world coordinate system;
the selecting, based on the second reference pose and first pixel value differences between the current image frame and the other image frames, the first candidate pose as the final pose comprises:
determining a second feature point corresponding to the first candidate pose from the current image frame based on each of the first candidate pose and the spatial point, acquiring a first pixel value difference between the first feature point and the second feature point, and selecting a first candidate pose as the final pose based on the first pixel value difference and a pose difference between the second reference pose and the first candidate pose.
9. The method of claim 8, wherein prior to said determining said final pose based on said second reference pose, photometric errors between said current image frame and said other image frames, said method further comprises the steps of:
acquiring at least one second candidate pose, and selecting one second candidate pose as the initial pose based on a second pixel value difference between the current image frame and the other image frames;
and/or, the selecting a first candidate pose as the final change pose based on the first pixel value difference and a pose difference between the second reference pose and the first candidate pose comprises:
and selecting a first candidate pose corresponding to the second feature point with the first pixel value difference and the posture difference meeting a first preset requirement as the final change pose.
10. The method of claim 9, wherein the second candidate pose is determined based on the second pose;
and/or selecting one of the second candidate poses as the initial pose based on a second pixel value difference between the current image frame and the other image frame, including:
and selecting a second candidate pose corresponding to the second feature point with the second pixel value difference meeting a second preset requirement as the initial pose.
11. A visual positioning device, comprising:
the acquisition module is used for acquiring a current image frame obtained by shooting by the shooting device;
the first posture determining module is used for acquiring a first reference posture, wherein the first reference posture is a posture of the shooting device corresponding to the shooting moment of the current image frame and relative to a reference plane;
the adjusting module is used for adjusting the first reference attitude by utilizing the offset between the reference plane and a preset plane in a world coordinate system to obtain a second reference attitude;
and the second pose determination module is used for determining the final pose of the current image frame in the world coordinate system based on the second reference pose and the image information of the current image frame and other image frames, wherein the shooting time of the other image frames is before the current image frame.
12. An electronic device comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the visual positioning method of any one of claims 1 to 10.
13. A computer readable storage medium having stored thereon program instructions which, when executed by a processor, implement the visual positioning method of any of claims 1 to 10.
CN202110711230.9A 2021-06-25 2021-06-25 Visual positioning method and related device, equipment and storage medium Active CN113409391B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110711230.9A CN113409391B (en) 2021-06-25 2021-06-25 Visual positioning method and related device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110711230.9A CN113409391B (en) 2021-06-25 2021-06-25 Visual positioning method and related device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113409391A true CN113409391A (en) 2021-09-17
CN113409391B CN113409391B (en) 2023-03-03

Family

ID=77679441

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110711230.9A Active CN113409391B (en) 2021-06-25 2021-06-25 Visual positioning method and related device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113409391B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1124136A (en) * 1997-07-02 1999-01-29 Ricoh Co Ltd Automatic focusing camera
US20150310616A1 (en) * 2013-03-13 2015-10-29 Electronic Scripting Products, Inc. Reduced Homography for Recovery of Pose Parameters of an Optical Apparatus producing Image Data with Structural Uncertainty
CN105118055A (en) * 2015-08-11 2015-12-02 北京电影学院 Camera positioning correction calibration method and system
CN106643717A (en) * 2016-12-28 2017-05-10 北京奇艺世纪科技有限公司 Method and device for performance detection of nine-axis sensor
CN108335328A (en) * 2017-01-19 2018-07-27 富士通株式会社 Video camera Attitude estimation method and video camera attitude estimating device
CN107330933A (en) * 2017-07-17 2017-11-07 四川大学 A kind of any focal surface image pickup method based on video camera array
US20190073792A1 (en) * 2017-09-05 2019-03-07 Canon Kabushiki Kaisha System and method for determining a camera pose
CN111507132A (en) * 2019-01-31 2020-08-07 杭州海康机器人技术有限公司 Positioning method, device and equipment
CN110084832A (en) * 2019-04-25 2019-08-02 亮风台(上海)信息科技有限公司 Correcting method, device, system, equipment and the storage medium of camera pose

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113847907A (en) * 2021-09-29 2021-12-28 深圳市慧鲤科技有限公司 Positioning method and device, equipment and storage medium
CN117036663A (en) * 2022-04-18 2023-11-10 荣耀终端有限公司 Visual positioning method, device and storage medium
WO2023246468A1 (en) * 2022-06-24 2023-12-28 北京字跳网络技术有限公司 Visual positioning parameter updating method and apparatus, and electronic device and storage medium
CN115100595A (en) * 2022-06-27 2022-09-23 深圳市神州云海智能科技有限公司 Potential safety hazard detection method and system, computer equipment and storage medium
WO2024001849A1 (en) * 2022-06-28 2024-01-04 中兴通讯股份有限公司 Visual-localization-based pose determination method and apparatus, and electronic device
CN115457040A (en) * 2022-11-14 2022-12-09 青岛海天成光伏新能源有限公司 Intelligent control method, device, equipment and medium for photovoltaic junction box production line
CN115457040B (en) * 2022-11-14 2023-04-07 青岛海天成光伏新能源有限公司 Intelligent control method, device, equipment and medium for photovoltaic junction box production line

Also Published As

Publication number Publication date
CN113409391B (en) 2023-03-03

Similar Documents

Publication Publication Date Title
CN113409391B (en) Visual positioning method and related device, equipment and storage medium
CN113393505B (en) Image registration method, visual positioning method, related device and equipment
WO2020237574A1 (en) Method and apparatus for calibrating internal parameters of camera, method and apparatus for calibrating relative attitude of camera, unmanned aerial vehicle and storage apparatus
CN113256718B (en) Positioning method and device, equipment and storage medium
JP2017091079A (en) Image processing device and method for extracting image of object to be detected from input data
US10438412B2 (en) Techniques to facilitate accurate real and virtual object positioning in displayed scenes
US11843865B2 (en) Method and device for generating vehicle panoramic surround view image
US8531505B2 (en) Imaging parameter acquisition apparatus, imaging parameter acquisition method and storage medium
CN112348889B (en) Visual positioning method, and related device and equipment
CN113361365B (en) Positioning method, positioning device, positioning equipment and storage medium
CN114022560A (en) Calibration method and related device and equipment
CN113407030B (en) Visual positioning method, related device, equipment and storage medium
CN112233189B (en) Multi-depth camera external parameter calibration method and device and storage medium
CN111383264B (en) Positioning method, positioning device, terminal and computer storage medium
CN109785373A (en) A kind of six-freedom degree pose estimating system and method based on speckle
JP6922348B2 (en) Information processing equipment, methods, and programs
CN114882106A (en) Pose determination method and device, equipment and medium
CN116117800A (en) Machine vision processing method for compensating height difference, electronic device and storage medium
CN115239816A (en) Camera calibration method, system, electronic device and storage medium
CN110232715B (en) Method, device and system for self calibration of multi-depth camera
CN116129031A (en) Method, system and storage medium for three-dimensional reconstruction
JP2022011818A (en) Information processing apparatus and control method thereof
CN113409373B (en) Image processing method, related terminal, device and storage medium
JP7262689B1 (en) Information processing device, generation method, and generation program
CN112884047B (en) Method for determining registered image in input image, related device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant