CN116934857A

CN116934857A - Visual positioning method, device, equipment and medium based on panoramic picture

Info

Publication number: CN116934857A
Application number: CN202310875673.0A
Authority: CN
Inventors: 杨毅; 李睿; 高宇; 梁浩; 潘淼鑫
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2023-07-17
Filing date: 2023-07-17
Publication date: 2023-10-24

Abstract

The application provides a visual positioning method, device, equipment and medium based on panoramic pictures, and belongs to the technical field of robot visual positioning; the method comprises the steps of obtaining a certain number of panoramic pictures; calculating feature points and feature descriptors; selecting a reference frame; calculating the corresponding relation between the rest pictures and the characteristic points of the reference frame; selecting a frame of picture most similar to the reference frame and calculating the relation between the position and the posture; triangularization calculating depth information of the feature points; calculating the position and posture information of the rest pictures by a PnP method; all positioning information is optimized through a beam adjustment method, surrounding environment information can be comprehensively obtained through a panoramic picture, stability and accuracy of results can be greatly improved when visual positioning is carried out, particularly effective texture information can be obtained when a scene is degraded, positioning is continued, and the method improves the stability and accuracy of visual positioning and expands application scenes.

Description

Visual positioning method, device, equipment and medium based on panoramic picture

Technical Field

The application belongs to the technical field, and particularly relates to a novel solar cell module. The application belongs to the technical field of robot vision positioning, and particularly relates to a vision positioning method, device, equipment and medium based on panoramic pictures.

Background

In the field of visual positioning technology, pictures used to calculate positioning information are typically based on a small-bore imaging model. The small hole imaging is a natural phenomenon, a plate with small holes is used for shielding between a wall body and an object, an inverted real image of the object can be formed on the wall body, the middle shielding plate is moved back and forth, and the size of the image on the wall body can be changed. Such an imaging model is relatively simple but can only image a fixed field angle, so that the acquired image information can only be a certain part of space. In degraded scenes, such as white walls or when the colors are very consistent, the acquired image information is insufficient to calculate stable camera positioning information. Panoramic pictures refer to pictures with imaging ranges of 360 degrees of horizontal and vertical field angles, and texture information of the panoramic pictures can be restored to spherical coordinates so as to display omnidirectional environmental information. The panoramic camera can be used for conveniently acquiring high-quality panoramic pictures. However, solutions for visual localization based on panoramic pictures have been lacking in the prior art. The panoramic picture can be used for maximally acquiring surrounding environment information, and if visual positioning based on the panoramic picture is realized, the stability and accuracy in positioning calculation are greatly improved.

Disclosure of Invention

In view of the above, the application aims to provide a panoramic picture-based visual positioning method, a panoramic picture-based visual positioning device, a panoramic picture-based visual positioning equipment and a panoramic picture-based visual positioning medium, which can improve the stability and accuracy of visual positioning and expand application scenes.

A visual positioning method based on panoramic pictures comprises the following steps:

acquiring a plurality of panoramic pictures;

extracting feature points of the panoramic picture;

selecting one of panoramic pictures as a reference frame, and determining a global coordinate system by using the reference frame;

selecting panoramic pictures with the maximum similarity with the reference frames for pairing, and calculating an essential matrix E to obtain matching point pairs;

obtaining [ R|T ] representing the position and the gesture by calculating and decomposing an essential matrix E, wherein R and T are a rotation transformation matrix and a translation matrix respectively;

triangularizing the matching point pairs based on the obtained position and posture [ R|T ] to obtain three-dimensional points;

based on the obtained three-dimensional points, calculating positions and postures of the rest panoramic pictures relative to the reference frames by adopting a PnP method;

and optimizing the positions and the postures [ R|T ] of all panoramic pictures relative to the reference frame and the three-dimensional points by using a beam adjustment method, and completing positioning.

Preferably, the sets [ R|T ] obtained by decomposing the matrix E are triangulated, and the set [ R|T ] to obtain a positive depth is selected.

Further, selecting a distance from a point to a polar plane as a cost function, taking the minimum cost function as an optimization target, and iteratively solving [ R|T ] by a method for optimizing the column-Mart]The method comprises the steps of carrying out a first treatment on the surface of the The point-to-polar plane distance is expressed as: (M2) ^T * E.m1)/e.m1; where M1 and M2 represent a coordinate matrix of a pair of matching points.

Preferably, the method for triangulating the matching point pair based on the obtained position and posture [ R|T ] includes:

and (3) setting any point in the matching point pair as p in the spherical point in the panorama, setting the corresponding three-dimensional point as X, and according to the obtained R and T, letting F= [ R|T ], and obtaining the three-dimensional point X by constructing a linear equation of p multiplied by FX=0 and using SVD decomposition.

Further, after the positions and the postures of the rest panoramic pictures relative to the reference frame are calculated by adopting a PnP method, the following optimization is carried out:

for each panoramic picture obtained subsequently, calculating a matching point pair between the panoramic picture and a reference frame, and filtering mismatching points by adopting a RANSAC algorithm: setting M and M as coordinate matrixes of a spherical point p and a corresponding three-dimensional point X respectively, selecting a ray distance from a reprojection point to a target point as a cost function in a RANSAC algorithm, and iteratively solving [ R|T ] by a column-Mart optimization method]Using the cost function to be minimum; wherein, the ray distance is expressed as:|m| represents modulo M.

Preferably, ASIFI feature descriptors are calculated for the feature points.

Preferably, the similarity is calculated by adopting a normalized cross-correlation method.

A panoramic picture-based visual positioning device, comprising:

the panoramic picture acquisition module is used for acquiring a plurality of panoramic pictures;

the feature point extraction module is used for extracting feature points of the panoramic picture;

a pose resolving module configured to perform: selecting one of panoramic pictures as a reference frame, and determining a global coordinate system by using the reference frame;

a result optimization module configured to perform: and optimizing the positions and the postures [ R|T ] of all panoramic pictures relative to the reference frame and the three-dimensional points by using a beam adjustment method, and completing positioning.

An electronic device comprising a processor, a memory for storing instructions executable by the processor; wherein the processor is configured to perform the steps of the visual positioning method described above.

A computer readable storage medium, which when executed by a processor of an electronic device, causes the electronic device to perform the steps of the visual positioning method described above.

The application has the following beneficial effects:

the application provides a visual positioning method, device, equipment and medium based on panoramic pictures, and belongs to the technical field of robot visual positioning; the method comprises the steps of obtaining a certain number of panoramic pictures; calculating feature points and feature descriptors; selecting a reference frame; calculating the corresponding relation between the rest pictures and the characteristic points of the reference frame; selecting a frame of picture most similar to the reference frame and calculating the relation between the position and the posture; triangularization calculating depth information of the feature points; calculating the position and posture information of the rest pictures by a PnP method; according to the application, the surrounding environment information can be comprehensively obtained by using the panoramic picture through optimizing all positioning information by a beam adjustment method, the stability and accuracy of a result can be greatly improved when visual positioning is carried out, particularly, effective texture information can be obtained when a scene is degraded, and the positioning is continued.

Drawings

FIG. 1 is a schematic flow chart of a visual positioning method according to an embodiment of the present application;

fig. 2 is a schematic structural diagram of a visual positioning method device according to an embodiment of the present application;

FIG. 3 is an original error representation;

FIG. 4 is a representation of three equivalent errors;

fig. 5 is a comparison of the optimization effect using four errors.

Detailed Description

The application will now be described in detail by way of example with reference to the accompanying drawings.

The application provides a visual positioning method based on panoramic pictures, which can be applied to robots. The robot may be a wheeled mobile robot or a four-legged robot similar to a robot dog. The wheeled mobile robot may be a medical delivery robot, an unmanned sweeping vehicle, a disinfection robot, an unmanned delivery vehicle, or the like, and the application scenario of the robot is not particularly limited in this embodiment.

The visual positioning method provided by the embodiment mainly comprises the following steps: acquiring a certain number of panoramic pictures; calculating feature points and feature descriptors; selecting a reference frame; calculating the corresponding relation between the rest pictures and the characteristic points of the reference frame; selecting a frame of picture most similar to the reference frame and calculating the relation between the position and the posture; triangularization calculating depth information of the feature points; calculating the position and posture information of the rest pictures by a PnP method; and optimizing all positioning information by a beam adjustment method.

As shown in fig. 1, the specific implementation method includes the following steps:

s1, a panoramic camera is used for collecting a certain number of panoramic pictures.

The original photos can be directly synthesized into panoramic photos through processing software by shooting by using a 'insta 360 One X2' panoramic camera of, for example, the shadow stone technology. When shooting, only a command for starting recording is needed to be given, and camera parameters or lens parameters are not needed to be adjusted.

The number of pictures depends on the complexity of the environment, and if the degradation is obvious, the distance interval of acquisition should be reduced, and the acquisition number is increased. For a general scene, acquisition at 10Hz is sufficient.

"degenerated" mainly means that the texture features of the current scene are not obvious, such as a pure-colored wall surface, and it is difficult to extract qualified image features. In general, the degradation judgment is not specific quantization index, and is usually judged by the pose result variation abnormality calculated based on the features, such as more/less than a threshold value.

Because each picture is given time, the acquisition distance interval can directly use the time interval of two adjacent pictures. The number of acquisitions can then be calculated from the total time of acquisition and the time interval of the pictures.

S2, traversing the whole picture through the gray level change of the picture, calculating characteristic points, and calculating ASIFI characteristic descriptors of each characteristic point. The ASIFT descriptor can adapt to the texture features in the panoramic picture that are significantly distorted compared to the original SIFT descriptor. In this embodiment, the feature points may be calculated using a calculation function provided in the OpenCV open source library.

Extraction of ASIFT descriptors can be used: cv:: affineFeature:: detectAndCommpute.

S3, the first picture is usually selected as a reference frame for visual positioning, and the reference frame is used as a global coordinate system. For other special positioning occasions, pictures with rich texture features can be selected, so that enough feature points are ensured for subsequent feature matching. The feature matching can be carried out by adopting a general feature matching method, such as violent matching and KNN matching. Different feature descriptors can use the same feature matching method and can obtain the same matching effect.

S4, selecting a picture to be paired with the reference frame to calculate an essential matrix. The picture should be chosen to have the greatest similarity to the reference frame, for example using NCC to measure similarity, so that more three-dimensional points (points with xyz three-dimensional coordinates, usually only xy two-dimensional coordinates for the image point) can be obtained. The essential matrix calculation adopts a general calculation method, R and T are respectively used for representing the gesture and the position, E is used as an essential matrix, E=T≡R, and an antisymmetric matrix is calculated by a ≡symbol.

There are 8 degrees of freedom in the essential matrix E, so 8 matching point pairs are required to solve for E ("8-point method to decompose the essential matrix"). The previously calculated matching point pairs have mismatching points, for which a random sample consensus (RANSAC) algorithm is added to filter the mismatching points. Decomposing the matrix E may result in 4 sets r|t, of which only one set is correct. The validation method is triangularization using each set to indicate that the set of solutions is the correct solution when a positive depth is obtained.

By calculating and decomposing the matrix of essence, r|T can be obtained]But the results at this time still need to be further optimized. In this embodiment, the distance from the point to the polar plane is selected as the cost function, and [ R|T ] is solved iteratively by the optimization method of the Levenner-Marde]This distance minimization is used to complete the final optimization. Let the coordinate matrix of a pair of matching points P1 and P2 be M1 and M2, respectively, the distance from the point to the polar plane be (M2 ^T *E*M1)/|E*M1|。

The ray from the spherical center of the panoramic picture where the point P1 is located to the point P1 is m1, and the equivalent ray of EM1 is set asC2 is the sphere center of the sphere of the panoramic picture where the point P2 is located. The rays corresponding to T and R.times.M1 are T and Rm1 respectively, and the combining direction of the two rays is +.>

When noise is not considered, equation M2 ^T * Em1=0 can be interpreted as ray M2 and rayPerpendicular, i.e.)>As shown in FIG. 3, when noise is present, the noisy ray m2 is set to be m2', m2' and +.>The sin value of the included angle represents the error of the solving process (namely M2T is equal to EM1 |=0), namely the included angle epsilon in the figure _g Sin value of (c) as an optimization target:

in the following optimization comparison we used sin ^-1 (ε _g ) To represent this error. To this end, we have chosen the remaining three equivalent error representation methods that contain the errors described above to compare the respective optimization effects.

The remaining three error representations are shown in fig. 4. Wherein the plane formed by t and Rm1 is a polar plane, and the tangent point of the tangential plane is P2.

1. Intersection point and polar plane distance epsilon _p Where M2 is typically normalized to a modulus length of 1.

2、ε _p Is the tangential plane distance of (2)

3. Sampson distance

The following optimization was accomplished using the levenson-motter method, targeting the 4 errors described above, respectively, with a specific optimization effect as shown in fig. 5. Wherein the vertical axis is optimized sin-1 (. Epsilon.) _g ) The angle, the horizontal axis is the artificially added angle error. Compared with the original error, the four optimization modes have obvious effects, wherein the four optimization modes are represented by epsilon _p The error is the best for the optimization of the objective. Thus selecting the intersection to polar plane distance ε _p As an optimization objective.

S5, triangulating the matched point pairs by using the R|T obtained in the step S4 to obtain three-dimensional points:

assuming that any point in the matching point pair is a spherical point p in the panorama and the corresponding three-dimensional point is X, in this embodiment, the three-dimensional point is solved by using a method of solving a linear equation, and according to R and T obtained in step S4, let f= [ r|t ], and the size of F be (4X 4), the three-dimensional point X is obtained by constructing a linear equation of p×fx=0 and decomposing by using SVD.

S6, according to the three-dimensional point X obtained in the step S5, the position and the gesture of the subsequent picture relative to the reference frame are solved through a PnP method.

S7, the position and the gesture of the subsequent picture calculated in the step S6 relative to the reference frame need to be further optimized.

For each picture obtained subsequently, a matching point pair between the picture and a reference frame is calculated, and because of the existence of mismatching points in the matching point pair, RANSAC is added for filtering the mismatching points. Setting M and M as coordinate matrixes of a spherical point p and a corresponding three-dimensional point X respectively, selecting a ray distance from a reprojection point to a target point as a cost function in a RANSAC algorithm, and iteratively solving [ R|T ] by a column-Mart optimization method]The distance is used to be minimal. Wherein, this distance is expressed as:where |m| represents modulo M.

The "littoral-Marte" method in this embodiment is a general method in the art, and will not be described here again.

S8, the method is the same as a general visual positioning method, and after a certain number of pictures are solved, all [ R|T ] and three-dimensional points are adjusted by using beam adjustment optimization.

The final output of the visual localization is represented using a "pose". Where "bit" represents the position in the global coordinate system, i.e., [ X, Y, Z ] three coordinate values, and "pose" represents the pose of the coordinate system in relation to the global coordinate system, typically expressed in terms of angles about three coordinate axes, i.e., "roll angle" about the X-axis, pitch angle about the Y-axis, and heading angle about the Z-axis.

The embodiment also provides a visual positioning device based on panoramic pictures, which can be applied to a robot or electronic equipment, and the structural principle of the visual positioning device is shown in fig. 2, and the visual positioning device comprises:

the panoramic picture acquisition module 1 is used for acquiring a plurality of panoramic pictures;

the feature point extraction module 2 is used for traversing the whole picture according to the gray level change of the panoramic picture, extracting feature points and calculating feature descriptors of each feature point;

the pose resolving module 3 is used for selecting a first panoramic picture as a reference frame and determining a global coordinate system by using the reference frame; selecting a second panoramic picture and the reference frame to pair and calculate an essential matrix E, obtaining a matching point pair, wherein the second panoramic picture has the maximum similarity with the reference frame, and obtaining [ R|T ] representing the position and the gesture by calculating and decomposing the essential matrix E, wherein R and T are a rotation transformation matrix and a translation matrix respectively; triangularizing the matching point pairs to obtain three-dimensional points; and finishing the calculation of the position and the gesture of the rest panoramic pictures by adopting a PnP method;

a result optimization module 4, configured to reject incorrect matching points in the matching point pair by adopting a random sample consensus (RANSAC) algorithm; selecting the ray distance from the re-projection point to the target point as a cost function, and iteratively solving R|T by using a method of optimizing the column-Mart to use the minimum distance; and optimizing all [ R|T ] and three-dimensional points using a beam adjustment method.

The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.

The embodiment also provides an electronic device, which comprises a processor and a memory for storing instructions executable by the processor; wherein the processor is configured to perform the steps of the aforementioned visual positioning method.

Accordingly, the present embodiment also provides a computer-readable storage medium, which when executed by a processor of an electronic device, causes the electronic device to perform the steps of the aforementioned visual positioning method.

The electronic device may be a digital computer in various forms, such as an industrial personal computer, an embedded computer, or a laptop computer, a desktop computer, a workstation, a server, a blade server, a mainframe computer, and other suitable computers, which are applied to various robots. The electronic device may also represent various forms of mobile devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

By way of example, the electronic device includes a computing unit that can perform various suitable actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) or a computer program loaded from a storage unit into a Random Access Memory (RAM). In the RAM, various programs and data required for the operation of the device may also be stored. The computing unit, ROM and RAM are connected to each other by a bus. An input/output (I/O) interface is also connected to the bus. A plurality of components in a device are connected to an I/O interface, comprising: an input unit such as a keyboard, a mouse, etc.; an output unit such as various types of displays, speakers, and the like; a storage unit such as a magnetic disk, an optical disk, or the like; and communication units such as network cards, modems, wireless communication transceivers, and the like. The communication unit allows the device to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The computing unit may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing units include, but are not limited to, central Processing Units (CPUs), graphics Processing Units (GPUs), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, digital Signal Processors (DSPs), and any suitable processors, controllers, microcontrollers, and the like. The computing unit performs the various methods and processes described above, such as the method of visual localization. For example, in some embodiments, the visual positioning method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as a storage unit. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device via the ROM and/or the communication unit. One or more of the steps of visual positioning described above may be performed when the computer program is loaded into RAM and executed by a computing unit. Alternatively, in other embodiments, the computing unit may be configured to perform the method of visual localization by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above can be implemented in digital electronic circuitry, integrated circuitry, field Programmable Gate Arrays (FPGAs), application Specific Integrated Circuits (ASICs), application Specific Standard Products (ASSPs), systems On Chip (SOCs), load programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

It should be noted that although the method of the present application is illustrated in the accompanying drawings as being performed in a particular order, this is not required to or implied that the steps shown must be performed in the particular order or that all of the steps shown be performed in order to achieve desirable results. Alternatively, some steps may be omitted, multiple steps may be combined into one step to be performed, and/or one step may be decomposed into multiple steps to be performed.

It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having a computer-usable computer program embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, electronic devices, and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program commands may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the commands executed by the processor of the computer or other programmable data processing apparatus produce means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Example two

On the basis of the first embodiment, in order to further solve the technical problem that stable descriptors cannot be obtained when texture features are not obvious, the embodiment provides another visual positioning method based on panoramic pictures.

In the first embodiment, a method of calculating an "image feature descriptor" is used to find a pixel with rich texture features in a picture and track the movement of the pixel in successive pictures. The method is suitable for scenes with rich texture features, but cannot obtain stable descriptors when the texture features are not obvious on pure-color surfaces such as white walls.

The present embodiment employs a pixel tracking method based on "optical flow". The pixel tracking method based on the optical flow is not limited by texture features. Optical flow tracking is the tracking of pixel points in successive multi-frame pictures using pixel gray values. The specific optical flow calculation and tracking is to use a function cv in an Opencv open source library, namely calcopticalfowpyrlk. The details of the other steps have been described in relation to the first embodiment of the method and will not be described in detail here.

Example III

The embodiment provides an example of a calculation process, and the pose is calculated according to the visual positioning method provided by the first embodiment from the panoramic picture.

Step 1, taking the spatial positions of two adjacent panoramic pictures pano_1 and pano_2, where pano_1 is located, as a global coordinate system. At this time, the image is obtained by storing spherical pixels in a common photo in an equirectangular projection manner, and using the following projections:

(1) Calculating an extremely flat projection tangent plane based on the regular icosahedron;

(2) Dividing the tangent plane according to the required photo pixels and calculating the polar coordinate value of each pixel in the tangent plane;

(3) RGB values for each pixel in the slice are calculated using bilinear interpolation.

Img_1 and img_2 are set as the results representing the above steps.

And 2, selecting a proper tracking method (characteristic or optical flow, wherein the optical flow is used, and the solving process of the characteristic method is the same) according to the use scene and the requirement. And extracting a pixel point which is suitable as a target pixel in img_1, and tracking the target pixel in img_2 by using an Opencv function. At this time, corresponding pixel information in the two pictures can be obtained. By traversing all suitable pixels, a certain amount of pixel correspondence information can be obtained for subsequent calculations. This number is usually determined empirically, but the theoretical minimum is 6 pairs, the more the number the better the stability of the calculation result, but also more false correspondence information is introduced. And (3) re-projecting the corresponding information in the panoramic photo in the reverse direction in the step (1) to obtain the corresponding information in the panoramic photo.

And 3, calculating an essential matrix by using an 8-point method. Similar to the planar photograph, the panoramic photograph may also derive the same essential matrix form, e=t≡r.

The essential matrix E can thus be calculated using the "8-point method". During feature extraction, the matched point pairs contain wrong matching results, so that obvious wrong corresponding information is removed through a RANSAC algorithm during calculation of an 8-point method.

And 4, optimizing the result in the step 3 by using a column-Mart optimization method, and obtaining the pose of the panoramic picture pano_2 relative to the panoramic picture pano_1.

And 5, continuously calculating the pose of the panoramic picture after pano_2. Using the result of the calculation in step 4, three-dimensional points of pixels are calculated using triangulation.

And 6, tracking pixel points on other pictures by continuing to use the light flow tracking, and solving PnP by combining the three-dimensional points in the step 5 to obtain the pose of the other pictures relative to pano_1.

The results in step 7 and step 6 are usually solved by directly solving a linear equation, so that the accuracy is poor. And (3) continuously solving PnP results of the multi-frame pictures, and jointly optimizing the picture pose and the three-dimensional point by using a beam method adjustment (Bundle adjustment).

And 8, the results in the step 4 and the step 7 are pose information, namely positioning information, of the panoramic picture relative to pano_1. And (5) finishing solving.

In summary, the above embodiments are only preferred embodiments of the present application, and are not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. The visual positioning method based on the panoramic picture is characterized by comprising the following steps of:

acquiring a plurality of panoramic pictures;

extracting feature points of the panoramic picture;

2. The panoramic picture-based visual positioning method of claim 1, wherein: each set [ R|T ] obtained by decomposing the matrix E is triangulated, and the set [ R|T ] to obtain a positive depth is selected.

3. The panoramic picture-based visual positioning method of claim 2, wherein: selecting a distance from a point to a polar plane as a cost function, taking the minimum cost function as an optimization target, and iteratively solving R|T by a method for optimizing the Leven-Marde]The method comprises the steps of carrying out a first treatment on the surface of the The point-to-polar plane distance is expressed as: (M2) ^T * E x M1)/|e x M1|; where M1 and M2 represent the coordinate matrix of a pair of matching points, and T represents the transpose.

4. The panoramic picture-based visual positioning method of claim 1, wherein: the method for triangulating the matching point pair based on the obtained position and posture comprises the following steps:

5. The panoramic picture-based visual positioning method of claim 1, wherein: after the positions and the postures of the rest panoramic pictures relative to the reference frame are calculated by adopting a PnP method, the following optimization is carried out:

6. The panoramic picture-based visual positioning method of claim 1, wherein: and calculating ASIFI feature descriptors for the feature points.

7. The panoramic picture-based visual positioning method of claim 1, wherein: the similarity is calculated by adopting a normalized cross-correlation method.

8. A panoramic picture-based visual positioning device, comprising:

9. An electronic device comprising a processor, a memory for storing instructions executable by the processor; wherein the processor is configured to perform the steps of the visual positioning method of any one of claims 1 to 7.

10. A computer readable storage medium, characterized in that instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the steps of the visual positioning method of any one of claims 1 to 7.