WO2021237443A1 - Visual positioning method and apparatus, device and readable storage medium - Google Patents

Visual positioning method and apparatus, device and readable storage medium Download PDF

Info

Publication number
WO2021237443A1
WO2021237443A1 (PCT/CN2020/092284)
Authority
WO
WIPO (PCT)
Prior art keywords
photos
positioning
visual positioning
panoramic
neural network
Prior art date
Application number
PCT/CN2020/092284
Other languages
French (fr)
Chinese (zh)
Inventor
陈尊裕
吴珏其
胡斯洋
陈欣
吴沛谦
张仲文
Original Assignee
蜂图志科技控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 蜂图志科技控股有限公司 filed Critical 蜂图志科技控股有限公司
Priority to JP2022566049A priority Critical patent/JP7446643B2/en
Priority to CN202080001067.0A priority patent/CN111758118B/en
Priority to PCT/CN2020/092284 priority patent/WO2021237443A1/en
Publication of WO2021237443A1 publication Critical patent/WO2021237443A1/en

Classifications

    • G06T7/97 Determining parameters from multiple pictures (G06T7/00 Image analysis)
    • G06N3/044 Recurrent networks, e.g. Hopfield networks (G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/045 Combinations of networks (G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/08 Learning methods (G06N3/02 Neural networks)
    • G06T2207/20081 Training; Learning (G06T2207/20 Special algorithmic details)
    • G06T2207/20084 Artificial neural networks [ANN] (G06T2207/20 Special algorithmic details)

Definitions

  • This application relates to the field of positioning technology, and in particular to a visual positioning method, device, equipment, and readable storage medium.
  • The principle of machine-learning-based visual positioning: use a large number of real-scene photos with location labels for training to obtain a neural network model whose input is a photo (an RGB value matrix) and whose output is a specific location. Once the trained neural network model is available, the user only needs to take a photo of the environment to obtain the specific shooting location.
  • This method requires collecting a large number of photo samples of the use environment as a training data set. For example, some documents record that visually positioning a 35-meter-wide street-corner store required collecting 330 photos; visually positioning a 140-meter street (one side only) required collecting more than 1,500 photos; and positioning a certain factory required dividing it into 18 areas, with 200 images taken in each area. Clearly, to guarantee the visual positioning effect, a large number of on-site photos must be collected as training data, and these photos must reach every corner of the scene, which is very time-consuming and labor-intensive.
  • The purpose of this application is to provide a visual positioning method, device, equipment, and readable storage medium that use panoramic photos in a real-scene map to train a neural network model, which can solve the problem of difficult sample collection in visual positioning.
  • a visual positioning method including:
  • the positioning model is a neural network model trained by using panoramic photos in a real map
  • the final position is determined.
  • said determining the final position using a plurality of said candidate positions includes:
  • the geometric center of the geometric figure is taken as the final positioning.
  • it also includes:
  • the standard deviation is used as the positioning error of the final positioning.
  • the process of training the neural network model includes:
  • the geographic mark includes a geographic location and a specific orientation
  • the neural network model is trained using the training samples, and the trained neural network model is determined as the positioning model.
  • said performing the anti-distortion transformation on several of said panoramic photos to obtain several groups of plane projection photos with the same aspect ratio includes:
  • each of the panoramic photos is divided according to different focal length parameters, and several groups of plane projection photos with different viewing angles are obtained.
  • dividing each of the panoramic photos according to different focal length parameters to obtain several groups of plane projection photos with different viewing angles includes:
  • Each of the panoramic photos is segmented according to the number of segments for which the corresponding original image coverage is greater than a specified percentage, obtaining several groups of plane projection photos in which adjacent images have overlapping viewing angles.
  • the process of training the neural network model further includes:
  • the training samples are supplemented by using scene photos obtained from the Internet or environment photos collected from the positioning environment.
  • performing random segmentation on the wide-angle photos to obtain the atlas to be tested includes:
  • random division is performed on the wide-angle photo according to a number of divisions for which the original image coverage is greater than a specified percentage, and an atlas to be tested matching the number of divisions is obtained.
  • a visual positioning device includes:
  • the atlas to be tested acquisition module is used to acquire wide-angle photos, and randomly segment the wide-angle photos to obtain the atlas to be tested;
  • the candidate positioning acquisition module is used to input the atlas to be tested into a positioning model for positioning recognition to obtain multiple candidate positionings;
  • the positioning model is a neural network model trained by using panoramic photos in a real-world map;
  • the positioning output module is used to determine the final positioning by using a plurality of the candidate positionings.
  • a visual positioning device including:
  • the memory is used to store a computer program;
  • the processor is used to implement the above-mentioned visual positioning method when the computer program is executed.
  • The real-scene map is a map in which the actual street scene can be viewed, and it contains 360-degree real-scene imagery.
  • The panoramic photos in the real-scene map are real street-view imagery, whose coverage overlaps with the application environment of visual positioning.
  • The neural network model is trained with the panoramic photos in the real-scene map to obtain a positioning model for visual positioning. After a wide-angle photo is obtained, it is randomly segmented to obtain the atlas to be tested. The atlas to be tested is input into the positioning model for positioning recognition, yielding multiple candidate positionings, from which the final positioning is determined.
  • a positioning model can be obtained by training the neural network model based on the panoramic photos in the real scene map, and the visual positioning can be completed based on the positioning model, which solves the problem of difficulty in the collection of visual positioning training samples.
  • the embodiments of the present application also provide devices, equipment, and readable storage media corresponding to the above-mentioned visual positioning method, which have the above-mentioned technical effects, and will not be repeated here.
  • Fig. 1 is an implementation flowchart of a visual positioning method in an embodiment of the application
  • FIG. 2 is a schematic diagram of a perspective segmentation in an embodiment of this application.
  • FIG. 3 is a schematic structural diagram of a visual positioning device in an embodiment of the application.
  • FIG. 4 is a schematic structural diagram of a visual positioning device in an embodiment of this application.
  • Fig. 5 is a schematic diagram of a specific structure of a visual positioning device in an embodiment of the application.
  • The visual positioning method provided in the embodiment of the present invention can be applied directly on a cloud server or on a local device.
  • Any device that needs to be positioned can be positioned through a wide-angle photo, provided it has photo-taking and networking functions.
  • FIG. 1 is a flowchart of a visual positioning method in an embodiment of the application. The method includes the following steps:
  • Wide-angle, that is, pictures taken with a wide-angle lens or in panoramic mode. Simply put, the shorter the focal length, the wider the field of view, and the wider the range of the scene that can be accommodated in the photo.
  • the panoramic photos in the real map are used to train the neural network model. Therefore, in order to better perform visual positioning, when using the positioning model for visual positioning, the required photos are also wide-angle photos.
  • the user can use the wide-angle mode (or ultra-wide-angle mode) or the panoramic mode to take a picture of the surrounding environment at a location that needs to be positioned.
  • Generally, a photo whose angle of view exceeds 120 degrees (of course, other values such as 140 degrees or 180 degrees are also possible) is regarded as a wide-angle photo.
  • the wide-angle photos are obtained, they are randomly divided to obtain the atlas to be tested composed of several divided photos.
  • The specific number of photos into which the wide-angle photo is divided can be set according to the training effect of the positioning model and the actual positioning accuracy requirements.
  • In general, the larger the number of divisions, the higher the positioning accuracy.
  • The more training iterations of the model, the longer the training time.
  • When segmenting the wide-angle photo, it can also be randomly divided according to a number of divisions for which the original image coverage is greater than a specified percentage, to obtain an atlas to be tested matching the number of divisions.
  • The wide-angle photo can be randomly divided into N pieces with an aspect ratio of 1:1 (it should be noted that other aspect ratios are also possible, as long as the aspect ratio matches that of the training samples used to train the positioning model). That is, images whose height is 1/3 to 1/2 of the height of the wide-angle photo are used as the atlas to be tested.
  • the number of N is set according to the training effect and positioning accuracy needs.
  • When the training effect is relatively poor and the positioning accuracy requirement is high, select a larger value of N.
  • For example, N can be set to 100 (of course, other values such as 50 or 80 can also be selected; they are not enumerated here one by one).
  • In addition, the random segmentation result requires the coverage of the original image (that is, the wide-angle photo) to be greater than 95% (of course, other percentages can also be set, which are not enumerated here).
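As a concrete illustration, the random-segmentation step above can be sketched as follows. This is a minimal sketch, not the patented implementation: the function name, the NumPy coverage mask, and the retry loop are illustrative assumptions; only the 1:1 aspect ratio, the 1/3 to 1/2 height range, N = 100, and the >95% coverage requirement come from the text.

```python
import random
import numpy as np

def random_segment(wide_angle, n=100, min_coverage=0.95, seed=None):
    """Randomly crop a wide-angle photo (H x W x 3 array) into n square
    patches (aspect ratio 1:1) whose side is 1/3 to 1/2 of the photo
    height, retrying until the union of patches covers more than
    min_coverage of the original image."""
    rng = random.Random(seed)
    h, w, _ = wide_angle.shape
    while True:
        crops, mask = [], np.zeros((h, w), dtype=bool)
        for _ in range(n):
            side = rng.randint(h // 3, h // 2)      # crop height = 1/3..1/2 of H
            y = rng.randint(0, h - side)
            x = rng.randint(0, w - side)
            crops.append(wide_angle[y:y + side, x:x + side])
            mask[y:y + side, x:x + side] = True     # record covered pixels
        if mask.mean() > min_coverage:              # original-image coverage > 95%
            return crops
```

With N = 100 crops of this size, the expected union coverage of a typical wide-angle frame comfortably exceeds 95%, so the retry loop rarely runs more than once.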
  • S102: Input the atlas to be tested into the positioning model for positioning recognition, and obtain multiple candidate positionings.
  • the positioning model is a neural network model trained by using panoramic photos in the real map.
  • each segmented photo in the atlas to be measured is input into the positioning model for positioning recognition, and an output about the positioning result is obtained for each photo.
  • the positioning result corresponding to each divided photo is used as a candidate positioning.
  • the process of training a neural network model includes:
  • Step 1 Obtain a number of panoramic photos from the real-world map, and determine the geographic location of each real-world photo;
  • Step 2 Perform anti-distortion transformation on several panoramic photos to obtain several groups of plane projection photos with the same aspect ratio;
  • Step 3 Mark a geographic mark for each group of plane projection photos according to the correspondence with the panoramic photos; the geographic mark includes the geographic location and the specific orientation;
  • Step 4 Use the geo-tagged plane projection photos as training samples
  • Step 5 Use the training samples to train the neural network model, and determine the trained neural network model as the positioning model.
  • The panoramic photo can be subjected to anti-distortion transformation to obtain several groups of plane projection photos with the same aspect ratio. Since the panoramic photos in the real-scene map correspond to geographic locations, in this embodiment the geographic location of a group of plane projection photos divided from the same panoramic photo is that of the panoramic photo. In addition, because a panoramic photo is divided based on viewing angle, the orientation of each divided photo is also known. In this embodiment, the geographic location and the specific orientation together form the geographic tag added to each photo; in other words, every plane projection photo has a corresponding geographic location and specific orientation.
  • The trained neural network model is the positioning model. Specifically, the collection of photos with specific locations and specific orientations can be used as the data pool. Randomly select 80% of the data pool as the training set and the remaining 20% as the test set; the ratio can be adjusted according to the actual training situation. Input the training set into a neural network model, either newly initialized or pre-trained on a large-scale image set, for training, and use the test set to verify the training results.
  • CNN (Convolutional Neural Network): a feedforward neural network comprising alternating convolutional and pooling layers and their derivative structures;
  • LSTM (Long Short-Term Memory) networks;
  • RNN (recurrent neural networks);
  • hybrid structures, etc.
  • the specific neural network used is not limited.
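The data-pool split described above (80% training, 20% test, with an adjustable ratio) can be sketched in plain Python. The function name and the sample representation are illustrative assumptions; the ratio and the random selection come from the text.

```python
import random

def split_data_pool(samples, train_frac=0.8, seed=42):
    """Randomly split geo-tagged plane projection photos into a training
    set and a test set (80/20 by default; the ratio can be adjusted to
    the actual training situation). `samples` is any list of
    (photo, geo_tag) pairs."""
    rng = random.Random(seed)
    pool = list(samples)
    rng.shuffle(pool)                       # random selection from the pool
    cut = int(len(pool) * train_frac)       # 80% / 20% boundary
    return pool[:cut], pool[cut:]
```

The split is done once over the whole pool so that every photo lands in exactly one of the two sets.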
  • the panoramic photos can be segmented according to different focal length parameters, so as to obtain plane projection photos with different viewing angles as training samples.
  • each panoramic photo can be divided according to different focal length parameters in the anti-distortion transformation to obtain several groups of plane projection photos with different viewing angles. That is, the number of divisions n is determined according to the focal length parameter F.
  • Fig. 2 is a schematic diagram of a viewing angle segmentation in an embodiment of this application.
  • the focal length parameter F can also be changed to other values, such as 1.0 and 1.3, to obtain plane projection photos with other perspectives.
  • When segmenting the panoramic photo, it can also be segmented according to a number of segments for which the original image coverage is greater than a specified percentage; that is, at the same viewing angle, adjacent pictures have overlapping viewing angles. Specifically, each panoramic photo is segmented according to the number of segments for which the corresponding original image coverage is greater than a specified percentage, obtaining several groups of plane projection photos in which adjacent images have overlapping viewing angles. In other words, to enrich the shooting angles of the photos, it is recommended that the number of divisions be greater than the number of equal divisions at a fixed focal length.
  • Taking the axis of the panoramic projection sphere perpendicular to the ground as the rotation axis, the line-of-sight center (the arrow in Fig. 2) is rotated every 45 degrees, splitting off a plane projection photo with a 90-degree viewing angle at each step.
  • Adjacent pictures therefore share a 45-degree overlapping viewing angle.
  • Orientation data is then marked on each resulting plane projection photo. Because the value of F can also be 1.0 or 1.3, giving viewing angles of about 60 degrees and 30 degrees respectively, the value of n can correspondingly be 12 or 24. More values of F and larger n can be set to further improve the coverage of the training set; generally, the coverage rate should be greater than 95%.
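The anti-distortion transformation itself is not spelled out in the text; a standard way to realize it is to re-project the equirectangular panorama through a pinhole-camera model, one yaw angle at a time. The sketch below makes that assumption: the function and parameter names are illustrative, pitch is fixed at zero (horizontal line of sight), and nearest-neighbor sampling is used for brevity.

```python
import math
import numpy as np

def equirect_to_perspective(pano, fov_deg=90.0, yaw_deg=0.0, out_size=256):
    """Project one perspective view out of an equirectangular panorama
    (H x W x 3 array). The rotation axis is the vertical axis of the
    projection sphere, matching the per-45-degree splitting in Fig. 2."""
    h, w, _ = pano.shape
    # Pinhole focal length (in pixels) for the requested field of view.
    f = (out_size / 2) / math.tan(math.radians(fov_deg) / 2)
    u, v = np.meshgrid(np.arange(out_size) - out_size / 2,
                       np.arange(out_size) - out_size / 2)
    # Ray directions in camera coordinates, rotated by yaw about the
    # vertical axis (axis perpendicular to the ground).
    x, y, z = u, v, np.full_like(u, f, dtype=float)
    yaw = math.radians(yaw_deg)
    xr = x * math.cos(yaw) + z * math.sin(yaw)
    zr = -x * math.sin(yaw) + z * math.cos(yaw)
    lon = np.arctan2(xr, zr)                      # longitude: -pi .. pi
    lat = np.arctan2(y, np.hypot(xr, zr))         # latitude: -pi/2 .. pi/2
    px = ((lon / math.pi + 1) / 2 * (w - 1)).astype(int)
    py = ((lat / (math.pi / 2) + 1) / 2 * (h - 1)).astype(int)
    return pano[py.clip(0, h - 1), px.clip(0, w - 1)]

# Eight 90-degree views stepped every 45 degrees, as in Fig. 2:
# views = [equirect_to_perspective(pano, 90.0, yaw) for yaw in range(0, 360, 45)]
```

Stepping the yaw by half the field of view gives the 45-degree overlap between adjacent views described above.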
  • In the process of training the neural network model, the training samples can also be supplemented with scene photos obtained from the Internet or environment photos collected from the positioning environment.
  • the final location can be determined based on these candidate locations. After obtaining the final positioning, it can be output for the user to view.
  • one location can be randomly selected from the candidate locations as the final location, or several candidate locations can be randomly selected from the candidate locations, and the geometric centers of geometric figures corresponding to these candidate locations can be taken as the final location.
  • candidate locations with a high degree of overlap can also be used as the final location.
  • The candidate positionings can be clustered and filtered to remove those that stray from the majority of positions, and the final positioning is then determined from the remaining candidates.
  • the implementation process includes:
  • Step 1 Perform clustering processing on multiple candidate locations, and use the clustering results to screen multiple candidate locations;
  • Step 2 Use the selected candidate locations to construct geometric figures
  • Step three take the geometric center of the geometric figure as the final positioning.
  • a clustering algorithm such as DBSCAN (Density-Based Spatial Clustering of Applications with Noise) can be used to classify candidate locations, and adjacent location data can be classified into one category.
  • the positioning error can also be determined.
  • The final positioning is used to calculate the standard deviation of the multiple candidate positionings, and the standard deviation is taken as the positioning error of the final positioning. That is, the squared deviation between each candidate positioning and the final positioning is calculated and accumulated to obtain the final positioning error.
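Putting the last few steps together, the candidate-fusion stage can be sketched as below. This is a simplified stand-in for a full DBSCAN: a density filter drops outlier candidates, the centroid (geometric center) of the survivors becomes the final positioning, and the standard deviation of survivor-to-center distances is reported as the positioning error. The `eps` and `min_neighbors` parameters and the fallback when everything is filtered out are illustrative assumptions.

```python
import math

def final_position(candidates, eps=5.0, min_neighbors=3):
    """Fuse candidate positionings (list of (x, y) tuples) into one final
    positioning and a positioning error, in the spirit of the
    cluster-filter-centroid procedure described in the text."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    # Density filter: keep candidates with enough close neighbours,
    # discarding positionings that stray from the majority.
    kept = [p for p in candidates
            if sum(dist(p, q) <= eps for q in candidates) - 1 >= min_neighbors]
    if not kept:                      # degenerate case: nothing survived
        kept = list(candidates)
    # Geometric center of the surviving candidates.
    cx = sum(p[0] for p in kept) / len(kept)
    cy = sum(p[1] for p in kept) / len(kept)
    # Standard deviation of distances to the center as the error.
    err = math.sqrt(sum(dist(p, (cx, cy)) ** 2 for p in kept) / len(kept))
    return (cx, cy), err
```

For example, five candidates clustered near one point plus one far outlier yield the cluster's centroid as the final positioning, with the outlier excluded from both the center and the error estimate.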
  • The real-scene map is a map in which the actual street scene can be viewed, and it contains 360-degree real-scene imagery.
  • The panoramic photos in the real-scene map are real street-view imagery, whose coverage overlaps with the application environment of visual positioning.
  • The neural network model is trained with the panoramic photos in the real-scene map to obtain a positioning model for visual positioning. After a wide-angle photo is obtained, it is randomly segmented to obtain the atlas to be tested. The atlas to be tested is input into the positioning model for positioning recognition, yielding multiple candidate positionings, from which the final positioning is determined.
  • a positioning model can be obtained by training the neural network model based on the panoramic photos in the real scene map, and the visual positioning can be completed based on the positioning model, which solves the problem of difficulty in the collection of visual positioning training samples.
  • the embodiments of the present application also provide corresponding improvement solutions.
  • the same steps as in the above-mentioned embodiments or the corresponding steps can be referred to each other, and the corresponding beneficial effects can also be referred to each other, which will not be repeated in the preferred/improved embodiments herein.
  • the embodiment of the present application also provides a visual positioning device.
  • the visual positioning device described below and the visual positioning method described above can be referred to each other.
  • the visual positioning device includes:
  • the atlas to be tested acquisition module 101 is used to acquire wide-angle photos, and randomly segment the wide-angle photos to obtain the atlas to be tested;
  • the candidate location acquisition module 102 is used to input the atlas to be tested into a location model for location recognition to obtain multiple candidate locations;
  • the location model is a neural network model trained by using panoramic photos in the real map;
  • the positioning output module 103 is used for determining the final positioning using multiple candidate positionings.
  • the wide-angle photos are obtained, and the wide-angle photos are randomly divided to obtain the atlas to be tested; the atlas to be tested is input to the positioning model for positioning recognition, and multiple candidate positions are obtained; the positioning model is Use the neural network model trained on the panoramic photos in the real map; use multiple candidate locations to determine the final location.
  • The real-scene map is a map in which the actual street scene can be viewed, and it contains 360-degree real-scene imagery.
  • The panoramic photos in the real-scene map are real street-view imagery, whose coverage overlaps with the application environment of visual positioning.
  • The neural network model is trained with the panoramic photos in the real-scene map to obtain a positioning model for visual positioning. After a wide-angle photo is obtained, it is randomly segmented to obtain the atlas to be tested. The atlas to be tested is input into the positioning model for positioning recognition, yielding multiple candidate positionings, from which the final positioning is determined.
  • a positioning model can be obtained by training the neural network model based on the panoramic photos in the real map, and based on the positioning model, the visual positioning can be completed, which solves the problem of difficulty in the collection of visual positioning training samples.
  • the positioning output module 103 specifically includes:
  • the positioning screening unit is used to perform clustering processing on multiple candidate locations, and use the clustering results to screen multiple candidate locations;
  • the geometric figure construction unit is used to construct a geometric figure by using several candidate positions obtained by screening;
  • the final positioning determining unit is used to take the geometric center of the geometric figure as the final positioning.
  • the positioning output module 103 further includes:
  • the positioning error determining unit is used to calculate the standard deviation of multiple candidate positioning by using the final positioning; the standard deviation is used as the positioning error of the final positioning.
  • the model training module includes:
  • the panoramic photo obtaining unit is used to obtain several panoramic photos from the real-world map and determine the geographic location of each real-world photo;
  • the anti-distortion transformation unit is used to perform anti-distortion transformation on several panoramic photos to obtain several groups of plane projection photos with the same aspect ratio;
  • the geotagging unit is used to tag each group of plane projection photos with geotags according to the corresponding relationship with the panoramic photos; the geotags include geographic location and specific orientation;
  • the training sample determination unit is used to use the geographic-tagged plane projection photos as the training sample
  • the model training unit is used to train the neural network model using training samples, and determine the trained neural network model as a positioning model.
  • the anti-warping transformation unit is specifically used to segment each panoramic photo according to different focal length parameters in the anti-warping transformation to obtain several groups of plane projection photos with different viewing angles.
  • the anti-distortion transformation unit is specifically used to segment each panoramic photo according to the number of segments for which the corresponding original image coverage is greater than a specified percentage, to obtain several groups of plane projection photos in which adjacent pictures have overlapping viewing angles.
  • model training module further includes:
  • the sample supplement unit is used to supplement the training samples by using the scene photos obtained from the Internet or the environment photos collected from the positioning environment.
  • the atlas to be tested acquisition module 101 is specifically configured to randomly segment the wide-angle photo according to a number of segments for which the original image coverage is greater than a specified percentage, to obtain an atlas to be tested matching the number of segments.
  • the embodiment of the present application also provides a visual positioning device.
  • the visual positioning device described below and the visual positioning method described above can be referenced correspondingly.
  • the visual positioning device includes:
  • the memory 410 is used to store computer programs
  • the processor 420 is configured to implement the steps of the visual positioning method provided in the foregoing method embodiment when executing a computer program.
  • FIG. 5 is a schematic diagram of a specific structure of a visual positioning device provided by this embodiment.
  • the visual positioning device may differ considerably in configuration or performance, and may include one or more processors (central processing units, CPU) 420 (for example, one or more processors) and memory 410, one or more of which store computer application programs 413 or data 412.
  • the memory 410 may be short-term storage or persistent storage.
  • the computer application program may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the data processing device.
  • the central processing unit 420 may be configured to communicate with the memory 410 and execute a series of instruction operations in the memory 410 on the visual positioning device 301.
  • the visual positioning device 400 may also include one or more power supplies 430, one or more wired or wireless network interfaces 440, one or more input and output interfaces 450, and/or one or more operating systems 411.
  • the steps in the visual positioning method described above can be implemented by the structure of the visual positioning device.
  • the embodiment of the present application also provides a readable storage medium, and a readable storage medium described below and a visual positioning method described above can be referenced correspondingly.
  • a readable storage medium in which a computer program is stored, and when the computer program is executed by a processor, the steps of the visual positioning method provided by the foregoing method embodiment are implemented.
  • the readable storage medium can specifically be a U disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other readable storage medium that can store program code.

Abstract

A visual positioning method and apparatus, a device, and a readable storage medium, the method comprising: acquiring a wide-angle photo, and randomly segmenting the wide-angle photo to obtain an atlas to be measured (S101); and inputting the atlas into a positioning model for position identification to obtain a plurality of candidate positions, wherein the positioning model is a neural network model trained by using panoramic photos in a real map (S102); and determining the final position by using the plurality of candidate positions (S103). The positioning model may be obtained by training the neural network model on the basis of the panoramic photos in the real map, and visual positioning may be completed on the basis of the positioning model, which solves the problem of the collection difficulty of training samples for visual positioning.

Description

一种视觉定位方法、装置、设备及可读存储介质Visual positioning method, device, equipment and readable storage medium 技术领域Technical field
本申请涉及定位技术领域,特别是涉及一种视觉定位方法、装置、设备及可读存储介质。This application relates to the field of positioning technology, and in particular to a visual positioning method, device, equipment, and readable storage medium.
背景技术Background technique
基于机器学习的视觉定位原理:利用大量的带有位置标记的真实场景照片进行训练,得到一个输入为照片(RGB数值矩阵),输出为具体的位置的神经网络模型。获得训练好的神经网络模型后,只需要用户对环境拍摄一张照片就可以得到具体的拍摄位置。The principle of visual positioning based on machine learning: Use a large number of real scene photos with location markers for training, and get a neural network model whose input is a photo (RGB numerical matrix) and output is a specific location. After obtaining the trained neural network model, the user only needs to take a picture of the environment to get the specific shooting location.
这种方法需要对使用环境采集大量的照片样本作为训练数据集。例如,在一些文献中记载,为了实现对35米宽的街角店铺进行视觉定位,需要采集330张照片,而为了实现对140米的街道(只针对一侧进行定位)进行视觉定位,需采集1500多张照片;为了实现某工厂定位,需将工厂划分为18个区域,每个区域需要拍摄200幅图像。可见,为了保证视觉定位效果,需要采集大量的现场照片作为训练数据,而且这些照片必须保证拍摄到场景中的每个角落,非常耗费时间和人力。This method needs to collect a large number of photo samples of the use environment as a training data set. For example, some documents record that in order to realize the visual positioning of a 35-meter-wide street corner store, 330 photos need to be collected, and in order to realize the visual positioning of a 140-meter street (positioning only on one side), 1,500 Multiple photos; in order to realize the positioning of a certain factory, the factory needs to be divided into 18 areas, and each area needs to take 200 images. It can be seen that in order to ensure the visual positioning effect, it is necessary to collect a large number of on-site photos as training data, and these photos must be taken to every corner of the scene, which is very time-consuming and labor-intensive.
综上所述,如何解决视觉定位中样本采集困难等问题,是目前本领域技术人员急需解决的技术问题。In summary, how to solve problems such as difficulty in sample collection in visual positioning is a technical problem urgently needed to be solved by those skilled in the art.
发明内容Summary of the invention
The purpose of this application is to provide a visual positioning method, apparatus, device, and readable storage medium that train a neural network model with panoramic photos from a real-scene map, thereby solving the problem of difficult sample collection in visual positioning.
To solve the above technical problem, this application provides the following technical solutions:
A visual positioning method, comprising:
obtaining a wide-angle photo, and randomly segmenting the wide-angle photo to obtain a set of images to be tested;
inputting the set of images to be tested into a positioning model for positioning recognition to obtain a plurality of candidate positions, wherein the positioning model is a neural network model trained with panoramic photos from a real-scene map; and
determining a final position from the plurality of candidate positions.
Preferably, determining the final position from the plurality of candidate positions comprises:
clustering the plurality of candidate positions, and screening the plurality of candidate positions according to the clustering result;
constructing a geometric figure from the candidate positions retained by the screening; and
taking the geometric center of the geometric figure as the final position.
Preferably, the method further comprises:
calculating the standard deviation of the plurality of candidate positions with respect to the final position; and
taking the standard deviation as the positioning error of the final position.
Preferably, the process of training the neural network model comprises:
obtaining a number of panoramic photos from the real-scene map, and determining the geographic location of each panoramic photo;
applying a de-warping transformation to the panoramic photos to obtain groups of planar projection photos with the same aspect ratio;
labeling each group of planar projection photos with a geotag according to its correspondence with the panoramic photo, the geotag comprising a geographic location and a specific orientation;
using the geotagged planar projection photos as training samples; and
training the neural network model with the training samples, and determining the trained neural network model as the positioning model.
Preferably, applying the de-warping transformation to the panoramic photos to obtain groups of planar projection photos with the same aspect ratio comprises:
segmenting each panoramic photo in the de-warping transformation according to different focal length parameters, to obtain groups of planar projection photos with different viewing angles.
Preferably, segmenting each panoramic photo in the de-warping transformation according to different focal length parameters to obtain groups of planar projection photos with different viewing angles comprises:
segmenting each panoramic photo with a number of segments whose coverage of the original image exceeds a specified percentage, to obtain groups of planar projection photos in which adjacent photos share overlapping viewing angles.
Preferably, the process of training the neural network model further comprises:
supplementing the training samples with scene photos obtained from the Internet, or with environment photos collected from the positioning environment.
Preferably, randomly segmenting the wide-angle photo to obtain the set of images to be tested comprises:
randomly segmenting the wide-angle photo, according to a segmentation count, such that the coverage of the original image exceeds a specified percentage, to obtain a set of images to be tested matching the segmentation count.
A visual positioning apparatus, comprising:
a test-image acquisition module, configured to obtain a wide-angle photo and randomly segment the wide-angle photo to obtain a set of images to be tested;
a candidate position acquisition module, configured to input the set of images to be tested into a positioning model for positioning recognition to obtain a plurality of candidate positions, wherein the positioning model is a neural network model trained with panoramic photos from a real-scene map; and
a position output module, configured to determine a final position from the plurality of candidate positions.
A visual positioning device, comprising:
a memory, configured to store a computer program; and
a processor, configured to implement the above visual positioning method when executing the computer program.
A readable storage medium storing a computer program which, when executed by a processor, implements the above visual positioning method.
With the method provided by the embodiments of this application, a wide-angle photo is obtained and randomly segmented into a set of images to be tested; the set of images to be tested is input into a positioning model for positioning recognition to obtain a plurality of candidate positions, the positioning model being a neural network model trained with panoramic photos from a real-scene map; and a final position is determined from the plurality of candidate positions.
A real-scene map is a map in which the actual street scene can be viewed, including 360-degree real-scene imagery. The panoramic photos in a real-scene map depict the real street environment, which overlaps with the application environment of visual positioning. On this basis, in this method, the panoramic photos from the real-scene map are used to train a neural network model, yielding a positioning model for visual positioning. After a wide-angle photo is obtained, it is randomly segmented into a set of images to be tested. Inputting the set of images to be tested into the positioning model for positioning recognition yields a plurality of candidate positions, from which the final position is determined. Thus, in this method, training the neural network model on panoramic photos from a real-scene map yields a positioning model with which visual positioning can be performed, solving the problem of difficult collection of visual positioning training samples.
Correspondingly, the embodiments of this application also provide an apparatus, a device, and a readable storage medium corresponding to the above visual positioning method, which have the above technical effects and are not repeated here.
Description of the Drawings
To explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. For those of ordinary skill in the art, other drawings can be derived from these drawings without creative effort.
Fig. 1 is a flowchart of a visual positioning method in an embodiment of this application;
Fig. 2 is a schematic diagram of viewing-angle segmentation in an embodiment of this application;
Fig. 3 is a schematic structural diagram of a visual positioning apparatus in an embodiment of this application;
Fig. 4 is a schematic structural diagram of a visual positioning device in an embodiment of this application;
Fig. 5 is a schematic diagram of a specific structure of a visual positioning device in an embodiment of this application.
Detailed Description
To enable those skilled in the art to better understand the solution of this application, the application is further described in detail below with reference to the drawings and specific embodiments. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of this application.
It should be noted that, since the neural network model can be stored in the cloud or on a local device, the visual positioning method provided by the embodiments of this application can be applied directly on a cloud server or on a local device. A device that needs positioning can be positioned from a single wide-angle photo as long as it can take photos and connect to a network.
Please refer to Fig. 1, which is a flowchart of a visual positioning method in an embodiment of this application. The method includes the following steps:
S101. Obtain a wide-angle photo, and randomly segment the wide-angle photo to obtain a set of images to be tested.
A wide-angle photo is a picture taken with a wide-angle lens or in panoramic mode. Simply put, the smaller the focal length, the wider the field of view, and the wider the range of scenery the photo can contain.
Since the visual positioning method provided by this application trains the neural network model with panoramic photos from a real-scene map, the photos used when performing visual positioning with the positioning model should, for better results, also be wide-angle photos. For example, at the location to be determined, the user can use wide-angle mode (or ultra-wide-angle mode) or panoramic mode to take a photo of the surroundings with a viewing angle exceeding 120 degrees (other angles, such as 140 or 180 degrees, are of course also possible).
After the wide-angle photo is obtained, it is randomly segmented to obtain a set of images to be tested consisting of the resulting crops.
In particular, the number of crops into which the wide-angle photo is segmented can be set according to the training quality of the positioning model and the required positioning accuracy. Generally speaking, within the recognizable range (if a crop is too small, it may contain no positioning-relevant features and cannot be recognized effectively), the larger the number of crops, the higher the positioning accuracy; of course, the model then requires more training iterations and a longer training time.
Preferably, to improve positioning accuracy, the wide-angle photo can be randomly segmented, according to the segmentation count, such that the coverage of the original image exceeds a specified percentage, yielding a set of images to be tested matching the segmentation count. Specifically, the wide-angle photo can be randomly segmented into N crops with an aspect ratio of 1:1 (other aspect ratios are possible, as long as the ratio matches that of the training samples used to train the positioning model) and a height of 1/3 to 1/2 of the wide-angle photo's height. The value of N is set according to the training quality and the required positioning accuracy; when the training quality is poorer or higher accuracy is required, a larger N is chosen. N can typically be set to 100 (other values, such as 50 or 80, are possible and are not enumerated here). Typically, the random segmentation result is required to cover more than 95% of the original image (other percentages can of course also be set, and are not enumerated here).
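As an illustration only, the random segmentation described above can be sketched as follows. The helper names, the example image size, and the coarse-grid coverage check are assumptions made for this sketch, not part of the claimed method:

```python
import random

def sample_crops(width, height, n, seed=None):
    """Randomly sample n square crop boxes (left, top, size) whose side
    length is between 1/3 and 1/2 of the photo height, as described above."""
    rng = random.Random(seed)
    boxes = []
    for _ in range(n):
        size = rng.randint(height // 3, height // 2)
        left = rng.randint(0, width - size)
        top = rng.randint(0, height - size)
        boxes.append((left, top, size))
    return boxes

def coverage(boxes, width, height, step=16):
    """Approximate fraction of the original image covered by the union of
    the crop boxes, evaluated on a coarse pixel grid."""
    covered = total = 0
    for y in range(0, height, step):
        for x in range(0, width, step):
            total += 1
            if any(l <= x < l + s and t <= y < t + s for l, t, s in boxes):
                covered += 1
    return covered / total

def sample_until_covered(width, height, n, min_coverage=0.95, max_tries=20):
    """Re-sample until the union of crops covers more than min_coverage of
    the original photo, matching the >95% requirement described above."""
    for _ in range(max_tries):
        boxes = sample_crops(width, height, n)
        if coverage(boxes, width, height) > min_coverage:
            return boxes
    return boxes  # fall back to the last attempt
```

With N = 100 crops of side 360–540 pixels on a 1920x1080 photo, the union of crops normally exceeds the 95% coverage threshold after very few attempts.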
S102. Input the set of images to be tested into the positioning model for positioning recognition to obtain a plurality of candidate positions.
Here, the positioning model is a neural network model trained with panoramic photos from a real-scene map.
To achieve a more precise positioning result, in this embodiment each crop in the set of images to be tested is input into the positioning model separately for positioning recognition, so that one positioning output is obtained per crop. In this embodiment, the positioning result corresponding to each crop is taken as a candidate position.
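The per-crop inference step can be sketched as below; the model interface (a callable mapping a crop to an (x, y) location) is a hypothetical assumption for illustration, since the application does not fix a specific network API:

```python
def predict_candidates(model, crops):
    """Run the positioning model on every crop of the test set and collect
    one candidate position per crop."""
    return [model(crop) for crop in crops]

# Usage with a stand-in model that always answers the same location:
stub_model = lambda crop: (12.5, 34.0)
candidates = predict_candidates(stub_model, [object()] * 3)
```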
It should be noted that, before practical application, the positioning model must be obtained by training. The process of training the neural network model includes:
Step 1: Obtain a number of panoramic photos from the real-scene map, and determine the geographic location of each panoramic photo.
Step 2: Apply a de-warping transformation to the panoramic photos to obtain groups of planar projection photos with the same aspect ratio.
Step 3: Label each group of planar projection photos with a geotag according to its correspondence with the panoramic photo; the geotag includes a geographic location and a specific orientation.
Step 4: Use the geotagged planar projection photos as training samples.
Step 5: Train the neural network model with the training samples, and determine the trained neural network model as the positioning model.
For ease of description, the above five steps are explained together.
Since a panoramic photo has a viewing angle of nearly 360 degrees, in this embodiment the panoramic photo can be de-warped and then split into groups of planar projection photos with the same aspect ratio. Because the panoramic photos in the real-scene map correspond to geographic locations, in this embodiment the geographic location of a group of planar projection photos split from the same panoramic photo corresponds to the geographic location of that panoramic photo. In addition, since a panoramic photo is split by viewing angle, the orientation of each resulting photo is also well defined; in this embodiment, the geographic location and the specific orientation are added together as the geotag. In other words, every planar projection photo has a corresponding geographic location and specific orientation.
The geotagged planar projection photos are used as training samples, with which the neural network model is trained; the trained neural network model is the positioning model. Specifically, the collection of photos with specific locations and orientations can be used as a data pool, from which 80% is randomly drawn as the training set and the remaining 20% as the test set; this ratio can also be adjusted according to the actual training situation. The training set is fed into a neural network model that is either freshly initialized or pre-trained on a large-scale image collection, and the training result is validated with the test set. Commonly used neural network structures include CNNs (Convolutional Neural Networks, feedforward networks comprising alternating convolutional layers and pooling layers) and their derivatives, LSTM (Long Short-Term Memory, a recurrent neural network (RNN) architecture), and hybrid structures. The embodiments of this application do not restrict which neural network is used. After training, a neural network model suited to the venue covered by the real-scene map data source is obtained, namely the positioning model.
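The 80/20 split of the data pool described above can be sketched in a few lines; the function name is an assumption for this illustration:

```python
import random

def split_pool(samples, train_fraction=0.8, seed=0):
    """Randomly split the geotagged data pool into a training set and a
    test set (80/20 by default, as suggested above; the ratio is tunable)."""
    pool = list(samples)
    random.Random(seed).shuffle(pool)
    cut = int(len(pool) * train_fraction)
    return pool[:cut], pool[cut:]
```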
Preferably, in order to accommodate the focal lengths (i.e. viewing angles) of different image capture devices in practical applications, a panoramic photo can be split according to different focal length parameters, so that planar projection photos with different viewing angles are obtained as training samples. Specifically, each panoramic photo can be split in the de-warping transformation according to different focal length parameters, yielding groups of planar projection photos with different viewing angles. That is, the number of splits n is determined by the focal length parameter F: the smaller the focal length parameter, the larger the viewing angle, and the smaller n can be. As shown in Fig. 2, a schematic diagram of viewing-angle segmentation in an embodiment of this application, with the most common focal length parameter F = 0.5 the viewing angle is 90 degrees, and n = 4 splits suffice to cover the full 360 degrees. When planar projection photos with several different viewing angles are needed, the focal length parameter F can also be changed to other values, such as 1.0 and 1.3, to obtain planar projection photos with other viewing angles.
Preferably, in order to improve the accuracy of orientation estimation, a panoramic photo can also be split with a number of segments whose coverage of the original image exceeds a specified percentage, so that, at the same viewing angle, adjacent photos share an overlapping angle. Specifically, each panoramic photo is split with a number of segments whose coverage of the original image exceeds the specified percentage, yielding groups of planar projection photos in which adjacent photos share overlapping viewing angles. In other words, to enrich the shooting angles of the photos, at a fixed focal length the recommended number of splits is larger than that of an equal, non-overlapping partition. For example, taking the vertical axis of the panoramic projection sphere as the rotation axis, a planar projection photo with a 90-degree viewing angle can be split off every time the line-of-sight center (the arrow in Fig. 2) rotates by 45 degrees; adjacent photos then share a 45-degree overlapping viewing angle. The resulting planar projection photos are labeled with orientation data according to the angle of the line-of-sight center. Since F can also take the values 1.0 and 1.3, for which the viewing angles are roughly 60 and 30 degrees respectively, n can correspondingly be set to 12 and 24. More F values can also be added and n increased further to improve the coverage of the training set; a coverage above 95% can usually be guaranteed.
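The relationship between the focal length parameter F, the viewing angle, and the overlapping splits can be sketched as follows. The pinhole model with unit image width used here is an assumption: it reproduces the 90-degree angle stated for F = 0.5 exactly, but only approximates the roughly 60- and 30-degree figures quoted for F = 1.0 and 1.3, which suggests the application may use a slightly different convention:

```python
import math

def fov_degrees(F):
    """Horizontal field of view implied by focal length parameter F under
    a pinhole model with unit image width (assumption for this sketch)."""
    return math.degrees(2 * math.atan(0.5 / F))

def yaw_centers(step_degrees):
    """Line-of-sight center angles obtained by rotating the view around
    the vertical axis of the panorama sphere in fixed steps."""
    return list(range(0, 360, step_degrees))

# With F = 0.5 (90-degree views) and 45-degree rotation steps, n = 8 views
# are produced and adjacent views overlap by 45 degrees.
views = yaw_centers(45)
overlap = fov_degrees(0.5) - 45
```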
Preferably, considering that in practical applications relying on panoramic photos alone for training may yield poor visual positioning performance, for instance because the real-scene map is updated infrequently, the training samples can also be supplemented during training with scene photos obtained from the Internet or with environment photos collected from the positioning environment.
S103. Determine a final position from the plurality of candidate positions.
After the plurality of candidate positions are obtained, the final position can be determined from them and then output for the user to view.
Specifically, one candidate position can be selected at random as the final position; alternatively, several candidate positions can be selected at random and the geometric center of the geometric figure they form taken as the final position. Of course, several highly coincident candidate positions can also be taken as the final position.
Preferably, considering that a few relatively anomalous positions may appear among the candidates, the candidate positions can be clustered and screened to improve the accuracy of the final position: candidates that lie far from the majority of positions are removed, and the final position is then determined from the remaining candidates. Specifically, the implementation process includes:
Step 1: Cluster the plurality of candidate positions, and screen the plurality of candidate positions according to the clustering result.
Step 2: Construct a geometric figure from the candidate positions retained by the screening.
Step 3: Take the geometric center of the geometric figure as the final position.
Specifically, a clustering algorithm such as DBSCAN (Density-Based Spatial Clustering of Applications with Noise) can be used to classify the candidate positions, grouping neighboring position data into the same cluster. The clustering parameters can be set to an ε-neighborhood of 1 and a minimum point count minPts = 5. The cluster containing the most position results is regarded as reliable, and the geometric center of the geometric figure formed by all candidate positions in that cluster is computed as the final positioning result.
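The cluster-then-centroid step can be sketched as below. This is a minimal, simplified DBSCAN written for illustration (a production system would typically use an existing library implementation); the noise fallback is also an assumption of this sketch:

```python
import math

def dbscan(points, eps=1.0, min_pts=5):
    """Minimal DBSCAN over 2-D points: returns one cluster label per point
    (-1 marks noise)."""
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:     # noise point reachable from a core point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:  # only core points expand the cluster
                queue.extend(more)
    return labels

def final_position(candidates, eps=1.0, min_pts=5):
    """Keep the largest cluster of candidate positions and return its
    geometric center as the final position."""
    labels = dbscan(candidates, eps, min_pts)
    best = max((l for l in labels if l >= 0),
               key=lambda l: labels.count(l), default=None)
    kept = candidates if best is None else \
        [p for p, l in zip(candidates, labels) if l == best]
    xs, ys = zip(*kept)
    return (sum(xs) / len(kept), sum(ys) / len(kept))
```

Six tightly grouped candidates plus one distant outlier would, with ε = 1 and minPts = 5, form a single cluster whose centroid becomes the final position, while the outlier is discarded as noise.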
Preferably, in order to better present the positioning result, a positioning error can also be determined. Specifically, the standard deviation of the plurality of candidate positions with respect to the final position is calculated and taken as the positioning error of the final position. That is, the deviation of each candidate position from the final position is squared, the squares are accumulated and averaged, and the square root of the mean is taken as the final positioning error.
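This error computation amounts to the root of the mean squared distance of the candidates from the final position, for example (the function name is illustrative):

```python
import math

def positioning_error(candidates, final):
    """Standard deviation of the candidate positions around the final
    position: the root of the mean squared distance, reported as the error."""
    mean_sq = sum(math.dist(p, final) ** 2 for p in candidates) / len(candidates)
    return math.sqrt(mean_sq)
```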
With the method provided by the embodiments of this application, a wide-angle photo is obtained and randomly segmented into a set of images to be tested; the set of images to be tested is input into a positioning model for positioning recognition to obtain a plurality of candidate positions, the positioning model being a neural network model trained with panoramic photos from a real-scene map; and a final position is determined from the plurality of candidate positions.
A real-scene map is a map in which the actual street scene can be viewed, including 360-degree real-scene imagery. The panoramic photos in a real-scene map depict the real street environment, which overlaps with the application environment of visual positioning. On this basis, in this method, the panoramic photos from the real-scene map are used to train a neural network model, yielding a positioning model for visual positioning. After a wide-angle photo is obtained, it is randomly segmented into a set of images to be tested. Inputting the set of images to be tested into the positioning model for positioning recognition yields a plurality of candidate positions, from which the final position is determined. Thus, in this method, training the neural network model on panoramic photos from a real-scene map yields a positioning model with which visual positioning can be performed, solving the problem of difficult collection of visual positioning training samples.
It should be noted that, on the basis of the above embodiments, the embodiments of this application also provide corresponding improvement schemes. Steps in the preferred/improved embodiments that are identical or correspond to steps in the above embodiments can be cross-referenced, as can the corresponding beneficial effects; they are not repeated one by one in the preferred/improved embodiments herein.
Corresponding to the above method embodiments, the embodiments of this application further provide a visual positioning apparatus. The visual positioning apparatus described below and the visual positioning method described above can be referred to in correspondence with each other.
As shown in Fig. 3, the visual positioning apparatus includes:
a test-image acquisition module 101, configured to obtain a wide-angle photo and randomly segment the wide-angle photo to obtain a set of images to be tested;
a candidate position acquisition module 102, configured to input the set of images to be tested into a positioning model for positioning recognition to obtain a plurality of candidate positions, wherein the positioning model is a neural network model trained with panoramic photos from a real-scene map; and
a position output module 103, configured to determine a final position from the plurality of candidate positions.
With the apparatus provided by the embodiments of this application, a wide-angle photo is obtained and randomly segmented into a set of images to be tested; the set of images to be tested is input into a positioning model for positioning recognition to obtain a plurality of candidate positions, the positioning model being a neural network model trained with panoramic photos from a real-scene map; and a final position is determined from the plurality of candidate positions.
A real-scene map is a map in which the actual street scene can be viewed, including 360-degree real-scene imagery. The panoramic photos in a real-scene map depict the real street environment, which overlaps with the application environment of visual positioning. On this basis, in this apparatus, the panoramic photos from the real-scene map are used to train a neural network model, yielding a positioning model for visual positioning. After a wide-angle photo is obtained, it is randomly segmented into a set of images to be tested. Inputting the set of images to be tested into the positioning model for positioning recognition yields a plurality of candidate positions, from which the final position is determined. Thus, in this apparatus, training the neural network model on panoramic photos from a real-scene map yields a positioning model with which visual positioning can be performed, solving the problem of difficult collection of visual positioning training samples.
In a specific embodiment of the present application, the positioning output module 103 specifically includes:
a positioning screening unit, configured to cluster the multiple candidate positions and screen the multiple candidate positions according to the clustering result;
a geometric figure construction unit, configured to construct a geometric figure from the screened candidate positions;
a final position determination unit, configured to take the geometric center of the geometric figure as the final position.
In a specific embodiment of the present application, the positioning output module 103 further includes:
a positioning error determination unit, configured to compute, using the final position, the standard deviation of the multiple candidate positions, and to take the standard deviation as the positioning error of the final position.
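A minimal sketch of these units follows, assuming planar (x, y) candidates. The grid-snapping clustering is an illustrative stand-in, since no particular clustering algorithm is prescribed; the cell size is an assumed parameter.

```python
# Cluster candidates, take the geometric center of the dominant cluster
# as the final position, and report the standard deviation of all
# candidates about that center as the positioning error.
import math
from collections import defaultdict

def final_position(candidates, cell=10.0):
    # Clustering: snap each candidate to a coarse grid cell and keep
    # only the most populated cell (screens out outlier candidates).
    cells = defaultdict(list)
    for x, y in candidates:
        cells[(round(x / cell), round(y / cell))].append((x, y))
    kept = max(cells.values(), key=len)

    # Geometric center of the kept candidates is the final position.
    cx = sum(x for x, _ in kept) / len(kept)
    cy = sum(y for _, y in kept) / len(kept)

    # Standard deviation of all candidates about the final position
    # serves as the positioning error.
    err = math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2
                        for x, y in candidates) / len(candidates))
    return (cx, cy), err
```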
In a specific embodiment of the present application, the model training module includes:
a panoramic photo acquisition unit, configured to obtain several panoramic photos from the real-scene map and determine the geographic location of each panoramic photo;
an anti-warping transformation unit, configured to apply an anti-warping transformation to the panoramic photos to obtain several groups of planar projection photos with the same aspect ratio;
a geotagging unit, configured to label each group of planar projection photos with a geotag according to its correspondence with the panoramic photos, the geotag including a geographic location and a specific orientation;
a training sample determination unit, configured to use the geotagged planar projection photos as training samples;
a model training unit, configured to train the neural network model with the training samples and determine the trained neural network model as the positioning model.
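The anti-warping step can be illustrated with the standard mapping from a planar (perspective) projection back into an equirectangular panorama. This is a sketch under assumptions: the focal length `f` and heading `yaw` are the hypothetical projection parameters, since the patent does not fix a specific projection model.

```python
# Map pixel (u, v) of a perspective view (origin at the image center,
# focal length f in pixels, camera heading yaw in radians) to (x, y)
# pixel coordinates in an equirectangular panorama of size pano_w x pano_h.
import math

def plane_to_panorama(u, v, f, yaw, pano_w, pano_h):
    # Direction of the ray through the pixel of a pinhole camera.
    lon = yaw + math.atan2(u, f)               # longitude of the ray
    lat = math.atan2(-v, math.hypot(u, f))     # latitude of the ray
    # Equirectangular panoramas map longitude/latitude linearly to pixels.
    x = (lon / (2 * math.pi) + 0.5) * pano_w
    y = (0.5 - lat / math.pi) * pano_h
    return x, y
```

Sampling the panorama at these coordinates for every pixel of the planar view produces one undistorted projection photo; varying `f` and `yaw` produces the groups of views with different viewing angles described below.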
In a specific embodiment of the present application, the anti-warping transformation unit is specifically configured to segment each panoramic photo according to different focal length parameters during the anti-warping transformation, obtaining several groups of planar projection photos with different viewing angles.
In a specific embodiment of the present application, the anti-warping transformation unit is specifically configured to segment each panoramic photo with a segment count whose coverage of the original image exceeds a specified percentage, obtaining several groups of planar projection photos in which adjacent photos have overlapping viewing angles.
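One way to read the coverage condition: with `n` views of horizontal field of view `fov` spread around the 360-degree panorama, total angular coverage exceeds 100 percent (so adjacent views overlap) as soon as `n * fov > 360`. The helper below is a minimal sketch of choosing such a segment count; the coverage percentage is the assumed "specified percentage" parameter.

```python
# Smallest number of views of the given field of view whose total angular
# coverage reaches coverage_pct percent of the full 360-degree panorama.
import math

def min_segments(fov_deg, coverage_pct):
    return math.ceil(360.0 * coverage_pct / 100.0 / fov_deg)
```

For example, 90-degree views need at least 6 segments to reach 150 percent coverage, guaranteeing overlap between neighbors.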
In a specific embodiment of the present application, the model training module further includes:
a sample supplement unit, configured to supplement the training samples with scene photos obtained from the Internet or with environment photos collected in the positioning environment.
In a specific embodiment of the present application, the atlas acquisition module 101 is specifically configured to randomly segment the wide-angle photo, according to a segment count, such that the coverage of the original image exceeds a specified percentage, obtaining an atlas to be tested that matches the segment count.
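A sketch of such coverage-constrained random segmentation: keep drawing random crops until their union covers at least the required fraction of the wide-angle photo. The crop size and the grid granularity of the coverage estimate are illustrative choices, not values taken from the patent.

```python
# Randomly segment a wide-angle photo into n_crops crops whose union
# covers at least min_coverage of the original image area (coverage is
# estimated on a coarse grid of sample points).
import random

def random_split(width, height, n_crops, crop_w, crop_h, min_coverage=0.8):
    while True:
        crops = [(random.randint(0, width - crop_w),
                  random.randint(0, height - crop_h))
                 for _ in range(n_crops)]
        step = 16  # grid spacing for the coverage estimate
        covered = sum(
            any(x <= px < x + crop_w and y <= py < y + crop_h
                for x, y in crops)
            for px in range(0, width, step)
            for py in range(0, height, step))
        total = len(range(0, width, step)) * len(range(0, height, step))
        if covered / total >= min_coverage:
            return crops  # crop origins; each crop is crop_w x crop_h
```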
Corresponding to the above method embodiments, an embodiment of the present application further provides a visual positioning device; the visual positioning device described below and the visual positioning method described above may be cross-referenced.
As shown in FIG. 4, the visual positioning device includes:
a memory 410, configured to store a computer program;
a processor 420, configured to implement, when executing the computer program, the steps of the visual positioning method provided by the above method embodiments.
Specifically, referring to FIG. 5, which is a schematic diagram of a specific structure of the visual positioning device provided by this embodiment: the visual positioning device may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 420 (for example, one or more processors) and a memory 410 storing one or more computer application programs 413 or data 412. The memory 410 may be transient or persistent storage. The computer application program may comprise one or more modules (not shown in the figure), each of which may include a series of instruction operations on the data processing device. Further, the central processing unit 420 may be configured to communicate with the memory 410 and to execute, on the visual positioning device 400, the series of instruction operations stored in the memory 410.
The visual positioning device 400 may further include one or more power supplies 430, one or more wired or wireless network interfaces 440, one or more input/output interfaces 450, and/or one or more operating systems 411.
The steps of the visual positioning method described above may be implemented by this structure of the visual positioning device.
Corresponding to the above method embodiments, an embodiment of the present application further provides a readable storage medium; the readable storage medium described below and the visual positioning method described above may be cross-referenced.
A readable storage medium stores a computer program which, when executed by a processor, implements the steps of the visual positioning method provided by the above method embodiments.
The readable storage medium may specifically be any readable storage medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their function. Whether these functions are executed in hardware or in software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be regarded as going beyond the scope of the present application.

Claims (11)

  1. A visual positioning method, comprising:
    obtaining a wide-angle photo, and randomly segmenting the wide-angle photo to obtain an atlas to be tested;
    inputting the atlas to be tested into a positioning model for positioning recognition to obtain multiple candidate positions, wherein the positioning model is a neural network model trained with panoramic photos from a real-scene map; and
    determining a final position from the multiple candidate positions.
  2. The visual positioning method according to claim 1, wherein determining the final position from the multiple candidate positions comprises:
    clustering the multiple candidate positions, and screening the multiple candidate positions according to the clustering result;
    constructing a geometric figure from the screened candidate positions; and
    taking the geometric center of the geometric figure as the final position.
  3. The visual positioning method according to claim 2, further comprising:
    computing, using the final position, the standard deviation of the multiple candidate positions; and
    taking the standard deviation as the positioning error of the final position.
  4. The visual positioning method according to claim 1, wherein training the neural network model comprises:
    obtaining several panoramic photos from the real-scene map, and determining the geographic location of each panoramic photo;
    applying an anti-warping transformation to the panoramic photos to obtain several groups of planar projection photos with the same aspect ratio;
    labeling each group of planar projection photos with a geotag according to its correspondence with the panoramic photos, the geotag including a geographic location and a specific orientation;
    using the geotagged planar projection photos as training samples; and
    training the neural network model with the training samples, and determining the trained neural network model as the positioning model.
  5. The visual positioning method according to claim 4, wherein applying the anti-warping transformation to the panoramic photos to obtain several groups of planar projection photos with the same aspect ratio comprises:
    segmenting each panoramic photo according to different focal length parameters during the anti-warping transformation to obtain several groups of planar projection photos with different viewing angles.
  6. The visual positioning method according to claim 5, wherein segmenting each panoramic photo according to different focal length parameters during the anti-warping transformation to obtain several groups of planar projection photos with different viewing angles comprises:
    segmenting each panoramic photo with a segment count whose coverage of the original image exceeds a specified percentage, to obtain several groups of planar projection photos in which adjacent photos have overlapping viewing angles.
  7. The visual positioning method according to claim 4, wherein training the neural network model further comprises:
    supplementing the training samples with scene photos obtained from the Internet or with environment photos collected in the positioning environment.
  8. The visual positioning method according to claim 1, wherein randomly segmenting the wide-angle photo to obtain the atlas to be tested comprises:
    randomly segmenting the wide-angle photo, according to a segment count, such that the coverage of the original image exceeds a specified percentage, to obtain an atlas to be tested matching the segment count.
  9. A visual positioning apparatus, comprising:
    an atlas acquisition module, configured to obtain a wide-angle photo and randomly segment the wide-angle photo to obtain an atlas to be tested;
    a candidate position acquisition module, configured to input the atlas to be tested into a positioning model for positioning recognition to obtain multiple candidate positions, wherein the positioning model is a neural network model trained with panoramic photos from a real-scene map; and
    a positioning output module, configured to determine a final position from the multiple candidate positions.
  10. A visual positioning device, comprising:
    a memory, configured to store a computer program; and
    a processor, configured to implement, when executing the computer program, the visual positioning method according to any one of claims 1 to 8.
  11. A readable storage medium, wherein a computer program is stored on the readable storage medium, and the computer program, when executed by a processor, implements the visual positioning method according to any one of claims 1 to 8.
PCT/CN2020/092284 2020-05-26 2020-05-26 Visual positioning method and apparatus, device and readable storage medium WO2021237443A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2022566049A JP7446643B2 (en) 2020-05-26 2020-05-26 Visual positioning methods, devices, equipment and readable storage media
CN202080001067.0A CN111758118B (en) 2020-05-26 2020-05-26 Visual positioning method, device, equipment and readable storage medium
PCT/CN2020/092284 WO2021237443A1 (en) 2020-05-26 2020-05-26 Visual positioning method and apparatus, device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/092284 WO2021237443A1 (en) 2020-05-26 2020-05-26 Visual positioning method and apparatus, device and readable storage medium

Publications (1)

Publication Number Publication Date
WO2021237443A1 true WO2021237443A1 (en) 2021-12-02

Family

ID=72713357

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092284 WO2021237443A1 (en) 2020-05-26 2020-05-26 Visual positioning method and apparatus, device and readable storage medium

Country Status (3)

Country Link
JP (1) JP7446643B2 (en)
CN (1) CN111758118B (en)
WO (1) WO2021237443A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117289626A (en) * 2023-11-27 2023-12-26 杭州维讯机器人科技有限公司 Virtual simulation method and system for industrialization

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN113724284A (en) * 2021-09-03 2021-11-30 四川智胜慧旅科技有限公司 Position locking device, mountain type scenic spot search and rescue system and search and rescue method

Citations (5)

Publication number Priority date Publication date Assignee Title
CN109308678A (en) * 2017-07-28 2019-02-05 株式会社理光 The method, device and equipment relocated using panoramic picture
CN109829406A (en) * 2019-01-22 2019-05-31 上海城诗信息科技有限公司 A kind of interior space recognition methods
CN110298370A (en) * 2018-03-21 2019-10-01 北京猎户星空科技有限公司 Network model training method, device and object pose determine method, apparatus
CN110298320A (en) * 2019-07-01 2019-10-01 北京百度网讯科技有限公司 A kind of vision positioning method, device and storage medium
CN110636274A (en) * 2019-11-11 2019-12-31 成都极米科技股份有限公司 Ultrashort-focus picture screen alignment method and device, ultrashort-focus projector and storage medium

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
JP3650578B2 (en) * 2000-09-28 2005-05-18 株式会社立山アールアンドディ Panoramic image navigation system using neural network to correct image distortion
JP4264380B2 (en) * 2004-04-28 2009-05-13 三菱重工業株式会社 Self-position identification method and apparatus
CN202818503U (en) * 2012-09-24 2013-03-20 天津市亚安科技股份有限公司 Multidirectional monitoring area early warning positioning automatic tracking and monitoring device
CN104200188B (en) * 2014-08-25 2017-02-15 北京慧眼智行科技有限公司 Method and system for rapidly positioning position detection patterns of QR code
CN108009588A (en) 2017-12-01 2018-05-08 深圳市智能现实科技有限公司 Localization method and device, mobile terminal
JP6676082B2 (en) 2018-01-18 2020-04-08 光禾感知科技股▲ふん▼有限公司 Indoor positioning method and system, and device for creating the indoor map
US11195010B2 (en) * 2018-05-23 2021-12-07 Smoked Sp. Z O. O. Smoke detection system and method
KR102227583B1 (en) * 2018-08-03 2021-03-15 한국과학기술원 Method and apparatus for camera calibration based on deep learning
CN109285178A (en) * 2018-10-25 2019-01-29 北京达佳互联信息技术有限公司 Image partition method, device and storage medium
CN110136136B (en) * 2019-05-27 2022-02-08 北京达佳互联信息技术有限公司 Scene segmentation method and device, computer equipment and storage medium
CN110503037A (en) * 2019-08-22 2019-11-26 三星电子(中国)研发中心 A kind of method and system of the positioning object in region


Cited By (2)

Publication number Priority date Publication date Assignee Title
CN117289626A (en) * 2023-11-27 2023-12-26 杭州维讯机器人科技有限公司 Virtual simulation method and system for industrialization
CN117289626B (en) * 2023-11-27 2024-02-02 杭州维讯机器人科技有限公司 Virtual simulation method and system for industrialization

Also Published As

Publication number Publication date
JP7446643B2 (en) 2024-03-11
CN111758118A (en) 2020-10-09
CN111758118B (en) 2024-04-16
JP2023523364A (en) 2023-06-02

Similar Documents

Publication Publication Date Title
US10922844B2 (en) Image positioning method and system thereof
CN111046125A (en) Visual positioning method, system and computer readable storage medium
US20150363971A1 (en) Systems and Methods for Generating Three-Dimensional Models Using Sensed Position Data
WO2021237443A1 (en) Visual positioning method and apparatus, device and readable storage medium
US9551579B1 (en) Automatic connection of images using visual features
WO2021027692A1 (en) Visual feature library construction method and apparatus, visual positioning method and apparatus, and storage medium
CN111626295B (en) Training method and device for license plate detection model
US20170039450A1 (en) Identifying Entities to be Investigated Using Storefront Recognition
KR20100124748A (en) Platform for the production of seamless orthographic imagery
CN113808267A (en) GIS map-based three-dimensional community display method and system
CN115049878B (en) Target detection optimization method, device, equipment and medium based on artificial intelligence
CN112637519A (en) Panoramic stitching algorithm for multi-path 4K quasi-real-time stitched video
CN112613107A (en) Method and device for determining construction progress of tower project, storage medium and equipment
WO2022247126A1 (en) Visual localization method and apparatus, and device, medium and program
WO2024088071A1 (en) Three-dimensional scene reconstruction method and apparatus, device and storage medium
Abrams et al. Webcams in context: Web interfaces to create live 3D environments
CN113379748A (en) Point cloud panorama segmentation method and device
US9852542B1 (en) Methods and apparatus related to georeferenced pose of 3D models
CN116433822A (en) Neural radiation field training method, device, equipment and medium
CN111107307A (en) Video fusion method, system, terminal and medium based on homography transformation
Wang et al. Identifying people wearing masks in a 3D-scene
Porzi et al. An automatic image-to-DEM alignment approach for annotating mountains pictures on a smartphone
Li et al. Fisheye image rectification using spherical and digital distortion models
CN113920144B (en) Real-scene photo ground vision field analysis method and system
TWI755250B (en) Method for determining plant growth curve, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20937750

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022566049

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20937750

Country of ref document: EP

Kind code of ref document: A1