CN114266830A - Underground large-space high-precision positioning method - Google Patents

Underground large-space high-precision positioning method

Info

Publication number
CN114266830A
Authority
CN
China
Prior art keywords: point, space, dimensional, network model, real
Prior art date
Legal status
Granted
Application number
CN202111617089.2A
Other languages
Chinese (zh)
Other versions
CN114266830B (en)
Inventor
郭明
潘登
程鹏
高楚天
Current Assignee
Beijing University of Civil Engineering and Architecture
Original Assignee
Beijing University of Civil Engineering and Architecture
Priority date
Filing date
Publication date
Application filed by Beijing University of Civil Engineering and Architecture
Priority to CN202111617089.2A
Publication of CN114266830A
Application granted
Publication of CN114266830B
Legal status: Active

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a high-precision positioning method for a large underground space, which comprises the following steps: S1, arranging targets in the underground space; S2, converting space image picture data into a three-dimensional point cloud model of the space and generating a triangulation network model from the point cloud model; S3, outputting the homonymous pixel points of the central pixels of the space real-time image pictures based on a trained deep learning network model; S4, resolving the three-dimensional space coordinates of the central position points based on the triangulation network model; and S5, drawing central lines, whose intersection point is the real-time position point. The invention can perform real-time positioning in environments with no GNSS signal and complex electromagnetic conditions; the positioning method is low in cost and achieves rapid positioning.

Description

Underground large-space high-precision positioning method
Technical Field
The invention relates to the technical field of space positioning, and more particularly to a high-precision positioning method for large underground spaces.
Background
Despite the growing demand for high-precision positioning and navigation applications, most positioning technologies currently used by conventional location services run into problems inside complex buildings or in underground environments and cannot provide reliable real-time positioning services for large underground spaces.
Satellite navigation is suited to open areas with a wide field of view and few obstacles, such as new construction zones, field work, and exploration and positioning; in underground spaces, satellite signal quality is poor or no satellites can be received at all, so real-time, accurate location services cannot be provided. WIFI positioning can cover complex, wide areas and is convenient to network, but its disadvantages are equally clear: the WIFI tags used for positioning are non-standard designs whose data format merely references the 802.11b format without supporting the standard WIFI protocol; the tags consume much power, with a continuous transmit current above 200 mA, so battery life limits the spread of WIFI positioning tags; tag cost is relatively high, which hinders large-scale deployment; and WIFI positioning suffers from serious co-channel interference, with systems disturbing one another. Bluetooth calculates distance from the signal-strength change between the receiver and the signal source and estimates position by triangulation, but in the complex electromagnetic environment of a large underground space an accurate signal model is difficult to establish, positioning latency is high, precision is poor, and the error cannot meet real-time positioning requirements. Radio-frequency (RFID) positioning offers small tags at low manufacturing cost, but the working distance is short, the tags have no communication capability and are inconvenient to integrate into other systems, accurate positioning cannot be achieved, and arranging the card readers and antennas demands substantial engineering experience. Ultra-wideband (UWB) is a carrierless communication technology that transmits data with nanosecond-to-microsecond non-sinusoidal narrow pulses; distance is measured from the reception time difference of the pulse signals and position is computed by triangulation. Its precision is good within line of sight, but multipath effects, non-line-of-sight propagation and other factors in signal transmission cause large errors and low precision; the typical positioning range is only about 20 meters, and the cost is too high for a large underground space. The Zigbee technology is a short-range, low-rate wireless network technology that has emerged in recent years, but the technology is not yet mature: it can only accomplish coarse positioning, cannot achieve real-time positioning, and suffers from low precision and strong susceptibility to environmental interference.
Disclosure of Invention
An object of the present invention is to solve at least the above problems and to provide at least the advantages described later.
The invention also aims to provide a high-precision positioning method for the underground large space, which solves the problem that the prior art cannot perform real-time positioning in the complex electromagnetic environment of the underground large space.
To achieve these objects and other advantages in accordance with the present invention, there is provided a method for high-precision positioning of a large space underground, comprising the steps of:
s1, arranging targets in the underground space;
s2, collecting space image picture data by using a panoramic camera system, converting the space image picture data into a three-dimensional point cloud model of a space, carrying out absolute coordinate conversion on the three-dimensional point cloud model by using control point coordinates based on a control measurement mode, and generating a triangulation network model by using the converted three-dimensional point cloud model; the panoramic camera system comprises eight single lens reflex cameras;
s3, training the deep learning network model by using the spatial image picture data obtained in the step S2 as a sample set, acquiring spatial real-time image picture data in real time by using a panoramic camera system, inputting the spatial real-time image picture data into the trained deep learning network model, performing homonymy pixel matching identification, and outputting homonymy pixel points of central pixels of the spatial real-time image pictures shot by eight single-lens reflex cameras;
s4, resolving to obtain three-dimensional space coordinates of object space central position points corresponding to at least three single lens reflex cameras based on a triangulation network model according to the homonymous pixel points obtained in the step S3;
and S5, drawing at least three central lines according to the central position points of the at least three object spaces obtained in the step S4, wherein the intersection point of the at least three central lines is the real-time position point.
Preferably, in the underground large-space high-precision positioning method, the eight single-lens reflex cameras of the panoramic camera system are all located on the same horizontal plane and are respectively located in the middle of each side of the regular octagon.
Preferably, in the underground large-space high-precision positioning method, step S2 specifically comprises:
acquiring space image picture data by using a panoramic camera system based on a close-range photogrammetry method;
carrying out feature detection and matching on the overlapping space image picture data based on an SFM algorithm, solving the collinearity equations according to the principle that object point, image point and projection center are collinear, and carrying out global optimization by bundle adjustment to obtain three-dimensional sparse point cloud data of the space;
converting the three-dimensional sparse point cloud data into a spatial three-dimensional point cloud model through a densification algorithm;
and measuring control point coordinates based on control measurement, performing absolute coordinate conversion on the three-dimensional point cloud model by using the control point coordinates, and generating a triangulation network model by using the converted three-dimensional point cloud model.
Preferably, in the underground large space high-precision positioning method, the convolution of the backbone network of the deep learning network model in step S3 is a convolution with a self-attention mechanism, and the self-attention mechanism shares a convolution kernel in a channel dimension.
Preferably, in the underground large-space high-precision positioning method, step S4 specifically comprises:
if the homonymous pixel point obtained in step S3 is one of the vertices of the triangulation network model, the three-dimensional space coordinates of that vertex are the three-dimensional space coordinates corresponding to the homonymous pixel point; if the homonymous pixel point obtained in step S3 lies inside a triangle rather than at a vertex of the triangulation network model, its three-dimensional space coordinates are calculated from the coordinates of the three vertices of the triangle in which it lies and the mutual relation of the pixels;
and converting each pixel point with the same name into a three-dimensional space coordinate according to the method to obtain the three-dimensional space coordinate of the object space center position point corresponding to each single lens reflex.
Preferably, in the underground large-space high-precision positioning method, step S5 specifically comprises: drawing, on the horizontal plane, a perpendicular to the corresponding side of the regular octagon with the central position point of any object space as the starting point, this perpendicular being the central line;
in step S5, if at least three center lines intersect at the same point, the coordinate corresponding to the intersection point is the coordinate of the real-time position point; if at least three central lines do not intersect at the same point, calculating a quasi-intersection point by adopting a least square method, wherein a coordinate corresponding to the quasi-intersection point is the coordinate of the real-time position point.
The present invention also provides an electronic device, comprising: one or more processors, one or more memories, and one or more computer programs; wherein a processor is connected to the memory, the one or more computer programs are stored in the memory, and when the electronic device is running, the processor executes the one or more computer programs stored in the memory to make the electronic device execute the method.
The present invention also provides a computer readable storage medium for storing computer instructions which, when executed by a processor, perform the above method.
The invention at least comprises the following beneficial effects:
1. The invention has the advantages of low real-time positioning cost, a short construction period, and convenient, fast system maintenance. The positioning method can operate normally in an environment with no GNSS signal and complex electromagnetic conditions, provides accurate, low-latency positioning services, and solves the problem of high-precision real-time positioning in the complex electromagnetic environment of a large underground space.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
FIG. 1 is a schematic diagram of a real-time location point of the present invention;
FIG. 2 is a schematic view of the panoramic camera system of the present invention;
FIG. 3 is a schematic diagram of the center line drawn when the corresponding location point is at a vertex of the triangulation network model according to the present invention;
FIG. 4 is a schematic diagram of a centerline drawn when the corresponding location point is not at the vertex of the triangulation network model according to the present invention.
Detailed Description
The present invention is further described in detail below with reference to the drawings and examples so that those skilled in the art can practice the invention with reference to the description.
It will be understood that terms such as "having," "including," and "comprising," as used herein, do not preclude the presence or addition of one or more other elements or groups thereof.
It is to be noted that the experimental methods described in the following embodiments are all conventional methods unless otherwise specified, and the reagents and materials are commercially available unless otherwise specified.
In the description of the present invention, the terms "lateral", "longitudinal", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are only for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention.
The invention provides a high-precision positioning method for an underground large space, which comprises the following steps:
s1, arranging targets in the underground space;
s2, collecting space image picture data by using a panoramic camera system, converting the space image picture data into a three-dimensional point cloud model of a space, carrying out absolute coordinate conversion on the three-dimensional point cloud model by using control point coordinates based on a control measurement mode, and generating a triangulation network model by using the converted three-dimensional point cloud model; the panoramic camera system comprises eight single-lens reflex cameras, and calibration processing is carried out on any single-lens reflex camera before shooting;
s3, training the deep learning network model by using the spatial image picture data obtained in the step S2 as a sample set, acquiring spatial real-time image picture data in real time by using a panoramic camera system, inputting the spatial real-time image picture data into the trained deep learning network model, performing homonymy pixel matching identification, and outputting homonymy pixel points of central pixels of the spatial real-time image pictures shot by eight single-lens reflex cameras; one single lens reflex corresponds to one pixel point with the same name;
s4, resolving to obtain three-dimensional space coordinates of object space central position points corresponding to at least three single lens reflex cameras based on a triangulation network model according to the homonymous pixel points obtained in the step S3; the three-dimensional space coordinates of the object space central position points corresponding to three or more than three single-lens reflex cameras can be calculated, and preferably, the three-dimensional space coordinates of the object space central position points of eight single-lens reflex cameras are calculated;
and S5, drawing at least three central lines according to the at least three object space central position points obtained in step S4, the intersection point of the at least three central lines being the real-time position point. Preferably, the three-dimensional space coordinates of all eight object space central position points are resolved, eight central lines are drawn, and their intersection point is taken as the real-time position point.
The positioning method can normally operate in a GNSS signal-free and complex electromagnetic environment, provides accurate and low-delay positioning service, and can solve the problem of high-precision real-time positioning in the complex electromagnetic environment of large underground space. The invention has the advantages of low real-time position positioning cost, short construction period and convenient and quick system maintenance.
As shown in fig. 2, in another technical solution of the underground large space high-precision positioning method, the eight single-lens reflex cameras of the panoramic camera system are all located on the same horizontal plane and are respectively located in the middle of each side of a regular octagon.
According to the invention, a panoramic camera system comprising eight single-lens reflex cameras is adopted to collect the space image picture data. Arranging the eight cameras in a regular octagon guarantees shooting without dead angles during data collection and thus the completeness of the image data; shooting with all eight cameras simultaneously greatly improves shooting efficiency and reduces the latency of data collection. Each single-lens reflex camera is located at the middle of one side of the regular octagonal structure, so the normals of the eight cameras intersect at the same point, which facilitates the subsequent real-time position fixing, as illustrated by the sketch below.
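As a minimal illustration of this geometry (an exposition aid, not part of the patent; the apothem value is an arbitrary assumption), the following Python sketch places eight cameras at the midpoints of a regular octagon's sides and checks that their inward normals all pass through the rig center:

```python
import numpy as np

# The apothem (center-to-side distance) of a regular octagon meets each side
# at its midpoint, so the side midpoints lie at eight equally spaced angles.
apothem = 0.5  # metres; assumed rig half-width, illustrative only

angles = np.arange(8) * (2 * np.pi / 8)
positions = apothem * np.stack([np.cos(angles), np.sin(angles)], axis=1)
normals = -positions / np.linalg.norm(positions, axis=1, keepdims=True)  # inward

for p, n in zip(positions, normals):
    t = -p @ n                 # ray parameter of the point closest to the origin
    closest = p + t * n
    assert np.allclose(closest, 0.0, atol=1e-12)
print("all eight optical axes pass through the rig centre")
```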
In another technical scheme of the underground large-space high-precision positioning method, step S2 specifically comprises:
acquiring space image picture data by using a panoramic camera system based on a close-range photogrammetry method;
carrying out feature detection and matching on the overlapping space image picture data based on an SFM algorithm, solving the collinearity equations according to the principle that object point, image point and projection center are collinear, and carrying out global optimization by bundle adjustment to obtain three-dimensional sparse point cloud data of the space;
converting the three-dimensional sparse point cloud data into a spatial three-dimensional point cloud model through a densification algorithm;
and measuring control point coordinates based on control measurement, performing absolute coordinate conversion on the three-dimensional point cloud model by using the control point coordinates, and generating a triangulation network model from the converted three-dimensional point cloud model. A target or other reference object in the image picture is selected as a control point.
The basic principle of close-range photogrammetry is as follows: based on three-dimensional digital images, a computer performs image processing and image matching to automatically identify corresponding image points and their coordinates; the three-dimensional coordinates of the target object are then determined by analytic photogrammetry, and a digital elevation model, orthoimage, vector line drawing and the like are output. Digital close-range photogrammetry has advantages in timeliness, safety, information content and workload that other measurement methods cannot match.
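A hedged two-view sketch of the kind of SFM pipeline described in step S2, using standard OpenCV calls (the file names and the intrinsic matrix K are illustrative placeholders; the patent's actual pipeline covers all eight cameras and is globally refined by bundle adjustment):

```python
import cv2
import numpy as np

# Assumed intrinsics; a real system would use the calibration from step S2.
K = np.array([[2400.0, 0, 960], [0, 2400.0, 540], [0, 0, 1]])

img1 = cv2.imread("img1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("img2.jpg", cv2.IMREAD_GRAYSCALE)

# Feature detection and matching on the overlapping pair.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # Lowe ratio
pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# Essential matrix and relative pose (RANSAC rejects outliers).
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# Triangulate the inlier correspondences into a sparse cloud (up to scale).
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
inl = mask.ravel() > 0
X_h = cv2.triangulatePoints(P1, P2, pts1[inl].T, pts2[inl].T)
sparse_cloud = (X_h[:3] / X_h[3]).T  # N x 3 sparse points
```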
When the object point, the image point and the projection center lie on the same straight line, the central projection relation between the object point and the image point is the collinearity equation. Each ray yields two collinearity equations:

$$x - x_0 = -f\,\frac{a_1(X - X_d) + b_1(Y - Y_d) + c_1(Z - Z_d)}{a_3(X - X_d) + b_3(Y - Y_d) + c_3(Z - Z_d)}, \qquad y - y_0 = -f\,\frac{a_2(X - X_d) + b_2(Y - Y_d) + c_2(Z - Z_d)}{a_3(X - X_d) + b_3(Y - Y_d) + c_3(Z - Z_d)}$$

In the equations, x and y are the plane coordinates of the image point; x_0, y_0 (together with the principal distance f) are the interior orientation elements of the image; X_d, Y_d, Z_d are the object space coordinates of the camera; X, Y, Z are the object space coordinates of the object point; and a_i, b_i, c_i are the elements of the rotation matrix. The error equation is:

$$V = B\Delta - l$$

where V is the coordinate correction matrix, Δ is the correction matrix, B is the coefficient matrix, and l is the difference matrix. The normal equation is:

$$B^{\mathrm{T}}PB\,\Delta = B^{\mathrm{T}}Pl$$

where P is the observation weight matrix. Space resection uses the collinearity equations to solve for the six exterior orientation elements of a photo, given its interior orientation elements and at least 3 ground point coordinates with their corresponding image point coordinates. Space forward intersection computes the three-dimensional coordinates of ground points from 2 or more photos; this method requires the image point coordinates and the interior and exterior orientation elements of the photos.
In another technical solution, in the underground large space high-precision positioning method, the convolution of the backbone network of the deep learning network model in step S3 is a convolution with a self-attention mechanism, and the self-attention mechanism shares a convolution kernel in a channel dimension.
With the advance of target detection technology, mainstream target detection algorithms are now mainly based on deep learning models. The system uses a two-dimensional deep learning algorithm as the main means of picture matching. It divides the whole image into an S × S grid; each grid cell predicts only one target and a fixed number of bounding boxes, and is responsible for detecting targets that fall into that cell. The network predicts, in a single pass, the bounding boxes, localization confidences and the full class-probability vector for every grid cell, and finally selects the most suitable bounding box as the final localization.
During prediction, a bounding box is determined by several parameter values: the objectness (whether an object is present in the grid cell), the coordinate information of the bounding box, and the class label. Five values are determined for each box: the four coordinates x, y, w, h plus a confidence score. The confidence score of a box represents both the accuracy of the bounding-box selection and the probability that the box contains a target; if a target is present, its class is then determined. If the center of a target in the ground truth falls into a grid cell, that cell is responsible for predicting the target. The two boxes are followed by the class probabilities; different datasets have different numbers of classes, and the PASCAL VOC dataset has 20 classes, giving a total of 30 parameters per cell.
For each grid cell, B bounding boxes are predicted, each with its own confidence score, but only one target is detected regardless of the number of bounding boxes B. C conditional class probabilities are predicted, one value per possible target class. In summary, the image is divided into an S × S grid, and each grid cell predicts confidence scores for B bounding boxes and C class probabilities, encoded as a tensor of size S × S × (B × 5 + C). For the PASCAL VOC dataset, S = 7 and B = 2, and the dataset has 20 classes, so C = 20 and the final prediction is a 7 × 7 × 30 tensor. Each grid cell can also be configured with three boxes of different scales to predict objects.
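A minimal sketch of this grid encoding, under the PASCAL VOC configuration stated above (the layout within each cell's vector is an assumption for illustration):

```python
import numpy as np

S, B, C = 7, 2, 20
pred = np.zeros((S, S, B * 5 + C))        # the 7 x 7 x 30 prediction tensor

def decode_cell(cell: np.ndarray, B: int = 2):
    """Split one grid cell's vector into its boxes and class probabilities."""
    boxes = [cell[i * 5:(i + 1) * 5] for i in range(B)]   # (x, y, w, h, conf)
    class_probs = cell[B * 5:]                            # C conditional probs
    return boxes, class_probs

boxes, probs = decode_cell(pred[3, 3])
print(len(boxes), probs.shape)            # 2 boxes, (20,) class probabilities
```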
The invention improves the input stage when training the deep learning network, specifically by adopting Mosaic data enhancement: four pictures are randomly drawn from the input sample set (the sample set is the space image picture data acquired by the panoramic camera system), randomly scaled, and then randomly arranged and spliced together. Mosaic data enhancement greatly enriches the detection training set; in particular, the random scaling adds many small targets, making the network more robust.
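A hedged sketch of Mosaic-style splicing as just described (the function name, output size and the nearest-neighbour resize are illustrative assumptions; real training would also rescale and clip the box labels):

```python
import numpy as np

def mosaic(images, out_size=608, rng=np.random.default_rng()):
    """Stitch four randomly scaled images around a random split point."""
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    cx = rng.integers(out_size // 4, 3 * out_size // 4)   # random split point
    cy = rng.integers(out_size // 4, 3 * out_size // 4)
    corners = [(0, 0, cy, cx), (0, cx, cy, out_size),
               (cy, 0, out_size, cx), (cy, cx, out_size, out_size)]
    for img, (y0, x0, y1, x1) in zip(images, corners):
        h, w = y1 - y0, x1 - x0
        # crude nearest-neighbour resize so each patch fills its quadrant
        ys = np.arange(h) * img.shape[0] // h
        xs = np.arange(w) * img.shape[1] // w
        canvas[y0:y1, x0:x1] = img[ys][:, xs]
    return canvas

imgs = [np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8) for _ in range(4)]
print(mosaic(imgs).shape)   # (608, 608, 3)
```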
The network structure of the deep learning network of the invention comprises three parts: a backbone network, a neck network, and a head network (a two-layer conv structure). The backbone network is CSDarknet-53, in which the convolutional layers are replaced by convolutional layers with a self-attention mechanism that shares convolution kernels in the channel dimension. A feature weight matrix is learned through the self-attention mechanism; the larger the weight value, the more important the corresponding convolution kernel, and the n kernels with the largest weights are taken as the shared convolution kernels.
Present-day models need large computing resources to generate accurate detection results, but on mobile or edge devices computational efficiency and precision are what matter most. To raise recognition precision and speed, the self-attention factor is added in the method and device, effectively improving the recognition precision and speed of the deep learning network.
The self-attention factors share convolution kernels (kernel) in the channel dimension, while more flexible modeling is performed in the spatial dimension with a space-specific kernel. The kernel in the factor has size

$$\mathcal{H} \in \mathbb{R}^{H \times W \times K \times K \times O}$$

where O ≪ C, indicating that all C channels share O kernels. The operation is expressed as:

$$Y_{i,j,k} = \sum_{(u,v)\in\Delta_K} \mathcal{H}_{i,j,\,u+\lfloor K/2\rfloor,\,v+\lfloor K/2\rfloor,\,\lceil kO/C\rceil}\; X_{i+u,\,j+v,\,k}$$

where $\mathcal{H}_{i,j}$ is the convolution kernel generated at position (i, j).
In the self-attention factor, a fixed weight matrix is not adopted as the learnable parameter; instead the corresponding kernel is generated from the input feature map, which guarantees that the kernel size and the input feature size align automatically in the spatial dimension. The general form of kernel generation is:

$$\mathcal{H}_{i,j} = \phi\!\left(X_{\Psi_{i,j}}\right)$$

where $\Psi_{i,j}$ is an index set of the neighborhood of coordinates (i, j), so $X_{\Psi_{i,j}}$ denotes a certain patch of the feature map containing $\Psi_{i,j}$.
Starting from a simple and effective design concept, a bottleneck structure similar to SENet is provided: $\Psi_{i,j}$ is taken as the single-point set {(i, j)}, i.e. $X_{\Psi_{i,j}}$ is the single pixel with coordinates (i, j) on the feature map, which yields the kernel-generation instantiation:

$$\mathcal{H}_{i,j} = \phi\!\left(X_{i,j}\right) = W_1\,\sigma\!\left(W_0\,X_{i,j}\right)$$

where $W_0 \in \mathbb{R}^{(C/r)\times C}$ and $W_1 \in \mathbb{R}^{(K\times K\times O)\times (C/r)}$ are the linear transformation matrices, r is the channel reduction ratio, and $\sigma$ is the intermediate BN and ReLU.
According to the invention, a self-attention mechanism is introduced into the backbone network; the self-attention mechanism shares a convolution kernel in the channel dimension, and using the same convolution kernel for different output channels guarantees channel invariance, which in turn reduces the parameter count and computation. The convolution kernel of the involution-style operator is produced first: a 1×1 convolution compresses the channels to C // (compression ratio), and the channels are then expanded to (number of groups G) × K × K, whereas the parameter count of an ordinary convolution is about K × K × C × C, so the parameter count is reduced. Because the kernels of the channels within a group are shared, using large kernels in the spatial dimension does not increase the number of parameters; this avoids the large growth in computation that a different kernel per channel would require, and greatly reduces computational complexity while still guaranteeing the performance gain.
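A minimal PyTorch sketch of such a kernel-shared operator, following the involution-style formulation reconstructed above (the class name, group count and reduction ratio are assumptions, not the patent's code):

```python
import torch
import torch.nn as nn

class SharedKernelConv(nn.Module):
    """Channel-shared, spatially specific kernel: G groups share kernels across
    channels; the K x K kernel at each pixel is generated from that pixel by a
    SENet-like bottleneck (W1 after BN+ReLU after W0)."""
    def __init__(self, channels: int, K: int = 7, groups: int = 4, r: int = 4):
        super().__init__()
        self.K, self.G = K, groups
        self.reduce = nn.Sequential(                      # W0 + BN + ReLU
            nn.Conv2d(channels, channels // r, 1),
            nn.BatchNorm2d(channels // r),
            nn.ReLU(inplace=True),
        )
        self.expand = nn.Conv2d(channels // r, groups * K * K, 1)  # W1
        self.unfold = nn.Unfold(K, padding=K // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, C, H, W = x.shape
        # kernel generation: one K*K kernel per pixel per group
        kernel = self.expand(self.reduce(x))              # B, G*K*K, H, W
        kernel = kernel.view(B, self.G, self.K * self.K, H, W).unsqueeze(2)
        # gather K x K neighbourhoods and apply the shared kernels
        patches = self.unfold(x).view(B, self.G, C // self.G,
                                      self.K * self.K, H, W)
        out = (kernel * patches).sum(dim=3)               # weighted sum over K*K
        return out.view(B, C, H, W)

y = SharedKernelConv(64)(torch.randn(2, 64, 32, 32))
print(y.shape)  # torch.Size([2, 64, 32, 32])
```

All channels within a group reuse the same generated kernel, which is the channel-dimension sharing described above.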
The neck is SPP plus PANet. A 608 × 608 × 3 image is input from the input end and enters the backbone network; CSDarknet-53 contains five stages, each of which downsamples the image once, so after 5 downsamplings a 19 × 19 feature map is finally obtained. Altogether feature maps of 5 sizes are produced: 304 × 304, 152 × 152, 76 × 76, 38 × 38 and 19 × 19; the feature maps of the last three sizes are selected and input to the neck. In the neck, upsampling starts from the smallest feature map: after the first upsampling it is spliced with the second-smallest feature map, upsampled again, and spliced with the largest feature map; the final spliced feature map is then downsampled and spliced with the feature map from the first splice, after which downsampling continues and the result is spliced with the smallest feature map. Three paths of information are finally obtained, each fusing feature maps of different sizes, and are sent to heads of different sizes for processing; the target detection results are obtained in the heads. The Mish function is used as the activation function (the TOP-1 and TOP-5 accuracies with the Mish activation function are slightly higher than without it); the Mish activation function expression is:
$$\mathrm{Mish}(x) = x\,\tanh\!\big(\ln(1 + e^{x})\big)$$
the method is characterized in that SPPs are added to the neck (the receiving range of trunk features is more effectively increased by adopting an SPP module than a mode of simply using k x k maximum pooling, and the most important context features are obviously separated), the SPPs are firstly used in fast-RCNN, a feature map is input from an input end, then multi-scale maximum pooling is carried out, four paths of feature maps are obtained after pooling is carried out, then padding is used, the size of the padding is 1 smaller than that of the feature map, and the size of the original feature map can be reserved. After four paths of feature maps are spliced, the sizes of the original feature maps can be restored through convolution. The introduction of SPP into CSDarknet-53 increases the receptive field, extracts the most important features, and has no loss of network processing speed.
The neck also uses PANet. The PANet principle is similar to the FPN principle, but information from the lower layers of an FPN can be lost after passing through many network layers; PANet has both a top-down path and a bottom-up path (the FPN layer conveys strong semantic features from top to bottom, while the feature pyramid conveys strong localization features from bottom to top, aggregating parameters from different backbone layers for the different detection layers). After feature maps of the same size are spliced, the features carry both low-level detail and high-level semantics.
In another technical scheme of the underground large-space high-precision positioning method, step S4 specifically comprises:
if the homonymous pixel point obtained in step S3 is one of the vertices of the triangulation network model (as shown in fig. 3), the three-dimensional space coordinates of that vertex are the three-dimensional space coordinates corresponding to the homonymous pixel point; if the homonymous pixel point obtained in step S3 lies inside a triangle rather than at a vertex of the triangulation network model (as shown in fig. 4), its three-dimensional space coordinates are calculated from the coordinates of the three vertices of the triangle in which it lies and the mutual relation of the pixels;
and converting each homonymous pixel point into three-dimensional space coordinates according to this method to obtain the three-dimensional space coordinates of the object space center position point corresponding to each single-lens reflex camera (the three-dimensional space coordinates corresponding to a homonymous pixel point are the three-dimensional space coordinates of the object space center position point in the picture shot by the single-lens reflex camera corresponding to that homonymous pixel point).
In this technical scheme, the two-dimensional homonymous pixel points are converted into three-dimensional space coordinates based on the triangulation network model. The triangulation network model is built from many pixel points, and each homonymous pixel point corresponds to one point in the triangulation network model; the corresponding position point of a homonymous pixel point is found in the triangulation network model according to this correspondence. If the corresponding position point is a triangle vertex of the triangulation network model, the vertex already carries three-dimensional space coordinates, which can directly be taken as the three-dimensional space coordinates of the object space center position point. If the corresponding position point is not at a triangle vertex of the triangulation network model, its three-dimensional space coordinates are calculated from the three-dimensional space coordinates of the three vertices of the triangle in which it lies and the mutual relation of the pixels, giving the three-dimensional space coordinates of the object space center position point, e.g. by the interpolation sketched below.
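The patent does not spell out the interpolation formula; barycentric weighting is one standard choice for interpolating inside a TIN triangle, sketched here under that assumption:

```python
import numpy as np

def interpolate_in_triangle(p, tri_2d, tri_3d):
    """p: (2,) pixel; tri_2d: 3x2 pixel coords; tri_3d: 3x3 vertex XYZ."""
    a, b, c = tri_2d
    # solve  p = a + u*(b-a) + v*(c-a)  for the barycentric weights
    M = np.column_stack([b - a, c - a])
    u, v = np.linalg.solve(M, p - a)
    w = np.array([1 - u - v, u, v])
    return w @ tri_3d               # weighted blend of the 3D vertices

tri_2d = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
tri_3d = np.array([[1.0, 2.0, 3.0], [4.0, 2.0, 3.5], [1.0, 6.0, 2.5]])
print(interpolate_in_triangle(np.array([2.0, 2.0]), tri_2d, tri_3d))
```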
In another technical scheme of the underground large-space high-precision positioning method, step S5 specifically comprises: drawing, on the horizontal plane, a perpendicular to the corresponding side of the regular octagon with the central position point of any object space as the starting point, this perpendicular being the central line;
in step S5, if at least three center lines intersect at the same point, the coordinate corresponding to the intersection point is the coordinate of the real-time position point; if at least three central lines do not intersect at the same point, calculating a quasi-intersection point by adopting a least square method, wherein a coordinate corresponding to the quasi-intersection point is the coordinate of the real-time position point.
As shown in fig. 3, when the corresponding position point of a homonymous pixel point in the triangulation network model lies at a vertex, the triangle vertex is taken as the starting point and the perpendicular to the side on which the corresponding single-lens reflex camera sits is drawn; this perpendicular is the center line for that camera. As shown in fig. 4, when the corresponding position point does not lie at a vertex, its three-dimensional space coordinates are first calculated and the perpendicular is drawn from that point in the same way. The center lines for all object space center position points are drawn by the same method; as shown in fig. 1, the eight center lines intersect at the point T, and the point T is the real-time position point.
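When the lines do not all pass exactly through one point, step S5 computes a least-squares quasi-intersection; a small numpy sketch of that computation (the points and directions are dummy values, illustrative only):

```python
import numpy as np

def quasi_intersection(points, directions):
    """Point minimizing the sum of squared distances to a set of lines; each
    line is given by a start point p_i and a direction d_i."""
    dim = points.shape[1]
    A = np.zeros((dim, dim))
    b = np.zeros(dim)
    for p, d in zip(points, directions):
        d = d / np.linalg.norm(d)
        # projector onto the complement of d: dist(x, line) = |(I - d d^T)(x - p)|
        Pm = np.eye(dim) - np.outer(d, d)
        A += Pm
        b += Pm @ p
    return np.linalg.solve(A, b)

# three lines through (1, 1): the recovered real-time position point
pts = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]])
dirs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(quasi_intersection(pts, dirs))   # ~ [1. 1.]
```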
The invention provides an electronic device, comprising: one or more processors, one or more memories, and one or more computer programs; wherein a processor is connected to the memory, the one or more computer programs are stored in the memory, and when the electronic device is running, the processor executes the one or more computer programs stored in the memory to make the electronic device execute the method.
The present invention also provides a computer readable storage medium for storing computer instructions which, when executed by a processor, perform the above-described method.
The number of apparatuses and the scale of the process described herein are intended to simplify the description of the present invention. Applications, modifications and variations of the present invention will be apparent to those skilled in the art.
While embodiments of the invention have been described above, the invention is not limited to the applications set forth in the description and the embodiments; it is fully applicable in various fields to which the invention pertains, and further modifications may readily be made by those skilled in the art. The invention is therefore not limited to the details shown and described herein, provided that the general concept defined by the appended claims and their equivalents is not departed from.

Claims (8)

1. The underground large-space high-precision positioning method is characterized by comprising the following steps of:
s1, arranging targets in the underground space;
s2, collecting space image picture data by using a panoramic camera system, converting the space image picture data into a three-dimensional point cloud model of a space, carrying out absolute coordinate conversion on the three-dimensional point cloud model by using control point coordinates based on a control measurement mode, and generating a triangulation network model by using the converted three-dimensional point cloud model; the panoramic camera system comprises eight single lens reflex cameras;
s3, training the deep learning network model by using the spatial image picture data obtained in the step S2 as a sample set, acquiring spatial real-time image picture data in real time by using a panoramic camera system, inputting the spatial real-time image picture data into the trained deep learning network model, performing homonymy pixel matching identification, and outputting homonymy pixel points of central pixels of the spatial real-time image pictures shot by eight single-lens reflex cameras;
s4, resolving to obtain three-dimensional space coordinates of object space central position points corresponding to at least three single lens reflex cameras based on a triangulation network model according to the homonymous pixel points obtained in the step S3;
and S5, drawing at least three central lines according to the central position points of the at least three object spaces obtained in the step S4, wherein the intersection point of the at least three central lines is the real-time position point.
2. The underground large space high-precision positioning method according to claim 1, wherein the eight single-lens reflex cameras of the panoramic camera system are all located on the same horizontal plane and are respectively located in the middle of each side of the regular octagon.
3. The underground large space high-precision positioning method according to claim 1, wherein the step S2 is specifically:
acquiring space image picture data by using a panoramic camera system based on a close-range photogrammetry method;
carrying out feature detection and matching on the overlapping space image picture data based on an SFM algorithm, solving the collinearity equations according to the principle that object point, image point and projection center are collinear, and carrying out global optimization by bundle adjustment to obtain three-dimensional sparse point cloud data of the space;
converting the three-dimensional sparse point cloud data into a spatial three-dimensional point cloud model through a densification algorithm;
and measuring control point coordinates based on control measurement, performing absolute coordinate conversion on the three-dimensional point cloud model by using the control point coordinates, and generating a triangulation network model by using the converted three-dimensional point cloud model.
4. The underground large space high precision positioning method of claim 1, wherein the convolution of the backbone network of the deep learning network model in step S3 is a convolution with a self-attention mechanism, and the self-attention mechanism shares a convolution kernel in the channel dimension.
5. The underground large space high-precision positioning method according to claim 1, wherein the step S4 is specifically:
if the homonymous pixel point obtained in the step S3 is one of the vertices of the triangulation network model, the three-dimensional space coordinate of the vertex is the three-dimensional space coordinate corresponding to the homonymous pixel point, and if the homonymous pixel point obtained in the step S3 is in a triangle other than the vertex of the triangulation network model, the three-dimensional space coordinate of the homonymous pixel point is calculated according to the three vertex coordinates and the pixel interrelation of the triangle corresponding to the homonymous pixel point;
and converting each pixel point with the same name into a three-dimensional space coordinate according to the method to obtain the three-dimensional space coordinate of the object space center position point corresponding to each single lens reflex.
6. The underground large space high-precision positioning method according to claim 2, wherein the step S5 is specifically as follows: drawing a perpendicular line of a side corresponding to the regular octagon on a horizontal plane by taking a central position point of any object space as a starting point, wherein the perpendicular line is the central line;
in step S5, if at least three center lines intersect at the same point, the coordinate corresponding to the intersection point is the coordinate of the real-time position point; if at least three central lines do not intersect at the same point, calculating a quasi-intersection point by adopting a least square method, wherein a coordinate corresponding to the quasi-intersection point is the coordinate of the real-time position point.
7. An electronic device, comprising: one or more processors, one or more memories, and one or more computer programs; wherein a processor is connected to the memory, the one or more computer programs being stored in the memory, and the processor executes the one or more computer programs stored in the memory when the electronic device is running, so as to cause the electronic device to perform the method according to any of claims 1 to 6.
8. A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of any one of claims 1 to 6.

Priority Applications (1)

Application Number: CN202111617089.2A (granted as CN114266830B); Priority Date: 2021-12-28; Filing Date: 2021-12-28; Title: Underground large space high-precision positioning method

Publications (2)

Publication Number: CN114266830A (en), Publication Date: 2022-04-01
Publication Number: CN114266830B (en), Publication Date: 2022-07-15

Family

ID=80830673

Family Applications (1)

Application Number: CN202111617089.2A; Status: Active (CN114266830B); Priority Date: 2021-12-28; Filing Date: 2021-12-28; Title: Underground large space high-precision positioning method

Country Status (1)

Country Link
CN (1) CN114266830B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102645209A (en) * 2012-04-24 2012-08-22 长江勘测规划设计研究有限责任公司 Joint positioning method for spatial points by means of onboard LiDAR point cloud and high resolution images
CN107564111A (en) * 2017-05-31 2018-01-09 武汉圆桌智慧科技有限公司 Power line space safety analysis method based on computer vision
CN107767440A (en) * 2017-09-06 2018-03-06 北京建筑大学 Historical relic sequential images subtle three-dimensional method for reconstructing based on triangulation network interpolation and constraint
CN108596961A (en) * 2018-04-17 2018-09-28 浙江工业大学 Point cloud registration method based on Three dimensional convolution neural network
CN112348885A (en) * 2019-08-09 2021-02-09 华为技术有限公司 Visual feature library construction method, visual positioning method, device and storage medium
CN110619663A (en) * 2019-08-28 2019-12-27 山东科技大学 Video image target positioning method based on three-dimensional laser point cloud
CN113689326A (en) * 2021-08-06 2021-11-23 西南科技大学 Three-dimensional positioning method based on two-dimensional image segmentation guidance
CN113776451A (en) * 2021-11-11 2021-12-10 长江空间信息技术工程有限公司(武汉) Deformation monitoring automation method based on unmanned aerial vehicle photogrammetry

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
CARMELO MINEO ET AL.: "Novel algorithms for 3D surface point cloud boundary detection and edge reconstruction", 《JOURNAL OF COMPUTATIONAL DESIGN AND ENGINEERING》, vol. 6, no. 1, 31 January 2019 (2019-01-31) *
UZAIR NADEEM ET AL.: "Direct Image to Point Cloud Descriptors Matching for 6-DOF Camera Localization in Dense 3D Point Cloud", 《ARXIV》, 14 January 2019 (2019-01-14) *
周培龙 等: "三角网至规则格网的快速生成方法及其应用", 《测绘与空间地理信息》, vol. 37, no. 10, 31 October 2014 (2014-10-31) *
郑顺义 等: "三角网约束下的层次匹配方法", 《计算机辅助设计与图形学报》, vol. 26, no. 11, 30 November 2014 (2014-11-30) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114581860A (en) * 2022-05-09 2022-06-03 Wuhan Textile University (武汉纺织大学): Helmet detection algorithm based on improved YOLOv5 model

Also Published As

Publication number Publication date
CN114266830B (en) 2022-07-15

Similar Documents

Publication Publication Date Title
CN110070615B (en) Multi-camera cooperation-based panoramic vision SLAM method
CN111174799B (en) Map construction method and device, computer readable medium and terminal equipment
CN112132972B (en) Three-dimensional reconstruction method and system for fusing laser and image data
CN113819890B (en) Distance measuring method, distance measuring device, electronic equipment and storage medium
JP2019527832A (en) System and method for accurate localization and mapping
US20230024326A1 (en) Using maps comprising covariances in multi-resolution voxels
CN113989450A (en) Image processing method, image processing apparatus, electronic device, and medium
US11288861B2 (en) Maps comprising covariances in multi-resolution voxels
WO2024001969A1 (en) Image processing method and apparatus, and storage medium and computer program product
CN111860072A (en) Parking control method and device, computer equipment and computer readable storage medium
KR20230008000A (en) Positioning method and apparatus based on lane line and feature point, electronic device, storage medium, computer program and autonomous vehicle
CN114266830B (en) Underground large space high-precision positioning method
Zhang et al. Online ground multitarget geolocation based on 3-D map construction using a UAV platform
WO2022246812A1 (en) Positioning method and apparatus, electronic device, and storage medium
CN114358133A (en) Method for detecting looped frames based on semantic-assisted binocular vision SLAM
US20230401837A1 (en) Method for training neural network model and method for generating image
KR102249381B1 (en) System for generating spatial information of mobile device using 3D image information and method therefor
CN115880555B (en) Target detection method, model training method, device, equipment and medium
CN115727854A (en) VSLAM positioning method based on BIM structure information
CN109829939A (en) A method of it reducing multi-view images and matches corresponding image points search range
KR20200032776A (en) System for information fusion among multiple sensor platforms
CN114429515A (en) Point cloud map construction method, device and equipment
Venable Improving Real World Performance for Vision Navigation in a Flight Environment
WO2021127692A1 (en) Maps comprising covariances in multi-resolution voxels
Venable Improving Real-World Performance of Vision Aided Navigation in a Flight Environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant