Disclosure of Invention
In order to overcome the problems of high cost and poor positioning performance of conventional indoor positioning methods, or at least partially solve these problems, embodiments of the present invention provide an image-based indoor positioning method and apparatus.
According to a first aspect of the embodiments of the present invention, there is provided an image-based indoor positioning method, including:
acquiring an indoor image of a building shot by a user, and performing feature extraction on the indoor image and on each BIM image of the building acquired in advance from a BIM image library based on a convolution calculation model, to acquire a feature matrix of the indoor image and a feature matrix of each BIM image;
segmenting the indoor image, acquiring a weight matrix of the indoor image according to the segmentation result, and weighting the feature matrix of the indoor image and the feature matrix of the BIM image respectively by using the weight matrix of the indoor image;
matching the indoor image with each BIM image according to the weighting results of the feature matrices of the indoor image and each BIM image, and taking the position coordinate corresponding to the BIM image that best matches the indoor image as the position coordinate of the user; wherein each BIM image and its position coordinate are stored in association in advance.
Specifically, the step of extracting the features of the indoor image and the BIM image based on a convolution calculation model to obtain the feature matrix of the indoor image and the feature matrix of the BIM image includes:
scanning the indoor image and the BIM image by using a sliding window to obtain sub-images of the indoor image and the BIM image;
performing convolution calculation on the sub-images of the indoor image, the sub-images of the BIM image, the indoor image, and the BIM image based on a convolution calculation model;
and taking the convolution calculation results of the sub-images of the indoor image and of the indoor image itself as the feature matrix of the indoor image, and taking the convolution calculation results of the sub-images of the BIM image and of the BIM image itself as the feature matrix of the BIM image.
Specifically, the step of scanning the indoor image and the BIM image by using a sliding window to acquire sub-images of the indoor image and the BIM image comprises:
aligning the upper left corner of the sliding window with the upper left corners of the indoor image and the BIM image, and intercepting sub-images of the indoor image and the BIM image;
aligning the upper right corner of the sliding window with the upper right corners of the indoor image and the BIM image, and intercepting sub-images of the indoor image and the BIM image;
aligning the lower left corner of the sliding window with the lower left corners of the indoor image and the BIM image, and intercepting sub-images of the indoor image and the BIM image;
aligning the lower right corner of the sliding window with the lower right corners of the indoor image and the BIM image, and intercepting sub-images of the indoor image and the BIM image;
and aligning the central point of the sliding window with the central points of the indoor image and the BIM image, and intercepting sub-images of the indoor image and the BIM image.
Specifically, the convolution calculation model is a VGG16 model;
correspondingly, the step of extracting the features of the indoor image and the BIM image based on the convolution calculation model to obtain the feature matrix of the indoor image and the feature matrix of the BIM image comprises the following steps:
taking the indoor image as an input of the VGG16 model, and taking the output of the fourth max-pooling layer of the VGG16 model as the feature matrix of the indoor image;
and taking the BIM image as an input of the VGG16 model, and taking the output of the fourth max-pooling layer of the VGG16 model as the feature matrix of the BIM image.
Specifically, the step of segmenting the indoor image and acquiring a weight matrix of the indoor image according to a segmentation result includes:
performing semantic segmentation on the indoor image based on a DeepLabv3+ model to obtain a segmentation result of the indoor image;
dividing the indoor image into grids, if the segmentation result of the indoor image in any grid comprises a foreground, setting the weight of the grid as a first preset weight, and otherwise, setting the weight of the grid as a second preset weight; wherein the first preset weight is smaller than the second preset weight;
and constructing a weight matrix of the indoor image according to the weights of all the grids.
Specifically, the step of weighting the feature matrix of the indoor image and the feature matrix of the BIM image respectively by using the weight matrix of the indoor image includes:
adjusting the size of the weight matrix of the indoor image so that the size of the weight matrix of the indoor image is the same as the size of the feature matrix of the indoor image;
and performing Hadamard multiplication on the adjusted weight matrix of the indoor image and the feature matrices of the indoor image and the BIM image respectively to obtain the weighting results of the feature matrices of the indoor image and each BIM image.
Specifically, the step of matching the indoor image with each of the BIM images according to the weighting results of the feature matrices of the indoor image and each of the BIM images, and taking the position coordinate corresponding to the BIM image that best matches the indoor image as the position coordinate of the user, includes:
expanding the weighting result of the feature matrix of the indoor image to obtain the feature vector of the indoor image, and expanding the weighting result of the feature matrix of each BIM image to obtain the feature vector of each BIM image;
calculating the cosine similarity between the feature vector of the indoor image and the feature vector of each BIM image, and taking the BIM image corresponding to the maximum cosine similarity as the BIM image that best matches the indoor image;
and taking the position coordinate corresponding to the BIM image that best matches the indoor image as the position coordinate of the user.
According to a second aspect of the embodiments of the present invention, there is provided an image-based indoor positioning apparatus, including:
the extraction module is used for acquiring an indoor image of a building shot by a user, extracting features of the indoor image and of each BIM image of the building acquired in advance from a BIM image library based on a convolution calculation model, and acquiring the feature matrix of the indoor image and the feature matrix of each BIM image;
the weighting module is used for segmenting the indoor image, acquiring a weight matrix of the indoor image according to a segmentation result, and weighting the feature matrix of the indoor image and the feature matrix of the BIM image by using the weight matrix of the indoor image;
the positioning module is used for matching the indoor image with each BIM image according to the weighting results of the feature matrices of the indoor image and each BIM image, and taking the position coordinate corresponding to the BIM image that best matches the indoor image as the position coordinate of the user; wherein each BIM image and its position coordinate are stored in association in advance.
According to a third aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor calls the program instructions to execute the image-based indoor positioning method provided in any one of the various possible implementations of the first aspect.
According to a fourth aspect of embodiments of the present invention, there is also provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the image-based indoor positioning method provided in any one of the various possible implementations of the first aspect.
The embodiments of the present invention provide an image-based indoor positioning method and apparatus. The method uses a convolution calculation model to extract features from the real indoor image shot by the user and from the BIM images, thereby solving the problem of cross-domain matching between real indoor images and BIM images. A weight matrix is obtained by segmenting the indoor image, and the feature matrices of the indoor image and the BIM images are weighted with this matrix, concentrating the distribution of attention. Finally, the indoor image is matched against the BIM images according to the weighting results, and the position coordinate corresponding to the best-matching BIM image is taken as the position coordinate of the user. Automatic image-based indoor positioning is thus realized with high positioning accuracy.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In an embodiment of the present invention, an image-based indoor positioning method is provided. Fig. 1 is a schematic overall flow chart of the image-based indoor positioning method provided in the embodiment of the present invention, where the method includes: S101, acquiring an indoor image of a building shot by a user, performing feature extraction on the indoor image and on each Building Information Model (BIM) image of the building acquired in advance from a BIM image library based on a convolution calculation model, and acquiring a feature matrix of the indoor image and a feature matrix of each BIM image;
the indoor image is a picture of an indoor scene, which is shot by a user on a road network located indoors in a building by using a camera device such as a mobile phone. And comparing the indoor image shot by the user with the BIM image to realize indoor positioning. The present embodiment can acquire the position coordinates of the photo taken by the user, and also can acquire the direction angle of the taken photo and the BIM image that is most matched with the taken photo by inputting the taken indoor image and the BIM gallery. In this embodiment, the actually photographed indoor image and the BIM image belong to two images in different fields, and there is a certain difference between the two images, such as texture, structural ratio, and the like, as shown in fig. 2 and 3. Because the difference of the images in different fields can not achieve the expected matching effect by directly applying the traditional image matching method such as the SIFT (Scale Invariant Feature Transform) Feature descriptor, the embodiment extracts the features of the indoor image and the BIM image through the convolution calculation model, reduces the difference of the images in different fields, and performs indoor positioning according to the image features obtained by convolution.
Before feature extraction, the indoor image and the BIM image are each read as a matrix I(x, y, z), where x and y index the width and height of the image and z indexes its R, G, and B channels. Both images are then resized to (m, n) using a bilinear interpolation algorithm.
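As a minimal sketch of this reading and resizing step (assuming OpenCV is used; the function name and the target size (m, n) = (224, 224) are illustrative assumptions, not values fixed by this embodiment):

```python
import cv2

def load_and_resize(path, size=(224, 224)):
    """Read an image as a matrix I(x, y, z) and resize it bilinearly to (m, n)."""
    img = cv2.imread(path)                      # z holds the B, G, R channels
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # reorder channels to R, G, B
    # cv2.resize expects (width, height); INTER_LINEAR is bilinear interpolation
    return cv2.resize(img, size, interpolation=cv2.INTER_LINEAR)
```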
BIM image acquisition aims at building a data set of BIM images and their associated position coordinates from the BIM model. A BIM indoor model contains innumerable candidate acquisition points, and acquiring all of them is impractical: acquisition itself would be difficult, the image features of adjacent points are so similar that exhaustive acquisition wastes resources, and too many acquisition points would produce too many BIM images to be searched, greatly reducing the efficiency of position calculation. How to reasonably set the BIM acquisition points, converting an infinite set of points in space into a finite set of BIM image acquisition points, is therefore a key step in indoor positioning that combines BIM and computer vision.
The set of BIM image acquisition points is selected in combination with the BIM indoor map model of the building. Since the positions to be located generally lie in the passable areas of the BIM indoor map model, spatial positions outside the passable areas can be ignored, and the nodes of the topological road network, such as doors and stairway bends, are key positions in the indoor space.
S102, segmenting the indoor image, acquiring a weight matrix of the indoor image according to a segmentation result, and weighting the feature matrix of the indoor image and the feature matrix of the BIM image by using the weight matrix of the indoor image;
and segmenting the indoor image into a foreground part and a background part. Foreground portions in indoor images, such as tables and chairs, flowerpots, people, etc., are dynamic and are not present in the BIM model, which are factors that are not conducive to indoor positioning. While the view of indoor positioning is focused on background parts such as ceilings, floors, walls, etc. In order to distinguish different degrees of influence of the foreground part and the background part on indoor positioning, different weights are set for the foreground and the background in the indoor image, so that a weight matrix of the indoor image is obtained, and the feature matrix of the indoor image and the feature matrix of the BIM image are weighted according to the weight matrix.
S103, matching the indoor image with each BIM image according to the weighting results of the feature matrices of the indoor image and each BIM image, and taking the position coordinate corresponding to the BIM image that best matches the indoor image as the position coordinate of the user; wherein each BIM image and its position coordinate are stored in association in advance.
The weighting result of the feature matrix of the indoor image is compared with the weighting result of the feature matrix of each BIM image, and the best-matching BIM image is obtained from the comparison. The position coordinates and direction angle at which that BIM image was acquired are taken as the position coordinates and azimuth at which the indoor image was shot, i.e., as the coordinates of the user who shot the indoor image.
The method uses a convolution calculation model to extract features from the real indoor image shot by the user and from the BIM images, thereby solving the problem of cross-domain matching between real indoor images and BIM images. A weight matrix is obtained by segmenting the indoor image, and the feature matrices of the indoor image and the BIM images are weighted with this matrix, concentrating the distribution of attention. Finally, the indoor image is matched against the BIM images according to the weighting results, and the position coordinate corresponding to the best-matching BIM image is taken as the position coordinate of the user. Automatic image-based indoor positioning is thus realized with high positioning accuracy.
On the basis of the foregoing embodiment, in this embodiment, the step of extracting features of the indoor image and the BIM image based on a convolution calculation model, and acquiring the feature matrix of the indoor image and the feature matrix of the BIM image, includes: scanning the indoor image and the BIM image by using a sliding window to obtain sub-images of the indoor image and the BIM image; performing convolution calculation on the sub-images of the indoor image, the sub-images of the BIM image, the indoor image, and the BIM image based on the convolution calculation model; and taking the convolution calculation results of the sub-images of the indoor image and of the indoor image itself as the feature matrix of the indoor image, and taking the convolution calculation results of the sub-images of the BIM image and of the BIM image itself as the feature matrix of the BIM image.
Specifically, a plurality of sub-images are intercepted from the indoor image and the BIM image by using a sliding window, so as to reduce the error caused by the field of view of the actually shot indoor photo being slightly wider than that of the BIM screenshot. Convolution calculation is performed on the sub-images of the indoor image and on the original indoor image to obtain the feature matrix of the indoor image, and on the sub-images of the BIM image and on the original BIM image to obtain the feature matrix of the BIM image.
On the basis of the above embodiment, in this embodiment, the step of scanning the indoor image and the BIM image by using the sliding window to obtain the sub-images of the indoor image and the BIM image includes: aligning the upper left corner of the sliding window with the upper left corners of the indoor image and the BIM image, and intercepting sub-images of the indoor image and the BIM image; aligning the upper right corner of the sliding window with the upper right corners of the indoor image and the BIM image, and intercepting sub-images of the indoor image and the BIM image; aligning the lower left corner of the sliding window with the lower left corners of the indoor image and the BIM image, and intercepting sub-images of the indoor image and the BIM image; aligning the lower right corner of the sliding window with the lower right corners of the indoor image and the BIM image, and intercepting sub-images of the indoor image and the BIM image; and aligning the central point of the sliding window with the central points of the indoor image and the BIM image, and intercepting sub-images of the indoor image and the BIM image.
Specifically, a sliding window of size (m', n') is used to intercept five sub-images, aligned respectively with the four corner points and the central point; together with the original image, six images are obtained. Feature extraction is then performed on the six indoor images and the six BIM images. As shown in Fig. 4, panel a is the original image, and the hatched portion represents feature extraction performed on the whole original image. Panels b, c, d, e, and f of Fig. 4 are the sub-images intercepted by aligning the window with the four corner points and the central point, respectively; the shaded portion represents the sub-image intercepted by the sliding window, from which the features are extracted.
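As a hedged sketch of this five-crop scheme (NumPy-style indexing; the helper name and parameters are illustrative assumptions):

```python
import numpy as np

def five_crops(img: np.ndarray, crop_h: int, crop_w: int) -> list:
    """Intercept five (m', n') sub-images: four corners plus the center."""
    h, w = img.shape[:2]
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    return [
        img[:crop_h, :crop_w],                      # upper-left corner
        img[:crop_h, w - crop_w:],                  # upper-right corner
        img[h - crop_h:, :crop_w],                  # lower-left corner
        img[h - crop_h:, w - crop_w:],              # lower-right corner
        img[top:top + crop_h, left:left + crop_w],  # center
    ]
```

Together with the original image, these five crops give the six images per input on which features are extracted.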
On the basis of the above embodiments, the convolution calculation model in this embodiment is a VGG16 model; correspondingly, the step of extracting the features of the indoor image and the BIM image based on the convolution calculation model to obtain the feature matrix of the indoor image and the feature matrix of the BIM image includes: taking the indoor image as an input of the VGG16 model, and taking the output of the fourth max-pooling layer of the VGG16 model as the feature matrix of the indoor image; and taking the BIM image as an input of the VGG16 model, and taking the output of the fourth max-pooling layer of the VGG16 model as the feature matrix of the BIM image.
Specifically, the convolutional network structure of the VGG16 model is shown in Table 1, where conv3 denotes a 3 × 3 convolution kernel, the spatial padding of each convolutional layer is one pixel, and max pooling is performed over a (2, 2) window. In the convolution operation, convolution is applied to the image; through sparse connectivity and parameter sharing, the convolutional network greatly reduces its number of parameters while gaining equivariance to translation. If functions f(x) and g(x) satisfy f[g(x)] = g[f(x)], then f(x) is said to be equivariant to the transform g. For convolution, if g is an arbitrary translation of the input, then the convolution function is equivariant to g. Pooling replaces the output of the network at a location with a summary statistic of the neighboring outputs: if the input of the pooling layer is a_ij, the output is A_max = max(a_ij). Max pooling thus makes the convolutional network invariant to small local shifts of the image. Since the features of the fourth layer yield the highest accuracy, this embodiment takes the output of the fourth max-pooling layer as the feature matrix O(i, j, 512) of the image, i.e., 512 feature maps.
TABLE 1 Convolutional network structure of VGG16

  Block 1: conv3-64, conv3-64, maxpool
  Block 2: conv3-128, conv3-128, maxpool
  Block 3: conv3-256, conv3-256, conv3-256, maxpool
  Block 4: conv3-512, conv3-512, conv3-512, maxpool
  Block 5: conv3-512, conv3-512, conv3-512, maxpool
  Classifier: FC-4096, FC-4096, FC-1000, soft-max
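A sketch of this feature extraction, assuming torchvision's pretrained VGG16 stands in for the model trained in this embodiment; in torchvision's layout, the fourth max-pooling layer is index 23 of vgg16.features:

```python
import torch
from torchvision import models

# Truncate VGG16 after its fourth max-pooling layer; the 512 output
# channels match the feature matrix O(i, j, 512) described above.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
backbone = torch.nn.Sequential(*vgg.features[:24]).eval()

with torch.no_grad():
    x = torch.randn(1, 3, 224, 224)   # stand-in for a preprocessed image tensor
    feats = backbone(x)               # shape (1, 512, 14, 14) for 224 x 224 input
```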
Before using the VGG16 model for feature extraction, this embodiment trains it on ImageNet, and the ImageNet images must be preprocessed prior to training. The pixel values of all ImageNet images are accumulated to obtain the means (r, g, b) of the three RGB channels, namely

r = (1/N) Σ Σ_{i,j} p^R_{ij},  g = (1/N) Σ Σ_{i,j} p^G_{ij},  b = (1/N) Σ Σ_{i,j} p^B_{ij},

where p is the value of a pixel point, i and j are the coordinates of the pixel point in the width and height directions of the image, R, G, and B denote the R, G, and B channels of the image, the outer sum runs over all n images of the ImageNet training set, and N is the total number of pixels in those n images.
The means r, g, and b are then subtracted from the R, G, and B channel pixels of each image, respectively. This preprocessing reduces the influence of illumination or photo brightness on feature extraction, so that the extracted features emphasize the natural characteristics of the image, yielding the preprocessed matrix I(m', n', 3). I(m', n', 3) is input into the VGG16 model, with ImageNet as the training set, and the convolution is calculated as follows:

Z(i, j, k) = Σ_{l,m,n} V((i − 1)·s + m, (j − 1)·s + n, l) · K(m, n, l, k),

where V is the tensor of the input image, K is the convolution kernel, i and j index the height and width, k indexes the channels, and s is the stride.
On the basis of the foregoing embodiments, in this embodiment, the step of segmenting the indoor image and acquiring the weight matrix of the indoor image according to the segmentation result includes: performing semantic segmentation on the indoor image based on a DeepLabv3+ model to obtain a segmentation result of the indoor image; dividing the indoor image into grids, if the segmentation result of the indoor image in any grid comprises a foreground, setting the weight of the grid as a first preset weight, and otherwise, setting the weight of the grid as a second preset weight; wherein the first preset weight is smaller than the second preset weight; and constructing a weight matrix of the indoor image according to the weights of all the grids.
Specifically, the DeepLabv3+ model is used for semantic segmentation; during training on its training set, the model emphasizes the foreground rather than the background. Based on the segmentation result, the grid interval is set to (l_m, l_n) and the indoor image is divided into grids. If a grid contains a semantically segmented part, i.e., foreground, its weight is reduced to w (w < 1); the weights of the remaining grids are 1, yielding the weight matrix W. In this case, the first preset weight is w and the second preset weight is 1. The influence of dynamic objects is thereby reduced and attention is focused on the static parts. In Fig. 5, panel a shows the grid division based on the segmentation result, with white lines marking the divided grids; in panel b, black parts are the background and white parts are the foreground; in panel c, black parts are areas weighted with the second preset weight and white parts are areas weighted with the first preset weight.
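A sketch of the weight-matrix construction, assuming a binary segmentation mask in which 1 marks foreground; the cell size (l_m, l_n) and the value w = 0.5 are illustrative assumptions:

```python
import numpy as np

def weight_matrix(mask: np.ndarray, l_m: int, l_n: int, w: float = 0.5) -> np.ndarray:
    """Grid the mask into (l_m, l_n) cells; cells touching foreground get weight w < 1."""
    rows, cols = mask.shape[0] // l_m, mask.shape[1] // l_n
    W = np.ones((rows, cols), dtype=np.float32)       # second preset weight: 1
    for r in range(rows):
        for c in range(cols):
            cell = mask[r * l_m:(r + 1) * l_m, c * l_n:(c + 1) * l_n]
            if cell.any():                            # cell contains foreground
                W[r, c] = w                           # first preset weight: w < 1
    return W
```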
In addition, a weight matrix can be acquired for each sub-image of the indoor image in the same way. When weighting is performed, the weight matrix of the indoor image is used to weight the feature matrix of the indoor image and the feature matrix of the BIM image, and the weight matrix of each sub-image of the indoor image is used to weight the feature matrices of the corresponding sub-images of the indoor image and of the BIM image.
On the basis of the foregoing embodiments, in this embodiment, the step of weighting the feature matrix of the indoor image and the feature matrix of the BIM image by using the weight matrix of the indoor image includes: adjusting the size of the weight matrix of the indoor image so that the size of the weight matrix of the indoor image is the same as the size of the feature matrix of the indoor image; and performing Hadamard multiplication on the adjusted weight matrix of the indoor image and the feature matrices of the indoor image and the BIM image respectively to obtain the weighting results of the feature matrices of the indoor image and each BIM image.
Specifically, the weight matrix is scaled to the same size as the feature matrix, yielding W'. The Hadamard product of the feature matrix and W' is then computed to obtain the weighted feature matrix O'(i, j, 512), thereby adjusting the attention paid to the indoor image and the BIM image.
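A sketch of this weighting step (OpenCV resize plus NumPy broadcasting; the function name is an illustrative assumption):

```python
import cv2
import numpy as np

def weight_features(features: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Resize W to the (i, j) size of an (i, j, 512) feature matrix and
    take the Hadamard (element-wise) product over every channel."""
    i, j = features.shape[:2]
    W_prime = cv2.resize(W, (j, i), interpolation=cv2.INTER_LINEAR)
    return features * W_prime[:, :, np.newaxis]   # broadcast across 512 channels
```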
On the basis of the foregoing embodiments, in this embodiment, the step of matching the indoor image with each of the BIM images according to the weighting results of the feature matrices of the indoor image and each of the BIM images, and taking the position coordinate corresponding to the BIM image that best matches the indoor image as the position coordinate of the user, includes: expanding the weighting result of the feature matrix of the indoor image to obtain the feature vector of the indoor image, and expanding the weighting result of the feature matrix of each BIM image to obtain the feature vector of each BIM image; calculating the cosine similarity between the feature vector of the indoor image and the feature vector of each BIM image, and taking the BIM image corresponding to the maximum cosine similarity as the BIM image that best matches the indoor image; and taking the position coordinate corresponding to the BIM image that best matches the indoor image as the position coordinate of the user.
Specifically, the feature matrix O'(i, j, 512) is expanded into a feature vector, and the feature vectors of all BIM images in the BIM image library form a vector library. The cosine similarity between the feature vector of the indoor image and each feature vector in the vector library is calculated; the feature vector with the maximum cosine similarity is selected, the BIM image to which it belongs is taken as the BIM image that best matches the indoor image, and the position coordinate associated with that BIM image is taken as the position coordinate at which the indoor image was shot, thereby locating the user. A complete flow chart of the image-based indoor positioning method is shown in Fig. 6.
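A sketch of this matching step (NumPy; the helper name is an illustrative assumption):

```python
import numpy as np

def best_match(query_feats: np.ndarray, bim_feats_list: list) -> int:
    """Flatten weighted feature matrices to vectors and return the index of
    the BIM image whose vector has maximum cosine similarity to the query."""
    q = query_feats.ravel()
    q = q / np.linalg.norm(q)
    sims = [float(np.dot(q, v.ravel() / np.linalg.norm(v.ravel())))
            for v in bim_feats_list]
    return int(np.argmax(sims))   # index of the best-matching BIM image
```

The position coordinate stored in association with the returned BIM image is then reported as the user's position.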
In another embodiment of the present invention, an image-based indoor positioning apparatus is provided, which is used for implementing the method in the foregoing embodiments. Therefore, the description and definition in the embodiments of the image-based indoor positioning method described above may be used for understanding the respective execution modules in the embodiments of the present invention. Fig. 7 is a schematic diagram of an overall structure of an image-based indoor positioning apparatus according to an embodiment of the present invention, where the apparatus includes an extraction module 701, a weighting module 702, and a positioning module 703, where:
the extraction module 701 is configured to acquire an indoor image of a building shot by a user, perform feature extraction on the indoor image and on each Building Information Model (BIM) image of the building acquired in advance from the BIM image library based on a convolution calculation model, and acquire the feature matrix of the indoor image and the feature matrix of each BIM image;
the weighting module 702 is configured to segment the indoor image, obtain a weight matrix of the indoor image according to a segmentation result, and weight the feature matrix of the indoor image and the feature matrix of the BIM image respectively by using the weight matrix of the indoor image;
the positioning module 703 is configured to match the indoor image with each of the BIM images according to the weighting results of the feature matrices of the indoor image and each of the BIM images, and to use the position coordinate corresponding to the BIM image that best matches the indoor image as the position coordinate of the user; wherein each BIM image and its position coordinate are stored in association in advance.
The apparatus uses a convolution calculation model to extract features from the real indoor image shot by the user and from the BIM images, thereby solving the problem of cross-domain matching between real indoor images and BIM images. A weight matrix is obtained by segmenting the indoor image, and the feature matrices of the indoor image and the BIM images are weighted with this matrix, concentrating the distribution of attention. Finally, the indoor image is matched against the BIM images according to the weighting results, and the position coordinate corresponding to the best-matching BIM image is taken as the position coordinate of the user. Automatic image-based indoor positioning is thus realized with high positioning accuracy.
On the basis of the foregoing embodiment, the extraction module in this embodiment is specifically configured to: scanning the indoor image and the BIM image by using a sliding window to obtain sub-images of the indoor image and the BIM image; performing convolution calculation on the subgraph of the indoor image, the subgraph of the BIM image, the indoor image and the BIM image based on a convolution calculation model; and taking the convolution calculation result of the sub-image of the indoor image and the indoor image as a characteristic matrix of the indoor image, and taking the convolution calculation result of the sub-image of the BIM image and the BIM image as the characteristic matrix of the BIM image.
On the basis of the foregoing embodiment, the extraction module in this embodiment is further configured to: aligning the upper left corner of the sliding window with the upper left corners of the indoor image and the BIM image, and intercepting sub-images of the indoor image and the BIM image; aligning the upper right corner of the sliding window with the upper right corners of the indoor image and the BIM image, and intercepting sub-images of the indoor image and the BIM image; aligning the lower left corner of the sliding window with the lower left corners of the indoor image and the BIM image, and intercepting sub-images of the indoor image and the BIM image; aligning the lower right corner of the sliding window with the lower right corners of the indoor image and the BIM image, and intercepting sub-images of the indoor image and the BIM image; and aligning the central point of the sliding window with the central points of the indoor image and the BIM image, and intercepting sub-images of the indoor image and the BIM image.
On the basis of the above embodiment, the convolution calculation model in this embodiment is a VGG16 model; correspondingly, the extraction module is specifically configured to: take the indoor image as an input of the VGG16 model, and take the output of the fourth max-pooling layer of the VGG16 model as the feature matrix of the indoor image; and take the BIM image as an input of the VGG16 model, and take the output of the fourth max-pooling layer of the VGG16 model as the feature matrix of the BIM image.
On the basis of the above embodiments, the weighting module in this embodiment is specifically configured to: performing semantic segmentation on the indoor image based on a DeepLabv3+ model to obtain a segmentation result of the indoor image; dividing the indoor image into grids, if the segmentation result of the indoor image in any grid comprises a foreground, setting the weight of the grid as a first preset weight, and otherwise, setting the weight of the grid as a second preset weight; wherein the first preset weight is smaller than the second preset weight; and constructing a weight matrix of the indoor image according to the weights of all the grids.
On the basis of the above embodiments, the weighting module in this embodiment is specifically configured to: adjusting the size of the weight matrix of the indoor image so that the size of the weight matrix of the indoor image is the same as the size of the feature matrix of the indoor image; and performing Hadamard multiplication on the adjusted weight matrix of the indoor image and the feature matrices of the indoor image and the BIM image respectively to obtain the weighting results of the feature matrices of the indoor image and each BIM image.
On the basis of the foregoing embodiments, the positioning module in this embodiment is specifically configured to: expand the weighting result of the feature matrix of the indoor image to obtain the feature vector of the indoor image, and expand the weighting result of the feature matrix of each BIM image to obtain the feature vector of each BIM image; calculate the cosine similarity between the feature vector of the indoor image and the feature vector of each BIM image, and take the BIM image corresponding to the maximum cosine similarity as the BIM image that best matches the indoor image; and take the position coordinate corresponding to the BIM image that best matches the indoor image as the position coordinate of the user.
Fig. 8 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Fig. 8, the electronic device may include: a processor 801, a communications interface 802, a memory 803, and a communication bus 804, wherein the processor 801, the communications interface 802, and the memory 803 communicate with one another through the communication bus 804. The processor 801 may call logic instructions in the memory 803 to perform the following method: acquiring an indoor image of a building shot by a user, and performing feature extraction on the indoor image and the BIM images of the building in a BIM image library based on a convolution calculation model to acquire the feature matrix of the indoor image and the feature matrix of each BIM image; segmenting the indoor image, acquiring a weight matrix of the indoor image according to the segmentation result, and weighting the feature matrices of the indoor image and the BIM images respectively by using the weight matrix of the indoor image; and matching the indoor image with each BIM image according to the weighting results of the feature matrices of the indoor image and each BIM image, and taking the position coordinate corresponding to the BIM image that best matches the indoor image as the position coordinate of the user; wherein each BIM image and its position coordinate are stored in association in advance.
In addition, the logic instructions in the memory 803 may be implemented in the form of software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The present embodiments provide a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above method embodiments, for example, including: acquiring an indoor image of a building shot by a user, and performing feature extraction on the indoor image and the BIM images of the building in a BIM image library based on a convolution calculation model to acquire the feature matrix of the indoor image and the feature matrix of each BIM image; segmenting the indoor image, acquiring a weight matrix of the indoor image according to the segmentation result, and weighting the feature matrices of the indoor image and the BIM images respectively by using the weight matrix of the indoor image; and matching the indoor image with each BIM image according to the weighting results of the feature matrices of the indoor image and each BIM image, and taking the position coordinate corresponding to the BIM image that best matches the indoor image as the position coordinate of the user; wherein each BIM image and its position coordinate are stored in association in advance.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.