CN114494436A - Indoor scene positioning method and device

Info

Publication number
CN114494436A
CN114494436A
Authority
CN
China
Prior art keywords
positioning, target, model, building, information
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210089833.4A
Other languages
Chinese (zh)
Inventor
Liu Jianhua (刘建华)
Feng Guoqiang (冯国强)
Wang Nan (王楠)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Civil Engineering and Architecture
Original Assignee
Beijing University of Civil Engineering and Architecture
Application filed by Beijing University of Civil Engineering and Architecture
Priority claimed from CN202210089833.4A
Publication of CN114494436A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Abstract

The application provides an indoor scene positioning method and device. The method comprises the following steps: inputting a real-time image into a target model, and identifying a target positioning identifier in the real-time image; and determining the position information of the current position based on the first position information of the target positioning identifier. The target positioning identifier is at least one of a set of positioning identifiers used for auxiliary positioning in a target building; the target model is obtained by training an improved YOLOv5 model with a target sample set; and the target sample set includes images containing the positioning identifiers used for auxiliary positioning within the target building. Because the method uses indoor scene identifiers with universality as positioning anchors, the identifier-matching positioning method provided by the application enables real-time positioning across different indoor scenes.

Description

Indoor scene positioning method and device
Technical Field
The application relates to the field of building positioning, in particular to an indoor scene positioning method and device.
Background
Indoor positioning is one of the core technologies of Location Based Services (LBS), and many scene-oriented application schemes already exist. Visual features, as the main semantic information that helps people understand an environment, play a dominant role, so many technologies related to indoor scene recognition have been widely adopted.
However, the engineering application of indoor scene recognition and positioning is still not well solved, owing to problems such as insufficient semantic constraint information in building maps and immature matching and positioning technology for the building's Map Location Anchor (MLA).
Disclosure of Invention
The application aims to provide an indoor scene positioning method and device, which are used for indoor scene positioning of a mobile terminal.
The application provides an indoor scene positioning method, which comprises the following steps:
acquiring a real-time image in a target building, which is acquired by a camera; inputting the real-time image into a target model, and identifying a target positioning identifier in the real-time image; determining the position information of the current position based on the first position information of the target positioning identifier; the target positioning identifier is at least one of a positioning identifier set used for auxiliary positioning in a target building; the target model is obtained by training an improved YOLOv5 model by using a target sample set; the target sample set includes: an image containing a location marker for assisting in location within the target building.
Optionally, before the inputting the real-time image into the target model and identifying the target positioning identifier in the real-time image, the method further includes: acquiring images of different angles and/or different distances of the positioning identifiers used for auxiliary positioning in the target building, and generating the target sample set; and training the improved YOLOv5 model with the target sample set to obtain the target model; wherein the improved YOLOv5 model comprises a best-class-only part and removes the multi-label part of the original YOLOv5 model.
Optionally, the determining the location information of the current location based on the first location information of the target location identifier includes: matching the identification information of the target positioning identification with the identification information of the positioning identification in the map positioning anchor point MLA of the target building to obtain first position information corresponding to the target positioning identification; wherein the map location anchor MLA comprises: at least one positioning anchor point; the anchor point information of each positioning anchor point comprises: identification information of the positioning identification and position information of the positioning identification.
Optionally, after the identification information of the target positioning identifier is matched with the identification information of the positioning identifiers in the map location anchor MLA of the target building to obtain the first position information corresponding to the target positioning identifier, the method further includes: calculating the distance between the camera and an auxiliary positioning point according to second position information of the auxiliary positioning point, and obtaining third position information of a corresponding positioning reference point; generating a buffer area centered on the positioning reference point, the radius of the buffer area being the sum of a target error range and a preset step length; screening walking nodes located within the buffer area from a walking node set to obtain a walking node subset, and calculating the distance between each walking node in the walking node subset and the positioning reference point; and determining fourth position information of the walking node in the walking node subset having the shortest distance to the positioning reference point as the position information of the current position; wherein the auxiliary positioning point is used for constraint verification of the position information of the current position, and the auxiliary positioning point is any positioning point whose position information was determined before the current position.
Optionally, after the calculating the distance between each walking node in the walking node subset and the positioning reference point, the method further comprises: determining the shortest distance between the walking node subset and the positioning reference point as a distance error; and determining the maximum error range obtained after adaptive correction of the accumulated distance errors as the target error range.
Optionally, before determining the location information of the current location based on the first location information of the target location identifier, the method further includes: obtaining a Building Information Model (BIM) model of the target building based on the three-dimensional model of the target building; extracting the spatial topological relation of the spatial nodes of the target building, and extracting the spatial coordinate information of the spatial nodes from the BIM; constructing a road network organization in the horizontal direction and the vertical direction based on the spatial topological relation and the spatial coordinate information to obtain a network model; extracting geometric information from the BIM model, and selecting a space represented by a building component which belongs to a specific space and has a boundary relation to obtain an entity model; linking the entity model and the network model according to a preset linking rule to obtain a building map mixed data model; wherein the preset linking rules include linking and mapping based on semantic relationships between building elements in the solid model and spatial nodes in the network model.
Optionally, after the entity model and the network model are linked according to a preset linking rule to obtain a building map mixed data model, the method further includes: setting walking nodes on the mixed data model to obtain a walking node set; the walking nodes are nodes which are arranged between any adjacent walking end nodes in the preset step length; the walking end node is set based on the position information of the space node in the network model and the auxiliary positioning mark required for indoor navigation.
The application also provides an indoor scene positioning device, includes:
the acquisition module is used for acquiring a real-time image in the target building acquired by the camera; the identification module is used for inputting the real-time image into a target model and identifying a target positioning identifier in the real-time image; the positioning module is used for determining the position information of the current position based on the first position information of the target positioning identifier; the target positioning identifier is at least one of a positioning identifier set used for auxiliary positioning in a target building; the target model is obtained by training an improved YOLOv5 model by using a target sample set; the target sample set includes: an image containing a location marker for assisting in location within the target building.
Optionally, the apparatus further comprises: a training module; the acquisition module is further configured to acquire images of different angles and/or different distances of the positioning identifiers used for auxiliary positioning in the target building, and generate the target sample set; the training module is configured to train the improved YOLOv5 model using the target sample set to obtain the target model; wherein the improved YOLOv5 model comprises a best-class-only part and removes the multi-label part of the original YOLOv5 model.
Optionally, the positioning module is specifically configured to match identification information of the target positioning identifier with identification information of a positioning identifier in a map positioning anchor MLA of the target building, so as to obtain first position information corresponding to the target positioning identifier; wherein the map location anchor MLA comprises: at least one positioning anchor point; the anchor point information of each positioning anchor point comprises: identification information of the positioning identification and position information of the positioning identification.
Optionally, the positioning module is further specifically configured to calculate the distance between the camera and an auxiliary positioning point according to second position information of the auxiliary positioning point, and obtain third position information of a corresponding positioning reference point; the positioning module is further configured to generate a buffer area centered on the positioning reference point, the radius of the buffer area being the sum of a target error range and a preset step length; the positioning module is further configured to screen walking nodes located within the buffer area from a walking node set to obtain a walking node subset, and calculate the distance between each walking node in the walking node subset and the positioning reference point; the positioning module is further configured to determine, as the position information of the current position, fourth position information of the walking node in the walking node subset having the shortest distance to the positioning reference point; wherein the auxiliary positioning point is used for constraint verification of the position information of the current position, and the auxiliary positioning point is any positioning point whose position information was determined before the current position.
Optionally, the apparatus further comprises: a determination module; the determining module is configured to determine a shortest distance between the subset of walking nodes and the positioning reference point as a distance error; the determining module is further configured to determine a maximum error range obtained by performing adaptive correction on the obtained multiple distance errors as the target error range.
Optionally, the apparatus further comprises: a construction module and an extraction module; the building module is used for obtaining a building information model BIM (building information model) of the target building based on the three-dimensional model of the target building; the extraction module is used for extracting the spatial topological relation of the spatial node of the target building and extracting the spatial coordinate information of the spatial node from the BIM model; the building module is further configured to build a road network organization in a horizontal direction and a vertical direction based on the spatial topological relation and the spatial coordinate information to obtain a network model; the extraction module is further used for extracting geometric information from the BIM model; the building module is also used for selecting the space represented by the building components which belong to the specific space and have the boundary relation to obtain an entity model; the building module is further used for linking the entity model and the network model according to a preset linking rule to obtain a building map mixed data model; wherein the preset linking rules include linking and mapping based on semantic relationships between building elements in the solid model and spatial nodes in the network model.
Optionally, the apparatus further comprises: setting a module; the setting module is used for setting walking nodes on the mixed data model to obtain the walking node set; the walking nodes are nodes which are arranged between any adjacent walking end nodes in the preset step length; the walking end node is set based on the position information of the space node in the network model and the auxiliary positioning mark required for indoor navigation.
The present application further provides a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the indoor scene localization method as described in any of the above.
The present application further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the indoor scene positioning method as described in any of the above when executing the program.
The present application also provides a readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the indoor scene localization method as defined in any of the above.
According to the indoor scene positioning method and device, when a user needs to position indoors, the user can use the mobile terminal to shoot an indoor scene, obtain a real-time image in a target building collected by the camera, input the real-time image into a target model, obtain first position information of a target positioning identifier in the real-time image, and then determine the position information of the current position based on the first position information, so that the real-time positioning of the indoor scene is achieved. The target positioning identifier is at least one of a positioning identifier set used for auxiliary positioning in a target building; the target model is obtained by training an improved YOLOv5 model by using a target sample set; the target sample set includes: an image containing a location marker for assisting in location within the target building.
Drawings
In order to more clearly illustrate the technical solutions in the present application or the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of an indoor scene positioning method provided in the present application;
FIG. 2 is a schematic diagram of a setup walking node provided herein;
fig. 3 is a schematic structural diagram of an indoor scene positioning device provided in the present application;
fig. 4 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions in the present application will be clearly and completely described below with reference to the drawings in the present application, and it is obvious that the described embodiments are some, but not all embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms first, second and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that embodiments of the application may be practiced in sequences other than those illustrated or described herein, and that the terms "first," "second," and the like are generally used herein in a generic sense and do not limit the number of terms, e.g., the first term can be one or more than one. In addition, "and/or" in the specification and claims means at least one of connected objects, a character "/" generally means that a preceding and succeeding related objects are in an "or" relationship.
To enable widespread indoor scene positioning through mobile terminals, the embodiment of the application provides an indoor scene positioning method. First, a building map location anchor (MLA) geocoded entity library is constructed, which on the one hand provides the user with an immersive, real-scene building map, and on the other hand provides semantic anchor constraint information for mobile-terminal positioning. Second, the improved YOLOv5s deep learning model carried on the mobile terminal recognizes the real-time images acquired by the terminal's camera, yielding the element information of the universal positioning anchors in the building scene. Finally, the acquired spatial positions of the scene elements are matched with the building map location anchors (MLA), realizing real-time positioning and navigation. The indoor scene positioning method provided by the embodiment of the application can achieve high positioning accuracy in building scenes with rich MLA element information. Moreover, building map location anchors (MLA) are universal, and the positioning algorithm based on scene element recognition is compatible with the expansion of indoor map data types, so the indoor scene positioning method provided by the application has good engineering application prospects.
The indoor scene positioning method provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
As shown in fig. 1, an indoor scene positioning method provided in an embodiment of the present application may include the following steps 101 to 103:
and 101, acquiring a real-time image in a target building, which is acquired by a camera.
It should be noted that the indoor scene positioning method provided in the embodiment of the present application may be applied to a mobile terminal, and may also be applied to other portable devices, for example, a sweeping robot and other devices.
For example, the camera may be a camera arranged on a mobile terminal, and when a user needs to perform indoor scene positioning in a target building, the current indoor scene may be shot by using the mobile terminal to obtain a real-time image in the target building.
Step 102, inputting the real-time image into a target model, and identifying a target positioning identifier in the real-time image.
Wherein the target positioning identifier is at least one of a positioning identifier set used for auxiliary positioning in a target building. The target model is obtained by training an improved YOLOv5 model by using a target sample set; the target sample set includes: an image containing a location marker for assisting in location within the target building.
It should be noted that the positioning identifiers adopted in the embodiment of the present application are indoor identifiers with universality in indoor scenes, for example: doors, doorplates, fire-fighting cabinets, fire alarm devices, emergency exits, cameras, Wireless Local Area Network (WLAN) access points, electronic boxes and elevators. When the user performs real-time positioning of an indoor scene, such universal positioning identifiers may be used as Map Location Anchors (MLA) to calculate the position information of the current position.
For example, after a real-time image in a target building is acquired through a camera, the real-time image can be input into a target model, a positioning identifier for assisting positioning is identified from semantic information contained in image data of the real-time image through the target model, and real-time positioning of an indoor scene is realized based on the positioning identifier.
It is understood that the semantic information is the multi-level, richly dimensioned information covered in the high-precision map of the building that enables the mobile terminal to better understand the user's motion pattern, sense the user's scene and plan a navigation route. The semantic information can better represent the scene where the user is located.
Illustratively, the target model is obtained by training the improved YOLOv5 model using the target sample set. The target sample set comprises images of the positioning identifiers used for auxiliary positioning in the target building, i.e., images of those positioning identifiers acquired from multiple angles and at different distances; the acquired images are used to train the improved YOLOv5 model to obtain the target model, which can then accurately identify the positioning identifiers in real-time images.
Specifically, the step 102 may include the following steps 102a1 and 102a2:
step 102a1, acquiring images of different angles and/or different distances of positioning marks used for assisting positioning in the target building, and generating the target sample set.
It can be understood that, in order to improve the accuracy of the model for identifying the location markers, images of different angles and different distances of the location markers need to be acquired to train the improved YOLOv5 model.
Step 102a2, training the improved YOLOv5 model by using the target sample set to obtain the target model.
Wherein the improved YOLOv5 model comprises a best-class-only part and removes the multi-label part of the original YOLOv5 model.
It should be noted that, in the embodiment of the present application, the deep learning open-source framework PyTorch is used for model building, training, testing, verification and mobile-terminal deployment of the YOLOv5 algorithm, so as to identify positioning anchor elements (i.e., the positioning identifiers used for auxiliary positioning) in indoor scenes. The YOLOv5 network architecture contains four network models: YOLOv5s, YOLOv5m, YOLOv5l and YOLOv5x. Their main differences lie in the number of feature extraction modules and the number of convolution kernels at specific positions, with the parameter size and count of each network model increasing in turn. Since the experiment has nine targets to be identified and places high requirements on the real-time performance and lightweight nature of the recognition model, the accuracy, efficiency and size of the recognition model are considered comprehensively, and the building map location anchor element recognition network for indoor scenes is finally improved based on the YOLOv5s architecture.
Illustratively, the YOLOv5s architecture consists essentially of four parts: an input end, a backbone, a neck and a detection network. The input end applies Mosaic data enhancement, adaptive anchor-frame calculation, adaptive picture scaling and similar methods to optimize the input image, reducing the amount of calculation and increasing the target detection speed. The backbone is a convolutional neural network that aggregates image features at different fine-grained image granularities, reducing the model's computation and accelerating training.
The improved YOLOv5 model provided by the embodiment of the present application first uses a slicing operation to divide the input three-channel image (3 × 640 × 640) into 4 slices, each of size 3 × 320 × 320. Next, the four slices are concatenated in depth using the Concat operation, giving an output feature map of size 12 × 320 × 320. Then, a convolution layer composed of 32 convolution kernels generates an output feature map of size 32 × 320 × 320. Finally, the result is passed to the next layer through a Batch Normalization (BN) layer and a Hardswish activation function. The neck network is a feature aggregation layer that mixes and combines a series of image features; it mainly generates a Feature Pyramid Network (FPN) and then passes the output feature map to the detection network.
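For illustration, the slicing operation described above corresponds to the following minimal PyTorch sketch; the module name Focus and the layer hyperparameters are assumptions chosen only to reproduce the stated shapes (3 × 640 × 640 in, 12 × 320 × 320 after concatenation, 32 × 320 × 320 after convolution), not the patent's actual code.

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Slice a 3x640x640 image into four 3x320x320 slices, concatenate in
    depth (12x320x320), convolve to 32x320x320, then BN + Hardswish."""
    def __init__(self, c_in: int = 3, c_out: int = 32):
        super().__init__()
        self.conv = nn.Conv2d(c_in * 4, c_out, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.Hardswish()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # take every other pixel in each spatial direction -> 4 slices
        sliced = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                            x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.act(self.bn(self.conv(sliced)))

# x = torch.randn(1, 3, 640, 640); Focus()(x).shape == (1, 32, 320, 320)
```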
The detection network is mainly used for final detection of the model, and applies an anchor frame to a feature map output by a previous layer to output a vector with the class probability of a target object, a target score and the position of a bounding box around the target. The detection network of the YOLOv5s architecture consists of three detection layers, which input feature maps with dimensions 80 × 80, 40 × 40 and 20 × 20, respectively, for detecting image objects of different sizes.
In the improved YOLOv5 model, multiple arrays are used to store candidate-frame parameters during post-processing; the original multi-label part is removed and a best-class-only part is added. Predicted bounding boxes and object classes are then generated and marked in the original image, adapting the model to the recognition task of this application and realizing detection of building map location anchor elements in indoor scene images.
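As a concrete illustration, the following minimal PyTorch sketch (an assumption, not the patent's actual code) shows the added best-class-only selection: each candidate box keeps only its single highest-scoring class, rather than every (box, class) pair above a threshold as the removed multi-label variant would.

```python
import torch

def best_class_only(pred: torch.Tensor, conf_thres: float = 0.25) -> torch.Tensor:
    """pred: (N, 5 + num_classes) candidates for one image, laid out as
    (x, y, w, h, objectness, class scores...)."""
    pred = pred.clone()
    pred[:, 5:] *= pred[:, 4:5]                       # class score *= objectness
    conf, cls = pred[:, 5:].max(dim=1, keepdim=True)  # keep only the best class per box
    out = torch.cat((pred[:, :4], conf, cls.float()), dim=1)
    return out[conf.view(-1) > conf_thres]            # drop low-confidence boxes
    # the removed multi-label variant would instead keep every (box, class)
    # pair satisfying pred[:, 5:] > conf_thres
```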
Compared with the original YOLOv5 model, the improved YOLOv5 model provided by the application increases the recognition speed on image frames from 31.858 frames/s to 33.219 frames/s, while also improving training speed and recognition accuracy.
It should be noted that, to deploy the trained improved YOLOv5 model on the mobile terminal, the YOLOv5 .pt model first needs to be converted into a TFLite model using the FlatBuffers serialized model file format, which is better suited to mobile deployment. Meanwhile, to reduce the computational pressure on the mobile terminal, the model is compressed by quantization: the weight parameters stored in the model file are converted from Float32 to FP16, and the compressed model is then deployed to the mobile terminal.
Specifically, the quantization formulas are as follows:

$X_{quantized} = X_{float} \div X_{scale} + X_{zeropoint}$ (formula one)

$X_{scale} = \dfrac{X_{float}^{max} - X_{float}^{min}}{X_{quantized}^{max} - X_{quantized}^{min}}$ (formula two)

$X_{zeropoint} = X_{quantized}^{max} - X_{float}^{max} \div X_{scale}$ (formula three)

$X_{float} = X_{scale} \times (X_{quantized} - X_{zeropoint})$ (formula four)

wherein formula one is the quantization from floating-point value to fixed-point value and formula four is the inverse quantization from fixed-point value back to floating-point value; $X_{float}$ represents the true floating-point value, $X_{quantized}$ the quantized fixed-point value, $X_{scale}$ the compression ratio of the quantization interval, $X_{float}^{max}$ and $X_{float}^{min}$ the maximum and minimum floating-point values, $X_{quantized}^{max}$ and $X_{quantized}^{min}$ the maximum and minimum fixed-point values, and $X_{zeropoint}$ the quantized fixed-point value corresponding to the zero floating-point value.
Step 103, determining the position information of the current position based on the first position information of the target positioning identifier.
Illustratively, the target location identifier is at least one of location identifiers in a target building, and the target model identifies the target location identifier included in the real-time image from the real-time image.
For example, after the target location identifier in the real-time image is identified by the target model, the position information of the current position of the user may be determined according to the first position information of the target location identifier. In order to improve the positioning accuracy of indoor positioning, the target positioning identifier may include at least one positioning identifier, and usually at least 4 positioning identifiers are required to determine the position information of the current position of the user.
For example, after the target positioning identifier is recognized, the first position information of the target positioning identifier may be determined according to the map positioning anchor point MLA of the target building. The map positioning anchor point MLA can be downloaded to a local database from a server by the mobile terminal for storage, so that the query efficiency is improved.
It can be understood that, since the map positioning anchor point is a positioning anchor point in a specific building, different buildings need to be positioned in real time in an indoor scene through the corresponding map positioning anchor points. When a user carries out real-time positioning on an indoor scene in a target building, the mobile terminal can acquire an MLA copy of a map positioning anchor point of the building where the user is located from the server.
Specifically, the step 103 may include the following step 103a:
and 103a, matching the identification information of the target positioning identification with the identification information of the positioning identification in the map positioning anchor point MLA of the target building to obtain first position information corresponding to the target positioning identification.
Wherein the map location anchor MLA comprises: at least one positioning anchor point; the anchor point information of each positioning anchor point comprises: identification information of the positioning identification and position information of the positioning identification.
It is understood that the target positioning identifier corresponds to at least one positioning anchor in the target building, and each positioning anchor may include a positioning identifier and the position information of that positioning identifier. After the positioning identifier contained in the real-time image is identified through the target model, the corresponding positioning anchor can be found in the map location anchor MLA, and the position information of the current position is determined according to the position information of the positioning identifier corresponding to that anchor.
Illustratively, the map location anchor MLA includes the information related to each positioning anchor of the target building, specifically the identification information of the positioning identifier corresponding to each positioning anchor and the position information of that positioning identifier.
It will be appreciated that when the user is located within the building, the various location markers within the building are in the same coordinate system as the user. Therefore, the position information of the current position of the user can be determined through the position information of the positioning mark shot by the user. The position information is obtained based on the same coordinate system.
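A minimal sketch of the matching in step 103a, under assumed data structures (the identifier IDs and coordinates are invented for the example): the MLA copy downloaded to the local database is modelled as a dictionary from identification information to anchor information.

```python
from typing import Optional, Tuple

# illustrative MLA copy: identification information -> anchor information
MLA = {
    "door_312": {"position": (42.5, 18.0, 6.3)},
    "fire_cabinet_3F_east": {"position": (55.1, 20.4, 6.3)},
}

def match_anchor(identifier_id: str) -> Optional[Tuple[float, float, float]]:
    """Return the first position information for a recognized identifier,
    or None if the identifier has no anchor in the MLA."""
    anchor = MLA.get(identifier_id)
    return anchor["position"] if anchor else None
```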
Exemplarily, after the position information of the target positioning identifier is acquired, the position information of the user's current position may be determined by matching the identified positioning identifier with the positioning identifier corresponding to each anchor in the MLA. After the step 103a, the method may further include the following steps 103b1 to 103b4:
step 103b1, calculating the distance between the camera and the auxiliary positioning point according to the second position information of the auxiliary positioning point, and obtaining the third position information of the corresponding positioning reference point.
For example, after the first position information of the target positioning identifiers is obtained, the distance between each positioning identifier among the target positioning identifiers and the auxiliary positioning point may be calculated according to the first position information, yielding a distance array SM_n (where n is the number of target positioning identifiers). Then, the relative position relationship between the camera and the target positioning identifiers is calculated using the P3P algorithm, from which the distance SM_t between the camera and the auxiliary positioning point is obtained. Finally, the walking node corresponding to the distance SM_t is determined as the positioning reference point.
It should be noted that the P3P algorithm, a special case of Perspective-n-Point (PnP), is a method for solving motion from 3D-to-2D point correspondences. It describes how the pose of the camera is estimated when n 3D spatial points and their projected image positions are known. That is, given n 3D spatial points, the pose of the camera is solved; in the embodiment of the present application, n is 3.
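The pose estimation itself is available in standard libraries; the following OpenCV sketch is one assumed wiring (anchor coordinates, pixel coordinates and camera intrinsics are all illustrative), using cv2.SOLVEPNP_P3P, which expects exactly four point correspondences, consistent with the at-least-four identifiers noted above.

```python
import cv2
import numpy as np

object_points = np.array([[42.5, 18.0, 6.3],       # anchor positions from the MLA
                          [55.1, 20.4, 6.3],
                          [48.0, 22.7, 6.3],
                          [50.2, 19.1, 7.1]], dtype=np.float64)
image_points = np.array([[310.0, 240.5], [402.3, 238.1],
                         [355.8, 301.2], [370.4, 260.7]], dtype=np.float64)
K = np.array([[800.0, 0.0, 320.0],                 # assumed calibrated intrinsics
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None,
                              flags=cv2.SOLVEPNP_P3P)
if ok:
    R, _ = cv2.Rodrigues(rvec)
    camera_position = (-R.T @ tvec).ravel()        # camera center in the building frame
```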
Step 103b2, generating a buffer area centered on the positioning reference point.
And the radius of the buffer area is the sum of the target error range and a preset step length.
Step 103b3, screening walking nodes located in the buffer area range from the walking node set to obtain a walking node subset, and calculating the distance between each walking node in the walking node subset and the positioning reference point.
Step 103b4, determining the fourth position information of the walking node with the shortest distance between the walking node subset and the positioning reference point as the position information of the current position.
Wherein the auxiliary positioning point is used for constraint verification of the position information of the current position; the auxiliary positioning points are as follows: and any positioning point before the current position determines the position information.
Illustratively, after the target positioning identifier in the real-time image is recognized through the target model and the position information of each positioning identifier in the target positioning identifier is acquired, the relative position relationship between the camera and the target positioning identifier can be obtained through a P3P algorithm, and the three-dimensional space coordinate information of the camera in the target building is further obtained through calculation.
For example, in order to increase the accuracy of positioning, the embodiment of the present application further performs constraint verification on the calculation process of the position information of the current position through an auxiliary positioning point, so as to improve the accuracy of the finally obtained position information of the current position.
The auxiliary positioning point may be any known positioning point, and the position information of the current position may be calculated according to the above steps 103b1 to 103b4.
The above-described walking node set is the set of walking nodes provided in the target building, and the position information of the user's current position is expressed in terms of these walking nodes.
For example, the target error range may be determined based on the following steps, that is, after the step 103b4, the following steps 103b5 and 103b6 may be further included:
step 103b5, determining the shortest distance between the subset of walking nodes and the positioning reference point as the distance error.
Step 103b6, determining the maximum error range after the obtained plurality of distance errors are subjected to adaptive correction as the target error range.
For example, after determining the walking node in the walking node subset with the shortest distance to the positioning reference point and obtaining that shortest distance, the shortest distance may be taken as a distance error. Adaptive correction is then performed on it together with the previously obtained distance errors, producing a corrected maximum error range E_max. This maximum error range can serve as the target error range for the next positioning and participates in the calculation of the current position.
It can be understood that, when the initial positioning is performed, that is, when any shortest distance is not obtained yet, a preset value may be set for the target error range, and the target error range may be adjusted in the subsequent calculation process.
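Putting steps 103b2 to 103b6 together, the following sketch is one possible reading; the adaptive correction is simplified here to taking the maximum of the accumulated distance errors, which is an assumption, since the patent does not spell out the correction rule.

```python
import numpy as np

def locate(reference_point, walking_nodes, target_error, step_length, error_history):
    """Buffer the reference point, keep walking nodes inside the buffer,
    pick the nearest one as the current position, and feed the residual
    back into the target error range."""
    radius = target_error + step_length                  # buffer radius (step 103b2)
    nodes = np.asarray(walking_nodes, dtype=np.float64)
    ref = np.asarray(reference_point, dtype=np.float64)
    dists = np.linalg.norm(nodes - ref, axis=1)
    inside = dists <= radius                             # walking node subset (103b3)
    if not inside.any():
        return None, target_error
    idx = np.argmin(np.where(inside, dists, np.inf))     # nearest node in buffer (103b4)
    error_history.append(dists[idx])                     # shortest distance = distance error (103b5)
    new_target_error = max(error_history)                # assumed "adaptive" E_max (103b6)
    return tuple(nodes[idx]), new_target_error
```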
In a possible implementation manner, the auxiliary locating point may be an initial locating point at which a user performs real-time indoor scene locating. The initial positioning point can be obtained by accurate calculation through an auxiliary positioning device arranged in the target building.
Exemplarily, the map location anchor MLA may include MLA(S) and MLA(C). MLA(S) is the set of semantic information points in the target building that can be sensed and identified by the mobile terminal's sensors; it is constructed from the signals sensed by the barometer, accelerometer, gyroscope and Bluetooth sensor together with the geometric constraint information of the building space. The construction of MLA(S) adds geometric constraint information to multi-source-sensor indoor positioning and is mainly used for constrained positioning in different scenes. MLA(C) is the set of positioning identifiers used for auxiliary positioning within the target building.
When the auxiliary positioning point is the initial positioning point, the position information of the initial positioning point can be obtained from the MLA(S) part of the map location anchor MLA. That is, the mobile terminal can determine the initial positioning point from the auxiliary positioning devices arranged in the target building; in the subsequent calculation of the position information of the current position, that calculation is constraint-verified with the position information of the initial positioning point, improving the positioning accuracy of the current position.
Optionally, in this embodiment of the present application, the set of walking nodes may be obtained through the following steps.
For example, before the step 103, the indoor scene positioning method provided in the embodiment of the present application may further include the following steps 104 to 108:
and 104, obtaining a Building Information Model (BIM) model of the target building based on the three-dimensional model of the target building.
Illustratively, the primary condition for three-dimensional map modeling of a target building is acquiring its original engineering data, which include the engineering drawings and component tables used during construction as well as real images captured by cameras after completion. The original engineering data are input into Revit software for three-dimensional modeling to obtain an original model, from which a lightweight BIM model of the target building is then derived. The BIM model is rich in the information it describes but contains much redundant information that is unnecessary for positioning and navigation and reduces transmission efficiency for computers and mobile devices; the BIM model therefore needs to be simplified before information extraction. The lightweighting in the embodiment of the present invention is mainly directed at complicated wall structures: unnecessary lines and surfaces and redundant structural information are deleted through bridging, welding, sealing, deletion and other operations, while geometric information such as the vertices and normals of the original model is retained.
Step 105, extracting the spatial topological relation of the spatial nodes of the target building, and extracting the spatial coordinate information of the spatial nodes from the BIM model.
Step 106, constructing a road network organization in the horizontal and vertical directions based on the spatial topological relation and the spatial coordinate information, to obtain a network model.
Illustratively, the BIM model has rich indoor three-dimensional features that can provide spatial information about the target building's indoor environment, including geometric and topological relationships and certain interior components such as openings, facilities and surfaces. The BIM model suits various applications and can supply the connectivity among spaces that people need to pass through, but the extraction and organization of the semantic, geometric and spatial-topological data it contains is a key problem in building map model construction. The embodiment of the present invention places the spatial topological relations and geometric information contained in the BIM model into the network model. The target building is the building for which a building map hybrid data model for positioning and navigation needs to be constructed. The spatial nodes of the target building are the nodes representing its parts in the lightweight BIM model, such as window nodes, room nodes and stair nodes. The spatial topological relation of the spatial nodes refers to the positional association among them. The spatial coordinate information of each spatial node includes, but is not limited to, the spatial coordinates of corridor, stair, elevator and room nodes. In the embodiment of the application, the spatial topological relations of the target building's spatial nodes are extracted through the gbXML exported by Revit, and the spatial coordinate information of the spatial nodes is extracted from the lightweight BIM model using Blender, in order to construct the road network organization in the horizontal and vertical directions and obtain the network model.
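As an illustration of the road network organization, the following sketch (assuming the NetworkX library; the node names and coordinates are invented) links spatial nodes horizontally along a floor and vertically between floors:

```python
import networkx as nx

G = nx.Graph()
# spatial nodes extracted from the lightweight BIM model, with coordinates
G.add_node("corridor_3F", pos=(50.0, 20.0, 6.3))
G.add_node("room_312", pos=(42.5, 17.0, 6.3))
G.add_node("stair_east_3F", pos=(60.0, 20.0, 6.3))
G.add_node("stair_east_4F", pos=(60.0, 20.0, 9.6))

G.add_edge("corridor_3F", "room_312")          # horizontal connectivity
G.add_edge("corridor_3F", "stair_east_3F")
G.add_edge("stair_east_3F", "stair_east_4F")   # vertical connectivity between floors
```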
Step 107, extracting geometric information from the BIM model, and selecting the spaces represented by building components that belong to a specific space and have a boundary relation with it, to obtain an entity model.
Illustratively, the entity model is composed of three-dimensional building-component solid elements carrying a spatial concept. In a real indoor building environment, a space is usually enclosed by three-dimensional building-component entities such as walls and columns. Therefore, with reference to the spatial representation in the geometric boundary model, building components having a boundary relation with a particular space are selected to represent that space. In other words, in the entity model described in this application, a space is expressed through its boundary relationship with the surrounding building components, and it can be queried as an object in a spatial query operation.
Step 108, linking the entity model and the network model according to preset linking rules, to obtain a building map hybrid data model.
Wherein the preset linking rules include linking and mapping based on semantic relationships between the building elements in the entity model and the spatial nodes in the network model.
Illustratively, the link between the network model and the entity model mainly completes the linking and mapping between the spatial nodes in the network model and the three-dimensional component elements in the entity model. Their connection depends on the semantic relationship between them, of which there are two kinds: direct and indirect. A direct relationship exists between a connectivity node in the network model and a target node abstracted from an indoor facility component; such nodes correspond one-to-one with building entity components in the entity model. The indirect relationship arises mainly from the difference in spatial expression between the two models: in the network model a space is abstracted as a single node, whereas in the entity model a space is composed of several building components. The expression of indirect relationships therefore needs to be linked through the space itself: the relationship between the space and elements of the network model, and between the space and elements of the entity model, are established separately, and the indirect relationship between the two models is then established with the spatial elements as the medium. In the network model, since a node is a direct abstraction of a space, the relationship between network-model elements and spatial elements is one-to-one and therefore simple. In the entity model, the relationship between an element and a spatial element can be divided into an inclusion relationship, a boundary relationship and a spatial relationship: the inclusion relationship means the spatial element contains an entity-model element (for example, a lobby space contains a display element), while the boundary relationship refers to entity-model elements forming the boundary of a spatial element (for example, the boundary of a room space element is composed of entity-model elements such as doors and walls).
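One possible data-structure reading of these linking rules (a sketch under assumed types, not the patent's schema): a space element carries the one-to-one link to its network node plus its boundary and inclusion relations to entity-model components.

```python
from dataclasses import dataclass, field

@dataclass
class SpaceElement:
    """Mediates the indirect relationship between the two models."""
    name: str
    network_node: str                  # direct, one-to-one link to the network model
    boundary_components: list = field(default_factory=list)   # e.g. walls, doors
    contained_components: list = field(default_factory=list)  # e.g. a display in a lobby

room_312 = SpaceElement(
    name="room_312",
    network_node="room_312",          # the node abstracting this space
    boundary_components=["wall_0451", "wall_0452", "door_312"],
)
```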
For example, as shown in fig. 2, a solid circle represents a walking node and a hollow circle represents a walking end node. The leftmost dashed box shows the arrangement of walking nodes in a staircase, which comprises walking end nodes and walking nodes, each step being set as a walking node. Above it is the layout of one floor of the target building: the floor consists of a corridor in the middle, rooms arranged in sequence along both sides of the corridor, and two staircases; a sector on the side of each room represents its door, the central axis of the corridor is drawn as a bold line, and the connectivity between any two nodes is drawn as thin lines. First, a first group of walking end nodes is determined from the existing spatial nodes in the network model; then the turning nodes corresponding to each node in that first group are determined as a second group of walking end nodes; finally, the ends of the corridor center line in the network model, such as the two ends of the corridor in fig. 2 where the center line intersects the walls, are set as the last group of walking end nodes. All groups together form the complete set of walking end nodes.
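A minimal sketch of placing walking nodes between adjacent walking end nodes at the preset step length (the 0.6 m step and the coordinates are invented for the example):

```python
import numpy as np

def walking_nodes(end_a, end_b, step=0.6):
    """Interpolate walking nodes strictly between two walking end nodes,
    spaced one preset step apart along the straight segment joining them."""
    a, b = np.asarray(end_a, float), np.asarray(end_b, float)
    length = np.linalg.norm(b - a)
    n = int(length // step)
    return [tuple(a + (b - a) * (i * step / length))
            for i in range(1, n + 1) if i * step < length]

# e.g. along a 20 m corridor center line on the third floor
nodes = walking_nodes((40.0, 20.0, 6.3), (60.0, 20.0, 6.3), step=0.6)
```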
For example, in the embodiment of the present application, in order to construct the anchor data in the map location anchor MLA, the BIM model may be exported to the general three-dimensional format FBX and imported into Blender for capturing and collecting the map location anchors. Specifically, the geometric centers of universal building components in the BIM model, such as doors, doorplates, fire-fighting cabinets, fire alarms, emergency exits, cameras, WLAN access points, electronic boxes and elevators, can be used as the positioning anchors used in the embodiment of the present application.
According to the indoor scene positioning method provided by the embodiment of the application, a building indoor map location anchor graph is constructed from the attribute association relations of the elements of the building's indoor space. Map location anchors (MLA) are distributed through the building's indoor environment together with the road-network walking nodes. As the user moves, deep learning extracts accurate low-level features of the scene map to achieve semantic recognition of the building scene; the recognition results are matched against the map location anchors, and logical reasoning over the scene-element matching results deeply integrates the sensing, semantic, positioning and element-management information. More accurate position coordinates of the instantiated scene elements are thereby obtained, realizing mobile-phone indoor scene recognition and positioning under building-map semantic constraints.
It should be noted that, in the indoor scene positioning method provided in the embodiment of the present application, the execution main body may be an indoor scene positioning device, or a control module used for executing the indoor scene positioning method in the indoor scene positioning device. The indoor scene positioning device provided by the embodiment of the present application is described by taking an example of an indoor scene positioning method executed by an indoor scene positioning device.
In the embodiments of the present application, the above-described methods are illustrated in the drawings. The indoor scene positioning method is exemplarily described by referring to one of the drawings in the embodiments of the present application. In specific implementation, the indoor scene positioning method shown in each method drawing can also be implemented by combining any other drawing which can be combined and is illustrated in the above embodiments, and details are not repeated here.
The indoor scene positioning device provided by the present application is described below, and the indoor scene positioning methods described below and described above may be referred to in correspondence with each other.
Fig. 3 is a schematic structural diagram of an indoor scene positioning device according to an embodiment of the present application, and as shown in fig. 3, the indoor scene positioning device specifically includes:
an obtaining module 301, configured to obtain a real-time image in a target building, which is acquired by a camera; the identification module 302 is configured to input the real-time image into a target model, and identify a target location identifier in the real-time image; a positioning module 303, configured to determine, based on the first location information of the target positioning identifier, location information of a current location; the target positioning identifier is at least one of a positioning identifier set used for auxiliary positioning in a target building; the target model is obtained by training an improved YOLOv5 model by using a target sample set; the target sample set includes: an image containing a location marker for assisting in location within the target building.
Optionally, the apparatus further comprises: a training module; the obtaining module 301 is further configured to obtain images of different angles and/or different distances of the positioning identifiers used for auxiliary positioning in the target building, and generate the target sample set; the training module is configured to train the improved YOLOv5 model using the target sample set to obtain the target model; wherein the improved YOLOv5 model comprises a best-class-only part and removes the multi-label part of the original YOLOv5 model.
Optionally, the positioning module 303 is specifically configured to match the identification information of the target positioning identifier with the identification information of the positioning identifier in the map positioning anchor MLA of the target building, so as to obtain first position information corresponding to the target positioning identifier; wherein the map location anchor MLA comprises: at least one positioning anchor point; the anchor point information of each positioning anchor point comprises: identification information of the positioning identification and position information of the positioning identification.
Optionally, the positioning module 303 is further specifically configured to calculate the distance between the camera and an auxiliary positioning point according to second position information of the auxiliary positioning point, and obtain third position information of a corresponding positioning reference point; the positioning module 303 is further configured to generate a buffer area centered on the positioning reference point, the radius of the buffer area being the sum of a target error range and a preset step length; the positioning module 303 is further configured to screen walking nodes located within the buffer area from the walking node set to obtain a walking node subset, and calculate the distance between each walking node in the walking node subset and the positioning reference point; the positioning module 303 is further configured to determine, as the position information of the current position, fourth position information of the walking node in the walking node subset having the shortest distance to the positioning reference point; wherein the auxiliary positioning point is used for constraint verification of the position information of the current position, and the auxiliary positioning point is any positioning point whose position information was determined before the current position.
Optionally, the apparatus further comprises: a determining module; the determining module is configured to take the shortest distance between the walking node subset and the positioning reference point as a distance error; the determining module is further configured to determine, as the target error range, the maximum error range obtained after adaptively correcting the accumulated distance errors. One plausible reading of that correction is sketched below.
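The application does not fix a formula for the adaptive correction; one plausible reading, sketched below purely as an assumption, is to track the accumulated distance errors and inflate the maximum observed error by a safety margin.

```python
def update_target_error_range(distance_errors, margin=1.2):
    """Derive the target error range from accumulated distance errors.
    Taking the maximum observed error times an assumed safety margin is
    only one possible 'adaptive correction'; the patent leaves it open."""
    if not distance_errors:
        return 0.0
    return max(distance_errors) * margin
```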
Optionally, the apparatus further comprises: a construction module and an extraction module; the construction module is used for obtaining a building information model (BIM) of the target building from the three-dimensional model of the target building; the extraction module is used for extracting the spatial topological relations of the spatial nodes of the target building, and the spatial coordinate information of those nodes, from the BIM model; the construction module is further configured to construct a road-network organization in the horizontal and vertical directions from the spatial topological relations and the spatial coordinate information, obtaining a network model; the extraction module is further used for extracting geometric information from the BIM model; the construction module is further used for selecting the spaces represented by building components that belong to a specific space and have a bounding relation, obtaining an entity model; the construction module is further used for linking the entity model and the network model according to preset linking rules to obtain a building map hybrid data model; wherein the preset linking rules include linking and mapping based on semantic relationships between building elements in the entity model and spatial nodes in the network model, as in the data-structure sketch below.
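The hybrid data model can be pictured as two layers plus a set of links. The sketch below is a bare data-structure reading of the description; the field names and the one-rule linker are assumptions, and the patent's actual linking rules are richer semantic relationships.

```python
from dataclasses import dataclass, field

@dataclass
class SpaceNode:                 # node of the network (road-network) model
    node_id: str
    coord: tuple                 # spatial coordinate extracted from the BIM model
    neighbors: list = field(default_factory=list)   # spatial topological relation

@dataclass
class BuildingElement:           # element of the entity model
    element_id: str
    geometry: object             # geometric information extracted from the BIM model
    space_id: str                # the specific space it belongs to / bounds

@dataclass
class HybridMapModel:
    network: dict                # node_id -> SpaceNode
    entities: dict               # element_id -> BuildingElement
    links: list = field(default_factory=list)   # (element_id, node_id) pairs

def link_models(network, entities):
    """Link the entity and network models by one simple semantic rule:
    attach each building element to the node of the space it belongs to."""
    links = [(e.element_id, e.space_id)
             for e in entities.values() if e.space_id in network]
    return HybridMapModel(network=network, entities=entities, links=links)
```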
Optionally, the apparatus further comprises: a setting module; the setting module is used for setting walking nodes on the hybrid data model to obtain the walking node set; the walking nodes are nodes arranged at the preset step length between any two adjacent walking end nodes; the walking end nodes are set based on the position information of the spatial nodes in the network model and on the auxiliary positioning identifiers required for indoor navigation. The interpolation of walking nodes between two end nodes is sketched below.
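Between two adjacent walking end nodes, the walking nodes can be generated by interpolating along the connecting segment at the preset step length. The sketch below assumes straight-line segments in 2-D.

```python
import math

def set_walking_nodes(end_a, end_b, step_length):
    """Place walking nodes every step_length along the segment from
    end_a to end_b (the last node may coincide with end_b when the
    segment length is an exact multiple of the step)."""
    dx, dy = end_b[0] - end_a[0], end_b[1] - end_a[1]
    length = math.hypot(dx, dy)
    n = int(length // step_length)          # number of whole steps that fit
    return [(end_a[0] + dx * i * step_length / length,
             end_a[1] + dy * i * step_length / length)
            for i in range(1, n + 1)]
```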
The indoor scene positioning device provided by the application constructs a positioning-anchor diagram of the building's indoor map according to the attribute association relations among the elements of the building's indoor space. Map positioning anchors (MLA) are distributed through the indoor environment together with the road-network walking nodes. As the user moves, deep learning extracts accurate low-level features of the scene image to achieve semantic recognition of the building scene; the recognition results are matched against the map positioning anchors, and logical reasoning over the scene-element matching results deeply integrates the sensing, semantic, positioning, and element-management information. This yields more accurate position coordinates for the instantiated scene elements and realizes mobile-phone indoor scene recognition and positioning under the semantic constraints of the building map.
Fig. 4 illustrates a physical structure diagram of an electronic device. As shown in fig. 4, the electronic device may include: a processor (processor) 410, a communication interface (Communication Interface) 420, a memory (memory) 430 and a communication bus 440, wherein the processor 410, the communication interface 420 and the memory 430 communicate with one another via the communication bus 440. The processor 410 may invoke logic instructions in the memory 430 to perform an indoor scene positioning method, the method comprising: acquiring a real-time image captured by a camera within a target building; inputting the real-time image into a target model, and identifying a target positioning identifier in the real-time image; determining the position information of the current position based on the first position information of the target positioning identifier; the target positioning identifier is at least one of a set of positioning identifiers used for auxiliary positioning in the target building; the target model is obtained by training an improved YOLOv5 model with a target sample set; the target sample set includes images containing the positioning identifiers used for auxiliary positioning within the target building.
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, or the portion of it that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In another aspect, the present application further provides a computer program product. The computer program product includes a computer program stored on a readable storage medium; the computer program includes program instructions which, when executed by a computer, enable the computer to execute the indoor scene positioning method provided above, the method comprising: inputting the real-time image into a target model, and identifying a target positioning identifier in the real-time image; determining the position information of the current position based on the first position information of the target positioning identifier; the target positioning identifier is at least one of a set of positioning identifiers used for auxiliary positioning in a target building; the target model is obtained by training an improved YOLOv5 model with a target sample set; the target sample set includes images containing the positioning identifiers used for auxiliary positioning within the target building.
In yet another aspect, the present application further provides a readable storage medium having a computer program stored thereon; when executed by a processor, the computer program implements the indoor scene positioning method provided above, the method comprising: inputting the real-time image into a target model, and identifying a target positioning identifier in the real-time image; determining the position information of the current position based on the first position information of the target positioning identifier; the target positioning identifier is at least one of a set of positioning identifiers used for auxiliary positioning in a target building; the target model is obtained by training an improved YOLOv5 model with a target sample set; the target sample set includes images containing the positioning identifiers used for auxiliary positioning within the target building.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding, the above technical solutions may be embodied in the form of a software product, which may be stored in a readable storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. An indoor scene positioning method, comprising:
acquiring a real-time image captured by a camera within a target building;
inputting the real-time image into a target model, and identifying a target positioning identifier in the real-time image;
determining the position information of the current position based on the first position information of the target positioning identifier;
the target positioning identifier is at least one of a set of positioning identifiers used for auxiliary positioning in the target building; the target model is obtained by training an improved YOLOv5 model with a target sample set; the target sample set includes: images containing the positioning identifiers used for auxiliary positioning within the target building.
2. The method of claim 1, wherein before the inputting of the real-time image into the target model and the identifying of the target positioning identifier in the real-time image, the method further comprises:
acquiring images, at different angles and/or different distances, of the positioning identifiers used for auxiliary positioning in the target building, and generating the target sample set;
training the improved YOLOv5 model with the target sample set to obtain the target model;
wherein the improved YOLOv5 model keeps only the single best class ("best class only") for each detection and removes the multi-label prediction of the original YOLOv5 model.
3. The method of claim 1, wherein the determining of the position information of the current position based on the first position information of the target positioning identifier comprises:
matching the identification information of the target positioning identifier against the identification information of the positioning identifiers in the map positioning anchor (MLA) set of the target building, to obtain the first position information corresponding to the target positioning identifier;
wherein the map positioning anchor set comprises: at least one positioning anchor; the anchor information of each positioning anchor comprises: the identification information of the positioning identifier and the position information of that identifier.
4. The method according to claim 3, wherein after the matching of the identification information of the target positioning identifier against the identification information of the positioning identifiers in the map positioning anchor (MLA) set of the target building to obtain the first position information corresponding to the target positioning identifier, the method further comprises:
calculating the distance between the camera and the auxiliary positioning point according to the second position information of the auxiliary positioning point, and obtaining third position information of a corresponding positioning reference point;
generating a buffer area by taking the positioning reference point as a center; the radius of the buffer area is the sum of a target error range and a preset step length;
screening walking nodes positioned in the range of the buffer area from a walking node set to obtain a walking node subset, and calculating the distance between each walking node in the walking node subset and the positioning reference point;
determining, as the position information of the current position, fourth position information of the walking node in the walking node subset that is closest to the positioning reference point;
wherein the auxiliary positioning point is used for constraint verification of the position information of the current position; the auxiliary positioning point is any positioning point whose position information was determined before the current position.
5. The method of claim 4, wherein after the calculating of the distance between each walking node in the walking node subset and the positioning reference point, the method further comprises:
determining the shortest distance between the walking node subset and the positioning reference point as a distance error;
and determining, as the target error range, the maximum error range obtained after adaptively correcting the obtained plurality of distance errors.
6. The method of claim 4, wherein before the determining of the position information of the current position based on the first position information of the target positioning identifier, the method further comprises:
obtaining a building information model (BIM) of the target building based on the three-dimensional model of the target building;
extracting the spatial topological relations of the spatial nodes of the target building, and extracting the spatial coordinate information of the spatial nodes, from the BIM model;
constructing a road-network organization in the horizontal and vertical directions based on the spatial topological relations and the spatial coordinate information, to obtain a network model;
extracting geometric information from the BIM model, and selecting the spaces represented by building components that belong to a specific space and have a bounding relation, to obtain an entity model;
linking the entity model and the network model according to preset linking rules to obtain a building map hybrid data model;
wherein the preset linking rules include linking and mapping based on semantic relationships between building elements in the entity model and spatial nodes in the network model.
7. The method of claim 6, wherein after the linking of the entity model and the network model according to the preset linking rules to obtain the building map hybrid data model, the method further comprises:
setting walking nodes on the hybrid data model to obtain a walking node set;
wherein the walking nodes are nodes arranged at the preset step length between any two adjacent walking end nodes; the walking end nodes are set based on the position information of the spatial nodes in the network model and on the auxiliary positioning identifiers required for indoor navigation.
8. An indoor scene positioning device, the device comprising:
the acquisition module is used for acquiring a real-time image captured by a camera within the target building;
the identification module is used for inputting the real-time image into a target model and identifying a target positioning identifier in the real-time image;
the positioning module is used for determining the position information of the current position based on the first position information of the target positioning identifier;
the target positioning identifier is at least one of a set of positioning identifiers used for auxiliary positioning in the target building; the target model is obtained by training an improved YOLOv5 model with a target sample set; the target sample set includes: images containing the positioning identifiers used for auxiliary positioning within the target building.
9. A readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the indoor scene positioning method according to any one of claims 1 to 7.
10. A computer program product comprising a computer program/instructions, wherein the computer program/instructions, when executed by a processor, implement the steps of the indoor scene positioning method according to any one of claims 1 to 7.
CN202210089833.4A 2022-01-25 2022-01-25 Indoor scene positioning method and device Pending CN114494436A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210089833.4A CN114494436A (en) 2022-01-25 2022-01-25 Indoor scene positioning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210089833.4A CN114494436A (en) 2022-01-25 2022-01-25 Indoor scene positioning method and device

Publications (1)

Publication Number Publication Date
CN114494436A true CN114494436A (en) 2022-05-13

Family

ID=81475016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210089833.4A Pending CN114494436A (en) 2022-01-25 2022-01-25 Indoor scene positioning method and device

Country Status (1)

Country Link
CN (1) CN114494436A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115631240A (en) * 2022-12-21 2023-01-20 速度时空信息科技股份有限公司 Visual positioning data processing method for large-scale scene
CN115631240B (en) * 2022-12-21 2023-05-26 速度时空信息科技股份有限公司 Visual positioning data processing method for large-scale scene
CN115950437A (en) * 2023-03-14 2023-04-11 北京建筑大学 Indoor positioning method, positioning device, equipment and medium
CN116403008A (en) * 2023-05-29 2023-07-07 广州市德赛西威智慧交通技术有限公司 Map acquisition method, device and equipment for driving school training site and storage medium
CN116403008B (en) * 2023-05-29 2023-09-01 广州市德赛西威智慧交通技术有限公司 Map acquisition method, device and equipment for driving school training site and storage medium

Similar Documents

Publication Publication Date Title
Zorzi et al. Polyworld: Polygonal building extraction with graph neural networks in satellite images
CN114494436A (en) Indoor scene positioning method and device
CN111459166B (en) Scene map construction method containing trapped person position information in post-disaster rescue environment
CN103703758B (en) mobile augmented reality system
Tsai et al. Real-time indoor scene understanding using bayesian filtering with motion cues
CN111486855A (en) Indoor two-dimensional semantic grid map construction method with object navigation points
CN108256431B (en) Hand position identification method and device
CN111126304A (en) Augmented reality navigation method based on indoor natural scene image deep learning
EP3274964B1 (en) Automatic connection of images using visual features
CN111881804B (en) Posture estimation model training method, system, medium and terminal based on joint training
CN114758337B (en) Semantic instance reconstruction method, device, equipment and medium
US11631195B2 (en) Indoor positioning system and indoor positioning method
CN111024089A (en) Indoor positioning navigation method based on BIM and computer vision technology
CN109063549A (en) High-resolution based on deep neural network is taken photo by plane video moving object detection method
Dubey et al. Identifying indoor navigation landmarks using a hierarchical multi-criteria decision framework
Serrão et al. Navigation framework using visual landmarks and a GIS
Ogawa et al. Deep Learning Approach for Classifying the Built Year and Structure of Individual Buildings by Automatically Linking Street View Images and GIS Building Data
Kostoeva et al. Indoor 3D interactive asset detection using a smartphone
CN111932612A (en) Intelligent vehicle vision positioning method and device based on second-order hidden Markov model
CN115357500A (en) Test method, device, equipment and medium for automatic driving system
Shoushtari et al. 3d indoor localization using 5g-based particle filtering and cad plans
CN113989680A (en) Automatic building three-dimensional scene construction method and system
Vasin et al. Autonomous indoor 3D navigation
JP7204087B2 (en) Object recognition device
Abu-Khader Construction progress monitoring of masonry walls using BIM-computer vision models interaction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination