WO2020258820A1 - Mobile side vision fusion positioning method and system, and electronic device - Google Patents

Mobile side vision fusion positioning method and system, and electronic device

Info

Publication number
WO2020258820A1
WO2020258820A1 (PCT/CN2019/130553)
Authority
WO
WIPO (PCT)
Prior art keywords
mobile terminal
positioning
video frame
static object
current position
Prior art date
Application number
PCT/CN2019/130553
Other languages
French (fr)
Chinese (zh)
Inventor
赵希敏
胡金星
Original Assignee
中国科学院深圳先进技术研究院
Priority date
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Publication of WO2020258820A1 publication Critical patent/WO2020258820A1/en

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations

Definitions

  • This application belongs to the interdisciplinary field of artificial intelligence and geographic information technology, and in particular relates to a mobile terminal vision fusion positioning method, system, and electronic device.
  • The Global Navigation Satellite System (GNSS) can realize navigation and positioning outdoors.
  • Currently, radio positioning technologies represented by GNSS, cellular networks, and Wi-Fi can already achieve sub-meter positioning in open outdoor environments.
  • The principle is to detect characteristic parameters of the propagated signal to determine position.
  • Common methods include proximity detection and observed time difference of arrival (OTDOA).
  • Indoor positioning technology mainly provides positioning and tracking of people and objects in various indoor spaces.
  • The demand for safety and monitoring of people and objects based on indoor positioning is growing.
  • The demand for location services in indoor environments has become increasingly significant.
  • Scholars at home and abroad have carried out extensive exploration and research.
  • Most indoor positioning systems are based on proximity detection, triangulation, multilateration, or fingerprint positioning, or adopt combined positioning methods to improve accuracy.
  • However, due to multipath effects, the indoor environment is changeable and complex, and no universal solution exists; improving accuracy, real-time performance, security, scalability, low cost, convenience, and specialization are current research hotspots.
  • Indoor radio positioning technologies such as Wi-Fi and Bluetooth mostly use the Received Signal Strength Indication (RSSI) as the basis of the positioning algorithm, exploiting the relationship between signal attenuation and distance.
  • Wi-Fi indoor positioning generally includes the positioning method based on RSSI distance intersection and the RSSI location fingerprint method.
  • Signal matching is the main research problem, and positioning accuracy depends on the density of calibration points.
  • Because the technology is easy to extend, updates data automatically, and is low-cost, it was the first to reach large-scale application.
  • Bluetooth positioning is based on a short-distance, low-power communication protocol; it can be implemented by centroid positioning, fingerprint positioning, or proximity detection.
  • Bluetooth positioning has the advantages of low power consumption, short range, and wide availability, but it also suffers from poor stability and strong environmental interference.
  • iBeacon, the precise micro-positioning technology developed by Apple on Bluetooth Low Energy, works similarly to earlier Bluetooth techniques: the Beacon transmits a signal, and Bluetooth devices receive and respond to it.
  • When a user enters, exits, or wanders within the area, the Beacon broadcast propagates and the distance between the user and the Beacon can be calculated from RSSI; therefore, three iBeacon devices are sufficient for positioning.
  • The above indoor positioning technologies all rely on radio-frequency signals. Radio signals are easily affected by the indoor environment, such as obstacles, and environmental changes degrade positioning accuracy; moreover, they require construction work, a large amount of equipment, and high deployment and maintenance costs.
  • Visual sensor positioning obtains the current relative position through triangulation. Compared with other sensor-based positioning methods, visual sensors offer higher positioning accuracy at low cost.
  • The EasyLiving system is a computer-vision-based positioning system that uses high-performance mobile terminals and achieves relatively high accuracy; however, when the indoor environment is complex it is difficult to maintain high accuracy at all times.
  • Visual sensors can be introduced through the principle of Simultaneous Localization And Mapping (SLAM) for mobile robots.
  • The EV-Loc indoor positioning system, proposed in 2012, uses visual signals as auxiliary positioning to improve accuracy.
  • Google's Visual Positioning Service (VPS), based on the principle of visual positioning, has a theoretical accuracy at the centimeter level.
  • However, existing visual positioning systems such as EasyLiving and Google VPS are mostly based on the SLAM principle: they extract feature points captured by the visual sensor and use triangulation ranging, combined with acceleration, gyroscope, and other sensors, to calculate the movement offset of the current position, which yields only relative positioning.
  • To achieve accurate indoor geographic positioning, a large number of manual markers must be deployed at fixed points in advance, and this preliminary preparation is tedious.
  • At the same time, these existing visual positioning technologies only consider the raw data sensed by the sensors and do not make use of the semantic information carried by the data.
  • This application provides a mobile terminal vision fusion positioning method, system, and electronic device, which aims to solve at least one of the above technical problems in the prior art to a certain extent.
  • A mobile terminal vision fusion positioning method includes the following steps:
  • Step a: acquire the initial position of the mobile terminal based on the calibrated starting position and sensor information, and set the initial position as the current position of the positioning target;
  • Step b: use the mobile terminal to acquire video frames;
  • Step c: detect static objects in the video frame, obtain the coordinate information of the static objects from the BIM spatial database, substitute the coordinate information of the static objects into the multi-target object positioning model, iteratively solve the positioning model by the Gauss-Newton method to obtain the current position of the mobile terminal, and combine the current position of the mobile terminal with the coordinate information of the static objects to obtain the positioning result of the positioning target.
  • The technical solution adopted in the embodiments of the application further includes: in step b, after using the mobile terminal to acquire the video frame, the method further includes: the visual odometer calculates the current pose information of the mobile terminal according to the acquired video frame.
  • The technical solution adopted in the embodiments of the present application further includes: the visual odometer calculating the current pose information of the mobile terminal according to the acquired video frames specifically includes:
  • Step b1: the visual odometer scales the acquired video frame to a set size, stores it in the image sliding window, and judges whether the current video frame is the first frame; if it is the first frame, only the key point extraction operation is performed; otherwise, key points are extracted and the residual between the current key points and the key points of the previous video frame is calculated. The residual e of a single key point is the error of the key point's pixel brightness, calculated as:
  • e = I_1(x_1) - I_2(x_2) = I_1(K p_1) - I_2(K(R p_1 + t))
  • where I_2 is obtained from I_1 after a certain motion, R and t describe the motion trajectory of the mobile terminal, x_1 is the pixel position of the key point in image I_1, x_2 is the pixel position of the key point in image I_2, p_1 is the coordinate of the key point in real space, and K is the intrinsic parameter matrix of the mobile terminal;
  • Step b2: use the Gauss-Newton method to solve the residual Jacobian, obtain the motion pose between the current video frame and the previous video frame, and record it in the pose storage sliding window;
  • Step b3: obtain the mobile terminal pose of the current video frame, extract the spatial offset of the pose, and convert the spatial offset into a relative coordinate offset value, which is the motion offset of the mobile terminal.
  • The technical solution adopted in the embodiments of the application further includes: step b also includes judging the monitored positioning state; if the system is not in the positioning state, the current pose information of the mobile terminal is added to the most recently acquired current position to update the current position of the positioning target; if it is in the positioning state, step c is performed.
  • The technical solution adopted in the embodiments of the application further includes: in step c, detecting the static objects in the video frame, obtaining the coordinate information of the static objects from the BIM spatial database, substituting the coordinate information of the static objects into the multi-target object positioning model, iteratively solving the positioning model by the Gauss-Newton method, and obtaining the current position of the mobile terminal specifically includes:
  • Step c1: take out a video frame and input it into the target detection neural network to obtain the types of static objects contained in the video frame, set the center pixel position of each static object as a calibration point, then take out the next video frame and the mobile terminal pose information of that frame, and use triangulation to calculate the depth information between the calibration point and the mobile terminal; in the triangulation relation,
  • s_1 and s_2 are the depth information of the key points;
  • Step c2: using the identified static object categories, load the BIM spatial information database with the coordinate information near the current position, and obtain the coordinate information of the static objects from the BIM spatial information database;
  • Step c3: substitute the coordinate information of the static objects into the positioning model and solve it iteratively by the Gauss-Newton method to obtain the current position of the mobile terminal; in the system of equations solved by the Gauss-Newton method,
  • (x, y, z) is the current position of the mobile terminal,
  • (x_n, y_n, z_n) is the static object coordinate information stored in the BIM spatial database,
  • ρ_n is the depth from the static object to the mobile terminal, and
  • ε_n is the depth measurement noise;
  • Step c4: combine the current position of the mobile terminal with the static object coordinate information to obtain the positioning result of the current position.
  • A mobile terminal vision fusion positioning system includes:
  • Initial positioning unit: used to obtain the initial position of the mobile terminal based on the calibrated starting position and sensor information, and set the initial position as the current position of the positioning target;
  • Video frame acquisition module: used to acquire video frames with the mobile terminal;
  • Target positioning module: used to detect static objects in the video frame, obtain the coordinate information of the static objects from the BIM spatial database, substitute the coordinate information of the static objects into the multi-target object positioning model, iteratively solve the positioning model by the Gauss-Newton method to obtain the new current position of the mobile terminal, and combine the new current position of the mobile terminal with the coordinate information of the static objects to obtain the positioning result of the positioning target.
  • The technical solution adopted in the embodiments of the present application further includes a pose calculation module, which is used to calculate the current pose information of the mobile terminal from the acquired video frames through the visual odometer.
  • The pose calculation module includes:
  • Key point extraction unit: used to scale the acquired video frame to a set size through the visual odometer, store it in the image sliding window, and judge whether the current video frame is the first frame; if it is the first frame, only the key point extraction operation is performed; otherwise, key points are extracted and the residual between the current key points and the key points of the previous video frame is calculated. The residual e of a single key point is the error of the key point's pixel brightness, calculated as:
  • e = I_1(x_1) - I_2(x_2) = I_1(K p_1) - I_2(K(R p_1 + t))
  • where I_2 is obtained from I_1 after a certain motion, R and t describe the motion trajectory of the mobile terminal, x_1 is the pixel position of the key point in image I_1, x_2 is the pixel position of the key point in image I_2, p_1 is the coordinate of the key point in real space, and K is the intrinsic parameter matrix of the mobile terminal;
  • Motion pose solving unit: used to solve the residual Jacobian with the Gauss-Newton method, obtain the motion pose between the current video frame and the previous video frame, and record it in the pose storage sliding window;
  • Motion offset calculation unit: used to obtain the mobile terminal pose of the current video frame, extract the spatial offset of the pose, and convert the spatial offset into a relative coordinate offset value, which is the motion offset of the mobile terminal.
  • The technical solution adopted in the embodiments of this application also includes a positioning judgment module and a position update module.
  • The positioning judgment module is used to monitor the positioning state: if the system is not in the positioning state, the position update module adds the current pose information of the mobile terminal to the most recently acquired current position and updates the current position of the positioning target; if it is in the positioning state, the positioning result of the positioning target is obtained through the target positioning module.
  • The target positioning module specifically includes:
  • Object recognition and depth calculation unit: used to take out a video frame, input it into the target detection neural network to obtain the types of static objects contained in the video frame, set the center pixel position of each static object as a calibration point, then take out the next video frame and the mobile terminal pose information of that frame, and use triangulation to calculate the depth information between the calibration point and the mobile terminal; in the triangulation relation,
  • s_1 and s_2 are the depth information of the key points;
  • Coarse positioning unit: used to load the BIM spatial information database with the coordinate information near the current position according to the identified static object categories, and obtain the coordinate information of the static objects from the BIM spatial information database;
  • Fine positioning unit: used to substitute the coordinate information of the static objects into the positioning model and solve it iteratively by the Gauss-Newton method to obtain the current position of the mobile terminal; in the system of equations solved by the Gauss-Newton method,
  • (x, y, z) is the current position of the mobile terminal,
  • (x_n, y_n, z_n) is the static object coordinate information stored in the BIM spatial database,
  • ρ_n is the depth from the static object to the mobile terminal, and
  • ε_n is the depth measurement noise;
  • Positioning result generating unit: used to combine the current position of the mobile terminal with the static object coordinate information to obtain the positioning result.
  • An electronic device includes:
  • at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable
  • the at least one processor to perform the following operations of the aforementioned mobile terminal vision fusion positioning method:
  • Step a: acquire the initial position of the mobile terminal based on the calibrated starting position and sensor information, and set the initial position as the current position of the positioning target;
  • Step b: use the mobile terminal to acquire video frames;
  • Step c: detect static objects in the video frame, obtain the coordinate information of the static objects from the BIM spatial database, substitute the coordinate information of the static objects into the multi-target object positioning model, iteratively solve the positioning model by the Gauss-Newton method to obtain the current position of the mobile terminal, and combine the current position of the mobile terminal with the coordinate information of the static objects to obtain the positioning result of the positioning target.
  • The beneficial effects produced by the embodiments of the present application are as follows: the mobile terminal vision fusion positioning method, system, and electronic device of the embodiments use visual sensors to detect and identify static objects in the real world to obtain object spatial relationships, match them geographically and topologically against the object spatial relationships provided by the BIM model, then establish a nonlinear equation set from the distance measurements to the objects, solve the equation set iteratively, and converge to a precise position, thereby realizing a more accurate, convenient, and cheaper positioning method.
  • FIG. 1 is a flowchart of a method for visual fusion positioning on a mobile terminal according to an embodiment of the present application
  • Figure 2 is a schematic diagram of the target detection neural network structure
  • Figure 3 is a schematic diagram of key point selection
  • FIG. 4 is a schematic structural diagram of a mobile terminal visual fusion positioning system according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the hardware device structure of the mobile terminal vision fusion positioning method provided by an embodiment of the present application.
  • FIG. 1 is a flowchart of a mobile terminal vision fusion positioning method according to an embodiment of the present application.
  • the mobile terminal vision fusion positioning method of the embodiment of the present application includes the following steps:
  • Step 100: system initialization.
  • In step 100, system initialization includes the following steps:
  • Step 110: initialize the visual odometer.
  • The initialization of the visual odometer includes operations such as memory allocation for the pose manager and initialization of variables.
  • The pose manager includes main data structures such as the pose storage sliding window, the image sliding window, and the key points.
  • The pose storage sliding window is used to store the calculated pose information;
  • the image sliding window is used to cache the frames captured by the mobile terminal, awaiting key point extraction and calibration point depth estimation;
  • the key points identify the pixel gradient changes of certain areas in a frame of image and are used for similarity comparison of subsequent images.
  • Besides these main data structures, the pose manager also includes a key point extraction function, a key point similarity estimation function, a key point update function, a key point discard function, a key point index function, a key point depth estimation function, a frame pose estimation function, and pose addition, deletion, and query functions.
  • Step 120: initialization of the semantic positioning calibration.
  • The initialization of the semantic positioning calibration includes training of the target detection neural network, loading of the model, and generation and loading of the BIM spatial database.
  • The target detection neural network structure is shown in Figure 2.
  • The target detection neural network described in this application follows an existing target detection method; the network structure itself is not changed, and only a proprietary static object data set is used when training the network. After training and optimization, the network is deployed to the mobile terminal.
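  • As an illustration of this detection step (class label plus the center pixel used as the calibration point), the sketch below assumes a generic detector callable that returns labeled bounding boxes; it is not the specific network of Figure 2, and the dictionary field names are illustrative assumptions.

```python
# Illustrative post-processing of detector output: take each detected static
# object's class label and use its bounding-box center pixel as the
# calibration point. `detector` is a placeholder for the trained network of
# Figure 2; the keys "label", "score", and "box" are assumptions.
def calibration_points(frame, detector, score_threshold=0.5):
    points = []
    for det in detector(frame):
        if det["score"] < score_threshold:
            continue
        x_min, y_min, x_max, y_max = det["box"]
        center = ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)
        points.append((det["label"], center))
    return points
```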
  • the BIM spatial database is constructed and indexed using the R-tree method.
  • The BIM spatial data structure includes the electronic map of the area where the current location is located, the categories of static objects contained in the area, the coordinate information of the static objects, other static objects near each static object, and the spatial layout information of each floor of the building where the current location is located.
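  • For illustration, the following sketch shows one possible way to build and query such an R-tree index over BIM static objects, assuming the third-party Python rtree package; the record fields and the helper names are illustrative assumptions, not part of the patent.

```python
# Illustrative sketch only: a 3-D R-tree index over BIM static objects,
# assuming the third-party `rtree` package. Record fields are hypothetical.
from rtree import index

def build_bim_index(records):
    """records: iterable of dicts such as
       {"id": 1, "category": "fire_extinguisher", "xyz": (12.3, 4.5, 1.0)}."""
    props = index.Property()
    props.dimension = 3
    idx = index.Index(properties=props)
    for record in records:
        x, y, z = record["xyz"]
        # Point objects are stored as degenerate boxes (min == max).
        idx.insert(record["id"], (x, y, z, x, y, z), obj=record)
    return idx

def query_static_object(idx, category, current_xyz, k=5):
    """Return BIM coordinates of nearby stored objects of the recognized
       category around the current (coarse) position."""
    x, y, z = current_xyz
    hits = idx.nearest((x, y, z, x, y, z), num_results=k, objects=True)
    return [h.object["xyz"] for h in hits if h.object["category"] == category]
```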
  • Step 130: acquire the initial position of the mobile terminal based on the calibrated starting position and sensor information, and set the initial position as the current position of the positioning target.
  • The mobile terminal in the embodiments of the present application is an Android terminal device equipped with a 9-axis IMU sensor.
  • The initial position can be set by marking the starting point on an indoor plan map, or by automatic recognition of unique signs, and is then accumulated using the accelerometer, gyroscope, and other sensors.
  • Step 200: obtain video frames using the mobile terminal.
  • Step 300: the visual odometer calculates the current pose information of the mobile terminal according to the acquired video frames.
  • The pose is a motion matrix, which contains information such as the rotation and translation of the mobile terminal.
  • The visual odometer uses the pixel residuals of key points to estimate the pose change of the mobile terminal between two adjacent frames, and then obtains the movement offset of the mobile terminal.
  • the visual odometer pose calculation includes:
  • Step 310: the visual odometer scales the acquired video frame to 300px*300px (the specific size can be set according to the actual application), stores it in the image sliding window, and judges whether the current video frame is the first frame; if it is the first frame, only the key points are extracted; otherwise, the key points are extracted and the residuals between them and the key points of the previous video frame are calculated.
  • In step 310, the key point calculation proceeds as follows: first, select a pixel p on the video frame and take 30% of its gray value G as the threshold T. Then, taking pixel p as the center, select 16 pixels on a circle with a radius of 3. If there are N consecutive points on that circle whose brightness is greater than G+T or less than G-T, pixel p can be considered a key point of the video frame. Repeat the above steps until all pixels of the video frame have been traversed; finally, non-maximum suppression is applied, and within each 50px*50px window only the key point with the maximum response is retained. Figure 3 shows a schematic diagram of key point selection.
  • For example, the sixteen pixel values around point p on a circle of radius 3 are the gray cells shown in Figure 3.
  • The gray cells are the selected N points; their values are compared with that of p, and whether p is a key point is determined according to the above rule.
  • Pixel values range from 0 to 255.
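  • The selection rule above is close to the FAST corner test. A simplified sketch under the stated parameters (threshold T = 0.3·G, 16 pixels on a circle of radius 3, N consecutive brighter or darker pixels, 50px*50px non-maximum suppression) might look as follows; the circle offsets, the default N = 9, and the response used for suppression are assumptions rather than values given in the text.

```python
import numpy as np

# Offsets of the 16 pixels on a Bresenham circle of radius 3 around p, as
# commonly used by the FAST detector (an assumption; the text only states
# "16 pixels on a circle with a radius of 3").
CIRCLE = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
          (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)]

def is_key_point(gray, r, c, n_consecutive=9):
    """Apply the brightness test of step 310 at pixel (r, c)."""
    G = float(gray[r, c])
    T = 0.3 * G                       # threshold: 30% of the gray value
    ring = np.array([float(gray[r + dr, c + dc]) for dr, dc in CIRCLE])
    for flags in (ring > G + T, ring < G - T):
        wrapped = np.concatenate([flags, flags])   # handle wrap-around runs
        run = 0
        for f in wrapped:
            run = run + 1 if f else 0
            if run >= n_consecutive:
                return True
    return False

def detect_key_points(gray, window=50):
    """Scan the frame, then keep one maximum-response point per window."""
    h, w = gray.shape
    candidates = [(r, c) for r in range(3, h - 3) for c in range(3, w - 3)
                  if is_key_point(gray, r, c)]
    best = {}
    for r, c in candidates:
        key = (r // window, c // window)
        # Simple response: contrast between p and the mean of its ring.
        resp = abs(float(gray[r, c]) -
                   np.mean([float(gray[r + dr, c + dc]) for dr, dc in CIRCLE]))
        if key not in best or resp > best[key][0]:
            best[key] = (resp, (r, c))
    return [pt for _, pt in best.values()]
```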
  • The residual e of a single key point is the error of the key point's pixel brightness, calculated as:
  • e = I_1(x_1) - I_2(x_2) = I_1(K p_1) - I_2(K(R p_1 + t))
  • where I_2 is obtained from I_1 after a certain motion, R and t describe the motion trajectory of the mobile terminal, x_1 is the pixel position of the key point in image I_1, x_2 is the pixel position of the key point in image I_2, p_1 is the coordinate of the key point in real space, and K is the intrinsic parameter matrix of the mobile terminal.
  • Step 320: use the Gauss-Newton method to solve the residual Jacobian, obtain the motion pose between the current video frame and the previous video frame, and record it in the pose storage sliding window.
  • In step 320, the Gauss-Newton solution of the residual Jacobian involves the following quantities:
  • the pose of the mobile terminal is the variable being optimized,
  • J is the gradient of the residual with respect to the Lie algebra, and
  • the iteration increment is the update applied at each step.
  • The Gauss-Newton method solves the optimization problem by iteratively descending along the gradient of the objective function with a given step.
  • Step 321: given the initial point p_0, the number of iterations k, and the allowable error ε > 0; when the iteration count and the error no longer satisfy the conditions, perform step 330;
  • Step 322: if the decrease of the objective function, f(X_{k+1}) - f(X_k), is less than the threshold ε, exit; otherwise go to step 323;
  • Step 323: calculate the iteration increment, substitute it into the objective function, and return to step 321.
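  • A minimal, generic Gauss-Newton loop matching steps 321-323 (iteration budget, convergence check on the decrease of the objective, increment update) is sketched below; the residual and Jacobian callables are placeholders, not the photometric residual defined above.

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, max_iters=20, eps=1e-6):
    """Generic Gauss-Newton iteration.

    residual(x) -> (m,) residual vector, jacobian(x) -> (m, n) Jacobian.
    Stops when the decrease of the objective f(x) = ||r(x)||^2 falls below
    eps (step 322) or the iteration budget is exhausted (step 321)."""
    x = np.asarray(x0, dtype=float)
    f_prev = float(residual(x) @ residual(x))
    for _ in range(max_iters):                 # step 321: iteration budget
        r = residual(x)
        J = jacobian(x)
        # step 323: solve the normal equations J^T J dx = -J^T r
        dx = np.linalg.solve(J.T @ J, -J.T @ r)
        x = x + dx
        f_curr = float(residual(x) @ residual(x))
        if abs(f_prev - f_curr) < eps:         # step 322: objective decrease
            break
        f_prev = f_curr
    return x
```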
  • Step 330: obtain the pose of the mobile terminal for the current video frame, which is a row vector with six degrees of freedom; extract the spatial offset of the pose and convert it into a relative coordinate offset value, which is the motion offset of the mobile terminal.
  • Step 400: judge the positioning monitoring state; if the system is not in the positioning state, perform step 500; otherwise, perform step 600.
  • The positioning monitor is callback-based and may be triggered at any time after initialization is completed.
  • This application starts the visual odometer pose calculation after acquiring a video frame. If the system is not in the positioning state, the calculated pose information is added to the most recently obtained current position to produce the updated current position.
  • This step is executed repeatedly and does not stop when the positioning state changes.
  • When the positioning monitor is in the positioning state, the semantic positioning calibration is invoked: the current position is calculated from the recognized objects, replacing the movement offset calculated by the visual odometer, the user's current position is updated in combination with the static object coordinates in the BIM spatial database, and that position is then used as the final position.
  • Step 500: after adding the current pose information to the most recently acquired current position, update the current position of the positioning target.
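  • A compact sketch of the dispatch in steps 400-600 is given below (accumulating the visual odometry offset when not in the positioning state, invoking the semantic positioning calibration otherwise); the function names stand in for the modules described above and are not APIs defined by the patent.

```python
# Illustrative control flow for steps 400-600. `visual_odometry_offset` and
# `semantic_positioning_calibration` are placeholders for the modules
# described in the text, not names defined by the patent.
def update_position(current_position, frame, positioning_state,
                    visual_odometry_offset, semantic_positioning_calibration):
    if not positioning_state:
        # Step 500: accumulate the VO motion offset onto the last position.
        dx, dy, dz = visual_odometry_offset(frame)
        x, y, z = current_position
        return (x + dx, y + dy, z + dz)
    # Step 600: recompute the position from recognized static objects and the
    # BIM spatial database, replacing the VO offset.
    return semantic_positioning_calibration(frame)
```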
  • Step 600: invoke the semantic positioning calibration to detect static objects in the video frame, and use the calibration points with the triangulation method (triangulation refers to observing the angles to the same point from two places in order to determine the distance to that point) to estimate the depth of the static objects;
  • a multi-target object positioning model is proposed based on the depth information of the static objects and the semantic information of the spatial data; the positioning model is solved iteratively by the Gauss-Newton method to obtain the static object coordinate information and the current position of the mobile terminal, and the static object coordinate information is combined with the current position of the mobile terminal to obtain the positioning result of the positioning target.
  • In step 600, the positioning procedure of the semantic positioning calibration specifically includes the following steps:
  • Step 610: take a video frame out of the image sliding window and input it into the target detection neural network to obtain the types of static objects contained in the video frame, set the center pixel position of each identified static object as a calibration point, then take out the next video frame and the mobile terminal pose information of that frame, and use the triangulation method to calculate the depth information between the calibration point and the mobile terminal.
  • In the triangulation relation used in step 610, s_1 and s_2 are the depth information of the key points.
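  • The triangulation formula itself is not reproduced on this page. A standard two-view relation consistent with the symbols above is s_1·x_1 = s_2·R·x_2 + t, where x_1 and x_2 are the normalized image coordinates of the calibration point in the two frames and (R, t) is the relative pose between them; the least-squares sketch below assumes that relation.

```python
import numpy as np

def triangulate_depths(x1, x2, R, t):
    """Solve s1 * x1 = s2 * (R @ x2) + t for the depths (s1, s2) in the
    least-squares sense. x1 and x2 are normalized homogeneous image
    coordinates (3-vectors); R (3x3) and t (3,) are the relative pose
    between the two frames. This standard two-view relation is an
    assumption; the page only names s1 and s2."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    A = np.stack([x1, -(np.asarray(R, dtype=float) @ x2)], axis=1)  # 3x2 system
    s, *_ = np.linalg.lstsq(A, np.asarray(t, dtype=float), rcond=None)
    s1, s2 = s
    return s1, s2
```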
  • Step 620: coarse positioning, which obtains the coordinate information of the static objects in the video frame. The position data for coarse positioning comes from the static object coordinate information stored in the BIM spatial database; using the recognized object categories, the BIM spatial information database is loaded with the coordinate range near the current position, and the coordinate information carried by the recognized object categories is looked up.
  • Step 630: fine positioning. To further optimize the positioning accuracy, the coordinate information obtained from coarse positioning is substituted into the positioning model, and the Gauss-Newton method is used to solve it iteratively to obtain the current position of the mobile terminal.
  • In step 630, the positions of the static objects and the mobile terminal should satisfy the following relationship, in which
  • (x, y, z) is the current position of the mobile terminal,
  • (x_n, y_n, z_n) is the static object coordinate information stored in the BIM spatial database, representing the position of the static object relative to a fixed coordinate (such as the coordinate of the building center), and
  • ρ_n is the depth from the current static object to the mobile terminal, while
  • ε_n is the depth measurement noise.
  • Step 640: combine the current position of the mobile terminal iteratively calculated by fine positioning (this position is an offset relative to a specific reference coordinate of the building) with the static object coordinate information obtained by coarse positioning to obtain the positioning result of the current position, and generate an indoor electronic map and BIM spatial data for the positioning result.
  • In step 640, an electronic map of the area is obtained according to the positioning coordinates, and BIM information is superimposed on it to produce an indoor electronic map of the vicinity of the current positioning result.
  • FIG. 4 is a schematic structural diagram of a mobile terminal visual fusion positioning system according to an embodiment of the present application.
  • the mobile terminal vision fusion positioning system in the embodiment of the application includes an initialization module, a video frame acquisition module, a pose calculation module, a positioning judgment module, a position update module, and a target positioning module.
  • Initialization module: used for system initialization; specifically, the initialization module includes:
  • Visual odometer initialization unit: used to initialize the visual odometer, including memory allocation for the pose manager, initialization of variables, and similar operations. The pose manager includes main data structures such as the pose storage sliding window, the image sliding window, and the key points.
  • The pose storage sliding window is used to store the calculated pose information; the image sliding window is used to cache the frames captured by the mobile terminal, awaiting key point extraction and calibration point depth estimation; the key points identify the pixel gradient changes of certain areas in a frame of image and are used for similarity comparison of subsequent images. Besides these main data structures, the pose manager also includes a key point extraction function, a key point similarity estimation function, a key point update function, a key point discard function, a key point index function, a key point depth estimation function, a frame pose estimation function, and pose addition, deletion, and query functions.
  • Semantic positioning calibration initialization unit: used for the initialization of the semantic positioning calibration, including target detection neural network training, model loading, and BIM spatial database generation and loading.
  • The target detection neural network described in this application follows existing target detection methods; a proprietary static object data set is used to train and optimize the network, which is then deployed to the mobile terminal.
  • The BIM spatial database is constructed and indexed using the R-tree method.
  • The BIM spatial data structure includes the electronic map of the area where the current location is located, the categories of static objects contained in the area, the coordinate information of the static objects, other static objects near each static object, and the spatial layout information of each floor of the building where the current location is located.
  • Initial positioning unit: used to obtain the initial position of the mobile terminal and set it as the current position; the mobile terminal in the embodiments of the present application is an Android terminal device equipped with a 9-axis IMU sensor.
  • The initial position can be set by marking the starting point on an indoor plan map, or by automatic recognition of unique signs, and is then accumulated using the accelerometer, gyroscope, and other sensors.
  • Video frame acquisition module: used to acquire video frames with the mobile terminal.
  • Pose calculation module: used to calculate the current pose information of the mobile terminal from the acquired video frames through the visual odometer; the pose is a motion matrix, which contains information such as the rotation and translation of the mobile terminal.
  • The visual odometer uses the pixel residuals of key points to estimate the pose change of the mobile terminal between two adjacent frames, and then obtains the movement offset of the mobile terminal.
  • the pose calculation module includes:
  • Key point extraction unit: used to scale the acquired video frame to 300px*300px, store it in the image sliding window, and judge whether the current video frame is the first frame; if it is the first frame, only the key points are extracted; otherwise, key points are extracted and the residual between the current key points and the key points of the previous video frame is calculated. The key point calculation proceeds as follows: first, select a pixel p on the video frame and take 30% of its gray value G as the threshold T; then, taking pixel p as the center, select 16 pixels on a circle with a radius of 3.
  • If there are N consecutive points on the selected circle whose brightness is greater than G+T or less than G-T, pixel p can be considered a key point of the video frame. Repeat the above steps until all pixels of the video frame have been traversed; finally, non-maximum suppression is applied, and within each 50px*50px window only the key point with the maximum response is retained.
  • Figure 3 shows a schematic diagram of key point selection. For example, the sixteen pixel values around point p on a circle of radius 3 are the gray cells shown in Figure 3; these are the selected N points, whose values are compared with that of p to determine whether p is a key point according to the above rule. Pixel values range from 0 to 255.
  • The residual e of a single key point is the error of the key point's pixel brightness, calculated as:
  • e = I_1(x_1) - I_2(x_2) = I_1(K p_1) - I_2(K(R p_1 + t))
  • where I_2 is obtained from I_1 after a certain motion, R and t describe the motion trajectory of the mobile terminal, x_1 is the pixel position of the key point in image I_1, x_2 is the pixel position of the key point in image I_2, p_1 is the coordinate of the key point in real space, and K is the intrinsic parameter matrix of the mobile terminal.
  • Motion pose solving unit: used to solve the residual Jacobian with the Gauss-Newton method, obtain the motion pose between the current video frame and the previous video frame, and record it in the pose storage sliding window. The Gauss-Newton solution of the residual Jacobian involves the following quantities:
  • the pose of the mobile terminal is the variable being optimized,
  • J is the gradient of the residual with respect to the Lie algebra, and
  • the iteration increment is the update applied at each step.
  • The Gauss-Newton method solves the optimization problem by iteratively descending along the gradient of the objective function with a given step.
  • Motion offset calculation unit: used to obtain the mobile terminal pose of the current video frame, which is a row vector with six degrees of freedom, extract the spatial offset of the pose, and convert the spatial offset into a relative coordinate offset value, which is the movement offset of the mobile terminal.
  • Positioning judgment module: judges the positioning monitoring state; if the system is not in the positioning state, the current position of the positioning target is updated through the position update module; otherwise, the positioning result of the positioning target is obtained through the target positioning module. This application starts the visual odometer pose calculation after obtaining a video frame; if the system is not in the positioning state, the calculated pose information is added to the most recently obtained current position to produce the updated current position. This step is executed repeatedly and does not stop when the positioning state changes.
  • When the system is in the positioning state, the semantic positioning calibration is invoked: the current position is calculated from the recognized objects, replacing the movement offset calculated by the visual odometer, the user's current position is updated with the static object coordinates in the spatial database, and the position is then drawn on the map platform as the new location.
  • Position update module: used to add the current pose information to the most recently obtained current position, update the current position of the positioning target, and draw an indoor electronic map according to the updated current position.
  • Target positioning module: used to call the semantic positioning calibration to detect static objects in the video frame, use the calibration points with the triangulation method (triangulation refers to observing the angles to the same point from two places in order to determine the distance to that point) to estimate the depth information of the static objects, build a multi-target object positioning model from the depth information of the static objects and the semantic information of the spatial data, iteratively solve the positioning model with the Gauss-Newton method to obtain the static object coordinate information and the current position of the mobile terminal, and combine the static object coordinate information with the current position of the mobile terminal to obtain the positioning result of the current position.
  • the target positioning module includes:
  • Object recognition and depth calculation unit: used to take a video frame out of the image sliding window, input it into the target detection neural network to obtain the types of static objects contained in the video frame, set the center pixel position of each identified static object as a calibration point, take out the next video frame and the mobile terminal pose information of that frame, and use the triangulation method to calculate the depth information between the calibration point and the mobile terminal.
  • In the triangulation relation, s_1 and s_2 are the depth information of the key points.
  • Coarse positioning unit: used to obtain the coordinate information of the static objects in the video frame. The position data for coarse positioning comes from the static object coordinate information stored in the BIM spatial database; using the recognized object categories, the BIM spatial information database is loaded with the coordinate range near the current position, and the coordinate information carried by the recognized object categories is looked up.
  • Fine positioning unit: used to substitute the coordinate information obtained by coarse positioning into the positioning model, and solve it iteratively by the Gauss-Newton method to obtain the current position of the mobile terminal.
  • The positions of the static objects and the terminal should satisfy the following relationship, in which
  • (x, y, z) is the current position of the mobile terminal,
  • (x_n, y_n, z_n) is the static object coordinate information stored in the BIM spatial database, representing the position of the static object relative to a fixed coordinate (such as the coordinate of the building center), and
  • ρ_n is the depth from the current static object to the mobile terminal, while
  • ε_n is the depth measurement noise.
  • Positioning result generating unit: used to combine the current position of the mobile terminal calculated by the fine positioning iteration (this position is an offset relative to a specific reference coordinate of the building) with the static object coordinate information obtained from coarse positioning to obtain the positioning result of the current position; an electronic map of the area is obtained according to the positioning result, and BIM information is superimposed on it to generate an indoor electronic map of the vicinity of the current positioning result.
  • FIG. 5 is a schematic diagram of the hardware device structure of the mobile terminal vision fusion positioning method provided by an embodiment of the present application.
  • the device includes one or more processors and memory. Taking a processor as an example, the device may also include: an input system and an output system.
  • the processor, the memory, the input system, and the output system may be connected by a bus or other methods.
  • the connection by a bus is taken as an example.
  • the memory can be used to store non-transitory software programs, non-transitory computer executable programs and modules.
  • the processor executes various functional applications and data processing of the electronic device by running non-transitory software programs, instructions, and modules stored in the memory, that is, realizing the processing methods of the foregoing method embodiments.
  • the memory may include a program storage area and a data storage area, where the program storage area can store an operating system and an application program required by at least one function; the data storage area can store data and the like.
  • the memory may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid-state storage devices.
  • the memory may optionally include a memory remotely arranged with respect to the processor, and these remote memories may be connected to the processing system through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the input system can receive input digital or character information, and generate signal input.
  • the output system may include display devices such as a display screen.
  • The one or more modules are stored in the memory, and when executed by the one or more processors, they perform the following operations of any of the foregoing method embodiments:
  • Step a: acquire the initial position of the mobile terminal based on the calibrated starting position and sensor information, and set the initial position as the current position of the positioning target;
  • Step b: use the mobile terminal to acquire video frames;
  • Step c: detect static objects in the video frame, obtain the coordinate information of the static objects from the BIM spatial database, substitute the coordinate information of the static objects into the multi-target object positioning model, iteratively solve the positioning model by the Gauss-Newton method to obtain the current position of the mobile terminal, and combine the current position of the mobile terminal with the coordinate information of the static objects to obtain the positioning result of the positioning target.
  • The embodiments of the present application provide a non-transitory (non-volatile) computer storage medium that stores computer-executable instructions, and the computer-executable instructions can perform the following operations:
  • Step a: acquire the initial position of the mobile terminal based on the calibrated starting position and sensor information, and set the initial position as the current position of the positioning target;
  • Step b: use the mobile terminal to acquire video frames;
  • Step c: detect static objects in the video frame, obtain the coordinate information of the static objects from the BIM spatial database, substitute the coordinate information of the static objects into the multi-target object positioning model, iteratively solve the positioning model by the Gauss-Newton method to obtain the current position of the mobile terminal, and combine the current position of the mobile terminal with the coordinate information of the static objects to obtain the positioning result of the positioning target.
  • The embodiments of the present application provide a computer program product; the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions that, when executed by a computer, cause the computer to perform the following operations:
  • Step a: acquire the initial position of the mobile terminal based on the calibrated starting position and sensor information, and set the initial position as the current position of the positioning target;
  • Step b: use the mobile terminal to acquire video frames;
  • Step c: detect static objects in the video frame, obtain the coordinate information of the static objects from the BIM spatial database, substitute the coordinate information of the static objects into the multi-target object positioning model, iteratively solve the positioning model by the Gauss-Newton method to obtain the current position of the mobile terminal, and combine the current position of the mobile terminal with the coordinate information of the static objects to obtain the positioning result of the positioning target.
  • The mobile terminal vision fusion positioning method, system, and electronic device of the embodiments of the application use visual sensors to detect and recognize static objects in the real world to obtain object spatial relationships, perform geographic topological matching against the object spatial relationships provided by the BIM model, then establish a nonlinear equation set from the distance measurements to the objects, solve the equation set iteratively, and converge to a precise position, thereby realizing a more accurate, convenient, and cheaper positioning method.

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

A mobile side vision fusion positioning method and system, and an electronic device. The method comprises: step a, system initialization (100), obtaining an initial location of a mobile terminal on the basis of a calibrated starting location and sensor information and taking it as the current location of a positioning target; step b, obtaining a video frame by using the mobile terminal (200); and step c, detecting a static object in the video frame, obtaining geographic coordinate information of the static object by means of a BIM space database, bringing the coordinate information of the static object into a multi-target object positioning model, iteratively solving the positioning model by means of a Gauss-Newton method, obtaining the current location of the mobile terminal, and combining the current location of the mobile terminal and the coordinate information of the static object to obtain a positioning result of the positioning target. A more convenient, precise, and cheaper positioning method can thus be implemented.

Description

Mobile terminal vision fusion positioning method, system and electronic device
Technical field
This application belongs to the interdisciplinary field of artificial intelligence and geographic information technology, and in particular relates to a mobile terminal vision fusion positioning method, system, and electronic device.
Background art
The Global Navigation Satellite System (GNSS) can realize navigation and positioning outdoors. Currently, radio positioning technologies represented by GNSS, cellular networks, and Wi-Fi can already achieve sub-meter positioning in open outdoor environments. The principle is to detect characteristic parameters of the propagated signal to determine position; common methods include proximity detection and observed time difference of arrival (OTDOA).
Indoor positioning technology mainly provides positioning and tracking of people and objects in various indoor spaces. The demand for safety and monitoring of people and objects based on indoor positioning is growing, and the demand for location services in indoor environments has become increasingly significant; scholars at home and abroad have carried out extensive exploration and research. At present, most indoor positioning systems are based on proximity detection, triangulation, multilateration, or fingerprint positioning, or adopt combined positioning to improve accuracy. However, due to multipath effects, the indoor environment is changeable and complex, and no universal solution exists; improving accuracy, real-time performance, security, scalability, low cost, convenience, and specialization are current research hotspots.
At present, indoor radio positioning technologies (such as Wi-Fi, Bluetooth, etc.) mostly use the Received Signal Strength Indication (RSSI) as the basis of positioning algorithms, realizing positioning by exploiting the relationship between signal attenuation and distance. Wi-Fi indoor positioning generally includes the positioning method based on RSSI distance intersection and the RSSI location fingerprint method; signal matching is the main research problem, and positioning accuracy depends on the density of calibration points. Because the technology is easy to extend, updates data automatically, and is low-cost, it was the first to reach large-scale application.
Bluetooth positioning is based on a short-distance, low-power communication protocol; it can be implemented by centroid positioning, fingerprint positioning, or proximity detection. Bluetooth positioning has the advantages of low power consumption, short range, and wide availability, but it also suffers from poor stability and strong environmental interference. iBeacon, the precise micro-positioning technology developed by Apple on Bluetooth Low Energy, works similarly to earlier Bluetooth techniques: the Beacon transmits a signal, and Bluetooth devices receive and respond to it. When a user enters, exits, or wanders within the area, the Beacon broadcast propagates and the distance between the user and the Beacon can be calculated from RSSI; therefore, three iBeacon devices are sufficient for positioning.
The above indoor positioning technologies all rely on radio-frequency signals. Radio signals are easily affected by the indoor environment, such as obstacles, and environmental changes degrade positioning accuracy; moreover, they require construction work, a large amount of equipment, and high deployment and maintenance costs.
Visual sensor positioning obtains the current relative position through triangulation. Compared with other sensor-based positioning methods, visual sensors offer higher positioning accuracy at low cost. The EasyLiving system is a computer-vision-based positioning system that uses high-performance mobile terminals and achieves relatively high accuracy, but when the indoor environment is complex it is difficult to maintain high accuracy at all times. Visual sensors can be introduced through the principle of Simultaneous Localization And Mapping (SLAM) for mobile robots. The EV-Loc indoor positioning system, proposed in 2012, uses visual signals as auxiliary positioning to improve accuracy. Google's Visual Positioning Service (VPS), based on the principle of visual positioning, has a theoretical accuracy at the centimeter level.
However, existing visual positioning systems such as EasyLiving and Google VPS are mostly based on the SLAM principle: they extract feature points captured by the visual sensor and use triangulation ranging, combined with acceleration, gyroscope, and other sensors, to calculate the movement offset of the current position, which yields only relative positioning. To achieve accurate indoor geographic positioning, a large number of manual markers must be deployed at fixed points in advance, and this preliminary preparation is tedious. At the same time, these existing visual positioning technologies only consider the raw data sensed by the sensors and do not make use of the semantic information carried by the data.
Summary of the invention
This application provides a mobile terminal vision fusion positioning method, system and electronic device, which aim to solve, at least to some extent, one of the technical problems in the prior art described above.
To solve the above problems, this application provides the following technical solution:
A mobile terminal vision fusion positioning method, comprising the following steps:
Step a: obtaining the initial position of the mobile terminal based on a calibrated starting position and sensor information, and setting the initial position as the current position of the positioning target;
Step b: acquiring video frames with the mobile terminal;
Step c: detecting static objects in the video frames, obtaining the coordinate information of the static objects from a BIM spatial database, substituting the coordinate information of the static objects into a multi-object positioning model, solving the positioning model iteratively by the Gauss-Newton method to obtain the current position of the mobile terminal, and combining the current position of the mobile terminal with the coordinate information of the static objects to obtain the positioning result of the positioning target.
The technical solution adopted in the embodiments of the application further includes: in step b, after the video frames are acquired with the mobile terminal, a visual odometer calculates the current pose information of the mobile terminal from the acquired video frames.
The technical solution adopted in the embodiments of the application further includes: the visual odometer calculating the current pose information of the mobile terminal from the acquired video frames specifically includes:
Step b1: the visual odometer scales the acquired video frame to a set size, stores it in an image sliding window, and checks whether the current video frame is the first frame; if it is the first frame, only key point extraction is performed; otherwise, key points are extracted and the residuals between the current key points and the key points of the previous video frame are computed; the residual e of a single key point is the error in the pixel intensity of that key point, computed as:
e = I_1(x_1) - I_2(x_2) = I_1(K p_1) - I_2(K(R p_1 + t))
In the above formula, I_2 is the image obtained from I_1 after a certain motion, R and t describe the motion trajectory of the mobile terminal, x_1 is the pixel position of the key point in image I_1, x_2 is the pixel position of the key point in image I_2, p_1 is the coordinate of the key point in real space, and K is the intrinsic matrix of the mobile terminal;
Step b2: solve the residual Jacobian by the Gauss-Newton method to obtain the motion pose between the current video frame and the previous video frame, and record it in the pose storage sliding window;
Step b3: obtain the mobile terminal pose of the current video frame, extract the spatial offset of the pose, and convert the spatial offset into a relative coordinate offset, which is the motion offset of the mobile terminal.
The technical solution adopted in the embodiments of the application further includes: step b also includes monitoring the positioning state; if the system is not in the positioning state, the current pose information of the mobile terminal is added to the previously obtained current position and the current position of the positioning target is updated; if the system is in the positioning state, step c is executed.
The technical solution adopted in the embodiments of the application further includes: in step c, detecting the static objects in the video frame, obtaining the coordinate information of the static objects from the BIM spatial database, substituting the coordinate information of the static objects into the multi-object positioning model, solving the positioning model iteratively by the Gauss-Newton method and obtaining the current position of the mobile terminal specifically includes:
Step c1: take out the video frame and feed it into the object detection neural network to obtain the categories of the static objects contained in the frame, set the center pixel position of each static object as a calibration point, then take out the next video frame together with the mobile terminal pose of that frame, and compute the depth between the calibration point and the mobile terminal by triangulation; the triangulation formula is as follows:
s_1 x_1 = s_2 R x_2 + t
In the above formula, s_1 and s_2 are the depth values of the key point;
Step c2: using the identified static object category, load the BIM spatial information database with the coordinate information of the current position, and obtain the coordinate information of the static object from the BIM spatial information database;
Step c3: substitute the coordinate information of the static object into the positioning model and solve it iteratively by the Gauss-Newton method to obtain the current position of the mobile terminal; the system of equations solved by the Gauss-Newton method is:
(x_1 - x)^2 + (y_1 - y)^2 + (z_1 - z)^2 = ρ_1 + σ_1
...
(x_n - x)^2 + (y_n - y)^2 + (z_n - z)^2 = ρ_n + σ_n
In the above formula, (x, y, z) is the current position of the mobile terminal, (x_n, y_n, z_n) is the coordinate information stored in the BIM database, ρ_n is the depth from the static object to the mobile terminal, and σ_n is the measurement noise of the depth;
Step c4: combine the current position of the mobile terminal with the static object coordinate information to obtain the positioning result of the current position.
Another technical solution adopted by the embodiments of this application is a mobile terminal vision fusion positioning system, comprising:
an initial positioning unit, which obtains the initial position of the mobile terminal based on the calibrated starting position and sensor information and sets the initial position as the current position of the positioning target;
a video frame acquisition module, which is used to acquire video frames with the mobile terminal;
a target positioning module, which is used to detect static objects in the video frames, obtain the coordinate information of the static objects from the BIM spatial database, substitute the coordinate information of the static objects into the multi-object positioning model, solve the positioning model iteratively by the Gauss-Newton method to obtain the new current position of the mobile terminal, and combine the new current position of the mobile terminal with the coordinate information of the static objects to obtain the positioning result of the positioning target.
The technical solution adopted in the embodiments of the application further includes a pose calculation module, which is used to calculate the current pose information of the mobile terminal from the acquired video frames through the visual odometer.
The technical solution adopted in the embodiments of the application further includes: the pose calculation module includes:
a key point extraction unit, which is used to scale the acquired video frame to a set size through the visual odometer, store it in the image sliding window, and check whether the current video frame is the first frame; if it is the first frame, only key point extraction is performed; otherwise, key points are extracted and the residuals between the current key points and the key points of the previous video frame are computed; the residual e of a single key point is the error in the pixel intensity of that key point, computed as:
e = I_1(x_1) - I_2(x_2) = I_1(K p_1) - I_2(K(R p_1 + t))
In the above formula, I_2 is the image obtained from I_1 after a certain motion, R and t describe the motion trajectory of the mobile terminal, x_1 is the pixel position of the key point in image I_1, x_2 is the pixel position of the key point in image I_2, p_1 is the coordinate of the key point in real space, and K is the intrinsic matrix of the mobile terminal;
a motion pose solving unit, which is used to solve the residual Jacobian by the Gauss-Newton method, obtain the motion pose between the current video frame and the previous video frame, and record it in the pose storage sliding window;
a motion offset calculation unit, which is used to obtain the mobile terminal pose of the current video frame, extract the spatial offset of the pose, and convert the spatial offset into a relative coordinate offset, which is the motion offset of the mobile terminal.
The technical solution adopted in the embodiments of this application further includes a positioning judgment module and a position update module. The positioning judgment module monitors the positioning state; if the system is not in the positioning state, the position update module adds the current pose information of the mobile terminal to the previously obtained current position and updates the current position of the positioning target; if the system is in the positioning state, the positioning result of the positioning target is obtained through the target positioning module.
The technical solution adopted in the embodiments of the application further includes: the target positioning module specifically includes:
an object recognition and depth calculation unit, which is used to take out the video frame, feed it into the object detection neural network to obtain the categories of the static objects contained in the frame, set the center pixel position of each static object as a calibration point, then take out the next video frame together with the mobile terminal pose of that frame, and compute the depth between the calibration point and the mobile terminal by triangulation; the triangulation formula is as follows:
s_1 x_1 = s_2 R x_2 + t
In the above formula, s_1 and s_2 are the depth values of the key point;
a coarse positioning unit, which is used to load the BIM spatial information database with the coordinate information of the current position according to the identified static object category, and to obtain the coordinate information of the static object from the BIM spatial information database;
a fine positioning unit, which is used to substitute the coordinate information of the static object into the positioning model and solve it iteratively by the Gauss-Newton method to obtain the current position of the mobile terminal; the system of equations solved by the Gauss-Newton method is:
(x_1 - x)^2 + (y_1 - y)^2 + (z_1 - z)^2 = ρ_1 + σ_1
...
(x_n - x)^2 + (y_n - y)^2 + (z_n - z)^2 = ρ_n + σ_n
In the above formula, (x, y, z) is the current position of the mobile terminal, (x_n, y_n, z_n) is the static object coordinate information stored in the BIM spatial database, ρ_n is the depth from the static object to the mobile terminal, and σ_n is the measurement noise of the depth;
a positioning result generation unit, which is used to combine the current position of the mobile terminal with the static object coordinate information to obtain the positioning result.
A further technical solution adopted by the embodiments of the present application is an electronic device, comprising:
at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the following operations of the above mobile terminal vision fusion positioning method:
Step a: obtaining the initial position of the mobile terminal based on a calibrated starting position and sensor information, and setting the initial position as the current position of the positioning target;
Step b: acquiring video frames with the mobile terminal;
Step c: detecting static objects in the video frames, obtaining the coordinate information of the static objects from the BIM spatial database, substituting the coordinate information of the static objects into the multi-object positioning model, solving the positioning model iteratively by the Gauss-Newton method to obtain the current position of the mobile terminal, and combining the current position of the mobile terminal with the coordinate information of the static objects to obtain the positioning result of the positioning target.
Compared with the prior art, the beneficial effect of the embodiments of the present application is as follows: the mobile terminal vision fusion positioning method, system and electronic device of the embodiments use a visual sensor to detect and recognize static objects in the real world and obtain the spatial relationships between the objects, match those relationships against the object spatial relationships provided by the BIM model in geographic topological space, then build a system of nonlinear equations from the distance measurements to the objects, solve the system iteratively and converge to an accurate position, thereby providing a more accurate, more convenient and cheaper positioning method.
Description of the drawings
Fig. 1 is a flowchart of the mobile terminal vision fusion positioning method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of the structure of the object detection neural network;
Fig. 3 is a schematic diagram of key point selection;
Fig. 4 is a schematic structural diagram of the mobile terminal vision fusion positioning system according to an embodiment of the present application;
Fig. 5 is a schematic diagram of the hardware structure of a device for the mobile terminal vision fusion positioning method provided by an embodiment of the present application.
Detailed description of the embodiments
To make the purpose, technical solutions and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the application and are not intended to limit it.
Please refer to Fig. 1, which is a flowchart of the mobile terminal vision fusion positioning method according to an embodiment of the present application. The method includes the following steps:
Step 100: system initialization;
In step 100, system initialization includes the following steps:
Step 110: initialization of the visual odometer;
In step 110, the initialization of the visual odometer includes operations such as allocating memory for the pose manager and assigning initial values to variables. The pose manager contains three main data structures: a pose storage sliding window, an image sliding window, and key points. The pose storage sliding window stores the computed pose information; the image sliding window caches frames captured by the mobile terminal while they wait for key point extraction and calibration point depth estimation; a key point identifies the pixel gradient variation of a region in a frame and is used for similarity comparison with subsequent frames. Besides these data structures, the pose manager also provides a key point extraction function, a key point similarity estimation function, a key point update function, a key point discard function, a key point index function, a key point depth estimation function, a frame pose estimation function, and functions for adding, deleting and querying poses.
Step 120: initialization of the semantic positioning calibration;
In step 120, the initialization of the semantic positioning calibration includes training the object detection neural network, loading the model, and generating and loading the BIM spatial database. The structure of the object detection neural network is shown in Fig. 2; it follows an existing object detection method without changes to the network structure, and is trained and optimized on a dedicated static object data set before being deployed to the mobile terminal. The BIM spatial database is built and indexed with the R-tree method. The BIM spatial data structure contains the electronic map of the area around the current position, the categories of static objects contained in that area, the coordinate information of the static objects, the other static objects adjacent to each static object, and the spatial layout of every floor of the building where the current position is located.
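For illustration only, the sketch below shows one way the BIM spatial records described above could be organized and queried on the mobile side. The field names (object_class, coords, neighbors) are assumptions made for this sketch, and a simple grid index stands in for the R-tree index described above; it is not the implementation of this application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class BimObject:
    # Hypothetical record for one static object in the BIM spatial database.
    object_class: str                                   # e.g. "door" (assumed label set)
    coords: Tuple[float, float, float]                  # coordinates in the building frame
    neighbors: List[str] = field(default_factory=list)  # ids of adjacent static objects

class BimSpatialIndex:
    """Minimal stand-in for the R-tree index over BIM objects (sketch only)."""
    def __init__(self, cell: float = 5.0):
        self.cell = cell
        self.grid: Dict[Tuple[int, int], List[BimObject]] = {}

    def _key(self, x: float, y: float) -> Tuple[int, int]:
        return (int(x // self.cell), int(y // self.cell))

    def insert(self, obj: BimObject) -> None:
        self.grid.setdefault(self._key(obj.coords[0], obj.coords[1]), []).append(obj)

    def query_near(self, x: float, y: float) -> List[BimObject]:
        # Return all objects stored in the 3x3 grid cells around (x, y).
        cx, cy = self._key(x, y)
        found: List[BimObject] = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                found.extend(self.grid.get((cx + dx, cy + dy), []))
        return found
```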
Step 130: obtain the initial position of the mobile terminal based on the calibrated starting position and sensor information, and set the initial position as the current position of the positioning target;
In step 130, the mobile terminal of the embodiment of the present application is an Android terminal device equipped with a 9-axis IMU sensor. The initial position can be set by marking the starting point on an indoor plan map or by automatically recognizing a unique marker, and is then updated by accumulating readings from the accelerometer, gyroscope and other sensors.
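As a rough illustration of how the position estimate can be carried forward from the calibrated start point, the sketch below accumulates per-step planar offsets; the step model and its parameters are assumptions for illustration, not part of this application.

```python
import math

class DeadReckoning:
    """Toy accumulator: start from a calibrated point and add per-step offsets (sketch)."""
    def __init__(self, start_xy, start_heading_rad=0.0):
        self.x, self.y = start_xy
        self.heading = start_heading_rad

    def update(self, step_length_m, heading_rad):
        # A detected step of the given length along the current heading moves the estimate.
        self.heading = heading_rad
        self.x += step_length_m * math.cos(heading_rad)
        self.y += step_length_m * math.sin(heading_rad)
        return self.x, self.y

# Usage with assumed values: position after two 0.7 m steps heading east
dr = DeadReckoning((12.0, 3.5))
dr.update(0.7, 0.0)
print(dr.update(0.7, 0.0))  # approximately (13.4, 3.5)
```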
Step 200: acquire video frames with the mobile terminal;
Step 300: the visual odometer calculates the current pose information of the mobile terminal from the acquired video frames;
In step 300, the pose is a motion matrix containing the rotation, translation and other motion information of the mobile terminal. The visual odometer estimates the pose change between two adjacent frames from the pixel residuals of the key points, and from this obtains the motion offset of the mobile terminal. Specifically, the visual odometer pose calculation includes:
Step 310: the visual odometer scales the acquired video frame to 300 px × 300 px (the size can be set according to the actual application), stores it in the image sliding window, and checks whether the current video frame is the first frame; if it is the first frame, only key points are extracted; otherwise, key points are extracted and the residuals between these key points and the key points of the previous video frame are computed;
In step 310, the key points are computed as follows. First, a pixel p is selected in the video frame, and 30% of the gray value G of pixel p is taken as the threshold T. Then, 16 pixels on a circle of radius 3 centered at p are examined; if N consecutive pixels on this circle have intensities greater than G + T or less than G - T, pixel p is considered a key point of the video frame. These steps are repeated until all pixels of the video frame have been traversed and the key point computation is complete. Finally, non-maximum suppression is applied so that within each 50 px × 50 px window only the key point with the maximum response is kept. Fig. 3 is a schematic diagram of key point selection: for a point p, the sixteen pixels on the circle of radius 3 (the gray cells in the figure) are the candidate points; their intensities are compared with that of p according to the rule above to decide whether p is a key point. The intensity of p is a value between 0 and 255.
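The following sketch illustrates one reading of the key point test just described (T = 0.3·G, 16 pixels on a radius-3 circle, N consecutive brighter or darker pixels). The circle offsets follow the common 16-point radius-3 layout, N is assumed to be 12, and non-maximum suppression is omitted; it is illustrative only, not the exact implementation of this application.

```python
import numpy as np

# 16 pixel offsets on a radius-3 circle around the candidate pixel.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_keypoint(img: np.ndarray, x: int, y: int, n_required: int = 12) -> bool:
    """Return True if pixel (x, y) has n_required consecutive circle pixels brighter
    than G+T or darker than G-T, with T = 30% of its gray value G.
    The caller must keep (x, y) at least 3 px away from the image border."""
    g = float(img[y, x])
    t = 0.3 * g
    ring = [float(img[y + dy, x + dx]) for dx, dy in CIRCLE]
    ring = ring + ring            # duplicate so consecutive runs can wrap around
    run_bright = run_dark = 0
    for v in ring:
        run_bright = run_bright + 1 if v > g + t else 0
        run_dark = run_dark + 1 if v < g - t else 0
        if run_bright >= n_required or run_dark >= n_required:
            return True
    return False
```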
The residual e of a single key point is the error in the pixel intensity of that key point, computed as follows:
e = I_1(x_1) - I_2(x_2) = I_1(K p_1) - I_2(K(R p_1 + t))    (1)
In formula (1), I_2 is the image obtained from I_1 after a certain motion, R and t describe the motion trajectory of the mobile terminal, x_1 is the pixel position of the key point in image I_1, x_2 is the pixel position of the key point in image I_2, p_1 is the coordinate of the key point in real space, and K is the intrinsic matrix of the mobile terminal. The Lie algebra form of the residual is:
e = I_1(K p_1) - I_2(K exp(ξ^) p_1)    (2)
In formula (2), exp(ξ^) is the exponential map of the Lie algebra element ξ corresponding to the motion (R, t) of the mobile terminal.
Step 320: solve the residual Jacobian by the Gauss-Newton method to obtain the motion pose between the current video frame and the previous video frame, and record it in the pose storage sliding window;
In step 320, solving the residual Jacobian by the Gauss-Newton method specifically includes:
the objective function for optimizing the mobile terminal pose:
ξ* = arg min_ξ (1/2) Σ_{i=1}^{N} ||e_i||^2    (3)
In formula (3), ξ is the pose of the mobile terminal and J is the gradient of the residual with respect to the Lie algebra, that is, J = ∂e/∂ξ.
The Gauss-Newton incremental equation is:
J^T J Δξ* = -J^T e    (4)
In formula (4), Δξ* is the iteration increment.
The Gauss-Newton method solves the optimization problem by iterating along the descent direction given by the first-order gradient of the objective function.
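As an informal sketch of the photometric Gauss-Newton update described above, the code below evaluates the residual e = I_1(K p) - I_2(K(R p + t)) for key points with assumed 3D coordinates and performs one Gauss-Newton step on a 6-vector (translation, small-angle rotation), using a finite-difference Jacobian in place of the analytic Lie-algebra derivative. All inputs (grayscale images as float arrays, the intrinsic matrix K, the key point coordinates) are assumptions for illustration, not data of this application.

```python
import numpy as np

def project(K, p):
    """Pinhole projection of a 3D point p (camera frame) to pixel coordinates."""
    u = K @ (p / p[2])
    return u[:2]

def sample(img, uv):
    """Nearest-neighbour intensity lookup (bilinear interpolation omitted for brevity)."""
    x, y = int(round(uv[0])), int(round(uv[1]))
    h, w = img.shape
    return float(img[min(max(y, 0), h - 1), min(max(x, 0), w - 1)])

def residuals(xi, img1, img2, K, points):
    """Photometric residuals for xi = (tx, ty, tz, wx, wy, wz) with a small-angle rotation."""
    t, w = xi[:3], xi[3:]
    R = np.eye(3) + np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])
    return np.array([sample(img1, project(K, p)) - sample(img2, project(K, R @ p + t))
                     for p in points])

def gauss_newton_step(xi, img1, img2, K, points, eps=1e-4):
    """One Gauss-Newton update of xi using a finite-difference Jacobian."""
    xi = np.asarray(xi, dtype=float)
    e = residuals(xi, img1, img2, K, points)
    J = np.zeros((len(points), 6))
    for j in range(6):
        d = np.zeros(6); d[j] = eps
        J[:, j] = (residuals(xi + d, img1, img2, K, points) - e) / eps
    # Small damping term added for numerical stability of the normal equations.
    delta, *_ = np.linalg.lstsq(J.T @ J + 1e-6 * np.eye(6), -J.T @ e, rcond=None)
    return xi + delta
```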
The objective function is (x_i - x)^2 + (y_i - y)^2 + (z_i - z)^2 = f(X), where X = [x, y, z].
The iteration increment ΔX_k is obtained from J(X)^T J(X) ΔX = -J(X)^T f(X),
where J(X) is the Jacobian matrix of the equations;
J(X) = [∂f_i/∂x  ∂f_i/∂y  ∂f_i/∂z], i = 1, ..., n    (5)
In formula (5), ∂f_i/∂x = -2(x_i - x), ∂f_i/∂y = -2(y_i - y), ∂f_i/∂z = -2(z_i - z).
The Gauss-Newton procedure is as follows:
Step 321: an initial point p_0, an iteration count k and an allowed error ε > 0 are given; when the iteration count and the error no longer satisfy the conditions, step 330 is executed;
Step 322: if f(X_{k+1}) - f(X_k) is smaller than the threshold ε, exit; otherwise execute step 323;
Step 323: compute the iteration increment, substitute it into the objective function, and return to step 321.
Step 330: obtain the mobile terminal pose of the current video frame; the pose is a row vector with six degrees of freedom. Extract the spatial offset of the pose and convert it into a relative coordinate offset, which is the motion offset of the mobile terminal.
Step 400: check the positioning monitoring state; if the system is not in the positioning state, execute step 500; otherwise, execute step 600;
In step 400, the positioning listener is callback-based and may fire at any time after initialization is complete. In this application, the visual odometer pose calculation starts as soon as video frames are acquired. If the system is not in the positioning state, the computed pose information is added to the previously obtained current position to produce the updated current position. This step is executed repeatedly and does not stop when the positioning state changes. When the positioning listener is in the positioning state, the semantic positioning calibration is invoked: the current position is computed from the recognized objects, replacing the motion offset computed by the visual odometer, the user's current position is updated with the static object coordinates in the BIM spatial database, and that position is taken as the final position.
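Purely as a sketch, the control flow of steps 400 to 600 can be summarized by the loop below: while the system is not in the positioning state, the visual-odometry offset is accumulated onto the last position; when the positioning state is active, a semantic fix replaces the dead-reckoned estimate. The function names (vo_offset, semantic_fix, in_positioning_state) are placeholders, not APIs of this application.

```python
def positioning_loop(initial_position, frames, vo_offset, semantic_fix, in_positioning_state):
    """Sketch of the step 400/500/600 logic: dead-reckon by default, correct when allowed."""
    position = initial_position
    for frame in frames:
        dx, dy, dz = vo_offset(frame)                 # step 300: visual odometry offset
        if in_positioning_state():
            position = semantic_fix(frame, position)  # step 600: semantic calibration fix
        else:
            x, y, z = position                        # step 500: accumulate the offset
            position = (x + dx, y + dy, z + dz)
        yield position
```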
Step 500: add the current pose information to the previously obtained current position and update the current position of the positioning target;
Step 600: invoke the semantic positioning calibration to detect static objects in the video frame, estimate the depth of the static objects from the calibration points by triangulation (triangulation determines the distance to a point by observing the angles to that same point from two locations), build a multi-object positioning model from the depth of the static objects and the semantic information of the spatial data, solve the positioning model iteratively by the Gauss-Newton method to obtain the static object coordinate information and the current position of the mobile terminal, and combine the static object coordinate information with the current position of the mobile terminal to obtain the positioning result of the positioning target;
In step 600, the positioning procedure of the semantic positioning calibration specifically includes the following steps:
Step 610: take the video frame out of the image sliding window and feed it into the object detection neural network to obtain the categories of the static objects contained in the frame; set the center pixel position of each recognized static object as a calibration point; then take out the next video frame together with the mobile terminal pose of that frame, and compute the depth between the calibration point and the mobile terminal by triangulation;
In step 610, the triangulation formula is as follows:
s_1 x_1 = s_2 R x_2 + t    (6)
In formula (6), s_1 and s_2 are the depth values of the key point.
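Assuming the triangulation relation reconstructed above (s_1 x_1 = s_2 R x_2 + t, with x_1 and x_2 as normalized camera coordinates), the two depths can be recovered in the least-squares sense as sketched below. This is a generic triangulation routine with assumed example values, not the exact code of this application.

```python
import numpy as np

def triangulate_depths(x1, x2, R, t):
    """Solve s1*x1 = s2*R*x2 + t for the depths (s1, s2) in the least-squares sense.
    x1, x2 are normalized homogeneous coordinates (3-vectors with last entry 1)."""
    A = np.stack([x1, -(R @ x2)], axis=1)   # 3x2 system: [x1, -R*x2] @ [s1, s2]^T = t
    s, *_ = np.linalg.lstsq(A, t, rcond=None)
    return float(s[0]), float(s[1])

# Usage with assumed values: pure forward motion of 0.1 m along the optical axis
R = np.eye(3)
t = np.array([0.0, 0.0, 0.1])
x1 = np.array([0.05, 0.02, 1.0])
x2 = np.array([0.06, 0.025, 1.0])
print(triangulate_depths(x1, x2, R, t))
```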
Step 620: obtain the coordinate information of the static objects in the video frame by coarse positioning. The coarse position data come from the coordinate information of the static objects stored in the BIM spatial database: using the recognized object category, the loaded BIM spatial information database is searched in the vicinity of the coordinates of the current position to find the coordinate information carried by the recognized object category.
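The coarse lookup can be read, again only as a sketch, as a class-filtered search around the current position. The record layout (a dict with "class" and "coords" keys) and the distance threshold are assumptions for illustration.

```python
def coarse_lookup(bim_records, current_xy, object_class, max_dist=10.0):
    """Return coordinates of BIM records of the recognized class near the current position (sketch).
    Each record is assumed to be a dict like {"class": "door", "coords": (x, y, z)}."""
    hits = []
    for rec in bim_records:
        if rec["class"] != object_class:
            continue
        dx = rec["coords"][0] - current_xy[0]
        dy = rec["coords"][1] - current_xy[1]
        if (dx * dx + dy * dy) ** 0.5 <= max_dist:
            hits.append(rec["coords"])
    return hits
```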
Step 630: fine positioning. To further improve the positioning accuracy, the coordinate information obtained by coarse positioning is substituted into the positioning model, which is solved iteratively by the Gauss-Newton method to obtain the current position of the mobile terminal.
In step 630, the positions of the static objects and the mobile terminal should satisfy the following relationship:
(x_i - x)^2 + (y_i - y)^2 + (z_i - z)^2 = δρ_i + ε_i    (7)
Therefore the following system of nonlinear equations can be built and solved:
(x_1 - x)^2 + (y_1 - y)^2 + (z_1 - z)^2 = ρ_1 + σ_1
...
(x_n - x)^2 + (y_n - y)^2 + (z_n - z)^2 = ρ_n + σ_n    (8)
In formula (8), (x, y, z) is the current position of the mobile terminal, (x_n, y_n, z_n) is the static object coordinate information stored in the BIM spatial database, which describes the position of the static object relative to a fixed coordinate (for example the coordinate of the building center), ρ_n is the depth from the current static object to the mobile terminal, and σ_n is the measurement noise of the depth.
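Under the reading of formulas (7) and (8) given above (squared distance to each recognized object equal to the measured depth term), the fine-positioning step can be sketched as a small Gauss-Newton solver over (x, y, z). The residual definition therefore follows that assumed reading, and the object coordinates and measurements in the usage example are illustrative values only.

```python
import numpy as np

def fine_position(objects, depths, x0, iters=20, eps=1e-6):
    """Gauss-Newton refinement of the terminal position (x, y, z) from BIM object
    coordinates and measured depth terms, following the assumed residual
    f_i = (x_i - x)^2 + (y_i - y)^2 + (z_i - z)^2 - rho_i (sketch)."""
    X = np.asarray(x0, dtype=float)
    P = np.asarray(objects, dtype=float)   # shape (n, 3): BIM coordinates
    rho = np.asarray(depths, dtype=float)  # shape (n,): measured terms
    for _ in range(iters):
        d = P - X
        f = np.sum(d * d, axis=1) - rho    # residuals
        J = -2.0 * d                       # Jacobian rows: -2*(x_i - x), -2*(y_i - y), -2*(z_i - z)
        dX = np.linalg.solve(J.T @ J + 1e-9 * np.eye(3), -J.T @ f)
        X = X + dX
        if np.linalg.norm(dX) < eps:
            break
    return X

# Usage with assumed values: four objects and noiseless squared-distance measurements
objs = [(0.0, 0.0, 0.0), (4.0, 0.0, 1.0), (0.0, 4.0, 2.0), (4.0, 4.0, 0.0)]
true = np.array([1.0, 2.0, 0.5])
rhos = [float(np.sum((np.array(o) - true) ** 2)) for o in objs]
print(fine_position(objs, rhos, x0=(0.5, 1.5, 0.0)))   # converges near (1.0, 2.0, 0.5)
```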
Step 640: combine the current position of the mobile terminal computed iteratively by the fine positioning (this position is an offset relative to the coordinates of a specific target of the building) with the static object coordinate information obtained by the coarse positioning to obtain the positioning result of the current position, and generate the indoor electronic map and BIM spatial data of the positioning result;
In step 640, the electronic map of the area is retrieved according to the positioning coordinates and the BIM information is overlaid on it to produce an indoor electronic map of the vicinity of the current positioning result.
Please refer to Fig. 4, which is a schematic structural diagram of the mobile terminal vision fusion positioning system according to an embodiment of the present application. The system includes an initialization module, a video frame acquisition module, a pose calculation module, a positioning judgment module, a position update module and a target positioning module.
Initialization module: used for system initialization. Specifically, the initialization module includes:
a visual odometer initialization unit, used to initialize the visual odometer, including allocating memory for the pose manager and assigning initial values to variables. The pose manager contains a pose storage sliding window, an image sliding window and key points as its main data structures: the pose storage sliding window stores the computed pose information; the image sliding window caches frames captured by the mobile terminal while they wait for key point extraction and calibration point depth estimation; a key point identifies the pixel gradient variation of a region in a frame and is used for similarity comparison with subsequent frames. Besides these data structures, the pose manager also provides a key point extraction function, a key point similarity estimation function, a key point update function, a key point discard function, a key point index function, a key point depth estimation function, a frame pose estimation function, and functions for adding, deleting and querying poses.
a semantic positioning calibration initialization unit, used to initialize the semantic positioning calibration, including training the object detection neural network, loading the model, and generating and loading the BIM spatial database. The object detection neural network structure described in this application follows an existing object detection method; the network is trained and optimized on a dedicated static object data set and then deployed to the mobile terminal. The BIM spatial database is built and indexed with the R-tree method. The BIM spatial data structure contains the electronic map of the area around the current position, the categories of static objects contained in that area, the coordinate information of the static objects, the other static objects adjacent to each static object, and the spatial layout of every floor of the building where the current position is located.
an initial positioning unit, used to obtain the initial position of the mobile terminal and set the initial position as the current position. The mobile terminal of the embodiment of the present application is an Android terminal device equipped with a 9-axis IMU sensor. The initial position can be set by marking the starting point on an indoor plan map or by automatically recognizing a unique marker, and is then updated by accumulating readings from the accelerometer, gyroscope and other sensors.
Video frame acquisition module: used to acquire video frames with the mobile terminal;
Pose calculation module: used to calculate the current pose information of the mobile terminal from the acquired video frames through the visual odometer. The pose is a motion matrix containing the rotation, translation and other motion information of the mobile terminal. The visual odometer estimates the pose change between two adjacent frames from the pixel residuals of the key points, and from this obtains the motion offset of the mobile terminal. Specifically, the pose calculation module includes:
a key point extraction unit, used to scale the acquired video frame to 300 px × 300 px, store it in the image sliding window, and check whether the current video frame is the first frame; if it is the first frame, only key points are extracted; otherwise, key points are extracted and the residuals between these key points and the key points of the previous video frame are computed. The key points are computed as follows: first, a pixel p is selected in the video frame, and 30% of the gray value G of pixel p is taken as the threshold T; then, 16 pixels on a circle of radius 3 centered at p are examined, and if N consecutive pixels on this circle have intensities greater than G + T or less than G - T, pixel p is considered a key point of the video frame; these steps are repeated until all pixels of the video frame have been traversed and the key point computation is complete; finally, non-maximum suppression is applied so that within each 50 px × 50 px window only the key point with the maximum response is kept. Fig. 3 is a schematic diagram of key point selection: for a point p, the sixteen pixels on the circle of radius 3 (the gray cells in the figure) are the candidate points, whose intensities are compared with that of p according to the rule above to decide whether p is a key point; the intensity of p is a value between 0 and 255. The residual e of a single key point is the error in the pixel intensity of that key point, computed as follows:
e = I_1(x_1) - I_2(x_2) = I_1(K p_1) - I_2(K(R p_1 + t))    (1)
In formula (1), I_2 is the image obtained from I_1 after a certain motion, R and t describe the motion trajectory of the mobile terminal, x_1 is the pixel position of the key point in image I_1, x_2 is the pixel position of the key point in image I_2, p_1 is the coordinate of the key point in real space, and K is the intrinsic matrix of the mobile terminal. The Lie algebra form of the residual is:
e = I_1(K p_1) - I_2(K exp(ξ^) p_1)    (2)
In formula (2), exp(ξ^) is the exponential map of the Lie algebra element ξ corresponding to the motion (R, t) of the mobile terminal.
a motion pose solving unit, used to solve the residual Jacobian by the Gauss-Newton method, obtain the motion pose between the current video frame and the previous video frame, and record it in the pose storage sliding window. Solving the residual Jacobian by the Gauss-Newton method specifically includes:
the objective function for optimizing the mobile terminal pose:
ξ* = arg min_ξ (1/2) Σ_{i=1}^{N} ||e_i||^2    (3)
In formula (3), ξ is the pose of the mobile terminal and J is the gradient of the residual with respect to the Lie algebra, that is, J = ∂e/∂ξ.
The Gauss-Newton incremental equation is:
J^T J Δξ* = -J^T e    (4)
In formula (4), Δξ* is the iteration increment.
The Gauss-Newton method solves the optimization problem by iterating along the descent direction given by the first-order gradient of the objective function.
The objective function is (x_i - x)^2 + (y_i - y)^2 + (z_i - z)^2 = f(X), where X = [x, y, z].
The iteration increment ΔX_k is obtained from J(X)^T J(X) ΔX = -J(X)^T f(X),
where J(X) is the Jacobian matrix of the equations;
J(X) = [∂f_i/∂x  ∂f_i/∂y  ∂f_i/∂z], i = 1, ..., n    (5)
In formula (5), ∂f_i/∂x = -2(x_i - x), ∂f_i/∂y = -2(y_i - y), ∂f_i/∂z = -2(z_i - z).
The Gauss-Newton procedure is as follows:
1: an initial point p_0, an iteration count k and an allowed error ε > 0 are given;
2: if f(X_{k+1}) - f(X_k) is smaller than the threshold ε, exit; otherwise go to the next step;
3: compute the iteration increment, substitute it into the objective function, and return to step 1.
a motion offset calculation unit, used to obtain the mobile terminal pose of the current video frame; the pose is a row vector with six degrees of freedom; the spatial offset of the pose is extracted and converted into a relative coordinate offset, which is the motion offset of the mobile terminal.
Positioning judgment module: checks the positioning listener state. If the system is not in the positioning state, the current position of the positioning target is updated through the position update module; otherwise, the positioning result of the positioning target is obtained through the target positioning module. In this application, the visual odometer pose calculation starts as soon as video frames are acquired; if the system is not in the positioning state, the computed pose information is added to the previously obtained current position to produce the updated current position. This step is executed repeatedly and does not stop when the positioning state changes. When the system is in the positioning state, the semantic positioning calibration is invoked: the current position is computed from the recognized objects, replacing the motion offset computed by the visual odometer, the user's current position is updated with the static object coordinates in the spatial database, and that position is drawn on the map platform as the new position.
Position update module: used to add the current pose information to the previously obtained current position, update the current position of the positioning target, and draw the indoor electronic map according to the updated current position;
Target positioning module: used to invoke the semantic positioning calibration to detect static objects in the video frame, estimate the depth of the static objects from the calibration points by triangulation (triangulation determines the distance to a point by observing the angles to that same point from two locations), build a multi-object positioning model from the depth of the static objects and the semantic information of the spatial data, solve the positioning model iteratively by the Gauss-Newton method to obtain the static object coordinate information and the current position of the mobile terminal, and combine the static object coordinate information with the current position of the mobile terminal to obtain the positioning result of the current position.
Specifically, the target positioning module includes:
an object recognition and depth calculation unit, used to take the video frame out of the image sliding window and feed it into the object detection neural network to obtain the categories of the static objects contained in the frame, set the center pixel position of each recognized static object as a calibration point, then take out the next video frame together with the mobile terminal pose of that frame, and compute the depth between the calibration point and the mobile terminal by triangulation; the triangulation formula is as follows:
s_1 x_1 = s_2 R x_2 + t    (6)
In formula (6), s_1 and s_2 are the depth values of the key point.
a coarse positioning unit, used to obtain the coordinate information of the static objects in the video frame; the coarse position data come from the coordinate information of the static objects stored in the BIM spatial database: using the recognized object category, the loaded BIM spatial information database is searched in the vicinity of the coordinates of the current position to find the coordinate information carried by the recognized object category.
a fine positioning unit, used to substitute the coordinate information obtained by coarse positioning into the positioning model and solve it iteratively by the Gauss-Newton method to obtain the current position of the mobile terminal. The positions of the static objects and the terminal should satisfy the following relationship:
(x_i - x)^2 + (y_i - y)^2 + (z_i - z)^2 = δρ_i + ε_i    (7)
Therefore the following system of nonlinear equations can be built and solved:
(x_1 - x)^2 + (y_1 - y)^2 + (z_1 - z)^2 = ρ_1 + σ_1
...
(x_n - x)^2 + (y_n - y)^2 + (z_n - z)^2 = ρ_n + σ_n    (8)
In formula (8), (x, y, z) is the current position of the mobile terminal, (x_n, y_n, z_n) is the static object coordinate information stored in the BIM spatial database, which describes the position of the static object relative to a fixed coordinate (for example the coordinate of the building center), ρ_n is the depth from the current static object to the mobile terminal, and σ_n is the measurement noise of the depth.
a positioning result generation unit, used to combine the current position of the mobile terminal computed iteratively by the fine positioning (this position is an offset relative to the coordinates of a specific target of the building) with the static object coordinate information obtained by the coarse positioning to obtain the positioning result of the current position. The electronic map of the area is obtained from the positioning result and the BIM information is overlaid on it to generate an indoor electronic map of the vicinity of the current positioning result.
Fig. 5 is a schematic diagram of the hardware structure of a device for the mobile terminal vision fusion positioning method provided by an embodiment of the present application. As shown in Fig. 5, the device includes one or more processors and a memory. Taking one processor as an example, the device may further include an input system and an output system.
The processor, the memory, the input system and the output system may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 5.
As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs, non-transitory computer-executable programs and modules. The processor executes the various functional applications and data processing of the electronic device, that is, implements the processing methods of the foregoing method embodiments, by running the non-transitory software programs, instructions and modules stored in the memory.
The memory may include a program storage area and a data storage area; the program storage area can store an operating system and an application program required by at least one function, and the data storage area can store data and the like. In addition, the memory may include a high-speed random access memory and may also include a non-transitory memory, such as at least one magnetic disk storage device, flash memory device or other non-transitory solid-state storage device. In some embodiments, the memory may optionally include memories arranged remotely with respect to the processor, and these remote memories may be connected to the processing system through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
The input system can receive input digital or character information and generate signal input. The output system may include display devices such as a display screen.
The one or more modules are stored in the memory and, when executed by the one or more processors, perform the following operations of any of the foregoing method embodiments:
executing the mobile terminal vision fusion positioning method described above, which specifically includes the following steps:
Step a: obtaining the initial position of the mobile terminal based on a calibrated starting position and sensor information, and setting the initial position as the current position of the positioning target;
Step b: acquiring video frames with the mobile terminal;
Step c: detecting static objects in the video frames, obtaining the coordinate information of the static objects from the BIM spatial database, substituting the coordinate information of the static objects into the multi-object positioning model, solving the positioning model iteratively by the Gauss-Newton method to obtain the current position of the mobile terminal, and combining the current position of the mobile terminal with the coordinate information of the static objects to obtain the positioning result of the positioning target.
The above product can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to the execution of the method. For technical details not described in detail in this embodiment, please refer to the method provided by the embodiments of this application.
The embodiments of the present application provide a non-transitory (non-volatile) computer storage medium storing computer-executable instructions that can perform the following operations:
Step a: obtaining the initial position of the mobile terminal based on a calibrated starting position and sensor information, and setting the initial position as the current position of the positioning target;
Step b: acquiring video frames with the mobile terminal;
Step c: detecting static objects in the video frames, obtaining the coordinate information of the static objects from the BIM spatial database, substituting the coordinate information of the static objects into the multi-object positioning model, solving the positioning model iteratively by the Gauss-Newton method to obtain the current position of the mobile terminal, and combining the current position of the mobile terminal with the coordinate information of the static objects to obtain the positioning result of the positioning target.
The embodiments of the present application provide a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium. The computer program comprises program instructions which, when executed by a computer, cause the computer to perform the following operations:
Step a: obtaining the initial position of the mobile terminal based on a calibrated starting position and sensor information, and setting the initial position as the current position of the positioning target;
Step b: acquiring video frames with the mobile terminal;
Step c: detecting static objects in the video frames, obtaining the coordinate information of the static objects from the BIM spatial database, substituting the coordinate information of the static objects into the multi-object positioning model, solving the positioning model iteratively by the Gauss-Newton method to obtain the current position of the mobile terminal, and combining the current position of the mobile terminal with the coordinate information of the static objects to obtain the positioning result of the positioning target.
The mobile terminal vision fusion positioning method, system, and electronic device of the embodiments of the present application use a visual sensor to detect and recognize static objects in the real world to obtain their spatial relationships, match these relationships against the object spatial relationships provided by the BIM model in geographic topological space, establish a system of nonlinear equations from the distance measurements to the objects, iteratively solve the equations, and converge to an accurate position, thereby achieving a more accurate, more convenient, and lower-cost positioning method.
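Read as pseudocode, the positioning step described above is a weighted nonlinear least-squares problem: the unknown terminal position (x, y, z) is adjusted until the predicted distances to the recognized static objects match the measured depths ρₙ, each weighted by its noise σₙ. The sketch below illustrates one plausible Gauss-Newton iteration in Python; the function name, the NumPy-based interface, and the use of the normal equations are assumptions introduced here for illustration, not the patent's prescribed implementation.

```python
# Illustrative Gauss-Newton solver for the distance-based positioning step.
# The object coordinates, depths, and noise values are hypothetical inputs;
# the patent defines the model only through its residual equations.
import numpy as np

def gauss_newton_position(obj_coords, depths, sigmas, x0, iters=20, tol=1e-6):
    """Estimate (x, y, z) from depths to static objects with known BIM coordinates.

    obj_coords : (N, 3) array of object coordinates (x_n, y_n, z_n)
    depths     : (N,) array of measured depths rho_n from each object to the terminal
    sigmas     : (N,) array of depth measurement noise sigma_n (used as weights)
    x0         : (3,) initial guess, e.g. the last known current position
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        diff = x - obj_coords                          # (N, 3) vectors from objects to terminal
        dists = np.linalg.norm(diff, axis=1)           # predicted depths
        r = (dists - depths) / sigmas                  # weighted residuals
        J = diff / (dists[:, None] * sigmas[:, None])  # Jacobian of residuals w.r.t. (x, y, z)
        dx = np.linalg.solve(J.T @ J, -J.T @ r)        # normal equations: (J^T J) dx = -J^T r
        x += dx
        if np.linalg.norm(dx) < tol:
            break
    return x
```

With at least three non-collinear objects and an initial guess taken from the last known position, the iteration typically converges within a few steps; with fewer objects or a poor initial guess, the normal equations can become ill-conditioned and the update may diverge.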
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

  1. A mobile terminal vision fusion positioning method, characterized in that it comprises the following steps:
    Step a: obtaining the initial position of the mobile terminal based on the calibrated starting position and sensor information, and setting the initial position as the current position of the positioning target;
    Step b: acquiring video frames with the mobile terminal;
    Step c: detecting the static objects in the video frame, obtaining the geographic coordinate information of the static objects from the BIM spatial database, substituting the coordinate information of the static objects into a multi-target object positioning model, iteratively solving the positioning model by the Gauss-Newton method to obtain the current position of the mobile terminal, and combining the current position of the mobile terminal with the coordinate information of the static objects to obtain the positioning result of the positioning target.
  2. The mobile terminal vision fusion positioning method according to claim 1, characterized in that in step b, after acquiring the video frames with the mobile terminal, the method further comprises: a visual odometer calculating the current pose information of the mobile terminal from the acquired video frames.
  3. The mobile terminal vision fusion positioning method according to claim 2, characterized in that the visual odometer calculating the current pose information of the mobile terminal from the acquired video frames specifically comprises:
    Step b1: the visual odometer scales the acquired video frame to a set size and stores it in an image sliding window, then determines whether the current video frame is the first frame; if the current video frame is the first frame, only key point extraction is performed; otherwise, key points are extracted and the residual between the current key points and the key points of the previous video frame is calculated; the residual e of a single key point is the error in the pixel intensity of the key point, calculated as:
    e = I₁(x₁) − I₂(x₂) = I₁(Kp₁) − I₂(K(Rp₁ + t))
    where I₂ is obtained from I₁ after a certain motion, R and t describe the motion trajectory of the mobile terminal, x₁ is the pixel position of the key point in image I₁, x₂ is the pixel position of the key point in image I₂, p₁ is the coordinate of the key point in real space, and K is the intrinsic parameter matrix of the mobile terminal;
    Step b2: solving the residual Jacobian by the Gauss-Newton method to obtain the motion pose between the current video frame and the previous video frame, and recording it in a pose storage sliding window;
    Step b3: obtaining the mobile terminal pose of the current video frame, extracting the spatial offset of the pose, and converting the spatial offset into a relative coordinate offset value, which is the motion offset of the mobile terminal.
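The residual defined in claim 3 compares the intensity of a keypoint in one frame with the intensity at its reprojected position in the next frame. A minimal sketch of evaluating that residual for a single keypoint is given below, assuming grayscale images indexed as I[row, col], a known intrinsic matrix K, and a candidate motion (R, t); the nearest-neighbour sampling and the function names are simplifications introduced here, not part of the claimed method.

```python
# Sketch of the single-keypoint photometric residual
# e = I1(x1) - I2(x2) = I1(K p1) - I2(K (R p1 + t)).
import numpy as np

def project(K, p):
    """Project a 3D point p (camera frame) to pixel coordinates with intrinsics K."""
    uvw = K @ p
    return uvw[:2] / uvw[2]

def photometric_residual(I1, I2, K, R, t, p1):
    """Intensity error of one keypoint between consecutive frames I1 and I2."""
    x1 = project(K, p1)               # pixel position of the keypoint in I1
    x2 = project(K, R @ p1 + t)       # pixel position after the camera motion (R, t)
    u1, v1 = np.round(x1).astype(int) # nearest-neighbour sampling; real systems
    u2, v2 = np.round(x2).astype(int) # would interpolate and check image bounds
    return float(I1[v1, u1]) - float(I2[v2, u2])   # e = I1(x1) - I2(x2)
```

Summing squared residuals of this form over all key points, and differentiating them with respect to (R, t), gives the Jacobian that step b2 feeds into the Gauss-Newton update.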
  4. The mobile terminal vision fusion positioning method according to claim 3, characterized in that step b further comprises: checking the monitored positioning state; if it is not the positioning state, adding the current pose information of the mobile terminal to the last acquired current position and updating the current position of the positioning target; if it is the positioning state, executing step c.
  5. The mobile terminal vision fusion positioning method according to claim 4, characterized in that in step c, detecting the static objects in the video frame, obtaining the coordinate information of the static objects from the BIM spatial database, substituting the coordinate information of the static objects into the multi-target object positioning model, and iteratively solving the positioning model by the Gauss-Newton method to obtain the current position of the mobile terminal specifically comprises:
    Step c1: taking out the video frame and feeding it into the target detection neural network to obtain the categories of static objects contained in the video frame, setting the center pixel position of each static object as a calibration point, then taking out the next video frame and the mobile terminal pose information of the next video frame, and computing the depth information between the calibration points and the mobile terminal by triangulation, where the triangulation formula is as follows:
    [Triangulation formula: original image PCTCN2019130553-appb-100001]
    where s₁ and s₂ are the depth information of the key points;
    Step c2: using the identified static object categories, loading the BIM spatial information database with the coordinate information of the current position, and obtaining the coordinate information of the static objects from the BIM spatial information database;
    Step c3: substituting the coordinate information of the static objects into the positioning model and solving it iteratively by the Gauss-Newton method to obtain the current position of the mobile terminal, where the system of equations solved by the Gauss-Newton method is:
    [System of positioning equations: original image PCTCN2019130553-appb-100002]
    where (x, y, z) is the current position of the mobile terminal, (xₙ, yₙ, zₙ) is the static object coordinate information stored in the BIM spatial database, ρₙ is the depth from the static object to the mobile terminal, and σₙ is the measurement noise of the depth;
    Step c4: combining the current position of the mobile terminal with the static object coordinate information to obtain the positioning result of the current position, and generating an indoor electronic map and a BIM spatial database of the positioning result.
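The triangulation formula in step c1 survives only as a figure reference, so the sketch below assumes the common two-view relation s₁x₁ = s₂Rx₂ + t between normalized keypoint coordinates, which is consistent with the stated meaning of s₁ and s₂ as keypoint depths; the solver interface and variable names are illustrative assumptions.

```python
# Sketch of two-view triangulation for the calibration-point depth in step c1,
# assuming the relation s1 * x1 = s2 * (R @ x2) + t.
import numpy as np

def triangulate_depths(x1, x2, R, t):
    """Solve s1 * x1 = s2 * (R @ x2) + t for the depths s1, s2 in a least-squares sense.

    x1, x2 : (3,) normalized camera coordinates of the same calibration point in two
             consecutive frames (pixel positions back-projected with K^-1)
    R, t   : relative rotation and translation between the two frames
    """
    A = np.column_stack((x1, -(R @ x2)))       # 3x2 system: [x1, -R x2] [s1, s2]^T = t
    s, *_ = np.linalg.lstsq(A, t, rcond=None)  # least-squares depths
    return s[0], s[1]
```

The recovered depth from the terminal to each calibration point then supplies the ρₙ measurements used in the Gauss-Newton position solve of step c3.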
  6. A mobile terminal vision fusion positioning system, characterized in that it comprises:
    an initial positioning unit: used to obtain the initial position of the mobile terminal based on the calibrated starting position and sensor information, and set the initial position as the current position of the positioning target;
    a video frame acquisition module: used to acquire video frames with the mobile terminal;
    a target positioning module: used to detect the static objects in the video frame, obtain the coordinate information of the static objects from the BIM spatial database, substitute the coordinate information of the static objects into a multi-target object positioning model, iteratively solve the positioning model by the Gauss-Newton method to obtain the new current position of the mobile terminal, and combine the new current position of the mobile terminal with the coordinate information of the static objects to obtain the positioning result of the positioning target.
  7. The mobile terminal vision fusion positioning system according to claim 6, characterized in that it further comprises a pose calculation module, wherein the pose calculation module is used to calculate the current pose information of the mobile terminal from the acquired video frames through a visual odometer.
  8. The mobile terminal vision fusion positioning system according to claim 7, characterized in that the pose calculation module comprises:
    a key point extraction unit: used to scale the acquired video frame to a set size through the visual odometer, store it in an image sliding window, and determine whether the current video frame is the first frame; if the current video frame is the first frame, only key point extraction is performed; otherwise, key points are extracted and the residual between the current key points and the key points of the previous video frame is calculated; the residual e of a single key point is the error in the pixel intensity of the key point, calculated as:
    e = I₁(x₁) − I₂(x₂) = I₁(Kp₁) − I₂(K(Rp₁ + t))
    where I₂ is obtained from I₁ after a certain motion, R and t describe the motion trajectory of the mobile terminal, x₁ is the pixel position of the key point in image I₁, x₂ is the pixel position of the key point in image I₂, p₁ is the coordinate of the key point in real space, and K is the intrinsic parameter matrix of the mobile terminal;
    a motion pose solving unit: used to solve the residual Jacobian by the Gauss-Newton method, obtain the motion pose between the current video frame and the previous video frame, and record it in a pose storage sliding window;
    a motion offset calculation unit: used to obtain the mobile terminal pose of the current video frame, extract the spatial offset of the pose, and convert the spatial offset into a relative coordinate offset value, which is the motion offset of the mobile terminal.
  9. The mobile terminal vision fusion positioning system according to claim 8, characterized in that it further comprises a positioning judgment module and a position update module; the positioning judgment module checks the monitored positioning state; if it is not the positioning state, the position update module adds the current pose information of the mobile terminal to the last acquired current position and updates the current position of the positioning target; if it is the positioning state, the positioning result of the positioning target is obtained through the target positioning module.
  10. The mobile terminal vision fusion positioning system according to claim 9, characterized in that the target positioning module specifically comprises:
    an object recognition and depth calculation unit: used to take out the video frame, feed it into the target detection neural network, obtain the categories of static objects contained in the video frame, set the center pixel position of each static object as a calibration point, then take out the next video frame and the mobile terminal pose information of the next video frame, and compute the depth information between the calibration points and the mobile terminal by triangulation, where the triangulation formula is as follows:
    [Triangulation formula: original image PCTCN2019130553-appb-100003]
    where s₁ and s₂ are the depth information of the key points;
    a coarse positioning unit: used to load the BIM spatial information database with the coordinate information of the current position according to the identified static object categories, and obtain the coordinate information of the static objects from the BIM spatial information database;
    a fine positioning unit: used to substitute the coordinate information of the static objects into the positioning model and solve it iteratively by the Gauss-Newton method to obtain the current position of the mobile terminal, where the system of equations solved by the Gauss-Newton method is:
    [System of positioning equations: original image PCTCN2019130553-appb-100004]
    where (x, y, z) is the current position of the mobile terminal, (xₙ, yₙ, zₙ) is the static object coordinate information stored in the BIM spatial database, ρₙ is the depth from the static object to the mobile terminal, and σₙ is the measurement noise of the depth;
    a positioning result generating unit: used to combine the current position of the mobile terminal with the static object coordinate information to obtain the positioning result of the current position, and generate a nearby indoor electronic map.
  11. An electronic device, comprising:
    at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the following operations of the mobile terminal vision fusion positioning method according to any one of claims 1 to 5:
    Step a: obtaining the initial position of the mobile terminal based on the calibrated starting position and sensor information, and setting the initial position as the current position of the positioning target;
    Step b: acquiring video frames with the mobile terminal;
    Step c: detecting the static objects in the video frame, obtaining the coordinate information of the static objects from the BIM spatial database, substituting the coordinate information of the static objects into a multi-target object positioning model, iteratively solving the positioning model by the Gauss-Newton method to obtain the current position of the mobile terminal, and combining the current position of the mobile terminal with the coordinate information of the static objects to obtain the positioning result of the positioning target.
PCT/CN2019/130553 2019-06-26 2019-12-31 Mobile side vision fusion positioning method and system, and electronic device WO2020258820A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910562370.7A CN110375739B (en) 2019-06-26 2019-06-26 Mobile terminal vision fusion positioning method and system and electronic equipment
CN201910562370.7 2019-06-26

Publications (1)

Publication Number Publication Date
WO2020258820A1 true WO2020258820A1 (en) 2020-12-30

Family

ID=68249420

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/130553 WO2020258820A1 (en) 2019-06-26 2019-12-31 Mobile side vision fusion positioning method and system, and electronic device

Country Status (2)

Country Link
CN (1) CN110375739B (en)
WO (1) WO2020258820A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110375739B (en) * 2019-06-26 2021-08-24 中国科学院深圳先进技术研究院 Mobile terminal vision fusion positioning method and system and electronic equipment
CN112815923B (en) * 2019-11-15 2022-12-30 华为技术有限公司 Visual positioning method and device
CN111189440B (en) * 2019-12-31 2021-09-07 中国电建集团华东勘测设计研究院有限公司 Positioning navigation method based on comparison of spatial information model and real-time image
CN111462200B (en) * 2020-04-03 2023-09-19 中国科学院深圳先进技术研究院 Cross-video pedestrian positioning and tracking method, system and equipment
CN111735487B (en) * 2020-05-18 2023-01-10 清华大学深圳国际研究生院 Sensor, sensor calibration method and device, and storage medium
CN112533135B (en) * 2020-11-18 2022-02-15 联通智网科技股份有限公司 Pedestrian positioning method and device, server and storage medium
CN112788583B (en) * 2020-12-25 2024-01-05 深圳酷派技术有限公司 Equipment searching method and device, storage medium and electronic equipment
CN113963068B (en) * 2021-10-25 2022-08-23 季华实验室 Global calibration method for mirror image type single-camera omnidirectional stereoscopic vision sensor
WO2024043831A1 (en) * 2022-08-23 2024-02-29 Nanyang Technological University Mobile robot initialization in a building based on a building information model (bim) of the building


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106056664B (en) * 2016-05-23 2018-09-21 武汉盈力科技有限公司 A kind of real-time three-dimensional scene reconstruction system and method based on inertia and deep vision
CN108253964A (en) * 2017-12-29 2018-07-06 齐鲁工业大学 A kind of vision based on Time-Delay Filter/inertia combined navigation model building method
CN108717712B (en) * 2018-05-29 2021-09-03 东北大学 Visual inertial navigation SLAM method based on ground plane hypothesis

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106352867A (en) * 2015-07-16 2017-01-25 福特全球技术公司 Method and apparatus for determining a vehicle ego-position
WO2019025112A1 (en) * 2017-07-12 2019-02-07 Veoneer Sweden Ab A driver assistance system and method
CN108955718A (en) * 2018-04-10 2018-12-07 中国科学院深圳先进技术研究院 A kind of visual odometry and its localization method, robot and storage medium
CN109544636A (en) * 2018-10-10 2019-03-29 广州大学 A kind of quick monocular vision odometer navigation locating method of fusion feature point method and direct method
CN109523589A (en) * 2018-11-13 2019-03-26 浙江工业大学 A kind of design method of more robust visual odometry
CN110375739A (en) * 2019-06-26 2019-10-25 中国科学院深圳先进技术研究院 A kind of mobile terminal vision fusion and positioning method, system and electronic equipment

Also Published As

Publication number Publication date
CN110375739A (en) 2019-10-25
CN110375739B (en) 2021-08-24

Similar Documents

Publication Publication Date Title
WO2020258820A1 (en) Mobile side vision fusion positioning method and system, and electronic device
CN112347840B (en) Vision sensor laser radar integrated unmanned aerial vehicle positioning and image building device and method
US11961297B2 (en) Terminal device, information processing device, object identifying method, program, and object identifying system
US10869166B2 (en) Location correlation in a region based on signal strength indications
JP5893802B2 (en) Sensor calibration and position estimation based on vanishing point determination
CN103119611B (en) The method and apparatus of the location based on image
WO2013027628A1 (en) Information processing device, information processing method, and program
CN105246039A (en) Image processing-based indoor positioning method and system
CN103827634A (en) Logo detection for indoor positioning
JP2021517284A (en) Indoor positioning methods, indoor positioning systems, indoor positioning devices and computer readable media
US10846933B2 (en) Geophysical sensor positioning system
CN111812978B (en) Cooperative SLAM method and system for multiple unmanned aerial vehicles
CN112652020A (en) Visual SLAM method based on AdaLAM algorithm
JP2023503750A (en) ROBOT POSITIONING METHOD AND DEVICE, DEVICE, STORAGE MEDIUM
TWI822423B (en) Computing apparatus and model generation method
CN108731679A (en) Mobile robot environmental characteristic localization method
CN103905826A (en) Self-adaptation global motion estimation method
Geng et al. Robot positioning and navigation technology is based on Integration of the Global Navigation Satellite System and real-time kinematics
KR102407802B1 (en) Apparatus for estimating indoor and outdoor three-dimensional coordinates and orientation based on artificial neaural network learning
CN117537803B (en) Robot inspection semantic-topological map construction method, system, equipment and medium
Lee et al. Wi-Fi, CCTV and PDR Integrated Pedestrian Positioning System.
Chengqing et al. An improved visual indoor navigation method based on fully convolutional neural network
Jing et al. Video Seamless Splicing Method Based on SURF Algorithm and Harris Corner Points Detection
CN117451052A (en) Positioning method, device, equipment and storage medium based on vision and wheel speed meter
CN116105752A (en) Distance measurement-assisted multi-ground robot collaborative mapping method, device and equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19934826

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19934826

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27/07/2022)