WO2023083256A1 - Pose display method and apparatus, system, server, and storage medium - Google Patents

Pose display method and apparatus, system, server, and storage medium

Info

Publication number
WO2023083256A1
Authority
WO
WIPO (PCT)
Prior art keywords
positioning
image
target
map
pose
Prior art date
Application number
PCT/CN2022/131134
Other languages
English (en)
Chinese (zh)
Inventor
李佳宁
李杰
毛慧
浦世亮
Original Assignee
杭州海康威视数字技术股份有限公司
Priority date
Filing date
Publication date
Application filed by 杭州海康威视数字技术股份有限公司
Publication of WO2023083256A1

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S19/00Satellite radio beacon positioning systems; Determining position, velocity or attitude using signals transmitted by such systems
    • G01S19/38Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system
    • G01S19/39Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system the satellite radio beacon positioning system transmitting time-stamped messages, e.g. GPS [Global Positioning System], GLONASS [Global Orbiting Navigation Satellite System] or GALILEO
    • G01S19/42Determining position
    • G01S19/45Determining position by combining measurements of signals from the satellite radio beacon positioning system with a supplementary measurement
    • G01S19/47Determining position by combining measurements of signals from the satellite radio beacon positioning system with a supplementary measurement the supplementary measurement being an inertial measurement, e.g. tightly coupled inertial
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C3/00Measuring distances in line of sight; Optical rangefinders

Definitions

  • the present application relates to the field of computer vision, and in particular to a pose display method, device and system, a server, and a machine-readable storage medium.
  • GPS Global Positioning System
  • the Beidou satellite navigation system consists of three parts: the space segment, the ground segment, and the user segment. It can provide users with high-precision, high-reliability positioning, navigation, and timing services around the clock and around the world, and also has regional navigation, positioning, and timing capabilities.
  • the GPS or the Beidou satellite navigation system can be used to locate the terminal device.
  • when the GPS signal or the Beidou signal is relatively good, the GPS or Beidou satellite navigation system can be used to accurately locate the terminal device.
  • when the signal is poor, the GPS or Beidou satellite navigation system cannot accurately locate the terminal device. For example, in coal, electric power, petrochemical, and other energy industries there is a growing need for positioning; these positioning requirements generally arise in indoor environments, where, due to problems such as signal occlusion, terminal devices cannot be accurately located.
  • the present application provides a pose display method, which is applied to a cloud edge management system.
  • the cloud edge management system includes a terminal device and a server, and the server includes a three-dimensional visual map of the target scene.
  • the method includes: while the terminal device moves in the target scene, acquiring a target image of the target scene and motion data of the terminal device, and determining a self-positioning trajectory of the terminal device based on the target image and the motion data; if the target image includes multiple frames of images, selecting a part of the frames from the multiple frames as images to be tested, and sending the images to be tested and the self-positioning trajectory to the server; the server generating, based on the images to be tested and the self-positioning trajectory, a fusion positioning trajectory of the terminal device in the three-dimensional visual map, the fusion positioning trajectory including a plurality of fusion positioning poses; and, for each fusion positioning pose in the fusion positioning trajectory, the server determining a target positioning pose corresponding to the fusion positioning pose and displaying the target positioning pose.
  • the present application provides a cloud edge management system
  • the cloud edge management system includes a terminal device and a server
  • the server includes a three-dimensional visual map of the target scene
  • the terminal device is configured to acquire a target image of the target scene and motion data of the terminal device, and to determine a self-positioning trajectory of the terminal device based on the target image and the motion data; and, if the target image includes multiple frames of images, to select a part of the frames from the multiple frames as images to be tested and to send the images to be tested and the self-positioning trajectory to the server;
  • the server is configured to generate, based on the images to be tested and the self-positioning trajectory, a fusion positioning trajectory of the terminal device in the three-dimensional visual map, the fusion positioning trajectory including a plurality of fusion positioning poses; and, for each fusion positioning pose in the fusion positioning trajectory, to determine a target positioning pose corresponding to the fusion positioning pose and to display the target positioning pose.
  • the present application provides a pose display device, applied to a server in a cloud edge management system, where the server includes a three-dimensional visual map of a target scene. The device includes: an acquisition module, configured to acquire images to be tested and a self-positioning trajectory, where the self-positioning trajectory is determined by the terminal device based on a target image of the target scene and motion data of the terminal device, and the images to be tested are a part of the multiple frames of images included in the target image; a generating module, configured to generate a fusion positioning trajectory of the terminal device in the three-dimensional visual map based on the images to be tested and the self-positioning trajectory, where the fusion positioning trajectory includes a plurality of fusion positioning poses; and a display module, configured to, for each fusion positioning pose in the fusion positioning trajectory, determine a target positioning pose corresponding to the fusion positioning pose and display the target positioning pose.
  • the present application provides a server, including a processor and a machine-readable storage medium, the machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the processor is used to execute the machine-executable instructions to implement the pose display method according to the embodiment of the present application.
  • the present application provides a machine-readable storage medium.
  • Computer instructions are stored on the machine-readable storage medium.
  • the pose display method according to the embodiment of the present application can be implemented.
  • a cloud-edge combined positioning and display method is proposed in the embodiment of the present application.
  • the terminal device at the edge collects the target image and motion data, and performs high frame rate self-positioning based on the target image and motion data to obtain a high frame rate self-positioning trajectory.
  • the server in the cloud receives the images to be tested and the self-positioning trajectory sent by the terminal device, and obtains a high frame rate fusion positioning trajectory based on them, that is, a high frame rate fusion positioning trajectory in the 3D visual map. This is a vision-based indoor positioning method, and the fusion positioning trajectory can be displayed.
  • the terminal device computes the high frame rate self-positioning trajectory locally and sends only the self-positioning trajectory and a small number of images to be tested, reducing the amount of data transmitted over the network.
  • Global positioning is performed on the server, reducing the consumption of computing resources and storage resources on terminal devices. The method can be applied in coal, electric power, petrochemical, and other energy industries to realize indoor positioning of personnel (such as workers and inspection personnel), quickly obtain their location information, and help ensure their safety.
  • FIG. 1 is a schematic flowchart of a pose display method in an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of a cloud edge management system in an embodiment of the present application
  • FIG. 3 is a schematic flow diagram of determining a self-positioning trajectory in an embodiment of the present application
  • FIG. 4 is a schematic flow diagram of determining a global positioning track in an embodiment of the present application.
  • Fig. 5 is a schematic diagram of a self-positioning trajectory, a global positioning trajectory and a fusion positioning trajectory
  • FIG. 6 is a schematic flow diagram of determining a fusion positioning trajectory in an embodiment of the present application.
  • Fig. 7 is a schematic structural diagram of a pose display device in an embodiment of the present application.
  • Although the terms first, second, and third may be used in the embodiments of the present application to describe various information, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present application, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
  • a pose display method is proposed, which can be applied to the cloud edge management system.
  • the cloud edge management system can include a terminal device (that is, a terminal device at the edge) and a server (that is, a server in the cloud), and the server may include a three-dimensional visual map of the target scene (such as an indoor environment, an outdoor environment, etc.).
  • Referring to Fig. 1, which is a schematic flow chart of the pose display method, the method may include:
  • Step 101 During the movement of the terminal device in the target scene, acquire the target image of the target scene and the motion data of the terminal device, and determine the self-positioning trajectory of the terminal device based on the target image and motion data.
  • For example, the terminal device traverses each current frame image in the multiple frames of images; based on the self-positioning poses corresponding to the frame images preceding the current frame image, the map positions (that is, coordinate positions) of the terminal device in the self-positioning coordinate system, and the motion data, the terminal device determines the self-positioning pose corresponding to the current frame image (that is, the self-positioning pose of the terminal device); and based on the self-positioning poses corresponding to the frames of the multiple frames of images, the terminal device generates its self-positioning trajectory in the self-positioning coordinate system.
  • A pose includes a position and an attitude (orientation).
  • self-positioning coordinate system is a coordinate system established with the self-positioning pose corresponding to the first frame image in the multi-frame images as the coordinate origin.
  • If the current frame image is a key image, the map position of the terminal device in the self-positioning coordinate system may be generated based on the current position of the terminal device (i.e., the position corresponding to the current frame image). If the current frame image is a non-key image, the map position of the terminal device in the self-positioning coordinate system does not need to be generated based on the current position of the terminal device.
  • the current location of the terminal device is, for example, the actual physical location of the terminal device when it collects the current frame of image.
  • If the number of matching feature points between the current frame image and the previous frame image does not reach a preset threshold, it is determined that the current frame image is a key image; if the number reaches the preset threshold, it is determined that the current frame image is a non-key image.
  • the first frame of image may be used as a key image.
  • Step 102 If the target image includes multiple frames of images, the terminal device selects a part of frame images from the multiple frames of images as the image to be tested, and sends the image to be tested and the self-positioning trajectory to the server.
  • the terminal device may select M frames of images from multiple frames of images as images to be tested, and M may be a positive integer, such as 1, 2, 3, and so on.
  • Step 103 the server generates a fusion positioning trajectory of the terminal device in the three-dimensional visual map based on the image to be tested and the self-positioning trajectory, and the fusion positioning trajectory may include multiple fusion positioning poses.
  • the server may determine the target map point corresponding to the image to be tested from the three-dimensional visual map of the target scene, and determine the global positioning track of the terminal device in the three-dimensional visual map based on the target map point. Then, the server generates a fusion positioning track of the terminal device in the three-dimensional visual map based on the self-positioning track and the global positioning track.
  • the frame rate of the fused positioning poses included in the fused positioning track may be greater than the frame rate of the global positioning poses included in the global positioning track, that is, the frame rate of the fused positioning track is higher than the frame rate of the global positioning track.
  • the pose frame rate refers to the frequency of pose output, that is, the number of poses output by the system per second.
  • the fused positioning track may be a high frame rate pose in the 3D visual map, and the global positioning track may be a low frame rate pose in the 3D visual map.
  • the frame rate of the fused localization trajectory is higher than that of the global localization trajectory, indicating that the number of fused localization poses is greater than the number of global localization poses.
  • the frame rate of the fused positioning poses included in the fused positioning track may be equal to the frame rate of the self-positioning poses included in the self-positioning track, that is, the frame rate of the fused positioning track is equal to the frame rate of the self-positioning track; in other words, the self-positioning track is also a high frame rate trajectory.
  • the frame rate of the fused localization trajectory is equal to the frame rate of the self-localization trajectory, which means that the number of fused localization poses is equal to the number of self-localization poses.
  • the 3D visual map may include but is not limited to at least one of the following: a pose matrix corresponding to each sample image, a sample global descriptor corresponding to each sample image, a sample local descriptor corresponding to each feature point in each sample image, and map point information.
  • the server determines the target map point corresponding to the image to be tested from the 3D visual map of the target scene, and determines the global positioning track of the terminal device in the 3D visual map based on the target map point, which may include but not limited to: For each frame of the image to be tested, candidate sample images are selected from the multiple frames of sample images based on the similarity between the image to be tested and the multiple frames of sample images corresponding to the three-dimensional visual map.
  • a plurality of feature points are acquired from the image to be tested; for each feature point, a target map point corresponding to the feature point is determined from the plurality of map points corresponding to the candidate sample image.
  • a global positioning pose in the three-dimensional visual map corresponding to the image to be tested is determined based on the plurality of feature points and target map points corresponding to the plurality of feature points.
  • a global positioning trajectory of the terminal device in the three-dimensional visual map is generated based on the global positioning poses corresponding to all images to be tested.
  • The server selecting candidate sample images from the multiple frames of sample images may include: determining the global descriptor to be tested corresponding to the image to be tested, and determining the distance between the global descriptor to be tested and the sample global descriptor corresponding to each frame of sample image corresponding to the 3D visual map; wherein the 3D visual map includes at least the sample global descriptor corresponding to each frame of sample image.
  • Based on these distances, a candidate sample image is selected from the multiple frames of sample images; wherein the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is the minimum of the distances between the global descriptor to be tested and each sample global descriptor, and/or the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is less than a distance threshold.
  • The server determining the global descriptor to be tested corresponding to the image to be tested may include but is not limited to: determining the bag-of-words vector corresponding to the image to be tested based on a trained dictionary model, and determining the bag-of-words vector as the global descriptor to be tested corresponding to the image to be tested; or inputting the image to be tested into a trained deep learning model to obtain a target vector corresponding to the image to be tested, and determining the target vector as the global descriptor to be tested corresponding to the image to be tested.
  • the above is just an example of determining the global descriptor to be tested, and is not limited thereto.
  • The server determining the target map point corresponding to a feature point from the multiple map points corresponding to the candidate sample image may include but is not limited to: determining the local descriptor to be tested corresponding to the feature point, where the local descriptor to be tested is a feature vector representing the image block in which the feature point is located, and the image block is located in the image to be tested; and determining the distance between the local descriptor to be tested and the sample local descriptor corresponding to each map point corresponding to the candidate sample image; wherein the three-dimensional visual map includes at least the sample local descriptor corresponding to each map point corresponding to the candidate sample image.
  • The target map point may then be selected from the multiple map points corresponding to the candidate sample image based on the distances between the local descriptor to be tested and each sample local descriptor; wherein the distance between the local descriptor to be tested and the sample local descriptor corresponding to the target map point may be the minimum of those distances, and/or may be less than a distance threshold.
  • The server generating the fused positioning trajectory of the terminal device in the 3D visual map based on the self-positioning trajectory and the global positioning trajectory may include but is not limited to: the server selecting, from all self-positioning poses included in the self-positioning trajectory, N self-positioning poses corresponding to a target time period, and selecting, from all global positioning poses included in the global positioning trajectory, P global positioning poses corresponding to the target time period, where N and P are positive integers and N is greater than P; determining, based on the N self-positioning poses and the P global positioning poses, N fused positioning poses corresponding one-to-one to the N self-positioning poses; and generating, based on the N fused positioning poses, the fused positioning trajectory of the terminal device in the 3D visual map.
  • the server may also update the fused positioning track. Specifically, the server may also select an initial fused positioning pose from the fused positioning trajectory, and select an initial self-localization pose corresponding to the initial fused positioning pose from the self-localization trajectory.
  • A target self-localization pose is selected from the self-localization trajectory, and a target fused positioning pose is determined based on the initial fused positioning pose, the initial self-localization pose, and the target self-localization pose. Then, a new fused positioning trajectory is generated based on the target fused positioning pose and the existing fused positioning trajectory to replace the original fused positioning trajectory.
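The fusion and update steps above amount to composing relative poses: a self-localization pose can be carried into the 3D visual map frame through an anchor pose known in both coordinate systems. The following minimal Python/NumPy sketch illustrates this with 4x4 homogeneous matrices; the function name and the use of NumPy are illustrative assumptions, not details from the embodiment.

```python
import numpy as np

def fuse_self_pose(T_fused_anchor, T_self_anchor, T_self_target):
    """Map a self-localization pose into the 3D visual map frame using an
    anchor pose that is known in both frames (all inputs are 4x4 matrices).

    T_fused_anchor: anchor pose in the map frame (an existing fused pose)
    T_self_anchor:  the same anchor pose in the self-positioning frame
    T_self_target:  the self-localization pose to convert
    """
    # Relative motion expressed in the self-positioning frame, re-applied in the map frame.
    T_rel = np.linalg.inv(T_self_anchor) @ T_self_target
    return T_fused_anchor @ T_rel
```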
  • Step 104 For each fused positioning pose in the fused positioning trajectory, the server determines a target positioning pose corresponding to the fused positioning pose, and displays the target positioning pose.
  • the server may determine the fused positioning pose as the target positioning pose, and display the target positioning pose on the three-dimensional visual map.
  • the server converts the fusion positioning pose into the target positioning pose in the 3D visualized map, and displays the target positioning pose through the 3D visualized map.
  • The 3D visual map is constructed by a visual mapping algorithm and is used only by the map positioning algorithm; the 3D visualization map is a 3D model used to show the 3D structure of the scene.
  • The method of determining the target transformation matrix between the 3D visual map and the 3D visualization map may include but is not limited to: for each of multiple calibration points in the target scene, determining a coordinate pair corresponding to the calibration point, where the coordinate pair includes the position coordinates of the calibration point in the 3D visual map and the position coordinates of the calibration point in the 3D visualization map; and determining the target transformation matrix based on the coordinate pairs corresponding to the multiple calibration points.
  • For example, an initial transformation matrix is selected; the position coordinates in the 3D visual map are mapped to mapping coordinates in the 3D visualization map based on the initial transformation matrix, and whether the initial transformation matrix has converged is determined based on the relationship between the mapping coordinates and the actual coordinates in the 3D visualization map. If it has converged, the initial transformation matrix is determined to be the target transformation matrix; if not, the initial transformation matrix is adjusted, the adjusted matrix is used as the new initial transformation matrix, and the mapping step is executed again, and so on, until the target transformation matrix is obtained.
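As an illustration of how such a target transformation matrix could be obtained from the calibration-point coordinate pairs, the sketch below uses a closed-form least-squares (Umeyama-style) similarity fit rather than the iterative adjustment described above; the function name, the NumPy dependency, and the choice of a similarity transform are assumptions for illustration only.

```python
import numpy as np

def fit_map_transform(pts_visual, pts_display):
    """Least-squares similarity transform taking calibration points from the
    3D visual map to the 3D visualization map (Umeyama-style closed form).

    pts_visual, pts_display: (N, 3) corresponding calibration-point coordinates.
    Returns a 4x4 matrix T such that T @ [p_visual, 1] ~= [p_display, 1].
    """
    pts_visual = np.asarray(pts_visual, dtype=float)
    pts_display = np.asarray(pts_display, dtype=float)
    mu_v, mu_d = pts_visual.mean(axis=0), pts_display.mean(axis=0)
    Pv, Pd = pts_visual - mu_v, pts_display - mu_d
    U, S, Vt = np.linalg.svd(Pd.T @ Pv)                      # cross-covariance (up to 1/N)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # guard against reflection
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / (Pv ** 2).sum()           # isotropic scale
    T = np.eye(4)
    T[:3, :3] = s * R
    T[:3, 3] = mu_d - s * R @ mu_v
    return T
```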
  • In the embodiment of the present application, a positioning and display method combining the cloud and the edge is proposed: the terminal device at the edge collects the target image and motion data, and performs high frame rate self-positioning based on the target image and motion data to obtain a high frame rate self-positioning trajectory.
  • The server in the cloud receives the images to be tested and the self-positioning trajectory sent by the terminal device, and obtains a high frame rate fusion positioning trajectory based on them, that is, a high frame rate fusion positioning trajectory in the 3D visual map. This achieves a high frame rate, high-precision, low-cost, and easy-to-deploy indoor positioning function; it is a vision-based indoor positioning method, and the fusion positioning trajectory in the three-dimensional visual map can be displayed.
  • The terminal device computes the high frame rate self-positioning trajectory locally and sends only the self-positioning trajectory and a small number of images to be tested, reducing the amount of data transmitted over the network.
  • Global positioning is performed on the server, reducing the consumption of computing resources and storage resources on terminal devices. The method can be applied in coal, electric power, petrochemical, and other energy industries to realize indoor positioning of personnel (such as workers and inspection personnel), quickly obtain their location information, and help ensure their safety.
  • the embodiment of the present application proposes a cloud-edge combined visual positioning and display method.
  • the server determines the fused positioning track of the terminal device in the 3D visual map, and displays the fused positioning track.
  • the target scene can be an indoor environment, that is, when the terminal device moves in the indoor environment, the server determines the fusion positioning track of the terminal device in the 3D visual map, that is, a vision-based indoor positioning method is proposed.
  • the target scene can also be an outdoor environment. There is no restriction on this.
  • the cloud edge management system may include terminal devices (that is, edge terminal devices) and servers (that is, cloud servers). Of course, the cloud edge management system may also include other devices , such as wireless base stations and routers, etc., there is no restriction on this.
  • the server may include a 3D visual map of the target scene and a 3D visualization map corresponding to the 3D visual map. The server may generate a fusion positioning trajectory of the terminal device in the 3D visual map and display the fusion positioning trajectory in the 3D visualization map (after converting it into a trajectory that can be displayed in the 3D visualization map), so that managers can view the fusion positioning trajectory in the 3D visualization map through the web.
  • the terminal device may include a vision sensor and a motion sensor, etc.
  • the vision sensor may be a camera, etc., and the vision sensor is used to collect images of a target scene during the movement of the terminal device. For the convenience of distinction, this image is recorded as a target image, and the target image includes multiple frames of images (that is, multiple frames of real-time images collected during the movement of the terminal device).
  • the motion sensor can be such as IMU (Inertial Measurement Unit, Inertial Measurement Unit), etc.
  • the IMU is a measuring device including a gyroscope and an accelerometer.
  • the motion sensor is used to collect motion data of the terminal device during the movement of the terminal device, such as acceleration and angular velocity etc.
  • For example, the terminal device may be a wearable device (such as a video helmet, smart watch, or smart glasses), with the visual sensor and motion sensor deployed on the wearable device; or the terminal device may be a recorder (a device carried by personnel when performing work that integrates real-time video and audio collection, photography, recording, intercom, positioning, and other functions), with the visual sensor and motion sensor deployed on the recorder; or the terminal device may be a camera (such as a split camera), with the visual sensor and motion sensor deployed on the camera.
  • the terminal device can acquire target images and motion data, perform high frame rate self-positioning based on the target images and motion data, and obtain high frame rate self-positioning trajectories (such as 6DOF (six degrees of freedom) self-positioning trajectories).
  • the self-localization trajectory may include multiple self-localization poses. Since the self-localization trajectory is a self-localization trajectory with a high frame rate, the number of self-localization poses in the self-localization trajectory is relatively large.
  • the terminal device can select some frame images from the multi-frame images of the target image as the image to be tested, and send the high frame rate self-positioning trajectory and the image to be tested to the server.
  • the server can obtain the self-positioning track and the images to be tested, and can perform global positioning at a low frame rate according to the images to be tested and the 3D visual map of the target scene, obtaining a low frame rate global positioning track (that is, the global positioning track of the terminal device in the 3D visual map).
  • the global positioning track may include multiple global positioning poses. Since the global positioning track is a global positioning track with a low frame rate, the number of global positioning poses in the global positioning track is relatively small.
  • the server can fuse the high frame rate self-positioning trajectory and the low frame rate global positioning trajectory to obtain the high frame rate fusion positioning trajectory, that is, the high frame rate fusion positioning trajectory in the 3D visual map, that is, the high frame rate fusion positioning results.
  • the fused positioning trajectory may include multiple fused positioning poses. Since the fused positioning trajectory is a high frame rate fused positioning trajectory, the number of fused positioning poses in the fused positioning trajectory is relatively large.
  • Poses (such as self-positioning poses, global positioning poses, and fusion positioning poses) each comprise a position and an attitude, which are generally represented by a rotation matrix and a translation vector; there is no limitation on this.
  • a globally unified high frame rate visual positioning function can be realized, and a high frame rate fusion positioning trajectory (such as 6DOF pose) in the three-dimensional visual map can be obtained. It is a globally consistent high frame rate positioning method, which realizes high frame rate, high precision, low cost, and easy-to-deploy indoor positioning functions of terminal equipment, and realizes indoor globally consistent high frame rate positioning functions.
  • the terminal device is an electronic device with a visual sensor and a motion sensor, which can acquire the target image of the target scene (such as a continuous video image) and the motion data of the terminal device (such as IMU data), and determine the self-positioning trajectory of the terminal device based on the target image and motion data.
  • the target image may include multiple frames of images, and for each frame of images, the terminal device determines a self-localization pose corresponding to the image, that is, multiple frames of images correspond to multiple self-localization poses.
  • the self-positioning track of the terminal device may include multiple self-positioning poses, which can be understood as a collection of multiple self-positioning poses.
  • the terminal device determines the self-localization pose corresponding to the first frame image, and for the second frame image in the multi-frame images, the terminal device determines the self-localization pose corresponding to the second frame image, and so on.
  • The self-localization pose corresponding to the first frame image may be the coordinate origin of the reference coordinate system (that is, the self-positioning coordinate system); the self-localization pose corresponding to the second frame image is a pose point in the reference coordinate system, that is, a pose point relative to the coordinate origin (the self-localization pose corresponding to the first frame image); the self-localization pose corresponding to the third frame image is likewise a pose point in the reference coordinate system relative to the coordinate origin; and so on, the self-localization pose corresponding to each frame image is a pose point in the reference coordinate system.
  • these self-localization poses can be composed into a self-localization trajectory in the reference coordinate system, and the self-localization trajectory includes these self-localization poses.
  • Step 301 acquiring a target image of a target scene and motion data of a terminal device.
  • Step 302 if the target image includes multiple frames of images, traverse the current frame of images from the multiple frames of images.
  • the self-positioning pose corresponding to the first frame image can be the coordinate origin of the reference coordinate system (that is, the self-positioning coordinate system), that is, the self-positioning pose coincides with the coordinate origin.
  • subsequent steps may be used to determine the self-localization pose corresponding to the second frame image.
  • subsequent steps can be used to determine the self-localization pose corresponding to the third frame image, and so on, each frame image can be traversed as the current frame image.
  • Step 303 using the optical flow algorithm to calculate the feature point association relationship between the current frame image and the previous frame image of the current frame image.
  • The optical flow algorithm uses the temporal change of pixels in the current frame image and the correlation between the current frame image and the previous frame image to find the correspondence between the two frames, and thereby calculates the motion information of objects between the previous frame image and the current frame image.
  • Step 304: Determine whether the current frame image is a key image based on the number of matching feature points between the current frame image and the previous frame image. For example, if the number of matching feature points between the current frame image and the previous frame image does not reach a preset threshold, indicating that the two frames have changed greatly and therefore share relatively few matching feature points, it is determined that the current frame image is a key image, and step 305 is performed.
  • If the number of matching feature points reaches the preset threshold, it is determined that the current frame image is a non-key image, and step 306 is performed.
  • Alternatively, a matching ratio between the current frame image and the previous frame image may be calculated based on the number of matching feature points, for example, the ratio of the number of matching feature points to the total number of feature points. If the matching ratio does not reach a preset ratio, the current frame image is determined to be a key image; if the matching ratio reaches the preset ratio, the current frame image is determined to be a non-key image.
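A minimal sketch of the decision in steps 303-304, assuming OpenCV's pyramidal Lucas-Kanade optical flow for the feature-point association; the thresholds, function name, and library choice are illustrative assumptions rather than values from the embodiment.

```python
import cv2
import numpy as np

def is_key_image(prev_gray, curr_gray, prev_pts, min_matches=80, min_ratio=0.6):
    """Decide whether the current frame is a key image.

    prev_gray, curr_gray: consecutive grayscale frames.
    prev_pts: (N, 1, 2) float32 feature locations in the previous frame,
              e.g. from cv2.goodFeaturesToTrack. Thresholds are illustrative.
    """
    # Pyramidal Lucas-Kanade optical flow associates feature points between frames.
    curr_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, prev_pts, None)
    matched = int(status.sum())                 # number of successfully tracked points
    ratio = matched / max(len(prev_pts), 1)     # matching ratio
    # Few surviving matches means the scene changed a lot -> key image.
    return matched < min_matches or ratio < min_ratio
```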
  • Step 305: If the current frame image is a key image, generate a map position in the self-positioning coordinate system (i.e., the reference coordinate system) based on the current position of the terminal device (i.e., the position of the terminal device when collecting the current frame image), that is, generate a new 3D map position. If the current frame image is a non-key image, the map position in the self-positioning coordinate system does not need to be generated based on the current position of the terminal device.
  • Step 306: Determine the self-positioning pose corresponding to the current frame image based on the self-positioning pose corresponding to each of the K frame images preceding the current frame image, the map positions of the terminal device in the self-positioning coordinate system, and the motion data of the terminal device.
  • K may be a positive integer and may be a value configured according to experience; there is no limitation on this.
  • For example, all the motion data between the previous frame image and the current frame image may be pre-integrated to obtain the inertial measurement constraints (for example, on velocity, acceleration, and angular velocity) between the two frames.
  • Then, based on these inertial measurement constraints and the map positions in the self-positioning coordinate system, bundle adjustment may be used to jointly optimize and update state variables such as the self-positioning poses and velocities corresponding to the K frame images preceding the current frame image (for example, within a sliding window), the inertial measurement sensor offsets, and the map point positions in the self-positioning coordinate system, so as to obtain the self-positioning pose corresponding to the current frame image; there is no limitation on this bundle adjustment process.
  • In addition, a certain frame and part of the map positions within the sliding window may be marginalized, and these constraints may be preserved in prior form.
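The pre-integration of motion data mentioned above can be illustrated with a highly simplified NumPy sketch that accumulates gyroscope and accelerometer samples between two image timestamps into a relative rotation, velocity, and position; a real VIO pipeline additionally handles gravity compensation, sensor biases, and noise covariances, which are omitted here, and the function name is hypothetical.

```python
import numpy as np

def preintegrate_imu(samples, dt):
    """Accumulate IMU samples between two image timestamps into a relative
    motion constraint (simplified: no gravity, bias, or noise terms).

    samples: iterable of (gyro_xyz, accel_xyz) measurements in the body frame.
    dt: sampling period in seconds.
    Returns (delta_R, delta_v, delta_p): 3x3 rotation, (3,) velocity, (3,) position.
    """
    dR, dv, dp = np.eye(3), np.zeros(3), np.zeros(3)
    for gyro, accel in samples:
        a = np.asarray(accel, dtype=float)
        w = np.asarray(gyro, dtype=float) * dt
        dp = dp + dv * dt + 0.5 * (dR @ a) * dt ** 2
        dv = dv + (dR @ a) * dt
        # First-order (small-angle) update of the rotation from the angular velocity.
        wx = np.array([[0, -w[2], w[1]],
                       [w[2], 0, -w[0]],
                       [-w[1], w[0], 0]])
        dR = dR @ (np.eye(3) + wx)
    return dR, dv, dp
```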
  • the terminal device can use the VIO (Visual Inertial Odometry, visual inertial odometer) algorithm to determine the self-positioning pose, that is to say, the input data of the VIO algorithm is the target image and motion data, and the output data of the VIO algorithm is the self-positioning pose.
  • the VIO algorithm can obtain the self-localization pose.
  • the VIO algorithm is used to perform steps 301 to 306 to obtain the self-localization pose.
  • The VIO algorithm may include but is not limited to VINS (Visual Inertial Navigation System), SVO (Semi-direct Visual Odometry), MSCKF (Multi-State Constraint Kalman Filter), and the like; there is no limitation here, as long as the self-localization pose can be obtained.
  • Step 307 Generate a self-positioning trajectory of the terminal device in the self-positioning coordinate system based on the self-positioning pose corresponding to each frame of the multi-frame images, and the self-positioning trajectory includes multiple self-positioning poses in the self-positioning coordinate system.
  • the terminal device can obtain the self-localization trajectory in the self-localization coordinate system, and the self-localization trajectory may include the self-localization pose corresponding to each frame of multiple images.
  • the terminal device can obtain the self-localization poses corresponding to these images, that is, the self-localization trajectory can include a large number of self-localization poses, that is, the terminal device can obtain high frame rate self-localization poses. positioning track.
  • the terminal device may select a part of frame images from the multiple frames of images as the image to be tested, and send the image to be tested and the self-positioning trajectory to the server. For example, the terminal device sends the self-positioning trajectory and the image to be tested to the server through a wireless network (such as 4G, 5G, Wifi, etc.). Since the frame rate of the image to be tested is low, the network bandwidth occupied is small.
  • It is necessary to pre-build a 3D visual map of the target scene and store the 3D visual map on the server, so that the server can perform global positioning based on the 3D visual map.
  • The 3D visual map is a way of storing the image information of the target scene: multiple frames of sample images of the target scene may be collected, and the 3D visual map is built based on these sample images. For example, based on the multiple frames of sample images of the target scene, visual mapping algorithms such as SFM (Structure From Motion) or SLAM (Simultaneous Localization And Mapping) may be used to construct the 3D visual map of the target scene; there is no limitation on how it is constructed.
  • the three-dimensional visual map may include the following information:
  • Sample image pose: a sample image is a representative image used when constructing the 3D visual map, that is, the 3D visual map can be constructed based on the sample images, and the pose matrix of each sample image (referred to as the sample image pose) can be stored in the 3D visual map.
  • That is, the 3D visual map may include the sample image poses.
  • Sample global descriptor: each frame of sample image may correspond to an image global descriptor, recorded as the sample global descriptor. The sample global descriptor is a high-dimensional vector representing the sample image and is used to distinguish the image features of different sample images.
  • the bag-of-words vector corresponding to the sample image can be determined based on the trained dictionary model, and the bag-of-words vector is determined as the sample global descriptor corresponding to the sample image.
  • The bag-of-words (Bag of Words) method is one way to determine the global descriptor.
  • A bag-of-words vector can be constructed; it is a vector representation used for image similarity detection, and the bag-of-words vector can be used as the sample global descriptor corresponding to the sample image.
  • To do so, a "dictionary" (also known as a dictionary model) is first trained, and classification trees are obtained after training. Each classification tree can represent a visual "word", and these visual "words" form the dictionary model.
  • All feature point descriptors in the sample image can then be classified into "words", and the frequency of occurrence of each word can be counted, so that the frequencies of the words in the dictionary form a vector, which is the bag-of-words vector corresponding to the sample image.
  • The bag-of-words vector can be used to measure the similarity between two frames of images, and the bag-of-words vector is used as the sample global descriptor corresponding to the sample image.
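A minimal sketch of such a bag-of-words global descriptor, assuming a k-means vocabulary built with scikit-learn over ORB-style local descriptors; the vocabulary size, stand-in training data, library choice, and function names are illustrative assumptions rather than details from the embodiment.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in training data: ORB-style descriptors pooled from many sample images.
rng = np.random.default_rng(0)
train_descriptors = rng.integers(0, 256, size=(5000, 32)).astype(np.float32)

K = 256  # number of visual "words" (illustrative vocabulary size)
vocabulary = KMeans(n_clusters=K, n_init=10, random_state=0).fit(train_descriptors)

def bag_of_words_vector(descriptors):
    """Map one image's local descriptors to a normalised word-frequency histogram,
    which serves as the (sample or to-be-tested) global descriptor."""
    words = vocabulary.predict(np.asarray(descriptors, dtype=np.float32))
    hist, _ = np.histogram(words, bins=np.arange(K + 1))
    hist = hist.astype(np.float32)
    return hist / (np.linalg.norm(hist) + 1e-12)
```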
  • the sample image can be input to the trained deep learning model to obtain the target vector corresponding to the sample image, and determine the target vector as the sample global descriptor corresponding to the sample image.
  • the deep learning method is a way to determine the global descriptor.
  • the sample image can be processed through multiple layers of the deep learning model, and finally a high-dimensional target vector is obtained.
  • the target vector is used as the sample global descriptor corresponding to the sample image.
  • the deep learning model such as the CNN (Convolutional Neural Networks, Convolutional Neural Network) model, etc.
  • the sample image can be input to the deep learning model, and the deep learning model processes the sample image to obtain a high-dimensional target vector, which is used as the sample global descriptor corresponding to the sample image .
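For the deep-learning variant, one possible instantiation (an assumption; the embodiment does not name a specific network or framework) is to take the pooled features of an off-the-shelf CNN backbone as the high-dimensional target vector:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Illustrative backbone choice; any CNN producing a fixed-size embedding would do.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()        # drop the classifier, keep the 512-d embedding
backbone.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def cnn_global_descriptor(image_rgb):
    """image_rgb: HxWx3 uint8 array. Returns an L2-normalised 512-d target vector
    used as the global descriptor."""
    vec = backbone(preprocess(image_rgb).unsqueeze(0)).squeeze(0)
    return torch.nn.functional.normalize(vec, dim=0)
```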
  • Sample local descriptor corresponding to the feature points of the sample image: each frame of sample image can include multiple feature points, and a feature point comprises two parts of information: a specific pixel position in the sample image and a descriptor of the local area around that position. That is, each feature point can correspond to an image local descriptor, recorded as the sample local descriptor; the sample local descriptor uses a vector to describe the feature of the image block in the vicinity of the feature point (i.e., the pixel position), and this vector can also be called the descriptor of the feature point.
  • In other words, the sample local descriptor is a feature vector representing the image block in which the feature point is located, and the image block is located in the sample image. It should be noted that a feature point in the sample image (i.e., a two-dimensional feature point) can correspond to a map point in the 3D visual map (i.e., a three-dimensional map point); therefore, the sample local descriptor corresponding to the feature point may also be regarded as the sample local descriptor corresponding to the map point corresponding to that feature point.
  • For example, algorithms such as ORB (Oriented FAST and Rotated BRIEF), SIFT (Scale-Invariant Feature Transform), or SURF (Speeded Up Robust Features) can be used to extract feature points from the sample image and determine the sample local descriptors corresponding to the feature points; deep learning algorithms (such as SuperPoint, DELF, D2-Net, etc.) can also be used.
  • Map point information: may include but is not limited to the 3D spatial position of the map point, all sample images in which it is observed, and the numbers of the corresponding 2D feature points (i.e., the feature points corresponding to the map point).
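One way to hold the information listed above (sample image poses, global descriptors, local descriptors, and map point information) is sketched below with Python dataclasses; the type and field names are illustrative assumptions rather than a storage format defined by the embodiment.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class MapPoint:
    position: np.ndarray                  # 3D spatial position in the visual-map frame
    descriptor: np.ndarray                # sample local descriptor of the map point
    observations: dict = field(default_factory=dict)   # sample-image id -> 2D feature-point number

@dataclass
class SampleImage:
    pose: np.ndarray                      # 4x4 pose matrix of the sample image
    global_descriptor: np.ndarray         # sample global descriptor
    keypoints: np.ndarray                 # (N, 2) pixel coordinates of feature points
    map_point_ids: list = field(default_factory=list)  # per-keypoint map-point id, or -1

@dataclass
class VisualMap:
    images: dict = field(default_factory=dict)   # sample-image id -> SampleImage
    points: dict = field(default_factory=dict)   # map-point id -> MapPoint
```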
  • Based on the pre-built 3D visual map of the target scene, after the server obtains the images to be tested, it determines the target map points corresponding to each image to be tested from the 3D visual map, and determines the global positioning track of the terminal device in the 3D visual map based on the target map points.
  • For each frame of the image to be tested, the server can determine the global positioning pose corresponding to that image. Assuming there are M frames of images to be tested, the M frames correspond to M global positioning poses, and the global positioning track of the terminal device in the three-dimensional visual map can include the M global positioning poses; the global positioning track can be understood as the collection of the M global positioning poses. For the first frame of the images to be tested, the global positioning pose corresponding to the first frame is determined; for the second frame, the global positioning pose corresponding to the second frame is determined; and so on.
  • the global positioning pose is a pose point in the 3D visual map, that is, a pose point in the 3D visual map coordinate system.
  • these global positioning poses are composed into a global positioning track in the 3D visual map, and the global positioning track includes these global positioning poses.
  • the server may determine the global positioning track of the terminal device in the 3D visual map by using the following steps:
  • Step 401 the server acquires the image to be tested of the target scene from the terminal device.
  • the terminal device may acquire a target image, and the target image includes multiple frames of images, the terminal device may select M frames of images from the multiple frames of images as images to be tested, and send the M frames of images to be tested to the server.
  • the multi-frame images include key images and non-key images.
  • the terminal device may use the key images in the multi-frame images as the images to be tested, while the non-key images are not used as the images to be tested.
  • Alternatively, the terminal device can select the images to be tested from the multiple frames of images at a fixed interval. Assuming the fixed interval is 5 (the fixed interval can be configured arbitrarily according to experience; there is no limitation on this), the 1st frame can be used as an image to be tested, the 6th (1+5) frame as an image to be tested, the 11th (6+5) frame as an image to be tested, and so on, selecting one image to be tested for every 5 frames.
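A trivial sketch of the fixed-interval selection described above (the interval of 5 is only the example value used in the text; the function name is hypothetical):

```python
def select_images_to_test(frames, interval=5):
    """Keep the 1st, (1+interval)-th, (1+2*interval)-th, ... frames as images to be tested."""
    return frames[::interval]

# frames 1, 6, 11, 16 of a 20-frame sequence are selected when interval = 5
assert select_images_to_test(list(range(1, 21))) == [1, 6, 11, 16]
```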
  • Step 402 for each frame of the image to be tested, determine the global descriptor to be tested corresponding to the image to be tested.
  • The image to be tested may correspond to an image global descriptor, which is recorded as the global descriptor to be tested. The global descriptor to be tested is a high-dimensional vector representing the image to be tested and is used to distinguish the image features of different images to be tested.
  • the bag of words vector corresponding to the image to be tested is determined based on the trained dictionary model, and the bag of words vector is determined as the global descriptor to be tested corresponding to the image to be tested.
  • input the image to be tested to the trained deep learning model to obtain the target vector corresponding to the image to be tested, and determine the target vector as the global descriptor to be tested corresponding to the image to be tested .
  • the global descriptor to be tested corresponding to the image to be tested can be determined based on the bag of visual words method or the deep learning method.
  • the determination method refer to the determination method of the global descriptor of the sample, and will not be repeated here.
  • Step 403 For each frame of the image to be tested, determine the similarity between the global descriptor to be tested corresponding to the image to be tested and the sample global descriptor corresponding to each frame of sample image corresponding to the 3D visual map.
  • The three-dimensional visual map can include the sample global descriptor corresponding to each frame of sample image; therefore, the similarity between the global descriptor to be tested and each sample global descriptor can be determined. Taking "distance" as the measure of similarity as an example, the distance between the global descriptor to be tested and each sample global descriptor can be determined, such as the Euclidean distance, i.e., the Euclidean distance between the two feature vectors.
  • Step 404: Based on the distance between the global descriptor to be tested and each sample global descriptor, select candidate sample images from the multiple frames of sample images corresponding to the 3D visual map; wherein the distance between the global descriptor to be tested and the sample global descriptor corresponding to a candidate sample image is the minimum of the distances between the global descriptor to be tested and each sample global descriptor, and/or the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is less than a distance threshold.
  • For example, assuming the 3D visual map corresponds to sample image 1, sample image 2, and sample image 3, distance 1 between the global descriptor to be tested and the sample global descriptor corresponding to sample image 1, distance 2 between the global descriptor to be tested and the sample global descriptor corresponding to sample image 2, and distance 3 between the global descriptor to be tested and the sample global descriptor corresponding to sample image 3 can be calculated.
  • If distance 1 is the minimum of these distances, sample image 1 is selected as the candidate sample image.
  • Or, if distance 1 is less than the distance threshold (which can be configured based on experience) and distance 2 is less than the distance threshold, but distance 3 is not less than the distance threshold, then both sample image 1 and sample image 2 are selected as candidate sample images.
  • Or, if distance 1 is the minimum distance and distance 1 is less than the distance threshold, sample image 1 is selected as the candidate sample image; however, if distance 1 is the minimum distance but is not less than the distance threshold, no candidate sample image can be selected, i.e., relocation fails.
  • a candidate sample image corresponding to the image to be tested may be selected from multiple frames of sample images corresponding to the three-dimensional visual map, and the number of candidate sample images is at least one.
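Steps 403-404 can be summarized with the following NumPy sketch, which ranks sample images by the Euclidean distance between global descriptors and applies a distance threshold; the threshold value and function name are illustrative assumptions.

```python
import numpy as np

def select_candidate_samples(query_desc, sample_descs, dist_threshold=0.8):
    """Rank sample images by Euclidean distance between global descriptors.

    query_desc: (D,) global descriptor to be tested.
    sample_descs: (S, D) sample global descriptors from the 3D visual map.
    Returns indices of candidate sample images, nearest first; an empty list
    means even the nearest sample is too far, i.e. relocation fails.
    """
    dists = np.linalg.norm(sample_descs - query_desc, axis=1)
    under = np.flatnonzero(dists < dist_threshold)
    return under[np.argsort(dists[under])].tolist()
```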
  • Step 405. For each frame of the image to be tested, obtain a plurality of feature points from the image to be tested, and for each feature point, determine a local descriptor to be tested corresponding to the feature point, and the local descriptor to be tested is used to represent the The feature vector of the image block where the feature point is located, and the image block may be located in the image to be tested.
  • the image to be tested may include a plurality of feature points, and the feature points may be specific pixel positions in the image to be tested.
  • the feature point may correspond to an image local descriptor, which is recorded as the local descriptor to be tested.
  • the local descriptor to be tested uses a vector to describe the feature of the image block in the vicinity of the feature point (that is, the pixel point position), and the vector can also be called the descriptor of the feature point.
  • the local descriptor to be tested is a feature vector used to represent the image block where the feature point is located.
  • For example, algorithms such as ORB, SIFT, and SURF can be used to extract feature points from the image to be tested and determine the local descriptors to be tested corresponding to the feature points. Deep learning algorithms (such as SuperPoint, DELF, D2-Net, etc.) can also be used to extract the feature points and determine their local descriptors to be tested. There is no limitation on this, as long as the feature points can be obtained and the local descriptors to be tested can be determined.
  • Step 406: For each feature point corresponding to the image to be tested, determine the distance between the local descriptor to be tested corresponding to the feature point and the sample local descriptor corresponding to each map point corresponding to the candidate sample image for that image to be tested (i.e., the sample local descriptor corresponding to the map point corresponding to each feature point in the candidate sample image), for example the Euclidean distance, i.e., the Euclidean distance between the two feature vectors.
  • The 3D visual map includes the sample local descriptors corresponding to the map points corresponding to each sample image; therefore, after the candidate sample image corresponding to the image to be tested is obtained, the sample local descriptor corresponding to each map point corresponding to the candidate sample image is obtained from the 3D visual map. Then, for each feature point of the image to be tested, the distance between the local descriptor to be tested corresponding to the feature point and the sample local descriptor corresponding to each map point corresponding to the candidate sample image is determined.
  • Step 407. For each feature point, based on the distance between the local descriptor to be tested corresponding to the feature point and the sample local descriptor corresponding to each map point corresponding to the candidate sample image, a target map point is selected from the multiple map points corresponding to the candidate sample image; wherein, the distance between the local descriptor to be tested and the sample local descriptor corresponding to the target map point is the minimum distance among the distances between the local descriptor to be tested and each sample local descriptor, and/or, the distance between the local descriptor to be tested and the sample local descriptor corresponding to the target map point is smaller than a distance threshold.
  • For example, assuming the candidate sample image corresponds to map point 1, map point 2 and map point 3, distance 1 between the local descriptor to be tested corresponding to the feature point and the sample local descriptor corresponding to map point 1 can be calculated, distance 2 between the local descriptor to be tested and the sample local descriptor corresponding to map point 2 can be calculated, and distance 3 between the local descriptor to be tested and the sample local descriptor corresponding to map point 3 can be calculated.
  • If distance 1 is the minimum among distance 1, distance 2 and distance 3, map point 1 may be selected as the target map point.
  • If distance 1 is less than the distance threshold, map point 1 may be selected as the target map point.
  • If distance 1 is the minimum distance and distance 1 is less than the distance threshold, map point 1 can be selected as the target map point; however, if distance 1 is the minimum distance but is not less than the distance threshold, no target map point can be selected, i.e., relocation fails.
  • In this way, for each feature point, the target map point corresponding to the feature point is selected from the map points corresponding to the candidate sample image of the image to be tested, and the matching relationship between the feature point and the target map point is obtained.
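  • A minimal sketch of this per-feature matching, assuming the sample local descriptors of the candidate sample image are stacked into one array (names and the threshold value are illustrative assumptions):

```python
import numpy as np

def match_feature_to_map_point(query_desc, map_point_descs, dist_threshold=0.7):
    """Select the target map point for one feature point of the image to be tested.

    query_desc:      (D,) local descriptor to be tested for this feature point
    map_point_descs: (K, D) sample local descriptors of the K map points
                     associated with the candidate sample image
    Returns the index of the target map point, or None if no map point is close enough.
    """
    dists = np.linalg.norm(map_point_descs - query_desc[None, :], axis=1)
    best = int(np.argmin(dists))        # minimum-distance map point
    if dists[best] < dist_threshold:    # and/or: also require it to be below a threshold
        return best
    return None                         # no target map point -> this feature is discarded
```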
  • Step 408 based on the multiple feature points corresponding to the image to be tested and the target map points corresponding to the multiple feature points, determine the global positioning pose in the 3D visual map corresponding to the image to be tested.
  • the image to be tested may correspond to multiple feature points, and each feature point corresponds to a target map point.
  • For example, the target map point corresponding to feature point 1 is map point 1, the target map point corresponding to feature point 2 is map point 2, and so on, so that multiple matching relationship pairs are obtained.
  • Each matching relationship pair includes a feature point (that is, a two-dimensional feature point) and a map point (that is, a three-dimensional map point in the three-dimensional visual map); the feature point represents a two-dimensional position in the image to be tested, and the map point represents a three-dimensional position in the three-dimensional visual map. That is, the matching relationship pair includes the mapping from a two-dimensional position in the image to be tested to a three-dimensional position in the three-dimensional visual map.
  • If the total number of matching relationship pairs does not reach the quantity requirement (that is, the total number is less than a preset number), the global positioning pose in the three-dimensional visual map corresponding to the image to be tested cannot be determined based on the matching relationship pairs. If the total number of matching relationship pairs reaches the quantity requirement (that is, the total number reaches the preset number), the global positioning pose in the three-dimensional visual map corresponding to the image to be tested can be determined based on the multiple matching relationship pairs.
  • the PnP (Perspective-n-Point) algorithm can be used to calculate the global positioning pose in the three-dimensional visual map corresponding to the image to be tested; the calculation method is not limited.
  • the input data of the PnP algorithm is a plurality of matching relationship pairs.
  • the matching relationship pair includes the two-dimensional position in the image to be tested and the three-dimensional position in the three-dimensional visual map.
  • the PnP algorithm can be used to calculate the pose of the image to be tested in the three-dimensional visual map, that is, the global positioning pose.
  • the global positioning pose in the 3D visual map corresponding to the image to be tested is obtained, that is, the global positioning pose corresponding to the image to be tested in the 3D visual map coordinate system is obtained.
  • A valid matching relationship pair may also be found from the multiple matching relationship pairs, for example, by using the RANSAC (Random SAmple Consensus) algorithm to remove mismatched pairs; based on the valid matching relationship pairs, the PnP algorithm can then be used to calculate the global positioning pose in the 3D visual map corresponding to the image to be tested.
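  • The 2D-3D matching pairs can be fed to a PnP solver with RANSAC, for example OpenCV's solvePnPRansac. The sketch below is illustrative only; the intrinsic matrix, thresholds and frame conventions are assumptions:

```python
import cv2
import numpy as np

def estimate_global_pose(points_2d, points_3d, K, dist_coeffs=None):
    """Estimate the global positioning pose of an image to be tested from
    2D-3D matching relationship pairs using PnP + RANSAC.

    points_2d: (N, 2) feature point positions in the image to be tested
    points_3d: (N, 3) matched map point positions in the 3D visual map
    K:         (3, 3) camera intrinsic matrix
    """
    if len(points_2d) < 4:
        return None  # not enough matching pairs, relocation fails
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, dtype=np.float64),
        np.asarray(points_2d, dtype=np.float64),
        K, dist_coeffs,
        reprojectionError=3.0, iterationsCount=100)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)            # rotation from map frame to camera frame
    T_cam_map = np.eye(4)
    T_cam_map[:3, :3] = R
    T_cam_map[:3, 3] = tvec.ravel()
    return np.linalg.inv(T_cam_map)       # camera pose expressed in the 3D visual map frame
```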
  • Step 409 Generate a global positioning track of the terminal device in the 3D visual map based on the global positioning poses corresponding to the M frames of images to be tested acquired in step 401 , the global positioning track includes multiple global positioning poses in the 3D visual map. So far, the server can obtain the global positioning trajectory in the 3D visual map, that is, the global positioning trajectory in the coordinate system of the 3D visual map.
  • the global positioning trajectory can include the global positioning poses corresponding to the M frames of images to be tested, that is, the global positioning trajectory may include M global positioning poses. Since the M frames of images to be tested are a subset selected from all images, the global positioning trajectory includes global positioning poses corresponding to only a small number of images; that is, the server can obtain a global positioning trajectory with a low frame rate.
  • After the server obtains the high frame rate self-positioning trajectory and the low frame rate global positioning trajectory, it fuses the two to obtain a high frame rate fused positioning trajectory in the 3D visual map coordinate system, that is, the fused positioning trajectory of the terminal device in the 3D visual map.
  • the fused positioning trajectory consists of high frame rate poses in the 3D visual map, while the global positioning trajectory consists of low frame rate poses in the 3D visual map; that is, the frame rate of the fused positioning trajectory is higher than the frame rate of the global positioning trajectory, and the number of fused positioning poses is greater than the number of global positioning poses.
  • the white solid circle represents a self-localization pose
  • a trajectory composed of multiple self-localization poses is called a self-localization trajectory, that is, a self-localization trajectory includes multiple self-localization poses.
  • the self-localization pose corresponding to the first frame image can be taken as the coordinate origin of the reference coordinate system S_L (that is, the self-localization coordinate system); in other words, the self-localization pose corresponding to the first frame image coincides with the coordinate origin of the reference coordinate system S_L. Each self-localization pose in the self-localization trajectory is a pose in the reference coordinate system S_L.
  • the gray solid line circle represents the global positioning pose.
  • the trajectory composed of multiple global positioning poses is called the global positioning trajectory, that is, the global positioning trajectory includes multiple global positioning poses.
  • the global positioning pose can be a pose in the three-dimensional visual map coordinate system S_G; that is, each global positioning pose in the global positioning trajectory is a global positioning pose in the 3D visual map coordinate system S_G, i.e., a global positioning pose in the 3D visual map.
  • the white dotted circle represents the fusion positioning pose.
  • the trajectory composed of multiple fusion positioning poses is called the fusion positioning trajectory, that is, the fusion positioning trajectory includes multiple fusion positioning poses.
  • the fusion positioning pose can be a pose in the three-dimensional visual map coordinate system S_G; that is, each fusion positioning pose in the fusion positioning trajectory is a fusion positioning pose in the three-dimensional visual map coordinate system S_G, i.e., a fusion positioning pose in the three-dimensional visual map.
  • each frame of image corresponds to a self-positioning pose, while only a part of the frames are selected from the multiple frames of images as images to be tested and each frame of the image to be tested corresponds to a global positioning pose; therefore, the number of self-localization poses is larger than the number of global positioning poses.
  • each self-localization pose corresponds to a fusion positioning pose (that is, self-localization poses and fusion positioning poses correspond one-to-one), so the number of fusion positioning poses is the same as the number of self-localization poses; therefore, the number of fusion positioning poses is also larger than the number of global positioning poses.
  • the server can implement a trajectory fusion function and a pose transformation function. For example, the server can implement the trajectory fusion function and the pose transformation function through the following steps to obtain the fusion positioning trajectory in the 3D visual map:
  • Step 601. Select N self-localization poses corresponding to a target time period from all self-localization poses included in the self-localization trajectory, and select P global positioning poses corresponding to the target time period from all global positioning poses included in the global positioning trajectory; for example, N may be greater than P.
  • the N self-localization poses corresponding to the target time period are the self-localization poses determined based on the images collected during the target time period, and the P global positioning poses corresponding to the target time period are the global positioning poses determined based on the images collected during the target time period.
  • Step 602. Determine N fusion positioning poses corresponding to the N self-localization poses based on the N self-localization poses and the P global positioning poses, where the N self-localization poses correspond to the N fusion positioning poses one-to-one.
  • for example, based on the N self-positioning poses and the P global positioning poses, the fusion positioning pose corresponding to the first self-positioning pose can be determined, the fusion positioning pose corresponding to the second self-positioning pose can be determined, and so on.
  • in summary, there are N self-localization poses, P global positioning poses and N fusion positioning poses; the N self-localization poses are all known values, the P global positioning poses are all known values, and the N fusion positioning poses are all unknown values, i.e., the pose values that need to be solved.
  • each self-positioning pose corresponds to one fusion positioning pose (the first self-positioning pose corresponds to the first fusion positioning pose, the second self-positioning pose corresponds to the second fusion positioning pose, and so on), and each global positioning pose corresponds to one of the fusion positioning poses.
  • the first constraint value can be determined based on the N self-positioning poses and the N fusion positioning poses; the first constraint value is used to represent the residual value between the fusion positioning poses and the self-positioning poses, and can, for example, be calculated based on the difference between the first fusion positioning pose and the first self-positioning pose, the difference between the second fusion positioning pose and the second self-positioning pose, ..., and the difference between the N-th fusion positioning pose and the N-th self-positioning pose.
  • the calculation formula of the first constraint value is not limited in this embodiment, and it only needs to be related to the above-mentioned differences.
  • the second constraint value can be determined based on P global positioning poses and P fusion positioning poses (that is, P fusion positioning poses corresponding to P global positioning poses are selected from N fusion positioning poses).
  • the second constraint value is used to represent the residual value (which can be an absolute difference) between the fusion positioning poses and the global positioning poses, and can, for example, be calculated based on the difference between each global positioning pose and its corresponding fusion positioning pose.
  • the calculation formula of the second constraint value is not limited in this embodiment, and it only needs to be related to the above-mentioned differences.
  • the target constraint value may be calculated based on the first constraint value and the second constraint value; for example, the target constraint value may be the sum of the first constraint value and the second constraint value. Since the N self-positioning poses and the P global positioning poses are all known values and the N fusion positioning poses are all unknown values, the values of the N fusion positioning poses are adjusted so that the target constraint value is minimized. When the target constraint value is minimal, the values of the N fusion positioning poses are the final solved pose values; at this point, the values of the N fusion positioning poses are obtained.
  • formula (1) can be used to calculate the target constraint value; its general form is:
  • F(T) = Σ_i e_{i,i+1}^T Ω_{i,i+1} e_{i,i+1} + Σ_k e_k^T Ω_k e_k    (1)
  • F(T) represents the target constraint value; the part before the plus sign (subsequently recorded as the first part) is the first constraint value, and the part after the plus sign (subsequently recorded as the second part) is the second constraint value; e_{i,i+1} denotes the residual between the relative transformation of adjacent fusion positioning poses and the relative transformation of the corresponding self-localization poses.
  • Ω_{i,i+1} is the residual information matrix for the self-localization poses, which can be configured according to experience; there is no restriction on this.
  • Ω_k is the residual information matrix for the global positioning poses, which can also be configured according to experience; there is no restriction on this.
  • the first part represents the relative transformation constraint between the self-localization pose and the fused localization pose, which can be reflected by the first constraint value.
  • N is the number of all self-localization poses in the self-localization trajectory, that is, N self-localization poses.
  • the second part represents the global positioning constraints of the global positioning pose and the fusion positioning pose, which can be reflected by the second constraint value.
  • P is the number of all global positioning poses in the global positioning trajectory, that is, P global positioning poses.
  • for each fusion positioning pose that has a corresponding global positioning pose, e_k represents the residual of the fusion positioning pose relative to the corresponding global positioning pose.
  • the optimization goal can be to minimize the value of F(T), so that the fusion positioning poses, that is, the poses in the 3D visual map coordinate system, can be obtained; this can be expressed as arg min F(T). By minimizing the value of F(T), the fusion positioning trajectory is obtained, and the fusion positioning trajectory can include multiple fusion positioning poses.
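  • The optimization above can be sketched with a generic least-squares solver. The snippet below is a deliberately simplified, position-only illustration of steps 601-602 (information matrices reduced to scalar weights; all names are assumptions), not the patent's exact formulation:

```python
import numpy as np
from scipy.optimize import least_squares

def fuse_trajectories(self_xy, global_xy, global_idx, w_rel=1.0, w_abs=1.0):
    """Minimal position-only sketch of trajectory fusion.

    self_xy:    (N, 2) self-positioning positions (known, high frame rate)
    global_xy:  (P, 2) global positioning positions (known, low frame rate)
    global_idx: (P,)   index of the self-positioning frame each global pose belongs to
    Returns (N, 2) fused positions that minimise the target constraint value.
    """
    N = len(self_xy)

    def residuals(x):
        fused = x.reshape(N, 2)
        # first constraint: relative motion of the fused trajectory should match
        # the relative motion of the self-positioning trajectory
        rel = (fused[1:] - fused[:-1]) - (self_xy[1:] - self_xy[:-1])
        # second constraint: fused positions should stay close to the global positions
        absr = fused[global_idx] - global_xy
        return np.concatenate([w_rel * rel.ravel(), w_abs * absr.ravel()])

    x0 = self_xy.ravel().copy()          # initialise with the self-positioning result
    sol = least_squares(residuals, x0)
    return sol.x.reshape(N, 2)
```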
  • Step 603 Generate a fused positioning trajectory of the terminal device in the 3D visual map based on the N fused positioning poses, where the fused positioning trajectory includes the N fused positioning poses in the 3D visual map.
  • the server obtains the fused positioning trajectory in the 3D visual map, that is, the fused positioning trajectory in the 3D visual map coordinate system, the number of fused positioning poses in the fused positioning trajectory is greater than the number of global positioning poses in the global positioning trajectory, That is to say, a fusion positioning track with a high frame rate can be obtained.
  • Step 604 Select an initial fusion positioning pose from the fusion positioning trajectory, and select an initial self-localization pose corresponding to the initial fusion positioning pose from the self-localization trajectory.
  • Step 605 Select a target self-localization pose from the self-localization track, and determine a target fusion positioning pose based on the initial fusion positioning pose, the initial self-localization pose, and the target self-localization pose.
  • the fusion positioning trajectory can also be updated.
  • specifically, the initial fusion positioning pose can be selected from the fusion positioning trajectory, the initial self-localization pose corresponding to the initial fusion positioning pose can be selected from the self-localization trajectory, and the target self-localization pose can be selected from the self-localization trajectory.
  • the target fusion positioning pose can be determined based on the initial fusion positioning pose, the initial self-localization pose and the target self-localization pose. Then, a new fusion positioning trajectory may be generated based on the target fusion positioning pose and the fusion positioning trajectory to replace the original fusion positioning trajectory.
  • for example, suppose the self-positioning trajectory includes the self-localization poses between two moments, the global positioning trajectory includes the global positioning poses between these two moments, and the fusion positioning trajectory includes the fusion positioning poses between these two moments; after that, a new self-positioning pose may be obtained.
  • in this case, the following formula (4) can also be used to determine the fusion positioning pose corresponding to the new self-localization pose, by composing the initial fusion positioning pose with the relative transformation from the initial self-localization pose to the target self-localization pose:
  • T_fuse^target = T_fuse^init · (T_self^init)^{-1} · T_self^target    (4)
  • in formula (4), T_fuse^target is the target fusion positioning pose, i.e., the fusion positioning pose corresponding to the target self-localization pose; T_fuse^init is the initial fusion positioning pose selected from the fusion positioning trajectory; T_self^init is the initial self-localization pose selected from the self-localization trajectory, i.e., the self-localization pose corresponding to the initial fusion positioning pose; and T_self^target is the target self-localization pose selected from the self-localization trajectory.
  • in this way, the target fusion positioning pose can be determined based on the initial fusion positioning pose, the initial self-localization pose and the target self-localization pose, and a new fusion positioning trajectory can be generated, that is, the new fusion positioning trajectory includes the target fusion positioning pose; in this way, the fusion positioning trajectory is updated.
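  • The pose transformation of steps 604-605 can be sketched with 4x4 homogeneous matrices; the function and variable names below are illustrative assumptions:

```python
import numpy as np

def transform_new_pose(T_fuse_init, T_self_init, T_self_target):
    """Propagate a new self-positioning pose into the 3D visual map
    using one already-fused anchor pose.

    All inputs are 4x4 homogeneous pose matrices:
      T_fuse_init   - initial fusion positioning pose (in the 3D visual map frame)
      T_self_init   - the self-positioning pose corresponding to T_fuse_init
      T_self_target - the new (target) self-positioning pose
    """
    # relative motion measured in the self-positioning frame ...
    T_rel = np.linalg.inv(T_self_init) @ T_self_target
    # ... applied on top of the fused anchor pose gives the target fusion pose
    return T_fuse_init @ T_rel
```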
  • steps 601 to 603 constitute the trajectory fusion process, and steps 604 to 605 constitute the pose transformation process.
  • trajectory fusion is the process of registering and fusing the self-positioning trajectory and the global positioning trajectory, so as to realize the transformation of the self-positioning trajectory from the self-positioning coordinate system to the 3D visual map coordinate system; the global positioning results are used to correct the trajectory.
  • a trajectory fusion is then performed. Since not all frames can successfully obtain a global positioning pose, the poses corresponding to these frames are output as fusion positioning poses in the 3D visual map coordinate system through pose transformation, that is, the pose transformation process.
  • regarding the 3D visualization map of the target scene: it is necessary to pre-build a 3D visualization map of the target scene and store the 3D visualization map in the server, and the server can display the trajectory based on the 3D visualization map.
  • the 3D visualization map is a 3D visualization map of the target scene, which is mainly used for trajectory display and can be obtained through laser scanning and manual modeling.
  • the 3D visualized map is a viewable visualized map, for example, it can be obtained by using a composition algorithm, and this application does not limit the construction method of the 3D visualized map.
  • based on the 3D visual map of the target scene and the 3D visualization map of the target scene, it is necessary to register the 3D visual map and the 3D visualization map to ensure that the two maps are aligned in space. For example, the 3D visualization map is sampled so that it is changed from a triangular patch form into a dense point cloud form, and this point cloud is registered with the 3D point cloud of the 3D visual map through the ICP (Iterative Closest Point) algorithm to obtain the transformation matrix T from the 3D visual map to the 3D visualization map; finally, the transformation matrix T is used to transform the 3D visual map into the 3D visualization map coordinate system, and a 3D visual map aligned with the 3D visualization map is obtained.
  • the transformation matrix T (referred to as the target transformation matrix) can be determined in the following manner:
  • Method 1: When constructing the 3D visual map and the 3D visualization map, multiple calibration points can be deployed in the target scene (different calibration points can be distinguished by different shapes, so that the calibration points can be identified from images); the 3D visual map can include the multiple calibration points, and the 3D visualization map can also include the multiple calibration points. For each of the multiple calibration points, a coordinate pair corresponding to the calibration point can be determined, and the coordinate pair includes the position coordinates of the calibration point in the 3D visual map and the position coordinates of the calibration point in the 3D visualization map. The target transformation matrix can then be determined based on the coordinate pairs corresponding to the multiple calibration points.
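  • For Method 1, a rigid transformation can be fitted to the calibration-point coordinate pairs, for example with the classic SVD-based (Kabsch-style) fit sketched below; the names are illustrative assumptions, and a real implementation might also estimate scale:

```python
import numpy as np

def rigid_transform_from_points(pts_visual, pts_visualization):
    """Fit a rigid transform (rotation + translation) so that points from the
    3D visual map are mapped onto the corresponding calibration points in the
    3D visualization map."""
    A = np.asarray(pts_visual, dtype=float)          # (K, 3) visual-map coordinates
    B = np.asarray(pts_visualization, dtype=float)   # (K, 3) visualization-map coordinates
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)                        # cross-covariance of centred points
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                         # avoid a reflection solution
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = cb - R @ ca
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t                       # 4x4 target transformation matrix
    return T
```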
  • Method 2: Obtain an initial transformation matrix, map the position coordinates in the 3D visual map to mapping coordinates in the 3D visualization map based on the initial transformation matrix, and determine whether the initial transformation matrix has converged based on the relationship between the mapping coordinates and the actual coordinates in the 3D visualization map; if so, the initial transformation matrix can be determined as the target transformation matrix, that is, the target transformation matrix is obtained; if not, the initial transformation matrix can be adjusted, the adjusted transformation matrix is used as the initial transformation matrix, and the operation of mapping the position coordinates in the 3D visual map to mapping coordinates in the 3D visualization map based on the initial transformation matrix is performed again, and so on, until the target transformation matrix is obtained.
  • an initial transformation matrix can be obtained first, and there is no restriction on the method of obtaining the initial transformation matrix; it can be an initial transformation matrix set randomly, or an initial transformation matrix obtained by a certain algorithm. This initial transformation matrix is the matrix to be iteratively optimized; that is, the initial transformation matrix is continuously optimized through iteration, and the iteratively optimized initial transformation matrix is used as the target transformation matrix.
  • the position coordinates in the 3D visual map can be mapped to mapping coordinates in the 3D visualization map based on the initial transformation matrix. The mapping coordinates in the 3D visualization map are the coordinates transformed by the initial transformation matrix, while the actual coordinates in the 3D visualization map are the real coordinates in the 3D visualization map corresponding to the position coordinates in the 3D visual map.
  • if the difference between the mapping coordinates and the actual coordinates is smaller, the accuracy of the initial transformation matrix is higher; if the difference between the mapping coordinates and the actual coordinates is larger, the accuracy of the initial transformation matrix is worse. Therefore, whether the initial transformation matrix has converged can be determined based on the difference between the mapping coordinates and the actual coordinates.
  • the difference between the mapping coordinates and the actual coordinates can be the sum of multiple sets of differences, where each set of differences corresponds to the difference between one mapping coordinate and its actual coordinate. If the difference between the mapping coordinates and the actual coordinates is less than a threshold, it is determined that the initial transformation matrix has converged; if the difference is not less than the threshold, it is determined that the initial transformation matrix has not converged.
  • if the initial transformation matrix has not converged, the initial transformation matrix can be adjusted, and there is no restriction on the adjustment process; for example, the ICP algorithm is used to adjust the initial transformation matrix, the adjusted transformation matrix is used as the new initial transformation matrix, and the operation of mapping the position coordinates in the 3D visual map to mapping coordinates in the 3D visualization map based on the initial transformation matrix is performed again, and so on, until the target transformation matrix is obtained. If the initial transformation matrix has converged, the initial transformation matrix is determined as the target transformation matrix.
  • Method 3: Sample the 3D visual map to obtain a first point cloud corresponding to the 3D visual map, and sample the 3D visualization map to obtain a second point cloud corresponding to the 3D visualization map.
  • the ICP algorithm is used to register the first point cloud and the second point cloud, so as to obtain the target transformation matrix between the 3D visual map and the 3D visualization map.
  • after the first point cloud and the second point cloud are obtained, the first point cloud includes a large number of 3D points and the second point cloud includes a large number of 3D points; based on the 3D points of the first point cloud and the 3D points of the second point cloud, the ICP algorithm can be used for registration, and the registration process is not limited.
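  • A possible sketch of this point-cloud registration using the Open3D library is shown below; the file name, sampling density and ICP parameters are assumptions:

```python
import numpy as np
import open3d as o3d

def register_maps_icp(visual_map_points, visualization_mesh_file, voxel=0.05):
    """Sample both maps into point clouds and register them with ICP to obtain
    the target transformation matrix (hypothetical file/variable names)."""
    # first point cloud: 3D points of the 3D visual map
    src = o3d.geometry.PointCloud()
    src.points = o3d.utility.Vector3dVector(np.asarray(visual_map_points))
    # second point cloud: sample the triangle mesh of the 3D visualization map
    mesh = o3d.io.read_triangle_mesh(visualization_mesh_file)
    dst = mesh.sample_points_uniformly(number_of_points=200000)
    src, dst = src.voxel_down_sample(voxel), dst.voxel_down_sample(voxel)
    result = o3d.pipelines.registration.registration_icp(
        src, dst, max_correspondence_distance=5 * voxel,
        init=np.eye(4),
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation  # 4x4 target transformation matrix T
```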
  • based on the target transformation matrix between the 3D visual map and the 3D visualization map, the server can convert the fusion positioning pose into a target positioning pose in the 3D visualization map, and display the target positioning pose through the 3D visualization map.
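  • Applying the target transformation matrix to the fused poses can be sketched as a simple matrix composition; the names below are illustrative assumptions:

```python
import numpy as np

def fused_to_target_poses(fused_poses, T_map_to_vis):
    """Convert every fusion positioning pose (4x4, 3D visual map frame) into a
    target positioning pose in the 3D visualization map frame for display."""
    return [T_map_to_vis @ T for T in fused_poses]

def target_positions(target_poses):
    """Extract the 3D positions used to draw the trajectory in the visualization map."""
    return np.array([T[:3, 3] for T in target_poses])
```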
  • the manager can open a web browser and access the server through the network to view the target positioning poses displayed in the three-dimensional visualization map, and these target positioning poses form a trajectory.
  • the server can display the target positioning pose of the terminal device on the three-dimensional visual map, so that managers can view the target positioning pose displayed on the three-dimensional visual map.
  • Managers can change the viewing angle by dragging the mouse to realize 3D viewing of the track.
  • the server includes client software, and the client software reads and renders the 3D visualization map, and displays the target positioning pose on the 3D visualization map.
  • a user (such as a manager) can access the client software through a Web browser, and the viewing angle of the three-dimensional visualization map can be changed by dragging the mouse.
  • a positioning and display method combining cloud and edge is proposed: the terminal device calculates the self-positioning trajectory with a high frame rate, and only sends the self-positioning trajectory and a small number of images to be tested to the server, reducing the amount of data transmitted over the network.
  • Global positioning is performed on the server, thereby reducing the consumption of computing resources and storage resources of terminal devices.
  • the system architecture of cloud-edge integration can share the computing pressure, reduce the hardware cost of terminal equipment, and reduce the amount of network transmission data.
  • the final positioning result can be displayed on a 3D visualization map, and the management personnel access the server through the web terminal for interactive display.
  • an embodiment of the present application proposes a cloud edge management system, the cloud edge management system includes a terminal device and a server, and the server includes a three-dimensional visual map of a target scene.
  • the terminal device is used to acquire a target image of the target scene and motion data of the terminal device during the process of moving in the target scene, and determine a self-positioning trajectory of the terminal device based on the target image and the motion data ; If the target image includes multiple frames of images, select a part of frame images from the multiple frames of images as the image to be tested, and send the image to be tested and the self-positioning trajectory to the server.
  • the server is configured to generate a fusion positioning trajectory of the terminal device in the three-dimensional visual map based on the image to be tested and the self-positioning trajectory, and the fusion positioning trajectory includes a plurality of fusion positioning poses; for the Fusing each fused positioning pose in the fused positioning trajectory, determining a target positioning pose corresponding to the fused positioning pose, and displaying the target positioning pose.
  • the terminal device includes a visual sensor and a motion sensor; wherein, the visual sensor is used to obtain the target image of the target scene, and the motion sensor is used to obtain the motion data of the terminal device .
  • the terminal device is a wearable device, and the visual sensor and the motion sensor are deployed on the wearable device; or, the terminal device is a recorder, and the visual sensor and the motion sensor are deployed on the on the recorder; or, the terminal device is a camera, and the vision sensor and the motion sensor are deployed on the camera.
  • the server when the server generates the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the image to be tested and the self-positioning trajectory, it is specifically used to: determine from the three-dimensional visual map The target map point corresponding to the image to be tested, determining the global positioning track of the terminal device in the three-dimensional visual map based on the target map point; generating the terminal based on the self-positioning track and the global positioning track The fused positioning trajectory of the device in the three-dimensional visual map; wherein, the frame rate of the fused positioning pose included in the fused positioning trajectory is greater than the frame rate of the global positioning pose included in the global positioning trajectory; the fused positioning trajectory The frame rate of the included fused localization pose is equal to the frame rate of the self-localization pose included in the self-localization trajectory.
  • when the server determines the target positioning pose corresponding to the fused positioning pose and displays the target positioning pose, it is specifically configured to: based on the target transformation matrix between the 3D visual map and the 3D visualization map, convert the fused positioning pose into the target positioning pose in the 3D visualization map, and display the target positioning pose through the 3D visualization map;
  • the server includes client software, and the client The terminal software reads and renders the three-dimensional visualization map, and displays the target positioning pose on the three-dimensional visualization map; wherein, the user accesses the client software through a Web browser to pass the client software View the target positioning pose displayed in the three-dimensional visual map; wherein, when viewing the target positioning pose displayed in the three-dimensional visual map through the client software, drag the mouse to change the three-dimensional The viewing angle of the visualized map.
  • a pose display device is proposed in the embodiment of the present application, which is applied to the server in the cloud edge management system, and the server includes a three-dimensional visual map of the target scene, as shown in FIG. 7 , which is Structural diagram of the pose display device.
  • the pose display device includes: an acquisition module 71, configured to acquire an image to be tested and a self-positioning trajectory, wherein the self-positioning trajectory is determined by the terminal device based on the target image of the target scene and the motion data of the terminal device, and the image to be tested is a partial frame image of the multiple frames of images included in the target image; a generation module 72, configured to generate a fusion positioning trajectory of the terminal device in the three-dimensional visual map based on the image to be tested and the self-positioning trajectory, the fusion positioning trajectory including a plurality of fusion positioning poses; and a display module 73, configured to determine, for each fusion positioning pose in the fusion positioning trajectory, a target positioning pose corresponding to the fusion positioning pose, and to display the target positioning pose.
  • when the generation module 72 generates the fused positioning trajectory of the terminal device in the 3D visual map based on the image to be tested and the self-positioning trajectory, it is specifically configured to: determine the target map point corresponding to the image to be tested from the 3D visual map, and determine the global positioning trajectory of the terminal device in the three-dimensional visual map based on the target map point; and generate the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the self-positioning trajectory and the global positioning trajectory; wherein the frame rate of the fused positioning poses included in the fused positioning trajectory is greater than the frame rate of the global positioning poses included in the global positioning trajectory, and the frame rate of the fused positioning poses included in the fused positioning trajectory is equal to the frame rate of the self-localization poses included in the self-positioning trajectory.
  • the three-dimensional visual map includes at least one of the following: a pose matrix corresponding to the sample image, a sample global descriptor corresponding to the sample image, a sample local descriptor corresponding to a feature point in the sample image, and map point information;
  • when the generation module 72 determines the target map point corresponding to the image to be tested from the three-dimensional visual map and determines the global positioning trajectory of the terminal device in the three-dimensional visual map based on the target map point, it is specifically configured to: for each frame of the image to be tested, select candidate sample images from the multi-frame sample images based on the similarity between the image to be tested and the multi-frame sample images corresponding to the three-dimensional visual map; obtain multiple feature points from the image to be tested; for each feature point, determine the target map point corresponding to the feature point from the multiple map points corresponding to the candidate sample image; determine the global positioning pose in the three-dimensional visual map corresponding to the image to be tested based on the multiple feature points and the target map points corresponding to the multiple feature points; and generate the global positioning trajectory of the terminal device in the three-dimensional visual map based on the global positioning poses corresponding to the images to be tested.
  • when the generation module 72 selects candidate sample images from the multi-frame sample images based on the similarity between the image to be tested and the multi-frame sample images corresponding to the three-dimensional visual map, it is specifically configured to: determine the global descriptor to be tested corresponding to the image to be tested, and determine the distance between the global descriptor to be tested and the sample global descriptor corresponding to each frame of sample image corresponding to the three-dimensional visual map; and select the candidate sample image from the multi-frame sample images based on the distance between the global descriptor to be tested and each sample global descriptor; wherein the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is the minimum distance among the distances between the global descriptor to be tested and each sample global descriptor, and/or, the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is less than the distance threshold.
  • the generation module 72 determines the global descriptor to be tested corresponding to the image to be tested, it is specifically used to: determine the bag-of-words vector corresponding to the image-to-be-tested based on the trained dictionary model, and convert the bag-of-words vector to The vector is determined as the global descriptor to be tested corresponding to the image to be tested; or, the image to be tested is input to a trained deep learning model to obtain a target vector corresponding to the image to be tested, and the target The vector is determined as the global descriptor to be tested corresponding to the image to be tested.
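  • A hedged sketch of the bag-of-words option for the global descriptor, using ORB local descriptors and a k-means vocabulary; the dictionary-model details are not specified in the description, so the vocabulary size and training procedure below are assumptions:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def train_dictionary(training_descriptors, vocab_size=256):
    """Train a visual dictionary (vocabulary) from local descriptors of training images."""
    return KMeans(n_clusters=vocab_size, n_init=4).fit(np.vstack(training_descriptors))

def bow_global_descriptor(image, dictionary):
    """Compute a bag-of-words vector for the image and use it as the global descriptor."""
    orb = cv2.ORB_create()
    _, desc = orb.detectAndCompute(image, None)
    if desc is None:
        return np.zeros(dictionary.n_clusters)
    words = dictionary.predict(desc.astype(np.float32))   # assign each descriptor to a word
    hist, _ = np.histogram(words, bins=np.arange(dictionary.n_clusters + 1))
    hist = hist.astype(float)
    return hist / (np.linalg.norm(hist) + 1e-12)          # L2-normalised bag-of-words vector
```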
  • when the generation module 72 determines the target map point corresponding to the feature point from the plurality of map points corresponding to the candidate sample image, it is specifically configured to: determine the local descriptor to be tested corresponding to the feature point, the local descriptor to be tested being used to represent the feature vector of the image block where the feature point is located, and the image block being located in the image to be tested; determine the distance between the local descriptor to be tested and the sample local descriptor corresponding to each map point corresponding to the candidate sample image; and select the target map point from the plurality of map points corresponding to the candidate sample image based on the distance between the local descriptor to be tested and each sample local descriptor; wherein the distance between the local descriptor to be tested and the sample local descriptor corresponding to the target map point is the minimum distance among the distances between the local descriptor to be tested and the sample local descriptors corresponding to the map points corresponding to the candidate sample image, and/or, the distance between the local descriptor to be tested and the sample local descriptor corresponding to the target map point is less than a distance threshold.
  • the generation module 72 when the generation module 72 generates the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the self-positioning trajectory and the global positioning trajectory, it is specifically used to: select from all self-positioning poses included in the self-positioning trajectory N self-positioning poses corresponding to the target time period, and selecting P global positioning poses corresponding to the target time period from all global positioning poses included in the global positioning trajectory; N is greater than P; based on The N self-positioning poses and the P global positioning poses determine N fusion positioning poses corresponding to the N self-positioning poses, and the N self-positioning poses are in one-to-one correspondence with the N fusion positioning poses; A fused positioning trajectory of the terminal device in the three-dimensional visual map is generated based on the N fused positioning poses.
  • when the display module 73 determines the target positioning pose corresponding to the fused positioning pose and displays the target positioning pose, it is specifically configured to: based on the target transformation matrix between the 3D visual map and the 3D visualization map, convert the fused positioning pose into the target positioning pose in the three-dimensional visualization map, and display the target positioning pose through the three-dimensional visualization map. The display module 73 is also configured to determine the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map in the following manner: for each of multiple calibration points in the target scene, determine a coordinate pair corresponding to the calibration point, the coordinate pair including the position coordinates of the calibration point in the three-dimensional visual map and the position coordinates of the calibration point in the three-dimensional visualization map, and determine the target transformation matrix based on the coordinate pairs corresponding to the multiple calibration points; or, obtain an initial transformation matrix, map the position coordinates in the 3D visual map to mapping coordinates in the 3D visualization map based on the initial transformation matrix, and determine whether the initial transformation matrix has converged based on the relationship between the mapping coordinates and the actual coordinates in the 3D visualization map; if so, determine the initial transformation matrix as the target transformation matrix; if not, adjust the initial transformation matrix, use the adjusted transformation matrix as the initial transformation matrix, and perform again the operation of mapping the position coordinates in the 3D visual map to mapping coordinates in the 3D visualization map, until the target transformation matrix is obtained.
  • the server may include: a processor and a machine-readable storage medium, and the machine-readable storage medium stores information that can be executed by the processor. machine-executable instructions; the processor is configured to execute the machine-executable instructions to implement the pose display method disclosed in the above examples of the present application.
  • the embodiment of the present application also provides a machine-readable storage medium, on which several computer instructions are stored, and when the computer instructions are executed by a processor, the present invention can be realized. Apply the pose display method disclosed in the above example.
  • the above-mentioned machine-readable storage medium may be any electronic, magnetic, optical or other physical storage device, which may contain or store information, such as executable instructions, data, and so on.
  • the machine-readable storage medium can be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (such as a hard disk drive), a solid state drive, any type of storage disk (such as a CD or DVD), or a similar storage medium, or a combination of them.
  • a typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smart phone, personal digital assistant, media player, navigation device, e-mail device, game console, desktop, tablet, wearable device, or any combination of these devices.
  • embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps are performed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Electromagnetism (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A pose display method and apparatus, and a system are provided. The method comprises: during the process of a terminal device moving in a target scene, acquiring a target image of the target scene and motion data of the terminal device, and determining a self-positioning trajectory according to the target image and the motion data (101); if the target image comprises multiple frames of images, selecting, by the terminal device, some of the frames as images to be tested, and sending the images to be tested and the self-positioning trajectory to a server (102); generating, by the server, according to the images to be tested and the self-positioning trajectory, a fused positioning trajectory of the terminal device in a three-dimensional visual map, the fused positioning trajectory comprising a plurality of fused positioning poses (103); and for each fused positioning pose, determining, by the server, a target positioning pose corresponding to the fused positioning pose, and displaying the target positioning pose (104). In this way, a high frame rate, high precision positioning function is implemented, and the terminal device only sends the self-positioning trajectory and the images to be tested, thereby reducing the amount of data transmitted over the network and reducing the consumption of computing resources and storage resources of the terminal device.
PCT/CN2022/131134 2021-11-15 2022-11-10 Procédé et appareil d'affichage de pose, et système, serveur, et support de stockage WO2023083256A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111350621.9A CN114185073A (zh) 2021-11-15 2021-11-15 一种位姿显示方法、装置及系统
CN202111350621.9 2021-11-15

Publications (1)

Publication Number Publication Date
WO2023083256A1 true WO2023083256A1 (fr) 2023-05-19

Family

ID=80540921

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/131134 WO2023083256A1 (fr) 2021-11-15 2022-11-10 Procédé et appareil d'affichage de pose, et système, serveur, et support de stockage

Country Status (2)

Country Link
CN (1) CN114185073A (fr)
WO (1) WO2023083256A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114185073A (zh) * 2021-11-15 2022-03-15 杭州海康威视数字技术股份有限公司 一种位姿显示方法、装置及系统
CN117346650A (zh) * 2022-06-28 2024-01-05 中兴通讯股份有限公司 视觉定位的位姿确定方法、装置以及电子设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080167814A1 (en) * 2006-12-01 2008-07-10 Supun Samarasekera Unified framework for precise vision-aided navigation
CN105143821A (zh) * 2013-04-30 2015-12-09 高通股份有限公司 依据slam地图的广域定位
CN107818592A (zh) * 2017-11-24 2018-03-20 北京华捷艾米科技有限公司 协作式同步定位与地图构建的方法、系统及交互系统
CN113382365A (zh) * 2021-05-21 2021-09-10 北京索为云网科技有限公司 移动终端的位姿跟踪方法及设备
CN114120301A (zh) * 2021-11-15 2022-03-01 杭州海康威视数字技术股份有限公司 一种位姿确定方法、装置及设备
CN114185073A (zh) * 2021-11-15 2022-03-15 杭州海康威视数字技术股份有限公司 一种位姿显示方法、装置及系统


Also Published As

Publication number Publication date
CN114185073A (zh) 2022-03-15

Similar Documents

Publication Publication Date Title
CN112567201B (zh) 距离测量方法以及设备
US10134196B2 (en) Mobile augmented reality system
WO2023083256A1 (fr) Procédé et appareil d'affichage de pose, et système, serveur, et support de stockage
CN111081199B (zh) 选择用于显示的时间分布的全景图像
US9342927B2 (en) Augmented reality system for position identification
Chen et al. Rise of the indoor crowd: Reconstruction of building interior view via mobile crowdsourcing
CN110617821B (zh) 定位方法、装置及存储介质
CN108700947A (zh) 用于并发测距和建图的系统和方法
CN111127524A (zh) 一种轨迹跟踪与三维重建方法、系统及装置
KR20150013709A (ko) 컴퓨터 생성된 3d 객체들 및 필름 카메라로부터의 비디오 공급을 실시간으로 믹싱 또는 합성하기 위한 시스템
CN110533719B (zh) 基于环境视觉特征点识别技术的增强现实定位方法及装置
US9551579B1 (en) Automatic connection of images using visual features
WO2023060964A1 (fr) Procédé d'étalonnage et appareil, dispositif, support de stockage et produit-programme informatique associés
CN112288853A (zh) 三维重建方法、三维重建装置、存储介质
CN114120301A (zh) 一种位姿确定方法、装置及设备
CN112907557A (zh) 道路检测方法、装置、计算设备及存储介质
Ma et al. Location and 3-D visual awareness-based dynamic texture updating for indoor 3-D model
Zhu et al. PairCon-SLAM: Distributed, online, and real-time RGBD-SLAM in large scenarios
CN116843754A (zh) 一种基于多特征融合的视觉定位方法及系统
WO2023140990A1 (fr) Odométrie inertielle visuelle avec profondeur d'apprentissage automatique
Hu et al. Real-time camera localization with deep learning and sensor fusion
Liu et al. LSFB: A low-cost and scalable framework for building large-scale localization benchmark
Liu et al. HyperSight: Boosting distant 3D vision on a single dual-camera smartphone
WO2021111613A1 (fr) Dispositif de création de carte tridimensionnelle, procédé de création de carte tridimensionnelle et programme de création tridimensionnelle
Porzi et al. An automatic image-to-DEM alignment approach for annotating mountains pictures on a smartphone

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22892049

Country of ref document: EP

Kind code of ref document: A1

WD Withdrawal of designations after international publication