WO2024001849A1 - Pose determination method and apparatus based on visual positioning, and electronic device - Google Patents

Pose determination method and apparatus based on visual positioning, and electronic device

Info

Publication number
WO2024001849A1
WO2024001849A1 (PCT/CN2023/101166)
Authority
WO
WIPO (PCT)
Prior art keywords
pose
terminal
images
constrained
target
Prior art date
Application number
PCT/CN2023/101166
Other languages
English (en)
Chinese (zh)
Inventor
武廷繁
Original Assignee
中兴通讯股份有限公司
Priority date
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2024001849A1

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01BMEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B11/00Measuring arrangements characterised by the use of optical techniques
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/97Determining parameters from multiple pictures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds

Definitions

  • Embodiments of the present invention relate to the field of navigation, and specifically to a visual positioning pose determination method, apparatus, and electronic device.
  • In navigation, the device requesting navigation needs to be positioned.
  • Among the methods that can be used is the visual positioning method.
  • However, the position obtained by the visual positioning method tends to be inaccurate.
  • Embodiments of the present invention provide a visual positioning pose determination method, apparatus, and electronic device to solve, at least, the technical problem of inaccurate positioning poses.
  • Provided is a method for determining a pose by visual positioning, including: during the movement of a terminal equipped with a camera, acquiring multiple images captured by the terminal; selecting multiple target images from the multiple images based on disparity; uploading the multiple target images to the cloud to obtain a constrained pose of the terminal; and determining a target pose of the terminal based on the constrained pose and a local pose of the terminal.
  • Provided is a device for determining a pose by visual positioning, including: an acquisition module configured to acquire multiple images captured by a terminal equipped with a camera while the terminal is moving; a selection module configured to select multiple target images from the multiple images based on parallax; an upload module configured to upload the multiple target images to the cloud to obtain a constrained pose of the terminal; and a determination module configured to determine a target pose of the terminal based on the constrained pose and a local pose of the terminal.
  • A storage medium is also provided, and a computer program is stored in the storage medium, wherein the computer program, when run by a processor, performs the above-mentioned visual positioning pose determination method.
  • Also provided is an electronic device including a memory and a processor.
  • A computer program is stored in the memory, and the processor is configured to perform the above-mentioned visual positioning pose determination method through the computer program.
  • Figure 1 is a flow chart of an optional visual positioning pose determination method according to an embodiment of the present invention.
  • Figure 2 is a flow chart of a local VO scale recovery algorithm of an optional visual positioning pose determination method according to an embodiment of the present invention.
  • Figure 3 is a block diagram of a navigation system based on local VO scale recovery for an optional visual positioning pose determination method according to an embodiment of the present invention.
  • Figure 4 is a schematic structural diagram of an optional visual positioning pose determination device according to an embodiment of the present invention.
  • Figure 5 is a schematic diagram of an optional electronic device according to an embodiment of the present invention.
  • A method for determining a pose by visual positioning is provided.
  • The above method includes:
  • S102: during the movement of a terminal equipped with a camera, acquire multiple images captured by the terminal;
  • S104: select multiple target images from the multiple images based on disparity;
  • S106: upload the multiple target images to the cloud to obtain a constrained pose of the terminal;
  • S108: determine a target pose of the terminal based on the constrained pose and the local pose of the terminal.
  • The pose may be the movement trajectory and position of the terminal.
  • The purpose of this embodiment is to determine an accurate target pose of the terminal, that is, an accurate movement trajectory and position of the terminal, so that it can be applied in navigating and positioning the terminal.
  • The above-mentioned terminal can be equipped with a camera, which can be a front camera, a rear camera or an external camera.
  • The camera can be a single camera or a camera array composed of multiple cameras.
  • The above terminal can be carried while moving. For example, if a user moves within a certain area carrying the terminal, the terminal can take photos through the camera and obtain multiple images. It should be noted that the camera of the terminal captures images of the area where the user is located; if the terminal is placed in a clothes pocket and the camera is blocked by cloth, the multiple images mentioned above cannot be obtained.
  • Multiple target images can be selected based on disparity.
  • After the multiple target images are uploaded to the cloud, the cloud can determine the constrained pose of the terminal based on the target images.
  • The constrained pose is the pose used to constrain the local pose of the terminal.
  • The constrained pose is sent to the terminal, and the terminal then determines the accurate target pose of the terminal according to the constrained pose and the local pose.
  • After the target pose is determined, the target pose can be displayed on the terminal for navigation or positioning.
  • Since the constrained pose of the terminal is determined from target images selected from the multiple images based on parallax, and the constrained pose is used to constrain the local pose, an accurate target pose of the terminal can be determined. This achieves the purpose of improving the accuracy of the determined pose, thereby solving the technical problem of inaccurate positioning.
  • Selecting multiple target images from the multiple images based on disparity includes:
  • determining multiple first images of the same object from the multiple images, and using the two images with the largest disparity among them as images among the plurality of target images.
  • For each object, images of the same object can be obtained, the disparity between every two images of the same object is then calculated, and the image pairs are sorted by disparity; after sorting, the two images with the largest disparity can be used as target images. If multiple objects are included, two target images are determined for each object (see the sketch below).
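  • As an illustrative sketch only (not the claimed method itself), the disparity-driven selection above could be approximated by treating the mean displacement of matched feature points as the disparity between two images; the helper below, including its use of OpenCV ORB matching, is an assumption added for illustration.

```python
import itertools
import cv2
import numpy as np

def mean_disparity(img_a, img_b, orb, matcher):
    """Approximate the disparity between two images as the mean displacement of matched ORB keypoints."""
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0.0
    matches = matcher.match(des_a, des_b)
    if not matches:
        return 0.0
    shifts = [np.linalg.norm(np.subtract(kp_a[m.queryIdx].pt, kp_b[m.trainIdx].pt)) for m in matches]
    return float(np.mean(shifts))

def select_target_pair(images):
    """Return the indices of the two images with the largest pairwise disparity."""
    orb = cv2.ORB_create()
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    best_pair, best_disp = None, -1.0
    for i, j in itertools.combinations(range(len(images)), 2):
        d = mean_disparity(images[i], images[j], orb, matcher)
        if d > best_disp:
            best_pair, best_disp = (i, j), d
    return best_pair, best_disp
```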
  • Determining the target pose of the terminal based on the constrained pose and the local pose of the terminal includes:
  • determining a transformation matrix based on the constrained pose and the local pose;
  • obtaining a scale factor from the transformation matrix; and using the product of the local pose and the scale factor as the target pose.
  • The transformation matrix can be determined based on the constrained pose and the local pose.
  • The scale factor is obtained from the transformation matrix.
  • The scale factor is a factor used to adjust the local pose of the terminal.
  • The local pose of the terminal is multiplied by the scale factor to obtain a calculated pose.
  • The calculated pose is the pose adjusted by the scale factor; this calculated pose is the accurate target pose.
  • Determining the transformation matrix based on the constrained pose and the local pose includes:
  • substituting the first numerical value of the local pose and the second numerical value of the constrained pose into Formula 1. Since the local pose and the constrained pose are each a series of position information, the residual and the transformation matrix T can be calculated.
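  • Formula 1 itself is not reproduced in this text. A plausible form, consistent with the later statement that the residual is the prior (constrained) pose minus the similarity transformation applied to the local pose, would be (the notation here is an assumption for illustration):

$$ r_i = p_i^{\mathrm{c}} - T\,p_i^{\mathrm{l}}, \qquad T^{*} = \arg\min_{T} \sum_{i} \bigl\lVert p_i^{\mathrm{c}} - T\,p_i^{\mathrm{l}} \bigr\rVert^{2}, $$

  where $p_i^{\mathrm{l}}$ denotes the local (VO) pose of the i-th key frame, $p_i^{\mathrm{c}}$ the corresponding cloud-returned constrained pose, and $T$ the similarity transformation matrix to be solved.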
  • Obtaining the scale factor from the transformation matrix includes:
  • substituting the relative rotation and relative offset of the constrained pose and the local pose into Formula 2, from which the scale factor s can be calculated.
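  • Formula 2 is likewise not reproduced here. Writing the solved similarity transformation as $T = \begin{bmatrix} sR & t \\ 0 & 1 \end{bmatrix}$, with rotation $R$, translation $t$ and scale $s$, one plausible way to decompose the scale factor (an assumption for illustration, not necessarily the original formula) is

$$ s = \sqrt[3]{\det(sR)}, $$

  i.e. the cube root of the determinant of the upper-left $3\times3$ block of $T$ (equivalently, the norm of any column of that block, since $\det R = 1$ and the columns of $R$ are unit vectors).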
  • Uploading the multiple target images to the cloud to obtain the constrained pose of the terminal includes:
  • the cloud relocating each of the multiple target images according to a navigation map to obtain the relocation position corresponding to each target image; and
  • the cloud arranging the relocation positions in order to obtain the constrained pose.
  • The target images can be uploaded to the cloud.
  • A navigation map is saved in the cloud; the navigation map is a map of a certain area.
  • The multiple target images can be used to find images with high similarity in the navigation map.
  • In this way, the position of each of the multiple target images can be determined in the navigation map. Arranging these positions in chronological order yields a pose, and the obtained pose is used as the constrained pose (see the sketch below).
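  • A minimal cloud-side sketch of this step is given below; the relocalize(image, navigation_map) call and the field names are hypothetical placeholders, not an API from the source.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Relocation:
    timestamp: float                        # capture time of the target image
    position: Tuple[float, float, float]    # position found by visual positioning in the navigation map

def constrained_pose(target_images: List, navigation_map, relocalize: Callable) -> List[Tuple[float, float, float]]:
    """Relocate every uploaded target image, then order the results chronologically to form the constrained pose."""
    relocations = [Relocation(img.timestamp, relocalize(img, navigation_map)) for img in target_images]
    relocations.sort(key=lambda r: r.timestamp)   # arrange relocation positions in chronological order
    return [r.position for r in relocations]
```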
  • The above method also includes:
  • the cloud obtaining a panoramic video of the navigation area and multiple captured images of the navigation area; generating a point cloud map based on the panoramic video and the captured images; and combining the point cloud map with a plane map to obtain the navigation map.
  • The navigation map in this embodiment needs to be obtained in advance.
  • A panoramic video can be shot in the navigation area, and multiple images can be captured there.
  • The panoramic video and the captured images can then be used to generate a point cloud map.
  • The point cloud map is combined with the flat map of the navigation area to obtain the above-mentioned navigation map.
  • Generating the point cloud map based on the panoramic video and the captured images includes:
  • extracting target frames from the panoramic video, where each target frame is an image, determining a first pose for each target frame, and running Structure from Motion over the target frames and their first poses to obtain a sparse point cloud; and
  • densifying the sparse point cloud to obtain the point cloud map (a pipeline sketch follows).
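  • The offline mapping pipeline could be sketched as follows; every function name here (extract_frames, run_slam, crop_panorama, run_sfm, densify, fuse_with_floor_plan) is a hypothetical placeholder used only to illustrate the order of the stages described above.

```python
def build_navigation_map(panoramic_video, captured_images, floor_plan):
    """Sketch of the offline mapping pipeline (all helpers are assumed, not real APIs)."""
    frames = extract_frames(panoramic_video)                             # target/key frames from the panoramic video
    pano_keyframes, pano_poses = run_slam(frames)                        # panoramic SLAM: key frames and their poses
    mono_images, mono_poses = crop_panorama(pano_keyframes, pano_poses)  # cut panoramas into monocular views
    sparse_cloud = run_sfm(mono_images, mono_poses, captured_images)     # Structure from Motion -> sparse point cloud
    point_cloud_map = densify(sparse_cloud)                              # densification -> dense point cloud map
    return fuse_with_floor_plan(point_cloud_map, floor_plan)             # combine with the plane/flat map
```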
  • This embodiment proposes a monocular visual odometry (VO) scale recovery algorithm combined with cloud relocation (visual positioning), so that local (terminal-side) VO can be effectively applied to user tracking during navigation.
  • The key to the local VO scale recovery algorithm combined with cloud relocation is to run VO on the terminal when navigation starts, and to select a number of key frames at certain intervals to send to the cloud.
  • The cloud relocates these key frames to obtain their corresponding constrained poses and returns the constrained poses to the terminal; the terminal uses the returned constrained poses as constraints, adds them to the calculation of the local pose, solves a transformation matrix, and decomposes the scale factor from the transformation matrix.
  • This scale factor can restore the true scale of the local pose.
  • Because the pose constraints returned by the cloud are added to the local pose calculation, the reliability of the local pose calculated by VO is improved.
  • The local pose can therefore be tracked accurately and efficiently over a long period of time, effectively improving the efficiency and accuracy of user tracking during navigation.
  • The application scenarios are also wider, with no need to distinguish between indoor and outdoor scenes.
  • Figure 2 is the basic flow chart of the local VO scale recovery algorithm.
  • The basic process of the algorithm includes: key frame screening and uploading, key frame relocation, solving the similarity transformation, and VO scale recovery.
  • Key frames refer to the key frames generated when the terminal runs VO. First, some of the key frames recently solved by VO are screened on the terminal, and those with the largest parallax are selected and uploaded to the cloud for relocation.
  • The terminal here refers to a common smartphone; there are no special model requirements.
  • Cloud relocation means that the cloud visually locates the received key frames and returns the positioning results to the terminal. After the terminal obtains the poses of these uploaded key frames, it adds these poses as constraints to the calculation of the local pose, and finally a similarity transformation can be solved.
  • The scale factor decomposed from the similarity transformation can be used to recover the true scale of the local pose, as in the sketch below.
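  • As a concrete numerical illustration (an assumption, not the solver specified by the patent), the similarity transformation between the local VO key-frame positions and the cloud-returned constrained positions can be estimated with the classical Umeyama alignment; its scale output is then used to rescale the local trajectory.

```python
import numpy as np

def umeyama_similarity(local_pts, cloud_pts):
    """Estimate s, R, t such that cloud ~ s * R @ local + t.

    local_pts, cloud_pts: (N, 3) arrays of corresponding key-frame positions.
    """
    mu_l, mu_c = local_pts.mean(axis=0), cloud_pts.mean(axis=0)
    xl, xc = local_pts - mu_l, cloud_pts - mu_c
    cov = xc.T @ xl / len(local_pts)                 # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:     # reflection correction
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_l = (xl ** 2).sum() / len(local_pts)         # variance of the local points
    s = np.trace(np.diag(D) @ S) / var_l             # scale factor recovered from the alignment
    t = mu_c - s * R @ mu_l
    return s, R, t

# Usage sketch: rescale the local VO trajectory with the recovered scale factor.
# s, R, t = umeyama_similarity(local_keyframe_positions, cloud_constrained_positions)
# true_scale_trajectory = s * local_trajectory
```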
  • Figure 3 is the overall block diagram of the navigation system based on local VO scale recovery.
  • The system includes the cloud and the terminal.
  • The cloud covers navigation map generation and navigation services, and the terminal provides functional interfaces to users.
  • Navigation map generation includes collecting original mapping data, generating a point cloud map, and generating a navigation map based on the point cloud map and a plane map.
  • Navigation services provided by the cloud include: recognition and positioning, path planning and real-time navigation.
  • The functions provided to users by the terminal include: starting navigation, initial positioning, destination selection, and real-time navigation.
  • Offline, the cloud generates a high-precision map for navigation.
  • For mapping, a panoramic camera is used to shoot panoramic videos.
  • In addition, pictures of some scenes in the mapped area, taken with an ordinary monocular camera, are required, and the precise position of each picture is obtained through real-time kinematic (RTK) differential positioning.
  • The point cloud map is obtained from the original data using a panoramic 3D reconstruction algorithm.
  • The final navigation map is obtained based on the point cloud map and the plane map.
  • The navigation map is stored in the cloud, and the map of the corresponding area is loaded each time navigation is started.
  • The entire navigation service is conducted within the scope of the map.
  • The cloud navigation service mainly handles initial positioning and the relocation tasks during user tracking.
  • The real-time navigation process uses the local VO scale recovery proposed in the embodiment of the present invention.
  • The cloud is deployed on high-performance servers, and the network needs to remain available. After navigation is started, the terminal continuously interacts with the cloud navigation service to achieve real-time navigation.
  • Step 1: Use a panoramic camera to shoot the navigation service (map) area to obtain a panoramic video.
  • The shooting process contains at least one "loopback". Looping back means circling back to the "origin" after shooting for a certain distance.
  • The "origin" does not specifically refer to the initial scanning position, but to part of the scene that has already been walked through during the scanning process.
  • The shooting route is thus similar to the five Olympic rings; that is, the panoramic video collected by hand contains images/videos of the same object from different angles.
  • Step 2: Take several pictures of part of the scene, and use RTK to obtain the precise location coordinates of the taken pictures.
  • Step 3: Use a 3D reconstruction algorithm to perform 3D reconstruction on the data collected in steps 1 and 2 to generate a point cloud map.
  • The basic process of the three-dimensional reconstruction algorithm is: extract frames from the panoramic video; run Simultaneous Localization and Mapping (SLAM) to obtain panoramic key frames and their poses; cut the panoramic images to generate monocular images and their corresponding poses; run Structure from Motion (SfM) with the monocular images and poses to generate a sparse point cloud; and densify it to generate a dense point cloud.
  • Step 4: Combine the flat map and the point cloud map generated in step 3 to generate the navigation map used during navigation, and save it to the cloud.
  • Step 1: The terminal starts the augmented reality (AR) navigation service, and the cloud loads the navigation map generated in Process 1.
  • The terminal here refers to a smartphone with a stable network connection; there are no special requirements on the brand.
  • Step 2: The terminal starts initial positioning, uses the camera to take pictures of the current environment, and uploads them to the cloud.
  • Step 3: The cloud performs initial positioning on the current environment pictures. After obtaining the pose of the current picture, it returns the pose to the terminal as the user's initial position.
  • Step 4: After the terminal obtains the initial position, the navigation destination is selected and uploaded to the cloud.
  • Step 5: The cloud plans the navigation path based on the starting point and the destination, and the movement direction indicator is rendered on the screen of the terminal.
  • Step 6: The user moves according to the instructions, and the terminal screen displays the picture captured by the current camera. At the same time, the terminal starts VO and turns on user tracking.
  • Step 7: When the terminal VO has run for a certain time, the local VO scale recovery algorithm is started.
  • The algorithm process is shown in Figure 1. The specific steps are as follows:
  • Step 7.1: The terminal selects several key frames with the largest average disparity from some of the recent key frames obtained by VO and uploads them to the cloud;
  • Step 7.2: The cloud relocates the uploaded key frames and returns the corresponding poses;
  • Step 7.3: The terminal adds the cloud-relocated key frame poses as priors to the VO calculation, and defines the residual as the prior pose minus the similarity transformation matrix multiplied by the local pose.
  • The transformation matrix T is obtained through Formula 1 above,
  • and the scale factor s is obtained through Formula 2 above.
  • Step 8: Use the pose solved by the local VO with the true scale recovered as the user's current pose, achieving multi-modal user tracking that integrates local VO and cloud relocation. At the same time, the terminal continuously uses the current location to determine whether the destination has been reached; a schematic loop for this step follows.
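  • A schematic tracking loop for Step 8 (the names local_vo_position, destination and ARRIVAL_RADIUS are illustrative assumptions, not part of the patent):

```python
import numpy as np

ARRIVAL_RADIUS = 1.0  # metres; assumed threshold for "destination reached"

def track_user(scale_factor, local_vo_position, destination):
    """Apply the recovered scale to each local VO position and check whether the destination is reached."""
    while True:
        current = scale_factor * np.asarray(local_vo_position())  # true-scale user position
        if np.linalg.norm(current - np.asarray(destination)) < ARRIVAL_RADIUS:
            return current  # destination reached; stop tracking
```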
  • The mapping algorithm and the recognition algorithm are deployed in the cloud during implementation.
  • The specific implementation process is as follows:
  • Step 1: Collect the original video data following step 1 of Process 1.
  • The indoor panoramic video is generally shot by holding a panoramic camera and following step 1 of Process 1.
  • The panoramic camera can also be fixed on other equipment, such as a helmet.
  • Step 2: Collect image data according to step 2 of Process 1.
  • Image data is generally collected by taking photos with a mobile phone or another device capable of taking photos. The more typical scenes, such as store signs, are usually shot, since such scenes are more likely to be the starting point or end point of navigation.
  • Step 3: Deploy the self-developed mapping algorithm, and then, following step 3 of Process 1, use the algorithm to perform three-dimensional reconstruction on the original data collected in steps 1 and 2 above to generate a point cloud map.
  • Step 4: Follow step 4 of Process 1 to generate a navigation map.
  • The point cloud map is the point cloud map generated in step 3.
  • The floor map uses the CAD drawing of the building.
  • Step 5: Follow steps 1 and 2 of Process 2 to start navigation.
  • The cloud loads the map information and starts to accept messages from the terminal.
  • The terminal uploads pictures of the current environment.
  • Step 6: Follow steps 3, 4 and 5 of Process 2 to generate a navigation path.
  • The cloud performs initial positioning based on the uploaded pictures and returns the result to the terminal.
  • The terminal selects the navigation destination and uploads it to the cloud.
  • The cloud generates the navigation path based on the current location and the destination, and the path is rendered on the terminal screen.
  • Step 7: Follow steps 6 to 8 of Process 2.
  • The terminal starts VO.
  • After VO has run and tracked for 20 seconds, the scale recovery algorithm is started.
  • The local VO can then achieve relatively accurate user tracking.
  • Once the true scale of the local VO is restored, continuous and accurate user tracking can be achieved during navigation.
  • The mapping algorithm and the recognition algorithm are deployed in the cloud.
  • The specific implementation process is as follows:
  • Step 1: Collect the original video data as described in step 1 of Process 1.
  • The outdoor panoramic video is generally shot by holding a panoramic camera and following step 1 of Process 1. If the scene is large, other methods, such as a drone equipped with a panoramic camera, can also be used. The shooting route must still comply with the description in step 1 of Process 1.
  • Step 2: Collect image data according to step 2 of Process 1.
  • Image data is generally collected by taking photos with a mobile phone or another device capable of taking photos. The more typical scenes, such as road signs and building gates, are usually shot, since such scenes are more likely to be the starting point or end point of navigation.
  • Step 3: Deploy the self-developed mapping algorithm, and then, following step 3 of Process 1, use the algorithm to perform three-dimensional reconstruction on the original data collected in steps 1 and 2 above to generate a point cloud map.
  • Step 4: Follow step 4 of Process 1 to generate a navigation map.
  • The point cloud map is the point cloud map generated in step 3.
  • The plan map can be a CAD plan together with road network information.
  • Step 5: Follow steps 1 and 2 of Process 2 to start navigation.
  • The cloud loads the map information and starts to accept messages from the terminal.
  • The terminal uploads pictures of the current environment.
  • Step 6: Follow steps 3, 4 and 5 of Process 2 to generate a navigation path.
  • The cloud performs initial positioning based on the uploaded pictures and returns the result to the terminal.
  • The terminal selects the navigation destination and uploads it to the cloud.
  • The cloud generates the navigation path based on the current location and the destination, and the path is rendered on the terminal screen.
  • Step 7: Follow steps 6 to 8 of Process 2.
  • The terminal starts VO.
  • After VO has run and tracked for 20 seconds, the scale recovery algorithm is started.
  • The local VO can then achieve relatively accurate user tracking.
  • Once the true scale of the local VO is restored, continuous and accurate user tracking can be achieved during navigation.
  • A visual positioning pose determination device is also provided, as shown in Figure 4, including the following modules (a structural sketch follows the list):
  • the acquisition module 402, used to acquire multiple images captured by the terminal during the movement of a terminal equipped with a camera;
  • the selection module 404, used to select multiple target images from the multiple images based on disparity;
  • the upload module 406, used to upload the multiple target images to the cloud to obtain the constrained pose of the terminal; and
  • the determination module 408, used to determine the target pose of the terminal based on the constrained pose and the local pose of the terminal.
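  • Purely as an organisational sketch (the module numbers 402–408 follow Figure 4, while the Python structure and the helpers it calls — select_target_pair and umeyama_similarity from the earlier sketches, plus the camera and cloud client interfaces — are assumptions, not the patent's implementation):

```python
import numpy as np

class PoseDeterminationDevice:
    """Sketch of the device of Figure 4: acquisition (402), selection (404), upload (406), determination (408)."""

    def __init__(self, camera, cloud_client, vo):
        self.camera = camera              # image source on the moving terminal (assumed interface)
        self.cloud_client = cloud_client  # uploads target images and returns the constrained pose (assumed)
        self.vo = vo                      # local visual odometry providing the local pose (assumed)

    def acquire(self):                    # acquisition module 402
        return self.camera.capture_many()

    def select(self, images):             # selection module 404
        pair, _ = select_target_pair(images)
        return [images[i] for i in pair]

    def upload(self, target_images):      # upload module 406
        return self.cloud_client.relocalize(target_images)

    def determine(self, constrained_pose, local_pose):  # determination module 408
        s, _, _ = umeyama_similarity(np.asarray(local_pose), np.asarray(constrained_pose))
        return s * np.asarray(local_pose)  # product of local pose and scale factor
```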
  • The pose may be the movement trajectory and position of the terminal.
  • The purpose of this embodiment is to determine an accurate target pose of the terminal, that is, an accurate movement trajectory and position of the terminal, so that it can be applied in navigating and positioning the terminal.
  • The above-mentioned terminal can be equipped with a camera, which can be a front camera, a rear camera or an external camera.
  • The camera can be a single camera or a camera array composed of multiple cameras.
  • The above terminal can be carried while moving. For example, if a user moves within a certain area carrying the terminal, the terminal can take photos through the camera and obtain multiple images. It should be noted that the camera of the terminal captures images of the area where the user is located; if the terminal is placed in a clothes pocket and the camera is blocked by cloth, the multiple images mentioned above cannot be obtained.
  • Multiple target images can be selected based on disparity.
  • The cloud can determine the constrained pose of the terminal based on the target images.
  • The constrained pose is a pose used to constrain the local pose of the terminal.
  • The constrained pose is sent to the terminal.
  • The terminal then determines the accurate target pose of the terminal based on the constrained pose and the local pose. After the target pose is determined, the target pose can be displayed on the terminal for navigation or positioning.
  • Since the constrained pose is determined from target images selected from the multiple images based on parallax, and the constrained pose is used to constrain the local pose, an accurate target pose of the terminal can be determined.
  • The above-mentioned selection module includes: a first determination unit, used to determine multiple first images of the same object from the above-mentioned multiple images; and a second determination unit, used to take, among the multiple first images, the two images with the largest disparity as images among the above-mentioned multiple target images.
  • For each object, images of the same object can be obtained, the disparity between every two images of the same object is then calculated, and the image pairs are sorted by disparity; after sorting, the two images with the largest disparity can be used as target images. If multiple objects are included, two target images are determined for each object.
  • The above-mentioned determination module includes: a third determination unit, used to determine the transformation matrix according to the above-mentioned constrained pose and the above-mentioned local pose; an acquisition unit, used to obtain the scale factor from the above-mentioned transformation matrix; and a fourth determination unit, configured to use the product of the above-mentioned local pose and the above-mentioned scale factor as the above-mentioned target pose.
  • The transformation matrix can be determined based on the constrained pose and the local pose.
  • The scale factor is obtained from the transformation matrix.
  • The scale factor is a factor used to adjust the local pose of the terminal.
  • The local pose of the terminal is multiplied by the scale factor to obtain a calculated pose.
  • The calculated pose is the pose adjusted by the scale factor; this calculated pose is the accurate target pose.
  • The above-mentioned third determination unit includes: a first input sub-unit, used to substitute the first numerical value of the above-mentioned local pose and the second numerical value of the above-mentioned constrained pose into the above-mentioned Formula 1 to obtain the above-mentioned transformation matrix and residual.
  • After the two are substituted into the above formula, since the local pose and the constrained pose are each a series of position information, the above residual and the above transformation matrix T can be calculated.
  • The above-mentioned acquisition unit includes: a second input sub-unit, used to substitute the relative rotation and relative offset of the above-mentioned constrained pose and the above-mentioned local pose into the above-mentioned Formula 2 to obtain the above-mentioned scale factor.
  • From this, the scale factor s can be calculated.
  • The above-mentioned upload module includes: a relocation unit, used to notify the above-mentioned cloud to relocate each of the above-mentioned multiple target images according to the navigation map and obtain the relocation position corresponding to each target image; the above-mentioned cloud arranges the above-mentioned relocation positions in order to obtain the above-mentioned constrained pose.
  • The target images can be uploaded to the cloud.
  • A navigation map is saved in the cloud; the navigation map is a map of a certain area.
  • The multiple target images can be used to find images with high similarity in the navigation map.
  • In this way, the position of each of the multiple target images can be determined in the navigation map.
  • Arranging these positions in chronological order yields a pose, and the obtained pose is used as the constrained pose.
  • The cloud can obtain a panoramic video of the navigation area and multiple captured images of the navigation area; generate a point cloud map based on the panoramic video and the captured images; and combine the point cloud map with a plane map to obtain the above navigation map.
  • The navigation map in this embodiment needs to be obtained in advance.
  • A panoramic video can be shot in the navigation area, and multiple images can be captured there.
  • The panoramic video and the captured images can then be used to generate a point cloud map.
  • The point cloud map is combined with the flat map of the navigation area to obtain the above-mentioned navigation map.
  • The cloud can extract target frames from the above panoramic video; determine a first pose for each target frame; run Structure from Motion over the target frames and their first poses to obtain a sparse point cloud; and densify the sparse point cloud to obtain the above point cloud map.
  • Target frames can be extracted from the panoramic video, and each target frame is an image.
  • The sparse point cloud is densified to obtain the point cloud map.
  • Figure 5 is a structural block diagram of an optional electronic device according to an embodiment of the present application. As shown in Figure 5, it includes a processor 502, a communication interface 504, a memory 506 and a communication bus 508. The processor 502, the communication interface 504 and the memory 506 communicate with each other through the communication bus 508, where:
  • the memory 506 is used for storing a computer program; and
  • the processor 502 is used to implement the following steps when executing the computer program stored in the memory 506:
  • acquiring multiple images captured by the terminal during the movement of a terminal equipped with a camera; selecting multiple target images from the multiple images based on disparity; uploading the multiple target images to the cloud to obtain the constrained pose of the terminal; and determining the target pose of the terminal based on the constrained pose and the local pose of the terminal.
  • The above-mentioned communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • The communication bus can be divided into an address bus, a data bus, a control bus, etc. For ease of presentation, only one thick line is used in Figure 5, but this does not mean that there is only one bus or one type of bus.
  • The communication interface is used for communication between the above-mentioned electronic device and other devices.
  • The memory may include RAM, or non-volatile memory such as at least one disk memory.
  • The memory may also be at least one storage device located remotely from the aforementioned processor.
  • The above memory 506 may include, but is not limited to, the acquisition module 402, the selection module 404, the upload module 406 and the determination module 408 in the above visual positioning pose determination device. In addition, it may also include, but is not limited to, other module units in the above-mentioned visual positioning pose determination device, which will not be described again in this example.
  • The above-mentioned processor may be a general-purpose processor, and may include but is not limited to a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • The device that implements the visual positioning pose determination method can be a terminal device, and the terminal device can be a smart phone (such as an Android phone, an iOS phone, etc.), a tablet computer, a handheld computer, a Mobile Internet Device (MID), a PAD or another terminal device.
  • Figure 5 does not limit the structure of the above electronic device.
  • The electronic device may also include more or fewer components (such as network interfaces, display devices, etc.) than shown in Figure 5, or have a different configuration from that shown in Figure 5.
  • The program can be stored in a computer-readable storage medium, and the storage medium can include: a flash disk, a ROM, a RAM, a magnetic disk or an optical disk, etc.
  • A computer-readable storage medium stores a computer program, wherein the computer program, when run by the processor, executes the steps in the above-mentioned visual positioning pose determination method.
  • The storage medium can include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, etc.
  • If the integrated units in the above embodiments are implemented in the form of software functional units and sold or used as independent products, they can be stored in the above computer-readable storage medium.
  • The technical solution of the embodiments of the present invention, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions to cause one or more computer devices (which can be personal computers, servers, network devices, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention.
  • the disclosed client can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division.
  • Multiple units or components may be combined or may be integrated into another system, or some features can be ignored or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the units or modules may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in the embodiment of the present invention can be integrated into one processing unit, or each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Image Processing (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

A pose determination method based on visual positioning comprises: during a movement process of a terminal on which a camera is mounted, acquiring a plurality of images captured by the terminal (S102); selecting a plurality of target images from the plurality of images according to disparity (S104); uploading the plurality of target images to a cloud to acquire a constrained pose of the terminal (S106); and determining a target pose of the terminal according to the constrained pose and a local pose of the terminal (S108). Also disclosed are a pose determination apparatus based on visual positioning and an electronic device.
PCT/CN2023/101166 2022-06-28 2023-06-19 Pose determination method and apparatus based on visual positioning, and electronic device WO2024001849A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210751878.3A CN117346650A (zh) 2022-06-28 2022-06-28 Visual positioning pose determination method and apparatus, and electronic device
CN202210751878.3 2022-06-28

Publications (1)

Publication Number Publication Date
WO2024001849A1 (fr)

Family

ID=89369772

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/101166 WO2024001849A1 (fr) 2022-06-28 2023-06-19 Pose determination method and apparatus based on visual positioning, and electronic device

Country Status (2)

Country Link
CN (1) CN117346650A (fr)
WO (1) WO2024001849A1 (fr)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102018124211A1 (de) * 2017-10-06 2019-04-11 Nvidia Corporation Lernbasierte Kameraposenschätzung von Bildern einer Umgebung
WO2022002039A1 (fr) * 2020-06-30 2022-01-06 杭州海康机器人技术有限公司 Procédé et dispositif de positionnement visuel sur la base d'une carte visuelle
CN112270710A (zh) * 2020-11-16 2021-01-26 Oppo广东移动通信有限公司 位姿确定方法、位姿确定装置、存储介质与电子设备
CN112197770A (zh) * 2020-12-02 2021-01-08 北京欣奕华数字科技有限公司 一种机器人的定位方法及其定位装置
CN112819860A (zh) * 2021-02-18 2021-05-18 Oppo广东移动通信有限公司 视觉惯性系统初始化方法及装置、介质和电子设备
CN113029128A (zh) * 2021-03-25 2021-06-25 浙江商汤科技开发有限公司 视觉导航方法及相关装置、移动终端、存储介质
CN113409391A (zh) * 2021-06-25 2021-09-17 浙江商汤科技开发有限公司 视觉定位方法及相关装置、设备和存储介质
CN114120301A (zh) * 2021-11-15 2022-03-01 杭州海康威视数字技术股份有限公司 一种位姿确定方法、装置及设备
CN114185073A (zh) * 2021-11-15 2022-03-15 杭州海康威视数字技术股份有限公司 一种位姿显示方法、装置及系统

Also Published As

Publication number Publication date
CN117346650A (zh) 2024-01-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23830029

Country of ref document: EP

Kind code of ref document: A1