WO2024001849A1 - Visual-localization-based pose determination method and apparatus, and electronic device - Google Patents

Visual-localization-based pose determination method and apparatus, and electronic device Download PDF

Info

Publication number
WO2024001849A1
WO2024001849A1 (PCT/CN2023/101166)
Authority
WO
WIPO (PCT)
Prior art keywords
pose
terminal
images
constrained
target
Prior art date
Application number
PCT/CN2023/101166
Other languages
French (fr)
Chinese (zh)
Inventor
武廷繁
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 filed Critical 中兴通讯股份有限公司
Publication of WO2024001849A1 publication Critical patent/WO2024001849A1/en

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01B MEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B 11/00 Measuring arrangements characterised by the use of optical techniques
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 Navigation; Navigational instruments not provided for in groups G01C 1/00 - G01C 19/00
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/97 Determining parameters from multiple pictures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds

Definitions

  • Embodiments of the present invention relate to the field of navigation, and specifically to a visual positioning pose determination method and apparatus, and an electronic device.
  • In processes such as navigation, the device requesting navigation needs to be positioned.
  • One method that can be used is visual positioning.
  • However, the pose obtained by visual positioning is prone to inaccuracy.
  • Embodiments of the present invention provide a visual positioning pose determination method and apparatus, and an electronic device, to at least solve the technical problem of inaccurately positioned poses.
  • a method for determining a pose in visual positioning is provided, including: during movement of a terminal equipped with a camera, acquiring multiple images captured by the terminal; selecting multiple target images from the multiple images based on disparity; uploading the multiple target images to the cloud to obtain a constrained pose of the terminal; and determining a target pose of the terminal based on the constrained pose and a local pose of the terminal.
  • an apparatus for determining a pose in visual positioning is provided, including: an acquisition module configured to acquire multiple images captured by a terminal equipped with a camera while the terminal is moving; a selection module configured to select multiple target images from the multiple images based on disparity; an upload module configured to upload the multiple target images to the cloud to obtain a constrained pose of the terminal; and a determination module configured to determine a target pose of the terminal based on the constrained pose and a local pose of the terminal.
  • a storage medium is also provided, and a computer program is stored in the storage medium, wherein the computer program executes the above-mentioned pose determination method for visual positioning when run by a processor.
  • an electronic device is also provided, including a memory and a processor; a computer program is stored in the memory, and the processor is configured to execute the above visual positioning pose determination method by means of the computer program.
  • Figure 1 is a flow chart of an optional visual positioning pose determination method according to an embodiment of the present invention.
  • Figure 2 is a flow chart of the local VO (visual odometry) scale recovery algorithm of an optional visual positioning pose determination method according to an embodiment of the present invention.
  • Figure 3 is a block diagram of a navigation system based on local VO scale recovery, according to an optional visual positioning pose determination method of an embodiment of the present invention.
  • Figure 4 is a schematic structural diagram of an optional visual positioning pose determination apparatus according to an embodiment of the present invention.
  • Figure 5 is a schematic diagram of an optional electronic device according to an embodiment of the present invention.
  • a method for determining the pose of visual positioning is provided.
  • the above method includes:
  • S102: during movement of a terminal equipped with a camera, acquire multiple images captured by the terminal;
  • S104: select multiple target images from the multiple images based on disparity;
  • S106: upload the multiple target images to the cloud to obtain a constrained pose of the terminal;
  • S108: determine a target pose of the terminal based on the constrained pose and the local pose of the terminal.
  • the pose may be the movement trajectory and position of the terminal.
  • the purpose of this embodiment is to determine an accurate target pose of the terminal, that is, its accurate movement trajectory and position, so that the result can be applied when navigating and positioning the terminal.
  • the above-mentioned terminal can be equipped with a camera, which can include a front camera, a rear camera or an external camera.
  • the camera can be a single camera or a camera array composed of multiple cameras.
  • the above terminal can be carried and moved. For example, if a user moves within a certain area carrying the terminal, the terminal can take photos through the camera and obtain multiple images. Note that the camera captures images of the area where the user is located; if the terminal is placed in a clothes pocket and the camera is blocked by the fabric, the multiple images mentioned above cannot be obtained.
  • multiple target images can be selected based on disparity.
  • after the multiple target images are uploaded to the cloud, the cloud can determine the constrained pose of the terminal based on them.
  • the constrained pose is the pose used to constrain the local pose of the terminal.
  • the constrained pose is sent to the terminal, and the terminal then determines the accurate target pose from the constrained pose and the local pose.
  • after the target pose is determined, it can be displayed on the terminal for navigation or positioning.
  • because the constrained pose is determined from target images selected by disparity among the multiple images, and the constrained pose is used to constrain the local pose, the accurate target pose of the terminal can be determined. This achieves the purpose of improving the accuracy of the determined pose, thereby solving the technical problem of inaccurate positioning.
  • selecting multiple target images from multiple images based on disparity includes:
  • the two images with the largest disparity are used as the images among the plurality of target images.
  • images of the same object can be gathered, the disparity between each pair of these images is calculated, and the pairs are sorted by disparity; after sorting, the two images with the largest disparity can be used as target images. If multiple objects are involved, two target images are determined for each object (see the sketch below).
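A minimal sketch of this selection rule follows. It assumes disparity is measured as the mean displacement of matched feature points between two views, and that a feature matcher is supplied by the caller; both are assumptions made for illustration, since the embodiment does not fix these details.

```python
from itertools import combinations

import numpy as np

def mean_disparity(kps_a: np.ndarray, kps_b: np.ndarray) -> float:
    """Mean pixel displacement between matched keypoints of two views."""
    return float(np.mean(np.linalg.norm(kps_a - kps_b, axis=1)))

def select_target_images(images_by_object: dict, match_features) -> list:
    """For each observed object, keep the two views with the largest disparity.

    images_by_object: object id -> list of images showing that object.
    match_features:   caller-supplied callable(img_a, img_b) returning two
                      (N, 2) arrays of matched keypoint coordinates.
    """
    targets = []
    for views in images_by_object.values():
        if len(views) < 2:
            continue  # need at least two views of an object
        best_pair, best_disp = None, -1.0
        for a, b in combinations(views, 2):      # every pair of views
            kps_a, kps_b = match_features(a, b)
            disp = mean_disparity(kps_a, kps_b)
            if disp > best_disp:
                best_pair, best_disp = (a, b), disp
        targets.extend(best_pair)                # two target images per object
    return targets
```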
  • determining the target pose of the terminal based on the constrained pose and the local pose of the terminal includes:
  • a transformation matrix is determined based on the constrained pose and the local pose;
  • the product of the local pose and the scale factor is used as the target pose.
  • the transformation matrix can be determined from the constrained pose and the local pose.
  • the scale factor is then obtained from the transformation matrix.
  • the scale factor is a factor used to adjust the local pose of the terminal.
  • the local pose of the terminal is multiplied by the scale factor to obtain the calculated pose.
  • the calculated pose is the pose adjusted by the scale factor, and this calculated pose is the accurate target pose.
  • determining the transformation matrix based on the constrained pose and the local pose includes:
  • when the first values of the local pose and the second values of the constrained pose are known, both are substituted into formula 1 above. Since the local pose and the constrained pose are each a series of position information, the residual and the transformation matrix T can be calculated.
  • obtaining the scale factor from the transformation matrix includes:
  • since the transformation matrix T has already been calculated, and the relative rotation r and relative offset t are known quantities, the scale factor s can be calculated (a reconstruction of these formulas is sketched below).
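Formulas 1 and 2 are referenced here but are not reproduced in this extract. Based on the residual definition given later (step 7.3: the residual is the prior pose minus the similarity transformation matrix multiplied by the local pose), one plausible reconstruction, offered as an assumption rather than the patent's exact notation, is:

```latex
% Formula 1 (reconstruction): solve the similarity transform T by minimizing
% the residuals between constrained (cloud) poses and local VO poses
r_i = p_i^{c} - T\,p_i^{l}, \qquad
T^{*} = \arg\min_{T \in \mathrm{Sim}(3)} \sum_i \lVert r_i \rVert^{2}

% Formula 2 (reconstruction): read the scale factor s off the Sim(3) matrix,
% whose upper-left block is the scaled rotation sR
T = \begin{bmatrix} sR & t \\ \mathbf{0}^{\top} & 1 \end{bmatrix}, \qquad
s = \sqrt[3]{\det(sR)}
```

Here p_i^c are the constrained poses returned by the cloud, p_i^l the corresponding local poses, R a rotation, t a translation, and s the scale factor.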
  • uploading multiple target images to the cloud to obtain the constrained pose of the terminal includes:
  • the cloud repositions each target image among the multiple target images according to the navigation map, and obtains the relocation position corresponding to each target image;
  • the cloud arranges the relocation positions in order to obtain the constrained pose.
  • the target images can be uploaded to the cloud.
  • a navigation map is saved in the cloud, and the navigation map is a map within a certain area.
  • the multiple target images can be matched against the navigation map to find images with high similarity.
  • through this comparison, the position of each target image can be determined in the navigation map. Arranging these positions in chronological order yields a pose, and the obtained pose is used as the constrained pose (see the sketch below).
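As a minimal illustration of this step, the sketch below (with hypothetical field names, since the patent does not specify a data layout) orders per-image relocation results chronologically to form the constrained pose:

```python
from dataclasses import dataclass

@dataclass
class Relocation:
    timestamp: float                 # capture time of the target image
    position: tuple[float, ...]      # (x, y, z) from cloud relocation
    rotation: tuple[float, ...]      # orientation, e.g. quaternion (w, x, y, z)

def build_constrained_pose(relocations: list[Relocation]) -> list:
    """Arrange per-image relocation results in chronological order; the
    ordered sequence is the constrained pose returned to the terminal."""
    ordered = sorted(relocations, key=lambda r: r.timestamp)
    return [(r.position, r.rotation) for r in ordered]
```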
  • the above method also includes:
  • the cloud obtains the panoramic video in the navigation area and multiple captured images in the navigation area;
  • the navigation map in this embodiment needs to be obtained in advance.
  • a panoramic video can be shot in the navigation area, together with multiple still images.
  • the panoramic video and the captured images can be used to generate a point cloud map.
  • the point cloud map is then combined with the flat map of the navigation area to obtain the above navigation map.
  • generating point cloud maps based on panoramic videos and captured images includes:
  • target frames can be extracted from the panoramic video; each target frame is an image. The positions of the extracted target frames are determined as the first poses, and Structure from Motion (SfM) is run on the frames and poses to obtain a sparse point cloud.
  • the sparse point cloud is densified to obtain the point cloud map.
  • This embodiment proposes a monocular visual odometry (VO) scale recovery algorithm combined with cloud relocation (visual positioning), so that local (on-terminal) VO can be used effectively for user tracking during navigation.
  • the key to the local VO scale recovery algorithm combined with cloud relocation is: when navigation starts, VO runs on the terminal, and at certain intervals a number of key frames are selected and sent to the cloud.
  • the cloud repositions these key frames to obtain their corresponding constrained poses, and returns the constrained poses to the terminal; the terminal adds the returned constrained poses as constraints to the calculation of the local pose, solves for a transformation matrix, and decomposes the scale factor from the transformation matrix.
  • this scale factor can restore the true scale of the local pose.
  • because the pose constraints returned by the cloud are added to the local pose calculation, the reliability of the local pose computed by VO is improved.
  • with the scale recovery algorithm of this embodiment, the local pose can track the user accurately and efficiently over a long period, effectively improving the efficiency and accuracy of user tracking during navigation.
  • the application scenarios are also broader, with no distinction between indoor and outdoor scenes.
  • FIG. 2 is the basic flow chart of the local VO scale recovery algorithm.
  • the basic flow of the algorithm comprises: key frame screening and uploading, key frame relocation, solving the similarity transformation, and VO scale recovery.
  • key frames are the key frames generated while the terminal runs VO. First, some of the key frames recently solved by VO are screened on the terminal, and the ones with the largest disparity are selected and uploaded to the cloud for relocation.
  • the terminal is an ordinary smartphone, with no special model requirements.
  • cloud relocation means that the cloud visually localizes the received key frames and returns the positioning results to the terminal. After the terminal obtains the poses of these uploaded key frames, it adds them as constraints to the calculation of the local pose, and can finally solve for a similarity transformation.
  • the scale factor decomposed from the similarity transformation can be used to recover the true scale of the local pose (one standard way to solve such a transform is sketched below).
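The embodiment does not state how the similarity transformation is solved. One standard closed-form option for aligning two corresponding position sequences is Umeyama alignment, sketched below as an illustration under that assumption, not as the patent's actual solver:

```python
import numpy as np

def umeyama_sim3(local_pts: np.ndarray, constrained_pts: np.ndarray):
    """Closed-form similarity transform (s, R, t) aligning local VO positions
    to cloud-constrained positions (Umeyama, 1991), such that
    constrained ≈ s * R @ local + t. Inputs are corresponding (N, 3) arrays
    with N >= 3 non-degenerate points."""
    mu_l = local_pts.mean(axis=0)
    mu_c = constrained_pts.mean(axis=0)
    xl = local_pts - mu_l
    xc = constrained_pts - mu_c
    cov = xc.T @ xl / len(local_pts)            # cross-covariance matrix
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                          # keep R a proper rotation
    R = U @ S @ Vt
    var_l = (xl ** 2).sum() / len(local_pts)    # variance of local positions
    s = np.trace(np.diag(D) @ S) / var_l        # decomposed scale factor
    t = mu_c - s * R @ mu_l
    return s, R, t

# Recovering the true scale of the local trajectory then amounts to applying
# the transform: true_traj = s * local_traj @ R.T + t
```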
  • Figure 3 is the overall block diagram of the navigation system based on local VO scale recovery.
  • the system includes cloud and terminal.
  • the cloud includes navigation map generation and navigation services, and the terminal provides functional interfaces for users.
  • Navigation map generation includes collecting original mapping data, generating point cloud maps, and generating navigation maps based on point cloud maps and plane maps.
  • Navigation services provided by the cloud include: identification and positioning, path planning and real-time navigation.
  • the functions provided to users by the terminal include: starting navigation, initial positioning, destination selection, and real-time navigation.
  • The cloud generates high-precision navigation maps offline.
  • Mapping uses a panoramic camera to shoot panoramic videos.
  • In addition, pictures of some scenes on the map, taken with an ordinary monocular camera, are required, and the precise position of each picture is obtained through real-time kinematic (RTK) differential positioning.
  • RTK Real-Time Kinematic
  • the point cloud map is obtained using the raw data and a panoramic 3D reconstruction algorithm.
  • the final navigation map is obtained based on the point cloud map and the plane map.
  • the navigation map is stored in the cloud, and the map of the corresponding area is loaded every time navigation is started.
  • the entire navigation service is conducted within the scope of the map.
  • the cloud navigation service mainly handles initial positioning and the relocation tasks during user tracking.
  • the real-time navigation process uses the local VO scale recovery proposed in this embodiment of the present invention.
  • the cloud is deployed on high-performance servers, and the network must remain available; after navigation starts, the terminal continuously interacts with the cloud navigation service to achieve real-time navigation.
  • Step 1: use a panoramic camera to shoot the navigation service (map) area to obtain a panoramic video.
  • the shooting process contains at least one "loop closure". Looping means circling back to the "origin" after shooting for a certain distance.
  • the "origin" does not specifically mean the initial scanning position, but any part of the scene already covered during the scan.
  • the shooting route is similar to the five Olympic rings; that is, the hand-held panoramic video contains images of the same objects from different angles.
  • Step 2: take several pictures of parts of the scene, and use RTK to obtain the precise location coordinates of the pictures taken.
  • RTK Real-Time Kinematic
  • Step 3: use the 3D reconstruction algorithm to perform 3D reconstruction of the data collected in steps 1 and 2 and generate a point cloud map.
  • the basic flow of the three-dimensional reconstruction algorithm is: extract frames from the panoramic video; run Simultaneous Localization and Mapping (SLAM) to obtain panoramic key frames and their poses; cut the panoramic images to generate monocular images and their corresponding poses; run Structure from Motion (SfM) on the monocular images and poses to generate a sparse point cloud; and densify it to generate a dense point cloud (a compact sketch of this pipeline follows step 4 below).
  • SLAM Simultaneous Localization and Mapping
  • Step 4 Combine the flat map and the point cloud map generated in step 3 to generate the navigation map used in the navigation process and save it to the cloud.
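As a compact summary of the reconstruction pipeline in step 3 above, the sketch below wires the stages together. Every stage function is a caller-supplied placeholder, since the patent names the stages but not their implementations; treating the RTK-positioned photos as georeferencing input is likewise our reading of step 2's role, not something the text states.

```python
def build_point_cloud_map(pano_video, rtk_photos, *, extract_frames, run_slam,
                          cut_panoramas, run_sfm, georeference, densify):
    """Offline mapping pipeline (process 1, step 3). All stage functions are
    hypothetical placeholders injected by the caller."""
    frames = extract_frames(pano_video)            # frame extraction
    keyframes, pano_poses = run_slam(frames)       # panoramic SLAM
    mono_imgs, mono_poses = cut_panoramas(keyframes, pano_poses)
    sparse = run_sfm(mono_imgs, mono_poses)        # Structure from Motion
    sparse = georeference(sparse, rtk_photos)      # anchor with RTK photos
    return densify(sparse)                         # dense point cloud map
```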
  • Step 1: the terminal starts the augmented reality (AR) navigation service, and the cloud loads the navigation map generated in process 1.
  • the terminal is a smartphone with a stable network connection; there are no special requirements on the brand.
  • Step 2: the terminal starts initial positioning, uses the camera to take a picture of the current environment, and uploads it to the cloud.
  • Step 3: the cloud performs initial positioning on the current environment picture; after obtaining the pose of the picture, it returns it to the terminal as the user's initial position.
  • Step 4: after the terminal obtains the initial position, the navigation destination is selected and uploaded to the cloud.
  • Step 5: the cloud plans the navigation path based on the starting point and destination, and the movement direction indicator is rendered on the terminal screen.
  • Step 6: the user moves according to the instructions, and the terminal screen displays the picture captured by the current camera; at the same time, the terminal starts VO and turns on user tracking.
  • Step 7: once the terminal VO has run for a certain time, the local VO scale recovery algorithm starts.
  • the algorithm process is shown in Figure 1. The specific steps are as follows.
  • Step 7.1: the terminal selects, from some of the recent key frames obtained by VO, several key frames with the largest average disparity, and uploads them to the cloud;
  • Step 7.2: the cloud repositions the uploaded key frames and returns the corresponding poses;
  • Step 7.3: the terminal adds the cloud-relocated key frame poses as priors to the VO calculation, and defines the residual as the prior pose minus the similarity transformation matrix multiplied by the local pose.
  • the transformation matrix T is obtained through formula 1 above;
  • the scale factor s is obtained through formula 2 above.
  • Step 8: use the pose solved by the local VO with the true scale recovered as the user's current pose, achieving multi-modal user tracking that fuses local VO and cloud relocation. At the same time, the terminal continuously uses the current location to determine whether the destination has been reached.
  • the mapping algorithm and the recognition algorithm are deployed in the cloud during implementation.
  • the specific implementation process is as follows:
  • Step 1: collect the original video data following step 1 of process 1.
  • indoors, the panoramic video is generally shot by holding a panoramic camera and following step 1 of process 1.
  • the panoramic camera can also be fixed to other equipment, such as a helmet.
  • Step 2: collect image data according to step 2 of process 1.
  • image data are generally collected by taking photos with a mobile phone or another device capable of taking photos. Typically more distinctive scenes are shot, such as store signs, since such scenes are more likely to be the starting point or end point of a navigation.
  • Step 3: deploy the self-developed mapping algorithm, then follow step 3 of process 1 and use the algorithm to perform three-dimensional reconstruction on the raw data collected in steps 1 and 2 above, generating a point cloud map.
  • Step 4 Follow step 4 of process 1 to generate a navigation map.
  • the point cloud map is the point cloud map generated in step 3.
  • the floor map uses the CAD drawing of the building.
  • Step 5 Follow steps 1 and 2 of process 2 to start navigation.
  • the cloud loads map information and starts to accept messages from the terminal.
  • the terminal uploads pictures of the current environment.
  • Step 6 Follow steps 3, 4 and 5 of process 2 to generate a navigation path.
  • the cloud performs initial positioning based on the uploaded image and returns the result to the terminal.
  • the terminal selects the navigation destination and uploads it to the cloud.
  • the cloud generates the navigation path based on the current location and destination, and renders it on the terminal screen.
  • Step 7: follow steps 6 to 8 of process 2.
  • the terminal starts VO.
  • after VO has run and tracked for 20 seconds, the scale recovery algorithm starts.
  • the local VO can achieve relatively accurate user tracking.
  • once the true scale of the local VO is restored, continuous and accurate user tracking can be achieved during the navigation process.
  • the mapping algorithm and the recognition algorithm are deployed in the cloud.
  • the specific implementation process is as follows:
  • Step 1: collect the original video data as described in step 1 of process 1.
  • outdoors, the panoramic video is generally shot by holding a panoramic camera and following step 1 of process 1. If the scene is large, other methods can also be used, such as a drone equipped with a panoramic camera; the shooting route must still comply with the description in step 1 of process 1.
  • Step 2: collect image data according to step 2 of process 1.
  • image data are generally collected by taking photos with a mobile phone or another device capable of taking photos. Typically more distinctive scenes are shot, such as road signs and building gates, since such scenes are more likely to be the starting point or end point of a navigation.
  • Step 3: deploy the self-developed mapping algorithm, then follow step 3 of process 1 and use the algorithm to perform three-dimensional reconstruction on the raw data collected in steps 1 and 2 above, generating a point cloud map.
  • Step 4: follow step 4 of process 1 to generate a navigation map.
  • the point cloud map is the one generated in step 3.
  • the plane map can be a CAD plan together with road network information.
  • Step 5 Follow steps 1 and 2 of process 2 to start navigation.
  • the cloud loads map information and starts to accept messages from the terminal.
  • the terminal uploads pictures of the current environment.
  • Step 6 Follow steps 3, 4 and 5 of process 2 to generate a navigation path.
  • the cloud performs initial positioning based on the uploaded image and returns the result to the terminal.
  • the terminal selects the navigation destination and uploads it to the cloud.
  • the cloud generates the navigation path based on the current location and destination, and renders it on the terminal screen.
  • Step 7: follow steps 6 to 8 of process 2.
  • the terminal starts VO.
  • after VO has run and tracked for 20 seconds, the scale recovery algorithm starts.
  • the local VO can achieve relatively accurate user tracking.
  • once the true scale of the local VO is restored, continuous and accurate user tracking can be achieved during the navigation process.
  • a visual positioning pose determination device is also provided, as shown in Figure 4, including:
  • the acquisition module 402 is used to acquire multiple images captured by the terminal during the movement of the terminal equipped with a camera;
  • the selection module 404 is used to select multiple target images from multiple images based on disparity
  • the upload module 406 is used to upload multiple target images to the cloud to obtain the constrained pose of the terminal;
  • the determination module 408 is used to determine the target pose of the terminal based on the constrained pose and the local pose of the terminal.
  • the pose may be the movement trajectory and position of the terminal.
  • the purpose of this embodiment is to determine an accurate target pose of the terminal, that is, its accurate movement trajectory and position, so that the result can be applied when navigating and positioning the terminal.
  • the above terminal can be equipped with a camera, which can include a front camera, a rear camera or an external camera.
  • the camera can be a single camera or a camera array composed of multiple cameras.
  • the above terminal can be carried and moved. For example, if a user moves within a certain area carrying the terminal, the terminal can take photos through the camera and obtain multiple images. Note that the camera captures images of the area where the user is located; if the terminal is placed in a clothes pocket and the camera is blocked by the fabric, the multiple images mentioned above cannot be obtained.
  • multiple target images can be selected based on disparity.
  • the cloud can determine the constrained pose of the terminal based on the target images.
  • the constrained pose is a pose used to constrain the local pose of the terminal.
  • the constrained pose is sent to the terminal.
  • the terminal determines the accurate target pose of the terminal based on the constrained pose and the local pose. After the target pose is determined, the target pose can be displayed on the terminal for navigation or positioning.
  • because the constrained pose is determined from target images selected by disparity among the multiple images, and the constrained pose is used to constrain the local pose, the accurate target pose of the terminal can be determined.
  • the above selection module includes: a first determination unit, used to determine multiple first images of the same object from the multiple images; and a second determination unit, used to take, among the multiple first images, the two images with the largest disparity as images among the multiple target images.
  • images of the same object can be gathered, the disparity between each pair of these images is calculated, and the pairs are sorted by disparity; after sorting, the two images with the largest disparity can be used as target images. If multiple objects are involved, two target images are determined for each object.
  • the above determination module includes: a third determination unit, used to determine the transformation matrix according to the constrained pose and the local pose; an acquisition unit, used to obtain the scale factor from the transformation matrix; and a fourth determination unit, configured to use the product of the local pose and the scale factor as the target pose.
  • the transformation matrix can be determined from the constrained pose and the local pose.
  • the scale factor is then obtained from the transformation matrix.
  • the scale factor is a factor used to adjust the local pose of the terminal.
  • the local pose of the terminal is multiplied by the scale factor to obtain the calculated pose.
  • the calculated pose is the pose adjusted by the scale factor, and this calculated pose is the accurate target pose.
  • the above third determination unit includes: a first input sub-unit, used to substitute the first values of the local pose and the second values of the constrained pose into formula 1 above to obtain the transformation matrix and residual.
  • when the first values of the local pose and the second values of the constrained pose are known, both are substituted into the formula. Since the local pose and the constrained pose are each a series of position information, the residual and the transformation matrix T can be calculated.
  • the above acquisition unit includes: a second input sub-unit, used to substitute the relative rotation and relative offset of the constrained pose and the local pose into formula 2 above to obtain the scale factor.
  • since the transformation matrix T has already been calculated, the scale factor s can be calculated.
  • the above upload module includes: a relocation unit, used to notify the cloud to relocate each of the multiple target images according to the navigation map and obtain the corresponding relocation positions; the cloud arranges the relocation positions in order to obtain the constrained pose.
  • the target images can be uploaded to the cloud.
  • a navigation map is saved in the cloud, and the navigation map is a map within a certain area.
  • the multiple target images can be matched against the navigation map to find images with high similarity.
  • through this comparison, the position of each target image can be determined in the navigation map.
  • arranging these positions in chronological order yields the pose, and the obtained pose is used as the constrained pose.
  • the cloud can obtain a panoramic video of the navigation area and multiple captured images of the navigation area; generate a point cloud map based on the panoramic video and the captured images; and combine the point cloud map with a plane map to obtain the navigation map.
  • the navigation map in this embodiment needs to be obtained in advance.
  • a panoramic video can be shot in the navigation area, together with multiple still images.
  • the panoramic video and the captured images can be used to generate a point cloud map.
  • the point cloud map is combined with the flat map of the navigation area to obtain the above navigation map.
  • the cloud can extract target frames from the panoramic video; determine the first poses of the target frames; run Structure from Motion (SfM) on the frames and poses to obtain a sparse point cloud; and densify the sparse point cloud to obtain the point cloud map.
  • target frames can be extracted from the panoramic video; each target frame is an image.
  • the sparse point cloud is densified to obtain the point cloud map.
  • Figure 5 is a structural block diagram of an optional electronic device according to an embodiment of the present application. As shown in Figure 5, it includes a processor 502, a communication interface 504, a memory 506 and a communication bus 508; the processor 502, the communication interface 504 and the memory 506 communicate with one another through the communication bus 508, where:
  • the memory 506 is used to store the computer program;
  • the processor 502 is used to implement the following steps when executing the computer program stored in the memory 506:
  • during movement of a terminal equipped with a camera, acquire multiple images captured by the terminal; select multiple target images from the multiple images based on disparity; upload the multiple target images to the cloud to obtain the constrained pose of the terminal; and determine the target pose of the terminal based on the constrained pose and the local pose of the terminal.
  • the above-mentioned communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • PCI Peripheral Component Interconnect
  • EISA Extended Industry Standard Architecture
  • the communication bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one thick line is used in Figure 5, but it does not mean that there is only one bus or one type of bus.
  • the communication interface is used for communication between the above-mentioned electronic devices and other devices.
  • the memory may include RAM or non-volatile memory, such as at least one disk memory.
  • the memory may also be at least one storage device located remotely from the aforementioned processor.
  • the above memory 506 may include, but is not limited to, the acquisition module 402, the selection module 404, the upload module 406 and the determination module 408 in the above visual positioning pose determination device. In addition, it may also include but is not limited to other module units in the above-mentioned visual positioning posture determination device, which will not be described again in this example.
  • the above processor may be a general-purpose processor, which may include but is not limited to: a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • CPU Central Processing Unit
  • NP Network Processor
  • DSP Digital Signal Processor
  • ASIC Application-Specific Integrated Circuit
  • FPGA Field-Programmable Gate Array
  • the device that implements the visual positioning pose determination method can be a terminal device, and the terminal device can be a smartphone (such as an Android phone or an iOS phone), a tablet computer, a handheld computer, a Mobile Internet Device (MID), a PAD, or another terminal device.
  • Figure 5 does not limit the structure of the above electronic device.
  • the electronic device may also include more or fewer components (such as network interfaces, display devices, etc.) than shown in FIG. 5 , or have a different configuration than that shown in FIG. 5 .
  • the program can be stored in a computer-readable storage medium, and the storage medium can include: a flash disk, ROM, RAM, a magnetic disk, an optical disk, etc.
  • a computer-readable storage medium stores a computer program, wherein the computer program, when run by the processor, executes the steps in the above visual positioning pose determination method.
  • the storage media can include: flash disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk, etc.
  • if the integrated units in the above embodiments are implemented in the form of software functional units and sold or used as independent products, they can be stored in the above computer-readable storage medium.
  • the technical solution of the embodiments of the present invention, in essence or in the part contributing to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions to cause one or more computer devices (which can be personal computers, servers, network devices, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention.
  • the disclosed client can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division.
  • multiple units or components may be combined or integrated into another system, or some features can be ignored or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the units or modules may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in the embodiment of the present invention can be integrated into one processing unit, or each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Image Processing (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

A visual-localization-based pose determination method, which comprises: during a movement process of a terminal on which a camera is mounted, acquiring a plurality of images captured by the terminal (S102); selecting a plurality of target images from among the plurality of images according to a disparity (S104); uploading the plurality of target images to a cloud to acquire a constraint pose of the terminal (S106); and determining a target pose of the terminal according to the constraint pose and a local pose of the terminal (S108). Further disclosed are a visual-localization-based pose determination apparatus and an electronic device.

Description

Visual positioning pose determination method, apparatus and electronic device

Cross-reference to related applications

This application is based on, and claims priority to, the Chinese patent application CN202210751878.3, entitled "Visual positioning pose determination method, apparatus and electronic device" and filed on June 28, 2022, the entire disclosure of which is incorporated herein by reference.

Technical field

Embodiments of the present invention relate to the field of navigation, and specifically to a visual positioning pose determination method and apparatus, and an electronic device.

Background

In the existing technology, it is usually necessary to determine the pose of a terminal. For example, during navigation, the device requesting navigation needs to be positioned. One method that can be used is visual positioning; however, the pose obtained by visual positioning is prone to inaccuracy.
Summary of the invention

Embodiments of the present invention provide a visual positioning pose determination method and apparatus, and an electronic device, to at least solve the technical problem of inaccurately positioned poses.

According to one aspect of an embodiment of the present invention, a method for determining a pose in visual positioning is provided, including: during movement of a terminal equipped with a camera, acquiring multiple images captured by the terminal; selecting multiple target images from the multiple images based on disparity; uploading the multiple target images to the cloud to obtain a constrained pose of the terminal; and determining a target pose of the terminal based on the constrained pose and a local pose of the terminal.

According to another aspect of an embodiment of the present invention, an apparatus for determining a pose in visual positioning is provided, including: an acquisition module configured to acquire multiple images captured by a terminal equipped with a camera while the terminal is moving; a selection module configured to select multiple target images from the multiple images based on disparity; an upload module configured to upload the multiple target images to the cloud to obtain a constrained pose of the terminal; and a determination module configured to determine a target pose of the terminal based on the constrained pose and a local pose of the terminal.

According to yet another aspect of an embodiment of the present invention, a storage medium is also provided, in which a computer program is stored, wherein the computer program, when run by a processor, executes the above visual positioning pose determination method.

According to yet another aspect of an embodiment of the present invention, an electronic device is also provided, including a memory and a processor; a computer program is stored in the memory, and the processor is configured to execute the above visual positioning pose determination method by means of the computer program.
Brief description of the drawings

The drawings described here are provided for a further understanding of the embodiments of the present invention and constitute a part of the present application. The schematic embodiments and their descriptions are used to explain the embodiments of the present invention and do not unduly limit them. In the drawings:

Figure 1 is a flow chart of an optional visual positioning pose determination method according to an embodiment of the present invention;

Figure 2 is a flow chart of the local VO scale recovery algorithm of an optional visual positioning pose determination method according to an embodiment of the present invention;

Figure 3 is a block diagram of a navigation system based on local VO scale recovery, according to an optional visual positioning pose determination method of an embodiment of the present invention;

Figure 4 is a schematic structural diagram of an optional visual positioning pose determination apparatus according to an embodiment of the present invention;

Figure 5 is a schematic diagram of an optional electronic device according to an embodiment of the present invention.
Detailed description

In order to enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below in conjunction with the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the embodiments of the present invention.

It should be noted that the terms "first", "second", etc. in the description, claims and drawings of the embodiments of the present invention are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It is to be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be practiced in sequences other than those illustrated or described herein. In addition, the terms "including" and "having", and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not expressly listed or inherent to the process, method, product or apparatus.
According to a first aspect of an embodiment of the present invention, a method for determining a pose in visual positioning is provided. Optionally, as shown in Figure 1, the method includes:

S102: during movement of a terminal equipped with a camera, acquire multiple images captured by the terminal;

S104: select multiple target images from the multiple images based on disparity;

S106: upload the multiple target images to the cloud to obtain a constrained pose of the terminal;

S108: determine a target pose of the terminal based on the constrained pose and the local pose of the terminal.
Optionally, in this embodiment, the pose may be the movement trajectory and position of the terminal. The purpose of this embodiment is to determine an accurate target pose of the terminal, that is, its accurate movement trajectory and position, so that the result can be applied when navigating and positioning the terminal.

The above terminal can be equipped with a camera, which can include a front camera, a rear camera or an external camera; the camera can be a single camera or a camera array composed of multiple cameras. The terminal can be carried and moved. For example, if a user moves within a certain area carrying the terminal, the terminal can take photos through the camera and obtain multiple images. Note that the camera captures images of the area where the user is located; if the terminal is placed in a clothes pocket and the camera is blocked by the fabric, the multiple images mentioned above cannot be obtained.

After the multiple images are acquired, multiple target images can be selected based on disparity. After the multiple target images are uploaded to the cloud, the cloud can determine the constrained pose of the terminal based on them. The constrained pose is used to constrain the local pose of the terminal; it is sent to the terminal, and the terminal then determines the accurate target pose from the constrained pose and the local pose. After the target pose is determined, it can be displayed on the terminal for navigation or positioning.

In the embodiment of the present invention, during movement of a terminal equipped with a camera, multiple images captured by the terminal are acquired; multiple target images are selected from them based on disparity; the target images are uploaded to the cloud to obtain the constrained pose of the terminal; and the target pose of the terminal is determined based on the constrained pose and the local pose. Because the constrained pose is determined from target images selected by disparity and is used to constrain the local pose, the accurate target pose of the terminal can be determined. This improves the accuracy of the determined pose and thereby solves the technical problem of inaccurate positioning.
As an optional example, selecting multiple target images from the multiple images based on disparity includes: determining multiple first images of the same object from the multiple images; and using, among the multiple first images, the two images with the largest disparity as images among the multiple target images.

In this embodiment, when selecting multiple target images based on disparity, images of the same object can be gathered, the disparity between each pair of these images is calculated, and the pairs are sorted by disparity; after sorting, the two images with the largest disparity can be used as target images. If multiple objects are involved, two target images are determined for each object.

As an optional example, determining the target pose of the terminal based on the constrained pose and the local pose of the terminal includes: determining a transformation matrix based on the constrained pose and the local pose; obtaining a scale factor from the transformation matrix; and using the product of the local pose and the scale factor as the target pose.
In this embodiment, the transformation matrix can be determined from the constrained pose and the local pose, and the scale factor is then obtained from the transformation matrix. The scale factor is used to adjust the local pose of the terminal: the local pose is multiplied by the scale factor to obtain the calculated pose, which is the pose adjusted by the scale factor, and this calculated pose is the accurate target pose.

As an optional example, determining the transformation matrix based on the constrained pose and the local pose includes: substituting the first values of the local pose and the second values of the constrained pose into formula 1 above to obtain the transformation matrix and the residual.

Optionally, in this embodiment, when the first values of the local pose and the second values of the constrained pose are known, both are substituted into the above formula. Since the local pose and the constrained pose are each a series of position information, the residual and the transformation matrix T can be calculated.

As an optional example, obtaining the scale factor from the transformation matrix includes: substituting the relative rotation and relative offset between the constrained pose and the local pose into formula 2 above to obtain the scale factor.

Optionally, in this embodiment, since the transformation matrix T has already been calculated, and r and t are both known quantities, the scale factor s can be calculated.
As an optional example, uploading the multiple target images to the cloud to obtain the constrained pose of the terminal includes: the cloud repositioning each of the multiple target images according to the navigation map to obtain the relocation position corresponding to each target image; and the cloud arranging the relocation positions in order to obtain the constrained pose.

Optionally, in this embodiment, after the multiple target images are determined, they can be uploaded to the cloud. A navigation map, which is a map of a certain area, is saved in the cloud. The target images are matched against the navigation map to find images with high similarity; through this comparison, the position of each target image can be determined in the navigation map. Arranging these positions in chronological order yields the pose, and the obtained pose is used as the constrained pose.

As an optional example, the above method also includes: the cloud obtaining a panoramic video of the navigation area and multiple captured images of the navigation area; generating a point cloud map based on the panoramic video and the captured images; and combining the point cloud map with a plane map to obtain the navigation map.
Optionally, the navigation map in this embodiment needs to be obtained in advance. A panoramic video can be shot in the navigation area, together with multiple still images; the panoramic video and the captured images can be used to generate a point cloud map. The point cloud map is then combined with the flat map of the navigation area to obtain the above navigation map.

As an optional example, generating the point cloud map based on the panoramic video and the captured images includes: extracting target frames from the panoramic video; determining the first poses of the target frames; running Structure from Motion (SfM) on the frames and their first poses to obtain a sparse point cloud; and densifying the sparse point cloud to obtain the point cloud map.

Optionally, in this embodiment, after the panoramic video and the captured images are acquired, target frames can be extracted from the panoramic video, each target frame being an image. The positions of the extracted target frames are determined as the first poses, SfM is run on them to obtain a sparse point cloud, and the sparse point cloud is densified to obtain the point cloud map.
This embodiment proposes a monocular visual odometry (VO) scale recovery algorithm combined with cloud relocation (visual positioning), so that local (terminal-side) VO can be applied effectively to user tracking during navigation. The key to the algorithm is as follows: when navigation starts, VO runs on the terminal, and at certain intervals several keyframes are selected and sent to the cloud; the cloud relocates these keyframes to obtain their corresponding constrained poses and returns the constrained poses to the terminal; the terminal adds the returned constrained poses as constraints to the computation of the local pose, solves for a transformation matrix, and decomposes a scale factor from that matrix, from which the true scale of the local pose can be recovered. At the same time, because the pose constraints returned by the cloud are added to the local pose computation, the reliability of the local pose computed by VO is improved. With the scale recovery algorithm of this embodiment, the local pose can track the user accurately and efficiently over a long period, effectively improving the efficiency and accuracy of user tracking during navigation; the applicable scenarios are also broader, with no distinction between indoor and outdoor scenes.
Figure 2 is a basic flowchart of the local VO scale recovery algorithm. The basic flow of the algorithm includes: keyframe screening and upload, keyframe relocation, solving the similarity transformation, and VO scale recovery. Keyframes are the keyframes generated while the terminal runs VO. First, the keyframes most recently solved by VO are screened on the terminal, and the several with the largest parallax are selected and uploaded to the cloud for relocation; the terminal here is an ordinary smartphone, with no special model requirement. Cloud relocation means that the cloud visually localizes the received keyframes and returns the localization results to the terminal. After the terminal obtains the poses of the uploaded keyframes, it adds these poses as constraints to the computation of the local pose, and a similarity transformation can finally be solved. A scale factor is decomposed from the similarity transformation, and from this scale factor the true scale of the local pose can be recovered. An illustrative screening sketch follows below.
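For illustration, the screening step might be sketched as follows, assuming each keyframe carries a dictionary mapping tracked feature-point ids to 2D positions (a hypothetical data layout; the patent does not prescribe one):

    import numpy as np

    def screen_keyframes(keyframes, k):
        """Pick the k keyframes with the largest average parallax relative
        to the newest keyframe; these are uploaded for cloud relocation."""
        newest = keyframes[-1]

        def avg_parallax(frame):
            shared = frame.features.keys() & newest.features.keys()
            if not shared:
                return 0.0
            return float(np.mean([np.linalg.norm(frame.features[i] - newest.features[i])
                                  for i in shared]))

        return sorted(keyframes[:-1], key=avg_parallax, reverse=True)[:k]

Feature positions are assumed to be numpy arrays so that their displacement can be measured directly.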
Figure 3 is an overall block diagram of the navigation system based on local VO scale recovery. The system includes a cloud and a terminal. As shown in Figure 3, the cloud provides navigation map generation and navigation services, while the terminal provides functional interfaces to the user. Navigation map generation includes collecting raw mapping data, generating a point cloud map, and generating a navigation map from the point cloud map and a planar map. The navigation services provided by the cloud include recognition and positioning, path planning, and real-time navigation. The functions the terminal provides to the user include starting navigation, initial positioning, destination selection, and real-time navigation.

Cloud-side offline navigation map generation produces the high-precision map used for navigation. Within the navigation (map) area, a panoramic camera is used to shoot panoramic video; there is no specific requirement on the panoramic camera model. In addition, pictures of some scenes in the map, taken with an ordinary monocular camera, are needed, and the precise positions of these pictures are obtained through real-time kinematic (RTK) positioning; there is no special requirement on the camera model either. A point cloud map is obtained from the raw data using a panoramic 3D reconstruction algorithm, and the final navigation map is obtained from the point cloud map and the planar map. The navigation map is stored in the cloud, and the map of the corresponding area is loaded each time navigation starts; the entire navigation service takes place within the map area. The cloud navigation service mainly handles the initial positioning and the relocation tasks during user tracking, while the real-time navigation process uses the local VO scale recovery proposed in this embodiment of the present invention. The cloud is deployed on high-performance servers, and the network must remain unobstructed; after navigation starts, the terminal continuously interacts with the cloud navigation service to achieve real-time navigation.
Process 1: in this embodiment, the specific steps for generating the navigation map offline are as follows.

Step 1: Use a panoramic camera to film the navigation service (map) area to obtain a panoramic video. There are no special requirements on the brand or model of the panoramic camera used for capture. Note in particular that the shooting must contain at least one "loop closure", that is, circling back to the "origin" after shooting for some distance, where the "origin" is not specifically the starting point of the scan but any part of the scene already covered during scanning; the shooting route resembles the five Olympic rings. In other words, the collected panoramic video contains images of the same objects from different angles.

Step 2: Take several pictures of parts of the scene, and use RTK to obtain the precise position coordinates at which each picture is taken. There are no special requirements on the brand or model of the picture-taking device (common mobile phones, SLR cameras, and similar devices are all acceptable), nor on the brand or model of the RTK equipment. Two points deserve special note: (I) these pictures will be used for true-scale recovery, so the shooting positions must not lie on a single straight line and should be distributed over the whole area as far as possible; (II) scenes typical of the area should be photographed where possible.

Step 3: Use a 3D reconstruction algorithm to reconstruct the data collected in steps 1 and 2 in three dimensions and generate a point cloud map. The basic flow of the 3D reconstruction algorithm is: extract frames from the panoramic video; run visual Simultaneous Localization and Mapping (SLAM) to obtain panoramic keyframes and their poses; slice the panoramas to generate monocular pictures and their corresponding poses; run Structure from Motion (SfM) on the monocular pictures and poses to generate a sparse point cloud; and densify it into a dense point cloud. A sketch of the panorama-slicing step appears after step 4.
Step 4: Combine the planar map with the point cloud map generated in step 3 to produce the navigation map used during navigation, and save it to the cloud.
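One step of this offline pipeline can be shown concretely: slicing a panorama into monocular pinhole views for SfM. The projection below is a minimal nearest-neighbour sketch under assumed conventions (equirectangular input, yaw-only view selection); the embodiment does not specify these details:

    import numpy as np

    def pano_to_pinhole(pano, yaw_deg, fov_deg=90.0, size=512):
        """Sample one pinhole (monocular) view from an equirectangular
        panorama `pano` of shape H x W x 3."""
        h, w = pano.shape[:2]
        f = 0.5 * size / np.tan(np.radians(fov_deg) / 2)
        u, v = np.meshgrid(np.arange(size) - size / 2, np.arange(size) - size / 2)
        rays = np.stack([u, v, np.full_like(u, f, dtype=float)], axis=-1)
        rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
        yaw = np.radians(yaw_deg)                 # rotation about the vertical axis
        rot = np.array([[np.cos(yaw), 0.0, np.sin(yaw)],
                        [0.0, 1.0, 0.0],
                        [-np.sin(yaw), 0.0, np.cos(yaw)]])
        rays = rays @ rot.T
        lon = np.arctan2(rays[..., 0], rays[..., 2])          # [-pi, pi]
        lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))     # [-pi/2, pi/2]
        x = ((lon / np.pi + 1.0) / 2.0 * (w - 1)).astype(int)
        y = ((lat / (np.pi / 2) + 1.0) / 2.0 * (h - 1)).astype(int)
        return pano[y, x]

Calling this for several yaw angles (for example, every 60 degrees) turns each panoramic keyframe into a set of monocular pictures that ordinary SfM tooling can consume.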
Process 2: the navigation flow is as follows.

Step 1: The terminal starts the augmented reality (AR) navigation service, and the cloud loads the navigation map generated in Process 1. The terminal is a smartphone with an unobstructed network connection; there is no special requirement on the brand.

Step 2: The terminal starts initial positioning, uses its camera to take a picture of the current environment, and uploads it to the cloud.

Step 3: The cloud runs initial positioning on the current environment picture and, after obtaining the pose of the picture, returns it to the terminal as the user's initial position.

Step 4: After obtaining the initial position, the terminal selects a navigation destination and uploads it to the cloud.

Step 5: The cloud plans a navigation path based on the starting point and the destination of the navigation, and movement direction indicators are rendered on the terminal screen.
步骤7:当终端V0达到一定时间,启动本地V0尺度恢复算法,算法流程如附图1所示,具体步骤如下Step 7: When the terminal V0 reaches a certain time, start the local V0 scale recovery algorithm. The algorithm process is shown in Figure 1. The specific steps are as follows
步骤7.1:终端从V0最近得到的部分个关键帧中,选择平均视差最大的若干关键帧上传到云端;Step 7.1: The terminal selects several key frames with the largest average disparity from some of the recent key frames obtained by V0 and uploads them to the cloud;
步骤7.2:云端对上传的关键帧进行重定位,返回对应的位姿;Step 7.2: The cloud repositions the uploaded key frames and returns the corresponding pose;
步骤7.3:终端将云端重定位关键帧位姿作为先验加入到V0的计算中,将残差定义为先验位姿减去相似变换矩阵乘以本地位姿。通过上述公式1求变换矩阵T,通过上述公式2求尺度因子s。Step 7.3: The terminal adds the cloud relocation key frame pose as a priori to the calculation of V0, and defines the residual as the prior pose minus the similarity transformation matrix multiplied by the local pose. The transformation matrix T is obtained through the above formula 1, and the scale factor s is obtained through the above formula 2.
步骤8:将恢复出真实尺度的本地V0求解出的位姿作为用户当前位姿,实现融合本地V0和云端重定位的多模态用户跟踪。同时终端不停使用当前位置判断是都到达目的地。Step 8: Use the pose solved by recovering the local V0 of the true scale as the user's current pose to achieve multi-modal user tracking that integrates local V0 and cloud relocation. At the same time, the terminal continuously uses the current location to determine whether the destination has been reached.
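The embodiment itself only prescribes formulas 1 and 2. Purely as a hedged illustration, one standard way to solve this kind of pose alignment is the Umeyama least-squares method; the sketch below aligns local VO keyframe positions to the cloud-returned positions and is an assumption about the solver, not a statement of the patented method:

    import numpy as np

    def solve_similarity(local_pts, cloud_pts):
        """Least-squares similarity (s, R, t) such that
        cloud_pts = s * R @ local_pts + t in the least-squares sense
        (Umeyama, 1991). Inputs are N x 3 arrays of keyframe positions."""
        n = len(local_pts)
        mu_l, mu_c = local_pts.mean(axis=0), cloud_pts.mean(axis=0)
        L, C = local_pts - mu_l, cloud_pts - mu_c
        U, D, Vt = np.linalg.svd(C.T @ L / n)     # cross-covariance SVD
        S = np.eye(3)
        if np.linalg.det(U) * np.linalg.det(Vt) < 0:
            S[2, 2] = -1.0                        # enforce a proper rotation
        R = U @ S @ Vt
        s = np.trace(np.diag(D) @ S) * n / (L ** 2).sum()
        t = mu_c - s * R @ mu_l
        return s, R, t

With T assembled from (s, R, t) in the block form of formula 2, the formula-1 residual over the uploaded keyframes should then be small, and s is the scale factor applied in step 8.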
For indoor-scene AR navigation, the mapping algorithm and the recognition algorithm are deployed in the cloud at implementation time. The specific implementation flow is as follows.

Step 1: Collect the raw video data as in Process 1, step 1. Indoor panoramic video is generally shot with a hand-held panoramic camera in the manner of Process 1, step 1; the panoramic camera can also be mounted on other equipment, such as a helmet.

Step 2: Collect the picture data as in Process 1, step 2. Picture data is generally collected by taking photos with a mobile phone, although other devices capable of taking photos are also acceptable. Fairly typical scenes, such as shop signs, are usually photographed, since such scenes are more likely to be the starting point or end point of a navigation.

Step 3: Deploy the in-house mapping algorithm, and then, as described in Process 1, step 3, use the algorithm to perform 3D reconstruction on the raw data collected in steps 1 and 2 above to generate a point cloud map.

Step 4: Generate the navigation map as in Process 1, step 4; the point cloud map is the one generated in step 3, and in the indoor case the planar map is the CAD drawing of the building.

Step 5: Start navigation as in Process 2, steps 1 and 2. When navigation starts, the cloud loads the map information and begins accepting messages from the terminal, and the terminal uploads a picture of the current environment.

Step 6: Generate the navigation path as in Process 2, steps 3, 4, and 5. The cloud performs initial positioning based on the uploaded picture and returns the result to the terminal; the terminal selects a navigation destination and uploads it to the cloud; and the cloud generates a navigation path from the current position and the destination and renders it on the terminal screen.

Step 7: Following Process 2, steps 6 to 8, the terminal starts VO; each time VO tracking has run for 20 seconds, the scale recovery algorithm is started. When performing local scale recovery, the 3 keyframes with the largest average parallax are selected from the 10 keyframes most recently obtained by VO and uploaded to the cloud for relocation; the relocated poses are used as priors, the similarity transformation between the local VO poses and the prior poses is solved, the scale factor is recovered from this transformation matrix, and finally the scale of the local VO is recovered, realizing real-time multi-modal navigation. Local VO can track the user fairly accurately; combined with the scale recovery algorithm of this embodiment of the present invention to recover the true scale of the local VO, continuous and accurate user tracking during navigation can be achieved. An illustrative tracking loop is sketched after this flow.

By implementing the above flow, complete indoor AR navigation can be achieved.
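Steps 6 to 8 can be tied together with glue code; everything below is hypothetical scaffolding (the vo and cloud objects and their methods stand in for the terminal VO module and the cloud service), with only the trigger values (a 20-second period, 10 candidate keyframes, the top 3 by parallax) taken from the embodiment:

    import time

    def user_tracking_loop(vo, cloud):
        """Fuse local VO with periodic cloud relocation for user tracking."""
        scale, last_fix = 1.0, time.time()
        while not vo.destination_reached():
            vo.track_frame()                        # local VO user tracking
            if time.time() - last_fix >= 20.0:      # scale recovery trigger
                frames = screen_keyframes(vo.recent_keyframes(10), k=3)
                priors = cloud.relocalize(frames)   # constrained poses
                scale, R, t = solve_similarity(vo.positions_of(frames), priors)
                last_fix = time.time()
            current_pose = scale * vo.local_pose()  # true-scale user pose

screen_keyframes and solve_similarity are the sketches given earlier; in a real system the loop would also render the updated pose and handle relocation failures.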
For outdoor-scene AR navigation, the mapping algorithm and the recognition algorithm are likewise deployed in the cloud at implementation time. The specific implementation flow is as follows.

Step 1: Collect the raw video data as described in Process 1, step 1. Outdoor panoramic video is generally shot with a hand-held panoramic camera in the manner of Process 1, step 1. If the scene is large, other methods, such as a drone carrying a panoramic camera, can also be used, but the shooting route must still follow the description in Process 1, step 1.

Step 2: Collect the picture data as in Process 1, step 2. Picture data is generally collected by taking photos with a mobile phone, although other devices capable of taking photos are also acceptable. Fairly typical scenes, such as road signs and building gates, are usually photographed, since such scenes are more likely to be the starting point or end point of a navigation.

Step 3: Deploy the in-house mapping algorithm, and then, as described in Process 1, step 3, use the algorithm to perform 3D reconstruction on the raw data collected in steps 1 and 2 above to generate a point cloud map.

Step 4: Generate the navigation map as in Process 1, step 4; the point cloud map is the one generated in step 3, and in the outdoor case the planar map can be a planar CAD drawing together with road network information.

Step 5: Start navigation as in Process 2, steps 1 and 2. When navigation starts, the cloud loads the map information and begins accepting messages from the terminal, and the terminal uploads a picture of the current environment.

Step 6: Generate the navigation path as in Process 2, steps 3, 4, and 5. The cloud performs initial positioning based on the uploaded picture and returns the result to the terminal; the terminal selects a navigation destination and uploads it to the cloud; and the cloud generates a navigation path from the current position and the destination and renders it on the terminal screen.

Step 7: Following Process 2, steps 6 to 8, the terminal starts VO; each time VO tracking has run for 20 seconds, the scale recovery algorithm is started. When performing local scale recovery, the 3 keyframes with the largest average parallax are selected from the 10 keyframes most recently obtained by VO and uploaded to the cloud for relocation; the relocated poses are used as priors, the similarity transformation between the local VO poses and the prior poses is solved, the scale factor is recovered from this transformation matrix, and finally the scale of the local VO is recovered, realizing real-time multi-modal navigation. Local VO can track the user fairly accurately; combined with the scale recovery algorithm of this embodiment of the present invention to recover the true scale of the local VO, continuous and accurate user tracking during navigation can be achieved.

By implementing the above flow, complete outdoor AR navigation can be achieved.
It should be noted that, for the sake of brevity, the foregoing method embodiments are all expressed as a series of action combinations; however, those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention, some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the embodiments of the present invention.
According to another aspect of the embodiments of the present application, a visual-positioning pose determination apparatus is also provided. As shown in Figure 4, it includes:

an acquisition module 402, configured to acquire multiple images captured by a terminal equipped with a camera while the terminal is moving;

a selection module 404, configured to select multiple target images from the multiple images according to parallax;

an upload module 406, configured to upload the multiple target images to the cloud to obtain a constrained pose of the terminal;

a determination module 408, configured to determine a target pose of the terminal according to the constrained pose and a local pose of the terminal.

Optionally, in this embodiment, a pose may be the movement trajectory and position of the terminal. The purpose of this embodiment is to determine an accurate target pose of the terminal, that is, the terminal's accurate movement trajectory and position, so that it can be applied in navigating and positioning the terminal.

The above terminal can be equipped with a camera, which may include a front camera, a rear camera, or an external camera, and may be a single camera or a camera array composed of multiple cameras. The terminal can be carried while moving. For example, if a user carries the terminal while moving within a certain area, the terminal can take photos through the camera and acquire multiple images. It should be noted that the terminal's camera captures images of the area where the user is located; if the terminal is placed in a clothing pocket and the camera is blocked by the fabric, the multiple images cannot be acquired.

After the multiple images are acquired, multiple target images can be selected according to parallax. After the multiple target images are uploaded to the cloud, the cloud can determine the constrained pose of the terminal from the target images; the constrained pose is a pose used to constrain the terminal's local pose. The constrained pose is sent to the terminal, and the terminal then determines its accurate target pose according to the constrained pose and the local pose. After the target pose is determined, it can be displayed on the terminal for navigation or positioning.

Since, in the above method, multiple images are captured while the camera-equipped terminal moves, the constrained pose is determined from target images selected from those images according to parallax, and the constrained pose is used to constrain the local pose, an accurate target pose of the terminal can be determined, thereby achieving the purpose of improving the accuracy of the determined pose.
As an optional example, the selection module includes: a first determination unit, configured to determine multiple first images of a same object from the multiple images; and a second determination unit, configured to take the two images with the largest parallax among the multiple first images as images among the multiple target images.

In this embodiment, when the multiple target images are selected from the multiple images according to parallax, images of the same object can be obtained; the parallax between every two of those images is then computed, and the images are sorted by parallax, after which the two images with the largest parallax can be taken as target images. If multiple objects are involved, two target images are determined for each object. One plausible way to measure this pairwise parallax is sketched below.
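As a hedged sketch only (the patent does not name a feature method; ORB matching via OpenCV is used here as one plausible choice), the pairwise parallax could be measured like this:

    import cv2
    import numpy as np

    def pair_disparity(img_a, img_b):
        """Mean displacement of matched ORB features between two images of
        the same object, used as a proxy for their parallax."""
        orb = cv2.ORB_create()
        kps_a, des_a = orb.detectAndCompute(img_a, None)
        kps_b, des_b = orb.detectAndCompute(img_b, None)
        if des_a is None or des_b is None:
            return 0.0
        matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des_a, des_b)
        if not matches:
            return 0.0
        shifts = [np.linalg.norm(np.subtract(kps_a[m.queryIdx].pt, kps_b[m.trainIdx].pt))
                  for m in matches]
        return float(np.mean(shifts))

The pair with the largest value, for example max(itertools.combinations(first_images, 2), key=lambda p: pair_disparity(*p)), would then be taken as the two target images for that object.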
As an optional example, the determination module includes: a third determination unit, configured to determine a transformation matrix according to the constrained pose and the local pose; an acquisition unit, configured to obtain a scale factor from the transformation matrix; and a fourth determination unit, configured to take the product of the local pose and the scale factor as the target pose.

In this embodiment, the transformation matrix can be determined according to the constrained pose and the local pose, and after it is determined, the scale factor is obtained from it. The scale factor is a factor used to adjust the terminal's local pose: the local pose is multiplied by the scale factor to obtain an adjusted pose, and this adjusted pose is the accurate target pose.
As an optional example, the third determination unit includes: a first input subunit, configured to substitute the first value of the local pose and the second value of the constrained pose into formula 1 above to obtain the transformation matrix and the residual.

Optionally, in this embodiment, when the first value of the local pose and the second value of the constrained pose are known, the two are substituted into the above formula. Since the local pose and the constrained pose are each a sequence of position information, the above residual and the above transformation matrix T can be computed; one way to evaluate such a residual is sketched below.
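Based on the block form of formula 1 as reconstructed in claim 4 below, the residual evaluation can be sketched as follows; the function names are placeholders:

    import numpy as np

    def homogeneous(R, t):
        # Pack a rotation R (3x3) and translation t (3,) into a 4x4 pose.
        P = np.eye(4)
        P[:3, :3], P[:3, 3] = R, t
        return P

    def formula1_residual(R_bar, t_bar, R, t, T):
        """residual = [R̄ t̄; 0 1] - T @ [R t; 0 1]: the constrained pose minus
        the similarity transform applied to the local pose; a solver adjusts
        T to make this small over all keyframes."""
        return homogeneous(R_bar, t_bar) - T @ homogeneous(R, t)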
As an optional example, the acquisition unit includes: a second input subunit, configured to substitute the relative rotation and relative offset between the constrained pose and the local pose into formula 2 above to obtain the scale factor.

Optionally, in this embodiment, since the transformation matrix T has already been computed, and r and t are both known quantities, the scale factor s can be calculated.
As an optional example, the upload module includes: a relocation unit, configured to notify the cloud to relocate each of the multiple target images according to the navigation map, so as to obtain a relocated position corresponding to each target image; the cloud arranges the relocated positions in chronological order to obtain the constrained pose.

Optionally, in this embodiment, after the multiple target images are determined, they can be uploaded to the cloud. The cloud stores a navigation map, which is a map of a certain area. Images highly similar to the multiple target images can be identified in the navigation map, and after this comparison the position of each of the multiple target images can be determined in the navigation map. Arranging the positions in chronological order yields a pose, and this pose is used as the constrained pose.

As an optional example, the cloud can obtain a panoramic video of the navigation area and multiple captured images of the navigation area; generate a point cloud map according to the panoramic video and the captured images; and combine the point cloud map with a planar map to obtain the navigation map.

Optionally, the navigation map in this embodiment needs to be obtained in advance. A panoramic video can be shot in the navigation area and multiple images captured there; the panoramic video and the captured images can be used to generate a point cloud map, which is then combined with a planar map of the navigation area to obtain the navigation map.

As an optional example, the cloud can extract target frames from the panoramic video; determine first poses of the target frames; run Structure from Motion (SfM) on the first poses to generate a sparse point cloud; and densify the sparse point cloud to obtain the point cloud map.

Optionally, in this embodiment, after the panoramic video and the captured images are acquired, target frames can be extracted from the panoramic video, each target frame being one image. The positions of the extracted target frames are determined as the first poses; SfM is run on the first poses to generate a sparse point cloud, and the sparse point cloud is densified to obtain the point cloud map.
For other examples of this embodiment, refer to the examples above; they are not repeated here.
Figure 5 is a structural block diagram of an optional electronic device according to an embodiment of the present application. As shown in Figure 5, it includes a processor 502, a communication interface 504, a memory 506, and a communication bus 508, where the processor 502, the communication interface 504, and the memory 506 communicate with one another through the communication bus 508, in which:

the memory 506 is configured to store a computer program;

the processor 502 is configured to implement the following steps when executing the computer program stored in the memory 506:

during movement of a terminal equipped with a camera, acquiring multiple images captured by the terminal;

selecting multiple target images from the multiple images according to parallax;

uploading the multiple target images to the cloud to obtain a constrained pose of the terminal;

determining a target pose of the terminal according to the constrained pose and a local pose of the terminal.

Optionally, in this embodiment, the communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in Figure 5, but this does not mean there is only one bus or one type of bus. The communication interface is used for communication between the above electronic device and other devices.

The memory may include RAM, and may also include non-volatile memory, for example, at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

As an example, the memory 506 may include, but is not limited to, the acquisition module 402, the selection module 404, the upload module 406, and the determination module 408 of the above visual-positioning pose determination apparatus. In addition, it may also include, but is not limited to, other module units of the above apparatus, which are not repeated in this example.

The above processor may be a general-purpose processor, which may include, but is not limited to, a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments, which are not repeated here.

Those of ordinary skill in the art can understand that the structure shown in Figure 5 is only illustrative. The device implementing the above visual-positioning pose determination method may be a terminal device, and the terminal device may be a smartphone (such as an Android phone or an iOS phone), a tablet computer, a handheld computer, a mobile Internet device (MID), a PAD, or another terminal device. Figure 5 does not limit the structure of the above electronic device; for example, the electronic device may further include more or fewer components (such as a network interface or a display device) than shown in Figure 5, or have a configuration different from that shown in Figure 5.
Those of ordinary skill in the art can understand that all or some of the steps in the various methods of the above embodiments can be completed by instructing the hardware related to the terminal device through a program; the program may be stored in a computer-readable storage medium, and the storage medium may include a flash disk, a ROM, a RAM, a magnetic disk, an optical disc, or the like.

According to yet another aspect of the embodiments of the present invention, a computer-readable storage medium is also provided, in which a computer program is stored, where the computer program, when run by a processor, performs the steps of the above visual-positioning pose determination method.

Optionally, in this embodiment, those of ordinary skill in the art can understand that all or some of the steps in the various methods of the above embodiments can be completed by instructing the hardware related to the terminal device through a program; the program may be stored in a computer-readable storage medium, and the storage medium may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.

The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.

If the integrated units in the above embodiments are implemented in the form of software functional units and sold or used as independent products, they may be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or some of the steps of the methods described in the various embodiments of the present invention.

In the embodiments of the present invention, the description of each embodiment has its own emphasis; for parts not described in detail in a given embodiment, refer to the relevant descriptions of other embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed client may be implemented in other ways. The apparatus embodiments described above are only illustrative; for example, the division of the units is only a division by logical function, and there may be other division methods in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be in electrical or other forms.

The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated units may be implemented in the form of hardware or in the form of software functional units.

The above are only preferred implementations of the embodiments of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of the embodiments of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the embodiments of the present invention.

Claims (10)

  1. A pose determination method for visual positioning, comprising:
    during movement of a terminal equipped with a camera, acquiring multiple images captured by the terminal;
    selecting multiple target images from the multiple images according to parallax;
    uploading the multiple target images to the cloud to obtain a constrained pose of the terminal;
    determining a target pose of the terminal according to the constrained pose and a local pose of the terminal.
  2. The method according to claim 1, wherein selecting multiple target images from the multiple images according to parallax comprises:
    determining multiple first images of a same object from the multiple images;
    taking the two images with the largest parallax among the multiple first images as images among the multiple target images.
  3. The method according to claim 1, wherein determining the target pose of the terminal according to the constrained pose and the local pose of the terminal comprises:
    determining a transformation matrix according to the constrained pose and the local pose;
    obtaining a scale factor from the transformation matrix;
    taking the product of the local pose and the scale factor as the target pose.
  4. The method according to claim 3, wherein determining the transformation matrix according to the constrained pose and the local pose comprises:
    substituting a first value of the local pose and a second value of the constrained pose into the following formula to obtain the transformation matrix and a residual:

        residual = [ R̄  t̄ ; 0  1 ] - T · [ R  t ; 0  1 ]

    where residual is the residual, R̄ is the rotation of the constrained pose, t̄ is the translation of the constrained pose, R is the rotation of the local pose, t is the translation of the local pose, and T is the transformation matrix.
  5. The method according to claim 3, wherein obtaining the scale factor from the transformation matrix comprises:
    substituting the relative rotation and relative offset between the constrained pose and the local pose into the following formula to obtain the scale factor:

        T = [ s·r  t ; 0  1 ]

    where T is the transformation matrix, s is the scale factor, r is the relative rotation between the constrained pose and the local pose, and t is the relative offset between the constrained pose and the local pose.
  6. The method according to claim 1, wherein uploading the multiple target images to the cloud to obtain the constrained pose of the terminal comprises:
    the cloud relocating each of the multiple target images according to a navigation map to obtain a relocated position corresponding to each target image;
    the cloud arranging the relocated positions in chronological order to obtain the constrained pose.
  7. The method according to claim 6, wherein the method further comprises:
    the cloud obtaining a panoramic video of a navigation area and multiple captured images of the navigation area;
    generating a point cloud map according to the panoramic video and the captured images;
    combining the point cloud map with a planar map to obtain the navigation map.
  8. The method according to claim 7, wherein generating the point cloud map according to the panoramic video and the captured images comprises:
    extracting target frames from the panoramic video;
    determining first poses of the target frames;
    running Structure from Motion (SfM) on the first poses to generate a sparse point cloud;
    densifying the sparse point cloud to obtain the point cloud map.
  9. A pose determination apparatus for visual positioning, comprising:
    an acquisition module, configured to acquire multiple images captured by a terminal equipped with a camera during movement of the terminal;
    a selection module, configured to select multiple target images from the multiple images according to parallax;
    an upload module, configured to upload the multiple target images to the cloud to obtain a constrained pose of the terminal;
    a determination module, configured to determine a target pose of the terminal according to the constrained pose and a local pose of the terminal.
  10. An electronic device, comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to perform the method according to any one of claims 1 to 8 through the computer program.
PCT/CN2023/101166 2022-06-28 2023-06-19 Visual-localization-based pose determination method and apparatus, and electronic device WO2024001849A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210751878.3 2022-06-28
CN202210751878.3A CN117346650A (en) 2022-06-28 2022-06-28 Pose determination method and device for visual positioning and electronic equipment

Publications (1)

Publication Number Publication Date
WO2024001849A1 true WO2024001849A1 (en) 2024-01-04

Family

ID=89369772

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/101166 WO2024001849A1 (en) 2022-06-28 2023-06-19 Visual-localization-based pose determination method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN117346650A (en)
WO (1) WO2024001849A1 (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102018124211A1 (en) * 2017-10-06 2019-04-11 Nvidia Corporation Learning-based camera pose estimation of images of an environment
WO2022002039A1 (en) * 2020-06-30 2022-01-06 杭州海康机器人技术有限公司 Visual positioning method and device based on visual map
CN112270710A (en) * 2020-11-16 2021-01-26 Oppo广东移动通信有限公司 Pose determination method, pose determination device, storage medium, and electronic apparatus
CN112197770A (en) * 2020-12-02 2021-01-08 北京欣奕华数字科技有限公司 Robot positioning method and positioning device thereof
CN112819860A (en) * 2021-02-18 2021-05-18 Oppo广东移动通信有限公司 Visual inertial system initialization method and device, medium and electronic equipment
CN113029128A (en) * 2021-03-25 2021-06-25 浙江商汤科技开发有限公司 Visual navigation method and related device, mobile terminal and storage medium
CN113409391A (en) * 2021-06-25 2021-09-17 浙江商汤科技开发有限公司 Visual positioning method and related device, equipment and storage medium
CN114120301A (en) * 2021-11-15 2022-03-01 杭州海康威视数字技术股份有限公司 Pose determination method, device and equipment
CN114185073A (en) * 2021-11-15 2022-03-15 杭州海康威视数字技术股份有限公司 Pose display method, device and system

Also Published As

Publication number Publication date
CN117346650A (en) 2024-01-05

Similar Documents

Publication Publication Date Title
EP3457683B1 (en) Dynamic generation of image of a scene based on removal of undesired object present in the scene
US9159169B2 (en) Image display apparatus, imaging apparatus, image display method, control method for imaging apparatus, and program
AU2009257959B2 (en) 3D content aggregation built into devices
KR102000536B1 (en) Photographing device for making a composion image and method thereof
CN108958469B (en) Method for adding hyperlinks in virtual world based on augmented reality
WO2010028559A1 (en) Image splicing method and device
EP2981945A1 (en) Method and apparatus for determining camera location information and/or camera pose information according to a global coordinate system
CN103945134A (en) Method and terminal for taking and viewing photos
US11044398B2 (en) Panoramic light field capture, processing, and display
US10068157B2 (en) Automatic detection of noteworthy locations
CN108776822B (en) Target area detection method, device, terminal and storage medium
CN110296686A (en) Localization method, device and the equipment of view-based access control model
JP2016194783A (en) Image management system, communication terminal, communication system, image management method, and program
CN112422812B (en) Image processing method, mobile terminal and storage medium
JP2016194784A (en) Image management system, communication terminal, communication system, image management method, and program
KR102100667B1 (en) Apparatus and method for generating an image in a portable terminal
CN117196955A (en) Panoramic image stitching method and terminal
WO2024001849A1 (en) Visual-localization-based pose determination method and apparatus, and electronic device
WO2018000299A1 (en) Method for assisting acquisition of picture by device
GB2513865A (en) A method for interacting with an augmented reality scene
CN114882106A (en) Pose determination method and device, equipment and medium
CN110599602B (en) AR model training method and device, electronic equipment and storage medium
US20190114793A1 (en) Image Registration Method and Apparatus for Terminal, and Terminal
EP3287912A1 (en) Method for creating location-based space object, method for displaying space object, and application system thereof
TWI785332B (en) Three-dimensional reconstruction system based on optical label

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23830029

Country of ref document: EP

Kind code of ref document: A1