WO2024001849A1

WO2024001849A1 - Visual-localization-based pose determination method and apparatus, and electronic device

Info

Publication number: WO2024001849A1
Application number: PCT/CN2023/101166
Authority: WO
Inventors: 武廷繁
Original assignee: 中兴通讯股份有限公司
Priority date: 2022-06-28
Filing date: 2023-06-19
Publication date: 2024-01-04
Also published as: CN117346650A

Abstract

A visual-localization-based pose determination method, which comprises: during a movement process of a terminal on which a camera is mounted, acquiring a plurality of images captured by the terminal (S102); selecting a plurality of target images from among the plurality of images according to a disparity (S104); uploading the plurality of target images to a cloud to acquire a constraint pose of the terminal (S106); and determining a target pose of the terminal according to the constraint pose and a local pose of the terminal (S108). Further disclosed are a visual-localization-based pose determination apparatus and an electronic device.

Description

Position determination method, device and electronic equipment for visual positioning

Cross-references to related applications

This application is based on the Chinese patent application CN202210751878.3 with the invention title "Visual Positioning Pose and Orientation Determination Method, Device and Electronic Equipment" submitted on June 28, 2022, and claims the priority of this patent application, all contents of which are incorporated by reference. All disclosed contents are incorporated into this application.

Technical field

Embodiments of the present invention relate to the field of navigation, and specifically, to a visual positioning pose determination method, device, and electronic equipment.

Background technique

In the existing technology, it is usually necessary to locate the posture of the terminal. For example, during navigation and other processes, the device requesting navigation needs to be positioned. The methods that can be used include visual positioning method. However, the position positioned by the visual positioning method is easy to be inaccurate.

Contents of the invention

Embodiments of the present invention provide a visual positioning pose determination method, device, and electronic equipment to at least solve the technical problem of inaccurate positioning poses.

According to one aspect of an embodiment of the present invention, a method for determining the pose of visual positioning is provided, including: during the movement of a terminal equipped with a camera, acquiring multiple images captured by the terminal; from the multiple images, Select multiple target images based on disparity; upload the multiple target images to the cloud to obtain the constrained pose of the terminal; determine the target pose of the terminal based on the constrained pose and the local pose of the terminal.

According to another aspect of the embodiment of the present invention, a device for determining the posture of visual positioning is provided, including: an acquisition module configured to acquire multiple images captured by the terminal when the terminal equipped with a camera is moving; select The module is configured to select multiple target images from the multiple images mentioned above based on parallax; the upload module is configured to upload the multiple target images mentioned above to the cloud to obtain to the constrained pose of the terminal; the determination module is configured to determine the target pose of the terminal based on the constrained pose and the local pose of the terminal.

According to another aspect of the embodiment of the present invention, a storage medium is also provided, and a computer program is stored in the storage medium, wherein the computer program executes the above-mentioned pose determination method for visual positioning when run by a processor.

According to another aspect of the embodiment of the present invention, an electronic device is also provided, including a memory and a processor. A computer program is stored in the memory, and the processor is configured to perform the above-mentioned visual positioning pose through the computer program. Determine the method.

Description of drawings

The drawings described here are used to provide a further understanding of the embodiments of the present invention and constitute a part of the embodiments of the present application. The schematic embodiments of the embodiments of the present invention and their descriptions are used to explain the embodiments of the present invention and do not constitute an explanation of the embodiments of the present invention. Improper limitation of the embodiment. In the attached picture:

Figure 1 is a flow chart of an optional visual positioning pose determination method according to an embodiment of the present invention;

Figure 2 is a flow chart of a local V0 scale recovery algorithm of an optional visual positioning pose determination method according to an embodiment of the present invention;

Figure 3 is a block diagram of a navigation system based on local V0 scale recovery according to an optional visual positioning pose determination method according to an embodiment of the present invention;

Figure 4 is a schematic structural diagram of an optional visual positioning posture determination device according to an embodiment of the present invention;

Figure 5 is a schematic diagram of an optional electronic device according to an embodiment of the present invention.

Detailed ways

In order to enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the drawings in the embodiments of the present invention. Obviously, the described The embodiments are only examples of a part of the embodiments of the present invention, not all of them. Examples of parts. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts should fall within the scope of protection of the embodiments of the present invention.

It should be noted that the terms "first", "second", etc. in the description and claims of the embodiments of the present invention and the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. order. It is to be understood that the data so used are interchangeable under appropriate circumstances so that the embodiments of the invention described herein are capable of being practiced in sequences other than those illustrated or described herein. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusions, e.g., a process, method, system, product, or apparatus that encompasses a series of steps or units and need not be limited to those explicitly listed. Those steps or elements may instead include other steps or elements not expressly listed or inherent to the process, method, product or apparatus.

According to the first aspect of the embodiment of the present invention, a method for determining the pose of visual positioning is provided. Optionally, as shown in Figure 1, the above method includes:

S102: During the movement of the terminal equipped with the camera, obtain multiple images captured by the terminal;

S104, select multiple target images from multiple images based on disparity;

S106, upload multiple target images to the cloud to obtain the constrained pose of the terminal;

S108: Determine the target pose of the terminal based on the constrained pose and the local pose of the terminal.

Optionally, in this embodiment, the posture may be the movement trajectory and position of the terminal. The purpose of this embodiment is to determine the accurate target pose of the terminal, that is, the accurate movement trajectory and position of the terminal, so that it can be applied in the process of navigating and positioning the terminal.

The above-mentioned terminal can be equipped with a camera, which can include a front camera, a rear camera or an external camera. The camera can be a single camera or a camera array composed of multiple cameras. The above terminal can be carried and moved. For example, if a user moves within a certain area with a terminal, the terminal can take photos through the camera and obtain multiple images. It should be noted that the camera of the terminal captures images in a certain area where the above-mentioned user is located. If the terminal is placed in a pocket of clothes and the camera is blocked by cloth, the multiple images mentioned above cannot be obtained.

After acquiring multiple images, multiple target images can be selected based on disparity. multiple target images After being uploaded to the cloud, the cloud can determine the constrained pose of the terminal based on the target image. The constrained pose is the pose used to constrain the local pose of the terminal. The constrained pose is sent to the terminal, and then the terminal determines the constrained pose according to the constrained position. pose and local pose to determine the accurate target pose of the terminal. After the target pose is determined, the target pose can be displayed on the terminal for navigation or positioning.

In the embodiment of the present invention, during the movement of a terminal equipped with a camera, a plurality of images taken by the terminal are obtained; from the plurality of images, a plurality of target images are selected according to the parallax; and the plurality of target images are selected The image is uploaded to the cloud to obtain the constrained pose of the above-mentioned terminal; the method of determining the target pose of the above-mentioned terminal based on the above-mentioned constrained pose and the local pose of the above-mentioned terminal, because in the above method, the terminal with the camera installed moves When taking multiple images, the constrained pose is determined by selecting the target image based on the parallax in the multiple images, and using the constrained pose to constrain the local position, the accurate target pose of the terminal can be determined. This achieves the purpose of improving the accuracy of the determined pose, thereby solving the technical problem of inaccurate positioning.

As an optional example, selecting multiple target images from multiple images based on disparity includes:

Determine multiple first images of the same object from multiple images;

Among the plurality of first images, the two images with the largest disparity are used as the images among the plurality of target images.

In this embodiment, when selecting multiple target images from multiple images based on disparity, an image of the same object can be obtained, and then the disparity between each two images of the same object is calculated and sorted according to the disparity. , after sorting, the two images with the largest disparity can be used as the target image. If multiple objects are included, two target images are determined for each object.

As an optional example, determining the target pose of the terminal based on the constrained pose and the local pose of the terminal includes:

The transformation matrix is determined based on the constrained pose and the local pose;

Get the scale factor from the transformation matrix;

The product of the local pose and the scale factor is used as the target pose.

In this embodiment, the transformation matrix can be determined based on the constrained posture and the local posture. After the transformation matrix is determined, the scale factor is obtained from the transformation matrix. The scale factor is a factor used to adjust the local posture of the terminal. The local posture of the terminal is multiplied by the scale factor to obtain the calculated posture. The calculated posture is the posture adjusted by the scale factor. pose, the calculated pose is the accurate target pose.

As an optional example, determining the transformation matrix based on the constrained pose and the local pose includes:

Substitute the first numerical value of the local posture and the second numerical value of the constrained posture into the above formula 1 to obtain the transformation matrix and residual.

Optionally, in this embodiment, when the first numerical value of the local posture and the second numerical value of the constrained posture are known, the two are substituted into the above formula. Since the local pose and the constrained pose are a series of position information, the above residual and the above transformation matrix T can be calculated.

As an optional example, obtaining the scale factors from the transformation matrix includes:

Substituting the relative rotation and relative offset between the constrained pose and the local pose into the above formula 2, the scale factor is obtained.

Optionally, in this embodiment, since the above transformation matrix T has been calculated, and r and t are both known quantities, the scale factor s can be calculated.

As an optional example, uploading multiple target images to the cloud to obtain the constrained pose of the terminal includes:

The cloud repositions each target image among the multiple target images according to the navigation map, and obtains the relocation position corresponding to each target image;

The cloud arranges the relocation positions in order to obtain the constrained pose.

Optionally, in this embodiment, after multiple target images are determined, the target images can be uploaded to the cloud. A navigation map is saved in the cloud, and the navigation map is a map within a certain area. Multiple target images can be used to determine images with high similarity in the navigation map. After comparison, the position of each target image in the multiple target images can be determined in the navigation map. Locations are arranged in chronological order You can get the pose. The obtained pose is used as the constrained pose.

As an optional example, the above method also includes:

The cloud obtains the panoramic video in the navigation area and multiple captured images in the navigation area;

Generate point cloud maps based on panoramic videos and captured images;

Combine the point cloud map with the flat map to obtain the navigation map.

Optionally, the navigation map in this embodiment needs to be obtained in advance. Panoramic videos can be captured in the navigation area and multiple captured images can be captured. The panoramic videos and captured images can be used to generate point cloud maps. Then, the point cloud map is combined with the flat map of the navigation area to obtain the above-mentioned navigation map.

As an optional example, generating point cloud maps based on panoramic videos and captured images includes:

Extract target frames from panoramic videos;

Determine the first pose of the target frame;

Cross the first pose to generate a matrix structure and obtain a sparse point cloud;

Densify the sparse point cloud to obtain a point cloud map.

Optionally, in this embodiment, after acquiring the panoramic video and shooting images, target frames can be extracted from the panoramic video, and each target frame is an image. Determine the position of the extracted multi-frame target frame as the first pose, cross the first pose, generate a matrix structure, and obtain a sparse point cloud. The sparse point cloud is densified to obtain a point cloud map.

This embodiment proposes a monocular visual mileage (V0) scale recovery algorithm combined with cloud relocation (visual positioning), so that local (terminal) V0 can be effectively applied in navigation user tracking. The key to the local V0 scale recovery algorithm combined with cloud relocation is to run V0 on the terminal when navigation starts, and select a number of key frames at certain intervals to send to the cloud. The cloud repositions these key frames to obtain their corresponding constraint bits. pose, and returns the constrained pose to the terminal; the terminal uses the returned constrained pose as a constraint, adds it to the calculation process of the local pose, solves a transformation matrix, and decomposes the scale factor from the transformation matrix. According to this scale Factors can restore the true scale of the local position. At the same time, since the pose constraints returned by the cloud are added to the local pose calculation, it can improve The reliability of the local posture calculated by V0. Using the scale recovery algorithm of this embodiment, the local posture can be tracked accurately and efficiently over a long period of time, effectively improving the efficiency and accuracy of user tracking during navigation. At the same time, the application scenarios are also wider, without distinguishing between indoor and outdoor scenes.

Figure 2 is the basic flow chart of the local V0 scale recovery algorithm. The basic process of the algorithm includes: key frame screening and uploading, key frame relocation, solving similarity transformation and V0 scale recovery. Key frames refer to the key frames generated when the terminal runs V0. First, some of the key frames recently solved by V0 are screened on the terminal, and the ones with the largest parallax are selected and uploaded to the cloud for relocation. The terminal refers to a common smartphone, nothing special. Model requirements. Cloud relocation means that the cloud visually locates the received key frames and returns the positioning results to the terminal. After the terminal obtains the poses of these uploaded key frames, it adds these poses as constraints to the calculation of the local pose, and finally can solve a similarity transformation. The scale factor decomposed from the similarity transformation can be used to recover the true scale of the local posture.

Figure 3 is the overall block diagram of the navigation system based on local V0 scale recovery. The system includes cloud and terminal. As shown in Figure 3, the cloud includes navigation map generation and navigation services, and the terminal provides functional interfaces for users. Navigation map generation includes collecting original mapping data, generating point cloud maps, and generating navigation maps based on point cloud maps and plane maps. Navigation services provided by the cloud include: identification and positioning, path planning and real-time navigation. The functions provided to users by the terminal include: starting navigation, initial positioning, destination selection, and real-time navigation.

Cloud offline navigation map generates high-precision maps for navigation. Within the navigation area (map), use a panoramic camera to shoot panoramic videos. There are no specific requirements for the panoramic camera model. In addition, pictures of some scenes on the map taken by a general monocular camera are required, and the precise position of the picture is obtained through real-time differential positioning (Real-time kinematic, RTK). There are no special requirements for the camera model. The point cloud map is obtained using the original data and the panoramic 3D reconstruction algorithm. The final navigation map is obtained based on the point cloud map and the plane map. The navigation map is stored in the cloud, and the corresponding area map is loaded every time the navigation is started. The entire navigation service is conducted within the scope of the map. The cloud navigation service mainly focuses on initial positioning and relocation tasks during user tracking. The real-time navigation process uses the local V0 scale recovery proposed in the embodiment of the present invention. The cloud is deployed on high-performance servers, and the network needs to be kept open at the same time. After the navigation is started, the terminal will continuously interact with the cloud navigation service to achieve real-time navigation.

Process 1. In this embodiment, the specific steps for generating a navigation map offline are as follows:

Step 1: Use a panoramic camera to shoot the navigation service (map) area to obtain a panoramic video. There are no special requirements for the brand and model of the panoramic camera used for image collection. It should be noted that the shooting process contains at least one "loopback". Looping means to circle back to the "origin" after shooting for a certain distance. The "origin" does not specifically refer to the initial scanning position, but to part of the scene that has been walked through during the scanning process. The shooting route is similar to the five Olympic rings. That is, the panoramic video collected by hand contains images/videos of the same object from different angles.

Step 2: Take several pictures of part of the scene, and use RTK to obtain the precise location coordinates of the taken pictures. There are no special requirements for the brand and model of the equipment for taking pictures. Common mobile phones, SLR cameras and other equipment are acceptable. There are no special requirements for the brand and model of RTK. Special instructions are needed: (I) This part of the picture will be used for real-scale restoration, so the shooting positions cannot be on the same straight line, and the shooting positions should be distributed as much as possible in the entire area; (II) Try to shoot as close to the scene as possible Typical scenario.

Step 3: Use the 3D reconstruction algorithm to perform 3D reconstruction of the data collected in steps 1 and 2 to generate a point cloud map. The basic process of the three-dimensional reconstruction algorithm is: extract frames from the panoramic video, run Simultaneous Localization and Mapping (SLAM) to obtain panoramic key frames and poses, cut panoramic images to generate monocular images and their corresponding poses, and use Monocular images and poses are run through a cross matrix structure (Structure from Motion, SfM) to generate sparse point clouds, and densification generates dense point clouds.

Step 4: Combine the flat map and the point cloud map generated in step 3 to generate the navigation map used in the navigation process and save it to the cloud.

Process 2, the navigation process is as follows:

Step 1: The terminal starts the augmented reality (AR) navigation service, and the cloud loads the navigation map generated in process 1. The terminal refers to a smartphone with a smooth network, and there are no special requirements for the brand.

Step 2: The terminal starts initial positioning, uses the camera to take pictures of the current environment, and uploads them to Cloud.

Step 3: The cloud performs initial positioning on the current environment picture. After obtaining the pose of the current picture, it returns it to the terminal as the user's initial position.

Step 4: After the terminal obtains the initial position, select the navigation destination and upload it to the cloud.

Step 5: The cloud plans the navigation path based on the starting point and destination of the navigation, and renders the movement direction indicator on the screen of the terminal.

Step 6: The user moves according to the instructions, and the terminal screen displays the picture captured by the current camera. At the same time, the terminal starts V0 and turns on user tracking.

Step 7: When the terminal V0 reaches a certain time, start the local V0 scale recovery algorithm. The algorithm process is shown in Figure 1. The specific steps are as follows

Step 7.1: The terminal selects several key frames with the largest average disparity from some of the recent key frames obtained by V0 and uploads them to the cloud;

Step 7.2: The cloud repositions the uploaded key frames and returns the corresponding pose;

Step 7.3: The terminal adds the cloud relocation key frame pose as a priori to the calculation of V0, and defines the residual as the prior pose minus the similarity transformation matrix multiplied by the local pose. The transformation matrix T is obtained through the above formula 1, and the scale factor s is obtained through the above formula 2.

Step 8: Use the pose solved by recovering the local V0 of the true scale as the user's current pose to achieve multi-modal user tracking that integrates local V0 and cloud relocation. At the same time, the terminal continuously uses the current location to determine whether the destination has been reached.

For indoor scene AR navigation, the mapping algorithm and recognition algorithm are deployed in the cloud during implementation. The specific implementation process is as follows:

Step 1: Follow step 1 of process 1 to collect original video data. The indoor panoramic video shooting method is generally to hold a panoramic camera and follow the process 1 step 1 to shoot. The panoramic camera can also be fixed by other equipment, such as a helmet.

Step 2: Collect image data according to step 2 of process 1. Image data is generally collected through mobile phones Take photos or other devices that can take photos. Usually shooting more typical scenes, such as store signs, etc., such scenes are more likely to be the starting point or end point of navigation.

Step 3: Deploy the self-developed mapping algorithm, and then follow the process 1, step 3, and use the algorithm to perform three-dimensional reconstruction on the original data collected in steps 1 and 2 above to generate a point cloud map.

Step 4: Follow step 4 of process 1 to generate a navigation map. The point cloud map is the point cloud map generated in step 3. In indoor situations, the floor map uses the CAD drawing of the building.

Step 5: Follow steps 1 and 2 of process 2 to start navigation. When starting navigation, the cloud loads map information and starts to accept messages from the terminal. The terminal uploads pictures of the current environment.

Step 6: Follow steps 3, 4 and 5 of process 2 to generate a navigation path. The cloud performs initial positioning based on the uploaded image and returns the result to the terminal. The terminal selects the navigation destination and uploads it to the cloud. The cloud generates the navigation path based on the current location and destination. Navigate the path and render it to the terminal screen.

Step 7: Follow steps 6 to 8 of process 2. The terminal starts V0. When V0 runs and tracks for 20 seconds, it starts the scale recovery algorithm. When performing local scale recovery, select the 3 with the largest average parallax from the 10 most recent key frames obtained by V0 and upload them to the cloud for relocation; use the relocated pose as a priori to solve the local V0 pose and prior position. Similar transformation of pose, the scale factor is restored through this transformation matrix, and finally the scale of local V0 is restored, realizing real-time multi-modal navigation. The local V0 can achieve relatively accurate user tracking. Combined with the scale recovery algorithm of the embodiment of the present invention, the true scale of the local V0 can be restored, which can achieve continuous and accurate user tracking during the navigation process.

By following the above process, complete indoor AR navigation can be achieved.

When implementing outdoor scene AR navigation, the mapping algorithm and recognition algorithm are deployed in the cloud. The specific implementation process is as follows:

Step 1: Collect original video data as described in Step 1 of Process 1. The outdoor panoramic video shooting method is generally to hold a panoramic camera and follow the process 1 step 1 to shoot. If the scene is large, you can also use other methods such as a drone equipped with a panoramic camera. The shooting route must still comply with the description in step 1 of the process.

Step 2: Collect image data according to step 2 of process 1. Image data is generally collected through mobile phones Take photos or other devices that can take photos. We usually shoot more typical scenes, such as road signs, building gates, etc. Such scenes are more likely to be the starting point or end point of navigation.

Step 4: Follow step 4 of process 1 to generate a navigation map. The point cloud map is the point cloud map generated in step 3. In outdoor situations, the plan map can be a plan CAD map and road network information.

By following the above process, complete outdoor AR navigation can be achieved.

It should be noted that for the sake of simple description, the foregoing method embodiments are expressed as a series of action combinations. However, those skilled in the art should know that the embodiments of the present invention are not limited by the described action sequence. limitation, because certain steps may be performed in other orders or simultaneously according to embodiments of the present invention. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily necessary for the embodiments of the present invention.

According to another aspect of the embodiment of the present application, a visual positioning pose determination device is also provided, as shown in Figure 4, including:

The acquisition module 402 is used to acquire multiple images captured by the terminal during the movement of the terminal equipped with a camera;

The selection module 404 is used to select multiple target images from multiple images based on disparity;

The upload module 406 is used to upload multiple target images to the cloud to obtain the constrained pose of the terminal;

The determination module 408 is used to determine the target pose of the terminal based on the constrained pose and the local pose of the terminal.

After acquiring multiple images, multiple target images can be selected based on disparity. After multiple target images are uploaded to the cloud, the cloud can determine the constrained pose of the terminal based on the target image. The constrained pose is a pose used to constrain the local pose of the terminal. The constrained pose is sent to the terminal, and then the constrained pose is sent to the terminal. The terminal determines the accurate target pose of the terminal based on the constrained pose and the local pose. After the target pose is determined, the target pose can be displayed on the terminal for navigation or positioning.

Since in the above method, multiple images are taken when the terminal with the camera is moved, the constrained pose is determined by the target image selected based on the parallax in the multiple images, and the constrained pose is used to constrain the local position, the terminal can be determined Accurate target pose. Thus achieving the improvement of certain The purpose of pose accuracy.

As an optional example, the above-mentioned selection module includes: a first determination unit, used to determine multiple first images of the same object from the above-mentioned multiple images; a second determination unit, used to determine the above-mentioned multiple first images Among the images, the two images with the largest disparity are used as the images among the multiple target images mentioned above.

As an optional example, the above-mentioned determination module includes: a third determination unit, used to determine the transformation matrix according to the above-mentioned constrained posture and the above-mentioned local posture; an acquisition unit, used to obtain the scale factor from the above-mentioned transformation matrix; Four determination units, configured to use the product of the above-mentioned local posture and the above-mentioned scale factor as the above-mentioned target posture.

As an optional example, the above-mentioned third determination unit includes: a first input sub-unit, used to substitute the first numerical value of the above-mentioned local posture and the second numerical value of the above-mentioned constrained posture into the above-mentioned formula 1 to obtain the above-mentioned transformation matrix and residuals.

As an optional example, the above-mentioned acquisition unit includes: a second input sub-unit, used to substitute the relative rotation and relative offset of the above-mentioned constrained posture and the above-mentioned local posture into the above-mentioned formula 2 to obtain the above-mentioned scale factor.

As an optional example, the above-mentioned upload module includes: a relocation unit, used to notify the above-mentioned cloud to relocate each of the above-mentioned multiple target images according to the navigation map, and obtain the corresponding The relocation positions; the above-mentioned cloud arranges the above-mentioned relocation positions in order to obtain the above-mentioned constrained poses.

Optionally, in this embodiment, after multiple target images are determined, the target images can be uploaded to the cloud. A navigation map is saved in the cloud, and the navigation map is a map within a certain area. Multiple target images can be used to determine images with high similarity in the navigation map. After comparison, the position of each target image in the multiple target images can be determined in the navigation map. The positions can be obtained by arranging the positions in chronological order. The obtained pose is used as the constrained pose.

As an optional example, the cloud can obtain a panoramic video in the navigation area and multiple captured images in the navigation area; generate a point cloud map based on the panoramic video and the captured images; combine the point cloud map with a plane The maps are combined to obtain the above navigation map.

As an optional example, the cloud can extract the target frame from the above panoramic video; determine the first pose of the above target frame; intersect the above first pose to generate a matrix structure to obtain a sparse point cloud; perform the above sparse The point cloud is densified to obtain the above point cloud map.

For other examples of this embodiment, please refer to the above examples and will not be described again here.

Figure 5 is a structural block diagram of an optional electronic device according to an embodiment of the present application. As shown in Figure 5, it includes a processor 502, a communication interface 504, a memory 506 and a communication bus 508. The processor 502, the communication interface 504 and memory 506 complete communication with each other through communication bus 508, where,

Memory 506 for storing computer programs;

The processor 502 is used to implement the following steps when executing the computer program stored on the memory 506:

During the movement of the terminal equipped with a camera, obtain multiple images taken by the terminal;

Select multiple target images from multiple images based on disparity;

Upload multiple target images to the cloud to obtain the constrained pose of the terminal;

The target pose of the terminal is determined based on the constrained pose and the local pose of the terminal.

Optionally, in this embodiment, the above-mentioned communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one thick line is used in Figure 5, but it does not mean that there is only one bus or one type of bus. The communication interface is used for communication between the above-mentioned electronic devices and other devices.

The memory may include RAM or non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

As an example, the above memory 506 may include, but is not limited to, the acquisition module 402, the selection module 404, the upload module 406 and the determination module 408 in the above visual positioning pose determination device. In addition, it may also include but is not limited to other module units in the above-mentioned visual positioning posture determination device, which will not be described again in this example.

The above-mentioned processor may be a general-purpose processor and may include but is not limited to: a central processing unit (Central Processing Unit). Processing Unit (CPU), Network Processor (NP), etc.; it can also be a Digital Signal Processing (DSP), Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array ( Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.

Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments, which will not be described again in this embodiment.

Those of ordinary skill in the art can understand that the structure shown in Figure 5 is only illustrative, and the device that implements the pose determination method of visual positioning can be a terminal device, and the terminal device can be a smart phone (such as an Android phone, iOS phone, etc.) , tablet computers, handheld computers, and mobile Internet devices (Mobile Internet Devices, MID), PAD and other terminal devices. Figure 5 does not limit the structure of the above electronic device. For example, the electronic device may also include more or fewer components (such as network interfaces, display devices, etc.) than shown in FIG. 5 , or have a different configuration than that shown in FIG. 5 .

Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by instructing the hardware related to the terminal device through a program. The program can be stored in a computer-readable storage medium, and the storage medium can Including: flash disk, ROM, RAM, magnetic disk or optical disk, etc.

According to another aspect of the embodiment of the present invention, a computer-readable storage medium is also provided. The computer-readable storage medium stores a computer program, wherein the computer program executes the above-mentioned visual positioning when run by the processor. Steps in the pose determination method.

Optionally, in this embodiment, those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by instructing the hardware related to the terminal device through a program, and the program can be stored in a Among computer-readable storage media, the storage media can include: flash disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk, etc.

The above serial numbers of the embodiments of the present invention are only for description and do not represent the advantages and disadvantages of the embodiments.

If the integrated units in the above embodiments are implemented in the form of software functional units and sold or used as independent products, they can be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of the embodiment of the present invention is essentially or contributes to the existing technology, or all or part of the technical solution can be embodied in the form of a software product. The computer software product is stored in a storage medium. , including several instructions to cause one or more computer devices (which can be personal computers, servers, network devices, etc.) to execute all or part of the steps of the methods described in various embodiments of the embodiments of the present invention.

In the embodiments of the present invention, each embodiment is described with its own emphasis. For parts that are not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed client can be implemented in other ways. Among them, the device embodiments described above are only illustrative. For example, the division of the units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components may be combined or may be Integrated into another system, or some features can be ignored, or not implemented. On the other hand, the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the units or modules may be in electrical or other forms.

The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiment of the present invention can be integrated into one processing unit, or each unit can exist physically alone, or two or more units can be integrated into one unit. The above integrated units can be implemented in the form of hardware or software functional units.

The above are only preferred implementations of the embodiments of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the principles of the embodiments of the present invention. Improvements and modifications should also be considered as the protection scope of the embodiments of the present invention.

Claims

A method for determining the pose of visual positioning, including:

During the movement of a terminal equipped with a camera, obtain multiple images captured by the terminal;

Select a plurality of target images from the plurality of images based on disparity;

Upload the multiple target images to the cloud to obtain the constrained pose of the terminal;

The target pose of the terminal is determined based on the constrained pose and the local pose of the terminal.
The method according to claim 1, wherein selecting a plurality of target images from the plurality of images based on disparity includes:

Determine multiple first images of the same object from the multiple images;

Among the plurality of first images, the two images with the largest parallax are used as the images among the plurality of target images.
The method according to claim 1, wherein determining the target pose of the terminal based on the constrained pose and the local pose of the terminal includes:

Determine a transformation matrix according to the constrained pose and the local pose;

Obtain scale factors from the transformation matrix;

The product of the local posture and the scale factor is used as the target posture.
The method according to claim 3, wherein determining the transformation matrix according to the constrained pose and the local pose includes:

Substitute the first numerical value of the local pose and the second numerical value of the constrained pose into the following formula to obtain the transformation matrix and residual:

Among them, the residual is the residual, is the rotation of the constrained pose, is the translation of the constrained pose, R is the rotation of the local pose, T is the translation of the local pose, and T is the transformation matrix.
The method according to claim 3, wherein obtaining the scale factor from the transformation matrix includes:

The relative rotation and relative offset of the constrained posture and the local posture are substituted into the following formula to obtain the scale factor:

Where, T is the transformation matrix, s is the scale factor, r is the relative rotation between the constrained pose and the local pose, and t is the relative offset between the constrained pose and the local pose. .
The method according to claim 1, wherein uploading the plurality of target images to the cloud to obtain the constrained pose of the terminal includes:

The cloud repositions each of the plurality of target images according to the navigation map to obtain the relocation position corresponding to each target image;

The cloud arranges the relocation positions in order to obtain the constrained pose.
The method of claim 6, further comprising:

The cloud acquires the panoramic video in the navigation area and multiple captured images in the navigation area;

Generate a point cloud map based on the panoramic video and the captured image;

The point cloud map is combined with the plane map to obtain the navigation map.
The method according to claim 7, wherein generating a point cloud map according to the panoramic video and the captured image includes:

Extract target frames from the panoramic video;

Determine the first pose of the target frame;

Cross the first pose to generate a matrix structure and obtain a sparse point cloud;

Densify the sparse point cloud to obtain the point cloud map.
A posture determination device for visual positioning, including:

An acquisition module configured to acquire a plurality of images captured by a terminal equipped with a camera during the movement of the terminal;

A selection module configured to select multiple target images from the multiple images based on disparity;

An upload module, configured to upload the plurality of target images to the cloud to obtain the constrained pose of the terminal;

A determining module configured to determine the target pose of the terminal based on the constrained pose and the local pose of the terminal.
An electronic device includes a memory and a processor, a computer program is stored in the memory, and the processor is configured to execute the method described in any one of claims 1 to 8 through the computer program.