CN117911498A - Pose determination method and device, electronic equipment and storage medium


Info

Publication number
CN117911498A
Authority
CN
China
Prior art keywords
target frame
frame image
data
target
image
Prior art date
Legal status
Pending
Application number
CN202410078390.8A
Other languages
Chinese (zh)
Inventor
刘少杰
Current Assignee
Guangzhou Kaidelian Software Technology Co ltd
Guangzhou Kaidelian Intelligent Technology Co ltd
Original Assignee
Guangzhou Kaidelian Software Technology Co ltd
Guangzhou Kaidelian Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Kaidelian Software Technology Co ltd, Guangzhou Kaidelian Intelligent Technology Co ltd filed Critical Guangzhou Kaidelian Software Technology Co ltd
Priority to CN202410078390.8A priority Critical patent/CN117911498A/en
Publication of CN117911498A publication Critical patent/CN117911498A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of computer vision and discloses a pose determination method and device, an electronic device and a storage medium. The pose determination method comprises the following steps: extracting a target frame image from target video data, wherein the target video data is obtained by capturing images of a target scene with a physical camera; constructing a three-dimensional model of the target scene; extracting a plurality of feature point data from the three-dimensional model, and constructing a simulated image based on the plurality of feature point data; and determining pose data of the physical camera at the time it captured the target frame image based on a matching result between the simulated image and the target frame image, wherein the target frame image comprises the plurality of feature point data. The method effectively reduces manual measurement cost and improves the efficiency of determining pose data.

Description

Pose determination method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of computer vision, in particular to a pose determining method, a pose determining device, electronic equipment and a storage medium.
Background
Digital twinning is a technology that keeps a digital representation of a physical object or system in correspondence with a virtual model reflecting its actual state, thereby enabling real-time monitoring, simulation and analysis. In digital twinning, data and materials from different sources need to be fused; in particular, to fuse video material captured by a camera into the corresponding three-dimensional model, the real-world coordinate system and the virtual-world coordinate system must be accurately aligned and matched.
In the related art, to determine the pose of a camera at the time it captures video data, personnel must perform field measurements with a total station (an integrated surveying instrument). However, when the target scene to be modeled in three dimensions covers a large area, determining the camera pose in this way consumes substantial labor cost and is inefficient.
Disclosure of Invention
In view of the above, the present invention provides a pose determination method and device, an electronic device and a storage medium, so as to solve the problems of low efficiency and high cost in determining the camera pose.
In a first aspect, the present invention provides a pose determining method, including:
extracting a target frame image from target video data, wherein the target video data is obtained by capturing images of a target scene with a physical camera;
constructing a three-dimensional model of the target scene;
extracting a plurality of feature point data from the three-dimensional model, and constructing a simulated image based on the plurality of feature point data;
and determining pose data of the physical camera at the time it captured the target frame image based on a matching result between the simulated image and the target frame image, wherein the target frame image comprises the plurality of feature point data.
The beneficial effects are that: the pose determination method determines the pose data of the physical camera at the time it captured the target frame image based on the matching result between the simulated image in the virtual-world coordinate system and the target frame image in the real-world coordinate system; this effectively reduces manual measurement cost, makes the determination of pose data more efficient and convenient, and thereby improves the efficiency of determining pose data.
In an alternative embodiment, the simulated image is obtained based on initial pose data of a virtual camera, the virtual camera corresponds to the physical camera and is deployed on the three-dimensional model; based on a matching result between the simulation image and the target frame image, determining pose data when the physical camera collects the target frame image comprises the following steps:
superimposing the simulated image on the target frame image, and respectively determining the projection coordinates at which each feature point data is projected onto the target frame image;
determining a matching result between the simulated image and the target frame image based on the coordinate difference between the point coordinates of each feature point data on the target frame image and the corresponding projection coordinates;
and adjusting the initial pose data based on the matching result to obtain the pose data of the physical camera at the time it captured the target frame image.
The beneficial effects are that: based on the matching result between the simulated image in the virtual world coordinate system and the target frame image in the real world coordinate system, whether the pose of the virtual camera when the virtual camera collects the simulated image is the same as the pose of the physical camera when the physical camera collects the target frame image can be effectively evaluated, and further when the simulated image is different from the target frame image, the purpose of quickly determining the pose data of the physical camera when the physical camera collects the target frame image can be achieved through targeted adjustment of the initial pose data of the virtual camera.
In an alternative embodiment, determining a matching result between the simulated image and the target frame image based on a coordinate difference between a point coordinate of each feature point data on the target frame image and a corresponding projection coordinate includes:
performing variance processing on the coordinate differences between the point coordinates of each feature point data on the target frame image and the corresponding projection coordinates, to obtain a position deviation value of the plurality of feature point data between the simulated image and the target frame image;
if the position deviation value is smaller than or equal to a preset threshold, determining that the matching result between the simulated image and the target frame image is the same;
if the position deviation value is larger than the preset threshold, determining that the matching result between the simulated image and the target frame image is different.
In an alternative embodiment, based on the matching result, the initial pose data is adjusted to obtain pose data when the physical camera collects the target frame image, including:
if the matching results are the same, taking the initial pose data as the pose data of the physical camera at the time it captured the target frame image;
if the matching results are different, adjusting the initial pose data until the matching results are the same, and taking the adjusted initial pose data as the pose data of the physical camera at the time it captured the target frame image.
In an alternative embodiment, the initial pose data is adjusted by a virtual reality display device.
The beneficial effects are that: the adjusting efficiency can be improved, and the adjusted initial pose data is more in line with human senses.
In an alternative embodiment, extracting the target frame image from the target video data includes:
determining and extracting a target frame image based on a result of target detection on the target video data, wherein the target frame image is composed of at least one static object.
The beneficial effects are that: the stability of the target frame image can be improved, and the interference of invalid data is avoided, so that the accuracy of determining pose data can be improved.
In an alternative embodiment, the target frame image is pulled from the streaming server in real time.
The beneficial effects are that: the key visual information can be acquired based on the video data acquired by the physical camera in real time, so that the pose data of the camera can be well determined.
In a second aspect, the present invention provides a pose determining apparatus, comprising:
The first extraction module is used for extracting target frame images from target video data, wherein the target video data is obtained by acquiring images of a target scene through a physical camera;
The acquisition module is used for constructing a three-dimensional model of the target scene;
the second extraction module is used for extracting a plurality of characteristic point data in the three-dimensional model and constructing a simulation image based on the plurality of characteristic point data;
The processing module is used for determining pose data when the physical camera collects the target frame image based on a matching result between the simulation image and the target frame image, and the target frame image comprises a plurality of feature point data.
In a third aspect, the present invention provides an electronic device, comprising a memory and a processor communicatively connected to each other, wherein the memory stores computer instructions and the processor executes the computer instructions to perform the pose determination method of the first aspect or any implementation manner corresponding to the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon computer instructions for causing a computer to execute the pose determination method of the first aspect or any of its corresponding embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a total station according to the prior art;
FIG. 2 is a flow chart of a pose determination method according to an embodiment of the present invention;
FIG. 3 is a flow chart of another pose determination method according to an embodiment of the present invention;
FIG. 4 is a flow chart of still another pose determination method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a structural framework of a streaming server according to an embodiment of the present invention;
FIG. 6 is a flow chart of a method for determining a pose according to an embodiment of the present invention;
FIG. 7 is a block diagram of a pose determination apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the related art, to determine the pose of a camera at the time it captures video data, personnel must perform field measurements with a total station (an integrated surveying instrument) as shown in FIG. 1. The total station works on geometric measurement principles: by observing parameters such as the coordinates and azimuth of externally known points, together with the angles and distances measured between the known points and the point to be measured, it establishes the geometric relationship between them, and then determines the longitude, latitude and altitude of the point to be measured through calculation and derivation.
Therefore, in order to accurately correspond and match the real world coordinate system and the virtual world coordinate system when the camera collects video data, relevant personnel are required to continuously measure and debug through the total station, so that the pose of the camera for collecting the video data is determined. However, when the range of the target scene is large, if the pose of the camera is still determined by adopting the method, a great amount of labor cost is consumed, and the efficiency is extremely low.
In view of this, the embodiment of the invention provides a pose determining method, which not only can reduce labor cost, but also can improve the determining efficiency of pose data.
According to an embodiment of the present invention, there is provided a pose determination method embodiment, it being noted that the steps shown in the flowcharts of the drawings may be performed in a computer system such as a set of computer executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that herein.
The embodiment provides a pose determination method, which can be applied to an electronic device such as a tablet or a computer. FIG. 2 is a flow chart of a pose determination method according to an embodiment of the present invention; as shown in FIG. 2, the process includes the following steps:
Step S201, extracting a target frame image from the target video data.
In the embodiment of the invention, the target video data is obtained by image acquisition of a target scene through a physical camera.
Step S202, constructing a three-dimensional model of the target scene.
In the embodiment of the invention, in order to facilitate a clearer understanding of the characteristics, structure and layout of the target scene, a three-dimensional model of the target scene is constructed. The target scene is a real scene in the physical world and may include, for example, any of the following types of scenes: a building scene, an urban scene, an industrial scene, a natural scene, and the like; it can be set according to actual requirements, and the invention is not limited in this respect.
In some examples, the three-dimensional model of the target scene may be constructed through oblique photography. Specifically, an unmanned aerial vehicle collects photographs of the target scene along a preset cruising route. Because the position, attitude and field of view of the unmanned aerial vehicle are known, the coordinates and elevations of a plurality of feature points of the target scene can be determined by aerial triangulation and mapped onto the surface of the three-dimensional model, thereby obtaining the three-dimensional model of the target scene. As aerial triangulation is a relatively mature technique, it is not described further here. Preferably, an unmanned aerial vehicle with centimeter-level positioning accuracy is used for photo collection, which improves acquisition efficiency and ensures that the resulting three-dimensional model fits the target scene more closely.
Step S203, extracting a plurality of feature point data in the three-dimensional model, and constructing a simulation image based on the plurality of feature point data.
In the embodiment of the invention, a plurality of characteristic point data are extracted from a three-dimensional model and projected onto an image plane coordinate system, and a simulated image of the plurality of characteristic point data under a two-dimensional coordinate system is obtained by an image rendering mode so as to estimate pose data of a physical camera based on a mapping relation between the image plane coordinate and a world coordinate system.
In an example, the feature point data may be any of the following types of feature points in the target scene: feature points on a static object, feature points with a distinctive color, or feature points at corners of an object surface. The feature point data may be extracted manually or through target detection, which is not limited by the invention.
Step S204, based on the matching result between the simulation image and the target frame image, pose data when the physical camera collects the target frame image is determined.
In the embodiment of the present invention, the target frame image includes a plurality of feature point data. To improve pose determination efficiency, a target frame image including the plurality of feature point data is extracted from the target video data. The matching result between the simulated image and the target frame image makes the position deviation between the two explicit; based on this position deviation, the pose data of the physical camera at the time it captured the target frame image is estimated through a preset algorithm, which effectively reduces labor cost and improves pose determination efficiency. For example, the preset algorithm may be the Random Sample Consensus (RANSAC) algorithm or a Perspective-n-Point (PnP) algorithm for computing the camera pose.
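As a concrete illustration of this step, the following is a minimal Python sketch, assuming OpenCV is available, that recovers a camera pose from 2D-3D feature point correspondences with `cv2.solvePnPRansac`. The intrinsic matrix and all point values are invented for the example, and the observed pixel coordinates are synthesized from a known pose so the script is self-contained; this is a sketch of the general technique, not the invention's exact algorithm.

```python
import cv2
import numpy as np

# Intrinsic matrix of the camera; fx = fy = 800 and center (640, 360)
# are illustrative values, not data from the invention.
K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])

# Feature point data in the world (three-dimensional model) coordinate system.
object_points = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0],
                          [1, 1, 0], [0, 0, 1], [1, 0, 1]], dtype=np.float64)

# Pixel coordinates of the same points on the target frame image; here they
# are synthesized from a known pose so the example runs on its own.
rvec_true = np.array([[0.10], [0.20], [0.05]])
tvec_true = np.array([[0.50], [-0.30], [5.00]])
image_points, _ = cv2.projectPoints(object_points, rvec_true, tvec_true, K, None)

# Estimate the extrinsic parameters (pose) from the 2D-3D correspondences.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_points, image_points, K, None)
R, _ = cv2.Rodrigues(rvec)  # rotation matrix from the Rodrigues vector
print(ok, rvec.ravel(), tvec.ravel())  # recovered pose approximates the true one
```

RANSAC is used here because, in practice, some 2D-3D correspondences are outliers; the robust estimator discards them while solving for the pose.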
According to the pose determination method provided by this embodiment, the pose data of the physical camera at the time it captured the target frame image is determined based on the matching result between the simulated image in the virtual-world coordinate system and the target frame image in the real-world coordinate system; this effectively reduces manual measurement cost, makes the determination of pose data more efficient and convenient, and thereby improves the efficiency of determining pose data.
In an optional implementation manner, in order to facilitate the construction of the simulated image, a virtual camera corresponding to the physical camera is deployed in the three-dimensional model, so that the pose data of the simulated image acquired by the virtual camera is used for restoring the pose of the target frame image acquired by the physical camera, and the purpose of improving the determination efficiency of the pose data is achieved.
The mapping relationship between image plane coordinates and the world coordinate system is expressed as:

$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= \underbrace{\begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}}_{\text{intrinsic parameters}}
\underbrace{\begin{bmatrix} R & T \end{bmatrix}}_{\text{extrinsic parameters}}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
$$

where $u$ and $v$ denote the pixel coordinates of the feature point data in the image plane; $X$, $Y$, $Z$ denote the three-dimensional coordinates of the feature point data in the world coordinate system; $R$ denotes the rotation matrix of the virtual camera; $T$ denotes the translation vector of the virtual camera; $f_x$ and $f_y$ are the focal lengths of the simulated image; $u_0$ and $v_0$ are the center point coordinates of the simulated image; and $s$ is a scale factor. The intrinsic parameters of the virtual camera are the same as those of the physical camera and are known data provided by the physical camera's manufacturer. The extrinsic parameters of the virtual camera are the pose data of the physical camera that ultimately need to be determined. To ensure that the simulated image can be successfully constructed, initial pose parameters of the virtual camera are pre-configured, so that the direction of adjustment is clear during subsequent adjustment, improving adjustment efficiency.
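To make the formula concrete, here is a small NumPy sketch of the projection it describes; the intrinsic values and the pose used are invented for the example.

```python
import numpy as np

def project(point_world, K, R, T):
    """Map world coordinates (X, Y, Z) to pixel coordinates (u, v) via
    s * [u, v, 1]^T = K [R | T] [X, Y, Z, 1]^T."""
    p_cam = R @ point_world + T      # world -> camera coordinates
    uvw = K @ p_cam                  # apply the intrinsic matrix
    return uvw[:2] / uvw[2]          # divide out the scale factor s

K = np.array([[800.0,   0.0, 640.0],   # f_x, u_0 (illustrative values)
              [  0.0, 800.0, 360.0],   # f_y, v_0
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                          # no rotation, for simplicity
T = np.array([0.0, 0.0, 5.0])          # camera 5 units from the world origin
print(project(np.array([1.0, 0.5, 0.0]), K, R, T))  # -> [800. 440.]
```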
The embodiment provides a pose determination method, which can be applied to an electronic device such as a tablet or a computer. FIG. 3 is a flow chart of a pose determination method according to an embodiment of the present invention; as shown in FIG. 3, the process includes the following steps:
Step S301 extracts a target frame image from the target video data. Please refer to step S201 in the embodiment shown in fig. 2 in detail, which is not described herein.
Step S302, constructing a three-dimensional model of the target scene. Please refer to step S202 in the embodiment shown in fig. 2, which is not described herein.
Step S303, extracting a plurality of feature point data in the three-dimensional model, and constructing a simulation image based on the plurality of feature point data. Please refer to step S203 in the embodiment shown in fig. 2 in detail, which is not described herein.
Step S304, based on the matching result between the simulation image and the target frame image, pose data when the physical camera collects the target frame image is determined.
Specifically, the step S304 includes:
In step S3041, the simulated image is superimposed on the target frame image, and projection coordinates of each feature point data projected onto the target frame image are determined respectively.
In the embodiment of the invention, in order to determine whether the positions of the plurality of feature point data on the target frame image are different from the positions of the plurality of feature point data on the simulation image, the simulation image is superimposed on the target frame image, and then the projection coordinates of each feature point data projected on the target frame image are respectively determined, so that the position deviation of the projection positions of the plurality of feature point data on the target frame image and the actual positions of the plurality of feature point data on the target frame image can be more intuitively and rapidly determined.
Step S3042, determining a matching result between the simulation image and the target frame image based on the coordinate difference between the point coordinates and the corresponding projection coordinates of each feature point data on the target frame image.
In the embodiment of the invention, in order to determine whether the simulated image obtained by the virtual camera according to the initial pose data is the same as the pose data when the physical camera collects the target frame image, the coordinate difference between the point coordinate of each characteristic point data on the target frame image and the corresponding projection coordinate is respectively determined, and then the matching condition between the simulated image and the target frame image is detected through the coordinate difference corresponding to each characteristic point data, so that a matching result is obtained, the adjustment direction of the initial pose data can be clarified according to the matching result, and the adjustment efficiency is improved.
In some alternative embodiments, step S3042 includes:
Step a1, performing variance processing on the coordinate differences between the point coordinates of each feature point data on the target frame image and the corresponding projection coordinates, to obtain a position deviation value of the plurality of feature point data between the simulated image and the target frame image.
Step a2, if the position deviation value is smaller than or equal to a preset threshold, determining that the matching result between the simulated image and the target frame image is the same.
Step a3, if the position deviation value is larger than the preset threshold, determining that the matching result between the simulated image and the target frame image is different.
Specifically, in order to improve the matching efficiency and save the calculation cost, after determining the coordinate differences corresponding to each feature point data, variance processing is performed on all the coordinate differences to determine the overall position deviation condition of the plurality of feature point data, so as to obtain the position deviation values of the plurality of feature point data between the simulation image and the target frame image.
The preset threshold may be understood as the largest position deviation value at which a positional difference is still considered reasonable. If the position deviation value is smaller than or equal to the preset threshold, the positional differences of the plurality of feature point data between the simulated image and the target frame image are reasonable, and the matching result between the simulated image and the target frame image can be judged to be the same. If the position deviation value is larger than the preset threshold, those positional differences are unreasonable, and the matching result can be judged to be different. A minimal sketch of this check follows.
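The sketch below assumes that the "variance processing" is the variance of the per-point coordinate differences, which is one plausible reading (the text does not pin down the exact statistic), and that the point correspondences are given as NumPy arrays.

```python
import numpy as np

def match_result(points_on_frame, projected_points, preset_threshold):
    """Steps a1-a3: variance of per-point coordinate differences as the
    position deviation value, compared against a preset threshold."""
    # a1: coordinate difference of each feature point, then variance processing
    diffs = np.linalg.norm(points_on_frame - projected_points, axis=1)
    position_deviation = np.var(diffs)
    # a2 / a3: "same" if within the threshold, "different" otherwise
    return position_deviation <= preset_threshold

# Illustrative coordinates; a real threshold would be tuned for the scene.
observed = np.array([[100.0, 200.0], [150.0, 250.0], [300.0, 120.0]])
projected = np.array([[101.0, 199.0], [151.5, 251.0], [299.0, 121.0]])
print(match_result(observed, projected, preset_threshold=4.0))  # True -> same
```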
Step S3043, based on the matching result, adjusting initial pose data to obtain pose data when the physical camera collects the target frame image.
In the embodiment of the invention, the virtual camera is the camera corresponding to the physical camera in the virtual world, so the initial pose data can be taken as an estimate of the pose data of the physical camera at the time it captured the target frame image. Based on the matching result, it can be determined whether the pose of the virtual camera when capturing the simulated image is the same as the pose of the physical camera when capturing the target frame image, and hence whether the initial pose data needs targeted adjustment, thereby improving the efficiency of determining the pose data of the physical camera.
In some alternative embodiments, step S3043 includes:
Step b1, if the matching results are the same, taking the initial pose data as the pose data of the physical camera at the time it captured the target frame image.
Step b2, if the matching results are different, adjusting the initial pose data until the matching results are the same, and taking the adjusted initial pose data as the pose data of the physical camera at the time it captured the target frame image.
Specifically, if the matching results are the same, the pose of the virtual camera when capturing the simulated image is the same as the pose of the physical camera when capturing the target frame image; the initial pose data needs no adjustment and is used directly as the pose data of the physical camera at the time it captured the target frame image.
If the matching results are different, the pose of the virtual camera when capturing the simulated image differs from the pose of the physical camera when capturing the target frame image, and the pose data must be determined by adjusting the initial pose data. Therefore, steps S302 through S3043 are executed repeatedly in a loop until the matching results are the same, and the adjusted initial pose data is then taken as the pose data of the physical camera at the time it captured the target frame image (sketched below).
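The repeat-until-match loop can be sketched as follows. `render_simulated` and `adjust` are hypothetical callbacks standing in for the simulated-image construction and the (possibly manual, VR-assisted) pose adjustment, and `match_result` is the function from the sketch above; none of these names come from the invention itself.

```python
def determine_pose(initial_pose, render_simulated, observed_points,
                   preset_threshold, adjust):
    """Repeat steps S302-S3043 until the matching results are the same.

    render_simulated(pose) -> projected feature point coordinates (hypothetical)
    adjust(pose, observed, projected) -> new pose, e.g. via a VR display device
    """
    pose = initial_pose
    while True:
        projected = render_simulated(pose)  # project feature points at this pose
        if match_result(observed_points, projected, preset_threshold):
            return pose  # adjusted initial pose data = physical camera's pose
        pose = adjust(pose, observed_points, projected)
```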
In some alternative implementations, the initial pose data may be adjusted through a virtual reality display device. Pose data refers to position data and attitude data: the position data comprises the three-dimensional position (x, y, z) of the physical camera in the world coordinate system, and the attitude data comprises roll, yaw (heading) and pitch. That is, the pose data consists of 6 dimensions (see the sketch after this paragraph). Therefore, to improve adjustment efficiency and make the adjusted initial pose data better match human perception, a virtual reality display device is used to assist the adjustment. In an example, the virtual reality (VR) display device is a wearable device (e.g., a head-mounted display device). When adjusting the initial pose data with the VR display device, the combined image of the simulated image superimposed on the target frame image can be sent to the VR display device for display, and the user of the VR display device adjusts the initial pose data by adjusting head position and viewing angle, which improves operating efficiency and greatly reduces the expertise required of personnel. In another example, the VR display device may be an electronic display screen, and the user adjusts the initial pose data with left and right handles, likewise improving operating efficiency and reducing the expertise required.
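A small sketch of the 6-dimensional pose data described above. The Z-Y-X Euler composition used in `rotation_matrix` is an assumption for illustration, since the text does not fix an angle convention.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Pose:
    """Pose data: position (x, y, z) in the world coordinate system plus
    roll / yaw (heading) / pitch attitude angles in radians -- 6 dimensions."""
    x: float
    y: float
    z: float
    roll: float
    yaw: float
    pitch: float

    def rotation_matrix(self) -> np.ndarray:
        # Z-Y-X Euler composition (yaw about z, pitch about y, roll about x);
        # this convention is an assumption, the source does not specify one.
        cy, sy = np.cos(self.yaw), np.sin(self.yaw)
        cp, sp = np.cos(self.pitch), np.sin(self.pitch)
        cr, sr = np.cos(self.roll), np.sin(self.roll)
        Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
        Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
        Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
        return Rz @ Ry @ Rx

pose = Pose(x=0.5, y=-0.3, z=5.0, roll=0.05, yaw=0.10, pitch=0.20)
print(pose.rotation_matrix())
```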
According to the pose determining method provided by the embodiment, based on the matching result between the simulated image in the virtual world coordinate system and the target frame image in the real world coordinate system, whether the pose of the virtual camera when the virtual camera collects the simulated image is the same as the pose of the physical camera when the physical camera collects the target frame image can be effectively evaluated, and further when the simulated image is different from the target frame image, the purpose of quickly determining the pose data of the physical camera when the physical camera collects the target frame image can be achieved through the targeted adjustment of the initial pose data of the virtual camera.
The embodiment provides a pose determination method, which can be applied to an electronic device such as a tablet or a computer. FIG. 4 is a flow chart of a pose determination method according to an embodiment of the present invention; as shown in FIG. 4, the process includes the following steps:
step S401, extracting a target frame image from the target video data.
Specifically, the step S401 includes:
step S4011, based on a result of performing target detection on the target video data, a target frame image is determined and extracted.
In an embodiment of the invention, the target frame image is made up of at least one static object. In order to improve the determination accuracy of pose data and avoid the interference of dynamic objects, target detection is performed on target video data to identify frame images only containing static objects in the target video data, so that the frame images are extracted as target frame images.
In an example, based on the result of the target detection, there may be a plurality of frames of frame images containing only the static object in the target video data, and therefore, one frame image may be randomly selected from the plurality of frames of frame images containing only the static object as the target frame image by means of random sampling, and extracted.
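A sketch of this selection step follows; `detect` is a placeholder for any object detector returning class labels, and the static-class whitelist is invented for illustration — the invention does not mandate a particular model or class list.

```python
import random

# Illustrative whitelist of static object classes; a real deployment
# would define its own.
STATIC_CLASSES = {"building", "pole", "traffic sign", "bench"}

def pick_target_frame(frames, detect):
    """Select, by random sampling, one frame image containing only static objects.

    detect(frame) is a placeholder for any object detector returning the
    class labels of the objects found in the frame.
    """
    static_only = []
    for frame in frames:
        labels = detect(frame)
        if labels and all(label in STATIC_CLASSES for label in labels):
            static_only.append(frame)
    return random.choice(static_only) if static_only else None
```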
Step S402, constructing a three-dimensional model of the target scene.
Step S403, extracting a plurality of feature point data in the three-dimensional model, and constructing a simulation image based on the plurality of feature point data.
Step S404, based on the matching result between the simulation image and the target frame image, pose data when the physical camera collects the target frame image is determined.
The pose determining method provided by the embodiment can improve the stability of the target frame image, avoid the interference of invalid data and further improve the determining accuracy of pose data.
In some optional embodiments, the target frame image is pulled from the streaming media server in real time, so that key visual information can be obtained from the video data captured by the physical camera in real time and the camera's pose data can be better determined. In one example, the streaming server may be a local GBS (Game Broadcasting Server, a server for receiving, encoding, and transmitting real-time audio-video streams of game content to users). The target frame image may be pulled from the GBS via the Real-Time Streaming Protocol (RTSP); see the RTSP sketch following this passage.

In some alternative implementation scenarios, as shown in FIG. 5, the structural framework of the streaming media server may be an Internet of Things system comprising a device terminal layer, a national standard platform layer, a simple global business service layer, and an application layer. The device terminal layer comprises various Internet of Things terminal devices, such as network cameras and network video recorders; these devices can sense the environment, collect data, and communicate with other devices, and the layer is mainly used to collect environmental data and transmit it to the national standard platform for processing and analysis. The national standard platform layer receives the data transmitted by the terminal devices and processes, analyzes and stores it. The simple global business service layer provides various business applications and services, integrates the data and functions provided by the national standard platform, and delivers personalized, intelligent services to users through technologies such as cloud computing, big data analysis and artificial intelligence. The application layer is the uppermost application interface of the Internet of Things system for direct use by users; it may be a mobile phone application, a web application, a desktop application, and the like, through which users receive services from the simple global business service layer and interact with the Internet of Things system. The application layer should be simple and easy to use, providing rich functions and personalized setting options.
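As a sketch of pulling a frame over RTSP, assuming OpenCV and an illustrative stream URL (the real address and credentials depend on the streaming media server deployment):

```python
import cv2

# Illustrative RTSP address; not an address from the invention.
STREAM_URL = "rtsp://192.168.1.10:554/live/stream1"

cap = cv2.VideoCapture(STREAM_URL)
if not cap.isOpened():
    raise RuntimeError("failed to connect to the streaming server")

ok, frame = cap.read()  # pull one frame from the real-time stream
cap.release()
if ok:
    cv2.imwrite("candidate_frame.jpg", frame)  # candidate target frame image
```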
In some alternative implementation scenarios, as shown in FIG. 6, the pose determination process of the virtual camera may be as follows:
Step S601, pulling target video data for collecting a target scene in real time through a streaming media server, and extracting a target frame image.
Step S602, constructing a three-dimensional model of the target scene.
Step S603, extracting a plurality of feature point data in the three-dimensional model, and constructing a simulation image based on the plurality of feature point data and the initial pose data of the virtual camera.
In step S604, the simulated image is superimposed on the target frame image, and the matching result between the simulated image and the target frame image is determined based on the coordinate difference between the point coordinates of each feature point data on the target frame image and the corresponding projection coordinates.
Step S605, based on the matching result, the initial pose data is adjusted through the virtual reality display device, and the pose data when the physical camera collects the target frame image is obtained.
In the pose determination method provided by the invention, the pose data of the physical camera is determined by virtual measurement, which greatly saves labor cost. Moreover, in the pose evaluation and adjustment step, a virtual reality display device is used for the adjustment, so that the adjusted initial pose data better matches human intuition, which helps improve operating efficiency.
The embodiment also provides a pose determining device, which is used for realizing the embodiment and the preferred implementation, and is not described in detail. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
The present embodiment provides a pose determining apparatus, as shown in fig. 7, including:
the first extraction module 701 is configured to extract a target frame image from target video data, where the target video data is obtained by performing image acquisition on a target scene by using a physical camera;
an acquisition module 702, configured to construct a three-dimensional model of a target scene;
A second extraction module 703, configured to extract a plurality of feature point data in the three-dimensional model, and construct a simulation image based on the plurality of feature point data;
The processing module 704 is configured to determine pose data when the physical camera collects the target frame image based on a matching result between the analog image and the target frame image, where the target frame image includes a plurality of feature point data.
In some optional embodiments, the simulated image is obtained based on initial pose data of a virtual camera, the virtual camera corresponding to the physical camera and deployed on the three-dimensional model; the processing module 704 includes: a first execution unit for superimposing the simulated image on the target frame image and respectively determining the projection coordinates at which each feature point data is projected onto the target frame image; a matching unit for determining a matching result between the simulated image and the target frame image based on the coordinate difference between the point coordinates of each feature point data on the target frame image and the corresponding projection coordinates; and an adjusting unit for adjusting the initial pose data based on the matching result to obtain the pose data of the physical camera at the time it captured the target frame image.
In some alternative embodiments, the matching unit includes: the first processing unit is used for carrying out variance processing according to the coordinate difference between the point coordinate of each characteristic point data on the target frame image and the corresponding projection coordinate to obtain the position deviation value of a plurality of characteristic point data between the simulation image and the target frame image; the first determining unit is used for determining that the matching result between the simulation image and the target frame image is the same if the position deviation value is smaller than or equal to a preset threshold value; and the second determining unit is used for determining that the matching results between the simulation image and the target frame image are different if the position deviation value is larger than a preset threshold value.
In some alternative embodiments, the adjustment unit comprises: the second execution unit is used for taking the initial pose data as the pose data when the physical camera acquires the target frame image if the matching results are the same; and the third execution unit is used for adjusting the initial pose data if the matching results are different until the matching results are the same, and taking the adjusted initial pose data as the pose data when the physical camera acquires the target frame image.
In some alternative embodiments, the initial pose data is adjusted by a virtual reality display device.
In some alternative embodiments, the first extraction module 701 includes: and the image extraction module is used for determining and extracting a target frame image based on the result of target detection on the target video data, wherein the target frame image is composed of at least one static object.
In some alternative embodiments, the target frame image is pulled from the streaming server in real-time.
Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.
The pose determination device in this embodiment is presented in the form of functional units, where a unit may be an ASIC (Application-Specific Integrated Circuit), a processor and memory executing one or more software or firmware programs, and/or other devices that can provide the above functions.
The embodiment of the invention also provides electronic equipment, which is provided with the pose determining device shown in the figure 7.
Referring to FIG. 8, FIG. 8 is a schematic structural diagram of an electronic device according to an alternative embodiment of the present invention. As shown in FIG. 8, the electronic device includes: one or more processors 10, a memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executing within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in FIG. 8.
The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip, among others. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a general-purpose array logic, or any combination thereof.
Wherein the memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform a method for implementing the embodiments described above.
The memory 20 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created according to the use of the electronic device, etc. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The electronic device further comprises input means 30 and output means 40. The processor 10, memory 20, input device 30, and output device 40 may be connected by a bus or other means, for example in fig. 8.
The input device 30 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device, such as a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, joystick, and the like. The output device 40 may include a display device, auxiliary lighting means (e.g., LEDs), tactile feedback means (e.g., vibration motors), and the like. Such display devices include, but are not limited to, liquid crystal displays, light-emitting diode displays, and plasma displays. In some alternative implementations, the display device may be a touch screen.
The embodiments of the present invention also provide a computer-readable storage medium. The method according to the embodiments described above may be implemented in hardware or firmware, or realized as computer code that can be recorded on a storage medium, or as computer code originally stored on a remote storage medium or a non-transitory machine-readable storage medium and downloaded over a network to be stored on a local storage medium, so that the method described herein can be processed by software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware. The storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid-state disk, or the like; further, the storage medium may also comprise a combination of memories of the above kinds. It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes a storage element that can store or receive software or computer code which, when accessed and executed by the computer, processor or hardware, implements the methods illustrated by the above embodiments.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims (10)

1. A pose determination method, the method comprising:
extracting a target frame image from target video data, wherein the target video data is obtained by acquiring an image of a target scene through a physical camera;
Constructing a three-dimensional model of the target scene;
extracting a plurality of characteristic point data in the three-dimensional model, and constructing a simulation image based on the characteristic point data;
and determining pose data when the physical camera acquires the target frame image based on a matching result between the simulation image and the target frame image, wherein the target frame image comprises the plurality of feature point data.
2. The method according to claim 1, wherein:
The simulation image is obtained based on initial pose data of a virtual camera, and the virtual camera corresponds to the physical camera and is deployed on the three-dimensional model;
the determining pose data when the physical camera collects the target frame image based on the matching result between the simulation image and the target frame image comprises the following steps:
superposing the simulation image on the target frame image, and respectively determining the projection coordinates of each characteristic point data projected onto the target frame image;
determining a matching result between the simulated image and the target frame image based on a coordinate difference between a point coordinate of each feature point data on the target frame image and a corresponding projection coordinate;
and adjusting the initial pose data based on the matching result to obtain pose data when the physical camera acquires the target frame image.
3. The method according to claim 2, wherein the determining a matching result between the simulated image and the target frame image based on a coordinate difference between a point coordinate of each feature point data on the target frame image and a corresponding projection coordinate includes:
Performing variance processing according to the coordinate difference between the point coordinate of each characteristic point data on the target frame image and the corresponding projection coordinate to obtain the position deviation value of the plurality of characteristic point data between the simulated image and the target frame image;
If the position deviation value is smaller than or equal to a preset threshold value, determining that the matching result between the simulation image and the target frame image is the same;
and if the position deviation value is larger than the preset threshold value, determining that the matching result between the simulation image and the target frame image is different.
4. The method of claim 3, wherein adjusting the initial pose data based on the matching result to obtain pose data of the physical camera when the physical camera collects the target frame image comprises:
if the matching results are the same, the initial pose data are used as pose data when the physical camera acquires the target frame image;
And if the matching results are different, adjusting the initial pose data until the matching results are the same, and taking the adjusted initial pose data as the pose data when the physical camera acquires the target frame image.
5. The method of claim 4, wherein the initial pose data is adjusted by a virtual reality display device.
6. The method of claim 1, wherein extracting the target frame image from the target video data comprises:
And determining and extracting the target frame image based on the result of target detection on the target video data, wherein the target frame image is composed of at least one static object.
7. The method of claim 6, wherein the target frame image is pulled from a streaming server in real time.
8. A pose determination apparatus, the apparatus comprising:
The acquisition module is used for constructing a three-dimensional model of the target scene;
the first extraction module is used for extracting a plurality of characteristic point data in the three-dimensional model and constructing a simulation image based on the plurality of characteristic point data;
The second extraction module is used for extracting target frame images from target video data, the target video data are obtained by image acquisition of the target scene through a physical camera, and the target frame images comprise the plurality of characteristic point data;
And the processing module is used for determining pose data when the physical camera acquires the target frame image based on a matching result between the simulation image and the target frame image.
9. An electronic device, comprising:
A memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the pose determination method according to any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon computer instructions for causing a computer to execute the pose determination method according to any of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410078390.8A CN117911498A (en) 2024-01-18 2024-01-18 Pose determination method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN117911498A 2024-04-19

Family

ID=90688689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410078390.8A Pending CN117911498A (en) 2024-01-18 2024-01-18 Pose determination method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117911498A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination