CN115588069A - Scene reconstruction method and device, equipment and storage medium


Info

Publication number
CN115588069A
CN115588069A
Authority
CN
China
Prior art keywords
data
scene
ndc
coordinate system
coordinate
Prior art date
Legal status
Pending
Application number
CN202211404635.9A
Other languages
Chinese (zh)
Inventor
梅新岩 (Mei Xinyan)
谢启宇 (Xie Qiyu)
杨辰 (Yang Chen)
Current Assignee
Nanjing Opper Software Technology Co., Ltd.
Original Assignee
Nanjing Opper Software Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Nanjing Opper Software Technology Co., Ltd.
Priority to CN202211404635.9A
Publication of CN115588069A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects


Abstract

The application discloses a scene reconstruction method, a scene reconstruction device, equipment and a storage medium. At least two groups of scene data are determined, the at least two groups including scene data determined based on different graphics interfaces. Each group of scene data comprises coordinate data of each feature point in a group of feature points, the coordinate data comprising first normalized device coordinate system (NDC) coordinates and first texture coordinates; the first NDC coordinates in different groups of scene data adopt the same target NDC, and the first texture coordinates in different groups of scene data adopt the same target screen coordinate system. For each group of scene data in the at least two groups, the first NDC coordinates in the scene data are converted into world coordinates to obtain data to be rendered; the at least two groups of data to be rendered are spliced to obtain target scene data; and the target scene data is rendered to obtain a target virtual scene.

Description

Scene reconstruction method and device, equipment and storage medium
Technical Field
The present application relates to mobile communications technologies, and in particular, to a method and an apparatus for scene reconstruction, a device, and a storage medium.
Background
With the development and popularization of technologies such as intelligent stereoscopic display, virtual reality/augmented reality technology has advanced rapidly, and scene reconstruction is a key step of virtual reality/augmented reality technology. Three-dimensional reconstruction based on multi-view image data requires that the data acquisition terminals acquiring image data from different views share the same world coordinate system, i.e., that the X/Y/Z axes are uniformly oriented. The graphics Application Program Interfaces (APIs) used for rendering are varied, including OpenGL, Vulkan, Metal, WebGL, WebGPU, and so on, and the coordinate systems they use differ considerably. Because the data acquisition terminals of different platforms adopt different graphics interfaces, their world coordinate systems are inconsistent when image data are acquired, which limits the implementation of scene reconstruction to multiple data acquisition terminals on the same platform.
Disclosure of Invention
The embodiment of the application provides a scene reconstruction method, a scene reconstruction device, equipment and a storage medium, and can realize cross-platform scene reconstruction.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a scene reconstruction method, which comprises the following steps:
determining at least two groups of scene data, wherein the at least two groups of scene data comprise scene data determined based on different graphics interfaces, the scene data comprise coordinate data of each feature point in a group of feature points, the coordinate data comprise first normalized device coordinate system (NDC) coordinates and first texture coordinates, the first NDC coordinates in different groups of scene data adopt the same target NDC, and the first texture coordinates in different groups of scene data adopt the same target screen coordinate system;
for each group of scene data in the at least two groups of scene data, converting a first NDC coordinate in the scene data into a world coordinate to obtain data to be rendered;
splicing at least two groups of data to be rendered to obtain target scene data;
and rendering the target scene data to obtain a target virtual scene.
An embodiment of the present application provides a scene reconstruction device, including:
a first determining module configured to determine at least two groups of scene data, where the at least two groups of scene data include scene data determined based on different graphics interfaces, the scene data include coordinate data of each feature point in a group of feature points, the coordinate data include first normalized device coordinate system (NDC) coordinates and first texture coordinates, the first NDC coordinates in different groups of scene data use the same target NDC, and the first texture coordinates in different groups of scene data use the same target screen coordinate system;
a conversion module configured to convert, for each of the at least two sets of scene data, a first NDC coordinate in the scene data into a world coordinate, resulting in data to be rendered;
the splicing module is configured to splice at least two groups of data to be rendered to obtain target scene data;
and the rendering module is configured to render the target scene data to obtain a target virtual scene.
An embodiment of the present application provides a scene reconstruction device, which includes a processor configured to:
determining at least two groups of scene data, wherein the at least two groups of scene data comprise scene data determined based on different graphics interfaces, the scene data comprise coordinate data of each feature point in a group of feature points, the coordinate data comprise first normalized device coordinate system (NDC) coordinates and first texture coordinates, the first NDC coordinates in different groups of scene data adopt the same target NDC, and the first texture coordinates in different groups of scene data adopt the same target screen coordinate system;
for each group of scene data in the at least two groups of scene data, converting a first NDC coordinate in the scene data into a world coordinate to obtain data to be rendered;
splicing at least two groups of data to be rendered to obtain target scene data;
and rendering the target scene data to obtain a target virtual scene.
An embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program that is stored in the memory and can be run on the processor, and when the processor executes the computer program, the steps in the scene reconstruction method are implemented.
An embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method for reconstructing a scene is implemented.
An embodiment of the present application provides a chip configured to implement the scene reconstruction method. The chip includes a processor configured to call and run a computer program from a memory, so that a device provided with the chip executes the scene reconstruction method.
The scene reconstruction method, device, equipment, and storage medium provided by the embodiment of the application determine at least two groups of scene data, where the at least two groups of scene data comprise scene data determined based on different graphics interfaces, the scene data comprise coordinate data of each feature point in a group of feature points, the coordinate data comprise first normalized device coordinate system (NDC) coordinates and first texture coordinates, the first NDC coordinates in different groups of scene data adopt the same target NDC, and the first texture coordinates in different groups of scene data adopt the same target screen coordinate system; for each group of scene data in the at least two groups, the first NDC coordinates in the scene data are converted into world coordinates to obtain data to be rendered; at least two groups of data to be rendered are spliced to obtain target scene data; and the target scene data are rendered to obtain a target virtual scene. In this way, the data acquisition terminals that acquire the image data from which the scene data are determined may adopt different graphics interfaces, i.e., the terminals are allowed to use different platforms, so the scene reconstruction technique can be implemented across a plurality of data acquisition terminals on different platforms, realizing cross-platform scene reconstruction.
Drawings
Fig. 1 is a schematic diagram of an alternative architecture of a scene reconstruction system provided in an embodiment of the present application;
fig. 2A is a schematic diagram of an alternative architecture of a scene reconstruction system according to an embodiment of the present application;
fig. 2B is a schematic diagram of an alternative architecture of a scene reconstruction system according to an embodiment of the present application;
fig. 3 is an alternative flowchart of a scene reconstruction method according to an embodiment of the present application;
FIG. 4 is an alternative schematic diagram of a target virtual scene provided by an embodiment of the present application;
FIG. 5 is an alternative schematic diagram of coordinate system alignment provided by embodiments of the present application;
fig. 6 is an alternative flowchart of a scene reconstruction method provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of an alternative relationship between a platform and a graphics interface provided by embodiments of the present application;
FIG. 8 is an alternative schematic diagram of a spatial transformation process provided by an embodiment of the present application;
FIG. 9 is an alternative schematic diagram of a crop space, an NDC space, and a screen space provided by embodiments of the present application;
FIG. 10 is a schematic diagram of a screen coordinate system of different graphical interfaces of an embodiment of the present application;
FIG. 11 is an exemplary illustration of a world coordinate system, NDC, screen coordinate system of different graphical interfaces of an embodiment of the present application;
FIG. 12 is a schematic diagram of the transformation effect of the screen coordinate system of the embodiment of the present application;
FIG. 13 is a schematic diagram of spatial relationships of feature points of different data acquisition devices according to an embodiment of the present application;
FIG. 14 is a schematic diagram of a 3D-2D matching point relationship provided in an embodiment of the present application;
FIG. 15 is a schematic diagram illustrating a reconstruction effect of a target virtual scene provided in an embodiment of the present application;
FIG. 16 is a schematic diagram illustrating a reconstruction effect of a target virtual scene provided in an embodiment of the present application;
fig. 17 is an alternative schematic structural diagram of a scene reconstruction apparatus provided in an embodiment of the present application;
fig. 18 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;
fig. 19 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
The embodiment of the application can provide a scene reconstruction method, a scene reconstruction device, scene reconstruction equipment, and a storage medium. In practical application, the scene reconstruction method may be implemented by a scene reconstruction device, and each functional entity in the scene reconstruction device may be implemented cooperatively by the hardware resources of a computer device (e.g., an electronic device such as a terminal device or a network device), such as computing resources (e.g., a processor) and communication resources (e.g., supporting communication via optical cables, cellular links, and the like).
Of course, the embodiments of the present application are not limited to being provided as methods and hardware, and may be provided as a storage medium (storing instructions for executing a scene reconstruction method provided by the embodiments of the present application) in many implementations.
Fig. 1 is a schematic diagram of an application scenario of an embodiment of the present application.
As shown in fig. 1, the scene reconstruction system 100 may include data acquisition terminals 101, 102, ..., 10N. The N data acquisition terminals perform image data acquisition, different data acquisition terminals acquire image data from different viewing angles, and the acquisition ranges of the different terminals contain the same object, so that scene reconstruction is performed on the content within the acquisition ranges of the different terminals based on that same object.
In the embodiment of the present application, the image data acquired by all the data acquisition terminals may be image data of the same object from different angles; for example, a plurality of data acquisition terminals acquire image data of the same human body from different angles so as to reconstruct the posture of the user. The image data may also be of different objects; for example, different data acquisition terminals acquire image data of different users, with different tables placed in front of the different users, so as to create a virtual meeting scene containing a plurality of users.
In this embodiment, the scene reconstruction system 100 may further include a scene reconstruction device 120, where the scene reconstruction device 120 is configured to perform scene reconstruction according to scene data obtained by acquiring image data by different data acquisition terminals.
In an example, as shown in fig. 2A, the scene reconstruction device 120 is one of the data acquisition terminals 101 to 10N in fig. 1; in this case, the terminal performing scene reconstruction also performs image data acquisition.
In an example, as shown in fig. 2B, the scene reconstruction device 120 is a terminal device other than the data acquisition terminal 101 to the data acquisition terminal 10N in fig. 1.
In the embodiment of the application, the scene reconstruction device determines at least two groups of scene data, where the at least two groups of scene data comprise scene data determined based on different graphics interfaces, the scene data comprise coordinate data of each feature point in a group of feature points, the coordinate data comprise first normalized device coordinate system (NDC) coordinates and first texture coordinates, the first NDC coordinates in different groups of scene data adopt the same target NDC, and the first texture coordinates in different groups of scene data adopt the same target screen coordinate system; for each group of scene data in the at least two groups, the first NDC coordinates in the scene data are converted into world coordinates to obtain data to be rendered; at least two groups of data to be rendered are spliced to obtain target scene data; and the target scene data are rendered to obtain a target virtual scene.
For the convenience of understanding of the technical solutions of the embodiments of the present application, the following related technologies of the embodiments of the present application are described below, and the following related technologies may be optionally combined with the technical solutions of the embodiments of the present application as alternatives, and all of them belong to the protection scope of the embodiments of the present application.
Embodiments of a scene reconstruction method, a scene reconstruction device, a scene reconstruction apparatus, and a storage medium according to the embodiments of the present application are described below with reference to schematic diagrams of scene reconstruction systems shown in fig. 1, fig. 2A, or fig. 2B.
An embodiment of the present application provides a scene reconstruction method, which is applied to a sending device, and as shown in fig. 3, the method may include:
s301, the scene reconstruction device determines at least two groups of scene data, the at least two groups of scene data comprise scene data determined based on different graphic interfaces, the scene data comprise coordinate data of each feature point in a group of feature points, the coordinate data comprise coordinates of a first standard device coordinate system (NDC) and first texture coordinates, the first NDC coordinates in different scene data adopt the same target NDC, and second texture coordinates in different scene data adopt the same target screen coordinate system.
In the scene reconstruction system, each data acquisition terminal in a plurality of data acquisition terminals acquires data of a target object to obtain image data, and the target objects corresponding to different data acquisition terminals can be different positions of the same object or different target objects. The image data collected by the data collection terminal can comprise: image color data and depth data.
After the plurality of data acquisition terminals acquire the image data, each data acquisition terminal derives, from its acquired image data, mesh data reflecting the coordinates of the feature points of the acquired object, where the coordinates of the feature points in the mesh data are model coordinates. The data acquisition terminal performs spatial conversion on the mesh data to obtain original scene data. The spaces involved in the spatial conversion process may include: the local (model) space of the data acquisition terminal, world space, view space, clipping space, NDC space, and screen space. The local space adopts a model coordinate system, and the coordinates of the feature points in the model coordinate system are model coordinates; the world space adopts a world coordinate system, and the coordinates of the feature points in the world coordinate system are world coordinates; the view space adopts a view coordinate system, i.e., a camera coordinate system, and the coordinates of the feature points in the camera coordinate system are camera coordinates; the clipping space adopts a clipping coordinate system, and the coordinates of the feature points in the clipping coordinate system are clipping coordinates; the NDC space adopts the NDC, and the coordinates of the feature points in the NDC are NDC coordinates; the screen space adopts a screen coordinate system, and the coordinates of the feature points in the screen coordinate system are screen coordinates, i.e., texture coordinates. The NDC coordinates are three-dimensional, and the texture coordinates are two-dimensional. The original scene data may include: NDC coordinates and texture coordinates.
Because the original scene data obtained by different data acquisition terminals adopt different NDCs and/or different screen coordinate systems, the original scene data are subjected to coordinate system alignment to obtain the scene data, where the NDC adopted by the NDC coordinates in the scene data is the target NDC and the screen coordinate system adopted by the texture coordinates in the scene data is the target screen coordinate system. Here, the NDC coordinates in the scene data are referred to as first NDC coordinates, the NDC coordinates in the original scene data as second NDC coordinates, the texture coordinates in the scene data as first texture coordinates, and the texture coordinates in the original scene data as second texture coordinates.
In the embodiment of the present application, two coordinate systems being different may be understood as differing in at least one of the following: the direction of the same axis and the value range of the same axis. In one example, if the positive direction of the X axis in coordinate system 1 is to the right and the positive direction of the X axis in coordinate system 2 is to the left, coordinate systems 1 and 2 are considered different. In one example, if the X axis in coordinate system 1 ranges from 0 to 1 and the X axis in coordinate system 2 ranges from -1 to 1, coordinate systems 1 and 2 are considered different. In one example, if the positive direction of the X axis in coordinate system 1 is to the right and its Y axis ranges from 0 to 1, while the positive direction of the X axis in coordinate system 2 is to the left and its Y axis ranges from -1 to 1, coordinate systems 1 and 2 are considered different.
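As a concrete illustration of such differences: OpenGL-style NDC uses x, y, z all in [-1, 1] with Y pointing up, Direct3D-, Metal-, and WebGPU-style NDC uses z in [0, 1], and Vulkan additionally points its NDC Y axis downward. The following minimal numpy sketch (the matrix names and the row-vector convention are illustrative, not taken from the application) shows adjustment matrices remapping a point between two such conventions:

```python
import numpy as np

# Remap OpenGL-style NDC (x, y, z in [-1, 1]) to a z-in-[0, 1] convention
# (Direct3D/Metal/WebGPU style).  Row-vector convention: p' = p @ A.
A_remap_z = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.0],   # z' = 0.5 * z + 0.5 * w
    [0.0, 0.0, 0.5, 1.0],
])

# A Vulkan-style convention additionally flips the Y axis (Y points down).
A_flip_y = np.diag([1.0, -1.0, 1.0, 1.0])

p_gl = np.array([0.25, -0.5, -1.0, 1.0])  # z = -1 is the near plane in OpenGL
print(p_gl @ A_remap_z)                   # -> [0.25 -0.5 0. 1.]: near plane lands at z = 0
```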
In the embodiment of the application, the original scene data in one data acquisition terminal form one group of original scene data, and the scene data obtained by coordinate system alignment of that group of original scene data form one group of scene data. The coordinate system alignment that produces the scene data may be performed either by the data acquisition terminal or by the scene reconstruction device.
Since scene data using the same NDC and the same screen coordinate system are obtained by coordinate system alignment of original scene data using different NDCs and/or screen coordinate systems, the at least two groups of scene data can include scene data determined based on different graphics interfaces, and the data acquisition terminals corresponding to different scene data can adopt different graphics interfaces. Because the at least two groups of scene data include scene data determined based on different graphics interfaces, the scene reconstruction system includes data acquisition terminals adopting different graphics interfaces, i.e., data acquisition terminals using different platforms (operating systems).
In the embodiment of the present application, the platform used by a data acquisition terminal may include, but is not limited to: Android, Windows, Linux, IOS, and the like. Data acquisition terminals on different platforms adopt different graphics interfaces. In this embodiment of the application, the graphics interface adopted by a data acquisition terminal can include: DirectX, OpenGL, Vulkan, Metal, etc. The Windows/Linux platform can use the graphics APIs DirectX, OpenGL, and Vulkan; the Android platform may use OpenGL/Vulkan; and the IOS/MacOS platform uses the graphics API Metal.
In practical application, different data acquisition terminals can run a reference graphics interface that encapsulates different types of graphics interfaces. In an example, the reference graphics interface can be the WebGPU, which can encapsulate Vulkan, Metal, and DirectX.
S302, for each group of scene data in the at least two groups of scene data, the scene reconstruction device converts the first NDC coordinates in the scene data into world coordinates to obtain data to be rendered.
After the scene reconstruction device determines the at least two groups of scene data, it converts, for each group of scene data, the first NDC coordinates in the scene data into world coordinates.
For a feature point, inverse coordinate transformation is performed on the first NDC coordinate of the feature point in the scene data to obtain the world coordinate of the feature point.
In the embodiment of the present application, the spaces involved in the spatial conversion process may include: the local (model) space of the data acquisition terminal, world space, view space, clipping space, NDC space, and screen space, and the scene reconstruction device converts the first NDC coordinates in the NDC space back to the world space through the clipping space and the view space in sequence.
The coordinates of a feature point in the local space are converted into the world space through a first matrix M; the coordinates in the world space are converted into the view space through a second matrix V; the coordinates in the view space are converted into the clipping space through a third matrix, namely the projection matrix P; and the coordinates in the clipping space are further processed through a fourth matrix, namely the clipping matrix C, to obtain the first NDC coordinates, where the projection matrix P may be multiplied by an alignment matrix A for alignment. In one example, if the first NDC coordinate is denoted P0 and the world coordinate obtained from the first NDC coordinate is denoted P1, formula (1) holds:

P1 × V × P × C = P0 formula (1);

thus, P1 can be expressed as formula (2):

P1 = Inverse(V × P × C) × P0 formula (2);

where Inverse() denotes the matrix inverse.
In the embodiment of the application, the data acquisition terminal and the scene reconstruction device can determine the NDC from the graphics interface adopted by the data acquisition terminal, but cannot determine the world coordinate system each data acquisition terminal used during spatial conversion. Through the conversion of the NDC-aligned scene data, the scene reconstruction device makes the world coordinates of all feature points use the same world coordinate system, i.e., the world coordinate systems are aligned. The world coordinates of the feature points are three-dimensional coordinates.
S303, splicing at least two groups of data to be rendered by the scene reconstruction equipment to obtain target scene data.
After the scene reconstruction device obtains the world coordinates of the feature points aligned to the same world coordinate system, the data to be rendered are spliced based on the three-dimensional world coordinates and the two-dimensional first texture coordinates of the feature points. Splicing can be understood as joining, in the data to be rendered, the world coordinates that indicate the positions of the feature points in world space, so that all feature points are spliced together to obtain target scene data representing the topological relations of the feature points in the target virtual scene to be reconstructed.
In an example, as shown in fig. 4, the data acquisition terminals in a scene reconstruction system include terminals 401 to 405, where the terminals 401, 402, and 404 use the IOS system, the terminal 403 uses the Android system, and the terminal 405 uses the Windows system. The NDC of the terminals 401, 402, and 404 is coordinate system 421, the NDC of the terminal 403 is coordinate system 422, and the NDC of the terminal 405 is coordinate system 423. The original scene data of each data acquisition terminal are aligned to coordinate system 421 to obtain the scene data of each terminal, and the content acquired by each terminal is spliced based on its scene data to obtain the data to be rendered corresponding to the target virtual scene 430. In fig. 4, when the coordinate systems are aligned, the NDC coordinates in the original acquired data of the terminals 403 and 405 undergo coordinate system alignment, while the NDC coordinates in the original scene data of the terminals 401, 402, and 404 do not need to be aligned.
S304, rendering the target scene data by the scene reconstruction equipment to obtain a target virtual scene.
After the scene reconstruction device determines the target scene data, it renders the target scene data, displays the result on a display interface, and outputs the target virtual scene.
The scene reconstruction method provided by the embodiment of the application determines at least two groups of scene data, where the at least two groups of scene data comprise scene data determined based on different graphics interfaces, the scene data comprise coordinate data of each feature point in a group of feature points, the coordinate data comprise first normalized device coordinate system (NDC) coordinates and first texture coordinates, the first NDC coordinates in different groups of scene data adopt the same target NDC, and the first texture coordinates in different groups of scene data adopt the same target screen coordinate system; for each group of scene data in the at least two groups, the first NDC coordinates in the scene data are converted into world coordinates to obtain data to be rendered; at least two groups of data to be rendered are spliced to obtain target scene data; and the target scene data are rendered to obtain a target virtual scene. In this way, the data acquisition terminals that acquire the image data from which the scene data are determined may adopt different graphics interfaces, i.e., the terminals are allowed to use different platforms, so the scene reconstruction technique can be implemented across a plurality of data acquisition terminals on different platforms, realizing cross-platform scene reconstruction.
In some embodiments, the splicing, by the scene reconstruction device, at least two sets of data to be rendered to obtain target scene data includes: determining the relative pose between different data acquisition terminals in at least two data acquisition terminals corresponding to the at least two groups of data to be rendered based on the world coordinates and the first texture coordinates of the feature points; and splicing corresponding data to be rendered based on the relative pose between different data acquisition terminals to obtain the target scene data.
When splicing the data to be rendered, the scene reconstruction device determines the relative pose between the viewpoints from which the image data corresponding to each group of data to be rendered were acquired; for different data acquisition terminals, the relative pose between these viewpoints can be understood as the relative pose between the cameras of the different data acquisition terminals.
The relative pose of the cameras may be represented as a rotation-translation matrix, where the rotation-translation matrix may include: a rotation matrix that characterizes the rotation angle between the cameras, and a translation matrix that characterizes the translation distance between the cameras.
In the embodiment of the application, the relative pose between the cameras of every two data acquisition terminals can be determined, or only the relative pose between the cameras of data acquisition terminals that share common feature points can be determined.
After the relative poses between the cameras of the different data acquisition terminals are determined, for every two groups of data to be rendered in the at least two groups that have a corresponding relative pose, the two groups are spliced based on the relative pose between the cameras of the two data acquisition terminals; the splicing of all the feature points indicated by the at least two groups of data to be rendered is thus completed, and the target scene data is obtained.
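A minimal sketch of this splicing step, under assumed names (R_ab and t_ab are the rotation matrix and translation vector of the relative pose; the point arrays are world-space feature points of two terminals):

```python
import numpy as np

def compose_pose(R, t):
    """Assemble a 4x4 rotation-translation matrix from a 3x3 rotation R
    and a 3-vector translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def stitch(points_a, points_b, R_ab, t_ab):
    """Map terminal B's world-space feature points into terminal A's frame
    using the relative camera pose, then concatenate both point sets into
    a single target scene point cloud."""
    T = compose_pose(R_ab, t_ab)
    b_h = np.hstack([points_b, np.ones((len(points_b), 1))])  # to homogeneous
    points_b_in_a = (T @ b_h.T).T[:, :3]                      # p' = R @ p + t
    return np.vstack([points_a, points_b_in_a])
```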
In some embodiments, the scene reconstruction device determines, based on the world coordinates and the first texture coordinates of the feature points, a relative pose between different data acquisition terminals in at least two data acquisition terminals corresponding to the at least two sets of data to be rendered, including: respectively taking each data to be rendered in the at least two groups of data to be rendered as first data to be rendered, and executing the following processing on the first data to be rendered:
determining, in the at least two groups of data to be rendered, second data to be rendered that has a common feature point with the first data to be rendered;
and determining the relative pose between a first data acquisition terminal corresponding to the first data to be rendered and a second data acquisition terminal corresponding to the second data to be rendered according to the first NDC coordinates and the first texture coordinates of the common feature point.
In the embodiment of the application, the relative camera pose can be determined for data to be rendered that share common feature points. Here, the two groups of data to be rendered that have a common feature point are referred to as the first data to be rendered and the second data to be rendered, respectively.
When determining the first data to be rendered and the second data to be rendered, any group of data to be rendered can be taken as the first data to be rendered, and the second data to be rendered is searched for among the other groups of data to be rendered in the at least two groups. Here, the second data to be rendered may be determined according to the world coordinates contained in the data to be rendered: when the same world coordinate exists in both the first data to be rendered and a candidate group, the feature point indicated by that world coordinate is a common feature point of the two groups.
After the world coordinates of the common feature points of the first data to be rendered and the second data to be rendered are determined, the world coordinates of the common feature points, i.e., the common world coordinates, and their corresponding first texture coordinates in the first data to be rendered and in the second data to be rendered are obtained; based on the common world coordinates and their first texture coordinates in each of the two groups, the relative pose between the cameras of the data acquisition terminal corresponding to the first data to be rendered and the data acquisition terminal corresponding to the second data to be rendered is determined.
In the embodiment of the present application, the algorithm for determining the relative pose between the cameras may include, but is not limited to, a P3P algorithm.
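One concrete way to realize this step (an illustrative sketch, not the application's prescribed implementation) is OpenCV's P3P solver, which takes exactly four 3D-2D correspondences; the intrinsic matrix K and the synthetic points below are assumptions made for the example:

```python
import numpy as np
import cv2

# Assumed pinhole intrinsics for the second terminal's camera.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.zeros(5)   # no lens distortion assumed

# Four synthetic common feature points (3D world coordinates).
pts_3d = np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 5.0],
                   [0.0, 1.0, 5.0], [1.0, 1.0, 6.0]])

# Simulate their 2D texture (pixel) coordinates with a known ground-truth
# pose, then recover that pose with the P3P solver.
t_true = np.array([0.1, 0.0, 0.0])
pts_2d, _ = cv2.projectPoints(pts_3d, np.zeros(3), t_true, K, dist)

ok, rvec, tvec = cv2.solvePnP(pts_3d, pts_2d, K, dist,
                              flags=cv2.SOLVEPNP_P3P)
R, _ = cv2.Rodrigues(rvec)   # R and tvec form the rotation-translation matrix
```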
In some embodiments, the scene reconstruction device determines at least two sets of scene data, including:
determining original scene data of each data acquisition terminal in at least two data acquisition terminals, where the original scene data comprise second NDC coordinates and second texture coordinates of the feature points, obtained by spatial conversion of the image data acquired by the data acquisition terminal;
and for the original scene data of each data acquisition terminal in the at least two data acquisition terminals, in the case that the coordinate system adopted by the original scene data is not the target coordinate system, performing coordinate system alignment on the original scene data to obtain the scene data corresponding to the original scene data.
In the case that the scene reconstruction device performs the coordinate system alignment, it receives the original scene data sent by the different data acquisition terminals, thereby obtaining at least two groups of original scene data.
Having obtained at least two groups of original scene data, the scene reconstruction device judges whether the coordinate system adopted by each group is the target coordinate system, where the target coordinate system comprises the target NDC and the target screen coordinate system. For one group of original scene data, the coordinate system it adopts is considered not to be the target coordinate system if at least one of the following conditions is met:
the NDC employed is not the target NDC;
the screen coordinate system used is not the target screen coordinate system.
For original scene data whose adopted coordinate system is not the target coordinate system, the scene reconstruction device performs coordinate conversion on the data, converting it into the target coordinate system to obtain the scene data. Here, conversion into the target coordinate system is understood to mean that, after conversion, the NDC coordinates of the scene data adopt the target NDC and the texture coordinates adopt the target screen coordinate system. Original scene data whose coordinate system is already the target coordinate system can be taken directly as scene data without coordinate system conversion. Through this coordinate system conversion, the scene reconstruction device makes the NDC coordinates of every data acquisition terminal adopt the target NDC and the texture coordinates adopt the target screen coordinate system, thereby realizing coordinate system alignment.
In the embodiment of the application, when coordinate system alignment of the original scene data is needed, coordinate conversion of each feature point in the original scene data can be performed based on an adjustment matrix, converting the feature points into the target coordinate system.
For the second NDC coordinates, the corresponding adjustment matrix may be referred to as a first adjustment matrix; through conversion by the first adjustment matrix, second NDC coordinates not belonging to the target NDC are converted into first NDC coordinates in the target NDC.
For the second texture coordinates, the corresponding adjustment matrix may be referred to as a second adjustment matrix; through conversion by the second adjustment matrix, second texture coordinates not belonging to the target screen coordinate system are converted into first texture coordinates in the target screen coordinate system.
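For the texture coordinates, a common concrete case (an illustrative assumption, since the application does not fix the conventions) is a V-axis flip between an origin-at-bottom-left screen coordinate system (OpenGL style) and an origin-at-top-left one (Direct3D/Metal style), i.e., v' = 1 - v. In homogeneous 2D, row-vector form, such a second adjustment matrix looks like:

```python
import numpy as np

# Second adjustment matrix flipping the V axis of a UV (texture) coordinate:
# v' = 1 - v.  Homogeneous 2D, row-vector convention: uv' = uv @ B.
B_flip_v = np.array([
    [1.0,  0.0, 0.0],
    [0.0, -1.0, 0.0],
    [0.0,  1.0, 1.0],
])

uv = np.array([0.25, 0.75, 1.0])   # (u, v, 1)
print(uv @ B_flip_v)               # -> [0.25, 0.25, 1.0]
```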
In this embodiment of the application, when the scene reconstruction device also serves as a data acquisition terminal, the target NDC may be the NDC used by the scene reconstruction device, or an NDC other than the one it uses; similarly, the target screen coordinate system may be the screen coordinate system adopted by the scene reconstruction device, or a screen coordinate system other than the one it adopts.
In one example, as shown in fig. 5, the data acquisition terminals include terminals 501, 502, 503, and 504, which respectively send their original scene data to the terminal 504 serving as the scene reconstruction device. After the terminal 504 receives the original scene data 1 sent by the terminal 501, the original scene data 2 sent by the terminal 502, and the original scene data 3 sent by the terminal 503, and given its own original scene data 4, with the target NDC being the NDC used by the terminal 504 and the target screen coordinate system being the screen coordinate system used by the terminal 504, it determines whether the coordinate systems adopted by the original scene data 1, 2, and 3 are the target coordinate system. Upon determining that the coordinate systems adopted by the original scene data 1 and 2 are not the target coordinate system, it performs coordinate system conversion on the original scene data 1 and 2 to obtain scene data 1 and scene data 2 adopting the target coordinate system; the original scene data 3 serves directly as scene data 3, and the original scene data 4 as scene data 4.
In the embodiment of the present application, coordinate system alignment performed by a data acquisition terminal differs from coordinate system alignment performed by the scene reconstruction device as follows:
in the case that a data acquisition terminal performs the coordinate system alignment, the terminal only judges whether the coordinate system adopted by its own original scene data is the target coordinate system, converts its original scene data into scene data when the coordinate system is not the target coordinate system, and sends the scene data to the scene reconstruction device;
the method comprises the steps that under the condition that a scene reconstruction device carries out coordinate system alignment, original scene data sent by each data acquisition terminal are received, and under the condition that a plurality of groups of received original scene data comprise one or more groups of original scene data with different adopted coordinate systems and target coordinate systems, coordinate system conversion is carried out on the original scene data, so that scene data are obtained.
In some embodiments, the scene reconstruction device further performs the following:
aiming at each data acquisition terminal in the at least two data acquisition terminals, the following processing is respectively executed:
determining a graphical interface used by the data acquisition terminal;
and determining a coordinate system adopted by the original scene data of the data acquisition terminal based on a graphic interface used by the data acquisition terminal.
The scene reconstruction device can receive operating system information or graphics interface information from a data acquisition terminal, where the operating system information identifies the type of operating system adopted by the corresponding data acquisition terminal, and the graphics interface information identifies the type of graphics interface it adopts.
The scene reconstruction equipment determines the type of the graphic interface used by the data acquisition terminal according to the operating system information or the graphic interface information sent from the data acquisition terminal.
Taking the case that the data acquisition terminal sends operating system information: when a data acquisition terminal separate from the scene reconstruction device sends its original scene data to the scene reconstruction device, it can also send its operating system information, so that the scene reconstruction device determines the operating system used by the corresponding data acquisition terminal from the received information and then deduces the graphics interface used by that terminal from the determined operating system.
Taking the case that the data acquisition terminal sends graphics interface information: when a data acquisition terminal separate from the scene reconstruction device sends its original scene data to the scene reconstruction device, it can send information identifying the graphics interface it uses, and the scene reconstruction device judges the type of graphics interface used by the terminal from that information.
In the embodiment of the application, a binding relationship between graphics interfaces and coordinate system identifiers can be established, where different coordinate systems are represented by different coordinate system identifiers. After the scene reconstruction device determines the graphics interface, it determines the coordinate system adopted by the original scene data of the data acquisition terminal based on the graphics interface and the binding relationship.
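Such a binding can be as simple as a lookup table. In the sketch below, the per-API convention facts (NDC depth range, NDC Y direction, UV origin) are standard for these APIs, while the identifier scheme itself is an illustrative assumption:

```python
# Hypothetical binding between graphics interface types and coordinate system
# identifiers: NDC depth range, NDC Y direction, and UV (screen) origin.
API_COORDINATE_SYSTEMS = {
    "OpenGL":  {"ndc_z": (-1.0, 1.0), "ndc_y": "up",   "uv_origin": "bottom-left"},
    "Vulkan":  {"ndc_z": (0.0, 1.0),  "ndc_y": "down", "uv_origin": "top-left"},
    "DirectX": {"ndc_z": (0.0, 1.0),  "ndc_y": "up",   "uv_origin": "top-left"},
    "Metal":   {"ndc_z": (0.0, 1.0),  "ndc_y": "up",   "uv_origin": "top-left"},
}

def coordinate_system_for(graphics_api: str) -> dict:
    """Look up the coordinate system bound to a terminal's graphics interface."""
    return API_COORDINATE_SYSTEMS[graphics_api]
```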
In some embodiments, the aligning the coordinate system of the original scene data by the scene reconstruction device to obtain the scene data corresponding to the original scene data includes:
and under the condition that the second NDC coordinate adopts a first NDC and the first NDC is not the target NDC, converting the first NDC coordinate into an NDC space corresponding to the target NDC according to the first NDC and the target NDC to obtain the first NDC coordinate.
Based on the type of graphics interface adopted by the data acquisition terminal, the scene reconstruction device determines whether the first NDC adopted by the second NDC coordinates sent by that terminal is the target NDC. In some embodiments, the NDC type corresponding to each graphics interface type may be identified; when the NDC corresponding to the graphics interface type of the current data acquisition terminal is not the target NDC, it is determined that the first NDC adopted by the second NDC coordinates is not the target NDC.
Here, in the case where the first NDC is not the target NDC, the difference between the first NDC and the target NDC includes at least one of: the direction of the same axis and the value range of the same axis.
In the case that the first NDC adopted by the second NDC coordinates is not the target NDC, the scene reconstruction device performs coordinate conversion on the second NDC coordinates based on the first adjustment matrix to obtain the first NDC coordinates adopting the target NDC.
In some embodiments, the converting, by the scene reconstruction device, the second NDC coordinates into the NDC space corresponding to the target NDC according to the first NDC and the target NDC, to obtain the first NDC coordinates, includes:
determining a first adjustment matrix according to the first NDC and the target NDC;
and converting the second NDC coordinates according to the first adjustment matrix to obtain the first NDC coordinates.
In this embodiment, the difference between a given first NDC and the target NDC may vary; therefore, the first adjustment matrix is determined according to the current first NDC and the target NDC. Corresponding first adjustment matrices can be associated with the different NDCs in the scene reconstruction device, which then looks up the appropriate first adjustment matrix according to the first NDC and the target NDC.
In an embodiment of the present application, when performing the NDC conversion, the scene reconstruction device may use a first projection conversion matrix together with the first adjustment matrix, where the first projection conversion matrix is used to convert the direction of any one or more of the X, Y, and Z axes and may include a perspective projection matrix or an orthogonal projection matrix, and the first adjustment matrix is used to convert the values of the feature points on the X, Y, and Z axes.
In some embodiments, the aligning the coordinate system of the original scene data by the scene reconstruction device to obtain the scene data corresponding to the original scene data includes:
and under the condition that the second texture coordinate adopts a first screen coordinate system and the first screen coordinate system is not the target screen coordinate system, converting the first texture coordinate into a screen coordinate system space corresponding to the target screen coordinate system according to the first screen coordinate system and the target screen coordinate system to obtain the first texture coordinate.
Based on the type of graphics interface adopted by the data acquisition terminal, the scene reconstruction device determines whether the first screen coordinate system adopted by the second texture coordinates sent by that terminal is the target screen coordinate system. In some embodiments, the screen coordinate system type corresponding to each graphics interface type may be identified; when the screen coordinate system corresponding to the graphics interface type of the current data acquisition terminal is not the target screen coordinate system, it is determined that the first screen coordinate system adopted by the second texture coordinates is not the target screen coordinate system.
Here, in the case where the first screen coordinate system is not the target screen coordinate system, the difference between the first screen coordinate system and the target screen coordinate system includes at least one of: the direction of the same axis and the value range of the same axis.
In the case that the first screen coordinate system adopted by the second texture coordinates is not the target screen coordinate system, the scene reconstruction device performs coordinate conversion on the second texture coordinates based on the second adjustment matrix to obtain the first texture coordinates adopting the target screen coordinate system.
In some embodiments, the converting, by the scene reconstruction device, the second texture coordinates into the screen space corresponding to the target screen coordinate system according to the first screen coordinate system and the target screen coordinate system, to obtain the first texture coordinates, includes:
determining a second adjustment matrix according to the first screen coordinate system and the target screen coordinate system;
and converting the second texture coordinates according to the second adjustment matrix to obtain the first texture coordinates.
In the embodiment of the application, the screen coordinate system is a two-dimensional coordinate system formed by a horizontal direction (U) and a vertical direction (V).
In the embodiment of the application, the difference between a given first screen coordinate system and the target screen coordinate system may vary; therefore, the second adjustment matrix is determined according to the current first screen coordinate system and the target screen coordinate system. Corresponding second adjustment matrices can be associated with the different screen coordinate systems in the scene reconstruction device, which then looks up the appropriate second adjustment matrix according to the first screen coordinate system and the target screen coordinate system.
In this embodiment of the application, when performing the screen coordinate system conversion, the scene reconstruction device may use a second projection conversion matrix together with the second adjustment matrix, where the second projection conversion matrix is used to convert the direction of either of the U and V axes and may include a perspective projection matrix or an orthogonal projection matrix, and the second adjustment matrix is used to convert the values of the feature points on the U and V axes.
The scene reconstruction method provided in the embodiment of the present application is further described below.
The embodiment of the application provides a scene reconstruction method. Based on the cross-platform nature of the WebGPU, the NDC-space data of the device terminals participating in scene reconstruction are aligned in a 3D coordinate system in NDC space, 2D UV coordinate system alignment is performed in screen space, the camera pose of each terminal is then calculated, and the synthesized scene is rendered. On one hand, the cross-platform capability of the WebGPU is exploited, so the algorithm can be deployed on device terminals of different platforms; on the other hand, the introduced coordinate system alignment achieves a unified coordinate space, ensuring the consistency and accuracy of scene reconstruction.
The method provided by the embodiment of the application can be shown in fig. 6, and includes the following stages:
s601, data acquisition stage.
In the data acquisition stage, different data acquisition terminals perform data acquisition 6011. The data acquisition terminals may include: a Windows terminal using the Windows platform (i.e., operating system), a Linux terminal using the Linux platform, an Android terminal using the Android platform, and an IOS/MacOS terminal using the IOS/MacOS platform. The data acquired by a data acquisition terminal may include: image color data, depth data, and the like.
The acquisition objects of different data acquisition terminals are the same or different.
S602, data processing stage.
The data processing stage includes the following functions: spatial coordinate system conversion 6021 and coordinate system alignment 6022.
In the spatial coordinate system conversion stage, each data acquisition terminal performs spatial coordinate system conversion on the acquired data, converting it from the object coordinate system, through the world coordinate system, the camera coordinate system, the clipping coordinate system, and the normalized device coordinate system (NDC) in sequence, to the screen coordinate system. The WebGPU of the data acquisition terminal performs this spatial coordinate system conversion on the acquired data.
In the embodiment of the application, the NDCs used for different graphical interfaces are the same or different, and the screen coordinate systems used are the same or different.
During the coordinate system conversion, or after it, coordinate system alignment is performed on NDC coordinates that adopt different NDCs and on texture coordinates that adopt different screen coordinate systems.
In the coordinate system alignment stage, for a data acquisition terminal, when the NDC it uses is not the target NDC, the NDC data obtained through the NDC conversion is aligned to the target NDC; when the screen coordinate system it uses is not the target screen coordinate system, the conversion result converted to the screen coordinate system is aligned to the target screen coordinate system. After aligning the coordinate systems, the data acquisition terminal sends the aligned data to the reconstruction terminal.
Alternatively, in the coordinate system alignment stage, each data acquisition terminal may send the coordinate-converted data to the reconstruction terminal, and the reconstruction terminal aligns the received NDC data to the target NDC and aligns the received conversion results to the target screen coordinate system.
The reconstruction terminal may be one of the data acquisition terminals, or a terminal other than the data acquisition terminals.
S603, data rendering stage.
The data rendering stage includes the following functions: pose estimation 6031 and topological relation establishment 6032.
The reconstruction terminal estimates the camera pose of each data acquisition terminal from the coordinate-aligned data, where the camera poses represent the rotation-translation matrices between the cameras. The reconstruction terminal can determine the relative poses between the cameras from the three-dimensional data in the NDC coordinate system and the two-dimensional data in the screen coordinate system.
The reconstruction terminal splices the coordinate-aligned data of the acquisition terminals by means of the relative poses between the cameras, finally synthesizing a complete scene.
The stages shown in fig. 6 will be further described below.
Data acquisition phase
In the embodiment of the application, the data acquisition terminals and the reconstruction terminal run WebGPU interfaces, where the WebGPU interface can encapsulate the following graphics APIs: DirectX, OpenGL, Vulkan, Metal, etc. The Windows/Linux platform can use the graphics APIs DirectX, OpenGL, and Vulkan; the Android platform can use OpenGL/Vulkan; the iOS/macOS platform uses the graphics API Metal.
In the scene reconstruction method provided in the embodiment of the present application, the WebGPU algorithm implementing the method is deployed on a WebGPU interface that encapsulates the following graphics APIs: DirectX, Vulkan, and Metal. Since different graphics interfaces apply to different platforms, the WebGPU algorithm provided by the embodiment of the present application can be deployed to a terminal that employs any of the following platforms: the Windows platform, the Linux platform, the Android platform, and the iOS/macOS platform.
As shown in fig. 7, the browser API WebGPU is built on the native graphics APIs Vulkan, Direct3D 12, and Metal, which run on different operating systems and platforms; therefore, the scene reconstruction method provided by the embodiment of the application can be deployed across platforms such as Android, Linux, Windows 10, and iOS.
Data processing stage
In the embodiment of the present application, when the data acquisition terminal performs space coordinate system conversion on the acquired data, the following spaces are involved, as shown in fig. 8: a local space 801, a world space 802, a camera space 803, a clipping space 804, a normalized device coordinate (NDC) space 805, and a screen space 806. The image data collected by the image acquisition terminal adopts the local space, i.e., the object space; the coordinate system of the local space is the local coordinate system, i.e., the object coordinate system; the coordinate system of the world space is the world coordinate system; the coordinate system of the camera space is the camera coordinate system; the coordinate system of the clipping space is the clipping coordinate system; the coordinate system of the NDC space is the NDC; the screen space is also called UV space, and its coordinate system is the screen coordinate system.
The data acquired by the data acquisition terminal adopts an object coordinate system of a local space, and the data acquisition terminal performs space coordinate system conversion on the acquired data to obtain two-dimensional data adopting a screen coordinate system. The spatial coordinate system conversion process is shown in fig. 8, and includes:
Model transformation 81 is performed on object coordinates of the local space 801 to obtain world coordinates of the world space 802, view transformation 82 is performed on the world coordinates to obtain camera coordinates of the camera space 803, projection transformation 83 is performed on the camera coordinates to obtain clipping coordinates of the clipping space 804, perspective division 84 is performed on the clipping coordinates to obtain NDC coordinates of the NDC space 805, and viewport transformation 85 is performed on the NDC coordinates to obtain texture coordinates of the screen space 806. The texture coordinates adopting the screen coordinate system are two-dimensional data, while the other coordinates in the space coordinate system conversion process are three-dimensional data.
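The conversion chain above can be summarized in a short Python sketch; the model, view, and projection matrices are assumed inputs, and the viewport step assumes x and y NDC values in [-1, 1] mapped to a UV space with values in [0, 1] and the origin at the lower left.

import numpy as np

def to_screen_uv(p_obj, m_model, m_view, m_proj):
    """Convert one object-space point (3,) to screen-space UV, mirroring the
    chain: model -> view -> projection -> perspective divide -> viewport."""
    p = np.append(p_obj, 1.0)        # homogeneous object coordinate
    p_world = m_model @ p            # model transformation 81
    p_camera = m_view @ p_world      # view transformation 82
    p_clip = m_proj @ p_camera       # projection transformation 83
    p_ndc = p_clip[:3] / p_clip[3]   # perspective division 84
    # Viewport transformation 85, under the assumed [-1, 1] -> [0, 1] mapping.
    u = (p_ndc[0] + 1.0) / 2.0
    v = (p_ndc[1] + 1.0) / 2.0
    return np.array([u, v])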
The object coordinate system is the coordinate system of the local space, representing the coordinates of all points in the model relative to the origin of the model itself.
The world coordinate system is a coordinate system in the world space, and data adopting the world coordinate system refers to a coordinate position of each point in the model relative to an origin of the world coordinate system.
The camera coordinate system is the coordinate system of the camera space. The camera is located in the world coordinate system and photographs the objects in the scene; the position of an object seen from the camera is its coordinate in the camera space, that is, its position in the world coordinate system interpreted from the camera's point of view.
The clipping coordinate system is the coordinate system of the clipping space, which is the view frustum containing the objects in the field of view based on the viewpoint. If perspective projection is adopted, nearer objects in the field of view appear larger and farther objects appear smaller.
The NDC is the coordinate system of the NDC space. Data adopting the clipping coordinate system are normalized, so that after normalization the x-axis and y-axis values of all coordinates lie in [-1, 1] and the z-axis values of all coordinates lie in [0, 1] or [-1, 1].
The screen coordinate system is the coordinate system of the screen space, a horizontal (U) and vertical (V) coordinate system. All vertices of the objects in the data are converted to the UV coordinate system of the screen space, which represents the 2D coordinates used when generating or sampling a texture.
When the WebGPU algorithm is deployed to a Windows/Linux platform, graphics APIs such as DirectX/OpenGL/Vulkan are generally used; when it is deployed on an Android platform, graphics APIs such as OpenGL/Vulkan are generally used; when it is deployed to an iOS/macOS platform, a graphics API such as Metal is generally used. Differences exist between the NDCs and the screen coordinate systems of these graphics APIs.
As shown in fig. 9, an object 901 needs to be converted to 3D NDC coordinates in the NDC space 904 within the clipping space 903 defined by the camera 902, then mapped into the UV space 905 of the 2D screen, and finally displayed on the display screen. As shown in fig. 10, the origin (0, 0) of the screen coordinate system of the UV space 905 may be at the upper left or at the lower left. Where U and V of the UV space take values from 0 to 1, the midpoint of the UV space is (0.5, 0.5).
Different data acquisition terminals may apply any of the following graphics APIs: DirectX, OpenGL, Vulkan, and Metal.
For the NDC space, as shown in fig. 11, the z-axis of the NDC of each graphics API points inward and the x-axis points rightward; the y-axes of the NDCs of DirectX, OpenGL, and Metal point upward, while the y-axis of the NDC of Vulkan points downward. The z-axis range of the NDCs of DirectX, Metal, and Vulkan is [0, 1], while the z-axis range of the NDC of OpenGL is [-1, 1]; therefore, NDC coordinates from data acquisition terminals adopting different platforms need to undergo NDC alignment.
Taking the NDC of the Metal platform as the target NDC, the coordinate system of the DirectX platform is consistent with that of the Metal platform, so no conversion is needed. For converting from the OpenGL platform to the Metal platform, as shown in formula (3) and formula (4), the perspective projection matrix $M_{proj\text{-}OpenGL}$ and the parallel projection matrix $M_{orth\text{-}OpenGL}$ of the OpenGL platform are left-multiplied by a first adjustment matrix $M'$ to obtain the perspective projection matrix $M_{proj\text{-}metal}$ and the parallel projection matrix $M_{orth\text{-}metal}$ of the Metal platform:

$$M_{proj\text{-}metal} = M' \cdot M_{proj\text{-}OpenGL} \qquad \text{(3)}$$

$$M_{orth\text{-}metal} = M' \cdot M_{orth\text{-}OpenGL} \qquad \text{(4)}$$

where $M'$ remaps the OpenGL NDC z range $[-1, 1]$ to the Metal NDC z range $[0, 1]$:

$$M' = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1}{2} & \frac{1}{2} \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad M_{proj\text{-}OpenGL} = \begin{bmatrix} \frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\ 0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\ 0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{bmatrix}, \quad M_{orth\text{-}OpenGL} = \begin{bmatrix} \frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\ 0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\ 0 & 0 & -\frac{2}{f-n} & -\frac{f+n}{f-n} \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

In these matrices, n is the z coordinate of the near plane of the clipping space, f is the z coordinate of the far plane of the clipping space, t is the y coordinate of the top line of the near plane, b is the y coordinate of the bottom line of the near plane, l is the x coordinate of the left line of the near plane, and r is the x coordinate of the right line of the near plane.
For the conversion of the Vulkan platform to the Metal platform, as shown in fig. 12, the y values in the NDC of the Vulkan platform are flipped to obtain the NDC of the Metal platform.
If the NDC of the OpenGL platform is taken as the target NDC instead, then when the NDC of the Metal platform or the Vulkan platform is converted, the first adjustment matrix is the inverse of the first adjustment matrix used when the NDC of the Metal platform is the target NDC.
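The following is a minimal Python sketch of this NDC alignment, assuming the Metal NDC is the target and the first adjustment matrix M' given above. Applying M' per point after the perspective divide is equivalent to left-multiplying the projection matrix as in formulas (3) and (4), because M' leaves the w component unchanged; the function name and API labels are illustrative.

import numpy as np

# First adjustment matrix M' from formulas (3) and (4): remaps the OpenGL
# NDC z range [-1, 1] to the Metal NDC z range [0, 1], leaving x and y as-is.
M_PRIME = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
])

# Y-flip used when aligning Vulkan NDC to Metal NDC (fig. 12).
M_FLIP_Y = np.diag([1.0, -1.0, 1.0, 1.0])

def align_ndc(ndc, source_api):
    """Align (N, 3) NDC coordinates from a source graphics API to Metal NDC."""
    ndc_h = np.hstack([ndc, np.ones((ndc.shape[0], 1))])  # homogeneous (N, 4)
    if source_api == "opengl":
        ndc_h = ndc_h @ M_PRIME.T
    elif source_api == "vulkan":
        ndc_h = ndc_h @ M_FLIP_Y.T
    # DirectX NDC already matches Metal NDC, so no conversion is needed.
    return ndc_h[:, :3] / ndc_h[:, 3:4]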
The scene reconstruction device converts the NDC coordinates that have been aligned to the target NDC into world space, obtaining world coordinates aligned in the world coordinate system.
Data rendering phase
The data rendering stage involves pose estimation and data splicing.
During pose estimation, the camera poses are estimated from the aligned coordinate points, and the rotation-translation matrices between the cameras are calculated. Here, the relative pose between the cameras of two data acquisition ends can be estimated from the coordinates corresponding to the same points at the different data acquisition ends. For two data acquisition ends A and B, the world coordinates and texture coordinates of multiple target points corresponding to data acquisition end A are acquired, the world coordinates and texture coordinates of the same target points corresponding to data acquisition end B are acquired, and the world coordinates and texture coordinates of the target points are processed with a P3P algorithm to obtain the rotation-translation matrix between the camera of data acquisition end A and the camera of data acquisition end B.
As shown in fig. 13, the projections of the feature points shown with black filling differ between camera 1 and camera 2 because the viewing angles of camera 1 and camera 2 differ. The feature points shown with black filling are the common feature points between the feature points within the capture range of camera 1 and those within the capture range of camera 2. The common feature points can be determined from the world coordinates of the feature points within the capture range of camera 1 and the world coordinates of the feature points within the capture range of camera 2: if a feature point's world coordinates in camera 1 are the same as its world coordinates in camera 2, the feature point is a common feature point.
After the common feature points of camera 1 and camera 2 are determined, the relative pose of camera 1 and camera 2 is determined based on the 3D world coordinates of each common feature point and its texture coordinates (2D coordinates) on the display screen of camera 1 and on the display screen of camera 2 shown in fig. 13.
Here, the world coordinates of each feature point may be determined based on its NDC coordinates in the unified coordinate system.
In the embodiment of the application, the rotation-translation matrix can represent a rotation matrix and a translation matrix, where the rotation matrix represents the amount of attitude change of a feature point from one camera to another.
In the embodiment of the application, a P3P algorithm can be adopted to determine the relative poses of different cameras, that is, of different data acquisition terminals, yielding the coordinates and rotation angles of the different data acquisition terminals in the world coordinate system. P3P uses the geometry of 3 given points: as shown in fig. 14, the input data include 3 pairs of 3D-2D matching points for one camera. Denote the 3D points A, B, and C and the 2D points a, b, and c, where each lowercase letter represents the projection, on the camera imaging plane, of the feature point represented by the corresponding uppercase letter. In addition, P3P needs a pair of verification points to select the correct solution from the possible solutions (denote the verification pair D-d); the optical center of the camera is O.
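As an illustration of this step (the patent does not name a library), the following Python sketch estimates a single camera's pose with OpenCV's solvePnP using the SOLVEPNP_P3P flag, which expects exactly four 3D-2D pairs, matching the three matching points A-a, B-b, C-c plus the verification pair D-d; the intrinsic matrix K and all point values are assumed example data.

import cv2
import numpy as np

# Assumed example data: 3D world points A, B, C plus the verification
# point D, and their 2D projections a, b, c, d in pixel coordinates.
object_points = np.array([[0.0, 0.0, 0.0],
                          [1.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0],
                          [1.0, 1.0, 0.5]], dtype=np.float64)
image_points = np.array([[320.0, 240.0],
                         [400.0, 238.0],
                         [322.0, 160.0],
                         [405.0, 158.0]], dtype=np.float64)
K = np.array([[800.0, 0.0, 320.0],   # assumed camera intrinsics
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None,
                              flags=cv2.SOLVEPNP_P3P)
R, _ = cv2.Rodrigues(rvec)  # rotation matrix of this camera's pose

# Running this once per camera gives (R_A, t_A) and (R_B, t_B); the relative
# pose between the two cameras is then R_AB = R_B @ R_A.T and
# t_AB = t_B - R_AB @ t_A.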
When splicing the data, the scene reconstruction device splices the feature points from the different cameras: the topological relations between the feature points corresponding to different data acquisition terminals are established according to the relative poses between the data acquisition terminals and the world coordinates of their feature points, the coordinate data of the complete scene in the central area, namely the data to be rendered, are finally synthesized, and the data to be rendered are rendered to obtain the reconstructed virtual scene.
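As an illustration of this splicing step, the following Python sketch assumes the relative pose (r_ab, t_ab) between terminals A and B has been estimated as above, with x_b = R_AB · x_a + t_AB; terminal B's world points are expressed in terminal A's frame and the two point sets are concatenated. The function name is illustrative.

import numpy as np

def stitch_points(points_a, points_b, r_ab, t_ab):
    """Merge two (N, 3) point sets: express terminal B's points in terminal
    A's frame via the inverse of the relative pose, then concatenate."""
    # Row-vector form of x_a = R_AB^T (x_b - t_AB).
    points_b_in_a = (points_b - t_ab.reshape(1, 3)) @ r_ab
    return np.vstack([points_a, points_b_in_a])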
In an example, as shown in fig. 15, the image contents outside the range 1501 are the two-dimensional display contents, on the display screen of a display device, of the image data acquired by each data acquisition terminal; the scene reconstruction device splices the data to be rendered corresponding to the image data acquired by each data acquisition terminal to obtain the three-dimensional virtual scene within the range 1501.
According to the scene reconstruction method provided by the embodiment of the application, based on the cross-platform property of WebGPU, the coordinate systems of the device terminals participating in scene reconstruction are aligned in NDC space, the camera pose of each terminal is then calculated, and the synthesized scene information is rendered. On one hand, the cross-platform capability of WebGPU is exploited, so the algorithm can be deployed on device terminals of different platforms; on the other hand, the introduced coordinate system alignment allows terminals of various platforms to work simultaneously in a unified coordinate space, ensuring the consistency and accuracy of scene reconstruction.
The scene reconstruction method provided by the embodiment of the application can further be applied in the following two scenarios:
In the first scenario, when three-dimensional scene reconstruction is performed on intelligent terminals of various platforms, their coordinate systems differ: an intelligent terminal of the Windows/Linux platform may adopt the NDC device coordinate system of DirectX/OpenGL; an intelligent terminal of the Android platform may adopt the NDC device coordinate system of Vulkan/OpenGL; an intelligent terminal of the iOS/macOS platform may adopt the NDC device coordinate system of Metal. The scene reconstruction method provided by the embodiment of the application aligns the three-dimensional data acquired by the intelligent terminals of all platforms to a unified NDC coordinate system, unifies the screen UV data mapped from the three-dimensional data to a unified screen coordinate system, then calculates the camera poses of the intelligent terminals from multiple groups of matched 3D points and 2D UV points by means of the P3P algorithm, and then completes the reconstruction and rendering of the three-dimensional scene.
In the second scenario, such as a multi-player interactive game or the virtual video conference shown in fig. 16, the intelligent terminals of the participants differ. Based on WebGPU, the method can be conveniently deployed to various terminals, and the bottom layer can autonomously select the graphics API to use according to the terminal platform. On one hand, the algorithm is easy to adapt to the terminal platform; on the other hand, the coordinate system alignment solution provided herein ensures the consistency and uniformity of the scene.
Fig. 16 shows a reconstructed virtual scene in which the real users to whom the different virtual characters belong are located at different geographic positions. The data acquisition terminals of the real users respectively acquire images of the real users and the desks behind them, and each data acquisition terminal transmits its data to the scene reconstruction device. The scene reconstruction device maps the desks behind the real users to different positions of the same virtual desk according to their different positions, determines the relative displacements between the cameras of the different data acquisition terminals according to the acquired desks, and splices the images corresponding to the different data acquisition terminals according to the determined relative displacements.
As shown in fig. 17, a scene reconstruction apparatus 1700 of the embodiment of the present application includes:
a first determining module 1701 configured to determine at least two sets of scene data, where the at least two sets of scene data include scene data determined based on different graphics interfaces, the scene data include coordinate data of each feature point in a set of feature points, the coordinate data include first standard device coordinate system NDC coordinates and first texture coordinates, the first NDC coordinates in different sets of scene data use the same target NDC, and the first texture coordinates in different sets of scene data use the same target screen coordinate system;
a conversion module 1702, configured to convert, for each of the at least two sets of scene data, a first NDC coordinate in the scene data into a world coordinate, resulting in data to be rendered;
a splicing module 1703 configured to splice at least two sets of the data to be rendered to obtain target scene data;
a rendering module 1704 configured to render the target scene data to obtain a target virtual scene.
In some embodiments, the stitching module 1703 is further configured to:
determining the relative pose between different data acquisition terminals in at least two data acquisition terminals corresponding to the at least two groups of data to be rendered based on the world coordinates and the first texture coordinates of the feature points;
and splicing corresponding data to be rendered based on the relative pose between different data acquisition terminals to obtain the target scene data.
In some embodiments, the stitching module 1703 is further configured to:
respectively taking each data to be rendered in the at least two groups of data to be rendered as first data to be rendered, and executing the following processing on the first data to be rendered:
determining, in the at least two groups of data to be rendered, second data to be rendered that has common feature points with the first data to be rendered;
and determining the relative pose between a first data acquisition terminal corresponding to the first data to be rendered and a second data acquisition terminal corresponding to the second data to be rendered according to the first NDC coordinate and the first texture coordinate of the common feature points.
In some embodiments, the first determination module 1701 is further configured to:
determining original scene data of each data acquisition terminal in at least two data acquisition terminals, wherein the original scene data comprise a second NDC coordinate and a second texture coordinate of the feature point, which are obtained by space conversion of image data acquired by the data acquisition terminals;
and aiming at the original scene data of each data acquisition terminal in the at least two data acquisition terminals, under the condition that a coordinate system adopted by the original scene data is not a target coordinate system, carrying out coordinate system alignment on the original scene data to obtain the scene data corresponding to the original scene data.
In some embodiments, apparatus 1700 further comprises: a second determination module further configured to:
aiming at each data acquisition terminal in the at least two data acquisition terminals, the following processing is respectively executed:
determining a graphical interface used by the data acquisition terminal;
and determining a coordinate system adopted by the original scene data of the data acquisition terminal based on a graphic interface used by the data acquisition terminal.
In some embodiments, the first determination module 1701 is further configured to:
and under the condition that the second NDC coordinate adopts a first NDC and the first NDC is not the target NDC, converting the second NDC coordinate into the NDC space corresponding to the target NDC according to the first NDC and the target NDC to obtain the first NDC coordinate.
In some embodiments, the first determination module 1701 is further configured to:
determining a first adjustment matrix according to the first NDC and the target NDC;
and converting the second NDC coordinate according to the first adjustment matrix to obtain the first NDC coordinate.
In some embodiments, the first determination module 1701 is further configured to:
and under the condition that the second texture coordinate adopts a first screen coordinate system and the first screen coordinate system is not the target screen coordinate system, converting the second texture coordinate into the screen coordinate system space corresponding to the target screen coordinate system according to the first screen coordinate system and the target screen coordinate system to obtain the first texture coordinate.
In some embodiments, the first determination module 1701 is further configured to:
determining a second adjustment matrix according to the first screen coordinate system and the target screen coordinate system;
and converting the second texture coordinate according to the second adjustment matrix to obtain the first texture coordinate.
In practical applications, the first determining module 1701, the converting module 1702, the splicing module 1703, the rendering module 1704, and the second determining module may be implemented by a processor located on a scene reconstruction device, specifically, a Central Processing Unit (CPU), a Microprocessor Unit (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.
It should be understood by those skilled in the art that the related description of the scene reconstruction apparatus according to the embodiments of the present application can be understood by referring to the related description of the scene reconstruction method according to the embodiments of the present application.
An embodiment of the present application provides an electronic device, fig. 18 is a schematic structural diagram of another optional electronic device provided in an embodiment of the present application, and as shown in fig. 18, an embodiment of the present application provides an electronic device 1800, including:
a processor 181 and a storage medium 182 storing instructions executable by the processor 181, the storage medium 182 communicating with the processor 181 through a communication bus 183; when executed by the processor 181, the instructions perform the scene reconstruction method of one or more of the embodiments described above.
It should be noted that, in practical applications, the various components in the terminal are coupled together by a communication bus 183. It is understood that the communication bus 183 is used to enable connection communication between these components. The communication bus 183 includes a power bus, a control bus, and a status signal bus in addition to a data bus. But for clarity of illustration the various buses are labeled in figure 18 as communication bus 183.
The present application provides a computer storage medium, which is used to store a computer program, where the computer program enables a computer to execute the steps of the scene reconstruction method according to one or more embodiments described above.
Fig. 19 is an exemplary structural diagram of an electronic device 1900 provided in the embodiment of the present application. The electronic device 1900 shown in fig. 19 includes a processor 1910. The electronic device may be implemented as a scene reconstruction device.
The processor 1910 is configured to:
determining at least two groups of scene data, wherein the at least two groups of scene data comprise scene data determined based on different graphic interfaces, the scene data comprise coordinate data of each feature point in a group of feature points, the coordinate data comprise coordinates of a first standard equipment coordinate system (NDC) and first texture coordinates, the first NDC coordinates in different scene data adopt the same target NDC, and first texture coordinates in different scene data adopt the same target screen coordinate system;
for each group of scene data in the at least two groups of scene data, converting a first NDC coordinate in the scene data into a world coordinate to obtain data to be rendered;
splicing at least two groups of data to be rendered to obtain target scene data;
and rendering the target scene data to obtain a target virtual scene.
In this embodiment, the processor 1910 may call and run a computer program from the memory to implement the scene reconstruction method in this embodiment.
Optionally, as shown in fig. 19, the electronic device 1900 may further include a memory 1920. From the memory 1920, the processor 1910 may call and execute a computer program to implement the scene reconstruction method in the embodiment of the present application.
The memory 1920 may be a separate device from the processor 1910 or may be integrated into the processor 1910.
Optionally, as shown in fig. 19, the electronic device 1900 may further include a transceiver 1930, and the processor 1910 may control the transceiver 1930 to communicate with other devices, and in particular, may transmit information or data to other devices or receive information or data transmitted by other devices.
The transceiver 1930 may include a transmitter and a receiver, among other things. The transceiver 1930 may further include one or more antennas.
Optionally, the electronic device 1900 may implement a corresponding process implemented by the electronic device in each method of the embodiment of the present application, and for brevity, details are not described here again. Alternatively, the electronic device 1900 may be a terminal device or a network device.
It should be understood that the processor of the embodiments of the present application may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method embodiments may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in a memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware.
It will be appreciated that the memory in the embodiments of the present application can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of example, but not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
It should be understood that the above memories are exemplary but not limiting; for example, the memory in the embodiments of the present application may also be Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), Direct Rambus RAM (DR RAM), and the like. That is, the memory in the embodiments of the present application is intended to comprise, without being limited to, these and any other suitable types of memory.
The embodiment of the application also provides a computer readable storage medium for storing the computer program.
Optionally, the computer-readable storage medium may be applied to the electronic device in the embodiment of the present application, and the computer program enables the computer to execute the corresponding process implemented by the electronic device in each method in the embodiment of the present application, which is not described herein again for brevity.
Embodiments of the present application also provide a computer program product comprising computer program instructions.
Optionally, the computer program product may be applied to the electronic device in the embodiment of the present application, and the computer program instructions enable the computer to execute corresponding processes implemented by the electronic device in the methods in the embodiment of the present application, which are not described herein again for brevity.
The embodiment of the application also provides a computer program.
Optionally, the computer program may be applied to the electronic device in the embodiment of the present application, and when the computer program runs on a computer, the computer is enabled to execute a corresponding process implemented by the electronic device in each method in the embodiment of the present application, and for brevity, details are not described here again.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. A method for scene reconstruction, the method comprising:
determining at least two groups of scene data, wherein the at least two groups of scene data comprise scene data determined based on different graphic interfaces, the scene data comprise coordinate data of each feature point in a group of feature points, the coordinate data comprise coordinates of a first standard equipment coordinate system (NDC) and first texture coordinates, the first NDC coordinates in different scene data adopt the same target NDC, and first texture coordinates in different scene data adopt the same target screen coordinate system;
for each group of scene data in the at least two groups of scene data, converting a first NDC coordinate in the scene data into a world coordinate to obtain data to be rendered;
splicing at least two groups of data to be rendered to obtain target scene data;
and rendering the target scene data to obtain a target virtual scene.
2. The method according to claim 1, wherein the splicing at least two groups of data to be rendered to obtain target scene data comprises:
determining the relative pose between different data acquisition terminals in at least two data acquisition terminals corresponding to the at least two groups of data to be rendered based on the world coordinates and the first texture coordinates of the feature points;
and splicing corresponding data to be rendered based on the relative pose between different data acquisition terminals to obtain the target scene data.
3. The method according to claim 2, wherein the determining, based on the world coordinates and the first texture coordinates of the feature points, the relative pose between different data acquisition terminals in at least two data acquisition terminals corresponding to the at least two groups of data to be rendered comprises:
respectively taking each data to be rendered in the at least two groups of data to be rendered as first data to be rendered, and executing the following processing on the first data to be rendered:
determining, in the at least two groups of data to be rendered, second data to be rendered that has common feature points with the first data to be rendered;
and determining the relative pose between a first data acquisition terminal corresponding to the first data to be rendered and a second data acquisition terminal corresponding to the second data to be rendered according to the first NDC coordinate and the first texture coordinate of the common feature points.
4. The method of claim 1, wherein determining at least two sets of scene data comprises:
determining original scene data of each data acquisition terminal in at least two data acquisition terminals, wherein the original scene data comprise a second NDC coordinate and a second texture coordinate of the feature point, which are obtained by space conversion of image data acquired by the data acquisition terminals;
and aiming at the original scene data of each data acquisition terminal in the at least two data acquisition terminals, under the condition that a coordinate system adopted by the original scene data is not a target coordinate system, carrying out coordinate system alignment on the original scene data to obtain the scene data corresponding to the original scene data.
5. The method of claim 4, further comprising:
aiming at each data acquisition terminal in the at least two data acquisition terminals, respectively executing the following processing:
determining a graphical interface used by the data acquisition terminal;
and determining a coordinate system adopted by the original scene data of the data acquisition terminal based on a graphic interface used by the data acquisition terminal.
6. The method according to claim 5, wherein the aligning the coordinate system of the original scene data to obtain the scene data corresponding to the original scene data comprises:
and under the condition that the second NDC coordinate adopts a first NDC and the first NDC is not the target NDC, converting the second NDC coordinate into an NDC space corresponding to the target NDC according to the first NDC and the target NDC to obtain the first NDC coordinate.
7. The method of claim 6, wherein the converting the second NDC coordinate into an NDC space corresponding to the target NDC according to the first NDC and the target NDC to obtain the first NDC coordinate comprises:
determining a first adjustment matrix according to the first NDC and the target NDC;
and converting the second NDC coordinate according to the first adjustment matrix to obtain the first NDC coordinate.
8. The method according to claim 5, wherein the aligning the coordinate system of the original scene data to obtain the scene data corresponding to the original scene data comprises:
and under the condition that the second texture coordinate adopts a first screen coordinate system and the first screen coordinate system is not the target screen coordinate system, converting the second texture coordinate into a screen coordinate system space corresponding to the target screen coordinate system according to the first screen coordinate system and the target screen coordinate system to obtain the first texture coordinate.
9. The method according to claim 8, wherein the converting the second texture coordinate into a screen coordinate system space corresponding to the target screen coordinate system according to the first screen coordinate system and the target screen coordinate system to obtain the first texture coordinate comprises:
determining a second adjustment matrix according to the first screen coordinate system and the target screen coordinate system;
and converting the second texture coordinate according to the second adjustment matrix to obtain the first texture coordinate.
10. An apparatus for scene reconstruction, the apparatus comprising:
a first determining module configured to determine at least two sets of scene data, where the at least two sets of scene data include scene data determined based on different graphics interfaces, the scene data include coordinate data of each feature point in a set of feature points, the coordinate data include first standard device coordinate system NDC coordinates and first texture coordinates, the first NDC coordinates in different sets of scene data use the same target NDC, and the first texture coordinates in different sets of scene data use the same target screen coordinate system;
the conversion module is configured to convert a first NDC coordinate in the scene data into a world coordinate for each of the at least two sets of scene data to obtain data to be rendered;
the splicing module is configured to splice at least two groups of data to be rendered to obtain target scene data;
and the rendering module is configured to render the target scene data to obtain a target virtual scene.
11. A scene reconstruction device comprising a processor, wherein the processor is configured to:
determining at least two groups of scene data, wherein the at least two groups of scene data comprise scene data determined based on different graphic interfaces, the scene data comprise coordinate data of each feature point in a group of feature points, the coordinate data comprise coordinates of a first standard equipment coordinate system (NDC) and first texture coordinates, the first NDC coordinates in different scene data adopt the same target NDC, and first texture coordinates in different scene data adopt the same target screen coordinate system;
for each group of scene data in the at least two groups of scene data, converting a first NDC coordinate in the scene data into a world coordinate to obtain data to be rendered;
splicing at least two groups of data to be rendered to obtain target scene data;
and rendering the target scene data to obtain a target virtual scene.
12. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the scene reconstruction method according to any one of claims 1 to 9 when executing the computer program.
13. A storage medium storing an executable program, wherein the executable program, when executed by a processor, implements the scene reconstruction method of any one of claims 1 to 9.
CN202211404635.9A 2022-11-10 2022-11-10 Scene reconstruction method and device, equipment and storage medium Pending CN115588069A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211404635.9A CN115588069A (en) 2022-11-10 2022-11-10 Scene reconstruction method and device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211404635.9A CN115588069A (en) 2022-11-10 2022-11-10 Scene reconstruction method and device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115588069A true CN115588069A (en) 2023-01-10

Family

ID=84782856

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211404635.9A Pending CN115588069A (en) 2022-11-10 2022-11-10 Scene reconstruction method and device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115588069A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116048531A (en) * 2023-03-30 2023-05-02 南京砺算科技有限公司 Instruction compiling method, graphic processing unit, storage medium and terminal equipment
CN116048531B (en) * 2023-03-30 2023-08-08 南京砺算科技有限公司 Instruction compiling method, graphic processing device, storage medium and terminal equipment

Similar Documents

Publication Publication Date Title
CN111598993B (en) Three-dimensional data reconstruction method and device based on multi-view imaging technology
Bertel et al. Megaparallax: Casual 360 panoramas with motion parallax
JP6764995B2 (en) Panorama image compression method and equipment
CN109147027B (en) Monocular image three-dimensional rebuilding method, system and device based on reference planes
JP2009211335A (en) Virtual viewpoint image generation method, virtual viewpoint image generation apparatus, virtual viewpoint image generation program, and recording medium from which same recorded program can be read by computer
EP3340619A1 (en) Geometric warping of a stereograph by positional constraints
US11995793B2 (en) Generation method for 3D asteroid dynamic map and portable terminal
US11961250B2 (en) Light-field image generation system, image display system, shape information acquisition server, image generation server, display device, light-field image generation method, and image display method
CN113763301B (en) Three-dimensional image synthesis method and device for reducing miscut probability
CN110580720A (en) camera pose estimation method based on panorama
CN113253845A (en) View display method, device, medium and electronic equipment based on eye tracking
CN115588069A (en) Scene reconstruction method and device, equipment and storage medium
CN114863014B (en) Fusion display method and device for three-dimensional model
CN114742703A (en) Method, device and equipment for generating binocular stereoscopic panoramic image and storage medium
KR20170073937A (en) Method and apparatus for transmitting image data, and method and apparatus for generating 3dimension image
CN114615487B (en) Three-dimensional model display method and device
CN116708862A (en) Virtual background generation method for live broadcasting room, computer equipment and storage medium
KR102019880B1 (en) 360 VR image acquisition system and method using distributed virtual camera
Knorr et al. From 2D-to stereo-to multi-view video
CN115601419A (en) Synchronous positioning and mapping back-end optimization method, device and storage medium
CN114092535A (en) Depth map reconstruction method, system, device, storage medium and processor
KR101242764B1 (en) Apparatus and method for creating 3-dimensional augmented reality image using 3d image and gpu
Tran et al. A personalised stereoscopic 3D gallery with virtual reality technology on smartphone
CN116860112B (en) Combined scene experience generation method, system and medium based on XR technology
CN111489407B (en) Light field image editing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination