CN114373041A - Three-dimensional reconstruction method and equipment
- Publication number
- CN114373041A (application CN202111533663.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- vertex
- dimensional model
- target
- key value
- Prior art date
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06T17/00 — Three-dimensional [3D] modelling, e.g. data description of 3D objects
- G06T19/20 — Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
- G06T7/70 — Determining position or orientation of objects or cameras
- G06T2207/10024 — Color image
- G06T2207/30196 — Human being; Person
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application relates to the technical fields of computer vision and computer graphics, and provides a three-dimensional reconstruction method and equipment. After reconstructing a three-dimensional model of a target object, the acquisition terminal removes repeated vertex data from the initial reconstruction data to obtain target reconstruction data. During real-time interaction, the acquisition terminal obtains the pose data of each vertex of the three-dimensional model from the target depth image of the current frame of the target object, and extracts the pose data of the remaining (deduplicated) vertices according to their indices. This reduces the redundancy of the pose data transmitted in real time and increases its transmission speed, ensuring that the rendering terminal can render and display the three-dimensional model corresponding to the current frame in real time from the pose data of the remaining vertices and the target reconstruction data, thereby improving the real-time performance of three-dimensional reconstruction.
Description
Technical Field
The application relates to the technical field of computer vision and computer graphics, in particular to a three-dimensional reconstruction method and equipment.
Background
In recent years, three-dimensional reconstruction technology has become a key research content in the fields of computer vision, computer graphics and the like, and is widely applied to scenes such as augmented reality, tele-immersive communication, three-dimensional live video and the like.
Three-dimensional reconstruction refers to the process of recovering three-dimensional information from single-view or multi-view images. For static scenes and objects, a three-dimensional model is reconstructed once by a static three-dimensional reconstruction algorithm and then rendered in the three-dimensional scene for display. For dynamic objects (such as human beings), whose shape changes over time, a series of three-dimensional models usually has to be built and rendered in sequence to describe the motion of the object.
In a real-time remote three-dimensional reconstruction social system, the main concern is real-time dynamic three-dimensional reconstruction of human bodies and faces. The acquisition terminal collects data with various sensors, reconstructs the three-dimensional information of the human body with a dynamic three-dimensional reconstruction method, and sends it to the rendering terminal, which renders and displays the human-body three-dimensional model in real time. Because the vertex colors, vertex normal vectors, patches and other reconstruction data of multiple three-dimensional models are transmitted in real time, the data volume per frame is large and puts heavy pressure on the bandwidth. When the frame rate of the three-dimensional model is insufficient, the model visibly stutters at the rendering terminal, and once the frame rate stays below 20 FPS for longer than a certain threshold, users experience vertigo, which impairs the immersive experience.
Disclosure of Invention
The embodiment of the application provides a three-dimensional reconstruction method and equipment, which are used for reducing the data volume of three-dimensional reconstruction and reducing the pressure of transmission bandwidth, thereby improving the real-time performance of three-dimensional reconstruction.
In a first aspect, an embodiment of the present application provides a three-dimensional reconstruction method applied to an acquisition terminal, including:
acquiring a plurality of frames of depth images and corresponding RGB images of a target object rotating in a preset posture, and reconstructing a three-dimensional model of the target object according to the plurality of frames of depth images and the corresponding RGB images;
removing repeated vertex data in the initial reconstruction data corresponding to the three-dimensional model to obtain target reconstruction data and sending the target reconstruction data to a rendering terminal;
acquiring a target depth image of the current frame of the target object in the interaction process, and acquiring pose data of each vertex in the three-dimensional model according to the target depth image;
and extracting, from the pose data of each vertex and according to the indices of the remaining vertices after deduplication, the pose data of the remaining vertices corresponding to those indices, and sending it to the rendering terminal, so that the rendering terminal replaces the pose data of the corresponding vertices in the target reconstruction data with the pose data of the remaining vertices and then renders and displays the three-dimensional model corresponding to the current frame.
In a second aspect, an embodiment of the present application provides a three-dimensional reconstruction method applied to a rendering terminal, including:
receiving target reconstruction data sent by an acquisition terminal; the target reconstruction data is obtained by removing repeated vertex data from the initial reconstruction data of a three-dimensional model of a target object, the three-dimensional model being reconstructed from multiple frames of depth images and corresponding RGB images of the target object rotating in a preset posture;
receiving pose data of the remaining vertices in the three-dimensional model of the current frame sent by the acquisition terminal; the pose data of the remaining vertices is extracted from the pose data of each vertex of the three-dimensional model according to the indices of the remaining vertices after deduplication, the pose data of each vertex being obtained from the target depth image of the current frame of the target object;
and replacing the pose data of the corresponding vertices in the target reconstruction data with the pose data of the remaining vertices, and rendering and displaying the three-dimensional model corresponding to the current frame according to the replaced target reconstruction data.
In a third aspect, an embodiment of the present application provides an acquisition terminal, including: the system comprises an RGBD camera, a processor, a memory and a communication interface, wherein the RGBD camera, the memory and the communication interface are connected with the processor through a bus;
the RGBD camera is used for acquiring a depth image and an RGB image;
the memory stores a computer program according to which the processor performs the following operations:
acquiring multiple frames of depth images and corresponding RGB images, collected by the RGBD camera, of the target object rotating in a preset posture, and reconstructing a three-dimensional model of the target object from the multiple frames of depth images and the corresponding RGB images;
removing repeated vertex data in the initial reconstruction data corresponding to the three-dimensional model to obtain target reconstruction data, and sending the target reconstruction data to a rendering terminal through the communication interface;
acquiring a target depth image of the current frame of the target object acquired by the RGBD camera in the interaction process, and acquiring pose data of each vertex in the three-dimensional model according to the target depth image;
and extracting, from the pose data of each vertex and according to the indices of the remaining vertices after deduplication, the pose data of the remaining vertices corresponding to those indices, and sending the pose data of the remaining vertices to the rendering terminal through the communication interface, so that the rendering terminal replaces the pose data of the corresponding vertices in the target reconstruction data with the pose data of the remaining vertices and then renders and displays the three-dimensional model corresponding to the current frame.
In a fourth aspect, the present application provides a rendering terminal, including a processor, a memory, a display, and a communication interface, where the memory and the communication interface are connected to the processor through a bus;
the memory stores a computer program according to which the processor performs the following operations:
receiving, through the communication interface, target reconstruction data sent by an acquisition terminal; the target reconstruction data is obtained by removing repeated vertex data from the initial reconstruction data of a three-dimensional model of a target object, the three-dimensional model being reconstructed from multiple frames of depth images and corresponding RGB images of the target object rotating in a preset posture;
receiving, through the communication interface, pose data of the remaining vertices in the three-dimensional model of the current frame sent by the acquisition terminal; the pose data of the remaining vertices is extracted from the pose data of each vertex of the three-dimensional model according to the indices of the remaining vertices after deduplication, the pose data of each vertex being obtained from the target depth image of the current frame of the target object;
and replacing the pose data of the corresponding vertices in the target reconstruction data with the pose data of the remaining vertices, rendering the three-dimensional model corresponding to the current frame according to the replaced target reconstruction data, and displaying the three-dimensional model on the display.
In a fifth aspect, the present application provides a computer-readable storage medium storing computer-executable instructions for causing a computer to execute a three-dimensional reconstruction method provided in an embodiment of the present application.
In the embodiments of the application, the acquisition terminal reconstructs a three-dimensional model of the target object from multiple frames of depth images and corresponding RGB images of the target object rotating in a preset posture, and removes repeated vertex data from the initial reconstruction data of the three-dimensional model to obtain target reconstruction data. Deduplicating the vertex data ensures that the target reconstruction data contains no redundant vertex data and is smaller than the initial reconstruction data, so sending it to the rendering terminal puts less pressure on the transmission bandwidth. Furthermore, during real-time interaction, the acquisition terminal obtains the pose data of each vertex of the three-dimensional model from the target depth image of the current frame of the target object and extracts the pose data of the remaining vertices according to their post-deduplication indices, which reduces the redundancy of the pose data transmitted in real time and increases its transmission speed. This ensures that the rendering terminal can render and display the three-dimensional model corresponding to the current frame in real time from the pose data of the remaining vertices and the target reconstruction data, improving the real-time performance of three-dimensional reconstruction.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 illustrates a block diagram of a three-dimensional reconstruction system provided by an embodiment of the present application;
fig. 2 exemplarily shows a flowchart of a three-dimensional reconstruction method implemented on an acquisition terminal side according to an embodiment of the present application;
fig. 3 is a flowchart illustrating a method for self-modeling at an acquisition terminal side according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a self-modeling process provided by an embodiment of the present application;
FIG. 5 is a flowchart illustrating a method for removing duplicate data from reconstructed data of a model provided by an embodiment of the present application;
fig. 6 is a flowchart illustrating a three-dimensional reconstruction method implemented at a rendering terminal side according to an embodiment of the present application;
FIG. 7 is a diagram illustrating a reconstruction effect provided by an embodiment of the present application;
fig. 8 is a flowchart illustrating a complete three-dimensional reconstruction method provided by an embodiment of the present application;
fig. 9 is a diagram illustrating a structure of an acquisition terminal according to an embodiment of the present application;
fig. 10 is a block diagram schematically illustrating a rendering terminal according to an embodiment of the present application.
Detailed Description
An important application scenario of dynamic three-dimensional reconstruction technology is the real-time remote three-dimensional reconstruction social system. After the acquisition terminal reconstructs a three-dimensional model from the collected color images and depth data, the three-dimensional reconstruction data is transmitted through the cloud to the rendering terminal for rendering and display. The roles of acquisition terminal and rendering terminal are relative to each user. For example, user A corresponds to display terminal 1 and user B to display terminal 2; display terminal 1 is an acquisition terminal for user A and a rendering terminal for user B, and likewise display terminal 2 is an acquisition terminal for user B and a rendering terminal for user A.
For example, in the virtual social scenario of a real-time remote three-dimensional communication system, different user terminals transmit their own dynamic three-dimensional model data, obtained through three-dimensional reconstruction, to the other user terminals, which render and display it locally after reception. Compared with traditional voice or video communication, this approach gives users in different places an immersive social experience.
The core technologies of a real-time remote three-dimensional reconstruction social system involve real-time three-dimensional reconstruction of the human body and face, encoding, decoding and transmission of two-dimensional or three-dimensional data, and immersive rendering and display. Three-dimensional reconstruction data covers the vertices, patches and other elements of the three-dimensional geometry, where the vertex data comprises vertex coordinates, vertex normal vectors and vertex colors. The dynamic human-body reconstruction process is as follows: input parameters such as posture data, geometric data and (clothing) material data of a human body are obtained from various sensor devices and processed by a non-rigid real-time three-dimensional reconstruction method to reconstruct a three-dimensional model of the human body.
In the dynamic three-dimensional reconstruction process, the higher the voxel resolution of the model, the more sharply the data volume grows. In the absence of a mature, efficient, high-fidelity three-dimensional data compression technology, cloud data transmission in a dynamic real-time three-dimensional reconstruction scheme therefore has a decisive influence on the quality of model reconstruction and on the imaging at the rendering terminal.
For example, over a local area network at 30 FPS, a model with a voxel resolution of 192 × 128 requires a transmission rate of 256 Mbps, and the imaging effect of this resolution at the rendering and display end is poor; a voxel resolution of 384 × 384 requires 1120 Mbps (again at 30 FPS), a steep increase in transmitted data compared with the 192 × 128 case, which is difficult to transmit in real time even under ideal 5G network bandwidth conditions.
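For intuition, the per-frame payload implied by these rates can be checked with simple arithmetic; the sketch below is illustrative only, taking the 30 FPS figure and the quoted bit rates from the paragraph above as given.

```python
# Back-of-the-envelope check of the per-frame payload implied by the
# quoted transmission rates (30 FPS assumed, as in the example above).
for label, mbps in (("192 x 128 voxel model", 256), ("384 x 384 voxel model", 1120)):
    mbytes_per_frame = mbps / 30 / 8  # Mbit/s -> MByte per frame
    print(f"{label}: {mbps} Mbps ~= {mbytes_per_frame:.2f} MB per frame")
```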
Real-time social interaction, however, requires transmitting data such as model vertices, patches and textures for every frame of the reconstructed model. How to achieve real-time transmission of the model while preserving the fidelity and realism of the three-dimensional reconstruction has therefore become the focus of research on productizing real-time remote three-dimensional reconstruction social systems.
At present, real-time remote three-dimensional reconstruction social systems generally adopt a "pre-modeling plus real-time pose-driven model" approach. Although this reduces the amount of data transmitted in real time, the vertex data in the "pre-modeling" reconstruction data and in the pose data transmitted in real time remains highly redundant, which still hurts the real-time performance of the system; for multiple streams of high-precision model data, the real-time performance is even worse.
In view of this, to reduce the network bandwidth occupied by data transmission while preserving the realism of the model, so as to meet the real-time requirements of the social system and improve the rendering and display efficiency of the rendering terminal, the embodiments of the application provide a three-dimensional reconstruction method and equipment that divide real-time three-dimensional reconstruction and remote rendering display into four stages. The first stage is self-modeling: a high-precision three-dimensional model is reconstructed from the depth images and RGB images acquired by an RGBD camera using a high-performance computer and a three-dimensional reconstruction algorithm; the data of the complete three-dimensional model includes vertex coordinates, vertex colors, vertex normal vectors, and the patches and textures formed by the vertices. The second stage is model vertex deduplication: many vertices of the reconstructed model are repeated, producing redundant coordinates, normal vectors, colors and other data that inflate the model and slow down transmission. The third stage is transmission of the deduplicated model data and the current model pose data: the deduplicated vertex coordinates, vertex normal vectors, vertex colors and other model data, together with the real-time pose data of the model vertices for each frame, are transmitted to the cloud. The fourth stage is rendering and display: the three-dimensional model is rendered and displayed from the deduplicated model data and the vertex pose data transmitted in real time. Deduplicating the vertex data effectively reduces the transmitted data volume, lowers transmission latency and improves rendering efficiency; a human-body three-dimensional model rendered from the deduplicated model data and the real-time vertex pose data moves more naturally and realistically, looks better and improves the user's immersive experience.
It should be noted that the method provided by the embodiments of the application is suitable not only for three-dimensional reconstruction and data transmission of human bodies and faces, but also for three-dimensional reconstruction and real-time data transmission of any rigid or non-rigid moving object.
Embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 schematically illustrates a three-dimensional reconstruction system architecture diagram provided in an embodiment of the present application. As shown in fig. 1, the system includes an acquisition terminal 101, a transmission terminal 102, and a rendering terminal 103.
The acquisition terminal 101 includes an RGBD camera 101_1 and a high-performance computer 101_2. The RGBD camera 101_1 may be a Kinect camera or a RealSense camera. It mainly collects depth images and RGB images of the user and of the real scene the user is in; the depth image and RGB image of the scene can be used for foreground segmentation to separate the human body from the whole scene. The companion computer 101_2 processes the collected depth and RGB images of the user to obtain human-body pose data and to reconstruct data such as the vertex coordinates, vertex colors, vertex normal vectors and the patches formed by the vertices of the three-dimensional model. In addition, the computer 101_2 can deduplicate the pose data and the reconstruction data, thereby reducing redundant data.
The transmission terminal 102 obtains the reconstruction data and real-time pose data from the acquisition terminal, encodes them and distributes them. Generally, the transmission terminal 102 losslessly encodes the data received from the acquisition terminal 101 before distributing it to the rendering terminal 103, so a mature, efficient and faithful compression technology can be adopted for encoding and decoding. The transmission terminal 102 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, big data and artificial intelligence platforms.
The rendering terminal 103 renders the human-body three-dimensional model from the deduplicated reconstruction data and real-time pose data received from the transmission terminal 102, and performs immersive display. The rendering terminal 103 may be a display device such as a television, a mobile phone, a computer, or VR/AR glasses.
It should be noted that the system architecture shown in fig. 1 can be deployed differently for different usage scenarios. For example, in a live-broadcast scenario, the broadcaster side deploys an acquisition terminal and the client side deploys a rendering terminal, through which users can browse the reconstructed model. In a conference scenario, the two meeting rooms of a teleconference each deploy both an acquisition terminal and a rendering terminal for real-time three-dimensional communication between the rooms, which provides a more immersive experience than traditional voice or video communication.
Based on the system architecture shown in fig. 1, fig. 2 exemplarily shows a flowchart of a method for three-dimensional reconstruction provided in an embodiment of the present application, and as shown in fig. 2, the flowchart is executed by an acquisition terminal and mainly includes the following steps:
s201: acquiring a plurality of frames of depth images and corresponding RGB images of the target object rotating in a preset posture, and reconstructing a three-dimensional model of the target object according to the plurality of frames of depth images and the corresponding RGB images.
In an embodiment of the present application, the acquisition terminal acquires a depth image and an RGB image of the target object to pre-construct a three-dimensional model of the target object, specifically referring to fig. 3:
s2011: acquiring a multi-frame scene depth image acquired by an RGBD camera.
In performing S2011, the RGBD camera is fixed, and multiple frames of scene depth images not including the target object are collected for segmentation of the foreground and background.
S2012: the method comprises the steps of acquiring a depth image of a target object in a preset posture acquired by an RGBD camera, and generating an inner layer driving model of the target object according to the acquired depth image.
In an optional implementation of S2012, the target object moves in front of the RGBD camera, faces it, assumes a T-pose and stays still for 2 to 3 seconds. The RGBD camera acquires one frame of depth image of the target object, which is used as the reference frame; combined with the scene depth images, the foreground and background are segmented, and a mask map of the target object is generated through image operations such as dilation and erosion. Then, according to the generated mask map, the point cloud data of the target object is extracted from its depth image, and the extracted point cloud is non-rigidly fitted to the point cloud of a parameterized human model (such as an SMPL model) to obtain the inner-layer driving model of the target object.
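As a non-authoritative sketch of this segmentation step, the snippet below illustrates one way to build such a mask by depth differencing against the empty-scene depth image, followed by dilation and erosion; the 5 cm threshold and 5 × 5 kernel are assumed values, not taken from the disclosure.

```python
import cv2
import numpy as np

def human_mask(scene_depth, frame_depth, thresh=0.05):
    # scene_depth: background depth image without the target (S2011), meters
    # frame_depth: reference-frame depth image containing the target, meters
    diff = np.abs(frame_depth.astype(np.float32) - scene_depth.astype(np.float32))
    mask = ((diff > thresh) & (frame_depth > 0)).astype(np.uint8) * 255
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.dilate(mask, kernel)  # fill pinholes in the silhouette
    mask = cv2.erode(mask, kernel)   # trim the halo the dilation adds
    return mask
```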
S2013: acquiring a multi-frame depth image and a corresponding RGB image of a target object rotating in a preset posture.
In an optional embodiment, in step S2013, the target object rotates one full circle in front of the RGBD camera in the T-pose. During the rotation, the RGBD camera must be able to acquire complete data of the target object from head to foot, so as to improve the completeness of the three-dimensional model.
S2014: and reconstructing a geometric model of the target object according to the multi-frame depth images, and performing texture mapping on the geometric model according to the multi-frame RGB images to generate a three-dimensional model of the target object.
When S2014 is executed, for each frame of depth image, the point cloud data of the target object is extracted from the depth image according to the mask map to enrich the point cloud extracted from the reference frame, and the extracted point cloud is non-rigidly fitted to the inner-layer driving model using a dynamic real-time non-rigid reconstruction algorithm, moving the skeleton nodes of the inner-layer driving model (as shown in (a) of fig. 4) and generating the geometric model of the target object. Then, using the mapping between the depth image and the RGB image, the color of each geometric vertex is obtained from the corresponding RGB image and texture-mapped onto the geometric model, yielding a realistic three-dimensional model of the target object, as shown in (b) of fig. 4.
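For illustration only, a minimal sketch of per-vertex color sampling is given below, assuming the RGB image is registered to the depth camera and a simple pinhole projection with intrinsics fx, fy, cx, cy; the actual depth-to-RGB mapping used by the method may differ.

```python
import numpy as np

def vertex_colors(vertices, rgb, fx, fy, cx, cy):
    # vertices: (N, 3) camera-space vertex coordinates
    # rgb: (H, W, 3) color image registered to the depth camera
    colors = np.zeros((len(vertices), 3), np.uint8)
    for i, (x, y, z) in enumerate(vertices):
        if z <= 0:
            continue
        u = int(round(fx * x / z + cx))  # project onto the image plane
        v = int(round(fy * y / z + cy))
        if 0 <= v < rgb.shape[0] and 0 <= u < rgb.shape[1]:
            colors[i] = rgb[v, u]        # nearest-neighbour color sample
    return colors
```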
S202: and removing repeated vertex data in the initial reconstruction data corresponding to the three-dimensional model to obtain target reconstruction data and sending the target reconstruction data to the rendering terminal so that the rendering terminal renders the three-dimensional model according to the target reconstruction data.
In the embodiment of the application, after the complete and real three-dimensional model of the target object is obtained, in order to avoid obtaining new vertex and texture data, the reconstruction of the three-dimensional model is stopped, that is, the geometric structure, the topological structure, the number of vertices, the vertex color and the patch generated by the vertices of the three-dimensional model are kept unchanged.
Generally, the reconstructed three-dimensional model has many repeated vertices, which generate redundant data such as repeated normal vectors and colors, and affect the data transmission speed. Therefore, in S202, after the three-dimensional model is reconstructed through the process shown in fig. 3, initial reconstruction data (including vertex coordinates, vertex normal vectors, vertex colors, patch indexes, and other data) of the three-dimensional model is obtained, and the vertex data in the initial reconstruction data is de-duplicated to reduce the data amount of the three-dimensional model, so as to obtain target reconstruction data. In specific implementation, see fig. 5:
s2021: and taking the three-dimensional coordinates of each vertex of the three-dimensional model in the initial reconstruction data as a key, and taking the normal vector and the color of the corresponding vertex as values to obtain a key value pair corresponding to each vertex.
When executing S2021, the vertex data is associated and deduplicated in the form of key-value pairs. Specifically, the three-dimensional coordinates of each vertex serve as its key, and the color and normal vector of the vertex at those coordinates serve as its value, producing a one-to-one correspondence between the three-dimensional coordinates of a vertex and its color and normal vector.
In an alternative embodiment, the three-dimensional coordinates of the vertices are represented by position numbers. For example, the vertex at (x1, y1, z1) has position number 1 (i.e., the key of the vertex at (x1, y1, z1) is 1), the vertex at (x2, y2, z2) has position number 2 (i.e., its key is 2), and so on. The color and normal vector of the vertex at a given set of coordinates are represented by a letter: the color and normal vector of the vertex at (x1, y1, z1) are A (i.e., the value of that vertex is A), those of the vertex at (x2, y2, z2) are B (i.e., its value is B), and so on. Assuming the keys of the vertices in the three-dimensional model are 5, 2, 3, 2, 1, 3, 4 and the corresponding values are E, B, C, B, A, C, D, the key-value pairs of the vertices are {(5, E), (2, B), (3, C), (2, B), (1, A), (3, C), (4, D)}.
S2022: and removing repeated key value pairs in each key value pair corresponding to the vertex aiming at each vertex to obtain a key value pair corresponding to the vertex, and taking the key value pair corresponding to the vertex as data after the vertex is removed.
In an optional implementation of S2022, each vertex represented by its three-dimensional coordinates is traversed and its key is compared with the others. If the key appears for the first time, the key-value pair of the vertex represented by the current three-dimensional coordinates is kept; otherwise it is deleted. This ensures each vertex corresponds to exactly one key-value pair, which is used as the vertex's deduplicated data, removing the redundant vertex data from the initial reconstruction data and reducing the data volume.
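As an illustration only (not part of the original disclosure), a minimal Python sketch of this first-occurrence deduplication, keyed on the vertex coordinates, might look as follows; the parallel-list data layout is an assumption made for the example.

```python
def deduplicate_vertices(coords, values):
    # coords: vertex keys (e.g. coordinate tuples); values: the color/normal
    # payload per vertex. The first occurrence of each key is kept.
    seen = {}            # key -> index into the deduplicated list
    kept_coords, kept_values = [], []
    index_map = []       # old vertex index -> new (deduplicated) index
    for i, key in enumerate(coords):
        if key not in seen:
            seen[key] = len(kept_coords)
            kept_coords.append(key)
            kept_values.append(values[i])
        index_map.append(seen[key])
    return kept_coords, kept_values, index_map

# The worked example above: keys 5, 2, 3, 2, 1, 3, 4 with values E, B, C, B, A, C, D
print(deduplicate_vertices([5, 2, 3, 2, 1, 3, 4],
                           ["E", "B", "C", "B", "A", "C", "D"]))
# -> ([5, 2, 3, 1, 4], ['E', 'B', 'C', 'A', 'D'], [0, 1, 2, 1, 3, 2, 4])
```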
In other embodiments, to improve deduplication efficiency, the vertex data may first be sorted and deduplicated based on the sorted result. Specifically, the keys of the vertices represented by three-dimensional coordinates are sorted, the positions of the vertices change accordingly, and the value corresponding to each key is adjusted according to the sorted result, yielding the number of duplicated key-value pairs per vertex. The number of key-value pairs to remove for each vertex is then determined from its number of duplicates, and the duplicated key-value pairs are removed accordingly, so that the key-value pair of each vertex is unique.
For example, assume the keys of the vertices before sorting are 5, 2, 3, 2, 1, 3, 4 and their values are E, B, C, B, A, C, D. After sorting from small to large, the keys are 1, 2, 2, 3, 3, 4, 5 and the corresponding values are A, B, B, C, C, D, E. From the sorted result: the vertex with key 1 has 1 key-value pair, no duplicates, so no deduplication is needed; the vertex with key 2 has 2 key-value pairs, so 1 must be removed to make its key-value pair unique; the vertex with key 3 likewise has 2 key-value pairs, so 1 must be removed; the vertices with keys 4 and 5 each have 1 key-value pair and need no deduplication. After removing the duplicated key-value pairs, the keys of the vertices are 1, 2, 3, 4, 5 and the values are A, B, C, D, E.
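The sort-based variant can be sketched with numpy's np.unique, which lexicographically sorts the coordinate rows, keeps one row per distinct key and reports the old-to-new index map, matching the example above; again the parallel-list layout is an assumption for illustration.

```python
import numpy as np

def dedup_by_sorting(coords, values):
    # np.unique sorts the coordinate rows, keeps the first occurrence of
    # each distinct key and returns the old -> new (sorted) index map.
    coords = np.asarray(coords)
    keys, first, inverse = np.unique(
        coords, axis=0, return_index=True, return_inverse=True)
    return keys, np.asarray(values)[first], inverse

keys, vals, inverse = dedup_by_sorting(
    [[5], [2], [3], [2], [1], [3], [4]],
    ["E", "B", "C", "B", "A", "C", "D"])
print(keys.ravel().tolist(), vals.tolist())
# -> [1, 2, 3, 4, 5] ['A', 'B', 'C', 'D', 'E']
```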
After the vertex data of the three-dimensional model is deduplicated in the self-modeling stage, new patches are generated from the remaining vertices using the three-dimensional coordinates of the vertices contained in each patch of the model before deduplication, and the vertex indices corresponding to the new patches are determined.
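Regenerating the patches then amounts to remapping each patch's old vertex indices through the old-to-new index map produced by deduplication; the sketch below also drops triangles that degenerate when duplicate vertices merge, which is an assumption rather than a stated step of the method.

```python
import numpy as np

def remap_faces(faces, index_map):
    # faces: (M, 3) vertex indices into the pre-deduplication vertex list
    # index_map: old vertex index -> index into the deduplicated list
    new_faces = np.asarray(index_map)[np.asarray(faces)]
    # drop triangles that collapsed when duplicate vertices merged
    a, b, c = new_faces[:, 0], new_faces[:, 1], new_faces[:, 2]
    return new_faces[(a != b) & (b != c) & (a != c)]
```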
It should be noted that associating vertex coordinates, vertex colors and vertex normal vectors through key-value pairs is only an example; the embodiments of the application do not limit the form of association for vertex data, which may, for example, also be maintained in arrays.
In the embodiments of the application, after the three-dimensional model is completely reconstructed, no new data is fused into the voxels and no new vertices are introduced; that is, the geometric structure, topological structure, number of vertices, vertex colors and the patches formed by the vertices remain unchanged, and only the pose data of each vertex, controlled by the skeleton and the SMPL skinning, changes from frame to frame. Therefore, to improve deduplication efficiency, the indices of the remaining vertices after deduplicating the initial reconstruction data can be recorded, and the redundant data is then removed directly according to those indices. The index characterizes the position of a vertex in the three-dimensional model.
In S202, after the repeated vertex data is removed from the initial reconstruction data, the target reconstruction data is obtained. It is the complete data of the target object and includes the coordinates, normal vectors and colors of the remaining vertices, as well as the patch indices newly generated from the remaining vertices. Because the target reconstruction data contains no redundant data and its volume is small, transmitting it losslessly through the cloud to the rendering terminal preserves the completeness and reconstruction precision of the three-dimensional model while reducing bandwidth pressure and increasing transmission speed. The rendering terminal stores the target reconstruction data after receiving it, so as to render the three-dimensional model from it during real-time interaction.
It should be noted that the target reconstruction data is transmitted only once; it does not capture the actual posture during interaction and need not be transmitted in real time. During actual interaction the target object does not hold a single posture: the posture of the three-dimensional model is driven by the current posture of the target object, so that the motion of the rendered three-dimensional model matches the motion of the target object and the immersion of remote interaction is improved.
S203: and acquiring a target depth image of the current frame of the target object in the interaction process, and acquiring pose data of each vertex in the three-dimensional model according to the target depth image.
During real-time interaction, the target object changes its position and posture through motion. The target depth image of the current frame, acquired in real time by the RGBD camera, is converted into target point cloud data, which is transformed into the space of the inner-layer driving model generated from the reference frame. Through operations such as non-rigid Iterative Closest Point (ICP) registration and skinning, the inner-layer driving model of the reference frame is driven into the posture of the target object in the current frame, yielding the pose data of the model for the current frame, i.e., the three-dimensional coordinates of each vertex.
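The conversion of the target depth image into point cloud data is standard pinhole back-projection; a sketch under assumed intrinsics (fx, fy, cx, cy) is shown below. The non-rigid ICP and skinning steps themselves are beyond the scope of a short sketch.

```python
import numpy as np

def depth_to_point_cloud(depth, mask, fx, fy, cx, cy):
    # depth: (H, W) depth image in meters; mask: boolean human-body mask
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = mask & (depth > 0)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx   # pinhole back-projection
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)  # (N, 3) camera-space points
```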
S204: and extracting the pose data of the residual vertex corresponding to the corresponding index from the pose data of each vertex according to the index of the residual vertex after the duplication removal, and sending the pose data to the rendering terminal, so that the rendering terminal renders and displays the three-dimensional model corresponding to the current frame after replacing the pose data of the corresponding vertex of the target reconstruction data by the pose data of the residual vertex.
In the embodiments of the application, the indices of the remaining vertices are recorded during vertex-data deduplication in the self-modeling stage. When S204 is executed, the acquisition terminal extracts, according to those indices, the pose data of the remaining vertices from the pose data of each vertex. Because the index of each remaining vertex is unique, the extracted pose data of the remaining vertices is unique and contains no redundant data; its volume is smaller than that of the pose data of all vertices of the three-dimensional model obtained from the depth image, which reduces the pressure on the transmission bandwidth, increases the transmission speed, and allows the pose data to be transmitted in real time under current network conditions. After receiving the pose data of the remaining vertices transmitted in real time, the rendering terminal drives the three-dimensional model reconstructed from the target reconstruction data to move, so that it matches the current posture of the target object.
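On the acquisition side, the per-frame extraction by recorded indices reduces to a single fancy-indexing operation; the sketch below assumes the per-frame vertex positions and the recorded indices are available as arrays, which is an illustrative layout rather than the disclosed data format.

```python
import numpy as np

def pack_pose_frame(all_vertex_positions, remaining_indices):
    # all_vertex_positions: (N, 3) positions of every model vertex this frame
    # remaining_indices: vertex indices recorded at deduplication time (S202)
    pose = np.asarray(all_vertex_positions, np.float32)[remaining_indices]
    return np.ascontiguousarray(pose)  # (M, 3), M <= N, sent each frame
```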
Corresponding to the acquisition terminal side, fig. 6 exemplarily shows a flowchart of a rendering display method of a three-dimensional model provided in the embodiment of the present application, and as shown in fig. 6, the flowchart is executed by the rendering terminal and mainly includes the following steps:
s601: and receiving target reconstruction data sent by the acquisition terminal.
The target reconstruction data is obtained by removing repeated vertex data from the initial reconstruction data of the three-dimensional model of the target object, the three-dimensional model having been reconstructed from multiple frames of depth images and corresponding RGB images of the target object rotating in a preset posture. For the specific process, see S201-S202; it is not repeated here.
In S601, since the target reconstruction data includes complete data of the target object, such as vertex coordinates, vertex colors, patch indexes, and the like, a complete three-dimensional model may be rendered through the target reconstruction data.
S602: and receiving pose data of the residual vertexes in the three-dimensional model corresponding to the current frame, which is sent by the acquisition terminal.
The pose data of the remaining vertices is extracted from the pose data of each vertex of the three-dimensional model according to the indices of the remaining vertices after deduplication, the pose data of each vertex being obtained from the target depth image of the current frame of the target object. For the details, see S203-S204; they are not repeated here.
S603: and replacing the pose data of the corresponding peak of the target reconstruction data with the pose data of the residual peak, and rendering and displaying the three-dimensional model corresponding to the current frame according to the replaced target reconstruction data.
During real-time interaction, no new vertices are fused and the topological structure of the three-dimensional model remains unchanged; that is, the vertex colors and the patches generated by the vertices stay the same. Therefore, in S603, the rendering terminal replaces the pose data of the corresponding vertices in the target reconstruction data with the pose data of the remaining vertices sent in real time by the acquisition terminal, so that the three-dimensional model matches the current motion of the target object, improving the immersive experience of remote interaction. Fig. 7 shows the effect of 20 consecutive frames acquired, reconstructed, transmitted and rendered in real time.
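On the rendering side, applying a frame then reduces to overwriting the stored vertex positions while reusing the one-time topology, colors and normals; the sketch below assumes the mesh is kept as a simple dictionary of arrays, which is an illustrative layout only.

```python
def render_loop(mesh, pose_stream, draw):
    # mesh: one-time target reconstruction data, e.g.
    #   {"positions": (M,3) float32, "colors": ..., "normals": ..., "faces": ...}
    # pose_stream: iterable yielding one (M, 3) pose payload per frame
    for pose_payload in pose_stream:
        mesh["positions"][:] = pose_payload  # only positions change per frame
        draw(mesh)                           # topology and colors reused as-is
```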
In the embodiments of the application, on one hand, the target reconstruction data contains the complete data of the target object and is transmitted with lossless encoding at unchanged resolution, which guarantees the completeness and high precision of the three-dimensional model; combined with the pose data transmitted in real time, the rendered model of the current frame moves realistically, improving the immersive experience of remote interaction. On the other hand, both the target reconstruction data and the real-time pose data are deduplicated: they contain no repeated vertex data, have low redundancy and a small volume, which relieves network bandwidth pressure, increases transmission speed, reduces the rendering latency at the rendering terminal, raises rendering efficiency, reduces stuttering of the model's motion, and improves the display quality of the model.
The embodiments of the application reduce the data volume transmitted per frame while preserving model quality. Taking a voxel resolution of 256 × 256 as an example, with the three-dimensional reconstruction method provided by the embodiments of the application, the data volume to be transmitted per frame is reduced from the original 45939 KB to 270 KB, a reduction by a factor of nearly 170, improving the real-time performance of system transmission.
The following describes a three-dimensional reconstruction method provided in the embodiment of the present application from a process of interaction between an acquisition terminal and a rendering terminal, as shown in fig. 8:
s801: the acquisition terminal acquires a multi-frame scene depth image.
S802: the acquisition terminal acquires a plurality of frames of depth images and corresponding RGB images of a target object rotating in a preset posture, and human body segmentation is carried out according to the plurality of frames of scene depth images.
S803: and the acquisition terminal reconstructs a three-dimensional model of the target object according to the segmented human body depth image and the human body RGB image.
S804: and the acquisition terminal determines whether the three-dimensional model is complete, if so, S805 is executed, and if not, the S802 is returned.
In an optional implementation, when newly acquired human-body depth images of the target object no longer introduce new vertices, the three-dimensional model is determined to be complete and image acquisition stops.
S805: the acquisition terminal acquires initial reconstruction data corresponding to the three-dimensional model and removes repeated vertex data in the initial reconstruction data.
Generally, the initial reconstruction data corresponding to the three-dimensional model contains many repeated vertex data, and the redundancy rate is high, so that the data volume is large. In S805, by removing the repeated vertex data in the initial reconstruction data, the data amount transmitted by the three-dimensional model is reduced, the pressure of the network bandwidth is reduced, and the transmission speed is increased.
In S805, the coordinates, colors, normal vectors and other data of the remaining vertices after deduplication are all unique; the redundancy is low, the data volume small and the transmission fast. For the detailed deduplication process, see S202; it is not repeated here.
S806: and the acquisition terminal records the indexes of the residual vertexes after the duplication removal.
S807: and the acquisition terminal regenerates a new triangular patch according to the residual vertexes.
S808: and the acquisition terminal transmits the coordinates, colors, normal vectors and triangular patch data of the remaining vertexes to the cloud server.
S809: and the rendering terminal acquires the coordinates, the colors, the normal vectors and the triangular patch data of the residual vertexes transmitted by the acquisition terminal from the cloud server.
S810: the acquisition terminal acquires a target depth image of a current frame of a target object in the interaction process and acquires a human body target depth image of the target object.
S811: the acquisition terminal converts the depth image of the human body target into point cloud data, acquires pose data of each vertex in the three-dimensional model from the point cloud data, and acquires current pose data of the remaining vertices according to indexes of the remaining vertices.
S812: and the acquisition terminal sends the current pose data of the residual vertexes to the rendering terminal through the cloud server.
S813: and the rendering terminal replaces the three-dimensional coordinates of the corresponding vertex in the de-duplicated initial reconstruction data with the current pose data of the residual vertex, and renders and displays the three-dimensional model corresponding to the current frame with the replaced data.
Based on the same technical concept, the embodiment of the present application provides an acquisition terminal, which can execute the flow at the acquisition terminal side in the three-dimensional reconstruction method provided by the embodiment of the present application, and can achieve the same technical effect, which is not repeated here.
Referring to fig. 9, the acquisition terminal includes an RGBD camera 901 and a host 902. The RGBD camera collects depth images and RGB images. The host 902 comprises at least a processor 902_1, a memory 902_2 and a communication interface 902_3, with the communication interface 902_3 and the memory 902_2 connected to the processor 902_1 through a bus 902_4;
the memory 902_2 stores a computer program, and the processor 902_1 performs the following operations according to the computer program:
acquiring multiple frames of depth images and corresponding RGB images, collected by the RGBD camera 901, of the target object rotating in a preset posture, and reconstructing a three-dimensional model of the target object from the multiple frames of depth images and the corresponding RGB images;
removing repeated vertex data in the initial reconstruction data corresponding to the three-dimensional model to obtain target reconstruction data, and sending the target reconstruction data to a rendering terminal through a communication interface 902_ 3;
acquiring a target depth image of a current frame of a target object in an interaction process acquired by an RGBD (red green blue) camera 901, and extracting pose data of each vertex in a three-dimensional model from the target depth image;
and extracting, from the pose data of each vertex and according to the indices of the remaining vertices after deduplication, the pose data of the remaining vertices corresponding to those indices, and sending the pose data of the remaining vertices to the rendering terminal through the communication interface 902_3, so that the rendering terminal replaces the pose data of the corresponding vertices in the target reconstruction data with the pose data of the remaining vertices and then renders and displays the three-dimensional model corresponding to the current frame.
Optionally, the processor 902_1 removes repeated vertex data in the initial reconstruction data corresponding to the three-dimensional model, and the specific operations are as follows:
taking the three-dimensional coordinates of each vertex of the three-dimensional model in the initial reconstruction data as a key, and taking the normal vector and the color of the corresponding vertex as values to obtain a key value pair corresponding to each vertex;
and for each vertex, removing the duplicated key-value pairs among those corresponding to the vertex so that one key-value pair remains, and using that key-value pair as the vertex's deduplicated data.
Optionally, after obtaining the key-value pair corresponding to each vertex, the processor 902_1 further performs:
sorting the keys of the vertices and adjusting the value corresponding to each key according to the sorted result, to obtain the number of duplicated key-value pairs corresponding to each vertex;
the processor 902_1 removes duplicate key value pairs in each key value pair corresponding to the vertex, and specifically operates as follows:
determining the number of key value pairs to be removed from the vertex according to the number of repeated key value pairs corresponding to the vertex;
and removing repeated key value pairs corresponding to the vertexes according to the removed number of the key value pairs.
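The sort-and-count step above can be pictured as follows (a sketch; representing keys as tuples and grouping with itertools.groupby are our choices, not the patent's):

```python
from itertools import groupby

def duplicate_counts(coords):
    """Sort the vertex keys so equal keys become adjacent, then count
    how many repeated key value pairs each distinct key contributes."""
    keys = sorted(tuple(c) for c in coords)   # sorting groups equal keys
    counts = {k: sum(1 for _ in g) for k, g in groupby(keys)}
    # A key occurring n times has n - 1 repeated pairs to remove.
    return {k: n - 1 for k, n in counts.items() if n > 1}
```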
Optionally, the target reconstruction data includes the coordinates of the remaining vertices, the normal vectors of the remaining vertices, the colors of the remaining vertices, and patch indices, where the patch indices are newly generated from the remaining vertices.
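Regenerating the patch indices amounts to remapping each triangle's vertex references onto the de-duplicated vertex list. The sketch below assumes a precomputed lookup table `old_to_new` (our construct, not named in the patent) that maps every original vertex index, duplicates included, to the index of its retained representative:

```python
import numpy as np

def regenerate_patch_indices(faces: np.ndarray,
                             old_to_new: np.ndarray) -> np.ndarray:
    """Rewrite (M, 3) triangle indices so they reference the
    de-duplicated vertex array instead of the original one."""
    # Fancy indexing remaps every entry of the face array at once.
    return old_to_new[faces]
```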
It should be noted that the structure of the acquisition terminal shown in fig. 9 covers only the hardware necessary for implementing the method flow shown in fig. 2; in addition, the acquisition terminal also includes conventional hardware with a display function, such as a display 903.
Based on the same technical concept, an embodiment of the present application provides a rendering terminal, which can execute the rendering-terminal-side flow of the three-dimensional reconstruction method provided by the embodiments of the present application and achieve the same technical effect; the details are not repeated here.
Referring to fig. 10, the rendering terminal includes a processor 1001, a memory 1002, a display 1003, and a communication interface 1004; the communication interface 1004, the display 1003, and the memory 1002 are connected to the processor 1001 through a bus 1005;
the memory 1002 stores a computer program, and the processor 1001 performs the following operations according to the computer program:
receiving, through the communication interface 1004, the target reconstruction data sent by the acquisition terminal; the target reconstruction data is obtained by removing repeated vertex data from the initial reconstruction data corresponding to a three-dimensional model of a target object, the three-dimensional model being reconstructed from multiple frames of depth images and corresponding RGB images captured while the target object rotates in a preset posture;
receiving, through the communication interface 1004, the pose data of the remaining vertices in the current-frame three-dimensional model sent by the acquisition terminal; the pose data of the remaining vertices is extracted from the pose data of each vertex of the three-dimensional model according to the indices of the vertices remaining after de-duplication, the pose data of each vertex being acquired from the target depth image of the current frame of the target object;
and replacing the pose data of the corresponding vertices of the target reconstruction data with the pose data of the remaining vertices, rendering the three-dimensional model corresponding to the current frame according to the replaced target reconstruction data, and displaying the rendered model through the display 1003.
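A minimal sketch of this per-frame update on the rendering side (the `renderer` object stands in for the terminal's actual display pipeline, and the one-to-one row alignment between the received pose data and the stored vertices is an assumption of ours):

```python
import numpy as np

def update_and_render(target_coords: np.ndarray,
                      remaining_pose: np.ndarray,
                      renderer) -> None:
    """Overwrite the stored vertex coordinates with the freshly received
    pose data of the remaining vertices, then redraw the model."""
    # The rows of remaining_pose follow the same order as target_coords,
    # so the replacement is a single in-place bulk copy.
    target_coords[:] = remaining_pose
    renderer.draw(target_coords)   # hypothetical call; the real terminal
                                   # renders via its own display pipeline
```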
It should be noted that the structure of the rendering terminal shown in fig. 10 covers only the hardware necessary for implementing the method flow shown in fig. 6; in addition, the rendering terminal may also include other conventional hardware, such as an RGBD camera 1006.
Embodiments of the present application also provide a computer-readable storage medium storing instructions that, when executed, implement the methods of the foregoing embodiments.
Embodiments of the present application also provide a computer program product comprising a computer program that, when executed, performs the methods of the foregoing embodiments.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.
The foregoing description has, for purposes of explanation, been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and their practical application, thereby enabling others skilled in the art to best utilize the embodiments, with various modifications as are suited to the particular use contemplated.
Claims (10)
1. A three-dimensional reconstruction method, applied to an acquisition terminal, comprising the following steps:
acquiring a plurality of frames of depth images and corresponding RGB images of a target object rotating in a preset posture, and reconstructing a three-dimensional model of the target object according to the plurality of frames of depth images and the corresponding RGB images;
removing repeated vertex data in the initial reconstruction data corresponding to the three-dimensional model to obtain target reconstruction data and sending the target reconstruction data to a rendering terminal;
acquiring a target depth image of the current frame of the target object during the interaction process, and obtaining pose data of each vertex in the three-dimensional model according to the target depth image;
and extracting, from the pose data of each vertex, the pose data of the remaining vertices according to the indices of the vertices remaining after de-duplication, and sending the extracted pose data to the rendering terminal, so that the rendering terminal replaces the pose data of the corresponding vertices of the target reconstruction data with the pose data of the remaining vertices and then renders and displays the three-dimensional model corresponding to the current frame.
2. The method of claim 1, wherein removing the repeated vertex data in the initial reconstruction data corresponding to the three-dimensional model comprises:
taking the three-dimensional coordinates of each vertex of the three-dimensional model in the initial reconstruction data as a key, and taking the normal vector and the color of the corresponding vertex as values to obtain a key value pair corresponding to each vertex;
and for each vertex, removing repeated key value pairs among the key value pairs corresponding to that vertex so that a single key value pair remains, and taking the retained key value pair as the de-duplicated data of the vertex.
3. The method of claim 2, wherein after obtaining the key value pairs corresponding to the respective vertices, the method further comprises:
sorting the keys of all the vertices, and rearranging the value corresponding to each key according to the sorted order, so as to obtain the number of repeated key value pairs corresponding to each vertex;
the removing repeated key value pairs from the key value pairs corresponding to the vertex comprises:
determining the number of key value pairs to be removed for the vertex according to the number of repeated key value pairs corresponding to the vertex;
and removing that number of repeated key value pairs corresponding to the vertex.
4. The method of any of claims 1-3, wherein the target reconstruction data comprises coordinates of the remaining vertices, normal vectors of the remaining vertices, colors of the remaining vertices, and patch indices, the patch indices being newly generated from the remaining vertices.
5. A three-dimensional reconstruction method, applied to a rendering terminal, comprising the following steps:
receiving target reconstruction data sent by an acquisition terminal; wherein the target reconstruction data is obtained by removing repeated vertex data from initial reconstruction data corresponding to a three-dimensional model of a target object, the three-dimensional model being reconstructed from multiple frames of depth images and corresponding RGB images captured while the target object rotates in a preset posture;
receiving pose data of the remaining vertices in the current-frame three-dimensional model sent by the acquisition terminal; wherein the pose data of the remaining vertices is extracted from the pose data of each vertex of the three-dimensional model according to the indices of the vertices remaining after de-duplication, the pose data of each vertex being acquired from the target depth image of the current frame of the target object;
and replacing the pose data of the corresponding vertices of the target reconstruction data with the pose data of the remaining vertices, and rendering and displaying the three-dimensional model corresponding to the current frame according to the replaced target reconstruction data.
6. An acquisition terminal, comprising an RGBD camera, a processor, a memory and a communication interface, wherein the RGBD camera, the memory and the communication interface are connected with the processor through a bus;
the RGBD camera is used for acquiring a depth image and an RGB image;
the memory stores a computer program according to which the processor performs the following operations:
acquiring multiple frames of depth images and corresponding RGB images of a target object rotating in a preset posture, as captured by the RGBD camera, and reconstructing a three-dimensional model of the target object from the multiple frames of depth images and the corresponding RGB images;
removing repeated vertex data in the initial reconstruction data corresponding to the three-dimensional model to obtain target reconstruction data, and sending the target reconstruction data to a rendering terminal through the communication interface;
acquiring a target depth image of the current frame of the target object, as captured by the RGBD camera during the interaction process, and obtaining pose data of each vertex in the three-dimensional model according to the target depth image;
and extracting, from the pose data of each vertex, the pose data of the remaining vertices according to the indices of the vertices remaining after de-duplication, and sending the pose data of the remaining vertices to the rendering terminal through the communication interface, so that the rendering terminal replaces the pose data of the corresponding vertices of the target reconstruction data with the pose data of the remaining vertices and then renders and displays the three-dimensional model corresponding to the current frame.
7. The acquisition terminal of claim 6, wherein the processor removes repeated vertex data from the initial reconstruction data corresponding to the three-dimensional model by:
taking the three-dimensional coordinates of each vertex of the three-dimensional model in the initial reconstruction data as a key, and taking the normal vector and the color of the corresponding vertex as values to obtain a key value pair corresponding to each vertex;
and for each vertex, removing repeated key value pairs among the key value pairs corresponding to that vertex so that a single key value pair remains, and taking the retained key value pair as the de-duplicated data of the vertex.
8. The acquisition terminal of claim 7, wherein after obtaining the key value pairs corresponding to the respective vertices, the processor further performs:
sorting the keys of all the vertices, and rearranging the value corresponding to each key according to the sorted order, so as to obtain the number of repeated key value pairs corresponding to each vertex;
the processor removes repeated key value pairs from the key value pairs corresponding to the vertex as follows:
determining the number of key value pairs to be removed for the vertex according to the number of repeated key value pairs corresponding to the vertex;
and removing that number of repeated key value pairs corresponding to the vertex.
9. The acquisition terminal of any of claims 6-8, wherein the target reconstruction data comprises coordinates of the remaining vertices, normal vectors of the remaining vertices, colors of the remaining vertices, and patch indices, the patch indices being newly generated from the remaining vertices.
10. A rendering terminal, comprising a processor, a memory, a display and a communication interface, wherein the display, the memory and the communication interface are connected with the processor through a bus;
the memory stores a computer program according to which the processor performs the following operations:
receiving, through the communication interface, target reconstruction data sent by an acquisition terminal; wherein the target reconstruction data is obtained by removing repeated vertex data from initial reconstruction data corresponding to a three-dimensional model of a target object, the three-dimensional model being reconstructed from multiple frames of depth images and corresponding RGB images captured while the target object rotates in a preset posture;
receiving, through the communication interface, pose data of the remaining vertices in the current-frame three-dimensional model sent by the acquisition terminal; wherein the pose data of the remaining vertices is extracted from the pose data of each vertex of the three-dimensional model according to the indices of the vertices remaining after de-duplication, the pose data of each vertex being acquired from the target depth image of the current frame of the target object;
and replacing the pose data of the corresponding vertices of the target reconstruction data with the pose data of the remaining vertices, rendering the three-dimensional model corresponding to the current frame according to the replaced target reconstruction data, and displaying the three-dimensional model through the display.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111533663.6A CN114373041B (en) | 2021-12-15 | 2021-12-15 | Three-dimensional reconstruction method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114373041A true CN114373041A (en) | 2022-04-19 |
CN114373041B CN114373041B (en) | 2024-04-02 |
Family
ID=81139340
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111533663.6A Active CN114373041B (en) | 2021-12-15 | 2021-12-15 | Three-dimensional reconstruction method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114373041B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107292965A (en) * | 2017-08-03 | 2017-10-24 | 北京航空航天大学青岛研究院 | A kind of mutual occlusion processing method based on depth image data stream |
WO2019161813A1 (en) * | 2018-02-23 | 2019-08-29 | 清华-伯克利深圳学院筹备办公室 | Dynamic scene three-dimensional reconstruction method, apparatus and system, server, and medium |
CN109147025A (en) * | 2018-07-11 | 2019-01-04 | 北京航空航天大学 | A kind of Texture Generating Approach towards RGBD three-dimensional reconstruction |
CN109872402A (en) * | 2019-01-24 | 2019-06-11 | 腾讯科技(深圳)有限公司 | 3D model treatment method, apparatus, computer equipment and storage medium |
WO2021077720A1 (en) * | 2019-10-25 | 2021-04-29 | 深圳奥比中光科技有限公司 | Method, apparatus, and system for acquiring three-dimensional model of object, and electronic device |
CN111402412A (en) * | 2020-04-16 | 2020-07-10 | Oppo广东移动通信有限公司 | Data acquisition method and device, equipment and storage medium |
CN112085826A (en) * | 2020-09-27 | 2020-12-15 | 广州海格星航信息科技有限公司 | Efficient three-dimensional space grid rendering method and device |
CN112734890A (en) * | 2020-12-22 | 2021-04-30 | 上海影谱科技有限公司 | Human face replacement method and device based on three-dimensional reconstruction |
CN112802098A (en) * | 2021-01-16 | 2021-05-14 | 北京化工大学 | TM-SRC-based three-dimensional non-texture target pose estimation method |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024001953A1 (en) * | 2022-06-30 | 2024-01-04 | 维沃移动通信有限公司 | Lossless coding method and apparatus, lossless decoding method and apparatus, and device |
CN115880436A (en) * | 2022-12-26 | 2023-03-31 | 杭州新迪数字工程系统有限公司 | Method and system for determining visibility of CAD (computer-aided design) model and electronic equipment |
CN115880436B (en) * | 2022-12-26 | 2024-02-13 | 上海新迪数字技术有限公司 | CAD model visibility determination method, system and electronic equipment |
CN116320521A (en) * | 2023-03-24 | 2023-06-23 | 吉林动画学院 | Three-dimensional animation live broadcast method and device based on artificial intelligence |
Also Published As
Publication number | Publication date |
---|---|
CN114373041B (en) | 2024-04-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||