CN114373041B - Three-dimensional reconstruction method and device


Info

Publication number
CN114373041B
Authority
CN
China
Prior art keywords: data, vertex, vertexes, target, dimensional model
Prior art date
Legal status
Active
Application number
CN202111533663.6A
Other languages
Chinese (zh)
Other versions
CN114373041A (en)
Inventor
张思栋
许瀚誉
吴连朋
Current Assignee
Juhaokan Technology Co Ltd
Original Assignee
Juhaokan Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Juhaokan Technology Co Ltd
Priority to CN202111533663.6A
Publication of CN114373041A
Application granted
Publication of CN114373041B


Classifications

    • G06T 17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 19/20: Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06T 7/70: Determining position or orientation of objects or cameras
    • G06T 2207/10024: Indexing scheme for image analysis or enhancement; image acquisition modality: color image
    • G06T 2207/30196: Indexing scheme for image analysis or enhancement; subject of image: human being, person
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application relates to the technical fields of computer vision and computer graphics, and provides a three-dimensional reconstruction method and device. After the acquisition terminal reconstructs a three-dimensional model of a target object, repeated vertex data are removed from the initial reconstruction data to obtain target reconstruction data; because the target reconstruction data contain no redundant vertex data, their volume is smaller and the bandwidth pressure of transmitting them is reduced. During real-time interaction, the acquisition terminal obtains the pose data of each vertex in the three-dimensional model from the target depth image of the current frame of the target object, and extracts the pose data of the remaining vertices according to the indices of the vertices remaining after deduplication, which reduces the redundancy of the pose data transmitted in real time and increases their transmission speed. This ensures that the rendering terminal can render and display the three-dimensional model corresponding to the current frame in real time from the pose data of the remaining vertices and the target reconstruction data, improving the real-time performance of three-dimensional reconstruction.

Description

Three-dimensional reconstruction method and device
Technical Field
The present application relates to the field of computer vision and computer graphics, and in particular, to a three-dimensional reconstruction method and apparatus.
Background
In recent years, three-dimensional reconstruction technology has become an important research topic in fields such as computer vision and computer graphics, and it is widely applied in scenarios such as augmented reality, tele-immersive communication and live three-dimensional video broadcasting.
Three-dimensional reconstruction refers to the process of recovering three-dimensional information from single-view or multi-view images. For static scenes and objects, their three-dimensional models can be reconstructed once by a static three-dimensional reconstruction algorithm and then rendered and displayed in the three-dimensional scene. For dynamic objects (e.g., people), because they change over time, it is usually necessary to create a series of three-dimensional models that are rendered and displayed in sequence to describe the motion state of the object.
In a real-time remote three-dimensional reconstruction social system, the main focus is real-time dynamic three-dimensional reconstruction of human bodies and faces. The acquisition terminal collects data with various sensors, reconstructs three-dimensional human body information with a dynamic three-dimensional reconstruction method and sends it to the rendering terminal, which renders and displays the three-dimensional human body model in real time according to that information. Because reconstruction data such as the vertex colors, vertex normal vectors and patches of multiple three-dimensional models are transmitted in real time, the data volume per frame is large and puts great pressure on the bandwidth. When the frame rate of the three-dimensional model is insufficient, the model's motion visibly stutters at the rendering terminal, and once the frame rate stays below 20 frames per second for longer than a certain threshold it causes dizziness, impairing the user's immersive experience.
Disclosure of Invention
The embodiments of the application provide a three-dimensional reconstruction method and device, which are used to reduce the data volume of three-dimensional reconstruction and the transmission bandwidth pressure, thereby improving the real-time performance of three-dimensional reconstruction.
In a first aspect, an embodiment of the present application provides a three-dimensional reconstruction method, applied to an acquisition terminal, including:
acquiring multi-frame depth images and corresponding RGB images of a target object rotating in a preset pose, and reconstructing a three-dimensional model of the target object according to the multi-frame depth images and the corresponding RGB images;
removing repeated vertex data from the initial reconstruction data corresponding to the three-dimensional model to obtain target reconstruction data, and sending the target reconstruction data to a rendering terminal;
acquiring a target depth image of the current frame of the target object during interaction, and obtaining pose data of each vertex in the three-dimensional model according to the target depth image;
and extracting, according to the indices of the remaining vertices after deduplication, the pose data of the remaining vertices corresponding to those indices from the pose data of each vertex, and sending them to the rendering terminal, so that the rendering terminal renders and displays the three-dimensional model corresponding to the current frame after replacing the pose data of the corresponding vertices of the target reconstruction data with the pose data of the remaining vertices.
In a second aspect, an embodiment of the present application provides a three-dimensional reconstruction method, applied to a rendering terminal, including:
receiving target reconstruction data sent by an acquisition terminal; the target reconstruction data are obtained by removing repeated vertex data from the initial reconstruction data corresponding to a three-dimensional model of a target object, wherein the three-dimensional model is reconstructed from acquired multi-frame depth images and corresponding RGB images of the target object rotating in a preset pose;
receiving pose data of the remaining vertices in the three-dimensional model of the current frame sent by the acquisition terminal; the pose data of the remaining vertices are extracted from the pose data of all vertices of the three-dimensional model according to the indices of the remaining vertices after deduplication, and the pose data of all vertices are obtained according to a target depth image of the current frame of the target object;
and replacing the pose data of the corresponding vertices of the target reconstruction data with the pose data of the remaining vertices, and rendering and displaying the three-dimensional model corresponding to the current frame according to the replaced target reconstruction data.
In a third aspect, an embodiment of the present application provides an acquisition terminal, including a processor, a memory, a communication interface and an RGBD camera, where the RGBD camera, the memory and the communication interface are connected with the processor through a bus;
The RGBD camera is used for acquiring a depth image and an RGB image;
the memory stores a computer program, and the processor performs the following operations according to the computer program:
acquiring multi-frame depth images and corresponding RGB images, collected by the RGBD camera, of a target object rotating in a preset pose, and reconstructing a three-dimensional model of the target object according to the multi-frame depth images and the corresponding RGB images;
removing repeated vertex data in initial reconstruction data corresponding to the three-dimensional model to obtain target reconstruction data, and sending the target reconstruction data to a rendering terminal through the communication interface;
acquiring a target depth image of the current frame of the target object, collected by the RGBD camera during interaction, and obtaining pose data of each vertex in the three-dimensional model according to the target depth image;
and extracting, according to the indices of the remaining vertices after deduplication, the pose data of the remaining vertices corresponding to those indices from the pose data of each vertex, and sending the pose data of the remaining vertices to the rendering terminal through the communication interface, so that the rendering terminal renders and displays the three-dimensional model corresponding to the current frame after replacing the pose data of the corresponding vertices of the target reconstruction data with the pose data of the remaining vertices.
In a fourth aspect, the present application provides a rendering terminal, including a processor, a memory, a display and a communication interface, where the display, the memory and the communication interface are connected with the processor by a bus;
the memory stores a computer program, and the processor performs the following operations according to the computer program:
receiving target reconstruction data sent by an acquisition terminal through the communication interface; the target reconstruction data are obtained by removing repeated vertex data from the initial reconstruction data corresponding to a three-dimensional model of a target object, wherein the three-dimensional model is reconstructed from acquired multi-frame depth images and corresponding RGB images of the target object rotating in a preset pose;
receiving pose data of the remaining vertices in the three-dimensional model of the current frame sent by the acquisition terminal through the communication interface; the pose data of the remaining vertices are extracted from the pose data of all vertices of the three-dimensional model according to the indices of the remaining vertices after deduplication, and the pose data of all vertices are obtained according to a target depth image of the current frame of the target object;
and replacing the pose data of the corresponding vertices of the target reconstruction data with the pose data of the remaining vertices, rendering the three-dimensional model corresponding to the current frame according to the replaced target reconstruction data, and displaying it through the display.
In a fifth aspect, the present application provides a computer-readable storage medium storing computer-executable instructions for causing a computer to perform the three-dimensional reconstruction method provided by the embodiments of the present application.
In the above embodiments of the present application, the acquisition terminal reconstructs a three-dimensional model of the target object from multi-frame depth images and corresponding RGB images of the target object rotating in a preset pose, and removes repeated vertex data from the initial reconstruction data of the three-dimensional model to obtain target reconstruction data. Because of the vertex deduplication, the target reconstruction data contain no redundant vertex data, so their volume is reduced relative to the initial reconstruction data, and the transmission bandwidth pressure is lowered when they are sent to the rendering terminal. Further, during real-time interaction, the acquisition terminal obtains the pose data of each vertex in the three-dimensional model according to the target depth image of the current frame of the target object, and extracts the pose data of the remaining vertices from the obtained pose data according to the indices of the vertices remaining after deduplication, reducing the redundancy of the pose data transmitted in real time and increasing their transmission speed. This ensures that the rendering terminal can render and display the three-dimensional model corresponding to the current frame in real time according to the pose data of the remaining vertices and the target reconstruction data, improving the real-time performance of three-dimensional reconstruction.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 illustrates a block diagram of a three-dimensional reconstruction system provided by an embodiment of the present application;
fig. 2 is a flowchart schematically illustrating a three-dimensional reconstruction method implemented by the acquisition terminal side according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating a method for acquisition terminal side self-modeling according to an embodiment of the present application;
FIG. 4 schematically illustrates a self-modeling process provided by an embodiment of the present application;
FIG. 5 is a flowchart illustrating a method for model reconstruction data deduplication according to an embodiment of the present application;
fig. 6 is a flowchart schematically illustrating a three-dimensional reconstruction method implemented by a rendering terminal according to an embodiment of the present application;
FIG. 7 schematically illustrates a reconstruction effect provided by an embodiment of the present application;
FIG. 8 illustrates a flow chart of a complete three-dimensional reconstruction method provided by an embodiment of the present application;
fig. 9 illustrates a block diagram of an acquisition terminal according to an embodiment of the present application;
fig. 10 exemplarily shows a block diagram of a rendering terminal provided in an embodiment of the present application.
Detailed Description
An important application scenario of dynamic three-dimensional reconstruction technology is the real-time remote three-dimensional reconstruction social system. After the acquisition terminal reconstructs a three-dimensional model from the collected color images and depth data, the three-dimensional reconstruction data are transmitted through the cloud to the rendering terminal for rendering and display. The roles of acquisition terminal and rendering terminal are relative to the user. For example, user A corresponds to display terminal 1 and user B corresponds to display terminal 2; display terminal 1 is the acquisition terminal for user A and the rendering terminal for user B, while display terminal 2 is the acquisition terminal for user B and the rendering terminal for user A.
For example, in the virtual social scenario of a real-time remote three-dimensional communication system, different user terminals transmit their own dynamic three-dimensional model data, obtained through three-dimensional reconstruction, to the other user terminals, which render and display the data locally after receiving it. Compared with traditional voice or video communication, this approach gives users in different places an immersive social experience.
The core technologies of a real-time remote three-dimensional reconstruction social system involve real-time three-dimensional reconstruction of human bodies and faces, encoding and decoding of two-dimensional or three-dimensional data, transmission technology, and immersive rendering and display technology. Three-dimensional reconstruction data involve the vertices, patches and similar elements of the three-dimensional geometry, where the vertex data include vertex coordinates, vertex normal vectors and vertex colors. The dynamic three-dimensional reconstruction process for a human body is as follows: first, input parameters such as pose data, geometric data and (clothing) material data of the human body are obtained from various sensor devices, and these input parameters are processed with a non-rigid real-time three-dimensional reconstruction method to reconstruct a three-dimensional model of the human body.
In the dynamic three-dimensional reconstruction process, the higher the voxel resolution of the model, the larger the data volume. In the absence of a mature, efficient and high-fidelity three-dimensional data compression technology, cloud data transmission therefore has an important influence on the quality of model reconstruction and on the imaging at the rendering terminal.
For example, at 30 FPS over a local area network, voxels with a resolution of 192×128 require a transmission rate of 256 Mbps, and a resolution of this magnitude may already give a poor rendering result; voxels with a resolution of 384×384 require a transmission rate of 1120 Mbps (again at 30 FPS), and the amount of data to be transmitted grows far more steeply than at 192×128, which is difficult to transmit in real time even under ideal present-day 5G network bandwidth conditions.
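As a sanity check on these figures, the per-frame payload implied by a sustained bitrate can be computed directly; the short sketch below (Python; the 30 FPS figure comes from the example above, everything else is purely illustrative and not part of the patent) does the conversion.

```python
# Back-of-the-envelope check of the bitrates quoted above
# (illustrative only; the bitrates come from the text, 30 FPS assumed).
def payload_mb_per_frame(bitrate_mbps: float, fps: float = 30.0) -> float:
    """Convert a sustained bitrate in Mbit/s into MB carried per frame."""
    return bitrate_mbps / fps / 8.0  # Mbit -> MByte

print(payload_mb_per_frame(256.0))   # ~1.07 MB per frame
print(payload_mb_per_frame(1120.0))  # ~4.67 MB per frame
```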
However, real-time social interaction requires transmitting data such as the model vertices, patches and textures of every reconstructed frame. Achieving real-time transmission of the model while preserving the fidelity and authenticity of the three-dimensional reconstruction has therefore become the research focus in productizing real-time remote three-dimensional reconstruction social systems.
At present, real-time remote three-dimensional reconstruction social systems generally adopt pre-modeling combined with a real-time pose-driven model. This reduces the amount of data transmitted in real time, but the vertex data in the pre-modeled reconstruction data and the pose data transmitted in real time remain highly redundant, which still affects the real-time performance of the system; for multi-channel transmission of high-precision model data, the real-time performance is even worse.
In view of this, in order to reduce the network bandwidth occupied by data transmission while guaranteeing the authenticity of the model, so as to meet the real-time requirement of the social system and improve the rendering and display efficiency of the rendering terminal, the embodiments of the application provide a three-dimensional reconstruction method and device that divide real-time three-dimensional reconstruction and remote rendering display into four stages. The first stage is the self-modeling stage: the depth images and RGB images collected by the RGBD camera are reconstructed into a high-precision three-dimensional model by a high-performance computer running a three-dimensional reconstruction algorithm, and the data of the complete three-dimensional model include vertex coordinates, vertex colors, vertex normal vectors, the patches formed by the vertices, textures and so on. The second stage is the model vertex deduplication stage: many vertices of the reconstructed model are repeated, producing redundant coordinate, normal vector and color data that inflate the model's data volume and slow transmission, so removing the redundant vertex data reduces the data volume of the model. The third stage is the transmission stage for the deduplicated model data and the current model pose data: the deduplicated vertex coordinates, vertex normal vectors, vertex colors and other model data, together with the real-time pose data of the vertices of each frame of the model, are transmitted to the cloud. The fourth stage is the rendering and display stage: a three-dimensional model is rendered from the deduplicated model data and the vertex pose data transmitted in real time, and then displayed. In the embodiments of the application, vertex deduplication effectively reduces the transmitted data volume, lowers transmission latency and improves rendering and display efficiency; rendering the human body three-dimensional model from the deduplicated model data and the real-time vertex pose data makes its motion more natural and realistic and improves the viewing effect, thereby enhancing the user's immersive experience.
It should be noted that the method provided by the embodiments of the application is suitable not only for three-dimensional reconstruction and data transmission of human bodies and faces, but also for three-dimensional reconstruction and real-time data transmission of any rigid or non-rigid moving object.
Embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 schematically illustrates a three-dimensional reconstruction system architecture provided in an embodiment of the present application. As shown in fig. 1, the system includes an acquisition terminal 101, a transmission terminal 102, and a rendering terminal 103.
The acquisition terminal 101 includes an RGBD camera 101_1 and a high-performance computer 101_2. The RGBD camera 101_1 may be a Kinect camera or a Realsense camera. It is mainly used to collect depth images and RGB images of the user and of the real scene the user is in; the depth images and RGB images of the scene can be used for foreground segmentation, separating the human body from the whole scene. The companion computer 101_2 processes the collected depth images and RGB images of the user to obtain human body pose data and reconstructs data such as the vertex coordinates, vertex colors, vertex normal vectors and the patches formed by the vertices of the three-dimensional model. In addition, the companion computer 101_2 can deduplicate the pose data and the reconstruction data, thereby reducing redundant data.
The transmission terminal 102 is used to obtain the reconstruction data and real-time pose data from the acquisition terminal and to distribute the data after encoding. In general, the transmission terminal 102 losslessly encodes the data received from the acquisition terminal 101 and the data distributed to the rendering terminal 103, so mature, efficient and faithful compression technology can be adopted for encoding and decoding. The transmission terminal 102 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, big data and artificial intelligence platforms.
The rendering terminal 103 renders the three-dimensional model of the human body from the deduplicated reconstruction data and the real-time pose data received from the transmission terminal 102, and displays it immersively. The rendering terminal 103 may be a display device such as a television, a mobile phone, a computer, or VR/AR glasses.
It should be noted that the system architecture shown in fig. 1 can be deployed according to different usage scenarios. For example, in a live broadcast scenario, the host side deploys the acquisition terminal of the system and the client side deploys the rendering terminal, and users can view the reconstructed model through the rendering terminal. For another example, in a conference scenario, the two conference rooms of a teleconference each need to deploy both an acquisition terminal and a rendering terminal so that real-time three-dimensional communication can take place between the two rooms, an experience superior to traditional voice or video communication.
Based on the system architecture shown in fig. 1, fig. 2 schematically shows a flowchart of the three-dimensional reconstruction method provided in an embodiment of the present application. As shown in fig. 2, the flow is executed by the acquisition terminal and mainly includes the following steps:
s201: and acquiring a multi-frame depth image and a corresponding RGB image of the target object rotating in a preset gesture, and reconstructing a three-dimensional model of the target object according to the multi-frame depth image and the corresponding RGB image.
In the embodiment of the present application, the acquisition terminal constructs the three-dimensional model of the target object in advance by collecting depth images and RGB images of the target object, as detailed in fig. 3:
s2011: and acquiring multi-frame scene depth images acquired by the RGBD camera.
In performing S2011, the RGBD camera is fixed, and a plurality of frames of scene depth images not including the target object are acquired for segmentation of the foreground and the background.
S2012: and acquiring a depth image of the target object in a preset posture acquired by the RGBD camera, and generating an inner layer driving model of the target object according to the acquired depth image.
In an alternative embodiment, when S2012 is executed, the target object walks in front of the RGBD camera, faces it, assumes a T-pose and stays still for 2-3 seconds. One frame of depth image of the target object is collected by the RGBD camera and taken as the reference frame; combined with the scene depth images, foreground-background segmentation is performed, and a mask map of the target object is generated through operations such as dilation and erosion of the image. Further, according to the generated mask map of the target object, point cloud data of the target object are extracted from the depth image of the target object, and non-rigid fitting is performed between the extracted point cloud data and the point cloud data of a parameterized human body model (such as an SMPL model) to obtain the inner-layer driving model of the target object.
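As an illustration of this segmentation step, the following sketch (Python with OpenCV; the function name, threshold and data layout are assumptions for illustration, not the patent's implementation) builds a foreground mask by comparing the reference-frame depth against the empty-scene depth and cleaning the result with erosion and dilation.

```python
import cv2
import numpy as np

# Illustrative sketch of the mask-generation step: compare the
# reference-frame depth against the empty-scene depth, then clean the
# mask with erosion and dilation (threshold and names are assumptions).
def foreground_mask(depth: np.ndarray, scene_depth: np.ndarray,
                    thresh_mm: float = 50.0) -> np.ndarray:
    """depth, scene_depth: (H, W) depth maps in millimeters; returns a
    uint8 mask where 255 marks the target object."""
    valid = (depth > 0) & (scene_depth > 0)
    closer = scene_depth.astype(np.float32) - depth.astype(np.float32)
    mask = (valid & (closer > thresh_mm)).astype(np.uint8) * 255
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.erode(mask, kernel)    # remove speckle noise
    mask = cv2.dilate(mask, kernel)   # restore the silhouette
    return mask
```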
S2013: and acquiring a multi-frame depth image and a corresponding RGB image of the target object rotating in a preset gesture.
In an alternative embodiment, when S2013 is performed, the target object rotates one full turn in front of the RGBD camera while holding the T-pose. During the rotation it must be ensured that the RGBD camera can collect complete data of the target object from head to foot, so as to improve the completeness of the three-dimensional model.
S2014: reconstructing a geometric model of the target object according to the multi-frame depth image, and carrying out texture mapping on the geometric model according to the multi-frame RGB image to generate a three-dimensional model of the target object.
When S2014 is executed, for each frame of depth image, the point cloud data of the target object are extracted from the depth image according to the mask map of the target object, enriching the point cloud data extracted from the reference frame, and non-rigid fitting between the extracted point cloud data and the inner-layer driving model is performed with a dynamic real-time non-rigid reconstruction algorithm, which moves the skeleton nodes of the inner-layer driving model (shown in (a) of fig. 4) and generates the geometric model of the target object. Further, according to the mapping relation between the depth image and the RGB image, the colors of the geometric vertices are obtained from the corresponding RGB image and texture mapping is applied to the geometric model, giving a three-dimensional model of the real target object, as shown in (b) of fig. 4.
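A minimal sketch of that color lookup is given below (Python/NumPy; it assumes the depth and RGB images are registered, that K is the color camera's intrinsic matrix, and that the vertices are expressed in the camera frame; all names are illustrative).

```python
import numpy as np

# Sketch of the vertex-color lookup: project each geometric vertex into
# the registered RGB image and sample the nearest pixel (K and the
# camera-frame assumption are illustrative).
def sample_vertex_colors(verts: np.ndarray, rgb: np.ndarray,
                         K: np.ndarray) -> np.ndarray:
    """verts: (N, 3) vertices; rgb: (H, W, 3) image; returns (N, 3) colors."""
    proj = (K @ verts.T).T                             # homogeneous pixels
    u = (proj[:, 0] / proj[:, 2]).round().astype(int)
    v = (proj[:, 1] / proj[:, 2]).round().astype(int)
    u = np.clip(u, 0, rgb.shape[1] - 1)                # stay inside the image
    v = np.clip(v, 0, rgb.shape[0] - 1)
    return rgb[v, u]
```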
S202: and removing repeated vertex data in the initial reconstruction data corresponding to the three-dimensional model, obtaining target reconstruction data, and sending the target reconstruction data to the rendering terminal so that the rendering terminal renders the three-dimensional model according to the target reconstruction data.
In the embodiment of the present application, after the complete and realistic three-dimensional model of the target object is obtained, reconstruction of the three-dimensional model is stopped so that no new vertex or texture data are taken in; that is, the geometry, topology, number of vertices, vertex colors and the patches generated by the vertices of the three-dimensional model are kept unchanged.
Generally, the reconstructed three-dimensional model has many repeated vertices, which produce redundant data such as repeated normal vectors and colors and slow down data transmission. Therefore, in S202, after the three-dimensional model is reconstructed through the flow shown in fig. 3, the initial reconstruction data of the three-dimensional model (including data such as vertex coordinates, vertex normal vectors, vertex colors and patch indices) are obtained, and the vertex data in the initial reconstruction data are deduplicated to reduce the data volume of the three-dimensional model, yielding the target reconstruction data. For the implementation, see fig. 5:
s2021: and taking the three-dimensional coordinates of each vertex of the three-dimensional model in the initial reconstruction data as a key, and taking the normal vector and the color of the corresponding vertex as values to obtain the key value pair corresponding to each vertex.
When S2021 is executed, the vertex data are associated and deduplicated by means of key-value pairs. Specifically, the three-dimensional coordinates of each vertex are used as its key, and the color and normal vector of the vertex at the current three-dimensional coordinates are used as its value, generating a one-to-one correspondence between the three-dimensional coordinates of a vertex and its color and normal vector.
In an alternative embodiment, the three-dimensional coordinates of the vertices are represented by position numbers: for example, position number 1 represents the vertex at (x1, y1, z1) (i.e., the key of the vertex at (x1, y1, z1) is 1), position number 2 represents the vertex at (x2, y2, z2) (i.e., the key of the vertex at (x2, y2, z2) is 2), and so on. The color and normal vector of the vertex corresponding to a three-dimensional coordinate are represented by English letters: for example, the vertex at (x1, y1, z1) has color and normal vector A (i.e., the value of the vertex at (x1, y1, z1) is A), the vertex at (x2, y2, z2) has color and normal vector B (i.e., the value of the vertex at (x2, y2, z2) is B), and so on. Assuming that the keys of the vertices in the three-dimensional model are 5, 2, 3, 2, 1, 3, 4 and the values of the vertices are E, B, C, B, A, C, D, the key-value pairs corresponding to the vertices are {(5, E), (2, B), (3, C), (2, B), (1, A), (3, C), (4, D)}.
S2022: and removing repeated key value pairs in each key value pair corresponding to each vertex for each vertex to obtain one key value pair corresponding to the vertex, and taking one key value pair corresponding to the vertex as data after vertex de-duplication.
In an alternative embodiment, when S2022 is executed, the vertices represented by three-dimensional coordinates are traversed; for each vertex, its key is compared with the other keys. If the key appears for the first time, the key-value pair corresponding to the vertex represented by the current three-dimensional coordinates is kept; otherwise it is deleted. This ensures that each vertex corresponds to exactly one key-value pair, which is taken as the deduplicated vertex data, thereby removing the redundant vertex data from the initial reconstruction data and reducing the data volume.
In other embodiments, to improve deduplication efficiency, the vertex data may first be sorted and the deduplication performed on the sorted result. Then, for each vertex, the number of key-value pairs to be removed is determined from the number of repeated key-value pairs corresponding to that vertex, and the repeated key-value pairs are removed accordingly so that the key-value pair corresponding to the vertex is unique.
For example, assume the keys of the vertices before sorting are 5, 2, 3, 2, 1, 3, 4 and their values are E, B, C, B, A, C, D. After sorting from small to large, the keys are 1, 2, 2, 3, 3, 4, 5 and the values are A, B, B, C, C, D, E. According to the sorted result, the vertex at position 1 corresponds to one key-value pair and has no duplicates, so no deduplication is needed; the vertex at position 2 corresponds to two key-value pairs, so one must be removed to make its key-value pair unique; the vertex at position 3 corresponds to two key-value pairs, so one must be removed to make its key-value pair unique; the vertices at positions 4 and 5 each correspond to one key-value pair and have no duplicates, so no deduplication is needed. After the repeated key-value pairs are removed, the keys corresponding to the vertices are 1, 2, 3, 4, 5 and the values are A, B, C, D, E.
After the vertex data of the three-dimensional model are deduplicated in the self-modeling stage, new patches are generated from the remaining vertices by way of the three-dimensional coordinates of the vertices contained in each patch of the three-dimensional model before deduplication, and the vertex indices corresponding to the new patches are determined.
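The following is a minimal sketch of this deduplication and patch re-indexing (Python/NumPy; the (N,3)-array layout for vertices/normals/colors and (M,3) index triples for patches are illustrative assumptions, not the patent's data format).

```python
import numpy as np

# Minimal sketch of S2021/S2022 plus the patch re-indexing: the vertex
# coordinates act as keys, the normal/color ride along as values, only
# the first occurrence of each key is kept, and patches are remapped.
def deduplicate(verts: np.ndarray, normals: np.ndarray,
                colors: np.ndarray, faces: np.ndarray):
    first_seen = {}       # coordinates -> index among the kept vertices
    remaining = []        # original index of each kept vertex
    remap = np.empty(len(verts), dtype=np.int64)
    for i, v in enumerate(verts):
        key = tuple(v)                    # the coordinates are the key
        if key not in first_seen:         # keep only the first occurrence
            first_seen[key] = len(remaining)
            remaining.append(i)
        remap[i] = first_seen[key]        # old index -> new index
    keep = np.asarray(remaining)
    new_faces = remap[faces]              # regenerate the patches
    return verts[keep], normals[keep], colors[keep], new_faces, keep
```

Here the returned keep array plays the role of the recorded indices of the remaining vertices, which is reused below to filter the real-time pose data.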
It should be noted that associating vertex coordinates, vertex colors and vertex normal vectors by means of key-value pairs is only an example; the embodiments of the present application do not limit the form in which the vertex data are associated, and the association may, for example, also be established by means of an array.
In the embodiment of the application, since the three-dimensional model has been completely reconstructed, integrating new data into the voxels is stopped and no new vertices are introduced; that is, the geometry, topology, number of vertices, vertex colors and the patches formed by the vertices of the three-dimensional model are kept unchanged, and only the pose data of each vertex, controlled by the skeleton and the SMPL skinning, change from frame to frame. Therefore, to improve deduplication efficiency, the indices of the vertices remaining after deduplication of the initial reconstruction data can be recorded, and redundant data can later be removed directly according to those indices. The index characterizes the position of a vertex in the three-dimensional model.
In S202, after the repeated vertex data are removed from the initial reconstruction data, the target reconstruction data are obtained. The target reconstruction data are the complete data of the target object and include the coordinates of the remaining vertices, the normal vectors of the remaining vertices, the colors of the remaining vertices and the patch indices newly generated from the remaining vertices. Because the target reconstruction data contain no redundant data, their volume is smaller; when they are transmitted losslessly to the rendering terminal through the cloud, the completeness and reconstruction accuracy of the three-dimensional model are preserved while the bandwidth pressure is reduced and the transmission speed is improved. The rendering terminal receives and stores the target reconstruction data so as to render the three-dimensional model from them during real-time interaction.
It should be noted that the target reconstruction data are transmitted only once; they do not represent the real pose during actual interaction and need not be transmitted in real time. During actual interaction the target object does not hold a constant pose, so the pose of the three-dimensional model is adjusted according to the current pose of the target object, making the motion of the realistic three-dimensional model match the motion of the target object and improving the immersiveness of remote interaction.
S203: and acquiring a target depth image of a current frame of the target object in the interaction process, and acquiring pose data of each vertex in the three-dimensional model according to the target depth image.
During real-time interaction, the target object changes its position and pose by moving. The target depth image of the current frame of the target object, collected in real time by the RGBD camera, is converted into target point cloud data; the point cloud data are transformed into the space of the inner-layer driving model generated from the reference frame, and the inner-layer driving model corresponding to the reference frame is driven into the pose of the target object in the current frame through operations such as non-rigid Iterative Closest Point (ICP) and skinning computation, thereby obtaining the pose data of the model corresponding to the current frame, i.e., the three-dimensional coordinates of each vertex.
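A sketch of the depth-to-point-cloud conversion at the start of that pipeline is shown below (Python/NumPy; the pinhole intrinsic matrix K and metric depth values are assumptions for illustration).

```python
import numpy as np

# Sketch of converting the current-frame target depth image into point
# cloud data before the non-rigid ICP step (pinhole model assumed).
def depth_to_point_cloud(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """depth: (H, W) depth map in meters; returns (P, 3) camera-frame points."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    v, u = np.nonzero(depth > 0)      # pixel coordinates of valid depths
    z = depth[v, u]
    x = (u - cx) * z / fx             # back-project with the pinhole model
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```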
S204: and extracting pose data of the residual vertexes corresponding to the corresponding indexes from the pose data of each vertex according to the indexes of the residual vertexes after the weight removal, and sending the pose data to a rendering terminal so that the rendering terminal can render and display a three-dimensional model corresponding to the current frame after replacing the pose data of the corresponding vertexes of the target reconstruction data with the pose data of the residual vertexes.
In the embodiment of the present application, the indices of the remaining vertices are recorded during the vertex-data deduplication in the self-modeling stage. When S204 is executed, the acquisition terminal extracts the pose data of the remaining vertices corresponding to those indices from the pose data of each vertex. Because the index of each remaining vertex is unique, the extracted pose data of the remaining vertices are unique and contain no redundant data; compared with the pose data of all vertices of the three-dimensional model obtained from the depth image, their data volume is smaller, which reduces transmission bandwidth pressure, improves transmission speed, and allows the real-time transmission requirement for pose data to be met under existing network conditions. After receiving the pose data of the remaining vertices transmitted in real time, the rendering terminal drives the three-dimensional model reconstructed from the target reconstruction data to move so that it matches the current pose of the target object.
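In code, this per-frame filtering is a single index slice; the sketch below (Python/NumPy, names illustrative) reuses the keep indices from the deduplication sketch above.

```python
import numpy as np

# Sketch of S204: slice the per-frame pose data down to the remaining
# vertices using the indices recorded at deduplication time.
def poses_for_transmission(frame_poses: np.ndarray,
                           keep: np.ndarray) -> np.ndarray:
    """frame_poses: (N, 3) positions of all model vertices this frame;
    keep: indices of the remaining vertices; returns the (R, 3) slice
    that is sent to the rendering terminal."""
    return frame_poses[keep]
```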
Corresponding to the acquisition terminal side, fig. 6 exemplarily shows a flowchart of the rendering and display method of the three-dimensional model provided in an embodiment of the present application. As shown in fig. 6, the flow is executed by the rendering terminal and mainly includes the following steps:
s601: and receiving target reconstruction data sent by the acquisition terminal.
The target reconstruction data are obtained by removing repeated vertex data from the initial reconstruction data corresponding to the three-dimensional model of the target object, and the three-dimensional model is reconstructed from the collected multi-frame depth images and corresponding RGB images of the target object rotating in the preset pose. For the specific process see S201-S202, which is not repeated here.
In S601, since the target reconstruction data include the complete data of the target object, such as vertex coordinates, vertex colors and patch indices, a complete three-dimensional model can be rendered from the target reconstruction data.
S602: and receiving pose data of the residual vertexes in the three-dimensional model corresponding to the current frame sent by the acquisition terminal.
The pose data of the remaining vertices are extracted from the pose data of each vertex of the three-dimensional model according to the indices of the remaining vertices after deduplication, and the pose data of each vertex are obtained according to the target depth image of the current frame of the target object. For details see S203-S204, which are not repeated here.
S603: and replacing pose data of corresponding vertexes of the target reconstruction data with pose data of the residual vertexes, and rendering and displaying a three-dimensional model corresponding to the current frame according to the replaced target reconstruction data.
During real-time interaction no new vertices are fused in, and the topological structure of the three-dimensional model is kept unchanged; that is, the vertex colors and the patches generated by the vertices remain the same. Therefore, in S603 the rendering terminal replaces the pose data of the corresponding vertices in the target reconstruction data with the pose data of the remaining vertices sent in real time by the acquisition terminal, so that the three-dimensional model matches the current action of the target object, improving the immersive experience of remote interaction. Fig. 7 shows the effect of 20 consecutive frames being collected, reconstructed, transmitted and rendered in real time.
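A sketch of that per-frame replacement step is shown below (Python/NumPy; the in-place array update stands in for updating a GPU vertex buffer, and the shapes follow the illustrative layout used in the earlier sketches).

```python
import numpy as np

# Sketch of S603 at the rendering terminal: overwrite the stored vertex
# positions with the received remaining-vertex poses while colors,
# normals and patches stay untouched (illustrative, not the patent's API).
def apply_frame_poses(mesh_positions: np.ndarray,
                      remaining_poses: np.ndarray) -> np.ndarray:
    """mesh_positions: (R, 3) positions from the target reconstruction data;
    remaining_poses: (R, 3) remaining-vertex pose data for the current frame."""
    assert mesh_positions.shape == remaining_poses.shape  # topology unchanged
    mesh_positions[...] = remaining_poses                 # positions only
    return mesh_positions
```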
In the embodiment of the application, on the one hand, the target reconstruction data contain the complete data of the target object and are transmitted with lossless encoding at unchanged resolution, preserving the completeness and high precision of the three-dimensional model; combined with the pose data transmitted in real time, the three-dimensional model corresponding to the current frame is rendered and displayed, the realism of the model's motion is guaranteed, and the immersive experience of remote interaction is improved. On the other hand, since both the target reconstruction data and the real-time pose data are deduplicated, their redundancy is low and their volume is small, which reduces the pressure on network bandwidth, increases transmission speed, lowers the rendering and display latency of the rendering terminal, raises rendering and display efficiency, reduces stuttering of the model's motion, and improves the display quality of the model.
The embodiment of the application reduces the data volume transmitted per frame while guaranteeing the quality of the model. Taking voxels with a resolution of 256×256 as an example, with the three-dimensional reconstruction method provided by the embodiments of the present application the data volume to be transmitted per frame is reduced from the original 45939 KB to 270 KB, a reduction of nearly 170 times, improving the real-time performance of system transmission.
The three-dimensional reconstruction method provided in the embodiment of the present application is described below from the perspective of the interaction between the acquisition terminal and the rendering terminal, as shown in fig. 8:
s801: the acquisition terminal acquires multi-frame scene depth images.
S802: the acquisition terminal acquires multi-frame depth images and corresponding RGB images of the target object rotating in a preset gesture, and performs human body segmentation according to the multi-frame scene depth images.
S803: and the acquisition terminal rebuilds a three-dimensional model of the target object according to the segmented human body depth image and the human body RGB image.
S804: the acquisition terminal determines whether the three-dimensional model is complete, if yes, S805 is executed, otherwise S802 is returned.
In an alternative embodiment, when the collected human body depth images of the target object introduce no new vertices, the three-dimensional model is determined to be completely reconstructed and image collection is stopped.
S805: the acquisition terminal acquires initial reconstruction data corresponding to the three-dimensional model, and removes repeated vertex data in the initial reconstruction data.
Generally, the initial reconstruction data corresponding to the three-dimensional model include many repeated vertex data and are highly redundant, so the data volume is large. In S805, removing the repeated vertex data from the initial reconstruction data reduces the amount of data the three-dimensional model must transmit, lowers the pressure on network bandwidth and increases transmission speed.
In S805, the coordinates, colors, normal vectors and so on of the remaining vertices after deduplication are unique, with low redundancy, small data volume and high transmission speed. For the specific deduplication process see S202, which is not repeated here.
S806: the acquisition terminal records indexes of residual vertexes after the duplicate removal.
S807: and the acquisition terminal regenerates a new triangular patch according to the residual vertexes.
S808: and the acquisition terminal transmits the coordinates, the colors, the normal vectors and the triangle patch data of the residual vertexes to the cloud server.
S809: and the rendering terminal acquires the coordinates, the colors, the normal vectors and the triangle patch data of the residual vertexes transmitted by the acquisition terminal from the cloud server.
S810: the acquisition terminal acquires a target depth image of a current frame of a target object in the interaction process, and acquires a human body target depth image of the target object.
S811: the acquisition terminal converts the human body target depth image into point cloud data, acquires pose data of each vertex in the three-dimensional model from the point cloud data, and acquires current pose data of the residual vertex according to indexes of the residual vertex.
S812: and the acquisition terminal sends the current pose data of the residual vertexes to the rendering terminal through the cloud server.
S813: and the rendering terminal replaces the three-dimensional coordinates of the corresponding vertexes in the initial reconstruction data after the duplication removal by the current pose data of the residual vertexes, and renders and displays the three-dimensional model corresponding to the current frame by the replaced data.
Based on the same technical conception, an embodiment of the application provides an acquisition terminal that can execute the acquisition-terminal-side flow of the three-dimensional reconstruction method provided by the embodiments of the application and achieve the same technical effect, which is not repeated here.
Referring to fig. 9, the acquisition terminal includes an RGBD camera 901 and a host 902. The RGBD camera is used to collect depth images and RGB images; the host 902 includes at least a processor 902_1, a memory 902_2 and a communication interface 902_3, the communication interface 902_3 and the memory 902_2 being connected with the processor 902_1 through a bus 902_4;
the memory 902_2 stores a computer program, and the processor 902_1 performs the following operations according to the computer program:
acquiring multi-frame depth images and corresponding RGB images, collected by the RGBD camera 901, of a target object rotating in a preset pose, and reconstructing a three-dimensional model of the target object according to the multi-frame depth images and the corresponding RGB images;
removing repeated vertex data in initial reconstruction data corresponding to the three-dimensional model to obtain target reconstruction data, and sending the target reconstruction data to a rendering terminal through a communication interface 902_3;
acquiring a target depth image of the current frame of the target object, collected by the RGBD camera 901 during interaction, and extracting pose data of each vertex in the three-dimensional model from the target depth image;
and extracting, according to the indices of the remaining vertices after deduplication, the pose data of the remaining vertices corresponding to those indices from the pose data of each vertex, and sending the pose data of the remaining vertices to the rendering terminal through the communication interface 902_3, so that the rendering terminal renders and displays the three-dimensional model corresponding to the current frame after replacing the pose data of the corresponding vertices of the target reconstruction data with the pose data of the remaining vertices.
Optionally, the processor 902_1 removes the repeated vertex data from the initial reconstruction data corresponding to the three-dimensional model, specifically by:
taking the three-dimensional coordinates of each vertex of the three-dimensional model in the initial reconstruction data as a key, and the normal vector and color of the corresponding vertex as a value, to obtain the key-value pair corresponding to each vertex;
and removing, for each vertex, the repeated key-value pairs among the key-value pairs corresponding to that vertex so that one key-value pair corresponding to the vertex remains, and taking that key-value pair as the deduplicated vertex data.
Optionally, after obtaining the key-value pairs corresponding to the vertices, the processor 902_1 further performs:
sorting the keys of the vertices, and adjusting the values corresponding to each key according to the sorted result, to obtain the number of repeated key-value pairs corresponding to each vertex;
the processor 902_1 removes the repeated key-value pairs among the key-value pairs corresponding to a vertex specifically by:
determining, according to the number of repeated key-value pairs corresponding to the vertex, the number of key-value pairs to be removed for that vertex;
and removing the repeated key-value pairs corresponding to the vertex according to the number of key-value pairs to be removed.
Optionally, the target reconstruction data include the coordinates of the remaining vertices, the normal vectors of the remaining vertices, the colors of the remaining vertices, and the patch indices newly generated from the remaining vertices.
It should be noted that the structure of the acquisition terminal shown in fig. 9 is only the hardware necessary for implementing the method flow shown in fig. 2; the acquisition terminal further includes conventional hardware with a display function, such as the display 903.
Based on the same technical conception, an embodiment of the application provides a rendering terminal that can execute the rendering-terminal-side flow of the three-dimensional reconstruction method provided by the embodiments of the application and achieve the same technical effect, which is not repeated here.
Referring to fig. 10, the rendering terminal includes a processor 1001, a memory 1002, a display 1003 and a communication interface 1004, the communication interface 1004, the display 1003 and the memory 1002 being connected to the processor 1001 through a bus 1005;
the memory 1002 stores a computer program, and the processor 1001 performs the following operations according to the computer program:
receiving, through the communication interface 1004, target reconstruction data sent by an acquisition terminal; the target reconstruction data are obtained by removing repeated vertex data from the initial reconstruction data corresponding to the three-dimensional model of the target object, and the three-dimensional model is reconstructed from collected multi-frame depth images and corresponding RGB images of the target object rotating in a preset pose;
receiving, through the communication interface 1004, pose data of the remaining vertices in the three-dimensional model of the current frame sent by the acquisition terminal; the pose data of the remaining vertices are extracted from the pose data of all vertices of the three-dimensional model according to the indexes of the remaining vertices after de-duplication, and the pose data of all vertices are obtained from a target depth image of the current frame of the target object;
and replacing the pose data of the corresponding vertices of the target reconstruction data with the pose data of the remaining vertices, rendering the three-dimensional model corresponding to the current frame according to the replaced target reconstruction data, and displaying it through the display 1003.
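As an illustrative sketch only, the rendering-side replacement could look as follows, reusing the assumed structures from the sketches above; ApplyPoses is a hypothetical helper, and the hand-off to a concrete renderer is outside the patent's scope. The design point is that colors and patch indexes stay unchanged, so only the per-frame poses of the remaining vertices need to travel over the network.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Pose {
    std::array<float, 3> position;
    std::array<float, 3> normal;
};

struct TargetReconstructionData {
    std::vector<std::array<float, 3>> coordinates;
    std::vector<std::array<float, 3>> normals;
    std::vector<std::array<std::uint8_t, 3>> colors;
    std::vector<std::array<std::uint32_t, 3>> patch_indexes;
};

// Overwrite the stored coordinates and normals of the remaining vertices with
// the per-frame poses received from the acquisition terminal.
void ApplyPoses(TargetReconstructionData& data, const std::vector<Pose>& poses) {
    assert(poses.size() == data.coordinates.size());
    for (std::size_t i = 0; i < poses.size(); ++i) {
        data.coordinates[i] = poses[i].position;
        data.normals[i] = poses[i].normal;
    }
    // The updated data would then be uploaded as vertex buffers and drawn
    // with the unchanged colors and patch indexes.
}
```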
It should be noted that the structure of the rendering terminal shown in fig. 10 covers only the hardware necessary for implementing the method flow shown in fig. 6; the rendering terminal may further include other conventional hardware, such as the RGBD camera 1006.
Embodiments of the present application also provide a computer-readable storage medium storing instructions that, when executed, perform the method of the foregoing embodiments.
Embodiments of the present application also provide a computer program product storing a computer program for performing the method of the foregoing embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and such modifications and substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
The foregoing description has, for purposes of explanation, been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed. Many modifications and variations are possible in light of the above teachings. The embodiments were chosen and described in order to best explain the principles and practical applications, thereby enabling others skilled in the art to best utilize the embodiments, with various modifications, as suited to the particular use contemplated.

Claims (10)

1. A three-dimensional reconstruction method, characterized by being applied to an acquisition terminal and comprising:
acquiring multi-frame depth images, and corresponding RGB images, of a target object rotating in a preset pose, and reconstructing a three-dimensional model of the target object according to the multi-frame depth images and the corresponding RGB images;
removing repeated vertex data from initial reconstruction data corresponding to the three-dimensional model to obtain target reconstruction data, and sending the target reconstruction data to a rendering terminal;
acquiring a target depth image of a current frame of the target object during interaction, and obtaining pose data of each vertex in the three-dimensional model according to the target depth image;
and extracting, from the pose data of all vertices, the pose data of the remaining vertices according to the indexes of the remaining vertices after de-duplication, and sending the extracted pose data to the rendering terminal, so that the rendering terminal renders and displays the three-dimensional model corresponding to the current frame after replacing the pose data of the corresponding vertices of the target reconstruction data with the pose data of the remaining vertices.
2. The method of claim 1, wherein the removing repeated vertex data from the initial reconstruction data corresponding to the three-dimensional model comprises:
taking the three-dimensional coordinates of each vertex of the three-dimensional model in the initial reconstruction data as a key, and taking the normal vector and the color of the corresponding vertex as a value, to obtain a key-value pair corresponding to each vertex;
and, for each vertex, removing the repeated key-value pairs from the key-value pairs corresponding to that vertex to obtain a single key-value pair for the vertex, and taking that key-value pair as the de-duplicated data of the vertex.
3. The method of claim 2, wherein, after obtaining the key-value pair corresponding to each vertex, the method further comprises:
sorting the keys of the vertices, and adjusting the value corresponding to each key according to the sorting result, to obtain the number of repeated key-value pairs corresponding to each vertex;
wherein the removing the repeated key-value pairs from the key-value pairs corresponding to the vertex comprises:
determining the number of key-value pairs to be removed for the vertex according to the number of repeated key-value pairs corresponding to the vertex;
and removing the repeated key-value pairs corresponding to the vertex according to the determined number of key-value pairs to be removed.
4. The method according to any one of claims 1-3, wherein the target reconstruction data comprise the coordinates of the remaining vertices, the normal vectors of the remaining vertices, the colors of the remaining vertices, and patch indexes newly generated from the remaining vertices.
5. A three-dimensional reconstruction method, characterized by being applied to a rendering terminal and comprising:
receiving target reconstruction data sent by an acquisition terminal; wherein the target reconstruction data are obtained by removing repeated vertex data from initial reconstruction data corresponding to a three-dimensional model of a target object, the three-dimensional model being reconstructed from acquired multi-frame depth images, and corresponding RGB images, of the target object rotating in a preset pose;
receiving pose data of the remaining vertices in the three-dimensional model of the current frame sent by the acquisition terminal; wherein the pose data of the remaining vertices are extracted from the pose data of all vertices of the three-dimensional model according to the indexes of the remaining vertices after de-duplication, and the pose data of all vertices are obtained from a target depth image of the current frame of the target object;
and replacing the pose data of the corresponding vertices of the target reconstruction data with the pose data of the remaining vertices, and rendering and displaying the three-dimensional model corresponding to the current frame according to the replaced target reconstruction data.
6. An acquisition terminal, characterized by comprising a processor, a memory, an RGBD camera and a communication interface, wherein the RGBD camera, the memory and the communication interface are connected with the processor through a bus;
the RGBD camera is used for acquiring a depth image and an RGB image;
the memory stores a computer program, and the processor performs the following operations according to the computer program:
acquiring multi-frame depth images, and corresponding RGB images, acquired by the RGBD camera, of a target object rotating in a preset pose, and reconstructing a three-dimensional model of the target object according to the multi-frame depth images and the corresponding RGB images;
removing repeated vertex data from initial reconstruction data corresponding to the three-dimensional model to obtain target reconstruction data, and sending the target reconstruction data to a rendering terminal through the communication interface;
acquiring a target depth image of a current frame of the target object, acquired by the RGBD camera during interaction, and obtaining pose data of each vertex in the three-dimensional model according to the target depth image;
and extracting, from the pose data of all vertices, the pose data of the remaining vertices according to the indexes of the remaining vertices after de-duplication, and sending the pose data of the remaining vertices to the rendering terminal through the communication interface, so that the rendering terminal renders and displays the three-dimensional model corresponding to the current frame after replacing the pose data of the corresponding vertices of the target reconstruction data with the pose data of the remaining vertices.
7. The acquisition terminal of claim 6, wherein the processor removes the repeated vertex data from the initial reconstruction data corresponding to the three-dimensional model by:
taking the three-dimensional coordinates of each vertex of the three-dimensional model in the initial reconstruction data as a key, and taking the normal vector and the color of the corresponding vertex as a value, to obtain a key-value pair corresponding to each vertex;
and, for each vertex, removing the repeated key-value pairs from the key-value pairs corresponding to that vertex to obtain a single key-value pair for the vertex, and taking that key-value pair as the de-duplicated data of the vertex.
8. The acquisition terminal of claim 7, wherein, after obtaining the key-value pair corresponding to each vertex, the processor further performs:
sorting the keys of the vertices, and adjusting the value corresponding to each key according to the sorting result, to obtain the number of repeated key-value pairs corresponding to each vertex;
wherein the processor removes the repeated key-value pairs from the key-value pairs corresponding to the vertex by:
determining the number of key-value pairs to be removed for the vertex according to the number of repeated key-value pairs corresponding to the vertex;
and removing the repeated key-value pairs corresponding to the vertex according to the determined number of key-value pairs to be removed.
9. The acquisition terminal of any one of claims 6-8, wherein the target reconstruction data include the coordinates of the remaining vertices, the normal vectors of the remaining vertices, the colors of the remaining vertices, and patch indexes newly generated from the remaining vertices.
10. A rendering terminal, characterized by comprising a processor, a memory, a display and a communication interface, wherein the display, the memory and the communication interface are connected with the processor through a bus;
the memory stores a computer program, and the processor performs the following operations according to the computer program:
receiving, through the communication interface, target reconstruction data sent by an acquisition terminal; wherein the target reconstruction data are obtained by removing repeated vertex data from initial reconstruction data corresponding to a three-dimensional model of a target object, the three-dimensional model being reconstructed from acquired multi-frame depth images, and corresponding RGB images, of the target object rotating in a preset pose;
receiving, through the communication interface, pose data of the remaining vertices in the three-dimensional model of the current frame sent by the acquisition terminal; wherein the pose data of the remaining vertices are extracted from the pose data of all vertices of the three-dimensional model according to the indexes of the remaining vertices after de-duplication, and the pose data of all vertices are obtained from a target depth image of the current frame of the target object;
and replacing the pose data of the corresponding vertices of the target reconstruction data with the pose data of the remaining vertices, rendering the three-dimensional model corresponding to the current frame according to the replaced target reconstruction data, and displaying it through the display.
CN202111533663.6A 2021-12-15 2021-12-15 Three-dimensional reconstruction method and device Active CN114373041B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111533663.6A CN114373041B (en) 2021-12-15 2021-12-15 Three-dimensional reconstruction method and device

Publications (2)

Publication Number Publication Date
CN114373041A (en) 2022-04-19
CN114373041B (en) 2024-04-02

Family

ID=81139340

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111533663.6A Active CN114373041B (en) 2021-12-15 2021-12-15 Three-dimensional reconstruction method and device

Country Status (1)

Country Link
CN (1) CN114373041B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117372648A (en) * 2022-06-30 2024-01-09 维沃移动通信有限公司 Lossless encoding method, lossless decoding device and lossless decoding equipment
CN115880436B (en) * 2022-12-26 2024-02-13 上海新迪数字技术有限公司 CAD model visibility determination method, system and electronic equipment
CN116320521A (en) * 2023-03-24 2023-06-23 吉林动画学院 Three-dimensional animation live broadcast method and device based on artificial intelligence

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292965A (en) * 2017-08-03 2017-10-24 北京航空航天大学青岛研究院 A kind of mutual occlusion processing method based on depth image data stream
CN109147025A (en) * 2018-07-11 2019-01-04 北京航空航天大学 A kind of Texture Generating Approach towards RGBD three-dimensional reconstruction
CN109872402A (en) * 2019-01-24 2019-06-11 腾讯科技(深圳)有限公司 3D model treatment method, apparatus, computer equipment and storage medium
WO2019161813A1 (en) * 2018-02-23 2019-08-29 清华-伯克利深圳学院筹备办公室 Dynamic scene three-dimensional reconstruction method, apparatus and system, server, and medium
CN111402412A (en) * 2020-04-16 2020-07-10 Oppo广东移动通信有限公司 Data acquisition method and device, equipment and storage medium
CN112085826A (en) * 2020-09-27 2020-12-15 广州海格星航信息科技有限公司 Efficient three-dimensional space grid rendering method and device
WO2021077720A1 (en) * 2019-10-25 2021-04-29 深圳奥比中光科技有限公司 Method, apparatus, and system for acquiring three-dimensional model of object, and electronic device
CN112734890A (en) * 2020-12-22 2021-04-30 上海影谱科技有限公司 Human face replacement method and device based on three-dimensional reconstruction
CN112802098A (en) * 2021-01-16 2021-05-14 北京化工大学 TM-SRC-based three-dimensional non-texture target pose estimation method


Also Published As

Publication number Publication date
CN114373041A (en) 2022-04-19

Similar Documents

Publication Publication Date Title
CN114373041B (en) Three-dimensional reconstruction method and device
US20210279956A1 (en) Semantic deep face models
CN113313818B (en) Three-dimensional reconstruction method, device and system
CN111512342A (en) Method and device for processing repeated points in point cloud compression
CN112837406B (en) Three-dimensional reconstruction method, device and system
CN115004701A (en) System and method for dynamic image virtualization
CN113170154B (en) Point cloud encoding and decoding method, device and medium adopting annealing iterative geometric smoothing
JP2022511871A (en) Methods, devices, media, and programs for decoding an encoded video stream using video point cloud coding.
CN114998514A (en) Virtual role generation method and equipment
CN115239857A (en) Image generation method and electronic device
Eisert et al. Volumetric video–acquisition, interaction, streaming and rendering
WO2021245326A1 (en) A method, an apparatus and a computer program product for video encoding and video decoding
Tohidi et al. Dynamic point cloud compression with cross-sectional approach
Hu et al. Hvtr++: Image and pose driven human avatars using hybrid volumetric-textural rendering
EP3821602A1 (en) A method, an apparatus and a computer program product for volumetric video coding
KR102577135B1 (en) A skeleton-based dynamic point cloud estimation system for sequence compression
Bannò et al. Real-time compression of depth streams through meshification and valence-based encoding
KR20060015755A (en) Method of representing a sequence of pictures using 3d models, and corresponding devices and signal
KR102677403B1 (en) Fast patch generation for video-based point cloud coding
Wu et al. NeVRF: Neural Video-Based Radiance Fields for Long-Duration Sequences
US12101505B2 (en) Fast recolor for video based point cloud coding
JP7434667B2 (en) Group-of-pictures-based patch packing for video-based point cloud coding
CN116188698B (en) Object processing method and electronic equipment
CN116188670A (en) Three-dimensional model display method and device
Jiao et al. Intractable Live Free-Viewpoint Video with Haptic Feedback

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant