CN103269423A

CN103269423A - Scalable 3D display remote video communication method

Info

Publication number: CN103269423A
Application number: CN2013101767177A
Authority: CN
Inventors: 袁立; 彭祎帆; 韩祥; 王锐; 李海峰; 刘旭; 鲍虎军; 钟擎
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2013-05-13
Filing date: 2013-05-13
Publication date: 2013-08-28
Anticipated expiration: 2033-05-13
Also published as: CN103269423B

Abstract

The invention discloses an expandable three-dimensional display remote video communication method. According to the expandable three-dimensional display remote video communication method, the image of a first user is obtained by means of an RGB-D camera at the data transmission end, the image of the first user comprises a texture image and a depth image, facial feature information and expression feature information are extracted from the texture image, physical feature information is extracted from the depth image, and point clouds of the first user are reconstructed; all the feature information is optimized by means of the point clouds, the optimized feature information is obtained to generate a three-dimensional model A, the three-dimensional model A is projected on a corresponding texture image, the corresponding texture information is extracted, and acquired voice information, the optimized feature information and texture data are sent to a second user; at the data receiving end, the second user receives the data from the first user and extracts the optimized feature information from the received data to generate a three-dimensional model B, the three-dimensional model B is rendered by means of the texture data, and a rendering result is output through a three-dimensional display device, and the voice information is played.

Description

Scalable 3D display remote video communication method

技术领域technical field

本发明涉及远程视频通信领域，尤其涉及一种可拓展式三维显示远程视频通信方法。The invention relates to the field of remote video communication, in particular to an expandable three-dimensional display remote video communication method.

背景技术Background technique

远程视频通信系统在当今社会发挥着越来越重要的作用，业界采用了很多技术来提高系统虚拟现实的效果，使远程会议的参与者获得更加身临其境的体验。三维显示技术在学术界及产业界都经历了空前的发展，正可以使用在远程视频通信系统中。由于最新的三维显示技术可以提供自然的可视化效果以及重要的视觉线索，例如眼神会意，面部表情，肢体语言等等，将它应用到远程视频通信系统中，优势不言而喻。Remote video communication systems are playing an increasingly important role in today's society. The industry has adopted many technologies to improve the effect of virtual reality in the system, so that participants in remote conferences can obtain a more immersive experience. Three-dimensional display technology has experienced unprecedented development in both academia and industry, and it can be used in remote video communication systems. Since the latest 3D display technology can provide natural visualization effects and important visual cues, such as eye contact, facial expressions, body language, etc., the advantages of applying it to remote video communication systems are self-evident.

学术界已有一些应用于远程视频通信的三维显示技术，其中视差型自体视三维显示是一种较为成熟的方法。这种多视角的自体视三维显示在一些固定的视角提供了高分辨率的三维图像。然而，随着观察位置数量的提升，这种方法的显示效果会出现问题，它不能在较大的观察角度范围内提供光滑的运动视差，这意味着远程视频通信参与者走动的自由度被严重地限制了。另外一些基于屏幕旋转原理设计的体三维显示系统由于其旋转机制的结构设计，限制了其显示体积，也从成本，简洁性和可伸缩性的方面限制了其在商业远程视频通信系统中的应用。There are already some 3D display technologies applied to remote video communication in the academic circle, among which the parallax-type self-viewing 3D display is a relatively mature method. This multi-view self-viewing 3D display provides high-resolution 3D images at some fixed viewing angles. However, as the number of observation positions increases, the display effect of this method will be problematic. It cannot provide smooth motion parallax in a large range of observation angles, which means that the freedom of movement of participants in remote video communication is seriously restricted. restricted. In addition, some volumetric three-dimensional display systems designed based on the principle of screen rotation limit their display volume due to the structural design of their rotation mechanism, and also limit their application in commercial remote video communication systems in terms of cost, simplicity and scalability. .

面向市场的三维远程视频通信系统需要满足一些功能需求。首先，最基本的要求在于使与会者们能够交谈、走动、做一些动作，在这方面的限制越少越好。其次，需要提供自然的，具有平滑的双眼视差和运动视差的三维显示效果。同时，还需要快速地捕获与会者的外形及语言等信息，以便实时呈现出来。从商业角度而言，又要求任何端对端的三维显示远程视频通信系统成本较低、简洁、易于部署，以便于进行大范围的市场推广。A market-oriented 3D remote video communication system needs to meet some functional requirements. First of all, the most basic requirement is to enable participants to talk, walk, and do some actions. The less restrictions in this regard, the better. Second, there is a need to provide a natural, three-dimensional display effect with smooth binocular parallax and motion parallax. At the same time, it is also necessary to quickly capture information such as the appearance and language of the participants so that they can be displayed in real time. From a commercial point of view, any end-to-end 3D display remote video communication system is required to be low in cost, simple and easy to deploy, so as to facilitate large-scale market promotion.

在实时人物三维信息捕捉方面，RGB-D相机的出现为这个问题提供了便捷且成本低廉的潜在解决方案，然而将其应用到视频通信场合，尤其是三维视频通信场合的报道非常少。In terms of real-time 3D information capture of people, the emergence of RGB-D cameras provides a convenient and low-cost potential solution to this problem. However, there are very few reports on its application to video communication, especially 3D video communication.

发明内容Contents of the invention

本发明的目的是克服现有的远程视频通信解决方案在虚拟现实效果、用户自由度以及实现成本上的不足，提供了一种面向市场的可拓展式三维显示远程视频通信系统及方法。The purpose of the present invention is to overcome the deficiencies of existing remote video communication solutions in terms of virtual reality effect, user freedom and implementation cost, and provide a market-oriented scalable three-dimensional display remote video communication system and method.

一种可拓展式三维显示远程视频通信方法，实施步骤如下：An expandable three-dimensional display remote video communication method, the implementation steps are as follows:

数据发送端：获取第一用户的人物图像和语音信息，并提取出人物图像中的特征信息和纹理Data sending end: obtain the character image and voice information of the first user, and extract the feature information and texture in the character image

1）利用RGB-D相机获取第一用户的人物图像，该人物图像中包含了纹理图像和深度图像；1) Use the RGB-D camera to acquire the character image of the first user, the character image includes texture images and depth images;

从纹理图像中提取出面部特征信息及表情特征信息；Extract facial feature information and expression feature information from the texture image;

从深度图像中提取肢体特征信息并重建出人物点云；Extract body feature information from depth images and reconstruct character point clouds;

2）利用所述的人物点云对步骤1）中各特征信息进行优化得到优化后的特征信息，通过该优化后的特征信息生成三维模型A；2) Using the character point cloud to optimize each feature information in step 1) to obtain optimized feature information, and generate a 3D model A through the optimized feature information;

3）将所述的三维模型A投影到对应的纹理图像，将该三维模型A所对应的纹理数据提取出来；在采集人物图像的同时获取语音信息，将该语音信息连同优化后的特征信息，以及提取出来的纹理数据发送给第二用户；3) Project the 3D model A to the corresponding texture image, extract the texture data corresponding to the 3D model A; acquire voice information while collecting the character image, and combine the voice information with the optimized feature information, and sending the extracted texture data to the second user;

4）循环操作步骤1）～3），利用后一时刻的数据对前一时刻的数据进行更新并发送给第二用户；4) Cycle operation steps 1) to 3), update the data at the previous time with the data at the next time and send it to the second user;

数据接收端：Data receiver:

第二用户通过网络接收来自第一用户的数据后，提取其中的优化后的特征信息生成三维模型B，利用所述提取出来的纹理数据对三维模型B进行渲染，通过三维显示设备输出渲染结果，并播放所述的语音信息。After receiving the data from the first user through the network, the second user extracts the optimized feature information therein to generate a 3D model B, uses the extracted texture data to render the 3D model B, and outputs the rendering result through a 3D display device, And play the voice message.

在数据发送端获取第一用户的人物图像和语音信息，并提取出人物图像中的特征信息和纹理数据，然后发送至数据接收端，数据接收端的第二用户根据所接受到的数据进行建模和渲染处理，并通过三维显示设备输出，以及播放对应的语音信息，实现数据发送端和数据接收端的远程视频通信。Obtain the character image and voice information of the first user at the data sending end, extract the feature information and texture data in the character image, and then send it to the data receiving end, and the second user at the data receiving end performs modeling according to the received data and rendering processing, output through the three-dimensional display device, and play the corresponding voice information to realize remote video communication between the data sending end and the data receiving end.

在数据发送端，利用RGB-D相机获取第一用户的人物图像时，在不同时刻改变RGB-D相机相对第一用户的视角，得到不同角度下第一用户的人物图像，然后根据人物图像中的各特征信息，构建三维模型，实现三维显示的效果。At the data sending end, when using the RGB-D camera to acquire the person image of the first user, change the angle of view of the RGB-D camera relative to the first user at different times to obtain the person image of the first user at different angles, and then according to the image of the person Each feature information of the system is used to build a 3D model to achieve the effect of 3D display.

所述的语音信息、优化后的特征信息以及提取出来的纹理数据经压缩、打包处理后通过网络发送给所述的第二用户。其中，语音信息和优化后的特征信息的数据量较小，也可不进行压缩处理，对数据传输的速度影响大不，但是纹理数据的数据量较大，占用的资源较多，如不进行精简与压缩处理，其传输的速度较慢，第一用户与第二用户间存在较大的时间差，影响视频通信的实时性。The voice information, the optimized feature information and the extracted texture data are compressed and packaged and sent to the second user through the network. Among them, voice information and optimized feature information have a small amount of data, and compression processing may not be performed, which will have a great impact on the speed of data transmission. However, the texture data has a large amount of data and takes up more resources. If it is not streamlined Compared with compression processing, the transmission speed is relatively slow, and there is a large time difference between the first user and the second user, which affects the real-time performance of video communication.

利用后一时刻的数据对前一时刻的数据进行更新时，利用后一时刻的优化后的特征信息对前一时刻的优化后的特征信息进行更新处理，并对应生成更新后的三维模型A，将更新后的三维模型A投影到该后一时刻的纹理图像，提取出更新后的纹理数据。When using the data at the next moment to update the data at the previous moment, use the optimized feature information at the next moment to update the optimized feature information at the previous moment, and correspondingly generate the updated 3D model A, Project the updated 3D model A to the texture image at the next moment, and extract the updated texture data.

优选的，所述的三维显示设备包括投影模块和定向散射屏模块，所述的定向散射屏模块为具有二向散射特性的屏幕，所述的投影模块为投影阵列。所述的投影阵列为投影仪组成的阵列，或由二维显示器和镜头阵列组成。三维显示设备是使用集成光场三维显示技术的实时空间三维呈现系统，输出视场角度很大且运动视差连续的空间三维场景图像，使观察者通过双目视差形成三维视觉，达到真实的三维视频效果。Preferably, the three-dimensional display device includes a projection module and a directional scattering screen module, the directional scattering screen module is a screen with bidirectional scattering characteristics, and the projection module is a projection array. The projection array is an array composed of projectors, or composed of a two-dimensional display and a lens array. The 3D display device is a real-time spatial 3D presentation system using integrated light field 3D display technology, which outputs a spatial 3D scene image with a large field of view and continuous motion parallax, enabling the observer to form a 3D vision through binocular parallax to achieve a real 3D video Effect.

本发明将集成光场三维显示技术和基于RGB-D相机的实时人物捕捉技术结合起来，每个终端设备使用集成光场三维显示技术以提供实时真实感，具有细腻双目视差及运动视差的三维效果；同时，使用RGB-D相机捕获人物信息，并且通过互联网互联，发挥了三维显示系统的视觉优势和RGB-D相机成本低廉以及人物捕获高效的优点。The present invention combines integrated light field 3D display technology with RGB-D camera-based real-time character capture technology, and each terminal device uses integrated light field 3D display technology to provide real-time realism, with fine binocular parallax and motion parallax 3D At the same time, the use of RGB-D cameras to capture character information and interconnection through the Internet has brought into play the visual advantages of the 3D display system and the low cost of RGB-D cameras and the advantages of high-efficiency character capture.

本发明的主要优点在于：The main advantages of the present invention are:

整合了相关资源，使用价格低廉且方便使用的RGB-D深度相机实时地采集并处理，得到质量较好的人物交互信息，并对这些信息进行精简与压缩降低其网络传输带宽要求，使其可以实时地在互联网上传输，同时应用了高性能的实时光场三维显示装置，可以实现很好的虚拟现实效果，使远程视频通信的参与者可以获得身临其境的交互体验。Integrate relevant resources, use the cheap and easy-to-use RGB-D depth camera to collect and process in real time, get better quality character interaction information, and simplify and compress the information to reduce its network transmission bandwidth requirements, so that it can Real-time transmission on the Internet, and the application of a high-performance real-time light field three-dimensional display device can achieve a good virtual reality effect, so that participants in remote video communication can obtain an immersive interactive experience.

附图说明Description of drawings

图1为本发明中三维显示远程视频通信方法的基本逻辑结构示意图。FIG. 1 is a schematic diagram of the basic logic structure of the three-dimensional display remote video communication method in the present invention.

图2为数据发送端获取交互数据的流程图。FIG. 2 is a flow chart of the data sending end acquiring interactive data.

图3为数据接收端利用交互数据生成虚拟现实场景的流程图。Fig. 3 is a flowchart of generating a virtual reality scene by using interactive data at the data receiving end.

具体实施方式Detailed ways

如图1所示，实现本发明三维显示远程视频通信方法的系统包括数据发送端1以及通过网络与数据发送端1连接的数据接收端2，数据发送端1和数据接收端2实时地进行数据交互，实现远程视频通信。数据发送端1和数据接收端2均同时进行数据的接收和发送，两个终端的逻辑结构相同，为方便本发明的描述，人为的分为数据发送端1和数据接收端2。As shown in Figure 1, the system that realizes the three-dimensional display remote video communication method of the present invention includes a data sending end 1 and a data receiving end 2 connected to the data sending end 1 through a network, and the data sending end 1 and the data receiving end 2 perform data processing in real time. Interaction to realize remote video communication. The data sending end 1 and the data receiving end 2 both receive and send data at the same time, and the logical structure of the two terminals is the same. For the convenience of description of the present invention, they are artificially divided into data sending end 1 and data receiving end 2.

数据发送端1包括主控计算机8、三维显示设备5、人物捕获设备6，其中，主控计算机8包含网络传输模块9、三维显示软件模块4和人物捕获软件模块3。三维显示设备5和人物捕获设备6分别进入主控计算机8的三维显示软件模块4和人物捕获软件模块3，三维显示软件模块4和人物捕获软件模块3分别与网络传输模块9相连接，网络传输模块9通过互联网连接至数据接收端2，三维显示设备5在通信交互区域7输出图像数据，人物捕获软件模块3用于获取通信交互区域7内的人物图像，三维显示设备5的可视区域及人物捕获设备6的捕获区域均要能够覆盖通信交互区域7。The data sending end 1 includes a main control computer 8 , a three-dimensional display device 5 , and a person capture device 6 , wherein the main control computer 8 includes a network transmission module 9 , a three-dimensional display software module 4 and a person capture software module 3 . The three-dimensional display device 5 and the character capture device 6 enter the three-dimensional display software module 4 and the character capture software module 3 of the main control computer 8 respectively, and the three-dimensional display software module 4 and the character capture software module 3 are connected with the network transmission module 9 respectively, and the network transmission The module 9 is connected to the data receiving terminal 2 through the Internet, the three-dimensional display device 5 outputs image data in the communication interaction area 7, the character capture software module 3 is used to obtain the character image in the communication interaction area 7, the visible area and The capture areas of the character capture devices 6 must be able to cover the communication interaction area 7 .

网络传输模块9对需要在网络上传输的数据进行了精简与压缩，通过互联网传送至数据接收端2；同时接受其他终端通过互联网发送过来的数据。数据包括人物的表情参数，肢体动作参数，需要更新的人物纹理以及语音信息，数据量比较小，压缩后可以满足网络实时传输。The network transmission module 9 simplifies and compresses the data that needs to be transmitted on the network, and transmits it to the data receiving end 2 through the Internet; at the same time, it accepts data sent by other terminals through the Internet. The data includes character expression parameters, body movement parameters, character textures that need to be updated, and voice information. The amount of data is relatively small, and it can be compressed to meet real-time network transmission.

三维显示设备5是使用集成光场三维显示技术的实时空间三维呈现系统，这是本发明的优点之一。光场三维显示的原理为重构三维场景发出的光线的空间分布，使用该原理的三维显示设备主要包括投影模块和定向散射屏模块。投影模块和定向散射屏模块具有多种排布模式，这里以实现360度全景观看的多投影式光场显示系统为例。围绕屏幕环状分布的投影仪将所要显示的空间三维模型或三维场景的各个角度投影的组合图像投影成像于环状屏幕的中心区域，该环状屏幕为具有二向散射特性的定向散射屏幕，即在横向具有特定散射角度、在纵向具有较大散射角度特性，呈旋转对称分布的散射屏幕结构。由此，环状分布的投影仪投影的图像就被转换成360度全景可视的空间三维场景图像，供围绕在屏幕周边观察区域内的观众观看。根据光场重建的原理以及环状定向散射屏的特性，在观察区域内的每一个位置仅能观看到对应于这个位置的一个投影仪投射出的一窄条图像，而每一个位置能看多个投影仪投射出的多个窄条图像的组合图像便形成该位置完整的画面，通过这种类似于集成拼接光场的模式呈现出完整的图像信息。那么在观察区域内的不同位置均能观看到对应于相应位置的不同图像，这就能保证在观察区域内的观众的双眼观看到的图像信息是不同的，通过双目视差形成三维视觉，亦可以通过在横向不同位置间的移动获得运动视差。三维显示设备5对于三维显示软件模块4的依赖关系仅在于三维显示设备5需要三维显示软件模块4提供模型、纹理等绘制三维效果所必须的数据和参数。三维显示软件模块4的具体实现方法和三维显示设备5无关。The three-dimensional display device 5 is a real-time spatial three-dimensional rendering system using integrated light field three-dimensional display technology, which is one of the advantages of the present invention. The principle of light field 3D display is to reconstruct the spatial distribution of light emitted by a 3D scene. The 3D display device using this principle mainly includes a projection module and a directional scattering screen module. The projection module and the directional scattering screen module have a variety of arrangement modes. Here, a multi-projection light field display system that realizes 360-degree panoramic viewing is taken as an example. The projectors distributed in a ring around the screen project the combined images of the three-dimensional model or three-dimensional scene to be displayed at various angles into the central area of the ring screen. The ring screen is a directional scattering screen with two-way scattering characteristics. That is, a scattering screen structure with a specific scattering angle in the horizontal direction and a larger scattering angle in the vertical direction, and a rotationally symmetrical distribution. Thus, the images projected by the annularly distributed projectors are converted into 360-degree panoramic and visible spatial three-dimensional scene images for viewers in the viewing area surrounding the screen. According to the principle of light field reconstruction and the characteristics of the annular directional scattering screen, each position in the observation area can only watch a narrow image projected by a projector corresponding to this position, and each position can see more The combined image of multiple narrow strip images projected by a projector forms a complete picture of the position, and the complete image information is presented through this mode similar to the integrated stitching light field. Then different images corresponding to the corresponding positions can be viewed at different positions in the observation area, which can ensure that the image information viewed by the eyes of the audience in the observation area is different, and three-dimensional vision is formed through binocular parallax, which is also Motion parallax can be obtained by moving between different positions in the lateral direction. The dependence of the 3D display device 5 on the 3D display software module 4 is only that the 3D display device 5 requires the 3D display software module 4 to provide models, textures and other necessary data and parameters for rendering 3D effects. The specific implementation method of the three-dimensional display software module 4 has nothing to do with the three-dimensional display device 5 .

三维显示远程视频通信系统的主要功能是实现两个方向上的实时信息交换。第一个方向是视频通信参与者的外形、表情、语音、动作等交互数的据获与发送。人物捕获设备6对通信交互区域7中的通信参与用户的外形、动作、语音进行捕捉，获得基本的图像、音频数据。这些数据被传送给人物捕获软件模块3，人物捕获软件模块3通过一些算法从中计算出人物的外形、语音和表情动作信息，并对其进行优化和累积，以得到更加符合真实场景情况的信息，并把这些数据传送给网络传输模块9。网络传输模块9对数据进行压缩和打包，通过互联网将其发送到另一个远程通信终端（即为数据接收端），从而完成这个方向上的数据传输。第二个方向是接收视频通信参与者的外形、表情、语音、动作等交互数据，并生成虚拟现实场景。网络传输模块9通过互联网获得另一个终端发送过来的通信用户数据，并将解压缩后的数据传送给三维显示软件模块4。三维显示软件模块4通过获得的用户外形及表情、动作、语音数据，计算出三维显示设备5需要呈现的三维模型及纹理信息，传送给三维图像显示设备5，使其呈现相应的三维效果并播放语音，供通信交互区域7中的参与者观看。这两个方向的实时信息交换的完成，使得三维显示远程视频通信系统不同终端的使用者们可以实时地观察到对方的外形、表情、动作等三维效果，听到对方的语音，并且也可以实时地把自己的这些信息反馈给对方。The main function of the three-dimensional display remote video communication system is to realize real-time information exchange in two directions. The first direction is the acquisition and transmission of interactive data such as appearance, expression, voice, and action of video communication participants. The character capture device 6 captures the appearance, action and voice of the communication participating users in the communication interaction area 7 to obtain basic image and audio data. These data are sent to the character capture software module 3, and the character capture software module 3 calculates the shape, voice and facial expressions of the character through some algorithms, and optimizes and accumulates it to obtain information that is more in line with the real scene. And send these data to network transmission module 9. The network transmission module 9 compresses and packs the data, and sends it to another remote communication terminal (that is, the data receiving end) through the Internet, so as to complete the data transmission in this direction. The second direction is to receive interactive data such as appearance, expression, voice, and action of video communication participants, and generate virtual reality scenes. The network transmission module 9 obtains the communication user data sent by another terminal through the Internet, and transmits the decompressed data to the three-dimensional display software module 4 . The three-dimensional display software module 4 calculates the three-dimensional model and texture information that the three-dimensional display device 5 needs to present through the obtained user appearance, expression, action, and voice data, and transmits it to the three-dimensional image display device 5, so that it presents a corresponding three-dimensional effect and plays Voice, for the participants in the communication interaction area 7 to watch. The completion of real-time information exchange in these two directions enables users of different terminals of the three-dimensional display remote video communication system to observe the other party's appearance, expression, movement and other three-dimensional effects in real time, hear the other party's voice, and also can real-time Feedback this information to the other party.

下面详细介绍上面两个方向上实时信息交换的具体实施过程，包括以下步骤：The specific implementation process of real-time information exchange in the above two directions is introduced in detail below, including the following steps:

如图2所示，数据发送端：As shown in Figure 2, the data sender:

从深度图像中提取肢体特征信息并重建出人物点云。Extract body feature information from depth image and reconstruct character point cloud.

人物捕获设备选用廉价、易用且便于部署的RGB-D相机，利用RGB-D相机获取第一用户的人物图像时，在不同时刻改变RGB-D相机相对第一用户的视角，在本实施例中具体使用的RGB-D相机是Kinect。Kinect可以实时采集场景的颜色及深度数据流，并作为颜色帧和深度帧输送至人物捕获软件模块。这些数据流本身就是场景部分的纹理信息以及深度信息，其中包含了人物的纹理及深度数据。人物捕获软件模块3从人物的纹理数据中实时地提取出人物的面部特征及表情特征信息，从人物的深度数据中提取人物肢体特征信息，这些信息定义了人物的几何外形、面部表情和肢体动作信息，从深度数据中可以重建出人物的点云，可以反映人物的原始几何数据。The character capture device uses an inexpensive, easy-to-use, and easy-to-deploy RGB-D camera. When using the RGB-D camera to capture the first user's character image, change the angle of view of the RGB-D camera relative to the first user at different times. In this embodiment The RGB-D camera specifically used in is Kinect. Kinect can collect the color and depth data stream of the scene in real time, and send them to the character capture software module as color frames and depth frames. These data streams themselves are the texture information and depth information of the scene part, which contains the texture and depth data of the characters. The character capture software module 3 extracts the character's facial features and expression feature information from the character's texture data in real time, and extracts the character's body feature information from the character's depth data, which defines the character's geometric shape, facial expression and body movements Information, the point cloud of the character can be reconstructed from the depth data, which can reflect the original geometric data of the character.

本实施例中，使用Kinect作为RGB-D相机，针对Kinect软件开发，微软公司提供了Kinect的Software Development Kit（SDK），使用KinectSDK中的接口即可以提取特征信息，重建人物点云。In this embodiment, Kinect is used as the RGB-D camera. For Kinect software development, Microsoft provides Kinect's Software Development Kit (SDK). Using the interface in the KinectSDK, feature information can be extracted and character point clouds can be reconstructed.

2）利用所述的人物点云对步骤1）中各特征信息进行优化得到优化后的特征信息，通过该优化后的特征信息生成三维模型A。2) Using the person point cloud to optimize each feature information in step 1) to obtain optimized feature information, and generate a 3D model A through the optimized feature information.

这里的各特征信息包括上述的面部特征信息、表情特征信息和肢体特征信息，利用重建的人物点云对各特征信号进行优化得到优化后的特称信息，再通过该优化后的特征信息生成三维模型A。The feature information here includes the above-mentioned facial feature information, expression feature information, and body feature information. The reconstructed character point cloud is used to optimize each feature signal to obtain the optimized special name information, and then use the optimized feature information to generate a 3D model. Model A.

特征信息优化：定义能量方程，以优化后的特征信息作为未知数。方程描述了优化后的特征信息生成的模型与优化前的特征信息生成的模型在几何形状上的差别程度以及优化后的特征信息生成的模型与点云数据在几何形状上的差别程度这两者之和。求使此能量方程取得最小值的解，即得到优化后的特征信息。Feature information optimization: define the energy equation, and use the optimized feature information as the unknown. The equation describes the degree of difference in geometry between the model generated by the optimized feature information and the model generated by the feature information before optimization, and the degree of difference in geometry between the model generated by the optimized feature information and the point cloud data. Sum. Find the solution that makes this energy equation obtain the minimum value, that is, obtain the optimized characteristic information.

根据特征信息生成三维模型：使用了模型变形方法，包括Morph技术和Skinned Mesh技术。均为常规技术手段。Generate a 3D model based on feature information: Model deformation methods are used, including Morph technology and Skinned Mesh technology. All are conventional techniques.

3）将所述的三维模型A投影到对应的纹理图像，将该三维模型A所对应的纹理数据提取出来；在采集人物图像的同时获取语音信息，将该语音信息连同优化后的特征信息，以及提取出来的纹理数据发送给第二用户。3) Project the 3D model A to the corresponding texture image, extract the texture data corresponding to the 3D model A; acquire voice information while collecting the character image, and combine the voice information with the optimized feature information, And the extracted texture data is sent to the second user.

将三维模型A投影到Kinect采集到的纹理图像中，提取出三维模型A所对应的纹理数据，其中，语音信息可以通过Kinect中的声音采集设备直接获得，然后网络传输模块9将语音信息、优化后的特征信息以及提取出来的纹理数据经压缩、打包处理后发送给所述的第二用户。对于Kinect捕获的每个颜色帧，都可以提取出采集到的人物原始纹理，我们对多次采集到的人物原始纹理数据进行积累与优化，提高其鲁棒性，得到质量较好的人物纹理。The three-dimensional model A is projected into the texture image collected by Kinect, and the texture data corresponding to the three-dimensional model A is extracted, wherein the voice information can be directly obtained by the sound collection device in the Kinect, and then the network transmission module 9 converts the voice information, optimized After the feature information and the extracted texture data are compressed and packaged, they are sent to the second user. For each color frame captured by Kinect, the original texture of the person can be extracted. We accumulate and optimize the original texture data of the person collected many times to improve its robustness and obtain a character texture with better quality.

纹理的优化：每个人物都有一个基础纹理，每次提取到新的纹理数据都会用来更新基础纹理数据，这种更新是一种累积的方式，能优化纹理。具体的更新方法是：对于每一个像素，根据其累计采集到的纹理数据估计出一个概率分布，用这个概率分布的期望值来更新本像素的颜色值。Texture optimization: Each character has a basic texture, and every time new texture data is extracted, it will be used to update the basic texture data. This update is a cumulative method that can optimize the texture. The specific update method is: for each pixel, a probability distribution is estimated according to the accumulated texture data collected, and the color value of the pixel is updated with the expected value of the probability distribution.

4）循环操作步骤1）～3），利用后一时刻的数据对前一时刻的数据进行更新并发送给第二用户。4) Cycle operation steps 1) to 3), update the data at the previous time with the data at the next time and send it to the second user.

通过多次改变视角获取人物图像，并对多次采集到的人物原始纹理数据进行积累与优化，在结合对应的三维模型，就可以得到完整的人物的外形、表情、动作数据。By changing the angle of view multiple times to obtain character images, and accumulating and optimizing the original texture data of the characters collected multiple times, combined with the corresponding 3D model, you can get the complete shape, expression, and action data of the character.

如图3所示，数据接收端：As shown in Figure 3, the data receiving end:

数据接收端内的网络传输模块接收通过互联网发送过来的数据包，对其中的数据进行解压缩，并完成数据同步的工作。网络传输模块将当前需要进行虚拟现实呈现的数据提供给三维显示软件模块，这些数据包括人物的几何外形的特征信息、面部表情的特征信息、肢体动作的特征信息、需要更新的人物纹理信息、人物语音信息。三维显示软件模块的实施过程为：把人物的几何外形的特征信息、面部表情的特征信息、肢体动作的特征信息作为参数带入算法中，对三维显示软件模块中预存的标准人物模型进行变形。通过变形操作，可以得到符合上述静态及动态特征描述的人物模型。三维显示软件模块中最初存储着人物模型的基础纹理，在每一次得到需要更新的人物纹理信息时，三维显示软件模块就会对人物模型的基础纹理进行更新，并不断地累积，得到当前可以使用的人物纹理。人物语音数据可以直接使用。三维显示软件模块把计算得到的人物模型数据、人物纹理数据提交给三维显示设备，进行虚拟现实呈现，同时播放人物语音。The network transmission module in the data receiving end receives the data packets sent over the Internet, decompresses the data therein, and completes the work of data synchronization. The network transmission module provides the data that needs to be presented in virtual reality to the three-dimensional display software module. These data include the characteristic information of the geometric shape of the character, the characteristic information of the facial expression, the characteristic information of the body movement, the character texture information that needs to be updated, and the character information. voice message. The implementation process of the three-dimensional display software module is as follows: the characteristic information of the character's geometric shape, facial expression and body movement are brought into the algorithm as parameters, and the standard character model pre-stored in the three-dimensional display software module is deformed. Through the deformation operation, a character model conforming to the above static and dynamic feature descriptions can be obtained. The 3D display software module initially stores the basic texture of the character model, and every time the character texture information that needs to be updated is obtained, the 3D display software module will update the basic texture of the character model and continuously accumulate it to obtain the currently usable texture information. character texture. Character voice data can be used directly. The 3D display software module submits the calculated character model data and character texture data to the 3D display device for virtual reality presentation and simultaneously plays character voice.

Claims

1. can expand formula three-dimensional display remote video communication method for one kind, it is characterized in that implementation step is as follows:

Data sending terminal:

1) utilizes the RGB-D camera to obtain first user's character image, comprised texture image and depth image in this character image;

From texture image, extract facial characteristics information and expressive features information;

From depth image, extract the limbs characteristic information and reconstruct people's object point cloud;

2) utilize described people's object point cloud that each characteristic information in the step 1) is optimized characteristic information after being optimized, the characteristic information after optimizing by this generates threedimensional model A;

3) described threedimensional model A is projected to corresponding texture image, the corresponding data texturing of this threedimensional model A is extracted; Obtain voice messaging when gathering character image, together with the characteristic information after optimizing, and the data texturing that extracts sends to second user with this voice messaging;

4) cycling step 1)～3), utilize back one data constantly that the data of previous moment are upgraded and send to second user;

Data receiver:

Second user receives from after first user's data by network, characteristic information after the extraction optimization wherein generates threedimensional model B, utilize the described data texturing that extracts that threedimensional model B is played up, by three-dimensional display apparatus output rendering result, and play described voice messaging.

2. the formula three-dimensional display remote video communication method of expanding as claimed in claim 1, it is characterized in that, when utilizing back one data constantly that the data of previous moment are upgraded, characteristic information after utilizing characteristic information after the optimization constantly of back one to the optimization of previous moment upgrades processing, and the corresponding threedimensional model A that generates after upgrading, threedimensional model A after upgrading is projected to the texture image in this back moment, extract the data texturing after the renewal.

3. the formula three-dimensional display remote video communication method of expanding as claimed in claim 2 is characterized in that, when utilizing the RGB-D camera to obtain first user's character image, at relative first user's of different time changing RGB-D cameras visual angle.

4. the formula three-dimensional display remote video communication method of expanding as claimed in claim 3, it is characterized in that the characteristic information after described voice messaging, the optimization and the data texturing that extracts are compressed, the processing back of packing sends to described second user by network.

5. the formula three-dimensional display remote video communication method of expanding as claimed in claim 4, it is characterized in that, described three-dimensional display apparatus comprises projection module and directional scattering panel module, described directional scattering panel module is to have two screens to scattering properties, and described projection module is projected array.