KR20220135939A

KR20220135939A - Transmission apparatus, receiving apparatus and method for providing 3d image by using point cloud data

Info

Publication number: KR20220135939A
Application number: KR1020210042297A
Authority: KR
Inventors: 양태길
Original assignee: 주식회사 케이티
Priority date: 2021-03-31
Filing date: 2021-03-31
Publication date: 2022-10-07

Abstract

A transmission apparatus for providing a 3D image using point cloud data includes: a receiving part for receiving a depth image of an object from an image capturing device; a virtual object generation part for converting the depth image into point cloud data in a virtual space to generate a virtual object corresponding to the object; a viewpoint image generation part for generating a plurality of viewpoint images of the virtual object; a synthesized image generation part for generating a synthesized image obtained by synthesizing the plurality of viewpoint images; and a transmission part for transmitting the synthesized image to a receiving apparatus. The synthesized image may be converted into a 3D image and output through a display of the receiving apparatus.

Description

Transmitting device, receiving device, and method for providing a 3D image using point cloud data

The present invention relates to a transmitting apparatus, a receiving apparatus, and a method for providing a 3D image using point cloud data.

Conventional two-dimensional telepresence is a technology that makes people at remote locations feel as if they are in the same space. It is mainly used for video conferencing and performances.

Referring to FIG. 1, images of participants located in different regions are projected onto a pseudo-hologram stage through a projector, so that the remote participants appear to be present in one space. The voice of each participant is output through the stage speakers, creating the effect of a conversation held in one space.

This relies on Pepper's Ghost stage technology, a pseudo-hologram technique that reflects a two-dimensional image so that the telepresence image looks like a hologram.

Meanwhile, most general stereoscopic display apparatuses display stereoscopic images using the principle of binocular disparity. Because the images seen by the left eye and the right eye differ, the viewer perceives depth and a sense of space through the parallax between the two eyes.

Depending on how the left and right images are separated, displays are classified into a glasses type and a glasses-free type. The glasses type requires a physical device (e.g., an HMD device or 3D glasses) to view a 3D stereoscopic image, which is inconvenient. Research on stereoscopic visualization is therefore actively moving beyond glasses-type stereo displays toward glasses-free multi-view displays.

A light field display, which is a glasses-free multi-view display, differs from a two-dimensional display (or a glasses-type display), in which each pixel emits the same light in all directions. It varies the amount of light by direction and thereby projects images of different viewpoints to the two eyes.

Because of this characteristic, a light field display is not suited to projecting existing two-dimensional images, and a process of generating content for the light field is required.

There are two methods for generating light field content: live-action capture, and 3D rendering in a virtual space.

Acquiring light field content through live-action capture requires complex equipment and a large space, and the difficulty of synchronizing and aligning the viewpoints of the cameras is a high barrier to widespread adoption.

Japanese Patent Publication No. 6512575 (registered on April 19, 2019)

The present invention is intended to solve the problems of the prior art described above. It aims to provide a three-dimensional face-to-face experience by converting a synthesized image, obtained by synthesizing a plurality of viewpoint images of a virtual object corresponding to a remote object, into a 3D image and outputting it in real space.

However, the technical problems to be solved by the present embodiment are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical problem, a transmitting apparatus for providing a 3D image using point cloud data according to a first aspect of the present invention includes: a receiver that receives a depth image of an object from an image capturing apparatus; a virtual object generator that converts the depth image into point cloud data in a virtual space to generate a virtual object corresponding to the object; a viewpoint image generator that generates a plurality of viewpoint images of the virtual object; a synthesized image generator that generates a synthesized image by synthesizing the plurality of viewpoint images; and a transmitter that transmits the synthesized image to a receiving apparatus. The synthesized image may be converted into the 3D image and output through a display of the receiving apparatus.

A receiving apparatus for providing a 3D image using point cloud data according to a second aspect of the present invention includes: a receiver that receives, from a transmitting apparatus, a synthesized image in which a plurality of viewpoint images of a virtual object corresponding to an object are synthesized; a converter that converts the synthesized image into the 3D image; and an output unit that outputs the 3D image through a display of the receiving apparatus. The virtual object corresponding to the object may be generated by converting a depth image of the object into point cloud data in a virtual space.

A method for providing a 3D image using point cloud data, performed by a transmitting apparatus according to a third aspect of the present invention, includes: receiving a depth image of an object from an image capturing apparatus; generating a virtual object corresponding to the object by converting the depth image into point cloud data in a virtual space; generating a plurality of viewpoint images of the virtual object; generating a synthesized image by synthesizing the plurality of viewpoint images; and transmitting the synthesized image to a receiving apparatus. The synthesized image may be converted into the 3D image and output through a display of the receiving apparatus.

The means for solving the problem described above are merely exemplary and should not be construed as limiting the present invention. In addition to the exemplary embodiments described above, there may be additional embodiments described in the drawings and the detailed description.

According to any one of the means for solving the problem described above, the present invention can provide a three-dimensional face-to-face service by converting a synthesized image, obtained by synthesizing a plurality of viewpoint images of a virtual object corresponding to a remote object, into a 3D image and outputting it in real space.

As a result, without wearing a separate physical device (e.g., an HMD device or 3D glasses), the user can converse with the object in real time through the 3D image of the object output on the display, and can experience the object as if it were in the same space as the user.

FIG. 1 is a diagram showing a hologram image using conventional remote telepresence.
FIG. 2 is a configuration diagram of a 3D image providing system according to an embodiment of the present invention.
FIG. 3 is a block diagram of the transmitting apparatus shown in FIG. 2, according to an embodiment of the present invention.
FIGS. 4A to 4C are diagrams for explaining a method of generating a synthesized image according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a method of providing a 3D image through a transmitting apparatus according to an embodiment of the present invention.
FIG. 6 is a block diagram of the receiving apparatus shown in FIG. 2, according to an embodiment of the present invention.
FIGS. 7A to 7D are diagrams for explaining a method of converting a synthesized image into a 3D image according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating a method of providing a 3D image through a receiving apparatus according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily carry them out. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to describe the present invention clearly, and similar reference numerals denote similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case of being "directly connected" but also the case of being "electrically connected" with another element interposed between them. Also, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.

Some of the operations or functions described in this specification as being performed by a terminal or device may instead be performed by a server connected to that terminal or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal or device connected to that server.

Hereinafter, specific details for carrying out the present invention will be described with reference to the accompanying configuration diagrams and process flowcharts.

FIG. 2 is a configuration diagram of a 3D image providing system according to an embodiment of the present invention.

Referring to FIG. 2, the 3D image providing system may include a transmitting apparatus 100 and a receiving apparatus 110. However, the 3D image providing system of FIG. 2 is only one embodiment of the present invention, so the present invention is not to be construed as limited by FIG. 2, and the system may be configured differently from FIG. 2 according to various embodiments of the present invention.

In general, the components of the 3D image providing system of FIG. 2 are connected through a network (not shown). A network is a connection structure that enables information exchange between nodes such as terminals and servers, and includes a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, and wired and wireless television networks. Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), LTE (Long Term Evolution), WiMAX (Worldwide Interoperability for Microwave Access), Wi-Fi, Bluetooth, infrared communication, ultrasonic communication, visible light communication (VLC), and LiFi.

The transmitting apparatus 100 may receive a depth image of an object from an image capturing apparatus (not shown).

The transmitting apparatus 100 may convert the depth image of the object into point cloud data in a virtual space, and may generate a virtual object corresponding to the object using the point cloud data.

The transmitting apparatus 100 may generate a plurality of viewpoint images of the virtual object corresponding to the object. For example, the transmitting apparatus 100 may arrange a plurality of virtual cameras around the virtual object located in the virtual space and, based on the arrangement information of those virtual cameras, generate a plurality of viewpoint images of the virtual object through the respective virtual cameras.

The transmitting apparatus 100 may generate a synthesized image by synthesizing the plurality of viewpoint images of the virtual object, and may transmit the generated synthesized image to the receiving apparatus 110.

The receiving apparatus 110 may receive, from the transmitting apparatus 100, the synthesized image of the remotely located object.

The receiving apparatus 110 may convert the synthesized image of the object into a 3D image and then output the 3D image through the display of the receiving apparatus 110.

Hereinafter, the operation of each component of the 3D image providing system of FIG. 2 will be described in more detail.

FIG. 3 is a block diagram of the transmitting apparatus 100 shown in FIG. 2, according to an embodiment of the present invention.

Referring to FIG. 3, the transmitting apparatus 100 may include a receiver 300, a virtual object generator 310, a viewpoint image generator 320, a synthesized image generator 330, and a transmitter 340. However, the transmitting apparatus 100 shown in FIG. 3 is only one implementation example of the present invention, and various modifications are possible based on the components shown in FIG. 3.

Hereinafter, FIG. 3 will be described with reference to FIGS. 4A to 4C.

Using point cloud data and light field technology, the present invention can represent an object located in a different, remote space in three dimensions in the space where the user is currently located.

First, the receiver 300 may receive a depth image of an object from an image capturing apparatus (not shown).

Here, the depth image of the object may include distance data between the image capturing apparatus (not shown) and the object, and color data containing information on a plurality of pixels of the object.

Here, the image capturing apparatus (not shown) may include, for example, an image sensor, an infrared emitter, and an infrared depth sensor. The image capturing apparatus (not shown) may further include a processing device and a memory that stores instructions. When executed in the image capturing apparatus (not shown), the instructions may cause the processing device to perform a plurality of operations. For example, the operations may include determining image data based on visible light captured by the image sensor, and determining distance data based on infrared light that is transmitted by the infrared emitter and captured by the infrared depth sensor.

The receiver 300 may further receive voice data of the object from the image capturing apparatus (not shown).

The virtual object generator 310 may generate a virtual object corresponding to the object by converting the depth image of the object into point cloud data in the virtual space.

Here, point cloud data is a set of points in three-dimensional space; in contrast to a two-dimensional image, each value represents one point in three dimensions. Point cloud data is a set of vectors, each of which may include position coordinate information and color information for a point.

The virtual object generator 310 may extract the distance data and the color data from the depth image of the object.

The virtual object generator 310 may convert the depth image into point cloud data based on the color data and the distance data of the depth image.

For example, referring to FIG. 4A, the virtual object generator 310 may map the depth image of the object to three-dimensional spatial coordinates (x, y, z) in the virtual space based on the distance data of the object, thereby converting it into point cloud data corresponding to the object.
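The mapping from depth pixels to (x, y, z) coordinates can be sketched as follows. This is a minimal illustration, not the method of the disclosure: it assumes a pinhole camera model, and the function name `depth_to_point_cloud` and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are hypothetical names introduced here. The source only states that the mapping is based on the distance data.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]
Point = Tuple[float, float, float, Color]


def depth_to_point_cloud(
    depth: Sequence[Sequence[float]],
    color: Sequence[Sequence[Color]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project each valid depth pixel to an (x, y, z, rgb) point.

    Assumes a pinhole camera (an assumption, not stated in the source):
    fx, fy are focal lengths in pixels and (cx, cy) is the principal point.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                # Treat non-positive depth as a missing measurement.
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z, color[v][u]))
    return points
```

Each returned tuple pairs a position with its color, which matches the description of point cloud data as a set of vectors holding position coordinates and color information.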

That is, position data for the countless points included in the object are gathered to produce point cloud data that forms a spatial structure. As its density increases, the point cloud data becomes more and more detailed and comes to have meaning as a single virtual object.

The virtual object generator 310 may convert the distance data of the object into point cloud data using a mesh technique.

For example, the virtual object generator 310 may convert the points of the object into a set of faces, each corresponding to a polygon connecting those points. The virtual object generator 310 may then generate the virtual object in the virtual space by applying a color texture to the set of polygon faces, using the coordinate information of each face and the color data of the object.
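One simple way to connect points into polygons is sketched below. The source does not specify a meshing algorithm; this example assumes that the points come from a regular depth-image grid and splits every 2 x 2 block of valid pixels into two triangles. The function name `grid_triangles` and the row-major vertex indexing are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Triangle = Tuple[int, int, int]


def grid_triangles(valid: Sequence[Sequence[bool]]) -> List[Triangle]:
    """Triangulate a depth-image grid.

    valid[v][u] is True where pixel (u, v) has a usable depth value.
    Vertices are indexed row-major (index = v * width + u). Every 2 x 2
    block whose four pixels are all valid yields two triangles.
    """
    height = len(valid)
    width = len(valid[0]) if height else 0
    triangles: List[Triangle] = []
    for v in range(height - 1):
        for u in range(width - 1):
            a = v * width + u  # top-left
            b = a + 1          # top-right
            c = a + width      # bottom-left
            d = c + 1          # bottom-right
            if valid[v][u] and valid[v][u + 1] and valid[v + 1][u] and valid[v + 1][u + 1]:
                triangles.append((a, b, c))
                triangles.append((b, d, c))
    return triangles
```

Because the vertex indices line up with the pixel grid, the color of each pixel can serve directly as a per-vertex color for texturing the faces.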

When a depth image and a color image of the object are received in real time from the image capturing apparatus (not shown), the virtual object generator 310 may generate the virtual object in real time from the received depth image and color image.

Meanwhile, when point cloud data is not used, the virtual object generator 310 may instead use the camera arrangement information of N image capturing apparatuses (not shown) that photograph the object, and synthesize the depth images of the object received from the N image capturing apparatuses (not shown) into the tile format described later to generate a virtual object corresponding to the object.

However, when the camera arrangement information of N image capturing apparatuses (not shown) is used, the cost of generating the virtual object increases in proportion to the number of image capturing apparatuses, and two technical problems arise: calibrating the position of each image capturing apparatus, and synchronizing the outputs of the image capturing apparatuses. In addition, with footage captured by an image capturing apparatus (not shown), it is not easy to remove the background behind the captured object in order to add a different spatial background effect or to composite the object with other 3D graphic objects.

To solve these problems, the present disclosure obtains point cloud data from the depth image of the object received from the image capturing apparatus (not shown) and generates the object as a virtual object existing in a virtual space.

In addition, because point cloud data representing three-dimensional data occupies a considerable amount of memory, a special compression method is required to transmit point cloud data to the receiving apparatus 110. If point cloud data is compressed with a general compression method, quantization distorts the original data.
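A rough calculation illustrates the scale involved. The figures below are purely hypothetical and do not come from the source: they assume one point per pixel of a 640 x 480 depth frame, 15 bytes per point (three 32-bit floats for position plus three bytes of RGB color), and 30 frames per second.

```python
def point_cloud_bytes_per_second(
    width: int, height: int, fps: int, bytes_per_point: int = 15
) -> int:
    """Uncompressed data rate when every pixel becomes one point.

    The default of 15 bytes per point is an assumption:
    3 x 4 bytes for float32 x, y, z plus 3 bytes for RGB.
    """
    return width * height * bytes_per_point * fps


rate = point_cloud_bytes_per_second(640, 480, 30)
print(f"{rate / 1e6:.2f} MB/s")  # 138.24 MB/s under the assumptions above
```

Under these assumptions the raw stream is well over 100 MB per second, which is the kind of load that motivates sending rendered viewpoint images rather than the point cloud itself.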

Accordingly, the present invention uses point cloud data only to generate the virtual object corresponding to the object, and does not compress the point cloud data and transmit it to the receiving apparatus 110.

Meanwhile, the viewpoint image generator 320 may generate a plurality of viewpoint images of the virtual object corresponding to the object.

For example, referring to FIG. 4B, the viewpoint image generator 320 may obtain arrangement information of a plurality of virtual cameras that capture the virtual object 40, arranged around the virtual object 40 located in the virtual space.

Based on the arrangement information of the plurality of virtual cameras, the viewpoint image generator 320 may generate a plurality of viewpoint images of the virtual object 40 through the virtual cameras, each of which corresponds to one of a plurality of viewpoints of the virtual object 40. Here, the number of virtual cameras that capture the viewpoint images may be determined based on the resolution of the 3D image to be output by the receiving apparatus 110 and on relationship information between the transmitting apparatus 100 and the receiving apparatus 110. The number of virtual cameras may further be determined based on information about the display of the receiving apparatus 110 on which the 3D image is to be output. For example, the number of virtual cameras may be determined based on characteristic information of the lenticular Fresnel lens constituting the display of the receiving apparatus 110.
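A possible camera arrangement is sketched below. The source does not define the geometry of the virtual camera array, so everything here is an assumption: cameras are placed on a section of a sphere centered on the virtual object, spread evenly over a horizontal and a vertical angular span. The names `camera_grid`, `h_span_deg`, and `v_span_deg` are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def _spread(count: int, total_deg: float, index: int) -> float:
    """Angle (radians) of the index-th of `count` samples, centered on 0."""
    if count == 1:
        return 0.0
    return math.radians(-total_deg / 2 + total_deg * index / (count - 1))


def camera_grid(
    center: Vec3,
    radius: float,
    cols: int,
    rows: int,
    h_span_deg: float,
    v_span_deg: float,
) -> List[Vec3]:
    """Positions of rows x cols virtual cameras, row-major order.

    Every camera sits at distance `radius` from `center`; a renderer
    would aim each one at `center` to capture one viewpoint image.
    """
    cx, cy, cz = center
    cameras: List[Vec3] = []
    for r in range(rows):
        elevation = _spread(rows, v_span_deg, r)
        for c in range(cols):
            azimuth = _spread(cols, h_span_deg, c)
            cameras.append((
                cx + radius * math.cos(elevation) * math.sin(azimuth),
                cy + radius * math.sin(elevation),
                cz + radius * math.cos(elevation) * math.cos(azimuth),
            ))
    return cameras
```

For instance, `camera_grid((0.0, 0.0, 0.0), 2.0, 6, 5, 40.0, 20.0)` yields 30 positions, one per viewpoint in a layout of 6 horizontal by 5 vertical viewpoints.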

The plurality of viewpoint images of the virtual object could be transmitted to the receiving apparatus 110 over separate video channels. In that case, however, a video channel must be selected when the voice data of the object is combined, and the receiving apparatus 110 must synchronize the viewpoint images in order to compose a single 3D image frame (e.g., a light field image frame). Furthermore, if even one viewpoint image is lost or delayed during transmission, the receiving apparatus 110 has difficulty generating the 3D image, or generation of the 3D image is delayed.

To solve these problems, the synthesized image generator 330 generates a synthesized image by synthesizing the plurality of viewpoint images of the virtual object.

That is, so that the receiving apparatus 110 can easily synchronize the output images of the virtual cameras (i.e., the plurality of viewpoint images), the synthesized image generator 330 may generate a synthesized image in which the plurality of viewpoint images are arranged as a tiled set of image frames.

Here, the number and size of the tiles may be determined by the size information of the display of the receiving apparatus 110 and the number of virtual cameras.

For example, referring to FIG. 4C, suppose the display resolution of the receiving apparatus 110 is 8K and there are 30 virtual cameras (6 virtual cameras for the horizontal viewpoints and 5 virtual cameras for the vertical viewpoints). The synthesized image generator 330 may derive the size information of the display (7680 x 4320) from the display resolution, and may calculate the horizontal resolution of a tile (7680 / 6 = 1280) from the display width (7680) and the number of horizontal viewpoints (6). Likewise, the synthesized image generator 330 may calculate the vertical resolution of a tile (4320 / 5 = 864) from the display height (4320) and the number of vertical viewpoints (5). The horizontal resolution of each tile may be calculated through [Equation 1], and the vertical resolution of each tile through [Equation 2].

[Equation 1]

Horizontal resolution of a tile (WR) = display width (W) / number of horizontal viewpoints (WN)

[Equation 2]

Vertical resolution of a tile (HR) = display height (H) / number of vertical viewpoints (HN)
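As a minimal sketch of [Equation 1] and [Equation 2], the tile resolution could be computed as follows. The function name `tile_resolution` and the use of integer division (discarding any remainder pixels) are assumptions made for illustration and are not stated in this description.

```python
from typing import Tuple


def tile_resolution(display_w: int, display_h: int,
                    views_h: int, views_v: int) -> Tuple[int, int]:
    """Return (WR, HR): the tile width and height in pixels.

    display_w, display_h: display size W x H
    views_h, views_v:     number of horizontal (WN) and vertical (HN) viewpoints
    """
    if views_h <= 0 or views_v <= 0:
        raise ValueError("viewpoint counts must be positive")
    # Assumption: any remainder pixels are dropped by integer division.
    return display_w // views_h, display_h // views_v


# The 8K example from the description: 6 horizontal and 5 vertical viewpoints.
wr, hr = tile_resolution(7680, 4320, 6, 5)
print(wr, hr)  # 1280 864
```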

The synthesized image generator 330 may generate a synthesized image in which the plurality of viewpoint images are combined in a tile format, based on the size information of the display of the receiving device 110 and the number of virtual cameras. For example, referring to FIG. 4C, the synthesized image generator 330 may generate a synthesized image in which the 30 viewpoint images captured according to the arrangement of the plurality of virtual cameras are combined in a 6*5 tile format, based on the size information of the display of the receiving device 110 and the number of virtual cameras.
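The tile-format combination described above can be illustrated with a small sketch. It assumes that the viewpoint images are equally sized and are placed in row-major order (left to right, then top to bottom); the description does not specify the ordering, and the `Image` type and `compose_tiles` function are hypothetical names used only for this example.

```python
from typing import List

# Hypothetical stand-in for a frame: a list of rows of pixel values.
Image = List[List[int]]


def compose_tiles(views: List[Image], views_h: int, views_v: int) -> Image:
    """Place views_h * views_v equally sized views into one tiled image."""
    if len(views) != views_h * views_v:
        raise ValueError("expected exactly views_h * views_v viewpoint images")
    tile_h = len(views[0])
    composite: Image = []
    for row in range(views_v):
        # The views that share this row of the tile grid (row-major assumption).
        row_views = views[row * views_h:(row + 1) * views_h]
        for y in range(tile_h):
            line: List[int] = []
            for view in row_views:
                line.extend(view[y])
            composite.append(line)
    return composite


# 30 tiny 2x2 "views", each filled with its own index, in a 6*5 grid.
views = [[[i, i], [i, i]] for i in range(30)]
composite = compose_tiles(views, views_h=6, views_v=5)
```

Because all viewpoints for one instant end up in a single frame, the receiving side does not need to synchronize separate streams per virtual camera.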

The synthesized image generator 330 may encode the synthesized image using a preset image compression method.

The transmitter 340 may transmit the synthesized image to the receiving device 110.

So that the receiving device 110 can easily generate the 3D image, the transmitter 340 may further transmit, to the receiving device 110, information on the synthesized image including the total resolution (R) of the synthesized image, the total number of viewpoint images used in the synthesized image (N, corresponding to the number of virtual cameras), the number of horizontal viewpoints (WN), the number of vertical viewpoints (HN), the horizontal resolution of a tile (WR), and the vertical resolution of a tile (HR).
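The information on the synthesized image listed above could be carried in a small record such as the one sketched here. The class name, the field names, and the representation of the total resolution (R) as a width and height pair are assumptions for illustration; the description does not define a concrete data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesizedImageInfo:
    total_width: int    # R, horizontal component (assumed representation)
    total_height: int   # R, vertical component (assumed representation)
    view_count: int     # N, corresponds to the number of virtual cameras
    views_h: int        # WN, number of horizontal viewpoints
    views_v: int        # HN, number of vertical viewpoints
    tile_width: int     # WR, horizontal resolution of a tile
    tile_height: int    # HR, vertical resolution of a tile

    def is_consistent(self) -> bool:
        """Check that the tile grid matches N and fits in the total resolution."""
        return (self.view_count == self.views_h * self.views_v
                and self.tile_width * self.views_h <= self.total_width
                and self.tile_height * self.views_v <= self.total_height)


# Values from the 8K example: 30 viewpoints in a 6*5 grid of 1280 x 864 tiles.
info = SynthesizedImageInfo(7680, 4320, 30, 6, 5, 1280, 864)
```

A receiver could run a check like `is_consistent()` before splitting the synthesized image into viewpoint images.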

The transmitter 340 may transmit the encoded synthesized image and the audio data of the object to the receiving device 110. In this case, the encoded synthesized image and the audio data of the object may be transmitted to the receiving device 110 in accordance with a standard transmission specification.

In this case, the synthesized image may be converted into a 3D image and output through the display of the receiving device 110.

Each viewpoint image included in the synthesized image may be converted into a 3D image and output based on the pixel arrangement of the display of the receiving device 110. In this case, the pixel data for each viewpoint image may be determined based on the width information and tilt information of the optical lens included in the display of the receiving device 110.

Meanwhile, those skilled in the art will readily understand that the receiver 300, the virtual object generator 310, the viewpoint image generator 320, the synthesized image generator 330, and the transmitter 340 may each be implemented separately, or one or more of them may be implemented in an integrated manner.

FIG. 5 is a flowchart illustrating a method of providing a 3D image through the transmitting device 100 according to an embodiment of the present invention.

Referring to FIG. 5, in step S501, the transmitting device 100 may receive a depth image of an object from an image capturing device.

In step S503, the transmitting device 100 may generate a virtual object corresponding to the object by converting the depth image into point cloud data in a virtual space.

In step S505, the transmitting device 100 may generate a plurality of viewpoint images of the virtual object.

In step S507, the transmitting device 100 may generate a synthesized image by synthesizing the plurality of viewpoint images.

In step S509, the transmitting device 100 may transmit the synthesized image to the receiving device 110. Here, the synthesized image transmitted to the receiving device 110 may be converted into a 3D image and output through the display of the receiving device 110.

In the above description, steps S501 to S509 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.
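Step S503 is described only as converting the depth image into point cloud data. One common way to perform such a conversion is pinhole back-projection, which the following sketch assumes; the camera intrinsics `fx`, `fy`, `cx`, `cy`, the input layout, and the function name are hypothetical and are not taken from this description.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]
Point = Tuple[float, float, float, Color]  # x, y, z, (r, g, b)


def depth_to_points(depth: Sequence[Sequence[float]],
                    color: Sequence[Sequence[Color]],
                    fx: float, fy: float,
                    cx: float, cy: float) -> List[Point]:
    """Back-project each pixel with a valid distance into a colored 3D point."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # no valid distance was measured at this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z, color[v][u]))
    return points


# A 2x2 depth image with one invalid pixel and its matching color image.
depth = [[0.0, 2.0], [2.0, 2.0]]
color = [[(0, 0, 0), (255, 0, 0)], [(0, 255, 0), (0, 0, 255)]]
cloud = depth_to_points(depth, color, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```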

FIG. 6 is a block diagram of the receiving device 110 shown in FIG. 2, according to an embodiment of the present invention.

Referring to FIG. 6, the receiving device 110 may include a receiver 600, a converter 610, and an output unit 620. However, the receiving device 110 shown in FIG. 6 is only one implementation example of the present invention, and various modifications are possible based on the components shown in FIG. 6.

Hereinafter, FIG. 6 will be described with reference to FIGS. 7A to 7D.

The receiving device 110 may output the synthesized image received from the transmitting device 100 and form a 3D image in space through an optical lens (e.g., a lenticular Fresnel lens).

Specifically, the receiver 600 may receive, from the transmitting device 100, the audio data of the object and a synthesized image in which a plurality of viewpoint images of a virtual object corresponding to the object are combined. Here, the virtual object corresponding to the object is an object generated by converting a depth image of the object into point cloud data in a virtual space. Here, the plurality of viewpoint images are images generated through a plurality of virtual cameras arranged around the virtual object located in the virtual space.

The receiver 600 may further receive, from the transmitting device 100, information on the synthesized image including the total resolution of the synthesized image, the total number of viewpoint images used in the synthesized image (corresponding to the number of virtual cameras), the number of horizontal viewpoints, the number of vertical viewpoints, the horizontal resolution of a tile, and the vertical resolution of a tile.

The synthesized image received from the transmitting device 100 may be an image in which the plurality of viewpoint images are combined in a tile format based on the size information of the display of the receiving device 110 and the number of virtual cameras. Here, the number of virtual cameras may be determined based on the resolution of the 3D image to be output from the receiving device 110 and relationship information between the transmitting device 100 and the receiving device 110. The number of virtual cameras may be determined further based on information on the display of the receiving device 110 on which the 3D image is to be output.

The converter 610 may decode the encoded synthesized image and then convert the decoded synthesized image into a 3D image.

The converter 610 may convert the decoded synthesized image into the individual tile-format viewpoint images. In this case, the tile-format viewpoint images may be configured, for example, as shown in FIG. 4C.

The converter 610 may convert the synthesized image into a 3D image based on the information on the synthesized image received from the transmitting device 100.

The converter 610 may convert each viewpoint image included in the synthesized image into a 3D image based on the pixel arrangement of the display of the receiving device 110.

For example, the converter 610 may combine the tile-format viewpoint images into the pixel arrangement of the display using a synthesis algorithm specific to the characteristics of the optical lens (e.g., a lenticular Fresnel lens) included in the display of the receiving device 110. In this case, the criterion for the pixel arrangement may differ depending on the characteristics of the optical lens.

The converter 610 may split out each viewpoint image based on the information on the synthesized image received from the transmitting device 100 (the total resolution, the total number of viewpoint images used in the synthesized image, the number of horizontal viewpoints, the number of vertical viewpoints, the horizontal resolution of a tile, and the vertical resolution of a tile). For example, when the synthesized image consists of 30 viewpoint images in total, the viewpoint images are indexed 1 to 30, and the pixel data of each viewpoint image may be determined, as shown in FIG. 7A, based on the width information and tilt information of the optical lens included in the display of the receiving device 110.
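The splitting of the synthesized image into indexed viewpoint images can be sketched as a lookup from a viewpoint index to its tile rectangle. The sketch assumes that indices 1 to N are assigned in row-major order across the tile grid, which the description does not state; `tile_rect` is a hypothetical helper name.

```python
from typing import Tuple


def tile_rect(index: int, views_h: int, views_v: int,
              tile_w: int, tile_h: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a tile; right/bottom are exclusive.

    index is 1-based, matching the 1 to 30 indexing in the description.
    """
    if not 1 <= index <= views_h * views_v:
        raise ValueError("viewpoint index out of range")
    col = (index - 1) % views_h    # row-major ordering is an assumption
    row = (index - 1) // views_h
    left, top = col * tile_w, row * tile_h
    return left, top, left + tile_w, top + tile_h


# 8K example: 6 horizontal and 5 vertical viewpoints, 1280 x 864 tiles.
first = tile_rect(1, 6, 5, 1280, 864)
last = tile_rect(30, 6, 5, 1280, 864)
print(first)  # (0, 0, 1280, 864)
print(last)   # (6400, 3456, 7680, 4320)
```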

FIG. 7A is a pixel layout diagram for synthesizing a 3D image (light field image), at the display resolution of the receiving device 110, from the tile-format viewpoint images when the total number of viewpoints is 30.

Referring to FIG. 7A, the number in each pixel indicates the index of the tile-format viewpoint image for the corresponding viewpoint. The viewpoint images may be arranged according to the tilt information and width information of the prisms of the optical lens constituting the display of the receiving device 110. Here, pixels shown in the same color form a group whose pixel data (coordinate information) within each viewpoint image is the same. Because determining whether pixel data belong to the same group requires a large amount of computation, this determination may be processed in parallel by the GPU module of the receiving device 110.
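The description leaves the lens-specific synthesis algorithm unspecified. Purely as an illustration of how a lens width and tilt can determine which viewpoint index lands on which pixel, the sketch below uses a generic slanted-lens mapping: the horizontal position of a pixel under its lens, shifted per row by the slant, is scaled to a viewpoint index. This is not the algorithm of the present invention; the function name, the units (lens width and per-row shift expressed in pixels), and the omission of RGB sub-pixels are all assumptions.

```python
from typing import List


def view_index(x: int, y: int, lens_width: float,
               slope: float, view_count: int) -> int:
    """Return the 1-based viewpoint index shown at display pixel (x, y).

    lens_width: width of one lens element, in pixels (assumed unit)
    slope:      horizontal shift of the lens per display row, in pixels
    """
    if lens_width <= 0 or view_count <= 0:
        raise ValueError("lens_width and view_count must be positive")
    # Horizontal position of the pixel under its lens element.
    phase = (x - y * slope) % lens_width
    index = int(phase * view_count / lens_width)
    # Guard against floating-point rounding at the lens boundary.
    return min(index, view_count - 1) + 1


# A 7-viewpoint layout, 8 pixels wide and 3 rows high (hypothetical lens values).
layout: List[List[int]] = [
    [view_index(x, y, lens_width=7, slope=1, view_count=7) for x in range(8)]
    for y in range(3)
]
```

Each pixel is computed independently of the others, which is why this kind of mapping lends itself to the parallel GPU processing mentioned above.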

FIG. 7B is a diagram for analyzing the position of the reconstructed synthesized image along the vertical lines of the display panel in a display of the receiving device 110 that uses an optical lens. It shows the display according to the tilt and width of an optical lens arranged at a slant for a total of 7 viewpoints.

Referring back to FIG. 6, the output unit 620 may output the 3D image through the display of the receiving device 110. The output unit 620 may also output the 3D image together with the audio data of the object.

For example, referring to FIG. 7C, the output unit 620 may output a 3D image having N viewpoint images by using the prism principle of the optical lens included in the display so that the N viewpoint images converge in space.

FIG. 7D shows a 3D image obtained by outputting the per-viewpoint tile-format synthesized image of FIG. 4C using the method of FIG. 7A.

Meanwhile, those skilled in the art will readily understand that the receiver 600, the converter 610, and the output unit 620 may each be implemented separately, or one or more of them may be implemented in an integrated manner.

FIG. 8 is a flowchart illustrating a method of providing a 3D image through the receiving device 110 according to an embodiment of the present invention.

Referring to FIG. 8, in step S801, the receiving device 110 may receive, from the transmitting device 100, a synthesized image in which a plurality of viewpoint images of a virtual object corresponding to an object are combined. Here, the virtual object corresponding to the object is an object generated by converting a depth image of the object into point cloud data in a virtual space.

In step S803, the receiving device 110 may convert the synthesized image into a 3D image.

In step S805, the receiving device 110 may output the 3D image through the display of the receiving device 110.

In the above description, steps S801 to S805 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.

An embodiment of the present invention may also be implemented in the form of a recording medium containing computer-executable instructions, such as a program module executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and nonvolatile media and removable and non-removable media. A computer-readable medium may also include any computer storage medium. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The foregoing description of the present invention is provided for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it can be readily modified into other specific forms without changing the technical spirit or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: transmitting device
110: receiving device
300: receiver
310: virtual object generator
320: viewpoint image generator
330: synthesized image generator
340: transmitter
600: receiver
610: converter
620: output unit

Claims

1. A transmitting device for providing a three-dimensional (3D) image using point cloud data, the transmitting device comprising:
a receiver configured to receive a depth image of an object from an image capturing device;
a virtual object generator configured to convert the depth image into point cloud data in a virtual space to generate a virtual object corresponding to the object;
a viewpoint image generator configured to generate a plurality of viewpoint images of the virtual object;
a synthesized image generator configured to generate a synthesized image by synthesizing the plurality of viewpoint images; and
a transmitter configured to transmit the synthesized image to a receiving device,
wherein the synthesized image is converted into the 3D image and output through a display of the receiving device.

2. The transmitting device of claim 1,
wherein the virtual object generator converts the depth image into the point cloud data based on color data and distance data of the depth image.

3. The transmitting device of claim 1,
wherein the viewpoint image generator generates the plurality of viewpoint images through a plurality of virtual cameras arranged around the virtual object located in the virtual space.

4. The transmitting device of claim 3,
wherein the number of the virtual cameras is determined based on a resolution of the 3D image to be output from the receiving device and relationship information between the transmitting device and the receiving device.

5. The transmitting device of claim 4,
wherein the number of the virtual cameras is determined further based on information on the display of the receiving device on which the 3D image is to be output.

6. The transmitting device of claim 5,
wherein the synthesized image generator generates the synthesized image in a tile format from the plurality of viewpoint images based on size information of the display and the number of the virtual cameras.

7. The transmitting device of claim 1,
wherein the receiver further receives audio data of the object,
the synthesized image generator encodes the synthesized image using a preset image compression method, and
the transmitter transmits the encoded synthesized image and the audio data of the object to the receiving device.

8. The transmitting device of claim 6,
wherein each viewpoint image included in the synthesized image is converted into the 3D image based on a pixel arrangement of the display.

9. The transmitting device of claim 8,
wherein pixel data for each viewpoint image is determined based on width information and tilt information of an optical lens included in the display.

10. A receiving device for providing a three-dimensional (3D) image using point cloud data, the receiving device comprising:
a receiver configured to receive, from a transmitting device, a synthesized image obtained by synthesizing a plurality of viewpoint images of a virtual object corresponding to an object;
a converter configured to convert the synthesized image into the 3D image; and
an output unit configured to output the 3D image through a display of the receiving device,
wherein the virtual object corresponding to the object is generated by converting a depth image of the object into point cloud data in a virtual space.

11. The receiving device of claim 10,
wherein the plurality of viewpoint images are generated through a plurality of virtual cameras arranged around the virtual object located in the virtual space.

12. The receiving device of claim 11,
wherein the number of the virtual cameras is determined based on a resolution of the 3D image to be output from the receiving device and relationship information between the transmitting device and the receiving device.

13. The receiving device of claim 12,
wherein the number of the virtual cameras is determined further based on information on the display of the receiving device on which the 3D image is to be output.

14. The receiving device of claim 13,
wherein the synthesized image is generated from the plurality of viewpoint images in a tile format based on size information of the display and the number of the virtual cameras.

15. The receiving device of claim 10,
wherein the receiver further receives audio data of the object from the transmitting device, and
the output unit outputs the 3D image and the audio data of the object.

16. The receiving device of claim 10,
wherein the converter converts each viewpoint image included in the synthesized image into the 3D image based on a pixel arrangement of the display.

17. The receiving device of claim 16,
wherein pixel data for each viewpoint image is determined based on width information and tilt information of an optical lens included in the display.

18. A method of providing a three-dimensional (3D) image using point cloud data, performed by a transmitting device, the method comprising:
receiving a depth image of an object from an image capturing device;
generating a virtual object corresponding to the object by converting the depth image into point cloud data in a virtual space;
generating a plurality of viewpoint images of the virtual object;
generating a synthesized image by synthesizing the plurality of viewpoint images; and
transmitting the synthesized image to a receiving device,
wherein the synthesized image is converted into the 3D image and output through a display of the receiving device.