KR100874226B1

KR100874226B1 - Multi-view Image and 3D Audio Transceiver and Transmit and Receive Method Using the Same

Info

Publication number: KR100874226B1
Application number: KR1020070002824A
Authority: KR
Inventors: 김만배; 호요성; 김종원; 변혜란; 유지상; 김홍국; 이관행; 류제하; 이승현
Original assignee: 광주과학기술원
Priority date: 2007-01-10
Filing date: 2007-01-10
Publication date: 2008-12-16
Also published as: KR20080065766A

Abstract

The present invention relates to an apparatus for transmitting and receiving multi-view images and three-dimensional audio, and to a transmission and reception method using the same. More specifically, it relates to an apparatus and method that combine multi-view images with three-dimensional audio data and play back different audio depending on the viewpoint image the viewer is watching, thereby delivering more realistic media; that can selectively transmit only the images of the viewpoints requested by the receiving side, and can therefore operate over a narrow bandwidth; and that generate intermediate-view images to provide images better matched to the parallax of the monitor.

Multi-view image, 3D audio, viewpoint conversion, multi-view adaptation, intermediate-view image

Description

Device for transceiving multi-view video and 3D audio, Method for transceiving the same

FIG. 1 is a block diagram of a multi-view image and 3D audio transmission and reception apparatus according to a preferred embodiment of the present invention;

FIG. 2 is a block diagram showing the detailed configuration of the multi-view image encoding unit of FIG. 1;

FIG. 3 is a block diagram showing the detailed configuration of the multi-view broadcast server unit of FIG. 1;

FIG. 4 is a block diagram showing the detailed configuration of the multi-view adaptation server unit of FIG. 1;

FIG. 5 is a block diagram showing the detailed configuration of the multi-view data transmission unit of FIG. 1;

FIG. 6 is a block diagram showing the detailed configuration of the viewpoint conversion unit of FIG. 1;

FIG. 7 is a block diagram showing the detailed configuration of the multi-view data decoding unit of FIG. 1;

FIG. 8 is a block diagram showing the detailed configuration of the intermediate-view image generation unit of FIG. 1;

FIG. 9 is a block diagram showing the detailed configuration of the 3D audio synthesis unit of FIG. 1.

<Explanation of symbols for main parts of the drawings>

100 - Multi-view image acquisition unit

200 - Multi-view image encoding unit

300 - 3D audio acquisition unit

400 - 3D audio encoding unit

500 - Multi-view broadcast server unit

600 - Multi-view adaptation server unit

700 - Multi-view data transmission unit

800 - Viewpoint conversion unit

900 - Multi-view data decoding unit

1000 - Multi-view image storage unit

1100 - Intermediate-view image generation unit

1200 - Multi-view image playback unit

1300 - 3D audio storage unit

1400 - 3D audio synthesis unit

1500 - 3D audio playback unit

The present invention relates to an apparatus for transmitting and receiving multi-view images and three-dimensional (hereinafter, "3D") audio, and to a transmission and reception method using the same. More specifically, it relates to an apparatus and method that combine multi-view images with 3D audio data and play back different audio depending on the viewpoint image the viewer is watching, thereby delivering more realistic media; that can selectively transmit only the images of the viewpoints requested by the receiving side, and can therefore operate over a narrow bandwidth; and that generate intermediate-view images to provide images better matched to the parallax of the monitor.

Multi-view video is a set of view images obtained by photographing the same object with a large number of synchronized cameras (for example, eight) arranged in parallel or in an arc. It is a technology with a broad range of applications: it can be used not only in stereoscopic display devices but also in stereoscopic broadcasting, realistic broadcasting, 3D DMB broadcasting, free-view TV (FTV), and similar services, whenever a user wants to watch from a chosen viewpoint or view content as 3D stereoscopic images.

The element technologies for processing such multi-view images include acquisition, modeling/rendering, encoding/decoding, and transmission.

Modeling/rendering technology models a specific object using a plurality of cameras so that the user can rotate and view the object from an arbitrary viewpoint. Proposed examples include the "human body modeling and arbitrary-viewpoint image generation technology using HD cameras" of the NHK Science and Technical Research Laboratories in Japan and the "free-viewpoint video technology" of MPI-Informatik in Germany. Modeling/rendering technology was developed independently of any transmission and reception concept and is used in the field of pure image processing.

Meanwhile, "Eye Vision" is an example of an encoding and transmission technology based on multi-view video. Eye Vision installs about 50 cameras in, for example, a sports stadium and generates a 360-degree view of a specific object for the user over an arbitrary period of time. With this technology, the operator selects an arbitrary viewpoint and broadcasts it, so the viewer has no choice but to watch whatever the operator transmits.

In addition, conventional multi-view image processing systems either do not consider combining multi-view image data with audio data or, even when they do, multiplex the images with a single audio stream. As a result, the viewer hears the same uniform sound regardless of which viewpoint image is being watched, which reduces the sense of presence. Moreover, when the user can select a viewpoint, the audio data cannot adapt in real time to the specific viewpoint image selected.

In addition, conventional multi-view image processing systems transmit all of the viewpoint images acquired from the multiple cameras and let the viewer select the desired viewpoint image at the receiving side, so data transmission inevitably consumes an enormous amount of bandwidth.

In addition, even when a conventional multi-view image processing system transmits all of the acquired viewpoint images, the receiving side can view only as many viewpoint images as there are cameras. Consequently, there are not enough viewpoint images to match the view spacing of the receiving-side 3D monitor, so the difference between adjacent viewpoint images is large and smooth transitions between viewpoints cannot be provided.

The present invention has been devised to solve the above problems. Its object is to provide a multi-view image and 3D audio transmission and reception apparatus, and a transmission and reception method using the same, that combine multi-view images with 3D audio data and play back different audio depending on the viewpoint image the viewer is watching, thereby delivering more realistic media; that can selectively transmit only the images of the viewpoints requested by the receiving side, and can therefore operate over a narrow bandwidth; and that generate intermediate-view images to provide images better matched to the parallax of the monitor.

To achieve the above object, a multi-view image and 3D audio transmission and reception apparatus according to the present invention is an apparatus for transmitting and receiving multi-view image data and 3D audio data, characterized by comprising: a multi-view broadcast server unit including a multi-view image multiplexing unit that multiplexes part of the encoded multi-view image data, and an image-audio multiplexing unit that multiplexes the remaining encoded multi-view image data with the encoded 3D audio data; a multi-view adaptation server unit that generates the viewpoint images requested by the receiving side and the 3D audio corresponding to them; a viewpoint conversion unit that generates a descriptor of the viewpoint image and a descriptor of the 3D audio according to the viewpoint requested by the viewer; and an intermediate-view image generation unit that generates intermediate-view images from the decoded multi-view image data.

The multi-view adaptation server unit may include: a view descriptor parsing unit that parses the view descriptor transmitted from the viewpoint conversion unit; an image resource adaptation unit that adapts the multi-view resources based on the output of the view descriptor parsing unit; an image descriptor adaptation unit that adapts the multi-view image descriptor based on the output of the image resource adaptation unit; an audio resource adaptation unit that adapts the audio resources based on the output of the view descriptor parsing unit; and an audio descriptor adaptation unit that adapts the audio descriptor based on the output of the audio resource adaptation unit.

The viewpoint conversion unit may include: a viewpoint determination unit that selects at least one viewpoint desired by the viewer from the plurality of received viewpoint images; and a view descriptor generation unit that generates a descriptor of the viewpoint selected by the viewpoint determination unit and transmits it to the multi-view adaptation server unit.

The intermediate-view image generation unit may include: a camera parameter calculation unit that calculates the extrinsic and intrinsic parameters of the cameras; a depth information calculation unit that finds matching points between neighboring viewpoint images to obtain initial depth information; a region segmentation unit that selects a reference image from among neighboring viewpoint images and, based on the color information of the reference image, segments it into regions having different color information; a depth layer prediction unit that generates depth layers by applying the initial depth information calculated by the depth information calculation unit to the regions segmented by color in the region segmentation unit; a depth information refinement unit that iteratively performs depth assignment on the depth layers generated by the depth layer prediction unit; and an intermediate-view production unit that generates an image at an arbitrary viewpoint using the depth information optimized by the depth information refinement unit and the camera parameters calculated by the camera parameter calculation unit.

The multi-view image and 3D audio transmission and reception apparatus may further include: a multi-view image acquisition unit that acquires multi-view images from a plurality of cameras; a multi-view image encoding unit that encodes the multi-view image data obtained by the multi-view image acquisition unit; a 3D audio acquisition unit that acquires multi-channel 3D audio from a plurality of microphones; and a 3D audio encoding unit that encodes the 3D audio data obtained by the 3D audio acquisition unit.

The multi-view image and 3D audio transmission and reception apparatus may further include a multi-view data transmission unit that transmits data in uncompressed form and comprises: an acquisition server unit that fetches the data of each viewpoint image from the cameras; a control server unit that issues control commands so that the viewpoints requested by the receiving side are transmitted selectively; and an IP (Internet Protocol) multicast processing unit that fetches from the acquisition server unit the data of the viewpoint images designated by the control server unit and transmits it in uncompressed form.

The multi-view data transmission unit may additionally transmit data in compressed form by further comprising: a view data collection unit that receives from the acquisition server unit all view data, including the generation time information of each viewpoint image and of the 3D audio; and a multi-view multiplexing unit that synchronously multiplexes the view data using the generation time information.
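
As an illustration of the synchronous multiplexing described above, the following minimal Python sketch merges per-view and audio packet lists into a single stream ordered by generation time. The packet layout `(timestamp, payload)`, the stream names, and the function `multiplex_by_time` are assumptions made for this example; the patent does not specify a packet format.

```python
import heapq


def multiplex_by_time(streams):
    """Merge several time-sorted packet lists into one time-ordered stream.

    streams: dict mapping a stream name (hypothetical, e.g. "V0" or "A")
             to a list of (timestamp, payload) tuples sorted by timestamp.
    Returns a list of (timestamp, stream_name, payload) tuples.
    """
    # Tag each packet with the name of the stream it came from.
    tagged = (
        [(ts, name, payload) for ts, payload in packets]
        for name, packets in streams.items()
    )
    # Order by generation time; break ties by stream name so that
    # payloads themselves never need to be compared.
    return list(heapq.merge(*tagged, key=lambda p: (p[0], p[1])))


muxed = multiplex_by_time({
    "V0": [(0, "v0-frame0"), (40, "v0-frame1")],
    "A": [(0, "audio0"), (20, "audio1")],
})
```

Because every packet carries its generation time, a receiver could split the merged stream back into per-view and audio streams and still present them in sync.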

The multi-view image and 3D audio transmission and reception apparatus may further include a multi-view data decoding unit comprising: a j-view decoding unit that decodes j views from among the n received multi-view encoded data streams; a view data collection unit that gathers the data decoded by the j-view decoding unit into one; and a 3D audio decoding unit that decodes the received multi-channel audio data (where n ≥ j).

The multi-view image and 3D audio transmission and reception apparatus may further include: a multi-view image storage unit that stores the data gathered by the view data collection unit in order to generate frame-by-frame viewpoint images for each viewpoint; a 3D audio storage unit that stores the decoded 3D audio data for synchronization with the multi-view images and for 3D audio synthesis; a 3D audio synthesis unit that fetches from the 3D audio storage unit the 3D audio matching the viewpoint image selected by the viewer and synthesizes it; and a 3D audio playback unit that plays back the 3D audio data synthesized by the 3D audio synthesis unit through multi-channel speakers or stereo speakers.

The 3D audio synthesis unit may include: a surround panning unit that, according to the viewer's viewpoint selection, synthesizes 3D audio matching the selected viewpoint image by surround panning; a 3D parameter extraction unit that extracts 3D parameters from the multi-channel audio data in order to give a 3D effect during stereo playback; and a downmixing unit that generates stereo audio data using the 3D parameters extracted by the 3D parameter extraction unit.
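
The patent does not disclose the panning law used by the surround panning unit. As a rough illustration of how audio could follow the selected viewpoint, the sketch below applies ordinary constant-power stereo panning, a standard technique substituted here for illustration only. The function name `pan_gains`, the azimuth convention (degrees, positive to the viewer's right), and the clamping to the frontal half-plane are all assumptions.

```python
import math


def pan_gains(source_azimuth_deg, view_azimuth_deg):
    """Return (left_gain, right_gain) for one sound source.

    The source direction is taken relative to the selected viewpoint,
    so changing the viewpoint moves the source in the stereo image.
    Constant-power law: left**2 + right**2 == 1.
    """
    relative = source_azimuth_deg - view_azimuth_deg
    # Assumption: sources outside +/-90 degrees are clamped to hard left/right.
    relative = max(-90.0, min(90.0, relative))
    # Map [-90, +90] degrees onto a pan angle in [0, pi/2].
    theta = (relative + 90.0) / 180.0 * (math.pi / 2)
    return math.cos(theta), math.sin(theta)


# The same source heard from two different viewpoints.
centered = pan_gains(source_azimuth_deg=30.0, view_azimuth_deg=30.0)
off_axis = pan_gains(source_azimuth_deg=30.0, view_azimuth_deg=-60.0)
```

When the viewpoint faces the source, both gains are equal; when the viewpoint is rotated 90 degrees to the left of the source, the source is panned fully to the right channel.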

A multi-view image and 3D audio transmission and reception method according to the present invention is a method of transmitting and receiving multi-view image data and 3D audio data, characterized by comprising: (a) a multi-view image multiplexing step of multiplexing part of the encoded multi-view image data; (b) an image-audio multiplexing step of multiplexing the remaining encoded multi-view image data with the encoded 3D audio data; (c) generating a descriptor of the viewpoint image and a descriptor of the 3D audio according to the viewpoint requested by the viewer; (d) generating the viewpoint images requested by the receiving side and the 3D audio corresponding to them; and (e) generating intermediate-view images from the decoded multi-view image data.

Step (d) may include: parsing the view descriptor transmitted in step (c); adapting the multi-view resources; adapting the multi-view image descriptor; adapting the audio resources; and adapting the audio descriptor.

Step (c) may include: selecting at least one viewpoint desired by the viewer from the plurality of received viewpoint images; and generating and transmitting a descriptor of the selected viewpoint.

Step (e) may include: calculating the extrinsic and intrinsic parameters of the cameras; finding matching points between neighboring viewpoint images to obtain initial depth information; selecting a reference image from among neighboring viewpoint images and, based on the color information of the reference image, segmenting it into regions having different color information; generating depth layers by applying the initial depth information to the regions segmented by color; iteratively performing depth assignment on the depth layers; and generating an image at an arbitrary viewpoint using the depth information optimized by the depth assignment and the calculated camera parameters.

The multi-view image and 3D audio transmission and reception method may further include: an uncompressed data transmission step of transmitting the viewpoint images requested by the receiving side in uncompressed form; and a compressed data transmission step of multiplexing all viewpoint images and the corresponding 3D audio into a single data stream and transmitting it in compressed form.

The multiplexing in the compressed data transmission step is preferably performed in synchronized form using the generation time information of each viewpoint image and of the 3D audio.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. In assigning reference numerals to the components in each drawing, note that the same components are given the same numerals wherever possible, even when they appear in different drawings. In describing the present invention, detailed descriptions of related well-known configurations or functions are omitted where they would obscure the gist of the invention. Although preferred embodiments are described below, the technical idea of the present invention is not limited to them and may be modified and practiced in various ways by those skilled in the art.

First, a multi-view image and 3D audio transmission and reception apparatus according to a preferred embodiment of the present invention is described.

FIG. 1 is a block diagram of a multi-view image and 3D audio transmission and reception apparatus according to a preferred embodiment of the present invention. FIGS. 2 through 9 are block diagrams showing in detail, respectively, the multi-view image encoding unit (FIG. 2), the multi-view broadcast server unit (FIG. 3), the multi-view adaptation server unit (FIG. 4), the multi-view data transmission unit (FIG. 5), the viewpoint conversion unit (FIG. 6), the multi-view data decoding unit (FIG. 7), the intermediate-view image generation unit (FIG. 8), and the 3D audio synthesis unit (FIG. 9) of FIG. 1.

Referring to FIG. 1, the multi-view image and 3D audio transmission and reception apparatus according to a preferred embodiment of the present invention comprises a multi-view image acquisition unit (100), a multi-view image encoding unit (200), a 3D audio acquisition unit (300), a 3D audio encoding unit (400), a multi-view broadcast server unit (500), a multi-view adaptation server unit (600), a multi-view data transmission unit (700), a viewpoint conversion unit (800), a multi-view data decoding unit (900), a multi-view image storage unit (1000), an intermediate-view image generation unit (1100), a multi-view image playback unit (1200), a 3D audio storage unit (1300), a 3D audio synthesis unit (1400), and a 3D audio playback unit (1500).

The multi-view image acquisition unit (100) acquires multi-view images from a plurality of cameras. For example, if the multi-view image acquisition unit (100) has n cameras, each camera photographs the same subject, and the resulting digital or analog signals are stored in the corresponding buffers via transmission lines and then sent to the multi-view image encoding unit (200).

The multi-view image encoding unit (200) encodes the multi-view image data obtained by the multi-view image acquisition unit (100). Referring to FIG. 2, the multi-view image encoding unit (200) comprises a multi-view image rearrangement unit (210) and a rearranged image encoding unit (220).

The multi-view image rearrangement unit (210) divides the plurality of multi-view images (V1, V2, ..., Vn) input to the encoder into GOP (Group Of Pictures) units and rearranges them sequentially on that basis. The frame rearrangement method of the multi-view image rearrangement unit (210) is briefly as follows. Suppose, for example, that eight viewpoint images (S0 to S7) are input from eight cameras. First, the pictures of all views (S0 to S7) at time T0 are placed in a row. Next, the pictures of the first view (S0) over a predetermined period (T1 to T8) are placed in a row and output. The pictures of the next view (S1) over the same period (T1 to T8) are then placed in a row, and this is repeated until the pictures of the last view (S7) have been placed. After that, the process starts again from the first view (S0) for the next predetermined period (T9 to T16) and continues through the last view (S7). This picture rearrangement scheme is the one currently under consideration for multi-view images in the H.264/AVC standard. The pictures may, of course, be rearranged in other ways.
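
The rearrangement order described above can be sketched in a few lines of Python. This is only an illustration of the ordering: the function name `rearrange`, the `(view, time)` tuple representation, and the default period length of eight time instants are assumptions taken from the eight-camera example, not part of any codec API.

```python
def rearrange(num_views, num_times, period=8):
    """Return (view, time) index pairs in the rearranged coding order.

    All views at time 0 come first. Then, for each block of `period`
    time instants, the pictures of view 0 are listed for the whole
    block, then those of view 1, and so on through the last view.
    """
    order = [(view, 0) for view in range(num_views)]  # every view at T0
    for start in range(1, num_times, period):
        end = min(start + period, num_times)
        for view in range(num_views):
            order.extend((view, t) for t in range(start, end))
    return order


# Eight views (S0..S7) over times T0..T16, as in the example above.
order = rearrange(num_views=8, num_times=17)
```

With these inputs the sequence begins (S0,T0), (S1,T0), ..., (S7,T0), continues with (S0,T1) through (S0,T8), then (S1,T1) through (S1,T8), and only after (S7,T8) moves on to (S0,T9).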

The rearranged image encoding unit (220) encodes the rearranged images according to a standard coding scheme such as MPEG-2 or H.264/AVC. Each viewpoint image may be encoded independently (simulcast encoding), but multi-view encoding, which exploits the correlation between views, is preferred. Multi-view images contain not only temporal redundancy between temporally adjacent pictures but also spatial redundancy between pictures captured by spatially adjacent cameras. Because simulcast encoding codes each view independently, it cannot remove the spatial redundancy that exists between views. Multi-view encoding, by contrast, rearranges the viewpoint images and encodes them with a single encoder, so the spatial redundancy can also be removed and the multi-view images can be encoded more efficiently.

The 3D audio acquisition unit (300) acquires multi-channel 3D audio sound sources from a plurality of microphones. The 3D audio acquisition unit (300) may acquire a stereo sound source using two microphones, but it preferably uses a larger number of microphones to acquire a multi-channel sound source such as 4-channel or 5.1-channel audio. The sound sources are preferably acquired separately for each viewpoint image. Because the viewer can then hear 3D audio that reflects the viewpoint image selected through the viewpoint conversion unit (800) while watching that image, the viewer experiences more realistic spatial sound.

The 3D audio encoding unit (400) encodes the 3D audio data obtained by the 3D audio acquisition unit (300). The 3D audio encoding unit (400) encodes the sound sources using a multi-channel audio compression scheme such as AAC (Advanced Audio Coding) or AC-3 (Audio Coding-3).

The multi-view broadcast server unit (500) manages the multi-view system and processes the multi-view image and 3D audio data. Referring to FIG. 3, the multi-view broadcast server unit (500) comprises a multi-view image multiplexing unit (510) that multiplexes part of the multi-view image data and 3D audio data, and an image-audio multiplexing unit (520) that multiplexes the remaining encoded multi-view image data with the encoded 3D audio data. For example, if the encoded multi-view image data consists of V1, V2, ..., Vn and the encoded audio data consists of a single stream A, the multi-view image multiplexing unit (510) may multiplex the image data V1, V2, ..., Vn-1, and the image-audio multiplexing unit (520) may multiplex the viewpoint image Vn with the audio A. If encoded audio data exists for each viewpoint image, several image-audio multiplexing units (520) may be provided, or the data may be grouped appropriately and multiplexed by a single image-audio multiplexing unit (520).
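
The single-audio example above amounts to a simple partition of the encoded streams between the two multiplexing units. The sketch below shows that partition only; the function name `assign_to_multiplexers` and the use of plain strings as stand-ins for encoded streams are assumptions made for illustration.

```python
def assign_to_multiplexers(views, audio):
    """Split encoded streams between the two multiplexing units.

    views: list of encoded view streams [V1, ..., Vn].
    audio: the single encoded audio stream A.
    Returns (inputs of the image multiplexer, inputs of the image-audio multiplexer).
    """
    if not views:
        raise ValueError("at least one encoded view is required")
    image_mux_input = list(views[:-1])          # V1 .. Vn-1
    image_audio_mux_input = [views[-1], audio]  # Vn together with A
    return image_mux_input, image_audio_mux_input


image_in, image_audio_in = assign_to_multiplexers(["V1", "V2", "V3", "V4"], "A")
```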

The multi-view adaptive server unit 600 transmits multi-view images to the receiving side (client), exchanges messages with it, and generates the viewpoint image and 3D audio data for the viewpoint requested by the receiving side. That is, the multi-view adaptive server unit 600 provides a Single-Source Multi-Use environment in which one piece of content is adaptively converted to suit different usage environments, using information about the environment in which the multi-view images and 3D audio are consumed, namely the characteristics of the receiving terminal and/or the reproduction preferences of the viewer. Referring to FIG. 4, the multi-view adaptive server unit 600 includes a viewpoint descriptor parser 610, an image resource adaptor 620, an image descriptor adaptor 630, an audio resource adaptor 640, and an audio descriptor adaptor 650.

The viewpoint descriptor parser 610 parses the view descriptor transmitted from the viewpoint conversion unit 800. Here, a descriptor means information related to an item or component within a Digital Item (DI), which is a structured digital object with a standardized representation, identification, and metadata. That is, the viewpoint descriptor parser 610 receives the view descriptor that the viewpoint descriptor generator 820 generated according to the viewpoint selected by the viewer in the viewpoint determination unit 810 of the viewpoint conversion unit 800, and parses the related information.

The image resource adaptor 620 adaptively converts the multi-view resource parsed through the viewpoint descriptor parser 610. Here, a resource means an individually identifiable item such as a video, audio, image, or text item. That is, the image resource adaptor 620 adaptively converts the image content by reflecting the viewer's reproduction preferences in the multi-view resource. This produces an adapted resource Rv, which is transmitted to the client after a predetermined conversion process.

The image descriptor adaptor 630 adaptively converts the multi-view image descriptor. In general, an image descriptor describes characteristics of the receiving terminal, such as its image format, the maximum number of vertices processed per second (maximum vertices), the maximum number of pixels per second (maximum pixels), and the maximum transmission rate (maximum rate), or describes the reproduction preferences of a viewer who prefers a particular viewpoint image. The image descriptor adaptor 630 adaptively converts the image content by reflecting the multi-view image descriptor received from the viewpoint descriptor generator 820. This produces an adapted descriptor Dv, which is transmitted to the client after a predetermined conversion process.

The audio resource adaptor 640 adaptively converts the audio resource parsed through the viewpoint descriptor parser 610. This produces an adapted resource Ra, which is transmitted to the client after a predetermined conversion process.

The audio descriptor adaptor 650 adaptively converts the audio content by reflecting the audio descriptor received from the viewpoint descriptor generator 820. This produces an adapted descriptor Da, which is transmitted to the client after a predetermined conversion process.

The multi-view data transmission unit 700 provides the transmission protocol and mechanism for delivering data to the client efficiently. It has structures for two forms of data transmission: uncompressed and compressed. In uncompressed data transmission, the viewpoint image requested by the client is sent as uncompressed data so that it is neither delayed nor degraded in quality. In compressed data transmission, all viewpoint images, including the one requested by the client, are compressed, multiplexed with the 3D audio data, and transmitted.

First, for uncompressed data transmission, the multi-view data transmission unit 700 includes, referring to FIG. 5, an acquisition server unit 710, an IP multicast processing unit 720, and a control server unit 740. A plurality of acquisition server units 710 are provided, preferably as many as there are multi-view cameras, and each obtains viewpoint data from its multi-view camera. The control server unit 740 issues control commands for selectively transmitting the viewpoint requested by the client: it fetches the data from the acquisition server unit 710 corresponding to the viewpoint the client wants and transmits it as uncompressed data using the respective IP multicast processing unit 720. That is, only the viewpoint desired by the client is selectively transmitted, under the control data of the control server unit 740.

Second, for compressed data transmission, the multi-view data transmission unit 700 includes the acquisition server units 710, a viewpoint data collection unit 730, a synchronous multiplexer 750, and an IP multicast processing unit 760. The viewpoint data collection unit 730 receives the data of all viewpoints from the acquisition server units 710, including the generation time information of each viewpoint image and of the 3D audio. The collected data are multiplexed into a single data stream by the synchronous multiplexer 750, which uses the generation time information of each viewpoint image and of the 3D audio to perform synchronized multiplexing. The resulting single stream is transmitted to the client through the IP multicast processing unit 760.
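One way to picture multiplexing by generation time is a timestamp-ordered merge. The sketch below is an assumption-laden illustration: the packet layout `(generation_time, stream_id, payload)` and the tie-breaking by stream identifier are hypothetical and are not specified in the text.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical packet layout: (generation_time_seconds, stream_id, payload).
Packet = Tuple[float, str, bytes]


def synchronous_multiplex(streams: Iterable[List[Packet]]) -> List[Packet]:
    """Merge per-stream packet lists into one stream ordered by generation time.

    Each input list must already be sorted by generation time. Packets with
    equal generation time are ordered by stream id so the result is deterministic.
    """
    return list(heapq.merge(*streams, key=lambda p: (p[0], p[1])))


# Two video viewpoints at 25 frames/s and one audio stream with shorter frames.
v1 = [(0.00, "V1", b"f0"), (0.04, "V1", b"f1")]
v2 = [(0.00, "V2", b"f0"), (0.04, "V2", b"f1")]
audio = [(0.00, "A", b"a0"), (0.02, "A", b"a1"), (0.04, "A", b"a2")]
muxed = synchronous_multiplex([v1, v2, audio])
```

Because every packet keeps its generation time, a receiver can demultiplex the stream and still present each viewpoint image with the audio produced at the same instant.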

The viewpoint conversion unit 800 generates a descriptor of the viewpoint image and a descriptor of the 3D audio according to the viewpoint requested by the viewer. Referring to FIG. 6, the viewpoint conversion unit 800 includes a viewpoint determination unit 810 and a viewpoint descriptor generator 820.

The viewpoint determination unit 810 selects the viewpoint desired by the viewer from the plurality of received viewpoint images. The viewer may want one viewpoint or two or more. That is, the viewpoint determination unit 810 selects the viewpoint Vk (k = 1 to n) desired by the user from the viewpoint images V1, V2, ..., Vn of the multi-view cameras.

Once a viewpoint has been determined by the viewpoint determination unit 810, the viewpoint descriptor generator 820 generates the corresponding view descriptor. A view descriptor can be expressed in a machine-readable language in XML (eXtensible Markup Language) format, and comprises a multi-view image descriptor and an audio descriptor. The view descriptor generated by the viewpoint descriptor generator 820 is transmitted to the multi-view adaptive server unit 600, which uses it to adaptively convert the viewpoint image and the 3D audio.
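As a rough sketch of an XML view descriptor, the following builds one on the client side and parses it on the server side. The element and attribute names (`ViewDescriptor`, `MultiViewImageDescriptor`, `SelectedView`, `AudioDescriptor`, `index`, `channels`) are invented for illustration; the text does not define a schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_view_descriptor(selected_views: List[int], audio_channels: str) -> str:
    """Serialize the selected viewpoints and audio layout as XML (hypothetical schema)."""
    root = ET.Element("ViewDescriptor")
    image = ET.SubElement(root, "MultiViewImageDescriptor")
    for k in selected_views:
        ET.SubElement(image, "SelectedView", {"index": str(k)})
    ET.SubElement(root, "AudioDescriptor", {"channels": audio_channels})
    return ET.tostring(root, encoding="unicode")


def parse_view_descriptor(xml_text: str) -> Dict[str, object]:
    """Recover the selected viewpoints and audio layout from the XML text."""
    root = ET.fromstring(xml_text)
    views = [int(e.get("index")) for e in root.iter("SelectedView")]
    audio = root.find("AudioDescriptor")
    return {"views": views, "audio_channels": audio.get("channels")}


# The viewer selects viewpoints V2 and V3 and has a 5.1-channel speaker setup.
descriptor_xml = build_view_descriptor([2, 3], "5.1")
parsed = parse_view_descriptor(descriptor_xml)
```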

The multi-view data decoder 900 decodes the received multi-view image and 3D audio data and transmits them to the multi-view image storage unit 1000 and the 3D audio storage unit 1300. Referring to FIG. 7, the multi-view data decoder 900 includes a j-view decoder 910, a viewpoint data collection unit 920, and a 3D audio decoder 930.

The j-view decoder 910 decodes j viewpoints out of the n received multi-view encoded data streams, where j is preferably less than or equal to n. That is, the j-view decoder 910 decodes either a subset of j viewpoints out of the n viewpoints or all of the viewpoints. When all viewpoints are decoded (j = n), the n multi-view encoded streams may also be decoded using only one decoder, although this is not shown in the drawings. This is possible because, depending on the quality level the client wants, a decoder of lower complexity than the encoder can be used. Decoding with a single decoder has the advantage that control is efficient while a certain level of quality is still secured, and that changes in the number of viewpoints can be handled flexibly. As shown in FIG. 7, the number of decoders may of course be increased as needed to obtain better performance. The value of j is determined by several variables, including the performance of the decoder and the usage environment. The j-view decoder 910 is formed by partially adopting the decoding rules of standard systems such as MPEG-2 and H.264/AVC.

The viewpoint data collection unit 920 gathers the data decoded by the j-view decoder 910 and transmits the collected data to the multi-view image storage unit 1000. The multi-view image storage unit 1000 temporarily stores the received data in order to generate frame-by-frame viewpoint images for each viewpoint.

The 3D audio decoder 930 decodes the received audio data. The audio data may be multi-channel or stereo, and is transmitted to the 3D audio storage unit 1300. The 3D audio storage unit 1300 temporarily stores the decoded 3D audio data so that the 3D audio can be synthesized in synchronization with the multi-view images.

The intermediate view image generation unit 1100 generates intermediate view images of the multi-view images and transmits them to the multi-view image reproduction unit 1200. That is, it generates intermediate view images between the decoded viewpoint images so that smooth viewpoint images matched to the viewpoint interval of the 3D monitor can be provided. Referring to FIG. 8, the intermediate view image generation unit 1100 includes a camera parameter calculator 1110, a depth information calculator 1120, a region dividing unit 1130, a depth layer predictor 1140, a depth information improver 1150, and an intermediate view maker 1160.

The camera parameter calculator 1110 calculates the extrinsic and intrinsic parameters of the cameras. The extrinsic parameters are variables related to how a camera is set up, such as its position, height, and angle. The intrinsic parameters are variables related to the characteristics of the camera itself, such as its optical center, focal length, and resolution.

The depth information calculator 1120 finds initial depth information by locating matching points between neighboring viewpoint images. That is, it measures similarity block by block between the given viewpoint images, finds the most similar matching point, and calculates the initial depth information from the result.
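Block-wise similarity search between two neighboring views can be sketched as below. The sketch assumes rectified grayscale images (so matches lie on the same row), uses the sum of absolute differences (SAD) as the similarity measure, and returns disparity rather than metric depth; all of these choices, and the function names, are assumptions rather than details given in the text.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def block_sad(left: Image, right: Image, y: int, x: int, d: int, half: int) -> int:
    """SAD between the block centred at (y, x) in `left` and the block
    centred at (y, x - d) in `right`."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
    return total


def initial_disparity(left: Image, right: Image, max_disp: int, half: int = 1) -> List[List[int]]:
    """Disparity of the best-matching block per pixel; border pixels stay 0."""
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(half, h - half):
        for x in range(half, w - half):
            best_d, best_cost = 0, None
            for d in range(max_disp + 1):
                if x - half - d < 0:
                    break  # the shifted block would leave the image
                cost = block_sad(left, right, y, x, d, half)
                if best_cost is None or cost < best_cost:
                    best_d, best_cost = d, cost
            disp[y][x] = best_d
    return disp


# Synthetic pair: the left image is the right image shifted by 2 pixels.
base = [10, 50, 20, 90, 40, 70, 30, 80, 60, 100]
right_img = [[b + 3 * y for b in base] for y in range(3)]
left_img = [[0, 0] + row[:-2] for row in right_img]
disparity_map = initial_disparity(left_img, right_img, max_disp=3)
```

For rectified parallel cameras, depth is commonly taken as inversely proportional to disparity, which is why the disparity map can serve as initial depth information in this sketch.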

The region dividing unit 1130 selects a reference image from among neighboring viewpoint images and, based on the color information of that reference image, divides it into regions having different color information.

The depth layer predictor 1140 generates depth layers by applying the initial depth information calculated by the depth information calculator 1120 to the regions divided by color by the region dividing unit 1130. The depth layer predictor 1140 assigns one representative depth value to all pixels of the same region. After this has been repeated for every region, similar representative depth values are grouped together, and one representative depth value is extracted from each group to form the depth layers.
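The two stages (one representative depth per color region, then grouping of similar representatives) might look like the following. Taking the median as the representative value and grouping by a fixed tolerance are assumptions made for this sketch; the text does not say how the representative value or the similarity threshold is chosen.

```python
from statistics import median
from typing import Dict, List


def representative_depths(region_of: List[int], depth: List[float]) -> Dict[int, float]:
    """Median initial depth of the pixels belonging to each colour region."""
    samples: Dict[int, List[float]] = {}
    for region, d in zip(region_of, depth):
        samples.setdefault(region, []).append(d)
    return {region: median(values) for region, values in samples.items()}


def depth_layers(rep: Dict[int, float], tolerance: float) -> List[float]:
    """Group similar representative depths and keep one value per group."""
    layers: List[float] = []
    group: List[float] = []
    for d in sorted(rep.values()):
        if group and d - group[0] > tolerance:
            layers.append(median(group))  # close the current group
            group = []
        group.append(d)
    if group:
        layers.append(median(group))
    return layers


# Nine pixels in three colour regions; the 5.0 in region 0 is a matching outlier.
region_of = [0, 0, 0, 1, 1, 1, 2, 2, 2]
depth = [1.0, 1.1, 5.0, 1.2, 1.2, 1.3, 4.0, 4.1, 4.2]
rep = representative_depths(region_of, depth)
layers = depth_layers(rep, tolerance=0.5)
```

In this toy input, regions 0 and 1 end up in the same near layer and region 2 forms a far layer, so three regions are described by two depth layers.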

The depth information improver 1150 repeatedly performs depth assignment on the depth layers generated by the depth layer predictor 1140. More specifically, the depth information improver 1150 assigns each region to one of the depth layers generated by the depth layer predictor 1140 using the region's representative depth information, and defines a cost function that reflects the representative depth information, the positionally adjacent color regions, the similarity of their colors, and the like. The depth assignment is preferably repeated in order to find the point at which the value of this cost function is minimized.
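A cost-driven, repeated depth assignment can be illustrated with a simple greedy scheme. The particular cost (absolute distance to the region's representative depth plus a color-weighted penalty for differing from adjacent regions), the weight `lam`, and the greedy region-by-region update are all assumptions chosen for brevity; the text only states that such a cost function is defined and minimized iteratively.

```python
from typing import Dict, List, Tuple


def refine_depth_assignment(
    rep: Dict[int, float],
    neighbours: Dict[int, List[int]],
    similarity: Dict[Tuple[int, int], float],
    layers: List[float],
    lam: float = 0.5,
    max_iter: int = 10,
) -> Dict[int, float]:
    """Assign one depth layer per region by greedily lowering a cost function."""
    # Start from the layer closest to each region's representative depth.
    assigned = {r: min(layers, key=lambda l: abs(l - rep[r])) for r in rep}
    for _ in range(max_iter):
        changed = False
        for r in sorted(rep):
            def cost(layer: float) -> float:
                data = abs(layer - rep[r])
                smooth = sum(
                    similarity[(r, n)] * abs(layer - assigned[n])
                    for n in neighbours[r]
                )
                return data + lam * smooth

            best = min(layers, key=cost)
            if best != assigned[r]:
                assigned[r] = best
                changed = True
        if not changed:
            break  # no region changed its layer: a local minimum was reached
    return assigned


# Region 1 has a noisy representative depth but similar colour to both neighbours.
rep_depth = {0: 1.0, 1: 2.6, 2: 1.1}
adjacent = {0: [1], 1: [0, 2], 2: [1]}
colour_sim = {(0, 1): 1.0, (1, 0): 1.0, (1, 2): 1.0, (2, 1): 1.0}
refined = refine_depth_assignment(rep_depth, adjacent, colour_sim, [1.0, 4.0])
```

Here region 1 initially falls into the far layer because 2.6 is nearer to 4.0 than to 1.0, and the neighborhood term pulls it back into the near layer shared by its similarly colored neighbors.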

The intermediate view maker 1160 generates an image of an arbitrary viewpoint using the depth information optimized by the depth information improver 1150 and the camera parameters calculated by the camera parameter calculator 1110. An arbitrary viewpoint image generated in this way may contain empty regions caused by occlusion and similar effects. These empty regions are restored from neighboring pixels, and interpolation is used to produce a natural-looking intermediate view image.
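The warp-then-fill idea can be shown on a single scanline. This sketch simplifies the geometry heavily: it assumes rectified cameras, so that the camera parameters reduce to a horizontal shift proportional to disparity, and it fills holes by linear interpolation between the nearest valid pixels. The function name and the parameter `alpha` (position of the virtual view between the reference view at 0 and its neighbor at 1) are hypothetical.

```python
from typing import List, Optional


def synthesize_row(row: List[int], disparity: List[int], alpha: float) -> List[int]:
    """Warp one scanline toward an intermediate viewpoint and fill the holes."""
    width = len(row)
    out: List[Optional[int]] = [None] * width
    out_disp = [-1] * width
    for x in range(width):
        tx = x - int(round(alpha * disparity[x]))
        # When two pixels land on the same target, the nearer one
        # (larger disparity) hides the farther one.
        if 0 <= tx < width and disparity[x] > out_disp[tx]:
            out[tx] = row[x]
            out_disp[tx] = disparity[x]
    filled = list(out)
    for x in range(width):
        if out[x] is not None:
            continue
        left = next((i for i in range(x - 1, -1, -1) if out[i] is not None), None)
        right = next((i for i in range(x + 1, width) if out[i] is not None), None)
        if left is not None and right is not None:
            t = (x - left) / (right - left)
            filled[x] = int(round((1 - t) * out[left] + t * out[right]))
        elif left is not None:
            filled[x] = out[left]
        elif right is not None:
            filled[x] = out[right]
        else:
            filled[x] = 0  # nothing was warped into this row at all
    return filled


# A foreground object (disparity 4) at x = 3..4 in front of a flat background.
scanline = [10, 20, 30, 40, 50, 60, 70, 80]
scan_disp = [0, 0, 0, 4, 4, 0, 0, 0]
middle_view = synthesize_row(scanline, scan_disp, alpha=0.5)
```

With `alpha=0.5` the foreground pixels move two positions to the left, cover background pixels there, and leave a two-pixel hole behind them that is filled by interpolation.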

Referring to FIG. 9, the 3D audio synthesizer 1400 includes a surround panning unit 1410, a 3D parameter extractor 1420, and a down-mixing unit 1430. The surround panning unit 1410 uses surround panning to synthesize 3D audio that matches the viewpoint image selected by the user. The synthesized 3D audio data is reproduced through multi-channel speakers. The 3D parameter extractor 1420 extracts 3D parameters from the multi-channel audio data in order to provide a 3D effect during stereo reproduction. The down-mixing unit 1430 generates stereo audio data using the extracted 3D parameters and the selected viewpoint. The generated stereo audio data is reproduced through stereo speakers.
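To illustrate how a selected viewpoint can change the audio, the sketch below pans a single source between two speakers using the common constant-power pan law. It is a stand-in, not the surround panning of the text: the even angular spacing of the cameras, the 45-degree half width, and both function names are assumptions.

```python
import math
from typing import Tuple


def view_angle(k: int, n: int, span_deg: float) -> float:
    """Angle of viewpoint k (1..n) for cameras spread evenly over span_deg,
    centred on 0 degrees; negative angles are to the left."""
    if n == 1:
        return 0.0
    return -span_deg / 2 + (k - 1) * span_deg / (n - 1)


def constant_power_gains(source_deg: float, view_deg: float,
                         half_width_deg: float = 45.0) -> Tuple[float, float]:
    """Left/right gains for a source seen from the selected viewpoint."""
    rel = max(-half_width_deg, min(half_width_deg, source_deg - view_deg))
    theta = (rel / half_width_deg + 1.0) * math.pi / 4  # 0 .. pi/2
    return math.cos(theta), math.sin(theta)


# Five cameras over 40 degrees; the sound source sits straight ahead of the centre camera.
centre_gains = constant_power_gains(0.0, view_angle(3, 5, 40.0))
leftmost_gains = constant_power_gains(0.0, view_angle(1, 5, 40.0))
```

From the center camera the source is reproduced equally in both speakers, while from the leftmost camera it appears toward the right, so the right gain is larger; the squared gains always sum to one, which keeps the perceived loudness constant.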

The 3D audio reproduction unit 1500 reproduces the synthesized 3D audio data through multi-channel speakers or stereo speakers. As mentioned above, the data delivered through the surround panning unit 1410 is reproduced on multi-channel speakers, and the data delivered through the down-mixing unit 1430 is reproduced on stereo speakers.

Next, a method for transmitting and receiving multi-view images and 3D audio according to a preferred embodiment of the present invention will be described.

The method for transmitting and receiving multi-view images and 3D audio according to a preferred embodiment of the present invention comprises: a multi-view image acquisition step; a multi-view image encoding step; a 3D audio acquisition step; a 3D audio encoding step; (a) a multi-view image multiplexing step of multiplexing a part of the encoded multi-view image data; (b) an image-audio multiplexing step of multiplexing the remaining encoded multi-view image data and the encoded 3D audio data; (c) a step of generating a descriptor of the viewpoint image and a descriptor of the 3D audio according to the viewpoint requested by the viewer; an uncompressed data transmission step of transmitting the viewpoint image requested by the receiving side in the form of uncompressed data; a compressed data transmission step of multiplexing all viewpoint images and the corresponding 3D audio into one data stream and transmitting it in the form of compressed data; (d) a step of generating the viewpoint image requested by the receiving side and the 3D audio corresponding thereto; a multi-view data decoding step; a multi-view image storage step; (e) a step of generating an intermediate view image from the decoded multi-view image data; a multi-view image reproduction step; a 3D audio storage step; a 3D audio synthesis step; and a 3D audio reproduction step.

Step (c) of generating a descriptor of the viewpoint image and a descriptor of the 3D audio according to the viewpoint requested by the viewer comprises selecting at least one viewpoint desired by the viewer from the plurality of received viewpoint images, and generating and transmitting a descriptor of the selected viewpoint.

Step (d) of generating the viewpoint image requested by the receiving side and the 3D audio corresponding thereto comprises parsing the transmitted view descriptor, adapting the multi-view resource, adapting the multi-view image descriptor, adapting the audio resource, and adapting the audio descriptor.

(Deleted)

In addition, step (e) of generating an intermediate view image from the decoded multi-view image data comprises: calculating the extrinsic and intrinsic parameters of the cameras; finding initial depth information by locating matching points between neighboring viewpoint images; selecting a reference image from among neighboring viewpoint images and dividing it into regions having different color information based on the color information of the reference image; generating depth layers by applying the initial depth information to the regions divided by color; repeatedly performing depth assignment on the depth layers; and generating an image of an arbitrary viewpoint using the depth information optimized through the depth assignment and the calculated camera parameters. Since each of these steps has been described above in connection with the structure of the apparatus, a detailed description is omitted here.

The approximate order in which the method for transmitting and receiving multi-view images and 3D audio according to a preferred embodiment of the present invention is performed is as follows.

First, the multi-view images input to the multi-view image acquisition unit 100 are encoded by the multi-view image encoder 200 and then transmitted to the multi-view broadcast server unit 500. Likewise, the sound source input to the 3D audio acquisition unit 300 is encoded by the 3D audio encoder 400 and then transmitted to the multi-view broadcast server unit 500.

The multi-view broadcast server unit 500 multiplexes a part of the encoded multi-view image data (step a) and multiplexes the remaining multi-view image data with the 3D audio data (step b). The viewpoint conversion unit 800 generates a viewpoint image descriptor and a 3D audio descriptor according to the viewpoint requested by the viewer (step c) and transmits them to the multi-view adaptive server unit 600. The multi-view adaptive server unit 600 adapts to the viewpoint requested by the viewpoint conversion unit 800 on the receiving side, generates the viewpoint image and the corresponding 3D audio (step d), and transmits them to the multi-view data transmission unit 700. The multi-view data transmission unit 700 performs uncompressed and compressed data transmission, and the multi-view data decoder 900 decodes the transmitted data and stores it in the multi-view image storage unit 1000 and the 3D audio storage unit 1300.

The intermediate view image generation unit 1100 takes the decoded multi-view image data from the multi-view image storage unit 1000 and generates intermediate view images (step e). These viewpoint images are reproduced through the multi-view image reproduction unit 1200. Likewise, the 3D audio synthesizer 1400 takes the decoded 3D audio data from the 3D audio storage unit 1300 and synthesizes multi-channel or stereo 3D audio, which is reproduced through the 3D audio reproduction unit 1500.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications, changes, and substitutions without departing from the essential characteristics of the present invention. Accordingly, the embodiments disclosed herein and the accompanying drawings are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments or drawings. The scope of protection of the present invention should be interpreted according to the following claims, and all technical ideas within their equivalent scope should be interpreted as falling within the scope of rights of the present invention.

According to the present invention, multi-view images and 3D audio data are combined and different audio is reproduced according to the viewpoint image the viewer is watching, so that more realistic media can be supplied. Only the image of the viewpoint desired by the receiving side can be selectively transmitted, so that the invention can be used even with a narrow bandwidth. Intermediate view images are generated, so that smooth and natural viewpoint images matched to the viewpoint interval of the monitor can be provided.

Claims

An apparatus for transmitting and receiving multi-view image data and 3D audio data, the multi-view image and 3D audio transmission and reception apparatus comprising:

A multi-view broadcast server unit including a multi-view image multiplexer for multiplexing a part of the encoded multi-view image data, and an image-audio multiplexer for multiplexing the remaining encoded multi-view image data and the encoded 3D audio data;

A multi-view adaptive server unit for generating the viewpoint image requested by the receiving side and the 3D audio corresponding thereto;

A multi-view data transmission unit including an acquisition server unit for obtaining data of each viewpoint image from a camera, a control server unit for issuing a control command to selectively transmit the viewpoint requested by the receiving side, and an IP (Internet Protocol) multicast processing unit for fetching the data of the viewpoint image designated by the control server unit from the acquisition server unit and transmitting it in the form of uncompressed data;

A viewpoint conversion unit for generating a descriptor of a viewpoint image and a descriptor of 3D audio according to a viewpoint requested by the viewer; And

An intermediate view image generation unit for generating an intermediate view image from the decoded multi-view image data.

The apparatus of claim 1, wherein the multi-view adaptive server unit comprises:

A view descriptor parser for parsing a view descriptor transmitted from the view transform unit;

An image resource adaptor for adaptively converting a multi-view resource parsed through the viewpoint descriptor parser to a receiving terminal characteristic or a viewer's reproduction preference characteristic to generate an adaptive resource Rv;

An image descriptor adaptor for adaptively converting the multi-viewpoint image descriptor transmitted from the viewpoint converter to a receiving terminal characteristic or a viewer's reproduction preference characteristic to generate an adaptive descriptor Dv;

An audio resource adaptor configured to adaptively convert the audio resource parsed through the viewpoint descriptor parser into a receiving terminal characteristic or a viewer's reproduction preference characteristic to generate an adaptive resource Ra; And

An audio descriptor adaptor for adaptively converting an audio descriptor transmitted from the viewpoint converter to a reception terminal characteristic or a viewer's reproduction preference characteristic to generate an adaptive descriptor Da.

The apparatus of claim 1, wherein the viewpoint conversion unit comprises:

A viewpoint determination unit for selecting at least one viewpoint desired by the viewer from the plurality of received viewpoint images; And

A viewpoint descriptor generator for generating a descriptor of the viewpoint selected by the viewpoint determination unit and transmitting the descriptor to the multi-view adaptive server unit.

The apparatus of claim 1, wherein the intermediate view image generation unit comprises:

A camera parameter calculator configured to calculate external and internal parameters of the camera;

A depth information calculator for finding initial depth information by finding matching points between neighboring viewpoint images;

A region dividing unit for determining a reference image between neighboring viewpoint images and dividing the regions of neighboring viewpoint images into regions having different color information based on color information of the reference image;

While generating the depth layer by reflecting the initial depth information calculated by the depth information calculation unit in the area divided by color through the area dividing unit, it is repeated to assign the representative depth information for the same area to all areas A depth layer prediction unit for generating a depth hierarchy by grouping representative depth information and extracting one representative depth value from the group;

A depth information improver which repeatedly allocates depth to the depth layer generated from the depth layer predictor; And

An intermediate view maker for generating an image of an arbitrary view point using depth information optimally generated by the depth information improver and camera parameters calculated by the camera parameter calculator.
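
The depth layer prediction described in this claim (one representative depth per color region, then grouping of the representative depths into layers) can be illustrated with a minimal sketch. The claim does not say how the representative depth is chosen or how the values are grouped, so the use of the median and the `tolerance` threshold below are assumptions made purely for illustration.

```python
from statistics import median


def representative_depths(depth, labels):
    """Map each region label to one representative depth.

    `depth` and `labels` are equally sized 2D lists. The median is an
    assumed choice; the claim does not specify the statistic.
    """
    per_region = {}
    for depth_row, label_row in zip(depth, labels):
        for d, label in zip(depth_row, label_row):
            per_region.setdefault(label, []).append(d)
    return {label: median(values) for label, values in per_region.items()}


def depth_layers(rep, tolerance):
    """Group representative depths and keep one value per group.

    A value joins the current group if it lies within `tolerance`
    (a hypothetical parameter) of the group's smallest member.
    """
    groups = []
    for value in sorted(rep.values()):
        if groups and value - groups[-1][0] <= tolerance:
            groups[-1].append(value)
        else:
            groups.append([value])
    return [median(group) for group in groups]
```

The iterative depth refinement and the view synthesis from camera parameters are not covered by this sketch.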

The multi-view image and 3D audio transmitting and receiving apparatus of claim 1, further comprising:

A multi-view image acquisition unit for acquiring multi-view images from a plurality of cameras;

A multi-view image encoder for encoding the multi-view image data acquired through the multi-view image acquisition unit;

A 3D audio acquisition unit for acquiring multi-channel 3D audio from a plurality of microphones; and

A 3D audio encoder for encoding the 3D audio data acquired through the 3D audio acquisition unit.

(Deleted)

The apparatus of claim 1, wherein the multi-view data transmission unit further comprises a compressed data transmission unit comprising:

A viewpoint data collection unit for receiving, from the acquisition server unit, all viewpoint data including each viewpoint image and the generation time information of the 3D audio; and

A synchronous multiplexer for synchronously multiplexing the viewpoint data using the generation time information.
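
Synchronous multiplexing by generation time, as recited in this claim, can be sketched as a timestamp-ordered merge of per-stream packet queues. The `Packet` layout and the function name are hypothetical, and the sketch assumes each input stream is already ordered by generation time.

```python
import heapq
from typing import NamedTuple


class Packet(NamedTuple):
    timestamp: int   # generation time; the unit is an assumption
    stream_id: str   # e.g. a view index or "audio"
    payload: bytes


def synchronous_multiplex(streams):
    """Merge time-ordered packet lists into one stream ordered by timestamp.

    Packets with equal timestamps keep the order of the input streams.
    """
    return list(heapq.merge(*streams, key=lambda p: p.timestamp))
```

Merging by timestamp keeps each viewpoint image next to the audio generated at the same time, which is what lets the receiver present them together.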

The multi-view image and 3D audio transmitting and receiving apparatus of claim 1, further comprising a multi-view data decoding unit comprising:

A j-view decoder for decoding j views from the received coded data of n views, wherein n ≥ j;

A viewpoint data collection unit for collecting the data decoded by the j-view decoder; and

A 3D audio decoder for decoding the received multi-channel audio data.

The apparatus of claim 8, further comprising:

A multi-view image storage unit for storing the data collected through the viewpoint data collection unit so as to generate viewpoint images frame by frame for each viewpoint;

A 3D audio storage unit for storing the decoded 3D audio data, synchronized with the multi-view images, for 3D audio synthesis;

A 3D audio synthesis unit for synthesizing, from the 3D audio storage unit, the 3D audio corresponding to the viewpoint image selected by the viewer; and

A 3D audio playback unit for playing back the 3D audio data synthesized by the 3D audio synthesis unit through multi-channel speakers or stereo speakers.

The apparatus of claim 9, wherein the 3D audio synthesis unit comprises:

A surround panning unit for synthesizing, through surround panning, the 3D audio corresponding to the viewpoint image selected by the viewer;

A 3D parameter extraction unit for extracting 3D parameters from the multi-channel audio data to provide a 3D effect in stereo playback; and

A down-mixing unit for generating stereo audio data using the 3D parameters extracted by the 3D parameter extraction unit.
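
The panning and down-mixing recited in this claim can be illustrated with a small sketch. The claim specifies neither the pan law, the mapping from viewpoint to pan position, nor the down-mix coefficients; the constant-power pan, the linear viewpoint mapping, and the commonly used -3 dB down-mix gain below are all assumptions, and 3D parameter extraction is not modeled.

```python
import math


def viewpoint_to_position(view_index, num_views):
    """Hypothetical mapping: leftmost camera -> 0.0, rightmost camera -> 1.0."""
    return 0.5 if num_views == 1 else view_index / (num_views - 1)


def constant_power_pan(position):
    """Return (left_gain, right_gain) for a position in [0, 1].

    The gains satisfy left**2 + right**2 == 1, so perceived loudness
    stays roughly constant as the selected viewpoint changes.
    """
    theta = position * math.pi / 2
    return math.cos(theta), math.sin(theta)


def downmix_to_stereo(fl, fr, c, sl, sr, g=math.sqrt(0.5)):
    """Down-mix one five-channel sample frame to (left, right).

    `g` is the assumed -3 dB gain applied to the center and surround
    channels before they are added to the front channels.
    """
    return fl + g * c + g * sl, fr + g * c + g * sr
```

With five views, selecting the middle view (index 2) yields equal left and right gains, so the sound image sits in the center.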

A method for transmitting and receiving multi-view image data and 3D audio data, the method comprising:

(a) a multi-view image multiplexing step of multiplexing a part of the encoded multi-view image data;

(b) an image and audio multiplexing step of multiplexing the remaining encoded multi-view image data and the encoded 3D audio data;

(c) generating a descriptor of a viewpoint image and a descriptor of 3D audio according to the viewpoint requested by the viewer;

(d) a multi-view data transmission step including an uncompressed data transmission step of transmitting the requested viewpoint image as uncompressed data, and a compressed data transmission step of multiplexing all viewpoint images and the corresponding 3D audio into one data stream and transmitting the stream as compressed data;

(e) generating the viewpoint image requested by the receiver and the 3D audio corresponding thereto; and

(f) generating an intermediate viewpoint image from the decoded multi-view image data.

The method of claim 11, wherein step (e) comprises:

A viewpoint descriptor parsing step of parsing the viewpoint descriptor transmitted in step (c);

An image resource adaptation step of generating an adaptive resource Rv by adapting the multi-view resource parsed in the viewpoint descriptor parsing step to the characteristics of the receiving terminal or the viewer's playback preferences;

An image descriptor adaptation step of generating an adaptive descriptor Dv by adapting the multi-view image descriptor transmitted in step (c) to the characteristics of the receiving terminal or the viewer's playback preferences;

An audio resource adaptation step of generating an adaptive resource Ra by adapting the audio resource parsed in the viewpoint descriptor parsing step to the characteristics of the receiving terminal or the viewer's playback preferences; and

An audio descriptor adaptation step of generating an adaptive descriptor Da by adapting the audio descriptor transmitted in step (c) to the characteristics of the receiving terminal or the viewer's playback preferences.

The method of claim 11, wherein step (c) comprises:

Selecting, from the plurality of received viewpoint images, at least one viewpoint desired by the viewer; and

Generating and transmitting a descriptor of the selected viewpoint.

The method of claim 11, wherein step (f) comprises:

Calculating the external and internal parameters of the cameras;

Obtaining initial depth information by finding matching points between neighboring viewpoint images;

Determining a reference image among neighboring viewpoint images and dividing the neighboring viewpoint images into regions having different color information based on the color information of the reference image;

Generating a depth layer by applying the initial depth information to the regions divided by color, repeating for all regions the assignment of representative depth information to each region, grouping the representative depth information, and extracting one representative depth value from each group to generate a depth hierarchy;

Iteratively performing depth allocation on the depth hierarchy; and

Generating an image of an arbitrary viewpoint using the depth information optimized through the depth allocation and the calculated camera parameters.

(Deleted)

The method of claim 11, wherein the multiplexing into one data stream in the compressed data transmission step

is performed in synchronized form using the time information of each viewpoint image and of the 3D audio.