KR100783722B1

KR100783722B1 - A Scalable Stereoscopic Video Coding method for Heterogeneous Environments and the apparatus thereof

Info

Publication number: KR100783722B1
Application number: KR1020050125928A
Authority: KR
Inventors: 오세찬; 김길동; 박성혁; 이한민; 한석윤
Original assignee: 한국철도기술연구원
Priority date: 2005-12-20
Filing date: 2005-12-20
Publication date: 2007-12-07
Also published as: KR20070065481A

Abstract

The present invention relates to a stereo video encoding method that takes diverse reception environments into account. In a method of encoding a video signal with a scalable stereoscopic video encoder, the input stereo images are downsampled so that both a spatial base resolution and the full resolution can be represented, the left image at the spatial base resolution is encoded with a non-scalable MPEG-2 encoder, and B-frame partitioning is then applied to represent temporal resolution, generating the base layer (BL) and first enhancement layer (EL1) bitstreams. The invention is thus a stereo image encoding technique that uses spatio-temporal scalability to provide efficient display on heterogeneous receiving terminals.

Furthermore, for efficient reproduction of stereo images, the scheme has one base layer (BL) and one or more enhancement layers (EL). As technical features, an ECSC (Enhanced Compatible Stereoscopic Coding) method is proposed to improve stereo video coding efficiency, and a stereo video encoder using spatio-temporal scalability is provided to enable flexible stereo image services.

Stereo video, spatio-temporal scalability, scalable, coding, image

Description

A scalable stereoscopic video coding method for heterogeneous environments and the apparatus thereof

FIG. 1 is an overall configuration diagram of a scalable stereo video service according to the present invention.

FIG. 2 is a diagram of the spatio-temporal scalability structure of the present invention: FIG. 2A shows the case in which three B-frames lie between the I-frame and the P-frame (that is, M = 3), and FIG. 2B shows the case in which five B-frames lie between the I-frame and the P-frame (that is, M = 5).

FIG. 3 is a block diagram of a scalable stereo video encoder according to the present invention.

FIG. 4 is an explanatory diagram of the B-frame prediction process using ECSC (Enhanced Compatible Stereoscopic Coding) according to the present invention.

The present invention relates to a stereo video encoding method and apparatus that take diverse reception environments into account, and more particularly to a stereo video encoding method and apparatus for providing a stereo video service at a spatio-temporal resolution suited to the allowable bandwidth, display system, and processing capability of receiving terminals operating in diverse, heterogeneous network and display environments.

With the recent development of Internet networks and multimedia services, simultaneous collaborative environments have become possible in which high-quality immersive media, including three-dimensional (3D) and panoramic video, are provided among users who are far apart.

However, transmitting, receiving, or using such high-quality immersive 3D media data requires that a network infrastructure be in place and that display devices capable of visually presenting 3D image data be available. Moreover, the network infrastructures and receiving devices deployed today are heterogeneous, and each differs in the amount of data it can transmit or process.

Accordingly, a 3D image data encoding technique is needed that takes into account both the range of conditions the network can support and the capabilities of the receiving devices.

Many encoding techniques are therefore being studied in order to provide differentiated services according to the performance of diverse networks and receiving devices. Moreover, with active research support for immersive broadcasting, which is currently being promoted in Korea with the aim of commercialization by, for example, 2015, research is actively under way in industry, universities, and research institutes.

Stereo images currently make up the mainstream of 3D image data. Stereo images must be encoded efficiently, and differentiated services must be provided according to the network bandwidth and the capabilities of the receiving device. The stereo coding technique proposed with these points in mind is technically significant in that it enables differentiated services based on a scalable coding scheme (scalability: the degree of extensibility with which a system can flexibly respond to an increase in the number of users).

Conventional stereo image coding techniques have approached the problem from the standpoint of the efficiency of disparity estimation. Techniques are being studied to speed up disparity estimation by reducing its complexity and to improve its accuracy. Although such techniques can improve overall coding efficiency, they are not well suited to today's network environments, which have diverse configurations. The bandwidth that each network environment can guarantee not only differs but also varies over time, which makes it difficult to cope with changing conditions. In addition, devices for displaying 3D image data differ in performance. Consequently, when a method with overly complex processing is applied, it is difficult in practice for such devices to process the data in real time.

An object of the present invention is to solve the problems of the above prior art with respect to adaptability to heterogeneous, collaborative environments.

Another object of the present invention is to enable data processing at a level that matches the conditions of various networks, and to provide a stereo video service at a level the receiving terminal can support.

A further object of the present invention is to provide a stereo video encoding method and apparatus that take diverse reception environments into account, using spatio-temporal scalability to provide efficient display on heterogeneous receiving terminals.

To achieve the above objects, the present invention provides a stereo video encoding method that takes diverse reception environments into account and efficiently reproduces stereo images, the method having:

one base layer (BL) that provides the base resolution of the input reference image (left image); and

one or more enhancement layers (EL) that provide the additional information needed to support a base-resolution right image and high-resolution left and right images.

The present invention also provides a stereo video encoding apparatus that takes diverse reception environments into account, comprising 3D image encoding means, spatial scalability encoding means, and temporal scalability encoding means.

Hereinafter, the stereo video encoding method and apparatus of the present invention, which take diverse reception environments into account, are described in detail with reference to the accompanying drawings.

FIG. 1 is a configuration diagram of a stereo video service system that uses the encoding apparatus proposed by the present invention.

The encoder on the server side, which supplies video to the terminals, encodes the stereo video by dividing it into one base layer (BL) and a plurality of enhancement layers (EL1 to EL5).

Depending on the network bandwidth and processing environment of each receiving terminal (Clients A to D), the server selectively transmits either the base layer (BL) bitstream alone or the base layer (BL) bitstream together with enhancement layer (EL) bitstreams.

Each receiving terminal (Clients A to D) combines the bitstreams of the layers it receives and displays video at the spatio-temporal resolution it is able to support.

For example, the first receiving terminal (Client A) needs only low-resolution monoscopic video data and therefore receives only the video bitstream corresponding to the base layer. Conversely, the fourth receiving terminal (Client D) needs high-resolution 3D data and therefore receives the bitstreams of all layers.

Each receiving terminal (Clients A to D) can thus receive the bitstreams suited to its own display and network environment, and a single 3D video source can be displayed in mono or in various stereo forms depending on the terminal's environment.

To improve stereo video coding efficiency, the present invention provides an Enhanced Compatible Stereoscopic Coding (ECSC) method, and to provide flexible stereo image services it provides a stereo video encoder that uses spatio-temporal scalability. To this end, the reference image (left image) is encoded using the MPEG-2 standard, and the P-frames and B-frames of the right image are encoded using the compatible scheme defined in the MPEG-2 Multiview Profile combined with an interpolation technique that uses disparity vectors (DV) and motion vectors (MV).

FIG. 2 is a diagram of the spatio-temporal scalability structure of the present invention. FIG. 2A shows the case in which three B-frames lie between the I-frame and the P-frame (that is, M = 3). For efficient 3D video transmission between systems with heterogeneous display environments, a plurality of enhancement layers (EL) are defined in addition to one base layer (BL). The base layer (BL) is the bitstream corresponding to the base resolution of the input reference image (left image), and the first enhancement layer (EL1) is an additional bitstream that provides the reference image at high temporal resolution. The second enhancement layer (EL2) is an additional bitstream that provides high spatial resolution for the reference image. The third enhancement layer (EL3) is the bitstream that represents the right image at the base resolution. The fourth enhancement layer (EL4) and the fifth enhancement layer (EL5) are bitstreams that represent the right image at high quality. In the base layer (BL), temporal redundancy is removed by encoding based on motion-compensated prediction (MCP; Motion Compensation Prediction) using the current frame and neighboring frames. The third enhancement layer (EL3) is generated by performing motion-compensated prediction (MCP) from temporally adjacent images in its own layer and, to raise compression efficiency, disparity-compensated prediction (DCP; Disparity Compensation Prediction) from the temporally coincident decoded left image of the base layer (BL).

The first enhancement layer (EL1) and the fourth enhancement layer (EL4) are bitstreams that add full temporal resolution to the base layer (BL) and third enhancement layer (EL3) bitstreams, respectively. Likewise, the second enhancement layer (EL2) and the fifth enhancement layer (EL5) are bitstreams that add full spatial resolution to the BL + EL1 and EL3 + EL4 bitstreams, respectively.
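The layer structure described above can be sketched as a small selection routine on the server side. This is only an illustrative sketch: the function name `select_layers` and its three boolean capability flags are hypothetical, and the rule that full spatial resolution also pulls in the temporal enhancement layers follows the statement that EL2 and EL5 are added on top of BL + EL1 and EL3 + EL4.

```python
def select_layers(stereo, full_temporal, full_spatial):
    """Return the layer bitstreams to send to one client (hypothetical helper).

    stereo        -- client can display the right view as well as the left
    full_temporal -- client wants the full frame rate
    full_spatial  -- client wants the full spatial resolution
    """
    # EL2/EL5 are described as additions to BL+EL1 and EL3+EL4,
    # so full spatial resolution implies the temporal layers too.
    need_temporal = full_temporal or full_spatial

    layers = ["BL"]                      # left view, base resolution
    if need_temporal:
        layers.append("EL1")             # left view, full temporal resolution
    if full_spatial:
        layers.append("EL2")             # left view, full spatial resolution
    if stereo:
        layers.append("EL3")             # right view, base resolution
        if need_temporal:
            layers.append("EL4")         # right view, full temporal resolution
        if full_spatial:
            layers.append("EL5")         # right view, full spatial resolution
    return layers


# Client A (low-resolution mono) and Client D (high-resolution 3D)
client_a = select_layers(stereo=False, full_temporal=False, full_spatial=False)
client_d = select_layers(stereo=True, full_temporal=True, full_spatial=True)
```

Under these assumptions Client A receives only the base layer, while Client D receives all six bitstreams.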

FIG. 2B shows a variant of the embodiment of FIG. 2A in which five B-frames lie between the I-frame and the P-frame (that is, M = 5).

To this end, the stereo video encoding apparatus according to the present invention comprises spatial scalability encoding means, 3D image encoding means, and temporal scalability encoding means for encoding the above layers in a scalable manner.

FIG. 3 shows the detailed configuration of the scalable stereoscopic video encoder according to the present invention.

As shown in FIG. 3, the input stereo images are downsampled so that both the spatial base resolution and the full resolution can be represented. The downsampled left image at the spatial base resolution is encoded with a conventional non-scalable MPEG-2 encoder (40). The temporal scalability encoding means (30) then applies B-frame partitioning to represent multiple temporal resolutions, generating the base layer (BL) and first enhancement layer (EL1) bitstreams.
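The text does not state which B-frames go to which layer. As a sketch of one possible reading, the following assumes that the base layer keeps the I- and P-frames plus every second frame in display order, and that the remaining B-frames form EL1. With an odd number of B-frames between anchors this yields an evenly spaced half-rate base layer. The function name `partition_b_frames` and the alternating rule are assumptions, not the patent's stated procedure.

```python
def partition_b_frames(frame_types):
    """Split display-order frames into base-layer and EL1 index lists.

    frame_types -- string such as "IBBBP" (one character per frame).
    Assumed rule: I/P frames and frames at even positions stay in the
    base layer; B-frames at odd positions go to the enhancement layer.
    """
    base, enhancement = [], []
    for index, frame_type in enumerate(frame_types):
        if frame_type in ("I", "P") or index % 2 == 0:
            base.append(index)
        else:
            enhancement.append(index)
    return base, enhancement


# Three B-frames between I and P (the case of FIG. 2A)
base_3, el1_3 = partition_b_frames("IBBBP")
# Five B-frames between I and P (the case of FIG. 2B)
base_5, el1_5 = partition_b_frames("IBBBBBP")
```

A decoder that receives only the base layer would then play every second frame, and adding EL1 restores the full frame rate.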

The 3D image encoding means (20) compresses the right image by encoding it with reference to the temporally coincident left image at the spatial base resolution. Whichever of the motion vector (MV) and the disparity vector (DV) gives the higher transmission efficiency is selected, and the corresponding motion difference information (DFD; Displaced Frame Difference) or disparity difference information (DCD; Displacement Compensated Difference) is then encoded and transmitted.
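The choice between motion and disparity prediction can be illustrated with a simple cost comparison. The text does not define how "transmission efficiency" is measured, so this sketch assumes the sum of absolute differences (SAD) of the residual as a stand-in cost. The names `sad`, `residual`, and `choose_prediction` are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def residual(current, predicted):
    """Element-wise difference current - predicted."""
    return [[c - p for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(current, predicted)]


def choose_prediction(current, motion_pred, disparity_pred):
    """Pick the cheaper predictor and return (mode, residual to encode).

    "MV" -> residual plays the role of the DFD,
    "DV" -> residual plays the role of the DCD.
    SAD as the efficiency measure is an assumption of this sketch.
    """
    if sad(current, motion_pred) <= sad(current, disparity_pred):
        return "MV", residual(current, motion_pred)
    return "DV", residual(current, disparity_pred)


current = [[10, 12], [14, 16]]
motion_pred = [[10, 12], [14, 15]]      # block from the same view, other time
disparity_pred = [[8, 12], [14, 16]]    # block from the left view, same time
mode, diff = choose_prediction(current, motion_pred, disparity_pred)
```

A real encoder would more likely compare estimated bit costs after transform and entropy coding; SAD is used here only to keep the example short.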

The right image at the spatial base resolution is encoded as follows.

First, an I-frame of the right image is encoded using blocks predicted with disparity vectors obtained from the temporally coincident left image.

A P-frame of the right image is encoded using a motion vector (MV), obtained by forward prediction within the same sequence, and a disparity vector (DV), obtained with reference to the temporally coincident left image.

Next, as shown in FIG. 4, a B-frame of the right image is encoded using macroblocks (MB) obtained by forward and backward motion prediction within the same sequence and a macroblock (MB) obtained by disparity prediction.

The additional bitstreams needed to represent full spatial resolution (EL2, EL5) are generated by upsampling the decoded base-resolution image to the original resolution and encoding its difference from the temporally coincident original image. The enhancement layers are used to provide high image quality when the receiver's display supports it.
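The spatial enhancement step can be sketched as follows. The sketch assumes a 2:1 downsampling factor, 2x2 averaging for downsampling, nearest-neighbor upsampling, and a lossless base layer. None of these choices is specified in the text, and the function names are hypothetical.

```python
def downsample2(image):
    """Halve width and height by averaging each 2x2 block (integer average)."""
    return [[(image[y][x] + image[y][x + 1] +
              image[y + 1][x] + image[y + 1][x + 1]) // 4
             for x in range(0, len(image[0]), 2)]
            for y in range(0, len(image), 2)]


def upsample2(image):
    """Double width and height by repeating each pixel (nearest neighbor)."""
    rows = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(2)]
        rows.append(wide)
        rows.append(list(wide))
    return rows


def spatial_residual(original, decoded_base):
    """Difference between the original and the upsampled decoded base image."""
    upsampled = upsample2(decoded_base)
    return [[o - u for o, u in zip(row_o, row_u)]
            for row_o, row_u in zip(original, upsampled)]


original = [[1, 3, 5, 7],
            [1, 3, 5, 7]]
base = downsample2(original)                  # stands in for the decoded base image
enhancement = spatial_residual(original, base)

# Decoder side: upsampled base plus the residual restores the original.
restored = [[u + e for u, e in zip(row_u, row_e)]
            for row_u, row_e in zip(upsample2(base), enhancement)]
```

In the actual scheme the residual would itself be compressed, so the restored image would only approximate the original.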

The spatial scalability encoding means (10) generates the additional bitstream needed to support high resolution on top of the reference image at the base resolution.

FIG. 4 illustrates the prediction method of ECSC (Enhanced Compatible Stereoscopic Coding) in the present invention. A predicted right-image B-frame block is the weighted sum of the blocks found by searching for similar blocks in the I-frame and P-frame of the right image and in the B-frame of the left image. For a P-frame, the weights for the forward vector and the disparity vector are 0.5 each; for a B-frame, the weights for the forward and backward vectors are 0.25 each and the weight for the disparity vector is 0.5. The weights used in predicting any one block sum to 1.
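The weighting rule stated above can be written out as a short sketch. The weights are those given in the text; the function name `ecsc_predict`, the dictionary layout, and the keys "f", "b", and "d" (forward, backward, disparity) are illustrative choices, and the candidate blocks are assumed to have been found already by motion and disparity search.

```python
# Weights from the text: P-frame 0.5/0.5, B-frame 0.25/0.25/0.5.
WEIGHTS = {
    "P": {"f": 0.5, "d": 0.5},
    "B": {"f": 0.25, "b": 0.25, "d": 0.5},
}


def ecsc_predict(frame_type, blocks):
    """Weighted sum of candidate blocks for one right-image block.

    frame_type -- "P" or "B"
    blocks     -- dict mapping "f"/"b"/"d" to equally sized 2D blocks
    """
    weights = WEIGHTS[frame_type]
    assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights must sum to 1
    height = len(blocks["d"])
    width = len(blocks["d"][0])
    return [[sum(w * blocks[key][y][x] for key, w in weights.items())
             for x in range(width)]
            for y in range(height)]


b_block = ecsc_predict("B", {"f": [[100, 100]],
                             "b": [[120, 120]],
                             "d": [[80, 80]]})
p_block = ecsc_predict("P", {"f": [[100]], "d": [[80]]})
```

For the B-frame example the prediction is 0.25 * 100 + 0.25 * 120 + 0.5 * 80 = 95 per pixel, and for the P-frame example it is 0.5 * 100 + 0.5 * 80 = 90.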

(Equation 1)

In the above equation, w denotes a weight, v denotes a vector, and f, b, and d denote forward, backward, and disparity, respectively.
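The equation itself is not reproduced in this record. A form consistent with the symbol definitions and the stated weights might look like the following. The block notation is an assumption: $\hat{B}_R$ is the predicted right-image block at position $\mathbf{x}$, $R_f$ and $R_b$ are the forward and backward reference frames of the right image, and $L$ is the temporally coincident left-image frame.

```latex
\hat{B}_R(\mathbf{x}) =
    w_f \, R_f(\mathbf{x} + \mathbf{v}_f)
  + w_b \, R_b(\mathbf{x} + \mathbf{v}_b)
  + w_d \, L(\mathbf{x} + \mathbf{v}_d),
\qquad w_f + w_b + w_d = 1
```

With the weights given in the description, a P-frame corresponds to $w_f = w_d = 0.5$ and $w_b = 0$, and a B-frame to $w_f = w_b = 0.25$ and $w_d = 0.5$.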

Although the present invention has been shown and described with reference to specific preferred embodiments, it is not limited to those embodiments, and various changes and modifications may be made by those of ordinary skill in the art to which the invention pertains without departing from the spirit of the invention.

As described above, the present invention makes it possible for a single piece of encoding equipment to provide stereo services suited to the reception environment of many heterogeneous receiving terminals at the same time. In particular, stereo video encoded with an encoder according to the present invention does not need to be re-encoded for each receiving terminal's environment, and when the environments of the different receiving terminals are known in advance, a service optimized for each terminal's display system and network environment can be provided. For example, when a stereo immersive-media transmission service is multicast to many users, the stereo video stream encoded in multiple layers can be sent to each requesting user as the combination of bitstreams suited to that user's computing environment, so that the transmission channel is used effectively without waste.

In addition, because the resolution of the video received by a terminal is already optimized for that terminal's display system, mono or stereo video can be displayed without separate processing for resolution adjustment. This saves terminal computing resources, improves responsiveness, and prevents the image-quality degradation that resolution adjustment would cause.

Claims

A method of encoding a video signal with a scalable stereoscopic video encoder, the method being a stereo video encoding method that takes diverse reception environments into account and having:

one base layer (BL) that provides the base resolution of an input reference image (left image); and

at least one enhancement layer (EL) that provides additional information for supporting a base-resolution right image and high-resolution left and right images.

The method of claim 1,

wherein the enhancement layer (EL) has a spatio-temporal scalable structure consisting of a layer that provides high spatial resolution and a layer that provides high temporal resolution.

The method according to claim 1 or 2,

wherein the base-resolution right image is encoded with reference to the left image at the spatial base resolution that temporally coincides with the current image, and

whichever of the motion vector (MV) and the disparity vector (DV) has the higher transmission efficiency is selected, and the corresponding motion difference information (DFD; Displaced Frame Difference) or disparity difference information (DCD; Displacement Compensated Difference) is encoded and transmitted.

The method according to claim 1 or 2,

wherein encoding the right image at the spatial base resolution comprises:

(a) encoding an I-frame of the right image using blocks predicted with disparity vectors obtained from the temporally coincident left image;

(b) encoding a P-frame of the right image using a motion vector (MV) obtained by forward prediction within the same sequence and a disparity vector (DV) obtained with reference to the temporally coincident left image; and

(c) encoding a B-frame of the right image using macroblocks (MB) obtained by forward and backward motion prediction within the same sequence and a macroblock (MB) obtained by disparity prediction.

The method of claim 4,

wherein a spatio-temporal scalable structure is applied in which an odd number of B-frames lie between the I-frame and the P-frame.

The method of claim 4,

wherein the predicted right-image B-frame block is the weighted sum of the blocks found by searching for similar blocks in the I-frame and P-frame of the right image and in the B-frame of the left image.

The method of claim 4,

wherein, for a P-frame, the weights for the forward vector and the disparity vector are 0.5 each; for a B-frame, the weights for the forward and backward vectors are 0.25 each and the weight for the disparity vector is 0.5; and the weights used in predicting one block sum to 1, the prediction being obtained by the following equation.

(Mathematical formula)

(w denotes a weight, v denotes a vector, and f, b, and d denote forward, backward, and disparity, respectively.)

A scalable stereoscopic video encoding apparatus that takes diverse reception environments into account, comprising:

3D image encoding means;

spatial scalability encoding means; and

temporal scalability encoding means.