KR20110009576A - Method and apparatus for combining plural moving pictures in an h.264/avc compressed domain

Info

Publication number: KR20110009576A
Application number: KR1020090067041A
Authority: KR
Inventors: 전도영
Original assignee: 주식회사 엠씨넥스
Priority date: 2009-07-22
Filing date: 2009-07-22
Publication date: 2011-01-28
Also published as: KR101053161B1

Abstract

PURPOSE: A method and an apparatus for combining plural moving pictures in the H.264/AVC compressed domain are provided to prevent the calculation errors that can occur when a combined image is decoded and re-encoded. CONSTITUTION: An image combination apparatus (100) comprises a stream parser (110), an additional data generation unit (120), and a stream combination unit (130). The image combination apparatus receives plural video bitstreams and provides the bitstream of a combined image. The input of the image combination apparatus is the bitstreams obtained by compressing and encoding N moving pictures, where N is an integer of 2 or more.

Description

Method and apparatus for combining plural moving pictures in an H.264/AVC compressed domain

The present invention relates to synthesizing a plurality of moving pictures and, more specifically, to a method and apparatus for synthesizing a plurality of video bitstreams encoded according to ISO/IEC (International Organization for Standardization / International Electrotechnical Commission) 14496-10 Advanced Video Coding (hereinafter referred to as 'H.264/AVC').

Various multimedia services based on digital signal processing technologies, such as digital broadcasting, multimedia computer applications, and multimedia communication, are being developed and provided. The video signal is a very important medium in multimedia services. Because multimedia data that includes video signals is very large in volume, compressing the data is an essential first step in applying it to practical applications.

International standardization bodies for video signal compression include the ITU-T (International Telecommunication Union Telecommunication Standardization Sector), formed for video applications such as video conferencing, and ISO/IEC, which addresses various applications such as the storage or broadcasting of video data. Recently, ITU-T and ISO/IEC jointly formed the JVT (Joint Video Team) and established the H.264/AVC video coding standard, which offers better compression performance than the existing MPEG (Moving Picture Experts Group)-2, H.263, and MPEG-4 Visual video compression coding standards. H.264/AVC is currently regarded as a next-generation video compression technology. In particular, combined with next-generation multimedia services such as digital TV and satellite and terrestrial DMB (Digital Multimedia Broadcasting), it is widely used in video multimedia services such as multi-channel high-definition video compression, video delivery over the Internet, cable modems, and mobile communication networks, and digital data broadcasting.

One such video multimedia service is a video synthesis service, which provides a plurality of moving pictures by synthesizing or mixing them. The video synthesis service receives a plurality of bitstreams encoded with the same codec (for example, H.264/AVC) and provides the bitstream of a composite image arranged in a predetermined format. The output bitstream of the composite image is also data encoded with the same codec as the input bitstreams. The composite image may include all or some of the input images, and the position and size of each input image within the composite image, that is, the format of the composite image, may be determined arbitrarily. The input images are synthesized based on predetermined format information for the composite image.

A conventional video synthesis scheme for outputting the bitstream of a composite image from a plurality of input bitstreams is the 'pixel-domain processing method', which decodes the encoded video signals and then synthesizes the reconstructed image data. For example, to provide the bitstream of a composite image from N video bitstreams (N is an integer of 2 or more) encoded according to H.264/AVC, the pixel-domain processing method first decodes the N input bitstreams to reconstruct each image, and then generates a composite image having a predetermined format from all or some of the N reconstructed images. The generated composite image is then encoded again according to H.264/AVC, and the resulting bitstream of the encoded composite image is output.

Thus, the pixel-domain processing method decodes as many bitstreams as there are images to synthesize, combines them into a predetermined format, and re-encodes the composite image. Because it goes through decoding and encoding, image synthesis takes a long time; because both decoding and encoding must be performed, the hardware becomes complex; and errors that occur during the decoding and encoding computations can degrade image quality. In addition, generating the composite image requires storing the decoded pictures, which calls for additional buffers and therefore increases buffer usage.

Therefore, there is a need for a video synthesis technique that overcomes these disadvantages of the pixel-domain processing method.

One problem to be solved by the present invention is to provide a video synthesis method and apparatus that can shorten processing time and simplify the hardware configuration when, given bitstreams of a plurality of input images encoded according to H.264/AVC, an H.264/AVC bitstream is to be generated for a composite image in which those input images are arranged in a predetermined display form.

Another problem to be solved by the present invention is to provide a video synthesis method and apparatus that requires no additional buffer and can prevent the computational errors that may occur during encoding and decoding when, given bitstreams of a plurality of input images encoded according to H.264/AVC, an H.264/AVC bitstream is to be generated for a composite image in which those input images are arranged in a predetermined display form.

To solve the above problems, an image synthesis method according to an embodiment of the present invention is a method for generating a composite image in which all or some of a plurality of input images are rearranged into a predetermined display form. The method includes: receiving bitstreams in the H.264/AVC format for the plurality of input images and parsing the sequence parameter set (SPS), picture parameter set (PPS), and slice header of each received bitstream; changing or newly setting some of the parameters and variables of the SPS, PPS, and slice header so that they correspond to the display form of the composite image; and generating and outputting, as a bitstream in the H.264/AVC format for the composite image, a bitstream that includes the changed or newly set parameters and variables. The parsing and the parameter changes are performed in the compressed domain, without decoding the bitstreams of the input images.

According to one aspect of the embodiment, the image synthesis method may treat each of the plurality of input images as a slice group constituting the composite image and perform image synthesis on that basis. In this case, the method may change or newly set the values of 'level_idc', 'seq_parameter_set_id', 'pic_width_in_mbs_minus1', and 'pic_height_in_map_units_minus1' among the SPS parameters included in each of the input bitstreams; change or newly set the values of 'pic_parameter_set_id', 'seq_parameter_set_id', 'num_slice_groups_minus1', 'slice_group_map_type', 'top_left', and 'bottom_right' among the PPS parameters included in each of the input bitstreams; and/or change or newly set the values of 'first_mb_in_slice', 'pic_parameter_set_id', and 'frame_num' among the variables in the slice headers included in each of the input bitstreams.
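As a rough illustration of how the layout-dependent values listed above might be derived, the following Python sketch computes them for input pictures placed on a macroblock grid. The names `Tile` and `composite_params` are hypothetical and do not come from the patent. The sketch assumes frame-only coding (so the map-unit height equals the macroblock height), raster-scan macroblock addresses, and a rectangular slice group map. It lists a rectangle for every input for readability, whereas in the standard's rectangular map type the last slice group is the left-over region and carries no explicit rectangle.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    """Placement of one input picture in the composite, in macroblock units (hypothetical helper)."""
    x_mb: int
    y_mb: int
    w_mb: int
    h_mb: int


def composite_params(tiles, width_mbs, height_mbs):
    """Derive layout-dependent SPS/PPS/slice-header values for a composite picture."""
    def addr(x, y):
        # Raster-scan macroblock address inside the composite picture.
        return y * width_mbs + x

    top_left = [addr(t.x_mb, t.y_mb) for t in tiles]
    bottom_right = [addr(t.x_mb + t.w_mb - 1, t.y_mb + t.h_mb - 1) for t in tiles]
    return {
        # SPS: composite size (assumes frame_mbs_only_flag == 1).
        "pic_width_in_mbs_minus1": width_mbs - 1,
        "pic_height_in_map_units_minus1": height_mbs - 1,
        # PPS: one slice group per input picture.
        "num_slice_groups_minus1": len(tiles) - 1,
        "top_left": top_left,
        "bottom_right": bottom_right,
        # Slice header: each input's slice now starts at its tile's first macroblock.
        "first_mb_in_slice": list(top_left),
    }


# Four QCIF inputs (11 x 9 macroblocks each) arranged as a 2 x 2 grid.
qcif_grid = [Tile(0, 0, 11, 9), Tile(11, 0, 11, 9), Tile(0, 9, 11, 9), Tile(11, 9, 11, 9)]
params = composite_params(qcif_grid, width_mbs=22, height_mbs=18)
```

Values such as 'level_idc' and the parameter set identifiers depend on the composite resolution and on how the identifiers are allocated, so they are left out of this sketch.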

The bitstream of the composite image generated by image synthesis may include one SPS, one PPS, and as many slice headers and sets of image data as there are input images included in the composite image.

To solve the above technical problems, an image synthesizing apparatus according to an embodiment of the present invention is an apparatus for generating a composite image in which all or some of a plurality of input images are arranged in a predetermined display form. The apparatus includes: a stream parser for parsing the sequence parameter set (SPS), picture parameter set (PPS), and slice header of each of the received bitstreams in the H.264/AVC format for the plurality of input images; a parameter and slice header generation unit for changing or newly setting some of the parameters and variables of the SPS, PPS, and slice header so that they correspond to the display form of the composite image; and a stream synthesis unit for generating and outputting, as a bitstream in the H.264/AVC format for the composite image, a bitstream that includes the changed or newly set parameters and variables.

According to one aspect of the embodiment, the stream parser may include an SPS parser for parsing the SPS syntax of each of the input bitstreams, a PPS parser for parsing the PPS syntax of each of the input bitstreams, and a slice header parser for parsing the slice header syntax of each of the input bitstreams.

To solve the above problems, an image synthesis method according to another embodiment of the present invention is a method for generating a composite image in which all or some of a plurality of input images are arranged in a predetermined display form. The method is characterized by receiving bitstreams in the H.264/AVC format for the plurality of input images, changing or resetting some of the parameters and variables included in each of the received bitstreams so that they correspond to the composite image, and generating and outputting, as a bitstream in the H.264/AVC format for the composite image, a bitstream that includes the changed and reset parameters and variables. In this case, the sequence parameter sets (SPS) included in the received bitstreams may be changed and merged into a single SPS for the composite image, the picture parameter sets (PPS) included in the received bitstreams may be changed and merged into a single PPS for the composite image, and the slice headers included in the received bitstreams may be changed into a plurality of slice headers corresponding to the composite image.

According to the present invention, synthesis is performed in the H.264/AVC compressed domain. That is, without decoding the bitstreams of the plurality of moving pictures encoded according to H.264/AVC, only some parameters and variables are changed or merged in the compressed domain to set the parameters and variables corresponding to the composite image, and a bitstream in the H.264/AVC format that includes them is then generated and output for the composite image. As a result, processing time can be shortened and the hardware configuration can be simplified. Moreover, because the input bitstreams are not decoded, no additional buffer is needed to store decoded data, and the computational errors that may occur while decoding the inputs and encoding the composite image can be prevented.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. The terms used were selected in consideration of their functions in the embodiments, and their meanings may vary according to the intention or custom of a user or operator. Therefore, a term used in the embodiments below should be interpreted according to its definition where it is specifically defined in this specification, and otherwise according to the meaning generally recognized by those skilled in the art.

FIG. 1 is a block diagram illustrating an example of a video multimedia service in which an image synthesis technique according to an embodiment of the present invention may be used. The video multimedia service shown in FIG. 1 may be a video conferencing service, but this is merely an example.

Referring to FIG. 1, an image synthesizing apparatus according to an embodiment of the present invention for providing a video conferencing service receives, from four terminals, bitstreams in the H.264/AVC format for the respective input images (12a, 12b, 12c, 12d). The image synthesizing apparatus may be a device provided in a multipoint control unit (MCU). The image synthesizing apparatus outputs to the terminals bitstreams in the H.264/AVC format for composite images (14a, 14b, 14c, 14d) in which the four input images are rearranged at arbitrary positions, that is, arranged in a predetermined display form. The image synthesizing apparatus may output to each terminal the bitstream of a composite image (14a, 14b, 14c, 14d) with a different display form, but this is merely an example. The image synthesizing apparatus may instead output the bitstream of a composite image with the same display form to all terminals, or output the bitstream of a composite image to only some of the terminals.

The image synthesizing apparatus can also set the position and size of each input image in the composite image, as well as the arrangement of the input images, in any form. FIG. 2 is a block diagram showing other examples of the display form of the composite image output by the image synthesizing apparatus of FIG. 1. As FIG. 2 shows, the display form of the composite image can be configured from the input images in various formats (position, size, and so on). Although not shown in FIG. 2, the image synthesizing apparatus may also output the bitstream of a composite image that includes only some of the input images.

The display form of the composite image may be fixed to one predetermined form or may change over time. The display form is determined before image synthesis is performed and is already input to the image synthesizing apparatus; this embodiment places no particular restriction on how the display form is determined. For example, the display form may be determined inside the image synthesizing apparatus according to the importance of the input images, or information about the display form may be input from outside the apparatus (for example, from the chairperson presiding over the video conference, or from the user of a terminal that decodes and plays back the bitstream of the composite image). Based on this previously input display form information, the image synthesizing apparatus generates and outputs, from the bitstreams of the plurality of input images, the bitstream of a composite image that matches the display form.

FIG. 3 is a block diagram showing the configuration of an image synthesizing apparatus (100) according to an embodiment of the present invention. Referring to FIG. 3, the image synthesizing apparatus (100) includes a stream parser (Stream Parse, 110), an additional data generation unit (Additional Data Composite, 120), and a stream synthesis unit (Stream Composite, 130). The configuration shown in FIG. 3 is a logical division made only for convenience of description; physically, two blocks may be implemented as one integrated block, or one block may be implemented as a plurality of separate sub-blocks. The image synthesizing apparatus (100) may be provided in an MCU for a video conference, but is not limited to that use. It may also be provided in other types of signal processing apparatus that receive a plurality of video bitstreams and provide the bitstream of a composite image.

The input to the image synthesizing apparatus (100) is the bitstreams obtained by compression-encoding N moving pictures (N is an integer of 2 or more). In this embodiment, the output bitstream as well as the input bitstreams are data encoded according to H.264/AVC and generated for transmission. The H.264/AVC bitstream syntax formats the bitstream using Network Abstraction Layer (NAL) unit syntax so that it can be sent in a form that is easy to transmit over a network. Because the image synthesizing apparatus (100) performs image synthesis in the H.264/AVC compressed domain, the NAL unit syntax is outlined first.

The NAL is used to format the data in a bitstream and provides header information suited to transport over various channels or storage media. All data is contained in NAL units, each of which contains an integer number of bytes. The NAL defines one generic format for use in both packet-oriented and bitstream-oriented systems. The NAL unit format is identical for packet-oriented transport and for bitstreams, except that in the bitstream format each NAL unit is preceded by a prefix and possibly extra padding bytes.

The basic unit of the NAL is called a NAL unit (NALU). A NAL unit basically consists of two parts: a NAL header and an RBSP (Raw Byte Sequence Payload) generated by the Video Coding Layer (VCL). The NAL header contains flag information (nal_ref_idc) indicating whether the NAL unit includes a slice that serves as a reference picture, and a NAL type identifier (nal_unit_type) indicating the type of the NAL unit. Table 1 shows the types of NAL units. Referring to Table 1, nal_unit_type values 1 to 5 each indicate a slice, 7 indicates a sequence parameter set (SPS), and 8 indicates a picture parameter set (PPS).
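The header fields described above can be read from the first byte of a NAL unit. The following Python sketch is a minimal illustration, assuming the one-byte header layout of the H.264/AVC standard, in which nal_ref_idc is a 2-bit field (nonzero when the unit is used for reference) and nal_unit_type is a 5-bit field. The function name `parse_nal_header` and the `classify` helper are hypothetical.

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split the one-byte H.264 NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,  # must be 0 in a valid stream
        "nal_ref_idc": (first_byte >> 5) & 0x03,         # nonzero: used for reference
        "nal_unit_type": first_byte & 0x1F,              # 1..5 slice, 7 SPS, 8 PPS
    }


def classify(nal_unit_type: int) -> str:
    """Map nal_unit_type to the three categories the description relies on."""
    if 1 <= nal_unit_type <= 5:
        return "slice"
    if nal_unit_type == 7:
        return "SPS"
    if nal_unit_type == 8:
        return "PPS"
    return "other"


header = parse_nal_header(0x67)  # 0x67 is a common first byte of an SPS NAL unit
kind = classify(header["nal_unit_type"])
```

Units classified as "other" would simply be passed through unchanged in a scheme like the one described here.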

A bitstream encoded and transmitted according to H.264/AVC is built from these NAL units. If the underlying system, such as the AVC file format or the Real-time Transport Protocol (RTP), can itself delimit units of the bit string, no start code is added, and the bitstream is composed with only the zero bytes inserted to separate a NAL unit from its adjacent NAL units. In contrast, when an underlying system that handles the encoded data as a single bit string is used, such as an MPEG-2 system, a start code is additionally prepended to form the bitstream. A bit string with start codes attached, as in the latter case, is called the byte stream format, and FIG. 4 is a block diagram showing the structure of this byte stream format. The start code is a 3-byte code with the pattern 00 00 01 in hexadecimal (00000000 00000000 00000001 in binary).
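To make the byte stream format concrete, the following Python sketch splits a buffer into NAL units at each 00 00 01 start code. It is a simplified illustration rather than a complete parser: the name `split_byte_stream` is hypothetical, and the sketch does not remove the emulation prevention bytes that the standard inserts inside NAL units. Trailing zero bytes are stripped because they belong either to padding or to the leading zero byte of a following four-byte start code, and a NAL unit itself never ends in 0x00.

```python
START_CODE = b"\x00\x00\x01"


def split_byte_stream(data: bytes) -> list:
    """Return the NAL units found between 00 00 01 start codes."""
    units = []
    pos = data.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = data.find(START_CODE, begin)
        end = len(data) if nxt == -1 else nxt
        # Drop zero padding that precedes the next start code.
        units.append(data[begin:end].rstrip(b"\x00"))
        pos = nxt
    return units


# A four-byte-prefixed SPS fragment followed by a three-byte-prefixed PPS fragment.
sample = b"\x00\x00\x00\x01\x67\x42\x00\x1e" + b"\x00\x00\x01\x68\xce"
nal_units = split_byte_stream(sample)
```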

In this embodiment, a bitstream based on NAL units as described above (with or without start codes) is input to the stream parser (110). When N bitstreams are input, the stream parser (110), and more specifically the header parser (Header Parse, 112), first parses the NAL header of each NAL unit in each of the N bitstreams. Once the NAL header is parsed, the value of nal_unit_type identifies the data type of the bitstream (RBSP) that follows. For example, if nal_unit_type is 1 to 5, the data that follows is slice data; if it is 7, a sequence parameter set (SPS); and if it is 8, a picture parameter set (PPS). Slice data includes slice header data.

Once the header parser (112) has identified the data type (SPS, PPS, slice, and so on), the current bitstream is passed to the corresponding parser: the SPS parser (114a), the PPS parser (114b), or the slice header parser (114c). Each parser (114a, 114b, 114c) performs its parsing in the compressed domain. In this embodiment, if the data type of the current bitstream does not correspond to any of the illustrated parsers (114a, 114b, 114c), no further parsing is performed, and the bitstream may be passed unchanged to the stream synthesis unit (130).

The SPS parser (114a) parses the bitstream according to the SPS RBSP syntax. The SPS RBSP syntax according to H.264/AVC is shown in FIGS. 5A to 5C. FIGS. 5A to 5C together form a single syntax and are shown separately only for convenience of illustration. This specification omits a detailed description of each variable in the SPS RBSP syntax shown in FIGS. 5A to 5C. The specific meaning of each variable can be found in the H.264/AVC standard document, Text of ISO/IEC 14496-10 Advanced Video Coding 3rd Edition, which is fully incorporated herein by reference.

In this embodiment, the SPS parser (114a) may analyze at least the following variables: 'profile_idc', which indicates the profile of the sequence; 'level_idc', which indicates the level of the sequence; 'seq_parameter_set_id', which identifies the SPS; and 'pic_width_in_mbs_minus1' and 'pic_height_in_map_units_minus1', which indicate the size of the input image.
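The first of these SPS variables can be read with a small bit reader. The following Python sketch is a minimal illustration, assuming the field order of the H.264/AVC SPS syntax: 8-bit profile_idc, 8 bits of constraint flags and reserved bits, 8-bit level_idc, then seq_parameter_set_id coded as an unsigned Exp-Golomb value, ue(v). The class `BitReader` and function `parse_sps_prefix` are hypothetical names, and the input is assumed to be an RBSP from which the NAL header byte and any emulation prevention bytes have already been removed. Reaching 'pic_width_in_mbs_minus1' and 'pic_height_in_map_units_minus1' requires walking the intervening, partly profile-dependent fields in the same way, which is omitted here.

```python
class BitReader:
    """Reads bits MSB-first from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits

    def u(self, n: int) -> int:
        """Read an n-bit unsigned integer, u(n)."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

    def ue(self) -> int:
        """Read an unsigned Exp-Golomb coded value, ue(v)."""
        leading_zeros = 0
        while self.u(1) == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.u(leading_zeros)


def parse_sps_prefix(rbsp: bytes) -> dict:
    """Parse the fixed-position fields at the start of an SPS RBSP."""
    reader = BitReader(rbsp)
    profile_idc = reader.u(8)
    reader.u(8)  # constraint_set flags and reserved bits, not needed here
    level_idc = reader.u(8)
    seq_parameter_set_id = reader.ue()
    return {
        "profile_idc": profile_idc,
        "level_idc": level_idc,
        "seq_parameter_set_id": seq_parameter_set_id,
    }


# Baseline profile (66), level 3.0 (30), seq_parameter_set_id 3 (bits 00100).
sps_fields = parse_sps_prefix(bytes([0x42, 0x00, 0x1E, 0x20]))
```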

The PPS parser (114b) parses the bitstream according to the PPS RBSP syntax. The PPS RBSP syntax according to H.264/AVC is shown in FIGS. 6A to 6C. FIGS. 6A to 6C also form a single syntax and are shown separately only for convenience of illustration. A detailed description of each variable in the PPS RBSP syntax is likewise omitted here and left to the H.264/AVC standard document incorporated by reference.

According to the present embodiment, the PPS parser 114b may parse at least the following variables: 'pic_parameter_set_id', an identification value for identifying the PPS; 'seq_parameter_set_id', which identifies the sequence that contains the picture; 'num_slice_groups_minus1', which indicates the number of slice groups in the picture; 'slice_group_maptype', which indicates how the macroblocks of each slice group are arranged; and 'top_left' and 'bottom_right', which indicate the size and position of each slice group.

The slice header parser 114c parses the bitstream according to the slice header syntax. The slice header syntax according to H.264/AVC is shown in FIGS. 7A to 7D. FIGS. 7A to 7D also form a single syntax and are shown separately only for convenience of illustration. A detailed description of each variable in the slice header syntax is likewise omitted here; the H.264/AVC standard document, incorporated by reference, is relied on instead.

According to the present embodiment, the slice header parser 114c may analyze at least the following variables: 'first_mb_in_slice', which indicates the position of the first macroblock in the slice; 'slice_type', which indicates the type of the slice; 'pic_parameter_set_id', which indicates the picture that contains the slice; and 'frame_num', which indicates the frame number.

After the stream parser 110 has parsed the SPS, PPS, and slice header, the parameter and slice header generator 120 (hereinafter, the 'composite image generator') generates the SPS, PPS, and slice header variables needed to compose the composite image. More specifically, the composite image generator 120 merges the SPSs and PPSs contained in the individual input images to generate a single SPS syntax and a single PPS syntax for the composite image as a whole. The composite image generator 120 reuses most of the SPS, PPS, and slice header variables of the input images as they are, but newly sets or changes some parameters and slice header variables for the composite image based on, for example, its display form. The parameters and variables that the composite image generator 120 newly sets or changes for the composite image are described in detail below. Parameters and variables not specifically mentioned here may be taken unchanged from the bitstream of each input image.

First, the composite image generator 120, more specifically the SPS generator 120a, generates one SPS for the composite image using the SPSs contained in the input images. That is, a single SPS is generated from the SPSs of the individual input images, because the image synthesizing apparatus 100 generates and outputs a bitstream for one sequence (the composite image) in which a plurality of input images are mixed. In doing so, the SPS generator 120a reuses most of the parameters contained in the SPS of each input image as they are. For example, when the bitstream of each input image and the bitstream of the composite image both use the baseline profile, the SPS generator 120a may leave 'profile_idc' unchanged and use the input images' 'profile_idc' value, which indicates the baseline profile, as it is.

On the other hand, among the parameters in the SPS, the SPS generator 120a newly sets or changes the values of 'level_idc', 'seq_parameter_set_id', 'pic_width_in_mbs_minus1', and 'pic_height_in_map_units_minus1' to suit the composite image. For example, 'level_idc' may be changed according to the size (resolution) of the composite image, regardless of the 'level_idc' values of the input images: to a value indicating Level 1 when the composite image has QCIF (Quarter Common Intermediate Format) resolution, Level 2 when it has CIF (Common Intermediate Format) resolution, Level 3 when it has standard-definition television (SDTV) resolution, and Level 4 when it has high-definition television (HDTV) resolution. 'seq_parameter_set_id' is newly set to a value indicating the sequence of the composite image. 'pic_width_in_mbs_minus1' and 'pic_height_in_map_units_minus1' are likewise newly set to values representing the size of the composite image.
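The following is a minimal sketch of how these SPS values might be derived from the composite resolution. It relies on two facts of H.264/AVC: macroblocks are 16x16 luma samples, and 'level_idc' is ten times the level number. Everything else is an assumption made for illustration: the function name `composite_sps_fields`, the use of macroblock-count thresholds (99 for QCIF, 396 for CIF, 1620 for 720x576 SDTV) to stand in for the resolution classes named above, progressive frames (so that map units equal macroblocks), and dimensions that are multiples of 16. A real level decision also depends on frame rate and bit rate, which this sketch ignores.

```python
MB_SIZE = 16  # luma samples per macroblock side in H.264/AVC


def composite_sps_fields(width_px: int, height_px: int) -> dict:
    """Derive the SPS values that are reset for the composite image.

    Assumes progressive frames and dimensions that are multiples of 16.
    """
    if width_px % MB_SIZE or height_px % MB_SIZE:
        raise ValueError("this sketch does not handle cropping")
    width_mbs = width_px // MB_SIZE
    height_mbs = height_px // MB_SIZE
    frame_mbs = width_mbs * height_mbs

    # Assumed thresholds standing in for QCIF / CIF / SDTV / HDTV.
    if frame_mbs <= 99:        # up to QCIF (176x144)
        level_idc = 10         # Level 1
    elif frame_mbs <= 396:     # up to CIF (352x288)
        level_idc = 20         # Level 2
    elif frame_mbs <= 1620:    # up to SDTV (720x576)
        level_idc = 30         # Level 3
    else:                      # larger, e.g. HDTV
        level_idc = 40         # Level 4

    return {
        "level_idc": level_idc,
        "pic_width_in_mbs_minus1": width_mbs - 1,
        "pic_height_in_map_units_minus1": height_mbs - 1,
    }


# Four QCIF inputs tiled 2x2 give a CIF-sized composite.
cif_fields = composite_sps_fields(352, 288)
```

For the 2x2 QCIF example, the composite is 22 by 18 macroblocks, so the two size fields become 21 and 17 and the level rises from Level 1 to Level 2.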

The composite image generator 120, more specifically the PPS generator 120b, generates one PPS for the composite image using the PPSs contained in the input images. That is, a single PPS is generated from the PPSs of the individual input images, because the image synthesizing apparatus 100 generates and outputs a bitstream for one picture (a frame of the composite image) in which a plurality of input images are mixed. In doing so, the PPS generator 120b reuses most of the parameters contained in the PPS of each input image as they are. On the other hand, among the parameters in the PPS, the PPS generator 120b newly sets or changes the values of 'pic_parameter_set_id', 'seq_parameter_set_id', 'num_slice_groups_minus1', 'slice_group_maptype', 'top_left', and 'bottom_right' to suit the display form of the composite image and of the input images (slice groups) that constitute it.

For example, 'pic_parameter_set_id' may be newly set to a value indicating the corresponding picture of the composite image. 'seq_parameter_set_id' may be set to the 'seq_parameter_set_id' value that the SPS generator 120a newly set for the sequence containing the picture. 'num_slice_groups_minus1' may be set to a value indicating the number of slice groups that make up the picture; for example, the number of slice groups may equal the number of input images. Because the embodiment of the present invention treats each input image as a slice group of the composite image when combining, which input images make up the composite image, and the size and position of each, can be chosen freely. 'slice_group_maptype' may be set to 'foreground/leftover' to accommodate a variety of inputs, or to explicit assignment for each macroblock or macroblock pair. 'top_left' and 'bottom_right' may be set to predetermined values for each slice group, that is, according to the size of each input image and its position in the composite image.
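To make the role of 'top_left' and 'bottom_right' concrete, the sketch below computes them for rectangular tiles. It assumes, following the H.264/AVC foreground/leftover map type, that both values are macroblock addresses in raster-scan order over the composite picture, and that frames are progressive so that map units are macroblocks. The function name `slice_group_rectangles` and the tile tuple layout are hypothetical.

```python
from typing import List, Tuple

# A tile is (x_mb, y_mb, width_mbs, height_mbs) in composite-picture
# macroblock coordinates. This layout is an assumption for illustration.
Tile = Tuple[int, int, int, int]


def slice_group_rectangles(
    pic_width_in_mbs: int, tiles: List[Tile]
) -> List[Tuple[int, int]]:
    """Return (top_left, bottom_right) raster-scan addresses per tile."""
    rectangles = []
    for x, y, w, h in tiles:
        top_left = y * pic_width_in_mbs + x
        bottom_right = (y + h - 1) * pic_width_in_mbs + (x + w - 1)
        rectangles.append((top_left, bottom_right))
    return rectangles


# Four QCIF inputs (11x9 macroblocks each) tiled 2x2 in a CIF composite
# that is 22 macroblocks wide.
qcif_tiles = [(0, 0, 11, 9), (11, 0, 11, 9), (0, 9, 11, 9), (11, 9, 11, 9)]
rects = slice_group_rectangles(22, qcif_tiles)
```

Note that with the foreground/leftover map type the last slice group is the leftover region and carries no rectangle of its own, so an encoder of the PPS would write rectangles only for the preceding groups.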

FIG. 8 is a diagram for explaining the bitstream mixing scheme in the composite image generator 120, more specifically in the slice header changer 120c. Referring to FIG. 8, in (a) each of the four input images has its own slices and macroblocks within its bitstream. If the four video images are regarded as a single video image, the slice and macroblock structure is as shown in (b). That is, because the slice numbering of the four images carries the variables for macroblock order, the images can be combined by including the information of N bitstreams in one bitstream. Accordingly, based on the bitstreams of the input images, the slice header changer 120c converts the address of the first macroblock (first_mb_in_slice), which specifies the slice position, and converts or newly sets variables such as the slice type (slice_type), the PPS identifier of the picture to which the slice belongs (pic_parameter_set_id), and the frame number (frame_number) to match the display form of the composite image.

For example, 'first_mb_in_slice' may be calculated by a predetermined method. FIG. 9 shows another example of a composite image generated using the images of FIG. 8(a) as input; in a composite image such as that of FIG. 9, 'first_mb_in_slice' may be calculated as in Equation 1 below.

first_mb_in_slice = first_mb_in_slice(i) + w(i) * [T_num_mb - num_mb(i)] + offset_mb(i)    (Equation 1)

Here, w(i) = first_mb_in_slice(i) / num_mb(i), and offset_mb(i) denotes the number of macroblocks preceding the i-th picture.

Referring to Equation 1, when there are bitstreams of four input images, subtracting num_mb(i), the number of macroblocks in the horizontal direction of the input bitstream, from T_num_mb, the size of the entire image, and multiplying the result by w(i) yields the vertical position of the slice on the screen. Adding offset_mb(i), the horizontal offset distance, to this vertical position converts the slice position to the designated position. In addition, when a single slice contains an entire input image, the position of the next image can be specified by multiplying first_mb_in_slice by the number of macroblocks in the vertical direction.
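Equation 1 can be checked with a small calculation. The sketch below rests on interpretations that the text does not state outright: that the division in w(i) is integer division, so that w(i) is the macroblock row at which the slice starts within input image i; that num_mb(i) and T_num_mb are the widths, in macroblocks, of the input image and of the composite image; and that offset_mb(i) is the raster-scan address in the composite image of the top-left macroblock of input image i. The function name `remap_first_mb` is hypothetical.

```python
def remap_first_mb(
    first_mb_in_slice_i: int,
    num_mb_i: int,
    t_num_mb: int,
    offset_mb_i: int,
) -> int:
    """Apply Equation 1 to one slice of input image i.

    first_mb_in_slice_i: slice start address within input image i
    num_mb_i:            assumed width of input image i in macroblocks
    t_num_mb:            assumed width of the composite in macroblocks
    offset_mb_i:         assumed composite address of image i's top-left
    """
    w_i = first_mb_in_slice_i // num_mb_i  # assumed integer division
    return first_mb_in_slice_i + w_i * (t_num_mb - num_mb_i) + offset_mb_i


# QCIF input (11 macroblocks wide) placed in the top-right quadrant of a
# CIF composite (22 wide): its top-left macroblock sits at address 11.
# A slice starting at row 1, column 0 of the input (address 11) maps to
# row 1, column 11 of the composite.
top_right = remap_first_mb(11, 11, 22, 11)

# Same input placed in the bottom-left quadrant (top-left at row 9,
# address 9 * 22 = 198). A slice starting at input row 2 (address 22)
# maps to composite row 11, column 0.
bottom_left = remap_first_mb(22, 11, 22, 198)
```

Under these assumptions the result equals row * T_num_mb + column + offset_mb(i), which is the raster-scan address of the same macroblock after the input image has been moved into the composite picture.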

The slice header changer 120c keeps 'slice_type' as it is in the input, changes 'pic_parameter_set_id' to the 'pic_parameter_set_id' value that the PPS generator 120b newly set for the picture of the composite image, and sets 'frame_number' to a value indicating the number of the frame containing the slice, using a new serial numbering for the composite image.
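The slice header changes described above can be summarized in a short sketch. The `SliceHeader` record and the `rewrite_slice_header` function are hypothetical and hold only the four variables discussed in the text; a real slice header has many more fields, which would be carried over unchanged. The remapped macroblock address is taken as an input here rather than computed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SliceHeader:
    """Only the slice header variables discussed in the text."""
    first_mb_in_slice: int
    slice_type: int
    pic_parameter_set_id: int
    frame_number: int


def rewrite_slice_header(
    header: SliceHeader,
    new_first_mb: int,
    composite_pps_id: int,
    composite_frame_number: int,
) -> SliceHeader:
    """Return a header for the composite stream; slice_type is kept."""
    return replace(
        header,
        first_mb_in_slice=new_first_mb,
        pic_parameter_set_id=composite_pps_id,
        frame_number=composite_frame_number,
    )


original = SliceHeader(first_mb_in_slice=11, slice_type=2,
                       pic_parameter_set_id=3, frame_number=7)
rewritten = rewrite_slice_header(original, new_first_mb=33,
                                 composite_pps_id=0,
                                 composite_frame_number=0)
```

Using an immutable record and `dataclasses.replace` keeps the input header intact, which makes it easy to compare the two headers when debugging a rewrite.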

Referring again to FIG. 3, the stream synthesizer 130 generates the bitstream of the composite image. Because this bitstream is also data encoded according to H.264/AVC, its basic unit is the NAL unit (NALU). The stream synthesizer 130 includes in the bitstream of the composite image the parameters and variables newly generated or set by the parameter and slice header generator 120, together with the other parameters and variables contained in the bitstreams of the input images.

FIG. 10 is a block diagram schematically showing the structure of the bitstream of the composite image, illustrating the components included in that bitstream and the order in which they are arranged.

Referring to FIG. 10, the NALU containing the SPS is placed first in the bitstream of the composite image, because the SPS carries information about the stream as a whole, such as video playback, picture size, and buffer management. The NALU containing the PPS is placed second, after the SPS. The PPS carries information about the picture to be reproduced and, with reference to the SPS, reconstructs information on the number of reference pictures of the current picture and on the video format. As described above, there are N input bitstreams and therefore N input SPSs and N input PPSs, but because the combined bitstream is output as a single stream, it contains one reconstructed SPS followed by one reconstructed PPS.

Finally, NALUs containing slice headers and the like are placed in the bitstream of the composite image. With reference to the PPS, the slice header contains the reconstructed data of the actual screen composition, such as the position and type of the slice, which is a component of the screen, and the current frame number. In this case, N sets of slice header information and data are included, one for each of the N input bitstreams.
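The ordering described above (one SPS, one PPS, then the slices) can be sketched as a byte-stream assembly. The sketch uses the H.264/AVC one-byte NAL header layout (forbidden_zero_bit, nal_ref_idc, nal_unit_type) and the standard type codes 7 (SPS), 8 (PPS), and 5 (IDR slice). The function names, the choice of nal_ref_idc = 3, the four-byte start code on every unit, and the treatment of all slices as IDR slices are assumptions made for illustration. The payloads are assumed to be finished RBSP data with emulation-prevention bytes already inserted.

```python
from typing import List

START_CODE = b"\x00\x00\x00\x01"  # byte-stream start code prefix

NAL_SLICE_IDR = 5
NAL_SPS = 7
NAL_PPS = 8


def nal_unit(nal_ref_idc: int, nal_unit_type: int, payload: bytes) -> bytes:
    """Prefix a payload with a start code and a one-byte NAL header."""
    header = ((nal_ref_idc & 0x3) << 5) | (nal_unit_type & 0x1F)
    return START_CODE + bytes([header]) + payload


def build_composite_stream(
    sps: bytes, pps: bytes, slices: List[bytes]
) -> bytes:
    """Emit SPS first, PPS second, then one NALU per input slice."""
    out = nal_unit(3, NAL_SPS, sps) + nal_unit(3, NAL_PPS, pps)
    for slice_payload in slices:
        out += nal_unit(3, NAL_SLICE_IDR, slice_payload)
    return out


# Placeholder payloads stand in for real SPS, PPS, and slice data.
stream = build_composite_stream(b"S", b"P", [b"a", b"b"])
```

With these settings the SPS, PPS, and IDR slice header bytes are 0x67, 0x68, and 0x65 respectively, which are the values commonly seen at the start of H.264 byte streams.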

FIG. 11 is a flowchart showing an image synthesis method according to an embodiment of the present invention. Because the specific process of generating one composite image from the bitstreams of N input images has been described in detail above with reference to FIG. 3 and other figures, the image synthesis method is described only briefly below.

Referring to FIG. 11, the image synthesizing apparatus receives a bitstream for each of N input images (201). The N bitstreams may all be NAL-unit data generated by encoding with the same video codec, for example H.264/AVC. The apparatus then parses each of the N received bitstreams (202): the header information contained in each bitstream is removed, and a parser parses the SPS, PPS, and slice header. Next, among the parameters and variables in the SPS, PPS, and slice header, the apparatus changes or newly sets the values of some so that they correspond to, for example, the display form of the composite image (203). In the composite image, each input image may be regarded as one slice group when these values are changed. Finally, the apparatus generates and outputs the bitstream of the composite image using the changed or newly set parameters and variables together with the parameters and variables originally contained in the input images (204). The bitstream of the composite image contains one unified SPS and one unified PPS, and also contains the slice headers and image data of all or some of the N input images that are included in the composite image.

The above description is merely an embodiment of the present invention, and the technical idea of the present invention should not be construed as being limited by this embodiment. The technical idea of the present invention should be defined only by the invention recited in the claims. It will therefore be apparent to those skilled in the art that the embodiment described above may be modified and implemented in various forms without departing from the technical idea of the present invention.

FIG. 1 is a block diagram illustrating an example of a video multimedia service in which an image synthesis technique according to an embodiment of the present invention may be used.

FIG. 2 is a block diagram showing another example of the display form of a composite image output by the image synthesizing apparatus of FIG. 1.

FIG. 3 is a block diagram showing the configuration of an image synthesizing apparatus 100 according to an embodiment of the present invention.

FIG. 4 is a block diagram showing the structure of the byte stream format.

FIGS. 5A to 5C are diagrams showing the SPS RBSP syntax according to H.264/AVC.

FIGS. 6A to 6C are diagrams showing the PPS RBSP syntax according to H.264/AVC.

FIGS. 7A to 7D are diagrams showing the slice header syntax according to H.264/AVC.

FIG. 8 is a diagram for explaining the bitstream mixing scheme in the slice header changer of the composite image generator of FIG. 3.

FIG. 9 is a diagram showing another example of a composite image generated using the images of FIG. 8(a) as input.

FIG. 10 is a block diagram schematically showing the structure of the bitstream of a composite image.

FIG. 11 is a flowchart showing an image synthesis method according to an embodiment of the present invention.

Claims

A method of generating a composite image in which all or part of a plurality of input images are arranged in a predetermined display form, the method comprising:

(a) receiving bitstreams in H.264/AVC format for the plurality of input images and parsing the sequence parameter set (SPS), picture parameter set (PPS), and slice header of each received bitstream;

(b) changing or newly setting some of the parameters and variables of the SPS, PPS, and slice header to correspond to the display form of the composite image; and

(c) generating and outputting a bitstream including the changed or newly set parameters and variables as a bitstream in H.264/AVC format for the composite image.

The method of claim 1,

wherein each of the plurality of input images is regarded as a slice group constituting the composite image.

The method of claim 2,

wherein step (b) includes changing or newly setting the values of 'level_idc', 'seq_parameter_set_id', 'pic_width_in_mbs_minus1', and 'pic_height_in_map_units_minus1' among the parameters of the SPS.

The method of claim 3,

wherein step (b) includes changing or newly setting the values of 'pic_parameter_set_id', 'seq_parameter_set_id', 'num_slice_groups_minus1', 'slice_group_maptype', 'top_left', and 'bottom_right' among the parameters of the PPS.

The method of claim 4,

wherein step (b) includes changing or newly setting the values of 'first_mb_in_slice', 'pic_parameter_set_id', and 'frame_number' among the variables included in the slice header.

The method of claim 1,

wherein the bitstream in H.264/AVC format for the composite image generated in step (c) includes one SPS, one PPS, and slice headers and image data corresponding in number to the input images included in the composite image.

An apparatus for generating a composite image in which all or part of a plurality of input images are reconstructed into a predetermined display form, the apparatus comprising:

a stream parser for receiving bitstreams in H.264/AVC format for the plurality of input images and parsing the sequence parameter set (SPS), picture parameter set (PPS), and slice header of each received bitstream;

a parameter and slice header generator for changing or newly setting some of the parameters and variables of the SPS, PPS, and slice header to correspond to the display form of the composite image; and

a stream synthesizer for generating and outputting a bitstream including the changed or newly set parameters and variables as a bitstream in H.264/AVC format for the composite image.

The apparatus of claim 7, wherein the stream parser comprises:

an SPS parser for parsing the SPS syntax of each of the bitstreams in H.264/AVC format for the input images, a PPS parser for parsing the PPS syntax of each of the bitstreams in H.264/AVC format for the input images, and a slice header parser for parsing the slice header syntax of each of the bitstreams in H.264/AVC format for the input images.

An image synthesizing method for generating a composite image in which all or part of a plurality of input images are reconstructed into a predetermined display form, the method comprising:

receiving bitstreams in H.264/AVC format for the plurality of input images; changing or newly setting some of the parameters and variables included in each of the received bitstreams to correspond to the display form of the composite image; and generating and outputting a bitstream including the changed or newly set parameters and variables as a bitstream in H.264/AVC format for the composite image.

The method of claim 9, wherein the image synthesizing method comprises:

changing the sequence parameter sets (SPSs) included in the bitstreams of the plurality of input images and integrating them into a single SPS for the composite image;

changing the picture parameter sets (PPSs) included in the bitstreams of the plurality of input images and integrating them into a single PPS for the composite image; and

changing the slice headers included in the bitstreams of the plurality of input images into a plurality of slice headers corresponding to the display form of the composite image.