KR20120117382A

KR20120117382A - Fast scalable video coding method and device using multi-track video

Info

Publication number: KR20120117382A
Application number: KR1020110035110A
Authority: KR
Inventors: 배태면
Original assignee: 에스케이플래닛 주식회사
Priority date: 2011-04-15
Filing date: 2011-04-15
Publication date: 2012-10-24
Also published as: KR101853744B1

Abstract

PURPOSE: A method and an apparatus for encoding multi-track video into scalable video are provided to speed up the conversion by reducing the time needed to encode multi-track video into scalable video. CONSTITUTION: A video aligning unit (240) aligns two or more multi-track videos in an order suitable for SVC (Scalable Video Coding) encoding. A bitstream analyzing unit (250) analyzes each bitstream, from the first video of the lowest layer to the video of the highest layer, and extracts encoding information. An SVC re-encoding unit (260) executes re-encoding based on the encoding information. [Reference numerals] (210) Communication unit; (220) Original video storing unit; (230) Multi-track video encoder; (240) Video aligning unit; (250) Bitstream analyzing unit; (260) SVC re-encoding unit

Description

Fast scalable video coding method and device using multi-track video

The present invention relates to a method and apparatus for encoding multi-track video into scalable video. More specifically, two or more multi-track videos are aligned for SVC (Scalable Video Coding) encoding according to resolution, frame rate, and bit rate; the bitstream of each aligned video, from the first video of the lowest layer to the video of the highest layer, is analyzed to extract encoding information; and SVC re-encoding is performed based on the extracted encoding information to generate an SVC video.

To serve content including video and audio received from a content provider (CP) to a user terminal, the content goes through an ingesting process that includes basic resolution scaling, frame rate conversion, video/audio encoding, metadata insertion, and packaging.

If a video in which an error occurred during ingesting on the content server is served to a user terminal, playback problems arise on the terminal. To prevent this, the final output is played back and checked directly by a person as a last step.

However, because the content server provides a very large volume of video to user terminals, having a person check every ingested video takes a long time and has practical limits. Many techniques have therefore been proposed recently to automate this human verification step and perform it faster.

Meanwhile, online video services stream video over the Internet so that users can consume it. Adaptive video streaming, which adjusts the amount of video data transmitted to match the user's network environment, is becoming common as a way to keep users from seeing stalled or corrupted video. Current adaptive video streaming mostly works by preparing compressed videos of various data sizes for one original video and selecting the compressed video that fits the user's network environment. With this approach, the service system must ingest several compressed videos (multi-track video) for each original video.
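For illustration only, the track selection performed in multi-track adaptive streaming can be sketched as follows. The function name, the use of kbps, and the fallback to the smallest track are assumptions made for this sketch; the specification does not prescribe a selection rule.

```python
def pick_track(bitrates_kbps, bandwidth_kbps):
    """Pick the largest track bitrate that fits the measured bandwidth.

    Falls back to the smallest track when none fits (an assumed policy).
    """
    fitting = [b for b in bitrates_kbps if b <= bandwidth_kbps]
    return max(fitting) if fitting else min(bitrates_kbps)


# Three hypothetical tracks of one original video.
print(pick_track([300, 800, 2000], 1000))  # 800
```

Each bitrate in the list corresponds to a separately ingested compressed file, which is the ingesting overhead described above.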

More recently, Scalable Video Coding (SVC) was standardized on the basis of H.264 by the Joint Video Team (JVT) of ITU and MPEG, with the aim of providing video services to a variety of devices and network environments from a single compressed video.

However, SVC was standardized only recently and is at an early stage of commercialization. For now, as a step before SVC, services use the multi-track video approach: the video is encoded into several files with an existing coding method such as H.264, and the file suited to each device and network environment is provided. When such a service later moves to SVC, the existing multi-track video must be re-encoded as SVC, and this re-encoding takes a long time.

To solve the above problem, an object of the present invention is to provide a method and apparatus for encoding multi-track video into scalable video, in which two or more multi-track videos are aligned for SVC encoding according to resolution, frame rate, and bit rate; the bitstream of each aligned video, from the first video of the lowest layer to the video of the highest layer, is analyzed to extract encoding information; and SVC re-encoding is performed based on the extracted encoding information to generate an SVC video.

According to one aspect of the present invention for achieving the above object, there is provided a video encoding apparatus that aligns two or more multi-track videos for SVC encoding according to resolution, frame rate, and bit rate; analyzes the bitstreams of the first video of the lowest layer, the second video of the next layer, and so on up to the video of the highest layer to extract encoding information; and performs SVC re-encoding based on the extracted encoding information.

According to another aspect of the present invention for achieving the above object, there is provided a video encoding apparatus comprising: a video aligning unit that aligns two or more multi-track videos for SVC encoding; a bitstream analyzing unit that analyzes each bitstream, from the first video of the lowest layer to the video of the highest layer among the aligned multi-track videos, and extracts encoding information; and an SVC re-encoding unit that performs SVC re-encoding based on the encoding information.

The video aligning unit may place a video with a higher resolution in a higher layer; among videos with the same resolution, place the video with the higher frame rate in a higher layer; and, when resolution and frame rate are both the same, place the video with the higher bit rate in a higher layer.

The bitstream analyzing unit may receive the first video of the lowest layer and the second video of the next layer among the aligned multi-track videos, analyze each bitstream to extract encoding information including the encoding mode and prediction information, and then analyze each bitstream from the third video of the next layer up to the Nth video of the highest layer to extract encoding information.

The SVC re-encoding unit may re-encode the bitstream corresponding to the upper layer, macroblock by macroblock, based on the two pieces of encoding information it receives.

The SVC re-encoding unit may up-scale the decoded texture information of the lower-layer macroblock that corresponds to the upper-layer macroblock to obtain an inter-layer intra prediction image, and use that image to calculate an encoding cost. If the encoding cost of inter-layer intra coding is greater than the encoding cost of the current encoding mode plus a specific constant, encoding is performed in the current mode; otherwise, re-encoding is performed in inter-layer intra prediction mode.

According to yet another aspect of the present invention for achieving the above object, there is provided a method by which a video encoding apparatus encodes multi-track video into scalable video, the method comprising: (a) aligning two or more multi-track videos for SVC encoding; (b) analyzing the bitstreams from the first video of the lowest layer to the video of the highest layer among the aligned multi-track videos to extract encoding information; and (c) performing SVC re-encoding based on the encoding information.

In step (a), a video with a higher resolution may be placed in a higher layer; among videos with the same resolution, the video with the higher frame rate may be placed in a higher layer; and, when resolution and frame rate are both the same, the video with the higher bit rate may be placed in a higher layer.

In step (b), the first video of the lowest layer and the second video of the next layer among the aligned multi-track videos may be received and each bitstream analyzed to extract encoding information including the encoding mode and prediction information, after which each bitstream from the third video of the next layer up to the Nth video of the highest layer may be analyzed to extract encoding information.

In step (c), the bitstream corresponding to the upper layer may be re-encoded, macroblock by macroblock, based on the two pieces of encoding information received.

In step (c), the decoded texture information of the lower-layer macroblock that corresponds to the upper-layer macroblock may be up-scaled to obtain an inter-layer intra prediction image, which is used to calculate an encoding cost. If the encoding cost of inter-layer intra coding is greater than the encoding cost of the current encoding mode plus a specific constant, encoding is performed in the current mode; otherwise, re-encoding is performed in inter-layer intra prediction mode.

According to yet another aspect of the present invention for achieving the above object, there is provided a method by which a video encoding apparatus encodes multi-track video into scalable video, the method comprising: (a) aligning two or more multi-track videos for SVC encoding; (b) analyzing the bitstreams from the first video of the lowest layer to the video of the highest layer among the aligned multi-track videos to extract encoding information; (c) performing inter-layer intra prediction based on the encoding information to calculate the encoding cost of inter-layer intra mode; (d) calculating the encoding cost of the current encoding mode; and (e) comparing the two encoding costs and, depending on the result, performing SVC re-encoding in either the current encoding mode or inter-layer intra mode.

Here, the encoding cost combines the prediction error cost with the motion information cost R, weighted by λ. R is the cost representing the amount of data needed to transmit the motion vector in inter mode, and λ is the Lagrange multiplier, a constant that expresses the relative weight of the prediction error cost and the motion information cost. For the encoding cost of intra mode, R is zero.

According to the present invention, the time needed to encode multi-track video into scalable video is shortened, so the conversion can be performed at high speed.

In addition, multi-track video can be converted to scalable video quickly by reusing the existing encoding modes as much as possible, and compression efficiency can be improved by using inter-layer intra prediction mode, a feature specific to scalable video.

FIG. 1 is a configuration diagram schematically showing the overall configuration of a video providing system according to an embodiment of the present invention.
FIG. 2 is a block diagram schematically showing the internal functional blocks of a video encoding apparatus according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a method by which the video encoding apparatus encodes multi-track video into scalable video according to an embodiment of the present invention.
FIG. 4 is a diagram showing an example of generating N multi-track videos from an original video according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating the SVC re-encoding process for each macroblock according to an embodiment of the present invention.

Details of the object, technical configuration, and resulting effects of the present invention will be understood more clearly from the following detailed description, which is based on the drawings attached to this specification. Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a configuration diagram schematically showing the overall configuration of a video providing system according to an embodiment of the present invention.

Referring to FIG. 1, a video providing system 100 according to the present invention includes a video encoding apparatus 110, a communication network 120, a user terminal 130, and the like.

The video encoding apparatus 110 aligns two or more multi-track videos for SVC encoding according to resolution, frame rate, and bit rate; analyzes the bitstreams of the first video of the lowest layer, the second video of the next layer, and so on up to the video of the highest layer to extract encoding information; and performs SVC re-encoding based on the extracted encoding information.

The video encoding apparatus 110 then transmits the re-encoded SVC video to the user terminal 130 through the communication network 120.

Here, the video encoding apparatus 110 may be a media server or the like that ingests original video and transmits it to one or more user terminals 130, or a dedicated media device or the like that receives original video, ingests it into two or more multi-track videos, and transmits them through the communication network 120.

The communication network 120 provides a transmission path for sending video from the video encoding apparatus 110 to the user terminal 130, and an access path through which the user terminal 130 connects to the video encoding apparatus 110. The communication network 120 includes mobile communication networks such as WCDMA, HSDPA, 3G, and 4G; short-range networks such as Bluetooth, Zigbee, and Wi-Fi; and wired networks such as the Internet and the PSTN.

The user terminal 130 receives the ingested SVC video from the video encoding apparatus 110, decodes it, and displays it.

Here, the user terminal 130 may be an IPTV, a set-top box, or the like that can receive and display video data from the video encoding apparatus 110, or a smartphone, mobile communication terminal, or the like on which the user can play and watch video data while on the move.

FIG. 2 is a block diagram schematically showing the internal functional blocks of a video encoding apparatus according to an embodiment of the present invention.

Referring to FIG. 2, the video encoding apparatus 110 according to the present invention includes a communication unit 210, an original video storing unit 220, a multi-track video encoder 230, a video aligning unit 240, a bitstream analyzing unit 250, an SVC re-encoding unit 260, and the like.

The communication unit 210 communicates with the user terminal 130 through the communication network 120.

That is, the communication unit 210 receives from the user terminal 130, through the communication network 120, a request to transmit content including video and audio, or transmits ingested content to the user terminal 130.

The original video storing unit 220 stores a plurality of original videos of various kinds, each including audio and video.

The multi-track video encoder 230 receives an original video and, as shown in FIG. 4, generates N (two or more) compressed videos of differing quality. FIG. 4 is a diagram showing an example of generating N compressed videos from an original video according to an embodiment of the present invention.

The video aligning unit 240 aligns two or more multi-track videos for SVC encoding.

Specifically, the video aligning unit 240 places a video with a higher resolution in a higher layer; among videos with the same resolution, places the video with the higher frame rate in a higher layer; and, when resolution and frame rate are both the same, places the video with the higher bit rate in a higher layer.
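As an illustration only, the alignment rule of the video aligning unit 240 can be sketched as a three-key sort. The Track type, its field names, and the use of pixel count to compare resolutions are assumptions made for this sketch and are not part of the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    # Hypothetical description of one multi-track video.
    name: str
    width: int
    height: int
    frame_rate: float
    bitrate_kbps: int


def align_for_svc(tracks):
    """Order tracks from the lowest layer to the highest layer.

    Keys, in priority order: resolution (compared here by pixel count,
    an assumption), then frame rate, then bit rate.
    """
    return sorted(
        tracks,
        key=lambda t: (t.width * t.height, t.frame_rate, t.bitrate_kbps),
    )


tracks = [
    Track("c", 1280, 720, 30.0, 2000),
    Track("a", 640, 360, 15.0, 300),
    Track("d", 1280, 720, 30.0, 3000),
    Track("b", 640, 360, 30.0, 500),
]
print([t.name for t in align_for_svc(tracks)])  # ['a', 'b', 'c', 'd']
```

The first element of the result plays the role of the first video of the lowest layer, and the last element that of the Nth video of the highest layer.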

The bitstream analyzing unit 250 analyzes each bitstream, from the first video of the lowest layer to the video of the highest layer among the aligned multi-track videos, and extracts encoding information.

Specifically, the bitstream analyzing unit 250 receives the first video of the lowest layer and the second video of the next layer among the multi-track videos aligned by the video aligning unit 240, analyzes each bitstream to extract encoding information including the encoding mode and prediction information, and then analyzes each bitstream from the third video of the next layer up to the Nth video of the highest layer to extract encoding information.

The SVC re-encoding unit 260 performs SVC re-encoding based on the extracted encoding information.

Specifically, the SVC re-encoding unit 260 re-encodes the bitstream corresponding to the upper layer, macroblock by macroblock, based on the two pieces of encoding information it receives. That is, the SVC re-encoding unit 260 up-scales the decoded texture information of the lower-layer macroblock that corresponds to the upper-layer macroblock to obtain an inter-layer intra prediction image, and uses that image to calculate an encoding cost. If the encoding cost of inter-layer intra coding is greater than the encoding cost of the current encoding mode plus a specific constant, encoding is performed in the current mode; otherwise, re-encoding is performed in inter-layer intra prediction mode.

Here, the encoding mode refers collectively to the intra and inter modes, which are the methods used to compress a macroblock. Prediction information is information related to motion prediction, such as motion vectors and block partition information. Inter-layer intra prediction means performing intra prediction in SVC using information between layers.

The encoding cost represents the amount of bits required for encoding. It is expressed, using the Lagrange multiplier λ (lambda), in terms of the motion prediction error commonly used in H.264 (SAD: Sum of Absolute Differences, SSD: Sum of Squared Differences, or MSE: Mean Squared Error) and the amount of bits (R) required to encode the motion vector. The SVC re-encoding unit 260 uses the motion vector that minimizes the encoding cost.
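A minimal sketch of selecting the minimum-cost motion vector is given below, assuming SAD as the error measure and a cost of the form SAD + λ·R. The function names, the candidate tuple layout, and the sample values are hypothetical and chosen only for illustration.

```python
def sad(block, prediction):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block, prediction)
        for a, b in zip(row_a, row_b)
    )


def encoding_cost(block, prediction, mv_bits, lam):
    """Prediction error plus the lambda-weighted motion-vector bit amount R."""
    return sad(block, prediction) + lam * mv_bits


def best_candidate(block, candidates, lam):
    """Return the (motion_vector, prediction, mv_bits) tuple with minimum cost."""
    return min(candidates, key=lambda c: encoding_cost(block, c[1], c[2], lam))


block = [[10, 12], [14, 16]]
candidates = [
    ((0, 0), [[10, 12], [14, 15]], 2),  # SAD 1, cost 1 + 4*2 = 9
    ((1, 0), [[10, 12], [14, 16]], 6),  # SAD 0, cost 0 + 4*6 = 24
]
print(best_candidate(block, candidates, lam=4)[0])  # (0, 0)
```

In this example the candidate with the smaller prediction error loses because its motion vector costs more bits, which is the trade-off that λ controls.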

FIG. 3 is a flowchart illustrating a method by which the video encoding apparatus encodes multi-track video into scalable video according to an embodiment of the present invention.

Referring to FIG. 3, the video encoding apparatus 110 according to the present invention feeds an original video to the multi-track video encoder 230 and, as shown in FIG. 4, generates N (two or more) multi-track videos of differing quality (S310). FIG. 4 is a diagram showing an example of generating N multi-track videos from an original video according to an embodiment of the present invention.

Next, the video encoding apparatus 110 aligns the two or more multi-track videos for SVC encoding through the video aligning unit 240 (S320).

Here, the video encoding apparatus 110 places a video with a higher resolution in a higher layer; among videos with the same resolution, places the video with the higher frame rate in a higher layer; and, when resolution and frame rate are both the same, places the video with the higher bit rate in a higher layer.

Next, the video encoding apparatus 110 feeds the first video of the lowest layer and the second video of the next layer among the aligned multi-track videos to the bitstream analyzing unit 250, which analyzes each bitstream and extracts encoding information including the encoding mode and prediction information (S330).

Next, the video encoding apparatus 110 performs SVC re-encoding of the bitstream corresponding to the upper layer, macroblock by macroblock, based on the two pieces of extracted encoding information (S340).

Next, if the next video is the Nth video of the highest layer (S350: Yes), the video encoding apparatus 110 ends the encoding operation. Otherwise, it takes the videos from the third video of the next layer up to the Nth video of the highest layer two at a time, analyzes each bitstream, and extracts encoding information (S360).

The video encoding apparatus 110 then performs SVC re-encoding of the corresponding bitstream through the SVC re-encoding unit 260, macroblock by macroblock and based on the two pieces of encoding information, up to the Nth video of the highest layer (S340).
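The loop over steps S330 to S360 can be sketched as below. This is one possible reading, in which each video is paired with the video of the layer directly beneath it; the pairing, the function names, and the callback signatures are assumptions and are not taken from the flowchart.

```python
def reencode_layers(aligned, analyze, reencode_upper):
    """Walk the aligned videos from the lowest layer to the highest layer.

    aligned        -- videos ordered from lowest to highest layer
    analyze        -- callable returning encoding information for one video
    reencode_upper -- callable taking (lower_info, upper_info) and returning
                      the re-encoded upper layer
    """
    results = []
    lower_info = analyze(aligned[0])      # first video of the lowest layer
    for upper in aligned[1:]:             # second video up to the Nth video
        upper_info = analyze(upper)
        results.append(reencode_upper(lower_info, upper_info))
        lower_info = upper_info           # this layer becomes the next lower layer
    return results


# Stub callbacks standing in for the bitstream analyzing unit and the
# SVC re-encoding unit.
layers = reencode_layers(
    ["a", "b", "c"],
    analyze=lambda video: video.upper(),
    reencode_upper=lambda lo, up: f"{lo}->{up}",
)
print(layers)  # ['A->B', 'B->C']
```

Under this reading each bitstream is analyzed once and its encoding information is reused when the next layer up is processed.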

Here, the SVC re-encoding unit 260 re-encodes each macroblock as shown in the flowchart of FIG. 5. FIG. 5 is a flowchart illustrating the SVC re-encoding process for each macroblock according to an embodiment of the present invention.

The SVC re-encoding unit 260 re-encodes the bitstream corresponding to the upper layer, macroblock by macroblock, based on the two pieces of encoding information it receives.

First, the SVC re-encoding unit 260 up-scales the decoded texture information of the lower-layer macroblock that corresponds to the upper-layer macroblock, performs inter-layer intra prediction, and constructs an inter-layer intra prediction image (S510).
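To make the up-scaling in step S510 concrete, the sketch below enlarges a decoded lower-layer block by pixel repetition. Nearest-neighbor repetition is a simplification chosen for brevity and is not the interpolation an SVC codec would use; the function name and the factor of 2 are likewise assumptions.

```python
def upscale_nearest(block, factor=2):
    """Enlarge a 2-D block by repeating each sample `factor` times per axis."""
    out = []
    for row in block:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


lower_layer_block = [[1, 2], [3, 4]]
prediction = upscale_nearest(lower_layer_block)
print(prediction)
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

The enlarged block stands in for the inter-layer intra prediction image against which the upper-layer macroblock is compared when the encoding cost is calculated.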

Next, the SVC re-encoding unit 260 calculates the encoding cost of inter-layer intra mode according to Equation 1 (S520).

Here, R is the cost representing the amount of data needed to transmit the motion vector in inter mode, and λ is the Lagrange multiplier, a constant that expresses the relative weight of the prediction error cost and the motion information cost. For the encoding cost of intra mode, R is zero.
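Equation 1 itself does not survive in this text. From the definitions given (a prediction error measured as SAD, SSD, or MSE, a motion-vector bit amount R, and a Lagrange multiplier λ weighting the two), it presumably has the usual rate-distortion form sketched below. The symbols J (encoding cost) and D (prediction error) are names assumed here for illustration.

```latex
J = D + \lambda \cdot R,
\qquad D \in \{\mathrm{SAD},\ \mathrm{SSD},\ \mathrm{MSE}\},
\qquad \mathrm{SAD} = \sum_{i,j} \bigl| s(i,j) - p(i,j) \bigr|
```

Here s(i,j) denotes the samples of the macroblock being encoded and p(i,j) the samples of its prediction. With R = 0, as stated for intra mode, the cost reduces to J = D.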

Next, the SVC re-encoding unit 260 calculates the encoding cost of the current encoding mode according to Equation 1 (S530).

Next, the SVC re-encoding unit 260 compares the encoding cost of inter-layer intra mode with the encoding cost of the current encoding mode. If the encoding cost of inter-layer intra mode is less than the encoding cost of the current encoding mode plus a specific constant (α) (S540: Yes), it performs SVC re-encoding in inter-layer intra mode (S550); otherwise (S540: No), it performs encoding in the current encoding mode (S560).

Here, the specific constant α serves to retain the current encoding mode unless re-encoding in the inter-layer intra mode is significantly more efficient than the current encoding mode.
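The decision in steps S520 to S560 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the cost form `distortion + lam * rate`, and the mode labels are assumptions made here. The source does not state the sign of α; with the comparison as written, a negative α makes the switch to the inter-layer intra mode require a clear cost advantage.

```python
def encoding_cost(distortion: float, rate: float, lam: float) -> float:
    """Assumed Lagrangian cost: prediction-error cost plus lambda times rate cost R."""
    return distortion + lam * rate


def select_mode(il_intra_distortion: float,
                cur_distortion: float,
                cur_rate: float,
                lam: float,
                alpha: float) -> str:
    """Pick the re-encoding mode for one macroblock (hypothetical helper)."""
    # S520: inter-layer intra is an intra mode, so R is zero.
    cost_il = encoding_cost(il_intra_distortion, 0.0, lam)
    # S530: cost of the mode extracted from the existing bitstream.
    cost_cur = encoding_cost(cur_distortion, cur_rate, lam)
    # S540: comparison as stated in the description.
    if cost_il < cost_cur + alpha:
        return "inter_layer_intra"  # S550
    return "current"                # S560
```

For example, with `lam = 5.0` and `alpha = 0.0`, an inter-layer intra distortion of 100 is chosen over a current mode with distortion 90 and rate 4 (cost 110), but not over one with rate 1 (cost 95).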

As described above, the present invention provides a method and apparatus for encoding multitrack video into scalable video, in which two or more multitrack videos are aligned for SVC encoding according to resolution, frame rate, and bit rate; each bitstream of the aligned multitrack videos, from the first video of the lowest layer to the video of the highest layer, is analyzed to extract encoding information; and SVC re-encoding is performed based on the extracted encoding information to generate SVC video.
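The alignment step can be illustrated with a short sort. This is a sketch under stated assumptions: the `Track` type and its field names are hypothetical, and comparing resolution by pixel count is a choice made here rather than something the source specifies. The ordering priority (resolution, then frame rate, then bit rate) follows the alignment rule recited in the claims.

```python
from typing import List, NamedTuple


class Track(NamedTuple):
    """One multitrack video (hypothetical record for illustration)."""
    name: str
    width: int
    height: int
    fps: float
    bitrate_kbps: int


def align_for_svc(tracks: List[Track]) -> List[Track]:
    """Order tracks from the lowest layer to the highest layer.

    Higher resolution goes to a higher layer; ties are broken by
    frame rate, then by bit rate.
    """
    return sorted(
        tracks,
        key=lambda t: (t.width * t.height, t.fps, t.bitrate_kbps),
    )
```

The first element of the returned list would serve as the lowest layer and the last element as the highest layer.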

Those skilled in the art to which the present invention pertains will understand that the invention may be embodied in other specific forms without changing its technical spirit or essential features; the embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

The present invention can be applied to services and systems that transmit content including video from a content server to a user terminal.

It can also be applied to a content server that performs an ingestion process in order to transmit content to a user terminal.

In addition, it can be applied to a content media communication system that includes a content server serving content including video and audio over a communication network, or a user terminal that receives and displays the content.

100: video providing system 110: video encoding device
120: communication network 130: user terminal
210: communication unit 220: original video storage unit
230: multitrack video encoder 240: video alignment unit
250: bitstream analysis unit 260: SVC re-encoding unit

Claims

1. Align two or more multitrack videos for SVC encoding according to resolution, frame rate, and bit rate; analyze the bitstreams of the aligned multitrack videos, from the first video of the lowest layer and the second video of the next-lowest layer up to the video of the highest layer, to extract encoding information; and perform SVC re-encoding based on the extracted encoding information.

2. A video encoding device comprising:
a video alignment unit for aligning two or more multitrack videos for SVC encoding;
a bitstream analyzer for extracting encoding information by analyzing each bitstream from the first video of the lowest layer to the video of the highest layer among the aligned multitrack videos; and
an SVC re-encoding unit for performing SVC re-encoding based on the encoding information.

3. The video encoding device of claim 2,
wherein the video alignment unit, for the two or more multitrack videos, places higher-resolution video in a higher layer; among videos of the same resolution, places video with a higher frame rate in a higher layer; and, when the resolution and the frame rate are the same, places video with a higher bitrate in a higher layer.

4. The video encoding device of claim 2,
wherein the bitstream analyzer receives the first video of the lowest layer and the second video of the next-lowest layer among the aligned multitrack videos, analyzes their bitstreams to extract encoding information including an encoding mode and prediction information, and then analyzes each bitstream from the third video of a lower layer to the Nth video of the highest layer to extract encoding information.

5. The video encoding device of claim 2,
wherein the SVC re-encoding unit re-encodes the bitstream corresponding to the upper layer in macroblock units, based on the two sets of input encoding information.

6. The video encoding device of claim 5,
wherein the SVC re-encoding unit up-scales the decoded texture information of the macroblocks corresponding to the lower and upper macroblocks to obtain an inter-layer intra prediction image, uses that image to calculate an encoding cost, and, if the encoding cost of inter-layer intra coding is greater than the encoding cost of the current encoding mode plus a certain constant value, encodes in the current encoding mode, and otherwise performs SVC re-encoding in the inter-layer intra prediction mode.

7. A method of encoding multitrack video into scalable video in a video encoding apparatus, the method comprising:
(a) aligning two or more multitrack videos for SVC encoding;
(b) extracting encoding information by analyzing each bitstream from the first video of the lowest layer to the video of the highest layer among the aligned multitrack videos; and
(c) performing SVC re-encoding based on the encoding information.

8. The method of claim 7,
wherein, in step (a), for the two or more multitrack videos, higher-resolution video is placed in a higher layer; among videos of the same resolution, video with a higher frame rate is placed in a higher layer; and, when the resolution and the frame rate are the same, video with a higher bitrate is placed in a higher layer.

9. The method of claim 7,
wherein, in step (b), the first video of the lowest layer and the second video of the next-lowest layer among the aligned multitrack videos are input and their bitstreams are analyzed to extract encoding information including an encoding mode and prediction information, and then each bitstream from the third video of a lower layer to the Nth video of the highest layer is analyzed to extract encoding information.

10. The method of claim 7,
wherein, in step (c), the bitstream corresponding to the upper layer is re-encoded in macroblock units, based on the two sets of input encoding information.

11. The method of claim 10,
wherein, in step (c), the decoded texture information of the macroblocks corresponding to the lower and upper macroblocks is up-scaled to obtain an inter-layer intra prediction image, that image is used to calculate an encoding cost, and, if the encoding cost of inter-layer intra coding is greater than the encoding cost of the current encoding mode plus a certain constant value, encoding is performed in the current encoding mode, and otherwise SVC re-encoding is performed in the inter-layer intra prediction mode.

12. A method of encoding multitrack video into scalable video in a video encoding apparatus, the method comprising:
(a) aligning two or more multitrack videos for SVC encoding;
(b) extracting encoding information by analyzing each bitstream from the first video of the lowest layer to the video of the highest layer among the aligned multitrack videos;
(c) calculating an encoding cost of the inter-layer intra mode by performing inter-layer intra prediction based on the encoding information;
(d) calculating an encoding cost of the current encoding mode; and
(e) comparing the two encoding costs and, according to the result, performing SVC re-encoding in the current encoding mode or the inter-layer intra mode.

13. The method of claim 12,
wherein the encoding cost is calculated by an equation in which R is a cost representing the amount of data required to transmit the motion vector in inter mode and λ is a Lagrange multiplier, a constant representing the weight between the prediction-error cost and the motion-information cost, and wherein R is zero for the encoding cost of an intra mode.