KR20200119435A

KR20200119435A - Real-time segmented video transcoding device and method

Info

Publication number: KR20200119435A
Application number: KR1020190041176A
Authority: KR
Inventors: 장준환; 박우출; 김용화; 양진욱; 윤상필; 김현욱; 조은경; 최민수; 이준석; 양재영
Original assignee: 한국전자기술연구원
Priority date: 2019-04-09
Filing date: 2019-04-09
Publication date: 2020-10-20
Also published as: KR102316495B1; WO2020209437A1

Abstract

The present invention provides an apparatus and a method for transcoding segmented videos in real-time. The apparatus for transcoding segmented videos according to the present invention comprises: an input unit receiving an original video stream; and a control unit generating tiles by spatially segmenting the inputted original video stream, encoding the generated tiled frames in a parallel structure by using a plurality of graphics processing units (GPUs), and generating a first video stream having a first resolution, a second video stream having a second resolution lower than that of the first resolution, and a third video stream having a third resolution lower than that of the second resolution, by rearranging the encoded frames. The present invention can provide the video stream with high quality in real-time by performing rapid calculation.

Description

Real-time segmented video transcoding device and method

The present invention relates to transcoding technology and, more particularly, to a real-time segmented video transcoding apparatus and method that transcode tiles with high quality in real time using a plurality of graphics processing units (GPUs).

Recently, a variety of high-quality videos have become available to users. Among them, 360 VR video is stereoscopic and requires a higher resolution (4096×4096 or more) than ordinary 4K video (3840×2160). In other words, because 360 VR video streams imagery covering a full 360°, it requires more bandwidth than 2D video, which represents a flat image.

Beyond its high bandwidth requirement, 360 VR video has a further characteristic: viewers see only part of the video at any moment rather than the whole of it, so transmitting regions that are not currently visible on screen at high quality wastes bandwidth. Various studies have sought to solve this problem, but no technology has yet been developed that streams 360 VR video effectively without wasting bandwidth.

Korean Registered Patent Publication No. 10-1923619 (2018.11.23.)

An object of the present invention is to provide a real-time segmented video transcoding apparatus and method that spatially divide an original video stream and transcode the resulting tiles in real time.

To achieve this object, the real-time segmented video transcoding apparatus of the present invention includes an input unit that receives an original video stream, and a control unit that generates tiles by spatially dividing the input original video stream, encodes the frames of the generated tiles (tiled frames) in parallel using a plurality of graphics processing units (GPUs), and rearranges the encoded frames to generate a first video stream having a first resolution, a second video stream having a second resolution lower than the first resolution, and a third video stream having a third resolution lower than the second resolution.

The control unit includes: an image space division unit that generates tiles by dividing the original video stream into a preset number of tiles; a GPU task management unit that calculates the workload associated with the frames of the generated tiles and assigns tasks to the plurality of GPUs according to the calculated workload; a GPU unit that has the plurality of GPUs arranged in parallel and encodes, on each GPU, the video stream for the task assigned to it; and a video post-processing unit that synchronizes the encoded video streams and rearranges the synchronized video streams to generate the first video stream, the second video stream, and the third video stream.

The image space division unit divides the stream so that the horizontal and vertical pixel counts of each tile are multiples of 128.

For the tiles in the bottom row and the tiles in the rightmost column, the image space division unit places no restriction on the number of pixels.

The GPU task management unit assigns the tasks according to the average task completion time of each GPU and the size of its assigned task queue.

The GPU task management unit assigns the tasks by estimating the task completion time of each GPU from the average task time for each task type on that GPU.

For each task, the GPU task management unit sequentially copies as many tile frames as the GOP (Group of Pictures) size to the GPU.

When delivering task-related information to each GPU, the GPU task management unit additionally includes frame number information.

The video post-processing unit includes a separate multiplexer for each of the first video stream, the second video stream, and the third video stream.

The real-time tile transcoding method according to the present invention includes: receiving, by a segmented video transcoding apparatus, an original video stream; generating, by the segmented video transcoding apparatus, tiles by spatially dividing the input original video stream; encoding, by the segmented video transcoding apparatus, the frames of the generated tiles in parallel using a plurality of GPUs; and rearranging, by the segmented video transcoding apparatus, the encoded frames to generate a first video stream having a first resolution, a second video stream having a second resolution lower than the first resolution, and a third video stream having a third resolution lower than the second resolution.

The real-time segmented video transcoding apparatus and method of the present invention can spatially divide an original video stream and transcode the resulting tiles in parallel across a plurality of GPUs.

Because tasks are assigned to the plurality of GPUs according to each GPU's average task completion time and the size of its assigned task queue, computation is fast and a high-quality video stream can be provided in real time.

FIG. 1 is a block diagram illustrating a segmented video transcoding apparatus according to an embodiment of the present invention.
FIG. 2 is a schematic diagram illustrating the overall operation of the segmented video transcoding apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating a task management process according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating task assignment according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a post-processing process according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating a segmented video transcoding method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. Note that, in assigning reference numerals to the elements of each drawing, the same elements are given the same numerals wherever possible, even when they appear in different drawings. In describing the present invention, detailed descriptions of related known configurations or functions are omitted where they would be obvious to those skilled in the art or could obscure the subject matter of the invention.

FIG. 1 is a block diagram illustrating a segmented video transcoding apparatus according to an embodiment of the present invention, and FIG. 2 is a schematic diagram illustrating the overall operation of the segmented video transcoding apparatus according to an embodiment of the present invention.

Referring to FIGS. 1 and 2, the segmented video transcoding apparatus 100 spatially divides an original video stream and transcodes the resulting tiles in real time. The segmented video transcoding apparatus 100 includes an input unit 10 and a control unit 30.

The input unit 10 receives an original video stream. The input unit 10 may receive the original video stream in various ways, such as from a file, over a network, or through an application program interface (API). The original video stream may be a high-definition 4K stereo video stream of 4096×4096 px or more, and supported formats may include H.264, HEVC (High Efficiency Video Coding), YUV420 raw frames, and RGB raw frames.

The control unit 30 generates tiles by spatially dividing the original video stream received from the input unit 10. The control unit 30 encodes the frames of the generated tiles (tiled frames) in parallel using a plurality of GPUs. The control unit 30 rearranges the encoded frames to generate a first video stream having a first resolution, a second video stream having a second resolution lower than the first resolution, and a third video stream having a third resolution lower than the second resolution. Here, the first resolution may mean a high-quality (HQ) high resolution, the second resolution a middle-quality (MQ) normal resolution, and the third resolution a low-quality (LQ) low resolution. The control unit 30 includes an image space division unit 31, a GPU task management unit 33, a GPU unit 35, and a video post-processing unit 37.

The image space division unit 31 generates tiles by dividing the original video stream into a preset number of tiles. The image space division unit 31 divides the original video stream so that the number of tile rows and the number of tile columns are both even. For example, the tile grid may be 6×6, 6×8, 8×8, 8×12, or 12×12. The image space division unit 31 also divides the stream so that the horizontal and vertical pixel counts of each tile are multiples of 128. For example, each tile may measure 256×256, 256×512, or 512×512 pixels. For the tiles in the bottom row and the tiles in the rightmost column, the image space division unit 31 may place no restriction on the number of pixels. This allows the image space division unit 31 to divide the video into a plurality of tiles flexibly. The image space division unit 31 performs only a logical division of the original video stream and does not move any data.
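The tiling rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosed apparatus: the names `Tile` and `tile_grid` are hypothetical, the function returns only rectangles (mirroring the remark that the division is logical and moves no data), and the even row/column count rule is not checked.

```python
from typing import List, NamedTuple


class Tile(NamedTuple):
    # Hypothetical descriptor of one tile; only coordinates, no pixel data.
    tile_id: int
    x: int       # left edge in pixels
    y: int       # top edge in pixels
    width: int
    height: int


def tile_grid(frame_w: int, frame_h: int, tile_w: int, tile_h: int) -> List[Tile]:
    """Return tile rectangles in row-major order (logical division only)."""
    if tile_w % 128 or tile_h % 128:
        raise ValueError("nominal tile width and height must be multiples of 128")
    tiles: List[Tile] = []
    for y in range(0, frame_h, tile_h):
        for x in range(0, frame_w, tile_w):
            # Tiles in the bottom row and rightmost column take whatever
            # pixels remain, so they are exempt from the 128-multiple rule.
            w = min(tile_w, frame_w - x)
            h = min(tile_h, frame_h - y)
            tiles.append(Tile(len(tiles), x, y, w, h))
    return tiles
```

For a 4096×4096 frame with 512×512 tiles this yields an 8×8 grid of equal tiles; for frame sizes that are not multiples of the tile size, only the last row and column shrink.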

The GPU task management unit 33 calculates the workload associated with the tile frames generated by the image space division unit 31. The GPU task management unit 33 assigns tasks to the plurality of GPUs according to the calculated workload. Here, a task may mean an encoding task performed on a GPU. For example, the GPU task management unit 33 may assign tasks according to each GPU's average task completion time and the size of its assigned task queue. The GPU task management unit 33 may also assign tasks by estimating each GPU's task completion time from the average task time for each task type on that GPU.

The GPU unit 35 includes a plurality of GPUs. For example, the GPU unit 35 may include a first GPU, a second GPU, and so on up to an n-th GPU. Preferably, the GPU unit 35 includes GPUs of the same specification so that the GPUs are compatible with one another, but it is not limited to this and may include GPUs of different specifications depending on the operating environment. The GPU unit 35 has the plurality of GPUs arranged in parallel, and each GPU encodes the video stream for the task assigned to it by the GPU task management unit 33.

The video post-processing unit 37 synchronizes the video streams encoded by the GPU unit 35 and rearranges the synchronized video streams. Through this rearrangement, the video post-processing unit 37 generates the first video stream, the second video stream, and the third video stream. The video post-processing unit 37 may include a separate multiplexer for each of the first video stream, the second video stream, and the third video stream.

FIG. 3 is a diagram illustrating a task management process according to an embodiment of the present invention, and FIG. 4 is a diagram illustrating task assignment according to an embodiment of the present invention. FIG. 3(a) shows the existing task buffer state of each GPU, FIG. 3(b) shows a new task, and FIG. 3(c) shows the buffer state of each GPU after the new task has been assigned.

Referring to FIGS. 2 to 4, the GPU task management unit 33 includes a frame buffer 51, a work queue loader 53, and a load balancer 55.

The frame buffer 51 stores the tile frames (tiled frames) produced by the logical division performed by the image space division unit 31. The frame buffer 51 holds the tile frames temporarily before they are passed to the work queue loader 53.

The work queue loader 53 calculates the workload associated with the tile frames stored in the frame buffer 51 and assigns tasks to the GPU unit 35 according to the calculated workload. The work queue loader 53 may assign tasks by estimating each GPU's task completion time from the average task time for each task type (HQ/MQ/LQ) on that GPU. The work queue loader 53 may generate two task commands, one HQ and one MQ, for each tile frame, and one LQ task command for the entire frame.

In detail, the work queue loader 53 calculates each GPU's average task completion time and the size of its assigned task queue, and uses this information to assign tasks to each GPU. For example, when a new tile frame arrives, the work queue loader 53 updates the expected average task time for each GPU, using update information received from the load balancer 55. The work queue loader 53 sorts the GPUs in ascending order of task time and assigns the newly arrived tile frame to the GPU with the earliest task completion time. If tiles remain after this assignment, the work queue loader 53 repeats the process to assign tasks for the remaining tiles.
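The assignment loop described above resembles a greedy earliest-completion scheduler, which can be sketched as follows. This is an illustration under assumptions rather than the disclosed implementation: the function name `assign_tasks`, its arguments, and the use of a heap in place of an explicit sort are all hypothetical, and "earliest task completion time" is interpreted as the time at which a GPU's existing queue is expected to drain.

```python
import heapq
from typing import Dict, List, Tuple


def assign_tasks(
    queued_time: Dict[int, float],
    avg_time: Dict[int, Dict[str, float]],
    tasks: List[Tuple[str, str]],
) -> Dict[str, int]:
    """Greedily map each (task_name, task_type) to a GPU index.

    queued_time[gpu]      expected seconds until the GPU's current queue drains
    avg_time[gpu][type]   average seconds the GPU needs for a task of that type
    (Both inputs are assumed to come from a load-balancer-like component.)
    """
    # A min-heap keeps the GPU that will be free first at the top, which is
    # equivalent to re-sorting in ascending order after every assignment.
    heap = [(t, gpu) for gpu, t in queued_time.items()]
    heapq.heapify(heap)
    assignment: Dict[str, int] = {}
    for name, kind in tasks:
        busy_until, gpu = heapq.heappop(heap)
        assignment[name] = gpu
        # The chosen GPU's expected completion time grows by the average
        # time it needs for this task type.
        heapq.heappush(heap, (busy_until + avg_time[gpu][kind], gpu))
    return assignment
```

Keying the averages by both GPU and task type lets the same loop handle GPUs of different specifications, which the description allows.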

Besides assigning tasks to the GPU unit 35, the work queue loader 53 may copy the tile frames to the corresponding GPU. For each task, the work queue loader 53 may sequentially copy as many tile frames as the GOP (Group of Pictures) size to the GPU.

When delivering task-related information to a GPU, the work queue loader 53 may additionally include frame number information. Here, the frame number information serves as time information.
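The task commands described in this section (one HQ and one MQ task per tile, one LQ task for the entire frame, each covering one GOP of frames and carrying a frame number) could be represented as below. The `EncodeTask` record and `make_tasks` helper are hypothetical names introduced for illustration only; the source does not specify a data layout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EncodeTask:
    # Hypothetical task descriptor; field names are assumptions.
    quality: str             # "HQ", "MQ" or "LQ"
    tile_id: Optional[int]   # None means the task covers the entire frame
    first_frame: int         # frame number of the first frame (time information)
    frame_count: int         # frames copied to the GPU for this task (one GOP)


def make_tasks(num_tiles: int, first_frame: int, gop_size: int) -> List[EncodeTask]:
    """Build the task commands for one GOP starting at first_frame."""
    tasks: List[EncodeTask] = []
    for tile_id in range(num_tiles):
        # Two task commands per tile: high quality and middle quality.
        for quality in ("HQ", "MQ"):
            tasks.append(EncodeTask(quality, tile_id, first_frame, gop_size))
    # One low-quality task command for the entire, undivided frame.
    tasks.append(EncodeTask("LQ", None, first_frame, gop_size))
    return tasks
```

With 64 tiles this produces 129 tasks per GOP, and the frame number on each task is what later allows out-of-order results to be put back in sequence.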

The load balancer 55 receives the status of each GPU's tasks in progress from the GPU unit 35. Using this information, the load balancer 55 calculates the average task time for each task type on each GPU. The load balancer 55 sends the calculated average task times to the work queue loader 53 so that the work queue loader 53 can use them when assigning tasks.
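One simple way to maintain the per-GPU, per-task-type averages described above is a running mean, sketched below. The class name `LoadBalancer`, its methods, and the choice of a plain arithmetic mean (rather than, say, a moving average over recent tasks) are assumptions; the source states only that an average task time is calculated.

```python
from collections import defaultdict
from typing import Dict, Tuple


class LoadBalancer:
    """Track the mean task time per (GPU, task type) pair (illustrative only)."""

    def __init__(self) -> None:
        self._total: Dict[Tuple[int, str], float] = defaultdict(float)
        self._count: Dict[Tuple[int, str], int] = defaultdict(int)

    def report(self, gpu: int, kind: str, seconds: float) -> None:
        # Called when a GPU reports that a task of the given type finished.
        key = (gpu, kind)
        self._total[key] += seconds
        self._count[key] += 1

    def average(self, gpu: int, kind: str, default: float = 0.0) -> float:
        # Fall back to a caller-supplied default until a sample exists.
        key = (gpu, kind)
        if self._count.get(key, 0) == 0:
            return default
        return self._total[key] / self._count[key]
```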

FIG. 5 is a diagram illustrating a post-processing process according to an embodiment of the present invention.

Referring to FIGS. 2 and 5, the video post-processing unit 37 includes a video synchronization unit (video synchronizer) 71 and a multiplexer unit 73.

The video synchronization unit 71 synchronizes the video streams encoded by the GPU unit 35. Because of load balancing, the encoded video streams may not be produced in sequential order. The video synchronization unit 71 therefore first receives the encoded results and stores them in a buffer per tile ID. Once all tiles for the same frame time have been encoded, the video synchronization unit 71 passes that frame to the multiplexer unit 73.
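The buffering behaviour described above can be sketched as follows. This is a minimal illustration with assumed names (`VideoSynchronizer`, `push`) and assumed details: tile IDs are taken to run from 0 to `num_tiles - 1`, encoded tiles are opaque byte strings, and a frame is released as soon as its own tiles are complete, without additionally enforcing frame order.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class VideoSynchronizer:
    """Collect encoded tiles until every tile of a frame time has arrived."""

    def __init__(self, num_tiles: int) -> None:
        self.num_tiles = num_tiles
        # frame number -> {tile_id: encoded data}
        self._pending: Dict[int, Dict[int, bytes]] = defaultdict(dict)

    def push(self, frame_no: int, tile_id: int, data: bytes) -> Optional[List[bytes]]:
        """Store one encoded tile; return the full frame once it is complete."""
        slot = self._pending[frame_no]
        slot[tile_id] = data
        if len(slot) < self.num_tiles:
            return None  # still waiting for other tiles of this frame time
        del self._pending[frame_no]
        # Hand the tiles over in tile-ID order for the multiplexer.
        return [slot[i] for i in range(self.num_tiles)]
```

Results may arrive in any order across GPUs; the frame number carried with each task is what lets the tiles of one frame time be grouped here.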

The multiplexer unit 73 rearranges the frames received from the video synchronization unit 71 and packs them into a single media container. The media container may be in MP4 or TS format. With MP4, frames are collected for a preset unit of time (3 sec, 5 sec, etc.) and then delivered through a file, network, or API; with TS, output is delivered through a file, network, or API as soon as the work for one frame time is complete. The multiplexer unit 73 generates a first video stream having a first resolution, a second video stream having a second resolution lower than the first resolution, and a third video stream having a third resolution lower than the second resolution. To this end, the multiplexer unit 73 includes multiplexers 91, 93, and 95 corresponding to the first, second, and third video streams, respectively. Here, the first resolution may mean a high-quality high resolution, the second resolution a middle-quality normal resolution, and the third resolution a low-quality low resolution.
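The difference between the two delivery policies (batching frames for a fixed duration versus delivering each frame time immediately) can be illustrated with a small buffer. This sketch does not perform real MP4 or TS muxing; `SegmentBuffer`, its parameters, and the `deliver` callback are hypothetical, and the frame rate is an assumed input.

```python
from typing import Callable, List


class SegmentBuffer:
    """Collect frames and hand them off in fixed-size batches (illustrative)."""

    def __init__(
        self,
        frames_per_segment: int,
        deliver: Callable[[List[bytes]], None],
    ) -> None:
        self.frames_per_segment = frames_per_segment
        self.deliver = deliver  # stands in for file/network/API output
        self._frames: List[bytes] = []

    def add(self, frame: bytes) -> None:
        self._frames.append(frame)
        if len(self._frames) >= self.frames_per_segment:
            self.deliver(self._frames)
            self._frames = []


def mp4_style(fps: int, seconds: int, deliver: Callable[[List[bytes]], None]) -> SegmentBuffer:
    # Batch frames for a preset duration, e.g. 3 or 5 seconds.
    return SegmentBuffer(fps * seconds, deliver)


def ts_style(deliver: Callable[[List[bytes]], None]) -> SegmentBuffer:
    # Deliver as soon as one frame time is complete.
    return SegmentBuffer(1, deliver)
```

In an arrangement like the one described, one such buffer would sit behind each of the three multiplexers (HQ, MQ, LQ).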

FIG. 6 is a flowchart illustrating a segmented video transcoding method according to an embodiment of the present invention.

Referring to FIGS. 1 and 6, the segmented video transcoding method spatially divides an original video stream and transcodes the resulting tiles in parallel across a plurality of GPUs. Because the method assigns tasks to the plurality of GPUs according to each GPU's average task completion time and the size of its assigned task queue, computation is fast and a high-quality video stream can be provided in real time.

In step S110, the segmented video transcoding apparatus 100 receives an original video stream. The segmented video transcoding apparatus 100 may receive the original video stream in various ways, such as from a file, over a network, or through an application program interface (API). The original video stream may be a high-definition 4K stereo video stream of 4096×4096 px or more, and supported formats may include H.264, HEVC (High Efficiency Video Coding), YUV420 raw frames, and RGB raw frames.

In step S130, the segmented video transcoding apparatus 100 generates tiles by spatially dividing the input original video stream. The segmented video transcoding apparatus 100 divides the original video stream into a preset number of tiles. The apparatus may divide the original video stream so that the number of tile rows and the number of tile columns are both even, and so that the horizontal and vertical pixel counts of each tile are multiples of 128. For the tiles in the bottom row and the tiles in the rightmost column, the apparatus may place no restriction on the number of pixels.

In step S150, the segmented video transcoding apparatus 100 encodes the frames of the generated tiles in parallel using a plurality of GPUs. The segmented video transcoding apparatus 100 calculates the workload associated with the frames of the generated tiles and performs encoding by assigning tasks to the plurality of GPUs according to the calculated workload. In this way, the apparatus can perform parallel encoding that is matched to the current progress of each GPU's tasks.

In step S170, the segmented video transcoding apparatus 100 rearranges the encoded frames. The segmented video transcoding apparatus 100 may synchronize the encoded video streams and rearrange the synchronized video streams. The apparatus thereby generates a first video stream having a first resolution, a second video stream having a second resolution lower than the first resolution, and a third video stream having a third resolution lower than the second resolution. Here, the first resolution may mean a high-quality high resolution, the second resolution a middle-quality normal resolution, and the third resolution a low-quality low resolution.

Although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to those specific embodiments. Anyone of ordinary skill in the art to which the invention pertains can make various modifications without departing from the gist of the invention as claimed, and such modifications fall within the scope of the claims.

10: input unit
30: control unit
31: image space division unit
33: GPU task management unit
35: GPU unit
37: video post-processing unit
51: frame buffer
53: work queue loader
55: load balancer
71: video synchronization unit
73: multiplexer unit
91: first multiplexer
93: second multiplexer
95: third multiplexer
100: segmented video transcoding apparatus

Claims

A real-time segmented video transcoding apparatus comprising:
an input unit that receives an original video stream; and
a control unit that generates tiles by spatially dividing the input original video stream, encodes the frames of the generated tiles (tiled frames) in parallel using a plurality of graphics processing units (GPUs), and rearranges the encoded frames to generate a first video stream having a first resolution, a second video stream having a second resolution lower than the first resolution, and a third video stream having a third resolution lower than the second resolution.

The apparatus of claim 1,
wherein the control unit comprises:
an image space division unit that generates tiles by dividing the original video stream into a preset number of tiles;
a GPU task management unit that calculates a workload associated with the frames of the generated tiles and assigns tasks to the plurality of GPUs according to the calculated workload;
a GPU unit that has the plurality of GPUs arranged in parallel and encodes, on each GPU, the video stream for the task assigned to that GPU; and
a video post-processing unit that synchronizes the encoded video streams and rearranges the synchronized video streams to generate the first video stream, the second video stream, and the third video stream.

The apparatus of claim 2,
wherein the image space division unit
divides the stream so that the horizontal and vertical pixel counts of each tile are multiples of 128.

The apparatus of claim 3,
wherein the image space division unit
places no restriction on the number of pixels for the tiles in the bottom row and the tiles in the rightmost column.

The apparatus of claim 2,
wherein the GPU task management unit
assigns the tasks according to the average task completion time of each GPU and the size of its assigned task queue.

The apparatus of claim 2,
wherein the GPU task management unit
assigns the tasks by estimating the task completion time of each GPU from the average task time for each task type on that GPU.

The apparatus of claim 2,
wherein the GPU task management unit,
for each task, sequentially copies as many tile frames as the GOP (Group of Pictures) size to the GPU.

The apparatus of claim 2,
wherein the GPU task management unit
additionally includes frame number information when delivering task-related information to each GPU.

The apparatus of claim 2,
wherein the video post-processing unit
includes a separate multiplexer for each of the first video stream, the second video stream, and the third video stream.

A real-time tile transcoding method comprising:
receiving, by a segmented video transcoding apparatus, an original video stream;
generating, by the segmented video transcoding apparatus, tiles by spatially dividing the input original video stream;
encoding, by the segmented video transcoding apparatus, the frames of the generated tiles in parallel using a plurality of GPUs; and
rearranging, by the segmented video transcoding apparatus, the encoded frames to generate a first video stream having a first resolution, a second video stream having a second resolution lower than the first resolution, and a third video stream having a third resolution lower than the second resolution.