KR20220051959A

KR20220051959A - Method and Apparatus for Real Time Parallel Video Encoding

Info

Publication number: KR20220051959A
Application number: KR1020200135708A
Authority: KR
Inventors: 김동원; 박용현
Original assignee: 에스케이텔레콤 주식회사
Priority date: 2020-10-20
Filing date: 2020-10-20
Publication date: 2022-04-27

Abstract

Disclosed are an apparatus and a method for real-time parallel video encoding. According to an embodiment of the present invention, in order to use the computing power of idle resources efficiently when encoding video in real time, the method comprises: a step in which a master machine allocates frames to be encoded to each worker machine on the basis of the hierarchical structure of a group of pictures (GOP); a step in which each worker machine encodes its allocated frames on the basis of a synchronized decoded picture buffer (DPB), so that elementary streams (ESs) are generated in parallel; and a step in which the master machine combines the ESs generated by the worker machines in frame, module, or multi-pass units, thereby performing real-time video encoding.

Description

Method and Apparatus for Real Time Parallel Video Encoding

The present disclosure relates to an apparatus and method for real-time parallel encoding of video. More specifically, it relates to a parallel video encoding apparatus and method that perform real-time video encoding in which a master machine allocates frames to be encoded to each worker machine based on the hierarchical structure of a GOP (Group of Pictures), each worker machine encodes its allocated frames based on a synchronized DPB (Decoded Picture Buffer) to generate ESs (Elementary Streams) in parallel, and the master machine combines the ESs that the worker machines generate in frame, module, or multipass units.

The content described below merely provides background information related to the present invention and does not constitute prior art.

A hardware (HW) encoder or a software (SW) encoder may be used to perform video encoding. In an HW encoder, because of pipeline synchronization and data throughput constraints, the encoding process, including motion estimation, may be less precise than in an SW encoder. In addition, when the resources required during encoding vary, the HW must be provisioned for the maximum required resources. It is therefore advantageous to use an SW encoder in terms of compression performance and resource utilization.

Meanwhile, in traditional broadcasting the resource demand for SW encoders (hereinafter, encoders) is fixed and hardly fluctuates, whereas in personal real-time broadcasting it may change dynamically. When encoder resources are provisioned on premises, they must be sized for peak demand to guarantee real-time processing (for example, for 30 fps (frames per second) video, the encoding of each frame must finish within 33 ms), which can waste resources.

One method of adjusting encoder resources uses SVT (Scalable Video Technology) (see Non-Patent Document 1). An SVT-based encoder can perform parallel processing by pipelining the algorithm modules that make up the encoder within a single machine. Parallel processing performance can improve as the machine has more CPU (Central Processing Unit) cores, but the parallelism is limited to processing of the same video frame. There is also the problem that, because of how reference frames are used, frame-level parallel processing across machines is not possible.

Another method of adjusting encoder resources is distributed encoding, which uses idle CPUs scattered across a plurality of machines (see Non-Patent Document 2). As illustrated in FIG. 8, distributed encoding performs parallel processing between machines using a master PC (Personal Computer) and worker PCs. The master PC distributes the source video to the worker PCs in units of GOPs and then combines the partial encoded streams generated by the worker PCs, thereby achieving parallel processing between machines. However, because video is allocated to each worker PC in units of GOPs, real-time encoding is difficult.

Accordingly, there is a need for an SW encoding apparatus and method that encode video in parallel using a plurality of machines and can do so in real time.

Non-Patent Document 1: https://01.org/svt.
Non-Patent Document 2: http://users.abo.fi/slafond/theses/ Tewodros_Deneke_msc_thesis.pdf.

The main object of the present disclosure is to provide a parallel video encoding apparatus and method that, in order to use the computing power of idle resources efficiently when encoding video in real time, perform real-time video encoding in which a master machine allocates frames to be encoded to each worker machine based on the hierarchical structure of a GOP (Group of Pictures), each worker machine encodes its allocated frames based on a synchronized DPB (Decoded Picture Buffer) to generate ESs (Elementary Streams) in parallel, and the master machine combines the ESs that the worker machines generate in frame, module, or multipass units.

According to an embodiment of the present disclosure, there is provided a method performed by a video encoding apparatus for parallel encoding of a source video, the method comprising: storing, for each of a plurality of worker machines, a picture decoded from a first bitstream generated by a higher-level worker machine in a DPB (Decoded Picture Buffer); obtaining, for each worker machine, an input picture to be encoded; generating a second bitstream for each worker machine by encoding the input picture using the pictures stored in the DPB as reference pictures; and providing the second bitstream to a lower-level worker machine and storing a picture generated from the second bitstream in the DPB.

According to another embodiment of the present disclosure, there is provided the method further comprising: determining a hierarchical structure among the pictures constituting a GOP (Group of Pictures) of the source video and then, based on the hierarchical structure, providing information on the reference pictures to each worker machine and allocating the input picture to it; and combining the second bitstreams to generate a bitstream for the source video.

According to another embodiment of the present disclosure, there is provided a video encoding apparatus for parallel encoding of a source video, the apparatus comprising a master machine and a plurality of worker machines, wherein each of the plurality of worker machines comprises: an external reference frame decoder that decodes a picture from a first bitstream generated by a higher-level worker machine; a DPB (Decoded Picture Buffer) that stores the picture; an input unit that obtains an input picture to be encoded; and an encoding module that generates a second bitstream by encoding the input picture using the pictures stored in the DPB as reference pictures, provides the second bitstream to a lower-level worker machine and to the master machine, and stores a picture generated from the second bitstream in the DPB.

According to another embodiment of the present disclosure, there is provided the video encoding apparatus wherein the master machine determines a hierarchical structure among the pictures constituting a GOP (Group of Pictures) of the source video and then, based on the hierarchical structure, provides information on the reference pictures to each of the plurality of worker machines and allocates the input picture to it, and combines the second bitstreams of the plurality of worker machines to generate a bitstream for the source video.

According to another embodiment of the present disclosure, there is provided a computer program stored in a computer-readable recording medium to execute each step of the method performed by the video encoding apparatus for parallel encoding of a source video.

As described above, according to the present embodiment, a parallel video encoding apparatus and method are used in which the master machine allocates frames to be encoded to each worker machine based on the hierarchical structure of the GOP, each worker machine encodes its allocated frames based on a synchronized DPB (Decoded Picture Buffer) to generate ESs in parallel, and the master machine combines the ESs generated by the worker machines in frame, module, or multipass units to perform real-time video encoding. This makes it possible to use the computing power of idle resources efficiently when encoding video in real time.

FIG. 1 is a schematic block diagram of a parallel video encoding apparatus according to an embodiment of the present disclosure.
FIG. 2 is a block diagram of a master machine and a worker machine according to an embodiment of the present disclosure.
FIG. 3 is an exemplary diagram of the hierarchical structure of a GOP according to an embodiment of the present disclosure.
FIG. 4 is an exemplary diagram of the frame processing order for frame-level parallel encoding according to an embodiment of the present disclosure.
FIG. 5 is a flowchart of a frame-level parallel video encoding method according to an embodiment of the present disclosure.
FIG. 6 is a schematic exemplary diagram of a module-level parallel video encoding method according to another embodiment of the present disclosure.
FIG. 7 is a schematic exemplary diagram of a multipass-level parallel video encoding method according to another embodiment of the present disclosure.
FIG. 8 is a schematic exemplary diagram of a distributed encoding scheme.

Hereinafter, embodiments of the present invention are described in detail with reference to the exemplary drawings. In assigning reference numerals to the components of each drawing, note that the same components are given the same numerals wherever possible, even when they appear in different drawings. In addition, in describing the present embodiments, a detailed description of a related well-known configuration or function is omitted when it is judged that it would obscure the gist of the present embodiments.

In describing the components of the present embodiments, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one component from another, and they do not limit the nature, sequence, or order of the components. Throughout the specification, when a part is said to 'include' or 'comprise' a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as '... unit' and 'module' in the specification mean a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

The detailed description set forth below together with the accompanying drawings is intended to describe exemplary embodiments of the present invention and is not intended to represent the only embodiments in which the present invention may be practiced.

The present embodiment discloses an apparatus and method for real-time parallel encoding of video. More specifically, it provides a parallel video encoding apparatus and method that perform real-time video encoding in which a master machine allocates frames to be encoded to each worker machine based on the hierarchical structure of a GOP (Group of Pictures), each worker machine encodes its allocated frames based on a synchronized DPB (Decoded Picture Buffer) to generate ESs (Elementary Streams) in parallel, and the master machine combines the ESs that the worker machines generate in frame, module, or multipass units.

Video encoding is the process of generating a compressed bitstream from raw video frames using a video compression algorithm. In contrast, transcoding is the process of decoding an already compressed bitstream to reconstruct the video frames and then generating a different bitstream by compressing them with a different compression scheme or compression ratio.

An ES (Elementary Stream) is a bitstream generated by encoding video.

A GOP (Group of Pictures) is a set of a plurality of pictures and usually includes, as its leading frame, an IDR (Instantaneous Decoding Refresh) frame encoded using intra prediction. Setting the GOP length to a suitable preset value (for example, 17 or 33 frames) prevents error propagation caused by accumulated prediction.

A DPB (Decoded Picture Buffer) is a buffer that is included in a video encoding/decoding apparatus and stores reconstructed pictures. The stored pictures may be used as the reference frames required for inter prediction.

Hereinafter, the parallel video encoding apparatus according to the present disclosure is described with reference to FIGS. 1 and 2.

FIG. 1 is a schematic block diagram of a parallel video encoding apparatus according to an embodiment of the present disclosure.

When a demand for encoding arises, the parallel video encoding apparatus (100, hereinafter the encoding apparatus) according to an embodiment of the present disclosure makes appropriate use of the computing power of idle resources to perform parallel video encoding of the source video in frame, module, or multipass units. The encoding apparatus 100 includes a master machine 110 and N worker machines 120, where N is a natural number.

The master machine 110 transmits to each worker machine 120 the information needed for parallelization in frame, module, or multipass units and for synchronizing the reference frames required during encoding. The master machine 110 may deliver the source video to each worker machine 120 as it is, or may divide it and deliver a separate portion to each worker machine 120.

Based on the transmitted information, each worker machine 120 receives, decodes, and stores the ES for its reference frames from a higher-level worker machine, and performs parallel encoding of the source video in frame, module, or multipass units. Each worker machine 120 may obtain its distributed portion of the source video, or may obtain the entire source video and then select the frames assigned to it for encoding.

FIG. 2 is a block diagram of a master machine and a worker machine according to an embodiment of the present disclosure.

The master machine 110 includes a parallel control unit 210, and each worker machine 120 includes all or some of a DPB (Decoded Picture Buffer) manager 220, an input unit 230, and an encoding module 240. The DPB manager 220 includes an external reference frame decoder (222, hereinafter the reference frame decoder) and a synchronized DPB (224, hereinafter the DPB). The input unit 230 includes a decoder (not shown) for decoding the input signal when it is a compressed bitstream, which enables the encoding apparatus 100 to perform transcoding.

FIGS. 1 and 2 show exemplary configurations according to the present embodiment. Depending on the structure and operation of the master machine and of the worker machines, various implementations that include other components or other connections between components are possible.

The parallel control unit 210 of the master machine 110 generates hierarchical structure information for parallel encoding of the GOPs included in the source video. Based on this hierarchical structure information, the parallel control unit 210 generates information on the reference frames that each worker machine 120 requires and information on the frames of the source video that each worker machine is to encode. The hierarchical structure may be determined such that the layer containing the frames allocated to a lower-level worker machine references the layer containing the frames allocated to a higher-level worker machine. The parallel control unit 210 transmits the generated information to each worker machine 120 and initializes the components of each worker machine 120. The parallel control unit 210 may combine the ESs that the worker machines 120 generate in parallel to produce a unified bitstream for the source video.
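The combining step performed by the parallel control unit can be illustrated with a minimal sketch. It assumes, beyond what the text states, that each worker tags every ES chunk with the coding-order index of its frame and that the unified bitstream is formed by concatenating the chunks in coding order. The function name `combine_elementary_streams` and the data layout are hypothetical.

```python
def combine_elementary_streams(chunks_by_worker):
    """Merge per-worker ES chunks into one byte string.

    chunks_by_worker maps a worker name to a list of
    (coding_order_index, es_bytes) pairs. Ordering the output by
    coding order is an assumption of this sketch.
    """
    tagged = [item for chunks in chunks_by_worker.values() for item in chunks]
    tagged.sort(key=lambda item: item[0])  # coding order, not arrival order
    return b"".join(payload for _, payload in tagged)


# Placeholder payloads standing in for real encoded frames.
received = {
    "worker2": [(3, b"D"), (4, b"E")],
    "worker1": [(0, b"A"), (1, b"B"), (2, b"C")],
}
unified = combine_elementary_streams(received)
```

Sorting by an explicit index keeps the result independent of the order in which the workers happen to finish.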

Using the information obtained from the master machine 110, the DPB manager 220 of the worker machine 120 receives the ES for DPB synchronization from a higher-level worker machine and passes it to the reference frame decoder 222. The reference frame decoder 222 decodes the ES of each frame that the worker machine 120 needs as a reference frame and stores the result in the DPB 224. The DPB 224 stores both the pictures reconstructed during the worker machine's own encoding process and the pictures decoded from the ES received from higher-level worker machines, keeps them synchronized, and provides the reference frames that the encoding module 240 requires.
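The role of the DPB manager can be sketched as follows. This is a simplified illustration, not the patented implementation: the class names, the `decode` callable standing in for the reference frame decoder 222, and the use of coding-order indices as picture keys are all assumptions.

```python
class SyncedDPB:
    """Stores reconstructed pictures keyed by coding-order index."""

    def __init__(self):
        self._pictures = {}

    def store(self, coding_index, picture):
        self._pictures[coding_index] = picture

    def get(self, coding_index):
        return self._pictures[coding_index]

    def __contains__(self, coding_index):
        return coding_index in self._pictures


class DPBManager:
    """Keeps the DPB in sync with higher-level workers (hypothetical API)."""

    def __init__(self, decode, needed):
        self.decode = decode        # stand-in for the reference frame decoder
        self.needed = set(needed)   # indices named by the master machine
        self.dpb = SyncedDPB()

    def on_external_es(self, coding_index, es):
        # Decode only the frames this worker actually needs as references.
        if coding_index in self.needed:
            self.dpb.store(coding_index, self.decode(es))

    def on_local_reconstruction(self, coding_index, picture):
        # Pictures reconstructed by the local encoder share the same buffer.
        self.dpb.store(coding_index, picture)

    def ready(self, reference_indices):
        return all(i in self.dpb for i in reference_indices)
```

An encoding module in this sketch would call `ready(...)` before encoding a frame, so that it starts only once every reference picture is present.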

Meanwhile, because no higher-level ES exists for the first worker machine, which sits at the highest level, it need not include the reference frame decoder 222.

The input unit 230 of the worker machine 120 obtains the source video and passes it to the encoding module 240. The input unit 230 may obtain the portion of the source video distributed by the master machine 110, or may obtain the entire source video and then select only the frames assigned to it for encoding.

When the encoding apparatus 100 performs transcoding, the input unit 230 obtains and decodes the bitstream of the source video, selects from the decoded frames those that the worker machine is to encode, and passes them to the encoding module 240. The input unit 230 may select the frames to be encoded based on the information provided by the master machine 110. Whatever format the acquired video has (for example, ES, HDMI (High-Definition Multimedia Interface), or SDI (Serial Digital Interface)), the input unit 230 can select the frames designated by the master machine 110.

Based on the information provided by the master machine 110, the encoding module 240 of the worker machine 120 encodes the source video frames selectively obtained by the input unit 230 and generates an ES. The frames stored in the DPB 224 may be used as reference frames in this process. The encoding module 240 provides the generated ES to the master machine 110. The encoding module 240 also provides the generated ES to lower-level worker machines so that they can use the frames it carries as reference frames.

The encoding module 240 includes a decoder, with which it reconstructs pictures and uses them for the inter prediction needed to generate the ES. The reconstructed pictures are provided to the DPB 224 and may be used as reference frames in subsequent encoding.

When the encoding apparatus 100 performs parallel encoding in module or multipass units, the encoding module 240 may provide the encoding information generated during encoding to the master machine 110 or to other, lower-level worker machines.

Hereinafter, the frame-level parallel video encoding method performed by the encoding apparatus 100 is described using the examples of FIGS. 3 to 5.

FIG. 3 is an exemplary diagram of the hierarchical structure of a GOP according to an embodiment of the present disclosure. FIG. 4 is an exemplary diagram of the frame processing order for frame-level parallel encoding according to an embodiment of the present disclosure.

To improve encoding efficiency, the display order and the encoding order of the frames constituting a GOP of the source video usually differ. In the example of FIG. 3, the nine frames are laid out in display order, and the number shown on each frame indicates its position in the encoding order.
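The gap between display order and encoding order can be illustrated with a short sketch. The exact numbering in FIG. 3 is not reproduced in the text, so the code below assumes a conventional dyadic hierarchical-B arrangement, in which the leading intra frame and the far anchor frame are encoded first and each later layer holds the midpoints of the intervals left by the previous one. The function name is hypothetical.

```python
def dyadic_coding_order(gop_span: int) -> list[int]:
    """Return display indices in coding order for frames 0..gop_span.

    gop_span must be a power of two (8 gives the nine frames 0..8).
    The dyadic layout is an assumption, not taken from FIG. 3.
    """
    if gop_span < 1 or gop_span & (gop_span - 1):
        raise ValueError("gop_span must be a power of two")
    order = [0, gop_span]  # leading intra frame, then the far anchor
    step = gop_span
    while step > 1:
        half = step // 2
        # Midpoints of every interval at this level form the next layer.
        order.extend(range(half, gop_span, step))
        step = half
    return order


order = dyadic_coding_order(8)
# Coding-order index of each frame, listed in display order.
coding_index = [order.index(display) for display in range(9)]
print(order)         # [0, 8, 4, 2, 6, 1, 3, 5, 7]
print(coding_index)  # [0, 5, 3, 6, 2, 7, 4, 8, 1]
```

Under this assumption the frame encoded second (coding index 1) is the last frame in display order, which is consistent with the description's statement that frame 1 references the IDR frame.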

Based on this difference between display order and encoding order, the master machine 110 may determine a hierarchical structure for setting reference frames and may allocate to each worker machine 120 the frames it is to encode in parallel.

In the example of FIG. 3, frame 0 is an IDR frame, and frame 1 references the IDR frame. The label P denotes a frame that can be used for unidirectional prediction, the label B denotes a frame that can be used for bidirectional prediction, and the label b denotes a non-reference frame that sits at the top of the hierarchy and is not used as a reference frame. The arrangement of reference and non-reference frames in FIG. 3 is only one example; in general, a frame in an upper part of the hierarchy may reference any frame in a lower part that has been encoded before it. For example, a frame in the upper part may reference at least one frame in the lower part.

Taking this range of reference frames into account, frames to be encoded may be allocated to each of four worker machines as illustrated in FIG. 4. The GOP illustrated in FIG. 4 includes 17 pictures, and the example of FIG. 3 shows the hierarchical structure of 9 of them.

In the example of FIG. 4, a worker machine that processes frames lower in the hierarchy is assumed to be at a relatively higher level. For example, the first worker machine, at the highest level, can generate and use its own reference frames to encode the frames with encoding order 0 to 2. The second worker machine receives the frames with encoding order 0 to 2 in ES form from the higher-level first worker machine, decodes them, and stores them in its DPB, so that any previously encoded frame can serve as a reference frame when it encodes the frames with encoding order 3 and 4. Likewise, the third and fourth worker machines receive the frames with encoding order 0 to 4 in ES form from the higher-level first and second worker machines, decode them, and store them in their DPBs, so that any previously encoded frame can serve as a reference frame when they encode the frames with encoding order 5 to 8.
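The ES hand-off described for FIG. 4 can be written down as a small dependency calculation. In this sketch each worker has a level (0 being the highest) and must receive the ES of every frame encoded by workers at a higher level. The split of encoding orders 5 to 8 between the third and fourth workers is an assumption made for illustration, as the text does not specify it, and the `Worker` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    name: str
    level: int                 # 0 is the highest level
    frames: tuple[int, ...]    # coding-order indices this worker encodes


def external_references(workers):
    """Map each worker to the coding-order indices it must receive as ES."""
    needed = {}
    for worker in workers:
        upstream = [
            frame
            for other in workers
            if other.level < worker.level   # strictly higher-level workers
            for frame in other.frames
        ]
        needed[worker.name] = sorted(upstream)
    return needed


WORKERS = [
    Worker("worker1", 0, (0, 1, 2)),
    Worker("worker2", 1, (3, 4)),
    Worker("worker3", 2, (5, 6)),   # assumed split of frames 5 to 8
    Worker("worker4", 2, (7, 8)),
]

for name, frames in external_references(WORKERS).items():
    print(name, frames)
```

The first worker receives nothing, the second receives frames 0 to 2, and the third and fourth receive frames 0 to 4, matching the description above.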

When GOPs of the same size are processed repeatedly, each worker machine can encode its allocated frames in parallel while forming part of a frame-level pipeline based on the hierarchical structure, as illustrated in FIG. 4. For example, for 30 fps video, a conventional encoder must finish encoding each frame within about 33 ms. In contrast, in the present disclosure as illustrated in FIG. 4, four worker machines operate in parallel, so an encoding time of about 133 ms per frame is allowed, which brings real-time video encoding closer to realization. However, the first worker machine, at the highest level, must process one more frame than the others, so it is allowed about 107 ms per frame.
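The timing figures above can be reproduced with a short calculation. The reading used here is an assumption, since the text does not state it explicitly: in steady state the pipeline repeats over a window of 16 frame intervals (the 17-picture GOP of FIG. 4 minus the shared boundary picture), the second to fourth worker machines each encode 4 frames per window, and the first worker machine encodes 5. The function name and parameters are hypothetical.

```python
def per_frame_budget_ms(fps: float, window_frames: int, frames_for_worker: int) -> float:
    """Encoding time available per frame for one worker in steady state.

    window_frames: number of source frame intervals in one repeating window.
    frames_for_worker: how many frames of that window this worker encodes.
    """
    frame_interval_ms = 1000.0 / fps
    return window_frames * frame_interval_ms / frames_for_worker


FPS = 30
WINDOW = 16  # assumed steady-state window, see the lead-in

print(round(per_frame_budget_ms(FPS, 1, 1)))       # single encoder: 33
print(round(per_frame_budget_ms(FPS, WINDOW, 4)))  # workers 2 to 4: 133
print(round(per_frame_budget_ms(FPS, WINDOW, 5)))  # first worker: 107
```

The budget scales with the number of frame intervals in the window divided by the number of frames a worker owns, which is why the worker with one extra frame gets a tighter per-frame limit.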

FIG. 5 is a flowchart of a frame-level parallel video encoding method according to an embodiment of the present disclosure.

The master machine 110 of the encoding apparatus 100 determines the hierarchical structure of the frames constituting a GOP and, accordingly, allocates to each worker machine 120 the frames of the source video that it is to encode (S500). The master machine 110 may determine the hierarchical structure so that a layer containing frames allocated to a lower-level worker machine refers to a layer containing frames allocated to a higher-level worker machine. The master machine 110 transmits information on the reference frames and information on the frames to be encoded to each worker machine 120.

Meanwhile, when transcoding is performed, the encoding apparatus 100 may perform parallel encoding either by reusing the GOP hierarchical structure embedded in the bitstream of the input source video or by determining a new hierarchical structure for the GOP. Accordingly, regardless of whether the GOP hierarchical structure is reused, the reference relationships within the same GOP may differ between the input bitstream and the output bitstream.

When frames to be encoded are allocated as illustrated in FIG. 4, a worker machine may not hold all of the previously decoded pictures that its frames can reference. Therefore, each worker machine delivers the ES it has encoded to the lower-level worker machines so that the ES can be used there to obtain reference frames. The ES may be transferred between the worker machines 120 through an external buffer, or the transfer may be performed by the master machine 110.

Each worker machine 120 stores, in the DPB 224, the pictures decoded from the ES received from a higher-level worker machine (S502).

When the information on the reference frames is layer information, each worker machine 120 may decode, from the received ES, all frames that the frame to be encoded can reference and store them in the DPB 224. When the information on the reference frames is a reference frame index, each worker machine 120 may decode, from the received ES, only the frame corresponding to the index and store it in the DPB 224.
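The two forms of reference information can be contrasted with a small sketch. Everything named here is hypothetical: the function `frames_to_decode`, the representation of the received ES as a mapping from coding order to layer number, and the convention that a smaller layer number means a lower position in the hierarchy. The sketch also ignores the fact that a real decoder may need additional frames to decode a selected frame.

```python
from typing import Dict, Iterable, List, Optional


def frames_to_decode(received: Dict[int, int],
                     target_layer: int,
                     ref_indices: Optional[Iterable[int]] = None) -> List[int]:
    """Pick which received frames to decode into the DPB.

    received maps coding order -> layer of each frame carried in the ES.
    Index mode (ref_indices given): decode only the listed frames.
    Layer mode (ref_indices is None): decode every frame in a layer
    below target_layer, i.e. every frame the target may reference.
    """
    if ref_indices is not None:
        wanted = set(ref_indices)
        return sorted(order for order in received if order in wanted)
    return sorted(order for order, layer in received.items() if layer < target_layer)


# Assumed layers for coding orders 0 to 4 of FIG. 3.
received_es = {0: 0, 1: 0, 2: 1, 3: 2, 4: 2}

print(frames_to_decode(received_es, target_layer=3))                      # layer mode
print(frames_to_decode(received_es, target_layer=3, ref_indices=[2, 4]))  # index mode
```

Index mode trades flexibility for less redundant decoding: the worker machine fills its DPB with only the pictures it was told to use.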

Because no higher-level ES exists for the worker machine at the highest level, that worker machine may omit the step of decoding a received ES.

Having each worker machine 120 decode the ES introduces redundant work, but because decoding consumes relatively less computing power than encoding, the benefit of parallel processing may outweigh this cost.

Each worker machine 120 acquires its allocated frames (S504). Each worker machine 120 may acquire only the frames allocated by the master machine 110, or may acquire the entire source video and then select only the frames assigned to it for encoding.

Meanwhile, when transcoding is performed, each worker machine 120 may, based on the information provided by the master machine 110, acquire and decode the bitstream of the source video and then select, from the decoded frames, the frames that it is to encode.

Each worker machine 120 generates an ES by encoding its allocated frames using the pictures stored in the DPB 224 as reference frames (S506). Based on the information provided by the master machine 110, each worker machine 120 may use the frames stored in the DPB 224 as reference frames.

Because a single GOP is encoded by a plurality of worker machines as illustrated in FIG. 4, the encoding apparatus 100 can process the source video in real time.

Each worker machine 120 stores the pictures corresponding to the generated ES in the DPB 224, provides the ES to the lower-level worker machines, and provides the ES to the master machine 110 (S508). Specifically, each worker machine 120 stores in the DPB 224 the reconstructed pictures generated for inter prediction while producing the ES, and uses them as reference frames thereafter. Each worker machine 120 also provides the generated ES to the lower-level worker machines so that it can be used there to obtain reference frames.

The master machine 110 combines the ESs received from the worker machines 120 to generate a bitstream for the source video (S510).
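Step S510 can be sketched as a merge of per-worker ES units. The text does not specify how the ESs are combined, so this example assumes that each ES unit carries its coding order and that the master machine emits the units in coding order. The `EsUnit` type, the function name `combine_es`, and the byte payloads are hypothetical placeholders.

```python
from typing import Dict, List, Tuple

# Hypothetical ES unit: (coding_order, payload bytes).
EsUnit = Tuple[int, bytes]


def combine_es(per_worker: Dict[str, List[EsUnit]]) -> bytes:
    """Merge ES units from all workers into one stream in coding order."""
    units = [unit for worker_units in per_worker.values() for unit in worker_units]
    orders = [order for order, _ in units]
    if len(orders) != len(set(orders)):
        # Each frame is allocated to exactly one worker, so duplicates are an error.
        raise ValueError("a coding order was produced by more than one worker")
    units.sort(key=lambda unit: unit[0])
    return b"".join(payload for _, payload in units)


stream = combine_es({
    "worker2": [(3, b"d"), (4, b"e")],
    "worker1": [(0, b"a"), (1, b"b"), (2, b"c")],
})
print(stream)
```

Sorting by coding order keeps the result independent of the order in which the worker machines happen to deliver their ESs.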

As described above, according to the present embodiment, the master machine allocates frames to be encoded to each worker machine based on the hierarchical structure of the GOP, each worker machine encodes its allocated frames using a synchronized decoded picture buffer (DPB) to generate ESs in parallel, and the master machine combines the ESs generated frame by frame by the worker machines. Using this parallel video encoding apparatus and method makes it possible to use the computing power of idle resources efficiently when encoding video in real time.

Hereinafter, a module-level parallel video encoding method performed by the encoding apparatus 100 is described.

A typical video encoding apparatus includes modules that perform motion estimation (ME), motion compensation, transform, quantization, entropy coding, in-loop filtering, and the like. In module-level encoding according to the present disclosure, the modules are placed in a pipeline and distributed across machines for parallel processing, and the output information of each module is then delivered to the encoding module of the master machine.

FIG. 6 is a schematic diagram of a module-level parallel video encoding method according to another embodiment of the present disclosure.

In the example of FIG. 6, the input unit 230 of the second machine 620 acquires the video source and passes it to the module that performs ME. The second machine 620 performs ME using the pictures stored in the synchronized DPB 224 as reference frames, and provides the resulting motion vectors to the first machine 610, which is the master machine. In addition, the reference frame decoder 222 of the second machine 620 generates decoded pictures from the bitstream obtained from the first machine 610 and stores them in the synchronized DPB 224. The pictures stored in the synchronized DPB 224 may subsequently be used as reference frames.

Meanwhile, the input unit 230 of the first machine 610 acquires the video source and passes it to the encoding module 240. Using the ME information received from the second machine 620, the encoding module 240 generates a bitstream by performing every encoding step except ME. The first machine 610 also provides the bitstream to the second machine 620 so that it can be used to obtain reference frames when the second machine 620 subsequently performs ME.

As illustrated in FIG. 6, because the first machine 610 must use the ME information from the second machine 620, the ME performed by the second machine 620 precedes the encoding performed by the first machine 610.
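The ordering constraint of FIG. 6 can be illustrated with a toy two-stage pipeline. This is only a sketch under stated assumptions: the two machines are modeled as threads in one process, the motion vectors are placeholders, and the feedback path that returns the bitstream from the first machine to the second machine's DPB is omitted. The function and variable names are hypothetical.

```python
import queue
import threading


def run_pipeline(num_frames: int) -> list:
    """Run a toy two-machine pipeline and return the event log."""
    log = []                     # list.append is atomic in CPython
    me_results = queue.Queue()   # channel from machine 2 to machine 1
    done = object()              # sentinel marking the end of the stream

    def machine2_me() -> None:
        for n in range(num_frames):
            motion = ("mv", n)   # placeholder for real motion vectors
            log.append(("ME", n))
            me_results.put((n, motion))
        me_results.put(done)

    def machine1_encode() -> None:
        while True:
            item = me_results.get()   # blocks until ME output is available
            if item is done:
                break
            n, _motion = item
            log.append(("ENC", n))    # stands in for all steps except ME

    threads = [threading.Thread(target=machine2_me),
               threading.Thread(target=machine1_encode)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


events = run_pipeline(4)
print(events)
```

Because the encoder thread blocks on the queue, the ME event for a frame always appears in the log before the encode event for the same frame, while ME for later frames can run ahead in parallel.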

When transcoding is performed, the input units 230 of the first machine 610 and the second machine 620 may acquire and decode the bitstream of the source video and then pass the decoded video to the encoding module 240.

Based on the information provided by the master machine 110, each worker machine 120 may select, from among the frames decoded from the ES, the frames that it is to encode.

Compared with the conventional approach, in which running every encoding module on a single machine inevitably increases encoding complexity significantly, module-level encoding according to the present disclosure processes ME, which usually requires the largest amount of computation, in parallel on another machine. This improves the efficiency of ME and makes stable video encoding possible.

Hereinafter, a multipass-level parallel video encoding method performed by the encoding apparatus 100 is described.

Multipass encoding is a method that compares the result of a previous encoding with the source (original) and encodes again in a form better optimized in terms of compression ratio. Multipass encoding improves compression performance, but encoding complexity may increase in proportion to the number of passes. In multipass-level encoding according to the present disclosure, the passes are placed in a pipeline and distributed across machines for parallel processing, and the encoding information of the lower pass is then delivered to the master machine.

FIG. 7 is a schematic diagram of a multipass-level parallel video encoding method according to another embodiment of the present disclosure.

FIG. 7 illustrates two-pass parallel video encoding. The second machine 720 performs the first-pass encoding to generate encoding information, and the first machine 710 performs the second-pass encoding using that encoding information.

The second machine 720, which performs the first pass, acquires the video source and then extracts only encoding information, such as block modes and motion, using the frames stored in the synchronized DPB 224 as reference frames. The second machine 720 delivers the generated encoding information to the first machine 710, which is the master machine. The reference frame decoder 222 of the second machine 720 generates decoded pictures from the bitstream obtained from the first machine 710 and stores them in the synchronized DPB 224. The pictures stored in the synchronized DPB 224 may subsequently be used as reference frames.

The first machine 710, which performs the second pass, generates a bitstream for the video source by performing precise encoding again using the per-frame encoding information. The first machine 710 also provides the bitstream to the second machine 720 so that it can be used to obtain reference frames when the second machine 720 subsequently extracts encoding information.

Meanwhile, when transcoding is performed, the first machine 710 and the second machine 720 may acquire and decode the bitstream of the source video and then perform multipass-level parallel encoding.

Compared with the conventional approach, in which performing multiple passes on a single machine inevitably increases encoding complexity significantly, multipass-level encoding according to the present disclosure processes each pass in parallel on a different machine. This makes it possible to improve encoding efficiency while keeping the complexity of each machine unchanged.

Although each flowchart according to the present embodiment describes the steps as being executed sequentially, the embodiment is not necessarily limited thereto. In other words, the steps described in a flowchart may be reordered, or one or more steps may be executed in parallel, so the flowcharts are not limited to a time-series order.

Various implementations of the systems and techniques described herein may be realized in digital electronic circuitry, integrated circuits, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation as one or more computer programs executable on a programmable system. The programmable system includes at least one programmable processor (which may be a special-purpose or a general-purpose processor) coupled to receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device. Computer programs (also known as programs, software, software applications, or code) contain instructions for a programmable processor and are stored on a "computer-readable recording medium".

The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Such a medium may be a non-volatile or non-transitory medium such as a ROM, CD-ROM, magnetic tape, floppy disk, memory card, hard disk, magneto-optical disk, or storage device, and may further include a transitory medium such as a carrier wave (for example, transmission over the Internet) or a data transmission medium. The computer-readable recording medium may also be distributed over network-connected computer systems so that computer-readable code is stored and executed in a distributed manner.

Various implementations of the systems and techniques described herein may be implemented by a programmable computer. Here, the computer includes a programmable processor, a data storage system (including volatile memory, non-volatile memory, another kind of storage system, or a combination thereof), and at least one communication interface. For example, the programmable computer may be one of a server, a network appliance, a set-top box, an embedded device, a computer expansion module, a personal computer, a laptop, a personal digital assistant (PDA), a cloud computing system, or a mobile device.

The above description merely illustrates the technical idea of the present embodiment, and a person of ordinary skill in the art to which the present embodiment belongs may make various modifications and variations without departing from its essential characteristics. Accordingly, the present embodiments are intended to explain, not to limit, the technical idea of the present embodiment, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiment should be interpreted according to the following claims, and all technical ideas within a scope equivalent thereto should be interpreted as falling within the scope of rights of the present embodiment.

100: video parallel encoding apparatus
110: master machine 120: worker machine
210: parallel control unit 220: DPB manager
222: external reference frame decoder
224: synchronized DPB
230: input unit 240: encoding module

Claims

A method performed by a video encoding apparatus for parallel encoding of a source video, the method comprising:
storing, for each of a plurality of worker machines, a picture decoded from a first bitstream generated by a higher-level worker machine in a decoded picture buffer (DPB);
obtaining, for each worker machine, an input picture to be encoded;
generating, for each worker machine, a second bitstream by encoding the input picture using pictures stored in the DPB as reference pictures; and
providing the second bitstream to a lower-level worker machine and storing a picture generated from the second bitstream in the DPB.

The method of claim 1, further comprising:
determining a hierarchical structure among pictures constituting a group of pictures (GOP) of the source video, and then, based on the hierarchical structure, providing information on the reference pictures and allocating the input picture to each worker machine; and
generating a bitstream for the source video by combining the second bitstreams.

The method of claim 2,
wherein the hierarchical structure is determined such that a layer including the input picture allocated to the lower-level worker machine refers to a layer including the input picture allocated to the higher-level worker machine.

The method of claim 2,
wherein the storing in the DPB comprises, when the information on the reference pictures is an index of a reference picture, decoding only the picture corresponding to the index from the first bitstream and storing it in the DPB.

The method of claim 1,
wherein, when transcoding is performed on the source video, a third bitstream in which the source video is encoded is obtained, and the input picture is obtained for each worker machine by decoding the third bitstream.

A video encoding apparatus for parallel encoding of a source video, the apparatus comprising a master machine and a plurality of worker machines,
wherein each of the plurality of worker machines comprises:
an external reference frame decoder for decoding a picture from a first bitstream generated by a higher-level worker machine;
a decoded picture buffer (DPB) for storing the picture;
an input unit for obtaining an input picture to be encoded; and
an encoding module that generates a second bitstream by encoding the input picture using pictures stored in the DPB as reference pictures, provides the second bitstream to a lower-level worker machine and the master machine, and stores a picture generated from the second bitstream in the DPB.

The video encoding apparatus of claim 6,
wherein the master machine determines a hierarchical structure among pictures constituting a group of pictures (GOP) of the source video, then, based on the hierarchical structure, provides information on the reference pictures and allocates the input picture to each of the plurality of worker machines, and generates a bitstream for the source video by combining the second bitstreams of the plurality of worker machines.

The video encoding apparatus of claim 7,
wherein the hierarchical structure is determined such that a layer including the input picture allocated to the lower-level worker machine refers to a layer including the input picture allocated to the higher-level worker machine.

A computer program stored on a computer-readable recording medium for executing each step of the method, performed by a video encoding apparatus for parallel encoding of a source video, according to any one of claims 1 to 5.