WO2020209437A1

WO2020209437A1 - Apparatus and method for transcoding segmented images in real time

Info

Publication number: WO2020209437A1
Application number: PCT/KR2019/005776
Authority: WO
Inventors: 장준환; 박우출; 김용화; 양진욱; 윤상필; 김현욱; 조은경; 최민수; 이준석; 양재영
Original assignee: Korea Electronics Technology Institute (전자부품연구원)
Priority date: 2019-04-09
Filing date: 2019-05-14
Publication date: 2020-10-15
Also published as: KR102316495B1; KR20200119435A

Abstract

An apparatus and a method for transcoding segmented images in real time are disclosed. An apparatus for transcoding segmented images, according to the present invention, comprises: an input unit for receiving an input of an original video stream; and a control unit for generating tiles by spatially segmenting the inputted original video stream, encoding the generated tiled frames in a parallel structure by using a plurality of graphics processing units (GPUs), and generating a first video stream having a first resolution, a second video stream having a second resolution that is lower than the first resolution, and a third video stream having a third resolution that is lower than the second resolution, by rearranging the encoded frames.

Description

Real-time segmented image transcoding apparatus and method

The present invention relates to a transcoding technology, and more particularly, to a real-time segmented image transcoding apparatus and method for transcoding tiles with high quality in real time using a plurality of GPUs (Graphics Processing Units).

Recently, various high-quality images have been provided to users. Among them, 360 VR images are stereoscopic and require a higher resolution (4096×4096 or higher) than a general 4K image (3840×2160). In other words, because a 360 VR image streams an image covering the full 360°, it requires more bandwidth than a 2D image representing a flat scene.

Moreover, 360 VR video not only requires a large amount of bandwidth; by its nature, viewers see only a part of the video at any given time rather than the entire video at once. Various studies have been conducted to address this problem, but a technology for streaming 360 VR images effectively without wasting bandwidth has not yet been developed.

An object of the present invention is to provide a real-time segmented image transcoding apparatus and method for spatially segmenting an original video stream and transcoding the segmented tiles in real time.

In order to achieve the above object, the real-time segmented image transcoding apparatus of the present invention includes an input unit that receives an original video stream, and a control unit that generates tiles by spatially dividing the input original video stream, encodes the frames of the generated tiles (tiled frames) in a parallel structure using a plurality of GPUs (Graphics Processing Units), and rearranges the encoded frames to generate a first video stream having a first resolution, a second video stream having a second resolution lower than the first resolution, and a third video stream having a third resolution lower than the second resolution.

In addition, the control unit includes: an image space division unit that generates tiles by dividing the original video stream into a preset number of tiles; a GPU task management unit that calculates an amount of work related to the frames of the generated tiles and allocates tasks to the plurality of GPUs according to the calculated amount of work; a GPU unit that has the plurality of GPUs in a parallel structure and encodes the video stream for the task assigned to each GPU; and a video post-processing unit that synchronizes the encoded video streams and rearranges the synchronized video streams to generate the first video stream, the second video stream, and the third video stream.

In addition, the image space division unit divides the original video stream so that the number of horizontal and vertical pixels of each tile is a multiple of 128.

In addition, the image space division unit does not limit the number of pixels for the last horizontal tile at the bottom and the last vertical tile at the right among the tiles.

In addition, the GPU task management unit may allocate the task according to the average task completion time of each GPU and the size of the assigned task queue.

In addition, the GPU task management unit may predict the task completion time of each GPU based on the average task time for each task type of that GPU, and allocate the task accordingly.

In addition, for each task, the GPU task management unit sequentially copies as many tile frames as the GOP (Group of Pictures) size to the GPU.

In addition, the GPU task management unit may additionally transmit frame number information when delivering task-related information to each GPU.

In addition, the video post-processing unit may include multiplexers corresponding to the first video stream, the second video stream, and the third video stream.

The real-time tile transcoding method according to the present invention includes the steps of: receiving, by a segmented image transcoding apparatus, an original video stream; generating, by the segmented image transcoding apparatus, tiles by spatially dividing the input original video stream; encoding, by the segmented image transcoding apparatus, the frames of the generated tiles in a parallel structure using a plurality of GPUs; and rearranging, by the segmented image transcoding apparatus, the encoded frames to generate a first video stream having a first resolution, a second video stream having a second resolution lower than the first resolution, and a third video stream having a third resolution lower than the second resolution.

The apparatus and method for real-time divided image transcoding of the present invention may spatially divide an original video stream and transcode the divided tiles in a parallel structure through a plurality of GPUs.

At this time, by assigning jobs to a plurality of GPUs according to the average job completion time of each GPU and the size of the allocated job queue, a high-quality video stream can be provided in real time by performing a fast operation.

FIG. 1 is a block diagram illustrating a split image transcoding apparatus according to an embodiment of the present invention.

FIG. 2 is a schematic diagram for explaining the entire process of driving a split image transcoding apparatus according to an embodiment of the present invention.

FIG. 3 is a view for explaining a work management process according to an embodiment of the present invention.

FIG. 4 is a diagram for explaining task assignment according to an embodiment of the present invention.

FIG. 5 is a view for explaining a post-processing process according to an embodiment of the present invention.

FIG. 6 is a flowchart illustrating a split image transcoding method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In adding reference numerals to the elements of each drawing, note that the same elements are given the same numerals wherever possible, even when they appear in different drawings. In addition, in describing the present invention, a detailed description of a related known configuration or function will be omitted when it is apparent to those skilled in the art or may obscure the subject matter of the present invention.

FIG. 1 is a block diagram illustrating a split image transcoding apparatus according to an embodiment of the present invention, and FIG. 2 is a schematic diagram illustrating an entire process of driving a divided image transcoding apparatus according to an embodiment of the present invention.

Referring to FIGS. 1 and 2, the split image transcoding apparatus 100 spatially divides an original video stream and transcodes the divided tiles in real time. The split image transcoding apparatus 100 includes an input unit 10 and a control unit 30.

The input unit 10 receives an original video stream. The input unit 10 may receive the original video stream in various ways, such as through a file, a network, or an application program interface (API). Here, the original video stream may be a 4K stereo, high-definition video stream of 4096×4096 px or more, and the H.264 format, the HEVC (High Efficiency Video Coding) format, the YUV420 raw frame format, the RGB raw frame format, and the like may be supported.

The control unit 30 generates tiles by spatially dividing the original video stream input from the input unit 10. The control unit 30 encodes the frames of the generated tiles in a parallel structure using a plurality of GPUs. The control unit 30 rearranges the encoded frames to generate a first video stream having a first resolution, a second video stream having a second resolution lower than the first resolution, and a third video stream having a third resolution lower than the second resolution. Here, the first resolution may mean a high resolution of high quality (HQ), the second resolution may mean a normal resolution of middle quality (MQ), and the third resolution may mean a low resolution of low quality (LQ). The control unit 30 includes an image space division unit 31, a GPU task management unit 33, a GPU unit 35, and a video post-processing unit 37.

The image space division unit 31 generates tiles by dividing the original video stream into a preset number of tiles. The image space division unit 31 divides the original video stream into tiles so that the numbers of tiles in the horizontal and vertical directions are even. For example, the image space division unit 31 may divide the width and height into 6×6, 6×8, 8×8, 8×12, 12×12 tiles, or the like. In addition, the image space division unit 31 divides the stream so that the number of vertical and horizontal pixels of each tile is a multiple of 128. For example, the width and height of each tile may be 256×256, 256×512, 512×512, or the like. In this case, the image space division unit 31 may leave the number of pixels unrestricted for the last horizontal tile at the bottom and the last vertical tile at the right. Through this, the image space division unit 31 may flexibly divide the image into a plurality of tiles. Meanwhile, the image space division unit 31 performs only a logical division of the original video stream and does not move any data.
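As a non-limiting illustration, the logical division described above can be sketched as follows. The names `TileRect` and `split_into_tiles` are hypothetical and do not appear in the embodiment; the sketch only computes tile coordinates and, consistent with the description, copies no pixel data. It assumes the caller chooses a tile size that is a multiple of 128 and lets the rightmost column and bottom row take whatever pixels remain.

```python
from dataclasses import dataclass
from typing import List

TILE_ALIGN = 128  # tile width/height must be a multiple of this value


@dataclass(frozen=True)
class TileRect:
    """Logical tile: an ID plus a rectangle inside the original frame."""
    tile_id: int
    x: int
    y: int
    width: int
    height: int


def split_into_tiles(frame_w: int, frame_h: int,
                     tile_w: int, tile_h: int) -> List[TileRect]:
    """Return tile rectangles in row-major order; no pixel data is moved."""
    if tile_w <= 0 or tile_h <= 0:
        raise ValueError("tile size must be positive")
    if tile_w % TILE_ALIGN or tile_h % TILE_ALIGN:
        raise ValueError("tile size must be a multiple of 128")
    tiles: List[TileRect] = []
    for y in range(0, frame_h, tile_h):
        for x in range(0, frame_w, tile_w):
            # The last column and the last row keep the remaining pixels,
            # so their size is not restricted to a multiple of 128.
            w = min(tile_w, frame_w - x)
            h = min(tile_h, frame_h - y)
            tiles.append(TileRect(len(tiles), x, y, w, h))
    return tiles
```

For a 4096×4096 frame and 512×512 tiles this yields an 8×8 grid of 64 equal tiles; for a 3840×2160 frame the last column is 256 pixels wide and the last row 112 pixels high.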

The GPU task management unit 33 calculates the amount of work related to the frames of the tiles generated by the image space division unit 31. The GPU task management unit 33 allocates tasks to a plurality of GPUs according to the calculated amount of work. Here, a task means an encoding operation performed through a GPU. For example, the GPU task management unit 33 may allocate tasks according to the average task completion time of each GPU and the size of the assigned task queue. In addition, the GPU task management unit 33 may predict the task completion time of each GPU based on the average task time for each task type of that GPU, and allocate the task accordingly.

The GPU unit 35 includes a plurality of GPUs. For example, the GPU unit 35 may include a first GPU, a second GPU, and so on up to an n-th GPU. Preferably, the GPU unit 35 includes GPUs of the same specification to facilitate compatibility between the GPUs, but it is not limited thereto and may include GPUs of different specifications depending on the operating environment. The GPU unit 35 has the plurality of GPUs in a parallel structure, and each GPU encodes the video stream for the task allocated to it by the GPU task management unit 33.

The video post-processing unit 37 synchronizes the video streams encoded by the GPU unit 35 and rearranges the synchronized video streams. The video post-processing unit 37 generates a first video stream, a second video stream, and a third video stream through the rearrangement. In this case, the video post-processing unit 37 may include respective multiplexers corresponding to the first video stream, the second video stream, and the third video stream.

FIG. 3 is a diagram for explaining a task management process according to an embodiment of the present invention, and FIG. 4 is a diagram for explaining task assignment according to an embodiment of the present invention. FIG. 4(a) is a diagram showing the status of the existing work buffer for each GPU, FIG. 4(b) is a diagram showing a new job, and FIG. 4(c) is a diagram showing the buffer status after the new job has been allocated to each GPU.

Referring to FIGS. 2 to 4, the GPU task management unit 33 includes a frame buffer 51, a work queue loader 53, and a load balancer 55.

The frame buffer 51 stores the tile frames produced by the logical division in the image space division unit 31. The frame buffer 51 temporarily holds the tile frames before they are transferred to the work queue loader 53.

The work queue loader 53 calculates the amount of work related to the tile frames stored in the frame buffer 51, and allocates work to the GPU unit 35 according to the calculated amount of work. The work queue loader 53 may predict the work completion time of each GPU based on the average work time for each work type (HQ/MQ/LQ) of that GPU, and allocate the work accordingly. Here, the work queue loader 53 may generate two HQ/MQ work commands for a frame of one tile and an LQ work command for all frames.

In detail, the work queue loader 53 calculates the average work completion time of each GPU and the size of the allocated work queue, and allocates work to each GPU using the calculated information. For example, when a frame of a new tile is input, the work queue loader 53 updates the estimated average work time for each GPU. At this time, the work queue loader 53 receives the update information from the load balancer 55. The work queue loader 53 sorts the GPUs in ascending order of work time and allocates the frame of the newly input tile to the GPU with the earliest work completion time. If tiles remain after this allocation, the work queue loader 53 repeats the above process to allocate work for the remaining tiles.
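The allocation loop described above may be sketched, purely for illustration, as a greedy assignment. The function name `assign_tiles` and the cost model (predicted completion time = queued jobs plus the new job, multiplied by the average job time of that GPU) are assumptions made for this sketch; the embodiment states only that the average completion time and the queue size are used.

```python
from typing import Dict, Hashable, Iterable, List


def assign_tiles(tile_ids: Iterable[Hashable],
                 avg_job_time: Dict[int, float],
                 queue_len: Dict[int, int]) -> Dict[int, List[Hashable]]:
    """Give each tile to the GPU predicted to finish it earliest.

    avg_job_time: GPU id -> average seconds per job (from the load balancer).
    queue_len:    GPU id -> number of jobs already waiting on that GPU.
    """
    pending = dict(queue_len)  # work on a copy; the caller's dict is untouched
    assignment: Dict[int, List[Hashable]] = {g: [] for g in avg_job_time}
    for tile in tile_ids:
        # Assumed cost model: every queued job plus this one takes the
        # GPU's average job time. Ties are broken by the lower GPU id.
        gpu = min(avg_job_time,
                  key=lambda g: ((pending[g] + 1) * avg_job_time[g], g))
        assignment[gpu].append(tile)
        pending[gpu] += 1  # the queue grows, so the next prediction changes
    return assignment
```

With `avg_job_time = {0: 1.0, 1: 2.0}` and `queue_len = {0: 2, 1: 0}`, tiles 10 to 13 are distributed as `{0: [11, 12], 1: [10, 13]}`: the idle but slower GPU 1 takes the first tile, after which the faster GPU 0 catches up.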

The work queue loader 53 may not only assign work to the GPU unit 35 but also copy the tile frames to the corresponding GPU. At this time, for each job, the work queue loader 53 may sequentially copy as many tile frames as the GOP (Group of Pictures) size to the GPU.

In addition, the work queue loader 53 may additionally transmit frame number information when transmitting job-related information to the GPU. Here, the frame number information means time information.
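As an illustrative sketch of the two points above, the frames of one tile can be grouped into GOP-sized jobs, each tagged with the number of its first frame so that the encoded output can later be placed on the timeline. `EncodeJob` and `make_gop_jobs` are hypothetical names, and the actual host-to-GPU copy is outside the scope of the sketch.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence


@dataclass
class EncodeJob:
    tile_id: int
    quality: str             # "HQ", "MQ" or "LQ"
    first_frame_number: int  # frame number information (time information)
    frames: List[bytes]      # at most one GOP worth of tile frames


def make_gop_jobs(tile_id: int, quality: str,
                  frames: Sequence[bytes], gop_size: int) -> Iterator[EncodeJob]:
    """Yield one job per GOP; the last job may hold fewer frames."""
    if gop_size <= 0:
        raise ValueError("gop_size must be positive")
    for start in range(0, len(frames), gop_size):
        yield EncodeJob(tile_id, quality, start,
                        list(frames[start:start + gop_size]))
```

For 70 frames and a GOP size of 30, this produces three jobs starting at frames 0, 30, and 60, holding 30, 30, and 10 frames respectively.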

The load balancer 55 receives the current work status of each GPU from the GPU unit 35. The load balancer 55 calculates an average working time according to the type of work of each GPU by using the received information. The load balancer 55 transmits the calculated average work time to the work queue loader 53 so that the work queue loader 53 can use the information to perform work allocation.
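One possible way to maintain these averages is sketched below, assuming each GPU reports the elapsed time of every finished job. The class name `LoadBalancer` and its methods are hypothetical, and a plain arithmetic mean is an assumption; the embodiment does not specify how the average is computed.

```python
from collections import defaultdict
from typing import Dict, Tuple


class LoadBalancer:
    """Tracks the mean job time per (GPU id, job type) pair."""

    def __init__(self) -> None:
        self._total: Dict[Tuple[int, str], float] = defaultdict(float)
        self._count: Dict[Tuple[int, str], int] = defaultdict(int)

    def report(self, gpu_id: int, job_type: str, seconds: float) -> None:
        """Record the measured duration of one finished job."""
        key = (gpu_id, job_type)
        self._total[key] += seconds
        self._count[key] += 1

    def average(self, gpu_id: int, job_type: str, default: float = 0.0) -> float:
        """Return the mean duration, or `default` if nothing was reported."""
        key = (gpu_id, job_type)
        count = self._count.get(key, 0)
        return self._total[key] / count if count else default
```

The work queue loader would then query `average(gpu_id, job_type)` for each GPU before allocating a new job.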

Referring to FIGS. 2 and 5, the video post-processing unit 37 includes a video synchronization unit 71 and a multiplexer unit 73.

The video synchronization unit 71 synchronizes the video streams encoded by the GPU unit 35. Here, because of load balancing, the encoded video streams may not be produced in sequential order. The video synchronization unit 71 therefore first receives each encoded result and stores it in a buffer for each tile ID. The video synchronization unit 71 then transmits the corresponding frame to the multiplexer unit 73 once all tiles for the same frame time have been encoded.
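The buffering behavior described above can be illustrated with the following sketch. `VideoSynchronizer` and `push` are hypothetical names, and keying the buffer first by frame number and then by tile ID is an implementation assumption made for brevity. The sketch releases a frame as soon as its last tile arrives and does not by itself enforce ascending frame order.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class VideoSynchronizer:
    """Collects encoded tiles until every tile of a frame time is present."""

    def __init__(self, tile_count: int) -> None:
        self.tile_count = tile_count
        # frame number -> {tile id -> encoded payload}
        self._pending: Dict[int, Dict[int, bytes]] = defaultdict(dict)

    def push(self, frame_number: int, tile_id: int,
             payload: bytes) -> Optional[List[bytes]]:
        """Store one encoded tile.

        Returns the payloads of the whole frame, ordered by tile ID, once
        the last missing tile arrives; otherwise returns None.
        """
        tiles = self._pending[frame_number]
        tiles[tile_id] = payload
        if len(tiles) < self.tile_count:
            return None
        del self._pending[frame_number]
        return [tiles[i] for i in sorted(tiles)]
```

Tiles may arrive in any order and interleaved across frame times; only complete frames are handed on to the multiplexer.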

The multiplexer unit 73 rearranges the frames transmitted from the video synchronization unit 71 to form a single media container. Here, the media container may be in MP4 or TS form. In the MP4 case, frames are collected for a certain unit time (3 sec, 5 sec, etc.) according to preset settings and then delivered through a file, network, or API; in the TS case, frames are delivered through a file, network, or API as soon as the work for one frame time is completed. In addition, the multiplexer unit 73 generates a first video stream having a first resolution, a second video stream having a second resolution lower than the first resolution, and a third video stream having a third resolution lower than the second resolution. To this end, the multiplexer unit 73 includes multiplexers 91, 93, and 95 corresponding to the first video stream, the second video stream, and the third video stream, respectively. Here, the first resolution may mean a high-quality high resolution, the second resolution may mean a medium-quality normal resolution, and the third resolution may mean a low-quality low resolution.
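The difference between the two delivery modes can be illustrated with a small buffering sketch. `DeliveryBuffer` is a hypothetical name, and the sketch only models when frames are released; it does not build a real MP4 or TS container. The default frame rate of 30 fps is an assumption chosen for the example.

```python
from typing import List, Optional


class DeliveryBuffer:
    """Releases frames per frame time ("ts") or per fixed-length segment ("mp4")."""

    def __init__(self, container: str, fps: int = 30,
                 segment_seconds: int = 3) -> None:
        if container not in ("mp4", "ts"):
            raise ValueError("container must be 'mp4' or 'ts'")
        # TS mode forwards each frame immediately; MP4 mode waits until a
        # whole segment (fps * segment_seconds frames) has been collected.
        self.frames_per_unit = 1 if container == "ts" else fps * segment_seconds
        self._buffer: List[bytes] = []

    def add_frame(self, frame: bytes) -> Optional[List[bytes]]:
        """Add one synchronized frame; return a unit ready for delivery or None."""
        self._buffer.append(frame)
        if len(self._buffer) < self.frames_per_unit:
            return None
        unit, self._buffer = self._buffer, []
        return unit
```

One such buffer per output quality (HQ, MQ, LQ) would mirror the three multiplexers 91, 93, and 95.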

Referring to FIGS. 1 and 6, in a split image transcoding method, an original video stream is spatially divided and the divided tiles are transcoded in a parallel structure through a plurality of GPUs. In this case, the split image transcoding method allocates jobs to a plurality of GPUs according to the average job completion time of each GPU and the size of the allocated job queue, thereby performing a fast operation to provide a high-quality video stream in real time.

In step S110, the split image transcoding apparatus 100 receives an original video stream. The split image transcoding apparatus 100 may receive the original video stream in various ways, such as through a file, a network, or an application program interface (API). Here, the original video stream may be a 4K stereo, high-definition video stream of 4096×4096 px or more, and the H.264 format, the HEVC (High Efficiency Video Coding) format, the YUV420 raw frame format, the RGB raw frame format, and the like may be supported.

In step S130, the split image transcoding apparatus 100 generates tiles by spatially dividing the input original video stream. The split image transcoding apparatus 100 generates tiles by dividing the original video stream into a preset number of tiles. The split image transcoding apparatus 100 may divide the original video stream into tiles so that the numbers of tiles in the horizontal and vertical directions are even, and so that the number of vertical and horizontal pixels of each tile is a multiple of 128. In this case, the split image transcoding apparatus 100 may leave the number of pixels unrestricted for the last horizontal tile at the bottom and the last vertical tile at the right.

In step S150, the split image transcoding apparatus 100 encodes the frame of the generated tile in a parallel structure using a plurality of GPUs. The split image transcoding apparatus 100 calculates an amount of work related to the frame of the generated tile, and performs encoding by allocating work to a plurality of GPUs according to the calculated amount of work. Through this, the split image transcoding apparatus 100 may perform encoding optimized to fit the working state of the GPU in a parallel structure.

In step S170, the split image transcoding apparatus 100 rearranges the encoded frames. The split image transcoding apparatus 100 may synchronize the encoded video streams and rearrange the synchronized video streams. In this case, the split image transcoding apparatus 100 generates a first video stream having a first resolution, a second video stream having a second resolution lower than the first resolution, and a third video stream having a third resolution lower than the second resolution. Here, the first resolution may mean a high-quality high resolution, the second resolution may mean a medium-quality normal resolution, and the third resolution may mean a low-quality low resolution.

Although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described. Anyone of ordinary skill in the technical field to which the present invention pertains can implement various modifications without departing from the gist of the present invention as claimed, and such modifications fall within the scope of the claims.

[Explanation of reference numerals]

10: input unit

30: control unit

31: image space division unit

33: GPU task management unit

35: GPU unit

37: video post-processing unit

51: frame buffer

53: work queue loader

55: load balancer

71: video synchronization unit

73: multiplexer unit

91: first multiplexer

93: second multiplexer

Claims

A real-time segmented image transcoding apparatus comprising:

an input unit that receives an original video stream; and

a control unit that generates tiles by spatially dividing the input original video stream, encodes the frames of the generated tiles (tiled frames) in a parallel structure using a plurality of GPUs (Graphics Processing Units), and rearranges the encoded frames to generate a first video stream having a first resolution, a second video stream having a second resolution lower than the first resolution, and a third video stream having a third resolution lower than the second resolution.
The apparatus of claim 1,

wherein the control unit comprises:

an image space division unit that generates tiles by dividing the original video stream into a preset number of tiles;

a GPU task management unit that calculates an amount of work related to the frames of the generated tiles and allocates tasks to the plurality of GPUs according to the calculated amount of work;

a GPU unit that has the plurality of GPUs in a parallel structure and encodes the video stream for the task allocated to each GPU; and

a video post-processing unit that synchronizes the encoded video streams and rearranges the synchronized video streams to generate the first video stream, the second video stream, and the third video stream.
The apparatus of claim 2,

wherein the image space division unit divides the original video stream so that the number of horizontal and vertical pixels of each tile is a multiple of 128.
The apparatus of claim 3,

wherein the image space division unit does not limit the number of pixels for the last horizontal tile at the bottom and the last vertical tile at the right among the tiles.
The apparatus of claim 2,

wherein the GPU task management unit allocates the task according to the average task completion time of each GPU and the size of the assigned task queue.
The apparatus of claim 2,

wherein the GPU task management unit predicts the task completion time of each GPU based on the average task time for each task type of that GPU, and allocates the task accordingly.
The apparatus of claim 2,

wherein, for each task, the GPU task management unit sequentially copies as many tile frames as the GOP (Group of Pictures) size to the GPU.
The apparatus of claim 2,

wherein the GPU task management unit additionally transmits frame number information when transmitting information related to the task to each GPU.
The apparatus of claim 2,

wherein the video post-processing unit includes multiplexers corresponding respectively to the first video stream, the second video stream, and the third video stream.
A real-time tile transcoding method comprising:

receiving, by a split image transcoding apparatus, an original video stream;

generating, by the split image transcoding apparatus, tiles by spatially dividing the input original video stream;

encoding, by the split image transcoding apparatus, the frames of the generated tiles in a parallel structure using a plurality of GPUs; and

rearranging, by the split image transcoding apparatus, the encoded frames to generate a first video stream having a first resolution, a second video stream having a second resolution lower than the first resolution, and a third video stream having a third resolution lower than the second resolution.