KR101063424B1

KR101063424B1 - Video data processing device and method

Info

Publication number: KR101063424B1
Application number: KR1020090008184A
Authority: KR
Inventors: 박성원; 권구현; 윤기욱
Original assignee: 주식회사 코아로직
Priority date: 2009-02-02
Filing date: 2009-02-02
Publication date: 2011-09-07
Also published as: KR20100088998A

Abstract

Disclosed are a video data processing apparatus and method. The video data processing apparatus includes: a slice distributor that receives a bit stream, demultiplexes it, and divides it into a plurality of independently processable slices; a video decoder including a plurality of L-th order (L is an integer of 1 or more) data processing blocks, which sequentially process the slices distributed by the slice distributor in units of the macroblocks contained in each slice, and a pipeline structure connected to the plurality of L-th order data processing blocks; and a multiplexer that multiplexes the data output from the video decoder. This improves the utilization efficiency of the data processing blocks and prevents degradation of video decoder performance caused by a data processing block with a long processing time.

Slice, Macroblock, Variable Length Decoder, Multiplexer, Demultiplexer

Description

Apparatus and Method for Processing Video Data

The present invention relates to an apparatus and method for processing video data, and more particularly to a technique that improves the utilization efficiency of data processing blocks by dividing a bit stream into independently processable slices and processing them in parallel at the data processing block stage that forms the bottleneck, for example the VLD stage.

In general, conventional video decoders were developed on the assumption of a single-thread/single-core computation model, that is, a model based on a single computation unit. Such a model must perform sequential operations to decode compressed video data, so the only way to speed up an application is to raise the operating frequency of the computation unit, that is, the hardware processor. Raising the operating frequency, however, not only complicates the processor's structure but also increases its power consumption.

Accordingly, multi-thread/multi-core processors that use a plurality of computation units have recently been developed. A multiple-computation-unit structure can be implemented in various forms, but the most common are a combination of a general-purpose RISC (Reduced Instruction Set Computer) processor with a high-speed DSP (Digital Signal Processor), and an arrangement of unit cores, each performing a specific function, connected in an array.

In the former case (a RISC processor combined with a DSP), the general-purpose RISC processor handles ordinary operations and the DSP is used when high-speed operations such as multimedia processing are required. In the latter case (unit cores performing specific functions connected in an array), operations suited to the process are assigned to each unit core, and the unit cores either perform their operations individually or cooperate with one another to perform operations in a pipeline.

FIG. 1 is an exemplary diagram illustrating the configuration of a general multi-core based video decoder.

As shown in FIG. 1, the multi-core based video decoder 10 has a pipeline in which a plurality of data processing blocks are connected in series, for example a variable length decoder (VLD) 11, an inverse quantization / inverse discrete cosine transform unit (IQ/IDCT) 12, a motion compensator (MC) 13, and a deblocking filter (DF) 14. Each data processing block may be a core with a specific data processing function.

Each data processing block processes data in units of macroblocks (MB), and because the blocks work on successive macroblocks at the same time, the cores can run in parallel. Processing efficiency is therefore higher than that of a single-core based video decoder.

In this pipeline structure, the output of one data processing block is the input of the next. Consequently, in a pipelined video decoder, a data processing block with a slow processing speed becomes a bottleneck and lowers the processing speed of the entire video decoder.

For example, in the video decoder 10 shown in FIG. 1, the VLD 11 has a larger computational load than the other data processing blocks 12, 13, and 14, and its work cannot be split and computed in parallel because of its dependence on previously decoded information, so it may be the bottleneck of the video decoder 10. The processing time of the VLD 11 is therefore the key factor determining the processing time of the video decoder 10.

FIG. 2 is an exemplary diagram illustrating the operating intervals of each data processing block of the video decoder 10 shown in FIG. 1. It shows which data processing blocks are in use at each point in time, assuming that the processing time of each data processing block 12, 13, and 14 other than the VLD 11 in the pipeline is a unit time T and that the processing time of the VLD 11 is 3T.

Referring to FIGS. 1 and 2, the VLD 11 operates for 3T, from T0 to T3, to process macroblock MB(n), and then operates in 3T units from T3 to T6 and from T6 to T9 to process macroblocks MB(n+1) and MB(n+2) in sequence.

The macroblock MB(n) processed by the VLD 11 is passed to the IQ/IDCT 12, which operates for 1T, from T3 to T4, to process MB(n) and then waits for 2T until the VLD 11 finishes processing the next macroblock, MB(n+1). Likewise, after processing MB(n), the MC 13 and the DF 14 each wait for 2T before processing MB(n+1) because of the processing delay of the VLD 11.

The thick double-headed arrows in FIG. 2 indicate the idle time of the data processing blocks 12, 13, and 14 other than the VLD 11 that is caused by the processing delay of the VLD 11. When processing successive macroblocks, the IQ/IDCT 12, the MC 13, and the DF 14 each sit idle for 2T because of the processing delay of the VLD 11, the bottleneck.
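The timing described for FIG. 2 can be reproduced with a small simulation. The sketch below is illustrative only: the function names `conventional_schedule` and `idle_gaps` are hypothetical, time is measured in multiples of T, and the model assumes that each downstream block starts as soon as its input is ready.

```python
def conventional_schedule(num_mbs, vld_time=3, stage_time=1,
                          downstream=("IQ/IDCT", "MC", "DF")):
    """Return {stage: [(mb_index, start, end), ...]} for a single-VLD pipeline.

    Times are in units of T. The single VLD decodes macroblocks back to back;
    every downstream stage waits for the output of the stage before it.
    """
    schedule = {"VLD": []}
    for name in downstream:
        schedule[name] = []
    for n in range(num_mbs):
        start = n * vld_time
        end = start + vld_time
        schedule["VLD"].append((n, start, end))
        ready = end  # time at which MB(n) becomes available to the next stage
        for name in downstream:
            prev_end = schedule[name][-1][2] if schedule[name] else 0
            s = max(ready, prev_end)
            e = s + stage_time
            schedule[name].append((n, s, e))
            ready = e
    return schedule


def idle_gaps(schedule, stage):
    """Idle time of a stage between consecutive macroblocks, in units of T."""
    slots = schedule[stage]
    return [nxt[1] - cur[2] for cur, nxt in zip(slots, slots[1:])]


if __name__ == "__main__":
    sched = conventional_schedule(3)
    for stage in ("IQ/IDCT", "MC", "DF"):
        print(stage, sched[stage], "idle gaps:", idle_gaps(sched, stage))
```

Under these assumptions every block after the VLD works for 1T and then waits 2T, which matches the idle intervals the text attributes to the arrows in FIG. 2.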

Thus, a conventional multi-core based video decoder has the problem that the remaining data processing blocks are under-utilized because of the processing delay at the bottleneck, for example the VLD. A technique that raises core utilization, and thereby the processing performance of the video decoder, is therefore urgently needed.

The problem to be solved by the present invention is to provide a video data processing apparatus and method that divide a bit stream into independently processable slices and process them in parallel at the bottleneck, thereby preventing degradation of video decoder performance caused by a data processing block with a long processing time.

To solve this technical problem, the present invention provides, in one aspect, a video data processing apparatus. The video data processing apparatus includes: a slice distributor that receives a bit stream, demultiplexes the received bit stream, and divides it into a plurality of independently processable slices; a video decoder including a plurality of L-th order (L is an integer of 1 or more) data processing blocks that sequentially process the slices distributed by the slice distributor in units of the macroblocks contained in each slice, and a pipeline structure connected to the plurality of L-th order data processing blocks; and a multiplexer that multiplexes the data output from the video decoder.

The slice distributor may distribute the plurality of divided slices to the respective L-th order data processing blocks. When the number of divided slices is greater than the number of L-th order data processing blocks, the slice distributor may distribute one slice to each L-th order data processing block and then distribute the remaining slices one at a time to the L-th order data processing blocks in the order in which they finish processing their assigned slices.

If L is 1, the pipeline structure connected to the plurality of L-th order data processing blocks may include a pipelined structure running from a second-order data processing block, connected to the L-th order data processing blocks, to a K-th order (K is an integer of 2 or more) data processing block.

If L is greater than 1, the pipeline structure connected to the plurality of L-th order data processing blocks may include a pipelined structure running from a first-order data processing block to an (L-1)-th order data processing block. In this case, the pipeline structure may further include a pipelined structure that is connected after the L-th order data processing blocks and runs from an (L+1)-th order data processing block to a K-th order (K is an integer of L+1 or more) data processing block.

Each L-th order data processing block takes longer to process a macroblock unit of data than each data processing block included in the pipeline structure.

The plurality of L-th order data processing blocks may include a plurality of variable length decoders (VLDs). The pipeline structure connected to the plurality of L-th order data processing blocks may include: an inverse quantization / inverse discrete cosine transform unit (IQ/IDCT) that sequentially processes the macroblock-unit data processed by the plurality of VLDs; a motion compensator (MC) that sequentially processes the macroblock-unit data processed by the IQ/IDCT; and a deblocking filter (DF) that sequentially processes the macroblock-unit data processed by the motion compensator and outputs it to the multiplexer.

The pipeline structure includes a plurality of data processing blocks connected in series, and each L-th order data processing block and each data processing block included in the pipeline structure may be a core that performs a specific computation function.

In another aspect, the present invention provides a video data processing method to solve the technical problem described above. The video data processing method may include: receiving a bit stream in units of frames; dividing the received bit stream into a plurality of slices; distributing the plurality of divided slices to a plurality of L-th order (L is an integer of 1 or more) data processing blocks; processing the distributed slices using the plurality of L-th order data processing blocks, each of which sequentially processes at least one slice in units of macroblocks; and decoding the data processed by the plurality of L-th order data processing blocks.

When the number of divided slices is greater than the number of L-th order data processing blocks, the distributing step may include distributing one slice to each L-th order data processing block and then distributing the remaining slices one at a time to the L-th order data processing blocks in the order in which they finish processing their assigned slices.

The video data processing method may further include multiplexing the decoded data. The data processing time of each L-th order data processing block may be longer than that of the data processing blocks of other orders used for the decoding.

As described above, according to the present invention, a bit stream is divided into independently processable slices and processed in parallel at the data processing block stage that forms the bottleneck, for example the VLD stage. This improves the utilization efficiency of the data processing blocks and prevents degradation of video decoder performance caused by a data processing block with a long processing time.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily practice it. The preferred embodiments described below use specific technical terms for clarity. The invention is not limited to the specific terms chosen, however, and each specific term includes all technical synonyms that operate in a similar manner to achieve a similar purpose.

FIG. 3 is a block diagram showing the configuration of a video data processing apparatus according to a preferred embodiment of the present invention.

As shown in FIG. 3, the video data processing apparatus 1000 according to a preferred embodiment of the present invention includes a slice distributor 110, a video decoder 100, and a multiplexer 150.

The video decoder 100 may include N (N is an integer of 2 or more) first-order data processing blocks 121 to 12n, and a second-order data processing block 132 through a K-th order (K is an integer of 3 or more) data processing block 13k. The N first-order data processing blocks 121 to 12n and the second-order through K-th order data processing blocks 132 to 13k are connected in a pipeline structure. Each data processing block may be a core that performs its own computation function.

Although not shown, a plurality of pipelined data processing blocks may exist between the second-order data processing block 132 and the K-th order data processing block 13k. For example, when K is 5, a third-order data processing block is connected to the output of the second-order data processing block 132, a fourth-order data processing block is connected to the output of the third-order block, and a fifth-order data processing block (corresponding to 13k) is connected to the output of the fourth-order block. The output of the fifth-order data processing block (corresponding to 13k) may then be connected to the multiplexer 150.

The N first-order data processing blocks 121 to 12n and the second-order through K-th order data processing blocks 132 to 13k each process sequential data in units of macroblocks. Each first-order data processing block has a larger computational load than the other data processing blocks and therefore takes relatively longer. For example, if the second-order data processing block 132 and the K-th order data processing block 13k each take a predetermined unit time T to process their assigned operation, each first-order data processing block may take 3T.
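The effect of N on the later pipeline stages can be estimated with a simple steady-state rate argument. The sketch below is an assumption-laden illustration, not a formula from the patent: it supposes that the N first-order (primary) blocks are always busy, that each takes 3T per macroblock, and that each later stage takes T. The function name `downstream_utilization` is hypothetical.

```python
from fractions import Fraction


def downstream_utilization(num_primary, primary_time=3, stage_time=1):
    """Fraction of time a later pipeline stage is busy in steady state.

    num_primary blocks together deliver num_primary macroblocks every
    primary_time units, while one later stage can absorb one macroblock
    every stage_time units. Utilization is supply over capacity, capped at 1.
    """
    supply = Fraction(num_primary, primary_time)   # macroblocks per unit time
    capacity = Fraction(1, stage_time)             # macroblocks per unit time
    return min(supply, capacity) / capacity


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, downstream_utilization(n))
```

With the 3T and T figures used in the example, one primary block keeps the later stages busy a third of the time, and three primary blocks are enough to keep them fully busy; a fourth would add no throughput under this model.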

The slice distributor 110 receives encoded video data, that is, a bit stream, in units of frames, divides the received frame of the bit stream into slices, and distributes the slices to the N first-order data processing blocks 121 to 12n. The slice distributor 110 may include a buffer for receiving and storing the bit stream, a demultiplexer for demultiplexing the stored bit stream and dividing it into slices, and a control module that controls slice distribution in cooperation with the N first-order data processing blocks 121 to 12n.

During encoding, the received bit stream is encoded as M (M is an integer of 2 or more) independent slices per frame. Each slice contains a plurality of sequentially related macroblocks, and the slice header carries slice identification information identifying the slice and macroblock identification information identifying each macroblock.
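The slice layout just described can be modeled with a few plain data types. This is a minimal sketch under stated assumptions: the class and field names (`SliceHeader`, `slice_id`, `mb_ids`, and so on) are hypothetical, and a real bit stream would follow the syntax of the codec in use rather than this simplified structure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Macroblock:
    mb_id: int                 # macroblock identification information
    coded_bits: bytes = b""    # placeholder for the variable-length-coded payload


@dataclass
class SliceHeader:
    slice_id: int              # slice identification information
    mb_ids: List[int]          # identifiers of the macroblocks in this slice


@dataclass
class Slice:
    header: SliceHeader
    macroblocks: List[Macroblock] = field(default_factory=list)


def make_slice(slice_id: int, mb_count: int) -> Slice:
    """Build a slice whose header lists the IDs of its macroblocks."""
    mbs = [Macroblock(mb_id=k) for k in range(1, mb_count + 1)]
    return Slice(SliceHeader(slice_id, [mb.mb_id for mb in mbs]), mbs)


def split_frame(mb_counts: List[int]) -> List[Slice]:
    """Model a frame encoded as M independent slices (M = len(mb_counts))."""
    return [make_slice(i, count) for i, count in enumerate(mb_counts, start=1)]


if __name__ == "__main__":
    frame = split_frame([4, 2, 3])
    print([s.header for s in frame])
```

Keeping the identifiers in the header is what lets a later stage put decoded macroblocks back in frame order without inspecting the payload.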

Because slices are independent of one another, they can be processed separately. The slice distributor 110 can therefore demultiplex the received frame of the bit stream, divide it into M slices according to the slice identification information, and distribute the M slices to the N first-order data processing blocks 121 to 12n.

For example, when the number of slices equals the number of first-order data processing blocks 121 to 12n (M = N), the slice distributor 110 may divide the received frame of the bit stream into M slices and send one slice to each first-order data processing block. In this case the M slices are processed by the N first-order data processing blocks 121 to 12n in one-to-one correspondence.

When the number of slices is greater than the number of first-order data processing blocks 121 to 12n (M > N), the slice distributor 110 may send N of the M slices to the first-order data processing blocks, one each, and send the remaining M-N slices one at a time to the first-order data processing blocks in the order in which they finish processing. When the number of slices is smaller than the number of first-order data processing blocks 121 to 12n (M < N), the slice distributor 110 may send the M slices, one each, to M of the N first-order data processing blocks 121 to 12n.
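The three cases (M = N, M > N, M < N) can be handled by one rule: always hand the next slice to the block that becomes free first. The sketch below illustrates that rule with a priority queue. It is an illustration rather than the patented control logic; the name `distribute_slices` and the per-slice processing costs are hypothetical inputs.

```python
import heapq


def distribute_slices(slice_costs, num_blocks):
    """Assign slices to first-order blocks in order of block availability.

    slice_costs[i] is the (assumed known) processing time of slice i.
    Returns a list of (slice_index, block_index, start, end) tuples.
    Ties are broken in favor of the lower block index.
    """
    # Each heap entry is (time the block becomes free, block index).
    free_at = [(0, b) for b in range(num_blocks)]
    heapq.heapify(free_at)
    plan = []
    for i, cost in enumerate(slice_costs):
        t, b = heapq.heappop(free_at)          # block that finishes first
        plan.append((i, b, t, t + cost))
        heapq.heappush(free_at, (t + cost, b))
    return plan


if __name__ == "__main__":
    print(distribute_slices([3, 3, 3], 3))        # M = N: one slice per block
    print(distribute_slices([5, 3, 4, 2, 6], 3))  # M > N: leftovers go to early finishers
    print(distribute_slices([1, 1], 3))           # M < N: only M blocks are used
```

In the M > N example the first three slices go to blocks 0, 1, and 2; block 1 finishes first and takes the fourth slice, and block 2 finishes next and takes the fifth.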

Once the slice distributor 110 has distributed the slices, each first-order data processing block processes its assigned slice in units of macroblocks and passes the result to the second-order data processing block 132. The second-order through K-th order data processing blocks 132 to 13k decode the data by sequentially processing, along the pipeline, the macroblock-unit data received from the N first-order data processing blocks 121 to 12n.

The multiplexer 150 may multiplex the decoded data received from the K-th order data processing block 13k and then transmit it or store it in a frame buffer. For this operation the multiplexer 150 may use the macroblock identification information in the slice header. In other words, the multiplexer 150 may perform the inverse of the operation of the demultiplexer in the slice distributor 110.
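Because macroblocks from different slices reach the end of the pipeline interleaved, the multiplexer has to restore frame order. A minimal sketch of that reordering follows, assuming each decoded macroblock arrives tagged with a `(slice_id, mb_id)` pair taken from the slice header; the function name `multiplex` and the tuple layout are hypothetical.

```python
def multiplex(decoded, frame_layout):
    """Reassemble decoded macroblocks into frame order.

    decoded:      iterable of (slice_id, mb_id, payload) in arrival order.
    frame_layout: list of (slice_id, mb_id) pairs in the order they appear
                  in the frame.
    Returns the payloads in frame order; raises if a macroblock is missing.
    """
    by_key = {(slice_id, mb_id): payload for slice_id, mb_id, payload in decoded}
    missing = [key for key in frame_layout if key not in by_key]
    if missing:
        raise ValueError(f"macroblocks not yet decoded: {missing}")
    return [by_key[key] for key in frame_layout]


if __name__ == "__main__":
    arrival = [(1, 1, "a"), (2, 1, "c"), (1, 2, "b"), (2, 2, "d")]
    layout = [(1, 1), (1, 2), (2, 1), (2, 2)]
    print(multiplex(arrival, layout))
```

The example takes macroblocks that arrive interleaved across two slices and returns them slice by slice, which is the inverse of what the demultiplexer did on the way in.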

The video data processing apparatus 1000 according to a preferred embodiment of the present invention has been described above. Its structure is applicable to any system that uses a pipeline structure in a multi-core or multi-thread environment, so various other embodiments are possible. Another preferred embodiment of the present invention is described below.

FIG. 4 is a block diagram showing the configuration of a video data processing apparatus according to another preferred embodiment of the present invention.

As shown in FIG. 4, the video data processing apparatus 2000 according to another preferred embodiment of the present invention includes a slice distributor 210, a video decoder 200, and a multiplexer 260.

The video decoder 200 in this embodiment has three variable length decoders (VLDs) 221 to 223, namely VLD1 221, VLD2 222, and VLD3 223. Each of the VLDs 221 to 223 sequentially processes the slice assigned to it in units of macroblocks. For example, the VLDs 221 to 223 decode variable-length-coded DCT coefficients to produce decoded data blocks. On another path, they may also produce decoded motion vectors (MVs).

VLD1 221, VLD2 222, and VLD3 223 are connected to an inverse quantization / inverse discrete cosine transform unit (IQ/IDCT) 230. The IQ/IDCT 230 inversely quantizes the macroblock-unit data sent from each of VLD1 221, VLD2 222, and VLD3 223 to restore the actual DCT coefficient values, and then performs the inverse discrete cosine transform. This embodiment shows IQ and IDCT as a single data processing block, but this is not a limitation; depending on the implementation environment, IQ and IDCT may be configured as separate data processing blocks, that is, separate cores.

The IQ/IDCT 230 is connected to a motion compensator (MC) 240, and the output of the MC 240 is connected to a deblocking filter (DF) 250, which reduces blocking artifacts at macroblock boundaries. The output of the DF 250 is connected to the multiplexer 260.

Thus, the video decoder 200 is configured so that a plurality of cores, for example the three VLDs 221 to 223, the IQ/IDCT 230, the MC 240, and the DF 250, operate together in a pipeline structure. The slice distributor 210 and the multiplexer 260 are connected to the input and the output of the video decoder 200, respectively.

The slice distributor 210 receives encoded video data, that is, a bit stream, in units of frames and stores it temporarily, demultiplexes the stored bit stream to divide it into slices, and distributes the slices one each to VLD1 221, VLD2 222, and VLD3 223. Working with the VLDs 221 to 223, the slice distributor 210 may also control distribution in light of the relationship between the number of slices and the number of VLDs, differences in processing performance among the VLDs, the sizes of the slices, and so on.
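The text does not say how slice size and VLD performance would be weighed. One possible policy, shown here purely as an assumption, is a longest-first greedy heuristic: take slices from largest to smallest and give each to the VLD that would finish it earliest. The function name `size_aware_distribution`, the size units, and the relative speed figures are all hypothetical.

```python
def size_aware_distribution(slice_sizes, vld_speeds):
    """Greedy longest-first assignment of slices to VLDs of differing speed.

    slice_sizes[i]: size of slice i (for example, its macroblock count).
    vld_speeds[v]:  relative throughput of VLD v (size units per unit time).
    Returns (plan, makespan) where plan maps slice index -> VLD index and
    makespan is the time at which the last VLD finishes.
    """
    finish = [0.0] * len(vld_speeds)
    plan = {}
    largest_first = sorted(range(len(slice_sizes)), key=lambda i: -slice_sizes[i])
    for idx in largest_first:
        # Pick the VLD on which this slice would complete soonest.
        best = min(range(len(vld_speeds)),
                   key=lambda v: finish[v] + slice_sizes[idx] / vld_speeds[v])
        finish[best] += slice_sizes[idx] / vld_speeds[best]
        plan[idx] = best
    return plan, max(finish)


if __name__ == "__main__":
    plan, makespan = size_aware_distribution([30, 10, 20], [1, 1, 2])
    print(plan, makespan)
```

In the example the largest slice goes to the VLD that is twice as fast, and the two smaller slices go to the slower VLDs. Other policies are equally compatible with the description.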

The multiplexer 260 may multiplex the decoded data received from the DF 250 and then output it, for example by transmitting it or storing it in a frame buffer. For this operation the multiplexer 260 may use the macroblock identification information in the slice header. The multiplexer 260 can be regarded as a module that reverses the demultiplexing performed by the slice distributor 210.

FIG. 5 is a flowchart illustrating the detailed operating procedure of the video data processing apparatus 2000 shown in FIG. 4, and FIG. 6 is an exemplary diagram illustrating the structure of a frame of the bit stream received by the slice distributor 210.

Referring to FIGS. 4 to 6, the slice distributor 210 first receives the encoded video data, that is, the bit stream, in units of frames and may store it in a buffer (step S1). It is assumed here that the received bit stream was encoded with each frame divided into, for example, three independent slices.

As shown in FIG. 6, the frame includes three slices, namely slice 1, slice 2, and slice 3. Each of the three slices can be processed independently in the decoding process. Each slice includes a plurality of macroblocks that require sequential processing. Slice 1 includes p macroblocks, for example MB(1-1), MB(1-2), MB(1-3), …, MB(1-p). Slice 2 includes q macroblocks, for example MB(2-1), MB(2-2), MB(2-3), …, MB(2-q). Slice 3 likewise includes r macroblocks, for example MB(3-1), MB(3-2), MB(3-3), …, MB(3-r).

Here p, q, and r are each an integer of 2 or more, and they may all be equal or may differ. That is, the slices may be of the same size or of different sizes.

The slice distributor 210 demultiplexes the bit stream stored in the buffer to divide it into slice 1, slice 2, and slice 3 (step S2), and then distributes them one by one to the three VLDs 221 to 223 (step S3). For example, the slice distributor 210 may transmit slice 1 to VLD1 221, slice 2 to VLD2 222, and slice 3 to VLD3 223.

Each of the VLDs 221, 222, and 223 sequentially processes, in macroblock units, the slice distributed to it by the slice distributor 210 (step S4).

VLD1 221 sequentially processes the macroblocks of slice 1 transmitted from the slice distributor 210, from macroblock MB(1-1) to MB(1-p), and transmits each output value to the IQ/IDCT 230. For example, VLD1 221 may first process macroblock MB(1-1) of slice 1 and transmit its output value to the IQ/IDCT 230, then process the next macroblock MB(1-2) and transmit its output value to the IQ/IDCT 230, and then process the following macroblock MB(1-3) and transmit its output value to the IQ/IDCT 230.

VLD2 222 sequentially processes the macroblocks of slice 2 transmitted from the slice distributor 210, from macroblock MB(2-1) to MB(2-q), and transmits each output value to the IQ/IDCT 230. Similarly, VLD3 223 sequentially processes the macroblocks of slice 3 transmitted from the slice distributor 210, from macroblock MB(3-1) to MB(3-r), and transmits each output value to the IQ/IDCT 230.

The IQ/IDCT 230 sequentially processes the macroblock-unit output values transmitted from VLD1 221, VLD2 222, and VLD3 223 in the order in which they are received, and passes its output to the MC 240 (step S5). The MC 240, having received the output of the IQ/IDCT 230, performs motion compensation and passes its output to the DF 250 (step S6). The DF 250, having received the output of the MC 240, performs deblocking to remove blocking artifacts at macroblock boundaries (step S7). Decoded image data is thereby generated.

Subsequently, the multiplexer 260 multiplexes the decoded image data delivered from the DF 250 using the macroblock number information in the slice header (step S8), and then transmits it or stores it in the frame buffer (step S9).
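
The reordering performed in step S8 can be sketched as follows. This is only an illustration under stated assumptions, not the disclosed implementation: it assumes that each decoded macroblock arrives tagged with its slice and its position within that slice, and that each slice header yields the frame-level number of the slice's first macroblock (the hypothetical `first_mb` mapping). The function name `multiplex` is likewise chosen for the example.

```python
def multiplex(decoded, first_mb):
    """Reassemble decoded macroblocks into frame order.

    decoded:  iterable of (slice_id, index_in_slice, data) in arrival order
    first_mb: hypothetical mapping from slice_id to the frame-level number
              of that slice's first macroblock (read from the slice header)
    """
    frame = {}
    for slice_id, index_in_slice, data in decoded:
        # Frame-level macroblock number = slice offset + position in slice.
        frame[first_mb[slice_id] + index_in_slice] = data
    # Emit macroblocks in ascending macroblock-number order.
    return [frame[number] for number in sorted(frame)]


# Two macroblocks per slice, arriving interleaved from the shared stages.
arrival = [(1, 0, "MB(1-1)"), (2, 0, "MB(2-1)"), (3, 0, "MB(3-1)"),
           (1, 1, "MB(1-2)"), (2, 1, "MB(2-2)"), (3, 1, "MB(3-2)")]
ordered = multiplex(arrival, first_mb={1: 0, 2: 2, 3: 4})
print(ordered)
```

Because the shared stages receive macroblocks from the three VLDs in interleaved order, some such reordering is needed before a frame can be written out slice by slice.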

FIG. 7 is an exemplary diagram showing the operating interval of each data processing block during the initial operation of the video decoder 200 shown in FIG. 4. It shows the operating intervals of the data processing blocks over time, assuming that the processing time of each of the IQ/IDCT 230, the MC 240, and the DF 250 is a unit time T and that the processing time of each of VLD1 221, VLD2 222, and VLD3 223 is 3T.

Referring to FIG. 7, VLD1 221 operates for a time of 3T, from T0 to T3, to process macroblock MB(1-1), the first macroblock of slice 1 distributed to it. VLD2 222 operates for a time of 3T, from T0 to T3, to process macroblock MB(2-1), the first macroblock of slice 2 distributed to it. Likewise, VLD3 223 operates for a time of 3T, from T0 to T3, to process macroblock MB(3-1), the first macroblock of slice 3 distributed to it.

The IQ/IDCT 230 operates for a time of 1T, from T3 to T4, to process MB(1-1) processed by VLD1 221; from T4 to T5 it processes MB(2-1) processed by VLD2 222; and from T5 to T6 it processes MB(3-1) processed by VLD3 223. The unnecessary waiting state of 2T that occurred in the conventional video decoder (10 in FIG. 1) mentioned in the background section is therefore eliminated. Thereafter the IQ/IDCT 230 can likewise process MB(1-2) processed by VLD1 221, MB(2-2) processed by VLD2 222, and MB(3-2) processed by VLD3 223 without entering an idle state.

In the same manner, the MC 240 operates for a time of 1T, from T4 to T5, to process MB(1-1) processed by VLD1 221 and the IQ/IDCT 230; from T5 to T6 it processes MB(2-1) processed by VLD2 222 and the IQ/IDCT 230; and from T6 to T7 it processes MB(3-1) processed by VLD3 223 and the IQ/IDCT 230.
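
The timing of FIG. 7 can be reproduced with a small simulation. The sketch below assumes, as in the text, that each VLD needs 3T per macroblock and each later stage needs 1T, and additionally assumes that a stage serves macroblocks in the order they become ready, with ties resolved in VLD order. The function `schedule` and its parameters are names invented for this example.

```python
def schedule(num_vlds=3, mbs_per_slice=2, vld_time=3, stage_time=1,
             later_stages=("IQ/IDCT", "MC", "DF")):
    """Return {stage: [(macroblock, start, end), ...]} in units of T."""
    # VLD i finishes macroblock j of its slice at time (j + 1) * vld_time.
    ready = sorted(
        ((j + 1) * vld_time, i, f"MB({i + 1}-{j + 1})")
        for j in range(mbs_per_slice)
        for i in range(num_vlds)
    )
    timeline = {}
    for stage in later_stages:
        free_at = 0            # time at which this single stage is next free
        timeline[stage] = []
        next_ready = []
        for ready_at, vld, mb in ready:
            start = max(ready_at, free_at)
            end = start + stage_time
            timeline[stage].append((mb, start, end))
            free_at = end
            next_ready.append((end, vld, mb))
        ready = next_ready     # outputs of this stage feed the next one
    return timeline


timeline = schedule()
for stage, slots in timeline.items():
    print(stage, slots)
```

With the default parameters, the IQ/IDCT entries run back to back from T3 onward, which corresponds to the absence of idle intervals described for FIG. 7.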

As described above, the video data processing apparatus 2000 shown in FIG. 4 processes the portion where the bottleneck occurs slice by slice using three VLDs 221 to 223, which raises the utilization of the cores in the video decoder 200 and thereby improves the performance of the video decoder 200 itself.

Meanwhile, the bit stream may have been encoded with each frame divided into more slices than there are VLDs. In that case, the distribution control function of the slice distributor 210 that receives the bit stream is required.

After demultiplexing the received bit stream and dividing it into slices, if the slice distributor 210 determines that the number of slices is greater than the number of VLDs in the video decoder 200, it first distributes one slice to each VLD and then distributes the remaining slices one at a time to the VLDs in the order in which they complete processing.

For example, suppose that the video decoder 200 has three VLDs, namely VLD1 221, VLD2 222, and VLD3 223, whereas the bit stream was encoded with each frame divided into five independent slices, namely slice 1, slice 2, slice 3, slice 4, and slice 5. The slice distributor 210 divides the received bit stream into slices, recognizes that the number of slices is greater than the number of VLDs, and first distributes slice 1, slice 2, and slice 3 to VLD1 221, VLD2 222, and VLD3 223, respectively.

After some time, when the slice distributor 210 receives a signal from VLD3 223 indicating that processing of slice 3 is complete, it distributes slice 4 to VLD3 223. When the slice distributor 210 then receives a signal from VLD2 222 indicating that processing of slice 2 is complete, it distributes slice 5 to VLD2 222. Once VLD1 221, VLD2 222, and VLD3 223 have completed processing of slice 1, slice 5, and slice 4, respectively, the slice distributor 210 can distribute the five slices of the next frame by the same process.

The slice processing speed differs from VLD to VLD because the slices within a frame may differ in size, or because performance may differ among the VLDs depending on the environment. If all three VLDs 221 to 223 complete processing of their slices at the same time, the remaining two slices may be distributed to any two of the three VLDs 221 to 223.
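
The distribution rule for the case of more slices than VLDs can be sketched with a priority queue keyed on completion time. This is a minimal illustration: the per-slice processing costs are made-up numbers, the function name `distribute` is hypothetical, and ties are resolved by lowest VLD index, which is one permissible choice given that the text allows any VLD to be picked when several finish simultaneously.

```python
import heapq


def distribute(slice_costs, num_vlds):
    """Return (slice, vld) assignments, both 1-based, in distribution order."""
    slices = list(enumerate(slice_costs, start=1))
    assignments = []
    busy_until = []  # min-heap of (finish_time, vld)

    # First pass: hand one slice to each VLD.
    for vld, (slice_id, cost) in zip(range(1, num_vlds + 1), slices):
        assignments.append((slice_id, vld))
        heapq.heappush(busy_until, (cost, vld))

    # Remaining slices go to whichever VLD signals completion first.
    for slice_id, cost in slices[num_vlds:]:
        finished_at, vld = heapq.heappop(busy_until)
        assignments.append((slice_id, vld))
        heapq.heappush(busy_until, (finished_at + cost, vld))

    return assignments


# Illustrative costs chosen so that VLD3 finishes first and VLD2 second.
plan = distribute([9, 6, 4, 5, 5], num_vlds=3)
print(plan)
```

With these costs the result mirrors the five-slice example: slice 4 goes to VLD3 and slice 5 goes to VLD2.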

Although the present invention has been described above with reference to its preferred embodiments, those skilled in the art will understand that the invention can be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the following claims.

In particular, the foregoing embodiments describe the case in which the processing time of the first-order data processing block, for example the VLD, is longer than that of the other data processing blocks, so that the bottleneck is processed in parallel by N first-order data processing blocks. In another embodiment, however, a later data processing block other than the first-order block, for example the IQ/IDCT, may have a longer processing time than the other data processing blocks.

In this case, the bottleneck can be processed using N data processing blocks. That is, N L-th order data processing blocks corresponding to the bottleneck are provided (L is an integer of 1 or more; the case L = 1 is covered by the embodiments described above), and slices are distributed to the respective L-th order data processing blocks through the slice distributor.

Here, a pipelined structure from the first-order data processing block to the (L-1)-th order data processing block may be connected ahead of the N L-th order data processing blocks. The slice distributor may be placed between the (L-1)-th order data processing block and the N L-th order data processing blocks to distribute slices to the N L-th order data processing blocks.

A pipelined structure from the (L+1)-th order data processing block to the K-th order data processing block (K being an integer of L+1 or more) may also be connected after the N L-th order data processing blocks.

For example, assuming that L is 4 and K is 7, the video data processing apparatus may have a first-order data processing block, a second-order data processing block, a third-order data processing block, a slice distributor, N fourth-order data processing blocks, a fifth-order data processing block, a sixth-order data processing block, a seventh-order data processing block, and a multiplexer connected in pipeline form.
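
The generalized arrangement can be written down as a function of L, K, and N. The sketch below only lists the stage order; the function name `build_pipeline` and the stage labels are invented for illustration and are not taken from the disclosure.

```python
def build_pipeline(L, K, N):
    """List the stages in connection order for a bottleneck at order L."""
    if not (1 <= L <= K) or N < 1:
        raise ValueError("require 1 <= L <= K and N >= 1")
    stages = [f"order-{i} block" for i in range(1, L)]       # orders 1 .. L-1
    stages.append("slice distributor")
    stages.append(f"{N} x order-{L} block (parallel)")        # the bottleneck
    stages += [f"order-{i} block" for i in range(L + 1, K + 1)]
    stages.append("multiplexer")
    return stages


print(build_pipeline(L=4, K=7, N=3))
print(build_pipeline(L=1, K=4, N=3))  # VLD-first layout of the main embodiment
```

With L = 1 the list begins with the slice distributor, which corresponds to the arrangement of FIG. 4 in which three VLDs feed the IQ/IDCT, MC, and DF stages.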

FIG. 2 is an exemplary diagram showing the operating interval of each data processing block of the video decoder 10 shown in FIG. 1.

FIG. 5 is a flowchart illustrating the detailed operating procedure of the video data processing apparatus shown in FIG. 4.

FIG. 6 is an exemplary diagram illustrating the structure of a frame of the bit stream received by the slice distributor.

FIG. 7 is an exemplary diagram showing the operating interval of each data processing block during the initial operation of the video decoder shown in FIG. 4.

<Description of reference numerals for the main parts of the drawings>

100: video decoder

110: slice distributor

121, 122, …, 12n: first-order data processing blocks

132: second-order data processing block

13k: K-th order data processing block

150: multiplexer

1000: video data processing apparatus

Claims

A video data processing apparatus comprising:

a slice distributor configured to receive a bit stream, demultiplex the received bit stream, and divide it into a plurality of independently processable slices;

a video decoder comprising a plurality of L-th order (L is an integer of 1 or more) data processing blocks that sequentially process the slices distributed by the slice distributor in units of the macroblocks included in the slices, and a pipeline structure connected to the plurality of L-th order data processing blocks; and

a multiplexer for multiplexing data output from the video decoder,

wherein each L-th order data processing block takes a longer processing time to process data in macroblock units than each data processing block included in the pipeline structure.

The video data processing apparatus of claim 1, wherein the slice distributor distributes the plurality of slices to the L-th order data processing blocks.

The video data processing apparatus of claim 1, wherein, when the number of divided slices is larger than the number of L-th order data processing blocks, the slice distributor distributes one slice to each L-th order data processing block and then distributes the remaining slices one by one in the order in which the L-th order data processing blocks complete processing of their distributed slices.

The video data processing apparatus of claim 1, wherein, when L is 1, the pipeline structure connected to the plurality of L-th order data processing blocks comprises a pipelined structure from a second-order data processing block, connected to the L-th order data processing blocks, to a K-th order (K is an integer of 2 or more) data processing block.

The video data processing apparatus of claim 1, wherein, when L is greater than 1, the pipeline structure connected to the plurality of L-th order data processing blocks comprises a pipelined structure from a first-order data processing block to an (L-1)-th order data processing block.

The video data processing apparatus of claim 5, wherein the pipeline structure connected to the plurality of L-th order data processing blocks comprises a pipelined structure, connected to the rear end of the L-th order data processing blocks, from an (L+1)-th order data processing block to a K-th order (K is an integer of L+1 or more) data processing block.

delete

The video data processing apparatus of claim 1, wherein the plurality of L-th order data processing blocks comprise a plurality of variable length decoders (VLDs).

The video data processing apparatus of claim 8, wherein the pipeline structure connected to the plurality of L-th order data processing blocks comprises:

an inverse quantization / inverse discrete cosine transform unit (IQ/IDCT) which sequentially processes the macroblock-unit data processed by the plurality of VLDs;

a motion compensation unit (MC) which sequentially processes the macroblock-unit data processed by the IQ/IDCT; and

a deblocking filter (DF) which sequentially processes the macroblock-unit data processed by the motion compensation unit and outputs the data to the multiplexer.

A video data processing method comprising:

receiving a bit stream in units of frames;

dividing the received bit stream into a plurality of slices;

distributing the divided plurality of slices to a plurality of L-th order (L is an integer of 1 or more) data processing blocks;

processing the distributed slices using the plurality of L-th order data processing blocks, each of which sequentially processes at least one slice in macroblock units; and

decoding the data processed by the plurality of L-th order data processing blocks,

wherein the data processing time of each L-th order data processing block is longer than the data processing time of the data processing blocks of other orders used for the decoding.

The method of claim 10, wherein the distributing step comprises, when the number of divided slices is greater than the number of L-th order data processing blocks, distributing the slices one by one to each L-th order data processing block and then distributing the remaining undistributed slices one by one to the L-th order data processing blocks in the order in which they complete processing of their distributed slices.

The method of claim 10, further comprising multiplexing the decoded data.

delete