KR20140130574A - Method and apparatus for processing moving image - Google Patents

Method and apparatus for processing moving image

Info

Publication number
KR20140130574A
Authority
KR
South Korea
Prior art keywords
unit
slice
picture
block
prediction
Prior art date
Application number
KR20130048136A
Other languages
Korean (ko)
Inventor
정태영
김대연
김현규
Original Assignee
주식회사 칩스앤미디어 (Chips&Media, Inc.)
인텔렉추얼디스커버리 주식회사 (Intellectual Discovery Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 주식회사 칩스앤미디어 (Chips&Media, Inc.) and 인텔렉추얼디스커버리 주식회사 (Intellectual Discovery Co., Ltd.)
Priority to KR20130048136A
Publication of KR20140130574A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/174 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/436 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46 Embedding additional information in the video signal during the compression process
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Abstract

An apparatus for processing a video is disclosed. The video processing apparatus comprises: a central video processing unit which parses parameter information or slice header information from video data input from a host and registers the parsed slice header information in a scheduler; and a plurality of video processing units which process the video according to the parsed information under the control of the central video processing unit. Each of the plurality of video processing units searches the scheduler for the slice header information corresponding to the slice data it is to process.

Description

TECHNICAL FIELD [0001] The present invention relates to a moving image processing method and apparatus.

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a moving image processing method and apparatus, and more particularly, to a configuration in which a moving image is processed in a scalable manner using a plurality of processing units.

As the need for UHD has arisen, it has become difficult to accommodate UHD content within the sizes of current storage media and the bandwidths of current transmission media using existing moving image compression technology. Therefore, a new standard technology for compressing UHD moving images was required, and the HEVC standard was completed in January 2013.

HEVC can also be used for video streams served over the Internet and over networks such as 3G and LTE; in this case, not only UHD but also FHD or HD class video can be compressed with HEVC.

UHD TV is expected to require 4K 30 fps in the short term, but the number of pixels to be processed per second is expected to keep growing, to 4K 60 fps/120 fps, 8K 30 fps/60 fps, and beyond.

In order to cope cost-effectively with the various resolutions, frame rates, and other requirements of such applications, a video decoding apparatus is needed that can be easily scaled according to the performance and functions required by the application.

SUMMARY OF THE INVENTION The present invention has been made in view of the above-mentioned needs, and an object of the present invention is to provide a moving image processing method and apparatus for controlling Multi V-Cores using a scheduler.

According to an aspect of the present invention, there is provided an apparatus for processing a moving picture, the apparatus comprising: an image central processing unit which parses parameter information or slice header information from moving picture data input from a host and registers the parsed slice header information in a scheduler; and a plurality of image processing units which process the moving picture according to the parsed information under the control of the image central processing unit, wherein each of the plurality of image processing units searches the scheduler for the slice header information corresponding to the slice data it is to process, and processes the moving picture accordingly.

According to another aspect of the present invention, there is provided a method of processing a moving picture in a moving picture processing apparatus having an image central processing unit and a plurality of image processing units, the method comprising: the image central processing unit parsing parameter information or slice header information from input moving picture data; the image central processing unit registering the parsed slice header information in a scheduler; and each of the plurality of image processing units searching the scheduler for the slice header information corresponding to the slice data to be processed, and processing the moving picture accordingly.
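
For illustration only, the following is a minimal C sketch of the scheduler idea described above. All type and function names (Scheduler, SliceHeader, scheduler_register, scheduler_lookup) and the keying by picture identifier and first-CTU address are assumptions of this sketch, not details taken from the patent.

```c
#include <stddef.h>

/* Hypothetical scheduler: a table of parsed slice headers, keyed so that
 * each image processing unit can find the header matching its slice data. */
typedef struct {
    int pic_id;     /* picture the slice belongs to (assumed key)        */
    int first_ctu;  /* address of the first CTU in the slice (assumed)   */
    /* ... remaining parsed slice-header fields ...                      */
} SliceHeader;

#define MAX_SLICES 64

typedef struct {
    SliceHeader entries[MAX_SLICES];
    size_t count;
} Scheduler;

/* Central-unit side: register a header parsed from the bitstream. */
int scheduler_register(Scheduler *s, const SliceHeader *h)
{
    if (s->count >= MAX_SLICES)
        return -1;                 /* table full */
    s->entries[s->count++] = *h;
    return 0;
}

/* Processing-unit side: find the header matching the slice data to decode. */
const SliceHeader *scheduler_lookup(const Scheduler *s, int pic_id, int first_ctu)
{
    for (size_t i = 0; i < s->count; i++)
        if (s->entries[i].pic_id == pic_id && s->entries[i].first_ctu == first_ctu)
            return &s->entries[i];
    return NULL;                   /* header not (yet) registered */
}
```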

The moving picture processing method may be embodied as a computer-readable recording medium on which a program for executing the method on a computer is recorded.

According to various embodiments of the present invention, it is possible to provide a video processing apparatus and method capable of effectively handling the ever-increasing number of pixels to be processed per second (4K 60 fps/120 fps, 8K 30 fps/60 fps, and beyond).

FIG. 1 is a block diagram illustrating a configuration of a moving picture encoding apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining an example of a method of dividing an image into blocks and processing them.
FIG. 3 is a block diagram showing an embodiment of a configuration for performing inter prediction in an encoding apparatus.
FIG. 4 is a block diagram illustrating a configuration of a moving picture decoding apparatus according to an embodiment of the present invention.
FIG. 5 is a block diagram showing an embodiment of a configuration for performing inter prediction in a decoding apparatus.
FIGS. 6 and 7 are views showing an example of the configuration of a sequence parameter set (SPS).
FIGS. 8 and 9 are views showing an example of the configuration of a picture parameter set (PPS).
FIGS. 10 to 12 are views showing an example of the configuration of a slice header (SH).
FIG. 13 shows a layer structure of a moving picture decoding apparatus according to an embodiment of the present invention.
FIG. 14 is a timing diagram illustrating a moving picture decoding operation of a VPU according to an embodiment of the present invention.
FIG. 15 is a diagram illustrating a detailed operation of a V-CPU according to an embodiment of the present invention.
FIG. 16 is a view for explaining the control of synchronization of Multi V-Cores for data parallel processing, performed in a V-CPU according to an embodiment of the present invention.
FIGS. 17 and 18 are diagrams illustrating a method of determining the number of V-Cores to be used for data parallel processing, performed in the V-CPU according to an embodiment of the present invention.
FIGS. 19 and 20 are diagrams for explaining an entry point search method performed in the V-CPU according to an embodiment of the present invention.
FIG. 21 is a view for explaining a method, performed in the V-CPU according to an embodiment of the present invention, of assigning entry points so that the numbers of pixels allocated to the individual Multi V-Cores are equal.
FIGS. 22 and 23 illustrate interfaces between a V-CPU and a V-Core according to an embodiment of the present invention.
FIGS. 24 and 25 are views for explaining a method of controlling Multi V-Cores using a scheduler according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily carry out the present invention. It should be understood, however, that the present invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. In the drawings, the same reference numbers are used throughout the specification to refer to the same or like parts.

Throughout this specification, when a part is referred to as being "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element interposed therebetween.

Throughout this specification, when a member is referred to as being "on" another member, this includes not only the case where the member is in contact with the other member but also the case where another member exists between the two members.

Throughout this specification, when a part is said to "include" an element, this means that the part may further include other elements, not that it excludes other elements, unless specifically stated otherwise. The terms "about", "substantially", and the like are used to mean at or close to a stated value when manufacturing and material tolerances inherent in the stated meaning are presented, and are used to prevent an unscrupulous infringer from unfairly exploiting a disclosure in which exact or absolute figures are stated. The term "step of (doing something)" or "step of", as used throughout the specification, does not mean a "step for".

Throughout this specification, the term "combination thereof" included in a Markush-type expression means one or more mixtures or combinations selected from the group consisting of the elements described in the Markush-type expression, that is, it means that one or more elements selected from the group are included.

As an example of a method of encoding an actual image and its depth information map, encoding may be performed using HEVC (High Efficiency Video Coding), which has the highest coding efficiency among the video coding standards jointly standardized to date by the Moving Picture Experts Group (MPEG) and the Video Coding Experts Group (VCEG), but the present invention is not limited thereto.

Generally, an encoding apparatus includes an encoding process and a decoding process, and a decoding apparatus includes a decoding process. The decoding process of the decoding apparatus is the same as the decoding process of the encoding apparatus. Hereinafter, the encoding apparatus will be mainly described. FIG. 1 is a block diagram illustrating a configuration of a moving picture encoding apparatus according to an embodiment of the present invention.

Referring to FIG. 1, a moving picture encoding apparatus 100 according to the present invention includes a picture division unit 110, a transform unit 120, a quantization unit 130, a scanning unit 131, an entropy coding unit 140, an intra prediction unit 150, an inter prediction unit 160, an inverse quantization unit 135, an inverse transform unit 125, a post-processing unit 170, a picture storage unit 180, a subtraction unit 190, and an addition unit 195.

The picture division unit 110 analyzes the input video signal, divides each largest coding unit (LCU) of a picture into coding units of a predetermined size, determines the prediction mode, and determines the size of the prediction unit.

The picture division unit 110 sends the prediction unit to be encoded to the intra prediction unit 150 or the inter prediction unit 160 according to a prediction mode (or a prediction method). Further, the picture division unit 110 sends the prediction unit to be encoded to the subtraction unit 190.

The picture may be composed of a plurality of slices, and the slice may be composed of a plurality of maximum coding units (LCU).

The LCU can be divided into a plurality of coding units (CUs), and the encoder can add information indicating whether or not a unit is divided to the bit stream. The decoder can recognize the position of the LCU using the address (LcuAddr).

A coding unit (CU) that is not divided further is regarded as a prediction unit (PU), and the decoder can recognize the position of the PU using the PU index.

The prediction unit (PU) may be divided into a plurality of partitions. Also, the prediction unit (PU) may be composed of a plurality of transform units (TUs).

In this case, the picture division unit 110 may send the image data to the subtraction unit 190 in units of blocks of a predetermined size (for example, in units of PU or TU) according to the determined coding mode.

Referring to FIG. 2, a CTU (Coding Tree Unit) is used as the moving picture encoding unit, and the CTU is defined in various square sizes. The CTU includes a coding unit (CU).

The coding unit (CU) has a quad-tree structure. With the depth set to 0 for a largest coding unit (LCU) of size 64x64, a coding unit of the optimal size is recursively searched down to a depth of 3, that is, down to a CU of size 8x8.
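
For illustration, a minimal C sketch of such a recursive quad-tree traversal; read_split_flag and process_cu are hypothetical helpers standing in for bitstream parsing and actual CU coding.

```c
/* Hypothetical helpers: read one split flag from the bitstream, and
 * encode/decode a leaf coding unit. */
int read_split_flag(int x, int y, int depth);
void process_cu(int x, int y, int size);

/* Recursive CU quad-tree walk: depth 0 is the 64x64 LCU; each split
 * halves the size, down to the minimum 8x8 CU at depth 3. */
void walk_cu(int x, int y, int size, int depth)
{
    if (size > 8 && read_split_flag(x, y, depth)) {
        int half = size / 2;
        walk_cu(x,        y,        half, depth + 1);
        walk_cu(x + half, y,        half, depth + 1);
        walk_cu(x,        y + half, half, depth + 1);
        walk_cu(x + half, y + half, half, depth + 1);
    } else {
        process_cu(x, y, size);   /* leaf CU: predict, transform, quantize */
    }
}
```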

A prediction unit for performing prediction is defined as a PU (Prediction Unit). Each coding unit (CU) is predicted in units of blocks into which it is divided, and the division into square and rectangular blocks is made for the purpose of prediction.

The transform unit 120 transforms the residual block, which is the residual signal between the original block of the input prediction unit and the prediction block generated by the intra prediction unit 150 or the inter prediction unit 160. The residual block is composed of a coding unit or a prediction unit. A residual block composed of a coding unit or a prediction unit is divided into optimal transform units and transformed. Different transform matrices may be determined depending on the prediction mode (intra or inter). Also, since the residual signal of intra prediction has directionality according to the intra prediction mode, the transform matrix can be adaptively determined according to the intra prediction mode.

The transform unit can be transformed using two one-dimensional transform matrices (horizontal and vertical). For example, in the case of inter prediction, one predetermined transform matrix is determined.

On the other hand, in the case of intra prediction, when the intra prediction mode is horizontal, the residual block is likely to have vertical directionality; therefore, a DCT-based integer matrix is applied in the vertical direction and a DST-based or KLT-based integer matrix is applied in the horizontal direction. When the intra prediction mode is vertical, a DST-based or KLT-based integer matrix is applied in the vertical direction and a DCT-based integer matrix is applied in the horizontal direction.

In the case of the DC mode, a DCT-based integer matrix is applied in both directions. Further, in the case of intra prediction, the transform matrix may be adaptively determined depending on the size of the transform unit.

The quantization unit 130 determines a quantization step size for quantizing the coefficients of the residual block transformed by the transform matrix. The quantization step size is determined for each coding unit of a predetermined size or larger (hereinafter referred to as a quantization unit).

The predetermined size may be 8x8 or 16x16. The quantization unit 130 quantizes the coefficients of the transform block using a quantization matrix determined according to the determined quantization step size and the prediction mode.

The quantization unit 130 uses the quantization step size of the quantization unit adjacent to the current quantization unit as the quantization step size predictor of the current quantization unit.

The quantization unit 130 searches the left quantization unit, the upper quantization unit, and the upper-left quantization unit of the current quantization unit in that order, and can generate the quantization step size predictor of the current quantization unit using one or two valid quantization step sizes.

For example, the first valid quantization step size found in that order can be determined as the quantization step size predictor. Alternatively, the average of the two valid quantization step sizes found in that order may be determined as the quantization step size predictor; if only one is valid, that one may be determined as the predictor.

When the quantization step size predictor is determined, the difference value between the quantization step size of the current encoding unit and the quantization step size predictor is transmitted to the entropy encoding unit 140.

On the other hand, there is a possibility that the left coding unit, the upper coding unit, and the upper-left coding unit of the current coding unit do not exist. However, there may be a coding unit that precedes the current coding unit in coding order within the maximum coding unit.

Therefore, the quantization step sizes of the quantization units adjacent to the current coding unit, and of the quantization unit immediately preceding it in coding order within the maximum coding unit, can be candidates.

In this case, priority may be given in the order of 1) the left quantization unit of the current coding unit, 2) the upper quantization unit of the current coding unit, 3) the upper-left quantization unit of the current coding unit, and 4) the quantization unit immediately preceding in coding order. The order may be changed, and the upper-left quantization unit may be omitted.
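
For illustration, a minimal C sketch of the first-valid-candidate rule with the candidate order listed above; representing an absent candidate by the value 0 is an assumption of this sketch, and the averaging variant mentioned earlier is omitted.

```c
/* Quantization step size predictor: try the candidates in the listed
 * priority order and return the first valid one. A value of 0 marks an
 * invalid/absent candidate (a convention of this sketch). */
int predict_qstep(int left, int above, int above_left, int prev_in_order)
{
    const int cand[4] = { left, above, above_left, prev_in_order };
    for (int i = 0; i < 4; i++)
        if (cand[i] > 0)
            return cand[i];
    return 0;   /* no valid neighbor: no prediction available */
}
```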

The quantized transform block is provided to the inverse quantization unit 135 and the scanning unit 131.

The scanning unit 131 scans the coefficients of the quantized transform block and converts them into one-dimensional quantization coefficients. Since the coefficient distribution of the transform block after quantization may be dependent on the intra prediction mode, the scanning scheme is determined according to the intra prediction mode.

The coefficient scanning method may be determined depending on the size of the transform unit, and the scan pattern may vary according to the directional intra prediction mode. The quantization coefficients are scanned in the reverse direction of the scan order.

When the quantized coefficients are divided into a plurality of subsets, the same scan pattern is applied to the quantization coefficients in each subset. A zigzag scan or a diagonal scan is applied between subsets. Scanning preferably proceeds in the forward direction from the main subset containing the DC coefficient to the remaining subsets, but the reverse direction is also possible.

In addition, a scan pattern between subsets can be set in the same manner as a scan pattern of quantized coefficients in a subset. In this case, the scan pattern between the sub-sets is determined according to the intra-prediction mode. On the other hand, the encoder transmits to the decoder information indicating the position of the last non-zero quantization coefficient in the transform unit.

Information that can indicate the position of the last non-zero quantization coefficient in each subset can also be transmitted to the decoder.
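
For illustration, a C sketch of locating the last non-zero coefficient of a 4x4 subset along a diagonal scan, which is the position signalled to the decoder before the coefficients are coded in reverse scan order; the scan table shown is illustrative only (the standard defines its own per-size tables).

```c
/* Illustrative up-right diagonal scan order for a 4x4 subset, as {x, y}. */
static const int diag4x4[16][2] = {
    {0,0},{0,1},{1,0},{0,2},{1,1},{2,0},{0,3},{1,2},
    {2,1},{3,0},{1,3},{2,2},{3,1},{2,3},{3,2},{3,3}
};

/* Return the scan-order index of the last non-zero coefficient,
 * or -1 if the whole subset is zero. */
int last_significant(const int coeff[4][4])
{
    for (int i = 15; i >= 0; i--)
        if (coeff[diag4x4[i][1]][diag4x4[i][0]] != 0)
            return i;
    return -1;
}
```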

The inverse quantization unit 135 inversely quantizes the quantized coefficients. The inverse transform unit 125 restores the inversely quantized transform coefficients into a residual block in the spatial domain. The adder combines the residual block reconstructed by the inverse transform unit with the prediction block received from the intra prediction unit 150 or the inter prediction unit 160 to generate a reconstruction block.

The post-processing unit 170 performs a deblocking filtering process for removing the blocking effect occurring in the reconstructed picture, an adaptive offset application process for compensating the difference from the original image on a per-pixel basis, and an adaptive loop filtering process for compensating the difference from the original image on a per-coding-unit basis.

The deblocking filtering process is preferably applied to the boundaries of prediction units and transform units having a size equal to or larger than a predetermined size. The size may be 8x8. The deblocking filtering process may include: determining a boundary to be filtered; determining the boundary filtering strength to be applied to the boundary; determining whether to apply a deblocking filter; and, if it is determined that the deblocking filter is to be applied, selecting a filter to be applied to the boundary.

Whether or not the deblocking filter is applied is determined based on i) whether the boundary filtering strength is greater than 0, and ii) whether a value representing the variation of the pixel values at the boundary between the two blocks (P block and Q block) adjacent to the boundary to be filtered is smaller than a first reference value determined by the quantization parameter.

Preferably, at least two filters are provided. If the absolute value of the difference between the two pixels located at the block boundary is greater than or equal to a second reference value, the filter that performs relatively weak filtering is selected.

The second reference value is determined by the quantization parameter and the boundary filtering strength.
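
For illustration, a C sketch of the filter on/off and weak/strong selection logic described above. The activity measure is simplified, and the thresholds beta and t_c (standing in for the first and second reference values derived from the quantization parameter) are passed in precomputed; both are assumptions of this sketch.

```c
#include <stdlib.h>

typedef enum { NO_FILTER, WEAK_FILTER, STRONG_FILTER } FilterChoice;

/* p1,p0 are the P-block pixels nearest the edge; q0,q1 the Q-block ones. */
FilterChoice deblock_decide(int bs, int p1, int p0, int q0, int q1,
                            int beta, int t_c)
{
    if (bs == 0)
        return NO_FILTER;              /* boundary strength must be > 0 */

    /* Simplified local-variation test across the edge against the
     * QP-derived first reference value. */
    int activity = abs(p1 - 2 * p0 + q0) + abs(q1 - 2 * q0 + p0);
    if (activity >= beta)
        return NO_FILTER;

    /* Large step at the boundary: choose the relatively weak filter. */
    return (abs(p0 - q0) >= t_c) ? WEAK_FILTER : STRONG_FILTER;
}
```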

The adaptive offset application process is to reduce a distortion between a pixel in the image to which the deblocking filter is applied and the original pixel. It may be determined whether to perform the adaptive offset applying process in units of pictures or slices.

The picture or slice may be divided into a plurality of offset regions, and an offset type may be determined for each offset region. The offset type may include a predetermined number (e.g., four) of edge offset types and two band offset types.

If the offset type is an edge offset type, the edge type to which each pixel belongs is determined and the corresponding offset is applied. The edge type is determined based on the distribution of two pixel values adjacent to the current pixel.
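
For illustration, a C sketch of edge-type classification from the two neighbors along the chosen edge direction, in the manner of sample adaptive offset (SAO) edge offsets; the category numbering follows the usual valley/corner/peak convention and is an assumption of this sketch.

```c
/* Classify the current pixel against its two neighbors along the edge
 * direction; category 0 means monotone and receives no offset. */
int edge_category(int left, int cur, int right)
{
    int s1 = (cur > left)  - (cur < left);   /* sign(cur - left)  */
    int s2 = (cur > right) - (cur < right);  /* sign(cur - right) */
    switch (s1 + s2) {
    case -2: return 1;   /* local valley   */
    case -1: return 2;   /* concave corner */
    case  1: return 3;   /* convex corner  */
    case  2: return 4;   /* local peak     */
    default: return 0;   /* no offset applied */
    }
}
```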

The adaptive loop filtering process can perform filtering based on a value obtained by comparing a reconstructed image and an original image through a deblocking filtering process or an adaptive offset applying process. The adaptive loop filtering can be applied to the entire pixels included in the 4x4 block or the 8x8 block.

Whether or not the adaptive loop filter is applied can be determined for each coding unit. The size and the coefficient of the loop filter to be applied may vary depending on each coding unit. Information indicating whether or not the adaptive loop filter is applied to each coding unit may be included in each slice header.

In the case of the chrominance signal, whether or not the adaptive loop filter is applied can be determined on a per-picture basis. The shape of the loop filter may be rectangular, unlike that for luminance.

Adaptive loop filtering can be applied on a slice-by-slice basis. Therefore, information indicating whether or not adaptive loop filtering is applied to the current slice is included in the slice header or the picture header.

If adaptive loop filtering is applied to the current slice, the slice header or picture header additionally includes information indicating the horizontal and/or vertical filter length of the luminance component used in the adaptive loop filtering process.

The slice header or picture header may include information indicating the number of filter sets. At this time, if the number of filter sets is two or more, the filter coefficients can be encoded using the prediction method. Accordingly, the slice header or the picture header may include information indicating whether or not the filter coefficients are encoded in the prediction method, and may include predicted filter coefficients when the prediction method is used.

On the other hand, not only luminance but also chrominance components can be adaptively filtered. Accordingly, the slice header or the picture header may include information indicating whether or not each of the color difference components is filtered. In this case, in order to reduce the number of bits, information indicating whether or not to filter Cr and Cb can be joint-coded (i.e., multiplexed coding).

At this time, in the case of the chrominance components, the case where neither Cr nor Cb is filtered is likely to occur most frequently, since filtering is skipped to reduce complexity. Therefore, when neither Cr nor Cb is filtered, the smallest index is allocated and entropy-encoded.

When both Cr and Cb are filtered, the largest index is allocated and entropy encoding is performed.

The picture storage unit 180 receives the post-processed image data from the post-processing unit 170, and reconstructs and stores pictures on a picture-by-picture basis. A picture may be a frame-based image or a field-based image. The picture storage unit 180 has a buffer (not shown) capable of storing a plurality of pictures.

The inter-prediction unit 160 performs motion estimation using at least one reference picture stored in the picture storage unit 180, and determines a reference picture index and a motion vector indicating a reference picture.

Based on the determined reference picture index and motion vector, a prediction block corresponding to the prediction unit to be coded is extracted from the reference picture used for motion estimation among the plurality of reference pictures stored in the picture storage unit 180, and is output.

The intraprediction unit 150 performs intraprediction encoding using the reconstructed pixel values in a picture including the current prediction unit.

The intra prediction unit 150 receives the current prediction unit to be predictively encoded and selects one of a predetermined number of intra prediction modes according to the size of the current block to perform intra prediction.

The intraprediction unit 150 adaptively filters the reference pixels to generate intra prediction blocks. If reference pixels are not available, reference pixels may be generated using available reference pixels.

The entropy coding unit 140 entropy-codes the quantized coefficients quantized by the quantization unit 130, the intra prediction information received from the intra prediction unit 150, the motion information received from the inter prediction unit 160, and the like.

FIG. 3 is a block diagram of an embodiment of a configuration for performing inter prediction in the encoding apparatus. The illustrated inter prediction encoding apparatus includes a motion information determination unit 161, a motion information encoding mode determination unit 162, a motion information encoding unit 163, a prediction block generating unit 164, a residual block generating unit 165, a residual block encoding unit 166, and a multiplexer 167.

Referring to FIG. 3, the motion information determination unit 161 determines motion information of a current block. The motion information includes a reference picture index and a motion vector. The reference picture index indicates any one of the previously coded and reconstructed pictures.

The reference picture index indicates one of the reference pictures belonging to list 0 (L0) when the current block is unidirectionally inter-predictive-coded. On the other hand, when the current block is bidirectionally predictive-coded, the motion information may include a reference picture index indicating one of the reference pictures of list 0 (L0) and a reference picture index indicating one of the reference pictures of list 1 (L1).

In addition, when the current block is bi-directionally predictive-coded, it may include an index indicating one or two pictures among the reference pictures of the composite list LC generated by combining the list 0 and the list 1.

The motion vector indicates the position of the prediction block in the picture indicated by each reference picture index. The motion vector may be a pixel unit (integer unit) or a sub-pixel unit.

For example, it may have a resolution of 1/2, 1/4, 1/8 or 1/16 pixels. When the motion vector is not an integer unit, the prediction block is generated from the pixels of the integer unit.

The motion information encoding mode determination unit 162 determines whether the motion information of the current block is to be coded in the skip mode, the merge mode, or the AMVP mode.

The skip mode is applied when there is a skip candidate having the same motion information as the current block motion information, and the residual signal is zero. The skip mode is also applied when the current block is the same size as the coding unit. The current block can be viewed as a prediction unit.

The merge mode is applied when there is a merge candidate having the same motion information as the motion information of the current block. The merge mode is applied when a residual signal exists, whether the current block has the same size as the coding unit or a different size. The merge candidate and the skip candidate can be the same.

AMVP mode is applied when skip mode and merge mode are not applied. The AMVP candidate having the motion vector most similar to the motion vector of the current block is selected as the AMVP predictor.

The motion information encoding unit 163 encodes the motion information according to the method determined by the motion information encoding mode determination unit 162. When the motion information encoding mode is the skip mode or the merge mode, a merge motion vector encoding process is performed; when it is the AMVP mode, an AMVP encoding process is performed.

The prediction block generation unit 164 generates a prediction block using the motion information of the current block. If the motion vector is an integer unit, the block corresponding to the position indicated by the motion vector in the picture indicated by the reference picture index is copied to generate a prediction block of the current block.

However, when the motion vector is not an integer unit, the pixels of the prediction block are generated from the pixels in the integer unit in the picture indicated by the reference picture index.

In this case, in the case of a luminance pixel, a prediction pixel can be generated using an 8-tap interpolation filter. In the case of a chrominance pixel, a 4-tap interpolation filter can be used to generate a predictive pixel.
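
For illustration, a C sketch of one horizontal half-sample luma interpolation using HEVC's 8-tap half-sample filter coefficients; the single-stage rounding and clipping shown here is a simplification of the standard's multi-stage arithmetic.

```c
#include <stdint.h>

/* HEVC half-sample luma filter taps (sum = 64). */
static const int taps[8] = { -1, 4, -11, 40, 40, -11, 4, -1 };

/* src points at the integer sample three positions left of the edge. */
uint8_t interp_half(const uint8_t *src)
{
    int acc = 0;
    for (int i = 0; i < 8; i++)
        acc += taps[i] * src[i];
    acc = (acc + 32) >> 6;          /* round and normalize by 64 */
    if (acc < 0)   acc = 0;         /* clip to the 8-bit range   */
    if (acc > 255) acc = 255;
    return (uint8_t)acc;
}
```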

The residual block generating unit 165 generates a residual block using the current block and the prediction block of the current block. If the current block size is 2Nx2N, a residual block is generated using a 2Nx2N prediction block corresponding to the current block and the current block.

However, if the block size used for prediction is 2NxN or Nx2N, a prediction block is obtained for each of the two 2NxN blocks constituting the 2Nx2N block, and a final 2Nx2N prediction block can be generated using the two 2NxN prediction blocks.

The 2Nx2N residual block may then be generated using that 2Nx2N prediction block. The pixels of the boundary portion may be overlap-smoothed to resolve the discontinuity at the boundary between the two 2NxN prediction blocks.

The residual block coding unit 166 divides the generated residual block into one or more transform units. Each transform unit is then transform-coded, quantized, and entropy-encoded. The size of the transform unit may be determined in a quad-tree manner according to the size of the residual block.

The residual block coding unit 166 transforms the residual block generated by the inter prediction method using an integer-based transform matrix. The transform matrix is an integer-based DCT matrix.

The residual block coding unit 166 uses a quantization matrix to quantize the coefficients of the residual block transformed by the transform matrix. The quantization matrix is determined by a quantization parameter.

The quantization parameter is determined for each coding unit equal to or larger than a predetermined size. The predetermined size may be 8x8 or 16x16. Therefore, when the current coding unit is smaller than the predetermined size, only the quantization parameter of the first coding unit in coding order among the plurality of coding units within the predetermined size is encoded; the quantization parameters of the remaining coding units, being the same, need not be encoded.

The coefficients of the transform block are quantized using a quantization matrix determined according to the determined quantization parameter and the prediction mode.

The quantization parameter determined for each coding unit equal to or larger than the predetermined size is predictively encoded using the quantization parameter of a coding unit adjacent to the current coding unit. A quantization parameter predictor of the current coding unit can be generated by searching the left coding unit and then the upper coding unit of the current coding unit, in that order, and using one or two valid quantization parameters.

For example, the first valid quantization parameter found in that order may be determined as the quantization parameter predictor. Alternatively, the search may continue to the coding unit immediately preceding in coding order, and the first valid quantization parameter may be determined as the quantization parameter predictor.

The coefficients of the quantized transform block are scanned and converted into one-dimensional quantization coefficients. The scanning scheme can be set differently according to the entropy encoding mode. For example, in the case of CABAC encoding, the inter-prediction-encoded quantized coefficients can be scanned in a predetermined manner (a zigzag scan, or a raster scan in the diagonal direction); when encoded by CAVLC, they can be scanned in a different manner.

For example, the scanning method may be determined in a predetermined manner in the case of inter, and according to the intra prediction mode in the case of intra. The coefficient scanning method may also be determined differently depending on the size of the transform unit.

The scan pattern may vary according to the directional intra prediction mode. The quantization coefficients are scanned in the reverse direction of the scan order.

The multiplexer 167 multiplexes the motion information encoded by the motion information encoder 163 and the residual signals encoded by the residual block encoder. The motion information may vary depending on the encoding mode.

That is, in the case of the skip or merge mode, only an index indicating the predictor is included. In the case of the AMVP mode, the reference picture index, the differential motion vector, and the AMVP index of the current block are included.
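
For illustration, a C sketch of the per-mode syntax content described above; the struct layout and field names are assumptions of this sketch, not the standard's syntax element names.

```c
typedef enum { MODE_SKIP, MODE_MERGE, MODE_AMVP } InterMode;

/* What the multiplexed motion information carries for each mode. */
typedef struct {
    InterMode mode;
    int merge_idx;      /* SKIP/MERGE: index of the chosen predictor */
    int ref_idx;        /* AMVP only: reference picture index        */
    int mvd_x, mvd_y;   /* AMVP only: differential motion vector     */
    int amvp_idx;       /* AMVP only: index of the AMVP predictor    */
} MotionInfoSyntax;
```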

Hereinafter, an operation of the intra predictor 150 will be described in detail.

First, prediction mode information and the size of a prediction block are received from the picture division unit 110, and the prediction mode information indicates an intra mode. The size of the prediction block may be a square such as 64x64, 32x32, 16x16, 8x8, or 4x4, but is not limited thereto. That is, the size of the prediction block may be non-square instead of square.

Next, the reference pixel is read from the picture storage unit 180 to determine the intra-prediction mode of the prediction block.

Whether reference pixels need to be generated is determined by examining whether any unavailable reference pixels exist. The reference pixels are used to determine the intra prediction mode of the current block.

If the current block is located at the upper boundary of the current picture, pixels adjacent to the upper side of the current block are not defined. In addition, when the current block is located at the left boundary of the current picture, pixels adjacent to the left side of the current block are not defined.

These pixels are determined to be unusable. Pixels are also determined to be unusable when the current block is located at a slice boundary and the pixels adjacent to the upper or left side of the slice have not yet been encoded and reconstructed.

As described above, if there are no pixels adjacent to the left or upper side of the current block, or if there are no pixels that have been previously coded and reconstructed, the intra prediction mode of the current block may be determined using only available pixels.

However, it is also possible to use the available reference pixels of the current block to generate reference pixels of unusable positions. For example, if the pixels of the upper block are not available, the upper pixels may be created using some or all of the left pixels, or vice versa.

That is, available reference pixels at positions closest to the predetermined direction from the reference pixels at unavailable positions can be copied and generated as reference pixels. When there is no usable reference pixel in a predetermined direction, the usable reference pixel at the closest position in the opposite direction can be copied and generated as a reference pixel.
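
For illustration, a C sketch of this nearest-available substitution over a one-dimensional reference-sample array; the array layout and the availability flags are assumptions of this sketch.

```c
#include <stdint.h>

/* Fill each unavailable reference sample by copying the nearest available
 * sample in the scan direction, falling back to the opposite direction.
 * avail[i] != 0 marks originally usable samples; if no sample is
 * available at all, ref[] is left untouched. */
void substitute_refs(uint8_t *ref, const int *avail, int n)
{
    for (int i = 0; i < n; i++) {
        if (avail[i])
            continue;
        int j;
        for (j = i - 1; j >= 0; j--)        /* nearest in scan direction */
            if (avail[j]) { ref[i] = ref[j]; break; }
        if (j < 0)
            for (j = i + 1; j < n; j++)     /* fall back: opposite side  */
                if (avail[j]) { ref[i] = ref[j]; break; }
    }
}
```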

On the other hand, even if the upper or left pixels of the current block exist, the reference pixel may be determined as an unavailable reference pixel according to the encoding mode of the block to which the pixels belong.

For example, if the block containing a reference pixel adjacent to the upper side of the current block is a block that was inter-coded and reconstructed, those pixels can be determined to be unavailable pixels.

In this case, it is possible to generate usable reference pixels by using pixels belonging to the restored block by intra-coded blocks adjacent to the current block. In this case, information indicating that the encoder determines available reference pixels according to the encoding mode must be transmitted to the decoder.

Next, an intra prediction mode of the current block is determined using the reference pixels. The number of intra prediction modes that can be allowed in the current block may vary depending on the size of the block. For example, if the current block size is 8x8, 16x16, or 32x32, there may be 34 intra prediction modes. If the current block size is 4x4, 17 intra prediction modes may exist.

The 34 or 17 intra prediction modes may include at least one non-directional mode and a plurality of directional modes.

The one or more non-directional modes may be a DC mode and / or a planar mode. When the DC mode and the planar mode are included in the non-directional mode, there may be 35 intra-prediction modes regardless of the size of the current block.

At this time, it may include two non-directional modes (DC mode and planar mode) and 33 directional modes.

The planar mode generates a prediction block of the current block using at least one pixel value located at the bottom-right of the current block (or a predicted value of that pixel value, hereinafter referred to as a first reference value) and the reference pixels.
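
The planar variant described here derives the block from a bottom-right first reference value. For comparison only, the following C sketch shows the planar interpolation as finally specified in HEVC, which instead blends the top and left reference rows with the top-right and bottom-left reference samples; it is not the variant claimed above.

```c
#include <stdint.h>

/* HEVC-style planar prediction for an n x n block (n a power of two,
 * log2n = log2(n)). top[] and left[] each hold n+1 reference samples,
 * so top[n] is the top-right and left[n] the bottom-left sample. */
void planar_predict(uint8_t *pred, int stride, const uint8_t *top,
                    const uint8_t *left, int n, int log2n)
{
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++) {
            int h = (n - 1 - x) * left[y] + (x + 1) * top[n];  /* horizontal blend */
            int v = (n - 1 - y) * top[x] + (y + 1) * left[n];  /* vertical blend   */
            pred[y * stride + x] = (uint8_t)((h + v + n) >> (log2n + 1));
        }
}
```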

As described above, the configuration of a moving picture decoding apparatus according to an embodiment of the present invention can be derived from the configuration of the moving picture encoding apparatus described with reference to FIGS. 1 to 3; for example, an image can be decoded by performing the inverse of the encoding process described above.

FIG. 4 is a block diagram illustrating a configuration of a moving picture decoding apparatus according to an embodiment of the present invention.

Referring to FIG. 4, the moving picture decoding apparatus according to the present invention includes an entropy decoding unit 210, an inverse quantization/inverse transform unit 220, an adder 270, a deblocking filter 250, a picture storage unit 260, an intra prediction unit 230, a motion compensation prediction unit 240, and an intra/inter changeover switch 280.

The entropy decoding unit 210 decodes the encoded bit stream transmitted from the moving picture encoding apparatus into an intra prediction mode index, motion information, a quantized coefficient sequence, and the like. The entropy decoding unit 210 supplies the decoded motion information to the motion compensation prediction unit 240.

The entropy decoding unit 210 supplies the intra prediction mode index to the intra prediction unit 230 and the inverse quantization/inverse transform unit 220. In addition, the entropy decoding unit 210 supplies the quantized coefficient sequence to the inverse quantization/inverse transform unit 220.

The inverse quantization/inverse transform unit 220 converts the quantized coefficient sequence into a two-dimensional array of inverse-quantized coefficients. One of a plurality of scanning patterns is selected for the conversion, based on at least one of the prediction mode of the current block (i.e., intra prediction or inter prediction) and the intra prediction mode.

The intraprediction mode is received from an intraprediction unit or an entropy decoding unit.

The inverse quantization/inverse transform unit 220 restores the two-dimensional array of quantized coefficients to inverse-quantized coefficients using a quantization matrix selected from among a plurality of quantization matrices. Different quantization matrices are applied according to the size of the current block to be restored, and for blocks of the same size, the quantization matrix is selected based on at least one of the prediction mode and the intra prediction mode of the current block.

Then, the reconstructed quantized coefficient is inversely transformed to reconstruct the residual block.

The adder 270 reconstructs the image block by adding the residual block reconstructed by the inverse quantization/inverse transform unit 220 to the prediction block generated by the intra prediction unit 230 or the motion compensation prediction unit 240.

The deblocking filter 250 performs deblocking filter processing on the reconstructed image generated by the adder 270. Accordingly, deblocking artifacts caused by image loss in the quantization process can be reduced.

The picture storage unit 260 is a frame memory for holding a local decoded picture in which the deblocking filter process is performed by the deblocking filter 250.

The intraprediction unit 230 restores the intra prediction mode of the current block based on the intra prediction mode index received from the entropy decoding unit 210. A prediction block is generated according to the restored intra prediction mode.

The motion compensation prediction unit 240 generates a prediction block for the current block from the picture stored in the picture storage unit 260 based on the motion vector information. When motion compensation with a decimal precision is applied, a prediction block is generated by applying a selected interpolation filter.

The intra / inter selector switch 280 provides the adder 270 with a prediction block generated in either the intra prediction unit 230 or the motion compensation prediction unit 240 based on the coding mode.

FIG. 5 is a block diagram of an embodiment of a configuration for performing inter prediction in the decoding apparatus. The inter prediction decoding apparatus includes a demultiplexer 241, a motion information encoding mode determination unit 242, a merge mode motion information decoding unit 243, an AMVP mode motion information decoding unit 244, a prediction block generating unit 245, a residual block decoding unit 246, and a reconstruction block generating unit 247.

Referring to FIG. 5, the demultiplexer 241 demultiplexes the current encoded motion information and the encoded residual signals from the received bitstream. The demultiplexer 241 transmits the demultiplexed motion information to the motion information encoding mode determination unit 242 and transmits the demultiplexed residual signal to the residual block decoding unit 246.

The motion information encoding mode determination unit 242 determines a motion information encoding mode of the current block. When the skip_flag of the received bitstream has a value of 1, the motion information encoding mode determination unit 242 determines that the motion information encoding mode of the current block is encoded in the skip encoding mode.

When the skip_flag of the received bitstream has a value of 0 and the motion information received from the demultiplexer 241 contains only a merge index, the motion information encoding mode determination unit 242 determines that the motion information encoding mode of the current block is the merge mode.

When the skip_flag of the received bitstream has a value of 0 and the motion information received from the demultiplexer 241 contains a reference picture index, a differential motion vector, and an AMVP index, the motion information encoding mode determination unit 242 determines that the motion information encoding mode of the current block is the AMVP mode.
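
For illustration, a C sketch of this branching, reusing the InterMode enum from the earlier sketch; the bitstream helpers are hypothetical, and modelling "only a merge index present" as a single probe is an assumption of this sketch.

```c
typedef struct BitReader BitReader;              /* opaque bitstream reader */
int read_skip_flag(BitReader *br);               /* hypothetical helpers    */
int only_merge_index_present(BitReader *br);

InterMode parse_inter_mode(BitReader *br)
{
    if (read_skip_flag(br))
        return MODE_SKIP;             /* skip_flag == 1                  */
    if (only_merge_index_present(br))
        return MODE_MERGE;            /* only a merge index was coded    */
    return MODE_AMVP;                 /* ref idx + mvd + AMVP idx follow */
}
```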

The merge mode motion information decoding unit 243 is activated when the motion information encoding mode determination unit 242 determines the motion information encoding mode of the current block as a skip or merge mode.

The AMVP mode motion information decoding unit 244 is activated when the motion information encoding mode determination unit 242 determines that the motion information encoding mode of the current block is the AMVP mode.

The prediction block generator 245 generates the prediction block of the current block using the motion information reconstructed by the merge mode motion information decoding unit 243 or the AMVP mode motion information decoding unit 244.

If the motion vector is an integer unit, the block corresponding to the position indicated by the motion vector in the picture indicated by the reference picture index is copied to generate a prediction block of the current block.

However, when the motion vector is not an integer unit, the pixels of the prediction block are generated from the integer unit pixels in the picture indicated by the reference picture index. In this case, in the case of a luminance pixel, a prediction pixel can be generated using an 8-tap interpolation filter. In the case of a chrominance pixel, a 4-tap interpolation filter can be used to generate a predictive pixel.

The residual block decoding unit 246 entropy decodes the residual signal. Then, the entropy-decoded coefficients are inversely scanned to generate a two-dimensional quantized coefficient block. The inverse scanning method can be changed according to the entropy decoding method.

That is, the inverse scanning method for the inter prediction residual signal may differ between CABAC-based decoding and CAVLC-based decoding. For example, a raster inverse scan in the diagonal direction may be applied in the case of CABAC-based decoding, and a zigzag inverse scan in the case of CAVLC-based decoding.

In addition, the inverse scanning method may be determined depending on the size of the prediction block.

The residual block decoding unit 246 inversely quantizes the generated coefficient block using an inverse quantization matrix, and restores the quantization parameter to derive the quantization matrix. The quantization step size is restored for each coding unit of a predetermined size or larger.

The predetermined size may be 8x8 or 16x16. Accordingly, when the current coding unit is smaller than the predetermined size, only the quantization parameter of the first coding unit in coding order among the plurality of coding units within the predetermined size is restored; the quantization parameters of the remaining coding units, being the same, need not be restored.

To restore the quantization parameter determined for each coding unit equal to or larger than the predetermined size, the quantization parameter of a coding unit adjacent to the current coding unit is used. The left coding unit and then the upper coding unit of the current coding unit may be searched in that order, and the first valid quantization parameter may be determined as the quantization parameter predictor of the current coding unit.

Alternatively, the search may continue to the coding unit immediately preceding in coding order, and the first valid quantization parameter may be determined as the quantization parameter predictor. The quantization parameter of the current prediction unit is restored using the determined quantization parameter predictor and the differential quantization parameter.

The residual block decoding unit 246 inversely transforms the inverse-quantized coefficient block to restore the residual block.

The reconstruction block generation unit 247 adds the prediction block generated by the prediction block generation unit 245 and the residual block generated by the residual block decoding unit 246 to generate a reconstruction block.

Hereinafter, a process of restoring a current block through intraprediction will be described with reference to FIG.

First, the intra prediction mode of the current block is decoded from the received bitstream. For this, the entropy decoding unit 210 recovers the first intra prediction mode index of the current block by referring to one of the plurality of intra prediction mode tables.

The plurality of intra prediction mode tables are tables shared by the encoder and the decoder, and may be any one selected according to the distribution of intra prediction modes of a plurality of blocks adjacent to the current block.

For example, if the intra prediction mode of the left block of the current block and the intra prediction mode of the upper block of the current block are the same, the first intra prediction mode index of the current block can be restored by applying the first intra prediction mode table; if they are different, it can be restored by applying the second intra prediction mode table.

As another example, when the intra prediction modes of the upper block and the left block of the current block are both directional intra prediction modes, the first intra prediction mode index of the current block may be restored by applying the first intra prediction mode table if the direction of the intra prediction mode of the upper block and the direction of the intra prediction mode of the left block are within a predetermined angle of each other, and by applying the second intra prediction mode table if they are outside the predetermined angle.

The entropy decoding unit 210 transmits the restored first intra prediction mode index of the current block to the intra prediction unit 230.

The intraprediction unit 230 receiving the index of the first intraprediction mode determines the maximum possible mode of the current block as the intra prediction mode of the current block when the index has the minimum value (i.e., 0).

However, if the index has a value other than 0, the index indicating the maximum possible mode of the current block is compared with the first intra prediction mode index. If the first intra prediction mode index is not smaller than the index indicating the maximum possible mode of the current block, the intra prediction mode of the current block is determined as the intra prediction mode corresponding to a second intra prediction mode index obtained by adding 1 to the first intra prediction mode index; otherwise, it is determined as the intra prediction mode corresponding to the first intra prediction mode index.
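
For illustration, a C sketch mirroring this index remapping literally; mpm denotes the index of the maximum possible mode.

```c
/* Index 0 selects the maximum possible mode (MPM); otherwise indices at
 * or above the MPM's own index are shifted up by one, so the MPM never
 * has to be signalled twice. */
int decode_intra_mode(int first_idx, int mpm)
{
    if (first_idx == 0)
        return mpm;
    return (first_idx >= mpm) ? first_idx + 1 : first_idx;
}
```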

The intra prediction mode acceptable for the current block may be composed of at least one non-directional mode and a plurality of directional modes.

The one or more non-directional modes may be a DC mode and / or a planar mode. In addition, either the DC mode or the planar mode may be adaptively included in the allowable intra prediction mode set.

To this end, information specifying the non-directional mode included in the allowable intra prediction mode set may be included in the picture header or slice header.

Next, in order to generate an intra prediction block, the intra prediction unit 230 reads the reference pixels stored in the picture storage unit 260 and determines whether any unavailable reference pixels exist.

This determination may be made according to the presence or absence of the reference pixels that would be used to generate the intra prediction block when the decoded intra prediction mode of the current block is applied.

Next, when it is necessary to generate a reference pixel, the intra predictor 230 generates reference pixels of a position that is not available using the reconstructed available reference pixels.

The definition of a reference pixel that is not available and the method of generating a reference pixel are the same as those in the intra prediction unit 150 shown in FIG. However, it is also possible to selectively reconstruct a reference pixel used for generating an intra prediction block according to the decoded intra prediction mode of the current block.

Next, the intraprediction unit 230 determines whether to apply a filter to the reference pixels to generate a prediction block. That is, the intra-prediction unit 230 determines whether to apply filtering on the reference pixels to generate an intra-prediction block of the current block based on the decoded intra-prediction mode and the size of the current prediction block.

Since the problem of blocking artifacts increases as the block size increases, the number of prediction modes for which reference pixels are filtered increases with block size. However, when the block is larger than a predetermined size, it can be regarded as a flat region, so the reference pixels may not be filtered in order to reduce complexity.

If it is determined that the filter needs to be applied to the reference pixel, the reference pixels are filtered using a filter.

At least two filters may be adaptively applied according to the level difference between the reference pixels. The filter coefficients of the filter are preferably symmetric.

In addition, the above two or more filters may be adaptively applied according to the size of the current block. That is, when a filter is applied, a filter having a narrow bandwidth may be applied to a block having a small size, and a filter having a wide bandwidth may be applied to a block having a large size.

In the case of the DC mode, since a prediction block is generated from the average value of the reference pixels, there is no need to apply a filter; applying one would only increase the computational load.

In addition, it is not necessary to apply the filter to the reference pixels in the vertical mode, in which the image is correlated in the vertical direction, nor in the horizontal mode, in which the image is correlated in the horizontal direction.

Since whether filtering is applied depends on the intra prediction mode of the current block, the reference pixels can be adaptively filtered based on the intra prediction mode of the current block and the size of the prediction block.
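Taken together, the filtering decision described in the preceding paragraphs might be sketched as follows; the enum names, the helper name, and the size thresholds are illustrative assumptions, not values from the embodiment:

```c
/* Decide how to filter the reference pixels before intra prediction,
 * combining the mode and block-size criteria described above. */
typedef enum {
    FILTER_NONE,         /* skip filtering */
    FILTER_NARROW_BAND,  /* narrow-bandwidth filter for small blocks */
    FILTER_WIDE_BAND     /* wide-bandwidth filter for large blocks */
} RefPixelFilter;

static RefPixelFilter select_ref_filter(int block_size,
                                        int is_dc_mode,
                                        int is_pure_horiz_or_vert)
{
    /* DC averages the reference pixels and pure horizontal/vertical
     * modes copy them directly, so a filter only adds computation. */
    if (is_dc_mode || is_pure_horiz_or_vert)
        return FILTER_NONE;

    /* Blocks above a certain size are treated as flat areas, so
     * filtering is skipped to reduce complexity (32 is assumed). */
    if (block_size > 32)
        return FILTER_NONE;

    /* Otherwise pick the bandwidth by block size (8 is assumed). */
    return (block_size <= 8) ? FILTER_NARROW_BAND : FILTER_WIDE_BAND;
}
```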

Next, according to the reconstructed intra prediction mode, a prediction block is generated using the reference pixel or the filtered reference pixels. Since the generation of the prediction block is the same as the operation in the encoder, it is omitted. Even in the planar mode, the operation is the same as that in the encoder, so it is omitted.

Next, it is determined whether to filter the generated prediction block. The determination as to whether to perform the filtering may use information included in the slice header or the encoding unit header. It may also be determined according to the intra prediction mode of the current block.

If it is determined that the generated prediction block is to be filtered, the generated prediction block is filtered. Specifically, a new pixel is generated by filtering pixels at a specific position of a prediction block generated using available reference pixels adjacent to the current block.

This may be applied together at the time of generating the prediction block. For example, in the DC mode, a prediction pixel in contact with reference pixels among prediction pixels is filtered using a reference pixel in contact with the prediction pixel.

Therefore, the predictive pixel is filtered using one or two reference pixels according to the position of the predictive pixel. The filtering of the prediction pixel in the DC mode can be applied to the prediction block of all sizes. In the vertical mode, the prediction pixels adjacent to the left reference pixel among the prediction pixels of the prediction block may be changed using reference pixels other than the upper pixel used to generate the prediction block.

Likewise, in the horizontal mode, the prediction pixels adjacent to the upper reference pixel among the generated prediction pixels may be changed using reference pixels other than the left pixel used to generate the prediction block.

The current block is reconstructed using the predicted block of the current block restored in this manner and the residual block of the decoded current block.

The moving picture bitstream according to an embodiment of the present invention may include parameter sets (PS) and slice data as the units used to store the coded data of one picture.

A parameter set (PS) is divided into a picture parameter set (hereinafter simply referred to as PPS) and a sequence parameter set (hereinafter simply referred to as SPS), which correspond to the header data of each picture and each sequence, respectively. The PPS and the SPS may include the initialization information required to initialize each decoding process.

The SPS is common reference information for decoding all pictures coded in a random access unit (RAU), and includes a profile, the maximum number of pictures usable for reference, the picture size, and the like, and can be configured as shown in FIGS. 6 and 7.

The PPS includes, for each picture coded in a random access unit (RAU), reference information for decoding the picture, such as the kind of variable-length coding method, the initial value of the quantization step, and a plurality of reference pictures, and can be configured as shown in FIG. 9.
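As a rough grouping of the fields named above into C structures (all field names and types are illustrative assumptions, not taken from the embodiment):

```c
#include <stdint.h>

/* Fields the description attributes to the SPS: common reference
 * information for all pictures in a random access unit (RAU). */
typedef struct {
    uint8_t  profile_idc;
    uint8_t  max_num_ref_pics;   /* max pictures usable for reference */
    uint16_t pic_width;
    uint16_t pic_height;
} SpsInfo;

/* Fields the description attributes to the PPS: per-picture
 * reference information within the RAU. */
typedef struct {
    uint8_t vlc_scheme;          /* kind of variable-length coding */
    int8_t  init_qp;             /* initial value of the quantization step */
    uint8_t num_ref_pics;        /* reference pictures for this picture */
} PpsInfo;
```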

On the other hand, the slice header SH includes information on the corresponding slice when coding in units of slices, and can be configured as shown in FIGS. 10 to 12.

Hereinafter, a configuration for scalably processing the above-described moving image encoding and decoding processing using a plurality of processing units will be described in detail.

An apparatus for processing moving images according to an exemplary embodiment of the present invention includes an image central processing unit that parses parameter information or slice header information from moving picture data input from the host and registers the parsed slice header information in a scheduler, and a plurality of image processing units that process the moving image according to the parsed information under the control of the image central processing unit, wherein each of the plurality of image processing units searches the scheduler for the slice header information corresponding to the slice data it is to process, and processes the moving image.

The image central processing unit may allocate slice data to be processed by each of the plurality of image processing units using the slice header information registered in the scheduler.

In addition, when a boundary within a picture is found during the processing of the plurality of image processing units, the image central processing unit may wait to schedule until the plurality of image processing units complete decoding of the slice data corresponding to the slice information registered in the scheduler.

Each of the plurality of image processing units may include a first processing unit that communicates with the image central processing unit and performs entropy coding on the moving image data, and a second processing unit that processes the entropy-coded moving image data in units of coding units.

A method of processing a moving image in a moving image processing apparatus having an image central processing unit and a plurality of image processing units according to an embodiment of the present invention includes the image central processing unit parsing parameter information or slice header information from the input moving picture data, the image central processing unit registering the parsed slice header information in a scheduler, and each of the plurality of image processing units searching the scheduler for the slice header information corresponding to the slice data it is to process and processing the moving picture.

The method may further include the image central processing unit allocating slice data to be processed by each of the plurality of image processing units using the slice header information registered in the scheduler.

In addition, the method may further include, when the image central processing unit finds a boundary in a picture during the processing of the plurality of image processing units, waiting to schedule until decoding of the slice data corresponding to the slice information registered in the scheduler is completed in the plurality of image processing units.

The plurality of image processing units may each include a first processing unit and a second processing unit, wherein the first processing unit communicates with the image central processing unit to perform entropy coding on the moving image data, and the second processing unit processes the entropy-coded moving picture data in units of coding units.

Here, the video processing unit may refer to the VPU 300 to be described later, the image central processing unit to the V-CPU 310 to be described later, and the image processing unit to the V-CORE 320 to be described later. The first processing unit may correspond to the BPU 321, and the second processing unit to the VCE 322, to be described later.

Here, the moving picture processing apparatus may include both a moving picture coding apparatus and a moving picture decoding apparatus. The moving picture decoding apparatus and the moving picture encoding apparatus may be implemented as apparatuses performing mutually inverse processes, as described above with reference to FIGS. 1 to 4. Hereinafter, a moving picture decoding apparatus will be described as an example of the moving picture processing apparatus. However, the present invention is not limited to this, and the moving picture processing apparatus may be embodied as a moving picture coding apparatus that performs the inverse process of the moving picture decoding apparatus to be described later.

FIG. 13 is a diagram illustrating a layer structure of a moving picture decoding apparatus according to an embodiment of the present invention. Referring to FIG. 13, the moving picture decoding apparatus may include a video processing unit (VPU) 300 that performs the moving picture decoding function, and the VPU 300 may include a V-CPU 310, a BPU 321, and a VCE 322. Here, the BPU 321 and the VCE 322 may combine to form the V-Core 320.

Here, the VPU 300 according to an embodiment of the present invention may preferably include one V-CPU 310 and a plurality of V-Cores 320 (hereinafter referred to as Multi V-Core). However, the present invention is not limited to this, and the numbers of these components may vary depending on the implementation of the VPU 300.

The V-CPU 310 controls the overall operation of the VPU 300. In particular, the V-CPU 310 can parse a Video Parameter Set (VPS), an SPS, a PPS, and an SH in a received moving picture bitstream. Then, the V-CPU 310 can control the overall operation of the VPU 300 based on the parsed information.

For example, the V-CPU 310 can determine the number of V-Cores 320 to be used for data parallel processing based on the parsed information. When it determines that a plurality of V-Cores 320 are necessary for the data parallel processing, the V-CPU 310 can determine the area to be processed by each V-Core 320 of the Multi V-Core.

Also, the V-CPU 310 can determine the entry points of the bitstream for the area to be allocated to each V-Core 320.

Also, the V-CPU 310 can allocate the boundary areas within one picture, which arise from decoding with the Multi V-Core 320, to the Multi V-Core 320.

The V-CPU 310 communicates with an application programming interface (API) on a picture-by-picture basis and can communicate with the V-Core 320 on a slice / tile basis.

The V-Core 320 performs decoding and boundary processing under the control of the V-CPU 310. For example, the V-Core 320 can decode the allocated area under the control of the V-CPU 310, and can perform boundary processing on the boundary area allocated under the control of the V-CPU 310.

Here, the V-Core 320 may include a BPU 321 and a VCE 322.

The BPU 321 entropy-decodes the data of the allocated area (slice or tile). That is, the BPU 321 can perform the functions of the entropy decoding unit 210 described above, and can derive Coding Tree Unit (CTU), Coding Unit (CU), Prediction Unit (PU), and Transform Unit (TU) level parameters. It can then control the VCE 322.

Here, the BPU 321 may communicate with the V-CPU 310 on a slice or tile basis and with the VCE 322 on a CTU-by-CTU basis.

The VCE 322 may perform TQ (Transform / Quantization), Intra-prediction, Inter-prediction, Loop Filtering (LF), and Memory compression by receiving the derived parameters of the BPU 321. That is, the VCE 322 may perform the functions of the inverse quantization / inverse transformation unit 220, the deblocking filter 250, the intra prediction unit 230, and the motion compensation prediction unit 240.

Here, the VCE 322 can process the allocated area by CTU-based pipelining.

FIG. 14 is a timing diagram illustrating a moving picture decoding operation of a VPU according to an embodiment of the present invention. Referring to FIG. 14, as described above, the V-CPU 310 allocates an area of each picture (frame) to each of the Multi V-Cores 320, and the Multi V-Cores 320 perform core processing and boundary processing on the allocated areas.

Hereinafter, the detailed operation of the V-CPU 310 will be described in detail.

Specifically, the V-CPU 310 can perform an interface operation with the host processor.

Also, the V-CPU 310 can parse a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), and a Slice Header (SH) in the received moving picture bitstream.

In addition, the V-CPU 310 can transmit information necessary for slice/tile decoding to the V-Core 320 using the parsed information. The necessary information may include a 'Picture parameter data structure' and a 'Slice control data structure'.

The 'Picture parameter data structure' may include the following information.

For example, it may include the information contained in the sequence/picture headers (e.g., picture size, scaling list, CTU size, min/max CU size, min/max TU size, etc.).

This Picture parameter data structure can be set once during decoding of one picture.

The 'Slice control data structure' may contain the following information.

For example, it may include the information contained in the slice header (e.g., slice type, slice/tile area information, reference picture list, weighted prediction parameters, etc.).
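The two data structures described above might be modeled as the C types below; every field name and width here is an illustrative assumption:

```c
#include <stdint.h>

/* Picture parameter data structure: set once per picture, populated
 * from the sequence/picture headers. */
typedef struct {
    uint16_t pic_width, pic_height;
    uint8_t  scaling_list_id;
    uint8_t  log2_ctu_size;
    uint8_t  log2_min_cu_size, log2_max_cu_size;
    uint8_t  log2_min_tu_size, log2_max_tu_size;
} PictureParamData;

/* Slice control data structure: set whenever the slice changes,
 * populated from the slice header. */
typedef struct {
    uint8_t  slice_type;             /* I / P / B */
    uint32_t first_ctu_addr;         /* slice/tile area information */
    uint32_t num_ctus;
    uint8_t  ref_pic_list[2][16];    /* reference picture lists */
    int16_t  weighted_pred_param[16];
} SliceControlData;
```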

This slice control data structure can be set whenever the slice changes. The inter-processor communication registers of the V-Core 320, or a slice parameter buffer in external memory, can store N slice control data structures, and whenever the storage is not full, the data structure for a slice beyond the one currently being decoded can be stored in advance. Here, N depends on the point at which the V-Core 320 notifies the V-CPU 310 of processing completion: after the pipeline of the VCE 322 has been completely flushed (N = 1), or while the next segment is prepared to overlap with the current one (N > 1).

Here, the information transferred from the V-CPU 310 to the V-Core 320 may be transferred through the inter-processor communication registers of the V-Core 320. The inter-processor communication registers can be implemented as a fixed-size register array (file) or as external memory. If implemented as external memory, the V-CPU 310 can store the information in the external memory and the BPU 321 can read it from the external memory.

Meanwhile, even if the number of slice control data structures that can be stored in the V-Core 320 is one (or any other number), the V-CPU 310 must be able to perform SH decoding and parameter generation in advance, as shown in FIG., so that the V-Core 320 is not left idle for long between segments.

Meanwhile, when a plurality of tiles are included in one slice and are processed in parallel by the Multi V-Cores 320, the V-CPU 310 can transmit the same slice control data structure to each of the Multi V-Cores 320.

In addition, the V-CPU 310 can control the synchronization of the Multi V-Cores 320 for data parallel processing of the Multi V-Cores 320.

Also, the V-CPU 310 can process exceptions when the V-Core 320 generates an exception. For example, when an error is detected in parameter set decoding in the V-CPU 310, when an error is detected in slice data decoding in the BPU 321 of the V-Core 320, when the decoding time specified for frame decoding is exceeded, or when the peripherals and the V-Core 320 of the V-CPU 310 are stalled due to an unknown error in the VPU 300 or a failure of the system bus, the V-CPU 310 can take countermeasures to resolve the problem.

In addition, the V-CPU 310 can report completion to the API upon completion of frame decoding of the VPU 300.

In addition, the V-CPU 310 can determine the number of V-Cores 320 to be used for data parallel processing based on the parsed information. When it determines that a plurality of V-Cores 320 are necessary for the data parallel processing, the V-CPU 310 can determine the area to be processed by each V-Core 320 of the Multi V-Core.

Also, the V-CPU 310 can determine the entry points of the bitstream for the area to be allocated to each V-Core 320.

Also, the V-CPU 310 can allocate the boundary areas within one picture, which arise from decoding with the Multi V-Core 320, to the Multi V-Core 320.

Hereinafter, the detailed operation of the BPU 321 will be described in detail.

The BPU 321 may entropy-decode the data of the allocated area (slice or tile). The SH (slice header) is decoded by the V-CPU 310, and the BPU 321 does not decode the SH because it receives all the necessary information through the picture parameter data structure and the slice control data structure.

In addition, the BPU 321 can derive a CTU (Coding Tree Unit) / CU (Coding Unit) / PU (Prediction Unit) / TU (Transform Unit) level parameter.

The BPU 321 may also send the derived parameters to the VCE 322.

The BPU 321 and the VCE 322 can communicate through a FIFO for the CTU/CU/PU/TU parameters and coefficients required for decoding, excluding the information common to each block (picture size, segment offset/size, ...), the source/destination addresses in the DMAC, and the reference pixel data. However, the segment-level parameters may be set in an internal register of the VCE 322 instead of going through the FIFO.

In addition, the BPU 321 may perform the function of a VCE controller for controlling the VCE 322. The VCE controller outputs the picture_init and segment_init signals and a software reset, which the BPU 321 can control by register settings, and each sub-block of the VCE 322 can use these signals for control.

When the BPU 321 sets the above picture/segment-level parameters in the VCE controller and then starts a segment run via the register, the VCE 322 proceeds without communicating with the BPU 321 until decoding of the set segment is completed, and the decoding process can be controlled by referring to the fullness of the parameter FIFO and the status information of each sub-block.

Also, the BPU 321 can process the exception when an exception occurs.

In addition, the BPU 321 can report to the V-CPU 310 when the slice/tile segment processing is completed.

The VCE 322 may perform TQ (Transform / Quantization), Intra-prediction, Inter-prediction, Loop Filtering (LF), and Memory compression by receiving the derived parameters of the BPU 321.

Here, the VCE 322 can process the allocated area by CTU-based pipelining.

According to the various embodiments of the present invention described above, header parsing can be separated from the data processing, the separated data processing can be pipelined, and a V-CPU that performs the header parsing and scheduling can be provided.

Hereinafter, a method of controlling the synchronization of the Multi V-Cores 320 for data parallel processing of the Multi V-Cores 320, performed by the V-CPU 310, will be described in detail with reference to FIG. 16.

Referring to FIG. 16, the V-CPU 310 may transmit a decoding command signal to each of the Multi V-Cores 320 determined to be used for data parallel processing. Accordingly, each V-Core 320 performs decoding, and when the decoding is completed, each V-Core 320 can transmit a decoding completion signal to the V-CPU 310.

If a decoding completion signal is received from all of the V-Cores 320 to which the decoding command signal was transmitted, the V-CPU 310 can transmit a post-processing command (e.g., boundary processing) to each of the Multi V-Cores 320. Accordingly, each V-Core 320 performs the post-processing, and when it is completed, each V-Core 320 can transmit a post-processing completion signal to the V-CPU 310.

If the post-processing completion signal is received from all V-Cores to which the post-processing command signal was transmitted, the V-CPU 310 can again transmit a decoding command signal to each of the Multi V-Cores 320 determined to be used. In this way, the V-CPU 310 can control the synchronization of the Multi V-Cores 320 for data parallel processing.
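This handshake can be summarized as a control loop on the V-CPU side; the sketch below assumes hypothetical blocking helpers send_cmd() and wait_done() in place of the actual inter-processor signaling:

```c
enum { CMD_DECODE, CMD_POST_PROCESS };

/* Hypothetical blocking helpers standing in for the actual
 * inter-processor signaling: issue a command to one V-Core and
 * block until that V-Core reports the matching completion. */
void send_cmd(int core_id, int cmd);
void wait_done(int core_id, int cmd);

/* Synchronize one picture across the V-Cores chosen for data
 * parallel processing: all cores must finish decoding before any
 * boundary (post) processing starts, and all post-processing must
 * finish before the next picture's decode command is issued. */
static void vcpu_sync_picture(const int *core_ids, int num_cores)
{
    for (int i = 0; i < num_cores; i++)
        send_cmd(core_ids[i], CMD_DECODE);
    for (int i = 0; i < num_cores; i++)
        wait_done(core_ids[i], CMD_DECODE);

    for (int i = 0; i < num_cores; i++)
        send_cmd(core_ids[i], CMD_POST_PROCESS);
    for (int i = 0; i < num_cores; i++)
        wait_done(core_ids[i], CMD_POST_PROCESS);
}
```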

Hereinafter, a method of determining the number of V-Cores to be used in data parallel processing, performed by the V-CPU 310, will be described in detail with reference to FIGS. 17 to 18.

Specifically, the V-CPU 310 can detect the level information included in the parsed SPS (Sequence Parameter Set). The detected level information can be compared with the level information that a V-Core 320 can process to determine the number of V-Cores to be used for real-time decoding.

Here, the V-CPU 310 can use level information that can be processed by the V-CORE 320 shown in FIG.

For example, suppose one V-Core 320 can decode level 5.0. If the level information of the bitstream is 5.0, the V-CPU 310 can determine that one V-Core 320 is needed, and can then determine the one V-Core 320 to be used.

Alternatively, if one V-Core 320 can decode level 5.0 and the level information of the bitstream is 5.1, the V-CPU 310 can determine that two V-Cores 320 are needed.
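A sketch of this determination, assuming one V-Core sustains level 5.0 throughput in real time; the per-level maximum luma sample rates follow the HEVC level table, while the function and table names are illustrative:

```c
/* Maximum luma sample rates per level, as in the HEVC level table;
 * only the levels discussed above are listed here. */
struct LevelRate { int level_x10; long long luma_samples_per_sec; };

static const struct LevelRate kLevelRates[] = {
    { 50,  534773760LL },   /* level 5.0 */
    { 51, 1069547520LL },   /* level 5.1 */
    { 52, 2139095040LL },   /* level 5.2 */
};

/* One V-Core is assumed to sustain level 5.0 throughput in real
 * time, so the core count is the ceiling of the required rate
 * divided by the per-core rate. */
static int num_cores_needed(int bitstream_level_x10)
{
    const long long per_core = 534773760LL;  /* level 5.0 rate */
    for (unsigned i = 0; i < sizeof kLevelRates / sizeof kLevelRates[0]; i++) {
        if (kLevelRates[i].level_x10 == bitstream_level_x10) {
            long long need = kLevelRates[i].luma_samples_per_sec;
            return (int)((need + per_core - 1) / per_core);
        }
    }
    return 1;  /* unknown level: fall back to a single V-Core */
}
```

With these rates, level 5.0 yields one core and level 5.1 yields two, matching the example above.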

If it is determined that two or more V-Cores 320 are necessary, the V-CPU 310 can identify which of the following three cases each frame falls under by parsing the tile information of the PPS (Picture Parameter Set) and the Slice Header (SH).

CASE 1) 1 tile, 1 slice

CASE 2) Multiple tiles

CASE 3) 1 tile, multiple slices

If the bitstream is 1 tile, 1 slice (CASE 1), parallel processing is not possible, so only one V-Core 320 can be used. In this case, the V-CPU 310 can determine the one V-Core 320 to be used.

If the bitstream has multiple tiles (CASE 2), the number of V-Cores 320 can be determined so that each V-Core 320 processes as equal a number of pixels as possible in parallel. In this case, the V-CPU 310 can determine that many V-Cores 320 to use, and can allocate to each determined V-Core 320 an area to be processed so that the pixel counts are as equal as possible.

If the bitstream is 1 tile with multiple slices (CASE 3), the number of V-Cores 320 can likewise be determined so that each V-Core 320 processes as equal a number of pixels as possible in parallel. In this case, the V-CPU 310 can determine that many V-Cores 320 to use, and can allocate to each determined V-Core 320 an area to be processed so that the pixel counts are as equal as possible.

On the other hand, power to any V-Core 320 that is not determined to be used can be cut off.

Hereinafter, an entry point search method performed by the V-CPU 310 will be described in detail with reference to FIGS. 19 to 20.

<When the system layer informs the entry point>

If the system notifies the entry point location, the V-CPU 310 can reverse seek to find the start code to parse the slice header (SH).

And, if the found slice is a dependent slice, the V-CPU 310 can continue to reverse seek until it finds a normal slice.
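That reverse seek might look like the sketch below, where find_prev_start_code() and is_dependent_slice() are hypothetical helpers standing in for the actual start-code search and slice header inspection:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers: locate the previous NAL start code before
 * 'pos', and test whether the slice header found there marks a
 * dependent slice segment. */
size_t find_prev_start_code(const unsigned char *buf, size_t pos);
bool   is_dependent_slice(const unsigned char *buf, size_t pos);

/* From an entry point location reported by the system layer, seek
 * backwards until an independent (normal) slice header is found,
 * since a dependent slice cannot be parsed on its own. */
static size_t seek_independent_slice(const unsigned char *buf, size_t entry)
{
    size_t pos = find_prev_start_code(buf, entry);
    while (is_dependent_slice(buf, pos))
        pos = find_prev_start_code(buf, pos);
    return pos;
}
```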

If the system tells the location of the NAL unit, the NAL unit should not be a dependent slice.

<If the system layer does not inform the entry point>

Since there is no entry point information at the picture level, the V-CPU 310 can find the entry points by parsing all slice headers in the picture, on a picture-by-picture basis. Since the entry point information is located at the end of the slice header, the V-CPU 310 parses the slice header syntax to find the entry point information.

In this case, since all slice headers in the picture must be parsed in picture units, the V-CPU 310 can store all the slice headers in its memory while searching for entry points. Accordingly, when a V-Core 320 operates later, the slice headers need not be parsed again. For example, storing all slice headers of a picture requires approximately 300 bytes/slice × 600 slices (MaxSlicesPerPicture at level 6.2, the maximum level) = 180 KB of memory.

On the other hand, in the case of a single core, a single V-Core decodes the stream sequentially, so there is no need to search for entry points in advance.

However, in the case of a multicore, since it is necessary to decode using a plurality of V-COREs, it is necessary to search for entry points in advance in order to decode in parallel in a plurality of V-COREs.

Accordingly, according to an embodiment of the present invention, the V-CPU can search for an entry point in advance to perform decoding using Multi V-CORE.

Meanwhile, FIGS. 19 to 20 illustrate examples of searching for an entry point when the entry point is not indicated by the system layer. Referring to FIG. 19, searching for the entry point of a tile (looking for tileID = 2) will be described for two cases: when the tile is the first subset of a slice segment, and when it is not (Not 1st subset of slice segment).

<1st subset of slice segment>

In this case, when the algorithm shown in FIG. 20 is applied, the entry point for tileID = 2 can be found where the entry point offset is 0 at tileID = 2.

<Not 1st subset of slice segment>

In this case, when the algorithm shown in FIG. 20 is applied, the entry point offset at tileID = 2 is calculated as entry point offset = sum of entry_point_offset[i], and the entry point for tileID = 2 is found.
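In code form, the two cases might reduce to the following sketch; entry_point_offset[] follows the naming in the description above, and the helper itself is an illustration:

```c
/* Byte offset of the target tile's data from the start of the slice
 * segment data: zero if the tile opens the segment (1st subset),
 * otherwise the sum of the preceding entry point offsets. */
static long tile_entry_offset(const long *entry_point_offset,
                              int num_offsets_before_tile)
{
    long off = 0;
    for (int i = 0; i < num_offsets_before_tile; i++)
        off += entry_point_offset[i];   /* sum of entry_point_offset[i] */
    return off;
}
```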

Hereinafter, referring to FIG. 21, a method performed in the V-CPU 310 of assigning entry points so that the number of pixels allocated to each of the Multi V-Cores 320 is equalized will be described in detail.

As shown in FIGS. 17 to 18, the V-CPU 310 can determine the number of V-Cores 320 to be used for parallel processing and which V-Cores 320 to use. In this case, the V-CPU 310 can assign the entry points found as described above with reference to FIGS. 19 to 20 so that the number of pixels allocated to each of the V-Cores 320 determined to be used is equalized.

First, the method of determining the area to be allocated to each of the Multi V-Cores 320 can be performed by the algorithm shown in FIG. 21.

In FIG. 21, ctb_num_in_pic is the number of CTBs in the picture, and ctb_num_in_segment[] may be the number of CTBs in each tile or slice. Accordingly, the allocation area (core_start_addr[core_id]) for each V-Core 320 can be determined.
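A minimal sketch of one way to compute such an allocation, reusing the names from the description of FIG. 21; the greedy splitting strategy is an assumption, not necessarily the exact algorithm of FIG. 21:

```c
/* Assign consecutive tiles/slices to cores so that each core gets
 * roughly ctb_num_in_pic / num_cores CTBs. core_start_addr[c] holds
 * the index of the first segment handled by core c. */
static void assign_segments(const int *ctb_num_in_segment, int num_segments,
                            int ctb_num_in_pic, int num_cores,
                            int *core_start_addr)
{
    int target = ctb_num_in_pic / num_cores;  /* even share per core */
    int core = 0, acc = 0;
    core_start_addr[0] = 0;
    for (int s = 0; s < num_segments; s++) {
        acc += ctb_num_in_segment[s];
        if (acc >= target && core + 1 < num_cores) {
            core_start_addr[++core] = s + 1;  /* next core starts here */
            acc = 0;
        }
    }
}
```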

The V-CPU 310 can use the slice_address and the entry point information of the slice header to assign entry points to the V-Cores 320 so that the number of pixels allocated to each V-Core 320 is properly balanced.

When the start position of the bitstream to be allocated to each of the Multi V-Cores 320 is determined according to the operation described above, the V-CPU 310 and the Multi V-Cores 320 can communicate with each other through an interface.

Referring to FIG. 22, according to an embodiment of the present invention, an interface (Interface 0) between the host CPU and the V-CPU 310 and an interface (Interface 1) between the V-CPU 310 and the Multi V-Core 320 are included.

Hereinafter, the interface (Interface 1) between the V-CPU 310 and the Multi V-Core 320 will be described in detail with reference to FIG. 23.

Referring to FIG. 23, in a single core, when parsing of a slice header is completed in the V-CPU 310, the single V-CORE 320 can decode the slice data corresponding to the parsed slice header. Here, the V-CPU 310 may parse the slice header for the next slice before the decoding of the slice data in the single V-CORE 320 is completed. Accordingly, the single V-CORE 320 can sequentially decode the slice data.

For multi-core, there can be two types of data parallel processing (pipelining).

In the first method, the V-CPU 310 sequentially parses the slice headers of one picture in units of pictures and assigns the slice data corresponding to each parsed slice header to the Multi V-Cores 320 in order. In this case, the Multi V-Cores 320 can start decoding the slice data in the assigned order.

In the second method, the V-CPU 310 parses all the slice headers of one picture in units of pictures and then assigns each slice data corresponding to each of the parsed slice headers to the Multi V-Cores 320 simultaneously. In this case, the Multi V-Cores 320 can start decoding the slice data simultaneously.

In the case of the above-described Multi core, the two data parallel processing methods may have the following merits and demerits.

Specifically, the first scheme has the advantage of using less memory than the second scheme, but has the disadvantage that the overall processing time becomes longer due to the resulting penalty.

In addition, the second scheme has the advantage that the overall processing time is shorter than in the first scheme, but has the disadvantage of using more memory.

Here, the memory is memory for storing the bitstream (sequence, picture, and slice header portions). In the first scheme, the bitstream for the next frame does not need to be parsed until all processing for the current frame is completed, so the space required to store the bitstream can be smaller than in the second scheme, in which parsing is performed in advance.

Hereinafter, a method for controlling the Multi V-Core using the scheduler will be described in detail with reference to FIGS. 24 to 25.

In the case of multi-core, unlike the single-core case, the V-CPU 310 first parses the slice header information and registers the parsed slice header information in the scheduler. Here, since the slice header information can take new values each time a new independent slice appears, the V-CPU 310 can register the parsed slice header information in the scheduler together with the entry point.

In addition, the V-CPU 310 can allocate the slice data of each picture to a V-Core 320 among the Multi V-Cores 320 using the information registered in the scheduler.

In this case, each of the Multi V-cores 320 can search the scheduler for the slice header information corresponding to the slice data to be processed and decode the allocated slice data. Herein, if it can be decoded immediately after being registered in the scheduler, the Multi V-core 320 can perform decoding immediately.

When the V-CPU 310 finds a boundary in a picture between the Multi V-Cores 320 during decoding using the Multi V-Cores 320, it can wait to schedule until the Multi V-Cores 320 complete decoding of the slice data corresponding to the information registered in the scheduler.

This operation can be expressed by a pseudo code as shown in FIG.

In FIG. 25, dpb_allocate() finds a frame buffer that is not currently in use (ref/out/dis flags all 0), sets the ref/out/dis flags of that frame buffer to 1, and returns the idx of the frame buffer.

Also, dpb_operation() clears the ref_flag of any picture in the current DPB that is no longer included in the RPS. Specifically, it counts the number of frame buffers whose out_flag is 1 and executes dpb_bump_out() until this number becomes less than max_reorder_delay; it then counts the number of frame buffers whose ref_flag or out_flag is 1 and executes dpb_bump_out() until this number becomes less than max_dec_frame_buffering; finally, it unmarks the ref_flag of the pictures in the DPB that are not included in the RPS.

Also, dpb_bump_out(), invoked from dpb_operation() as the frame buffers with out_flag/ref_flag set are counted, clears the out_flag of the frame buffer having the smallest POC when the count exceeds the allowed number, and moves the idx of that frame buffer to the display FIFO.
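Condensing the description of FIG. 25, dpb_allocate() and dpb_bump_out() might be sketched as below; the flag layout, the buffer count, and the push_display_fifo() hook are illustrative assumptions:

```c
#define DPB_SIZE 16

typedef struct {
    int ref_flag, out_flag, dis_flag;
    int poc;   /* picture order count */
} FrameBuf;

static FrameBuf dpb[DPB_SIZE];

/* Find a frame buffer not currently in use (all flags 0), mark all
 * of its flags 1, and return its idx; -1 if none is free. */
static int dpb_allocate(void)
{
    for (int i = 0; i < DPB_SIZE; i++) {
        FrameBuf *f = &dpb[i];
        if (!f->ref_flag && !f->out_flag && !f->dis_flag) {
            f->ref_flag = f->out_flag = f->dis_flag = 1;
            return i;
        }
    }
    return -1;
}

/* Hypothetical hook that hands a frame buffer idx to the display
 * FIFO. */
void push_display_fifo(int idx);

/* Clear the out_flag of the lowest-POC outputtable buffer and move
 * its idx to the display FIFO. */
static void dpb_bump_out(void)
{
    int best = -1;
    for (int i = 0; i < DPB_SIZE; i++)
        if (dpb[i].out_flag && (best < 0 || dpb[i].poc < dpb[best].poc))
            best = i;
    if (best >= 0) {
        dpb[best].out_flag = 0;
        push_display_fifo(best);
    }
}
```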

The method according to the present invention may be implemented as a program to be executed on a computer and stored in a computer-readable recording medium. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, floppy disks, and optical data storage devices; the method may also be implemented in the form of a carrier wave (for example, transmission over the Internet).

The computer-readable recording medium may be distributed over networked computer systems so that the computer-readable code can be stored and executed in a distributed manner. Functional programs, codes, and code segments for implementing the above method can be easily inferred by programmers skilled in the art to which the present invention belongs.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, the invention is not limited to the disclosed embodiments; on the contrary, it should be understood that various modifications may be made by those skilled in the art without departing from the spirit and scope of the present invention.

Claims (11)

An apparatus for processing moving images,
An image central processing unit for parsing parameter information or slice header information from moving picture data input from a host, and registering the parsed slice header information in a scheduler; And
And a plurality of image processing units under the control of the image central processing unit and processing moving images in accordance with the parsed information,
Wherein the plurality of image processing units include:
Wherein each of the plurality of image processing units searches the scheduler for slice header information corresponding to slice data to be processed, and processes the moving picture.
The apparatus according to claim 1,
Wherein the image central processing unit comprises:
And determines a start position of a moving picture bitstream to be allocated to each of the plurality of image processing units using the slice header information obtained in units of pictures.
The apparatus according to claim 2,
Wherein the image central processing unit comprises:
And registers the parsed slice header information together with the determined bitstream start position in the scheduler.
The apparatus according to claim 1,
Wherein the image central processing unit comprises:
And allocates slice data to be processed by each of the plurality of image processing units using the slice header information registered in the scheduler.
The apparatus according to claim 1,
Wherein the image central processing unit comprises:
Wherein the image central processing unit waits for scheduling, when a boundary in the picture according to the processing of the plurality of image processing units is found, until decoding of the slice data corresponding to the slice information registered in the scheduler is completed in the plurality of image processing units.
The apparatus according to claim 1,
Wherein each of the plurality of image processing units comprises:
A first processing unit communicating with the image central processing unit to perform entropy coding on the moving image data; And
And a second processing unit for processing the entropy-coded moving picture data in units of coding.
A method of processing moving images in a moving image processing apparatus having an image processing unit and a plurality of image processing units,
Registering parameter information or slice header information parsed from moving picture data input from a host in a scheduler; And
And processing the moving image, by each of the plurality of image processing units, by searching the scheduler for slice header information corresponding to slice data to be processed.
The method according to claim 7,
Further comprising the step of determining a start position of a moving picture bitstream to be allocated to each of the plurality of image processing units using the slice header information obtained in units of pictures.
The method according to claim 8,
Wherein the parsed slice header information is registered in the scheduler together with the determined bitstream start position.
The method according to claim 8,
And allocating slice data to be processed by each of the plurality of image processing units using the slice header information registered in the scheduler.
The method according to claim 7,
Further comprising the step of waiting for scheduling, when a boundary in the picture according to the processing of the plurality of image processing units is found, until decoding of the slice data corresponding to the slice information registered in the scheduler is completed in the plurality of image processing units.


KR20130048136A 2013-04-30 2013-04-30 Method and apparatus for processing moving image KR20140130574A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR20130048136A KR20140130574A (en) 2013-04-30 2013-04-30 Method and apparatus for processing moving image


Publications (1)

Publication Number Publication Date
KR20140130574A true KR20140130574A (en) 2014-11-11

Family

ID=52452355

Family Applications (1)

Application Number Title Priority Date Filing Date
KR20130048136A KR20140130574A (en) 2013-04-30 2013-04-30 Method and apparatus for processing moving image

Country Status (1)

Country Link
KR (1) KR20140130574A (en)


Legal Events

Date Code Title Description
WITN Withdrawal due to no request for examination