WO2019009452A1

WO2019009452A1 - Method and device for encoding or decoding image

Info

Publication number: WO2019009452A1
Application number: PCT/KR2017/007270
Authority: WO
Inventors: 이종석; 김재환; 박영오; 박정훈; 전선영; 최광표
Original assignee: 삼성전자 주식회사
Priority date: 2017-07-06
Filing date: 2017-07-06
Publication date: 2019-01-10

Abstract

Disclosed is a prediction image generation technique using a deep neural network (DNN). A method for decoding an image, according to one disclosed embodiment, comprises the steps of: receiving a bitstream of an encoded image; determining one or more blocks segmented from the encoded image; generating prediction data by performing DNN-based prediction on a current block among the one or more blocks; extracting, from the bitstream, residual data of the current block; and reconstructing the current block using the prediction data and the residual data.

Description

Method and apparatus for encoding or decoding an image

This disclosure relates to methods of processing images using artificial intelligence (AI) that utilizes machine learning algorithms. Specifically, the present disclosure relates to a technique for generating a predictive image using a deep neural network (DNN) in an image encoding and decoding process.

An artificial intelligence system is a computer system that implements human-level intelligence. It is a system in which the machine learns and makes judgments by itself.

Artificial intelligence technology consists of machine learning (deep learning) and element technologies that utilize machine learning. Machine learning is an algorithmic technique that classifies and learns the features of input data by itself, and the element technologies simulate functions of the human brain, such as recognition and judgment, by using machine learning algorithms such as deep learning.

Element technologies include, for example, linguistic understanding techniques for recognizing human language and characters, visual understanding techniques for recognizing objects in the manner of human vision, reasoning and prediction techniques for judging information and logically inferring and predicting from it, knowledge representation techniques for processing human experience information as knowledge data, and operation control techniques for controlling the autonomous driving of a vehicle and the motion of a robot.

In particular, visual understanding is a technology for recognizing and processing objects in the manner of human vision, and includes object recognition, object tracking, image search, human recognition, scene understanding, spatial understanding, and image enhancement.

A method and apparatus for encoding / decoding an image according to various embodiments are provided. The technical problem to be solved by this embodiment is not limited to the above-mentioned technical problems, and other technical problems can be deduced from the following embodiments.

According to an aspect of the present invention, there is provided an image decoding method comprising: receiving a bitstream of an encoded image; determining one or more blocks segmented from the encoded image; performing prediction based on a deep neural network (DNN) on a current block among the one or more blocks to generate prediction data for the current block; extracting residual data of the current block from the bitstream; and reconstructing the current block using the prediction data and the residual data.

Also, in the image decoding method according to an exemplary embodiment, the DNN may be a network that has been learned to generate the prediction data that minimizes an error with the original data of the current block.

The generating of the prediction data may include: generating first prediction data by performing a first DNN-based prediction on the current block; generating second prediction data by performing a second DNN-based prediction on the first prediction data; and generating the prediction data using the first prediction data and the second prediction data.

Also, in the image decoding method according to an exemplary embodiment, the first prediction may be performed based on a RNN (Recurrent Neural Network), and the second prediction may be performed based on a CNN (Convolutional Neural Network).

Also, in the image decoding method according to an embodiment, the RNN may be a network that is learned to generate the first prediction data that minimizes an error with the original data of the current block.

Also, in the image decoding method according to an exemplary embodiment, the CNN may be a network that is learned to generate the second prediction data that minimizes an error with the value obtained by subtracting the first prediction data from the original data of the current block.
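The relationship between the two predictions can be illustrated with a minimal sketch. The disclosure states only that the prediction data is generated "using" the first and second prediction data; because the CNN is described as targeting the original data minus the first prediction data, the sketch below assumes that the final prediction is their element-wise sum, clipped to the sample range. The function name `combine_predictions` and the `bit_depth` parameter are hypothetical.

```python
def combine_predictions(first, second, bit_depth=8):
    """Add the second (correction-like) prediction to the first and clip.

    Assumption: final prediction = first + second. The source does not
    state the exact combination rule.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(a + b, 0), max_val) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(first, second)
    ]


# Toy 2x2 block: first prediction (e.g. from the RNN) and a signed correction (e.g. from the CNN).
first = [[100, 102], [98, 250]]
second = [[3, -2], [0, 10]]
final = combine_predictions(first, second)
```

Under this assumption the second stage acts as a refinement of the first, so a perfect second prediction would make the final prediction equal to the original data.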

Also, in the image decoding method according to an embodiment, the generating of the first prediction data may include: inputting neighboring blocks adjacent to the current block into each cell of a long short-term memory (LSTM) network in the RNN, along a predetermined direction, at each time step; generating a cell output for each time step using a plurality of gates in each cell; and processing the cell output via a fully connected network of the RNN.
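As an illustration of a cell that produces an output per time step using a plurality of gates, the following sketch implements the standard LSTM update in scalar form. It is not taken from the disclosure: each neighboring block is assumed to be reduced to a single number, the parameter layout `params[gate] = (w_x, w_h, bias)` is hypothetical, and the fully connected network that processes the cell output is omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell (standard formulation)."""
    def pre(gate):
        w_x, w_h, bias = params[gate]
        return w_x * x + w_h * h_prev + bias

    i = sigmoid(pre("input"))         # input gate
    f = sigmoid(pre("forget"))        # forget gate
    o = sigmoid(pre("output"))        # output gate
    g = math.tanh(pre("candidate"))   # candidate cell state
    c = f * c_prev + i * g            # new cell state
    h = o * math.tanh(c)              # cell output for this time step
    return h, c


def run_lstm(neighbors, params):
    """Feed one neighbor value per time step and collect the cell outputs."""
    h, c = 0.0, 0.0
    outputs = []
    for x in neighbors:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs


# Hypothetical parameters, shared by all time steps.
params = {gate: (0.5, 0.5, 0.0) for gate in ("input", "forget", "output", "candidate")}
outputs = run_lstm([0.2, 0.4, 0.6], params)
```

In a real network the inputs, states, and weights would be vectors and matrices; the scalar form only shows how the gates interact.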

According to an embodiment of the present invention, the inputting step may include: determining one or more input angles based on the current block; determining a neighboring block for each input angle located along each input angle; and inputting the neighboring blocks for each input angle to each cell of the LSTM network in clockwise order.

In the image decoding method according to an embodiment, when there are a plurality of neighboring blocks for an input angle, the input order among the neighboring blocks located at the same input angle may follow the order of proximity to the current block.

Also, in the image decoding method according to an embodiment, the inputting may include inputting the neighboring blocks to each cell of the LSTM network in the order of Z scan.
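A Z-scan ordering of block positions can be sketched as follows. The sketch assumes the usual Z-order (Morton) definition, in which the bits of the horizontal and vertical block coordinates are interleaved; the disclosure does not define the scan in this passage, and the function names are hypothetical.

```python
def z_index(x, y, bits=16):
    """Interleave the bits of x (even positions) and y (odd positions)."""
    idx = 0
    for b in range(bits):
        idx |= ((x >> b) & 1) << (2 * b)
        idx |= ((y >> b) & 1) << (2 * b + 1)
    return idx


def z_scan_order(blocks):
    """Sort (x, y) block coordinates into Z-scan order."""
    return sorted(blocks, key=lambda pos: z_index(pos[0], pos[1]))


# Four blocks of a 2x2 arrangement, given in arbitrary order.
ordered = z_scan_order([(1, 1), (0, 1), (1, 0), (0, 0)])
```

With this definition the four blocks are visited top-left, top-right, bottom-left, bottom-right, and the same pattern repeats recursively at larger scales.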

According to an embodiment of the present invention, the generating of the second prediction data may include: inputting the first prediction data and neighboring reconstructed data adjacent to the current block to a convolutional layer of the CNN; and performing a convolution operation using a plurality of filters of the convolutional layer.
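The convolution operation with a plurality of filters can be sketched in plain Python. This is only an illustration of what a convolutional layer computes (a sliding weighted sum without padding, as is common in CNN implementations); the filter values, the input layout, and the function names are assumptions, not details from the disclosure.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def conv_layer(image, filters):
    """Apply every filter of the layer and return one feature map per filter."""
    return [conv2d_valid(image, k) for k in filters]


# Hypothetical input: samples of the first prediction data together with
# neighboring reconstructed samples, arranged as one 3x3 array.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
filters = [
    [[1, 0], [0, 0]],              # picks the top-left sample of each window
    [[0.25, 0.25], [0.25, 0.25]],  # averages each 2x2 window
]
feature_maps = conv_layer(image, filters)
```

Each filter yields its own feature map, so a layer with N filters turns one input array into N output arrays.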

According to an embodiment of the present invention, the generating of the prediction data may include: determining one or more reference pictures and one or more reference block positions referenced by the current block; and generating the prediction data using the one or more reference pictures and the one or more reference block positions.

Also, in the image decoding method according to an embodiment, information on the structure of the DNN and information on a block for performing prediction based on the DNN may be obtained from at least one of a video parameter set, a sequence parameter set, and a picture parameter set.

According to an embodiment of the present invention, there is provided an apparatus for decoding an image, the apparatus comprising: a receiver for receiving a bitstream of an encoded image; a block determining unit for determining one or more blocks divided from the encoded image; a prediction unit for generating prediction data for a current block among the one or more blocks by performing prediction based on a deep neural network (DNN) on the current block; and a reconstruction unit for extracting residual data of the current block from the bitstream and reconstructing the current block using the prediction data and the residual data.

According to an embodiment of the present invention, there is provided a method of encoding an image, the method comprising: determining one or more blocks for dividing an image; performing prediction based on a deep neural network (DNN) on a current block among the one or more blocks to generate prediction data for the current block; generating residual data of the current block using original data corresponding to the current block and the prediction data; and generating a bitstream in which the residual data is encoded.

By performing the prediction based on the learned DNN, the signaling of the prediction information can be omitted, and the coding and decoding efficiency can be enhanced.

FIG. 1 shows a detailed block diagram of an image encoding apparatus 100 according to an embodiment.

FIG. 2 shows a detailed block diagram of an image decoding apparatus 200 according to an embodiment.

FIG. 3 is a diagram showing an example of prediction information used for intra prediction.

FIG. 4 is a diagram showing an example of prediction information used for inter prediction.

FIG. 5 is a conceptual diagram illustrating a DNN-based prediction process according to an embodiment.

FIG. 6 is a diagram illustrating a DNN-based intra prediction process according to an exemplary embodiment of the present invention.

FIG. 7 is a diagram showing the structure of an RNN.

FIG. 8 is a diagram showing the structure of an LSTM.

FIG. 9A is a diagram showing an example of RNN input data for generating first prediction data.

FIG. 9B is a diagram showing another example of RNN input data for generating first prediction data.

FIG. 9C is a diagram showing another example of RNN input data for generating first prediction data.

FIG. 10 is a diagram showing the structure of a CNN.

FIG. 11 is a diagram showing an example of CNN input data for generating second prediction data.

FIG. 12 is a diagram illustrating a DNN-based inter prediction process according to an embodiment.

FIG. 13 is a diagram illustrating a structure of a bitstream according to an embodiment.

FIG. 14 shows a schematic block diagram of an image encoding apparatus 1400 according to an embodiment.

FIG. 15 shows a schematic block diagram of an image decoding apparatus 1500 according to an embodiment.

FIG. 16 is a flowchart illustrating an image encoding method according to an embodiment.

FIG. 17 is a flowchart illustrating an image decoding method according to an embodiment.

FIG. 18 illustrates a process in which at least one encoding unit is determined by dividing a current encoding unit according to an embodiment.

FIG. 19 illustrates a process in which at least one encoding unit is determined by dividing an encoding unit in the form of a non-square according to an embodiment.

FIG. 20 illustrates a process in which an encoding unit is divided based on at least one of block type information and division type information according to an embodiment.

FIG. 21 illustrates a method of determining a predetermined encoding unit among odd number of encoding units according to an embodiment.

FIG. 22 shows a sequence in which a plurality of encoding units are processed when a current encoding unit is divided to determine a plurality of encoding units according to an embodiment.

FIG. 23 illustrates a process of determining that a current encoding unit is divided into an odd number of encoding units when the encoding units cannot be processed in a predetermined order, according to an embodiment.

FIG. 24 illustrates a process in which a first encoding unit is divided according to an embodiment to determine at least one encoding unit.

FIG. 25 illustrates that, when a non-square second encoding unit determined by dividing a first encoding unit satisfies a predetermined condition, the forms into which the second encoding unit can be divided are limited, according to an embodiment.

FIG. 26 illustrates a process of dividing a square-shaped encoding unit when the division type information can not indicate division into four square-shaped encoding units according to an embodiment.

FIG. 27 illustrates that the processing order among a plurality of coding units may be changed according to the division process of coding units according to an embodiment.

FIG. 28 illustrates a process of determining the depth of an encoding unit according to a change in type and size of an encoding unit when a plurality of encoding units are determined by recursively dividing an encoding unit according to an embodiment.

FIG. 29 shows a depth and a part index (PID) for distinguishing coding units, which can be determined according to the type and size of the coding units, according to an exemplary embodiment.

FIG. 30 shows that a plurality of coding units are determined according to a plurality of predetermined data units included in a picture according to an embodiment.

FIG. 31 shows a processing block serving as a reference for determining a determination order of a reference encoding unit included in a picture according to an embodiment.

As used herein, the term "unit" refers to a software component or a hardware component such as an FPGA or an ASIC, and a "unit" performs certain roles. However, a "unit" is not limited to software or hardware. A "unit" may be configured to reside in an addressable storage medium and may be configured to execute on one or more processors. Thus, by way of example and not limitation, a "unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functions provided in the components and "units" may be combined into a smaller number of components and "units" or further separated into additional components and "units".

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily carry out the embodiments of the present invention. However, one embodiment may be implemented in various different forms and is not limited to the embodiments described herein. In order to clearly explain an embodiment in the drawings, parts not related to the description are omitted.

The terms used in an embodiment are general terms that are as widely used as possible, selected in consideration of their functions in the embodiment, but they may vary depending on the intention or circumstances of a person skilled in the art, the emergence of new techniques, and the like. In certain cases, a term may be selected arbitrarily by the applicant, in which case its meaning is described in detail in the corresponding part of the description. Therefore, a term used in an embodiment should be defined based on its meaning and on the overall content of the embodiment, not simply on its name.

This disclosure relates to a method of processing an image using artificial intelligence (AI) utilizing a machine learning algorithm. More specifically, the present disclosure relates to performing intra prediction or inter prediction using a deep neural network (DNN) in an image encoding and decoding process.

Hereinafter, an overall operation related to encoding and decoding of an image will be described with reference to FIGS. 1 and 2. The intra prediction and inter prediction methods will be described with reference to FIGS. 3 and 4. A method of generating a prediction image using artificial intelligence will be described with reference to FIGS. 5 to 17. A method of determining a data unit of an image according to an embodiment will be described with reference to FIGS. 18 to 31.

The image encoding apparatus 100 according to an embodiment includes a block determination unit 110, an inter prediction unit 115, an intra prediction unit 120, a reconstructed picture buffer 125, a transform unit 130, a quantization unit 135, an inverse quantization unit 140, an inverse transform unit 145, an in-loop filtering unit 150, and an entropy encoding unit 155.

The block determination unit 110 according to an exemplary embodiment may divide the image data of the current picture into maximum encoding units according to a maximum size of a block for encoding an image. Each maximum encoding unit may include blocks (i.e., encoding units) that are divided according to a block type and a division type. In a maximum encoding unit according to an exemplary embodiment, the image data of the spatial domain included in the maximum encoding unit may be classified hierarchically according to the block type and the division type. The block type of an encoding unit may be a square, a rectangle, or any geometric shape, and is not limited to a data unit of a fixed size.

As the size of a picture to be encoded increases, if an image is coded in a larger unit, the image can be encoded with a higher image compression rate. However, if the coding unit is enlarged and its size is fixed, the image can not be efficiently encoded reflecting the characteristics of the continuously changing image.

For example, when coding a flat area with respect to the sea or sky, the compression rate can be improved by increasing the coding unit. However, when coding a complex area for people or buildings, the compression ratio is improved as the coding unit is decreased.

For this, the block determination unit 110 according to an embodiment sets a maximum encoding unit of a different size for each picture or slice, and sets a block type and a division type of one or more encoding units divided from the maximum encoding unit. The size of the encoding unit included in the maximum encoding unit can be variably set according to the block type and the division type.
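The first step of this division, tiling a picture into maximum encoding units, can be sketched as below. The sketch assumes that units at the right and bottom picture boundaries are simply clipped to the picture size; the disclosure does not specify boundary handling here, and the function name is hypothetical. The further division of each maximum encoding unit by block type and division type is not shown.

```python
def split_into_max_units(width, height, max_size):
    """Return (x, y, w, h) tuples covering the picture in raster order.

    Assumption: units that cross the picture boundary are clipped.
    """
    units = []
    for y in range(0, height, max_size):
        for x in range(0, width, max_size):
            units.append((x, y, min(max_size, width - x), min(max_size, height - y)))
    return units


# Toy 100x70 picture with a maximum encoding unit size of 64.
units = split_into_max_units(100, 70, 64)
```

Choosing a different `max_size` per picture or slice, as described above, only changes the argument passed to the function.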

The block type and the division type of one or more encoding units may be determined based on a Rate-Distortion Cost calculation. The block type and the division type may be determined differently for each picture or slice, or may be determined differently for each maximum encoding unit. The determined block type and division type are outputted from the block determination unit 110 together with the image data for each coding unit.

According to one embodiment, an encoding unit divided from a maximum encoding unit may be characterized by a block type and a division type. A concrete method of determining encoding units by block type and division type will be described later in more detail with reference to FIGS. 18 to 31.

According to one embodiment, the coding units included in the maximum coding unit may be predicted or transformed (e.g., converting values of the pixel domain into values in the frequency domain) based on the processing units of different sizes. In other words, the image encoding apparatus 100 can perform a plurality of processing steps for image encoding based on various sizes and processing units of various types. In order to encode the image data, processing steps such as prediction, conversion, and entropy encoding are performed. The processing units of the same size may be used in all steps, and processing units of different sizes may be used in each step.

According to one embodiment, the prediction mode of an encoding unit may be at least one of an intra mode, an inter mode, and a skip mode, and the specific prediction mode may be performed only for a specific size or type of encoding unit. According to one embodiment, a prediction mode with the smallest coding error can be selected by performing prediction for each coding unit.

Also, the image encoding apparatus 100 can convert image data based on a processing unit having a different size from the encoding unit. The conversion may be performed based on a data unit having a size smaller than or equal to the encoding unit in order to convert the encoding unit.

According to an exemplary embodiment, the image encoding apparatus 100 may measure a coding error of an encoding unit using a Lagrangian Multiplier-based Rate-Distortion Optimization technique.
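Lagrangian rate-distortion optimization is commonly expressed as minimizing the cost J = D + λR, where D is the distortion, R is the rate in bits, and λ is the Lagrangian multiplier. The sketch below applies this general formula to choose between candidate codings; the candidate values and the dictionary keys are made up for illustration and are not from the disclosure.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def best_candidate(candidates, lam):
    """Pick the candidate with the smallest rate-distortion cost."""
    return min(candidates, key=lambda c: rd_cost(c["distortion"], c["rate_bits"], lam))


# Hypothetical measurements for two ways of coding the same area.
candidates = [
    {"name": "large unit", "distortion": 100, "rate_bits": 10},
    {"name": "small units", "distortion": 40, "rate_bits": 50},
]
choice_low_lambda = best_candidate(candidates, lam=1.0)["name"]
choice_high_lambda = best_candidate(candidates, lam=2.0)["name"]
```

A larger λ penalizes bits more heavily, so the decision shifts toward the cheaper-to-signal candidate even though its distortion is higher.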

The intra prediction unit 120 performs intra prediction on intra-mode blocks of the input image 105, and the inter prediction unit 115 performs inter prediction on inter-mode blocks using the input image 105 and a reference picture obtained from the reconstructed picture buffer 125. Whether intra prediction or inter prediction is performed may be determined for each block unit. The image encoding apparatus 100 may encode information on whether to perform intra prediction or inter prediction.

As will be described later, the intra-prediction unit 120 according to one embodiment can perform intra-prediction based on DNN, and the inter-prediction unit 115 can perform inter-prediction based on DNN.

Residual data is generated by calculating the difference between the data for a block of the input image 105 and the prediction data for that block output from the intra prediction unit 120 or the inter prediction unit 115. The residual data is output as quantized transform coefficients for each block through the transform unit 130 and the quantization unit 135. The quantized transform coefficients are restored into residual data in the spatial domain through the inverse quantization unit 140 and the inverse transform unit 145. The restored residual data of the spatial domain is added to the prediction data for each block output from the intra prediction unit 120 or the inter prediction unit 115, and is thereby reconstructed into spatial domain data for the block of the input image 105. The reconstructed spatial domain data is generated as a reconstructed image through the in-loop filtering unit 150. The in-loop filtering unit 150 may perform deblocking only, or may perform sample adaptive offset (SAO) filtering after deblocking. The generated reconstructed image is stored in the reconstructed picture buffer 125. The reconstructed pictures stored in the reconstructed picture buffer 125 may be used as reference pictures for inter prediction of other pictures. The transform coefficients quantized by the transform unit 130 and the quantization unit 135 may be output to the bitstream 160 via the entropy encoding unit 155.
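The residual and reconstruction path described above can be illustrated with a simplified sketch. It assumes a uniform scalar quantizer with a hypothetical step size and omits the transform, in-loop filtering, and entropy coding entirely, so it only shows why the encoder reconstructs from the quantized residual rather than from the original data.

```python
def quantize(value, step):
    """Uniform scalar quantization with rounding half away from zero."""
    sign = -1 if value < 0 else 1
    return sign * ((abs(value) + step // 2) // step)


def dequantize(level, step):
    return level * step


def encode_block(original, prediction, step):
    """Return the quantized residual levels and the reconstructed block."""
    levels = [
        [quantize(o - p, step) for o, p in zip(row_o, row_p)]
        for row_o, row_p in zip(original, prediction)
    ]
    reconstructed = [
        [p + dequantize(level, step) for p, level in zip(row_p, row_l)]
        for row_p, row_l in zip(prediction, levels)
    ]
    return levels, reconstructed


# Toy 2x2 block with a flat prediction and a quantization step of 4.
original = [[52, 60], [61, 47]]
prediction = [[50, 50], [50, 50]]
levels, reconstructed = encode_block(original, prediction, step=4)
```

The reconstructed block differs slightly from the original because of quantization, and it is this reconstructed data, identical on the encoder and decoder sides, that later predictions refer to.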

The bit stream 160 output from the image encoding apparatus 100 may include the encoding result of the residual data. In addition, the bitstream 160 may include a coding result of information indicating a block type, a division type, size information of a conversion unit, and the like.

The image decoding apparatus 200 according to an exemplary embodiment performs operations for decoding an image. The image decoding apparatus 200 according to an embodiment includes a receiving unit 210, a block determination unit 215, an entropy decoding unit 220, an inverse quantization unit 225, an inverse transform unit 230, an inter prediction unit 235, an intra prediction unit 240, a reconstructed picture buffer 245, and an in-loop filtering unit 250.

The receiving unit 210 of FIG. 2 receives the bit stream 205 of the encoded image.

The block determination unit 215 according to an exemplary embodiment may divide the image data of the current picture into maximum encoding units according to a maximum size of a block for decoding an image. Each maximum encoding unit may include blocks (i.e., encoding units) that are divided according to a block type and a division type. The block determination unit 215 according to an embodiment can divide image data of the spatial domain hierarchically according to the block type and the division type by obtaining division information from the bitstream 205. On the other hand, when the blocks used for decoding have a fixed shape and size, the block determination unit 215 can divide the image data without using the division information. The block determination unit 215 according to one embodiment may correspond to the block determination unit 110 of FIG. 1.

The entropy decoding unit 220 obtains the encoded image data to be decoded and the encoding information necessary for decoding from the bitstream 205. The encoded image data is a quantized transform coefficient, and the inverse quantization unit 225 and the inverse transform unit 230 reconstruct the residual data from the quantized transform coefficients.

The intra prediction unit 240 performs intra prediction on intra-mode blocks. The inter prediction unit 235 performs inter prediction on inter-mode blocks using reference pictures obtained from the reconstructed picture buffer 245. Whether intra prediction or inter prediction is performed may be determined for each block unit. The image decoding apparatus 200 can obtain information on whether to perform intra prediction or inter prediction from the bitstream 205.

As will be described later, the intra predictor 240 according to one embodiment can perform intraprediction based on DNN, and the inter predictor 235 can perform inter prediction based on DNN.

The spatial domain data for each block is reconstructed by adding the residual data to the prediction data for that block output from the intra prediction unit 240 or the inter prediction unit 235, and the reconstructed spatial domain data is output as a reconstructed image through the in-loop filtering unit 250. The in-loop filtering unit 250 may perform deblocking only, or may perform SAO filtering after deblocking.

As mentioned, the present disclosure includes techniques for applying DNN in performing intra-prediction or inter-prediction. Prior to describing the prediction operation, the DNN will be briefly described.

A neural network refers to a computational architecture that models the biological brain. Neural networks are software- or hardware-based recognition models that mimic the computational capabilities of biological systems using a large number of artificial neurons connected by connection lines. Artificial neurons, referred to as nodes, are connected to each other and collectively operate to process input data.

A neural network may include an input layer, a hidden layer, and an output layer. The input layer may receive an input for performing learning and transmit it to a hidden layer, and the output layer may generate an output of the neural network based on a signal received from nodes of the hidden layer. The hidden layer is located between the input layer and the output layer, and can change the learning data transmitted through the input layer to a value that is easy to predict. Nodes included in the input layer and the hidden layer can be connected to each other through connection lines having connection weights. In addition, the nodes included in the hidden layer and the output layer can be connected to each other through connection lines having connection weights. The input layer, the hidden layer, and the output layer may include a plurality of nodes.
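The layered structure described above can be sketched as a forward pass through one hidden layer. The weights and biases are arbitrary illustrative values, and the ReLU activation is an assumption, since the disclosure does not name an activation function in this passage.

```python
def dense(inputs, weights, biases):
    """Weighted sum per node: each row of weights holds one node's connection weights."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer (with activation) -> output layer."""
    hidden = relu(dense(x, hidden_w, hidden_b))
    return dense(hidden, out_w, out_b)


# Two input nodes, two hidden nodes, one output node (hypothetical weights).
y = forward(
    [1.0, 2.0],
    hidden_w=[[1.0, 0.0], [0.0, -1.0]],
    hidden_b=[0.0, 0.0],
    out_w=[[2.0, 3.0]],
    out_b=[0.5],
)
```

Stacking several such hidden layers between the input and the output gives the deep structure discussed next.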

The neural network may include a plurality of hidden layers. A neural network including a plurality of hidden layers is referred to as a deep neural network (DNN), and learning of DNN is referred to as deep learning. A node contained in a hidden layer is called a hidden node.

The DNN has a multilayer perceptron structure including a plurality of hidden layers. A perceptron is the mathematical model of an individual neuron, y = Wx + b. A multilayer perceptron can improve its prediction accuracy by learning through the backpropagation algorithm. In backpropagation-based learning, a y value is obtained at the output layer by starting from the input layer, and this value is compared with a reference label value (for example, data representing the correct answer, or data having the smallest error with the original data). In the case of an incorrect answer, the error is propagated from the output layer back toward the input layer, and the W and b values are updated according to the calculated cost.
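The update of W and b according to a calculated cost can be illustrated for a single perceptron y = Wx + b with scalar W and b. The sketch assumes a mean squared error cost and plain gradient descent with a hypothetical learning rate; with only one layer, backpropagation reduces to these two gradient formulas.

```python
def mse(w, b, samples):
    """Mean squared error between y = w*x + b and the reference labels."""
    return sum((w * x + b - t) ** 2 for x, t in samples) / len(samples)


def train_step(w, b, samples, lr):
    """One gradient descent update of w and b on the mean squared error."""
    n = len(samples)
    grad_w = sum(2 * (w * x + b - t) * x for x, t in samples) / n
    grad_b = sum(2 * (w * x + b - t) for x, t in samples) / n
    return w - lr * grad_w, b - lr * grad_b


# Reference labels generated from t = 2x + 1 (the "correct answer").
samples = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
w, b = 0.0, 0.0
initial_cost = mse(w, b, samples)
for _ in range(500):
    w, b = train_step(w, b, samples, lr=0.05)
final_cost = mse(w, b, samples)
```

After training, w and b approach 2 and 1. In a multilayer network the same idea applies, with the chain rule carrying the error from the output layer back through each hidden layer.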

By providing a specific input/output dataset to such a DNN, the DNN learns the data pattern of the provided input/output dataset at a high level and generates a model that infers the prediction image most similar to the original data. In the case of the intra prediction units 120 and 240 according to an embodiment, the input dataset is the reconstructed data used for intra prediction of the current block, and the output dataset may be the prediction data of the current block that minimizes the error with the original data. In the case of the inter prediction units 115 and 235 according to an embodiment, the input dataset is the data of the past and/or future reconstructed images referenced by the current block, and the output dataset may be the prediction data of the current block that minimizes the error with the original data. Meanwhile, the error between the original data and the prediction data can be measured based on the R-D cost.

In this way, when prediction is performed using a DNN learned to generate a prediction block that minimizes the error with the original data, separate prediction information (for example, a prediction mode, a reference picture index, and the like) need not be transmitted from the image encoding apparatus 100 to the image decoding apparatus 200. In addition, the image decoding apparatus 200 can generate a prediction block without using prediction information by using a DNN having the same structure as that of the image encoding apparatus 100. According to an embodiment, information on the network structure of the DNN may be transmitted from the image encoding apparatus 100 to the image decoding apparatus 200.

However, the DNN according to one embodiment is not limited to the above-described structure, and may be formed by networks of various structures.

Examples of various types of DNNs include a convolutional neural network (CNN), a recurrent neural network (RNN), a deep belief network (DBN), and a restricted Boltzmann machine (RBM), but the present invention is not limited thereto.

As described above, the prediction based on the DNN does not require the signaling of the prediction information. Prior to describing the DNN-based prediction operation, one example of prediction information used in a conventional prediction technique will be described with reference to FIGS. 3 and 4.

Intra prediction is a prediction technique that allows only spatial reference, and refers to predicting a current block using pixels of a block adjacent to a block to be predicted. The modes used in intra prediction may exist in various forms as shown in FIG.

The intra prediction modes shown in FIG. 3 include a vertical mode, a horizontal mode, a DC (direct current) mode, a diagonal down-left mode, a diagonal down-right mode, a vertical right mode, a vertical left mode, a horizontal up mode, and a horizontal down mode.

For example, an operation of predicting a 4x4 current block according to mode 0, i.e., the vertical mode in FIG. 3, will be described. The pixel values of the pixels A to D adjacent to the upper side of the 4x4 current block are predicted as the pixel values of the 4x4 current block. That is, the value of pixel A is predicted as the four pixel values included in the first column of the 4x4 current block, the value of pixel B is predicted as the four pixel values included in the second column, the value of pixel C is predicted as the four pixel values included in the third column, and the value of pixel D is predicted as the four pixel values included in the fourth column. A prediction block generated through intra prediction, which extends the values of neighboring pixels in a predetermined direction, thus has a certain directionality according to the prediction mode.
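The vertical mode described above can be written as a few lines of code: each of the pixels A to D above the block is copied down its column. The function name and the sample values are illustrative only.

```python
def predict_vertical(above, size=4):
    """Vertical intra prediction: repeat the row of upper neighbors in every row."""
    return [list(above[:size]) for _ in range(size)]


# Pixels A, B, C, D adjacent to the upper side of the 4x4 current block.
above = [10, 20, 30, 40]
block = predict_vertical(above)
first_column = [row[0] for row in block]
```

Here `first_column` contains the value of pixel A four times, which matches the description of the first column of the prediction block.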

As described above, in a typical intraprediction operation, prediction information such as a prediction mode is signaled, and the signaled prediction information is used for generating a prediction block.

However, intra prediction based on DNN according to an embodiment does not require signaling of prediction information. The learned DNN has the generalization ability to analyze the input pattern and find the feature to generate the correct prediction image. The intra prediction based on the DNN according to an embodiment uses the learned DNN so that the error between the data of the prediction block and the data of the original image is minimized.

The intra prediction method based on DNN will be described later in detail with reference to FIGS. 6 to 11.

Intra prediction is based on the fact that adjacent pixels in one picture are highly correlated. Similarly, the pictures constituting a video are highly correlated with one another in time. Therefore, a prediction value for a block in the current picture can be generated from a picture that was reconstructed at a previous time. The technique of generating a prediction block from a picture reconstructed at a previous time is referred to as inter prediction.

For example, in an image composed of 30 pictures per second, the difference between one picture and a neighboring picture is so small that it is very difficult for the human eye to distinguish. As a result, when an image is output at 30 pictures per second, a person perceives the pictures as continuous. Inter prediction exploits this: when the current picture is similar to the previous picture, an unknown pixel value of the current picture can be predicted from the already known pixel values of the previous picture. Inter prediction is based on the motion prediction technique. Motion prediction is performed by referring to a previous picture on the time axis, or by referring to both a previous picture and a future picture. A picture referred to in encoding or decoding the current picture is called a reference picture.

Referring to FIG. 4, an image is composed of a series of still images. These still images are divided into groups of pictures (GOPs). Each still image is referred to as a picture or a frame. One group of pictures includes an I picture 410, a P picture 420, and a B picture 430. The P picture 420 and the B picture 430 are coded by performing motion estimation and motion compensation using reference pictures. In particular, the B picture 430 is coded by bidirectional prediction, that is, by predicting from a past picture and a future picture in the forward and backward directions.

Referring to FIG. 4, motion estimation and motion compensation for coding a P-picture 420 use an I-picture 410 as a reference picture. The motion estimation and motion compensation for coding the B picture 430 uses the I picture 410 and the P picture 420 as reference pictures. As described above, in inter prediction, motion estimation and compensation can be performed using not only one reference picture but also multiple pictures.

The inter-prediction process includes searching for an optimal prediction block from reference pictures through motion estimation, and generating a prediction block through a motion compensation process. When a prediction block is generated through inter prediction, a residual signal, which is a difference value between the generated prediction block and the original block, is transformed, quantized, and entropy-encoded. At this time, prediction information such as a motion vector, a prediction direction, a reference picture index and the like is signaled together with a residual signal in a conventional inter prediction technique. That is, in a typical inter-prediction operation, prediction information is signaled, and the signaled prediction information is used for prediction block generation.
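The conventional inter prediction steps described above (motion estimation, motion compensation, residual generation) can be sketched as a full-search block matching in Python. This is a simplified illustration under stated assumptions: the cost measure (sum of absolute differences), the function names, and the toy pictures are all hypothetical, and transform, quantization, and entropy coding are omitted.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(picture, top, left, size):
    """Extract a size x size block whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in picture[top:top + size]]


def motion_search(current, reference, top, left, size, search_range):
    """Full search: return (motion_vector, prediction_block, cost)."""
    target = get_block(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate falls outside the reference picture
            candidate = get_block(reference, y, x, size)
            cost = sad(target, candidate)
            if best is None or cost < best[2]:
                best = ((dy, dx), candidate, cost)
    return best


# Toy 6x6 pictures: the current picture is the reference shifted right by one.
reference = [[(r * 7 + c * 3) % 50 for c in range(6)] for r in range(6)]
current = [[row[0]] + row[:-1] for row in reference]

mv, prediction, cost = motion_search(current, reference, top=2, left=2,
                                     size=2, search_range=1)
target = get_block(current, 2, 2, 2)
residual = [[t - p for t, p in zip(tr, pr)] for tr, pr in zip(target, prediction)]
print(mv, cost, residual)
```

In this conventional scheme the motion vector `mv` would have to be signaled together with the residual, which is the overhead the DNN-based approach aims to avoid.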

However, inter prediction based on DNN according to an embodiment does not require signaling of prediction information. The learned DNN has the generalization ability to analyze the input pattern and find the feature to generate the correct prediction image. The DNN-based inter prediction according to an embodiment uses the learned DNN so that the error between the data of the prediction block and the data of the original image is minimized.

A detailed description of the DNN-based inter prediction method will be described later with reference to FIG.

Referring to FIG. 5, an input image 510, a DNN 520, and a prediction image 530 are shown. When the area to be predicted is referred to as a current block 515, the input image 510 may be image data restored before the current block 515. Although the input image 510 of FIG. 5 is shown as belonging to the same picture as the current block 515, the input image 510 may instead belong to a picture different from the one containing the current block 515. For example, in intra prediction, restored data belonging to the same picture as the current block 515 is used as the input image 510, and in inter prediction, a previously restored picture is used as the input image 510.

The input image 510 may be input to the input layer of the DNN 520 as learning data. The data transferred through the input layer of the DNN 520 may be converted in the hidden layer into values that are easy to predict. The hidden layer is connected between the input layer and the output layer through connection lines having connection weights. The output layer of the DNN 520 may generate an output, i.e., a prediction image 530, based on the signals received from the nodes of the hidden layer. The input layer, the hidden layer, and the output layer each include a plurality of nodes. Through an algorithm operating between the plurality of nodes, the DNN 520 can generate a mapping between the input image 510 and the prediction image 530. When the DNN 520 has learned to output the prediction image 530 having the smallest error with respect to the input image 510, the DNN 520 has a generalization ability, i.e., it can generate a relatively correct output even for an input pattern that was not used for learning.

The DNN 520 according to one embodiment may be implemented as a set of layers including a convolution pooling layer, a hidden layer, and a fully connected layer. For example, the overall structure of the DNN 520 may be a convolution pooling layer followed by a hidden layer, which is in turn followed by a fully connected layer.

Also, the DNN 520 according to one embodiment may be implemented in the form of a CNN.

A CNN according to an exemplary embodiment is a structure suitable for image analysis. The CNN includes a feature extraction layer, which learns the features having the greatest discriminative power from given image data, and a prediction layer, which learns a prediction model so as to achieve high prediction performance based on the extracted features.

The feature extraction layer consists of several repetitions of a convolution layer, which creates a feature map by applying a plurality of filters to each region of the image, and a pooling layer, which extracts features that are invariant to changes in position or rotation by spatially integrating the feature maps. Through this, features of various levels can be extracted, from low-level features such as points, lines, and surfaces to complex and meaningful high-level features.

The convolution layer obtains a feature map by applying a nonlinear activation function to the inner product of the filter and the local receptive field for each patch of the input image. Compared to other network structures, a CNN is characterized by using filters with sparse connectivity and shared weights. This connection structure reduces the number of parameters to be learned and makes learning through the backpropagation algorithm efficient, resulting in improved prediction performance.
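As an illustration of how a convolution layer produces a feature map, the following Python sketch slides one shared filter over every patch of a small image and applies an activation to each inner product. The choice of ReLU as the activation, the function names, and the sample image and filter are assumptions made for this example; the text does not specify them.

```python
def relu(value):
    """Nonlinear activation; ReLU is an assumption, the text does not name one."""
    return max(0.0, value)


def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation form) followed by ReLU."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Inner product of the shared filter with the k x k receptive field.
            acc = sum(kernel[u][v] * image[i + u][j + v]
                      for u in range(k) for v in range(k))
            row.append(relu(acc))
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 patch with a vertical edge and a 3x3 edge-detecting filter.
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]
feature_map = conv2d(image, kernel)
print(feature_map)
```

The same nine weights are reused at every position (shared weights), and each output depends only on a 3x3 neighborhood (sparse connectivity), which is why the parameter count stays small.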

A pooling layer (or sub-sampling layer) generates a new feature map by utilizing the local information of the feature map obtained from the preceding convolution layer. In general, the feature map newly generated by the pooling layer is smaller than the original feature map. Typical pooling methods include max pooling, which selects the maximum value of the corresponding region in the feature map, and average pooling, which takes the average value of that region. The feature map of the pooling layer is generally less affected by the position of any structure or pattern in the input image than the feature map of the preceding layer. That is, the pooling layer can extract features that are more robust to local changes such as noise or distortion in the input image or the preceding feature map, and such features can play an important role in classification performance. Another role of the pooling layer is to reflect the characteristics of a progressively wider area toward the upper learning layers of the deep structure. As feature extraction layers are stacked, the lower layers reflect local characteristics, while the upper layers generate feature maps that reflect more abstract characteristics of the whole image.
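The two pooling methods named above can be sketched in Python as follows. The window size of 2x2 with non-overlapping windows, the function name `pool2d`, and the sample feature map are assumptions for illustration only.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping size x size pooling over a 2-D feature map."""
    reducer = max if mode == "max" else (lambda values: sum(values) / len(values))
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + u][j + v]
                      for u in range(size) for v in range(size)]
            row.append(reducer(window))
        pooled.append(row)
    return pooled


feature_map = [[1, 3, 2, 0],
               [5, 7, 4, 6],
               [9, 8, 1, 1],
               [2, 4, 3, 5]]
max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled, avg_pooled)
```

The 4x4 input is reduced to 2x2 in both cases, and shifting a value within its 2x2 window leaves the max-pooled output unchanged, which illustrates the reduced sensitivity to position.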

The final features extracted through the repetition of the convolution layer and the pooling layer are then combined, in the form of a fully connected layer, with a classification model such as a multi-layer perceptron (MLP) or a support vector machine (SVM), and are used for classification model learning and prediction.

Also, the DNN 520 according to one embodiment may be implemented in the form of an RNN.

According to one embodiment, in the structure of the DNN 520, the output of a hidden node in the previous time interval may be connected to hidden nodes in the current time interval. The output of a hidden node in the current time interval may be connected to hidden nodes in the next time interval. A neural network having such recurrent connections between hidden nodes in different time intervals is called a recurrent neural network (RNN). The RNN according to one embodiment may recognize sequential data. Sequential data is data having time and order, such as voice data, image data, biometric data, and handwriting data. For example, the recognition model of the RNN can recognize the pattern according to which the input image data changes.

An output data set for a specific period is provided to the RNN, which learns the data pattern for that period and thereby generates a model for inferring the prediction image 530 most similar to the original data.

The DNN-based intra prediction according to an exemplary embodiment may be performed in the intra prediction unit 120 of the image encoding apparatus 100 or the intra prediction unit 240 of the image decoding apparatus 200.

The intra prediction units 120 and 240 according to an exemplary embodiment may generate prediction data by performing DNN-based prediction on a current block 610 to be predicted. The DNN-based prediction may include a first prediction and a second prediction. The prediction data generated by the DNN-based prediction may mean the final prediction data 660 generated by the first prediction and the second prediction. The first prediction may be based on an RNN and the second prediction may be based on a CNN. Thus, DNN-based prediction according to an embodiment may include a first prediction 620 based on an RNN and a second prediction 630 based on a CNN.

Referring to FIG. 6, the intra prediction units 120 and 240 may generate first prediction data 640 by performing a first prediction 620 based on an RNN on the current block 610 to be predicted. Here, the RNN may be a network trained so that the first prediction data 640 generated at the output of the RNN is the same as the original data of the current block 610. That is, by using the trained RNN, the first prediction data 640 having the smallest error with respect to the original data of the current block 610 can be generated. The structure of the RNN will be described in more detail with reference to FIGS. 7 and 8.

A first prediction 620 based on an RNN according to an embodiment may use the neighboring blocks 612, 614, 616, and 618 adjacent to the current block 610 as inputs. The neighboring blocks 612, 614, 616, and 618 may be blocks restored before the current block 610. Although the neighboring blocks 612, 614, 616, and 618 in FIG. 6 are shown as located on the upper left, upper, upper right, and left sides of the current block 610, they may be located at other positions.

Prediction using the neighboring blocks 612, 614, 616, and 618 in intra prediction relies on the neighboring blocks 612, 614, 616, and 618 having continuity or directionality with respect to the current block 610. Thus, in order to infer the correlation from successive input patterns, it may be desirable to use an RNN, which makes it possible to link previous information to the current task. For example, the order in which the neighboring blocks 612, 614, 616, and 618 are input to the RNN may affect the prediction of the current block 610.

Hereinafter, the data input to the RNN for the first prediction 620 based on the RNN will be referred to as "first input data". The RNN according to one embodiment can recognize sequential data. The order in which the "first input data" is input to the RNN will be described in detail with reference to FIGS. 9A to 9C.

The intra prediction units 120 and 240 may generate the second prediction data 650 by performing a second prediction 630 based on a CNN on the generated first prediction data 640. The CNN may be a network trained so that the second prediction data 650 generated at the output of the CNN is equal to the original data of the current block 610 minus the first prediction data 640. By using the trained CNN, the second prediction data 650 having the smallest error with respect to the original data of the current block 610 minus the first prediction data 640 can be generated. In other words, the CNN-based second prediction 630 can be understood as a process of predicting the value obtained by subtracting the first prediction data 640 from the original data of the current block 610. The structure of the CNN will be described in more detail with reference to FIG. 10.

The second prediction 630 based on the CNN according to an embodiment may use, as an input, the data of the region including the current block 610 and the neighboring blocks 612, 614, 616, and 618. The data of the region including the current block 610 and the neighboring blocks 612, 614, 616, and 618 may consist of the first prediction data 640 corresponding to the current block 610 and the restored data corresponding to the neighboring blocks 612, 614, 616, and 618. Hereinafter, the data input to the CNN for the second prediction 630 based on the CNN will be referred to as "second input data".

The "second input data" will be described in detail with reference to FIG.

The intra prediction units 120 and 240 may generate the final prediction data 660 for the current block 610 by adding the first prediction data 640 and the second prediction data 650 together.

The image encoding apparatus 100 according to an embodiment generates residual data by calculating the difference between the original data of the current block 610 and the final prediction data 660, generates a bitstream by encoding the generated residual data, and transmits the bitstream to the image decoding apparatus 200. The image encoding apparatus 100 according to the embodiment does not encode additional prediction information (for example, prediction mode information).

The image decoding apparatus 200 according to an embodiment can restore the data of the current block 610 by adding the residual data obtained from the bitstream to the final prediction data 660. At this time, the image decoding apparatus 200 can generate the final prediction data 660 without acquiring additional prediction information from the bitstream.

FIG. 7 is a diagram showing a structure of an RNN.

An RNN is a network in which there are connections between hidden nodes in different time intervals, and the network can be trained through supervised learning. Supervised learning is a method of inputting learning data and the output data corresponding thereto together into a neural network and updating the connection weights of the connection lines so that the output data corresponding to the learning data is output. For example, the RNN can update the connection weights between neurons through the delta rule and error backpropagation learning.

Error backpropagation learning estimates the error by forward computation for given learning data, then propagates the estimated error backward from the output layer through the hidden layer toward the input layer, and updates the connection weights in the direction that reduces the error. The processing of the neural network proceeds in the direction of the input layer, the hidden layer, and the output layer, whereas in error backpropagation learning the connection weights are updated in the direction of the output layer, the hidden layer, and the input layer.

The RNN may define an objective function for measuring how close the currently set connection weights are to optimal, continually change the connection weights based on the result of the objective function, and repeat the learning. For example, the objective function may be an error function for calculating the error between the actual output value of the RNN based on the training data and the expected value desired to be output. The RNN may update the connection weights in a direction that reduces the value of the error function.
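The idea of updating a connection weight in the direction that reduces the error function can be shown with a deliberately small Python sketch. It trains a single linear unit with one weight using the delta rule on a squared error; this is not the RNN of the embodiment, and the learning rate, epoch count, function names, and training samples are all assumptions chosen for illustration.

```python
def train_weight(samples, weight=0.0, learning_rate=0.1, epochs=50):
    """Fit output = weight * input by gradient descent on the squared error."""
    for _ in range(epochs):
        for x, expected in samples:
            output = weight * x
            error = output - expected
            # d(0.5 * error^2)/d(weight) = error * x; step against the gradient.
            weight -= learning_rate * error * x
    return weight


def total_error(samples, weight):
    """Objective function: sum of half squared errors over all samples."""
    return sum(0.5 * (weight * x - expected) ** 2 for x, expected in samples)


samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # expected = 2 * input
trained = train_weight(samples)
print(trained, total_error(samples, 0.0), total_error(samples, trained))
```

In a multi-layer network the same update is applied to every connection weight, with the per-weight gradient obtained by propagating the error backward from the output layer.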

The intra prediction units 120 and 240 according to an embodiment may perform a first prediction based on an RNN on the current block 610. Here, the RNN may include a long short-term memory (LSTM) structure. An LSTM is a kind of RNN that is capable of learning long-term dependencies. An RNN that does not include an LSTM network can link previous information to the current task, but has difficulty linking information from a task that is far in the past to the current task. The LSTM is a structure designed to avoid this long-term dependency problem. The detailed structure of the LSTM will be described later with reference to FIG. 8.

Referring to FIG. 7, the RNN may include an LSTM network 720 and a fully connected network 730.

For learning of the RNN, the LSTM network 720 may detect feature values from the input data 710. For example, the LSTM network 720 may extract a relative change amount that varies with time in the input data 710 as a feature value. The LSTM network 720 may obtain sufficient feature values from the input data 710 and may learn the RNN using the obtained feature values. Here, the input data 710 may be the first input data.

The RNN according to an exemplary embodiment may learn a change trend of a block that changes in a specific direction. To this end, the neighboring blocks of the current block may be input to the LSTM network 720 in a changing order. In this case, the blocks input to the LSTM network 720 are blocks in the same time frame.

According to one embodiment, the input data 710 may be input to the LSTM network 720 in order. According to one embodiment, neighboring blocks adjacent to the current block may be input to the LSTM network 720 in an input order corresponding to the change trend. For example, the neighboring blocks may be input to and learned by the LSTM network 720 at each time step or time stamp. For example, the neighboring blocks may be input to the LSTM network 720 in the order of the '0' input data 710, the '1' input data 710, the '2' input data 710, and so on.

In a subsequent time interval, the output value output from the LSTM network 720 of the RNN may be input to the LSTM network 720 at the next time step. For example, the output value "s1" of the LSTM network 720 that processed the '0' input data 710 may be input to the LSTM network 720 that processes the '1' input data 710. The output value "s2" of the LSTM network 720 that processed the '1' input data 710 may be input to the LSTM network 720 that processes the '2' input data 710.
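The way an output value is handed from one time step to the next can be sketched with a plain recurrent update in Python. This is a simplified stand-in, not an LSTM cell: the tanh recurrence, the scalar inputs, the weights `w_input` and `w_state`, and the function names are all assumptions for illustration.

```python
import math


def recurrent_step(x, previous_state, w_input=0.5, w_state=0.8):
    """One time step: combine the current input with the previous output."""
    return math.tanh(w_input * x + w_state * previous_state)


def run_sequence(inputs, initial_state=0.0):
    """Feed inputs in order; each step's output becomes the next step's state."""
    state = initial_state
    states = []
    for x in inputs:
        state = recurrent_step(x, state)
        states.append(state)
    return states


# Scalar stand-ins for the '0', '1', and '2' input data.
forward = run_sequence([0.2, 0.4, 0.9])
backward = run_sequence([0.9, 0.4, 0.2])
print(forward[-1], backward[-1])
```

Feeding the same three values in reverse order yields a different final state, which illustrates why the input order of the neighboring blocks can affect the prediction.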

Referring to FIG. 7, for example, if the LSTM network 720 that performs learning on the '1' input data 710 represents the learning pattern at the current time step T, then the LSTM network 720 that performs learning on the '0' input data 710 represents the learning pattern at the previous time step T-1, and the LSTM network 720 that performs learning on the '2' input data 710 represents the learning pattern at the next time step T+1. As such, the LSTM network 720 uses the structures for the previous time step, the current time step, and the next time step for learning. The information of the current step in the LSTM network 720 may be transferred to the next step and affect the output value.

The fully connected network 730 may classify the learning results of the LSTM network 720 for sequential data and output the output data 740 from the output layer of the RNN. The output data 740 according to one embodiment may be the first prediction data.

The learning process of the RNN may include a process of comparing the output value generated at each time step with a desired expected value, and adjusting the connection weight of the nodes in a direction of reducing the difference between the output value and the expected value. For example, the input data 710 input to the LSTM network 720 may be multiplied and added with the connection weights of the LSTM network 720 and the fully connected network 730. At this time, a difference may occur between the output data 740 of the generated RNN and the expected output data, and the RNN may update the connection weights of the nodes in the direction of minimizing the difference.

FIG. 8 is a diagram showing the structure of the LSTM.

The LSTM 800 shown in FIG. 8 corresponds to each cell forming the LSTM network 720 of FIG. 7. The LSTM 800 is a deep learning structure that avoids the long-term dependency problem of the RNN by selectively updating the cell state that stores information.

Information may be added to or erased from the cell state of the LSTM 800 through structures called gates. Each cell may be composed of three gates for performing write, read, and keep operations, and each gate may have a value between '0' and '1'. The value of each gate is the basis for determining whether to store, retrieve, or hold cell information. As such, the gates of each cell can selectively pass information. The selective passing of information may be implemented by a sigmoid layer, a tanh layer, and a pointwise multiplication operation. The values between '0' and '1' of each gate can be learned on the same principle as the weights of the neural network.

Referring to FIG. 8, a gated recurrent unit (GRU), which is one of various types of the LSTM 800, is illustrated.

According to one embodiment, each module of the LSTM 800 includes a plurality of interacting layers. The LSTM 800 may generate a new cell state and a new cell output at each time step by applying a plurality of gates to the current cell input and the current cell state.

The first sigmoid layer of the LSTM 800 receives h_{t-1} and x_t and outputs r_t. The output value of the first sigmoid layer determines whether h_{t-1}, i.e., the cell state of the previous step, is maintained. An output value of '1' from the sigmoid layer means "completely maintained" and an output value of '0' means "completely removed". The function of the first sigmoid layer is given by Equation (1).

The second sigmoid layer of the LSTM 800 receives h_{t-1} and x_t and outputs z_t according to Equation (2). The second sigmoid layer determines which values are to be updated.

The tanh layer of the LSTM 800 generates a vector of new candidate values that can be added to the cell state. The output value of the second sigmoid layer and the output value of the tanh layer can be combined to produce the value with which the cell state is updated. The function of the tanh layer is given by Equation (3).

Lastly, the LSTM 800 can update the cell state h_{t-1} of the previous step using Equation (4). The updated new cell state is denoted by h_t.
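Equations (1) to (4) are not reproduced in this text. For orientation only, the commonly used GRU formulation, which is consistent with the roles of r_t, z_t, the tanh layer, and h_t described above, can be written as below. The weight matrices W_r, W_z, and W, the concatenation notation [·, ·], the candidate symbol \tilde{h}_t, and the omission of bias terms are assumptions and may differ from the equations of the original document.

```latex
\begin{aligned}
r_t &= \sigma\left(W_r \cdot [h_{t-1}, x_t]\right) \\
z_t &= \sigma\left(W_z \cdot [h_{t-1}, x_t]\right) \\
\tilde{h}_t &= \tanh\left(W \cdot [r_t \odot h_{t-1}, x_t]\right) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

Here \sigma is the sigmoid function and \odot is pointwise multiplication. With r_t = 1 the previous state enters the candidate unchanged, and with r_t = 0 it is removed, matching the "completely maintained" and "completely removed" cases.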

The new cell state derived by Equation (4) may serve as a basis for determining whether to use the data input to the corresponding cell in network learning. As such, the LSTM 800 can avoid the long-term dependency problem of the RNN by selectively updating cell state storing information.

Meanwhile, the LSTM 800 according to an exemplary embodiment is not limited to the GRU structure as described above, and may have a modified structure in various forms.

As described above with reference to FIGS. 7 and 8, the RNN can be utilized to recognize sequential data. That is, when the sequential data is inputted, the recognition model of the RNN extracts the feature value from the sequential data, and outputs the recognition result by classifying the extracted feature value. A sequential data input method of an RNN according to an embodiment will be described below.

Since intra prediction is a process of predicting a current block depending on a pattern of a neighboring block having a certain directionality, it is preferable that the input data for learning of the RNN is also sequentially input according to a certain direction.

Referring to FIG. 9A, a current block 910 to be predicted and neighboring blocks (blocks '0' to '11') are shown. The intra prediction units 120 and 240 may use the neighboring blocks adjacent to the current block 910 as the first input data, i.e., as the input of the RNN, in order to perform the first prediction. The first input data is data restored before the current block 910, and the positions at which the first input data is distributed are not limited to those shown in FIG. 9A.

According to one embodiment, the intra prediction units 120 and 240 may determine one or more input angles 912, 914, 916, and 918 based on the current block 910. The one or more input angles 912, 914, 916, and 918 may be preset. According to another embodiment, the one or more input angles 912, 914, 916, and 918 may be determined by information signaled from the image encoding apparatus 100 to the image decoding apparatus 200.

According to one embodiment, the intra prediction units 120 and 240 may determine, for each input angle, the neighboring blocks (blocks '0' to '11') located along that one of the input angles 912, 914, 916, and 918. The neighboring blocks for each input angle may correspond to the first input data for generating the first prediction data.

According to an embodiment, the intra prediction units 120 and 240 may input the neighboring blocks (blocks '0' to '11') for each input angle to each cell of the LSTM network in clockwise order. For example, the neighboring blocks located at each input angle may be input to each cell of the LSTM network, according to the time step, in the order of the input angle 912, the input angle 914, the input angle 916, and the input angle 918. However, the neighboring blocks for each input angle need not be input in clockwise order; a counterclockwise order or another predetermined order may be followed.

If there are a plurality of neighboring blocks per input angle, the neighboring blocks located at the same input angle may be input in order from the position farthest from the current block to the position closest to it. For example, referring to FIG. 9A, the input order of the neighboring blocks (blocks '0' to '2') located at the input angle 912 is block '0', block '1', and block '2'. However, the input order of neighboring blocks located at the same input angle is not limited thereto. For example, the neighboring blocks located at the same input angle may be input in order from the position closest to the current block to the position farthest from it.
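The ordering rule described above (angle by angle, and by distance within each angle) can be sketched in Python as follows. The function name, the data layout, the distance values, and the assignment of blocks '3' to '11' to the input angles 914, 916, and 918 are assumptions made for illustration.

```python
def build_input_order(blocks_by_angle, angle_order, far_to_near=True):
    """Flatten neighboring blocks into one sequence of RNN inputs.

    blocks_by_angle maps an input angle to (block_id, distance) pairs, where
    distance is measured from the current block.
    """
    sequence = []
    for angle in angle_order:
        blocks = sorted(blocks_by_angle[angle], key=lambda item: item[1],
                        reverse=far_to_near)
        sequence.extend(block_id for block_id, _ in blocks)
    return sequence


# Hypothetical layout loosely following FIG. 9A: three blocks per input angle,
# with lower block numbers assumed to lie farther from the current block.
blocks_by_angle = {
    912: [(0, 3), (1, 2), (2, 1)],
    914: [(3, 3), (4, 2), (5, 1)],
    916: [(6, 3), (7, 2), (8, 1)],
    918: [(9, 3), (10, 2), (11, 1)],
}
clockwise = [912, 914, 916, 918]
order = build_input_order(blocks_by_angle, clockwise)
print(order)
```

Passing `far_to_near=False`, or a reversed `angle_order`, gives the alternative orders mentioned in the text without changing the rest of the pipeline.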

The order in which the intra prediction units 120 and 240 according to an exemplary embodiment input the neighboring blocks to the RNN may be set in advance. According to another embodiment, the input order of the neighboring blocks may be determined by information signaled from the image encoding apparatus 100 to the image decoding apparatus 200. For example, the first input data may be input to the RNN in the order described below. However, the following embodiments are for illustrative purposes only, and the input order of the first input data may be variously modified.

First, the intra prediction units 120 and 240 may input each of the left neighboring blocks (blocks '0' to '2'), located to the left of the current block 910, to each cell of the LSTM network of the RNN in order starting from the position farthest from the current block 910.

Next, the intra prediction units 120 and 240 may input each of the upper left neighboring blocks (blocks '3' to '5'), located in the upper left direction of the current block 910, to each cell of the LSTM network of the RNN in order starting from the position farthest from the current block 910.

Next, the intra prediction units 120 and 240 may input each of the upper neighboring blocks (blocks '6' to '8'), located in the upper direction of the current block 910, to each cell of the LSTM network of the RNN in order starting from the position farthest from the current block 910.

Next, the intra prediction units 120 and 240 may input each of the upper right neighboring blocks (blocks '9' to '11'), located in the upper right direction of the current block 910, to each cell of the LSTM network of the RNN in order starting from the position farthest from the current block 910.

According to one embodiment, the intra prediction units 120 and 240 may input the neighboring blocks for each input angle to each cell of the LSTM network in clockwise order. For example, the neighboring blocks located at each input angle may be input in the order of the input angle 922, the input angle 924, the input angle 926, the input angle 928, the input angle 930, the input angle 932, and the input angle 934. If there are a plurality of neighboring blocks per input angle, the neighboring blocks located at the same input angle may be input in order from the position farthest from the current block to the position closest to it. For example, referring to FIG. 9B, the input order of the neighboring blocks (blocks '3' to '5') located at the input angle 926 is block '3', block '4', and block '5'. However, the above-described input order is merely exemplary, and neighboring blocks located at the same input angle can be input to the LSTM network in various input orders.

Referring to FIG. 9C, a current block 940 to be predicted and neighboring blocks (blocks '0' to '5') are shown. The intra prediction units 120 and 240 according to an exemplary embodiment may use the neighboring blocks adjacent to the current block 940 as the first input data, i.e., as the input of the RNN, in order to perform the first prediction. The first input data is data restored before the current block 940.

The intra prediction units 120 and 240 according to an exemplary embodiment may input the neighboring blocks to each cell of the LSTM network in the order of the Z-scan 942.

For example, the intra prediction units 120 and 240 may input the neighboring blocks to each cell of the LSTM network of the RNN in order from the upper left position of the current block 940, through the upper right position of the current block 940, to the left position of the current block 940 (i.e., from block '0' to block '5').

However, the above-mentioned input order of the neighboring blocks is merely an example, and the neighboring blocks may be input to each cell of the LSTM network in various scan orders (for example, raster scan, N-scan, up-right diagonal scan, horizontal scan, vertical scan, etc.).

FIG. 10 is a view showing the structure of a CNN.

Referring to FIG. 10, input data 1010 is input through an input layer of the CNN 1020 and output data 1030 is output through an output layer of the CNN 1020.

A plurality of hidden layers may be included between the input layer and the output layer of the CNN 1020. Each layer forming the hidden layers may include a convolution layer and a subsampling layer. The convolution layer performs a convolution operation on the image data input to each layer using a convolution filter, and generates a feature map. Here, the feature map means image data in which various features of the input data 1010 are expressed. The subsampling layer reduces the size of the feature map by sampling or pooling. The output layer of the CNN 1020 classifies the class of the image data by combining the various features expressed in the feature map. The output layer may be configured as a fully connected layer.

The structure of the CNN according to an embodiment (e.g., the number of hidden layers, the number and size of filters in each layer, etc.) is determined in advance, and the weight matrix of each filter (in particular, each convolution filter) is set to appropriate values using data whose correct class is already known. Data with known answers used in this way are called 'training data', and the process of determining the weight matrices of the filters is called 'learning'.

For example, in the structure of CNN 1020, the number of filters per layer may be 64, and the size of each filter may be 3x3. Also, for example, in the structure of CNN 1020, the total number of layers may be ten. However, the above-described embodiment is merely for illustrative purposes, and the number of hidden layers, the number and size of filters in each layer, and the like can be changed according to various forms.

As described above with reference to FIG. 6, the intra prediction units 120 and 240 according to an embodiment perform a second prediction based on the CNN 1020 on the first prediction data generated through the RNN-based first prediction to generate second prediction data (i.e., the output data 1030). The CNN 1020 according to an embodiment may be a network trained such that the second prediction data, which is the output data 1030, is equal to the original data of the current block minus the first prediction data.

In this case, the input data 1010 of the CNN 1020 may be the first prediction data and the neighboring reconstructed data adjacent to the current block, and the output data 1030 may be prediction data of the current block that minimizes the error with respect to the value obtained by subtracting the first prediction data from the original data of the current block. The error may be measured based on the R-D cost.

The CNN input data 1100 according to an embodiment is second input data. The CNN input data 1100 corresponds to the input data 1010 of FIG. 10.

Referring to FIG. 11, the CNN input data 1100 according to an embodiment may include first prediction data 1110 and neighboring reconstructed data 1120, 1130, and 1140. However, the above embodiment is for illustrative purposes only, and the CNN input data 1100 can be modified into various forms.

The intra prediction units 120 and 240 according to an embodiment may generate final prediction data for the current block by summing the first prediction data output through the RNN-based first prediction operation of FIG. 7 and the second prediction data output through the CNN-based second prediction operation of FIG. 10.

The image encoding apparatus 100 generates residual data by calculating the difference between the original data of the current block and the final prediction data, generates a bitstream by encoding the generated residual data, and transmits the bitstream to the image decoding apparatus 200. The image encoding apparatus 100 according to the embodiment does not encode additional prediction information (for example, prediction mode information). According to an embodiment, the trained DNNs (i.e., the RNN and the CNN) have a generalization capability that allows them to analyze input patterns, find their characteristics, and generate a correct prediction image.

The image decoding apparatus 200 according to the embodiment can reconstruct the data of the current block by adding the residual data obtained from the bitstream to the final prediction data. At this time, the image decoding apparatus 200 can generate the final prediction data without acquiring additional prediction information from the bitstream.
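The arithmetic relationship between the two prediction stages, the residual, and the reconstruction can be traced on a small example. The following sketch assumes hypothetical 2x2 sample values; `first_pred` and `second_pred` stand in for the outputs of the RNN and the CNN, which are not modeled here.

```python
def add_blocks(a, b):
    """Element-wise sum of two equally sized 2-D blocks."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def sub_blocks(a, b):
    """Element-wise difference a - b of two equally sized 2-D blocks."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


original = [[52, 55], [61, 59]]        # hypothetical original samples
first_pred = [[50, 50], [60, 60]]      # stand-in for the RNN-based first prediction

# Training target of the CNN: original minus first prediction.
cnn_target = sub_blocks(original, first_pred)

# Stand-in for an imperfect CNN output (second prediction).
second_pred = [[2, 4], [1, 0]]

# Encoder and decoder both form the same final prediction.
final_pred = add_blocks(first_pred, second_pred)

# Encoder side: only this residual is coded.
residual = sub_blocks(original, final_pred)

# Decoder side: final prediction plus residual gives back the block.
reconstructed = add_blocks(final_pred, residual)
```

The closer the second prediction is to `cnn_target`, the smaller the residual that has to be coded.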

The inter prediction process is a process of finding an optimal prediction block from reference pictures through motion estimation and generating a prediction block through motion compensation. A video is composed of a series of still images, and the still images are divided into groups of pictures. One group of pictures includes an I picture, a P picture, and a B picture. Among them, the P picture and the B picture are pictures that are coded by performing motion estimation and motion compensation using reference pictures.

The DNN-based inter prediction may be performed in the inter-prediction unit 115 of the image coding apparatus 100 or the inter-prediction unit 235 of the image decoding apparatus 200 according to an embodiment. The DNN-based inter prediction according to an exemplary embodiment may be based on RNN, CNN, etc.

Referring to FIG. 12, a process of inputting reconstructed pictures 1220 and 1230 to the DNN 1240 in order to predict the current picture 1210 is illustrated. Here, the current picture 1210 may be a P picture or a B picture. If the current picture 1210 is a picture corresponding to time t, the reconstructed picture 1220 used for inter prediction may be a reconstructed picture corresponding to a past time (for example, t-1, t-2) relative to time t, and the reconstructed picture 1230 may be a reconstructed picture corresponding to a future time (for example, t+1, t+2) relative to time t. In addition, the DNN 1240 according to an embodiment may use information on the characteristics of the input image, such as the type of the current picture, to generate the prediction data 1250.

The inter prediction units 115 and 235 according to an embodiment may generate the prediction data 1250 by performing DNN-based inter prediction on the current block in the current picture 1210 to be predicted. In this case, the DNN 1240 may be a network trained such that the prediction data 1250 generated as its output is the same as the original data of the current block in the current picture 1210. That is, by using the trained DNN, the prediction data 1250 having the smallest error from the original data of the current block can be generated. The trained DNN 1240 has a generalization capability that allows it to analyze an input pattern and find its characteristics in order to determine a correct reference picture and reference block location. Thus, DNN-based inter prediction according to an embodiment does not require signaling of prediction information such as a motion vector, a prediction direction, a reference picture index, and the like.

The DNN 1240 according to an embodiment may be implemented as a set of layers including a convolution pooling layer, a hidden layer, and a fully connected layer. For example, the overall structure of the DNN 1240 may be such that the hidden layer follows the convolution pooling layer and the fully connected layer follows the hidden layer.

Since the structure of the DNN 1240 according to an embodiment is as described above with reference to FIG. 5, a detailed description of the structure of the DNN 1240 will be omitted.

The image encoding apparatus 100 generates residual data by calculating the difference between the original data of the current block in the current picture 1210 and the prediction data 1250, generates a bitstream by encoding the residual data, and transmits the generated bitstream to the image decoding apparatus 200. The image encoding apparatus 100 according to the embodiment does not encode additional prediction information (e.g., a motion vector, a prediction direction, a reference picture index, etc.).

The image decoding apparatus 200 according to an embodiment can reconstruct the data of the current block in the current picture 1210 by adding the residual data acquired from the bitstream to the prediction data 1250. At this time, the image decoding apparatus 200 can generate the prediction data 1250 without acquiring additional prediction information from the bitstream.

The encoded bitstream 1300 is composed of a plurality of NAL (Network Abstraction Layer) units. A NAL unit is used to store not only encoded sample data such as an encoded slice 1340 but also high-level syntax metadata such as parameter set data, slice header data (not shown), or supplemental enhancement information (SEI) data (not shown).

The parameter set may include essential syntax elements that may be applied to multiple bitstream layers (e.g., a video parameter set (VPS) 1310), essential syntax elements that may be applied to an encoded video sequence within one layer (e.g., a sequence parameter set (SPS) 1320), or essential syntax elements that may be applied to multiple pictures within one encoded video sequence (e.g., a picture parameter set (PPS) 1330). The parameter set may be transmitted together with the encoded pictures of the bitstream, or may be transmitted via other means, including a reliable channel, hard-coding, out-of-band transmission, and the like.

The slice header may be a high-level syntax structure that includes picture-related information for slices or picture types.

The SEI message may convey information that may be used for various other purposes such as picture output timing, display, loss detection, and concealment, although this may not be necessary for the decoding process.

According to an embodiment, the parameter set included in the bitstream 1300 may include additional information for performing DNN-based prediction. The additional information according to an embodiment may include information on the structure of the DNN (e.g., a filter set, information on the number of nodes, etc.) and information on the block to which the DNN is applied. Further, the additional information may include information on the input angle and/or the input order for determining the order in which input data is input to the RNN.

For example, the additional information may be signaled through the video parameter set 1310, the sequence parameter set 1320, the picture parameter set 1330, etc. in the bitstream 1300.

Meanwhile, the additional information may be transmitted through a bitstream, but it may be shared in advance between the image encoding apparatus 100 and the image decoding apparatus 200. Further, the additional information may be shared through a separate server capable of communication.
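One way to picture the additional information and its two delivery paths is the following sketch. All names here (`DnnSideInfo`, its fields, `resolve_side_info`) are hypothetical, and the rule that a more specific parameter set overrides a more general one is an assumption made for illustration; the disclosure only states that the information may be carried in the VPS, SPS, or PPS, or shared in advance.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DnnSideInfo:
    """Hypothetical container for the additional information."""
    filter_set: str      # identifies the DNN filter set
    num_nodes: int       # information on the number of nodes
    block_size: int      # block to which the DNN is applied
    input_order: str     # order of feeding input data to the RNN, e.g. "z"


def resolve_side_info(parameter_sets: Dict[str, Optional[DnnSideInfo]],
                      preshared: DnnSideInfo) -> DnnSideInfo:
    """Pick the side information signaled in the bitstream, if any.

    Assumption: PPS overrides SPS, which overrides VPS. If nothing is
    signaled, fall back to the information shared in advance.
    """
    found = None
    for name in ("VPS", "SPS", "PPS"):
        if parameter_sets.get(name) is not None:
            found = parameter_sets[name]
    return found if found is not None else preshared


preshared = DnnSideInfo("default", 64, 8, "z")
signaled = DnnSideInfo("set_b", 128, 16, "raster")
from_bitstream = resolve_side_info({"VPS": None, "SPS": signaled, "PPS": None}, preshared)
from_preshared = resolve_side_info({"VPS": None, "SPS": None, "PPS": None}, preshared)
```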

The image encoding apparatus 1400 according to one embodiment may correspond to the image encoding apparatus 100 of FIG.

Referring to FIG. 14, the image encoding apparatus 1400 includes a block determining unit 1410, a predicting unit 1420, a compressing unit 1430, and a transmitting unit 1440. The block determination unit 1410 of FIG. 14 may correspond to the block determination unit 110 of FIG. The prediction unit 1420 of FIG. 14 may correspond to the intra prediction unit 120 or the inter prediction unit 115 of FIG. The compression unit 1430 of FIG. 14 may correspond to the transform unit 130, the quantization unit 135, and the entropy encoding unit 155 of FIG.

The block determination unit 1410 according to an embodiment may divide the image data of the current picture into maximum encoding units according to the maximum size of an encoding unit. Each maximum encoding unit may include blocks (i.e., encoding units) divided according to block type and division type. In a maximum encoding unit according to an embodiment, the image data of the spatial domain included in the maximum encoding unit may be classified hierarchically according to block type and division type. The block type of an encoding unit may be a square, a rectangle, or any geometric shape, and is not limited to a data unit of a fixed size. The block type and the division type of the one or more blocks may be determined based on R-D cost calculation.

According to one embodiment, the predicting unit 1420 performs a DNN-based prediction on the current block among the blocks determined by the block determining unit 1410.

In the case of intra prediction, the predicting unit 1420 may generate first prediction data by performing a DNN-based first prediction on the current block, generate second prediction data by performing a DNN-based second prediction on the first prediction data, and generate the final prediction data for the current block using the first prediction data and the second prediction data. Here, the first prediction may be based on an RNN, and the second prediction may be based on a CNN. The specific procedure of intra prediction according to an embodiment has been described above with reference to FIGS. 6 to 11, and a detailed description thereof will be omitted.

In the case of inter prediction, the prediction unit 1420 may perform DNN-based inter prediction on the current block among the one or more blocks to determine one or more reference pictures and one or more reference block positions, and may generate the prediction data for the current block using the one or more reference pictures and the one or more reference block positions. The detailed process of inter prediction according to the embodiment has been described above with reference to FIG. 12, so a detailed description will be omitted.

The compression unit 1430 according to one embodiment generates residual data by calculating the difference between the original data of each block and the prediction data for each block output from the prediction unit 1420. The compression unit 1430 transforms and quantizes the residual data and generates quantized transform coefficients for each block. The compression unit 1430 outputs a bitstream obtained by entropy encoding the quantized transform coefficients. The encoded bitstream may include the encoding result of the residual data. In addition, the bitstream may include the encoding result of information indicating the block type, the division type, the size of the transform unit, and the like.

The transmitting unit 1440 according to an embodiment transmits the bit stream output from the compressing unit 1430 to the image decoding apparatus.

The image decoding apparatus 1500 according to one embodiment may correspond to the image decoding apparatus 200 of FIG.

Referring to FIG. 15, the image decoding apparatus 1500 includes a receiving unit 1510, a block determining unit 1520, a predicting unit 1530, and a restoring unit 1540. The receiving unit 1510 of FIG. 15 may correspond to the receiving unit 210 of FIG. The block determination unit 1520 of FIG. 15 may correspond to the block determination unit 215 of FIG. The prediction unit 1530 of FIG. 15 may correspond to the intra prediction unit 240 or the inter prediction unit 235 of FIG. The restoration unit 1540 of FIG. 15 may correspond to the entropy decoding unit 220, the inverse quantization unit 225, and the inverse transformation unit 230 of FIG.

The receiving unit 1510 according to an embodiment receives the encoded bitstream.

The block determination unit 1520 according to an embodiment may obtain the division information from the bit stream and divide the image data of the spatial domain into hierarchical blocks according to the block type and the division type. On the other hand, when the blocks used for decoding have a certain shape and size, the block determining unit 1520 can divide the image data without using the division information.

According to one embodiment, the predicting unit 1530 performs prediction based on the DNN on the current block among the blocks determined by the block determining unit 1520. Meanwhile, the information on the structure of the DNN may be acquired from the bitstream in the form of additional information.

In the case of intra prediction, the predicting unit 1530 may generate first prediction data by performing a DNN-based first prediction on the current block, generate second prediction data by performing a DNN-based second prediction on the first prediction data, and generate the final prediction data for the current block using the first prediction data and the second prediction data. Here, the first prediction may be based on an RNN, and the second prediction may be based on a CNN. The specific procedure of intra prediction according to the embodiment has been described above with reference to FIGS. 6 to 11, and a detailed description thereof will be omitted.

In the case of inter prediction, the prediction unit 1530 may perform DNN-based inter prediction on the current block among the one or more blocks to determine one or more reference pictures and one or more reference block positions, and may generate the prediction data for the current block using the one or more reference pictures and the one or more reference block positions. The detailed process of inter prediction according to the embodiment has been described above with reference to FIG. 12, so a detailed description will be omitted.

The reconstruction unit 1540 according to one embodiment obtains the residual data of each block by inverse-quantizing and inverse transforming the quantized transform coefficients obtained by entropy decoding the bitstream. Thereafter, the restoring unit 1540 restores the image by using the residual data of each block and the predicted data of each block generated by the predicting unit 1530.
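The transform and quantization performed by the compression unit 1430 and their inverses in the reconstruction unit 1540 can be sketched on a single row of residual samples. The disclosure does not name a particular transform or quantizer; the orthonormal DCT-II and the uniform quantizer with step size `step` used below are assumptions chosen for illustration, and entropy coding is omitted.

```python
import math


def dct(x):
    """Orthonormal 1-D DCT-II (assumed transform, for illustration only)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of dct() above."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1 / n)
        s += sum(c[k] * math.sqrt(2 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def quantize(coeffs, step):
    """Uniform quantization: map each coefficient to an integer level."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Inverse quantization: scale the integer levels back."""
    return [level * step for level in levels]


residual_row = [4.0, -2.0, 0.0, 1.0]   # hypothetical residual samples
step = 0.5                             # hypothetical quantization step

levels = quantize(dct(residual_row), step)        # encoder side
decoded_row = idct(dequantize(levels, step))      # decoder side
```

Because quantization discards precision, `decoded_row` only approximates `residual_row`; a larger `step` gives fewer distinct levels to code at the cost of a larger reconstruction error.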

In step S1610, the image coding apparatus 100 determines one or more blocks for dividing the image.

In step S1620, the image encoding apparatus 100 generates prediction data for the current block by performing prediction based on the DNN on the current block among the one or more blocks.

In step S1630, the image coding apparatus 100 generates residual data of the current block by using the original data and the prediction data corresponding to the current block.

In step S1640, the image coding apparatus 100 generates a bit stream obtained by coding residual data.

In step S1710, the video decoding apparatus 200 receives the bit stream of the encoded video.

In step S1720, the image decoding apparatus 200 determines one or more blocks that are divided from the encoded image.

In step S1730, the image decoding apparatus 200 generates prediction data for the current block by performing prediction based on the DNN on the current block among the one or more blocks.

In step S1740, the video decoding apparatus 200 extracts the residual data of the current block from the bitstream.

In step S1750, the image decoding apparatus 200 restores the current block using the predictive data and the residual data.
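Steps S1610 to S1640 and S1710 to S1750 can be traced with a toy round trip. In this sketch the image is a flat list of samples, blocks are fixed-size runs of samples, and `predict` is a trivial stand-in for the DNN that looks only at previously reconstructed samples; all of these are assumptions made for illustration. Since the decoder runs the same predictor on the same reconstructed data, the residuals are the only data passed between the two sides.

```python
def predict(reconstructed_so_far):
    """Stand-in for the DNN: predict from already reconstructed samples."""
    if not reconstructed_so_far:
        return 128                      # hypothetical default for the first block
    return reconstructed_so_far[-1]     # last reconstructed sample


def encode(samples, block_size=4):
    reconstructed, residuals = [], []
    for start in range(0, len(samples), block_size):      # S1610: determine blocks
        block = samples[start:start + block_size]
        pred = predict(reconstructed)                      # S1620: prediction data
        residual = [s - pred for s in block]               # S1630: residual data
        residuals.append(residual)                         # S1640: "bitstream" (no entropy coding)
        reconstructed.extend(pred + r for r in residual)   # keep encoder state in sync with decoder
    return residuals


def decode(residuals):
    reconstructed = []
    for residual in residuals:                             # S1710/S1720: receive, determine blocks
        pred = predict(reconstructed)                      # S1730: same prediction as the encoder
        reconstructed.extend(pred + r for r in residual)   # S1740/S1750: add residual, reconstruct
    return reconstructed


samples = [130, 131, 129, 128, 140, 141, 139, 138]
stream = encode(samples)
decoded = decode(stream)
```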

Hereinafter, a method of determining a data unit of an image according to an embodiment will be described with reference to FIGS. 18 to 31. The division method for the coding unit described with reference to FIGS. 18 to 31 can be similarly applied to the division method of the transform unit that is the basis of the transform.

FIG. 18 illustrates a process in which the image decoding apparatus 200 determines at least one encoding unit by dividing a current encoding unit according to an embodiment.

According to an embodiment, the image decoding apparatus 200 may determine the shape of an encoding unit using block type information, and may determine the form in which the encoding unit is divided using division type information. That is, the division method indicated by the division type information may be determined according to which block shape is represented by the block type information used by the image decoding apparatus 200.

According to one embodiment, the image decoding apparatus 200 may use block type information indicating that the current encoding unit has a square shape. For example, the image decoding apparatus 200 may determine, according to the division type information, whether to leave a square encoding unit undivided, divide it vertically, divide it horizontally, or divide it into four encoding units. Referring to FIG. 18, when the block type information of the current encoding unit 1800 indicates a square shape, the image decoding apparatus 200 may leave the current encoding unit 1800 undivided according to division type information indicating no division, or may determine divided encoding units (1810b, 1810c, 1810d, etc.) based on division type information indicating a predetermined division method.

Referring to FIG. 18, the image decoding apparatus 200 determines two encoding units 1810b obtained by dividing the current encoding unit 1800 in the vertical direction based on the division type information indicating that the image is divided in the vertical direction according to an embodiment . The image decoding apparatus 200 can determine two encoding units 1810c obtained by dividing the current encoding unit 1800 in the horizontal direction based on the division type information indicating that the image is divided in the horizontal direction. The image decoding apparatus 200 can determine the four encoding units 1810d obtained by dividing the current encoding unit 1800 in the vertical direction and the horizontal direction based on the division type information indicating that the image is divided in the vertical direction and the horizontal direction. However, the division type in which the square coding unit can be divided should not be limited to the above-mentioned form, but may include various forms in which the division type information can be represented. The predetermined divisional form in which the square encoding unit is divided will be described in detail by way of various embodiments below.
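The four outcomes for a square encoding unit can be written as a small function. This is a sketch under stated assumptions: units are represented as hypothetical (x, y, width, height) tuples, the mode strings are illustrative labels for the division type information, and "vertical" is taken to mean a vertical dividing line that yields left and right halves.

```python
def split_square(x, y, size, mode):
    """Return the encoding units obtained from a square unit at (x, y).

    mode: "none", "vertical", "horizontal" or "quad" (illustrative labels).
    Each unit is an (x, y, width, height) tuple.
    """
    half = size // 2
    if mode == "none":          # unit is kept as it is
        return [(x, y, size, size)]
    if mode == "vertical":      # assumed: left and right halves
        return [(x, y, half, size), (x + half, y, half, size)]
    if mode == "horizontal":    # assumed: top and bottom halves
        return [(x, y, size, half), (x, y + half, size, half)]
    if mode == "quad":          # divided in both directions
        return [(x, y, half, half), (x + half, y, half, half),
                (x, y + half, half, half), (x + half, y + half, half, half)]
    raise ValueError(f"unknown division type: {mode}")


vertical_units = split_square(0, 0, 16, "vertical")
quad_units = split_square(0, 0, 16, "quad")
```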

FIG. 19 illustrates a process in which the image decoding apparatus 200 determines at least one encoding unit by dividing a non-square encoding unit according to an embodiment.

According to one embodiment, the image decoding apparatus 200 may use block type information indicating that the current encoding unit has a non-square shape. The image decoding apparatus 200 may determine, according to the division type information, whether to leave the non-square current encoding unit undivided or to divide it by a predetermined method. Referring to FIG. 19, if the block type information of the current encoding unit 1900 or 1950 indicates a non-square shape, the image decoding apparatus 200 may leave the current encoding unit 1900 or 1950 undivided according to division type information indicating no division, or may determine divided encoding units (1920a, 1920b, 1930a, 1930b, 1930c, 1970a, 1970b, 1980a, 1980b, 1980c) based on division type information indicating a predetermined division method. The predetermined division method by which a non-square encoding unit is divided will be described in detail below through various embodiments.

According to an embodiment, the image decoding apparatus 200 may determine the form in which an encoding unit is divided using division type information. In this case, the division type information may indicate the number of encoding units generated by dividing the encoding unit. Referring to FIG. 19, if the division type information indicates that the current encoding unit 1900 or 1950 is divided into two encoding units, the image decoding apparatus 200 may divide the current encoding unit 1900 or 1950 based on the division type information to determine two encoding units 1920a and 1920b, or 1970a and 1970b, included in the current encoding unit.

When the image decoding apparatus 200 divides the non-square current encoding unit 1900 or 1950 based on the division type information, the current encoding unit may be divided in consideration of the position of the long side of the non-square current encoding unit 1900 or 1950. For example, the image decoding apparatus 200 may determine a plurality of encoding units by dividing the current encoding unit 1900 or 1950 in the direction that divides its long side, in consideration of the shape of the current encoding unit 1900 or 1950.

According to one embodiment, when the division type information indicates that an encoding unit is divided into an odd number of blocks, the image decoding apparatus 200 may determine an odd number of encoding units included in the current encoding unit 1900 or 1950. For example, when the division type information indicates that the current encoding unit 1900 or 1950 is divided into three encoding units, the image decoding apparatus 200 may divide the current encoding unit 1900 or 1950 into three encoding units (1930a, 1930b, 1930c, or 1980a, 1980b, 1980c). According to an embodiment, the image decoding apparatus 200 may determine an odd number of encoding units included in the current encoding unit 1900 or 1950, and the sizes of the determined encoding units may not all be the same. For example, the size of a predetermined encoding unit 1930b or 1980b among the determined odd number of encoding units 1930a, 1930b, 1930c, 1980a, 1980b, and 1980c may differ from the size of the other encoding units 1930a, 1930c, 1980a, and 1980c. In other words, the encoding units that can be determined by dividing the current encoding unit 1900 or 1950 may have a plurality of sizes, and in some cases the odd number of encoding units 1930a, 1930b, 1930c, 1980a, 1980b, and 1980c may each have a different size.

According to an embodiment, when the division type information indicates that an encoding unit is divided into an odd number of blocks, the image decoding apparatus 200 may determine an odd number of encoding units included in the current encoding unit 1900 or 1950, and may further set a predetermined restriction on at least one of the odd number of encoding units generated by the division. Referring to FIG. 19, the image decoding apparatus 200 may make the decoding process for the encoding unit 1930b or 1980b located at the center of the three encoding units 1930a, 1930b, 1930c, or 1980a, 1980b, 1980c generated by dividing the current encoding unit 1900 or 1950 different from that for the other encoding units 1930a, 1930c, 1980a, and 1980c. For example, unlike the other encoding units 1930a, 1930c, 1980a, and 1980c, the image decoding apparatus 200 may restrict the encoding units 1930b and 1980b located at the center so that they are not divided further, or so that they are divided only a predetermined number of times.
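Division of a non-square encoding unit along its long side, including the three-way case with a differently sized and restricted center unit, can be sketched as follows. The (x, y, width, height) representation, the 1:2:1 size ratio for the three-way division, and the boolean `may_split_further` flag are all assumptions for illustration; the disclosure states only that the sizes may differ and that a restriction may be placed on the center unit.

```python
def split_non_square(x, y, w, h, count):
    """Divide a non-square unit along its long side into 2 or 3 units.

    Returns (x, y, width, height, may_split_further) tuples.
    Assumption: a three-way division uses a 1:2:1 ratio, and the center
    unit is flagged as not to be divided further.
    """
    long_is_width = w > h
    length = w if long_is_width else h
    if count == 2:
        parts = [length // 2, length - length // 2]
    elif count == 3:
        quarter = length // 4
        parts = [quarter, length - 2 * quarter, quarter]
    else:
        raise ValueError("only 2 or 3 units are supported in this sketch")

    units, offset = [], 0
    for index, part in enumerate(parts):
        is_center = (count == 3 and index == 1)
        if long_is_width:
            units.append((x + offset, y, part, h, not is_center))
        else:
            units.append((x, y + offset, w, part, not is_center))
        offset += part
    return units


tall_three = split_non_square(0, 0, 8, 16, 3)   # tall unit: height is the long side
wide_two = split_non_square(0, 0, 16, 8, 2)     # wide unit: width is the long side
```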

FIG. 20 illustrates a process in which the video decoding apparatus 200 divides an encoding unit based on at least one of block type information and division type information according to an embodiment.

According to an embodiment, the image decoding apparatus 200 may determine whether or not to divide the square first encoding unit 2000 into encoding units based on at least one of the block type information and the division type information. According to one embodiment, when the division type information indicates that the first encoding unit 2000 is divided in the horizontal direction, the image decoding apparatus 200 may divide the first encoding unit 2000 in the horizontal direction to determine the second encoding unit 2010. The terms first encoding unit, second encoding unit, and third encoding unit used according to an embodiment express the before-and-after relationship of division between encoding units. For example, the second encoding unit is determined by dividing the first encoding unit, and the third encoding unit is determined by dividing the second encoding unit. Hereinafter, the relationship among the first, second, and third encoding units will be understood to follow this convention.

According to an embodiment, the image decoding apparatus 200 may determine whether or not to divide the determined second encoding unit 2010 into encoding units based on at least one of the block type information and the division type information. Referring to FIG. 20, the image decoding apparatus 200 may divide the non-square second encoding unit 2010, determined by dividing the first encoding unit 2000, into at least one third encoding unit (2020a, 2020b, 2020c, 2020d, etc.) based on at least one of the block type information and the division type information, or may leave the second encoding unit 2010 undivided. The image decoding apparatus 200 may acquire at least one of the block type information and the division type information, and may divide the first encoding unit 2000 based on the acquired information to determine a plurality of second encoding units of various shapes (for example, 2010); the second encoding unit 2010 may then be divided in the manner in which the first encoding unit 2000 was divided, based on at least one of the block type information and the division type information. According to an embodiment, when the first encoding unit 2000 is divided into the second encoding units 2010 based on at least one of the block type information and the division type information for the first encoding unit 2000, the second encoding unit 2010 may also be divided into third encoding units (e.g., 2020a, 2020b, 2020c, 2020d, etc.) based on at least one of the block type information and the division type information for the second encoding unit 2010. That is, an encoding unit can be recursively divided based on at least one of the division type information and the block type information associated with each encoding unit. Therefore, a square encoding unit may be determined from a non-square encoding unit, and a non-square encoding unit may be determined by recursively dividing such a square encoding unit. Referring to FIG. 20, among the odd number of third encoding units 2020b, 2020c, and 2020d determined by dividing the non-square second encoding unit 2010, a predetermined encoding unit (for example, the encoding unit located in the middle or a square encoding unit) may be recursively divided. According to an embodiment, the third encoding unit 2020c, which is one of the odd number of third encoding units 2020b, 2020c, and 2020d, may be divided in the horizontal direction into a plurality of fourth encoding units.

A method which can be used for recursive division of an encoding unit will be described later in various embodiments.
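The recursive character of the division can be sketched with a short function. Here `decide` is a hypothetical callback standing in for the block type information and division type information obtained for each encoding unit; the (x, y, width, height) tuples and the meaning of "horizontal" (top and bottom halves) and "vertical" (left and right halves) are assumptions for illustration.

```python
def split_recursive(unit, decide):
    """Recursively divide a unit until decide() returns "none".

    unit: (x, y, width, height); decide(unit) returns "none",
    "vertical" (left/right halves) or "horizontal" (top/bottom halves).
    Returns the list of undivided (leaf) units.
    """
    x, y, w, h = unit
    mode = decide(unit)
    if mode == "none":
        return [unit]
    if mode == "vertical":
        children = [(x, y, w // 2, h), (x + w // 2, y, w - w // 2, h)]
    elif mode == "horizontal":
        children = [(x, y, w, h // 2), (x, y + h // 2, w, h - h // 2)]
    else:
        raise ValueError(f"unknown division type: {mode}")
    leaves = []
    for child in children:
        leaves.extend(split_recursive(child, decide))
    return leaves


def decide(unit):
    """Hypothetical per-unit decision: 16x16 -> two 16x8 -> four 8x8."""
    _, _, w, h = unit
    if (w, h) == (16, 16):
        return "horizontal"   # first unit -> non-square second units
    if (w, h) == (16, 8):
        return "vertical"     # second unit -> square third units
    return "none"


leaves = split_recursive((0, 0, 16, 16), decide)
```

In this run a square unit yields non-square units, which in turn yield square units again, matching the observation that square and non-square units can each be derived from the other.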

According to an embodiment, the image decoding apparatus 200 may divide each of the third encoding units 2020a, 2020b, 2020c, 2020d, etc. into encoding units based on at least one of block type information and division type information, or may determine that the second encoding unit 2010 is not divided. The image decoding apparatus 200 may divide the non-square second encoding unit 2010 into an odd number of third encoding units 2020b, 2020c, and 2020d according to an embodiment. The image decoding apparatus 200 may set a predetermined restriction on a predetermined third encoding unit among the odd number of third encoding units 2020b, 2020c, and 2020d. For example, the image decoding apparatus 200 may restrict the encoding unit 2020c located in the middle among the odd number of third encoding units 2020b, 2020c, and 2020d so that it is no longer divided or is divided only a set number of times. Referring to FIG. 20, the image decoding apparatus 200 may restrict the encoding unit 2020c located in the middle among the odd number of third encoding units 2020b, 2020c, and 2020d included in the non-square second encoding unit 2010 so that it is no longer divided, is divided only in a predetermined division form (for example, divided into only four encoding units, or divided in a form corresponding to the form in which the second encoding unit 2010 was divided), or is divided only a predetermined number of times (for example, only n times, n > 0). However, the above restrictions on the encoding unit 2020c located in the middle are merely examples and should not be construed as limiting; they should be understood to include various restrictions under which the encoding unit 2020c located in the middle can be decoded differently from the other encoding units 2020b and 2020d.

According to an exemplary embodiment, the image decoding apparatus 200 may acquire at least one of block type information and division type information used for dividing a current encoding unit at a predetermined position in a current encoding unit.

FIG. 21 illustrates a method by which the image decoding apparatus 200 determines a predetermined encoding unit among an odd number of encoding units according to an embodiment. Referring to FIG. 21, at least one of the block type information and the division type information of the current encoding unit 2100 may be obtained from a sample at a predetermined position among the plurality of samples included in the current encoding unit 2100 (for example, the sample 2140 located in the middle). However, the predetermined position in the current encoding unit 2100 from which at least one of the block type information and the division type information can be obtained should not be limited to the middle position shown in FIG. 21, and should be interpreted as including various positions within the current encoding unit 2100 (e.g., top, bottom, left, right, top left, bottom left, top right, or bottom right). The image decoding apparatus 200 may acquire at least one of the block type information and the division type information from the predetermined position and determine whether to divide the current encoding unit into encoding units of various shapes and sizes or to leave it undivided.

According to one embodiment, when the current coding unit is divided into a predetermined number of coding units, the image decoding apparatus 200 can select one of them. Various methods may be used to select one of the plurality of coding units, and these methods are described later through various embodiments.

According to an exemplary embodiment, the image decoding apparatus 200 may divide a current encoding unit into a plurality of encoding units and determine a predetermined encoding unit.

FIG. 21 shows a method for the video decoding apparatus 200 to determine a coding unit of a predetermined position among odd-numbered coding units according to an embodiment.

According to an exemplary embodiment, the image decoding apparatus 200 may use information indicating the position of each of an odd number of coding units in order to determine the coding unit located in the middle among them. Referring to FIG. 21, the image decoding apparatus 200 may divide the current coding unit 2100 to determine an odd number of coding units 2120a, 2120b, and 2120c. The image decoding apparatus 200 can determine the middle coding unit 2120b by using information on the positions of the odd number of coding units 2120a, 2120b, and 2120c. For example, the image decoding apparatus 200 may determine the positions of the coding units 2120a, 2120b, and 2120c based on information indicating the positions of predetermined samples included in the coding units 2120a, 2120b, and 2120c, and thereby determine the coding unit 2120b located in the middle. Specifically, the image decoding apparatus 200 may determine the positions of the coding units 2120a, 2120b, and 2120c based on information indicating the positions of their upper left samples 2130a, 2130b, and 2130c, and thereby determine the coding unit 2120b located in the middle.

According to an exemplary embodiment, the information indicating the positions of the upper left samples 2130a, 2130b, and 2130c included in the coding units 2120a, 2120b, and 2120c, respectively, may include information about the positions or coordinates of the coding units 2120a, 2120b, and 2120c in the picture. According to one embodiment, the information indicating the positions of the upper left samples 2130a, 2130b, and 2130c may instead include information indicating the widths or heights of the coding units 2120a, 2120b, and 2120c included in the current coding unit 2100, where the widths or heights correspond to differences between the coordinates of the coding units 2120a, 2120b, and 2120c in the picture. That is, the image decoding apparatus 200 can determine the coding unit 2120b located in the middle either by directly using the information on the positions or coordinates of the coding units 2120a, 2120b, and 2120c in the picture, or by using the information on the widths or heights of the coding units, which corresponds to the differences between the coordinates.

The information indicating the position of the upper left sample 2130a of the upper coding unit 2120a may indicate the coordinates (xa, ya), the information indicating the position of the upper left sample 2130b of the middle coding unit 2120b may indicate the coordinates (xb, yb), and the information indicating the position of the upper left sample 2130c of the lower coding unit 2120c may indicate the coordinates (xc, yc). The image decoding apparatus 200 can determine the middle coding unit 2120b by using the coordinates of the upper left samples 2130a, 2130b, and 2130c included in the coding units 2120a, 2120b, and 2120c, respectively. For example, when the coordinates of the upper left samples 2130a, 2130b, and 2130c are sorted in ascending or descending order, the coding unit 2120b that includes the coordinates (xb, yb) of the sample 2130b positioned in the middle can be determined as the coding unit located in the middle among the coding units 2120a, 2120b, and 2120c determined by dividing the current coding unit 2100. However, the coordinates indicating the positions of the upper left samples 2130a, 2130b, and 2130c may be coordinates indicating absolute positions in the picture; alternatively, with the position of the upper left sample 2130a of the upper coding unit 2120a as a reference, the coordinates (dxb, dyb) indicating the relative position of the upper left sample 2130b of the middle coding unit 2120b and the coordinates (dxc, dyc) indicating the relative position of the upper left sample 2130c of the lower coding unit 2120c may be used. Also, the method of determining the coding unit at a predetermined position by using the coordinates of a sample as information indicating the position of a sample included in the coding unit should not be construed as limited to the above-described method, and should be interpreted as including various arithmetic methods that can use the coordinates of the sample.
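The sorting approach described above can be sketched as follows. This is a minimal illustration, not the apparatus's actual procedure: the function names `middle_unit_index` and `to_relative`, the (y, x) sort key, and the sample coordinates are all assumptions made for the example.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) of a coding unit's upper left sample


def middle_unit_index(upper_left: List[Coord]) -> int:
    """Return the index of the coding unit whose upper left sample is the median.

    Sorting by (y, x) is an assumption of this sketch; it covers units stacked
    vertically (y differs) as well as units placed side by side (x differs).
    """
    if len(upper_left) % 2 == 0:
        raise ValueError("expected an odd number of coding units")
    order = sorted(range(len(upper_left)),
                   key=lambda i: (upper_left[i][1], upper_left[i][0]))
    return order[len(order) // 2]


def to_relative(upper_left: List[Coord]) -> List[Coord]:
    """Express each coordinate relative to the first one, e.g. (dxb, dyb)."""
    x0, y0 = upper_left[0]
    return [(x - x0, y - y0) for x, y in upper_left]


# Hypothetical coordinates (xa, ya), (xb, yb), (xc, yc) for units 2120a to 2120c.
absolute = [(32, 64), (32, 68), (32, 76)]
relative = to_relative(absolute)
```

Because only the ordering of the coordinates matters, the same index is selected whether absolute picture coordinates or coordinates relative to the upper left sample 2130a are supplied.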

According to an embodiment, the image decoding apparatus 200 may divide the current coding unit 2100 into a plurality of coding units 2120a, 2120b, and 2120c, and may select a coding unit from among the coding units 2120a, 2120b, and 2120c according to a predetermined criterion. For example, the image decoding apparatus 200 can select the coding unit 2120b, whose size differs from that of the others, from among the coding units 2120a, 2120b, and 2120c.

According to an embodiment, the image decoding apparatus 200 may determine the width or height of each of the coding units 2120a, 2120b, and 2120c by using the coordinates (xa, ya) indicating the position of the upper left sample 2130a of the upper coding unit 2120a, the coordinates (xb, yb) indicating the position of the upper left sample 2130b of the middle coding unit 2120b, and the coordinates (xc, yc) indicating the position of the upper left sample 2130c of the lower coding unit 2120c. The image decoding apparatus 200 can determine the size of each of the coding units 2120a, 2120b, and 2120c by using the coordinates (xa, ya), (xb, yb), and (xc, yc) indicating the positions of the coding units 2120a, 2120b, and 2120c.

According to an embodiment, the image decoding apparatus 200 can determine the width of the upper coding unit 2120a as xb-xa and the height as yb-ya. According to an embodiment, the image decoding apparatus 200 can determine the width of the middle coding unit 2120b as xc-xb and the height as yc-yb. The image decoding apparatus 200 may determine the width or height of the lower coding unit 2120c using the width or height of the current coding unit and the widths and heights of the upper coding unit 2120a and the middle coding unit 2120b. The image decoding apparatus 200 can determine the coding unit having a size different from that of the other coding units based on the determined widths and heights of the coding units 2120a, 2120b, and 2120c. Referring to FIG. 21, the image decoding apparatus 200 may determine the middle coding unit 2120b, whose size differs from that of the upper coding unit 2120a and the lower coding unit 2120c, as the coding unit at the predetermined position. However, this process of determining the coding unit whose size differs from that of the other coding units is merely one embodiment of determining the coding unit at a predetermined position using the sizes of coding units determined from sample coordinates; various processes that determine the coding unit at a predetermined position by comparing the sizes of coding units determined according to predetermined sample coordinates may be used.
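The size comparison can be illustrated with a short sketch. It assumes the three coding units are stacked vertically, so that only their heights differ and the width is simply that of the current coding unit; the function names and the numeric values are hypothetical and chosen only for the example.

```python
from typing import List


def stacked_heights(ys: List[int], parent_y: int, parent_height: int) -> List[int]:
    """Heights of vertically stacked coding units.

    ys holds the upper left y coordinates in top-to-bottom order (ya, yb, yc).
    Each height is the difference to the next coordinate; the last height is
    derived from the bottom edge of the current coding unit.
    """
    edges = ys + [parent_y + parent_height]
    return [edges[i + 1] - edges[i] for i in range(len(ys))]


def index_of_different_size(sizes: List[int]) -> int:
    """Index of the one unit whose size occurs only once among the units."""
    unique = [i for i, s in enumerate(sizes) if sizes.count(s) == 1]
    if len(unique) != 1:
        raise ValueError("expected exactly one unit with a different size")
    return unique[0]


# Hypothetical 16-sample-tall current coding unit with ya=0, yb=4, yc=12.
heights = stacked_heights([0, 4, 12], parent_y=0, parent_height=16)
selected = index_of_different_size(heights)
```

With these assumed values the heights are 4, 8, and 4, so the middle unit is the one selected, which mirrors the selection of the coding unit 2120b in FIG. 21.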

However, the position of the sample considered in order to determine the position of a coding unit should not be construed as limited to the upper left; information about the position of any sample included in the coding unit may be used.

According to an exemplary embodiment, the image decoding apparatus 200 may select the coding unit at a predetermined position among the odd number of coding units determined by dividing the current coding unit, in consideration of the shape of the current coding unit. For example, if the current coding unit has a non-square shape whose width is greater than its height, the image decoding apparatus 200 can determine the coding unit at the predetermined position along the horizontal direction. That is, the image decoding apparatus 200 may determine one of the coding units located at different positions in the horizontal direction and place a restriction on that coding unit. If the current coding unit has a non-square shape whose height is greater than its width, the image decoding apparatus 200 can determine the coding unit at the predetermined position along the vertical direction. That is, the image decoding apparatus 200 may determine one of the coding units located at different positions in the vertical direction and place a restriction on that coding unit.

According to an exemplary embodiment, the image decoding apparatus 200 may use information indicating positions of even-numbered encoding units in order to determine an encoding unit of a predetermined position among the even-numbered encoding units. The image decoding apparatus 200 can determine an even number of coding units by dividing the current coding unit and determine a coding unit at a predetermined position by using information on the positions of the even number of coding units. A concrete procedure for this is omitted because it may be a process corresponding to a process of determining a coding unit of a predetermined position (for example, the middle position) among the above-mentioned odd number of coding units.

According to one embodiment, when a non-square current coding unit is divided into a plurality of coding units, predetermined information about the coding unit at a predetermined position may be used during the division process in order to determine the coding unit at the predetermined position among the plurality of coding units. For example, in order to determine the coding unit located in the middle among the plurality of coding units into which the current coding unit is divided, the image decoding apparatus 200 may use, during the division process, at least one of the block type information and the division type information stored in a sample included in the middle coding unit.

Referring to FIG. 21, the image decoding apparatus 200 may divide the current coding unit 2100 into a plurality of coding units 2120a, 2120b, and 2120c based on at least one of the block type information and the division type information, and may determine the coding unit 2120b located in the middle of the plurality of coding units 2120a, 2120b, and 2120c. Furthermore, the image decoding apparatus 200 can determine the coding unit 2120b located in the middle in consideration of the position from which at least one of the block type information and the division type information is obtained. That is, at least one of the block type information and the division type information of the current coding unit 2100 can be obtained from the sample 2140 located in the middle of the current coding unit 2100, and when the current coding unit 2100 is divided into the plurality of coding units 2120a, 2120b, and 2120c based on at least one of the block type information and the division type information, the coding unit 2120b including the sample 2140 can be determined as the coding unit located in the middle. However, the information used for determining the coding unit located in the middle should not be construed as limited to at least one of the block type information and the division type information, and various kinds of information may be used in the process of determining the coding unit located in the middle.

According to an embodiment, predetermined information for identifying the coding unit at a predetermined position may be obtained from a predetermined sample included in the coding unit to be determined. Referring to FIG. 21, in order to determine the coding unit at a predetermined position (for example, the coding unit located in the middle) among the plurality of coding units 2120a, 2120b, and 2120c determined by dividing the current coding unit 2100, the image decoding apparatus 200 may use at least one of the block type information and the division type information obtained from a sample at a predetermined position in the current coding unit 2100 (for example, the sample located in the middle of the current coding unit 2100). That is, the image decoding apparatus 200 can determine the sample at the predetermined position in consideration of the block shape of the current coding unit 2100, and can determine, among the plurality of coding units 2120a, 2120b, and 2120c determined by dividing the current coding unit 2100, the coding unit 2120b including the sample from which predetermined information (for example, at least one of the block type information and the division type information) can be obtained, and place a predetermined restriction on it. Referring to FIG. 21, according to an embodiment, the image decoding apparatus 200 may determine the sample 2140 located in the middle of the current coding unit 2100 as the sample from which the predetermined information can be obtained, and may place a predetermined restriction on the coding unit 2120b including the sample 2140 in the decoding process. However, the position of the sample from which the predetermined information can be obtained should not be construed as limited to the above-described position, and may be interpreted as a sample at any position included in the coding unit 2120b that is to be determined in order to apply the restriction.

According to an embodiment, the position of the sample from which the predetermined information can be obtained may be determined according to the shape of the current coding unit 2100. According to one embodiment, the block type information may be used to determine whether the shape of the current coding unit is square or non-square, and the position of the sample from which the predetermined information can be obtained may be determined according to that shape. For example, the image decoding apparatus 200 may use at least one of the information on the width and the information on the height of the current coding unit to determine a sample located on the boundary that divides at least one of the width and the height of the current coding unit in half as the sample from which the predetermined information can be obtained. As another example, when the block type information related to the current coding unit indicates a non-square shape, the image decoding apparatus 200 may determine one of the samples adjacent to the boundary that divides the long side of the current coding unit in half as the sample from which the predetermined information can be obtained.
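As a rough illustration of choosing the sample position from the block shape, the sketch below returns one candidate position for square and non-square coding units. The text leaves open which of the samples adjacent to the halving boundary is used, so the particular sample returned here, like the function name `info_sample_position`, is an assumption of the example.

```python
from typing import Tuple


def info_sample_position(x: int, y: int, width: int, height: int) -> Tuple[int, int]:
    """Pick a sample next to the boundary that halves the coding unit.

    (x, y) is the upper left sample of the current coding unit.
    Square: the sample in the middle of the block.
    Non-square: a sample adjacent to the boundary halving the long side
    (the first sample past that boundary, on the top or left edge).
    """
    if width == height:
        return (x + width // 2, y + height // 2)
    if width > height:
        # The long side is horizontal, so the halving boundary is at x + width/2.
        return (x + width // 2, y)
    # The long side is vertical, so the halving boundary is at y + height/2.
    return (x, y + height // 2)


square_pos = info_sample_position(0, 0, 16, 16)
wide_pos = info_sample_position(0, 0, 32, 8)
tall_pos = info_sample_position(0, 0, 8, 32)
```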

According to an exemplary embodiment, when the current coding unit is divided into a plurality of coding units, the image decoding apparatus 200 may use at least one of the block type information and the division type information in order to determine the coding unit at a predetermined position among the plurality of coding units. According to an embodiment, the image decoding apparatus 200 can acquire at least one of the block type information and the division type information from a sample at a predetermined position included in a coding unit, and can divide the plurality of coding units generated by dividing the current coding unit by using at least one of the division type information and the block type information obtained from the sample at the predetermined position included in each of the plurality of coding units. That is, the coding units can be recursively divided using at least one of the block type information and the division type information obtained from the sample at the predetermined position included in each coding unit. The recursive division process of a coding unit has been described above with reference to FIG. 20, and a detailed description thereof is omitted here.

According to an exemplary embodiment, the image decoding apparatus 200 may determine at least one coding unit by dividing the current coding unit, and may determine the order in which the at least one coding unit is decoded according to a predetermined block (for example, the current coding unit).

FIG. 22 shows a sequence in which a plurality of coding units are processed when the image decoding apparatus 200 determines a plurality of coding units by dividing the current coding unit according to an embodiment.

According to one embodiment, depending on the block type information and the division type information, the image decoding apparatus 200 may determine the second coding units 2210a and 2210b by dividing the first coding unit 2200 in the vertical direction, determine the second coding units 2230a and 2230b by dividing the first coding unit 2200 in the horizontal direction, or determine the second coding units 2250a, 2250b, 2250c, and 2250d by dividing the first coding unit 2200 in both the vertical and horizontal directions.

Referring to FIG. 22, the image decoding apparatus 200 may determine that the second coding units 2210a and 2210b, determined by dividing the first coding unit 2200 in the vertical direction, are processed in the horizontal direction 2210c. The image decoding apparatus 200 may determine the processing order of the second coding units 2230a and 2230b, determined by dividing the first coding unit 2200 in the horizontal direction, to be the vertical direction 2230c. The image decoding apparatus 200 may determine that the second coding units 2250a, 2250b, 2250c, and 2250d, determined by dividing the first coding unit 2200 in the vertical and horizontal directions, are processed according to a predetermined order in which the coding units located in one row are processed and then the coding units located in the next row are processed (for example, a raster scan order or a z-scan order 2250e).
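The three orders described for FIG. 22 can be reproduced with a single sort over the upper left coordinates of the sibling coding units. This is only a sketch under the assumption that each unit is represented as an (x, y, width, height) tuple; the function name `processing_order` is hypothetical. For a single split level, sorting row by row coincides with both the raster scan order and the z-scan order.

```python
from typing import List, Tuple

Block = Tuple[int, int, int, int]  # (x, y, width, height)


def processing_order(siblings: List[Block]) -> List[Block]:
    """Order sibling coding units row by row, left to right within a row."""
    return sorted(siblings, key=lambda b: (b[1], b[0]))


# Hypothetical 16x16 first coding unit split in three different ways.
vertical_split = [(8, 0, 8, 16), (0, 0, 8, 16)]      # side by side
horizontal_split = [(0, 8, 16, 8), (0, 0, 16, 8)]    # stacked
quad_split = [(8, 8, 8, 8), (0, 8, 8, 8), (8, 0, 8, 8), (0, 0, 8, 8)]

left_to_right = processing_order(vertical_split)
top_to_bottom = processing_order(horizontal_split)
row_by_row = processing_order(quad_split)
```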

According to an exemplary embodiment, the image decoding apparatus 200 may recursively divide coding units. Referring to FIG. 22, the image decoding apparatus 200 may determine a plurality of coding units 2210a, 2210b, 2230a, 2230b, 2250a, 2250b, 2250c, and 2250d by dividing the first coding unit 2200, and may recursively divide each of the determined coding units 2210a, 2210b, 2230a, 2230b, 2250a, 2250b, 2250c, and 2250d. The method of dividing the plurality of coding units 2210a, 2210b, 2230a, 2230b, 2250a, 2250b, 2250c, and 2250d may correspond to the method of dividing the first coding unit 2200. Accordingly, each of the plurality of coding units 2210a, 2210b, 2230a, 2230b, 2250a, 2250b, 2250c, and 2250d may be independently divided into a plurality of coding units. Referring to FIG. 22, the image decoding apparatus 200 may determine the second coding units 2210a and 2210b by dividing the first coding unit 2200 in the vertical direction, and may further decide independently whether to divide or not divide each of the second coding units 2210a and 2210b.

According to an embodiment, the image decoding apparatus 200 may divide the left second coding unit 2210a in the horizontal direction into the third coding units 2220a and 2220b, and may not divide the right second coding unit 2210b.

According to an embodiment, the processing order of coding units may be determined based on the division process of the coding units. In other words, the processing order of divided coding units can be determined based on the processing order of the coding units immediately before being divided. The image decoding apparatus 200 can determine the order in which the third coding units 2220a and 2220b, determined by dividing the left second coding unit 2210a, are processed independently of the right second coding unit 2210b. Since the third coding units 2220a and 2220b are determined by dividing the left second coding unit 2210a in the horizontal direction, the third coding units 2220a and 2220b may be processed in the vertical direction 2220c. Since the order in which the left second coding unit 2210a and the right second coding unit 2210b are processed corresponds to the horizontal direction 2210c, the right second coding unit 2210b may be processed after the third coding units 2220a and 2220b included in the left second coding unit 2210a are processed in the vertical direction 2220c. The above description is intended to explain how the processing order of coding units is determined according to the coding units before division. Therefore, it should not be construed as limited to the above-described embodiments, and should be interpreted as covering various methods by which coding units determined by being divided into various shapes can be processed independently in a predetermined order.
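The nested ordering in this example amounts to a depth-first traversal of the split tree, in which every divided coding unit is replaced in the sequence by its own children. The sketch below models the FIG. 22 example with a hypothetical nested-tuple representation; the reference numerals are used only as labels.

```python
from typing import List, Tuple, Union

# A node is either a leaf label or a tuple of child nodes, with the children
# already listed in the processing order of their own split
# (left to right for a vertical split, top to bottom for a horizontal split).
Node = Union[str, Tuple["Node", ...]]


def flatten_processing_order(node: Node) -> List[str]:
    """Depth-first traversal: finish all units inside one child before the next."""
    if isinstance(node, str):
        return [node]
    order: List[str] = []
    for child in node:
        order.extend(flatten_processing_order(child))
    return order


# First coding unit 2200 split vertically into 2210a (left) and 2210b (right);
# 2210a is split horizontally into 2220a (top) and 2220b (bottom).
first_unit_2200: Node = (("2220a", "2220b"), "2210b")
order = flatten_processing_order(first_unit_2200)
```

In this model the right second coding unit 2210b comes last, after both third coding units inside the left second coding unit have been visited.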

FIG. 23 illustrates a process by which the image decoding apparatus 200 determines that the current coding unit is divided into an odd number of coding units when the coding units cannot be processed in a predetermined order, according to an embodiment.

According to an embodiment, the image decoding apparatus 200 may determine that the current coding unit is divided into an odd number of coding units based on the obtained block type information and division type information. Referring to FIG. 23, the square first coding unit 2300 may be divided into the non-square second coding units 2310a and 2310b, and the second coding units 2310a and 2310b may each be independently divided into the third coding units 2320a, 2320b, 2320c, 2320d, and 2320e. According to an embodiment, the image decoding apparatus 200 may determine the plurality of third coding units 2320a and 2320b by dividing the left second coding unit 2310a in the horizontal direction, and may divide the right second coding unit 2310b into the odd number of third coding units 2320c, 2320d, and 2320e.

According to one embodiment, the image decoding apparatus 200 may determine whether a coding unit divided into an odd number exists by determining whether the third coding units 2320a, 2320b, 2320c, 2320d, and 2320e can be processed in a predetermined order. Referring to FIG. 23, the image decoding apparatus 200 may recursively divide the first coding unit 2300 to determine the third coding units 2320a, 2320b, 2320c, 2320d, and 2320e. Based on at least one of the block type information and the division type information, the image decoding apparatus 200 may determine whether the first coding unit 2300, the second coding units 2310a and 2310b, or the third coding units 2320a, 2320b, 2320c, 2320d, and 2320e are divided into an odd number of coding units. For example, the coding unit located on the right among the second coding units 2310a and 2310b may be divided into the odd number of third coding units 2320c, 2320d, and 2320e. The order in which the plurality of coding units included in the first coding unit 2300 are processed may be a predetermined order (for example, a z-scan order 2330), and the image decoding apparatus 200 can determine whether the third coding units 2320c, 2320d, and 2320e, determined by dividing the right second coding unit 2310b into an odd number, satisfy the condition for being processed according to the predetermined order.

According to an embodiment, the image decoding apparatus 200 may determine whether the third coding units 2320a, 2320b, 2320c, 2320d, and 2320e included in the first coding unit 2300 satisfy the condition for being processed in the predetermined order, where the condition relates to whether at least one of the width and the height of the second coding units 2310a and 2310b is divided in half along the boundaries of the third coding units 2320a, 2320b, 2320c, 2320d, and 2320e. For example, the third coding units 2320a and 2320b, determined by dividing the height of the non-square left second coding unit 2310a in half, satisfy the condition. However, the boundaries of the third coding units 2320c, 2320d, and 2320e, determined by dividing the right second coding unit 2310b into three coding units, do not divide the width or height of the right second coding unit 2310b in half, so the third coding units 2320c, 2320d, and 2320e may be determined as not satisfying the condition. When the condition is not satisfied, the image decoding apparatus 200 may determine that the scan order is disconnected, and may determine, based on that result, that the right second coding unit 2310b is divided into an odd number of coding units. According to an exemplary embodiment, when a coding unit is divided into an odd number of coding units, the image decoding apparatus 200 may place a predetermined restriction on the coding unit at a predetermined position among the divided coding units. Since such restrictions and predetermined positions have been described above through various embodiments, a detailed description thereof is omitted.
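The halving condition can be checked with simple coordinate arithmetic. The sketch below is one possible reading of the condition, assuming each coding unit is an (x, y, width, height) tuple and that a child boundary "divides the parent in half" when a child starts exactly at the midpoint of the parent's width or height. The function name and the block sizes (including the 1:2:1 three-way split) are assumptions for illustration.

```python
from typing import List, Tuple

Block = Tuple[int, int, int, int]  # (x, y, width, height)


def boundaries_halve_parent(parent: Block, children: List[Block]) -> bool:
    """True if some child boundary lies on the midpoint of the parent's width or height."""
    px, py, pw, ph = parent
    x_starts = {cx for cx, _, _, _ in children}
    y_starts = {cy for _, cy, _, _ in children}
    return (px + pw // 2) in x_starts or (py + ph // 2) in y_starts


# Hypothetical geometry for FIG. 23: a 16x16 first coding unit split vertically.
left_2310a = (0, 0, 8, 16)
right_2310b = (8, 0, 8, 16)

# Left unit split in half horizontally into 2320a and 2320b.
left_children = [(0, 0, 8, 8), (0, 8, 8, 8)]
# Right unit split into three units 2320c, 2320d, 2320e (assumed heights 4, 8, 4).
right_children = [(8, 0, 8, 4), (8, 4, 8, 8), (8, 12, 8, 4)]

left_ok = boundaries_halve_parent(left_2310a, left_children)
right_ok = boundaries_halve_parent(right_2310b, right_children)
```

Under these assumptions the left unit passes the check and the right unit fails it, which is the situation in which the text infers a division into an odd number of coding units.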

FIG. 24 illustrates a process in which the image decoding apparatus 200 determines at least one coding unit by dividing a first coding unit 2400 according to an embodiment. According to an embodiment, the image decoding apparatus 200 may divide the first coding unit 2400 based on at least one of the block type information and the division type information acquired through the receiving unit 210. The square first coding unit 2400 may be divided into four square coding units or into a plurality of non-square coding units. For example, referring to FIG. 24, when the block type information indicates that the first coding unit 2400 is square and the division type information indicates division into non-square coding units, the image decoding apparatus 200 may divide the first coding unit 2400 into a plurality of non-square coding units. Specifically, when the division type information indicates that an odd number of coding units is determined by dividing the first coding unit 2400 in the horizontal or vertical direction, the image decoding apparatus 200 may divide the square first coding unit 2400 into an odd number of coding units: the second coding units 2410a, 2410b, and 2410c determined by division in the vertical direction, or the second coding units 2420a, 2420b, and 2420c determined by division in the horizontal direction.

According to an exemplary embodiment, the image decoding apparatus 200 may determine whether the second coding units 2410a, 2410b, 2410c, 2420a, 2420b, and 2420c included in the first coding unit 2400 satisfy the condition for being processed in a predetermined order, where the condition relates to whether at least one of the width and the height of the first coding unit 2400 is divided in half along the boundaries of the second coding units 2410a, 2410b, 2410c, 2420a, 2420b, and 2420c. Referring to FIG. 24, since the boundaries of the second coding units 2410a, 2410b, and 2410c, determined by dividing the square first coding unit 2400 in the vertical direction, do not divide the width of the first coding unit 2400 in half, the first coding unit 2400 may be determined as not satisfying the condition for being processed in the predetermined order. Likewise, since the boundaries of the second coding units 2420a, 2420b, and 2420c, determined by dividing the square first coding unit 2400 in the horizontal direction, do not divide the height of the first coding unit 2400 in half, the first coding unit 2400 may be determined as not satisfying the condition for being processed in the predetermined order. When the condition is not satisfied, the image decoding apparatus 200 may determine that the scan order is disconnected, and may determine, based on that result, that the first coding unit 2400 is divided into an odd number of coding units. According to an exemplary embodiment, when a coding unit is divided into an odd number of coding units, the image decoding apparatus 200 may place a predetermined restriction on the coding unit at a predetermined position among the divided coding units. Since such restrictions and predetermined positions have been described above through various embodiments, a detailed description thereof is omitted.

According to one embodiment, the image decoding apparatus 200 may determine the encoding units of various types by dividing the first encoding unit.

Referring to FIG. 24, the image decoding apparatus 200 may divide the square first coding unit 2400 and the non-square first coding unit 2430 or 2450 into coding units of various shapes.

FIG. 25 illustrates that, when a non-square second coding unit determined by the image decoding apparatus 200 dividing the first coding unit 2500 satisfies a predetermined condition, the forms into which the second coding unit can be divided are limited, according to an embodiment.

According to an exemplary embodiment, the image decoding apparatus 200 may determine to divide the square first coding unit 2500 into the non-square second coding units 2510a, 2510b, 2520a, and 2520b based on at least one of the block type information and the division type information acquired through the receiving unit 210. The second coding units 2510a, 2510b, 2520a, and 2520b may be independently divided. Accordingly, the image decoding apparatus 200 may determine whether or not to divide each of the second coding units 2510a, 2510b, 2520a, and 2520b into a plurality of coding units based on at least one of the block type information and the division type information related to each of them. According to one embodiment, the image decoding apparatus 200 may determine the third coding units 2512a and 2512b by dividing, in the horizontal direction, the non-square left second coding unit 2510a determined by dividing the first coding unit 2500 in the vertical direction. However, when the left second coding unit 2510a is divided in the horizontal direction, the image decoding apparatus 200 may restrict the right second coding unit 2510b so that it cannot be divided in the horizontal direction, that is, in the same direction in which the left second coding unit 2510a was divided. If the right second coding unit 2510b were divided in the same direction to determine the third coding units 2514a and 2514b, the left second coding unit 2510a and the right second coding unit 2510b would each be independently divided in the horizontal direction, and the third coding units 2512a, 2512b, 2514a, and 2514b would be determined. However, this is the same result as the image decoding apparatus 200 dividing the first coding unit 2500 into the four square second coding units 2530a, 2530b, 2530c, and 2530d based on at least one of the block type information and the division type information, and may therefore be inefficient in terms of image decoding.

According to one embodiment, the image decoding apparatus 200 may determine the third coding units 2522a, 2522b, 2524a, and 2524b by dividing, in the vertical direction, the non-square second coding unit 2520a or 2520b determined by dividing the first coding unit 2500 in the horizontal direction. However, when one of the second coding units (for example, the upper second coding unit 2520a) is divided in the vertical direction, the image decoding apparatus 200 may restrict the other second coding unit (for example, the lower second coding unit 2520b) so that it cannot be divided in the vertical direction, that is, in the same direction in which the upper second coding unit 2520a was divided.
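The restriction can be modeled as a small rule that removes one option from the set of splits allowed for the second of two sibling coding units. The sketch below is an interpretation rather than the disclosed procedure: it assumes the restriction applies only when the sibling was divided perpendicular to the parent's split, since that is the case that reproduces the four square coding units. The function name and the string labels are hypothetical.

```python
from typing import Set


def allowed_splits(parent_split: str, sibling_split: str) -> Set[str]:
    """Splits still available to a non-square coding unit, given its sibling's split.

    parent_split: how the first coding unit was divided ("vertical" or "horizontal").
    sibling_split: how the other second coding unit was divided
                   ("none", "vertical" or "horizontal").
    """
    allowed = {"none", "horizontal", "vertical"}
    perpendicular = {"vertical": "horizontal", "horizontal": "vertical"}[parent_split]
    if sibling_split == perpendicular:
        # Repeating the sibling's split would yield the same four squares
        # as a direct four-way split, so it is excluded here.
        allowed.discard(perpendicular)
    return allowed


# FIG. 25, left/right case: 2500 split vertically, 2510a split horizontally.
right_2510b_options = allowed_splits("vertical", "horizontal")
# FIG. 25, upper/lower case: 2500 split horizontally, 2520a split vertically.
lower_2520b_options = allowed_splits("horizontal", "vertical")
```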

FIG. 26 illustrates a process in which the image decoding apparatus 200 divides a square coding unit when the division type information cannot indicate division into four square coding units, according to an embodiment.

According to an embodiment, the image decoding apparatus 200 may determine the second coding units 2610a, 2610b, 2620a, 2620b, and the like by dividing the first coding unit 2600 based on at least one of the block type information and the division type information. The division type information may include information on the various forms into which a coding unit can be divided, but that information may not include information for dividing a coding unit into four square coding units. According to such division type information, the image decoding apparatus 200 cannot divide the square first coding unit 2600 into the four square second coding units 2630a, 2630b, 2630c, and 2630d. Based on the division type information, the image decoding apparatus 200 can determine the non-square second coding units 2610a, 2610b, 2620a, 2620b, and the like.

According to an embodiment, the image decoding apparatus 200 may independently divide each of the non-square second encoding units 2610a, 2610b, 2620a, 2620b, and the like. Each of the second encoding units 2610a, 2610b, 2620a, 2620b, and the like may be divided in a predetermined order through a recursive method, which may correspond to the method by which the first encoding unit 2600 is divided.

For example, the image decoding apparatus 200 may determine the square third encoding units 2612a and 2612b by dividing the left second encoding unit 2610a in the horizontal direction, and may determine the square third encoding units 2614a and 2614b by dividing the right second encoding unit 2610b in the horizontal direction. Furthermore, the image decoding apparatus 200 may determine the square third encoding units 2616a, 2616b, 2616c, and 2616d by dividing both the left second encoding unit 2610a and the right second encoding unit 2610b in the horizontal direction. In this case, the encoding units are determined in the same form as when the first encoding unit 2600 is divided into the four square second encoding units 2630a, 2630b, 2630c, and 2630d.

In another example, the image decoding apparatus 200 may determine the square third encoding units 2622a and 2622b by dividing the upper second encoding unit 2620a in the vertical direction, and may determine the square third encoding units 2624a and 2624b by dividing the lower second encoding unit 2620b in the vertical direction. Furthermore, the image decoding apparatus 200 may determine the square third encoding units 2622a, 2622b, 2624a, and 2624b by dividing both the upper second encoding unit 2620a and the lower second encoding unit 2620b in the vertical direction. In this case, the encoding units are determined in the same form as when the first encoding unit 2600 is divided into the four square second encoding units 2630a, 2630b, 2630c, and 2630d.
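The equivalence described for FIG. 26 can be checked with a small sketch. It is an illustration only: blocks are modeled as (x, y, width, height) tuples, the 64x64 size is an assumed example, and the helper names are hypothetical rather than taken from the disclosure.

```python
# Illustrative sketch: blocks are (x, y, width, height) tuples.
# All helper names and the 64x64 size are assumptions for this example.

def split_vertical(block):
    """Divide along a vertical line into left and right halves."""
    x, y, w, h = block
    return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]

def split_horizontal(block):
    """Divide along a horizontal line into upper and lower halves."""
    x, y, w, h = block
    return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]

def split_quad(block):
    """Divide into four squares (the form assumed to be unavailable here)."""
    return [q for half in split_horizontal(block) for q in split_vertical(half)]

first = (0, 0, 64, 64)

# Path 1: vertical division, then each half divided horizontally.
via_vertical = [t for s in split_vertical(first) for t in split_horizontal(s)]
# Path 2: horizontal division, then each half divided vertically.
via_horizontal = [t for s in split_horizontal(first) for t in split_vertical(s)]

# Both two-step paths cover the same four squares as a direct four-way division.
same_blocks = set(via_vertical) == set(via_horizontal) == set(split_quad(first))
```

The two paths produce the same set of blocks but list them in a different order, which is why the order of processing has to be considered separately.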

According to an embodiment, the image decoding apparatus 200 may divide the first encoding unit 2700 based on the block type information and the division type information. When the block type information indicates a square shape and the division type information indicates that the first encoding unit 2700 is divided in at least one of the horizontal direction and the vertical direction, the image decoding apparatus 200 may determine second encoding units (e.g., 2710a, 2710b, 2720a, 2720b, etc.) by dividing the first encoding unit 2700. Referring to FIG. 27, the non-square second encoding units 2710a, 2710b, 2720a, and 2720b, which are determined by dividing the first encoding unit 2700 only in the horizontal direction or only in the vertical direction, may each be divided independently. For example, the image decoding apparatus 200 may determine the third encoding units 2716a, 2716b, 2716c, and 2716d by dividing, in the horizontal direction, the second encoding units 2710a and 2710b generated by dividing the first encoding unit 2700 in the vertical direction, and may determine the third encoding units 2726a, 2726b, 2726c, and 2726d by dividing, in the vertical direction, the second encoding units 2720a and 2720b generated by dividing the first encoding unit 2700 in the horizontal direction. Since the process of dividing the second encoding units 2710a, 2710b, 2720a, and 2720b has been described in detail with reference to FIG. 25, a detailed description is omitted.

According to an embodiment, the image decoding apparatus 200 may process encoding units in a predetermined order. The features of processing encoding units in a predetermined order have been described above with reference to FIG. 22, and a detailed description thereof is omitted. Referring to FIG. 27, the image decoding apparatus 200 may determine four square third encoding units 2716a, 2716b, 2716c, and 2716d, or 2726a, 2726b, 2726c, and 2726d, by dividing the square first encoding unit 2700. According to an embodiment, the image decoding apparatus 200 may determine the order in which the third encoding units 2716a, 2716b, 2716c, 2716d, 2726a, 2726b, 2726c, and 2726d are processed according to the form in which the first encoding unit 2700 is divided.

According to an embodiment, the image decoding apparatus 200 may determine the third encoding units 2716a, 2716b, 2716c, and 2716d by dividing, in the horizontal direction, the second encoding units 2710a and 2710b generated by division in the vertical direction. The image decoding apparatus 200 may then process the third encoding units 2716a, 2716b, 2716c, and 2716d according to an order 2717 in which the third encoding units 2716a and 2716b included in the left second encoding unit 2710a are first processed in the vertical direction, and the third encoding units 2716c and 2716d included in the right second encoding unit 2710b are then processed in the vertical direction.

According to an embodiment, the image decoding apparatus 200 may determine the third encoding units 2726a, 2726b, 2726c, and 2726d by dividing, in the vertical direction, the second encoding units 2720a and 2720b generated by division in the horizontal direction. The image decoding apparatus 200 may then process the third encoding units 2726a, 2726b, 2726c, and 2726d according to an order 2727 in which the third encoding units 2726a and 2726b included in the upper second encoding unit 2720a are first processed in the horizontal direction, and the third encoding units 2726c and 2726d included in the lower second encoding unit 2720b are then processed in the horizontal direction.

Referring to FIG. 27, the second encoding units 2710a, 2710b, 2720a, and 2720b may each be divided to determine the square third encoding units 2716a, 2716b, 2716c, 2716d, 2726a, 2726b, 2726c, and 2726d. The second encoding units 2710a and 2710b determined by division in the vertical direction and the second encoding units 2720a and 2720b determined by division in the horizontal direction have different forms. However, in terms of the third encoding units 2716a, 2716b, 2716c, 2716d, 2726a, 2726b, 2726c, and 2726d determined afterwards, the first encoding unit 2700 ends up divided into encoding units of the same form. Accordingly, even when the image decoding apparatus 200 eventually determines encoding units of the same form by recursively dividing encoding units through different processes based on at least one of the block type information and the division type information, it may process those encoding units in different orders.
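As a rough illustration of the two processing orders for FIG. 27, the sketch below lists the top-left corners of the four square third encoding units in the order in which they would be visited. The function name, the string labels, and the 64-sample size are assumptions made for this example, not terms from the disclosure.

```python
# Sketch only: returns the (x, y) origins of the four square units in the
# order implied by how the square first unit was initially divided.

def processing_order(first_split, size=64):
    half = size // 2
    if first_split == "vertical":
        # Like order 2717: left column top to bottom, then right column.
        return [(x, y) for x in (0, half) for y in (0, half)]
    if first_split == "horizontal":
        # Like order 2727: upper row left to right, then lower row.
        return [(x, y) for y in (0, half) for x in (0, half)]
    raise ValueError("first_split must be 'vertical' or 'horizontal'")

order_after_vertical = processing_order("vertical")
order_after_horizontal = processing_order("horizontal")
```

Both lists contain the same four positions; only the sequence differs, matching the observation that identically shaped units can be processed in different orders.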

According to an exemplary embodiment, the image decoding apparatus 200 may determine the depth of an encoding unit according to a predetermined criterion. For example, the predetermined criterion may be the length of the long side of the encoding unit. When the length of the long side of the current encoding unit is 1/2ⁿ (n>0) times the length of the long side of the encoding unit before division, the depth of the current encoding unit may be determined to be n greater than the depth of the encoding unit before division. Hereinafter, an encoding unit with an increased depth is referred to as a lower-depth encoding unit.
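The long-side criterion can be expressed as a short calculation. The following is a minimal sketch, assuming sizes are given as (width, height) in samples; `depth_increase` is a hypothetical helper and the 64x64 starting size is an arbitrary example.

```python
# Sketch only: depth_increase is a hypothetical helper, not a function named
# in the disclosure. Sizes are (width, height) in samples.

def depth_increase(parent, child):
    """Return n such that the child's long side is 1/2**n of the parent's."""
    parent_long, child_long = max(parent), max(child)
    ratio, remainder = divmod(parent_long, child_long)
    # The ratio must be an exact power of two (1, 2, 4, ...).
    if remainder or ratio < 1 or ratio & (ratio - 1):
        raise ValueError("long sides must differ by a power-of-two factor")
    return ratio.bit_length() - 1

D = 0                 # assumed depth of the undivided encoding unit
square_2n = (64, 64)  # assumed 2Nx2N block with N = 32

depth_nxn = D + depth_increase(square_2n, (32, 32))      # long side halved once
depth_quarter = D + depth_increase(square_2n, (16, 16))  # long side halved twice
depth_nx2n = D + depth_increase(square_2n, (32, 64))     # long side unchanged
```

Under this criterion a division that leaves the long side unchanged, such as 2Nx2N into Nx2N, does not increase the depth.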

Referring to FIG. 28, according to an embodiment, based on block type information indicating a square shape (for example, the block type information may indicate '0: SQUARE'), the image decoding apparatus 200 may divide the square first encoding unit 2800 to determine the second encoding unit 2802, the third encoding unit 2804, and the like of lower depths. If the size of the square first encoding unit 2800 is 2Nx2N, the second encoding unit 2802, determined by reducing the width and height of the first encoding unit 2800 to 1/2¹ times, may have a size of NxN. Furthermore, the third encoding unit 2804, determined by reducing the width and height of the second encoding unit 2802 to 1/2 times, may have a size of N/2xN/2. In this case, the width and height of the third encoding unit 2804 correspond to 1/2² times those of the first encoding unit 2800. When the depth of the first encoding unit 2800 is D, the depth of the second encoding unit 2802, whose width and height are 1/2¹ times those of the first encoding unit 2800, may be D+1, and the depth of the third encoding unit 2804, whose width and height are 1/2² times those of the first encoding unit 2800, may be D+2.

According to an exemplary embodiment, based on block type information indicating a non-square shape (for example, the block type information may indicate '1: NS_VER', a non-square shape whose height is greater than its width, or '2: NS_HOR', a non-square shape whose width is greater than its height), the image decoding apparatus 200 may divide the non-square first encoding unit 2810 or 2820 to determine the second encoding unit 2812 or 2822, the third encoding unit 2814 or 2824, and the like of lower depths.

The image decoding apparatus 200 may determine a second encoding unit (e.g., 2802, 2812, 2822, etc.) by dividing at least one of the width and the height of the first encoding unit 2810 of Nx2N size. That is, the image decoding apparatus 200 may determine the second encoding unit 2802 of NxN size or the second encoding unit 2822 of NxN/2 size by dividing the first encoding unit 2810 in the horizontal direction, or may determine the second encoding unit 2812 of N/2xN size by dividing it in the horizontal direction and the vertical direction.

According to an embodiment, the image decoding apparatus 200 may determine a second encoding unit (e.g., 2802, 2812, 2822, etc.) by dividing at least one of the width and the height of the first encoding unit 2820 of 2NxN size. That is, the image decoding apparatus 200 may determine the second encoding unit 2802 of NxN size or the second encoding unit 2812 of N/2xN size by dividing the first encoding unit 2820 in the vertical direction, or may determine the second encoding unit 2822 of NxN/2 size by dividing it in the horizontal direction and the vertical direction.

According to an exemplary embodiment, the image decoding apparatus 200 may determine a third encoding unit (e.g., 2804, 2814, 2824, etc.) by dividing at least one of the width and the height of the second encoding unit 2802 of NxN size. That is, the image decoding apparatus 200 may determine the third encoding unit 2804 of N/2xN/2 size, the third encoding unit 2814 of N/2²xN/2 size, or the third encoding unit 2824 of N/2xN/2² size by dividing the second encoding unit 2802 in the vertical direction and the horizontal direction.

According to an exemplary embodiment, the image decoding apparatus 200 may determine a third encoding unit (e.g., 2804, 2814, 2824, etc.) by dividing at least one of the width and the height of the second encoding unit 2812 of N/2xN size. That is, the image decoding apparatus 200 may determine the third encoding unit 2804 of N/2xN/2 size or the third encoding unit 2824 of N/2xN/2² size by dividing the second encoding unit 2812 in the horizontal direction, or may determine the third encoding unit 2814 of N/2²xN/2 size by dividing it in the vertical direction and the horizontal direction.

According to an embodiment, the image decoding apparatus 200 may determine a third encoding unit (e.g., 2804, 2814, 2824, etc.) by dividing at least one of the width and the height of the second encoding unit 2822 of NxN/2 size. That is, the image decoding apparatus 200 may determine the third encoding unit 2804 of N/2xN/2 size or the third encoding unit 2814 of N/2²xN/2 size by dividing the second encoding unit 2822 in the vertical direction, or may determine the third encoding unit 2824 of N/2xN/2² size by dividing it in the vertical direction and the horizontal direction.

According to one embodiment, the image decoding apparatus 200 may divide a square encoding unit (for example, 2800, 2802, or 2804) in the horizontal direction or the vertical direction. For example, the first encoding unit 2800 of 2Nx2N size may be divided in the vertical direction to determine the first encoding unit 2810 of Nx2N size, or in the horizontal direction to determine the first encoding unit 2820 of 2NxN size. According to one embodiment, when the depth is determined based on the length of the longest side of an encoding unit, the depth of an encoding unit determined by dividing the square encoding unit 2800, 2802, or 2804 in the horizontal direction or the vertical direction may be the same as the depth of the encoding unit 2800, 2802, or 2804.

According to one embodiment, the width and height of the third encoding unit 2814 or 2824 may correspond to 1/2² times those of the first encoding unit 2810 or 2820. When the depth of the first encoding unit 2810 or 2820 is D, the depth of the second encoding unit 2812 or 2822, whose width and height are 1/2 times those of the first encoding unit 2810 or 2820, may be D+1, and the depth of the third encoding unit 2814 or 2824, whose width and height are 1/2² times those of the first encoding unit 2810 or 2820, may be D+2.

According to an embodiment, the image decoding apparatus 200 may determine second encoding units of various forms by dividing the square first encoding unit 2900. Referring to FIG. 29, the image decoding apparatus 200 may determine the second encoding units 2902a, 2902b, 2904a, 2904b, 2906a, 2906b, 2906c, and 2906d by dividing the first encoding unit 2900 in at least one of the vertical direction and the horizontal direction according to the division type information. That is, the image decoding apparatus 200 may determine the second encoding units 2902a, 2902b, 2904a, 2904b, 2906a, 2906b, 2906c, and 2906d based on the division type information for the first encoding unit 2900.

The depth of the second encoding units 2902a, 2902b, 2904a, 2904b, 2906a, 2906b, 2906c, and 2906d, which are determined according to the division type information for the square first encoding unit 2900, may be determined based on the length of their long sides. For example, since the length of one side of the square first encoding unit 2900 is equal to the length of the long side of the non-square second encoding units 2902a, 2902b, 2904a, and 2904b, the first encoding unit 2900 and the non-square second encoding units 2902a, 2902b, 2904a, and 2904b may be regarded as having the same depth D. On the other hand, when the image decoding apparatus 200 divides the first encoding unit 2900 into the four square second encoding units 2906a, 2906b, 2906c, and 2906d based on the division type information, the length of one side of the square second encoding units 2906a, 2906b, 2906c, and 2906d is 1/2 times the length of one side of the first encoding unit 2900, so the depth of the second encoding units 2906a, 2906b, 2906c, and 2906d may be D+1, which is one depth lower than D, the depth of the first encoding unit 2900.

According to an embodiment, the image decoding apparatus 200 may divide the first encoding unit 2910, whose height is longer than its width, in the horizontal direction according to the division type information to determine a plurality of second encoding units 2912a, 2912b, 2914a, 2914b, and 2914c. According to one embodiment, the image decoding apparatus 200 may divide the first encoding unit 2920, whose width is longer than its height, in the vertical direction according to the division type information to determine a plurality of second encoding units 2922a, 2922b, 2924a, 2924b, and 2924c.

The depth of the second encoding units 2912a, 2912b, 2914a, 2914b, 2116a, 2116b, 2116c, and 2116d, which are determined according to the division type information for the non-square first encoding unit 2910 or 2920, may be determined based on the length of their long sides. For example, since the length of one side of the square second encoding units 2912a and 2912b is 1/2 times the length of the long side of the non-square first encoding unit 2910, whose height is longer than its width, the depth of the square second encoding units 2912a and 2912b is D+1, which is one depth lower than the depth D of the non-square first encoding unit 2910.

Furthermore, the image decoding apparatus 200 may divide the non-square first encoding unit 2910 into an odd number of second encoding units 2914a, 2914b, and 2914c based on the division type information. The odd number of second encoding units 2914a, 2914b, and 2914c may include the non-square second encoding units 2914a and 2914c and the square second encoding unit 2914b. In this case, since the length of the long side of the non-square second encoding units 2914a and 2914c and the length of one side of the square second encoding unit 2914b are 1/2 times the length of the long side of the first encoding unit 2910, the depth of the second encoding units 2914a, 2914b, and 2914c may be D+1, which is one depth lower than the depth D of the first encoding unit 2910. The image decoding apparatus 200 may determine the depths of the encoding units associated with the non-square first encoding unit 2920, whose width is longer than its height, in a manner corresponding to the scheme for determining the depths of the encoding units associated with the first encoding unit 2910.
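The depth of the three units produced by an odd division can be checked numerically. The sketch below assumes a 32x64 tall block and part heights in the ratio 1:2:1, which follows from the middle unit being square; the function names are hypothetical.

```python
# Sketch only: sizes are (width, height); names and the 32x64 block are
# assumptions for illustration.

def ternary_split_tall(width, height):
    """Divide a tall block into three stacked parts with heights 1/4, 1/2, 1/4."""
    quarter = height // 4
    return [(width, quarter), (width, height // 2), (width, quarter)]

def depth_of(child, parent, parent_depth):
    """Depth from the long-side criterion: +1 each time the long side halves."""
    ratio = max(parent) // max(child)
    return parent_depth + ratio.bit_length() - 1

parent = (32, 64)  # height longer than width, like first encoding unit 2910
children = ternary_split_tall(*parent)
depths = [depth_of(child, parent, 0) for child in children]
```

All three parts have a long side of 32 samples, half of the parent's 64, so each ends up one depth below the parent even though the parts differ in size.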

According to one embodiment, in determining an index (PID) for distinguishing the divided encoding units, the image decoding apparatus 200 may determine the index based on the size ratio between the encoding units when the odd number of encoding units are not the same size. Referring to FIG. 29, the encoding unit 2914b positioned at the center among the odd number of encoding units 2914a, 2914b, and 2914c has the same width as the other encoding units 2914a and 2914c, but may have a height twice that of the encoding units 2914a and 2914c. That is, in this case, the middle encoding unit 2914b may correspond in size to two of the other encoding units 2914a or 2914c. Therefore, if the index (PID) of the encoding unit 2914b located at the center is 1 according to the scanning order, the index of the encoding unit 2914c positioned next may be 3, an increase of 2. That is, there may be a discontinuity in the index values. According to an exemplary embodiment, the image decoding apparatus 200 may determine whether the odd number of encoding units are not the same size based on the presence or absence of a discontinuity in the indices for distinguishing the divided encoding units.
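One way to read the size-ratio rule is that each encoding unit advances the index by its size relative to the smallest unit. The sketch below follows that reading, which is an assumption; the helper names are hypothetical and the heights are example values.

```python
# Sketch only: an assumed interpretation of index (PID) assignment by size ratio.

def assign_pids(heights):
    """Assign indices in scan order, advancing by each unit's relative size."""
    smallest = min(heights)
    pids, next_pid = [], 0
    for height in heights:
        pids.append(next_pid)
        next_pid += height // smallest
    return pids

def has_discontinuity(pids):
    """True when consecutive indices do not always increase by exactly one."""
    return any(b - a != 1 for a, b in zip(pids, pids[1:]))

odd_split_pids = assign_pids([16, 32, 16])  # middle unit twice as tall
even_split_pids = assign_pids([32, 32])     # two equal units
```

With these example heights the middle unit receives index 1 and the following unit index 3, so the jump of 2 signals that the units are not all the same size.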

According to an exemplary embodiment, the image decoding apparatus 200 may determine whether the current encoding unit is divided into a specific division form based on the values of the indices for distinguishing the plurality of encoding units determined by dividing the current encoding unit. Referring to FIG. 29, the image decoding apparatus 200 may divide the rectangular first encoding unit 2910, whose height is longer than its width, to determine an even number of encoding units 2912a and 2912b or an odd number of encoding units 2914a, 2914b, and 2914c. The image decoding apparatus 200 may use an index (PID) indicating each encoding unit to distinguish the plurality of encoding units. According to one embodiment, the PID may be obtained from a sample at a predetermined position of each encoding unit (e.g., the upper left sample).

According to an exemplary embodiment, the image decoding apparatus 200 may determine an encoding unit at a predetermined location among the encoding units determined by division, using the indices for distinguishing the encoding units. According to an exemplary embodiment, when the division type information for the rectangular first encoding unit 2910, whose height is greater than its width, indicates division into three encoding units, the image decoding apparatus 200 may divide the first encoding unit 2910 into the three encoding units 2914a, 2914b, and 2914c. The image decoding apparatus 200 may assign an index to each of the three encoding units 2914a, 2914b, and 2914c. The image decoding apparatus 200 may compare the indices of the encoding units in order to determine the middle encoding unit among the odd number of encoding units. Based on the indices of the encoding units, the image decoding apparatus 200 may determine the encoding unit 2914b, whose index corresponds to the middle value among the indices, as the encoding unit at the middle position among the encoding units determined by dividing the first encoding unit 2910.

According to an exemplary embodiment, in determining the indices for distinguishing the divided encoding units, the image decoding apparatus 200 may determine the indices based on the size ratio between the encoding units when the encoding units are not the same size. Referring to FIG. 29, the encoding unit 2914b generated by dividing the first encoding unit 2910 may have the same width as the other encoding units 2914a and 2914c but twice their height. In this case, if the index (PID) of the encoding unit 2914b positioned at the center is 1, the index of the encoding unit 2914c positioned next may be 3, an increase of 2. When the index increases uniformly and the increment then changes in this way, the image decoding apparatus 200 may determine that the current encoding unit is divided into a plurality of encoding units including an encoding unit whose size differs from the others.

According to an exemplary embodiment, when the division type information indicates division into an odd number of encoding units, the image decoding apparatus 200 may divide the current encoding unit in a form in which the encoding unit at a predetermined position among the odd number of encoding units (for example, the middle encoding unit) has a size different from the other encoding units. In this case, the image decoding apparatus 200 may determine the middle encoding unit having the different size by using the index (PID) of the encoding unit. However, the index and the size or position of the encoding unit at the predetermined position described here are specific examples for explaining an embodiment and should not be construed as limiting; various indices, positions, and sizes of encoding units may be used.
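Selecting the middle encoding unit by comparing indices can be sketched as picking the unit whose index is the median. This is an illustration under the assumption of an odd number of units with indices 0, 1, and 3 as in the example above; `middle_unit` is a hypothetical helper and the labels are used only as strings.

```python
# Sketch only: units are (label, pid) pairs; an odd number of units is assumed.

def middle_unit(units):
    """Return the label of the unit whose index is the median of all indices."""
    if len(units) % 2 == 0:
        raise ValueError("an odd number of units is expected")
    ordered = sorted(units, key=lambda unit: unit[1])
    return ordered[len(ordered) // 2][0]

units = [("2914a", 0), ("2914b", 1), ("2914c", 3)]
selected = middle_unit(units)
```

The median is used rather than the arithmetic mean because the indices may be discontinuous when the units differ in size.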

According to an exemplary embodiment, the image decoding apparatus 200 may use a predetermined data unit in which recursive division of encoding units starts.

According to an exemplary embodiment, a predetermined data unit may be defined as a data unit in which an encoding unit starts to be recursively segmented using at least one of block type information and partition type information. That is, it may correspond to a coding unit of the highest depth used in a process of determining a plurality of coding units for dividing a current picture. Hereinafter, such a predetermined data unit is referred to as a reference data unit for convenience of explanation.

According to one embodiment, the reference data unit may have a predetermined size and shape. According to one embodiment, the reference encoding unit may include MxN samples. Here, M and N may be equal to each other and may be integers expressed as powers of 2. That is, the reference data unit may have a square or a non-square shape, and may later be divided into an integer number of encoding units.

According to an embodiment, the image decoding apparatus 200 may divide the current picture into a plurality of reference data units. According to an exemplary embodiment, the image decoding apparatus 200 may divide each of the plurality of reference data units into which the current picture is divided, using division information for each reference data unit. The division process of the reference data unit may correspond to a division process using a quad-tree structure.

According to an exemplary embodiment, the image decoding apparatus 200 may determine in advance the minimum size that a reference data unit included in the current picture can have. Accordingly, the image decoding apparatus 200 may determine reference data units of various sizes equal to or larger than the minimum size, and may determine at least one encoding unit using the block type information and the division type information based on the determined reference data unit.

Referring to FIG. 30, the image decoding apparatus 200 may use a square reference encoding unit 3000 or a non-square reference encoding unit 3002. According to an exemplary embodiment, the shape and size of the reference encoding unit may be determined for each of various data units that can include at least one reference encoding unit (e.g., a sequence, a picture, a slice, a slice segment, a maximum encoding unit, and the like).

According to an embodiment, the receiving unit 210 of the image decoding apparatus 200 may obtain at least one of the information on the shape of the reference encoding unit and the information on the size of the reference encoding unit from the bitstream for each of the various data units. The process of determining at least one encoding unit included in the square reference encoding unit 3000 has been described in detail through the process of dividing the current encoding unit of FIG. 18, and the process of determining at least one encoding unit included in the non-square reference encoding unit 3002 has been described through the process of dividing the current encoding unit 1900 or 1950 of FIG. 19, so a detailed description is omitted.

In order to determine the size and shape of the reference encoding unit for certain data units predetermined based on a predetermined condition, the image decoding apparatus 200 may use an index for identifying the size and shape of the reference encoding unit. That is, among the various data units (for example, a sequence, a picture, a slice, a slice segment, a maximum encoding unit, and the like), the receiving unit 210 may obtain from the bitstream only an index for identifying the size and shape of the reference encoding unit for each data unit that satisfies a predetermined condition (for example, a data unit having a size equal to or smaller than a slice), such as each slice, slice segment, or maximum encoding unit. The image decoding apparatus 200 may determine the size and shape of the reference data unit for each data unit satisfying the predetermined condition by using the index. If the information on the shape of the reference encoding unit and the information on the size of the reference encoding unit were obtained from the bitstream and used for each relatively small data unit, the bitstream would be used inefficiently. Therefore, instead of directly obtaining the information on the shape and the size of the reference encoding unit, only the index may be obtained and used. In this case, at least one of the size and the shape of the reference encoding unit corresponding to the index may be predetermined. That is, the image decoding apparatus 200 may select at least one of the predetermined size and shape of the reference encoding unit according to the index, and thereby determine at least one of the size and shape of the reference encoding unit included in the data unit for which the index was obtained.
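The index-based signalling can be pictured as a lookup into a table shared in advance by the encoder and decoder. The sketch below is illustrative only: the table entries are invented examples, since the text states only that the size and shape corresponding to each index may be predetermined, and `reference_unit_for_index` is a hypothetical name.

```python
# Hypothetical table: the entries are invented for illustration and are not
# values given in the disclosure. Each entry is (width, height) in samples.
REFERENCE_UNIT_TABLE = {
    0: (64, 64),
    1: (32, 32),
    2: (32, 64),
    3: (64, 32),
}

def reference_unit_for_index(index):
    """Return the (width, height, shape) that a signalled index stands for."""
    width, height = REFERENCE_UNIT_TABLE[index]
    shape = "square" if width == height else "non-square"
    return width, height, shape

# Only the small index would be read from the bitstream for each slice,
# slice segment, or maximum encoding unit.
unit_for_slice = reference_unit_for_index(2)
```

Sending a short index instead of explicit size and shape fields is what keeps the per-data-unit overhead low.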

According to an exemplary embodiment, the image decoding apparatus 200 may use at least one reference encoding unit included in one maximum encoding unit. That is, the maximum encoding unit into which an image is divided may include at least one reference encoding unit, and encoding units may be determined through a recursive division process of each reference encoding unit. According to an exemplary embodiment, at least one of the width and the height of the maximum encoding unit may be an integer multiple of at least one of the width and the height of the reference encoding unit. According to an exemplary embodiment, the size of the reference encoding unit may be the size obtained by dividing the maximum encoding unit n times according to a quad-tree structure. That is, the image decoding apparatus 200 may determine the reference encoding unit by dividing the maximum encoding unit n times according to the quad-tree structure, and may divide the reference encoding unit based on at least one of the block type information and the division type information according to various embodiments.
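Each quad-tree division halves both the width and the height, so dividing the maximum encoding unit n times gives a reference encoding unit whose sides are 1/2ⁿ of the original. A minimal sketch, assuming sizes in samples and hypothetical function names:

```python
# Sketch only: function names and the 128x128 example size are assumptions.

def reference_unit_size(max_width, max_height, n):
    """Size of the reference unit after n quad-tree divisions of the maximum unit."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if max_width % (1 << n) or max_height % (1 << n):
        raise ValueError("maximum encoding unit cannot be divided n times")
    return max_width >> n, max_height >> n

def reference_unit_count(n):
    """Number of reference units inside one maximum unit after n divisions."""
    return 4 ** n

size_after_two = reference_unit_size(128, 128, 2)
count_after_two = reference_unit_count(2)
```

In this example the width and height of the maximum encoding unit are four times those of the reference encoding unit, consistent with the integer-multiple relationship.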

FIG. 31 shows a processing block serving as a reference for determining the determination order of the reference encoding units included in a picture 3100, according to an embodiment.

According to one embodiment, the image decoding apparatus 200 may determine at least one processing block into which a picture is divided. A processing block is a data unit including at least one reference encoding unit into which an image is divided, and the at least one reference encoding unit included in the processing block may be determined in a specific order. That is, the determination order of the at least one reference encoding unit in each processing block may correspond to one of various orders in which reference encoding units can be determined, and the determination order may differ from one processing block to another. The determination order of the reference encoding units for each processing block may be one of various orders such as a raster scan, a Z scan, an N scan, an up-right diagonal scan, a horizontal scan, and a vertical scan. However, the orders that can be determined should not be construed as limited to these scan orders.
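A few of the named scan orders can be sketched as orderings of (column, row) positions of reference encoding units inside one processing block. This is an illustration under the assumption of a rectangular grid; the function names are hypothetical, and the Z scan is implemented as a standard Morton (bit-interleaved) ordering on a power-of-two grid.

```python
# Sketch only: positions are (column, row) of reference units in a block.

def raster_scan(cols, rows):
    """Row by row, left to right."""
    return [(c, r) for r in range(rows) for c in range(cols)]

def vertical_scan(cols, rows):
    """Column by column, top to bottom."""
    return [(c, r) for c in range(cols) for r in range(rows)]

def z_scan(size):
    """Z-order (Morton) traversal of a size x size grid, size a power of two."""
    def morton(c, r):
        key = 0
        for bit in range(size.bit_length()):
            key |= ((c >> bit) & 1) << (2 * bit)      # column bits: even places
            key |= ((r >> bit) & 1) << (2 * bit + 1)  # row bits: odd places
        return key
    return sorted(raster_scan(size, size), key=lambda pos: morton(*pos))

raster = raster_scan(4, 4)
z_order = z_scan(4)
```

Because the order may differ per processing block, a decoder following this description would select one such ordering for each block from the signalled information.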

According to an exemplary embodiment, the image decoding apparatus 200 may obtain information on the size of the processing block from the bitstream to determine the size of the at least one processing block included in the image. The size of such a processing block may be a predetermined size of a data unit indicated by the information on the size of the processing block.

According to an embodiment, the receiving unit 210 of the image decoding apparatus 200 may obtain information on the size of a processing block from the bitstream for each specific data unit. For example, the information on the size of a processing block may be obtained from the bitstream in data units such as an image, a sequence, a picture, a slice, a slice segment, or the like. That is, the receiving unit 210 may obtain the information on the size of the processing block from the bitstream for each of these data units, and the image decoding apparatus 200 may use the obtained information to determine the size of at least one processing block into which the picture is divided. The size of the processing block may be an integer multiple of the size of the reference encoding unit.

According to an exemplary embodiment, the image decoding apparatus 200 may determine the sizes of the processing blocks 3102 and 3112 included in the picture 3100. For example, the image decoding apparatus 200 may determine the size of a processing block based on the information on the size of the processing block obtained from the bitstream. Referring to FIG. 31, the image decoding apparatus 200 according to an exemplary embodiment may determine the horizontal size of the processing blocks 3102 and 3112 to be four times the horizontal size of the reference encoding unit, and the vertical size to be four times the vertical size of the reference encoding unit. The image decoding apparatus 200 may determine the order in which at least one reference encoding unit is determined in at least one processing block.
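The relation between the reference encoding unit and the processing block in this example can be sketched as follows. The picture and reference unit sizes are assumed values, the function name is hypothetical, and clipping blocks at the picture boundary is an assumption of this sketch rather than something stated in the text.

```python
# Sketch only: returns processing blocks as (x, y, width, height) in raster
# order; boundary clipping is an assumption made for this illustration.

def processing_blocks(pic_w, pic_h, ref_w, ref_h, factor=4):
    """Tile a picture with blocks 'factor' times the reference unit size."""
    block_w, block_h = ref_w * factor, ref_h * factor
    return [
        (x, y, min(block_w, pic_w - x), min(block_h, pic_h - y))
        for y in range(0, pic_h, block_h)
        for x in range(0, pic_w, block_w)
    ]

# Assumed 256x128 picture with 16x16 reference units gives 64x64 blocks.
blocks = processing_blocks(256, 128, 16, 16)
```

With a factor of 4 in each direction, every full processing block contains sixteen reference encoding units.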

According to one embodiment, the image decoding apparatus 200 may determine each of the processing blocks 3102 and 3112 included in the picture 3100 based on the size of the processing block, and may determine the determination order of at least one reference encoding unit included in the processing blocks 3102 and 3112. According to an embodiment, the determination of the reference encoding unit may include determining the size of the reference encoding unit.

According to an exemplary embodiment, the image decoding apparatus 200 may obtain, from a bitstream, information on the determination order of at least one reference encoding unit included in at least one processing block, and may determine the order in which the at least one reference encoding unit is determined based on the obtained information. The information on the determination order may be defined as the order or direction in which the reference encoding units are determined within the processing block. That is, the order in which the reference encoding units are determined may be determined independently for each processing block.

According to an exemplary embodiment, the image decoding apparatus 200 may obtain information on a determination order of a reference encoding unit from a bitstream for each specific data unit. For example, the receiving unit 210 may acquire information on the determination order of the reference encoding unit from a bitstream for each data unit such as an image, a sequence, a picture, a slice, a slice segment, and a processing block. Since the information on the determination order of the reference encoding unit indicates the reference encoding unit determination order in the processing block, the information on the determination order can be obtained for each specific data unit including an integer number of processing blocks.

According to an embodiment, the image decoding apparatus 200 may determine at least one reference encoding unit based on the determined order.

According to an embodiment, the receiving unit 210 may obtain information on the reference encoding unit determination order from the bitstream as information related to the processing blocks 3102 and 3112, and the image decoding apparatus 200 may determine the order in which at least one reference encoding unit included in the processing blocks 3102 and 3112 is determined, and may determine at least one reference encoding unit included in the picture 3100 according to that determination order. Referring to FIG. 31, the image decoding apparatus 200 may determine the determination orders 3104 and 3114 of at least one reference encoding unit associated with each of the processing blocks 3102 and 3112. For example, when information on the determination order of reference encoding units is obtained for each processing block, the reference encoding unit determination order associated with each of the processing blocks 3102 and 3112 may differ from one processing block to another. If the reference encoding unit determination order 3104 related to the processing block 3102 is a raster scan order, the reference encoding units included in the processing block 3102 may be determined according to the raster scan order. On the other hand, when the reference encoding unit determination order 3114 related to the other processing block 3112 is the reverse of the raster scan order, the reference encoding units included in the processing block 3112 may be determined according to the reverse of the raster scan order.
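The per-block determination order can be sketched in Python as follows. This is an illustrative sketch only: the function name `reference_unit_order` and its parameters are hypothetical, and "reverse of the raster scan order" is assumed here to mean the raster sequence traversed backwards (from the bottom-right unit to the top-left unit).

```python
from typing import List, Tuple


def reference_unit_order(blk_x: int, blk_y: int,
                         units_w: int, units_h: int,
                         ref_w: int, ref_h: int,
                         reverse: bool = False) -> List[Tuple[int, int]]:
    """Return the top-left (x, y) of each reference encoding unit in one
    processing block, in the order in which the units are determined.

    reverse=False: raster scan (left to right, then top to bottom).
    reverse=True:  the same sequence traversed backwards (assumption).
    """
    positions = [(blk_x + col * ref_w, blk_y + row * ref_h)
                 for row in range(units_h)
                 for col in range(units_w)]
    return positions[::-1] if reverse else positions


# Two processing blocks of the same picture may use different orders.
order_first = reference_unit_order(0, 0, 4, 4, 64, 64, reverse=False)
order_second = reference_unit_order(256, 0, 4, 4, 64, 64, reverse=True)
```

Because the order is chosen per processing block, each call above is independent of the other, which mirrors the statement that the determination order may differ from one processing block to another.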

The image decoding apparatus 200 may decode the determined at least one reference encoding unit according to an embodiment. The image decoding apparatus 200 can decode an image based on the reference encoding unit determined through the above-described embodiment. The method of decoding the reference encoding unit may include various methods of decoding the image.

According to an exemplary embodiment, the image decoding apparatus 200 may obtain, from a bitstream, block type information indicating the type of a current encoding unit or division type information indicating a method of dividing the current encoding unit. The block type information or the division type information may be included in a bitstream related to various data units. For example, the image decoding apparatus 200 may use the block type information or the division type information included in a sequence parameter set, a picture parameter set, a video parameter set, a slice header, or a slice segment header. Further, the image decoding apparatus 200 may obtain, from the bitstream, a syntax corresponding to the block type information or the division type information for each maximum coding unit, reference coding unit, and processing block.

Various embodiments have been described above. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the disclosed embodiments should be considered in an illustrative rather than a restrictive sense. The scope of the present invention is defined by the appended claims rather than by the foregoing description, and all differences within the scope of equivalents thereof should be construed as being included in the present invention.

The above-described embodiments of the present invention can be written as a program executable on a computer and can be implemented in a general-purpose digital computer that runs the program using a computer-readable recording medium. The computer-readable recording medium includes storage media such as magnetic storage media (e.g., ROM, floppy disk, hard disk, etc.) and optical reading media (e.g., CD ROM,

Claims

1. A method of decoding an image, the method comprising:

receiving a bitstream of an encoded image;

determining one or more blocks segmented from the encoded image;

generating prediction data for a current block among the one or more blocks by performing prediction based on a deep neural network (DNN) on the current block;

extracting residual data of the current block from the bitstream; and

reconstructing the current block using the prediction data and the residual data.
2. The method of claim 1,

wherein the DNN is a network trained to generate the prediction data so as to minimize an error with respect to original data of the current block.
3. The method of claim 1,

wherein the generating of the prediction data comprises:

generating first prediction data by performing a first prediction based on the DNN on the current block;

generating second prediction data by performing a second prediction based on the DNN on the first prediction data; and

generating the prediction data using the first prediction data and the second prediction data.
4. The method of claim 3,

wherein the first prediction is performed based on a recurrent neural network (RNN), and the second prediction is performed based on a convolutional neural network (CNN).
5. The method of claim 4,

wherein the RNN is a network trained to generate the first prediction data so as to minimize an error with respect to original data of the current block.
6. The method of claim 4,

wherein the CNN is a network trained to generate the second prediction data so as to minimize an error with respect to a value obtained by subtracting the first prediction data from original data of the current block.
7. The method of claim 4,

wherein the generating of the first prediction data comprises:

inputting neighboring blocks adjacent to the current block into cells of a long short-term memory (LSTM) network in the RNN, time step by time step, according to a predetermined direction;

generating a cell output for each time step using a plurality of gates in each cell; and

processing the cell output via a fully connected network of the RNN.
8. The method of claim 7,

wherein the inputting comprises:

determining one or more input angles based on the current block;

determining, for each input angle, a neighboring block located along that input angle; and

inputting the neighboring blocks for the respective input angles to the cells of the LSTM network in a clockwise order.
9. The method of claim 8,

wherein, when there are a plurality of neighboring blocks for an input angle, the input order among the neighboring blocks located at the same input angle is from a position farther from the current block to a position closer to the current block.
10. The method of claim 7,

wherein the inputting comprises:

inputting the neighboring blocks to the cells of the LSTM network in Z-scan order.
11. The method of claim 4,

wherein the generating of the second prediction data comprises:

inputting the first prediction data and neighboring reconstructed data adjacent to the current block into a convolutional layer of the CNN; and

performing a convolution operation using a plurality of filters of the convolutional layer.
12. The method of claim 1,

wherein the generating of the prediction data comprises:

determining one or more reference pictures and one or more reference block positions to which the current block refers; and

generating the prediction data using the one or more reference pictures and the one or more reference block positions.
13. The method of claim 1,

wherein information on the structure of the DNN and information on a block on which prediction based on the DNN is performed are obtained from at least one of a video parameter set, a sequence parameter set, and a picture parameter set of the bitstream.
14. An image decoding apparatus comprising:

a receiver configured to receive a bitstream of an encoded image;

a block determining unit configured to determine one or more blocks segmented from the encoded image;

a prediction unit configured to generate prediction data for a current block among the one or more blocks by performing prediction based on a deep neural network (DNN) on the current block; and

a reconstruction unit configured to extract residual data of the current block from the bitstream and to reconstruct the current block using the prediction data and the residual data.
15. A method of encoding an image, the method comprising:

determining one or more blocks into which an image is divided;

generating prediction data for a current block among the one or more blocks by performing prediction based on a deep neural network (DNN) on the current block;

generating residual data of the current block using original data corresponding to the current block and the prediction data; and

generating a bitstream by encoding the residual data.