WO2019004664A1

WO2019004664A1 - Video signal processing method and device

Info

Publication number: WO2019004664A1
Application number: PCT/KR2018/007098
Authority: WO
Inventors: 이배근
Original assignee: 주식회사 케이티
Priority date: 2017-06-26
Filing date: 2018-06-22
Publication date: 2019-01-03
Also published as: KR20190001548A

Abstract

An image coding method, according to the present invention, comprises the steps of: generating a 360-degree projected image comprising a plurality of faces by means of the projective transformation of a three-dimensional 360-degree image on a two-dimensional plane; adding a padding region to a border on at least one side of the current face among the plurality of faces; and coding information associated with the padding of the current face, wherein the padding region is generated on the basis of a sample included in at least one portion of a face that is not neighboring the current face in the 360-degree projected image.

Description

Method and apparatus for video signal processing

The present invention relates to a video signal processing method and apparatus.

Recently, demand for high-resolution, high-quality images such as high definition (HD) and ultra high definition (UHD) images has been increasing in various applications. As image data becomes higher in resolution and quality, the amount of data increases relative to existing image data. Therefore, when image data is transmitted over a medium such as a wired or wireless broadband line, or stored on an existing storage medium, transmission and storage costs increase. High-efficiency image compression techniques can be used to solve these problems, which arise as image data becomes high-resolution and high-quality.

Image compression techniques include an inter-picture prediction technique that predicts pixel values in the current picture from a picture preceding or following the current picture, an intra-picture prediction technique that predicts pixel values in the current picture using pixel information within the current picture, and an entropy encoding technique that assigns short codes to values with a high frequency of appearance and long codes to values with a low frequency of appearance. Image data can be effectively compressed and then transmitted or stored using such image compression techniques.

On the other hand, demand for high-resolution images is increasing, and demand for stereoscopic image content as a new image service is also increasing. Video compression techniques are being discussed to effectively provide high resolution and ultra-high resolution stereoscopic content.

It is an object of the present invention to provide a method and an apparatus for two-dimensionally projecting and converting a 360 degree image.

It is an object of the present invention to provide a method for adding a padding area to a boundary or face boundary of a 360 degree image.

It is an object of the present invention to provide a method of performing padding using a neighboring face that neighbors a current face in a three-dimensional space.

The technical objects to be achieved by the present invention are not limited to those mentioned above, and other technical objects not mentioned will be clearly understood from the following description by those skilled in the art to which the present invention belongs.

A method of encoding an image according to the present invention includes the steps of: generating a 360-degree projection image including a plurality of faces by projectively transforming a three-dimensional 360-degree image onto a two-dimensional plane; adding a padding area to at least one border of a current face among the plurality of faces; and encoding padding-related information of the current face. Here, the padding area may be generated based on a sample included in at least a part of a face that is not adjacent to the current face in the 360-degree projection image.

According to another aspect of the present invention, there is provided a method of decoding an image, the method comprising: decoding padding-related information of a current face; decoding a padding area on at least one border of the current face based on the padding-related information; and generating a 360-degree image by back-projecting the 360-degree projection image including the current face onto the three-dimensional space. In this case, the padding area may be generated based on a sample included in at least a portion of a face that is not adjacent to the current face in the 360-degree projection image.

In the image encoding / decoding method according to the present invention, when a neighboring face that neighbors the current face in the 360-degree projection image is not adjacent to the current face in the 360-degree image, the padding region may be added to the boundary of the current face that adjoins that neighboring face.

In the image encoding / decoding method according to the present invention, the padding region may be generated by copying a portion of a neighboring face that is not adjacent to the current face in the 360-degree projection image but neighbors the current face in the 360-degree image.

In the image encoding / decoding method according to the present invention, the padding region may be generated based on an average or weighted operation of a sample included in the current face and a sample included in a neighboring face that is not adjacent to the current face in the 360-degree projection image but neighbors the current face in the 360-degree image.
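The weighted operation mentioned above can be illustrated with a minimal sketch. The text here does not specify the weights, so the linear distance-based weighting, the function name `blend_padding`, and the data layout are all assumptions made for illustration only.

```python
def blend_padding(current_edge, neighbor_rows):
    """Blend boundary samples of the current face with samples of its 3D neighbor.

    current_edge  : samples on the boundary row of the current face
    neighbor_rows : rows copied from the face that neighbors the current face
                    in the 360-degree image, nearest the shared edge first
    Returns the padding rows, nearest the current face first.
    """
    depth = len(neighbor_rows)
    padding = []
    for distance, row in enumerate(neighbor_rows, start=1):
        # Assumed weighting: the neighbor's weight grows linearly with distance.
        w = distance / (depth + 1)
        padding.append([int((1 - w) * c + w * n + 0.5)
                        for c, n in zip(current_edge, row)])
    return padding


# Hypothetical sample values, chosen only to make the blend easy to follow.
padding = blend_padding([100, 100], [[40, 160], [40, 160], [40, 160]])
```

With three padding rows, the row closest to the current face takes 75% of its value from the current face, and the farthest row takes 75% from the neighboring face.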

In the image encoding / decoding method according to the present invention, the shape of the padding region may be determined based on the shape of a neighboring face that is not adjacent to the current face in the 360-degree projection image but is adjacent to the current face in the 360-degree image.

In the image encoding / decoding method according to the present invention, when the padding region has a non-rectangular shape, the value of a sample in the active region to which the padding region is added can be determined by the depth.

In the image encoding / decoding method according to the present invention, the padding region may include a vertical padding region adjoining an upper or lower boundary of the current face and a horizontal padding region adjoining a left or right boundary of the current face, and the length of the vertical padding region and the length of the horizontal padding region may be different.

In the image encoding / decoding method according to the present invention, the padding-related information may include at least one of information indicating whether the padding area exists, information indicating a position of the padding area, or information indicating a length of the padding area.

The features briefly summarized above for the present invention are only illustrative aspects of the detailed description of the invention which are described below and do not limit the scope of the invention.

According to the present invention, there is an advantage that the encoding / decoding efficiency can be improved by projectively transforming the 360 degree image into two dimensions.

According to the present invention, there is an advantage that a coding / decoding efficiency can be improved by adding a padding area to a border or face boundary of a 360-degree image.

According to the present invention, padding is performed using a neighboring face that neighbors the current face in a three-dimensional space, thereby preventing degradation of image quality.

The effects obtained by the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is a block diagram illustrating an image encoding apparatus according to an embodiment of the present invention.

FIG. 2 is a block diagram illustrating an image decoding apparatus according to an embodiment of the present invention.

FIG. 3 is a diagram illustrating partition modes that can be applied to a coding block when the coding block is coded by inter-picture prediction.

FIGS. 4 to 6 are views illustrating a camera apparatus for generating a panoramic image.

FIG. 7 is a block diagram of a 360-degree video data generation apparatus and a 360-degree video play apparatus.

FIG. 8 is a flowchart showing the operation of a 360-degree video data generation apparatus and a 360-degree video play apparatus.

FIG. 9 shows a 2D projection method using the equirectangular projection (ERP) technique.

FIG. 10 shows a 2D projection method using the cube map projection (CMP) technique.

FIG. 11 shows a 2D projection method using the icosahedral projection (ISP) technique.

FIG. 12 shows a 2D projection method using the octahedron projection (OHP) technique.

FIG. 13 shows a 2D projection method using the truncated pyramid projection (TPP) technique.

FIG. 14 shows a 2D projection method using the segmented sphere projection (SSP) technique.

FIG. 15 is a diagram illustrating the conversion between face 2D coordinates and three-dimensional coordinates.

FIG. 16 is a diagram for explaining an example in which padding is performed in an ERP projection image.

FIG. 17 is a view for explaining an example in which the lengths of the padding regions in the horizontal direction and the vertical direction are set differently in an ERP projection image.

FIG. 18 is a diagram showing an example in which padding is performed at the boundary of a face.

FIG. 19 is a diagram showing an example of determining a sample value of a padding area between faces.

FIG. 20 is a view illustrating a CMP-based 360-degree projection image.

FIG. 21 is a diagram showing an example in which a plurality of data is included in one face.

FIG. 22 is a diagram showing a 360-degree projection image in which each face is configured to include a plurality of faces.

FIGS. 23 and 24 are views showing a 360-degree projection image based on the TPP technique in which face overlap padding is performed.

FIG. 25 is a diagram showing a 360-degree projection image based on the OHP technique considering continuity of images.

FIG. 26 is a diagram illustrating a 360-degree projection image based on the OHP technique in which face overlap padding is performed.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the invention is not intended to be limited to the particular embodiments, but includes all modifications, equivalents, and alternatives falling within the spirit and scope of the invention. Like reference numerals are used for like elements in describing each drawing.

The terms first, second, etc. may be used to describe various components, but the components should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another. For example, without departing from the scope of the present invention, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component. The term "and/or" includes any combination of a plurality of related listed items or any one of a plurality of related listed items.

It is to be understood that when an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or other elements may be present in between. On the other hand, when an element is referred to as being "directly connected" or "directly coupled" to another element, it should be understood that there are no other elements in between.

The terminology used in this application is used only to describe a specific embodiment and is not intended to limit the invention. The singular expressions include plural expressions unless the context clearly dictates otherwise. In the present application, the terms "comprises" or "having" and the like are used to specify that there is a feature, a number, a step, an operation, an element, a component or a combination thereof described in the specification, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, or combinations thereof.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Hereinafter, the same reference numerals will be used for the same constituent elements in the drawings, and redundant explanations for the same constituent elements will be omitted.

Referring to FIG. 1, the image encoding apparatus 100 may include a picture division unit 110, prediction units 120 and 125, a transform unit 130, a quantization unit 135, a reordering unit 160, an entropy encoding unit 165, an inverse quantization unit 140, an inverse transform unit 145, a filter unit 150, and a memory 155.

Each of the components shown in FIG. 1 is shown independently to represent a distinct characteristic function in the image encoding apparatus; this does not mean that each component is implemented as separate hardware or as a single software unit. That is, each component is listed as a separate component for convenience of explanation, and at least two components may be combined into one component, or one component may be divided into a plurality of components to perform a function. Integrated and separate embodiments of the components are also included within the scope of the present invention, as long as they do not depart from the essence of the present invention.

In addition, some of the components may not be essential components that perform essential functions of the present invention, but optional components that merely improve performance. The present invention can be implemented with only the components essential for realizing its essence, excluding the components used merely for performance improvement, and a structure that includes only the essential components, excluding the optional components used for performance improvement, is also included in the scope of the present invention.

The picture division unit 110 may divide the input picture into at least one processing unit. Here, the processing unit may be a prediction unit (PU), a transform unit (TU), or a coding unit (CU). The picture division unit 110 may divide one picture into combinations of a plurality of coding units, prediction units, and transform units, and may select one combination of coding unit, prediction unit, and transform unit with which to encode the picture.

For example, one picture may be divided into a plurality of coding units. To divide a picture into coding units, a recursive tree structure such as a quad tree structure can be used. A coding unit that is split into other coding units, with one picture or a largest coding unit as the root, is split with as many child nodes as the number of split coding units. A coding unit that is no longer split under certain constraints becomes a leaf node. That is, assuming that only square splitting is possible for one coding unit, one coding unit can be split into at most four other coding units.
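The recursive quad tree splitting described above can be sketched as follows. This is an illustrative sketch only: the function `quadtree_split`, the `should_split` callback, and the example split rule are hypothetical names and choices, not part of the described apparatus, which would typically make the split decision from an encoder-side cost measure.

```python
def quadtree_split(x, y, size, min_size, should_split):
    """Return the leaf coding units of a quad tree as (x, y, size) tuples."""
    # A unit that has reached the minimum size, or that the rule declines
    # to split, becomes a leaf node.
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    # Square splitting only: each unit has exactly four children.
    for dy in (0, half):
        for dx in (0, half):
            leaves.extend(
                quadtree_split(x + dx, y + dy, half, min_size, should_split))
    return leaves


def split_top_left(x, y, size):
    """Hypothetical split rule: keep splitting only the unit at the origin."""
    return x == 0 and y == 0


# A 64x64 largest coding unit is the root; 8x8 is the assumed minimum size.
leaves = quadtree_split(0, 0, 64, 8, split_top_left)
```

In this example the root is split three times along its top-left corner, giving four 8x8 leaves, three 16x16 leaves, and three 32x32 leaves that together tile the 64x64 root.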

Hereinafter, in the embodiment of the present invention, a coding unit may be used as a unit for performing coding, or may be used as a unit for performing decoding.

A prediction unit may be obtained by dividing one coding unit into at least one square or rectangle of the same size, or may be divided such that one prediction unit among the prediction units in one coding unit has a shape and/or size different from another prediction unit.

When a prediction unit on which intra prediction is performed is generated based on a coding unit, and the coding unit is not the minimum coding unit, intra prediction can be performed without dividing the coding unit into a plurality of NxN prediction units.

The prediction units 120 and 125 may include an inter prediction unit 120 for performing inter prediction and an intra prediction unit 125 for performing intra prediction. Whether to use inter prediction or intra prediction for a prediction unit may be determined, and specific information (e.g., intra prediction mode, motion vector, reference picture) may be determined according to each prediction method. Here, the processing unit in which prediction is performed may differ from the processing unit in which the prediction method and its specific details are determined. For example, the prediction method and the prediction mode may be determined per prediction unit, while the prediction itself may be performed per transform unit. The residual value (residual block) between the generated prediction block and the original block can be input to the transform unit 130. In addition, the prediction mode information, motion vector information, and the like used for prediction can be encoded by the entropy encoding unit 165 together with the residual value and transmitted to the decoder. When a particular encoding mode is used, it is also possible to encode the original block directly and transmit it to the decoder without generating a prediction block through the prediction units 120 and 125.

The inter prediction unit 120 may predict a prediction unit based on information of at least one of a picture preceding or a picture following the current picture and, in some cases, may predict a prediction unit based on information of a partial region of the current picture that has already been encoded. The inter prediction unit 120 may include a reference picture interpolation unit, a motion prediction unit, and a motion compensation unit.

The reference picture interpolation unit receives reference picture information from the memory 155 and can generate pixel information at sub-integer pixel positions in the reference picture. For luminance pixels, a DCT-based interpolation filter with varying filter coefficients may be used to generate sub-integer pixel information in units of 1/4 pixel. For chrominance signals, a DCT-based 4-tap interpolation filter with varying filter coefficients may be used to generate sub-integer pixel information in units of 1/8 pixel.
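As a rough illustration of sub-integer pixel generation, the sketch below computes half-sample values along one row of luminance samples. The filter taps are an assumption: they are the 8-tap half-sample luma coefficients used in HEVC, which this description does not itself specify, and the function name `interpolate_half_pel` and the edge handling are likewise hypothetical.

```python
# Assumption: HEVC half-sample luma filter taps (they sum to 64).
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def interpolate_half_pel(row, bit_depth=8):
    """Return the half-sample value between row[i] and row[i + 1] for every i."""
    max_val = (1 << bit_depth) - 1
    last = len(row) - 1
    out = []
    for i in range(last):
        acc = 0
        for k, tap in enumerate(HALF_PEL_TAPS):
            # The taps cover positions i-3 .. i+4; clamping the index
            # replicates the edge samples outside the row.
            pos = min(max(i - 3 + k, 0), last)
            acc += tap * row[pos]
        value = (acc + 32) >> 6          # divide by 64 with rounding
        out.append(min(max(value, 0), max_val))
    return out


flat = interpolate_half_pel([100] * 8)             # constant signal
ramp = interpolate_half_pel(list(range(0, 80, 10)))  # linear ramp 0, 10, ..., 70
```

A constant row stays constant, and in the interior of the ramp the half-sample value between 30 and 40 comes out as 35, which is what a well-behaved interpolation filter should produce.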

The motion prediction unit may perform motion prediction based on the reference picture interpolated by the reference picture interpolation unit. Various methods such as Full Search-based Block Matching Algorithm (FBMA), Three Step Search (TSS), and New Three-Step Search Algorithm (NTS) can be used to calculate motion vectors. The motion vector may have a value in units of 1/2 or 1/4 pixel based on the interpolated pixels. The motion prediction unit can predict the current prediction unit using different motion prediction methods. Various methods such as a skip method, a merge method, an AMVP (Advanced Motion Vector Prediction) method, and an intra block copy method can be used as the motion prediction method.
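A full-search block matching step of the kind named above can be sketched as follows. This is a minimal integer-pixel sketch under stated assumptions: the matching cost is the sum of absolute differences (SAD), which the text does not prescribe, and the names `sad` and `full_search` and the synthetic pictures are hypothetical.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, bx, by, n, search_range):
    """Test every integer displacement in the window; return (cost, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic data: the current picture is the reference shifted by (1, 2).
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 2, 7)][min(x + 1, 7)] for x in range(8)] for y in range(8)]
best = full_search(cur, ref, bx=2, by=2, n=2, search_range=2)
```

The search recovers the displacement (dx, dy) = (1, 2) with zero cost. Faster methods such as TSS test only a subset of these candidates.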

The intra prediction unit 125 can generate a prediction unit based on reference pixel information around the current block, which is pixel information in the current picture. When a neighboring block of the current prediction unit is a block on which inter prediction was performed, so that a reference pixel is a pixel obtained by inter prediction, the reference pixel included in the inter-predicted block may be replaced with reference pixel information of a neighboring block on which intra prediction was performed. That is, when a reference pixel is not available, the unavailable reference pixel information may be replaced by at least one of the available reference pixels.
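The substitution of unavailable reference pixels can be sketched as below. The specific rule used here (take the nearest preceding available sample, and fall back to a mid-range value when nothing is available) is an assumption for illustration; the text only requires that an unavailable pixel be replaced by at least one available pixel. The name `substitute_reference_pixels` is hypothetical.

```python
def substitute_reference_pixels(ref, bit_depth=8):
    """Fill unavailable reference samples (marked None) from available ones."""
    available = [v for v in ref if v is not None]
    if not available:
        # Assumed fallback when no reference sample is available at all.
        return [1 << (bit_depth - 1)] * len(ref)
    out = []
    last = available[0]        # used for unavailable samples at the start
    for v in ref:
        if v is not None:
            last = v
        out.append(last)       # repeat the most recent available sample
    return out


filled = substitute_reference_pixels([None, None, 50, None, 70, None])
```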

In intra prediction, the prediction mode may have a directional prediction mode in which reference pixel information is used according to a prediction direction, and a non-directional mode in which direction information is not used in prediction. The mode for predicting the luminance information may be different from the mode for predicting the chrominance information and the intra prediction mode information or predicted luminance signal information used for predicting the luminance information may be utilized to predict the chrominance information.

When intra prediction is performed and the size of the prediction unit is the same as the size of the transform unit, intra prediction can be performed on the prediction unit based on the pixels on the left, upper-left, and top of the prediction unit. However, when the size of the prediction unit differs from the size of the transform unit, intra prediction can be performed using reference pixels based on the transform unit. Intra prediction using NxN partitioning may also be used only for the minimum coding unit.

The intra prediction method can generate a prediction block after applying an AIS (Adaptive Intra Smoothing) filter to the reference pixels according to the prediction mode. The type of AIS filter applied to the reference pixels may vary. To perform the intra prediction method, the intra prediction mode of the current prediction unit can be predicted from the intra prediction modes of the prediction units neighboring the current prediction unit. When the prediction mode of the current prediction unit is predicted using mode information predicted from a neighboring prediction unit, if the intra prediction mode of the current prediction unit is the same as that of the neighboring prediction unit, information indicating that the two prediction modes are the same may be transmitted using predetermined flag information; if the prediction mode of the current prediction unit differs from that of the neighboring prediction unit, the prediction mode information of the current block can be encoded by entropy encoding.

In addition, a residual block may be generated that contains residual value information, that is, the difference between the prediction unit generated by the prediction units 120 and 125 and the original block of that prediction unit. The generated residual block may be input to the transform unit 130.

The transform unit 130 can transform the residual block, which contains the residual value information between the original block and the prediction unit generated by the prediction units 120 and 125, using a transform method such as DCT (Discrete Cosine Transform), DST (Discrete Sine Transform), or KLT. Whether to apply DCT, DST, or KLT to transform the residual block may be determined based on the intra prediction mode information of the prediction unit used to generate the residual block.
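To make the transform step concrete, the sketch below applies a separable two-dimensional DCT-II to a small residual block. It is a floating-point illustration only; practical codecs use integer approximations, and the names `dct_1d` and `dct_2d` are hypothetical.

```python
import math


def dct_1d(v):
    """Orthonormal DCT-II of a sequence."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(
            v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out


def dct_2d(block):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]   # back to [vertical][horizontal]


# A flat residual block: all of its energy should land in the DC coefficient.
coeffs = dct_2d([[10] * 4 for _ in range(4)])
```

For this flat 4x4 block the DC coefficient is 40 and every other coefficient is zero up to floating-point error, which is why smooth residuals compress well after the transform.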

The quantization unit 135 may quantize the values transformed into the frequency domain by the transform unit 130. The quantization coefficient may vary depending on the block or the importance of the image. The values calculated by the quantization unit 135 may be provided to the inverse quantization unit 140 and the reordering unit 160.
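A uniform scalar quantizer gives a simple picture of this step and of the matching inverse quantization. The mapping from a quantization parameter (QP) to a step size is an assumption here: it follows the convention of H.264 and HEVC, in which the step size doubles every 6 QP values and equals 1 at QP 4. The function names are hypothetical.

```python
import math


def q_step(qp):
    """Assumed QP-to-step mapping: the step doubles for every increase of 6."""
    return 2 ** ((qp - 4) / 6)


def quantize(coeffs, qp):
    """Map transform coefficients to integer levels (round half away from zero)."""
    step = q_step(qp)
    return [int(math.copysign(math.floor(abs(c) / step + 0.5), c))
            for c in coeffs]


def dequantize(levels, qp):
    """Inverse quantization: scale the levels back by the step size."""
    step = q_step(qp)
    return [lv * step for lv in levels]


levels = quantize([100, -37, 3, 0], qp=22)   # step size is 8 at QP 22
recon = dequantize(levels, qp=22)
```

The reconstructed values differ from the inputs by at most half a step, and small coefficients such as 3 are quantized to zero, which is where most of the bit-rate saving comes from.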

The reordering unit 160 can reorder the coefficient values with respect to the quantized residual values.

The reordering unit 160 may change coefficients in two-dimensional block form into one-dimensional vector form through a coefficient scanning method. For example, the reordering unit 160 may scan from the DC coefficient to coefficients in the high-frequency region using a zig-zag scan and change them into one-dimensional vector form. Depending on the size of the transform unit and the intra prediction mode, a vertical scan that scans the two-dimensional block coefficients in the column direction, or a horizontal scan that scans them in the row direction, may be used instead of the zig-zag scan. That is, which of the zig-zag scan, the vertical scan, and the horizontal scan is used may be determined according to the size of the transform unit and the intra prediction mode.
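The three scan orders can be sketched as follows. The function names `zigzag_order` and `scan` are hypothetical, and the rule for choosing a scan from the transform unit size and intra prediction mode is left out because the text does not give it.

```python
def zigzag_order(n):
    """(row, col) positions of an n x n block in zig-zag order, DC first."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk the anti-diagonals in turn, alternating direction on each one.
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def scan(block, method="zigzag"):
    """Flatten a square 2D coefficient block into a 1D list."""
    n = len(block)
    if method == "zigzag":
        order = zigzag_order(n)
    elif method == "vertical":      # column by column
        order = [(r, c) for c in range(n) for r in range(n)]
    elif method == "horizontal":    # row by row
        order = [(r, c) for r in range(n) for c in range(n)]
    else:
        raise ValueError(f"unknown scan method: {method}")
    return [block[r][c] for r, c in order]


block = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
zigzag = scan(block, "zigzag")
vertical = scan(block, "vertical")
horizontal = scan(block, "horizontal")
```

A decoder can undo any of these scans by writing the 1D values back to the same list of positions.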

The entropy encoding unit 165 may perform entropy encoding based on the values calculated by the reordering unit 160. For entropy encoding, various encoding methods such as Exponential Golomb, Context-Adaptive Variable Length Coding (CAVLC), and Context-Adaptive Binary Arithmetic Coding (CABAC) may be used.
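Of the methods listed, Exponential Golomb coding is the simplest to show in full. The sketch below implements the order-0 code for unsigned integers on bit strings; the function names are hypothetical, and CAVLC and CABAC are considerably more involved and not shown.

```python
def exp_golomb_encode(value):
    """Order-0 Exp-Golomb code word for an unsigned integer, as a bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    code = bin(value + 1)[2:]                 # binary form of value + 1
    return "0" * (len(code) - 1) + code       # prefix of len(code) - 1 zeros


def exp_golomb_decode(bits):
    """Decode one order-0 Exp-Golomb code word from the start of a bit string."""
    zeros = 0
    while bits[zeros] == "0":                 # count the leading zeros
        zeros += 1
    return int(bits[zeros:2 * zeros + 1], 2) - 1


codes = [exp_golomb_encode(v) for v in range(5)]
```

Small values receive short code words (0 is coded as a single bit), which matches the principle of assigning short codes to frequently occurring values.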

The entropy encoding unit 165 may encode various information received from the reordering unit 160 and the prediction units 120 and 125, such as residual coefficient information and block type information of the coding unit, prediction mode information, partition unit information, prediction unit information and transmission unit information, motion vector information, reference frame information, block interpolation information, and filtering information.

The entropy encoding unit 165 can entropy-encode the coefficient values of the coding unit input from the reordering unit 160.

The inverse quantization unit 140 and the inverse transform unit 145 inversely quantize the values quantized by the quantization unit 135 and inversely transform the values transformed by the transform unit 130. The residual value generated by the inverse quantization unit 140 and the inverse transform unit 145 may be combined with the prediction unit predicted through the motion estimation unit, the motion compensation unit, and the intra prediction unit included in the prediction units 120 and 125 to generate a reconstructed block.

The filter unit 150 may include at least one of a deblocking filter, an offset correction unit, and an adaptive loop filter (ALF).

The deblocking filter can remove block distortion caused by the boundaries between blocks in the reconstructed picture. Whether to apply the deblocking filter to the current block may be determined based on the pixels included in several columns or rows of the block. When the deblocking filter is applied to a block, a strong filter or a weak filter may be applied according to the required deblocking filtering strength. In addition, when vertical filtering and horizontal filtering are both performed in applying the deblocking filter, they may be processed in parallel.

The offset correction unit may correct, in units of pixels, the offset of the deblocked image relative to the original image. To perform offset correction for a specific picture, a method may be used in which the pixels included in the image are divided into a predetermined number of areas, the area to which an offset is to be applied is determined, and the offset is applied to that area. Alternatively, a method of applying an offset in consideration of the edge information of each pixel may be used.
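The area-based offset method can be sketched as below. How the areas are formed is not stated in the text, so this sketch assumes they are intensity bands of equal width (in the manner of the band offset in HEVC sample adaptive offset); the function name `apply_band_offset` and the offset values are hypothetical.

```python
def apply_band_offset(pixels, offsets, num_bands=32, bit_depth=8):
    """Add a per-band offset to each pixel and clip to the valid range.

    offsets maps a band index to the offset for that band; bands that are
    not listed are left unchanged.
    """
    max_val = (1 << bit_depth) - 1
    band_width = (max_val + 1) // num_bands    # 8 intensity levels per band here
    out = []
    for p in pixels:
        band = p // band_width                 # assumed area = intensity band
        corrected = p + offsets.get(band, 0)
        out.append(min(max(corrected, 0), max_val))
    return out


# Hypothetical offsets for bands 0, 1, and 31.
corrected = apply_band_offset([0, 7, 8, 15, 255], {0: 2, 1: -1, 31: 5})
```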

Adaptive Loop Filtering (ALF) can be performed based on a comparison between the filtered reconstructed image and the original image. After dividing the pixels included in the image into a predetermined group, one filter to be applied to the group may be determined and different filtering may be performed for each group. The information related to whether to apply the ALF may be transmitted for each coding unit (CU), and the shape and the filter coefficient of the ALF filter to be applied may be changed according to each block. Also, an ALF filter of the same type (fixed form) may be applied irrespective of the characteristics of the application target block.

The memory 155 may store the reconstructed block or picture calculated through the filter unit 150, and the stored reconstructed block or picture may be provided to the prediction units 120 and 125 when inter prediction is performed.

Referring to FIG. 2, the image decoder 200 may include an entropy decoding unit 210, a reordering unit 215, an inverse quantization unit 220, an inverse transform unit 225, prediction units 230 and 235, a filter unit 240, and a memory 245.

When an image bitstream is input from the image encoder, the input bitstream may be decoded in a procedure opposite to that of the image encoder.

The entropy decoding unit 210 can perform entropy decoding in a procedure opposite to that in which entropy encoding is performed in the entropy encoding unit of the image encoder. For example, various methods such as Exponential Golomb, Context-Adaptive Variable Length Coding (CAVLC), and Context-Adaptive Binary Arithmetic Coding (CABAC) may be applied in accordance with the method performed by the image encoder.

The entropy decoding unit 210 may decode information related to intra prediction and inter prediction performed in the encoder.

The reordering unit 215 can reorder the bitstream entropy-decoded by the entropy decoding unit 210 based on the reordering method used by the encoder. Coefficients expressed in one-dimensional vector form can be restored to coefficients in two-dimensional block form and rearranged. The reordering unit 215 can receive information related to the coefficient scanning performed by the encoder and perform reordering by inverse scanning based on the scanning order used by the encoder.

The inverse quantization unit 220 can perform inverse quantization based on the quantization parameters provided by the encoder and the coefficient values of the re-arranged blocks.

The inverse transform unit 225 may perform, on the quantization result produced by the image encoder, an inverse transform (inverse DCT, inverse DST, or inverse KLT) corresponding to the transform (DCT, DST, or KLT) performed by the transform unit. The inverse transform can be performed based on the transmission unit determined by the image encoder. In the inverse transform unit 225 of the image decoder, a transform technique (e.g., DCT, DST, KLT) may be selectively performed according to a plurality of pieces of information such as the prediction method, the size of the current block, and the prediction direction.

The prediction units 230 and 235 can generate a prediction block based on the prediction block generation information provided by the entropy decoding unit 210 and the previously decoded block or picture information provided by the memory 245.

As described above, when intra prediction is performed in the same manner as in the image encoder and the size of the prediction unit is the same as the size of the transform unit, intra prediction is performed on the prediction unit based on the pixels on the left, upper-left, and top of the prediction unit. However, when the size of the prediction unit differs from the size of the transform unit, intra prediction is performed using reference pixels based on the transform unit. Intra prediction using NxN partitioning may also be used only for the minimum coding unit.

The prediction units 230 and 235 may include a prediction unit determination unit, an inter prediction unit, and an intra prediction unit. The prediction unit determination unit receives various information input from the entropy decoding unit 210, such as prediction unit information, prediction mode information of the intra prediction method, and motion prediction information of the inter prediction method, identifies the prediction unit in the current coding unit, and determines whether the prediction unit performs inter prediction or intra prediction. The inter prediction unit 230 can perform inter prediction on the current prediction unit, using the information required for inter prediction of the current prediction unit provided by the image encoder, based on information included in at least one of the picture preceding or the picture following the current picture that contains the current prediction unit. Alternatively, inter prediction may be performed based on information of a partial region that has already been reconstructed in the current picture containing the current prediction unit.

To perform inter prediction, it may be determined, on a coding-unit basis, which of skip mode, merge mode, AMVP mode, and intra block copy mode is the motion prediction method of the prediction unit included in the coding unit.

The intra prediction unit 235 can generate a prediction block based on pixel information in the current picture. If the prediction unit is one on which intra prediction is performed, intra prediction can be performed based on the intra prediction mode information of the prediction unit provided by the image encoder. The intra prediction unit 235 may include an AIS (Adaptive Intra Smoothing) filter, a reference pixel interpolator, and a DC filter. The AIS filter performs filtering on the reference pixels of the current block, and whether to apply the filter can be determined according to the prediction mode of the current prediction unit. AIS filtering can be performed on the reference pixels of the current block using the prediction mode of the prediction unit and the AIS filter information provided by the image encoder. When the prediction mode of the current block is a mode in which AIS filtering is not performed, the AIS filter may not be applied.

When the prediction mode of the prediction unit is a mode that performs intra prediction based on pixel values obtained by interpolating reference pixels, the reference pixel interpolator may interpolate the reference pixels to generate reference pixels in units of an integer pixel or less. When the prediction mode of the current prediction unit is a mode that generates the prediction block without interpolating the reference pixels, the reference pixels may not be interpolated. The DC filter can generate a prediction block through filtering when the prediction mode of the current block is the DC mode.

The reconstructed block or picture may be provided to the filter unit 240. The filter unit 240 may include a deblocking filter, an offset correction unit, and an adaptive loop filter (ALF).

Information on whether a deblocking filter has been applied to the corresponding block or picture and, when a deblocking filter has been applied, information on whether a strong filter or a weak filter was applied can be provided from the image encoder. The deblocking filter of the image decoder receives the deblocking filter related information provided by the image encoder, and the image decoder can perform deblocking filtering on the corresponding block.

The offset correction unit may perform offset correction on the reconstructed image based on the type of offset correction applied to the image and the offset value information during encoding.

The ALF can be applied to a coding unit on the basis of ALF application information and ALF coefficient information provided from the encoder. Such ALF information may be provided in a specific parameter set.

The memory 245 may store the reconstructed picture or block to be used as a reference picture or a reference block, and may also provide the reconstructed picture to the output unit.

As described above, in the embodiments of the present invention, the term coding unit is used to denote a unit of encoding for convenience of explanation, but it may be a unit that performs decoding as well as encoding.

The current block indicates a block to be encoded / decoded. Depending on the encoding / decoding step, the current block may represent a coding tree block (or coding tree unit), a coding block (or coding unit), a transform block (or transform unit), a prediction block (or prediction unit), and the like. In this specification, 'unit' represents a basic unit for performing a specific encoding / decoding process, and 'block' may represent a sample array of a predetermined size. Unless otherwise indicated, the terms 'block' and 'unit' may be used interchangeably. For example, in the embodiments described below, the coding block and the coding unit can be understood to have mutually equivalent meanings.

One picture may be divided into a square block or a non-square basic block and then encoded / decoded. At this time, the basic block may be referred to as a coding tree unit. The coding tree unit may be defined as a coding unit of the largest size allowed in a sequence or a slice. Information regarding whether the coding tree unit is square or non-square or about the size of the coding tree unit can be signaled through a sequence parameter set, a picture parameter set, or a slice header. The coding tree unit can be divided into smaller size partitions. In this case, if the partition generated by dividing the coding tree unit is depth 1, the partition created by dividing the partition having depth 1 can be defined as depth 2. That is, the partition created by dividing the partition having the depth k in the coding tree unit can be defined as having the depth k + 1.

A partition of arbitrary size generated as the coding tree unit is divided can be defined as a coding unit. The coding unit may be recursively divided, or divided into basic units for performing prediction, quantization, transform, in-loop filtering, and the like. In one example, a partition of arbitrary size generated as a coding unit is divided may be defined as a coding unit, or may be defined as a transform unit or a prediction unit, which is a basic unit for performing prediction, quantization, transform, in-loop filtering, and the like.

Alternatively, once a coding block is determined, a prediction block having the same size as the coding block or a smaller size than the coding block can be determined through predictive partitioning of the coding block. Predictive partitioning of the coding block can be performed according to a partition mode (Part_mode) indicating the partition type of the coding block. The size or shape of the prediction block may be determined according to the partition mode of the coding block. The partition type of the coding block can be determined through information specifying any one of the partition candidates. At this time, the partition candidates available to the coding block may include an asymmetric partition type (for example, nLx2N, nRx2N, 2NxnU, 2NxnD) depending on the size, shape, coding mode, or the like of the coding block. In one example, the partition candidates available to the coding block may be determined according to the coding mode of the current block. For example, FIG. 3 illustrates partition modes that can be applied to a coding block when the coding block is coded by inter-picture prediction.

When the coding block is coded by inter-picture prediction, one of eight partition modes can be applied to the coding block, as in the example shown in FIG. 3.
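The eight partition modes can be illustrated with a short Python sketch that lists the prediction block sizes produced for a square coding block. This is only an illustration: the function name `partition_sizes`, the `PART_` prefix on every mode name, and the 1/4 : 3/4 split used for the asymmetric modes (the ratio used by HEVC) are assumptions and are not specified in this description.

```python
# Hypothetical helper: prediction block sizes per partition mode.
PART_MODES = (
    "PART_2Nx2N", "PART_2NxN", "PART_Nx2N", "PART_NxN",
    "PART_2NxnU", "PART_2NxnD", "PART_nLx2N", "PART_nRx2N",
)


def partition_sizes(part_mode, size):
    """Return the (width, height) of each prediction block of a size x size coding block."""
    if size % 4 != 0:
        raise ValueError("size must be a multiple of 4")
    half, quarter = size // 2, size // 4
    table = {
        "PART_2Nx2N": [(size, size)],
        "PART_2NxN": [(size, half)] * 2,
        "PART_Nx2N": [(half, size)] * 2,
        "PART_NxN": [(half, half)] * 4,
        # Asymmetric modes: a 1/4 : 3/4 split is assumed here.
        "PART_2NxnU": [(size, quarter), (size, size - quarter)],
        "PART_2NxnD": [(size, size - quarter), (size, quarter)],
        "PART_nLx2N": [(quarter, size), (size - quarter, size)],
        "PART_nRx2N": [(size - quarter, size), (quarter, size)],
    }
    return table[part_mode]
```

For any mode, the areas of the returned prediction blocks add up to the area of the coding block, which is a convenient sanity check when experimenting with other split ratios.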

On the other hand, when the coding block is coded by intra prediction, the partition mode PART_2Nx2N or PART_NxN can be applied to the coding block.

PART_NxN may be applied when the coding block has a minimum size. Here, the minimum size of the coding block may be one previously defined in the encoder and the decoder. Alternatively, information regarding the minimum size of the coding block may be signaled via the bitstream. In one example, the minimum size of the coding block is signaled through the slice header, so that the minimum size of the coding block per slice can be defined.

In another example, the partition candidates available to the coding block may be determined differently depending on at least one of the size or type of the coding block. In one example, the number or type of partition candidates available to the coding block may be differently determined according to at least one of the size or type of the coding block.

Alternatively, the type or number of asymmetric partition candidates among the partition candidates available to the coding block may be limited depending on the size or type of the coding block. In one example, the number or type of asymmetric partition candidates available to the coding block may be differently determined according to at least one of the size or type of the coding block.

In general, the size of the prediction block may range from 64x64 to 4x4. However, when the coding block is coded by inter-picture prediction, the prediction block can be prevented from having a 4x4 size in order to reduce the memory bandwidth required for motion compensation.

The field of view of a video captured by a camera is limited by the angle of view of the camera. In order to overcome this problem, it is possible to capture videos using a plurality of cameras and stitch the captured videos to form one video or one bit stream. For example, FIGS. 4 to 6 show examples in which a plurality of cameras are used to capture up and down, right and left, or front and back at the same time. A video generated by stitching a plurality of videos in this way can be referred to as a panoramic video. In particular, an image having a degree of freedom (Degree of Freedom) based on a predetermined center axis can be referred to as a 360-degree video. For example, the 360-degree video may be an image having rotational degrees of freedom for at least one of yaw, roll, and pitch.

The camera structure (or camera arrangement) for acquiring 360-degree video may have a circular arrangement, as in the example shown in Fig. 4, a one-dimensional vertical / horizontal arrangement, or a two-dimensional arrangement (i.e., a combination of vertical arrangement and horizontal arrangement), as in the example shown in Fig. 5 (b). Alternatively, as in the example shown in Fig. 6, a plurality of cameras may be mounted on a spherical device.

The embodiments described below will be described with reference to 360-degree video, but it will be within the technical scope of the present invention to apply the embodiments described below to panoramic video that is not 360-degree video.

FIG. 7 is a block diagram of a 360-degree video data generation apparatus and a 360-degree video play apparatus, and FIG. 8 is a flowchart illustrating operations of the 360-degree video data generation apparatus and the 360-degree video play apparatus.

Referring to FIG. 7, the 360-degree video data generation apparatus includes a projection unit 710, a frame packing unit 720, an encoding unit 730, and a transmission unit 740, and the 360-degree video play apparatus includes a file parsing unit 750, a decoding unit 760, a frame de-packing unit 770, and an inverse projection unit 780. The encoding unit and the decoding unit shown in FIG. 7 may correspond to the image encoding apparatus and the image decoding apparatus shown in FIG. 1 and FIG. 2, respectively.

The data generation apparatus can determine a projection transformation technique for a 360-degree image generated by stitching images captured by a plurality of cameras. In the projection unit 710, the 3D shape of the 360-degree video is determined according to the determined projection transformation technique, and the 360-degree video is projected onto the 2D plane according to the determined 3D shape (S801). Here, the projection transformation technique can represent the 3D shape of the 360-degree video and the manner in which the 360-degree video is unfolded on the 2D plane. Depending on the projection transformation technique, the 360-degree image can be approximated as having a shape such as a sphere, a cylinder, a cube, an octahedron, or an icosahedron in 3D space. An image generated by projecting a 360-degree video onto a 2D plane according to the projection transformation technique can be referred to as a 360-degree projection image.

The 360 degree projection image may be composed of at least one face according to the projection transformation technique. For example, when a 360-degree video is approximated as a polyhedron, each surface constituting the polyhedron can be defined as a face. Alternatively, a specific surface constituting the polyhedron may be divided into a plurality of regions, and each divided region may be configured to form a separate face. Alternatively, a plurality of surfaces of the polyhedron may be configured to form one face. A 360 degree video approximated as a sphere can also have multiple faces according to the projection transformation technique.

Frame packing may be performed in the frame packing unit 720 in order to increase the encoding / decoding efficiency of the 360-degree video (S802). The frame packing may include at least one of rearranging, resizing, warping, rotating, or flipping the face. Through the frame packing, the 360 degree projection image can be converted into a form having a high encoding / decoding efficiency (for example, a rectangle) or discontinuous data between faces can be removed. The frame packing may also be referred to as frame reordering or Region-wise Packing. The frame packing may be selectively performed to improve the coding / decoding efficiency for the 360 degree projection image.

In the encoding unit 730, the 360-degree projection image or the 360-degree projection image in which the frame packing is performed may be encoded (S803). At this time, the encoding unit 730 may encode information indicating a projection transformation technique for 360-degree video. Here, the information indicating the projection transformation technique may be index information indicating any one of a plurality of projection transformation techniques.

In addition, the encoding unit 730 can encode information related to frame packing for the 360-degree video. Here, the information related to the frame packing may include at least one of whether or not frame packing has been performed, the number of faces, the position of a face, the size of a face, the shape of a face, or the rotation information of a face.

The transmitting unit 740 encapsulates the bit stream and transmits the encapsulated data to the player terminal (S804).

The file parsing unit 750 can parse the file received from the content providing apparatus (S805). In the decoding unit 760, the 360-degree projection image can be decoded using the parsed data (S806).

If frame packing has been performed on the 360-degree projection image, the frame de-packing unit 770 may perform frame de-packing (Region-wise depacking), which is the reverse of the frame packing performed on the content providing side (S807). The frame de-packing restores the frame-packed 360 degree projection image to its state before the frame packing was performed. For example, frame de-packing may reverse the rearrangement, resizing, warping, rotation, or flipping of faces performed at the data generation apparatus.

The inverse projection unit 780 can inversely project the 360 degree projection image on the 2D plane into a 3D form according to the projection transformation technique of the 360 degree video (S808).

Projection transformation techniques may include at least one of Equirectangular Projection (ERP), Cube Map Projection (CMP), Icosahedral Projection (ISP), Octahedron Projection (OHP), Truncated Pyramid Projection (TPP), Segmented Sphere Projection (SSP), Equatorial Cylindrical Projection (ECP), and Rotated Sphere Projection (RSP).

FIG. 9 shows a 2D projection method using the equirectangular projection technique.

The equirectangular projection technique projects the pixels of a sphere onto a rectangle having an aspect ratio of N:1, and is the most widely used 2D transformation technique. Here, N may be 2, or may be a real number less than or greater than 2. When the equirectangular projection technique is used, the actual length on the sphere corresponding to a unit length on the 2D plane becomes shorter as the position approaches the poles of the sphere. For example, the coordinates of both ends of a unit length on the 2D plane may correspond to a distance difference of 20 cm in the vicinity of the equator of the sphere, and to a distance difference of 5 cm in the vicinity of the poles of the sphere. As a result, the equirectangular projection technique has the disadvantage that the image is distorted in the vicinity of the poles of the sphere and the coding efficiency is lowered.
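The shrinking of the actual length toward the poles can be made concrete with a small Python sketch. It assumes an ideal sphere, rows sampled at pixel centers, and row 0 at the top of the projection image; the function name `erp_arc_per_pixel` and its parameters are hypothetical and serve only as an illustration.

```python
import math


def erp_arc_per_pixel(row, width, height, radius=1.0):
    """Horizontal arc length on the sphere covered by one pixel of the given row."""
    # Latitude of the row center: +pi/2 at the top edge, -pi/2 at the bottom edge.
    latitude = (0.5 - (row + 0.5) / height) * math.pi
    # A circle of latitude has circumference 2*pi*radius*cos(latitude),
    # and that circumference is spread over `width` pixels.
    return 2.0 * math.pi * radius * math.cos(latitude) / width
```

Under these assumptions the arc length per pixel is proportional to the cosine of the latitude, so rows near the top and bottom of the image represent much shorter arcs on the sphere than rows near the middle.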

FIG. 10 shows a 2D projection method using the cube projection technique.

The cube projection technique approximates a 360 degree video as a cube and then transforms the cube into 2D. When a 360 degree video is projected onto a cube, each face (or plane) is adjacent to four other faces. Since the continuity between faces is high, the cube projection technique has the advantage that its coding efficiency is higher than that of the equirectangular projection technique. After the 360 degree video is projection-transformed into 2D, the 2D projection-transformed image may be rearranged into a rectangular shape to perform encoding / decoding.

FIG. 11 shows a 2D projection method using the icosahedral projection technique.

The icosahedral projection technique is a method of approximating a 360-degree video as an icosahedron and transforming it into 2D. The icosahedral projection technique has strong continuity between faces. As in the example shown in FIG. 11, it is also possible to perform encoding / decoding by rearranging the faces in the 2D projection-transformed image.

FIG. 12 shows a 2D projection method using the octahedron projection technique.

The octahedron projection method is a method of approximating a 360 degree video to an octahedron and transforming it into 2D. The octahedral projection technique is characterized by strong continuity between faces. As in the example shown in FIG. 12, it is possible to perform encoding / decoding by rearranging the faces in the 2D projection-converted image.

FIG. 13 shows a 2D projection method using the truncated pyramid projection technique.

The truncated pyramid projection technique is a method of approximating a 360 degree video as a truncated pyramid and transforming it into 2D. Under the truncated pyramid projection technique, frame packing may be performed such that the face for a particular viewpoint has a different size from the neighboring faces. For example, as in the example shown in FIG. 13, the front face may have a larger size than the side faces and the back face. When the truncated pyramid projection technique is used, the amount of image data for a specific viewpoint is large, so the encoding / decoding efficiency for that viewpoint is higher than for the other viewpoints.

FIG. 14 shows a 2D projection method using the SSP projection technique.

The SSP is a method of performing 2D projection transformation by dividing a spherical 360 degree video into high-latitude regions and a mid-latitude region. Specifically, as in the example shown in Fig. 14, the two high-latitude regions in the north and south directions of the sphere can be mapped to two circles on the 2D plane, and the mid-latitude region of the sphere can be mapped to a rectangle on the 2D plane, as in ERP. The boundary between the high-latitude and mid-latitude regions may be at latitude 45 degrees, or at a latitude higher or lower than 45 degrees.

ECP is a method of transforming a spherical 360 degree video into a cylindrical shape and then projecting the cylindrical 360 degree video into 2D. Specifically, under ECP, the upper and lower surfaces of the cylinder can be mapped to two circles on the 2D plane, and the body of the cylinder can be mapped to a rectangle on the 2D plane.

The RSP represents a method of projection-transforming a spherical 360-degree video into two ellipses on a 2D plane, in a form resembling the way the surface of a tennis ball is wrapped.

Each sample of the 360 degree projection image can be identified by face 2D coordinates. The face 2D coordinates may include an index f for identifying the face where the sample is located, and coordinates (m, n) representing a sample grid in the 360 degree projection image.

2D projection transformation and image rendering can be performed through conversion between face 2D coordinates and three-dimensional coordinates. For example, FIG. 15 illustrates the conversion between face 2D coordinates and three-dimensional coordinates. The conversion between the three-dimensional coordinates (x, y, z) and the face 2D coordinates (f, m, n) can be performed using the following equations (1).

In the 360 degree projection image, the current picture may include at least one face. At this time, the number of faces may be a natural number such as 1, 2, 3, 4, or more, depending on the projection method. In the face 2D coordinates, f may be set to a value equal to or less than the number of faces. The current picture may include at least one face having the same temporal order or output order (POC).

Alternatively, the number of faces constituting the current picture may be fixed or variable. For example, the number of faces constituting the current picture may be limited so as not to exceed a predetermined threshold value. Here, the threshold value may be a fixed value agreed upon in advance by the encoder and the decoder. Alternatively, information regarding the maximum number of faces constituting one picture may be signaled through the bit stream.

Faces can be determined by partitioning the current picture using at least one of horizontal, vertical, or diagonal lines, depending on the projection method.

Each face in the picture may be assigned an index to identify each face. Each face may be capable of parallel processing, such as a tile or a slice. Accordingly, when intra prediction or inter prediction of the current block is performed, a neighboring block belonging to a different face from the current block can be judged as unavailable.

Faces for which parallel processing is not allowed (or non-parallel processing regions) may be defined, or interdependent faces may be defined. For example, faces for which parallel processing is not allowed or interdependent faces may be encoded / decoded sequentially instead of in parallel. Accordingly, even if a neighboring block belongs to a different face from the current block, the neighboring block may be determined to be available for intra prediction or inter prediction of the current block, depending on whether parallel processing between the faces is possible or whether there is a dependency between them.
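The availability decision described above can be sketched in Python as follows. The function name `neighbor_available` and the representation of interdependent faces as a set of unordered face-index pairs are assumptions made for illustration; other availability conditions, such as whether the neighboring block has already been decoded, are deliberately left out.

```python
def neighbor_available(cur_face, nb_face, dependent_pairs=frozenset()):
    """Decide whether a neighboring block may be referenced by the current block.

    cur_face / nb_face: face indices of the current and neighboring blocks
                        (nb_face is None when the neighbor lies outside the picture).
    dependent_pairs:    hypothetical set of frozenset({a, b}) pairs naming faces
                        that are interdependent, i.e. not processed in parallel.
    """
    if nb_face is None:
        return False
    if nb_face == cur_face:
        return True
    # Different faces: available only if the two faces are declared interdependent.
    return frozenset((cur_face, nb_face)) in dependent_pairs
```

With an empty `dependent_pairs`, every face behaves like an independent tile or slice; adding a pair allows prediction across that particular face boundary in both directions.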

In order to increase the efficiency of encoding / decoding the 360 degree projection image, padding can be performed at a picture or face boundary. The padding may be performed as a part of performing the frame packing (S802), or may be performed as a separate step before performing the frame packing. Alternatively, padding may be performed in the preprocessing process before encoding the 360-degree projection image in which the frame packing is performed, or padding may be performed as a part of the encoding step S803.

The padding can be performed in consideration of the continuity of the 360 degree image. Continuity of the 360 degree image may mean spatial continuity when the 360 degree projection image is projected back onto a sphere or a polyhedron. For example, faces that are spatially continuous when the 360 degree projection image is projected back onto a sphere or a polyhedron can be understood to have mutual continuity. Padding at a picture or face boundary may be performed using spatially continuous samples.

When ERP is used, a two-dimensional 360-degree projection image can be obtained by unfolding a 360-degree image approximated as a sphere into a rectangle having a ratio of 2:1. When the rectangular 360 degree projection image is projected back onto the sphere, the left boundary of the 360 degree projection image has continuity with the right boundary. For example, in the example shown in Fig. 16, pixels A, B and C outside the left boundary line can be expected to have values similar to pixels A', B' and C' inside the right boundary line, and pixels D, E and F outside the right boundary line can be expected to have values similar to pixels D', E' and F' inside the left boundary line.

Also, with respect to the vertical center line dividing the 360 degree projection image into two halves, the upper boundary on the left has continuity with the upper boundary on the right. For example, in the example shown in Fig. 16, pixels G and H outside the upper left boundary can be predicted to be similar to pixels G' and H' inside the upper right boundary, and pixels I and J outside the upper right boundary can be predicted to be similar to pixels I' and J' inside the upper left boundary.

Likewise, with respect to the vertical center line bisecting the 360 degree projection image, the lower boundary on the left has continuity with the lower boundary on the right. For example, in the example shown in FIG. 16, pixels K and L outside the lower left boundary can be predicted to be similar to pixels K' and L' inside the lower right boundary, and pixels M and N outside the lower right boundary can be predicted to be similar to pixels M' and N' inside the lower left boundary.

In consideration of continuity in the three-dimensional space, padding can be performed at the boundary of the 360 degree projection image or at the boundary between faces. Specifically, the padding can be performed using samples contained inside the boundary that has continuity with the boundary where the padding is performed. For example, in the example shown in FIG. 16, padding at the left boundary of the 360 degree projection image is performed using the samples adjacent to the right boundary, and padding at the right boundary of the 360 degree projection image is performed using the samples adjacent to the left boundary. That is, at positions A, B and C of the left boundary, padding can be performed using the samples at positions A', B' and C' contained inside the right boundary, and at positions D, E and F of the right boundary, padding can be performed using the samples at positions D', E' and F' contained inside the left boundary.
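The left / right padding described above amounts to a wrap-around copy of columns. The following Python sketch illustrates it for an image stored as a list of rows; the function name `pad_erp_horizontal` and the list-of-lists representation are assumptions made for illustration only.

```python
def pad_erp_horizontal(image, k):
    """Add k padding columns on each side of an ERP image by wrap-around copy.

    image: list of rows, each row a list of sample values.
    k:     padding length in columns (0 <= k <= image width).
    """
    width = len(image[0])
    if not 0 <= k <= width:
        raise ValueError("k must be between 0 and the image width")
    padded = []
    for row in image:
        left_pad = row[width - k:]   # samples inside the right boundary
        right_pad = row[:k]          # samples inside the left boundary
        padded.append(left_pad + list(row) + right_pad)
    return padded
```

For a row `[1, 2, 3, 4]` and `k = 1`, the padded row is `[4, 1, 2, 3, 4, 1]`: the sample placed outside the left boundary comes from just inside the right boundary, and vice versa.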

Also, when the upper boundary is bisected, padding at the upper left boundary is performed using samples adjacent to the upper right boundary, and padding at the upper right boundary can be performed using samples adjacent to the upper left boundary. That is, at the G and H positions of the upper left boundary, padding is performed using the samples at positions G' and H' contained inside the upper right boundary, and at the I and J positions of the upper right boundary, padding can be performed using the samples at positions I' and J' contained inside the upper left boundary.

Likewise, when the lower boundary is bisected, padding at the lower left boundary may be performed using samples adjacent to the lower right boundary, and padding at the lower right boundary may be performed using samples adjacent to the lower left boundary. That is, at the K and L positions of the lower left boundary, padding is performed using the samples at positions K' and L' contained inside the lower right boundary, and at the M and N positions of the lower right boundary, padding can be performed using the samples at positions M' and N' contained inside the lower left boundary.
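The upper and lower padding can be sketched in the same way. In the Python sketch below, a padding sample above the upper boundary is taken from the opposite half of the image by shifting the column by half the width. The mirrored row order (the padding row closest to the boundary uses the image row closest to the boundary) is an assumption, since the description only states that samples adjacent to the opposite half of the boundary are used; the function name `pad_erp_vertical` is likewise hypothetical.

```python
def pad_erp_vertical(image, k):
    """Add k padding rows above and below an ERP image using the opposite half.

    image: list of rows (lists of samples); the width is assumed to be even.
    k:     padding length in rows (0 <= k <= image height).
    """
    height, width = len(image), len(image[0])
    if width % 2 != 0 or not 0 <= k <= height:
        raise ValueError("width must be even and 0 <= k <= height")
    half = width // 2

    def shifted(row):
        # Sample for column x is taken from column x + width/2 (wrapping around).
        return [row[(x + half) % width] for x in range(width)]

    # Rows are listed top to bottom; the padding row nearest the boundary
    # is built from the image row nearest that boundary (assumed mirroring).
    top = [shifted(image[j]) for j in range(k - 1, -1, -1)]
    bottom = [shifted(image[height - 1 - j]) for j in range(k)]
    return top + [list(r) for r in image] + bottom
```

For a 4-sample-wide image whose first row is `[1, 2, 3, 4]`, the padding row directly above it is `[3, 4, 1, 2]`, so each position outside the upper left boundary is filled from inside the upper right boundary and vice versa.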

An area where padding is performed may be referred to as a padding area, and a padding area may include a plurality of sample lines. At this time, the number of sample lines included in the padding area can be defined as the length of the padding area or the padding size. In Fig. 16, the length of the padding area is shown as k in both the horizontal and vertical directions.

The length of the padding area may be set differently for each horizontal or vertical direction, or different for each face boundary. In particular, when the ERP projection transformation is used, the closer to the upper or lower end of the 360 degree projection image, the shorter the actual length of the sphere corresponding to the unit length. Thus, large distortion occurs at the upper or lower end of the 360 degree projection image using the ERP projection transformation. In order to minimize the reduction in encoding / decoding efficiency due to the occurrence of distortion, it is possible to consider a method of adaptively setting the length of the padding region according to the degree of distortion, or using a smoothing filter.

In the example shown in Fig. 17, the length of the arrow indicates the length of the padding area.

The length of the padding area in the horizontal direction and the length of the padding area in the vertical direction may be set differently, as in the example shown in FIG. 17. For example, if k columns of samples are generated through padding in the horizontal direction, padding may be performed such that 2k rows of samples are generated in the vertical direction.

As another example, padding may be performed with the same length in both the vertical direction and the horizontal direction, and the length of the padding area may subsequently be extended through interpolation in at least one of the vertical direction and the horizontal direction. For example, k sample lines can be generated in each of the vertical and horizontal directions, and k further sample lines can be generated in the vertical direction through interpolation or the like. That is, k sample lines are generated in both the horizontal and vertical directions (see FIG. 16), and k sample lines are further generated in the vertical direction so that the length in the vertical direction becomes 2k (see FIG. 17).

Interpolation may be performed using at least one of a sample contained inside the boundary or a sample contained outside the boundary. For example, after the samples inside the lower boundary are copied to the outside of the padding area adjacent to the upper boundary, an additional padding area can be created by interpolating between the copied samples and the samples contained in the padding area adjacent to the upper boundary. The interpolation filter may include at least one of a vertical direction filter and a horizontal direction filter. Depending on the position of the sample to be generated, either the vertical filter or the horizontal filter may be selectively used. Alternatively, the vertical filter and the horizontal filter may be used simultaneously to generate a sample included in the additional padding area.

As described above, the length n in the horizontal direction of the padding area and the length m in the vertical direction of the padding area may have the same value or may have different values. For example, n and m are integers equal to or greater than 0 and may have the same value, or one of m and n may have a smaller value than the other. At this time, m and n can be encoded in the encoder and signaled through the bit stream. Alternatively, the length n in the horizontal direction and the length m in the vertical direction may be predefined in the encoder and the decoder according to the projection transformation method.

The padding area may be generated by copying samples located inside the image. Specifically, the padding region located adjacent to a predetermined boundary may be generated by copying a sample located inside the boundary having continuity with a predetermined boundary in 3D space. For example, in the example shown in Figs. 16 and 17, a padding area located at the left boundary of the image may be generated by copying the sample adjacent to the right border of the image.

As another example, a padding area may be created using at least one sample inside the boundary to be padded and at least one sample outside the boundary. For example, after the samples that are spatially continuous with the boundary to be padded are copied to the outside of the boundary, the sample values of the padding region can be determined by a weighted average operation or an average operation between the copied samples and the samples included inside the boundary. In the examples shown in FIGS. 16 and 17, the sample value of the padding region located at the left boundary of the image may be obtained as a weighted average or an average of at least one sample adjacent to the left boundary of the image and at least one sample adjacent to the right boundary of the image.

The weight applied to each sample in the weighted average operation may be determined based on the distance to the boundary where the padding region is located. For example, among the samples in the padding region located at the left boundary, a sample close to the left boundary is derived by giving a large weight to the samples located inside the left boundary, while a sample far from the left boundary is derived by giving a large weight to the samples located outside the left boundary (that is, the samples adjacent to the right boundary of the image).
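The distance-dependent weighting can be sketched in Python for a single row and the left boundary. Several details are assumptions chosen only to make the idea concrete: the weights vary linearly with the distance d from the boundary, the inside sample is always the sample adjacent to the left boundary, and the outside sample is the wrap-around copy from the right boundary. The function name `blend_left_padding` is hypothetical.

```python
def blend_left_padding(row, k):
    """Build k padding samples at the left boundary of a row by weighted averaging.

    For a padding sample at distance d (1 = adjacent to the boundary, k = farthest):
      inner  = sample just inside the left boundary (row[0]),
      copied = wrap-around sample from the right boundary (row[width - d]),
      weight of inner = (k + 1 - d) / (k + 1)   (assumed linear falloff).
    """
    width = len(row)
    if not 0 <= k <= width:
        raise ValueError("k must be between 0 and the row width")
    pad = []
    for d in range(k, 0, -1):            # farthest padding sample first
        w_inner = (k + 1 - d) / (k + 1)
        inner = row[0]
        copied = row[width - d]
        pad.append(w_inner * inner + (1.0 - w_inner) * copied)
    return pad + list(row)
```

For the row `[10, 20, 30, 40]` and `k = 3`, the padding sample adjacent to the boundary gives weight 0.75 to the inside sample 10, while the farthest padding sample gives weight 0.75 to its copied sample, matching the behavior described above.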

When the 360 degree projection image includes a plurality of faces, frame packing can be performed by adding a padding area between faces. That is, a 360 degree projection image can be generated by adding a padding area to the face boundary.

For convenience of explanation, the embodiment will be described on the basis of a 360-degree projection image that is projection-converted based on OHP. Based on the drawing shown in FIG. 18 (a), a face located at the upper end of the 360 degree projection image will be referred to as an upper face, and a face located at the lower end of the 360 degree projection image will be referred to as a lower face. For example, the upper face may represent any one of faces 1, 2, 3, and 4, and the lower face may represent any one of faces 5, 6, 7, and 8.

For a given face, a padding area may be set so as to surround the face. As an example, as shown in FIG. 18 (a), a padding area containing m samples may be created around a triangular face.

By performing frame packing with a padding area set around each face, a 360-degree projection image with padding areas added at the boundaries of the image and between the faces can be obtained, as in the example shown in FIG. 18 (b).

In FIG. 18 (a), the padding area is set to surround the face, but the padding area may be set at only a part of the face boundary. That is, unlike the example shown in FIG. 18 (b), frame packing may be performed by adding the padding area only at the boundary of the image, or only between the faces.

Alternatively, in consideration of continuity between faces, frame packing may be performed by adding a padding area only between faces at which image discontinuity occurs.

The length of the padding area between faces may be set to be the same or may be set differently depending on the position. For example, the length n (i.e., the length in the horizontal direction) of the padding area located at the left or right side of a predetermined face and the length m (i.e., the length in the vertical direction) of the padding area located at the upper or lower end of the predetermined face may have the same value or different values. For example, n and m are integers equal to or greater than 0 and may have the same value, or one of m and n may be smaller than the other. In this case, m and n may be encoded by the encoder and signaled through the bitstream. Alternatively, the length n in the horizontal direction and the length m in the vertical direction may be predefined in the encoder and decoder in accordance with the projection transformation method, the position of the face, the size of the face, or the shape of the face.

The sample value of the padding area may be determined based on the sample included in the predetermined face or the sample included in the predetermined face and the sample included in the face adjacent to the predetermined face.

For example, a sample value of a padding area adjacent to a boundary of a predetermined face may be generated by copying a sample included in the face or interpolating samples included in the face. For example, in the example shown in FIG. 18 (a), the upper extension region U of the upper face may be created by copying a sample adjacent to the boundary of the upper face, or by interpolating a predetermined number of samples adjacent to the boundary of the upper face . Similarly, the lower extension region D of the lower face may be generated by copying a sample adjacent to the boundary of the lower face or by interpolating a predetermined number of samples adjacent to the boundary of the lower face.

Alternatively, a sample value of a padding area adjacent to a boundary of a predetermined face may be generated using a sample value included in a face spatially adjacent to the face. Here, inter-face adjacency can be determined based on whether the faces have continuity when the 360 degree projection image is projected back onto 3D space. Specifically, a sample value of a padding area adjacent to a boundary of a predetermined face may be generated by copying a sample included in a face spatially adjacent to the face, or by interpolating a sample included in the face and a sample included in the face spatially adjacent to the face. For example, the left portion of the upper extended region of the second face may be generated based on the samples included in the first face, and the right portion may be generated based on the samples included in the third face.

The padding region between the first face and the second face may be obtained by weighted averaging at least one sample included in the first face and at least one sample included in the second face. Specifically, the padding region between the upper face and the lower face can be obtained by weighted averaging the upper extension region U and the lower extension region D.

The weight w may be determined based on the information encoded and signaled by the encoder. Alternatively, depending on the position of the sample in the padding region, the weight w may be variably determined. For example, the weight w may be determined based on the distance from the position of the sample in the padding region to the first face and the distance from the position of the sample in the padding region to the second face.
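As an illustration of a position-dependent weight, the sketch below blends an upper extension region U and a lower extension region D of equal size, row by row. The weight formula (distance to the opposite face divided by the sum of both distances) is an assumption made for this example and is not a reproduction of the equations of the specification; the function name is hypothetical.

```python
def blend_extensions(upper_ext, lower_ext):
    """Blend two equally sized extension regions into one padding region.

    Row 0 is the row nearest the upper face and the last row is nearest the
    lower face. The weight w applied to the upper extension grows as the row
    gets closer to the upper face (assumed linear weighting).
    """
    rows = len(upper_ext)
    if rows != len(lower_ext):
        raise ValueError("extension regions must have the same number of rows")
    blended = []
    for y in range(rows):
        d_upper = y + 1        # distance from this row to the upper face
        d_lower = rows - y     # distance from this row to the lower face
        w = d_lower / (d_upper + d_lower)   # weight of the upper extension
        blended.append([w * u + (1 - w) * d
                        for u, d in zip(upper_ext[y], lower_ext[y])])
    return blended


U = [[100, 100], [100, 100], [100, 100]]
D = [[0, 0], [0, 0], [0, 0]]
padding_region = blend_extensions(U, D)
```

Rows near the upper face follow U more closely, and rows near the lower face follow D more closely.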

Equations (4) and (5) show examples in which the weight w is variably determined according to the position of the sample. When padding is performed between the upper face and the lower face, a sample value of the padding area may be generated based on Equation (4) in the lower extended region close to the lower face, and based on Equation (5) in the upper extended region close to the upper face.

The filter for the weighting operation may be oriented in a vertical direction, a horizontal direction, or at a predetermined angle. If the weighting filter has a predetermined angle, the sample included in the first face and the sample included in the second face that lie on a line at the predetermined angle through a sample in the padding region may be used to determine the value of that sample.

As another example, at least a portion of the padding region may be generated using only samples included in either the first face or the second face. For example, if any one of the samples included in the first face or the sample included in the second face is not available, padding can be performed using only the available samples. Alternatively, padding may be performed by replacing the unavailable sample with the surrounding available sample.

Although the padding-related embodiments have been described based on a specific projection transformation method, padding can be performed on the same principle in projection transformation methods other than the one exemplified. For example, padding can be performed at a face boundary or an image boundary even in a 360 degree projection image based on CMP, OHP, ECP, RSP, TPP, and the like.

In addition, padding related information can be signaled through the bitstream. Here, the padding related information may include whether padding has been performed, the position of the padding area, the padding size, and the like. Padding related information may be signaled on a picture, slice, or face basis. In one example, information indicating whether padding was performed on the top boundary, bottom boundary, left boundary, or right boundary, together with the padding size, may be signaled on a per-face basis.
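One way to picture the per-face padding information is as a small record holding a flag per boundary and a padding size. The sketch below is only an illustrative container; the class name, field names, and the single shared padding size are assumptions, and it does not represent any actual bitstream syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FacePaddingInfo:
    """Hypothetical per-face padding information (not a real syntax structure)."""
    top: bool      # padding performed at the top boundary
    bottom: bool   # padding performed at the bottom boundary
    left: bool     # padding performed at the left boundary
    right: bool    # padding performed at the right boundary
    size: int      # padding size in samples (assumed equal for all boundaries)

    def padded_dimensions(self, width, height):
        """Return (width, height) of the face once the padding is attached."""
        if self.size < 0:
            raise ValueError("padding size must not be negative")
        new_width = width + self.size * (int(self.left) + int(self.right))
        new_height = height + self.size * (int(self.top) + int(self.bottom))
        return new_width, new_height


info = FacePaddingInfo(top=True, bottom=False, left=True, right=True, size=4)
dims = info.padded_dimensions(64, 64)
```

A decoder-side consumer could use such a record to locate the padding area before removing or blending it, although how the information is actually coded is left open by the description.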

According to the projection transformation technique, a 360 degree image can be projected and converted into a two dimensional image composed of a plurality of faces. For example, under the CMP technique, a 360 degree image can be projected and transformed into a two dimensional image composed of six faces.

The six faces may be arranged in a 2x3 form or in a 3x2 form. For example, FIG. 20 shows a 360-degree projection image in the 3x2 form.

In FIG. 20, six square faces of MxM size are illustrated as arranged in 3x2 form.

When a 360-degree image is encoded / decoded using a projection transformation technique in which a plurality of faces exist, image quality deterioration (i.e., face artifacts) may occur at face boundaries. To prevent face artifacts, a method of projecting data of a specific surface together with data adjacent to that surface onto one face may be considered. That is, a predetermined face may be configured to include not only the area corresponding to the face but also an area adjacent to the corresponding area.

Taking the CMP technique as an example, a 360-degree image approximated to a cube can be projected and transformed onto a 2D plane such that one surface of the cube becomes one face. For example, the Nth surface of the cube may constitute the face with index N of the 360 degree projection image.

However, when a 360-degree projection image is formed so that one surface of the cube becomes one face, as in the example shown in FIG. 20, image quality deterioration at the face boundaries is inevitable. In particular, relatively large artifacts may occur at the boundaries of faces that are spatially contiguous on the 2D plane but not in 3D space.

In order to reduce the occurrence of face artifacts, a face can be configured so that data of a plurality of faces are included in one face. Here, the data of a plurality of surfaces may include at least a partial area of at least one of a surface corresponding to a predetermined face (hereinafter, referred to as a 'corresponding surface') and a plurality of surfaces adjacent to the corresponding surface.

As in the example shown in FIG. 21, face 0 may be configured to include the surface located at the front and at least a partial area of the surfaces adjacent to the front surface. That is, a 360 degree image may be projected and transformed so that face 0 includes its corresponding surface (i.e., the surface located at the front) and at least a part of the corresponding surfaces of face 2, face 3, face 4, and face 5. Accordingly, a part of the data included in face 0 may overlap with data included in face 2, face 3, face 4, and face 5.

As in the example shown in FIG. 22, each face can be configured to include data for a plurality of surfaces. In this case, each face may be configured to include a corresponding surface and a part of the four surfaces adjacent to the corresponding surface.

The number of adjacent surfaces included in each face may be set differently from the illustrated example. In one example, a predetermined face may be configured to include only the corresponding surface and a partial area of the surfaces adjacent to its left and right, or only the corresponding surface and a partial area of the surfaces adjacent to its top and bottom. That is, an area including data of other surfaces may be set only at the left and right, or only at the top and bottom, of the face.

Alternatively, the number of adjacent surfaces included in a face may be determined differently depending on the position of the face. For example, faces located at the left and right boundaries of the image (e.g., faces 2, 3, 4, and 5 in FIG. 22) may be configured to include a corresponding surface and a partial area of the surfaces adjacent to the corresponding surface, while faces 1 and 6 may be configured to include a corresponding surface and a partial area of two surfaces adjacent to the corresponding surface.

An area generated based on a surface adjacent to the surface corresponding to a face may be defined as a padding area. In this case, the padding sizes for the vertical direction and the horizontal direction may have the same value. For example, in FIG. 22, the padding size for both the vertical and horizontal directions is illustrated as being set to k. Unlike the illustrated example, the padding sizes for the vertical and horizontal directions may be set differently.

Furthermore, the padding sizes for the vertical and horizontal directions may be set differently depending on the face. In one example, the padding size in the horizontal direction at the face located at the left or right boundary can be set larger than the padding size in the vertical direction.

As another example, the padding size may be set differently for each face.

According to an embodiment of the present invention, a predetermined face can be configured by resampling the surface corresponding to the face to a size smaller than the face, and then padding the remaining region of the face in which the resampled image is not disposed. For example, the image corresponding to the front surface may be resampled to a size smaller than MxM, and the resampled image may be disposed at the center of face 0. Thereafter, padding can be performed on the remaining area of face 0 excluding the resampled image.

Resampling can be used to reduce at least one of the width or height of the image corresponding to the corresponding surface. As an example, resampling may be performed to make both the width and height of the image corresponding to the front surface smaller than M. That is, a filter for resampling can be applied in both the horizontal direction and the vertical direction.

Alternatively, resampling may be performed in order to keep the size of either the width or the height of the image corresponding to the corresponding surface at M, while making the size of the other one smaller than M. That is, a filter for resampling can be applied only in the horizontal direction or the vertical direction.

The padding may be performed using at least one of a sample (or block) located at the boundary of the corresponding surface or a sample (or block) included in a surface adjacent to the corresponding surface. For example, the value of a sample included in the padding area may be generated by copying a sample located at the boundary of the corresponding surface or a sample included in a surface adjacent to the corresponding surface, or may be generated based on an averaging operation or a weighting operation of a sample located at the boundary of the corresponding surface and a sample included in the adjacent surface.
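The resample-then-pad construction can be sketched as follows for a square MxM face with the same padding size k on every side. Nearest-neighbour resampling and replication of boundary samples are simplifying assumptions made here; the description equally allows other resampling filters and padding derived from adjacent surfaces. All function names are hypothetical.

```python
def resample_nearest(face, new_h, new_w):
    """Downsample a 2D list of samples with nearest-neighbour selection."""
    h, w = len(face), len(face[0])
    return [[face[(y * h) // new_h][(x * w) // new_w] for x in range(new_w)]
            for y in range(new_h)]


def build_padded_face(face, k):
    """Shrink an MxM face to (M-2k)x(M-2k), centre it, and pad back to MxM.

    The k-sample border is filled by copying the nearest boundary sample of
    the resampled image (assumed padding rule for this sketch).
    """
    m = len(face)
    inner = m - 2 * k
    if k < 0 or inner <= 0:
        raise ValueError("k must satisfy 0 <= k < M / 2")
    small = resample_nearest(face, inner, inner)
    out = []
    for y in range(m):
        sy = min(max(y - k, 0), inner - 1)   # clamp row into the resampled image
        out.append([small[sy][min(max(x - k, 0), inner - 1)] for x in range(m)])
    return out


face = [[y * 4 + x for x in range(4)] for y in range(4)]
packed = build_padded_face(face, 1)
```

The output keeps the original MxM face size, so the frame layout is unchanged while each face carries a border of padding samples.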

As in the above-described example, the projection transformation method that constructs a face using the corresponding surface and the surfaces adjacent to the corresponding surface can be defined as Overlapped Face Projection. Although the face overlap projection conversion method has been described based on the CMP technique with reference to FIGS. 21 and 22, it can be applied to any projection transformation technique that generates a plurality of faces. For example, the face overlap projection conversion method may be applied to ISP, OHP, TPP, SSP, ECP, or RSP.

Information regarding the face overlap projection conversion method can be signaled through the bitstream. This information may include information indicating whether the face overlap projection conversion method is used, information indicating the number of adjacent surfaces included in a face, information indicating whether a padding area exists, information indicating the padding size, information indicating whether a padding area has been created using faces neighboring the current face in three-dimensional space, and the like.

Depending on the projection transformation technique, faces that are spatially continuous in 3D space may not be spatially continuous on the 2D plane. For example, in the 360-degree projection image based on TPP shown in FIG. 13, the front face and the left face are spatially continuous in 3D space, but they are not spatially continuous on the 2D plane.

When a 360-degree image is reconstructed after encoding / decoding the 360-degree projection image, relatively large face artifacts may occur at the boundaries of faces that are adjacent to each other in 3D space but are no longer adjacent after being projected and transformed onto the 2D plane. Subjective image quality may be lowered accordingly.

In order to reduce face artifacts, padding can be performed at a predetermined face using data of a face that does not neighbor the predetermined face in the 360 degree projection image. More specifically, a padding area can be added to the boundary of a predetermined face using data of a face that does not neighbor the predetermined face.

The padding can be performed using a face that is not adjacent to the predetermined face on the 2D plane but is adjacent to the predetermined face when the 360 degree projection image is reconstructed in 3D. Padding the boundary of the current face using data of a face (or sub-face) that is continuous with it in 3D space, although not adjacent to it in the 360-degree projection image, can be defined as Overlapped Face Padding.

Face overlap padding can be performed by copying a portion of the face that is not adjacent to the current face. That is, the padding area added to the border of the current face may be a copy of a face area that is not adjacent to the current face.

As another example, face overlap padding may be performed based on a sample (or block) adjacent to the boundary of the current face and a sample (or block) adjacent to the boundary of a face that is not adjacent to the current face. Specifically, the value of a sample included in the padding area may be generated by copying a sample located at the boundary of the current face or a sample located at the boundary of a face not adjacent to the current face, or may be generated based on an averaging operation or a weighting operation of a sample located at the boundary of the current face and a sample located at the boundary of a face that is not adjacent to the current face.

Face overlap padding may be performed considering at least one of the continuity with the current face or the shape of the face that is not adjacent to the current face. Specifically, a padding area may be added to the boundary of the current face at which a non-neighboring face is spatially contiguous with the current face in 3D space. In this case, the shape of the padding area to be added may be determined based on the shape of the face that does not neighbor the current face. Face overlap padding will be described in more detail with reference to the drawings.

When the 360 degree projection image based on the TPP technique is restored to 3D, the front face is continuous with the top face, the bottom face, the right face, and the left face. Of these, a padding area can be added to the boundary of the front face using the remaining faces, excluding the right face, which is already continuous with the front face on the 2D plane.

The padding area added to the boundary of the front face can be generated based on the face that is adjacent to that boundary when the 360 degree projection image is reconstructed into 3D. For example, since the left boundary of the front face is adjacent to the left face in 3D space, a padding area generated using data included in the left face may be added to the left boundary of the front face, as in the example shown in FIG. 23.

Padding may be performed at only one boundary in the 360 degree projection image, as in the example shown in FIG. 23, or at multiple boundaries in the 360 degree projection image, as in the example shown in FIG. 24. In FIG. 24, a padding area generated using data included in the top face is added to the upper boundary of the front face, while a padding area generated using data included in the bottom face is added to the lower boundary of the front face. By the same principle, a padding area generated using data included in the front face can be added to the upper boundary of the top face and the lower boundary of the bottom face.

Although not shown, it is also possible to add a padding area to the right border of the Left face using data included in the Top face.

Since the right boundary of the front face adjoins the right face, which is also continuous with the front face in 3D space, a padding area may not be added there. That is, a padding area may be added to a boundary at which the face adjacent to the current face is not contiguous with it in 3D space, or to a boundary at which no neighboring face exists.

The shape of the padding area may be determined based on the shape of the face that is not adjacent to the current face. For example, since the top face, the bottom face, and the left face are trapezoidal, the padding area added to each boundary of the front face may be a copy of a part of a trapezoid. By the same principle, the padding areas added to the upper boundary of the top face and the lower boundary of the bottom face may be copies of a part of a rectangle.

A padding area may be generated by copying a rotated or flipped part of a face in consideration of continuity between faces in 3D space. For example, the padding area at the top boundary of the front face may be a copy of a part of the top face rotated 180 degrees, and the padding area at the bottom boundary of the front face may be a copy of a part of the bottom face rotated 180 degrees. By the same principle, the padding areas at the upper boundary of the top face and the lower boundary of the bottom face may be copies of a portion of the front face rotated 180 degrees.
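The rotate-and-copy step can be sketched as below. The sketch treats both faces as rectangular blocks of equal width and assumes that the top row of the source face is the row that meets the top boundary of the current face in 3D space; the trapezoidal shapes of the actual TPP faces are ignored, and the function names are hypothetical.

```python
def rotate_180(block):
    """Rotate a 2D list of samples by 180 degrees."""
    return [row[::-1] for row in block[::-1]]


def pad_top_from_rotated(current, source, m):
    """Add m padding rows above `current`, copied from `source` rotated 180 degrees.

    After rotation, the source row that touches the shared edge becomes the
    last row, so the last m rotated rows are placed directly above `current`.
    """
    if m < 0 or m > len(source):
        raise ValueError("m must be between 0 and the source height")
    if current and source and len(current[0]) != len(source[0]):
        raise ValueError("faces must have the same width in this sketch")
    rotated = rotate_180(source)
    padding = rotated[len(rotated) - m:]   # empty list when m == 0
    return [list(row) for row in padding] + [list(row) for row in current]


front = [[9, 9, 9]]
top = [[1, 2, 3],
       [4, 5, 6]]
padded_front = pad_top_from_rotated(front, top, 2)
```

A flipped copy could be handled the same way by replacing `rotate_180` with a horizontal or vertical reversal, depending on how the two faces meet in 3D space.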

In order to increase the encoding / decoding efficiency of the 360 degree projection image, the 360 degree projection image may have a rectangular shape. Accordingly, when a non-rectangular padding area is added to the boundary of a face, an inactive area that is not filled may occur within the rectangle. For example, in the example shown in FIGS. 23 and 24, trapezoidal padding areas may be added to the area surrounding the front face, and the remaining area may be set as an inactive area.

A pixel included in the inactive area may have a predefined value or a value calculated based on the bit depth. In one example, a pixel in the inactive area may have the middle value of the range that can be represented by the bit depth. In an 8-bit image, pixels in the inactive area may have the value 128, the middle value of the range representable with 8 bits. In a 10-bit image, pixels in the inactive area may have the value 512, the middle value of the range representable with 10 bits.
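The bit-depth-based fill value can be computed with a single shift, as in the following sketch. The boolean mask marking active samples and the function names are assumptions introduced for illustration.

```python
def inactive_sample_value(bit_depth):
    """Return the middle value of the range representable with `bit_depth` bits."""
    if bit_depth < 1:
        raise ValueError("bit depth must be at least 1")
    return 1 << (bit_depth - 1)


def fill_inactive(image, active_mask, bit_depth):
    """Replace every sample whose mask entry is False with the middle value."""
    fill = inactive_sample_value(bit_depth)
    return [[sample if active else fill
             for sample, active in zip(row, mask_row)]
            for row, mask_row in zip(image, active_mask)]


image = [[10, 20], [30, 40]]
mask = [[True, False], [False, True]]
filled = fill_inactive(image, mask, 8)
```

For 8-bit and 10-bit content this yields 128 and 512 respectively, matching the values given above.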

As another example, a pixel in an inactive area may be determined by a sample located at the boundary of the face or padding area. As an example, pixels in the inactive area can be generated by copying samples located at the boundary of the face or padding area. Alternatively, the pixels in the inactive area may be generated based on an averaging operation or a weighting operation of the samples in the padding area lying on the same horizontal line and the sample in the padding area lying on the same vertical line.

Next, the face overlap padding in the 360-degree projection image based on the OHP technique will be described.

When the 360 degree projection image is generated by alternately arranging the upper face and the lower face as in the example shown in FIG. 12 or 18, encoding / decoding efficiency may be hindered due to the discontinuity of the inter-face image. In order to solve such a problem, a frame packing method that maximizes continuity of an image can be considered.

In FIG. 25, the four upper faces constituting the octahedron are referred to as faces 1, 2, 3, and 4, respectively, and the four lower faces are referred to as faces 5, 6, 7, and 8, respectively.

Considering the continuity between faces, faces 1, 2, and 3, which are spatially continuous in 3D space, may be arranged continuously on the 2D plane, and faces 5, 6, and 7, which are spatially continuous in 3D space, may likewise be arranged continuously on the 2D plane. Further, since face 2 and face 6 are continuous in 3D space, face 2 and face 6 can be arranged so as to be adjacent to each other.

The remaining faces 4 and 8 may each be bisected, and the bisected faces may be placed in the remaining portions of the rectangle. Accordingly, a 360-degree projection image in a rectangular shape can be obtained. For convenience of explanation, the bisected faces are distinguished with the suffixes '-1' and '-2', respectively.

Even if frame packing is performed as in the example shown in FIG. 25, faces that are adjacent on the 2D plane may not be adjacent to each other when the 360 degree projection image is restored to 3D. For example, in the example shown in FIG. 25, faces 8-2 and 5, faces 8-1 and 7, faces 4-2 and 1, and faces 4-1 and 3 are neighbors on the 2D plane but are not neighbors in 3D space.

If a face neighboring the current face on the 2D plane is not a neighbor of the current face when the 360 degree projection image is restored to 3D, face artifacts are likely to occur at the boundary of the current face. To reduce image quality degradation due to face artifacts, padding can be performed at a boundary between faces that are contiguous on the 2D plane but not contiguous in 3D space.

In the example shown in FIG. 26, a padding area may be added between faces 8-2 and 5, between faces 4-2 and 1, between faces 8-1 and 7, and between faces 4-1 and 3, which are adjacent on the 2D plane but not in 3D space.

In addition, a padding area may be added to a border where there is no neighbor face. As an example, a padding area may be added to the upper boundary of face 8-2 and the upper boundary of face 4-2, and a padding area may be added to the lower boundary of face 8-1 and the lower boundary of face 4-1 .

The value of a sample included in the padding area may be generated based on at least one of a sample included in the current face or a sample included in a face neighboring the current face. Here, the neighboring face may include a face neighboring the current face on the 2D plane, or a face neighboring the current face when the 360 degree projection image is projected back into 3D. As an example, the samples included in the padding area located between faces 8-2 and 5 may be generated based on an averaging or weighting operation of the samples included in face 8 and the samples included in face 5.

Alternatively, the padding may be performed by copying a portion of a face that neighbors the current face in 3D space. As an example, the padding areas added to the upper boundaries of faces 8-2 and 4-2 may be generated by copying a portion of face 8-1 and face 4-1, respectively. The padding areas added to the lower boundaries of faces 8-1 and 4-1 may be generated by copying a part of faces 8-2 and 4-2, respectively.

The shape of the padding area may be determined based on the shape of the face that is not adjacent to the current face. As an example, the padding areas added to the upper boundaries of faces 8-2 and 4-2 may have a trapezoidal shape corresponding to a portion of the triangular faces 8-1 and 4-1. The padding areas added to the lower boundaries of faces 8-1 and 4-1 may also have a trapezoidal shape corresponding to a part of faces 8-2 and 4-2.

As non-rectangular padding areas are added, inactive areas can occur in a 360 degree projection image. The sample value in the inactive area may have a predefined value, a value determined by bit depth, or a value determined by an adjacent sample.

FIGS. 23 and 26 illustrate face overlap padding based on the TPP and OHP techniques, respectively, but face overlap padding can also be applied to any projection transformation technique that generates a plurality of faces. For example, face overlap padding may be applied to ISP, CMP, TPP, SSP, ECP, or RSP.

Information regarding the face overlap padding can be signaled through the bit stream. The information on the face overlap padding may include at least one of information indicating whether face overlap padding is used, information indicating whether a padding area exists, information indicating a position of the padding area, or information indicating a padding size .

Although the above-described embodiments have been described on the basis of a series of steps or flowcharts, this does not limit the time-series order of the invention, and the steps may be performed simultaneously or in a different order as necessary. Further, in the above-described embodiments, each of the components (for example, units, modules, etc.) constituting the block diagram may be implemented as a hardware device or software, and a plurality of components may be combined into one hardware device or software. The above-described embodiments may be implemented in the form of program instructions that can be executed through various computer components and recorded in a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape, optical recording media such as CD-ROMs and DVDs, magneto-optical media such as floptical disks, and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, flash memory, and the like. The hardware device may be configured to operate as one or more software modules for performing the processing according to the present invention, and vice versa.

The present invention can be applied to an electronic device capable of encoding / decoding an image.

Claims

Generating a 360-degree projection image including a plurality of faces by projectively transforming a three-dimensional 360-degree image onto a two-dimensional plane;

Adding a padding region to at least one border of a current face of the plurality of faces; and

Encoding the padding related information of the current face,

Wherein the padding region is generated based on a sample included in at least a portion of a face that is not adjacent to the current face in the 360 degree projection image.
The method according to claim 1,

Wherein the padding region is added to the boundary of the current face adjacent to a neighboring face when the neighboring face, which neighbors the current face in the 360 degree projection image, is not adjacent to the current face in the 360 degree image.
The method according to claim 1,

Wherein the padding region is generated by copying a portion of a neighboring face that neighbors the current face in the 360-degree image but is not adjacent to the current face in the 360-degree projection image.
The method according to claim 1,

Wherein the padding region is generated based on an averaging operation or a weighting operation of a sample included in the current face and a sample included in a neighboring face that neighbors the current face in the 360 degree image but is not adjacent to the current face in the 360 degree projection image.
The method according to claim 1,

Wherein the shape of the padding region is determined based on the shape of a neighboring face that neighbors the current face in the 360-degree image but is not adjacent to the current face in the 360-degree projection image.
The method according to claim 5,

Wherein when the padding region has a non-rectangular shape, the value of a sample in an inactive region of the 360 degree projection image, to which the padding region is not added, is determined by the bit depth of the 360 degree projection image.
7. The method according to claim 1,

Wherein the padding region comprises a vertical padding region adjoining an upper or lower boundary of the current face and a horizontal padding region adjoining a left or right boundary of the current face,

Wherein the length of the vertical padding region and the length of the horizontal padding region are different from each other.
8. The method according to claim 1,

Wherein the padding-related information includes at least one of information indicating whether the padding region exists, information indicating a position of the padding region, or information indicating a length of the padding region.
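For illustration, the three kinds of padding-related information can be modeled as a small record that is written to and read from a list of integer fields. The field layout (a presence flag, a 4-bit side mask, then a length) and all names here are hypothetical; the claims do not define a syntax, and the claim requires only at least one of the three items.

```python
from dataclasses import dataclass

# Assumed order of bits in the position mask (hypothetical layout).
SIDES = ("top", "bottom", "left", "right")


@dataclass(frozen=True)
class PaddingInfo:
    present: bool      # whether a padding region exists
    sides: tuple = ()  # position of the padding region
    length: int = 0    # length of the padding region, in samples


def encode_padding_info(info):
    """Serialize PaddingInfo into a list of integer fields."""
    if not info.present:
        return [0]  # position and length are skipped when no padding exists
    mask = sum(1 << i for i, side in enumerate(SIDES) if side in info.sides)
    return [1, mask, info.length]


def decode_padding_info(fields):
    """Inverse of encode_padding_info."""
    if fields[0] == 0:
        return PaddingInfo(False)
    mask, length = fields[1], fields[2]
    sides = tuple(side for i, side in enumerate(SIDES) if mask & (1 << i))
    return PaddingInfo(True, sides, length)
```

Skipping the position and length fields when the flag is zero mirrors the usual practice of signaling dependent syntax elements only when they are needed.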
Decoding the padding related information of the current face;

Decoding the padding region on at least one side boundary of the current face based on the padding-related information; and

Projecting the 360-degree projection image including the decoded current face back into three-dimensional space to generate a 360-degree image,

Wherein the padding region is generated based on a sample included in at least a part of a face that is not adjacent to the current face in the 360 degree projection image.
10. The method of claim 9,

Wherein the padding region is generated by copying a portion of a neighboring face that neighbors the current face in the 360-degree image but is not adjacent to the current face in the 360-degree projection image.
11. The method of claim 9,

Wherein the padding region is generated based on an average operation or a weighted operation of a sample included in the current face and a sample included in a neighboring face that neighbors the current face in the 360-degree image but is not adjacent to the current face in the 360-degree projection image.
12. The method of claim 9,

Wherein the shape of the padding region is determined based on a shape of a neighboring face that neighbors the current face in the 360-degree image but is not adjacent to the current face in the 360-degree projection image.
13. The method of claim 12,

Wherein when the padding region is a non-rectangular shape, a value of a sample in an inactive region generated as the padding region is added is determined by the bit depth of the 360 degree projection image.
14. The method of claim 9,

Wherein the padding region comprises a vertical padding region adjoining an upper or lower boundary of the current face and a horizontal padding region adjoining a left or right boundary of the current face,

Wherein a length of the vertical padding region and a length of the horizontal padding region are different from each other.
15. The method of claim 9,

Wherein the padding-related information includes at least one of information indicating whether the padding region exists, information indicating a position of the padding region, or information indicating a length of the padding region.