WO2020046085A1 - Method and apparatus for processing image signal

Info

Publication number: WO2020046085A1
Application number: PCT/KR2019/011249
Authority: WO
Inventors: 구문모; 살레후메디; 팔루리시달; 김승환; 임재현
Original assignee: 엘지전자 주식회사
Priority date: 2018-09-02
Filing date: 2019-09-02
Publication date: 2020-03-05

Abstract

Embodiments of the present invention provide a method and an apparatus for processing a video signal. A method for processing an image signal, according to one embodiment of the present invention, comprises the steps of: identifying a transform index indicating a transform kernel for the transform of a current block; determining a transform matrix corresponding to the transform index; and generating an array of residual samples by applying the transform matrix to transform coefficients of the current block, wherein the components of the transform matrix are implemented by shift operations on 1 and additions.

Description

Method and apparatus for processing video signal

The present invention relates to a method and apparatus for processing a video signal, and more particularly, to a method and apparatus for encoding or decoding a video signal by performing a transformation.

Compression coding refers to a series of signal processing techniques for transmitting digitized information through a communication line or for storing it in a form suitable for a storage medium. Media such as video, images, and audio may be targets of compression coding. In particular, a technique of performing compression coding on video is referred to as video compression.

Next-generation video content will be characterized by high spatial resolution, high frame rate, and high dimensionality of scene representation. Processing such content will result in a huge increase in terms of memory storage, memory access rate, and processing power.

Therefore, there is a need to design a coding tool for processing next-generation video content more efficiently. In particular, a video codec standard succeeding the high efficiency video coding (HEVC) standard requires, along with a more accurate prediction technique, an efficient transform technique for converting a spatial-domain video signal into the frequency domain.

Embodiments of the present invention provide an image signal processing method and apparatus that reduce the computational complexity of the transform.

The technical problems to be solved by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art from the following description.

In accordance with an aspect of the present invention, there is provided a method of processing an image signal, the method comprising: identifying a transform index indicating a transform kernel for transforming a current block; determining a transform matrix corresponding to the transform index; and generating an array of residual samples by applying the transform matrix to the transform coefficients of the current block, wherein the components of the transform matrix are implemented by shift operations on 1 and additions.

In addition, each of the components of the transform matrix may be implemented as a sum of terms, each of which is a left shift of 1.

Further, the number of terms constituting each of the components of the transform matrix may be set to be smaller than three.

In addition, each of the components of the transform matrix may be set to a value approximated within an allowable error range from DCT-4, DST-7, or DCT-8.

In addition, each of the components of the transform matrix may be determined in consideration of the allowable error range and the number of terms.
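As a rough illustration of the trade-off between the error range and the number of terms, the following Python sketch searches for the sum of at most two left shifts of 1 that lies closest to a target coefficient. The function name `shift_terms`, the exhaustive search, the shift limit, and the sample targets 29, 55, 74, and 84 (the integer coefficients of the HEVC 4x4 DST-7 kernel) are assumptions chosen for illustration and are not taken from this document.

```python
from itertools import combinations_with_replacement


def shift_terms(value, max_terms=2, max_shift=8):
    """Find shift amounts whose powers of two sum closest to `value`.

    Returns (shifts, approx), where approx == sum(1 << s for s in shifts).
    Hypothetical helper: the search strategy is an assumption, not the
    procedure of the source document.
    """
    best_shifts, best_approx = (), 0
    for n in range(1, max_terms + 1):
        for shifts in combinations_with_replacement(range(max_shift + 1), n):
            approx = sum(1 << s for s in shifts)
            # Keep the earliest candidate with the strictly smallest error,
            # so fewer terms win when the error is equal.
            if abs(approx - value) < abs(best_approx - value):
                best_shifts, best_approx = shifts, approx
    return best_shifts, best_approx


if __name__ == "__main__":
    for target in (29, 55, 74, 84):
        shifts, approx = shift_terms(target)
        print(target, shifts, approx, abs(target - approx))
```

Whether a given approximation is acceptable would then depend on comparing the reported error against the allowable error range.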

An apparatus for processing an image signal according to another embodiment of the present invention includes a memory for storing the image signal and a processor coupled to the memory, wherein the processor is configured to identify a transform index indicating a transform kernel for transforming a current block, determine a transform matrix corresponding to the transform index, and apply the transform matrix to transform coefficients of the current block, and wherein the components of the transform matrix are implemented by shift operations on 1 and additions.

According to an embodiment of the present invention, the computational complexity may be reduced by performing a transformation using a shift operation and an addition operation without a multiplication operation.
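A minimal sketch of a multiplication-free matrix-vector product is given below, assuming each matrix component is stored as a sign together with a tuple of shift amounts. This storage layout, the handling of negative components by subtraction, and the names `shift_add_multiply` and `apply_transform` are assumptions for illustration only.

```python
def shift_add_multiply(x, shifts):
    """Compute x * sum(1 << s for s in shifts) using only shifts and additions."""
    total = 0
    for s in shifts:
        total += x << s
    return total


def apply_transform(matrix_terms, coeffs):
    """Apply a matrix whose component (i, j) is given as (sign, shifts).

    Assumption: negative components are handled by subtracting the
    shift-add product instead of adding it.
    """
    out = []
    for row in matrix_terms:
        acc = 0
        for (sign, shifts), c in zip(row, coeffs):
            term = shift_add_multiply(c, shifts)
            acc = acc + term if sign > 0 else acc - term
        out.append(acc)
    return out


if __name__ == "__main__":
    # A 2x2 example matrix [[64, 64], [64, -64]] expressed as shifts of 1.
    matrix = [[(1, (6,)), (1, (6,))],
              [(1, (6,)), (-1, (6,))]]
    print(apply_transform(matrix, [3, -1]))
```

Each product costs one shift per term and at most one addition per term, so limiting a component to fewer than three terms bounds the cost of each component.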

The effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

Brief Description of the Drawings

The accompanying drawings, which are included as part of the detailed description to aid understanding of the present invention, provide embodiments of the present invention and, together with the description, explain the technical features of the present invention.

FIG. 1 shows an example of an image coding system as an embodiment to which the present invention is applied.

FIG. 2 is a schematic block diagram of an encoding apparatus in which a video / image signal is encoded, according to an embodiment to which the present invention is applied.

FIG. 3 is a schematic block diagram of a decoding apparatus in which a video signal is decoded, according to an embodiment to which the present invention is applied.

FIG. 4 is a structural diagram of a content streaming system according to an embodiment to which the present invention is applied.

FIG. 5 shows embodiments to which the present invention may be applied, and is a diagram for describing block division structures by a quadtree (QT) in FIG. 5A, a binary tree (BT) in FIG. 5B, a ternary tree (TT) in FIG. 5C, and a further tree type (AT).

FIGS. 6 and 7 illustrate embodiments to which the present invention is applied. FIG. 6 is a schematic block diagram of a transform and quantization unit, an inverse quantization unit, and an inverse transform unit in the encoding apparatus 100 of FIG. 2, and FIG. 7 is a schematic block diagram of the inverse quantization and inverse transform unit.

FIG. 8 is a flowchart illustrating a process of performing adaptive multiple transform (AMT).

FIG. 9 is a flowchart illustrating a decoding process in which AMT is performed.

FIG. 10 shows three forward scan orders for transform coefficients or transform coefficient blocks applied in the HEVC standard: (a) a diagonal scan, (b) a horizontal scan, and (c) a vertical scan.

FIGS. 11 and 12 illustrate embodiments to which the present invention is applied. FIG. 11 shows the positions of transform coefficients when a forward diagonal scan is applied in the case where a 4x4 RST is applied to a 4x8 block, and FIG. 12 shows an example of merging the valid transform coefficients of two 4x4 blocks into one block.

FIG. 13 is a flowchart illustrating an inverse transform process based on multiple transform selection (MTS) according to an embodiment of the present invention.

FIG. 14 is a block diagram of an apparatus for performing decoding based on MTS according to an embodiment of the present invention.

FIG. 15 shows an example of a decoding flowchart for performing a transform process according to an embodiment of the present invention.

FIG. 16 shows a flowchart for processing a video signal according to an embodiment to which the present invention is applied.

FIG. 17 shows an example of a block diagram of an apparatus for processing a video signal as an embodiment to which the present invention is applied.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The detailed description, which will be given below with reference to the accompanying drawings, is intended to explain exemplary embodiments of the present invention and is not intended to represent the only embodiments in which the present invention may be practiced. The following detailed description includes specific details in order to provide a thorough understanding of the present invention. However, one of ordinary skill in the art appreciates that the present invention may be practiced without these specific details.

In some instances, well-known structures and devices may be omitted or shown in block diagram form centering on the core functions of the structures and devices in order to avoid obscuring the concepts of the present invention.

In addition, the terms used in the present invention are, as far as possible, general terms that are currently in wide use; in specific cases, however, terms arbitrarily selected by the applicant are used. In such cases, since the meaning is clearly described in the detailed description of the relevant part, a term should not be interpreted simply by its name but should be understood and interpreted according to its meaning.

Specific terms used in the following description are provided to help the understanding of the present invention, and the use of the specific terms may be changed to other forms without departing from the technical spirit of the present invention. For example, signals, data, samples, pictures, frames, blocks, etc. may be appropriately replaced and interpreted in each coding process.

Hereinafter, in the present specification, a 'processing unit' refers to a unit in which a process of encoding / decoding such as prediction, transformation, and / or quantization is performed. In addition, the processing unit may be interpreted to include a unit for a luma component and a unit for a chroma component. For example, the processing unit may correspond to a block, a coding unit (CU), a prediction unit (PU), or a transform unit (TU).

In addition, the processing unit may be interpreted as a unit for the luminance component or a unit for the chrominance component. For example, the processing unit may correspond to a CTB, CB, PU or TB for the luminance component. Alternatively, the processing unit may correspond to a CTB, CB, PU or TB for the chrominance component. In addition, the present invention is not limited thereto, and the processing unit may be interpreted to include both a unit for the luminance component and a unit for the chrominance component.

In addition, the processing unit is not necessarily limited to square blocks, but may also be configured in a polygonal form having three or more vertices.

In the following specification, a pixel or a pel is referred to as a sample. In addition, using a sample may mean using a pixel value.

The image coding system may include a source device 10 and a receiving device 20. The source device 10 may transmit the encoded video / image information or data to the receiving device 20 through a digital storage medium or a network in a file or streaming form.

The source device 10 may include a video source 11, an encoding device 12, and a transmitter 13. The receiving device 20 may include a receiver 21, a decoding device 22, and a renderer 23. The encoding device 12 may be called a video / image encoding device, and the decoding device 22 may be called a video / image decoding device. The transmitter 13 may be included in the encoding device 12. The receiver 21 may be included in the decoding device 22. The renderer 23 may include a display unit, and the display unit may be configured as a separate device or an external component.

The video source may acquire the video / image through a process of capturing, synthesizing, or generating the video / image. The video source may comprise a video / image capture device and / or a video / image generation device. The video / image capture device may include, for example, one or more cameras, video / image archives including previously captured video / images, and the like. Video / image generation devices may include, for example, computers, tablets and smartphones, and may (electronically) generate video / images. For example, a virtual video / image may be generated through a computer or the like. In this case, the video / image capturing process may be replaced by a process of generating related data.

The encoding device 12 may encode the input video / image. The encoding device 12 may perform a series of procedures such as prediction, transform, and quantization for compression and coding efficiency. The encoded data (encoded video / image information) may be output in the form of a bitstream.

The transmitter 13 may transmit the encoded video / image information or data output in the form of a bitstream to the receiver of the receiving device through a digital storage medium or a network in the form of a file or streaming. The digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, and the like. The transmitter 13 may include an element for generating a media file through a predetermined file format, and may include an element for transmission through a broadcast / communication network. The receiver 21 may extract the bitstream and transfer it to the decoding device 22.

The decoding device 22 may decode the video / image by performing a series of procedures such as inverse quantization, inverse transformation, and prediction corresponding to the operation of the encoding device 12.

The renderer 23 may render the decoded video / image. The rendered video / image may be displayed through the display unit.

FIG. 2 is a schematic block diagram of an encoding apparatus in which a video / image signal is encoded, according to an embodiment to which the present invention is applied. The encoding apparatus 100 of FIG. 2 may correspond to the encoding apparatus 12 of FIG. 1.

The image divider 110 may divide an input image (or a picture or a frame) input to the encoding apparatus 100 into one or more processing units. As an example, the processing unit may be called a coding unit (CU). In this case, the coding unit may be recursively divided according to a quad-tree binary-tree (QTBT) structure from a coding tree unit (CTU) or a largest coding unit (LCU). For example, one coding unit may be divided into a plurality of coding units of a deeper depth based on a quad tree structure and / or a binary tree structure. In this case, for example, the quad tree structure may be applied first and the binary tree structure may be applied later. Alternatively, the binary tree structure may be applied first. The coding procedure according to the present invention may be performed based on the final coding unit that is no longer split. In this case, the largest coding unit may be used directly as the final coding unit based on coding efficiency according to the image characteristics, or, if necessary, the coding unit may be recursively divided into coding units of deeper depths so that a coding unit of optimal size is used as the final coding unit. Here, the coding procedure may include procedures such as prediction, transform, and reconstruction, which will be described later. As another example, the processing unit may further include a prediction unit (PU) or a transform unit (TU). In this case, the prediction unit and the transform unit may each be split or partitioned from the final coding unit described above. The prediction unit may be a unit of sample prediction, and the transform unit may be a unit for deriving a transform coefficient and / or a unit for deriving a residual signal from the transform coefficient.
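The recursive division described above can be sketched as follows. This is a simplified illustration: the callback `decide`, which stands in for the encoder's split decision, and the mode names `"quad"`, `"hor"`, and `"ver"` are hypothetical, and the sketch does not enforce any ordering constraint between quad tree and binary tree splits.

```python
def split_block(x, y, w, h, decide):
    """Recursively split a block and return the final (unsplit) blocks.

    decide(x, y, w, h) is a hypothetical callback returning "quad", "hor",
    "ver", or None (no further split).
    """
    mode = decide(x, y, w, h)
    if mode == "quad":
        hw, hh = w // 2, h // 2
        parts = [(x, y, hw, hh), (x + hw, y, hw, hh),
                 (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    elif mode == "hor":  # binary split into top and bottom halves
        parts = [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    elif mode == "ver":  # binary split into left and right halves
        parts = [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    else:
        return [(x, y, w, h)]
    leaves = []
    for part in parts:
        leaves.extend(split_block(*part, decide))
    return leaves


def example_decide(x, y, w, h):
    """Toy decision: quad-split the 16x16 root, then split one 8x8 vertically."""
    if w == 16 and h == 16:
        return "quad"
    if (x, y, w, h) == (0, 0, 8, 8):
        return "ver"
    return None


if __name__ == "__main__":
    print(split_block(0, 0, 16, 16, example_decide))
```

In an actual encoder the decision would typically come from a rate-distortion search, while a decoder would read it from signalled split information.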

The term unit may be used interchangeably with terms such as block or area in some cases. In a general case, an M × N block may represent a set of samples or transform coefficients composed of M columns and N rows. A sample may generally represent a pixel or a value of a pixel, and may represent only a pixel / pixel value of a luma component or only a pixel / pixel value of a chroma component. The term sample may be used as a term corresponding to a pixel or a pel of one picture (or image).

The encoding apparatus 100 may generate a residual signal (residual block, residual sample array) by subtracting the prediction signal (predicted block, prediction sample array) output from the inter prediction unit 180 or the intra prediction unit 185 from the input image signal (original block, original sample array), and the generated residual signal is transmitted to the transformer 120. In this case, as shown, the unit that subtracts the prediction signal (prediction block, prediction sample array) from the input image signal (original block, original sample array) in the encoding apparatus 100 may be referred to as a subtraction unit 115. The prediction unit may perform prediction on a block to be processed (hereinafter referred to as a current block) and generate a predicted block including prediction samples for the current block. The prediction unit may determine whether intra prediction or inter prediction is applied on a current block or CU basis. As described later in the description of each prediction mode, the prediction unit may generate various information related to prediction, such as prediction mode information, and transmit the generated information to the entropy encoding unit 190. The information about the prediction may be encoded in the entropy encoding unit 190 and output in the form of a bitstream.
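The subtraction performed by the subtraction unit can be illustrated with a small sketch. The function name `residual_block` and the sample values are hypothetical; blocks are represented as lists of rows.

```python
def residual_block(original, prediction):
    """Return the sample-wise difference original - prediction."""
    return [[o - p for o, p in zip(orig_row, pred_row)]
            for orig_row, pred_row in zip(original, prediction)]


if __name__ == "__main__":
    original = [[52, 55], [61, 59]]    # illustrative sample values
    prediction = [[50, 50], [60, 60]]
    print(residual_block(original, prediction))
```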

The intra predictor 185 may predict the current block by referring to the samples in the current picture. The referenced samples may be located in the neighborhood of the current block or may be located apart from it, according to the prediction mode. In intra prediction, the prediction modes may include a plurality of non-directional modes and a plurality of directional modes. The non-directional modes may include, for example, a DC mode and a planar mode. The directional modes may include, for example, 33 directional prediction modes or 65 directional prediction modes depending on the degree of detail of the prediction direction. However, this is an example, and a larger or smaller number of directional prediction modes may be used depending on the setting. The intra predictor 185 may determine the prediction mode applied to the current block by using the prediction mode applied to the neighboring block.

The inter predictor 180 may derive the predicted block for the current block based on the reference block (reference sample array) specified by the motion vector on the reference picture. In this case, in order to reduce the amount of motion information transmitted in the inter prediction mode, the motion information may be predicted in units of blocks, subblocks, or samples based on the correlation of the motion information between the neighboring block and the current block. The motion information may include a motion vector and a reference picture index. The motion information may further include inter prediction direction (L0 prediction, L1 prediction, Bi prediction, etc.) information. In the case of inter prediction, the neighboring block may include a spatial neighboring block existing in the current picture and a temporal neighboring block present in the reference picture. A reference picture including a reference block and a reference picture including a temporal neighboring block may be the same or different. The temporal neighboring block may be referred to as a collocated reference block, a collocated CU (colCU), or the like. A reference picture including a temporal neighboring block may be referred to as a collocated picture (colPic). For example, the inter prediction unit 180 may construct a motion information candidate list based on neighboring blocks and generate information indicating which candidate is used to derive the motion vector and / or reference picture index of the current block. Inter prediction may be performed based on various prediction modes. For example, in the case of a skip mode and a merge mode, the inter prediction unit 180 may use motion information of a neighboring block as motion information of the current block. In the skip mode, unlike the merge mode, the residual signal may not be transmitted. In the case of motion vector prediction (MVP) mode, the motion vector of the neighboring block is used as a motion vector predictor, and the motion vector of the current block may be indicated by signaling a motion vector difference.
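For the MVP mode, the relationship between the predictor, the signalled difference, and the resulting motion vector can be sketched as below. The candidate list contents, the index, and the function name `reconstruct_mv` are hypothetical and only illustrate the addition of a motion vector difference to a selected predictor.

```python
def reconstruct_mv(candidates, mvp_index, mvd):
    """Return the motion vector as selected predictor plus signalled difference.

    candidates: list of (x, y) motion vector predictors from neighboring blocks
    mvp_index:  index of the selected candidate
    mvd:        (x, y) motion vector difference
    """
    mvp = candidates[mvp_index]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


if __name__ == "__main__":
    candidates = [(4, -2), (3, 0)]  # illustrative predictors
    print(reconstruct_mv(candidates, 1, (-1, 2)))
```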

The prediction signal generated by the inter predictor 180 or the intra predictor 185 may be used to generate a reconstruction signal or may be used to generate a residual signal.

The transformer 120 may apply transform techniques to the residual signal to generate transform coefficients. For example, the transform techniques may include at least one of a discrete cosine transform (DCT), a discrete sine transform (DST), a Karhunen-Loeve transform (KLT), a graph-based transform (GBT), or a conditionally non-linear transform (CNT). Here, GBT refers to a transform obtained from a graph when the relationship information between pixels is represented by the graph. CNT refers to a transform obtained based on a prediction signal generated using all previously reconstructed pixels. In addition, the transform process may be applied to square pixel blocks of the same size, or may be applied to non-square blocks of variable size.

The quantization unit 130 may quantize the transform coefficients and transmit them to the entropy encoding unit 190, and the entropy encoding unit 190 may encode the quantized signal (information about the quantized transform coefficients) and output it as a bitstream. Information about the quantized transform coefficients may be referred to as residual information. The quantization unit 130 may rearrange the quantized transform coefficients in block form into a one-dimensional vector form based on a coefficient scan order, and may generate information about the quantized transform coefficients based on the quantized transform coefficients in the one-dimensional vector form. The entropy encoding unit 190 may perform various encoding methods such as, for example, exponential Golomb, context-adaptive variable length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), and the like. The entropy encoding unit 190 may encode information necessary for video / image reconstruction other than the quantized transform coefficients (for example, values of syntax elements) together or separately. The encoded information (e.g., video / image information) may be transmitted or stored in units of network abstraction layer (NAL) units in the form of a bitstream. The bitstream may be transmitted over a network or may be stored in a digital storage medium. The network may include a broadcasting network and / or a communication network, and the digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, and the like. A transmitter (not shown) for transmitting and / or a storage unit (not shown) for storing the signal output from the entropy encoding unit 190 may be configured as an internal / external element of the encoding apparatus 100, or the transmitter may be a component of the entropy encoding unit 190.
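The rearrangement of a two-dimensional coefficient block into a one-dimensional vector can be sketched with a simple diagonal scan. The traversal direction chosen here (each anti-diagonal visited from bottom-left to top-right) and the function names are assumptions for illustration; the exact scan order of a given codec may differ.

```python
def diagonal_scan_order(width, height):
    """Return (x, y) positions visited along successive anti-diagonals."""
    order = []
    for d in range(width + height - 1):
        for y in range(height - 1, -1, -1):  # bottom-left to top-right
            x = d - y
            if 0 <= x < width:
                order.append((x, y))
    return order


def block_to_vector(block):
    """Flatten block[y][x] into a one-dimensional list in scan order."""
    height, width = len(block), len(block[0])
    return [block[y][x] for x, y in diagonal_scan_order(width, height)]


def vector_to_block(vector, width, height):
    """Inverse of block_to_vector: place scanned values back into a 2-D block."""
    block = [[0] * width for _ in range(height)]
    for value, (x, y) in zip(vector, diagonal_scan_order(width, height)):
        block[y][x] = value
    return block


if __name__ == "__main__":
    print(block_to_vector([[1, 2], [3, 4]]))
```

The inverse function corresponds to the reordering a decoder performs when it restores the two-dimensional block from the scanned coefficients.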

The quantized transform coefficients output from the quantization unit 130 may be used to generate a prediction signal. For example, the residual signal may be reconstructed by applying inverse quantization and inverse transform to the quantized transform coefficients through the inverse quantization unit 140 and the inverse transform unit 150 in the loop. The adder 155 may generate a reconstructed signal (reconstructed picture, reconstructed block, reconstructed sample array) by adding the reconstructed residual signal to the prediction signal output from the inter predictor 180 or the intra predictor 185. If there is no residual for the block to be processed, such as when the skip mode is applied, the predicted block may be used as the reconstructed block. The adder 155 may be called a reconstruction unit or a reconstructed block generation unit. The generated reconstructed signal may be used for intra prediction of the next block to be processed in the current picture, and may be used for inter prediction of the next picture after filtering as described below.

The filtering unit 160 may improve subjective / objective image quality by applying filtering to the reconstruction signal. For example, the filtering unit 160 may generate a modified reconstructed picture by applying various filtering methods to the reconstructed picture, and transmit the modified reconstructed picture to the decoded picture buffer 170. Various filtering methods may include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, bilateral filter, and the like. The filtering unit 160 may generate various information related to the filtering and transmit the generated information to the entropy encoding unit 190 as described later in the description of each filtering method. The filtering information may be encoded in the entropy encoding unit 190 and output in the form of a bitstream.

The modified reconstructed picture transmitted to the decoded picture buffer 170 may be used as a reference picture in the inter predictor 180. In this way, when inter prediction is applied, the encoding apparatus may avoid prediction mismatch between the encoding apparatus 100 and the decoding apparatus, and may also improve encoding efficiency.

The decoded picture buffer 170 may store the modified reconstructed picture for use as a reference picture in the inter prediction unit 180.

FIG. 3 is a schematic block diagram of a decoding apparatus in which a video signal is decoded, according to an embodiment to which the present invention is applied. The decoding device 200 of FIG. 3 may correspond to the decoding device 22 of FIG. 1.

Referring to FIG. 3, the decoding apparatus 200 may include an entropy decoding unit 210, an inverse quantization unit 220, an inverse transform unit 230, an adder 235, a filtering unit 240, a decoded picture buffer (DPB) 250, an inter predictor 260, and an intra predictor 265. The inter predictor 260 and the intra predictor 265 may be collectively called a predictor. That is, the predictor may include the inter predictor 260 and the intra predictor 265. The inverse quantization unit 220 and the inverse transform unit 230 may be collectively called a residual processing unit. That is, the residual processing unit may include the inverse quantization unit 220 and the inverse transform unit 230. The entropy decoding unit 210, the inverse quantization unit 220, the inverse transform unit 230, the adder 235, the filtering unit 240, the inter predictor 260, and the intra predictor 265 described above may be configured by one hardware component (e.g., a decoder or a processor) according to an embodiment. In addition, the decoded picture buffer 250 may be implemented by one hardware component (e.g., a memory or a digital storage medium) according to an embodiment.

When a bitstream including video / image information is input, the decoding apparatus 200 may reconstruct an image corresponding to a process in which the video / image information is processed in the encoding apparatus 100 of FIG. 2. For example, the decoding apparatus 200 may perform decoding using a processing unit applied in the encoding apparatus 100. The processing unit of decoding may thus be a coding unit, for example, and the coding unit may be divided along the quad tree structure and / or the binary tree structure from the coding tree unit or the largest coding unit. The reconstructed video signal decoded and output through the decoding apparatus 200 may be reproduced through the reproducing apparatus.

The decoding apparatus 200 may receive a signal output from the encoding apparatus 100 of FIG. 2 in the form of a bitstream, and the received signal may be decoded through the entropy decoding unit 210. For example, the entropy decoding unit 210 may parse the bitstream to derive information (e.g., video / image information) necessary for image reconstruction (or picture reconstruction). For example, the entropy decoding unit 210 may decode the information in the bitstream based on a coding method such as exponential Golomb coding, CAVLC, or CABAC, and may output values of syntax elements required for image reconstruction and quantized values of transform coefficients for the residual. More specifically, the CABAC entropy decoding method may receive a bin corresponding to each syntax element in the bitstream, determine a context model using the information of the syntax element to be decoded, decoding information of neighboring blocks and the block to be decoded, or information of symbols / bins decoded in a previous step, predict the probability of occurrence of the bin according to the determined context model, and perform arithmetic decoding of the bin to generate a symbol corresponding to the value of each syntax element. In this case, after determining the context model, the CABAC entropy decoding method may update the context model by using the information of the decoded symbol / bin for the context model of the next symbol / bin. Among the information decoded by the entropy decoding unit 210, the information related to prediction is provided to the predictor (the inter predictor 260 and the intra predictor 265), and the residual values on which entropy decoding has been performed by the entropy decoding unit 210, that is, the quantized transform coefficients and related parameter information, may be input to the inverse quantization unit 220. In addition, information on filtering among the information decoded by the entropy decoding unit 210 may be provided to the filtering unit 240. Meanwhile, a receiver (not shown) that receives the signal output from the encoding apparatus 100 may be further configured as an internal / external element of the decoding apparatus 200, or the receiver may be a component of the entropy decoding unit 210.
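Of the coding methods named above, exponential Golomb coding is simple enough to sketch in a few lines. The example below handles the unsigned order-0 code and represents bits as a string of "0" and "1" characters for readability; the function names `ue_encode` and `ue_decode` are hypothetical, and CAVLC and CABAC are not covered.

```python
def ue_encode(value):
    """Encode a non-negative integer as an order-0 exponential Golomb codeword."""
    code = bin(value + 1)[2:]            # binary representation of value + 1
    return "0" * (len(code) - 1) + code  # prefix of len - 1 zeros


def ue_decode(bits, pos=0):
    """Decode one codeword starting at `pos`; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":      # count the leading zeros
        zeros += 1
    start = pos + zeros
    value = int(bits[start:start + zeros + 1], 2) - 1
    return value, start + zeros + 1


if __name__ == "__main__":
    stream = "".join(ue_encode(v) for v in (0, 1, 3))
    pos, decoded = 0, []
    while pos < len(stream):
        value, pos = ue_decode(stream, pos)
        decoded.append(value)
    print(stream, decoded)
```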

The inverse quantization unit 220 may dequantize the quantized transform coefficients and output the transform coefficients. The inverse quantization unit 220 may rearrange the quantized transform coefficients in the form of a two-dimensional block. In this case, reordering may be performed based on the coefficient scan order performed in the encoding apparatus 100. The inverse quantization unit 220 may perform inverse quantization on quantized transform coefficients using a quantization parameter (for example, quantization step size information), and may obtain transform coefficients.
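A minimal sketch of inverse quantization with a uniform step size follows, paired with the corresponding forward quantization so the round trip can be observed. The uniform scalar quantizer, the rounding rule, and the function names are assumptions for illustration; how a step size is derived from a quantization parameter is not shown.

```python
def quantize(coeffs, step):
    """Map transform coefficients to integer levels (round half away from zero)."""
    return [int(abs(c) / step + 0.5) * (1 if c >= 0 else -1) for c in coeffs]


def dequantize(levels, step):
    """Scale quantized levels back to approximate transform coefficients."""
    return [level * step for level in levels]


if __name__ == "__main__":
    levels = quantize([100, -37, 4, 0], 10)
    print(levels, dequantize(levels, 10))
```

The reconstructed coefficients differ from the originals by at most half a step, which is the loss introduced by quantization in this simplified model.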

The inverse transformer 230 inversely transforms the transform coefficients to obtain a residual signal (residual block, residual sample array).

The prediction unit may perform prediction on the current block and generate a predicted block including prediction samples for the current block. The prediction unit may determine whether intra prediction or inter prediction is applied to the current block based on the information about the prediction output from the entropy decoding unit 210, and may determine a specific intra / inter prediction mode.

The intra predictor 265 may predict the current block by referring to the samples in the current picture. The referenced samples may be located in the neighbor of the current block or may be spaced apart according to the prediction mode. In intra prediction, the prediction modes may include a plurality of non-directional modes and a plurality of directional modes. The intra predictor 265 may determine the prediction mode applied to the current block by using the prediction mode applied to the neighboring block.

The inter predictor 260 may derive the predicted block for the current block based on the reference block (reference sample array) specified by the motion vector on the reference picture. In this case, in order to reduce the amount of motion information transmitted in the inter prediction mode, the motion information may be predicted in units of blocks, subblocks, or samples based on the correlation of the motion information between the neighboring block and the current block. The motion information may include a motion vector and a reference picture index. The motion information may further include inter prediction direction (L0 prediction, L1 prediction, Bi prediction, etc.) information. In the case of inter prediction, the neighboring block may include a spatial neighboring block existing in the current picture and a temporal neighboring block present in the reference picture. For example, the inter prediction unit 260 may construct a motion information candidate list based on neighboring blocks and derive a motion vector and / or a reference picture index of the current block based on the received candidate selection information. Inter prediction may be performed based on various prediction modes, and the information about the prediction may include information indicating a mode of inter prediction for the current block.

The adder 235 adds the obtained residual signal to the prediction signal (predicted block, prediction sample array) output from the inter predictor 260 or the intra predictor 265 to generate a reconstructed signal (reconstructed picture, reconstructed block, reconstructed sample array). If there is no residual for the block to be processed, such as when the skip mode is applied, the predicted block may be used as the reconstructed block.
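The behaviour of the adder can be illustrated with a minimal sketch. The function name `reconstruct`, the `bit_depth` parameter, and the clipping to the valid sample range are assumptions made for illustration; the text above only specifies the addition and the skip-mode case.

```python
def reconstruct(pred, resid=None, bit_depth=8):
    """Add residual samples to prediction samples.

    bit_depth and the clipping step are illustrative assumptions.
    """
    if resid is None:
        # No residual (e.g. skip mode): the predicted block is used as is.
        return [row[:] for row in pred]
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]
```

Calling `reconstruct(pred)` without a residual returns a copy of the predicted block, which mirrors the skip-mode case.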

The adder 235 may be called a reconstruction unit or a reconstructed block generation unit. The generated reconstructed signal may be used for intra prediction of the next block to be processed in the current picture, and may be used for inter prediction of the next picture after filtering as described below.

The filtering unit 240 may improve subjective / objective image quality by applying filtering to the reconstruction signal. For example, the filtering unit 240 may generate a modified reconstructed picture by applying various filtering methods to the reconstructed picture, and transmit the modified reconstructed picture to the decoded picture buffer 250. Various filtering methods may include, for example, deblocking filtering, sample adaptive offset (SAO), adaptive loop filter (ALF), bilateral filter, and the like.

The modified reconstructed picture transmitted to the decoded picture buffer 250 may be used as the reference picture by the inter predictor 260.

In the present specification, the embodiments described for the filtering unit 160, the inter prediction unit 180, and the intra prediction unit 185 of the encoding apparatus 100 may be applied equally or correspondingly to the filtering unit 240, the inter prediction unit 260, and the intra prediction unit 265 of the decoding apparatus 200, respectively.

The content streaming system to which the present invention is applied may largely include an encoding server 410, a streaming server 420, a web server 430, a media storage 440, a user device 450, and a multimedia input device 460.

The encoding server 410 compresses content input from multimedia input devices such as a smartphone, a camera, or a camcorder into digital data to generate a bitstream, and transmits the bitstream to the streaming server 420. As another example, when the multimedia input device 460, such as a smartphone, a camera, or a camcorder, directly generates a bitstream, the encoding server 410 may be omitted.

The bitstream may be generated by an encoding method or a bitstream generation method to which the present invention is applied, and the streaming server 420 may temporarily store the bitstream in the process of transmitting or receiving the bitstream.

The streaming server 420 transmits the multimedia data to the user device 450 based on the user request through the web server 430, and the web server 430 serves as an intermediary to inform the user of what service there is. When a user requests a desired service from the web server 430, the web server 430 transmits the request to the streaming server 420, and the streaming server 420 transmits multimedia data to the user. At this time, the content streaming system may include a separate control server, in which case the control server serves to control the command / response between each device in the content streaming system.

The streaming server 420 may receive content from the media storage 440 and/or the encoding server 410. For example, when the content is received from the encoding server 410, the content may be received in real time. In this case, in order to provide a smooth streaming service, the streaming server 420 may store the bitstream for a predetermined time.

Examples of the user device 450 include a mobile phone, a smartphone, a laptop computer, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), a navigation device, a slate PC, a tablet PC, an ultrabook, a wearable device (e.g., a smartwatch, smart glasses, a head mounted display), a digital TV, a desktop computer, digital signage, and the like.

Each server in the content streaming system may operate as a distributed server. In this case, data received by each server may be processed in a distributed manner.

In video coding, one block may be divided on a QT basis. In addition, one sub-block divided by QT may be further divided recursively using QT. Leaf blocks that are no longer split by QT may be split by at least one of BT, TT, or AT. BT may have two types of division: horizontal BT (2NxN, 2NxN) and vertical BT (Nx2N, Nx2N). TT may have two types of division: horizontal TT (2Nx1/2N, 2NxN, 2Nx1/2N) and vertical TT (1/2Nx2N, Nx2N, 1/2Nx2N). AT may have four types of division: horizontal-up AT (2Nx1/2N, 2Nx3/2N), horizontal-down AT (2Nx3/2N, 2Nx1/2N), vertical-left AT (1/2Nx2N, 3/2Nx2N), and vertical-right AT (3/2Nx2N, 1/2Nx2N). Each sub-block produced by BT, TT, or AT may be further divided recursively using BT, TT, or AT.
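The sub-block sizes listed above can be tabulated with a short sketch. The function name `split_sizes` and the mode labels are hypothetical; sizes are given as (width, height) for a parent block of width 2N and height 2N, and the four equal QT sub-blocks are an assumption consistent with the usual quadtree split.

```python
def split_sizes(width, height, mode):
    """Return the (width, height) of each sub-block for one split of a block."""
    w, h = width, height
    table = {
        "QT": [(w // 2, h // 2)] * 4,
        "BT_HOR": [(w, h // 2)] * 2,
        "BT_VER": [(w // 2, h)] * 2,
        "TT_HOR": [(w, h // 4), (w, h // 2), (w, h // 4)],
        "TT_VER": [(w // 4, h), (w // 2, h), (w // 4, h)],
        "AT_HOR_UP": [(w, h // 4), (w, 3 * h // 4)],
        "AT_HOR_DOWN": [(w, 3 * h // 4), (w, h // 4)],
        "AT_VER_LEFT": [(w // 4, h), (3 * w // 4, h)],
        "AT_VER_RIGHT": [(3 * w // 4, h), (w // 4, h)],
    }
    return table[mode]

MODES = ["QT", "BT_HOR", "BT_VER", "TT_HOR", "TT_VER",
         "AT_HOR_UP", "AT_HOR_DOWN", "AT_VER_LEFT", "AT_VER_RIGHT"]
```

For a 16x16 block (N = 8), `split_sizes(16, 16, "TT_HOR")` gives the 2Nx1/2N, 2NxN, 2Nx1/2N pattern, and every mode covers the parent block area exactly.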

FIG. 5A shows an example of QT division. Block A may be divided into four sub-blocks A0, A1, A2, and A3 by QT. The sub-block A1 may be further divided into four sub-blocks B0, B1, B2, and B3 by QT.

FIG. 5B shows an example of BT division. Block B3, which is no longer divided by QT, may be divided by vertical BT (C0, C1) or horizontal BT (D0, D1). Each sub-block, such as block C0, may be further divided recursively by horizontal BT (E0, E1) or vertical BT (F0, F1).

FIG. 5C shows an example of TT division. Block B3, which is no longer divided by QT, may be divided by vertical TT (C0, C1, C2) or horizontal TT (D0, D1, D2). Each sub-block, such as block C1, may be further divided recursively by horizontal TT (E0, E1, E2) or vertical TT (F0, F1, F2).

FIG. 5D shows an example of AT division. Block B3, which is no longer divided by QT, may be divided by vertical AT (C0, C1) or horizontal AT (D0, D1). Each sub-block, such as block C1, may be further divided recursively by horizontal AT (E0, E1) or vertical AT (F0, F1).

Meanwhile, BT, TT, and AT division can be used in combination. For example, a sub-block divided by BT may be divided by TT or AT. In addition, a sub-block divided by TT may be divided by BT or AT, and a sub-block divided by AT may be divided by BT or TT. For example, after horizontal BT division, each sub-block may be divided by vertical BT, or after vertical BT division, each sub-block may be divided by horizontal BT. In this case, the division order differs, but the shape of the final division is the same.

In addition, when a block is divided, the order in which the blocks are searched may be defined in various ways. In general, the search proceeds from left to right and from top to bottom. Searching a block may refer to the order in which it is determined whether each divided sub-block is further divided, the coding order of each sub-block when it is not divided any further, or the search order used when a sub-block refers to information of another neighboring block.

The transform may be performed for each processing unit (or transform block) divided by the division structures illustrated in FIGS. 5A to 5D, and in particular, the transform matrix may be applied separately in the row direction and the column direction. According to an embodiment of the present invention, different transform types may be used depending on the length of the processing unit (or transform block) in the row direction or the column direction.

FIGS. 6 and 7 show embodiments to which the present invention is applied. FIG. 6 is a schematic block diagram of the transform and quantization unit 120/130 and the inverse quantization and inverse transform unit 140/150 in the encoding apparatus 100, and FIG. 7 is a schematic block diagram of the inverse quantization and inverse transform unit 220/230 in the decoding apparatus 200.

Referring to FIG. 6, the transform and quantization unit 120/130 may include a primary transform unit 121, a secondary transform unit 122, and a quantization unit 130. The inverse quantization and inverse transform unit 140/150 may include an inverse quantization unit 140, an inverse secondary transform unit 151, and an inverse primary transform unit 152.

Referring to FIG. 7, the inverse quantization and inverse transform unit 220/230 may include an inverse quantization unit 220, an inverse secondary transform unit 231, and an inverse primary transform unit 232.

In the present invention, the transform may be performed in multiple stages. For example, as shown in FIG. 6, two stages consisting of a primary transform and a secondary transform may be applied, or more transform stages may be used depending on the algorithm. Here, the primary transform may be referred to as a core transform.

The primary transform unit 121 may apply a primary transform to the residual signal, where the primary transform may be defined as a table at the encoder and / or the decoder.

The secondary transform unit 122 may apply a secondary transform on the primary transformed signal, where the secondary transform may be defined as a table at the encoder and / or the decoder.

In one embodiment, a non-separable secondary transform (NSST) may be conditionally applied as a secondary transform. For example, NSST is applied only to intra prediction blocks, and may have a transform set applicable to each prediction mode group.

Here, the prediction mode group may be set based on symmetry with respect to the prediction direction. For example, since prediction mode 52 and prediction mode 16 are symmetric with respect to prediction mode 34 (the diagonal direction), they may form one group to which the same transform set is applied. In this case, when the transform for prediction mode 52 is applied, the input data is transposed before the transform is applied, since prediction mode 52 uses the same transform set as prediction mode 16.
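The grouping by symmetry can be sketched as follows. This is only an illustration: the function names are hypothetical, the mode numbers 0 and 1 for planar and DC are assumed, and the mirroring rule 68 - mode is inferred from the single example in the text (modes 52 and 16 mirrored about mode 34).

```python
PLANAR, DC = 0, 1   # assumed mode numbers for the non-directional modes
DIAGONAL = 34       # mirror axis named in the text

def transform_set_for_mode(mode):
    """Return (representative_mode, transpose_input) for an intra mode."""
    if mode in (PLANAR, DC) or mode <= DIAGONAL:
        return mode, False
    # Modes beyond the diagonal reuse the set of their mirror image,
    # with the input block transposed first.
    return 2 * DIAGONAL - mode, True

def transpose(block):
    """Transpose a block given as a list of rows."""
    return [list(col) for col in zip(*block)]
```

With this rule, mode 52 maps to the transform set of mode 16 with the transpose flag set, while mode 16 itself needs no transposition.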

On the other hand, in the planar mode and the DC mode, since there is no symmetry of the direction, each has a transform set, and the transform set may be composed of two transforms. For the remaining directional mode, three transforms may be configured per transform set.

The quantization unit 130 may perform quantization on the secondary-transformed signal.

The inverse quantization and inverse transform unit 140/150 performs the above-described process in reverse, and redundant description thereof will be omitted.

FIG. 7 shows a schematic block diagram of the inverse quantization and inverse transform unit 220/230 in the decoding apparatus 200.

Referring to FIG. 7, the inverse quantization and inverse transform unit 220/230 may include an inverse quantization unit 220, an inverse secondary transform unit 231, and an inverse primary transform unit 232.

The inverse quantization unit 220 obtains a transform coefficient from the entropy decoded signal using the quantization step size information.

The inverse secondary transform unit 231 performs an inverse secondary transform on the transform coefficients. Here, the inverse secondary transform is the inverse of the secondary transform described with reference to FIG. 6.

The inverse primary transform unit 232 performs inverse primary transform on the inverse secondary transformed signal (or block) and obtains a residual signal. Here, the inverse primary transform indicates an inverse transform of the primary transform described with reference to FIG. 6.

In the present specification, embodiments are described mainly in terms of a separable transform, in which transforms are applied separately in the horizontal direction and the vertical direction, but the transform combination may also be composed of non-separable transforms.

Alternatively, a transform combination may be constructed from a mixture of separable and non-separable transforms. In this case, when a non-separable transform is used, selection of a row/column transform or of a horizontal/vertical direction is unnecessary, and the transform combinations of Table 4 may be used only when a separable transform is selected.

In addition, the schemes proposed in this specification may be applied to either the primary transform or the secondary transform. That is, they are not restricted to one of the two and may be applied to both. Here, the primary transform may mean the transform applied first to the residual block, and the secondary transform may mean a transform applied to the block generated as a result of the primary transform.

First, the encoding apparatus 100 may determine a transform group corresponding to the current block (S805). Here, the transform group may mean the transform group of Table 4, but the present invention is not limited thereto and may be configured with other transform combinations.

The encoding apparatus 100 may perform the transform for each candidate transform combination available in the transform group (S810). Based on the results, the encoding apparatus 100 may determine or select the transform combination with the lowest rate-distortion (RD) cost (S815). The encoding apparatus 100 may encode a transform combination index corresponding to the selected transform combination (S820).
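Steps S810 to S820 amount to an exhaustive search over the candidates of one transform group. The sketch below is illustrative only: the function name, the candidate list, and the cost values are hypothetical, and the RD cost is passed in as a callable rather than computed from an actual transform and quantization.

```python
def select_transform_combination(candidates, rd_cost):
    """Return (index, combination) with the lowest rate-distortion cost."""
    costs = [rd_cost(c) for c in candidates]                    # S810: try every candidate
    best = min(range(len(candidates)), key=costs.__getitem__)   # S815: lowest RD cost
    return best, candidates[best]                               # S820 would encode `best`

# Hypothetical transform group: (horizontal, vertical) kernel pairs.
group = [("DST-7", "DST-7"), ("DCT-8", "DST-7"),
         ("DST-7", "DCT-8"), ("DCT-8", "DCT-8")]
# Toy RD costs, made up for the example.
toy_cost = {group[0]: 41.0, group[1]: 37.5, group[2]: 39.2, group[3]: 44.1}
index, combo = select_transform_combination(group, toy_cost.get)
```

The returned index is what would be written to the bitstream as the transform combination index.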

FIG. 9 is a flowchart illustrating a decoding process in which AMT is performed.

First, the decoding apparatus 200 may determine a transform group for the current block (S905). The decoding apparatus 200 may parse the transform combination index, where the transform combination index may correspond to any one of a plurality of transform combinations in the transform group (S910). The decoding apparatus 200 may derive the transform combination corresponding to the transform combination index (S915). Here, the transform combination may mean a transform combination described in Table 4, but the present invention is not limited thereto. That is, configurations based on other transform combinations are also possible.

The decoding apparatus 200 may perform an inverse transform on the current block based on the transform combination (S920). If the transform combination consists of a row transform and a column transform, the row transform may be applied first, followed by the column transform. However, the present invention is not limited thereto; the order may be reversed, or, when the transform combination consists of non-separable transforms, the non-separable transform may be applied directly.
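The separable inverse transform of step S920 can be sketched as two matrix products. This is a minimal sketch under stated assumptions: an orthonormal DCT-II matrix stands in for whichever kernels the transform combination selects, the row (horizontal) stage is applied before the column (vertical) stage, and all function names are hypothetical.

```python
import math

def dct2_matrix(n):
    """Orthonormal DCT-II matrix; each row is one basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def forward_separable(block, ver, hor):
    """coeffs = ver * block * hor^T (vertical and horizontal kernels)."""
    return matmul(matmul(ver, block), transpose(hor))

def inverse_separable(coeffs, ver, hor):
    """Undo forward_separable for orthonormal kernels."""
    after_rows = matmul(coeffs, hor)             # row (horizontal) inverse first
    return matmul(transpose(ver), after_rows)    # then column (vertical) inverse

block = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = dct2_matrix(4)
coeffs = forward_separable(block, kernel, kernel)
restored = inverse_separable(coeffs, kernel, kernel)
```

Because matrix multiplication is associative, swapping the two stages yields the same result, which is consistent with the remark that the order may be reversed.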

Meanwhile, in another embodiment, the process of determining the transform group and the process of parsing the transform combination index may be performed at the same time.

Example 1: Reduced secondary transform (RST) that can be applied to a 4x4 block

The non-separable transform that can be applied to one 4x4 block is a 16x16 transform. That is, when the data elements constituting the 4x4 block are arranged in row-first or column-first order into a 16x1 vector, the non-separable transform can be applied to that vector. The forward 16x16 transform consists of 16 row-direction transform basis vectors, and taking the inner product of the 16x1 vector with each transform basis vector yields the transform coefficient for that basis vector. Obtaining the transform coefficients for all 16 transform basis vectors is equivalent to multiplying the 16x16 non-separable transform matrix by the input 16x1 vector. The transform coefficients obtained by the matrix product form a 16x1 vector, and the statistical characteristics may differ for each transform coefficient. For example, when a 16x1 transform coefficient vector consists of the 0th to 15th elements, the variance of the 0th element may be greater than the variance of the 15th element. In other words, the closer an element is to the front, the larger its variance, and hence its energy, may be.

Applying the inverse 16x16 non-separable transform to the 16x1 transform coefficient vector restores the original 4x4 block signal (ignoring effects such as quantization and integer arithmetic). If the forward 16x16 non-separable transform is orthonormal, the inverse 16x16 transform is obtained by transposing the forward 16x16 transform matrix. Multiplying the inverse 16x16 non-separable transform matrix by the 16x1 transform coefficient vector yields 16x1 vector data, and arranging that data in the row-first or column-first order used originally restores the 4x4 block signal.

As described above, the elements of the 16x1 transform coefficient vector may have different statistical characteristics. As in the previous example, if the transform coefficients placed toward the front (close to the 0th element) have greater energy, a signal fairly close to the original can be restored by applying the inverse transform to only some of the leading transform coefficients rather than to all of them. For example, suppose the inverse 16x16 non-separable transform consists of 16 column basis vectors. If only L column basis vectors are kept to form a 16xL matrix, and only the L more important transform coefficients are kept (an Lx1 vector, whose elements may be the leading ones as in the previous example), then multiplying the 16xL matrix by the Lx1 vector yields a 16x1 vector that differs little from the original 16x1 input data. Since only L coefficients take part in reconstruction, it is sufficient to obtain an Lx1 transform coefficient vector instead of a 16x1 transform coefficient vector. That is, the L significant transform coefficients are obtained by selecting the L corresponding row-direction transform basis vectors from the forward 16x16 non-separable transform matrix to form an Lx16 transform and multiplying it by the 16x1 input vector.
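The Lx16 forward transform and the 16xL inverse transform can be sketched as follows. This is an illustration under stated assumptions: a 16-point orthonormal DCT-II matrix is used as a stand-in for an actual RST kernel (which the text does not list), the block is flattened in row-first order, and the function names are hypothetical. Keeping all 16 rows corresponds to the full 16x16 transform whose inverse is its transpose.

```python
import math

def orthonormal_kernel(n):
    """Rows form an orthonormal basis (DCT-II), a stand-in for a real kernel."""
    kernel = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        kernel.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                       for j in range(n)])
    return kernel

def forward_rst(vec, kernel, L):
    """Lx16 forward transform: inner products with the first L basis rows."""
    return [sum(k * v for k, v in zip(row, vec)) for row in kernel[:L]]

def inverse_rst(coeffs, kernel):
    """16xL inverse transform: transpose of the kept rows times the L coefficients."""
    L, n = len(coeffs), len(kernel[0])
    return [sum(kernel[i][j] * coeffs[i] for i in range(L)) for j in range(n)]

def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

block = [[10, 11, 12, 13], [11, 12, 13, 14], [12, 13, 14, 15], [13, 14, 15, 16]]
vec = [s for row in block for s in row]   # row-first ordering into a 16x1 vector
kernel = orthonormal_kernel(16)
err4 = squared_error(vec, inverse_rst(forward_rst(vec, kernel, 4), kernel))
err8 = squared_error(vec, inverse_rst(forward_rst(vec, kernel, 8), kernel))
err16 = squared_error(vec, inverse_rst(forward_rst(vec, kernel, 16), kernel))
```

For an orthonormal kernel the reconstruction error equals the energy of the discarded coefficients, so it cannot increase as L grows, and it vanishes when all 16 basis vectors are kept.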

The value of L lies in the range 1 <= L < 16. In general, any L of the 16 transform basis vectors may be selected, but from the viewpoint of encoding and decoding it may be advantageous for coding efficiency to select the transform basis vectors of high importance in terms of signal energy, as in the example above.

Example 2: Setting the application region of 4x4 RST and arranging transform coefficients

FIG. 10 shows three forward scan orders for transform coefficients or transform coefficient blocks (4x4 blocks, coefficient groups (CGs)) applied in the HEVC standard. Residual coding follows the reverse of the scan order in (a), (b), or (c) (that is, coding proceeds from position 16 to position 1). Since one of the three scan orders shown in (a), (b), and (c) is selected according to the intra prediction mode, the scan order for the L transform coefficients may likewise be determined according to the intra prediction mode.

The value of L lies in the range 1 <= L < 16. In general, any L of the 16 transform basis vectors may be selected, but from the viewpoint of encoding and decoding it may be advantageous for coding efficiency to select transform basis vectors of high importance.

When 4x4 RST is applied by dividing the upper-left 4x8 block into 4x4 blocks and the diagonal scan order of FIG. 10(a) is followed, transform coefficients may be located as shown in FIG. 11 if L is 8 (that is, only 8 of the 16 transform coefficients are kept). Only half of each 4x4 block has transform coefficients, and the positions marked with X may be filled with 0 by default. Accordingly, residual coding (for example, the residual coding of HEVC) may be applied on the assumption that L transform coefficients are placed in each 4x4 block in the scan order of FIG. 10(a) and that the remaining (16 - L) positions of each 4x4 block are filled with zeros.
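Placing L coefficients in scan order and zero-filling the rest can be sketched as below. The up-right diagonal scan generated here is an assumption about what FIG. 10(a) depicts, since the figure itself is not reproduced in the text, and the function names are hypothetical.

```python
def diagonal_scan_4x4():
    """Up-right diagonal scan as (row, col) pairs, assumed to match FIG. 10(a)."""
    order = []
    for d in range(7):                        # anti-diagonals with row + col == d
        for row in range(min(d, 3), -1, -1):  # bottom-left to top-right
            col = d - row
            if col <= 3:
                order.append((row, col))
    return order

def place_coefficients(coeffs, scan):
    """Put the L coefficients at the first L scan positions; the rest stay 0."""
    block = [[0] * 4 for _ in range(4)]
    for (row, col), value in zip(scan, coeffs):
        block[row][col] = value
    return block

scan = diagonal_scan_4x4()
block = place_coefficients([1, 2, 3, 4, 5, 6, 7, 8], scan)   # L = 8
```

With L = 8, the eight coefficients occupy the first three anti-diagonals and the first two positions of the fourth, and the remaining eight positions hold zeros.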

In addition, as shown in FIG. 12, the L transform coefficients arranged in each of two 4x4 blocks may be combined into one block. In particular, when L is 8, the transform coefficients of the two 4x4 blocks completely fill one 4x4 block, so no transform coefficients remain in the other block. Since residual coding is then unnecessary for the 4x4 block that has no transform coefficients, the HEVC flag indicating whether residual coding is applied to that block (coded_sub_block_flag) may be coded as 0. The way the positions of the transform coefficients of the two 4x4 blocks are combined may vary. For example, the positions may be combined in any order, but the following methods may also be applied.

1) The transform coefficients of the two 4x4 blocks are combined alternately in scan order. That is, in FIG. 10(a), (b), and (c), when the transform coefficients of the upper block are $c_0^u, c_1^u, \ldots, c_7^u$ and the transform coefficients of the lower block are $c_0^l, c_1^l, \ldots, c_7^l$, they may be combined alternately one at a time as $c_0^u, c_0^l, c_1^u, c_1^l, \ldots, c_7^u, c_7^l$. The order of $c_k^u$ and $c_k^l$ may also be swapped (that is, $c_k^l$ may be set to come first).

2) The transform coefficients of the first 4x4 block may be arranged first, followed by the transform coefficients of the second 4x4 block. In other words, they may be arranged in concatenated form as $c_0^u, \ldots, c_7^u, c_0^l, \ldots, c_7^l$. Naturally, the order may be changed to $c_0^l, \ldots, c_7^l, c_0^u, \ldots, c_7^u$.
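The two combination orders described above can be sketched with two small helpers. The function names and the `lower_first` switch are hypothetical; each input list is assumed to hold the L kept coefficients of one 4x4 block in scan order.

```python
def interleave(upper, lower, lower_first=False):
    """Method 1: alternate the coefficients of the two blocks in scan order."""
    first, second = (lower, upper) if lower_first else (upper, lower)
    combined = []
    for a, b in zip(first, second):
        combined.extend([a, b])
    return combined

def concatenate(upper, lower, lower_first=False):
    """Method 2: all coefficients of one block, then all of the other."""
    if lower_first:
        return list(lower) + list(upper)
    return list(upper) + list(lower)

upper = ["u%d" % k for k in range(8)]   # kept coefficients of the upper block (L = 8)
lower = ["l%d" % k for k in range(8)]   # kept coefficients of the lower block
```

With L = 8, either method yields exactly 16 values, which fill one 4x4 block and leave the other block empty.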

Example 3: Method of coding the NSST index for 4x4 RST

When 4x4 RST is applied as shown in FIG. 11, the (L+1)-th to 16th positions of each 4x4 block may be filled with 0 according to the transform coefficient scan order. Accordingly, if a non-zero value appears among the (L+1)-th to 16th positions in either of the two 4x4 blocks, it can be inferred that 4x4 RST is not applied. If 4x4 RST also has a structure in which one transform is selected from a prepared transform set and applied, as in the NSST of JEM, an index indicating which transform is applied (referred to as the NSST index in this document) may be signaled. Suppose that a decoder obtains the NSST index through bitstream parsing and that this parsing is performed after residual decoding. If residual decoding reveals even one non-zero transform coefficient among the (L+1)-th to 16th positions, it is certain, as described above, that 4x4 RST is not applied, so the decoder may be configured not to parse the NSST index. The signaling cost can thus be reduced by parsing the NSST index only when necessary.

If 4x4 RST is applied to several 4x4 blocks in a specific region as shown in FIG. 11 (the same 4x4 RST may be applied to all of them, or different 4x4 RSTs may be applied), the 4x4 RST (the same or separate) applied to all of those 4x4 blocks may be specified through a single NSST index. Since one NSST index determines the 4x4 RST for all of the 4x4 blocks and whether it is applied, the residual decoding process may check whether a non-zero transform coefficient exists at the (L+1)-th to 16th positions of each of the 4x4 blocks, and the NSST index may be configured not to be coded if a non-zero transform coefficient exists at a disallowed position (the (L+1)-th to 16th positions) in even one 4x4 block.
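The conditional parsing described above can be sketched as follows. This is a sketch under stated assumptions: each block is given as its 16 decoded coefficients already in scan order, `read_index` is a hypothetical callable standing in for the actual bitstream read, and the value 0 is assumed to stand for "4x4 RST not applied".

```python
def rst_may_be_applied(blocks, L):
    """True if positions L+1..16 (scan order) are zero in every 4x4 block."""
    return all(all(c == 0 for c in block[L:]) for block in blocks)

def parse_nsst_index(blocks, L, read_index):
    """Read the NSST index only when no block has a non-zero value past position L."""
    if rst_may_be_applied(blocks, L):
        return read_index()
    return 0  # assumption: 0 stands for "4x4 RST not applied"
```

A single non-zero coefficient at a disallowed position in any one of the blocks is enough to skip the read, which is where the signaling saving comes from.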

The NSST index may be signaled separately for the luminance block and the chrominance block. For the chrominance block, separate NSST indexes may be signaled for Cb and Cr, or one NSST index may be shared (signaled only once). When one NSST index is shared by Cb and Cr, the 4x4 RST indicated by the same NSST index may be applied (the 4x4 RST for Cb and Cr may be the same, or Cb and Cr may have separate 4x4 RSTs even though the NSST index is the same). To apply the conditional signaling described above to the shared NSST index, it is checked whether a non-zero transform coefficient exists at the (L+1)-th to 16th positions of all 4x4 blocks for Cb and Cr, and if even one non-zero transform coefficient is found, signaling of the NSST index may be omitted.

Even when the transform coefficients of two 4x4 blocks are combined as shown in FIG. 12, whether to signal the NSST index may be determined after checking whether a non-zero transform coefficient appears at a position where no valid transform coefficient exists when 4x4 RST is applied. In particular, when L is 8 as in FIG. 12 and one 4x4 block has no valid transform coefficients when 4x4 RST is applied (the block marked with X in FIG. 12(b)), coded_sub_block_flag of that block may be checked, and if its value is 1, the NSST index may be configured not to be signaled.

Example 4: Optimization method for the case where the NSST index is coded before residual coding

When the NSST index is coded before residual coding, whether 4x4 RST is applied is determined in advance, so residual coding can be omitted for positions where the transform coefficient is certain to be 0. Here, whether 4x4 RST is applied may be indicated through the NSST index value (for example, 4x4 RST is not applied when the NSST index is 0) or signaled through a separate syntax element (for example, an NSST flag). For example, if the separate syntax element is an NSST flag, the NSST flag is parsed first to determine whether 4x4 RST is applied, and if the NSST flag value is 1, residual coding can be omitted for the positions where no valid transform coefficient can exist, as described above.

In HEVC, residual coding begins by coding the position of the last non-zero coefficient in the TU. If the NSST index is coded after the last non-zero coefficient position is coded, and the position of the last non-zero coefficient turns out to be one where a non-zero coefficient cannot occur under the assumption that 4x4 RST is applied, the NSST index may be left uncoded and 4x4 RST may not be applied. For example, the positions marked with X in FIG. 11 have no valid transform coefficients when 4x4 RST is applied (for example, they may be filled with 0), so if the last non-zero coefficient is located in the region marked with X, coding of the NSST index can be omitted. If the last non-zero coefficient is not located in the region marked with X, coding of the NSST index may be performed.

When the NSST index is coded conditionally after the last non-zero coefficient position is coded (as described above, NSST index coding is omitted if the position of the last non-zero coefficient is one that is not allowed under the assumption that 4x4 RST is applied), and it is thereby determined whether 4x4 RST is applied, the remaining residual coding may be handled in the following two ways.

1) In case of not applying 4x4 RST, general residual coding is kept as it is. That is, coding is performed under the assumption that there may be non-zero transform coefficients in any position from the last non-zero coefficient position to DC.

2) When 4x4 RST is applied, no transform coefficient exists at a specific position or in a specific 4x4 block (for example, the X positions of FIG. 11), which can therefore be filled with 0 by default, so residual coding for that position or block can be skipped. For example, when a position marked with X in FIG. 11 is reached, coding of sig_coeff_flag (the HEVC flag indicating whether a non-zero coefficient exists at that position) can be omitted. When the transform coefficients of two blocks are combined as shown in FIG. 12, coding of coded_sub_block_flag (present in HEVC) can be omitted for the 4x4 block that is left empty, its value can be inferred to be 0, and the block can be filled with zeros without separate coding.

When the NSST index is coded after the last non-zero coefficient position is coded, NSST index coding may be omitted and 4x4 RST not applied if the x position (Px) and the y position (Py) of the last non-zero coefficient are less than Tx and Ty, respectively. For example, Tx = 1 and Ty = 1 means that NSST index coding is omitted when the last non-zero coefficient is at the DC position. This threshold-based decision on NSST index coding may be applied differently to luminance and chrominance. For example, different Tx and Ty may be applied to luminance and chrominance, or the threshold may be applied to luminance (or chrominance) and not to chrominance (or luminance).

The two methods described above (omitting NSST index coding when the last non-zero coefficient is located in a region where no valid transform coefficient exists, and omitting NSST index coding when the X and Y coordinates of the last non-zero coefficient are each less than a threshold) may be applied together. For example, the threshold check on the coordinates of the last non-zero coefficient position may be performed first, and it may then be checked whether the last non-zero coefficient is located in a region where no valid transform coefficient exists (the order may be changed).
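Applying the two checks in sequence can be sketched for a single 4x4 block. The sketch makes several assumptions that are not stated in the text: positions are (x, y) pairs with x as the column, the region of valid coefficients is taken to be the first L positions of an up-right diagonal scan, and the function names are hypothetical.

```python
def diagonal_scan_xy():
    """Up-right diagonal scan of a 4x4 block as (x, y) pairs (assumed scan)."""
    order = []
    for d in range(7):
        for y in range(min(d, 3), -1, -1):
            x = d - y
            if x <= 3:
                order.append((x, y))
    return order

def should_code_nsst_index(last_xy, L, tx, ty):
    """Decide whether the NSST index is coded, given the last non-zero position."""
    px, py = last_xy
    if px < tx and py < ty:
        return False                       # threshold check (e.g. DC only when Tx = Ty = 1)
    allowed = set(diagonal_scan_xy()[:L])  # positions that can hold valid coefficients
    return (px, py) in allowed             # region check
```

With Tx = Ty = 1 and L = 8, a last non-zero coefficient at the DC position or in the zero-filled half of the block both lead to the index being skipped.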

The methods presented in Example 4 can also be applied to 8x8 RST. That is, if the last non-zero coefficient is located within the top-left 8x8 region but outside its top-left 4x4 region, coding of the NSST index may be omitted; otherwise, coding of the NSST index may be performed. In addition, if the X and Y coordinate values of the last non-zero coefficient position are each less than a certain threshold, coding of the NSST index may be omitted. Naturally, the two methods can be applied together.

Example 5: Applying different NSST index coding and residual coding schemes to luminance and chrominance when RST is applied

The schemes described in Examples 3 and 4 can be applied differently to luminance and chrominance. In other words, the NSST index coding and residual coding schemes may differ for luminance and chrominance. For example, the scheme of Example 4 may be applied to luminance and the scheme of Example 3 to chrominance. Alternatively, the conditional NSST index coding described in Example 3 or Example 4 may be applied to luminance and not to chrominance, or vice versa (applied to chrominance but not to luminance).

Example 6: Reduced adaptive (or explicit) multiple transform

When combinations of multiple transforms (DCT-2, DST-7, DCT-8, DST-1, DCT-5, etc.) are selectively used for the primary transform, as in the EMT (or AMT) of JEM, the worst-case complexity can be significantly reduced by applying the transform only to a predefined region instead of performing the transform in full for every case. For example, based on the reduced transform (RT) method mentioned above, when a primary transform is applied to a pixel block of size MxM, computation is performed only for a transform block of size RxR (M >= R) instead of obtaining a transform block of size MxM. As a result, valid (non-zero) coefficients exist only in the RxR region, and the transform coefficients in the remaining region are regarded as 0 without being computed. Table 1 below shows three examples of a reduced adaptive multiple transform (RAMT) using a predefined R value for each primary transform size.
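Computing only the RxR region of a separable primary transform can be sketched as below. This is an illustration, not the method of Table 1: an orthonormal DCT-II matrix is used as a stand-in kernel, the sizes M = 8 and R = 4 are chosen arbitrarily, and the function names are hypothetical.

```python
import math

def dct2_matrix(n):
    """Orthonormal DCT-II matrix; each row is one basis vector."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
             for j in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def reduced_primary_transform(block, ver, hor, r):
    """Compute only the top-left r x r coefficients; the rest are set to 0."""
    m = len(block)
    tmp = matmul(ver[:r], block)                 # r x m: first r vertical basis rows
    top_left = matmul(tmp, transpose(hor[:r]))   # r x r: first r horizontal basis rows
    out = [[0.0] * m for _ in range(m)]
    for i in range(r):
        for j in range(r):
            out[i][j] = top_left[i][j]
    return out

M, R = 8, 4
block = [[(i * M + j) % 7 for j in range(M)] for i in range(M)]
kernel = dct2_matrix(M)
full = matmul(matmul(kernel, block), transpose(kernel))
reduced = reduced_primary_transform(block, kernel, kernel, R)
```

The reduced result matches the top-left RxR corner of the full MxM transform, while needing R·M·M + R·R·M multiplications instead of 2·M·M·M for the two matrix products.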

Example 7: Reduced adaptive (or explicit) multiple transform based on primary transform

In applying the reduced adaptive multiple transform described in Embodiment 6, the reduced transform factor R may be determined depending on the corresponding primary transform. For example, if the primary transform is DCT-2, its computational complexity is relatively low compared to other primary transforms, so the loss in coding performance can be minimized by not using the reduced transform for small blocks or by using a relatively large R value. For example, different reduced transform factors may be used for DCT-2 and for other transforms, as shown in Table 2.

Example 8: EMT (AMT) core transform mapping depending on intra prediction mode

If EMT_CU_Flag = 1 (or AMT_CU_Flag = 1), one of the four EMT indexes (0, 1, 2, 3) is selected through the 2-bit EMT_TU_index, and the corresponding primary transform is selected based on the given EMT index. Table 3 is an example of a mapping table for selecting the corresponding primary transform for the horizontal and vertical directions based on the EMT index value.

The present invention analyzes the statistics of the primary transforms selected according to the intra prediction mode and proposes a more efficient EMT core transform mapping method based on these statistics. First, Table 4 shows the distribution (%) of EMT_TU_index for each intra prediction mode.

In Table 4, the Hor mode represents angular modes 2 through 33 when JEM is based on 67 intra prediction modes, and the Ver mode represents angular modes 34 through 66.

As shown in the above table, in the Hor mode (2 <= mode <= 33), EMT_TU_index = 2 has a higher probability than EMT_TU_index = 1. Therefore, we propose a mapping table as shown in Table 5.

Table 5 shows an example of using different mappings for Hor mode groups. As described above, the method of deriving the first-order transform based on the EMT_TU_index uses a different mapping table based on the intra prediction direction.

As another example, the present invention proposes a method in which the available EMT_TU_index values are not the same for every intra prediction mode but may be defined differently. For example, in planar mode, EMT_TU_index = 3 occurs with relatively low probability (as does EMT_TU_index > 1 in directional modes), so more efficient coding is possible by excluding such values. Table 6 specifies an example in which the available EMT_TU_index values depend on the intra prediction mode.

Example 9: EMT (AMT) TU index coding

In order to encode the values of EMT_TU_index, which are distributed differently according to the intra prediction modes mentioned in Embodiment 8, two encoding methods are proposed.

1. When binarizing an EMT (AMT) TU index value, the truncated unary method is used rather than the fixed-length binarization method. Table 7 shows examples of fixed-length and truncated unary binarization.

2. When encoding the EMT TU index value through context modeling, the context model is determined using the intra prediction mode information. Table 8 shows some examples. In particular, the intra prediction mode based context modeling method specified in the present invention may be considered together with other factors such as block size.
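The difference between the two binarizations in method 1 can be sketched as follows. This is the generic textbook form of fixed-length and truncated unary binarization, assuming a maximum index of 3; the exact bin strings of Table 7 are not reproduced in this text, and the function names are hypothetical.

```python
def fixed_length_bins(value, num_bits=2):
    """Fixed-length binarization: every index costs num_bits bins."""
    return format(value, "0{}b".format(num_bits))

def truncated_unary_bins(value, c_max=3):
    """Truncated unary: `value` ones, then a terminating zero unless value == c_max."""
    if not 0 <= value <= c_max:
        raise ValueError("value out of range")
    bins = "1" * value
    if value < c_max:
        bins += "0"
    return bins

if __name__ == "__main__":
    for idx in range(4):
        print(idx, fixed_length_bins(idx), truncated_unary_bins(idx))
```

With truncated unary, index 0 costs a single bin, so the binarization pays off when small index values are the most probable ones, as the statistics discussed in Embodiment 8 suggest for some modes.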

Example 10

In this embodiment, the term AMT is redefined as MTS. The relevant syntax and semantics in VVC (Versatile Video Coding Version 4, JVET-K1001-v4.docx) are summarized in Table 9 below. In addition, the present invention and some modifications are described with reference to Example 11 and the description that follows.

In Table 9, sps_mts_intra_enabled_flag and sps_mts_inter_enabled_flag are described in Table 10 below.

The syntax for the transform unit is shown in Tables 11 and 12 below.

The residual coding syntax is shown in Tables 13 and 14 below.

FIG. 13 is a flowchart illustrating an inverse transformation process based on an MTS according to an embodiment of the present invention.

The decoding apparatus 200 to which the present invention is applied may acquire sps_mts_intra_enabled_flag or sps_mts_inter_enabled_flag (S1305). Here, sps_mts_intra_enabled_flag indicates whether cu_mts_flag exists in the residual coding syntax of the intra coding unit. For example, if sps_mts_intra_enabled_flag = 0, cu_mts_flag is not present in the residual coding syntax of the intra coding unit, and if sps_mts_intra_enabled_flag = 1, cu_mts_flag is present in the residual coding syntax of the intra coding unit. Likewise, sps_mts_inter_enabled_flag indicates whether cu_mts_flag exists in the residual coding syntax of the inter coding unit. For example, if sps_mts_inter_enabled_flag = 0, cu_mts_flag is not present in the residual coding syntax of the inter coding unit, and if sps_mts_inter_enabled_flag = 1, cu_mts_flag is present in the residual coding syntax of the inter coding unit.

The decoding apparatus 200 may obtain cu_mts_flag based on sps_mts_intra_enabled_flag or sps_mts_inter_enabled_flag (S1310). For example, when sps_mts_intra_enabled_flag = 1 or sps_mts_inter_enabled_flag = 1, the decoding apparatus 200 may obtain cu_mts_flag. Here, cu_mts_flag indicates whether the MTS is applied to the residual sample of the luma transform block. For example, if cu_mts_flag = 0, the MTS is not applied to the residual sample of the luma transform block, and if cu_mts_flag = 1, the MTS is applied to the residual sample of the luma transform block.

The decoding apparatus 200 may obtain mts_idx based on cu_mts_flag (S1315). For example, when cu_mts_flag = 1, the decoding apparatus 200 may obtain mts_idx. Here, mts_idx indicates which transform kernel is applied to luma residual samples along the horizontal and / or vertical direction of the current transform block.

For example, for mts_idx, at least one of the embodiments described herein may be applied.

The decoding apparatus 200 may derive a transform kernel corresponding to mts_idx (S1320). For example, a transform kernel corresponding to mts_idx may be defined separately as a horizontal transform and a vertical transform.

For example, when MTS is applied to the current block (i.e., cu_mts_flag = 1), the decoding apparatus 200 may configure MTS candidates based on the intra prediction mode of the current block. In this case, the decoding flowchart of FIG. 10 may further include configuring the MTS candidates. The decoding apparatus 200 may determine the MTS candidate applied to the current block by using mts_idx among the configured MTS candidates.

As another example, different transform kernels may be applied to the horizontal transform and the vertical transform. However, the present invention is not limited thereto, and the same transform kernel may be applied to the horizontal transform and the vertical transform.

In operation S1325, the decoding apparatus 200 may perform inverse transformation based on the transform kernel.
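The signalling order of steps S1305 to S1315 can be sketched as below. This is a simplified illustration, not the VVC parsing process: it assumes that the intra flag governs intra coding units and the inter flag governs inter coding units, and the callables `read_cu_mts_flag` and `read_mts_idx` are hypothetical stand-ins for the entropy decoder.

```python
def parse_mts(sps_mts_intra_enabled_flag, sps_mts_inter_enabled_flag,
              is_intra_cu, read_cu_mts_flag, read_mts_idx):
    """Return (cu_mts_flag, mts_idx); mts_idx is None when it is not signalled."""
    # S1305: the sequence-level flag decides whether cu_mts_flag is present.
    if is_intra_cu:
        enabled = sps_mts_intra_enabled_flag == 1
    else:
        enabled = sps_mts_inter_enabled_flag == 1
    if not enabled:
        return 0, None
    # S1310: cu_mts_flag tells whether MTS is applied to the luma residual.
    cu_mts_flag = read_cu_mts_flag()
    if cu_mts_flag != 1:
        return 0, None
    # S1315: mts_idx selects the horizontal/vertical transform kernel pair.
    return 1, read_mts_idx()

if __name__ == "__main__":
    print(parse_mts(1, 0, True, lambda: 1, lambda: 2))
```

Steps S1320 and S1325 (kernel derivation and the inverse transform itself) would consume the returned `mts_idx` and are left out of this sketch.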

In this document, MTS may also be expressed as AMT or EMT. Likewise, mts_idx may also be expressed as AMT_idx, EMT_idx, AMT_TU_idx, or EMT_TU_idx, and the present invention is not limited thereto.

The decoding apparatus 200 to which the present invention is applied may include a sequence parameter obtaining unit 1405, an MTS flag obtaining unit 1410, an MTS index obtaining unit 1415, and a transform kernel deriving unit 1420.

The sequence parameter obtainer 1405 may acquire sps_mts_intra_enabled_flag or sps_mts_inter_enabled_flag. Here, sps_mts_intra_enabled_flag indicates whether cu_mts_flag exists in the residual coding syntax of the intra coding unit, and sps_mts_inter_enabled_flag indicates whether cu_mts_flag exists in the residual coding syntax of the inter coding unit. As a specific example, the description associated with FIG. 10 may be applied.

The MTS flag obtaining unit 1410 may acquire cu_mts_flag based on sps_mts_intra_enabled_flag or sps_mts_inter_enabled_flag. For example, when sps_mts_intra_enabled_flag = 1 or sps_mts_inter_enabled_flag = 1, the MTS flag obtaining unit 1410 may acquire cu_mts_flag. Here, cu_mts_flag indicates whether the MTS is applied to the residual sample of the luma transform block. As a specific example, the description associated with FIG. 10 may be applied.

The MTS index obtainer 1415 may acquire mts_idx based on cu_mts_flag. For example, when cu_mts_flag = 1, the MTS index obtainer 1415 may acquire mts_idx. Here, mts_idx indicates which transform kernel is applied to luma residual samples along the horizontal and / or vertical direction of the current transform block. As a specific example, the description of FIG. 10 may be applied.

The transform kernel deriving unit 1420 may derive the transform kernel corresponding to mts_idx. In addition, the decoding apparatus 200 may perform the inverse transform based on the derived transform kernel.

The transform process for the scaled transform coefficients may be as shown in Table 15 below.

The horizontal transform (trTypeHor) and vertical transform (trTypeVer) according to the MTS index (mts_idx) and the prediction mode (CuPredMode) of the current CU may be set as shown in Table 16 below.

Example 11

In this embodiment, two MTS candidates for the directional modes and four MTS candidates for the non-directional modes may be used as follows.

A) Non-directional modes (DC, planar)

When the MTS index is 0, DST-7 is used for the horizontal and vertical transforms.

When the MTS index is 1, DST-7 is used for the vertical transform and DCT-8 is used for the horizontal transform.

When the MTS index is 2, DCT-8 is used for the vertical transform and DST-7 is used for the horizontal transform.

When the MTS index is 3, DCT-8 is used for the horizontal and vertical transforms.

B) Modes belonging to the horizontal group mode

When the MTS index is 1, DCT-8 is used for the vertical transform and DST-7 is used for the horizontal transform.

C) Modes belonging to the vertical group mode

Here (in VTM 2.0, where 67 modes are used), the horizontal group modes include intra prediction modes 2 to 34, and the vertical group modes include intra prediction modes 35 to 66.

Table 17 below shows a horizontal transform type (trTypeHor) and a vertical transform type (trTypeVer) according to the MTS index (mts_idx) for a non-angular mode.

Table 18 shows a horizontal transform type (trTypeHor) and a vertical transform type (trTypeVer) according to the MTS index (mts_idx) for the horizontal group mode.

Table 19 shows a horizontal transform type (trTypeHor) and a vertical transform type (trTypeVer) according to the MTS index (mts_idx) for the vertical group mode.

Example 12

In this embodiment, three MTS candidates are used for all intra modes.

When the MTS index is 0, DST-7 is used for the horizontal and vertical transforms.

When the MTS index is 1, DST-7 is used for the vertical transform and DCT-8 is used for the horizontal transform.

When the MTS index is 2, DCT-8 is used for the vertical transform and DST-7 is used for the horizontal transform.

Table 20 shows a horizontal transform type (trTypeHor) and a vertical transform type (trTypeVer) according to a prediction mode (CuPredMode) and an MTS index (mts_idx).
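The three candidates listed in this embodiment can be written as a small lookup table. The sketch below uses kernel names rather than numeric trType values, because the numeric entries of Table 20 are not reproduced in this text; the names `EXAMPLE_12_CANDIDATES` and `select_kernels` are hypothetical.

```python
# mts_idx -> (horizontal kernel, vertical kernel), as listed in the text above.
EXAMPLE_12_CANDIDATES = {
    0: ("DST-7", "DST-7"),  # DST-7 for both directions
    1: ("DCT-8", "DST-7"),  # DCT-8 horizontal, DST-7 vertical
    2: ("DST-7", "DCT-8"),  # DST-7 horizontal, DCT-8 vertical
}

def select_kernels(mts_idx):
    """Return (horizontal, vertical) kernel names for an intra block."""
    try:
        return EXAMPLE_12_CANDIDATES[mts_idx]
    except KeyError:
        raise ValueError("mts_idx must be 0, 1, or 2 in this sketch")

if __name__ == "__main__":
    hor, ver = select_kernels(1)
    print("horizontal:", hor, "vertical:", ver)
```

Because the same table serves every intra mode, no mode-dependent branching is needed, in contrast to the grouped mappings of Example 11.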

Example 13

In this embodiment, three MTS candidates are used for all intra prediction modes.

In another embodiment of the present invention, two MTS candidates are used for directional prediction modes and three MTS candidates for non-directional prediction modes.

A) Non-directional modes (DC, planar)

When the MTS index is 0, DST-7 is used for the horizontal and vertical transforms.

When the MTS index is 1, DST-7 is used for the vertical transform and DCT-8 is used for the horizontal transform.

When the MTS index is 2, DCT-8 is used for the vertical transform and DST-7 is used for the horizontal transform.

B) Prediction modes corresponding to the horizontal group mode

When the MTS index is 0, DST-7 is used for the horizontal and vertical transforms.

When the MTS index is 1, DCT-8 is used for the vertical transform and DST-7 is used for the horizontal transform.

C) Prediction modes corresponding to the vertical group mode

When the MTS index is 0, DST-7 is used for the horizontal and vertical transforms.

Here (in VTM 2.0, where 67 modes are used), the horizontal group modes include intra prediction modes 2 to 34, and the vertical group modes include intra prediction modes 35 to 66.

Table 21 shows a horizontal transform type (trTypeHor) and a vertical transform type (trTypeVer) according to the MTS index (mts_idx) for non-directional modes.

Table 22 shows a horizontal transform type (trTypeHor) and a vertical transform type (trTypeVer) according to the MTS index (mts_idx) for the horizontal group mode.

Table 23 shows a horizontal transform type (trTypeHor) and a vertical transform type (trTypeVer) according to the MTS index (mts_idx) for the vertical group mode.

Example 14

In this embodiment, one MTS (e.g., DST-7) is used for all modes. In this case, the MTS index (mts_idx[x][y]) is not required, but the MTS flag cu_mts_flag[x][y] may be used as shown in Table 24 to indicate the transform type.

Example 15

The transform process in VTM 2.0 can be summarized as shown in Table 25 below.

The decoding apparatus 200 to which the present invention is applied may check the transform size nTbS (S1505). Here, the transform size nTbS may be a variable representing a horizontal sample size of scaled transform coefficients.

The decoding apparatus 200 may check the transform kernel type trType (S1510). Here, the transform kernel type trType may be a variable indicating the type of the transform kernel, and various embodiments of the present disclosure may be applied.

The decoding apparatus 200 may perform transform matrix multiplication based on at least one of the transform size nTbS or the transform kernel type (S1515). For example, if the transform kernel type is 0, (Equation 15-1) of Table 25 may be applied, and if the transform kernel type is 1 or 2, (Equation 15-2) of Table 25 may be applied.

As another example, when the transform kernel type is 1 and the transform size is 4, the transform matrix of (Equation 15-3) in Table 25 may be applied when performing the transform matrix multiplication.

As another example, when the transform kernel type is 1 and the transform size is 8, when the transform matrix multiplication is performed, the transform matrix of (Equation 15-4) in Table 25 may be applied.

As another example, if the transform kernel type is 1 and the transform size is 16, the transform matrix shown in (Equation 15-5) of Table 25 may be applied when performing the transform matrix multiplication.

As another example, if the transform kernel type is 1 and the transform size is 32, the predefined transform matrix may be applied.

The decoding apparatus 200 may derive the transform sample based on the transform matrix multiplication (S1520).

Example 16

As described in the previous embodiment, the transform matrix contains a predefined number of specific coefficients, which are repeated across several rows of the transform matrix. For example, the 4x4 DST-7 is defined as in Equation 1 below.

Here, the first row contains four coefficients: 117, 219, 296, and 336. These four coefficients are repeated in the remaining rows. In this embodiment, a coefficient approximation procedure for eliminating multiplication operations is introduced.

For example, Equation 2 below represents transform matrix multiplication.

In Equation 2, transMatrix[i][j] and x[j] represent the transform coefficients and the input values, respectively. The multiplication between a transform coefficient and an input value (transMatrix[i][j] * x[j]) can be replaced by cheaper operations (i.e., shifts and additions) if the transform coefficient is expressed as a sum of powers of two. If the transform coefficient is 65, it can be approximated as 64; the multiplication is then eliminated because 64 * x[j] is equivalent to x[j] << 6. In another example, the transform coefficient 280 may be approximated as 272. Thus, 280 * x[j] can be replaced by 272 * x[j], which is equivalent to (x[j] << 8) + (x[j] << 4).
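The replacement of a multiplication by shifts and additions can be checked with a few lines of code. This is a minimal sketch; `shift_add_multiply` is a hypothetical helper name, and the only facts it relies on are that x << 6 equals 64 * x and that (x << 8) + (x << 4) equals 272 * x.

```python
def shift_add_multiply(x, shifts):
    """Multiply x by the sum of 2**s over the given shift amounts, using only shifts and adds."""
    total = 0
    for s in shifts:
        total += x << s
    return total

if __name__ == "__main__":
    x = 5
    # Coefficient 65 approximated by 64 = 1 << 6 (one term).
    print(shift_add_multiply(x, [6]), 64 * x)
    # Coefficient 280 approximated by 272 = (1 << 8) + (1 << 4) (two terms).
    print(shift_add_multiply(x, [8, 4]), 272 * x)
```

Note the parentheses in the two-term form: in C-like languages and in Python, `+` binds more tightly than `<<`, so `x << 8 + x << 4` without parentheses would not compute the intended value.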

Thus, all transform coefficient values can be approximated by a combination of powers of two so that the multiplication is eliminated efficiently (i.e., using a small number of terms that minimizes the approximation error). Here, the difference between the original transform coefficients and the approximated values needs to be minimized to reduce coding performance loss. In other words, transform coefficient approximation can make energy compaction less efficient because it can damage the orthogonality of the basis vectors. Thus, the difference between the original value and the approximate value should be minimized to maintain orthogonality (and thereby coding performance). Table 26 below shows how the approximation error (Diff in Table 26) changes with the number of terms used for the approximation. For example, with a two-term approximation the maximum error is 30, whereas with a three-term approximation the maximum error can be reduced to 3.

Fewer terms require fewer operations (shifts and additions) but can result in greater coding performance loss. Thus there is a trade-off between coding performance and computational complexity. In the following embodiments, some practical designs for the approximation are introduced.
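The trade-off between the number of terms and the approximation error can be explored with a small exhaustive search. The sketch below is an illustration under stated assumptions: it considers only sums of distinct powers of two (no subtracted terms), limits shifts to 0..10, and treats the coefficient as a non-negative magnitude. The function name is hypothetical, and its results are not claimed to match Table 26, whose construction is not reproduced in this text.

```python
from itertools import combinations

def best_power_of_two_sum(coeff, max_terms, max_shift=10):
    """Return (error, approximation, shifts) for the closest sum of at most
    max_terms distinct powers of two to the non-negative integer coeff."""
    best = (abs(coeff), 0, ())  # using zero terms approximates coeff by 0
    for n in range(1, max_terms + 1):
        for shifts in combinations(range(max_shift, -1, -1), n):
            approx = sum(1 << s for s in shifts)
            err = abs(coeff - approx)
            if err < best[0]:
                best = (err, approx, shifts)
    return best

if __name__ == "__main__":
    for terms in (1, 2, 3):
        print(terms, best_power_of_two_sum(117, terms))
```

Allowing more terms never increases the error in this search, which mirrors the trend described above: extra shifts and additions buy a closer approximation.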

Example 17

In VTM 2.0, when trType is 1 and nTbS is 4, Equation 3 below may be applied.

Four transform coefficients are repeatedly used in the 4x4 transform matrix, and an approximation based on up to three terms may be as shown in Table 27 below.

In consideration of the above approximation, the 4x4 transformation matrix may be approximated as in Equation 4 below.

Example 18

In VTM 2.0, the transform matrix when trType is 1 and nTbS is 8 is expressed by Equation 5 below.

In the above 8x8 matrix, eight transform coefficients are repeatedly used in the matrix, and an approximation based on up to three terms may be as shown in Table 28 below.

In consideration of the above approximation, an 8x8 transformation matrix may be approximated as shown in Equation 6 below.

Example 19

In VTM 2.0, the transform matrix when trType is 1 and nTbS is 16 is expressed by Equation 7 below.

In the 16 × 16 matrix, 16 transform coefficients are repeatedly used in the matrix, and an approximation based on up to three terms may be as shown in Table 29 below.

In consideration of the above approximation, the 16x16 transform matrix may be approximated as shown in Equation 8 below.

Example 20

In VTM 2.0, 32 transform coefficients in a 32x32 matrix are repeatedly used in the matrix, and an approximation based on up to three terms may be as shown in Table 29 below.

In view of the above approximation, the 32x32 transformation matrix can be arranged as described above.

Example 21

In this embodiment, the DST-7 and DCT-8 coefficients are parameterized and summarized. As mentioned in the previous embodiments (Embodiments 16 to 20), the respective coefficients (parameters) may be approximated in the form of Equations 9 to 16 below.

FIG. 16 shows a flowchart for processing a video signal according to an embodiment to which the present invention is applied. The flowchart of FIG. 16 may be performed by the decoding apparatus 200 or the inverse transform unit 230.

In operation S1605, the decoding apparatus 200 confirms a transform index indicating a transform kernel for transforming the current block.

In operation S1610, the decoding apparatus 200 determines a transform matrix corresponding to the transform index. Here, the components of the transform matrix are implemented by shift operations on 1 and additions. For example, each component of the transform matrix may be implemented as a sum of terms, each of which is a left shift of 1. In addition, the number of terms constituting each component of the transform matrix may be set to be less than three. In addition, each component of the transform matrix may be set to a value approximated within an allowable error range from DCT-4, DST-7, or DCT-8. In addition, each component of the transform matrix may be determined in consideration of the allowable error range and the number of terms.

In operation S1615, the decoding apparatus 200 generates an array of residual samples by applying a transform matrix having coefficients approximated by a shift operation and an add operation to the transform coefficients of the current block.
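Steps S1605 to S1615 can be illustrated end to end with a toy example. Everything specific in the sketch below is an assumption made for illustration: the 2x2 matrix, the table `MATRICES_BY_INDEX`, and the representation of each matrix component as a sign plus a tuple of shift amounts are hypothetical and are not the matrices of Equations 9 to 16.

```python
# Each component is (sign, shifts): its magnitude is the sum of (1 << s) over shifts.
# Toy 2x2 matrix equal to 64 * [[1, 1], [1, -1]]; purely illustrative.
MATRICES_BY_INDEX = {
    0: [[(+1, (6,)), (+1, (6,))],
        [(+1, (6,)), (-1, (6,))]],
}

def apply_shift_add_matrix(matrix, coeffs):
    """Compute out[i] = sum_j matrix[i][j] * coeffs[j] using only shifts, adds, subtracts."""
    out = []
    for row in matrix:
        acc = 0
        for (sign, shifts), c in zip(row, coeffs):
            term = 0
            for s in shifts:
                term += c << s          # c * 2**s without a multiplier
            if sign < 0:
                acc -= term
            else:
                acc += term
        out.append(acc)
    return out

def inverse_transform(transform_index, coeffs):
    matrix = MATRICES_BY_INDEX[transform_index]    # S1605/S1610: index -> matrix
    return apply_shift_add_matrix(matrix, coeffs)  # S1615: residual samples

if __name__ == "__main__":
    print(inverse_transform(0, [3, 1]))
```

Keeping the sign outside the shift list means negative matrix entries cost one subtraction instead of an addition, so the operation count per component stays bounded by the number of terms.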

FIG. 17 shows an example of a block diagram of an apparatus for processing a video signal as an embodiment to which the present invention is applied. The video signal processing apparatus of FIG. 17 may correspond to the encoding apparatus of FIG. 1 or the decoding apparatus of FIG. 2.

The image processing apparatus 1700 for processing an image signal includes a memory 1720 storing an image signal and a processor 1710 coupled to the memory and processing the image signal.

The processor 1710 according to an exemplary embodiment of the present invention may be configured with at least one processing circuit for processing an image signal, and may process the image signal by executing instructions for encoding or decoding the image signal. That is, the processor 1710 may encode the original image data or decode the encoded image signal by executing the above-described encoding or decoding methods.

In addition, the processing method to which the present invention is applied can be produced in the form of a program executed by a computer, and can be stored in a computer-readable recording medium. Multimedia data having a data structure according to the present invention can also be stored in a computer-readable recording medium. The computer-readable recording medium includes all kinds of storage devices and distributed storage devices in which computer-readable data is stored. The computer-readable recording medium may include, for example, a Blu-ray disc (BD), a universal serial bus (USB) storage device, a ROM, a PROM, an EPROM, an EEPROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, and an optical data storage device. The computer-readable recording medium also includes media embodied in the form of a carrier wave (for example, transmission over the Internet). In addition, the bitstream generated by the encoding method may be stored in a computer-readable recording medium or transmitted through a wired or wireless communication network.

In addition, embodiments of the present invention may be implemented as a computer program product using program code, and the program code may be executed on a computer according to an embodiment of the present invention. The program code may be stored on a computer-readable carrier.

As described above, the embodiments described herein may be implemented and performed on a processor, microprocessor, controller, or chip. For example, the functional units shown in each drawing may be implemented and performed on a computer, processor, microprocessor, controller, or chip.

In addition, the decoder and encoder to which the present invention is applied may be included in a multimedia broadcasting transmitting and receiving device, a mobile communication terminal, a home cinema video device, a digital cinema video device, a surveillance camera, a video chat device, a real-time communication device such as a video communication device, a mobile streaming device, a storage medium, a camcorder, a video on demand (VoD) service providing device, an over-the-top (OTT) video device, an Internet streaming service providing device, a three-dimensional (3D) video device, a video telephony device, and a medical video device, and may be used to process video signals or data signals. For example, the OTT video device may include a game console, a Blu-ray player, an Internet access TV, a home theater system, a smartphone, a tablet PC, a digital video recorder (DVR), and the like.

The embodiments described above are combinations of the components and features of the present invention in a predetermined form. Each component or feature is to be considered optional unless stated otherwise. Each component or feature may be embodied in a form that is not combined with other components or features. It is also possible to combine some of the components and/or features to constitute an embodiment of the invention. The order of the operations described in the embodiments of the present invention may be changed. Some components or features of one embodiment may be included in another embodiment or may be replaced with corresponding components or features of another embodiment. It is obvious that claims that do not have an explicit citation relationship may be combined to form an embodiment or may be included as new claims by amendment after filing.

Embodiments according to the present invention may be implemented by various means, for example, hardware, firmware, software, or a combination thereof. In the case of a hardware implementation, an embodiment of the present invention may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and the like.

In the case of an implementation by firmware or software, an embodiment of the present invention may be implemented in the form of a module, procedure, function, etc. that performs the functions or operations described above. The software code may be stored in memory and driven by the processor. The memory may be located inside or outside the processor, and may exchange data with the processor by various known means.

It will be apparent to those skilled in the art that the present invention may be embodied in other specific forms without departing from the essential features of the present invention. Accordingly, the above detailed description should not be construed as limiting in all respects but should be considered as illustrative. The scope of the present invention should be determined by reasonable interpretation of the appended claims, and all changes within the equivalent scope of the present invention are included in the scope of the present invention.

As mentioned above, preferred embodiments of the present invention are disclosed for purposes of illustration, and those skilled in the art may improve, change, replace, or add various other embodiments within the technical spirit and scope of the present invention disclosed in the appended claims.

Claims

A method for processing a video signal, the method comprising:

identifying a transform index indicating a transform kernel for transforming a current block;

determining a transform matrix corresponding to the transform index; and

generating an array of residual samples by applying the transform matrix to transform coefficients of the current block,

wherein components of the transform matrix are

implemented by shift operations on 1 and additions.
The method of claim 1,

wherein each of the components of the transform matrix is

implemented by a sum of terms, each consisting of a left shift of 1.
The method of claim 2,

wherein the number of terms constituting each of the components of the transform matrix is set to be less than three.
The method of claim 2,

wherein each of the components of the transform matrix is

set to a value approximated within an allowable error range from discrete cosine transform (DCT)-4, discrete sine transform (DST)-7, or DCT-8.
The method of claim 4,

wherein each of the components of the transform matrix is

determined in consideration of the allowable error range and the number of terms.
An apparatus for processing a video signal, the apparatus comprising:

a memory for storing the video signal; and

a processor coupled with the memory,

wherein the processor is configured to:

identify a transform index indicating a transform kernel for transforming a current block;

determine a transform matrix corresponding to the transform index; and

generate an array of residual samples by applying the transform matrix to transform coefficients of the current block,

wherein components of the transform matrix are

implemented by shift operations on 1 and additions.
The apparatus of claim 6,

wherein each of the components of the transform matrix is

implemented by a sum of terms, each consisting of a left shift of 1.
The apparatus of claim 7,

wherein the number of terms constituting each of the components of the transform matrix is set to be less than three.
The apparatus of claim 7,

wherein each of the components of the transform matrix is

set to a value approximated within an allowable error range from discrete cosine transform (DCT)-4, discrete sine transform (DST)-7, or DCT-8.
The apparatus of claim 9,

wherein each of the components of the transform matrix is

determined in consideration of the allowable error range and the number of terms.