GB2559912A - Video encoding and decoding using transforms - Google Patents

Video encoding and decoding using transforms

Info

Publication number
GB2559912A
Authority
GB
United Kingdom
Prior art keywords
transform
block
skip mode
sub
coefficients
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1806966.6A
Other versions
GB201806966D0 (en
Inventor
Mrak Marta
Gabriellini Andrea
Sprljan Nikola
John Flynn David
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
British Broadcasting Corp
Original Assignee
British Broadcasting Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by British Broadcasting Corp
Priority to GB1806966.6A
Priority claimed from GB1110873.5A (GB2492333B)
Publication of GB201806966D0
Publication of GB2559912A
Legal status: Withdrawn

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/12 Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
    • H04N19/129 Scanning of coding units, e.g. zig-zag scan of transform coefficients or flexible macroblock ordering [FMO]
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • H04N19/186 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
    • H04N19/46 Embedding additional information in the video signal during the compression process
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Encoding (Fig. 5) utilising a spatial transform operating on rows and columns of a block, comprising the steps of establishing a set of transform skip modes, including: transform on rows and columns; transform on rows only; transform on columns only; no transform; selecting one of these said modes; providing an indication of the selected mode for a decoder. Decoding (Fig. 6) video which has been encoded utilising a spatial transform operating on rows and columns of a block with transform skip modes, including: transform on rows and columns; transform on rows only; transform on columns only; no transform; comprising the steps of providing an indication of the transform skip mode and applying inverse transforms in accordance with the mode. The same transform skip mode may be used on all components (luminance, Y; chrominance, U and V) of a YUV block. The transform skip mode may be signalled on a set of blocks.

Description

(56) Documents Cited: XP030080458; XP031697009
(71) Applicant(s): British Broadcasting Corporation (Incorporated in the United Kingdom), Broadcasting House, Portland Place, LONDON, W1A 1AA, United Kingdom
(72) Inventor(s): Marta Mrak; Andrea Gabriellini; Nikola Sprljan; David John Flynn
(74) Agent and/or Address for Service: Mathys & Squire LLP, The Shard, 32 London Bridge Street, LONDON, SE1 9SG, United Kingdom
(58) Field of Search: INT CL H04N; Other: EPODOC, WPI, INSPEC, Internet
(54) Title of the Invention: Video encoding and decoding using transforms
Abstract Title: Video Coding using spatial Transform Skip Modes (TSM)
[Drawings: Figure 1: Encoder; Figure 3; Figure 5: Encoder; Figure 6: Decoder]
VIDEO ENCODING AND DECODING USING TRANSFORMS
FIELD OF THE INVENTION
This invention is related to video compression and decompression systems, and in particular to a framework to adaptively model signal representation between prediction and entropy coding, by the adaptive use of transform functions and related tools, including scaling, quantisation, scanning, and signalling.
BACKGROUND OF THE INVENTION
Transmission and storage of video sequences are employed in several applications, such as TV broadcasts, internet video streaming services and video conferencing.
Video sequences in a raw format require a very large amount of data to be represented, as each second of a sequence may consist of tens of individual frames and each frame is typically represented by at least 8 bits per pixel, with each frame requiring several hundreds or thousands of pixels. In order to minimise the transmission and storage costs, video compression is used on the raw video data. The aim is to represent the original information with as little capacity as possible, i.e. with as few bits as possible. The reduction of the capacity needed to represent a video sequence will affect the video quality of the compressed sequence, i.e. its similarity to the original uncompressed video sequence.
State-of-the-art video encoders, such as AVC/H.264, utilise four main processes to achieve the maximum level of video compression while achieving a desired level of video quality for the compressed video sequence: prediction, transformation, quantisation and entropy coding. The prediction process exploits the temporal and spatial redundancy found in video sequences to greatly reduce the capacity required to represent the data. The mechanism used to predict data is known to both encoder and decoder, thus only an error signal, or residual, must be sent to the decoder to reconstruct the original signal. This process is typically performed on blocks of data (e.g. 8x8 pixels) rather than entire frames. The prediction is typically performed against already reconstructed frames or blocks of reconstructed pixels belonging to the same frame.
The transformation process aims to exploit the correlation present in the residual signals. It does so by concentrating the energy of the signal into few coefficients. Thus the transform coefficients typically require fewer bits to be represented than the pixels of the residual. H.264 uses 4x4 and 8x8 integer type transforms based on the Discrete Cosine Transform (DCT).
The capacity required to represent the data at the output of the transformation process may still be too high for many applications. Moreover, it is not possible to modify the transformation process in order to achieve the desired level of capacity for the compressed signal. The quantisation process takes care of that, by allowing a further reduction of the capacity needed to represent the signal. It should be noted that this process is destructive, i.e. the reconstructed sequence will look different to the original.
The entropy coding process takes all the non-zero quantised transform coefficients and processes them to be efficiently represented into a stream of bits. This requires reading, or scanning, the transform coefficients in a certain order to minimise the capacity required to represent the compressed video sequence.
The above description applies to a video encoder; a video decoder will perform all of the above processes in roughly reverse order. In particular, the transformation process on the decoder side will require the use of the inverse of the transform being used on the encoder. Similarly, entropy coding becomes entropy decoding and the quantisation process becomes inverse scaling. The prediction process is typically performed in exactly the same fashion on both encoder and decoder.
The present invention relates to the transformation part of the coding, thus a more thorough review of the transform process is presented here.
The statistical properties of the residual affect the ability of the transform (i.e. DCT) to compress the energy of the input signal in a small number of coefficients. The residual shows very different statistical properties depending on the quality of the prediction and whether the prediction exploits spatial or temporal redundancy. Other factors affecting the quality of the prediction are the size of the blocks being used and the spatial/temporal characteristics of the sequence being processed.
It is well known that the DCT approaches maximum energy compaction performance for highly correlated Markov-1 signals. The DCT's energy compaction performance starts dropping as the signal correlation becomes weaker. For instance, it is possible to show how the Discrete Sine Transform (DST) can outperform the DCT for input signals with lower adjacent correlation characteristics.
The DCT and DST in image and video coding are normally used on blocks, i.e. 2D signals; this means that a one dimensional transform is first performed in one direction (e.g., horizontal) followed by a one dimensional transform performed in the other direction. As already mentioned, the energy compaction ability of a transform is dependent on the statistics of the input signal. It is possible, and indeed it is also common under some circumstances, for the two-dimensional signal input to the transform to display different statistics along the two vertical and horizontal axes. In this case it would be desirable to choose the best performing transform on each axis. A similar approach has already been attempted within the new ISO and ITU video coding standard under development, High Efficiency Video Coding (HEVC). In particular, a combination of two one dimensional separable transforms such as a DCT-like [2] and DST [3] has been used in the HEVC standard under development.
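The separable structure described above can be sketched as follows. This is a minimal floating-point illustration in Python, not the integer transform used by any codec; the function names (`dct_1d`, `transform_rows`, `transform_columns`, `dct_2d`) are illustrative and do not come from the source.

```python
import math

def dct_1d(v):
    """Orthonormal floating-point DCT-II of a 1D sequence."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        c = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(c * s)
    return out

def transform_rows(block):
    """Apply the 1D transform to every row of the block."""
    return [dct_1d(row) for row in block]

def transform_columns(block):
    """Apply the 1D transform to every column of the block."""
    cols = [dct_1d(list(col)) for col in zip(*block)]
    return [list(row) for row in zip(*cols)]

def dct_2d(block):
    """2D transform as a row pass followed by a column pass."""
    return transform_columns(transform_rows(block))
```

Because the two passes are independent, either one can be replaced by a different 1D transform (or omitted) without changing the other, which is the property the remainder of the description relies on.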
While previous coding standards based on DCT use a two-dimensional transform (2D DCT), newer solutions apply a combination of DCT and DST to intra predicted blocks, i.e. on blocks that are spatially predicted. It has been shown that DST is a better choice than DCT for transformation of rows, when the directional prediction is from a direction that is closer to horizontal than vertical, and, similarly, is a better choice for transformation of columns when the directional prediction is closer to vertical. In the remaining direction (e.g. on rows, when DST is applied on columns), DCT is used.
For implementation purposes, in video coding it is common to use integer approximations of DCT and DST, which will in the rest of this text be simply referred to as DCT and DST. One solution for an integer DCT-like transform uses 16-bit intermediate data representation and is known as partial butterfly. Its main properties are the same (anti)symmetry properties as the DCT, almost orthogonal basis vectors, 16-bit data representation before and after each transform stage, 16-bit multipliers for all internal multiplications and no need for correction of different norms of basis vectors during (de)quantisation.
SUMMARY OF THE INVENTION
The present invention consists in, in one aspect, a method of video encoding utilising a spatial transform operating on rows and columns of a block, comprising the steps of establishing a set of transform skip modes including:
transform on rows and columns;
transform on rows only;
transform on columns only;
no transform;
selecting one of the said modes; and providing an indication of the selected mode for a decoder.
There are described in the following a transform mode and a system to apply a combination of transforms minimising the capacity required to represent a signal for a given output signal quality target. Moreover, a system to signal the selected combination of transform modes is presented.
DETAILED DESCRIPTION OF THE PRESENT INVENTION
The present invention will now be described by way of example with reference to the accompanying drawings, in which:
Figure 1 is a block diagram illustrating a feature on an encoder according to an embodiment of the invention;
Figure 2 is a block diagram illustrating the feature on a decoder according to the embodiment;
Figure 3 is a diagram illustrating an alternative to the known zig-zag scanning approach;
Figure 4 is a diagram illustrating a further alternative scanning approach;
Figure 5 is a block diagram illustrating a feature on an encoder according to a further embodiment of the invention;
Figure 6 is a block diagram illustrating the feature on a decoder according to the embodiment.
This invention presents a mode to perform the transformation process: Transform Skip Mode (TSM). As described above, the most common transform used in video coding is the DCT. Its energy compacting performance depends on the correlation of the residual. It has also been described how the residual can be highly decorrelated, or correlated in one direction only, making the 2D DCT less efficient. It is proposed to skip the transformation process when the encoder makes such a decision in a rate-distortion sense. The selected transform mode must be signalled to the decoder, which then performs a combination of transform/skip transform as defined in the signalling.
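The text says only that the encoder decides "in a rate-distortion sense". One common way to make such a decision is to minimise a Lagrangian cost J = D + lambda * R over the candidate modes. The sketch below assumes that formulation; the function name `select_tsm`, the `candidates` structure and the multiplier `lam` are hypothetical and are not specified in the source.

```python
def select_tsm(candidates, lam):
    """Pick the transform skip mode with the lowest cost J = D + lam * R.

    candidates maps a mode name to a (distortion, rate_in_bits) pair that the
    encoder is assumed to have measured by trial-encoding the block.
    """
    return min(candidates, key=lambda mode: candidates[mode][0] + lam * candidates[mode][1])
```

For example, with `{"TS0": (100.0, 40), "TS1": (90.0, 44), "TS2": (120.0, 30), "TS3": (150.0, 20)}` the choice is `TS1` at `lam = 1` and `TS3` at `lam = 5`, showing how a larger multiplier favours cheaper modes.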
Four transform modes are defined as shown in Table 1.
TSM   Transform on rows   Transform on columns   Note
TS0   +                   +                      2D transform
TS1   +                   -                      1D transform
TS2   -                   +                      1D transform
TS3   -                   -                      no transform
Table 1 - Transform Skip Mode options
TS0 mode corresponds to 2D transform, i.e. 2D DCT. TS1 mode defines application of one dimensional horizontal DCT followed by a transform skip in the orthogonal direction, i.e. transform of columns is skipped. TS2 defines skipping of horizontal transform, while only columns are transformed. Finally, TS3 mode completely skips transforms in both axes, i.e., no transform is applied to the input signal.
Figures 1 and 2 show core transform skip mode block diagrams, for encoder and decoder, respectively. Each transform skip mode is selected with a corresponding (Tf0, Tf1) pair of flags, such that TS0: (1, 1), TS1: (1, 0), TS2: (0, 1) and TS3: (0, 0).
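The flag pairs can be read as two independent switches, one per transform pass. The sketch below assumes, from Table 1, that Tf0 controls the row transform and Tf1 the column transform; `apply_tsm` and its `transform_1d` parameter are illustrative names, and any 1D transform can be passed in.

```python
# Mode -> (Tf0, Tf1); Tf0 is taken to gate the row pass, Tf1 the column pass.
TSM_FLAGS = {"TS0": (1, 1), "TS1": (1, 0), "TS2": (0, 1), "TS3": (0, 0)}

def apply_tsm(block, mode, transform_1d):
    """Apply the row and/or column pass of a separable transform per the mode flags."""
    tf_rows, tf_cols = TSM_FLAGS[mode]
    out = [list(row) for row in block]
    if tf_rows:
        out = [list(transform_1d(row)) for row in out]
    if tf_cols:
        cols = [list(transform_1d(list(col))) for col in zip(*out)]
        out = [list(row) for row in zip(*cols)]
    return out
```

With a toy two-point sum/difference transform, `apply_tsm([[1, 2], [3, 4]], "TS1", lambda v: [v[0] + v[1], v[0] - v[1]])` returns `[[3, -1], [7, -1]]`, while mode `TS3` returns the block unchanged.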
As with any other additional bits in a compressed bit-stream that enable an adaptive option, signalling of the transform skip mode can be costly. Therefore several strategies are devised to maximise the coding efficiency.
The four TSM options can be signalled using carefully designed code words. Those code words do not need to be transmitted for each block; other methods can be used to save bit-rate.
Some possibilities for reducing the signalling cost are listed in the following, each option influencing transform-related parts of the encoder and decoder:
1. The same transform mode used on all components (luminance - Y and chrominance - U and V) of a YUV block; therefore, for Y, U and V collocated blocks only one TSM choice is transmitted.
2. TSM not signalled when all quantised blocks (Y, U and V) have only coefficients with zero values.
3. TSM not signalled for blocks when Y block has only zero-value coefficients, and then 2D DCT is used on U and V components.
4. TSM signalled only for blocks with certain other modes (e.g. bidirectional predicted); otherwise 2D-DCT is applied.
5. Application of TSM signalled on a set of blocks (if on then TS modes signalled for each block from the set).
6. TSM signalled on a set of blocks (e.g. all sub-blocks share the same TSM).
7. TSM signalled if certain other block characteristics are present; e.g. TSM not signalled when Y block has only one non-zero value, and that value is in top-left corner of the block (DC component); in that case 2D-DCT is used for all components.
The four TSM modes (2D transform, two 1D block transforms and skipped transform on a block) can be defined with various code words, e.g. with simple 2-bit words, or with more bits (i.e. with unary codes):
TSM   2 bit signalling   Unary code
TS0   11                 1
TS1   10                 01
TS2   01                 001
TS3   00                 000
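Both code word sets in the table are prefix-free, so a decoder can read bits until a complete code word is recognised. The following sketch illustrates this with plain bit strings; `encode_modes` and `decode_modes` are illustrative helpers, and no entropy coding is modelled.

```python
TWO_BIT = {"TS0": "11", "TS1": "10", "TS2": "01", "TS3": "00"}
UNARY = {"TS0": "1", "TS1": "01", "TS2": "001", "TS3": "000"}

def encode_modes(modes, table):
    """Concatenate the code words for a sequence of modes."""
    return "".join(table[m] for m in modes)

def decode_modes(bits, table):
    """Recover the mode sequence by matching code words bit by bit."""
    inverse = {code: mode for mode, code in table.items()}
    modes, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            modes.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code word")
    return modes
```

The unary-style code spends one bit on TS0 and three bits on TS2 or TS3, so it only pays off when the 2D transform is chosen much more often than the other modes.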
If arithmetic coding is used, each bin of the code word can be encoded with different probability models (i.e. initial context states for each slice), depending on the current block size and on QP value.
On the other hand, if variable length coding is used, TSM code words can be encoded independently of or merged with other syntax elements, to reduce the signalling overhead.
In some approaches, a block is not always transformed at once; rather, options for partitioning it into smaller sub-units are applied, and transforms are applied on each sub-unit. Representative of such a transform structure is the Residual QuadTree (RQT) method. While application of TSM on blocks that are not further divided into smaller units has been assumed so far, TSM can also be applied on such multi-split transform structures. Several options are identified:
1. TSM is decided on a block level, and the same transform choice is applied on each sub-unit.
2. TSM is enabled only at the root level of transformation structure, i.e. when a block is not further partitioned into smaller units when a multi-split structure is enabled; if a block is split into smaller units, each unit is transformed using 2D transform.
3. TSM is decided and signalled for each sub-unit, independently of its depth.
4. TSM is decided and signalled for sub-units, up to a specific depth (size) of units; for lower sub-units, when TSM is not signalled, 2D transform is used.
Coefficients within a block can have different characteristics when the transform is not performed in one or both directions. Therefore different coding strategies can be applied, depending on the transform skip mode, to better compress given coefficients.
When a 2D transform is applied on a block, the resulting coefficients are often grouped towards top-left corner of a block, that is to say they are low-frequency components. Conventional zig-zag scanning is therefore a good choice for coding of such signals.
If only a 1D transform is applied (TS1 or TS2), adaptive scanning can be used. For example, a row-by-row or a column-by-column scanning can be used for TS2 and TS1 cases respectively, since one can expect that the applied transform concentrates the coefficients towards lower frequencies.
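The scan choice for the two 1D cases can be sketched as a function of the mode. The helper name `one_d_scan_order` is hypothetical; positions are (row, column) pairs, and only TS1 and TS2 are handled here.

```python
def one_d_scan_order(mode, rows, cols):
    """Return (row, col) positions in the scan order suggested for a 1D mode."""
    if mode == "TS2":
        # Only columns are transformed, so energy gathers in the top rows:
        # scan row by row.
        return [(r, c) for r in range(rows) for c in range(cols)]
    if mode == "TS1":
        # Only rows are transformed, so energy gathers in the left columns:
        # scan column by column.
        return [(r, c) for c in range(cols) for r in range(rows)]
    raise ValueError("this sketch covers only the 1D modes TS1 and TS2")
```

In both cases the scan visits first the positions that hold the low-frequency outputs of the transform that was kept.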
For the TS3 case, where a transform is not applied in any direction, the zig-zag scan may be used. Alternatively, a new scanning pattern may be employed which takes into account the probability (implicit in the decision to conduct no transform) that non-zero coefficients are not uniformly distributed but are instead grouped in “islands” surrounded by “seas” of zero coefficients.
Thus, in one new arrangement, the positions of the first and the last significant coefficients within a block can be transmitted in the bit-stream, and a zig-zag scanning of coefficients within a block can then be performed. This is shown in Figure 3, where white squares represent coefficients that are not encoded and have zero value, and gray squares represent coefficients that will be encoded, i.e. include significant (non-zero) coefficients, where the first coded coefficient is labelled with F and the last encoded coefficient is labelled with L. Scanning is performed only on rows and columns that belong to the area defined by the first and the last coefficient. In this scanning method, the x and y coordinates of the first coefficient must be the same as or smaller than the x and y coordinates of the last significant coefficient.
This arrangement should lead to highly efficient coding in the case where nonzero coefficients are clustered, but requires the additional complexity in the encoder of determining the positions of the first and the last significant coefficients within a block, together with the need to signal those positions to the decoder.
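A rough sketch of this arrangement follows. It assumes that the area defined by F and L is the rectangle with F at its top-left and L at its bottom-right, and that the encoder derives it as the bounding box of the non-zero coefficients; the exact traversal in Figure 3 may differ. The names `significant_area` and `bounded_zigzag` are illustrative.

```python
def significant_area(block):
    """Return ((r_first, c_first), (r_last, c_last)) enclosing all non-zero
    coefficients, or None if the block is all zero."""
    pos = [(r, c) for r, row in enumerate(block) for c, v in enumerate(row) if v != 0]
    if not pos:
        return None
    rows = [r for r, _ in pos]
    cols = [c for _, c in pos]
    return (min(rows), min(cols)), (max(rows), max(cols))

def bounded_zigzag(first, last):
    """Zig-zag order over the rectangle spanned by first and last (inclusive)."""
    (r0, c0), (r1, c1) = first, last
    rel = [(r, c) for r in range(r1 - r0 + 1) for c in range(c1 - c0 + 1)]
    # Sort by anti-diagonal, alternating the direction on successive diagonals.
    rel.sort(key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]))
    return [(r0 + r, c0 + c) for r, c in rel]
```

Only the positions returned by `bounded_zigzag` would be coded; everything outside the rectangle is known to be zero once F and L have been signalled.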
In an alternative, a double zig-zag scan is used, as depicted in Figure 4, where a block of transform coefficients is represented with sub-blocks of coefficients. Each sub-block is visited in a sub-block level zig-zag scan, and inside each sub-block a zig-zag scan is used. This enables better grouping of non-zero coefficients, which tend to be spatially close.
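The double zig-zag scan can be sketched as two nested zig-zag traversals. The sub-block size is not stated in the text, so it is a parameter here; `zigzag` and `double_zigzag` are illustrative names, and square blocks are assumed for brevity.

```python
def zigzag(rows, cols):
    """(row, col) positions of a rows x cols grid in zig-zag order."""
    pos = [(r, c) for r in range(rows) for c in range(cols)]
    # Sort by anti-diagonal, alternating the direction on successive diagonals.
    return sorted(pos, key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]))

def double_zigzag(size, sub):
    """Scan order for a size x size block split into sub x sub sub-blocks:
    sub-blocks are visited in zig-zag order, and each is zig-zag scanned inside."""
    if size % sub:
        raise ValueError("block size must be a multiple of the sub-block size")
    n = size // sub
    order = []
    for br, bc in zigzag(n, n):
        for r, c in zigzag(sub, sub):
            order.append((br * sub + r, bc * sub + c))
    return order
```

For a 4x4 block with 2x2 sub-blocks, the scan finishes the top-left sub-block before moving to the top-right one, which keeps spatially close coefficients adjacent in the scan.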
It will be desirable, where a decision is taken to skip either or both 1D transforms, to minimise or remove the need to change other elements of the process to accommodate the skipped transform or transforms.
Here, two implementation strategies for the adaptive transform stage are identified:
1) skipping the selected transform of rows/columns, and modifying the quantisation stage.
2) replacing the selected transform of rows/columns by a suitable scaling step and adapting the quantisation step if required.
While the first strategy is suitably presented with Figures 1 and 2, the second strategy that employs scaling is depicted in Figures 5 and 6. One of the main reasons why scaling is performed is to maintain levels of signal, with the highest supported precision, between transform blocks. This is indicated using a dashed line in Figures 5 and 6.
Scaling is performed by multiplying each input pixel value by a factor that is derived from the norm-2 of the corresponding transform vectors (which would be used to obtain a transform coefficient value, at the same position in a row/column, if the transform was selected). Some transforms have close to orthonormal properties of each vector and this property can further simplify the scaling design since a single value can be used to suitably scale the whole row/column on which the transform is skipped.
In the following, scaling strategies are discussed in the context of integer DCT transform with 16 bit intermediate data representation. It will be recognised, however, that this is only an example.
Where a transform is replaced with scaling, the adaptive transform stage is designed in a way that it can be interleaved within the integer DCT transform with 16 bit intermediate data representation, i.e. with the goal to replace some of its parts and to be compatible with the rest of the codec that supports original 2D transform. For example, not applying transform can be used on rows in a way which is still compatible with the part of 2D transform that is applied on columns. This means that quantisation applied for 2D transform can also be used with adaptive transform choice.
The forward transform skip is defined for rows and columns separately.
On samples x of rows the transform skip is applied as:
    y = (x · scale + offset) right shifted by S bits    (a)
where:
    S = M - 1 + DB
    offset = 1 left shifted by (S - 1) bits
    DB = B - 8 is the internal bit-depth increase with 8-bit input
    M = log2(N), where N is the row/column size in number of pixels,
and scale is an unsigned integer multiplier.
On columns, the transform skip is applied as in (a) where x are samples of columns, but with:
    S = M + 6
    offset = 1 left shifted by (S - 1) bits
In this way a bit-width of 16 after each transform stage is ensured, as in the 2D transform.
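Equation (a) and its column variant can be written out as a short Python sketch. The function name `forward_skip` and its parameters are illustrative; `scale` is passed in by the caller, N is assumed to be a power of two, and `bit_depth` stands for B.

```python
def forward_skip(samples, n, scale, stage, bit_depth=8):
    """Scale samples in place of a skipped 1D forward transform, following (a).

    stage is "rows" or "columns" and selects the shift S.
    """
    m = n.bit_length() - 1          # M = log2(N) for power-of-two N
    db = bit_depth - 8              # DB = B - 8
    if stage == "rows":
        s = m - 1 + db
    elif stage == "columns":
        s = m + 6
    else:
        raise ValueError("stage must be 'rows' or 'columns'")
    offset = 1 << (s - 1)           # rounding offset
    # Python's >> floors on negative integers, i.e. it is an arithmetic shift.
    return [(x * scale + offset) >> s for x in samples]
```

For N = 4, a scale of 128 and 8-bit input, `forward_skip([1, -2, 3], 4, 128, "rows")` gives `[64, -128, 192]`, and passing that result through the `"columns"` stage gives `[32, -64, 96]`.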
Scale factors are designed to be near the norm-2 of the related transform vectors (scale^2 = N · 64^2) and to be integers. For block sizes N, the following scale factors are used:
N       4     8     16    32
scale   128   181   256   362
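The tabulated values are consistent with rounding 64 · sqrt(N) to the nearest integer, as the relation scale^2 = N · 64^2 suggests. The one-line helper below (`scale_factor`, an illustrative name) reproduces them; whether the source derived them exactly this way is an assumption.

```python
import math

def scale_factor(n):
    """Nearest integer to the norm-2 target sqrt(N) * 64."""
    return round(math.sqrt(n) * 64)
```

For N = 4 and N = 16 the result is exact (128 and 256); for N = 8 and N = 32 the integer values 181 and 362 slightly underestimate the ideal factor.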
On samples x of columns the inverse transform skip is applied as:
    y = (x · scale + offset) right shifted by S bits
where:
    S = 7
    offset = 1 left shifted by (S - 1) bits
and scale is the same as in the forward skip.
On rows the same transform skip operation is applied, but with:
    S = 12 - DB, where DB is the same as in the forward transform skip.
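The inverse scaling follows the same pattern with different shifts. In this sketch `inverse_skip` is an illustrative name, `scale` is supplied by the caller, and `bit_depth` stands for B.

```python
def inverse_skip(samples, scale, stage, bit_depth=8):
    """Scale samples in place of a skipped 1D inverse transform.

    stage is "columns" (S = 7) or "rows" (S = 12 - DB).
    """
    db = bit_depth - 8              # DB = B - 8
    if stage == "columns":
        s = 7
    elif stage == "rows":
        s = 12 - db
    else:
        raise ValueError("stage must be 'columns' or 'rows'")
    offset = 1 << (s - 1)           # rounding offset
    # Python's >> floors on negative integers, i.e. it is an arithmetic shift.
    return [(x * scale + offset) >> s for x in samples]
```

For N = 4 (scale 128) and 8-bit input, `inverse_skip([32, -64, 96], 128, "columns")` leaves the values at `[32, -64, 96]`, and the `"rows"` stage then returns `[1, -2, 3]`. These are the residuals that the forward row and column scaling maps to `[32, -64, 96]`, so in this case skipping both transforms round-trips exactly.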
In order to save unnecessary processing of pixels, where one or both 1D transforms are skipped, scaling can be moved to quantisation. Moreover (for example), if only the vertical transform is kept, it can be adapted, to ensure maximal 16-bit representation of pixels. This enables full use to be made of the available bit width. Therefore, scaling in quantisation has to be adapted not only because of the scaling related to skipped transform but also related to new scaling within a transform.
TSM = TS0 (2D transform)
Regular 2D transform and corresponding quantisation is used.
TSM = TS1 (1D transform on rows) and TS2 (1D transform on columns)
In both cases the forward transform corresponds to the original transform of rows
    y = (x + offset) right shifted by S bits,    (b)
where:
    x is the original value of residual block,
    S = M - 1 + DB,
    offset = 1 left shifted by (S - 1) bits
and M and DB are the same as in (a).
This ensures 16-bit intermediate data precision.
Quantisation is adapted to take into account the current level of the signal.
TSM = TS3 (no transform)
Residual pixels are directly quantised using the flat matrix so that the level of signal corresponds to the levels of quantised coefficients that are 2D transformed and quantised.
It will be understood that the invention has been described by way of example only and that a wide variety of modifications are possible without departing from the scope of the invention as set forth in the appended claims. Features which are here described in certain combinations may find useful application in other combinations beyond those specifically mentioned and may in certain cases be used alone. For example, the scanning approaches in video coding or decoding where:
positions of the first and the last coefficients to be encoded/decoded within a block are signalled to the decoder and a zig-zag scanning of coefficients is performed between said first and the last coefficients; or
a double zig-zag scan is performed, where a block of transform coefficients is represented with sub-blocks of coefficients; each sub-block is visited in sub-block level zig-zag scan, and inside each sub-block a zig-zag scan is used;
may be useful beyond the case of transform skip mode.
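The two scanning approaches mentioned above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the zig-zag direction follows the common JPEG-style convention, blocks are square, the block size is a multiple of the sub-block size, and all function names are hypothetical.

```python
def zigzag(n):
    """Zig-zag order of (row, col) positions for an n x n block."""
    def key(pos):
        r, c = pos
        d = r + c  # index of the anti-diagonal
        # Odd diagonals run top to bottom, even diagonals bottom to top.
        return (d, r if d % 2 else -r)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def double_zigzag(n, sub):
    """Visit sub-blocks in zig-zag order, and zig-zag inside each sub-block."""
    if n % sub:
        raise ValueError("block size must be a multiple of the sub-block size")
    return [(br * sub + r, bc * sub + c)
            for br, bc in zigzag(n // sub)
            for r, c in zigzag(sub)]


def scan_between(block, order, first, last):
    """Coefficients from scan index first to scan index last, inclusive."""
    return [block[r][c] for r, c in order[first:last + 1]]


# Hypothetical 2 x 2 block: only scan positions 1 and 2 are coded.
coded = scan_between([[1, 2], [3, 4]], zigzag(2), first=1, last=2)
```

For a 4 x 4 block with 2 x 2 sub-blocks, `double_zigzag(4, 2)` first covers the top-left sub-block, then the top-right, bottom-left and bottom-right sub-blocks, each scanned internally in zig-zag order.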

Claims (30)

1. A method of video encoding utilising a spatial transform operating on rows and columns of a block, comprising the steps of establishing a set of transform skip modes including:
transform on rows and columns; transform on rows only; transform on columns only; no transform;
selecting one of the said modes; and providing an indication of the selected mode for a decoder.
2. A method according to Claim 1, wherein mode selection is signalled to a decoder with each mode assigned a codeword.
3. A method according to any one of the preceding claims, where the order in which coefficients within a block are scanned in the entropy coding stage is adapted in accordance with the transform skip mode.
4. A method according to Claim 3, wherein row-by-row scanning is employed where the row transform is skipped and transform of columns is kept, and column-by-column scanning is employed where the column transform is skipped and transform on rows is kept.
5. A method according to Claim 4, wherein positions of the first and the last coefficients to be encoded/decoded within a block are signalled to the decoder and a zig-zag scanning of coefficients is performed between said first and the last coefficients.
6. A method according to Claim 4, wherein, a double zig-zag scan is performed, where a block of transform coefficients is represented with sub-blocks of coefficients; each sub-block is visited in sub-block level zig-zag scan, and inside each sub-block a zig-zag scan is used.
7. A method according to any one of the preceding claims, wherein the same transform skip mode is used on all components (luminance - Y and chrominance - U and V) of a YUV block.
8. A method according to any one of the preceding claims, wherein the transform skip mode is not signalled for blocks having only zero-value coefficients.
9. A method according to Claim 8, wherein the transform skip mode is not signalled when the luminance component has only zero values; in this case 2D transform is used on chroma components.
10. A method according to Claim 8, wherein the transform skip mode is not signalled when the only non-zero-value coefficient of the luminance component is the top-left corner of the block (DC component); in this case 2D transform is used on chroma components.
11. A method according to any one of the preceding claims, wherein the transform skip mode is signalled only for blocks with predefined other modes (e.g. predicted from other frames only).
12. A method according to any one of the preceding claims, wherein the transform skip mode is signalled on a set of blocks.
13. A method according to any one of the preceding claims, where the transform provides options for its partitioning into smaller sub-units and transforms are applied on each sub-units (for example the Residual QuadTree (RQT) method) and wherein:
the transform skip mode is enabled on a block level, and the same transform mode is applied on each sub-unit; or transform skip mode is enabled only on the root level of transformation structure; for lower sub-units, when the transform skip mode is disabled, 2D transform is used; or the transform skip mode is enabled for each sub-unit, independently of its depth; or the transform skip mode is enabled for sub-units, up to specific depth of units; for lower sub-units, when the transform skip mode is disabled, 2D transform is used.
14. A method according to any one of the preceding claims, wherein a quantisation stage is adapted according to the selected transform skip mode.
15. A method according to Claim 14, wherein a quantisation matrix that has the same values in each column is applied when only a horizontal transform is applied, a quantisation matrix that has the same values in each row is applied when only a vertical transform is applied.
16. A method according to any one of Claims 1 to 13, comprising the step of scaling of untransformed coefficients, where the scaling factors are dependent upon the norms of corresponding transform vectors to bring the untransformed coefficients to the same level as transformed coefficients and wherein the same quantisation is used for all transform skip modes.
17. A method according to Claim 16, wherein the same scaling factors are used for all coefficients in scaled row or column.
18. A method according to any one of the preceding claims, wherein the row transform differs in dependence upon whether or not the column transform is skipped and wherein the column transform differs in dependence upon whether or not the row transform is skipped.
19. A method of decoding video which has been encoded utilising a spatial transform operating on rows and columns of a block with transform skip modes including:
transform on rows and columns; transform on rows only; transform on columns only; no transform;
comprising the steps of providing an indication of the transform skip mode and applying inverse transforms in accordance with the mode.
20. A method according to Claim 19, wherein each transform skip mode is assigned a codeword.
21. A method according to Claim 19 or Claim 20, where the order in which coefficients within a block are scanned is adapted in accordance with the transform skip mode.
22. A method according to Claim 21, wherein row-by-row scanning is employed where the row transform is skipped and transform of columns is kept, and column-by-column scanning is employed where the column transform is skipped and transform on rows is kept.
23. A method according to Claim 22, wherein positions of the first and the last coefficients to be encoded/decoded within a block are signalled to the decoder and a zig-zag scanning of coefficients is performed between said first and the last coefficients.
24. A method according to Claim 22, wherein, a double zig-zag scan is performed, where a block of transform coefficients is represented with sub-blocks of coefficients; each sub-block is visited in sub-block level zig-zag scan, and inside each sub-block a zig-zag scan is used.
25. A method according to any one of Claim 19 to Claim 24, wherein the same transform skip mode is used on all components (luminance - Y and chrominance - U and V) of a YUV block.
26. A method according to any one of Claim 19 to Claim 25, wherein the 2D inverse transform is used on chroma components when the luminance component has only zero values.
27. A method according to any one of Claim 19 to Claim 26, wherein the 2D inverse transform is used on chroma components when the only non-zero-value coefficient of the luminance component is the top-left corner of the block (DC component).
28. A method according to any one of Claim 19 to Claim 27, where the transform provides options for its partitioning into smaller sub-units and inverse transforms are applied on each sub-units (for example the Residual QuadTree (RQT) method) and wherein:
the transform skip mode is enabled on a block level, and the same transform mode is applied on each sub-unit; or transform skip mode is enabled only on the root level of transformation structure; for lower sub-units, when the transform skip mode is disabled, 2D inverse transform is used; or the transform skip mode is enabled for each sub-unit, independently of its depth; or
the transform skip mode is enabled for sub-units, up to specific depth of units; for lower sub-units, when the transform skip mode is disabled, 2D inverse transform is used.
29. A method according to any one of Claim 19 to Claim 28, wherein an inverse quantisation stage is adapted according to the selected transform skip mode.
30. A method according to Claim 29, wherein a quantisation matrix that has the same values in each column is applied when only a horizontal inverse transform is applied, a quantisation matrix that has the same values in each row is applied when only a vertical inverse transform is applied.
31. A method according to any one of Claim 19 to Claim 30, wherein the inverse row transform differs in dependence upon whether or not the inverse column transform is skipped and wherein the inverse column transform differs in dependence upon whether or not the inverse row transform is skipped.
32. A computer program product containing instructions causing programmable means to implement a method according to any one of the preceding claims.
33. A video encoder adapted and configured to operate in accordance with any one of Claim 1 to Claim 18.
34. A video decoder adapted and configured to operate in accordance with any one of Claim 19 to Claim 31.
Intellectual Property Office
Application No: GB1806966.6 Examiner: Steve Williams
GB1806966.6A 2011-06-27 2011-06-27 Video encoding and decoding using transforms Withdrawn GB2559912A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1806966.6A GB2559912A (en) 2011-06-27 2011-06-27 Video encoding and decoding using transforms

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1806966.6A GB2559912A (en) 2011-06-27 2011-06-27 Video encoding and decoding using transforms
GB1110873.5A GB2492333B (en) 2011-06-27 2011-06-27 Video encoding and decoding using transforms

Publications (2)

Publication Number Publication Date
GB201806966D0 (en) 2018-06-13
GB2559912A (en) 2018-08-22

Family

ID=62495235

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1806966.6A Withdrawn GB2559912A (en) 2011-06-27 2011-06-27 Video encoding and decoding using transforms

Country Status (1)

Country Link
GB (1) GB2559912A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113170173B (en) * 2018-11-28 2024-04-12 北京字节跳动网络技术有限公司 Improved method for transformation quantization or quantization bypass mode

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FATIH KAMISLI; JAE S LIM: "Video compression with 1-D directional transforms in H.264/AVC", ACOUSTICS SPEECH AND SIGNAL PROCESSING (ICASSP), 2010 IEEE INTERNATIONAL CONFERENCE, 14 March 2010 (2010-03-14), IEEE, Piscataway, NJ, USA, pages 738 - 741, XP031697009 *
YUMI SOHN; WOO JIN HAN: "One Dimensional Transform For H.264 Based Intra Coding (Abstract)", 26. PICTURE CODING SYMPOSIUM;7-11-2007 - 9-11-2007, 7 November 2007 (2007-11-07) - 9 November 2007 (2007-11-09), Lisbon, XP030080458 *

Also Published As

Publication number Publication date
GB201806966D0 (en) 2018-06-13

Similar Documents

Publication Publication Date Title
EP2652954B1 (en) Video encoding and decoding using transforms
US10341656B2 (en) Image decoding method using intra prediction mode
CN107181942B (en) Image decoding apparatus
KR102393180B1 (en) Method and apparatus for generating reconstruction block
US20100061454A1 (en) Method of processing a video signal
WO2004038921A2 (en) Method and system for supercompression of compressed digital video
EP2753081A2 (en) Image encoding/decoding method for rate-distortion optimization and device for performing same
KR20050072487A (en) Apparatus and method for multiple description encoding
CN110324639B (en) Techniques for efficient entropy encoding of video data
EP3813372A1 (en) Sparse matrix representation using a boundary of non-zero coefficients
CN110708547A (en) Efficient entropy coding group grouping method for transform mode
GB2559912A (en) Video encoding and decoding using transforms
CN1246019A (en) System for producing decode low-resolution video signal from code high-resolution video signal
WO2024081011A1 (en) Filter coefficient derivation simplification for cross-component prediction

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)