AU2019203981A1 - Method, apparatus and system for encoding and decoding a block of video samples - Google Patents

Method, apparatus and system for encoding and decoding a block of video samples

Info

Publication number
AU2019203981A1
Authority
AU
Australia
Prior art keywords
block
coefficient
transform block
transform
residual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2019203981A
Inventor
Iftekhar Ahmed
Jonathan GAN
Christopher James ROSEWARNE
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU2019203981A priority Critical patent/AU2019203981A1/en
Publication of AU2019203981A1 publication Critical patent/AU2019203981A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/625 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/96 Tree coding, e.g. quad-tree coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

METHOD, APPARATUS AND SYSTEM FOR ENCODING AND DECODING A BLOCK OF VIDEO SAMPLES

A system and method of decoding a transform block of an image frame from a video bitstream. The method comprises determining a state variable for a residual coefficient of the transform block, the state variable being determined either: based on a previous decoded residual coefficient of the transform block (1590), or using a predetermined state, the predetermined state being used based on the position of the residual coefficient in the transform block being one of a plurality of predetermined positions in the transform block (15100). The method also comprises decoding the residual coefficient of the transform block according to the determined state variable; inverse quantising the residual coefficient of the transform block according to the determined state variable to produce a reconstructed transform coefficient; and decoding the transform block by inverse transforming the reconstructed transform coefficient.

Description

METHOD, APPARATUS AND SYSTEM FOR ENCODING AND DECODING A BLOCK OF VIDEO SAMPLES

TECHNICAL FIELD
[0001] The present invention relates generally to digital video signal processing and, in particular, to a method, apparatus and system for encoding and decoding a block of video samples. The present invention also relates to a computer program product including a computer readable medium having recorded thereon a computer program for encoding and decoding a block of video samples.
BACKGROUND
[0002] Many applications for video coding currently exist, including applications for transmission and storage of video data. Many video coding standards have also been developed and others are currently in development. Recent developments in video coding standardisation have led to the formation of a group called the "Joint Video Experts Team" (JVET). The Joint Video Experts Team (JVET) includes members of Study Group 16, Question 6 (SG16/Q6) of the Telecommunication Standardisation Sector (ITU-T) of the International Telecommunication Union (ITU), also known as the "Video Coding Experts Group" (VCEG), and members of the International Organisation for Standardisation / International Electrotechnical Commission Joint Technical Committee 1 / Subcommittee 29 / Working Group 11 (ISO/IEC JTC1/SC29/WG11), also known as the "Moving Picture Experts Group" (MPEG).
[0003] The Joint Video Experts Team (JVET) issued a Call for Proposals (CfP), with responses analysed at its 10th meeting in San Diego, USA. The submitted responses demonstrated video compression capability significantly outperforming that of the current state-of-the-art video compression standard, i.e. "high efficiency video coding" (HEVC). On the basis of this outperformance it was decided to commence a project to develop a new video compression standard, to be named 'versatile video coding' (VVC). VVC is anticipated to address ongoing demand for ever-higher compression performance, especially as video formats increase in capability (e.g., with higher resolution and higher frame rate) and address increasing market demand for service delivery over WANs, where bandwidth costs are relatively high. At the same time, VVC must be implementable in contemporary silicon processes and offer an acceptable trade-off between the achieved performance and the implementation cost (for example, in terms of silicon area, CPU processor load, memory utilisation and bandwidth).
[0004] Video data includes a sequence of frames of image data, each of which includes one or more colour channels. Generally one primary colour channel and two secondary colour channels are needed. The primary colour channel is generally referred to as the 'luma' channel and the secondary colour channel(s) are generally referred to as the 'chroma' channels. Although video data is typically displayed in an RGB (red-green-blue) colour space, this colour space has a high degree of correlation between the three respective components. The video data representation seen by an encoder or a decoder typically uses a colour space such as YCbCr. YCbCr concentrates luminance, mapped to 'luma' according to a transfer function, in a Y (primary) channel and chroma in Cb and Cr (secondary) channels. Moreover, the Cb and Cr channels may be sampled spatially at a lower rate (subsampled) compared to the luma channel, for example half horizontally and half vertically - known as a '4:2:0 chroma format'. The 4:2:0 chroma format is commonly used in 'consumer' applications, such as internet video streaming, broadcast television, and storage on Blu-Ray™ disks. Subsampling the Cb and Cr channels at half-rate horizontally and not subsampling vertically is known as a '4:2:2 chroma format'. The 4:2:2 chroma format is typically used in professional applications, including capture of footage for cinematic production and the like. The higher sampling rate of the 4:2:2 chroma format makes the resulting video more resilient to editing operations such as colour grading. Prior to distribution to consumers, 4:2:2 chroma format material is often converted to the 4:2:0 chroma format and then encoded for distribution to consumers. In addition to chroma format, video is also characterised by resolution and frame rate. Example resolutions are ultra-high definition (UHD) with a resolution of 3840x2160 or '8K' with a resolution of 7680x4320 and example frame rates are 60 or 120 Hz. Luma sample rates may range from approximately 500 mega samples per second to several giga samples per second. For the 4:2:0 chroma format, the sample rate of each chroma channel is one quarter the luma sample rate and for the 4:2:2 chroma format, the sample rate of each chroma channel is one half the luma sample rate.
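For illustration, a small worked example of the sample-rate relationships described above is given below. The resolution, frame rate and chroma-format ratios are those quoted in the paragraph; the code itself is only an illustrative calculation and is not part of any standard.

```cpp
#include <cstdint>
#include <iostream>

// Illustrative only: per-channel sample rates for UHD at 120 Hz, using the
// 4:2:0 (quarter) and 4:2:2 (half) chroma ratios described above.
int main() {
    const uint64_t width = 3840, height = 2160, frameRate = 120;
    const uint64_t lumaRate = width * height * frameRate;  // ~995 Msamples/s

    const uint64_t chromaRate420 = lumaRate / 4;  // each of Cb, Cr in 4:2:0
    const uint64_t chromaRate422 = lumaRate / 2;  // each of Cb, Cr in 4:2:2

    std::cout << "Luma: " << lumaRate / 1000000 << " Msamples/s\n"
              << "Chroma (4:2:0): " << chromaRate420 / 1000000 << " Msamples/s per channel\n"
              << "Chroma (4:2:2): " << chromaRate422 / 1000000 << " Msamples/s per channel\n";
    return 0;
}
```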
[0005] The VVC standard is a 'block based' codec, in which frames are firstly divided into a square array of regions known as 'coding tree units' (CTUs). CTUs generally occupy a relatively large area, such as 128x128 luma samples. However, CTUs at the right and bottom edge of each frame may be smaller in area. Associated with each CTU is a 'coding tree' for the luma channel and an additional coding tree for the chroma channels. A coding tree defines a decomposition of the area of the CTU into a set of blocks, also referred to as 'coding blocks' (CBs). It is also possible for a single coding tree to specify blocks both for the luma channel and the chroma channels, in which case the collections of collocated coding blocks are referred to as 'coding units' (CUs), i.e., each CU having a coding block for each colour channel. The
22856187_1
CBs are processed for encoding or decoding in a particular order. As a consequence of the use of the 4:2:0 chroma format, a CTU with a luma coding tree for a 128x128 luma sample area has a corresponding chroma coding tree for a 64x64 chroma sample area, collocated with the 128x128 luma sample area. When a single coding tree is in use for the luma channel and the chroma channels, the collections of collocated blocks for a given area are generally referred to as 'units', for example the above-mentioned CUs, as well as 'prediction units' (PUs), and 'transform units' (TUs). When separate coding trees are used for a given area, the above-mentioned CBs, as well as 'prediction blocks' (PBs), and 'transform blocks' (TBs) are used.
[0006] Notwithstanding the above distinction between 'units' and 'blocks', the term 'block' may be used as a general term for areas or regions of a frame for which operations are applied to all colour channels.
[0007] For each CU a prediction unit (PU) of the contents (sample values) of the corresponding area of frame data is generated (a 'prediction unit'). Further, a representation of the difference (or 'residual' in the spatial domain) between the prediction and the contents of the area as seen at input to the encoder is formed. The difference in each colour channel may be transformed and coded as a sequence of residual coefficients, forming one or more TUs for a given CU. The applied transform may be a Discrete Cosine Transform (DCT) or other transform, applied to each block of residual values. This transform is applied separably, that is, the two-dimensional transform is performed in two passes. The block is firstly transformed by applying a one-dimensional transform to each row of samples in the block. Then, the partial result is transformed by applying a one-dimensional transform to each column of the partial result to produce a final block of transform coefficients that substantially decorrelates the residual samples. Transforms of various sizes are supported by the VVC standard, including transforms of rectangular-shaped blocks, with each side dimension being a power of two. Transform coefficients are quantised for entropy encoding into a bitstream.
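A minimal sketch of the separable, two-pass application of a transform described above is given below. The one-dimensional kernel is deliberately left abstract (it could, for example, be an integer DCT-II approximation); the function names and the use of std::function are illustrative assumptions, not the VVC transform kernels.

```cpp
#include <functional>
#include <vector>

// Minimal sketch of a separable 2D transform: a 1D transform is applied to each
// row of the residual block, then to each column of the partial result.
using Transform1D = std::function<void(std::vector<int>&)>;

void separableTransform(std::vector<std::vector<int>>& block, const Transform1D& transform1d) {
    const size_t rows = block.size();
    const size_t cols = block.empty() ? 0 : block[0].size();

    // First pass: transform each row in place.
    for (size_t r = 0; r < rows; ++r) {
        transform1d(block[r]);
    }

    // Second pass: transform each column of the partial result.
    for (size_t c = 0; c < cols; ++c) {
        std::vector<int> column(rows);
        for (size_t r = 0; r < rows; ++r) column[r] = block[r][c];
        transform1d(column);
        for (size_t r = 0; r < rows; ++r) block[r][c] = column[r];
    }
}
```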
[0008] Each TB includes a two-dimensional array of residual coefficients, divided into 4x4 sub-blocks and scanned according to a backward diagonal scan. The backward diagonal scan progresses from a last significant coefficient position back towards the DC (top-left) residual coefficient position. Residual coefficients are coded using an arithmetic coding engine, with two sets of contexts for the coefficient context modelling. For each coefficient, one of the two sets is selected according to a state machine, updated according to the least significant bit (LSB) of the previous residual coefficient. Quantisation of each residual coefficient in a TB, performed in a separate stage in a pipelined design, is also dependent upon a state machine, updated in the same coefficient-by-coefficient manner as the entropy coding engine. The serial dependency in the quantisation and inverse quantisation of TBs may span up to 1024 coefficients in the VVC standard. The serial dependency in the quantisation and inverse quantisation of TBs introduces latency in a portion of the pipeline that was absent from previous standards, such as HEVC. In previous standards, residual coefficients could be quantised or inverse quantised in parallel. Quantisation in parallel allowed implementations relative freedom to quantise and inverse quantise coefficients in batches sized according to architectural convenience, for example SIMD data path widths or internal bus widths.
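The coefficient-by-coefficient dependency described above can be illustrated with a small sketch. The four-state, parity-driven transition table below is an assumption chosen for illustration and is not asserted to be the exact table of the VVC draft; the point is only that the state used for each coefficient depends serially on the coefficient before it.

```cpp
#include <cstdint>
#include <vector>

// Sketch of the serial, coefficient-by-coefficient state update described above.
// The state is advanced using the parity (LSB) of the previous residual
// coefficient; the transition table is illustrative of a four-state machine.
static const int kNextState[4][2] = { {0, 2}, {2, 0}, {1, 3}, {3, 1} };

// Returns the state seen by each coefficient, processed in scan order.
std::vector<int> statePerCoefficient(const std::vector<int>& coeffsInScanOrder) {
    std::vector<int> states;
    states.reserve(coeffsInScanOrder.size());
    int state = 0;  // starting state for the transform block
    for (int coeff : coeffsInScanOrder) {
        states.push_back(state);            // context set / quantiser chosen from 'state'
        int parity = coeff & 1;             // LSB of the coefficient just processed
        state = kNextState[state][parity];  // serial dependency on the previous coefficient
    }
    return states;
}
```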
SUMMARY
[0009] It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
[00010] One aspect of the present disclosure provides a method of decoding a transform block of an image frame from a video bitstream, the method comprising: determining a state variable for a residual coefficient of the transform block, the state variable being determined either: based on a previous decoded residual coefficient of the transform block, or using a predetermined state, the predetermined state being used based on the position of the residual coefficient in the transform block being one of a plurality of predetermined positions in the transform block; decoding the residual coefficient of the transform block according to the determined state variable; inverse quantising the residual coefficient of the transform block according to the determined state variable to produce a reconstructed transform coefficient; and decoding the transform block by inverse transforming the reconstructed transform coefficient.
[00011] According to another aspect, the residual coefficients of the transform block are grouped into sub-blocks and scanned within each sub-block in a backward diagonal scan, progressing from sub-block to sub-block in a backward diagonal scan.
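A minimal sketch of deriving such a backward diagonal scan for a single N x N sub-block is shown below, assuming a simple reversal of a forward diagonal order; the exact scan derivation in the standard may differ in detail.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Sketch of a backward diagonal scan for an N x N sub-block (N = 4 above).
// A forward diagonal order is built first and then reversed, so processing
// proceeds from the last scan position back towards the DC (top-left) position.
std::vector<std::pair<int, int>> backwardDiagonalScan(int n) {
    std::vector<std::pair<int, int>> order;
    for (int diag = 0; diag <= 2 * (n - 1); ++diag) {
        for (int y = 0; y < n; ++y) {
            int x = diag - y;
            if (x >= 0 && x < n) order.emplace_back(x, y);  // (column, row)
        }
    }
    std::reverse(order.begin(), order.end());
    return order;
}
```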
[00012] According to another aspect, the predetermined state is a zero state.
[00013] According to another aspect, the predetermined state corresponds to a sub-block index.
[00014] According to another aspect, each position of the plurality of positions corresponds with an initial position in each of the sub-blocks of the transform block encountered when performing the backward diagonal scan order.
[00015] According to another aspect, the predetermined positions occur once every predetermined number of sub-blocks.
[00016] According to another aspect, the number of predetermined positions encountered depends on the position of a last significant position of the residual coefficients in the transform block.
[00017] According to another aspect, the state variable is used to select one of a pair of quantisers, and wherein the state selects an alternate one of the quantisers at each of the predetermined positions.
[00018] According to another aspect, determining the state variable based on the previous decoded residual coefficient of the transform block comprises determining a transition of a state machine based on a parity of the previously decoded residual coefficient.
[00019] According to another aspect, the state is determined to be the predetermined state based on the previous decoded residual coefficient of the transform block.
[00020] According to another aspect, decoding the residual coefficient of the transform block according to the determined state variable comprises decoding a magnitude of a portion of the residual coefficient using a context-coded bin based on the state variable.
[00021] According to another aspect, decoding the residual coefficient of the transform block further comprises decoding a remainder of the residual coefficient using Golomb-Rice coding.
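As an illustration of the remainder binarisation mentioned above, a minimal Golomb-Rice sketch follows; the Rice parameter k is assumed to be derived elsewhere, and the exact VVC binarisation (prefix limits, parameter adaptation) is not reproduced.

```cpp
#include <cstdint>
#include <string>

// Minimal sketch of Golomb-Rice binarisation of a non-negative remainder value:
// a unary-coded quotient followed by a k-bit binary suffix. Illustrative only.
std::string golombRice(uint32_t value, unsigned k) {
    uint32_t quotient = value >> k;
    std::string bins;
    bins.append(quotient, '1');  // unary prefix: 'quotient' ones...
    bins.push_back('0');         // ...terminated by a zero
    for (int bit = static_cast<int>(k) - 1; bit >= 0; --bit) {
        bins.push_back(((value >> bit) & 1) ? '1' : '0');  // k-bit suffix
    }
    return bins;
}
```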
[00022] Another aspect of the present disclosure provides a non-transitory computer-readable medium having a computer program stored thereon to implement a method of decoding a transform block of an image frame from a video bitstream, the program comprising: code for determining a state variable for a residual coefficient of the transform block, the state variable being determined either: based on a previous decoded residual coefficient of the transform block, or using a predetermined state, the predetermined state being used based on the position of the residual coefficient in the transform block being one of a plurality of predetermined
positions in the transform block; code for decoding the residual coefficient of the transform block according to the determined state variable; code for inverse quantising the residual coefficient of the transform block according to the determined state variable to produce a reconstructed transform coefficient; and code for decoding the transform block by inverse transforming the reconstructed transform coefficient.
[00023] Another aspect of the present disclosure provides a video decoder, configured to: receive a transform block of an image frame from a video bitstream; determine a state variable for a residual coefficient of the transform block, the state variable being determined either: based on a previous decoded residual coefficient of the transform block, or using a predetermined state, the predetermined state being used based on the position of the residual coefficient in the transform block being one of a plurality of predetermined positions in the transform block; decode the residual coefficient of the transform block according to the determined state variable; inverse quantise the residual coefficient of the transform block according to the determined state variable to produce a reconstructed transform coefficient; and decode the transform block by inverse transforming the reconstructed transform coefficient.
[00024] Another aspect of the present disclosure provides a system, comprising: a memory; and a processor, wherein the processor is configured to execute code stored on the memory for implementing a method of decoding a transform block of an image frame from a video bitstream, the method comprising: determining a state variable for a residual coefficient of the transform block, the state variable being determined either: based on a previous decoded residual coefficient of the transform block, or using a predetermined state, the predetermined state being used based on the position of the residual coefficient in the transform block being one of a plurality of predetermined positions in the transform block; decoding the residual coefficient of the transform block according to the determined state variable; inverse quantising the residual coefficient of the transform block according to the determined state variable to produce a reconstructed transform coefficient; and decoding the transform block by inverse transforming the reconstructed transform coefficient.
[00025] Other aspects are also disclosed.
BRIEF DESCRIPTION OF THE DRAWINGS
[00026] At least one embodiment of the present invention will now be described with reference to the following drawings and appendices, in which:
[00027] Fig. 1 is a schematic block diagram showing a video encoding and decoding system;
[00028] Figs. 2A and 2B form a schematic block diagram of a general purpose computer system upon which one or both of the video encoding and decoding system of Fig. 1 may be practiced;
[00029] Fig. 3 is a schematic block diagram showing functional modules of a video encoder;
[00030] Fig. 4 is a schematic block diagram showing functional modules of a video decoder;
[00031] Fig. 5 is a schematic block diagram showing the available divisions of a block into one or more blocks in the tree structure of versatile video coding;
[00032] Fig. 6 is a schematic illustration of a dataflow to achieve permitted divisions of a block into one or more blocks in a tree structure of versatile video coding;
[00033] Figs. 7A and 7B show an example division of a coding tree unit (CTU) into a number of coding units (CUs);
[00034] Fig. 8A shows two scalar quantisers for quantising and inverse quantising residual coefficients;
[00035] Fig. 8B shows a finite state machine (FSM) for selecting a scalar quantiser for quantising, inverse quantising, and binarising residual coefficients;
[00036] Fig. 8C shows state transitions of the FSM of Fig. 8B;
[00037] Fig. 9A shows a two-level backward diagonal scan;
[00038] Fig. 9B shows example transform coefficients prior to quantisation;
[00039] Fig. 9C shows residual coefficients resulting from quantising the transform coefficients of Fig. 9B;
[00040] Fig. 9D shows the residual coefficients of Fig. 9C mapped into a transform block;
[00041] Fig. 10 shows a method for encoding coding units and transform blocks of an image frame into a video bitstream;
[00042] Fig. 11 shows a method for encoding a transform block of an image frame into a video bitstream as used in Fig. 10;
[00043] Fig. 12 shows a method for encoding residual coefficients of a sub-block of a transform block of an image frame into a video bitstream as used in Fig. 11;
[00044] Fig. 13 shows a method for decoding coding units and transform blocks of an image frame from a video bitstream;
[00045] Fig. 14 shows a method for decoding a transform block of an image frame from a video bitstream as used in Fig. 13; and
[00046] Fig. 15 shows a method for decoding residual coefficients of a sub-block of a transform block of an image frame from a video bitstream as used in Fig. 14.
DETAILED DESCRIPTION INCLUDING BEST MODE
[00047] Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
[0048] As described above, sequential dependencies from one coefficient to the next in the inverse quantisation stage of a pipeline result in higher latency in the inverse quantisation stage of a video encoding or decoding system. Although the largest TB size is 64x64, the largest coded residual is 32x32 or 1024 coefficients. A sequential dependency in the worst case of 1024 significant residual coefficients delays the inverse quantisation operation. The dependency runs in the direction of the backward diagonal scan, and so cannot be easily incorporated into the inverse transform stage, where horizontal and vertical one-dimensional (1D) inverse transforms are performed.
[00049] Fig. 1 is a schematic block diagram showing functional modules of a video encoding and decoding system 100. The system 100 may utilise periodic resetting of a state machine for residual coefficient encoding and decoding and quantisation and inverse quantisation of residual coefficients. The periodic resetting permits processing the residual coefficients of a transform block as a set of segments, each of which may be processed concurrently. In particular, for quantisation and inverse quantisation, concurrent processing results in lower
latency than a sequential process for the entirety of the transform block. The chroma CB may use a single prediction mode, such as one intra prediction mode, independently of the prediction modes of each of the collocated luma CBs (including where one or more luma CBs uses inter prediction). Residual coefficient coding may also exploit multiple-of-16 block sizes, including in the case of blocks having a width or height of two samples.
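A sketch of the segment-wise inverse quantisation that such periodic resetting enables is given below. The segment length, the reset state of zero, the transition table and the half-step quantiser offset are illustrative assumptions rather than the standardised derivation; the point is that each segment starts from a known state and can therefore be processed independently.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Illustrative parity-driven transition table, as in the earlier sketch.
static const int kNextState[4][2] = { {0, 2}, {2, 0}, {1, 3}, {3, 1} };

// Illustrative dequantisation with a pair of scalar quantisers: states {0,1}
// select one quantiser and states {2,3} the other, offset by half a step.
static int dequantise(int level, int state, int step) {
    if (level == 0) return 0;
    int sign = (level < 0) ? -1 : 1;
    int offset = (state >= 2) ? step / 2 : 0;
    return sign * (std::abs(level) * step + offset);
}

void inverseQuantiseSegment(const std::vector<int>& levels, std::vector<int>& coeffs,
                            int begin, int end, int step) {
    int state = 0;  // predetermined reset state at the start of the segment
    for (int i = begin; i < end; ++i) {
        coeffs[i] = dequantise(levels[i], state, step);
        state = kNextState[state][levels[i] & 1];  // parity-driven update within the segment
    }
}

void inverseQuantiseBlock(const std::vector<int>& levels, std::vector<int>& coeffs,
                          int segmentLength, int step) {
    // Each segment is independent of the others and could be processed on a
    // separate thread or SIMD lane.
    for (int begin = 0; begin < static_cast<int>(levels.size()); begin += segmentLength) {
        int end = std::min(begin + segmentLength, static_cast<int>(levels.size()));
        inverseQuantiseSegment(levels, coeffs, begin, end, step);
    }
}
```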
[00050] The system 100 includes a source device 110 and a destination device 130. A communication channel 120 is used to communicate encoded video information from the source device 110 to the destination device 130. In some arrangements, the source device 110 and destination device 130 may either or both comprise respective mobile telephone handsets or "smartphones", in which case the communication channel 120 is a wireless channel. In other arrangements, the source device 110 and destination device 130 may comprise video conferencing equipment, in which case the communication channel 120 is typically a wired channel, such as an internet connection. Moreover, the source device 110 and the destination device 130 may comprise any of a wide range of devices, including devices supporting over-the-air television broadcasts, cable television applications, internet video applications (including streaming) and applications where encoded video data is captured on some computer-readable storage medium, such as hard disk drives in a file server.
[00051] As shown in Fig. 1, the source device 110 includes a video source 112, a video encoder 114 and a transmitter 116. The video source 112 typically comprises a source of captured video frame data (shown as 113), such as an image capture sensor, a previously captured video sequence stored on a non-transitory recording medium, or a video feed from a remote image capture sensor. The video source 112 may also be an output of a computer graphics card, for example displaying the video output of an operating system and various applications executing upon a computing device, for example a tablet computer. Examples of source devices 110 that may include an image capture sensor as the video source 112 include smart-phones, video camcorders, professional video cameras, and network video cameras.
[00052] The video encoder 114 converts (or 'encodes') the captured frame data (indicated by an arrow 113) from the video source 112 into a bitstream (indicated by an arrow 115) as described further with reference to Fig. 3. The bitstream 115 is transmitted by the transmitter 116 over the communication channel 120 as encoded video data (or "encoded video information"). It is also possible for the bitstream 115 to be stored in a non-transitory storage
device 122, such as a "Flash" memory or a hard disk drive, until later being transmitted over the communication channel 120, or in lieu of transmission over the communication channel 120.
[00053] The destination device 130 includes a receiver 132, a video decoder 134 and a display device 136. The receiver 132 receives encoded video data from the communication channel 120 and passes received video data to the video decoder 134 as a bitstream (indicated by an arrow 133). The video decoder 134 then outputs decoded frame data (indicated by an arrow 135) to the display device 136. The decoded frame data 135 has the same chroma format as the frame data 113. Examples of the display device 136 include a cathode ray tube, a liquid crystal display, such as in smart-phones, tablet computers, computer monitors or in stand-alone television sets. It is also possible for the functionality of each of the source device 110 and the destination device 130 to be embodied in a single device, examples of which include mobile telephone handsets and tablet computers.
[00054] Notwithstanding the example devices mentioned above, each of the source device 110 and destination device 130 may be configured within a general purpose computing system, typically through a combination of hardware and software components. Fig. 2A illustrates such a computer system 200, which includes: a computer module 201; input devices such as a keyboard 202, a mouse pointer device 203, a scanner 226, a camera 227, which may be configured as the video source 112, and a microphone 280; and output devices including a printer 215, a display device 214, which may be configured as the display device 136, and loudspeakers 217. An external Modulator-Demodulator (Modem) transceiver device 216 may be used by the computer module 201 for communicating to and from a communications network 220 via a connection 221. The communications network 220, which may represent the communication channel 120, may be a wide-area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 221 is a telephone line, the modem 216 may be a traditional "dial-up" modem. Alternatively, where the connection 221 is a high capacity (e.g., cable or optical) connection, the modem 216 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 220. The transceiver device 216 may provide the functionality of the transmitter 116 and the receiver 132 and the communication channel 120 may be embodied in the connection 221.
[00055] The computer module 201 typically includes at least one processor unit 205, and a memory unit 206. For example, the memory unit 206 may have semiconductor random access
22856187_1 memory (RAM) and semiconductor read only memory (ROM). The computer module 201 also includes an number of input/output (1/0) interfaces including: an audio-video interface 207 that couples to the video display 214, loudspeakers 217 and microphone 280; an I/O interface 213 that couples to the keyboard 202, mouse 203, scanner 226, camera 227 and optionally a joystick or other human interface device (not illustrated); and an interface 208 for the external modem 216 and printer 215. The signal from the audio-video interface 207 to the computer monitor 214 is generally the output of a computer graphics card. In some implementations, the modem 216 may be incorporated within the computer module 201, for example within the interface 208. The computer module 201 also has a local network interface 211, which permits coupling of the computer system 200 via a connection 223 to a local-area communications network 222, known as a Local Area Network (LAN). As illustrated in Fig. 2A, the local communications network 222 may also couple to the wide network 220 via a connection 224, which would typically include a so-called "firewall" device or device of similar functionality. The local network interface 211 may comprise an EthemetTM circuit card, a BluetoothTM wireless arrangement or an IEEE 802.11 wireless arrangement; however, numerous other types of interfaces may be practiced for the interface 211. The local network interface 211 may also provide the functionality of the transmitter 116 and the receiver 132 and communication channel 120 may also be embodied in the local communications network 222.
[00056] The I/O interfaces 208 and 213 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 209 are provided and typically include a hard disk drive (HDD) 210. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 212 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g. CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the computer system 200. Typically, any of the HDD 210, optical drive 212, networks 220 and 222 may also be configured to operate as the video source 112, or as a destination for decoded video data to be stored for reproduction via the display 214. The source device 110 and the destination device 130 of the system 100 may be embodied in the computer system 200.
[00057] The components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204 and in a manner that results in a conventional mode of operation of the
computer system 200 known to those in the relevant art. For example, the processor 205 is coupled to the system bus 204 using a connection 218. Likewise, the memory 206 and optical disk drive 212 are coupled to the system bus 204 by connections 219. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun SPARCstations, Apple Mac™ or similar computer systems.
[00058] Where appropriate or desired, the video encoder 114 and the video decoder 134, as well as methods described below, may be implemented using the computer system 200. In particular, the video encoder 114, the video decoder 134 and methods to be described, may be implemented as one or more software application programs 233 executable within the computer system 200. In particular, the video encoder 114, the video decoder 134 and the steps of the described methods are effected by instructions 231 (see Fig. 2B) in the software 233 that are carried out within the computer system 200. The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules performs the described methods and a second part and the corresponding code modules manage a user interface between the first part and the user.
[00059] The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 200 from the computer readable medium, and then executed by the computer system 200. A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program product in the computer system 200 preferably effects an advantageous apparatus for implementing the video encoder 114, the video decoder 134 and the described methods.
[00060] The software 233 is typically stored in the HDD 210 or the memory 206. The software is loaded into the computer system 200 from a computer readable medium, and executed by the computer system 200. Thus, for example, the software 233 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 225 that is read by the optical disk drive 212.
[00061] In some instances, the application programs 233 may be supplied to the user encoded on one or more CD-ROMs 225 and read via the corresponding drive 212, or alternatively may be read by the user from the networks 220 or 222. Still further, the software can also be loaded into the computer system 200 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions
and/or data to the computer system 200 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray Disc™, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 201. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of the software, application programs, instructions and/or video data or encoded video data to the computer module 201 include radio or infra-red transmission channels, as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
[00062] The second part of the application program 233 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 214. Through manipulation of typically the keyboard 202 and the mouse 203, a user of the computer system 200 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 217 and user voice commands input via the microphone 280.
[00063] Fig. 2B is a detailed schematic block diagram of the processor 205 and a "memory" 234. The memory 234 represents a logical aggregation of all the memory modules (including the HDD 209 and semiconductor memory 206) that can be accessed by the computer module 201 in Fig. 2A.
[00064] When the computer module 201 is initially powered up, a power-on self-test (POST) program 250 executes. The POST program 250 is typically stored in a ROM 249 of the semiconductor memory 206 of Fig. 2A. A hardware device such as the ROM 249 storing software is sometimes referred to as firmware. The POST program 250 examines hardware within the computer module 201 to ensure proper functioning and typically checks the processor 205, the memory 234 (209, 206), and a basic input-output systems software (BIOS) module 251, also typically stored in the ROM 249, for correct operation. Once the POST program 250 has run successfully, the BIOS 251 activates the hard disk drive 210 of Fig. 2A. Activation of the hard disk drive 210 causes a bootstrap loader program 252 that is resident on
the hard disk drive 210 to execute via the processor 205. This loads an operating system 253 into the RAM memory 206, upon which the operating system 253 commences operation. The operating system 253 is a system level application, executable by the processor 205, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.
[00065] The operating system 253 manages the memory 234 (209, 206) to ensure that each process or application running on the computer module 201 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the computer system 200 of Fig. 2A must be used properly so that each process can run effectively. Accordingly, the aggregated memory 234 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 200 and how such is used.
[00066] As shown in Fig. 2B, the processor 205 includes a number of functional modules including a control unit 239, an arithmetic logic unit (ALU) 240, and a local or internal memory 248, sometimes called a cache memory. The cache memory 248 typically includes a number of storage registers 244-246 in a register section. One or more internal busses 241 functionally interconnect these functional modules. The processor 205 typically also has one or more interfaces 242 for communicating with external devices via the system bus 204, using a connection 218. The memory 234 is coupled to the bus 204 using a connection 219.
[00067] The application program 233 includes a sequence of instructions 231 that may include conditional branch and loop instructions. The program 233 may also include data 232 which is used in execution of the program 233. The instructions 231 and the data 232 are stored in memory locations 228, 229, 230 and 235, 236, 237, respectively. Depending upon the relative size of the instructions 231 and the memory locations 228-230, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 230. Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 228 and 229.
[00068] In general, the processor 205 is given a set of instructions which are executed therein. The processor 205 waits for a subsequent input, to which the processor 205 reacts to by executing another set of instructions. Each input may be provided from one or more of a
number of sources, including data generated by one or more of the input devices 202, 203, data received from an external source across one of the networks 220, 222, data retrieved from one of the storage devices 206, 209 or data retrieved from a storage medium 225 inserted into the corresponding reader 212, all depicted in Fig. 2A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 234.
[00069] The video encoder 114, the video decoder 134 and the described methods may use input variables 254, which are stored in the memory 234 in corresponding memory locations 255, 256, 257. The video encoder 114, the video decoder 134 and the described methods produce output variables 261, which are stored in the memory 234 in corresponding memory locations 262, 263, 264. Intermediate variables 258 may be stored in memory locations 259, 260, 266 and 267.
[00070] Referring to the processor 205 of Fig. 2B, the registers 244, 245, 246, the arithmetic logic unit (ALU) 240, and the control unit 239 work together to perform sequences of micro operations needed to perform "fetch, decode, and execute" cycles for every instruction in the instruction set making up the program 233. Each fetch, decode, and execute cycle comprises:
a fetch operation, which fetches or reads an instruction 231 from a memory location 228, 229, 230;
a decode operation in which the control unit 239 determines which instruction has been fetched; and
an execute operation in which the control unit 239 and/or the ALU 240 execute the instruction.
[00071] Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 239 stores or writes a value to a memory location 232.
[00072] Each step or sub-process in the method of Figs. 10 and 11, to be described, is associated with one or more segments of the program 233 and is typically performed by the register section 244, 245, 246, the ALU 240, and the control unit 239 in the processor 205
working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 233.
[00073] Fig. 3 is a schematic block diagram showing functional modules of the video encoder 114. Fig. 4 is a schematic block diagram showing functional modules of the video decoder 134. Generally, data passes between functional modules within the video encoder 114 and the video decoder 134 in groups of samples or coefficients, such as divisions of blocks into sub-blocks of a fixed size, or as arrays. The video encoder 114 and video decoder 134 may be implemented using a general-purpose computer system 200, as shown in Figs. 2A and 2B, where the various functional modules may be implemented by dedicated hardware within the computer system 200, or by software executable within the computer system 200 such as one or more software code modules of the software application program 233 resident on the hard disk drive 210 and being controlled in its execution by the processor 205. Alternatively the video encoder 114 and video decoder 134 may be implemented by a combination of dedicated hardware and software executable within the computer system 200. The video encoder 114, the video decoder 134 and the described methods may alternatively be implemented in dedicated hardware, such as one or more integrated circuits performing the functions or sub-functions of the described methods. Such dedicated hardware may include graphic processing units (GPUs), digital signal processors (DSPs), application-specific standard products (ASSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or one or more microprocessors and associated memories. In particular, the video encoder 114 comprises modules 310-386 and the video decoder 134 comprises modules 420-496 which may each be implemented as one or more software code modules of the software application program 233.
[00074] Although the video encoder 114 of Fig. 3 is an example of a versatile video coding (VVC) video encoding pipeline, other video codecs may also be used to perform the processing stages described herein. The video encoder 114 receives captured frame data 113, such as a series of frames, each frame including one or more colour channels. The frame data 113 may be in any chroma format, for example 4:0:0, 4:2:0, 4:2:2, or 4:4:4 chroma format. A block partitioner 310 firstly divides the frame data 113 into CTUs, generally square in shape and configured such that a particular size for the CTUs is used. The size of the CTUs may be 64x64, 128x128, or 256x256 luma samples for example. The block partitioner 310 further divides each CTU into one or more CBs according to a luma coding tree and a chroma coding tree. The CBs have a variety of sizes, and may include both square and non-square aspect ratios. Operation of the block partitioner 310 is further described with reference to Fig. 10. However,
in the VVC standard, CBs, CUs, PUs, and TUs always have side lengths that are powers of two. Thus, a current CB, represented as 312, is output from the block partitioner 310, progressing in accordance with an iteration over the one or more blocks of the CTU, in accordance with the luma coding tree and the chroma coding tree of the CTU. Options for partitioning CTUs into CBs are further described below with reference to Figs. 5 and 6.
[00075] The CTUs resulting from the first division of the frame data 113 may be scanned in raster scan order and may be grouped into one or more 'slices'. A slice may be an 'intra' (or 'I') slice. An intra slice (I slice) indicates that every CU in the slice is intra predicted. Alternatively, a slice may be uni- or bi-predicted ('P' or 'B' slice, respectively), indicating additional availability of uni- and bi-prediction in the slice, respectively.
[00076] For each CTU, the video encoder 114 operates in two stages. In the first stage (referred to as a 'search' stage), the block partitioner 310 tests various potential configurations of a coding tree. Each potential configuration of a coding tree has associated 'candidate' CBs. The first stage involves testing various candidate CBs to select CBs providing high compression efficiency with low distortion. The testing generally involves a Lagrangian optimisation whereby a candidate CB is evaluated based on a weighted combination of the rate (coding cost) and the distortion (error with respect to the input frame data 113). The 'best' candidate CBs (the CBs with the lowest evaluated rate/distortion) are selected for subsequent encoding into the bitstream 115. Included in evaluation of candidate CBs is an option to use a CB for a given area or to further split the area according to various splitting options and code each of the smaller resulting areas with further CBs, or split the areas even further. As a consequence, both the CBs and the coding tree themselves are selected in the search stage.
[00077] The video encoder 114 produces a prediction block (PB), indicated by an arrow 320, for each CB, for example the CB 312. The PB 320 is a prediction of the contents of the associated CB 312. A subtracter module 322 produces a difference, indicated as 324 (or 'residual', referring to the difference being in the spatial domain), between the PB 320 and the CB 312. The difference 324 is a block-size difference between corresponding samples in the PB 320 and the CB 312. The difference 324 is transformed, quantised and represented as a transform block (TB), indicated by an arrow 336. The PB 320 and associated TB 336 are typically chosen from one of many possible candidate CBs, for example based on evaluated cost or distortion.
[0078] A candidate coding block (CB) is a CB resulting from one of the prediction modes available to the video encoder 114 for the associated PB and the resulting residual. Each candidate CB results in one or more corresponding TBs, as described hereafter with reference to Fig. 8. The TB 336 is a quantised and transformed representation of the difference 324. When combined with the predicted PB in the video decoder 134, the TB 336 reduces the difference between decoded CBs and the original CB 312 at the expense of additional signalling in a bitstream.
[0079] Each candidate coding block (CB), that is, a prediction block (PB) in combination with a transform block (TB), thus has an associated coding cost (or 'rate') and an associated difference (or 'distortion'). The rate is typically measured in bits. The distortion of the CB is typically estimated as a difference in sample values, such as a sum of absolute differences (SAD) or a sum of squared differences (SSD). The estimate resulting from each candidate PB may be determined by a mode selector 386 using the difference 324 to determine an intra prediction mode (represented by an arrow 388). Estimation of the coding costs associated with each candidate prediction mode and corresponding residual coding can be performed at significantly lower cost than entropy coding of the residual. Accordingly, a number of candidate modes can be evaluated to determine an optimum mode in a rate-distortion sense.
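As a sketch of the distortion measures and rate-distortion cost referred to above (the cost form J = D + lambda * R and the helper names are illustrative, not taken from any reference implementation):

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// Sum of absolute differences between original and predicted samples.
uint64_t sad(const std::vector<int>& orig, const std::vector<int>& pred) {
    uint64_t d = 0;
    for (size_t i = 0; i < orig.size(); ++i) d += std::abs(orig[i] - pred[i]);
    return d;
}

// Sum of squared differences between original and predicted samples.
uint64_t ssd(const std::vector<int>& orig, const std::vector<int>& pred) {
    uint64_t d = 0;
    for (size_t i = 0; i < orig.size(); ++i) {
        int64_t diff = orig[i] - pred[i];
        d += static_cast<uint64_t>(diff * diff);
    }
    return d;
}

// Lagrangian cost J = D + lambda * R; the candidate with the lowest J is chosen.
double lagrangianCost(uint64_t distortion, uint64_t rateBits, double lambda) {
    return static_cast<double>(distortion) + lambda * static_cast<double>(rateBits);
}
```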
[00080] Determining an optimum mode in terms of rate-distortion is typically achieved using a variation of Lagrangian optimisation. Selection of the intra prediction mode 388 typically involves determining a coding cost for the residual data resulting from application of a particular intra prediction mode. The coding cost may be approximated by using a 'sum of absolute transformed differences' (SATD) whereby a relatively simple transform, such as a Hadamard transform, is used to obtain an estimated transformed residual cost. In some implementations using relatively simple transforms, the costs resulting from the simplified estimation method are monotonically related to the actual costs that would otherwise be determined from a full evaluation. In implementations with monotonically related estimated costs, the simplified estimation method may be used to make the same decision (i.e. intra prediction mode) with a reduction in complexity in the video encoder 114. To allow for possible non-monotonicity in the relationship between estimated and actual costs, the simplified estimation method may be used to generate a list of best candidates. The non-monotonicity may result from further mode decisions available for the coding of residual data, for example. The list of best candidates may be of an arbitrary number. A more complete search may be performed using the best candidates to establish optimal mode choices for coding the residual
data for each of the candidates, allowing a final selection of the intra prediction mode along with other mode decisions.
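A hedged sketch of a Hadamard-based SATD estimate of the kind referred to above is shown below for a 4x4 block; the normalisation factor is an illustrative choice and the function is not taken from any reference implementation.

```cpp
#include <cstdint>
#include <cstdlib>

// Sum of absolute transformed differences (SATD) of a 4x4 block using a
// Hadamard transform, applied horizontally then vertically to the difference.
uint64_t satd4x4(const int orig[4][4], const int pred[4][4]) {
    int m[4][4], d[4][4];

    // Difference block.
    for (int y = 0; y < 4; ++y)
        for (int x = 0; x < 4; ++x)
            d[y][x] = orig[y][x] - pred[y][x];

    // Horizontal 1D Hadamard (butterflies on each row).
    for (int y = 0; y < 4; ++y) {
        int s0 = d[y][0] + d[y][2], s1 = d[y][1] + d[y][3];
        int s2 = d[y][0] - d[y][2], s3 = d[y][1] - d[y][3];
        m[y][0] = s0 + s1; m[y][1] = s0 - s1;
        m[y][2] = s2 + s3; m[y][3] = s2 - s3;
    }

    // Vertical 1D Hadamard (butterflies on each column), then absolute sum.
    uint64_t sum = 0;
    for (int x = 0; x < 4; ++x) {
        int s0 = m[0][x] + m[2][x], s1 = m[1][x] + m[3][x];
        int s2 = m[0][x] - m[2][x], s3 = m[1][x] - m[3][x];
        sum += std::abs(s0 + s1) + std::abs(s0 - s1)
             + std::abs(s2 + s3) + std::abs(s2 - s3);
    }
    return sum / 2;  // illustrative normalisation
}
```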
[00081] The other mode decisions include an ability to skip a forward transform, known as 'transform skip'. Skipping the transforms is suited to residual data that lacks adequate correlation for reduced coding cost via expression as transform basis functions. Certain types of content, such as relatively simple computer generated graphics may exhibit similar behaviour. For a 'skipped transform', residual coefficients are still coded even though the transform itself is not performed.
[00082] Lagrangian or similar optimisation processing can be employed to both select an optimal partitioning of a CTU into CBs (by the block partitioner 310) as well as the selection of a best prediction mode from a plurality of possibilities. Through application of a Lagrangian optimisation process of the candidate modes in the mode selector module 386, the intra prediction mode with the lowest cost measurement is selected as the 'best' mode. The lowest cost mode is the selected intra prediction mode 388 and is also encoded in the bitstream 115 by an entropy encoder 338. The selection of the intra prediction mode 388 by operation of the mode selector module 386 extends to operation of the block partitioner 310. For example, candidates for selection of the intra prediction mode 388 may include modes applicable to a given block and additionally modes applicable to multiple smaller blocks that collectively are collocated with the given block. In cases including modes applicable to a given block and smaller collocated blocks, the process of selecting candidates is implicitly also a process of determining the best hierarchical decomposition of the CTU into CBs.
[00083] In the second stage of operation of the video encoder 114 (referred to as a 'coding' stage), an iteration over the selected luma coding tree and the selected chroma coding tree, and hence each selected CB, is performed in the video encoder 114. In the iteration, the CBs are encoded into the bitstream 115, as described further herein.
[00084] The entropy encoder 338 supports both variable-length coding of syntax elements and arithmetic coding of syntax elements. Arithmetic coding is supported using a context-adaptive binary arithmetic coding process. Arithmetically coded syntax elements consist of sequences of one or more 'bins'. Bins, like bits, have a value of '0' or '1'. However, bins are not encoded in the bitstream 115 as discrete bits. Bins have an associated predicted (or 'likely' or 'most probable') value and an associated probability, known as a 'context'. When the actual bin to be coded matches the predicted value, a 'most probable symbol' (MPS) is coded. Coding a most
probable symbol is relatively inexpensive in terms of consumed bits. When the actual bin to be coded mismatches the likely value, a 'least probable symbol' (LPS) is coded. Coding a least probable symbol has a relatively high cost in terms of consumed bits. The bin coding techniques enable efficient coding of bins where the probability of a '0' versus a '1' is skewed. For a syntax element with two possible values (that is, a 'flag'), a single bin is adequate. For syntax elements with many possible values, a sequence of bins is needed.
[00085] The presence of later bins in the sequence may be determined based on the value of earlier bins in the sequence. Additionally, each bin may be associated with more than one context. The selection of a particular context can be dependent on earlier bins in the syntax element, the bin values of neighbouring syntax elements (i.e. those from neighbouring blocks) and the like. Each time a context-coded bin is encoded, the context that was selected for that bin (if any) is updated in a manner reflective of the new bin value. As such, the binary arithmetic coding scheme is said to be adaptive.
[00086] Also supported by the video encoder 114 are bins that lack a context ('bypass bins'). Bypass bins are coded assuming an equiprobable distribution between a '0' and a '1'. Thus, each bin occupies one bit in the bitstream 115. The absence of a context saves memory and reduces complexity, and thus bypass bins are used where the distribution of values for the particular bin is not skewed. One example of an entropy coder employing context and adaption is known in the art as CABAC (context adaptive binary arithmetic coder) and many variants of this coder have been employed in video coding.
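The cost asymmetry between context-coded and bypass-coded bins described above can be illustrated with a short sketch. The following Python fragment is illustrative only: the class name, the adaptation rate and the simple probability model are assumptions made for this example and do not reproduce the normative CABAC probability-state tables or arithmetic coding engine.

    import math

    class AdaptiveBinModel:
        # Illustrative adaptive 'context': tracks an estimate of P(bin == 1)
        # and adapts it towards each coded bin.  The cost of coding a bin is
        # approximated as -log2(probability of the coded value).
        def __init__(self, p_one=0.5, rate=0.05):
            self.p_one = p_one      # current estimate of P(bin == 1)
            self.rate = rate        # adaptation speed (illustrative constant)

        def code(self, bin_value):
            p = self.p_one if bin_value == 1 else 1.0 - self.p_one
            cost = -math.log2(max(p, 1e-6))                    # approximate bits consumed
            target = 1.0 if bin_value == 1 else 0.0
            self.p_one += self.rate * (target - self.p_one)    # adapt the context
            return cost

    def bypass_cost(num_bins):
        # Bypass bins assume an equiprobable distribution: exactly one bit each.
        return float(num_bins)

    # A skewed bin stream is cheaper with an adaptive context than with bypass coding.
    ctx = AdaptiveBinModel()
    skewed_bins = [0] * 18 + [1] * 2
    print(sum(ctx.code(b) for b in skewed_bins), "vs", bypass_cost(len(skewed_bins)))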
[00087] The entropy encoder 338 encodes the intra prediction mode 388 using a combination of context-coded and bypass-coded bins. Typically, a list of 'most probable modes' is generated in the video encoder 114. The list of most probable modes is typically of a fixed length, such as three or six modes, and may include modes encountered in earlier blocks. A context-coded bin encodes a flag indicating if the intra prediction mode is one of the most probable modes. If the intra prediction mode 388 is one of the most probable modes, further signalling, using bypass coded bins, is encoded. The encoded further signalling is indicative of which most probable mode corresponds with the intra prediction mode 388, for example using a truncated unary bin string. Otherwise, the intra prediction mode 388 is encoded as a 'remaining mode'. Encoding as a remaining mode uses an alternative syntax, such as a fixed-length code, also coded using bypass-coded bins, to express intra prediction modes other than those present in the most probable mode list.
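By way of illustration, the signalling structure described in the preceding paragraph may be sketched as follows. The list length, the number of remainder bits and the helper function names are assumptions for this example only; the normative binarisations differ in detail.

    def truncated_unary(value, max_value):
        # One '1' bin per step, terminated by a '0' unless value == max_value.
        bins = [1] * value
        if value < max_value:
            bins.append(0)
        return bins

    def fixed_length(value, num_bits):
        # Fixed-length binarisation, most significant bit first.
        return [(value >> (num_bits - 1 - i)) & 1 for i in range(num_bits)]

    def signal_intra_mode(mode, mpm_list, remaining_modes, remainder_bits=6):
        # Returns (MPM flag, further bins).  The flag would be context-coded;
        # the further bins would be bypass-coded.
        if mode in mpm_list:
            return 1, truncated_unary(mpm_list.index(mode), len(mpm_list) - 1)
        return 0, fixed_length(remaining_modes.index(mode), remainder_bits)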
[00088] A multiplexer module 384 outputs the PB 320 according to the determined best intra prediction mode 388, selecting from the tested prediction mode of each candidate CB. The candidate prediction modes need not include every conceivable prediction mode supported by the video encoder 114.
[00089] Prediction modes fall broadly into two categories. A first category is 'intra-frame prediction' (also referred to as 'intra prediction'). In intra-frame prediction, a prediction for a block is generated, and the generation method may use other samples obtained from the current frame. For an intra-predicted PB, it is possible for different intra-prediction modes to be used for luma and chroma, and thus intra prediction is described primarily in terms of operation upon PBs.
[00090] The second category of prediction modes is 'inter-frame prediction' (also referred to as 'inter prediction'). In inter-frame prediction a prediction for a block is produced using samples from one or two frames preceding the current frame in an order of coding frames in the bitstream. Moreover, for inter-frame prediction, a single coding tree is typically used for both the luma channel and the chroma channels. The order of coding frames in the bitstream may differ from the order of the frames when captured or displayed. When one frame is used for prediction, the block is said to be 'uni-predicted' and has one associated motion vector. When two frames are used for prediction, the block is said to be 'bi-predicted' and has two associated motion vectors. For a P slice, each CU may be intra predicted or uni-predicted. For a B slice, each CU may be intra predicted, uni-predicted, or bi-predicted. Frames are typically coded using a 'group of pictures' structure, enabling a temporal hierarchy of frames. A temporal hierarchy of frames allows a frame to reference a preceding and a subsequent picture in the order of displaying the frames. The images are coded in the order necessary to ensure the dependencies for decoding each frame are met.
[00091] A subcategory of inter prediction is referred to as 'skip mode'. Inter prediction and skip modes are described as two distinct modes. However, both inter prediction mode and skip mode involve motion vectors referencing blocks of samples from preceding frames. Inter prediction involves a coded motion vector delta, specifying a motion vector relative to a motion vector predictor. The motion vector predictor is obtained from a list of one or more candidate motion vectors, selected with a 'merge index'. The coded motion vector delta provides a spatial offset to a selected motion vector prediction. Inter prediction also uses a coded residual in the bitstream 133. Skip mode uses only an index (also named a 'merge index') to select one out of
several motion vector candidates. The selected candidate is used without any further signalling. Also, skip mode does not support coding of any residual coefficients. The absence of coded residual coefficients when the skip mode is used means that there is no need to perform transforms for the skip mode. Therefore, skip mode does not typically result in pipeline processing issues. Pipeline processing issues may, however, arise for intra-predicted CUs and inter-predicted CUs. Due to the limited signalling of the skip mode, skip mode is useful for achieving very high compression performance when relatively high quality reference frames are available. Bi-predicted CUs in higher temporal layers of a random-access group-of-picture structure typically have high quality reference pictures and motion vector candidates that accurately reflect underlying motion.
[00092] The samples are selected according to a motion vector and reference picture index. The motion vector and reference picture index apply to all colour channels and thus inter prediction is described primarily in terms of operation upon PUs rather than PBs. Within each category (that is, intra- and inter-frame prediction), different techniques may be applied to generate the PU. For example, intra prediction may use values from adjacent rows and columns of previously reconstructed samples, in combination with a direction to generate a PU according to a prescribed filtering and generation process. Alternatively, the PU may be described using a small number of parameters. Inter prediction methods may vary in the number of motion parameters and their precision. Motion parameters typically comprise a reference frame index, indicating which reference frame(s) from lists of reference frames are to be used, plus a spatial translation for each of the reference frames, but may include more frames, special frames, or complex affine parameters such as scaling and rotation. In addition, a predetermined motion refinement process may be applied to generate dense motion estimates based on referenced sample blocks.
[00093] Having determined and selected the PB 320, and subtracted the PB 320 from the original sample block at the subtractor 322, a residual with lowest coding cost, represented as 324, is obtained and subjected to lossy compression. The lossy compression process comprises the steps of transformation, quantisation and entropy coding. A forward primary transform module 326 applies a forward transform to the difference 324, converting the difference 324 from the spatial domain to the frequency domain, and producing primary transform coefficients represented by an arrow 328. The primary transform coefficients 328 are passed to a forward secondary transform module 330 to produce transform coefficients represented by an arrow 332 by performing a non-separable secondary transform (NSST) operation. The forward primary
transform is typically separable, transforming a set of rows and then a set of columns of each block, typically using a type-II discrete cosine transform (DCT-2), although a type-VII discrete sine transform (DST-7) and a type-VIII discrete cosine transform (DCT-8) may also be available, for example horizontally for block widths not exceeding 16 samples and vertically for block heights not exceeding 16 samples. The transformation of each set of rows and columns is performed by applying one-dimensional transforms firstly to each row of a block to produce an intermediate result and then to each column of the intermediate result to produce a final result. The forward secondary transform is generally a non-separable transform, which is only applied for the residual of intra-predicted CUs and may nonetheless also be bypassed. The forward secondary transform operates either on 16 samples (arranged as the upper-left 4x4 sub-block of the primary transform coefficients 328) or 64 samples (arranged as the upper-left 8x8 coefficients, arranged as four 4x4 sub-blocks of the primary transform coefficients 328). Moreover, the matrix coefficients of the forward secondary transform are selected from multiple sets according to the intra prediction mode of the CU such that two sets of coefficients are available for use. The use of one of the sets of matrix coefficients, or the bypassing of the forward secondary transform, is signalled with an "nsst_index" syntax element, coded using a truncated unary binarisation to express the values zero (secondary transform not applied), one (first set of matrix coefficients selected), or two (second set of matrix coefficients selected).
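The rows-then-columns structure of the separable forward primary transform can be sketched as below. The sketch is a floating-point illustration using an orthonormal DCT-2 basis; the standardised transforms use scaled integer approximations, and the secondary transform stage is not shown.

    import numpy as np

    def dct2_matrix(n):
        # Orthonormal type-II DCT basis matrix: row k, column x.
        m = np.array([[np.cos(np.pi * k * (2 * x + 1) / (2 * n))
                       for x in range(n)] for k in range(n)])
        m[0] *= 1.0 / np.sqrt(n)
        m[1:] *= np.sqrt(2.0 / n)
        return m

    def forward_primary_transform(residual):
        # Separable 2D transform: 1D transforms applied firstly to each row,
        # then to each column of the intermediate result.
        height, width = residual.shape
        row_basis, col_basis = dct2_matrix(width), dct2_matrix(height)
        intermediate = residual @ row_basis.T     # transform each row
        return col_basis @ intermediate           # then each column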
[00094] The transform coefficients 332 are passed to a quantiser module 334. At the module 334, quantisation in accordance with a 'quantisation parameter' is performed to produce residual coefficients, represented by the arrow 336. The quantisation parameter is constant for a given TB and thus results in a uniform scaling for the production of residual coefficients for a TB. A non-uniform scaling is also possible by application of a 'quantisation matrix', whereby the scaling factor applied for each residual coefficient is derived from a combination of the quantisation parameter and the corresponding entry in a scaling matrix, typically having a size equal to that of the TB. The scaling matrix may have a size that is smaller than the size of the TB, and when applied to the TB a nearest neighbour approach can be used to provide scaling values for each residual coefficient from a scaling matrix smaller in size than the TB size. The residual coefficients 336 are supplied to the entropy encoder 338 for encoding in the bitstream 115. Typically, the residual coefficients of each TB of the TU having at least one significant residual coefficient are scanned to produce an ordered list of values, according to a scan pattern. The scan pattern generally scans the TB as a sequence of 4x4 'sub-blocks', providing a regular scanning operation at the granularity of 4x4 sets of residual coefficients, with the arrangement of sub-blocks dependent on the size of the TB. Additionally, the
prediction mode 388 and the corresponding block partitioning are also encoded in the bitstream 115.
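A simplified sketch of the quantisation described above, including the nearest-neighbour lookup into a scaling matrix smaller than the TB, is given below. The step-size formula (doubling every six quantisation parameter steps) and the normalisation of scaling-matrix entries by 16 are assumptions for this illustration rather than the exact normative arithmetic.

    import numpy as np

    def quantise_tb(transform_coeffs, qp, scaling_matrix=None):
        base_step = 2.0 ** ((qp - 4) / 6.0)        # assumed step-size model
        height, width = transform_coeffs.shape
        residual = np.zeros((height, width), dtype=int)
        for y in range(height):
            for x in range(width):
                step = base_step
                if scaling_matrix is not None:
                    sh, sw = scaling_matrix.shape
                    # Nearest-neighbour mapping from the TB position to the
                    # (possibly smaller) scaling matrix.
                    step *= scaling_matrix[y * sh // height, x * sw // width] / 16.0
                residual[y, x] = int(round(transform_coeffs[y, x] / step))
        return residual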
[00095] As described above, the video encoder 114 needs access to a frame representation corresponding to the frame representation seen in the video decoder 134. Thus, the residual coefficients 336 are also inverse quantised by a dequantiser module 340 to produce inverse transform coefficients, represented by an arrow 342. The inverse transform coefficients 342 are passed through an inverse secondary transform module 344 to produce intermediate inverse transform coefficients, represented by an arrow 346. The intermediate inverse transform coefficients 346 are passed to an inverse primary transform module 348 to produce residual samples, represented by an arrow 350, of the TU. The types of inverse transform performed by the inverse secondary transform module 344 correspond with the types of forward transform performed by the forward secondary transform module 330. The types of inverse transform performed by the inverse primary transform module 348 correspond with the types of primary transform performed by the primary transform module 326. A summation module 352 adds the residual samples 350 and the PU 320 to produce reconstructed samples (indicated by an arrow 354) of the CU.
[00096] The reconstructed samples 354 are passed to a reference sample cache 356 and an in-loop filters module 368. The reference sample cache 356, typically implemented using static RAM on an ASIC (thus avoiding costly off-chip memory access), provides the minimal sample storage needed to satisfy the dependencies for generating intra-frame PBs for subsequent CUs in the frame. The minimal dependencies typically include a 'line buffer' of samples along the bottom of a row of CTUs, for use by the next row of CTUs, and column buffering, the extent of which is set by the height of the CTU. The reference sample cache 356 supplies reference samples (represented by an arrow 358) to a reference sample filter 360. The sample filter 360 applies a smoothing operation to produce filtered reference samples (indicated by an arrow 362). The filtered reference samples 362 are used by an intra-frame prediction module 364 to produce an intra-predicted block of samples, represented by an arrow 366. For each candidate intra prediction mode the intra-frame prediction module 364 produces a block of samples, that is, the block 366.
[00097] The in-loop filters module 368 applies several filtering stages to the reconstructed samples 354. The filtering stages include a 'deblocking filter' (DBF) which applies smoothing aligned to the CU boundaries to reduce artefacts resulting from discontinuities. Another filtering stage present in the in-loop filters module 368 is an 'adaptive loop filter' (ALF), which
applies a Wiener-based adaptive filter to further reduce distortion. A further available filtering stage in the in-loop filters module 368 is a 'sample adaptive offset' (SAO) filter. The SAO filter operates by firstly classifying reconstructed samples into one or multiple categories and, according to the allocated category, applying an offset at the sample level.
[00098] Filtered samples, represented by an arrow 370, are output from the in-loop filters module 368. The filtered samples 370 are stored in a frame buffer 372. The frame buffer 372 typically has the capacity to store several (for example up to 16) pictures and thus is stored in the memory 206. The frame buffer 372 is not typically stored using on-chip memory due to the large memory consumption required. As such, access to the frame buffer 372 is costly in terms of memory bandwidth. The frame buffer 372 provides reference frames (represented by an arrow 374) to a motion estimation module 376 and a motion compensation module 380.
[00099] The motion estimation module 376 estimates a number of 'motion vectors' (indicated as 378), each being a Cartesian spatial offset from the location of the present CB, referencing a block in one of the reference frames in the frame buffer 372. A filtered block of reference samples (represented as 382) is produced for each motion vector. The filtered reference samples 382 form further candidate modes available for potential selection by the mode selector 386. Moreover, for a given CU, the PU 320 may be formed using one reference block ('uni-predicted') or may be formed using two reference blocks ('bi-predicted'). For the selected motion vector, the motion compensation module 380 produces the PB 320 in accordance with a filtering process supportive of sub-pixel accuracy in the motion vectors. As such, the motion estimation module 376 (which operates on many candidate motion vectors) may perform a simplified filtering process compared to that of the motion compensation module 380 (which operates on the selected candidate only) to achieve reduced computational complexity. When the video encoder 114 selects inter prediction for a CU the motion vector 378 is encoded into the bitstream 115.
[000101] Although the video encoder 114 of Fig. 3 is described with reference to versatile video coding (VVC), other video coding standards or implementations may also employ the processing stages of modules 310-386. The frame data 113 (and bitstream 115) may also be read from (or written to) memory 206, the hard disk drive 210, a CD-ROM, a Blu-ray disk™ or other computer readable storage medium. Additionally, the frame data 113 (and bitstream 115) may be received from (or transmitted to) an external source, such as a server connected to the communications network 220 or a radio-frequency receiver.
[000101] The video decoder 134 is shown in Fig. 4. Although the video decoder 134 of Fig. 4 is an example of a versatile video coding (VVC) video decoding pipeline, other video codecs may also be used to perform the processing stages described herein. As shown in Fig. 4, the bitstream 133 is input to the video decoder 134. The bitstream 133 may be read from memory 206, the hard disk drive 210, a CD-ROM, a Blu-ray disk™ or other non-transitory computer readable storage medium. Alternatively, the bitstream 133 may be received from an external source such as a server connected to the communications network 220 or a radio frequency receiver. The bitstream 133 contains encoded syntax elements representing the captured frame data to be decoded.
[000102] The bitstream 133 is input to an entropy decoder module 420. The entropy decoder module 420 extracts syntax elements from the bitstream 133 by decoding sequences of 'bins' and passes the values of the syntax elements to other modules in the video decoder 134. The entropy decoder module 420 uses an arithmetic decoding engine to decode each syntax element as a sequence of one or more bins. Each bin may use one or more 'contexts', with a context describing probability levels to be used for coding a 'one' and a 'zero' value for the bin. Where multiple contexts are available for a given bin, a 'context modelling' or 'context selection' step is performed to choose one of the available contexts for decoding the bin. The process of decoding bins forms a sequential feedback loop. The number of operations in the feedback loop is preferably minimised to enable the entropy decoder 420 to achieve a high throughput in bins/second. Context modelling depends on other properties of the bitstream known to the video decoder 134 at the time of selecting the context, that is, properties preceding the current bin. For example, a context may be selected based on the quad-tree depth of the current CU in the coding tree. Dependencies are preferably based on properties that are known well in advance of decoding a bin, or are determined without requiring long sequential processes.
[000103] A quadtree depth of a coding tree is an example of a dependency for context modelling that is easily known. An intra prediction mode is an example of a dependency for context modelling that is relatively difficult or computationally intensive to determine. Intra prediction modes are coded as either an index into a list of 'most probable modes' (MPMs) or an index into a list of 'remaining modes', with the selection between MPMs and remaining modes according to a decoded 'intra_luma_mpm_flag'. When an MPM is in use an 'intra_luma_mpm_idx' syntax element is decoded to select which one of the most probable modes is to be used. Generally there are six MPMs. When a remaining mode is in use an 'intra_luma_remainder' syntax element is decoded to select which one of the remaining (non
MPM) modes is to be used. Determining both the most probable modes and the remaining modes requires a substantial number of operations and includes dependencies on the intra prediction modes of neighbouring blocks. For example, the neighbouring blocks can be the block(s) above and to the left of the current block. Desirably, the contexts of the bins of each CU can be determined, enabling parsing by the arithmetic coding engine, without knowing the intra prediction mode being signalled. The feedback loop present in the arithmetic coding engine for sequential bin decoding thus avoids a dependency on the intra prediction mode. The intra prediction mode determination can be deferred to a subsequent processing stage, with a separate feedback loop due to the dependency of MPM list construction on the intra prediction modes of neighbouring blocks. Accordingly, the arithmetic decoding engine of the entropy decoder module 420 is able to parse the intra_luma_mpm_flag, intra_luma_mpm_idx, and intra_luma_remainder syntax elements without needing to know the intra prediction modes of any earlier (e.g. neighbouring) block. The entropy decoder module 420 applies an arithmetic coding algorithm, for example 'context adaptive binary arithmetic coding' (CABAC), to decode syntax elements from the bitstream 133. The decoded syntax elements are used to reconstruct parameters within the video decoder 134. Parameters include residual coefficients (represented by an arrow 424) and mode selection information such as an intra prediction mode (represented by an arrow 458). The mode selection information also includes information such as motion vectors, and the partitioning of each CTU into one or more CBs. Parameters are used to generate PBs, typically in combination with sample data from previously decoded CBs.
[000104] The residual coefficients 424 are input to a dequantiser module 428. The dequantiser module 428 performs inverse quantisation (or 'scaling') on the residual coefficients 424 to create reconstructed intermediate transform coefficients, represented by an arrow 432, according to a quantisation parameter. The reconstructed intermediate transform coefficients 432 are passed to an inverse secondary transform module 436 where either a secondary transform or no operation (bypass) is applied, in accordance with a decoded "nsst_index" syntax element. The "nsst_index" is decoded from the bitstream 133 by the entropy decoder 420, under execution of the processor 205. The inverse secondary transform module 436 produces reconstructed transform coefficients 440. Should use of a non-uniform inverse quantisation matrix be indicated in the bitstream 133, the video decoder 134 reads a quantisation matrix from the bitstream 133 as a sequence of scaling factors and arranges the scaling factors into a matrix. The inverse scaling uses the quantisation matrix in combination with the quantisation parameter to create the reconstructed intermediate transform coefficients 432.
[000105] The reconstructed transform coefficients 440 are passed to an inverse primary transform module 444. The module 444 transforms the coefficients from the frequency domain back to the spatial domain. The result of operation of the module 444 is a block of residual samples, represented by an arrow 448. The block of residual samples 448 is equal in size to the corresponding CU. The residual samples 448 are supplied to a summation module 450. At the summation module 450 the residual samples 448 are added to a decoded PB (represented as 452) to produce a block of reconstructed samples, represented by an arrow 456. The reconstructed samples 456 are supplied to a reconstructed sample cache 460 and an in-loop filtering module 488. The in-loop filtering module 488 produces reconstructed blocks of frame samples, represented as 492. The frame samples 492 are written to a frame buffer 496.
[000106] The reconstructed sample cache 460 operates similarly to the reconstructed sample cache 356 of the video encoder 114. The reconstructed sample cache 460 provides storage for reconstructed samples needed to intra predict subsequent CBs without accessing the memory 206 (for example by using the data 232 instead, which is typically on-chip memory). Reference samples, represented by an arrow 464, are obtained from the reconstructed sample cache 460 and supplied to a reference sample filter 468 to produce filtered reference samples indicated by arrow 472. The filtered reference samples 472 are supplied to an intra-frame prediction module 476. The module 476 produces a block of intra-predicted samples, represented by an arrow 480, in accordance with the intra prediction mode parameter 458 signalled in the bitstream 133 and decoded by the entropy decoder 420.
[000107] When the prediction mode of a CB is indicated to be intra prediction in the bitstream 133, the intra-predicted samples 480 form the decoded PB 452 via a multiplexor module 484. Intra prediction produces a prediction block (PB) of samples, that is, a block in one colour component, derived using 'neighbouring samples' in the same colour component. The neighbouring samples are samples adjacent to the current block and by virtue of being preceding in the block decoding order have already been reconstructed. Where luma and chroma blocks are collocated, the luma and chroma blocks may use different intra prediction modes. However, the two chroma channels each share the same intra prediction mode. Intra prediction falls into three types. "DC intra prediction" involves populating a PB with a single value representing the average of the neighbouring samples. "Planar intra prediction" involves populating a PB with samples according to a plane, with a DC offset and a vertical and horizontal gradient being derived from the neighbouring samples. "Angular intra prediction" involves populating a PB with neighbouring samples filtered and propagated across the PB in a
particular direction (or 'angle'). In VVC, 65 angles are supported, with rectangular blocks able to utilise additional angles, not available to square blocks, to produce a total of 87 angles. A fourth type of intra prediction is available to chroma PBs, whereby the PB is generated from collocated luma reconstructed samples according to a 'cross-component linear model' (CCLM) mode. Three different CCLM modes are available, each of which uses a different model derived from the neighbouring luma and chroma samples. The derived model is then used to generate a block of samples for the chroma PB from the collocated luma samples.
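Two of the intra prediction types described above, DC and planar prediction, can be sketched for a square PB as follows. The reference arrays are assumed to hold size+1 reconstructed neighbouring samples (including the above-right and below-left samples); angular and CCLM prediction, reference sample filtering and the exact rounding of the standard are not modelled.

    import numpy as np

    def dc_prediction(above, left, size):
        # Populate the PB with the average of the neighbouring samples.
        dc = int(round((np.sum(above[:size]) + np.sum(left[:size])) / (2 * size)))
        return np.full((size, size), dc, dtype=int)

    def planar_prediction(above, left, size):
        # Populate the PB from horizontal and vertical gradients derived from
        # the neighbouring samples (square-block simplification).
        pb = np.zeros((size, size), dtype=int)
        for y in range(size):
            for x in range(size):
                horiz = (size - 1 - x) * left[y] + (x + 1) * above[size]
                vert = (size - 1 - y) * above[x] + (y + 1) * left[size]
                pb[y, x] = (horiz + vert + size) // (2 * size)
        return pb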
[000108] When the prediction mode of a CB is indicated to be inter prediction in the bitstream 133, a motion compensation module 434 produces a block of inter-predicted samples, represented as 438, using a motion vector and reference frame index to select and filter a block of samples 498 from a frame buffer 496. The block of samples 498 is obtained from a previously decoded frame stored in the frame buffer 496. For bi-prediction, two blocks of samples are produced and blended together to produce samples for the decoded PB 452. The frame buffer 496 is populated with filtered block data 492 from an in-loop filtering module 488. As with the in-loop filtering module 368 of the video encoder 114, the in-loop filtering module 488 applies any of the DBF, the ALF and SAO filtering operations. Generally, the motion vector is applied to both the luma and chroma channels, although the sub-sample interpolation filtering processes for the luma and chroma channels are different.
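The blending of the two reference blocks for bi-prediction may be illustrated by the simple averaging sketch below; weighted blending variants are not modelled here and the rounding shown is an assumption of the example.

    import numpy as np

    def bipredict(block0, block1):
        # Average the two motion-compensated reference blocks with rounding.
        return (block0.astype(int) + block1.astype(int) + 1) >> 1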
[000109] Fig. 5 is a schematic block diagram showing a collection 500 of available divisions or splits of a region into one or more sub-regions in the tree structure of versatile video coding. The divisions shown in the collection 500 are available to the block partitioner 310 of the encoder 114 to divide each CTU into one or more CUs or CBs according to a coding tree, as determined by the Lagrangian optimisation, as described with reference to Fig. 3.
[000110] Although the collection 500 shows only square regions being divided into other, possibly non-square sub-regions, it should be understood that the diagram 500 is showing the potential divisions but not requiring the containing region to be square. If the containing region is non-square, the dimensions of the blocks resulting from the division are scaled according to the aspect ratio of the containing block. Once a region is not further split, that is, at a leaf node of the coding tree, a CU occupies that region. The particular subdivision of a CTU into one or more CUs by the block partitioner 310 is referred to as the 'coding tree' of the CTU.
[000111] The process of subdividing regions into sub-regions must terminate when the resulting sub-regions reach a minimum CU size. In addition to constraining CUs to prohibit
block areas smaller than a predetermined minimum size, for example 16 samples, CUs are constrained to have a minimum width or height of four. Other minimums, expressed either jointly in terms of width and height or separately in terms of width or height, are also possible. The process of subdivision may also terminate prior to the deepest level of decomposition, resulting in a CU larger than the minimum CU size. It is possible for no splitting to occur, resulting in a single CU occupying the entirety of the CTU. A single CU occupying the entirety of the CTU is the largest available coding unit size. Due to use of subsampled chroma formats, such as 4:2:0, arrangements of the video encoder 114 and the video decoder 134 may terminate splitting of regions in the chroma channels earlier than in the luma channels.
[000112] At the leaf nodes of the coding tree exist CUs, with no further subdivision. For example, a leaf node 510 contains one CU. At the non-leaf nodes of the coding tree exist a split into two or more further nodes, each of which could be a leaf node that forms one CU, or a non leaf node containing further splits into smaller regions. At each leaf node of the coding tree, one coding block exists for each colour channel. Splitting terminating at the same depth for both luma and chroma results in three collocated CBs. Splitting terminating at a deeper depth for luma than for chroma results in a plurality of luma CBs being collocated with the CBs of the chroma channels.
[000113] A quad-tree split 512 divides the containing region into four equal-size regions as shown in Fig. 5. Compared to HEVC, versatile video coding (VVC) achieves additional flexibility with the addition of a horizontal binary split 514 and a vertical binary split 516. Each of the splits 514 and 516 divides the containing region into two equal-size regions. The division is either along a horizontal boundary (514) or a vertical boundary (516) within the containing block.
[000114] Further flexibility is achieved in versatile video coding with addition of a ternary horizontal split 518 and a ternary vertical split 520. The ternary splits 518 and 520 divide the block into three regions, bounded either horizontally (518) or vertically (520) along ¼ and ¾ of the containing region width or height. The combination of the quad tree, binary tree, and ternary tree is referred to as 'QTBTTT'. The root of the tree includes zero or more quadtree splits (the 'QT' section of the tree). Once the QT section terminates, zero or more binary or ternary splits may occur (the 'multi-tree' or 'MT' section of the tree), finally ending in CBs or CUs at leaf nodes of the tree. Where the tree describes all colour channels, the tree leaf nodes are CUs. Where the tree describes the luma channel or the chroma channels, the tree leaf nodes are CBs.
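The geometry of the splits in the collection 500 can be summarised by the following sketch, which returns the offset and size of each sub-region of a containing region; the split names used here are labels chosen for the example.

    def split_regions(width, height, split):
        # Each entry is (x offset, y offset, sub-region width, sub-region height).
        if split == 'QT':                        # quad-tree split 512
            w, h = width // 2, height // 2
            return [(0, 0, w, h), (w, 0, w, h), (0, h, w, h), (w, h, w, h)]
        if split == 'HBT':                       # horizontal binary split 514
            h = height // 2
            return [(0, 0, width, h), (0, h, width, h)]
        if split == 'VBT':                       # vertical binary split 516
            w = width // 2
            return [(0, 0, w, height), (w, 0, w, height)]
        if split == 'HTT':                       # ternary horizontal split 518
            q = height // 4
            return [(0, 0, width, q), (0, q, width, 2 * q), (0, 3 * q, width, q)]
        if split == 'VTT':                       # ternary vertical split 520
            q = width // 4
            return [(0, 0, q, height), (q, 0, 2 * q, height), (3 * q, 0, q, height)]
        return [(0, 0, width, height)]           # no split: the region becomes a CU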
[000115] Compared to HEVC, which supports only the quad tree and thus only supports square blocks, the QTBTTT results in many more possible CU sizes, particularly considering possible recursive application of binary tree and/or ternary tree splits. The potential for unusual (non-square) block sizes can be reduced by constraining split options to eliminate splits that would result in a block width or height either being less than four samples or not being a multiple of four samples. Generally, the constraint would apply in considering luma samples. However, in the arrangements described, the constraint can be applied separately to the blocks for the chroma channels. Application of the constraint to split options for the chroma channels can result in differing minimum block sizes for luma versus chroma, for example when the frame data is in the 4:2:0 chroma format or the 4:2:2 chroma format. Each split produces sub-regions with a side dimension either unchanged, halved or quartered, with respect to the containing region. Then, since the CTU size is a power of two, the side dimensions of all CUs are also powers of two.
[000116] Fig. 6 is a schematic flow diagram illustrating a data flow 600 of a QTBTTT (or 'coding tree') structure used in versatile video coding. The QTBTTT structure is used for each CTU to define a division of the CTU into one or more CUs. The QTBTTT structure of each CTU is determined by the block partitioner 310 in the video encoder 114 and encoded into the bitstream 115 or decoded from the bitstream 133 by the entropy decoder 420 in the video decoder 134. The data flow 600 further characterises the permissible combinations available to the block partitioner 310 for dividing a CTU into one or more CUs, according to the divisions shown in Fig. 5.
[000117] Starting from the top level of the hierarchy, that is at the CTU, zero or more quad-tree divisions are first performed. Specifically, a Quad-tree (QT) split decision 610 is made by the block partitioner 310. The decision at 610 returning a '1' symbol indicates a decision to split the current node into four sub-nodes according to the quad-tree split 512. The result is the generation of four new nodes, such as at 620, and for each new node, recursing back to the QT split decision 610. Each new node is considered in raster (or Z-scan) order. Alternatively, if the QT split decision 610 indicates that no further split is to be performed (returns a '0' symbol), quad-tree partitioning ceases and multi-tree (MT) splits are subsequently considered.
[000118] Firstly, an MT split decision 612 is made by the block partitioner 310. At 612, a decision to perform an MT split is indicated. Returning a '0' symbol at decision 612 indicates that no further splitting of the node into sub-nodes is to be performed. If no further splitting of a node is to be performed, then the node is a leaf node of the coding tree and corresponds to a
CU. The leaf node is output at 622. Alternatively, if the MT split 612 indicates a decision to perform an MT split (returns a '1' symbol), the block partitioner 310 proceeds to a direction decision 614.
[000119] The direction decision 614 indicates the direction of the MT split as either horizontal ('H' or '0') or vertical ('V' or '1'). The block partitioner 310 proceeds to a decision 616 if the decision 614 returns a '0' indicating a horizontal direction. The block partitioner 310 proceeds to a decision 618 if the decision 614 returns a '1' indicating a vertical direction.
[000120] At each of the decisions 616 and 618, the number of partitions for the MT split is indicated as either two (binary split or 'BT' node) or three (ternary split or 'TT') at the BT/TT split. That is, a BT/TT split decision 616 is made by the block partitioner 310 when the indicated direction from 614 is horizontal and a BT/TT split decision 618 is made by the block partitioner 310 when the indicated direction from 614 is vertical.
[000121] The BT/TT split decision 616 indicates whether the horizontal split is the binary split 514, indicated by returning a '0', or the ternary split 518, indicated by returning a '1'. When the BT/TT split decision 616 indicates a binary split, at a generate HBT CTU nodes step 625 two nodes are generated by the block partitioner 310, according to the binary horizontal split 514. When the BT/TT split 616 indicates a ternary split, at a generate HTT CTU nodes step 626 three nodes are generated by the block partitioner 310, according to the ternary horizontal split 518.
[000122] The BT/TT split decision 618 indicates whether the vertical split is the binary split 516, indicated by returning a '0', or the ternary split 520, indicated by returning a '1'. When the BT/TT split 618 indicates a binary split, at a generate VBT CTU nodes step 627 two nodes are generated by the block partitioner 310, according to the vertical binary split 516. When the BT/TT split 618 indicates a ternary split, at a generate VTT CTU nodes step 628 three nodes are generated by the block partitioner 310, according to the vertical ternary split 520. For each node resulting from steps 625-628 recursion of the data flow 600 back to the MT split decision 612 is applied, in a left-to-right or top-to-bottom order, depending on the direction 614. As a consequence, the binary tree and ternary tree splits may be applied to generate CUs having a variety of sizes.
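The decision order of the data flow 600 can be sketched as a recursive traversal, shown below. The read_bin() callable is a hypothetical source of the decision symbols, split legality checks are omitted, and the sketch reuses split_regions() from the earlier example.

    def parse_coding_tree(region, read_bin, allow_qt=True):
        x, y, w, h = region
        if allow_qt and read_bin() == 1:                 # QT split decision 610
            half_w, half_h = w // 2, h // 2
            leaves = []
            for ox, oy in ((0, 0), (half_w, 0), (0, half_h), (half_w, half_h)):
                leaves += parse_coding_tree((x + ox, y + oy, half_w, half_h),
                                            read_bin, allow_qt=True)
            return leaves
        if read_bin() == 0:                              # MT split decision 612
            return [region]                              # leaf node: a CU
        direction = 'V' if read_bin() == 1 else 'H'      # direction decision 614
        ternary = read_bin() == 1                        # BT/TT decision 616 or 618
        if direction == 'H':
            split = 'HTT' if ternary else 'HBT'
        else:
            split = 'VTT' if ternary else 'VBT'
        leaves = []
        for ox, oy, sw, sh in split_regions(w, h, split):
            # Once the MT section has started, no further QT splits occur.
            leaves += parse_coding_tree((x + ox, y + oy, sw, sh),
                                        read_bin, allow_qt=False)
        return leaves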
[000123] Figs. 7A and 7B provide an example division 700 of a CTU 710 into a number of CUs or CBs. An example CU 712 is shown in Fig. 7A. Fig. 7A shows a spatial arrangement of CUs in the CTU 710. The example division 700 is also shown as a coding tree 720 in Fig. 7B.
[000124] At each non-leaf node in the CTU 710 of Fig. 7A, for example nodes 714, 716 and 718, the contained nodes (which may be further divided or may be CUs) are scanned or traversed in a 'Z-order' to create lists of nodes, represented as columns in the coding tree 720. For a quad-tree split, the Z-order scanning results in a top-left to top-right order, followed by a bottom-left to bottom-right order. For horizontal and vertical splits, the Z-order scanning (traversal) simplifies to a top-to-bottom scan and a left-to-right scan, respectively. The coding tree 720 of Fig. 7B lists all nodes and CUs according to the applied scan order. Each split generates a list of two, three or four new nodes at the next level of the tree until a leaf node (CU) is reached.
[000125] Having decomposed the image into CTUs and further into CUs by the block partitioner 310, and using the CUs to generate each residual block (324) as described with reference to Fig. 3, residual blocks are subject to forward transformation and quantisation by the video encoder 114. The resulting TBs 336 are subsequently scanned to form a sequential list of residual coefficients, as part of the operation of the entropy coding module 338. An equivalent process is performed in the video decoder 134 to obtain TBs from the bitstream 133.
[000126] Fig. 8A shows a graph 800 showing two scalar quantisers for quantising and inverse quantising residual coefficients. Transform coefficients, such as the coefficients 332, are integer values output from the forward primary transform module 326 in combination with integer values output from the forward secondary transform module 330. Transform coefficients are quantised by the quantiser block 334 to produce residual coefficients. Quantisation involves selecting a quantisation index closest in value to the transform coefficient according to a quantiser step size that results from a quantisation parameter. As shown in the graph 800, two quantisers (Q0 and Q1) are available. The quantiser to be used is selected according to a state variable, referred to as S. The state variable S is updated according to a finite state machine (FSM) as described with reference to Fig. 8B. The FSM is updated in the entropy decoder 420 as scanning progresses from one coefficient to the next. The FSM is updated based upon a previous value of the state S and the parity of the current residual coefficient magnitude. The parity relates to whether the current residual coefficient is an odd value, such as 810, compared to an even value, such as 820.
[000127] Quantisation methods can include a single quantiser or two quantisers. Firstly, quantisation methods with a single quantiser are described. One method for selecting quantisation indices is to divide the transform coefficient by the quantiser step size (referred to as a 'uniform quantiser'). Quantisation is a lossy process due to discarding of the distance between the transform coefficient and the chosen quantisation index, for example the remainder when
a uniform quantiser is used. A decrease in loss in the coding performance can be achieved when a larger range of transform coefficients are assigned to the zero residual coefficient value (referred to as a 'deadzone quantiser'). Further decrease of the loss in coding performance can be obtained by considering the distortion resulting from each quantisation index being one of several values nearby (e.g. above and below) the transform coefficient. Consideration of the distortion for each residual coefficient independently is possible ('rate-distortion optimised quantisation' or 'RDOQ').
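The difference between a uniform quantiser and a deadzone quantiser can be sketched as follows; the deadzone rounding offset of one third is an assumed value for illustration only.

    def uniform_quantise(coeff, step):
        # Select the quantisation index closest to coeff / step.
        return int(round(coeff / step))

    def deadzone_quantise(coeff, step, offset=1.0 / 3.0):
        # A rounding offset below one half enlarges the range of transform
        # coefficients assigned to the zero residual coefficient value.
        sign = -1 if coeff < 0 else 1
        return sign * int(abs(coeff) / step + offset)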
[000128] The use of two quantisers (such as Q0 and Q1), selected according to a state machine, enables further gains in compression efficiency compared to methods such as RDOQ. Selection of one of the two quantisers according to a state machine implemented in both the coefficient parsing in the entropy decoder 420 and the dequantiser 428 incurs a complexity increase compared to methods such as RDOQ. However, use of the two quantisers has been found to allow a relatively substantial coding gain (decrease in coding loss). The increase in coding gain can offset the increase in complexity.
[000129] Moreover, typical pipelined implementations perform entropy decoding in one pipeline stage and residual coefficient inverse quantisation in a later stage, reducing required memory for intermediate buffering. Moreover, multiple transform blocks would typically be buffered as the pipeline typically operates on a granularity of 64x64 luma sample regions, sometimes referred to as 'virtual pipeline data units' (VPDUs). Transform blocks are confined to a maximum width or height of 64 samples, as this is the longest 1D transform size available in the VVC standard. As such, the largest TB size is 64x64. Moreover, TBs are constrained not to overlap the boundary from one VPDU to another VPDU. Notwithstanding the maximum TB size of 64x64, residual coefficients are only coded for at most the upper-left 32x32 region of a TB, so for TBs with either dimension exceeding 32 samples, a 'zeroed out' region exists, where residual coefficients are implicitly insignificant.
[000130] Fig. 8B shows a finite state machine 840 (FSM) for selecting a scalar quantiser for quantising and inverse quantising and binarising residual coefficients. Four states are used, so that the state variable S can be one of states "00", "01", "10", and "11". For each residual coefficient that is quantised or inverse quantised, the Q0 quantiser is used when the state is "00" or "01", and the Q1 quantiser is used when the state is "10" or "11", i.e. the most significant bit of the state variable is used to select between Q0 and Q1. The state machine 840 operates in the order in which residual coefficients are encoded and decoded, i.e. a backward diagonal scan order. When a sub-block contains no significant residual coefficients, i.e. coded sub-block flag
for the sub-block is zero, the state machine is not updated. Due to the order of updating the state machine, the 'last significant coefficient' is the first residual coefficient encountered by the state machine. Prior to encountering the last significant coefficient, the state is "00". The state machine 840 is updated based upon the parity of each encountered residual coefficient, that is, whether the current residual coefficient is odd or even. For the purpose of state machine updating, the absolute magnitude of the residual coefficient is considered. Prior to processing the last significant coefficient the state is reset to "00". Also, after decoding residual coefficients at the predetermined locations in the TB, for example the top-left position of each sub-block, the state S is reset to "00". The reset state forms the state that is used for quantiser and context selection for the next residual coefficient to be decoded, for example the bottom right residual coefficient of the next sub-block in the backward diagonal scan order. Then, after each residual coefficient is decoded, the state is updated in accordance with the FSM 840 to set the state for use in decoding the next residual coefficient.
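The behaviour of the state machine 840 can be summarised by the sketch below. The transition table shown is constructed to be consistent with the transitions described in this paragraph and in relation to Figs. 8C and 9C, and the function names are chosen for the example.

    # Indexed by [current state][parity of the coefficient magnitude]; states
    # 0..3 correspond to "00", "01", "10" and "11".
    STATE_TRANSITION = [
        [0, 2],   # from "00": even -> "00", odd -> "10"
        [2, 0],   # from "01": even -> "10", odd -> "00"
        [1, 3],   # from "10": even -> "01", odd -> "11"
        [3, 1],   # from "11": even -> "11", odd -> "01"
    ]

    def quantiser_for_state(state):
        # The most significant bit of the state selects Q0 or Q1.
        return 'Q1' if state >= 2 else 'Q0'

    def scan_states(coefficients, reset_positions):
        # coefficients: (scan position, magnitude) pairs in backward scan order.
        # Yields the quantiser used for each coefficient, resetting the state
        # to "00" after coefficients at the predetermined positions.
        state = 0
        for position, magnitude in coefficients:
            yield position, quantiser_for_state(state)
            if position in reset_positions:
                state = 0
            else:
                state = STATE_TRANSITION[state][magnitude & 1]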
[000131] Fig. 8C shows a set of possible state transitions 860 of the FSM 840. Starting with a last significant residual coefficient (i.e., "N") the state (S) begins at "00". Upon decoding coefficient N the state may transition to either "00" or "10", after which the updated state is used for decoding coefficient N-1. After decoding coefficient N-1 the state may transition to any of "00", "01", "10", or "11" in accordance with the FSM 840. The set of all paths possible from the last significant residual coefficient N to a residual coefficient 0 (i.e. the DC residual coefficient) forms a structure that is commonly referred to as a trellis, and is determined by the potential state transitions 860. Each possible path within the trellis has an associated rate-distortion cost. A path is chosen by selecting the path with the minimum rate-distortion cost. The selection of the minimum cost path may be referred to as a "trellis search". The constrained structure of a trellis allows for computation and memory efficient search strategies. For example, one method of selecting the minimum cost path in a trellis is the Viterbi algorithm. For residual coefficients located at particular positions in the TB the state may be reset to a known state, e.g. "00" (also referred to as a zero state), as described with reference to Figs. 9C, 12, and 15.
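A minimal Viterbi-style search over the four-state trellis is sketched below. The candidate_costs(n, state) callable is hypothetical and stands for whatever rate-distortion model is used to cost the candidate quantisation levels of coefficient n, returning (cost, parity) pairs. The sketch reuses STATE_TRANSITION from the previous example and returns only the minimum accumulated cost, whereas a full implementation would also record the chosen level along each surviving path.

    def trellis_min_cost(num_coeffs, candidate_costs):
        best = {0: 0.0}                              # the scan starts in state "00"
        for n in range(num_coeffs - 1, -1, -1):      # backward scan order: N .. 0
            new_best = {}
            for state, accumulated in best.items():
                for cost, parity in candidate_costs(n, state):
                    nxt = STATE_TRANSITION[state][parity & 1]
                    total = accumulated + cost
                    if nxt not in new_best or total < new_best[nxt]:
                        new_best[nxt] = total
            best = new_best
        return min(best.values())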
[000132] Fig. 9A shows a two-level backward diagonal scan 910 of an example 8x8 TB 900. The scan 910 is shown progressing from the bottom right residual coefficient position of the TB 900 back to the top-left (DC) residual coefficient position of the TB 900. The path of the scan 910 progresses within 4x4 regions, known as sub-blocks, and from one sub-block to the next. For TBs of width or height of two, sub-block sizes of 2x2, 2x8, or 8x2 are available. Scanning within a particular sub-block is either performed or the sub-block skipped, according
to a 'coded sub-block flag'. When scanning of a sub-block is skipped all residual coefficients within the sub-block are inferred to have a value of zero. Although the scan 910 is shown commencing from the bottom-right residual coefficient position of the TB 900, for a given set of residual coefficients scanning commences from the position of the 'last significant coefficient', the coefficient being 'last' when order of coefficients is considered as progressing from the DC coefficient instead of the scan order.
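The two-level backward diagonal scan can be sketched as follows. The ordering of positions within each anti-diagonal is an assumption of this sketch; the precise order follows Fig. 9A.

    def diagonal_scan(width, height):
        # Forward diagonal scan of a width x height array of positions,
        # visiting anti-diagonals starting from the top-left position.
        order = []
        for diag in range(width + height - 1):
            for y in range(height):
                x = diag - y
                if 0 <= x < width:
                    order.append((x, y))
        return order

    def backward_sub_block_scan(tb_width, tb_height, sb=4):
        # Sub-blocks are visited from the bottom-right back towards the DC
        # sub-block; positions inside each sub-block are visited in the same
        # backward diagonal fashion.
        positions = []
        for sbx, sby in reversed(diagonal_scan(tb_width // sb, tb_height // sb)):
            for x, y in reversed(diagonal_scan(sb, sb)):
                positions.append((sbx * sb + x, sby * sb + y))
        return positions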
[000133] Fig. 9B shows an example transform block 900b including transform coefficients 915, i.e. as output from the primary transform module 326 and/or the secondary transform module 330 and prior to quantisation. The transform coefficients 915 are output from the primary and/or secondary transform modules, i.e. 326 and 330, depending on whether a secondary transform is being used. The magnitudes of the transform coefficients 915 are relatively high, making direct encoding into the bitstream 115 costly in terms of coding cost and compression. After quantisation, magnitudes of the resulting residual coefficients are relatively smaller and encoding into the bitstream 115 with an acceptable level of compression is possible.
[000134] Fig. 9C shows residual coefficients resulting from quantising the transform coefficients 915 of Fig. 9B. A set of states 930 (also referred to as the state variable S) of the state machine 840 are shown. The initial state (prior to the last significant coefficient 911) is "00". A trellis search results in a sequence of residual coefficient magnitudes 920, derived from the transform coefficients 915. A last significant residual coefficient is located at scan position 59 (i.e., 911), with further residual coefficients back to scan position 0. As the initial state was "00", Q0 is used to inverse quantise coefficient 911 and the first set of contexts is used to decode coefficient 911. As the magnitude of the coefficient 911 is one (i.e. odd), the next state resulting from encoding or decoding the coefficient 911 is "10". For clarity, residual coefficient magnitudes 920 are depicted as magnitudes only (signs omitted) in Fig. 9C as the sign of each residual coefficient does not influence the state machine.
[000135] The state shown for each residual coefficient is the next state that results from a current state and the residual coefficient parity (ODD or EVEN) and the position of the residual coefficient along the scan path (that is, whether the coefficient is located at a predetermined position or not). The reset to "00" at a predetermined position (for example positions 48, 32, 16) results in use of Q0 and the first set of contexts for residual coefficients at positions 47 and 31. The odd parity of the coefficient at position 47 results in a next state of "10", for example as shown as a two-part transition at transition 951. The even parity of the coefficient at position 31 results in a next state of "00" (that is, no change from the reset that
occurred at the previous residual coefficient), at transition 950 for example. The reset of the state to "00" after decoding the last residual coefficient of each sub-block reduces the sequential dependencies in updating of the state machine 840, allowing each residual coefficient in each sub-block to be quantised and inverse quantised in parallel.
[000136] Effectively, an initial position in each of the sub-blocks of the transform block encountered when performing the backward diagonal scan order provides a predetermined position for a reset. The predetermined positions for the reset are dependent upon the sub-block boundaries, for example, coinciding with each sub-block boundary by virtue of being performed after decoding the last residual coefficient of each sub-block in the backward diagonal scan order. As the scan starts from the last significant coefficient at the beginning of coding a CU, the number of positions for a reset also depends on the position of the last significant coefficient. As a consequence, dependencies in arithmetic coding and quantisation are at most for runs of 16 coefficients rather than a worst case of a run of 1024 residual coefficients, resulting from a 32x32 coded region of residual coefficients of a TB. The state machine 840 operates such that the state may nonetheless return to the state "00" for coefficients other than those at the sub-block transition. For example, transition 960 ("00" to "00") occurs when decoding residual coefficients not located at a transition between sub-blocks. Transition 941 ("11" to "01") shows the transition when an odd residual coefficient is decoded that is not located at a predetermined location for the state reset operation.
[000137] Fig. 9D shows the residual coefficient magnitudes 920 of Fig. 9C mapped into a transform block 900d as residual coefficients 990. The residual coefficients 990 are shown with sign and magnitude, and according to the backward diagonal scan pattern.
[000138] Fig. 10 shows a method 1000 for encoding coding units and transform blocks of an image frame into a video bitstream 115. The method 1000 may be embodied by apparatus such as a configured FPGA, an ASIC, or an ASSP. Additionally, the method 1000 may be performed by video encoder 114 under execution of the processor 205. As such, the method 1000 may be stored on computer-readable storage medium and/or in the memory 206. The method 1000 commences at a divide frame into CTUs step 1010.
[000139] At the divide frame into CTUs step 1010 the block partitioner 310, under execution of the processor 205, divides a current frame of the frame data 113 into an array of CTUs. A progression of encoding over the CTUs resulting from the division commences. Control in the processor progresses from the step 1010 to a determine coding tree step 1020.
[000140] At the determine coding tree step 1020 the video encoder 114, under execution of the processor 205, tests various prediction modes and split options in combination to arrive at a coding tree for a CTU. Also derived at step 1020 are prediction modes and residual coefficients for each TB of each CU of the coding tree for the CTU. Generally, a Lagrangian optimisation is performed to select the optimal coding tree and CUs for the CTU. When evaluating use of inter prediction, a motion vector is selected from a set of candidate motion vectors. Candidate motion vectors are generated according to a search pattern. When the distortion of fetched reference blocks for candidate motion vectors is being evaluated, the application of prohibited chroma splitting in the coding tree is considered. When a split is prohibited in chroma and allowed in luma, the resulting luma CBs may use inter prediction. Motion compensation is applied to the luma channel only and so the distortion computation considers the luma distortion and not the chroma distortion. The chroma distortion is not considered as motion compensation is not performed in the chroma channel when the chroma split was prohibited. For chroma, the distortion resulting from the considered intra prediction mode and a coded chroma TB (if any) is considered. When considering both luma and chroma, the inter prediction search may firstly select a motion vector based on luma distortion and then 'refine' the motion vector by also considering chroma distortion. Refinement generally considers small variations in the motion vector value, such as sub-pixel displacements. Control in the processor 205 progresses from the step 1020 to an encode coding unit step 1030.
[000141] At the encode coding unit step 1030 the video encoder 114, under execution of the processor 205, encodes one of the coding units into the bitstream 115. The step 1030 is performed for each CU resulting from the coding tree determined at the step 1020. Split flags are encoded to indicate the decomposition of the CTU into one or more CUs, in accordance with the coding tree as determined at the step 1020. A prediction mode for the CU is encoded into the bitstream 115, indicating use of intra prediction, intra block copy, inter prediction, motion vector, merge mode or other parameters needed to define the PU associated with the CU. Control in the processor 205 progresses from the step 1030 to a quantise residual coefficients step 1035.
[000142] At the quantise residual coefficients step 1035 the quantiser module 334, under execution of the processor 205, produces an array of quantised residual coefficients for a set of transform coefficients. For each residual coefficient, selection of one of the quantisers Q0 or Q1 is in accordance with the state variable S, updated from one coefficient to the next coefficient based on the state machine 840. Moreover, at predetermined points along the scan path, e.g. at boundaries from one sub-block to the next sub-block, the state variable S is reset to
a predetermined state, also referred to as a reset state. The predetermined state can accordingly correspond to an index of the sub-block. In the example described the reset state is "00". However, other states can be used as the reset state if required. Determination of each residual coefficient accords with 4-state trellis path determination, as described by way of example in relation to Figs. 9A-9D. Control in the processor 205 progresses from the quantise residual coefficients step 1035 to an encode transform blocks step 1040.
[000143] At the encode transform blocks step 1040 the video encoder 114, under execution of the processor 205, encodes a TB for each colour channel (that is each of the primary and secondary channels) of the CU encoded at the step 1030 into the bitstream 115. Each TB is encoded by performing a method 1100, described with reference to Fig. 11. Control in the processor 205 progresses from the step 1040 to a last coding unit test step 1050.
[000144] At the last coding unit test step 1050 the processor 205 tests if the current coding unit is the last one in the coding tree of step 1020. If the current coding unit is the last one in the coding tree of step 1020 ("YES" at step 1050), control in the processor progresses to a last CTU test step 1060. If the current coding unit is not the last one in the coding tree of step 1020 ("NO" at step 1050), the next coding unit in the coding tree of step 1020 is selected for encoding and control in the processor 205 progresses to the step 1030.
[000145] At the last CTU test step 1060 the processor 205 tests if the current CTU is the last CTU in the slice or frame. If not ("NO" at step 1060), the video encoder 114 advances to the next CTU in the frame and control in the processor 205 progresses from the step 1060 back to the step 1020 to continue processing remaining CTUs in the frame. If the CTU is the last one in the frame or slice, the step 1060 returns "YES" and the method 1000 terminates. As a result of the method 1000, an entire image frame is encoded as a sequence of CTUs into a bitstream.
[000146] Fig. 11 shows a method 1100 for encoding a transform block of an image frame into a video bitstream 115, as implemented at step 1040 for each of the colour channels. The method 1100 may be embodied by apparatus such as a configured FPGA, an ASIC, or an ASSP. Additionally, the method 1100 may be performed by video encoder 114 under execution of the processor 205. As such, the method 1100 may be stored on computer-readable storage medium and/or in the memory 206. The method 1100 commences at a reset state step 1110.
[000147] At the reset state step 1110 the entropy encoder 338, under execution of the processor 205, resets the state variable S, described in relation to step 1035, to "00". Control in the processor 205 progresses from the step 1110 to an encode last position step 1120.
[000148] At the encode last position step 1120 the entropy encoder 338, under execution of the processor 205, encodes the position of the last significant residual coefficient of the TB into the bitstream 115. Control in the processor 205 progresses from the step 1120 to a select sub-block step 1130.
[000149] At the select sub-block step 1130 the processor 205 selects one sub-block of the TB. The selected sub-block is the one containing the last significant residual coefficient. Control in the processor 205 progresses from the step 1130 to a determine coded sub-block flag step 1140.
[000150] At the determine coded sub-block flag step 1140 the processor 205 determines a coded sub-block flag, such that a value of '1' indicates the presence of at least one significant residual coefficient in the sub-block and a value of '0' indicates that all residual coefficients in the sub-block are insignificant. Control in the processor 205 progresses to a coded sub-block flag test step 1150.
[000151] At the coded sub-block flag test step 1150 the coded sub-block flag determined at step 1140 is tested for presence of at least one significant residual coefficient. If the test is indicative of the presence of at least one significant residual coefficient ("TRUE" at step 1150), control in the processor 205 progresses to a first pass coefficient encode step 1152. For the sub-block containing the last significant coefficient, progression to step 1152 is implicit. For the top-left sub-block (containing the DC coefficient), step 1150 returns "TRUE" and control passes to the step 1152 regardless of the flag value determined at step 1140. Moreover, for sub-blocks other than the one containing the DC coefficient or the last significant coefficient, the determined coded sub-block flag of the step 1140 is encoded into the bitstream 115 by the entropy encoder 338 at step 1152. If no significant residual coefficients are present in the sub-block ("FALSE" at step 1150), control in the processor 205 progresses from the step 1150 to a next sub-block step 1170.
[000152] At the first pass coefficient encode step 1152 the video encoder 114, under execution of the processor 205, performs a method 1200, as described hereafter with reference to Fig. 12. The method 1200 executes to encode one or more flags associated with each of the residual coefficients of one sub-block into the bitstream 115. In particular, for each residual coefficient a significance flag, a 'greater than one' flag, a parity flag, and a 'greater than three' flag may be encoded using context-coded bins in execution of the step 1152. Each sub-block has a budget of using at most 32 context-coded bins for residual coding. Each bin encoded using context coding decreases the budget by one. Once the budget reaches zero, the coefficient index within
the sub-block at which the budget reached zero is used to divide the remaining coefficients of the sub-block into an additional (second) group, to be encoded using a different method with bypass-coded bins only, as described hereafter in relation to the method 1100. As such, over the course of execution of the method 1100, the current sub-block is divided into a first group of residual coefficients (encoded using context-coded bins and bypass-coded bins) and a second group of residual coefficients (encoded using bypass-coded bins only). The encoded magnitude for a residual coefficient n is defined as AbsLevel[ n ]. Then, for a residual coefficient belonging to the first group of residual coefficients, a portion of the encoded magnitude AbsLevel[ n ] is encoded by the corresponding significance flag sig_coeff_flag[ n ], 'greater than one' flag abs_level_gt1_flag[ n ], parity flag par_level_flag[ n ], and 'greater than three' flag abs_level_gt3_flag[ n ] as a value AbsLevelPass1[ n ] expressed according to Equation (1) below:
AbsLevelPass1[ n ] = sig_coeff_flag[ n ] + par_level_flag[ n ] + abs_level_gt1_flag[ n ] + 2 * abs_level_gt3_flag[ n ]. (1)
[000153] Control in the processor 205 progresses from the step 1152 to a second pass coefficient encode step 1154.
[000154] At the second pass coefficient encode step 1154 the entropy encoder 338, under execution of the processor 205, encodes the remaining magnitude of the first group of residual coefficients into the bitstream 115 using a bypass-coded Golomb Rice binarisation of a value abs_remainder[ n ], which is defined according to Equation (2) below:
abs_remainder[ n ] = ( AbsLevel[ n ] - AbsLevelPass1[ n ] ) >> 1. (2)
[000155] The remaining magnitude abs_remainder[ n ] refers to the quantity of magnitude of each residual coefficient belonging to the first group that has not already been expressed via the significance flag, greater than one flag, greater than three flag, and parity flag. For example, a residual coefficient with magnitude 10 (AbsLevel[ n ] = 10) has a significance flag, a greater than one flag and a greater than three flag all equal to one, and a parity flag equal to zero. The corresponding value of AbsLevelPass1[ n ] is therefore equal to 4, and the remaining magnitude abs_remainder[ n ] is equal to 3.
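[000155a] As a minimal sketch, the relationship between AbsLevel[ n ], the first-pass flags and abs_remainder[ n ] given by Equations (1) and (2) can be written out directly. The function below illustrates those equations only; it does not model context selection or bin coding, and its name is chosen for illustration.

```python
def first_pass_flags(abs_level):
    # Derive the first-pass flags and the remaining magnitude for one
    # residual coefficient of the first group, per Equations (1) and (2).
    sig = 1 if abs_level > 0 else 0
    gt1 = 1 if abs_level > 1 else 0
    par = abs_level & 1 if abs_level > 1 else 0  # least significant bit of the magnitude
    gt3 = 1 if abs_level > 3 else 0
    abs_level_pass1 = sig + par + gt1 + 2 * gt3          # Equation (1)
    abs_remainder = (abs_level - abs_level_pass1) >> 1   # Equation (2)
    return sig, gt1, par, gt3, abs_level_pass1, abs_remainder

# Worked example from the text: magnitude 10 gives AbsLevelPass1 = 4 and abs_remainder = 3.
print(first_pass_flags(10))  # (1, 1, 0, 1, 4, 3)
```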
[000156] Control in the processor 205 progresses from the step 1154 to a third pass coefficient encode step 1156.
[000157] At the third pass coefficient encode step 1156 the entropy encoder 338, under execution of the processor 205, encodes residual coefficients (if any) from the second group of step 1152 into the bitstream 115. Residual coefficients of the second group may be encoded using Golomb Rice coding, with the Rice parameter derived from a 'neighbourhood' of previously encoded residual coefficients. The neighbourhood is a pattern of five residual coefficients located according to a fixed spatial pattern relative to the current residual coefficient. The fixed spatial pattern results in the use of up to five previously encoded residual coefficients, located in close spatial proximity to the current residual coefficient. The sum of absolute magnitudes of the neighbourhood encoded residual coefficients is used to select a Rice parameter for the residual coefficient to be coded. In some arrangements, the Rice parameter derivation method is identical for the second pass coefficient encode step 1154 and the third pass coefficient encode step 1156. Control in the processor 205 progresses from the step 1156 to a sign bit pass encode step 1158.
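[000157a] By way of illustration only, a Golomb Rice binarisation of a value with Rice parameter k emits a unary prefix for the quotient value >> k followed by k suffix bits. The sketch below shows one common prefix convention together with a purely hypothetical threshold mapping from the neighbourhood sum to k; the exact mapping is not reproduced in this description and the thresholds shown are assumptions.

```python
def golomb_rice_bits(value, k):
    # Bypass-coded Golomb Rice binarisation: unary prefix (quotient), then k-bit suffix.
    prefix = "1" * (value >> k) + "0"
    suffix = format(value & ((1 << k) - 1), "0{}b".format(k)) if k > 0 else ""
    return prefix + suffix

def rice_parameter_from_neighbourhood(neighbour_magnitudes):
    # Hypothetical mapping: larger local activity selects a larger Rice parameter.
    loc_sum = sum(abs(m) for m in neighbour_magnitudes[:5])
    if loc_sum < 4:
        return 0
    if loc_sum < 12:
        return 1
    if loc_sum < 28:
        return 2
    return 3

k = rice_parameter_from_neighbourhood([2, 0, 1, 3, 0])
print(golomb_rice_bits(5, k))  # binarisation of a remaining magnitude of 5 with the selected k
```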
[000158] At the sign bit pass encode step 1158, the sign bits of any significant (non-zero) residual coefficients in the current sub-block are bypass-coded into the bitstream 115 by the entropy encoder, under execution of the processor 205. Control in the processor 205 progresses from the step 1158 to a top-left sub-block test step 1160.
[000159] At the top-left sub-block test step 1160 the processor 205 tests if the current sub-block is the last one in the backward diagonal scan order, i.e. the sub-block is the top-left sub-block of the TB. If the current sub-block is the top-left sub-block ("YES" at step 1160) the method 1100 terminates and control in the processor 205 returns to the method 1000. Otherwise ("NO" at step 1160) control in the processor 205 progresses from the step 1160 to the next sub-block step 1170.
[000160] At the next sub-block step 1170 the processor 205 advances to the next sub-block in the backward diagonal scan order. Control in the processor 205 progresses from the step 1170 to the determine coded sub-block flag step 1140.
[000161] Fig. 12 shows the method 1200 for encoding residual coefficients of a sub-block of a transform block of an image frame into a video bitstream 115, as implemented at step 1152. The method 1200 may be embodied by apparatus such as a configured FPGA, an ASIC, or an ASSP. Additionally, the method 1200 may be performed by video encoder 114 under execution of the processor 205. As such, the method 1200 may be stored on computer-readable storage medium and/or in the memory 206. The method 1200 is invoked at step 1152 for each
sub-block of a TB that is encoded into the bitstream 115. The method 1200 commences at a reset coefficient position step 1210.
[000162] At the reset coefficient position step 1210 the processor 205 sets a current coefficient position for the current sub-block. The current coefficient position is set to the position of the last significant coefficient when the current sub-block includes the last significant coefficient. Otherwise, the current coefficient position is set to the bottom-right coefficient of the sub-block. A context-coded bin budget for the sub-block is set to an initial level, such as 32 context-coded bins. This initial level defines a worst-case quantity of context-coded bins per sub-block, thus setting an upper limit on the related complexity of coding context-coded bins. The initial level is fixed, enabling its use in the provisioning of hardware or software resources to handle worst case bitstreams. The context-coded bin budget establishes a limit on the number of context coded bins available for use for the sub-block. Limiting the number of context-coded bins available for use is advantageous as context-coded bins require more operations to encode or decode than bypass-coded bins. Setting the limit reduces the worst-case complexity resulting from encoding or decoding many context-coded bins. Control in the processor 205 progresses from the step 1210 to an encode significance flag step 1220.
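[000162a] A minimal sketch of the budget mechanism described above follows. The helper name and the simplification of charging a fixed number of context-coded bins per coefficient are assumptions for illustration; the per-flag coding itself is omitted.

```python
MAX_CTX_BINS_PER_SUBBLOCK = 32  # initial context-coded bin budget per sub-block

def split_point_for_subblock(coefficients, bins_per_coefficient=4):
    # Walk the scan positions of one sub-block, spending up to
    # 'bins_per_coefficient' context-coded bins (significance, gt1, parity, gt3)
    # per coefficient. The position at which the budget can no longer cover a
    # coefficient divides the sub-block into a first group (context-coded and
    # bypass-coded bins) and a second group (bypass-coded bins only).
    budget = MAX_CTX_BINS_PER_SUBBLOCK
    for index, _ in enumerate(coefficients):
        if budget < bins_per_coefficient:
            return index          # start of the second group
        budget -= bins_per_coefficient
    return len(coefficients)      # whole sub-block fits in the first group

print(split_point_for_subblock([5] * 16))  # 8 when every coefficient uses four context-coded bins
```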
[000163] At the encode significance flag step 1220 the entropy encoder 338, under execution of the processor 205, encodes a significance flag into the bitstream 115 for the current residual coefficient using a context-coded bin. The context used for the significance flag is obtained from one of two sets of contexts, the set of contexts being selected by the current state variable S, such that states "00" and "01" result in selection from the first set of contexts and states "10" and "11" result in selection from the second set of contexts. The state variable S is initially set at step 1110 and may be updated or reset at steps 1290 and 12100 described below based on the state machine 840 as described in relation to step 1035. The first set of contexts indicates use of the quantiser Q0 whereas the second set of contexts indicates use of the quantiser Q1. A significance flag value of zero is encoded when the residual coefficient magnitude is zero and a significance flag value of one is encoded when the residual coefficient magnitude is greater than zero. Control in the processor 205 progresses from the step 1220 to a coefficient significance test step 1230.
[000164] At the coefficient significance test step 1230 the entropy encoder 338, under execution of the processor 205, tests the residual coefficient magnitude. If the residual coefficient is insignificant (magnitude equal to zero, i.e. "FALSE" at step 1230), control in the processor 205 progresses from the step 1230 to a coefficient position test step 1280. If the
residual coefficient is significant (magnitude is not equal to zero, i.e. "TRUE" at step 1230), control in the processor 205 progresses from the step 1230 to an encode gt1 flag step 1240.
[000165] At the encode gt1 flag step 1240 the entropy encoder 338, under execution of the processor 205, encodes a 'greater than one' flag into the bitstream 115 for the current residual coefficient using a context-coded bin. A greater than one flag indicates whether the residual coefficient magnitude is greater than one or not. As with the significance flag encoded at the step 1220, the context used to encode the greater than one flag is selected from one of two sets of contexts, the set of contexts being selected according to the state variable S. Control in the processor 205 progresses from the step 1240 to a coefficient gt1 test step 1250.
[000166] At the coefficient gt1 test step 1250 the processor 205 tests the magnitude of the residual coefficient. If the magnitude of the residual coefficient is greater than one ("TRUE" at step 1250) control in the processor 205 progresses from the step 1250 to an encode parity flag step 1260. If the magnitude of the residual coefficient is not greater than one ("FALSE" at step 1250) control in the processor 205 progresses from the step 1250 to the coefficient position test step 1280.
[000167] At the encode parity flag step 1260 the entropy encoder 338, under execution of the processor 205, encodes a flag indicating the least significant bit of the magnitude of the current residual coefficient into the bitstream 115 using a context-coded bin. The context-coded bin uses contexts from one of two sets, the context set selected in accordance with the state variable S, as described with reference to step 1220. At step 1260, the one-valued significance flag and greater than one flag indicate a magnitude of two or more. The parity flag further indicates a magnitude of two, four, six, and so on, or three, five, seven, and so on. Control in the processor 205 progresses from the step 1260 to an encode gt3 flag step 1270.
[000168] At the encode gt3 flag step 1270 the entropy encoder 338, under execution of the processor 205, encodes a flag indicating that the magnitude of the residual coefficient is greater than three into the bitstream 115 using a context-coded bin. The context-coded bin uses contexts from one of two sets, the set selected in accordance with the state variable S, as described with reference to step 1220. Control in the processor 205 progresses from the step 1270 to the coefficient position test step 1280.
[000169] At the coefficient position test step 1280 the entropy encoder 338, under execution of the processor 205, tests the position of the current residual coefficient in the TB. If the position is one of a predetermined set of positions ("TRUE" at step 1280) control in the processor 205
progresses from the step 1280 to a reset state step 12100. Otherwise ("FALSE" at step 1280) control in the processor 205 progresses from the step 1280 to an update state step 1290.
[000170] In one arrangement of the method 1200, the predetermined positions are the coefficient positions at the top-left of each sub-block in the TB. As the top-left position is the last residual coefficient to be scanned within a given sub-block, the reset state step 12100 is invoked on every boundary from one sub-block to the next on the backward diagonal scan path.
[000171] In another arrangement of the method 1200, the predetermined positions correspond with every predetermined number N of sub-blocks, indexed according to the scan path in a forward direction. Accordingly, the placement of positions is not dependent upon the position of the last significant residual coefficient. Resetting the state variable enables each run of coefficients from one reset point up to the next reset point to be quantised and inverse quantised in parallel, reducing latency in the modules 334 and 428. Resetting the state constrains the flexibility of state machine transition across the TB, such that a trade-off between the degree of latency reduction and coding performance achieved from the trellis state machine quantisation exists. Selecting a value for N (e.g. 4) allows a suitable trade-off to be achieved between preserving the coding gain of the dependent quantisation operation, and not introducing excessive latency into the modules 334 and 428. Quantisation parameter values typical of compressed video are generally in the range of 20 to 37 and result in a relatively sparsely populated transform block, that is the majority of residual coefficients are not significant. If N is equal to one, the transition into the "00" state occurs relatively frequently compared to the number of significant residual coefficients in a sub-block. Increasing N to a larger value, such as four, reduces the frequency with which entry back to the state "00" occurs when compared to the number of significant residual coefficients in each run of four sub-blocks along the scan path. The cost of increasing N to four is that the residual coefficients can only be inverse quantised in runs of at most 64 rather than runs of at most 16. However, a run length of 64 is still beneficial for reducing latency when inverse quantising TBs of a size larger than 64 samples.
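[000171a] A minimal sketch of how reset positions partition a TB into independently processable runs is given below; a sub-block size of 16 coefficients (4x4) is assumed and the function name is illustrative only.

```python
SUBBLOCK_COEFFS = 16  # 4x4 sub-blocks, as assumed for this illustration

def independent_runs(num_subblocks, reset_every_n_subblocks):
    # Each reset of the state variable S starts a new run of coefficients that
    # can be inverse quantised independently of (and in parallel with) other runs.
    run_length = reset_every_n_subblocks * SUBBLOCK_COEFFS
    total = num_subblocks * SUBBLOCK_COEFFS
    return [(start, min(start + run_length, total))
            for start in range(0, total, run_length)]

# A 32x32 TB has 64 sub-blocks; resetting every 4 sub-blocks yields
# 16 runs of 64 coefficients each.
print(independent_runs(64, 4))
```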
[000172] At the update state step 1290 the entropy encoder 338, under execution of the processor 205, uses the parity flag value of step 1260 to update the state machine according to the state transition diagram 840. Control in the processor progresses from the step 1290 to a top-left coefficient position test step 12110.
[000173] At the reset state step 12100 the entropy encoder 338, under execution of the processor 205, resets the state to a known value, typically "00". In one arrangement of the
method 1200, the state is reset to values such that Q0 and Q1 are alternately selected, according to the sub-block position. For example, when resetting the state every fourth sub-block, sub-blocks 0, 8, 16, 24, and so on reset to state "00" and sub-blocks 4, 12, 20, 28, and so on reset to state "10". Resetting to states in both Q0 and Q1 may result in a more uniform selection of quantiser Q0 versus Q1, removing a bias towards one quantiser, e.g. Q0, that could otherwise exist. Control in the processor 205 progresses from the step 12100 to the top-left coefficient position test step 12110.
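[000173a] The alternation described above can be sketched as follows; this is an illustrative mapping from sub-block index to reset state for the every-fourth-sub-block example, not a normative rule, and the function name is an assumption.

```python
def reset_state_for_subblock(subblock_index, reset_every_n=4):
    # Returns None when no reset occurs at this sub-block, otherwise the two-bit
    # state to reset to. Alternating between "00" (a Q0 state) and "10" (a Q1
    # state) avoids biasing quantiser selection towards Q0.
    if subblock_index % reset_every_n != 0:
        return None
    return "00" if (subblock_index // reset_every_n) % 2 == 0 else "10"

print([reset_state_for_subblock(i) for i in range(0, 13, 4)])  # ['00', '10', '00', '10']
```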
[000174] At the top-left coefficient position test step 12110 the entropy encoder 338, under execution of the processor 205, tests if the current residual coefficient position is the last one in the backward diagonal scan pattern. Effectively step 12110 tests if the current residual coefficient position is the top-left position of the current sub-block. If the top-left position is reached ("YES" at step 12110) the method 1200 terminates, with control in the processor 205 returning to the method 1100 and the current value of the state variable S being returned to the method 1100. Otherwise ("NO" at the step 12110), the entropy encoder 338, under execution of the processor 205, advances one residual coefficient position within the sub-block along the backward diagonal scan pattern. Then control in the processor 205 progresses from the step 12110 to the step 1220.
[000175] Fig. 13 shows a method 1300 for decoding coding units and transform blocks of an image frame from a video bitstream 133. The method 1300 may be embodied by apparatus such as a configured FPGA, an ASIC, or an ASSP. Additionally, the method 1300 may be performed by video decoder 134 under execution of the processor 205. As such, the method 1300 may be stored on computer-readable storage medium and/or in the memory 206. The method 1300 commences at a divide frame into CTUs step 1310.
[000176] At the divide frame into CTUs step 1310 the video decoder 134, under execution of the processor 205, divides a current frame of the frame data 133 (to be decoded) into an array of CTUs. A progression of decoding over the CTUs resulting from the division commences. Control in the processor progresses from the step 1310 to a decode coding unit step 1320.
[000177] At the decode coding unit step 1320 the entropy decoder 420, under execution of the processor 205, decodes various split flags from the bitstream 133 in accordance with the coding tree as described with reference to Figs. 5-7 to determine the size and location of a CU within the CTU, i.e. in accordance with the coding tree of the CTU. Moreover, the prediction mode of the CU is also decoded. Progression of the method 1300 involves iteration over the step 1320,
resulting in a traversal over the coding tree of the CTU, with each CU being decoded. Control in the processor 205 progresses from the step 1320 to a decode transform blocks step 1330.
[000178] At the decode transform blocks step 1330 the entropy decoder 420, under execution of the processor 205, decodes transform blocks of the coding unit of step 1320 from the bitstream 133. The step 1330 invokes a method 1400, described hereafter in relation to Fig. 14, for each colour channel (primary and secondary colour channels) of the image frame to decode the TBs of the CU. Control in the processor 205 progresses from the step 1330 to an inverse quantise residual coefficients step 1335.
[000179] At the inverse quantise residual coefficients step 1335 the dequantiser module 428, under execution of the processor 205, produces an array of inverse quantised residual coefficients (reconstructed transform coefficients) for the set of transform coefficients of the TB. For each residual coefficient, implementation of one of the quantisers Q0 or Q1 is performed in accordance with the state variable S, updated from one coefficient to the next coefficient according to the finite state machine 840 in execution of the step 1330. At predetermined points along the scan path, e.g. at boundaries from one sub-block to the next sub-block, the state variable S is reset to a known state, e.g. "00". Inverse quantisation can be performed along runs of residual coefficients beginning at each predetermined point, reducing latency of the inverse quantisation operation. Determination of each residual coefficient accords with 4-state trellis path determination, as described in relation to Figs. 9A-9D. Control in the processor 205 progresses from the inverse quantise residual coefficients step 1335 to a last coding unit test step 1340.
[000180] At the last coding unit test step 1340 the processor 205 tests if the current coding unit is the last one in the CTU, as determined from decoding split flags at step 1320. If the current coding unit is the last one in the CTU ("YES" at step 1340), control in the processor progresses to a last CTU test step 1350. If the current coding unit is not the last one in the coding tree of step 1320 ("NO" at step 1340), the next coding unit in the coding tree of step 1320 is selected for decoding and control in the processor 205 progresses to the step 1320.
[000181] At the last CTU test step 1350 the processor 205 tests if the current CTU is the last CTU in the slice or frame. If the current CTU is not the last ("NO" at step 1350), the video decoder 134 advances to the next CTU in the frame or slice and control in the processor 205 progresses from the step 1350 back to the step 1320 to continue processing remaining CTUs in the frame. If the CTU is the last one in the frame or slice, the step 1350 returns "YES" and the
method 1300 terminates. As a result of the method 1300, an entire image frame is decoded as a sequence of CTUs from a bitstream.
[000182] The reconstructed coefficients generated at step 1335 are provided to one of the inverse transform modules 436 or 444 depending upon whether a secondary transform is used. The modules 436 and 444 operate to decode the transform block by inverse transforming the reconstructed transform coefficient.
[000183] Fig. 14 shows the method 1400 for decoding a transform block of an image frame from a video bitstream 133, as implemented at step 1330 for each of the colour channels. The method 1400 may be embodied by apparatus such as a configured FPGA, an ASIC, or an ASSP. Additionally, the method 1400 may be performed by video decoder 134 under execution of the processor 205. As such, the method 1400 may be stored on computer-readable storage medium and/or in the memory 206. The method 1400 commences at a reset state step 1410.
[000184] At the reset state step 1410 the entropy decoder 420, under execution of the processor 205, resets a state variable S to "00". Control in the processor 205 progresses from the step 1410 to a decode last position step 1420.
[000185] At the decode last position step 1420 the entropy decoder 420, under execution of the processor 205, decodes the position of the last significant residual coefficient of the TB from the bitstream 133. Control in the processor 205 progresses from the step 1420 to a select sub-block step 1430.
[000186] At the select sub-block step 1430 the processor 205 selects one sub-block of the TB. The selected sub-block is the sub-block containing the last significant residual coefficient. Control in the processor 205 progresses from the step 1430 to a determine coded sub-block flag step 1440.
[000187] At the determine coded sub-block flag step 1440 the processor 205 determines a coded sub-block flag. The coded sub-block flag is determined such that a value of '1' indicates the presence of at least one significant residual coefficient in the sub-block and a value of '0' indicates that all residual coefficients in the sub-block are insignificant. For the sub-block containing the last significant residual coefficient the coded sub-block flag is determined to be '1' and for the top-left sub-block (containing the DC coefficient), the coded sub-block flag is determined to be '1'. For other sub-blocks, the coded sub-block flag is determined by decoding
a context-coded bin from the bitstream 133. Control in the processor 205 progresses to a coded sub-block flag test step 1450.
[000188] At the coded sub-block flag test step 1450 the value of the coded sub-block flag determined at step 1440 is tested. If the value of the coded sub-block flag is '1' ("TRUE" at step 1450), control in the processor 205 progresses to a first pass coefficient decode step 1452. If the value of the coded sub-block flag is '0' ("FALSE" at step 1450), control in the processor 205 progresses from the step 1450 to a next sub-block step 1470.
[000189] At the first pass coefficient decode step 1452 the video decoder 134, under execution of the processor 205, performs a method 1500, described with reference to Fig. 15. The method 1500 results in decoding one or more flags associated with each of the residual coefficients of one sub-block from the bitstream 133. In particular, the significant coefficient flags, 'greater than one' flags, parity flags, and 'greater than three' flags are decoded using context-coded bins. Each sub-block has a budget of using at most 32 context-coded bins for residual coding. As with the encoder, decoding of each context-coded bin reduces the budget by one. Once the budget is exhausted, the coefficients of the sub-block are divided into two groups (a first group and a second group) at the coefficient position in the sub-block at which the budget was exhausted. Coefficients of the sub-block in the second group are decoded using a different method with bypass-coded bins only. In particular, the Golomb-Rice parameter derivation differs from the Golomb-Rice parameter derivation method of the first group. As such, the sub-block is divided into a first group of residual coefficients (decoded using context-coded bins and bypass-coded bins) and a second group of residual coefficients (decoded using bypass-coded bins only). The magnitude for a residual coefficient n is defined as AbsLevel[ n ]. Then, for a residual coefficient belonging to the first group of residual coefficients, a portion of the magnitude AbsLevel[ n ] is decoded by the corresponding significance flag sig_coeff_flag[ n ], 'greater than one' flag abs_level_gt1_flag[ n ], parity flag par_level_flag[ n ], and 'greater than three' flag abs_level_gt3_flag[ n ] as a value AbsLevelPass1[ n ] expressed according to Equation (3) below:
AbsLevelPass1[ n ] = sig_coeff_flag[ n ] + par_level_flag[ n ] + abs_level_gt1_flag[ n ] + 2 * abs_level_gt3_flag[ n ] (3)
[000190] Control in the processor 205 progresses from the step 1452 to a second pass coefficient decode step 1454.
[000191] At the second pass coefficient decode step 1454 the entropy decoder 420, under execution of the processor 205, decodes remaining magnitudes of the first group of residual coefficients from the bitstream 133 by decoding bypass-coded Golomb Rice binarisations of 'abs_remainder' values. The magnitude AbsLevel[ n ] for a residual coefficient n belonging to the first group of residual coefficients is calculated according to Equation (4) below:
AbsLevel[ n ] = AbsLevelPass1[ n ] + 2 * abs_remainder[ n ] (4)
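[000191a] A minimal sketch of the decoder-side reconstruction given by Equations (3) and (4) is shown below; the flag values are assumed to have already been decoded at step 1452 and the remainder at step 1454, and the function name is chosen for illustration.

```python
def reconstruct_abs_level(sig, gt1, par, gt3, abs_remainder):
    # Equation (3): portion of the magnitude carried by the first-pass flags.
    abs_level_pass1 = sig + par + gt1 + 2 * gt3
    # Equation (4): add the bypass-coded remainder decoded in the second pass.
    return abs_level_pass1 + 2 * abs_remainder

# Mirrors the encoder-side example: flags (1, 1, 0, 1) and abs_remainder 3 give magnitude 10.
print(reconstruct_abs_level(1, 1, 0, 1, 3))  # 10
```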
[000192] Control in the processor 205 progresses from the step 1454 to a third pass coefficient decode step 1456.
[000193] At the third pass coefficient decode step 1456 the entropy decoder 420, under execution of the processor 205, decodes residual coefficients (if any) from the second group of step 1452 from the bitstream 133. Residual coefficients of the second group are decoded using Golomb Rice coding, with the Rice parameter derived from a 'neighbourhood' of previously decoded residual coefficients. The neighbourhood is a pattern of five residual coefficients located spatially close to, and preceding in the backward diagonal scan order, the residual coefficient currently being decoded. The sum of absolute magnitudes of these five residual coefficients is used to select a Rice parameter for the residual coefficient to be decoded. Residual coefficients of the first group and the second group may use the same Golomb-Rice parameter derivation method for decoding their respective remaining magnitude. Control in the processor 205 progresses from the step 1456 to a sign bit pass decode step 1458.
[000194] At the sign bit pass decode step 1458, the sign bits of any significant (non-zero) residual coefficients in the current sub-block are decoded from the bitstream 133 as bypass-coded bins. Control in the processor 205 progresses from the step 1458 to a top-left sub-block test step 1460.
[000195] At the top-left sub-block test step 1460 the processor 205 tests if the current sub-block is the last one in the backward diagonal scan order, i.e. the sub-block is the top-left sub-block of the TB. If the current sub-block is the top-left sub-block ("YES" at step 1460) the method 1400 terminates and control in the processor 205 returns to the method 1300. Otherwise, if the current sub-block is not the top-left sub-block ("NO" at step 1460), control in the processor 205 progresses from the step 1460 to the next sub-block step 1470.
[000196] At the next sub-block step 1470 the processor 205 advances to the next sub-block in the backward diagonal scan order. Control in the processor 205 progresses from the step 1470 to the determine coded sub-block flag step 1440.
[000197] Fig. 15 shows the method 1500 for decoding residual coefficients of a sub-block of a transform block of an image frame from a video bitstream, as implemented at step 1452. The method 1500 may be embodied by apparatus such as a configured FPGA, an ASIC, or an ASSP. Additionally, the method 1500 may be performed by video decoder 134 under execution of the processor 205. As such, the method 1500 may be stored on computer-readable storage medium and/or in the memory 206. The method 1500 is invoked at step 1452 for each sub-block of a TB decoded from the bitstream 133. The method 1500 commences at a reset coefficient position step 1510.
[000198] At the reset coefficient position step 1510 the processor 205 sets a current coefficient position for the current sub-block. The current coefficient position is set to the position of the last significant coefficient if the current sub-block includes the last significant coefficient. The current coefficient position is otherwise set to the bottom-right coefficient of the sub-block. A context-coded bin budget for the sub-block is set to an initial level, such as 32 context-coded bins. The context-coded bin budget establishes a limit on the number of context-coded bins available for use for the sub-block. Limiting the number of context-coded bins available for use is advantageous as context-coded bins require more operations to encode or decode than bypass-coded bins. Setting the limit reduces the worst-case complexity resulting from encoding or decoding many context-coded bins. Control in the processor 205 progresses from the step 1510 to a decode significance flag step 1520.
[000199] At the decode significance flag step 1520 the entropy decoder 420, under execution of the processor 205, decodes a significance flag from the bitstream 133 for the current residual coefficient using a context-coded bin. The context used for the significance flag is obtained from one of two sets of contexts. The set of contexts is selected according to the current state variable S, such that states "00" and "01" result in selection from the first set of contexts and states "10" and "11" result in selection from the second set of contexts. A significance flag value of zero indicates that the residual coefficient magnitude is zero and a significance flag value of one indicates that the residual coefficient magnitude is greater than zero. Control in the processor 205 progresses from the step 1520 to a coefficient significance test step 1530.
[000200] At the coefficient significance test step 1530 the entropy decoder 420, under execution of the processor 205, tests the significance flag. If the significance flag is zero ("FALSE" at step 1530), control in the processor 205 progresses from the step 1530 to a coefficient position test step 1580. If the significance flag is one ("TRUE" at step 1530), control in the processor 205 progresses from the step 1530 to a decode gt1 flag step 1540.
[000201] At the decode gt1 flag step 1540 the entropy decoder 420, under execution of the processor 205, decodes a 'greater than one' flag from the bitstream 133 for the current residual coefficient using a context-coded bin. A greater than one flag indicates whether the residual coefficient magnitude is greater than one or not. As with the significance flag decoded at the step 1520, the context used to decode the greater than one flag is selected from one of two sets of contexts, the set of contexts being selected according to the state variable S. Control in the processor 205 progresses from the step 1540 to a coefficient gt1 test step 1550.
[000202] At the coefficient gt1 test step 1550 the processor 205 tests the magnitude of the residual coefficient, to the extent known from the steps 1520 and 1540. If the magnitude of the residual coefficient is greater than one ("TRUE" at step 1550) control in the processor 205 progresses from the step 1550 to a decode parity flag step 1560. If the magnitude of the residual coefficient is not greater than one ("FALSE" at step 1550) control in the processor 205 progresses from the step 1550 to the coefficient position test step 1580.
[000203] At the decode parity flag step 1560 the entropy decoder 420, under execution of the processor 205, decodes a flag indicating the least significant bit of the magnitude of the current residual coefficient from the bitstream 133 using a context-coded bin. The context-coded bin uses contexts from one of two sets, the set selected in accordance with the state variable, as described with reference to step 1520. At step 1560, the one-valued significance flag and greater than one flag indicate a magnitude of two or more as each flag has been found to be set to '1'. The parity flag further indicates a magnitude of two, four, six, and so on, or three, five, seven, and so on. Control in the processor 205 progresses from the step 1560 to a decode gt3 flag step 1570.
[000204] At the decode gt3 flag step 1570 the entropy decoder 420, under execution of the processor 205, decodes a flag indicating that the magnitude of the residual coefficient is greater than three from the bitstream 133 using a context-coded bin. The context-coded bin uses contexts from one of two sets, the set selected in accordance with the state variable, as
described with reference to step 1520. Control in the processor 205 progresses from the step 1570 to the coefficient position test step 1580.
[000205] At the coefficient position test step 1580 the entropy decoder 420, under execution of the processor 205, tests the position of the current residual coefficient in the TB. If the position is one of a predetermined set of positions ("TRUE" at step 1580) control in the processor 205 progresses from the step 1580 to a reset state step 15100. Otherwise ("FALSE" at step 1580) control in the processor 205 progresses from the step 1580 to an update state step 1590. In one arrangement of the method 1500, the predetermined positions are the coefficient positions at the top-left of each sub-block in the TB. As the top-left position is the last residual coefficient to be scanned within a given sub-block, the reset state step 15100 is invoked on every boundary from one sub-block to the next on the backward diagonal scan path. In another arrangement of the method 1500, the predetermined positions correspond with every predetermined or fixed number N of sub-blocks, indexed according to the scan path in a forward direction, so the placement of positions is not dependent upon the position of the last significant residual coefficient. Resetting the state variable enables each run of coefficients from one reset point up to the next reset point to be quantised and inverse quantised in parallel, reducing latency in the modules 334 and 428. Resetting the state constrains the flexibility of state machine transition across the TB, so a trade-off between the degree of latency reduction and coding performance achieved from the trellis state machine quantisation exists. Selecting a value for N (e.g. 4) allows a suitable trade-off to be achieved between preserving the coding gain of the dependent quantisation operation, and not introducing excessive latency into the modules 334 and 428. Resetting every four sub-blocks results in worst-case runs of 64 residual coefficients, each of which can be inverse quantised in parallel with other 64-length runs of residual coefficients. Using a predetermined number of eight (8) or sixteen (16) sub-blocks is also possible, resulting in runs of 128 or 256 residual coefficients. However, the degree of parallelisability is diminished compared to the four (4) sub-block case.
[000206] At the update state step 1590 the entropy decoder 420, under execution of the processor 205, uses the parity flag value of step 1560 to update the state machine according to the state transition diagram 840, i.e. maintaining the same state as seen in the video encoder 114 at the step 1290. As shown in Fig. 8, the state variable S is determined by determining a transition of the state machine 840 based on a parity of previously decoded residual coefficients. Control in the processor progresses from the step 1590 to a top-left coefficient position test step 15110.
[000207] At the reset state step 15100 the entropy decoder 420, under execution of the processor 205, resets the state to a known value, typically "00". In one arrangement of the method 1500, the state is reset to values such that Q0 and Q1 are alternately selected, according to the sub-block position. For example, when resetting the state every fourth sub-block, sub-blocks 0, 8, 16, 24, and so on reset to state "00" and sub-blocks 4, 12, 20, 28, and so on reset to state "10". Resetting to states in both Q0 and Q1 may result in a more uniform selection of quantiser, removing a bias towards one quantiser, e.g. Q0, that would otherwise exist. Control in the processor 205 progresses from the step 15100 to the top-left coefficient position test step 15110.
[000208] The steps 1590 and 15100 (and similarly the step 1410) execute to determine the state variable S for a residual coefficient of a transform block. Step 1590 operates to determine the state variable based on a previous decoded residual coefficient of the transform block. In using the trellis state machine, the state, and correspondingly, the quantiser, depend on the sequence of state transitions that have occurred for previously decoded residual coefficients. The previously decoded residual coefficients include the last coefficient where a reset occurred and any residual coefficients decoded since the reset. Accordingly, decoding of the previous residual coefficients affects the state and quantisation of later residual coefficients within the sub-block. Alternatively, based on steps 15100 and 1410, the state variable is a predetermined state being the reset state, such as the zero state "00" for example. Further, as described in relation to Fig. 9C, step 1590 can execute to return the state to the designated reset state based on parity of a residual coefficient rather than the location being a predetermined position such as a sub-block crossing. Similarly, the steps 1110, 1290 and 12100 operate to determine a state variable for a residual coefficient for encoding purposes.
[000209] At the top-left coefficient position test step 15110 the entropy decoder 420, under execution of the processor 205, tests if the current residual coefficient position is the last one in the backward diagonal scan pattern - that is, if the current residual coefficient position is the top-left position of the current sub-block. If the top-left position is reached ("YES" at step 15110) the method 1500 terminates, with control in the processor 205 returning to the method 1400, and the current value of the state variable S returning to the method 1400. Otherwise ("NO" at the step 15110), the entropy decoder 420, under execution of the processor 205, advances one residual coefficient position within the sub-block along the backward diagonal scan pattern. Control in the processor 205 progresses from the step 15110 to the step 1520.
[000210] Arrangements described with reference to Figs. 10-15 show the video encoder 114 and the video decoder 134 supporting dependent scalar quantisation with lower latency compared to traditional dependent quantisation methods. The dependent scalar quantisation method relates to use of two quantisers Q0 and Q1 as described in relation to Figs. 8 and 9. The lower latency is achieved by treating a transform block as a set of runs of residual coefficients, each of which may be independently quantised and inverse quantised. As residual coefficient quantisation and inverse quantisation are typically performed in a pipeline stage separate to the entropy encoding or entropy decoding stages, the latency reduction is beneficial. Performing the quantisation and inverse quantisation in the entropy encoder 338 and the entropy decoder 420 is undesirable as the application of scaling matrices, which may need to be accessed to parse TBs, becomes linked to the decoding of individual residual coefficients. Performing the quantisation and inverse quantisation in the entropy encoder 338 and the entropy decoder 420 creates difficulties for memory access timing, including unpredictable timing of accesses due to variable length residual coefficient decoding and additional memory accesses of the scaling matrix, as the scaling aspect of inverse quantisation is applied on a coefficient-by-coefficient basis rather than in groups. The timing of decoding individual residual coefficients varies due to the need to encode or decode a variable number of context-coded and bypass-coded bins for each residual coefficient.
[000211] Other arrangements may use additional memory to store the state variable, or at least a usable summary of the state variable, for each residual coefficient to pass from the entropy encoder or decoder to the quantiser or inverse quantiser, respectively. For example, the MSB (one bit) of the state variable (two bits) may be sufficient to select use of Q0 or Q1. Arrangements storing the state variable require additional memory, e.g. one bit per residual coefficient in the 64x64 VPDU that contains the residual coefficients of all TBs contained therein. For 4:2:0 chroma format video data, the additional one bit corresponds with 64x64 bits for the luma channel and two sets of 32x32 bits for the chroma channels, resulting in a requirement of 6144 additional bits of memory to pass state information from the entropy encoding/decoding stage to the quantisation/inverse quantisation stage.
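[000211a] The 6144-bit figure above follows directly from the stated channel dimensions, as the short check below illustrates (the variable names are for illustration only).

```python
luma_bits = 64 * 64            # one bit per luma residual coefficient in the 64x64 VPDU
chroma_bits = 2 * (32 * 32)    # one bit per coefficient for each of the two chroma channels (4:2:0)
total_bits = luma_bits + chroma_bits
print(total_bits)              # 4096 + 2048 = 6144 bits of additional state memory
```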
[000212] In another arrangement of the video encoder 114 and the video decoder 134, the use of dependent scalar quantisation is confined to TBs not exceeding an area threshold, for example, 256 samples. In confining dependent scalar quantisation to transform blocks of 256 samples or less, the worst-case sequential run of 256 residual coefficients occurs when every residual coefficient is significant in a block of size 16x16, 32x8, 64x4 and so on.
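[000212a] A minimal sketch of such an area-threshold test is shown below; the threshold value and function name are assumptions for illustration.

```python
DEP_QUANT_MAX_TB_AREA = 256  # example area threshold in samples

def dependent_quantisation_enabled(tb_width, tb_height):
    # Confine dependent scalar quantisation to small TBs so the worst-case
    # sequential run of residual coefficients is bounded by the threshold.
    return tb_width * tb_height <= DEP_QUANT_MAX_TB_AREA

print(dependent_quantisation_enabled(16, 16))  # True
print(dependent_quantisation_enabled(32, 32))  # False
```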
[000213] In yet another arrangement of the video encoder 114 and the video decoder 134, the methods 1200 and 1500 operate such that the parity flags encoded at step 1260 and decoded at step 1560 use bypass coding rather than arithmetic coding. Use of bypass coding for the parity flags reduces the rate at which the context-coded bin budget is exhausted, allowing use of more context-coded bins for greater-than-one and greater-than-three flags. At higher bitrates, the distribution of one and zero values for the parity flags is more equal, so the benefit of arithmetic coding is reduced compared to the low bit rate case and thus bypass coding can be used. Use of bypass coding for the parity flags also avoids the need to select contexts from the context memory for the parity flags and increases throughput as bypass-coded bins can be encoded and decoded at a higher rate than context-coded bins.
INDUSTRIAL APPLICABILITY
[000214] The arrangements described are applicable to the computer and data processing industries and particularly for the digital signal processing for the encoding and decoding of signals such as video and image signals, achieving high compression efficiency.
[000215] The arrangements described herein permit residual encoding and decoding to use a trellis-based state machine that updates according to coefficient parity and selects contexts and quantisers for coefficients. The arrangements described allow implementation of the trellis-based state machine without imposing excessive latency due to the sequential nature of the state update.
[000216] The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
[000217] In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises" have correspondingly varied meanings.

Claims (15)

1. A method of decoding a transform block of an image frame from a video bitstream, the method comprising:
determining a state variable for a residual coefficient of the transform block, the state variable being determined either:
based on a previous decoded residual coefficient of the transform block, or
using a predetermined state, the predetermined state being used based on the position of the residual coefficient in the transform block being one of a plurality of predetermined positions in the transform block;
decoding the residual coefficient of the transform block according to the determined state variable;
inverse quantising the residual coefficient of the transform block according to the determined state variable to produce a reconstructed transform coefficient; and
decoding the transform block by inverse transforming the reconstructed transform coefficient.
2. The method according to claim 1, wherein the residual coefficients of the transform block are grouped into sub-blocks and scanned within each sub-block in a backward diagonal scan, progressing from sub-block to sub-block in a backward diagonal scan.
3. The method according to claim 1, wherein the predetermined state is a zero state.
4. The method according to claim 1, wherein the predetermined state corresponds to a sub-block index.
5. The method according to claim 1, wherein each position of the plurality of positions corresponds with an initial position in each of the sub-blocks of the transform block encountered when performing the backward diagonal scan order.
6. The method according to claim 1, wherein the predetermined positions occur once every predetermined number of sub-blocks.
7. The method according to claim 1, wherein the number of predetermined positions encountered depends on the position of a last significant position of the residual coefficients in the transform block.
8. The method according to claim 1, wherein the state variable is used to select one of a pair of quantisers, and wherein the state selects an alternate one of the quantisers at each of the predetermined positions.
9. The method according to claim 1, wherein determining the state variable based on the previous decoded residual coefficient of the transform block comprises determining a transition of a state machine based on a parity of the previously decoded residual coefficient.
10. The method according to claim 9, wherein the state is determined to be the predetermined state based on the previous decoded residual coefficient of the transform block.
11. The method according to claim 1, wherein decoding the residual coefficient of the transform block according to the determined state variable comprises decoding a magnitude of a portion of the residual coefficient using a context-coded bin based on the state variable.
12. The method according to claim 11, wherein decoding the residual coefficient of the transform block further comprises decoding a remainder of the residual coefficient using Golomb Rice coding.
13. A non-transitory computer-readable medium having a computer program stored thereon to implement a method of decoding a transform block of an image frame from a video bitstream, the program comprising:
code for determining a state variable for a residual coefficient of the transform block, the state variable being determined either:
based on a previous decoded residual coefficient of the transform block, or
using a predetermined state, the predetermined state being used based on the position of the residual coefficient in the transform block being one of a plurality of predetermined positions in the transform block;
code for decoding the residual coefficient of the transform block according to the determined state variable;
code for inverse quantising the residual coefficient of the transform block according to the determined state variable to produce a reconstructed transform coefficient; and
code for decoding the transform block by inverse transforming the reconstructed transform coefficient.
14. A video decoder, configured to:
receive a transform block of an image frame from a video bitstream;
determine a state variable for a residual coefficient of the transform block, the state variable being determined either:
based on a previous decoded residual coefficient of the transform block, or
using a predetermined state, the predetermined state being used based on the position of the residual coefficient in the transform block being one of a plurality of predetermined positions in the transform block;
decode the residual coefficient of the transform block according to the determined state variable;
inverse quantise the residual coefficient of the transform block according to the determined state variable to produce a reconstructed transform coefficient; and
decode the transform block by inverse transforming the reconstructed transform coefficient.
15. A system, comprising:
a memory; and
a processor, wherein the processor is configured to execute code stored on the memory for implementing a method of decoding a transform block of an image frame from a video bitstream, the method comprising:
determining a state variable for a residual coefficient of the transform block, the state variable being determined either:
based on a previous decoded residual coefficient of the transform block, or
using a predetermined state, the predetermined state being used based on the position of the residual coefficient in the transform block being one of a plurality of predetermined positions in the transform block;
decoding the residual coefficient of the transform block according to the determined state variable;
inverse quantising the residual coefficient of the transform block according to the determined state variable to produce a reconstructed transform coefficient; and
decoding the transform block by inverse transforming the reconstructed transform coefficient.
CANON KABUSHIKI KAISHA Patent Attorneys for the Applicant Spruson & Ferguson