AU2012202183A1 - Method, apparatus and system for encoding and decoding the residual coefficients of a transform unit

Info

Publication number: AU2012202183A1
Application number: AU2012202183A
Authority: AU
Inventors: Christopher James ROSEWARNE
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2012-04-13
Filing date: 2012-04-13
Publication date: 2013-10-31

Abstract

Disclosed is a method (900) of decoding a luma coefficient, for a transform unit (400) having at least one sub-set (402), from a stream (113) of encoded video data, the method comprising selecting (901) a sub-set of the transform unit from the stream of encoded video data; selecting (902), for the sub-set of the transform unit, a context for a greater-than-two flag from a context model (503) having at most two contexts dedicated to said greater-than-two flag for the transform unit, the context being for a luma coefficient of the sub-set, and being selected independently of a number of luma coefficients greater than one in a previously decoded transform unit sub-set; decoding (903) the luma coefficient for the sub-set of the transform unit using the selected context for the greater-than-two flag; and storing (904) the decoded luma coefficient of the transform unit.

Description

S&F Ref: P033508
AUSTRALIA
PATENTS ACT 1990
COMPLETE SPECIFICATION FOR A STANDARD PATENT
Name and Address of Applicant: Canon Kabushiki Kaisha, of 30-2, Shimomaruko 3-chome, Ohta-ku, Tokyo, 146, Japan
Actual Inventor(s): Christopher James Rosewarne
Address for Service: Spruson & Ferguson, St Martins Tower, Level 35, 31 Market Street, Sydney NSW 2000 (CCN 3710000177)
Invention Title: Method, apparatus and system for encoding and decoding the residual coefficients of a transform unit
The following statement is a full description of this invention, including the best method of performing it known to me/us:
METHOD, APPARATUS AND SYSTEM FOR ENCODING AND DECODING THE RESIDUAL COEFFICIENTS OF A TRANSFORM UNIT
TECHNICAL FIELD
The present invention relates generally to digital video signal processing and, in particular, to a method, apparatus and system for encoding and decoding luma and chroma residual coefficients of a transform unit (TU), wherein the luma residual coefficient context selection is independent of previous values of luma residual coefficients.
BACKGROUND
Many applications for video coding currently exist, including applications for transmission and storage of video data. Many video coding standards have also been developed and others are currently under development. Recent developments in video coding standardisation have led to the formation of a group called the "Joint Collaborative Team on Video Coding" (JCT-VC). The Joint Collaborative Team on Video Coding (JCT-VC) includes members of Study Group 16, Question 6 (SG16/Q6) of the Telecommunication Standardisation Sector (ITU-T) of the International Telecommunication Union (ITU), known as the Video Coding Experts Group (VCEG), and members of the International Organisation for Standardisation/International Electrotechnical Commission Joint Technical Committee 1 / Subcommittee 29 / Working Group 11 (ISO/IEC JTC1/SC29/WG11), also known as the Moving Picture Experts Group (MPEG).
The Joint Collaborative Team on Video Coding (JCT-VC) has the goal of producing a new video coding standard to significantly outperform the presently existing video coding standard known as "H.264/MPEG-4 AVC". The H.264/MPEG-4 AVC standard is itself a large improvement on previous video coding standards such as MPEG-4 and ITU-T H.263. The new video coding standard under development has been named "high efficiency video coding (HEVC)". The Joint Collaborative Team on Video Coding (JCT-VC) is also considering implementation challenges arising from technology proposed for high efficiency video coding (HEVC) that create difficulties when scaling implementations of the HEVC standard to operate at high resolutions or high frame rates.
One area of the H.264/MPEG-4 AVC video coding standard that presents difficulties for achieving high compression efficiency is the coding of residual coefficients used to represent video data. Video data is formed by a sequence of frames, with each frame having a two-dimensional array of samples. Typically, frames include one luminance (luma) and two chrominance (chroma) channels.
Colour information is typically represented using a colour space such as YUV, with Y being the luma channel and UV being two chroma channels. A colour space such as YUV has an advantage in that the majority of the frame content is contained in the luma channel. The relatively smaller amount of content stored in the UV chroma channels is sufficient to reconstruct a colour frame. Furthermore, the chroma channels may also be down-sampled to a lower spatial resolution with negligible resultant perceptual quality loss. A commonly used chroma format is 4:2:0, which results in each chroma channel having half the vertical and horizontal resolution of the luma channel. A coding unit is a square-shaped set of collocated luma and chroma samples.
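The effect of the 4:2:0 chroma format on sample counts can be illustrated with a minimal sketch. The function names are hypothetical, and rounding odd dimensions upward is an assumption made only so that the example is total; the text itself considers only the halving of each dimension.

```python
def chroma_plane_size(luma_width, luma_height):
    """Size of one chroma channel under 4:2:0.

    Each chroma channel has half the horizontal and half the vertical
    resolution of the luma channel. Rounding odd sizes upward is an
    assumption of this sketch, not something stated in the text.
    """
    return (luma_width + 1) // 2, (luma_height + 1) // 2


def samples_per_frame(luma_width, luma_height):
    """Total samples in a frame: one luma channel plus two chroma channels."""
    chroma_width, chroma_height = chroma_plane_size(luma_width, luma_height)
    luma_samples = luma_width * luma_height
    chroma_samples = 2 * chroma_width * chroma_height
    return luma_samples + chroma_samples
```

For a 1920x1080 frame, each chroma channel is 960x540, so the two chroma channels together carry half as many samples as the luma channel.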
Coding units are square and vary in size from 4x4 to 64x64, with edge dimensions being a power of two. A 64x64 area is defined as a largest coding unit (LCU) and contains one or more coding units, such that the 64x64 area is completely filled with non-overlapping coding units. Other power-of-two sizes for the largest coding unit are possible and the maximum size of a coding unit is not required to be equal to the size of a largest coding unit (LCU). Each frame is decomposed into an array of largest coding units (LCUs). A coding tree enables the subdivision of each largest coding unit (LCU) into four coding units (CUs), each having half the width and height of a parent largest coding unit (LCU). Each of the coding units (CUs) may be further subdivided into four equally-sized coding units (CUs). Such a subdivision process may be applied recursively until a smallest coding unit (SCU) size is reached, enabling coding units (CUs) to be defined down to a minimum supported size. This recursive subdivision of a largest coding unit into a hierarchy of coding units produces a quadtree structure and is referred to as the coding tree. This subdivision process is encoded in a bitstream as a sequence of flags, coded as bins. Coding units therefore have a square shape.
The array of largest coding units (LCUs) comprising a frame is coded in a bitstream in raster scan order. Each frame is divided into one or more slices, each containing an integer number of consecutive largest coding units.
A set of coding units exists in the coding tree which are not further sub-divided, these coding units occupying the leaf nodes of the coding tree. Transform trees exist at these coding units. A transform tree may further decompose a coding unit using a quadtree structure as used for the coding tree. At the leaf nodes of the transform tree, residual data is encoded using transform units (TUs).
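The recursive quadtree subdivision described above can be sketched as follows. This is an illustrative sketch only: the function name and the `should_split` callback are hypothetical stand-ins for the split flags that a real bitstream would carry, and the sketch makes no attempt to reproduce the bitstream syntax.

```python
def split_coding_tree(x, y, size, min_size, should_split):
    """Return the leaf coding units of a coding tree as (x, y, size) tuples.

    should_split(x, y, size) is a hypothetical callback standing in for the
    split flag of each node. Subdivision stops once the smallest coding
    unit size (min_size) is reached.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Four equally sized children, each with half the width and height.
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    split_coding_tree(x + dx, y + dy, half, min_size, should_split)
                )
        return leaves
    return [(x, y, size)]
```

Because every split replaces one square by four squares of half the edge length, the leaves always tile the largest coding unit without overlap, whatever decisions the callback makes.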
In contrast to the coding tree, the transform tree may subdivide coding units into transform units having a non-square shape. Furthermore, although transform units are non-overlapping, the transform tree structure does not require that transform units (TUs) occupy all of the samples contained within the parent coding unit.
Each coding unit at the leaf nodes of the coding trees is subdivided into one or more arrays of predicted data samples, each known as a prediction unit (PU). Each prediction unit (PU) contains a prediction of a portion of the input video frame data, the prediction being derived by applying an intra-prediction or an inter-prediction process. Several methods may be used for coding prediction units (PUs) within a coding unit (CU). A single prediction unit (PU) may occupy an entire area of the coding unit (CU), or the coding unit (CU) may be split into two equal-sized rectangular prediction units (PUs), either horizontally or vertically. Additionally, the coding unit (CU) may be split into four equal-sized square prediction units (PUs).
A video encoder compresses the video data into a bitstream by converting the video data into a sequence of syntax elements. A context adaptive binary arithmetic coding (CABAC) scheme is defined within the high efficiency video coding (HEVC) standard under development, using an identical arithmetic coding scheme to that defined in the MPEG4-AVC/H.264 video compression standard. In the high efficiency video coding (HEVC) standard under development, when context adaptive binary arithmetic coding (CABAC) is in use, each syntax element is expressed as a sequence of bins. Each bin is either bypass-coded or arithmetically coded. Bypass coding is used where the value of the bin is equally likely to be 0 or 1. In this case, there is no further compression achievable.
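The reason that no further compression is achievable for an equiprobable bin can be seen from the binary entropy, which gives the minimum average number of bits needed per bin. The following sketch is a standard information-theory illustration rather than part of any codec; the function name is hypothetical.

```python
import math


def binary_entropy(p_one):
    """Minimum average bits per bin when the bin equals 1 with probability p_one."""
    if p_one <= 0.0 or p_one >= 1.0:
        # A bin whose value is certain carries no information.
        return 0.0
    p_zero = 1.0 - p_one
    return -(p_one * math.log2(p_one) + p_zero * math.log2(p_zero))
```

At a probability of 0.5 the entropy is exactly one bit per bin, so a bypass-coded bin cannot be represented more compactly. For any other probability the entropy is below one bit, which is the gain that arithmetic coding with a suitable context aims to capture.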
Arithmetic coding is used for bins whose probability distribution is such that the likelihood of the bin having a 0 or a 1 value is not equal. Each arithmetically coded bin is associated with information known as a 'context'. Contexts contain a likely bin value (the 'valMPS') and a probability state, which is an integer which maps to an estimated probability of the likely bin value occurring. Creating such a sequence of bins, comprising combinations of bypass-coded bins and arithmetic-coded bins, from a syntax element is known as "binarising" the syntax element. Sequences of bins coded using bypass coding may have either fixed lengths or variable lengths and do not use context information.
In a video encoder or video decoder, since separate context information is available for each bin, context selection for bins provides a means to improve coding efficiency. In particular, coding efficiency may be improved by selecting a particular bin such that statistical properties from previous instances of the bin, where the associated context information was used, correlate with statistical properties of a current instance of the bin. Such context selection frequently utilises spatially local information within a given frame to determine the optimal context.
In the high efficiency video coding (HEVC) standard under development and in H.264/MPEG-4 AVC, a prediction for a prediction unit is derived, based on reference sample data either from other frames, or from neighbouring regions within the current frame that have been previously decoded. The difference between the prediction and the desired sample data is known as the residual. A frequency domain representation of the residual is a two-dimensional array of residual coefficients. By convention, the upper-left corner of the two-dimensional array contains residual coefficients representing low-frequency information.
In typical video data, the majority of the changes in sample values are gradual, resulting in a predominance of low-frequency information within the residual. This manifests itself as larger magnitudes for residual coefficients located in the upper-left corner of the two-dimensional array.
The property of low-frequency information being predominant in the upper-left corner of the two-dimensional array of residual coefficients may be exploited by the chosen binarisation scheme to minimise the size of the residual coefficients in the bitstream.
One aspect of binarisation is the selection of contexts to use for coding syntax elements corresponding to individual flags. A flag may commonly use more than one context. The process of determining which context should be used for a particular instance of the flag depends on other already available information and this process is known as 'context modelling'. Context modelling is a process whereby a context is selected that most accurately represents the statistical properties of the present instance of the flag. For example, frequently the value of a flag is influenced by the values of neighbouring instances of the same flag, in which case a context can be selected based on the values of neighbouring instances of the flag. Due to the majority of the frame information being contained in the luma channel, context modelling frequently uses separate contexts for the luma channel and the chroma channels. However, contexts are typically shared between chroma channels, as the statistical properties of the two chroma channels are relatively similar.
HM-6.0 divides the transform unit (TU) into a number of sub-sets and scans the residual coefficients in each sub-set in two passes. The first pass encodes flags indicating the status of the residual coefficients as being nonzero-valued (significant) or zero-valued (non-significant).
These flags are known as a significance map. A second pass encodes the magnitude and sign of significant residual coefficients.
Using a pre-determined scan pattern enables the two-dimensional array of residual coefficients to be scanned into a one-dimensional array. In HM-6.0, the provided scan pattern is used for processing both the significance map and the magnitude and sign of significant residual coefficients. By scanning the significance map using the provided scan pattern, the location of the last significant coefficient in the two-dimensional significance map may be determined. Scan patterns may be horizontal, vertical or diagonal.
The high efficiency video coding (HEVC) test model 6.0 (HM-6.0) provides support for residual blocks, also known as transform units (TUs), having both a square shape and a non-square shape. Each transform unit (TU) contains a set of residual coefficients. Residual blocks having equally sized side dimensions are known as square transform units (TUs) and residual blocks having unequally sized side dimensions are known as non-square transform units (TUs).
Transform unit (TU) sizes supported in HM-6.0 are 4x4, 8x8, 16x16, 32x32, 4x16, 16x4, 8x32 and 32x8. Transform unit (TU) sizes are typically described in terms of luma samples; however, when a chroma format of 4:2:0 is used, each chroma sample occupies the area of 2x2 luma samples. Accordingly, scanning transform units (TUs) to encode chroma residual data uses scan patterns of half the horizontal and vertical dimensions, such as 2x2 for a 4x4 luma residual block. For the purpose of scanning and coding the residual coefficients, the 16x16, 32x32, 4x16, 16x4, 8x32 and 32x8 transform units (TUs) are divided into a number of sub-blocks, i.e. a lower layer of the transform unit (TU) scan, having sizes such as 4x4, 2x8 or 8x2, with a corresponding map existing within HM-5.0.
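The scanning of a two-dimensional array of residual coefficients into a one-dimensional order, the derivation of a significance map, and the location of the last significant coefficient can be sketched as follows. The diagonal ordering used here is a generic anti-diagonal traversal chosen for illustration; it is an assumption and is not claimed to reproduce the exact scan tables of HM-6.0. All function names are hypothetical.

```python
def diagonal_scan(width, height):
    """Visit positions (x, y) along anti-diagonals, starting at the upper-left.

    Illustrative ordering only; not the exact HM-6.0 scan pattern.
    """
    order = []
    for d in range(width + height - 1):
        for y in range(height):
            x = d - y
            if 0 <= x < width:
                order.append((x, y))
    return order


def significance_map(block):
    """Flag each residual coefficient as significant (1) or non-significant (0)."""
    return [[1 if coeff != 0 else 0 for coeff in row] for row in block]


def last_significant_index(block, scan):
    """Index within the scan order of the last nonzero coefficient, or -1 if none."""
    last = -1
    for index, (x, y) in enumerate(scan):
        if block[y][x] != 0:
            last = index
    return last
```

With larger magnitudes clustered in the upper-left corner, the last significant coefficient tends to occur early in such a scan, so the trailing zero-valued coefficients need not be coded individually.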
In HM-6.0, 2x8 and 8x2 sub-blocks are used for 8x8 transform units when vertical and horizontal scan patterns are used respectively. In all other cases, 4x4 sub-blocks are used. In HM-6.0, sub-blocks for these transform unit (TU) sizes are co-located with sub-sets in the transform unit (TU). The set of significant coefficient flags within a portion of the significance map collocated within one sub-block is referred to as a significant coefficient group. For the 16x16, 32x32, 4x16, 16x4, 8x32 and 32x8 transform units (TUs), the significance map coding makes use of a two-level scan. The upper-level scan performs a scan, such as a backward diagonal down-left scan, to code or infer flags representing the significant coefficient groups of each sub-block. Within the sub-blocks, a scan, such as the backward diagonal down-left scan, is performed to code the significant coefficient flags for sub-blocks having a significant coefficient group flag with a value of '1'. For a 16x16 transform unit (TU), a 4x4 upper-level scan is used. For a 32x32 transform unit (TU), an 8x8 upper-level scan is used. For 16x4, 4x16, 32x8 and 8x32 transform unit (TU) sizes, 4x1, 1x4, 8x2 and 2x8 upper-level scans are used respectively.
At each transform unit (TU), residual coefficient data may be encoded into a bitstream. Each "residual coefficient" is an integer number representing image characteristics within a transform unit in the frequency (DCT) domain and occupying a unique location within the transform unit. A transform unit is a block of residual data samples that may be transformed between the spatial and the frequency domains. In the frequency domain, the transform unit (TU) encodes the residual data samples as residual coefficient data. Side dimensions of transform units are sized in powers of two (2), ranging from 4 samples to 32 samples for a "Luma" channel, and 2 to 16 samples for a "Chroma" channel.
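The relationship between transform unit size and upper-level scan size given above follows from dividing each dimension by the 4x4 sub-block size. The sketch below illustrates that arithmetic, together with the value of each significant coefficient group flag. The names are hypothetical, sizes are given as width followed by height, and the sketch only computes the flag values; it does not model when a codec infers a flag rather than coding it.

```python
SUB_BLOCK_SIZE = 4  # 4x4 sub-blocks, as used for the TU sizes listed in the text


def upper_level_scan_size(tu_width, tu_height):
    """Dimensions (width, height) of the upper-level scan for a transform unit."""
    return tu_width // SUB_BLOCK_SIZE, tu_height // SUB_BLOCK_SIZE


def significant_coefficient_group_flags(block):
    """One flag per 4x4 sub-block: 1 if the sub-block holds any nonzero coefficient.

    block is a list of rows of residual coefficients.
    """
    height, width = len(block), len(block[0])
    groups_wide, groups_high = upper_level_scan_size(width, height)
    flags = [[0] * groups_wide for _ in range(groups_high)]
    for y in range(height):
        for x in range(width):
            if block[y][x] != 0:
                flags[y // SUB_BLOCK_SIZE][x // SUB_BLOCK_SIZE] = 1
    return flags
```

Only sub-blocks whose flag is 1 require their significant coefficient flags to be scanned, which is what makes the two-level arrangement worthwhile for sparse transform units.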
The leaf nodes of the transform unit (TU) tree may contain either a transform unit (TU) or nothing at all, in the case where no residual coefficient data is required.
As the spatial representation of the transform unit is a two-dimensional array of residual data samples, as described in detail below, a frequency domain representation resulting from a transform, such as a modified Discrete Cosine Transform (DCT), is also a two-dimensional array of residual coefficients. The spectral characteristics of typical sample data within a transform unit (TU) are such that the frequency domain representation is more compact than the spatial representation. Further, the predominance of lower-frequency spectral information that is typically found in a transform unit (TU) results in a clustering of larger-valued residual coefficients towards the upper-left of the transform unit (TU), where low-frequency residual coefficients are represented.
Modified Discrete Cosine Transforms (DCTs) or modified Discrete Sine Transforms (DSTs) may be used to implement the residual transform. Implementations of the residual transform are configured to support each required transform unit (TU) size. In a video encoder, the residual coefficients from the residual transform are scaled and quantised. The scaling and quantisation reduce the magnitude of the residual coefficients, reducing the size of the data coded into the bitstream at the cost of reducing the image quality.
One aspect of the complexity of the high efficiency video coding (HEVC) standard under development is the number of look-up tables required in order to perform the scanning. Each additional look-up table results in an undesirable consumption of memory.
SUMMARY
It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
Disclosed are arrangements, referred to as Greater Than Two Context Reduction (GTTCR) arrangements, which seek to address the above problems by (a) establishing, in regard to decoding of luma coefficients, a maximum of two contexts for the "greater than two" flag, and (b) defining these contexts to be independent of the number of luma coefficients having a value greater than one in previously decoded TU subsets.
According to one aspect of the present invention there is provided a method of decoding a luma coefficient, for a transform unit having at least one sub-set, from a stream of encoded video data, the method comprising the steps of:
selecting a sub-set of the transform unit from the stream of encoded video data;
selecting, for the sub-set of the transform unit, a context for a greater-than-two flag from a context model having at most two contexts dedicated to said greater-than-two flag for the transform unit, the greater-than-two flag indicating a first instance in a sub-set of a residual coefficient having a magnitude greater than two, the context being for a luma coefficient of the sub-set, and being selected independently of a number of luma coefficients greater than one in a previously decoded transform unit sub-set;
decoding the luma coefficient for the sub-set of the transform unit using the selected context for the greater-than-two flag; and
storing the decoded luma coefficient of the transform unit.
According to another aspect of the present invention there is provided a video decoder for implementing the above method.
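The two properties stated for the GTTCR arrangements, namely at most two luma contexts for the greater-than-two flag and no dependence on previously decoded sub-sets, can be illustrated with a minimal sketch. The particular selection rule below (one context for the first sub-set, another for all remaining sub-sets) is purely an assumption made for illustration; this part of the specification does not state the rule, and the names are hypothetical.

```python
NUM_LUMA_GT2_CONTEXTS = 2  # at most two contexts dedicated to the flag


def select_luma_gt2_context(subset_index):
    """Pick a context index for the greater-than-two flag of a luma sub-set.

    Hypothetical rule for illustration only: the first sub-set uses context 0
    and every other sub-set uses context 1. The function receives no
    information about earlier sub-sets, so the result cannot depend on how
    many coefficients greater than one were decoded previously.
    """
    return 0 if subset_index == 0 else 1
```

Because the function takes only the index of the current sub-set, a decoder following this sketch would not need to carry a running count of greater-than-one coefficients from one sub-set to the next.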
According to another aspect of the present invention there is provided a method of encoding a luma coefficient, for a transform unit having at least one sub-set, from unencoded frame data, the method comprising the steps of:
selecting a sub-set of the transform unit from the unencoded frame data;
selecting, for the sub-set of the transform unit, a context index for a greater-than-two flag from a context model having at most two contexts dedicated to said greater-than-two flag, the greater-than-two flag indicating a first instance in a sub-set of a residual coefficient having a magnitude greater than two, the context being for a luma coefficient of the sub-set, the context being selected independently of a number of luma coefficients greater than one in a previously decoded transform unit sub-set;
encoding the luma coefficient for the sub-set of the transform unit using the selected context for the greater-than-two flag; and
storing the encoded luma coefficient of the transform unit.
According to another aspect of the present invention there is provided a video decoder, configured for decoding a luma coefficient in a transform unit having at least one sub-set, from a stream of encoded video data, the decoder comprising:
a sub-set selector for selecting a sub-set of the transform unit from the stream of encoded video data;
a context selector for selecting, for the sub-set of the transform unit, a context for a greater-than-two flag from a context model having at most two contexts dedicated to said greater-than-two flag, the greater-than-two flag indicating a first instance in a sub-set of a residual coefficient having a magnitude greater than two, the context being for a luma coefficient of the sub-set, the context being selected independently of a number of luma coefficients greater than one in a previously decoded transform unit sub-set;
a decoder for decoding the luma coefficient for the sub-set of the transform unit using the selected context for the greater-than-two flag; and
a processor for storing the decoded luma coefficient of the transform unit.
According to another aspect of the present invention there is provided a video encoder for encoding a luma coefficient, for a transform unit having at least one sub-set, from unencoded frame data, the encoder comprising:
a selector for selecting a sub-set of the transform unit from the unencoded frame data;
a selector for selecting, for the sub-set of the transform unit, a context index for a greater-than-two flag from a context model having at most two contexts dedicated to said greater-than-two flag, the greater-than-two flag indicating a first instance in a sub-set of a residual coefficient having a magnitude greater than two, the context being for a luma coefficient of the sub-set, the context being selected independently of a number of luma coefficients greater than one in a previously decoded transform unit sub-set;
an encoder for encoding the luma coefficient for the sub-set of the transform unit using the selected context for the greater-than-two flag; and
a processor for storing the decoded luma coefficient of the transform unit.
According to another aspect of the present invention there is provided a video decoder comprising:
a processor;
a memory storing a software executable program for directing the processor to perform a method of decoding a luma coefficient, for a transform unit having at least one sub-set, from a stream of encoded video data, the method comprising the steps of:
selecting a sub-set of the transform unit from the stream of encoded video data;
selecting, for the sub-set of the transform unit, a context for a greater-than-two flag from a context model having at most two contexts dedicated to said greater-than-two flag, the greater-than-two flag indicating a first instance in a sub-set of a residual coefficient having a magnitude greater than two, the context being for a luma coefficient of the sub-set, the context being selected independently of a number of luma coefficients greater than one in a previously decoded transform unit sub-set;
decoding the luma coefficient for the sub-set of the transform unit using the selected context for the greater-than-two flag; and
storing the decoded luma coefficient of the transform unit.
According to another aspect of the present invention there is provided a video encoder comprising:
a processor;
a memory storing a software executable program for directing the processor to perform a method of encoding a luma coefficient, for a transform unit having at least one sub-set, from unencoded frame data, the method comprising the steps of:
selecting a sub-set of the transform unit from the unencoded frame data;
selecting, for the sub-set of the transform unit, a context index for a greater-than-two flag from a context model having at most two contexts dedicated to said greater-than-two flag, the greater-than-two flag indicating a first instance in a sub-set of a residual coefficient having a magnitude greater than two, the context being for a luma coefficient of the sub-set, the context being selected independently of a number of luma coefficients greater than one in a previously decoded transform unit sub-set;
encoding the luma coefficient for the sub-set of the transform unit using the selected context for the greater-than-two flag; and
storing the decoded luma coefficient of the transform unit.
According to another aspect of the present invention there is provided a computer readable non-transitory storage medium storing a software executable program for directing a processor to perform a method of decoding a luma coefficient, for a transform unit having at least one sub-set, from a stream of encoded video data, the program comprising:
software executable code for selecting a sub-set of the transform unit from the stream of encoded video data;
software executable code for selecting, for the sub-set of the transform unit, a context for a greater-than-two flag from a context model having at most two contexts dedicated to said greater-than-two flag, the greater-than-two flag indicating a first instance in a sub-set of a residual coefficient having a magnitude greater than two, the context being for a luma coefficient of the sub-set, the context being selected independently of a number of luma coefficients greater than one in a previously decoded transform unit sub-set;
software executable code for decoding the luma coefficient for the sub-set of the transform unit using the selected context for the greater-than-two flag; and
software executable code for storing the decoded luma coefficient of the transform unit.
According to another aspect of the present invention there is provided a computer readable non-transitory storage medium storing a software executable program for directing a processor to perform a method of encoding a luma coefficient, for a transform unit having at least one sub-set, from unencoded frame data, the program comprising:
software executable code for selecting a sub-set of the transform unit from the unencoded frame data;
software executable code for selecting, for the sub-set of the transform unit, a context index for a greater-than-two flag from a context model having at most two contexts dedicated to said greater-than-two flag, the greater-than-two flag indicating a first instance in a sub-set of a residual coefficient having a magnitude greater than two, the context being for a luma coefficient of the sub-set, the context being selected independently of a number of luma coefficients greater than one in a previously decoded transform unit sub-set;
software executable code for encoding the luma coefficient for the sub-set of the transform unit using the selected context for the greater-than-two flag; and
software executable code for storing the decoded luma coefficient of the transform unit.
Other aspects are also disclosed.
BRIEF DESCRIPTION OF THE DRAWINGS
At least one embodiment of the present invention will now be described with reference to the following drawings, in which:
Fig. 1 is a schematic block diagram showing functional modules of a video encoder;
Fig. 2 is a schematic block diagram showing functional modules of a video decoder;
Figs. 3A and 3B form a schematic block diagram of a general purpose computer system upon which the encoder and decoder of Figs. 1 and 2, respectively, may be practiced;
Fig. 4A is a schematic block diagram showing an exemplary 8x8 transform unit (TU);
Fig.
4B is a schematic block diagram showing a set of syntax elements encoding an 8x8 transform unit (TU);
Fig. 5 is a schematic block diagram showing an entropy decoder;
Fig. 6 is a schematic flow diagram showing a method for decoding residual coefficients;
Fig. 7 is a schematic flow diagram showing a method for decoding one bin;
Fig. 8 is a schematic block diagram showing an entropy encoder;
Fig. 9 is a schematic flow diagram showing a method for decoding one luma coefficient; and
Fig. 10 is a schematic flow diagram showing a method for encoding one luma coefficient.
DETAILED DESCRIPTION INCLUDING BEST MODE
Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
It is to be noted that the discussions contained in the "Background" section relate to discussions of documents or devices which may form public knowledge through their respective publication and/or use. Such discussions should not be interpreted as a representation by the present inventor(s) or the patent applicant that such documents or devices in any way form part of the common general knowledge in the art.
Fig. 1 is a schematic block diagram showing functional modules of a video encoder 100. Fig. 2 is a schematic block diagram showing functional modules of a corresponding video decoder 200. The video encoder 100 and video decoder 200 may be implemented using a general-purpose computer system 300, as shown in Figs. 3A and 3B, where the various functional modules may be implemented by dedicated hardware within the computer system 300, by software executable within the computer system 300, or alternatively by a combination of dedicated hardware and software executable within the computer system 300.
As seen in Fig.
3A, the computer system 300 includes: a computer module 301; input devices such as a keyboard 302, a mouse pointer device 303, a scanner 326, a camera 327, and a microphone 380; and output devices including a printer 315, a display device 314 and loudspeakers 317. An external Modulator-Demodulator (Modem) transceiver device 316 may be used by the computer module 301 for communicating to and from a communications network 320 via a connection 321. The communications network 320 may be a wide-area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 321 is a telephone line, the modem 316 may be a traditional "dial-up" modem. Alternatively, where the connection 321 is a high capacity (e.g., cable) connection, the modem 316 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 320.
The computer module 301 typically includes at least one processor unit 305, and a memory unit 306. For example, the memory unit 306 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 301 also includes a number of input/output (I/O) interfaces including: an audio-video interface 307 that couples to the video display 314, loudspeakers 317 and microphone 380; an I/O interface 313 that couples to the keyboard 302, mouse 303, scanner 326, camera 327 and optionally a joystick or other human interface device (not illustrated); and an interface 308 for the external modem 316 and printer 315. In some implementations, the modem 316 may be incorporated within the computer module 301, for example within the interface 308. The computer module 301 also has a local network interface 311, which permits coupling of the computer system 300 via a connection 323 to a local-area communications network 322, known as a Local Area Network (LAN). As illustrated in Fig.
3A, the local communications network 322 may also couple to the wide network 320 via a connection 324, which would typically include a so-called "firewall" device or device of similar functionality. The local network interface 311 may comprise an Ethernet™ circuit card, a Bluetooth™ wireless arrangement or an IEEE 802.11 wireless arrangement; however, numerous other types of interfaces may be practiced for the interface 311.

The I/O interfaces 308 and 313 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 309 are provided and typically include a hard disk drive (HDD) 310. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 312 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g. CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable, external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the system 300. Typically, any of the HDD 310, optical drive 312, networks 320 and 322, or camera 327 may form a source for video data to be encoded, or, with the display 314, a destination for decoded video data to be stored or reproduced.

The components 305 to 313 of the computer module 301 typically communicate via an interconnected bus 304 and in a manner that results in a conventional mode of operation of the computer system 300 known to those in the relevant art. For example, the processor 305 is coupled to the system bus 304 using a connection 318. Likewise, the memory 306 and optical disk drive 312 are coupled to the system bus 304 by connections 319.
Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun Sparcstations, Apple Mac™ or like computer systems.

Where appropriate or desired, the encoder 100 and the decoder 200, as well as methods described below, may be implemented using the computer system 300 wherein the encoder 100, the decoder 200 and the processes of Figs. 10 and 11, to be described, may be implemented as one or more GTTCR software application programs 333 executable within the computer system 300. In particular, the encoder 100, the decoder 200 and the steps of the described methods are effected by instructions 331 (see Fig. 3B) in the software 333 that are carried out within the computer system 300. The software instructions 331 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules perform the described methods and a second part and the corresponding code modules manage a user interface between the first part and the user.

The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 300 from the computer readable medium, and then executed by the computer system 300. A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program product in the computer system 300 preferably effects an advantageous apparatus for implementing the encoder 100, the decoder 200 and the described methods.

The software 333 is typically stored in the HDD 310 or the memory 306. The software is loaded into the computer system 300 from a computer readable medium, and executed by the computer system 300.
Thus, for example, the software 333 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 325 that is read by the optical disk drive 312.

In some instances, the application programs 333 may be supplied to the user encoded on one or more CD-ROMs 325 and read via the corresponding drive 312, or alternatively may be read by the user from the networks 320 or 322. Still further, the software can also be loaded into the computer system 300 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 300 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray Disc, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 301. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of the software, application programs, instructions and/or video data or encoded video data to the computer module 301 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.

The second part of the application programs 333 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 314.
Through manipulation of typically the keyboard 302 and the mouse 303, a user of the computer system 300 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 317 and user voice commands input via the microphone 380.

Fig. 3B is a detailed schematic block diagram of the processor 305 and a "memory" 334. The memory 334 represents a logical aggregation of all the memory modules (including the HDD 309 and semiconductor memory 306) that can be accessed by the computer module 301 in Fig. 3A.

When the computer module 301 is initially powered up, a power-on self-test (POST) program 350 executes. The POST program 350 is typically stored in a ROM 349 of the semiconductor memory 306 of Fig. 3A. A hardware device such as the ROM 349 storing software is sometimes referred to as firmware. The POST program 350 examines hardware within the computer module 301 to ensure proper functioning and typically checks the processor 305, the memory 334 (309, 306), and a basic input-output systems software (BIOS) module 351, also typically stored in the ROM 349, for correct operation. Once the POST program 350 has run successfully, the BIOS 351 activates the hard disk drive 310 of Fig. 3A. Activation of the hard disk drive 310 causes a bootstrap loader program 352 that is resident on the hard disk drive 310 to execute via the processor 305. This loads an operating system 353 into the RAM memory 306, upon which the operating system 353 commences operation.
The operating system 353 is a system level application, executable by the processor 305, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.

The operating system 353 manages the memory 334 (309, 306) to ensure that each process or application running on the computer module 301 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the system 300 of Fig. 3A must be used properly so that each process can run effectively. Accordingly, the aggregated memory 334 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 300 and how such is used.

As shown in Fig. 3B, the processor 305 includes a number of functional modules including a control unit 339, an arithmetic logic unit (ALU) 340, and a local or internal memory 348, sometimes called a cache memory. The cache memory 348 typically includes a number of storage registers 344-346 in a register section. One or more internal busses 341 functionally interconnect these functional modules. The processor 305 typically also has one or more interfaces 342 for communicating with external devices via the system bus 304, using a connection 318. The memory 334 is coupled to the bus 304 using a connection 319.

The application program 333 includes a sequence of instructions 331 that may include conditional branch and loop instructions. The program 333 may also include data 332 which is used in execution of the program 333. The instructions 331 and the data 332 are stored in memory locations 328, 329, 330 and 335, 336, 337, respectively.
Depending upon the relative size of the instructions 331 and the memory locations 328-330, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 330. Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 328 and 329.

In general, the processor 305 is given a set of instructions which are executed therein. The processor 305 waits for a subsequent input, to which the processor 305 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 302, 303, data received from an external source across one of the networks 320, 322, data retrieved from one of the storage devices 306, 309 or data retrieved from a storage medium 325 inserted into the corresponding reader 312, all depicted in Fig. 3A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 334.

The encoder 100, the decoder 200 and the described methods use input variables 354, which are stored in the memory 334 in corresponding memory locations 355, 356, 357. The encoder 100, the decoder 200 and the described methods produce output variables 361, which are stored in the memory 334 in corresponding memory locations 362, 363, 364. Intermediate variables 358 may be stored in memory locations 359, 360, 366 and 367.

Referring to the processor 305 of Fig. 3B, the registers 344, 345, 346, the arithmetic logic unit (ALU) 340, and the control unit 339 work together to perform sequences of micro-operations needed to perform "fetch, decode, and execute" cycles for every instruction in the instruction set making up the program 333.
Each fetch, decode, and execute cycle comprises:
(a) a fetch operation, which fetches or reads an instruction 331 from a memory location 328, 329, 330;
(b) a decode operation in which the control unit 339 determines which instruction has been fetched; and
(c) an execute operation in which the control unit 339 and/or the ALU 340 execute the instruction.

Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 339 stores or writes a value to a memory location 332.

Each step or sub-process in the processes of Figs. 6, 7, 9 and 10 to be described is associated with one or more segments of the program 333 and is performed by the register section 344, 345, 347, the ALU 340, and the control unit 339 in the processor 305 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 333.

The encoder 100, the decoder 200 and the described methods may alternatively be implemented in dedicated hardware such as one or more gate arrays and/or integrated circuits performing the GTTCR functions or sub functions. Such dedicated hardware may also include graphic processors, digital signal processors, or one or more microprocessors and associated memories. If gate arrays are used, the process flow charts in Figs. 6, 7, 9 and 10 are converted to Hardware Description Language (HDL) form. This HDL description is converted to a device level net list which is used by a Place and Route (P&R) tool to produce a file which is downloaded to the gate array to program it with the design specified in the HDL description.
As described above, the video encoder 100 may be implemented as one or more software code modules of the software application program 333 resident on the hard disk drive 310 and being controlled in its execution by the processor 305. In particular the video encoder 100 comprises modules 102 to 112, 114 and 115 which may each be implemented as one or more software code modules of the software application program 333.

Although the video encoder 100 is an example of a high efficiency video coding (HEVC) video encoding pipeline, processing stages performed by the modules 102 to 112, 114 and 115 are common to other video codecs such as VC-1 or H.264/MPEG-4 AVC. The video encoder 100 receives unencoded frame data 101 as a series of frames including luminance and chrominance samples. The video encoder 100 divides each frame of the frame data 101 into hierarchical sets of coding units (CUs), representable for example as a coding unit (CU) tree.

The video encoder 100 operates by outputting, from a multiplexer module 110, an array of predicted data samples known as a prediction unit (PU) 120. A difference module 115 outputs the difference between the prediction unit (PU) 120 and a corresponding array of data samples received from the frame data 101, the difference being known as residual data samples 122.

The residual data samples 122 from the difference module 115 are received by a transform module 102, which converts the difference from a spatial representation to a frequency domain representation to create transform coefficients 124 for each transform unit (TU) in the transform tree. For the high efficiency video coding (HEVC) standard under development, the conversion to the frequency domain representation is implemented using a modified Discrete Cosine Transform (DCT), in which a traditional DCT is modified to be implemented using shifts and additions.
The transform coefficients 124 are then input to a scale and quantise module 103 and are scaled and quantised to produce residual coefficients 126. The scale and quantisation process results in a loss of precision.

The residual coefficients 126 are taken as input to an inverse scaling module 105 which reverses the scaling performed by the scale and quantise module 103 to produce rescaled transform coefficients 128, which are rescaled versions of the residual coefficients 126. The rescaled transform coefficients 128 are identical to the transform coefficients available to a decoder. Availability of the rescaled transform coefficients 128 allows the encoder 100 to reconstruct frames identical to those available to a decoder, which may then be used by a motion compensation module 108 and a motion estimation module 107 for inter prediction.

The residual coefficients 126 are also taken as input to an entropy encoder module 104 which encodes the residual coefficients in an encoded video data bitstream 113. Due to the loss of precision resulting from the scale and quantise module 103, the rescaled transform coefficients 128 are not identical to the original transform coefficients 124. The rescaled transform coefficients 128 from the inverse scaling module 105 are then output to an inverse transform module 106. The inverse transform module 106 performs an inverse transform from the frequency domain to the spatial domain of the rescaled transform coefficients 128 to produce a spatial-domain representation 130 of the rescaled transform coefficients 128 identical to a spatial domain representation that is produced at a decoder.

A motion estimation module 107 produces motion vectors 132 by comparing the frame data 101 with previous frame data stored in a frame buffer module 112, typically configured within the memory 306.
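The loss of precision between the transform coefficients 124 and the rescaled transform coefficients 128 can be illustrated with a minimal sketch. It assumes a plain uniform quantiser with a hypothetical step size `step`; the scaling and quantisation actually performed by the module 103 is not detailed here, so the helper names `scale_and_quantise` and `inverse_scale` are illustrative only.

```python
def scale_and_quantise(transform_coeffs, step):
    """Map transform coefficients to integer residual coefficients.

    Assumption: a uniform quantiser that rounds half away from zero;
    the quantiser actually used by module 103 may differ.
    """
    out = []
    for c in transform_coeffs:
        magnitude = (abs(c) + step // 2) // step
        out.append(magnitude if c >= 0 else -magnitude)
    return out


def inverse_scale(residual_coeffs, step):
    """Reverse the scaling only; the rounding error cannot be undone."""
    return [r * step for r in residual_coeffs]


transform_coeffs = [103, -47, 12, 5, 0, -3]  # stand-in for coefficients 124
step = 10                                    # hypothetical step size
residual_coeffs = scale_and_quantise(transform_coeffs, step)  # like 126
rescaled_coeffs = inverse_scale(residual_coeffs, step)        # like 128
```

In this sketch the rescaled values differ from the inputs wherever rounding occurred, yet an encoder and a decoder that both start from the same residual coefficients obtain the same rescaled values.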
The motion vectors 132 are then input to a motion compensation module 108 which produces inter-predicted reference samples 134 by filtering samples stored in the frame buffer module 112, taking into account a spatial offset derived from the motion vectors 132. Not illustrated in Fig. 1, the motion vectors 132 are also passed as syntax elements to the entropy encoder module 104 for coding in the encoded bitstream 113. An intra-frame prediction module 109 produces intra-predicted reference samples 136 using samples 138 obtained from a summation module 114, which sums the output 120 of the multiplexer module 110 and the output 130 from the inverse transform module 106.

Prediction units (PUs) may be coded using intra-prediction or inter-prediction methods. The decision as to whether to use intra-prediction or inter-prediction is made according to a rate-distortion trade-off between a desired bit-rate of the resulting encoded bitstream 113 and the amount of image quality distortion introduced by either the intra-prediction or inter-prediction method. The multiplexer module 110 selects either the intra-predicted reference samples 136 from the intra-frame prediction module 109 or the inter-predicted reference samples 134 from the motion compensation module 108, depending on a current prediction mode 142, determined by control logic not illustrated but well-known in the art. The prediction mode 142 is also provided to the entropy encoder 104 as illustrated and as such is used to determine or otherwise establish the scan pattern of transform units as will be described. Inter-frame prediction uses only a diagonal scan pattern, whereas intra-frame prediction may use the diagonal scan, a horizontal scan or a vertical scan pattern.

The summation module 114 produces a sum 138 that is input to a deblocking filter module 111.
The deblocking filter module 111 performs filtering along transform unit boundaries, producing deblocked samples 140 that are written to the frame buffer module 112 configured within the memory 306. The frame buffer module 112 is a buffer with sufficient capacity to hold data from multiple past frames for future reference.

In the video encoder 100, the residual data samples 122 within one transform unit (TU) are determined by finding the difference between data samples of the input frame data 101 and the prediction 120 of the data samples of the input frame data 101. The difference provides a spatial representation of the residual coefficients of the transform unit (TU).

In operation of the entropy encoder module 104, the residual coefficients of a transform unit (TU) are converted to the two-dimensional significance map. The significance map of the residual coefficients in the transform unit (TU) is then scanned in a particular order, known as a scan order, to form a one-dimensional list of flag values, called a list of significant coefficient flag syntax elements, or 'significant coefficient flags'. The scan order may be described or otherwise specified by a scan pattern, such as that received with the prediction mode 142 from the intra-prediction module 109. The intra-prediction module 109 determines an intra-prediction mode that may be used to select the scan pattern. For example, if intra-prediction mode 1 (vertical intra-prediction) is selected then horizontal scanning is used. If intra-prediction mode 0 (planar intra-prediction) is selected then diagonal scanning is used, while if intra-prediction mode 2 (horizontal intra-prediction) is selected then vertical scanning is used. The scan pattern may be horizontal, vertical, diagonal or zig-zag.
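The scan pattern examples given above can be written as a small lookup. This is a sketch only: `ScanPattern`, `INTRA_MODE_TO_SCAN` and `select_scan_pattern` are hypothetical names, and only the three intra-prediction modes named in the text are mapped, since the treatment of other modes is not stated here.

```python
from enum import Enum


class ScanPattern(Enum):
    DIAGONAL = "diagonal"
    HORIZONTAL = "horizontal"
    VERTICAL = "vertical"


# Only the three intra-prediction modes named in the text are mapped.
# How other modes map to scan patterns is not stated and is left out.
INTRA_MODE_TO_SCAN = {
    0: ScanPattern.DIAGONAL,    # planar intra-prediction
    1: ScanPattern.HORIZONTAL,  # vertical intra-prediction
    2: ScanPattern.VERTICAL,    # horizontal intra-prediction
}


def select_scan_pattern(is_intra, intra_mode=None):
    """Return the scan pattern for a transform unit (hypothetical helper)."""
    if not is_intra:
        # Inter-frame prediction uses only the diagonal scan pattern.
        return ScanPattern.DIAGONAL
    if intra_mode not in INTRA_MODE_TO_SCAN:
        raise ValueError(f"no scan mapping given for intra mode {intra_mode}")
    return INTRA_MODE_TO_SCAN[intra_mode]
```

Raising an error for unmapped modes, rather than silently defaulting to a diagonal scan, keeps the sketch from implying behaviour the text does not describe.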
Version 6 of the high efficiency video coding (HEVC) test model performs scanning in a backward direction; however, scanning in a forward direction is also possible. For 16x16, 32x32, 4x16, 16x4, 8x32 and 32x8 transform units (TUs), a two-level scan is defined where the transform unit (TU) is divided into a set of sub-blocks, each sub-block having a square shape. At an upper level, scanning is performed by scanning each lower level using a scan such as the backward diagonal down-left scan. At the lower level, also known as the sub-block level, scanning also is performed using a scan such as the backward diagonal down-left scan.

In HEVC reference model version 6.0, the scan operation starts one residual coefficient after a last significant coefficient (where 'after' is in the direction of a backward scan of the residual coefficients) and progresses until an upper-left location of the significance map is reached. Scan operations having this property and which accord to the HEVC reference model version 6.0 are known as 'backward scans'. In the HEVC reference software version 6.0, the location of the last significant coefficient is signalled by encoding co-ordinates of the coefficient in the transform unit (TU). Those familiar with the art will appreciate that the use of the adjective "last" in this context is dependent upon the particular order of scanning. What may be the "last" non-zero residual coefficient or corresponding one-valued significant coefficient flag according to one scan pattern may not be the "last" according to another scan pattern.

The list of significant coefficient flags, indicating the significance of each residual coefficient prior to the last significant coefficient, is coded into the bitstream 113.
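The relationship between the last significant coefficient and the list of significant coefficient flags can be sketched as follows. The sketch assumes the residual coefficients have already been arranged in forward scan order (index 0 being the upper-left coefficient), and it identifies the last significant coefficient by its scan index rather than by co-ordinates; the function name `significance_flags_backward` is hypothetical.

```python
def significance_flags_backward(coeffs_forward):
    """Derive the last significant position and the backward flag list.

    coeffs_forward: residual coefficients in forward scan order
    (assumption: index 0 is the upper-left coefficient).
    Returns (last_pos, flags). last_pos is the scan index of the last
    non-zero coefficient, or None for an all-zero list. flags holds one
    flag per coefficient prior to last_pos, visited from last_pos - 1
    back to 0, i.e. in backward scan order.
    """
    last_pos = None
    for i, c in enumerate(coeffs_forward):
        if c != 0:
            last_pos = i
    if last_pos is None:
        return None, []
    flags = [1 if coeffs_forward[i] != 0 else 0
             for i in range(last_pos - 1, -1, -1)]
    return last_pos, flags


coeffs = [9, 0, -3, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0]
last_pos, flags = significance_flags_backward(coeffs)
```

For this example the last significant coefficient sits at scan index 6, so six flags are produced, covering indices 5 down to 0; no flags are produced for the trailing zeros beyond index 6.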
The last significant coefficient flag value is not required to be explicitly encoded into the bitstream 113 because the prior coding of the location of the last significant coefficient flag implicitly indicates that this residual coefficient is significant.

The clustering of larger-valued residual coefficients towards the upper-left of the transform unit (TU) results in most significance flags earlier in the list being significant, whereas few significance flags are found later in the list.

The entropy encoder module 104 also produces syntax elements from incoming residual coefficient data (or residual coefficients) 126 received from the scale and quantise module 103. The entropy encoder module 104 outputs the encoded bitstream 113 and will be described in more detail below. For the high efficiency video coding (HEVC) standard under development, the encoded bitstream 113 is delineated into network abstraction layer (NAL) units. Each slice of a frame is contained in one NAL unit.

There are several alternatives for the entropy encoding method implemented in the entropy encoder module 104. The high efficiency video coding (HEVC) standard under development supports context adaptive binary arithmetic coding (CABAC), a variant of the CABAC found in H.264/MPEG-4 AVC. An alternative entropy coding scheme is the probability interval partitioning entropy (PIPE) coder, which is well-known in the art.

For a video encoder 100 supporting multiple video coding methods, one of the supported entropy coding methods is selected according to the configuration of the encoder 100. Further, in encoding the coding units from each frame, the entropy encoder module 104 writes the encoded bitstream 113 such that each frame has one or more slices per frame, with each slice containing image data for part of the frame.
Producing one slice per frame reduces overhead associated with delineating each slice boundary. However, dividing the frame into multiple slices is also possible.

Fig. 2 is a schematic block diagram showing functional modules of a video decoder 200. The video decoder 200 may be implemented as one or more software code modules of the software application program 333 resident on the hard disk drive 310 and being controlled in its execution by the processor 305. In particular the video decoder 200 comprises modules 202 to 208 and 210 which may each be implemented as one or more software code modules of the software application program 333. Alternately, the video decoder 200 may be implemented in hardware or a mix of hardware and software.

Although the video decoder 200 is described with reference to a high efficiency video coding (HEVC) video decoding pipeline, processing stages performed by the modules 202 to 208 and 210 are common to other video codecs that employ entropy coding, such as H.264/MPEG-4 AVC, MPEG-2 and VC-1.

An encoded bitstream, such as the encoded bitstream 113, is received by the video decoder 200. The encoded bitstream 113 may be read from memory 306, the hard disk drive 310, a CD-ROM, a Blu-ray™ disk or other computer readable storage medium. Alternatively the encoded bitstream 113 may be received from an external source such as a server connected to the communications network 320 or a radio-frequency receiver. The encoded bitstream 113 contains encoded syntax elements representing frame data to be decoded.

The encoded bitstream 113 is input to an entropy decoder module 202 which extracts the syntax elements from the encoded bitstream 113 and passes the values of the syntax elements to other modules in the video decoder 200.
There may be multiple entropy decoding methods implemented in the entropy decoder module 202, such as those described with reference to the entropy encoder module 104. Syntax element data 220 representing residual coefficient data is passed to an inverse scale and transform module 203 and syntax element data 222 representing motion vector information is passed to a motion compensation module 204. The inverse scale and transform module 203 performs inverse scaling on the residual coefficient data to create reconstructed transform coefficients. The module 203 then performs an inverse transform to convert the reconstructed transform coefficients from a frequency domain representation to a spatial domain representation, producing residual samples 224, such as the inverse transform described with reference to the inverse transform module 106.

The motion compensation module 204 uses the motion vector data 222 from the entropy decoder module 202, combined with previous frame data 226 from a frame buffer module 208, configured within the memory 306, to produce inter-predicted reference samples 228 for a prediction unit (PU), being a prediction of output decoded frame data. When a syntax element indicates that the current coding unit was coded using intra-prediction, the intra-frame prediction module 205 produces intra-predicted reference samples 230 for the prediction unit (PU) using samples spatially neighbouring the prediction unit (PU). The spatially neighbouring samples are obtained from a sum 232 output from a summation module 210. The multiplexer module 206 selects intra-predicted reference samples or inter-predicted reference samples for the prediction unit (PU) depending on the current prediction mode, which is indicated by a syntax element in the encoded bitstream 113.
An array of samples 234 output from the multiplexer module 206 is added to the residual samples 224 from the inverse scale and transform module 203 by the summation module 210 to produce the sum 232 which is then input to each of a deblocking filter module 207 and the intra-frame prediction module 205. In contrast to the encoder 100, the intra-frame prediction module 205 receives a prediction mode 236 from the entropy decoder 202. The multiplexer 206 receives an intra-frame prediction / inter-frame prediction selection signal 236 from the entropy decoder 202. The deblocking filter module 207 performs filtering along transform unit boundaries to smooth artefacts visible along the transform unit boundaries. The output of the deblocking filter module 207 is written to the frame buffer module 208 configured within the memory 306. The frame buffer module 208 provides sufficient storage to hold multiple decoded frames for future reference. Decoded frames 209 are also output from the frame buffer module 208.

Fig. 4A is a schematic block diagram showing an exemplary 8x8 transform unit (TU) 400. The transform unit 400 has side dimensions of 8x8. Each transform unit has a particular size, selected from a set of possible transform unit sizes. Possible transform unit sizes vary from 4x4 to 32x32 and may include both square and non-square shapes. The example transform unit 400 shown in Fig. 4A illustrates the properties of transform units, with the described properties being applicable to transform units of any defined size. Side dimensions for all possible transform unit sizes are powers of two, with a minimum of 4. A transform unit holds residual coefficients, such as the residual coefficients 220 or the residual coefficients 126.
In order to represent the residual coefficients in a bitstream, such as the encoded bitstream 113, it is necessary to convert the two-dimensional array of residual coefficients in the transform unit 400 into a one-dimensional list of coefficients. This conversion is accomplished by scanning the transform unit, using a scan pattern such as a backward diagonal sub-block scan pattern 403. The backward diagonal sub-block scan pattern 403 is defined over the entirety of the transform unit 400. However, the backward diagonal sub-block scan pattern 403 need not be applied to the entirety of the transform unit 400 in order to encode all significant (non-zero) residual coefficients.

The residual coefficients in the transform unit 400 are grouped into smaller two-dimensional structures, known as sub-blocks. An example sub-block 402 illustrates the spatial arrangement of a sub-block as having side dimensions of 4x4. The example sub-block 402 is one of four sub-blocks 401 that completely occupy the area of the transform unit 400, in a non-overlapping fashion and with no gaps between the four sub-blocks 401. The backward diagonal sub-block scan pattern 403 scans each sub-block in a backward diagonal down-left direction, starting from the lower-right residual coefficient in the sub-block and progressing to the upper-left residual coefficient in the sub-block. In progressing to the next sub-block, a backward diagonal down-left scan is also applied.

Although the TU 400 in Fig. 4A has four 4x4 sub-blocks, this is only one example of a TU. Thus for example, it is possible for a TU to comprise only a single 4x4 sub-block. Other configurations are also possible.

Fig. 4B is a schematic block diagram showing a set of syntax elements encoding an 8x8 transform unit (TU). A bitstream portion 410 of the encoded bitstream 113 (see Figs. 1 and 2) will be described with reference to Fig. 4B.
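The two-level backward diagonal sub-block scan described for the transform unit 400 can be sketched as below. The sketch assumes that, within each diagonal, the forward scan runs from the lower-left end to the upper-right end, so that reversing it yields a down-left movement; that ordering within a diagonal, and the function names, are assumptions made for illustration.

```python
def diagonal_scan(width, height):
    """Forward diagonal scan of a width x height grid as (x, y) pairs.

    Assumption: each anti-diagonal is walked from its lower-left end to
    its upper-right end, so the reversed (backward) scan moves down-left.
    """
    order = []
    for d in range(width + height - 1):
        for y in range(min(d, height - 1), -1, -1):
            x = d - y
            if x < width:
                order.append((x, y))
    return order


def backward_subblock_scan(tu_size=8, sb_size=4):
    """Backward two-level scan of a square transform unit.

    Sub-blocks are visited in backward diagonal order (upper level) and
    the coefficients inside each sub-block are visited in backward
    diagonal order (lower level).
    """
    n = tu_size // sb_size
    upper = list(reversed(diagonal_scan(n, n)))
    lower = list(reversed(diagonal_scan(sb_size, sb_size)))
    return [(sx * sb_size + x, sy * sb_size + y)
            for sx, sy in upper
            for x, y in lower]


scan = backward_subblock_scan()  # 64 (x, y) positions for an 8x8 TU
```

With these assumptions the scan begins at the lower-right coefficient of the lower-right sub-block and finishes at the upper-left (DC) coefficient, visiting all sixteen coefficients of one sub-block before moving to the next.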
The bitstream portion represents a number of concatenated syntax elements. Syntax elements in the bitstream portion 410 encode residual coefficients in the transform unit 400. The arrangement of syntax elements defined in the bitstream portion 410 is intended to exploit statistical properties of the residual coefficients in the transform unit 400 to achieve maximum compression efficiency.

A last significant position 411 encodes the location of the last non-zero residual coefficient in the transform unit 400 into the bitstream portion 410. The last significant position 411 uses a combination of arithmetically-coded and bypass-coded bins, represented as 'A' and 'B' in Fig. 4B. The last significant coefficient could be located at any location in the transform unit 400; however, due to the clustering of residual energy in lower frequency coefficients of the discrete cosine transform, the last significant coefficient is typically located towards the upper left region of the transform unit 400.

The transform unit is then scanned, along the scan path defined by the backward diagonal sub-block scan pattern 403, from the last significant coefficient back to the upper left residual coefficient, also known as the DC residual coefficient. During the scan, the residual coefficients are divided into sub-sets, commencing a new sub-set every sixteenth residual coefficient. The boundary between sub-sets is typically aligned to the boundary between sub-blocks. For example, a sub-set 412 contains the same set of residual coefficients as the sub-block 402. The process of scanning from one sub-set to the next corresponds to an 'upper level scan' whereby for hardware implementations a sub-set selector selects one sub-set of the transform unit 400. As the upper level scan iterates over the sub-sets in the transform unit 400, the selector selects consecutive sub-sets in the transform unit 400.
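The division of the backward scan into sub-sets of sixteen coefficients can be sketched as follows. The sketch assumes positions are identified by their index along the scan path, with index 0 being the DC residual coefficient, and that sub-set boundaries fall on multiples of sixteen so that they align with sub-block boundaries; `split_into_subsets` is a hypothetical name.

```python
def split_into_subsets(last_scan_index, subset_size=16):
    """Group scan indices last_scan_index..0 into sub-sets.

    Indices are visited backward, from the last significant coefficient
    to the DC coefficient (index 0). Assumption: sub-set boundaries lie
    on multiples of subset_size, matching the sub-block boundaries.
    """
    subsets = []
    hi = last_scan_index
    while hi >= 0:
        lo = (hi // subset_size) * subset_size  # start of this sub-set
        subsets.append(list(range(hi, lo - 1, -1)))
        hi = lo - 1
    return subsets


# Example: an 8x8 transform unit whose last significant coefficient
# lies at scan index 37 (a hypothetical value).
subsets = split_into_subsets(37)
```

Here three sub-sets are produced. The first one visited holds only six coefficients, because the last significant coefficient does not lie on a sub-set boundary, while the remaining two each hold sixteen.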
Note that the last significant coefficient may occur at any position in the transform unit 400 and accordingly may occur at locations that are not on boundaries between adjacent sub-sets (adjacent in scan order). Since scanning is performed in a backward direction starting from the last significant coefficient, the first sub-set of data, in scan order, may have a length below 16. Furthermore, note that sub-sets whose content can be inferred as all zero-valued residual coefficients, due to the last significant position 411, are not encoded, although four sub-sets are depicted in the bitstream portion 410.

The example sub-set 412 contains several syntax elements describing residual coefficients in the sub-block 402. An arithmetically coded significant coefficient group flag 413 indicates that there is at least one non-zero residual coefficient. The significant coefficient group flag 413 provides an efficient means to indicate that a particular sub-block in the transform unit 400 contains no significant residual coefficients.

A (set of) arithmetically coded significant_coefficient_flags 414, encoded only if the significant coefficient group flag 413 is true, indicate the status of each corresponding residual coefficient as significant or not. No significant_coefficient_flag is present for the residual coefficient located at the last significant position 411, which is known to be significant.

A (set of) arithmetically coded coeff_abs_level_greater1_flags 415 indicates if the magnitude of the corresponding residual coefficient is greater than one. Once eight instances of the coeff_abs_level_greater1_flag 415 have been encountered for any sub-set, the flag is no longer coded for that sub-set in the bitstream portion 410.
An arithmetically coded coeff_abs_level_greater2_flag 416, also referred to as a greater-than-two flag, is encoded in the bitstream portion 410 to indicate that the residual coefficient magnitude is greater than two. Only one instance of the coeff_abs_level_greater2_flag 416 is encoded in the bitstream portion 410 for each sub-set.

A (set of) bypass-coded coeff_sign_flags 417 are coded to indicate the sign of each corresponding significant residual coefficient in the sub-set.

A (set of) bypass-coded coeff_abs_level_remaining syntax elements 418 are coded to indicate the remaining magnitude of residual coefficients in the sub-set under the following conditions:
* the magnitude of the residual coefficient is non-zero, as indicated by the significant coefficient flags 414; and
* the magnitude of the residual coefficient was not already constrained to a particular value by either or both of the coeff_abs_level_greater1_flag 415 or the coeff_abs_level_greater2_flag 416.

Fig. 5 is a schematic block diagram showing the entropy decoder 202. Description of the entropy decoder 202 will focus on decoding residual coefficients. A coefficient inverse binariser 501 selects contexts, such as a context 504, from a context model 503. The context model 503 is a memory holding an array of contexts. A selected context 505 is used by an arithmetic decoder 506 to decode bins, such as a bin 507, from the encoded bitstream 113. The bin 507 is used to update the state of the selected context 505, to create an updated selected context 509. The updated selected context 509 is written back to the selected context, such as the context 504, of the context model 503.

Accessing the context model 503 for each bin requires a large memory bandwidth. Memories requiring a large memory bandwidth are ideally placed on-chip; however, on-chip memory consumes large silicon area and hence is only suitable for small memories.
Accordingly, reducing the size of the context model 503 is advantageous for low-cost implementation of the entropy decoder 202. The bin 507 is also input to the coefficient inverse binariser 501 to enable progression of the inverse binarisation process. Progression of the inverse binarisation process causes the entropy decoder 202 to construct the residual coefficients 220, which are output to the inverse scale and transform module 203.

Fig. 6 is a schematic flow diagram showing a method for decoding residual coefficients by the entropy decoder 202 depicted in Fig. 5. The inverse binarisation process of the coefficient inverse binariser 501 will be described with reference to Figs. 4 and 6. A method 600 decodes the residual coefficients 220 from the bitstream portion 410, using the entropy decoder 202. The method 600 begins with a "decode last significant position" step 601 that decodes co-ordinates specifying the position of the last significant coefficient, such as the last significant position 411, for the transform unit 400. The remaining steps of the method 600 then decode residual coefficients commencing from the last significant position 411, and through to the DC coefficient, in order of the backward diagonal scan pattern 403.

A following "decode significant_coeff_group_flag" step 602 decodes a flag indicating if the current sub-block contains at least one significant coefficient. Thereafter, a "significant coefficient group test" step 603 tests the flag value. If the flag is TRUE, control passes to a "decode significant coefficient flags" step 604. Otherwise, if the flag is FALSE, control passes to a "more sub-sets test" step 614. The residual coefficients in a sub-set with a false flag value are inferred to be zero.

The "decode significant coefficient flags" step 604 decodes a set of flags, such as the significant coefficient flags 414, indicating the significance of each residual coefficient in the sub-set.
Residual coefficients that are significant are assigned a value of one, and all others are zero. Flags located prior to the last significant coefficient position 411 are not encoded in the bitstream portion 410 and accordingly are not decoded by the "decode significant coefficient flags" step 604; instead, those flags are inferred to be false.

Thereafter, in regard to a current sub-set, a "greater1_flag count test" step 606, a "decode greater1_flag" step 607 and an "increment greater1_flag count and residual" step 608 are successively repeated until either (a) a "greater1_flag" count (accumulated by the "increment greater1_flag count and residual" step 608) reaches a pre-determined threshold, or (b) each significant coefficient in the sub-set is processed, operating in scan order. The "greater1_flag count test" step 606 thus compares a count of how many greater1_flags have been encountered in the sub-set to the threshold. If this count exceeds the threshold, such as eight, or every significant residual coefficient in the current sub-set has been processed by the "greater1_flag count test" step 606, the "decode greater1_flag" step 607 and the "increment greater1_flag count and residual" step 608, control passes to a "greater2_flag count test" step 609. Otherwise, the "decode greater1_flag" step 607 decodes a flag, such as the "coeff_abs_level_greater1_flag" 415, from the bitstream portion 410. When true, this flag indicates that a residual coefficient has a magnitude exceeding one. Otherwise, the magnitude of the residual coefficient is equal to one. When the flag is true, the "increment greater1_flag count and residual" step 608 increments the value of the "greater1_flag" count, and increments the magnitude of the residual coefficient from one to two.
When the flag is not coded for the residual coefficient, no further information is known about the magnitude of the residual coefficient at this point (beyond the coefficient being non-zero due to the significant coefficient flag 414).

Thereafter, the "greater2_flag count test" step 609, a "decode greater2_flag" step 610 and an "increment greater2_flag count and residual" step 611 are successively repeated until either (a) a pre-defined "greater2_flag" count threshold is reached, or (b) each residual coefficient with a magnitude of two is processed. The "greater2_flag count test" step 609 compares the "greater2_flag" count against the threshold, such as 1, and if the threshold is reached or all residual coefficients with a magnitude of two in the sub-set have been processed, control passes to a "decode coeff_sign_flag" step 612. Otherwise, control passes to the "decode greater2_flag" step 610.

The "decode greater2_flag" step 610 decodes a "coeff_abs_level_greater2_flag", such as the "coeff_abs_level_greater2_flag" 416. The "coeff_abs_level_greater2_flag" indicates a first instance in a sub-set of a residual coefficient having a magnitude greater than two. If the decoded flag is true, the corresponding residual coefficient is incremented from two to three. Regardless of the flag value, the "greater2_flag" count is incremented.

The "decode coeff_sign_flag" step 612 decodes one sign flag per significant residual coefficient, such as the coeff_sign_flags 417, in the sub-set. The signs of the residual coefficients are set according to the decoded flags, with true indicating negative and false indicating positive.

Thereafter, a "decode coeff_abs_level_remaining" step 613 decodes any remaining residual coefficients where the magnitude is unknown.
For example, if a residual coefficient is significant and the "coeff_abs_level_greater1_flag" and the "coeff_abs_level_greater2_flag" were not coded because the thresholds for the "greater1_flag" count and the "greater2_flag" count were exceeded, the magnitude may be any integer greater than zero and is coded using "coeff_abs_level_remaining" 418 syntax elements. If a residual coefficient is significant, the "coeff_abs_level_greater1_flag" is true and the "coeff_abs_level_greater2_flag" is false, the magnitude is known to be 2 and no "coeff_abs_level_remaining" syntax element is coded. Thereafter, the "more sub-sets test" step 614 iterates over each sub-set in the transform unit 400 until sub-set 0 (as indicated on Fig. 4A) has been decoded, at which point the method 600 terminates.

To illustrate the operation of the method 600, the following example is provided demonstrating the decoding of one sub-set. The sub-set contains the following 16 residual coefficients, with scan positions numbered from 15 down to 0 (due to the backward scan direction):
0, 1, -1, 0, 2, 4, -1, -4, 4, 2, -6, 4, 7, 6, -12 and -18.

As this set of residual coefficients includes at least one non-zero residual coefficient, the "significant coefficient group flag" 413 is true.

The "significant coefficient" flags 414 have values as follows:
0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 and 1.

The "coeff_abs_level_greater1_flags" 415 have values as follows:
-, 0, 0, -, 1, 1, 0, 1, 1, 1, -, -, -, -, - and -.
The dashes indicate locations along the 16 scan positions in the sub-set where no instance of the "coeff_abs_level_greater1_flag" 415 is coded. Note that after coding eight instances of the "coeff_abs_level_greater1_flag" 415, no further instances of this flag are coded, even though further residual coefficients are significant (have a magnitude greater than 0).

The "coeff_abs_level_greater2_flag" 416 is coded as follows:
-, -, -, -, 0, -, -, -, -, -, -, -, -, -, - and -.
Again, the dashes indicate locations along the 16 scan positions in the sub-set where no instance of the "coeff_abs_level_greater2_flag" 416 is coded. Note that only a single instance of the "coeff_abs_level_greater2_flag" is coded, when the first residual coefficient with a magnitude greater than one is encountered, and no further instances of this flag are coded, even though further residual coefficients have a magnitude greater than 2.

The "coeff_sign_flags" 417 are coded as follows:
-, 0, 1, -, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1 and 1.
An instance of this flag is coded for each significant residual coefficient in the sub-set.

The "coeff_abs_level_remaining" syntax elements 418 have the following values:
-, -, -, -, -, 2, -, 2, 2, 0, 5, 3, 6, 5, 11 and 17.
Note that for a given residual coefficient, an instance of the "coeff_abs_level_remaining" syntax elements 418 is only coded where the previously coded flags for the residual coefficient are insufficient to determine the value of the residual coefficient.

Fig. 7 is a schematic flow diagram showing a method 700 for decoding one bin. The "decode greater2_flag" step 610 in Fig. 6 is described hereinafter in more detail with reference to a method 700 of Fig. 7 and Fig. 5. As indicated in the step 610, at most one instance of the "coeff_abs_level_greater2_flag" 416 is coded for each sub-set in the bitstream portion 410. A "determine context index" step 701 determines a context index, such as the context index 502, which is then provided as an input to the context model 503. For hardware implementations, a context selector implemented in the coefficient inverse binariser 501 performs the functionality of the "determine context index" step 701, as described with reference to the method 700. Thereafter a "read context model" step 702 reads a context, selected by the determined context index, from the context model 503.
A "decode coeff_abs_level_greater2_flag" step 703 subsequently decodes one bin using the selected context. Thereafter, the selected context is updated, based on the value of the decoded bin, and written back to the context model 503 in a "write to context model" step 704.

The "determine context index" step 701 performs the context modelling for the coeff_abs_level_greater2_flag 416. In one implementation, the selection of the context in the luma channel is independent of the values of previously decoded luma coefficients. Instead, the selection of the context model is based on a location of the sub-set in the TU. A dependency on the location of the sub-set within the transform unit 400 is retained. This dependency is a test of whether the current sub-set is sub-set 0 or not. The result of using this dependency exclusively for context selection is a requirement for two contexts for coeff_abs_level_greater2_flag. Note that separate contexts are still used for the luma and chroma channels. In this implementation, the allocation of contexts required for the coeff_abs_level_greater2_flag is as follows:

Channel    Location
Luma       Sub-set 0
Luma       Other sub-sets
Chroma     Sub-set 0
Chroma     Other sub-sets

Independence from the values of previously coded luma coefficients results in a requirement for two contexts for the luma channel for coeff_abs_level_greater2_flag. This is in contrast to a requirement for four contexts when the dependency on the previous luma coefficient count is included.

In another implementation, independence of the selection of the context from the location of the current sub-set in the transform unit 400 is introduced, while a dependence on the previously decoded coefficient values is retained.
In this implementation, the allocation of contexts required for coeff_abs_level_greater2_flag is as follows:

Channel    Previous sub-set
Luma       LargerT1 = 0
Luma       LargerT1 > 0
Chroma     LargerT1 = 0
Chroma     LargerT1 > 0

Independence of the selection of the context from the location of the current sub-set within the transform unit 400 results in a requirement for two contexts for the luma channel for coeff_abs_level_greater2_flag (also in contrast to the previous requirement for four contexts when both dependencies were retained). The 'LargerT1' is a modified cumulative count of the number of residual coefficients having a magnitude larger than one from the previous sub-set and will be described further below.

Each implementation achieves very close coding efficiency performance in HM 6.0, with a loss not exceeding 0.01% in the luma channel under the JCT-VC "common test conditions", while removing two contexts from the context model 503.

The nature of the context modelling is dependent on the channel of the transform unit 400 which is being processed, either luma or chroma. In the first implementation, the context modelling is also dependent on the present sub-set being processed, with sub-set 0 (located in the upper-left 4x4 sub-block of the transform unit 400) being allocated a separate context.

The context modelling is independent of the direct counts or modified counts derived from the values of luma residual coefficients from previous sub-set(s), such as a count of the larger-than-one or larger-than-two residual coefficients from the previous sub-set(s).

The 'LargerT1' count may be used to assist the context modelling by providing an indication of the magnitudes of residual coefficients in previous sub-sets. Where many large residual coefficients were encountered previously, it may be likely that the current sub-set will also contain large residual coefficients.
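The two context allocations tabulated above can be sketched as a pair of index functions. This is a minimal sketch under stated assumptions rather than the disclosed implementation: the numbering of the contexts (luma contexts first, chroma contexts second) and the function names are hypothetical, and the 'LargerT1' value is simply passed in by the caller.

```python
def greater2_ctx_by_location(is_luma, subset_index):
    """First allocation: depends only on the channel and on whether this is sub-set 0."""
    channel_offset = 0 if is_luma else 2  # assumed layout: luma contexts first
    return channel_offset + (0 if subset_index == 0 else 1)


def greater2_ctx_by_history(is_luma, larger_t1):
    """Second allocation: depends only on the channel and on whether 'LargerT1' is zero."""
    channel_offset = 0 if is_luma else 2  # assumed layout: luma contexts first
    return channel_offset + (0 if larger_t1 == 0 else 1)


# Either way, the luma channel only ever addresses two contexts.
luma_by_location = {greater2_ctx_by_location(True, k) for k in range(4)}
luma_by_history = {greater2_ctx_by_history(True, n) for n in range(10)}
print(len(luma_by_location), len(luma_by_history))
```

Because each function tests a single binary condition per channel, neither can address more than two contexts for the luma channel, which is the property relied upon to reduce the size of the context model 503.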
The 'LargerT1' count is initialised to 0 at the beginning of processing the transform unit. Once processing a previous sub-set completes, the 'LargerT1' count is incremented once for each residual coefficient in the previous sub-set having a magnitude of two or more. The 'LargerT1' count is then used in the context modelling of the "determine context index" step 701, by testing the value to see if it is zero or non-zero. Once this test is performed, the 'LargerT1' count is shifted right by one (i.e. integer divided by two). In this way, the 'LargerT1' count provides a cumulative count of previous residual coefficients with magnitudes exceeding one, and includes a 'decay' mechanism to reduce the impact of sub-sets located further away from the current sub-set.

In the first and second implementations, independence from either the previous sub-set(s) or the location of the present sub-set within the transform unit 400 results in a reduced number of contexts being required for the context modelling, and therefore reduces the memory requirement of the context model 503. As the "coeff_abs_level_greater2_flag" is coded at most once per sub-set, a dependence on either the number of previous larger-than-one residual coefficients or the location of the current sub-set in the transform unit 400 does not contribute to coding efficiency, yet does introduce a requirement for a larger memory size of the context model 503. Particularly for hardware implementations, the silicon area consumed by on-chip memory is an important factor in the manufacturing cost of the integrated circuit. Accordingly, where it is possible to remove contexts yet have minimal impact on the coding efficiency of the video encoder 100 or the video decoder 200, it is desirable to do so.

Fig. 8 is a schematic block diagram showing an entropy encoder, and the entropy encoder 104 will be described with reference to Fig. 8.
Modules in the entropy encoder 104 operate in a similar manner to corresponding modules in the entropy decoder 202, the key difference being that syntax elements are being encoded instead of being decoded. A coefficient binariser 801 receives the set of residual coefficients 126 for the transform unit 400. The coefficient binariser 801 creates a sequence of syntax elements to represent the residual coefficients in the transform unit 400. Operation of the coefficient binariser 801 is analogous to the method 600 of Fig. 6, except that decode operations are changed to encode operations. The sequence of syntax elements is represented as a sequence of context indices 802 and bins 807. The context indices 802 are provided to a context model 803 to select a sequence of contexts 805, comprising contexts such as a context 804. Each context in the sequence of contexts 805 is received by a context updater 808 and an arithmetic encoder 806. For each context, the context updater 808 uses the bin 807 to produce an updated context 809 which is written back to the context model 803. The arithmetic encoder 806 encodes the bin 807 into the encoded bitstream 113 using the context from the sequence of contexts 805.

Fig. 9 is a schematic flow diagram showing a method for decoding one luma coefficient. A method 900 for decoding a luma coefficient will be described with reference to Figs. 6, 7 and 9. The method 900 corresponds with steps of the methods 600 and 700. The method 900 starts with a "select portion of transform unit" step 901. The "select portion of transform unit" step 901 selects sub-sets of the transform unit that contain at least one significant residual coefficient, as described with reference to the "significant coefficient group test" step 603. Thereafter a "select context index" step 902 selects a context index for the "coeff_abs_level_greater2_flag" (when present), in accordance with the "determine context index" step 701.
A subsequent "decode luma coefficient" step 903 decodes the value of one luma coefficient using the selected context index, in accordance with the steps 604-613. Thereafter, a "store luma coefficient" step 904 writes the decoded luma coefficient to a memory buffer, or register, used for communication with a next processing stage, such as the inverse scale and transform module 203.

Fig. 10 is a schematic flow diagram showing a method for encoding one luma coefficient. A method 1000 for encoding a luma coefficient will be described with reference to Figs. 6, 7 and 10. The method 1000 corresponds with steps of the methods 600 and 700 accordingly, varied for an encoding operation by way of the following modifications:
1. The decode significant_coeff_group_flag step 602 is now operable to encode a flag in the bitstream portion 410, indicating that at least one residual coefficient of the current sub-set is significant; and
2. The decode significant coefficient flags step 604 is now operable to encode a set of significant coefficient flags, indicating the significant status of each residual coefficient in the current sub-set; and
3. The decode greater1_flag step 607 is now operable to encode a flag indicating that the corresponding residual coefficient in the current sub-set has a magnitude exceeding one; and
4. The decode greater2_flag step 610 is now operable to encode a flag indicating that the corresponding residual coefficient in the current sub-set has a magnitude exceeding two; and
5. The decode coeff_sign_flag step 612 is now operable to encode a set of flags indicating the arithmetic sign of each residual coefficient in the current sub-set; and
6. The decode coeff_abs_level_remaining step 613 is now operable to encode a set of coeff_abs_level_remaining syntax elements in the bitstream portion 410, indicating the magnitudes of residual coefficients that are not fully specified by the corresponding significant coefficient flag, (if present) the corresponding coeff_abs_level_greater1_flag and (if present) the corresponding coeff_abs_level_greater2_flag; and
7. The decode coeff_abs_level_greater2_flag step 703 is now operable to encode an instance of the coeff_abs_level_greater2_flag in the bitstream portion 410.

The method 1000 starts with a "select portion of transform unit" step 1001. The "select portion of transform unit" step 1001 selects sub-sets of the transform unit that contain at least one significant residual coefficient, as described with reference to the "significant coefficient group test" step 603. A subsequent "select context index" step 1002 selects a context index for the "coeff_abs_level_greater2_flag" (when present), in accordance with the "determine context index" step 701. Thereafter an "encode luma coefficient" step 1003 encodes the value of one luma coefficient using the selected context index, in accordance with the steps 604-613. Finally, a "store luma coefficient" step 1004 writes the encoded luma coefficient to a bitstream portion, such as the bitstream portion 410.

INDUSTRIAL APPLICABILITY
The arrangements described are applicable to the computer and data processing industries and particularly for the digital signal processing for the encoding and decoding of signals such as video signals.

The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises" have correspondingly varied meanings.
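The worked sub-set example given with reference to the method 600 can be checked with a short sketch that derives the per-coefficient syntax element values and then reconstructs the coefficients from them. This is an illustration only, under the following assumptions: the greater-than-one count advances once per coded flag (as the worked example implies), the remaining magnitude is measured from the level already established by the coded flags, and the function names and the use of `None` for "not coded" are hypothetical.

```python
GREATER1_LIMIT = 8  # at most eight greater-than-one flags per sub-set
GREATER2_LIMIT = 1  # at most one greater-than-two flag per sub-set


def binarise_subset(coeffs):
    """Derive syntax element values for one sub-set, listed in coding order.

    None marks a position where the element is not coded.
    """
    n = len(coeffs)
    sig = [int(c != 0) for c in coeffs]
    g1, g2, sign, rem = [None] * n, [None] * n, [None] * n, [None] * n
    g1_count = g2_count = 0
    for i, c in enumerate(coeffs):
        if c == 0:
            continue
        mag = abs(c)
        sign[i] = int(c < 0)  # true indicates negative
        base = 1  # established by the significant coefficient flag
        if g1_count < GREATER1_LIMIT:
            g1[i] = int(mag > 1)
            g1_count += 1  # assumption: counts coded flags, not only true ones
            base += g1[i]
            if g1[i] and g2_count < GREATER2_LIMIT:
                g2[i] = int(mag > 2)
                g2_count += 1
                base += g2[i]
        # A false flag pins the magnitude exactly (1 or 2); otherwise code the rest.
        if not (g1[i] == 0 or g2[i] == 0):
            rem[i] = mag - base
    return sig, g1, g2, sign, rem


def reconstruct_subset(sig, g1, g2, sign, rem):
    """Rebuild the coefficients from the syntax element values."""
    coeffs = []
    for s, a, b, neg, r in zip(sig, g1, g2, sign, rem):
        if not s:
            coeffs.append(0)
            continue
        mag = 1 + (a or 0) + (b or 0) + (r or 0)
        coeffs.append(-mag if neg else mag)
    return coeffs


example = [0, 1, -1, 0, 2, 4, -1, -4, 4, 2, -6, 4, 7, 6, -12, -18]
elements = binarise_subset(example)
assert reconstruct_subset(*elements) == example
```

The round-trip assertion only confirms that the derived elements are sufficient to recover the sub-set; it does not model the arithmetic coding or context selection stages.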

Claims

1. A method of decoding a luma coefficient, for a transform unit having at least one sub-set, from a stream of encoded video data, the method comprising the steps of:
selecting a sub-set of the transform unit from the stream of encoded video data;
selecting, for the sub-set of the transform unit, a context for a greater-than-two flag from a context model having at most two contexts dedicated to said greater-than-two flag for the transform unit, the greater-than-two flag indicating a first instance in a sub-set of a residual coefficient having a magnitude greater than two, the context being for a luma coefficient of the sub-set and being selected independently of a number of luma coefficients greater than one in a previously decoded transform unit sub-set;
decoding the luma coefficient for the sub-set of the transform unit using the selected context for the greater-than-two flag; and
storing the decoded luma coefficient of the transform unit.

2. A method of encoding a luma coefficient, for a transform unit having at least one sub-set, from unencoded frame data, the method comprising the steps of:
selecting a sub-set of the transform unit from the unencoded frame data;
selecting, for the sub-set of the transform unit, a context index for a greater-than-two flag from a context model having at most two contexts dedicated to said greater-than-two flag, the greater-than-two flag indicating a first instance in a sub-set of a residual coefficient having a magnitude greater than two, the context being for a luma coefficient of the sub-set, the context being selected independently of a number of luma coefficients greater than one in a previously decoded transform unit sub-set;
encoding the luma coefficient for the sub-set of the transform unit using the selected context for the greater-than-two flag; and
storing the encoded luma coefficient of the transform unit.

3. A video decoder, configured for decoding a luma coefficient in a transform unit having at least one sub-set, from a stream of encoded video data, the decoder comprising:
a sub-set selector for selecting a sub-set of the transform unit from the stream of encoded video data;
a context selector for selecting, for the sub-set of the transform unit, a context for a greater-than-two flag from a context model having at most two contexts dedicated to said greater-than-two flag, the greater-than-two flag indicating a first instance in a sub-set of a residual coefficient having a magnitude greater than two, the context being for a luma coefficient of the sub-set, the context being selected independently of a number of luma coefficients greater than one in a previously decoded transform unit sub-set;
a decoder for decoding the luma coefficient for the sub-set of the transform unit using the selected context for the greater-than-two flag; and
a processor for storing the decoded luma coefficient of the transform unit.

4. A video encoder for encoding a luma coefficient, for a transform unit having at least one sub-set, from unencoded frame data, the encoder comprising:
a selector for selecting a sub-set of the transform unit from the unencoded frame data;
a selector for selecting, for the sub-set of the transform unit, a context index for a greater-than-two flag from a context model having at most two contexts dedicated to said greater-than-two flag, the greater-than-two flag indicating a first instance in a sub-set of a residual coefficient having a magnitude greater than two, the context being for a luma coefficient of the sub-set, the context being selected independently of a number of luma coefficients greater than one in a previously decoded transform unit sub-set;
an encoder for encoding the luma coefficient for the sub-set of the transform unit using the selected context for the greater-than-two flag; and
a processor for storing the decoded luma coefficient of the transform unit.

5. A video decoder comprising:
a processor;
a memory storing a software executable program for directing the processor to perform a method of decoding a luma coefficient, for a transform unit having at least one sub-set, from a stream of encoded video data, the method comprising the steps of:
selecting a sub-set of the transform unit from the stream of encoded video data;
selecting, for the sub-set of the transform unit, a context for a greater-than-two flag from a context model having at most two contexts dedicated to said greater-than-two flag, the greater-than-two flag indicating a first instance in a sub-set of a residual coefficient having a magnitude greater than two, the context being for a luma coefficient of the sub-set, the context being selected independently of a number of luma coefficients greater than one in a previously decoded transform unit sub-set;
decoding the luma coefficient for the sub-set of the transform unit using the selected context for the greater-than-two flag; and
storing the decoded luma coefficient of the transform unit.

6. A video encoder comprising:
a processor;
a memory storing a software executable program for directing the processor to perform a method of encoding a luma coefficient, for a transform unit having at least one sub-set, from unencoded frame data, the method comprising the steps of:
selecting a sub-set of the transform unit from the unencoded frame data;
selecting, for the sub-set of the transform unit, a context index for a greater-than-two flag from a context model having at most two contexts dedicated to said greater-than-two flag, the greater-than-two flag indicating a first instance in a sub-set of a residual coefficient having a magnitude greater than two, the context being for a luma coefficient of the sub-set, the context being selected independently of a number of luma coefficients greater than one in a previously decoded transform unit sub-set;
encoding the luma coefficient for the sub-set of the transform unit using the selected context for the greater-than-two flag; and
storing the decoded luma coefficient of the transform unit.

7. A computer readable non-transitory storage medium storing a software executable program for directing a processor to perform a method of decoding a luma coefficient, for a transform unit having at least one sub-set, from a stream of encoded video data, the program comprising:
software executable code for selecting a sub-set of the transform unit from the stream of encoded video data;
software executable code for selecting, for the sub-set of the transform unit, a context for a greater-than-two flag from a context model having at most two contexts dedicated to said greater-than-two flag, the greater-than-two flag indicating a first instance in a sub-set of a residual coefficient having a magnitude greater than two, the context being for a luma coefficient of the sub-set, the context being selected independently of a number of luma coefficients greater than one in a previously decoded transform unit sub-set;
software executable code for decoding the luma coefficient for the sub-set of the transform unit using the selected context for the greater-than-two flag; and
software executable code for storing the decoded luma coefficient of the transform unit.

8. A computer readable non-transitory storage medium storing a software executable program for directing a processor to perform a method of encoding a luma coefficient, for a transform unit having at least one sub-set, from unencoded frame data, the program comprising:
software executable code for selecting a sub-set of the transform unit from the unencoded frame data;
software executable code for selecting, for the sub-set of the transform unit, a context index for a greater-than-two flag from a context model having at most two contexts dedicated to said greater-than-two flag, the greater-than-two flag indicating a first instance in a sub-set of a residual coefficient having a magnitude greater than two, the context being for a luma coefficient of the sub-set, the context being selected independently of a number of luma coefficients greater than one in a previously decoded transform unit sub-set;
software executable code for encoding the luma coefficient for the sub-set of the transform unit using the selected context for the greater-than-two flag; and
software executable code for storing the decoded luma coefficient of the transform unit.

Dated this 13th day of April 2012
CANON KABUSHIKI KAISHA
Patent Attorneys for the Applicant
Spruson & Ferguson