AU2017210632A1 - Method, apparatus and system for encoding and decoding video data - Google Patents


Publication number
AU2017210632A1
Authority
AU
Australia
Prior art keywords
groups
gcli
run
coefficient
precinct
Prior art date
Legal status
Abandoned
Application number
AU2017210632A
Inventor
Christopher James ROSEWARNE
Current Assignee
Canon Inc
Original Assignee
Canon Inc
Priority date
Filing date
Publication date
Application filed by Canon Inc
Priority to AU2017210632A
Publication of AU2017210632A1


Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method of decoding a precinct of video data from a video bitstream. A Greatest Coded Line Index (GCLI) for a coefficient group in a plurality of coefficient groups in a wavelet subband of the precinct of video data is decoded. The plurality of coefficient groups is divided into GCLI groups at fixed predetermined locations within the wavelet subband. The GCLI groups comprise run GCLI groups and non-run GCLI groups. A truncation bit plane index for the coefficient group in the wavelet subband is received. Run GCLI groups are determined from the plurality of GCLI groups having coefficient values below a predetermined value, a run length of the run GCLI groups being determined from the GCLI and the truncation bit plane index. The run GCLI groups are encoded starting at the fixed predetermined locations within the wavelet subband. The precinct of video data is decoded from the bitstream based on the determined run GCLI groups.

[Fig. 1: schematic of a source device (video source, video encoder, transmitter) connected over a channel to a destination device (receiver, video decoder, display device), with optional storage.]

Description

METHOD, APPARATUS AND SYSTEM FOR ENCODING AND DECODING IMAGE DATA
TECHNICAL FIELD
The present invention relates generally to digital video signal processing and, in particular, to a method, system and apparatus for encoding and decoding video data from a video bitstream. The present invention also relates to a computer program product including a computer readable medium having recorded thereon a computer program for encoding and decoding video data.
BACKGROUND
Many applications for video coding exist, including video conferencing, video-on-demand, digital television broadcast, and storage of video data. In such applications, a substantial reduction in bitrate of the video data has been achieved by the use of video coding standards, such as the “H.264/AVC (ISO/IEC 14496-10) video coding standard” and the “high efficiency video coding (HEVC) standard”. However, in a number of other applications, video data has remained uncompressed. For example, the video data transmitted from a graphics card to a display has traditionally remained uncompressed. Within a live broadcast studio, or a post-production studio, the video data transmitted internally is also traditionally transported in an uncompressed format. In such applications, transmitting the video data without any loss in image quality is required. The costs of provisioning bandwidth for transmission of uncompressed video data have been deemed negligible compared to the need for image quality.
However, the adoption of ever higher resolutions, as well as higher frame rates, has led to a rapid expansion in the bitrates of uncompressed video data. For example, the required throughput to transmit 8K resolution, 120 frames per second, 4:2:2 video with 10 bits used to encode each colour component is 79,626,240,000 bits per second, or roughly 80 Gbps. Such a throughput is well beyond the capability of a popular wired interface used within studios in the form of the Serial Digital Interface (SDI), standardised by the Society of Motion Picture and Television Engineers (SMPTE). A recently standardised interface is SMPTE ST-2082, or “12G-SDI”, which supports bitrates up to 12 Gbps.
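The quoted figure can be checked with a short calculation. The sketch below assumes “8K” means a 7680×4320 raster and that 4:2:2 chroma subsampling yields, on average, two samples (one luma plus one chroma) per pixel:

```python
# Throughput of uncompressed 8K, 120 fps, 4:2:2, 10-bit video.
width, height = 7680, 4320   # assumed "8K" raster dimensions
fps = 120
bit_depth = 10

# 4:2:2 subsampling: for every 2 pixels there are 2 luma and 2 chroma
# samples (one Cb + one Cr), i.e. 2 samples per pixel on average.
samples_per_pixel = 2

bits_per_second = width * height * samples_per_pixel * bit_depth * fps
print(bits_per_second)       # 79626240000, roughly 80 Gbps
```

This matches the 79,626,240,000 bits per second stated above, and makes clear why such a stream exceeds a single 12 Gbps 12G-SDI link by a factor of more than six.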
13435054v1
-22017210632 04 Aug 2017
Even at bitrates within the capabilities of current standards, there are significant challenges to achieving elevated bitrates. Cables may be more costly to manufacture as the cables must meet more stringent tolerances. The cables may become shorter, due to the need to minimise losses from signal attenuation and noise from crosstalk. In the short term, increased bitrates may be achieved by the use of multiple parallel cables, which increases costs both in terms of increased cable usage, and increased number of interface ports.
Recent developments in video coding have therefore begun to focus on lightweight compression aimed at the above use cases. For example, the Video Electronics Standards Association (VESA) has published a standard named Display Stream Compression (DSC), and is presently standardising a more complex video codec to be called Advanced Display Stream Compression (ADSC). ISO/IEC JTC1 SC29 Working Group 1, also known as the Joint Photographic Experts Group (JPEG), has issued a Call for Proposals for a standard to meet the above use cases. In contrast to the H.264/AVC and HEVC video coding standards, the lightweight coding standards only achieve compression ratios on the order of 3:1 to 8:1. However, the lightweight coding standards are required to operate in real time, and with very low latency. For example, the JPEG standardisation Call for Proposals requires no more than 32 video lines of latency caused by data dependency.
SUMMARY
It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
Disclosed is a method of run coding that seeks to improve the overall coding efficiency of a codec, with only a small increase in computational complexity, and no change in latency.
According to one aspect of the present disclosure, there is provided a method of decoding a precinct of video data from a video bitstream, the method comprising:
decoding a significant bit plane index for a coefficient group in a wavelet subband of the precinct of video data;
receiving a truncation bit plane index for the coefficient group in the wavelet subband;
determining a plurality of coefficient groups with zero valued coefficients, starting from the coefficient group, a run length of the plurality of coefficient groups being determined by an amount that the significant bit plane index is less than the truncation bit plane index; and
decoding the precinct of video data from the bitstream based on the number of coefficient groups with zero valued coefficients.
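The run-length rule in the aspect above can be sketched as follows. The sketch assumes the simplest reading, namely that the run length equals the shortfall of the significant bit plane index below the truncation bit plane index; the function name and this exact mapping are illustrative, not the claimed syntax.

```python
def zero_run_length(significant_bpi: int, truncation_bpi: int) -> int:
    """Illustrative decoder-side rule: when the significant bit plane
    index for a coefficient group falls below the truncation bit plane
    index, every bit plane of the group would be trimmed away anyway,
    so the shortfall can instead signal a run of coefficient groups
    with zero valued coefficients."""
    if significant_bpi >= truncation_bpi:
        return 0  # group has coded bit planes; no run is signalled
    # Run length is the amount the significant index falls short.
    return truncation_bpi - significant_bpi

# With a truncation index of 3, a decoded significant index of 1
# signals a run of two all-zero coefficient groups.
assert zero_run_length(1, 3) == 2
assert zero_run_length(5, 3) == 0
```

The appeal of this scheme is that it reuses a codeword range that would otherwise be wasted (indices below the truncation level convey no coded bit planes), so run signalling costs no extra syntax elements.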
According to another aspect of the present disclosure, there is provided a system for decoding a precinct of video data from a video bitstream, the precinct of video data including one or more subbands, the system comprising:
a memory for storing data and a computer program;
a processor coupled to the memory for executing the computer program, the computer program comprising instructions for:
decoding a significant bit plane index for a coefficient group in a wavelet subband;
receiving a truncation bit plane index for the coefficient group in the wavelet subband;
determining a plurality of coefficient groups with zero valued coefficients, starting from the coefficient group, a run length of the plurality of coefficient groups being determined by an amount that the significant bit plane index is less than the truncation bit plane index; and decoding the precinct of video data from the bitstream based on the number of coefficient groups with zero valued coefficients.
According to still another aspect of the present disclosure, there is provided an apparatus for decoding a precinct of video data from a video bitstream, the precinct of video data including one or more subbands, the apparatus comprising:
means for decoding a significant bit plane index for a coefficient group in a wavelet subband;
means for receiving a truncation bit plane index for the coefficient group in the wavelet subband;
means for determining a plurality of coefficient groups with zero valued coefficients, starting from the coefficient group, a run length of the plurality of coefficient groups being determined by an amount that the significant bit plane index is less than the truncation bit plane index; and
means for decoding the precinct of video data from the bitstream based on the number of coefficient groups with zero valued coefficients.
According to still another aspect of the present disclosure, there is provided a computer readable medium having a program for decoding a precinct of video data from a video bitstream stored on the medium, the precinct of video data including one or more subbands, the program comprising:
code for decoding a significant bit plane index for a coefficient group in a wavelet subband;
code for receiving a truncation bit plane index for the coefficient group in the wavelet subband;
code for determining a plurality of coefficient groups with zero valued coefficients, starting from the coefficient group, a run length of the plurality of coefficient groups being determined by an amount that the significant bit plane index is less than the truncation bit plane index; and
code for decoding the precinct of video data from the bitstream based on the number of coefficient groups with zero valued coefficients.
According to still another aspect of the present disclosure, there is provided a method of decoding a precinct of video data from a video bitstream, the method comprising:
decoding a Greatest Coded Line Index (GCLI) for a coefficient group in a plurality of coefficient groups in a wavelet subband of the precinct of video data, the plurality of coefficient groups being divided into GCLI groups at fixed predetermined locations within the wavelet subband, wherein GCLI groups comprise run GCLI groups and non-run GCLI groups;
receiving a truncation bit plane index for the coefficient group in the wavelet subband;
determining run GCLI groups from the plurality of GCLI groups having coefficient values below a predetermined value, a run length of the run GCLI groups being determined from the GCLI and the truncation bit plane index, the run GCLI groups being encoded starting at the fixed predetermined location within the wavelet subband; and
decoding the precinct of video data from the bitstream based on the determined run GCLI groups.
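A minimal sketch of the fixed-location grouping in this aspect. The group size of 4 and the comparison of each GCLI against the truncation index (GTLI) are both assumptions chosen for illustration; the claim only requires that partitions start at fixed predetermined locations and that run GCLI groups hold coefficient values below a predetermined value.

```python
def find_run_gcli_groups(gclis, gtli, group_size=4):
    """Illustrative: partition the per-coefficient-group GCLIs into GCLI
    groups at fixed locations (multiples of group_size) and flag the
    'run' groups, i.e. those whose coefficient groups are all trimmed
    away at the truncation bit plane index (GTLI).  Because the
    boundaries are fixed, encoder and decoder agree on where each run
    may start without any extra signalling."""
    run_flags = []
    for start in range(0, len(gclis), group_size):
        partition = gclis[start:start + group_size]
        run_flags.append(all(g <= gtli for g in partition))
    return run_flags

# Eight coefficient groups, GTLI = 2: only the second fixed partition
# is entirely trimmed, so it becomes a run GCLI group.
assert find_run_gcli_groups([5, 3, 2, 4, 1, 0, 2, 2], 2) == [False, True]
```

Anchoring runs at fixed locations trades a little coding efficiency (a run cannot start mid-partition) for simpler, more parallelisable hardware, consistent with the low-latency goal stated in the Summary.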
According to still another aspect of the present disclosure, there is provided a system for decoding a precinct of video data from a video bitstream, the precinct of video data including one or more subbands, the system comprising:
a memory for storing data and a computer program;
a processor coupled to the memory for executing the computer program, the computer program comprising instructions for:
decoding a Greatest Coded Line Index (GCLI) for a coefficient group in a plurality of coefficient groups in a wavelet subband of the precinct of video data, the plurality of coefficient groups being divided into GCLI groups at fixed predetermined locations within the wavelet subband, wherein GCLI groups comprise run GCLI groups and non-run GCLI groups;
receiving a truncation bit plane index for the coefficient group in the wavelet subband;
determining run GCLI groups from the plurality of GCLI groups having coefficient values below a predetermined value, a run length of the run GCLI groups being determined from the GCLI and the truncation bit plane index, the run GCLI groups being encoded starting at the fixed predetermined location within the wavelet subband; and
decoding the precinct of video data from the bitstream based on the determined run GCLI groups.
According to still another aspect of the present disclosure, there is provided an apparatus for decoding a precinct of video data from a video bitstream, the precinct of video data including one or more subbands, the apparatus comprising:
means for decoding a Greatest Coded Line Index (GCLI) for a coefficient group in a plurality of coefficient groups in a wavelet subband of the precinct of video data, the plurality of coefficient groups being divided into GCLI groups at fixed predetermined locations within the wavelet subband, wherein GCLI groups comprise run GCLI groups and non-run GCLI groups;
means for receiving a truncation bit plane index for the coefficient group in the wavelet subband;
means for determining run GCLI groups from the plurality of GCLI groups having coefficient values below a predetermined value, a run length of the run GCLI groups being determined from the GCLI and the truncation bit plane index, the run GCLI groups being encoded starting at the fixed predetermined location within the wavelet subband; and
means for decoding the precinct of video data from the bitstream based on the determined run GCLI groups.
According to still another aspect of the present disclosure, there is provided a computer readable medium having a program for decoding a precinct of video data from a video bitstream stored on the medium, the precinct of video data including one or more subbands, the program comprising:
code for decoding a Greatest Coded Line Index (GCLI) for a coefficient group in a plurality of coefficient groups in a wavelet subband of the precinct of video data, the plurality of coefficient groups being divided into GCLI groups at fixed predetermined locations within the wavelet subband, wherein GCLI groups comprise run GCLI groups and non-run GCLI groups;
code for receiving a truncation bit plane index for the coefficient group in the wavelet subband;
code for determining run GCLI groups from the plurality of GCLI groups having coefficient values below a predetermined value, a run length of the run GCLI groups being determined from the GCLI and the truncation bit plane index, the run GCLI groups being encoded starting at the fixed predetermined location within the wavelet subband; and
code for decoding the precinct of video data from the bitstream based on the determined run GCLI groups.
According to still another aspect of the present disclosure, there is provided a method of decoding a precinct of video data from a video bitstream, the method comprising:
decoding a significant bit plane index for a coefficient group in a plurality of coefficient groups in a wavelet subband of the precinct of video data, the plurality of coefficient groups being divided into a plurality of partitions of consecutive coefficient groups at fixed predetermined locations within the wavelet subband;
receiving a truncation bit plane index for the coefficient group in the wavelet subband;
determining run partitions from the plurality of partitions having coefficient values below a predetermined value, a run length of the run partitions being determined from the significant bit plane index and the truncation bit plane index, wherein the run partitions are encoded starting at the fixed predetermined location within the wavelet subband; and
decoding the precinct of video data from the bitstream based on the determined run partitions.
According to still another aspect of the present disclosure, there is provided a system for decoding a precinct of video data from a video bitstream, the precinct of video data including one or more subbands, the system comprising:
a memory for storing data and a computer program;
a processor coupled to the memory for executing the computer program, the computer program comprising instructions for:
decoding a significant bit plane index for a coefficient group in a plurality of coefficient groups in a wavelet subband of the precinct of video data, the plurality of coefficient groups being divided into a plurality of partitions of consecutive coefficient groups at fixed predetermined locations within the wavelet subband;
receiving a truncation bit plane index for the coefficient group in the wavelet subband;
determining run partitions from the plurality of partitions having coefficient values below a predetermined value, a run length of the run partitions being determined from the significant bit plane index and the truncation bit plane index, wherein the run partitions are encoded starting at the fixed predetermined location within the wavelet subband; and
decoding the precinct of video data from the bitstream based on the determined run partitions.
According to still another aspect of the present disclosure, there is provided an apparatus for decoding a precinct of video data from a video bitstream, the apparatus comprising:
means for decoding a significant bit plane index for a coefficient group in a plurality of coefficient groups in a wavelet subband of the precinct of video data, the plurality of coefficient groups being divided into a plurality of partitions of consecutive coefficient groups at fixed predetermined locations within the wavelet subband;
means for receiving a truncation bit plane index for the coefficient group in the wavelet subband;
means for determining run partitions from the plurality of partitions having coefficient values below a predetermined value, a run length of the run partitions being determined from the significant bit plane index and the truncation bit plane index, wherein the run partitions are encoded starting at the fixed predetermined location within the wavelet subband; and
means for decoding the precinct of video data from the bitstream based on the determined run partitions.
According to still another aspect of the present disclosure, there is provided a computer readable medium having a computer program stored on the medium for decoding a precinct of video data from a video bitstream, the program comprising:
code for decoding a significant bit plane index for a coefficient group in a plurality of coefficient groups in a wavelet subband of the precinct of video data, the plurality of coefficient groups being divided into a plurality of partitions of consecutive coefficient groups at fixed predetermined locations within the wavelet subband;
code for receiving a truncation bit plane index for the coefficient group in the wavelet subband;
code for determining run partitions from the plurality of partitions having coefficient values below a predetermined value, a run length of the run partitions being determined from the significant bit plane index and the truncation bit plane index, wherein the run partitions are encoded starting at the fixed predetermined location within the wavelet subband; and
code for decoding the precinct of video data from the bitstream based on the determined run partitions.
According to still another aspect of the present disclosure, there is provided a video bitstream for decoding a precinct of video data, the video bitstream comprising:
an encoded Greatest Coded Line Index (GCLI) for each coefficient group in a plurality of coefficient groups in a wavelet subband of the precinct of video data, the plurality of coefficient groups being divided, at fixed predetermined locations within the wavelet subband, into GCLI groups; and
a run indication of a run length of run GCLI groups positioned at the fixed predetermined locations within the wavelet subband to decode the precinct of the video data, the run GCLI groups being GCLI groups having coefficient values below a predetermined value, wherein the run indication is determined based on the encoded GCLI, a truncation bit plane index and the fixed predetermined location within the wavelet subband.
Other aspects are also disclosed.
BRIEF DESCRIPTION OF THE DRAWINGS
At least one embodiment of the present invention will now be described with reference to the following drawings and appendices, in which:
Fig. 1 is a schematic block diagram showing a low latency video coding and decoding system comprising a source device, a communication channel, and a destination device;
Figs. 2A and 2B form a schematic block diagram of a general purpose computer system upon which one or both of the source device and destination device of Fig. 1 may be practiced;
Fig. 3 is a schematic block diagram showing functional modules of the video encoder of Fig. 1;
Fig. 4 is a schematic block diagram showing functional modules of the video decoder of Fig. 1;
Fig. 5 is a diagram of a wavelet decomposition;
Fig. 6 is a diagram showing an example of a Greatest Coded Line Index (GCLI) extraction module determining GCLIs from sign/magnitude samples;
Fig. 7A shows example GCLIs, as well as an example Greatest Trimmed Line Index (GTLI) level, plotted against a coefficient group axis;
Fig. 7B shows a plot of an example truncated GCLI signal, corresponding to the example GCLIs of Fig. 7A and the example GTLI;
Fig. 8 is a schematic flow diagram showing a method of encoding GCLIs;
Fig. 9 shows a method of decoding a received GCLI coded bitstream; and
Fig. 10 shows a plot of an example truncated GCLI signal, with zero runs coded in an aligned manner to groups of GCLIs.
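As background to Figs. 6 to 7B, the GCLI of a coefficient group can be pictured as the highest non-zero bit plane across the group's coefficient magnitudes. A sketch under that reading (the group size of four in the example and the 1-based plane index are assumptions for illustration):

```python
def gcli(magnitudes):
    """Illustrative GCLI extraction: OR the magnitudes of one
    coefficient group together, then return the 1-based index of the
    highest set bit plane, or 0 when every coefficient in the group
    is zero (an uncoded group)."""
    combined = 0
    for m in magnitudes:
        combined |= m
    return combined.bit_length()

# A group of four magnitudes; the largest, 9 (binary 1001), has its
# most significant set bit in plane 4.
assert gcli([1, 9, 0, 2]) == 4
assert gcli([0, 0, 0, 0]) == 0
```

Truncation at the GTLI then discards all bit planes of a group below that index, which is why a GCLI at or below the GTLI (as in Fig. 7B) conveys no coefficient data and becomes a candidate for the run coding described above.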
DETAILED DESCRIPTION INCLUDING BEST MODE
Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
Fig. 1 is a schematic block diagram showing functional modules of a low latency video encoding and decoding system 100. The system 100 includes a source device 110 and a destination device 130. A communication channel 120 is used to communicate encoded video information from the source device 110 to the destination device 130. In some arrangements, the source device 110 and destination device 130 may respectively comprise broadcast studio equipment, such as overlay insertion and real-time editing modules, in which case the communication channel 120 may be a Serial Digital Interface (SDI) link. Video data formatted for transmission over SDI may also be conveyed over Ethernet, e.g. using methods
as specified in SMPTE ST 2022-6. In general, the communication channel 120 may utilise an interface intended for conveying uncompressed data, even though in the system 100, compressed data is conveyed. In that case, the communication channel 120 is typically a constant bit rate (CBR) channel.
In other arrangements, the source device 110 and destination device 130 may comprise a graphics driver as part of a system-on-chip (SOC) and an LCD panel (e.g. as found in a smart phone, tablet or laptop computer), in which case the communication channel 120 is typically a wired channel, such as printed circuit board (PCB) tracks and associated connectors. Moreover, the source device 110 and the destination device 130 may comprise any of a wide range of devices, including devices supporting over the air television broadcasts, cable television applications, internet video applications, or applications where encoded video data is captured on some storage medium or a file server. The source device 110 may also be a digital camera capturing video data and outputting the video data in a compressed format offering visual quality indistinguishable from an uncompressed format.
As shown in Fig. 1, the source device 110 includes a video source 112, a video encoder 114 and a transmitter 116. The video source 112 typically comprises a source of uncompressed video data 113, such as output from an imaging sensor, a previously captured video sequence stored on a non-transitory recording medium, or a video feed from a remote imaging sensor. The uncompressed video data 113 is conveyed from the video source 112 to the video encoder 114 over a CBR channel, with fixed timing of the delivery of the video data. Generally, the video data is delivered in a raster scan format, with signalling to delineate between lines (‘horizontal sync’) and frames (‘vertical sync’). The video source 112 may also be the output of a computer graphics card, e.g. displaying the video output of an operating system and various applications executing upon a computing device, for example a tablet computer. Such content is an example of ‘screen content’. Examples of source devices 110 that may include an imaging sensor as the video source 112 include smart-phones, video camcorders and network video cameras. As screen content may itself include smoothly rendered graphics and playback of natural content in various regions, this is also commonly a form of ‘mixed content’. The video encoder 114 converts the uncompressed video data 113 from the video source 112 into encoded video data and will be described further with reference to Fig. 3.
The video encoder 114 encodes the incoming uncompressed video data 113. The video encoder 114 is required to process the incoming sample data in real time, i.e., the video encoder 114 is not able to stall the incoming uncompressed video data 113, e.g., if the rate of processing the incoming data were to fall below the input data rate. The video encoder 114 outputs compressed video data 115 (the ‘bitstream’) at a constant bit rate. In a video streaming application, the entire bitstream is not stored in any one location. Instead, the precincts of compressed video data are continually being produced by the video encoder 114 and consumed by the video decoder 134, with intermediate storage, e.g., in the (CBR) communication channel 120. The CBR stream of compressed video data is transmitted by the transmitter 116 over the communication channel 120 (e.g. an SDI link). It is also possible for the compressed video data to be stored in a non-transitory storage device 122, such as a “Flash” memory or a hard disk drive, until later being transmitted over the communication channel 120, or in lieu of transmission over the communication channel 120.
The destination device 130 includes a receiver 132, a video decoder 134 and a display device 136. The receiver 132 receives encoded video data from the communication channel 120 and passes received video data 133 to the video decoder 134. The video decoder 134 then outputs decoded video data 135 to the display device 136. Examples of the display device 136 include a cathode ray tube or a liquid crystal display, such as in smart-phones, tablet computers, computer monitors or stand-alone television sets. It is also possible for the functionality of each of the source device 110 and the destination device 130 to be embodied in a single device, examples of which include mobile telephone handsets and tablet computers, or equipment within a broadcast studio including overlay insertion units.
Notwithstanding the example devices mentioned above, each of the source device 110 and destination device 130 may be configured within a general purpose computing system, typically through a combination of hardware and software components. Fig. 2A illustrates such a computer system 200, which includes: a computer module 201; input devices such as a keyboard 202, a mouse pointer device 203, a scanner 226, a camera 227, which may be configured as the video source 112, and a microphone 280; and output devices including a printer 215, a display device 214, which may be configured as the display device 136, and loudspeakers 217. An external Modulator-Demodulator (Modem) transceiver device 216 may be used by the computer module 201 for communicating to and from a communications network 220 via a connection 221. The communications network 220, which may represent
the communication channel 120, may be a wide-area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 221 is a telephone line, the modem 216 may be a traditional “dial-up” modem. Alternatively, where the connection 221 is a high capacity (e.g., cable) connection, the modem 216 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 220. The transceiver device 216 may provide the functionality of the transmitter 116 and the receiver 132 and the communication channel 120 may be embodied in the connection 221.
The computer module 201 typically includes at least one processor unit 205, and a memory unit 206. For example, the memory unit 206 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 201 also includes a number of input/output (I/O) interfaces including: an audio-video interface 207 that couples to the video display 214, loudspeakers 217 and microphone 280; an I/O interface 213 that couples to the keyboard 202, mouse 203, scanner 226, camera 227 and optionally a joystick or other human interface device (not illustrated); and an interface 208 for the external modem 216 and printer 215. The signal from the audio-video interface 207 to the computer monitor 214 is generally the output of a computer graphics card and provides an example of ‘screen content’. In some implementations, the modem 216 may be incorporated within the computer module 201, for example within the interface 208. The computer module 201 also has a local network interface 211, which permits coupling of the computer system 200 via a connection 223 to a local-area communications network 222, known as a Local Area Network (LAN). As illustrated in Fig. 2A, the local communications network 222 may also couple to the wide network 220 via a connection 224, which would typically include a so-called “firewall” device or device of similar functionality. The local network interface 211 may comprise an Ethernet™ circuit card, a Bluetooth™ wireless arrangement or an IEEE 802.11 wireless arrangement; however, numerous other types of interfaces may be practiced for the interface 211. The local network interface 211 may also provide the functionality of the transmitter 116 and the receiver 132 and the communication channel 120 may also be embodied in the local communications network 222.
The I/O interfaces 208 and 213 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage
devices 209 are provided and typically include a hard disk drive (HDD) 210. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 212 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g. CD-ROM, DVD, Blu-ray Disc™), USB memory, portable external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the computer system 200. Typically, any of the HDD 210, optical drive 212, networks 220 and 222 may also be configured to operate as the video source 112, or as a destination for decoded video data to be stored for reproduction via the display 214. The source device 110 and the destination device 130 of the system 100 may each be embodied in the computer system 200.
The components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204 and in a manner that results in a conventional mode of operation of the computer system 200 known to those in the relevant art. For example, the processor 205 is coupled to the system bus 204 using a connection 218. Likewise, the memory 206 and optical disk drive 212 are coupled to the system bus 204 by connections 219. Examples of computers on which the described arrangements can be practised include IBM-PC’s and compatibles, Sun SPARCstations, Apple Mac™ or similar computer systems.
Where appropriate or desired, the video encoder 114 and the video decoder 134, as well as methods described below, may be implemented using the computer system 200 wherein the video encoder 114, the video decoder 134 and methods to be described, may be implemented as one or more software application programs 233 executable within the computer system 200. In particular, the video encoder 114, the video decoder 134 and the steps of the described methods are effected by instructions 231 (see Fig. 2B) in the software 233 that are carried out within the computer system 200. The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules perform the described methods and a second part and the corresponding code modules manage a user interface between the first part and the user.
The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 200
from the computer readable medium, and then executed by the computer system 200. A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program product in the computer system 200 preferably effects an advantageous apparatus for implementing the video encoder 114, the video decoder 134 and the described methods.
The software 233 is typically stored in the HDD 210 or the memory 206. The software is loaded into the computer system 200 from a computer readable medium, and executed by the computer system 200. Thus, for example, the software 233 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 225 that is read by the optical disk drive 212.
In some instances, the application programs 233 may be supplied to the user encoded on one or more CD-ROMs 225 and read via the corresponding drive 212, or alternatively may be read by the user from the networks 220 or 222. Still further, the software can also be loaded into the computer system 200 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 200 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray Disc™, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 201. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of the software, application programs, instructions and/or video data or encoded video data to the computer module 201 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including email transmissions and information recorded on Websites and the like.
The second part of the application programs 233 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 214. Through manipulation of typically the keyboard 202 and the mouse 203, a user of the computer system 200 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other
forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 217 and user voice commands input via the microphone 280.
Fig. 2B is a detailed schematic block diagram of the processor 205 and a “memory” 234. The memory 234 represents a logical aggregation of all the memory modules (including the HDD 209 and semiconductor memory 206) that can be accessed by the computer module 201 in Fig. 2A.
When the computer module 201 is initially powered up, a power-on self-test (POST) program 250 executes. The POST program 250 is typically stored in a ROM 249 of the semiconductor memory 206 of Fig. 2A. A hardware device such as the ROM 249 storing software is sometimes referred to as firmware. The POST program 250 examines hardware within the computer module 201 to ensure proper functioning and typically checks the processor 205, the memory 234 (209, 206), and a basic input-output systems software (BIOS) module 251, also typically stored in the ROM 249, for correct operation. Once the POST program 250 has run successfully, the BIOS 251 activates the hard disk drive 210 of Fig. 2A. Activation of the hard disk drive 210 causes a bootstrap loader program 252 that is resident on the hard disk drive 210 to execute via the processor 205. This loads an operating system 253 into the RAM memory 206, upon which the operating system 253 commences operation. The operating system 253 is a system level application, executable by the processor 205, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.
The operating system 253 manages the memory 234 (209, 206) to ensure that each process or application running on the computer module 201 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the computer system 200 of Fig. 2A need to be used properly so that each process can run effectively. Accordingly, the aggregated memory 234 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 200 and how such is used.
As shown in Fig. 2B, the processor 205 includes a number of functional modules including a control unit 239, an arithmetic logic unit (ALU) 240, and a local or internal memory 248, sometimes called a cache memory. The cache memory 248 typically includes a number of storage registers 244-246 in a register section. One or more internal busses 241 functionally interconnect these functional modules. The processor 205 typically also has one or more interfaces 242 for communicating with external devices via the system bus 204, using a connection 218. The memory 234 is coupled to the bus 204 using a connection 219.
The application program 233 includes a sequence of instructions 231 that may include conditional branch and loop instructions. The program 233 may also include data 232 which is used in execution of the program 233. The instructions 231 and the data 232 are stored in memory locations 228, 229, 230 and 235, 236, 237, respectively. Depending upon the relative size of the instructions 231 and the memory locations 228-230, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 230. Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 228 and 229.
In general, the processor 205 is given a set of instructions which are executed therein. The processor 205 waits for a subsequent input, to which the processor 205 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 202, 203, data received from an external source across one of the networks 220, 222, data retrieved from one of the storage devices 206, 209 or data retrieved from a storage medium 225 inserted into the corresponding reader 212, all depicted in Fig. 2A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 234.
The video encoder 114, the video decoder 134 and the described methods may use input variables 254, which are stored in the memory 234 in corresponding memory locations 255, 256, 257. The video encoder 114, the video decoder 134 and the described methods produce output variables 261, which are stored in the memory 234 in corresponding memory locations 262, 263, 264. Intermediate variables 258 may be stored in memory locations 259, 260, 266 and 267.
Referring to the processor 205 of Fig. 2B, the registers 244, 245, 246, the arithmetic logic unit (ALU) 240, and the control unit 239 work together to perform sequences of micro-operations needed to perform “fetch, decode, and execute” cycles for every instruction in the instruction set making up the program 233. Each fetch, decode, and execute cycle comprises:
(a) a fetch operation, which fetches or reads an instruction 231 from a memory location 228, 229, 230;
(b) a decode operation in which the control unit 239 determines which instruction has been fetched; and
(c) an execute operation in which the control unit 239 and/or the ALU 240 execute the instruction.
Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 239 stores or writes a value to a memory location 232.
Each step or sub-process in the methods of Figs. 8 and 9, to be described, is associated with one or more segments of the program 233 and is typically performed by the register section 244, 245, 246, the ALU 240, and the control unit 239 in the processor 205 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 233.
Fig. 3 is a schematic block diagram showing functional modules of the video encoder 114. Fig. 4 is a schematic block diagram showing functional modules of the video decoder 134. The video encoder 114 and video decoder 134 may be implemented using the general-purpose computer system 200, as shown in Figs. 2A and 2B, where the various functional modules may be implemented by dedicated hardware within the computer system 200 or by software executable within the computer system 200 such as one or more software code modules of the software application program 233 resident on the hard disk drive 210 and being controlled in its execution by the processor 205. Alternatively, the video encoder 114 and video decoder 134 may be implemented by a combination of dedicated hardware and software executable within the computer system 200. The video encoder 114, the video decoder 134 and the described methods may alternatively be implemented in dedicated hardware, such as one or more integrated circuits performing the functions or sub functions of the described methods. Such dedicated hardware may include graphic processors, digital
signal processors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or one or more microprocessors and associated memories. In particular, the video encoder 114 comprises modules 302-340 and the video decoder 134 comprises modules 410-430 which may each be implemented as one or more software code modules of the software application program 233, or an FPGA ‘bitstream file’ that configures internal logic blocks in the FPGA to realise the video encoder 114 and the video decoder 134.
A colour transform module 302 receives the uncompressed video data 113 from the video source 112, and may perform an opponent colour transform, such as a Reversible Colour Transform, or a YCoCg transform. The uncompressed video data 113 typically takes the form of R, G, and B samples, respectively representing the relative strength of red, green, and blue colour in each sample location. Then, an opponent colour transform applied to the uncompressed video data 113 in the form of R, G, and B samples may produce colour component data 304, where the colour components 304 are further decorrelated compared to the uncompressed video data 113.
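As an illustration of the decorrelation performed by the colour transform module 302, one common opponent transform is the reversible YCoCg-R lifting form sketched below. This is an assumption for illustration only; the exact transform applied in a given arrangement may differ.

```python
def rgb_to_ycocg(r, g, b):
    # Forward YCoCg-R opponent transform (integer lifting form).
    # A sketch of one common choice, not necessarily the exact
    # transform used by the colour transform module 302.
    co = r - b
    tmp = b + (co >> 1)
    cg = g - tmp
    y = tmp + (cg >> 1)
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Exact integer inverse of the lifting steps above.
    tmp = y - (cg >> 1)
    g = cg + tmp
    b = tmp - (co >> 1)
    r = b + co
    return r, g, b
```

Because each lifting step is exactly invertible in integer arithmetic, the transform is lossless, which is why a Reversible Colour Transform is usable even for lossless encoding.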
A discrete wavelet transform module 306 receives the colour component data 304, and performs a forward discrete wavelet transform, producing wavelet coefficients 308. The wavelet coefficients 308 are grouped into wavelet subbands corresponding to the structure of the specific discrete wavelet transform chosen, which is described further below with reference to Fig. 5.
By applying a wavelet transform, the colour component data 304 is decomposed into a set of wavelet subbands, each wavelet subband approximately corresponding to features of a different spatial frequency range. For example, slowly varying features are represented by a small number of wavelet coefficients in low frequency wavelet subbands, while more sharply defined features such as edges are represented by sparsely distributed wavelet coefficients in high frequency wavelet subbands. The wavelet transform produces a number of wavelet coefficients 308 across the set of wavelet subbands equal to the number of input samples from the colour component data 304. However, many of the wavelet coefficients are likely to be small or zero-valued, which may be easily compressed by a subsequent entropy encoding stage.
The wavelet transform is typically achieved by successive iterations of a two-channel filter bank, where one iteration of the two-channel filter bank outputs a low-pass subband and a high-pass subband, the first iteration being performed on the video data, and each subsequent iteration being performed on the low-pass subband output of the previous filter bank. The particular two-channel filter bank used is also referred to as the wavelet kernel. In one arrangement, a 5/3 LeGall wavelet is used for the wavelet kernel. In other arrangements, other wavelet kernels may be used, such as the Haar wavelet, or the 9/7 Cohen-Daubechies-Feauveau wavelet. The number of iterations of the filter bank may be equivalently referred to as the number of levels, or the depth, of the wavelet decomposition.
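One iteration of the two-channel filter bank can be sketched with the 5/3 LeGall kernel in its integer lifting form. The boundary handling below (index clamping at the edges) and the even-length input are simplifying assumptions; the standardised filter uses symmetric extension.

```python
def legall53_forward(x):
    # One level of the 5/3 LeGall wavelet via integer lifting.
    # A sketch: boundary symmetric extension is approximated by
    # clamping indices, and an even-length input is assumed.
    n = len(x)
    at = lambda i: x[min(max(i, 0), n - 1)]
    # Predict step: high-pass (detail) samples from odd positions.
    h = [at(2 * i + 1) - ((at(2 * i) + at(2 * i + 2)) >> 1)
         for i in range(n // 2)]
    hat = lambda i: h[min(max(i, 0), len(h) - 1)]
    # Update step: low-pass (approximation) samples from even positions.
    l = [at(2 * i) + ((hat(i - 1) + hat(i) + 2) >> 2)
         for i in range(n // 2)]
    return l, h
```

Iterating `legall53_forward` on the low-pass output `l` gives the successive decomposition levels described above.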
More sparsity in the resulting wavelet coefficients may be achieved by increasing the number of levels of wavelet decomposition. However, the number of input samples a wavelet coefficient is determined from, and thus depends on (also referred to as its ‘support’), roughly doubles for each additional level of wavelet decomposition. Thus, each additional level of wavelet decomposition increases the number of colour component data 304 samples that need to be buffered before the wavelet coefficients 308 can be produced. Within the video encoding and decoding system 100, low latency is satisfied by selecting a wavelet decomposition that is shallow in the vertical dimension (e.g., only one or two levels).
Fig. 5 is a diagram of a wavelet decomposition 500 suitable for the low latency video encoding and decoding system 100. In one arrangement, the wavelet decomposition 500 is applied to each colour component of the colour component data 304 separately. In other arrangements, different wavelet decompositions to the colour components may be applied. One level of vertical wavelet decomposition is applied, producing a vertical low-pass subband and a vertical high-pass subband. The vertical low-pass subband is then further processed by five levels of horizontal wavelet decomposition, producing a horizontal low-pass, vertical low-pass subband L5 510, and five horizontal high-pass, vertical low-pass subbands H5 511, H4 512, H3 513, H2 514 and H1 515. The vertical high-pass subband is processed by one level of horizontal wavelet decomposition, producing a horizontal low-pass, vertical high-pass subband LH 520, and a horizontal high-pass, vertical high-pass subband HH 521.
Each N rows from the vertical low-pass subbands, and the corresponding N rows from the vertical high-pass subbands, are collected together into units that may be termed precincts. A precinct is a collection of all the subband samples corresponding to one spatial region in the
uncompressed video data 113. Fig. 5 shows an example of a single row of vertical low-pass subband samples and a single row of vertical high-pass subband samples, collected to form a precinct 530. Then, a precinct subband consists of all the samples corresponding to a particular subband in a particular precinct.
A sign/magnitude conversion module 310 receives the wavelet coefficients 308, and converts each wavelet coefficient from a two’s complement representation to a sign/magnitude representation, producing sign/magnitude samples 312. For example, the number negative fifteen (-15) is expressed in a two’s complement eight (8) bit representation, as the binary sequence 11110001. The same number, negative fifteen (-15), is expressed in sign/magnitude representation as a sign bit of one (1), and the binary sequence 00001111.
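The conversion performed per sample by the sign/magnitude conversion module 310 can be sketched as follows, reproducing the negative fifteen (-15) example above.

```python
def to_sign_magnitude(v):
    # Two's complement integer -> (sign bit, magnitude), as performed
    # per sample by the sign/magnitude conversion module 310 (a sketch).
    return (1 if v < 0 else 0), abs(v)

# -15 in 8-bit two's complement is the bit pattern 11110001;
# in sign/magnitude form it is sign 1 with magnitude 00001111.
pattern = -15 & 0xFF
sign, mag = to_sign_magnitude(-15)
```

Sign/magnitude form is convenient for the later stages because the magnitude bits can be partitioned directly into bit planes.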
A Greatest Coded Line Index (GCLI) extraction module 314 receives the sign/magnitude samples 312, collects the sign/magnitude samples 312 into coefficient groups, and calculates a Greatest Coded Line Index (GCLI) for each of the coefficient groups, producing a number of GCLIs 316 equal to the number of coefficient groups. The coefficient groups are typically formed by collecting spatially neighbouring sign/magnitude samples 312. For example, each coefficient group may be formed by collecting k consecutive sign/magnitude samples in the horizontal direction. Collecting sign/magnitude samples horizontally is advantageous as the collection of sign/magnitude samples horizontally minimises the degree of vertical latency imposed by the GCLI extraction module 314. However, in another arrangement, coefficient groups may be formed by collecting k consecutive sign/magnitude samples in the vertical direction. One advantage of collecting sign/magnitude samples vertically is that the sign/magnitude samples are produced by a low latency wavelet decomposition such as the wavelet decomposition 500, and thus vertically adjacent samples are determined from spatial regions in the uncompressed video data 113 that are closer together. Then, the vertically adjacent samples are more likely to be statistically similar, and collecting the vertically adjacent samples into coefficient groups may exploit redundancy more efficiently. The process by which the GCLIs 316 are determined from the coefficient groups is described further below with reference to Fig. 6.
A rate allocation module 320 receives the GCLIs 316, and using a predetermined rate allocation budget, determines for each precinct subband a GCLI coding mode 322 and a Greatest Truncated Line Index (GTLI) 324.
For each precinct subband, a GCLI coder 330 receives the GCLIs 316, the GCLI coding mode 322, and the GTLI 324, and produces a GCLI coded bitstream 332. The process by which the GCLI coder 330 encodes the GCLIs 316 is described further below with reference to Figs. 7 and 8.
For each precinct subband, a quantisation/truncation module 334 receives the sign/magnitude samples 312 and the GTLI 324, and produces a data bitstream 336. The process by which the data bitstream 336 is produced is described further below with reference to Fig. 6.
A bitstream packer 340 receives the GCLI coded bitstream 332 and the data bitstream 336, and produces the compressed video data 115.
Fig. 6 is a diagram showing an example of the GCLI extraction module 314 determining the GCLIs 316 from the sign/magnitude samples 312. The example of Fig. 6 shows three consecutive coefficient groups 600, 602 and 604, each formed from four sign/magnitude samples. Within each coefficient group, the sign and magnitude bits of the samples are collected into a sign bit plane 610, and magnitude bit planes 612. The magnitude bit planes 612 are further divided into a number of leading zero bit planes 620, which contain only zero-valued bits, and significant bit planes, whose most significant bit plane contains at least one non-zero bit. The GCLI for each coefficient group is equal to the number of significant bit planes, or equivalently the GCLI may be considered as an index 614 of the most significant bit plane (where indexation begins from one (1)). In the example of Fig. 6, the GCLIs for coefficient groups 600, 602 and 604 are six (6), five (5), and eight (8) respectively.
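The GCLI extraction described above reduces to taking the bit length of the largest magnitude in the group. The sketch below illustrates this; the sample values are invented so as to reproduce the GCLIs six (6), five (5), and eight (8) of the Fig. 6 example.

```python
def gcli(group):
    # GCLI of one coefficient group: the index (counting from 1) of the
    # most significant non-zero magnitude bit plane across the group,
    # i.e. the bit length of the largest magnitude in the group.
    return max(abs(s) for s in group).bit_length()
```

A group of all-zero samples has a GCLI of zero, i.e. no significant bit planes.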
If the video encoder 114 is encoding losslessly, then all the data bits associated with the significant bit planes, as well as the sign bit planes 610, will be transferred by the quantisation/truncation module 334 to the data bitstream 336. However, it is likely that in order to meet a CBR constraint, the rate allocation module 320 introduces some loss. The determined GTLI 324 indicates the number of least significant bit planes to truncate. The GTLI may equivalently be considered as an index of the truncated bit planes (where indexation begins from one (1)). In the example of Fig. 6, the value of the determined GTLI 324 is two (2), resulting in the two least significant bit planes 624 being truncated from the
data bitstream 336. The data bitstream 336 is composed of the sign bit planes 610 and the non-truncated, significant bit planes 622. The data bits corresponding to the non-truncated, significant bit planes 622 are shown circled in Fig. 6.
In the example of Fig. 6, all the coefficient groups have non-truncated significant bit planes. However, if any coefficient group has zero non-truncated significant bit planes (i.e., all significant bit planes have been lost to truncation), then the sign bit plane is unnecessary. Then, for coefficient groups with zero non-truncated significant bit planes, no data bits are transferred to the data bitstream 336.
In one arrangement, the data bits shown circled in Fig. 6 are transferred directly to the data bitstream 336 by the quantisation/truncation module 334. Such a transfer corresponds to a so-called ‘deadzone’ quantisation, as the samples are effectively rounded towards zero. In other arrangements of the quantisation/truncation module 334, the sign/magnitude samples 312 may be modified by an offset before truncation, which allows a more evenly distributed rounding policy. The offset may be calculated from the determined GTLI 324, and/or the GCLI of the corresponding coefficient group.
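On magnitudes, the deadzone behaviour described above amounts to discarding the GTLI least significant bit planes with a right shift, which rounds towards zero. A minimal sketch:

```python
def deadzone_truncate(mag, gtli):
    # Deadzone quantisation: drop the gtli least significant bit planes
    # of a magnitude, rounding towards zero. (Arrangements that add an
    # offset before the shift give a more even rounding policy.)
    return mag >> gtli
```

With the Fig. 6 GTLI of two (2), a magnitude of fifteen (15) truncates to three (3), and any magnitude below four (4) truncates to zero.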
The functional modules of the video decoder 134 are now described with reference to Fig. 4. A bitstream unpacker 400 demuxes the received video data 133 into three bitstreams, comprising a header bitstream 402, a received GCLI coded bitstream 406, and a received data bitstream 408. The header bitstream 402 is read by a header decoder 410, and for each precinct subband, decodes a received GCLI coding mode 412 and a received GTLI 414.
A GCLI decoder 416 reads the received GCLI coded bitstream 406, and using the received GCLI coding mode 412 and the received GTLI 414, produces decoded GCLIs 418. The process by which the GCLI decoder 416 decodes the received GCLI coded bitstream 406 will be described further below with reference to Fig. 9.
A data unpacker 420 reads the received data bitstream 408, and using the decoded GCLIs 418 and the received GTLI 414, produces received quantised samples 422. Referring to the example of Fig. 6, if the data unpacker 420 is decoding the samples associated with the coefficient group 600, then the data unpacker 420 would receive a decoded GCLI of six (6), and a GTLI of two (2). Then, for a coefficient group size of k = 4, the data unpacker 420
decodes for the coefficient group 600 a number of data bits from the received data bitstream 408 equal to k * (GCLI - GTLI) magnitude bits, plus k sign bits when GCLI > GTLI.
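The bit count read per coefficient group can be sketched as follows, including the all-truncated case noted earlier in which no sign bits are sent either.

```python
def group_data_bits(gcli, gtli, k=4):
    # Bits read by the data unpacker 420 for one coefficient group:
    # k*(gcli - gtli) magnitude bits plus k sign bits when gcli > gtli,
    # otherwise nothing (sign bits are unnecessary for groups whose
    # significant bit planes were all truncated).
    return k * (gcli - gtli) + k if gcli > gtli else 0
```

For the coefficient group 600 (GCLI six, GTLI two, k = 4) this gives sixteen magnitude bits plus four sign bits, i.e. twenty bits.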
A scaling module 424 rescales the received quantised samples 422, producing rescaled sign/magnitude samples 426. The scaling module 424 approximately inverts the quantisation introduced by the quantisation/truncation module 334 in the video encoder 114. For example, given a GTLI of two (2) and a deadzone quantisation policy, a sign/magnitude sample with the value fifteen (15) would be quantised to three (3) by the quantisation/truncation module
334. The scaling module 424 would then rescale the value of three (3) to a value of twelve (12). In other arrangements of the scaling module 424, the received quantised samples 422 may be scaled by a factor calculated from the received GTLI 414, and/or the GCLI of the corresponding coefficient group.
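The fifteen-to-three-to-twelve round trip above corresponds to a simple shift-based rescale, sketched below. As noted, other scaling policies (e.g. adding a reconstruction offset derived from the GTLI and GCLI) are possible.

```python
gtli = 2
sample = 15
q = sample >> gtli       # encoder-side deadzone quantisation: 15 -> 3
rescaled = q << gtli     # decoder-side rescale by module 424: 3 -> 12
error = sample - rescaled  # the quantisation error: 3
```

The error of three (3) is the information lost to the two truncated bit planes.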
A two’s complement conversion module 428 converts each of the rescaled sign/magnitude samples 426 to two’s complement representation, producing received wavelet coefficients 430.
An inverse discrete wavelet transform module 432 performs a reverse discrete wavelet transform on the received wavelet coefficients 430, producing received colour component data 434. The reverse discrete wavelet transform matches the forward discrete wavelet transform performed by the discrete wavelet transform module 306 in the video encoder 114.
An inverse colour transform module 436 may perform an inverse opponent colour transform on the received colour component data 434, producing the decoded video data 135.
The process by which the GCLI coder 330 encodes the GCLIs 316 is now described with reference to Figs. 7A and 7B.
Let a GCLI value for an nth coefficient group of a subband b of a pth precinct be denoted by gcli_{p,b}[n], and similarly let a GTLI value for the subband b of the pth precinct be denoted by gtli_{p,b}. Example GCLIs 710 of the gcli_{p,b}[n], as well as an example GTLI 712 level, are plotted against a coefficient group axis 720 in Fig. 7A. The example plot of Fig. 7A demonstrates that the GCLI values may change from one coefficient group to the next, while the GTLI value is a fixed value per precinct subband. Then, the number of non-truncated,
significant bit planes for the nth coefficient group of the subband b of the pth precinct, g_{p,b}[n], is determined in accordance with Equation (1), as follows:
g_{p,b}[n] = max(gcli_{p,b}[n] - gtli_{p,b}, 0) (1)

g_{p,b}[n] may also be referred to as a ‘truncated GCLI’. An example truncated GCLIs 730 signal, corresponding to the example GCLIs 710 and example GTLI 712, is plotted in Fig. 7B.
If the GCLI coder 330 receives the GCLI coding mode 322 indicating a horizontal prediction mode, then a difference signal δ is determined in accordance with Equation (2), as follows:
δ_{p,b}[n] = g_{p,b}[n] - g_{p,b}[n - 1] (2)
That is, each δ_{p,b}[n] is equal to an amount representing the change in the number of non-truncated, significant bit planes for the current coefficient group, compared to the previous coefficient group to the left. An example δ 732 shown in Fig. 7B illustrates how the difference signal δ is produced from the example truncated GCLIs 730.
Then, the first GCLI value gcli_{p,b}[0] may be encoded with a fixed length binary code, and the subsequent GCLIs signalled by encoding the difference signal δ_{p,b}[n] (for n > 0). As the difference signal δ is statistically skewed towards zeros, it is advantageous to signal the δ values with an entropy code such as a signed unary code. One example of a signed unary code is shown in Table 1, below:
Table 1
(signed unary code; reproduced in the original as figure AU2017210632A1_D0001)
The codes corresponding to gcli_{p,b}[0] and the subsequent difference signal δ_{p,b} may be concatenated together across all subbands and precincts to produce the GCLI coded bitstream 332.
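The horizontal prediction mode can be sketched end to end as follows. Because Table 1 is reproduced only as an image, the exact signed unary mapping is assumed here (a zigzag map of signed deltas to non-negative integers, followed by a unary code); the four-bit fixed-length code for the first value is likewise an assumption.

```python
def zigzag(v):
    # Map a signed delta to a non-negative integer (assumed mapping;
    # the actual code of Table 1 may differ).
    return 2 * v if v >= 0 else -2 * v - 1

def signed_unary(v):
    # Unary-code the mapped value: zigzag(v) zeros terminated by a one.
    return "0" * zigzag(v) + "1"

def encode_horizontal(g, gcli0_bits=4):
    # Horizontal prediction (Equation 2) over truncated GCLIs g:
    # code g[0] with a fixed-length binary code, then each delta
    # g[n] - g[n-1] with the signed unary code.
    out = format(g[0], "0{}b".format(gcli0_bits))
    for n in range(1, len(g)):
        out += signed_unary(g[n] - g[n - 1])
    return out
```

For the Fig. 6 GCLIs six, five, and eight, the deltas are -1 and +3, giving short codewords for small changes and longer ones for large jumps.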
If the GCLI coder 330 receives the GCLI coding mode 322 indicating a vertical prediction mode, then the difference signal δ is determined by the change in an amount of non-truncated, significant bit planes for the current coefficient group, compared to the previous coefficient group above. However, as the previous coefficient group above belongs to a different precinct, the previous coefficient group may in general have been truncated by a different GTLI. For prediction, the difference signal δ is determined with a common GTLI. Then, a modified truncated GCLI for the previous (p-1)th precinct is determined, using the GTLI of the current pth precinct, in accordance with Equation (3), as follows:
g'_{p-1,b}[n] = max(gcli_{p-1,b}[n] - gtli_{p,b}, 0) (3)

and the difference signal δ is determined using the modified truncated GCLI as a predictor, in accordance with Equation (4), as follows:
δ_{p,b}[n] = g_{p,b}[n] - g'_{p-1,b}[n] if g'_{p-1,b}[n] > 0, or g_{p,b}[n] otherwise (4)
The difference signal δ may then be encoded by an entropy code, such as the signed unary code described above.
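Equations (3) and (4) can be sketched together: the previous precinct's GCLIs are re-truncated with the current precinct's GTLI so both rows share a common truncation level before differencing.

```python
def vertical_deltas(gcli_prev, gcli_cur, gtli_cur):
    # Vertical prediction per Equations (3) and (4): re-truncate the
    # previous precinct's GCLIs with the current GTLI to form the
    # modified predictor, then difference (a sketch; argument names
    # are illustrative).
    deltas = []
    for n in range(len(gcli_cur)):
        g_cur = max(gcli_cur[n] - gtli_cur, 0)
        g_prev = max(gcli_prev[n] - gtli_cur, 0)  # Equation (3)
        # Equation (4): fall back to the unpredicted value when the
        # modified predictor is zero.
        deltas.append(g_cur - g_prev if g_prev > 0 else g_cur)
    return deltas
```

Using the current GTLI on both rows ensures the decoder, which knows both GTLIs, can form the identical predictor.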
One disadvantage of signalling each difference value δ_{p,b}[n] with an entropy code is that each difference value needs to be coded with at least one (1) bit. A run 740 in Fig. 7B illustrates a situation where a run of zero values occurs in the example truncated GCLIs 730. Apart from the initial δ 732, subsequent difference values corresponding to the run 740 will be zero valued. Rather than coding the subsequent difference values with one (1) bit each, the initial δ 732 may be modified so that the run 740 is signalled more efficiently. A method 800 of encoding the GCLIs 316, with improved signalling of runs such as in the example of Figs. 7A and 7B, is described below with reference to Fig. 8.
The method 800 may be implemented by the video encoder 114, as one or more software code modules of the application program 233 resident in the hard disk drive 210 and being controlled in its execution by the processor 205. The method 800 begins at an initialise step 802. At the initialise step 802, the GCLI coder 330, under execution of the processor 205, initialises a runs counter r to zero, and sets a current coefficient group n to one (1) if the GCLI coding mode 322 indicates a horizontal prediction mode, or sets the current coefficient group n to zero (0) if the GCLI coding mode 322 indicates a vertical prediction mode. The method 800 then proceeds to a check run of truncated GCLIs step 804.
At the check run of truncated GCLIs step 804, the GCLI coder 330 checks whether the next m truncated GCLIs have a value of zero (0), where m is a predetermined base run length. Since the truncated GCLIs are always non-negative, the check run step 804 may equivalently check if the following equality is true, in accordance with Equation (5), as follows:
Σ_{i=0}^{m-1} g_{p,b}[n + i] == 0 (5)
If the above equality is true, then the method 800 proceeds to an increment run step 806. Otherwise, the method 800 proceeds to an increment coefficient group step 808.
At the increment run step 806, a number of states are updated. If the runs counter r == 0, then the current coefficient group n is saved as the start of the current run, in accordance with Equation (6), as follows:
n_start = n (6)
Then, both the runs counter r and the current coefficient group n are incremented, in accordance with Equations (7) and (8), respectively, as follows:
r = r + 1    (7)
n = n + m    (8)
The method 800 then proceeds to a check counter maximum step 807.
At the check counter maximum step 807, the runs counter r may be checked against a maximum number of runs R_max. The check is carried out at step 807 if the entropy code used to encode δ values is a 'bounded' code. A bounded code, which may also be referred to as a truncated code, is an entropy code that is limited in the range of values that the entropy code can encode. If the entropy code used to encode δ values is lower bounded with a lower bound of δ_min, then the maximum number of runs R_max is determined in accordance with Equation (9), as follows:

R_max = δ_{p,b}[n_start] − δ_min    (9)
Then, the GCLI coder 330 checks whether the following inequality is true, in accordance with Equation (10), as follows:
r < R_max    (10)
If the above inequality is true, or the entropy code used to encode δ values is not lower bounded, then the method 800 returns to the check run of truncated GCLIs step 804. Otherwise, if the above inequality is false, the method 800 proceeds to a run check step 810.
At the increment coefficient group step 808, the current coefficient group n is advanced to the next adjacent coefficient group, in accordance with Equation (11), as follows:
n = n + 1    (11)
The method 800 then proceeds to the run check step 810.
At the run check step 810, the GCLI coder 330, under execution of the processor 205, checks whether the runs counter r has recorded any runs of zero valued truncated GCLIs. If r > 0, then the method 800 proceeds to a modify delta step 812. Otherwise, the n_start position marker is set to n − 1, and the method 800 proceeds to an encode delta step 814.
At the modify delta step 812, the difference value δ_{p,b}[n_start] is modified in accordance with Equation (12), as follows:
δ_{p,b}[n_start] = δ_{p,b}[n_start] − r    (12)
Note that δ_{p,b}[n_start] is the only difference value encoded to signal the entire run of zero valued truncated GCLIs. Subsequent difference values belonging to the run (δ_{p,b}[n_start + t] for 0 < t < r * m) are not encoded. The runs counter r is then reset to zero, and the method 800 proceeds to the encode delta step 814.
At the encode delta step 814, the difference value δ_{p,b}[n_start] is encoded by an entropy code, such as the signed unary code described above. The method 800 then proceeds to an end of subband check step 816.
At the end of subband check step 816, the current coefficient group n is checked to determine whether n has advanced beyond the total number of coefficient groups in the subband. If there are no more coefficient groups to be processed, the method 800 terminates. Otherwise, the method 800 returns to the check run of truncated GCLIs step 804.
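The encoder loop of the method 800 may be sketched as follows. This is a simplified illustration only, assuming horizontal prediction with a fixed base run length m: the delta for coefficient group 0 and the R_max check of step 807 are omitted, and when a run is interrupted, the escape delta is emitted followed directly by the interrupting group's own delta. The function name is illustrative.

```python
def encode_truncated_gclis(tg, m):
    """Encode a row of truncated GCLIs tg (non-negative ints) as
    horizontal-prediction deltas, replacing each run of r*m zero values
    by a single modified delta (escape code), per steps 804-814."""
    deltas = []
    n = 1          # horizontal prediction starts at coefficient group 1
    r = 0          # runs counter
    n_start = 0
    while n < len(tg):
        # step 804: are the next m truncated GCLIs all zero?
        if n + m <= len(tg) and all(v == 0 for v in tg[n:n + m]):
            if r == 0:
                n_start = n          # step 806 / Eq. (6): save run start
            r += 1                   # Eq. (7)
            n += m                   # Eq. (8)
            continue
        # step 810/812: close any open run with a modified delta, Eq. (12)
        if r > 0:
            deltas.append((tg[n_start] - tg[n_start - 1]) - r)
            r = 0
        # step 814: ordinary horizontal-prediction delta
        deltas.append(tg[n] - tg[n - 1])
        n += 1
    if r > 0:      # flush a run that reaches the end of the subband
        deltas.append((tg[n_start] - tg[n_start - 1]) - r)
    return deltas
```

For example, with m = 4, the row [2, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1] yields three deltas: one escape (−2 − 2 = −4) covering eight zero groups, then 3 and −2.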
The method 800 of encoding GCLI values allows full adaptation to the data, as runs are permitted to start and end at any point in the row of coefficient groups. In another arrangement, start and end points may be constrained to coincide with fixed partitions of the row of coefficient groups. Constraining the start and end points in this way may be referred to as an "aligned GCLI coding arrangement". Use of such an aligned GCLI coding arrangement restricts the use of escape codes to the first GCLI of each group of GCLIs, and also constrains the points at which runs of zero-valued truncated GCLIs may commence or terminate. Grouping of GCLIs allows the identification of 'run' and 'non-run' GCLI groups to be performed in a more parallel manner. Such run and non-run GCLI groups form a plurality of partitions of the row of coefficient groups. The run and non-run groups may be processed in parallel using SIMD instructions to perform the classification. Then, a remaining step with a sequential aspect is to detect adjacent 'run' groups. In order to achieve aligned GCLI coding, at step 804 it is also determined that the current position corresponds to an allowed position. That is, the precondition shown in Equation (13) below needs to be satisfied:

mod(n, m_p) == 0    (13)

where m_p is the size of the groups used in run computations, the value of which is constrained so that m = i * m_p, where i is a positive integer (typically 1 or 2). The size m_p of the groups (or partitions) also defines fixed predetermined locations within the wavelet subband. For example, if m_p equals 8, then the fixed predetermined locations would correspond to locations 0, 8, 16 and so on (i.e. multiples of m_p) within the wavelet subband. The consequences of constraining the start and end points to coincide with fixed partitions of the row of coefficient groups are described in further detail with reference to Fig. 10.
Fig. 10 shows a plot 1000 of an example truncated GCLI signal, with zero runs coded in an aligned (or 'constrained') manner with respect to groups of GCLIs. The plot 1000 shows a column plot of the GCLI values (e.g. 1014) resulting from coefficients of a subband of the wavelet transform. Bit planes from the indicated GCLI down to a truncation level 1012 are present in the GCLI coded bitstream 332. For example, a GCLI 1014 provides an indication that bit planes from the GCLI 1014 at bit plane index six (6) down to the truncation level 1012 at bit plane index three (3) (i.e. four bit planes) are present in the data bitstream 336. As the minimum number of bit planes to code for a group of coefficients is zero, when the GCLI falls below the truncation level 1012 (e.g. as shown with GCLI 1016), no bit planes are present in the data bitstream 336. The quantities of bit planes present in the data bitstream 336 for each coefficient group are known as truncated GCLIs 1018. Coding of the truncated GCLIs 1018 involves a prediction process. For example, when horizontal prediction is used for predicting GCLIs, the truncated GCLI for coefficient group n is determined from the sum of the truncated GCLI for coefficient group n − 1 and a delta GCLI. Accordingly, GCLI residuals 1020 are represented in the GCLI coded bitstream 332, using signed unary codes.
In the aligned GCLI coding arrangement where start and end points are constrained to coincide with fixed partitions of the row of coefficient groups, GCLIs are further grouped into groups of size m_p. For example, the GCLIs may be grouped into groups 1030, 1032, 1034, 1036 and 1038, for which m_p = 8. As seen in Fig. 10, the groups 1030, 1032, 1034, 1036 and 1038 have fixed boundaries at fixed predetermined locations with respect to the start of the subband. The last group is truncated when the number of GCLIs in the subband is not a multiple of m_p. Assuming in the example of Fig. 10 that the base run length m = m_p, each group is categorised as a 'non-run GCLI group' or a 'run GCLI group'. The groups 1030, 1032 and 1038 each include at least one GCLI above the truncation level 1012 (i.e. a truncated GCLI with a value exceeding zero) and thus are considered 'non-run GCLI groups'. The groups 1034 and 1036 have no truncated GCLIs exceeding the value of zero. As such, the groups 1034 and 1036 are 'run GCLI groups'. Moreover, as the groups 1034 and 1036 are adjacent, the groups 1034 and 1036 are coalesced into a 'multiple run GCLI group'. A multiple run GCLI group is signalled using an escape GCLI code 1040 (of value −2), derived in accordance with the step 812 of the method 800. The escape GCLI code 1040 serves as a run indication of a run length of run GCLI groups positioned at the fixed predetermined locations within the wavelet subband. Also, given that a run GCLI group is identified by determining that the current position corresponds to an allowed (or fixed predetermined) position in the wavelet subband, for example as per Equation (13), and the fact that run GCLI groups correspond to GCLIs not exceeding the truncation bit plane, the run indication is determined based on the encoded GCLI, a truncation bit plane index and the fixed predetermined location within the wavelet subband. The escape GCLI codes are used to decode the precinct of the video data as described above.
Note that the non-run GCLI group 1032 includes GCLIs truncated to zero (e.g. 1016) that are adjacent to the GCLIs of the run GCLI groups 1034 and 1036 but are not themselves coded using an escape code. A similar situation occurs for the first few truncated GCLIs of the non-run GCLI group 1038, as a consequence of the alignment of the group boundaries (e.g. to every eight GCLIs).
Grouping of GCLIs allows the identification of 'run' and 'non-run' groups to be performed in a more parallel manner. In particular, the groups of GCLIs may be processed in parallel using SIMD instructions to perform the classification. Then, the remaining step with a sequential aspect is to detect adjacent 'run' groups. Although the example of Fig. 10 shows a group size m_p of eight (8) and a base run length m also of eight (8), it is not required that the values of the group size m_p and the base run length m are equal. The benefit of finer GCLI grouping (m_p < m) is better alignment of run detection to the data. In general, the smaller the GCLI group size the higher the coding performance, at the expense of a reduction in achievable parallelism. Moreover, when the GCLI group size is smaller than the base run length, the assignment of a group as a 'run' group does not alone form sufficient basis to use an escape code to code the group. For example, if a base run length of eight (8) is used with a group size of four (4), whenever a 'run' group (of four) GCLIs is abutted with 'non-run' groups (i.e. before and after), escape coding cannot be used (as the total run length is less than eight). In the case where a base run length of eight (8) is used with a group size of four (4), any two adjacent 'run' groups of four GCLIs are able to exploit escape coding. In a variation on the aligned GCLI coding arrangement described above with reference to Fig. 10, escape codes may only be permitted where the GCLIs are aligned with multiples of the base run length, and not just the group size. As such, in the arrangement wherein the group size m_p is less than the base run length m, the run indication may be determined using a plurality of predetermined fixed locations defined respectively by the group size m_p and the base run length m. Other combinations of GCLI group size and base run length are also possible, with the example of Fig. 10 showing the trade-off between performance and parallelism.
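The classification of fixed-size GCLI groups into 'run' and 'non-run' groups, and the coalescing of adjacent run groups, can be sketched as follows. This is a simplified illustration: the function name and return format are illustrative, and the SIMD parallelism is not modelled (each group's all-zero test is independent, which is what makes the classification parallelisable).

```python
def classify_gcli_groups(tg, mp, m):
    """Partition truncated GCLIs tg into groups of size mp, classify
    each group as a run group (all zeros) or non-run group, then
    coalesce adjacent run groups, keeping only coalesced spans long
    enough to cover the base run length m (escape-codeable spans).
    Returns (is_run flags, list of (start_group, run_group_count))."""
    n_groups = (len(tg) + mp - 1) // mp   # last group may be truncated
    is_run = [all(v == 0 for v in tg[g * mp:(g + 1) * mp])
              for g in range(n_groups)]
    spans = []
    g = 0
    while g < n_groups:                   # sequential coalescing step
        if is_run[g]:
            start = g
            while g < n_groups and is_run[g]:
                g += 1
            length = g - start
            if length * mp >= m:          # must reach the base run length
                spans.append((start, length))
        else:
            g += 1
    return is_run, spans
```

With m_p = m = 8 and five groups shaped like Fig. 10 (groups 2 and 3 all zero), the two adjacent run groups coalesce into a single span, analogous to the multiple run GCLI group signalled by the escape code 1040.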
Experiments by the inventors show that the coding gain of using the aligned GCLI coding arrangement for run coding with escape codes is substantially preserved at higher compression ratios (e.g. at 3 bits per pixel (BPP)). As PSNRs of decoded video data are lower when higher compression ratios are used, this substantial preservation of coding gain shows that the trade-off between parallelism and coding gain is desirable. At lower compression ratios (e.g. 14 BPP), the coding gain of constrained run coding with escape codes is less preserved compared to the unconstrained location of escape codes. However, at lower compression ratios, the PSNRs are much higher and so the reduction is less significant.
One GCLI prediction mode employs prediction of the GCLI from the truncation level 1012. When the prediction mode employing prediction of the GCLI from the truncation level 1012 is in use, the truncated GCLIs 1018 are unary coded directly to form the GCLI coded bitstream 332. In the case where the truncated GCLIs 1018 are unary coded directly to form the GCLI coded bitstream 332, unsigned unary coding is possible because (a) the truncated GCLIs are always greater than or equal to the truncation level 1012 and (b) there is no delta calculation step to introduce potential negative delta values for coding. However, as an unsigned unary code cannot represent the negative values used to signal escape codes for signalling insignificant runs, an adjustment is necessary.
In one arrangement, every nth GCLI may be coded using signed unary coding instead of unsigned unary coding, where n is the base run length. The negative values that may be coded with a signed unary code form escape codes that are then used to encode the multiple of the base run length, in a manner analogous to the escape coding of the method 800. This multiple of the base run length, in combination with the base run length, indicates the number of subsequent truncated GCLIs having a value of zero. As only every nth coded GCLI includes the option for such escape coding, the coding efficiency impact is limited to one signed unary code every n codes, maintaining a high level of coding performance. Moreover, the parallelism benefits are also maintained in an arrangement where every nth GCLI is coded using signed unary coding instead of unsigned unary coding, as the entry to and exit from a run state is constrained to GCLIs aligned to the n-spaced apart boundaries. In another arrangement, the (unsigned) unary code may be used to encode values in the range [−t, 15 − t] (rather than [0, 15]), where t is the truncation index. The arrangement where the (unsigned) unary code is used to encode values in the range [−t, 15 − t] is achieved by addition (subtraction) of a fixed offset to (from) the encoded (decoded) GCLI residual value. Specifically, at the encoder, the GCLI residual is first transformed to an offset GCLI residual according to Equation (14), below:
δ_enc = max(0, δ_{p,b}) + t − i    (14)

prior to unary encoding. At the decoder, the residual δ_{p,b} and run length i respectively are recovered according to Equations (15) and (16):
δ_{p,b} = max(0, δ_enc − t)    (15)
i = min(0, δ_enc − t)    (16)
In the case where the GCLI residual is first transformed according to Equation (14) and the residual δ_{p,b} and run length i are recovered according to Equations (15) and (16), respectively, as the truncation index is increased, more of the unary code range is given over for use as escape codes, reflecting the fact that long zero runs become more likely as the truncation level increases.
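The offset mapping of Equations (14) to (16) can be sketched as follows. Note that the printed equations appear to mix sign conventions for the run length i; the sketch below treats i as a non-negative run multiple at the encoder and flips the sign of the value recovered by Equation (16) accordingly. This is one interpretation, not a definitive reading, and the function names are illustrative.

```python
def offset_unary_encode(delta, i, t):
    """Map either a GCLI residual delta (>= 0, with i == 0) or a zero-run
    multiple i (> 0, with delta == 0) onto the unsigned range [0, ...],
    per Equation (14). Positive payloads land above t, runs below t."""
    assert (i == 0 and delta >= 0) or (delta == 0 and 0 < i <= t)
    return max(0, delta) + t - i

def offset_unary_decode(enc, t):
    """Recover (delta, i) per Equations (15) and (16); the decoded run
    multiple is sign-flipped to be non-negative."""
    delta = max(0, enc - t)
    i = -min(0, enc - t)
    return delta, i
```

For example, with t = 3, the residual 2 maps to code value 5 and a run multiple of 2 maps to code value 1; both round-trip through the decoder.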
In another variation, the unary code may be used to encode values in the range [−t_u, 15 − t_u], where t_u = min(t, t_max). The imposition of a t_max cap on the range of escape codes serves to limit the cost of encoding significant GCLI residuals. The specific value of t_max used may be fixed (and known to both decoder and encoder), or the specific value of t_max may be chosen by the encoder and signalled in a codestream header.
In yet another variation, unsigned unary code values may be mapped to signed code values (again where positive values encode GCLI residuals and negative values encode zero run lengths) in magnitude order, alternating between negative and positive values. An example bounded unsigned unary code table covering the range [−2, 13] is shown in Table 2 below. In general, other mappings between values and unary code values may be employed in order to minimise the overall length of the GCLI coded bitstream.
Table 2

Value | Offset arrangement | Alternating arrangement
13 | 111111111111111 | 111111111111111
12 | 111111111111110 | 111111111111110
11 | 11111111111110 | 11111111111110
10 | 1111111111110 | 1111111111110
9 | 111111111110 | 111111111110
8 | 11111111110 | 11111111110
7 | 1111111110 | 1111111110
6 | 111111110 | 111111110
5 | 11111110 | 11111110
4 | 1111110 | 1111110
3 | 111110 | 111110
2 | 11110 | 11110
1 | 1110 | 110
0 | 110 | 0
-1 | 10 | 10
-2 | 0 | 1110
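The 'alternating arrangement' column of Table 2 can be generated programmatically: signed values in the bounded range are ranked in magnitude order, alternating negative then positive, and each rank k becomes the bounded unary code of k ones followed by a terminating zero (the final rank omits the terminator, as a bounded code may). The function name below is illustrative.

```python
def alternating_unary_table(lo, hi):
    """Build a value -> unary-codeword table for the signed range
    [lo, hi] using the alternating magnitude ordering of Table 2:
    0, -1, 1, -2, 2, ... with out-of-range values skipped."""
    order = [0]
    mag = 1
    while len(order) < hi - lo + 1:
        for v in (-mag, mag):            # negative first, matching Table 2
            if lo <= v <= hi:
                order.append(v)
        mag += 1
    n = len(order)
    # rank k -> '1' * k + '0'; the last rank drops the terminating '0'
    return {v: '1' * k + ('0' if k < n - 1 else '')
            for k, v in enumerate(order)}
```

Applied to the range [−2, 13], this reproduces the alternating column of Table 2, including the short codes for the frequent values 0 and −1.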
Fig. 9 is a schematic flow diagram showing a method 900 of decoding the received GCLI coded bitstream 406. The method 900 may be implemented by the video decoder 134, as one or more software code modules of the application program 233 resident in the hard disk drive 210 and being controlled in its execution by the processor 205.
The method 900 begins at an initialise step 902. At the initialise step 902, the GCLI decoder 416, under execution of the processor 205, sets a current coefficient group n to one (1) if the received GCLI coding mode 412 indicates a horizontal prediction mode, or sets the current coefficient group n to zero (0) if the received GCLI coding mode 412 indicates a vertical prediction mode. The method 900 then proceeds to a decode delta step 904.
At the decode delta step 904, a difference value δ_{p,b}[n] is decoded from the received GCLI coded bitstream 406, under execution of the processor 205. The method 900 then proceeds to a check delta step 906.
At the check delta step 906, the decoded difference value δ_{p,b}[n] is compared against the lowest valid difference value δ_valid that can validly indicate a truncated GCLI. δ_valid is determined from the truncated GCLI predictor, which depends on the received GCLI coding mode 412. If the received GCLI coding mode 412 indicates a horizontal prediction mode, then δ_valid is set to −g_{p,b}[n − 1]. If the received GCLI coding mode 412 indicates a vertical prediction mode, then δ_valid is set to −g_{p−1,b}[n]. If δ_{p,b}[n] ≥ δ_valid, then the method 900 proceeds to a decode GCLI step 908. Otherwise, the method 900 proceeds to a decode run step 910.
At the decode GCLI step 908, a truncated GCLI g_{p,b}[n] for the current coefficient group n is determined from the decoded difference value δ_{p,b}[n] and a truncated GCLI predictor, under execution of the processor 205. If the received GCLI coding mode 412 indicates a horizontal prediction mode, then g_{p,b}[n] is determined from a previous truncated GCLI to the left, in accordance with Equation (17), as follows:
g_{p,b}[n] = g_{p,b}[n − 1] + δ_{p,b}[n]    (17)
If the received GCLI coding mode 412 indicates a vertical prediction mode, then g_{p,b}[n] is determined from a previous truncated GCLI above, in accordance with Equation (18), as follows:
g_{p,b}[n] = g_{p−1,b}[n] + δ_{p,b}[n]    (18)
A GCLI value gcli_{p,b}[n] for the current coefficient group n is then determined, in accordance with Equation (19), as follows:

gcli_{p,b}[n] = g_{p,b}[n] + gtli_{p,b} if g_{p,b}[n] > 0, and gcli_{p,b}[n] = 0 otherwise    (19)

The current coefficient group n is then incremented to n = n + 1. The method 900 then proceeds to an end of subband check step 912.
At the decode run step 910, a runs multiple r is set equal to the difference between the lowest valid difference value δ_valid and the decoded difference value δ_{p,b}[n], in accordance with Equation (20), as follows:
r = δ_valid − δ_{p,b}[n]    (20)
The runs multiple r indicates how many consecutive runs of m truncated GCLIs have a value of zero, where m is a predetermined base run length. Then, the GCLI decoder 416 sets g_{p,b}[n + i] = 0 for 0 ≤ i < L, where L = r * m is a run length of truncated GCLIs with a value of zero. Note that because each truncated GCLI with a value of zero indicates that there are no significant bit planes for the corresponding coefficient group, L is also a run length of coefficient groups with zero valued coefficients. The current coefficient group n is then advanced to n = n + r * m. The method 900 then proceeds to an end of subband check step 912.
At the end of subband check step 912, the current coefficient group n is checked to determine whether n has advanced beyond the total number of coefficient groups in the subband. If there are no more coefficient groups to be processed, then the method 900 terminates. Otherwise, the method 900 returns to the decode delta step 904.
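The decoder loop of the method 900 may be sketched as follows. This is a simplified illustration, assuming horizontal prediction: the truncated GCLI of coefficient group 0 (g0) is assumed to be already known, and the gtli offset of Equation (19) is omitted so the function returns truncated GCLIs rather than final GCLI values. The function name is illustrative.

```python
def decode_truncated_gclis(deltas, n_total, m, g0):
    """Rebuild the truncated GCLIs of one subband row from decoded
    delta values, per steps 904-912: a delta below delta_valid signals
    a run of r*m zero-valued truncated GCLIs (Equation (20))."""
    tg = [0] * n_total
    tg[0] = g0
    n = 1
    it = iter(deltas)
    while n < n_total:
        d = next(it)                  # step 904: decode next delta
        delta_valid = -tg[n - 1]      # horizontal predictor (step 906)
        if d >= delta_valid:          # step 908: ordinary delta, Eq. (17)
            tg[n] = tg[n - 1] + d
            n += 1
        else:                         # step 910: run escape, Eq. (20)
            r = delta_valid - d
            for i in range(r * m):    # r*m zero-valued truncated GCLIs
                tg[n + i] = 0
            n += r * m
    return tg
```

With m = 4, the delta sequence [−4, 3, −2] and g0 = 2 reconstruct the row [2, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1]: the first delta is below δ_valid = −2, so r = 2 and eight zero groups are emitted before decoding resumes normally.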
In an arrangement of the low latency video encoding and decoding system 100, the base run length m is fixed and known to both the video encoder 114 and the video decoder 134. In another arrangement, the base run length m may be signalled in a picture header that is included in the compressed video data 115. In an arrangement where the base run length m may be signalled in a picture header, the base run length m is fixed per-picture, but may be changed to adapt to different types of uncompressed video data 113.
In another arrangement of the low latency video encoding and decoding system 100, the base run length m is fetched from a lookup table indexed by the subband b and the GTLI gtli_{p,b}. One advantage of such an arrangement is that longer base run lengths may be suitable for higher frequency subbands and for higher values of gtli_{p,b}. The lookup table may be fixed, or may be signalled in the picture header.
In another arrangement of the low latency video encoding and decoding system 100, the base run length m is fetched from a lookup table indexed by the subband b. One advantage of an arrangement where the base run length m is fetched from a lookup table indexed by the subband b is that a lookup table indexed by only one variable may be less complex to implement. The lookup table indexed by the subband b may be selected from a number of candidate lookup tables, based on a known constant bit rate of the communication channel 120. In another arrangement, the lookup table indexed by the subband b may be adaptively determined per-precinct by the video encoder 114 based on the statistics of the GCLIs 316, and then signalled in a precinct header that is included for each precinct in the compressed video data 115.
In another arrangement of the low latency video encoding and decoding system 100, the base run length m is not predetermined or fetched from a lookup table, but is instead jointly determined along with the runs multiple r. The GCLI coder 330 may determine r, m pairs separately for each run of zero valued truncated GCLIs, and the r, m pairs are decoded from the received GCLI coded bitstream 406 by the GCLI decoder 416.
In the present arrangement, the GCLI coder 330 encodes the runs multiple r by signalling the modified difference value δ_{p,b}[n_start] − r. Then, the base run length m is signalled by a binary code of x bits, where the value of x is dependent on r. The described signalling scheme, of a unary code followed by a binary code whose length is dependent on the value of the unary code, is similar to an exponential Golomb code. However, the range of values that may be covered by the described signalling scheme is broader for the same bit cost, because some values are skipped. Table 3 below shows an example of the described signalling scheme, where appropriate ranges of m are selected to ensure that there is no overlap in the overall r * m.
Table 3

r | x | m | r * m
1 | 0 | 4 | 4
2 | 1 | 3-4 | 6, 8
3 | 2 | 3-6 | 9, 12, ... 18
4 | 3 | 5-12 | 20, 24, ... 48
In the present arrangement, the GCLI coder 330 first determines a run length L for each run of zero valued truncated GCLIs. Then, the largest r * m value less than or equal to L is selected from a table such as Table 3 above, and the corresponding r and m values are determined. The GCLI decoder 416 decodes the r, m pair from the corresponding unary code and binary code in the received GCLI coded bitstream 406.
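The encoder-side selection of an r, m pair from Table 3 can be sketched as follows. The table encoding and the helper name are illustrative; the data itself follows Table 3 above.

```python
# Table 3 rows as (r, x, valid m range); x is the binary-code bit count.
TABLE3 = [(1, 0, range(4, 5)),
          (2, 1, range(3, 5)),
          (3, 2, range(3, 7)),
          (4, 3, range(5, 13))]

def pick_run_pair(L):
    """Select the (r, m) pair from Table 3 whose product r*m is the
    largest value less than or equal to the run length L, as done by
    the GCLI coder before signalling the unary and binary codes."""
    best = None
    for r, _x, m_range in TABLE3:
        for m in m_range:
            if r * m <= L and (best is None or r * m > best[0] * best[1]):
                best = (r, m)
    return best
```

For example, a run of L = 20 zero-valued truncated GCLIs selects r = 4, m = 5 (product 20), while L = 10 falls back to r = 3, m = 3 (product 9), leaving one group to be coded outside the run.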
In another arrangement, a run length L of zero valued truncated GCLIs is signalled by r and m, but instead of L being a linear function of r, L may increase more quickly with increasing r. For example, instead of L = r * m, the run length L may be determined as L = r² * m. In another arrangement, the run length may be predetermined as L = round(r^a * m) for some non-integer exponent a, and the run length L may be fetched from a lookup table for r, m pairs.
INDUSTRIAL APPLICABILITY
The arrangements described are applicable to the computer and data processing industries and particularly for image processing.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have correspondingly varied meanings.

Claims (19)

CLAIMS:
    1. A method of decoding a precinct of video data from a video bitstream, the method comprising:
    decoding a Greatest Coded Line Index (GCLI) for a coefficient group in a plurality of coefficient groups in a wavelet subband of the precinct of video data, the plurality of coefficient groups being divided into GCLI groups at fixed predetermined locations within the wavelet subband, wherein GCLI groups comprise run GCLI groups and non-run GCLI groups;
    receiving a truncation bit plane index for the coefficient group in the wavelet subband; determining run GCLI groups from the plurality of GCLI groups having coefficient values below a predetermined value, a run length of the run GCLI groups being determined from the GCLI and the truncation bit plane index, the run GCLI groups being encoded starting at the fixed predetermined location within the wavelet subband; and decoding the precinct of video data from the bitstream based on the determined run GCLI groups.
2. The method according to claim 1, wherein the run length of the plurality of the coefficient groups is further determined by a base run length of the coefficient groups, the run length being determined based on the base run length and the amount that the first significant bit plane index is less than the truncation bit plane index.
3. The method according to claim 2, wherein the base run length for the video bitstream is fixed.
4. The method according to claim 2, wherein the base run length is signalled in a header included in the video bitstream.
5. The method according to claim 2, wherein the base run length is fetched from a lookup table indexed by the wavelet subband and the truncation bit plane index.
6. The method according to claim 2, wherein the base run length is fetched from a lookup table indexed by the wavelet subband.
7. The method according to claim 6, wherein the lookup table is selected from a number of lookup tables according to a known constant bit rate of the video bitstream.
8. The method according to claim 6, wherein the lookup table is signalled in a precinct header included in the video bitstream.
9. A system for decoding a precinct of video data from a video bitstream, the precinct of video data including one or more subbands, the system comprising:
    a memory for storing data and a computer program;
    a processor coupled to the memory for executing the computer program, the computer program comprising instructions for:
    decoding a Greatest Coded Line Index (GCLI) for a coefficient group in a plurality of coefficient groups in a wavelet subband of the precinct of video data, the plurality of coefficient groups being divided into GCLI groups at fixed predetermined locations within the wavelet subband, wherein GCLI groups comprise run GCLI groups and non-run GCLI groups;
    receiving a truncation bit plane index for the coefficient group in the wavelet subband;
    determining run GCLI groups from the plurality of GCLI groups having coefficient values below a predetermined value, a run length of the run GCLI groups being determined from the GCLI and the truncation bit plane index, the run GCLI groups being encoded starting at the fixed predetermined location within the wavelet subband; and decoding the precinct of video data from the bitstream based on the determined run GCLI groups.
10. An apparatus for decoding a precinct of video data from a video bitstream, the precinct of video data including one or more subbands, the apparatus comprising:
    means for decoding a Greatest Coded Line Index (GCLI) for a coefficient group in a plurality of coefficient groups in a wavelet subband of the precinct of video data, the plurality of coefficient groups being divided into GCLI groups at fixed predetermined locations within the wavelet subband, wherein GCLI groups comprise run GCLI groups and non-run GCLI groups;
means for receiving a truncation bit plane index for the coefficient group in the wavelet subband;
    means for determining run GCLI groups from the plurality of GCLI groups having coefficient values below a predetermined value, a run length of the run GCLI groups being determined from the GCLI and the truncation bit plane index, the run GCLI groups being encoded starting at the fixed predetermined location within the wavelet subband; and means for decoding the precinct of video data from the bitstream based on the determined run GCLI groups.
11. A computer readable medium having a program for decoding a precinct of video data from a video bitstream stored on the medium, the precinct of video data including one or more subbands, the program comprising:
    code for decoding a Greatest Coded Line Index (GCLI) for a coefficient group in a plurality of coefficient groups in a wavelet subband of the precinct of video data, the plurality of coefficient groups being divided into GCLI groups at fixed predetermined locations within the wavelet subband, wherein GCLI groups comprise run GCLI groups and non-run GCLI groups;
    code for receiving a truncation bit plane index for the coefficient group in the wavelet subband;
    code for determining run GCLI groups from the plurality of GCLI groups having coefficient values below a predetermined value, a run length of the run GCLI groups being determined from the GCLI and the truncation bit plane index, the run GCLI groups being encoded starting at the fixed predetermined location within the wavelet subband; and code for decoding the precinct of video data from the bitstream based on the determined run GCLI groups.
12. A method of decoding a precinct of video data from a video bitstream, the method comprising:
    decoding a significant bit plane index for a coefficient group in a plurality of coefficient groups in a wavelet subband of the precinct of video data, the plurality of coefficient groups being divided into a plurality of partitions of consecutive coefficient groups at fixed predetermined locations within the wavelet subband;
    receiving a truncation bit plane index for the coefficient group in the wavelet subband;
determining run partitions from the plurality of partitions having coefficient values below a predetermined value, a run length of the run partitions being determined from the significant bit plane index and the truncation bit plane index, wherein the run partitions are encoded starting at the fixed predetermined location within the wavelet subband; and decoding the precinct of video data from the bitstream based on the determined run partitions.
13. The method according to claim 12, wherein the fixed predetermined locations are defined based on a partition size within the plurality of partitions.
14. The method according to claim 12, wherein the first significant bit-plane index is encoded using an unsigned unary code by expanding a signed range to include a range for specifying run lengths and mapping the signed range into an unsigned range for the purpose of coding.
15. A system for decoding a precinct of video data from a video bitstream, the precinct of video data including one or more subbands, the system comprising:
    a memory for storing data and a computer program;
    a processor coupled to the memory for executing the computer program, the computer program comprising instructions for:
    decoding a significant bit plane index for a coefficient group in a plurality of coefficient groups in a wavelet subband of the precinct of video data, the plurality of coefficient groups being divided into a plurality of partitions of consecutive coefficient groups at fixed predetermined locations within the wavelet subband;
    receiving a truncation bit plane index for the coefficient group in the wavelet subband; determining run partitions from the plurality of partitions having coefficient values below a predetermined value, a run length of the run partitions being determined from the significant bit plane index and the truncation bit plane index, wherein the run partitions are encoded starting at the fixed predetermined location within the wavelet subband; and decoding the precinct of video data from the bitstream based on the determined run partitions.
  16. An apparatus for decoding a precinct of video data from a video bitstream, the apparatus comprising:
means for decoding a significant bit plane index for a coefficient group in a plurality of coefficient groups in a wavelet subband of the precinct of video data, the plurality of coefficient groups being divided into a plurality of partitions of consecutive coefficient groups at fixed predetermined locations within the wavelet subband;
    means for receiving a truncation bit plane index for the coefficient group in the wavelet subband;
means for determining run partitions from the plurality of partitions having coefficient values below a predetermined value, a run length of the run partitions being determined from the significant bit plane index and the truncation bit plane index, wherein the run partitions are encoded starting at the fixed predetermined location within the wavelet subband; and
    means for decoding the precinct of video data from the bitstream based on the determined run partitions.
  17. A computer readable medium having a computer program stored on the medium for decoding a precinct of video data from a video bitstream, the program comprising:
    code for decoding a significant bit plane index for a coefficient group in a plurality of coefficient groups in a wavelet subband of the precinct of video data, the plurality of coefficient groups being divided into a plurality of partitions of consecutive coefficient groups at fixed predetermined locations within the wavelet subband;
    code for receiving a truncation bit plane index for the coefficient group in the wavelet subband;
code for determining run partitions from the plurality of partitions having coefficient values below a predetermined value, a run length of the run partitions being determined from the significant bit plane index and the truncation bit plane index, wherein the run partitions are encoded starting at the fixed predetermined location within the wavelet subband; and
    code for decoding the precinct of video data from the bitstream based on the determined run partitions.
  18. A video bitstream for decoding a precinct of video data, the video bitstream comprising:
an encoded Greatest Coded Line Index (GCLI) for each coefficient group in a plurality of coefficient groups in a wavelet subband of the precinct of video data, the plurality of coefficient groups being divided, at fixed predetermined locations within the wavelet subband, into GCLI groups; and
    a run indication of a run length of run GCLI groups positioned at the fixed predetermined locations within the wavelet subband to decode the precinct of the video data,
the run GCLI groups being GCLI groups having coefficient values below a predetermined value, wherein the run indication is determined based on the encoded GCLI, a truncation bit plane index and the fixed predetermined location within the wavelet subband.
  19. The video bitstream according to claim 18, wherein the run indication is further determined based on a plurality of fixed predetermined locations.
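The run-partition determination recited in claims 12 and 15 to 17 can be sketched as follows. This is an illustrative reading of the claim language, not the implementation in the specification; the function name, the default partition size of four coefficient groups, and the `(first_partition, run_length)` return shape are all assumptions.

```python
def find_run_partitions(gcli, truncation_idx, partition_size=4):
    """Classify fixed-location partitions of coefficient groups as runs.

    gcli[i] is the significant bit plane index (GCLI) of coefficient
    group i.  A partition of partition_size consecutive groups is a
    'run partition' when no group in it has coded bit planes above the
    truncation bit plane, i.e. every GCLI is at or below truncation_idx,
    so the partition carries no coefficient data.  Because partition
    boundaries are fixed, runs always start at predetermined locations
    within the subband.  Returns (first_partition, run_length) pairs.
    """
    num_parts = (len(gcli) + partition_size - 1) // partition_size

    def is_run(p):
        groups = gcli[p * partition_size:(p + 1) * partition_size]
        return all(g <= truncation_idx for g in groups)

    runs = []
    p = 0
    while p < num_parts:
        if is_run(p):
            length = 1
            while p + length < num_parts and is_run(p + length):
                length += 1
            runs.append((p, length))
            p += length
        else:
            p += 1
    return runs


# Twelve coefficient groups in partitions of four, truncation plane 1:
# partitions 0 and 1 lie entirely at or below the truncation plane and
# merge into a single run of length 2; partition 2 does not.
print(find_run_partitions([0, 0, 0, 0, 1, 0, 1, 0, 5, 6, 2, 3], 1))
# [(0, 2)]
```

Merging consecutive run partitions into a single run length is what makes the truncation bit plane index part of the determination: raising the truncation plane turns more partitions into runs and lengthens the runs that can be signalled.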
AU2017210632A 2017-08-04 2017-08-04 Method, apparatus and system for encoding and decoding video data Abandoned AU2017210632A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2017210632A AU2017210632A1 (en) 2017-08-04 2017-08-04 Method, apparatus and system for encoding and decoding video data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2017210632A AU2017210632A1 (en) 2017-08-04 2017-08-04 Method, apparatus and system for encoding and decoding video data

Publications (1)

Publication Number Publication Date
AU2017210632A1 true AU2017210632A1 (en) 2019-02-21

Family

ID=65365381

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2017210632A Abandoned AU2017210632A1 (en) 2017-08-04 2017-08-04 Method, apparatus and system for encoding and decoding video data

Country Status (1)

Country Link
AU (1) AU2017210632A1 (en)

Similar Documents

Publication Publication Date Title
US20210281870A1 (en) Encoder, a decoder and corresponding methods for inter prediction
US11323745B2 (en) Method, apparatus and system for decoding and generating an image frame from a bitstream
US20180084284A1 (en) Method, apparatus and system for encoding and decoding video data
EP3932057B1 (en) An encoder, a decoder and corresponding methods for intra prediction
AU2020210276B2 (en) Method, apparatus and system for encoding and decoding video data
US11076172B2 (en) Region-of-interest encoding enhancements for variable-bitrate compression
Descampe et al. JPEG XS, a new standard for visually lossless low-latency lightweight image compression
EP4358511A1 (en) Method of intra predicting a block of a picture
EP3891977A1 (en) Method and apparatus for intra prediction
US9674554B2 (en) Image processing system with coding mode and method of operation thereof
KR20220065880A (en) Use of DCT-based interpolation filters and enhanced bilinear interpolation filters in affine motion compensation
AU2017201971A1 (en) Method, apparatus and system for encoding and decoding image data
AU2017210632A1 (en) Method, apparatus and system for encoding and decoding video data
CN114556923B (en) Encoder, decoder and corresponding method using interpolation filtering
AU2017204642A1 (en) Method, apparatus and system for encoding and decoding video data
US20240114147A1 (en) Systems, methods and bitstream structure for hybrid feature video bitstream and decoder
AU2017225027A1 (en) Method, apparatus and system for encoding and decoding video data
AU2017201933A1 (en) Method, apparatus and system for encoding and decoding video data
US10063889B2 (en) Image processing system with conditional coding and method of operation thereof
CN117201782A (en) Filtering method, filtering model training method and related device

Legal Events

Date Code Title Description
MK4 Application lapsed section 142(2)(d) - no continuation fee paid for the application