AU2017225027A1 - Method, apparatus and system for encoding and decoding video data - Google Patents


Info

Publication number
AU2017225027A1
Authority
AU
Australia
Prior art keywords
bit
code
plane
coefficients
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2017225027A
Inventor
Andrew James Dorrell
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU2017225027A priority Critical patent/AU2017225027A1/en
Publication of AU2017225027A1 publication Critical patent/AU2017225027A1/en
Abandoned legal-status Critical Current


Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

METHOD, APPARATUS AND SYSTEM FOR ENCODING AND DECODING VIDEO A system and method for encoding and decoding a precinct of video data. The method of decoding the precinct of video data from a bit-stream comprises determining a most significant bit plane index and a truncation index for a group of coefficients in a wavelet subband of the precinct of the video data (920, 925) and determining a portion of the bit-stream corresponding to a plurality of bit-planes of the group of coefficients based on the most significant bit plane index, the truncation index and a fixed length of each of the plurality of bit planes, the determined portion comprising a plurality of code-words of the fixed length (1510). The method further comprises determining the most significant bit plane and a sign for a coefficient in the group of coefficients by decoding a code-word corresponding to the most significant bit plane from the plurality of code-words, the code-word being decoded based on determining that the code-word comprises a sign bit for the coefficient (1520-1524); determining a further bit-plane by decoding a remaining code-word in the plurality of code-words; and decoding the precinct using the decoded code-words and signs.

Description

2017225027 05 Sep 2017
METHOD, APPARATUS AND SYSTEM FOR ENCODING AND DECODING VIDEO DATA
TECHNICAL FIELD
The present invention relates generally to digital video signal processing and, in particular, to a method, apparatus and system for encoding and decoding video data. The present invention also relates to a computer program product including a computer readable medium having recorded thereon a computer program for encoding and decoding video data.
BACKGROUND
Many applications for video coding currently exist, including applications for transmission and storage of video data. Many video coding standards have also been developed and others are currently in development. Much emphasis in video compression research is directed towards ‘distribution codecs’, i.e. codecs intended for distributing compressed video data to geographically dispersed audiences.
However, an emerging area of research is directed towards ‘mezzanine codecs’.
Mezzanine codecs are used for local distribution, i.e. within a broadcast studio, and are characterised by requirements for ultra-low latency, typically well under one frame, and greatly reduced complexity, both for the encoder and the decoder. Recent developments in such coding within the International Organisation for Standardisation / International
Electrotechnical Commission Joint Technical Committee 1 / Subcommittee 29 / Working Group 1 (ISO/IEC JTC1/SC29/WG1), also known as the Joint Photographic Experts Group (JPEG) have resulted in a standardisation work item named ‘JPEG XS’. The goal of JPEG XS is to produce a codec having an end-to-end latency not exceeding thirty-two (32) lines of video data, and capability for implementation within relatively modest implementation technologies, e.g. mid-range FPGAs from vendors such as Xilinx ®. The latency requirements of JPEG XS mandate use of strict rate control techniques to ensure coded data does not vary excessively relative to the capacity of the channel carrying the compressed video data.
In a broadcast studio, video may be captured by a camera before undergoing several transformations, including real-time editing, graphic and overlay insertion and mixing. Once the video has been adequately processed, a distribution encoder is used to
13575128 1
encode the processed video data for final distribution to end consumers. Within the studio, the video data is generally transported in an uncompressed format. Transporting uncompressed video data necessitates the use of very high speed links. Variants of the Serial Digital Interface (SDI) protocol can transport different video formats. For example,
3G-SDI (operating with a 3Gbps electrical link) can transport 1080p HDTV (1920x1080 resolution) at 30fps and eight (8) bits per sample. Interfaces having a fixed bit rate are suited to transporting data having a constant bit rate (CBR).
Uncompressed video data is generally CBR, and compressed video data, in the context of ultra-low latency coding, is generally expected to also be CBR. As bit rates increase, achievable cabling lengths reduce, which becomes problematic for cable routing through a studio. For example, UHDTV (3840x2160) requires a 4X increase in bandwidth compared to 1080p HDTV, implying a 12Gbps interface. Increasing the data rate of a single electrical channel reduces the achievable length of the cabling. At 3 Gbps, cable runs generally cannot exceed 150m, the minimum usable length for studio applications.
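The bandwidth figures above follow from simple arithmetic on the raw sample rate. By way of illustration only (the function below is a sketch and not part of the described arrangements):

```python
def uncompressed_bitrate_bps(width, height, fps, bits_per_sample, channels=3):
    """Raw video bit rate in bits per second, ignoring blanking intervals."""
    return width * height * fps * bits_per_sample * channels

hd = uncompressed_bitrate_bps(1920, 1080, 30, 8)    # 1080p HDTV
uhd = uncompressed_bitrate_bps(3840, 2160, 30, 8)   # UHDTV, same fps and depth
assert uhd == 4 * hd  # UHDTV is a 4x increase in bandwidth over 1080p
```

Because UHDTV has exactly four times the pixel count of 1080p at the same frame rate and bit depth, the 4x factor quoted above falls out directly.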
One method of achieving higher rate links is by replicating cabling, e.g. by using four 3G-SDI links, with frame tiling or some other multiplexing scheme. However, the cabling replicating method increases cable routing complexity, requires more physical space, and may reduce reliability compared to use of a single cable.
Thus, a codec that can perform compression at relatively low compression ratios (e.g. 4:1) while retaining a ‘visually lossless’ (i.e. having no perceivable artefacts compared to the original video data) level of performance is required by industry.
Compression ratios may also be expressed as the number of ‘bits per pixel’ (bpp) afforded to the compressed stream, noting that conversion back to a compression ratio requires knowledge of the bit depth of the uncompressed signal, and the chroma format.
For example, 8-bit 4:4:4 video data occupies 24bpp uncompressed, so 4bpp implies a 6:1 compression ratio.
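The bpp-to-ratio conversion described above can be expressed as follows (an illustrative sketch; the function name and defaults are not from the specification):

```python
def compression_ratio(compressed_bpp, bit_depth=8, samples_per_pixel=3.0):
    """Uncompressed bits per pixel divided by compressed bits per pixel.

    samples_per_pixel is 3.0 for 4:4:4, 2.0 for 4:2:2, 1.5 for 4:2:0.
    """
    return (bit_depth * samples_per_pixel) / compressed_bpp

# 8-bit 4:4:4 occupies 24 bpp uncompressed, so 4 bpp implies a 6:1 ratio:
assert compression_ratio(4) == 6.0
```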
Video data includes one or more colour channels. Generally there is one primary colour channel and two secondary colour channels. The primary colour channel is generally referred to as the ‘luma’ channel and the secondary colour channel(s) are generally referred to as the ‘chroma’ channels. The luma channel captures the intensity information of the pixel and is typically denoted using the letter “Y”. When viewed, the image of the luma channel appears as a black and white (greyscale) image of the scene. The colour (hue) information is captured in the two chroma channels. The chroma channels are denoted using the letter “C”. The letter “C” denoting chroma channels is often used in combination
with a colour axis specific subscript, the most common example being Cr and Cb which are used to indicate “red-green” and “blue-yellow” chroma axes respectively. A colour transform describes the conversion between a YCC (luma, chroma) and an RGB (red, green, blue) pixel representation. A colour transform typically takes the form of a matrix operation applied to a vector of the pixel’s channel values. A number of different colour transforms are known in the art, some of which are exactly reversible in integer arithmetic, and are widely employed in image and video compression and transmission. A reorganisation of RGB values to GBR may in some cases be used as a colour transform where the G channel is subsequently treated as if the G channel were a luma channel and the B and R channels are treated as if they were chroma channels.
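One well-known example of a colour transform that is exactly reversible in integer arithmetic is the reversible colour transform (RCT) of JPEG 2000. It is shown below purely as an illustration of the class of transform described; it is not necessarily the transform used by the described arrangements:

```python
def rct_forward(r, g, b):
    """JPEG 2000 reversible colour transform (integer, exactly invertible)."""
    y = (r + 2 * g + b) >> 2   # floor((R + 2G + B) / 4)
    cb = b - g                 # "blue" chroma difference
    cr = r - g                 # "red" chroma difference
    return y, cb, cr

def rct_inverse(y, cb, cr):
    g = y - ((cb + cr) >> 2)   # recover G exactly (>> is floor division)
    b = cb + g
    r = cr + g
    return r, g, b

# Exact round trip in integer arithmetic:
assert rct_inverse(*rct_forward(200, 50, 7)) == (200, 50, 7)
```

Note that Python's `>>` performs floor division on negative values, which matches the floor operation the RCT requires for exact invertibility.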
The transform from RGB to YCC improves compressability of the video data in two ways. Firstly, the transform from RGB to YCC achieves some decorrelation to improve the effectiveness of the subsequent transform coding. Secondly, human visual sensitivity to fine detail is typically greater for the luma channel than for the chroma channels. The greater human sensitivity to the luma channel means that chroma components can incur more loss, and hence more compression, for the same level of visual loss.
Video data is also represented using a particular chroma sampling format. The luma channel and the chroma channels are spatially sampled at the same spatial density when a 4:4:4 chroma format is in use. For screen content, a commonly used chroma format is 4:4:4, as generally LCD panels provide pixels in a 4:4:4 chroma format. Other chroma sampling formats are also possible. For example, if the chroma channels are sampled at half the rate horizontally (compared to the luma channel), a 4:2:2 chroma sampling format is said to be in use. Also, if the chroma channels are sampled at half the rate horizontally and vertically (compared to the luma channel), a 4:2:0 chroma sampling format is said to be in use. These chroma sampling formats exploit the characteristic of the human visual system that sensitivity to intensity is higher than sensitivity to colour.
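The average number of stored samples per pixel implied by each of the chroma sampling formats above can be tabulated as follows (illustrative only):

```python
def samples_per_pixel(chroma_format):
    """Average samples stored per pixel: one luma sample plus subsampled chroma."""
    return {
        "4:4:4": 3.0,  # chroma at full resolution
        "4:2:2": 2.0,  # chroma halved horizontally
        "4:2:0": 1.5,  # chroma halved horizontally and vertically
    }[chroma_format]

assert samples_per_pixel("4:2:2") == 2.0
```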
While 4:2:0 and 4:2:2 chroma sampling formats are widely employed in distribution codecs, they are less applicable to studio environments, where multiple generations of encoding and decoding are common. Also, for screen content the use of chroma formats other than 4:4:4 can be problematic as distortion is introduced to sub-pixel rendered (or ‘anti-aliased’) text and sharp object edges.
Colour channels are also associated with a bit-depth. The bit-depth defines the size, in bits, of samples in the respective colour channel, which determines a range of available
sample values. Generally, all colour channels have the same bit-depth, although the colour channels may alternatively have different bit-depths.
Frame data may also contain a mixture of screen content and camera captured content. For example, a computer screen may include various windows, icons and control buttons, text, and also contain a video being played, or an image being viewed. The content, in terms of the entirety of a computer screen, can be referred to as ‘mixed content’. Moreover, the level of detail (or ‘texture’) of the content varies within a frame. Generally, regions of detailed textures (e.g. foliage, text), or regions containing noise (e.g. from a camera sensor) are difficult to compress. The detailed textures can only be coded at a low compression ratio without losing detail. Conversely, regions with little detail (e.g. flat regions, sky, background from a computer application) can be coded with a high compression ratio, with little loss of detail.
In terms of low complexity, one method is application of a ‘Wavelet’ transform, applied hierarchically across an image. Wavelet transforms have been studied in the context of the JPEG2000 image coding standard. The application of a wavelet transform across an image differs from a transform using a block-based codec, such as H.264/AVC. Application of H.264/AVC for example applies numerous discrete cosine transforms (DCTs) across the spatial extent of each frame. Each block in H.264/AVC is predicted using one of a variety of methods, achieving a high degree of local adaption, at a price of increased encoder complexity due to the need for more decisions to be made. In contrast, the Wavelet transform is applied over a wide spatial area, and thus the prediction modes available to a block based codec are generally not applicable, resulting in a greatly reduced disparity in the complexity of the encoder and the decoder.
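As a minimal illustration of one wavelet decomposition step in integer arithmetic, the Haar-style ‘S-transform’ below splits a 1-D signal into a low-pass (floor-average) half and a high-pass (difference) half and is exactly invertible. Practical codecs such as JPEG2000 typically use longer lifting filters (e.g. LeGall 5/3), so this sketch is illustrative only and does not represent the transform of the described arrangements:

```python
def s_transform_forward(x):
    """One level of the integer S-transform on an even-length sequence."""
    low = [(a + b) >> 1 for a, b in zip(x[0::2], x[1::2])]   # floor averages
    high = [a - b for a, b in zip(x[0::2], x[1::2])]         # differences
    return low, high

def s_transform_inverse(low, high):
    """Exact inverse of s_transform_forward."""
    out = []
    for s, d in zip(low, high):
        a = s + ((d + 1) >> 1)  # recover first sample of the pair
        b = a - d               # recover second sample
        out += [a, b]
    return out

x = [3, 5, 10, 7, 0, -4]
assert s_transform_inverse(*s_transform_forward(x)) == x
```

Applying such a split first to rows and then to columns, and recursing on the low-pass band, yields the hierarchical sub-band structure discussed in the following paragraphs.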
In the context of wavelet-based compression techniques, achieving high visual quality and useful compression at low complexity is difficult. Achieving high visual quality and useful compression at low complexity is particularly difficult when strict local rate control is needed to meet ultra-low latency requirements. In a known method, the locations of zero coefficients arising from quantisation are coded efficiently by exploiting the tree structure of the wavelet transform. However, the known approach exploiting the tree structure of the wavelet transform requires extensive memory access, and is accordingly difficult to achieve with low complexity hardware. In another known method, blocks of wavelet coefficients are coded in bit-plane order using a context adaptive arithmetic coder. The bit-serial processing using a context adaptive arithmetic encoder also makes implementation in low complexity hardware difficult. In one example of a known
low complexity wavelet codec, only the significant bit-planes of relatively small groups of coefficients are transmitted without specific compression processing. Compression is achieved because many coefficient groups have low magnitude values that require only a few bits to represent. Further, the set of most significant bit-plane indexes for each coefficient group can be subject to lossless compression processing at reduced cost because there are fewer index values than coefficients. However, the known low complexity wavelet codec method achieves only limited compression.
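The low complexity codec described above keys the coded size of each small coefficient group off its most significant bit-plane index. A sketch of how such an index might be computed (illustrative names; not taken from the specification):

```python
def msb_plane_index(coeffs):
    """Index of the highest bit-plane containing a set bit across the group.

    Returns -1 for an all-zero group, for which no planes need be transmitted.
    """
    max_mag = max(abs(c) for c in coeffs)
    return max_mag.bit_length() - 1

assert msb_plane_index([0, 0, 0, 0]) == -1
assert msb_plane_index([3, -9, 1, 0]) == 3   # |-9| = 0b1001, top bit is plane 3
```

A group whose index is small contributes few bit-planes, which is the source of the compression noted above; losslessly coding the indexes themselves is cheap because there is only one index per group.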
SUMMARY
It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
One aspect of the present disclosure provides a method for decoding a precinct of video data from a bit-stream, the method comprising: determining a most significant bit plane index and a truncation index for a group of coefficients in a wavelet subband of the precinct of the video data; determining a portion of the bit-stream corresponding to a plurality of bit-planes of the group of coefficients based on the most significant bit plane index, the truncation index and a fixed length of each of the plurality of bit planes, the determined portion comprising a plurality of code-words of the fixed length; determining the most significant bit-plane and a sign for a coefficient in the group of coefficients by decoding a code-word corresponding to the most significant bit plane from the plurality of code-words, the code-word being decoded based on determining that the code-word comprises a sign bit for the coefficient; determining a further bit-plane by decoding a remaining code-word in the plurality of code-words; and decoding the precinct using the decoded code-words and signs.
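The portion determination recited above amounts to simple arithmetic: the number of coded bit-planes is the gap between the most significant bit-plane index and the truncation index, and each plane occupies one fixed-length code-word. A hedged sketch (the layout assumed here is an illustration, not a definition of the actual bit-stream format):

```python
def bitplane_portion_length(msb_index, truncation_index, codeword_bits):
    """Bits occupied by the fixed-length code-words of one coefficient group.

    One code-word of `codeword_bits` bits is assumed per bit-plane from the
    most significant plane down to (but not including) the truncation plane.
    """
    n_planes = max(0, msb_index - truncation_index)
    return n_planes * codeword_bits

# e.g. MSB plane 7, truncation plane 3, 4-bit code-words -> 4 planes, 16 bits:
assert bitplane_portion_length(7, 3, 4) == 16
# a fully truncated group contributes nothing:
assert bitplane_portion_length(2, 5, 4) == 0
```

Because every quantity on the right-hand side is known before any code-word is parsed, a decoder can locate each group's portion of the bit-stream without sequentially decoding its neighbours.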
In another aspect, the fixed length code-word for the most significant bit-plane encodes a position of a set bit within the most significant bit-plane and a sign for the coefficient at the position.
In another aspect, a code-word for a most significant bit-plane for a further group of coefficients does not contain sign data.
In another aspect, the most significant bit-plane for the further group of coefficients is decoded using the corresponding code-word and a rotation bit stored in a variable length portion of the bit-stream.
In another aspect, decoding the most significant bit-plane for the further group of coefficients comprises: determining the variable length portion of the bit-stream
corresponding to a plurality of rotation bits of most significant bit-planes of groups of coefficients; determining a value of the rotation bit in the variable length portion of the bit-stream based on the code-word for the most significant bit-plane for the further group of coefficients; determining a plurality of bits for the most significant bit-plane for the further group of coefficients using the code-word; and modifying the determined plurality of bits by applying a rotation amount corresponding to the determined value of the rotation bit.
In another aspect, the method further comprises: determining a further portion of the bit-stream corresponding to a plurality of sign bits of the group of coefficients based on the number of non-zero coefficient values in the group of coefficients; and determining sign information for coefficients within the group of coefficients by reading sign bits from the further portion of the bit-stream.
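The sign handling recited above can be sketched as follows; the one-sign-bit-per-non-zero-coefficient packing and the ‘1 means negative’ convention are assumptions for illustration only:

```python
def read_sign_bits(sign_bits, magnitudes):
    """Attach signs to decoded magnitudes.

    Exactly one sign bit is consumed per non-zero magnitude, so the size of
    the sign portion is determined by the count of non-zero coefficients.
    """
    it = iter(sign_bits)
    return [(-m if next(it) else m) if m != 0 else 0 for m in magnitudes]

# Three non-zero magnitudes consume exactly three sign bits:
assert read_sign_bits([1, 0, 1], [5, 0, 2, 7]) == [-5, 0, 2, -7]
```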
In another aspect, at least two groups of coefficients are decoded in parallel using a plurality of threads.
In another aspect, each thread of the plurality of threads reads a respective portion of the bit-stream for a corresponding group of coefficients to be decoded.
In another aspect, the code-word is determined to comprise a sign bit for the coefficient by examining a bit value of the code-word at a predetermined position.
Another aspect of the present disclosure provides a non-transitory computer readable medium having a program stored thereon for decoding a precinct of video data from a bit-stream, the program comprising: code for determining a most significant bit plane index and a truncation index for a group of coefficients in a wavelet subband of the precinct of the video data; code for determining a portion of the bit-stream corresponding to a plurality of bit-planes of the group of coefficients based on the most significant bit plane index, the truncation index and a fixed length of each of the plurality of bit planes, the determined portion comprising a plurality of code-words of the fixed length; code for determining the most significant bit-plane and a sign for a coefficient in the group of coefficients by decoding a code-word corresponding to the most significant bit plane from the plurality of code-words, the code-word being decoded based on determining that the code-word comprises a sign bit for the coefficient; code for determining a further bit-plane by decoding a remaining code-word in the plurality of code-words; and code for decoding the precinct using the decoded code-words and signs.
Another aspect of the present disclosure provides a system for decoding a precinct of video data from a bit-stream, the system comprising: a memory for storing data and a computer readable medium; a processor coupled to the memory for executing a computer program, the program
having instructions for: determining a most significant bit plane index and a truncation index for a group of coefficients in a wavelet subband of the precinct of the video data; determining a portion of the bit-stream corresponding to a plurality of bit-planes of the group of coefficients based on the most significant bit plane index, the truncation index and a fixed length of each of the plurality of bit planes, the determined portion comprising a plurality of code-words of the fixed length; determining the most significant bit-plane and a sign for a coefficient in the group of coefficients by decoding a code-word corresponding to the most significant bit plane from the plurality of code-words, the code-word being decoded based on determining that the code-word comprises a sign bit for the coefficient;
determining a further bit-plane by decoding a remaining code-word in the plurality of code-words; and decoding the precinct using the decoded code-words and signs.
Another aspect of the present disclosure provides a video decoder configured to: receive a precinct of video data from a bit-stream; determine a most significant bit plane index and a truncation index for a group of coefficients in a wavelet subband of the precinct of the video data; determine a portion of the bit-stream corresponding to a plurality of bit-planes of the group of coefficients based on the most significant bit plane index, the truncation index and a fixed length of each of the plurality of bit planes, the determined portion comprising a plurality of code-words of the fixed length; determine the most significant bit-plane and a sign for a coefficient in the group of coefficients by decoding a code-word corresponding to the most significant bit plane from the plurality of code-words, the code-word being decoded based on determining that the code-word comprises a sign bit for the coefficient; determine a further bit-plane by decoding a remaining code-word in the plurality of code-words; and decode the precinct using the decoded code-words and signs.
Another aspect of the present disclosure provides a bit-stream representing an encoded precinct of video data, comprising: a first portion, the first portion comprising (i) a fixed length code-word representing data at a most significant bit-plane index of a group of coefficients of the precinct and (ii) bit-planes below the most significant bit -plane index and above a truncation index of the precinct, and a sign portion representing sign bits associated with the group of coefficients, wherein the code-word representing data at the most significant bit-plane index includes one bit of sign information relating to a coefficient, and the sign portion is determined based on presence of non-zero coefficients below the truncation index.
In another aspect, the first portion further comprises a further fixed length code-word representing data at a most significant bit-plane index of a further group of coefficients of the precinct, and the further fixed length code-word does not contain sign information. In another aspect, the further code-word does not fully specify the most significant bit-plane of the further group of coefficients of the precinct, the bit-stream further comprising: a further portion representing presence of a rotation of bits in the further code-word, the further portion being present if any most significant bit-plane data is not fully specified by the code-word.
Other aspects are also disclosed.
BRIEF DESCRIPTION OF THE DRAWINGS
At least one embodiment of the present invention will now be described with reference to the following drawings and appendices, in which:
Fig. 1 is a schematic block diagram showing a sub-frame latency video encoding and decoding system;
Figs. 2A and 2B form a schematic block diagram of a general purpose computer system upon which one or both of the video encoding and decoding systems of Fig. 1 may be practiced;
Fig. 3 is a schematic flow diagram showing a method for encoding video data with sub-frame latency;
Fig. 4A is a schematic block diagram showing the structure of a wavelet transform suitable for achieving a low latency conversion of pixel values to wavelet coefficients;
Fig. 4B is a diagram showing a logical organisation of wavelet transform coefficients into sub-bands;
Fig. 4C is a diagram showing an in-memory organisation of an incremental output of a wavelet transform processor;
Fig. 5 shows a set of coefficient groups and corresponding characteristics;
Fig. 6 is a schematic flow diagram showing a method of determining truncation bit plane indices for sub-bands using a rate allocation model;
Fig. 7 is a graph providing a visual depiction of how unused bit-budget from a coded unit of video data within a frame can be redistributed for use in a next coded unit of video data;
Figs. 8A to 8D show tables and pseudo-code for implementing a variable length coding and decoding method;
Fig. 9 is a schematic flow diagram showing a method for decoding video data with sub-frame latency;
Figs. 10A(1-2) show different possible arrangements for a forward wavelet transform;
Fig. 10B is a schematic block diagram showing a structure of an inverse wavelet transform suitable for achieving a low latency conversion of wavelet coefficients to pixel values;
Fig. 11 is a schematic block diagram showing an example of functional modules for implementing a sub-frame latency video encoder and inter-connection of the functional modules;
Fig. 12 is a schematic block diagram showing an example of functional modules for implementing a sub-frame latency video decoder and inter-connection of the functional modules;
Fig. 13A shows a table implementing a modification of the variable length coding scheme;
Fig. 13B shows an example of three coefficient groups being coded using the variable length coding scheme of Fig. 13A;
Fig. 14 is a schematic flow diagram showing a method of encoding a group of coefficients using the variable length coding scheme of Fig. 13A;
Fig. 15 is a schematic flow diagram showing a method of decoding a group of coefficients using the variable length coding scheme of Fig. 13A;
Figs. 16A-C show data analysis supporting the use of variable length coding and indicating an advantage of restricting use of variable length coding according to measurable properties of a coefficient group;
Fig. 17 is a schematic flow diagram showing a method of determining the most significant nibble for a coefficient group; and
Fig. 18 shows a table of base threshold values used by the method of Fig. 17.
DETAILED DESCRIPTION INCLUDING BEST MODE
Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have
for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
It is to be noted that the discussions contained in the Background section and that above relating to prior art arrangements relate to discussions of documents or devices which may form public knowledge through their respective publication and/or use. Such discussions should not be interpreted as a representation by the inventor or the patent applicant that such documents or devices in any way form part of the common general knowledge in the art.
The arrangements described relate to using a variable length code to encode the most significant bit-plane of coefficients within groups of wavelet coefficients.
Fig. 1 is a schematic block diagram showing functional modules of an example of a sub-frame latency video encoding and decoding system 100. The system 100 transfers video data from a source device 110 to a destination device 130 via a communication channel 120, for example a cable. A video source 112 typically comprises a source of uncompressed video data 113. The video source 112 can be an imaging sensor, a previously captured video sequence stored on a non-transitory recording medium, or a video feed from a remote imaging sensor, for example. The uncompressed video data 113 is conveyed from the video source 112 to a video encoder 114 over a CBR channel, with fixed timing of the delivery of the video data. Generally, the video data is delivered in a raster scan format, with signalling to delineate between lines (‘horizontal sync’) and frames (‘vertical sync’). The video source 112 may also be the output of a computer graphics card, for example displaying the video output of an operating system and various applications executing upon a computing device. The computing device can for example be a tablet computer, laptop or desktop computer. Content output by a graphics card is an example of ‘screen content’. Examples of source devices 110 that may include an image capture sensor as the video source 112 include smart-phones, video camcorders and network video cameras, and the like. As screen content may include smoothly rendered graphics and playback of natural content in various regions, screen content is also commonly a form of ‘mixed content’. The video encoder 114 converts the uncompressed video data 113 from the video source 112 into an encoded (compressed) video data bitstream 115 as described hereinafter in more detail with reference to Fig. 3.
The video encoder 114 encodes the incoming uncompressed video data 113. The video encoder 114 is required to process the incoming sample data in real-time. That is, the video encoder 114 is not able to stall the incoming uncompressed video data 113 (for
example, if the rate of processing the incoming data were to fall below the input data rate). The video encoder 114 outputs compressed video data 115 (the ‘bit-stream’) at a constant bit rate. In a video streaming application, the entire bit-stream is not stored in any one location. Instead, minimum coded units (MCU’s) of compressed video data, corresponding to spatially contiguous groups of pixels, are continually being produced by the video encoder 114 and consumed by a video decoder 134 with intermediate storage, for example in the (CBR) communication channel 120. The CBR stream of compressed video data 115 is transmitted by the transmitter 116 over the communication channel 120. Examples of the communication channel 120 include one or more SDI, HDMI or DisplayPort links, as well as other twisted pair links such as CAT5 (or similar) Ethernet cable, or optic fibre links. The communication channel 120 can also be a radio connection such as that provided by WiFi (IEEE 802.11) or Bluetooth™. Alternatively, the communication channel can be an internal bus within a system such as a PCI, VESA or SATA or a chip interface such as a MIPI M-PHY physical layer interface.
Video data may be transferred from the source device 110 to the destination device
130 via an intermediate device 125 such as a non-transitory storage device using communication channels 121 and 122. A storage unit 127 such as a “Flash” memory or a hard disk drive can be used in the intermediate device 125, for example to provide a live delay for implementing a dump box functionality. In the intermediate device 125 where digital storage is implemented, a receiver 132 may convert the signal received from physical link 121 back to the encoded digital form as generated by the video encoder 114. The encoded digital form is written to the storage unit 127. A transmitter 116 is used to retransmit the video data over physical link 122.
The destination device 130 includes a receiver 132, a video decoder 134 and a video sink such as a display device 136. The receiver 132 receives encoded video data from the communication channel 120 (or from the channel 122) and passes received compressed video data 133 to the video decoder 134. The video decoder 134 outputs decoded frame data 135 to the video sink 136. Examples of the video sink 136 include a video display device such as a cathode ray tube, a liquid crystal display (such as in smart-phones), tablet computers, computer monitors or stand-alone television sets, and the like. The video sink 136 can be any other consumer of video data, such as a video processing unit, an encoder or a streaming server.
It is also possible for the functionality of each of the source device 110 and the destination device 130 to be embodied in a single device, examples of which include
mobile telephone handsets and tablet computers, or equipment within a broadcast studio including overlay insertion or other live editing units.
The physical link 120 over which the video frames are delivered may be, for example, part of an SDI interface. Interfaces such as SDI have sample timing synchronised to a clock source, with horizontal and vertical blanking periods. As such, samples of the decoded video need to be delivered in accordance with the frame timing of the SDI link. Video data formatted for transmission over SDI may also be conveyed over Ethernet, for example using methods as specified in SMPTE ST. 2022-6. In the event that samples are not delivered according to the required timing, noticeable visual artefacts result (e.g. from invalid data being interpreted as sample values by the downstream device). Accordingly, the video encoder 114 and decoder 134 implement rate control and buffer management mechanisms to ensure that no buffer underruns and resulting failure to deliver decoded video occur.
Rate variations may arise during compression due to variations in the complexity and time taken for the encoder 114 to search possible modes of the incoming video data 113. Accordingly, the rate control mechanism ensures that decoded video frames 135 from the video decoder 134 are delivered according to the timing of the interface over which the video frames are delivered. A similar constraint exists for the inbound link to the video encoder 114, which needs to encode samples in accordance with arrival timing and may not stall the incoming video data 113 to the video encoder 114 (for example due to varying processing demand for encoding different regions of a frame). To meet the constraints, the video encoder 114 and the video decoder 134 typically implement some buffering of video data. The buffering, at both the encoder 114 and the decoder 134, increases end to end latency of the video transmission. As described above, the video encoding and decoding system 100 has a latency of less than one frame of video data. In particular, some applications require latencies as low as thirty-two (32) lines of video data from the input of the video encoder 114 to the output of the video decoder 134. The latency may include time taken during input/output of video data and storage of partially-coded video data prior to and after transit over a communications channel.
The system 100 includes the source device 110 and the destination device 130. The communication channel 120 is used to communicate encoded video information from the source device 110 to the destination device 130. In some arrangements, the source device 110 and the destination device 130 may comprise respective broadcast studio equipment, such as overlay insertion and real-time editing modules, in which case the communication
channel 120 may be an SDI link. In other arrangements, the source device 110 and the destination device 130 may comprise a graphics driver as part of a system-on-chip (SOC) and an LCD panel (e.g. as found in a smart phone, tablet or laptop computer), in which case the communication channel 120 is typically a wired channel, such as printed circuit board (PCB) tracks and associated connectors.
Moreover, the source device 110 and the destination device 130 may comprise any of a wide range of devices, including devices supporting over the air television broadcasts, cable television applications, internet video applications and applications where encoded video data is captured on some storage medium or a file server. The source device 110 may also be a digital camera capturing video data and outputting the video data in a compressed format offering visually lossless compression, and as such the performance may be considered as equivalent to a truly lossless format (e.g. uncompressed).
Notwithstanding the example devices mentioned above, each of the source device 110 and the destination device 130 may be configured within a general purpose computing system, typically through a combination of hardware and software components.
Fig. 2A illustrates a typical computer system 200, which includes: a computer module 201; input devices such as a keyboard 202, a mouse pointer device 203, a scanner 226, a camera 227 and a microphone 280; and output devices including a printer 215, a display device 214, which may be configured as the display device 136, and loudspeakers 217. The camera 227 may be configured as the video source 112. An external Modulator-Demodulator (Modem) transceiver device 216 may be used by the computer module 201 for communicating to and from a communications network 220 via a connection 221. The communications network 220, which may represent the communication channel 120, can be a wide-area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 221 is a telephone line, the modem 216 may be a traditional “dial-up” modem. Alternatively, where the connection 221 is a high capacity (e.g., cable) connection, the modem 216 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 220. In some arrangements, the transceiver device 216 may provide the functionality of the transmitter 116 and the receiver 132 and the communication channel 120 may be embodied in the connection 221.
The computer module 201 typically includes at least one processor unit 205, and a memory unit 206. For example, the memory unit 206 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer
module 201 also includes a number of input/output (I/O) interfaces including: an audio-video interface 207 that couples to the video display 214, loudspeakers 217 and microphone 280; an I/O interface 213 that couples to the keyboard 202, mouse 203, scanner 226, camera 227 and optionally a joystick or other human interface device (not illustrated); and an interface 208 for the external modem 216 and printer 215. The signal from the audio-video interface 207 to the computer monitor 214 is generally the output of a computer graphics card and provides an example of ‘screen content’.
In some implementations, the modem 216 may be incorporated within the computer module 201, for example within the interface 208. The computer module 201 also has a local network interface 211, which permits coupling of the computer system 200 via a connection 223 to a local-area communications network 222, known as a Local Area Network (LAN). As illustrated in Fig. 2A, the local communications network 222 may also couple to the wide network 220 via a connection 224, which would typically include a so-called “firewall” device or device of similar functionality. The local network interface
211 may comprise an Ethernet™ circuit card, a Bluetooth™ wireless arrangement or an
IEEE 802.11 wireless arrangement. However, numerous other types of interfaces may be practiced for the interface 211. The local network interface 211 may also provide the functionality of the transmitter 116 and the receiver 132 and communication channel 120 may also be embodied in the local communications network 222.
The I/O interfaces 208 and 213 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 209 are provided and typically include a hard disk drive (HDD) 210. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 212 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g. CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the computer system 200. Typically, any of the HDD 210, optical drive 212, networks 220 and 222 may also be configured to operate as the video source 112, as an intermediate storage device 127, or as a destination for decoded video data to be stored for reproduction via the display 214. The source device 110, intermediate device 125 and the destination device 130 of the system 100, may be embodied in the computer system 200.
The components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204 and in a manner that results in a conventional mode of operation of the computer system 200 known to those in the relevant art. For example, the processor 205 is coupled to the system bus 204 using a connection 218. Likewise, the memory 206 and optical disk drive 212 are coupled to the system bus 204 by connections 219. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun SPARCstations, Apple Mac™ or similar computer systems.
Where appropriate or desired, the video encoder 114 and the video decoder 134, as well as methods described below, may be implemented using the computer system 200 wherein the video encoder 114, the video decoder 134 and methods to be described, may be implemented as one or more software application programs 233 executable within the computer system 200. In particular, the video encoder 114, the video decoder 134 and the steps of the described methods are effected by instructions 231 (see Fig. 2B) in the software 233 that are carried out within the computer system 200. The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules perform the described methods and a second part and the corresponding code modules manage a user interface between the first part and the user.
The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 200 from the computer readable medium, and then executed by the computer system 200.
A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program product in the computer system 200 preferably achieves an advantageous apparatus for implementing the video encoder 114, the video decoder 134 and the described methods.
The software 233 is typically stored in the HDD 210 or the memory 206. The software is loaded into the computer system 200 from a computer readable medium, and executed by the computer system 200. Thus, for example, the software 233 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 225 that is read by the optical disk drive 212.
In some instances, the application programs 233 may be supplied to the user encoded on one or more CD-ROMs 225 and read via the corresponding drive 212, or alternatively may be read by the user from the networks 220 or 222. Still further, the software can also be loaded into the computer system 200 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 200 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray Disc™, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a
PCMCIA card and the like, whether or not such devices are internal or external of the computer module 201. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of the software, application programs, instructions and/or video data or encoded video data to the computer module 201 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
The second part of the application programs 233 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 214. Through manipulation of typically the keyboard 202 and the mouse 203, a user of the computer system 200 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 217 and user voice commands input via the microphone 280.
Fig. 2B is a detailed schematic block diagram of the processor 205 and a “memory” 234. The memory 234 represents a logical aggregation of all the memory modules (including the HDD 210 and semiconductor memory 206) that can be accessed by the computer module 201 in Fig. 2A.
When the computer module 201 is initially powered up, a power-on self-test (POST) program 250 executes. The POST program 250 is typically stored in a ROM 249 of the semiconductor memory 206 of Fig. 2A. A hardware device such as the ROM 249 storing software is sometimes referred to as firmware. The POST program 250 examines hardware within the computer module 201 to ensure proper functioning and typically
checks the processor 205, the memory 234 (209, 206), and a basic input-output systems software (BIOS) module 251, also typically stored in the ROM 249, for correct operation. Once the POST program 250 has run successfully, the BIOS 251 activates the hard disk drive 210 of Fig. 2A. Activation of the hard disk drive 210 causes a bootstrap loader program 252 that is resident on the hard disk drive 210 to execute via the processor 205. This loads an operating system 253 into the RAM memory 206, upon which the operating system 253 commences operation. The operating system 253 is a system level application, executable by the processor 205, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.
The operating system 253 manages the memory 234 (209, 206) to ensure that each process or application running on the computer module 201 has sufficient memory in which to execute without colliding with memory allocated to another process.
Furthermore, the different types of memory available in the computer system 200 of Fig.
2A need to be used properly so that each process can run effectively. Accordingly, the aggregated memory 234 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 200 and how such is used.
As shown in Fig. 2B, the processor 205 includes a number of functional modules including a control unit 239, an arithmetic logic unit (ALU) 240, and a local or internal memory 248, sometimes called a cache memory. The cache memory 248 typically includes a number of storage registers 244-246 in a register section. One or more internal busses 241 functionally interconnect these functional modules. The processor 205 typically also has one or more interfaces 242 for communicating with external devices via the system bus 204, using a connection 218. The memory 234 is coupled to the bus 204 using a connection 219.
The application program 233 includes a sequence of instructions 231 that may include conditional branch and loop instructions. The program 233 may also include data 232 which is used in execution of the program 233. The instructions 231 and the data 232 are stored in memory locations 228, 229, 230 and 235, 236, 237, respectively. Depending upon the relative size of the instructions 231 and the memory locations 228-230, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 230. Alternatively, an instruction may be
segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 228 and 229.
In general, the processor 205 is given a set of instructions which are executed therein. The processor 205 waits for a subsequent input, to which the processor 205 reacts to by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 202, 203, data received from an external source across one of the networks 220, 222, data retrieved from one of the storage devices 206, 209 or data retrieved from a storage medium 225 inserted into the corresponding reader 212, all depicted in Fig. 2A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 234.
A video encoding method 300 shown in Fig. 3 (which may be used to implement the video encoder 114) and a video decoding method 900 shown in Fig. 9 (which may be used to implement the video decoder 134) may use input variables 254, which are stored in the memory 234 in corresponding memory locations 255, 256, 257. The video encoder
114, the video decoder 134 and the described methods produce output variables 261, which are stored in the memory 234 in corresponding memory locations 262, 263, 264. Intermediate variables 258 may be stored in memory locations 259, 260, 266 and 267.
Referring to the processor 205 of Fig. 2B, the registers 244, 245, 246, the arithmetic logic unit (ALU) 240, and the control unit 239 work together to perform sequences of micro-operations needed to perform “fetch, decode, and execute” cycles for every instruction in the instruction set making up the program 233. Each fetch, decode, and execute cycle comprises:
(a) a fetch operation, which fetches or reads an instruction 231 from a memory location 228, 229, 230;
(b) a decode operation in which the control unit 239 determines which instruction has been fetched; and
(c) an execute operation in which the control unit 239 and/or the ALU 240 execute the instruction.
Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 239 stores or writes a value to a memory location 232.
Each step or sub-process in the methods of Figs. 3, 6 and 9, to be described, is associated with one or more segments of the program 233 and is typically performed by
the register section 244, 245, 246, the ALU 240, and the control unit 239 in the processor 205 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 233. Although the execution of each of the methods is described with reference to a single computer system
200 and the corresponding component parts, each of the encoding and decoding methods may execute on distinct processors residing in distinct end equipment. That is, the source device 110, destination device 130 and intermediate device 125 may be implemented as physically distinct devices, each comprising physically distinct computer systems.
Fig. 3 is a schematic flow diagram showing the method 300 of encoding an image, typically a video frame. The method 300 performs encoding using a variable length code for words formed from the most significant bit-planes of coefficient groups. The method 300 is typically implemented as one or more modules of the software application 233, controlled by execution of the processor 205, and stored in the memory 206. The method 300 can be implemented by the video encoder 114.
The method 300 is applied to each frame of a video sequence (for example of the video data 113) independently in order to provide flexibility for editing. Processing of a frame using the method 300 begins at an initialising step 301. The step 301 executes by initialising coding structures including input precision, position indexes and buffers that may reside in the working memory 206 of the computer module 201. Compression of the frame data is able to begin when the method 300 continues to a pixel transform step 302. Execution of the step 302 ultimately generates wavelet coefficients in a sign plus magnitude format. The pixel transform step 302 is typically implemented as a pipeline in order to minimise buffering and latency within the system 100. The pipeline stages are represented by steps 305, 310, 315, 320 and 325. The step 302 starts at the step 305 and accepts (reads or otherwise receives into the memory 206), raw pixel data for a video frame and applies a precision conversion. The precision conversion step 305 creates image samples 306 with a pre-determined “working” precision and range that is independent of the actual input image precision. In practice, the step 305 is typically achieved by shifting the bits in the input words an amount determined by the input and working precisions, and then subtracting an amount to centre the working range about zero. The method of step 305 is possible because a bit-wise left shift of bits in a binary number is mathematically equivalent to multiplication by a power of 2 (i.e. a « n = a × 2ⁿ) and a bit-wise right shift is mathematically equivalent to division by a power of 2 (i.e. a » n = a / 2ⁿ). Specifically the precision conversion step 305 converts pixel values according to:
p_w = 2^(B_w − B_in) · p_in − 2^(B_w − 1)    Equation (1)

In Equation (1), p_in and p_w are channel values of the input pixel and working pixel respectively and B_in and B_w are the number of bits of precision in the input and working pixels respectively.
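For illustration, the shift-and-centre conversion of Equation (1) can be sketched as follows (a minimal sketch; the function name is illustrative and a working precision of at least the input precision is assumed):

```python
def to_working_precision(p_in, b_in, b_w):
    # Equation (1): p_w = 2^(B_w - B_in) * p_in - 2^(B_w - 1).
    # The left shift raises the sample to working precision and the
    # subtraction centres the working range about zero. Assumes b_w >= b_in.
    return (p_in << (b_w - b_in)) - (1 << (b_w - 1))
```

For example, an 8-bit sample of 255 raised to a 16-bit working precision maps to 32512, while a sample of 0 maps to −32768, centring the range about zero.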
The method 300 continues under execution of the processor 205 to the step 310. At step 310, the working precision pixel values are subject to a colour transform to yield pixel values that can be treated as YCC samples. An integer reversible colour transform is used at step 310 and is defined mathematically by:
    [ Y  ]          [  1   2   1 ] [ R ]
    [ Cg ]  = 1/4 · [ -1   2  -1 ] [ G ]        Equation (2)
    [ Co ]          [  2   0  -2 ] [ B ]
Other colour transforms known in the art, either integer reversible or approximate, may also be used for step 310 so long as the forward and inverse transform pair are correctly matched between the encoder and the decoder. The colour transform step 310 can be bypassed if the raw pixel data already comprises YCC samples.
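For illustration only, a forward colour transform of this general form can be sketched as below. The analysis matrix (1/4)·[[1, 2, 1], [−1, 2, −1], [2, 0, −2]] is an assumption about Equation (2), and this floating-point version is not the integer-reversible form: a bit-exact codec would realise the same transform with integer lifting steps.

```python
def rgb_to_ycc(r, g, b):
    # Assumed analysis matrix (1/4) * [[1, 2, 1], [-1, 2, -1], [2, 0, -2]].
    # Floating-point sketch only; an integer-reversible implementation
    # would use lifting with rounding so the inverse is exact.
    y  = (r + 2 * g + b) / 4
    cg = (-r + 2 * g - b) / 4
    co = (2 * r - 2 * b) / 4
    return y, cg, co
```

A grey input (R = G = B) yields zero chroma, which is the behaviour expected of any RGB-to-YCC analysis transform.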
The method 300 continues under execution of the processor 205 to the step 315.
The resulting stream of YCC values 311 are passed to the transform step 315. At step 315, the stream of YCC values are subject to a wavelet transformation. Generally, a 5/3 Le Gall wavelet is used, where 5 and 3 refer to lengths of the high pass and low pass filters used in analysis and synthesis. Other wavelets are also possible, such as a Haar wavelet or a Cohen-Daubechies-Feauveau 9/7 wavelet. Due to the requirement of ultra-low latency, the number of levels of decompositions is highly constrained vertically, generally to not more than two levels. The number of levels of decompositions is relatively unconstrained horizontally, for example with five levels being used. In one arrangement, the wavelet transform of step 315 is implemented according to a wavelet transform arrangement 499 of Fig. 4A. The arrangement 499 is comprised of a series of stages 421, 424, 425, 426, 427, 428 and 429. Each stage applies one level of wavelet decomposition either vertically along columns of pixel values (as in 421) or horizontally along rows of pixel values (as in 424 to 429). The stages 421, 424, 425, 426, 427, 428 and 429 are implemented using a lifting scheme (known in the art) but could equally be implemented using convolutional filters with subsampling (also known in the art). Each stage generates high-pass coefficients (e.g.
422) and low pass coefficients (e.g. 423) as output. A total number of output coefficients 316 is equal to a number of the input pixel values 311. Multiple stages of wavelet decomposition are applied to generate a full set of coefficients 401-408. The coefficients 401-408 are referred to in aggregate as the wavelet transform of the input signal.
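One 1D analysis stage of the 5/3 Le Gall wavelet can be sketched with integer lifting as below. This is a hedged sketch: an even-length input is assumed and the boundary handling simply mirrors the nearest neighbour, whereas a conforming codec would use the symmetric extension its specification mandates.

```python
def lifting_53_forward(x):
    # One 1-D 5/3 Le Gall analysis stage via lifting.
    # Predict: high[j] = x[2j+1] - floor((left + right) / 2)
    # Update:  low[j]  = x[2j]   + floor((h_left + h_right + 2) / 4)
    # Returns (low, high); len(x) is assumed even.
    n = len(x)
    high = []
    for i in range(1, n, 2):  # predict step on odd samples
        left = x[i - 1]
        right = x[i + 1] if i + 1 < n else x[i - 1]  # mirror at the edge
        high.append(x[i] - ((left + right) >> 1))
    low = []
    for j, i in enumerate(range(0, n, 2)):  # update step on even samples
        hl = high[j - 1] if j > 0 else high[0]  # mirror at the edge
        hr = high[j] if j < len(high) else high[-1]
        low.append(x[i] + ((hl + hr + 2) >> 2))
    return low, high
```

On a linear ramp the high-pass output is near zero away from the boundary, reflecting the predict step's exact fit to straight-line signals; Python's `>>` floors negative values, matching the floor operations of the lifting equations.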
Due to the structure of the wavelet decomposition stages 421, 424, 425, 426, 427,
428 and 429, a wavelet coefficient structure 400 depicted in Fig. 4B results from execution of step 315. The structure 400 has a sub-band corresponding to each set of values 401-408 output by each stage of the wavelet transform step 315. For example, the L5L and L5H sub-bands represent the collected outputs of the decomposition stage 429, L4H corresponds to the high pass output of 428 and so on.
Other two-dimensional (2D) wavelet transform structures or arrangements arise from a different cascading of the vertical and horizontal stages and may be used in place of those described by Figs. 4A and 4B. Some examples are given in Figs. 10A(1) and 10A(2). Specifically, two different examples of wavelet transform stage configurations 1091 and
1093 are shown in Figs. 10A(l) and 10A(2) respectively. The configurations 1091 and
1093 are shown along with corresponding sub-band structures 1092 and 1094 respectively. The structure 1092 comprises coefficient sub-bands 1041-1047 and the structure 1094 comprises coefficient sub-bands 1051-1057. In both configurations 1091 and 1093, the transform is comprised of one-dimensional (1D) wavelet transform stages which are applied either vertically (for example 1031a, 1031b) or horizontally (for example 1035a, 1035b). Each stage generates a single high-pass (for example 1032a, 1032b) and a single low-pass (for example 1033a, 1033b) output. The example 1091 is widely used to implement a 2-level 2D wavelet transform.
Following the model of the transform arrangement 499 of Fig. 4A, additional 1D transform stages may be performed on the output 1041 of 1091. Different wavelet transform structures have different latency that is often the result of vertically applied transform stages. In all cases, a corresponding precinct structure mirrors the sub-band structure. Whatever wavelet transform structure is employed in the video encoder 114, the complementary structure is employed in the decoder 134. For example, Fig. 10B depicts an inverse wavelet transform structure 1099 that is complementary to the structure of Fig. 4A. The structure 1099 is described in detail below in the context of a video decoder.
Referring back to Fig. 3, output wavelet coefficients 316 are subject to a further precision conversion as the method 300 progresses to the step 320. The conversion at step
320 rounds the coefficient values into a working range of the compression engine and is defined mathematically as
C_c = (C_w + (1 « (B_Δ − 1))) » B_Δ = (C_w + 2^(B_Δ − 1)) / 2^(B_Δ)    Equation (3)
In Equation (3), C_c and C_w are the coefficient values normalised to compressor and working precision respectively. B_Δ = B_w − B_c is a difference between the number of bits used to represent values in working and compressor precision and « and » denote the bit-wise left and right shift operators, mathematically equivalent to multiplication and division by a power of 2 (that is a « n = a × 2ⁿ and a » n = a / 2ⁿ). A resulting stream of compressor precision wavelet coefficients 321 is generated and the method 300 continues to the step 325. The compressor precision wavelet coefficients 321 are passed to the step 325. The coefficients 321 are converted to a sign-magnitude format for subsequent compression at step 325. Sign-magnitude format represents each value as a positive integer equal to the coefficient’s absolute value plus a single “sign” bit. The sign bit is 1 if the coefficient is negative and 0 if the coefficient is positive. The sign-magnitude format is particularly useful in compression of wavelet coefficient data because the magnitude of the wavelet coefficients, irrespective of sign, indicates the importance of the value to the accurate representation of the original signal. In other words, any lossy compression should aim to preserve the high magnitude coefficients.
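The rounding of Equation (3) and the subsequent sign-magnitude split can be sketched as follows (function names are illustrative, not from the specification):

```python
def to_compressor_precision(c_w, b_delta):
    # Equation (3): C_c = (C_w + (1 << (B_delta - 1))) >> B_delta,
    # i.e. division by 2**B_delta with rounding. Assumes b_delta >= 1.
    # Python's >> floors negative values, as a hardware shifter would.
    return (c_w + (1 << (b_delta - 1))) >> b_delta

def to_sign_magnitude(c):
    # Sign bit is 1 for a negative coefficient, 0 otherwise; the
    # magnitude is the absolute value, per the step 325 description.
    return (1 if c < 0 else 0, abs(c))
```

For example, reducing the value 1000 by B_Δ = 4 bits gives (1000 + 8) » 4 = 63 rather than the truncated 62, illustrating the rounding offset.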
Because of the wavelet transform’s cascaded structure, the step 315 (and therefore the step 302) can, after some pipeline delay, generate output coefficient samples 316 at the same rate as the input image pixels 311 arrive. Referring again to Fig. 4A, the output 316 of the wavelet transform block 499 therefore appears in split raster scan order. In split raster scan order, rows from the full set of sub-bands are generated together. Within the codec, groups of rows (e.g. 431 and 432 in Fig. 4B) that are generated (and coded) together are called a precinct. Groups of precincts, as defined for example by the spatial regions 435 and 436, form a grouping referred to as a slice. A single slice may represent a whole video frame. A precinct 430 comprising groups of rows 431 and 432 is shown in Fig. 4C. The precinct 430 is shown to have the same sub-band structure as in the arrangement 400. However, each of the precinct sub-bands 441-448, corresponding to horizontal slices of the wavelet transform sub-bands 401-408 respectively, has a smaller number of rows. If there is only a single level of vertical wavelet transform, a number of rows 424c in the vertically low-pass precinct sub-band 431 is equal to a number of rows 426c in the vertically high-pass precinct sub-band 432. If there were a second level of vertical wavelet transform then a third row of high pass samples would contain twice as many rows - again due to the cascaded structure of the wavelet transform. The minimum row height for a precinct sub-band (e.g. 424c, 426c) is 1 row of coefficients. The minimum number of rows contained in a precinct is a function of the number of levels of vertical wavelet transform, for example 2 rows for 1 level of decomposition, 4 rows for 2 levels, 8 for 3 levels and so on. To achieve a suitable compromise between compression and latency, a single level of vertical decomposition and 2 rows of precinct data are typically used. To improve compression performance, more rows of coefficients can be included in a precinct - with or without additional levels of wavelet transform.
Referring back to Fig. 3, the method 300 continues under control of the processor
205 from step 302 to a step 330. Step 330 of the encoding and compression method 300 receives coefficient data from the pixel transform step 302 structured as the precinct 430 as depicted in Fig. 4C. From the precinct structure 430, the step 330 extracts coefficient groups. Coefficient groups are small sets of horizontally adjacent coefficient values such as depicted by 451-453 of Fig. 4C. More generally, the coefficient groups contain horizontally adjacent or vertically adjacent coefficients or have a 2D block structure or some combination thereof. A main feature of the coefficient groups is that the group is spatially localised and the number of coefficients is relatively small. Execution of the step 330 divides each of the precinct sub-bands into groups of 4 adjacent coefficients. The coefficient groups of the structure 430 are subsequently coded in steps 335 through 370 using a series of most-significant bit-plane indices (one per coefficient group), sign bits, a code representing the most significant bit-plane of the group and additional data bits that encode the coefficient magnitude values (below the most significant bit-plane) within each coefficient group.
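The grouping of step 330 can be sketched as below (the group size of 4 follows the text; here a trailing partial group is simply kept short, an assumption, whereas a real codec would pad or constrain the sub-band width):

```python
def group_coefficients(row, group_size=4):
    # Split one precinct sub-band row into horizontally adjacent
    # coefficient groups of group_size values each.
    return [row[i:i + group_size] for i in range(0, len(row), group_size)]
```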
The coding of the coefficient groups is described with additional reference to Fig.
5. In particular, in Fig. 5, an arrangement 500 shows an expanded view of example coefficient groups 451-453 and corresponding characteristics (MSB index to MSBI code). Coefficient values 510 for the coefficient groups are represented in sign-magnitude format by sign bits 520 and magnitude bits 530. The magnitude bits 530 are stacked vertically so that the most significant bit (MSB) appears at the top and the least significant bit (LSB) at the bottom according to a bit-plane index 521.
The method 300 continues from step 330 to step 335. For each group of coefficients (for example 451-453), the step 335 executes to determine a most significant
bit-plane index 540. The most significant bit-plane index 540 is the first index of a set of bit-planes 531 containing only zeros. More generally, the sequence of MSB-plane index values 540 defines a group-by-group partitioning of the bit-planes into insignificant bit-planes 531 and significant bit-planes 532 and 533.
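Step 335 can be sketched as below: the first all-zero bit-plane index of a group equals the bit-length of the largest magnitude in the group, since every plane at or above that index is zero for all four coefficients.

```python
def msb_plane_index(group_magnitudes):
    # Bit-planes at or above the returned index contain only zeros;
    # planes below it carry at least one set bit somewhere in the group.
    return max(group_magnitudes).bit_length()
```

For example, a group of magnitudes [5, 1, 0, 2] has its largest value 5 = 101b, so planes 0-2 are significant and plane 3 is the first all-zero plane.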
The method 300 continues under execution of the processor 205 from step 335 to step 340. Lossy compression of the coefficient values is achieved by further introducing a truncation index 550. The truncation index 550 for each precinct sub-band is determined at process step 340 using a rate allocation process described below with reference to Fig. 6. The method 300 continues under execution of the processor 205 from step 340 to step 345. The difference between the MSB-plane index and the truncation index defines a bit-precision to be used for representing coefficient values within each coefficient group. The coefficient values are quantised at step 345 to be represented with the required bit-precision. A simple form of quantisation is truncation whereby the bit-planes 533 below the truncation index are set to zero and the remaining significant bit-planes 532 are left unmodified. However, better compression performance can be achieved using more complex quantisation schemes. In one arrangement, coefficient values are rounded according to the function:
Ĉ = ⌊ C / δq + 1/2 ⌋    Equation (4)

In Equation (4), Ĉ and C are the quantised and un-quantised coefficient magnitudes respectively and δq is a quantisation step size defined as:

δq = 2^(B_M + 1) / (2^(B_M − B_T + 1) − 1)    Equation (5)

In Equation (5), B_M and B_T are the MSB-plane index 540 and the truncation bit-plane index 550, respectively, of a coefficient group. Step 345 effectively operates to form one or more words from bits of the MSB-plane data of a group of wavelet coefficients for the image or frame.
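The simple truncation form of quantisation described above (zeroing the bit-planes below the truncation index while leaving the significant bit-planes unmodified) can be sketched as follows; the function name is hypothetical:

```python
def truncate_magnitude(magnitude, truncation_index):
    """Quantisation by truncation: clear all bit-planes below the
    truncation index, leaving the significant bit-planes unmodified."""
    return (magnitude >> truncation_index) << truncation_index
```

For example, truncating the magnitude 0b1011 at bit-plane index 2 clears the two least significant planes, giving 0b1000.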
The method 300 continues from step 345 to step 350. At step 350 of the compression method 300 of Fig. 3, the most significant bit-plane words (for example 551-553) are extracted (for example as 535) and coded using a variable length code (as shown by 536). A prefix code is determined and used in execution of step 350 using inputs 811 and outputs 812 defined in a table 810 of Fig. 8A. Specifically, the MSB-plane bits are treated as an input word for the inputs 811 and a corresponding one of the codes 812 is
used to represent the MSB-plane data in the compressed bit-stream. The prefix codes 812 are designed to have shorter length for cases where the MSB-plane of the coefficient group contains only a single significant bit (1). The prefix codes 812 comprise a variable length code having fewer bits than the extracted most significant bit-plane words (for example as
535) for frequently occurring words. MSB-plane words containing only a single significant bit are typically significantly more frequent than other words for typical image data, including screen content. Other codes are similarly allocated on the basis of frequency. In particular, more frequently occurring MSB-plane words are given shorter codes and less frequently occurring MSB-plane words are given the longest codes. In the absence of a data-driven frequency analysis, properties of typical images are used to design a code.
Specifically, high magnitude coefficients are typically sparsely distributed within wavelet sub-bands for typical image content. Thus, the code of table 810 generally contains shorter codes for input MSB-plane words with longer consecutive runs of zeros.
There are two exceptions to this rule. Firstly, the word “0000” will never occur in practice because, by definition, the MSB-plane for a group contains at least one significant (non-zero) bit. The word “0000” is therefore assigned to a long code but could equally be unassigned. Secondly, the word “1111”, while not typically expected in any high-pass coefficient sub-bands (such as 402-408), is expected with high frequency in the low pass sub-band 401. By making the code for the word “1111” have length 4 the coding method is suitable to be applied to all sub-bands, without knowledge of the specific sub-band being processed. The encoding of the video data can be readily implemented using the lookup table 820 of Fig. 8B. In the table 820, the rows of the table 810 have been reordered so that the MSB-plane word values appear in numerical order. Accordingly, the MSB-plane word can be used as an index 821 to directly access memory containing the corresponding code
822 and code length 823, resulting in a particularly straightforward and low complexity coding process in both hardware and software. At step 350, the formed word(s) from step 345 are encoded using the determined variable length code (for example from the lookup table 820).
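The table-driven encoding of an MSB-plane word can be sketched as below. The code and length values here are illustrative placeholders only; the actual assignments are in tables 810 and 820 of the figures, which are not reproduced in the text:

```python
# Hypothetical fragment of the MSB-plane word -> (code, length) table.
# Structure only: words with a single significant bit get short codes,
# and "1111" is kept at length 4 so the low-pass sub-band codes cheaply.
VLC_TABLE = {
    0b0001: (0b0, 1),
    0b0010: (0b10, 2),
    0b0100: (0b110, 3),
    0b1111: (0b1110, 4),
}

def encode_msb_word(word, out_bits):
    """Append the variable length code for a 4-bit MSB-plane word
    (most significant code bit first) and return the code length."""
    code, length = VLC_TABLE[word]
    for i in range(length - 1, -1, -1):
        out_bits.append((code >> i) & 1)
    return length
```

Because the table is indexed directly by the 4-bit word, encoding is a single memory access plus a bit-shift loop, which matches the low-complexity intent described for table 820.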
The method 300 continues from step 350 to step 360. Subsequent bit-planes 537, also referred to as less significant bit-planes, are encoded at step 360. In one arrangement the subsequent bit-planes are encoded without modification so that the subsequent bit-planes appear as a sequence of fixed length codes 537 within the compressed bit-stream. The overall length of the code for any coefficient group is determined by the MSB-plane index and truncation bit-plane index. Lossless compression results from not needing to
transmit leading zeros whereas lossy compression results from discarding quantised least significant bits. Step 360 effectively operates to encode the coefficients for the remaining (less significant) bits of the frame using fixed length encoding.
The method 300 continues under execution of the processor 205 from step 360 to step 365. At step 365 of the compression method 300 of Fig. 3, the sign bits 520 of the coefficients are encoded. The sign bits are encoded by dropping meaningless bits. For example, if a coefficient value is zero after quantisation, the sign bit is considered meaningless and dropped from the coded sign data 521 that is written to the compressed bit-stream 115.
The method 300 continues under execution of the processor 205 from step 365 to step 370. The sequence of MSB-plane indices 540 are encoded at step 370. The MSB-plane indices 540 are typically encoded using a combination of differential and run coding. In a specific arrangement, a prediction residual (for example 541) is generated for each MSB-plane index 540. Prediction can be performed horizontally (along rows of the precinct as shown). However, if memory is available, vertical prediction (down columns of the precinct) is typically more efficient. Prediction residuals can be set to zero wherever the truncation bit-plane index equals or exceeds the MSB-plane index 540. The resulting stream of residual values (e.g. 541) can be coded using a variable length code that yields shorter code words for smaller residual values (e.g. 542). In one arrangement a unary code is employed. Runs of zeros can be further encoded using a run-length code.
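A sketch of the MSB-plane index coding using horizontal prediction and a unary code follows. The zig-zag mapping of signed residuals to the non-negative unary alphabet is an assumption, as the text does not specify how negative residuals are handled:

```python
def unary(n):
    """Unary code: n ones terminated by a zero (an assumed convention)."""
    return [1] * n + [0]

def code_msb_indices(msb_indices, truncation_index):
    """Horizontally predict each MSB-plane index from its left neighbour
    (zero for the first sample) and unary-code the residuals. Residuals
    are forced to zero where the truncation index reaches the MSB index."""
    bits, predictor = [], 0
    for m in msb_indices:
        residual = 0 if truncation_index >= m else m - predictor
        # Zig-zag map the signed residual to a non-negative symbol
        # (smaller residuals get shorter unary codes).
        symbol = 2 * residual - 1 if residual > 0 else -2 * residual
        bits.extend(unary(symbol))
        predictor = m
    return bits
```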
The method 300 continues under execution of the processor 205 from step 370 to step 390. Execution of step 390 of the compression method 300 packs the encoded information including the coded MSB-plane indices (e.g. 542), the coded MSB-plane words (e.g. 536), the sign bit information (e.g. 521), other quantised coefficient bit-planes (e.g. 537) and other required header information such as sub-band truncation indices into a bit-stream according to a predetermined set of syntax rules. The truncation indices can be encoded using as few as two parameters - scenario and refinement. The scenario and refinement parameters form part of a rate allocation process 340 as described below. Parameters such as the scenario and refinement as well as any other parameters required to decode the precinct are written together within a structured block referred to as a precinct header. Parameters that are relevant to the whole frame are encoded in a frame header within the bit-stream syntax.
For the compression method 300 to generate frame data at a specific compressed bit rate, the truncation indices determined at step 340 must be appropriately selected. A
method for determining the truncation levels is now described with reference to Fig. 6. Fig. 6 shows a method 600 of rate allocation. The method 600 is typically implemented as one or more modules of the application 233, controlled by execution of the processor 205 and stored in the memory 206.
The method 600 of rate allocation takes place in two distinct stages. In a first stage
601, budget calculations are performed to determine the coding cost for each precinct sub-band at each truncation index. The coefficient data is encoded in distinct parts, and the budget calculation step 601 involves distinct steps for calculating the cost of coding the MSB-plane index data (step 610), the coefficient magnitude data (step 620), the MSB-plane codes (step 630) and the cost of the sign information (step 640). Coding costs are tabulated for each precinct sub-band at each truncation index. Where alternative methods exist for a coding step, such as vertical or horizontal prediction of the MSB-plane indices, costs are determined and tabulated for each case.
The method 600 continues under execution of the processor 205 from step 601 to a rate allocation step 602. A resulting sub-band coding cost table 603, generated at step 601, is passed to the rate allocation step 602. The rate allocation step 602 starts by determining an available precinct bit-budget at step 650. The available precinct bit-budget is determined based on the frame budget and may incorporate any unused bit-budget from the coding of a previous precinct. If slice partitions (e.g. 435, 436) are employed then budget sharing is not extended to precincts from different slices even if they are adjacent. Having determined an available precinct bit-budget, the method 600 continues to step 660. The budget tables are traversed at step 660 to determine a set of truncation points that deliver the highest fidelity without exceeding that bit-budget. To simplify determining truncation planes at step 660, a gain table 604 and a priority table 605 are used. The gain table 604 captures the relative gains of the wavelet synthesis filters corresponding to each wavelet sub-band in the decomposition structure 400 as discussed below in the context of the decoder and with reference to Fig. 10. In particular the gain table 604 contains, for each sub-band, and up to an offset, the log base 2 of the synthesis gain quantised to a nearest integer. The error in the relative gains as captured in the gain table is then ranked in order of magnitude to produce the priority table 605. Together, the gain and priority tables lead to an algorithm for selecting truncation levels for the sub-bands at step 660.
The algorithm for selecting truncation levels comprises two sequential searches. In a first search, which iterates over a first rate control variable referred to as scenario, the precinct coding cost is determined for decreasing rate increments until a calculated coding
cost is less than the available budget. In the second search, which iterates over a second rate control variable referred to as refinement, the precinct coding cost is determined for increasing rate increments until a calculated coding cost is identified which is closest to the available budget without exceeding it. The relationship between the scenario τ, refinement e and bit-plane truncation index B_T[i] for sub-band i is defined as follows:
B_T[i] = τ − Γ[i] − 1, if κ[i] < e
B_T[i] = τ − Γ[i], otherwise    Equation (6)
In Equation (6) Γ is the vector of per sub-band gains as recorded in the gain table
604 and κ is the vector of per sub-band refinement priorities as recorded in the priority table 605. The coding cost C for the precinct is subsequently determined according to
C = Σ_i A[i, B_T[i]]    Equation (7)
In Equation (7), A is the sub-band coding cost table 603. For strict rate analysis, the cost calculation may need to be subject to rounding. For example, if the precinct bit-stream is intended to be aligned to byte boundaries then the cost C would be rounded up to the next multiple of 8. Having determined an actual coding cost for the current precinct which is less than the precinct bit-budget, the method 600 continues to step 670. Unused bit-budget can be allocated at step 670. Specifically, unused bit-budget can be carried forward for use in coding a subsequent precinct where the subsequent precinct falls within the same slice as the current precinct. Otherwise, the unused budget may be assigned to padding. Padding adds additional, non-functional bits to the bit-stream for the purpose of aligning to a predetermined bit location.
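The mapping from scenario and refinement to per-sub-band truncation indices, and the summed precinct cost with byte alignment, can be sketched as below. The convention assumed here is that B_T[i] = τ − Γ[i], lowered by one bit-plane for sub-bands whose priority κ[i] is below the refinement e; the direction of the refinement adjustment is an assumption, and the function names are hypothetical:

```python
def truncation_indices(scenario, refinement, gains, priorities):
    """Per-sub-band truncation index: scenario minus sub-band gain,
    lowered by one bit-plane for sub-bands refined first (priority
    below the refinement value)."""
    return [scenario - g - (1 if p < refinement else 0)
            for g, p in zip(gains, priorities)]

def precinct_cost(trunc, cost_table, align_bits=8):
    """Sum the tabulated sub-band coding costs at the chosen truncation
    indices, rounded up to the next byte boundary."""
    cost = sum(cost_table[i][t] for i, t in enumerate(trunc))
    return -(-cost // align_bits) * align_bits  # ceiling to a multiple
```

The two searches described in the text would call these helpers repeatedly: the first sweeps the scenario downward in rate until the cost fits the budget, the second sweeps the refinement upward in rate while the cost still fits.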
While the step 602 determines truncation bit-planes based on an exact budget calculation at step 601, the ability of step 670 to forward unused bit-budget to a next precinct means that using an approximate budget calculation in step 601 is possible.
Fig. 7 illustrates how carry-forward of unused precinct bit-budget may be combined with cost estimation. A graph 700 shows the available precinct bit-budget for two spatially adjacent precincts in a frame - precinct n and precinct n+1. Precinct n is determined to have an available budget 710 at step 650. The step 660 performs a search over scenario and refinement to determine a set of truncation bit-plane indices that result in consumption of rate as shown as 720 in Fig. 7. Any further lowering of the truncation level and consequent coding of any additional bit-planes would result in the estimated cost exceeding the precinct budget 710. When the video encoder 114 encodes the precinct according to the selected scenario and refinement, a smaller number of bits 721 is written to the encoded bit-stream 115. Then, a bit-budget 712 for precinct n+1 is
determined by adding the unused bit-budget from the precinct n to the precinct bit-budget for the precinct n+1. When coding the precinct n+1, the application 133 is able to select a lower truncation level than would otherwise be the case. In some arrangements, the first precinct of a slice may be expected to be coded at slightly reduced quality compared to subsequent precincts in the frame, as the first precinct in the frame does not benefit from receipt of forwarded rate from any earlier precincts. One solution to the resultant reduced quality is to adjust the per-precinct budget such that the first precinct in each slice is allocated a higher budget than subsequent precincts in the frame.
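The carry-forward of unused bit-budget within a slice can be sketched with the following hypothetical helper, which returns the budget available to each precinct given the bits actually consumed by the precincts before it:

```python
def precinct_budgets(per_precinct_budget, actual_costs):
    """Budget for each precinct in a slice: the nominal per-precinct
    budget plus any bits left unused by the previous precinct."""
    budgets, carry = [], 0
    for cost in actual_costs:
        budget = per_precinct_budget + carry
        budgets.append(budget)
        carry = max(0, budget - cost)  # unused bits roll forward
    return budgets
```

As in the Fig. 7 discussion, a precinct that under-spends its budget (cost 90 against a budget of 100) leaves 10 bits for the next precinct, which can then select a lower truncation level.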
Fig. 9 is a schematic flow diagram showing a method 900 of decoding a video frame. The video frame has been encoded using variable length code for words formed from the most significant bit-planes of coefficient groups. The method 900 is typically implemented as one or more modules of the application 233, controlled by execution of the processor 205, and stored in the memory 206. The method 900 can be implemented by the video decoder 134.
The method 900 is applied to the bit-stream for each frame of a video sequence independently in order to provide the flexibility for editing. The method 900 begins at a step 910. The step 910 executes by initialising decoding structures including input precision, position indexes and buffers that may reside in the working memory 206 of the computer module 201. Initialising decoding structures may involve reading and decoding header information about the image size and colour component structure, as well as compression options and data that are not assumed to be known by the decoder 134 from the bit-stream. Initialising decoding structures may also include the wavelet transform structure, precinct and coding group structure and so on. Subsequently, the method 900 continues under execution of the processor 205 from step 910 to a step 915.
Decompression is applied to recover precincts which are converted back to pixels with sub-frame latency. The decompression begins at step 915 by deconstructing the bit-stream to separate information relating to the different coded components - the MSB-plane indices, truncation indices, the magnitude data including the MSB-plane codes and other bit-plane codes, and the sign bits.
Referring to step 915, the method 900 continues to step 920 to decode or determine
MSB-plane indices for the coefficients. Decoding the MSB-plane indices at step 920 involves reading and decoding the series of unary codes, including undoing any run-length coding to recover the prediction residuals. The prediction residuals are then added to the predicted values to generate a row of MSB-plane index values. The predicted values may
be a previous row of decoded MSB-plane indices (if vertical prediction is being used) or an immediately (horizontally) previous value (if horizontal prediction is being used) or zero in the case of any unpredicted values (such as the first sample in a precinct row).
The method 900 continues under execution of the processor 205 from step 920 to step 925. At step 925 the truncation indices are decoded or determined. Decoding the truncation indices involves re-generating the values from scenario and refinement according to Equation (6). If the gain and priority tables are not known to the decoder 134 then the gain and priority tables would typically be communicated via a frame header. Accordingly, the two values - scenario and refinement - are all that is required to reconstruct the precinct truncation indices. The scenario and refinement values are communicated via the precinct header.
The method 900 continues under execution of the processor 205 from step 925 to step 930. At step 930, MSB-plane words for non-zero coefficient groups are read and decoded. The variable length codes representing the MSB-plane words can be read and decoded using an algorithm 830 shown in Fig. 8C in combination with a 2D look-up table 840 (shown in Fig. 8D). Up to 3 leading bits are read from each MSB-plane code. In particular, leading bits 841 are read until a zero (0) is encountered or three consecutive ones (1) are encountered. The number of ones read is then interpreted as a row index for the 2D table 840. Having determined a row index, a column index 842 - a two bit number
- is read directly from the bit-stream and the two indices used to access a decoded MSBplane word 843. Step 930 effectively determines values for bits of MSB-plane words using the variable length codes.
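The MSB-plane word decoding of step 930 can be sketched as below. The 2D table contents here are placeholders (the real table 840 is in Fig. 8D), but the bit-reading pattern follows the description: count leading ones (up to three) to form the row index, then read a two-bit column index:

```python
def decode_msb_word(bits, pos, table_2d):
    """Decode one variable length MSB-plane code starting at bits[pos].
    Returns (decoded word, new position)."""
    row = 0
    while row < 3 and bits[pos] == 1:  # count leading ones, max three
        row += 1
        pos += 1
    if row < 3:
        pos += 1  # consume the terminating zero
    col = (bits[pos] << 1) | bits[pos + 1]  # two-bit column index
    return table_2d[row][col], pos + 2

# Placeholder 4x4 table standing in for table 840 of Fig. 8D.
TABLE = [[r * 4 + c for c in range(4)] for r in range(4)]
```

Because at most five bits are examined per code and the final step is a direct 2D table access, the decode loop is as table-friendly as the encoder side.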
The method 900 continues under execution of the processor 205 from step 930 to step 935 to form the remaining or less significant bits of the coefficients. At step 935, other less significant bit-plane words are read (received) as a series of fixed length codes. The number of codes is determined as the difference between the MSB-plane index and the truncation bit-plane index for the coefficient group (e.g. 451-453). The decoded bit-plane words for the MSB-plane (e.g. 535) and other bit-planes (e.g. 537) are written to a memory (such as the memory 206) using the decoded MSB-plane indices (e.g. 540) to determine the bit-plane group (e.g. 532) into which the decoded bit-plane words should be written.
The method 900 continues under execution of the processor 205 from step 935 to step 940. Having reconstructed the coefficient magnitudes, decoding of the associated sign bits 520 is performed at step 940. In one arrangement, sign bits are read as a fixed length word for each coefficient group. In another, more compact but more complex to decode
arrangement, coefficient magnitudes are compared to zero and a sign bit read for each non-zero magnitude encountered.
The method 900 continues under execution of the processor 205 from step 940 to step 945. The quantised coefficient magnitudes that result from the decoding process steps
910-940 are de-quantised at step 945. A simple form of de-quantisation leaves the bit-planes 533 below the truncation index at zero and the remaining significant bit-planes 532 unmodified. However, better reconstruction quality can be achieved using more complex quantisation schemes. In one decoder arrangement corresponding to the encoder arrangement described previously with reference to Equation (4) and Equation (5), coefficient values are calculated according to the function:
C = Ĉ × δq    Equation (8)

In Equation (8), Ĉ and C are the quantised and un-quantised coefficient magnitudes respectively and δq is the quantisation step size defined as:

δq = 2^(B_M + 1) / (2^(B_M − B_T + 1) − 1)    Equation (9)

In Equation (9), B_M and B_T are the MSB-plane index 540 and the truncation bit-plane index 550, respectively, of a coefficient group. Step 945 effectively determines or reconstructs coefficient values 955 for a group using the MSB-plane values from step 930 and the fixed length codes of steps 935 and 940.
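De-quantisation at step 945 can be sketched as follows, assuming a step size of δq = 2^(B_M + 1) / (2^(B_M − B_T + 1) − 1); that step-size reading of Equation (9), and the rounding to integer, are assumptions of this sketch:

```python
def dequantise(c_hat, b_m, b_t):
    """Reconstruct a coefficient magnitude from its quantised value:
    C = C_hat * delta_q. A group whose truncation index reaches its
    MSB-plane index carries no magnitude data, so zero is returned."""
    if b_m <= b_t:
        return 0
    delta_q = (1 << (b_m + 1)) / ((1 << (b_m - b_t + 1)) - 1)
    return int(round(c_hat * delta_q))
```

With this step size, the largest quantised value for a group (all retained bits set) is scaled back up to span the full magnitude range implied by the MSB-plane index.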
The reconstructed coefficient values 955 for the precinct are subsequently provided to the inverse pixel transform step 960 to produce a number of output lines of pixel data.
The pixel transform step 960 is implemented as a pipeline in order to minimise buffering and latency within the system implementing the method. The pipeline stages are represented by steps 961, 963, 965, 967 and 969. The step 960 starts at step 961, which accepts coefficient values in sign-magnitude format and converts the coefficient values to conventional signed format values 962. In particular, the magnitude is multiplied by -1 if the sign bit is non-zero. The method 900 continues to the step 963. The signed coefficient values 962 are converted to a working precision by execution of step 963. The conversion step 963 transforms the coefficient values from the working range of the compression
process to the working range of the wavelet transform process and is defined mathematically as
C_w = 2^(B_Δ) C_c    Equation (10)

In Equation (10), C_c and C_w are the coefficient values normalised to compressor and working precision respectively; B_Δ = B_w − B_c is the difference between the number of bits used to represent values in working (wavelet) and compressor process precision. The method 900 continues from step 963 to step 965. The resulting stream of working process precision coefficients 964 from step 963 is subjected to an inverse wavelet transformation at step 965. The specific inverse wavelet transform used in the arrangements described is the 5/3 Le Gall wavelet. In other arrangements, other wavelet transforms could be specified via a frame header. The inverse wavelet transform 1099, as shown in Fig. 10B, is a mirror of the forward wavelet transform process 315 used during the encoding process 300 (shown as the arrangement 499 in Fig. 4A). The inverse wavelet transform arrangement
1099 comprises a series of stages 1029, 1028, 1027, 1026, 1025, 1024 and 1021. Each stage applies one level of wavelet synthesis, either vertically along columns of pixel values (as in 1021) or horizontally along rows of pixel values (as in 1024-1029). The stages 1029, 1028, 1027, 1026, 1025, 1024 and 1021 are implemented using a lifting scheme (known in the art) but could equally be implemented using convolutional filters with up-sampling (also known in the art). Each stage accepts high-pass coefficients (e.g. 1022) and low-pass coefficients (e.g. 1023) as input and a total number of output pixels 966 is equal to the number of input coefficient values 964. Multiple stages of wavelet synthesis are applied to consume the full set of coefficients 1001-1008 that are referred to in aggregate as the wavelet transform of the input signal. Each stage of the inverse transform arrangement
1099 has an associated gain, being the ratio of the energy in the output samples (which are low-pass coefficients at the next transform level) to the input coefficients. When multiple inverse transform stages are applied to a single input coefficient (e.g. 1001) then the gain for that coefficient is the product of the gains of all of the inverse transform stages the coefficient passes through. The gain value is typically different for each sub-band and is referred to as the synthesis gain of the sub-band (e.g. 401) to which the coefficient belongs. For the 5/3 wavelet transform employed in one arrangement, the synthesis gain associated with one level of inverse high-pass transformation is 0.71875 while the synthesis gain associated with one level of inverse low-pass transformation is 1.5. The synthesis gain
does not need to be explicitly applied to the coefficients because the inverse gain is built into the forward transform. However, the synthesis gain must be known in order to generate the gain 604 and priority 605 tables used in the rate allocation step 602 and specifically the step 660 for determining the truncation bit-planes and specifying them in terms of a scenario and refinement.
The method 900 continues under execution of the processor 205 from step 965 to the step 967. The working precision pixel values 966 output from the inverse wavelet transform step 965 are subjected to an inverse colour transform at step 967 to convert the YCC samples to RGB samples 968. The colour transform inverts the forward colour transform (e.g. using Equation (2)) applied by the encoder during step 310 and may be bypassed if the forward transform was bypassed during encoding. Inverting the transform specified in Equation (2) is defined mathematically by:
[R_w]   [1  −1   1] [Y  ]
[G_w] = [1   1   0] [C_g]    Equation (11)
[B_w]   [1  −1  −1] [C_o]
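Reading the matrix of Equation (11) as a YCgCo-style inverse gives the following per-pixel sketch; the row-by-row expansion is based on the matrix entries recoverable from the source and should be treated as an assumed reconstruction:

```python
def inverse_colour_transform(y, cg, co):
    """Row-by-row expansion of the Equation (11) matrix:
    R = Y - Cg + Co, G = Y + Cg, B = Y - Cg - Co."""
    return (y - cg + co, y + cg, y - cg - co)
```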
The method 900 continues under execution of the processor 205 from step 967 to the step 969. The working precision RGB pixel values 968 are converted to the original pixel precision at step 969 to produce the output pixel data 135 for the reconstructed frame. The precision conversion step 969 converts pixel values according to:
p_out = ⌊ (p_w + 2^(B_w − 1) + 2^(B_Δ − 1)) / 2^(B_Δ) ⌋    Equation (12)

In Equation (12), p_out and p_w are the channel values of the output pixel and working pixel respectively and B_Δ = B_w − B_out is the difference between the number of bits used to represent values in working and output precision.
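The precision conversion of Equation (12) can be sketched as below. The two offset terms (a signed-to-unsigned offset of 2^(B_w − 1) and a rounding offset of 2^(B_Δ − 1)) follow an assumed reconstruction of the garbled formula, and the sketch assumes B_w > B_out:

```python
def to_output_precision(p_w, b_w, b_out):
    """Convert a working-precision channel value to output precision:
    add a signed-to-unsigned offset and a rounding offset, then drop
    b_delta = b_w - b_out least significant bits. Assumes b_w > b_out."""
    b_delta = b_w - b_out
    return (p_w + (1 << (b_w - 1)) + (1 << (b_delta - 1))) >> b_delta
```

For example, converting 10-bit signed working values to 8-bit unsigned output maps a working value of 0 to the mid-grey output value 128.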
Fig. 11 is a schematic block diagram showing an architecture 1100 of functional modules of the video encoder 114 used to implement the method 300. The video encoder 114 may be implemented using a general-purpose computer system 200, as shown in Figs. 2A and 2B, where the various functional modules may be implemented by dedicated hardware within the computer system 200, by software executable within the computer system 200 such as one or more software code modules of the software application program 233 resident on the hard disk drive 205 and being controlled in its execution by
the processor 205, or alternatively by a combination of dedicated hardware and software executable within the computer system 200. The video encoder 114 and the described methods may alternatively be implemented in dedicated hardware, such as one or more integrated circuits performing the functions or sub functions of the described methods.
Such dedicated hardware may include graphic processors, digital signal processors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or one or more microprocessors and associated memories. In particular, the video encoder architecture 1100 comprises modules 1110-1190. Each of the modules 1110-1190 may be implemented as one or more software code modules of the software application program 233, or an FPGA ‘bitstream file’ that configures internal logic blocks in the FPGA to realise the video encoder 114. The video encoder 114 provides reduced complexity in the rate allocation functionality by approximating costs for evaluation of candidate truncation levels, such that a worst case estimate is used for each candidate during evaluation. Then, for coding, the actual coded cost is derived once, only at the selected truncation level that is applied for coding. The coder further provides a low complexity variable length coding of the MSB-plane of the coefficient groups leading to improved compression performance relative to the fixed length coding used for other bit-planes.
Received raw video data 113 is input to a pixel transform module 1110. The pixel transform module 1110 performs the step 302, generating precincts of wavelet coefficients 1112 in sign-magnitude format. The wavelet coefficients 1112 are stored in an output buffer memory of the pixel transform module 1110 so the wavelet coefficients 1112 can be read and processed at different rates by an MSB index calculator 1120, a quantiser 1160 and a sign coder 1170. The MSB index calculator module 1120 implements the step 335, generating a stream of indices 1122 input to a budget calculator module 1140 and an MSB index encoder module 1130. The budget calculator module 1140 in turn implements the step 601 and the associated sub-steps 610 to 640. In some arrangements, the budget calculator module 1140 also reads coefficient values 1112 from the pixel transform module 1110.
The budget calculator module 1140 is required to read the coefficient values 1112 if the budget calculator module 1140 must do exact budget calculations because the coefficient values are required to determine the cost of coding sign information and MSB-plane coding. If rate forwarding is being used, it is only necessary for the budget calculator module 1140 to determine a worst case budget estimate. The worst case budget estimate
relates to an estimate of the encoded size of a group of wavelet coefficients. The worst case budget can be determined using a maximum possible length of the variable length code for an MSB-plane word. Determining a worst case budget estimate can typically be done without reference to the coefficient values 1112. Budget tables 1142 constructed by the budget calculator 1140 are input to the rate allocator module 1150 for implementing the step 602 and the associated sub-steps 650 to 670 to determine truncation bit-plane indices 1152 for each precinct sub-band. The rate allocator module 1150 may also generate prediction mode information.
The MSB index coder 1130 uses the truncation index 1152 and prediction method information to truncate and encode the MSB index stream 1122 according to the method described for step 370, generating coded MSB index data output to a packer 1190. The MSB index data 1122 and the truncation bit-plane indices 1152 are also used by the quantiser module 1160 to generate quantised coefficient magnitudes 1166 according to the step 345. The quantised coefficient magnitudes 1166 are input to a sign coder 1170 and used to process the sign bits of the precincts of coefficients 1112 to create a stream of coded sign information according to step 365. The output of the quantiser 1160 is also divided into a stream of MSB-plane words 1162 and other bit-plane words 1164 as exemplified respectively by 535 and 537 of Fig. 5. The MSB-plane words 1162 are passed to an MSB-plane coder 1180 that implements the step 350. The stream of coded MSB-plane words 1184 (as exemplified by 536 of Fig. 5) is passed, together with the stream of other bit-plane words 1164 and the coded sign information 1172, to the packer 1190. The packer 1190 merges the streams of words 1164 and 1184 and the sign information 1172 according to the method of step 390 to form a compressed bit-stream 115. The packer 1190 is able to unambiguously determine a budget value 1192 consumed by the coded precinct.
When rate forwarding is implemented, the value 1192 is returned to the rate allocator module 1150 for use in calculating the truncation bit-plane indices for the next precinct.
The architecture 1100 of the video encoder 114 shows significant data processing modules and associated paths without the complication of control modules and paths. In a practical implementation, one or more control modules would be included that would control overall operation of the encoder 114 as well as buffer memory and so on. In some arrangements, for example, information controlling the operation of the encoder and providing data about the input video bit-stream 113 are written by a supervisory process into control registers that are read as required by other encoder modules (not shown). Alternatively, the registers may be used for part of the relevant encoder modules.
13575128 1
-362017225027 05 Sep 2017
Accordingly, important control information such as frame dimensions, quantisation method, gain and priority tables, colour transform and so on would be available to the modules of the architecture 1100. The data flows involved in the relevant control connections have a relatively small impact on the modular design of the encoder 114 and therefore do not significantly contribute to the design.
Fig. 12 is a schematic block diagram showing an architecture 1200 of functional modules of the video decoder 134 used to implement the method 900. The video decoder 134 may be implemented using a general-purpose computer system 200, as shown in Figs. 2A and 2B, where the various functional modules may be implemented by dedicated hardware within the computer system 200, by software executable within the computer system 200 such as one or more software code modules of the software application program 233 resident on the hard disk drive 205 and being controlled in its execution by the processor 205, or alternatively by a combination of dedicated hardware and software executable within the computer system 200. The video decoder 134 and the described methods may alternatively be implemented in dedicated hardware, such as one or more integrated circuits performing the functions or sub functions of the described methods.
Such dedicated hardware may include graphic processors, digital signal processors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or one or more microprocessors and associated memories. In particular, the video decoder
134 comprises modules 1210-1270. Each of the modules 1210-1270 may be implemented as one or more software code modules of the software application program 233, or an FPGA ‘bitstream file’ that configures internal logic blocks in the FPGA to realise the video decoder 134. The video decoder 134 provides improved compression performance through the use of variable length coding of the MSB-plane of the quantised coefficient magnitudes of coefficient groups.
The architecture 1200 receives the compressed video data 133. The compressed video data 133 is first input to an unpacker module 1210. The unpacker module 1210 divides out the components of each encoded precinct according to step 915. Coded MSB-plane index data 1242 generated at step 915 is passed from the unpacker module 1210 to an MSB index decoder module 1240 along with any control information, including the prediction mode, required to decode the MSB-plane indices according to the step 920. Scenario and refinement values 1232, extracted from the precinct header data, are passed to a truncation decoder module 1230 that calculates the truncation bit-plane indices 1256 for the precinct sub-bands using the method of the step 925. The MSB-plane codes 1222 are
processed by the MSB-plane decoder module 1220 to produce MSB-plane words 1254 for coefficient groups according to the step 930. The MSB-plane words 1254 are combined with other bit-plane words 1255, decoded according to the step 935, in a dequantiser module 1250. The dequantiser module 1250 reconstructs coefficient magnitudes 1262. The coefficient magnitudes 1262 are used by a sign decoder module to process the encoded sign information 1264 and generate sign bits 1258 for the coefficient groups in accordance with the step 940. The dequantiser module 1250 is responsible for storing the bit-plane words into coefficient groups in sign-magnitude format, as exemplified in the expanded view of a memory array 500 of Fig. 5, and applying the dequantisation method of step 945 to the coefficient magnitudes to produce reconstructed coefficient values in sign-magnitude format 955. In turn, the reconstructed coefficient values 955 form an input to a pixel transform module 1270. The pixel transform module 1270 is responsible for implementing the inverse pixel transform of step 960. In particular, the pixel transform module 1270 implements the processing pipeline comprised of sub-steps 961, 963, 965, 967 and 969 that recover the decompressed video pixels 135 from the coefficient values.
The architecture 1200 of the video decoder 134 shows the significant data processing modules and associated paths without the complication of control modules and paths. In a practical implementation, one or more control modules would be included that would control the overall operation of the decoder as well as buffer memory and so on. In one arrangement, for example, the header information from the compressed video bitstream 133 input to the unpacker module 1210 would be stored to control registers that would be read as required by other decoder modules. Alternatively, the control registers may form part of the relevant decoder modules. Accordingly, important control information such as frame dimensions, quantisation method, gain and priority tables, colour transform and so on would be made available to the modules of the architecture 1200. The data flows involved in such control connections have a relatively small impact on the modular design of the decoder and therefore do not significantly contribute to the design.
The encoding and decoding methods described using variable length encoding may be implemented using processing architectures different to those described with reference to
Figs. 11 and 12. Because particular parallel processing architectures provide optimisation opportunities that are not available on other architectures, modifications to the variable length coding arrangements described may be required to enable exploitation of such optimisation opportunities. One example of a parallel processing architecture is a Graphics Processing Unit (GPU) implementation. A GPU is a type of processor, such as the
processor 205, designed to enable a relatively high level of parallel data processing. Software code that is optimised for real-time execution on a GPU needs to allow for highly vectorised data processing. Vectorised code typically executes a sequence of small programs, called kernels, on many separate pieces of data. Execution time can be significantly reduced if there are no dependencies between the various runs of the kernel programs, as the lack of dependencies allows the small programs to execute in parallel. Parallel execution is particularly beneficial on GPU hardware, which typically comprises many parallel processing units capable of running thousands of parallel threads of program execution, as parallel execution enables real-time execution of the video processing system 100 on the relatively low-cost and widely available GPU hardware.
The variable length code implementation described in the tables 810 and 820 of Figs. 8A and 8B has a sequential dependency that could prevent, or at least complicate, optimised implementations on a GPU. Specifically, the size of each variable length code cannot be determined ahead of time, and thus a code-stream that contains concatenated variable length codes must be processed serially, from the first to the last coefficient group, to parse the variable length codes. This requirement for serial processing can, however, be remedied by using a variation on the variable length coding methods described in relation to Figs. 3, 5, 8 and 9.
In the variation or modification of the variable length coding method, the variable length codes are packed with other data from the bitstream (e.g. sign data), and/or partitioned such that some bits (i.e. bits of a given variable length code that exceed a fixed bit length) are coded in a separate stream. As a consequence, a first stream of fixed length codes is created, where the length of the code associated with each coefficient group can be determined in advance based on information available at the encoder 114 and decoder 134. Once the first stream is decoded, the length of the codes associated with a second stream of variable length (rotation) codes can be determined. Encoding or decoding of the second stream of variable length codes generates reconstructed coefficient magnitudes. By analysing the magnitudes, the length of the codes in a third stream of variable length (sign) codes can be determined. As a result, all of the variable length data streams encoding the coefficient values can be decoded in a short sequence of steps, where each step can be executed in a parallel manner. Executing the steps in a parallel manner makes good use of available GPU processing capabilities. An embodiment of most significant bit-plane coding that employs the modification or variation of the variable length encoding technique is now described with reference to Fig. 13A.
Fig. 13A shows a table 1300 illustrating a variation on the variable length code-word alphabets of tables 810 and 820. The table 1300 shows a code-word alphabet where the variable length code corresponding to a most significant bit-plane nibble 1310, along with the corresponding sign information, is divided into a fixed length code-word 1311, having a length of four; a variable length rotation part 1312; and a variable length sign part 1313. A code-word length of four is used for the fixed length part so that a concatenation of all codes corresponding to each coefficient group is a multiple of four bits in length. Use of the code-word length of four is also referred to as nibble alignment, where a nibble (four bits) is half of a byte (eight bits). Nibble alignment may be exploited to increase throughput of the video processing system 100 by increasing the minimum granularity of data manipulated in the packer module 1190 and the unpacker module 1210 from one bit to four bits, for the affected portions of the code-stream (e.g. 115, 133). The code-word alphabet of table 1300 generates a coding advantage when the MSB-plane is encoded with fewer than four bits, e.g. with three bits. When the MSB-plane is encoded with three bits, the three bit codes are expanded to four bit codes by the incorporation of extra coded data, to maintain the advantageous property of nibble alignment. Specifically, the three bit codes are converted to four bit codes by incorporating one bit of sign information.
As shown in the table 1300, the three bit codes (indicated as codes 1321) are used only in cases where there is a single coefficient in the coefficient group that is known to be significant by examination of only the MSB-plane (i.e. without also considering other coded bit-planes associated with the coefficient group). The extra coded data is preferably the sign bit associated with the coefficient. The coefficient may be any one of the four coefficients present in the coefficient group. The extra coded data is used to code the sign bit because the presence of the sign bit may be determined based only on examination of the MSB-plane of the coefficient group.
In the table 1300, two cases for sign bits are shown. One or more sign bits that correspond to coefficients in the coefficient group that are known to be significant by examination of the MSB-plane only are denoted with an upper case “S” character within the code-word 1311 or sign part 1313 columns. The presence of zero or more additional sign bits for the coefficient group is dependent on bit-planes other than the MSB-plane.
The additional sign bits are denoted with a lower case “s”. As a consequence, the number of sign bits contained in the bit-stream for a coefficient group will be at least equal to the number of upper case “S” characters and at most equal to the number of upper and lower case “S” or “s” characters.
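The bounds just stated can be written down directly. A minimal sketch (the helper name and the 4-character-string convention, position 0 leftmost, are our own, not from the specification):

```python
def sign_bit_bounds(msb_nibble):
    """Bounds on the number of sign bits coded for a coefficient group:
    at least one per coefficient already significant at the MSB-plane
    (upper case 'S'), at most one per coefficient position ('S' or 's')."""
    lower = msb_nibble.count('1')   # significant at the MSB-plane
    upper = len(msb_nibble)         # every position may become non-zero
    return lower, upper
```

For the nibble ‘0100’ the bounds are (1, 4): one sign is guaranteed by the MSB-plane alone, and up to three more depend on the lower bit-planes.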
In the case of the three bit codes 1321 (also referred to as length 3 codes), depending on the number and value of additional bit-planes encoded, up to three additional coefficient locations will be non-zero. Accordingly, up to three additional sign bits 1313 may be required to encode the coefficient values associated with the coefficient group. The additional sign bits are packed into a separate part of the bit-stream 115 or 133. The separate packing allows the additional sign bits to be decoded in parallel in a subsequent processing pass, e.g. using different GPU kernels. The length-three codes represent a saving of one bit relative to the fixed length of four used for coding bit-planes, and as a consequence have an associated coding cost difference 1314 of -1 (i.e. a saving of one bit).
A second range of MSB-plane nibbles are encoded using codes of length four bits, indicated as 1322. As the four bit codes already meet the required fixed length, i.e. four bits, no further packing or partitioning of the associated data stream code-words (i.e. the corresponding subset of the code-words 1311) is required. The four bit codes 1322, also referred to as length four codes, have the same size as the original raw bit-plane and so have an associated coding cost difference 1314 of zero (0) bits.
A third range of MSB-plane nibbles are encoded using codes of length five (5) bits, indicated as codes 1323 and also referred to as length five codes. The associated code-words (as shown in 1311) do not have a unique mapping to MSB-plane nibbles. Specifically, each code-word 1311 of the group 1323 represents two different MSB-plane nibbles 1310 of the group 1323. To disambiguate which of the two different MSB-plane nibbles is coded, an extra bit is required. The table 1300 shows a rotate bit 1312, also referred to as a rotation bit or rotation flag, used to achieve the disambiguation. The rotate bit 1312 is only present for specific code-words (the code-words belonging to the length five codes 1323). The code-word 1311 and the rotate bit 1312 together form a code of length 5 bits. Moreover, the allocation of codes in the code-words 1311 has been chosen such that it is possible to determine if a rotate bit is required based on either the code-word (i.e. 1311) having two leading ones and at least one zero, or the MSB-plane (i.e. 1310) bit pattern having three ones or being one of ‘1010’ or ‘0101’. A logical relationship exists between the two MSB-plane nibbles represented by each of the truncated length five (5) codes - one of the MSB-plane nibbles is a (wrapped) rotation of the other. While the logical relationship between the two MSB-plane nibbles is not essential, the logical relationship is intuitive and minimises the logic and dependencies involved in decoding the final MSB-plane nibble. The length five (5) codes 1323 have one more bit than the original raw bit-plane and so have an associated coding cost difference 1314 of +1.
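The disambiguation rule above can be checked mechanically. The sketch below (Python; the helper names are ours, and integer nibbles use the most significant bit as position 0) tests whether a rotation bit is present for a given MSB-plane nibble, and applies the wrapped right rotation:

```python
def popcount4(n):
    """Number of set bits in a 4-bit value."""
    return bin(n & 0xF).count("1")

def rotate_right4(n):
    """Wrapped right rotation of a 4-bit value by one position."""
    return ((n >> 1) | ((n & 1) << 3)) & 0xF

def needs_rotation_bit(msb_nibble):
    """True iff the nibble belongs to the length five codes 1323: the
    bit pattern has three ones, or is one of '1010' / '0101'."""
    return popcount4(msb_nibble) == 3 or msb_nibble in (0b1010, 0b0101)
```

Exactly six of the fifteen possible non-zero MSB-plane nibbles satisfy the test, and the wrapped rotation pairs them (e.g. ‘1110’ with ‘0111’), which is why three five-bit code-words suffice for the group 1323.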
The different coding costs 1314 across the groups 1321, 1322 and 1323 reflect the expected frequency of each of the associated MSB-plane patterns in typical coded images. Specifically, MSB-plane patterns typically contain a single one (1) most frequently and three ones least frequently. As a result, images are typically encoded using a higher proportion of 3 bit codes (from 1321) than 4 or 5 bit codes (from 1322 or 1323 respectively), and overall fewer bits.
In summary, the table 1300 of Fig. 13A defines the contents of three bit-streams - data, rotation and sign - that arise from coding the most significant bit-plane of coefficient groups when using a modified variable length coding scheme. The modified variable length coding scheme has advantages when vectorising the computation involved in encoding and decoding the coefficient groups. The modified scheme creates a data bit-stream, comprising code-words from 1311, that is a whole number of nibbles. The length of the data bit-stream for each coefficient group can be determined based on previously determined information. The previously determined information comprises the most significant bit-plane indices and the truncation indices.
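As a minimal illustration of the last point, the data bit-stream contribution of one coefficient group can be computed from the two indices alone. The sketch assumes, for illustration only, that the difference between the most significant bit-plane index and the truncation index gives the number of coded bit-planes, each occupying one four-bit word:

```python
def data_stream_bits(msb_index, truncation_index):
    """Bits contributed to the fixed-length data bit-stream by one
    coefficient group: one nibble for the MSB-plane code-word plus one
    nibble per further coded bit-plane (index convention assumed)."""
    n_planes = max(msb_index - truncation_index, 0)
    return 4 * n_planes  # always a whole number of nibbles
```

The result is always a multiple of four, matching the nibble-alignment property of the data bit-stream.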
To create codes that all have 4 bits, the length three (3) codes 1321 are expanded to four bits by the addition of other information for the coefficient group. The length five (5) codes of group 1323 are divided into a base 4 bit pattern (1311) and a 1-bit modifier (1312) that is encoded separately. The other information used to expand the length three (3) codes is the sign bit for the coefficient that is significant at the MSB-plane level. The modifier that is separately coded for the length five (5) codes is a rotation flag (1312) that indicates that the required MSB-plane is a (wrapped) rotation of the decoded MSB-plane.
Other modifications of the variable length coding described may be used. For example, the length three (3) codes could be expanded to include an additional bit of precision information from a coefficient in the group rather than the sign bit. However, using an additional bit of precision information has relatively little coding benefit. Alternatively, the additional bit could be used to store the rotation bit for an adjacent coefficient group. However, storing the rotation bit for an adjacent coefficient group would create dependencies between coefficient groups that would prevent the coefficient groups being processed in parallel. The sign information represents a preferable choice for expanding the 3 bit code-words because the sign information is required for decoding the coefficient values and must otherwise be coded elsewhere in the bit-stream. The sign information also has a clear relationship to the decoded MSB-plane through the position of the sign bit and allows for processing in parallel.
In other arrangements, an arbitrary modifier for the fifth bit 1312 of the length five (5) codes 1323 can be selected. Rotate is preferably selected because the same logic can be applied to all five (5) bit codes. Further, a right rotation is selected to avoid the possibility of writing ones beyond the end of the decoder data buffer in cases where the right-most coefficient group extends past the edge of the image. Using a right rotation is also consistent with the use of zero padding (at the encoder 114) for coefficient groups on the right of the image. Zero padding on the edge is desirable because zero padding increases the likelihood of using length three (3) or length four (4) codes at the image boundary.
To further describe the modified variable length coding scheme, a coding example is described with reference to Fig. 13B. Fig. 13B depicts an array 1390 having three (3) groups of coefficients 1351, 1352 and 1353. The groups 1351, 1352 and 1353 have numeric values 1320 expanded to show their bit-plane structure. Each coefficient has a sign bit 1335 and magnitude bits 1330. Each magnitude bit 1330 is associated with a bit-plane index 1321. Within each group of coefficients the magnitude bits can be further partitioned into leading zeros 1331, significant coded bits 1332 and truncated bits 1333.
The partition of the magnitude bits is defined by a most significant bit-plane index 1340 and a truncation bit-plane index 1350 as determined at steps 335 and 340. Within each group a nibble that specifies the most significant bit-plane 1354, 1355, 1356 is subjected to the modified arrangement of variable length coding described in relation to Fig. 13A.
Modified variable length coding produces three component bit-streams corresponding to data 1371, rotation 1372 and sign 1373. The three component bit-streams are subsequently concatenated to form a precinct bit-stream 1374. The portion 1372 includes one bit corresponding to each five bit code used to encode the most significant bit-plane (e.g. 1354, 1355, 1356) in the groups of coefficients. The portion 1372 is present if any most significant bit-plane nibble relates to a five bit code. The portion 1373 includes sign bits corresponding to non-zero coefficient values in the coefficient groups. A coefficient is determined to be zero if the coefficient only has zero valued bits between the most significant bit-plane index 1340 and truncation bit-plane index 1350 for the coefficient group. The component bit-streams 1372 and 1373 form portions of the bit-stream 1374.
The portions are combined at the step 390, which may for example be executed by the packer module 1190 of the video encoder 114.
In the example of Fig. 13B, the MSB-plane nibble 1354 for the coefficient group 1351 is the sequence ‘1110’, which corresponds to a five (5) bit code. The data code-word for the nibble 1354 is ‘1101’ (see Fig. 13A). The code-word ‘1101’ is written to the data
bit-stream 1371 along with the contents of the remaining bit-planes down to the bit-plane truncation level 1350. Further, the rotation bit 1312 corresponding to the MSB-plane nibble 1354 is 0, as indicated by Fig. 13A. The rotation bit value ‘0’ is written to the rotation bit-stream 1372. The decoder 134 generates the sequence ‘1110’ based on the data code-word and no rotation is required to yield the final MSB-plane. The coefficient group 1351 has no zero valued coefficients after truncation so an entire sign nibble 1364 (‘0001’) for the coefficient group 1351 is also written to the sign bit-stream 1373.
In the example of Fig. 13B, the MSB-plane nibble 1355 for the coefficient group 1352 is the sequence ‘0100’, which corresponds to a three (3) bit code. The data code-word for the nibble 1355 is ‘001S’, as indicated by Fig. 13A. From a sign nibble 1365 (‘1000’) for the coefficient group 1352, the sign bit corresponding to the 1 in the MSB-plane 1355 is a zero (0). Accordingly, ‘0’ is packed as ‘S’ to create the final 4-bit code-word 1361 (‘0010’). The code-word 1361 is written to the data bit-stream 1371 along with the contents of the remaining bit-planes down to the bit-plane truncation level 1350. The coefficient group has a single zero valued coefficient after truncation in position two (where positions are in the range 0-3). As the sign bit for position 1 has already been packed into the data bit-stream, the remaining two sign bits 1362 (‘10’) of the sign nibble 1365 corresponding to positions 0 and 3 are written to the sign bit-stream 1373.
In the example of Fig. 13B, the MSB-plane nibble 1356 for the coefficient group 1353 is the sequence ‘1001’, which corresponds to a length four (4) code. The data code-word for the nibble 1356 is ‘1011’, as shown by Fig. 13A. The code-word ‘1011’ is written to the data bit-stream 1371 along with the contents of the remaining bit-planes down to the bit-plane truncation level 1350. The coefficient group 1353 has no zero valued coefficients after truncation so an entire sign nibble 1366 (‘1100’) for the coefficient group 1353 is also written to the sign bit-stream 1373.
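The three worked examples above can be reproduced with a short sketch. Only the code-word mappings quoted in the text are included (the full table 1300 is not reproduced here), and the `signs` and `nonzero` arguments are 4-character strings with position 0 leftmost, which is an assumed convention:

```python
# Partial code tables, limited to the entries documented above.
THREE_BIT = {'0100': '001'}          # '0' + position bits; sign appended
FOUR_BIT = {'1001': '1011'}
FIVE_BIT = {'1110': ('1101', '0')}   # (data code-word, rotation bit)

def encode_msb(nibble, signs, nonzero):
    """Return the (data, rotation, sign) fragments for one group's
    MSB-plane, following the Fig. 13B examples."""
    if nibble in THREE_BIT:
        pos = nibble.index('1')                      # single significant bit
        data = THREE_BIT[nibble] + signs[pos]        # pack its sign as 'S'
        rest = ''.join(signs[i] for i in range(4)
                       if nonzero[i] == '1' and i != pos)
        return data, '', rest
    if nibble in FIVE_BIT:
        data, rot = FIVE_BIT[nibble]
    else:
        data, rot = FOUR_BIT[nibble], ''
    rest = ''.join(signs[i] for i in range(4) if nonzero[i] == '1')
    return data, rot, rest
```

Group 1351 yields (‘1101’, ‘0’, ‘0001’), group 1352 yields (‘0010’, ‘’, ‘10’) and group 1353 yields (‘1011’, ‘’, ‘1100’), matching the three paragraphs above.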
A method 1400 for encoding the MSB-plane of a coefficient group is shown in Fig.
14. The method 1400 is typically implemented as one or more modules of the software application 233, controlled by execution of the processor 205, and stored in the memory 206. The method 1400 can be implemented by the video encoder 114.
The method 1400 represents an expansion of the steps 350-365 of the method 300 of Fig. 3. The method 1400 receives the coefficient groups, MSB-plane index and truncation index determined in execution of steps 301 to 345 of the method 300. The method 1400 begins at a check step 1401. The step 1401 executes by testing whether a property of the coefficient group makes the coefficient group suitable for variable length
MSB-plane coding. Specifically at step 1401, the number of bit-planes present between the most significant bit-plane index and the truncation level is compared to 1. If there is found to be only 1 bit-plane to be coded for the coefficient group (Yes at step 1401) then variable length coding is bypassed and the method 1400 continues to a writing step 1402. The
MSB-plane bits are written to the data bit-stream at execution of step 1402. The method 1400 continues from step 1402 to a determining step 1446. The non-zero coefficient locations are determined at execution of step 1446. The method 1400 continues from step 1446 to a writing step 1448. The sign bits corresponding to the determined locations are written to the sign bit-stream at execution of step 1448.
If step 1401 indicates that more than one bit plane is to be encoded (No at step
1401), variable length coding is used. The method 1400 continues to a determining step 1410. The modified variable length coding process starts at step 1410 by determining the MSB-plane nibble (for example 1354, 1355 or 1356).
The method 1400 continues to a determining step 1412. The MSB-plane nibble determined at step 1410 is used at step 1412 to look up a corresponding code-word. The method 1400 continues from step 1412 to a check step 1420. Step 1420 checks if the code-word is a short code-word (that is a three bit or length three code-word).
If the code-word is determined at step 1420 to be a length three (3) code (Yes at step 1420), then the method 1400 continues to a determining step 1422. The position of the significant bit and the corresponding sign bit is determined at step 1422 (as described in relation to the MSB-plane nibble 1355). The method 1400 continues to an augmenting step 1424. Effectively the code-word encodes a position of a set bit and a sign for the coefficient at the position. The code-word is completed by augmentation with the corresponding sign bit at step 1424. For example, ‘0’ is selected for the nibble 1355 as described in relation to Fig. 13B and the corresponding code-word. The method 1400 continues to a writing step 1426. The four bit code-word is written to the data bit-stream at execution of step 1426 (for example ‘0010’ is written to the data bit-stream 1371). The method 1400 continues to a writing step 1427. The remaining bit-planes down to the truncation level are appended to the data bit-stream directly without encoding at execution of step 1427. The method 1400 continues to a determining step 1428. The location of any additional non-zero coefficients is determined at execution of step 1428. The method 1400 continues from step 1428 to writing step 1448. The sign bits corresponding to the determined locations are written to the sign bit-stream at execution of step 1448.
If, at step 1420, the code-word corresponding to the MSB-plane nibble was not a length three (3) code (No at step 1420), the method 1400 continues to a check step 1430. In execution of step 1430, a test is performed by the application 233 to check if the code-word is a length five (5) code (also referred to as a long code). If the code-word does correspond to a length five (5) code (Yes at 1430), the method 1400 continues to a determining step 1432. The MSB-plane nibble (for example 1354) is compared at execution of step 1432 to the reference MSB-plane, shown in Fig. 13A, corresponding to the code-word for rotation of zero. If the MSB-plane nibble is found to be a match to the code-word for rotation of zero at step 1432, the rotation is determined to be zero; otherwise, the rotation is determined to be one (1). The method 1400 continues to a writing step 1434. The rotation value determined at step 1432 is written to the rotation bit-stream (e.g. 1372) at execution of step 1434. The method 1400 continues under execution of the processor 205 from step 1434 to step 1442.
If the code-word is found at step 1430 to correspond to a length four (4) code (No at step 1430), the method 1400 continues to step 1442. The code-word is written to the data bit-stream at execution of step 1442. The method 1400 continues to writing step 1444. At step 1444, the remaining bit-planes down to the truncation level are appended directly without encoding to the data bit-stream. The method 1400 continues to a determining step
1446. At step 1446, the locations of all non-zero coefficients after truncation are determined and the method 1400 continues to writing step 1448. The sign bits corresponding to the determined locations are appended to the sign bit-stream at execution of step 1448.
The method 1400 ends at execution of step 1448. The sequence of MSB-plane indices, rotation and sign bits are output to step 370 and subsequently combined at step 390 of the method 300.
While step 1401 determines the suitability of variable length MSB-plane coding by determining whether the number of bit-planes to be encoded is greater than 1, more complex tests can be performed at step 1401. For example, an upper bound on the number of bit-planes can also be tested. Using an upper bound may provide an advantage in simplifying budget calculations performed by the budget calculator 1140 of the encoder 114, especially when quantisation is employed, as using an upper bound reduces the number of quantisation cases that need to be evaluated.
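Under the same assumptions as earlier, the control flow of the method 1400 for one coefficient group can be sketched as follows. The `codebook` argument stands in for table 1300 (only its mapping shape is assumed, not the full alphabet); bit-planes are 4-character strings, MSB-plane first, down to the truncation level:

```python
def encode_group(planes, signs, codebook):
    """Sketch of steps 1401-1448 for one coefficient group.  `codebook`
    maps an MSB-plane nibble to a (code-word, rotation-bit) pair."""
    data, rotation, sign = '', '', ''
    packed_pos = None
    if len(planes) == 1:                    # step 1401: bypass VLC
        data += planes[0]                   # step 1402: raw MSB-plane bits
    else:
        nibble = planes[0]                  # step 1410
        code, rot = codebook[nibble]        # step 1412
        if len(code) == 3:                  # step 1420: short code
            packed_pos = nibble.index('1')  # step 1422
            code += signs[packed_pos]       # step 1424: augment with sign
        else:
            rotation += rot                 # step 1434 ('' for 4-bit codes)
        data += code                        # step 1426/1442
        data += ''.join(planes[1:])         # step 1427/1444: raw planes
    for i in range(4):                      # steps 1446/1448
        if any(p[i] == '1' for p in planes) and i != packed_pos:
            sign += signs[i]
    return data, rotation, sign
```

With a codebook containing the documented entries, a group whose MSB-plane is ‘0100’ followed by a second plane ‘1101’ and sign nibble ‘1000’ produces the data fragment ‘00101101’ and the residual signs ‘10’, consistent with the Fig. 13B example for group 1352.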
A method 1500 for decoding the MSB-plane of a coefficient group is described in the flow diagram of Fig. 15. The method 1500 is typically implemented as one or more modules of the software application 233, controlled by execution of the processor 205, and stored in the memory 206. The method 1500 can be implemented by the video decoder
134.
The method 1500 represents an expansion of the steps 930-940 of Fig. 9 for arrangements where the variable length encoding method 1400 is used to form the bitstream 133. The method 1500 receives the truncation indices and MSB-plane indices determined from steps 910 to 925 of the method 900. The method 1500 begins at a check step 1510. The step 1510 operates in a similar manner to step 1401 by testing whether a property of the coefficient group makes the coefficient group suitable for variable length MSB-plane coding. In particular, at step 1510 the number of bit-planes present between the most significant bit-plane index and the truncation level is compared to 1. The most significant bit-plane index and the truncation level values have been previously decoded from the bit-stream at steps 920 and 925 respectively of the method 900. If there is found to be only 1 bit-plane to be decoded for the coefficient group then the step 1510 returns Yes and variable length coding is bypassed. The method 1500 continues to step 1512. The MSB-plane bits are read directly from the data bit-stream at execution of step 1512. The method 1500 continues from step 1512 to an identifying step 1516. Non-zero coefficient locations are determined at execution of step 1516. The method 1500 continues from step 1516 to a reading step 1518. The corresponding sign bits are read from the sign bit-stream at step 1518.
However, if execution of step 1510 indicates that more than 1 bit-plane is present between the most significant bit-plane index and the truncation level (No at step 1510), variable length coding is used for decoding the code-word. Determining the number of bit-planes present effectively determines a portion of the bit-stream corresponding to a plurality of bit-planes for a group of coefficients. The method 1500 continues from step 1510 to a reading step 1520. Determining that more than one bit-plane is present determines whether variable length coding was used for the coefficient group. The modified variable length decoding process starts at step 1520 by reading the fixed length MSB-plane code-word from the data bit-stream. The method 1500 continues to a check step 1522.
Step 1522 executes to determine if the code-word is a short (three (3) bit) code.
The code word is determined to be a short code at step 1522 if the code word starts with a leading 0. Step 1522 effectively relates to examining a bit value of the code-word at a
predetermined position. Determining that the code-word is a short code effectively relates to determining that the code-word includes a sign bit for the coefficient - for example the last ‘0’ in the code-word 1361 (‘0010’). However, other arrangements of step 1522 can be used. The position of the significant bit can be determined by looking up a sequence of bits corresponding to the code word, for example. Alternatively, the position of the significant bit can be determined from the position of the set bit directly from the code word. If the code word is determined to be a short code (Yes at step 1522), the method 1500 proceeds from step 1522 to step 1524. The position of the significant bit is determined and set (to 1) at step 1524. In the context of the arrangements described for step 1524, the significant bit relates to the bit in the MSB-plane code-word set to ‘1’. Further, the sign bit of the largest coefficient in the coefficient group is set to the value of the last bit of the code-word at step 1524. The length three (3) code words 1321 defined in the table 1300 of Fig. 13A are designed such that the first bit having a value of zero identifies the code as a length three (3) code, the middle two bits form a two bit integer value encoding the position of the significant bit in the MSB-plane, and the final bit encodes the sign information. In this way step 1524 is achieved at low complexity in the decoder 134. Steps 1520 through 1524 effectively operate to determine the MSB-plane and a sign for a coefficient by decoding the code-word based on the modification of variable length encoding.
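The structure described for the length three (3) codes makes the decode at step 1524 trivial. A sketch, using the same assumed string conventions as the earlier examples:

```python
def decode_short(code4):
    """Decode a four-bit word known to be a length-three code (leading
    bit '0'): the middle two bits give the position of the significant
    coefficient, the final bit its sign."""
    assert code4[0] == '0'
    position = int(code4[1:3], 2)   # two-bit integer position
    sign = code4[3]
    nibble = ['0'] * 4
    nibble[position] = '1'          # set the significant bit
    return ''.join(nibble), position, sign
```

For the code-word 1361 (‘0010’) of Fig. 13B this recovers the MSB-plane nibble ‘0100’, position 1 and sign ‘0’.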
The method 1500 continues from step 1524 to a reading step 1526. Subsequent bit-planes are read from the data bit-stream at execution of step 1526 and written into memory as reconstructed coefficient values for subsequent bit-planes. In order to determine which bits correspond to a particular group of coefficients, a portion of the bit-stream corresponding to a plurality of bit-planes for the group of coefficients is determined. The portion of the bit-stream corresponding to coefficient magnitudes for the group of coefficients is determined based on the number of bit-planes encoded for the group of coefficients and the number of bits used to encode each bit-plane. Given the fixed-length code words used to encode each bit-plane of coefficient magnitudes and the previously decoded MSB-plane index and truncation level, the portion of the bit-stream corresponding to the plurality of bit-planes for a group of coefficients can be determined by a decoder. As such, portions of the bit-stream corresponding to different groups of coefficients can be determined in advance, thus allowing decoding of groups of coefficients in parallel. The method 1500 continues to an identifying step 1528. The reconstructed coefficients are tested to identify any additional non-zero values at execution of step 1528. Step 1528 only needs to test values in the coefficient group for which the sign has not already been set at step 1524. The method 1500 continues to step 1518. The corresponding sign bits are read from the sign bit-stream 1373 at execution of step 1518.
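The offset arithmetic described above can be sketched as follows, in Python and for illustration only. The number of coded bit-planes per group is taken here as the difference between the MSB-plane index and the truncation level, and each coded bit-plane is assumed to cost one 4-bit word per coefficient group; both conventions are assumptions, since the exact indexing is not reproduced here.

```python
def group_portion_offsets(msb_indices, truncation_level, word_bits=4):
    """Return (offsets, lengths), in bits, of each group's data portion.

    With fixed-length per-plane code words, the portions can be located
    before any decoding, so groups can be decoded in parallel.
    """
    lengths = [max(msb - truncation_level, 0) * word_bits
               for msb in msb_indices]
    offsets, total = [], 0
    for n in lengths:
        offsets.append(total)   # start of this group's portion
        total += n
    return offsets, lengths
```

Because every quantity involved is known before the code-words are parsed, each offset can be handed to a separate decoding thread.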
If the decision step 1522 finds that the MSB-plane code-word does not correspond to a short (three (3) bit) code (No at step 1522), then the method 1500 follows a different execution path. In this event, the code-word does not contain sign information (sign data). The method 1500 continues to a decoding step 1530. At execution of step 1530, the code-word is used to determine a base pattern for the most significant bit-plane of the coefficient group. The method 1500 continues to a reading step 1532. Subsequent bit-planes are read from the bit-stream at execution of step 1532.
The method 1500 continues to a check step 1534. At step 1534 a branch is performed based on a check of whether the MSB-plane code-word (or corresponding base pattern) corresponds to a long (length five (5)) code. The length five (5) codes 1323, defined in the table 1300 of Fig. 13A, have the property that the first two bits are always set (1) but at least one bit in the code-word is not set (i.e. at least one bit in the code-word is zero). This property allows the test at step 1534 to be implemented with low complexity. Determining that the code-word is long effectively determines that a second portion of the bit-stream corresponding to rotation bits for the MSB-plane (such as rotation 1372 of the bit-stream 1374) must be read in order to decode the MSB-plane for the coefficient group. Typically the rotation bits for each precinct are present in the bit-stream immediately after the data code-words for the precinct and are, in turn, followed by the packed sign bits for the precinct. As the length of the data bit-stream can be determined based on the fixed-length size of the code words for each bit-plane, the decoded MSB-plane indexes and the truncation level index, the location of the rotation bit-stream is determined at the decoder 134. Further, the length of the portion of the rotation bit-stream corresponding to each coefficient group can be determined from the MSB-plane code-word, allowing an index to the portion of the rotation bit-stream corresponding to each coefficient group to be calculated prior to parsing the rotation bit-stream.
Pre-calculation of indexes corresponding to the rotation bits for each coefficient group can be advantageous when the decoding method 1500 is implemented using a GPU or similar vectorised processing architecture, as each coefficient group can be reconstructed as a separate execution thread: at least two groups of coefficients are decoded in parallel using a plurality of threads, with each thread of the plurality of threads reading a respective portion of the bit-stream for a corresponding group of coefficients to be decoded. Each of the bit-stream portions may be padded with extra bits as required to make the bit-stream's total length an exact multiple of 4 or 8 in order to improve the simplicity and efficiency of bit-stream reading by hardware or software. Further, it is advantageous for variable length sections of the bit-stream, such as the packed rotation bit-stream, to be prefixed with a length. This allows highly pipelined architectures (such as FPGA or ASIC) to create indexes into each portion of the bit-stream. The indexes allow processing of rotation and sign bits to proceed before the entire data bit-stream has been processed.
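A hypothetical sketch of this index pre-calculation follows, in Python and for illustration only. It assumes that each long code-word implies exactly one rotation bit and that short and other code-words imply none; a prefix sum then gives each thread its read position in the rotation bit-stream.

```python
def rotation_bit_indexes(is_long_code):
    """Prefix-sum the per-group rotation-bit counts so that each decoding
    thread knows where to start reading in the rotation bit-stream.

    is_long_code: one boolean per coefficient group (assumption: a long
    code-word consumes exactly one rotation bit).
    """
    indexes, total = [], 0
    for long_code in is_long_code:
        indexes.append(total)           # this group's read position
        total += 1 if long_code else 0  # advance past its rotation bit
    return indexes, total               # total = rotation stream length in bits
```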
If the code-word is found to be a long (five (5) bit) code at step 1534 (Yes at step 1534) then the method 1500 continues to a reading step 1536. For the codes 1323, the fixed length four bit code-word does not fully specify the MSB-plane nibble. A corresponding rotation bit or amount is required to fully specify or decode the MSB-plane nibble (such as 1354). A rotation amount or bit is read from the rotation bit-stream (for example 1372) at step 1536. The rotation amount is a 1-bit value. If the rotation amount is set (1), the bits in the most significant bit-plane of the coefficients are modified by applying the rotation amount. In applying the rotation amount, the bits in the most significant bit-plane of the coefficients are rotated to the right by one position at step 1536. The rotation operation is wrapped so that the right-most bit is transferred to the left-most position after the rotation operation and all other bits are shifted by one position to the right. The relationship conferred by the rotation is shown in the pairs of MSB-plane nibbles 1310 corresponding to each unique code-word 1311 in the length five (5) codes 1323 of Fig. 13A. In step 1536, the most significant bit-plane is decoded using the corresponding code-word and a rotation bit stored in a variable length portion of the bit-stream. The method 1500 continues from step 1536 to an identifying step 1538. If decision step 1534 finds that the code-word does not belong to the set of long (length five (5)) codes, the method 1500 continues from step 1534 to step 1538.
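The wrapped rotation applied at step 1536 reduces to a rotate-right-by-one on the 4-bit MSB-plane nibble, sketched here in Python for illustration:

```python
def rotate_right_nibble(nibble):
    """Rotate a 4-bit value right by one position with wrap-around:
    the right-most bit is transferred to the left-most position and
    all other bits shift one position to the right."""
    nibble &= 0xF
    return (nibble >> 1) | ((nibble & 1) << 3)
```

For example, 0011 becomes 1001 and 1000 becomes 0100, matching the paired nibbles described for the length five (5) codes.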
Non-zero coefficient locations of the code-word are identified at execution of step 1538. The method 1500 continues from step 1538 to step 1518. The corresponding sign bits are read from the sign bit-stream (for example 1373) at step 1518. Typically the packed sign bits for a precinct are present in the bit-stream immediately after the packed rotation bits for the precinct. The locations of missing sign bits can be determined after the coefficient magnitudes, including the MSB-plane codes, have been reconstructed. As such, the length of the sign bit-stream corresponding to each coefficient group can be determined based on reconstructed coefficient magnitudes. As with the rotation bit-stream, an index to the portion of the sign bit-stream corresponding to each coefficient group can be determined using the length of the rotation bit-stream and by examining reconstructed coefficient magnitudes within a respective group of coefficients. Pre-calculation of indexes corresponding to the sign bits for each coefficient group can be advantageous when the decoding method 1500 is implemented using a GPU or similar vectorised processing architecture as each coefficient group can be reconstructed as a separate execution thread.
Each of the bit-stream portions, e.g. the rotation bit-stream and sign bit-stream, may be padded with extra bits as required to make the bit-stream's total length an exact multiple of 4 or 8 in order to improve the simplicity and efficiency of bit-stream reading by hardware or software. Further, it is advantageous for variable length sections of the bit-stream, such as the packed sign bit-stream, to be prefixed with a length. Prefixing the variable length sections with a length allows highly pipelined architectures (such as FPGA or ASIC) to create indexes into each portion of the bit-stream. These indexes allow processing of rotation and sign bits to proceed before the entire data bit-stream has been processed. The method 1500 ends at step 1518. The resultant decoded values are output at step 1518, for use by step 945 for example. Each of steps 1528 and 1538 determines the portion 1373 of the bit-stream based on the number of non-zero coefficient values in the group of coefficients.
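Padding a bit-stream portion to an exact multiple of 4 or 8 bits, as suggested above, is simple modular arithmetic, sketched here for illustration:

```python
def padded_length(nbits, align=8):
    """Smallest multiple of `align` that is >= nbits, i.e. the portion
    length after padding with extra bits for aligned reading."""
    return nbits + (-nbits) % align
```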
Both the modified variable length encoding method 1400 and decoding method 1500 include a test to determine whether variable length MSB-plane encoding should be applied. The test is based on an analysis of typical image data subjected to the wavelet encoding procedure described herein. Exemplary results of the analysis are depicted in Figs. 16A to 16C. In particular, a pie chart 1610 of Fig. 16A shows the distribution of length three (3), four (4) and five (5) codes for the case where only a single bit-plane is to be coded. In the case of Fig. 16A, quantisation according to Equation (4), performed at the encoder 114 in step 345, has resulted in a relatively uniform distribution of code-words, which is unfavourable for variable length coding. For variable length coding to be advantageous, the frequency of shorter (length three (3)) codes must be greater than the frequency of longer (length five (5)) codes. The chart 1610 indicates that this is not true, after quantisation, when only a single bit-plane is being encoded for the coefficient group.
However, as more coefficient bit-planes are encoded, the effect of quantisation on the MSB-plane is reduced and the distribution increasingly favours shorter codes. A pie chart 1611 of Fig. 16B shows the relative frequencies of length three (3), four (4) and five (5) codes across all cases where more than one bit-plane is encoded for a coefficient group. The chart 1611 shows a clear dominance of shorter (length three (3)) codes in this case.
If the quantisation step 345 of the method 300 were to perform truncation only (otherwise known as dead-zone quantisation), then there would be no impact on the distribution of code-words and the modified variable length coding and decoding methods 1400 and 1500 could be carried out without the initial test steps 1401 and 1510 respectively. In this event, the methods 1400 and 1500 would proceed as if the tests 1401 and 1510 always returned "No", indicating that the variable length coding should not be bypassed. Alternatively, the initial test steps 1401 and 1510 may only test an upper bound on the number of bit-planes being encoded for a coefficient group.

A graph 1620 in Fig. 16C shows how the coding advantage of MSB-plane packing varies as an upper bound on coefficient precision is additionally implemented as part of the test steps 1401 and 1510. The graph 1620 indicates that coding gain increases as more bit-planes are encoded; however, there is a diminishing return, with very little incremental advantage for applying the technique when 6 or more bit-planes are to be encoded for the coefficient group. For this reason, and because some complexity reduction may be possible, another arrangement of the modified variable length encoding method 1400 and the decoding method 1500 employs more complex test steps 1401 and 1510 incorporating both lower and upper bounds on the number of bit-planes. In yet another variation a higher lower bound (i.e. greater than one) may be used. For example, the tests 1401 and 1510 may return true when the number of coefficient bit-planes to encode or decode is one (1) or two (2). In this case the encoder method 1400 is modified so that step 1402 proceeds to step 1444, where the additional bit-planes are written to the data bit-stream (rather than at step 1446). Similarly, in the decoding method 1500, an additional step is required between steps 1512 and 1516 (or incorporated into one of steps 1512 and 1516) to read the additional encoded bit-planes from the data bit-stream.
By applying a variable length coding to the most significant bit-plane of a coefficient group, some coding advantage can be achieved. However, the process of calculating the MSB index budget is more complex compared to a fixed-length coding approach for the MSB-plane. Specifically, at the step 610 of the method 600, the processor 205 must determine the quantised form of the most significant bit-plane at each possible quantisation level. While a worst case budget may be used, as previously described with reference to Fig. 7, achieving an exact budget is also advantageous.
A low cost method 1700 for determining the most significant nibble for a coefficient group, that takes into account the effect of quantisation but does not perform quantisation, is shown in Fig. 17. The method 1700 is typically implemented as one or more modules of the software application 233, controlled by execution of the processor 205, and stored in the memory 206.
The method 1700 has lower computational complexity than the approach of performing quantisation for each possible truncation level. The method 1700 exploits the fact that quantisation will set a bit in the most significant bit-plane when the full precision coefficient magnitude exceeds a threshold. As a result, each one (1) in the MSB-plane can be determined by comparing the corresponding coefficient magnitude to a threshold. The threshold is determined for a coefficient group from the MSB-plane index and truncation level index for the coefficient group.
The method 1700 starts at a determining step 1710. The MSB-plane index for the coefficient group is determined at execution of step 1710. The MSB-plane indices were previously calculated at step 335 of the method 300. Execution of the method 1700 proceeds to a determining step 1720. A threshold is determined at execution of step 1720 based on the previously determined MSB-plane index and a truncation level for the coefficient group. The threshold values may be determined using a brute force search and stored in a table in memory that is indexed using the MSB-plane index and truncation level. A more memory efficient method is described below with reference to Fig. 18. Execution of the method 1700 proceeds to an initialisation step 1730. A loop variable, "index", is initialised to zero on execution of step 1730. Execution of the method 1700 proceeds to a check step 1740.
A loop comprising steps 1740 through to 1790 executes once for each coefficient location in the coefficient group. The checking step 1740 determines whether the loop variable index is within the range of coefficient positions. If the checking step 1740 returns Yes, indicating an index value that is within the range of coefficient positions, then execution of the method 1700 proceeds to a step 1750. The coefficient magnitude is retrieved at execution of step 1750. Execution of the method 1700 proceeds from step 1750 to a determining step 1760. At step 1760, the value of the bit in the MSB-plane of the coefficient is determined. Execution of the method 1700 then proceeds to a check step 1770, where the coefficient magnitude is compared to the threshold determined previously at step 1720. If checking step 1770 indicates that the coefficient magnitude is greater than the threshold (Yes at step 1770), then execution of the method 1700 proceeds to an updating step 1780, where the bit value determined at step 1760 is set to one (1). Execution of the method 1700 then proceeds to step 1790.
If the checking step 1770 indicates that the coefficient magnitude is not greater than the threshold determined at step 1720 (No at step 1770) then execution proceeds to a setting step 1790. The determined value for the most significant bit of the currently indexed coefficient is written to the currently indexed position in a nibble at step 1790. The nibble is a 4-bit storage, in which there is one bit position corresponding to each coefficient position, used to store bits as they would appear in the most significant bit-plane of the quantised coefficient group. Execution of the method 1700 then proceeds to a loop variable updating step 1795. The value of index is incremented at execution of step 1795.
If the checking step 1740 returns No, indicating an index value that is not within the range of coefficient positions, execution of the method 1700 proceeds to step 1799. At determining step 1799, the nibble value is used to look up the cost difference of bits required to code the MSB-plane. The cost difference is determined according to column 1314 of the table 1300. The cost difference is added to the default coding cost (of four) to determine the final cost for coding the nibble, and this value is returned by the determining step 1799 to step 620 of the method 600. Execution of the method 1700 ends with determining step 1799.
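The loop of steps 1740 through 1790 can be sketched as below, in Python and for illustration only, assuming a group of four coefficients and that bit i of the nibble corresponds to coefficient position i (the packing order is an assumption, as table 1300 is not reproduced here):

```python
def quantised_msb_nibble(magnitudes, msb_index, threshold):
    """Build the MSB-plane nibble that quantisation would produce,
    without actually quantising.

    A bit is set either because it is set in the full-precision
    MSB-plane (step 1760) or because the magnitude exceeds the
    threshold, in which case quantisation would set it (steps 1770-1780).
    """
    nibble = 0
    for i, mag in enumerate(magnitudes):   # one bit per coefficient position
        bit = (mag >> msb_index) & 1       # step 1760: bit at the MSB-plane
        if mag > threshold:                # steps 1770-1780: rounding sets it
            bit = 1
        nibble |= bit << i                 # step 1790: write into the nibble
    return nibble
```

The resulting nibble can then be used to look up the coding cost difference, as at step 1799.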
When using the quantisation defined by Equation (4) the threshold value can be determined using a bit manipulation at step 1720. Specifically, a base threshold value is determined from a positive (unsigned) integer containing all ones by zeroing every nth bit, where n is the number of bit-planes to be coded for the coefficient group and is equal to the difference between the MSB-plane index determined at step 335 and a truncation level index determined at step 340. The threshold is determined by right shifting the base value so that the most significant bit position is one below the MSB-plane index determined for the coefficient group.
A table 1800 of base threshold values in both binary 1822 and hexadecimal 1823 form for a range of values of n 1821 is shown in Fig. 18. Based on the table 1800, the determining step 1720 determines a threshold according to

T(BM, BT) = τ[BM − BT] ≫ (BP − BM)    Equation (13)

In Equation (13), τ[n] is the base threshold value table 1800, BM and BT are the MSB-plane index (for example 540, 1340) and the truncation bit-plane index (for example 550, 1350), respectively, of a coefficient group, and BP is one more than the number of bit-planes present in the base threshold table value. BP must therefore be one greater than the maximum value that the MSB-plane index BM can take on. In a typical arrangement, BP = 16. The table 1800 allows the storage required for threshold determination to be minimised. If the bit-pattern structure of the base threshold values is employed, individual values in the table 1800 may be generated as required with relatively little computational overhead and no table storage. Alternatively, a small table containing the 12 required values can be used and the threshold values generated by bit-shifting the appropriate entry. If an upper bound on the number of bit-planes n is implemented as part of checking steps 1401 and 1510 then the size of the table 1800 can be further reduced.
INDUSTRIAL APPLICABILITY
The arrangements described are applicable to the computer and data processing industries, and particularly to digital signal processing for the encoding and decoding of signals such as video signals in a low-latency (sub-frame) video coding system.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have correspondingly varied meanings.

Claims (15)

1. A method for decoding a precinct of video data from a bit-stream, the method comprising:
determining a most significant bit plane index and a truncation index for a group of coefficients in a wavelet subband of the precinct of the video data;
determining a portion of the bit-stream corresponding to a plurality of bit-planes of the group of coefficients based on the most significant bit plane index, the truncation index and a fixed length of each of the plurality of bit planes, the determined portion comprises a plurality of code-words of the fixed length;
determining the most significant bit-plane and a sign for a coefficient in the group of coefficients by decoding a code-word corresponding to the most significant bit plane from the plurality of code-words, the code-word being decoded based on determining that the code-word comprises a sign bit for the coefficient;
determining a further bit-plane by decoding a remaining code-word in the plurality of code-words; and decoding the precinct using the decoded code-words and signs.
2. The method according to claim 1, wherein the fixed length code-word for the most significant bit-plane encodes a position of a set bit within the most significant bit-plane and a sign for the coefficient at the position.
3. The method according to claim 1, wherein a code-word for a most significant bit-plane for a further group of coefficients does not contain sign data.
4. The method according to claim 3, wherein the most significant bit-plane for the further group of coefficients is decoded using the corresponding code-word and a rotation bit stored in a variable length portion of the bit-stream.
5. The method according to claim 4, wherein decoding the most significant bit-plane for the further group of coefficients comprises:
determining the variable length portion of the bit-stream corresponding to a plurality of rotation bits of most significant bit-planes of groups of coefficients;
determining a value of the rotation bit in the variable length portion of the bit-stream based on the code-word for the most significant bit-plane for the further group of coefficients;
determining a plurality of bits for the most significant bit-plane for the further group of coefficients using the code-word; and modifying the determined plurality of bits by applying a rotation amount corresponding to the determined value of the rotation bit.
6. The method according to claim 1 or claim 3 further comprising:
determining a further portion of the bit-stream corresponding to a plurality of sign bits of the group of coefficients based on the number of non-zero coefficient values in the group of coefficients; and determining sign information for coefficients within the group of coefficients by reading sign bits from the further portion of the bit-stream.
7. The method according to claim 1, wherein at least two groups of coefficients are decoded in parallel using a plurality of threads.
8. The method according to claim 7, wherein each thread of the plurality of threads reads a respective portion of the bit-stream for a corresponding group of coefficients to be decoded.
9. The method according to claim 1, wherein the code-word is determined to comprise a sign bit for the coefficient by examining a bit value of the code-word at a predetermined position.
10. A non-transitory computer readable medium having a program stored thereon for decoding a precinct of video data from a bit-stream, the program comprising:
code for determining a most significant bit plane index and a truncation index for a group of coefficients in a wavelet subband of the precinct of the video data;
code for determining a portion of the bit-stream corresponding to a plurality of bit-planes of the group of coefficients based on the most significant bit plane index, the truncation index and a fixed length of each of the plurality of bit planes, the determined portion comprises a plurality of code-words of the fixed length;
code for determining the most significant bit-plane and a sign for a coefficient in the group of coefficients by decoding a code-word corresponding to the most significant bit plane from the plurality of code-words, the code-word being decoded based on determining that the code-word comprises a sign bit for the coefficient;
code for determining a further bit-plane by decoding a remaining code-word in the plurality of code-words; and code for decoding the precinct using the decoded code-words and signs.
11. A system for decoding a precinct of video data from a bit-stream, comprising: a memory for storing data and a computer readable medium;
a processor coupled to the memory for executing a computer program, the program having instructions for:
determining a most significant bit plane index and a truncation index for a group of coefficients in a wavelet subband of the precinct of the video data;
determining a portion of the bit-stream corresponding to a plurality of bit-planes of the group of coefficients based on the most significant bit plane index, the truncation index and a fixed length of each of the plurality of bit planes, the determined portion comprises a plurality of code-words of the fixed length;
determining the most significant bit-plane and a sign for a coefficient in the group of coefficients by decoding a code-word corresponding to the most significant bit plane from the plurality of code-words, the code-word being decoded based on determining that the code-word comprises a sign bit for the coefficient;
determining a further bit-plane by decoding a remaining code-word in the plurality of code-words; and decoding the precinct using the decoded code-words and signs.
12. A video decoder configured to:
receive a precinct of video data from a bit-stream;
determine a most significant bit plane index and a truncation index for a group of coefficients in a wavelet subband of the precinct of the video data;
determine a portion of the bit-stream corresponding to a plurality of bit-planes of the group of coefficients based on the most significant bit plane index, the truncation index and a fixed length of each of the plurality of bit planes, the determined portion comprises a plurality of code-words of the fixed length;
determine the most significant bit-plane and a sign for a coefficient in the group of coefficients by decoding a code-word corresponding to the most significant bit plane from the plurality of code-words, the code-word being decoded based on determining that the code-word comprises a sign bit for the coefficient;
determine a further bit-plane by decoding a remaining code-word in the plurality of code-words; and decode the precinct using the decoded code-words and signs.
13. A bit-stream representing an encoded precinct of video data, comprising:
a first portion, the first portion comprising (i) a fixed length code-word representing data at a most significant bit-plane index of a group of coefficients of the precinct and (ii) bit-planes below the most significant bit-plane index and above a truncation index of the precinct, and a sign portion representing sign bits associated with the group of coefficients, wherein the code-word representing data at the most significant bit-plane index includes one bit of sign information relating to a coefficient, and the sign portion is determined based on presence of non-zero coefficients below the truncation index.
14. The bit-stream according to claim 13, wherein the first portion further comprises a further fixed length code-word representing data at a most significant bit-plane index of a further group of coefficients of the precinct, and the further fixed length code-word does not contain sign information.
15. The bit-stream according to claim 14, wherein the further code-word does not fully specify the most significant bit-plane of the further group of coefficients of the precinct, the bit-stream further comprising:
a further portion representing presence of a rotation of bits in the further code-word, the further portion present if any most significant bit-plane data is not fully specified by the code-word.
AU2017225027A 2017-09-05 2017-09-05 Method, apparatus and system for encoding and decoding video data Abandoned AU2017225027A1 (en)

Publications (1)

Publication Number Publication Date
AU2017225027A1 true AU2017225027A1 (en) 2019-03-21

