AU2017204642A1 - Method, apparatus and system for encoding and decoding video data - Google Patents


Info

Publication number
AU2017204642A1
Authority
AU
Australia
Prior art keywords
group
bits
plane
coefficients
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2017204642A
Inventor
Andrew James Dorrell
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU2017204642A
Publication of AU2017204642A1
Current legal status: Abandoned

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

METHOD, APPARATUS AND SYSTEM FOR ENCODING AND DECODING VIDEO A system and method for encoding and decoding an image. The method for encoding the image comprises forming a word from bits of a most significant bit-plane of a group of wavelet coefficients of the image (345); determining a variable length code for the formed word, the variable length code having fewer bits than the formed word for frequently occurring words (350); and encoding the group of wavelet coefficients of the image using the determined variable length code for the formed word (350) and fixed length encoding for remaining bits of the coefficients within the group.

(Fig. 3: flow diagram of the encoding method, showing steps 301 initialise coding structures, 305 precision conversion, 310 colour transform, 315 wavelet transform, 320 precision conversion, 325 sign magnitude, 330 form coefficient groups, 335 determine MSB indexes, 340 determine truncation indices, 345 quantise magnitude values, 350 encode MSB plane, 360 encode other bit-planes, encode sign bits, 370 encode MSB-plane indices, and 390 construct bit-stream.)

Description

METHOD, APPARATUS AND SYSTEM FOR ENCODING AND DECODING
VIDEO DATA
TECHNICAL FIELD
The present invention relates generally to digital video signal processing and, in particular, to a method, apparatus and system for encoding and decoding video data. The present invention also relates to a computer program product including a computer readable medium having recorded thereon a computer program for encoding and decoding video data.
BACKGROUND
Many applications for video coding currently exist, including applications for transmission and storage of video data. Many video coding standards have also been developed and others are currently in development. Much emphasis in video compression research is directed towards ‘distribution codecs’, i.e. codecs intended for distributing compressed video data to geographically dispersed audiences.
However, an emerging area of research is directed towards ‘mezzanine codecs’.
Mezzanine codecs are used for local distribution, i.e. within a broadcast studio, and are characterised by requirements for ultra-low latency, typically well under one frame, and greatly reduced complexity, both for the encoder and the decoder. Recent developments in such coding within the International Organisation for Standardisation / International
Electrotechnical Commission Joint Technical Committee 1 / Subcommittee 29 / Working Group 1 (ISO/IEC JTC1/SC29/WG1), also known as the Joint Photographic Experts Group (JPEG), have resulted in a standardisation work item named ‘JPEG XS’. The goal of JPEG XS is to produce a codec having an end-to-end latency not exceeding thirty-two (32) lines of video data, and capability for implementation within relatively modest implementation technologies, e.g. mid-range FPGAs from vendors such as Xilinx®. The latency requirements of JPEG XS mandate use of strict rate control techniques to ensure coded data does not vary excessively relative to the capacity of the channel carrying the compressed video data.
In a broadcast studio, video may be captured by a camera before undergoing several transformations, including real-time editing, graphic and overlay insertion and mixing. Once the video has been adequately processed, a distribution encoder is used to
encode the processed video data for final distribution to end consumers. Within the studio, the video data is generally transported in an uncompressed format. Transporting uncompressed video data necessitates the use of very high speed links. Variants of the Serial Digital Interface (SDI) protocol can transport different video formats. For example,
3G-SDI (operating with a 3Gbps electrical link) can transport 1080p HDTV (1920x1080 resolution) at 30fps and eight (8) bits per sample. Interfaces having a fixed bit rate are suited to transporting data having a constant bit rate (CBR).
Uncompressed video data is generally CBR, and compressed video data, in the context of ultra-low latency coding, is generally expected to also be CBR. As bit rates increase, achievable cabling lengths reduce, which becomes problematic for cable routing through a studio. For example, UHDTV (3840x2160) requires a 4X increase in bandwidth compared to 1080p HDTV, implying a 12Gbps interface. Increasing the data rate of a single electrical channel reduces the achievable length of the cabling. At 3 Gbps, cable runs generally cannot exceed 150m, which is the minimum usable length for studio applications.
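The bandwidth figures quoted above follow from simple arithmetic on the video format. The sketch below (Python, illustrative only; the function name is not from the patent) computes active-picture bit rates, ignoring the blanking intervals and ancillary data that real SDI links also carry:

```python
# Back-of-envelope raw (uncompressed) video bit rates for 4:4:4 sampling.
# Real SDI links also carry blanking and ancillary data, so nominal link
# rates (3G, 12G) exceed the active-picture rates computed here.

def raw_bit_rate(width, height, fps, bit_depth, channels=3):
    """Active-picture bit rate in bits per second."""
    return width * height * fps * bit_depth * channels

hdtv = raw_bit_rate(1920, 1080, 30, 8)   # ~1.49e9: fits a 3G-SDI link
uhdtv = raw_bit_rate(3840, 2160, 30, 8)  # ~5.97e9: 4x the HDTV figure
print(f"1080p30: {hdtv / 1e9:.2f} Gbps, UHDTV p30: {uhdtv / 1e9:.2f} Gbps")
```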
One method of achieving higher rate links is by replicating cabling, e.g. by using four 3G-SDI links, with frame tiling or some other multiplexing scheme. However, the cabling replicating method increases cable routing complexity, requires more physical space, and may reduce reliability compared to use of a single cable.
Thus, a codec that can perform compression at relatively low compression ratios (e.g. 4:1) while retaining a ‘visually lossless’ (i.e. having no perceivable artefacts compared to the original video data) level of performance is required by industry.
Compression ratios may also be expressed as the number of ‘bits per pixel’ (bpp) afforded to the compressed stream, noting that conversion back to a compression ratio requires knowledge of the bit depth of the uncompressed signal, and the chroma format. For example, 8b 4:4:4 video data occupies 24bpp uncompressed, so 4bpp implies a 6:1 compression ratio.
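As a minimal sketch of the arithmetic described above (assuming nothing beyond the text):

```python
def compression_ratio(uncompressed_bpp, compressed_bpp):
    """Convert a bits-per-pixel budget back to a compression ratio."""
    return uncompressed_bpp / compressed_bpp

# 8-bit 4:4:4 video occupies 3 channels x 8 bits = 24 bpp uncompressed,
# so a 4 bpp budget implies the 6:1 ratio quoted in the text.
assert compression_ratio(24, 4) == 6.0
```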
Video data includes one or more colour channels. Generally there is one primary colour channel and two secondary colour channels. The primary colour channel is generally referred to as the ‘luma’ channel and the secondary colour channel(s) are generally referred to as the ‘chroma’ channels. The luma channel captures the intensity information of the pixel and is typically denoted using the letter “Y”. When viewed, the image of the luma channel appears as a black and white (greyscale) image of the scene. The colour (hue) information is captured in the two chroma channels. The chroma channels are denoted using the letter “C”. The letter “C” denoting chroma channels is often used in combination
with a colour axis specific subscript, the most common example being Cr and Cb which are used to indicate “red-green” and “blue-yellow” chroma axes respectively. A colour transform describes the conversion between a YCC (luma, chroma) and an RGB (red, green, blue) pixel representation. A colour transform typically takes the form of a matrix operation applied to a vector of the pixel’s channel values. A number of different colour transforms are known in the art, some of which are exactly reversible in integer arithmetic, and are widely employed in image and video compression and transmission. A reorganisation of RGB values to GBR may in some cases be used as a colour transform where the G channel is subsequently treated as if the G channel were a luma channel and the B and R channels are treated as if they were chroma channels.
The transform from RGB to YCC improves compressibility of the video data in two ways. Firstly, the transform from RGB to YCC achieves some decorrelation to improve the effectiveness of the subsequent transform coding. Secondly, human visual sensitivity to fine detail is typically greater for the luma channel than for the chroma channels. The greater human sensitivity to the luma channel means that chroma components can incur more loss, and hence more compression, for the same level of visual loss.
Video data is also represented using a particular chroma sampling format. The luma channel and the chroma channels are spatially sampled at the same spatial density when a 4:4:4 chroma format is in use. For screen content, a commonly used chroma format is 4:4:4, as generally LCD panels provide pixels in a 4:4:4 chroma format. Other chroma sampling formats are also possible. For example, if the chroma channels are sampled at half the rate horizontally (compared to the luma channel), a 4:2:2 chroma sampling format is said to be in use. Also, if the chroma channels are sampled at half the rate horizontally and vertically (compared to the luma channel), a 4:2:0 chroma sampling format is said to be in use. These chroma sampling formats exploit the characteristic of the human visual system that sensitivity to intensity is higher than sensitivity to colour.
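The sampling formats above determine the dimensions of the chroma planes relative to the luma plane. A small sketch (the divisor table is an illustrative convention, not from the patent):

```python
# Horizontal and vertical chroma subsampling divisors for each format.
CHROMA_DIVISORS = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}

def chroma_plane_size(luma_width, luma_height, chroma_format):
    """Dimensions of each chroma plane for a given luma plane size."""
    dx, dy = CHROMA_DIVISORS[chroma_format]
    return luma_width // dx, luma_height // dy

print(chroma_plane_size(1920, 1080, "4:2:2"))  # (960, 1080)
print(chroma_plane_size(1920, 1080, "4:2:0"))  # (960, 540)
```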
While 4:2:0 and 4:2:2 chroma sampling formats are widely employed in distribution codecs, they are less applicable to studio environments, where multiple generations of encoding and decoding are common. Also, for screen content the use of chroma formats other than 4:4:4 can be problematic as distortion is introduced to sub-pixel rendered (or ‘anti-aliased’) text and sharp object edges.
Colour channels are also associated with a bit-depth. The bit-depth defines the size, in bits, of samples in the respective colour channel, which determines a range of available
sample values. Generally, all colour channels have the same bit-depth, although the colour channels may alternatively have different bit-depths.
Frame data may also contain a mixture of screen content and camera captured content. For example, a computer screen may include various windows, icons and control buttons, text, and also contain a video being played, or an image being viewed. The content, in terms of the entirety of a computer screen, can be referred to as ‘mixed content’. Moreover, the level of detail (or ‘texture’) of the content varies within a frame. Generally, regions of detailed textures (e.g. foliage, text), or regions containing noise (e.g. from a camera sensor) are difficult to compress. The detailed textures can only be coded at a low compression ratio without losing detail. Conversely, regions with little detail (e.g. flat regions, sky, background from a computer application) can be coded with a high compression ratio, with little loss of detail.
In terms of low complexity, one method is application of a ‘Wavelet’ transform, applied hierarchically across an image. Wavelet transforms have been studied in the context of the JPEG2000 image coding standard. The application of a wavelet transform across an image differs from a transform using a block-based codec, such as H.264/AVC. H.264/AVC, for example, applies numerous discrete cosine transforms (DCTs) across the spatial extent of each frame. Each block in H.264/AVC is predicted using one of a variety of methods, achieving a high degree of local adaptation, at the price of increased encoder complexity due to the need for more decisions to be made. In contrast, the Wavelet transform is applied over a wide spatial area, and thus the prediction modes available to a block based codec are generally not applicable, resulting in a greatly reduced disparity in the complexity of the encoder and the decoder.
In the context of wavelet-based compression techniques, achieving high visual quality and useful compression at low complexity is difficult. Achieving high visual quality and useful compression at low complexity is particularly difficult when strict local rate control is needed to meet ultra-low latency requirements. In a known method, the locations of zero coefficients arising from quantisation are coded efficiently by exploiting the tree structure of the wavelet transform. However, the known approach exploiting the tree structure of the wavelet transform requires extensive memory access, and is accordingly difficult to achieve with low complexity hardware. In another known method, blocks of wavelet coefficients are coded in bit-plane order using a context adaptive arithmetic coder. The bit-serial processing using a context adaptive arithmetic encoder also makes implementation in low complexity hardware difficult. In one example of a known
low complexity wavelet codec, only the significant bit-planes of relatively small groups of coefficients are transmitted without specific compression processing. Compression is achieved because many coefficient groups have low magnitude values that require only a few bits to represent. Further, the set of most significant bit-plane indexes for each coefficient group can be subject to lossless compression processing at reduced cost because there are fewer index values than coefficients. However, the known low complexity wavelet codec method achieves only limited compression.
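The most significant bit-plane index of a coefficient group, as used by the low complexity codec described above, can be computed cheaply. A minimal sketch (the function name and group contents are illustrative):

```python
def msb_index(group_magnitudes):
    """Index of the highest non-zero bit-plane across a coefficient
    group's magnitudes; -1 if every coefficient in the group is zero."""
    return max(m.bit_length() for m in group_magnitudes) - 1

# Magnitudes 3 (0b011), 0, 5 (0b101), 1 (0b001): bit-plane 2 is the
# highest occupied plane, so only planes 2..0 need to be transmitted.
print(msb_index([3, 0, 5, 1]))  # 2
```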
SUMMARY
It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
One aspect of the present disclosure provides a method for encoding an image, the method comprising: forming a word from bits of a most significant bit-plane of a group of wavelet coefficients of the image; determining a variable length code for the formed word, the variable length code having fewer bits than the formed word for frequently occurring words; and encoding the group of wavelet coefficients of the image using the determined variable length code for the formed word and fixed length encoding for remaining bits of the coefficients within the group.
According to another aspect, an estimate of an encoded size of the group of wavelet coefficients is determined using a maximum possible length of the variable length code for the formed word.
According to another aspect, the variable length code for the formed word is determined using a lookup table.
According to another aspect, the variable length code for the formed word is determined using a lookup table, the lookup table containing shorter codes for formed words with longer consecutive runs of zeros.
Another aspect of the present disclosure provides a method of forming an image by decoding a group of wavelet coefficients of the image, the method comprising: determining a most significant bit-plane index and a truncation bit-plane index for the group; determining, from a variable length code, values for bits of the most significant bit-plane for a plurality of coefficients within the group; receiving a plurality of fixed length codes to form remaining bits of the plurality of coefficients within the group, the remaining bits being for less significant bit-planes of the group determined according to the most significant bit-plane index and the truncation bit-plane index; and determining the plurality
of coefficients from the determined values for the bits of the most significant bit-plane and the received plurality of fixed length codes to form the image.
According to another aspect, the values for bits of the most significant bit-plane are determined from the variable length code using a lookup table.
According to another aspect, the values for bits of the most significant bit-plane are determined from the variable length code using a lookup table, the lookup table using fewer bits for variable length codes corresponding to frequently occurring words.
Another aspect of the present disclosure provides a non-transitory computer readable medium having a program stored thereon for encoding an image, the program comprising: code for forming a word from bits of a most significant bit-plane of a group of wavelet coefficients of the image; code for determining a variable length code for the formed word, the variable length code having fewer bits than the formed word for frequently occurring words; and code for encoding the group of wavelet coefficients of the image using the determined variable length code for the formed word and fixed length encoding for remaining bits of the coefficients within the group.
Another aspect of the present disclosure provides a non-transitory computer readable medium having a program stored thereon for forming an image by decoding a group of wavelet coefficients of the image, the program comprising: code for determining a most significant bit-plane index and a truncation bit-plane index for the group; code for determining, from a variable length code, values for bits of the most significant bit-plane for a plurality of coefficients within the group; code for receiving a plurality of fixed length codes to form remaining bits of the plurality of coefficients within the group, the remaining bits being for less significant bit-planes of the group determined according to the most significant bit-plane index and the truncation bit-plane index; and code for determining the plurality of coefficients from the determined values for the bits of the most significant bit-plane and the received plurality of fixed length codes to form the image.
Another aspect of the present disclosure provides a system for encoding an image, comprising: a memory for storing data and a computer readable medium; a processor coupled to the memory for executing a computer program, the program having instructions for: forming a word from bits of a most significant bit-plane of a group of wavelet coefficients of the image; determining a variable length code for the formed word, the variable length code having fewer bits than the formed word for frequently occurring words; and encoding the group of wavelet coefficients of the image using the determined
variable length code for the formed word and fixed length encoding for remaining bits of the coefficients within the group.
Another aspect of the present disclosure provides a system for decoding a group of wavelet coefficients of the image, comprising: a memory for storing data and a computer readable medium; a processor coupled to the memory for executing a computer program, the program having instructions for: determining a most significant bit-plane index and a truncation bit-plane index for the group; determining, from a variable length code, values for bits of the most significant bit-plane for a plurality of coefficients within the group; receiving a plurality of fixed length codes to form remaining bits of the plurality of coefficients within the group, the remaining bits being for less significant bit-planes of the group determined according to the most significant bit-plane index and the truncation bit-plane index; and determining the plurality of coefficients from the determined values for the bits of the most significant bit-plane and the received plurality of fixed length codes to form the image.
Another aspect of the present disclosure provides a video encoder configured to:
form a word from bits of a most significant bit-plane of a group of wavelet coefficients of the image; determine a variable length code for the formed word, the variable length code having fewer bits than the formed word for frequently occurring words; and encode the group of wavelet coefficients of the image using the determined variable length code for the formed word and fixed length encoding for remaining bits of the coefficients within the group.
Another aspect of the present disclosure provides a video decoder configured to: receive a group of wavelet coefficients of an image; determine a most significant bit-plane index and a truncation bit-plane index for the group; determine, from a variable length code, values for bits of the most significant bit-plane for a plurality of coefficients within the group; receive a plurality of fixed length codes to form remaining bits of the plurality of coefficients within the group, the remaining bits being for less significant bit-planes of the group determined according to the most significant bit-plane index and the truncation bit-plane index; and determine the plurality of coefficients from the determined values for the bits of the most significant bit-plane and the received plurality of fixed length codes to form the image.
Other aspects are also disclosed.
BRIEF DESCRIPTION OF THE DRAWINGS
At least one embodiment of the present invention will now be described with reference to the following drawings and appendices, in which:
Fig. 1 is a schematic block diagram showing a sub-frame latency video encoding and decoding system;
Figs. 2A and 2B form a schematic block diagram of a general purpose computer system upon which one or both of the video encoding and decoding systems of Fig. 1 may be practiced;
Fig. 3 is a schematic flow diagram showing a method for encoding video data with sub-frame latency;
Fig. 4A is a schematic block diagram showing the structure of a wavelet transform suitable for achieving a low latency conversion of pixel values to wavelet coefficients;
Fig. 4B is a diagram showing a logical organisation of wavelet transform coefficients into sub-bands;
Fig. 4C is a diagram showing an in-memory organisation of an incremental output of a wavelet transform processor;
Fig. 5 shows a set of coefficient groups and corresponding characteristics;
Fig. 6 is a schematic flow diagram showing a method of determining truncation bit-plane indices for sub-bands using a rate allocation model;
Fig. 7 is a graph providing a visual depiction of how unused bit-budget from a coded unit of video data within a frame can be redistributed for use in a next coded unit of video data;
Figs. 8A to 8D show tables and pseudo-code for implementing a variable length coding and decoding method;
Fig. 9 is a schematic flow diagram showing a method for decoding video data with sub-frame latency;
Figs. 10A(1) and 10A(2) show different possible arrangements for a forward wavelet transform;
Fig. 10B is a schematic block diagram showing a structure of an inverse wavelet transform suitable for achieving a low latency conversion of wavelet coefficients to pixel values;
Fig. 11 is a schematic block diagram showing an example of functional modules for implementing a sub-frame latency video encoder and inter-connection of the functional modules; and
Fig. 12 is a schematic block diagram showing an example of functional modules for implementing a sub-frame latency video decoder and inter-connection of the functional modules.
DETAILED DESCRIPTION INCLUDING BEST MODE
Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
It is to be noted that the discussions contained in the Background section and that above relating to prior art arrangements relate to discussions of documents or devices which may form public knowledge through their respective publication and/or use. Such discussions should not be interpreted as a representation by the inventor or the patent applicant that such documents or devices in any way form part of the common general knowledge in the art.
The arrangements described relate to using a variable length code to encode the most significant bit-plane of coefficients within groups of wavelet coefficients.
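Before walking through the encoder in detail, the core idea can be illustrated with a short sketch: gather the bit of each coefficient magnitude at the group's most significant bit-plane into a small word, then code that word with a variable length code. The code table below is hypothetical (the patent's actual tables appear in Figs. 8A to 8D); it simply shows the principle of assigning fewer bits to frequently occurring words:

```python
def msb_plane_word(group_magnitudes, msb_plane):
    """Pack the msb_plane bit of each of 4 coefficients into a 4-bit word."""
    word = 0
    for m in group_magnitudes:
        word = (word << 1) | ((m >> msb_plane) & 1)
    return word

# Hypothetical VLC table mapping words to (code_bits, code_length) pairs.
# A real table covers all possible words; by construction at least one
# bit of the group's MSB plane is set, so the all-zero word cannot occur.
EXAMPLE_VLC = {0b1000: (0b0, 1), 0b0100: (0b10, 2), 0b0010: (0b110, 3)}

word = msb_plane_word([5, 1, 0, 2], msb_plane=2)  # bits 1,0,0,0 -> 0b1000
code_bits, code_len = EXAMPLE_VLC[word]           # a 1-bit code here
```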
Fig. 1 is a schematic block diagram showing functional modules of an example of a sub-frame latency video encoding and decoding system 100. The system 100 transfers video data from a source device 110 to a destination device 130 via a communication channel 120, for example a cable. A video source 112 typically comprises a source of uncompressed video data 113. The video source 112 can be an imaging sensor, a previously captured video sequence stored on a non-transitory recording medium, or a video feed from a remote imaging sensor, for example. The uncompressed video data 113 is conveyed from the video source 112 to a video encoder 114 over a CBR channel, with fixed timing of the delivery of the video data. Generally, the video data is delivered in a raster scan format, with signalling to delineate between lines (‘horizontal sync’) and frames (‘vertical sync’). The video source 112 may also be the output of a computer graphics card, for example displaying the video output of an operating system and various applications executing upon a computing device. The computing device can for example
be a tablet computer, laptop or desktop computer. Content output by a graphics card is an example of ‘screen content’. Examples of source devices 110 that may include an image capture sensor as the video source 112 include smart-phones, video camcorders and network video cameras, and the like. As screen content may include smoothly rendered graphics and playback of natural content in various regions, screen content is also commonly a form of ‘mixed content’. The video encoder 114 converts the uncompressed video data 113 from the video source 112 into an encoded (compressed) video data bitstream 115 as described hereinafter in more detail with reference to Fig. 3.
The video encoder 114 encodes the incoming uncompressed video data 113. The video encoder 114 is required to process the incoming sample data in real-time. That is, the video encoder 114 is not able to stall the incoming uncompressed video data 113 (for example, if the rate of processing the incoming data were to fall below the input data rate). The video encoder 114 outputs compressed video data 115 (the ‘bit-stream’) at a constant bit rate. In a video streaming application, the entire bit-stream is not stored in any one location. Instead, minimum coded units (MCUs) of compressed video data, corresponding to spatially contiguous groups of pixels, are continually being produced by the video encoder 114 and consumed by a video decoder 134 with intermediate storage, for example in the (CBR) communication channel 120. The CBR stream of compressed video data 115 is transmitted by the transmitter 116 over the communication channel 120.
Examples of the communication channel 120 include one or more SDI, HDMI or DisplayPort links, as well as twisted pair links such as CAT5 (or similar) Ethernet cable, and optic fibre links. The communication channel 120 can also be a radio connection such as that provided by WiFi (IEEE 802.11) or Bluetooth™. Alternatively, the communication channel can be an internal bus within a system, such as a PCI, VESA or SATA bus, or a chip interface such as a MIPI M-PHY physical layer interface.
Video data may be transferred from the source device 110 to the destination device 130 via an intermediate device 125 such as a non-transitory storage device using communication channels 121 and 122. A storage unit 127 such as a “Flash” memory or a hard disk drive can be used in the intermediate device 125, for example to provide a live delay for implementing a dump box functionality. In the intermediate device 125 where digital storage is implemented, a receiver 132 may convert the signal received from physical link 121 back to the encoded digital form as generated by the video encoder 114. The encoded digital form is written to the storage unit 127. A transmitter 116 is used to retransmit the video data over physical link 122.
The destination device 130 includes a receiver 132, a video decoder 134 and a video sink such as a display device 136. The receiver 132 receives encoded video data from the communication channel 120 (or from the channel 122) and passes received compressed video data 133 to the video decoder 134. The video decoder 134 outputs decoded frame data 135 to the video sink 136. Examples of the video sink 136 include a video display device such as a cathode ray tube, a liquid crystal display (such as in smartphones), tablet computers, computer monitors or stand-alone television sets, and the like. The video sink 136 can be any other consumer of video data, such as a video processing unit, encoder or streaming server.
It is also possible for the functionality of each of the source device 110 and the destination device 130 to be embodied in a single device, examples of which include mobile telephone handsets and tablet computers, or equipment within a broadcast studio including overlay insertion or other live editing units.
The physical link 120 over which the video frames are delivered may be, for example, part of an SDI interface. Interfaces such as SDI have sample timing synchronised to a clock source, with horizontal and vertical blanking periods. As such, samples of the decoded video need to be delivered in accordance with the frame timing of the SDI link. Video data formatted for transmission over SDI may also be conveyed over Ethernet, for example using methods as specified in SMPTE ST 2022-6. In the event that samples are not delivered according to the required timing, noticeable visual artefacts result (e.g. from invalid data being interpreted as sample values by the downstream device). Accordingly, the video encoder 114 and decoder 134 implement rate control and buffer management mechanisms to ensure that no buffer underruns, and resulting failures to deliver decoded video, occur.
Rate variations may arise during compression due to variations in the complexity and time taken for the encoder 114 to search possible modes of the incoming video data 113. Accordingly, the rate control mechanism ensures that decoded video frames 135 from the video decoder 134 are delivered according to the timing of the interface over which the video frames are delivered. A similar constraint exists for the inbound link to the video encoder 114, which needs to encode samples in accordance with arrival timing and may not stall the incoming video data 113 to the video encoder 114 (for example due to varying processing demand for encoding different regions of a frame). To meet the constraints, the video encoder 114 and the video decoder 134 typically implement some buffering of video data. The buffering, at both the encoder 114 and the decoder 134, increases end to end
latency of the video transmission. As described above, the video encoding and decoding system 100 has a latency of less than one frame of video data. In particular, some applications require latencies as low as thirty-two (32) lines of video data from the input of the video encoder 114 to the output of the video decoder 134. The latency may include time taken during input/output of video data and storage of partially-coded video data prior to and after transit over a communications channel.
The system 100 includes the source device 110 and the destination device 130. The communication channel 120 is used to communicate encoded video information from the source device 110 to the destination device 130. In some arrangements, the source device 110 and the destination device 130 may comprise respective broadcast studio equipment, such as overlay insertion and real-time editing modules, in which case the communication channel 120 may be an SDI link. In other arrangements, the source device 110 and the destination device 130 may comprise a graphics driver as part of a system-on-chip (SOC) and an LCD panel (e.g. as found in a smart phone, tablet or laptop computer), in which case the communication channel 120 is typically a wired channel, such as printed circuit board (PCB) tracks and associated connectors.
Moreover, the source device 110 and the destination device 130 may comprise any of a wide range of devices, including devices supporting over the air television broadcasts, cable television applications, internet video applications and applications where encoded video data is captured on some storage medium or a file server. The source device 110 may also be a digital camera capturing video data and outputting the video data in a compressed format offering visually lossless compression, and as such the performance may be considered as equivalent to a truly lossless format (e.g. uncompressed).
Notwithstanding the example devices mentioned above, each of the source device 110 and the destination device 130 may be configured within a general purpose computing system, typically through a combination of hardware and software components.
Fig. 2A illustrates a typical computer system 200, which includes: a computer module 201; input devices such as a keyboard 202, a mouse pointer device 203, a scanner 226, a camera 227 and a microphone 280; and output devices including a printer 215, a display device 214, which may be configured as the display device 136, and loudspeakers 217. The camera 227 may be configured as the video source 112. An external Modulator-Demodulator (Modem) transceiver device 216 may be used by the computer module 201 for communicating to and from a communications network 220 via a connection 221. The communications network 220, which may represent the
communication channel 120, can be a wide-area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 221 is a telephone line, the modem 216 may be a traditional “dial-up” modem. Alternatively, where the connection 221 is a high capacity (e.g., cable) connection, the modem 216 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 220. In some arrangements, the transceiver device 216 may provide the functionality of the transmitter 116 and the receiver 132 and the communication channel 120 may be embodied in the connection 221.
The computer module 201 typically includes at least one processor unit 205, and a memory unit 206. For example, the memory unit 206 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 201 also includes a number of input/output (I/O) interfaces including: an audio-video interface 207 that couples to the video display 214, loudspeakers 217 and microphone 280; an I/O interface 213 that couples to the keyboard 202, mouse 203, scanner 226, camera 227 and optionally a joystick or other human interface device (not illustrated); and an interface 208 for the external modem 216 and printer 215. The signal from the audio-video interface 207 to the computer monitor 214 is generally the output of a computer graphics card and provides an example of ‘screen content’.
In some implementations, the modem 216 may be incorporated within the computer module 201, for example within the interface 208. The computer module 201 also has a local network interface 211, which permits coupling of the computer system 200 via a connection 223 to a local-area communications network 222, known as a Local Area Network (LAN). As illustrated in Fig. 2A, the local communications network 222 may also couple to the wide network 220 via a connection 224, which would typically include a so-called “firewall” device or device of similar functionality. The local network interface 211 may comprise an Ethernet™ circuit card, a Bluetooth™ wireless arrangement or an IEEE 802.11 wireless arrangement. However, numerous other types of interfaces may be practiced for the interface 211. The local network interface 211 may also provide the functionality of the transmitter 116 and the receiver 132, and the communication channel 120 may also be embodied in the local communications network 222.
The I/O interfaces 208 and 213 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 209 are provided and typically include a hard disk drive (HDD) 210. Other storage
devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 212 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g. CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the computer system 200. Typically, any of the HDD 210, optical drive 212, networks 220 and 222 may also be configured to operate as the video source 112, as an intermediate storage device 127, or as a destination for decoded video data to be stored for reproduction via the display 214. The source device 110, intermediate device 125 and the destination device 130 of the system 100, may be embodied in the computer system 200.
The components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204 and in a manner that results in a conventional mode of operation of the computer system 200 known to those in the relevant art. For example, the processor 205 is coupled to the system bus 204 using a connection 218. Likewise, the memory 206 and optical disk drive 212 are coupled to the system bus 204 by connections 219. Examples of computers on which the described arrangements can be practised include IBM-PC’s and compatibles, Sun SPARCstations, Apple Mac™ or similar computer systems.
Where appropriate or desired, the video encoder 114 and the video decoder 134, as well as methods described below, may be implemented using the computer system 200 wherein the video encoder 114, the video decoder 134 and methods to be described, may be implemented as one or more software application programs 233 executable within the computer system 200. In particular, the video encoder 114, the video decoder 134 and the steps of the described methods are effected by instructions 231 (see Fig. 2B) in the software 233 that are carried out within the computer system 200. The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules performs the described methods and a second part and the corresponding code modules manage a user interface between the first part and the user.
The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 200 from the computer readable medium, and then executed by the computer system 200. A computer readable medium having such software or computer program recorded on the
computer readable medium is a computer program product. The use of the computer program product in the computer system 200 preferably achieves an advantageous apparatus for implementing the video encoder 114, the video decoder 134 and the described methods.
The software 233 is typically stored in the HDD 210 or the memory 206. The software is loaded into the computer system 200 from a computer readable medium, and executed by the computer system 200. Thus, for example, the software 233 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 225 that is read by the optical disk drive 212.
In some instances, the application programs 233 may be supplied to the user encoded on one or more CD-ROMs 225 and read via the corresponding drive 212, or alternatively may be read by the user from the networks 220 or 222. Still further, the software can also be loaded into the computer system 200 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 200 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray Disc™, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 201. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of the software, application programs, instructions and/or video data or encoded video data to the computer module 201 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
The second part of the application programs 233 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 214. Through manipulation of typically the keyboard 202 and the mouse 203, a user of the computer system 200 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 217 and user voice commands input via the microphone 280.
Fig. 2B is a detailed schematic block diagram of the processor 205 and a “memory” 234. The memory 234 represents a logical aggregation of all the memory modules (including the HDD 209 and semiconductor memory 206) that can be accessed by the computer module 201 in Fig. 2A.
When the computer module 201 is initially powered up, a power-on self-test (POST) program 250 executes. The POST program 250 is typically stored in a ROM 249 of the semiconductor memory 206 of Fig. 2A. A hardware device such as the ROM 249 storing software is sometimes referred to as firmware. The POST program 250 examines hardware within the computer module 201 to ensure proper functioning and typically checks the processor 205, the memory 234 (209, 206), and a basic input-output systems software (BIOS) module 251, also typically stored in the ROM 249, for correct operation. Once the POST program 250 has run successfully, the BIOS 251 activates the hard disk drive 210 of Fig. 2A. Activation of the hard disk drive 210 causes a bootstrap loader program 252 that is resident on the hard disk drive 210 to execute via the processor 205.
This loads an operating system 253 into the RAM memory 206, upon which the operating system 253 commences operation. The operating system 253 is a system level application, executable by the processor 205, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.
The operating system 253 manages the memory 234 (209, 206) to ensure that each process or application running on the computer module 201 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the computer system 200 of Fig. 2A need to be used properly so that each process can run effectively. Accordingly, the aggregated memory 234 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 200 and how such is used.
As shown in Fig. 2B, the processor 205 includes a number of functional modules including a control unit 239, an arithmetic logic unit (ALU) 240, and a local or internal memory 248, sometimes called a cache memory. The cache memory 248 typically includes a number of storage registers 244-246 in a register section. One or more internal busses 241 functionally interconnect these functional modules. The processor 205 typically also has one or more interfaces 242 for communicating with external devices via
the system bus 204, using a connection 218. The memory 234 is coupled to the bus 204 using a connection 219.
The application program 233 includes a sequence of instructions 231 that may include conditional branch and loop instructions. The program 233 may also include data 232 which is used in execution of the program 233. The instructions 231 and the data 232 are stored in memory locations 228, 229, 230 and 235, 236, 237, respectively. Depending upon the relative size of the instructions 231 and the memory locations 228-230, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 230. Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 228 and 229.
In general, the processor 205 is given a set of instructions which are executed therein. The processor 205 waits for a subsequent input, to which the processor 205 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 202, 203, data received from an external source across one of the networks 220, 222, data retrieved from one of the storage devices 206, 209 or data retrieved from a storage medium 225 inserted into the corresponding reader 212, all depicted in Fig. 2A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 234.
A video encoding method 300 shown in Fig. 3 (which may be used to implement the video encoder 114) and a video decoding method 900 shown in Fig. 9 (which may be used to implement the video decoder 134) may use input variables 254, which are stored in the memory 234 in corresponding memory locations 255, 256, 257. The video encoder
114, the video decoder 134 and the described methods produce output variables 261, which are stored in the memory 234 in corresponding memory locations 262, 263, 264. Intermediate variables 258 may be stored in memory locations 259, 260, 266 and 267.
Referring to the processor 205 of Fig. 2B, the registers 244, 245, 246, the arithmetic logic unit (ALU) 240, and the control unit 239 work together to perform sequences of micro-operations needed to perform “fetch, decode, and execute” cycles for every instruction in the instruction set making up the program 233. Each fetch, decode, and execute cycle comprises:
(a) a fetch operation, which fetches or reads an instruction 231 from a memory location 228, 229, 230;
(b) a decode operation in which the control unit 239 determines which instruction has been fetched; and
(c) an execute operation in which the control unit 239 and/or the ALU 240 execute the instruction.
Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 239 stores or writes a value to a memory location 232.
Each step or sub-process in the methods of Figs. 3, 6 and 9, to be described, is associated with one or more segments of the program 233 and is typically performed by the register section 244, 245, 247, the ALU 240, and the control unit 239 in the processor 205 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 233. Although the execution of each of the methods is described with reference to a single computer system 200 and the corresponding component parts, each of the encoding and decoding methods may execute on distinct processors residing in distinct end equipment. That is, the source device 110, destination device 130 and intermediate device 125 may be implemented as physically distinct devices, each comprising physically distinct computer systems.
Fig. 3 is a schematic flow diagram showing the method 300 of encoding an image, typically a video frame. The method 300 performs encoding using a variable length code for words formed from the most significant bit-planes of coefficient groups. The method
300 is typically implemented as one or more modules of the software application 233, controlled by execution of the processor 205, and stored in the memory 206. The method 300 can be implemented by the video encoder 114.
The method 300 is applied to each frame of a video sequence (for example of the video data 113) independently in order to provide flexibility for editing. Processing of a frame using the method 300 begins at an initialising step 301. The step 301 executes by initialising coding structures including input precision, position indexes and buffers that may reside in the working memory 206 of the computer module 201. Compression of the frame data is able to begin when the method 300 continues to a pixel transform step 302.
Execution of the step 302 ultimately generates wavelet coefficients in a sign plus magnitude format. The pixel transform step 302 is typically implemented as a pipeline in order to minimise buffering and latency within the system 100. The pipeline stages are represented by steps 305, 310, 315, 320 and 325. The step 302 starts at the step 305 and accepts (reads or otherwise receives into the memory 206), raw pixel data for a video
frame and applies a precision conversion. The precision conversion step 305 creates image samples 306 with a pre-determined “working” precision and range that is independent of the actual input image precision. In practice, the step 305 is typically achieved by shifting the bits in the input words an amount determined by the input and working precisions, and then subtracting an amount to centre the working range about zero. The method of step 305 is possible because a bit-wise left shift of bits in a binary number is mathematically equivalent to multiplication by a power of 2 (i.e. $a \ll n = a \times 2^n$), and a bit-wise right shift is mathematically equivalent to division by a power of 2 (i.e. $a \gg n = a / 2^n$). Specifically, the precision conversion step 305 converts pixel values according to:
$p_w = 2^{B_w - B_{in}} p_{in} - 2^{B_w - 1}$   Equation (1)
In Equation (1), $p_{in}$ and $p_w$ are channel values of the input pixel and working pixel respectively, and $B_{in}$ and $B_w$ are the number of bits of precision in the input and working pixels respectively.
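A direct transcription of Equation (1), assuming the working precision B_w is at least the input precision B_in so the shift amount is non-negative (function and variable names are illustrative):

```python
def precision_convert(p_in, b_in, b_w):
    """Equation (1): scale an input sample up to the working precision,
    then subtract half the working range to centre it about zero."""
    return (p_in << (b_w - b_in)) - (1 << (b_w - 1))

# An 8-bit mid-grey sample mapped into a 16-bit working range:
print(precision_convert(128, 8, 16))  # 0: mid-grey maps to the centre
```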
The method 300 continues under execution of the processor 205 to the step 310. At step 310, the working precision pixel values are subject to a colour transform to yield pixel values that can be treated as YCC samples. An integer reversible colour transform is used at step 310 and is defined mathematically by:
$\begin{bmatrix} Y \\ C_g \\ C_o \end{bmatrix} = \dfrac{1}{4}\begin{bmatrix} 1 & 2 & 1 \\ -1 & 2 & -1 \\ 2 & 0 & -2 \end{bmatrix}\begin{bmatrix} R_w \\ G_w \\ B_w \end{bmatrix}$   Equation (2)
Other colour transforms known in the art, either integer reversible or approximate, may also be used for step 310 so long as the forward and inverse transform pair are correctly matched between the encoder and the decoder. The colour transform step 310 can be bypassed if the raw pixel data already comprises YCC samples.
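A sketch of Equation (2) as reconstructed above (an RGB to Y/Cg/Co style transform). This shows only the linear form in floating point; an exactly integer-reversible implementation, as required for matched encoder and decoder operation, would typically use lifting steps rather than a direct matrix multiply:

```python
def rgb_to_ycc(r, g, b):
    """Forward transform of Equation (2) in its linear form."""
    y = (r + 2 * g + b) / 4
    cg = (-r + 2 * g - b) / 4
    co = (2 * r - 2 * b) / 4
    return y, cg, co

def ycc_to_rgb(y, cg, co):
    """Matched inverse of the transform above."""
    g = y + cg
    r = y - cg + co
    b = y - cg - co
    return r, g, b

# The forward and inverse pair round-trip exactly:
assert ycc_to_rgb(*rgb_to_ycc(90, 120, 200)) == (90, 120, 200)
```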
The method 300 continues under execution of the processor 205 to the step 315. The resulting stream of YCC values 311 are passed to the transform step 315. At step 315, the stream of YCC values are subject to a wavelet transformation. Generally, a 5/3 Le Gall wavelet is used, where 5 and 3 refer to lengths of the high pass and low pass filters used in analysis and synthesis. Other wavelets are also possible, such as a Haar wavelet or a Cohen-Daubechies-Feauveau 9/7 wavelet. Due to the requirement of ultra-low latency, the
number of levels of decompositions is highly constrained vertically, generally to not more than two levels. The number of levels of decompositions is relatively unconstrained horizontally, for example with five levels being used. In one arrangement, the wavelet transform of step 315 is implemented according to a wavelet transform arrangement 499 of Fig. 4A. The arrangement 499 is comprised of a series of stages 421, 424, 425, 426, 427,
428 and 429. Each stage applies one level of wavelet decomposition either vertically along columns of pixel values (as in 421) or horizontally along rows of pixel values (as in 424-429). The stages 421, 424, 425, 426, 427, 428 and 429 are implemented using a lifting scheme (known in the art) but could equally be implemented using convolutional filters with subsampling (also known in the art). Each stage generates high-pass coefficients (e.g.
422) and low pass coefficients (e.g. 423) as output. A total number of output coefficients
316 is equal to a number of the input pixel values 311. Multiple stages of wavelet decomposition are applied to generate a full set of coefficients 401-408. The coefficients 401-408 are referred to in aggregate as the wavelet transform of the input signal.
Due to the structure of the wavelet decomposition stages 421, 424, 425, 426, 427,
428 and 429, a wavelet coefficient structure 400 depicted in Fig. 4B results from execution of step 315. The structure 400 has a sub-band corresponding to each set of values 401-408 output by each stage of the wavelet transform step 315. For example, the L5L and L5H sub-bands represent the collected outputs of the decomposition stage 429, L4H corresponds to the high pass output of 428 and so on.
Other two-dimensional (2D) wavelet transform structures or arrangements arise from a different cascading of the vertical and horizontal stages and may be used in place of those described by Figs. 4A and 4B. Some examples are given in Figs. 10A(1) and 10A(2). Specifically, two different examples of wavelet transform stage configurations 1091 and 1093 are shown in Figs. 10A(1) and 10A(2) respectively. The configurations 1091 and 1093 are shown along with corresponding sub-band structures 1092 and 1094 respectively. The structure 1092 comprises coefficient sub-bands 1041-1047 and the structure 1094 comprises coefficient sub-bands 1051-1057. In both configurations 1091 and 1093, the transform is comprised of one-dimensional (1D) wavelet transform stages which are applied either vertically (for example 1031a, 1031b) or horizontally (for example 1035a, 1035b). Each stage generates a single high-pass (for example 1032a, 1032b) and a single low-pass (for example 1033a, 1033b) output. The example 1091 is widely used to implement a 2 level 2D wavelet transform.
Following the model of the transform arrangement 499 of Fig. 4A, additional 1D transform stages may be performed on the output 1041 of 1091. Different wavelet transform structures have different latency that is often the result of vertically applied transform stages. In all cases, a corresponding precinct structure mirrors the sub-band structure. Whatever wavelet transform structure is employed in the video encoder 114, the complementary structure is employed in the decoder 134. For example, Fig. 10B depicts an inverse wavelet transform structure 1099 that is complementary to the structure of Fig. 4A. The structure 1099 is described in detail below in the context of a video decoder.
Referring back to Fig. 3, output wavelet coefficients 316 are subject to a further precision conversion as the method 300 progresses to the step 320. The conversion at step 320 rounds the coefficient values into a working range of the compression engine and is defined mathematically as
$$C_c = \left(C_w + \left(1 \ll (B_\Delta - 1)\right)\right) \gg B_\Delta = \left\lfloor \frac{C_w + 2^{B_\Delta - 1}}{2^{B_\Delta}} \right\rfloor \qquad \text{Equation (3)}$$
In Equation (3), $C_c$ and $C_w$ are the coefficient values normalised to compressor and working precision respectively, $B_\Delta = B_w - B_c$ is the difference between the number of bits used to represent values in working and compressor precision, and $\ll$ and $\gg$ denote the bitwise left and right shift operators, mathematically equivalent to multiplication and division by a power of 2 (that is, $a \ll n = a \times 2^n$ and $a \gg n = \lfloor a / 2^n \rfloor$). A resulting stream of compressor precision wavelet coefficients 321 is generated and the method 300 continues to the step 325, where the coefficients 321 are converted to a sign-magnitude format for subsequent compression. Sign-magnitude format represents each value as a positive integer equal to the coefficient's absolute value plus a single "sign" bit. The sign bit is 1 if the coefficient is negative and 0 if the coefficient is positive. The sign-magnitude format is particularly useful in compression of wavelet coefficient data because the magnitude of the wavelet coefficients, irrespective of sign, indicates the importance of the value to the accurate representation of the original signal. In other words, any lossy compression should aim to preserve the high magnitude coefficients.
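The two conversions can be sketched in a few lines of Python. This is a minimal illustration only; the function and variable names, and the example bit depths, are assumptions and not part of the described arrangement.

```python
def to_compressor_precision(c_w: int, b_w: int, b_c: int) -> int:
    """Round a working-precision coefficient into compressor precision,
    per Equation (3)."""
    b_delta = b_w - b_c
    return (c_w + (1 << (b_delta - 1))) >> b_delta

def to_sign_magnitude(c: int) -> tuple[int, int]:
    """Split a signed coefficient into a (sign_bit, magnitude) pair."""
    return (1 if c < 0 else 0), abs(c)

# Example: a 20-bit working value reduced to 16-bit compressor precision.
sign, mag = to_sign_magnitude(to_compressor_precision(-40961, 20, 16))
```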
Because of the wavelet transform's cascaded structure, the step 315 (and therefore the step 302) can, after some pipeline delay, generate output coefficient samples 316 at the same rate as the input image pixels 311 arrive. Referring again to Fig. 4A, the output 316 of the wavelet transform block 499 therefore appears in split raster scan order. In split
raster scan order, rows from the full set of sub-bands are generated together. Within the codec, groups of rows (e.g. 431 and 432 in Fig. 4B) that are generated (and coded) together are called a precinct. Groups of precincts, as defined for example by the spatial regions 435 and 436, form a grouping referred to as a slice. A single slice may represent a whole video frame. A precinct 430 comprising groups of rows 431 and 432 is shown in Fig. 4C. The precinct 430 is shown to have the same sub-band structure as in the arrangement 400. However, each of the precinct sub-bands 441-448, corresponding to horizontal slices of the wavelet transform sub-bands 401-408 respectively, has a smaller number of rows. If there is only a single level of vertical wavelet transform, a number of rows 424c in the vertically low-pass precinct sub-band 431 is equal to a number of rows 426c in the vertically high-pass precinct sub-band 432. If there were a second level of vertical wavelet transform, a third group of high-pass sample rows would be present, containing twice as many rows - again due to the cascaded structure of the wavelet transform. The minimum row height for a precinct sub-band (e.g. 424c, 426c) is 1 row of coefficients. The minimum number of rows contained in a precinct is a function of the number of levels of vertical wavelet transform, for example 2 rows for 1 level of decomposition, 4 rows for 2 levels, 8 for 3 levels and so on. To achieve a suitable compromise between compression and latency, a single level of vertical decomposition and 2 rows of precinct data are typically used. To improve compression performance, more rows of coefficients can be included in a precinct - with or without additional levels of wavelet transform.
Referring back to Fig. 3, the method 300 continues under control of the processor 205 from step 302 to a step 330. Step 330 of the encoding and compression method 300 receives coefficient data from the pixel transform step 302 structured as the precinct 430 as depicted in Fig. 4C. From the precinct structure 430, the step 330 extracts coefficient groups. Coefficient groups are small sets of horizontally adjacent coefficient values such as depicted by 451-453 of Fig. 4C. More generally, the coefficient groups contain horizontally adjacent or vertically adjacent coefficients, or have a 2D block structure, or some combination thereof. A main feature of the coefficient groups is that each group is spatially localised and the number of coefficients is relatively small. Execution of the step 330 divides each of the precinct sub-bands into groups of 4 adjacent coefficients. The coefficient groups of the structure 430 are subsequently coded in steps 335 through 370 using a series of most-significant bit-plane indices (one per coefficient group), sign bits, a code representing the most significant bit-plane of the group and additional data bits that
encode the coefficient magnitude values (below the most significant bit-plane) within each coefficient group.
The coding of the coefficient groups is described with additional reference to Fig. 5. In particular, in Fig. 5, an arrangement 500 shows an expanded view of example coefficient groups 451-453 and corresponding characteristics (MSB index to MSBI code). Coefficient values 510 for the coefficient groups are represented in sign-magnitude format by sign bits 520 and magnitude bits 530. The magnitude bits 530 are stacked vertically so that the most significant bit (MSB) appears at the top and the least significant bit (LSB) at the bottom according to a bit-plane index 521.
The method 300 continues from step 330 to step 335. For each group of coefficients (for example 451-453), the step 335 executes to determine a most significant bit-plane index 540. The most significant bit-plane index 540 is the first index of a set of bit-planes 531 containing only zeros. More generally, the sequence of MSB-plane index values 540 defines a group-by-group partitioning of the bit-planes into insignificant bit-planes 531 and significant bit-planes 532 and 533.
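A minimal sketch of the index determination follows, assuming the convention that the LSB is bit-plane 0 and that the stored index refers to the highest plane containing a 1; the function name and the all-zero sentinel are illustrative choices only.

```python
def msb_plane_index(magnitudes: list[int]) -> int:
    """Highest bit-plane containing a 1 across a group of coefficient
    magnitudes (LSB = plane 0); -1 signals an all-zero group."""
    return max(magnitudes).bit_length() - 1

# Example: for the group [3, 12, 0, 5] the largest magnitude is 12 (0b1100),
# so the most significant non-zero bit-plane is plane 3.
assert msb_plane_index([3, 12, 0, 5]) == 3
```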
The method 300 continues under execution of the processor 205 from step 335 to step 340. Lossy compression of the coefficient values is achieved by further introducing a truncation index 550. The truncation index 550 for each precinct sub-band is determined at process step 340 using a rate allocation process described below with reference to Fig. 6.
The method 300 continues under execution of the processor 205 from step 340 to step 345. The difference between the MSB-plane index and the truncation index defines a bit-precision to be used for representing coefficient values within each coefficient group. The coefficient values are quantised at step 345 to be represented with the required bit-precision. A simple form of quantisation is truncation, whereby the bit-planes 533 below the truncation index are set to zero and the remaining significant bit-planes 532 are left unmodified. However, better compression performance can be achieved using more complex quantisation schemes. In one arrangement, coefficient values are rounded according to the function:
$$C_Q = \left\lfloor \frac{C}{\delta_Q} + \frac{1}{2} \right\rfloor \qquad \text{Equation (4)}$$

In Equation (4), $C_Q$ and $C$ are the quantised and un-quantised coefficient magnitudes respectively and $\delta_Q$ is a quantisation step size defined as:

$$\delta_Q = \frac{2^{B_M+1}}{2^{B_M-B_T+1}-1} \qquad \text{Equation (5)}$$
In Equation (5), $B_M$ and $B_T$ are the MSB-plane index 540 and the truncation bit-plane index 550, respectively, of a coefficient group. Step 345 effectively operates to form one or more words from bits of the MSB-plane data of a group of wavelet coefficients for the image or frame.
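Both quantisation options can be sketched as below. The rounding form follows Equations (4) and (5) as reconstructed above; treating an all-truncated group as zero is an assumption made for illustration.

```python
def quantise_truncate(mag: int, b_t: int) -> int:
    """Simple truncation: zero the bit-planes below the truncation index."""
    return (mag >> b_t) << b_t

def quantise_round(mag: int, b_m: int, b_t: int) -> int:
    """Rounded quantisation per Equations (4) and (5)."""
    if b_t > b_m:
        return 0                      # nothing of the group survives
    delta_q = (2 ** (b_m + 1)) / (2 ** (b_m - b_t + 1) - 1)
    return int(mag / delta_q + 0.5)
```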
The method 300 continues from step 345 to step 350. At step 350 of the compression method 300 of Fig. 3, the most significant bit-plane words (for example 551-553) are extracted (for example as 535) and coded using a variable length code (as shown by 536). A prefix code is determined and used in execution of step 350 using inputs 811 and outputs 812 defined in a table 810 of Fig. 8A. Specifically, the MSB-plane bits are treated as an input word for the inputs 811 and a corresponding one of the codes 812 is used to represent the MSB-plane data in the compressed bit-stream. The prefix codes 812 are designed to have shorter length for cases where the MSB-plane of the coefficient group contains only a single significant bit (1). The prefix codes 812 comprise a variable length code having fewer bits than the extracted most significant bit-plane words (for example as 535) for frequently occurring words. MSB-plane words containing only a single significant bit are typically significantly more frequent than other words for typical image data, including screen content. Other codes are similarly allocated on the basis of frequency. In particular, more frequently occurring MSB-plane words are given shorter codes and less frequently occurring MSB-plane words are given the longest codes. In the absence of a data driven frequency analysis, properties of typical images are used to design a code. Specifically, high magnitude coefficients are typically sparsely distributed within wavelet sub-bands for typical image content. Thus, the code of table 810 generally contains shorter codes for input MSB-plane words with longer consecutive runs of zeros.
There are two exceptions to this rule. Firstly, the word "0000" will never occur in practice because, by definition, the MSB-plane for a group contains at least one significant (non-zero) bit. The word "0000" is therefore assigned to a long code but could equally be unassigned. Secondly, the word "1111", while not typically expected in any high-pass coefficient sub-bands (such as 402-408), is expected with high frequency in the low pass sub-band 401. By making the code for the word "1111" have length 4, the coding method is suitable to be applied to all sub-bands, without knowledge of the specific sub-band being processed. The encoding of the video data can be readily implemented using the lookup table 820 of Fig. 8B. In the table 820, the rows of the table 810 have been reordered so that the MSB-plane word values appear in numerical order. Accordingly, the MSB-plane word
can be used as an index 821 to directly access memory containing the corresponding code 822 and code length 823, resulting in a particularly straightforward and low complexity coding process in both hardware and software. At step 350, the formed word(s) from step 345 are encoded using the determined variable length code (for example from the lookup table 820).
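A table-driven encoder in the style of table 820 might look as follows. The specific codes and lengths of tables 810 and 820 are not reproduced in this description, so the entries below are placeholders only, and the bit-stream writer is an assumed helper interface.

```python
# Hypothetical (word -> (code, code_length)) entries in the style of
# table 820; the real code assignments are defined by table 810.
MSB_CODE_TABLE = {
    0b0001: (0b00, 2),    # single significant bit -> short code
    0b0010: (0b01, 2),
    0b0100: (0b100, 3),
    0b1000: (0b101, 3),
    0b1111: (0b1110, 4),  # kept at length 4 so the low-pass band codes cheaply
    # ... remaining words receive longer codes
}

def encode_msb_word(word: int, bitwriter) -> None:
    """Emit the prefix code for a 4-bit MSB-plane word."""
    code, length = MSB_CODE_TABLE[word]
    bitwriter.write_bits(code, length)   # assumed bit-writer API
```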
The method 300 continues from step 350 to step 360. Subsequent bit-planes 537, also referred to as less significant bit-planes, are encoded at step 360. In one arrangement the subsequent bit-planes are encoded without modification so that the subsequent bit-planes appear as a sequence of fixed length codes 537 within the compressed bit-stream.
The overall length of the code for any coefficient group is determined by the MSB-plane index and truncation bit-plane index. Lossless compression results from not needing to transmit leading zeros whereas lossy compression results from discarding quantised least significant bits. Step 360 effectively operates to encode the coefficients for the remaining (less significant) bits of the frame using fixed length encoding.
The method 300 continues under execution of the processor 205 from step 360 to step 365. At step 365 of the compression method 300 of Fig. 3, the sign bits 520 of the coefficients are encoded. The sign bits are encoded by dropping meaningless bits. For example, if a coefficient value is zero after quantisation, the sign bit is considered meaningless and dropped from the coded sign data 521 that is written to the compressed bit-stream 115.
The method 300 continues under execution of the processor 205 from step 365 to step 370. The sequence of MSB-plane indices 540 is encoded at step 370. The MSB-plane indices 540 are typically encoded using a combination of differential and run coding. In a specific arrangement, a prediction residual (for example 541) is generated for each MSB-plane index 540. Prediction can be performed horizontally (along rows of the precinct as shown). However, if memory is available, vertical prediction (down columns of the precinct) is typically more efficient. Prediction residuals can be set to zero wherever the truncation bit-plane index equals or exceeds the MSB-plane index 540. The resulting stream of residual values (e.g. 541) can be coded using a variable length code that yields shorter code words for smaller residual values (e.g. 542). In one arrangement a unary code is employed. Runs of zeros can be further encoded using a run-length code.
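One possible realisation of the differential and unary coding is sketched below. The zig-zag mapping of signed residuals to non-negative symbols and the zero-terminated unary form are assumptions for illustration; the described arrangement only requires that smaller residuals receive shorter codes.

```python
def encode_msb_indices(indices: list[int], bitwriter) -> None:
    """Horizontal differential prediction of MSB-plane indices followed
    by a unary code (run-length coding of zeros omitted for brevity)."""
    prev = 0                                  # first sample is unpredicted
    for idx in indices:
        residual = idx - prev
        prev = idx
        symbol = (residual << 1) ^ (residual >> 31)  # zig-zag: signed -> unsigned
        for _ in range(symbol):
            bitwriter.write_bits(0, 1)        # 'symbol' zeros...
        bitwriter.write_bits(1, 1)            # ...terminated by a one
```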
The method 300 continues under execution of the processor 205 from step 370 to step 390. Execution of step 390 of the compression method 300 packs the encoded information including the coded MSB-plane indices (e.g. 542), the coded MSB-plane
words (e.g. 536), the sign bit information (e.g. 521), and other quantised coefficient bit-planes (e.g. 537), together with other required header information such as sub-band truncation indices, into a bit-stream according to a predetermined set of syntax rules. The truncation indices can be encoded using as few as two parameters - scenario and refinement. The scenario and refinement parameters form part of a rate allocation process 340 as described below. Parameters such as the scenario and refinement, as well as any other parameters required to decode the precinct, are written together within a structured block referred to as a precinct header. Parameters that are relevant to the whole frame are encoded in a frame header within the bit-stream syntax.
For the compression method 300 to generate frame data at a specific compressed bit rate, the truncation indices determined at step 340 must be appropriately selected. A method for determining the truncation levels is now described with reference to Fig. 6. Fig. 6 shows a method 600 of rate allocation. The method 600 is typically implemented as one or more modules of the application 233, controlled by execution of the processor 205 and stored in the memory 206.
The method 600 of rate allocation takes place in two distinct stages. In a first stage 601, budget calculations are performed to determine the coding cost for each precinct sub-band at each truncation index. The coefficient data is encoded in distinct parts, and the budget calculation step 601 involves distinct steps for calculating the cost of coding the MSB-plane index data (step 610), the coefficient magnitude data (step 620), the MSB-plane codes (step 630) and the cost of the sign information (step 640). Coding costs are tabulated for each precinct sub-band at each truncation index. Where alternative methods exist for a coding step, such as vertical or horizontal prediction of the MSB-plane indices, costs are determined and tabulated for each case.
The method 600 continues under execution of the processor 205 from step 601 to a rate allocation step 602. A resulting sub-band coding cost table 603, generated at step 601, is passed to the rate allocation step 602. The rate allocation step 602 starts by determining an available precinct bit-budget at step 650. The available precinct bit-budget is determined based on the frame budget and may incorporate any unused bit-budget from the coding of a previous precinct. If slice partitions (e.g. 435, 436) are employed then budget sharing is not extended to precincts from different slices even if they are adjacent. Having determined an available precinct bit-budget, the method 600 continues to step 660. The budget tables are traversed at step 660 to determine a set of truncation points that deliver the highest fidelity without exceeding that bit-budget. To simplify determining truncation planes at step 660, a
gain table 604 and a priority table 605 are used. The gain table 604 captures the relative gains of the wavelet synthesis filters corresponding to each wavelet sub-band in the decomposition structure 400, as discussed below in the context of the decoder and with reference to Fig. 10. In particular, the gain table 604 contains, for each sub-band, and up to an offset, the log base 2 of the synthesis gain quantised to a nearest integer. The error in the relative gains as captured in the gain table is then ranked in order of magnitude to produce the priority table 605. Together, the gain and priority tables lead to an algorithm for selecting truncation levels for the sub-bands at step 660.
The algorithm for selecting truncation levels comprises two sequential searches. In a first search, which iterates over a first rate control variable referred to as scenario, the precinct coding cost is determined for decreasing rate increments until a calculated coding cost is less than the available budget. In the second search, which iterates over a second rate control variable referred to as refinement, the precinct coding cost is determined for increasing rate increments until a calculated coding cost is identified which is closest to the available budget without exceeding it. The relationship between the scenario $\tau$, refinement $\varepsilon$ and bit-plane truncation index $B_T[i]$ for sub-band $i$ is defined as follows:
$$B_T[i] = \begin{cases} \tau - \Gamma[i] + 1 & \text{if } \kappa[i] < \varepsilon \\ \tau - \Gamma[i] & \text{otherwise} \end{cases} \qquad \text{Equation (6)}$$
In Equation (6), $\Gamma$ is the vector of per sub-band gains as recorded in the gain table 604 and $\kappa$ is the vector of per sub-band refinement priorities as recorded in the priority table 605. The coding cost $C$ for the precinct is subsequently determined according to
$$C = \sum_i \Lambda\left[i, B_T[i]\right] \qquad \text{Equation (7)}$$
In Equation (7), $\Lambda$ is the sub-band coding cost table 603. For strict rate analysis, the cost calculation may need to be subject to rounding. For example, if the precinct bit-stream is intended to be aligned to byte boundaries then the cost $C$ would be rounded up to the next multiple of 8. Having determined an actual coding cost for the current precinct which is less than the precinct bit-budget, the method 600 continues to step 670. Unused bit-budget can be allocated at step 670. Specifically, unused bit-budget can be carried forward for use in coding a subsequent precinct where the subsequent precinct falls within the same slice as the current precinct. Otherwise, the unused budget may be assigned to padding.
Padding adds additional, non-functional bits to the bit-stream for the purpose of aligning to a predetermined bit location.
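The cost of one candidate (scenario, refinement) pair can be evaluated directly from the tables, as in the sketch below. The derivation of the truncation index follows Equation (6) as reconstructed above, and the byte-boundary rounding is the example given in the text; the data-structure layout is an assumption.

```python
def precinct_cost(scenario: int, refinement: int,
                  gains: list[int], priorities: list[int],
                  cost_table: list[dict[int, int]]) -> int:
    """Precinct coding cost per Equation (7) for one candidate
    scenario/refinement, with truncation indices from Equation (6)."""
    total = 0
    for i in range(len(gains)):
        b_t = scenario - gains[i]
        if priorities[i] < refinement:
            b_t += 1
        total += cost_table[i][b_t]      # Lambda[i, B_T[i]]
    return -(-total // 8) * 8            # round up to a byte boundary
```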
While the step 602 determines truncation bit-planes based on an exact budget calculation at step 601, the ability of step 670 to forward unused bit-budget to a next precinct means that using an approximate budget calculation in step 601 is possible.
Fig. 7 illustrates how carry-forward of unused precinct bit-budget may be combined with cost estimation. A graph 700 shows the available precinct bit-budget for two spatially adjacent precincts in a frame - precinct n and precinct n+1. Precinct n is determined to have an available budget 710 at step 650. The step 660 performs a search over scenario and refinement to determine a set of truncation bit-plane indices that result in consumption of rate as shown as 720 in Fig. 7. Any further lowering of the truncation level and consequent coding of any additional bit-planes would result in the estimated cost exceeding the precinct budget 710. When the video encoder 114 encodes the precinct according to the selected scenario and refinement, a smaller number of bits 721 is written to the encoded bit-stream 115. Then, a bit-budget 712 for precinct n+1 is determined by adding the unused bit-budget from the precinct n to the precinct bit-budget for the precinct n+1. When coding the precinct n+1, the application 233 is able to select a lower truncation level than would otherwise be the case. In some arrangements, the first precinct of a slice may be expected to be coded at slightly reduced quality compared to subsequent precincts in the frame, as the first precinct does not benefit from receipt of forwarded rate from any earlier precincts. One solution to the resultant reduced quality is to adjust the per-precinct budget such that the first precinct in each slice is allocated a higher budget than subsequent precincts in the frame.
Fig. 9 is a schematic flow diagram showing a method 900 of decoding a video frame. The video frame has been encoded using variable length code for words formed from the most significant bit-planes of coefficient groups. The method 900 is typically implemented as one or more modules of the application 233, controlled by execution of the processor 205, and stored in the memory 206. The method 900 can be implemented by the video decoder 134.
The method 900 is applied to the bit-stream for each frame of a video sequence independently in order to provide the flexibility for editing. The method 900 begins at a step 910. The step 910 executes by initialising decoding structures including input precision, position indexes and buffers that may reside in the working memory 206 of the computer module 201. Initialising decoding structures may involve reading and decoding header information about the image size and colour component structure, as well as compression options and data that are not assumed to be known by the decoder 134 from
the bit-stream. Initialising decoding structures may also cover the wavelet transform structure, precinct and coding group structure and so on. Subsequently, the method 900 continues under execution of the processor 205 from step 910 to a step 915.
Decompression is applied to recover precincts which are converted back to pixels with sub-frame latency. The decompression begins at step 915 by deconstructing the bit-stream to separate information relating to the different coded components - the MSB-plane indices, truncation indices, the magnitude data including the MSB-plane codes and other bit-plane codes, and the sign bits.
Referring to step 915, the method 900 continues to step 920 to decode or determine
MSB-plane indices for the coefficients. Decoding the MSB-plane indices at step 920 involves reading and decoding the series of unary codes, including undoing any run-length coding to recover the prediction residuals. The prediction residuals are then added to the predicted values to generate a row of MSB-plane index values. The predicted values may be a previous row of decoded MSB-plane indices (if vertical prediction is being used) or an immediately (horizontally) previous value (if horizontal prediction is being used) or zero in the case of any unpredicted values (such as the first sample in a precinct row).
The method 900 continues under execution of the processor 205 from step 920 to step 925. At step 925 the truncation indices are decoded or determined. Decoding the truncation indices involves re-generating the values from scenario and refinement according to Equation (6). If the gain and priority tables are not known to the decoder 134 then the gain and priority tables would typically be communicated via a frame header. Accordingly, the two values - scenario and refinement - are all that is required to reconstruct the precinct truncation indices. The scenario and refinement values are communicated via the precinct header.
The method 900 continues under execution of the processor 205 from step 925 to step 930. At step 930, MSB-plane words for non-zero coefficient groups are read and decoded. The variable length codes representing the MSB-plane words can be read and decoded using an algorithm 830 shown in Fig. 8C in combination with a 2D look-up table 840 (shown in Fig. 8D). Up to 3 leading bits are read from each MSB-plane code. In particular, leading bits 841 are read until a zero (0) is encountered or three consecutive ones (1) are encountered. The number of ones read is then interpreted as a row index for the 2D table 840. Having determined a row index, a column index 842 - a two bit number - is read directly from the bit-stream and the two indices are used to access a decoded MSB-plane word 843. Step 930 effectively determines values for bits of MSB-plane words using the variable length codes.
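The algorithm of Fig. 8C translates directly into code, as in the sketch below. The contents of the 2D table 840 are not reproduced in this description, and the bit-reader is an assumed helper interface.

```python
def decode_msb_word(bitreader, table_840: list[list[int]]) -> int:
    """Decode one MSB-plane word: count leading ones (terminated by a
    zero, or capped at three), then read a 2-bit column index."""
    row = 0
    while row < 3 and bitreader.read_bit() == 1:
        row += 1                      # ones read so far = row index
    col = (bitreader.read_bit() << 1) | bitreader.read_bit()
    return table_840[row][col]
```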
The method 900 continues under execution of the processor 205 from step 930 to step 935 to form the remaining or less significant bits of the coefficients. At step 935, other less significant bit-plane words are read (received) as a series of fixed length codes. The number of codes is determined as the difference between the MSB-plane index and the truncation bit-plane index for each coefficient group (e.g. 451-453). The decoded bit-plane words for the MSB-plane (e.g. 535) and other bit-planes (e.g. 537) are written to a memory (such as the memory 206) using the decoded MSB-plane indices (e.g. 540) to determine the bit-plane group (e.g. 532) into which the decoded bit-plane words should be written.
The method 900 continues under execution of the processor 205 from step 935 to step 940. Having reconstructed the coefficient magnitudes, decoding of the associated sign bits 520 is performed at step 940. In one arrangement, sign bits are read as a fixed length word for each coefficient group. In another, more compact but more complex to decode arrangement, coefficient magnitudes are compared to zero and a sign bit read for each non-zero magnitude encountered.
The method 900 continues under execution of the processor 205 from step 940 to step 945. The quantised coefficient magnitudes that result from the decoding process steps 910-940 are de-quantised at step 945. A simple form of de-quantisation is truncation, whereby the bit-planes 533 below the truncation index are set to zero and the remaining significant bit-planes 532 are left unmodified. However, better compression performance can be achieved using more complex quantisation schemes. In one decoder arrangement corresponding to the encoder arrangement described previously with reference to Equation (4) and Equation (5), coefficient values are calculated according to the function:
$$C = C_Q \times \delta_Q \qquad \text{Equation (8)}$$

In Equation (8), $C_Q$ and $C$ are the quantised and un-quantised coefficient magnitudes respectively and $\delta_Q$ is the quantisation step size defined as

$$\delta_Q = \frac{2^{B_M+1}}{2^{B_M-B_T+1}-1} \qquad \text{Equation (9)}$$
In Equation (9), $B_M$ and $B_T$ are the MSB-plane index 540 and the truncation bit-plane index 550, respectively, of a coefficient group. Step 945 effectively determines or reconstructs coefficient values 955 for a group using the MSB-plane values from step 930 and the fixed length codes of steps 935 and 940.
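A sketch of the de-quantisation, mirroring the encoder-side quantisation sketch above; rounding the product back to an integer magnitude is an assumption made for illustration.

```python
def dequantise(c_q: int, b_m: int, b_t: int) -> int:
    """Reconstruct a coefficient magnitude per Equations (8) and (9)."""
    if c_q == 0:
        return 0
    delta_q = (2 ** (b_m + 1)) / (2 ** (b_m - b_t + 1) - 1)
    return int(c_q * delta_q + 0.5)
```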
The reconstructed coefficient values 955 for the precinct are subsequently provided to the inverse pixel transform step 960 to produce a number of output lines of pixel data. The pixel transform step 960 is implemented as a pipeline in order to minimise buffering and latency within the system implementing the method. The pipeline stages are represented by steps 961, 963, 965, 967 and 969. The step 960 starts at step 961, which accepts coefficient values in sign-magnitude format and converts the coefficient values to conventional signed format values 962. In particular, the magnitude is multiplied by -1 if the sign bit is non-zero. The method 900 continues to the step 963. The signed coefficient values 962 are converted to a working precision by execution of step 963. The conversion step 963 transforms the coefficient values from the working range of the compression process to the working range of the wavelet transform process and is defined mathematically as
$$C_w = 2^{B_\Delta} C_c \qquad \text{Equation (10)}$$
In Equation (10), $C_c$ and $C_w$ are the coefficient values normalised to compressor and working precision respectively, and $B_\Delta = B_w - B_c$ is the difference between the number of bits used to represent values in working (wavelet) and compressor process precision. The method 900 continues from step 963 to step 965. The resulting stream of working process precision coefficients 964 from step 963 is subjected to an inverse wavelet transformation at step 965. The specific inverse wavelet transform used in the arrangements described is the 5/3 Le Gall wavelet. In other arrangements, other wavelet transforms could be specified via a frame header. The inverse wavelet transform 1099, as shown in Fig. 10B, is a mirror of the forward wavelet transform process 315 used during the encoding process 300 (and as shown as the arrangement 499 of Fig. 4A). The inverse wavelet transform arrangement
1099 is comprised of a series of stages 1029, 1028, 1027, 1026, 1025, 1024 and 1021. Each stage applies one level of wavelet synthesis, either vertically along columns of pixel values (as in 1021) or horizontally along rows of pixel values (as in 1024-1029). The stages 1029, 1028, 1027, 1026, 1025, 1024 and 1021 are implemented using a lifting scheme (known in the art) but could equally be implemented using convolutional filters with up-sampling
(also known in the art). Each stage accepts high-pass coefficients (e.g. 1022) and low pass coefficients (e.g. 1023) as input and a total number of output pixels 966 is equal to the number of input coefficient values 964. Multiple stages of wavelet synthesis are applied to consume the full set of coefficients 1001-1008 that are referred to in aggregate as the wavelet transform of the input signal. Each stage of the inverse transform arrangement 1099 has an associated gain, being the ratio of the energy in the output samples (which are low-pass coefficients at the next transform level) to the input coefficients. When multiple inverse transform stages are applied to a single input coefficient (e.g. 1001) then the gain for that coefficient is the product of the gains of all of the inverse transform stages the coefficient passes through. The gain value is typically different for each sub-band and is referred to as the synthesis gain of the sub-band (e.g. 401) to which the coefficient belongs. For the 5/3 wavelet transform employed in one arrangement, the synthesis gain associated with one level of inverse high-pass transformation is 0.71875 while the synthesis gain associated with one level of inverse low-pass transformation is 1.5. The synthesis gain does not need to be explicitly applied to the coefficients because the inverse gain is built into the forward transform. However, the synthesis gain must be known in order to generate the gain 604 and priority 605 tables used in the rate allocation step 602 and specifically the step 660 for determining the truncation bit-planes and specifying them in terms of a scenario and refinement.
The method 900 continues under execution of the processor 205 from step 965 to the step 967. The working precision pixel values 966 output from the inverse wavelet transform step 965 are subjected to an inverse colour transform at step 967 to convert the YCC samples to RGB samples 968. The colour transform inverts the forward colour transform (e.g. using Equation (2)) applied by the encoder during step 310 and may be bypassed if the forward transform was bypassed during encoding. Inverting the transform specified in Equation (2) is defined mathematically by:
$$\begin{bmatrix} R_w \\ G_w \\ B_w \end{bmatrix} = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 1 & 0 \\ 1 & -1 & -1 \end{bmatrix} \begin{bmatrix} Y \\ C_g \\ C_o \end{bmatrix} \qquad \text{Equation (11)}$$
The method 900 continues under execution of the processor 205 from step 967 to the step 969. The working precision RGB pixel values 968 are converted to the original pixel precision at step 969 to produce the output pixel data 135 for the reconstructed frame. The precision conversion step 969 converts pixel values according to:
$$p_{out} = \frac{p_w + 2^{B_w - 1} + 2^{B_\Delta - 1}}{2^{B_\Delta}} \qquad \text{Equation (12)}$$
In Equation (12), $p_{out}$ and $p_w$ are the channel values of the output pixel and working pixel respectively and $B_\Delta = B_w - B_{out}$ is the difference between the number of bits used to represent values in working and output precision.
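The final two pipeline stages can be sketched together as below. The row order of the Equation (11) matrix and the rounding offset of Equation (12) follow the reconstructions above; the function name and signature are illustrative only.

```python
def inverse_colour_and_precision(y: int, cg: int, co: int,
                                 b_w: int, b_out: int) -> tuple[int, int, int]:
    """Invert the colour transform (Equation (11)) and convert each
    channel to output precision (Equation (12)): steps 967 and 969."""
    r_w = y - cg + co          # first row of the Equation (11) matrix
    g_w = y + cg
    bl_w = y - cg - co
    b_delta = b_w - b_out
    offset = (1 << (b_w - 1)) + (1 << (b_delta - 1))
    def to_output(p_w: int) -> int:
        return (p_w + offset) >> b_delta
    return to_output(r_w), to_output(g_w), to_output(bl_w)
```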
Fig. 11 is a schematic block diagram showing an architecture 1100 of functional modules of the video encoder 114 used to implement the method 300. The video encoder 114 may be implemented using a general-purpose computer system 200, as shown in Figs.
2A and 2B, where the various functional modules may be implemented by dedicated hardware within the computer system 200, by software executable within the computer system 200 such as one or more software code modules of the software application program 233 resident on the hard disk drive 205 and being controlled in its execution by the processor 205, or alternatively by a combination of dedicated hardware and software executable within the computer system 200. The video encoder 114 and the described methods may alternatively be implemented in dedicated hardware, such as one or more integrated circuits performing the functions or sub functions of the described methods. Such dedicated hardware may include graphic processors, digital signal processors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or one or more microprocessors and associated memories. In particular, the video encoder architecture 1100 comprises modules 1110-1190. Each of the modules 1110-1190 may be implemented as one or more software code modules of the software application program 233, or an FPGA 'bitstream file' that configures internal logic blocks in the FPGA to realise the video encoder 114. The video encoder 114 provides reduced complexity in the rate allocation functionality by approximating costs for evaluation of candidate truncation levels, such that a worst case estimate is used for each candidate during evaluation. Then, for coding, the actual coded cost is derived once, only at the selected truncation level that is applied for coding. The coder further provides a low complexity variable length coding of the MSB-plane of the coefficient groups, leading to improved compression performance relative to the fixed length coding used for other bit-planes.
Received raw video data 113 is input to a pixel transform module 1110. The pixel transform module 1110 performs the step 302, generating precincts of wavelet coefficients
1112 in sign-magnitude format. The wavelet coefficients 1112 are stored in an output buffer memory of the pixel transform module 1110 so the wavelet coefficients 1112 can be read and processed at different rates by an MSB index calculator 1120, a quantiser 1160 and a sign coder 1170. The MSB index calculator module 1120 implements the step 335, generating a stream of indices 1122 input to a budget calculator module 1140 and an MSB index encoder module 1130. The budget calculator module 1140 in turn implements the step 601 and the associated sub-steps 610 to 640. In some arrangements, the budget calculator module 1140 also reads coefficient values 1112 from the pixel transform module 1110.
The budget calculator module 1140 is required to read the coefficient values 1112 if the budget calculator module 1140 must do exact budget calculations, because the coefficient values are required to determine the cost of coding the sign information and the MSB-plane codes. If rate forwarding is being used, it is only necessary for the budget calculator module 1140 to determine a worst case budget estimate. The worst case budget estimate is an estimate of the encoded size of a group of wavelet coefficients. The worst case budget can be determined using a maximum possible length of the variable length code for an MSB-plane word. Determining a worst case budget estimate can typically be done without reference to the coefficient values 1112. Budget tables 1142 constructed by the budget calculator 1140 are input to the rate allocator module 1150 for implementing the step 602 and the associated sub-steps 650 to 670 to determine truncation bit-plane indices 1152 for each precinct sub-band. The rate allocator module 1150 may also generate prediction mode information.
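A worst case estimate for one coefficient group can be formed without touching the coefficient data, as sketched below; the maximum code length constant and the per-coefficient sign bit accounting are assumptions made for illustration.

```python
MAX_MSB_CODE_LEN = 7   # assumed longest prefix code in table 810

def worst_case_group_bits(b_m: int, b_t: int, group_size: int = 4) -> int:
    """Upper bound on the coded size of one group: the longest possible
    MSB-plane code, one sign bit per coefficient, and the fixed-length
    lower bit-planes between the MSB-plane and the truncation plane."""
    if b_t > b_m:
        return 0
    lower_planes = b_m - b_t              # planes below the MSB-plane
    return MAX_MSB_CODE_LEN + group_size + lower_planes * group_size
```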
The MSB index coder 1130 uses the truncation index 1152 and prediction method information to truncate and encode the MSB index stream 1122 according to the method described for step 370, generating coded MSB index data output to a packer 1190. The MSB index data 1122 and the truncation bit-plane indices 1152 are also used by the quantiser module 1160 to generate quantised coefficient magnitudes 1166 according to the step 345. The quantised coefficient magnitudes 1166 are input to a sign coder 1170 and used to process the sign bits of the precincts of coefficients 1112 to create a stream of coded sign information according to step 365. The output of the quantiser 1160 is also divided into a stream of MSB-plane words 1162 and other bit-plane words 1164 as exemplified respectively by 535 and 537 of Fig. 5. The MSB-plane words 1162 are passed to a MSB-plane coder 1180 that implements the step 350. The stream of coded MSB-plane words 1184 (as exemplified by 536 of Fig. 5) are passed, together with the stream of other
bit-plane words 1164 and the coded sign information 1172, to the packer 1190. The packer 1190 merges the streams of words 1164 and 1184 and the sign information 1172 according to the method of step 390 to form a compressed bitstream 115. The packer 1190 is able to unambiguously determine a budget value 1192 consumed by the coded precinct.
When rate forwarding is implemented, the value 1192 is returned to the rate allocator module 1150 for use in calculating the truncation bit-plane indices for the next precinct.
The architecture 1100 of the video encoder 114 shows significant data processing modules and associated paths without the complication of control modules and paths. In a practical implementation, one or more control modules would be included that would control overall operation of the encoder 114 as well as buffer memory and so on. In some arrangements, for example, information controlling the operation of the encoder and providing data about the input video bit-stream 113 is written by a supervisory process into control registers that are read as required by other encoder modules (not shown). Alternatively, the registers may form part of the relevant encoder modules.
Accordingly, important control information such as frame dimensions, quantisation method, gain and priority tables, colour transform and so on would be available to the modules of the architecture 1100. The data flows involved in the relevant control connections have a relatively small impact on the modular design of the encoder 114 and therefore do not significantly contribute to the design.
Fig. 12 is a schematic block diagram showing an architecture 1200 of functional modules of the video decoder 134 used to implement the method 900. The video decoder 134 may be implemented using a general-purpose computer system 200, as shown in Figs. 2A and 2B, where the various functional modules may be implemented by dedicated hardware within the computer system 200, by software executable within the computer system 200 such as one or more software code modules of the software application program 233 resident on the hard disk drive 205 and being controlled in its execution by the processor 205, or alternatively by a combination of dedicated hardware and software executable within the computer system 200. The video decoder 134 and the described methods may alternatively be implemented in dedicated hardware, such as one or more integrated circuits performing the functions or sub functions of the described methods.
Such dedicated hardware may include graphic processors, digital signal processors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or one or more microprocessors and associated memories. In particular, the video decoder 134 comprises modules 1210-1270. Each of the modules 1210-1270 may be implemented
as one or more software code modules of the software application program 233, or an FPGA 'bitstream file' that configures internal logic blocks in the FPGA to realise the video decoder 134. The video decoder 134 provides improved compression performance through the use of variable length coding of the MSB-plane of the quantised coefficient magnitudes within coefficient groups.
The architecture 1200 receives the compressed video data 133. The compressed video data 133 is first input to an unpacker module 1210. The unpacker module 1210 divides out the components of each encoded precinct according to step 915. Coded MSB-plane index data 1242 generated at step 915 is passed from the unpacker module 1210 to an MSB index decoder module 1240 along with any control information, including the prediction mode, required to decode the MSB-plane indices according to the step 920. Scenario and refinement values 1232, extracted from the precinct header data, are passed to a truncation decoder module 1230 that calculates the truncation bit-plane indices 1256 for the precinct sub-bands using the method of the step 925. The MSB-plane codes 1222 are processed by the MSB-plane decoder module 1220 to produce MSB-plane words 1254 for coefficient groups according to the step 930. The MSB-plane words 1254 are combined with other bit-plane words 1255, decoded according to the step 935, in a dequantiser module 1250. The dequantiser module 1250 reconstructs coefficient magnitudes 1262. The coefficient magnitudes 1262 are used by a sign decoder module to process the encoded sign information 1264 and generate sign bits 1258 for the coefficient groups in accordance with the step 940. The dequantiser module 1250 is responsible for storing the bit-plane words into coefficient groups in sign-magnitude format, as exemplified in the expanded view of a memory array 500 of Fig. 5, and applying the dequantisation method of step 945 to the coefficient magnitudes to produce reconstructed coefficient values in sign-magnitude format 955. In turn, the reconstructed coefficient values 955 form an input to a pixel transform module 1270. The pixel transform module 1270 is responsible for implementing the inverse pixel transform of step 960. In particular, the pixel transform module 1270 implements the processing pipeline comprised of sub-steps 961, 963, 965, 967 and 969 that recover the decompressed video pixels 135 from the coefficient values.
The architecture 1200 of the video decoder 134 shows the significant data processing modules and associated paths without the complication of control modules and paths. In a practical implementation, one or more control modules would be included that would control the overall operation of the decoder as well as buffer memory and so on. In one arrangement, for example, the header information from the compressed video bit-stream 133 input to the unpacker module 1210 would be stored to control registers that would be read as required by other decoder modules. Alternatively, the control registers may form part of the relevant decoder modules. Accordingly, important control information such as frame dimensions, quantisation method, gain and priority tables, colour transform and so on would be made available to the modules of the architecture 1200. The data flows involved in such control connections have a relatively small impact on the modular design of the decoder and therefore do not significantly contribute to the design.
INDUSTRIAL APPLICABILITY
The arrangements described are applicable to the computer and data processing industries and particularly to digital signal processing for the encoding and decoding of signals such as video signals in a low-latency (sub-frame) video coding system.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have correspondingly varied meanings.

Claims (13)

1. A method for encoding an image, the method comprising:
forming a word from bits of a most significant bit-plane of a group of wavelet coefficients of the image;
determining a variable length code for the formed word, the variable length code having fewer bits than the formed word for frequently occurring words; and encoding the group of wavelet coefficients of the image using the determined variable length code for the formed word and fixed length encoding for remaining bits of the coefficients within the group.
2. The method according to claim 1 wherein an estimate of an encoded size of the group of wavelet coefficients is determined using a maximum possible length of the variable length code for the formed word.
3. The method according to claim 1, wherein the variable length code for the formed word is determined using a lookup table.
4. The method according to claim 1, wherein the variable length code for the formed word is determined using a lookup table, the lookup table containing shorter codes for formed words with longer consecutive runs of zeros.
5. A method of forming an image by decoding a group of wavelet coefficients of the image, the method comprising:
determining a most significant bit-plane index and a truncation bit-plane index for the group;
determining, from a variable length code, values for bits of the most significant bit-plane for a plurality of coefficients within the group;
receiving a plurality of fixed length codes to form remaining bits of the plurality of coefficients within the group, the remaining bits being for less significant bit-planes of the group determined according to the most significant bit-plane index and the truncation bit-plane index; and
determining the plurality of coefficients from the determined values for the bits of the most significant bit-plane and the received plurality of fixed length codes to form the image.
6. The method according to claim 5, wherein the values for bits of the most significant bit-plane are determined from the variable length code using a lookup table.
7. The method according to claim 5, wherein the values for bits of the most significant bit-plane are determined from the variable length code using a lookup table, the lookup table using fewer bits for variable length codes corresponding to frequently occurring words.
8. A non-transitory computer readable medium having a program stored thereon for encoding an image, the program comprising:
code for forming a word from bits of a most significant bit-plane of a group of wavelet coefficients of the image;
code for determining a variable length code for the formed word, the variable length code having fewer bits than the formed word for frequently occurring words; and code for encoding the group of wavelet coefficients of the image using the determined variable length code for the formed word and fixed length encoding for remaining bits of the coefficients within the group.
9. A non-transitory computer readable medium having a program stored thereon for forming an image by decoding a group of wavelet coefficients of the image, the program comprising:
code for determining a most significant bit-plane index and a truncation bit-plane index for the group;
code for determining, from a variable length code, values for bits of the most significant bit-plane for a plurality of coefficients within the group;
code for receiving a plurality of fixed length codes to form remaining bits of the plurality of coefficients within the group, the remaining bits being for less significant bit-planes of the group determined according to the most significant bit-plane index and the truncation bit-plane index; and
code for determining the plurality of coefficients from the determined values for the bits of the most significant bit-plane and the received plurality of fixed length codes to form the image.
10. A system for encoding an image, comprising:
a memory for storing data and a computer readable medium;
a processor coupled to the memory for executing a computer program, the program having instructions for:
forming a word from bits of a most significant bit-plane of a group of wavelet coefficients of the image;
determining a variable length code for the formed word, the variable length code having fewer bits than the formed word for frequently occurring words; and encoding the group of wavelet coefficients of the image using the determined variable length code for the formed word and fixed length encoding for remaining bits of the coefficients within the group.
11. A system for decoding a group of wavelet coefficients of an image, comprising: a memory for storing data and a computer readable medium;
a processor coupled to the memory for executing a computer program, the program having instructions for:
determining a most significant bit-plane index and a truncation bit-plane index for the group;
determining, from a variable length code, values for bits of the most significant bit-plane for a plurality of coefficients within the group;
receiving a plurality of fixed length codes to form remaining bits of the plurality of coefficients within the group, the remaining bits being for less significant bit-planes of the group determined according to the most significant bit-plane index and the truncation bit-plane index; and determining the plurality of coefficients from the determined values for the bits of the most significant bit-plane and the received plurality of fixed length codes to form the image.
12. A video encoder configured to:
form a word from bits of a most significant bit-plane of a group of wavelet coefficients of an image;
determine a variable length code for the formed word, the variable length code having fewer bits than the formed word for frequently occurring words; and encode the group of wavelet coefficients of the image using the determined variable length code for the formed word and fixed length encoding for remaining bits of the coefficients within the group.
13. A video decoder configured to:
receive a group of wavelet coefficients of an image;
determine a most significant bit-plane index and a truncation bit-plane index for the group;
determine, from a variable length code, values for bits of the most significant bit-plane for a plurality of coefficients within the group;
receive a plurality of fixed length codes to form remaining bits of the plurality of coefficients within the group, the remaining bits being for less significant bit-planes of the group determined according to the most significant bit-plane index and the truncation bit-plane index; and determine the plurality of coefficients from the determined values for the bits of the most significant bit-plane and the received plurality of fixed length codes to form the image.
CANON KABUSHIKI KAISHA