AU2017201933A1 - Method, apparatus and system for encoding and decoding video data


Info

Publication number
AU2017201933A1
Authority
AU
Australia
Prior art keywords
video data
precinct
coefficients
coefficient
bit plane
Prior art date
Legal status
Abandoned
Application number
AU2017201933A
Inventor
Andrew James Dorrell
Current Assignee
Canon Inc
Original Assignee
Canon Inc
Priority date
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU2017201933A
Publication of AU2017201933A1


Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method (800) for decoding a precinct (410) of compressed video data from an encoded video bit-stream (115), the precinct of compressed video data including one or more wavelet coefficient sub-bands, the method comprising the steps of: determining, for each value (5127) of a significance map (5114) of the precinct, a corresponding coefficient group position and coefficient group orientation (415, 416), the coefficient group orientation being chosen to reduce a variation in a most significant bit plane index set (5128) of the coefficient group (5108); decoding (820) the coefficients of the coefficient group using a most significant bit plane index (5129), the most significant bit plane index depending upon the corresponding coefficient group position and orientation; and applying (815) an inverse wavelet transform to the decoded group of sub-band coefficients to generate video data associated with the precinct.

Description

METHOD, APPARATUS AND SYSTEM FOR ENCODING AND DECODING VIDEO DATA
TECHNICAL FIELD
The present invention relates generally to digital video signal processing and, in particular, to a method, apparatus and system for encoding and decoding video data. The present invention also relates to a computer program product including a computer readable medium having recorded thereon a computer program for encoding and decoding video data.
BACKGROUND
Many applications for video coding currently exist, including applications for transmission and storage of video data. Many video coding standards have also been developed and others are currently in development. Much emphasis in video compression research is directed towards ‘distribution codecs’, i.e. codecs intended for distributing compressed video data to geographically dispersed audiences.
However, an emerging area of research is directed towards ‘mezzanine codecs’. Mezzanine codecs are used for local distribution, i.e. within a broadcast studio, and are characterised by requirements for ultra-low latency, typically well under one frame, and greatly reduced complexity, both for the encoder and the decoder. Recent developments in such coding within the International Organisation for Standardisation / International Electrotechnical Commission Joint Technical Committee 1 / Subcommittee 29 / Working Group 1 (ISO/IEC JTC1/SC29/WG1), also known as the Joint Photographic Experts Group (JPEG), have resulted in a standardisation work item named ‘JPEG XS’. The goal of JPEG XS is to produce a codec having an end-to-end latency not exceeding 32 lines of video data, and capability for implementation within relatively modest implementation technologies, e.g. mid-range FPGAs from vendors such as Xilinx ®. The latency requirements of JPEG XS mandate use of strict rate control techniques to ensure that the rate of coded data does not vary excessively relative to the capacity of the channel carrying the compressed video data.
In a broadcast studio, video may be captured by a camera before undergoing several transformations, including real-time editing, graphic and overlay insertion and mixing. Once the video has been adequately processed, a distribution encoder is used to encode the video data for final distribution to end consumers. Within the studio, the video data is generally transported in an uncompressed format. Transporting uncompressed video data necessitates the use of very high speed links. Variants of the Serial Digital Interface (SDI) protocol can transport different video formats. For example, 3G-SDI (operating with a 3Gbps electrical link) can transport 1080p HDTV (1920x1080 resolution) at 30fps and 8 bits per sample. Interfaces having a fixed bit rate are suited to transporting data having a constant bit rate (CBR).
Uncompressed video data is generally CBR, and compressed video data, in the context of ultra-low latency coding, is generally expected to also be CBR. As bit rates increase, achievable cabling lengths reduce, which becomes problematic for cable routing through a studio. For example, UHDTV (3840x2160) requires a 4X increase in bandwidth compared to 1080p HDTV, implying a 12Gbps interface. Increasing the data rate of a single electrical channel reduces the achievable length of the cabling. At 3Gbps, cable runs generally cannot exceed 150m, the minimum usable length for studio applications.
One method of achieving higher rate links is by replicating cabling, e.g. by using four 3G-SDI links, with frame tiling or some other multiplexing scheme. However, the cabling replicating method increases cable routing complexity, requires more physical space, and may reduce reliability compared to use of a single cable.
Thus, a codec that can perform compression at relatively low compression ratios (e.g. 4:1) while retaining a ‘visually lossless’ (i.e. having no perceivable artefacts compared to the original video data) level of performance is required by industry.
Compression ratios may also be expressed as the number of ‘bits per pixel’ (bpp) afforded to the compressed stream, noting that conversion back to a compression ratio requires knowledge of the bit depth of the uncompressed signal, and the chroma format. For example, 8-bit 4:4:4 video data occupies 24bpp uncompressed, so 4bpp implies a 6:1 compression ratio.
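As a concrete illustration of the bits-per-pixel arithmetic above, the following minimal Python sketch (not part of the patent disclosure; the function name is illustrative) converts a compressed bpp figure back to a compression ratio, assuming a 4:4:4 chroma format in which all three channels are sampled at full resolution.

    def compression_ratio(bpp_compressed, bit_depth=8, channels=3):
        """Ratio implied by a compressed bits-per-pixel figure for 4:4:4 video."""
        bpp_uncompressed = bit_depth * channels   # e.g. 8-bit 4:4:4 -> 24 bpp
        return bpp_uncompressed / bpp_compressed

    print(compression_ratio(4))   # 6.0, i.e. the 6:1 ratio quoted above

For 4:2:2 or 4:2:0 material the uncompressed bpp figure would first be scaled by the chroma subsampling factor.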
Video data includes one or more colour channels. Generally there is one primary colour channel and two secondary colour channels. The primary colour channel is generally referred to as the ‘luma’ channel and the secondary colour channel(s) are generally referred to as the ‘chroma’ channels. Video data is represented using a colour space, such as ‘YCbCr’ or ‘RGB’.
Some applications require visually lossless compression of the output of a computer graphics card, or transmission from a system-on-chip (SOC) in a tablet to the LCD panel in the tablet. Content from a graphics card or SOC often has different statistical properties from content captured from a camera, due to the use of rendering widgets, text, icons etc. The associated applications can be referred to as ‘screen content applications’. For screen content applications, ‘RGB’ is commonly used, as ‘RGB’ is the format generally used to drive LCD panels. The greatest visual signal strength is present in the ‘G’ (green) channel, so generally the G channel is coded using the primary colour channel, and the remaining channels (i.e. ‘B’ and ‘R’) are coded using the secondary colour channels. The arrangement may be referred to as ‘GBR’. When the ‘YCbCr’ colour space is in use, the ‘Y’ channel is coded using the primary colour channel and the ‘Cb’ and ‘Cr’ channels are coded using the secondary colour channels.
Video data is also represented using a particular chroma format. The primary colour channel and the secondary colour channels are spatially sampled at the same spatial density when a 4:4:4 chroma format is in use. For screen content, a commonly used chroma format is 4:4:4, as generally LCD panels provide pixels in a 4:4:4 chroma format.
The bit-depth defines the bit width of samples in the respective colour channel, which implies a range of available sample values. Generally, all colour channels have the same bit-depth, although the colour channels may alternatively have different bit-depths.
Other chroma formats are also possible. For example, if the chroma channels are sampled at half the rate horizontally (compared to the luma channel), a 4:2:2 chroma format is said to be in use. Also, if the chroma channels are sampled at half the rate horizontally and vertically (compared to the luma channel), a 4:2:0 chroma format is said to be in use. These chroma formats exploit the characteristic of the human visual system that sensitivity to intensity is higher than sensitivity to colour. As such, reducing sampling of the colour channels without causing undue visual impact is possible. However, the reduction in sampling of the colour channels is less applicable to studio environments, where multiple generations of encoding and decoding are common. Also, for screen content the use of chroma formats other than 4:4:4 can be problematic as distortion is introduced to sub-pixel rendered (or ‘anti-aliased’) text and sharp object edges.
Frame data may also contain a mixture of screen content and camera captured content. For example, a computer screen may include various windows, icons and control buttons, text, and also contain a video being played, or an image being viewed. The content, in terms of the entirety of a computer screen, can be referred to as ‘mixed content’. Moreover, the level of detail (or ‘texture’) of the content varies within a frame. Generally, regions of detailed textures (e.g. foliage, text), or regions containing noise (e.g. from a camera sensor) are difficult to compress. The detailed textures can only be coded at a low compression ratio without losing detail. Conversely, regions with little detail (e.g. flat regions, sky, background from a computer application) can be coded with a high compression ratio, with little loss of detail.
In terms of low complexity, one popular solution is application of a ‘Wavelet’ transform, applied hierarchically across an image. Wavelet transforms are well-studied in the context of the JPEG2000 image coding standard. The wavelet transform application across an image differs from that using a block-based codec, such as H.264/AVC, which applies numerous discrete cosine transforms (DCTs) across the spatial extent of each frame. Each block in H.264/AVC is predicted using one of a variety of methods, achieving a high degree of local adaptivity, at a price of increased encoder complexity due to the need for mode decisions to be made. In contrast, the Wavelet transform is applied over a wide spatial area, and thus the prediction modes available to a block based codec are generally not applicable, resulting in a greatly reduced disparity in the complexity of the encoder and the decoder.
In the context of wavelet-based compression techniques, achieving high visual quality and useful compression is difficult, especially when strict local rate control is needed to meet ultra-low latency requirements.
SUMMARY
It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
According to a first aspect of the present invention, there is provided a method for decoding a precinct of compressed video data from an encoded video bit-stream, the precinct of compressed video data including one or more wavelet coefficient sub-bands, the method comprising the steps of: determining, for each value of a significance map of the precinct, a corresponding coefficient group position and coefficient group orientation, the coefficient group orientation being chosen to reduce a variation in a most significant bit plane index set of the coefficient group; decoding the coefficients of the coefficient group using a most significant bit plane index, the most significant bit plane index depending upon the corresponding coefficient group position and orientation; and applying an inverse wavelet transform to the decoded group of sub-band coefficients to generate video data associated with the precinct.
According to another aspect of the present invention, there is provided a method for decoding a precinct of compressed video data from a video bit-stream, the precinct of video data including one or more wavelet coefficient sub-bands, the method comprising the steps of: determining a truncation bit plane for each sub-band in the wavelet structure; determining a significance map specifying most significant bit planes for corresponding groups of wavelet coefficients; determining a most significant bit plane for each wavelet coefficient location by expanding the significance map according to a coefficient group orientation wherein the coefficient group orientation is vertical for at least some coefficient locations; determining a set of wavelet sub-band coefficients based on the determined most significant bit plane and truncation bit plane and a sequence of data bits; and applying an inverse wavelet transform to the decoded group of sub-band coefficients to generate a precinct of video data.
According to another aspect of the present invention, there is provided a method for encoding video data the method comprising the steps of: applying a forward wavelet transform to the video data to produce a precinct of groups of coefficients; determining a significance map representing indexes of most significant bit planes of magnitudes of the coefficients; determining a bit plane truncation level defining least significant bits of the coefficients to be discarded; and encoding the video data using a number of bits determined by the significance map and the bit plane truncation level.
According to another aspect of the present invention, there is provided a decoder comprising a processor and a memory storing a computer executable software program for directing the processor to perform a method for decoding a precinct of compressed video data from an encoded video bit-stream, the precinct of compressed video data including one or more wavelet coefficient sub-bands, the method comprising the steps of: determining, for each value of a significance map of the precinct, a corresponding coefficient group position and coefficient group orientation, the coefficient group orientation being chosen to reduce a variation in a most significant bit plane index set of the coefficient group; decoding the coefficients of the coefficient group using a most significant bit plane index, the most significant bit plane index depending upon the corresponding coefficient group position and orientation; and applying an inverse wavelet transform to the decoded group of sub-band coefficients to generate video data associated with the precinct.
According to another aspect of the present invention, there is provided a decoder comprising a processor and a memory storing a computer executable software program for directing the processor to perform a method for decoding a precinct of compressed video data from a video bit-stream, the precinct of video data including one or more wavelet coefficient sub-bands, the method comprising the steps of: determining a truncation bit plane for each sub-band in the wavelet structure; determining a significance map specifying most significant bit planes for corresponding groups of wavelet coefficients; determining a most significant bit plane for each wavelet coefficient location by expanding the significance map according to a coefficient group orientation wherein the coefficient group orientation is vertical for at least some coefficient locations; determining a set of wavelet sub-band coefficients based on the determined most significant bit plane and truncation bit plane and a sequence of data bits; and applying an inverse wavelet transform to the decoded group of sub-band coefficients to generate a precinct of video data.
According to another aspect of the present invention, there is provided an encoder comprising a processor and a memory storing a computer executable software program for directing the processor to perform a method for encoding video data the method comprising the steps of: applying a forward wavelet transform to the video data to produce a precinct of groups of coefficients; determining a significance map representing indexes of most significant bit planes of magnitudes of the coefficients; determining a bit plane truncation level defining least significant bits of the coefficients to be discarded; and encoding the video data using a number of bits determined by the significance map and the bit plane truncation level.
According to another aspect of the present invention, there is provided a computer readable non-transitory storage memory medium storing a program for directing the processor to perform a method for decoding a precinct of compressed video data from an encoded video bit-stream, the precinct of compressed video data including one or more wavelet coefficient sub-bands, the method comprising the steps of: determining, for each value of a significance map of the precinct, a corresponding coefficient group position and coefficient group orientation, the coefficient group orientation being chosen to reduce a variation in a most significant bit plane index set of the coefficient group; decoding the coefficients of the coefficient group using a most significant bit plane index, the most significant bit plane index depending upon the corresponding coefficient group position and orientation; and applying an inverse wavelet transform to the decoded group of sub-band coefficients to generate video data associated with the precinct.
According to another aspect of the present invention, there is provided a computer readable non-transitory storage memory medium storing a program for directing the processor to perform a method for decoding a precinct of compressed video data from a video bit-stream, the precinct of video data including one or more wavelet coefficient subbands, the method comprising the steps of: determining a truncation bit plane for each sub-band in the wavelet structure; determining a significance map specifying most significant bit planes for corresponding groups of wavelet coefficients; determining a most significant bit plane for each wavelet coefficient location by expanding the significance map according to a coefficient group orientation wherein the coefficient group orientation is vertical for at least some coefficient locations; determining a set of wavelet sub-band coefficients based on the determined most significant bit plane and truncation bit plane and a sequence of data bits; and applying an inverse wavelet transform to the decoded group of sub-band coefficients to generate a precinct of video data.
According to another aspect of the present invention, there is provided a computer readable non-transitory storage memory medium storing a program for directing the processor to perform a method for encoding video data, the method comprising the steps of: applying a forward wavelet transform to the video data to produce a precinct of groups of coefficients; determining a significance map representing indexes of most significant bit planes of magnitudes of the coefficients; determining a bit plane truncation level defining least significant bits of the coefficients to be discarded; and encoding the video data using a number of bits determined by the significance map and the bit plane truncation level.
Other aspects are also disclosed.
BRIEF DESCRIPTION OF THE DRAWINGS
At least one embodiment of the present invention will now be described with reference to the following drawings and appendices, in which:
Fig. 1 is a schematic block diagram showing a sub-frame latency video encoding and decoding system;
Figs. 2A and 2B form a schematic block diagram of a general purpose computer system upon which one or both of the video encoding and decoding systems of Fig. 1 may be practiced;
Fig. 3 is a schematic flow diagram showing a process of a video encoder;
Fig. 4A is a schematic block diagram showing functional modules of a wavelet transform processor;
Fig. 4B is a diagram showing a logical organisation of wavelet transform subbands;
Fig. 4C is a diagram showing an in-memory organisation of the incremental output of a wavelet transform processor;
Fig. 5A is a schematic block diagram showing a sub-band coefficient grouping for representation in a bitstream, with a fixed truncation threshold;
Fig. 5B is a table showing encoded bits for an example group of wavelet coefficients;
Fig. 6 is a schematic flow diagram showing a process of determining coefficient grouping in an encoder;
Fig. 7A is a diagram showing a method of packing and scanning coefficient data using a vertical coefficient group;
Fig. 7B is a diagram showing a method of packing and scanning coefficient data using a horizontal coefficient group;
Fig. 7C is a diagram showing a method of packing and scanning coefficient data using a mixture of vertical and horizontal coefficient groups; and
Figs. 8A and 8B form a schematic flow diagram showing a method of decoding a bitstream with a variable coefficient grouping.
DETAILED DESCRIPTION INCLUDING BEST MODE
Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
It is to be noted that the discussions contained in the "Background" section and that above relating to prior art arrangements relate to discussions of documents or devices which may form public knowledge through their respective publication and/or use. Such discussions should not be interpreted as a representation by the inventor or the patent applicant that such documents or devices in any way form part of the common general knowledge in the art.
Fig. 1 is a schematic block diagram showing functional modules of an exemplary sub-frame latency video encoding and decoding system 100. The system 100 transfers video data from a source device 110 to a destination device 130 via cables 120, 121 and 122. A video source 112 typically comprises a source of uncompressed video data 113, such as an imaging sensor, a previously captured video sequence stored on a non-transitory recording medium, or a video feed from a remote imaging sensor. The uncompressed video data 113 is conveyed from the video source 112 to a video encoder 114 over a CBR channel, with fixed timing of the delivery of the video data. Generally, the video data is delivered in a raster scan format, with signalling to delineate between lines (‘horizontal sync’) and frames (‘vertical sync’). The video source 112 may also be the output of a computer graphics card, e.g. displaying the video output of an operating system and various applications executing upon a computing device, for example a tablet computer. Such content is an example of ‘screen content’. Examples of source devices 110 that may include an imaging sensor as the video source 112 include smart-phones, video camcorders and network video cameras. As screen content may itself include smoothly rendered graphics and playback of natural content in various regions, this is also commonly a form of ‘mixed content’. The video encoder 114 converts the uncompressed video data 113 from the video source 112 into an encoded (compressed) video data bit-stream 115 as described hereinafter in more detail with reference to Fig. 3.
The video encoder 114 encodes the incoming uncompressed video data 113. The video encoder 114 is required to process the incoming sample data in real-time, i.e., the video encoder 114 is not able to stall the incoming uncompressed video data 113, e.g., if the rate of processing the incoming data were to fall below the input data rate. The video encoder 114 outputs compressed video data 115 (the ‘codestream’) at a constant bit rate.
In a video streaming application, the entire codestream is not stored in any one location. Instead, “precincts” of compressed video data are continually being produced by the video encoder 114 and consumed by a video decoder 134, with intermediate storage, e.g., in the (CBR) communication channel 120. The term “precinct” refers to a coefficient set such as 410 depicted in Fig. 4C, and is the basis of a minimum coded unit (MCU) for the video encoder 114. Precincts are non-overlapping and each new precinct is generated, in the described example, by processing eight new rows of input frame samples. An example of a horizontally oriented coefficient block is depicted by a block 413, the block being one dimensional as it is one element high, oriented horizontally and located at a horizontal position within a row of the precinct depicted by an arrow 414 where a horizontal orientation is depicted by a dashed arrow 415. An example of a vertically oriented coefficient block is depicted by a block 417, the block being one dimensional as it is one element wide, oriented vertically and located at a horizontal position depicted by an arrow 418 where a vertical orientation is depicted by a dashed arrow 416. Both horizontal and vertical blocks are located within the precinct according to a position within a scan order which is modified according to a block orientation as later described with reference to Figs. 7A-C. The horizontally oriented coefficient block is located at a 2D position within the precinct and within the full transform set.
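The following Python sketch illustrates, under assumptions of this description rather than as the patent's own algorithm, how an encoder might compare horizontal (1x4) and vertical (4x1) coefficient groups and keep the orientation whose groups have the smaller spread of most significant bit plane indices; a smaller spread means fewer near-empty bit planes are coded for the small coefficients in each group. All names are illustrative.

    def msb_index(magnitude):
        # Index of the most significant non-zero bit plane (0 for a zero coefficient).
        return magnitude.bit_length()

    def group_cost(group):
        msbs = [msb_index(c) for c in group]
        return max(msbs) - min(msbs)                      # variation within the group

    def choose_orientation(tile):
        # tile: 4 rows x 4 columns of coefficient magnitudes from one sub-band region.
        horizontal = [list(row) for row in tile]          # 1x4 groups (rows)
        vertical = [list(col) for col in zip(*tile)]      # 4x1 groups (columns)
        h_cost = sum(group_cost(g) for g in horizontal)
        v_cost = sum(group_cost(g) for g in vertical)
        return ('vertical', vertical) if v_cost < h_cost else ('horizontal', horizontal)

    tile = [[150, 3, 2, 1],
            [140, 2, 1, 0],
            [160, 4, 3, 1],
            [130, 1, 0, 2]]
    print(choose_orientation(tile)[0])    # 'vertical' for this column-structured data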
The CBR compressed video data 115 is transmitted by the transmitter 116 over the communication channel 120. Examples of the communication channel include one or more SDI, HDMI or DisplayPort links, twisted pair links such as CAT5 (or similar) Ethernet cable, and optic fibre links. The communication channel can also be a radio connection such as that provided by Wi-Fi™ or Bluetooth®.
Video data may be transferred from the source device 110 to the destination device 130 via an intermediate device 125 such as a non-transitory storage device. A storage unit such as a “Flash” memory or a hard disk drive might be used, for example to provide a live delay for implementing a dump box functionality or other edit or overlay operation. In an intermediate device where digital storage is implemented, a receiver 126 may convert the signal received from a physical link 121 back to the encoded digital form as generated by the video encoder 114 which is written to the storage 127. A transmitter 128 is used to retransmit the video data over a physical link 122.
The destination device 130 includes a receiver 132, a video decoder 134 and a video sink such as a display device 136. The receiver 132 receives encoded video data from the communication channel 120 and passes received video data 133 to the video decoder 134. The video decoder 134 then outputs decoded frame data 135 to the video sink 136. Examples of the video sink 136 include a video display device, such as a cathode ray tube or a liquid crystal display (as found in smart-phones, tablet computers and computer monitors), or a stand-alone television set. The video sink can also be any other consumer of video data, such as a video processing unit, encoder or streaming server. It is also possible for the functionality of each of the source device 110 and the destination device 130 to be embodied in a single device, examples of which include mobile telephone handsets and tablet computers, or equipment within a broadcast studio including overlay insertion units.
The physical link 120 over which the video frames are delivered may be, for example, part of an SDI interface. Interfaces such as SDI have sample timing synchronised to a clock source, with horizontal and vertical blanking periods. As such, samples of the decoded video need to be delivered in accordance with the frame timing of the SDI link. Video data formatted for transmission over SDI may also be conveyed over Ethernet, e.g. using methods as specified in SMPTE ST. 2022-6. In the event that samples are not delivered according to the required timing, noticeable visual artefacts result (e.g. from invalid data being interpreted as sample values by the downstream device). For this reason, the video encoder 114 and decoder 134 implement rate control and buffer management mechanisms to ensure that no buffer underruns and resulting failure to deliver decoded video occur.
Rate variations may arise during compression due to variations in the complexity and time taken for the encoder 114 to search possible modes of the incoming video data 113. Accordingly, the rate control mechanism ensures that decoded video frames 135 from the video decoder 134 are delivered according to the timing of the interface over which the video frames are delivered. A similar constraint exists for the inbound link to the video encoder 114, which needs to encode samples in accordance with arrival timing and may not stall the incoming video data 113 to the video encoder 114, e.g. due to varying processing demand for encoding different regions of a frame. To meet these constraints, the video encoder 114 and the video decoder 134 will typically implement some buffering of video data. This buffering, at both the encoder 114 and the decoder 134, increases the end to end latency of the video transmission. As mentioned previously, the video encoding and decoding system 100 has a latency of less than one frame of video data. In particular, some applications require latencies as low as 32 lines of video data from the input of the video encoder 114 to the output of the video decoder 134. The latency may include time taken during input/output of video data and storage of partially-coded video data prior to and after transit over a communications channel.
The system 100 includes the source device 110 and the destination device 130. The communication channel 120 is used to communicate encoded video information from the source device 110 to the destination device 130. In some arrangements, the source device 110 and the destination device 130 may comprise respective broadcast studio equipment, such as overlay insertion and real-time editing modules, in which case the communication channel 120 may be an SDI link. In other arrangements, the source device 110 and the destination device 130 may comprise a graphics driver as part of a system-on-chip (SOC) and an LCD panel (e.g. as found in a smart phone, tablet or laptop computer), in which case the communication channel 120 is typically a wired channel, such as printed circuit board (PCB) tracks and associated connectors.
Moreover, the source device 110 and the destination device 130 may comprise any of a wide range of devices, including devices supporting over the air television broadcasts, cable television applications, internet video applications and applications where encoded video data is captured on some storage medium or a file server. The source device 110 may also be a digital camera capturing video data and outputting the video data in a compressed format offering visually lossless compression, and as such the performance may be considered as equivalent to a truly lossless format (e.g. uncompressed).
Notwithstanding the example devices mentioned above, each of the source device 110 and destination device 130 may be configured within a general purpose computing system, typically through a combination of hardware and software components.
Fig. 2A illustrates such a computer system 200, which includes: a computer module 201; input devices such as a keyboard 202, a mouse pointer device 203, a scanner 226, a camera 227, which may be configured as the video source 112, and a microphone 280; and output devices including a printer 215, a display device 214, which may be configured as the display device 136, and loudspeakers 217. An external Modulator-Demodulator (Modem) transceiver device 216 may be used by the computer module 201 for communicating to and from a communications network 220 via a connection 221. The communications network 220, which may represent the communication channel 120, may be a wide-area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 221 is a telephone line, the modem 216 may be a traditional “dial-up” modem. Alternatively, where the connection 221 is a high capacity (e.g., cable) connection, the modem 216 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 220. The transceiver device 216 may provide the functionality of the transmitter 116 and the receiver 132 and the communication channel 120 may be embodied in the connection 221.
The computer module 201 typically includes at least one processor unit 205, and a memory unit 206. For example, the memory unit 206 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 201 also includes a number of input/output (I/O) interfaces including: an audiovideo interface 207 that couples to the video display 214, loudspeakers 217 and microphone 280; an I/O interface 213 that couples to the keyboard 202, mouse 203, scanner 226, camera 227 and optionally a joystick or other human interface device (not illustrated); and an interface 208 for the external modem 216 and printer 215. The signal from the audio-video interface 207 to the computer monitor 214 is generally the output of a computer graphics card and provides an example of ‘screen content’.
In some implementations, the modem 216 may be incorporated within the computer module 201, for example within the interface 208. The computer module 201 also has a local network interface 211, which permits coupling of the computer system 200 via a connection 223 to a local-area communications network 222, known as a Local Area Network (LAN). As illustrated in Fig. 2A, the local communications network 222 may also couple to the wide network 220 via a connection 224, which would typically include a so-called “firewall” device or device of similar functionality. The local network interface 211 may comprise an Ethernet™ circuit card, a Bluetooth™ wireless arrangement or an IEEE 802.11 wireless arrangement; however, numerous other types of interfaces may be practiced for the interface 211. The local network interface 211 may also provide the functionality of the transmitter 116 and the receiver 132 and communication channel 120 may also be embodied in the local communications network 222.
The I/O interfaces 208 and 213 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 209 are provided and typically include a hard disk drive (HDD) 210. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 212 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g. CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable, external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the computer system 200. Typically, any of the HDD 210, optical drive 212, networks 220 and 222 may also be configured to operate as the video source 112, or as a destination for decoded video data to be stored for reproduction via the display 214. The source device 110, intermediate device 125 and the destination device 130 of the system 100, may be embodied in the computer system 200.
The components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204 and in a manner that results in a conventional mode of operation of the computer system 200 known to those in the relevant art. For example, the processor 205 is coupled to the system bus 204 using a connection 218. Likewise, the memory 206 and optical disk drive 212 are coupled to the system bus 204 by connections 219. Examples of computers on which the described arrangements can be practised include IBM-PC’s and compatibles, Sun SPARCstations, Apple Mac™ or alike computer systems.
Where appropriate or desired, the video encoder 114 and the video decoder 134, as well as methods described below, may be implemented using the computer system 200 wherein the video encoder 114, the video decoder 134 and methods to be described, may be implemented as one or more software application programs 233 executable within the computer system 200. In particular, the video encoder 114, the video decoder 134 and the steps of the described methods are effected by instructions 231 (see Fig. 2B) in the software 233 that are carried out within the computer system 200. The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules performs the described methods and a second part and the corresponding code modules manage a user interface between the first part and the user.
The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 200 from the computer readable medium, and then executed by the computer system 200. A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program product in the computer system 200 preferably effects an advantageous apparatus for implementing the video encoder 114, the video decoder 134 and the described methods.
The software 233 is typically stored in the HDD 210 or the memory 206. The software is loaded into the computer system 200 from a computer readable medium, and executed by the computer system 200. Thus, for example, the software 233 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 225 that is read by the optical disk drive 212.
In some instances, the application programs 233 may be supplied to the user encoded on one or more CD-ROMs 225 and read via the corresponding drive 212, or alternatively may be read by the user from the networks 220 or 222. Still further, the software can also be loaded into the computer system 200 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 200 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray Disc™, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 201. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of the software, application programs, instructions and/or video data or encoded video data to the computer module 201 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
The second part of the application programs 233 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 214. Through manipulation of typically the keyboard 202 and the mouse 203, a user of the computer system 200 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 217 and user voice commands input via the microphone 280.
Fig. 2B is a detailed schematic block diagram of the processor 205 and a “memory” 234. The memory 234 represents a logical aggregation of all the memory modules (including the HDD 209 and semiconductor memory 206) that can be accessed by the computer module 201 in Fig. 2A.
When the computer module 201 is initially powered up, a power-on self-test (POST) program 250 executes. The POST program 250 is typically stored in a ROM 249 of the semiconductor memory 206 of Fig. 2A. A hardware device such as the ROM 249 storing software is sometimes referred to as firmware. The POST program 250 examines hardware within the computer module 201 to ensure proper functioning and typically checks the processor 205, the memory 234 (209, 206), and a basic input-output systems software (BIOS) module 251, also typically stored in the ROM 249, for correct operation. Once the POST program 250 has run successfully, the BIOS 251 activates the hard disk drive 210 of Fig. 2A. Activation of the hard disk drive 210 causes a bootstrap loader program 252 that is resident on the hard disk drive 210 to execute via the processor 205. This loads an operating system 253 into the RAM memory 206, upon which the operating system 253 commences operation. The operating system 253 is a system level application, executable by the processor 205, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.
The operating system 253 manages the memory 234 (209, 206) to ensure that each process or application running on the computer module 201 has sufficient memory in which to execute without colliding with memory allocated to another process.
Furthermore, the different types of memory available in the computer system 200 of Fig. 2A need to be used properly so that each process can run effectively. Accordingly, the aggregated memory 234 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 200 and how such is used.
As shown in Fig. 2B, the processor 205 includes a number of functional modules including a control unit 239, an arithmetic logic unit (ALU) 240, and a local or internal memory 248, sometimes called a cache memory. The cache memory 248 typically includes a number of storage registers 244-246 in a register section. One or more internal busses 241 functionally interconnect these functional modules. The processor 205 typically also has one or more interfaces 242 for communicating with external devices via the system bus 204, using a connection 218. The memory 234 is coupled to the bus 204 using a connection 219.
The application program 233 includes a sequence of instructions 231 that may include conditional branch and loop instructions. The program 233 may also include data 232 which is used in execution of the program 233. The instructions 231 and the data 232 are stored in memory locations 228, 229, 230 and 235, 236, 237, respectively. Depending upon the relative size of the instructions 231 and the memory locations 228-230, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 230. Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 228 and 229.
In general, the processor 205 is given a set of instructions which are executed therein. The processor 205 waits for a subsequent input, to which the processor 205 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 202, 203, data received from an external source across one of the networks 220, 222, data retrieved from one of the storage devices 206, 209 or data retrieved from a storage medium 225 inserted into the corresponding reader 212, all depicted in Fig. 2A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 234. A video encoding method 300 (which may be used to implement the video encoder 114) and a video decoding method 800 (which may be used to implement the video decoder 134) may use input variables 254, which are stored in the memory 234 in corresponding memory locations 255, 256, 257. The video encoder 114, the video decoder 134 and the described methods produce output variables 261, which are stored in the memory 234 in corresponding memory locations 262, 263, 264. Intermediate variables 258 may be stored in memory locations 259, 260, 266 and 267.
Referring to the processor 205 of Fig. 2B, the registers 244, 245, 246, the arithmetic logic unit (ALU) 240, and the control unit 239 work together to perform sequences of micro-operations needed to perform “fetch, decode, and execute” cycles for every instruction in the instruction set making up the program 233. Each fetch, decode, and execute cycle comprises: (a) a fetch operation, which fetches or reads an instruction 231 from a memory location 228, 229, 230; (b) a decode operation in which the control unit 239 determines which instruction has been fetched; and (c) an execute operation in which the control unit 239 and/or the ALU 240 execute the instruction.
Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 239 stores or writes a value to a memory location 232.
Each step or sub-process in the methods of Figs. 3, 6 and 8, to be described, is associated with one or more segments of the program 233 and is typically performed by the register section 244, 245, 246, the ALU 240, and the control unit 239 in the processor 205 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 233.
Fig. 3 is a flow diagram depicting the method 300 for encoding video data. The steps of the method 300 may be performed using the general-purpose computer system 200, as shown in Figs. 2A and 2B, where the various functional steps may be implemented by dedicated hardware within the computer system 200, by software executable within the computer system 200 such as one or more software code modules of the software application program 233 resident on the hard disk drive 210 and being controlled in its execution by the processor 205, or alternatively by a combination of dedicated hardware and software executable within the computer system 200.
The described method 300 may alternatively be implemented in dedicated hardware, such as one or more integrated circuits performing the functions or sub-functions of the described methods. Such dedicated hardware may include graphic processors, digital signal processors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or one or more microprocessors and associated memories.
Pixel data in the form of interleaved components in raster scan order (such as 113 in Fig. 1) is input to the encoding process 300. Though the scan order is not essential, this is a common pixel ordering and, in order to minimise latency, computational structures in the coder 114 are optimised to the ordering of the input data. A first step 305 in encoding is to perform a decorrelation across the colour components of the video data 113. The transform converts RGB values to YCgCo values 306 using a process defined by the following transform:

Y = R/4 + G/2 + B/4
Cg = -R/4 + G/2 - B/4
Co = R/2 - B/2
Where: R, G and B are the sample intensities of RGB pixel data and Y, Cg and Co are the luma component and two chroma components, respectively. The luma component captures the intensity information from the three colour (RGB) components. The luma image appears as a black and white (greyscale) image of the scene. The colour information is captured in the two chroma components. The transform from RGB to YCgCo improves compressibility of the video data 113 in two ways. Firstly, it achieves some decorrelation which improves the effectiveness of the subsequent wavelet transform. Secondly, it is known that human visual sensitivity to fine detail is greater for the luma component than for the chroma components. This means that the chroma components can incur more loss, and hence more compression, for the same level of visual loss.
The YCgCo transform has a widely known, fully integer reversible implementation. Other colour transforms that achieve a similar decorrelation of the colour components may be used by the step 305 such as the YCbCr transform or the reversible colour transform employed by JPEG2000. Alternatively the colour transform may be omitted or the colour components reorganised into GBR order. In general, irrespective of the specific transform employed, the combination of luma and chroma components are referred to as YCC data. Subsequent to the step 305 the pixel components 306 are treated as if they were YCC data.
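As an illustration of the ‘widely known, fully integer reversible implementation’ mentioned above, the following sketch shows the commonly used lifting form of the RGB to YCgCo transform (often called YCgCo-R). The exact matrix or lifting schedule used by the described encoder is not reproduced in the text, so this should be read as a representative example only.

    def rgb_to_ycgco_r(r, g, b):
        # Forward lifting steps: each step is exactly invertible on integers.
        co = r - b
        t = b + (co >> 1)
        cg = g - t
        y = t + (cg >> 1)
        return y, cg, co

    def ycgco_r_to_rgb(y, cg, co):
        # Inverse lifting steps, applied in reverse order.
        t = y - (cg >> 1)
        g = cg + t
        b = t - (co >> 1)
        r = b + co
        return r, g, b

    assert ycgco_r_to_rgb(*rgb_to_ycgco_r(90, 200, 35)) == (90, 200, 35)   # lossless round trip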
At a subsequent step 310, a forward wavelet transform is performed to produce DWT coefficients 311. A 5/3 Le Gall wavelet is used in the described example, although other wavelets may alternatively be used, such as a Haar wavelet or a Cohen-Daubechies-Feauveau 9/7 wavelet. The 5/3 wavelet lends itself to an efficient, integer-reversible implementation referred to as lifting; however, the transform can also be implemented using a filter bank approach. The step 310 applies the transform independently to each of the colour components in the YCC data 306, over the entire length of the input scan rows, and generates low pass and high pass sub-band coefficients 311.
The process 310 is implemented to perform this transform incrementally such that two rows of input pixels in the YCC data 306 are processed to produce two rows of wavelet sub-band coefficients. The process is able to generate the first vertical wavelet coefficients for a frame after processing the first three rows of image samples in the pixel components 306, and generates a row of high pass and a row of low pass coefficients for each two rows of image samples 306 thereafter. Rows of vertical transform coefficients that are generated by the step 310 are subsequently transformed horizontally. Because all of the data required to perform the horizontal transform is contained in the input rows in the YCC data 306, any number of levels of transform can be performed without significant additional latency.
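A minimal sketch of one level of the reversible 5/3 Le Gall lifting transform referred to above is given below, in the style used by JPEG 2000 (predict then update, with symmetric extension at the boundaries and at least two input samples assumed). It illustrates the lifting idea only and is not the patent's implementation.

    def dwt53_forward(x):
        """One level of the reversible 5/3 Le Gall transform (lifting form)."""
        y = list(x)
        n = len(y)
        for i in range(1, n, 2):                            # predict: high-pass (odd) samples
            left = y[i - 1]
            right = y[i + 1] if i + 1 < n else y[i - 1]     # symmetric extension at the edge
            y[i] -= (left + right) >> 1
        for i in range(0, n, 2):                            # update: low-pass (even) samples
            left = y[i - 1] if i - 1 >= 0 else y[i + 1]
            right = y[i + 1] if i + 1 < n else y[i - 1]
            y[i] += (left + right + 2) >> 2
        return y[0::2], y[1::2]                             # (low pass, high pass)

    def dwt53_inverse(low, high):
        y = [0] * (len(low) + len(high))
        y[0::2], y[1::2] = low, high
        n = len(y)
        for i in range(0, n, 2):                            # undo update
            left = y[i - 1] if i - 1 >= 0 else y[i + 1]
            right = y[i + 1] if i + 1 < n else y[i - 1]
            y[i] -= (left + right + 2) >> 2
        for i in range(1, n, 2):                            # undo predict
            left = y[i - 1]
            right = y[i + 1] if i + 1 < n else y[i - 1]
            y[i] += (left + right) >> 1
        return y

    samples = [10, 12, 9, 7, 8, 8, 30, 2]
    assert dwt53_inverse(*dwt53_forward(samples)) == samples   # integer reversible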
An example of the structure of the wavelet transform performed by the step 310 is described by Figs. 4A-C.
Fig. 4A is a block diagram depicting a wavelet transform process 420 as a series of 1D, single level transform blocks. Each block implements either a 1D vertical wavelet transform (e.g. a block 421) or a 1D horizontal wavelet transform (e.g. blocks 424 to 429). Each block divides its input into two output sub-bands, namely a high pass sub-band (e.g. 422) and a low pass sub-band (e.g. 423). Each output sub-band produced by a particular transform block has half the number of samples of the input to the transform block in question - rounded up for the low pass and down for the high pass - so that the total number of output samples in 311 is always equal to the number of input samples in 306.
Each block performs a single level of wavelet transform. To achieve successive levels of transform, a transform block is selectively reapplied to the low pass output. The process 420 depicts one level of vertical wavelet transform 421 subsequently feeding one level of horizontal transform 424, applied to the vertical high-pass sub-band 422 and five levels of horizontal wavelet transform 425 to 429, applied to the vertical low pass sub-band to generate coefficient sub-bands 401 to 408.
Fig. 4B is a depiction of a logical sub-band layout 400 arising from the transform process 420. The layout 400 provides a structure according to which wavelet coefficients can be organised in memory. As noted above, the sub-bands 407 and 408 are the result of applying one level of horizontal transform to the high pass coefficients resulting from one level of vertical transform. The sub-bands 401 to 406 are the result of applying 5 levels of horizontal wavelet transform to the low pass coefficients resulting from one level of vertical transform.
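The cascade of horizontal transform levels depicted in Figs. 4A and 4B can be sketched as repeated application of a one-dimensional transform to the low pass output, as below; the dwt53_forward helper is the sketch from the previous example, and the function name is illustrative rather than taken from the patent.

    def horizontal_cascade(row, levels):
        # Re-apply the 1D transform to the low pass output 'levels' times,
        # mirroring Fig. 4A where the vertical low pass band receives five levels.
        bands = []
        low = list(row)
        for _ in range(levels):
            low, high = dwt53_forward(low)
            bands.append(high)
        bands.append(low)
        return bands            # high-pass bands finest first, final low pass last

    low_pass_row = list(range(64))
    print([len(b) for b in horizontal_cascade(low_pass_row, 5)])   # [32, 16, 8, 4, 2, 2]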
Recalling that the DWT coefficients 311 are generated progressively as the raster scanned pixel data 306 is received and processed, it is reasonable, in a device with restricted memory, to confine processing to a narrow horizontal strip of coefficients that are calculated together. Such a strip 411,412 comprises corresponding rows from both the vertical high-pass derived bands 407 to 408 and vertical low pass derived bands 401 to 406. To facilitate meaningful processing of coefficients, four consecutive rows of coefficients are buffered for compression processing.
Fig. 4C shows the resulting coefficient set 410, this set of coefficients being referred to as a “precinct”, this being the basis of a minimum coded unit (MCU) for the codec. Precincts are non-overlapping and each new precinct is generated by processing eight new rows of input frame samples.
The sub-band structure depicted in Fig. 4C is made up of sub-bands which are referred to by sub-band identifiers HL, HH, LH, L2H, L3H, L4H, L5H, and L5L.
Returning to Fig. 3, following the step 310 subsequent steps in the process 300 are applied sequentially to each precinct. A coefficient coding process starts at a step 320 where coefficient values are converted from the DWT coefficients 311 in a signed integer representation to DWT coefficients 321 in a sign + magnitude representation. In this latter format, a sign bit is set to one if the coefficient value is negative or the sign bit is set to zero if the coefficient value is positive or zero. The magnitude is then always represented as a positive value.
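A trivial sketch of the sign and magnitude conversion of step 320 follows (helper names are illustrative): the sign bit is one for negative coefficients and zero otherwise, and the magnitude is always non-negative.

    def to_sign_magnitude(coeff):
        sign = 1 if coeff < 0 else 0
        return sign, abs(coeff)

    def from_sign_magnitude(sign, magnitude):
        return -magnitude if sign else magnitude

    assert from_sign_magnitude(*to_sign_magnitude(-150)) == -150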
At a following step 330 (described hereinafter in more detail with reference to Fig. 6), the coefficient magnitude values are processed to determine a significance map 5114. The significance map 5114 indicates, for each coefficient group, the maximum bit-plane containing at least one significant bit. For example, for coefficient group 5108, bit-plane 5129 sets the significance map 5114 value, due to the presence of a '1' bit 5121 in the bit-plane 5129, resulting from the coefficient 5111 having a value of 150. The significance map thus records an index 5120 of the most significant bit plane (e.g. 5129) of magnitudes 510 of the coefficients 321, as described hereinafter in more detail with reference to Fig. 5A. The significance map 5114 and the data bits 510 corresponding to the coefficient values are encoded separately. The significance map 5114 is encoded without loss. A following step 360 determines a bit plane truncation level (see 5117, described hereinafter in more detail with reference to Fig. 5A) based on the significance map 5114 and a bit budget. The bit budget for the precinct is a fixed fraction of the bit budget for the image frame, which is in turn determined from the frame size, precision and frame rate (frames per second) combined with the bit rate that is supported by the communication channel 120. In general, the bit budget is determined by the following equation:

Bprecinct = (Bframe - Bheader) / Nprecincts
Where Bprecinct is the bit budget for a precinct, Bframe is the bit budget for a frame, Bheader is the budget required to send side information (also called header information) to the decoder and Nprecincts is the number of precincts that need to be coded, and

Nprecincts = Himage / Hprecinct
Where Himage is the height of the video frame data 306, which is determined by the video source 112, and Hprecinct is the height of the precinct 410, which is preferably 8 lines. Further,

Bframe = Wchannel / Rframe
Bframe = Wchannel / Rframe

where Wchannel is the capacity of the communication channel 120 in bits per second and Rframe is the frequency of frames in frames per second.
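A minimal sketch of the budget arithmetic above, assuming the header budget is deducted from the frame budget before dividing evenly over the precincts; the function and parameter names are illustrative.

```python
def precinct_bit_budget(channel_bps: float, frame_rate: float,
                        image_height: int, precinct_height: int,
                        header_bits: int) -> float:
    """Derive the per-precinct bit budget from the channel capacity."""
    frame_bits = channel_bps / frame_rate          # Bframe = Wchannel / Rframe
    n_precincts = image_height / precinct_height   # Nprecincts = Himage / Hprecinct
    return (frame_bits - header_bits) / n_precincts

# Example: 3 Gb/s channel, 60 frames per second, 2160-line frame,
# 8-line precincts and a 10 kbit header budget.
print(precinct_bit_budget(3e9, 60, 2160, 8, 10_000))
```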
For constant bit rate operation each precinct has a fixed bit budget, Bprecinct, which cannot be exceeded. In cases of interest, the precinct budget is less than the size of the uncompressed video frame data 113. Then, to code the coefficients 321, some degree of truncation is employed to reduce the cost (the number of bits required to represent the coefficients) to fit the available budget. The coding cost of the coefficients 321 is related to the degree of truncation because, as the truncation level is raised, fewer bit-planes will be coded for each coefficient group. Moreover, at some point, coefficient groups will have no significant bit-planes coded, also reducing the coding cost of the significance map. The higher the truncation level, the lower the coding cost of the coefficients 321; however, the lower the fidelity achievable when decoding. Of the coefficient groups retaining at least one coded significant bit-plane, the coding costs over all sub-bands are derived. Then, a truncation level is selected such that the total coding costs of the significance map 5114 and the significant coefficient group bit-planes 51251 and 51252 do not exceed the bit budget of the precinct, whilst minimising the degree of truncation.
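The selection rule above can be read as a search over candidate truncation levels, stopping at the smallest truncation whose total cost fits the precinct budget. The sketch below assumes a simplified cost model: a fixed number of bits per significance map sample and, for each group that stays significant, one sign bit per coefficient plus the retained bit planes (counting the plane at the map value itself); none of these constants come from the codestream definition.

```python
def select_truncation(msb_per_group: list[int], group_size: int,
                      budget_bits: int, map_bits_per_group: int = 4,
                      max_bitplane: int = 15) -> int:
    """Return the lowest truncation level whose estimated cost fits the budget."""
    for truncation in range(max_bitplane + 1):
        cost = 0
        for msb in msb_per_group:
            cost += map_bits_per_group              # significance map sample
            if msb >= truncation:                   # group retains coded planes
                planes = msb - truncation + 1
                cost += group_size * (planes + 1)   # magnitude planes + sign bits
        if cost <= budget_bits:
            return truncation
    return max_bitplane + 1  # budget cannot be met even with full truncation

# Four groups of four coefficients with MSB indices 7, 3, 5 and 0.
print(select_truncation([7, 3, 5, 0], group_size=4, budget_bits=120))
```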
The significance map 5114 and the bit plane truncation level 5117 determine the number of bits that will be allocated to encoding the coefficient magnitude, and hence a quantisation step size. The coefficient magnitude value is optionally rounded according to the determined quantisation step size.
The resulting quantised data bits are subsequently packed together in a step 390 with the significance map and other header data to form the compressed codestream 115. The quantised data bits are packed into the codestream without additional entropy coding. Compression is achieved because the significance map provides the basis for variable length coding of the coefficient bit planes (described hereinafter in more detail with reference to Fig. 5A). The compressed codestream 115 can also contain other information, including information required by the decoder 134, in the form of metadata such as the sub-band structure of the precinct, sub-band dimensions, number of components, quantisation parameters, component sample precision and colour transform,
as well as special coding features such as quantisation tables (to be used in combination with quantisation parameters) and default run orientation data.
Fig. 5A depicts 3 consecutive runs 5104, 5106 and 5108 of coefficients 5102 within a single sub-band. The aforementioned coefficient runs 5104, 5106 and 5108 are also referred to as coefficient groups or coefficient blocks. Bit plane magnitudes 510 and indexes 5120 for each of the coefficients 5102 are shown. The bit plane having the greatest bit plane index in the internal representation of the computer is used to store the sign bits 511. A sign bit is set to ‘1’ if the coefficient value is negative, or it is set to ‘0’ if the coefficient value is positive. The remaining bit planes 5110, 5112 and 5116 record positive magnitude information for the coefficients 5102. The significance map 5114 shows the significance map values (such as 5127) for the coefficient groups as determined at the step 330 of Fig. 3. There is one significance map sample (also referred to as the significance map value 5127) for every run of 4 sub-band coefficients (determination of run size is described hereinafter in more detail with reference to Fig. 6). The significance map values (also referred to as elements in the significance map) therefore record most significant non-zero bit plane indices for corresponding groups (e.g. 5108) of wavelet coefficients. That is, each significance map value represents a bit-plane index, e.g. 5129, of a most significant bit plane index set, e.g. 5128. A line 5117 shows a bit plane truncation level selected for the sub-band as determined at the step 360 of Fig. 3. The two functions 5114 and 5117 (i.e. the significance map and the truncation level respectively) segment the bit planes into three bands. The top band 5110 contains only zeros. The bottom band 5116 contains least significant bits which fall below the quantisation step size defined by the truncation level 5117, the aforementioned least significant bits being discarded or rounded out in the encoding process so that the number of bits encoded for a coefficient location is always the number of bit planes between the significance map value for the group and the truncation level for the group. The middle band 5112 comprises bit planes that contain significant information. Because the significance map 5114 and the truncation level 5117 are separately encoded and known to the decoder, the significant bits 5112 can be transmitted raw, along with the corresponding sign bits, without further entropy encoding. In one arrangement, the bits are scanned from most significant to least significant within each run (described hereinafter in more detail with reference to Fig. 5B).
Fig. 5B depicts codestreams corresponding to the coefficient groups 5104, 5106 and 5108. As can be seen, group codes 5123, 5124 and 5125 implement a variable length code describing the coefficient values within the groups. Considering the group 5104 in Fig. 5B, which comprises sign bits 5119 and magnitude bits 5109, the sign bits 5119 may be seen to correspond to the sign bits 511 in Fig. 5A. Further considering the group 5104 in Fig. 5B, the magnitude bits 5109 may be seen to correspond to the significant magnitude bits 5112 in Fig. 5A when scanned from left to right and top to bottom. This form of coding provides a useful degree of compression with minimal complexity at the encoder and decoder.
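The sketch below assembles one such group code, placing the sign bits first and then scanning the retained magnitude bit planes from most significant down to the truncation level; the signs-before-magnitudes ordering and the handling of insignificant groups are assumptions made for illustration rather than details read from Fig. 5B.

```python
def pack_group(magnitudes, signs, msb, truncation):
    """Return the data bits for one coefficient group as a string of '0'/'1'."""
    if msb < truncation:
        return ""                                   # no planes coded for this group
    bits = "".join(str(s) for s in signs)           # sign bits first (assumed order)
    for plane in range(msb, truncation - 1, -1):    # group MSB down to truncation
        bits += "".join(str((m >> plane) & 1) for m in magnitudes)
    return bits

# Group of four magnitudes with MSB index 7 coded down to bit plane 4.
print(pack_group([150, 20, 96, 7], [0, 1, 0, 0], msb=7, truncation=4))
```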
The efficiency of the variable length coding method is compromised when a consecutive run of coefficients contains a significant range of magnitudes. In particular, too many bits will be used to represent the small values within the group. Thus, determining a best coefficient grouping is important to the performance of the codec. In order to maintain low complexity and hardware friendly processing however, fixed block sizes (also referred to as fixed coefficient run sizes or fixed coefficient group sizes) are desirable, as are fixed scanning patterns.
To maximise codec performance, an arrangement of the codec selects between vertical and horizontal runs of coefficients when calculating the significance map.
Fig. 6 is an example of a process 600 for implementing the process 330 of Fig. 3. For each sub-band, a run orientation is determined in a step 605 along with a run size in a following step 610. To calculate the significance map value, the coefficient magnitude bits are combined using a bit-wise OR operation in a following step 615 and the most significant bitplane of the result is determined in a following step 620.
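The per-group calculation of steps 615 and 620 amounts to OR-ing the magnitudes of a run together and taking the index of the highest set bit of the result. A minimal Python sketch follows; bit_length stands in for a hardware priority encoder, and the use of -1 to mark an all-zero run is an assumption.

```python
def significance_map_value(run_magnitudes: list[int]) -> int:
    """Index of the most significant non-zero bit plane of a coefficient run,
    or -1 if every coefficient in the run is zero (assumed convention)."""
    combined = 0
    for magnitude in run_magnitudes:
        combined |= magnitude            # step 615: bit-wise OR over the run
    return combined.bit_length() - 1     # step 620: most significant set bit

print(significance_map_value([150, 20, 96, 7]))   # -> 7
print(significance_map_value([0, 0, 0, 0]))       # -> -1
```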
Figs. 7A-7C show scanning patterns used to facilitate efficient hardware realisation of the codec. The scanning patterns are restricted to the sets depicted in Figs. 7A-7C and allow the position (e.g. 705) of each block within a 2D precinct to be determined based solely on the 1D position (e.g. 706) of the block within its scan pattern. Generally, the precinct height determines both the length of the horizontal and vertical runs and the number of significance map samples that share a run orientation.
Fig. 7A depicts a vertical run coding arrangement 700. In this arrangement, vertical runs 701 of coefficient values are used to determine the significance map values. To ensure efficient use of memory where coefficients are calculated according to a row-wise raster scan order, vertical runs of coefficients are ordered according to a horizontal raster scan order 702.
Fig. 7B depicts a horizontal run coding arrangement 710. In this arrangement, horizontal runs 711 of coefficient values are used to determine the significance map values. To ensure efficient use of memory where coefficients are calculated according to a row-wise raster scan order, horizontal runs of coefficients are ordered according to a horizontal raster scan order 712.
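Under the fixed scan patterns of Figs. 7A and 7B, the 2D position of a run within a sub-band can be recovered directly from its 1D index. A sketch of that mapping, assuming a run length of 4, a sub-band width that is a multiple of the run length, and the raster orders 702 and 712 described above:

```python
def run_origin(index: int, band_width: int, orientation: str, run_length: int = 4):
    """(column, row) of the first coefficient of the run with the given 1D index."""
    if orientation == "vertical":                       # Fig. 7A: one run per column
        col = index % band_width
        row = (index // band_width) * run_length
    else:                                               # Fig. 7B: horizontal runs
        runs_per_row = band_width // run_length
        col = (index % runs_per_row) * run_length
        row = index // runs_per_row
    return col, row

print(run_origin(5, band_width=16, orientation="vertical"))    # -> (5, 0)
print(run_origin(5, band_width=16, orientation="horizontal"))  # -> (4, 1)
```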
Fig. 7C depicts a mixed run coding arrangement 720. In this arrangement, horizontal runs 711 of coefficient values are mixed with vertical runs 701 of coefficients to determine the significance map values. Fig. 7C depicts a case for a run length of 4 coefficients. To ensure efficient use of memory where coefficients are calculated according to a row-wise raster scan order, horizontal runs of coefficients are ordered according to a hybrid scan order 722. The scan order of Fig. 7C can be switched between a vertical sequencing 723 (within the precinct) of horizontal runs and a horizontal sequencing 724 of vertical runs, and requires the block orientation (i.e. the orientation of the run of coefficients) to be periodically signalled to the decoder. In the example of Fig. 7C, signalling occurs at the start of every four significance map samples and comprises a zero if the run orientation does not change or a one if it does. This signalling arrangement is useful when a variable length code such as a unary code is used to encode significance map values and assumes that no change is the most frequent case. In an alternative arrangement, a zero is used to signal a horizontal scan while a one is used to signal a vertical scan. The exact values used in this latter case have no specific importance. Signalling can be performed for any appropriate grouping of coefficients.
For example, in another arrangement signalling of run orientation is performed once for each sub-band set within the precinct. In yet another arrangement, run orientation is signalled once for each sub-band set across all sub-bands. In yet another arrangement, the run orientations are fixed and known to both the encoder and decoder, thereby requiring no additional signalling within the codestream.
In one arrangement, the initial, default run orientation is determined as a function of the sub-band according to the following table:
More generally, the default run orientation is determined according to whether the sub-band is a descendant of the initial vertical high-pass sub-band (e.g. 407, 408) or low-pass sub-band (e.g. 401-406).
More generally again, whenever there is an equal number of transform levels in each orientation, the orientation of the runs used for significance map calculation should match the orientation of the low-pass filter. When an unequal number of transform levels in the horizontal and vertical orientations is employed, the default run orientation should be selected orthogonal to the direction of the majority of low pass or minority of high pass filtering operations used to create the sub-band.
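One way to express the rule just described is as a small selection function over the per-direction transform depths; this is an interpretation rather than a normative table, and the tie-break used for equal depths (which properly depends on which direction received the low-pass filter) is fixed here purely as an assumption.

```python
def default_run_orientation(horizontal_levels: int, vertical_levels: int) -> str:
    """Pick a default run orientation from the sub-band's transform depths.

    When the depths differ, runs are taken orthogonal to the direction that
    received the majority of the low-pass filtering.
    """
    if horizontal_levels > vertical_levels:
        return "vertical"      # mostly horizontal low-pass filtering
    if vertical_levels > horizontal_levels:
        return "horizontal"    # mostly vertical low-pass filtering
    return "horizontal"        # assumed tie-break for equal depths

# e.g. a sub-band produced by 5 horizontal levels and 1 vertical level
print(default_run_orientation(horizontal_levels=5, vertical_levels=1))  # -> "vertical"
```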
The aforementioned arrangement assumes a particular distribution of spatial frequencies within the input image. The assumption may not be valid for all classes of image data. For example, the default run orientations given above, while broadly optimal for natural image content, may be less optimal for synthetic image data such as computer screen content, which contains large areas of flat colour separated by straight vertical and horizontal lines. For specific classes of image data like screen content, a different set of default run orientations may be considered better. In this case, the default run orientations can be signalled by placing the codec in a different mode of operation, providing the advantage of a more optimal scanning pattern without the overhead of additional signalling.
Alternatively, and if processing latency is relaxed, it would be possible to determine the coding cost involved for using all orientations, including mixed orientations, allowing the most efficient strategy to be used at all times.
Broadly speaking, run orientations should be selected in order to improve or maximise the correlation (i.e. reduce or minimise the variation) between coefficient magnitudes within a coefficient group, such as 5108 in Fig. 5A, so that fewer bits are wasted within a group due to a large variance in the magnitudes of the enclosed coefficients, e.g. fewer ‘0’ bits in the most significant coded bit-plane for the coefficient group.
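A concrete, if simplified, proxy for this criterion is to count the bits a grouping would spend on zeros above each coefficient's own most significant bit, and to prefer the orientation that wastes fewer of them. The cost model and the example tile below are illustrative assumptions, not the codec's actual decision rule.

```python
def wasted_bits(group: list[int]) -> int:
    """Bits spent on leading zeros when the group is coded down from its joint MSB."""
    group_msb = max(m.bit_length() for m in group)
    return sum(group_msb - m.bit_length() for m in group)

def better_orientation(horizontal_groups, vertical_groups) -> str:
    """Pick the grouping whose runs waste fewer bits on small coefficients."""
    h_cost = sum(wasted_bits(g) for g in horizontal_groups)
    v_cost = sum(wasted_bits(g) for g in vertical_groups)
    return "horizontal" if h_cost <= v_cost else "vertical"

# The same 4x4 tile of magnitudes grouped as rows and as columns.
tile = [
    [150, 140,   2,   1],
    [130, 120,   3,   2],
    [140, 150,   1,   0],
    [120, 135,   2,   3],
]
horizontal = tile                                 # rows as runs of 4
vertical = [list(col) for col in zip(*tile)]      # columns as runs of 4
print(better_orientation(horizontal, vertical))   # -> "vertical" for this tile
```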
In other variations, different run sizes are allowed for different sub-bands. In such implementations, run lengths would be signalled per sub-band. While variable run lengths have the potential to achieve more efficient coding, they have disadvantages in the context of a low latency, low complexity hardware codec.
Fig. 8A is a flow diagram depicting the steps that would be performed by a decoder to reconstruct a video frame given the codestream generated by the process 300 of Fig. 3. The steps of the method 800 may be performed using a general-purpose computer system 200, as shown in Figs. 2A and 2B, where the various functional steps may be implemented by dedicated hardware within the computer system 200, by software executable within the computer system 200 such as one or more software code modules of the software application program 233 resident on the hard disk drive 205 and being controlled in its execution by the processor 205, or alternatively by a combination of dedicated hardware and software executable within the computer system 200. The described method 800 may alternatively be implemented in dedicated hardware, such as one or more integrated circuits performing the functions or sub-functions of the described methods. Such dedicated hardware may include graphic processors, digital signal processors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or one or more microprocessors and associated memories.
Upon receiving the codestream 115 in a step 890, metadata is unpacked in a following step 860 and decoded to yield information about the structure of the coded coefficients such as the sub-band structure. In the simplest case, wavelet transform basis and structure are fixed for a given operating mode of the codec so metadata is limited to sub-band dimensions, number of components, quantisation parameters, component sample precision and colour transform but can also include any information on special coding features such as quantisation tables (to be used in combination with quantisation parameters) and default run orientation data.
Subsequent steps 850, 840, 820, 815 and 805 are performed for each precinct until the entire frame is reconstructed. Truncation levels 5117 are determined for the encoded coefficients. In one arrangement the truncation level is determined separately for each coefficient corresponding to each sub-band set within the precinct. This determination is based on data in the codestream header. In other arrangements, more complex determinations are possible, including recovering encoded truncation values from the codestream. Subsequently, the significance map data 5114 is read and decoded from the codestream.
The process 840 by which the significance map is used to determine the most significant bit plane for each coefficient location is expanded in steps 830, 828, 826 and 824 in Fig. 8B. Specifically, for each significance map value recovered in the step 830, a run orientation determination step 828 and a run length determination step 826 are further made in order that the correct run of coefficient most significant bit plane indexes (MSBs) are determined in a step 824.
The run orientation determination is based on the index of the sub-band within the subband structure of the precinct (for example Fig. 4C) and the sub-band run orientations that are either known to both encoder and decoder or explicitly communicated via the codestream metadata. The run orientation may further be determined based on additional information encoded with the significance map values.
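A sketch of the expansion performed in step 824: each decoded significance map value is broadcast onto the run of coefficient locations it covers, with the run orientation and length already resolved in steps 828 and 826. The row-major array layout, the default run length of 4 and the use of -1 for uncovered locations are assumptions made for illustration.

```python
def expand_significance_map(map_values, band_width, band_height,
                            orientation, run_length=4):
    """Broadcast each significance map value onto its run of coefficient locations."""
    msb = [[-1] * band_width for _ in range(band_height)]
    for index, value in enumerate(map_values):
        if orientation == "vertical":
            col = index % band_width
            row0 = (index // band_width) * run_length
            for k in range(run_length):
                msb[row0 + k][col] = value
        else:
            runs_per_row = band_width // run_length
            col0 = (index % runs_per_row) * run_length
            row = index // runs_per_row
            for k in range(run_length):
                msb[row][col0 + k] = value
    return msb

# An 8-wide, 4-high sub-band strip covered by horizontal runs of 4.
for row in expand_significance_map([7, 3, 5, 0, 6, 2, 4, 1], 8, 4, "horizontal"):
    print(row)
```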
Having determined a truncation bit plane, MSB, run orientation and length, the corresponding coefficient data can be unpacked from the codestream into a sub-band array memory in the step 820. The inverse wavelet transform step 815 and the colour transform step 805 are then performed to recover the encoded image pixel values.
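To close the loop, a sketch of what the unpacking in step 820 might look like for a single coefficient group, mirroring the bit ordering assumed in the earlier packing sketch (sign bits first, then bit planes from the group MSB down to the truncation level); this ordering, and the reconstruction of discarded planes as zeros, are assumptions rather than details of the codestream syntax.

```python
def unpack_group(bits: str, msb: int, truncation: int, run_length: int = 4):
    """Rebuild signed (quantised) coefficient values for one group from its bits."""
    if msb < truncation:
        return [0] * run_length                     # group carried no coded planes
    signs = [int(b) for b in bits[:run_length]]     # sign bits first (assumed order)
    magnitudes = [0] * run_length
    pos = run_length
    for plane in range(msb, truncation - 1, -1):    # group MSB down to truncation
        for i in range(run_length):
            magnitudes[i] |= int(bits[pos]) << plane
            pos += 1
    return [-m if s else m for s, m in zip(signs, magnitudes)]

# Bits for magnitudes [150, 20, 96, 7] with signs [0, 1, 0, 0], planes 7..4 kept.
packed = "0100" + "1000" + "0010" + "0010" + "1100"
print(unpack_group(packed, msb=7, truncation=4))    # -> [144, -16, 96, 0]
```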
INDUSTRIAL APPLICABILITY
The arrangements described are applicable to the computer and data processing industries and particularly to digital signal processing for the encoding and decoding of signals such as video signals in a low-latency (sub-frame) video coding system.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
In the context of this specification, the word “comprising” means “including principally but not necessarily solely” or “having” or “including”, and not “consisting only of”. Variations of the word “comprising”, such as “comprise” and “comprises”, have correspondingly varied meanings.

Claims (15)

CLAIMS:
1. A method for decoding a precinct of compressed video data from an encoded video bit-stream, the precinct of compressed video data including one or more wavelet coefficient sub-bands, the method comprising the steps of: determining, for each value of a significance map of the precinct, a corresponding coefficient group position and coefficient group orientation, the coefficient group orientation being chosen to reduce a variation in a most significant bit plane index across the coefficient group; decoding the coefficients of the coefficient group using a most significant bit plane index, the most significant bit plane index depending upon the corresponding coefficient group position and orientation; and applying an inverse wavelet transform to the decoded group of sub-band coefficients to generate video data associated with the precinct.
2. A method according to claim 1, wherein the coefficient group orientation varies according to a sub-band position in the precinct.
3. A method for decoding a precinct of compressed video data from a video bitstream, the precinct of video data including one or more wavelet coefficient sub-bands, the method comprising the steps of: determining a truncation bit plane for each sub-band in the wavelet structure; determining a significance map specifying most significant bit planes for corresponding groups of wavelet coefficients; determining a most significant bit plane for each wavelet coefficient location by expanding the significance map according to a coefficient group orientation wherein the coefficient group orientation is vertical for at least some coefficient locations; determining a set of wavelet sub-band coefficients based on the determined most significant bit plane and truncation bit plane and a sequence of data bits; and applying an inverse wavelet transform to the decoded group of sub-band coefficients to generate a precinct of video data.
4. A method according to claim 3, wherein: the wavelet structure has an unequal number of transform levels applied in the vertical and horizontal directions; and additional horizontal transform levels are applied selectively to the low pass band after an initial vertical transform.
5. A method according to claim 3, wherein the groups of wavelet coefficients comprise adjacent coefficients within corresponding wavelet sub-bands.
6. A method according to claim 3, wherein the order of the sequence of data bits can be independent of a coefficient group orientation used for determining the most significant bit plane.
7. A method for encoding video data, the method comprising the steps of: applying a forward wavelet transform to the video data to produce a precinct of groups of coefficients; determining a significance map representing indexes of most significant bit planes of magnitudes of the coefficients; determining a bit plane truncation level defining least significant bits of the coefficients to be discarded; and encoding the video data using a number of bits determined by the significance map and the bit plane truncation level.
8. A method according to claim 7, wherein the precinct comprises four consecutive rows of coefficients corresponding to eight rows of input frame samples.
9. A method according to claim 7, wherein the determination of the significance map comprises the steps of: determining an orientation of the groups of coefficients depending upon a position of the coefficient group in the precinct; and determining the most significant bit plane for each group; wherein: the determined orientations maximise correlation between bit plane magnitudes in most significant bit plane index sets of the significance map.
10. A decoder comprising a processor and a memory storing a computer executable software program for directing the processor to perform a method for decoding a precinct of compressed video data from an encoded video bit-stream, the precinct of compressed video data including one or more wavelet coefficient sub-bands, the method comprising the steps of: determining, for each value of a significance map of the precinct, a corresponding coefficient group position and coefficient group orientation, the coefficient group orientation being chosen to reduce a variation in a most significant bit plane index set of the coefficient group; decoding the coefficients of the coefficient group using a most significant bit plane index, the most significant bit plane index depending upon the corresponding coefficient group position and orientation; and applying an inverse wavelet transform to the decoded group of sub-band coefficients to generate video data associated with the precinct.
11. A decoder comprising a processor and a memory storing a computer executable software program for directing the processor to perform a method for decoding a precinct of compressed video data from a video bit-stream, the precinct of video data including one or more wavelet coefficient sub-bands, the method comprising the steps of: determining a truncation bit plane for each sub-band in the wavelet structure; determining a significance map specifying most significant bit planes for corresponding groups of wavelet coefficients; determining a most significant bit plane for each wavelet coefficient location by expanding the significance map according to a coefficient group orientation wherein the coefficient group orientation is vertical for at least some coefficient locations; determining a set of wavelet sub-band coefficients based on the determined most significant bit plane and truncation bit plane and a sequence of data bits; and applying an inverse wavelet transform to the decoded group of sub-band coefficients to generate a precinct of video data.
12. An encoder comprising a processor and a memory storing a computer executable software program for directing the processor to perform a method for encoding video data, the method comprising the steps of: applying a forward wavelet transform to the video data to produce a precinct of groups of coefficients; determining a significance map representing indexes of most significant bit planes of magnitudes of the coefficients; determining a bit plane truncation level defining least significant bits of the coefficients to be discarded; and encoding the video data using a number of bits determined by the significance map and the bit plane truncation level.
13. A computer readable non-transitory storage memory medium storing a program for directing the processor to perform a method for decoding a precinct of compressed video data from an encoded video bit-stream, the precinct of compressed video data including one or more wavelet coefficient sub-bands, the method comprising the steps of: determining, for each value of a significance map of the precinct, a corresponding coefficient group position and coefficient group orientation, the coefficient group orientation being chosen to reduce a variation in a most significant bit plane index set of the coefficient group; decoding the coefficients of the coefficient group using a most significant bit plane index, the most significant bit plane index depending upon the corresponding coefficient group position and orientation; and applying an inverse wavelet transform to the decoded group of sub-band coefficients to generate video data associated with the precinct.
14. A computer readable non-transitory storage memory medium storing a program for directing the processor to perform a method for decoding a precinct of compressed video data from a video bit-stream, the precinct of video data including one or more wavelet coefficient sub-bands, the method comprising the steps of: determining a truncation bit plane for each sub-band in the wavelet structure; determining a significance map specifying most significant bit planes for corresponding groups of wavelet coefficients; determining a most significant bit plane for each wavelet coefficient location by expanding the significance map according to a coefficient group orientation wherein the coefficient group orientation is vertical for at least some coefficient locations; determining a set of wavelet sub-band coefficients based on the determined most significant bit plane and truncation bit plane and a sequence of data bits; and applying an inverse wavelet transform to the decoded group of sub-band coefficients to generate a precinct of video data.
15. A computer readable non-transitory storage memory medium storing a program for directing the processor to perform a method for encoding video data, the method comprising the steps of: applying a forward wavelet transform to the video data to produce a precinct of groups of coefficients; determining a significance map representing indexes of most significant bit planes of magnitudes of the coefficients; determining a bit plane truncation level defining least significant bits of the coefficients to be discarded; and encoding the video data using a number of bits determined by the significance map and the bit plane truncation level.

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2017201933A AU2017201933A1 (en) 2017-03-22 2017-03-22 Method, apparatus and system for encoding and decoding video data


Publications (1)

Publication Number Publication Date
AU2017201933A1 true AU2017201933A1 (en) 2018-10-11

Family

ID=63709769



Legal Events

Date Code Title Description
MK1 Application lapsed section 142(2)(a) - no request for examination in relevant period