WO2017197434A1 - Method, apparatus and system for encoding and decoding video data


Info

Publication number
WO2017197434A1
Authority
WO
WIPO (PCT)
Prior art keywords
slice segment
dependent
video frame
independent
video
Application number
PCT/AU2017/000110
Other languages
French (fr)
Inventor
Christopher James ROSEWARNE
Jonathan GAN
Original Assignee
Canon Kabushiki Kaisha
Canon Information Systems Research Australia Pty Ltd
Application filed by Canon Kabushiki Kaisha and Canon Information Systems Research Australia Pty Ltd
Publication of WO2017197434A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals, including the following subclasses:
    • H04N19/124 Quantisation
    • H04N19/14 Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H04N19/149 Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • H04N19/15 Data rate or code amount at the encoder output by monitoring actual compressed data size at the memory before deciding storage at the transmission buffer
    • H04N19/174 Adaptive coding where the coding unit is an image region, the region being a slice, e.g. a line of blocks or a group of blocks
    • H04N19/436 Implementation details or hardware specially adapted for video compression or decompression using parallelised computational arrangements

Definitions

  • the present invention relates generally to digital video signal processing and, in particular, to a method, apparatus and system for encoding and decoding compressed video data.
  • the present invention also relates to a computer program product including a computer readable medium having recorded thereon a computer program for rate- controlled video compression.
  • JCT-VC Joint Collaborative Team on Video Coding
  • ITU-T Telecommunication Standardisation Sector
  • ITU International Telecommunication Union
  • VCEG Video Coding Experts Group
  • ISO/IEC International Organisation for Standardisation / International Electrotechnical Commission Joint Technical Committee 1 / Subcommittee 29 / Working Group 11
  • JTC1/SC29/WG11, also known as the Moving Picture Experts Group (MPEG).
  • MPEG Moving Picture Experts Group
  • JCT-VC Joint Collaborative Team on Video Coding
  • HEVC high efficiency video coding
  • Further development of high efficiency video coding (HEVC) is directed towards introducing improved support for content known variously as 'screen content' or 'discontinuous tone content'. Such content is typical of video output from a computer or a tablet device, e.g. from a DVI connector or as would be transmitted over a wireless HDMI link.
  • VESA Video Electronics Standards Association
  • H.264/AVC and HEVC have latencies of multiple frames, as measured from the input of the encoding process to the output of the decoding process.
  • Codecs complying with such standards may be termed 'distribution codecs', as they are intended to provide compression for distribution of video data from a source, such as a studio, to the end consumer, e.g. via terrestrial broadcast or internet streaming.
  • HEVC does have signalling support for latencies under one frame, in the form of a Decoding Unit Supplementary Enhancement Information (SEI) message.
  • SEI Supplementary Enhancement Information
  • the Decoding Unit SEI message is an extension of the timing signalling present in the Picture Timing SEI message, allowing specification of the timing of units less than one frame.
  • the signalling is insufficient to achieve very low latencies, which require minimal buffering and a consequently tight coupling of the encoding and decoding processes.
  • Applications requiring low latency are generally present within a broadcast studio.
  • video may be captured by a camera before undergoing several transformations, including real-time editing, graphic and overlay insertion and muxing. Once the video has been adequately processed, a distribution encoder is used to encode the video data for final distribution to end consumers.
  • the video data is generally transported in an uncompressed format. This necessitates the use of very high speed links.
  • Variants of the Serial Digital Interface (SDI) protocol can transport different video formats.
  • SDI Serial Digital Interface
  • 3G-SDI operating with a 3Gbps electrical link
  • Interfaces having a fixed bit rate are suited to transporting data having a constant bit rate (CBR).
  • Uncompressed video data is generally CBR, and compressed video data may also be CBR. As bit rates increase, achievable cabling lengths reduce, which becomes problematic for cable routing through a studio.
  • UHDTV (3840x2160) requires a 4X increase in bandwidth compared to 1080p HDTV, implying a 12Gbps interface.
  • Increasing the data rate of a single electrical channel reduces the achievable length of the cabling.
  • cable runs generally cannot exceed 150m, which is the minimum usable length for studio applications.
  • One method of achieving higher rate links is by replicating cabling, e.g. by using four 3G-SDI links, with frame tiling or some other multiplexing scheme.
  • the cable replication method increases cable routing complexity, requires more physical space, and may reduce reliability compared to use of a single cable. The bandwidth arithmetic behind these figures is sketched below.
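  • The bandwidth figures quoted above follow from simple pixel-count arithmetic; the following sketch is pure arithmetic, with no codec assumptions beyond the stated 3Gbps 3G-SDI link rate:

```python
# Rough check of the bandwidth figures quoted above: a UHDTV frame has
# four times the pixels of a 1080p HDTV frame, so a link carrying UHDTV
# needs roughly four times the bit rate of the 3 Gbps 3G-SDI link.

hd_pixels = 1920 * 1080    # 1080p HDTV
uhd_pixels = 3840 * 2160   # UHDTV

ratio = uhd_pixels / hd_pixels
print(f"pixel ratio: {ratio:.0f}x")                  # -> 4x
print(f"implied interface: {3 * ratio:.0f} Gbps")    # -> 12 Gbps
```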
  • a codec that can perform compression at relatively low compression ratios (e.g. 4:1) while retaining a 'visually lossless' (i.e. having no perceivable artefacts compared to the original video data) level of performance is desired.
  • DSC Display Stream Compression
  • ADSC Advanced Display Stream Compression
  • PCB printed circuit board
  • Video data includes one or more colour channels. Generally there is one primary colour channel and two secondary colour channels.
  • the primary colour channel is generally referred to as the 'luma' channel and the secondary colour channel(s) are generally referred to as the 'chroma' channels.
  • Video data is represented using a colour space, such as 'YCbCr' or 'RGB'.
  • 'RGB' is commonly used, as this is the format generally used to drive LCD panels. Note that the greatest signal strength is present in the 'G' (green) channel, so generally the G channel is coded using the primary colour channel, and the remaining channels (i.e. 'B' and 'R') are coded using the secondary colour channels. This arrangement may be referred to as 'GBR'.
  • the 'YCbCr' colour space is in use, the 'Y' channel is coded using the primary colour channel and the 'Cb' and 'Cr' channels are coded using the secondary colour channels.
  • Video data is also represented using a particular chroma format.
  • the primary colour channel and the secondary colour channels are spatially sampled at the same spatial density when the 4:4:4 chroma format is in use.
  • the commonly used chroma format is 4:4:4, as generally LCD panels provide pixels at a 4:4:4 chroma format.
  • the bit-depth defines the bit width of samples in the respective colour channel, which implies a range of available sample values. Generally, all colour channels have the same bit-depth, although they may alternatively have different bit-depths.
  • Other chroma formats are also possible. For example, if the chroma channels are sampled at half the rate horizontally (compared to the luma channel), a 4:2:2 chroma format is said to be in use.
  • If the chroma channels are sampled at half the rate both horizontally and vertically (compared to the luma channel), a 4:2:0 chroma format is said to be in use.
  • These chroma formats exploit a characteristic of the human visual system that sensitivity to intensity is higher than sensitivity to colour. As such, it is possible to reduce sampling of the colour channels without causing undue visual impact. However, this property is less applicable to studio environments, where multiple generations of encoding and decoding are common. Also, for screen content the use of chroma formats other than 4:4:4 can be problematic as distortion is introduced to aliased text and sharp object edges. The sampling relationships are sketched below.
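  • The sampling relationships just described can be made concrete with a small helper; this is an illustrative sketch, not an API from any codec:

```python
def chroma_plane_size(luma_width, luma_height, chroma_format):
    """Return the dimensions of each chroma plane for a given chroma format.

    4:4:4 - chroma sampled at the same density as luma
    4:2:2 - chroma sampled at half the rate horizontally
    4:2:0 - chroma sampled at half the rate horizontally and vertically
    """
    if chroma_format == "4:4:4":
        return luma_width, luma_height
    if chroma_format == "4:2:2":
        return luma_width // 2, luma_height
    if chroma_format == "4:2:0":
        return luma_width // 2, luma_height // 2
    raise ValueError(f"unknown chroma format: {chroma_format}")

print(chroma_plane_size(3840, 2160, "4:2:0"))  # -> (1920, 1080)
```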
  • Frame data may also contain a mixture of screen content and camera captured content.
  • a computer screen may include various windows, icons and control buttons, text, and also contain a video being played, or an image being viewed.
  • Such content in terms of the entirety of a computer screen, can be referred to as 'mixed content'.
  • the level of detail or 'texture'
  • regions of detailed textures e.g. foliage, text
  • noise e.g. from a camera sensor
  • the detailed textures can only be coded at a low compression ratio without losing detail.
  • regions with little detail (e.g. flat regions, sky, background from a computer application) can be coded at a higher compression ratio without perceptible loss.
  • the buffering included in the video encoder and the video decoder is generally substantially smaller than one frame (e.g. only dozens of lines of samples). Then, the video encoder and video decoder must not only operate in real-time, but also with sufficiently tightly controlled timing that the available buffers do not underrun or overrun. In the context of real-time operation, it is not possible to stall input or delay output (e.g. due to buffer overrun or underrun). If input was stalled or output delayed, the result would be some highly noticeable distortion of the video data being passed through the video encoder and decoder. Thus, a need exists for algorithms to control the behaviour of the video encoder and decoder to avoid such situations.
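  • To make the underrun/overrun constraint concrete, here is a minimal sketch of a constant-bit-rate buffer-fullness model; all names and constants are illustrative assumptions, not values from the text:

```python
# Minimal CBR buffer-fullness model: bits drain from the encoder's output
# buffer at a constant channel rate while each coded block deposits a
# variable number of bits. The rate controller must keep fullness within
# [0, capacity]; an overrun or underrun here corresponds to the visible
# corruption described above.

class CbrBuffer:
    def __init__(self, capacity_bits, drain_bits_per_block):
        self.capacity = capacity_bits
        self.drain = drain_bits_per_block   # channel bits sent per block interval
        self.fullness = 0

    def code_block(self, block_bits):
        self.fullness += block_bits         # encoder deposits coded bits
        self.fullness -= self.drain         # channel removes bits at a fixed rate
        if self.fullness > self.capacity:
            raise OverflowError("buffer overrun: block coded too large")
        if self.fullness < 0:
            raise RuntimeError("buffer underrun: block coded too small")

buf = CbrBuffer(capacity_bits=4096, drain_bits_per_block=256)
for bits in [300, 280, 250, 260]:           # per-block coded sizes
    buf.code_block(bits)
print(buf.fullness)
```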
  • a method of forming a portion of a video frame comprising:
  • a decoder for forming a portion of a video frame comprising:
  • decoder module for decoding pixel values of an encoded independent slice segment for a slice of the video frame, the independent slice segment having a first predetermined bit size, and for decoding at least one dependent slice segment for the slice of the video frame having a second predetermined bit size, the at least one dependent slice segment being dependent on the independent slice segment using pixel values of the independent slice segment to determine pixel values of the dependent slice segment;
  • forming module for forming the portion of the video frame using the pixel values of the decoded independent slice segment and the dependent slice segment.
  • a computer readable medium having a program stored thereon for forming a portion of a video frame, the program comprising:
  • a system for forming a portion of a video frame comprising:
  • a memory for storing data and a computer readable medium
  • a processor coupled to the memory for executing a computer program, the program having instructions for:
  • a method of forming an encoded portion of a video frame comprising:
  • the independent slice segment for a slice of the video frame, the independent slice segment having a first predetermined bit size, the first predetermined bit size being determined according to an encoder rate control target;
  • a dependent slice segment for the slice of the video stream, having a second predetermined bit size smaller than the first predetermined bit size, the second predetermined bit size being determined according to the encoder rate control target, wherein the at least one dependent slice segment is dependent on the independent slice segment using pixel values of the independent slice segment to determine pixel values of the dependent slice segment; and storing the portion of the video frame using the pixel values of the encoded independent slice segment and the dependent slice segment to form the encoded portion of the video frame.
  • an encoder for forming a portion of a video frame comprising:
  • an encoder module for encoding an independent slice segment for a slice of the video frame, the independent slice segment having a first predetermined bit size, the first predetermined bit size being determined according to an encoder rate control target, and for encoding a dependent slice segment, for the slice of the video stream, having a second predetermined bit size smaller than the first predetermined bit size, the second predetermined bit size being determined according to the encoder rate control target, wherein the at least one dependent slice segment is dependent on the independent slice segment using pixel values of the independent slice segment to determine pixel values of the dependent slice segment;
  • a storage module for storing the portion of the video frame using the pixel values of the encoded independent slice segment and the dependent slice segment to form the encoded portion of the video frame.
  • a computer readable medium having a program stored thereon for forming an encoded portion of a video frame, the program comprising:
  • a system for forming an encoded portion of a video frame comprising:
  • a memory for storing data and a computer readable medium
  • a processor coupled to the memory for executing a computer program, the program having instructions for:
  • encoding an independent slice segment for a slice of the video frame, the independent slice segment having a first predetermined bit size, the first predetermined bit size being determined according to an encoder rate control target; encoding a dependent slice segment, for the slice of the video stream, having a second predetermined bit size smaller than the first predetermined bit size, the second predetermined bit size being determined according to the encoder rate control target, wherein the at least one dependent slice segment is dependent on the independent slice segment using pixel values of the independent slice segment to determine pixel values of the dependent slice segment; and storing the portion of the video frame using the pixel values of the encoded independent slice segment and the dependent slice segment to form the encoded portion of the video frame.
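  • Because both bit sizes are predetermined, the position of every slice segment in the coded slice is known without parsing for markers; the following illustrative sketch shows the offset arithmetic such an arrangement permits:

```python
# With fixed (predetermined) bit sizes, the byte offset of every slice
# segment in a coded slice is known in advance, so the independent and
# dependent segments can be handed to parallel decoders immediately.

def segment_offsets(independent_bits, dependent_bits, num_dependent):
    """Return byte offsets of the independent segment and each dependent one."""
    assert independent_bits % 8 == 0 and dependent_bits % 8 == 0
    offsets = [0]                                   # independent segment first
    pos = independent_bits // 8
    for _ in range(num_dependent):
        offsets.append(pos)
        pos += dependent_bits // 8
    return offsets

# e.g. one independent segment of 2048 bits and three dependent segments
# of 1024 bits each:
print(segment_offsets(2048, 1024, 3))   # -> [0, 256, 384, 512]
```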
  • Fig. 1 is a schematic block diagram showing a sub-frame latency video encoding and decoding system
  • Figs. 2A and 2B form a schematic block diagram of a general purpose computer system upon which one or both of the video encoding and decoding system of Fig. 1 may be practiced;
  • Fig. 3 is a schematic block diagram showing functional modules of a video encoder
  • Fig. 4 is a schematic block diagram showing functional modules of a video decoder
  • Fig. 5A is a schematic block diagram showing square coding tree unit (CTU) configurations for the sub-frame latency video encoding and decoding system of Fig. 1;
  • Fig. 5B is a schematic block diagram showing non-square coding tree unit (CTU) configurations for the sub-frame latency video encoding and decoding system of Fig. 1;
  • Fig. 5C is a schematic block diagram showing square block configurations for the sub-frame latency video encoding and decoding system of Fig. 1;
  • Fig. 6A is a schematic diagram showing a decomposition of a frame into a set of slices, suitable for use with the sub-frame latency video encoding and decoding system of Fig. 1;
  • Fig. 6B is a schematic showing coupling of the timing of the video encoder of Fig. 1 regarding receipt of uncompressed video data from a video source 112 and delivery of compressed video data;
  • Fig. 7 is a schematic diagram of the data flow of the sub-frame latency video encoding and decoding system of Fig. 1 ;
  • Fig. 8 is a schematic flow diagram showing a method for forming an encoded portion of a frame.
  • Fig. 1 is a schematic block diagram showing functional modules of a sub-frame latency video encoding and decoding system 100.
  • the system 100 may use a rate control mechanism to ensure delivery of portions of a frame by the video encoder 114 within a timeframe that allows the video decoder 134 to deliver decoded frame data in real time.
  • the rate control mechanism ensures that no buffer underruns, and resulting failures to deliver decoded video, occur (e.g. due to variations in the complexity of the incoming video data to the video encoder 114 and in the time taken for encoder searching of possible modes), so that decoded video frames from the video decoder 134 are delivered according to the timing of the interface over which the video frames are delivered.
  • the interface over which the video frames are delivered may be, for example, SDI.
  • Interfaces such as SDI have sample timing synchronised to a clock source, with horizontal and vertical blanking periods. As such, samples of the decoded video need to be delivered in accordance with the frame timing of the SDI link. Video data formatted for transmission over SDI may also be conveyed over Ethernet, e.g. using methods as specified in SMPTE ST. 2022-6. In the event that samples were not delivered according to the required timing, noticeable visual artefacts would result (e.g. from invalid data being interpreted as sample values by the downstream device). Accordingly, the rate control mechanism ensures that no buffer overruns and resulting inability to process incoming video data occur.
  • the video encoding and decoding system 100 has a latency of less than one frame of video data.
  • some applications require latencies as low as 32 lines of video data from the input of the video encoder 114 to the output of the video decoder 134.
  • the latency may include time taken during input/output of video data and storage of partially-coded video data prior to and after transit over a communications channel.
  • video data is transmitted and received in raster scan order, e.g. over an SDl link.
  • the video encoding and decoding system 100 processes video data in coding tree units "CTUs". Each frame is divided into an array of square-shaped CTUs.
  • the video encoder 114 requires all samples in a given CTU before encoding of that CTU can begin.
  • the structure of a CTU is described further with reference to Fig. 5A.
  • the structure of a frame is described further with reference to Fig. 6, and the timing of the video encoding and decoding system 100 is described further with reference to Fig. 7.
  • An alternative structure, using a non-square shaped CTU, is described with reference to Fig. 5B.
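  • As a sketch of the raster-scan CTU traversal just described (the sizes here are illustrative), the scan over CTUs can be written as:

```python
# Frames are divided into square CTUs and scanned in raster order:
# left-to-right within a row of CTUs, rows from top to bottom. The
# encoder can start coding a CTU only once all of its sample lines
# have arrived from the raster-scan input.

import math

def ctu_raster_scan(frame_width, frame_height, ctu_size):
    """Yield (x, y) of the top-left sample of each CTU in raster-scan order."""
    ctus_across = math.ceil(frame_width / ctu_size)
    ctus_down = math.ceil(frame_height / ctu_size)
    for row in range(ctus_down):
        for col in range(ctus_across):
            yield col * ctu_size, row * ctu_size

# A 64-sample CTU means 64 input lines must be buffered before the
# first CTU row can be encoded -- one source of sub-frame latency.
for origin in ctu_raster_scan(256, 128, 64):
    print(origin)
```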
  • the system 100 includes a source device 110 and a destination device 130.
  • a communication channel 120 is used to communicate encoded video information from the source device 110 to the destination device 130.
  • the source device 110 and destination device 130 may comprise respective broadcast studio equipment, such as overlay insertion and real-time editing modules, in which case the communication channel 120 may be an SDI link.
  • the source device 110 and destination device 130 may comprise a graphics driver as part of a system-on-chip (SOC) and an LCD panel (e.g. as found in a smart phone, tablet or laptop computer), in which case the communication channel 120 is typically a wired channel, such as PCB trackwork and associated connectors.
  • SOC system-on-chip
  • LCD panel e.g. as found in a smart phone, tablet or laptop computer
  • the source device 110 and the destination device 130 may comprise any of a wide range of devices, including devices supporting over the air television broadcasts, cable television applications, internet video applications and applications where encoded video data is captured on some storage medium or a file server.
  • the source device 110 may also be a digital camera capturing video data and outputting the video data in a compressed format offering visually lossless compression, as such the performance may be considered as equivalent to a truly lossless format.
  • the source device 110 includes a video source 112, the video encoder 114 and a transmitter 116.
  • the video source 112 typically comprises a source of captured video frame data, such as an imaging sensor, a previously captured video sequence stored on a non-transitory recording medium, or a video feed from a remote imaging sensor.
  • the video source 112 may also be the output of a computer graphics card, e.g. displaying the video output of an operating system and various applications executing upon a computing device, for example a tablet computer. Such content is an example of 'screen content'.
  • Examples of source devices 110 that may include an imaging sensor as the video source 112 include smart-phones, video camcorders and network video cameras.
  • the video encoder 114 converts the captured frame data from the video source 112 into encoded video data and will be described further with reference to Fig. 3.
  • the video encoder 114 encodes a given frame as the frame is being input to the video encoder 114.
  • the frame is generally input to the video encoder 114 as a sequence of samples in raster scan order, from the uppermost line in the frame to the lowermost line in the frame.
  • the video encoder 114 is required to process the incoming sample data in real-time, i.e., it is not able to stall the incoming sample data if the rate of processing the incoming data were to fall below the input data rate.
  • the encoded video data is typically an encoded bitstream containing a sequence of blocks of compressed video data. In a video streaming application, the entire bitstream is not stored in any one location.
  • Blocks of compressed video data are transmitted by the transmitter 116 over the communication channel 120 (e.g. an SDI link) as encoded video data (or "encoded video information").
  • the coded picture buffer is used to store a portion of the frame in encoded form and generally comprises a non-transitory memory buffer. It is also possible for the encoded video data to be stored in a non-transitory storage device 122, such as a "Flash" memory or a hard disk drive, until later being transmitted over the communication channel 120, or in-lieu of transmission over the communication channel 120.
  • the destination device 130 includes a receiver 132, a video decoder 134 and a display device 136.
  • the receiver 132 receives encoded video data from the communication channel 120 and passes the received video data to the video decoder 134.
  • each of the source device 110 and the destination device 130 may be embodied in a single device, examples of which include mobile telephone handsets and tablet computers. Notwithstanding the example devices mentioned above, each of the source device 110 and the destination device 130 may also be configured within a general purpose computer system.
  • Fig. 2A illustrates such a computer system 200, which includes: a computer module 201; input devices such as a keyboard 202, a mouse pointer device 203, a scanner 226, a camera 227, which may be configured as the video source 112, and a microphone 280; and output devices including a printer 215, a display device 214, which may be configured as the display device 136, and loudspeakers 217.
  • An external Modulator-Demodulator (Modem) transceiver device 216 may be used by the computer module 201 for communicating to and from a communications network 220 via a connection 221.
  • the communications network 220 which may represent the communication channel 120, may be a wide-area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN.
  • WAN wide-area network
  • the modem 216 may be a traditional "dial-up" modem.
  • the connection 221 is a high capacity (e.g., cable) connection
  • the modem 216 may be a broadband modem.
  • a wireless modem may also be used for wireless connection to the communications network 220.
  • the transceiver device 216 may provide the functionality of the transmitter 116 and the receiver 132, and the communication channel 120 may be embodied in the connection 221.
  • the computer module 201 typically includes at least one processor unit 205, and a memory unit 206.
  • the memory unit 206 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM).
  • the computer module 201 also includes a number of input/output (I/O) interfaces including: an audio-video interface 207 that couples to the video display 214, loudspeakers 217 and microphone 280.
  • the signal from the audio-video interface 207 to the computer monitor 214 is generally the output of a computer graphics card and provides an example of 'screen content'.
  • the modem 216 may be incorporated within the computer module 201, for example within the interface 208.
  • the computer module 201 also has a local network interface 211, which permits coupling of the computer system 200 via a connection 223 to a local-area communications network 222, known as a Local Area Network (LAN).
  • LAN Local Area Network
  • the local communications network 222 may also couple to the wide network 220 via a connection 224, which would typically include a so-called "firewall" device or device of similar functionality.
  • the local network interface 211 may comprise an Ethernet™ circuit card, a Bluetooth™ wireless arrangement or an IEEE 802.11 wireless arrangement; however, numerous other types of interfaces may be practiced for the interface 211.
  • the local network interface 211 may also provide the functionality of the transmitter 116 and the receiver 132, and the communication channel 120 may also be embodied in the local communications network 222.
  • the I/O interfaces 208 and 213 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated).
  • Storage devices 209 are provided and typically include a hard disk drive (HDD) 210. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used.
  • An optical disk drive 212 is typically provided to act as a non-volatile source of data.
  • Portable memory devices, such as optical disks (e.g. CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the computer system 200.
  • any of the HDD 210, optical drive 212, networks 220 and 222 may also be configured to operate as the video source 112, or as a destination for decoded video data to be stored for reproduction via the display 214.
  • the source device 110 and the destination device 130 of the system 100, or either of them, may be embodied in the computer system 200.
  • the components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204 and in a manner that results in a conventional mode of operation of the computer system 200 known to those in the relevant art.
  • the processor 205 is coupled to the system bus 204 using a connection 218.
  • the memory 206 and optical disk drive 212 are coupled to the system bus 204 by connections 219. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun SPARCstations, Apple Mac™ or alike computer systems.
  • the video encoder 114 and the video decoder 134 may be implemented using the computer system 200, wherein the video encoder 114, the video decoder 134 and methods to be described may be implemented as one or more software application programs 233 executable within the computer system 200.
  • the video encoder 114, the video decoder 134 and the steps of the described methods are effected by instructions 231 (see Fig. 2B) in the software 233 that are carried out within the computer system 200.
  • the software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks.
  • the software may also be divided into two separate parts, in which a first part and the corresponding code modules perform the described methods and a second part and the corresponding code modules manage a user interface between the first part and the user.
  • the software may be stored in a computer readable medium, including the storage devices described below, for example.
  • the software is loaded into the computer system 200 from the computer readable medium, and then executed by the computer system 200.
  • a computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product.
  • the use of the computer program product in the computer system 200 preferably effects an advantageous apparatus for implementing the video encoder 114, the video decoder 134 and the described methods.
  • the software 233 is typically stored in the HDD 210 or the memory 206.
  • the software is loaded into the computer system 200 from a computer readable medium, and executed by the computer system 200.
  • the software 233 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 225 that is read by the optical disk drive 212.
  • the application programs 233 may be supplied to the user encoded on one or more CD-ROMs 225 and read via the corresponding drive 212, or alternatively may be read by the user from the networks 220 or 222. Still further, the software can also be loaded into the computer system 200 from other computer readable media.
  • Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 200 for execution and/or processing.
  • Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray Disc™, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 201.
  • Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of the software, application programs, instructions and/or video data or encoded video data to the computer module 201 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
  • the second part of the application programs 233 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 214.
  • GUIs graphical user interfaces
  • a user of the computer system 200 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s).
  • Other forms of functionally adaptable user interfaces may also be
  • Fig. 2B is a detailed schematic block diagram of the processor 205 and a memory 234.
  • the memory 234 represents a logical aggregation of all the memory modules (including the HDD 209 and semiconductor memory 206) that can be accessed by the computer module 201 in Fig. 2A.
  • When the computer module 201 is initially powered up, a power-on self-test (POST) program 250 executes.
  • the POST program 250 is typically stored in a ROM 249 of the semiconductor memory 206 of Fig. 2A.
  • a hardware device such as the ROM 249 storing software is sometimes referred to as firmware.
  • the POST program 250 examines hardware within the computer module 201 to ensure proper functioning and typically checks the processor 205, the memory 234 (209, 206), and a basic input-output systems software (BIOS) module 251, also typically stored in the ROM 249, for correct operation. Once the POST program 250 has run successfully, the BIOS 251 activates the hard disk drive 210 of Fig. 2A.
  • BIOS basic input-output systems software
  • Activation of the hard disk drive 210 causes a bootstrap loader program 252 that is resident on the hard disk drive 210 to execute via the processor 205.
  • the operating system 253 is a system level application, executable by the processor 205, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.
  • the operating system 253 manages the memory 234 (209, 206) to ensure that each process or application running on the computer module 201 has sufficient memory in which to execute without colliding with memory allocated to another process.
  • the different types of memory available in the computer system 200 of Fig. 2A need to be used properly so that each process can run effectively. Accordingly, the aggregated memory 234 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 200 and how such is used.
  • the processor 205 includes a number of functional modules including a control unit 239, an arithmetic logic unit (ALU) 240, and a local or internal memory 248, sometimes called a cache memory.
  • the cache memory 248 typically includes a number of storage registers 244-246 in a register section.
  • One or more internal busses 241 functionally interconnect these functional modules.
  • the processor 205 typically also has one or more interfaces 242 for communicating with external devices via the system bus 204, using a connection 218.
  • the memory 234 is coupled to the bus 204 using a connection 219.
  • the application program 233 includes a sequence of instructions 231 that may include conditional branch and loop instructions.
  • the program 233 may also include data 232 which is used in execution of the program 233.
  • the instructions 231 and the data 232 are stored in memory locations 228, 229, 230 and 235, 236, 237, respectively.
  • a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 230.
  • an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 228 and 229.
  • the processor 205 is given a set of instructions which are executed therein.
  • the processor 205 waits for a subsequent input, to which the processor 205 reacts by executing another set of instructions.
  • Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 202, 203.
  • the execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 234.
  • the video encoder 114, the video decoder 134 and the described methods may use input variables 254, which are stored in the memory 234 in corresponding memory locations 255, 256, 257.
  • the video encoder 114, the video decoder 134 and the described methods produce output variables 261, which are stored in the memory 234 in corresponding memory locations.
  • Intermediate variables 258 may be stored in memory locations 259, 260, 266 and 267.
  • each fetch, decode, and execute cycle comprises: a fetch operation, which fetches or reads an instruction 231 from a memory location 228, 229, 230; a decode operation in which the control unit 239 determines which instruction has been fetched; and an execute operation in which the control unit 239 and/or the ALU 240 execute the instruction.
  • a further fetch, decode, and execute cycle for the next instruction may be executed.
  • a store cycle may be performed by which the control unit 239 stores or writes a value to a memory location 232.
  • Each step or sub-process in the method of Fig. 11, to be described, is associated with one or more segments of the program 233 and is typically performed by the register section 244, 245, 247, the ALU 240, and the control unit 239 in the processor 205 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 233.
  • Fig. 3 is a schematic block diagram showing functional modules of the video encoder 114.
  • Fig. 4 is a schematic block diagram showing functional modules of the video decoder 134.
  • data is passed between functional modules within the video encoder 1 14 and the video decoder 134 in blocks or arrays (e.g., blocks of samples or blocks of transform coefficients).
  • a functional module is described with reference to the behaviour of individual array elements (e.g., samples or transform coefficients), the behaviour shall be understood to be applied to all array elements.
  • the video encoder 114 and video decoder 134 may be implemented using a general-purpose computer system 200, as shown in Figs. 2A and 2B.
  • the various functional modules may be implemented by dedicated hardware within the computer system 200, by software executable within the computer system 200 such as one or more software code modules of the software application program 233 resident on the hard disk drive 210 and being controlled in its execution by the processor 205, or alternatively by a combination of dedicated hardware and software executable within the computer system 200.
  • the video encoder 114, the video decoder 134 and the described methods may alternatively be implemented in dedicated hardware, such as one or more integrated circuits performing the functions or sub-functions of the described methods.
  • Such dedicated hardware may include graphic processors, digital signal processors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or one or more microprocessors and associated memories.
  • the video encoder 114 comprises modules 320-348 and the video decoder 134 comprises modules 420-432, which may each be implemented as one or more software code modules of the software application program 233, or an FPGA 'bitstream file' that configures internal logic blocks in the FPGA to realise the video encoder 114 and the video decoder 134.
  • the video encoder 114 of Fig. 3 is an example of a low latency video encoding pipeline; however, other video codecs may also be used to perform the processing stages described herein.
  • the video encoder 114 receives captured frame data, such as a series of frames, each frame including one or more colour channels.
  • the video encoder 114 divides each frame of the captured frame data, such as frame data 310, into regions generally referred to as 'coding tree units' (CTUs), with side sizes that are powers of two.
  • the coding tree units (CTUs) in a frame are scanned in raster scan order and the sequentially scanned coding tree units (CTUs) are grouped into 'slices'.
  • a division of each frame into multiple slices provides 'random access'
  • CTU coding tree unit
  • Every coding tree unit (CTU) includes one coding tree block (CTB) for each colour channel.
  • CTB coding tree block
  • a coding tree unit (CTU) consists of three coding tree blocks (CTBs) for Y, Cb and Cr colour planes corresponding to the same spatial location in the picture.
  • the size of individual coding tree blocks (CTBs) may vary across colour components and generally depends on the selected 'chroma format'.
  • the sizes of the coding tree blocks (CTBs) will be the same.
  • the dimensions of chroma coding tree blocks (CTBs) in samples are halved (both horizontally and vertically) relative to the size of the luma coding tree block (CTB).
  • the size of a coding tree unit (CTU) is specified as the size of the corresponding luma coding tree block (CTB).
  • the sizes of the chroma coding tree blocks (CTBs) are inferred from the size of the coding tree unit (CTU) and the chroma format.
  • Each coding tree unit includes a hierarchical quad-tree subdivision of a portion of the frame with a collection of 'coding units' (CUs), such that at each leaf node of the hierarchical quad-tree subdivision one coding unit (CU) exists.
  • the subdivision can be continued until the coding units (CU) present at the leaf nodes have reached a specific predetermined minimum size.
  • the specific minimum size is referred to as a smallest coding unit (SCU) size.
  • the smallest coding unit (SCU) size is 8x8 luma samples, but other sizes are also possible, such as 16x16 or 32x32 luma samples.
  • the corresponding coding block (CB) for the luma channel has the same dimensions as the coding unit (CU).
  • the corresponding coding blocks (CBs) for the chroma channels have dimensions scaled according to the chroma format. If no subdivision of a coding tree unit (CTU) is done and a single coding unit (CU) occupies the whole coding tree unit (CTU), such a coding unit (CU) is referred to as a largest coding unit (LCU) (or maximum coding unit size).
  • LCU largest coding unit
  • CTU coding tree unit
  • SCU smallest coding unit
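  • The quad-tree subdivision from CTU down to SCU can be sketched as a simple recursion; the split predicate below is a placeholder for the encoder's actual mode-search decision, and the sizes are illustrative:

```python
# Recursive quad-tree decomposition of a CTU into CUs. Each node either
# becomes a leaf CU or splits into four equal quadrants, stopping at the
# smallest coding unit (SCU) size. 'want_split' stands in for the
# encoder's real rate/distortion decision.

def decompose_ctu(x, y, size, scu_size, want_split):
    """Return a list of (x, y, size) leaf coding units."""
    if size > scu_size and want_split(x, y, size):
        half = size // 2
        cus = []
        for dy in (0, half):
            for dx in (0, half):
                cus += decompose_ctu(x + dx, y + dy, half, scu_size, want_split)
        return cus
    return [(x, y, size)]   # leaf CU (an unsplit CTU is an LCU)

# Example: split any CU larger than 32 samples, down to an 8x8 SCU.
leaves = decompose_ctu(0, 0, 64, 8, lambda x, y, s: s > 32)
print(leaves)   # four 32x32 CUs
```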
  • the video encoder 114 produces one or more 'prediction units' (PUs) for each coding unit (CU).
  • a PU includes all colour channels and is divided into one prediction block (PB) per colour channel.
  • PB prediction block
  • Various arrangements of prediction units (PUs) in each coding unit (CU) are possible and each arrangement of prediction units (PUs) in a coding unit (CU) is referred to as a 'partition mode'. It is a requirement that the prediction units (PUs) do not overlap and that the entirety of the coding unit (CU) is occupied by the one or more prediction units (PUs). Such a requirement ensures that the prediction units (PUs) cover the entire frame area.
  • a partitioning of a coding unit (CU) into prediction units (PUs) implies subdivision of coding blocks (CBs) for each colour component into 'prediction blocks' (PBs).
  • PU prediction unit
  • CU coding unit
  • the video encoder 114 operates by outputting a prediction unit (PU) 378.
  • When intra-prediction is used, a transform block (TB)-based reconstruction process is applied for each colour channel.
  • the TB-based reconstruction process results in the prediction unit (PU) 378 being derived on a TB basis.
  • a residual quad-tree decomposition of the coding unit (CU) associated with the prediction unit (PU) indicates the arrangement of TUs, and hence TBs, to be reconstructed to reconstruct the PU 378.
  • a difference module 344 produces a 'residual sample array' 360.
  • the residual sample array 360 is the difference between the PU 378 and a corresponding 2D array of data samples from a coding unit (CU) of the coding tree block (CTB) of the frame data 310. The difference is calculated for corresponding samples at each location in the array.
  • the transform module 320 may apply a forward DCT to transform the residual sample array 360 into the frequency domain, producing 'transform coefficients'.
  • An 8x8 CU is always divided into an 8x8 TU; however, multiple configurations of the 8x8 TU are possible, as described further with reference to Figs. 5A and 5B.
  • a rate control module 348 ensures that the bit rate of the encoded data meets a predetermined constraint.
  • the predetermined constraint may be referred to as a rate control target. As the quantity of bits required to represent each CU varies, the rate control target can only be met by averaging across multiple CUs.
  • each run of CUs forms a 'slice' and the size of each slice is fixed.
  • the fixed size of each slice facilitates architectures using parallelism, as it becomes possible to determine the start location of each slice without having to search for markers in the bitstream.
  • the encoder may also encode multiple slices in parallel, storing the slices progressively as the slices are produced.
  • the predetermined constraint may be determined by the capacity of the communications channel 120, or some other requirement.
  • the predefined constraint is for operation at a 'constant bit rate' (CBR).
  • the encoder rate control target may be determined according to a constant bit rate channel capacity for a target communication channel (e.g., the channel 120) to carry video data containing a video frame.
  • the constraint operates at a sub-frame level, and, due to channel rate limitations and intermediate buffer size limitations, also imposes timing constraints on the delivery of blocks of compressed video data by the video encoder 114.
  • the cumulative cost of the CTUs within each slice must not exceed the fixed size requirement.
  • the cost may be less than the fixed size requirement.
  • the rate control module may also influence the selection of prediction modes within the video encoder 114, as discussed with reference to the method 800 of Fig. 8. For example, particular prediction modes have a lower bit cost to code a block compared to other prediction modes and are thus considered low cost, albeit offering poor performance in terms of quality.
  • When the remaining bit budget of a slice would otherwise be exceeded, the rate control module 348 enters a 'fallback' state where the remaining blocks in the slice segments are coded using this low cost prediction mode. As such, CBR operation is guaranteed, regardless of the complexity of the incoming uncompressed video data. A sketch of this decision follows.
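  • The fallback decision can be sketched as follows, assuming an illustrative fixed worst-case bit cost for the fallback mode (the constants and names are not from the text):

```python
# Fallback-state rate control sketch: a slice has a fixed bit budget.
# While enough budget remains, the encoder may pick any mode; once the
# remaining bits could not cover the worst case for the remaining
# blocks, every subsequent block is coded with the guaranteed low-cost
# fallback mode, so the slice never exceeds its fixed size.

FALLBACK_COST = 64            # assumed worst-case bits for the cheap mode

def encode_slice(block_costs, slice_budget_bits):
    bits_used = 0
    modes = []
    for i, normal_cost in enumerate(block_costs):
        blocks_left_after = len(block_costs) - i - 1
        reserve = blocks_left_after * FALLBACK_COST   # keep room for the rest
        if bits_used + normal_cost + reserve <= slice_budget_bits:
            modes.append("normal")
            bits_used += normal_cost
        else:
            modes.append("fallback")
            bits_used += FALLBACK_COST
    assert bits_used <= slice_budget_bits             # CBR guarantee
    return modes, bits_used

print(encode_slice([500, 700, 900, 800], slice_budget_bits=2000))
```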
  • a quantisation parameter (QP) 384 is output from the rate control module 348.
  • the QP 384 varies on a block by block basis as the frame is being encoded.
  • the QP 384 is signalled using a 'delta QP' syntax element, signalled at most once per transform unit (TU). Delta QP is only signalled when at least one significant residual coefficient is present for the TU.
  • Other methods for controlling the QP 384 are also possible.
  • the QP defines a divisor applied by a quantiser module 322 to the transform coefficients 362 to produce residual coefficients 364. The remainder of the division operation in the quantiser module 322 is discarded.
  • Lower QPs result in larger magnitude residual coefficients but with a smaller range of remainders to discard. As such, lower QPs give a higher quality at the video decoder 134 output, at the expense of a lower compression ratio.
  • the compression ratio is influenced by a combination of the QP 384 and the magnitude of the transform coefficients 362.
  • the magnitude of the transform coefficients 362 relates to the complexity of the incoming uncompressed video data and the ability of the selected prediction mode to predict the contents of the uncompressed video data.
  • overall compression efficiency is only indirectly influenced by the QP 384 and varies along each slice segment as the complexity of the data at each block varies.
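  • The divide-and-discard behaviour of the quantiser, and the rescaling performed by the dequantiser, can be illustrated directly; note that a real codec derives the step size from the QP non-linearly, whereas a plain integer divisor is used here for clarity:

```python
# Quantisation sketch: the quantiser divides each transform coefficient
# by a QP-derived step and discards the remainder; the dequantiser can
# only rescale, so the discarded remainder is exactly the information
# lost. A smaller step (lower QP) leaves a smaller error.

def quantise(coeffs, step):
    # truncate toward zero (Python's // floors for negative values)
    return [c // step if c >= 0 else -((-c) // step) for c in coeffs]

def dequantise(levels, step):
    return [l * step for l in levels]

coeffs = [57, -23, 8, 3, 0]
for step in (2, 8):                    # smaller step ~ lower QP, higher quality
    levels = quantise(coeffs, step)
    recon = dequantise(levels, step)
    error = [a - b for a, b in zip(coeffs, recon)]
    print(f"step={step}: levels={levels} recon={recon} error={error}")
```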
  • the residual coefficients 364 are an array of values having the same dimensions as the residual sample array 360.
  • the residual coefficients 364 provide a frequency domain representation of the residual sample array 360 when a transform is applied.
  • the residual coefficients 364 and determined quantisation parameter 384 are taken as input to a dequantiser module 326.
  • the dequantiser module 326 reverses the scaling performed by the quantiser module 322 to produce rescaled transform coefficients 366.
  • the rescaled transform coefficients are rescaled versions of the residual coefficients 364.
  • the residual coefficients 364 and the determined quantisation parameter 384 are also taken as input to an entropy encoder module 324.
  • the entropy encoder module 324 encodes the values of the transform coefficients 364 in the encoded bitstream 312 (or 'video bitstream'). Due to the loss of precision resulting from the operation of the quantiser module 322, the rescaled transform coefficients 366 are not identical to the original values present in the transform coefficients 362.
  • the rescaled transform coefficients 366 from the dequantiser module 326 are then output to an inverse transform module 328.
  • the inverse transform module 328 performs an inverse transform from the frequency domain to the spatial domain to produce a spatial-domain representation 368 of the rescaled transform coefficients 366.
  • the spatial-domain representation 368 is substantially identical to the spatial-domain representation that will be reconstructed in the video decoder 134.
  • the intra- frame prediction module 336 produces an intra-predicted prediction unit (PU) 378 using reconstructed samples 370 obtained from the summation module 342.
  • the intra-frame prediction module 336 uses samples from neighbouring blocks (i.e. above, left or above-left of the current block) that have already been reconstructed to produce intra-predicted samples for the current prediction unit (PU).
  • When a neighbouring block is not available (e.g. at the frame or independent slice segment boundary), the neighbouring samples are considered 'not available' for reference.
  • a default value is used instead of the neighbouring sample values.
  • the default value (or 'half-tone') is equal to half of the range implied by the bit-depth.
  • the summation module 342 sums the prediction unit (PU) 378 from the intra-frame prediction module 336 and the spatial domain output of the inverse transform module 328.
  • Prediction units (PUs) may be generated using an intra-prediction method. Intra- prediction methods make use of data samples adjacent to the prediction unit (PU) that have previously been reconstructed (typically above and to the left of the prediction unit) in order to generate reference data samples within the prediction unit (PU). Thirty-three angular intra-prediction modes are available.
  • a 'DC mode' and a 'planar mode' are also available for intra-prediction, to give a total of thirty-five (35) available intra-prediction modes.
  • An intra-prediction mode 388 indicates which one of the thirty-five available intra-prediction modes is selected for the current prediction unit (PU) when the prediction unit (PU) is configured to use intra-prediction (i.e. as indicated by the prediction mode 386).
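  • The 'not available' rule and the half-tone default described above can be sketched as follows (the helper and data layout are illustrative assumptions):

```python
# Intra-prediction reference sample fetch: samples above/left of the
# current block are used if already reconstructed; at a frame or
# independent-slice-segment boundary they are 'not available' and a
# half-tone default, half the range implied by the bit depth, is
# substituted.

def reference_sample(recon, x, y, bit_depth):
    """Return the reconstructed sample at (x, y), or the half-tone default."""
    half_tone = 1 << (bit_depth - 1)        # e.g. 512 for 10-bit video
    if x < 0 or y < 0:                      # outside frame / segment boundary
        return half_tone
    row = recon.get(y)
    if row is None or x >= len(row):
        return half_tone                    # not yet reconstructed
    return row[x]

recon = {0: [700, 650, 640, 630]}           # one reconstructed line (10-bit)
print(reference_sample(recon, 2, 0, 10))    # -> 640 (available neighbour)
print(reference_sample(recon, -1, 0, 10))   # -> 512 (half-tone default)
```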
  • the summation module 342 produces the reconstructed samples 370 that are stored in a reconstructed picture buffer 332.
  • Standards such as HEVC specify filtering stages, such as sample adaptive offset (SAO) or deblocking. Such filtering is generally beneficial, e.g. for removing blocking artefacts, at the higher compression ratios (e.g.
  • the video encoder 114 does not perform filtering operations such as adaptive loop filter, SAO or deblocking filtering.
  • the video encoder 114 is intended for operation at lower compression ratios, e.g. 4:1 to 6:1 or even 8:1. At such compression ratios, these additional filtering stages have little impact on the frame data, and thus the complexity of the additional filtering operations is not justified by the resulting small improvement in quality.
  • the reconstructed picture buffer 332 is configured within the memory 206 and provides storage for at least a portion of the frame, acting as an intermediate buffer for storage of samples to be used for reference by subsequent intra-predicted blocks.
  • the entropy encoder 324 encodes the transform coefficients 364, the QP 384 and other parameters, collectively referred to as 'syntax elements', into the encoded bitstream 312 as sequences of symbols.
  • the data rates for video data at UHD resolutions are very high.
  • At such rates, techniques such as arithmetic coding, in particular the context adaptive binary arithmetic coding (CABAC) algorithm of HEVC, are not feasible.
  • CABAC context adaptive binary arithmetic coding
  • One issue is that the use of adaptive contexts requires large memory bandwidth to the context memory for updating the probability associated with each context-coded bin in a syntax element.
  • Another issue is the inherently serial nature of coding and decoding each bin into the bitstream.
  • bins coded as so-called 'equi-probable' or 'bypass-coded' bins have a serial process that limits parallelism to only a few bins per clock cycle.
  • the bin rate is extremely high; for example, for UHD 4:4:4 10-bit 60 frame per second video data, the uncompressed data rate is 14.93Gb/s, so compressed data rates between 1.866 and 3.732Gb/s can be expected.
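The quoted rates can be reproduced with simple arithmetic; the figures below assume three full-resolution colour channels (4:4:4) and compression ratios of 8:1 and 4:1 respectively:

```python
width, height, channels, bit_depth, fps = 3840, 2160, 3, 10, 60
uncompressed = width * height * channels * bit_depth * fps   # bits per second
print(uncompressed / 1e9)        # ~14.93 Gb/s uncompressed
print(uncompressed / 8 / 1e9)    # ~1.866 Gb/s at 8:1 compression
print(uncompressed / 4 / 1e9)    # ~3.732 Gb/s at 4:1 compression
```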
  • the use of adaptive probabilities for coding of bins is disabled. Consequently, all bins are coded in the "equi-probable" state, i.e. as bypass-coded bins, each of which maps directly to one bit in the bitstream.
  • the encoded bitstream 312 effectively contains only variable length and fixed length codewords, each codeword including an integer number of (equi-probable) bits.
  • the absence of misalignment between (bypass coded) bins and bits greatly simplifies the design of the entropy encoder 324, as the sequence of bins defining a given syntax element value can be directly stored into the encoded bitstream 312.
  • the absence of context coded bins also removes dependencies necessary for selecting contexts for bins. Such dependencies, when present, require buffers to store the values of previously coded bins, with those values used to select one context out of a set of contexts for a current bin. Then, encoding and decoding multiple bins per clock cycle is greatly simplified compared to when adaptive context coding is used, resulting in the potential to achieve the compressed data rates mentioned previously. In such architectures, the system clock can be expected to be in the order of several hundred MHz, with a bus sufficiently wide to achieve the required data rate. All these attributes of the entropy encoder 324 are also present in an entropy decoder 420 of the video decoder 134.
  • the video decoder 134 of Fig. 4 is described with reference to a low latency video decoding pipeline, however other video codecs may also employ the processing stages of modules 420-432.
  • the encoded video information may also be read from memory 206, the hard disk drive 210, a CD-ROM, a Blu-ray disk™ or other computer readable storage medium.
  • the encoded video information may be received from an external source, such as a server connected to the communications network 220 or a radio- frequency receiver.
  • received video data such as the encoded bitstream 312, is input to the video decoder 134.
  • the encoded bitstream 312 may be read from memory 206, the hard disk drive 210, a CD-ROM, a Blu-ray disk™ or other non-transitory computer readable storage medium. Alternatively the encoded bitstream 312 may be received from an external source such as a server connected to the communications network 220 or a radio-frequency receiver.
  • the encoded bitstream 312 contains encoded syntax elements representing the captured frame data to be decoded.
  • the encoded bitstream 312 is input to an entropy decoder module 420 which extracts the syntax elements from the encoded bitstream 312 and passes the values of the syntax elements to other blocks in the video decoder 134.
  • the entropy decoder module 420 applies variable length coding to decode syntax elements from codes present in the encoded bitstream 312.
  • the decoded syntax elements are used to reconstruct parameters within the video decoder 134. Parameters include zero or more residual data arrays 450, a prediction mode 454, an intra-prediction mode 457 and a QP 452.
  • the residual data array 450 and the QP 452 are passed to a dequantiser module 421, and the intra-prediction mode 457 is passed to an intra-frame prediction module 426.
  • the dequantiser module 421 performs inverse scaling on the residual data of the residual data array 450 to create reconstructed data 455 in the form of transform coefficients.
  • the dequantiser module 421 outputs the reconstructed data 455 to an inverse transform module 422.
  • the inverse transform module 422 applies an 'inverse transform' to convert the reconstructed data 455 (i.e., the transform coefficients) from a frequency domain representation to a spatial domain representation, outputting a residual sample array 456 via a multiplexer module 423.
  • the inverse transform module 422 performs the same operation as the inverse transform module 328.
  • the inverse transform module 422 is configured to perform an inverse transform.
  • the transforms performed by the inverse transform module 422 are selected from a predetermined set of transform sizes required to decode an encoded bitstream 312.
  • the intra-frame prediction module 426 produces an intra-predicted prediction unit (PU) 464 for the prediction unit (PU) according to the intra-prediction mode 457.
  • the intra-predicted prediction unit (PU) 464 is produced using data samples spatially neighbouring the prediction unit (PU) and a prediction direction also supplied by the intra-prediction mode 457.
  • the spatially neighbouring data samples are obtained from reconstructed samples 458, output from a summation module 424.
  • the prediction unit (PU) 466 which is output from the multiplexer module 428, is added to the residual sample array 456 from the inverse scale and transform module 422 by the summation module 424 to produce reconstructed samples 458.
  • the reconstructed samples 458 are stored in the frame buffer module 432 configured within the memory 206.
  • the frame buffer module 432 provides sufficient storage to hold part of one frame, as required for just in time output of decoded video data by the video decoder 134.
  • the decoded video data may be sent to devices such as a display device (e.g. 136, 214) or other equipment within a broadcast environment, such as a 'distribution encoder', graphics overlay insertion, or other video processing apparatus.
  • Fig. 5A is a schematic block diagram showing square coding tree unit (CTU) configurations for the sub-frame latency video encoding and decoding system 100.
  • the CTU height is limited to eight rows of samples. The end-to-end latency of 32 lines is divided equally between the video encoder 114 and the video decoder 134, i.e. 16 lines in each portion of the video encoding and decoding system 100. Dividing the latency equally permits input of uncompressed data in raster scan order to fill half of the 16 line input buffer whilst the other half is being encoded. A similar division occurs in the video decoder 134. Using square CTUs, the resulting size is 8x8 (luma) samples, smaller than the minimum size of 16x16 specified in HEVC.
  • an additional source of latency results from the buffering of partially-coded slices in the video encoder 114 prior to transmission and the buffering of partially-received slices in the video decoder 134 prior to decoding. Further latency present in the communications channel 120 is not considered.
  • a consequence of an 8x8 CTU size is that no quadtree subdivision into multiple coding units (CUs) is performed. Instead, each CTU is always associated with one 8x8 CU.
  • a residual quadtree is defined to always include one 8x8 transform unit (TU).
  • possible configurations of the 8x8 TU are shown in Fig. 5A, as TU configurations. The possible configurations are a result of the 'partition mode' of the CU and the chroma format of the video data.
  • for the primary colour channel (primary), the chroma format is not relevant and, as seen in Fig. 5A, an 8x8 transform block (TB) 501 is present when a PART_2Nx2N partition mode is used. As also seen in Fig. 5A, four 4x4 TBs (referenced at 502 in Fig. 5A) are present when a PART_NxN partition mode is used. As seen in Fig. 5A, for the secondary colour channels (secondary), the possible arrangements of TBs also depend on the chroma format. When the video data is in the 4:2:0 chroma format, two 4x4 TBs (referenced at 503 in Fig. 5A) are present (one for each secondary colour channel), regardless of the partition mode of the CU. When the video data is in the 4:2:2 chroma format, two pairs of 4x4 TBs are present, one pair for each secondary colour channel.
  • when the video data is in the 4:4:4 chroma format, the partition mode of the CU influences the arrangement of TBs, such that the same arrangements as for the primary colour channel are used.
  • one 8x8 TB is used per secondary colour channel (referenced at 505 in Fig. 5A) when the partition mode of the CU is PART_2Nx2N and four 4x4 TBs (referenced at 506 in Fig. 5A) per secondary colour channel are used when the partition mode of the CU is PART_NxN.
  • the scan order of the TBs is shown in Fig. 5A using thick arrows.
  • the scan order used is defined as a 'Z-scan' order, i.e. iterating over the blocks first along the upper row from left to right, and then along the lower row from left to right.
  • the colour channels are processed with primary colour channel first, followed by the secondary colour channels, i.e. Y, Cb, then Cr, or G, B then R.
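The combined scan over colour channels and TBs can be sketched as follows (illustrative Python; the generator name and its arguments are assumptions, not part of the described arrangements):

```python
def ctu_scan(tb_origins, channels=("Y", "Cb", "Cr")):
    # Primary colour channel first, then the secondary colour channels.
    for channel in channels:
        # 'Z-scan': upper row left-to-right, then lower row left-to-right,
        # which is the order produced by sorting on (y, x).
        for y, x in sorted(tb_origins):
            yield channel, (y, x)

# Four 4x4 TBs of a PART_NxN 8x8 CU:
order = list(ctu_scan([(0, 0), (0, 4), (4, 0), (4, 4)]))
# [('Y', (0, 0)), ('Y', (0, 4)), ('Y', (4, 0)), ('Y', (4, 4)), ('Cb', ...), ...]
```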
  • Fig. 5B is a schematic block diagram showing non-square coding tree unit (CTU) configurations for the sub-frame latency video encoding and decoding system 100 of Fig. 1.
  • the height of the CTU is retained at 8 lines, in order to meet the end-to-end latency requirement of 32 lines, as discussed earlier.
  • the width of the CTU is doubled to 16 samples, resulting in non-square CTUs.
  • the CTU may contain two 8x8 CUs (referenced at 512 in Fig. 5B), each having the various structures of TBs as described with reference to Fig. 5A.
  • the 16x8 CTU may instead contain one non-square 16x8 CU 514.
  • the 16x8 CU 514 is divided into TBs as shown in Fig. 5B.
  • the divisions of Fig. 5B for the primary colour channel (primary) and the secondary colour channels (secondary) are analogous to the divisions shown in Fig. 5A, noting that the width of each TB is doubled with respect to the cases shown in Fig. 5A, so that the possible TB sizes are 16x8 samples and 8x4 samples.
  • the use of larger transforms enables more compact representation of the residual data, improving compression efficiency.
  • the improved compression efficiency is balanced against the possibility of highly detailed areas, for which larger transforms offer no benefit, and thus the original transform sizes are still available via the selection of 8x8 CUs.
  • the selection of one 16x8 CU or two 8x8 CUs for a given 16x8 CTU is controlled using a 'cu split' flag, coded in the bitstream. As the split results in two CUs, rather than four CUs, the split differs from the 'quad-tree' subdivision prevalent in HEVC.
  • Fig. 5C is a schematic block diagram showing a block configuration for the video encoding and decoding system 100 of Fig. 1.
  • the configuration of Fig. 5C has a height of 4 luma samples in the buffering stages. Retaining a CTU width of 8 samples results in supporting two block configurations within the CTU: 4x4 and 8x4. Note that the 4:2:0 chroma format would result in a requirement to pair 4x4 chroma blocks with 8x8 (or 2x2 arrangements of 4x4 blocks) in the luma channel. As this would violate the height restriction of 4 luma samples, the 4:2:0 chroma format is not supported in the block configuration of Fig. 5C.
  • the 4:2:2 and 4:4:4 chroma formats are supported using one pair of 4x4 blocks for each of the three colour channels in the case of 4:4:4 (i.e. blocks 542 for luma and blocks 546 for chroma).
  • one 8x4 block may be used for each of the three colour channels in the case of 4:4:4 (i.e. 548).
  • for the 4:2:2 chroma format, the luma channel uses either a pair of 4x4 blocks (i.e. blocks 542) or one 8x4 block (i.e. block 543).
  • one 4x4 block is present per secondary colour channel (i.e. block 544).
  • Each 4x4 block corresponds to one 4x4 TB and one 4x4 PB, hence there is no concept of multiple partition modes.
  • as the configuration of Fig. 5C has a maximum buffering height of 4 luma samples, the end-to-end latency introduced by sample I/O and block processing is sixteen (16) lines; eight lines in the video encoder 114 (four for receiving video data in raster order and four for processing a row of CTUs) and eight lines in the video decoder 134 (four for decoding a row of CTUs and another four for outputting decoded video data in raster scan order).
  • Additional latency may be introduced by the buffering of partially coded slices in the video encoder 114 prior to transmission and the buffering of partially received slices in the video decoder 134 prior to decoding.
  • Such buffering is synchronised to the timing of encoding and decoding each row of 4x4 blocks.
  • An alternative for the configuration of Fig. 5C involves using an 8x4 CTU size.
  • the block configurations resulting in the use of an 8x4 TB are referred to as a 'PART_2NxN' partition mode. In such cases, the intra-prediction process is performed on an 8x4 PB.
  • Fig. 6A is a schematic diagram showing decomposition of a frame 600 into a set of slices, suitable for use with the sub-frame latency video encoding and decoding system 100 of Fig. 1.
  • the frame 600 is divided into slices, each slice occupying a number of rows of CTUs in the frame 600.
  • Each slice is further decomposed into one independent slice segment, such as independent slice segment 610, followed by several dependent slice segments, such as dependent slice segments 616, 620 and 626.
  • Each slice segment includes one row of CTUs.
  • An independent slice segment and all subsequent dependent slice segments up to a next independent slice segment form one slice. As such, an independent slice segment does not use samples from preceding slice segments for prediction.
  • the CTU 612 uses reference samples 614 for intra prediction, but does not use any samples located in the above dependent slice segment.
  • Dependent slice segments may use samples from preceding slice segments (dependent or independent) for prediction.
  • a particular dependent slice segment may be dependent on an independent slice segment where pixel values of the independent slice segment are used to determine pixel values of the particular dependent slice segment.
  • the CTU 622 uses reference samples 624 for intra prediction, which includes samples located both to the left of and above the CTU 622.
  • the same constraints on dependencies for bitstream creation and parsing (e.g. for determining the context for a particular bin of a syntax element) are also present when CABAC is used.
  • the timing of the video encoder 114 regarding receipt of uncompressed video data from the video source 112 and delivery of compressed video data (e.g. to the communications channel 120) is shown in Fig. 6B.
  • samples of the first independent slice segment of the first slice of the frame 600 are received by the video encoder 114.
  • samples of the dependent slice segments of the first slice of the frame 600 are received by the video encoder 114.
  • samples of independent slice segment 610 are received by the video encoder 114.
  • samples of dependent slice segments 616, 620 and 626 are received by the video encoder 114.
  • the video encoder 114 is not able to stall the arrival of samples and thus incoming samples are processed at the input data rate.
  • a bitstream 640 is produced by the video encoder 114 and contains coded representations of the slice segments of the frame 600.
  • the video encoder 114 has a latency 660, as seen in Fig. 6B, between the timing of the boundaries between slice segments at the input and the delivery of blocks of compressed video data (or "coded slice segments").
  • the latency 660 is expressed as a constant time interval along a time axis 662.
  • the coded representations of the slice segments of the frame 600 are produced and available for output relative to the time instants of the input video data, e.g. 622, 624, 626, 628, 630, 632 and 634, offset by the latency 660, as shown in Fig. 6B.
  • bitstream 642 is an example of a bitstream that does not conform to the timing requirements of the video processing system 100.
  • the coded slice 656 is coded at a lower compression ratio than the compression ratio required in order to meet the timing constraint. As such, all subsequent coded slices are delivered later than required. This results in the video decoder 134 being unable to store decoded video samples in the reconstructed picture buffer 332 at the required timing for reading out to send over an SDI channel.
  • while the video encoder 114 needs to be configured so as to not produce coded slices exceeding a predetermined required size (i.e. below the required compression ratio), the coded slices, e.g. 650, may fall under the required size (i.e. exceed the required compression ratio).
  • the coded slice 650 is one such example, including a coded portion 652 and a padding portion 654, to bring the total size up to the required size to meet CBR operation.
  • the padding portion 654 represents unused capacity on the SDI channel and thus the video encoder 114 is configured to minimise the size of the padding portion 654 using a 'rate control' algorithm, as described further with reference to Fig. 8.
  • Fig. 7 is a schematic diagram of the data flow of the sub-frame latency video encoding and decoding system 100 of Fig. 1 at a point in time, to further illustrate the overall timing of processing of slice segments within the system 100.
  • a frame 710 of uncompressed video data is sent to the video encoder 114.
  • the frame 710 is divided into independent slice segments and dependent slice segments (collectively, the 'slice segments'), each being a consecutive run of CTUs from the leftmost CTU in the frame 710 to the rightmost CTU in the frame 710 for a given row of CTUs in the frame 710.
  • each of the independent slice segments and the dependent slice segments has an equal bit size, as seen in Fig. 7.
  • slice segments 712 have already been processed by the video encoder 114.
  • the slice segments 712 are no longer present in the memory 206 associated with the video encoder 114.
  • Slice segment 714 has already been written to the buffer 304 in raster scan order and slice segment 716 is stored in the buffer 304 in raster scan order.
  • Slice segments 718 have not yet been received by the video encoder 114.
  • the video encoder 114 encodes the slice segments of the frame 710 from the uppermost slice segment to the lowermost slice segment.
  • the encoded bitstream 312 is conveyed along the communication channel 120 as a sequence of coded slice segments.
  • the coded slice segments are received by the video decoder 134 and decoded to produce a decoded frame 720.
  • the decoded frame 720 includes slice segments 722 that have already been decoded and output from the video decoder 134.
  • Slice segment 724 is read out from the decoded picture buffer 432 in the video decoder 134 in raster scan order as slice segment 726 is decoded on a CTU-by-CTU basis by the video decoder 134 and stored in the decoded picture buffer 432. From Fig. 7, it can be appreciated that the timing of the transition from encoding one slice segment (e.g. 714) to the next one is precisely controlled in the video encoder 114, as the transition from receiving one slice segment (e.g. 716) to the next is the result of the fixed timing of the received video data. A small degree of decoupling between the arrival of video data and the delivery of coded slice segments is possible due to the arrival of data in pixel raster scan order.
  • Fig. 8 is a schematic flow diagram showing a method 800 of forming an encoded portion of a video frame, such that the resulting bitstream forming the encoded portion has slice segments, each of a fixed size in bits (i.e., bit size).
  • the bitstream itself is continuously generated by the video encoder 1 14 and continuously decoded by the video decoder 134, from the moment of establishing the link (e.g. at power-up) to the moment of tearing-down the link (e.g. at power-down).
  • the method 800 is performed in the video encoder 114, under control of the processor 205, to form coded slice segments of a predetermined bit size.
  • the resulting bitstream forming the encoded portion may be stored within the storage 122, for example, using the syntax elements of an encoded independent slice segment and a dependent slice segment to form the bitstream.
  • the bitstream may be formed from a continuous sequence of slice segments, within which each frame is represented by one or more independent slice segments and associated dependent slice segments. The boundaries between the slice segments occur at predetermined locations. Accordingly, once the frame boundaries within the bitstream are known, it becomes possible to commence decoding of other slice segments by virtue of their fixed offset relative to the frame boundary, e.g. in a parallel decoding system.
  • the encoder may also encode multiple slice segments concurrently, with the partially encoded slice segments being written concurrently to consecutive portions of a buffer, prior to transmission or storage, e.g. over the communications channel 120.
  • the method 800 begins at an initialise remaining bits step 802.
  • the processor 205 initialises a remaining bits counter configured, for example, within the memory 206.
  • the remaining bits counter indicates an available size remaining for encoding the slice segment currently being processed by the video encoder 1 14. The size remaining is measured in bits to provide precise measurement. However, larger units of measurement, such as bytes, may also be used. Due to the constant frame size, each slice segment has an equal predetermined fixed bit size. Then, for CBR operation (fixed compression ratio), each coded slice also has a predetermined fixed bit size. Control in the processor 205 then passes to an initialise remaining CTU count step 804.
  • coded slices corresponding to independent slice segments have a larger (but still predetermined fixed) size compared to coded slices corresponding to dependent slice segments.
  • CBR operation is maintained for this arrangement of larger independent slice segments.
  • the increase in coded slice size for the independent slice segments compensates for the unavailability of neighbouring samples from the previous slice (i.e. the last dependent slice segment of the preceding slice) for reference.
  • a slightly lower QP can be used to increase the quality of independent slice segments for the arrangement having larger size segments.
  • the processor 205 sets a count of the CTUs in the slice segment currently being processed. As the frame size is generally fixed, the count of the CTUs in the slice segment currently being processed is determined based on a division of the frame width by the CTU width. For frame widths that do not divide evenly by the CTU width, a round-up is applied. Control in the processor 205 then passes to a remaining CTUs test step 806.
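Steps 802 and 804 amount to the following initialisation (a minimal sketch; the function and variable names are illustrative):

```python
import math

def init_slice_segment(frame_width, ctu_width, coded_slice_bits):
    remaining_bits = coded_slice_bits                    # step 802: fixed coded size
    remaining_ctus = math.ceil(frame_width / ctu_width)  # step 804: round up
    return remaining_bits, remaining_ctus

# e.g. a 1920-wide frame with 16-wide CTUs gives 120 CTUs per slice segment
```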
  • the processor 205 tests if any further CTUs remain to be coded in the slice segment currently being processed by the video encoder 114.
  • if no CTUs remain to be coded, control in the processor 205 passes to an insert padding step 816. Otherwise, control in the processor 205 passes to an encode CTU step 808.
  • the video encoder 114, under control of the processor 205, inserts padding (e.g. 654) into the encoded bitstream 312 forming the portion of the video frame, such that the length of the coded slice is padded to the predetermined bit size.
  • the length of the padding varies between slice segments, and may be in the order of dozens of bytes, or in some rare cases of very compactly encoded slice segments may be hundreds of bytes in length.
  • the SDI link is required to convey data with a minimum frequency of transitions between logic 'ones' and 'zeroes' such that clock recovery circuitry in the receiver 132 is able to remain synchronised with the clock in the transmitter 116 that generates the SDI signal.
  • the padding data should include a bit pattern that contains transitions with reasonable frequency (e.g. at least once every 10 bits) and patterns that cannot be misinterpreted as the start of a slice segment.
  • the 'NAL unit' start code of 0x000001 is prohibited (a slice segment is a category of NAL unit). It can be seen that the default HEVC padding process of inserting 0x00 bytes is not suitable for conveying the bitstream over an SDI link. An alternative padding byte, such as 0x55, is suitable, and cannot be misinterpreted as a NAL unit start code.
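A sketch of the padding insertion of step 816, under the assumption of a 0x55 padding byte (the function and its names are illustrative):

```python
NAL_START_CODE = b"\x00\x00\x01"   # must never appear in the padding

def pad_coded_slice(coded: bytes, target_size: int) -> bytes:
    # 0x55 is binary 01010101: at least one transition every two bits,
    # satisfying SDI clock-recovery needs, and it cannot form 0x000001.
    assert len(coded) <= target_size, "rate control must prevent overrun"
    padding = b"\x55" * (target_size - len(coded))
    assert NAL_START_CODE not in padding
    return coded + padding
```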
  • the method 800 then terminates.
  • the video encoder 114, under control of the processor 205, encodes the current CTU into the encoded bitstream 312 using the QP 384.
  • the mode decisions made by the video encoder 114 are stored in the encoded bitstream 312.
  • the video encoder 114 tests one or more available modes for each PB (e.g. intra-prediction modes), and selects one mode for the PB. Then, the prediction mode is coded for the PB, and the residual associated with the corresponding TB. The process repeats for all blocks within a CTU.
  • PBs/TBs within each CTU are as described with reference to Fig. 5A.
  • a count of the bits consumed by coding each syntax element associated with the CTU is maintained. Without the use of arithmetic coding, the count of the bits consumed by coding each syntax element associated with the CTU is an integer number of bits in the encoded bitstream 312.
  • control in the processor 205 then passes to a calculate margin step 810.
  • the processor 205 calculates a margin indicative of the processing of CTUs of the current slice segment versus the consumption of available bits in the coded slice (as measured by the 'remaining bits' variable). The resulting margin is used in the rate control module 348 to control the coding rate of subsequent CTUs in the current slice segment. As the coded slice must not exceed a predetermined size, the remaining bits count is not permitted to become negative. Then, the rate control algorithm needs to compress each slice segment with a targeted compression ratio that is slightly higher than the compression ratio resulting from the data rate of the uncompressed video data versus the data rate of the communications channel 120.
  • for example, a compression ratio of 4:1 may be required, while a target compression ratio of 4.05:1 is used.
  • this extra 'headroom' is required to allow for local variation in the bitrate that results from different complexity features in the frame data. This local variation is difficult to control, and thus allowing some headroom permits encoding the data without introducing visual artefacts. For example, without the headroom, towards the end of each slice segment (i.e. the rightmost CTUs in the frame) the possibility of falling back to some very 'harsh' coding method, as described in step 818, would be increased. However, excessive headroom, or 'margin', implies artificially reduced quality of the video decoder 134 output compared to the quality potentially afforded by the CBR communications channel 120.
  • An end of slice margin (in bits) is subtracted from a permitted size of a coded slice to produce an adjusted coded slice size.
  • the adjusted coded slice size is then divided by the number of CTUs present in one slice segment to produce a coded CTU size.
  • a target remaining slice size is determined by multiplying the remaining CTU count and the coded CTU size.
  • the target remaining slice size is compared with the remaining bits count. If the target remaining slice size is larger than the remaining bits count, then the video encoder 1 14 has coded too many bits in representing CTUs up to the present CTU in the current slice segment. Then, the cost of coding remaining CTUs in the slice segment needs to be reduced (e.g. by increasing the QP 384).
  • the video encoder 114 has coded too few bits in representing CTUs up to the present CTU in the current slice segment and is able to increase the quality (e.g. by reducing the QP 384) for remaining CTUs in the slice segment.
  • the calculate margin step 810 provides a way to dynamically adjust the cost of coding video data via the QP 384, as sketched below. For a run of CTUs or TUs containing very simple data, e.g. one colour, the QP could otherwise continue decreasing to a very low value as the cost of coding the minimal residual information progressively increased. An issue here is that the cost of coding a subsequent block of high complexity would be exorbitant.
  • QP decrementing ceases once the achieved compression ratio for a given CTU exceeds the targeted compression ratio by some factor. For example, with a target compression ratio of 4:1, once the achieved compression ratio reaches 16:1, no further reductions in QP 384 occur.
  • future blocks of high complexity can be coded without inducing overly high bit rate across those CTUs, or at least the CTUs prior to sufficient adjustment (incrementing) of the QP 384 within the complex region.
  • One example of a block with high complexity is a block containing a high degree of noise (e.g. from an imaging sensor). As noise cannot be predicted, large residual data can be expected.
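The margin calculation and QP adjustment of step 810, including the clamp on QP decrements, might be sketched as follows (simplified and illustrative; the parameter names, the single-step QP changes and the fixed clamp factor are assumptions, not the described arrangements):

```python
def update_qp(qp, remaining_bits, remaining_ctus, slice_bits,
              ctus_per_slice, margin_bits, target_ratio, achieved_ratio):
    # Adjusted coded slice size, then the per-CTU bit budget.
    coded_ctu_bits = (slice_bits - margin_bits) / ctus_per_slice
    target_remaining = remaining_ctus * coded_ctu_bits
    if target_remaining > remaining_bits:
        qp += 1        # too many bits spent so far: code remaining CTUs more cheaply
    elif achieved_ratio < 4 * target_ratio:
        qp -= 1        # bits to spare: raise quality, unless the clamp
                       # (e.g. 16:1 achieved against a 4:1 target) is reached
    return qp
```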
  • Control in the processor 205 passes to a termination test step 812.
  • the video encoder 114, under control of the processor 205, checks if the remaining bits for the slice, divided by the number of remaining CTUs in the slice, has fallen below a predetermined threshold.
  • the predetermined threshold corresponds to the bit cost of coding one CTU with very harsh quantisation of residuals, e.g. one bit per residual coefficient. This can result in visually noticeable artefacts in the decoded video data (e.g. 412). However, such a mode is necessary to avoid a buffer overrun, where a coded slice segment would otherwise contain more bits than are available in the allocated space for the slice segment.
  • the test at step 812 is configured to guard against excessively complex video data consuming too many bits in the coded slice segment, potentially leading to a failure to compress the slice segment into the available size. If the remaining bits for the slice, divided by the number of remaining CTUs in the slice, has fallen below the threshold, control in the processor 205 passes to a termination mode step 818.
  • control in the processor 205 passes to a code delta QP step 814.
  • the threshold defines a minimum quantity of bits per CTU to code the remaining CTUs in the slice segment. Based upon a termination mode supporting one bit per sample, the threshold is equal to the number of samples in one CTU. For 8x8 CTUs coded in the 4:2:2 chroma format the result is 128 bits per CTU and for 8x8 CTUs coded in the 4:4:4 chroma format the result is 192 bits per CTU.
  • the predetermined threshold can also include an offset to account for the cost of signalling the termination mode. Note that for a given slice segment, once the termination mode is signalled, all remaining CTUs in the slice segment also use the termination mode, thus the signalling of the termination mode is performed at most once per slice segment.
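A worked check of the threshold figures given above, with the signalling offset as an optional parameter (an illustrative sketch assuming one bit per sample):

```python
def termination_threshold(ctu_w, ctu_h, chroma_format, signalling_bits=0):
    luma = ctu_w * ctu_h                                          # one bit per luma sample
    chroma = {"4:2:2": luma, "4:4:4": 2 * luma}[chroma_format]    # two secondary channels
    return luma + chroma + signalling_bits

assert termination_threshold(8, 8, "4:2:2") == 128
assert termination_threshold(8, 8, "4:4:4") == 192
```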
  • both the video encoder 114 and the video decoder 134 record the cumulative bit cost of the slice segment, and subtract this value from the allocated slice segment size to determine a remaining slice segment size.
  • the termination mode is entered, without any explicit signalling being required.
  • the termination cost is equal to the number of samples in the remaining CTUs in the slice segment, i.e. one bit per sample.
  • the entropy encoder 324 under control of the processor 205, encodes a delta QP syntax element into the encoded bitstream 312. The delta QP signals the result of any change to the QP from the step 810.
  • Control in the processor 205 then passes to the remaining CTUs test step 806.
  • the video encoder 114, under control of the processor 205, encodes the remaining CTUs in the slice segment using a low-cost prediction mode, or 'termination mode'.
  • the low-cost prediction mode offers a fixed cost per CTU. This is achieved by quantising each residual coefficient using a high degree of lossiness, e.g. one bit per coefficient.
  • the residual transform is skipped.
  • each PB is predicted using the intra 'DC' prediction mode, i.e. the PB is populated with a value representing the average of the reference samples.
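A sketch of the per-PB termination coding; the use of the residual sign as the one bit per coefficient is an assumption for illustration, not the described quantisation rule:

```python
def code_pb_termination_mode(pb_samples, reference_samples):
    # Intra 'DC' prediction: populate the PB with the average reference value.
    dc = sum(reference_samples) // len(reference_samples)
    # Transform skip, one bit per coefficient: here, the residual sign.
    bits = [1 if sample - dc >= 0 else 0 for sample in pb_samples]
    return dc, bits    # fixed cost: exactly len(pb_samples) bits
```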
  • the calculate margin step 810 is modified such that the comparison between the target remaining slice size and the remaining bits count is performed by firstly determining a difference between the two quantities, which is then smoothed using a low-pass filter over multiple CTUs before performing the comparison.
  • Such an arrangement provides more stable adjustment of QP (less responsive to sudden changes in texture detail within the frame) at the expense of slower adaptation to shifts in texture that may necessitate larger changes in QP to avoid buffer overrun for CBR operation.
  • the target remaining bits is determined based upon the coded cost of CTUs from the previous slice segment.
  • a look-up table may be used to store the cumulative coded costs of CTUs as each CTU of the current slice segment is being coded, for reference when the next slice segment is to be coded.
  • Such an arrangement exploits the correlation of textures between adjacent CTUs, particularly in the vertical direction (i.e. the CTU located in the previous slice segment, above a current CTU), that results from larger textures present in the video data (e.g. text, large objects).
  • the predicted cost from the previous slice segment is averaged with a simple linear estimation of the cost.
  • Averaging the predicted cost in such a manner provides protection against cost estimates deviating greatly as the slice segments of a given frame are progressively encoded. The deviations may result in excessively harsh quantisation of portions of a given slice segment, or over-allocate bits to other portions where features were present on previous slice segments but cease to be present on later slice segments. Arrangements making use of the previous slice segment to assist in cost prediction and hence quantisation require relatively low additional memory, as the cost is one estimate per CTU in one slice segment.
  • the video encoder 1 14 performs a 'pre-scan' of the (uncompressed) slice segment samples.
  • the pre-scan step occurs prior to encoding a given CTU in the slice segment.
  • a complexity measure is produced, e.g. by performing a transform and measuring the residual energy.
  • One method of determining residual energy is to perform the transform to generate transformed residual values.
  • the resulting values may then be summed and the summed value used as a measure of residual energy.
  • the complexity measure is derived from the residual energy, and is representative of the estimated coding cost of the CTU where a high complexity measure is considered expensive to code when compared to a low complexity measure.
  • a table of estimated costs is produced that provide input for the target bit cost of each CTU.
  • the pre-scan arrangement does not depend on any correlation between the current slice segment and the previous slice segment. However, the pre-scan arrangement does require additional memory bandwidth to perform the pre-scan operation.
  • for reduced complexity, a Hadamard transform may be used instead of a DCT, as the required output is only an estimate of texture complexity.
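An illustrative complexity measure using a 4x4 Hadamard transform (a SATD-style sum of absolute transformed values; excluding the DC term, so that flat blocks score zero, is an assumption here):

```python
import numpy as np

H4 = np.array([[1,  1,  1,  1],
               [1, -1,  1, -1],
               [1,  1, -1, -1],
               [1, -1, -1,  1]])

def complexity_estimate(block: np.ndarray) -> int:
    # The Hadamard transform needs only additions/subtractions, unlike a DCT.
    coeffs = H4 @ block @ H4.T
    return int(np.abs(coeffs).sum() - abs(coeffs[0, 0]))
```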
  • the video encoder 114 performs a pre-scan of a limited number of the leftmost CTUs (e.g., just the first CTU may be pre-scanned).
  • a cost estimate for the initial CTU is produced using a Hadamard transform and an initial QP is determined from this cost estimate.
  • the initial QP is determined based upon the target bit cost for one CTU, such that the residual energy is anticipated to approximately consume this quantity of bits. Dividing the cost estimate by the target bit cost results in the desired quantiser step size, from which the initial QP value can be determined.
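Under the HEVC-style relation between quantiser step size and QP (step size approximately doubling every six QP values, with a step size of one at QP 4), the initial QP could be derived as follows (a sketch; applying this particular mapping here is an assumption):

```python
import math

def initial_qp(cost_estimate_bits: float, target_bits: float) -> int:
    step = max(cost_estimate_bits / target_bits, 1.0)  # desired quantiser step size
    return round(6 * math.log2(step)) + 4              # step ~= 2 ** ((QP - 4) / 6)
```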
  • the initial QP is coded as the 'slice QP'.
  • the initial QP for the current dependent slice segment may be coded by coding the difference between the QP of the last CTU of the previous slice segment and the new initial QP.
  • the slice header for a dependent slice segment includes an initial QP that is not dependent on the QP of any earlier slice segments.
  • any data transmission errors that result in the loss of a previous slice segment do not affect the ability of the video decoder 134 to correctly determine the QP for CTUs in the current slice segment.
  • where the slice header for a dependent slice segment includes an initial QP that is not dependent on the QP of any earlier slice segments, a 'slice_qp_delta' syntax element is coded in the slice header of each slice segment (dependent as well as independent).
  • the size of a coded slice segment must not exceed a predetermined value. Then, as the coded slice segment is being encoded, the cumulative cost can be used to influence the remaining cost, so as not to exceed the predetermined size. Using the cumulative cost to influence the remaining cost can lead to overly conservative (i.e. lower quality) coding of the slice segment.
  • either or both the video encoder 114 and the video decoder 134 may include multiple entropy encoders 324 or entropy decoders 420.
  • the additional entropy encoders or decoders are configured to perform parallel encoding or decoding of each slice segment.
  • each dependent slice segment may also be parsed completely independently of other slice segments.
  • independent parsing refers to the absence of any dependencies for parsing a particular syntax element on syntax element values from a different slice segment, e.g. for arithmetic context selection.
  • the intra-prediction reconstruction process has dependencies on neighbouring samples that may cross the dependent slice segment boundary.
  • the multiple entropy encoders or entropy decoders are able to read or write their portions of the encoded bitstreams directly to or from the coded picture buffer, with suitable offsets applied to obtain the starting address of each coded slice segment.
  • the offset for an independent slice segment may be larger than the offset for dependent slice segments, resulting in a larger potential size for an independent slice segment compared to a dependent slice segment.
  • the entry points into the coded picture buffer for each encoding or decoding step are able to be determined.
  • the entry point for a given slice segment is equal to the sum of the sizes of the preceding slice segments, with additional space reserved for any high level syntax NAL units, such as the picture parameter set or the sequence parameter set.
  • the first independent slice segment in each frame is offset by a fixed amount from the start of the allocated space.
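A sketch of the entry-point computation, assuming fixed coded sizes and one independent slice segment followed by dependent slice segments (the names and the byte granularity are illustrative):

```python
def entry_points(reserved, independent_size, dependent_size, n_dependent):
    # Fixed offset for high level syntax (e.g. parameter sets or a frame header);
    # each slice segment starts where the previous one's allocation ends.
    points = [reserved]
    for i in range(n_dependent):
        points.append(points[-1] + (independent_size if i == 0 else dependent_size))
    return points

# entry_points(64, 4096, 3584, 2) -> [64, 4160, 7744]
```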
  • a syntax structure is inserted that contains a frame header, including information such as the frame size, bit depth and chroma format.
  • in HEVC, the SPS and PPS have variable lengths, which has an undesirable impact on the timing of the subsequent slice segments, as the length can vary.
  • the header has a fixed length, so that the rate target (i.e. number of bits per slice) for the independent slice segment can be determined without needing to code the header information into a packet.
  • various parameters relating to reference picture list management are absent from the header information.
  • the header information is included in the first independent slice segment for each frame; this is necessary to ensure that the video decoder 134 can commence decoding from any frame in the encoded bitstream 312, e.g. in case of truncated or otherwise edited bitstreams.
  • As the length of the header information is fixed, generally each parameter in the header information is coded using fixed-length codewords.
  • the arrangements described are applicable to the computer and data processing industries and particularly to digital signal processing for the encoding and decoding of signals such as video signals in a low-latency (sub-frame) video coding system.

Abstract

A method of forming a portion of a video frame. Pixel values of an encoded independent slice segment for a slice of the video frame are decoded. The independent slice segment has a first predetermined bit size. At least one dependent slice segment for the slice of the video frame having a second predetermined bit size is decoded. The at least one dependent slice segment is dependent on the independent slice segment, using pixel values of the independent slice segment to determine pixel values of the dependent slice segment. The portion of the video frame is formed using the pixel values of the decoded independent slice segment and the dependent slice segment.

Description

METHOD, APPARATUS AND SYSTEM FOR ENCODING AND DECODING
VIDEO DATA
TECHNICAL FIELD
The present invention relates generally to digital video signal processing and, in particular, to a method, apparatus and system for encoding and decoding compressed video data. The present invention also relates to a computer program product including a computer readable medium having recorded thereon a computer program for rate-controlled video compression.
BACKGROUND Many applications for video coding currently exist, including applications for transmission and storage of video data. Many video coding standards have also been developed and others are currently in development. Recent developments in video coding standardisation have led to the formation of a group called the "Joint Collaborative Team on Video Coding" (JCT-VC). The Joint Collaborative Team on Video Coding (JCT-VC) includes members of Study Group 16, Question 6 (SG16/Q6) of the Telecommunication Standardisation Sector (ITU-T) of the International Telecommunication Union (ITU), known as the Video Coding Experts Group (VCEG), and members of the International Organisations for Standardisation / International Electrotechnical Commission Joint Technical Committee 1 / Subcommittee 29 / Working Group 11 (ISO/IEC
JTC1/SC29/WG11), also known as the Moving Picture Experts Group (MPEG).
The Joint Collaborative Team on Video Coding (JCT-VC) has produced a new video coding standard that significantly outperforms the "H.264/MPEG-4 AVC" (ISO/IEC 14496-10) video coding standard. The new video coding standard has been named "high efficiency video coding (HEVC)". Further development of high efficiency video coding (HEVC) is directed towards introducing improved support for content known variously as 'screen content' or 'discontinuous tone content'. Such content is typical of video output from a computer or a tablet device, e.g. from a DVI connector or as would be transmitted over a wireless HDMI link. The content is poorly handled by previous video compression standards and thus a new activity directed towards improving the achievable coding efficiency for this type of content is underway. Other developments, e.g. in the Video Electronics Standards Association (VESA), have been directed towards video coding algorithms capable of latencies under one frame. Traditional video compression standards, such as H.264/AVC and HEVC, have latencies of multiple frames, as measured from the input of the encoding process to the output of the decoding process. Codecs complying with such standards may be termed 'distribution codecs', as they are intended to provide compression for distribution of video data from a source, such as a studio, to the end consumer, e.g. via terrestrial broadcast or internet streaming. Note that HEVC does have signalling support for latencies under one frame, in the form of a Decoding Unit Supplementary Enhancement Information (SEI) message. The Decoding Unit SEI message is an extension of the timing signalling present in the Picture Timing SEI message, allowing specification of the timing of units less than one frame. However, the signalling is insufficient to achieve very low latencies with minimal buffering, and the consequently tight coupling of the encoding and decoding processes. Applications requiring low latency are generally present within a broadcast studio. In a broadcast studio, video may be captured by a camera before undergoing several transformations, including real-time editing, graphic and overlay insertion and muxing. Once the video has been adequately processed, a distribution encoder is used to encode the video data for final distribution to end consumers. Within the studio, the video data is generally transported in an uncompressed format. This necessitates the use of very high speed links. Variants of the Serial Digital Interface (SDI) protocol can transport different video formats. For example, 3G-SDI (operating with a 3Gbps electrical link) can transport 1080p HDTV (1920x1080 resolution) at 30fps and 8 bits per sample. Interfaces having a fixed bit rate are suited to transporting data having a constant bit rate (CBR). Uncompressed video data is generally CBR, and compressed video data may also be CBR. As bit rates increase, achievable cabling lengths reduce, which becomes problematic for cable routing through a studio. For example, UHDTV (3840x2160) requires a 4X increase in bandwidth compared to 1080p HDTV, implying a 12Gbps interface. Increasing the data rate of a single electrical channel reduces the achievable length of the cabling. At 3Gbps, cable runs generally cannot exceed 150m, the minimum usable length for studio applications. One method of achieving higher rate links is by replicating cabling, e.g. by using four 3G-SDI links, with frame tiling or some other multiplexing scheme.
However, the cabling replicating method increases cable routing complexity, requires more physical space, and may reduce reliability compared to use of a single cable. Thus, a codec that can perform compression at relatively low compression ratios (e.g. 4:1) while retaining a 'visually lossless' (i.e. having no perceivable artefacts compared to the original video data) level of performance is desired.
Activity within VESA has produced a standard named Display Stream
Compression (DSC) and is standardising a newer variant named Advanced Display Stream Compression (ADSC). However, this activity is directed more towards distribution of high-resolution video data within electronic devices, such as smart phones and tablets, as a means of reducing the printed circuit board (PCB) routing difficulties for supporting very high resolutions (e.g. as used in 'retina' displays), by reducing the clock rate of the required PCB traces. As such, ADSC is targeting applications where a single encode-decode cycle ('single-generation' operation) is anticipated. Within a broadcast studio, video data is typically passed between several processing stages prior to final encoding for distribution. For passing UHD video data through bandwidth-limited interfaces, such as 3G-SDI, multiple encode-decode cycles ('multi-generational' operation) are anticipated. Then, the quality level of the video data must remain visually lossless after as many as seven encode-decode cycles.
Video data includes one or more colour channels. Generally there is one primary colour channel and two secondary colour channels. The primary colour channel is generally referred to as the 'luma' channel and the secondary colour channel(s) are generally referred to as the 'chroma' channels. Video data is represented using a colour space, such as 'YCbCr' or 'RGB'. For screen content applications, 'RGB' is commonly used, as this is the format generally used to drive LCD panels. Note that the greatest signal strength is present in the 'G' (green) channel, so generally the G channel is coded using the primary colour channel, and the remaining channels (i.e. 'B' and 'R') are coded using the secondary colour channels. This arrangement may be referred to as 'GBR'. When the
'YCbCr' colour space is in use, the 'Y' channel is coded using the primary colour channel and the 'Cb' and 'Cr' channels are coded using the secondary colour channels.
Video data is also represented using a particular chroma format. The primary colour channel and the secondary colour channels are spatially sampled at the same spatial density when the 4:4:4 chroma format is in use. For screen content, the commonly used chroma format is 4:4:4, as generally LCD panels provide pixels at a 4:4:4 chroma format. The bit-depth defines the bit width of samples in the respective colour channel, which implies a range of available sample values. Generally, all colour channels have the same bit-depth, although they may alternatively have different bit-depths. Other chroma formats are also possible. For example, if the chroma channels are sampled at half the rate vertically (compared to the luma channel), a 4:2:2 chroma format is said to be in use.
Also, if the chroma channels are sampled at half the rate horizontally and vertically
(compared to the luma channel), a 4:2:0 chroma format is said to be in use. These chroma formats exploit a characteristic of the human visual system that sensitivity to intensity is higher than sensitivity to colour. As such, it is possible to reduce sampling of the colour channels without causing undue visual impact. However, this property is less applicable to studio environments, where multiple generations of encoding and decoding are common. Also, for screen content the use of chroma formats other than 4:4:4 can be problematic as distortion is introduced to aliased text and sharp object edges.
Frame data may also contain a mixture of screen content and camera captured content. For example, a computer screen may include various windows, icons and control buttons, text, and also contain a video being played, or an image being viewed. Such content, in terms of the entirety of a computer screen, can be referred to as 'mixed content'. Moreover, the level of detail (or 'texture') varies within a frame. Generally, regions of detailed textures (e.g. foliage, text), or resulting from noise (e.g. from a camera sensor) are difficult to compress. The detailed textures can only be coded at a low compression ratio without losing detail. Conversely, regions with little detail (e.g. flat regions, sky, background from a computer application) can be coded with a high compression ratio, with little loss of detail.
In the context of sub-frame latency video compression, the buffering included in the video encoder and the video decoder is generally substantially smaller than one frame (e.g. only dozens of lines of samples). Then, the video encoder and video decoder must not only operate in real-time, but also with sufficiently tightly controlled timing that the available buffers do not underrun or overrun. In the context of real-time operation, it is not possible to stall input or delay output (e.g. due to buffer overrun or underrun). If input was stalled or output delayed, the result would be some highly noticeable distortion of the video data being passed through the video encoder and decoder. Thus, a need exists for algorithms to control the behaviour of the video encoder and decoder to avoid such situations. SUMMARY
It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
According to one aspect of the present application, there is provided a method of forming a portion of a video frame, the method comprising:
decoding pixel values of an encoded independent slice segment for a slice of the video frame, the independent slice segment having a first predetermined bit size;
decoding at least one dependent slice segment for the slice of the video frame having a second predetermined bit size, the at least one dependent slice segment being dependent on the independent slice segment using pixel values of the independent slice segment to determine pixel values of the dependent slice segment; and
forming the portion of the video frame using the pixel values of the decoded independent slice segment and the dependent slice segment. According to another aspect of the present application, there is provided a decoder for forming a portion of a video frame, the decoder comprising:
decoder module for decoding pixel values of an encoded independent slice segment for a slice of the video frame, the independent slice segment having a first predetermined bit size, and for decoding at least one dependent slice segment for the slice of the video frame having a second predetermined bit size, the at least one dependent slice segment being dependent on the independent slice segment using pixel values of the independent slice segment to determine pixel values of the dependent slice segment; and
forming module for forming the portion of the video frame using the pixel values of the decoded independent slice segment and the dependent slice segment.
According to still another aspect of the present application, there is provided a computer readable medium having a program stored thereon for forming a portion of a video frame, the program comprising:
code for decoding pixel values of an encoded independent slice segment for a slice of the video frame, the independent slice segment having a first predetermined bit size; code for decoding at least one dependent slice segment for the slice of the video frame having a second predetermined bit size, the at least one dependent slice segment being dependent on the independent slice segment using pixel values of the independent slice segment to determine pixel values of the dependent slice segment; and
code for forming the portion of the video frame using the pixel values of the decoded independent slice segment and the dependent slice segment.
According to still another aspect of the present application, there is provided a system for forming a portion of a video frame, the system comprising:
a memory for storing data and a computer readable medium;
a processor coupled to the memory for executing a computer program, the program having instructions for:
decoding pixel values of an encoded independent slice segment for a slice of the video frame, the independent slice segment having a first predetermined bit size;
decoding at least one dependent slice segment for the slice of the video frame having a second predetermined bit size, the at least one dependent slice segment being dependent on the independent slice segment using pixel values of the independent slice segment to determine pixel values of the dependent slice segment; and
forming the portion of the video frame using the pixel values of the decoded independent slice segment and the dependent slice segment.
According to still another aspect of the present application, there is provided a method of forming an encoded portion of a video frame, the method comprising:
encoding an independent slice segment for a slice of the video frame, the independent slice segment having a first predetermined bit size, the first predetermined bit size being determined according to an encoder rate control target;
encoding a dependent slice segment, for the slice of the video stream, having a second predetermined bit size smaller than the first predetermined bit size, the second predetermined bit size being determined according to the encoder rate control target, wherein the at least one dependent slice segment is dependent on the independent slice segment using pixel values of the independent slice segment to determine pixel values of the dependent slice segment; and storing the portion of the video frame using the pixel values of the encoded independent slice segment and the dependent slice segment to form the encoded portion of the video frame.
According to still another aspect of the present application, there is provided an encoder for forming a portion of a video frame, the encoder comprising:
an encoder module for encoding an independent slice segment for a slice of the video frame, the independent slice segment having a first predetermined bit size, the first predetermined bit size being determined according to an encoder rate control target, and for encoding a dependent slice segment, for the slice of the video stream, having a second predetermined bit size smaller than the first predetermined bit size, the second
predetermined bit size being determined according to the encoder rate control target, wherein the at least one dependent slice segment is dependent on the independent slice segment using pixel values of the independent slice segment to determine pixel values of the dependent slice segment;
a storage module for storing the portion of the video frame using the pixel values of the encoded independent slice segment and the dependent slice segment to form the encoded portion of the video frame.
According to still another aspect of the present application, there is provided a computer readable medium having a program stored thereon for forming an encoded portion of a video frame, the program comprising:
code for encoding an independent slice segment for a slice of the video frame, the independent slice segment having a first predetermined bit size, the first predetermined bit size being determined according to an encoder rate control target;
code for encoding a dependent slice segment, for the slice of the video stream, having a second predetermined bit size smaller than the first predetermined bit size, the second predetermined bit size being determined according to the encoder rate control target, wherein the at least one dependent slice segment is dependent on the independent slice segment using pixel values of the independent slice segment to determine pixel values of the dependent slice segment; and
code for storing the portion of the video frame using the pixel values of the encoded independent slice segment and the dependent slice segment to form the encoded portion of the video frame.

According to still another aspect of the present application, there is provided a system for forming an encoded portion of a video frame, the system comprising:
a memory for storing data and a computer readable medium;
a processor coupled to the memory for executing a computer program, the program having instructions for:
encoding an independent slice segment for a slice of the video frame, the independent slice segment having a first predetermined bit size, the first
predetermined bit size, the first predetermined bit size being determined according to an encoder rate control target;

encoding a dependent slice segment, for the slice of the video stream, having a second predetermined bit size smaller than the first predetermined bit size, the second predetermined bit size being determined according to the encoder rate control target, wherein the at least one dependent slice segment is dependent on the independent slice segment using pixel values of the independent slice segment to determine pixel values of the dependent slice segment; and
storing the portion of the video frame using the pixel values of the encoded independent slice segment and the dependent slice segment to form the encoded portion of the video frame.
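By way of illustration only, and not as a statement of the claimed method, the following Python sketch shows the encode-side flow implied by the above aspects: one independent slice segment coded with a larger bit budget, followed by dependent slice segments coded with a smaller budget, both budgets being supplied by rate control. The names encode_slice and encode_segment are hypothetical; encode_segment stands in for a real encoder call.

def encode_slice(ctu_rows, independent_budget, dependent_budget, encode_segment):
    # First CTU row of the slice: the independent slice segment, which
    # does not reference pixel values of any preceding slice segment.
    coded = [encode_segment(ctu_rows[0], independent_budget, dependent_on=None)]
    # Remaining rows: dependent slice segments, each able to use pixel
    # values of the preceding segment to determine its own pixel values.
    for row in ctu_rows[1:]:
        coded.append(encode_segment(row, dependent_budget, dependent_on=coded[-1]))
    return coded

# Hypothetical stub standing in for a real encoder; budgets in bits.
stub = lambda row, budget, dependent_on: (row, budget, dependent_on is not None)
print(encode_slice(["row0", "row1", "row2"], 5000, 4000, stub))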
Other aspects are also disclosed.
BRIEF DESCRIPTION OF THE DRAWINGS

At least one embodiment of the present invention will now be described with reference to the following drawings and appendices, in which:
Fig. 1 is a schematic block diagram showing a sub-frame latency video encoding and decoding system;
Figs. 2A and 2B form a schematic block diagram of a general purpose computer system upon which one or both of the devices of the video encoding and decoding system of Fig. 1 may be practiced;
Fig. 3 is a schematic block diagram showing functional modules of a video encoder;

Fig. 4 is a schematic block diagram showing functional modules of a video decoder;
Fig. 5A is a schematic block diagram showing square coding tree unit (CTU) configurations for the sub-frame latency video encoding and decoding system of Fig. 1;

Fig. 5B is a schematic block diagram showing non-square coding tree unit (CTU) configurations for the sub-frame latency video encoding and decoding system of Fig. 1;
Fig. 5C is a schematic block diagram showing square block configurations for the sub-frame latency video encoding and decoding system of Fig. 1;
Fig. 6A is a schematic diagram showing a decomposition of a frame into a set of slices, suitable for use with the sub-frame latency video encoding and decoding system of Fig. 1;
Fig. 6B is a schematic diagram showing coupling of timing of the video encoder of Fig. 1 regarding receipt of uncompressed video data from a video source 112 and delivery of compressed video data;

Fig. 7 is a schematic diagram of the data flow of the sub-frame latency video encoding and decoding system of Fig. 1; and
Fig. 8 is a schematic flow diagram showing a method for forming an encoded portion of a frame.
DETAILED DESCRIPTION INCLUDING BEST MODE

Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
Fig. 1 is a schematic block diagram showing functional modules of a sub-frame latency video encoding and decoding system 100. The system 100 may use a rate control mechanism to ensure delivery of portions of a frame by video encoder 114 within a timeframe that allows video decoder 134 to deliver decoded frame data in real time. The rate control mechanism ensures that no buffer underruns and resulting failure to deliver decoded video occur (e.g. due to variations in the complexity of the incoming video data to the video encoder 114 and in the time taken for encoder searching of possible modes), so that decoded video frames from the video decoder 134 are delivered according to the timing of the interface over which the video frames are delivered. The interface over which the video frames are delivered may be, for example, SDI. Interfaces such as SDI have sample timing synchronised to a clock source, with horizontal and vertical blanking periods. As such, samples of the decoded video need to be delivered in accordance with the frame timing of the SDI link. Video data formatted for transmission over SDI may also be conveyed over Ethernet, e.g. using methods as specified in SMPTE ST. 2022-6. In the event that samples were not delivered according to the required timing, noticeable visual artefacts would result (e.g. from invalid data being interpreted as sample values by the downstream device). Accordingly, the rate control mechanism ensures that no buffer overruns and resulting inability to process incoming video data occur. A similar constraint exists for the inbound SDI link to the video encoder 114, which needs to encode samples in accordance with arrival timing and may not stall incoming video data to the video encoder 114, e.g. due to varying processing demand for encoding different regions of a frame.
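The buffer behaviour that the rate control mechanism must guarantee can be sketched with a simple occupancy model. The following Python sketch is a minimal illustration only; the segment sizes, drain rate and buffer capacity are hypothetical values, not parameters of the described system.

def simulate_cbr_buffer(segment_bits, drain_bits_per_segment, capacity_bits):
    # Track buffer occupancy as coded segments are deposited and the
    # channel drains at a fixed rate. Overrun is flagged; a fuller model
    # would also flag decoder-side underrun when the buffer empties early.
    occupancy = 0
    history = []
    for bits in segment_bits:
        occupancy += bits
        if occupancy > capacity_bits:
            raise OverflowError("buffer overrun: coding exceeded the CBR target")
        occupancy = max(occupancy - drain_bits_per_segment, 0)
        history.append(occupancy)
    return history

# Hypothetical: segments averaging 4096 bits into an 8192-bit buffer.
print(simulate_cbr_buffer([4000, 4100, 4200, 3900], 4096, 8192))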
As mentioned previously, the video encoding and decoding system 100 has a latency of less than one frame of video data. In particular, some applications require latencies as low as 32 lines of video data from the input of the video encoder 114 to the output of the video decoder 134. The latency may include time taken during input/output of video data and storage of partially-coded video data prior to and after transit over a communications channel. Generally, video data is transmitted and received in raster scan order, e.g. over an SDI link. However, the video encoding and decoding system 100 processes video data in coding tree units "CTUs". Each frame is divided into an array of square-shaped CTUs. The video encoder 114 requires all samples in a given CTU before encoding of that CTU can begin. The structure of a CTU is described further with reference to Fig. 5A. The structure of a frame is described further with reference to Fig. 6A, and the timing of the video encoding and decoding system 100 is described further with reference to Fig. 7. An alternative structure, using a non-square shaped CTU, is described with reference to Fig. 5B.
The system 100 includes a source device 110 and a destination device 130. A communication channel 120 is used to communicate encoded video information from the source device 110 to the destination device 130. In some arrangements, the source device 110 and destination device 130 may comprise respective broadcast studio equipment, such as overlay insertion and real-time editing modules, in which case the communication channel 120 may be an SDI link. In other arrangements, the source device 110 and destination device 130 may comprise a graphics driver as part of a system-on-chip (SOC) and an LCD panel (e.g. as found in a smart phone, tablet or laptop computer), in which case the communication channel 120 is typically a wired channel, such as PCB trackwork and associated connectors. Moreover, the source device 110 and the destination device 130 may comprise any of a wide range of devices, including devices supporting over the air television broadcasts, cable television applications, internet video applications and applications where encoded video data is captured on some storage medium or a file server. The source device 110 may also be a digital camera capturing video data and outputting the video data in a compressed format offering visually lossless compression; as such, the performance may be considered equivalent to a truly lossless format (e.g. uncompressed).
As shown in Fig. 1, the source device 110 includes a video source 112, the video encoder 114 and a transmitter 116. The video source 112 typically comprises a source of captured video frame data, such as an imaging sensor, a previously captured video sequence stored on a non-transitory recording medium, or a video feed from a remote imaging sensor. The video source 112 may also be the output of a computer graphics card, e.g. displaying the video output of an operating system and various applications executing upon a computing device, for example a tablet computer. Such content is an example of 'screen content'. Examples of source devices 110 that may include an imaging sensor as the video source 112 include smart-phones, video camcorders and network video cameras. The video encoder 114 converts the captured frame data from the video source 112 into encoded video data and will be described further with reference to Fig. 3.
The video encoder 114 encodes a given frame as the frame is being input to the video encoder 114. The frame is generally input to the video encoder 114 as a sequence of samples in raster scan order, from the uppermost line in the frame to the lowermost line in the frame. The video encoder 114 is required to process the incoming sample data in real time, i.e., it is not able to stall the incoming sample data if the rate of processing the incoming data were to fall below the input data rate. The encoded video data is typically an encoded bitstream containing a sequence of blocks of compressed video data. In a video streaming application, the entire bitstream is not stored in any one location. Instead, the blocks of compressed video data are continually being produced by the encoder and consumed by the decoder, with intermediate storage, e.g., in the communication channel 120. Blocks of compressed video data are transmitted by the transmitter 116 over the communication channel 120 (e.g. an SDI link) as encoded video data (or "encoded video information"). A coded picture buffer is used to store a portion of the frame in encoded form and generally comprises a non-transitory memory buffer. It is also possible for the encoded video data to be stored in a non-transitory storage device 122, such as a "Flash" memory or a hard disk drive, until later being transmitted over the communication channel 120, or in lieu of transmission over the communication channel 120.
The destination device 130 includes a receiver 132, a video decoder 134 and a display device 136. The receiver 132 receives encoded video data from the
communication channel 120 and passes received video data to the video decoder 134. The video decoder 134 then outputs decoded frame data to the display device 136. Examples of the display device 136 include a cathode ray tube and liquid crystal displays, such as those found in smart-phones, tablet computers, computer monitors or stand-alone television sets. It is also possible for the functionality of each of the source device 110 and the destination device 130 to be embodied in a single device, examples of which include mobile telephone handsets and tablet computers. Notwithstanding the example devices mentioned above, each of the source device
110 and destination device 130 may be configured within a general purpose computing system, typically through a combination of hardware and software components. Fig. 2A illustrates such a computer system 200, which includes: a computer module 201; input devices such as a keyboard 202, a mouse pointer device 203, a scanner 226, a camera 227, which may be configured as the video source 112, and a microphone 280; and output devices including a printer 215, a display device 214, which may be configured as the display device 136, and loudspeakers 217. An external Modulator-Demodulator (Modem) transceiver device 216 may be used by the computer module 201 for communicating to and from a communications network 220 via a connection 221. The communications network 220, which may represent the communication channel 120, may be a wide-area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 221 is a telephone line, the modem 216 may be a traditional "dial-up" modem. Alternatively, where the connection 221 is a high capacity (e.g., cable) connection, the modem 216 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 220. The transceiver device 216 may provide the functionality of the transmitter 116 and the receiver 132 and the communication channel 120 may be embodied in the connection 221. The computer module 201 typically includes at least one processor unit 205, and a memory unit 206. For example, the memory unit 206 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 201 also includes a number of input/output (I/O) interfaces including: an audio-video interface 207 that couples to the video display 214, loudspeakers 217 and
microphone 280; an I/O interface 213 that couples to the keyboard 202, mouse 203, scanner 226, camera 227 and optionally a joystick or other human interface device (not illustrated); and an interface 208 for the external modem 216 and printer 215. The signal from the audio-video interface 207 to the computer monitor 214 is generally the output of a computer graphics card and provides an example of 'screen content'. In some
implementations, the modem 216 may be incorporated within the computer module 201, for example within the interface 208. The computer module 201 also has a local network interface 211, which permits coupling of the computer system 200 via a connection 223 to a local-area communications network 222, known as a Local Area Network (LAN). As illustrated in Fig. 2A, the local communications network 222 may also couple to the wide network 220 via a connection 224, which would typically include a so-called "firewall" device or device of similar functionality. The local network interface 211 may comprise an Ethernet™ circuit card, a Bluetooth™ wireless arrangement or an IEEE 802.11 wireless arrangement; however, numerous other types of interfaces may be practiced for the interface 211. The local network interface 211 may also provide the functionality of the transmitter 116 and the receiver 132 and communication channel 120 may also be embodied in the local communications network 222.
The I/O interfaces 208 and 213 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 209 are provided and typically include a hard disk drive (HDD) 210. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 212 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g. CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the computer system 200. Typically, any of the HDD 210, optical drive 212, networks 220 and 222 may also be configured to operate as the video source 112, or as a destination for decoded video data to be stored for reproduction via the display 214. The source device 110 and the destination device 130 of the system 100, or either of those devices individually, may be embodied in the computer system 200.
The components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204 and in a manner that results in a conventional mode of operation of the computer system 200 known to those in the relevant art. For example, the processor 205 is coupled to the system bus 204 using a connection 218. Likewise, the memory 206 and optical disk drive 212 are coupled to the system bus 204 by connections 219. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun SPARCstations, Apple Mac™ or similar computer systems.
Where appropriate or desired, the video encoder 114 and the video decoder 134, as well as methods described below, may be implemented using the computer system 200, wherein the video encoder 114, the video decoder 134 and the methods to be described may be implemented as one or more software application programs 233 executable within the computer system 200. In particular, the video encoder 114, the video decoder 134 and the steps of the described methods are effected by instructions 231 (see Fig. 2B) in the software 233 that are carried out within the computer system 200. The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules performs the described methods and a second part and the corresponding code modules manage a user interface between the first part and the user.
The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 200 from the computer readable medium, and then executed by the computer system 200. A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program product in the computer system 200 preferably effects an advantageous apparatus for implementing the video encoder 114, the video decoder 134 and the described methods.
The software 233 is typically stored in the HDD 210 or the memory 206. The software is loaded into the computer system 200 from a computer readable medium, and executed by the computer system 200. Thus, for example, the software 233 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 225 that is read by the optical disk drive 212.
In some instances, the application programs 233 may be supplied to the user encoded on one or more CD-ROMs 225 and read via the corresponding drive 212, or alternatively may be read by the user from the networks 220 or 222. Still further, the software can also be loaded into the computer system 200 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 200 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray Disc™, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 201. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of the software, application programs, instructions and/or video data or encoded video data to the computer
module 201 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
The second part of the application programs 233 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 214. Through manipulation of typically the keyboard 202 and the mouse 203, a user of the computer system 200 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be
implemented, such as an audio interface utilizing speech prompts output via the
loudspeakers 217 and user voice commands input via the microphone 280.

Fig. 2B is a detailed schematic block diagram of the processor 205 and a
"memory" 234. The memory 234 represents a logical aggregation of all the memory modules (including the HDD 209 and semiconductor memory 206) that can be accessed by the computer module 201 in Fig. 2A. When the computer module 201 is initially powered up, a power-on self-test
(POST) program 250 executes. The POST program 250 is typically stored in a ROM 249 of the semiconductor memory 206 of Fig. 2A. A hardware device such as the ROM 249 storing software is sometimes referred to as firmware. The POST program 250 examines hardware within the computer module 201 to ensure proper functioning and typically checks the processor 205, the memory 234 (209, 206), and a basic input-output systems software (BIOS) module 251, also typically stored in the ROM 249, for correct operation. Once the POST program 250 has run successfully, the BIOS 251 activates the hard disk drive 210 of Fig. 2A. Activation of the hard disk drive 210 causes a bootstrap loader program 252 that is resident on the hard disk drive 210 to execute via the processor 205. This loads an operating system 253 into the RAM memory 206, upon which the operating system 253 commences operation. The operating system 253 is a system level application, executable by the processor 205, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface. The operating system 253 manages the memory 234 (209, 206) to ensure that each process or application running on the computer module 201 has sufficient memory in which to execute without colliding with memory allocated to another process.
Furthermore, the different types of memory available in the computer system 200 of Fig. 2A need to be used properly so that each process can run effectively. Accordingly, the aggregated memory 234 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 200 and how such is used.
As shown in Fig. 2B, the processor 205 includes a number of functional modules including a control unit 239, an arithmetic logic unit (ALU) 240, and a local or internal memory 248, sometimes called a cache memory. The cache memory 248 typically includes a number of storage registers 244-246 in a register section. One or more internal busses 241 functionally interconnect these functional modules. The processor 205 typically also has one or more interfaces 242 for communicating with external devices via the system bus 204, using a connection 218. The memory 234 is coupled to the bus 204 using a connection 219.
The application program 233 includes a sequence of instructions 231 that may include conditional branch and loop instructions. The program 233 may also include data 232 which is used in execution of the program 233. The instructions 231 and the data 232 are stored in memory locations 228, 229, 230 and 235, 236, 237, respectively. Depending upon the relative size of the instructions 231 and the memory locations 228-230, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 230. Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 228 and 229.
In general, the processor 205 is given a set of instructions which are executed therein. The processor 205 waits for a subsequent input, to which the processor 205 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input
devices 202, 203, data received from an external source across one of the
networks 220, 222, data retrieved from one of the storage devices 206, 209 or data retrieved from a storage medium 225 inserted into the corresponding reader 212, all depicted in Fig. 2A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 234.
The video encoder 114, the video decoder 134 and the described methods may use input variables 254, which are stored in the memory 234 in corresponding memory locations 255, 256, 257. The video encoder 114, the video decoder 134 and the described methods produce output variables 261, which are stored in the memory 234 in
corresponding memory locations 262, 263, 264. Intermediate variables 258 may be stored in memory locations 259, 260, 266 and 267.
Referring to the processor 205 of Fig. 2B, the registers 244, 245, 246, the arithmetic logic unit (ALU) 240, and the control unit 239 work together to perform sequences of micro-operations needed to perform "fetch, decode, and execute" cycles for every instruction in the instruction set making up the program 233. Each fetch, decode, and execute cycle comprises:
(a) a fetch operation, which fetches or reads an instruction 231 from a memory location 228, 229, 230;
(b) a decode operation in which the control unit 239 determines which instruction has been fetched; and
(c) an execute operation in which the control unit 239 and/or the ALU 240 execute the instruction.
Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 239 stores or writes a value to a memory location 232.
Each step or sub-process in the method of Fig. 8, to be described, is associated with one or more segments of the program 233 and is typically performed by the register section 244, 245, 246, the ALU 240, and the control unit 239 in the processor 205 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 233.
Fig. 3 is a schematic block diagram showing functional modules of the video encoder 114. Fig. 4 is a schematic block diagram showing functional modules of the video decoder 134. Generally, data is passed between functional modules within the video encoder 114 and the video decoder 134 in blocks or arrays (e.g., blocks of samples or blocks of transform coefficients). Where a functional module is described with reference to the behaviour of individual array elements (e.g., samples or transform coefficients), the behaviour shall be understood to be applied to all array elements. The video encoder 114 and video decoder 134 may be implemented using a general-purpose computer system 200, as shown in Figs. 2A and 2B, where the various functional modules may be implemented by dedicated hardware within the computer system 200, by software executable within the computer system 200 such as one or more software code modules of the software application program 233 resident on the hard disk drive 210 and being controlled in its execution by the processor 205, or alternatively by a combination of dedicated hardware and software executable within the computer system 200. The video encoder 114, the video decoder 134 and the described methods may alternatively be implemented in dedicated hardware, such as one or more integrated circuits performing the functions or sub-functions of the described methods. Such dedicated hardware may include graphic processors, digital signal processors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or one or more microprocessors and associated memories. In particular, the video encoder 114 comprises modules 320-348 and the video decoder 134 comprises modules 420-432, which may each be implemented as one or more software code modules of the software application program 233, or an FPGA 'bitstream file' that configures internal logic blocks in the FPGA to realise the video encoder 114 and the video decoder 134.
Although the video encoder 114 of Fig. 3 is an example of a low latency video encoding pipeline, other video codecs may also be used to perform the processing stages described herein. The video encoder 114 receives captured frame data, such as a series of frames, each frame including one or more colour channels.
The video encoder 114 divides each frame of the captured frame data, such as frame data 310, into regions generally referred to as 'coding tree units' (CTUs), with side sizes which are powers of two. The coding tree units (CTUs) in a frame are scanned in raster scan order and the sequentially scanned coding tree units (CTUs) are grouped into 'slices'. A division of each frame into multiple slices provides 'random access'
(commencing of decoding at a point other than the start of a bitstream) at each slice boundary within a frame. The term coding tree unit (CTU) refers collectively to all colour channels of the frame. Every coding tree unit (CTU) includes one coding tree block (CTB) for each colour channel. For example, in a frame coded using the YCbCr colour space, a coding tree unit (CTU) consists of three coding tree blocks (CTBs) for the Y, Cb and Cr colour planes corresponding to the same spatial location in the picture. The size of individual coding tree blocks (CTBs) may vary across colour components and generally depends on the selected 'chroma format'. For example, for the 4:4:4 chroma format, the sizes of the coding tree blocks (CTBs) will be the same. For the 4:2:0 chroma format, the dimensions of chroma coding tree blocks (CTBs) in samples are halved (both horizontally and vertically) relative to the size of the luma coding tree block (CTB). The size of a coding tree unit (CTU) is specified as the size of the corresponding luma coding tree block (CTB). The sizes of the chroma coding tree blocks (CTBs) are inferred from the size of the coding tree unit (CTU) and the chroma format.
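The inference of chroma CTB sizes from the CTU size and chroma format can be sketched as follows; the divisors are the standard subsampling factors for each chroma format, and the function name is illustrative only.

CHROMA_DIVISORS = {
    "4:2:0": (2, 2),  # chroma halved horizontally and vertically
    "4:2:2": (2, 1),  # chroma halved horizontally only
    "4:4:4": (1, 1),  # chroma at full resolution
}

def chroma_ctb_size(luma_ctb_size, chroma_format):
    # Infer chroma CTB dimensions from the luma CTB size (the CTU size).
    dx, dy = CHROMA_DIVISORS[chroma_format]
    return luma_ctb_size // dx, luma_ctb_size // dy

print(chroma_ctb_size(8, "4:2:0"))  # (4, 4)
print(chroma_ctb_size(8, "4:4:4"))  # (8, 8)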
Each coding tree unit (CTU) includes a hierarchical quad-tree subdivision of a portion of the frame with a collection of 'coding units' (CUs), such that at each leaf node of the hierarchical quad-tree subdivision one coding unit (CU) exists. The subdivision can be continued until the coding units (CUs) present at the leaf nodes have reached a specific predetermined minimum size. The specific minimum size is referred to as a smallest coding unit (SCU) size. Generally, the smallest coding unit (SCU) size is 8x8 luma samples, but other sizes are also possible, such as 16x16 or 32x32 luma samples. For low latency video coding, smaller CTUs are desirable, as the resulting smaller blocks require less buffering prior to encoding and less buffering after decoding for conversion to/from line-based raster scan input/output of samples. The corresponding coding block (CB) for the luma channel has the same dimensions as the coding unit (CU). The corresponding coding blocks (CBs) for the chroma channels have dimensions scaled according to the chroma format. If no subdivision of a coding tree unit (CTU) is done and a single coding unit (CU) occupies the whole coding tree unit (CTU), such a coding unit (CU) is referred to as a largest coding unit (LCU) (or maximum coding unit size). These dimensions are also specified in units of luma samples. As a result of the quad-tree hierarchy, the entirety of the coding tree unit (CTU) is occupied by one or more coding units (CUs). The largest coding unit size is signalled in the bitstream for a collection of frames known as a coded video sequence. For a given frame, the largest coding unit (LCU) size and the smallest coding unit (SCU) size do not vary.
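The general quad-tree decomposition described above can be sketched as a recursion that stops at the SCU size; note that in the low latency configurations described later the CTU is small enough that no subdivision actually occurs. The callback should_split is a hypothetical stand-in for the encoder's mode decision.

def decompose_ctu(x, y, size, scu_size, should_split):
    # Yield the (x, y, size) leaf coding units of one CTU, visiting
    # quadrants in Z-scan order (top-left, top-right, bottom-left,
    # bottom-right).
    if size > scu_size and should_split(x, y, size):
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                yield from decompose_ctu(x + dx, y + dy, half, scu_size, should_split)
    else:
        yield (x, y, size)

# Hypothetical split rule: subdivide everything larger than 16 samples.
print(list(decompose_ctu(0, 0, 32, 8, lambda x, y, s: s > 16)))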
The video encoder 114 produces one or more 'prediction units' (PUs) for each coding unit (CU). A PU includes all colour channels and is divided into one prediction block (PB) per colour channel. Various arrangements of prediction units (PUs) in each coding unit (CU) are possible and each arrangement of prediction units (PUs) in a coding unit (CU) is referred to as a 'partition mode'. It is a requirement that the prediction units (PUs) do not overlap and that the entirety of the coding unit (CU) is occupied by the one or more prediction units (PUs). Such a requirement ensures that the prediction units (PUs) cover the entire frame area. A partitioning of a coding unit (CU) into prediction units (PUs) implies subdivision of the coding blocks (CBs) for each colour component into
'prediction blocks' (PBs). Depending on the chroma format used, the sizes of prediction blocks (PBs) corresponding to the same coding unit (CU) for different colour components may differ. For coding units (CUs) configured to use intra-prediction, two partition modes are possible, known as 'PART_2Nx2N' and 'PART_NxN'. The PART_2Nx2N partition mode results in one prediction unit (PU) being associated with the coding unit (CU) and occupying the entirety of the coding unit (CU). The PART_NxN partition mode results in four prediction units (PUs) being associated with the coding unit (CU) and collectively occupying the entirety of the coding unit (CU) by each occupying one quadrant of the coding unit (CU).
The video encoder 114 operates by outputting a prediction unit (PU) 378. When intra-prediction is used, a transform block (TB)-based reconstruction process is applied for each colour channel. The TB-based reconstruction process results in the prediction unit (PU) 378 being derived on a TB basis. As such, a residual quad-tree decomposition of the coding unit (CU) associated with the prediction unit (PU) indicates the arrangement of TUs, and hence TBs, to be reconstructed to reconstruct the PU 378. A difference module 344 produces a 'residual sample array' 360. The residual sample array 360 is the difference between the PU 378 and a corresponding 2D array of data samples from a coding unit (CU) of the coding tree block (CTB) of the frame data 310. The difference is calculated for corresponding samples at each location in the array. The transform module 320 may apply a forward DCT to transform the residual sample array 360 into the frequency domain, producing 'transform coefficients'. An 8x8 CU is always divided into an 8x8 TU; however, multiple configurations of the 8x8 TU are possible, as described further with reference to Figs. 5A and 5B.
Within the TU, individual TBs are present and TB boundaries do not cross PB boundaries. As such, when the coding unit (CU) is configured to use a PART_NxN partition mode, the associated residual quad-tree (RQT) is inferred to have a subdivision at the top level of the hierarchy of subdivisions, resulting in four 4x4 TBs being associated with the luma channel of the CU. A rate control module 348 ensures that the bit rate of the encoded data meets a predetermined constraint. The predetermined constraint may be referred to as a rate control target. As the quantity of bits required to represent each CU varies, the rate control target can only be met by averaging across multiple CUs.
Moreover, each run of CUs (or CTUs) forms a 'slice' and the size of each slice is fixed. The fixed size of each slice facilitates architectures using parallelism, as it becomes possible to determine the start location of each slice without having to search for markers in the bitstream. The encoder may also encode multiple slices in parallel, storing the slices progressively as the slices are produced. The predetermined constraint may be determined by the capacity of the communications channel 120, or some other requirement. For example, the predefined constraint is for operation at a 'constant bit rate' (CBR). As such, the encoder rate control target may be determined according to a constant bit rate channel capacity for a target communication channel (e.g., the channel 120) to carry video data containing a video frame.
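The relationship between a CBR channel capacity and the fixed slice size can be made concrete with simple arithmetic; the channel rate, frame rate and slice count below are hypothetical values chosen for illustration.

def slice_bit_budget(channel_bps, frame_rate, slices_per_frame):
    # With fixed-size slices, the per-slice budget is the channel bits
    # available per frame divided by the number of slices per frame.
    bits_per_frame = channel_bps / frame_rate
    return bits_per_frame / slices_per_frame

# Hypothetical: 3 Gb/s channel, 60 frames/s, one 8-line slice per CTU
# row of a 2160-line frame gives 270 slices per frame.
print(slice_bit_budget(3e9, 60, 2160 // 8))  # ~185185 bits per slice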
The constraint operates at a sub-frame level, and, due to channel rate limitations and intermediate buffer size limitations, also imposes timing constraints on the delivery of blocks of compressed video data by the video encoder 114. In particular, to ensure the fixed size requirement of each slice is met, the cumulative cost of the CTUs within each slice must not exceed the fixed size requirement. The cost may be less than the fixed size requirement. The timing constraints are discussed further with reference to Figs. 6A and 6B. The rate control module may also influence the selection of prediction modes within the video encoder 114, as discussed with reference to the method 800 of Fig. 8. For example, particular prediction modes have lower bit cost to code a block compared to other prediction modes and are thus considered low cost, albeit offering poor performance in terms of quality. Then, if the remaining available bits to code a given slice segment fall below a threshold (the threshold being updated for each coded block in the slice segment), the rate control module 348 enters a 'fallback' state where the remaining blocks in the slice segment are coded using this low cost prediction mode. As such, CBR operation is guaranteed, regardless of the complexity of the incoming uncompressed video data.
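The fallback decision can be sketched as follows; the block costs, fallback cost and budget below are hypothetical, and a real encoder would derive them from its rate control state rather than a fixed list.

def code_slice_segment(block_costs, fallback_cost, budget_bits):
    # Choose 'normal' or 'fallback' per block. Fallback is entered, and
    # never left, once coding a block normally could leave too few bits
    # to code every later block even at the guaranteed fallback cost.
    remaining = budget_bits
    modes = []
    in_fallback = False
    for i, cost in enumerate(block_costs):
        blocks_left = len(block_costs) - i
        if not in_fallback and remaining - cost < (blocks_left - 1) * fallback_cost:
            in_fallback = True
        if in_fallback:
            modes.append("fallback")
            remaining -= fallback_cost
        else:
            modes.append("normal")
            remaining -= cost
    return modes

print(code_slice_segment([120, 300, 500, 90], fallback_cost=64, budget_bits=700))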
A quantisation parameter (QP) 384 is output from the rate control module 348. The QP 384 varies on a block by block basis as the frame is being encoded. In particular, the QP 384 is signalled using a 'delta QP' syntax element, signalled at most once per transform unit (TU). Delta QP is only signalled when at least one significant residual coefficient is present for the TU. Other methods for controlling the QP 384 are also possible. The QP defines a divisor applied by a quantiser module 322 to the transform coefficients 362 to produce residual coefficients 364. The remainder of the division operation in the quantiser module 322 is discarded. Lower QPs result in larger magnitude residual coefficients but with a smaller range of remainders to discard. As such, lower QPs give a higher quality at the video decoder 134 output, at the expense of a lower
compression ratio. Note that the compression ratio is influenced by a combination of the QP 384 and the magnitude of the transform coefficients 362. The magnitude of the transform coefficients 362 relates to the complexity of the incoming uncompressed video data and the ability of the selected prediction mode to predict the contents of the uncompressed video data. Thus, overall compression efficiency is only indirectly influenced by the QP 384 and varies along each slice segment as the complexity of the data at each block varies. The residual coefficients 364 are an array of values having the same dimensions as the residual sample array 360. The residual coefficients 364 provide a frequency domain representation of the residual sample array 360 when a transform is applied. The residual coefficients 364 and determined quantisation parameter 384 are taken as input to a dequantiser module 326.
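The loss introduced by the quantiser can be sketched with a plain integer step size standing in for the codec's actual QP-to-step-size mapping; the coefficient values below are hypothetical.

def quantise(coeffs, step):
    # Round toward zero, discarding the remainder (the source of loss).
    return [int(c / step) for c in coeffs]

def dequantise(levels, step):
    # Rescale; the discarded remainders cannot be recovered.
    return [l * step for l in levels]

coeffs = [100, -37, 5, 0]
levels = quantise(coeffs, step=8)
print(levels)                 # [12, -4, 0, 0]
print(dequantise(levels, 8))  # [96, -32, 0, 0]: close to, but not, the input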
The dequantiser module 326 reverses the scaling performed by the quantiser module 322 to produce rescaled transform coefficients 366. The rescaled transform coefficients are rescaled versions of the residual coefficients 364. The residual coefficients 364 and the determined quantisation parameter 384 are also taken as input to an entropy encoder module 324. The entropy encoder module 324 encodes the values of the residual coefficients 364 in the encoded bitstream 312 (or 'video bitstream'). Due to the loss of precision resulting from the operation of the quantiser module 322, the rescaled transform coefficients 366 are not identical to the original values present in the transform coefficients 362. The rescaled transform coefficients 366 from the dequantiser module 326 are then output to an inverse transform module 328. The inverse transform module 328 performs an inverse transform from the frequency domain to the spatial domain to produce a spatial-domain representation 368 of the rescaled transform coefficients 366. The spatial-domain representation 368 is substantially identical to a spatial domain
representation that is produced at the video decoder 134. The spatial-domain
representation 368 is then input to a summation module 342.
The intra-frame prediction module 336 produces an intra-predicted prediction unit (PU) 378 using reconstructed samples 370 obtained from the summation module 342. In particular, the intra-frame prediction module 336 uses samples from neighbouring blocks (i.e. above, left or above-left of the current block) that have already been reconstructed to produce intra-predicted samples for the current prediction unit (PU). When a neighbouring block is not available (e.g. at the frame or independent slice segment boundary) the neighbouring samples are considered as 'not available' for reference. In such cases, a default value is used instead of the neighbouring sample values. Typically, the default value (or 'half-tone') is equal to half of the range implied by the bit-depth. For example, when the video encoder 114 is configured for a bit-depth of eight (8), the default value is 128. The summation module 342 sums the prediction unit (PU) 378 from the intra-frame prediction module 336 and the spatial domain output of the inverse transform module 328. Prediction units (PUs) may be generated using an intra-prediction method. Intra-prediction methods make use of data samples adjacent to the prediction unit (PU) that have previously been reconstructed (typically above and to the left of the prediction unit) in order to generate reference data samples within the prediction unit (PU). Thirty-three angular intra-prediction modes are available. Additionally, a 'DC mode' and a 'planar mode' are also available for intra-prediction, to give a total of thirty-five (35) available intra-prediction modes. An intra-prediction mode 388 indicates which one of the thirty-five available intra-prediction modes is selected for the current prediction unit (PU) when the prediction unit (PU) is configured to use intra-prediction (i.e. as indicated by the prediction mode 386). The summation module 342 produces the reconstructed samples 370 that are stored in a reconstructed picture buffer 332. Standards such as HEVC specify filtering stages, such as sample adaptive offset (SAO) or deblocking. Such filtering is generally beneficial, e.g. for removing blocking artefacts, at the higher compression ratios (e.g. 50:1 to 100:1) typically seen in applications such as distribution of compressed video data across the internet to households, or broadcast. The video encoder 114 does not perform filtering operations such as adaptive loop filter, SAO or deblocking filtering. The video encoder 114 is intended for operation at lower compression ratios, e.g. 4:1 to 6:1 or even 8:1. At such compression ratios, these additional filtering stages have little impact on the frame data, and thus the complexity of the additional filtering operations is not justified by the resulting small improvement in quality. The reconstructed picture buffer 332 is configured within the memory 206 and provides storage for at least a portion of the frame, acting as an intermediate buffer for storage of samples to be used for reference for subsequent intra-predicted blocks.
The entropy encoder 324 encodes the residual coefficients 364, the QP 384 and other parameters, collectively referred to as 'syntax elements', into the encoded bitstream 312 as sequences of symbols. At targeted compression ratios of 4:1 to 8:1, the data rates for video data at UHD resolutions are very high. At such data rates, techniques such as arithmetic coding, in particular the context adaptive binary arithmetic coding (CABAC) algorithm of HEVC, are not feasible. One issue is that the use of adaptive contexts requires large memory bandwidth to the context memory for updating the probability associated with each context-coded bin in a syntax element. Another issue is the inherently serial nature of coding and decoding each bin into the bitstream. Even bins coded as so-called 'equi-probable' or 'bypass-coded' bins have a serial process that limits parallelism to only a few bins per clock cycle. At compression ratios such as 4:1 to 8:1, the bin rate is extremely high; for example, for UHD 4:4:4 10-bit 60 frame per second video data, the data rate is 14.93Gb/s uncompressed, so compressed data rates between 1.866 and 3.732Gb/s can be expected. Hence, in the video processing system 100, the use of adaptive probabilities for coding of bins is disabled. Consequently, all bins are coded in the 'equi-probable state', i.e. bin probabilities equally assigned between '0' bins and '1' bins. As a consequence, there is alignment between bins and bits in the encoded bitstream 312, which results in the ability to directly code bins into the bitstream and read bins from the bitstream as bits. Then, the encoded bitstream effectively contains only variable length and fixed length codewords, each codeword including an integer number of (equi-probable) bits. The absence of misalignment between (bypass coded) bins and bits greatly simplifies the design of the entropy encoder 324, as the sequence of bins defining a given syntax element value can be directly stored into the encoded bitstream 312.
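The data rates quoted above follow from simple arithmetic, reproduced below as a worked sketch for UHD 4:4:4 10-bit 60 frame per second video.

# UHD (3840x2160), 4:4:4 chroma (three full-resolution channels),
# 10 bits per sample, 60 frames per second.
width, height, channels, bit_depth, fps = 3840, 2160, 3, 10, 60
uncompressed_bps = width * height * channels * bit_depth * fps
print(uncompressed_bps / 1e9)  # ~14.93 Gb/s uncompressed

for ratio in (8, 4):  # targeted compression ratios of 8:1 and 4:1
    print(uncompressed_bps / ratio / 1e9)  # ~1.866 and ~3.732 Gb/s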
Moreover, the absence of context coded bins also removes dependencies necessary for selecting contexts for bins. Such dependencies, when present, require buffers to store the values of previously coded bins, with those values used to select one context out of a set of contexts for a current bin. Then, encoding and decoding multiple bins per clock cycle is greatly simplified compared to when adaptive context coding is used, resulting in the potential to achieve the compressed data rates mentioned previously. In such architectures, the system clock can be expected to be in the order of several hundred MHz, with a bus sufficiently wide to achieve the required data rate. All these attributes of the entropy encoder 324 are also present in an entropy decoder 420 of the video decoder 134.
The video decoder 134 of Fig. 4 is described with reference to a low latency video decoding pipeline; however, other video codecs may also employ the processing stages of modules 420-432. The encoded video information may also be read from memory 206, the hard disk drive 210, a CD-ROM, a Blu-ray Disc™ or other computer readable storage medium. Alternatively, the encoded video information may be received from an external source, such as a server connected to the communications network 220 or a radio-frequency receiver. As seen in Fig. 4, received video data, such as the encoded bitstream 312, is input to the video decoder 134. The encoded bitstream 312 may be read from memory 206, the hard disk drive 210, a CD-ROM, a Blu-ray Disc™ or other non-transitory computer readable storage medium. Alternatively, the encoded bitstream 312 may be received from an external source such as a server connected to the communications network 220 or a radio-frequency receiver. The encoded bitstream 312 contains encoded syntax elements representing the captured frame data to be decoded.
The encoded bitstream 312 is input to an entropy decoder module 420 which extracts the syntax elements from the encoded bitstream 312 and passes the values of the syntax elements to other blocks in the video decoder 134. The entropy decoder module 420 applies variable length coding to decode syntax elements from codes present in the encoded bitstream 312. The decoded syntax elements are used to reconstruct parameters within the video decoder 134. The parameters include zero or more residual data arrays 450, a prediction mode 454, an intra-prediction mode 457 and a QP 452. The residual data array 450 and the QP 452 are passed to a dequantiser module 421, and the intra-prediction mode 457 is passed to an intra-frame prediction module 426.
The dequantiser module 421 performs inverse scaling on the residual data of the residual data array 450 to create reconstructed data 455 in the form of transform coefficients. The dequantiser module 421 outputs the reconstructed data 455 to an inverse transform module 422. The inverse transform module 422 applies an 'inverse transform' to convert the reconstructed data 455 (i.e., the transform coefficients) from a frequency domain representation to a spatial domain representation, outputting a residual sample array 456 via a multiplexer module 423. The inverse transform module 422 performs the same operation as the inverse transform module 328. The inverse transform module 422 is configured to perform an inverse transform. The transforms performed by the inverse transform module 422 are selected from a predetermined set of transform sizes required to decode an encoded bitstream 312.
When the prediction mode 454 indicates that the current prediction unit (PU) was coded using intra-prediction, the intra-frame prediction module 426 produces an intra-predicted prediction unit (PU) 464 for the prediction unit (PU) according to the intra-prediction mode 457. The intra-predicted prediction unit (PU) 464 is produced using data samples spatially neighbouring the prediction unit (PU) and a prediction direction also supplied by the intra-prediction mode 457. The spatially neighbouring data samples are obtained from reconstructed samples 458, output from a summation module 424. The prediction unit (PU) 466, which is output from the multiplexer module 428, is added to the residual sample array 456 from the inverse transform module 422 by the summation module 424 to produce reconstructed samples 458. The reconstructed samples 458 are stored in the frame buffer module 432 configured within the memory 206. The frame buffer module 432 provides sufficient storage to hold part of one frame, as required for just-in-time output of decoded video data by the video decoder 134. The decoded video data may be sent to devices such as a display device (e.g. 136, 214) or other equipment within a broadcast environment, such as a 'distribution encoder', graphics overlay insertion, or other video processing apparatus.
Fig. 5A is a schematic block diagram showing square coding tree unit (CTU) configurations for the sub-frame latency video encoding and decoding system 100. A frame of video data is decomposed into a series of CTUs in the video encoder
114. To meet an end-to-end latency requirement of 32 lines of samples, the CTU height is limited to eight rows. This latency is divided equally between the video encoder 114 and the video decoder 134, i.e. 16 lines in each portion of the video encoding and decoding system 100. Dividing the latency equally permits input of uncompressed data in raster scan order to fill half of the 16-line input buffer whilst the other half is being encoded. A similar division occurs in the video decoder 134. Using square CTUs, the resulting size is 8x8 (luma) samples, smaller than the minimum size of 16x16 specified in HEVC. Also, an additional source of latency results from the buffering of partially-coded slices in the video encoder 114 prior to transmission and the buffering of partially-received slices in the video decoder 134 prior to decoding. Further latency present in the communications channel 120 is not considered.
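The 32-line latency budget decomposes as sketched below, following the equal split and the double buffering of CTU rows described above.

ctu_height = 8
encoder_lines = 2 * ctu_height  # one 8-line half of the 16-line buffer fills while the other half is encoded
decoder_lines = 2 * ctu_height  # the corresponding division in the video decoder
print(encoder_lines + decoder_lines)  # 32 lines end-to-end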
One consequence of an 8x8 CTU size is that no quadtree subdivision into multiple coding units (CUs) is performed. Instead, each CTU is always associated with one 8x8 CU. For an 8x8 CU, a residual quadtree is defined to always include one 8x8 transform unit (TU). As described in detail below, possible configurations of the 8x8 TU are shown in Fig. 5A, as TU configurations. The possible configurations are a result of the 'partition mode' of the CU and the chroma format of the video data.
For the primary colour channel (primary), the chroma format is not relevant and, as seen in Fig. 5A, an 8x8 transform block (TB) 501 is present when a PART_2Nx2N partition mode is used. As also seen in Fig. 5A, four 4x4 TBs (referenced at 502 in Fig. 5A) are present when a PART_NxN partition mode is used. As seen in Fig. 5A, for the secondary colour channels (secondary), the possible arrangements of TBs also depend on the chroma format. When the video data is in the 4:2:0 chroma format, two 4x4 TBs (referenced at 503 in Fig. 5A) are present (one for each secondary colour channel), regardless of the partition mode of the CU. When the video data is in the 4:2:2 chroma format, two pairs of 4x4 TBs
(referenced at 504 in Fig. 5A) are present (one pair for each secondary colour channel), regardless of the partition mode of the CU.
When the video data is in the 4:4:4 chroma format, the partition mode of the CU influences the arrangement of TBs, such that the same arrangements as for the primary colour channel are used. In particular, as seen in Fig. 5A, one 8x8 TB is used per secondary colour channel (referenced at 505 in Fig. 5A) when the partition mode of the CU is PART_2Nx2N and four 4x4 TBs (referenced at 506 in Fig. 5A) per secondary colour channel are used when the partition mode of the CU is PART_NxN. For cases where multiple TBs are present for a given colour channel, the scan order of the TBs is shown in Fig. 5A using thick arrows. The scan order used is defined as a 'Z-scan' order, i.e. iterating over the blocks of the top row from left to right and then over the lower row from left to right. The colour channels are processed with the primary colour channel first, followed by the secondary colour channels, i.e. Y, Cb, then Cr, or G, B, then R.
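The Z-scan order over the 2x2 arrangements of TBs shown in Fig. 5A can be sketched as follows; for a single 2x2 level the Z-scan coincides with raster order over the sub-blocks (deeper hierarchies apply the pattern recursively), and the function name is illustrative only.

def z_scan_positions(block_w, block_h, sub_w, sub_h):
    # Sub-block origins: top row left to right, then the lower row.
    return [(x, y)
            for y in range(0, block_h, sub_h)
            for x in range(0, block_w, sub_w)]

print(z_scan_positions(8, 8, 4, 4))  # [(0, 0), (4, 0), (0, 4), (4, 4)]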
Fig. 5B is a schematic block diagram showing non-square coding tree unit (CTU) configurations for the sub-frame latency video encoding and decoding system 100 of Fig. 1. In non-square coding tree unit (CTU) configurations, the height of the CTU is retained at 8 lines, in order to meet the end-to-end latency requirement of 32 lines, as discussed earlier. However, the width of the CTU is doubled to 16 samples, resulting in non-square CTUs. Then, the CTU may contain two 8x8 CUs (referenced at 512 in Fig. 5B), each having the various structures of TBs as described with reference to Fig. 5A.
Additionally, the 16x8 CTU may instead contain one non-square 16x8 CU 514. In such a case, the 16x8 CU 514 is divided into TBs as shown in Fig. 5B. The divisions of Fig. 5B for the primary colour channel (primary) and the secondary colour channels (secondary) are analogous to the divisions shown in Fig. 5A, noting that the width of each TB is doubled with respect to the cases shown in Fig. 5A, so that the possible TB sizes are 16x8 samples and 8x4 samples. The use of larger transforms enables more compact
representation of the residual signal for a given area in the frame, resulting in improved compression efficiency. The improved compression efficiency is balanced against the possibility of highly detailed areas, for which larger transforms offer no benefit, and thus the original transform sizes are still available via the selection of 8x8 CUs. The selection of one 16x8 CU or two 8x8 CUs for a given 16x8 CTU is controlled using a 'cu split' flag, coded in the bitstream. As the split results in two CUs, rather than four CUs, the split differs from the 'quad-tree' subdivision prevalent in HEVC.
Fig. 5C is a schematic block diagram showing a block configuration for the video encoding and decoding system 100 of Fig. 1. In particular, the configuration of Fig. 5C has a height of 4 luma samples in the buffering stages. Retaining a CTU width of 8 samples results in supporting two block configurations within the CTU: 4x4 and 8x4. Note that the 4:2:0 chroma format would result in a requirement to pair 4x4 chroma blocks with 8x8 blocks (or 2x2 arrangements of 4x4 blocks) in the luma channel. As this would violate the height restriction of 4 luma samples, the 4:2:0 chroma format is not supported in the block configuration of Fig. 5C. Instead, only the 4:2:2 and 4:4:4 chroma formats are supported. The 4:2:2 and 4:4:4 chroma formats are supported using one pair of 4x4 blocks for each of the three colour channels in the case of 4:4:4 (i.e. blocks 542 for luma and blocks 546 for chroma). Alternatively, one 8x4 block may be used for each of the three colour channels in the case of 4:4:4 (i.e. 548). For the 4:2:2 case, either a pair of 4x4 blocks (i.e. blocks 542) or one 8x4 block (i.e. block 543) is present for the primary colour channel, and one 4x4 block is present per secondary colour channel (i.e. block 544). Each 4x4 block corresponds to one 4x4 TB and one 4x4 PB, hence there is no concept of multiple partition modes. As the configuration of Fig. 5C has a maximum buffering height of 4 luma samples, the end-to-end latency introduced by sample I/O and block processing is sixteen (16) lines: eight lines in the video encoder 114 (four for receiving video data in raster order and four for processing a row of CTUs) and eight lines in the video decoder 134 (four for decoding a row of CTUs and another four for outputting decoded video data in raster scan order). Additional latency may be introduced by the buffering of partially coded slices in the video encoder 114 prior to transmission and the buffering of partially received slices in the video decoder 134 prior to decoding. Such buffering is synchronised to the timing of encoding and decoding each row of 4x4 blocks. Accordingly, buffering of coded slices introduces an additional four lines of latency in each of the video encoder 114 and the video decoder 134, resulting in an overall latency of 16 + 8 = 24 lines for the video processing system 100 when using the block configuration of Fig. 5C. An alternative for the configuration of Fig. 5C involves using an 8x4 CTU size. The block configurations resulting in the use of an 8x4 TB are referred to as a 'PART_2NxN' partition mode. In such cases, the intra-prediction process is performed on an 8x4 PB.
Fig. 6A is a schematic diagram showing decomposition of a frame 600 into a set of slices, suitable for use with the sub-frame latency video encoding and decoding system 100 of Fig. 1. The frame 600 is divided into slices, each slice occupying a number of rows of CTUs in the frame 600. Each slice is further decomposed into one independent slice segment, such as independent slice segment 610, followed by several dependent slice segments, such as dependent slice segments 616, 620 and 626. Each slice segment includes one row of CTUs. An independent slice segment and all subsequent dependent slice segments up to a next independent slice segment form one slice. As such, an independent slice segment does not use samples from preceding slice segments for prediction. The CTU 612 uses reference samples 614 for intra prediction, but does not use any samples located in the dependent slice segment above. Dependent slice segments may use samples from preceding slice segments (dependent or independent) for prediction. A particular dependent slice segment may be dependent on an independent slice segment where pixel values of the independent slice segment are used to determine pixel values of the particular dependent slice segment.
The CTU 622 uses reference samples 624 for intra prediction, which include samples located both to the left of and above the CTU 622. The same constraints on dependencies for bitstream creation and parsing (e.g. for determining the context for a particular bin of a syntax element) are also present when CABAC is used. The timing of receipt of uncompressed video data from the video source 112 by the video encoder 114 and the timing of delivery of compressed video data (e.g. to the communications channel 120) are closely coupled. The output timing between an independent and a dependent slice segment is determined according to the encoder rate control target. The output timing and coupling are described with reference to Fig. 6B, with instants in time marked as 'X'.
From time instant 622 to time instant 624, samples of the first independent slice segment of the first slice of the frame 600 are received by the video encoder 114. Then, from time instant 624 to time instant 626, samples of the dependent slice segments of the first slice of the frame 600 are received by the video encoder 114. From time instant 626 to time instant 628, samples of independent slice segment 610 are received by the video encoder 114. Then, from time instants 628 to 630, 630 to 632 and 632 to 634, samples of dependent slice segments 616, 620 and 626 are received by the video encoder 114. The video encoder 114 is not able to stall the arrival of samples and thus incoming samples are processed at the input data rate. A bitstream 640 is produced by the video encoder 114 and contains coded representations of the slice segments of the frame 600.
The video encoder 114 has a latency 660, as seen in Fig. 6B, between the timing of the boundaries between slice segments at the input and the delivery of blocks of compressed video data (or "coded slice segments"). The latency 660 is expressed as a constant time interval along a time axis 662. The coded representations of the slice segments of the frame 600 are produced and available for output relative to the time instants of the input video data, e.g. 622, 624, 626, 628, 630, 632 and 634, offset by the latency 660, as shown in Fig. 6B. Importantly, the fixed timing of arrival of samples (e.g. 622, 624, 626, 628, 630, 632 and 634) implies that the time taken to encode and deliver each coded slice segment by the video encoder 114 is also constant, resulting in constant latency. The bitstream 640 conforms to the timing requirements of the video processing system 100, as each slice segment is coded and delivered by the video encoder 114 at the required time. As the communications channel 120 has a fixed bandwidth, this also implies that each slice segment has a constant bit rate after compression. A bitstream 642 is an example of a bitstream that does not conform to the timing requirements of the video processing system 100. In the bitstream 642, the coded slice 656 is coded at a lower compression ratio than the compression ratio required in order to meet the timing constraint. As such, all subsequent coded slices are delivered later than required. This results in the video decoder 134 being unable to store decoded video samples in the reconstructed picture buffer 332 at the required timing for reading out to send over an SDI channel.
Although the video encoder 114 needs to be configured so as not to produce coded slices exceeding a predetermined required size (i.e. below the required compression ratio), the coded slices, e.g. 650, may fall under the required size (i.e. exceed the required compression ratio). The coded slice 650 is one such example, including a coded portion 652 and a padding portion 654, to bring the total size up to the required size to meet CBR operation. The padding portion 654 represents unused capacity on the SDI channel and thus the video encoder 114 is configured to minimise the size of the padding portion 654 using a 'rate control' algorithm, as described further with reference to Fig. 8.
Fig. 7 is a schematic diagram of the data flow of the sub-frame latency video encoding and decoding system 100 of Fig. 1 at a point in time, to further illustrate the overall timing of processing of slice segments within the system 100. In the example of Fig. 7, a frame 710 of uncompressed video data is sent to the video encoder 114. The frame 710 is divided into independent slice segments and dependent slice segments (collectively, the 'slice segments'), each being a consecutive run of CTUs from the leftmost CTU in the frame 710 to the rightmost CTU in the frame 710 for a given row of CTUs in the frame 710. Each of the independent slice segments and the dependent slice segments has an equal bit size, as seen in Fig. 7.
In the example of Fig. 7, slice segments 712 have already been processed by the video encoder 114. The slice segments 712 are no longer present in the memory 206 associated with the video encoder 114. Slice segment 714 has already been written to the buffer 304 in raster scan order and slice segment 716 is stored in the buffer 304 in raster scan order. Slice segments 718 have not yet been received by the video encoder 114. The video encoder 114 encodes the slice segments of the frame 710 from the uppermost slice segment to the lowermost slice segment. The encoded bitstream 312 is conveyed along the communication channel 120 as a sequence of coded slice segments. The coded slice segments are received by the video decoder 134 and decoded to produce a decoded frame 720. The decoded frame 720 includes slice segments 722 that have already been decoded and output from the video decoder 134.
Slice segment 724 is read out from the decoded picture buffer 432 in the video decoder 134 in raster scan order as slice segment 726 is decoded on a CTU-by-CTU basis by the video decoder 134 and stored in the decoded picture buffer 432. From Fig. 7, it can be appreciated that the timing of the transition from encoding one slice segment (e.g. 714) to the next one is precisely controlled in the video encoder 114, as the transition from receiving one slice segment (e.g. 716) to the next is the result of the fixed timing of the received video data. A small degree of decoupling between the arrival of video data and the delivery of coded slice segments is possible due to the arrival of data in pixel raster scan order. In particular, the video encoder 114 can begin processing CTUs in a given slice as the last row of samples in the CTU row is being stored in the buffer 304. A corresponding degree of flexibility exists in the video decoder 134. A further degree of decoupling is possible between consecutive slices due to the duration of the 'Horizontal Ancillary Data' (HANC), also known as 'horizontal blanking', that separates each line of video data transmitted over an SDI link.

Fig. 8 is a schematic flow diagram showing a method 800 of forming an encoded portion of a video frame, such that the resulting bitstream forming the encoded portion has slice segments, each of a fixed size in bits (i.e., bit size). In typical video streaming applications, the bitstream itself is continuously generated by the video encoder 114 and continuously decoded by the video decoder 134, from the moment of establishing the link (e.g. at power-up) to the moment of tearing down the link (e.g. at power-down). The method 800 is performed in the video encoder 114, under control of the processor 205, to form coded slice segments of a predetermined bit size. As described above, the
predetermined bit size is determined according to a rate control target of the encoder 114. As such, CBR operation of the video encoder 114 is realised. The resulting bitstream forming the encoded portion may be stored within the storage 122, for example, using the syntax elements of an encoded independent slice segment and a dependent slice segment to form the bitstream. Moreover, the bitstream may be formed from a continuous sequence of slice segments, within which each frame is represented by one or more independent slice segments and associated dependent slice segments. The boundaries between the slice segments occur at predetermined locations. Accordingly, once the frame boundaries within the bitstream are known, it becomes possible to commence decoding of other slice segments by virtue of their fixed offset relative to the frame boundary, e.g. in a parallel decoding system. The encoder may also encode multiple slice segments concurrently, with the partially encoded slice segments being written concurrently to consecutive portions of a buffer, prior to transmission or storage, e.g. over the communications channel 120.
The method 800 begins at an initialise remaining bits step 802. At the initialise remaining bits step 802, the processor 205 initialises a remaining bits counter configured, for example, within the memory 206. The remaining bits counter indicates an available size remaining for encoding the slice segment currently being processed by the video encoder 114. The size remaining is measured in bits to provide precise measurement. However, larger units of measurement, such as bytes, may also be used. Due to the constant frame size, each slice segment has an equal predetermined fixed bit size. Then, for CBR operation (fixed compression ratio), each coded slice also has a predetermined fixed bit size. Control in the processor 205 then passes to an initialise remaining CTU count step 804.
In an alternative arrangement of the method 800, coded slices corresponding to independent slice segments have a larger (but still predetermined fixed) size compared to coded slices corresponding to dependent slice segments. As all coded slices still have a predetermined fixed size, CBR operation is maintained for this larger-segment arrangement. However, the increase in coded slice size for the independent slice segments compensates for the unavailability of neighbouring samples from the previous slice (i.e. the last dependent slice segment of the preceding slice) for reference. As such, a slightly lower QP can be used to increase the quality of independent slice segments for the arrangement having larger-size segments.
At the initialise remaining CTU count step 804, the processor 205 sets a count of the CTUs in the slice segment currently being processed. As the frame size is generally fixed, the count of the CTUs in the slice segment currently being processed is determined based on a division of the frame width by the CTU width. For frame sizes that do not divide evenly into the CTU width, a round-up is applied. Control in the processor 205 then passes to a remaining CTUs test step 806.
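A minimal sketch of the round-up division follows; the function name and the frame widths in the example are illustrative only.

```python
def ctu_count(frame_width: int, ctu_width: int) -> int:
    """Count of CTUs in one slice segment (one CTU row), rounding up
    for frame widths that do not divide evenly into the CTU width."""
    return (frame_width + ctu_width - 1) // ctu_width

assert ctu_count(3840, 8) == 480   # UHD width with 8x8 CTUs
assert ctu_count(1930, 8) == 242   # hypothetical width; partial rightmost CTU still counted
```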
At the remaining CTUs test step 806, the processor 205 tests whether any further CTUs remain to be coded in the slice segment currently being processed by the video encoder 114. If no further CTUs in the current slice segment remain to be processed by the video encoder 114, control in the processor 205 passes to an insert padding step 816. Otherwise, control in the processor 205 passes to an encode CTU step 808.
At the insert padding step 816, the video encoder 114, under control of the processor 205, inserts padding (e.g. 654) into the encoded bitstream 312 forming the portion of the video frame, such that the length of the coded slice is padded to the predetermined bit size. The length of the padding varies between slice segments, and may be in the order of dozens of bytes, or in some rare cases of very compactly encoded slice segments may be hundreds of bytes in length. The SDI link is required to convey data with a minimum frequency of transitions between logic 'ones' and 'zeroes' such that clock recovery circuitry in the receiver 132 is able to remain synchronised with the clock in the transmitter 116 that generates the SDI signal. Then, the padding data should include a bit pattern that contains transitions with reasonable frequency (e.g. at least once every 10 bits) and patterns that cannot be misinterpreted as the start of a slice segment. For example, the 'NAL unit' (a slice segment is a category of NAL unit) start code of 0x000001 is prohibited. It can be seen that the default HEVC padding process of inserting 0x00 bytes is not suitable for conveying the bitstream over an SDI link. An alternative padding byte, such as 0x55, is suitable, and cannot be misinterpreted as a NAL unit start code. The method 800 then terminates.
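A sketch of the padding generation of step 816 is given below. The function name and the byte-level granularity are assumptions for illustration; the 0x55 pattern and the prohibition on emulating the 0x000001 start code follow the constraints described above.

```python
def pad_coded_slice(coded: bytes, target_size: int) -> bytes:
    """Pad a coded slice segment up to its predetermined size for CBR
    operation. The 0x55 byte (binary 01010101) yields a transition at
    least every other bit, satisfying SDI clock recovery, and no run
    of 0x55 bytes can emulate the 0x000001 NAL unit start code."""
    if len(coded) > target_size:
        raise ValueError("coded slice exceeds its predetermined size")
    return coded + b"\x55" * (target_size - len(coded))
```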
At the encode CTU step 808, the video encoder 114, under control of the processor 205, encodes the current CTU into the encoded bitstream 312 using the QP 384. In particular, the mode decisions made by the video encoder 114 are stored in the encoded bitstream 312. The video encoder 114 tests one or more available modes for each PB (e.g. intra-prediction modes), and selects one mode for the PB. Then, the prediction mode is coded for the PB, along with the residual associated with the corresponding TB. The process repeats for all blocks within a CTU. For an 8x8 CTU, with one 8x8 CU and two possible partition modes (i.e., PART_NxN and PART_2Nx2N), the arrangements of PBs/TBs within each CTU are as described with reference to Fig. 5A. A count of the bits consumed by coding each syntax element associated with the CTU is maintained. Without the use of arithmetic coding, the count of the bits consumed by coding each syntax element associated with the CTU is an integer number of bits in the encoded bitstream 312.
Following step 808, control in the processor 205 then passes to a calculate margin step 810.
At the calculate margin step 810, the processor 205 calculates a margin indicative of the progress in processing CTUs of the current slice segment versus the consumption of available bits in the coded slice (as measured by the 'remaining bits' variable). The resulting margin is used in the rate control module 348 to control the coding rate of subsequent CTUs in the current slice segment. As the coded slice must not exceed a predetermined size, the remaining bits count is not permitted to become negative. Then, the rate control algorithm needs to compress each slice segment with a targeted compression ratio that is slightly higher than the compression ratio resulting from the data rate of the uncompressed video data versus the data rate of the communications channel 120. For example, when transporting UHD video data over a 3G-SDI link, a compression ratio of 4:1 is required; however, a target compression ratio may be 4.05:1. This extra 'headroom' is required to allow for local variation in the bit rate that results from different complexity features in the frame data. The local variation is difficult to control, and thus allowing some headroom enables encoding the data without introducing visual artefacts. For example, without the headroom, towards the end of each slice segment (i.e. the rightmost CTUs in the frame) the possibility of falling back to a very 'harsh' coding method, as described at step 818, would be increased. However, excessive headroom, or 'margin', implies artificially reduced quality of the video decoder 134 output compared to the quality potentially afforded by the CBR communications channel 120.
An end of slice margin (in bits) is subtracted from a permitted size of a coded slice to produce an adjusted coded slice size. The adjusted coded slice size is then divided by the number of CTUs present in one slice segment to produce a coded CTU size. A target remaining slice size is determined by multiplying the remaining CTU count and the coded CTU size. The target remaining slice size is compared with the remaining bits count. If the target remaining slice size is larger than the remaining bits count, then the video encoder 114 has coded too many bits in representing CTUs up to the present CTU in the current slice segment. Then, the cost of coding remaining CTUs in the slice segment needs to be reduced (e.g. by increasing the QP 384). Alternatively, if the target remaining slice size is less than the remaining bits count, then the video encoder 114 has coded too few bits in representing CTUs up to the present CTU in the current slice segment and is able to increase the quality (e.g. by reducing the QP 384) for remaining CTUs in the slice segment. Thus, the calculate margin step 810 provides a way to dynamically adjust the cost of coding video data via the QP 384. For a run of CTUs or TUs containing very simple data, e.g. one colour, the QP could continue decreasing to a very low value, as the cost of coding the minimal residual information progressively increased. An issue here is that the cost of coding a subsequent block of high complexity would be exorbitant. To avoid this situation, QP decrementing ceases once the achieved compression ratio for a given CTU exceeds the targeted compression ratio by some factor. For example, with a target compression ratio of 4:1, once the achieved compression ratio reaches 16:1, no further reductions in QP 384 occur. As a consequence, future blocks of high complexity can be coded without inducing overly high bit rate across those CTUs, or at least the CTUs prior to sufficient adjustment (incrementing) of the QP 384 within the complex region. One example of a block with high complexity is a block containing a high degree of noise (e.g. from an imaging sensor). As noise cannot be predicted, large residual data can be expected. Control in the processor 205 then passes to a termination test step 812. At the termination test step 812, the video encoder 114, under control of the processor 205, checks if the remaining bits for the slice, divided by the number of remaining CTUs in the slice, has fallen below a predetermined threshold.
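Before turning to the threshold itself, the margin computation and QP update of step 810 can be sketched as follows. All names are hypothetical, and the cap factor of 4 reflects the 4:1-versus-16:1 example above; this is an illustrative sketch, not the definitive rate control algorithm.

```python
def update_qp(qp: int, remaining_bits: int, remaining_ctus: int,
              permitted_slice_bits: int, end_margin_bits: int,
              ctus_per_segment: int, achieved_ratio: float,
              target_ratio: float, cap_factor: float = 4.0) -> int:
    """One rate-control update per CTU (a sketch of step 810)."""
    adjusted_slice_bits = permitted_slice_bits - end_margin_bits
    coded_ctu_size = adjusted_slice_bits / ctus_per_segment
    target_remaining = remaining_ctus * coded_ctu_size
    if target_remaining > remaining_bits:
        qp += 1      # overspent: code the remaining CTUs more cheaply
    elif target_remaining < remaining_bits:
        # Underspent: quality may be raised, but QP decrementing ceases
        # once this CTU already compresses far beyond the target (e.g.
        # 16:1 achieved against a 4:1 target), so that a later complex
        # block does not become exorbitant to code.
        if achieved_ratio < cap_factor * target_ratio:
            qp -= 1
    return max(0, min(51, qp))   # clamp to the HEVC QP range
```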
The predetermined threshold corresponds to the bit cost of coding one CTU with very harsh quantisation of residuals, e.g. one bit per residual coefficient. This can result in visually noticeable artefacts in the decoded video data (e.g. 412). However, such a mode is necessary to avoid a buffer overrun, where a coded slice segment would otherwise contain more bits than are available in the allocated space for the slice segment. The test at step 812 is configured to guard against excessively complex video data consuming too many bits in the coded slice segment, potentially leading to a failure to compress the slice segment into the available size. If the remaining bits for the slice, divided by the number of remaining CTUs in the slice, has fallen below the threshold, control in the processor 205 passes to a termination mode step 818. Otherwise, control in the processor 205 passes to a code delta QP step 814. The threshold defines a minimum quantity of bits per CTU to code the remaining CTUs in the slice segment. Based upon a termination mode supporting one bit per sample, the threshold is equal to the number of samples in one CTU. For 8x8 CTUs coded in the 4:2:2 chroma format the result is 128 bits per CTU and for 8x8 CTUs coded in the 4:4:4 chroma format the result is 192 bits per CTU. The predetermined threshold can also include an offset to account for the cost of signalling the termination mode. Note that for a given slice segment, once the termination mode is signalled, all remaining CTUs in the slice segment also use the termination mode, thus the signalling of the termination mode is performed at most once per slice segment.
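The threshold arithmetic can be made concrete as below; the function is a hypothetical sketch assuming the one-bit-per-sample termination mode described above.

```python
def termination_threshold(ctu_width: int, ctu_height: int,
                          chroma_format: str,
                          signalling_bits: int = 0) -> int:
    """Minimum bits per remaining CTU before the termination mode is
    forced: one bit per sample over all three colour channels, plus an
    optional offset for signalling the termination mode."""
    luma_samples = ctu_width * ctu_height
    chroma_per_channel = {"4:2:2": luma_samples // 2,
                          "4:4:4": luma_samples}[chroma_format]
    return luma_samples + 2 * chroma_per_channel + signalling_bits

assert termination_threshold(8, 8, "4:2:2") == 128   # as stated above
assert termination_threshold(8, 8, "4:4:4") == 192
```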
In one arrangement of the method 800, use of the termination mode is not explicitly signalled. In such arrangements, both the video encoder 114 and the video decoder 134 record the cumulative bit cost of the slice segment, and subtract this value from the allocated slice segment size to determine a remaining slice segment size. When the remaining slice segment size falls below a termination cost, the termination mode is entered, without any explicit signalling being required. The termination cost is equal to the number of samples in the remaining CTUs in the slice segment (i.e. one bit per sample). At the code delta QP step 814, the entropy encoder 324, under control of the processor 205, encodes a delta QP syntax element into the encoded bitstream 312. The delta QP signals the result of any change to the QP from the step 810. Control in the processor 205 then passes to the remaining CTUs test step 806. At the termination mode step 818, the video encoder 114, under control of the processor 205, encodes the remaining CTUs in the slice segment using a low-cost prediction mode, or 'termination mode'. In particular, the low-cost prediction mode offers a fixed cost per CTU. This is achieved by quantising each residual coefficient using a high degree of lossiness, e.g. one bit per coefficient. Moreover, the residual transform is skipped. As no prediction modes are coded, each PB is predicted using the intra 'DC' prediction mode, i.e. the PB is populated with a value representing the average of the reference samples. After the remaining CTUs in the slice segment have been coded using the low-cost prediction mode, control in the processor 205 passes to the insert padding step 816.
In an arrangement of the method 800, the calculate margin step 810 is modified such that the comparison between the target remaining slice size and the remaining bits count is performed by firstly determining a difference between the two quantities, which is then smoothed using a low-pass filter over multiple CTUs before performing the comparison. Such an arrangement provides more stable adjustment of QP (less responsive to sudden changes in texture detail within the frame) at the expense of slower adaptation to shifts in texture that may necessitate larger changes in QP to avoid buffer overrun for CBR operation.
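As a sketch, the low-pass filter could be realised as a one-tap exponential moving average; the filter coefficient below is an assumed value, trading responsiveness against stability.

```python
class SmoothedMargin:
    """Exponentially smoothed difference between the target remaining
    slice size and the remaining bits count, updated once per CTU."""

    def __init__(self, alpha: float = 0.25):
        self.alpha = alpha   # assumed coefficient: larger = more responsive
        self.value = 0.0

    def update(self, target_remaining: float, remaining_bits: float) -> float:
        diff = target_remaining - remaining_bits
        self.value += self.alpha * (diff - self.value)
        return self.value    # positive: overspent (raise QP); negative: lower QP
```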
In another arrangement of the method 800, the target remaining bits value is determined based upon the coded cost of CTUs from the previous slice segment. A look-up table may be used to store the cumulative coded costs of CTUs as each CTU of the current slice segment is being coded, for reference when the next slice segment is to be coded. Such an arrangement exploits the correlation of textures between adjacent CTUs, particularly in the vertical direction (i.e. the CTU located in the previous slice segment, above a current CTU), that results from larger textures present in the video data (e.g. text, large objects). As the correlation is imperfect, in one arrangement of the method 800, the predicted cost from the previous slice segment is averaged with a simple linear estimation of the cost. Averaging the predicted cost in such a manner provides protection against cost estimates deviating greatly as the slice segments of a given frame are progressively encoded. The deviations may result in excessively harsh quantisation of portions of a given slice segment, or over-allocation of bits to other portions where features were present on previous slice segments but cease to be present on later slice segments. Arrangements making use of the previous slice segment to assist in cost prediction and hence quantisation require relatively low additional memory, as the cost is one estimate per CTU in one slice segment.
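A sketch of the averaged prediction follows, assuming one stored cost per CTU of the previous slice segment; the equal weighting of the two estimates is an assumption, as no particular weighting is specified above.

```python
def predicted_ctu_cost(ctu_index: int, prev_segment_costs: list,
                       linear_estimate: float) -> float:
    """Target cost for a CTU: average of the cost of the vertically
    adjacent CTU in the previous slice segment (exploiting vertical
    texture correlation) and a simple linear estimate, so imperfect
    correlation cannot pull the target too far off."""
    above_cost = prev_segment_costs[ctu_index]
    return 0.5 * (above_cost + linear_estimate)
```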
In yet another arrangement of the method 800, the video encoder 114 performs a 'pre-scan' of the (uncompressed) slice segment samples. The pre-scan step occurs prior to encoding a given CTU in the slice segment. As part of the pre-scan step, a complexity measure is produced, e.g. by performing a transform and measuring the residual energy. One method of determining residual energy is to perform the transform to generate residual values. The residual values may then be summed and the summed value used as a measure of residual energy. Then, the complexity measure is derived from the residual energy, and is representative of the estimated coding cost of the CTU, where a high complexity measure is considered expensive to code when compared to a low complexity measure. A table of estimated costs is produced that provides input for the target bit cost of each CTU. The pre-scan arrangement does not depend on any correlation between the current slice segment and the previous slice segment. However, the pre-scan arrangement does require additional memory bandwidth to perform the pre-scan operation. For reduced
computational complexity, a Hadamard transform may be used instead of a DCT, as the required output is only an estimate of texture complexity.
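A minimal Hadamard-based complexity estimate might look as follows; the DC-removal step and the sum-of-absolute-coefficients measure are assumptions consistent with the residual-energy description above.

```python
import numpy as np

def hadamard_complexity(block: np.ndarray) -> float:
    """Complexity estimate for a square block whose side is a power of
    two: sum of absolute coefficients under a 2-D Hadamard transform,
    a cheap stand-in for DCT-based residual energy measurement."""
    n = block.shape[0]
    h = np.array([[1.0]])
    while h.shape[0] < n:                      # Sylvester construction
        h = np.block([[h, h], [h, -h]])
    coeffs = h @ (block - block.mean()) @ h.T  # remove DC, then transform
    return float(np.abs(coeffs).sum())
```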
In yet another arrangement of the method 800, the video encoder 114 performs a pre-scan of a limited number of the leftmost CTUs (e.g., just the first CTU may be pre-scanned). In the arrangement where the video encoder 114 performs a pre-scan of a limited number of the leftmost CTUs, a cost estimate for the initial CTU is produced using a Hadamard transform and an initial QP is determined from this cost estimate. The initial QP is determined based upon the target bit cost for one CTU, such that the residual energy is anticipated to approximately consume this quantity of bits. Dividing the cost estimate by the target bit cost results in the desired quantiser step size, from which the initial QP value can be determined. Such arrangements result in an improved starting point (in terms of QP) for the rate control operation when coding successive CTUs within each slice segment. The cost of pre-scanning a limited number of initial CTUs (i.e. one CTU) is limited: generally no extra memory bandwidth is required, as the pre-scanned CTU is the next one to be encoded anyway; only the estimation function cost (e.g. a Hadamard transform) is incurred. The extra time required to perform this can generally be
accommodated within the timing margin afforded by the horizontal blanking period of the SDI timing. When encoding an independent slice segment, the initial QP is coded as the 'slice QP'. When encoding a dependent slice segment, the initial QP for the current dependent slice segment may be coded by coding the difference between the QP of the last CTU of the previous slice segment and the new initial QP. In another arrangement, the slice header for a dependent slice segment includes an initial QP that is not dependent on the QP of any earlier slice segments. In this arrangement, any data transmission errors that result in the loss of a previous slice segment do not affect the ability of the video decoder 134 to correctly determine the QP for CTUs in the current slice segment. In such an arrangement, a 'slice_qp_delta' syntax element is coded in the slice header of each slice segment (dependent as well as independent).
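A sketch of the initial QP derivation follows. The mapping from quantiser step size to QP assumes the HEVC relation Qstep ≈ 2^((QP − 4)/6); the function name and the clamping range are illustrative.

```python
import math

def initial_qp(cost_estimate_bits: float, target_bits_per_ctu: float) -> int:
    """Starting QP from a pre-scan cost estimate: the desired quantiser
    step size is the ratio of the estimated cost to the per-CTU budget,
    mapped to QP via Qstep ~= 2 ** ((QP - 4) / 6)."""
    qstep = max(cost_estimate_bits / target_bits_per_ctu, 1e-3)
    qp = round(4 + 6 * math.log2(qstep))
    return max(0, min(51, qp))   # clamp to the valid HEVC QP range
```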
As discussed previously, the size of a coded slice segment must not exceed a predetermined value. Then, as the coded slice segment is being encoded, the cumulative cost can be used to influence the remaining cost, so as not to exceed the predetermined size. Using the cumulative cost to influence the remaining cost can lead to overly conservative (i.e. lower quality) coding of the slice segment.
In one arrangement of the video processing system 100, either or both of the video encoder 114 and the video decoder 134 may include multiple entropy encoders 324 or entropy decoders 420. The additional entropy encoders or decoders are configured to perform parallel encoding or decoding of each slice segment. In contrast to the parallelism supported in HEVC, each dependent slice segment may also be parsed completely independently of other slice segments. In this context, independent parsing refers to the absence of any dependencies for parsing a particular syntax element on syntax element values from a different slice segment, e.g. for arithmetic context selection. Despite the absence of dependencies for parsing syntax elements between consecutive dependent slice segments, and between a dependent slice segment and the preceding independent slice segment, the intra-prediction reconstruction process has dependencies on neighbouring samples that may cross the dependent slice segment boundary. Also, due to the fixed size of each coded slice, the multiple entropy encoders or entropy decoders are able to read or write their portions of the encoded bitstream directly from or to the coded picture buffer, with suitable offsets applied to obtain the starting address of each coded slice segment. As discussed previously, in some arrangements the offset for an independent slice segment may be larger than the offset for dependent slice segments, resulting in a larger potential size for an independent slice segment compared to a dependent slice segment. However, with predetermined sizes for each coded slice segment, the entry points into the coded picture buffer for each encoding or decoding step are able to be determined. In particular, the entry point for a given slice segment is equal to the sum of the sizes of the preceding slice segments, with additional space reserved for any high level syntax NAL units, such as the picture parameter set or the sequence parameter set.
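Because every coded slice segment has a predetermined size, the entry points can be computed without parsing, as in the hypothetical sketch below; the sizes and reserved header space are illustrative values.

```python
def entry_points(segment_sizes: list, header_space: int) -> list:
    """Byte offsets of each coded slice segment in the coded picture
    buffer: space reserved for high-level syntax NAL units, followed
    by the cumulative sizes of the preceding segments."""
    offsets, position = [], header_space
    for size in segment_sizes:
        offsets.append(position)
        position += size
    return offsets

# An independent segment may be allotted a larger fixed size than the
# dependent segments that follow it; offsets remain known in advance.
sizes = [2048, 1024, 1024, 1024]               # hypothetical byte sizes
assert entry_points(sizes, 256) == [256, 2304, 3328, 4352]
```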
In another arrangement of the video processing system 100, the first independent slice segment in each frame is offset by a fixed amount from the start of the allocated space. In the resulting gap, a syntax structure is inserted that contains a frame header, including information such as frame size, bit depth and chroma format. In HEVC such
information would be present in the sequence parameter set (SPS) and the picture parameter set (PPS). The SPS and PPS have variable lengths, which has an undesirable impact on the timing of the subsequent slice segments, as the length can vary. The header has a fixed length, so that the rate target (i.e. number of bits per slice) for the independent slice segment can be determined without needing to code the header information into a packet. As the video processing system 100 does not include any inter-picture prediction processes, various parameters relating to reference picture list management are absent from the header information. The header information is included in the first independent slice segment for each frame; this is necessary to ensure that the video decoder 134 can commence decoding from any frame in the encoded bitstream 312, e.g. in case of truncated or otherwise edited bitstreams. As the length of the header information is fixed, generally each parameter in the header information is coded using fixed-length codewords.

INDUSTRIAL APPLICABILITY
The arrangements described are applicable to the computer and data processing industries, and particularly to digital signal processing for the encoding and decoding of signals such as video signals in a low-latency (sub-frame) video coding system.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.

CLAIMS:
1. A method of forming a portion of a video frame, the method comprising:
decoding pixel values of an encoded independent slice segment for a slice of the video frame, the independent slice segment having a first predetermined bit size;
decoding at least one dependent slice segment for the slice of the video frame having a second predetermined bit size, the at least one dependent slice segment being dependent on the independent slice segment using pixel values of the independent slice segment to determine pixel values of the dependent slice segment; and
forming the portion of the video frame using the pixel values of the decoded independent slice segment and the dependent slice segment.
2. The method according to claim 1, wherein the first and second predetermined bit sizes are equal.
3. The method according to claim 1, wherein the first predetermined size corresponding to the independent slice segment is larger than the second predetermined size corresponding to the dependent slice segments.
4. A decoder for forming a portion of a video frame, the decoder comprising:
a decoder module for decoding pixel values of an encoded independent slice segment for a slice of the video frame, the independent slice segment having a first predetermined bit size, and for decoding at least one dependent slice segment for the slice of the video frame having a second predetermined bit size, the at least one dependent slice segment being dependent on the independent slice segment using pixel values of the independent slice segment to determine pixel values of the dependent slice segment; and
a forming module for forming the portion of the video frame using the pixel values of the decoded independent slice segment and the dependent slice segment.
5. A computer readable medium having a program stored thereon for forming a portion of a video frame, the program comprising:
code for decoding pixel values of an encoded independent slice segment for a slice of the video frame, the independent slice segment having a first predetermined bit size; code for decoding at least one dependent slice segment for the slice of the video frame having a second predetermined bit size, the at least one dependent slice segment being dependent on the independent slice segment using pixel values of the independent slice segment to determine pixel values of the dependent slice segment; and
code for forming the portion of the video frame using the pixel values of the decoded independent slice segment and the dependent slice segment.
6. A system for forming a portion of a video frame, the system comprising:
a memory for storing data and a computer readable medium;
a processor coupled to the memory for executing a computer program, the program having instructions for:
decoding pixel values of an encoded independent slice segment for a slice of the video frame, the independent slice segment having a first predetermined bit size;
decoding at least one dependent slice segment for the slice of the video frame having a second predetermined bit size, the at least one dependent slice segment being dependent on the independent slice segment using pixel values of the independent slice segment to determine pixel values of the dependent slice segment; and
forming the portion of the video frame using the pixel values of the decoded independent slice segment and the dependent slice segment.
7. A method of forming an encoded portion of a video frame, the method comprising: encoding an independent slice segment for a slice of the video frame, the independent slice segment having a first predetermined bit size, the first predetermined bit size being determined according to an encoder rate control target;
encoding a dependent slice segment, for the slice of the video stream, having a second predetermined bit size smaller than the first predetermined bit size, the second predetermined bit size being determined according to the encoder rate control target, wherein the at least one dependent slice segment is dependent on the independent slice segment using pixel values of the independent slice segment to determine pixel values of the dependent slice segment; and storing the portion of the video frame using the pixel values of the encoded independent slice segment and the dependent slice segment to form the encoded portion of the video frame.
8. The method according to claim 7, wherein the encoder rate control target is determined according to a constant bit rate channel capacity for a target channel to carry video data containing the video frame.
9. The method according to claim 7, wherein output timing between the independent and dependent slice segments is determined according to the encoder rate control target.
10. An encoder for forming a portion of a video frame, the encoder comprising:
an encoder module for encoding an independent slice segment for a slice of the video frame, the independent slice segment having a first predetermined bit size, the first predetermined bit size being determined according to an encoder rate control target, and for encoding a dependent slice segment, for the slice of the video stream, having a second predetermined bit size smaller than the first predetermined bit size, the second
predetermined bit size being determined according to the encoder rate control target, wherein the at least one dependent slice segment is dependent on the independent slice segment using pixel values of the independent slice segment to determine pixel values of the dependent slice segment;
a storage module for storing the portion of the video frame using the pixel values of the encoded independent slice segment and the dependent slice segment to form the encoded portion of the video frame.
11. A computer readable medium having a program stored thereon for forming an encoded portion of a video frame, the program comprising:
code for encoding an independent slice segment for a slice of the video frame, the independent slice segment having a first predetermined bit size, the first predetermined bit size being determined according to an encoder rate control target;
code for encoding a dependent slice segment, for the slice of the video stream, having a second predetermined bit size smaller than the first predetermined bit size, the second predetermined bit size being determined according to the encoder rate control target, wherein the at least one dependent slice segment is dependent on the independent slice segment using pixel values of the independent slice segment to determine pixel values of the dependent slice segment; and
code for storing the portion of the video frame using the pixel values of the encoded independent slice segment and the dependent slice segment to form the encoded portion of the video frame.
12. A system for forming an encoded portion of a video frame, the system comprising: a memory for storing data and a computer readable medium;
a processor coupled to the memory for executing a computer program, the program having instructions for:
encoding an independent slice segment for a slice of the video frame, the independent slice segment having a first predetermined bit size, the first predetermined bit size being determined according to an encoder rate control target; encoding a dependent slice segment, for the slice of the video stream, having a second predetermined bit size smaller than the first predetermined bit size, the second predetermined bit size being determined according to the encoder rate control target, wherein the at least one dependent slice segment is dependent on the independent slice segment using pixel values of the independent slice segment to determine pixel values of the dependent slice segment; and
storing the portion of the video frame using the pixel values of the encoded independent slice segment and the dependent slice segment to form the encoded portion of the video frame.
PCT/AU2017/000110 2016-05-20 2017-05-17 Method, apparatus and system for encoding and decoding video data WO2017197434A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AU2016203291A AU2016203291A1 (en) 2016-05-20 2016-05-20 Method, apparatus and system for encoding and decoding video data
AU2016203291 2016-05-20

Publications (1)

Publication Number Publication Date
WO2017197434A1 true WO2017197434A1 (en) 2017-11-23

Family

ID=60324603

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2017/000110 WO2017197434A1 (en) 2016-05-20 2017-05-17 Method, apparatus and system for encoding and decoding video data

Country Status (2)

Country Link
AU (1) AU2016203291A1 (en)
WO (1) WO2017197434A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100027680A1 (en) * 2008-03-28 2010-02-04 Segall Christopher A Methods and Systems for Parallel Video Encoding and Decoding
US20150010069A1 (en) * 2013-07-02 2015-01-08 Canon Kabushiki Kaisha Intra video coding in error prone environments
GB2521117A (en) * 2013-10-25 2015-06-17 Canon Kk Method of encoding or decoding a video frame
US20160029020A1 (en) * 2014-07-25 2016-01-28 Allegro Dvt Low latency video encoder

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ROSEWARNE, C. ET AL.: "High Efficiency Video Coding (HEVC) Test Model 16 (HM 16) Improved Encoder Description Update 5", JCT-VC OF ITU-T SG 16 WP3 AND ISO/IEC JTC1/SC29/WG11, JCTVC-W1002 , 23RD MEETING, 19 February 2016 (2016-02-19), San Diego, USA, XP030117925 *
SULLIVAN, G. ET AL.: "Overview of the High Efficiency Video Coding (HEVC) Standard", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, vol. 22, 12 December 2012 (2012-12-12), XP011487803, DOI: doi:10.1109/TCSVT.2012.2221191 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI797533B (en) * 2017-12-12 2023-04-01 美商卡赫倫特羅吉克斯公司 Low latency video codec and transmission with parallel processing
CN111557093A (en) * 2017-12-12 2020-08-18 相干逻辑公司 Low-latency video codec and transmission using parallel processing
US10873754B2 (en) 2017-12-12 2020-12-22 Coherent Logix, Incorporated Low latency video codec and transmission with parallel processing
TWI718452B (en) * 2017-12-12 2021-02-11 美商卡赫倫特羅吉克斯公司 Low latency video codec and transmission with parallel processing
US11323729B2 (en) 2017-12-12 2022-05-03 Coherent Logix, Incorporated Low latency video codec and transmission with parallel processing
WO2019118566A1 (en) * 2017-12-12 2019-06-20 Coherent Logix, Inc. Low latency video codec and transmission with parallel processing
CN111557093B (en) * 2017-12-12 2023-07-04 相干逻辑公司 Low latency video codec and transmission using parallel processing
US11849130B2 (en) 2017-12-12 2023-12-19 Coherent Logix, Incorporated Low latency video codec and transmission with parallel processing
CN112602329A (en) * 2018-06-21 2021-04-02 瑞典爱立信有限公司 Block scrambling for 360 degree video decoding
US11711530B2 (en) 2018-06-21 2023-07-25 Telefonaktiebolaget Lm Ericsson (Publ) Tile shuffling for 360 degree video decoding
CN112602329B (en) * 2018-06-21 2023-08-11 瑞典爱立信有限公司 Block scrambling for 360 degree video decoding
CN116737606A (en) * 2023-08-15 2023-09-12 英诺达(成都)电子科技有限公司 Data caching method, device, equipment and medium based on hardware simulation accelerator
CN116737606B (en) * 2023-08-15 2023-12-05 英诺达(成都)电子科技有限公司 Data caching method, device, equipment and medium based on hardware simulation accelerator

Also Published As

Publication number Publication date
AU2016203291A1 (en) 2017-12-07

Similar Documents

Publication Publication Date Title
US10666948B2 (en) Method, apparatus and system for encoding and decoding video data
KR102140328B1 (en) Method and apparatus for video encoding and decoding based on constrained offset compensation and loop filter
KR102638710B1 (en) Encoder, decoder and response method for inter prediction
KR101732294B1 (en) Low-delay buffering model in video coding
EP2941869B1 (en) Video buffering operations for random access in video coding
US9706230B2 (en) Data encoding and decoding
EP3164993B1 (en) Method for palette mode coding
WO2014100603A1 (en) Multi-type parallelized sample adaptive offset in video coding
CN113545058A (en) Coefficient coding for transform skip mode
JP2022510145A (en) Regularly coded bin reduction for coefficient decoding using thresholds and rice parameters
CN114830673A (en) Shared decoder picture buffer for multiple layers
WO2017197434A1 (en) Method, apparatus and system for encoding and decoding video data
EP2936819A1 (en) Deblocking filter with reduced line buffer
US11412263B2 (en) Arithmetic coder byte stuffing signaling for video coding
AU2016203314A1 (en) Method, apparatus and system for encoding and decoding video data
US20240080446A1 (en) Systems and methods for parameterizing arithmetic coder probability update rates
US20230308651A1 (en) Systems and methods for regularization-free multi-hypothesis arithmetic coding
US11438631B1 (en) Slice based pipelined low latency codec system and method
US20230291935A1 (en) Systems and methods for division-free probability regularization for arithmetic coding
WO2023132991A1 (en) Signaling general constraints information for video coding
JP2024035773A (en) System and method for signaling neural network postfilter characteristic information in video encoding
JP2024514934A (en) Method, computing system and computer program for division-free stochastic regularization for arithmetic coding
CN115152234A (en) Signaling of dynamic range adjustment parameters for decoded picture buffer management and dynamic range
Notebaert Bit rate transcoding of H. 264/AVC based on rate shaping and requantization

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17798379

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17798379

Country of ref document: EP

Kind code of ref document: A1