GB2523993A - Data encoding and decoding - Google Patents

Data encoding and decoding

Info

Publication number
GB2523993A
Authority
GB
United Kingdom
Prior art keywords
data
constraint
encoded
message
encoding
Prior art date
Legal status
Withdrawn
Application number
GB1403983.8A
Other versions
GB201403983D0 (en)
Inventor
James Alexander Gamei
Karl James Sharman
Nicholas Ian Saunders
Current Assignee
Sony Corp
Original Assignee
Sony Corp
Priority date
Filing date
Publication date
Application filed by Sony Corp
Priority to GB1403983.8A
Publication of GB201403983D0
Publication of GB2523993A
Legal status: Withdrawn

Classifications

    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/174 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a slice, e.g. a line of blocks or a group of blocks
    • H04N19/436 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, using parallelised computational arrangements

Abstract

Data is partitioned for encoding into data portions to form encoded output data that includes an associated message. The message indicates a selected constraint on the portion format of a set of data portions that are encoded. The data may be video data and the data portions may be slices of video frames, with the constraint defining the maximum size of the slices. In an aspect of the invention, the message indicating the constraint on the portion format is used by a group of decoders acting in parallel in order to select an optimum buffer size for use in decoding the encoded data. The data may be encoded according to the High Efficiency Video Coding (HEVC) standard and the message included as a supplemental enhancement information (SEI) message. The invention may be used to facilitate efficient parallel decoding of data by signalling to the decoder the maximum size of the data portions within the encoded data, which enables an appropriate buffer size to be selected.

Description

DATA ENCODING AND DECODING
Field of the Invention
This disclosure relates to data encoding and decoding.
Description of the Related Art
The "background" description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description which may not otherwise qualify as prior art at the time of filing, are neither expressly or impliedly admitted as
prior art against the present disclosure.
There are several video data compression and decompression systems which involve transforming video data into a frequency domain representation, quantising the frequency domain coefficients and then applying some form of entropy encoding to the quantised coefficients.
Entropy, in the present context, can be considered as representing the information content of a data symbol or series of symbols. The aim of entropy encoding is to encode a series of data symbols in a lossless manner using (ideally) the smallest number of encoded data bits which are necessary to represent the information content of that series of data symbols. In practice, entropy encoding is used to encode the quantised coefficients such that the encoded data is smaller (in terms of its number of bits) than the data size of the original quantised coefficients. A more efficient entropy encoding process gives a smaller output data size for the same input data size.
One technique for entropy encoding video data is the so-called CABAC (context adaptive binary arithmetic coding) technique.
Summary
This disclosure provides a data encoding method according to claim 1.
Further respective aspects and features are defined in the appended claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary, but not restrictive of, the present disclosure.
Brief Description of the Drawings
A more complete appreciation of the disclosure and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description of embodiments, when considered in connection with the accompanying drawings, wherein:
Figure 1 schematically illustrates an audio/video (A/V) data transmission and reception system using video data compression and decompression;
Figure 2 schematically illustrates a video display system using video data decompression;
Figure 3 schematically illustrates an audio/video storage system using video data compression and decompression;
Figure 4 schematically illustrates a video camera using video data compression;
Figure 5 provides a schematic overview of a video data compression and decompression apparatus;
Figure 6 schematically illustrates the generation of predicted images;
Figure 7 schematically illustrates a largest coding unit (LCU);
Figure 8 schematically illustrates a set of four coding units (CU);
Figures 9 and 10 schematically illustrate the coding units of Figure 8 sub-divided into smaller coding units;
Figure 11 schematically illustrates an array of prediction units (PU);
Figure 12 schematically illustrates an array of transform units (TU);
Figure 13 schematically illustrates a partially-encoded image;
Figure 14 schematically illustrates a set of possible prediction directions;
Figure 15 schematically illustrates a set of prediction modes;
Figure 16 schematically illustrates a zigzag scan;
Figure 17 schematically illustrates a CABAC entropy encoder;
Figure 18 is a schematic diagram of a parallelised CABAC decoder;
Figure 19 is a schematic timing diagram showing the operation of the system of Figure 18;
Figure 20 schematically illustrates a variation in slice number and configuration from frame to frame;
Figure 21 schematically illustrates a set of NAL units;
Figure 22 schematically illustrates a video processing device;
Figures 23a to 23c schematically illustrate processing tile configurations;
Figures 24a and 24b schematically illustrate changes in latency in the operation of the device of Figure 22;
Figure 25 schematically illustrates a second example of a video processing device;
Figure 26 schematically illustrates a part of an encoding technique; and
Figure 27 schematically illustrates a part of a decoding technique.
Description of the Embodiments
Referring now to the drawings, Figures 1-4 are provided to give schematic illustrations of apparatus or systems making use of the compression and/or decompression apparatus to be described below in connection with embodiments.
All of the data compression and/or decompression apparatus to be described below may be implemented in hardware, in software running on a general-purpose data processing apparatus such as a general-purpose computer, as programmable hardware such as an application specific integrated circuit (ASIC) or field programmable gate array (FPGA) or as combinations of these. In cases where the embodiments are implemented by software and/or firmware, it will be appreciated that such software and/or firmware, and non-transitory machine-readable data storage media by which such software and/or firmware are stored or otherwise provided, are considered as embodiments.
Figure 1 schematically illustrates an audio/video data transmission and reception system using video data compression and decompression.
An input audio/video signal 10 is supplied to a video data compression apparatus 20 which compresses at least the video component of the audio/video signal 10 for transmission along a transmission route 30 such as a cable, an optical fibre, a wireless link or the like. The compressed signal is processed by a decompression apparatus 40 to provide an output audio/video signal 50. For the return path, a compression apparatus 60 compresses an audio/video signal for transmission along the transmission route 30 to a decompression apparatus 70.
The compression apparatus 20 and decompression apparatus 70 can therefore form one node of a transmission link. The decompression apparatus 40 and compression apparatus 60 can form another node of the transmission link. Of course, in instances where the transmission link is uni-directional, only one of the nodes would require a compression apparatus and the other node would only require a decompression apparatus.
Figure 2 schematically illustrates a video display system using video data decompression. In particular, a compressed audio/video signal 100 is processed by a decompression apparatus 110 to provide a decompressed signal which can be displayed on a display 120. The decompression apparatus 110 could be implemented as an integral part of the display 120, for example being provided within the same casing as the display device.
Alternatively, the decompression apparatus 110 might be provided as (for example) a so-called set top box (STB), noting that the expression "set-top" does not imply a requirement for the box to be sited in any particular orientation or position with respect to the display 120; it is simply a term used in the art to indicate a device which is connectable to a display as a peripheral device.
Figure 3 schematically illustrates an audio/video storage system using video data compression and decompression. An input audio/video signal 130 is supplied to a compression apparatus 140 which generates a compressed signal for storing by a store device 150 such as a magnetic disk device, an optical disk device, a magnetic tape device, a solid state storage device such as a semiconductor memory or other storage device. For replay, compressed data is read from the store device 150 and passed to a decompression apparatus 160 for decompression to provide an output audio/video signal 170.
It will be appreciated that the compressed or encoded signal, and a storage medium or data carrier storing that signal, are considered as embodiments.
Figure 4 schematically illustrates a video camera using video data compression. In Figure 4, an image capture device 180, such as a charge coupled device (CCD) image sensor and associated control and read-out electronics, generates a video signal which is passed to a compression apparatus 190. A microphone (or plural microphones) 200 generates an audio signal to be passed to the compression apparatus 190. The compression apparatus 190 generates a compressed audio/video signal 210 to be stored and/or transmitted (shown generically as a schematic stage 220).
The techniques to be described below relate primarily to video data compression. It will be appreciated that many existing techniques may be used for audio data compression in conjunction with the video data compression techniques which will be described, to generate a compressed audio/video signal. Accordingly, a separate discussion of audio data compression will not be provided. It will also be appreciated that the data rate associated with video data, in particular broadcast quality video data, is generally very much higher than the data rate associated with audio data (whether compressed or uncompressed). It will therefore be appreciated that uncompressed audio data could accompany compressed video data to form a compressed audio/video signal. It will further be appreciated that although the present examples (shown in Figures 1-4) relate to audio/video data, the techniques to be described below can find use in a system which simply deals with (that is to say, compresses, decompresses, stores, displays and/or transmits) video data. That is to say, the embodiments can apply to video data compression without necessarily having any associated audio data handling at all.
Figure 5 provides a schematic overview of a video data compression and decompression apparatus.
Successive images of an input video signal 300 are supplied to an adder 310 and to an image predictor 320. The image predictor 320 will be described below in more detail with reference to Figure 6. The adder 310 in fact performs a subtraction (negative addition) operation, in that it receives the input video signal 300 on a "+" input and the output of the image predictor 320 on a "-" input, so that the predicted image is subtracted from the input image. The result is to generate a so-called residual image signal 330 representing the difference between the actual and projected images.
One reason why a residual image signal is generated is as follows. The data coding techniques to be described, that is to say the techniques which will be applied to the residual image signal, tend to work more efficiently when there is less "energy" in the image to be encoded. Here, the term "efficiently" refers to the generation of a small amount of encoded data; for a particular image quality level, it is desirable (and considered "efficient") to generate as little data as is practicably possible. The reference to "energy" in the residual image relates to the amount of information contained in the residual image. If the predicted image were to be identical to the real image, the difference between the two (that is to say, the residual image) would contain zero information (zero energy) and would be very easy to encode into a small amount of encoded data. In general, if the prediction process can be made to work reasonably well, the expectation is that the residual image data will contain less information (less energy) than the input image and so will be easier to encode into a small amount of encoded data.
The residual image data 330 is supplied to a transform unit 340 which generates a discrete cosine transform (DCT) representation of the residual image data. The DCT technique itself is well known and will not be described in detail here. There are however aspects of the techniques used in the present apparatus which will be described in more detail below, in particular relating to the selection of different blocks of data to which the DCT operation is applied. These will be discussed with reference to Figures 7-12 below.
Note that in some embodiments, a discrete sine transform (DST) is used instead of a DCT. In other embodiments, no transform might be used. This can be done selectively, so that the transform stage is, in effect, bypassed, for example under the control of a "transform skip" command or mode.
The output of the transform unit 340, which is to say, a set of transform coefficients for each transformed block of image data, is supplied to a quantiser 350. Various quantisation techniques are known in the field of video data compression, ranging from a simple multiplication by a quantisation scaling factor through to the application of complicated lookup tables under the control of a quantisation parameter. The general aim is twofold. Firstly, the quantisation process reduces the number of possible values of the transformed data. Secondly, the quantisation process can increase the likelihood that values of the transformed data are zero. Both of these can make the entropy encoding process, to be described below, work more efficiently in generating small amounts of compressed video data.
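By way of a purely illustrative sketch (in Python, and not forming part of any embodiment), the two effects of quantisation noted above can be seen by dividing each transform coefficient by a step size and rounding; the name q_step stands in for whatever step size a real codec would derive from its quantisation parameter:

import numpy as np

def quantise(coeffs: np.ndarray, q_step: float) -> np.ndarray:
    # Scalar quantisation: fewer distinct values, and more zero values,
    # the larger the step size.
    return np.round(coeffs / q_step).astype(np.int32)

def dequantise(levels: np.ndarray, q_step: float) -> np.ndarray:
    # Approximate inverse used on the decoding / return-path side.
    return levels.astype(np.float32) * q_step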
A data scanning process is applied by a scan unit 360. The purpose of the scanning process is to reorder the quantised transformed data so as to gather as many as possible of the non-zero quantised transformed coefficients together, and of course therefore to gather as many as possible of the zero-valued coefficients together. These features can allow so-called run-length coding or similar techniques to be applied efficiently. So, the scanning process involves selecting coefficients from the quantised transformed data, and in particular from a block of coefficients corresponding to a block of image data which has been transformed and quantised, according to a "scanning order" so that (a) all of the coefficients are selected once as part of the scan, and (b) the scan tends to provide the desired reordering. Techniques for selecting a scanning order will be described below. One example scanning order which can tend to give useful results is a so-called zigzag scanning order.
The scanned coefficients are then passed to an entropy encoder (EE) 370. Again, various types of entropy encoding may be used. Two examples which will be described below are variants of the so-called CABAC (Context Adaptive Binary Arithmetic Coding) system and variants of the so-called CAVLC (Context Adaptive Variable-Length Coding) system. In general terms, CABAC is considered to provide a better efficiency, and in some studies has been shown to provide a 10-20% reduction in the quantity of encoded output data for a comparable image quality compared to CAVLC. However, CAVLC is considered to represent a much lower level of complexity (in terms of its implementation) than CABAC. The CABAC technique will be discussed with reference to Figure 17 below.
Note that the scanning process and the entropy encoding process are shown as separate processes, but in fact can be combined or treated together. That is to say, the reading of data into the entropy encoder can take place in the scan order. Corresponding considerations apply to the respective inverse processes to be described below.
The output of the entropy encoder 370, along with additional data (mentioned above and/or discussed below), for example defining the manner in which the predictor 320 generated the predicted image, provides a compressed output video signal 380.
However, a return path is also provided because the operation of the predictor 320 itself depends upon a decompressed version of the compressed output data.
The reason for this feature is as follows. At the appropriate stage in the decompression process (to be described below) a decompressed version of the residual data is generated. This decompressed residual data has to be added to a predicted image to generate an output image (because the original residual data was the difference between the input image and a predicted image). In order that this process is comparable, as between the compression side and the decompression side, the predicted images generated by the predictor 320 should be the same during the compression process and during the decompression process. Of course, at decompression, the apparatus does not have access to the original input images, but only to the decompressed images. Therefore, at compression, the predictor 320 bases its prediction (at least, for inter-image encoding) on decompressed versions of the compressed images.
The entropy encoding process carried out by the entropy encoder 370 is considered to be "lossless", which is to say that it can be reversed to arrive at exactly the same data which was first supplied to the entropy encoder 370. So, the return path can be implemented before the entropy encoding stage. Indeed, the scanning process carried out by the scan unit 360 is also considered lossless, but in the present embodiment the return path 390 is from the output of the quantiser 350 to the input of a complementary inverse quantiser 420.
In general terms, an entropy decoder 410, the reverse scan unit 400, an inverse quantiser 420 and an inverse transform unit 430 provide the respective inverse functions of the entropy encoder 370, the scan unit 360, the quantiser 350 and the transform unit 340. For now, the discussion will continue through the compression process; the process to decompress an input compressed video signal will be discussed separately below.
In the compression process, the quantised coefficients are passed by the return path 390 from the quantiser 350 to the inverse quantiser 420, which carries out the inverse operation of the quantiser 350. An inverse quantisation and inverse transformation process are carried out by the units 420, 430 to generate a compressed-decompressed residual image signal 440.
The image signal 440 is added, at an adder 450, to the output of the predictor 320 to generate a reconstructed output image 460. This forms one input to the image predictor 320, as will be described below.
Turning now to the process applied to a received compressed video signal 470, the signal is supplied to the entropy decoder 410 and from there to the chain of the reverse scan unit 400, the inverse quantiser 420 and the inverse transform unit 430 before being added to the output of the image predictor 320 by the adder 450. In straightforward terms, the output 460 of the adder 450 forms the output decompressed video signal 480. In practice, further filtering may be applied before the signal is output.
Figure 6 schematically illustrates the generation of predicted images, and in particular the operation of the image predictor 320.
There are two basic modes of prediction: so-called intra-image prediction and so-called inter-image, or motion-compensated (MC), prediction.
Intra-image prediction bases a prediction of the content of a block of the image on data from within the same image. This corresponds to so-called I-frame encoding in other video compression techniques. In contrast to I-frame encoding, where the whole image is intra-encoded, in the present embodiments the choice between intra- and inter-encoding can be made on a block-by-block basis, though in other embodiments the choice is still made on an image-by-image basis.
Motion-compensated prediction makes use of motion information which attempts to define the source, in another adjacent or nearby image, of image detail to be encoded in the current image. Accordingly, in an ideal example, the contents of a block of image data in the predicted image can be encoded very simply as a reference (a motion vector) pointing to a corresponding block at the same or a slightly different position in an adjacent image.
Returning to Figure 6, two image prediction arrangements (corresponding to intra- and inter-image prediction) are shown, the results of which are selected by a multiplexer 500 under the control of a mode signal 510 so as to provide blocks of the predicted image for supply to the adders 310 and 450. The choice is made in dependence upon which selection gives the lowest "energy" (which, as discussed above, may be considered as information content requiring encoding), and the choice is signalled to the decoder within the encoded output datastream.
Image energy, in this context, can be detected, for example, by carrying out a trial subtraction of an area of the two versions of the predicted image from the input image, squaring each pixel value of the difference image, summing the squared values, and identifying which of the two versions gives rise to the lower mean squared value of the difference image relating to that image area.
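A minimal sketch of this trial-subtraction measure is given below (Python, illustrative only; the function names are arbitrary). It computes the mean squared value of the difference image for each candidate prediction and picks the lower one:

import numpy as np

def prediction_energy(block: np.ndarray, prediction: np.ndarray) -> float:
    # Mean squared value of the residual for one image area.
    residual = block.astype(np.float64) - prediction.astype(np.float64)
    return float(np.mean(residual ** 2))

def choose_prediction(block: np.ndarray, intra_pred: np.ndarray, inter_pred: np.ndarray) -> str:
    # Select whichever candidate prediction leaves the lower-energy residual.
    return ("intra"
            if prediction_energy(block, intra_pred) <= prediction_energy(block, inter_pred)
            else "inter")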
The actual prediction, in the intra-encoding system, is made on the basis of image blocks received as part of the signal 460, which is to say, the prediction is based upon encoded-decoded image blocks in order that exactly the same prediction can be made at a decompression apparatus. However, data can be derived from the input video signal 300 by an intra-mode selector 520 to control the operation of the intra-image predictor 530.
For inter-image prediction, a motion compensated (MC) predictor 540 uses motion information such as motion vectors derived by a motion estimator 550 from the input video signal 300. Those motion vectors are applied to a processed version of the reconstructed image 460 by the motion compensated predictor 540 to generate blocks of the inter-image prediction.
The processing applied to the signal 460 will now be described. Firstly, the signal is filtered by a filter unit 560. This involves applying a "deblocking" filter to remove or at least tend to reduce the effects of the block-based processing carried out by the transform unit 340 and subsequent operations. Also, an adaptive loop filter is applied using coefficients derived by processing the reconstructed signal 460 and the input video signal 300. The adaptive loop filter is a type of filter which, using known techniques, applies adaptive filter coefficients to the data to be filtered. That is to say, the filter coefficients can vary in dependence upon various factors.
Data defining which filter coefficients to use is included as part of the encoded output datastream.
The filtered output from the filter unit 560 in fact forms the output video signal 480. It is also buffered in one or more image stores 570; the storage of successive images is a requirement of motion compensated prediction processing, and in particular the generation of motion vectors. To save on storage requirements, the stored images in the image stores 570 may be held in a compressed form and then decompressed for use in generating motion vectors. For this particular purpose, any known compression / decompression system may be used. The stored images are passed to an interpolation filter 580 which generates a higher resolution version of the stored images; in this example, intermediate samples (sub-samples) are generated such that the resolution of the interpolated image output by the interpolation filter 580 is 8 times (in each dimension) that of the images stored in the image stores 570. The interpolated images are passed as an input to the motion estimator 550 and also to the motion compensated predictor 540.
In embodiments, a further optional stage is provided, which is to multiply the data values of the input video signal by a factor of four using a multiplier 600 (effectively just shifting the data values left by two bits), and to apply a corresponding divide operation (shift right by two bits) at the output of the apparatus using a divider or right-shifter 610. So, the shifting left and shifting right changes the data purely for the internal operation of the apparatus. This measure can provide for higher calculation accuracy within the apparatus, as the effect of any data rounding errors is reduced.
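The shift-based scaling can be expressed very simply; the following Python lines are only an illustration of the left shift at the input and the matching right shift at the output:

def scale_up(sample: int) -> int:
    # Multiply by four (shift left by two bits) for higher internal precision.
    return sample << 2

def scale_down(value: int) -> int:
    # Corresponding divide by four (shift right by two bits) at the output.
    return value >> 2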
The way in which an image is partitioned for compression processing will now be described. At a basic level, an image to be compressed is considered as an array of blocks of samples. For the purposes of the present discussion, the largest such block under consideration is a so-called largest coding unit (LCU) 700 (Figure 7), which represents a square array of 64 x 64 samples. Here, the discussion relates to luminance samples. Depending on the chrominance mode, such as 4:4:4, 4:2:2, 4:2:0 or 4:4:4:4 (GBR plus key data), there will be differing numbers of corresponding chrominance samples corresponding to the luminance block.
Three basic types of blocks will be described: coding units, prediction units and transform units. In general terms, the recursive subdividing of the LCUs allows an input picture to be partitioned in such a way that both the block sizes and the block coding parameters (such as prediction or residual coding modes) can be set according to the specific characteristics of the image to be encoded.
The LCU may be subdivided into so-called coding units (CU). Coding units are always square and have a size between 8x8 samples and the full size of the LCU 700. The coding units can be arranged as a kind of tree structure, so that a first subdivision may take place as shown in Figure 8, giving coding units 710 of 32x32 samples; subsequent subdivisions may then take place on a selective basis so as to give some coding units 720 of 16x16 samples (Figure 9) and potentially some coding units 730 of 8x8 samples (Figure 10). Overall, this process can provide a content-adapting coding tree structure of CU blocks, each of which may be as large as the LCU or as small as 8x8 samples. Encoding of the output video data takes place on the basis of the coding unit structure.
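The recursive nature of this subdivision can be illustrated by the following Python sketch (not an embodiment; should_split is a hypothetical callback standing in for whatever rate/distortion or content-analysis decision an encoder actually uses):

def split_into_cus(lcu_size: int, min_cu: int, should_split) -> list:
    # Recursively subdivide a square LCU into square CUs, returning a list of
    # (x, y, size) tuples describing the resulting quadtree leaves.
    cus = []

    def recurse(x, y, size):
        if size > min_cu and should_split(x, y, size):
            half = size // 2
            for dy in (0, half):
                for dx in (0, half):
                    recurse(x + dx, y + dy, half)
        else:
            cus.append((x, y, size))

    recurse(0, 0, lcu_size)
    return cus

# Example: split any block larger than 16x16 in a 64x64 LCU, giving 16x16 CUs.
print(split_into_cus(64, 8, lambda x, y, size: size > 16))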
Figure 11 schematically illustrates an array of prediction units (PU). A prediction unit is a basic unit for carrying information relating to the image prediction processes, or in other words the additional data added to the entropy encoded residual image data to form the output video signal from the apparatus of Figure 5. In general, prediction units are not restricted to being square in shape. They can take other shapes, in particular rectangular shapes forming half of one of the square coding units, as long as the coding unit is greater than the minimum (8x8) size. The aim is to allow the boundary of adjacent prediction units to match (as closely as possible) the boundary of real objects in the picture, so that different prediction parameters can be applied to different real objects. Each coding unit may contain one or more prediction units.
Figure 12 schematically illustrates an array of transform units (TU). A transform unit is a basic unit of the transform and quantisation process. Transform units are always square and can take a size from 4x4 up to 32x32 samples. Each coding unit can contain one or more transform units. The acronym SDIP-P in Figure 12 signifies a so-called short distance intra-prediction partition. In this arrangement only one dimensional transforms are used, so a 4xN block is passed through N transforms with input data to the transforms being based upon the previously decoded neighbouring blocks and the previously decoded neighbouring lines within the current SDIP-P.
The intra-prediction process will now be discussed. In general terms, intra-prediction involves generating a prediction of a current block (a prediction unit) of samples from previously-encoded and decoded samples in the same image. Figure 13 schematically illustrates a partially encoded image 800. Here, the image is being encoded from top-left to bottom-right on an LCU basis. An example LCU encoded partway through the handling of the whole image is shown as a block 810. A shaded region 820 above and to the left of the block 810 has already been encoded. The intra-image prediction of the contents of the block 810 can make use of any of the shaded area 820 but cannot make use of the unshaded area below that.
The block 810 represents an LCU; as discussed above, for the purposes of intra-image prediction processing, this may be subdivided into a set of smaller prediction units. An example of a prediction unit 830 is shown within the LCU 810.
The intra-image prediction takes into account samples above and/or to the left of the current LCU 810. Source samples, from which the required samples are predicted, may be located at different positions or directions relative to a current prediction unit within the LCU 810. To decide which direction is appropriate for a current prediction unit, the results of a trial prediction based upon each candidate direction are compared in order to see which candidate direction gives an outcome which is closest to the corresponding block of the input image. The candidate direction giving the closest outcome is selected as the prediction direction for that prediction unit.
The picture may also be encoded on a "slice" basis. In one example, a slice is a horizontally adjacent group of LCUs. But in more general terms, the entire residual image could form a slice, or a slice could be a single LCU, or a slice could be a row of LCUs, and so on.
Slices can give some resilience to errors as they are encoded as independent units. The encoder and decoder states are completely reset at a slice boundary. For example, intra-prediction is not carried out across slice boundaries; slice boundaries are treated as image boundaries for this purpose.
Figure 14 schematically illustrates a set of possible (candidate) prediction directions. The full set of 34 candidate directions is available to a prediction unit of 8x8, 16x16 or 32x32 samples. The special cases of prediction unit sizes of 4x4 and 64x64 samples have a reduced set of candidate directions available to them (17 candidate directions and 5 candidate directions respectively). The directions are determined by horizontal and vertical displacement relative to a current block position, but are encoded as prediction "modes", a set of which is shown in Figure 15. Note that the so-called DC mode represents a simple arithmetic mean of the surrounding upper and left-hand samples.
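Two of the ideas just described, the DC mode and the trial-based selection among candidate predictions, can be sketched as follows (illustrative Python only; a conforming encoder applies the precise reference-sample and mode rules of the standard, which are not reproduced here):

import numpy as np

def dc_prediction(above: np.ndarray, left: np.ndarray, size: int) -> np.ndarray:
    # DC mode: fill the prediction unit with the arithmetic mean of the
    # surrounding upper and left-hand reference samples.
    dc = int(round((above[:size].sum() + left[:size].sum()) / (2 * size)))
    return np.full((size, size), dc, dtype=np.int32)

def select_mode(block: np.ndarray, candidates: dict) -> str:
    # Trial each candidate prediction and keep the one closest to the input
    # block, measured here as the sum of squared differences.
    return min(candidates,
               key=lambda m: float(np.sum((block.astype(np.int64) - candidates[m].astype(np.int64)) ** 2)))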
Figure 16 schematically illustrates a zigzag scan, being a scan pattern which may be applied by the scan unit 360. In Figure 16, the pattern is shown for an example block of 8x8 transform coefficients, with the DC coefficient being positioned at the top left position 840 of the block, and increasing horizontal and vertical spatial frequencies being represented by coefficients at increasing distances downwards and to the right of the top-left position 840.
Note that in some embodiments, the coefficients may be scanned in a reverse order (bottom right to top left using the ordering notation of Figure 16). Also it should be noted that in some embodiments, the scan may pass from left to right across a few (for example between one and three) uppermost horizontal rows, before carrying out a zig-zag of the remaining coefficients.
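One common way of generating such a zigzag ordering is shown below (Python, for illustration; the exact starting direction and any reverse or partial-row variants described above may differ in a given implementation):

def zigzag_order(n: int):
    # (row, col) visiting order of a zigzag scan of an n x n block, starting
    # at the DC position (0, 0) and moving towards higher spatial frequencies.
    order = []
    for s in range(2 * n - 1):                        # successive anti-diagonals
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        order.extend(diag if s % 2 else diag[::-1])   # alternate the direction
    return order

# First few positions for an 8x8 block: (0,0), (0,1), (1,0), (2,0), (1,1), (0,2)
print(zigzag_order(8)[:6])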
Figure 17 schematically illustrates the operation of a CABAC entropy encoder.
In context adaptive encoding of this nature and according to embodiments, a bit of data may be encoded with respect to a probability model, or context, representing an expectation or prediction of how likely it is that the data bit will be a one or a zero. To do this, an input data bit is assigned a code value within a selected one of two (or more generally, a plurality of) complementary sub-ranges of a range of code values, with the respective sizes of the sub-ranges (in embodiments, the respective proportions of the sub-ranges relative to the set of code values) being defined by the context (which in turn is defined by a context variable associated with or otherwise relevant to that input value). A next step is to modify the overall range, which is to say, the set of code values, (for use in respect of a next input data bit or value) in response to the assigned code value and the current size of the selected sub-range. If the modified range is then smaller than a threshold representing a predetermined minimum size (for example, one half of an original range size) then it is increased in size, for example by doubling (shifting left) the modified range, which doubling process can be carried out successively (more than once) if required, until the range has at least the predetermined minimum size. At this point, an output encoded data bit is generated to indicate that a (or each, if more than one) doubling or size-increasing operation took place. A further step is to modify the context (that is, in embodiments, to modify the context variable) for use with or in respect of the next input data bit or value (or, in some embodiments, in respect of a next group of data bits or values to be encoded). This may be carried out by using the current context and the identity of the current "most probable symbol" (either one or zero, whichever is indicated by the context to currently have a greater than 0.5 probability) as an index into a look-up table of new context values, or as inputs to an appropriate mathematical formula from which a new context variable may be derived. The modification of the context variable may, in embodiments, increase the proportion of the set of code values in the sub-range which was selected for the current data value.
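A greatly simplified sketch of the range update and renormalisation just described is given below (Python, illustrative only; the low/offset register, carry handling and the adaptive update of the context itself are deliberately omitted, so this is not a bit-exact coder):

def encode_bit(range_: int, bit: int, p_one: float):
    # One step of a simplified CABAC-style range update. p_one plays the role
    # of the context: the estimated probability that this bit is 1. Returns
    # the new range and the number of encoded bits produced by the
    # renormalisation (doubling) loop.
    QUARTER = 1 << 14
    sub_one = max(1, int(range_ * p_one))      # sub-range assigned to symbol '1'
    range_ = sub_one if bit == 1 else range_ - sub_one
    emitted = 0
    while range_ < QUARTER:                    # double until the range has at
        range_ <<= 1                           # least the minimum size, emitting
        emitted += 1                           # one output bit per doubling
    return range_, emitted

A bit that agrees with a strongly skewed context shrinks the range only slightly, so the doubling loop runs rarely and few bits are emitted; an improbable bit shrinks the range sharply and costs several bits, which is the behaviour described in general terms below.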
The CABAC encoder operates in respect of binary data, that is to say, data represented by only the two symbols 0 and 1. The encoder makes use of a so-called context modelling process which selects a "context" or probability model for subsequent data on the basis of previously encoded data. The selection of the context is carried out in a deterministic way so that the same determination, on the basis of previously decoded data, can be performed at the decoder without the need for further data (specifying the context) to be added to the encoded datastream passed to the decoder.
Referring to Figure 17, input data to be encoded may be passed to a binary converter 900 if it is not already in a binary form; if the data is already in binary form, the converter 900 is bypassed (by a schematic switch 910). In the present embodiments, conversion to a binary form is actually carried out by expressing the quantised transform coefficient data as a series of binary "maps", which will be described further below.
The binary data or bins may then be handled by one of two processing paths, a "regular" and a "bypass" path (which are shown schematically as separate paths but which, in embodiments discussed below, could in fact be implemented by the same processing stages, just using slightly different parameters). The bypass path employs a so-called bypass coder 920 which does not necessarily make use of context modelling in the same form as the regular path.
In some examples of CABAC coding, this bypass path can be selected if there is a need for particularly rapid processing of a batch of data, but in the present embodiments two features of so-called "bypass" data are noted: firstly, the bypass data is handled by the CABAC encoder (950, 960), just using a fixed context model representing a 50% probability; and secondly, the bypass data relates to certain categories of data, one particular example being coefficient sign data. Otherwise, the regular path is selected by schematic switches 930, 940. This involves the data being processed by a context modeller 950 followed by a coding engine 960.
The entropy encoder shown in Figure 17 encodes a block of data (that is, for example, data corresponding to a block of coefficients relating to a block of the residual image) as a single value if the block is formed entirely of zero-valued data. For each block that does not fall into this category, that is to say a block that contains at least some non-zero data, a "significance map" is prepared. The significance map indicates whether, for each position in a block of data to be encoded, the corresponding coefficient in the block is non-zero (and so is an example of a significance map indicative of positions, relative to an array of the data values, of most-significant data portions which are non-zero). The significance map may comprise a data flag indicative of the position, according to a predetermined ordering of the array of data values, of the last of the most-significant data portions having a non-zero value. The significance map data, being in binary form, is itself CABAC encoded. The use of the significance map assists with compression because no data needs to be encoded for a coefficient with a magnitude that the significance map indicates to be zero. Also, the significance map can include a special code to indicate the final non-zero coefficient in the block, so that all of the final high frequency / trailing zero coefficients can be omitted from the encoding. The significance map is followed, in the encoded bitstream, by data defining the values of the non-zero coefficients specified by the significance map.
Further levels of map data are also prepared and are CABAC encoded. An example is a map which defines, as a binary value (1 = yes, 0 = no) whether the coefficient data at a map position which the significance map has indicated to be "non-zero" actually has the value of "one". Another map specifies whether the coefficient data at a map position which the significance map has indicated to be "non-zero" actually has the value of "two". A further map indicates, for those map positions where the significance map has indicated that the coefficient data is "non-zero", whether the data has a value of "greater than two". Another map indicates, again for data identified as "non-zero", the sign of the data value (using a predetermined binary notation such as 1 for +, 0 for -, or of course the other way around).
In embodiments, the significance map and other maps are generated from the quantised transform coefficients, for example by the scan unit 360, and are subjected to a zigzag scanning process (or a scanning process selected from zigzag, horizontal raster and vertical raster scanning according to the intra-prediction mode) before being subjected to CABAC encoding.
In some embodiments, the HEVC CABAC entropy coder codes syntax elements using the following processes (a simplified sketch of the per-coefficient maps follows this list):
The location of the last significant coefficient (in scan order) in the TU is coded.
For each 4x4 coefficient group (groups are processed in reverse scan order), a significant-coefficient-group flag is coded, indicating whether or not the group contains non-zero coefficients. This is not required for the group containing the last significant coefficient and is assumed to be 1 for the top-left group (containing the DC coefficient). If the flag is 1, then the following syntax elements pertaining to the group are coded immediately following it:
Significance map: For each coefficient in the group, a flag is coded indicating whether or not the coefficient is significant (has a non-zero value). No flag is necessary for the coefficient indicated by the last-significant position.
Greater-than-one map: For up to eight coefficients with significance map value 1 (counted backwards from the end of the group), this indicates whether the magnitude is greater than 1.
Greater-than-two flag: For up to one coefficient with greater-than-one map value 1 (the one nearest the end of the group), this indicates whether the magnitude is greater than 2.
Sign bits: For all non-zero coefficients, sign bits are coded as equiprobable CABAC bins, with the last sign bit (in reverse scan order) possibly being instead inferred from parity when sign bit hiding is used.
Escape codes: For any coefficient whose magnitude was not completely described by an earlier syntax element, the remainder is coded as an escape code.
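The relationship between a block of coefficient values and the per-coefficient flags listed above can be illustrated as follows (Python sketch only; it ignores the 4x4 grouping, the per-group flag and the limits on how many greater-than-one/two flags are actually coded, and it assumes all three flags were coded when deriving the escape remainder):

import numpy as np

def coefficient_maps(coeffs: np.ndarray):
    # Derive simplified versions of the significance, greater-than-one,
    # greater-than-two, sign and escape-remainder information for one block.
    mags = np.abs(coeffs)
    significance = (mags > 0).astype(int)
    greater_than_one = (mags > 1).astype(int)
    greater_than_two = (mags > 2).astype(int)
    signs = (coeffs < 0).astype(int)              # 1 taken here to mean negative
    remainders = np.where(mags > 2, mags - 3, 0)  # value carried by escape codes
    return significance, greater_than_one, greater_than_two, signs, remainders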
In general terms, CABAC encoding involves predicting a context, or a probability model, for a next bit to be encoded, based upon other previously encoded data. If the next bit is the same as the bit identified as "most likely" by the probability model, then the encoding of the information that "the next bit agrees with the probability model" can be encoded with great efficiency. It is less efficient to encode that "the next bit does not agree with the probability model", so the derivation of the context data is important to good operation of the encoder. The term "adaptive" means that the context or probability models are adapted, or varied during encoding, in an attempt to provide a good match to the (as yet uncoded) next data.
Using a simple analogy, in the written English language, the letter "U" is relatively uncommon. But in a letter position immediately after the letter "Q", it is very common indeed.
So, a probability model might set the probability of a "U" as a very low value, but if the current letter is a "Q", the probability model for a "U" as the next letter could be set to a very high probability value.
CABAC encoding is used, in the present arrangements, for at least the significance map and the maps indicating whether the non-zero values are one or two, though each of these syntax elements may not be coded for every coefficient. Bypass processing, which in these embodiments is identical to CABAC encoding but for the fact that the probability model is fixed at an equal (0.5:0.5) probability distribution of 1s and 0s, is used for at least the sign data and the parts of the coefficient magnitude that have not been described by an earlier syntax element. For those data positions identified as having parts of their coefficient magnitude not fully described, a separate so-called escape data encoding can be used to encode the actual remaining value of the data, where the actual magnitude value is the remaining magnitude value plus an offset derived from the respective coded syntax elements. This may include a Golomb-Rice encoding technique.
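For completeness, a basic Golomb-Rice code (unary quotient plus k binary remainder bits) is sketched below; HEVC's actual escape coding is more elaborate (it switches to an exponential-Golomb suffix for large values and adapts the Rice parameter), so treat this only as an illustration of the underlying technique named above:

def golomb_rice_encode(value: int, k: int) -> str:
    # Golomb-Rice code of a non-negative value with Rice parameter k:
    # unary-coded quotient, a terminating 0, then k remainder bits.
    quotient = value >> k
    prefix = "1" * quotient + "0"
    suffix = format(value & ((1 << k) - 1), "0{}b".format(k)) if k > 0 else ""
    return prefix + suffix

# Example: value 11 with k = 2 gives quotient 2 and remainder 3, i.e. "11011".
print(golomb_rice_encode(11, 2))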
The CABAC context modelling and encoding process is described in more detail in WD4: Working Draft 4 of High-Efficiency Video Coding, JCTVC-F803_d5, Draft ISO/IEC 23008-HEVC; 201x(E) 2011-10-28.
A significant feature of the CABAC system is that a particular CABAC bitstream has to be decoded by a single decoder. That is to say, the CABAC data for a particular individual bitstream is inherently serialised, because each encoded value depends on previously encoded values, and cannot be handled by multiple decoders in parallel. However, when decoding video at very high operating points (for example, high bit rates and/or high quality such as professional quality), the CABAC throughput requirement is such that it becomes difficult to implement an entropy-decoder capable of decoding the worst-case frame in a timely manner.
This can be addressed through the parallel decoding of multiple slices or slice segments (provided that the picture is partitioned in this way), noting that a slice or slice segment is defined, in part, by the fact that it is self-contained and does not require access to any previous or following slice or slice segment in order to be entropy-decoded. It is also worth noting at this point that a slice cannot extend over a frame boundary, although a particular frame can be represented by a single slice. So, multiple parallel CABAC decoders can be provided with individual slices (or slice segments) or groups of slices being directed to each of the decoders.
But in the worst-case partitioning of the image (at least from the point of view of parallel CABAC decoding), there may be only one slice per frame and therefore to decode the data using the available decoding resources, multiple frames must be decoded in parallel. This necessitates a larger buffer on the output of the set of CABAC decoders and creates a frame delay as the buffer is filled. Though this is not an insurmountable problem, it would normally be preferable to enable the use of a "low-latency" mode (not involving full-frame delays) in which separate regions of each frame can be decoded in parallel, reducing the delay.
Figure 18 is a schematic diagram of a parallelised CABAC decoder. The basic decoding technique makes use of the technology discussed above and will not be described in detail in connection with Figure 18. The purpose of describing Figure 18 is to allow a discussion of the parallel handling of data decoding.
Referring to Figure 18, an incoming data stream 1000 for decoding is passed to a demultiplexer 1010 which splits the datastream amongst multiple parallel decoders 1020. As discussed above, the datastream is split between the multiple decoders on a slice-by-slice basis (or as groups of slices), such that any individual slice is not split between multiple decoders but is handled by a single respective decoder. The reason for this constraint is the serial nature of the CABAC encoded data as discussed above. (Note that the split can in principle be on a slice-segment-by-slice-segment basis; this should be assumed in the present discussion, because for clarity of the discussion the choice between slices and slice segments will not be mentioned explicitly on each occasion. Note also that the term "slice" will therefore be used in a generic sense of "a portion which does not depend on another portion for decoding", to indicate a slice or a slice segment).
Note that the number of decoders 1020 depends on routine system design parameters.
For example, if a video decoder is implemented such that each CABAC decoder 1020 can handle, say, 1/4 of the data rate associated with the highest data rate video data which is to be received by the apparatus, then in order to provide real-time operation at least four such decoders 1020 would need to be provided. However, a 1/4-rate system (as in this example) may in fact be designed to include more than four such decoders. Consider an example in which each frame was divided into five equally sized slices. A four-decoder system as just described could keep up with the data rate of the input data, because each such decoder in this example system could handle 1/4 of the total data rate demanded of the system. However, there would still be a potential delay or latency introduced because only four of the five slices making up an image could be handled concurrently. To address this, some video decoders, in which each CABAC decoder can handle 1/n of the total required data rate, make use of more than n decoders 1020. To allow the system to handle the most inconveniently divided arrangement of slices, some examples make use of 2n-1 decoders 1020.
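The provisioning rule described here can be captured in a couple of lines (Python, illustrative only; rate_fraction corresponds to the 1/n figure in the text):

import math

def decoders_needed(rate_fraction: float, worst_case_rule: bool = True) -> int:
    # Number of parallel entropy decoders to provision when each decoder can
    # sustain rate_fraction of the total required data rate. The 2n-1 rule
    # covers the most inconveniently divided arrangement of slices.
    n = math.ceil(1.0 / rate_fraction)
    return 2 * n - 1 if worst_case_rule else n

# Quarter-rate decoders: at least 4 for real-time operation, 7 under the 2n-1 rule.
print(decoders_needed(0.25, worst_case_rule=False), decoders_needed(0.25))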
It is significant to note that the possible use of parallel entropy decoding is a design decision relating to the video decoder. It would be possible, in principle, to impose constraints relating to this design decision on the operation of the video encoder, for example by imposing constraints on allowable ways in which the images can be partitioned into slices at the encoding stage. However, it is noted that the speed of operation of a decoder (which is the driver behind the need for parallelised operation) may vary in the future, for example as faster chipsets are developed. So, it is considered inappropriate to impose such restrictions, which stem from the current state of the art in decoder implementation, onto the encoder and the encoded bitstream.
Accordingly, an aim of the present disclosure is not to have to impose such constraints on the encoding operation.
After decoding, the respective decoded data streams are passed to a buffer 1030 which fulfils two main purposes in this context. Firstly, it acts as a multiplexer to recombine the split decoded datastreams in the correct order into a single output decoded datastream 1040. Its second purpose is to buffer the data in order to cope with timing differences in the output data.
However, in a so-called low latency mode in which no individual slice takes longer than a frame period to decode, the buffer 1030 can be small, or may not be needed at all, other than as a multiplexer to reassemble the output datastream.
As mentioned above, in a low latency mode, a buffer function may not be needed, or the amount of buffer (and hence delay) needed might just be smaller than in a higher latency mode.
If a buffer is not needed in a low latency mode, then it could be bypassed (though retaining the function of recombining the streams). Alternatively, data could still be passed through the buffer but with the write and read pointers set so that little or no actual buffering delay is imposed.
In some instances the amount of buffering required might be zero. For example, if the stream is intra and the tiles (if necessary depending on image size) are favourably arranged, the decoded pictures can be passed straight to the display (over a multi-link). Internal buffers are needed to store each slice while operations such as deblocking are applied, but there is no need for a full-frame buffer.
The buffer can be removed if not just the entropy decoder but the entire decoder path (inverse quantisers (IQ), inverse transform (IT) and prediction (Pred)) were repeated and if the requirements for decoding meant that each tile was just one slice and that the tile structure matched the next device (for example, a display). One example would be if technology permitted HD to be decoded in one slice, and the 4K requirements were a 4 tile structure, in which case 4K could be decoded without buffer.
However, a more realistic example might be as follows: process HD, where multiple slices are required to get CABAC throughput, but only one IQ/IT/Pred path.
To reduce the buffer requirements, split the picture into multiple column-based tiles of one slice each. The buffer would only need to be 32 rows (the biggest transform size in current proposals), although the base unit could be 64 rows (the general maximum LCU height).
If there were multiple entropy coders, each with their own IQ, IT and Pred, then, although small LCU buffers might be needed for internal processing, the only buffer would be prior to recombining for deblocking and output, where 2 rows of LCUs (128 rows) would be required.
The significant feature here, however, is that the amount of data that has to be buffered in the low latency arrangement (in which the division into slices is such that the data corresponding to a frame can be decoded in substantially a frame period) is lower than the amount of data which has to be buffered in a higher latency arrangement, in which multiple or large portions of frames are being decoded in parallel such that the decoding process takes longer than a frame period, for example two or more frame periods; in such higher latency situations the buffering would need to be longer, for example at least one frame period.
Figure 19 is a schematic timing diagram showing the operation of the system of Figure 18 in a "low latency" mode. Here, time is represented on a schematic axis from left to right.
An upper part of Figure 19 schematically represents an image or frame 1100 formed of (in this example) four slices 1100A...D, which are represented schematically as respective rectangles. A lower portion of Figure 19 provides a timing diagram in which a frame period 1110 is illustrated. The timing is then shown for the parallel decoding of the four slices, noting as before that an individual slice has to be routed to a single decoder 1020. In four subsequent blocks the decoding period for each of the slices 1100A...D is illustrated. The total latency 1115, between the start of reception of the block 1100A and the end of decoding of the block 1100D, is also illustrated. This equates or substantially equates to the frame period 1110.
Accordingly, the decoding of all four slices in this example can start simultaneously at the start of decoding (with perhaps a small stagger delay to allow the data reader to seek ahead in the bit stream). This allows a full frame to be decoded within a frame period with only quarter-speed decoding.
In the HEVC system in the form in which the draft standards exist at the priority date of the present application, the number of slices per frame, and the amount of encoded data represented by each slice, can vary from frame to frame.
Purely for illustration as an example, Figure 20 schematically illustrates a variation in slice number and configuration from frame to frame, in which a first frame 1120 is divided into five slices, a second frame 1130 is divided into four slices, a third frame 1140 is divided into twenty slices, a fourth frame 1150 is divided into four slices and a fifth frame 1160 is represented as a single slice.
The worst-case latency in the decoding process is affected (at least) by the maximum size of a slice in an image. Returning to the previous example, if each CABAC decoder 1020 can handle 1/n of the total data rate, and 2n-1 such decoders are provided so as to cope with the worst case inconvenient division of the frames into slices, then as long as no slice in a frame exceeds 1/n of the total data content of the encoded frame, a so-called low latency mode, in which slices of the frame are distributed amongst the decoders 1020 and are decoded substantially during a single frame period, can be used. However, if the size (for example, the data quantity) of an individual slice is greater than 1/n of the total of a frame, then a higher latency mode must be used because it will be impossible for any individual one of the decoders 1020 to decode that slice during a single frame period.
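As a hedged sketch of this test (the function name and the way the per-slice data quantity is measured are assumptions made only for illustration), a decoder provided with parallel CABAC decoders each handling 1/n of the total rate could check the condition as follows:

    # Sketch only: decide whether the low latency mode described above is usable,
    # given the sizes of the slices making up one encoded frame.
    def low_latency_possible(slice_sizes, n):
        """slice_sizes: data quantity (e.g. bytes or CTUs) per slice of one frame.
        n: each parallel decoder handles 1/n of the total frame data per frame period."""
        total = sum(slice_sizes)
        # no individual slice may exceed 1/n of the total encoded frame data
        return all(s * n <= total for s in slice_sizes)

    # Example: four slices of a frame, with decoders each handling 1/4 of the rate
    print(low_latency_possible([25, 25, 25, 25], 4))   # True
    print(low_latency_possible([70, 10, 10, 10], 4))   # False -> higher latency mode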
In itself, the question of whether a low latency mode or a higher latency mode is used need not affect either the quality or real-time nature of the output decoded data. However, the present disclosure recognises that if a decoder starts operation in the low latency mode and then, because of a change in the slice structure, has to move to the higher latency mode, this change can cause a pause in the delivery of output data, so breaking the real-time nature of the output data.
The present disclosure recognises that in order for the decoder to know that it will be able to bypass (avoid use of) its output buffer 1030, or at least to operate in the low latency mode without risk of such a pause at a later transition to a higher latency mode, it needs to know at the start of decoding that every frame in the sequence will be split into at least a minimum number of slices, with each slice being limited to contain no more than a certain proportion of the data. If the decoder does not have this information, then either the decoder risks a pause in the output data during a later switch from low latency operation to a higher latency operation, or the decoder has to operate all the time in a higher latency mode.
So, this information is provided simply to allow a decoder which might need to know the future progression of the slice structure to be aware of that future progression. A decoder which does not need to know this information (for example, a decoder which uses a single CABAC decoder which is so fast as not to need any parallel operation) can ignore the information. Note that the provision of the information does not, of itself, represent a constraint placed on the operation of the encoder, other than a voluntary one. Under the present disclosure, the encoder could select a mode of operation in which fewer or larger slices are used. A significant feature is just that the encoder includes information defining the constraints on the future progression of the slice structure which the encoder itself has chosen to apply.
It is known for high-bit-rate decoder implementations to operate at a specified number of pixels per second (rather than bits or CABAC bins per second). Therefore it is desirable to express the information discussed here as a limit on the number of pixels that may be present in each slice.
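As a hedged illustration (the CTU size and the pixel figure are assumed example values), such a pixel-based limit can be converted into the equivalent number of CTUs per slice:

    # Sketch only: express a per-slice limit given in pixels as the equivalent CTU
    # count used by the syntax element discussed later.
    def max_ctus_from_pixel_limit(max_pixels_per_slice, ctu_size=64):
        # integer division so that the pixel limit is never exceeded
        return max_pixels_per_slice // (ctu_size * ctu_size)

    # e.g. a limit of a quarter of an HD frame per slice:
    print(max_ctus_from_pixel_limit(1920 * 1080 // 4))   # 126 CTUs of 64x64 samples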
For smaller frame sizes, this subdivision can be effected using wavefront-parallel processing (entropy_coding_sync). As the frame size increases, however, the number of possible wavefronts increases only in proportion to the frame height, while the degree of parallelism required increases in proportion to the frame size (width * height); there is also no further parallelism offered as frame rates are increased. Considering that the HEVC standard defines sample rates up to those equating to 8k-120p video, the parallelism available through wavefront-parallel processing is not by itself sufficient at the highest operating points.
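A hedged worked example (assuming 64x64 CTUs and HD at 60 frames per second as the reference point) illustrates why wavefront parallelism alone does not keep pace with the sample rate:

    # Sketch only: compare the parallelism offered by wavefront processing (one
    # wavefront per CTU row) with the factor by which the sample rate grows
    # relative to HD at 60 frames per second.
    def wavefronts(height, ctu=64):
        return height // ctu

    def sample_rate_factor(width, height, fps, ref=(1920, 1080, 60)):
        return (width * height * fps) / (ref[0] * ref[1] * ref[2])

    print(wavefronts(1080), sample_rate_factor(1920, 1080, 60))    # 16  1.0
    print(wavefronts(4320), sample_rate_factor(7680, 4320, 120))   # 67  32.0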
An additional feature will now be mentioned, with further details discussed below. It is known for devices capable of processing video formats beyond HD to subdivide each frame into regular subsections. When a decoder is outputting to such devices, it is desirable to know at the start of decoding that each frame in the sequence is split into a tile structure that is a superset of the subdivision structure used by the device. This allows the decoder to avoid having to reformat the output frame, which would otherwise incur a 1-frame delay. The decoder would need to know of the availability of low-latency operation at the start of processing, since having to change out of a low-latency mode once it is in use would stall the output.
To signal the slice/tile limitations to the decoder, a new SEI (supplemental enhancement information) message is used. This message has scope over the entire coded video sequence and contains syntax elements informing the decoder of the slice/tile limitations. Since SEI messages are optionally encoded and decoded, there is no requirement for this new message to be implemented in present and future applications where it is not needed.
Note that the SEI message to be discussed below contains two pieces of information.
This is an example arrangement. An SEI or other message could be provided which is defined so as to contain only one or other of the pieces of information to be discussed below. That is to say, it is not essential that the message defines that both information types are provided, either at all or in the same message.
This provides an example of inserting a message into the encoded output data such that the message and the encoded output data can be processed separately by a data decoder.
The message may comprise a HEVC supplemental enhancement information message.
By standardising the latency control in this manner, interoperability across implementations can be ensured, avoiding the need for proprietary variants of the standards which impose arbitrary restrictions on the encoder operation. In addition, the flexibility of this syntax allows the limitations which the encoders might voluntarily place on their own operation to be eased as technology improves.
An SEI message is an example of a so-called NAL (network abstraction layer) unit. (For completeness, note that an SEI NAL unit can actually contain multiple SEI messages, concatenated together. However, just one is considered here).
Figure 21 schematically illustrates a set of NAL units 1200, one of which contains an SEI message 1210. The significance for the purposes of this discussion is that any NAL unit is independent of any other NAL unit to the extent that the presence or absence of any individual such unit does not affect the ability of the decoder to decode the remaining parts of the data stream. So, the SEI message under discussion can be included in the datastream but if it is not required by a particular decoder, it can be ignored without causing any impact on the ability of that particular decoder to decode the remainder of the datastream.
In the present example, a part of the SEI message under discussion defines a maximum number of blocks (see below for more detail) which can be present in any individual slice. In turn, this defines the maximum quantity of data which the decoders 1020 will have to deal with in connection with any individual slice. The presence of this indication in the SEI message defines that this limit will not be broken in respect of either the current sequence or the remainder of the current stream. A second, optional, portion of the SEI message indicates that at least some parameters of the slice structure will not change for the remainder of the current sequence or the current stream. These restrictions are enforced regardless of information contained in subsequent picture parameter sets (PPSs) within the sequence or stream. This provides an example of a constraint indicating a maximum data quantity of encoded data in each data portion.
Note that the portions of the SEI message are referred to as "optional" in the sense that the SEI message may be defined so as not to include those portions. Both portions are optional in this context. If a portion is defined as being part of the SEI message then it should be sent by the encoder, if the encoder elects to send that SEI message. However, in the example syntax below, the value of either syntax element can be zero, indicating that no limit or constraint is applied.
An example of the syntax of the SEI message is as follows:

    latency_control( payloadSize ) {
        maximum_ctus_per_slice_segment
        constant_tile_structure_flag
    }

This SEI message contains information that can be used by a decoder to determine the required degree of latency in order to maintain video rate.
maximum_ctus_per_slice_segment specifies a maximum number of instances of the coding_tree syntax element (defined in the HEVC standards; a coding_tree is a quadtree of CUs coding for a specific region of the picture, the size of which is specified in the sequence parameter set) that may appear in a slice segment. Note that in this example, although the unit under consideration may be a slice, the independent unit is a "slice segment": each segment is a separate NAL unit and hence a separate CABAC stream. The distinction between this and a slice is such that a slice may be made up of one independent slice segment (i.e. one that has a header) and any number of dependent slice segments (ones that take their header information from the primary segment). It allows the slice to be broken up over multiple NAL units if desired.
When maximum_ctus_per_slice_segment is equal to zero, an unlimited number of instances of the coding_tree syntax element may be present in a slice segment. When not present, the value of maximum_ctus_per_slice_segment is inferred to be 0. In other words, unless the value is 0, this represents an example of a constraint applied to the portion (slice or slice segment) structure or format of the encoded video.
This provides an example of encoding each slice as a tree structure of encoding units, the constraint indicating a maximum number of encoding units which may be present in each slice.
constant_tile_structure_flag equal to 1 indicates that the values of the tiles_enabled_flag, num_tile_columns_minus1, num_tile_rows_minus1, uniform_spacing_flag, column_width_minus1 and row_height_minus1 syntax elements shall be identical for all picture parameter sets activated within the CVS (in other words, having scope over a set of pictures).
When not present, the value of constant_tile_structure_flag is inferred to be 0. In other words, unless the flag is 0, this represents an example of a constraint applied to the portion (tile) structure or format of the encoded video by specifying a tile structure. This provides an example of a constraint comprising a constraint on the spatial arrangement of slices or groups of slices in each frame.
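Purely as an illustrative sketch, the following shows how a decoder might read such a payload; the descriptors are assumptions made for this example only (an unsigned Exp-Golomb code for maximum_ctus_per_slice_segment and a single bit for constant_tile_structure_flag), since the actual coding would be fixed by the standardised syntax:

    # Hedged sketch of reading the latency_control( ) payload described above.
    class BitReader:
        def __init__(self, data: bytes):
            self.bits = ''.join(f'{b:08b}' for b in data)
            self.pos = 0
        def u(self, n):
            # read n bits as an unsigned integer
            v = int(self.bits[self.pos:self.pos + n], 2)
            self.pos += n
            return v
        def ue(self):
            # unsigned Exp-Golomb: count leading zeros, then read that many suffix bits
            zeros = 0
            while self.u(1) == 0:
                zeros += 1
            return (1 << zeros) - 1 + (self.u(zeros) if zeros else 0)

    def parse_latency_control(payload: bytes):
        r = BitReader(payload)
        return {'maximum_ctus_per_slice_segment': r.ue(),
                'constant_tile_structure_flag': r.u(1)}

    # e.g. the payload byte 0b01100000 decodes to a limit of 2 CTUs and flag 0
    print(parse_latency_control(bytes([0b01100000])))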
Note that a constraint on the maximum size of the slice indirectly imposes a constraint on the minimum number of slices needed to encode each frame.
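To illustrate this indirect effect, a worked example with assumed figures follows; the frame size, CTU size and signalled limit are chosen purely for illustration:

    from math import ceil
    # Worked example with assumed figures: a 3840x2160 frame of 64x64 CTUs and a
    # signalled maximum of 510 CTUs per slice segment.
    ctus_per_frame = ceil(3840 / 64) * ceil(2160 / 64)       # 60 * 34 = 2040 CTUs
    maximum_ctus_per_slice_segment = 510
    min_slices = ceil(ctus_per_frame / maximum_ctus_per_slice_segment)
    print(ctus_per_frame, min_slices)                         # 2040 4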
The SEI message can be included in the datastream such that any frames forming part of the same sequence (or in other embodiments the entire remainder of the datastream) are constrained by the encoder to within the limits and constraints specified by the SEI message.
A data stream including such a message provides an example of an instance of distribution of encoded data which is partitioned into data portions, the encoded data comprising a data message indicating a constraint on the portion format of each data portion in respect of a set of the data portions.
As mentioned above, maintaining a consistent frame sub-structure can be relevant to certain types of video processing apparatus which handle video data in predetermined subdivisions of a frame. An example of such a device is a display device but other examples may be considered. Figure 22 schematically illustrates such a video processing device in which input video data 1300 is buffered and formatted by a buffer 1310 before being handled as separate spatial image regions, a feature which is shown schematically by the routing of the data from the buffer 1310 to separate data handling devices 1320. In order to achieve this, the buffer 1310 needs to receive and store sufficient data to form each processing subdivision before the subdivisions can be fully output for further processing. In the example of a display device, all of the subdivisions are processed (displayed) at the same time and so in order to achieve this each of the subdivisions must have been received and formatted by a consistent time.
Figures 23a to 23c schematically illustrate example processing subdivision configurations. Figure 23a represents a set of vertical subdivisions. Figure 23b represents an array of vertically spaced and horizontally spaced subdivisions. Figure 23c represents a set of horizontal subdivisions. The particular subdivision configuration would normally be fixed (or at least not easy to change) according to design parameters of the video processing device under consideration. So, in the example of a display device, the subdivision structure might relate to the way in which physical interconnections to the display are provided and so the processing subdivision structure may be difficult or impossible to change from frame to frame.
However, an issue can arise because of the need to receive all of the data for a particular subdivision location before outputting that tile. If the tile structure by which the video data is encoded and decoded changes from frame to frame, this can affect the latency involved in building up each tile's worth of data for output.
The tiles are a picture handling method employed during the coding and decoding process. Ideally, the coding tile structure should align with the way in which the display or other device (Figure 22) subdivides itself (the processing subdivisions discussed above) so that the decoder can start outputting to all (for example) four data handling devices or routes 1320 as soon as it starts decoding, without some of the devices 1320 having to wait for data that will be decoded later. It would be no problem here to have multiple tiles per subdivision, as the decoder could simply decode first the tiles that correspond to the first part of each subdivision, but it causes more issues to have tiles which cross subdivisions, because then one of the subdivisions would have to wait while data corresponding to the other subdivision is being decoded. In practice, decoders do not stagger or stall subdivision outputs in this way if the tile structure is not favourable; rather, a frame buffer is used to wait for the entire frame to be decoded before reformatting it into a form suitable for the display, thus incurring a frame delay.
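As a hedged sketch of this alignment consideration (representing tile and subdivision layouts simply as sets of horizontal boundary positions is an assumption made for illustration only), the check could be expressed as follows:

    # Sketch only: the tile structure suits the device if every device subdivision
    # boundary is also a coding tile boundary, i.e. the tiles are a superset of
    # the processing subdivisions (tiles may subdivide further without harm).
    def tiles_align_with_device(tile_boundaries, device_boundaries):
        return set(device_boundaries) <= set(tile_boundaries)

    # four tile rows matching four device subdivisions of a 2160-line frame
    print(tiles_align_with_device({540, 1080, 1620}, {540, 1080, 1620}))  # True
    # only two tile rows: the subdivisions at 540 and 1620 cross tile boundaries
    print(tiles_align_with_device({1080}, {540, 1080, 1620}))             # False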
As before, there can be a problem if the relationship between encoding/decoding tiles and processing subdivisions changes, such that extra latency is introduced. This can cause a pause in the handling (for example display) of the decoded video.
Figures 24a and 24b schematically illustrate changes in latency in the operation of the device of Figure 22.
Referring to Figure 24a, an encoding/decoding tile structure 1350 comprises (for the sake of this simplified example) four tiles 1350A, 1350B, 1350C, 1350D, processed in that order. The processing subdivision structure corresponding to the division of the data according to Figure 22 comprises four horizontal processing subdivisions 1360. In order to output the upper pair 1362 of subdivisions 1360, the tiles 1350A and 1350B must have been processed. In order to output the lower pair of subdivisions 1364, the tiles 1350C and 1350D must have been processed.
Consider, however, the case where the encoding/decoding tile structure has changed to a structure 1370 formed of two tiles 1370E, 1370F (processed in that order) as shown in Figure 24b. Here, none of the output subdivisions 1360 can be output until both tiles 1370E and 1370F have been fully processed. In this case, the latency imposed by the buffer 1310 has increased to a whole frame period.
Taken individually, neither latency is necessarily a problem, but as before a change from the operation shown in Figure 24a to the operation shown in Figure 24b (purely by way of example) can cause the real-time processing or output of the video data to be broken, in that when the change is first made, there will be a half frame period during which no output data can be processed.
For this reason, the buffer 1310 can be arranged (as in Figure 25) to be responsive to the SEI message or data derived from that message discussed above, so that the buffer 1310 knows, in respect of the current sequence or stream, (a) the worst-case (maximum) slice size and (b) that the overall tile structure will remain consistent. The buffer 1310 can then set its internal latency to suit the worst case of those defined by the SEI message such that either there will be no change over the course of the current sequence or stream or, if there is a change, it will not cause an unwanted pause in the output or processing of the video data.
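A hedged sketch of such a buffer configuration follows; the mapping from the signalled constraint to a latency value, and the example figures, are illustrative assumptions rather than part of any standard:

    # Hedged sketch: configure the latency of the buffer 1310 once, at the start of
    # decoding, from the SEI information, so that it never has to change mid-sequence.
    def choose_buffer_latency_frames(sei, ctus_per_frame, n_parallel_decoders):
        if sei is None or sei['maximum_ctus_per_slice_segment'] == 0:
            return 1.0                          # no guarantee: plan for a whole-frame buffer
        max_ctus = sei['maximum_ctus_per_slice_segment']
        if max_ctus * n_parallel_decoders <= ctus_per_frame:
            return 1.0 / n_parallel_decoders    # low latency mode can be held for the whole scope
        return 1.0                              # constraint too loose for this decoder count

    sei = {'maximum_ctus_per_slice_segment': 510, 'constant_tile_structure_flag': 1}
    print(choose_buffer_latency_frames(sei, 2040, 4))    # 0.25 of a frame period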
Although the example has been used of an SEI message, it will be appreciated that other ways could be used to communicate either or both of the types of information discussed above. At a general level, the information just needs to be included in or associated with the data stream. While the information could be provided as separate metadata, it is considered convenient to include it in the datastream itself. But as alternatives to SEI messages, the data could be included in, for example, sequence parameter set (SPS) data, picture parameter set (PPS) data, other user-definable data fields or even as dummy image data in respect of a first or early image of the sequence.
In at least some embodiments the data is defined so as to "have scope" over a group of two or more pictures. This means that the information applies to all of the pictures in that group.
The group could be the current sequence (for example a set of pictures in which each picture references a consistent sequence identifier), the whole of the remainder of the current datastream, all pictures until receipt of a subsequent such message, or a set of pictures defined by data present in or associated with the SEI message or other data. This is an example of selecting a constraint in respect of a sequence comprising two or more frames, the message having scope over the two or more frames.
Examples have been provided above of the type of constraints which could be applied by the encoder to its own output and which are defined by the SEI message or other representation. But other indications may be used as well or instead. In respect of the number of CTUs, alternative or additional data may define, for example, one or more of: the maximum number of bits per slice, the maximum number of CABAC EP (equiprobable) bins per slice, and the number of CABAC bins (EP and context-coded) per slice. This provides an example of encoding the data, at least in part, using a binary arithmetic coding technique in which data values are encoded as one or more binary values, the binary arithmetic coding technique being, for example, a context adaptive binary arithmetic coding technique, and the constraint comprising a maximum number of the binary values encoded in each slice.
Referring to Figure 26, a method of operation of a data encoding apparatus in which data can be partitioned for encoding into data portions to form encoded output data comprises: the data encoding apparatus selecting (at a step 1500) a constraint on the portion format of encoding and decoding portions of the encoded video, for example on the maximum size of each data portion in respect of a set of data portions to be encoded and/or on the processing tile structure; the data encoding apparatus associating (at a step 1510) a message with the encoded output data, the message indicating the selected constraint; and the data encoding apparatus encoding (at a step 1520) the set of data portions according to the selected constraint on the maximum size of the data portions.
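A hedged sketch of these steps follows; the data structures and helper names are illustrative assumptions and do not form part of any standard encoder interface:

    # Hedged sketch of the encoding method of Figure 26 (steps 1500 to 1520).
    def split_into_slices(frame_ctus, max_ctus):
        # never emit more CTUs per slice segment than the selected constraint allows
        return [frame_ctus[i:i + max_ctus] for i in range(0, len(frame_ctus), max_ctus)]

    def encode_sequence(frames, max_ctus_per_slice, constant_tiles=True):
        # step 1500: the encoder selects (voluntarily) its constraint on the portion format
        constraint = {'maximum_ctus_per_slice_segment': max_ctus_per_slice,
                      'constant_tile_structure_flag': int(constant_tiles)}
        # step 1510: associate a message (for example an SEI message) with the output data
        bitstream = [('SEI', constraint)]
        # step 1520: encode the set of data portions according to the selected constraint
        for frame in frames:
            for s in split_into_slices(frame, max_ctus_per_slice):
                bitstream.append(('SLICE_SEGMENT', s))   # stand-in for real entropy coding
        return bitstream

    # toy usage: two frames of eight "CTUs" each, at most three CTUs per slice segment
    print(encode_sequence([list(range(8)), list(range(8))], 3))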
Note that as discussed extensively above, a constraint on the portion format can be represented by one or both of (a) a constraint on a maximum portion size, and (b) a constraint on the portion spatial layout in respect of, for example, image data, or more generally the relationship of portions to one another relative to data positions in the data to be encoded.
Note that the selection of a constraint by the encoder is considered "voluntary" in the sense that a particular mode of operation, in respect of the matters to be selected, is not required by the standards underlying the remainder of the operation of the encoder. However, the selection in respect of a particular encoder can be part of a set of design parameters applicable to that encoder. For example, consider a manufacturer which produces both encoders and decoders. The parameters of that manufacturer's encoder could be pre-arranged so as to select a structure based on the input image size and frame rate in order that that manufacturer's decoder can operate in low-latency mode. But this is not a requirement of the underlying standard. However, because the SEI message is standardised, any decoder can still operate to decode the stream, such that if a particular decoder receives a stream from an unknown source for which the associated message indicates that the decoder can successfully operate in a low latency mode, the decoder will start in low-latency mode. Otherwise, the decoder will either start in a higher latency mode, for example representing the worst case division which could later be imposed on the slice structure, or will risk a pause in the output data as discussed above.
Accordingly, the "selection" by the encoder can be according to predetermined design parameters associated with that encoder, but it still represents a selection in the sense that the encoding slice or other structure is not imposed upon that encoder by the underlying standards.
Referring to Figure 27, a method of operation of a data decoding apparatus in which data to be decoded is partitioned into data portions for handling by a group of data decoders acting in parallel comprises: detecting (at a step 1530) a message, associated with data to be decoded, indicating a constraint on the portion format of encoding and decoding portions of the encoded video, for example on the maximum size of each data portion in respect of a set of data portions to be encoded and/or on the processing tile structure; allocating (at a step 1540) the data portions to respective decoders of the group of decoders for decoding; selecting (at a step 1550) a buffer size according to the constraint indicated by the detected message; and recombining (at a step 1560) the decoded data output by the group of decoders to form a single set of decoded data, the recombining step comprising buffering the decoded data using the selected buffer size.
More generally, not all of these steps are required, so that a method of operation of a data decoding apparatus in which data to be decoded is partitioned into data portions for handling by a group of data decoders acting in parallel, can comprise detecting a message (1530), associated with data to be decoded, indicating a constraint on the portion format of each data portion in respect of a set of data portions; selecting (1550) a buffer size according to the constraint indicated by the detected message; and buffering the decoded data using the selected buffer size.
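A hedged sketch of these decoding steps follows; the toy bitstream format, the round-robin allocation and the buffer sizing are simplified assumptions made purely for illustration:

    # Hedged sketch of the decoding method of Figure 27 (steps 1530 to 1560).
    def decode_stream(bitstream, n_decoders, ctus_per_frame):
        # step 1530: detect the message associated with the data to be decoded
        sei = next((payload for kind, payload in bitstream if kind == 'SEI'), None)
        max_ctus = sei['maximum_ctus_per_slice_segment'] if sei else 0
        low_latency = max_ctus > 0 and max_ctus * n_decoders <= ctus_per_frame
        # step 1550: select a buffer size according to the constraint (measured in CTUs here)
        buffer_size_ctus = max_ctus if low_latency else ctus_per_frame
        slices = [payload for kind, payload in bitstream if kind == 'SLICE_SEGMENT']
        # step 1540: allocate the data portions to the parallel decoders (round robin)
        allocation = {d: slices[d::n_decoders] for d in range(n_decoders)}
        # step 1560: recombination would buffer at most buffer_size_ctus before output
        return allocation, buffer_size_ctus

    toy = [('SEI', {'maximum_ctus_per_slice_segment': 2, 'constant_tile_structure_flag': 1}),
           ('SLICE_SEGMENT', [0, 1]), ('SLICE_SEGMENT', [2, 3]),
           ('SLICE_SEGMENT', [4, 5]), ('SLICE_SEGMENT', [6, 7])]
    print(decode_stream(toy, 4, 8))    # each decoder gets one slice; small buffer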
Although different types of data could be used, the techniques are particularly applicable to video data representing successive video frames. The data portions can be slices (or slice segments), such that any one frame can be represented by one or more slices, and that any one slice may not contain data representing more than one frame. The constraint can be a constraint on the maximum size of the slices.
Respective features of the present disclosure are defined by the following numbered clauses: 1. A method of operation of a data encoding apparatus in which data can be partitioned for encoding into data portions to form encoded output data, the method comprising: the data encoding apparatus selecting a constraint on the portion format of a set of data portions to be encoded; the data encoding apparatus associating a message with the encoded output data, the message indicating the selected constraint; and the data encoding apparatus encoding the set of data portions according to the selected constraint on the portion format of the data portions.
2. A method according to clause 1, in which: the data to be encoded is video data representing successive video frames; the data portions are slices, such that any one frame can be represented by one or more slices, and that any one slice may not contain data representing more than one frame; and the constraint is a constraint on the maximum size of the slices.
3. A method according to clause 2, in which the selecting step comprises selecting the constraint in respect of a sequence comprising two or more frames, the message having scope over the two or more frames.
4. A method according to clause 2 or clause 3, in which the encoding step comprises encoding the data, at least in part, using a binary arithmetic coding technique in which data values are encoded as one or more binary values.
5. A method according to clause 4, in which the binary arithmetic coding technique is a context adaptive binary arithmetic coding technique.
6. A method according to clause 4 or clause 5, in which the constraint comprises a maximum number of the binary values encoded in each slice.
7. A method according to any one of clauses 2 to 6, in which the encoding step comprises encoding each slice as a tree structure of encoding units, the constraint indicating a maximum number of encoding units which may be present in each slice.
8. A method according to any one of clauses 2 to 7, in which the constraint comprises a constraint on the spatial arrangement of slices or groups of slices in each frame.
9. A method according to any one of the preceding clauses, in which the associating step comprises inserting the message into the encoded output data such that the message and the encoded output data can be processed separately by a data decoder.
10. A method according to clause 9, in which the message comprises a HEVC supplemental enhancement information message.
11. A method according to any one of the preceding clauses, in which the constraint indicates a maximum data quantity of encoded data in each data portion.
12. A method of operation of a data decoding apparatus in which data to be decoded is partitioned into data portions for handling by a group of data decoders acting in parallel, the method comprising: detecting a message, associated with data to be decoded, indicating a constraint on the portion format of each data portion in respect of a set of data portions; selecting a buffer size according to the constraint indicated by the detected message; and buffering the decoded data using the selected buffer size.
13. A method according to clause 12, comprising: allocating the data portions to respective decoders of the group of decoders for decoding; and recombining the decoded data output by the group of decoders to form a single set of decoded data, the recombining step comprising buffering the decoded data using the selected buffer size.
14. A method according to clause 12 or clause 13, comprising: processing the decoded data according to a set of processing subdivisions.
15. Computer software which, when executed by a computer, causes the computer to carry out the method of any one of the preceding clauses.
16. A data encoding apparatus in which data can be partitioned for encoding into data portions to form encoded output data, the apparatus comprising: a selector configured to select a constraint on the portion format of each data portion in respect of a set of data portions to be encoded; a message associator configured to associate a message with the encoded output data, the message indicating the selected constraint; and an encoder configured to encode the set of data portions according to the selected constraint on the portion format of the data portions.
17. A data decoding apparatus comprising: a group of decoders configured in parallel for decoding respective data portions of input data to be decoded; a detector configured to detect a message, associated with data to be decoded, indicating a constraint on the portion format of each data portion in respect of a set of data portions; a selector configured to select a buffer size according to the constraint indicated by the detected message; and a buffer configured to buffer the decoded data using the selected buffer size.
18. Video data capture, transmission, display and/or storage apparatus comprising apparatus according to clause 16 or clause 17.
19. Encoded data which is partitioned into data portions, the encoded data comprising a data message indicating a constraint on the portion format of each data portion in respect of a set of the data portions.
20. An instance of distribution of encoded data according to clause 19.
As discussed earlier, it will be appreciated that apparatus features of the above clauses may be implemented by respective features of the encoder or decoder as discussed earlier.

Claims (20)

1. A method of operation of a data encoding apparatus in which data can be partitioned for encoding into data portions to form encoded output data, the method comprising: the data encoding apparatus selecting a constraint on the portion format of a set of data portions to be encoded; the data encoding apparatus associating a message with the encoded output data, the message indicating the selected constraint; and the data encoding apparatus encoding the set of data portions according to the selected constraint on the portion format of the data portions.
2. A method according to claim 1, in which: the data to be encoded is video data representing successive video frames; the data portions are slices, such that any one frame can be represented by one or more slices, and that any one slice may not contain data representing more than one frame; and the constraint is a constraint on the maximum size of the slices.
3. A method according to claim 2, in which the selecting step comprises selecting the constraint in respect of a sequence comprising two or more frames, the message having scope over the two or more frames.
4. A method according to claim 2, in which the encoding step comprises encoding the data, at least in part, using a binary arithmetic coding technique in which data values are encoded as one or more binary values.
5. A method according to claim 4, in which the binary arithmetic coding technique is a context adaptive binary arithmetic coding technique.
6. A method according to claim 4, in which the constraint comprises a maximum number of the binary values encoded in each slice.
7. A method according to claim 2, in which the encoding step comprises encoding each slice as a tree structure of encoding units, the constraint indicating a maximum number of encoding units which may be present in each slice.
8. A method according to claim 2, in which the constraint comprises a constraint on the spatial arrangement of slices or groups of slices in each frame.
9. A method according to claim 1, in which the associating step comprises inserting the message into the encoded output data such that the message and the encoded output data can be processed separately by a data decoder.
10. A method according to claim 9, in which the message comprises a HEVC supplemental enhancement information message.
11. A method according to claim 1, in which the constraint indicates a maximum data quantity of encoded data in each data portion.
12. A method of operation of a data decoding apparatus in which data to be decoded is partitioned into data portions for handling by a group of data decoders acting in parallel, the method comprising: detecting a message, associated with data to be decoded, indicating a constraint on the portion format of each data portion in respect of a set of data portions; selecting a buffer size according to the constraint indicated by the detected message; and buffering the decoded data using the selected buffer size.
13. A method according to claim 12, comprising: allocating the data portions to respective decoders of the group of decoders for decoding; and recombining the decoded data output by the group of decoders to form a single set of decoded data, the recombining step comprising buffering the decoded data using the selected buffer size.
14. A method according to claim 12, comprising: processing the decoded data according to a set of processing subdivisions.
15. A non-transitory, machine-readable storage medium which stores computer software which, when executed by a computer, causes the computer to carry out the method of claim 1.
16. A data encoding apparatus in which data can be partitioned for encoding into data portions to form encoded output data, the apparatus comprising: a selector configured to select a constraint on the portion format of each data portion in respect of a set of data portions to be encoded; a message associator configured to associate a message with the encoded output data, the message indicating the selected constraint; and an encoder configured to encode the set of data portions according to the selected constraint on the portion format of the data portions.
17. A data decoding apparatus comprising: a group of decoders configured in parallel for decoding respective data portions of input data to be decoded; a detector configured to detect a message, associated with data to be decoded, indicating a constraint on the portion format of each data portion in respect of a set of data portions; a selector configured to select a buffer size according to the constraint indicated by the detected message; and a buffer configured to buffer the decoded data using the selected buffer size.
18. Video data capture, transmission, display and/or storage apparatus comprising apparatus according to claim 16.
19. Video data capture, transmission, display and/or storage apparatus comprising apparatus according to claim 17.
20. An instance of distribution of encoded data which is partitioned into data portions, the encoded data comprising a data message indicating a constraint on the portion format of each data portion in respect of a set of the data portions.
GB1403983.8A 2014-03-06 2014-03-06 Data encoding and decoding Withdrawn GB2523993A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1403983.8A GB2523993A (en) 2014-03-06 2014-03-06 Data encoding and decoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1403983.8A GB2523993A (en) 2014-03-06 2014-03-06 Data encoding and decoding

Publications (2)

Publication Number Publication Date
GB201403983D0 GB201403983D0 (en) 2014-04-23
GB2523993A true GB2523993A (en) 2015-09-16

Family

ID=50554635

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1403983.8A Withdrawn GB2523993A (en) 2014-03-06 2014-03-06 Data encoding and decoding

Country Status (1)

Country Link
GB (1) GB2523993A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2585042A (en) * 2019-06-25 2020-12-30 Sony Corp Image data encoding and decoding

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100086032A1 (en) * 2008-10-03 2010-04-08 Qualcomm Incorporated Video coding with large macroblocks
WO2013027407A1 (en) * 2011-08-25 2013-02-28 Panasonic Corporation Methods and apparatuses for encoding, extracting and decoding video using tiles coding scheme
WO2014003675A1 (en) * 2012-06-29 2014-01-03 Telefonaktiebolaget L M Ericsson (Publ) Transmitting apparatus and method thereof for video processing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
IEEE Journal of Selected Topics in Signal Processing, Vol. 7, No. 6, December 2013, (PISCATAWAY, NJ, US), MISRA ET AL, "An Overview of Tiles in HEVC", pp 969 - 977. *

Also Published As

Publication number Publication date
GB201403983D0 (en) 2014-04-23

Similar Documents

Publication Publication Date Title
US20220217351A1 (en) Data encoding and decoding
US20190253714A1 (en) Data encoding and decoding
KR101660605B1 (en) Method of determining binary codewords for transform coefficients
CN112352429B (en) Method, apparatus and storage medium for encoding and decoding video data
GB2513111A (en) Data encoding and decoding
US10284851B2 (en) Method of determining binary codewords for transform coefficients
KR20160031496A (en) Intra motion compensation extensions
US20160142740A1 (en) Data encoding and decoding
WO2013070974A2 (en) Method of determining binary codewords for transform coefficients
US9544599B2 (en) Context adaptive data encoding
WO2013068731A1 (en) Data encoding and decoding
CN114009049A (en) Context modeling for low frequency non-separable transform signaling for video coding
JP2022516132A (en) Escape coding for coefficient levels
KR102640142B1 (en) Video decoding method and device using residual information in video coding system
GB2523993A (en) Data encoding and decoding
GB2585042A (en) Image data encoding and decoding
GB2585067A (en) Image data encoding and decoding
EP2777279B1 (en) Method of determining binary codewords for transform coefficients
CN114788276A (en) Image data encoding and decoding
CN115398923A (en) Coefficient coding and decoding for supporting different color formats in video coding and decoding
CN114026860A (en) Image data encoding and decoding

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)