CN114009029A - Image data encoding and decoding - Google Patents


Info

Publication number: CN114009029A
Authority: CN (China)
Legal status: Pending
Application number: CN202080044703.8A
Other languages: Chinese (zh)
Inventors: Stephen Mark Keating, Karl James Sharman, Adrian Richard Brown
Current Assignee: Sony Group Corp
Original Assignee: Sony Group Corp
Application filed by: Sony Group Corp

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/13: Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/14: Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H04N19/146: Data rate or code amount at the encoder output
    • H04N19/167: Position within a video image, e.g. region of interest [ROI]
    • H04N19/174: Adaptive coding where the coding unit is an image region, the region being a slice, e.g. a line of blocks or a group of blocks
    • H04N19/188: Adaptive coding where the coding unit is a video data packet, e.g. a network abstraction layer [NAL] unit
    • H04N19/436: Implementation details or hardware specially adapted for video compression or decompression, using parallelised computational arrangements
    • H04N19/91: Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Abstract

An image data encoding apparatus includes: an entropy encoder for selectively encoding a data item representing image data to be encoded by a first Context Adaptive Binary Arithmetic Coding (CABAC) encoding system or a second bypass encoding system to generate encoded binary symbols; the image data represents one or more pictures of a sequence of pictures, each picture comprising two or more units of output data representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of other sub-portions of the picture and of the sequence of pictures; wherein each sub-portion is subject to a respective minimum compression ratio.

Description

Image data encoding and decoding
Technical Field
The present disclosure relates to image data encoding and decoding.
Background
The "background" description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
There are several video data encoding and decoding systems that involve converting video data into a frequency domain representation, quantizing the frequency domain coefficients, and then applying some form of entropy coding to the quantized coefficients. This can achieve compression of video data. A corresponding decoding or decompression technique is applied to recover the reconstructed version of the original video data.
High Efficiency Video Coding (HEVC), also known as H.265 or MPEG-H Part 2, is a proposed successor to H.264/MPEG-4 AVC. HEVC is intended to improve video quality and to double the data compression ratio compared with H.264, and to be scalable from 128 × 96 to 7680 × 4320 pixel resolution, roughly equivalent to bit rates ranging from 128 kbit/s to 800 Mbit/s.
Disclosure of Invention
The present disclosure addresses or mitigates problems arising from this processing.
The present disclosure provides an image data encoding apparatus including:
an entropy encoder for selectively encoding a data item representing image data to be encoded by a first Context Adaptive Binary Arithmetic Coding (CABAC) encoding system or a second bypass encoding system to generate encoded binary symbols;
the image data represents one or more pictures of a sequence of pictures, each picture comprising two or more units of output data representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of other sub-portions of the picture and of the sequence of pictures;
wherein each sub-portion is subject to a respective minimum compression ratio.
The present disclosure also provides an image data encoding method, including:
selectively encoding a data item representing image data to be encoded by a first Context Adaptive Binary Arithmetic Coding (CABAC) encoding system or a second bypass encoding system to generate an encoded binary symbol;
the image data represents one or more pictures of a sequence of pictures, each picture comprising two or more units of output data representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of other sub-portions of the picture and of the sequence of pictures; and
generating an output data stream;
the generating step being constrained by a respective minimum compression ratio applicable to each sub-portion.
The present disclosure also provides an image data encoding apparatus including:
an entropy encoder for selectively encoding a data item representing image data to be encoded by a first Context Adaptive Binary Arithmetic Coding (CABAC) encoding system or a second bypass encoding system to generate encoded binary symbols;
the image data represents one or more pictures of a sequence of pictures, each picture comprising two or more units of output data representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of other sub-portions of the picture and of the sequence of pictures;
the entropy encoder is configured to generate an output data stream subject to a constraint that defines an upper bound on the number of binarized symbols that can be represented by any individual output data unit, relative to the byte size of that output data unit, wherein the entropy encoder is configured to apply the constraint to each output data unit and, for each output data unit that does not satisfy the constraint, to provide padding data to increase the byte size of that output data unit so as to satisfy the constraint.
The present disclosure also provides an image data encoding apparatus including:
an image data encoder for applying compression encoding to generate compressed image data representing one or more pictures of a sequence of pictures, each picture comprising two or more output data units representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of other sub-portions of the picture and of the sequence of pictures;
wherein the apparatus is operable in accordance with an encoding profile selected from a set of encoding profiles, each encoding profile defining at least one set of constraints on the image data to be encoded and/or the output data stream, wherein the image data encoding apparatus is configured to apply the respective encoding profile to each sub-portion of a picture irrespective of the encoding profile applied to any other sub-portion of that picture.
The present disclosure also provides an image data encoding method, including:
selectively encoding a data item representing image data to be encoded by a first Context Adaptive Binary Arithmetic Coding (CABAC) encoding system or a second bypass encoding system to generate an encoded binary symbol;
the image data representing one or more pictures of a sequence of pictures, each picture comprising two or more units of output data representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of other sub-portions of the picture and of the sequence of pictures;
generating an output data stream;
the generating step is subject to a constraint that defines an upper bound on the number of binarized symbols, which may be represented by any individual output data unit relative to the byte size of that output data unit, wherein the generating step includes applying the constraint to each output data unit; and is
For each output data unit that does not satisfy the constraint, padding data is provided to increase the byte size of the output data unit to satisfy the constraint.
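By way of illustration only, the following Python sketch shows one possible form of such a padding step. The bound of 32/3 binarized symbols ("bins") per byte and the three-byte zero word are assumptions modelled loosely on the HEVC-style cabac_zero_words mechanism; the actual bound and padding syntax are defined by the applicable coding standard and profile/level.

    # Hypothetical sketch: pad an output data unit until the bin-count
    # constraint (an upper bound on bins per byte) is satisfied.
    ZERO_WORD = b"\x00\x00\x03"  # assumed padding word, HEVC-like

    def pad_output_unit(payload: bytearray, bin_count: int,
                        max_bins_per_byte: float = 32 / 3) -> bytearray:
        # Padding enlarges the unit, reducing the bins-per-byte ratio.
        while bin_count > max_bins_per_byte * len(payload):
            payload.extend(ZERO_WORD)
        return payload

    unit = pad_output_unit(bytearray(b"\x01\x02\x03\x04"), bin_count=100)
    print(len(unit))  # 10: grown so that 100 bins <= (32/3) * 10 bytes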
The present disclosure also provides an image data encoding method, including:
compression encoding image data representing one or more pictures of a sequence of pictures, each picture comprising two or more units of output data representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of other sub-portions of the picture and of the sequence of pictures;
wherein the compression encoding step is operable in accordance with an encoding profile selected from a set of encoding profiles, each encoding profile defining at least one set of constraints on the image data to be encoded and/or the output data stream, wherein the compression encoding step comprises applying the respective encoding profile to each sub-portion of a picture irrespective of the encoding profile applied to any other sub-portion of that picture.
Further corresponding aspects and features of the present disclosure are defined in the appended claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary, but are not restrictive, of the present technology.
Drawings
A more complete appreciation of the present disclosure and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:
Fig. 1 schematically illustrates an audio/video (A/V) data transmission and reception system using video data compression and decompression;
Fig. 2 schematically illustrates a video display system using video data decompression;
Fig. 3 schematically illustrates an audio/video storage system using video data compression and decompression;
Fig. 4 schematically illustrates a camera using video data compression;
Figs. 5 and 6 schematically illustrate a storage medium;
Fig. 7 provides a schematic diagram of a video data compression and decompression apparatus;
Fig. 8 schematically illustrates a predictor;
Fig. 9 schematically shows a partially encoded image;
Fig. 10 schematically shows a set of possible intra prediction directions;
Fig. 11 schematically illustrates a set of prediction modes;
Fig. 12 schematically illustrates another set of prediction modes;
Fig. 13 schematically illustrates an intra prediction process;
Fig. 14 schematically shows a CABAC encoder;
Figs. 15 and 16 schematically illustrate CABAC encoding techniques;
Figs. 17 and 18 schematically illustrate CABAC decoding techniques;
Fig. 19 schematically shows a segmented image;
Fig. 20 schematically illustrates an apparatus;
Fig. 21 is a schematic flowchart illustrating a method;
Fig. 22 schematically shows a set of sub-portions of a picture;
Fig. 23 schematically shows a set of sub-picture parameter data;
Fig. 24 schematically illustrates a data signal; and
Fig. 25 is a schematic flowchart illustrating a method.
Detailed Description
Referring now to the drawings, fig. 1-4 provide schematic diagrams of an apparatus or system utilizing a compression and/or decompression apparatus as will be described below in connection with embodiments of the present technology.
All data compression and/or decompression devices to be described below may be implemented in hardware, software running on a general-purpose data processing device (e.g., a general-purpose computer), programmable hardware (e.g., an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA)) or a combination of these. Where embodiments are implemented by software and/or firmware, it should be understood that such software and/or firmware, and non-transitory data storage media storing or otherwise providing such software and/or firmware, are considered embodiments of the present technology.
Fig. 1 schematically shows an audio/video data transmission and reception system using video data compression and decompression.
The input audio/video signal 10 is provided to a video data compression device 20 that compresses at least the video component of the audio/video signal 10 for transmission along a transmission path 30, such as a cable, fiber optic, wireless link, or the like. The compressed signal is processed by a decompression apparatus 40 to provide an output audio/video signal 50. For the return path, compression device 60 compresses the audio/video signals for transmission along transmission path 30 to decompression device 70.
The compression device 20 and the decompression device 70 may thus form one node of the transmission link. Decompression apparatus 40 and compression apparatus 60 may form another node of the transmission link. Of course, in case the transmission link is unidirectional, only one node needs the compression device, while the other node only needs the decompression device.
Fig. 2 schematically illustrates a video display system using video data decompression. In particular, the compressed audio/video signal 100 is processed by a decompression device 110 to provide a decompressed signal that can be displayed on a display 120. Decompression apparatus 110 may be implemented as an integral part of display 120, e.g., disposed within the same housing as the display device. Alternatively, decompression apparatus 110 may be provided, for example, as a so-called set-top box (STB), noting that the expression "set-top box" is not meant to require that the box be located in any particular orientation or position relative to display 120; but is a term used in the art to denote a device that can be connected to a display as a peripheral device.
Fig. 3 schematically shows an audio/video storage system using video data compression and decompression. The input audio/video signal 130 is provided to a compression device 140 that generates a compressed signal for storage by a storage device 150, e.g., a magnetic disk device, an optical disk device, a tape device, a solid state storage device, e.g., a semiconductor memory, or other storage device. For playback, the compressed data is read from the storage device 150 and passed to a decompression apparatus 160 for decompression to provide an output audio/video signal 170.
It should be appreciated that a compressed or encoded signal, and a storage medium (e.g., a machine-readable non-transitory storage medium) that stores the signal, are considered to be embodiments of the present technology.
Fig. 4 schematically shows a video camera using video data compression. In fig. 4, an image capture device 180 (e.g., a Charge Coupled Device (CCD) image sensor and associated control and readout electronics) generates a video signal that is transmitted to a compression device 190. The microphone (or microphones) 200 generates an audio signal to be transmitted to the compression device 190. The compression device 190 generates a compressed audio/video signal 210 to be stored and/or transmitted (shown generally as illustrative stage 220).
The techniques to be described below relate primarily to video data compression and decompression. It will be appreciated that many of the prior art techniques may be used for audio data compression in conjunction with the video data compression techniques to be described to generate compressed audio/video signals. Therefore, a separate discussion of audio data compression will not be provided. It should also be appreciated that the data rate associated with video data, particularly broadcast quality video data, is typically much higher than the data rate associated with audio data, whether compressed or uncompressed. It will therefore be appreciated that uncompressed audio data may accompany compressed video data to form a compressed audio/video signal. It should also be appreciated that although the present examples (shown in fig. 1-4) relate to audio/video data, the techniques described below may find use in systems that simply process (i.e., compress, decompress, store, display, and/or transmit) video data. That is, embodiments may be applied to video data compression without any associated audio data processing at all.
Thus, fig. 4 provides an example of a video capture device that includes an image sensor and an encoding device of the type discussed below. Thus, fig. 2 provides an example of a decoding device of the type to be discussed below and a display to which the decoded image is output.
The combination of fig. 2 and 4 may provide a video capture device including an image sensor 180 and an encoding device 190, a decoding device 110, and a display 120 to which decoded images are output.
Fig. 5 and 6 schematically illustrate storage media storing compressed data generated by the device 20, 60, for example, compressed data input to the device 110 or storage media or stage 150, 220. Fig. 5 schematically illustrates a disk storage medium such as a magnetic disk or optical disc, and fig. 6 schematically illustrates a solid-state storage medium, e.g., flash memory. Note that fig. 5 and 6 may also provide examples of a non-transitory machine-readable storage medium storing computer software that, when executed by a computer, causes the computer to perform one or more methods that will be discussed below.
The above-described arrangements thus provide examples of video storage, capture, transmission or reception devices embodying any of the present technology.
Fig. 7 provides a schematic diagram of a video data compression and decompression apparatus.
The controller 343 controls the overall operation of the apparatus and, in particular as regards a compression mode, controls a trial encoding process by acting as a selector to select various modes of operation, e.g., block size and shape, and whether the video data is to be encoded losslessly or otherwise. The controller is considered to form part of the image encoder or the image decoder (as the case may be). Successive pictures of the input video signal 300 are provided to an adder 310 and an image predictor 320. The image predictor 320 will be described in more detail below with reference to fig. 8. The image encoder or decoder (as the case may be), plus the intra-image predictor of fig. 8, may use features from the apparatus of fig. 7. However, this does not mean that every feature of fig. 7 is necessarily required by an image encoder or decoder.
The adder 310 actually performs a subtraction (negative addition) operation, in that it receives the input video signal 300 at a "+" input and the output of the image predictor 320 at a "-" input, so that the predicted image is subtracted from the input image. The result is a so-called residual image signal 330, which represents the difference between the actual image and the predicted image.
One reason why the residual image signal is generated is as follows. The data encoding techniques to be described (that is, the techniques to be applied to the residual image signal) tend to work more efficiently when there is less "energy" in the image to be encoded. Here, the term "efficiently" refers to generating a small amount of encoded data; for a particular image quality level, it is desirable (and considered "efficient") to generate as little data as practicable. The "energy" in the residual image relates to the amount of information contained in the residual image. If the predicted image were identical to the real image, the difference between the two (that is, the residual image) would contain zero information (zero energy) and would be very easy to encode into a small amount of encoded data. In general, if the prediction process can be made to work reasonably well, such that the predicted image content is similar to the image content to be encoded, the expectation is that the residual image data will contain less information (less energy) than the input image and so will be easier to encode into a small amount of encoded data.
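As a purely illustrative numerical sketch of this notion of "energy" (a sum of squared sample differences is one simple measure of it, though not the only possible one):

    # Sketch: residual "energy" as a sum of squared differences between
    # an input block and its prediction.
    def residual_energy(block, prediction):
        return sum((a - b) ** 2
                   for row_a, row_b in zip(block, prediction)
                   for a, b in zip(row_a, row_b))

    block     = [[120, 121], [119, 122]]
    good_pred = [[120, 120], [120, 121]]  # close prediction
    poor_pred = [[60, 60], [60, 60]]      # poor prediction

    print(residual_energy(block, good_pred))  # 3: cheap to encode
    print(residual_energy(block, poor_pred))  # 14646: expensive to encode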
The remainder of the device acting as an encoder (encoding the residual or difference image) will now be described. Residual image data 330 is provided to a transform unit or circuit 340 that generates a Discrete Cosine Transform (DCT) representation of a block or region of residual image data. DCT techniques are well known per se and will not be described in detail here. Note also that the use of a DCT is merely illustrative of one exemplary arrangement. Other transforms that may be used include, for example, Discrete Sine Transforms (DST). The transforms may also comprise a sequence or concatenation of individual transforms, e.g. one transform followed by an arrangement of another transform (whether direct or not). The choice of transform may be determined explicitly and/or may depend on the side information used to configure the encoder and decoder.
The output of transform unit 340 (that is, in one example, a set of DCT coefficients for each transform block of image data) is provided to quantizer 350. Various quantization techniques are known in the art of video data compression, ranging from simple multiplication by a quantization scale factor to the application of complex look-up tables under control of a quantization parameter. The overall goal is twofold. First, the quantization process reduces the number of possible values of the transformed data. Second, the quantization process can increase the likelihood that a value of the transformed data is zero. Both of these can make the entropy encoding process, described below, more efficient at producing small amounts of compressed video data.
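The simplest of these quantization approaches, division by a quantization scale factor, can be sketched as follows (illustrative values only; practical codecs use integer arithmetic and parameter-driven scaling):

    # Sketch: scalar quantization; a larger step size qstep forces more
    # coefficients to zero and narrows the range of remaining values.
    def quantize(coeffs, qstep):
        return [int(round(c / qstep)) for c in coeffs]

    def dequantize(levels, qstep):
        return [level * qstep for level in levels]

    coeffs = [220.0, -37.0, 12.0, 6.0, -2.0, 1.0]
    print(quantize(coeffs, qstep=8))   # [28, -5, 2, 1, 0, 0]
    print(quantize(coeffs, qstep=32))  # [7, -1, 0, 0, 0, 0]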
The scanning unit 360 applies a data scanning process. The purpose of the scanning process is to reorder the quantized transform data so as to group together as many non-zero quantized transform coefficients as possible, and correspondingly as many zero-valued coefficients as possible. These features can allow efficient application of so-called run-length coding or similar techniques. Thus, the scanning process involves selecting coefficients from the quantized transform data, in particular from blocks of coefficients corresponding to blocks of image data that have been transformed and quantized, according to a "scan order" such that (a) all coefficients are selected once as part of the scan, and (b) the scan tends to provide the desired reordering. One example of a scan order that can tend to give useful results is a so-called up-right diagonal scan order.
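One illustrative way of generating such a diagonal scan order for an n × n block is sketched below (the precise order used by any particular codec may differ):

    # Sketch: up-right diagonal scan order for an n x n coefficient block,
    # visiting each anti-diagonal (x + y constant) from bottom-left upwards.
    def diagonal_scan_order(n):
        return [(y, d - y)
                for d in range(2 * n - 1)
                for y in range(min(d, n - 1), max(0, d - n + 1) - 1, -1)]

    block = [[9, 3, 1, 0],
             [4, 2, 0, 0],
             [1, 0, 0, 0],
             [0, 0, 0, 0]]
    print([block[y][x] for y, x in diagonal_scan_order(4)])
    # [9, 4, 3, 1, 2, 1, 0, ...]: non-zero values grouped at the front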
The scanned coefficients are then passed to an entropy encoder (EE) 370. Again, various types of entropy encoding may be used. Two examples are variants of the so-called CABAC (context adaptive binary arithmetic coding) system and variants of the so-called CAVLC (context adaptive variable length coding) system. In general terms, CABAC is considered to provide better efficiency, and in some studies has been shown to provide a 10-20% reduction in the quantity of encoded output data for a comparable image quality compared with CAVLC. However, CAVLC is considered to exhibit lower complexity (in terms of its implementation) than CABAC. Note that the scanning process and the entropy encoding process are shown as separate processes, but may in fact be combined or treated together. That is to say, the reading of data into the entropy encoder can take place in the scan order. Corresponding considerations apply to the respective inverse processes to be described below.
The output of the entropy coder 370, along with, for example, additional data (mentioned above and/or discussed below) that defines the manner in which the predictor 320 generates the predicted image, provides a compressed output video signal 380.
However, a return path 390 is also provided, since the operation of the predictor 320 itself depends on the decompressed version of the compressed output data.
The reason for this feature is as follows. At an appropriate stage of the decompression process (described below), a decompressed version of the residual data is generated. This decompressed residual data must be added to the predicted image to generate the output image (since the original residual data is the difference between the input image and the predicted image). In order to make the process comparable, the prediction image generated by the predictor 320 should be the same between the compression side and the decompression side in the compression process and the decompression process. Of course, at decompression, the device cannot access the original input image, but only the decompressed image. Thus, upon compression, the predictor 320 bases its prediction (at least for inter-picture coding) on a decompressed version of the compressed picture.
The entropy encoding process performed by the entropy encoder 370 is considered (in at least some examples) to be "lossless," that is, may be reversed to obtain the exact same data as was first provided to the entropy encoder 370. Thus, in such an example, the return path may be implemented before the entropy coding stage. In practice, the scanning process performed by the scanning unit 360 is also considered lossless, so in this embodiment the return path 390 is from the output of the quantizer 350 to the input of the complementary inverse quantizer 420. In the case where a stage introduces losses or potential losses, the stage (and its inverse) may be included in a feedback loop formed by the return path. For example, the entropy coding stage may be at least in principle lossy, e.g. by techniques that encode bits in parity information. In this case, entropy encoding and decoding should form part of a feedback loop.
In general, the entropy decoder 410, the inverse scanning unit 400, the inverse quantizer 420, and the inverse transform unit or circuit 430 provide respective inverse functions of the entropy encoder 370, the scanning unit 360, the quantizer 350, and the transform unit 340. Currently, the discussion will continue through the compression process; the process of decompressing the input compressed video signal will be discussed separately below.
During compression, the scanned coefficients are passed via the return path 390 from the quantizer 350 to the inverse quantizer 420, which carries out the inverse operation of the quantizer 350. Inverse quantization and inverse transformation processes are performed by the units 420, 430 to generate a compressed-decompressed residual image signal 440.
The image signal 440 is added to the output of the predictor 320 at adder 450 to generate a reconstructed output image 460. This forms one input to the image predictor 320, as described below.
Turning now to the process applied to decompress the received compressed video signal 470, the signal is provided to the entropy decoder 410 and from there to the chain of inverse scan unit 400, inverse quantizer 420 and inverse transform unit 430, and then added by adder 450 to the output of the image predictor 320. Thus, on the decoder side, the decoder reconstructs a version of the residual image, which is then applied (by adder 450) to a predicted version of the image (block-by-block) to decode each block. Briefly, the output 460 of the adder 450 forms the output decompressed video signal 480. In practice, further filtering may optionally be applied before outputting the signal (e.g., by filter 560 shown in fig. 8, but omitted from fig. 7 for clarity of the high level diagram of fig. 7).
The devices of fig. 7 and 8 may act as compression (encoding) devices or decompression (decoding) devices. The functions of the two types of device substantially overlap. The scanning unit 360 and the entropy encoder 370 are not used in the decompression mode, and the predictor 320 (to be described in detail below) and other units follow the mode and parameter information contained in the received compressed bitstream rather than generating such information themselves.
Fig. 8 schematically shows the generation of a prediction image, in particular the operation of the image predictor 320.
There are two basic modes of prediction performed by the image predictor 320: so-called intra-picture prediction and so-called inter-picture or Motion Compensated (MC) prediction. On the encoder side, each involves detecting the prediction direction with respect to the current block to be predicted and generating a predicted block of samples from other samples (in the same (intra) or another (inter) picture). By means of the unit 310 or 450, the difference between the predicted block and the actual block is encoded or applied to encode or decode the block, respectively.
(At the decoder, or at the inverse-decoding side of the encoder, the detection of the prediction direction may be made in response to data associated with the encoded data by the encoder, indicating which direction was used at the encoder.)
Intra-picture prediction predicts the content of a block or region of a picture based on data from within the same picture. This corresponds to so-called I-frame coding in other video compression techniques. However, in contrast to I-frame encoding, which involves encoding the entire image by intra-frame encoding, in this embodiment the selection between intra and inter encoding can be made block by block, although in other embodiments the selection is still made image by image.
Motion compensated prediction is an example of inter-picture prediction, using motion information to attempt to define the source of picture detail to be encoded in a current picture in another neighboring or nearby picture. Thus, in an ideal example, the content of a block of image data in a predicted image can be very simply encoded as a reference (motion vector) to a corresponding block in the same or slightly different position in a neighboring image.
The technique known as "block copy" prediction is, in some aspects, a mixture of the two, in that a vector is used to indicate a block of samples at a position within the same picture that is displaced from the current predicted block, which block of samples should be copied to form the current predicted block.
Returning to fig. 8, two image prediction arrangements (corresponding to intra-picture and inter-picture prediction) are shown, the results of which are selected by a multiplexer 500 under the control of a mode signal 510 (e.g., from the controller 343) so as to provide blocks of the predicted image to the adders 310 and 450. The selection is made according to which choice gives the lowest "energy" (which, as mentioned above, can be considered as the information content needing to be encoded), and the selection is signalled to the decoder within the encoded output data stream. Image energy, in this context, can be detected, for example, by performing a trial subtraction of an area of the two versions of the predicted image from the input image, squaring each pixel value of the difference image, summing the squared values, and identifying which of the two versions gives rise to the lower mean squared value of the difference image relating to that image area. In other examples, a trial encoding can be carried out for each selection or potential selection, with a choice then being made according to the cost of each potential selection in terms of one or both of the number of bits required to encode the picture and the distortion to the picture.
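The first of these selection criteria can be sketched as follows (a simplified illustration; a rate-distortion variant would weight the bit cost against the distortion using a Lagrangian multiplier):

    # Sketch: select intra or inter prediction for a region by comparing the
    # mean squared value of each trial difference image.
    def mean_squared_diff(region, prediction):
        diffs = [(a - b) ** 2 for ra, rb in zip(region, prediction)
                 for a, b in zip(ra, rb)]
        return sum(diffs) / len(diffs)

    def select_mode(region, intra_pred, inter_pred):
        intra_cost = mean_squared_diff(region, intra_pred)
        inter_cost = mean_squared_diff(region, inter_pred)
        return "intra" if intra_cost <= inter_cost else "inter"

    region     = [[10, 12], [11, 13]]
    intra_pred = [[10, 11], [11, 12]]
    inter_pred = [[13, 15], [14, 16]]
    print(select_mode(region, intra_pred, inter_pred))  # intra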
In an intra-coding system, the actual prediction is based on image blocks received as part of the signal 460, that is, the prediction is based on encoding decoded image blocks, so that the exact same prediction can be made in the decompression device. However, data may be derived from the input video signal 300 by the intra mode selector 520 to control the operation of the intra picture predictor 530.
For inter-picture prediction, the motion compensated predictor 540 uses motion information, e.g., motion vectors, derived by the motion estimator 550 from the input video signal 300. Motion compensated predictor 540 applies these motion vectors to a processed version of reconstructed image 460 to generate an inter-image predicted block.
Thus, both units 530 and 540 (operating with estimator 550) act as detectors to detect the prediction direction with respect to the current block to be predicted, and as generators to generate a predicted block of samples (forming part of the prediction passed to units 310 and 450) from other samples defined by the prediction direction.
The processing applied to the signal 460 will now be described. First, the signal is optionally filtered by a filter unit 560, which will be described in more detail below. This includes applying a "deblocking" filter to eliminate or at least tend to reduce the impact of block-based processing and subsequent operations performed by transform unit 340. A Sample Adaptive Offset (SAO) filter may also be used. Further, optionally, the adaptive loop filter is applied using coefficients obtained by processing the reconstructed signal 460 and the input video signal 300. An adaptive loop filter is a filter that applies adaptive filter coefficients to data to be filtered using known techniques. That is, the filter coefficients may vary depending on various factors. Data defining which filter coefficients to use is included as part of the encoded output data stream.
When the device is operating as a decompression device, the filtered output from the filter unit 560 in fact forms the output video signal 480. It is also buffered in one or more image or frame memories 570; the storage of successive pictures is a requirement of the motion compensated prediction process, and in particular the generation of motion vectors. To save on memory requirements, the images stored in the image memory 570 may be held in a compressed form and then decompressed for use in generating motion vectors. Any known compression/decompression system may be used for this purpose. The stored images may be passed to an interpolation filter 580, which generates a higher resolution version of the stored images; in this example, intermediate samples (sub-samples) are generated such that the resolution of the interpolated images output by the interpolation filter 580 is 4 times (in each dimension) that of the 4:2:0 luminance channel of the images stored in the image memory 570 and 8 times (in each dimension) that of the 4:2:0 chrominance channels of the images stored in the image memory 570. The interpolated images are passed as an input to the motion estimator 550 and also to the motion compensated predictor 540.
The manner in which an image is partitioned for compression processing will now be described. Basically, the image to be compressed is considered as an array of blocks or regions of samples. The image may be partitioned into such blocks or regions by a decision tree, for example the decision tree described in Bross et al.: "High Efficiency Video Coding (HEVC) text specification draft 6", JCTVC-H1003_d0 (November 2011), the contents of which are incorporated herein by reference. In some examples, the resulting blocks or regions have sizes and, in some cases, shapes which, by virtue of the decision tree, can generally follow the arrangement of image features within the image. This in itself can allow for improved coding efficiency, because samples representing or following similar image features will tend to be grouped together by such an arrangement. In some examples, square blocks or regions of different sizes (e.g., 4 × 4 samples up to 64 × 64 blocks or more) are available for selection. In other example arrangements, differently shaped blocks or regions may be used, for example rectangular blocks (e.g., vertically or horizontally oriented). Other non-square and non-rectangular blocks are envisaged. The result of dividing the image into such blocks or regions is that (at least in this example) each sample of the image is assigned to one, and only one, such block or region.
The intra prediction process will now be discussed. Generally, intra prediction involves generating a prediction of a current block of samples from previously encoded and decoded samples in the same picture.
Fig. 9 schematically shows a partially encoded image 800. Here, the image is coded block by block from top left to bottom right. An example block encoded in the middle of processing the entire image is shown as block 810. Shaded areas 820 above and to the left of block 810 have been encoded. Intra-image prediction of the content of block 810 may utilize any shaded region 820, but cannot utilize the non-shaded region below it.
In some examples, the image is coded block-by-block such that larger blocks (referred to as coding units or CUs) are coded in the order discussed, for example, with reference to fig. 9. Within each CU, it is possible for the CU to be processed as a set of two or more smaller blocks or Transform Units (TUs) (depending on the block partitioning process that has occurred). This may give a hierarchical order of encoding such that pictures are encoded CU-by-CU, and each CU is potentially encoded TU-by-TU. Note, however, that for a single TU within the current coding tree unit (the largest node in the tree structure of block partitioning), the hierarchical order of coding (CU-by-CU, TU-by-TU) discussed above means that there may be previously coded samples in the current CU and available for coding of that TU, e.g., top right or bottom left of that TU.
Block 810 represents a CU; as described above, this may be subdivided into a set of smaller units for the purposes of the intra-image prediction process. An example of a current TU 830 is shown within the CU 810. More generally, a picture is partitioned into sample regions or groups of samples so as to allow efficient coding of signalling information and of the transformed data. The signalling of the information may require a tree structure different from that of the transform sub-partitions, and indeed from that of the prediction information or the prediction itself. For this reason, the coding units may have a tree structure different from that of the transform blocks or regions, the prediction blocks or regions, and the prediction information. In some examples, such as HEVC, the structure can be a so-called quadtree of coding units, whose leaf nodes contain one or more prediction units and one or more transform units; a transform unit can contain multiple transform blocks corresponding to luminance and chrominance representations of the picture, and the prediction can be considered to apply at the transform block level. In examples, the parameters applied to a particular group of samples can be considered to be predominantly defined at the block level, potentially at a different granularity from that of the transform structure.
Intra-picture prediction takes into account samples coded before the current TU is considered, e.g., samples above and/or to the left of the current TU. The source samples, from which the required samples are predicted, may be located at different positions or directions relative to the current TU. To decide which direction is appropriate for the current prediction unit, the mode selector 520 of an example encoder may test all combinations of available TU structures for each candidate direction and select the prediction direction and TU structure with the best compression efficiency.
Pictures may also be coded on a "slice" basis. In one example, a slice is a horizontally adjacent set of CUs. But more generally the entire residual image may form one slice, or one slice may be a single CU, or one slice may be a row of CUs, and so on. Since slices are encoded as independent units, slices can provide some fault tolerance. The encoder and decoder states are completely reset at the slice boundary. For example, intra prediction is not performed on slice boundaries; for this reason, the slice boundary is regarded as an image boundary.
More generally, a picture (which may form part of a sequence of pictures) may be encoded on a sub-portion basis, where each sub-portion is independently decodable and reconstructable (that is to say, independently of any other sub-portion of the picture or of the sequence of pictures, in that encoding parameters are not shared across sub-portion boundaries). For example, the sub-portions may be respective ones of a list comprising sub-pictures, slices, and tiles.
In these examples, (i) a sub-picture represents a region of a picture; (ii) a slice represents a portion of a picture, sub-picture, or tile in raster order, and is constrained to be encapsulated in a respective Network Abstraction Layer (NAL) unit; and (iii) a tile represents a portion of a picture, sub-picture, or slice defined by respective horizontal and vertical boundaries in a grid arrangement, and is not constrained to be encapsulated in a respective NAL unit.
Fig. 10 schematically shows a set of possible (candidate) prediction directions. The full set of candidate directions is available to a prediction unit. The directions are determined by horizontal and vertical displacement relative to the current block position, but are encoded as prediction "modes", a set of which is shown in fig. 11. Note that the so-called DC mode represents a simple arithmetic mean of the surrounding upper and left-hand samples. Note also that the set of directions shown in fig. 10 is just one example; in other examples, a set of (for example) 65 angular modes plus DC and planar modes (a full set of 67 modes), as schematically illustrated in fig. 12, makes up the full set. Other numbers of modes could be used.
In general, after detecting the prediction direction, the system is operable to generate a block of prediction samples from other samples defined by the prediction direction. In an example, an image encoder is configured to encode data identifying a prediction direction selected for each sample or region of an image (and an image decoder is configured to detect such data).
Fig. 13 schematically illustrates an intra prediction process in which samples 900 of a block or region 910 of samples are derived from other reference samples 920 of the same picture according to a direction 930 defined by the intra prediction mode associated with the sample. The reference samples 920 in this example are from the blocks above and to the left of the block 910 in question, and a prediction value for the sample 900 is obtained by tracking the reference sample 920 along the direction 930. The direction 930 may point to a single individual reference sample, but in a more general case, an interpolated value between surrounding reference samples is used as the predicted value. Note that block 910 may be square as shown in fig. 13, or may be other shapes such as rectangular.
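A highly simplified sketch of this tracing and interpolation, for a near-vertical direction using only the row of reference samples above the block, is given below (practical codecs use fixed-point arithmetic, filtered reference samples and many more angle cases; dx_per_row is a hypothetical angle parameter):

    # Sketch: predict a sample by tracing an intra direction back to the
    # reference row above the block and linearly interpolating between the
    # two nearest reference samples. dx_per_row is the assumed horizontal
    # displacement per row of vertical distance.
    def predict_sample(ref_row, x, y, dx_per_row):
        pos = x + (y + 1) * dx_per_row       # landing point on reference row
        left = int(pos)
        frac = pos - left                    # interpolation weight
        right = min(left + 1, len(ref_row) - 1)
        return (1 - frac) * ref_row[left] + frac * ref_row[right]

    ref_row = [100, 104, 108, 112, 116, 120, 124, 128]
    print(predict_sample(ref_row, x=2, y=1, dx_per_row=0.25))  # 110.0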
Fig. 14 and 15 schematically illustrate the previously proposed reference sample projection process.
In fig. 14 and 15, a block or region 1400 of samples to be predicted is surrounded by a linear array of reference samples from which intra prediction of the predicted samples is performed. The reference samples 1410 are shown as shaded blocks in fig. 14 and 15, and the samples to be predicted are shown as unshaded blocks. Note that in this example an 8 x 8 block or region of samples to be predicted is used, but the technique is applicable to variable block sizes and actual block shapes.
As described above, the reference sample comprises at least two linear arrays in respective directions with respect to the current image area of the sample to be predicted. For example, the linear array may be an array or row 1420 of samples above the block of samples to be predicted and an array or column 1430 of samples to the left of the block of samples to be predicted.
As discussed above with reference to fig. 13, the reference sample array may extend beyond the extent of the block to be predicted so as to provide for prediction modes or directions within the range indicated in figs. 10-12. Where previously decoded samples are not available for use as reference samples at particular reference sample positions, other reference samples may be repeated at those missing positions. A reference sample filtering process may be applied to the reference samples.
Fig. 14 schematically illustrates the operation of the CABAC entropy encoder.
The CABAC encoder operates on binary data (that is, data represented by only two symbols 0 and 1). The encoder utilizes a so-called context modeling process that selects a "context" or probability model for subsequent data based on previously encoded data. The selection of the context is performed in a deterministic manner, so that the same determination can be performed at the decoder based on previously decoded data without the need to add further data (specifying the context) to the encoded data stream transmitted to the decoder.
Referring to fig. 14, if the input data to be encoded is not already in binary form, it may be transmitted to a binary converter 1400; if the data is already in binary form, the converter 1400 is bypassed (via schematic switch 1410). In this embodiment, the conversion to binary form is actually achieved by representing the quantized DCT coefficient data as a series of binary "maps", as will be described further below.
The binary data may then be processed by one of two processing paths, namely a "normal" path and a "bypass" path (these two paths are shown schematically as separate paths, but may in practice be implemented by the same processing stage, using only slightly different parameters, in the embodiments of the invention discussed below). The bypass path employs a so-called bypass encoder 1420, which does not necessarily use context modeling in the same form as the conventional path. In some examples of CABAC coding, this bypass path may be chosen if a batch of data needs to be processed particularly quickly, but in this embodiment two features of the so-called "bypass" data are noted: first, the bypass data is processed by a CABAC encoder (950, 1460) using only a fixed context model representing a 50% probability; second, bypass data relates to certain classes of data, one particular example being coefficient sign data. Otherwise, the normal path is selected by the illustrative switches 1430, 1440 operating under the control of the control circuit 1435. This includes data processed by the context modeler 1450 followed by the encoding engine 1460.
If the block is formed entirely of zero-valued data, the entropy encoder shown in fig. 14 encodes the block of data (i.e., data corresponding to a block of coefficients associated with a residual image block, for example) as a single value. For each block that does not belong to this category, that is, a block that contains at least some non-zero data, a "significance map" is prepared. The significance map indicates, for each position in a block of data to be encoded, whether the corresponding coefficient in the block is non-zero. The significance map data, being in binary form, is itself CABAC encoded. The use of a significance map facilitates compression, since no data need be encoded for coefficients having a magnitude indicated as zero by the significance map. Furthermore, the significance map may include a special code indicating the final non-zero coefficient in the block (in the scan order), so that all of the final high-frequency/trailing zero coefficients may be omitted from the encoding. In the encoded bitstream, the significance map is followed by data defining the non-zero coefficient values specified by the significance map.
Further levels of map data are also prepared and encoded. An example is a map that defines, as a binary value (1 = yes, 0 = no), whether the coefficient data at a map position that the significance map has indicated to be "non-zero" actually has the value of "one". Another map specifies whether the coefficient data at a map position that the significance map has indicated to be "non-zero" actually has the value of "two". A further map indicates, for those map positions where the significance map has indicated that the coefficient data is "non-zero", whether the data has a value "greater than two". Another map indicates, again for data identified as "non-zero", the sign of the data value (using a predetermined binary notation such as 1 for +, 0 for -, or of course the other way around).
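The derivation of the significance map and the further maps from a list of coefficients in scan order can be sketched as follows (a simplified illustration of the map structure, not the exact syntax of any coding standard):

    # Sketch: build the significance, level, sign and escape data described
    # above from quantized coefficients in scan order.
    def build_maps(coeffs):
        non_zero = [c for c in coeffs if c != 0]
        return {
            "significant": [1 if c != 0 else 0 for c in coeffs],
            "abs_is_1":    [1 if abs(c) == 1 else 0 for c in non_zero],
            "abs_is_2":    [1 if abs(c) == 2 else 0 for c in non_zero],
            "abs_gt_2":    [1 if abs(c) > 2 else 0 for c in non_zero],
            "sign":        [1 if c > 0 else 0 for c in non_zero],  # 1 = +
            "escape":      [abs(c) for c in non_zero if abs(c) > 2],
        }

    print(build_maps([9, 4, -3, 1, 2, -1, 0, 0]))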
In an embodiment of the invention, the significance map and the other maps are assigned to the CABAC encoder or the bypass encoder in a predetermined manner and both represent different respective attributes or value ranges of the same initial data item. In one example, at least the significance map is CABAC encoded and at least some of the remaining maps (e.g., symbol data) are bypass encoded. Thus, each data item is divided into respective data subsets, and the respective subsets are encoded by first (e.g., CABAC) and second (e.g., bypass) encoding systems. The nature of the data and of the CABAC and bypass codes is such that for a predetermined number of CABAC encoded data, a variable number of zero or more bypass data are generated for the same initial data item. Thus, for example, if the quantized, reordered DCT data contains substantially all zero values, no bypass data or a very small amount of bypass data may be generated, since the bypass data only relates to those mapping positions for which the significance map already indicates a value other than zero. In another example, in quantized reordered DCT data with many high-value coefficients, a large amount of bypass data may be generated.
In an embodiment of the present invention, significance maps and other maps are generated, for example, by the scanning unit 360 from quantized DCT coefficients and subjected to a zigzag scanning process (or a scanning process selected from zigzag, horizontal raster, and vertical raster scanning according to an intra-prediction mode) before being subjected to CABAC coding.
In general terms, CABAC encoding involves predicting a context, or probability model, for the next bit to be encoded based upon other previously encoded data. If the next bit is the same as the bit identified as "most likely" by the probability model, then the information that "the next bit agrees with the probability model" can be encoded with great efficiency. It is less efficient to encode that "the next bit does not agree with the probability model", so the derivation of the context data is important to good operation of the encoder. The term "adaptive" means that the context or probability models are adapted, or varied, during encoding, in an attempt to provide a good match to the (as yet unencoded) next data.
Using a simple analogy, in written English the letter "U" is relatively uncommon. However, in a letter position immediately following the letter "Q", it is very common indeed. So a probability model might set the probability of a "U" to a very low value, but if the current letter is a "Q", the probability of a "U" as the next letter could be set to a very high value.
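To make the adaptation concrete, here is a toy sketch (an illustrative simplification, not the normative CABAC state machine, which uses finite-state probability tables rather than this floating-point update) of a context whose probability estimate tracks the bits it encodes; the ideal arithmetic-coding cost of a bit is -log2 of the probability the model assigned to it:

```python
import math

class AdaptiveContext:
    def __init__(self, p_one=0.5, rate=0.05):
        self.p_one = p_one  # current estimate of P(next bit == 1)
        self.rate = rate    # adaptation speed (illustrative value)

    def cost_in_bits(self, bit):
        # Ideal arithmetic-coding cost of encoding this bit under the model
        p = self.p_one if bit == 1 else 1.0 - self.p_one
        return -math.log2(p)

    def update(self, bit):
        # Move the estimate towards the bit just encoded
        self.p_one += self.rate * ((1.0 if bit == 1 else 0.0) - self.p_one)

ctx = AdaptiveContext()
total = 0.0
for b in [0, 0, 0, 0, 1, 0, 0, 0]:  # mostly-zero data compresses well
    total += ctx.cost_in_bits(b)
    ctx.update(b)
print(f"about {total:.2f} output bits for 8 input bits")
```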
In the present arrangement, CABAC encoding is used at least for the significance map and for the maps indicating whether a non-zero value is one or two. The bypass process, which in these embodiments is identical to CABAC encoding except that the probability model is fixed at an equal (0.5:0.5) probability distribution of 1s and 0s, is used at least for the sign data and for the map indicating whether a value is greater than two. For those data positions identified as greater than two, a separate, so-called escape data encoding may be used to encode the actual value of the data. This may include a Golomb-Rice encoding technique.
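For illustration, a minimal Golomb-Rice encoder is sketched below; the fixed Rice parameter k and the unary convention (q ones followed by a terminating zero) are assumptions made for this sketch, since practical codecs typically adapt the parameter:

```python
def golomb_rice(value, k):
    """Encode a non-negative integer as a bit string: unary quotient + k-bit remainder."""
    q = value >> k                       # quotient, sent in unary
    unary = "1" * q + "0"
    remainder = format(value & ((1 << k) - 1), f"0{k}b") if k > 0 else ""
    return unary + remainder

for v in range(5):
    print(v, golomb_rice(v, 1))
```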
The CABAC context modelling and encoding process is described in more detail in "WD4: Working Draft 4 of High-Efficiency Video Coding", JCTVC-F803_d5, Draft ISO/IEC 23008-HEVC; 201x(E), 2011-10-28.
Referring now to fig. 15 and 16, an entropy encoder forming part of a video encoding apparatus comprises a first encoding system (an arithmetic encoding system such as a CABAC encoder 1500) and a second encoding system (such as a bypass encoder 1510), arranged such that a particular data word or value is encoded into the final output data stream by either the CABAC encoder or the bypass encoder, but not both. In an embodiment of the invention, the data values passed to the CABAC encoder and to the bypass encoder are respective subsets of an ordered set of data values divided or derived from the initial input data (in this example, reordered quantized DCT data), representing different maps in the set of "maps" generated from the input data.
The schematic representation in fig. 15 treats the CABAC encoder and the bypass encoder as separate devices. This may indeed be the case in practice, but in another possibility, shown schematically in fig. 16, a single CABAC encoder 1620 serves as both the CABAC encoder 1500 and the bypass encoder 1510 of fig. 15. The encoder 1620 operates under the control of an encoding mode select signal 1630, so as to operate with an adaptive context model (as described above) when in the mode of the CABAC encoder 1500, and with a fixed 50% probability context model when in the mode of the bypass encoder 1510.
A third possibility combines these two approaches: two substantially identical CABAC encoders operate in parallel (similar to the parallel arrangement of fig. 15), with the difference that the encoder operating as the bypass encoder 1510 has its context model fixed at the 50% probability context model.
The outputs of the CABAC encoding process and of the bypass encoding process may be stored (at least temporarily) in respective buffers 1540, 1550. In the case of fig. 16, a switch or demultiplexer 1660 acts under the control of the mode signal 1630 to route CABAC-encoded data to the buffer 1550 and bypass-encoded data to the buffer 1540.
Fig. 17 and 18 schematically show examples of an entropy decoder forming part of a video decoding apparatus. Referring to fig. 17, respective buffers 1710, 1700 supply data to a CABAC decoder 1730 and a bypass decoder 1720, arranged such that a particular encoded data word or value is decoded by either the CABAC decoder or the bypass decoder, but not both. Logic 1740 reorders the decoded data into the appropriate order for the subsequent decoding stages.
The schematic representation in fig. 17 treats the CABAC decoder and the bypass decoder as separate devices. This may indeed be the case in practice, but in another possibility, shown schematically in fig. 18, a single CABAC decoder 1850 serves as both the CABAC decoder 1730 and the bypass decoder 1720 of fig. 17. The decoder 1850 operates under the control of a decoding mode select signal 1860, so as to operate with an adaptive context model (as described above) when in the mode of the CABAC decoder 1730, and with a fixed 50% probability context model when in the mode of the bypass decoder 1720.
As previously mentioned, a third possibility combines these two approaches: two substantially identical CABAC decoders operate in parallel (similar to the parallel arrangement of fig. 17), with the difference that the decoder operating as the bypass decoder 1720 has its context model fixed at the 50% probability context model.
In the case of fig. 18, a switch or multiplexer 1870 acts under the control of the mode signal 1860 to route encoded data from the buffer 1700 or the buffer 1710, as appropriate, to the decoder 1850.
Fig. 19 schematically illustrates a picture 1900 and will be used to illustrate various picture partitioning schemes relevant to the following discussion.
One example of picture segmentation is slices or "regular slices". Each regular slice is encapsulated in its own Network Abstraction Layer (NAL) unit. Prediction within the picture (e.g., intra-sample prediction, motion information prediction, coding mode prediction) and entropy coding dependencies across slice boundaries are not allowed. This means that one regular slice can be reconstructed independently of the other regular slices in the same picture.
So-called tiles define horizontal and vertical boundaries that partition a picture into rows and columns of tiles. In a manner corresponding to regular slices, in-picture prediction dependencies are not allowed across tile boundaries, nor are entropy decoding dependencies. However, tiles are not limited to being contained in a single NAL unit.
A sub-picture represents a region of a picture and is independently decodable and reconstructable.
Exemplary objectives of the present technique are as follows:
allowing a picture to be composed of a plurality of sub-portions;
allowing the sub-portions to be processed separately, and possibly merged into other combinations, for example into a 360° representation or into an intended viewport within a panoramic or 360° representation;
allowing the separate sub-portions to be extracted and decoded by a single decoder.
In general, there may be multiple tiles in a slice or multiple slices in a tile, with one or more of each located in a picture or sub-picture.
These are examples of sub-portions, where (for the purposes of at least some embodiments) each sub-portion is independently decodable and reconstructable, that is to say, independently of any other sub-portion of the picture or of the picture sequence, so that encoding parameters are not shared across, and do not depend on, sub-portion boundaries. For example, the sub-portions may each be a sub-portion selected from a list comprising sub-pictures, slices and tiles.
The schematic example of fig. 19 shows four slices 1910, 1920, 1930, 1940, of which the slice 1940 comprises two tiles 1950, 1960. However, as mentioned above, this is just one arbitrary illustrative example.
In some example arrangements, there is a threshold on the number of bins (EP or CABAC bins) that can be encoded in a slice or picture, according to the following equation:
BinCountsInNalUnits <= (4/3) * NumBytesInVclNalUnits + (RawMinCuBits * PicSizeInMinCbsY)/32
The right-hand side of the equation is the sum of two parts: a value that is constant for a given image region (RawMinCuBits * PicSizeInMinCbsY), related to the size of the slice or picture; and a dynamic value (NumBytesInVclNalUnits), which is the number of bytes encoded into the output stream for the sub-portion or picture. Note that the factor 4/3 represents a permitted number of bins per bit.
RawMinCuBits is the number of bits in a raw (unencoded) CU of the minimum size, typically 4 x 4; and PicSizeInMinCbsY is the number of minimum-size CUs in the sub-portion or picture.
If this threshold is exceeded, CABAC zero words (each a three-byte sequence with the value 000003) are appended to the stream until the constraint is satisfied. Each such zero word increases the dynamic value by 3.
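As a minimal sketch only (taking the equation exactly as given above, using integer arithmetic, and assuming a hypothetical byte-oriented payload buffer), the padding behaviour could be expressed in Python as follows; applied per sub-portion, as proposed below, the same check simply runs with the sub-portion's own byte count and minimum-CU count:

```python
ZERO_WORD = bytes([0x00, 0x00, 0x03])  # CABAC zero word, as described above

def pad_to_constraint(payload: bytearray, bin_count: int,
                      raw_min_cu_bits: int, pic_size_in_min_cbs_y: int) -> bytearray:
    """Append zero words until the bin-count constraint is satisfied."""
    static_term = (raw_min_cu_bits * pic_size_in_min_cbs_y) // 32
    # BinCountsInNalUnits <= (4/3) * NumBytesInVclNalUnits + static_term
    while bin_count > (4 * len(payload)) // 3 + static_term:
        payload += ZERO_WORD  # each zero word adds 3 to the dynamic value
    return payload
```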
This constraint (or other versions of it with different constants) can be expressed generally as:
N<=K1*B+(K2*CU)
wherein:
N is the number of binary symbols in the output data unit;
K1 is a constant;
B is the number of coded bytes of the output data unit;
K2 is a variable depending on properties of the minimum-size coding unit employed by the image data encoding apparatus; and
CU is the size of the sub-portion represented by the output data unit, expressed as the number of coding units of minimum size.
In the previously proposed example, this threshold check is performed at the picture and slice level.
However, as noted with reference to fig. 19, a picture or slice may be partitioned into multiple tiles. One example of why this is done is to allow the use of multiple concurrent (parallel) decoders.
Under the previously proposed arrangements, an individual tile does not necessarily satisfy the threshold calculation discussed above. This can lead to problems if, for example, tiles are used or decoded independently as pictures, or if different tiles (for example, tiles with different quantization parameters or from different sources) are composited together: there is no guarantee that the composited slice or picture will meet the specification given above.
To address this issue, in example embodiments the CABAC threshold is applied at the end of each sub-portion, rather than only at the end of each slice or picture. So the application of the threshold occurs at the end of encoding any of a tile, a slice and a picture. Note that if every sub-portion in the picture meets the threshold, the entire picture necessarily meets it as well, so where a picture is divided into slices or tiles it is not necessary to apply the threshold again at the end of encoding the picture.
More generally, embodiments of the present disclosure apply the threshold to each of two or more sub-portions of the picture, respectively.
The terms "tile" and "slice" refer to independently decodable units and denote names applicable to the example sub-parts used at the priority date of the present application. In the case of subsequent or other name changes, the setup is applicable to other such independently decodable units. Note that the term "sub-picture" also refers to an example of a sub-portion.
To apply the equations discussed above, the dynamic value represents the number of bytes encoded into the output stream for the tile, while the fixed value depends on the number of minimum-size coding units (CUs) in the tile.
Fig. 20 schematically shows apparatus configured to perform this test. Referring to fig. 20, at an input 2000, a CABAC/EP encoded stream is received from an encoder. The detector 2010 detects whether the threshold calculation above is met at a predetermined stage with respect to encoding of the sub-portion, for example at the end of encoding the slice, tile or sub-picture. In response to the detection by the detector 2010, the controller 2020 controls the generator 2030 to generate padding data 2040, for example the CABAC zero words described above, which are appended to the stream by the combiner 2050 to form the output stream 2060. The generation of zero words may also be signalled back to the detector 2010, so that as zero words are appended the detector 2010 can continue to monitor whether the threshold is met and, once it is, cause the controller 2020 to stop the generation of zero words.
The apparatus of fig. 7 and 14, operating in accordance with the principles just described, represents an example of an image data encoding apparatus comprising:
an entropy encoder (fig. 14) for selectively encoding a data item representing image data to be encoded by a first Context Adaptive Binary Arithmetic Coding (CABAC) encoding system or a second bypass encoding system to generate encoded binary symbols;
the image data represents one or more pictures, each picture comprising an output data unit representing:
(i) one or more slices within respective Network Abstraction Layer (NAL) units, each slice of a picture being decodable independently of any other slice of the same picture; and
(ii) zero or more tiles, which define respective horizontal and vertical boundaries of picture regions and are not limited to encapsulation within respective NAL units, the tiles being decodable independently of other tiles of the same picture;
the entropy encoder is configured to generate an output data stream subject to a constraint that defines an upper limit on the number of binarization symbols that can be represented by any individual output data unit relative to a byte size of the output data unit, wherein the entropy encoder is configured to apply the constraint to each output data unit that represents a slice and each output data unit that represents a tile, and for each output data unit that does not satisfy the constraint, provide padding data to increase the byte size of the output data unit to satisfy the constraint.
Similarly, the apparatus of fig. 7 and 14 operating in accordance with the principles described represents one example of an image data encoding apparatus comprising:
an entropy encoder for selectively encoding a data item representing image data to be encoded by a first Context Adaptive Binary Arithmetic Coding (CABAC) encoding system or a second bypass encoding system to generate encoded binary symbols;
the image data represents one or more pictures of a sequence of pictures, each picture comprising two or more units of output data representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of the picture and other sub-portions in the sequence of pictures;
wherein each sub-portion is affected by a respective minimum compression ratio.
For example, the second encoding system may be a binary arithmetic coding system (e.g., bypass encoder/decoder 1420) using a fixed 50% probability context model.
As described above, the detector 2010 may be configured to detect whether the current output data unit satisfies the constraint at a predetermined stage of encoding with respect to the current output data unit; and the padding data generator 2030 may be configured to generate and insert enough padding data into the current output data unit such that the output data unit including the inserted padding data satisfies the constraint.
The predetermined phase may be the end of encoding the current output data unit.
The apparatus of fig. 7 and 14, operating in accordance with the principles just described, represents an example of an image data encoding apparatus comprising:
an entropy encoder for selectively encoding a data item representing image data to be encoded by a first Context Adaptive Binary Arithmetic Coding (CABAC) encoding system or a second bypass encoding system to generate encoded binary symbols;
the image data represents one or more pictures of a sequence of pictures, each picture comprising two or more units of output data representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of the picture and other sub-portions in the sequence of pictures;
the entropy encoder is configured to generate an output data stream subject to a constraint defining an upper bound on a number of binarization symbols that are representable by any individual output data unit relative to a byte size of the output data unit, wherein the entropy encoder is configured to apply the constraint to each output data unit and, for each output data unit that does not satisfy the constraint, to provide padding data to increase the byte size of the output data unit to satisfy the constraint.
Fig. 21 is a schematic flowchart showing an image data encoding method, including:
selectively encoding (at step 2100) a data item representing image data to be encoded by a first Context Adaptive Binary Arithmetic Coding (CABAC) encoding system or a second bypass encoding system to generate an encoded binary symbol;
the image data represents one or more pictures of a sequence of pictures, each picture comprising two or more units of output data representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of the picture and other sub-portions in the sequence of pictures; and is
Generating (at step 2110) an output data stream;
the generating step is constrained by defining a respective minimum compression ratio applicable to each sub-portion.
Encoding profile and minimum compression ratio
The following techniques may be applied in conjunction with any of the techniques discussed above, or may be applied independently of one or more of the techniques discussed above.
Fig. 22 schematically shows an example set of sub-portions 2210, 2220 of a picture 2200. For example, the sub-portions may be so-called sub-pictures or any other type of sub-portion discussed above.
Each sub-portion has associated parameter data defining aspects of its encoding and decoding. Such parameter data may be provided once for a sequence of at least several pictures (assuming a repeated, identical sub-division between successive pictures), or in principle may be provided afresh for each picture in respect of the particular sub-portions comprised in that picture. The parameter data may be provided as a portion 2310 of a parameter set 2300 (fig. 23) associated with the picture, or as a separate sub-portion parameter set, for example sub-portion header data associated with a sub-portion and representing the encoding applicable to that sub-portion. These may be transmitted in association with the encoded data stream representing the picture data itself, for example as header data, as SEI (supplemental enhancement information) messages, or as sequence parameter sets (which may contain information about the sub-portions, in effect instructions on how to reconstruct the complete picture), and so on.
One example aspect of encoding and decoding is the so-called minimum compression ratio (MCR), which has previously been associated with picture parameter data relating to the entire picture. The use of an MCR provides at least the following benefit: it allows buffers (for example, the compressed picture buffer at the decoder and/or encoder side) to be dimensioned appropriately so that they do not overflow. Indeed, in some examples, the MCR may be defined at least in part by the rate at which data is removed from the compressed picture buffer.
The MCR may be expressed, for example, as a ratio indicating a maximum fractional data quantity (for example "2", indicating a compression ratio of 1/2, or "1/2", indicating the same compression ratio) or as a maximum data quantity for a picture (for example X kilobytes, where an uncompressed version of the picture requires 2X kilobytes). The particular choice of representation format is of no technical significance to the present discussion.
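Purely as an illustration of the byte-budget view of an MCR (the function, its luma-only simplification and the example numbers are assumptions of this sketch):

```python
def max_coded_bytes(width, height, bit_depth, mcr):
    """Byte budget for one picture, with the MCR given as a denominator (2 meaning 1/2)."""
    raw_bytes = (width * height * bit_depth) // 8  # luma samples only, for simplicity
    return raw_bytes // mcr

print(max_coded_bytes(1920, 1080, 8, 2))  # an MCR of 1/2 halves the raw size
```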
In examples of the present disclosure, each sub-portion is subject to a respective minimum compression ratio. For example, such a respective MCR may be defined by parameter data associated with the sub-portions, on the basis of a particular sub-portion in a particular picture or on the basis of sub-portions as defined across a sequence of pictures. In other words, associated data may be provided to define such MCRs on a sub-portion-by-sub-portion basis, and the processing performed to implement and/or comply with such MCRs is likewise carried out sub-portion by sub-portion. This does not require the respective MCRs to be different; rather, they are separately specified.
Accordingly, example embodiments relate to an image data encoding apparatus (and corresponding methods) comprising:
an entropy encoder for selectively encoding a data item representing image data to be encoded by a first Context Adaptive Binary Arithmetic Coding (CABAC) encoding system or a second bypass encoding system to generate encoded binary symbols;
the image data represents one or more pictures of a sequence of pictures, each picture comprising two or more units of output data representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of the picture and other sub-portions in the sequence of pictures;
wherein each sub-portion is affected by a respective minimum compression ratio.
Optionally (in such an arrangement), the entropy encoder may be configured to generate the output data stream subject to a constraint defining an upper limit on the number of binarization symbols that may be represented by any individual output data unit relative to the byte size of that output data unit, wherein the entropy encoder is configured to apply the constraint to each output data unit and, for each output data unit that does not satisfy the constraint, to provide padding data to increase the byte size of that output data unit to satisfy the constraint.
The use of an individual MCR for each sub-portion provides the advantages discussed above, since the sub-portions are independent of one another and can be decoded and buffered separately without fear that such a buffer might overflow.
If a sub-portion cannot be encoded within its associated MCR, the encoder has to make different choices to ensure that the threshold is met, for example using a larger quantization step size or discarding coefficient data.
Another example technique for associating MCRs with sub-portions is to allocate to each sub-portion, at the encoder and/or decoder side, a portion of the total maximum data quantity applicable to the entire picture. For example, this may be performed according to a comparison (for example, in proportion) of the number of pixels or samples in the sub-portion with the number of pixels or samples in the entire picture. In some examples, this can be derived directly from the sub-portion size. In other examples, a value (for example n/256ths, where n is between 1 and 256) may be associated with each sub-portion (for example, in the parameter or header data) and multiplied by the maximum data quantity associated with the picture. In this example, it is generally expected that the sum of the n values across the picture does not exceed 256.
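The two allocation schemes just described could be sketched as follows (a minimal illustration; the function names and the integer rounding are assumptions, and the n/256 values would in practice be carried in parameter or header data):

```python
def budget_by_samples(picture_budget, subportion_sample_counts):
    """Split the picture's byte budget in proportion to sample counts."""
    total = sum(subportion_sample_counts)
    return [picture_budget * s // total for s in subportion_sample_counts]

def budget_by_fraction(picture_budget, n_values):
    """Split the budget using per-sub-portion n/256 fractions from header data."""
    assert sum(n_values) <= 256, "the n values are expected not to exceed 256 in total"
    return [picture_budget * n // 256 for n in n_values]

print(budget_by_samples(100_000, [1920 * 540, 1920 * 540]))
print(budget_by_fraction(100_000, [128, 128]))
```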
In some other examples, the MCR may be defined by a so-called encoding profile. In some examples, the apparatus may operate in accordance with an encoding profile selected from a set of encoding profiles, each encoding profile defining at least one set of constraints on the image data to be encoded and/or the output data stream, wherein the image data encoding apparatus is configured to apply the respective encoding profile to each sub-portion of a picture irrespective of the encoding profile applied to any other sub-portion of that picture.
An encoding profile may define some or all of the features associated with encoding and/or decoding. In general, an encoding profile may be intended for a particular type of material, such as HD (for example, images of 1920 x 1080 pixels) or 4K (for example, images of 3840 x 2160 pixels). However, at least in most cases, nothing prevents, say, a 4K profile being used for a sub-portion of an HD image, or vice versa. The profiles and/or MCRs may simply be independent, or may differ from one another. The use of different profiles, sub-portion by sub-portion, allows the requirements of each sub-picture to be set individually and accurately against the set of encoding profiles/levels that may eventually be used. This can be useful because there are constraints on features, such as the numbers of tile rows and columns, that do not scale linearly with picture size and therefore cannot be derived simply from the fraction of the picture's pixels that a sub-portion represents.
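As an illustration only (the profile table and every constraint value in it are invented for this sketch, not taken from any published specification), a per-sub-portion profile check might look like this:

```python
# Hypothetical profile constraints; real values come from a codec's level tables.
PROFILES = {
    "HD": {"max_luma_samples": 1920 * 1080, "max_tile_cols": 10, "max_tile_rows": 11},
    "4K": {"max_luma_samples": 3840 * 2160, "max_tile_cols": 20, "max_tile_rows": 22},
}

def subportion_conforms(profile_name, luma_samples, tile_cols, tile_rows):
    p = PROFILES[profile_name]
    return (luma_samples <= p["max_luma_samples"]
            and tile_cols <= p["max_tile_cols"]
            and tile_rows <= p["max_tile_rows"])

# Each sub-portion is checked against its own profile, irrespective of the
# profile applied to any other sub-portion of the same picture.
print(subportion_conforms("HD", 1920 * 540, 4, 2))
```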
Fig. 24 schematically illustrates a data signal 2400, the data signal 2400 including respective data 2410 for a set of sub-portions of an image, each sub-portion having an associated respective MCR and/or encoding profile as described above.
Fig. 25 is a schematic flow chart illustrating a method comprising:
compression encoding (at 2500) image data representing one or more pictures of a sequence of pictures, each picture including two or more units of output data representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of the picture and other sub-portions in the sequence of pictures;
wherein the compression encoding step is operable in accordance with an encoding profile selected from a set of encoding profiles, each encoding profile defining at least one set of constraints on the image data and/or output data stream to be encoded, wherein the compression encoding step comprises applying the respective encoding profile to each sub-portion of a picture independently of the encoding profile applied to any other sub-portion of the picture.
The apparatus of fig. 7 operating according to the features of fig. 25 provides an example of an image data encoding apparatus including:
an image data encoder for applying compression coding to generate compressed image data representing one or more pictures of a sequence of pictures, each picture comprising an output data unit representing two or more sub-portions, a sub-portion being decodable and reconstructable independently of the picture or other sub-portions of the sequence of pictures;
wherein the apparatus is operable in accordance with an encoding profile selected from a set of encoding profiles, each encoding profile defining at least one set of constraints on the image data and/or output data stream to be encoded, wherein the image data encoding apparatus is configured to apply the respective encoding profile to each sub-portion of a picture irrespective of the encoding profile applied to any other sub-portion of the picture.
Coding example 1: modifying the profile definition
A.4.2 Profile-specific level restrictions
The variable MinCr is set equal to MinCrBase * MinCrScaleFactor ÷ HbrFactor.
For sub-picture index j, the variable SubPicSizeInSamplesY[j] is set equal to (subpic_width_minus1[j] + 1) * (subpic_height_minus1[j] + 1), and SubPictureFraction[j] is set equal to SubPicSizeInSamplesY[j] ÷ PicSizeInSamplesY.
The sum of the NumBytesInNalUnit variables of access unit 0 corresponding to each sub-picture j shall be less than or equal to FormatCapabilityFactor * (Max(SubPicSizeInSamplesY[j], fR * MaxLumaSr * SubPictureFraction[j]) + MaxLumaSr * (AuCpbRemovalTime[0] - AuNominalRemovalTime[0]) * SubPictureFraction[j]) ÷ MinCr for picture 0, where MaxLumaSr and FormatCapabilityFactor are the values specified in Table A.2 and Table A.3, respectively, for picture 0.
The sum of the NumBytesInNalUnit variables of access unit n (n greater than 0) corresponding to each sub-picture j shall be less than or equal to FormatCapabilityFactor * MaxLumaSr * (AuCpbRemovalTime[n] - AuCpbRemovalTime[n-1]) * SubPictureFraction[j] ÷ MinCr, where MaxLumaSr and FormatCapabilityFactor are the values specified in Table A.2 and Table A.3, respectively, for picture n.
Note that MaxLumaSr is the maximum luminance sample rate. FormatCapabilityFactor converts MaxLumaSr into a raw bit rate. AuCpbRemovalTime[n] - AuCpbRemovalTime[n-1] gives the time interval that converts that rate into a number of bits for a frame.
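By way of a numerical illustration only, the following Python sketch evaluates the access-unit-n bound just given. Every input value here is an assumption made for the example; in practice MaxLumaSr and FormatCapabilityFactor would be taken from Tables A.2 and A.3 and MinCr derived as in A.4.2:

```python
def max_subpic_bytes(format_capability_factor, max_luma_sr,
                     au_removal_time_n, au_removal_time_prev,
                     sub_picture_fraction, min_cr):
    """Upper bound on the sum of NumBytesInNalUnit for sub-picture j, access unit n > 0."""
    interval = au_removal_time_n - au_removal_time_prev  # seconds between CPB removals
    return (format_capability_factor * max_luma_sr * interval
            * sub_picture_fraction) / min_cr

# e.g. a sub-picture covering a quarter of the picture, 60 fps timing, MinCr = 8
print(max_subpic_bytes(1.5, 1_069_547_520, 2 / 60, 1 / 60, 0.25, 8))
```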
Coding example 2: SEI semantics
D.3.8 sub-picture level information SEI message semantics
A requirement of bitstream conformance is that a bitstream extracted for the jth sub-picture, with j in the range 0 to sps_num_subpics_minus1, inclusive, and conforming to a level with i in the range 0 to num_ref_level_minus1, inclusive, shall obey the following constraints for each bitstream conformance test specified in Annex C:
the value of Ceil(256 * SubPicSizeY[j] ÷ RefLevelFraction[i][j]) shall be less than or equal to MaxLumaPs, where MaxLumaPs is specified in Table A.1;
the value of Ceil(256 * (subpic_width_minus1[j] + 1) ÷ RefLevelFraction[i][j]) shall be less than or equal to Sqrt(MaxLumaPs * 8);
the value of Ceil(256 * (subpic_height_minus1[j] + 1) ÷ RefLevelFraction[i][j]) shall be less than or equal to Sqrt(MaxLumaPs * 8);
the value of SubPicNumTileCols[j] shall be less than or equal to MaxTileCols, and the value of SubPicNumTileRows[j] shall be less than or equal to MaxTileRows, where MaxTileCols and MaxTileRows are specified in Table A.1;
the sum of the NumBytesInNalUnit variables of access unit 0 corresponding to the jth sub-picture shall be less than or equal to FormatCapabilityFactor * (Max(SubPicSizeInSamplesY[j], fR * MaxLumaSr * RefLevelFraction[i][j] ÷ 256) + MaxLumaSr * (AuCpbRemovalTime[0] - AuNominalRemovalTime[0]) * RefLevelFraction[i][j] ÷ 256) ÷ MinCr, where MaxLumaSr and FormatCapabilityFactor are the values specified in Table A.2 and Table A.3, respectively, for picture 0, at the level ref_level_idc[i]; the derivation of MinCr is shown in A.4.2;
the sum of the NumBytesInNalUnit variables of access unit n (n greater than 0) corresponding to the jth sub-picture shall be less than or equal to FormatCapabilityFactor * MaxLumaSr * (AuCpbRemovalTime[n] - AuCpbRemovalTime[n-1]) * RefLevelFraction[i][j] ÷ (256 * MinCr), where MaxLumaSr and FormatCapabilityFactor are the values specified in Table A.2 and Table A.3, respectively, for picture n, at the level ref_level_idc[i]; the derivation of MinCr is shown in A.4.2.
In each case, embodiments of the disclosure are represented by computer software and a machine-readable non-transitory storage medium storing such computer software, which when executed by a computer, causes the computer to perform a corresponding method. In the case of an encoding method, embodiments of the present disclosure are represented by a data signal comprising encoded data generated according to a corresponding method. To the extent that embodiments of the present disclosure have been described as being implemented at least in part by a software-controlled data processing device, it should be understood that a non-transitory machine-readable medium (e.g., an optical disk, a magnetic disk, a semiconductor memory, etc.) carrying such software is also considered to represent embodiments of the present disclosure. Similarly, data signals (whether contained on a non-transitory machine-readable medium or not) that include encoded data generated according to the above-described methods are also considered to represent embodiments of the present disclosure. Similarly, a decoder configured to decode such a data signal represents an embodiment of the present disclosure.
Obviously, many modifications and variations of the present disclosure are possible in light of the above teachings. It is, therefore, to be understood that within the scope of the appended claims, the technology may be practiced otherwise than as specifically described herein.
Corresponding aspects and features are defined by the following numbered items:
1. an image data encoding apparatus comprising:
an entropy encoder for selectively encoding a data item representing image data to be encoded by a first Context Adaptive Binary Arithmetic Coding (CABAC) encoding system or a second bypass encoding system to generate encoded binary symbols;
the image data represents one or more pictures, each picture comprising an output data unit representing:
(i) one or more slices within respective Network Abstraction Layer (NAL) units, each slice of a picture being decodable independently of any other slice of the same picture; and
(ii) zero or more tiles, which define respective horizontal and vertical boundaries of picture regions and are not limited to encapsulation within respective NAL units, the tiles being decodable independently of other tiles of the same picture;
the entropy encoder is configured to generate an output data stream subject to a constraint defining an upper limit on the number of binarization symbols that can be represented by any individual output data unit relative to a byte size of the output data unit, wherein the entropy encoder is configured to apply the constraint to each output data unit representing a slice and each output data unit representing a tile, and for each output data unit not satisfying the constraint, to provide padding data to increase the byte size of the output data unit to satisfy the constraint.
2. The image data encoding apparatus according to item 1, wherein the second encoding system is a binary arithmetic encoding system using a fixed 50% probability context model.
3. The image data encoding apparatus according to item 1 or 2, wherein the constraint is defined by:
N<=K1*B+(K2*CU)
wherein:
N is the number of binarized symbols in the output data unit;
K1 is a constant;
B is the number of coded bytes of the output data unit;
K2 is a variable depending on properties of the minimum-size coding unit employed by the image data encoding apparatus; and
CU is the size of the picture, slice, or tile represented by the output data unit, expressed as the number of coding units of minimum size.
4. The image data encoding apparatus according to any one of the preceding claims, wherein the entropy encoder includes:
a detector configured to detect whether the current output data unit satisfies the constraint at a predetermined stage of encoding with respect to the current output data unit; and
a padding data generator configured to generate and insert sufficient padding data into the current output data unit such that the output data unit including the inserted padding data satisfies the constraint.
5. The image data encoding apparatus according to item 4, wherein the predetermined stage is an end of encoding the current output data unit.
6. A video storage, capture, transmission or reception apparatus comprising an apparatus according to any preceding claim.
7. An image data encoding method comprising:
selectively encoding a data item representing image data to be encoded by a first Context Adaptive Binary Arithmetic Coding (CABAC) encoding system or a second bypass encoding system to generate an encoded binary symbol;
the image data represents one or more pictures, each picture comprising:
(i) one or more slices within respective Network Abstraction Layer (NAL) units, each slice of a picture being decodable independently of any other slice of the same picture; and
(ii) zero or more tiles, a tile defining respective horizontal and vertical boundaries of a picture region and not being limited to encapsulation within a respective NAL unit, the tiles being decodable independently of other tiles of the same picture;
generating an output data stream;
the generating step is subject to a constraint defining an upper limit on the number of binarized symbols, which may be represented by any individual output data unit relative to the byte size of that output data unit, wherein the generating step comprises applying the constraint to each output data unit representing a slice and each output data unit representing a tile; and is
For each output data unit that does not satisfy the constraint, padding data is provided to increase the byte size of the output data unit to satisfy the constraint.
8. Computer software which, when executed by a computer, causes the computer to perform the method of item 7.
9. A machine-readable non-transitory storage medium storing computer software according to item 8.
10. A data signal comprising encoded data generated according to the method of item 7.
11. An image data decoder configured to decode the data signal according to item 10.
Further corresponding aspects and features are defined by the following numbered items:
1. an image data encoding apparatus comprising:
an entropy encoder for selectively encoding a data item representing image data to be encoded by a first Context Adaptive Binary Arithmetic Coding (CABAC) encoding system or a second bypass encoding system to generate encoded binary symbols;
the image data represents one or more pictures of a sequence of pictures, each picture comprising two or more units of output data representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of the picture and other sub-portions in the sequence of pictures;
the entropy encoder is configured to generate an output data stream subject to a constraint defining an upper bound on a number of binarization symbols, the binarization symbols being representable by any individual output data unit relative to a byte size of the output data unit, wherein the entropy encoder is configured to apply the constraint to each output data unit and, for each output data unit not satisfying the constraint, to provide padding data to increase the byte size of the output data unit to satisfy the constraint.
2. The image data encoding apparatus according to item 1, wherein the second encoding system is a binary arithmetic encoding system using a fixed 50% probability context model.
3. The image data encoding apparatus according to item 1 or item 2, wherein the constraint is defined by:
N<=K1*B+(K2*CU)
wherein:
N is the number of binarized symbols in the output data unit;
K1 is a constant;
B is the number of coded bytes of the output data unit;
K2 is a variable depending on properties of the minimum-size coding unit employed by the image data encoding apparatus; and
CU is the size of the sub-portion represented by the output data unit, expressed as the number of coding units of minimum size.
4. The image data encoding apparatus according to any one of the preceding claims, wherein the entropy encoder includes:
a detector configured to detect whether the current output data unit satisfies the constraint at a predetermined stage of encoding with respect to the current output data unit; and
a padding data generator configured to generate and insert sufficient padding data into the current output data unit such that the output data unit including the inserted padding data satisfies the constraint.
5. The image data encoding apparatus according to item 4, wherein the predetermined stage is an end of encoding a current output data unit.
6. The image data encoding apparatus of any preceding claim, wherein the sub-portions respectively represent sub-portions from a list comprising sub-pictures, slices and tiles.
7. The image data encoding apparatus according to any one of the preceding claims, wherein:
(i) the sub-picture represents a region of the picture;
(ii) a slice represents a portion of a picture, sub-picture, or tile in raster order and is constrained to be encapsulated in a corresponding Network Abstraction Layer (NAL) unit; and
(iii) tiles represent a portion of a picture, sub-picture, or slice that defines respective horizontal and vertical boundaries in a grid arrangement and are not limited to encapsulation in respective NAL units.
8. The image data encoding apparatus of any of the preceding claims, wherein each sub-portion is affected by a respective minimum compression ratio.
9. An image data encoding apparatus according to any preceding claim, wherein the apparatus is operable according to an encoding profile selected from a set of encoding profiles, each encoding profile defining at least one set of constraints on the image data to be encoded and/or the output data stream, wherein the image data encoding apparatus is configured to apply the respective encoding profile to each sub-portion of a picture irrespective of the encoding profile applied to any other sub-portion of that picture.
10. The image data encoding apparatus according to item 9, wherein the encoding profile for a subsection is defined by subsection header data associated with the subsection.
11. A video storage, capture, transmission or reception apparatus comprising an apparatus according to any preceding claim.
12. An image data encoding apparatus comprising:
an image data encoder for applying compression coding to generate compressed image data representing one or more pictures of a sequence of pictures, each picture comprising an output data unit representing two or more sub-portions, a sub-portion being decodable and reconstructable independently of the picture or other sub-portions of the sequence of pictures;
wherein the device is operable in accordance with an encoding profile selected from a set of encoding profiles, each encoding profile defining at least one set of constraints on the image data and/or output data stream to be encoded, wherein the image data encoding device is configured to apply the respective encoding profile to each sub-portion of a picture irrespective of the encoding profile applied to any other sub-portion of the picture.
13. The image data encoding apparatus of item 12, wherein the encoding profile defines at least a minimum compression ratio, the image data encoder being configured to generate compressed image data subject to the respective minimum compression ratio applicable to each sub-portion.
14. A video storage, capture, transmission or reception device comprising a device according to item 12 or item 13.
15. An image data encoding method comprising:
selectively encoding a data item representing image data to be encoded by a first Context Adaptive Binary Arithmetic Coding (CABAC) encoding system or a second bypass encoding system to generate an encoded binary symbol;
the image data represents one or more pictures of a sequence of pictures, each picture comprising two or more units of output data representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of the picture and other sub-portions in the sequence of pictures;
generating an output data stream;
the generating step is subject to a constraint defining an upper limit on the number of binarized symbols, which may be represented by any individual output data unit relative to the byte size of that output data unit, wherein the generating step includes applying the constraint to each output data unit; and is
For each output data unit that does not satisfy the constraint, padding data is provided to increase the byte size of the output data unit to satisfy the constraint.
16. Computer software which, when executed by a computer, causes the computer to perform the method of item 15.
17. A machine-readable non-transitory storage medium storing computer software according to item 16.
18. A data signal comprising encoded data generated according to the method of item 15.
19. An image data decoder configured to decode the data signal according to item 18.
20. An image data encoding method comprising:
compression encoding image data representing one or more pictures of a sequence of pictures, each picture comprising two or more units of output data representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of the picture and other sub-portions in the sequence of pictures;
wherein the compression encoding step is operable in accordance with an encoding profile selected from a set of encoding profiles, each encoding profile defining at least one set of constraints on the image data and/or output data stream to be encoded, wherein the compression encoding step comprises applying the respective encoding profile to each sub-portion of a picture irrespective of the encoding profile applied to any other sub-portion of the picture.
21. Computer software which, when executed by a computer, causes the computer to perform the method of item 20.
22. A machine-readable non-transitory storage medium storing computer software according to item 21.
23. A data signal comprising encoded data generated according to the method of item 20.
24. A data signal comprising respective data for a set of sub-portions of an image, each sub-portion having an associated respective minimum compression ratio and/or encoding profile.
25. An image data decoder configured to decode the data signal according to item 23 or item 24.
Further corresponding aspects and features are defined by the following numbered items:
1. an image data encoding apparatus comprising:
an entropy encoder for selectively encoding a data item representing image data to be encoded by a first Context Adaptive Binary Arithmetic Coding (CABAC) encoding system or a second bypass encoding system to generate encoded binary symbols;
the image data represents one or more pictures of a sequence of pictures, each picture comprising two or more units of output data representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of the picture and other sub-portions in the sequence of pictures;
wherein each sub-portion is affected by a respective minimum compression ratio.
2. The image data encoding apparatus of item 1, wherein the entropy encoder is configured to generate the output data stream subject to a constraint that defines an upper limit on a number of binarized symbols representable by any individual output data unit relative to a byte size of that output data unit, wherein the entropy encoder is configured to apply the constraint to each output data unit and, for each output data unit that does not satisfy the constraint, provide padding data to increase the byte size of that output data unit to satisfy the constraint.
3. The image data encoding apparatus according to item 1 or item 2, wherein the second encoding system is a binary arithmetic encoding system using a fixed 50% probability context model.
4. The image data encoding apparatus according to any one of the preceding claims, wherein the constraint is defined by:
N<=K1*B+(K2*CU)
wherein:
N is the number of binarized symbols in the output data unit;
K1 is a constant;
B is the number of coded bytes of the output data unit;
K2 is a variable depending on properties of the minimum-size coding unit employed by the image data encoding apparatus; and
CU is the size of the sub-portion represented by the output data unit, expressed as the number of coding units of minimum size.
5. The image data encoding apparatus according to any one of the preceding claims, wherein the entropy encoder includes:
a detector configured to detect whether the current output data unit satisfies the constraint at a predetermined stage of encoding with respect to the current output data unit; and
a padding data generator configured to generate and insert sufficient padding data into the current output data unit such that the output data unit including the inserted padding data satisfies the constraint.
6. The image data encoding apparatus according to item 5, wherein the predetermined stage is an end of encoding the current output data unit.
7. The image data encoding apparatus of any preceding claim, wherein the sub-portions respectively represent sub-portions from a list comprising sub-pictures, slices and tiles.
8. The image data encoding apparatus according to any one of the preceding claims, wherein:
(i) the sub-picture represents a region of the picture;
(ii) a slice represents a portion of a picture, sub-picture, or tile in raster order and is constrained to be encapsulated in a corresponding Network Abstraction Layer (NAL) unit; and
(iii) tiles represent portions of a picture, sub-picture, or slice that define respective horizontal and vertical boundaries in a grid arrangement and are not limited to encapsulation in respective NAL units.
9. An image data encoding apparatus according to any preceding claim, wherein the apparatus is operable according to an encoding profile selected from a set of encoding profiles, each encoding profile defining at least one set of constraints on the image data to be encoded and/or the output data stream, wherein the image data encoding apparatus is configured to apply the respective encoding profile to each sub-portion of a picture irrespective of the encoding profile applied to any other sub-portion of that picture.
10. The image data encoding apparatus according to item 9, wherein the encoding profile for a subsection is defined by subsection header data associated with the subsection.
11. A video storage, capture, transmission or reception apparatus comprising an apparatus according to any preceding claim.
12. An image data encoding apparatus comprising:
an image data encoder for applying compression coding to generate compressed image data representing one or more pictures of a sequence of pictures, each picture comprising an output data unit representing two or more sub-portions, a sub-portion being decodable and reconstructable independently of the picture or other sub-portions of the sequence of pictures;
wherein the device is operable in accordance with an encoding profile selected from a set of encoding profiles, each encoding profile defining at least one set of constraints on the image data and/or output data stream to be encoded, wherein the image data encoding device is configured to apply the respective encoding profile to each sub-portion of a picture irrespective of the encoding profile applied to any other sub-portion of the picture.
13. The image data encoding apparatus of item 12, wherein the encoding profile defines at least a minimum compression ratio, the image data encoder being configured to generate compressed image data subject to the respective minimum compression ratio applicable to each sub-portion.
14. A video storage, capture, transmission or reception device comprising the device according to item 12.
15. An image data encoding method comprising:
selectively encoding a data item representing image data to be encoded by a first Context Adaptive Binary Arithmetic Coding (CABAC) encoding system or a second bypass encoding system to generate an encoded binary symbol;
the image data represents one or more pictures of a sequence of pictures, each picture comprising two or more units of output data representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of the picture and other sub-portions in the sequence of pictures; and is
Generating an output data stream;
the generating step is constrained by defining a respective minimum compression ratio applicable to each sub-portion.
16. Computer software which, when executed by a computer, causes the computer to perform the method of item 15.
17. A machine-readable non-transitory storage medium storing computer software according to item 16.
18. A data signal comprising encoded data generated according to the method of item 15.
19. An image data decoder configured to decode the data signal according to item 18.
20. An image data encoding method comprising:
compression encoding image data representing one or more pictures of a sequence of pictures, each picture comprising two or more units of output data representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of the picture and other sub-portions in the sequence of pictures;
wherein the compression encoding step is operable in accordance with an encoding profile selected from a set of encoding profiles, each encoding profile defining at least one set of constraints on the image data and/or output data stream to be encoded, wherein the compression encoding step comprises applying the respective encoding profile to each sub-portion of a picture irrespective of the encoding profile applied to any other sub-portion of the picture.
21. Computer software which, when executed by a computer, causes the computer to perform the method of item 20.
22. A machine-readable non-transitory storage medium storing computer software according to item 21.
23. A data signal comprising encoded data generated according to the method of item 20.
24. A data signal comprising respective data for a set of sub-portions of an image, each sub-portion having an associated respective minimum compression ratio and/or encoding profile.
25. An image data decoder configured to decode a data signal according to item 23 or item 24.

Claims (26)

1. An image data encoding apparatus comprising:
an entropy encoder for selectively encoding a data item representing image data to be encoded by a first Context Adaptive Binary Arithmetic Coding (CABAC) encoding system or a second bypass encoding system to generate encoded binary symbols;
the image data representing one or more pictures of a sequence of pictures, each picture comprising two or more units of output data representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of the picture and other sub-portions in the sequence of pictures;
wherein each sub-portion is affected by a respective minimum compression ratio.
2. The image data encoding apparatus of claim 1, wherein the entropy encoder is configured to generate the output data stream subject to a constraint that defines an upper limit on a number of binarized symbols that can be represented by any individual output data unit relative to a byte size of that output data unit, wherein the entropy encoder is configured to apply the constraint to each output data unit and, for each output data unit that does not satisfy the constraint, provide padding data to increase the byte size of that output data unit to satisfy the constraint.
3. The image data encoding apparatus according to claim 1, wherein the second encoding system is a binary arithmetic encoding system using a fixed 50% probability context model.
4. The image data encoding apparatus according to claim 1, wherein the constraint is defined by:
N<=K1*B+(K2*CU)
wherein:
N is the number of binarized symbols in the output data unit;
K1 is a constant;
B is the number of coded bytes of the output data unit;
K2 is a variable depending on properties of the minimum-size coding unit employed by the image data encoding apparatus; and
CU is the size of the sub-portion represented by the output data unit, expressed as the number of coding units of minimum size.
5. The image data encoding apparatus according to claim 1, wherein the entropy encoder includes:
a detector configured to detect whether a current output data unit satisfies the constraint at a predetermined stage of encoding with respect to the current output data unit; and
a padding data generator configured to generate and insert sufficient padding data into the current output data unit such that the output data unit including the inserted padding data satisfies the constraint.
6. The image data encoding apparatus according to claim 5, wherein the predetermined stage is an end of encoding the current output data unit.
7. The image data encoding apparatus of claim 1, wherein the sub-portions respectively represent sub-portions from a list including sub-pictures, slices, and tiles.
8. The image data encoding apparatus according to claim 1, wherein:
(i) the sub-picture represents a region of the picture;
(ii) a slice represents a portion of a picture, sub-picture, or tile in raster order and is constrained to be encapsulated in a corresponding Network Abstraction Layer (NAL) unit; and
(iii) tiles represent portions of a picture, sub-picture, or slice that define respective horizontal and vertical boundaries in a grid arrangement and are not limited to encapsulation in respective network abstraction layer units.
9. An image data encoding apparatus according to claim 1, wherein the apparatus is operable in accordance with an encoding profile selected from a set of encoding profiles, each encoding profile defining at least one set of constraints on the image data to be encoded and/or the output data stream, wherein the image data encoding apparatus is configured to apply the respective encoding profile to each sub-portion of a picture irrespective of the encoding profile applied to any other sub-portion of that picture.
10. The image data encoding apparatus of claim 9, wherein the encoding profile for a subsection is defined by subsection header data associated with the subsection.
11. A video storage, capture, transmission or reception device comprising the device of claim 1.
12. An image data encoding apparatus comprising:
an image data encoder for applying compression coding to generate compressed image data representing one or more pictures of a sequence of pictures, each picture comprising an output data unit representing two or more sub-portions that are decodable and reconstructable independently of the picture or other sub-portions of the sequence of pictures;
wherein the device is operable in accordance with an encoding profile selected from a set of encoding profiles, each encoding profile defining at least one set of constraints on the image data and/or output data stream to be encoded, wherein the image data encoding device is configured to apply the respective encoding profile to each sub-portion of a picture irrespective of the encoding profile applied to any other sub-portion of the picture.
13. The image data encoding apparatus according to claim 12, wherein the encoding profile defines at least a minimum compression ratio, the image data encoder being configured to generate compressed image data subject to the respective minimum compression ratio applicable to each sub-portion.
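The minimum-compression-ratio constraint of claim 13 reduces to a size comparison per sub-portion, sketched here:

```python
def meets_min_compression(raw_bytes, coded_bytes, min_ratio):
    """True if the sub-portion achieved at least min_ratio, i.e. the
    raw size is at least min_ratio times the coded size."""
    return raw_bytes >= min_ratio * coded_bytes

# Under a (hypothetical) ratio of 2.0, a 65536-byte sub-portion must
# code into at most 32768 bytes.
print(meets_min_compression(65536, 32768, 2.0))  # True
```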
14. A video storage, capture, transmission or reception apparatus comprising an apparatus according to claim 12.
15. An image data encoding method comprising:
selectively encoding data items representing image data to be encoded by either a first Context Adaptive Binary Arithmetic Coding (CABAC) encoding system or a second, bypass, encoding system to generate encoded binary symbols;
the image data representing one or more pictures of a sequence of pictures, each picture comprising two or more output data units representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of the picture and other sub-portions in the sequence of pictures; and
generating an output data stream;
the generating step being constrained by a respective minimum compression ratio applicable to each sub-portion.
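The selective step of this method routes each binarized symbol either to the adaptive CABAC path or to the bypass path. A schematic sketch; the engine and its two methods are hypothetical stand-ins for an arithmetic-coding core, not an API from the claims:

```python
class StubEngine:
    """Minimal stand-in so the routing can be exercised."""
    def encode_adaptive(self, b, ctx):
        print("CABAC path:", b, "context", ctx)   # first system
    def encode_bypass(self, b):
        print("bypass path:", b)                  # second system

def encode_bin(engine, bin_value, ctx=None):
    # Bins with a context model use the adaptive (CABAC) system;
    # context-free bins take the bypass system.
    if ctx is None:
        engine.encode_bypass(bin_value)
    else:
        engine.encode_adaptive(bin_value, ctx)

encode_bin(StubEngine(), 1, ctx=7)  # CABAC path
encode_bin(StubEngine(), 0)         # bypass path
```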
16. Computer software which, when executed by a computer, causes the computer to perform the method of claim 15.
17. A machine-readable non-transitory storage medium storing computer software according to claim 16.
18. A data signal comprising encoded data generated according to the method of claim 15.
19. An image data decoder configured to decode the data signal of claim 18.
20. An image data encoding method comprising:
compression encoding image data representing one or more pictures of a sequence of pictures, each picture comprising two or more output data units representing respective sub-portions of the picture, each sub-portion being decodable and reconstructable independently of the picture and other sub-portions in the sequence of pictures;
wherein the compression encoding step is operable in accordance with an encoding profile selected from a set of encoding profiles, each encoding profile defining at least one set of constraints on the image data to be encoded and/or the output data stream, and wherein the compression encoding step comprises applying the respective encoding profile to each sub-portion of a picture irrespective of the encoding profile applied to any other sub-portion of the picture.
21. Computer software which, when executed by a computer, causes the computer to perform the method of claim 20.
22. A machine-readable non-transitory storage medium storing computer software according to claim 21.
23. A data signal comprising encoded data generated according to the method of claim 20.
24. A data signal comprising respective data for a set of sub-portions of an image, each sub-portion having an associated respective minimum compression ratio and/or encoding profile.
25. An image data decoder configured to decode the data signal of claim 23.
26. An image data decoder configured to decode the data signal of claim 24.
CN202080044703.8A 2019-06-25 2020-06-24 Image data encoding and decoding Pending CN114009029A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
GB1909143.8 2019-06-25
GB1909143.8A GB2585042A (en) 2019-06-25 2019-06-25 Image data encoding and decoding
GB1919471.1 2019-12-31
GB1919471.1A GB2585111A (en) 2019-06-25 2019-12-31 Image data encoding and decoding
PCT/GB2020/051538 WO2020260880A1 (en) 2019-06-25 2020-06-24 Image data encoding and decoding

Publications (1)

Publication Number Publication Date
CN114009029A 2022-02-01

Family

Family ID: 67511692

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202080044703.8A Pending CN114009029A (en) 2019-06-25 2020-06-24 Image data encoding and decoding
CN202080044868.5A Pending CN113994680A (en) 2019-06-25 2020-06-24 Independent CABAC for sub-portions of pictures

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202080044868.5A Pending CN113994680A (en) 2019-06-25 2020-06-24 Independent CABAC for sub-portions of pictures

Country Status (9)

Country Link
US (2) US20220360783A1 (en)
EP (2) EP3991420A1 (en)
JP (1) JP2022539311A (en)
KR (1) KR20220027162A (en)
CN (2) CN114009029A (en)
BR (1) BR112021025666A2 (en)
GB (2) GB2585042A (en)
PL (1) PL3991419T3 (en)
WO (2) WO2020260880A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220132173A1 (en) * 2019-08-06 2022-04-28 Hyundai Motor Company Entropy-coding for video encoding and decoding


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4391425B2 (en) * 2002-09-20 2009-12-24 株式会社エヌ・ティ・ティ・ドコモ Arithmetic coding method and arithmetic coding apparatus
GB201119180D0 (en) * 2011-11-07 2011-12-21 Sony Corp Data encoding and decoding
US9351016B2 (en) * 2012-04-13 2016-05-24 Sharp Kabushiki Kaisha Devices for identifying a leading picture
JP6059572B2 (en) * 2013-03-22 2017-01-11 株式会社メガチップス Image processing device

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030164975A1 (en) * 2002-02-21 2003-09-04 Canon Kabushiki Kaisha Image processing apparatus and image processing method
US20080130756A1 (en) * 2005-09-20 2008-06-05 Mitsubishi Electric Corporation Image encoding method and image decoding method, image encoder and image decoder, and image encoded bit stream and recording medium
US20100135416A1 (en) * 2008-12-03 2010-06-03 Yu-Wen Huang Method for performing parallel coding with ordered entropy slices, and associated apparatus
CN103096074A (en) * 2011-11-07 2013-05-08 索尼公司 Video Data Encoding And Decoding
US20150172693A1 (en) * 2012-09-29 2015-06-18 Huawei Technologies Co.,Ltd. Video encoding and decoding method, apparatus and system
CN109257048A (en) * 2013-04-08 2019-01-22 索尼公司 Method, data deciphering device and the video receiver of decoding data value sequence
GB201403983D0 (en) * 2014-03-06 2014-04-23 Sony Corp Data encoding and decoding
CN106537920A (en) * 2014-06-20 2017-03-22 高通股份有限公司 Systems and methods for constraining representation format parameters for a parameter set
US20160142740A1 (en) * 2014-11-17 2016-05-19 Sony Corporation Data encoding and decoding

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ITU: "High efficiency video coding, H.265 04/13, ITU-", RECOMMENDATION ITU-T H.265, 13 April 2013 (2013-04-13), pages 199 *
SALLY HATTORI ET AL: "HLS: Extensions to Temporal Motion-constrained tile sets SEI message", JOINT COLLABORATIVE TEAM ON VIDEO CODING (JCT-VC)OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 16TH MEETING: SAN JOSÉ, US, 9–17 JAN. 2014,JCTVC-P0051, 16 January 2014 (2014-01-16), pages 5 - 6 *
YONGJUN WU ET AL: "Motion-constrained tile sets SEI message", JOINT COLLABORATIVE TEAM ON VIDEO CODING (JCT-VC)OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 13TH MEETING: INCHEON, KR, 18–26 APRIL 2013, JCTVC-M0235 - V.2, 26 April 2013 (2013-04-26), pages 1 - 2 *

Also Published As

Publication number Publication date
WO2020260879A1 (en) 2020-12-30
GB2585042A (en) 2020-12-30
WO2020260880A1 (en) 2020-12-30
EP3991419A1 (en) 2022-05-04
BR112021025666A2 (en) 2022-02-08
GB2585111A (en) 2020-12-30
US20220360783A1 (en) 2022-11-10
EP3991420A1 (en) 2022-05-04
US20220360782A1 (en) 2022-11-10
JP2022539311A (en) 2022-09-08
GB201909143D0 (en) 2019-08-07
EP3991419B1 (en) 2023-11-08
GB201919471D0 (en) 2020-02-12
KR20220027162A (en) 2022-03-07
CN113994680A (en) 2022-01-28
PL3991419T3 (en) 2024-03-11

Similar Documents

Publication Publication Date Title
US10893273B2 (en) Data encoding and decoding
US10284864B2 (en) Content initialization for enhancement layer coding
EP3991419B1 (en) Independent cabac for sub-sections of a picture
CN115769574A (en) Data encoding and decoding
GB2585041A (en) Image data encoding and decoding
US20230007259A1 (en) Image data encoding and decoding
CN111684798A (en) Data encoding and decoding
US20220248024A1 (en) Image data encoding and decoding
US11936872B2 (en) Image data encoding and decoding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination