WO2015114322A1 - Image data encoding and decoding


Info

Publication number
WO2015114322A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
prediction
samples
block
intra
Prior art date
Application number
PCT/GB2015/050188
Other languages
French (fr)
Inventor
Karl James Sharman
James Alexander GAMEI
Original Assignee
Sony Corporation
Sony Europe Limited
Priority date
Filing date
Publication date
Application filed by Sony Corporation, Sony Europe Limited
Publication of WO2015114322A1


Classifications

    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/59 Predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/159 Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/176 Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/593 Predictive coding involving spatial prediction techniques

Definitions

  • This disclosure relates to data encoding and decoding.
  • There are video data compression and decompression systems which involve generating a prediction of a group of video data, deriving the residual or difference between the video data and the prediction, transforming the residual data into a frequency domain representation, quantising the frequency domain coefficients and then applying some form of entropy encoding to the quantised coefficients.
  • intra-block-copy may be considered as a hybrid of inter- and intra-image encoding, in that a predicted block is derived from another block in an encoded and decoded version of the current image, defined with respect to the current block by a motion vector.
  • This disclosure provides a data encoding method according to claim 1.
  • Figure 1 schematically illustrates an audio/video (A/V) data transmission and reception system using video data compression and decompression
  • Figure 2 schematically illustrates a video display system using video data decompression
  • Figure 3 schematically illustrates an audio/video storage system using video data compression and decompression
  • Figure 4 schematically illustrates a video camera using video data compression
  • Figure 5 provides a schematic overview of a video data compression and decompression apparatus
  • Figure 6 schematically illustrates the generation of predicted images
  • Figure 7 schematically illustrates a largest coding unit (LCU);
  • Figure 8 schematically illustrates a set of four coding units (CU);
  • Figures 9 and 10 schematically illustrate the coding units of Figure 8 sub-divided into smaller coding units
  • Figure 11 schematically illustrates an array of prediction units (PU);
  • Figure 12 schematically illustrates an array of transform units (TU);
  • Figure 13 schematically illustrates a partially-encoded image
  • Figure 14 schematically illustrates a set of possible prediction directions
  • Figure 15 schematically illustrates a set of prediction modes
  • Figure 16 schematically illustrates a zig-zag scan
  • Figure 17 schematically illustrates an intra-block-copy process
  • Figures 18a-18c schematically illustrate the division of a CU into multiple PUs
  • Figure 19 is a schematic flowchart illustrating a previously proposed process for using multiple PUs derived from a single CU
  • Figure 20 is a schematic flowchart illustrating a process for merging small PUs together
  • Figures 21-23 schematically illustrate sets of PUs derived from a square CU
  • Figure 24 is a schematic flowchart illustrating a technique for using multiple PUs derived from a single CU
  • Figure 25 schematically illustrates a modified search area
  • Figure 26 is a schematic flowchart illustrating a technique for using multiple PUs derived from a single CU
  • Figure 27 schematically illustrates an encoding method
  • Figure 28 schematically illustrates a decoding method
  • Figures 29 and 30 schematically illustrate examples of machine-readable non-transitory storage media.
  • Figures 1-4 are provided to give schematic illustrations of apparatus or systems making use of the compression and/or decompression apparatus to be described below in connection with embodiments. All of the data compression and/or decompression apparatus to be described below may be implemented in hardware, in software running on a general-purpose data processing apparatus such as a general-purpose computer, as programmable hardware such as an application specific integrated circuit (ASIC) or field programmable gate array (FPGA), or as combinations of these.
  • ASIC application specific integrated circuit
  • FPGA field programmable gate array
  • Figure 1 schematically illustrates an audio/video data transmission and reception system using video data compression and decompression.
  • An input audio/video signal 10 is supplied to a video data compression apparatus 20 which compresses at least the video component of the audio/video signal 10 for transmission along a transmission route 30 such as a cable, an optical fibre, a wireless link or the like.
  • the compressed signal is processed by a decompression apparatus 40 to provide an output audio/video signal 50.
  • a compression apparatus 60 compresses an audio/video signal for transmission along the transmission route 30 to a decompression apparatus 70.
  • The compression apparatus 20 and decompression apparatus 70 can therefore form one node of a transmission link.
  • The decompression apparatus 40 and compression apparatus 60 can form another node of the transmission link.
  • If the transmission link is uni-directional, only one of the nodes would require a compression apparatus and the other node would only require a decompression apparatus.
  • FIG. 2 schematically illustrates a video display system using video data decompression.
  • A compressed audio/video signal 100 is processed by a decompression apparatus 110 to provide a decompressed signal which can be displayed on a display 120.
  • The decompression apparatus 110 could be implemented as an integral part of the display 120, for example being provided within the same casing as the display device.
  • The decompression apparatus 110 might be provided as (for example) a so-called set top box (STB), noting that the expression "set-top" does not imply a requirement for the box to be sited in any particular orientation or position with respect to the display 120; it is simply a term used in the art to indicate a device which is connectable to a display as a peripheral device.
  • STB set top box
  • Figure 3 schematically illustrates an audio/video storage system using video data compression and decompression.
  • An input audio/video signal 130 is supplied to a compression apparatus 140 which generates a compressed signal for storing by a store device 150 such as a magnetic disk device, an optical disk device, a magnetic tape device, a solid state storage device such as a semiconductor memory or other storage device.
  • compressed data is read from the store device 150 and passed to a decompression apparatus 160 for decompression to provide an output audio/video signal 170.
  • Figure 4 schematically illustrates a video camera using video data compression.
  • An image capture device 180, such as a charge coupled device (CCD) image sensor and associated control and read-out electronics, generates a video signal which is passed to a compression apparatus 190.
  • a microphone (or plural microphones) 200 generates an audio signal to be passed to the compression apparatus 190.
  • the compression apparatus 190 generates a compressed audio/video signal 210 to be stored and/or transmitted (shown generically as a schematic stage 220).
  • the techniques to be described below relate primarily to video data compression. It will be appreciated that many existing techniques may be used for audio data compression in conjunction with the video data compression techniques which will be described, to generate a compressed audio/video signal. Accordingly, a separate discussion of audio data compression will not be provided. It will also be appreciated that the data rate associated with video data, in particular broadcast quality video data, is generally very much higher than the data rate associated with audio data (whether compressed or uncompressed). It will therefore be appreciated that uncompressed audio data could accompany compressed video data to form a compressed audio/video signal.
  • Figure 5 provides a schematic overview of a video data compression and decompression apparatus.
  • Successive images of an input video signal 300 are supplied to an adder 310 and to an image predictor 320.
  • the image predictor 320 will be described below in more detail with reference to Figure 6.
  • The adder 310 in fact performs a subtraction (negative addition) operation, in that it receives the input video signal 300 on a "+" input and the output of the image predictor 320 on a "-" input, so that the predicted image is subtracted from the input image. The result is to generate a so-called residual image signal 330 representing the difference between the actual and predicted images.
  • a residual image signal is generated.
  • The data coding techniques to be described, that is to say the techniques which will be applied to the residual image signal, tend to work more efficiently when there is less "energy" in the image to be encoded.
  • the term “efficiently” refers to the generation of a small amount of encoded data; for a particular image quality level, it is desirable (and considered “efficient") to generate as little data as is practicably possible.
  • the reference to "energy” in the residual image relates to the amount of information contained in the residual image. If the predicted image were to be identical to the real image, the difference between the two (that is to say, the residual image) would contain zero information (zero energy) and would be very easy to encode into a small amount of encoded data. In general, if the prediction process can be made to work reasonably well, the expectation is that the residual image data will contain less information (less energy) than the input image and so will be easier to encode into a small amount of encoded data.
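  • As an illustrative aside (not part of the patent disclosure), the following Python sketch shows residual formation and an energy measure taken as a sum of squared sample differences; the array shapes and NumPy usage are assumptions for illustration only.

```python
import numpy as np

def residual_energy(input_block: np.ndarray, predicted_block: np.ndarray) -> int:
    """Form the residual (input minus prediction) and return its 'energy',
    here measured as the sum of squared sample differences."""
    residual = input_block.astype(np.int64) - predicted_block.astype(np.int64)
    return int(np.sum(residual * residual))

# A perfect prediction leaves a zero-energy residual, which encodes trivially.
block = np.full((8, 8), 128, dtype=np.uint8)
assert residual_energy(block, block) == 0
```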
  • the residual image data 330 is supplied to a transform unit 340 which generates a discrete cosine transform (DCT) representation of the residual image data.
  • DCT discrete cosine transform
  • DST discrete sine transform
  • the output of the transform unit 340 which is to say, a set of transform coefficients for each transformed block of image data, is supplied to a quantiser 350.
  • quantisation techniques are known in the field of video data compression, ranging from a simple multiplication by a quantisation scaling factor through to the application of complicated lookup tables under the control of a quantisation parameter. The general aim is twofold. Firstly, the quantisation process reduces the number of possible values of the transformed data. Secondly, the quantisation process can increase the likelihood that values of the transformed data are zero. Both of these can make the entropy encoding process, to be described below, work more efficiently in generating small amounts of compressed video data.
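  • A minimal sketch of the "simple multiplication by a quantisation scaling factor" variant mentioned above; the step size and the round-towards-zero rule are illustrative assumptions, not the HEVC quantiser.

```python
import numpy as np

def quantise(coeffs: np.ndarray, qstep: int) -> np.ndarray:
    """Map transform coefficients to a smaller set of levels; larger steps
    drive more coefficients to zero, which helps the entropy coder."""
    return np.fix(coeffs / qstep).astype(np.int32)   # round towards zero

def dequantise(levels: np.ndarray, qstep: int) -> np.ndarray:
    """Approximate inverse scaling; the information lost at quantise() is gone."""
    return (levels * qstep).astype(np.int32)

coeffs = np.array([[100, -7, 3], [5, -2, 1], [2, 1, 0]])
levels = quantise(coeffs, qstep=8)        # many small coefficients become zero
assert dequantise(levels, 8)[0, 0] == 96  # 100 -> level 12 -> 96 after rescaling
```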
  • a data scanning process is applied by a scan unit 360.
  • the purpose of the scanning process is to reorder the quantised transformed data so as to gather as many as possible of the non-zero quantised transformed coefficients together, and of course therefore to gather as many as possible of the zero-valued coefficients together.
  • These features can allow so-called run-length coding or similar techniques to be applied efficiently.
  • the scanning process involves selecting coefficients from the quantised transformed data, and in particular from a block of coefficients corresponding to a block of image data which has been transformed and quantised, according to a "scanning order" so that (a) all of the coefficients are selected once as part of the scan, and (b) the scan tends to provide the desired reordering.
  • a scanning order which can tend to give useful results is a so-called zigzag scanning order.
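  • The zigzag order can be generated by walking the anti-diagonals of the block in alternating direction, as in this sketch; it follows one common convention and may differ in starting direction from the scan shown in Figure 16.

```python
def zigzag_order(n: int) -> list:
    """Return the (row, col) positions of an n x n block in zig-zag order,
    gathering positions diagonal by diagonal from the top-left corner."""
    order = []
    for s in range(2 * n - 1):                       # s = row + col on a diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        order.extend(diag if s % 2 else diag[::-1])  # alternate traversal direction
    return order

assert zigzag_order(2) == [(0, 0), (0, 1), (1, 0), (1, 1)]
```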
  • CABAC Context Adaptive Binary Arithmetic Coding
  • CAVLC Context Adaptive Variable-Length Coding
  • the output of the entropy encoder 370 along with additional data (mentioned above and/or discussed below), for example defining the manner in which the predictor 320 generated the predicted image, provides a compressed output video signal 380.
  • a return path is also provided because the operation of the predictor 320 itself depends upon a decompressed version of the compressed output data.
  • the reason for this feature is as follows. At the appropriate stage in the decompression process (to be described below) a decompressed version of the residual data is generated. This decompressed residual data has to be added to a predicted image to generate an output image (because the original residual data was the difference between the input image and a predicted image). In order that this process is comparable, as between the compression side and the decompression side, the predicted images generated by the predictor 320 should be the same during the compression process and during the decompression process. Of course, at decompression, the apparatus does not have access to the original input images, but only to the decompressed images. Therefore, at compression, the predictor 320 bases its prediction (at least, for inter-image encoding) on decompressed versions of the compressed images.
  • The entropy encoding process carried out by the entropy encoder 370 is considered to be "lossless", which is to say that it can be reversed to arrive at exactly the same data which was first supplied to the entropy encoder 370. So, the return path can be implemented before the entropy encoding stage. Indeed, the scanning process carried out by the scan unit 360 is also considered lossless, but in the present embodiment the return path 390 is from the output of the quantiser 350 to the input of a complementary inverse quantiser 420.
  • An entropy decoder 410, a reverse scan unit 400, an inverse quantiser 420 and an inverse transform unit 430 provide the respective inverse functions of the entropy encoder 370, the scan unit 360, the quantiser 350 and the transform unit 340.
  • the discussion will continue through the compression process; the process to decompress an input compressed video signal will be discussed separately below.
  • In the compression process, the coefficients are passed by the return path 390 from the quantiser 350 to the inverse quantiser 420, which carries out the inverse operation of the quantiser 350.
  • Inverse quantisation and inverse transformation processes are carried out by the units 420, 430 to generate a compressed-decompressed residual image signal 440.
  • the image signal 440 is added, at an adder 450, to the output of the predictor 320 to generate a reconstructed output image 460. This forms one input to the image predictor 320, as will be described below.
  • the signal is supplied to the entropy decoder 410 and from there to the chain of the reverse scan unit 400, the inverse quantiser 420 and the inverse transform unit 430 before being added to the output of the image predictor 320 by the adder 450.
  • the output 460 of the adder 450 forms the output decompressed video signal 480.
  • further filtering may be applied before the signal is output.
  • Figure 6 schematically illustrates the generation of predicted images, and in particular the operation of the image predictor 320.
  • Intra-image prediction bases a prediction of the content of a block of the image on data from within the same image. This corresponds to so-called I-frame encoding in other video compression techniques.
  • In contrast to I-frame encoding, where the whole image is intra-encoded, the choice between intra- and inter-encoding can be made on a block-by-block basis, though in other embodiments the choice is still made on an image-by-image basis.
  • Motion-compensated prediction makes use of motion information which attempts to define the source, in another adjacent or nearby image, of image detail to be encoded in the current image. Accordingly, in an ideal example, the contents of a block of image data in the predicted image can be encoded very simply as a reference (a motion vector) pointing to a corresponding block at the same or a slightly different position in an adjacent image.
  • two image prediction arrangements (corresponding to intra- and inter-image prediction) are shown, the results of which are selected by a multiplexer 500 under the control of a mode signal 510 so as to provide blocks of the predicted image for supply to the adders 310 and 450.
  • The choice is made in dependence upon which selection gives the lowest "energy" (which, as discussed above, may be considered as information content requiring encoding), and the choice is signalled to the decoder within the encoded output datastream.
  • Image energy in this context, can be detected, for example, by carrying out a trial subtraction of an area of the two versions of the predicted image from the input image, squaring each pixel value of the difference image, summing the squared values, and identifying which of the two versions gives rise to the lower mean squared value of the difference image relating to that image area.
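  • A sketch of that trial-subtraction comparison; the function names and the use of a mean squared value are illustrative assumptions rather than the patent's exact measure.

```python
import numpy as np

def select_prediction(input_area: np.ndarray,
                      intra_pred: np.ndarray,
                      inter_pred: np.ndarray) -> str:
    """Choose the prediction whose difference image against the input area
    has the lower mean squared value, i.e. the lower 'energy'."""
    def mean_sq(pred):
        diff = input_area.astype(np.int64) - pred.astype(np.int64)
        return float(np.mean(diff * diff))
    return "intra" if mean_sq(intra_pred) <= mean_sq(inter_pred) else "inter"
```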
  • the actual prediction, in the intra-encoding system, is made on the basis of image blocks received as part of the signal 460, which is to say, the prediction is based upon encoded- decoded image blocks in order that exactly the same prediction can be made at a decompression apparatus.
  • data can be derived from the input video signal 300 by an intra-mode selector 520 to control the operation of the intra-image predictor 530.
  • a motion compensated (MC) predictor 540 uses motion information such as motion vectors derived by a motion estimator 550 from the input video signal 300. Those motion vectors are applied to a processed version of the reconstructed image 460 by the motion compensated predictor 540 to generate blocks of the inter-image prediction.
  • the signal is filtered by a filter unit 560.
  • an adaptive loop filter is applied using coefficients derived by processing the reconstructed signal 460 and the input video signal 300.
  • the adaptive loop filter is a type of filter which, using known techniques, applies adaptive filter coefficients to the data to be filtered. That is to say, the filter coefficients can vary in dependence upon various factors. Data defining which filter coefficients to use is included as part of the encoded output datastream.
  • the filtered output from the filter unit 560 in fact forms the output video signal 480. It is also buffered in one or more image stores 570; the storage of successive images is a requirement of motion compensated prediction processing, and in particular the generation of motion vectors. To save on storage requirements, the stored images in the image stores 570 may be held in a compressed form and then decompressed for use in generating motion vectors. For this particular purpose, any known compression / decompression system may be used.
  • The stored images are passed to an interpolation filter 580 which generates a higher resolution version of the stored images; in this example, intermediate samples (sub-samples) are generated such that the resolution of the interpolated image output by the interpolation filter 580 is 8 times (in each dimension) that of the images stored in the image stores 570.
  • the interpolated images are passed as an input to the motion estimator 550 and also to the motion compensated predictor 540.
  • a further optional stage is provided, which is to multiply the data values of the input video signal by a factor of four using a multiplier 600 (effectively just shifting the data values left by two bits), and to apply a corresponding divide operation (shift right by two bits) at the output of the apparatus using a divider or right-shifter 610. So, the shifting left and shifting right changes the data purely for the internal operation of the apparatus. This measure can provide for higher calculation accuracy within the apparatus, as the effect of any data rounding errors is reduced.
  • Figure 7 schematically illustrates a largest coding unit (LCU) 700, which represents a square array of 64 x 64 samples.
  • Here, the discussion relates to luminance samples. Depending on the chrominance mode, such as 4:4:4, 4:2:2, 4:2:0 or 4:4:4:4 (GBR plus key data), there will be a differing number of corresponding chrominance samples.
  • Three basic types of blocks will be described: coding units, prediction units and transform units.
  • the recursive subdividing of the LCUs allows an input picture to be partitioned in such a way that both the block sizes and the block coding parameters (such as prediction or residual coding modes) can be set according to the specific characteristics of the image to be encoded.
  • the LCU may be subdivided into so-called coding units (CU). Coding units are always square and have a size between 8x8 samples and the full size of the LCU 700.
  • the coding units can be arranged as a kind of tree structure, so that a first subdivision may take place as shown in Figure 8, giving coding units 710 of 32x32 samples; subsequent subdivisions may then take place on a selective basis so as to give some coding units 720 of 16x16 samples ( Figure 9) and potentially some coding units 730 of 8x8 samples ( Figure 10). Overall, this process can provide a content-adapting coding tree structure of CU blocks, each of which may be as large as the LCU or as small as 8x8 samples. Encoding of the output video data takes place on the basis of the coding unit structure.
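  • A sketch of this recursive quadtree subdivision; should_split stands in for the encoder's mode decision and is a hypothetical callback, not part of the patent.

```python
def split_cu(x: int, y: int, size: int, should_split, min_size: int = 8) -> list:
    """Recursively divide a square region into four quadrants while
    should_split(x, y, size) returns True, never going below min_size.
    Returns the leaf coding units as (x, y, size) tuples."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += split_cu(x + dx, y + dy, half, should_split, min_size)
        return leaves
    return [(x, y, size)]

# Splitting a 64x64 LCU exactly once yields four 32x32 CUs, as in Figure 8.
assert len(split_cu(0, 0, 64, lambda x, y, s: s == 64)) == 4
```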
  • Figure 11 schematically illustrates an array of prediction units (PU).
  • a prediction unit is a basic unit for carrying information relating to the image prediction processes, or in other words the additional data added to the entropy encoded residual image data to form the output video signal from the apparatus of Figure 5.
  • prediction units are not restricted to being square in shape. They can take other shapes, in particular rectangular shapes forming half of one of the square coding units, as long as the coding unit is greater than the minimum (8x8) size.
  • the aim is to allow the boundary of adjacent prediction units to match (as closely as possible) the boundary of real objects in the picture, so that different prediction parameters can be applied to different real objects.
  • Each coding unit may contain one or more prediction units.
  • FIG. 12 schematically illustrates an array of transform units (TU).
  • a transform unit is a basic unit of the transform and quantisation process. Transform units are always square and can take a size from 4x4 up to 32x32 samples. Each coding unit can contain one or more transform units.
  • The acronym SDIP-P in Figure 12 signifies a so-called short distance intra-prediction partition. In this arrangement only one-dimensional transforms are used, so a 4xN block is passed through N transforms with input data to the transforms being based upon the previously decoded neighbouring blocks and the previously decoded neighbouring lines within the current SDIP-P.
  • intra-prediction involves generating a prediction of a current block (a prediction unit) of samples from previously-encoded and decoded samples in the same image.
  • Figure 13 schematically illustrates a partially encoded image 800. Here, the image is being encoded from top-left to bottom-right on an LCU basis.
  • An example LCU encoded partway through the handling of the whole image is shown as a block 810.
  • a shaded region 820 above and to the left of the block 810 has already been encoded.
  • the intra-image prediction of the contents of the block 810 can make use of any of the shaded area 820 but cannot make use of the unshaded area below that.
  • the block 810 represents an LCU; as discussed above, for the purposes of intra-image prediction processing, this may be subdivided into a set of smaller prediction units.
  • An example of a prediction unit 830 is shown within the LCU 810.
  • the intra-image prediction takes into account samples above and/or to the left of the current LCU 810.
  • Source samples, from which the required samples are predicted, may be located at different positions or directions relative to a current prediction unit within the LCU 810.
  • To decide which direction is appropriate for a current prediction unit, the results of a trial prediction based upon each candidate direction are compared in order to see which candidate direction gives an outcome which is closest to the corresponding block of the input image.
  • the candidate direction giving the closest outcome is selected as the prediction direction for that prediction unit.
  • the picture may also be encoded on a "slice" basis.
  • A slice is a horizontally adjacent group of LCUs. But in more general terms, the entire image could form a slice, or a slice could be a single LCU, or a slice could be a row of LCUs, and so on. Slices can give some resilience to errors as they are encoded as independent units.
  • The encoder and decoder states are completely reset at a slice boundary. For example, intra-prediction is not carried out across slice boundaries; slice boundaries are treated as image boundaries for this purpose.
  • Figure 14 schematically illustrates a set of possible (candidate) prediction directions. The full set of 34 candidate directions is available to a prediction unit of 8x8, 16x16 or 32x32 samples.
  • prediction unit sizes of 4x4 and 64x64 samples have a reduced set of candidate directions available to them (17 candidate directions and 5 candidate directions respectively).
  • the directions are determined by horizontal and vertical displacement relative to a current block position, but are encoded as prediction "modes", a set of which is shown in Figure 15. Note that the so-called DC mode represents a simple arithmetic mean of the surrounding upper and left-hand samples.
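  • A sketch of the DC mode as just described; the neighbouring reference samples are passed in as one-dimensional arrays, and the boundary filtering used by real codecs is omitted.

```python
import numpy as np

def dc_prediction(above: np.ndarray, left: np.ndarray, size: int) -> np.ndarray:
    """DC mode: fill the whole block with the arithmetic mean of the
    reconstructed samples immediately above and to the left of it."""
    dc = int(round((int(above.sum()) + int(left.sum())) / (above.size + left.size)))
    return np.full((size, size), dc, dtype=np.int32)

above = np.array([100, 102, 104, 106])
left = np.array([100, 100, 100, 100])
assert dc_prediction(above, left, 4)[0, 0] == 102  # mean of the 8 neighbours
```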
  • Figure 16 schematically illustrates a zig-zag scan as an example of an order of processing samples in a block, starting from a sample position 840.
  • Figure 17 schematically illustrates an intra-block-copy process.
  • In Figure 17, two largest coding units (LCUs) 1000, 1010 are shown, representing a portion of a current image to be encoded.
  • the LCU 1000 is currently being encoded, whereas the LCU 1010 is a previously encoded and then decoded portion of the image to the left of the LCU 1000.
  • the encoding process proceeds generally from the top left to the lower right of the image, so that the LCU 1010 is handled by the encoding process before the LCU 1000.
  • coding units (CUs) such as a CU 1020 are encoded in a certain order, for example from top left to lower right within the LCU. Accordingly, when an arbitrary CU 1020 is encoded, unless the CU 1020 is the first CU within that LCU to be encoded, there will be other portions of the LCU containing the CU 1020 which have already been encoded and decoded.
  • The intra-block-copy process bases a prediction of the CU 1020 on a correspondingly-sized source block (for example, a block 1030) which is selected from portions of the image which have already been encoded and decoded.
  • the spatial relationship between the CU 1020 and the source block 1030 is indicated by a motion vector 1040.
  • The prediction is based upon an encoded and decoded version of the source block, so that the same information is available at the encoder and at the decoder (the decoder does not have access to the original image but only to encoded and decoded versions of the image).
  • the prediction of the CU 1020 is equal to the contents of the encoded-decoded source block 1030.
  • various processing could be applied to the contents of the source block 1030 to form the prediction of the CU 1020.
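  • In the simple case where the prediction equals the source block contents, the copy can be sketched as below; the coordinate conventions and array layout are assumptions for illustration.

```python
import numpy as np

def ibc_prediction(recon: np.ndarray, x: int, y: int,
                   w: int, h: int, mv: tuple) -> np.ndarray:
    """Predict the block at (x, y) by copying the same-sized source block
    from the encoded-and-decoded image, displaced by motion vector mv."""
    dx, dy = mv  # displacement towards already-decoded samples (typically up/left)
    return recon[y + dy : y + dy + h, x + dx : x + dx + w].copy()
```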
  • Intra-block-copy is particularly useful when dealing with so-called "screen content", which is a term sometimes used to define image content not generated using a camera.
  • screen content may include animation, computer graphics, subtitling and the like, or a combination of these with content generated using a camera.
  • Screen content may provide multiple small regions which are sufficiently similar to one another that the intra-block-copy process can provide a very accurate prediction of a current block.
  • the source block 1030 must lie in a region of the image which has already been encoded and decoded.
  • the source block 1030 must be in the same row of LCUs as the LCU containing the CU 1020 to be encoded.
  • the source block 1030 must be displaced to the left of the LCU containing the CU 1020 to be encoded by no more than a maximum horizontal displacement (though the source block 1030 may also be within the LCU containing the CU 1020).
  • the maximum horizontal displacement is equal to 64 luminance samples.
  • a further criterion may be applied in a so-called "constrained intra” mode.
  • This mode is provided within the standards to allow, for example, the image to be refreshed periodically even if an entire intra-encoded image (an "I frame" in MPEG terminology) is not being provided.
  • In the constrained intra mode, at least portions of the image are constrained so that they may only be derived from other samples which have themselves been intra-image encoded or encoded using intra-block-copy processing. In this way, the constrained intra mode prevents (for example, over the course of several images) erroneous data from previously processed images being propagated into subsequent images by virtue of inter-image processing.
  • In the constrained intra mode, therefore, the source block 1030 must itself have been intra-image encoded and/or encoded using intra-block-copy processing (it could be both because a source block 1030 could span two CUs).
  • These criteria provide an example of an allowable range of displacements represented by the motion vectors, comprising one or more selected from the group consisting of: displacements to a position within a current row of coding units; displacements of no more than a predetermined number of samples to the left of the LCU containing the prediction unit; and, in a constrained intra mode, displacements to source samples which were derived using intra-image or intra-block-copy processing.
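  • A simplified sketch of testing a candidate displacement against these criteria; the lcu_size and max_left defaults reflect the 64-sample example above, the is_intra_coded_region callback is a hypothetical stand-in, and the full standard applies further availability checks.

```python
def mv_allowable(bx: int, by: int, w: int, h: int, mv: tuple,
                 lcu_size: int = 64, max_left: int = 64,
                 constrained_intra: bool = False,
                 is_intra_coded_region=None) -> bool:
    """Return True if the source block implied by mv satisfies the example
    criteria: same LCU row, limited leftward displacement and, in constrained
    intra mode, an intra/intra-block-copy coded source region."""
    sx, sy = bx + mv[0], by + mv[1]
    row_top = (by // lcu_size) * lcu_size            # top of the current LCU row
    if not (row_top <= sy and sy + h <= row_top + lcu_size):
        return False                                 # outside the current LCU row
    if sx < (bx // lcu_size) * lcu_size - max_left:
        return False                                 # too far left of this LCU
    if constrained_intra and is_intra_coded_region is not None:
        if not is_intra_coded_region(sx, sy, w, h):
            return False                             # source not intra/IBC coded
    return True
```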
  • So far, the CU 1020 has been treated as a single unit for the purposes of the intra-block-copy process. Strictly speaking, the CU 1020 has been handled as a single prediction unit (PU) and the motion vector 1040 was in fact associated with the single PU. Where the size and extent of the CU exactly matched that of the single PU, the question of whether to refer to the block 1020 as a CU or a PU was somewhat semantic. However, the difference becomes technically significant where a CU is divided into multiple PUs for intra-block-copy processing. In previously proposed arrangements, it was impermissible under the draft HEVC standards for a CU that was to be subject to intra-block-copy processing to be divided into multiple PUs. In contrast, the present embodiments relate to arrangements in which a CU, to be subject to intra-block-copy processing, can be divided into multiple PUs.
  • PU prediction unit
  • Figures 18a-18c schematically illustrate the division of a CU into multiple PUs.
  • a CU comprises a square-shaped array of luminance samples and the associated chrominance samples.
  • In the case of a 4:4:4 chrominance format, the array of corresponding or associated chrominance samples will also be a square array of equal size to the array of luminance samples, because of the 1:1 relationship between samples of the luminance and chrominance components.
  • In the case of a subsampled chrominance format such as 4:2:2 or 4:2:0, there will be fewer chrominance samples than luminance samples for a particular CU.
  • Figures 18a-18c schematically illustrate the situation for the luminance samples only; the situation for the chrominance samples will be discussed further below.
  • The terms luminance and chrominance apply to a system in which there is indeed a luminance channel, such as a YUV system.
  • In a system without a distinct luminance channel, the equivalent primary channel would be one of the channels such as (in the example of RGB) the green channel.
  • the discussions here relating to "luminance” can therefore be applied to the primary channel (such as the green channel in an RGB system) and the discussions here relating to "chrominance” can be applied to channels other than the primary channel (such as red and blue in an RGB system).
  • the techniques are applicable to any situation in which motion vectors derived from a PU are potentially shared by other PUs, whether or not particular channels or components are defined.
  • the dimension is expressed as "2N" to indicate that the number is divisible by 2; this feature will become clear when discussing the nomenclature of the divided PUs below.
  • In Figure 18a, a 2N x 2N CU 1100 is divided horizontally into two PUs, each of size N x 2N samples.
  • In Figure 18b, a 2N x 2N CU 1110 is divided vertically into two PUs, each of size 2N x N samples.
  • In Figure 18c, a 2N x 2N CU 1120 is divided horizontally and vertically into four PUs, each of size N x N samples. Note that the splitting of a CU into multiple PUs is not a recursive process in the current HEVC system. That is to say, the splitting is carried out only once, so that a PU is not further divided. This means that the divisions shown in Figures 18a-18c form an exhaustive list of the possible divisions of a CU into multiple PUs.
  • the choice of how to divide the CU (and, indeed, whether to divide it at all) is taken by the controller 345.
  • various trial compressions can be performed in whole or in part and the most efficient compression (in terms of a cost function, for example relating to output bit rate and/or error rate or signal to noise ratio) may be selected using known techniques.
  • the choice between inter-image, intra-image and intra-block-copy processing is made by the controller 345 by a similar technique.
  • an individual motion vector is generated for each of the divided PUs.
  • Motion vectors are generated in respect of the luminance component of each PU and, subject to the exceptions to be discussed below, the same motion vectors are used in respect of the corresponding chrominance components of that PU.
  • the encoder can also take account of the non-primary components such as the chrominance components in preparing the motion vectors.
  • Figure 19 is a schematic flowchart illustrating a previously proposed process for using multiple PUs derived from a single CU in an intra-block-copy process.
  • At a step 1150, a CU is split into multiple PUs.
  • the decision to make the division into multiple PUs, and indeed the decision to use intra-block-copy processing rather than inter-image or intra-image processing, is taken by the controller 345.
  • At a step 1160, a motion vector is derived from the luminance component of each PU.
  • a motion vector is obtained using an established correlation technique which compares the PU to be encoded with correspondingly-sized test regions within a search area of the already-encoded (and decoded) image, though other techniques are envisaged.
  • the test region providing the greatest correlation, or in other words the greatest similarity, with the PU to be encoded is selected as the source block for that PU.
  • the displacement between the PU and the selected source block forms the corresponding motion vector.
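  • One established correlation technique of the kind mentioned is an exhaustive block-matching search minimising the sum of absolute differences; this sketch assumes the candidate displacements have been pre-filtered to legal positions and is not the patent's specific search.

```python
import numpy as np

def find_motion_vector(pu: np.ndarray, recon: np.ndarray,
                       px: int, py: int, candidates) -> tuple:
    """Compare the PU against each candidate source position in the decoded
    image and return the displacement with the smallest sum of absolute
    differences, i.e. the greatest similarity."""
    h, w = pu.shape
    best_mv, best_sad = None, None
    for dx, dy in candidates:   # candidates assumed already within legal bounds
        src = recon[py + dy : py + dy + h, px + dx : px + dx + w]
        sad = int(np.abs(pu.astype(np.int64) - src.astype(np.int64)).sum())
        if best_sad is None or sad < best_sad:
            best_mv, best_sad = (dx, dy), sad
    return best_mv
```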
  • At a step 1170, the motion vector is tested to detect whether the displacement represented by that motion vector is allowable according to the four criteria listed above. If not, control returns to the step 1160 and a different motion vector is selected. Note that the steps 1160 and 1170 are shown separately for clarity of the explanation, but they can be combined such that non-allowable motion vectors are inhibited from being selected in the first place.
  • this process provides an example of a method of operation of an image data encoding apparatus, the method comprising: in respect of a coding unit comprising an array of data samples of a current image, dividing the coding unit into two or more prediction units; and for each prediction unit, detecting a motion vector pointing to an array of luminance samples for use in generating a predicted version of the luminance samples of that prediction unit, by comparing samples of the prediction unit with source samples within a search area of a version of the current image, so that the source samples applicable to each prediction unit are displaced, with respect to that prediction unit, by a displacement within an allowable range of displacements.
  • The same motion vector is used for the corresponding chrominance PU at a step 1180.
  • This relationship is implicit, so that an active step corresponding to the step 1180 may not be literally implemented at the encoder.
  • A single set of motion vectors, one for each PU, is generated from the luminance PUs and is provided as part of the compressed datastream.
  • the re-use of the motion vectors in respect of corresponding chrominance PUs is assumed.
  • the decoder reuses the motion vectors as described.
  • An issue which may arise when a CU is split into multiple PUs relates to the size of the corresponding chrominance PUs.
  • a minimum size of a block or array of samples for processing is 4 samples. That is to say, any array of samples for processing within the system should have a horizontal width of at least 4 samples and a vertical height of at least 4 samples.
  • If the chrominance format is not 4:4:4, which is to say that a subsampled chrominance format (such as, for example, 4:2:2 or 4:2:0) is in use, there are fewer chrominance samples associated with a CU than the number of luminance samples.
  • In the 4:2:2 format, the chrominance samples are subsampled horizontally with respect to the luminance samples, so that any further horizontal division such as that represented by Figure 18a or Figure 18c will result in a horizontal dimension of the chrominance PUs equal to 2 chrominance samples.
  • In the 4:2:0 format, the chrominance components have half the horizontal and half the vertical resolution of the luminance component, so that any of the divisions shown in Figures 18a-18c will result in a chrominance PU having at least one dimension equal to 2 samples.
  • However, dimensions of less than 4 samples are not allowable within the HEVC system, and so an established technique for dealing with this problem is to merge together chrominance PUs derived from a single CU so as to achieve the minimum dimension of 4 samples.
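  • The chroma PU dimensions and the merge decision follow directly from the subsampling factors, as in this sketch; the format table and the 4-sample minimum mirror the description above, and the function names are illustrative.

```python
def chroma_pu_size(luma_w: int, luma_h: int, fmt: str) -> tuple:
    """Chrominance PU dimensions for a luma PU under common subsampling."""
    sub_x, sub_y = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}[fmt]
    return luma_w // sub_x, luma_h // sub_y

def needs_merge(luma_w: int, luma_h: int, fmt: str, min_dim: int = 4) -> bool:
    """True if the resulting chroma PU falls below the minimum dimension."""
    cw, ch = chroma_pu_size(luma_w, luma_h, fmt)
    return cw < min_dim or ch < min_dim

# An 8x8 CU split into two 4x8 luma PUs gives 2x4 chroma PUs in 4:2:0,
# so the chroma PUs must be merged (compare Figure 21).
assert chroma_pu_size(4, 8, "4:2:0") == (2, 4) and needs_merge(4, 8, "4:2:0")
```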
  • Figure 20 is a schematic flowchart illustrating a process for merging small PUs together.
  • At a step 1200, it is detected (for example, by the controller 345) that the currently selected division of a CU into multiple PUs will result in chrominance PUs being generated which are smaller than 4 samples in either or both of the horizontal and vertical dimensions.
  • In response, the chrominance PUs are merged, or in other words multiple ones of the PUs are treated as a single PU.
  • the merging process is applied to the minimum extent necessary to achieve the required minimum dimension of 4 vertical samples and 4 horizontal samples. So, after the merging process has been carried out, in some permutations there may still be more than one chrominance PU corresponding to the original CU.
  • In respect of a chrominance PU derived by merging a group of two or more divided chrominance PUs, the motion vector from the corresponding luminance PU at a lower-right position within the group of two or more PUs is used in respect of the merged PU.
  • This process is indicated schematically by a step 1220 in Figure 20, but as before it is noted that this is fundamentally a process carried out at the decoder rather than an active step performed at the encoder.
  • Figures 21-23 schematically illustrate sets of PUs derived from a square 8 x 8 CU.
  • In each case, the block shown at the left corresponds to the respective CU 1100, 1110 and 1120 of Figures 18a-18c.
  • a block shown at the middle of each diagram represents the format of the chrominance PU for each chrominance component in an example 4:2:0 format
  • the block(s) shown at the right of each diagram represent the format of the chrominance PU(s) for each chrominance component in an example 4:2:2 format.
  • In Figure 21, each of the split chrominance PUs in the 4:2:0 format would have a width of 2 samples, and so the two chrominance PUs for each component are combined or merged into a single chrominance PU 1102.
  • In the 4:2:2 format, each of the two chrominance PUs would also have a width of 2 samples, and so once again the two chrominance PUs for each component are merged into a single chrominance PU 1104.
  • Following the convention discussed above, the chrominance PU 1102 and the chrominance PU 1104 are decoded using the motion vector associated with the right-hand luminance PU, labelled as PU1 in Figure 21.
  • In Figure 22, each of the split chrominance PUs in the 4:2:0 format would have a vertical height of 2 samples, and so the two chrominance PUs for each component are combined or merged into a single chrominance PU 1112.
  • In the 4:2:2 format, each of the two chrominance PUs 1114, 1116 would have a width and a height of 4 samples, and so no merging is required in 4:2:2.
  • The chrominance PU 1112 is decoded using the motion vector associated with the lower-right-hand luminance PU, labelled as PU1 in Figure 22.
  • In Figure 23, each of the split chrominance PUs in the 4:2:0 format would have a width and a height of 2 samples, and so the four chrominance PUs for each component are combined or merged into a single chrominance PU 1122.
  • In the 4:2:2 format, each of the four chrominance PUs would have a width of 2 samples, and so pairs of chrominance PUs for each component are merged into single chrominance PUs 1124, 1126, noting that no merging is required in the vertical direction because the 4:2:2 format has full vertical resolution, which means that the divided PUs have a height of four samples.
  • The chrominance PU 1122 and the chrominance PU 1126 are decoded using the motion vector associated with the lower-right-hand luminance PU, labelled as PU3 in Figure 23.
  • For the chrominance PU 1124, the motion vector is used from the lower-right-hand one of the luminance PUs corresponding to the group of PUs which were merged to form the chrominance PU 1124. So, the motion vector for the luminance PU labelled as PU1 in Figure 23 is used to decode the merged chrominance PU 1124.
  • the potential problem relates to the constraints placed on the location of the source block 1030, as discussed above with reference to Figure 17.
  • these constraints are tested against each motion vector as the motion vectors are first generated.
  • a motion vector is generated in respect of each luminance PU within the CU such that the motion vector complies with all of the criteria relating to allowable displacements for that luminance PU.
  • the source block 1030 corresponding to that luminance PU is in an image region considered allowable according to the set of criteria.
  • However, when chrominance PUs are merged as discussed above, a motion vector derived in respect of a particular spatial position within the CU may be used in the decoding of the merged PU which represents a different (in fact, a larger) spatial region within the CU. In some situations, this may mean that the source block used in the decoding of the merged chrominance PU does not comply with the criteria.
  • For example, a part of the source block used for the merged chrominance PU may be displaced further from the LCU containing the merged chrominance PU than the limit of (in this example) 64 luminance samples, or a part of the source block used for the merged chrominance PU may be outside the current row of LCUs, or a part of the source block used for the merged chrominance PU may contravene the requirement in a constrained intra mode that the source block is entirely intra-image or intra-block-copy encoded. Indeed, more than one of these criteria may be contravened by the source block corresponding to the merged chrominance PU.
  • The result is that a motion vector to be used by the decoder to decode a merged chrominance PU may in fact be illegal or invalid.
  • One possibility is for the decoder to detect this and to clip or otherwise change the motion vector in order to render it valid and legal.
  • testing and altering motion vectors at the decoder would place an undesirable processing burden on the decoder. Accordingly, this is not considered to be a useful solution.
  • the problem may be addressed by the controller 345 disallowing intra-block-copy for modes other than a 4:4:4 mode.
  • the problem may be addressed by the controller 345 disallowing splitting of 8x8 intra-block-copy CUs for modes other than a 4:4:4 mode.
  • Alternatively, the problem may be addressed by the controller 345 disallowing splitting of 8x8 intra-block-copy CUs where such a split would cause any of the PUs to need to be merged (and/or to fall below a minimum size in either dimension).
  • Figure 24 is a schematic flowchart illustrating a technique for using multiple PUs derived from a single CU according to an embodiment of the present disclosure.
  • A step 1300 corresponds to the step 1150 of Figure 19 and will not be described further here.
  • the detecting step comprises detecting a motion vector in respect of a predetermined subset of the prediction units of a coding unit so that the source samples pointed to by each motion vector in that subset of the motion vectors are displaced, with respect to one or more other prediction units of that coding unit, by a displacement within the allowable range of displacements.
  • A step 1330 corresponds to the step 1180 discussed above.
  • a further constraint is applied such that the displacements (corresponding to the positions of the corresponding source blocks) represented by those motion vectors are also tested for validity against the spatial positions of other PUs derived from that CU. Only motion vectors which comply with this additional validity test are allowed to be generated.
  • The predetermined subset of the luminance PU motion vectors comprises those motion vectors which will (or may) be used in respect of merged PUs. This will always include the motion vector for the lower-right luminance PU, but may also include the motion vector for one or more other luminance PUs whose motion vectors will be used for merged chrominance PUs (such as the PU labelled PU1 in Figure 23), depending on the chrominance format in use.
  • the additional validity test may be applied to all of the motion vectors of a split CU. However, this may impose an unnecessary restriction on some of the motion vectors, and so in embodiments of the present disclosure the additional validity test is applied only to the predetermined subset of motion vectors.
  • The additional validity test applied to the predetermined subset of motion vectors is such that the motion vectors are not only valid for the spatial position of the luminance PU from which they were derived (a situation corresponding to the previously proposed step 1170 of Figure 19) but are also valid in respect of the spatial position of any further luminance PUs corresponding to the chrominance PUs of a merged PU for which that motion vector may or will be used at decoding.
  • each motion vector in the predetermined subset of motion vectors is required to be valid in respect of all of the PUs of that CU.
  • This additional validity test encompasses the criteria set out above. However, in other embodiments the additional validity test may be applied only in respect of the spatial position of any further luminance PUs corresponding to the chrominance PUs of a merged PU for which that motion vector may or will be used at decoding.
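  • The additional validity test can be sketched as requiring one motion vector to be allowable at several PU positions at once; is_allowable stands in for a criteria check such as the earlier mv_allowable sketch, and the function names are illustrative.

```python
def subset_mv_valid(mv: tuple, pu_positions: list,
                    pu_w: int, pu_h: int, is_allowable) -> bool:
    """Test a motion vector from the predetermined subset not only at its own
    PU but at every PU position whose merged chroma PU will reuse it."""
    return all(is_allowable(x, y, pu_w, pu_h, mv) for x, y in pu_positions)

# e.g. for Figure 21: PU1's vector must be valid at PU1's and PU0's positions.
```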
  • In the examples of Figures 21 and 22, the motion vector derived in respect of the luminance PU labelled as PU1 is tested for validity against not only its own spatial position but also the spatial position of the luminance PU labelled as PU0.
  • In the example of Figure 23, the situation is slightly more complicated according to the chrominance format in use.
  • In the 4:2:0 format, the motion vector derived in respect of the luminance PU labelled as PU3 is tested for validity against not only its own spatial position but also the spatial position of all of the other luminance PUs.
  • In the 4:2:2 format, the motion vector derived in respect of the luminance PU labelled as PU1 is tested for validity against not only its own spatial position but also the spatial position of the luminance PU labelled as PU0, and the motion vector derived in respect of the luminance PU labelled as PU3 is tested for validity against not only its own spatial position but also the spatial position of the luminance PU labelled as PU2.
  • Figure 25 schematically illustrates a modified search area in respect of a current PU 1400 which is subject to the additional validity test 1320.
  • the upper boundary 1410 represents the top of the current row of LCUs.
  • the left boundary 1420 represents a position 64 luminance samples to the left of the PU 1400.
  • the search area is modified.
  • The search area's upper boundary 1410 is lowered by an amount Δ which is equal to at least the vertical height of the PU above the position of the PU 1400 (in the examples discussed here, this will be equal to 4 luminance samples).
  • The search area's left-hand boundary 1420 is moved to the right by an amount Δ which is equal to at least the horizontal width of the PU to the left of the position of the PU 1400 (again, in the examples discussed here, this will be equal to 4 luminance samples). This provides an example of reducing the extent of the search area by a number of samples corresponding to the horizontal or vertical size of a luminance prediction unit.
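  • Equivalently, the search bounds can simply be tightened before searching, as in this sketch; the Δ of one PU height/width (4 luminance samples in these examples) and the function name are assumptions for illustration.

```python
def modified_search_bounds(top: int, left: int,
                           pu_w: int = 4, pu_h: int = 4) -> tuple:
    """Shrink the search area of Figure 25: lower the upper boundary 1410 by
    the PU height and move the left boundary 1420 right by the PU width, so
    any vector found is also valid for the neighbouring PU positions."""
    return top + pu_h, left + pu_w
```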
  • Figure 26 is a schematic flowchart illustrating a technique for using multiple PUs derived from a single CU. This flowchart schematically represents an example implementation of the step 1310 discussed above.
  • First, a search is carried out for luminance motion vectors in respect of each PU.
  • Then, a test is carried out to detect whether the luminance motion vector for one or more predetermined PUs (for example, the predetermined subset of PUs discussed above) is valid for one or more other PUs (for example, all of the PUs in the CU, or just those other PUs which correspond to a merged chrominance PU which will use that motion vector). If the test is failed, then the motion vector is rejected at a step 1520 and the process repeated. Otherwise, the luminance motion vectors are used in the encoding process at a step 1530. In some embodiments, if no valid motion vector can be found, the attempt to encode using intra-block-copy is abandoned and intra-image encoding is used instead.
  • the step 1310 may also comprise detecting whether the chrominance prediction units corresponding to the luminance prediction units have an array size less than a threshold size and, if so, detecting a motion vector in respect of a predetermined subset of the prediction units of a coding unit so that the source samples pointed to by each motion vector in that subset of the motion vectors are displaced, with respect to one or more of the other prediction units of that coding unit, by a displacement within the allowable range of displacements.
  • Figure 27 schematically illustrates an encoding method in which a data stream is encoded at a step 1600 according to the procedures discussed above, and at a step 1610 the data stream is output (transmitted and/or stored).
  • Figure 28 schematically illustrates a decoding method in which an encoded data stream is received (replayed and/or received as a transmission) at a step 1620 and, at a step 1630 the data stream is decoded.
  • Embodiments of the present disclosure also relate to an encoded data stream which has been encoded using the techniques discussed above.
  • Figures 29 and 30 schematically illustrate examples of machine-readable non-transitory storage media storing such a data stream.
  • Figure 29 provides a schematic example of a non-transitory memory such as a flash memory device 1640 storing such a data stream.
  • Figure 30 provides a schematic example of a magnetic and/or optical disc medium 1650 storing such a data stream.
  • Embodiments of the present disclosure also relate to an instance of distribution of an encoded data stream or portion thereof. This may be from a server to a client device.
  • the instance of distribution may be defined in time, for example by a user request.
  • a method of operation of an image data encoding apparatus comprising: in respect of a coding unit comprising an array of data samples of a current image, dividing the coding unit into two or more prediction units; and for each prediction unit, detecting a motion vector pointing to an array of samples for use in generating a predicted version of the samples of that prediction unit so that the source samples applicable to each prediction unit are displaced, with respect to that prediction unit, by a displacement within an allowable range of displacements.
  • the detecting step comprises detecting a motion vector in respect of a predetermined subset of the prediction units of a coding unit so that the source samples pointed to by each motion vector in that subset of the motion vectors are displaced, with respect to one or more other prediction units of that coding unit, by a displacement within the allowable range of displacements.
  • the detecting step comprises detecting whether the chrominance prediction units corresponding to the luminance prediction units have an array size less than a threshold size and, if so, detecting a motion vector in respect of a predetermined subset of the prediction units of a coding unit so that the source samples pointed to by each motion vector in that subset of the motion vectors are displaced, with respect to one or more of the other prediction units of that coding unit, by a displacement within the allowable range of displacements.
  • the detecting step comprises comparing samples of the prediction unit with source samples within a search area of a version of the current image
  • the detecting step comprises reducing the extent of the search area by a number of samples corresponding to the horizontal or vertical size of a prediction unit.
  • the detecting step comprises detecting, for one or more prediction units other than the predetermined prediction unit, whether applying the motion vector associated with the predetermined prediction unit represents a displacement outside of the allowable range of displacements.
  • the predetermined subset of the prediction units comprises those prediction units for which the corresponding motion vectors will be used in the decoding of a merged prediction unit.
  • the one or more other prediction units comprise those other prediction units for which the corresponding chrominance prediction units will form part of the merged prediction unit.
  • a method of operation of an image data encoding apparatus comprising: generating a predicted version of an image block by an intra-block-copy operation in which the predicted version is based upon another image region derived from the same image; and disallowing intra-block-copy operations for chrominance modes other than a 4:4:4 mode.
  • a method of operation of an image data encoding apparatus comprising: generating a predicted version of an image coding unit by an intra-block-copy operation in which the predicted version is based upon another image region derived from the same image; splitting the coding unit into two or more prediction units; and disallowing splitting of 8x8 sample intra-block-copy coding units for chrominance modes other than a 4:4:4 mode.
  • a method of operation of an image data encoding apparatus comprising: generating a predicted version of an image coding unit by an intra-block-copy operation in which the predicted version is based upon another image region derived from the same image; splitting the coding unit into two or more prediction units; and disallowing splitting of 8x8 intra-block-copy coding units where such a split would cause any of the prediction units to require to be merged and/or to fall below a minimum size in either dimension.
  • Image data comprising a plurality of coding units, at least some of which are divided into multiple prediction units having associated motion vectors for intra-block-copy prediction such that for at least one image component the prediction units are merged so as to share a motion vector from one of the prediction units of that coding unit, the image data being constrained so that the shared motion vector represents a displacement within an allowable range of displacements with respect to all of the merged prediction units sharing that motion vector.
  • a method of operation of an image data decoding apparatus comprising decoding image data according to clause 15 or clause 16.
  • Computer software which, when executed by a computer, causes the computer to carry out the method of any one of clauses 1 to 14 or 17.
  • a non-transitory machine readable storage medium which stores computer software according to clause 18.
  • a non-transitory machine readable storage medium which stores image data according to clause 15 or clause 16.
  • a data encoding apparatus comprising:
  • a block generator operable in respect of a coding unit comprising an array of data samples of a current image, and configured to divide the coding unit into two or more prediction units;
  • a detector configured, for each prediction unit, to detect a motion vector pointing to an array of samples for use in generating a predicted version of the samples of that prediction unit so that the source samples applicable to each prediction unit are displaced, with respect to that prediction unit, by a displacement within an allowable range of displacements;
  • the detector is configured to detect a motion vector in respect of a predetermined subset of the prediction units of a coding unit so that the source samples pointed to by each motion vector in that subset of the motion vectors are displaced, with respect to one or more of the other prediction units of that coding unit, by a displacement within the allowable range of displacements.
  • a data decoding apparatus configured to decode image data according to clause 15 or clause 16.
  • An image data encoding apparatus comprising:
  • an encoder configured to generate a predicted version of an image block by an intra- block-copy operation in which the predicted version is based upon another image region derived from the same image; and to disallow intra-block-copy operations for chrominance modes other than a 4:4:4 mode.
  • An image data encoding apparatus comprising:
  • an encoder configured to generate a predicted version of an image coding unit by an intra-block-copy operation in which the predicted version is based upon another image region derived from the same image; to split the coding unit into two or more prediction units; and to disallow splitting of 8x8 sample intra-block-copy coding units for chrominance modes other than a 4:4:4 mode.
  • An image data encoding apparatus comprising:
  • an encoder configured to generate a predicted version of an image coding unit by an intra-block-copy operation in which the predicted version is based upon another image region derived from the same image; to split the coding unit into two or more prediction units; and to disallow splitting of 8x8 intra-block-copy coding units where such a split would cause any of the prediction units to require to be merged and/or to fall below a minimum size in either dimension.
  • Video data capture, transmission, display and/or storage apparatus comprising apparatus according to any one of clauses 22 to 26. Apparatus features of the above encoder or decoder may be carried out using the apparatus described above, with various functionality being provided by the controller 345.

Abstract

A method of operation of an image data encoding apparatus comprises in respect of a coding unit comprising an array of data samples of a current image, dividing the coding unit into two or more prediction units; and for each prediction unit, detecting a motion vector pointing to an array of samples for use in generating a predicted version of the samples of that prediction unit so that the source samples applicable to each prediction unit are displaced, with respect to that prediction unit, by a displacement within an allowable range of displacements; in which the detecting step comprises detecting a motion vector in respect of a predetermined subset of the prediction units of a coding unit so that the source samples pointed to by each motion vector in that subset of the motion vectors are displaced, with respect to one or more other prediction units of that coding unit, by a displacement within the allowable range of displacements.

Description

IMAGE DATA ENCODING AND DECODING
Field of the Invention
This disclosure relates to data encoding and decoding.
Description of the Related Art
The "background" description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description which may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
There are several video data compression and decompression systems which involve generating a prediction of a group of video data, deriving the residual or difference between the video data and the prediction, transforming the residual data into a frequency domain representation, quantising the frequency domain coefficients and then applying some form of entropy encoding to the quantised coefficients.
One such technique is defined by the HEVC (High Efficiency Video Coding) standards. There are various ways to generate the prediction data, including inter-image prediction, in which the prediction data are generated with respect to positions, in one or more other decoded images, defined with respect to the current image by a motion vector and one or more reference indices; intra-image encoding in which the prediction data are derived from surrounding encoded and decoded samples of the current image; and intra-block-copy. In some respects, intra-block-copy may be considered as a hybrid of inter- and intra-image encoding, in that a predicted block is derived from another block in an encoded and decoded version of the current image, defined with respect to the current block by a motion vector.
Summary
This disclosure provides a data encoding method according to claim 1.
Further respective aspects and features are defined in the appended claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary, but not restrictive of, the present disclosure.
Brief Description of the Drawings
A more complete appreciation of the disclosure and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description of embodiments, when considered in connection with the accompanying drawings, wherein:
Figure 1 schematically illustrates an audio/video (A/V) data transmission and reception system using video data compression and decompression;
Figure 2 schematically illustrates a video display system using video data decompression;
Figure 3 schematically illustrates an audio/video storage system using video data compression and decompression;
Figure 4 schematically illustrates a video camera using video data compression;
Figure 5 provides a schematic overview of a video data compression and decompression apparatus;
Figure 6 schematically illustrates the generation of predicted images;
Figure 7 schematically illustrates a largest coding unit (LCU);
Figure 8 schematically illustrates a set of four coding units (CU);
Figures 9 and 10 schematically illustrate the coding units of Figure 8 sub-divided into smaller coding units;
Figure 11 schematically illustrates an array of prediction units (PU);
Figure 12 schematically illustrates an array of transform units (TU);
Figure 13 schematically illustrates a partially-encoded image;
Figure 14 schematically illustrates a set of possible prediction directions;
Figure 15 schematically illustrates a set of prediction modes;
Figure 16 schematically illustrates a zig-zag scan;
Figure 17 schematically illustrates an intra-block-copy process;
Figures 18a-18c schematically illustrate the division of a CU into multiple PUs;
Figure 19 is a schematic flowchart illustrating a previously proposed process for using multiple PUs derived from a single CU;
Figure 20 is a schematic flowchart illustrating a process for merging small PUs together;
Figures 21-23 schematically illustrate sets of PUs derived from a square CU;
Figure 24 is a schematic flowchart illustrating a technique for using multiple PUs derived from a single CU;
Figure 25 schematically illustrates a modified search area;
Figure 26 is a schematic flowchart illustrating a technique for using multiple PUs derived from a single CU;
Figure 27 schematically illustrates an encoding method;
Figure 28 schematically illustrates a decoding method; and
Figures 29 and 30 schematically illustrate examples of machine-readable non-transitory storage media.
Description of the Embodiments
Referring now to the drawings, Figures 1-4 are provided to give schematic illustrations of apparatus or systems making use of the compression and/or decompression apparatus to be described below in connection with embodiments. All of the data compression and/or decompression apparatus to be described below may be implemented in hardware, in software running on a general-purpose data processing apparatus such as a general-purpose computer, as programmable hardware such as an application specific integrated circuit (ASIC) or field programmable gate array (FPGA) or as combinations of these. In cases where the embodiments are implemented by software and/or firmware, it will be appreciated that such software and/or firmware, and non-transitory machine-readable data storage media by which such software and/or firmware are stored or otherwise provided, are considered as embodiments.
Figure 1 schematically illustrates an audio/video data transmission and reception system using video data compression and decompression.
An input audio/video signal 10 is supplied to a video data compression apparatus 20 which compresses at least the video component of the audio/video signal 10 for transmission along a transmission route 30 such as a cable, an optical fibre, a wireless link or the like. The compressed signal is processed by a decompression apparatus 40 to provide an output audio/video signal 50. For the return path, a compression apparatus 60 compresses an audio/video signal for transmission along the transmission route 30 to a decompression apparatus 70.
The compression apparatus 20 and decompression apparatus 70 can therefore form one node of a transmission link. The decompression apparatus 40 and compression apparatus 60 can form another node of the transmission link. Of course, in instances where the transmission link is uni-directional, only one of the nodes would require a compression apparatus and the other node would only require a decompression apparatus.
Figure 2 schematically illustrates a video display system using video data decompression. In particular, a compressed audio/video signal 100 is processed by a decompression apparatus 110 to provide a decompressed signal which can be displayed on a display 120. The decompression apparatus 110 could be implemented as an integral part of the display 120, for example being provided within the same casing as the display device. Alternatively, the decompression apparatus 110 might be provided as (for example) a so-called set top box (STB), noting that the expression "set-top" does not imply a requirement for the box to be sited in any particular orientation or position with respect to the display 120; it is simply a term used in the art to indicate a device which is connectable to a display as a peripheral device.
Figure 3 schematically illustrates an audio/video storage system using video data compression and decompression. An input audio/video signal 130 is supplied to a compression apparatus 140 which generates a compressed signal for storing by a store device 150 such as a magnetic disk device, an optical disk device, a magnetic tape device, a solid state storage device such as a semiconductor memory or other storage device. For replay, compressed data is read from the store device 150 and passed to a decompression apparatus 160 for decompression to provide an output audio/video signal 170.
It will be appreciated that the compressed or encoded signal, and a storage medium or data carrier storing that signal, are considered as embodiments.
Figure 4 schematically illustrates a video camera using video data compression. In Figure 4, an image capture device 180, such as a charge coupled device (CCD) image sensor and associated control and read-out electronics, generates a video signal which is passed to a compression apparatus 190. A microphone (or plural microphones) 200 generates an audio signal to be passed to the compression apparatus 190. The compression apparatus 190 generates a compressed audio/video signal 210 to be stored and/or transmitted (shown generically as a schematic stage 220).
The techniques to be described below relate primarily to video data compression. It will be appreciated that many existing techniques may be used for audio data compression in conjunction with the video data compression techniques which will be described, to generate a compressed audio/video signal. Accordingly, a separate discussion of audio data compression will not be provided. It will also be appreciated that the data rate associated with video data, in particular broadcast quality video data, is generally very much higher than the data rate associated with audio data (whether compressed or uncompressed). It will therefore be appreciated that uncompressed audio data could accompany compressed video data to form a compressed audio/video signal. It will further be appreciated that although the present examples (shown in Figures 1-4) relate to audio/video data, the techniques to be described below can find use in a system which simply deals with (that is to say, compresses, decompresses, stores, displays and/or transmits) video data. That is to say, the embodiments can apply to video data compression without necessarily having any associated audio data handling at all.
Figure 5 provides a schematic overview of a video data compression and decompression apparatus.
Successive images of an input video signal 300 are supplied to an adder 310 and to an image predictor 320. The image predictor 320 will be described below in more detail with reference to Figure 6. The adder 310 in fact performs a subtraction (negative addition) operation, in that it receives the input video signal 300 on a "+" input and the output of the image predictor 320 on a "-" input, so that the predicted image is subtracted from the input image. The result is to generate a so-called residual image signal 330 representing the difference between the actual and predicted images.
One reason why a residual image signal is generated is as follows. The data coding techniques to be described, that is to say the techniques which will be applied to the residual image signal, tend to work more efficiently when there is less "energy" in the image to be encoded. Here, the term "efficiently" refers to the generation of a small amount of encoded data; for a particular image quality level, it is desirable (and considered "efficient") to generate as little data as is practicably possible. The reference to "energy" in the residual image relates to the amount of information contained in the residual image. If the predicted image were to be identical to the real image, the difference between the two (that is to say, the residual image) would contain zero information (zero energy) and would be very easy to encode into a small amount of encoded data. In general, if the prediction process can be made to work reasonably well, the expectation is that the residual image data will contain less information (less energy) than the input image and so will be easier to encode into a small amount of encoded data.
The residual image data 330 is supplied to a transform unit 340 which generates a discrete cosine transform (DCT) representation of the residual image data. The DCT technique itself is well known and will not be described in detail here. There are however aspects of the techniques used in the present apparatus which will be described in more detail below, in particular relating to the selection of different blocks of data to which the DCT operation is applied. These will be discussed with reference to Figures 7-12 below.
Note that in some embodiments, a discrete sine transform (DST) is used instead of a DCT. In other embodiments, no transform might be used. This can be done selectively, so that the transform stage is, in effect, bypassed, for example under the control of a "transform skip" command or mode.
The output of the transform unit 340, which is to say, a set of transform coefficients for each transformed block of image data, is supplied to a quantiser 350. Various quantisation techniques are known in the field of video data compression, ranging from a simple multiplication by a quantisation scaling factor through to the application of complicated lookup tables under the control of a quantisation parameter. The general aim is twofold. Firstly, the quantisation process reduces the number of possible values of the transformed data. Secondly, the quantisation process can increase the likelihood that values of the transformed data are zero. Both of these can make the entropy encoding process, to be described below, work more efficiently in generating small amounts of compressed video data.
A data scanning process is applied by a scan unit 360. The purpose of the scanning process is to reorder the quantised transformed data so as to gather as many as possible of the non-zero quantised transformed coefficients together, and of course therefore to gather as many as possible of the zero-valued coefficients together. These features can allow so-called run-length coding or similar techniques to be applied efficiently. So, the scanning process involves selecting coefficients from the quantised transformed data, and in particular from a block of coefficients corresponding to a block of image data which has been transformed and quantised, according to a "scanning order" so that (a) all of the coefficients are selected once as part of the scan, and (b) the scan tends to provide the desired reordering. Techniques for selecting a scanning order will be described below. One example scanning order which can tend to give useful results is a so-called zigzag scanning order.
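By way of illustration only, one common zig-zag variant can be generated as below. This is a sketch of the idea rather than the ordering defined by any particular standard, which specifies its own scan tables; the function name is hypothetical.

def zigzag_order(n):
    # Visit the coefficients of an n x n block diagonal by diagonal,
    # reversing direction on alternate diagonals, so that the
    # low-frequency (top-left) coefficients are gathered first.
    positions = [(r, c) for r in range(n) for c in range(n)]
    return sorted(positions,
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[1] if (rc[0] + rc[1]) % 2 else rc[0]))

print(zigzag_order(2))  # [(0, 0), (1, 0), (0, 1), (1, 1)]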
The scanned coefficients are then passed to an entropy encoder (EE) 370. Again, various types of entropy encoding may be used. Two examples which will be described below are variants of the so-called CABAC (Context Adaptive Binary Arithmetic Coding) system and variants of the so-called CAVLC (Context Adaptive Variable-Length Coding) system. In general terms, CABAC is considered to provide a better efficiency, and in some studies has been shown to provide a 10-20% reduction in the quantity of encoded output data for a comparable image quality compared to CAVLC. However, CAVLC is considered to represent a much lower level of complexity (in terms of its implementation) than CABAC. The CABAC technique will be discussed with reference to Figure 17 below.
Note that the scanning process and the entropy encoding process are shown as separate processes, but in fact can be combined or treated together. That is to say, the reading of data into the entropy encoder can take place in the scan order. Corresponding considerations apply to the respective inverse processes to be described below.
The output of the entropy encoder 370, along with additional data (mentioned above and/or discussed below), for example defining the manner in which the predictor 320 generated the predicted image, provides a compressed output video signal 380.
However, a return path is also provided because the operation of the predictor 320 itself depends upon a decompressed version of the compressed output data.
The reason for this feature is as follows. At the appropriate stage in the decompression process (to be described below) a decompressed version of the residual data is generated. This decompressed residual data has to be added to a predicted image to generate an output image (because the original residual data was the difference between the input image and a predicted image). In order that this process is comparable, as between the compression side and the decompression side, the predicted images generated by the predictor 320 should be the same during the compression process and during the decompression process. Of course, at decompression, the apparatus does not have access to the original input images, but only to the decompressed images. Therefore, at compression, the predictor 320 bases its prediction (at least, for inter-image encoding) on decompressed versions of the compressed images.
The entropy encoding process carried out by the entropy encoder 370 is considered to be "lossless", which is to say that it can be reversed to arrive at exactly the same data which was first supplied to the entropy encoder 370. So, the return path can be implemented before the entropy encoding stage. Indeed, the scanning process carried out by the scan unit 360 is also considered lossless, but in the present embodiment the return path 390 is from the output of the quantiser 350 to the input of a complementary inverse quantiser 420. In general terms, an entropy decoder 410, the reverse scan unit 400, an inverse quantiser 420 and an inverse transform unit 430 provide the respective inverse functions of the entropy encoder 370, the scan unit 360, the quantiser 350 and the transform unit 340. For now, the discussion will continue through the compression process; the process to decompress an input compressed video signal will be discussed separately below.
In the compression process, the quantised coefficients are passed by the return path 390 from the quantiser 350 to the inverse quantiser 420, which carries out the inverse operation of the quantiser 350. An inverse quantisation and inverse transformation process are carried out by the units 420, 430 to generate a compressed-decompressed residual image signal 440.
The image signal 440 is added, at an adder 450, to the output of the predictor 320 to generate a reconstructed output image 460. This forms one input to the image predictor 320, as will be described below.
Turning now to the process applied to a received compressed video signal 470, the signal is supplied to the entropy decoder 410 and from there to the chain of the reverse scan unit 400, the inverse quantiser 420 and the inverse transform unit 430 before being added to the output of the image predictor 320 by the adder 450. In straightforward terms, the output 460 of the adder 450 forms the output decompressed video signal 480. In practice, further filtering may be applied before the signal is output.
The operation of the apparatus of Figure 5 is under the control of a controller 345.
Figure 6 schematically illustrates the generation of predicted images, and in particular the operation of the image predictor 320.
There are two basic modes of prediction: so-called intra-image prediction and so-called inter-image, or motion-compensated (MC), prediction.
Intra-image prediction bases a prediction of the content of a block of the image on data from within the same image. This corresponds to so-called I-frame encoding in other video compression techniques. In contrast to I-frame encoding, where the whole image is intra-encoded, in the present embodiments the choice between intra- and inter- encoding can be made on a block-by-block basis, though in other embodiments the choice is still made on an image-by-image basis.
Motion-compensated prediction makes use of motion information which attempts to define the source, in another adjacent or nearby image, of image detail to be encoded in the current image. Accordingly, in an ideal example, the contents of a block of image data in the predicted image can be encoded very simply as a reference (a motion vector) pointing to a corresponding block at the same or a slightly different position in an adjacent image.
Returning to Figure 6, two image prediction arrangements (corresponding to intra- and inter-image prediction) are shown, the results of which are selected by a multiplexer 500 under the control of a mode signal 510 so as to provide blocks of the predicted image for supply to the adders 310 and 450. The choice is made in dependence upon which selection gives the lowest "energy" (which, as discussed above, may be considered as information content requiring encoding), and the choice is signalled to the decoder within the encoded output datastream. Image energy, in this context, can be detected, for example, by carrying out a trial subtraction of an area of the two versions of the predicted image from the input image, squaring each pixel value of the difference image, summing the squared values, and identifying which of the two versions gives rise to the lower mean squared value of the difference image relating to that image area.
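A minimal numerical sketch of that energy measure follows; the function name is hypothetical, and a production encoder would use optimised block operations rather than sample-by-sample loops.

def mean_squared_difference(predicted, source):
    # Trial subtraction over an image area: square each sample
    # difference and take the mean. The prediction arrangement
    # giving the lower value for the area would be selected.
    n = 0
    total = 0
    for pred_row, src_row in zip(predicted, source):
        for p, s in zip(pred_row, src_row):
            total += (p - s) ** 2
            n += 1
    return total / n

print(mean_squared_difference([[3, 4]], [[2, 6]]))  # 2.5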
The actual prediction, in the intra-encoding system, is made on the basis of image blocks received as part of the signal 460, which is to say, the prediction is based upon encoded- decoded image blocks in order that exactly the same prediction can be made at a decompression apparatus. However, data can be derived from the input video signal 300 by an intra-mode selector 520 to control the operation of the intra-image predictor 530.
For inter-image prediction, a motion compensated (MC) predictor 540 uses motion information such as motion vectors derived by a motion estimator 550 from the input video signal 300. Those motion vectors are applied to a processed version of the reconstructed image 460 by the motion compensated predictor 540 to generate blocks of the inter-image prediction.
The processing applied to the signal 460 will now be described. Firstly, the signal is filtered by a filter unit 560. This involves applying a "deblocking" filter to remove or at least tend to reduce the effects of the block-based processing carried out by the transform unit 340 and subsequent operations. Also, an adaptive loop filter is applied using coefficients derived by processing the reconstructed signal 460 and the input video signal 300. The adaptive loop filter is a type of filter which, using known techniques, applies adaptive filter coefficients to the data to be filtered. That is to say, the filter coefficients can vary in dependence upon various factors. Data defining which filter coefficients to use is included as part of the encoded output datastream.
The filtered output from the filter unit 560 in fact forms the output video signal 480. It is also buffered in one or more image stores 570; the storage of successive images is a requirement of motion compensated prediction processing, and in particular the generation of motion vectors. To save on storage requirements, the stored images in the image stores 570 may be held in a compressed form and then decompressed for use in generating motion vectors. For this particular purpose, any known compression / decompression system may be used. The stored images are passed to an interpolation filter 580 which generates a higher resolution version of the stored images; in this example, intermediate samples (sub-samples) are generated such that the resolution of the interpolated image output by the interpolation filter 580 is 8 times (in each dimension) that of the images stored in the image stores 570. The interpolated images are passed as an input to the motion estimator 550 and also to the motion compensated predictor 540.
In embodiments, a further optional stage is provided, which is to multiply the data values of the input video signal by a factor of four using a multiplier 600 (effectively just shifting the data values left by two bits), and to apply a corresponding divide operation (shift right by two bits) at the output of the apparatus using a divider or right-shifter 610. So, the shifting left and shifting right changes the data purely for the internal operation of the apparatus. This measure can provide for higher calculation accuracy within the apparatus, as the effect of any data rounding errors is reduced.
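For example, the scaling alone can be sketched as follows (the function names are hypothetical):

def pre_scale(sample):
    return sample << 2   # multiply by four (shift left by two bits)

def post_scale(value):
    return value >> 2    # corresponding divide (shift right by two bits)

assert post_scale(pre_scale(100)) == 100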
The way in which an image is partitioned for compression processing will now be described. At a basic level, an image to be compressed is considered as an array of blocks of samples. For the purposes of the present discussion, the largest such block under consideration is a so-called largest coding unit (LCU) 700 (Figure 7), which represents a square array of 64 x 64 samples. Here, the discussion relates to luminance samples. Depending on the chrominance mode, such as 4:4:4, 4:2:2, 4:2:0 or 4:4:4:4 (GBR plus key data), there will be differing numbers of corresponding chrominance samples corresponding to the luminance block.
Three basic types of blocks will be described: coding units, prediction units and transform units. In general terms, the recursive subdividing of the LCUs allows an input picture to be partitioned in such a way that both the block sizes and the block coding parameters (such as prediction or residual coding modes) can be set according to the specific characteristics of the image to be encoded.
The LCU may be subdivided into so-called coding units (CU). Coding units are always square and have a size between 8x8 samples and the full size of the LCU 700. The coding units can be arranged as a kind of tree structure, so that a first subdivision may take place as shown in Figure 8, giving coding units 710 of 32x32 samples; subsequent subdivisions may then take place on a selective basis so as to give some coding units 720 of 16x16 samples (Figure 9) and potentially some coding units 730 of 8x8 samples (Figure 10). Overall, this process can provide a content-adapting coding tree structure of CU blocks, each of which may be as large as the LCU or as small as 8x8 samples. Encoding of the output video data takes place on the basis of the coding unit structure.
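An illustrative sketch of this recursive subdivision is given below. The should_split predicate is a hypothetical stand-in for the encoder's content-driven split decision, which is not modelled here.

def cu_sizes(size=64, min_size=8, should_split=lambda size: False):
    # Yield the coding-unit sizes of a quadtree rooted at one LCU;
    # each split produces four equal square CUs.
    if size > min_size and should_split(size):
        for _ in range(4):
            yield from cu_sizes(size // 2, min_size, should_split)
    else:
        yield size

# Splitting only the 64x64 LCU gives four 32x32 CUs, as in Figure 8:
print(list(cu_sizes(should_split=lambda s: s == 64)))  # [32, 32, 32, 32]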
Figure 11 schematically illustrates an array of prediction units (PU). A prediction unit is a basic unit for carrying information relating to the image prediction processes, or in other words the additional data added to the entropy encoded residual image data to form the output video signal from the apparatus of Figure 5. In general, prediction units are not restricted to being square in shape. They can take other shapes, in particular rectangular shapes forming half of one of the square coding units, as long as the coding unit is greater than the minimum (8x8) size. The aim is to allow the boundary of adjacent prediction units to match (as closely as possible) the boundary of real objects in the picture, so that different prediction parameters can be applied to different real objects. Each coding unit may contain one or more prediction units.
Figure 12 schematically illustrates an array of transform units (TU). A transform unit is a basic unit of the transform and quantisation process. Transform units are always square and can take a size from 4x4 up to 32x32 samples. Each coding unit can contain one or more transform units. The acronym SDIP-P in Figure 12 signifies a so-called short distance intra-prediction partition. In this arrangement, only one-dimensional transforms are used, so a 4xN block is passed through N transforms with input data to the transforms being based upon the previously decoded neighbouring blocks and the previously decoded neighbouring lines within the current SDIP-P.
The intra-prediction process will now be discussed. In general terms, intra-prediction involves generating a prediction of a current block (a prediction unit) of samples from previously-encoded and decoded samples in the same image. Figure 13 schematically illustrates a partially encoded image 800. Here, the image is being encoded from top-left to bottom-right on an LCU basis. An example LCU encoded partway through the handling of the whole image is shown as a block 810. A shaded region 820 above and to the left of the block 810 has already been encoded. The intra-image prediction of the contents of the block 810 can make use of any of the shaded area 820 but cannot make use of the unshaded area below that.
The block 810 represents an LCU; as discussed above, for the purposes of intra-image prediction processing, this may be subdivided into a set of smaller prediction units. An example of a prediction unit 830 is shown within the LCU 810.
The intra-image prediction takes into account samples above and/or to the left of the current LCU 810. Source samples, from which the required samples are predicted, may be located at different positions or directions relative to a current prediction unit within the LCU 810. To decide which direction is appropriate for a current prediction unit, the results of a trial prediction based upon each candidate direction are compared in order to see which candidate direction gives an outcome which is closest to the corresponding block of the input image. The candidate direction giving the closest outcome is selected as the prediction direction for that prediction unit.
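That trial-and-compare selection might be sketched as follows; trial_predict and closeness are hypothetical callbacks, with closeness returning, for example, a sum of squared differences against the corresponding block of the input image, so that a lower score means a closer outcome.

def select_prediction_direction(candidates, trial_predict, closeness):
    # trial_predict(direction) builds a trial prediction of the
    # current prediction unit for one candidate direction; the
    # candidate with the closest (lowest-scoring) trial wins.
    return min(candidates, key=lambda d: closeness(trial_predict(d)))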
The picture may also be encoded on a "slice" basis. In one example, a slice is a horizontally adjacent group of LCUs. But in more general terms, the entire residual image could form a slice, or a slice could be a single LCU, or a slice could be a row of LCUs, and so on. Slices can give some resilience to errors as they are encoded as independent units. The encoder and decoder states are completely reset at a slice boundary. For example, intra-prediction is not carried out across slice boundaries; slice boundaries are treated as image boundaries for this purpose.
Figure 14 schematically illustrates a set of possible (candidate) prediction directions. The full set of 34 candidate directions is available to a prediction unit of 8x8, 16x16 or 32x32 samples. The special cases of prediction unit sizes of 4x4 and 64x64 samples have a reduced set of candidate directions available to them (17 candidate directions and 5 candidate directions respectively). The directions are determined by horizontal and vertical displacement relative to a current block position, but are encoded as prediction "modes", a set of which is shown in Figure 15. Note that the so-called DC mode represents a simple arithmetic mean of the surrounding upper and left-hand samples.
Figure 16 schematically illustrates a zig-zag scan as an example of an order of processing samples in a block, starting from a sample position 840.
Figure 17 schematically illustrates an intra-block-copy process.
In Figure 17, two largest coding units (LCUs) 1000, 1010 are shown, representing a portion of a current image to be encoded. The LCU 1000 is currently being encoded, whereas the LCU 1010 is a previously encoded and then decoded portion of the image to the left of the LCU 1000. Comparing Figure 17 with Figure 13 discussed above, the encoding process proceeds generally from the top left to the lower right of the image, so that the LCU 1010 is handled by the encoding process before the LCU 1000. Similarly, within an LCU, coding units (CUs) such as a CU 1020 are encoded in a certain order, for example from top left to lower right within the LCU. Accordingly, when an arbitrary CU 1020 is encoded, unless the CU 1020 is the first CU within that LCU to be encoded, there will be other portions of the LCU containing the CU 1020 which have already been encoded and decoded.
The intra-block-copy process bases a prediction of the CU 1020 on a correspondingly- sized source block (for example, a block 1030) which is selected from portions of the image which have already been encoded and decoded. The spatial relationship between the CU 1020 and the source block 1030 is indicated by a motion vector 1040.
The prediction is based upon an encoded and decoded version of the source block, so that the same information is available at the encoder and at the decoder (the decoder does not have access to the original image but only to encoded and decoded versions of the image).
In the example of the HEVC system, the prediction of the CU 1020 is equal to the contents of the encoded-decoded source block 1030. However, it is envisaged that in other embodiments various processing could be applied to the contents of the source block 1030 to form the prediction of the CU 1020.
Intra-block-copy is particularly useful when dealing with so-called "screen content", which is a term sometimes used to define image content not generated using a camera. So, examples of screen content may include animation, computer graphics, subtitling and the like, or a combination of these with content generated using a camera. Screen content may provide multiple small regions which are sufficiently similar to one another that the intra-block-copy process can provide a very accurate prediction of a current block.
There are restrictions in the current HEVC system as to the allowable location of the source block 1030:
1. The source block 1030 must lie in a region of the image which has already been encoded and decoded.
2. The source block 1030 must be in the same row of LCUs as the LCU containing the CU 1020 to be encoded.
3. The source block 1030 must be displaced to the left of the LCU containing the CU 1020 to be encoded by no more than a maximum horizontal displacement (though the source block 1030 may also be within the LCU containing the CU 1020). In the present embodiments, the maximum horizontal displacement is equal to 64 luminance samples.
4. Finally, the source block 1030 must not overlap the CU 1020.
A further criterion may be applied in a so-called "constrained intra" mode. This mode is provided within the standards to allow, for example, the image to be refreshed periodically even if an entire intra-encoded image (an "I frame" in MPEG terminology) is not being provided. In the constrained intra mode, at least portions of the image are constrained so that they may only be derived from other samples which have themselves been intra-image encoded or encoded using intra-block-copy processing. In this way, the constrained intra mode prevents (for example, over the course of several images) erroneous data from previously processed images being propagated into subsequent images by virtue of inter-image processing. In the constrained intra mode, therefore, the source block 1030 must itself have been intra-image encoded and/or encoded using intra-block-copy processing (it could be both because a source block 1030 could span two CUs).
These criteria together provide examples of an allowable range of displacements represented by the motion vectors, comprising one or more selected from the group consisting of: displacements to a position within a current row of coding units; displacements of no more than a predetermined number of samples to the left of the LCU containing the prediction unit; and in a constrained intra mode, displacements to source samples which were derived using intra-image or intra-block-copy processing.
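A sketch of such a validity test is given below. The coordinate convention (luminance samples, x rightwards, y downwards) and the source_ok callback are assumptions for illustration: criterion 1 (the already encoded-and-decoded region) and the constrained intra criterion depend on the encoder's progress through the image, so they are delegated to the callback rather than modelled here.

def ibc_source_allowed(cu_x, cu_y, w, h, mv_x, mv_y,
                       lcu_x, lcu_row_top, lcu_size=64,
                       max_left=64, constrained_intra=False,
                       source_ok=None):
    # The source block is the block displaced from the CU by the
    # motion vector.
    src_x, src_y = cu_x + mv_x, cu_y + mv_y
    # Criterion 2: same row of LCUs as the CU being encoded.
    if src_y < lcu_row_top or src_y + h > lcu_row_top + lcu_size:
        return False
    # Criterion 3: no more than max_left samples to the left of the
    # LCU containing the CU.
    if src_x < lcu_x - max_left:
        return False
    # Criterion 4: the source block must not overlap the CU.
    if (src_x < cu_x + w and src_x + w > cu_x and
            src_y < cu_y + h and src_y + h > cu_y):
        return False
    # Criterion 1 and, where applicable, the constrained intra mode
    # check are delegated to the hypothetical callback.
    if source_ok is not None:
        return source_ok(src_x, src_y, w, h, constrained_intra)
    return True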
In the discussion above, the CU 1020 has been treated as a single unit for the purposes of the intra-block-copy process. Strictly speaking, the CU 1020 has been handled as a single prediction unit (PU) and the motion vector 1040 was in fact associated with the single PU. Where the size and extent of the CU exactly matched that of the single PU, the question of whether to refer to the block 1020 as a CU or a PU was somewhat semantic. However, the difference becomes technically significant where a CU is divided into multiple PUs for intra-block-copy processing. In previously proposed arrangements, it was impermissible under the draft HEVC standards for a CU that was to be subject to intra-block-copy processing to be divided into multiple PUs. In contrast, the present embodiments relate to arrangements in which a CU, to be subject to intra-block-copy processing, can be divided into multiple PUs.
Figures 18a-18c schematically illustrate the division of a CU into multiple PUs.
According to the proposed HEVC standard, a CU comprises a square-shaped array of luminance samples and the associated chrominance samples. In a non-subsampled chrominance format such as 4:4:4, the array of corresponding or associated chrominance samples will also be a square array of equal size to the array of luminance samples because of the 1:1 relationship between samples of the luminance and chrominance components. However, in a subsampled chrominance format such as 4:2:2 or 4:2:0, there will be fewer chrominance samples than luminance samples for a particular CU. Figures 18a-18c schematically illustrate the situation for the luminance samples only; the situation for the chrominance samples will be discussed further below.
The references above to luminance and chrominance apply to a system in which there is indeed a luminance channel, such as a YUV system. In other systems, such as an RGB system, the equivalent primary channel would be one of the channels such as (in the example of RGB) the green channel. The discussions here relating to "luminance" can therefore be applied to the primary channel (such as the green channel in an RGB system) and the discussions here relating to "chrominance" can be applied to channels other than the primary channel (such as red and blue in an RGB system). However, the techniques are applicable to any situation in which motion vectors derived from a PU are potentially shared by other PUs, whether or not particular channels or components are defined.
Also, in the proposed HEVC standard, the square array of luminance samples has a size equal to 2N x 2N samples, where 2N = 8, 16, 32 or 64. The dimension is expressed as "2N" to indicate that the number is divisible by 2; this feature will become clear when discussing the nomenclature of the divided PUs below.
Referring to Figure 18a, a 2N x 2N CU 1100 is divided horizontally into two PUs, each of size N x 2N samples. In Figure 18b, a 2N x 2N CU 1110 is divided vertically into two PUs, each of size 2N x N samples. In Figure 18c, a 2N x 2N CU 1120 is divided horizontally and vertically into four PUs, each of size N x N samples. Note that the splitting of a CU into multiple PUs is not a recursive process in the current HEVC system. That is to say the splitting is carried out only once, so that a PU is not further divided. This means that the divisions shown in Figures 18a-18c form an exhaustive list of the possible divisions of a CU into multiple PUs.
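The three possible divisions can be expressed as a small helper; the mode strings and the (x, y, width, height) rectangle convention are hypothetical.

def split_pus(cu_x, cu_y, size, mode):
    # Return PU rectangles for the three possible non-recursive
    # splits of a 2N x 2N CU (size = 2N).
    n = size // 2
    if mode == 'Nx2N':   # Figure 18a: two side-by-side N x 2N PUs
        return [(cu_x, cu_y, n, size), (cu_x + n, cu_y, n, size)]
    if mode == '2NxN':   # Figure 18b: two stacked 2N x N PUs
        return [(cu_x, cu_y, size, n), (cu_x, cu_y + n, size, n)]
    if mode == 'NxN':    # Figure 18c: four N x N PUs (PU0..PU3)
        return [(cu_x + dx, cu_y + dy, n, n)
                for dy in (0, n) for dx in (0, n)]
    raise ValueError(mode)

print(split_pus(0, 0, 8, 'NxN'))
# [(0, 0, 4, 4), (4, 0, 4, 4), (0, 4, 4, 4), (4, 4, 4, 4)]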
The choice of how to divide the CU (and, indeed, whether to divide it at all) is taken by the controller 345. For example, various trial compressions can be performed in whole or in part and the most efficient compression (in terms of a cost function, for example relating to output bit rate and/or error rate or signal to noise ratio) may be selected using known techniques. Indeed, the choice between inter-image, intra-image and intra-block-copy processing is made by the controller 345 by a similar technique.
Where the CU is divided into multiple PUs for an intra-block-copy process, an individual motion vector is generated for each of the divided PUs. Motion vectors are generated in respect of the luminance component of each PU and, subject to the exceptions to be discussed below, the same motion vectors are used in respect of the corresponding chrominance components of that PU. Note that in some embodiments the encoder can also take account of the non-primary components such as the chrominance components in preparing the motion vectors.
Figure 19 is a schematic flowchart illustrating a previously proposed process for using multiple PUs derived from a single CU in an intra-block-copy process.
At a step 1150, a CU is split into multiple PUs. As discussed above, the decision to make the division into multiple PUs, and indeed the decision to use intra-block-copy processing rather than inter-image or intra-image processing, is taken by the controller 345.
At a step 1160, a motion vector is derived from the luminance component of each PU. In some examples, a motion vector is obtained using an established correlation technique which compares the PU to be encoded with correspondingly-sized test regions within a search area of the already-encoded (and decoded) image, though other techniques are envisaged. The test region providing the greatest correlation, or in other words the greatest similarity, with the PU to be encoded is selected as the source block for that PU. The displacement between the PU and the selected source block forms the corresponding motion vector.
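A sketch of such a correlation search follows. The region_at accessor is hypothetical, and a sum of absolute differences stands in for whichever similarity measure a particular encoder actually uses.

def find_motion_vector(pu_block, region_at, pu_x, pu_y, positions):
    # region_at(x, y) returns the correspondingly-sized test region
    # of the already encoded-and-decoded image at (x, y).
    def sad(a, b):
        return sum(abs(p - q) for ra, rb in zip(a, b)
                   for p, q in zip(ra, rb))
    best_mv, best_cost = None, float('inf')
    for x, y in positions:   # candidate positions within the search area
        cost = sad(pu_block, region_at(x, y))
        if cost < best_cost:
            best_mv, best_cost = (x - pu_x, y - pu_y), cost
    return best_mv           # displacement of the best-matching source block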
At a step 1170, the motion vector is tested to detect whether the displacement represented by that motion vector is allowable according to the four criteria listed above. If not, control returns to the step 1160 and a different motion vector is selected. Note that the steps 1160 and 1170 are shown separately for clarity of the explanation but they can be combined such that non-allowable motion vectors are inhibited from being selected in the first place.
Accordingly, this process provides an example of a method of operation of an image data encoding apparatus, the method comprising: in respect of a coding unit comprising an array of data samples of a current image, dividing the coding unit into two or more prediction units; and for each prediction unit, detecting a motion vector pointing to an array of luminance samples for use in generating a predicted version of the luminance samples of that prediction unit, by comparing samples of the prediction unit with source samples within a search area of a version of the current image, so that the source samples applicable to each prediction unit are displaced, with respect to that prediction unit, by a displacement within an allowable range of displacements.
Once an allowable motion vector has been obtained for that luminance PU, the same motion vector is used for the corresponding chrominance PU at a step 1180. Note that in practice, this relationship is implicit, so that an active step corresponding to the step 1180 may not be literally implemented at the encoder. From the encoder's point of view, a single set of motion vectors, one for each PU, is generated from the luminance PUs and is provided as part of the compressed datastream. The re-use of the motion vectors in respect of corresponding chrominance PUs is assumed. The decoder reuses the motion vectors as described.
An issue which may arise when a CU is split into multiple PUs relates to the size of the corresponding chrominance PUs.
In the HEVC system, generally speaking a minimum size of a block or array of samples for processing is 4 samples. That is to say, any array of samples for processing within the system should have a horizontal width of at least 4 samples and a vertical height of at least 4 samples.
In the case of the division of a CU into multiple PUs, this is not a problem for the luminance samples. This is because the minimum size of a CU is defined as 8 x 8 samples, and so (using the exhaustive list of possible divisions shown in Figures 18a-18c) it is impossible for a luminance PU to have either dimension smaller than 4 samples.
However, when the CU size is 8 x 8 luminance samples and the chrominance format is not 4:4:4, which is to say that a subsampled chrominance format (such as, for example, 4:2:2 or 4:2:0) is in use, there are fewer chrominance samples associated with a CU than the number of luminance samples. In particular, in 4:2:2 the chrominance samples are subsampled horizontally with respect to the luminance samples so that any further horizontal division such as that represented by Figure 18a or Figure 18c will result in a horizontal dimension of the chrominance PUs equal to 2 chrominance samples. In the example of 4:2:0, the chrominance components have half the horizontal and half the vertical resolution of the luminance component so that any of the divisions shown in Figures 18a-18c will result in a chrominance PU having at least one dimension equal to 2 samples. As mentioned above, dimensions less than 4 samples are not allowable within the HEVC system and so an established technique for dealing with this problem is to merge together chrominance PUs derived from a single CU so as to achieve the minimum dimension of 4 samples.
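The size check can be sketched numerically; the subsampling table below simply encodes the usual meaning of the 4:4:4, 4:2:2 and 4:2:0 labels, and the function name is hypothetical.

# (horizontal, vertical) chrominance subsampling factors
SUBSAMPLING = {'4:4:4': (1, 1), '4:2:2': (2, 1), '4:2:0': (2, 2)}

def chroma_pu_needs_merging(luma_w, luma_h, fmt, minimum=4):
    sx, sy = SUBSAMPLING[fmt]
    return luma_w // sx < minimum or luma_h // sy < minimum

# An N x 2N split of an 8x8 CU gives 4x8 luminance PUs; in 4:2:2 the
# corresponding chrominance PUs would be only 2 samples wide:
print(chroma_pu_needs_merging(4, 8, '4:2:2'))  # True
print(chroma_pu_needs_merging(4, 8, '4:4:4'))  # False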
Figure 20 is a schematic flowchart illustrating a process for merging small PUs together. At a step 1200, it is detected (for example, by the controller 345) that the currently selected division of a CU into multiple PUs will result in chrominance PUs being generated which are smaller than 4 samples in either or both of the horizontal and vertical dimensions.
At a step 1210, the chrominance PUs are merged, or in other words multiple ones of the PUs are treated as a single PU. Although it would be possible always to merge all of the chrominance PUs into a single PU in this situation, in the present embodiments the merging process is applied to the minimum extent necessary to achieve the required minimum dimension of 4 vertical samples and 4 horizontal samples. So, after the merging process has been carried out, in some permutations there may still be more than one chrominance PU corresponding to the original CU.
The question then arises as to which motion vector to associate with the merged PU(s).
It is proposed that for a chrominance PU derived by merging a group of two or more divided chrominance PUs, the motion vector from the corresponding luminance PU at a lower-right position within the group of two or more PUs is used in respect of the merged PU. This process is indicated schematically by a step 1220 in Figure 20, but as before it is noted that this is fundamentally a process carried out at the decoder rather than an active step performed at the encoder.
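A sketch of the lower-right selection rule, assuming (as an illustrative convention only) that PUs are represented as (x, y, width, height) rectangles in raster coordinates:

def merged_pu_motion_vector(group, mvs):
    # group: the luminance PU rectangles whose chrominance
    # counterparts were merged; mvs maps each PU to its motion
    # vector. The lower-right PU is the one with the greatest y,
    # then the greatest x.
    lower_right = max(group, key=lambda pu: (pu[1], pu[0]))
    return mvs[lower_right]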
Figures 21-23 schematically illustrate sets of PUs derived from a square 8 x 8 CU. In each case, the block shown at the left corresponds to the respective CU 1100, 1110 and 1120 of Figures 18a-18c. A block shown at the middle of each diagram represents the format of the chrominance PU for each chrominance component in an example 4:2:0 format, and the block(s) shown at the right of each diagram represent the format of the chrominance PU(s) for each chrominance component in an example 4:2:2 format.
Referring to Figure 21, a so-called N x 2N split has been carried out corresponding to that shown in Figure 18a. With an 8 x 8 CU 1100, each of the split chrominance PUs in the 4:2:0 format would have a width of 2 samples, and so the two chrominance PUs for each component are combined or merged into a single chrominance PU 1102. Similarly, in the horizontally subsampled 4:2:2 format, each of the two chrominance PUs would also have a width of 2 samples, and so once again the two chrominance PUs for each component are merged into a single chrominance PU 1104. Applying the criteria set out above, the chrominance PU 1102 and the chrominance PU 1104 are decoded using the motion vector associated with the right-hand luminance PU, labelled as PU1 in Figure 21.
In Figure 22, a so-called 2N x N split has been carried out corresponding to that shown in Figure 18b. With an 8 x 8 CU 1110, each of the split chrominance PUs in the 4:2:0 format would have a vertical height of 2 samples, and so the two chrominance PUs for each component are combined or merged into a single chrominance PU 1112. However, in the horizontally subsampled 4:2:2 format, each of the two chrominance PUs 1114, 1116 would have a width and a height of 4 samples, and so no merging is required in 4:2:2. Applying the criteria set out above, the chrominance PU 1112 is decoded using the motion vector associated with the lower-right-hand luminance PU, labelled as PU1 in Figure 22.
Referring to Figure 23, a so-called N x N split has been carried out corresponding to that shown in Figure 18c. With an 8 x 8 CU 1120, each of the split chrominance PUs in the 4:2:0 format would have a width and a height of 2 samples, and so the four chrominance PUs for each component are combined or merged into a single chrominance PU 1122. Similarly, in the horizontally subsampled 4:2:2 format, each of the four chrominance PUs would have a width of 2 samples, and so pairs of chrominance PUs for each component are merged into single chrominance PUs 1124, 1126, noting that no merging is required in the vertical direction because the 4:2:2 format has full vertical resolution, which means that the divided PUs have a height of four samples. Applying the criteria set out above, the chrominance PU 1122 and the chrominance PU 1126 are decoded using the motion vector associated with the lower-right-hand luminance PU, labelled as PU3 in Figure 23. Applying the criteria to the 4:2:2 PU 1124, the motion vector is used from the lower-right-hand one of the luminance PUs corresponding to the group of PUs which were merged to form the chrominance PU 1124. So, the motion vector for the luminance PU labelled as PU1 in Figure 23 is used to decode the merged chrominance PU 1124.
However, using these arrangements, a further issue has been identified in accordance with the present disclosure. This issue will now be discussed and techniques for resolving or at least alleviating the problem will be described.
The potential problem relates to the constraints placed on the location of the source block 1030, as discussed above with reference to Figure 17. In previously proposed systems such as HEVC systems, these constraints are tested against each motion vector as the motion vectors are first generated. In the case of a split CU having multiple PUs, a motion vector is generated in respect of each luminance PU within the CU such that the motion vector complies with all of the criteria relating to allowable displacements for that luminance PU. In other words, the source block 1030 corresponding to that luminance PU is in an image region considered allowable according to the set of criteria.
However, in the case of merged chrominance PUs, a motion vector derived in respect of a particular spatial position within the CU may be used in the decoding of the merged PU which represents a different (in fact, a larger) spatial region within the CU. In some situations, this may mean that the source block used in the decoding of the merged chrominance PU does not comply with the criteria. For example, a part of the source block used for the merged chrominance PU may be displaced further from the LCU containing the merged chrominance PU than the limit of (in this example) 64 luminance samples, or a part of the source block used for the merged chrominance PU may be outside the current row of LCUs, or a part of the source block used for the merged chrominance PU may contravene the requirement in a constrained intra mode that the source block is entirely intra-image or intra-block-copy encoded. Indeed, more than one of these criteria may be contravened by the source block corresponding to the merged chrominance PU.
The effect of this is that a motion vector to be used by the decoder to decode a merged chrominance PU may in fact be illegal or invalid. One possibility is for the decoder to detect this and to clip or otherwise change the motion vector in order to render it valid and legal. However, this results in fundamentally the wrong motion vector being used to decode that merged chrominance PU, because the clipped or changed motion vector is by definition different to the motion vector used at the encoder to generate the prediction of that PU. Also, testing and altering motion vectors at the decoder would place an undesirable processing burden on the decoder. Accordingly, this is not considered to be a useful solution.
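For concreteness, the displacement criteria referred to above may be expressed as a minimal sketch, assuming 64 x 64 luminance-sample LCUs and the 64-sample limit quoted earlier; the constrained-intra requirement is modelled as a caller-supplied predicate because it depends on how the candidate source region was itself encoded, and further causality constraints (the source preceding the current block in coding order) are omitted for brevity. All names are hypothetical.

```python
# Minimal sketch, assuming 64x64 LCUs; coordinates are in luma samples.
LCU_SIZE = 64
MAX_LEFT = 64  # maximum displacement to the left of the current LCU

def source_block_allowed(pu_x, pu_y, pu_w, pu_h, mv_x, mv_y,
                         is_intra_or_ibc=lambda x, y, w, h: True):
    """Test one PU position against the allowable-displacement criteria."""
    sx, sy = pu_x + mv_x, pu_y + mv_y          # top-left of the source block
    lcu_top = (pu_y // LCU_SIZE) * LCU_SIZE    # top of the current LCU row
    lcu_left = (pu_x // LCU_SIZE) * LCU_SIZE   # left edge of the current LCU
    if sy < lcu_top or sy + pu_h > lcu_top + LCU_SIZE:
        return False                           # outside the current LCU row
    if sx < lcu_left - MAX_LEFT:
        return False                           # too far left of the LCU
    return is_intra_or_ibc(sx, sy, pu_w, pu_h) # constrained-intra criterion
```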
In some embodiments, the problem may be addressed by the controller 345 disallowing intra-block-copy for modes other than a 4:4:4 mode.
In other embodiments, the problem may be addressed by the controller 345 disallowing splitting of 8x8 intra-block-copy CUs for modes other than a 4:4:4 mode.
In other embodiments the problem may be addressed by the controller 345 disallowing splitting of 8x8 intra-block-copy CUs where such a split would cause any of the PUs to require to be merged (and/or to fall below a minimum size in either dimension).
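For illustration, the three options above can be viewed as a single gate applied by the controller 345 before intra-block-copy is attempted for a CU. A sketch follows; the policy names and arguments are assumptions made for this example only.

```python
# Hypothetical gate implementing the three embodiments described above.
def ibc_permitted(chroma_format, cu_size, is_split,
                  split_would_merge_chroma, policy):
    if policy == "no_ibc_outside_444":          # first embodiment
        return chroma_format == "4:4:4"
    if policy == "no_8x8_split_outside_444":    # second embodiment
        return chroma_format == "4:4:4" or not (cu_size == 8 and is_split)
    if policy == "no_split_causing_merge":      # third embodiment
        return not (cu_size == 8 and is_split and split_would_merge_chroma)
    return True
```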
Further possible arrangements according to embodiments of the disclosure will be discussed below.
Figure 24 is a schematic flowchart illustrating a technique for using multiple PUs derived from a single CU according to an embodiment of the present disclosure.
A step 1300 corresponds to the step 1150 of Figure 19 and will not be described further here.
At a step 1310, motion vectors are generated in respect of each luminance PU in the manner described above with reference to the steps 1160, 1170, but with an additional constraint 1320 such that in respect of image data of a subsampled chrominance format, the detecting step comprises detecting a motion vector in respect of a predetermined subset of the prediction units of a coding unit so that the source samples pointed to by each motion vector in that subset of the motion vectors are displaced, with respect to one or more other prediction units of that coding unit, by a displacement within the allowable range of displacements. This additional constraint will be described below.
A step 1330 corresponds to the step 1180 discussed above.
The additional constraint represented schematically as 1320 in Figure 24 is as follows.
For a predetermined subset of the luminance PU motion vectors, a further constraint is applied such that the displacements (corresponding to the positions of the corresponding source blocks) represented by those motion vectors are also tested for validity against the spatial positions of other PUs derived from that CU. Only motion vectors which comply with this additional validity test are allowed to be generated.
The predetermined subset of the luminance PU motion vectors comprises those motion vectors which will (or may) be used in respect of merged PUs. This will always include the motion vector for the lower-right luminance PU but may also include the motion vector for the PU labelled as PU1 in Figure 23, since that motion vector is used for the decoding of the chrominance PU 1124 of Figure 23. Of course, if a different convention were used so that, instead of the lower-right motion vector being used for a merged PU, a different one of the motion vectors were selected, the predetermined subset would change accordingly. Overall, however, as mentioned above, the predetermined subset of the luminance PU motion vectors comprises those motion vectors which will (or may) be used in respect of merged PUs.
Of course, the additional validity test may be applied to all of the motion vectors of a split CU. However, this may impose an unnecessary restriction on some of the motion vectors, and so in embodiments of the present disclosure the additional validity test is applied only to the predetermined subset of motion vectors.
The additional validity test applied to the predetermined subset of motion vectors is such that the motion vectors are not only valid for the spatial position of the luminance PU from which they were derived (a situation corresponding to the previously proposed step 1170 of Figure 19) but are also valid in respect of the spatial position of any further luminance PUs corresponding to the chrominance PUs of a merged PU for which that motion vector may or will be used at decoding.
For simplicity, in some embodiments each motion vector in the predetermined subset of motion vectors is required to be valid in respect of all of the PUs of that CU. This additional validity test encompasses the criteria set out above. However, in other embodiments the additional validity test may be applied only in respect of the spatial position of any further luminance PUs corresponding to the chrominance PUs of a merged PU for which that motion vector may or will be used at decoding.
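Expressed as a sketch (again with hypothetical names), the additional validity test amounts to re-applying the per-position displacement test at every PU position covered by the merged PU that will reuse the donor motion vector:

```python
def subset_mvs_valid(groups, layout, mvs, allowed):
    """groups: [(member luma PU indices, donor index)] for each merged chroma
    PU; layout[i]: position of luma PU i; mvs[i]: its motion vector;
    allowed(pu, mv): the per-position displacement test (for example, the
    source_block_allowed sketch above adapted to take a PU rectangle)."""
    return all(allowed(layout[i], mvs[donor])
               for members, donor in groups
               for i in members)       # each group includes its own donor
```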
So, referring to the examples of Figures 21-23, the additional validity test is applied as follows:
In respect of Figure 21 , in at least a 4:2:0 mode or a 4:2:2 mode, the motion vector derived in respect of the luminance PU labelled as PU1 is tested for validity against not only its own spatial position but also the spatial position of the luminance PU labelled as PU0.
In respect of Figure 22, in at least a 4:2:0 mode, the motion vector derived in respect of the luminance PU labelled as PU1 is tested for validity against not only its own spatial position but also the spatial position of the luminance PU labelled as PU0.
In respect of Figure 23, the situation is slightly more complicated according to the chrominance format in use. In a 4:2:0 format, the motion vector derived in respect of the luminance PU labelled as PU3 is tested for validity against not only its own spatial position but also the spatial position of all of the other luminance PUs. In a 4:2:2 format, the motion vector derived in respect of the luminance PU labelled as PU1 is tested for validity against not only its own spatial position but also the spatial position of the luminance PU labelled as PU0, and the motion vector derived in respect of the luminance PU labelled as PU3 is tested for validity against not only its own spatial position but also the spatial position of the luminance PU labelled as PU2.
As discussed above, there are various validity criteria relating to allowable displacements represented by the motion vectors. One of these criteria relates to the maximum displacement to the left of the current block position, and another of the criteria relates to constraining the source block to lie within the current row of LCUs. Either or both of these criteria may be elegantly and simply addressed using a modified search area as described below.
Figure 25 schematically illustrates a modified search area in respect of a current PU 1400 which is subject to the additional validity test 1320.
A normal search area for use in respect of the PU position represented by the block 1400 is shown in solid line. The upper boundary 1410 represents the top of the current row of LCUs. The left boundary 1420 represents a position 64 luminance samples to the left of the PU 1400.
In order to meet the additional validity test in respect of the PU 1400, the search area is modified.
If the motion vector in respect of the PU 1400 is required to be valid in respect of a PU position above the position of the PU 1400, then the search area's upper boundary 1410 is lowered by an amount ΔY which is equal to at least the vertical height of the PU above the position of the PU 1400 (in the examples discussed here, this will be equal to 4 luminance samples). Similarly, if the motion vector in respect of the PU 1400 is required to be valid in respect of a PU position to the left of the position of the PU 1400, then the search area's left-hand boundary 1420 is moved to the right by an amount ΔX which is equal to at least the horizontal width of the PU to the left of the position of the PU 1400 (again, in the examples discussed here, this will be equal to 4 luminance samples). This provides an example of reducing the extent of the search area by a number of samples corresponding to the horizontal or vertical size of a luminance prediction unit.
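A sketch of this boundary adjustment follows, using the 4-luminance-sample PU dimensions of the present examples; the rectangle representation and the names are assumptions made for illustration.

```python
def shrink_search_area(top, left, bottom, right,
                       covers_pu_above=False, covers_pu_left=False,
                       pu_h=4, pu_w=4):
    """Lower the top boundary 1410 by delta-Y (at least the height of the PU
    above) and move the left boundary 1420 right by delta-X (at least the
    width of the PU to the left), as in Figure 25."""
    if covers_pu_above:
        top += pu_h     # delta-Y
    if covers_pu_left:
        left += pu_w    # delta-X
    return top, left, bottom, right
```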
Figure 26 is a schematic flowchart illustrating a technique for using multiple PUs derived from a single CU. This flowchart schematically represents an example implementation of the step 1310 discussed above.
At a step 1500, a search is carried out for luminance motion vectors in respect of each PU of a divided CU, for example using a correlation technique in respect of a search area as discussed above. At a step 1510, a test is carried out to detect whether the luminance motion vector for one or more predetermined PUs (for example, the predetermined subset of PUs discussed above) is valid for one or more other PUs (for example, all of the PUs in the CU, or just those other PUs whose corresponding chrominance PUs will form part of a merged chrominance PU which will use that motion vector). If the test is failed, then the motion vector is rejected at a step 1520 and the process repeated. Otherwise, the luminance motion vectors are used in the encoding process at a step 1530. In some embodiments, if no valid motion vector can be found, the attempt to encode using intra-block-copy is abandoned and intra-image encoding is used instead.
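Taken together, the steps 1500 to 1530 may be sketched as the following search loop; `candidate_vectors` stands in for whatever correlation-based search the encoder uses, `allowed` for the per-position displacement test, and `groups` for the merged-PU grouping, all hypothetical names.

```python
def choose_ibc_vectors(layout, groups, candidate_vectors, allowed):
    """Return one valid motion vector per luma PU, or None to signal that
    intra-block-copy should be abandoned in favour of intra-image coding."""
    donors = {donor for _, donor in groups}
    mvs = []
    for i, pu in enumerate(layout):
        chosen = None
        for mv in candidate_vectors(pu):          # step 1500: search
            if not allowed(pu, mv):
                continue                          # invalid at own position
            if i in donors:                       # step 1510: test 1320
                members = next(m for m, d in groups if d == i)
                if not all(allowed(layout[j], mv) for j in members):
                    continue                      # step 1520: reject, retry
            chosen = mv
            break
        if chosen is None:
            return None                           # fall back to intra-image
        mvs.append(chosen)
    return mvs                                    # step 1530: encode
```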
It will be understood that the additional validity test may be applied in respect of all CUs which are subject to splitting, or just to 8 x 8 CUs, or just to CUs for which the selected splitting, under the current chrominance format, will result in the merging of chrominance PUs. Accordingly, the step 1310 may also comprise detecting whether the chrominance prediction units corresponding to the luminance prediction units have an array size less than a threshold size and, if so, detecting a motion vector in respect of a predetermined subset of the prediction units of a coding unit so that the source samples pointed to by each motion vector in that subset of the motion vectors are displaced, with respect to one or more of the other prediction units of that coding unit, by a displacement within the allowable range of displacements.
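The size-based gating mentioned in the previous paragraph can likewise be sketched, reusing the SUBSAMPLING and SPLITS tables of the earlier sketch and taking the 4-sample threshold of clause 8 below as an assumption:

```python
def needs_additional_test(split, fmt, threshold=4):
    """True if any chroma PU of this split falls below the threshold size,
    so that the additional validity test 1320 should be applied."""
    sx, sy = SUBSAMPLING[fmt]
    return any(w // sx < threshold or h // sy < threshold
               for (_, _, w, h) in SPLITS[split])
```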
Figure 27 schematically illustrates an encoding method in which a data stream is encoded at a step 1600 according to the procedures discussed above, and at a step 1610 the data stream is output (transmitted and/or stored).
Figure 28 schematically illustrates a decoding method in which an encoded data stream is received (replayed and/or received as a transmission) at a step 1620 and, at a step 1630, the data stream is decoded.
Embodiments of the present disclosure also relate to an encoded data stream which has been encoded using the techniques discussed above. By way of example, Figures 29 and 30 schematically illustrate examples of machine-readable non-transitory storage media storing such a data stream. Figure 29 provides a schematic example of a non-transitory memory such as a flash memory device 1640 storing such a data stream, and Figure 30 provides a schematic example of a magnetic and/or optical disc medium 1650 storing such a data stream.
Embodiments of the present disclosure also relate to an instance of distribution of an encoded data stream or portion thereof. This may be from a server to a client device. The instance of distribution may be defined in time, for example by a user request.
Respective aspects and features are defined by the following numbered clauses.
1. A method of operation of an image data encoding apparatus, the method comprising: in respect of a coding unit comprising an array of data samples of a current image, dividing the coding unit into two or more prediction units; and
for each prediction unit, detecting a motion vector pointing to an array of samples for use in generating a predicted version of the samples of that prediction unit so that the source samples applicable to each prediction unit are displaced, with respect to that prediction unit, by a displacement within an allowable range of displacements;
in which the detecting step comprises detecting a motion vector in respect of a predetermined subset of the prediction units of a coding unit so that the source samples pointed to by each motion vector in that subset of the motion vectors are displaced, with respect to one or more other prediction units of that coding unit, by a displacement within the allowable range of displacements.
2. A method according to clause 1, operable in respect of image data of a subsampled chrominance format, in which the detecting step comprises detecting the motion vector from luminance samples of the image data.
3. A method according to clause 2, in which:
the detecting step comprises detecting whether the chrominance prediction units corresponding to the luminance prediction units have an array size less than a threshold size and, if so, detecting a motion vector in respect of a predetermined subset of the prediction units of a coding unit so that the source samples pointed to by each motion vector in that subset of the motion vectors are displaced, with respect to one or more of the other prediction units of that coding unit, by a displacement within the allowable range of displacements.
4. A method according to any one of the preceding clauses, in which the detecting step comprises comparing samples of the prediction unit with source samples within a search area of a version of the current image.
5. A method according to clause 4, in which the detecting step comprises reducing the extent of the search area by a number of samples corresponding to the horizontal or vertical size of a prediction unit.
6. A method according to any one of the preceding clauses, in which the detecting step comprises detecting, for one or more prediction units other than the predetermined prediction unit, whether applying the motion vector associated with the predetermined prediction unit represents a displacement outside of the allowable range of displacements.
7. A method according to any one of the preceding clauses, in which the allowable range of displacements comprises one or more selected from the group consisting of:
displacements to a position within a current row of coding units;
displacements of no more than a predetermined number of samples to the left of the largest coding unit containing the prediction unit; and
in a constrained intra mode, displacements to source samples which were derived using intra-image or intra-block-copy prediction.
8. A method according to any one of the preceding clauses as dependent upon clause 2, in which the threshold size is 4 samples in a horizontal or a vertical direction.
9. A method according to any one of the preceding clauses as dependent upon clause 2, in which the subsampled chrominance format is a 4:2:2 format or a 4:2:0 format.
10. A method according to any one of the preceding clauses, in which the predetermined subset of the prediction units comprises those prediction units for which the corresponding motion vectors will be used in the decoding of a merged prediction unit.

11. A method according to clause 10, in which the one or more other prediction units comprise those other prediction units for which the corresponding chrominance prediction units will form part of the merged prediction unit.
12. A method of operation of an image data encoding apparatus, the method comprising: generating a predicted version of an image block by an intra-block-copy operation in which the predicted version is based upon another image region derived from the same image; and
disallowing intra-block-copy operations for chrominance modes other than a 4:4:4 mode.
13. A method of operation of an image data encoding apparatus, the method comprising: generating a predicted version of an image coding unit by an intra-block-copy operation in which the predicted version is based upon another image region derived from the same image;
splitting the coding unit into two or more prediction units; and
disallowing splitting of 8x8 sample intra-block-copy coding units for chrominance modes other than a 4:4:4 mode.
14. A method of operation of an image data encoding apparatus, the method comprising: generating a predicted version of an image coding unit by an intra-block-copy operation in which the predicted version is based upon another image region derived from the same image;
splitting the coding unit into two or more prediction units; and
disallowing splitting of 8x8 intra-block-copy coding units where such a split would cause any of the prediction units to require to be merged and/or to fall below a minimum size in either dimension.
15. Image data encoded according to the method of any one of the preceding clauses.
16. Image data comprising a plurality of coding units, at least some of which are divided into multiple prediction units having associated motion vectors for intra-block-copy prediction such that for at least one image component the prediction units are merged so as to share a motion vector from one of the prediction units of that coding unit, the image data being constrained so that the shared motion vector represents a displacement within an allowable range of displacements with respect to all of the merged prediction units sharing that motion vector.
17. A method of operation of an image data decoding apparatus, the method comprising decoding image data according to clause 15 or clause 16.
18. Computer software which, when executed by a computer, causes the computer to carry out the method of any one of clauses 1 to 14 or 17.
19. A non-transitory machine readable storage medium which stores computer software according to clause 18.

20. A non-transitory machine readable storage medium which stores image data according to clause 15 or clause 16.
21. An instance of distribution of image data according to clause 15 or clause 16.
22. A data encoding apparatus comprising:
a block generator, operable in respect of a coding unit comprising an array of data samples of a current image, and configured to divide the coding unit into two or more prediction units; and
a detector configured, for each prediction unit, to detect a motion vector pointing to an array of samples for use in generating a predicted version of the samples of that prediction unit so that the source samples applicable to each prediction unit are displaced, with respect to that prediction unit, by a displacement within an allowable range of displacements;
in which the detector is configured to detect a motion vector in respect of a predetermined subset of the prediction units of a coding unit so that the source samples pointed to by each motion vector in that subset of the motion vectors are displaced, with respect to one or more of the other prediction units of that coding unit, by a displacement within the allowable range of displacements.
23. A data decoding apparatus configured to decode image data according to clause 15 or clause 16.
24. An image data encoding apparatus comprising:
an encoder configured to generate a predicted version of an image block by an intra-block-copy operation in which the predicted version is based upon another image region derived from the same image; and to disallow intra-block-copy operations for chrominance modes other than a 4:4:4 mode.
25. An image data encoding apparatus comprising:
an encoder configured to generate a predicted version of an image coding unit by an intra-block-copy operation in which the predicted version is based upon another image region derived from the same image; to split the coding unit into two or more prediction units; and to disallow splitting of 8x8 sample intra-block-copy coding units for chrominance modes other than a 4:4:4 mode.
26. An image data encoding apparatus comprising:
an encoder configured to generate a predicted version of an image coding unit by an intra-block-copy operation in which the predicted version is based upon another image region derived from the same image; to split the coding unit into two or more prediction units; and to disallow splitting of 8x8 intra-block-copy coding units where such a split would cause any of the prediction units to require to be merged and/or to fall below a minimum size in either dimension.

27. Video data capture, transmission, display and/or storage apparatus comprising apparatus according to any one of clauses 22 to 26.

Apparatus features of the above encoder or decoder may be carried out using the apparatus described above, with various functionality being provided by the controller 345.
It will be appreciated that apparatus features of the above clauses may be implemented by respective features of the encoder or decoder as discussed earlier.

Claims

1. A method of operation of an image data encoding apparatus, the method comprising: in respect of a coding unit comprising an array of data samples of a current image, dividing the coding unit into two or more prediction units; and
for each prediction unit, detecting a motion vector pointing to an array of samples for use in generating a predicted version of the samples of that prediction unit so that the source samples applicable to each prediction unit are displaced, with respect to that prediction unit, by a displacement within an allowable range of displacements;
in which the detecting step comprises detecting a motion vector in respect of a predetermined subset of the prediction units of a coding unit so that the source samples pointed to by each motion vector in that subset of the motion vectors are displaced, with respect to one or more other prediction units of that coding unit, by a displacement within the allowable range of displacements.
2. A method according to claim 1, operable in respect of image data of a subsampled chrominance format, in which the detecting step comprises detecting the motion vector from luminance samples of the image data.
3. A method according to claim 2, in which:
the detecting step comprises detecting whether the chrominance prediction units corresponding to the luminance prediction units have an array size less than a threshold size and, if so, detecting a motion vector in respect of a predetermined subset of the prediction units of a coding unit so that the source samples pointed to by each motion vector in that subset of the motion vectors are displaced, with respect to one or more of the other prediction units of that coding unit, by a displacement within the allowable range of displacements.
4. A method according to claim 1, in which the detecting step comprises comparing samples of the prediction unit with source samples within a search area of a version of the current image.
5. A method according to claim 4, in which the detecting step comprises reducing the extent of the search area by a number of samples corresponding to the horizontal or vertical size of a prediction unit.
6. A method according to claim 1, in which the detecting step comprises detecting, for one or more prediction units other than the predetermined prediction unit, whether applying the motion vector associated with the predetermined prediction unit represents a displacement outside of the allowable range of displacements.
7. A method according to claim 1, in which the allowable range of displacements comprises one or more selected from the group consisting of:
displacements to a position within a current row of coding units;
displacements of no more than a predetermined number of samples to the left of the largest coding unit containing the prediction unit; and
in a constrained intra mode, displacements to source samples which were derived using intra-image or intra-block-copy prediction.
8. A method according to claim 2, in which the threshold size is 4 samples in a horizontal or a vertical direction.
9. A method according to claim 2, in which the subsampled chrominance format is a 4:2:2 format or a 4:2:0 format.
10. A method according to claim 1, in which the predetermined subset of the prediction units comprises those prediction units for which the corresponding motion vectors will be used in the decoding of a merged prediction unit.
11. A method according to claim 10, in which the one or more other prediction units comprise those other prediction units for which the corresponding chrominance prediction units will form part of the merged prediction unit.
12. A method of operation of an image data encoding apparatus, the method comprising: generating a predicted version of an image block by an intra-block-copy operation in which the predicted version is based upon another image region derived from the same image; and
disallowing intra-block-copy operations for chrominance modes other than a 4:4:4 mode.
13. A method of operation of an image data encoding apparatus, the method comprising: generating a predicted version of an image coding unit by an intra-block-copy operation in which the predicted version is based upon another image region derived from the same image;
splitting the coding unit into two or more prediction units; and disallowing splitting of 8x8 sample intra-block-copy coding units for chrominance modes other than a 4:4:4 mode.
14. A method of operation of an image data encoding apparatus, the method comprising: generating a predicted version of an image coding unit by an intra-block-copy operation in which the predicted version is based upon another image region derived from the same image;
splitting the coding unit into two or more prediction units; and
disallowing splitting of 8x8 intra-block-copy coding units where such a split would cause any of the prediction units to require to be merged and/or to fall below a minimum size in either dimension.
15. Image data encoded according to the method of claim 1.
16. Image data comprising a plurality of coding units, at least some of which are divided into multiple prediction units having associated motion vectors for intra-block-copy prediction such that for at least one image component the prediction units are merged so as to share a motion vector from one of the prediction units of that coding unit, the image data being constrained so that the shared motion vector represents a displacement within an allowable range of displacements with respect to all of the merged prediction units sharing that motion vector.
17. A method of operation of an image data decoding apparatus, the method comprising decoding image data according to claim 15.
18. Computer software which, when executed by a computer, causes the computer to carry out the method of claim 1.
19. Computer software which, when executed by a computer, causes the computer to carry out the method of claim 17.
20. A non-transitory machine readable storage medium which stores computer software according to claim 18.
21 . A non-transitory machine readable storage medium which stores computer software according to claim 19.
22. A non-transitory machine readable storage medium which stores image data according to claim 15.
23. A non-transitory machine readable storage medium which stores image data according to claim 16.
24. An instance of distribution of image data according to claim 15.
25. An instance of distribution of image data according to claim 16.
26. A data encoding apparatus comprising:
a block generator, operable in respect of a coding unit comprising an array of data samples of a current image, and configured to divide the coding unit into two or more prediction units; and
a detector configured, for each prediction unit, to detect a motion vector pointing to an array of samples for use in generating a predicted version of the samples of that prediction unit so that the source samples applicable to each prediction unit are displaced, with respect to that prediction unit, by a displacement within an allowable range of displacements;
in which the detector is configured to detect a motion vector in respect of a predetermined subset of the prediction units of a coding unit so that the source samples pointed to by each motion vector in that subset of the motion vectors are displaced, with respect to one or more of the other prediction units of that coding unit, by a displacement within the allowable range of displacements.
27. A data decoding apparatus configured to decode image data according to claim 16.
28. An image data encoding apparatus comprising:
an encoder configured to generate a predicted version of an image block by an intra-block-copy operation in which the predicted version is based upon another image region derived from the same image; and to disallow intra-block-copy operations for chrominance modes other than a 4:4:4 mode.
29. An image data encoding apparatus comprising:
an encoder configured to generate a predicted version of an image coding unit by an intra-block-copy operation in which the predicted version is based upon another image region derived from the same image; to split the coding unit into two or more prediction units; and to disallow splitting of 8x8 sample intra-block-copy coding units for chrominance modes other than a 4:4:4 mode.
30. An image data encoding apparatus comprising:
an encoder configured to generate a predicted version of an image coding unit by an intra-block-copy operation in which the predicted version is based upon another image region derived from the same image; to split the coding unit into two or more prediction units; and to disallow splitting of 8x8 intra-block-copy coding units where such a split would cause any of the prediction units to require to be merged and/or to fall below a minimum size in either dimension.
31. Video data capture, transmission, display and/or storage apparatus comprising apparatus according to claim 26.
32. Video data capture, transmission, display and/or storage apparatus comprising apparatus according to claim 27.