WO2014050009A1 - Image processing in scalable video systems - Google Patents

Image processing in scalable video systems

Info

Publication number
WO2014050009A1
Authority
WO
WIPO (PCT)
Prior art keywords
picture
index
electronic device
layer
picture processing
Prior art date
Application number
PCT/JP2013/005458
Other languages
English (en)
Inventor
Kiran Mukesh MISRA
Christopher Andrew Segall
Jie Zhao
Original Assignee
Sharp Kabushiki Kaisha
Priority date
Filing date
Publication date
Priority claimed from US13/631,857 (published as US20140092971A1)
Application filed by Sharp Kabushiki Kaisha
Publication of WO2014050009A1

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 - Motion estimation or motion compensation
    • H04N19/573 - Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 - Motion estimation or motion compensation
    • H04N19/513 - Processing of motion vectors
    • H04N19/517 - Processing of motion vectors by encoding
    • H04N19/52 - Processing of motion vectors by encoding by predictive encoding

Definitions

  • the present disclosure relates generally to electronic devices. More specifically, the present disclosure relates to electronic devices for coding scalable video.
  • HEVC refers to High Efficiency Video Coding.
  • a picture is an array of luma samples in monochrome format, or an array of luma samples and two corresponding arrays of chroma samples in 4:2:0, 4:2:2, or 4:4:4 colour format.
  • a coding block is an NxN block of samples for some value of N.
  • the division of a coding tree block into coding blocks is a partitioning.
  • a coding tree block is an NxN block of samples for some value of N.
  • the division of one of the arrays that compose a picture that has three sample arrays, or of the array that composes a picture in monochrome format or a picture that is coded using three separate colour planes, into coding tree blocks is a partitioning.
  • a coding tree unit is a coding tree block of luma samples, two corresponding coding tree blocks of chroma samples of a picture that has three sample arrays, or a coding tree block of samples of a monochrome picture or a picture that is coded using three separate colour planes and syntax structures used to code the samples.
  • the division of a slice into coding tree units is a partitioning.
  • a coding unit is a coding block of luma samples, two corresponding coding blocks of chroma samples of a picture that has three sample arrays, or a coding block of samples of a monochrome picture or a picture that is coded using three separate colour planes and syntax structures used to code the samples.
  • the division of a coding tree unit into coding units is a partitioning.
  • Prediction is defined as an embodiment of the prediction process.
  • a prediction block is a rectangular MxN block on which the same prediction is applied.
  • the division of a coding block into prediction blocks is a partitioning.
  • a prediction process is the use of a predictor to provide an estimate of the data element (e.g. sample value or motion vector) currently being decoded.
  • a prediction unit is a prediction block of luma samples, two corresponding prediction blocks of chroma samples of a picture that has three sample arrays, or a prediction block of samples of a monochrome picture or a picture that is coded using three separate colour planes and syntax structures used to predict the prediction block samples.
  • a predictor is a combination of specified values or previously decoded data elements (e.g. sample value or motion vector) used in the decoding process of subsequent data elements.
  • a tile is an integer number of coding tree blocks co-occurring in one column and one row, ordered consecutively in coding tree block raster scan of the tile.
  • the division of each picture into tiles is a partitioning. Tiles in a picture are ordered consecutively in tile raster scan of the picture.
  • a tile scan is a specific sequential ordering of coding tree blocks partitioning a picture.
  • the tile scan order traverses the coding tree blocks in coding tree block raster scan within a tile and traverses tiles in tile raster scan within a picture.
  • a slice contains coding tree blocks that are consecutive in coding tree block raster scan of a tile; these coding tree blocks are not necessarily consecutive in coding tree block raster scan of the picture.
  • a slice is an integer number of coding tree blocks ordered consecutively in the tile scan.
  • the division of each picture into slices is a partitioning.
  • the coding tree block addresses are derived from the first coding tree block address in a slice (as represented in the slice header).
  • a B slice or a bi-predictive slice is a slice that may be decoded using intra prediction or inter prediction using at most two motion vectors and reference indices to predict the sample values of each block.
  • a P slice or a predictive slice is a slice that may be decoded using intra prediction or inter prediction using at most one motion vector and reference index to predict the sample values of each block.
  • a reference picture list is a list of reference pictures that is used for uni-prediction of a P or B slice.
  • a reference picture list 0 is a reference picture list used for inter prediction of a P or B slice. All inter prediction used for P slices uses reference picture list 0.
  • Reference picture list 0 is one of two reference picture lists used for bi-prediction for a B slice, with the other being reference picture list 1.
  • a reference picture list 1 is a reference picture list used for bi-prediction of a B slice.
  • Reference picture list 1 is one of two reference picture lists used for bi-prediction for a B slice, with the other being reference picture list 0.
  • a reference index is an index into a reference picture list.
  • a picture order count is a variable that is associated with each picture that indicates the position of the associated picture in output order relative to the output order positions of the other pictures in the same coded video sequence.
  • a long-term reference picture is a 'picture' that is marked as "used for long-term reference".
  • a picture is first partitioned into smaller collections of pixels.
  • each such collection of pixels is referred to as a prediction unit.
  • a video encoder then performs a search in previously transmitted pictures for a collection of pixels which is closest to the current prediction unit under consideration.
  • the encoder instructs the decoder to use this closest collection of pixels as an initial estimate for the current prediction unit. It may then transmit residue information to improve this estimate.
  • the instruction to use an initial estimate is conveyed to the decoder by means of a signal that contains a pointer to this collection of pixels in the reference picture. More specifically, the pointer information contains an index into a list of reference pictures which is called the reference index and the spatial displacement vector (or motion vector) with respect to the current prediction unit.
  • the spatial displacement vector is not an integer value, and as such, the initial estimate corresponds to a representation of the collection of pixels.
  • an encoder may alternatively identify two collections of pixels in one or more reference pictures and instruct the decoder to use a linear combination of the two collections of pixels as an initial estimate of the current prediction unit. An encoder will then need to transmit two corresponding pointers to the decoder, each containing a reference index into a list and a motion vector. In general, a linear combination of one or more collections of pixels in previously decoded pictures is used to exploit the temporal correlation in a video sequence.
  • an encoder transmits an indicator to the decoder. In HEVC this indicator is called the inter-prediction mode. Using this motion information, a decoder may construct an initial estimate of the prediction unit under consideration.
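  • For illustration, a minimal sketch (in Python, with hypothetical names) of how a decoder might form this initial estimate from the signaled motion information; real codecs also perform sub-pel interpolation and weighted averaging, which are omitted here:

```python
import numpy as np

def initial_estimate(ref_lists, motion_info, x, y, w, h):
    # motion_info: one (list_idx, ref_idx, (mvx, mvy)) entry for uni-prediction,
    # two entries for bi-prediction. Integer-pel, in-bounds motion is assumed.
    blocks = []
    for list_idx, ref_idx, (mvx, mvy) in motion_info:
        ref = ref_lists[list_idx][ref_idx]   # reference picture pointed to
        blocks.append(ref[y + mvy : y + mvy + h, x + mvx : x + mvx + w])
    # Uni-prediction uses the single block; bi-prediction uses a linear
    # combination (here an equal-weight average) of the two blocks.
    return np.mean(blocks, axis=0)
```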
  • the motion information assigned to each prediction unit within HEVC consists of the following three pieces of information:
  • list 0 is a first list of reference pictures
  • list 1 is a second list of reference pictures, which may have the same combination or a different combination of values as the first list.
  • motion information carried by prediction units are spatially correlated, i.e. a prediction unit will carry the same or similar motion information as the spatially neighboring prediction units. For example a large object like a bus undergoing translational motion within a video sequence and spanning across several prediction units in a picture/frame will typically contain several prediction units carrying the same motion information. This type of correlation is also observed in co-located prediction units of previously decoded pictures. Often it is bit-efficient for the encoder to instruct the decoder to copy the motion information from one of these spatial or temporal neighbors. In HEVC, this process of copying motion information may be referred to as the merge mode of signaling motion information.
  • the motion vector may be spatially and/or temporally correlated, but there exist pictures other than the ones pointed to by the spatial/temporal neighbors which carry higher quality pixel reconstructions corresponding to the prediction unit under consideration.
  • the encoder explicitly signals all the motion information except the motion vector information to the decoder.
  • the encoder instructs the decoder to use one of the neighboring spatial/temporal motion vectors as an initial estimate and then sends a refinement motion vector delta to the decoder.
  • a merge flag is transmitted in the bitstream to indicate that the signaling mechanism used for motion information is based on the merging process.
  • in the merge mode, a list of up to five candidates is constructed. The first set of candidates is constructed using spatial and temporal neighbors. The spatial and temporal candidates are followed by various bi-directional combinations of the candidates added so far. Zero motion vector candidates are then added following the bi-directional motion information.
  • Each of the five candidates contains all three pieces of motion information required by a prediction unit: inter-prediction mode, reference indices and motion vector. If the merge flag is true a merge index is signaled to indicate which candidate motion information from the merge list is to be used by all the prediction units within the coding unit.
  • if the merge flag is true, a merge index into the merge list is signaled for a prediction unit using the merge mode. This merge index uniquely identifies the motion information to be used for the prediction unit.
  • a prediction unit may explicitly receive the inter-prediction mode and reference indices in the bitstream.
  • the inter-prediction mode may not be received and may instead be inferred from data received earlier in the bitstream, for example the slice type.
  • an MVP list is a list of two motion vector predictors.
  • An index into this list identifies the predictor to use.
  • the prediction unit receives a motion vector delta. The sum of the predictor identified using the index into the MVP list and the received motion vector delta (also called motion vector difference) gives the motion vector associated with the prediction unit.
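  • A minimal sketch of this reconstruction, assuming motion vectors are represented as (x, y) tuples:

```python
def reconstruct_mv(mvp_list, mvp_index, mv_delta):
    # The signaled index selects one of the two predictors in the MVP list;
    # adding the received motion vector difference yields the motion vector.
    px, py = mvp_list[mvp_index]
    dx, dy = mv_delta
    return (px + dx, py + dy)
```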
  • Scalable video coding is known.
  • a primary bit stream (called the base layer bitstream) is received by a decoder.
  • the decoder may receive one or more secondary bitstream(s) (called enhancement layer bitstreams(s)).
  • the function of each enhancement layer bitstream may be: to improve the quality of the base layer bitstream; to improve the frame rate of the base layer bitstream; or to improve the pixel resolution of the base layer bitstream.
  • Quality scalability is also referred to as Signal-to-Noise Ratio (SNR) scalability.
  • Frame rate scalability is also referred to as temporal scalability.
  • Resolution scalability is also referred to as spatial scalability.
  • Enhancement layer bitstream(s) can change other features of the base layer bitstream.
  • an enhancement layer bitstream can be associated with a different aspect ratio and/or viewing angle than the base layer bitstream.
  • Another aspect of enhancement layer bitstreams is that it is also possible that the base layer bitstream and an enhancement layer bitstream correspond to different video coding standards, e.g. the base layer bitstream may be coded according to a first video coding standard and an enhancement layer bitstream may be coded according to a second different video coding standard.
  • An ordering may be defined between layers. For example:
  • enhancement layer(s) may have dependency on one another (in addition to the base layer).
  • enhancement layer 2 is usable only if at least a portion of enhancement layer 1 has been parsed and/or reconstructed successfully (and if at least a portion of the base layer has been parsed and/or reconstructed successfully).
  • FIG. 1A illustrates a decoding process for a scalable video decoder with two enhancement layers.
  • a base layer decoder outputs decoded base layer pictures.
  • the base layer decoder also provides metadata, e.g. motion vectors, and/or picture data, e.g. pixel data, to inter layer processing 0.
  • Inter layer processing 0 provides an inter layer prediction to the enhancement layer 0 decoder, which in turn outputs decoded enhancement layer 0 pictures.
  • the decoded enhancement layer 0 pictures have a quality improvement with respect to decoded base layer pictures.
  • Enhancement layer 0 decoder also provides metadata and/or picture data to inter layer processing 1.
  • Inter layer processing 1 provides an inter layer prediction to the enhancement layer 1 decoder, which in turn outputs decoded enhancement layer 1 pictures.
  • decoded enhancement layer 1 pictures have increased spatial resolution as compared to decoded enhancement layer 0 pictures.
  • Prediction may be by uni-prediction or bi-prediction; in the latter case there will be two reference indices and a motion vector for each reference index.
  • FIG. 1B illustrates uni-prediction according to HEVC
  • FIG. 1C illustrates bi-prediction according to HEVC.
  • Transmission to a decoder, e.g. transmission over a network to the decoder, according to known schemes consumes bandwidth, e.g. network bandwidth.
  • the bandwidth consumed by the transmission to the decoder according to these known schemes is too high for some applications.
  • One embodiment of the present invention discloses a system, comprising: an electronic device of a decoder, the electronic device configured to: receive a first layer bitstream and a second enhancement layer bitstream corresponding to the first layer bitstream; obtain a reference index for recovering an enhancement layer picture; determine whether a reference picture pointed to by the obtained reference index is a first layer picture representation that is different than a first layer picture; responsive to determining that the reference picture is the first layer picture representation that is different than the first layer picture, recover a picture processing index; and responsive to recovering the picture processing index, recover the enhancement layer picture and store the recovered enhancement layer picture in a memory device.
  • Another embodiment of the present invention discloses a method, comprising: receiving a first layer bitstream and a second enhancement layer bitstream corresponding to the first layer bitstream; obtaining a reference index for recovering an enhancement layer picture; determining whether a reference picture pointed to by the obtained reference index is a first layer picture representation that is different than a first layer picture; responsive to determining that the reference picture is the first layer picture representation that is different than the first layer picture, recovering a picture processing index; and responsive to recovering the picture processing index, recovering the enhancement layer picture and storing the recovered enhancement layer picture in a memory device.
  • FIG. 1A is a block diagram of a scalable decoder.
  • FIG. 1B illustrates uni-prediction according to HEVC.
  • FIG. 1C illustrates bi-prediction according to HEVC.
  • FIG. 2A is a block diagram illustrating an example of an encoder and a decoder.
  • FIG. 2B is a block diagram illustrating an example of the decoder of FIG. 2A.
  • FIG. 3A is a flow diagram illustrating one configuration of a method for determining a mode for signaling motion information on an electronic device.
  • FIG. 3B is a flow diagram illustrating one configuration of a merge process on an electronic device.
  • FIG. 3C is a flow diagram illustrating one configuration of an explicit motion information transmission process on an electronic device.
  • FIG. 3D is a flow diagram illustrating one configuration of signaling a reference index and a motion vector on an electronic device.
  • FIG. 4A is a flow diagram illustrating one configuration of merge list construction on an electronic device.
  • FIG. 4B is a flow diagram illustrating more of the configuration of merge list construction of FIG. 4A.
  • FIG. 5 illustrates a plurality of prediction units.
  • FIG. 6A is a flow diagram illustrating one configuration of motion vector predictor list construction on an electronic device.
  • FIG. 6B is a flow diagram illustrating more of the configuration of motion vector predictor list construction of FIG. 6A.
  • FIG. 7A contains flow diagrams illustrating an example of processes A and B that may be used for motion vector predictor list construction (FIGS. 6A-C and 7A-C).
  • FIG. 7B illustrates another example of process B from FIG. 7A.
  • FIG. 8 is a diagram to illustrate a second layer picture, a co-located first layer picture, and a first layer reference picture for the co-located first layer picture.
  • FIG. 9 is a block diagram to illustrate processing an output of a picture processing in the difference domain.
  • FIG. 10 illustrates an example where the decoder may receive a picture processing index if a coding unit level flag is set to true.
  • FIG. 11 illustrates an example where the decoder may receive a flag to indicate that only first layer pictures are used for reference when predicting the second layer pixel data. The first layer pictures are to be used along with a known motion vector.
  • FIG. 12 illustrates a list containing picture processing indices.
  • FIG. 13A illustrates a list containing picture indices and their corresponding list indices.
  • FIG. 13B is a block diagram to illustrate an example process of obtaining a picture processing index from a list index.
  • FIG. 14 is a block diagram to illustrate an example process where an input picture with height h0 and width w0 is processed by two cascaded 1-D up-samplers. The output is a picture with height h1 and width w1. The dimensions of the input and output picture may or may not be the same.
  • FIG. 15 is a block diagram to illustrate an example process where an input picture with height h0 and width w0 is processed by four cascaded picture processors corresponding to 1-D horizontal up-sampler, 1-D horizontal bilateral filter, 1-D vertical up-sampler and 1-D vertical bilateral filter.
  • the output is a picture with height h1 and width w1.
  • the dimensions of the input and output picture may or may not be the same.
  • FIG. 16 is a block diagram to illustrate an example process where an input picture with height h0 and width w0 is processed by two cascaded stages. Each stage consists of three picture processors. In the first stage there is a 1-D horizontal up-sampler, a 1-D horizontal edge-adaptive filter and a 1-D horizontal edge-adaptive filter for full-pel positions.
  • the output is a picture with height h1 and width w1.
  • the dimensions of the input and output picture may or may not be the same.
  • FIG. 2A is a block diagram illustrating an example of an encoder and a decoder.
  • the system 200 includes an encoder 211 to generate bitstreams to be decoded by a decoder 212.
  • the encoder 211 and the decoder 212 may communicate over a network.
  • the decoder 212 includes an electronic device 222 configured to decode using some or all of the processes described with reference to the flow diagrams.
  • the electronic device 222 may comprise a processor and memory in electronic communication with the processor, where the memory stores instructions being executable to perform the operations shown in the flow diagrams.
  • the encoder 211 includes an electronic device 221 configured to encode video data to be decoded by the decoder 212.
  • the electronic device 221 may be configured to signal the electronic device 222 a picture processing index corresponding to one or more picture processors, e.g. upsamplers, filters, or the like, or any combination thereof.
  • the picture processing index may associate a particular picture processor of a set of picture processors available to the decoder 212 with a unit, e.g. a coding unit or a prediction unit of a coding unit.
  • the picture processing index may be associated with a coding unit (for skip mode) and/or a prediction unit (for merge mode and/or explicit transmission mode).
  • the electronic device 221 may be configured to signal the picture processing index using skip mode, merge process, and/or selective explicit signaling.
  • the electronic device 221 may be configured to transmit the picture processing index for only selected ones of reference pictures.
  • the picture processing index is transmitted for a reference picture that is a 'representation' of a first layer picture (that is different than the first layer picture).
  • the picture processing index may not be transmitted.
  • the electronic device 221 may be configured to transmit the picture processing index for only selected ones of reference pictures.
  • the first picture processing index is transmitted for a reference picture that is a representation of a first layer picture and a second picture processing index is transmitted for a representation of the first layer picture temporally co-located with the current second layer picture.
  • the picture processing index may not be transmitted for a representation of a first layer picture.
  • the picture processing index may not be transmitted for a representation of the first layer picture temporally co-located with the current second layer picture.
  • a temporally co-located first layer picture is the first layer picture which is at the same time instance as the second layer picture.
  • the electronic device 221 may be configured to transmit the picture processing index for a coding unit or prediction unit when a flag indicating the presence of the picture processing index is set to a value known to both the encoder and decoder.
  • the flag indicating the presence of the picture processing index is sent in a coding unit.
  • the flag indicating the presence of the picture processing index is sent in a prediction unit.
  • the flag indicating the presence of the picture processing index is called an IntraBL flag.
  • FIG. 10 illustrates an example where the decoder 212 receives a coding unit level intraBL flag 1010. If the received intraBL flag is set to true, then the decoder 212 receives a picture processing index 1030.
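  • A sketch of this conditional parse; the bitstream accessors read_flag and read_index are hypothetical stand-ins:

```python
def parse_pp_index(bitstream):
    # The picture processing index 1030 is present only when the coding unit
    # level intraBL flag 1010 is set to true.
    if bitstream.read_flag("intra_bl_flag"):
        return bitstream.read_index("picture_processing_index")
    return None  # absent; it may be transmitted elsewhere or inferred
```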
  • the electronic device 222 may be configured to recover a picture processing index for only a portion of the prediction units. In an example, the electronic device 222 may be configured to infer some or all of a picture processing index for a prediction unit from a neighbor prediction unit. Therefore, for some prediction units, the picture processing index may not be transmitted completely or at all.
  • FIG. 2B is a block diagram illustrating an example of the decoder 212 of FIG. 2A.
  • the decoder 212 receives a first layer bitstream, e.g. a base layer bitstream or an enhancement layer bitstream, and a second layer bitstream, e.g. an enhancement layer bitstream that is dependent on the first layer bitstream.
  • the metadata output by the first layer decoder may include a picture processing index 232 to define the picture processor 230, e.g. identify a particular picture processor from a set of picture processors available to the decoder 212.
  • the metadata carried by second layer bitstream may include a picture processing index to define the picture processor 230, e.g. identify a particular picture processor from a set of picture processors available to the decoder 212.
  • the picture processor 230 receives picture data, e.g. pixel data, from a decoded picture buffer of the first layer decoder.
  • the decoded picture buffer may represent pixel data obtained from partially decoded pictures.
  • the picture processor 230 generates the representation 231 based on the input picture data, and provides the representation 231 to a decoded picture buffer of the second layer decoder.
  • the representation 231 may have a different spatial resolution than the input pixel data, e.g. higher spatial resolution when the picture processing comprises upsampling.
  • the representation 231 may have the same spatial resolution as the input pixel data.
  • an input picture with height h0 and width w0 is received 1405 and processed by two cascaded picture processors corresponding to a 1-D horizontal up-sampler 1410 and 1-D vertical up-sampler 1420.
  • the output is a picture with height h1 and width w1 1425.
  • the output picture may correspond to 231 in FIG. 2B.
  • FIG. 14 may represent an up-sampling process.
  • the horizontal and vertical up-sampling stages may be interchanged.
  • a configuration where up-sampling is carried out using two separable 1-D up-sampling stages is referred to as a two dimensional (2-D) separable up-sampler.
  • the input picture may be the base layer decoded picture at lower resolution having height h0 and width w0.
  • the output of the first horizontal up-sampling stage 1410 is a picture up-sampled in the horizontal direction and has height h0 and width w1.
  • the input picture is the output of the horizontal up-sampling stage, and the output is the fully up-sampled picture with height h1 and width w1 1425.
  • the output of both the horizontal and vertical up-sampling stages contains pixels which can be classified into two categories.
  • the first category of output pixels consists of those which are directly copied from the input picture.
  • the second category of output pixels consists of those which are interpolated using pixels from the input picture.
  • An output pixel that is spatially collocated with a pixel in the input picture is referred to as a full-pel pixel.
  • a full-pel pixel is directly copied from the spatially collocated pixel in the input picture.
  • a full-pel pixel is computed as a weighted average of pixels in the input picture.
  • An output pixel in the up-sampled picture that is not spatially collocated with any pixels in the input picture is referred to as the interpolated pixel and is determined using a weighted average, or N-tap filtering, of input pixels.
  • Output pixels may be assigned an index called phase that represents the spatial location of the output pixels relative to the input pixel.
  • the tap values (or weight values) of the N-tap filter used to determine interpolated pixels are called the filter coefficients and are dependent on the phase of the output up-sampled pixel.
  • full-pel pixels may be assigned a phase of 0.
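  • A sketch of such a phase-dependent 1-D up-sampler and its separable 2-D application; the 2x ratio and the tap values in the example are illustrative assumptions, not the normative filters:

```python
import numpy as np

def upsample_1d(row, ratio, phase_filters):
    # phase_filters[p] holds the N-tap coefficients for output phase p;
    # phase 0 (full-pel) is a direct copy of the co-located input pixel.
    out = np.zeros(len(row) * ratio)
    for i in range(len(out)):
        phase, src = i % ratio, i // ratio
        if phase == 0:
            out[i] = row[src]                       # full-pel pixel: copied
        else:
            taps = phase_filters[phase]             # N-tap weighted average
            for k, c in enumerate(taps):
                j = min(max(src + k - (len(taps) - 1) // 2, 0), len(row) - 1)
                out[i] += c * row[j]
    return out

def upsample_2d_separable(pic, ratio, phase_filters):
    # Horizontal stage (h0 x w0 -> h0 x w1), then vertical (-> h1 x w1).
    tmp = np.stack([upsample_1d(r, ratio, phase_filters) for r in pic])
    return np.stack([upsample_1d(c, ratio, phase_filters) for c in tmp.T]).T

# Example: 2x up-sampling with an assumed 2-tap half-pel averaging filter.
up = upsample_2d_separable(np.arange(16.0).reshape(4, 4), 2, {1: [0.5, 0.5]})
```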
  • the picture processing index refers to picture processors within a set of picture processors for all components of the image, e.g. luma and chroma components. In an example, the picture processing index refers to picture processors within a set of picture processors for only a portion of the components of the image. In an example, a subset of luma and/or chroma components share the same picture processing index.
  • the picture processing index refers to a parameter (or set of parameters) that determine the operation of the picture processor.
  • the picture processor is an edge adaptive filter, such as a bi-lateral filter, sieve filter or other filter method that provides relatively more smoothing in image regions with low gradient properties.
  • the picture processing index may determine the degree of smoothing as a function of image gradient, where image gradient is determined using well known techniques such as determining the difference between a value corresponding to a first pixel and a value corresponding to a second pixel.
  • the picture processing index may determine the support of the picture processor, where support defines the collection of pixel locations used for filtering a current pixel location.
  • the picture processing index may determine both the support and degree of smoothing as a function of image gradient of the filter.
  • the picture processing index may consist of two index values or a single index that determines the support and degree of smoothing as a function of image gradient.
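  • A purely hypothetical illustration of such an index-to-parameter mapping; the table values below are invented for illustration only:

```python
# Each picture processing index selects the filter support and a
# smoothing-control parameter (e.g. sigma_R); values are hypothetical.
PP_PARAMS = {
    0: {"support": 3, "sigma_R": 4},
    1: {"support": 3, "sigma_R": 8},
    2: {"support": 5, "sigma_R": 4},
    3: {"support": 5, "sigma_R": 8},
}

def configure_processor(pp_index):
    p = PP_PARAMS[pp_index]
    return p["support"], p["sigma_R"]
```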
  • the picture processor is the bi-lateral filter given as Eq (1): Output(x,y) = sum( f(G(a,b), a, b, x, y) * Input(a,b) ) / sum( f(G(a,b), a, b, x, y) ), where:
  • Output(x,y) denotes the output of the pixel processor at location x and y
  • Input(a,b) denotes the input to the pixel processor at location a and b
  • G(a,b) denotes the gradient of the input pixel data at location a and b, for example, the intensity difference of input pixel at location a and b relative to the input pixel at location x and y is a form of gradient.
  • f is a function that maps the gradient information and spatial information to a weight value
  • sum is the summation operator over the values of a and b that form the support for location x and y.
  • x and y denote a horizontal and vertical location in a two-dimensional representation
  • a and b denote a horizontal and vertical location in a two-dimensional representation.
  • the function f consists of two parts.
  • the first part is a spatial kernel that maps the spatial distance between two pixel locations to a weight value.
  • a pixel with a large spatial distance from a current pixel will be assigned a smaller weight, while a pixel with a small spatial distance from a current pixel will be assigned a larger weight.
  • the second part is a gradient kernel that maps large gradient values to small weight values and small gradient values to large weight values.
  • the control parameter sigma_R is determined by the picture processing index.
  • the spatial kernel is the 5x5 matrix:
  • the function f is denoted as a look up table that provides a mapping between input and output values.
  • the lookup table is expressed as {16, 15, 12, 9, 6, 4, 2, 1, 0}, and the location in the lookup table is determined as (i>>2), where i>>2 denotes a right shift by two bits, which approximates dividing by 4.
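  • A sketch of this weight function f as the product of the two kernel parts; the 5x5 spatial kernel values are not reproduced here, so the spatial weight is passed in, and the gradient table is the example above:

```python
GRAD_LUT = [16, 15, 12, 9, 6, 4, 2, 1, 0]  # example gradient look-up table

def f(gradient, spatial_weight):
    # Gradient kernel part: the index (i >> 2) approximates dividing the
    # gradient by 4; larger gradients map to smaller weights.
    idx = min(abs(gradient) >> 2, len(GRAD_LUT) - 1)
    return spatial_weight * GRAD_LUT[idx]
```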
  • the first layer decoder and second layer decoder may have different spatial resolutions.
  • the first layer decoded picture may be up-sampled before being used as a reference by the second layer decoder.
  • a picture processor is applied to the first layer decoded picture before up-sampling.
  • an example picture processor is the edge-adaptive filter.
  • a picture processor is applied to the up-sampled first layer decoded picture.
  • an example picture processor is the edge-adaptive filter.
  • the picture processing operations may be combined with other operations within the up-sampler.
  • an input picture with height h0 and width w0 is received 1505 and processed by four cascaded picture processors corresponding to 1-D horizontal up-sampler 1510, 1-D horizontal edge-adaptive filter 1515, 1-D vertical up-sampler 1520 and 1-D vertical edge-adaptive filter 1525.
  • the output 1530 is a picture with height h1 and width w1.
  • the dimensions of the input and output picture may or may not be the same
  • the four cascaded picture processors together represent a 2-D separable edge adaptive up-sampling filter.
  • 1510 and 1515 together represent a 1-D horizontal edge adaptive up-sampling filter, followed by 1520 and 1525 which together represent a 1-D vertical edge adaptive up-sampling filter.
  • the order of 1510 and 1515 may be reversed, and the order of 1520 and 1525 may be reversed.
  • the ordering of the 1-D horizontal edge adaptive up-sampling filter and the 1-D vertical edge adaptive up-sampling filter may be interchanged.
  • the 1-D edge adaptive filter denotes the 2-D edge adaptive filter described above with the restriction that the values used as input to the edge adaptive filter must be collocated on the same row or column of the picture. (In other words, one of the two values of the location 2-tuple is a constant.) As a result the filter support region is one dimensional. A potential benefit of a separable filter over a 2-D filter is lower complexity.
  • a 1-D edge adaptive filter may also be easier to combine into the up-sampling process.
  • the 1-D edge adaptive filter is combined with the up-sampler as described below:
  • the 1-D horizontal up-sampling filter 1510 is applied to the input picture 1505.
  • the output of this process is a picture with height h0 and width w1
  • a 1-D horizontal edge adaptive filter 1515 is applied to the output of 1510.
  • a 1-D vertical up-sampling filter 1520 is applied to the output of 1515.
  • the output of this process is a picture with height h1 and width w1.
  • a 1-D vertical edge adaptive filter 1525 is applied to the output of 1520.
  • the output of this process 1530 is a picture with height h1 and width w1.
  • the edge adaptive filter may use both input pixels received in process 1505 and the interpolated output pixels determined in 1510 and 1520.
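  • The cascade itself is function composition; a minimal sketch, with the four stage functions assumed to be implementations of the 1-D processors described above:

```python
def cascade(pic, stages):
    # FIG. 15 order: 1-D horizontal up-sampler, 1-D horizontal edge-adaptive
    # filter, 1-D vertical up-sampler, 1-D vertical edge-adaptive filter.
    # Per the text, the order within each direction pair may be reversed.
    for stage in stages:
        pic = stage(pic)
    return pic
```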
  • the edge adaptive filter process may be implemented as a separate pass from the up-sampling.
  • the edge adaptive filter process may be combined into the up-sampling loop, but with some delay. This is illustrated in FIG. 16.
  • the 1-D horizontal edge adaptive filter (in process 1620) will need to wait for up-sampled pixels output by a horizontal up-sampler (in process 1610).
  • the edge-adaptive filter may need to wait for only a subset of up-sampled pixels thereby reducing the delay.
  • the edge adaptive filtering is done in the same pass as up-sampling. In FIG. 16, a subset of the pixels required to generate the horizontally up-sampled picture is generated as output (1622) of process 1620 and the remaining pixels are generated as output (1617) of process 1615.
  • Process 1615 represents a process where the edge-adaptive filtering operations are merged with the up-sampling operations.
  • a single-pass 1-D up-sampling and edge adaptive filter only uses input pixels and/or the pixels already generated in the past, i.e. causal pixels, so that the filtering operation can be done in the same pass as up-sampling without waiting for future up-sampled pixels. Referring to FIG. 16, for determining output pixels at full-pel positions, a 1-D horizontal edge-adaptive filtering process 1615 is used.
  • the 1-D up-sampler is not needed since the output pixels at full-pel positions for a 1-D up-sampler are just copies of the input pixels.
  • the output 1617 of the 1-D horizontal edge-adaptive filtering process 1615 is a subset of the required output pixels.
  • the output 1622 of process 1620 and output 1617 of process 1615 are multiplexed 1625 to generate a picture with height h0 and width w1. This picture is the output of the 1-D horizontal stage and serves as the input for the 1-D vertical stage.
  • the 1-D vertical stage contains three picture processors: a 1-D vertical up-sampler 1650, a 1-D vertical edge adaptive filter 1660 and a 1-D vertical edge-adaptive filter for full-pel positions 1655.
  • the 1-D vertical up-sampler 1650 generates the interpolated output pixels, which are further processed by the 1-D vertical edge-adaptive filter 1660.
  • the 1-D vertical edge-adaptive filter for full-pel positions 1655 generates the full-pel output pixels. These two subsets of output pixels are multiplexed 1665 to generate the output picture with height h1 and width w1.
  • the intermediate result may need to be downshifted to avoid data overflow.
  • the 1-D edge adaptive filter in 1620, 1615, 1660, 1655 may consist of a spatial kernel and a gradient kernel as shown in Eq (1), except that the supporting region is 1 dimensional, and the spatial kernel is also 1 dimensional.
  • a 3x1 spatial kernel with the values {4, 6, 3} is used.
  • the input pixel to be filtered gets a weight of 6 in this example, the input pixel to the left gets a weight of 4, and the input pixel to the right gets a weight of 3.
  • the spatial kernel may be dependent on the phase of the up-sampled pixels, where phase refers to the location of an upsampled pixel relative to the full-pel pixel. It may also depend on the color component being processed.
  • a received picture may contain a luma and chroma color component.
  • Some example 3x1 spatial kernels used by the 1-D edge adaptive filter are given below. Depending on the phase of the up-sampled pixel and the color channel, one of the spatial kernels shown below is used.
  • the gradient kernel may be implemented as a look-up-table using gradient as index. Smaller gradient gets larger weight.
  • An example gradient kernel is:
  • the gradient kernel can be quantized to reduce the size of the look-up-table and complexity.
  • the look-up-table below uses the minimum of (gradient>>2) and the size of the look-up-table as the index: ⁇ 16, 15, 12, 9, 6, 4, 2, 1, 0 ⁇ .
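  • Putting the example 3x1 spatial kernel and the quantized gradient table together, a sketch of 1-D edge-adaptive filtering of one pixel (border clamping and integer arithmetic are assumptions):

```python
SPATIAL_3x1 = [4, 6, 3]                    # left, center, right spatial weights
GRAD_LUT = [16, 15, 12, 9, 6, 4, 2, 1, 0]  # quantized gradient kernel

def edge_adaptive_1d(row, x):
    num = den = 0
    for k, sw in enumerate(SPATIAL_3x1):
        j = min(max(x + k - 1, 0), len(row) - 1)          # 1-D support on the row
        grad = abs(int(row[j]) - int(row[x]))             # gradient vs. center pixel
        gw = GRAD_LUT[min(grad >> 2, len(GRAD_LUT) - 1)]  # smaller gradient, larger weight
        w = sw * gw                                       # total weight = spatial * gradient
        num += w * int(row[j])
        den += w
    return num // den  # den > 0: the center pixel always contributes 6 * 16 = 96
```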
  • the product of spatial weight and gradient weight forms the total weight for an input pixel.
  • the spatial kernel and the gradient kernel can be combined into one look up table.
  • the 1D spatial kernel and the quantized gradient kernel above can be combined into:
  • the 1st array index is the spatial weight (0 to 6 in this example) looked up from the example 3x1 luma or chroma spatial kernel above; and the 2nd array index is the quantized gradient level, (gradient>>2) in this example.
  • for the center pixel, whose gradient is always 0 and whose spatial weight is 6 here, the combined weight is stored in combined_BLT_Kernel[6][0]; its value is 96 in this case.
  • This combined kernel may be scaled to other values.
  • Another example of the combined kernel is:
  • center pixel has a weight of 128, which can be implemented as shift when being multiplied with the input pixel value.
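  • A sketch of how such a combined kernel can be tabulated; the full tables are not reproduced in the text, but the construction below matches the one given entry, combined_BLT_Kernel[6][0] = 96:

```python
GRAD_LUT = [16, 15, 12, 9, 6, 4, 2, 1, 0]

# First index: spatial weight (0 to 6 here); second index: quantized gradient
# level (gradient >> 2). Each entry is spatial weight * gradient weight.
combined_BLT_Kernel = [[s * g for g in GRAD_LUT] for s in range(7)]

assert combined_BLT_Kernel[6][0] == 96  # center pixel: spatial 6 * gradient 16
```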
  • a smaller gradient table and a smaller kernel make the total number of reciprocals in the division operation of Output(x,y) shown in Eq (1) smaller, so that the division operation can also be implemented as a look-up table. This reduces the complexity of the edge adaptive filter.
  • Division i/j can then be implemented as i*LUT(j)>>N, where ">>" denotes a right shift by N bits.
  • the base of the division in Eq (1) is the sum weight of the 3 input pixels.
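  • A sketch of the look-up-table division; the precision N and the rounding of the reciprocal entries are assumptions:

```python
N = 16  # fixed-point precision of the reciprocal table (assumed)

# Only the possible sums of weights need entries; the small kernel and the
# quantized gradient table keep that count low.
LUT = [0] + [((1 << N) + j - 1) // j for j in range(1, 1025)]

def divide(i, j):
    return (i * LUT[j]) >> N  # approximates i // j; ">> N" shifts right N bits
```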
  • FIG. 3A is a flow diagram illustrating one configuration of a method for determining a mode for signaling motion information on an electronic device.
  • the electronic device 222 receives a skip flag, and in process 304 determines whether the skip flag is true.
  • Skip flags are transmitted for coding units (CUs). The skip flag signals that motion information is to be copied from a neighbor, skipping transmission of motion information for the CU. If the skip flag is true, then in process 305 the electronic device 222 performs the merge process for the CU (the merge process will be discussed in more detail with respect to FIG. 3B).
  • the electronic device 222 receives a prediction mode flag and a partition mode flag. These flags are transmitted for prediction units (PUs), which are components of the CU.
  • the electronic device 222 determines whether the prediction mode is intra. If the prediction mode is intra, then in process 311 the electronic device 222 performs intra decoding (no motion information is transmitted).
  • the electronic device 222 determines the number (n) of prediction units (PUs), i.e. nPUs (motion information may be transmitted in a plurality of units, namely PUs). Starting at N equals 0, the electronic device 222 in process 315 determines whether N is less than nPU. If N is less than nPU, then in process 317 the electronic device 222 receives a merge flag. In process 319, the electronic device 222 determines whether the merge flag is true. If the merge flag is true, then the electronic device 222 performs the merge process 305 'for the PU' (again, the merge process will be discussed in more detail with respect to FIG. 3B).
  • if the merge flag is not true, then in process 321 the electronic device 222 performs an explicit motion information transmission process for the PU (such process will be discussed in more detail with respect to FIG. 3C).
  • the process of FIG. 3A repeats as shown for a different N value.
  • FIG. 3B is a flow diagram illustrating one configuration of a merge process on an electronic device.
  • the electronic device 222 in process 325 constructs a merge list (merge list construction will be discussed in more detail with respect to FIGS. 4A-B). Still referring to FIG. 3B, in process 327, the electronic device 222 determines whether a number of merge candidates is greater than 1. If the number is not greater than 1, then the merge index equals 0. The electronic device 222 in process 335 copies, for the current unit, information (such as the inter-prediction mode [indicating whether uni-prediction or bi-prediction and which list], a recovered index such as a reference index and/or the optional picture processing index that will be described later in more detail, and a motion vector) for the candidate corresponding to merge index equals 0.
  • the electronic device 222 in process 337 receives the merge index.
  • the electronic device 222 in process 335 copies, for the current unit, information (such as the inter-prediction mode, at least one reference index, and at least one motion vector) for the candidate corresponding to the received merge index.
  • FIG. 3C is a flow diagram illustrating one configuration of an explicit motion information transmission process on an electronic device.
  • the electronic device 222 in process 351 receives an inter-prediction mode (again indicating whether uni-prediction or bi-prediction and which list). If the inter-prediction mode indicates that the current PU does not point to list 1, i.e. does not equal Pred_L1, then X equals 0 and the electronic device 222 in process 355 signals reference index and motion vector (such process will be discussed in more detail with respect to FIG. 3D).
  • in process 357, the electronic device 222 determines whether the inter-prediction mode indicates that the current PU does not point to list 0, i.e. does not equal Pred_L0; if so, then X equals 1 and the electronic device 222 in process 355 signals the reference index and motion vector (such process will be discussed in more detail with respect to FIG. 3D).
  • FIG. 3D is a flow diagram illustrating one configuration of signaling a reference index and a motion vector on an electronic device.
  • the electronic device 222 in process 375 determines whether the number of entries in list X is greater than 1. If the number of entries in list X is greater than 1, then in process 379 the electronic device 222 receives a list X reference index. If the number of entries in list X is not greater than 1, then in process 377 the list X reference index is equal to 0.
  • the electronic device 222 may be configured to perform process 378 indicated by the shaded diamond. In some examples, the electronic device 222 is not configured with process 378 (in such examples processing continues directly from process 377 or 379 to process 380 along dashed line 372). The optional process 378 will be described in more detail later in the section entitled "Conditional Transmission/Receiving of Picture Processing Index".
  • the electronic device 222 in process 380 receives a picture processing index.
  • the electronic device 222 determines in process 387 whether X is equal to 1 and, if so, whether a motion vector difference flag (indicating whether motion vector difference is zero) for list 1 is true. If the flag is not true, then the electronic device 222 in process 388 receives the motion vector difference. If the flag is true, then in process 390 the motion vector difference is zero.
  • the electronic device 222 in process 391 constructs a motion vector predictor list (motion vector predictor list construction will be discussed in more detail with reference to FIGS. 6A-B).
  • the electronic device 222 in process 397 receives a motion vector predictor flag.
  • FIG. 4A is a flow diagram illustrating one configuration of merge list construction on an electronic device.
  • the electronic device 222 in process 452 determines whether conditions corresponding to the left LB PU (FIG. 5, 505) are true.
  • the conditions are: is the left LB PU 505 available; whether the left LB PU 505 and a spatial neighboring PU are not in the same motion estimation region; and whether the left LB PU 505 does not belong to the same CU (as the current PU).
  • One criterion for availability is based on the partitioning of the picture (information of one partition may not be accessible for another partition).
  • Another criterion for availability is inter/intra (if intra, then there is no motion information available). If all conditions are true, then the electronic device 222 in process 454 adds motion information from the left LB PU 505 to the merge list.
  • the electronic device 222 in process 456 determines whether conditions corresponding to the above RT PU (FIG. 5, 509) are true.
  • the conditions include: is the above RT PU 509 available; whether the above RT PU and a spatial neighboring PU are not in the same motion estimation region; whether the above RT PU 509 does not belong to the same CU (as the current PU); and whether the above RT PU 509 and the left LB PU 505 do not have the same reference indices and motion vectors.
  • the electronic device 222 may be configured to check an additional condition indicated by the shaded diamond 457. In some examples, the electronic device 222 is not configured to check the additional condition (in such examples processing continues directly from process 456 to process 458 along dashed line 401). The additional condition indicated by optional process 457 will be described in more detail later in the section entitled "Merge List Construction Processes for Signaling Picture Processing".
  • the electronic device 222 in processes 460 and 464 makes 'similar' determinations for the above-right RT PU (FIG. 5, 511) and the left-bottom LB PU (FIG. 5, 507), respectively.
  • Additional conditions may be checked as indicated by optional diamonds 461 and 465 and dashed lines 402 and 403. Additional motion information may be added to the merge list in processes 462 and 466.
  • the electronic device 222 in process 468 determines whether the merge list size is less than 4, and whether conditions corresponding to the above-left LT PU (FIG. 5, 503) are true.
  • the conditions include: is the above-left LT PU 503 available; are the above-left LT PU 503 and a spatial neighboring PU not in the same motion estimation region; do the above-left LT PU 503 and a left PU (FIG. 5, 517) not have the same reference indices and motion vectors; and do the above-left LT PU 503 and an above PU 515 not have the same reference indices and motion vectors.
  • the electronic device 222 may be configured to check an additional condition indicated by the shaded diamond 469. In some examples, the electronic device 222 is not configured to check the additional condition (in such examples processing continues directly from process 468 to process 470 along dashed line 405).
  • the additional condition indicated by optional process 469 will be described in more detail later in the section entitled "Merge List Construction Processes for Signaling Picture Processing".
  • the electronic device 222 in process 470 adds motion information for the above-left LT PU 503 to the merge list.
  • the process continues to FIG. 4B as indicated by the letter "A".
  • FIG. 4B is a flow diagram illustrating more of the configuration of merge list construction of FIG. 4A.
  • the electronic device 222 in process 472 determines whether a temporal motion vector predictor flag (transmitted in an HEVC bitstream) is true. If the temporal motion vector predictor flag is not true, then the electronic device 222 in process 491, if space is available in the merge list, selectively adds bi-directional combinations of the candidates added so far, e.g. known candidates. The electronic device 222 in process 492, if space is available in the merge list, adds zero motion vectors pointing to different reference pictures.
  • the electronic device 222 may construct a candidate using a reference index from a spatial neighbor and a motion vector from a temporal neighbor.
  • the electronic device 222 in process 474 determines whether a left PU 517 is available. If the left PU 517 is available, then the electronic device 222 in process 476 determines whether the left PU 517 is a first, i.e. initial, PU in the CU.
  • the electronic device 222 in process 478 sets RefIdxTmp0 and RefIdxTmp1 to reference indices read from list 0 and list 1 of left PU 517 (if reference indices are invalid then 0 is used as a reference index).
  • the electronic device 222 in process 480 sets RefIdxTmp0 and RefIdxTmp1 to 0.
  • the electronic device 222 may be configured to perform some or all of processes 487, 488, 489, 490, and 495 indicated by the shaded boxes and/or diamonds. However, in some examples the electronic device 222 is not configured with processes 487, 488, 489, 490, and 495 (in such examples processing continues directly from process 480 or 478 to process 482 along the dashed line 410 or the dashed line 411, and also continues directly from process 486 to process 491 along dashed line 413).
  • the optional processes 487, 488, 489, 490, and 495 will be described in more detail later in the section entitled "Determining Pixel Processing Indices from Spatial Neighbors".
  • the electronic device 222 in process 482 fetches motion information belonging to the PU of a previously decoded picture in the current layer.
  • the electronic device 222 in process 484 scales motion vectors belonging to the PU of a previously decoded picture in the first layer using the fetched reference indices and RefIdxTmp0 and RefIdxTmp1.
  • the electronic device 222 in process 486 adds the motion information determined by RefIdxTmp0, RefIdxTmp1, and the scaled motion vectors to the merge list.
  • the previously decoded picture in the first layer is a picture temporally co-located, e.g. corresponding to the same time instance, with the current picture being coded.
  • the electronic device 222 may be configured to perform process 496 indicated by the shaded box. However, in some examples the electronic device 222 is not configured with process 496 (in such examples processing continues directly from 472 [no result] to process 491 along dashed line 412 and directly from process 486 to process 491 along the dashed line 413).
  • process 496 will be described in more detail later in the section entitled "Merge List Construction Processes for Signaling Picture Processing".
  • FIG. 6A is a flow diagram illustrating one configuration of motion vector predictor list construction on an electronic device.
  • the electronic device 222 in process 625 determines whether at least one of below-left LB PU (not shown) or left LB PU (FIG. 5, 505) is available.
  • the electronic device 222 in process 627 sets a variable addSMVP to true if at least one of such PUs are available.
  • the electronic device 222 in process 630 tries to add below-left LB PU motion vector predictor (MVP) using process A to MVP list. If not successful, then the electronic device 222 in process 634 tries adding left LB PU MVP using process A to MVP list. If not successful, then the electronic device 222 in process 637 tries adding below-left LB PU MVP using process B to MVP list. If not successful, then the electronic device 222 in process 640 tries adding left LB PU MVP using process B to MVP list. At least one of processes 632, 635, and 639 may be performed.
  • process A is configured to add a candidate MVP only if the reference picture of a neighboring PU and that of the current PU (i.e. the PU presently under consideration) are the same.
  • process B is a different process than process A.
  • process B is configured to scale the motion vector of a neighboring PU based on temporal distance and add the result as a candidate to the MVP list.
  • processes A and B operate as shown in FIG. 7A.
  • Process B of FIG. 7B accounts for the change in spatial resolution across layers. In such an event, the scaling of motion vectors is not only based on temporal distance, but also on the spatial resolutions of the first and second layer.
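  • A sketch of process-B style scaling; normative rounding and clipping are omitted, and the resolution ratios cover the FIG. 7B cross-layer case:

```python
def scale_mv(mv, poc_cur, poc_cur_ref, poc_nbr, poc_nbr_ref,
             res_ratio_x=1.0, res_ratio_y=1.0):
    # Scale a neighboring PU's motion vector by the ratio of temporal (POC)
    # distances; for cross-layer use, also scale by the spatial resolution
    # ratio between the second and first layers.
    td_cur = poc_cur - poc_cur_ref
    td_nbr = poc_nbr - poc_nbr_ref
    s = td_cur / td_nbr if td_nbr else 1.0
    mvx, mvy = mv
    return (mvx * s * res_ratio_x, mvy * s * res_ratio_y)
```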
  • the electronic device 222 in process 642 tries to add above-right RT PU MVP using process A to MVP list. If not successful, then the electronic device 222 in process 645 tries adding above RT PU MVP using process A to MVP list. If not successful, then the electronic device 222 in process 647 tries adding above-left LT PU MVP using process A to MVP list. At least one of processes 644, 646, and 648 may be performed.
  • the electronic device 222 in process 649 sets the value of a variable "added" to the same value as variable "addSMVP".
  • the electronic device 222 in process 650 sets the variable "added" to true if the MVP list is full.
  • the process continues to FIG. 6B as indicated by the letter "B".
  • FIG. 6B is a flow diagram illustrating more of the configuration of motion vector predictor list construction of FIG. 6A.
  • the electronic device 222 in process 651 determines whether the below-left LB or left LB PU 505 is available, i.e. determines whether the variable "added" is set to true. If not, then in process 652 the electronic device 222 tries adding the above-right RT PU MVP to the MVP list using process B. If not successful, then the electronic device 222 in process 656 tries adding the above RT PU MVP to the MVP list using process B. If not successful, then the electronic device 222 in process 660 tries adding the above-left LT PU MVP to the MVP list using process B. At least one of processes 654, 658, and 662 may be performed. The electronic device 222 in process 663 may remove any duplicate candidates in the MVP list.
  • the electronic device 222 in process 667 determines whether temporal motion vector predictor addition is allowed, e.g. determines whether the temporal motion vector predictor flag is true. If allowed, the electronic device 222 in process 668 fetches a motion vector belonging to the PU of a previously decoded picture in the current layer, and adds the fetched motion vector after scaling to MVP list. The electronic device 222 in process 664, if space is available in the MVP list, adds zero motion vectors to the MVP list.
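The following sketch illustrates the process A / process B distinction drawn above (FIGS. 6A, 6B, 7A, 7B): process A admits a neighbor's motion vector only when it points to the same reference picture as the current PU, while process B rescales it by the ratio of temporal distances, optionally combined with a cross-layer spatial-resolution ratio. The dict-based PU representation and the POC arithmetic are assumptions.

```python
# A sketch of the spatial MVP candidates of FIGS. 6A/6B. A neighbor is a dict
# carrying its motion vector and the POC of its reference picture; all names
# are assumptions.

def process_a(neighbor, cur_ref_poc, mvp_list):
    """Process A: admit the neighbor's MV only if it points to the same
    reference picture as the current PU."""
    if neighbor is None or neighbor["ref_poc"] != cur_ref_poc:
        return False
    mvp_list.append(neighbor["mv"])
    return True

def process_b(neighbor, cur_poc, cur_ref_poc, mvp_list, spatial_ratio=1.0):
    """Process B: scale the neighbor's MV by the ratio of temporal distances;
    with spatial scalability (FIG. 7B) also scale by the resolution ratio
    between the first and second layers."""
    if neighbor is None:
        return False
    td_neighbor = cur_poc - neighbor["ref_poc"]
    td_current = cur_poc - cur_ref_poc
    if td_neighbor == 0:
        return False
    s = (td_current / td_neighbor) * spatial_ratio
    mvp_list.append((round(neighbor["mv"][0] * s), round(neighbor["mv"][1] * s)))
    return True

def add_left_candidates(below_left, left, cur_poc, cur_ref_poc, mvp_list):
    """Processes 630-640: try A on below-left then left; fall back to B."""
    return (process_a(below_left, cur_ref_poc, mvp_list)
            or process_a(left, cur_ref_poc, mvp_list)
            or process_b(below_left, cur_poc, cur_ref_poc, mvp_list)
            or process_b(left, cur_poc, cur_ref_poc, mvp_list))

mvps = []
left = {"mv": (6, 2), "ref_poc": 4}
add_left_candidates(None, left, cur_poc=8, cur_ref_poc=6, mvp_list=mvps)
print(mvps)  # [(3, 1)] - scaled by (8-6)/(8-4) = 0.5
```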
  • the electronic device 222 in process 378 determines whether the reference index is pointing to a representation of the first layer picture. If the reference index is pointing to a representation of the first layer picture, the electronic device 222 in process 380 receives a picture processing index. As will be explained later in greater detail, even if process 380 is not performed (for example because of a no result in process 378), a picture processing index may still be inferred, for example, based on a picture processing index of neighbors (spatial and/or temporal).
  • the electronic device 222 receives a reference index.
  • the electronic device 222 is configured to determine whether a reference picture (determined using the reference index) is a representation of a first layer picture that is different than the first layer picture, e.g. is an upsampled first layer picture, a filtered first layer picture, or the like.
  • the electronic device 222 is configured to, responsive to determining that the reference picture is the representation of the first layer picture that is different than the first layer picture, receive a picture processing index. Otherwise, the picture processing index is not received.
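A minimal sketch of this conditional receipt follows, assuming a simple bitstream reader and an is-representation flag on each reference picture; both are illustrative stand-ins rather than the patent's actual decoder interfaces.

```python
# The picture processing index is parsed only when the reference index selects
# a representation (e.g. upsampled or filtered) of the first layer picture.

def parse_picture_processing_index(reader, ref_idx, ref_pic_list):
    ref_pic = ref_pic_list[ref_idx]
    if ref_pic["is_first_layer_representation"]:
        return reader.read_index()   # process 380: index is explicitly received
    return None                      # otherwise: not received; may be inferred

class Reader:
    def __init__(self, indices): self.indices = list(indices)
    def read_index(self): return self.indices.pop(0)

ref_pics = [{"is_first_layer_representation": True},
            {"is_first_layer_representation": False}]
print(parse_picture_processing_index(Reader([2]), 0, ref_pics))  # 2
print(parse_picture_processing_index(Reader([2]), 1, ref_pics))  # None
```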
  • the electronic device 222 may be configured to perform process 382 indicated by the shaded diamond. In some examples, the electronic device 222 is not configured with process 382 (in such examples processing continues directly from process 379 [yes path] along dashed line 373).
  • the optional process 382 will be described in more detail later in the section entitled "Conditional Transmission/Receiving of Picture Processing Index in Systems Carrying out Prediction in the Difference Domain".
  • the electronic device 222 is configured to determine whether the above RT PU (FIG. 5, 509) and left LB PU (FIG. 5, 505) do not have the same picture processing indices as indicated by diamond 457. If this condition and the conditions indicated by diamond 456 are all true, then the electronic device 222 performs previously described process 458. It should be appreciated that checking the conditions indicated in diamonds 456 and 457 may comprise any number of processes, e.g. a single process, two processes, one process for each condition, etc., and the disclosure is not limited in this respect.
  • the electronic device 222 is configured to determine whether the above RT PU (FIG. 5, 509) and above-right RT PU (FIG. 5, 511) do not have the same picture processing indices as indicated by diamond 461. If this condition and the conditions indicated by diamond 460 are all true, then the electronic device 222 performs previously described process 462. It should be appreciated that checking the conditions indicated in diamonds 460 and 461 may comprise any number of processes, e.g. a single process, two processes, one process for each condition, etc., and the disclosure is not limited in this respect.
  • the electronic device 222 is configured to determine whether the left-bottom LB PU (FIG. 5, 507) and left LB PU 505 do not have the same picture processing indices as indicated by diamond 465. If this condition and the conditions indicated by diamond 464 are all true, then the electronic device 222 performs previously described process 466. It should be appreciated that checking the conditions indicated in diamonds 464 and 465 may comprise any number of processes, e.g. a single process, two processes, one process for each condition, etc., and the disclosure is not limited in this respect.
  • the electronic device 222 is configured to determine whether the above-left LT PU (FIG. 5, 503) and above PU (FIG. 5, 515) do not have the same picture processing indices as indicated by diamond 469. If this condition and the conditions indicated by diamond 468 are all true, then the electronic device 222 performs previously described process 470. It should be appreciated that checking the conditions indicated in diamonds 468 and 469 may comprise any number of processes, e.g. a single process, two processes, one process for each condition, etc., and the disclosure is not limited in this respect.
  • the electronic device 222 in process 487 determines picture processing indices from spatial neighbors.
  • the electronic device 222 in process 490 sets processing index 0 and processing index 1 to processing indices read from list 0 and list 1 of left PU 517, and determines picture processing indices from spatial neighbors.
  • the electronic device 222 in process 488 determines whether the reference picture(s) pointed to by the reference index is a representation of a first layer picture. If so, then the electronic device 222 in process 489 replaces the reference index with the reference index of the second layer picture co-located with the first layer reference picture (referring to FIG. 8, R1 is that co-located second layer picture), and determines the picture processing index from spatial neighbors.
  • the electronic device 222 in process 495 adds to the motion information existing in the merge list the determined picture processing indices.
  • the process continues directly from process 495 to previously described process 491 along the dashed line 414.
  • the electronic device 222 may create a second merge list that contains the determined picture processing indices and does not contain motion information.
  • the electronic device 222 is configured to determine the number of times each picture processing index appears in a subset of neighbors. The list may then be ordered according to the determined number.
  • the electronic device 222 may be configured to select, as a representative picture processing index for the current PU or CU, a picture processing index corresponding to a predefined position in the ordering of the list, e.g. an initial position in the ordering that corresponds to the highest count.
  • the electronic device 222 is configured to select the picture processing index corresponding to the median count as the representative picture processing index for the current PU.
  • the spatial neighbors considered are Left LB PU, above RT PU, above-right RT PU, left bottom LB PU, and above left LT PU. If no neighbors are available, then a predetermined picture processing index may be chosen as the representative picture processing index for the current PU.
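A short sketch of one way the representative picture processing index could be chosen from these five spatial neighbors, using the highest-count (initial) position of the ordering and a predetermined default when no neighbor is available. DEFAULT_IDX and the most-frequent rule (rather than the median-count variant) are assumptions.

```python
from collections import Counter

DEFAULT_IDX = 0  # assumed predetermined fallback when no neighbor is available

def representative_index(neighbor_indices):
    """neighbor_indices: picture processing indices of the left LB, above RT,
    above-right RT, left-bottom LB, and above-left LT PUs (None if absent)."""
    available = [i for i in neighbor_indices if i is not None]
    if not available:
        return DEFAULT_IDX
    counts = Counter(available)
    # List ordered by count; the initial position (highest count) is selected.
    return counts.most_common(1)[0][0]

print(representative_index([1, 1, 3, None, 1]))   # 1
print(representative_index([None] * 5))           # 0 (predetermined default)
```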
  • the electronic device 222 in process 495 adds motion information corresponding to the reference index of the co-located first layer picture, picture processing indices not yet added to the merge list, and zero motion vectors.
  • motion information, picture processing indices, and zero motion vectors are added after selectively adding bi-directional combinations of the candidates added so far, e.g. known candidates.
  • the process continues directly from previously described process 486 to optional process 496 along the dashed line 415.
  • prediction may be carried out in a difference domain instead of the pixel domain.
  • the decoder 212 (FIG. 2) may include the components shown in FIG. 9.
  • a picture processor 901 receives the picture data from the first layer, and provides a representation of the same in the decoded processed picture buffer 902.
  • a comparator determines a difference between the representation and a picture from the buffer 902, and outputs a difference picture 905.
  • the representation 906 and the difference picture 905 are provided to the predictor.
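A sketch of the FIG. 9 data path follows, under the simplifying assumption of an identity picture processor; a real processor would upsample or filter the first layer picture before the comparator forms the difference picture 905.

```python
import numpy as np

# FIG. 9 data path: the picture processor produces a representation of the
# first layer picture, the comparator subtracts it from a picture in the
# decoded processed picture buffer 902, and the difference picture 905 plus
# the representation 906 feed the predictor. Array contents are made up.

def picture_processor(first_layer_pic):
    return first_layer_pic.astype(np.int16)    # upsampling/filtering would go here

first_layer_pic = np.array([[100, 102], [104, 106]], dtype=np.uint8)
buffer_pic = np.array([[101, 101], [105, 107]], dtype=np.int16)

representation = picture_processor(first_layer_pic)   # 906
difference_pic = buffer_pic - representation          # 905, comparator output
prediction_input = (representation, difference_pic)   # both go to the predictor
print(difference_pic)   # [[ 1 -1] [ 1  1]]
```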
  • the electronic device 222 in process 382 determines whether a difference coding mode is being used. If either the reference index is pointing to a representation of the first layer picture (379) or a difference coding mode is being used, then the electronic device 222 in process 380 receives the picture processing index. In an example, a difference coding flag indicates whether or not the difference coding mode is being used. In an example, the process continues directly from blocks 377/379 to diamond 382 (such an example does not include process 378).
  • the electronic device 222 generates a list based on picture processing indices in the neighborhood of a current prediction unit.
  • the electronic device 222 receives from the electronic device 221 an index into this list for a unit.
  • the index into the list may be explicitly signaled.
  • the electronic device 222 is configured to determine the number of times each picture processing index appears in a subset of neighbors. The list may then be ordered according to the determined number.
  • the electronic device 222 may be configured to select, as a representative picture processing index for the current PU or CU, a picture processing index corresponding to a predefined position in the ordering of the list, e.g. an initial position in the ordering that corresponds to the highest count.
  • the electronic device 222 is configured to select the picture processing index corresponding to the median count as the representative picture processing index for the current PU.
  • the spatial neighbors considered are Left LB PU, above RT PU, above-right RT PU, left bottom LB PU, and above left LT PU. If no neighbors are available, then a predetermined picture processing index may be chosen as the representative picture processing index for the current PU.
  • a system may include an electronic device of a decoder, the electronic device configured to: receive a first layer bitstream and a second enhancement layer bitstream corresponding to the first layer bitstream; obtain a reference index for recovering an enhancement layer picture; determine whether a reference picture pointed to by the obtained reference index is a first layer picture representation that is different than a first layer picture; responsive to determining that the reference picture is the first layer picture representation that is different than the first layer picture, recover a picture processing index; and responsive to recovering the picture processing index, recover the enhancement layer picture and store the recovered enhancement layer picture in a memory device.
  • the picture processing index points to a picture processor of a set of picture processors.
  • the set of picture processors comprises at least one upsampler.
  • the set of picture processors comprises at least one filter.
  • the set of picture processors comprises at least one edge adaptive filter.
  • the set of picture processors comprises at least one combination of upsampler and edge adaptive filter.
  • the electronic device is further configured to: determine whether a difference coding mode is being used; and recover the picture processing index responsive to determining that the difference coding mode is being used.
  • the electronic device is further configured to: determine, for a selected Prediction Unit (PU), conditions including: is the selected PU available, are the selected PU and a spatial neighboring PU not in a same motion estimation region, do the selected PU and a previously selected PU have a same reference index and motion vector, and do the selected PU and a previously selected PU not have a same picture processing index; responsive to determining that the included conditions are all true, add motion information from the selected PU to a merge list.
  • the determined conditions further include: do the selected PU and a currently considered PU belong to a same Coding Unit (CU).
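Read literally, the conditions listed above admit a candidate that matches the previously selected PU in reference index and motion vector but differs from it in picture processing index, so the picture processing index keeps an otherwise-duplicate candidate useful. A minimal sketch of that test follows; the dict-based PUs and the motion estimation region identifiers are assumptions, and the further same-CU condition could be checked analogously.

```python
# A literal reading of the admission conditions above. PU objects are plain
# dicts; "mer_id" (motion estimation region) is an assumed identifier. A
# previously selected PU is required for the duplicate-versus-variant test.

def admit_to_merge_list(selected, spatial_neighbor, previous, merge_list):
    ok = (selected is not None                                  # available
          and selected["mer_id"] != spatial_neighbor["mer_id"]  # different MER
          and selected["ref_idx"] == previous["ref_idx"]        # same ref index
          and selected["mv"] == previous["mv"]                  # same MV
          and selected["pp_idx"] != previous["pp_idx"])         # different index
    if ok:
        merge_list.append({k: selected[k] for k in ("ref_idx", "mv", "pp_idx")})
    return ok

merge_list = []
previous = {"ref_idx": 0, "mv": (2, -1), "pp_idx": 0}
candidate = {"mer_id": 4, "ref_idx": 0, "mv": (2, -1), "pp_idx": 1}
print(admit_to_merge_list(candidate, {"mer_id": 9}, previous, merge_list))  # True
print(merge_list)  # [{'ref_idx': 0, 'mv': (2, -1), 'pp_idx': 1}]
```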
  • the electronic device is further configured to, responsive to determining that the reference picture is the first layer picture representation that is different than the first layer picture, replace a reference index in a merge list with a different reference index.
  • the electronic device is further configured to add motion information determined by the recovered picture processing index to a merge list.
  • the electronic device is further configured to add the recovered picture processing index to the merge list.
  • the electronic device is further configured to receive 1110 a flag 11A that determines that only first layer pictures are available for reference when predicting the second layer pixel data.
  • when the flag 11A is set to true 1120, it indicates the use of a known motion vector for prediction 1130.
  • the known motion vector is the zero motion vector.
  • the electronic device is further configured to receive the picture processing index 1140 after said flag.
  • the picture processing index is received conditionally after said flag is received with a known value.
  • the picture processing index is received only if said flag is equal to a true value 1120.
  • the picture processing index is not explicitly signaled but is inferred based on information previously received in the bitstream. Said flag may be received on a sequence, picture, slice, coding unit, prediction unit or other partitioning of the pixel data.
  • the electronic device is further configured to infer the value of the picture processing index.
  • where only one picture processor is available, the picture processing index is inferred to be equal to the index of that one picture processor.
  • the electronic device is further configured to receive a representation of the picture processing index that determines the distance between a known picture processing index and the received picture processing index.
  • the known picture processing index is a predicted picture processing index.
  • the known picture processing index is signaled in the picture parameter set, the sequence parameter set, the adaptation parameter set, the slice header, the picture parameter set extension, the sequence parameter set extension, the slice header extension, or any normative means of receiving information from an encoder by a decoder.
  • the predicted picture processing index may be conditionally signaled based on a layer identifier. A layer identifier may determine association with either a first layer or second layer.
  • distance is defined as the difference between the known picture processing index and the received picture processing index.
  • the picture processing indices are ordered by index value for determining the value of the difference.
  • the picture processing indices determine a control parameter of the picture processor, and the picture processing indices are ordered by the corresponding control parameter value to determine the value of the difference.
  • the picture processing indices determine multiple control parameters of the picture processor.
  • the difference may be determined for each control parameter value.
  • the received difference may consist of a first and a second difference value, indicating that the decoder should use the picture processor whose first and second control parameter values are obtained by adding the first and second difference values to the corresponding control parameter values of the known picture processing index.
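A sketch of the two-parameter case just described: the decoder adds the received difference values to the control parameter values of the known index and looks up the resulting processor. The parameter table is a made-up example.

```python
# Differential signaling over two control parameters. The table mapping
# picture processing indices to (first, second) parameter values is assumed.

PARAMS = {0: (1.0, 3), 1: (1.5, 3), 2: (1.0, 5), 3: (1.5, 5)}
INDEX_OF = {v: k for k, v in PARAMS.items()}

def decode_index(known_idx, first_diff, second_diff):
    """Combine the received differences with the control parameter values of
    the known picture processing index and look the result up."""
    p1, p2 = PARAMS[known_idx]
    return INDEX_OF[(p1 + first_diff, p2 + second_diff)]

# Known index 0 has parameters (1.0, 3); differences (0.5, 2) select index 3.
print(decode_index(0, 0.5, 2))  # 3
```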
  • the electronic device is configured to receive a representation of the pixel processing index indicating the distance (defined above) from a known picture processing index.
  • the known predicted picture processing index may be determined based on the picture processing index of the spatial and/or temporal neighbors of the pixel partition under consideration.
  • pixel partition may denote a coding unit, prediction unit, slice, or any normative method of partitioning.
  • picture processors at the decoder are ordered as a list based on the frequency of occurrence in the spatial and/or temporal neighborhood.
  • the picture processing index at a pre-determined location in the ordered list is chosen as the known picture processing index. In one example, this pre-determined location represents the top of the list 1210.
  • a pre-determined tie-breaking method is used to determine the ordering.
  • the pre-determined tie-breaking mechanism places a picture processing index first in the list if it belongs to a first preferred neighborhood amongst the neighborhoods under consideration.
  • the pre-determined tie-breaking mechanism places a picture processing index second in the list if it belongs to a second preferred neighborhood amongst the neighborhoods under consideration, and so on.
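The following sketch orders the neighborhood's picture processing indices by frequency of occurrence and breaks ties by neighborhood preference, then takes the top of the ordered list (1210) as the known picture processing index. The preference order (input listed from most preferred neighborhood to least) is an assumption.

```python
from collections import Counter

def ordered_index_list(neighborhoods):
    """neighborhoods: picture processing indices listed from the most
    preferred neighborhood to the least preferred one."""
    counts = Counter(neighborhoods)
    first_seen = {idx: pos for pos, idx in reversed(list(enumerate(neighborhoods)))}
    # Higher count first; on a tie, the index from the more preferred
    # neighborhood (smaller first-seen position) comes first.
    return sorted(counts, key=lambda i: (-counts[i], first_seen[i]))

neigh = [2, 1, 1, 2, 3]            # most preferred neighborhood listed first
ordered = ordered_index_list(neigh)
print(ordered)                      # [2, 1, 3] - index 2 wins the tie with 1
known_idx = ordered[0]              # pre-determined location: top of list (1210)
```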
  • a default picture processing index may be used as the predictor.
  • the default picture processing index may be signaled in the picture parameter set, the sequence parameter set, the adaptation parameter set, the slice header, the picture parameter set extension, the sequence parameter set extension, the slice header extension or any alternative suitable location.
  • the default picture processing index may be conditionally signaled based on a layer identifier.
  • a layer identifier may determine association with either enhancement or base layer.
  • the picture processing indices may be partitioned into more than one group, and each group has its own associated picture processing index predictor.
  • the electronic device is configured to receive a representation of the picture processing index as an index into a list.
  • the list may be determined based on the picture processing index of the spatial and/or temporal neighbors of the block under consideration.
  • the list represents an ordering based on the frequency of occurrence of picture processing index in the spatial and/or temporal neighborhood.
  • a pre-determined tie-breaking mechanism is used to enforce an ordering.
  • the pre-determined tie-breaking mechanism places a picture processing index first in the list if it belongs to a first preferred neighborhood amongst the neighborhoods under consideration.
  • the pre-determined tie-breaking mechanism places a picture processing index second in the list if it belongs to a second preferred neighborhood amongst the neighborhoods under consideration, and so on.
  • the electronic device is configured to receive a representation of the pixel processing index as an index into a list.
  • a first picture processing index 1310 (for example, which occurs the most frequently in the past picture, or slice, or a part of the bitstream) corresponds to the top of the list 1311.
  • the first picture processing index may be received in the picture parameter set, the sequence parameter set, the adaptation parameter set, the slice header, the picture parameter set extension, the sequence parameter set extension, the slice header extension, or by any normative method for receiving information from an encoder by a decoder.
  • the rest of the list corresponds to the remaining picture processing indices in a pre-determined ordering (for example, the second position 1321 in the list is occupied by the next smallest picture processing index remaining 1320, the third position 1331 in the list is occupied by the next smallest picture processing index remaining 1330, and so on).
  • the picture processing index may be obtained from the list index using the processing steps outlined in FIG. 13B.
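A sketch of the FIG. 13A/13B mapping under the stated ordering: the signaled first picture processing index 1310 sits at the top of the list 1311, and the remaining indices follow in ascending order (1320/1321, 1330/1331, and so on). NUM_PROCESSORS is an assumption.

```python
NUM_PROCESSORS = 4  # assumed size of the picture processor set

def build_list(first_idx):
    """First index at the top; remaining indices in ascending order."""
    rest = sorted(i for i in range(NUM_PROCESSORS) if i != first_idx)
    return [first_idx] + rest

def pp_index_from_list_index(list_idx, first_idx):
    """FIG. 13B direction: recover the picture processing index from the
    received list index."""
    return build_list(first_idx)[list_idx]

print(build_list(2))                    # [2, 0, 1, 3]
print(pp_index_from_list_index(1, 2))   # 0
```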
  • the picture processor is an up-sampler.
  • the table below represents a syntax table where the up-sampler is mapped to an index into a list. This index is sent conditionally within a coding unit (based on a layer identifier and coding unit level intraBL flag).
  • the index into the list is the syntax element cu_mapped_upsampler_idx[x0][y0].
  • cu_intra_bl_flag[ x0 ][ y0 ] specifies that the sample values corresponding to the coding unit are copied from an up-sampled version of the co-located base layer picture.
  • the array indices x0 , y0 specify the location ( x0 , y0 ) of the top-left luma sample of the considered prediction block relative to the top-left luma sample of the picture.
  • cu_mapped_upsampler_idx[ x0 ][ y0 ] specifies the mapped up-sampler index.
  • the array indices x0 , y0 specify the location ( x0 , y0 ) of the top-left luma sample of the considered prediction block relative to the top-left luma sample of the picture.
  • the up-sampler index may be obtained by means of a look-up table.
  • getLayerId() is a function which returns an identifier corresponding to the current layer. For the base layer, the function getLayerId() returns a zero value. For a non-base layer, the function getLayerId() returns a non-zero value.
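The syntax table referred to above did not survive extraction; the following Python sketch is a hedged reconstruction of the described condition only, namely that cu_mapped_upsampler_idx is parsed in a non-base layer when cu_intra_bl_flag is set. The reader interface and the look-up table contents are assumptions.

```python
# Conditional coding-unit parsing of cu_mapped_upsampler_idx, per the text:
# only when getLayerId() returns non-zero and cu_intra_bl_flag is true.

UPSAMPLER_LUT = {0: "8-tap", 1: "bilinear", 2: "edge-adaptive"}  # example only

def parse_coding_unit(reader, layer_id, x0, y0, cu):
    # x0, y0: top-left luma sample location (kept for parity with the syntax)
    if layer_id != 0:                                  # getLayerId() != 0
        cu["cu_intra_bl_flag"] = reader.parse_flag()
        if cu["cu_intra_bl_flag"]:
            idx = reader.parse_index()                 # cu_mapped_upsampler_idx
            cu["upsampler"] = UPSAMPLER_LUT[idx]       # look-up table mapping
    return cu

class Reader:
    def __init__(self, vals): self.vals = list(vals)
    def parse_flag(self): return self.vals.pop(0)
    def parse_index(self): return self.vals.pop(0)

print(parse_coding_unit(Reader([1, 2]), layer_id=1, x0=0, y0=0, cu={}))
# {'cu_intra_bl_flag': 1, 'upsampler': 'edge-adaptive'}
```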
  • transquant_bypass_enable_flag equal to 1 specifies that cu_transquant_bypass_flag is present.
  • transquant_bypass_enable_flag equal to 0 specifies that cu_transquant_bypass_flag is not present.
  • transquant_bypass_enable_flag is signaled in the picture parameter set.
  • cu_transquant_bypass_flag equal to 1 specifies that the scaling and transform process and the in-loop filter process are bypassed. When cu_transquant_bypass_flag is not present, it is inferred to be equal to 0.
  • cu_skip_flag[ x0 ][ y0 ] equal to 1 specifies that for the current coding unit, when decoding a P or B slice, no more syntax elements except the merging candidate index merge_idx[ x0 ][ y0 ] are parsed after cu_skip_flag[ x0 ][ y0 ].
  • cu_skip_flag[ x0 ][ y0 ] equal to 0 specifies that the coding unit is not skipped.
  • the array indices x0, y0 specify the location ( x0, y0 ) of the top-left luma sample of the considered coding block relative to the top-left luma sample of the picture.
  • the electronic device is configured to only use a subset of picture processors within a partitioning.
  • the list of picture processing indices corresponding to the picture processors that are to be used is received in the bitstream.
  • the list of picture processors may be signaled in the picture parameter set, the sequence parameter set, the adaptation parameter set, the slice header, the picture parameter set extension, the sequence parameter set extension, the slice header extension or any normative method for signaling from an encoder to a decoder.
  • the subset of picture processors may be further mapped to a second set of indices (for example, a set of indices which are more easily compressed).
  • a representation of the picture processing index is received that maps to an element in the second set of indices; this element is then signaled in the bitstream to identify the selected picture processor.
  • the subset of picture processing indices is mapped to a sequentially increasing second set.
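A sketch of this sequential remapping: the signaled subset of picture processing indices is renumbered 0, 1, 2, ... so that the smaller second-set index can be coded compactly. The subset values are examples.

```python
# Map a signaled subset of picture processing indices to a sequentially
# increasing second set, and back.

allowed = [3, 7, 12]                                  # subset received in the bitstream
to_second = {pp: i for i, pp in enumerate(allowed)}   # {3: 0, 7: 1, 12: 2}
from_second = dict(enumerate(allowed))                # inverse mapping at the decoder

signaled = to_second[7]                    # encoder side: send 1 instead of 7
print(signaled, from_second[signaled])     # 1 7
```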
  • the syntax table listed below is an example where the picture processors to be used are bilateral filters and they are signaled in the picture parameter set:
  • bilateral_filter_idx[ i ] specifies the bilateral filter to be used from a pre-determined set of bilateral filters.
  • a bilateral filter may in turn be mapped to a picture processing index.
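The bilateral-filter syntax table is likewise missing from this extraction; the sketch below shows, under assumed element names and an assumed count field, how a picture parameter set might carry the bilateral_filter_idx[ i ] list whose semantics are given next. It is a reconstruction of the described behavior, not the patent's actual table.

```python
# Hedged sketch of parsing a PPS that signals the bilateral filters to use.
# num_bilateral_filters is an assumed count element.

def parse_pps_bilateral_filters(reader):
    pps = {"pps_pic_parameter_set_id": reader.parse_index()}
    num = reader.parse_index()                     # assumed num_bilateral_filters
    pps["bilateral_filter_idx"] = [reader.parse_index() for _ in range(num)]
    return pps

class Reader:
    def __init__(self, vals): self.vals = list(vals)
    def parse_index(self): return self.vals.pop(0)

print(parse_pps_bilateral_filters(Reader([0, 2, 5, 9])))
# {'pps_pic_parameter_set_id': 0, 'bilateral_filter_idx': [5, 9]}
```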
  • pps_pic_parameter_set_id identifies the picture parameter set for reference by other syntax elements.
  • the value of pps_pic_parameter_set_id shall be in the range of 0 to 63, inclusive.
  • log2_parallel_merge_level_minus2 specifies the parallel processing level of merge/skip mode.
  • the value of log2_parallel_merge_level_minus2 shall be in the range of 0 to log2_min_luma_coding_block_size_minus3 + 1 + log2_diff_max_min_luma_coding_block_size, inclusive.
  • num_extra_slice_header_bits equal to 0 specifies that no extra slice header bits are present in the slice header RBSP for coded pictures referring to the picture parameter set.
  • num_extra_slice_header_bits shall be equal to 0 in bitstreams conforming to this version of this Specification. Other values for num_extra_slice_header_bits are reserved for future use by ITU-T | ISO/IEC.
  • the electronic device is configured to use a first subset of picture processors.
  • Control parameters for the first subset of picture processors are received in the bitstream.
  • the control parameters are received in the picture parameter set, the sequence parameter set, the adaptation parameter set, the slice header, the picture parameter set extension, the sequence parameter set extension, the slice header extension or by any normative method for receiving information from an encoder by a decoder.
  • the subset of picture processor parameters is further mapped to a second set of indices (for example, a set of indices which are more easily compressed). In such an example, the second set of indices is used to identify the picture processor selected.
  • the subset of picture processor parameters is mapped to a sequentially increasing second set.
  • the system and apparatus described above may use dedicated processor systems, microcontrollers, programmable logic devices, microprocessors, or any combination thereof, to perform some or all of the operations described herein. Some of the operations described above may be implemented in software and other operations may be implemented in hardware. One or more of the operations, processes, and/or methods described herein may be performed by an apparatus, a device, and/or a system substantially similar to those as described herein and with reference to the illustrated figures.
  • a processing device may execute instructions or "code" stored in memory.
  • the memory may store data as well.
  • the processing device may include, but may not be limited to, an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, or the like.
  • the processing device may be part of an integrated control system or system manager, or may be provided as a portable electronic device configured to interface with a networked system either locally or remotely via wireless transmission.
  • the processor memory may be integrated together with the processing device, for example RAM or FLASH memory disposed within an integrated circuit microprocessor or the like.
  • the memory may comprise an independent device, such as an external disk drive, a storage array, a portable FLASH key fob, or the like.
  • the memory and processing device may be operatively coupled together, or in communication with each other, for example by an I/O port, a network connection, or the like, and the processing device may read a file stored on the memory.
  • Associated memory may be "read only" by design (ROM), or by virtue of permission settings, or not.
  • Other examples of memory may include, but may not be limited to, WORM, EPROM, EEPROM, FLASH, or the like, which may be implemented in solid state semiconductor devices.
  • Other memories may comprise moving parts, such as a conventional rotating disk drive. All such memories may be "machine-readable" and may be readable by a processing device.
  • Operating instructions or commands may be implemented or embodied in tangible forms of stored computer software (also known as "computer program" or "code").
  • Programs, or code may be stored in a digital memory and may be read by the processing device.
  • “Computer-readable storage medium” (or alternatively, “machine-readable storage medium”) may include all of the foregoing types of memory, as well as new technologies of the future, as long as the memory may be capable of storing digital information in the nature of a computer program or other data, at least temporarily, and as long as the stored information may be “read” by an appropriate processing device.
  • the term “computer-readable” may not be limited to the historical usage of "computer” to imply a complete mainframe, mini-computer, desktop or even laptop computer.
  • “computer-readable” may comprise storage medium that may be readable by a processor, a processing device, or any computing system.
  • Such media may be any available media that may be locally and/or remotely accessible by a computer or a processor, and may include volatile and non-volatile media, and removable and non-removable media, or any combination thereof.
  • a program stored in a computer-readable storage medium may comprise a computer program product.
  • a storage medium may be used as a convenient means to store or transport a computer program.
  • the operations may be described as various interconnected or coupled functional blocks or diagrams. However, there may be cases where these functional blocks or diagrams may be equivalently aggregated into a single logic device, program or operation with unclear boundaries.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A system is provided that uses picture processing in a scalable video system. The system may comprise an electronic device configured to recover a picture processing index corresponding to one or more picture processors, for example upsamplers, filters, or the like, or any combination thereof. The picture processing index may associate a particular picture processor, of a set of picture processors accessible to the decoder, with a unit, for example a coding unit or a prediction unit of a coding unit.
PCT/JP2013/005458 2012-09-28 2013-09-13 Traitement d'image dans des systèmes vidéo échelonnables WO2014050009A1 (fr)

Applications Claiming Priority (10)

Application Number Priority Date Filing Date Title
US13/631,857 2012-09-28
US13/631,857 US20140092971A1 (en) 2012-09-28 2012-09-28 Picture processing in scalable video systems
US201361749817P 2013-01-07 2013-01-07
US61/749,817 2013-01-07
US201361773721P 2013-03-06 2013-03-06
US61/773,721 2013-03-06
US201361774507P 2013-03-07 2013-03-07
US61/774,507 2013-03-07
US201361806782P 2013-03-29 2013-03-29
US61/806,782 2013-03-29

Publications (1)

Publication Number Publication Date
WO2014050009A1 true WO2014050009A1 (fr) 2014-04-03

Family

ID=50387453

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2013/005458 WO2014050009A1 (fr) 2012-09-28 2013-09-13 Traitement d'image dans des systèmes vidéo échelonnables

Country Status (1)

Country Link
WO (1) WO2014050009A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11266457A (ja) * 1998-01-14 1999-09-28 Canon Inc 画像処理装置、方法、及び記録媒体
JP2004173246A (ja) * 2002-11-07 2004-06-17 Victor Co Of Japan Ltd 動画像時間軸階層符号化方法、符号化装置、復号化方法及び復号化装置並びにコンピュータプログラム
JP2008544708A (ja) * 2005-06-24 2008-12-04 株式会社エヌ・ティ・ティ・ドコモ 適応補間を使用する映像符号化及び復号化の方法と装置


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13840800

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13840800

Country of ref document: EP

Kind code of ref document: A1