WO2014006263A1 - A method and apparatus for scalable video coding - Google Patents

A method and apparatus for scalable video coding

Info

Publication number
WO2014006263A1
WO2014006263A1 (application PCT/FI2012/050701)
Authority
WO
WIPO (PCT)
Prior art keywords
video picture
pixel
macro block
pixels
picture
Prior art date
Application number
PCT/FI2012/050701
Other languages
French (fr)
Inventor
Jani Lainema
Kemal Ugur
Original Assignee
Nokia Corporation
Priority date
Filing date
Publication date
Application filed by Nokia Corporation filed Critical Nokia Corporation
Priority to PCT/FI2012/050701 priority Critical patent/WO2014006263A1/en
Publication of WO2014006263A1 publication Critical patent/WO2014006263A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/187: using adaptive coding characterised by the coding unit, the unit being a scalable video layer
    • H04N19/11: selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • H04N19/147: data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/176: using adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/30: using hierarchical techniques, e.g. scalability

Definitions

  • the present application relates to an apparatus and method for coding and decoding a video signal.
  • a video coding system may comprise an encoder that transforms an input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form.
  • the encoder may discard some information in the original video sequence in order to represent the video in a more compact form, for example, to enable the storage/transmission of the video information at a lower bitrate than otherwise might be needed.
  • Scalable video coding refers to a coding structure where one bitstream can contain multiple representations of the content at different bitrates, resolutions, frame rates and/or other types of scalability.
  • a scalable bitstream may consist of a base layer providing the lowest quality video available and one or more enhancement layers that enhance the video quality when received and decoded together with the lower layers.
  • the coded representation of that layer may depend on the lower layers.
  • Each layer together with all its dependent layers is one representation of the video signal at a certain spatial resolution, temporal resolution, quality level, and/or operation point of other types of scalability.
  • a method comprising: selecting a pixel from a first video picture; selecting a further pixel from the first video picture; selecting a pixel from a second video picture, wherein the position of the pixel from the second video picture is equivalent to the position of the selected pixel from the first video picture; selecting a further pixel from the second video picture, wherein the position of the further pixel from the second video picture is equivalent to the position of the selected further pixel from the first video picture; and predicting the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and on the values of the selected pixel and the selected further pixel in the second video picture.
  • Predicting the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and the values of the selected pixel and the selected further pixel from the second video picture may comprise predicting the value of the pixel from the first video picture by summing the value of the selected further pixel from the first video picture to the difference in values between the selected pixel and the selected further pixel from the second video picture.
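The summing rule above can be sketched as follows; the function and parameter names are illustrative assumptions, not taken from the patent:

```python
# A minimal sketch of the prediction rule described above: the enhancement-layer
# (EL) pixel is predicted as the decoded EL 'further' pixel plus the difference
# between the two co-located base-layer (BL) pixels. All names are assumptions.

def predict_el_pixel(el_further, bl_pixel, bl_further):
    """el_further -- decoded EL pixel at the 'further' position
    bl_pixel   -- reconstructed BL pixel co-located with the pixel being predicted
    bl_further -- reconstructed BL pixel co-located with el_further
    """
    return el_further + (bl_pixel - bl_further)

# Example: the EL neighbour is 120 and the BL shows a local step of +5
# (100 -> 105), so the predicted EL value is 125.
print(predict_el_pixel(120, 105, 100))  # -> 125
```

In other words, the base layer supplies the local gradient while the enhancement-layer neighbour supplies the absolute level.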
  • the further macro block of pixels of the first video picture may neighbour the macro block of pixels of the first video picture, and the further macro block of pixels of the second video picture may neighbour the macro block of pixels of the second video picture.
  • the method may further comprise generating a direction of prediction indicator indicating a position of the further macro block of pixels of the first video picture relative to the macro block of pixels of the first video picture and a position of the further macro block of pixels of the second video picture relative to the macro block of pixels of the second video picture.
  • the direction of prediction indicator may indicate that the further macro block of pixels of the first video picture may be horizontally positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture may also be horizontally positioned relative to the macro block of pixels of the second video picture.
  • the direction of prediction indicator may also indicate that the further macro block of pixels of the first video picture may be vertically positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture may also be vertically positioned relative to the macro block of pixels of the second video picture.
  • the first video picture may be associated with an enhancement level picture of a scalable video coder.
  • the second video picture may be associated with a reconstructed base layer picture of a scalable video coder.
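Applied block-wise, the same rule yields a whole predicted macro block once the neighbouring block is chosen. The sketch below is a hypothetical illustration (function and variable names are assumptions), with the neighbour taken from the left for horizontal prediction and from above for vertical prediction:

```python
# Hypothetical sketch of block-level prediction with a direction-of-prediction
# indicator: 'horizontal' predicts from the macro block to the left, 'vertical'
# from the macro block above, applying the EL-neighbour + BL-difference rule.

def predict_el_block(el, bl, x, y, size, direction):
    """Predict a size x size enhancement-layer block with top-left corner (x, y).

    el, bl -- decoded enhancement-layer / reconstructed base-layer pictures,
              as lists of rows of pixel values, with identical dimensions.
    """
    pred = [[0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            if direction == "horizontal":
                nr, nc = y + r, x - 1        # neighbour pixel in the block to the left
            else:                            # "vertical"
                nr, nc = y - 1, x + c        # neighbour pixel in the block above
            pred[r][c] = el[nr][nc] + (bl[y + r][x + c] - bl[nr][nc])
    return pred

el = [[10] * 4 for _ in range(4)]                       # flat EL neighbourhood
bl = [[4 * r + c for c in range(4)] for r in range(4)]  # BL with a gradient
print(predict_el_block(el, bl, 1, 1, 2, "horizontal"))  # -> [[11, 12], [11, 12]]
```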
  • an apparatus configured to: select a pixel from a first video picture; select a further pixel from the first video picture; select a pixel from a second video picture, wherein the position of the pixel from the second video picture is equivalent to the position of the selected pixel from the first video picture; select a further pixel from the second video picture, wherein the position of the further pixel from the second video picture is equivalent to the position of the selected further pixel from the first video picture; and predict the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and on the values of the selected pixel and the selected further pixel in the second video picture.
  • the apparatus configured to predict the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and the values of the selected pixel and the selected further pixel from the second video picture may be further configured to predict the value of the pixel from the first video picture by summing the value of the selected further pixel from the first video picture to the difference in values between the selected pixel and the selected further pixel from the second video picture.
  • the apparatus configured to select the pixel from the first video picture may be configured to select the pixel from a macro block of pixels of the first video picture.
  • the apparatus configured to select the further pixel from the first video picture may be configured to select the further pixel from a further macro block of pixels from the first video picture.
  • the apparatus configured to select the pixel from the second video picture may be configured to select the pixel from a macro block of pixels of the second video picture.
  • the apparatus configured to select the further pixel from the second video picture may be configured to select the further pixel from a further macro block of pixels of the second video picture.
  • the further macro block of pixels of the first video picture may neighbour the macro block of pixels of the first video picture, and the further macro block of pixels of the second video picture may neighbour the macro block of pixels of the second video picture.
  • the apparatus may be further configured to generate a direction of prediction indicator indicating a position of the further macro block of pixels of the first video picture relative to the macro block of pixels of the first video picture and a position of the further macro block of pixels of the second video picture relative to the macro block of pixels of the second video picture.
  • the direction of prediction indicator may indicate that the further macro block of pixels of the first video picture can be horizontally positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture can also be horizontally positioned relative to the macro block of pixels of the second video picture.
  • the direction of prediction indicator may also indicate that the further macro block of pixels of the first video picture can be vertically positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture can also be vertically positioned relative to the macro block of pixels of the second video picture.
  • the first video picture may be associated with an enhancement level picture of a scalable video coder.
  • the second video picture may be associated with a reconstructed base layer picture of a scalable video coder.
  • an apparatus comprising at least one processor and at least one memory including computer code for one or more programs, the at least one memory and the computer code configured with the at least one processor to cause the apparatus at least to: select a pixel from a first video picture; select a further pixel from the first video picture; select a pixel from a second video picture, wherein the position of the pixel from the second video picture is equivalent to the position of the selected pixel from the first video picture; select a further pixel from the second video picture, wherein the position of the further pixel from the second video picture is equivalent to the position of the selected further pixel from the first video picture; and predict the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and on the values of the selected pixel and the selected further pixel in the second video picture.
  • the apparatus caused to predict the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and the values of the selected pixel and the selected further pixel from the second video picture may be further caused to predict the value of the pixel from the first video picture by summing the value of the selected further pixel from the first video picture to the difference in values between the selected pixel and the selected further pixel from the second video picture.
  • the apparatus caused to select the pixel from the first video picture may be caused to select the pixel from a macro block of pixels of the first video picture.
  • the apparatus caused to select the further pixel from the first video picture may be caused to select the further pixel from a further macro block of pixels from the first video picture.
  • the apparatus caused to select the pixel from the second video picture may be caused to select the pixel from a macro block of pixels of the second video picture.
  • the apparatus caused to select the further pixel from the second video picture may be caused to select the further pixel from a further macro block of pixels of the second video picture.
  • the further macro block of pixels of the first video picture may neighbour the macro block of pixels of the first video picture, and wherein the further macro block of pixels of the second video picture may neighbour the macro block of pixels of the second video picture.
  • the apparatus may be further caused to generate a direction of prediction indicator indicating a position of the further macro block of pixels of the first video picture relative to the macro block of pixels of the first video picture and a position of the further macro block of pixels of the second video picture relative to the macro block of pixels of the second video picture.
  • the direction of prediction indicator may indicate that the further macro block of pixels of the first video picture may be horizontally positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture may be horizontally positioned relative to the macro block of pixels of the second video picture.
  • the direction of prediction indicator may also indicate that the further macro block of pixels of the first video picture may be vertically positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture may be vertically positioned relative to the macro block of pixels of the second video picture.
  • the first video picture may be associated with an enhancement level picture of a scalable video coder.
  • the second video picture may be associated with a reconstructed base layer picture of a scalable video coder.
  • a non-transitory computer- readable storage medium having stored thereon computer-readable code, which, when executed by computing apparatus, causes the computing apparatus to perform a method comprising: selecting a pixel from a first video picture; selecting a further pixel from the first video picture; selecting a pixel from a second video picture, wherein the position of the pixel from the second video picture is equivalent to the position of the selected pixel from the first video picture; selecting a further pixel from the second video picture, wherein the position of the further pixel from the second video picture is equivalent to the position of the selected further pixel from the first video picture; and predicting the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and on the values of the selected pixel and the selected further pixel in the second video picture.
  • the non-transitory computer-readable storage medium having stored thereon computer- readable code, which, when executed by computing apparatus, causes the computing apparatus to perform predicting the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and the values of the selected pixel and the selected further pixel in the second video picture may cause the computing apparatus to perform predicting the value of the pixel from the first video picture by summing the value of the selected further pixel from the first video picture to the difference in values between the selected pixel and the selected further pixel from the second video picture.
  • Selecting the pixel from the first video picture may cause the computing apparatus to perform selecting the pixel from a macro block of pixels of the first video picture.
  • Selecting the further pixel from the first video picture may cause the computing apparatus to perform selecting the further pixel from a further macro block of pixels from the first video picture.
  • Selecting the pixel from the second video picture may cause the computing apparatus to perform selecting the pixel from a macro block of pixels of the second video picture
  • Selecting the further pixel from the second video picture may cause the computing apparatus to perform selecting the further pixel from a further macro block of pixels of the second video picture.
  • the further macro block of pixels of the first video picture may neighbour the macro block of pixels of the first video picture, and the further macro block of pixels of the second video picture may neighbour the macro block of pixels of the second video picture.
  • the non-transitory computer-readable storage medium having stored thereon computer- readable code, which, when executed by computing apparatus, may further cause the computing apparatus to perform generating a direction of prediction indicator indicating a position of the further macro block of pixels of the first video picture relative to the macro block of pixels of the first video picture and a position of the further macro block of pixels of the second video picture relative to the macro block of pixels of the second video picture.
  • the direction of prediction indicator may indicate that the further macro block of pixels of the first video picture can be horizontally positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture can be horizontally positioned relative to the macro block of pixels of the second video picture.
  • the direction of prediction indicator may also indicate that the further macro block of pixels of the first video picture can be vertically positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture can be vertically positioned relative to the macro block of pixels of the second video picture.
  • the first video picture may be associated with an enhancement level picture of a scalable video coder.
  • the second video picture may be associated with a reconstructed base layer picture of a scalable video coder.
  • Figure 1 shows schematically an electronic device employing embodiments of the invention;
  • Figure 2 shows schematically a user equipment suitable for employing embodiments of the invention;
  • Figure 3 further shows schematically electronic devices employing embodiments of the invention connected using wireless and wired network connections;
  • Figure 4 shows schematically an embodiment as incorporated within an encoder;
  • Figure 5 shows a flow diagram showing the operation of an embodiment with respect to the enhancement layer picture pixel predictor as shown in Figure 4;
  • Figure 6 shows a simplified representation of generating a predicted pixel sample in the enhancement layer picture pixel predictor;
  • Figure 7 shows a schematic diagram of a decoder according to embodiments of the invention; and
  • Figure 8 shows a flow diagram showing the operation of an embodiment with respect to the decoder shown in Figure 7.
  • Figure 1 shows a schematic block diagram of an exemplary apparatus or electronic device 50, which may incorporate a codec according to an embodiment of the invention.
  • the electronic device 50 may for example be a mobile terminal or user equipment of a wireless communication system. However, it would be appreciated that embodiments of the invention may be implemented within any electronic device or apparatus which may require encoding and/or decoding of video images.
  • the apparatus 50 may comprise a housing 30 for incorporating and protecting the device.
  • the apparatus 50 further may comprise a display 32 in the form of a liquid crystal display.
  • the display may be of any display technology suitable for displaying an image or video.
  • the apparatus 50 may further comprise a keypad 34.
  • any suitable data or user interface mechanism may be employed.
  • the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display.
  • the apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input.
  • the apparatus 50 may further comprise an audio output device which in embodiments of the invention may be any one of: an earpiece 38, speaker, or an analogue audio or digital audio output connection.
  • the apparatus 50 may also comprise a battery 40 (or in other embodiments of the invention the device may be powered by any suitable mobile energy device such as solar cell, fuel cell or clockwork generator).
  • the apparatus may further comprise an infrared port 42 for short range line of sight communication to other devices.
  • the apparatus 50 may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB/firewire wired connection.
  • the apparatus 50 may comprise a controller 56 or processor for controlling the apparatus 50.
  • the controller 56 may be connected to memory 58 which in embodiments of the invention may store both data in the form of image and audio data and/or may also store instructions for implementation on the controller 56.
  • the controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and decoding of audio and/or video data or assisting in coding and decoding carried out by the controller 56.
  • the apparatus 50 may further comprise a card reader 48 and a smart card 46, for example a UICC and UICC reader for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.
  • the apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system or a wireless local area network.
  • the apparatus 50 further may comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting and receiving radio frequency signals generated at the radio interface circuitry 52.
  • the apparatus 50 comprises a camera capable of recording or detecting individual frames which are then passed to the codec 54 or controller for processing.
  • the apparatus may receive the video image data for processing from an adjacent device prior to transmission and/or storage.
  • the apparatus 50 may receive either wirelessly or by a wired connection the image for coding/decoding.
  • With respect to Figure 3, a system within which embodiments of the present invention can be utilised is shown.
  • the system 10 comprises multiple communication devices which can communicate through one or more networks.
  • the system 10 may comprise any combination of wired or wireless networks including, but not limited to, a wireless cellular telephone network (such as a GSM, UMTS or CDMA network), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a Bluetooth personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and the Internet.
  • the system 10 may include both wired and wireless communication devices or apparatus 50 suitable for implementing embodiments of the invention.
  • the system shown in Figure 3 comprises a mobile telephone network 11 and a representation of the internet 28.
  • Connectivity to the internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.
  • the example communication devices shown in the system 10 may include, but are not limited to, an electronic device or apparatus 50, a combination personal digital assistant (PDA) and mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22.
  • the apparatus 50 may be stationary or mobile when carried by an individual who is moving.
  • the apparatus 50 may also be located in a mode of transport including, but not limited to, a car, a truck, a taxi, a bus, a train, a boat, an aeroplane, a bicycle, a motorcycle or any similar suitable mode of transport.
  • Some or further apparatus may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24.
  • the base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the internet 28.
  • the system may include additional communication devices and communication devices of various types.
  • the communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), time division multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol-internet protocol (TCP-IP), short messaging service (SMS), multimedia messaging service (MMS), email, instant messaging service (IMS), Bluetooth, IEEE 802.11 and any similar wireless communication technology.
  • a communications device involved in implementing various embodiments of the present invention may communicate using various media including, but not limited to, radio, infrared, laser, cable connections, and any suitable connection.
  • Typical video codecs, for example those conforming to the International Telecommunication Union - Telecommunication Standardization Sector (ITU-T) H.263 and H.264 coding standards, encode video information in two phases.
  • In the first phase, pixel values in a certain picture area or "block" are predicted. These pixel values can be predicted, for example, by motion compensation mechanisms, which involve finding and indicating an area in one of the previously encoded video frames (or a later coded video frame) that corresponds closely to the block being coded. Additionally, pixel values can be predicted by spatial mechanisms which involve finding and indicating a spatial region relationship.
  • the second phase is one of coding the error between the predicted block of pixels and the original block of pixels. This is typically accomplished by transforming the difference in pixel values using a specified transform. This transform is typically a Discrete Cosine Transform (DCT) or a variant thereof. After transforming the difference, the transformed difference is quantized and entropy encoded.
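The second phase can be sketched as follows; the block size, quantiser step and the naive DCT implementation are illustrative assumptions only:

```python
import math

# Illustrative sketch of the second coding phase described above: transform the
# prediction-error block with a DCT, then quantise the coefficients before
# entropy coding. Here we simply count the nonzero coefficients that an
# entropy coder would have to represent.

def dct2(block):
    """Naive orthonormal 2-D DCT-II of an n x n block (list of lists)."""
    n = len(block)
    def alpha(u):
        return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[i][j]
                    * math.cos(math.pi * (2 * i + 1) * u / (2 * n))
                    * math.cos(math.pi * (2 * j + 1) * v / (2 * n))
                    for i in range(n) for j in range(n))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

residual = [[3.0] * 4 for _ in range(4)]       # flat prediction-error block
coeffs = dct2(residual)
quantised = [[round(c / 8.0) for c in row] for row in coeffs]   # step 8
nonzero = sum(1 for row in quantised for c in row if c != 0)
print(nonzero)  # flat block: only the DC coefficient survives quantisation
```

A flat residual compacts all its energy into the DC coefficient, which is exactly why the transform-then-quantise pipeline compresses well.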
  • the encoder can control the balance between the accuracy of the pixel representation (in other words, the quality of the picture) and the size of the resulting encoded video representation (in other words, the file size or transmission bit rate).
  • the decoder reconstructs the output video by applying a prediction mechanism similar to that used by the encoder in order to form a predicted representation of the pixel blocks (using the motion or spatial information created by the encoder and stored in the compressed representation of the image) and prediction error decoding (the inverse operation of the prediction error coding to recover the quantised prediction signal in the spatial domain).
  • After applying pixel prediction and error decoding processes, the decoder combines the prediction and the prediction error signals (the pixel values) to form the output video frame.
  • the decoder (and encoder) may also apply additional filtering processes in order to improve the quality of the output video before passing it for display and/or storing as a prediction reference for the forthcoming frames in the video sequence.
  • the motion information is indicated by motion vectors associated with each motion compensated image block.
• Each of these motion vectors represents the displacement between the image block in the picture to be coded (in the encoder) or decoded (at the decoder) and the prediction source block in one of the previously coded or decoded images (or pictures).
• motion vectors are typically coded differentially with respect to a block-specific predicted motion vector.
  • the predicted motion vectors are created in a predefined way, for example by calculating the median of the encoded or decoded motion vectors of the adjacent blocks.
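The median-based differential motion vector coding described above can be sketched as follows; the three-neighbour layout and the tuple representation of motion vectors are assumptions for illustration, not details from the specification.

```python
def median_mv_predictor(neighbour_mvs):
    """Predict a motion vector as the component-wise median of the
    motion vectors of adjacent, already encoded or decoded blocks."""
    xs = sorted(mv[0] for mv in neighbour_mvs)
    ys = sorted(mv[1] for mv in neighbour_mvs)
    mid = len(neighbour_mvs) // 2
    return (xs[mid], ys[mid])

def code_mv_differentially(mv, neighbour_mvs):
    """Return the motion vector difference that would be coded into the
    bit stream: the actual vector minus the block-specific prediction."""
    px, py = median_mv_predictor(neighbour_mvs)
    return (mv[0] - px, mv[1] - py)
```

The decoder reverses the process by adding the decoded difference back to the same median prediction, so only the (typically small) difference needs to be entropy coded.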
• the prediction residual after motion compensation can be first transformed with a transform kernel, for example a discrete cosine transform (DCT), and then coded.
  • Typical video encoders utilise the Lagrangian cost function to find optimal coding modes, for example the desired macro block mode and associated motion vectors.
• This type of cost function uses a weighting factor λ to tie together the exact or estimated image distortion due to lossy coding methods and the exact or estimated amount of information required to represent the pixel values in an image area. This may be represented by the equation:
• C = D + λR
• where D is the image distortion (in other words, the mean-squared error) with the mode and motion vectors currently considered, and
• R is the number of bits needed to represent the required data to reconstruct the image block in the decoder (including the amount of data to represent the candidate motion vectors).
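The mode decision implied by the cost function C = D + λR can be sketched as a comparison over candidate coding modes; the candidate list and the values of λ below are illustrative assumptions only.

```python
def lagrangian_cost(distortion, rate_bits, lmbda):
    """C = D + lambda * R: combined rate-distortion cost of one coding choice."""
    return distortion + lmbda * rate_bits

def best_mode(candidates, lmbda):
    """Pick the (mode_name, D, R) candidate with the lowest Lagrangian cost."""
    return min(candidates, key=lambda c: lagrangian_cost(c[1], c[2], lmbda))

# A larger lambda penalises rate more heavily, pushing the encoder
# towards cheaper modes at the cost of higher distortion.
candidates = [("skip", 300.0, 2), ("inter_16x16", 100.0, 60)]
```

With a small λ the high-rate, low-distortion mode wins; with a large λ the cheap skip mode wins, which is exactly the quality/bit-rate balance the encoder controls.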
  • Scalable video coding refers to a video coding methodology in which multiple representations of video content are encoded into a bit stream. Each representation of the video content may be encoded either at different bitrate, resolution or frame rate.
  • the receiver may then extract the desired encoded video representation depending on the characteristics of the receiving device. For example, a receiving device may select an encoded video representation which best matches the resolution of the display device.
  • a server or a network element may extract sections of the bitstream for further transmission to a receiver, where such sections may be associated with a particular representation of the encoded video content.
  • the particular representation extracted for transmission may be dependent on factors such as network characteristics or processing capabilities of the receiver.
  • a typical scalable bitstream may consist of a base layer and one or more enhancement layers.
  • the base layer may provide for the lowest quality video when decoded in the absence of the enhancement layers, and each further enhancement layer may provide further data for the decoding operation. Therefore, subsequently received enhancement layers can result in a progressive improvement to the quality of the decoded video signal.
• One of the particular features of layer-based coding is the possibility of intervening at any level of the transmission or storage chain so as to delete a part of the binary stream without having to provide any particular indication to the decoder.
  • Scalable video coding systems may improve coding efficiency of an enhancement layer by exploiting similarities between the base layer and any intervening enhancement layer.
  • SVC may use inter layer prediction of certain video coding parameters.
  • Inter layer motion prediction may also include the prediction of block coding mode, and header information, wherein motion vectors from a lower layer may be used for prediction of the higher layer.
  • the pixel data of base or lower enhancement layers can be used to predict pixel data for an enhancement layer.
  • macro blocks of an enhancement layer may be predicted from corresponding surrounding macro blocks or co-located macro blocks from the base layer or intervening enhancement layers. It is to be understood that intra prediction techniques may not employ macro block information from coded pictures at other time instances.
  • an SVC codec may employ residual data from the lower or intervening layers for the prediction of the current layer.
• an SVC coding system may comprise a conventional non-scalable video coder and decoder which may be used to formulate the base layer coded video content.
  • the base layer encoded video content may then be decoded at the encoding side of the process in order to form the reconstructed base layer pictures.
• the reconstructed base layer pictures may then form the basis for any inter layer prediction within subsequent enhancement layers.
  • reconstructed pictures associated with intervening enhancement layers may be used as a base for further inter layer prediction of further intervening enhancement layers.
• reconstructed pictures of base and intervening enhancement coding layers may be stored in a reconstructed picture buffer in order to be available for use in subsequent inter layer coding of further enhancement layers.
  • reconstructed pictures associated with a base layer may be stored in the reference picture buffer of an enhancement layer.
  • reconstructed pictures may be stored in the form of reference picture lists.
  • the base layer reconstructed pictures which are used in the inter layer prediction for a picture of an enhancement layer may be inserted into the reference picture list of the enhancement layer.
• the video encoder may then select a base layer reference picture as an inter prediction reference and indicate its use as a reference by inserting a reference picture index into the coded bit stream.
• With respect to Figure 4, a block diagram of a video encoder suitable for carrying out embodiments of the invention is shown. Furthermore, with respect to Figure 5, the operation of the encoder is shown, which exemplifies embodiments of the invention specifically with respect to the inter layer prediction and coding of a block of pixels of an enhancement layer picture.
  • Figure 4 shows the encoder as configured to encode an enhancement layer of a scalable video coding system.
  • the enhancement layer encoder of Figure 4 is depicted as comprising an enhancement layer pixel predictor 302, prediction error encoder 303 and the prediction error decoder 304.
  • the enhancement layer pixel predictor 302 receives the enhanced layer image frame 300 to be encoded at the block inter-layer predictor 306 which can determine the difference between the image 300 and a reference frame 318.
  • the reference frame 318 can be a reconstructed picture frame from the base layer.
  • the reconstructed base layer picture frame 318 may be of the same time instant as that of the enhanced layer picture frame 300.
  • the reconstructed base layer picture frame 318 can be the decoded and reconstructed base layer picture associated with the enhanced layer image frame 300.
• the reconstructed base layer picture will have been formed by the SVC encoder as a result of a previous operation to encode the base layer picture.
  • the reconstructed base layer picture frame 318 may be stored in the Decoded Picture Buffer (DPB) or picture frame memory associated with the enhancement layer pixel predictor 302.
• The operation of the block inter layer predictor 306 will be described hereafter in further detail.
• the encoder generates images in terms of NxN pixel macro blocks which go to form the full image or picture. It will be appreciated that various different sizes of macro blocks may be adopted by the block inter layer predictor 306. For example, some embodiments may deploy a 16x16 pixel macro block size, whereas other embodiments may deploy other macro block sizes such as an 8x8 pixel macro block size.
• the block inter layer predictor 306 may receive a macro block of the enhancement layer picture 300, or in other words a macro block of the enhancement layer picture 300 is selected as shown in Figure 5, step 501.
  • the block inter-layer predictor 306 may then identify a directionality of prediction, or prediction direction, associated with a macro block of the enhancement layer image.
  • a single value for the directionality of prediction may be assigned to a macro block.
  • each macro block of pixels may have a specific prediction direction assigned to it, thereby allowing prediction direction to be assigned on a macro block by macro block basis.
  • Some embodiments may choose a horizontal direction of prediction in which the value of a pixel within a macro block may be predicted from a pixel value sited in the same row within the enhancement layer picture.
  • the pixel value used for the prediction can be drawn from the same row within a neighbouring macro block.
  • Other embodiments may choose to predict a pixel value using a further pixel value from the same column within the enhancement layer picture as the predicted pixel value.
  • a vertical direction of prediction is used.
• typically the pixel value used in the prediction can be drawn from the same column within a neighbouring macro block.
  • the processing step of identifying a prediction direction for pixels of the selected macro block is shown as step 503 in Figure 5.
  • the block inter layer predictor 306 may identify a candidate pixel to be predicted from a macro block of the enhancement layer picture 300.
  • an example of a pixel P(x,y) 601 is shown as being selected for prediction from the macro block 603.
  • block layer inter layer predictor 306 may provide means for selecting a pixel from an enhancement layer picture.
  • variables x and y may represent the coordinate position of the pixel P(x,y) within the macro block 603, and the value of the pixel P is a variable dependent on the coordinate position.
• the step of selecting a pixel P(x,y) for prediction in a block of an enhancement layer picture 300 is shown as processing step 505 in Figure 5.
  • the block inter layer predictor 306 may determine a reference sample from which the pixel P(x,y) can be predicted. With reference to Figure 6, the reference pixel sample is shown as PR(xR,yR) 605.
  • the reference pixel PR(xR, yR) 605 may be drawn from a neighbouring macro block within the same enhancement layer picture.
  • the neighbouring macro block may be depicted as 607 in Figure 6.
  • the block layer inter layer predictor 306 may provide means for selecting a further pixel for use in the prediction of the pixel, the further pixel being selected from the enhancement layer picture.
• the reference pixel PR(xR,yR) may be selected according to either horizontal or vertical directions of prediction. This has the advantage of simplifying any implementation of embodiments, since the reference pixel can be obtained from macro blocks directly neighbouring the macro block containing the identified pixel P(x,y).
  • Figure 6 depicts the case in which the pixel P(x,y) may be predicted using a pixel reference value based upon the vertical prediction mode.
  • the value of the pixel P(x,y) is selected such that it can be predicted using a reference value pixel PR(xR, yR) in the same column.
  • the value of the pixel P(x,y) may be selected to be predicted using a reference pixel in the same row from a neighbouring macro block.
• the reference pixel for predicting the value of pixel P(x,y) can be drawn from the row or column immediately next to the macro block containing the pixel.
  • the reference pixel PR(xR, yR) 605 used to predict the value of the pixel P(x,y) 601 at position x, y in the current macro block may be drawn from the row immediately above the macro block and within the same column containing said pixel P(x,y) 601. In Figure 6 this row may be depicted as row 607.
• the reference pixel PR(xR, yR) used to predict the value of the pixel P(x,y) at position x, y may be drawn from the column immediately adjoining the vertical side of the macro block and within the same row containing said pixel P(x,y) 601.
  • the reference pixel may be selected from the row directly above the position of the pixel P(x,y) whose value is to be predicted.
  • the processing step of selecting a pixel in a neighbouring macro block of the enhancement level image 300 from which to predict a value of a pixel in a current macro block of the enhancement level image is shown as processing step 507 in Figure 5.
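The selection of the reference pixel position for the two prediction directions can be sketched as a simple coordinate mapping; the assumption that the macro block's top-left corner sits at picture coordinates (mb_x, mb_y) is introduced here for illustration.

```python
def reference_pixel_position(x, y, mb_x, mb_y, direction):
    """Return the picture coordinates of the reference pixel PR(xR, yR)
    for the pixel at picture coordinates (x, y) inside the macro block
    whose top-left corner is (mb_x, mb_y).

    'vertical'   -> same column, row immediately above the macro block
    'horizontal' -> same row, column immediately left of the macro block
    """
    if direction == "vertical":
        return (x, mb_y - 1)
    if direction == "horizontal":
        return (mb_x - 1, y)
    raise ValueError("unknown prediction direction: " + direction)
```

Because the mapping is fixed for a given direction, no per-pixel reference position needs to be signalled; the decoder derives it from the macro block position and the direction of prediction indicator.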
  • the block inter layer predictor 306 may then select a pixel from a corresponding macro block of the reconstructed base layer picture 318 which corresponds to the location of the pixel P(x,y) in the enhancement layer picture 300.
• the selected reconstructed base layer pixel lies within a corresponding equivalent macro block to the macro block which contains the pixel P(x,y) in the enhancement layer picture.
• the pixel selected from the reconstructed base layer has a position within the macro block which corresponds to the position of the pixel P(x,y) in the macro block of the enhancement layer picture.
  • the corresponding base layer pixel may be depicted as pixel B(x',y') 610 within the macro block 613 of the reconstructed base layer picture 318. It may be seen that the pixel B(x',y') 610 lies in the same position of the reconstructed base layer picture macro block as the pixel P(x,y) in the macro block of the enhancement layer picture.
• The processing step of selecting a pixel in the reconstructed base layer picture macro block which corresponds to the position of the pixel to be predicted in the current macro block of the enhancement level picture is shown as processing step 509 in Figure 5.
  • the block layer inter layer predictor 306 may provide means for selecting a pixel from a reconstructed base layer picture, the position of the pixel in the reconstructed base layer picture is equivalent to the position of the selected pixel from the enhancement layer picture. Further, the block inter-layer predictor 306 may also select a pixel in the reconstructed base layer picture 318 which corresponds to the same location as the reference pixel sample PR(xR,yR) 605 in the enhancement layer picture 300.
• the selected pixel in the reconstructed base layer picture which corresponds to the reference pixel sample PR(xR,yR) 605 may lie in a macro block whose position in the reconstructed base layer picture corresponds to the same position as the macro block containing the reference pixel sample PR(xR,yR) 605 in the enhancement layer picture. Furthermore, the pixel selected from the reconstructed base layer may have a position within the macro block which corresponds to the position of the pixel PR(xR,yR) 605 in the macro block of the enhancement layer picture.
• the block inter-layer predictor 306 may identify the base layer reference pixel BR(xR',yR') 615 in the reconstructed base layer picture 318 which corresponds to the position of the reference pixel sample PR(xR,yR) 605 in the enhancement layer picture 300.
  • the reconstructed base layer reference pixel BR(xR',yR') 615 may be drawn from a macro block 617 which neighbours the macro block 613 containing the base layer pixel B(x',y') 610.
  • the neighbouring macro block 617 from which the reconstructed base layer reference pixel BR(xR',yR') 615 may be drawn can correspond to the equivalent macro block 607 in the enhancement layer picture 300.
• the enhancement layer picture 300 may be at a higher resolution than that of the corresponding base layer picture.
  • the macro blocks of the reconstructed base layer image may be upsampled in order to have the same spatial resolution as the macro blocks of the enhancement layer picture 300.
  • the coordinate positions of the reconstructed base layer picture pixels and reconstructed base layer reference picture pixels may then have equivalent coordinate positions to their corresponding enhancement layer picture pixels and enhancement layer reference picture pixels.
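The upsampling that aligns the base layer grid with the enhancement layer grid can be sketched as follows; the factor-of-two ratio and the nearest-neighbour filter are illustrative assumptions (a practical codec would typically use a proper interpolation filter).

```python
def upsample_nearest(block, factor):
    """Upsample a 2-D list of base layer pixel values by an integer
    factor, so each base layer pixel maps onto a factor x factor patch
    of the enhancement layer coordinate grid."""
    out = []
    for row in block:
        # Repeat each sample horizontally ...
        up_row = [v for v in row for _ in range(factor)]
        # ... then repeat the whole row vertically.
        for _ in range(factor):
            out.append(list(up_row))
    return out
```

After this step a pixel at (x, y) in the enhancement layer and the pixel at the same (x, y) in the upsampled base layer occupy equivalent coordinate positions, as the text above requires.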
• the corresponding reconstructed base layer picture may not be upsampled to be of the same spatial resolution as the enhancement layer picture, and as such the coordinate systems of the reconstructed base layer picture and enhancement layer picture may be different from each other.
• The processing step of selecting a pixel in the reconstructed base layer picture macro block which corresponds to the position of the pixel in the neighbouring macro block which is used to predict the pixel in the current macro block of the enhancement level picture is shown as processing step 511 in Figure 5.
  • the block layer inter layer predictor 306 may provide means for selecting a further pixel from the reconstructed base layer picture, the position of the further pixel in the reconstructed base layer picture is equivalent to the position of the selected further pixel in the enhancement layer picture.
  • the block inter layer predictor 306 may then use the relative difference in pixel values between a reconstructed base layer pixel B(x',y') and its respective reconstructed base layer reference pixel BR(xR',yR') in order to predict the value of the pixel P at the coordinate position x , y relative to a reference pixel PR(xR,yR) in a neighbouring macro block of an enhancement layer picture 300.
  • the difference in pixel values between a reconstructed base layer pixel and its respective reconstructed base layer reference pixel for a specific pixel position may be equivalent to the difference in pixel values between a pixel and its corresponding reference pixel at the same positions within an enhancement layer picture.
• a predicted value for the pixel P with coordinate positions x and y, P(x,y), may be obtained by adding the difference of the base layer pixel value B(x',y') and the base layer reference pixel value BR(xR',yR') to the enhancement layer reference sample PR(xR,yR). This may be expressed mathematically in relation to the non-limiting illustrative example of Figure 6 as
• P(x,y) = PR(xR,yR) + (B(x',y') - BR(xR',yR')).
  • the processing step of predicting the pixel value P(x,y) in the inter block layer predictor 306 by summing the difference between the corresponding reconstructed base layer pixel and reference base layer pixel with the selected reference pixel of the enhancement layer picture is shown as processing step 513 in Figure 5.
• if the predicted pixel value P(x,y) at a position x,y exceeds an allowed dynamic range for a value of a pixel, the predicted pixel value may be limited according to
• P(x,y) = Clip( PR(xR,yR) + (B(x',y') - BR(xR',yR')) ).
  • Clip() represents the operation to limit the value of P(x,y) to a desired range. For example, in an 8 bit pixel representation system the value of the predicted pixel P(x,y) may be limited to the range 0 to 255.
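The prediction and clipping steps above follow directly from the formula; the 8-bit range is the example given in the text, and the function names below are illustrative.

```python
def clip(value, low=0, high=255):
    """Clip(): limit a predicted pixel value to the allowed dynamic
    range, here the 0..255 range of an 8 bit pixel representation."""
    return max(low, min(high, value))

def predict_pixel(pr, b, br):
    """P(x,y) = Clip( PR(xR,yR) + (B(x',y') - BR(xR',yR')) ):
    refine the enhancement layer reference sample pr by the difference
    observed between the co-located base layer pixel b and the base
    layer reference pixel br."""
    return clip(pr + (b - br))
```

Intuitively, the base layer supplies the local gradient (how much the pixel differs from its reference) while the enhancement layer reference sample anchors the absolute level.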
  • the block layer inter layer predictor 306 may provide means for predicting the value of the pixel from the enhancement layer picture dependent on the value of the selected further pixel from the enhancement layer picture and on the values of the selected pixel and the selected further pixel from the reconstructed base layer picture.
  • processing steps 501 to 513 for finding a predicted value of the pixel P(x,y) of a macro block of the enhancement layer picture may be repeated for each pixel position within the macro block.
  • each pixel value P(x,y) is at least in part dependent on a reference pixel value from a neighbouring reconstructed macro block of the enhancement layer picture which has been previously coded.
• each pixel value P(x,y) may be backward predicted using a previously encoded pixel value from a neighbouring reconstructed enhancement layer picture macro block.
  • the pixel values of the first macro block for each enhancement layer picture may be predicted by using a default initial value for each reference pixel value PR(xR,yR). This has the advantage that for each enhancement layer picture there is no requirement for reference pixel information to be transmitted to the decoder.
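Repeating processing steps 501 to 513 over a whole macro block, including the default reference value for the first macro block of a picture, can be sketched as follows for the vertical prediction direction; the default value of 128 and the restriction to a single reference row are illustrative assumptions.

```python
def predict_macro_block(ref_row, base_block, base_ref_row, default=128):
    """Predict every pixel of an N x N enhancement layer macro block in
    the vertical direction.

    ref_row       -- reconstructed enhancement layer pixels in the row
                     immediately above the block, or None for the first
                     macro block (a default value is then substituted,
                     so no reference information need be transmitted)
    base_block    -- co-located reconstructed base layer macro block
    base_ref_row  -- base layer row immediately above base_block
    """
    n = len(base_block)
    if ref_row is None:
        ref_row = [default] * n
    predicted = []
    for y in range(n):
        row = []
        for x in range(n):
            # P(x,y) = Clip(PR + (B - BR)), with PR and BR taken from
            # the same column in the row above their respective blocks.
            p = ref_row[x] + (base_block[y][x] - base_ref_row[x])
            row.append(max(0, min(255, p)))
        predicted.append(row)
    return predicted
```

Note that every predicted row reuses the same reference row above the block; an alternative reading of the text, in which each pixel uses the row directly above itself, would thread the prediction row by row instead.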
  • the output of the block inter layer predictor 306 is a pixel predicted representation of a macro block of the enhancement layer picture.
• the pixel predicted representation of a macro block of the enhancement layer picture may be passed to a first summing device 321.
  • the first summing device 321 may subtract the pixel predicted macro block representation of the enhancement layer picture (in other words the predicted enhancement layer picture) from the corresponding macro block of the enhancement layer picture 300 to produce a first prediction error signal 320 which may form an input to the prediction error encoder 303.
  • the block inter layer predictor 306 may be configured to provide a further output comprising the parameters from which the pixel predicted representation of the enhancement layer picture can be reconstructed.
  • the parameters may comprise for each macro block of the predicted enhanced layer picture an indicator for indicating the directionality of prediction.
  • the prediction error encoder 303 may be configured to carry out any known residual encoding algorithm known in the art for the pixels of a macro block of the prediction error signal 320.
  • the output from the prediction error encoder 303 may be the encoded macro block of the prediction error signal 325.
  • the encoded macro block of the prediction error signal 335 may be passed to the entropy encoder 330 and also to the prediction error decoder 304.
  • the prediction error decoder 304 receives the output from the prediction error encoder 303 and performs the opposite processes of the prediction error encoder 303 to produce a macro block of a decoded prediction error signal 338 which when combined with a macro block of the predicted enhancement layer picture 312 at the second summing device 339 produces a macro block of the reconstructed enhancement layer picture 314.
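The round trip through the prediction error encoder 303, prediction error decoder 304 and second summing device 339 can be sketched with a scalar quantiser standing in for the full transform-and-quantise chain; the quantisation step size and the omission of the DCT are simplifying assumptions.

```python
def encode_residual(original, predicted, qstep):
    """Encoder side: quantise the prediction error (original - predicted)
    into integer levels, the lossy step of prediction error coding."""
    return [int(round((o - p) / qstep)) for o, p in zip(original, predicted)]

def decode_residual(levels, qstep):
    """Decoder side: inverse-quantise the coded levels back into an
    approximate residual (the opposite process of the encoder)."""
    return [lvl * qstep for lvl in levels]

def reconstruct(predicted, levels, qstep):
    """Second summing device: prediction + decoded prediction error
    gives the reconstructed macro block pixels."""
    return [p + r for p, r in zip(predicted, decode_residual(levels, qstep))]
```

The reconstruction is generally only approximate because quantisation discards information; a larger qstep lowers the bit rate at the cost of a larger reconstruction error.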
  • the macro block of the reconstructed enhancement layer picture 314 may then be passed to the block layer inter predictor 306 whereby it can be used for the prediction of a subsequent macro block of the enhancement level picture 300.
  • the macro block of the reconstructed enhancement layer picture may be passed into a picture frame memory for the prediction of further enhancement layers.
  • the entropy encoder 330 receives the output of the prediction error encoder 303 and may perform a suitable entropy encoding/variable length encoding on the signal to provide error detection and correction capability. Any suitable entropy encoding algorithm may be employed.
• the entropy encoder 330 may also receive any parameters from the inter layer block encoder 306 which may be used for the decoding of the predicted enhancement layer picture. It is to be appreciated that the enhancement layer picture encoder generates pictures in terms of NxN pixel macro blocks which go to form the full picture or image. Thus the enhancement layer picture encoder produces a series of encoded macro blocks which, when decoded, represent the reconstructed enhancement layer picture.
  • Figure 7 shows a block diagram of a video decoder suitable for employing embodiments of the invention.
  • the decoder shows an entropy decoder 700 which performs entropy decoding on the received signal.
  • the entropy decoder thus performs the inverse operation to the entropy encoder 330 of the encoder described above.
  • the entropy decoder 700 outputs the results of the entropy decoding to a prediction error decoder 702 and a decoder enhancement layer pixel predictor 704.
  • the decoder enhancement layer pixel predictor 704 receives the output of the entropy decoder 700 and outputs a macro block of a predicted representation of the enhancement level picture 716 to a first combiner 713.
  • the predicted representation of the enhancement level picture macro block 716 is used in conjunction with a macro block of the reconstructed prediction error signal 712 to generate a macro block of a reconstructed enhancement level picture 718.
  • the macro block of the reconstructed enhancement level picture 718 may be used in the block inter layer predictor 714 and also may be passed to a filter 720.
  • the filter 720 applies a filtering which outputs a final reconstructed enhancement level picture 722.
  • the final reconstructed enhancement level picture 722 may be stored in a reference frame memory 724, where the reference picture frame memory 724 may be connected to the decoder block inter layer predictor 714 in order to facilitate further prediction operations for higher enhancement layer pictures. It is to be appreciated that the reference picture frame memory 724 may also comprise reconstructed base layer pictures, the reconstructed base layer pictures having been decoded as a separate decoding operation within the SVC decoder.
• The operation of the decoder block inter layer predictor 714 is described in further detail with respect to the flow diagram of Figure 8.
  • the decoder block inter layer predictor 714 may form the predicted enhancement layer image on a macro block by macro block basis. Initially the decoder block inter layer predictor 714 may receive a decoded direction of prediction indicator from the entropy decoder 700.
  • the direction of prediction indicator indicates the direction from which a pixel value lies in relation to another pixel value which it is used to predict.
  • the direction of prediction indicator may be used to signify the direction of prediction for all pixels within a macro block.
  • the direction of prediction indicator may indicate whether the pixels in the macro block are predicted using pixels from a horizontal direction to the predicted pixel.
  • the direction of prediction indicator may indicate whether the pixels in the macro block are predicted using pixels from a vertical direction to the predicted pixel.
  • the decoder block inter layer predictor 714 may provide means for receiving a direction of prediction indicator.
  • the indicator indicates a position of a macro block of pixels of the enhancement layer picture relative to another macro block of pixels of the enhancement layer picture and a position of a macro block of pixels of the reconstructed base layer picture relative to another macro block of pixels of the reconstructed base layer picture.
  • the step of receiving the direction of prediction indicator for the current macro block is shown as processing step 801 in Figure 8.
  • the decoder block inter layer predictor 714 may then select a position x,y within the macro block for prediction of the pixel value at that position.
  • the decoder block layer inter layer predictor 714 may provide means for selecting a pixel from an enhancement layer picture.
• The step of selecting a pixel position for predicting the value thereof in the current macro block is shown as processing step 803 in Figure 8.
• predicting the value of the pixel at position x,y within a macro block may involve determining the reference pixel sample PR(xR, yR) in a previously decoded macro block or pixel value of the reconstructed enhancement layer picture 722. It is to be understood that in the first group of embodiments the reference pixel sample may be determined to be in a neighbouring macro block to that of the macro block which is currently being decoded. The direction of the neighbouring macro block relative to the currently decoded macro block is given by the direction of prediction indicator. In other words, the decoder block layer inter layer predictor 714 may provide means for selecting a further pixel for use in the prediction of the pixel, the further pixel being selected from the enhancement layer picture.
• The step of selecting the reference pixel sample corresponding to the pixel value selected for prediction, the reference sample being selected from a neighbouring macro block of the reconstructed enhancement layer image, is shown as processing step 805 in Figure 8.
  • the position of the reference pixel sample PR(xR, yR) may be constant relative to the position of pixel value P(x,y) which it is used to predict.
  • a particular pixel value P(x,y) within a macro block may use a reference pixel sample PR(xR, yR) which is always at the same position in a neighbouring macro block for the particular pixel value. Therefore for each predicted pixel value there is no requirement to transmit the position xR, yR of its associated reference pixel sample.
  • the decoder block inter layer predictor 714 may identify the corresponding base layer pixel B(x',y') and its respective base layer reference pixel BR(xR',yR').
  • a decoded base layer picture which may be associated with the reconstructed enhancement layer picture may be resident in the reference picture frame memory as a result of an earlier decoding operation by the SVC decoder.
  • the base layer pixel B(x',y') and its respective base layer reference pixel BR(xR',yR') may be drawn from the decoded reconstructed base layer picture associated with the reconstructed enhancement layer picture.
  • the processing step of selecting the base layer pixel B(x',y') and its respective base layer reference pixel BR(xR',yR') from the corresponding macro block of the reconstructed base layer image is shown as processing steps 807 and 809 in Figure 8.
• the decoder block layer inter layer predictor 714 may provide means for selecting a pixel from the reconstructed base layer picture, the position of the pixel in the reconstructed base layer picture being equivalent to the position of the selected pixel from the enhancement layer picture. Further, the decoder block layer inter layer predictor 714 may provide means for selecting a further pixel from the reconstructed base layer picture, the position of the further pixel in the reconstructed base layer picture being equivalent to the position of the selected further pixel from the enhancement layer picture.
• the value of the pixel P(x,y) may then be determined by applying the expression as above of
• P(x,y) = PR(xR,yR) + (B(x',y') - BR(xR',yR')).
• The processing step of determining the value of the predicted pixel P(x,y) is shown as processing step 809 in Figure 8.
  • the decoder block layer inter layer predictor 714 may provide means for predicting the value of the pixel from the enhancement layer picture dependent on the value of the selected further pixel from the enhancement layer picture and on the values of the selected pixel and the selected further pixel from the reconstructed base layer picture.
• processing steps 801 to 811 may be repeated for each pixel position x,y in the currently decoded macro block of the reconstructed enhancement level picture.
• the current macro block may be combined with the corresponding macro block of the reconstructed prediction error signal 712 to give a decoded macro block of the reconstructed enhancement layer picture via the summer 713.
  • the decoded macro block of the reconstructed enhancement layer picture may then form a neighbouring macro block from which the pixels of the next macro block of the decoded predicted enhancement layer picture are predicted.
  • the above steps may then be repeated for each macro block in the decoded reconstructed enhancement layer picture.
• the embodiments of the invention described above describe the codec in terms of separate encoder and decoder apparatus in order to assist the understanding of the processes involved. However, it will be appreciated that the apparatus, structures and operations may be implemented as a single encoder-decoder apparatus/structure/operation. Furthermore, in some embodiments of the invention the coder and decoder may share some or all common elements.
• embodiments of the invention may operate within an electronic device or apparatus.
  • the application as described below may be implemented as part of any video codec.
  • embodiments of the application may be implemented in a video codec which may implement video coding over fixed or wired communication paths.
  • user equipment may comprise a video codec such as those described in embodiments of the application above. It shall be appreciated that the term user equipment is intended to cover any suitable type of wireless user equipment, such as mobile telephones, portable data processing devices or portable web browsers. Furthermore, elements of a public land mobile network (PLMN) may also comprise video codecs as described above.
  • the various embodiments described above may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
  • some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto.
  • While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
  • the embodiments of the application may be implemented by computer software executable by a data processor, such as in the processor entity, or by hardware, or by a combination of software and hardware.
  • any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.
  • the software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disks or floppy disks, and optical media such as, for example, digital versatile discs (DVD), compact discs (CD) and the data variants thereof.
  • the memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
  • the data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
  • Embodiments of the inventions may be practiced in various components such as integrated circuit modules.
  • the design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate. Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design Systems of San Jose, California, automatically route conductors and locate components on a semiconductor chip using well-established rules of design as well as libraries of pre-stored design modules.
  • the resultant design in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
  • circuitry may refer to all of the following: (a) hardware-only circuit implementations (such as implementations in only analogue and/or digital circuitry) and (b) to combinations of circuits and software (and/or firmware), such as and where applicable: (i) to a combination of processor(s) or (ii) to portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions and (c) to circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
  • circuitry would also cover an implementation of merely a processor (or multiple processors) or portion of a processor and its (or their) accompanying software and/or firmware.
  • circuitry would also cover, for example and if applicable to the particular claim element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, or other network device.
  • processor and memory may comprise but are not limited to in this application: (1) one or more microprocessors, (2) one or more processor(s) with accompanying digital signal processor(s), (3) one or more processor(s) without accompanying digital signal processor(s), (4) one or more special-purpose computer chips, (5) one or more field-programmable gate arrays (FPGAs), (6) one or more controllers, (7) one or more application-specific integrated circuits (ASICs), or detector(s), processor(s) (including dual-core and multiple-core processors), digital signal processor(s), controller(s), receiver, transmitter, encoder, decoder, memory (and memories), software, firmware, RAM, ROM, display, user interface, display circuitry, user interface circuitry, user interface software, display software, circuit(s), antenna, antenna circuitry, and circuitry.
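The per-pixel prediction and reconstruction steps listed above (processing steps 801 to 811 and the summing at summer 713) can be sketched in miniature as follows. This is an illustrative sketch only, not the claimed implementation: the function names are hypothetical, macro blocks are represented as nested lists, and the neighbouring-pixel positions are assumed to have already been resolved.

```python
def predict_enhancement_block(enh_neighbour, base_block, base_neighbour):
    # P(x,y) = P(x',y') + (B(x,y) - B(x',y')) applied element-wise: the
    # already-decoded enhancement-layer neighbour pixel plus the difference
    # between the co-located base-layer pixel and its neighbour.
    return [[pn + (b - bn)
             for pn, b, bn in zip(row_pn, row_b, row_bn)]
            for row_pn, row_b, row_bn in zip(enh_neighbour,
                                             base_block,
                                             base_neighbour)]

def decode_macro_block(enh_neighbour, base_block, base_neighbour, residual):
    # The prediction is summed with the reconstructed prediction error
    # (the role attributed to summer 713) to give the decoded macro block.
    pred = predict_enhancement_block(enh_neighbour, base_block, base_neighbour)
    return [[p + r for p, r in zip(pr, rr)]
            for pr, rr in zip(pred, residual)]
```

The decoded block would then serve as the neighbouring block from which the next macro block's pixels are predicted, mirroring the loop over macro blocks described above.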

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

There is inter alia a method comprising: selecting a pixel from a first video picture; selecting a further pixel from the first video picture; selecting a pixel from a second video picture, wherein the position of the pixel from the second video picture is equivalent to the position of the selected pixel from the first video picture; selecting a further pixel from the second video picture, wherein the position of the further pixel from the second video picture is equivalent to the position of the selected further pixel from the first video picture; predicting the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and on the values of the selected pixel and the selected further pixel in the second video picture.

Description

A METHOD AND APPARATUS FOR SCALABLE VIDEO CODING
Field of the Application
The present application relates to an apparatus and method for coding and decoding a video signal.
Background of the Application
A video coding system may comprise an encoder that transforms an input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form. The encoder may discard some information in the original video sequence in order to represent the video in a more compact form, for example, to enable the storage/transmission of the video information at a lower bitrate than otherwise might be needed.
Scalable video coding refers to a coding structure where one bitstream can contain multiple representations of the content at different bitrates, resolutions, frame rates and/or other types of scalability. A scalable bitstream may consist of a base layer providing the lowest quality video available and one or more enhancement layers that enhance the video quality when received and decoded together with the lower layers. In order to improve coding efficiency for the enhancement layers, the coded representation of that layer may depend on the lower layers. Each layer together with all its dependent layers is one representation of the video signal at a certain spatial resolution, temporal resolution, quality level, and/or operation point of other types of scalability.
However, existing solutions for scalable video coding fail to exploit information that is available in the base and enhancement layers when encoding the enhancement layer.
Summary of the Application
The following embodiments aim to address the above problem.
There is provided according to an aspect of the application a method comprising: selecting a pixel from a first video picture; selecting a further pixel from the first video picture; selecting a pixel from a second video picture, wherein the position of the pixel from the second video picture is equivalent to the position of the selected pixel from the first video picture; selecting a further pixel from the second video picture, wherein the position of the further pixel from the second video picture is equivalent to the position of the selected further pixel from the first video picture; predicting the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and on the values of the selected pixel and the selected further pixel in the second video picture.
Predicting the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and the values of the selected pixel and the selected further pixel from the second video picture may comprise predicting the value of the pixel from the first video picture by summing the value of the selected further pixel from the first video picture to the difference in values between the selected pixel and the selected further pixel from the second video picture.
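The summation described above amounts to a single line of arithmetic per pixel. The following sketch (with hypothetical names) makes the worked example concrete, assuming the neighbouring and co-located pixel values have already been selected:

```python
def predict_pixel(p_further, b, b_further):
    # P(x,y) = P(x',y') + (B(x,y) - B(x',y')): the selected further
    # (neighbouring) enhancement-layer pixel plus the difference between
    # the co-located base-layer pixel and its further pixel.
    return p_further + (b - b_further)

# e.g. a neighbouring enhancement-layer pixel of 120, a co-located
# base-layer pixel of 100 and a base-layer neighbour of 90 give a
# predicted value of 120 + (100 - 90) = 130
```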
Selecting the pixel from the first video picture may comprise selecting the pixel from a macro block of pixels of the first video picture. Selecting the further pixel from the first video picture may comprise selecting the further pixel from a further macro block of pixels from the first video picture. Selecting the pixel from the second video picture may comprise selecting the pixel from a macro block of pixels of the second video picture. Selecting the further pixel from the second video picture may comprise selecting the further pixel from a further macro block of pixels of the second video picture.
The further macro block of pixels of the first video picture may neighbour the macro block of pixels of the first video picture, and the further macro block of pixels of the second video picture may neighbour the macro block of pixels of the second video picture. The method may further comprise generating a direction of prediction indicator indicating a position of the further macro block of pixels of the first video picture relative to the macro block of pixels of the first video picture and a position of the further macro block of pixels of the second video picture relative to the macro block of pixels of the second video picture. The direction of prediction indicator may indicate that the further macro block of pixels of the first video picture may be horizontally positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture may also be horizontally positioned relative to the macro block of pixels of the second video picture. The direction of prediction indicator may also indicate that the further macro block of pixels of the first video picture may be vertically positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture may also be vertically positioned relative to the macro block of pixels of the second video picture. The first video picture may be associated with an enhancement level picture of a scalable video coder, and the second video picture may be associated with a reconstructed base layer picture of a scalable video coder. 
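One way to picture the direction of prediction indicator is as a mapping from a pixel position to the corresponding position in the neighbouring macro block, applied identically in both layers. The sketch below is purely illustrative: the 16x16 macro block size and the function name are assumptions, not details taken from the application.

```python
MB = 16  # illustrative macro block size; the actual size is codec-specific

def reference_position(x, y, direction):
    # "horizontal" selects the equivalent position in the macro block to
    # the left; "vertical" selects it in the macro block above. The same
    # mapping is applied in the first (enhancement) and second
    # (reconstructed base layer) pictures.
    if direction == "horizontal":
        return (x - MB, y)
    if direction == "vertical":
        return (x, y - MB)
    raise ValueError("unknown direction: " + direction)
```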
According to a further aspect of the application there is provided an apparatus configured to: select a pixel from a first video picture; select a further pixel from the first video picture; select a pixel from a second video picture, wherein the position of the pixel from the second video picture is equivalent to the position of the selected pixel from the first video picture; select a further pixel from the second video picture, wherein the position of the further pixel from the second video picture is equivalent to the position of the selected further pixel from the first video picture; and predict the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and on the values of the selected pixel and the selected further pixel in the second video picture. The apparatus configured to predict the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and the values of the selected pixel and the selected further pixel from the second video picture may be further configured to predict the value of the pixel from the first video picture by summing the value of the selected further pixel from the first video picture to the difference in values between the selected pixel and the selected further pixel from the second video picture.
The apparatus configured to select the pixel from the first video picture may be configured to select the pixel from a macro block of pixels of the first video picture. The apparatus configured to select the further pixel from the first video picture may be configured to select the further pixel from a further macro block of pixels from the first video picture. The apparatus configured to select the pixel from the second video picture may be configured to select the pixel from a macro block of pixels of the second video picture. The apparatus configured to select the further pixel from the second video picture may be configured to select the further pixel from a further macro block of pixels of the second video picture.
The further macro block of pixels of the first video picture may neighbour the macro block of pixels of the first video picture, and the further macro block of pixels of the second video picture may neighbour the macro block of pixels of the second video picture. The apparatus may be further configured to generate a direction of prediction indicator indicating a position of the further macro block of pixels of the first video picture relative to the macro block of pixels of the first video picture and a position of the further macro block of pixels of the second video picture relative to the macro block of pixels of the second video picture. The direction of prediction indicator may indicate that the further macro block of pixels of the first video picture can be horizontally positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture can also be horizontally positioned relative to the macro block of pixels of the second video picture.
The direction of prediction indicator may also indicate that the further macro block of pixels of the first video picture can be vertically positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture can also be vertically positioned relative to the macro block of pixels of the second video picture.
The first video picture may be associated with an enhancement level picture of a scalable video coder, and the second video picture may be associated with a reconstructed base layer picture of a scalable video coder. According to another aspect of the application there is provided an apparatus comprising at least one processor and at least one memory including computer code for one or more programs, the at least one memory and the computer code configured with the at least one processor to cause the apparatus at least to: select a pixel from a first video picture; select a further pixel from the first video picture; select a pixel from a second video picture, wherein the position of the pixel from the second video picture is equivalent to the position of the selected pixel from the first video picture; select a further pixel from the second video picture, wherein the position of the further pixel from the second video picture is equivalent to the position of the selected further pixel from the first video picture; and predict the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and on the values of the selected pixel and the selected further pixel in the second video picture.
The apparatus caused to predict the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and the values of the selected pixel and the selected further pixel from the second video picture may be further caused to predict the value of the pixel from the first video picture by summing the value of the selected further pixel from the first video picture to the difference in values between the selected pixel and the selected further pixel from the second video picture.
The apparatus caused to select the pixel from the first video picture may be caused to select the pixel from a macro block of pixels of the first video picture. The apparatus caused to select the further pixel from the first video picture may be caused to select the further pixel from a further macro block of pixels from the first video picture. The apparatus caused to select the pixel from the second video picture may be caused to select the pixel from a macro block of pixels of the second video picture. The apparatus caused to select the further pixel from the second video picture may be caused to select the further pixel from a further macro block of pixels of the second video picture.
The further macro block of pixels of the first video picture may neighbour the macro block of pixels of the first video picture, and wherein the further macro block of pixels of the second video picture may neighbour the macro block of pixels of the second video picture.
The apparatus may be further caused to generate a direction of prediction indicator indicating a position of the further macro block of pixels of the first video picture relative to the macro block of pixels of the first video picture and a position of the further macro block of pixels of the second video picture relative to the macro block of pixels of the second video picture.
The direction of prediction indicator may indicate that the further macro block of pixels of the first video picture may be horizontally positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture may be horizontally positioned relative to the macro block of pixels of the second video picture.
The direction of prediction indicator may also indicate that the further macro block of pixels of the first video picture may be vertically positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture may be vertically positioned relative to the macro block of pixels of the second video picture.
The first video picture may be associated with an enhancement level picture of a scalable video coder, and the second video picture may be associated with a reconstructed base layer picture of a scalable video coder.
According to yet another aspect of the application there is provided a non-transitory computer-readable storage medium having stored thereon computer-readable code, which, when executed by computing apparatus, causes the computing apparatus to perform a method comprising: selecting a pixel from a first video picture; selecting a further pixel from the first video picture; selecting a pixel from a second video picture, wherein the position of the pixel from the second video picture is equivalent to the position of the selected pixel from the first video picture; selecting a further pixel from the second video picture, wherein the position of the further pixel from the second video picture is equivalent to the position of the selected further pixel from the first video picture; and predicting the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and on the values of the selected pixel and the selected further pixel in the second video picture.
The non-transitory computer-readable storage medium having stored thereon computer-readable code, which, when executed by computing apparatus, causes the computing apparatus to perform predicting the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and the values of the selected pixel and the selected further pixel in the second video picture may cause the computing apparatus to perform predicting the value of the pixel from the first video picture by summing the value of the selected further pixel from the first video picture to the difference in values between the selected pixel and the selected further pixel from the second video picture.
Selecting the pixel from the first video picture may cause the computing apparatus to perform selecting the pixel from a macro block of pixels of the first video picture. Selecting the further pixel from the first video picture may cause the computing apparatus to perform selecting the further pixel from a further macro block of pixels from the first video picture. Selecting the pixel from the second video picture may cause the computing apparatus to perform selecting the pixel from a macro block of pixels of the second video picture. Selecting the further pixel from the second video picture may cause the computing apparatus to perform selecting the further pixel from a further macro block of pixels of the second video picture.
The further macro block of pixels of the first video picture may neighbour the macro block of pixels of the first video picture, and the further macro block of pixels of the second video picture may neighbour the macro block of pixels of the second video picture.
The non-transitory computer-readable storage medium having stored thereon computer-readable code, which, when executed by computing apparatus, may further cause the computing apparatus to perform generating a direction of prediction indicator indicating a position of the further macro block of pixels of the first video picture relative to the macro block of pixels of the first video picture and a position of the further macro block of pixels of the second video picture relative to the macro block of pixels of the second video picture.
The direction of prediction indicator may indicate that the further macro block of pixels of the first video picture can be horizontally positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture can be horizontally positioned relative to the macro block of pixels of the second video picture.
The direction of prediction indicator may also indicate that the further macro block of pixels of the first video picture can be vertically positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture can be vertically positioned relative to the macro block of pixels of the second video picture.
The first video picture may be associated with an enhancement level picture of a scalable video coder, and the second video picture may be associated with a reconstructed base layer picture of a scalable video coder. There is also provided a computer program comprising instructions that, when executed by a computer apparatus, perform the method as described herein.
For a better understanding of the present invention, reference will now be made by way of example to the accompanying drawings in which:
Figure 1 shows schematically an electronic device employing embodiments of the invention;
Figure 2 shows schematically a user equipment suitable for employing embodiments of the invention;
Figure 3 further shows schematically electronic devices employing embodiments of the invention connected using wireless and wired network connections;
Figure 4 shows schematically an embodiment as incorporated within an encoder;
Figure 5 shows a flow diagram showing the operation of an embodiment with respect to the enhancement layer picture pixel predictor as shown in Figure 4;
Figure 6 shows a simplified representation of generating a predicted pixel sample in the enhancement layer picture pixel predictor;
Figure 7 shows a schematic diagram of a decoder according to embodiments of the invention; and
Figure 8 shows a flow diagram showing the operation of an embodiment with respect to the decoder shown in Figure 7.
Description of Some Embodiments of the Application
The following describes in further detail suitable apparatus and possible mechanisms for the provision of coding the enhancement picture layer of a scalable video codec whilst utilising information from the base and enhancement picture layers in order to further the coding efficiency of said enhancement picture layer. In this regard reference is first made to Figure 1 which shows a schematic block diagram of an exemplary apparatus or electronic device 50, which may incorporate a codec according to an embodiment of the invention.
The electronic device 50 may for example be a mobile terminal or user equipment of a wireless communication system. However, it would be appreciated that embodiments of the invention may be implemented within any electronic device or apparatus which may require encoding and decoding or encoding or decoding video images.
The apparatus 50 may comprise a housing 30 for incorporating and protecting the device. The apparatus 50 further may comprise a display 32 in the form of a liquid crystal display. In other embodiments of the invention the display may be any suitable display technology suitable to display an image or video. The apparatus 50 may further comprise a keypad 34. In other embodiments of the invention any suitable data or user interface mechanism may be employed. For example the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display. The apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input. The apparatus 50 may further comprise an audio output device which in embodiments of the invention may be any one of: an earpiece 38, speaker, or an analogue audio or digital audio output connection. The apparatus 50 may also comprise a battery 40 (or in other embodiments of the invention the device may be powered by any suitable mobile energy device such as solar cell, fuel cell or clockwork generator). The apparatus may further comprise an infrared port 42 for short range line of sight communication to other devices. In other embodiments the apparatus 50 may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB/firewire wired connection.
The apparatus 50 may comprise a controller 56 or processor for controlling the apparatus 50. The controller 56 may be connected to memory 58 which in embodiments of the invention may store both data in the form of image and audio data and/or may also store instructions for implementation on the controller 56. The controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and decoding of audio and/or video data or assisting in coding and decoding carried out by the controller 56.
The apparatus 50 may further comprise a card reader 48 and a smart card 46, for example a UICC and UICC reader for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.
The apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system or a wireless local area network. The apparatus 50 further may comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting and receiving radio frequency signals generated at the radio interface circuitry 52.
In some embodiments of the invention, the apparatus 50 comprises a camera capable of recording or detecting individual frames which are then passed to the codec 54 or controller for processing. In other embodiments of the invention, the apparatus may receive the video image data for processing from an adjacent device prior to transmission and/or storage. In other embodiments of the invention, the apparatus 50 may receive either wirelessly or by a wired connection the image for coding/decoding. With respect to Figure 3, a system within which embodiments of the present invention can be utilised is shown. The system 10 comprises multiple communication devices which can communicate through one or more networks. The system 10 may comprise any combination of wired or wireless networks including, but not limited to a wireless cellular telephone network (such as a GSM, UMTS, CDMA network etc), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a Bluetooth personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and the Internet.
The system 10 may include both wired and wireless communication devices or apparatus 50 suitable for implementing embodiments of the invention.
For example, the system shown in Figure 3 shows a mobile telephone network 11 and a representation of the internet 28. Connectivity to the internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.
The example communication devices shown in the system 10 may include, but are not limited to, an electronic device or apparatus 50, a combination personal digital assistant (PDA) and mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, and a notebook computer 22. The apparatus 50 may be stationary or mobile when carried by an individual who is moving. The apparatus 50 may also be located in a mode of transport including, but not limited to, a car, a truck, a taxi, a bus, a train, a boat, an aeroplane, a bicycle, a motorcycle or any similar suitable mode of transport.
Some or further apparatus may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24. The base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the internet 28. The system may include additional communication devices and communication devices of various types.
The communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global system for mobile communications (GSM), universal mobile telecommunications system (UMTS), time division multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol-internet protocol (TCP-IP), short messaging service (SMS), multimedia messaging service (MMS), email, instant messaging service (IMS), Bluetooth, IEEE 802.11 and any similar wireless communication technology. A communications device involved in implementing various embodiments of the present invention may communicate using various media including, but not limited to, radio, infrared, laser, cable connections, and any suitable connection.

Typical video codecs, for example those conforming to the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) H.263 and H.264 coding standards, encode video information in two phases. In the first phase, pixel values in a certain picture area or "block" are predicted. These pixel values can be predicted, for example, by motion compensation mechanisms, which involve finding and indicating an area in one of the previously encoded video frames (or a later coded video frame) that corresponds closely to the block being coded. Additionally, pixel values can be predicted by spatial mechanisms which involve finding and indicating a spatial region relationship. The second phase is one of coding the error between the predicted block of pixels and the original block of pixels. This is typically accomplished by transforming the difference in pixel values using a specified transform. This transform is typically a Discrete Cosine Transform (DCT) or a variant thereof. After transforming the difference, the transformed difference is quantized and entropy encoded.
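Purely as an illustrative sketch of this second phase (the block size, the orthonormal DCT construction and the uniform quantiser step below are assumptions for illustration, not taken from any particular standard), transforming and quantising a prediction error block might look like:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix: row u holds the u-th cosine basis vector.
    u = np.arange(n)[:, None]          # frequency index
    x = np.arange(n)[None, :]          # spatial sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * u / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)         # DC row uses the smaller scale factor
    return m

def code_block(original, predicted, qstep=8):
    """Phase two: transform the prediction error, then quantise it."""
    residual = original.astype(np.int64) - predicted.astype(np.int64)
    d = dct_matrix(residual.shape[0])
    coeffs = d @ residual @ d.T        # separable 2-D DCT of the residual
    return np.round(coeffs / qstep).astype(np.int64)
```

A larger `qstep` discards more of the transformed difference, lowering the bit rate at the cost of picture quality, which is the fidelity trade-off the encoder controls.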
By varying the fidelity of the quantisation process, the encoder can control the balance between the accuracy of the pixel representation (in other words, the quality of the picture) and the size of the resulting encoded video representation (in other words, the file size or transmission bit rate).
The decoder reconstructs the output video by applying a prediction mechanism similar to that used by the encoder in order to form a predicted representation of the pixel blocks (using the motion or spatial information created by the encoder and stored in the compressed representation of the image) and prediction error decoding (the inverse operation of the prediction error coding to recover the quantised prediction error signal in the spatial domain).
After applying pixel prediction and error decoding processes the decoder combines the prediction and the prediction error signals (the pixel values) to form the output video frame. The decoder (and encoder) may also apply additional filtering processes in order to improve the quality of the output video before passing it for display and/or storing as a prediction reference for the forthcoming frames in the video sequence.
In typical video codecs, the motion information is indicated by motion vectors associated with each motion compensated image block. Each of these motion vectors represents the displacement between the image block in the picture to be coded (in the encoder) or decoded (at the decoder) and the prediction source block in one of the previously coded or decoded images (or pictures). In order to represent motion vectors efficiently, motion vectors are typically coded differentially with respect to a block specific predicted motion vector. In a typical video codec, the predicted motion vectors are created in a predefined way, for example by calculating the median of the encoded or decoded motion vectors of the adjacent blocks.
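A minimal sketch of this differential coding (the three-neighbour median choice mirrors the example in the text; the tuple representation of a motion vector is an assumption for illustration):

```python
def median_mv_predictor(neighbour_mvs):
    """Component-wise median of the motion vectors of adjacent blocks."""
    xs = sorted(mv[0] for mv in neighbour_mvs)
    ys = sorted(mv[1] for mv in neighbour_mvs)
    mid = len(neighbour_mvs) // 2
    return (xs[mid], ys[mid])

def mv_difference(mv, predictor):
    # Only this difference is entropy coded into the bit stream.
    return (mv[0] - predictor[0], mv[1] - predictor[1])
```

A decoder forms the same predictor from its already decoded neighbouring blocks and adds the transmitted difference back to recover the motion vector.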
In many video codecs the prediction residual after motion compensation can be first transformed with a transform kernel, for example a discrete cosine transform (DCT), and then coded. This can have the advantage of exploiting any residual correlation which may still be present in the prediction residual, thereby providing for more efficient coding.
Typical video encoders utilise the Lagrangian cost function to find optimal coding modes, for example the desired macro block mode and associated motion vectors. This type of cost function uses a weighting factor λ to tie together the exact or estimated image distortion due to lossy coding methods and the exact or estimated amount of information required to represent the pixel values in an image area. This may be represented by the equation:
C = D + λR, where C is the Lagrangian cost to be minimised, D is the image distortion (in other words, the mean squared error) with the mode and motion vectors currently considered, and R is the number of bits needed to represent the required data to reconstruct the image block in the decoder (including the amount of data to represent the candidate motion vectors).
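A minimal sketch of such a rate-distortion mode decision, assuming each candidate mode is summarised by a (mode name, distortion, rate in bits) triple:

```python
def lagrangian_cost(distortion, rate_bits, lam):
    # C = D + lambda * R
    return distortion + lam * rate_bits

def best_mode(candidates, lam):
    """Return the coding mode minimising the Lagrangian cost C.
    candidates: iterable of (mode_name, distortion, rate_in_bits)."""
    return min(candidates, key=lambda c: lagrangian_cost(c[1], c[2], lam))[0]
```

With a small λ the rate term carries little weight and low-distortion modes win; a large λ favours modes that are cheap to code.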
Scalable video coding (SVC) refers to a video coding methodology in which multiple representations of video content are encoded into a bit stream. Each representation of the video content may be encoded either at different bitrate, resolution or frame rate.
The receiver may then extract the desired encoded video representation depending on the characteristics of the receiving device. For example, a receiving device may select an encoded video representation which best matches the resolution of the display device.
Alternatively, a server or a network element may extract sections of the bitstream for further transmission to a receiver, where such sections may be associated with a particular representation of the encoded video content. The particular representation extracted for transmission may be dependent on factors such as network characteristics or processing capabilities of the receiver.
A typical scalable bitstream may consist of a base layer and one or more enhancement layers. The base layer may provide for the lowest quality video when decoded in the absence of the enhancement layers, and each further enhancement layer may provide further data for the decoding operation. Therefore, subsequently received enhancement layers can result in a progressive improvement to the quality of the decoded video signal.
One of the particular features of layer based coding is the possibility of intervening at any level of the transmission or storage chain, so as to delete a part of the binary stream without having to include any particular indication to the decoder.
Scalable video coding systems may improve coding efficiency of an enhancement layer by exploiting similarities between the base layer and any intervening enhancement layer. For example, SVC may use inter layer prediction of certain video coding parameters.
Information that may be inter layer predicted in an SVC codec can include intra texture, motion and residual data. Inter layer motion prediction may also include the prediction of block coding mode, and header information, wherein motion vectors from a lower layer may be used for prediction of the higher layer. Similarly, the pixel data of base or lower enhancement layers can be used to predict pixel data for an enhancement layer.
In the case of intra coding in an SVC codec, macro blocks of an enhancement layer may be predicted from corresponding surrounding macro blocks or co-located macro blocks from the base layer or intervening enhancement layers. It is to be understood that intra prediction techniques may not employ macro block information from coded pictures at other time instances.
Additionally an SVC codec may employ residual data from the lower or intervening layers for the prediction of the current layer.
In essence a SVC coding system may comprise a conventional non-scalable video coder and decoder which may be used to formulate the base layer coded video content. The base layer encoded video content may then be decoded at the encoding side of the process in order to form the reconstructed base layer pictures. The reconstructed base layer pictures may then form the bases for any inter layer prediction within subsequent enhancement layers.
In some embodiments reconstructed pictures associated with intervening enhancement layers may be used as a base for further inter layer prediction of further intervening enhancement layers.
In order to facilitate the hierarchical coding mechanism of a scalable video codec, reconstructed pictures of base and intervening enhancement coding layers may be stored in a reconstructed picture buffer in order to be available for use in subsequent inter layer coding of further enhancement layers. For example, reconstructed pictures associated with a base layer may be stored in the reference picture buffer of an enhancement layer.
In some video codecs reconstructed pictures may be stored in the form of reference picture lists. For these codecs the base layer reconstructed pictures which are used in the inter layer prediction for a picture of an enhancement layer may be inserted into the reference picture list of the enhancement layer. The video encoder may then select a base layer reference picture as an inter prediction reference and indicate its use as a reference by inserting a reference picture index into the coded bit stream.
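A minimal sketch of this list management, assuming for illustration that pictures are simply held in a Python list whose indices serve as the signalled reference picture indices:

```python
def insert_base_layer_reference(reference_list, reconstructed_base_picture):
    """Append a reconstructed base layer picture to an enhancement layer's
    reference picture list and return the index the encoder would signal."""
    reference_list.append(reconstructed_base_picture)
    return len(reference_list) - 1
```

The decoder, maintaining the same list, uses the signalled index to retrieve the base layer picture for inter layer prediction.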
With respect to Figure 4, a block diagram of a video encoder suitable for carrying out embodiments of the invention is shown. Furthermore, with respect to Figure 5, the operation of the encoder is shown which exemplifies embodiments of the invention specifically with respect to the inter layer prediction and coding of a block of pixels of an enhancement layer picture.
Figure 4 shows the encoder as configured to encode an enhancement layer of a scalable video coding system.
The enhancement layer encoder of Figure 4 is depicted as comprising an enhancement layer pixel predictor 302, a prediction error encoder 303 and a prediction error decoder 304.
The enhancement layer pixel predictor 302 receives the enhanced layer image frame 300 to be encoded at the block inter-layer predictor 306 which can determine the difference between the image 300 and a reference frame 318.
In embodiments the reference frame 318 can be a reconstructed picture frame from the base layer. The reconstructed base layer picture frame 318 may be of the same time instant as that of the enhanced layer picture frame 300. In other words the reconstructed base layer picture frame 318 can be the decoded and reconstructed base layer picture associated with the enhanced layer image frame 300, the reconstructed base layer having been formed by the SVC encoder as a result of a previous operation to encode the base layer picture.
In embodiments the reconstructed base layer picture frame 318 may be stored in the Decoded Picture Buffer (DPB) or picture frame memory associated with the enhancement layer pixel predictor 302.
The operation of the block inter layer predictor 306 will be described hereafter in further detail.
In the following examples the encoder generates images in terms of NxN pixel macro blocks which go to form the full image or picture. It will be appreciated that various different sizes of macro blocks may be adopted by the block inter layer predictor 306. For example, some embodiments may deploy a 16x16 pixel macro block size, whereas other embodiments may deploy other macro block sizes such as an 8x8 pixel macro block size. The block inter layer predictor 306 may receive a macro block of the enhancement layer picture 300, or in other words a macro block of the enhancement layer picture 300 is selected as shown in Figure 5, step 501.
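A sketch of this block-wise traversal (assuming, for simplicity, picture dimensions that are exact multiples of the macro block size):

```python
import numpy as np

def macro_blocks(picture, n=16):
    """Yield ((top, left), block) for each NxN macro block of a picture."""
    height, width = picture.shape
    for top in range(0, height, n):
        for left in range(0, width, n):
            yield (top, left), picture[top:top + n, left:left + n]
```

Passing `n=8` instead of the default selects the alternative 8x8 macro block size mentioned above.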
The block inter-layer predictor 306 may then identify a directionality of prediction, or prediction direction, associated with a macro block of the enhancement layer image.
In the first group of embodiments a single value for the directionality of prediction may be assigned to a macro block. In other words each macro block of pixels may have a specific prediction direction assigned to it, thereby allowing prediction direction to be assigned on a macro block by macro block basis.
Some embodiments may choose a horizontal direction of prediction in which the value of a pixel within a macro block may be predicted from a pixel value sited in the same row within the enhancement layer picture. Typically the pixel value used for the prediction can be drawn from the same row within a neighbouring macro block.
Other embodiments may choose to predict a pixel value using a further pixel value from the same column within the enhancement layer picture as the predicted pixel value. In other words a vertical direction of prediction is used. As before, typically the pixel value used in the prediction can be drawn from the same column within a neighbouring macro block.
The processing step of identifying a prediction direction for pixels of the selected macro block is shown as step 503 in Figure 5. In embodiments the block inter layer predictor 306 may identify a candidate pixel to be predicted from a macro block of the enhancement layer picture 300. With respect to Figure 6, an example of a pixel P(x,y) 601 is shown as being selected for prediction from the macro block 603.
In other words the block inter layer predictor 306 may provide means for selecting a pixel from an enhancement layer picture.
It is to be appreciated that the variables x and y may represent the coordinate position of the pixel P(x,y) within the macro block 603, and the value of the pixel P is a variable dependent on the coordinate position. The step of selecting a pixel P(x,y) for prediction in block of an enhancement layer picture 300 is shown as processing step 505 in Figure 5.
Having identified a direction of prediction associated with the pixel, the block inter layer predictor 306 may determine a reference sample from which the pixel P(x,y) can be predicted. With reference to Figure 6, the reference pixel sample is shown as PR(xR,yR) 605.
It is to be understood in embodiments that the reference pixel PR(xR, yR) 605 may be drawn from a neighbouring macro block within the same enhancement layer picture. The neighbouring macro block may be depicted as 607 in Figure 6.
In other words the block inter layer predictor 306 may provide means for selecting a further pixel for use in the prediction of the pixel, the further pixel being selected from the enhancement layer picture.
In a first group of embodiments the reference pixel PR(xR,yR) may be selected according to either horizontal or vertical directions of prediction. This has the advantage of simplifying any implementation of embodiments since the reference pixel can be obtained from macro blocks directly neighbouring the macro block containing the identified pixel P(x,y).
For example, Figure 6 depicts the case in which the pixel P(x,y) may be predicted using a pixel reference value based upon the vertical prediction mode. In other words the value of the pixel P(x,y) is selected such that it can be predicted using a reference value pixel PR(xR, yR) in the same column.
It is to be understood that for the case of the horizontal prediction mode, the value of the pixel P(x,y) may be selected to be predicted using a reference pixel in the same row from a neighbouring macro block. In the first group of embodiments the reference pixel for predicting the value of pixel P(x,y) can be drawn from the row or column immediately next to the macro block containing the pixel P(x,y).
With reference to Figure 6 in relation to the case of vertical prediction mode, the reference pixel PR(xR, yR) 605 used to predict the value of the pixel P(x,y) 601 at position x, y in the current macro block may be drawn from the row immediately above the macro block and within the same column containing said pixel P(x,y) 601. In Figure 6 this row may be depicted as row 607.
Similarly, for the case of horizontal prediction mode the reference pixel PR(xR, yR) used to predict the value of the pixel P(x,y) at position x, y may be drawn from the column immediately adjoining the vertical side of the macro block and within the same row containing said pixel P(x,y) 601.
In other embodiments which deploy a vertical prediction mode, the reference pixel may be selected from the row directly above the position of the pixel P(x,y) whose value is to be predicted. In other words if the selected pixel lies at the coordinate position x, y within a macro block of the enhancement layer picture, then the corresponding reference pixel used to predict pixel P(x,y) may be the pixel value at position x, y-1, that is PR(xR, yR) = P(x, y-1). The processing step of selecting a pixel in a neighbouring macro block of the enhancement level image 300 from which to predict a value of a pixel in a current macro block of the enhancement level image is shown as processing step 507 in Figure 5.
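The two directional rules above can be sketched as follows (the string names for the two modes are an assumption for illustration; a negative coordinate simply denotes a sample in the neighbouring macro block):

```python
def reference_position(x, y, direction):
    """Position of the reference sample PR(xR, yR) for the pixel at (x, y)."""
    if direction == "vertical":
        return (x, y - 1)      # same column, row above
    if direction == "horizontal":
        return (x - 1, y)      # same row, column to the left
    raise ValueError("unknown prediction direction: " + direction)
```

Because the reference position is a fixed function of the predicted pixel's position and the signalled direction, no per-pixel reference coordinates need to be transmitted.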
The block inter layer predictor 306 may then select a pixel from a corresponding macro block of the reconstructed base layer picture 318 which corresponds to the location of the pixel P(x,y) in the enhancement layer picture 300. In other words the selected reconstructed base layer pixel lies within a corresponding equivalent macro block to the macro block which contains the pixel P(x,y) in the enhancement layer picture, and the pixel selected from the reconstructed base layer has a position within the macro block which corresponds to the position of the pixel P(x,y) in the macro block of the enhancement layer picture.
With reference to Figure 6 the corresponding base layer pixel may be depicted as pixel B(x',y') 610 within the macro block 613 of the reconstructed base layer picture 318. It may be seen that the pixel B(x',y') 610 lies in the same position of the reconstructed base layer picture macro block as the pixel P(x,y) in the macro block of the enhancement layer picture.
The processing step of selecting a pixel in the reconstructed base layer picture macro block which corresponds to the position of the pixel to be predicted in the current macro block of the enhancement level picture is shown as processing step 509 in Figure 5.
In other words the block inter layer predictor 306 may provide means for selecting a pixel from a reconstructed base layer picture, the position of the pixel in the reconstructed base layer picture being equivalent to the position of the selected pixel from the enhancement layer picture. Further, the block inter-layer predictor 306 may also select a pixel in the reconstructed base layer picture 318 which corresponds to the same location as the reference pixel sample PR(xR,yR) 605 in the enhancement layer picture 300.
In other words the selected pixel in the reconstructed base layer picture which corresponds to the reference pixel sample PR(xR,yR) 605 may lie in a macro block whose position in the reconstructed base layer picture corresponds to the same position as the macro block containing the reference pixel sample PR(xR,yR) 605 in the enhancement layer picture. Furthermore, the pixel selected from the reconstructed base layer may have a position within the macro block which corresponds to the position of the pixel PR(xR,yR) 605 in the macro block of the enhancement layer picture.
With reference to Figure 6, the block inter-layer predictor 306 may identify the base layer reference pixel BR(xR',yR') 615 in the reconstructed base layer picture 318 which corresponds to the position of the reference pixel sample PR(xR,yR) 605 in the enhancement layer picture 300.
It is to be appreciated in embodiments that the reconstructed base layer reference pixel BR(xR',yR') 615 may be drawn from a macro block 617 which neighbours the macro block 613 containing the base layer pixel B(x',y') 610.
It is to be further appreciated in embodiments that the neighbouring macro block 617 from which the reconstructed base layer reference pixel BR(xR',yR') 615 may be drawn can correspond to the equivalent macro block 607 in the enhancement layer picture 300.

It is to be understood in some embodiments that the enhancement layer picture 300 may be at a higher resolution than that of the corresponding base layer picture. For these embodiments, the macro blocks of the reconstructed base layer image may be upsampled in order to have the same spatial resolution as the macro blocks of the enhancement layer picture 300. The coordinate positions of the reconstructed base layer picture pixels and reconstructed base layer reference picture pixels may then have equivalent coordinate positions to their corresponding enhancement layer picture pixels and enhancement layer reference picture pixels. In other words the coordinate positional values of P(x,y) and B(x',y') may be the same such that x' = x and y' = y, and the coordinate positional values of PR(xR,yR) and BR(xR',yR') may also be the same whereby xR' = xR and yR' = yR.
In other embodiments the corresponding reconstructed base layer picture may not be upsampled to be of the same spatial resolution as the enhancement layer picture, and as such the coordinate systems of the reconstructed base layer picture and the enhancement layer picture may be different from each other. In these embodiments it may then be necessary to rescale the coordinates of the enhancement layer picture pixels in order to compensate for the mismatch in relative spatial resolutions between the base layer reference picture and the enhancement layer picture. For example, in the situation that the base layer picture has a spatial resolution half that of the enhancement layer picture, the pixel coordinates associated with the enhancement layer picture may have to be scaled by a commensurate factor of a half in order to translate to equivalent coordinates in the base layer picture.
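This coordinate translation can be sketched as follows (assuming, for illustration, a uniform scale factor in both dimensions and simple truncation to integer sample positions):

```python
def to_base_layer_coords(x, y, scale):
    """Map enhancement layer pixel coordinates to base layer coordinates.
    scale: base layer resolution / enhancement layer resolution,
    e.g. 0.5 when the base layer is half resolution."""
    return (int(x * scale), int(y * scale))
```

With `scale = 1.0` (an upsampled or equal-resolution base layer) the mapping is the identity, matching the x' = x, y' = y case above.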
The processing step of selecting a pixel in the reconstructed base layer picture macro block which corresponds to the position of the pixel in the neighbouring macro block which is used to predict the pixel in the current macro block of the enhancement level picture is shown as processing step 511 in Figure 5.
In other words the block inter layer predictor 306 may provide means for selecting a further pixel from the reconstructed base layer picture, the position of the further pixel in the reconstructed base layer picture being equivalent to the position of the selected further pixel in the enhancement layer picture.
The block inter layer predictor 306 may then use the relative difference in pixel values between a reconstructed base layer pixel B(x',y') and its respective reconstructed base layer reference pixel BR(xR',yR') in order to predict the value of the pixel P at the coordinate position x, y relative to a reference pixel PR(xR,yR) in a neighbouring macro block of an enhancement layer picture 300. In other words the difference in pixel values between a reconstructed base layer pixel and its respective reconstructed base layer reference pixel for a specific pixel position may be equivalent to the difference in pixel values between a pixel and its corresponding reference pixel at the same positions within an enhancement layer picture. With reference to Figure 6, a predicted value for the pixel P with coordinate positions x and y, P(x,y), may be obtained by adding the difference of the base layer pixel value B(x',y') and the base layer reference pixel value BR(xR',yR') to the enhancement layer reference sample PR(xR,yR). This may be expressed mathematically in relation to the non-limiting illustrative example of Figure 6 as
P(x,y) = PR(xR,yR) + (B(x',y') - BR(xR',yR')). The processing step of predicting the pixel value P(x,y) in the block inter layer predictor 306 by summing the difference between the corresponding reconstructed base layer pixel and reference base layer pixel with the selected reference pixel of the enhancement layer picture is shown as processing step 513 in Figure 5. In embodiments, if the predicted pixel value P(x,y) at a position x, y exceeds an allowed dynamic range for a value of a pixel, the predicted pixel value may be limited according to
P(x,y) = Clip( PR(xR,yR) + (B(x',y') - BR(xR',yR')) ), where Clip() represents the operation to limit the value of P(x,y) to a desired range. For example, in an 8 bit pixel representation system the value of the predicted pixel P(x,y) may be limited to the range 0 to 255.
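Putting the prediction and the clipping together, a minimal sketch for one 8 bit sample (the parameter names follow the notation above):

```python
def predict_pixel(pr, b, br, low=0, high=255):
    """P(x,y) = Clip( PR(xR,yR) + (B(x',y') - BR(xR',yR')) ).
    pr : enhancement layer reference sample PR(xR, yR)
    b  : co-located reconstructed base layer pixel B(x', y')
    br : co-located base layer reference pixel BR(xR', yR')"""
    return max(low, min(high, pr + (b - br)))
```

The base layer gradient (b - br) is carried over onto the enhancement layer reference sample, and the clip keeps the result inside the 8 bit dynamic range.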
In other words the block inter layer predictor 306 may provide means for predicting the value of the pixel from the enhancement layer picture dependent on the value of the selected further pixel from the enhancement layer picture and on the values of the selected pixel and the selected further pixel from the reconstructed base layer picture.
It is to be understood in embodiments that the processing steps 501 to 513 for finding a predicted value of the pixel P(x,y) of a macro block of the enhancement layer picture may be repeated for each pixel position within the macro block.
It is to be appreciated that the prediction of each pixel value P(x,y) is at least in part dependent on a reference pixel value from a neighbouring reconstructed macro block of the enhancement layer picture which has been previously coded. In other words each pixel value P(x,y) may be backward predicted using a previously encoded pixel value from a neighbouring reconstructed enhancement layer picture macro block.
It is to be further appreciated in embodiments that the pixel values of the first macro block for each enhancement layer picture may be predicted by using a default initial value for each reference pixel value PR(xR,yR). This has the advantage that for each enhancement layer picture there is no requirement for reference pixel information to be transmitted to the decoder.
The output of the block inter layer predictor 306 is a pixel predicted representation of a macro block of the enhancement layer picture, in other words a macro block of the predicted enhancement layer picture. The pixel predicted representation of a macro block of the enhancement layer picture (predicted enhancement layer picture) may be passed to a first summing device 321. The first summing device 321 may subtract the pixel predicted macro block representation of the enhancement layer picture (in other words the predicted enhancement layer picture) from the corresponding macro block of the enhancement layer picture 300 to produce a first prediction error signal 320 which may form an input to the prediction error encoder 303. Additionally the block inter layer predictor 306 may be configured to provide a further output comprising the parameters from which the pixel predicted representation of the enhancement layer picture can be reconstructed. In some embodiments the parameters may comprise, for each macro block of the predicted enhanced layer picture, an indicator for indicating the directionality of prediction.
The prediction error encoder 303 may be configured to carry out any known residual encoding algorithm known in the art for the pixels of a macro block of the prediction error signal 320.
The output from the prediction error encoder 303 may be the encoded macro block of the prediction error signal 325.
The encoded macro block of the prediction error signal 335 may be passed to the entropy encoder 330 and also to the prediction error decoder 304.
The prediction error decoder 304 receives the output from the prediction error encoder 303 and performs the opposite processes of the prediction error encoder 303 to produce a macro block of a decoded prediction error signal 338 which when combined with a macro block of the predicted enhancement layer picture 312 at the second summing device 339 produces a macro block of the reconstructed enhancement layer picture 314. The macro block of the reconstructed enhancement layer picture 314 may then be passed to the block inter layer predictor 306 whereby it can be used for the prediction of a subsequent macro block of the enhancement level picture 300.
In some embodiments the macro block of the reconstructed enhancement layer picture may be passed into a picture frame memory for the prediction of further enhancement layers.
The entropy encoder 330 receives the output of the prediction error encoder 303 and may perform a suitable entropy encoding/variable length encoding on the signal to provide error detection and correction capability. Any suitable entropy encoding algorithm may be employed.
Additionally, the entropy encoder 330 may also receive any parameters from the block inter layer predictor 306 which may be used for the decoding of the predicted enhancement layer picture. It is to be appreciated that the enhancement layer picture encoder generates pictures in terms of NxN pixel macro blocks which go to form the full picture or image. Thus the enhancement layer picture encoder produces a series of encoded macro blocks which, when decoded, represent the reconstructed enhancement layer picture.
For completeness a suitable decoder is hereafter described. Figure 7 shows a block diagram of a video decoder suitable for employing embodiments of the invention. The decoder shows an entropy decoder 700 which performs entropy decoding on the received signal. The entropy decoder thus performs the inverse operation to the entropy encoder 330 of the encoder described above. The entropy decoder 700 outputs the results of the entropy decoding to a prediction error decoder 702 and a decoder enhancement layer pixel predictor 704.
The decoder enhancement layer pixel predictor 704 receives the output of the entropy decoder 700 and outputs a macro block of a predicted representation of the enhancement level picture 716 to a first combiner 713. The predicted representation of the enhancement level picture macro block 716 is used in conjunction with a macro block of the reconstructed prediction error signal 712 to generate a macro block of a reconstructed enhancement level picture 718. The macro block of the reconstructed enhancement level picture 718 may be used in the block inter layer predictor 714 and also may be passed to a filter 720. The filter 720 applies a filtering which outputs a final reconstructed enhancement level picture 722. In some embodiments the final reconstructed enhancement level picture 722 may be stored in a reference frame memory 724, where the reference picture frame memory 724 may be connected to the decoder block inter layer predictor 714 in order to facilitate further prediction operations for higher enhancement layer pictures. It is to be appreciated that the reference picture frame memory 724 may also comprise reconstructed base layer pictures, the reconstructed base layer pictures having been decoded as a separate decoding operation within the SVC decoder.
The operation of the decoder block inter layer predictor 714 is described in further detail with respect to the flow diagram Figure 8.
The decoder block inter layer predictor 714 may form the predicted enhancement layer image on a macro block by macro block basis. Initially the decoder block inter layer predictor 714 may receive a decoded direction of prediction indicator from the entropy decoder 700.
The direction of prediction indicator indicates the direction from which a pixel value lies in relation to another pixel value which it is used to predict.
As mentioned above in a first group of embodiments the direction of prediction indicator may be used to signify the direction of prediction for all pixels within a macro block.
In the first group of embodiments the direction of prediction indicator may indicate whether the pixels in the macro block are predicted using pixels from a horizontal direction to the predicted pixel. Alternatively, the direction of prediction indicator may indicate whether the pixels in the macro block are predicted using pixels from a vertical direction to the predicted pixel. In other words the decoder block inter layer predictor 714 may provide means for receiving a direction of prediction indicator. The indicator indicates a position of a macro block of pixels of the enhancement layer picture relative to another macro block of pixels of the enhancement layer picture and a position of a macro block of pixels of the reconstructed base layer picture relative to another macro block of pixels of the reconstructed base layer picture.
The step of receiving the direction of prediction indicator for the current macro block is shown as processing step 801 in Figure 8.
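Because the indicator applies to every pixel of the macro block, its signalling cost is one decision per block rather than per pixel. A minimal sketch of decoding such a flag follows; the one-bit encoding and all names are illustrative assumptions, not details taken from the description:

```python
# Hypothetical sketch: one direction-of-prediction flag per macro block.
# The bit values and names below are assumptions for illustration only.

HORIZONTAL = 0  # reference pixels lie in the macro block to the left
VERTICAL = 1    # reference pixels lie in the macro block above


def decode_direction_indicator(bit):
    """Decode the per-macro-block direction of prediction indicator.

    The single flag selects the prediction direction for all pixels
    of the macro block, so no per-pixel signalling is needed.
    """
    return 'horizontal' if bit == HORIZONTAL else 'vertical'
```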
The decoder block inter layer predictor 714 may then select a position x,y within the macro block for prediction of the pixel value at that position.
In other words the decoder block inter layer predictor 714 may provide means for selecting a pixel from an enhancement layer picture.
The step of selecting a pixel position for predicting the value thereof in the current macro block is shown as processing step 803 in Figure 8.
As mentioned above, predicting the value of the pixel at position x,y within a macro block may involve determining the reference pixel sample PR(xR, yR) in a previously decoded macro block of the reconstructed enhancement layer picture 722. It is to be understood that in the first group of embodiments the reference pixel sample may be determined to be in a macro block neighbouring the macro block which is currently being decoded. The direction of the neighbouring macro block relative to the currently decoded macro block is given by the direction of prediction indicator. In other words the decoder block inter layer predictor 714 may provide means for selecting a further pixel for use in the prediction of the pixel, the further pixel being selected from the enhancement layer picture.
The step of selecting the reference pixel sample corresponding to the pixel value selected for prediction, the reference sample being selected from a neighbouring macro block of the reconstructed enhancement layer image is shown as processing step 805 in Figure 8.
It is to be appreciated in embodiments that the position of the reference pixel sample PR(xR, yR) may be constant relative to the position of pixel value P(x,y) which it is used to predict. In other words, a particular pixel value P(x,y) within a macro block may use a reference pixel sample PR(xR, yR) which is always at the same position in a neighbouring macro block for the particular pixel value. Therefore for each predicted pixel value there is no requirement to transmit the position xR, yR of its associated reference pixel sample.
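This fixed geometry can be sketched as follows; the 16x16 macro block size, the function name, and the direction labels are assumptions for illustration, not details taken from the description. Because the reference position is a constant function of the pixel position and the direction indicator, it can be derived locally and never needs to be transmitted:

```python
# Hypothetical sketch: deriving the reference pixel position from the
# pixel position and the direction of prediction indicator.

MB_SIZE = 16  # assumed macro block dimension


def reference_position(x, y, direction):
    """Map a pixel position (x, y), given relative to the current macro
    block origin, to its fixed reference position in the neighbouring
    macro block. Negative coordinates therefore index into the
    previously decoded neighbouring block.

    direction: 'horizontal' -> reference lies in the block to the left
               'vertical'   -> reference lies in the block above
    """
    if direction == 'horizontal':
        return (x - MB_SIZE, y)   # same row, neighbouring block to the left
    if direction == 'vertical':
        return (x, y - MB_SIZE)   # same column, neighbouring block above
    raise ValueError(f"unknown direction: {direction}")
```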
Once the decoder block inter layer predictor 714 has selected the pixel value P(x,y) to predict and identified its corresponding reference sample PR(xR, yR) from a previously decoded neighbouring macro block of the predicted enhancement layer picture, the decoder block inter layer predictor 714 may identify the corresponding base layer pixel B(x',y') and its respective base layer reference pixel BR(xR',yR').
It is to be appreciated in embodiments that a decoded base layer picture which may be associated with the reconstructed enhancement layer picture may be resident in the reference picture frame memory as a result of an earlier decoding operation by the SVC decoder. Thus, the base layer pixel B(x',y') and its respective base layer reference pixel BR(xR',yR') may be drawn from the decoded reconstructed base layer picture associated with the reconstructed enhancement layer picture.
The processing step of selecting the base layer pixel B(x',y') and its respective base layer reference pixel BR(xR',yR') from the corresponding macro block of the reconstructed base layer image is shown as processing steps 807 and 809 in Figure 8.
In other words the decoder block inter layer predictor 714 may provide means for selecting a pixel from the reconstructed base layer picture, the position of the pixel in the reconstructed base layer picture being equivalent to the position of the selected pixel from the enhancement layer picture. Further, the decoder block inter layer predictor 714 may provide means for selecting a further pixel from the reconstructed base layer picture, the position of the further pixel in the reconstructed base layer picture being equivalent to the position of the selected further pixel from the enhancement layer picture.
Once the respective pixels have been identified, the value of the pixel P(x,y) may then be determined by applying the expression given above:
P(x,y) = PR(xR,yR) + (B(x',y') - BR(xR',yR')).
The processing step of determining the value of the predicted pixel P(x,y) is shown as processing step 811 in Figure 8.
As above, should the dynamic range of the predicted pixel P(x,y) exceed a predetermined value, then the value of the predicted pixel P(x,y) may be clipped according to P(x,y) = Clip ( PR(xR,yR) + (B(x',y') - BR(xR',yR')) ).
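The prediction and clipping steps can be sketched as below; the 8-bit sample range and the function names are assumptions for illustration:

```python
def clip(value, low, high):
    """Restrict a sample to the allowed dynamic range."""
    return max(low, min(high, value))


def predict_pixel(p_ref, b, b_ref, bit_depth=8):
    """P(x,y) = Clip( PR(xR,yR) + (B(x',y') - BR(xR',yR')) ).

    p_ref:   reference sample PR from the neighbouring enhancement
             layer macro block
    b, b_ref: corresponding base layer pixel B and base layer
             reference pixel BR
    """
    high = (1 << bit_depth) - 1  # 255 for the assumed 8-bit samples
    return clip(p_ref + (b - b_ref), 0, high)
```

The enhancement layer pixel is thus the neighbouring enhancement layer sample corrected by the gradient observed between the co-located base layer samples.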
In other words the decoder block inter layer predictor 714 may provide means for predicting the value of the pixel from the enhancement layer picture dependent on the value of the selected further pixel from the enhancement layer picture and on the values of the selected pixel and the selected further pixel from the reconstructed base layer picture.
It is to be understood that the above processing steps 801 to 811 may be repeated for each pixel position x,y in the currently decoded macro block of the reconstructed enhancement level picture.
Once the current macro block has been fully decoded to give a decoded macro block of the predicted enhancement layer picture, it may be combined with the corresponding macro block of the reconstructed prediction error signal 712 via the summer 713 to give a decoded macro block of the reconstructed enhancement layer picture.
It is to be appreciated that the decoded macro block of the reconstructed enhancement layer picture may then form a neighbouring macro block from which the pixels of the next macro block of the decoded predicted enhancement layer picture are predicted.
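The combining operation performed by the summer 713 can be sketched as below; the function name, the 2-D list representation, and the 8-bit clipping are assumptions for illustration:

```python
def reconstruct_macro_block(predicted_mb, residual_mb):
    """Add the reconstructed prediction error signal to the predicted
    macro block to form the reconstructed macro block (the role of the
    summer 713). Both arguments are equally sized 2-D lists of samples;
    each result sample is clipped to the assumed 8-bit range 0..255.
    """
    return [[max(0, min(255, p + r))
             for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(predicted_mb, residual_mb)]
```

The reconstructed block then becomes available as a neighbouring block for predicting pixels of the next macro block.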
The above steps may then be repeated for each macro block in the decoded reconstructed enhancement layer picture. The embodiments of the invention described above describe the codec in terms of separate encoder and decoder apparatus in order to assist the understanding of the processes involved. However, it would be appreciated that the apparatus, structures and operations may be implemented as a single encoder-decoder apparatus/structure/operation. Furthermore, in some embodiments of the invention the coder and decoder may share some or all common elements.
Although the above examples describe embodiments of the invention operating within an electronic device or apparatus, it would be appreciated that the application as described above may be implemented as part of any video codec. Thus, for example, embodiments of the application may be implemented in a video codec which may implement video coding over fixed or wired communication paths.
Thus user equipment may comprise a video codec such as those described in the embodiments of the application above. It shall be appreciated that the term user equipment is intended to cover any suitable type of wireless user equipment, such as mobile telephones, portable data processing devices or portable web browsers. Furthermore, elements of a public land mobile network (PLMN) may also comprise video codecs as described above.
In general, the various embodiments described above may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
The embodiments of the application may be implemented by computer software executable by a data processor, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disks or floppy disks, and optical media such as, for example, digital versatile discs (DVD), compact discs (CD) and data variants thereof.
The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate. Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.
As used in this application, the term circuitry may refer to all of the following: (a) hardware-only circuit implementations (such as implementations in only analogue and/or digital circuitry); (b) combinations of circuits and software (and/or firmware), such as, where applicable: (i) a combination of processor(s) or (ii) portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions; and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry would also cover an implementation of merely a processor (or multiple processors) or portion of a processor and its (or their) accompanying software and/or firmware. The term circuitry would also cover, for example and if applicable to the particular claim element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, or other network device.
The terms processor and memory may comprise, but are not limited to, in this application: (1) one or more microprocessors, (2) one or more processor(s) with accompanying digital signal processor(s), (3) one or more processor(s) without accompanying digital signal processor(s), (4) one or more special-purpose computer chips, (5) one or more field-programmable gate arrays (FPGAs), (6) one or more controllers, (7) one or more application-specific integrated circuits (ASICs), or detector(s), processor(s) (including dual-core and multiple-core processors), digital signal processor(s), controller(s), receiver, transmitter, encoder, decoder, memory (and memories), software, firmware, RAM, ROM, display, user interface, display circuitry, user interface circuitry, user interface software, display software, circuit(s), antenna, antenna circuitry, and circuitry.

Claims

CLAIMS:
1. A method comprising:
selecting a pixel from a first video picture;
selecting a further pixel from the first video picture;
selecting a pixel from a second video picture, wherein the position of the pixel from the second video picture is equivalent to the position of the selected pixel from the first video picture;
selecting a further pixel from the second video picture, wherein the position of the further pixel from the second video picture is equivalent to the position of the selected further pixel from the first video picture; and
predicting the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and on the values of the selected pixel and the selected further pixel in the second video picture.
2. The method as claimed in claim 1 , wherein predicting the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and the values of the selected pixel and the selected further pixel from the second video picture comprises:
predicting the value of the pixel from the first video picture by summing the value of the selected further pixel from the first video picture to the difference in values between the selected pixel and the selected further pixel from the second video picture.
3. The method as claimed in claims 1 and 2, wherein selecting the pixel from the first video picture comprises selecting the pixel from a macro block of pixels of the first video picture, wherein selecting the further pixel from the first video picture comprises selecting the further pixel from a further macro block of pixels from the first video picture, wherein selecting the pixel from the second video picture comprises selecting the pixel from a macro block of pixels of the second video picture, and wherein selecting the further pixel from the second video picture comprises selecting the further pixel from a further macro block of pixels of the second video picture.
4. The method as claimed in claim 3, wherein the further macro block of pixels of the first video picture neighbours the macro block of pixels of the first video picture, and wherein the further macro block of pixels of the second video picture neighbours the macro block of pixels of the second video picture.
5. The method as claimed in claims 3 and 4, further comprising:
generating a direction of prediction indicator indicating a position of the further macro block of pixels of the first video picture relative to the macro block of pixels of the first video picture and a position of the further macro block of pixels of the second video picture relative to the macro block of pixels of the second video picture.
6. The method as claimed in claim 5, wherein the direction of prediction indicator indicates that the further macro block of pixels of the first video picture is horizontally positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture is horizontally positioned relative to the macro block of pixels of the second video picture.
7. The method as claimed in claim 5, wherein the direction of prediction indicator indicates that the further macro block of pixels of the first video picture is vertically positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture is vertically positioned relative to the macro block of pixels of the second video picture.
8. The method as claimed in claims 1 to 7, wherein the first video picture is associated with an enhancement level picture of a scalable video coder, and wherein the second video picture is associated with a reconstructed base layer picture of a scalable video coder.
9. An apparatus configured to:
select a pixel from a first video picture;
select a further pixel from the first video picture;
select a pixel from a second video picture, wherein the position of the pixel from the second video picture is equivalent to the position of the selected pixel from the first video picture;
select a further pixel from the second video picture, wherein the position of the further pixel from the second video picture is equivalent to the position of the selected further pixel from the first video picture; and
predict the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and on the values of the selected pixel and the selected further pixel in the second video picture.
10. The apparatus as claimed in claim 9, wherein the apparatus configured to predict the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and the values of the selected pixel and the selected further pixel from the second video picture is further configured to:
predict the value of the pixel from the first video picture by summing the value of the selected further pixel from the first video picture to the difference in values between the selected pixel and the selected further pixel from the second video picture.
11. The apparatus as claimed in claims 9 and 10, wherein the apparatus configured to select the pixel from the first video picture is configured to select the pixel from a macro block of pixels of the first video picture, wherein the apparatus configured to select the further pixel from the first video picture is configured to select the further pixel from a further macro block of pixels from the first video picture, wherein the apparatus configured to select the pixel from the second video picture is configured to select the pixel from a macro block of pixels of the second video picture, and wherein the apparatus configured to select the further pixel from the second video picture is configured to select the further pixel from a further macro block of pixels of the second video picture.
12. The apparatus as claimed in claim 11, wherein the further macro block of pixels of the first video picture neighbours the macro block of pixels of the first video picture, and wherein the further macro block of pixels of the second video picture neighbours the macro block of pixels of the second video picture.
13. The apparatus as claimed in claims 11 and 12, wherein the apparatus is further configured to:
generate a direction of prediction indicator indicating a position of the further macro block of pixels of the first video picture relative to the macro block of pixels of the first video picture and a position of the further macro block of pixels of the second video picture relative to the macro block of pixels of the second video picture.
14. The apparatus as claimed in claim 13, wherein the direction of prediction indicator indicates that the further macro block of pixels of the first video picture is horizontally positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture is horizontally positioned relative to the macro block of pixels of the second video picture.
15. The apparatus as claimed in claim 13, wherein the direction of prediction indicator indicates that the further macro block of pixels of the first video picture is vertically positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture is vertically positioned relative to the macro block of pixels of the second video picture.
16. The apparatus as claimed in claims 9 to 15, wherein the first video picture is associated with an enhancement level picture of a scalable video coder, and wherein the second video picture is associated with a reconstructed base layer picture of a scalable video coder.
17. An apparatus comprising at least one processor and at least one memory including computer code for one or more programs, the at least one memory and the computer code configured with the at least one processor to cause the apparatus at least to:
select a pixel from a first video picture;
select a further pixel from the first video picture;
select a pixel from a second video picture, wherein the position of the pixel from the second video picture is equivalent to the position of the selected pixel from the first video picture;
select a further pixel from the second video picture, wherein the position of the further pixel from the second video picture is equivalent to the position of the selected further pixel from the first video picture; and
predict the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and on the values of the selected pixel and the selected further pixel in the second video picture.
18. The apparatus as claimed in claim 17, wherein the apparatus caused to predict the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and the values of the selected pixel and the selected further pixel from the second video picture is further caused to:
predict the value of the pixel from the first video picture by summing the value of the selected further pixel from the first video picture to the difference in values between the selected pixel and the selected further pixel from the second video picture.
19. The apparatus as claimed in claims 17 and 18, wherein the apparatus caused to select the pixel from the first video picture is caused to select the pixel from a macro block of pixels of the first video picture, wherein the apparatus caused to select the further pixel from the first video picture is caused to select the further pixel from a further macro block of pixels from the first video picture, wherein the apparatus caused to select the pixel from the second video picture is caused to select the pixel from a macro block of pixels of the second video picture, and wherein the apparatus caused to select the further pixel from the second video picture is caused to select the further pixel from a further macro block of pixels of the second video picture.
20. The apparatus as claimed in claim 19, wherein the further macro block of pixels of the first video picture neighbours the macro block of pixels of the first video picture, and wherein the further macro block of pixels of the second video picture neighbours the macro block of pixels of the second video picture.
21. The apparatus as claimed in claims 19 and 20, wherein the apparatus is further caused to:
generate a direction of prediction indicator indicating a position of the further macro block of pixels of the first video picture relative to the macro block of pixels of the first video picture and a position of the further macro block of pixels of the second video picture relative to the macro block of pixels of the second video picture.
22. The apparatus as claimed in claim 21, wherein the direction of prediction indicator indicates that the further macro block of pixels of the first video picture is horizontally positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture is horizontally positioned relative to the macro block of pixels of the second video picture.
23. The apparatus as claimed in claim 21, wherein the direction of prediction indicator indicates that the further macro block of pixels of the first video picture is vertically positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture is vertically positioned relative to the macro block of pixels of the second video picture.
24. The apparatus as claimed in claims 17 to 23, wherein the first video picture is associated with an enhancement level picture of a scalable video coder, and wherein the second video picture is associated with a reconstructed base layer picture of a scalable video coder.
25. A non-transitory computer-readable storage medium having stored thereon computer-readable code, which, when executed by computing apparatus, causes the computing apparatus to perform a method comprising:
selecting a pixel from a first video picture;
selecting a further pixel from the first video picture;
selecting a pixel from a second video picture, wherein the position of the pixel from the second video picture is equivalent to the position of the selected pixel from the first video picture;
selecting a further pixel from the second video picture, wherein the position of the further pixel from the second video picture is equivalent to the position of the selected further pixel from the first video picture; and
predicting the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and on the values of the selected pixel and the selected further pixel in the second video picture.
26. The non-transitory computer-readable storage medium having stored thereon computer-readable code as claimed in claim 25, wherein predicting the value of the pixel from the first video picture dependent on the value of the selected further pixel from the first video picture and the values of the selected pixel and the selected further pixel in the second video picture causes the computing apparatus to perform:
predicting the value of the pixel from the first video picture by summing the value of the selected further pixel from the first video picture to the difference in values between the selected pixel and the selected further pixel from the second video picture.
27. The non-transitory computer-readable storage medium having stored thereon computer-readable code as claimed in claims 25 and 26, wherein selecting the pixel from the first video picture causes the computing apparatus to perform selecting the pixel from a macro block of pixels of the first video picture, wherein selecting the further pixel from the first video picture causes the computing apparatus to perform selecting the further pixel from a further macro block of pixels from the first video picture, wherein selecting the pixel from the second video picture causes the computing apparatus to perform selecting the pixel from a macro block of pixels of the second video picture, and wherein selecting the further pixel from the second video picture causes the computing apparatus to perform selecting the further pixel from a further macro block of pixels of the second video picture.
28. The non-transitory computer-readable storage medium having stored thereon computer-readable code as claimed in claim 27, wherein the further macro block of pixels of the first video picture neighbours the macro block of pixels of the first video picture, and wherein the further macro block of pixels of the second video picture neighbours the macro block of pixels of the second video picture.
29. The non-transitory computer-readable storage medium having stored thereon computer-readable code as claimed in claims 27 and 28, which, when executed by computing apparatus, further causes the computing apparatus to perform:
generating a direction of prediction indicator indicating a position of the further macro block of pixels of the first video picture relative to the macro block of pixels of the first video picture and a position of the further macro block of pixels of the second video picture relative to the macro block of pixels of the second video picture.
30. The non-transitory computer-readable storage medium having stored thereon computer-readable code as claimed in claim 29, wherein the direction of prediction indicator indicates that the further macro block of pixels of the first video picture is horizontally positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture is horizontally positioned relative to the macro block of pixels of the second video picture.
31. The non-transitory computer-readable storage medium having stored thereon computer-readable code as claimed in claim 29, wherein the direction of prediction indicator indicates that the further macro block of pixels of the first video picture is vertically positioned relative to the macro block of pixels of the first video picture and the further macro block of pixels of the second video picture is vertically positioned relative to the macro block of pixels of the second video picture.
32. The non-transitory computer-readable storage medium having stored thereon computer-readable code as claimed in claims 25 to 31, wherein the first video picture is associated with an enhancement level picture of a scalable video coder, and wherein the second video picture is associated with a reconstructed base layer picture of a scalable video coder.
33. A computer program comprising instructions that, when executed by a computer apparatus, control it to perform the method of any of claims 1 to 8.
PCT/FI2012/050701 2012-07-03 2012-07-03 A method and apparatus for scalable video coding WO2014006263A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/FI2012/050701 WO2014006263A1 (en) 2012-07-03 2012-07-03 A method and apparatus for scalable video coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/FI2012/050701 WO2014006263A1 (en) 2012-07-03 2012-07-03 A method and apparatus for scalable video coding

Publications (1)

Publication Number Publication Date
WO2014006263A1 true WO2014006263A1 (en) 2014-01-09

Family

ID=49881404

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2012/050701 WO2014006263A1 (en) 2012-07-03 2012-07-03 A method and apparatus for scalable video coding

Country Status (1)

Country Link
WO (1) WO2014006263A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090168872A1 (en) * 2005-01-21 2009-07-02 Lg Electronics Inc. Method and Apparatus for Encoding/Decoding Video Signal Using Block Prediction Information


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIANG W. ET AL.: "Gradient based fast mode decision algorithm for intra prediction in HEVC", 2ND INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS, COMMUNICATIONS AND NETWORKS (CECNET), 21 April 2012 (2012-04-21), YICHANG, pages 1836 - 1840, XP032182054, DOI: doi:10.1109/CECNet.2012.6201851 *
SHI, Z. ET AL.: "CGS quality scalability for HEVC", 2011 IEEE 13TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 17 October 2011 (2011-10-17), HANGZHOU *

Similar Documents

Publication Publication Date Title
US20210409756A1 (en) Method for video coding and an apparatus
US20230179792A1 (en) Method for coding and an apparatus
US11368700B2 (en) Apparatus, a method and a computer program for video coding
US9280835B2 (en) Method for coding and an apparatus based on a DC prediction value
US20120243606A1 (en) Methods, apparatuses and computer programs for video coding
US9432699B2 (en) Methods, apparatuses and computer programs for video coding
EP2813078A1 (en) Method for coding and an apparatus
WO2014006263A1 (en) A method and apparatus for scalable video coding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12880328

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12880328

Country of ref document: EP

Kind code of ref document: A1