WO2013109179A1 - Output of decoded reference pictures - Google Patents

Output of decoded reference pictures

Info

Publication number
WO2013109179A1
Authority
WO
WIPO (PCT)
Prior art keywords
picture
output
value
decoded
order count
Prior art date
Application number
PCT/SE2012/051372
Other languages
French (fr)
Inventor
Jonatan SAMUELSSON
Rickard Sjöberg
Original Assignee
Telefonaktiebolaget L M Ericsson (Publ)
Priority date
Filing date
Publication date
Application filed by Telefonaktiebolaget L M Ericsson (Publ) filed Critical Telefonaktiebolaget L M Ericsson (Publ)
Priority to EP12818654.1A priority Critical patent/EP2805490A1/en
Publication of WO2013109179A1 publication Critical patent/WO2013109179A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/423Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/174Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
    • H04N19/46Embedding additional information in the video signal during the compression process
    • H04N19/70Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/31Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the temporal domain

Definitions

  • the embodiments generally relate to encoding and decoding of pictures, and in particular to outputting decoded reference pictures from a decoded picture buffer.
  • H.264: Moving Picture Experts Group-4 (MPEG-4) Advanced Video Coding (AVC)
  • MPEG-4: Moving Picture Experts Group-4
  • AVC: Advanced Video Coding
  • JCT-VC: Joint Collaborative Team on Video Coding
  • JCT-VC is a collaborative project between MPEG and the International Telecommunication Union Telecommunication Standardization Sector (ITU-T).
  • ITU-T: International Telecommunication Union Telecommunication Standardization Sector
  • WD: HEVC Working Draft
  • the HEVC WD specifies that each picture shall belong to a temporal layer and that a syntax element called temporal_id shall be present for each picture in the bitstream, corresponding to the temporal layer the picture belongs to.
  • the temporal layers are ordered and have the property that a lower temporal layer never references a higher temporal layer. Thus, higher temporal layers can be removed without affecting the lower temporal layers.
  • the removal of temporal layers can be referred to as temporal scaling. Removal of layers can be done in an entity that is neither an encoder nor a decoder, such as a network node. Such an entity can, for instance, forward video bitstream packets from an encoder to a decoder and perform removal of temporal layers without performing full video decoding on the incoming data.
  • the resulting bitstream after one or more temporal layers have been removed is called a subsequence.
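As an illustration of the layer-removal property described above, the following sketch (the `Picture` class and function name are hypothetical, not from the patent) forms a subsequence by dropping pictures above a chosen temporal layer:

```python
from dataclasses import dataclass

@dataclass
class Picture:
    poc: int          # picture order count (display order)
    temporal_id: int  # temporal layer the picture belongs to

def extract_subsequence(bitstream, max_temporal_id):
    """Keep only pictures whose temporal layer does not exceed max_temporal_id.

    Because a lower temporal layer never references a higher one, the result
    remains decodable without decoding the removed pictures.
    """
    return [pic for pic in bitstream if pic.temporal_id <= max_temporal_id]

pics = [Picture(0, 0), Picture(2, 1), Picture(1, 2), Picture(4, 0), Picture(3, 2)]
sub = extract_subsequence(pics, max_temporal_id=1)
# sub keeps the pictures with temporal_id 0 or 1 (POCs 0, 2, 4)
```

This is exactly what a layer-removing network node could do on NAL-unit granularity, without full video decoding.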
  • HEVC: High Efficiency Video Coding
  • in HEVC it is possible to signal that a picture is a temporal layer switching point, which indicates that at this picture it is possible for a decoder to start decoding more temporal layers than what was decoded before the switching point.
  • the switching point indication guarantees that no picture following the switching point references a picture from before the switching point that might not have been decoded because it belongs to a higher temporal layer than what was decoded before the switching point.
  • the switching points are therefore very useful for a layer removal entity in order to know when to stop removing a certain temporal layer and start forwarding it.
  • the output process is changed compared to H.264/AVC so that marking of pictures as "unused for prediction" is performed prior to decoding of the current picture.
  • the output process is also performed prior to the decoding of the current picture.
  • HEVC defines a Decoded Picture Buffer (DPB) that consists of frame buffers, also referred to as picture slots or picture buffers, in which decoded pictures are stored.
  • the DPB size is determined from syntax elements in the bitstream.
  • the DPB fullness increases by one when a picture is inserted into the DPB and decreases by one when a picture is removed from the DPB. If the DPB fullness is equal to the DPB size there are no empty frame buffers and the DPB is said to be full.
  • the "bumping" process that is used for outputting pictures basically consists of filling up the DPB and then outputting as few pictures as possible, in correct output order, to free up a frame buffer for the current picture.
  • the "bumping" is invoked repeatedly until there is an empty frame buffer in which to store the current decoded picture.
  • Picture Order Count (POC), represented by the variable PicOrderCntVal, is used in HEVC to define the display order of pictures. POC is also used to identify reference pictures.
  • the "bumping" process consists of the following steps:
  • the picture that is first for output is selected as the one having the smallest value of PicOrderCntVal of all pictures in the DPB marked as "needed for output".
  • IDR: Instant Decoder Refresh
  • the "bumping" process includes filling up the DPB before starting to output.
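The prior-art "bumping" steps above can be sketched as follows. The DPB is modeled as a plain list of records, which is an illustrative simplification: in the real process, pictures still used for reference remain in the DPB after being output.

```python
def bump(dpb, dpb_size):
    """Repeatedly output the picture with the smallest PicOrderCntVal that is
    marked "needed for output", until an empty frame buffer is available."""
    output = []
    while len(dpb) >= dpb_size:  # DPB full: no empty frame buffer
        candidates = [p for p in dpb if p["needed_for_output"]]
        if not candidates:
            break
        first = min(candidates, key=lambda p: p["poc"])  # first in output order
        output.append(first["poc"])
        dpb.remove(first)
    return output

dpb = [{"poc": 3, "needed_for_output": True},
       {"poc": 1, "needed_for_output": True},
       {"poc": 2, "needed_for_output": True}]
result = bump(dpb, 3)  # with DPB size 3, only POC 1 is bumped to free a buffer
```

Note that this outputs as few pictures as possible, which is the delay-inducing behavior the embodiments aim to improve on.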
  • a subsequence consisting of a subset of the temporal layers in the original sequence uses a different DPB size.
  • An encoder that would signal different DPB sizes for different temporal layers would, however, be required to evaluate the "bumping" operations for each subsequence, i.e. for each temporal layer, to validate that DPB size requirements are fulfilled and the display order of pictures is correct.
  • if the encoder uses different DPB sizes for different temporal layers and additionally signals switching points, it has to track all possible switching alternatives and keep track of the output status in each subsequence to ensure that the picture order is correct for each possible subsequence. This is also the case for a layer removing entity.
  • while the encoder can control how many subsequences there are, and thereby the complexity of this problem, a layer removing entity cannot; it has to handle every possible incoming bitstream.
  • An aspect of the embodiments relates to a method of outputting decoded pictures of a video stream from a decoded picture buffer in a decoder.
  • the method comprises calculating a picture order count limit based on at least one syntax element retrieved based on an encoded representation of a current picture of the video stream.
  • the method also outputs decoded reference pictures stored in the decoded picture buffer and having a respective picture order count value that is lower than the calculated picture order count limit.
  • a related aspect of the embodiments defines a decoder comprising a decoded picture buffer configured to store decoded pictures of a video stream.
  • a limit calculator of the decoder is configured to calculate a picture order count limit based on at least one syntax element retrieved based on an encoded representation of a current picture of the video stream.
  • the decoder also comprises a picture outputting unit configured to output decoded reference pictures stored in the decoded picture buffer and having a respective picture order count value lower than the picture order count limit calculated by the limit calculator.
  • a further related aspect of the embodiments defines a receiver comprising a decoder comprising a decoded picture buffer configured to store decoded pictures of a video stream.
  • a limit calculator of the decoder is configured to calculate a picture order count limit based on at least one syntax element retrieved based on an encoded representation of a current picture of the video stream.
  • the decoder also comprises a picture outputting unit configured to output decoded reference pictures stored in the decoded picture buffer and having a respective picture order count value lower than the picture order count limit calculated by the limit calculator.
  • Another aspect of the embodiments relates to a method of encoding a current picture of a video stream in an encoder.
  • the method comprises determining a picture order count limit to have a value enabling a target state of a decoded picture buffer in a decoder for the current picture.
  • the picture order count limit defines a number of decoded reference pictures to be output from the decoded picture buffer in a picture output process for the current picture.
  • the method also comprises determining at least one syntax element representative of the picture order count limit.
  • the current picture is encoded to get an encoded representation of the current picture.
  • the at least one syntax element is associated with this encoded representation.
  • Another related aspect of the embodiments defines an encoder comprising a limit determiner configured to determine a picture order count limit to have a value enabling a target state of a decoded picture buffer in a decoder for a current picture of a video stream.
  • the determined picture order count limit defines a number of decoded reference pictures to be output from the decoded picture buffer in a picture output process for the current picture.
  • a syntax element determiner is configured to determine at least one syntax element representative of the value determined for the picture order count limit.
  • An encoding unit is configured to encode the current picture to get an encoded representation of the current picture.
  • the encoder also comprises an associating unit configured to associate the at least one syntax element with the encoded representation.
  • Yet another related aspect of the embodiments defines a transmitter comprising an encoder comprising a limit determiner configured to determine a picture order count limit to have a value enabling a target state of a decoded picture buffer in a decoder for a current picture of a video stream.
  • the determined picture order count limit defines a number of decoded reference pictures to be output from the decoded picture buffer in a picture output process for the current picture.
  • a syntax element determiner is configured to determine at least one syntax element representative of the value determined for the picture order count limit.
  • An encoding unit is configured to encode the current picture to get an encoded representation of the current picture.
  • the encoder also comprises an associating unit configured to associate the at least one syntax element with the encoded representation.
  • the present embodiments provide a picture output process enabling control of which decoded reference pictures to output from the decoded picture buffer and of the timing of outputting them. As a consequence, any delay in outputting decoded reference pictures can be reduced as compared to the prior art "bumping" process. In fact, with the present embodiments it is possible to start outputting decoded reference pictures even if the decoded picture buffer fullness has not reached the defined decoded picture buffer size.
  • the embodiments are particularly advantageous in connection with temporally scaled video streams and sequences, where otherwise significant output delays can occur due to removal of various temporal layers.
  • Fig. 1 is a schematic illustration of a video stream of pictures comprising one or more slices
  • Fig. 2 is an illustration of a data packet comprising a NAL unit
  • Fig. 3 is an illustration of an encoded representation of a slice
  • Fig. 4 is a flow diagram of a method of outputting decoded pictures according to an embodiment
  • Fig. 5 is a flow diagram illustrating additional, optional steps of the method in Fig. 4 according to an embodiment
  • Fig. 6 is a flow diagram illustrating additional, optional steps of the method in Figs. 4 and 5 according to an embodiment
  • Fig. 7 is a flow diagram illustrating additional, optional steps of the method in Fig. 4 according to another embodiment
  • Fig. 8 is a flow diagram illustrating additional, optional steps of the method in Fig. 7 according to an embodiment
  • Fig. 9 is a flow diagram of a method of encoding a picture according to an embodiment
  • Fig. 10 is a flow diagram illustrating additional, optional steps of the method in Fig. 9 according to an embodiment
  • Fig. 11 is a schematic block diagram of a receiver according to an embodiment
  • Fig. 12 is a schematic block diagram of a decoder according to an embodiment
  • Fig. 13 is a schematic block diagram of a decoder according to another embodiment
  • Fig. 14 is a schematic block diagram of a transmitter according to an embodiment
  • Fig. 15 is a schematic block diagram of an encoder according to an embodiment
  • Fig. 16 is a schematic block diagram of an encoder according to another embodiment.

DETAILED DESCRIPTION
  • the present embodiments generally relate to the field of encoding and decoding of pictures of a video stream.
  • the embodiments relate to outputting decoded reference pictures from a decoded picture buffer (DPB).
  • the embodiments hence provide an outputting process that could be used instead of the current "bumping" process mentioned in the background section.
  • the outputting process provides significant advantages over the prior art bumping process in terms of reducing output delay. These advantages are in particular obtained in connection with temporally scaled video sequences. It is proposed herein to use a limit calculated from syntax elements signaled in the bitstream, i.e. in the encoded data generated by an encoder and transmitted to a decoder.
  • this limit is then used at the decoder to identify those decoded reference pictures stored in the DPB that should be output, such as for display.
  • all pictures in the DPB with a picture order count value, such as PicOrderCntVal, lower than the limit and that have not yet been output are output, such as for display.
  • a syntax element is a codeword or data element forming part of the encoded data generated by an encoder and to be decoded by a decoder.
  • more specifically, a syntax element is typically a codeword or data element forming part of the control data associated with, or the header data present in, an encoded representation of a picture.
  • a syntax element can, for instance, be a codeword in a slice header of the encoded representation of a picture.
  • a syntax element can, for instance, be a codeword in a parameter set or other control data associated with the encoded representation of a picture, e.g. retrievable from the bitstream based on data present in the encoded representation or sent outside of the bitstream but retrievable based on data present in the encoded representation.
  • a picture is typically output for display on a screen of or connected to the decoder.
  • a picture could also be output for other reasons including, but not limited to, storage in a file; coding in another format or with other properties, such as transcoding; delivery to another unit or device for post-decoding processing; etc.
  • Output as used herein therefore typically relates to output for display but also encompasses other forms of picture output, such as any of the above mentioned examples.
  • Fig. 1 is a schematic illustration of a video stream 1 of pictures 2.
  • a picture 2 in HEVC is partitioned into one or more slices 3, where each slice 3 is an independently decodable segment of a picture 2. This means that if a slice 3 is missing, for instance lost during transmission, the other slices 3 of that picture 2 can still be decoded correctly.
  • in order to make slices 3 independent, they should not depend on each other. Hence, in a particular embodiment, no bitstream element of a slice 3 is required for decoding any element of another slice 3.
  • NAL: Network Abstraction Layer
  • a coded video stream or sequence is also referred to as a bitstream.
  • a NAL unit 11 comprises either a slice with a corresponding slice header including control information for that slice, or the NAL unit 11 comprises a parameter set.
  • the parameter set comprises control information.
  • a NAL unit 11 as output from an encoder is typically complemented with headers 12 to form a data packet 10 that can be transmitted as a part of a bitstream from the encoder to the decoder.
  • for instance, Real-time Transport Protocol (RTP), User Datagram Protocol (UDP) and Internet Protocol (IP) headers 12 could be added to the NAL unit 11.
  • This form of packetization of NAL units 11 merely constitutes an example in connection with video transport.
  • Other approaches of handling NAL units 11 such as file format, MPEG-2 transport streams, MPEG-2 program streams, etc. are possible.
  • an Adaptation Parameter Set (APS) comprises control information valid for more than one slice.
  • the control information may differ between the slices.
  • a Picture Parameter Set (PPS) comprises control information valid for several pictures, and may be the same for multiple pictures of the same video sequence.
  • a Sequence Parameter Set (SPS) comprises control information valid for an entire video sequence.
  • an encoded representation 20 of a slice comprises a slice header 21 which independently provides all required data for the slice to be independently decodable.
  • An example of a syntax element or data element present in the slice header 21 is the slice address, which is used by the decoder to know the spatial location of the slice.
  • the encoded representation 20 also comprises, in addition to the slice header 21, slice data 22 that comprises the encoded data of the particular slice, e.g. encoded color values of the pixels in the slice.
  • Fig. 4 is a flow diagram illustrating a method of outputting decoded pictures of a video stream from a DPB in a decoder.
  • the method generally starts in step S1.
  • a picture order count (POC) limit, sometimes referred to as limit X herein, is calculated based on at least one syntax element retrieved based on an encoded representation of a current picture of the video stream.
  • in step S2 of the method, decoded reference pictures stored in the DPB and having a respective POC value lower than the POC limit calculated in step S1 are output.
  • one or more syntax elements obtained based on the encoded representation of a current picture are used to calculate a POC limit, which in turn defines those decoded reference pictures in the DPB that should be output.
  • step S1 comprises calculating the POC limit based on a POC value of the current picture and the at least one syntax element retrieved based on the encoded representation of the current picture.
  • the at least one syntax element represents a so-called output distance value or OutputDistance.
  • the output distance value is calculated based on the at least one syntax element.
  • the POC limit is preferably calculated based on the POC value of the current picture and the output distance value and more preferably as a difference between the POC value and the output distance value.
  • the output distance value defines a distance or step among POC values from the position of the current picture in the output order to the position of the latest, according to the output order, picture that should be output.
  • any pictures stored in the decoded picture buffer and that precede, according to the output order as defined by the POC values, this latest picture should also be output in step S2.
  • the output distance value is thereby used, preferably together with the POC value of the current picture, to define the picture with the highest POC value, i.e. the POC limit, which should be output. Pictures stored in the decoded picture buffer and that have respective POC values lower than this highest POC value should also be output.
  • step S1 involves calculating the POC limit as PicOrderCnt( CurrPic ) - OutputDistance, wherein PicOrderCnt( CurrPic ) denotes the POC value of the current picture and OutputDistance denotes the output distance value.
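A minimal sketch of steps S1 and S2 under this formula. The DPB is modeled as a bare collection of POC values of not-yet-output reference pictures; function and variable names are illustrative, not from the patent:

```python
def output_pictures(dpb_pocs, curr_poc, output_distance):
    """Step S1: POC limit = PicOrderCnt(CurrPic) - OutputDistance.
    Step S2: output, in output order, every not-yet-output reference
    picture in the DPB whose POC value is lower than the limit."""
    poc_limit = curr_poc - output_distance
    return sorted(poc for poc in dpb_pocs if poc < poc_limit)

# Current picture POC 8 and OutputDistance 4 give the limit 4,
# so pictures with POC 0..3 are output.
```

Unlike the "bumping" process, this can output pictures before the DPB is full, which is where the delay reduction comes from.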
  • the POC limit can be determined in other ways.
  • OutputDistance as the retrieved syntax element, or the syntax elements used to calculate the output distance value, is (are) preferably signaled in the bitstream from the encoder to the decoder. If syntax elements are signaled, the output distance value is calculated at the decoder. The output distance value is then used for outputting "old" pictures, i.e. already decoded reference pictures, from the DPB in order to reduce the content of the DPB such that the output delay is decreased.
  • if syntax elements are used to calculate OutputDistance, i.e. the output distance value, the syntax elements may be signaled in the slice header or in a parameter set.
  • the encoded representation 20 of the current picture may include the at least one syntax element that is used to calculate the POC limit.
  • the at least one syntax element is then used to calculate the output distance value, which in turn is used, preferably together with the POC value of the current picture, to calculate the POC limit.
  • the decoder can directly retrieve the at least one syntax element during the parsing and decoding of the slice header 21.
  • the at least one syntax element is not necessarily present in the encoded representation 20 of the picture.
  • the encoded representation 20 comprises data enabling retrieval of the at least one syntax element.
  • the encoded representation 20 preferably comprises a parameter set identifier, such as an APS identifier or a PPS identifier.
  • the parameter set identifier is typically present in the slice header 21 of the encoded representation 20. In such a case, the parameter set identifier is retrieved during parsing and decoding of the slice header 21 and the decoder can then locate the particular parameter set that is identified by the parameter set identifier.
  • the at least one syntax element is thereby retrieved from the identified parameter set.
  • the encoded representation 20 comprises, such as in the slice header 21 , a first parameter set identifier, such as a PPS identifier, identifying a first parameter set, such as a PPS.
  • This first parameter set could then comprise a second parameter set identifier, such as an SPS identifier, identifying a second parameter set, such as an SPS.
  • the at least one syntax element could then be present in the second parameter set.
  • the at least one syntax element is thereby obtained based on the second parameter set identifier as obtained using the first parameter set identifier retrieved from the encoded representation 20.
  • any syntax element could be retrieved directly from the encoded representation 20 or all could be retrieved from a parameter set identified based on a parameter set identifier obtained based on the encoded representation 20.
  • at least one of the multiple syntax elements is directly retrieved from the encoded representation 20 and at least one of the multiple syntax elements is retrieved from one or more parameter sets.
  • the multiple syntax elements could be carried in different parameter sets, which are identified based on data, i.e. one or more parameter set identifiers, carried in the encoded representation 20.
  • if the variable OutputDistance, i.e. the output distance value, is calculated from syntax elements, the syntax elements may be signaled in slice headers or in parameter sets.
  • Reference pictures in the DPB that have not yet been output when a current picture (CurrPic) is to be decoded and that have a POC value, such as represented by PicOrderCntVal, lower than PicOrderCnt( CurrPic ) - OutputDistance are output.
  • OutputDistance is a non-negative integer. In an embodiment there is no limitation of the maximum value for OutputDistance.
  • OutputDistance is, though, limited to be in the range from 0 to N, inclusive, for some positive integer N.
  • N is MaxPicOrderCntLsb-1, which is the maximum possible value that can be signaled for a syntax element pic_order_cnt_lsb.
  • PicOrderCnt( ) is defined to be consistently increasing relative to the latest IDR picture in decoding order.
  • the variable PicOrderCntVal is derived as PicOrderCntMsb + pic_order_cnt_lsb, where pic_order_cnt_lsb is signaled for each picture and PicOrderCntMsb is calculated from a previous picture.
  • the variable MaxPicOrderCntLsb, which is calculated from syntax elements in the SPS, is used to define the range of possible values for pic_order_cnt_lsb, i.e. 0 to MaxPicOrderCntLsb-1, inclusive.
  • the maximum value of the variable OutputDistance is defined based on the syntax element in the SPS used to define the range of the parameter pic_order_cnt_lsb, based on which the POC value (PicOrderCntVal) of a picture is calculated.
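The derivation PicOrderCntVal = PicOrderCntMsb + pic_order_cnt_lsb, with the MSB part updated from the previous picture, can be illustrated as follows. This is a sketch in the spirit of the HEVC draft's wrap handling, not the normative text:

```python
def derive_poc_val(prev_msb, prev_lsb, lsb, max_lsb):
    """Derive PicOrderCntVal from the signaled pic_order_cnt_lsb (lsb) and
    the previous picture's values, detecting wrap of the lsb part."""
    if lsb < prev_lsb and (prev_lsb - lsb) >= max_lsb // 2:
        msb = prev_msb + max_lsb   # lsb wrapped around upwards
    elif lsb > prev_lsb and (lsb - prev_lsb) > max_lsb // 2:
        msb = prev_msb - max_lsb   # lsb wrapped around downwards
    else:
        msb = prev_msb             # no wrap
    return msb + lsb  # PicOrderCntVal = PicOrderCntMsb + pic_order_cnt_lsb
```

For example, with MaxPicOrderCntLsb = 256, a signaled lsb of 5 following a previous lsb of 250 is interpreted as an upward wrap rather than a jump backwards.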
  • the maximum value of the variable OutputDistance, i.e. N, is MaxPOC-1 if wrapped POC is used.
  • the function PicOrderCnt( ) is then defined with wrap around.
  • POC wrap around works like this. Assume a sequence of pictures that constitutes a video sequence. Regardless of the length of the video sequence, a display order number can be assigned to each picture that simply represents the order in which the pictures should be displayed. Denote this number the "true_POC". However, in order to save bits during encoding, the true_POC is not signaled in the bitstream. Instead a range-limited syntax element is encoded, referred to as "POC" and limited to be in the range from 0 to MaxPOC-1, inclusive. One possible way to calculate POC from the true_POC is to use the modulo operation, denoted "%", i.e.
  • POC = true_POC % MaxPOC. That means, for example, that a true_POC value equal to MaxPOC+1 will be given the POC value 1.
  • a DiffPOC( ) function can be defined for wrapped POC to give correct differences, i.e. distances, between two POC values as long as -MaxPOC/2 < true_POC distance < MaxPOC/2.
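The wrap-around arithmetic above can be sketched as follows; MaxPOC is given an arbitrary illustrative value, and the DiffPOC( ) behavior matches the stated validity range:

```python
MAX_POC = 256  # illustrative value; POC is limited to 0..MaxPOC-1

def poc(true_poc):
    """Range-limited POC signaled in the bitstream: POC = true_POC % MaxPOC."""
    return true_poc % MAX_POC

def diff_poc(a, b):
    """Wrapped POC difference a - b; correct while the true distance
    lies in (-MaxPOC/2, MaxPOC/2)."""
    d = (a - b) % MAX_POC
    return d - MAX_POC if d >= MAX_POC // 2 else d

# As in the example above, true_POC = MaxPOC + 1 is given the POC value 1,
# and diff_poc recovers small distances across the wrap point.
```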
  • Fig. 5 illustrates additional, optional steps of the method in Fig. 4.
  • the method starts in step S10, in which a first output flag is retrieved based on the encoded representation.
  • This retrieval of the first output flag is advantageously performed by retrieving the first output flag from the encoded representation, such as from the slice header.
  • the first output flag is retrieved from a parameter set identified based on data retrieved from the encoded representation, such as a parameter set identifier present in the slice header.
  • this first output flag is investigated in the optional step S11. If the first output flag has a first predefined value, such as 1, the method continues to step S12. In this step S12 an output distance syntax element is retrieved based on the encoded representation.
  • This output distance syntax element is preferably present in the encoded representation, such as in the slice header. Alternatively, it is present in a parameter set identifiable based on a parameter set identifier present in the encoded representation.
  • the retrieved output distance syntax element is employed in step S13 to determine an output distance value.
  • the method then continues to step S1 of Fig. 4 where the POC limit is calculated based on a POC value of the current picture and the output distance value, preferably equal to the POC value of the current picture subtracted by the output distance value determined in step S13.
• the output distance value is determined in step S13 to be equal to half of a difference syntax element representing a largest POC difference that can be signaled from the encoder to the decoder, e.g. MaxPicOrderCntLsb/2, if the output distance syntax element has a predefined value, preferably 0.
  • the output distance value is preferably determined in step S13 to be equal to the output distance syntax element if the output distance syntax element has a value different from the predefined value, such as 0.
• If the first output flag instead has a second predefined value, such as 0, the method continues to step S14, where the output distance value is preferably determined to be equal to zero.
• The method then continues to step S1 of Fig. 4, where the POC limit is calculated based on the output distance value.
  • the POC limit is calculated based on the POC value of the current picture and the output distance value, more preferably the POC limit is equal to the POC value of the current picture subtracted by the (zero) output distance value.
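The derivation of steps S11 to S14 and the subsequent POC limit calculation of step S1 can be sketched as follows, assuming the first output flag and the output distance syntax element have already been parsed from the slice header or a parameter set; the function names are illustrative:

```python
# Sketch of steps S11-S14: derive OutputDistance from the first output
# flag (output_all_preceding_pics_flag) and output_distance_idc.
def determine_output_distance(first_output_flag, output_distance_idc,
                              max_pic_order_cnt_lsb):
    if first_output_flag == 1:
        if output_distance_idc == 0:
            # Predefined value: half the largest signalable POC difference.
            return max_pic_order_cnt_lsb // 2
        # Otherwise the syntax element carries the distance directly.
        return output_distance_idc
    # Second predefined flag value (step S14): distance inferred to zero.
    return 0

def poc_limit(current_poc, output_distance):
    # Step S1: POC limit = POC of the current picture minus OutputDistance.
    return current_poc - output_distance
```

With a zero output distance the POC limit equals the POC value of the current picture, as described above.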
• output_all_preceding_pics_flag: the first output flag
• output_distance_idc: the output distance syntax element
  • a further syntax element could be used to signal whether syntax element(s) to be used for calculating the output distance value and therefore the POC limit is present for each picture or if the output distance value could be inferred to be a specific value.
  • Such an embodiment is shown in Fig. 6.
  • Fig. 6 starts in step S20 where a second output flag is retrieved based on the encoded representation.
  • This second output flag is preferably retrieved from a parameter set, such as PPS or SPS, using a parameter set identifier retrieved from the encoded representation or retrieved from a second parameter set identified based on a second parameter set identifier present in the encoded representation. It could alternatively be possible to include the second output flag in the encoded representation, such as in the slice header.
• In step S21 it is investigated whether the second output flag has a predefined value, preferably 1.
  • the method continues to step S22 where the output distance value is determined or inferred to be set equal to a predefined value, preferably zero.
  • the method then continues to step S1 where the POC limit is calculated based on the zero output distance value.
  • the POC limit is calculated to be based on, preferably equal to, the POC value of the current picture subtracted by the output distance value. Since the output distance value is zero in this case the POC limit is preferably equal to the POC value of the current picture.
  • the method continues to step S10.
• If the second output flag is, for instance, equal to 0, the retrieving steps S10, S12 and the determining step S13 of Fig. 5, or the retrieving step S10 and the determining step S14 of Fig. 5, are preferably performed in order to determine the output distance value, which is used in step S1 to calculate the POC limit.
  • another syntax element output_distance_always_zero (second output flag) in SPS, PPS or in another appropriate data field defines whether syntax elements used to calculate the variable OutputDistance (output distance value) shall be present for each picture or if OutputDistance shall be inferred to be a specific value. If syntax elements used to calculate OutputDistance are not present it is preferred that OutputDistance is inferred to be set equal to 0. In an embodiment the syntax element output_distance_always_zero is a one bit flag. OutputDistance can then be calculated as exemplified below.
  • step S2 of Fig. 4 comprises outputting, in increasing order of POC values starting from a lowest POC value, decoded reference pictures that i) are stored in the DPB, ii) have a respective POC value that is lower than the POC limit and iii) are marked as "needed for output".
  • decoded reference pictures stored in frame buffers of the DPB can be marked as needed for output if they need to be output, e.g. for display.
  • a decoded reference picture that is not needed for output, such as has already been output, e.g. for display, is typically marked as "not needed for output”.
  • the method also comprises an additional step S30 as shown in Fig. 7.
  • step S30 the decoded reference pictures that are output in step S2 are marked as not needed for output. This means that the decoded reference picture(s) that previously was(were) marked as needed for output and that was(were) output in step S2 of Fig. 4 is(are) remarked as not needed for output in step S30. This remarking is used to indicate that the decoded reference picture(s) has(have) already been output for display and therefore do(es) not need to be output any longer.
  • An optional but preferred additional step S31 of the method comprises emptying any frame buffer of the DPB that stores a decoded reference picture marked as "unused for reference” and marked, such as in step S30, as "not needed for output". Hence, at this step S31 one or more of the frame buffers of the DPB could become empty and available for storing a new decoded picture. If any frame buffer is emptied in step S31 the DPB fullness is preferably reduced by the corresponding number of frame buffers that have been emptied in step S31.
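The output and emptying behavior of steps S2, S30 and S31 can be sketched as follows, under the assumption that each DPB entry is represented by a dict with the hypothetical keys 'poc', 'needed_for_output' and 'used_for_reference':

```python
# Sketch of steps S2 (output), S30 (remark) and S31 (empty frame buffers).
def output_pictures(dpb, poc_limit, display):
    # Step S2: output, in increasing POC order, pictures below the limit
    # that are marked as needed for output.
    for pic in sorted(dpb, key=lambda p: p['poc']):
        if pic['needed_for_output'] and pic['poc'] < poc_limit:
            display(pic)                      # output, e.g. for display
            pic['needed_for_output'] = False  # step S30: remark
    # Step S31: empty any frame buffer holding a picture that is neither
    # needed for output nor used for reference (reduces DPB fullness).
    dpb[:] = [p for p in dpb
              if p['needed_for_output'] or p['used_for_reference']]
```

A picture that is still marked "used for reference" survives step S31 even after being output, since it may be needed for decoding later pictures.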
  • the encoded representation can be decoded in step S40 of Fig. 8 to get a current decoded picture.
• Decoding of pictures using the slice data of the encoded representation and control information as defined by the slice header is performed according to techniques well known in the art.
  • the current decoded picture obtained in step S40 can then be stored in an empty frame buffer in the DPB in step S41.
• outputting of decoded reference pictures is preferably performed prior to decoding the slice data of the encoded representation of the current slice.
  • the removal of pictures from the DPB before decoding of the current picture, but after parsing the slice header of the first slice of the current picture proceeds as follows.
  • the decoding process for reference picture set is invoked. If the current picture is not an IDR picture, frame buffers containing a picture which is marked as "not needed for output” and "unused for reference” are emptied without output.
  • the DPB fullness is decremented by the number of frame buffers emptied.
  • the output process consists, in an embodiment, of the following ordered steps: 1.
  • the picture that is first for output is selected as the one having the smallest value of PicOrderCntVal of all pictures in the DPB marked as "needed for output”.
  • the picture is optionally cropped, using the cropping rectangle specified in the active SPS for the picture, the optionally cropped picture is output, and the picture is marked as "not needed for output".
• If the frame buffer that included the picture that was output and optionally cropped contains a picture marked as "unused for reference", the frame buffer is emptied and the DPB fullness is decremented by 1.
• wrapped POC is used to signal the POC values. For that case PicOrderCntVal and PicOrderCnt( ) are calculated relative to each current picture.
  • Fig. 9 is a flow diagram of a method of encoding a current picture of a video stream in an encoder according to an embodiment.
  • the method generally starts in step S50 where a POC limit is determined to have a value enabling, determining or defining a target state of a DPB in a decoder for the current picture.
  • the POC limit determined in step S50 defines a number of decoded reference pictures to be output from the DPB in a picture output process invoked for the current picture.
  • a next step S51 at least one syntax element representative of the value of the POC limit determined in step S50 is determined.
  • the at least one syntax element determined in step S51 enables determination or calculation of the POC limit.
  • the current picture is encoded in step S52 to get an encoded representation of the current picture.
  • This encoded representation may be in the form of an encoded representation 20 of a slice comprising a slice header 21 and slice data 22, such as packed into a NAL unit 11 as shown in Fig. 2.
  • the encoded representation of the current picture could be in form of, if the current picture consists of multiple slices, multiple respective encoded representations 20 of slices, each having a respective slice header 21 and slice data 22. These multiple encoded representations 20 of slices could be packed in separate NAL units 11.
  • Encoding of a picture is performed according to techniques well known in the art of picture and video coding.
  • the at least one syntax element determined in step S51 is associated with or to the encoded representation in step S53.
  • This step S53 can be performed prior to, after or substantially in parallel with step S52.
  • Associating the at least one syntax element with the encoded representation can be performed according to various embodiments as mentioned herein.
  • the at least one syntax element could, for instance, be added to the encoded representation, such as inserted into the slice header of the encoded representation in step S53.
  • each encoded representation of a slice for the picture preferably comprises the at least one syntax element.
  • the at least one syntax element could be inserted, in step S53, into one or more parameter sets.
  • step S53 involves inserting at least one syntax element into the slice header and inserting at least one parameter set identifier into the slice header, where this at least one parameter set identifier enables identification of at least one parameter set carrying at least one syntax element as determined in step S51.
  • the encoding method as shown in Fig. 9 can therefore, by determining the POC limit to have a particular value, determine or define a target state that the DPB in the decoder will have for the current picture. This means that if a particular target state of the DPB is desired, such as a desired DPB fullness, the POC limit is determined in step S50 to have a value that will achieve the particular target state, such as DPB fullness, when processing the encoded representation of the current picture at the decoder.
  • the DPB preferably comprises a number of frame buffers in which decoded reference pictures are stored.
  • the POC limit is determined to have a value such that at least one frame buffer in the DPB is emptied from a decoded reference picture marked as unused for reference if there are no empty frame buffers in the DPB prior to emptying the at least one frame buffer.
  • the DPB fullness prior to processing the current picture at the decoder, is equal to the DPB size so there are no empty frame buffers in which the current picture can be entered once it has been decoded by the decoder.
  • the desired target state of the DPB is in this case therefore to achieve a DPB fullness that is lower than the DPB size to allow room for the current picture in the DPB.
• the encoder has all the knowledge of the DPB status and therefore knows which decoded reference pictures are stored in the frame buffers of the DPB in the decoder at the time of decoding of the current picture. This means that the encoder also knows the respective POC values of these stored decoded reference pictures, their markings and the POC value of the current picture. All these, i.e. the POC values and the markings, are actually determined and set by the encoder.
  • the encoder can thereby, such as based on the POC values of the decoded reference pictures stored in the DPB prior to decoding the current picture and the POC value of the current picture, determine a POC limit to have a value so that when the decoder performs the previously described picture output method or process at least one frame buffer of the DPB is emptied.
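One possible encoder-side heuristic along these lines is sketched below; the DPB representation and the particular choice of limit are illustrative assumptions, not the method mandated by the text:

```python
# Sketch: the encoder, knowing the full DPB state, picks a POC limit
# such that at least one picture marked "unused for reference" is output
# (and its frame buffer thereby emptied) when the decoder runs the
# picture output process.
def poc_limit_to_free_buffer(dpb, curr_poc):
    candidates = [p['poc'] for p in dpb
                  if not p['used_for_reference'] and p['needed_for_output']]
    if not candidates:
        # Nothing can be freed by outputting; fall back to the current POC.
        return curr_poc
    # A limit just above the smallest such POC guarantees that picture
    # is output and its buffer emptied.
    return min(candidates) + 1
```

The corresponding output distance value would then be the POC value of the current picture minus this limit.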
• the POC limit is determined to have a value defined based on the coding structure of the video stream and preferably based on POC values of future pictures of the video stream, i.e. pictures following the current picture in decoding order.
• This embodiment is particularly suitable if there already is at least one empty frame buffer in the DPB for the current picture.
• the POC limit value is in this embodiment determined based on the coding structure of the video stream, i.e. the encoding and decoding relationships between pictures in the video stream. Information of pictures that are encoded and thereby decoded based on other pictures, i.e. used as reference pictures, thus forms part of the coding structure known to the encoder.
  • the encoder can, by determining a suitable value of the POC limit in step S50, make sure that reference pictures that have already been marked as "unused for reference", i.e. are no longer needed as reference picture for the current and/or following pictures, are emptied from the DPB so that new reference pictures can be added to the DPB.
  • the POC limit is thereby determined so that a target status of the DPB is achieved and any reference pictures that might be needed as reference for future picture decoding could be entered in frame buffers of the DPB.
  • the syntax element that is determined in step S51 of Fig. 9 depends on the particular embodiment.
• the syntax element determined in step S51 could be at least one of output_distance_idc, MaxPicOrderCntLsb, output_all_preceding_pics_flag, output_distance_always_zero.
  • the at least one syntax element determined in step S51 could, for instance, be the previously mentioned output distance syntax element and preferably also the first output flag and optionally the second output flag.
  • the maximum picture order count value could also be included as syntax element.
• the output distance value (OutputDistance) could be determined based on the POC limit, such as PicOrderCnt( CurrPic ) - POC limit. In an embodiment, if this OutputDistance becomes zero, the first output flag (output_all_preceding_pics_flag) could be set to zero. Alternatively, if OutputDistance should be equal to half of the largest POC difference (MaxPicOrderCntLsb/2) the first output flag is preferably set to one and the output distance syntax element (output_distance_idc) is preferably set to zero. Otherwise the first output flag is preferably set to one and the output distance syntax element is preferably set to the determined value of the output distance value.
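The mapping from a determined OutputDistance to syntax element values described above can be sketched as follows; the dict representation of the signaled elements is illustrative:

```python
# Sketch of the encoder-side choice of syntax element values for a
# given OutputDistance, following the three cases in the text.
def encode_output_distance(output_distance, max_pic_order_cnt_lsb):
    if output_distance == 0:
        # First output flag set to zero; no distance element needed.
        return {'output_all_preceding_pics_flag': 0}
    if output_distance == max_pic_order_cnt_lsb // 2:
        # Flag set to one, output_distance_idc set to the predefined 0.
        return {'output_all_preceding_pics_flag': 1,
                'output_distance_idc': 0}
    # Otherwise signal the distance explicitly.
    return {'output_all_preceding_pics_flag': 1,
            'output_distance_idc': output_distance}
```

This mirrors the decoder-side derivation: an idc of zero maps back to MaxPicOrderCntLsb/2, and a zero flag maps back to a zero distance.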
  • the second output flag [output_distance_always_zero) is used. This is in particular beneficial if multiple of the pictures in the video stream should have an output distance value of zero.
  • a bitstream restriction may be imposed on the value of the output distance.
  • a reason for this is that the bitstream otherwise could become somewhat sensitive to loss of data packets carrying encoded slices (see Fig. 2) if very large values are allowed for the output distance.
• Fig. 10 is a flow diagram illustrating additional, optional steps of the method in Fig. 9 when using such bitstream restrictions. This embodiment is particularly suitable for HEVC or other video codecs in which each picture has a respective POC value and a respective temporal identifier.
  • a value X is compared to another value X'.
  • the value X represents and is preferably equal to the highest POC value of all decoded pictures of the video stream with temporal identifier lower than or equal to a temporal identifier of the current picture and that have been output prior to invoking the output method or process for the current picture.
  • the value X' correspondingly represents and is preferably equal to the highest POC value of all decoded pictures of the video stream with temporal identifier lower than or equal to the temporal identifier of the current picture and that have been output after invoking the output method or process for a previous picture in the video stream.
  • This previous picture is previous to the current picture according to the decoding order of the video stream.
  • the previous picture has a temporal identifier lower than or equal to the temporal identifier of the current picture.
  • the previous picture is preferably the closest, according to the decoding order, picture with temporal identifier equal to or lower than the temporal identifier of the current picture that precedes the current picture according to the decoding order.
• If the value X is equal to the value X', as investigated in step S60, the method continues to step S51 of Fig. 9. Hence, in this embodiment no restriction is needed for the output distance value.
• Otherwise, in step S61 a new value of the syntax element is set, e.g. by setting a new value of the output distance value, which is smaller than the POC value of the current picture subtracted by the value X.
• the output distance value is determined to be smaller than PicOrderCntVal( CurrPic ) - X.
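A sketch of this restriction check of steps S60 and S61, assuming X, X' and the POC value of the current picture are already known; choosing limit - 1 as the new value is one illustrative way of satisfying "smaller than":

```python
# Sketch of steps S60/S61: if X differs from X', restrict the output
# distance to be smaller than PicOrderCntVal(CurrPic) - X.
def restrict_output_distance(output_distance, curr_poc, x, x_prime):
    if x == x_prime:
        return output_distance      # step S60: no restriction needed
    limit = curr_poc - x
    if output_distance >= limit:
        return limit - 1            # step S61: set a new, smaller value
    return output_distance
```

This keeps the bitstream from becoming overly sensitive to loss of data packets carrying encoded slices, as noted above.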
  • Fig. 12 is a schematic block diagram of a decoder 40 according to an embodiment.
  • the decoder 40 comprises a decoded picture buffer (DPB) 48 configured to store decoded pictures of a video stream.
• the decoder 40 also comprises a limit calculator 41, also denoted limit calculating unit, means or module.
  • the limit calculator 41 is configured to calculate a POC limit based on at least one syntax element retrieved based on an encoded representation of a current picture of the video stream.
• a picture outputting unit 42, also denoted picture output or picture outputting means or module, is implemented in the decoder 40 and configured to output decoded reference pictures stored in the DPB 48 and having a respective POC value that is lower than the POC limit.
  • the decoder 40 comprises an optional flag retriever 43, also denoted flag retrieving unit, means or module.
  • the flag retriever 43 is configured to retrieve a first output flag based on the encoded representation.
  • the flag retriever 43 could be configured to retrieve the first output flag from a slice header of the encoded representation or from a parameter set using a parameter set identifier obtained based on, such as present in, the slice header of the encoded representation as previously discussed herein.
  • the decoder 40 preferably, in this particular embodiment, also comprises an optional element retriever 44, also denoted element retrieving unit, means or module.
• the element retriever 44 is configured to retrieve an output distance syntax element if the first output flag retrieved by the flag retriever 43 has a first predefined value, such as 1.
  • the element retriever 44 is configured to retrieve this output distance element based on the encoded representation, such as from a slice header in the encoded representation.
  • An optional value determiner 45, also denoted value determining unit, means or module, of the decoder 40 is configured to determine an output distance value based on the output distance syntax element retrieved by the element retriever 44.
  • the limit calculator 41 is configured to calculate the POC limit to be based on, preferably equal to, the POC value of the current picture subtracted by the output distance value determined by the value determiner 45.
  • the value determiner 45 is, in an embodiment, configured to determine the output distance value to be based on, preferably equal to, half of a largest POC difference if the output distance syntax element retrieved by the element retriever 44 has a predefined value, such as 0.
• Otherwise, i.e. if the output distance syntax element has a value different from the predefined value, the value determiner 45 is preferably configured to determine the output distance value to be based on, preferably equal to, the value defined or represented by the output distance syntax element.
• the value determiner 45 is configured to determine the output distance value to be equal to zero if the first output flag retrieved by the flag retriever 43 has a second predefined value, such as 0.
  • the limit calculator 41 preferably calculates the POC limit to be equal to the POC value of the current picture subtracted by the output distance value, i.e. equal to the POC value of the current picture since the output distance value is zero in this example.
  • the flag retriever 43 is configured to retrieve a second output flag based on the encoded representation.
  • the flag retriever 43 typically retrieves the second output flag from a parameter set identified based on a parameter set identifier present in a slice header of the encoded representation or present in a second parameter set identified based on a second parameter set identifier present in the slice header of the encoded representation.
• the flag retriever 43 is then configured to retrieve the first output flag if this second output flag has a predefined value, such as 0. If the second output flag instead has a second predefined value, such as 1, the output distance value could have a predefined value, such as zero, so that no retrieval of any first output flag or retrieval of any output distance syntax element is needed to calculate the POC limit.
• the picture outputting unit 42 is, in an embodiment, configured to output, in increasing order of POC values starting with the lowest POC value, decoded reference pictures that are stored in the DPB and have a respective POC value lower than the POC limit calculated by the limit calculator 41 and are marked as needed for output.
  • the decoder 40 comprises, in an optional embodiment, a picture marking unit 46, also denoted picture marker or picture marking means or module.
  • the picture marking unit 46 is configured to mark the decoded reference pictures that are output by the picture outputting unit 42 as "not needed for output" to indicate that the decoded reference picture(s) already has(have) been output.
  • the picture outputting unit 42 is preferably configured to empty any frame buffer 49 of the DPB 48 storing a decoded reference picture that is marked as "not needed for output” and marked as "unused for reference”.
  • the decoder 40 preferably also comprises a decoding unit 47, also denoted picture decoder or decoding means or module.
  • the decoding unit 47 is configured to decode the encoded representation to get a current decoded picture. This current decoded picture can then be stored in the DPB 48 in an empty frame buffer 49.
• the decoder 40 decodes the information needed to calculate the POC limit (limit X), e.g. PicOrderCntVal of the current picture and OutputDistance.
  • the syntax elements that may be used for calculating OutputDistance may be specified by standard specifications.
  • the decoder 40 calculates OutputDistance based on the received syntax elements.
• the decoder 40 displays pictures in the DPB 48 marked as "needed for output" with PicOrderCntVal lower than the POC limit (limit X), e.g. defined as PicOrderCnt( CurrPic ) - OutputDistance, in increasing order of PicOrderCntVal starting with the one with lowest PicOrderCntVal. Pictures that have been displayed are marked as "not needed for output".
  • the current picture is decoded and marked according to its OutputFlag, e.g. "needed for output” or “not needed for output”.
  • the decoder 40 of Fig. 12 with its including units 41 -47 could be implemented in hardware.
  • circuitry elements that can be used and combined to achieve the functions of the units 41-47 of the decoder 40. Such variants are encompassed by the embodiments.
• Particular examples of hardware implementation of the decoder 40 are implementation in digital signal processor (DSP) hardware and integrated circuit technology, including both general-purpose electronic circuitry and application-specific circuitry.
• the decoder 50 can also be implemented by means of a processor 52 and a memory 54 as illustrated in Fig. 13.
• the decoder 50 is implemented e.g. by one or more of a processor 52 and adequate software with suitable storage or memory 54 therefor, a programmable logic device (PLD) or other electronic component(s).
  • the decoder 50 preferably comprises an input or input unit 51 configured to receive the encoded representations of the video stream, such as in the form of NAL units.
• a corresponding output or output unit 53 is configured to output the decoded pictures.
• the decoder can be implemented in a device, such as a mobile device, exemplified by mobile phones, tablets, video cameras, set-top-boxes, etc.
• Fig. 11 illustrates such an example where the decoder 32 is located in a receiver 30, such as in a video camera or a display, e.g. in a mobile device.
• the receiver 30 then comprises an input or input unit 31 configured to receive a coded bitstream, such as data packets of NAL units as shown in Fig. 2.
  • the encoded representations of the NAL units are decoded by the decoder 32 as disclosed herein.
• the decoder 32 preferably comprises or is connected to a reference picture buffer 34 that temporarily stores already decoded reference pictures 35 that are to be used as reference pictures for other pictures in the video stream. Decoded pictures are output from the receiver 30.
  • These output pictures are sent to be displayed to a user on a screen or display of or connected, including wirelessly connected, to the receiver 30.
  • Fig. 15 is a schematic block diagram of an encoder 70 according to an embodiment.
• the encoder 70 comprises a limit determiner 71, also denoted limit determining unit, means or module.
  • the limit determiner 71 is configured to determine a POC limit to have a value enabling, defining or determining a target state of a DPB in a decoder for a current picture of a video stream. This determined POC limit defines a number of decoded reference pictures to be output from the DPB in a picture output process invoked by the decoder for the current picture.
• a syntax element determiner 72, also denoted syntax element determining unit, means or module, is configured to determine at least one syntax element representative of the value of the POC limit determined by the limit determiner 71.
• An encoding unit 73, also denoted picture encoder or encoding means or module, of the encoder 70 is configured to encode the current picture to get an encoded representation of the current picture.
  • the encoder 70 also comprises an associating unit 74, also denoted associator or associating means or module.
  • the associating unit 74 is configured to associate the at least one syntax element determined by the syntax element determiner 72 with or to the encoded representation.
• the associating unit 74 could be configured to include a syntax element in the slice header of the encoded representation and/or include a parameter set identifier in the slice header, where this parameter set identifier enables identification of a parameter set comprising a syntax element.
  • the limit determiner 71 is configured to determine the POC limit to have a value selected so that at least one frame buffer in the DPB is emptied from a decoded reference picture marked as unused for reference if there are no such empty frame buffers in the DPB prior to emptying the at least one frame buffer.
  • the particular value determined for the POC limit frees a frame buffer and thereby makes room in the DPB for the current picture during decoding.
• the limit determiner 71 is configured to determine the POC limit to have a value defined based on the coding structure of at least a portion of the video stream and preferably of POC values of future pictures of the video stream. This embodiment is particularly suitable if there is at least one empty frame buffer in the DPB for the current picture.
  • the limit determiner 71 preferably and at least partly determines the POC limit based on coding structure, i.e. how pictures of the video stream are encoded and decoded relative to each other, i.e. used as reference pictures, for the current picture but preferably also for future pictures of the video stream that follow the current picture according to the decoding order.
• the encoder 70 may impose a bitstream restriction on the at least one syntax element determined by the syntax element determiner 72.
  • the encoder 70 preferably comprises a comparator 75, also denoted comparing unit, means or module.
• the comparator 75 is configured to compare a value X with a value X'.
  • the value X is preferably equal to the highest POC of all decoded pictures of the video stream with temporal identifier lower than or equal to a temporal identifier of the current picture and that have been output prior to invoking the picture output process for the current picture.
• the value X' is preferably equal to the highest POC value of all decoded pictures with temporal identifier lower than or equal to the temporal identifier of the current picture and that have been output after invoking a picture output process for a previous picture.
  • the previous picture is previous to the current picture according to the decoding order of the video stream and has a temporal identifier lower than or equal to the temporal identifier of the current picture.
  • the syntax element determiner 72 is configured to set a new value of at least one syntax element of the at least one syntax element, which value is smaller than the POC value of the current picture subtracted by the value X if the value X is different from the value X' as determined by the comparator 75. If the value X is equal to the value X' no new value of the at least one syntax element needs to be determined.
• the encoder 70 preferably ensures that there is an empty frame buffer in the DPB that can be used by the new picture. If the DPB is full, i.e. there are no empty frame buffers, a frame buffer is emptied by the encoder by selecting a value of OutputDistance such that at least one picture marked as "unused for reference" is output. Otherwise the encoder 70 preferably selects any value for OutputDistance within the specified allowed range according to what is needed for the coding structure and POC values of future pictures. If the selected value for OutputDistance is larger than what is allowed by an optional bitstream restriction the encoder 70 preferably selects a new value for OutputDistance that complies with the bitstream restriction. The encoder 70 encodes the value of OutputDistance using the syntax elements from which the OutputDistance is calculated at the decoder. The picture is then encoded.
  • the encoder 70 of Fig. 15 with its including units 71-75 could be implemented in hardware.
  • circuitry elements that can be used and combined to achieve the functions of the units 71-75 of the encoder 70. Such variants are encompassed by the embodiments.
• Particular examples of hardware implementation of the encoder 70 are implementation in digital signal processor (DSP) hardware and integrated circuit technology, including both general-purpose electronic circuitry and application-specific circuitry.
  • the encoder 80 can also be implemented by means of a processor 82 and a memory 84 as illustrated in Fig. 16.
• the encoder 80 is implemented e.g. by one or more of a processor 82 and adequate software with suitable storage or memory 84 therefor.
  • the encoder 80 preferably comprises an input or input unit 81 configured to receive the pictures of the video stream.
  • a corresponding output or output unit 83 is configured to output the encoded representations of the pictures, such as in the form of NAL units.
• the encoder can be implemented in a device, such as a mobile device, exemplified by mobile phones, tablets, video cameras, etc.
  • Fig. 14 illustrates an example of such a device in the form of a transmitter 60, e.g. implemented in a video camera e.g. in a mobile device.
  • the transmitter 60 then comprises an input or input unit 61 configured to receive pictures of a video stream to be encoded.
  • the pictures are encoded by the encoder 62 as disclosed herein.
  • Encoded pictures are output from the transmitter 60 by an output or output unit 63 in the form of a coded bitstream, such as of NAL units or data packets carrying such NAL units as shown in Fig. 2.
  • a network node may use the embodiments. For instance, pictures are forwarded by the network node and temporal layer switches are performed at temporal layer switching points. According to the embodiments this picture forwarding can be performed by the network node without having to care about the DPB status in the decoder, i.e. without regard to which pictures have been output, for different temporal layers.
  • the present embodiments can be applied to different video codecs and different types of extensions, including, but not limited to, multi-view video codecs and scalable video codecs.
  • Temporal identifiers as discussed herein could, in alternative embodiments, be replaced by general layer identifiers that do not necessarily have to relate to different temporal layers. Such a layer identifier could, for instance, define various camera views, different scalability layers, spatial layers, etc.
  • picture order count or POC is used herein as an identifier of the pictures in the video stream, either in consistently increasing order relative to a latest IDR picture or by using POC wrap around. The embodiments are, however, not limited to using picture order count values as picture identifiers. In alternative embodiments other types of picture identifiers could be used instead of POC values.
  • the functional blocks may include or encompass, without limitation, digital signal processor (DSP) hardware, reduced instruction set processor, hardware (e.g., digital or analog) circuitry including but not limited to application specific integrated circuit(s) (ASIC), and (where appropriate) state machines capable of performing such functions.
  • ASIC application specific integrated circuit

Abstract

An output process for outputting decoded pictures (35) of a video stream (1) from a decoded picture buffer (34, 48) in a decoder (40, 50) involves calculating a picture order count limit based on at least one syntax element retrieved based on an encoded representation (20) of a current picture (2) of the video stream (1). Decoded reference pictures (35) stored in the decoded picture buffer (34, 48) and having a respective picture order count value lower than the picture order count limit are then output from the decoded picture buffer (34, 48). The output process reduces output delay in connection with picture decoding, in particular for temporally scaled video streams.

Description

OUTPUT OF DECODED REFERENCE PICTURES
TECHNICAL FIELD
The embodiments generally relate to encoding and decoding of pictures, and in particular to outputting decoded reference pictures from a decoded picture buffer.
BACKGROUND
H.264 (Moving Picture Experts Group-4 (MPEG-4) Advanced Video Coding (AVC)) is the state of the art video coding standard. It consists of a block based hybrid video coding scheme that exploits temporal and spatial prediction. A process for outputting, e.g. output for display, pictures is described in Annex C of the H.264 standard. The process is invoked after the decoding of a picture and after the marking of pictures as "unused for reference", "used for short-term reference" and "used for long-term reference".

High Efficiency Video Coding (HEVC) is a new video coding standard currently being developed in Joint Collaborative Team - Video Coding (JCT-VC). JCT-VC is a collaborative project between MPEG and International Telecommunication Union Telecommunication standardization sector (ITU-T). Currently, an HEVC Working Draft (WD) is defined that includes a number of new tools and is considerably more efficient than H.264/AVC. The HEVC WD specifies that each picture shall belong to a temporal layer and that a syntax element called temporal_id shall be present for each picture in the bitstream, corresponding to the temporal layer the picture belongs to.
The temporal layers are ordered and have the property that a lower temporal layer never references a higher temporal layer. Thus, higher temporal layers can be removed without affecting the lower temporal layers. The removal of temporal layers can be referred to as temporal scaling. Removal of layers can be done in an entity that is neither an encoder nor a decoder, such as a network node. Such an entity can, but is not limited to, forward video bitstream packets from an encoder to a decoder and perform removal of temporal layers without performing full video decoding on the incoming data. The resulting bitstream after one or more temporal layers have been removed is called a subsequence.

In HEVC it is possible to signal that a picture is a temporal layer switching point, which indicates that at this picture it is possible for a decoder to start decoding more temporal layers than what was decoded before the switching point. The switching point indication guarantees that no picture following the switching point references a picture from before the switching point that might not have been decoded because it belongs to a higher temporal layer than what was decoded before the switching point. The switching points are therefore very useful for a layer removal entity in order to know when to stop removing a certain temporal layer and start forwarding it.

In HEVC the output process is changed compared to H.264/AVC so that marking of pictures as "unused for prediction" is performed prior to decoding of the current picture. The output process is also performed prior to the decoding of the current picture.
HEVC defines a Decoded Picture Buffer (DPB) that consists of frame buffers, also referred to as picture slots or picture buffers, in which decoded pictures are stored. The DPB size is determined from syntax elements in the bitstream. The DPB fullness increases with one when a picture is inserted into the DPB and decreases with one when a picture is removed from the DPB. If the DPB fullness is equal to the DPB size there are no empty frame buffers and the DPB is said to be full. The "bumping" process that is used for outputting pictures basically consists of filling up the DPB and then start outputting as few pictures as possible, in correct output order, to free up a frame buffer for the current picture. Thus, when there is no empty frame buffer, i.e. DPB fullness is equal to DPB size, the "bumping" is invoked repeatedly until there is an empty frame buffer in which to store the current decoded picture.
Picture Order Count (POC) represented by the variable PicOrderCntVal is used in HEVC to define the display order of pictures. POC is also used to identify reference pictures.
The "bumping" process consists of the following steps:
1. The picture that is first for output is selected as the one having the smallest value of PicOrderCntVal of all pictures in the DPB marked as "needed for output".
2. The picture is output, and the picture is marked as "not needed for output".
3. If the picture was marked as "unused for reference", a frame buffer is emptied from the picture and the DPB fullness is decremented by 1. Otherwise (the picture was marked as "used for reference"), the process is repeated from step 1 until there is an empty frame buffer. This frame buffer can now be used to store a new decoded picture.

A problem with the combination of removal of temporal layers and the "bumping" process is that the content of the DPB will be different depending on how many temporal layers are decoded. For instance, a layer removal entity can be in use such that the number of temporal layers presented to the decoder is fewer than what the encoder produced. Since pictures are output "as late as possible", the output delay will increase when only a subset of the temporal layers is decoded.
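By way of a non-normative illustration, the three "bumping" steps above can be sketched as follows. The class and function names are illustrative only and not part of any standard:

```python
from dataclasses import dataclass

@dataclass
class DecodedPicture:
    poc: int                       # PicOrderCntVal of the picture
    needed_for_output: bool = True
    used_for_reference: bool = True

def bump_once(dpb):
    """One "bumping" iteration: select the picture with the smallest
    PicOrderCntVal among those marked "needed for output" (step 1),
    output it and mark it "not needed for output" (step 2), and empty
    its frame buffer if it is no longer used for reference (step 3)."""
    pic = min((p for p in dpb if p.needed_for_output), key=lambda p: p.poc)
    pic.needed_for_output = False
    if not pic.used_for_reference:
        dpb.remove(pic)            # DPB fullness is decremented by 1
    return pic.poc

def bump_until_free(dpb, dpb_size):
    """Invoke bumping repeatedly until there is an empty frame buffer.
    A conforming bitstream guarantees that the loop terminates."""
    output = []
    while len(dpb) >= dpb_size:
        output.append(bump_once(dpb))
    return output
```

Note that a frame buffer is only freed when the output picture is also unused for reference; pictures that are still used for reference remain in the DPB after being output.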
Increased delay is in itself a problem, but there are also side effects of it that are problematic. For example, when an Instant Decoder Refresh (IDR) picture is received the DPB is emptied, with or without output of pictures. If it is emptied without output of pictures, some pictures that would otherwise have been output will not be output if only a subset of temporal layers is decoded.
Another problem is that the "bumping" process includes filling up the DPB before starting to output. In the current WD of HEVC it is possible to signal that a subsequence consisting of a subset of the temporal layers in the original sequence uses a different DPB size. An encoder that would signal different DPB sizes for different temporal layers would, however, be required to evaluate the "bumping" operations for each subsequence, i.e. for each temporal layer, to validate that DPB size requirements are fulfilled and the display order of pictures is correct.
If the encoder uses different DPB sizes for different temporal layers and additionally signals switching points, it has to track all possible switching alternatives and keep track of the output status in each subsequence to ensure that the picture order is correct for each possible subsequence. This is also the case for a layer removing entity. However, whereas the encoder can control how many subsequences there are and thereby the complexity of this problem, a layer removing entity cannot; it has to handle every possible incoming bitstream.
SUMMARY
It is a general objective to provide a picture output process to be used in connection with picture decoding.
It is a particular objective to provide such a picture output process having acceptable output delays. These and other objectives are met by embodiments disclosed herein.

An aspect of the embodiments relates to a method of outputting decoded pictures of a video stream from a decoded picture buffer in a decoder. The method comprises calculating a picture order count limit based on at least one syntax element retrieved based on an encoded representation of a current picture of the video stream. The method also outputs decoded reference pictures stored in the decoded picture buffer and having a respective picture order count value that is lower than the calculated picture order count limit.
A related aspect of the embodiments defines a decoder comprising a decoded picture buffer configured to store decoded pictures of a video stream. A limit calculator of the decoder is configured to calculate a picture order count limit based on at least one syntax element retrieved based on an encoded representation of a current picture of the video stream. The decoder also comprises a picture outputting unit configured to output decoded reference pictures stored in the decoded picture buffer and having a respective picture order count value lower than the picture order count limit calculated by the limit calculator.
A further related aspect of the embodiments defines a receiver comprising a decoder comprising a decoded picture buffer configured to store decoded pictures of a video stream. A limit calculator of the decoder is configured to calculate a picture order count limit based on at least one syntax element retrieved based on an encoded representation of a current picture of the video stream. The decoder also comprises a picture outputting unit configured to output decoded reference pictures stored in the decoded picture buffer and having a respective picture order count value lower than the picture order count limit calculated by the limit calculator.

Another aspect of the embodiments relates to a method of encoding a current picture of a video stream in an encoder. The method comprises determining a picture order count limit to have a value enabling a target state of a decoded picture buffer in a decoder for the current picture. The picture order count limit defines a number of decoded reference pictures to be output from the decoded picture buffer in a picture output process for the current picture. The method also comprises determining at least one syntax element representative of the picture order count limit. The current picture is encoded to get an encoded representation of the current picture. The at least one syntax element is associated with this encoded representation.

Another related aspect of the embodiments defines an encoder comprising a limit determiner configured to determine a picture order count limit to have a value enabling a target state of a decoded picture buffer in a decoder for a current picture of a video stream. The determined picture order count limit defines a number of decoded reference pictures to be output from the decoded picture buffer in a picture output process for the current picture. A syntax element determiner is configured to determine at least one syntax element representative of the value determined for the picture order count limit.
An encoding unit is configured to encode the current picture to get an encoded representation of the current picture. The encoder also comprises an associating unit configured to associate the at least one syntax element with the encoded representation.
Yet another related aspect of the embodiments defines a transmitter comprising an encoder comprising a limit determiner configured to determine a picture order count limit to have a value enabling a target state of a decoded picture buffer in a decoder for a current picture of a video stream. The determined picture order count limit defines a number of decoded reference pictures to be output from the decoded picture buffer in a picture output process for the current picture. A syntax element determiner is configured to determine at least one syntax element representative of the value determined for the picture order count limit. An encoding unit is configured to encode the current picture to get an encoded representation of the current picture. The encoder also comprises an associating unit configured to associate the at least one syntax element with the encoded representation.
The present embodiments provide a picture output process enabling control of which decoded reference pictures are output from the decoded picture buffer and of the timing of outputting the decoded reference pictures. As a consequence, any delay in outputting decoded reference pictures can be reduced as compared to the prior art "bumping" process. In fact, with the present embodiments it is possible to start outputting decoded reference pictures even if the decoded picture buffer fullness has not reached the defined decoded picture buffer size.
The embodiments are particularly advantageous in connection with temporally scaled video streams and sequences where otherwise significant output delays can occur due to removal of various temporal layers.
BRIEF DESCRIPTION OF THE DRAWINGS
The embodiments, together with further objects and advantages thereof, may best be understood by making reference to the following description taken together with the accompanying drawings, in which:
Fig. 1 is a schematic illustration of a video stream of pictures comprising one or more slices;
Fig. 2 is an illustration of a data packet comprising a NAL unit;
Fig. 3 is an illustration of an encoded representation of a slice;
Fig. 4 is a flow diagram of a method of outputting decoded pictures according to an embodiment;
Fig. 5 is a flow diagram illustrating additional, optional steps of the method in Fig. 4 according to an embodiment;
Fig. 6 is a flow diagram illustrating additional, optional steps of the method in Figs. 4 and 5 according to an embodiment;
Fig. 7 is a flow diagram illustrating additional, optional steps of the method in Fig. 4 according to another embodiment;
Fig. 8 is a flow diagram illustrating additional, optional steps of the method in Fig. 7 according to an embodiment;
Fig. 9 is a flow diagram of a method of encoding a picture according to an embodiment;
Fig. 10 is a flow diagram illustrating additional, optional steps of the method in Fig. 9 according to an embodiment;
Fig. 11 is a schematic block diagram of a receiver according to an embodiment;
Fig. 12 is a schematic block diagram of a decoder according to an embodiment;
Fig. 13 is a schematic block diagram of a decoder according to another embodiment;
Fig. 14 is a schematic block diagram of a transmitter according to an embodiment;
Fig. 15 is a schematic block diagram of an encoder according to an embodiment; and
Fig. 16 is a schematic block diagram of an encoder according to another embodiment.

DETAILED DESCRIPTION
Throughout the drawings, the same reference numbers are used for similar or corresponding elements.
The present embodiments generally relate to the field of encoding and decoding of pictures of a video stream. In particular, the embodiments relate to outputting decoded reference pictures from a decoded picture buffer (DPB). The embodiments, hence, provide an outputting process that could be used instead of the current "bumping" process mentioned in the background section. The outputting process provides significant advantages over the prior art bumping process in terms of reducing output delay. These advantages are in particular obtained in connection with temporally scaled video sequences.

It is proposed herein to use a limit calculated from syntax elements signaled in the bitstream, i.e. in the encoded data generated by an encoder and transmitted to a decoder. This limit is then used at the decoder to identify those decoded reference pictures stored in the DPB that should be output, such as for display. In an embodiment, all pictures in the DPB with a picture order count value, such as PicOrderCntVal, lower than the limit and that have not yet been output are output, such as for display.
A syntax element is a codeword or data element forming part of the encoded data generated by an encoder and to be decoded by a decoder. Hence, a syntax element is typically a codeword or data element forming part of the control data associated with an encoded representation or such control data or header data present in an encoded representation of a picture. A syntax element can, for instance, be a codeword in a slice header of the encoded representation of a picture. Alternatively, a syntax element can, for instance, be a codeword in a parameter set or other control data associated with the encoded representation of a picture, e.g. retrievable from the bitstream based on data present in the encoded representation or sent outside of the bitstream but retrievable based on data present in the encoded representation.
Generally, a picture is typically output for display on a screen of or connected to the decoder. However, a picture could also be output for other reasons including, but not limited to, storage in a file; coding in another format or with other properties, such as transcoding; delivery to another unit or device for post-decoding processing; etc. Output as used herein therefore typically relates to output for display but also encompasses other forms of picture output, such as any of the above mentioned examples.
Fig. 1 is a schematic illustration of a video stream 1 of pictures 2. A picture 2 in HEVC is partitioned into one or more slices 3, where each slice 3 is an independently decodable segment of a picture 2. This means that if a slice 3 is missing, for instance was lost during transmission, the other slices 3 of that picture 2 can still be decoded correctly. Generally, in order to make slices 3 independent, they should not depend on each other. Hence, in a particular embodiment, no bitstream element of a slice 3 is required for decoding any element of another slice 3.
In HEVC a coded video stream or sequence, i.e. bitstream, comprises Network Abstraction Layer (NAL) units 11 as illustrated in Fig. 2. Basically, one NAL unit 11 comprises either a slice with a corresponding slice header including control information for that slice or the NAL unit 11 comprises a parameter set. The parameter set comprises control information.
A NAL unit 11 as output from an encoder is typically complemented with headers 12 to form a data packet 10 that can be transmitted as a part of a bitstream from the encoder to the decoder. For instance, Real-time Transport Protocol (RTP), User Datagram Protocol (UDP) and Internet Protocol (IP) headers 12 could be added to the NAL unit 11. This form of packetization of NAL units 11 merely constitutes an example in connection with video transport. Other approaches of handling NAL units 11, such as file format, MPEG-2 transport streams, MPEG-2 program streams, etc. are possible.
Examples of parameter sets that could be carried in NAL units 11 include Adaptation Parameter Set (APS), Video Parameter Set (VPS), Picture Parameter Set (PPS) and Sequence Parameter Set (SPS). APS comprises control information valid for more than one slice. The control information may differ between the slices. PPS comprises control information valid for several pictures, and may be the same for multiple pictures of the same video sequence. SPS comprises control information valid for an entire video sequence. As shown in Fig. 3 an encoded representation 20 of a slice comprises a slice header 21 which independently provides all required data for the slice to be independently decodable. An example of a syntax element or data element present in the slice header 21 is the slice address, which is used by the decoder to know the spatial location of the slice. Another example is the slice quantization delta which is used by the decoder to know what quantization parameter to use for the start of the slice. The encoded representation 20 also comprises, in addition to the slice header 21 , slice data 22 that comprises the encoded data of the particular slice, e.g. encoded color values of the pixels in the slice.
Fig. 4 is a flow diagram illustrating a method of outputting decoded pictures of a video stream from a DPB in a decoder. The method generally starts in step S1. In this step S1 a picture order count (POC) limit, sometimes referred to as limit X herein, is calculated based on at least one syntax element retrieved based on an encoded representation of a current picture of the video stream.
In a next step S2 of the method decoded reference pictures stored in the DPB and having a respective POC value lower than the POC limit calculated in step S1 are output.
Thus, one or more syntax elements obtained based on the encoded representation of a current picture are used to calculate a POC limit, which in turn defines those decoded reference pictures in the DPB that should be output. This means that an encoder, by determining the value(s) of the at least one syntax element, can control the delay of outputting decoded reference pictures from the DPB. It is even possible for the encoder to control the output process so that decoded reference pictures can be output from the DPB before the DPB is full.
In an embodiment step S1 comprises calculating the POC limit based on a POC value of the current picture and the at least one syntax element retrieved based on the encoded representation of the current picture. In an implementation example of this embodiment the at least one syntax element represents a so-called output distance value or OutputDistance. In another implementation example of this embodiment the output distance value is calculated based on the at least one syntax element. In these implementation examples the POC limit is preferably calculated based on the POC value of the current picture and the output distance value and more preferably as a difference between the POC value and the output distance value.
Hence, the output distance value defines a distance or step among POC values from the position of the current picture in the output order to the position of the latest, according to the output order, picture that should be output. Hence, any pictures stored in the decoded picture buffer and that precede, according to the output order as defined by the POC values, this latest picture should also be output in step S2. The output distance value is thereby used, preferably together with the POC value of the current picture, to define the picture with the highest POC value, i.e. the POC limit, which should be output. Pictures stored in the decoded picture buffer and that have respective POC values lower than this highest POC value should also be output.
In a particular embodiment step S1 involves calculating the POC limit as PicOrderCnt( CurrPic ) - OutputDistance, wherein PicOrderCnt( CurrPic ) denotes the POC value of the current picture and OutputDistance denotes the output distance value. However, the POC limit can be determined in other ways.
Accordingly, in this embodiment, a parameter OutputDistance is introduced. OutputDistance, as the retrieved syntax element, or syntax elements used to calculate the output distance value is (are) preferably signaled in the bitstream from the encoder to the decoder. If syntax elements are signaled, the output distance value is calculated at the decoder. The output distance value is then used for outputting "old" pictures, i.e. already decoded reference pictures, from the DPB in order to reduce the content of the DPB such that the output delay is decreased.
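As a non-normative sketch, the output process of steps S1 and S2 could be modeled as follows, assuming the limit PicOrderCnt( CurrPic ) - OutputDistance described above. The DPB is modeled here as a simple list of dicts with illustrative keys:

```python
def output_old_pictures(dpb, curr_poc, output_distance):
    """Step S1: calculate the POC limit from the POC value of the
    current picture and the output distance value. Step S2: output
    every picture in the DPB that has not yet been output and whose
    PicOrderCntVal is lower than the limit. Returns the POC values
    of the output pictures in output order."""
    poc_limit = curr_poc - output_distance
    to_output = sorted(p['poc'] for p in dpb
                       if p['needed_for_output'] and p['poc'] < poc_limit)
    for p in dpb:
        if p['poc'] < poc_limit:
            p['needed_for_output'] = False   # output and mark as output
    return to_output
```

Unlike the "bumping" process, nothing in this sketch requires the DPB to be full before pictures start being output.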
If syntax elements are used to calculate OutputDistance, i.e. the output distance value, the syntax elements may be signaled in the slice header or in a parameter set.
In the former case, the encoded representation 20 of the current picture, as represented by one or more encoded representations 20 of slices as shown in Fig. 3, may include the at least one syntax element that is used to calculate the POC limit. The at least one syntax element is then used to calculate the output distance value, which in turn is used, preferably together with the POC value of the current picture, to calculate the POC limit. This means that the decoder can directly retrieve the at least one syntax element during the parsing and decoding of the slice header 21.
In the latter case, the at least one syntax element is not necessarily present in the encoded representation 20 of the picture. However, the encoded representation 20 comprises data enabling retrieval of the at least one syntax element. Thus, if the at least one syntax element is present in a parameter set, such as an APS or a PPS, the encoded representation 20 preferably comprises a parameter set identifier, such as an APS identifier or a PPS identifier. The parameter set identifier is typically present in the slice header 21 of the encoded representation 20. In such a case, the parameter set identifier is retrieved during parsing and decoding of the slice header 21 and the decoder can then locate the particular parameter set that is identified by the parameter set identifier. The at least one syntax element is thereby retrieved from the identified parameter set.

In an alternative approach, the encoded representation 20 comprises, such as in the slice header 21, a first parameter set identifier, such as a PPS identifier, identifying a first parameter set, such as a PPS. This first parameter set could then comprise a second parameter set identifier, such as an SPS identifier, identifying a second parameter set, such as an SPS. The at least one syntax element could then be present in the second parameter set. The at least one syntax element is thereby obtained based on the second parameter set identifier as obtained using the first parameter set identifier retrieved from the encoded representation 20. If more than one syntax element is employed to calculate the POC limit, all of them could be retrieved directly from the encoded representation 20 or all could be retrieved from a parameter set identified based on a parameter set identifier obtained based on the encoded representation 20.
Alternatively, at least one of the multiple syntax elements is directly retrieved from the encoded representation 20 and at least one of the multiple syntax elements is retrieved from one or more parameter sets. As a further variant, the multiple syntax elements could be carried in different parameter sets, which are identified based on data, i.e. one or more parameter set identifiers, carried in the encoded representation 20.
In an embodiment, the variable OutputDistance, i.e. the output distance value, is introduced and calculated from syntax elements signaled for each picture. The syntax elements may be signaled in slice headers or in parameter sets. Reference pictures in the DPB that have not yet been output when a current picture ( CurrPic ) is to be decoded and that have a POC value, such as represented by PicOrderCntVal, lower than PicOrderCnt( CurrPic ) - OutputDistance are output. In a preferred embodiment OutputDistance is a non-negative integer. In an embodiment there is no limitation of the maximum value for OutputDistance. In a preferred embodiment OutputDistance is, though, limited to be in the range from 0 to N, inclusive, for some positive integer N. An example of such an N is MaxPicOrderCntLsb - 1, which is the maximum possible value that can be signaled for a syntax element pic_order_cnt_lsb. In HEVC PicOrderCnt( ) is defined to be consistently increasing relative to the latest IDR picture in decoding order. The variable PicOrderCntVal is derived as PicOrderCntVal = PicOrderCntMsb + pic_order_cnt_lsb, where pic_order_cnt_lsb is signaled for each picture and PicOrderCntMsb is calculated from a previous picture. The variable MaxPicOrderCntLsb, which is calculated from syntax elements in the SPS, is used to define the range of possible values for pic_order_cnt_lsb, i.e. 0 to MaxPicOrderCntLsb - 1, inclusive.
The function PicOrderCnt( ) is defined for a picture picX as PicOrderCnt( picX ) = PicOrderCntVal of the picture picX.
Thus, in an embodiment the maximum value of the variable OutputDistance is defined based on the syntax element in the SPS used to define the range of the parameter pic_order_cnt_lsb, based on which the POC value (PicOrderCntVal) of a picture is calculated.
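For orientation, the PicOrderCntVal derivation quoted above (PicOrderCntVal = PicOrderCntMsb + pic_order_cnt_lsb, with PicOrderCntMsb calculated from a previous picture) can be sketched as follows. This is a simplified, non-normative rendering in the style of the HEVC draft; the parameter names are illustrative:

```python
def update_poc(prev_msb, prev_lsb, lsb, max_lsb):
    """Derive PicOrderCntMsb from the previous picture's msb/lsb and
    the current pic_order_cnt_lsb, then return
    (PicOrderCntMsb, PicOrderCntVal = PicOrderCntMsb + pic_order_cnt_lsb)."""
    if lsb < prev_lsb and (prev_lsb - lsb) >= max_lsb // 2:
        msb = prev_msb + max_lsb    # lsb wrapped around upwards
    elif lsb > prev_lsb and (lsb - prev_lsb) > max_lsb // 2:
        msb = prev_msb - max_lsb    # lsb wrapped around downwards
    else:
        msb = prev_msb
    return msb, msb + lsb
```

For example, with MaxPicOrderCntLsb = 16, a pic_order_cnt_lsb of 2 following a previous lsb of 14 yields PicOrderCntVal = 18 rather than 2.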
In another example the maximum value of the variable OutputDistance, i.e. N, is MaxPOC - 1 if wrapped POC is used.
In this example the function PicOrderCnt( ) is defined with wrap around. POC wrap around works like this. Assume a sequence of pictures that constitutes a video sequence. Regardless of the length of the video sequence a display order number can be assigned to each picture that simply represents the order in which the pictures should be displayed. Denote this number the "true_POC". However, in order to save bits during encoding the true_POC is not signaled in the bitstream. Instead a range-limited syntax element is encoded, referred to as "POC" and limited to be in the range from 0 to MaxPOC - 1, inclusive. One possible way to calculate POC from the true_POC is to use the modulo operation denoted "%", i.e. POC = true_POC % MaxPOC. That means, for example, that the true_POC value equal to MaxPOC + 1 will be given the POC value 1. A DiffPOC( ) function can be defined for wrapped POC to give correct differences, i.e. distances, between two POC values as long as -MaxPOC/2 ≤ true_POC distance < MaxPOC/2.
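The modulo-based wrap around and a DiffPOC( ) function of the kind described above can be sketched non-normatively as:

```python
def wrap_poc(true_poc, max_poc):
    """POC = true_POC % MaxPOC, as described above."""
    return true_poc % max_poc

def diff_poc(poc_a, poc_b, max_poc):
    """Wrap-aware POC difference; correct as long as the true distance
    lies in the interval [-MaxPOC/2, MaxPOC/2)."""
    d = (poc_a - poc_b) % max_poc
    if d >= max_poc // 2:
        d -= max_poc
    return d
```

With MaxPOC = 256, for instance, the true_POC value 257 wraps to POC 1, and diff_poc(1, 255, 256) correctly yields a distance of 2 even though the wrapped values suggest -254.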
In a particular embodiment POC values increase by a defined number when proceeding through pictures according to the output order. For instance, the POC values could increase by one, such as ...5, 6, 7, 8, ...

Further embodiments of the method of outputting decoded pictures will now be further described in connection with Fig. 5. Fig. 5 illustrates additional, optional steps of the method in Fig. 4. The method starts in step S10, in which a first output flag is retrieved based on the encoded representation. This retrieval of the first output flag is advantageously performed by retrieving the first output flag from the encoded representation, such as from the slice header. In an alternative approach, the first output flag is retrieved from a parameter set identified based on data retrieved from the encoded representation, such as a parameter set identifier present in the slice header.
The value of this first output flag is investigated in the optional step S11. If the first output flag has a first predefined value, such as 1bin, the method continues to step S12. In this step S12 an output distance syntax element is retrieved based on the encoded representation. This output distance syntax element is preferably present in the encoded representation, such as in the slice header. Alternatively, it is present in a parameter set identifiable based on a parameter set identifier present in the encoded representation.
The retrieved output distance syntax element is employed in step S13 to determine an output distance value. The method then continues to step S1 of Fig. 4 where the POC limit is calculated based on a POC value of the current picture and the output distance value, preferably equal to the POC value of the current picture subtracted by the output distance value determined in step S13.
In an embodiment the output distance value is determined in step S13 to be equal to half of a difference syntax element representing a largest POC difference that can be signaled from the encoder to the decoder, e.g. MaxPicOrderCntLsb/2, if the output distance syntax element has a predefined value, preferably 0 or 0bin. In this embodiment the output distance value is preferably determined in step S13 to be equal to the output distance syntax element if the output distance syntax element has a value different from the predefined value, such as 0.
If the first output flag as retrieved in step S10 has a second predefined value, preferably 0bin, the method instead continues to step S14. In this step S14 the output distance value is preferably determined to be equal to zero. The method then continues to step S1 of Fig. 4 where the POC limit is calculated based on the output distance value. In a preferred embodiment, the POC limit is calculated based on the POC value of the current picture and the output distance value, more preferably the POC limit is equal to the POC value of the current picture subtracted by the (zero) output distance value. Hence, in the above described embodiments the variable OutputDistance, i.e. the output distance value, and therefore the POC limit is calculated from two new syntax elements: output_all_preceding_pics_flag (first output flag) and output_distance_idc (output distance syntax element). Thus, in an embodiment, if output_all_preceding_pics_flag equals 0, OutputDistance = 0. Otherwise, i.e. if output_all_preceding_pics_flag is different from 0, such as 1, then if output_distance_idc equals 0 then OutputDistance = MaxPicOrderCntLsb/2. Otherwise, i.e. if output_all_preceding_pics_flag is different from 0, such as 1, and output_distance_idc is different from 0, then OutputDistance = output_distance_idc.
In another embodiment a further syntax element could be used to signal whether syntax element(s) to be used for calculating the output distance value and therefore the POC limit is present for each picture or if the output distance value could be inferred to be a specific value. Such an embodiment is shown in Fig. 6.
Fig. 6 starts in step S20 where a second output flag is retrieved based on the encoded representation. This second output flag is preferably retrieved from a parameter set, such as PPS or SPS, using a parameter set identifier retrieved from the encoded representation or retrieved from a second parameter set identified based on a second parameter set identifier present in the encoded representation. It could alternatively be possible to include the second output flag in the encoded representation, such as in the slice header.
In a next optional step S21 it is investigated whether the second output flag has a predefined value, preferably 1bin. In such a case, the method continues to step S22 where the output distance value is determined or inferred to be set equal to a predefined value, preferably zero. The method then continues to step S1 where the POC limit is calculated based on the zero output distance value. In a preferred embodiment the POC limit is calculated to be based on, preferably equal to, the POC value of the current picture subtracted by the output distance value. Since the output distance value is zero in this case the POC limit is preferably equal to the POC value of the current picture.
If the second output flag does not have the predefined value, i.e. has another predefined value, preferably 0bin, the method continues to step S10. Hence, if the second output flag is, for instance, equal to 0bin the retrieving steps S10, S12 and the determining step S13 of Fig. 5 or the retrieving step S10 and the determining step S14 of Fig. 5 are preferably performed in order to determine the output distance value, which is used in step S1 to calculate the POC limit.
Thus, in an embodiment another syntax element output_distance_always_zero (second output flag) in SPS, PPS or in another appropriate data field defines whether syntax elements used to calculate the variable OutputDistance (output distance value) shall be present for each picture or if OutputDistance shall be inferred to be a specific value. If syntax elements used to calculate OutputDistance are not present it is preferred that OutputDistance is inferred to be set equal to 0. In an embodiment the syntax element output_distance_always_zero is a one bit flag. OutputDistance can then be calculated as exemplified below.
If output_distance_always_zero equals 1, OutputDistance = 0. Otherwise, i.e. output_distance_always_zero is different from 1, such as 0, then if output_all_preceding_pics_flag (first output flag) equals 0, then OutputDistance = 0. Otherwise, i.e. output_distance_always_zero is different from 1, such as 0, and output_all_preceding_pics_flag is different from 0, such as 1, then if output_distance_idc (output distance syntax element) equals 0, then OutputDistance = MaxPicOrderCntLsb/2. Otherwise, i.e. output_distance_always_zero is different from 1, such as 0, output_all_preceding_pics_flag is different from 0, such as 1, and output_distance_idc is different from 0, then OutputDistance = output_distance_idc.
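The inference rules above can be summarized in a short sketch. The function and argument names mirror the syntax elements of the embodiment but are not part of any standardized API:

```python
def output_distance(output_distance_always_zero,
                    output_all_preceding_pics_flag,
                    output_distance_idc,
                    max_pic_order_cnt_lsb):
    """Derive OutputDistance per the embodiment above (illustrative)."""
    if output_distance_always_zero == 1:
        return 0                          # inferred, no per-picture syntax
    if output_all_preceding_pics_flag == 0:
        return 0                          # first output flag equals 0
    if output_distance_idc == 0:
        return max_pic_order_cnt_lsb // 2  # MaxPicOrderCntLsb/2
    return output_distance_idc            # explicitly signaled distance
```

For example, with output_distance_always_zero equal to 0, the first output flag equal to 1 and output_distance_idc equal to 0, OutputDistance becomes MaxPicOrderCntLsb/2.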
The decoded reference pictures are, as discussed in the background section, stored in a respective frame buffer in the DPB. In a particular embodiment step S2 of Fig. 4 comprises outputting, in increasing order of POC values starting from a lowest POC value, decoded reference pictures that i) are stored in the DPB, ii) have a respective POC value that is lower than the POC limit and iii) are marked as "needed for output".
Thus, in this embodiment decoded reference pictures stored in frame buffers of the DPB can be marked as needed for output if they need to be output, e.g. for display. Correspondingly, a decoded reference picture that is not needed for output, such as has already been output, e.g. for display, is typically marked as "not needed for output".
In an optional embodiment the method also comprises an additional step S30 as shown in Fig. 7. In this step S30 the decoded reference pictures that are output in step S2 are marked as not needed for output. This means that the decoded reference picture(s) that previously was(were) marked as needed for output and that was(were) output in step S2 of Fig. 4 is(are) remarked as not needed for output in step S30. This remarking is used to indicate that the decoded reference picture(s) has(have) already been output for display and therefore do(es) not need to be output any longer. An optional but preferred additional step S31 of the method comprises emptying any frame buffer of the DPB that stores a decoded reference picture marked as "unused for reference" and marked, such as in step S30, as "not needed for output". Hence, at this step S31 one or more of the frame buffers of the DPB could become empty and available for storing a new decoded picture. If any frame buffer is emptied in step S31 the DPB fullness is preferably reduced by the corresponding number of frame buffers that have been emptied in step S31.
Once the outputting process as disclosed in the foregoing has been performed the encoded representation can be decoded in step S40 of Fig. 8 to get a current decoded picture. Decoding of pictures using the slice data of the encoded representation and control information as defined by the slice header is performed according to techniques well known in the art. The current decoded picture obtained in step S40 can then be stored in an empty frame buffer in the DPB in step S41.
Hence, in an embodiment outputting of decoded reference pictures is preferably performed prior to decoding the slice data of the encoded representation of the current slice.
Thus, in an embodiment the removal of pictures from the DPB before decoding of the current picture, but after parsing the slice header of the first slice of the current picture, proceeds as follows. The decoding process for reference picture set is invoked. If the current picture is not an IDR picture, frame buffers containing a picture which is marked as "not needed for output" and "unused for reference" are emptied without output. The DPB fullness is decremented by the number of frame buffers emptied. When there are one or more pictures picX in the DPB marked as "needed for output" with PicOrderCnt( picX ) < PicOrderCnt( CurrPic ) - OutputDistance the output process specified below is invoked repeatedly until all pictures picX with PicOrderCnt( picX ) < PicOrderCnt( CurrPic ) - OutputDistance have been marked as "not needed for output". The output process is preferably invoked in the following case: there are one or more pictures picX in the DPB marked as "needed for output" with PicOrderCnt( picX ) < PicOrderCnt( CurrPic ) - OutputDistance as specified above. The output process consists, in an embodiment, of the following ordered steps: 1. The picture that is first for output is selected as the one having the smallest value of PicOrderCntVal of all pictures in the DPB marked as "needed for output".
2. The picture is optionally cropped, using the cropping rectangle specified in the active SPS for the picture, the optionally cropped picture is output, and the picture is marked as "not needed for output".
3. If the frame buffer that included the picture that was output and optionally cropped contains a picture marked as "unused for reference", the frame buffer is emptied and the DPB fullness is decremented by 1. In an alternative version of the embodiments, wrapped POC is used to signal the POC values. For that case PicOrderCntVal and PicOrderCnt( ) are calculated relative to each current picture.
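The ordered output steps above can be sketched as follows (cropping omitted). The Picture class and the modeling of the DPB as a plain list are simplifications introduced here purely for illustration:

```python
from dataclasses import dataclass

@dataclass
class Picture:
    poc: int                   # PicOrderCntVal
    needed_for_output: bool
    used_for_reference: bool

def invoke_output_process(dpb, curr_poc, output_dist):
    """Repeatedly apply the ordered output steps until no picture
    marked "needed for output" has POC below the POC limit.
    Returns the POC values output, in increasing order."""
    limit = curr_poc - output_dist
    out = []
    while True:
        cands = [p for p in dpb if p.needed_for_output and p.poc < limit]
        if not cands:
            break
        pic = min(cands, key=lambda p: p.poc)  # step 1: smallest PicOrderCntVal
        out.append(pic.poc)                    # step 2: output (cropping omitted)
        pic.needed_for_output = False          # step 2: mark "not needed for output"
        if not pic.used_for_reference:         # step 3: empty the frame buffer
            dpb.remove(pic)
    return out
```

A picture that is output but still marked "used for reference" stays in its frame buffer; a picture marked "unused for reference" is removed and its buffer becomes free.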
Fig. 9 is a flow diagram of a method of encoding a current picture of a video stream in an encoder according to an embodiment. The method generally starts in step S50 where a POC limit is determined to have a value enabling, determining or defining a target state of a DPB in a decoder for the current picture. The POC limit determined in step S50 defines a number of decoded reference pictures to be output from the DPB in a picture output process invoked for the current picture.
In a next step S51 at least one syntax element representative of the value of the POC limit determined in step S50 is determined. Hence, the at least one syntax element determined in step S51 enables determination or calculation of the POC limit.
The current picture is encoded in step S52 to get an encoded representation of the current picture. This encoded representation may be in the form of an encoded representation 20 of a slice comprising a slice header 21 and slice data 22, such as packed into a NAL unit 11 as shown in Fig. 2. Alternatively, the encoded representation of the current picture could, if the current picture consists of multiple slices, be in the form of multiple respective encoded representations 20 of slices, each having a respective slice header 21 and slice data 22. These multiple encoded representations 20 of slices could be packed in separate NAL units 11.
Encoding of a picture is performed according to techniques well known in the art of picture and video coding. The at least one syntax element determined in step S51 is associated with or to the encoded representation in step S53. This step S53 can be performed prior to, after or substantially in parallel with step S52. Associating the at least one syntax element with the encoded representation can be performed according to various embodiments as mentioned herein. The at least one syntax element could, for instance, be added to the encoded representation, such as inserted into the slice header of the encoded representation in step S53. If the picture is composed of multiple slices and each slice is encoded into a separate encoded representation of a slice, the at least one syntax element could be inserted into the encoded representation of the first slice of the picture, such as in the slice header for this first slice in step S53. However, in order to provide robustness in the case the encoded representation of the first slice is lost during the transmission from the encoder to the decoder, each encoded representation of a slice for the picture preferably comprises the at least one syntax element. As an alternative to inserting the at least one syntax element into the slice header of the encoded representations of the slices for the picture, the at least one syntax element could be inserted, in step S53, into one or more parameter sets. In such a case, one or more parameter set identifiers enabling identification of the relevant parameter set(s) is(are) inserted, in step S53, into the encoded representation of the picture, for instance in the slice headers of each slice in the picture. If multiple syntax elements are determined in step S51 these could be distributed among a parameter set and slice headers as previously disclosed herein.
In such a case, step S53 involves inserting at least one syntax element into the slice header and inserting at least one parameter set identifier into the slice header, where this at least one parameter set identifier enables identification of at least one parameter set carrying at least one syntax element as determined in step S51.
The encoding method as shown in Fig. 9 can therefore, by determining the POC limit to have a particular value, determine or define a target state that the DPB in the decoder will have for the current picture. This means that if a particular target state of the DPB is desired, such as a desired DPB fullness, the POC limit is determined in step S50 to have a value that will achieve the particular target state, such as DPB fullness, when processing the encoded representation of the current picture at the decoder.
As previously disclosed herein the DPB preferably comprises a number of frame buffers in which decoded reference pictures are stored. In an embodiment of step S50 the POC limit is determined to have a value such that at least one frame buffer in the DPB is emptied from a decoded reference picture marked as unused for reference if there are no empty frame buffers in the DPB prior to emptying the at least one frame buffer. Hence, in this case the DPB fullness, prior to processing the current picture at the decoder, is equal to the DPB size so there are no empty frame buffers in which the current picture can be entered once it has been decoded by the decoder. The desired target state of the DPB is in this case therefore to achieve a DPB fullness that is lower than the DPB size to allow room for the current picture in the DPB.
The encoder has all the knowledge of the DPB status and therefore knows which decoded reference pictures are stored in the frame buffers of the DPB in the decoder at the time of decoding of the current picture. This means that the encoder also knows the respective POC values of these stored decoded reference pictures, their markings and the POC value of the current picture. All these, i.e. the POC values and the markings, are actually determined and set by the encoder. The encoder can thereby, such as based on the POC values of the decoded reference pictures stored in the DPB prior to decoding the current picture and the POC value of the current picture, determine a POC limit to have a value so that when the decoder performs the previously described picture output method or process at least one frame buffer of the DPB is emptied.
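A minimal encoder-side sketch of this idea follows, under the assumption that the encoder simply targets the lowest-POC picture already marked "unused for reference" but still marked "needed for output". The helper name and strategy are illustrative and not prescribed by the embodiment:

```python
def choose_output_distance(unused_for_ref_pocs, curr_poc):
    """Pick OutputDistance so the POC limit (curr_poc - OutputDistance)
    exceeds the POC of at least one picture marked "unused for reference",
    so its frame buffer is emptied by the decoder's output process.
    Assumes the listed pictures are still marked "needed for output"."""
    target = min(unused_for_ref_pocs)  # cheapest buffer to free
    # Only pictures with POC < limit are output, so the limit must be
    # strictly greater than the target POC.
    return curr_poc - (target + 1)
```

With unused-for-reference pictures at POC 2 and 4 and a current POC of 8, an OutputDistance of 5 yields a POC limit of 3, which frees the buffer holding the picture at POC 2.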
In an alternative or additional embodiment of step S50 the POC limit is determined to have a value defined based on the coding structure of the video stream and preferably based on POC values of future pictures of the video stream, i.e. pictures following the current picture in decoding order. This embodiment is particularly suitable if there already is at least one empty frame buffer in the DPB for the current picture. Thus, the POC limit value is in this embodiment determined based on the coding structure of the video stream, i.e. the encoding and decoding relationships between pictures in the video stream. Information of pictures that are encoded and thereby decoded based on other pictures, i.e. reference pictures, can be used in step S50 to, at least partly, define that the DPB comprises the relevant reference pictures that are to be used by the decoder when decoding the current picture and preferably also when decoding following pictures of the video stream. In other words, the encoder can, by determining a suitable value of the POC limit in step S50, make sure that reference pictures that have already been marked as "unused for reference", i.e. are no longer needed as reference pictures for the current and/or following pictures, are emptied from the DPB so that new reference pictures can be added to the DPB. The POC limit is thereby determined so that a target status of the DPB is achieved and any reference pictures that might be needed as reference for future picture decoding could be entered in frame buffers of the DPB. The syntax element that is determined in step S51 of Fig. 9 depends on the particular embodiment. For instance, the syntax element determined in step S51 could be at least one of output_distance_idc, MaxPicOrderCntLsb, output_all_preceding_pics_flag, output_distance_always_zero.
Thus, the at least one syntax element determined in step S51 could, for instance, be the previously mentioned output distance syntax element and preferably also the first output flag and optionally the second output flag. Additionally, the maximum picture order count value could also be included as syntax element.
For instance, the output distance value (OutputDistance) could be determined based on the POC limit, such as PicOrderCnt( CurrPic ) - POC limit. In an embodiment, if this OutputDistance becomes zero, the first output flag (output_all_preceding_pics_flag) could be set to zero. Alternatively, if OutputDistance should be equal to half of the largest POC difference (MaxPicOrderCntLsb/2) the first output flag is preferably set to one and the output distance syntax element (output_distance_idc) is preferably set to zero. Otherwise the first output flag is preferably set to one and the output distance syntax element is preferably set to the determined value of the output distance value.
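The signaling choice described here can be sketched as a small hypothetical helper that returns the first output flag and, when needed, the idc value (None indicates the flag alone suffices):

```python
def encode_output_syntax(curr_poc, poc_limit, max_pic_order_cnt_lsb):
    """Map an OutputDistance to (output_all_preceding_pics_flag,
    output_distance_idc), per the embodiment above (illustrative)."""
    output_dist = curr_poc - poc_limit
    if output_dist == 0:
        return 0, None                       # flag 0: distance is zero
    if output_dist == max_pic_order_cnt_lsb // 2:
        return 1, 0                          # idc 0 means MaxPicOrderCntLsb/2
    return 1, output_dist                    # idc carries the distance itself
```

This is the encoder-side inverse of the decoder's OutputDistance derivation described earlier.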
In an alternative embodiment also the second output flag (output_distance_always_zero) is used. This is in particular beneficial if many of the pictures in the video stream should have an output distance value of zero.
In an embodiment a bitstream restriction may be imposed on the value of the output distance. A reason for this is that the bitstream otherwise could become somewhat sensitive to loss of data packets carrying encoded slices (see Fig. 2) if very large values are allowed for the output distance. Fig. 10 is a flow diagram illustrating additional, optional steps of the method in Fig. 9 when using such bitstream restrictions. This embodiment is particularly suitable for HEVC or other video codecs in which each picture has a respective POC value and a respective temporal identifier.
The method continues from step S51 in Fig. 9. In a next step S60 a value X is compared to another value X'. The value X represents and is preferably equal to the highest POC value of all decoded pictures of the video stream with temporal identifier lower than or equal to a temporal identifier of the current picture and that have been output prior to invoking the output method or process for the current picture. The value X' correspondingly represents and is preferably equal to the highest POC value of all decoded pictures of the video stream with temporal identifier lower than or equal to the temporal identifier of the current picture and that have been output after invoking the output method or process for a previous picture in the video stream. This previous picture is previous to the current picture according to the decoding order of the video stream. In addition, the previous picture has a temporal identifier lower than or equal to the temporal identifier of the current picture. In a particular embodiment, the previous picture is preferably the closest, according to the decoding order, picture with temporal identifier equal to or lower than the temporal identifier of the current picture that precedes the current picture according to the decoding order.
If the value X is equal to the value X' as investigated in step S60 the method continues to step S51 of Fig. 9. Hence, in this embodiment no restriction is needed for the output distance value.
However, if the value X is not equal to the value X' the method continues, in this embodiment, to step S61. In step S61 a new value of the syntax element is set, e.g. by setting a new value of the output distance value, which is smaller than the POC value of the current picture subtracted by the value X. Hence, in an embodiment the output distance value is determined to be smaller than PicOrderCntVal( CurrPic ) - X.
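One way to sketch this restriction follows. The clamping strategy is an assumption introduced for illustration; the embodiment only requires the output distance, when X differs from X', to be smaller than PicOrderCntVal( CurrPic ) - X:

```python
def restricted_output_distance(curr_poc, x, x_prime, proposed_dist):
    """Apply the bitstream restriction of steps S60/S61 (illustrative).
    x, x_prime: highest output POCs as defined in the embodiment."""
    if x == x_prime:
        return proposed_dist              # S60: values equal, no restriction
    # S61: distance must be strictly smaller than curr_poc - x,
    # here clamped to the largest admissible integer value.
    return min(proposed_dist, curr_poc - x - 1)
```

With a current POC of 10, X = 4 and X' = 3, a proposed distance of 9 is clamped to 5, since the distance must be smaller than 10 - 4 = 6.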
Fig. 12 is a schematic block diagram of a decoder 40 according to an embodiment. The decoder 40 comprises a decoded picture buffer (DPB) 48 configured to store decoded pictures of a video stream. The decoder 40 also comprises a limit calculator 41 , also denoted limit calculating unit, means or module. The limit calculator 41 is configured to calculate a POC limit based on at least one syntax element retrieved based on an encoded representation of a current picture of the video stream. A picture outputting unit 42, also denoted picture output or picture outputting means or module, is implemented in the decoder 40 and configured to output decoded reference pictures stored in the DPB 48 and having a respective POC value that is lower than the POC limit.
In a particular embodiment the decoder 40 comprises an optional flag retriever 43, also denoted flag retrieving unit, means or module. The flag retriever 43 is configured to retrieve a first output flag based on the encoded representation. The flag retriever 43 could be configured to retrieve the first output flag from a slice header of the encoded representation or from a parameter set using a parameter set identifier obtained based on, such as present in, the slice header of the encoded representation as previously discussed herein.
The decoder 40 preferably, in this particular embodiment, also comprises an optional element retriever 44, also denoted element retrieving unit, means or module. The element retriever 44 is configured to retrieve an output distance syntax element if the first output flag retrieved by the flag retriever 43 has a first predefined value, such as 1bin. The element retriever 44 is configured to retrieve this output distance syntax element based on the encoded representation, such as from a slice header in the encoded representation. An optional value determiner 45, also denoted value determining unit, means or module, of the decoder 40 is configured to determine an output distance value based on the output distance syntax element retrieved by the element retriever 44. In this embodiment the limit calculator 41 is configured to calculate the POC limit to be based on, preferably equal to, the POC value of the current picture subtracted by the output distance value determined by the value determiner 45. The value determiner 45 is, in an embodiment, configured to determine the output distance value to be based on, preferably equal to, half of a largest POC difference if the output distance syntax element retrieved by the element retriever 44 has a predefined value, such as 0. Correspondingly, if the output distance syntax element has a value different from this predefined value the value determiner 45 is preferably configured to determine the output distance value to be based on, preferably equal to, the value defined or represented by the output distance syntax element.
In an embodiment, the value determiner 45 is configured to determine the output distance value to be equal to zero if the first output flag retrieved by the flag retriever 43 has a second predefined value, such as 0bin. In this embodiment the limit calculator 41 preferably calculates the POC limit to be equal to the POC value of the current picture subtracted by the output distance value, i.e. equal to the POC value of the current picture since the output distance value is zero in this example.
In an optional embodiment the flag retriever 43 is configured to retrieve a second output flag based on the encoded representation. The flag retriever 43 typically retrieves the second output flag from a parameter set identified based on a parameter set identifier present in a slice header of the encoded representation or present in a second parameter set identified based on a second parameter set identifier present in the slice header of the encoded representation. In an embodiment the flag retriever 43 is then configured to retrieve the first output flag if this second output flag has a predefined value, such as 0bin. If the second output flag instead has a second predefined value, such as 1bin, the output distance value could have a predefined value, such as zero, so that no retrieval of any first output flag or retrieval of any output distance syntax element is needed to calculate the POC limit.
The picture outputting unit 42 is, in an embodiment, configured to output, in increasing order of POC values starting with the lowest POC value, decoded reference pictures that are stored in the DPB, have a respective POC value lower than the POC limit calculated by the limit calculator 41 and are marked as needed for output.
The decoder 40 comprises, in an optional embodiment, a picture marking unit 46, also denoted picture marker or picture marking means or module. The picture marking unit 46 is configured to mark the decoded reference pictures that are output by the picture outputting unit 42 as "not needed for output" to indicate that the decoded reference picture(s) already has(have) been output.
The picture outputting unit 42 is preferably configured to empty any frame buffer 49 of the DPB 48 storing a decoded reference picture that is marked as "not needed for output" and marked as "unused for reference".
The decoder 40 preferably also comprises a decoding unit 47, also denoted picture decoder or decoding means or module. The decoding unit 47 is configured to decode the encoded representation to get a current decoded picture. This current decoded picture can then be stored in the DPB 48 in an empty frame buffer 49.
In a particular embodiment the decoder 40 decodes the information needed to calculate the POC limit (limit X), e.g. PicOrderCntVal of the current picture and OutputDistance. The syntax elements that may be used for calculating OutputDistance may be specified by standard specifications. The decoder 40 calculates OutputDistance based on the received syntax elements.
The decoder 40 displays pictures in the DPB 48 marked as "needed for output" with PicOrderCntVal lower than the POC limit (limit X), e.g. defined as PicOrderCnt( CurrPic ) - OutputDistance, in increasing order of PicOrderCntVal starting with the one with lowest PicOrderCntVal. Pictures that have been displayed are marked as "not needed for output".
The current picture is decoded and marked according to its OutputFlag, e.g. "needed for output" or "not needed for output".
The decoder 40 of Fig. 12 with its included units 41-47 could be implemented in hardware. There are numerous variants of circuitry elements that can be used and combined to achieve the functions of the units 41-47 of the decoder 40. Such variants are encompassed by the embodiments. Particular examples of hardware implementations of the decoder 40 are implementations in digital signal processor (DSP) hardware and integrated circuit technology, including both general-purpose electronic circuitry and application-specific circuitry.
The decoder 50 can also be implemented by means of a processor 52 and a memory 54 as illustrated in Fig. 13. Thus, in an embodiment the decoder 50 is implemented e.g. by one or more of a processor 52 and adequate software with suitable storage or memory 54 therefore, a programmable logic device (PLD) or other electronic component(s). In addition, the decoder 50 preferably comprises an input or input unit 51 configured to receive the encoded representations of the video stream, such as in the form of NAL units. A corresponding output or output unit 53 is configured to output the decoded pictures.
The decoder can be implemented in a device, such as a mobile device exemplified by mobile phones, tablets, video cameras, set-top boxes, etc. Fig. 11 illustrates such an example where the decoder 32 is located in a receiver 30, such as in a video camera or a display, e.g. in a mobile device. The receiver 30 then comprises an input or input unit 31 configured to receive a coded bitstream, such as data packets of NAL units as shown in Fig. 2. The encoded representations of the NAL units are decoded by the decoder 32 as disclosed herein. The decoder 32 preferably comprises or is connected to a reference picture buffer 34 that temporarily stores already decoded reference pictures 35 that are to be used as reference pictures for other pictures in the video stream. Decoded pictures are output from the receiver 30, such as from the reference picture buffer 34, by means of an output or output unit 33. These output pictures are sent to be displayed to a user on a screen or display of, or connected (including wirelessly connected) to, the receiver 30.
Fig. 15 is a schematic block diagram of an encoder 70 according to an embodiment. The encoder 70 comprises a limit determiner 71, also denoted limit determining unit, means or module. The limit determiner 71 is configured to determine a POC limit to have a value enabling, defining or determining a target state of a DPB in a decoder for a current picture of a video stream. This determined POC limit defines a number of decoded reference pictures to be output from the DPB in a picture output process invoked by the decoder for the current picture.
A syntax element determiner 72, also denoted syntax element determining unit, means or module, is configured to determine at least one syntax element representative of the value of the POC limit determined by the limit determiner 71. An encoding unit 73, also denoted picture encoder or encoding means or module, of the encoder 70 is configured to encode the current picture to get an encoded representation of the current picture.
The encoder 70 also comprises an associating unit 74, also denoted associator or associating means or module. The associating unit 74 is configured to associate the at least one syntax element determined by the syntax element determiner 72 with or to the encoded representation. For instance, the associating unit 74 could be configured to include a syntax element in the slice header of the encoded representation and/or include a parameter set identifier in the slice header, where this parameter set identifier enables identification of a parameter set comprising a syntax element.
In an embodiment the limit determiner 71 is configured to determine the POC limit to have a value selected so that at least one frame buffer in the DPB is emptied from a decoded reference picture marked as unused for reference if there are no such empty frame buffers in the DPB prior to emptying the at least one frame buffer. Thus, the particular value determined for the POC limit frees a frame buffer and thereby makes room in the DPB for the current picture during decoding.
In an alternative or additional embodiment the limit determiner 71 is configured to determine the POC limit to have a value defined based on the coding structure of at least a portion of the video stream and preferably on POC values of future pictures of the video stream. This embodiment is particularly suitable if there is at least one empty frame buffer in the DPB for the current picture. Hence, the limit determiner 71 preferably and at least partly determines the POC limit based on the coding structure, i.e. how pictures of the video stream are encoded and decoded relative to each other (used as reference pictures), for the current picture but preferably also for future pictures of the video stream that follow the current picture according to the decoding order.
In an embodiment the encoder 70 may impose a bitstream restriction on the at least one syntax element determined by the syntax element determiner 72. In such a case, the encoder 70 preferably comprises a comparator 75, also denoted comparing unit, means or module. The comparator 75 is configured to compare a value X with a value X'. The value X is preferably equal to the highest POC of all decoded pictures of the video stream with temporal identifier lower than or equal to a temporal identifier of the current picture and that have been output prior to invoking the picture output process for the current picture. The value X' is preferably equal to the highest POC value of all decoded pictures with temporal identifier lower than or equal to the temporal identifier of the current picture and that have been output after invoking a picture output process for a previous picture. The previous picture is previous to the current picture according to the decoding order of the video stream and has a temporal identifier lower than or equal to the temporal identifier of the current picture.
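A minimal sketch of this X versus X' check and the resulting restriction on the signalled value (Python; the function and parameter names are illustrative, only the described semantics are kept):

```python
def restricted_syntax_value(current_poc, x, x_prime, proposed_value):
    """Apply the optional bitstream restriction: if X (highest POC
    output before this picture's output process) differs from X'
    (highest POC output after the previous picture's output process),
    the syntax element value must be smaller than current_poc - X.
    Otherwise the proposed value can be kept as-is."""
    if x != x_prime and proposed_value >= current_poc - x:
        # largest value still satisfying value < current_poc - X
        return current_poc - x - 1
    return proposed_value

print(restricted_syntax_value(10, x=6, x_prime=4, proposed_value=5))  # → 3
print(restricted_syntax_value(10, x=6, x_prime=6, proposed_value=5))  # → 5
```

In the first call the two output states differ, so the proposed value 5 violates the bound current_poc - X = 4 and is clamped to 3; in the second call X equals X' and the proposed value stands.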
In this embodiment the syntax element determiner 72 is configured to set a new value of at least one syntax element of the at least one syntax element, which value is smaller than the POC value of the current picture subtracted by the value X, if the value X is different from the value X' as determined by the comparator 75. If the value X is equal to the value X' no new value of the at least one syntax element needs to be determined. In a particular embodiment and before encoding a new picture the encoder 70 preferably ensures that there is an empty frame buffer in the DPB that can be used by the new picture. If the DPB is full, i.e. there is no empty frame buffer, a frame buffer is emptied by the encoder by selecting a value of OutputDistance such that at least one picture marked as "unused for reference" is output. Otherwise the encoder 70 preferably selects any value for OutputDistance within the specified allowed range according to what is needed for the coding structure and POC values of future pictures. If the selected value for OutputDistance is larger than what is allowed by an optional bitstream restriction, the encoder 70 preferably selects a new value for OutputDistance that is not larger than what is required by the bitstream restriction. The encoder 70 encodes the value of OutputDistance using the syntax elements from which the OutputDistance is calculated at the decoder. The picture is then encoded.
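The encoder-side selection just described can be sketched as follows. The DPB model, its assumed capacity and the helper names are illustrative assumptions; only the selection logic follows the description, where a smaller OutputDistance yields a higher POC limit and therefore more output:

```python
from dataclasses import dataclass

@dataclass
class Picture:
    poc: int
    used_for_reference: bool = True

def select_output_distance(dpb, current_poc, desired_distance,
                           restriction_max=None):
    """Pick OutputDistance before encoding a new picture.  If the DPB
    is full, clamp the distance so that at least one picture marked
    'unused for reference' is output (the POC limit, current_poc -
    distance, must exceed that picture's POC, freeing its buffer).
    Then honour an optional bitstream restriction on the value."""
    DPB_SIZE = 4                                  # assumed capacity
    if len(dpb) >= DPB_SIZE:                      # no empty frame buffer
        unused = [p.poc for p in dpb if not p.used_for_reference]
        if unused:
            # guarantee output of the lowest-POC unused picture
            desired_distance = min(desired_distance,
                                   current_poc - min(unused) - 1)
    if restriction_max is not None:
        desired_distance = min(desired_distance, restriction_max)
    return desired_distance

full_dpb = [Picture(2, used_for_reference=False), Picture(4),
            Picture(6), Picture(8)]
print(select_output_distance(full_dpb, current_poc=10,
                             desired_distance=9))  # → 7
```

Here the full DPB forces the distance down from the desired 9 to 7, so that the POC limit 10 - 7 = 3 lies above the unused picture with POC 2 and its frame buffer can be emptied.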
The encoder 70 of Fig. 15 with its included units 71-75 could be implemented in hardware. There are numerous variants of circuitry elements that can be used and combined to achieve the functions of the units 71-75 of the encoder 70. Such variants are encompassed by the embodiments. Particular examples of hardware implementation of the encoder 70 are implementation in digital signal processor (DSP) hardware and integrated circuit technology, including both general-purpose electronic circuitry and application-specific circuitry. The encoder 80 can also be implemented by means of a processor 82 and a memory 84 as illustrated in Fig. 16. Thus, in an embodiment the encoder 80 is implemented e.g. by one or more of a processor 82 and adequate software with suitable storage or memory 84 therefor, a programmable logic device (PLD) or other electronic component(s). In addition, the encoder 80 preferably comprises an input or input unit 81 configured to receive the pictures of the video stream. A corresponding output or output unit 83 is configured to output the encoded representations of the pictures, such as in the form of NAL units.
The encoder can be implemented in a device, such as a mobile device exemplified by mobile phones, tablets, video cameras, etc. Fig. 14 illustrates an example of such a device in the form of a transmitter 60, e.g. implemented in a video camera, e.g. in a mobile device. The transmitter 60 then comprises an input or input unit 61 configured to receive pictures of a video stream to be encoded. The pictures are encoded by the encoder 62 as disclosed herein. Encoded pictures are output from the transmitter 60 by an output or output unit 63 in the form of a coded bitstream, such as of NAL units or data packets carrying such NAL units as shown in Fig. 2.
A network node may use the embodiments. For instance, pictures are forwarded by the network node and temporal layer switches are performed at temporal layer switching points. According to the embodiments this picture forwarding can be performed by the network node without having to care about the DPB status in the decoder, i.e. without regard to which pictures have been output, for different temporal layers.
The present embodiments can be applied to different video codecs and different types of extensions, including, but not limited to, multi-view video codecs and scalable video codecs.
Temporal identifiers as discussed herein could, in alternative embodiments, be replaced by general layer identifiers that do not necessarily have to relate to different temporal layers. Such layer identifiers could, for instance, define various camera views, different scalability layers, spatial layers, etc. Furthermore, picture order count or POC is used herein as identifier of the pictures in the video stream, either in consistently increasing order relative to a latest IDR picture or by using POC wrap around. The embodiments are, however, not limited to using picture order count values as picture identifiers. In alternative embodiments other types of picture identifiers could be used instead of POC values. It is to be understood that the choice of interacting units or modules, as well as the naming of the units, is only for exemplary purposes, and may be configured in a plurality of alternative ways in order to be able to execute the disclosed process actions. The functions of the various elements including functional blocks may be provided through the use of hardware such as circuit hardware and/or hardware capable of executing software in the form of coded instructions stored on computer readable medium. Thus, such functions and illustrated functional blocks are to be understood as being either hardware-implemented and/or computer-implemented, and thus machine-implemented.
In terms of hardware implementation, the functional blocks may include or encompass, without limitation, digital signal processor (DSP) hardware, reduced instruction set processor, hardware (e.g., digital or analog) circuitry including but not limited to application specific integrated circuit(s) (ASIC), and (where appropriate) state machines capable of performing such functions.
The embodiments described above are to be understood as a few illustrative examples of the present invention. It will be understood by those skilled in the art that various modifications, combinations and changes may be made to the embodiments without departing from the scope of the present invention. In particular, different part solutions in the different embodiments can be combined in other configurations, where technically possible. The scope of the present invention is, however, defined by the appended claims.

Claims

1. A method of outputting decoded pictures (35) of a video stream (1) from a decoded picture buffer (34, 48) in a decoder (40, 50), said method comprising:
calculating (S1) a picture order count limit based on at least one syntax element retrieved based on an encoded representation (20) of a current picture (2) of said video stream (1); and
outputting (S2) decoded reference pictures (35) stored in said decoded picture buffer (34) and having a respective picture order count value lower than said picture order count limit.
2. The method according to claim 1, further comprising:
retrieving (S10) a first output flag based on said encoded representation (20);
retrieving (S12), if said first output flag has a first predefined value, an output distance syntax element based on said encoded representation (20); and
determining (S13) an output distance value based on said output distance syntax element, wherein calculating (S1) said picture order count limit comprises calculating (S1) said picture order count limit to be equal to a picture order count value of said current picture (2) subtracted by said output distance value.
3. The method according to claim 2, wherein determining (S13) said output distance value comprises:
determining (S13) said output distance value to be equal to half of a difference syntax element representing a largest picture order count difference if said output distance syntax element has a predefined value.
4. The method according to claim 2 or 3, wherein determining (S13) said output distance value comprises:
determining (S13) said output distance value to be equal to said output distance syntax element if said output distance syntax element has a value different from a predefined value.
5. The method according to any of the claims 1 to 4, further comprising:
retrieving (S10) a first output flag based on said encoded representation (20); and
determining (S14), if said first output flag has a second predefined value, an output distance value to be equal to zero, wherein calculating (S1) said picture order count limit comprises calculating (S1) said picture order count limit to be equal to a picture order count value of said current picture (2) subtracted by said output distance value.
6. The method according to any of the claims 2 to 5, further comprising:
retrieving (S20) a second output flag based on said encoded representation (20); and performing said retrieving (S10, S12) step if said second output flag has a predefined value.
7. The method according to any of the claims 1 to 6, wherein said decoded reference pictures (35) are stored in a respective frame buffer (49) of said decoded picture buffer (34, 48) and wherein
outputting (S2) said decoded reference pictures (35) comprises outputting (S2), in increasing order of picture order count values starting with a lowest picture order count value, decoded reference pictures (35) i) stored in said decoded picture buffer (34, 48), ii) having a respective picture order count value lower than said picture order count limit and iii) being marked as needed for output, said method further comprising:
marking (S30) said decoded reference pictures (35) that are output as not needed for output.
8. The method according to claim 7, further comprising:
emptying (S31) any frame buffer (49) of said decoded picture buffer (34, 48) storing a decoded reference picture (35) marked as not needed for output and unused for reference.
9. The method according to claim 8, further comprising:
decoding (S40) said encoded representation (20) to get a current decoded picture (2); and storing (S41) said current decoded picture (2) in an empty frame buffer (49) of said decoded picture buffer (34, 48).
10. A method of encoding a current picture (2) of a video stream (1) in an encoder (70, 80), said method comprising:
determining (S50) a picture order count limit to have a value enabling a target state of a decoded picture buffer (34, 48) in a decoder (40, 50) for said current picture (2), wherein said picture order count limit defines a number of decoded reference pictures (35) to be output from said decoded picture buffer (34, 48) in a picture output process for said current picture (2);
determining (S51) at least one syntax element representative of said value of said picture order count limit;
encoding (S52) said current picture (2) to get an encoded representation (20) of said current picture (2); and
associating (S53) said at least one syntax element with said encoded representation (20).
11. The method according to claim 10, wherein said decoded reference pictures (35) are stored in a respective frame buffer (49) in said decoded picture buffer (34, 48) and wherein determining (S50) said picture order count limit comprises determining (S50) said picture order count limit to have a value such that at least one frame buffer (49) in said decoded picture buffer (34, 48) is emptied from a decoded reference picture (35) marked as unused for reference if there are no empty frame buffers (49) in said decoded picture buffer (34, 48) prior to emptying said at least one frame buffer (49).
12. The method according to claim 10 or 11, wherein said decoded reference pictures (35) are stored in a respective frame buffer (49) in said decoded picture buffer (34, 48) and wherein determining
(S50) said picture order count limit comprises determining (S50) said picture order count limit to have a value defined based on a coding structure of said video stream (1) and picture order count values of future pictures of said video stream (1) if there is at least one empty frame buffer (49) in said decoded picture buffer (34, 48) for said current picture (2).
13. The method according to any of the claims 10 to 12, wherein each picture (2) of said video stream (1) has a respective picture order count value and a respective temporal identifier, said method further comprising:
comparing (S60) a value X equal to a highest picture order count value of all decoded pictures with temporal identifier lower than or equal to a temporal identifier of said current picture (2) that have been output prior to invoking said picture output process for said current picture (2) with a value X' equal to a highest picture order count value of all decoded pictures with temporal identifier lower than or equal to said temporal identifier of said current picture (2) that have been output after invoking a picture output process for a previous picture that is previous to said current picture (2) according to a decoding order of said video stream (1) and has a temporal identifier lower than or equal to said temporal identifier of said current picture (2); and
setting (S61) a new value of said at least one syntax element that is smaller than a picture order count value of said current picture (2) subtracted by said value X if said value X is different from said value X'.
14. A decoder (40, 50) comprising:
a decoded picture buffer (34, 48) configured to store decoded pictures (35) of a video stream (1); a limit calculator (41) configured to calculate a picture order count limit based on at least one syntax element retrieved based on an encoded representation (20) of a current picture (2) of said video stream (1); and
a picture outputting unit (42) configured to output decoded reference pictures (35) stored in said decoded picture buffer (34, 48) and having a respective picture order count value lower than said picture order count limit.
15. The decoder according to claim 14, further comprising:
a flag retriever (43) configured to retrieve a first output flag based on said encoded representation (20);
an element retriever (44) configured to retrieve, if said first output flag has a first predefined value, an output distance syntax element based on said encoded representation (20); and
a value determiner (45) configured to determine an output distance value based on said output distance syntax element, wherein said limit calculator (41) is configured to calculate said picture order count limit to be equal to a picture order count value of said current picture (2) subtracted by said output distance value.
16. The decoder according to claim 15, wherein said value determiner (45) is configured to determine said output distance value to be equal to half of a difference syntax element representing a largest picture order count difference if said output distance syntax element has a predefined value.
17. The decoder according to claim 15 or 16, wherein said value determiner (45) is configured to determine said output distance value to be equal to said output distance syntax element if said output distance syntax element has a value different from a predefined value.
18. The decoder according to any of the claims 14 to 17, further comprising:
a flag retriever (43) configured to retrieve a first output flag based on said encoded representation (20); and
a value determiner (45) configured to determine, if said first output flag has a second predefined value, an output distance value to be equal to zero, wherein said limit calculator (41) is configured to calculate said picture order count limit to be equal to a picture order count value of said current picture (2) subtracted by said output distance value.
19. The decoder according to any of the claims 15 to 18, wherein said flag retriever (43) is configured to retrieve a second output flag based on said encoded representation (20); and
said flag retriever (43) is configured to retrieve said first output flag if said second output flag has a predefined value.
20. The decoder according to any of the claims 14 to 19, wherein said decoded reference pictures (35) are stored in a respective frame buffer (49) of said decoded picture buffer (34, 48) and wherein said picture outputting unit (42) is configured to output, in increasing order of picture order count values starting with a lowest picture order count value, decoded reference pictures (35) i) stored in said decoded picture buffer (34, 48), ii) having a respective picture order count value lower than said picture order count limit and iii) being marked as needed for output, said decoder (40, 50) further comprises: a picture marking unit (46) configured to mark said decoded reference pictures (35) that are output as not needed for output.
21. The decoder according to claim 20, wherein said picture outputting unit (42) is configured to empty any frame buffer (49) of said decoded picture buffer (34, 48) storing a decoded reference picture (35) marked as not needed for output and unused for reference.
22. The decoder according to claim 21, further comprising:
a decoding unit (47) configured to decode said encoded representation (20) to get a current decoded picture (2), wherein
said decoded picture buffer (34, 48) is configured to store said current decoded picture (2) in an empty frame buffer (49).
23. A receiver (30) comprising a decoder (32, 40, 50) comprising:
a decoded picture buffer (34, 48) configured to store decoded pictures (35) of a video stream (1); a limit calculator (41) configured to calculate a picture order count limit based on at least one syntax element retrieved based on an encoded representation (20) of a current picture (2) of said video stream (1); and
a picture outputting unit (42) configured to output decoded reference pictures (35) stored in said decoded picture buffer (34, 48) and having a respective picture order count value lower than said picture order count limit.
24. An encoder (70, 80) comprising: a limit determiner (71) configured to determine a picture order count limit to have a value enabling a target state of a decoded picture buffer (34, 48) in a decoder (40, 50) for a current picture (2) of a video stream (1), wherein said picture order count limit defines a number of decoded reference pictures (35) to be output from said decoded picture buffer (34, 48) in a picture output process for said current picture (2);
a syntax element determiner (72) configured to determine at least one syntax element representative of said value of said picture order count limit;
an encoding unit (73) configured to encode said current picture (2) to get an encoded representation (20) of said current picture (2); and
an associating unit (74) configured to associate said at least one syntax element with said encoded representation (20).
25. The encoder according to claim 24, wherein
said decoded reference pictures (35) are stored in a respective frame buffer (49) in said decoded picture buffer (34, 48), and
said limit determiner (71) is configured to determine said picture order count limit to have a value such that at least one frame buffer (49) in said decoded picture buffer (34, 48) is emptied from a decoded reference picture (35) marked as unused for reference if there are no empty frame buffers (49) in said decoded picture buffer (34, 48) prior to emptying said at least one frame buffer (49).
26. The encoder according to claim 24 or 25, wherein
said decoded reference pictures (35) are stored in a respective frame buffer (49) in said decoded picture buffer (34, 48), and
said limit determiner (71) is configured to determine said picture order count limit to have a value defined based on a coding structure of said video stream (1) and picture order count values of future pictures of said video stream (1) if there is at least one empty frame buffer (49) in said decoded picture buffer (34, 48) for said current picture (2).
27. The encoder according to any of the claims 24 to 26, wherein each picture of said video stream (1) has a respective picture order count value and a respective temporal identifier, said encoder (70,
80) further comprises:
a comparator (75) configured to compare a value X equal to a highest picture order count value of all decoded pictures with temporal identifier lower than or equal to a temporal identifier of said current picture (2) that have been output prior to invoking said picture output process for said current picture (2) with a value X' equal to a highest picture order count value of all decoded pictures with temporal identifier lower than or equal to said temporal identifier of said current picture (2) that have been output after invoking a picture output process for a previous picture that is previous to said current picture (2) according to a decoding order of said video stream (1) and has a temporal identifier lower than or equal to said temporal identifier of said current picture (2), wherein
said syntax element determiner (72) is configured to set a new value of said at least one syntax element that is smaller than a picture order count value of said current picture (2) subtracted by said value X if said value X is different from said value X'.
28. A transmitter (60) comprising an encoder (62, 70, 80) comprising:
a limit determiner (71) configured to determine a picture order count limit to have a value enabling a target state of a decoded picture buffer (34, 48) in a decoder (40, 50) for a current picture (2) of a video stream (1), wherein said picture order count limit defines a number of decoded reference pictures (35) to be output from said decoded picture buffer (34, 48) in a picture output process for said current picture (2);
a syntax element determiner (72) configured to determine at least one syntax element representative of said value of said picture order count limit;
an encoding unit (73) configured to encode said current picture (2) to get an encoded representation (20) of said current picture (2); and
an associating unit (74) configured to associate said at least one syntax element with said encoded representation (20).
PCT/SE2012/051372 2012-01-20 2012-12-11 Output of decoded reference pictures WO2013109179A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP12818654.1A EP2805490A1 (en) 2012-01-20 2012-12-11 Output of decoded reference pictures

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261588764P 2012-01-20 2012-01-20
US61/588,764 2012-01-20

Publications (1)

Publication Number Publication Date
WO2013109179A1 true WO2013109179A1 (en) 2013-07-25

Family

ID=47604013

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE2012/051372 WO2013109179A1 (en) 2012-01-20 2012-12-11 Output of decoded reference pictures

Country Status (2)

Country Link
EP (1) EP2805490A1 (en)
WO (1) WO2013109179A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015056179A1 (en) * 2013-10-15 2015-04-23 Nokia Technologies Oy Video encoding and decoding using syntax element
TWI645721B (en) * 2013-10-23 2018-12-21 高通公司 Multi-layer video file format designs
CN113545045A (en) * 2019-02-01 2021-10-22 弗劳恩霍夫应用研究促进协会 Video codec allowing random access according to sub-picture or region and video composition concept using the same

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
KADONO: "Err. Additional Problem Reports for Final Draft", 8. JVT MEETING; 23-05-2003 - 27-05-2003; GENEVA, CH; (JOINT VIDEO TEAMOF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ),, no. JVT-H025withNotes, 27 May 2003 (2003-05-27), XP030005729, ISSN: 0000-0426 *
MISRA K ET AL: "AHG21: Long term picture referencing using wrapped POC", 7. JCT-VC MEETING; 98. MPEG MEETING; 21-11-2011 - 30-11-2011; GENEVA; (JOINT COLLABORATIVE TEAM ON VIDEO CODING OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ); URL: HTTP://WFTP3.ITU.INT/AV-ARCH/JCTVC-SITE/,, no. JCTVC-G713, 9 November 2011 (2011-11-09), XP030110697 *
SAMUELSSON J ET AL: "AHG15: Syntax controlled output process", 99. MPEG MEETING; 6-2-2012 - 10-2-2012; SAN JOSÉ; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11),, no. m23448, 21 January 2012 (2012-01-21), XP030051973 *
SAMUELSSON J ET AL: "Reducing output delay for bumping process", 7. JCT-VC MEETING; 98. MPEG MEETING; 21-11-2011 - 30-11-2011; GENEVA; (JOINT COLLABORATIVE TEAM ON VIDEO CODING OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ); URL: HTTP://WFTP3.ITU.INT/AV-ARCH/JCTVC-SITE/,, no. JCTVC-G583, 8 November 2011 (2011-11-08), XP030110567 *
SJÖBERG R ET AL: "Absolute signaling of reference pictures", 6. JCT-VC MEETING; 97. MPEG MEETING; 14-7-2011 - 22-7-2011; TORINO; (JOINT COLLABORATIVE TEAM ON VIDEO CODING OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ); URL: HTTP://WFTP3.ITU.INT/AV-ARCH/JCTVC-SITE/,, no. JCTVC-F493, 22 July 2011 (2011-07-22), XP030009516 *
WIEGAND T ET AL: "WD3: Working Draft 3 of High-Efficiency Video Coding", 20110329, no. JCTVC-E603, 29 March 2011 (2011-03-29), XP030009014, ISSN: 0000-0003 *
Y-K WANG ET AL: "MVC HRD and bitstream restriction", 27. JVT MEETING; 6-4-2008 - 10-4-2008; GENEVA, ; (JOINT VIDEO TEAM OFISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ),, no. JVT-AA020, 28 April 2008 (2008-04-28), XP030007363, ISSN: 0000-0091 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015056179A1 (en) * 2013-10-15 2015-04-23 Nokia Technologies Oy Video encoding and decoding using syntax element
US9641862B2 (en) 2013-10-15 2017-05-02 Nokia Technologies Oy Video encoding and decoding
US10205965B2 (en) 2013-10-15 2019-02-12 Nokia Technologies Oy Video encoding and decoding
TWI645721B (en) * 2013-10-23 2018-12-21 高通公司 Multi-layer video file format designs
CN113545045A (en) * 2019-02-01 2021-10-22 弗劳恩霍夫应用研究促进协会 Video codec allowing random access according to sub-picture or region and video composition concept using the same

Also Published As

Publication number Publication date
EP2805490A1 (en) 2014-11-26

Similar Documents

Publication Publication Date Title
US11849144B2 (en) Signaling of state information for a decoded picture buffer and reference picture lists
US10951899B2 (en) Extension data handling
US10057571B2 (en) Reference picture list handling
RU2581566C2 (en) Reference picture signaling
US9774927B2 (en) Multi-layer video stream decoding
US20200413042A1 (en) Multi-Layer Video Stream Encoding and Decoding
WO2013109179A1 (en) Output of decoded reference pictures
EP2936809B1 (en) Multi-layer video stream decoding
US20140233653A1 (en) Decoder and encoder for picture outputting and methods thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12818654

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

REEP Request for entry into the european phase

Ref document number: 2012818654

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2012818654

Country of ref document: EP