US20130089154A1 - Adaptive frame size support in advanced video codecs - Google Patents
- Publication number: US20130089154A1 (application US13/648,174)
- Authority: United States
- Prior art keywords: sequence, sub, parameter set, resolution, video
- Legal status: Abandoned (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
All under H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals:
- H04N19/573—Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
- H04N19/423—Implementation details or hardware specially adapted for video compression or decompression, characterised by memory arrangements
- H04N19/46—Embedding additional information in the video signal during the compression process
- H04N19/70—Syntax aspects related to video coding, e.g. related to compression standards
- H04N19/33—Hierarchical techniques, e.g. scalability, in the spatial domain
- H04N19/58—Motion compensation with long-term prediction, i.e. the reference frame for a current frame not being the temporally closest one
- H04N19/61—Transform coding in combination with predictive coding
Description
- This disclosure relates to video coding and, more particularly, to techniques for coding video data.
- Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called “smart phones,” video teleconferencing devices, video streaming devices, and the like.
- Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standard presently under development, and extensions of such standards.
- The video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video compression techniques.
- Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences.
- For block-based video coding, a video slice (i.e., a video picture or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as treeblocks, coding tree blocks (CTBs), coding tree units (CTUs), coding units (CUs), and/or coding nodes.
- Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture.
- Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures.
- Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.
- Residual data represents pixel differences between the original block to be coded and the predictive block.
- An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and the residual data indicating the difference between the coded block and the predictive block.
- An intra-coded block is encoded according to an intra-coding mode and the residual data.
- The residual data may be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients, which then may be quantized.
- The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned in order to produce a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.
- This disclosure describes techniques for coding video sequences that include frames, or “pictures,” having different spatial resolutions.
- One aspect of this disclosure includes using multiple sequence parameter sets in a single resolution-adaptive coded video sequence to indicate a resolution of a sequence of pictures in coded video.
- The resolution-adaptive coded video sequence may comprise two or more coded sub-sequences, wherein each sub-sequence may comprise a set of pictures with a common spatial resolution, and may refer to a same active sequence parameter set.
- Another aspect of this disclosure includes a novel activation process for activating a sequence parameter set when using multiple sequence parameter sets in a single resolution-adaptive coded video sequence, as described above.
- A size of a decoded picture buffer (DPB) is not indicated using a number of frame buffers (e.g., a number of storage locations each capable of storing a frame, or “picture,” of a fixed size), consistent with some techniques, but rather using a different unit of size.
- The availability of the DPB to store a decoded picture is determined based on a spatial resolution of the decoded picture to be inserted, so as to ensure that the DPB includes sufficient empty buffer space for inserting the decoded picture.
- The availability of the DPB to store a subsequent decoded picture is determined based on a spatial resolution of a removed decoded picture, and a spatial resolution of the subsequent decoded picture to be inserted into the DPB.
- The proportion of the DPB unavailable to store decoded pictures, or a “fullness” of the DPB, after removing a decoded picture is not decreased by an amount corresponding to a single decoded picture of a fixed size, consistent with some techniques, but rather by a varying amount, depending on the spatial resolution of the removed decoded picture.
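- The buffer accounting described above can be sketched as follows. This is an illustrative model only, not the patent's normative process: it assumes DPB capacity is measured in total luma samples, and all class and method names are hypothetical.

```python
# Sketch (illustrative assumption): a DPB whose capacity and fullness are
# tracked in total luma samples rather than in fixed-size frame buffers,
# so pictures of different resolutions consume different amounts of space.

class SampleBudgetDPB:
    def __init__(self, capacity_samples):
        self.capacity = capacity_samples   # DPB size in luma samples
        self.fullness = 0                  # samples currently occupied
        self.pictures = []                 # (poc, width, height)

    def can_insert(self, width, height):
        # availability depends on the incoming picture's resolution
        return self.fullness + width * height <= self.capacity

    def insert(self, poc, width, height):
        if not self.can_insert(width, height):
            raise MemoryError("insufficient DPB space")
        self.pictures.append((poc, width, height))
        self.fullness += width * height

    def remove(self, poc):
        for pic in self.pictures:
            if pic[0] == poc:
                self.pictures.remove(pic)
                # fullness drops by a resolution-dependent amount,
                # not by one fixed frame buffer
                self.fullness -= pic[1] * pic[2]
                return
        raise KeyError(poc)
```

For example, a DPB sized for two 1080p pictures cannot hold a 1080p and a 720p picture plus another 1080p picture, but removing the 720p picture frees exactly 1280×720 samples, which is enough for the second 1080p insertion.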
- FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may utilize techniques described in this disclosure.
- FIG. 2 is a block diagram illustrating an example video encoder that may implement the techniques described in this disclosure.
- FIG. 3 is a block diagram illustrating an example video decoder that may implement the techniques described in this disclosure.
- FIGS. 4A-4D are conceptual diagrams illustrating an example video sequence that includes a plurality of pictures that are encoded and transmitted in accordance with the techniques of this disclosure.
- FIG. 5 is a conceptual diagram illustrating the operation of a decoded picture buffer of a hypothetical reference decoder (HRD) model in accordance with the techniques of this disclosure.
- FIG. 6 is a flowchart illustrating an example operation of using a first sub-sequence and a second sub-sequence to decode video in accordance with the techniques of this disclosure.
- FIG. 7 is a flowchart illustrating an example operation of managing a decoded picture buffer in accordance with the techniques of this disclosure.
- The techniques of this disclosure generally relate to using multiple sequence parameter sets (SPSs) for communicating video data at different resolutions, and to managing the multiple SPSs.
- Additional syntax information for the CVS, also signaled in the SPS, includes the Largest Coding Unit (LCU) size and the Smallest Coding Unit (SCU) size, which define a largest and a smallest block, or coding unit, size for each picture, respectively.
- A CVS may refer to a sequence of coded pictures starting from an instantaneous decoding refresh (IDR) picture to another IDR picture, exclusive, in decoding order, or to the end of the coded video bitstream if the starting IDR picture is the last IDR picture in the coded video bitstream.
- HEVC may support resolution-adaptive video sequences that include frames with different resolutions.
- One method for adaptive frame size support is described in JCTVC-F158: Resolution switching for coding efficiency and resilience, Davies, 6th Meeting, Turin, IT, 14-22 Jul. 2011, referred to as JCTVC-F158 hereinafter.
- Each SPS of the multiple SPSs may include information related to a sequence of pictures that has a different resolution.
- This disclosure also introduces a new sequence, referred to as a resolution sub-sequence (RSS), that may refer back to one of the multiple SPSs in order to indicate the resolution of a sequence of pictures.
- This disclosure also describes techniques for activating a single SPS when multiple parameter sets may be utilized within a single CVS, as well as different techniques and orders for transmitting the different SPSs.
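- The idea of keeping multiple SPSs available within one CVS, with each sub-sequence referring back to the SPS that carries its resolution, can be sketched as follows. This is a simplified illustration; the field names and the activation rule shown are assumptions, not HEVC syntax.

```python
# Illustrative sketch: several SPSs are retained for one coded video
# sequence, and the SPS a picture refers to is activated when it changes,
# i.e., at the boundary of a sub-sequence with a new resolution.

class SPS:
    def __init__(self, sps_id, width, height, lcu_size):
        self.sps_id = sps_id        # identifier referenced by pictures
        self.width = width          # picture width for this sub-sequence
        self.height = height        # picture height for this sub-sequence
        self.lcu_size = lcu_size    # largest coding unit size

class Decoder:
    def __init__(self):
        self.sps_by_id = {}         # all SPSs received for the CVS
        self.active_sps = None

    def receive_sps(self, sps):
        self.sps_by_id[sps.sps_id] = sps

    def decode_picture(self, referenced_sps_id):
        sps = self.sps_by_id[referenced_sps_id]
        if sps is not self.active_sps:
            # activation: a new sub-sequence (new resolution) begins
            self.active_sps = sps
        return (self.active_sps.width, self.active_sps.height)
```

Under this sketch, a bitstream can switch between 1080p and 720p sub-sequences simply by referencing a different stored SPS, without starting a new CVS.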
- A video coder (e.g., a video encoder or a video decoder) includes a decoded picture buffer (DPB).
- the DPB stores decoded pictures, including reference pictures. Reference pictures are pictures that can potentially be used for inter-predicting a picture.
- The video coder may predict a picture, during coding (encoding or decoding) of that picture, based on one or more reference pictures stored in the DPB.
- DPB management processes, including a storage process of decoded pictures into the DPB, a marking process of reference pictures, and output and removal processes of decoded pictures from the DPB, are specified.
- DPB management includes at least the following aspects: (1) Picture identification and reference picture identification; (2) Reference picture list construction; (3) Reference picture marking; (4) Picture output from the DPB; (5) Picture insertion into the DPB; and (6) Picture removal from the DPB.
- Each CVS may include a number of reference pictures, which may be used to predict pixel values of other pictures (e.g., pictures that come before or after the reference picture).
- A video coder marks each reference picture, and stores the reference picture in the DPB.
- The DPB includes a maximum number, referred to as M (num_ref_frames), of reference pictures used for inter-prediction, as indicated in the active sequence parameter set.
- When a reference picture is decoded, it is marked as “used for reference.” If the decoding of the reference picture caused more than M pictures to be marked as “used for reference,” at least one picture must be marked as “unused for reference.” The DPB removal process then removes pictures marked as “unused for reference” from the DPB if they are not needed for output as well.
- When a picture is decoded, the decoded picture may be either a non-reference picture or a reference picture.
- A reference picture may be a long-term reference picture or a short-term reference picture, and when a decoded picture is marked as “unused for reference,” the decoded picture may become no longer needed for reference.
- The operation mode for reference picture marking may be selected on a picture basis, whereas the sliding window operation mode works as a first-in-first-out queue with a fixed number of short-term reference pictures. In other words, the short-term reference pictures with the earliest decoding times are the first to be removed (marked as pictures not used for reference), in an implicit fashion.
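- The sliding window operation mode described above can be sketched as a bounded first-in-first-out queue. This is an illustrative simplification; function and variable names are hypothetical.

```python
# Sketch of sliding-window reference picture marking: at most M
# (num_ref_frames) short-term reference pictures are kept, and the one
# with the earliest decoding time is implicitly marked "unused for
# reference" first, in first-in-first-out order.

from collections import deque

def mark_sliding_window(short_term_refs, new_ref, m):
    """short_term_refs: deque of picture ids in decoding order."""
    unused = []
    short_term_refs.append(new_ref)               # newly decoded reference
    while len(short_term_refs) > m:               # more than M marked?
        unused.append(short_term_refs.popleft())  # oldest is removed first
    return unused  # pictures now marked "unused for reference"
```

With M = 3, decoding references 0, 1, 2, 3 in order leaves {1, 2, 3} marked as used for reference and implicitly marks picture 0 as unused.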
- The video coder may also be tasked with constructing reference picture lists that indicate which reference pictures may be used for inter-prediction purposes. Two of these reference picture lists are referred to as List 0 and List 1, respectively.
- The video coder first employs default construction techniques to construct List 0 and List 1 (e.g., preconfigured construction schemes for constructing List 0 and List 1).
- The video decoder may decode syntax elements, when present, that instruct the video decoder to modify the initial List 0 and List 1.
- The video encoder may signal syntax elements that are indicative of identifier(s) of reference pictures in the DPB, and the video encoder may also signal syntax elements that include indices, within List 0, List 1, or both List 0 and List 1, that indicate which reference picture or pictures to use to decode a coded block of a current picture.
- The video decoder uses the received identifier to identify the index value or values for a reference picture or reference pictures listed in List 0, List 1, or both List 0 and List 1.
- The video decoder retrieves the reference picture or reference pictures, or part(s) thereof, from the DPB, and decodes the coded block of the current picture based on the retrieved reference picture or pictures and one or more motion vectors that identify blocks within the reference picture or pictures that are used for decoding the coded block.
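- The default list construction and index-based lookup described above can be sketched as follows. This is a deliberately simplified illustration for a P slice: it orders past short-term references nearest-first and omits long-term pictures, list modification, and List 1.

```python
# Sketch (simplified assumption): default List 0 for a P slice orders the
# short-term reference pictures that precede the current picture so that
# the temporally closest one sits at index 0.  The decoder then uses a
# signaled index (a ref_idx-style syntax element) to select a reference.

def init_list0(dpb_ref_pocs, current_poc):
    past = [p for p in dpb_ref_pocs if p < current_poc]
    return sorted(past, reverse=True)   # nearest past picture at index 0

def pick_reference(list0, ref_idx):
    # the decoder maps the signaled index to a reference picture in List 0
    return list0[ref_idx]
```

For example, with reference pictures at picture order counts 0, 2, 4, and 8 and a current picture at 6, List 0 becomes [4, 2, 0], so index 0 selects the closest past reference.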
- A coded video sequence refers to a sequence of coded frames, or “pictures,” ranging from an instantaneous decoding refresh (IDR) picture to another IDR picture, exclusive, in decoding order, or to an end of a coded video bitstream if the starting IDR picture is the last IDR picture in the coded video bitstream.
- A sub-sequence of pictures with one resolution may have different coding parameters, such as an LCU size, than another sub-sequence of pictures with another, different resolution. Accordingly, it may not be sufficient to use a single active SPS to describe characteristics of a CVS comprising sub-sequences of pictures with different resolutions.
- Different sub-sequences of a CVS may have reference pictures having different sizes, that is, different spatial resolutions.
- One particular parameter included in an SPS for the CVS, e.g., max_num_ref_frames, may be optimal for one sub-sequence, but sub-optimal for other sub-sequences included in the CVS.
- Some techniques for DPB management may no longer be effective when coding a single CVS that includes pictures having different resolutions.
- For example, a size of a DPB used to store the pictures can no longer be indicated using a number of frame buffers, e.g., a number of storage locations each capable of storing a frame, or “picture,” of a fixed size.
- Under such techniques, to store a decoded picture, the DPB must include an empty frame buffer of a size that is sufficiently large to store the decoded picture.
- A frame buffer of a fixed size may not correspond to a size of a particular decoded picture to be inserted. Accordingly, merely determining whether the DPB includes an empty frame buffer of a fixed size may be insufficient to determine whether the DPB is available to store the decoded picture. As one example, the DPB may have less buffer space than is required to store the decoded picture.
- For example, the decoded picture may have a resolution that corresponds to a size that is different than the size of the frame buffer.
- Similarly, determining that a decoded picture has been removed from the DPB may be insufficient to determine whether the DPB is actually available to store a subsequent decoded picture having a particular resolution.
- The above determination is also insufficient to indicate the actual buffer space that may be available within the DPB for storing additional decoded pictures.
- For example, a single empty frame buffer of a fixed size may exist within the DPB, and the DPB may store decoded picture(s) having a particular resolution in the frame buffer.
- If a video coder removes a decoded picture from the DPB, and the removed picture has a resolution that is smaller than the size of the frame buffer, sufficient buffer space may exist within the DPB to insert a decoded picture with a resolution that corresponds to a size that is larger than the size of the removed decoded picture. Accordingly, merely determining that a particular decoded picture has been removed from the DPB may be insufficient to indicate the actual buffer space that may be available within the DPB for storing additional decoded pictures.
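- The contrast drawn above, between a check that only counts empty fixed-size frame buffers and a check that accounts for the incoming picture's resolution, can be sketched as follows. Both functions and the sample-based accounting are illustrative assumptions, not the patent's normative rules.

```python
# Sketch contrasting the two DPB availability checks the text describes.

def fixed_buffer_check(empty_slots):
    # legacy rule: any empty fixed-size slot is assumed big enough
    return empty_slots > 0

def size_aware_check(free_samples, width, height):
    # resolution-adaptive rule: the picture must actually fit in the
    # remaining buffer space, measured here in luma samples
    return width * height <= free_samples
```

For instance, after removing a 720p picture the fixed-buffer check reports one free slot, yet a 1080p picture does not fit in the freed 1280×720 samples; the size-aware check catches this.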
- FIG. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize techniques described in this disclosure.
- A reference picture set is defined as a set of reference pictures associated with a picture, consisting of all reference pictures that are prior to the associated picture in decoding order, that may be used for inter-prediction of the associated picture or any picture following the associated picture in decoding order.
- The reference pictures that are prior to the associated picture may be reference pictures until the next instantaneous decoding refresh (IDR) picture or broken link access (BLA) picture.
- Reference pictures in the reference picture set may all be prior to the current picture in decoding order.
- The reference pictures in the reference picture set may be used for inter-predicting the current picture and/or inter-predicting any picture following the current picture in decoding order, until the next IDR picture or BLA picture.
- Some of the reference pictures in the reference picture set can potentially be used to inter-predict a block of the current picture, but not blocks of pictures following the current picture in decoding order.
- Some of the reference pictures in the reference picture set can potentially be used to inter-predict a block of the current picture, and blocks in one or more pictures following the current picture in decoding order.
- Some of the reference pictures in the reference picture set can potentially be used to inter-predict blocks in one or more pictures following the current picture in decoding order, but cannot be used to inter-predict a block in the current picture.
- Reference pictures that can potentially be used for inter-prediction are reference pictures that can be used for inter-prediction, but do not necessarily have to be used.
- In other words, the reference picture set may identify reference pictures that can potentially be used for inter-prediction. However, this does not mean that all of the identified reference pictures must be used for inter-prediction; rather, one or more of these identified reference pictures could be used, but all of them do not necessarily have to be.
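- The roles distinguished above can be sketched by partitioning a reference picture set into pictures usable for the current picture and pictures retained only for later pictures in decoding order. The `usable_for_current` flag is an illustrative stand-in for the actual signaling.

```python
# Sketch: split a reference picture set into the two roles the text
# distinguishes.  Each entry pairs a picture order count with a flag
# saying whether the picture may predict the current picture.

def split_reference_picture_set(rps):
    """rps: list of (poc, usable_for_current) pairs."""
    curr = [poc for poc, used in rps if used]      # may predict current picture
    foll = [poc for poc, used in rps if not used]  # kept only for later pictures
    return curr, foll
```

Pictures in the second list must stay in the DPB even though the current picture never references them, because pictures that follow in decoding order may.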
- System 10 includes a source device 12 that generates encoded video for decoding by a destination device 14.
- Source device 12 and destination device 14 may each be an example of a video coding device.
- Source device 12 may transmit the encoded video to destination device 14 via communication channel 16 or may store the encoded video on a storage medium 17 or a file server 19 , such that the encoded video may be accessed by the destination device 14 as desired.
- Source device 12 and destination device 14 may comprise any of a wide range of devices, including wireless handsets such as so-called “smart” phones, so-called “smart” pads, or other such wireless devices equipped for wireless communication. Additional examples of source device 12 and destination device 14 include, but are not limited to, a digital television, a device in a digital direct broadcast system, a device in a wireless broadcast system, a personal digital assistant (PDA), a laptop computer, a desktop computer, a tablet computer, an e-book reader, a digital camera, a digital recording device, a digital media player, a video gaming device, a video game console, a cellular radio telephone, a satellite radio telephone, a video teleconferencing device, a video streaming device, a wireless communication device, or the like.
- Source device 12 and/or destination device 14 may be equipped for wireless communication.
- Communication channel 16 may comprise a wireless channel, a wired channel, or a combination of wireless and wired channels suitable for transmission of encoded video data.
- The file server 19 may be accessed by the destination device 14 through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server.
- System 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
- Source device 12 includes a video source 18, a video encoder 20, a modulator/demodulator (modem) 22, and an output interface 24.
- Video source 18 may include a source such as a video capture device (e.g., a video camera), a video archive containing previously captured video, a video feed interface to receive video from a video content provider, and/or a computer graphics system for generating computer graphics data as the source video, or a combination of such sources.
- Source device 12 and destination device 14 may form so-called camera phones or video phones.
- The techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications.
- The captured, pre-captured, or computer-generated video may be encoded by video encoder 20.
- The encoded video information may be modulated by modem 22 according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14 via output interface 24.
- Modem 22 may include various mixers, filters, amplifiers or other components designed for signal modulation.
- Output interface 24 may include circuits designed for transmitting data, including amplifiers, filters, and one or more antennas.
- The captured, pre-captured, or computer-generated video that is encoded by the video encoder 20 may also be stored onto a storage medium 17 or a file server 19 for later consumption.
- The storage medium 17 may include Blu-ray discs, DVDs, CD-ROMs, flash memory, or any other suitable digital storage media for storing encoded video.
- The encoded video stored on the storage medium 17 may then be accessed by destination device 14 for decoding and playback.
- File server 19 may be any type of server capable of storing encoded video and transmitting that encoded video to the destination device 14 .
- Example file servers include a web server (e.g., for a website), an FTP server, network attached storage (NAS) devices, a local disk drive, or any other type of device capable of storing encoded video data and transmitting it to a destination device.
- The transmission of encoded video data from the file server 19 may be a streaming transmission, a download transmission, or a combination of both.
- The file server 19 may be accessed by the destination device 14 through any standard data connection, including an Internet connection.
- This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, Ethernet, USB, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server.
- Destination device 14, in the example of FIG. 1, includes an input interface 26, a modem 28, a video decoder 30, and a display device 32.
- Input interface 26 of destination device 14 receives information over channel 16, as one example, or from storage medium 17 or file server 19, as alternate examples, and modem 28 demodulates the information to produce a demodulated bitstream for video decoder 30.
- The demodulated bitstream may include a variety of syntax information generated by video encoder 20 for use by video decoder 30 in decoding video data. Such syntax may also be included with the encoded video data stored on a storage medium 17 or a file server 19.
- The syntax may be embedded with the encoded video data, although aspects of this disclosure should not be considered limited to such a requirement.
- The syntax information defined by video encoder 20 may include syntax elements that describe characteristics and/or processing of video blocks, such as coding tree units (CTUs), coding tree blocks (CTBs), prediction units (PUs), coding units (CUs), or other units of coded video, e.g., video slices, video pictures, and video sequences or groups of pictures (GOPs).
- Each of video encoder 20 and video decoder 30 may form part of a respective encoder-decoder (CODEC) that is capable of encoding or decoding video data.
- Display device 32 may be integrated with, or external to, destination device 14 .
- Destination device 14 may include an integrated display device and also be configured to interface with an external display device.
- Alternatively, destination device 14 may be a display device.
- In general, display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.
- Communication channel 16 may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines, or any combination of wireless and wired media.
- Communication channel 16 may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet.
- Communication channel 16 generally represents any suitable communication medium, or collection of different communication media, for transmitting video data from source device 12 to destination device 14 , including any suitable combination of wired or wireless media.
- Communication channel 16 may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14 .
- Video encoder 20 and video decoder 30 may operate according to a video compression standard, such as ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions.
- Video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams.
- MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).
- Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more processors including microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof.
- A device may store instructions for the software in a suitable, non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure.
- Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.
- video encoder 20 and video decoder 30 may be commonly referred to as a video coder that codes information (e.g., pictures and syntax elements).
- the coding of information may refer to encoding when the video coder corresponds to video encoder 20 .
- the coding of information may refer to decoding when the video coder corresponds to video decoder 30 .
- FIG. 2 is a block diagram illustrating an example video encoder 20 that may implement the techniques described in this disclosure.
- Video encoder 20 may perform intra- and inter-coding of video blocks within video slices.
- Intra coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture.
- Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence.
- Intra-mode may refer to any of several spatial based compression modes.
- Inter-modes such as uni-directional prediction (P mode) or bi-prediction (B mode), may refer to any of several temporal-based compression modes.
- video encoder 20 includes a partitioning unit 35 , prediction processing unit 41 , summer 50 , transform processing unit 52 , quantization unit 54 , entropy encoding unit 56 , decoded picture buffer (DPB) 64 , and DPB management unit 65 .
- Prediction processing unit 41 includes motion estimation unit 42 , motion compensation unit 44 , and intra prediction unit 46 .
- video encoder 20 also includes inverse quantization unit 58 , inverse transform unit 60 , and summer 62 .
- a deblocking filter (not shown in FIG. 2 ) may also be included to filter block boundaries to remove blockiness artifacts from reconstructed video. If desired, the deblocking filter would typically filter the output of summer 62 . Additional loop filters (in loop or post loop) may also be used in addition to the deblocking filter.
- video encoder 20 receives video data, and partitioning unit 35 partitions the data into video blocks. This partitioning may also include partitioning into slices, tiles, or other larger units, as well as video block partitioning, e.g., according to a quadtree structure of LCUs and CUs.
- Video encoder 20 generally illustrates the components that encode video blocks within a video slice to be encoded. The slice may be divided into multiple video blocks (and possibly into sets of video blocks referred to as tiles).
- Prediction processing unit 41 may select one of a plurality of possible coding modes, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes, for the current video block based on error results (e.g., coding rate and the level of distortion). Prediction processing unit 41 may provide the resulting intra- or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use as a reference picture.
- Intra prediction unit 46 within prediction processing unit 41 may perform intra-predictive coding of the current video block relative to one or more neighboring blocks in the same picture or slice as the current block to be coded to provide spatial compression.
- Motion estimation unit 42 and motion compensation unit 44 within prediction processing unit 41 perform inter-predictive coding of the current video block relative to one or more predictive blocks in one or more reference pictures to provide temporal compression.
- Motion estimation unit 42 may be configured to determine the inter-prediction mode for a video slice according to a predetermined pattern for a video sequence.
- the predetermined pattern may designate video slices in the sequence as P slices or B slices.
- Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes.
- Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate motion for video blocks.
- A motion vector, for example, may indicate the displacement of a PU of a video block within a current video picture relative to a predictive block within a reference picture.
- a predictive block is a block that is found to closely match the PU of the video block to be coded in terms of pixel difference, which may be determined by sum of absolute difference (SAD), sum of square difference (SSD), or other difference metrics.
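- The SAD metric mentioned above can be sketched as follows. This is a minimal illustration, not the disclosure's implementation; the function name and list-of-lists block layout are assumptions for the example.

```python
def sad(block_a, block_b):
    # sum of absolute pixel differences over two equally sized blocks
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

# |10-11| + |12-12| + |8-7| + |9-10| = 3
cost = sad([[10, 12], [8, 9]], [[11, 12], [7, 10]])
```

During motion search, the candidate block with the smallest SAD (or SSD) would typically be selected as the predictive block.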
- video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in decoded picture buffer 64 . For example, video encoder 20 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Therefore, motion estimation unit 42 may perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
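- As a simplified illustration of how sub-integer pixel values might be interpolated, the sketch below averages adjacent full-pel samples with rounding to produce half-pel positions. Real codecs such as H.264/AVC use longer filters (e.g., a 6-tap filter for half-pel positions), so this bilinear stand-in is an assumption for illustration only.

```python
def half_pel_horizontal(row):
    # average adjacent full-pel samples with rounding to obtain
    # half-pel positions between them (simplified bilinear filter)
    return [(row[i] + row[i + 1] + 1) // 2 for i in range(len(row) - 1)]

# full-pel samples [0, 4, 8] yield half-pel samples [2, 6]
halves = half_pel_horizontal([0, 4, 8])
```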
- Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture.
- the reference picture may be selected from a first reference picture list (List 0 ) or a second reference picture list (List 1 ), each of which identifies one or more reference pictures stored in decoded picture buffer 64 .
- Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44 .
- Motion compensation performed by motion compensation unit 44 may involve fetching or generating the predictive block based on the motion vector determined by motion estimation, possibly performing interpolations to sub-pixel precision.
- motion compensation unit 44 may locate the predictive block to which the motion vector points in one of the reference picture lists.
- Video encoder 20 forms a residual video block by subtracting pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values.
- the pixel difference values form residual data for the block, and may include both luma and chroma difference components.
- Summer 50 represents the component or components that perform this subtraction operation.
- Motion compensation unit 44 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.
- Intra-prediction unit 46 may intra-predict a current block, as an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44 , as described above. In particular, intra-prediction unit 46 may determine an intra-prediction mode to use to encode a current block. In some examples, intra-prediction unit 46 may encode a current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction unit 46 (or mode select unit 40 , in some examples) may select an appropriate intra-prediction mode to use from the tested modes.
- intra-prediction unit 46 may calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes, and select the intra-prediction mode having the best rate-distortion characteristics among the tested modes.
- Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as a bit rate (that is, a number of bits) used to produce the encoded block.
- Intra-prediction unit 46 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
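- The mode selection described above is commonly expressed as minimizing a Lagrangian cost J = D + λ·R over the tested modes. The helper below is a hypothetical sketch of that selection, assuming each candidate carries a precomputed distortion and rate; it is not the disclosure's exact procedure.

```python
def best_mode(candidates, lam):
    # candidates: (mode_name, distortion, rate_bits)
    # select the mode minimizing the Lagrangian cost J = D + lam * R
    return min(candidates, key=lambda m: m[1] + lam * m[2])[0]

modes = [("DC", 120, 10), ("planar", 90, 30), ("angular", 70, 60)]
# lam = 1.0 gives costs 130, 120, 130, so "planar" wins;
# a larger lam penalizes rate more heavily and favors "DC"
```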
- intra-prediction unit 46 may provide information indicative of the selected intra-prediction mode for the block to entropy encoding unit 56 .
- Entropy encoding unit 56 may encode the information indicating the selected intra-prediction mode in accordance with the techniques of this disclosure.
- Video encoder 20 may include in the transmitted bitstream configuration data, which may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables), definitions of encoding contexts for various blocks, and indications of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table to use for each of the contexts.
- video encoder 20 forms a residual video block by subtracting the predictive block from the current video block.
- the residual video data in the residual block may be included in one or more TUs and applied to transform processing unit 52 .
- Transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform.
- Transform processing unit 52 may convert the residual video data from a pixel domain to a transform domain, such as a frequency domain.
- Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54 .
- Quantization unit 54 quantizes the transform coefficients to further reduce bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter.
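- A minimal sketch of scalar quantization controlled by a quantization parameter. The step-size formula (roughly doubling for every increase of 6 in QP, as in H.264/AVC) and the truncation rule are illustrative assumptions, not the normative quantization process.

```python
def quantize(coeffs, qp):
    # step size roughly doubles for every increase of 6 in QP
    qstep = 0.625 * (2 ** (qp / 6.0))
    # truncation toward zero; larger QP -> coarser levels, lower bit rate
    return [int(c / qstep) for c in coeffs]

# qp = 24 gives qstep = 10.0, so [100, -40, 3] -> [10, -4, 0]
levels = quantize([100, -40, 3], 24)
```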
- quantization unit 54 may then perform a scan of the matrix including the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.
- entropy encoding unit 56 entropy encodes the quantized transform coefficients.
- entropy encoding unit 56 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding or another entropy encoding methodology or technique.
- the encoded bitstream may be transmitted to video decoder 30 , or archived for later transmission or retrieval by video decoder 30 .
- Entropy encoding unit 56 may also entropy encode the motion vectors and the other syntax elements for the current video slice being coded.
- Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block of a reference picture.
- Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the reference pictures within one of the reference picture lists. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation.
- Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reference block for storage in decoded picture buffer 64 .
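- The reconstruction performed by summer 62 can be sketched as a sample-wise sum of prediction and residual, clipped to the valid sample range. The clipping to a bit-depth-dependent range is a standard detail assumed here for completeness.

```python
def reconstruct(pred, resid, bit_depth=8):
    # sample-wise sum of prediction and residual, clipped to [0, 2^bd - 1]
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(pr, rr)]
            for pr, rr in zip(pred, resid)]

# 250 + 10 clips to 255; 4 - 8 clips to 0
recon = reconstruct([[250, 4]], [[10, -8]])
```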
- the reference block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-predict a block in a subsequent video frame or picture.
- prediction processing unit 41 represents one example unit for performing the example functions described above.
- prediction processing unit 41 may encode syntax elements that support the use of adaptive resolution CVSs.
- Prediction processing unit 41 may also generate SPSs that may be activated by one or more resolution sub-sequences, and transmit the SPSs and RSSs to a video decoder.
- Each of the SPSs may include resolution information for one or more sequences of pictures.
- Prediction processing unit 41 may also receive and order one or more SPSs and cause video encoder 20 to code information indicative of the reference pictures that belong to the reference picture set.
- DPB management unit 65 may also perform techniques related to the management of DPB 64 .
- prediction processing unit 41 may construct the plurality of reference picture subsets that each identifies one or more of the reference pictures. Prediction processing unit 41 may also derive the reference picture set from the constructed plurality of reference picture subsets. Also, prediction processing unit 41 and DPB management unit 65 may implement any one or more of the sets of example pseudo code described below to implement one or more example techniques described in this disclosure.
- prediction processing unit 41 may generate a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution.
- the second sub-sequence may include one or more frames each having a second resolution.
- the first sub-sequence may be different than the second sub-sequence, and the first resolution may be different than the second resolution.
- Prediction processing unit 41 may further generate a first sequence parameter set and a second sequence parameter set for the video sequence.
- the first sequence parameter set may indicate the first resolution of the one or more frames of the first sub-sequence
- the second sequence parameter set may indicate the second resolution of the one or more frames of the second sub-sequence.
- the first sequence parameter set may be different than the second sequence parameter set.
- Prediction processing unit 41 may transmit the coded video sequence comprising the first sub-sequence and the second sub-sequence, and the first sequence parameter set and the second sequence parameter set.
- the resolution may comprise a spatial resolution.
- Prediction processing unit 41 may also alter the coding of the sequence parameter sets. For example, prediction processing unit 41 may code the first sequence parameter set and the second sequence parameter set in a transmitted bitstream prior to either the first sub-sequence or the second sub-sequence. Prediction processing unit 41 may also interleave in the coded video sequence the one or more frames of the first sub-sequence and the one or more frames of the second sub-sequence.
- prediction processing unit 41 may be configured to transmit both the first sequence parameter set and the second sequence parameter set prior to transmitting either of the first sub-sequence and the second sub-sequence. In another example, to transmit the first sequence parameter set and the second sequence parameter set of the coded video sequence, prediction processing unit 41 may be configured to transmit the second sequence parameter set after transmitting at least one frame of the one or more frames of the first sub-sequence, and prior to transmitting the second sub-sequence.
- prediction processing unit 41 may code the first sequence parameter set in a transmitted bitstream prior to coding the first sub-sequence and prediction processing unit 41 may also code the second sequence parameter set in the transmitted bitstream after at least one frame of the one or more frames of the first sub-sequence and prior to the second sub-sequence.
- Decoded picture buffer 64 may also perform the techniques of this disclosure.
- decoded picture buffer 64 may receive a first decoded frame of video data, wherein the first decoded frame is associated with a first resolution; determine whether a decoded picture buffer is available to store the first decoded frame based on the first resolution; in the event the decoded picture buffer is available to store the first decoded frame, store the first decoded frame in the decoded picture buffer; and determine whether decoded picture buffer 64 is available to store a second decoded frame of video data, wherein the second decoded frame is associated with a second resolution, based on the first resolution and the second resolution, wherein the first decoded frame is different than the second decoded frame.
- DPB management unit 65 may determine an amount of information that may be stored within decoded picture buffer 64 , determine an amount of information associated with the first decoded frame based on the first resolution, and compare the amount of information that may be stored within decoded picture buffer 64 , and the amount of information associated with the first decoded frame.
- DPB management unit 65 may be configured to determine an amount of information that may be stored within decoded picture buffer 64 based on the first resolution, determine an amount of information associated with the second decoded frame based on the second resolution, and compare the amount of information that may be stored within decoded picture buffer 64 and the amount of information associated with the second decoded frame. DPB management unit 65 may also be configured to remove the first decoded frame from decoded picture buffer 64 , and in some examples, the resolution may comprise a spatial resolution.
- the techniques described in this disclosure may refer to video encoder 20 signaling information.
- When video encoder 20 signals information, the techniques of this disclosure generally refer to any manner in which video encoder 20 provides the information in a coded bitstream.
- When video encoder 20 signals syntax elements to video decoder 30 , it may mean that video encoder 20 transmitted the syntax elements to video decoder 30 as part of a coded bitstream via output interface 24 and communication channel 16 , or that video encoder 20 stored the syntax elements in a coded bitstream on storage medium 17 and/or file server 19 for eventual reception by video decoder 30 .
- Signaling from video encoder 20 to video decoder 30 should not be interpreted as requiring transmission directly from video encoder 20 to video decoder 30 , although this may be one possibility for real-time video applications. Rather, signaling from video encoder 20 to video decoder 30 should be interpreted as any technique with which video encoder 20 provides information in a bitstream for eventual reception by video decoder 30 , either directly or via intermediate storage (e.g., in storage medium 17 and/or file server 19 ).
- Video encoder 20 and video decoder 30 may be configured to implement the example techniques described in this disclosure for coding, transmitting, receiving and activating SPSs and RSSs, as well as for managing the DPB.
- video decoder 30 may invoke the techniques to support adaptive resolution CVSs and to add and remove reference pictures from the DPB.
- Video decoder 30 may invoke the process in a similar manner.
- An RSS may indicate information, such as a resolution of a series of coded video pictures of a CVS.
- Prediction processing unit 41 may use one resolution sub-sequence (RSS) at a given time.
- Each RSS may reference a single SPS.
- If there are “n” RSSs in a given CVS, there may be, altogether, “n” active SPSs when decoding the CVS.
- multiple RSSs may refer to a single SPS in a CVS.
- the SPS or PPS may indicate the different resolution of each RSS.
- the SPS or PPS may include a resolution ID as well as a syntax element that indicates the resolution associated with each resolution ID.
- a computer-readable storage medium may include a data structure that represents CVSs, SPSs, and RSSs.
- the data structure may include a coded video sequence comprising a first sub-sequence and a second sub-sequence.
- the first sub-sequence may include one or more frames each having a first resolution
- the second sub-sequence may include one or more frames each having a second resolution.
- the first sub-sequence may also be different than the second sub-sequence
- the first resolution may be different than the second resolution.
- the data structure may further comprise a first sequence parameter set and a second sequence parameter set for the coded video sequence.
- the first sequence parameter set may indicate the first resolution of the one or more frames of the first sub-sequence
- the second sequence parameter set may indicate the second resolution of the one or more frames of the second sub-sequence
- the first sequence parameter set may be different than the second sequence parameter set.
- Prediction processing unit 41 of video encoder 20 may order or restrict each of the RSSs according to spatial resolution characteristics of each RSS.
- prediction processing unit 41 may order the SPSs based on their horizontal resolutions. As an example, if a horizontal size of a resolution “A” of an SPS is greater than that of a resolution “B” of an SPS, a vertical size of the resolution “A” may not be less than that of the resolution “B.” With this restriction, a resolution “C” of an SPS may be considered to be larger than a resolution “D” of an SPS as long as one of a horizontal size and a vertical size of the resolution “C” is greater than a corresponding size of the resolution “D.”
- Video encoder 20 may assign an RSS with a largest spatial resolution a resolution ID equal to “0,” an RSS with a second largest spatial resolution a resolution ID equal to “1,” and so forth.
- prediction processing unit 41 may not signal a resolution ID. Rather, video encoder 20 may derive the resolution ID according to the spatial resolutions of the RSSs. Prediction processing unit 41 may still order each of the RSSs in each CVS according to the spatial resolutions of each RSS, as described above. The RSS with the largest spatial resolution is assigned a resolution ID equal to 0, and the RSS with the second largest spatial resolution is assigned a resolution ID equal to 1, and so on.
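- The resolution ID derivation described above can be sketched as an ordering of the RSS resolutions. Ordering by total pixel count is a simplifying assumption for this sketch (the disclosure orders by horizontal size with a restriction on vertical size); the function name and (width, height) representation are illustrative.

```python
def assign_resolution_ids(resolutions):
    # order (width, height) pairs from largest to smallest spatial
    # resolution; the largest gets resolution_id 0, the next gets 1, etc.
    ordered = sorted(resolutions, key=lambda wh: wh[0] * wh[1], reverse=True)
    return {res: rid for rid, res in enumerate(ordered)}

ids = assign_resolution_ids([(640, 360), (1920, 1080), (1280, 720)])
# (1920, 1080) -> 0, (1280, 720) -> 1, (640, 360) -> 2
```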
- prediction processing unit 41 may refer to decoded pictures only within the same RSS, within an RSS with a resolution ID equal to “rId-1,” or within an RSS with a resolution ID equal to “rId+1.” Prediction processing unit 41 may not refer to decoded pictures within other RSSs when performing inter-prediction.
- When coding an RSS, prediction processing unit 41 may perform inter-prediction of blocks only from the two adjacent RSSs, i.e., the RSS with the immediately larger spatial resolution and the RSS with the immediately smaller spatial resolution.
- prediction processing unit 41 may not be limited to performing inter-prediction using spatially-neighboring RSSs, and prediction processing unit 41 may perform inter-prediction using any RSS, not just spatially neighboring RSSs (e.g., RSSs with rId+1 or rId-1).
- the techniques of this disclosure may also include processes and techniques for transmitting and activating picture parameter sets (PPSs).
- PPSs may decouple the transmission of infrequently changing information from the transmission of coded block data for the CVSs.
- Video encoder 20 and decoder 30 may, in some applications, convey or signal the SPSs and PPSs “out-of-band,” or using a different communication channel than that used to communicate the coded block data of the CVSs, e.g., using a reliable transport mechanism.
- a PPS raw byte sequence payload may include parameters to which coded slice network abstraction layer (NAL) units of one or more coded pictures may refer.
- Each PPS RBSP is initially considered not active at a start of a decoding process. At most, one PPS RBSP is considered active at any given moment during the decoding process, and activation of any particular PPS RBSP results in deactivation of a previously-active PPS RBSP, if any.
- prediction processing unit 41 of video encoder 20 and prediction processing unit 81 of video decoder 30 may support RSSs each having the same resolution aspect ratio.
- video encoder 20 and decoder 30 may support different RSSs having different resolution aspect ratios among the different RSSs.
- the resolution aspect ratio of an RSS may be defined as the proportion of the width of an RSS versus the height of the RSS.
- prediction processing units 41 and 81 may crop a portion of a block of a reference picture having a first resolution aspect ratio in order to predict the values of a predictive block having a second, different resolution aspect ratio.
- the techniques of this disclosure define a number of syntax elements, referred to as cropping parameters, which may be signaled in the RBSP of an SPS to indicate how a reference picture should be cropped.
- the cropped area of the reference picture may be referred to as a “cropping window.”
- the syntax elements may include a profile indicator or a flag that indicates the existence of more than one spatial resolution in the CVS. Alternatively no flag may be added, but the existence of the more than one spatial resolution in the CVS may be indicated by a particular value of the profile indicator, which may be denoted as profile_idc. Additionally, the syntax elements may include a resolution ID, a syntax element that indicates a spatial relationship between the current resolution sub-sequence and an adjacent spatial resolution sub-sequence, and a syntax element that indicates the required size of the DPB in units of 8×8 blocks.
- a modified SPS RBSP syntax structure may be expressed as shown below in Table I:
- adaptive_spatial_resolution_flag When equal to “1,” the flag indicates that a CVS containing an RSS referring to an SPS may contain pictures with different spatial resolutions. When equal to “0,” the flag indicates that all pictures in the CVS have a same spatial resolution, or equivalently, that there is only one RSS in the CVS. This syntax element applies to the entire CVS, and its value shall be identical for all SPSs that may be activated for the CVS.
- The adaptive_spatial_resolution_flag is only one example of how adaptive resolution CVSs may be implemented. As another example, there may be one or more profiles defined that enable adaptive spatial resolution. Accordingly, the value of the profile_idc syntax element, which may indicate the selection of an adaptive resolution profile, may signal the enablement of adaptive resolution.
- resolution_id Specifies an identifier of the RSS referring to the SPS.
- a value of resolution_id may be in a range of “0” to “7,” inclusive.
- An RSS with a largest spatial resolution among all RSSs in the CVS may have resolution_id equal to “0.”
- cropping_resolution_idc[i] Indicates whether cropping is needed to specify a reference region of a reference picture from a target RSS, as defined below, used for inter-prediction as a reference when decoding a coded picture from a current RSS.
- the pseudocode that follows describes one example of how the numbering of an RSS using the resolution_id value that refers to an SPS may be implemented according to the techniques of this disclosure.
- video encoder 20 may predict the pixel values of a block from a block of a reference picture that has a different aspect ratio. Because of the difference in the aspect ratios, video encoder 20 may crop a portion of the block of the reference picture in order to obtain a block with a resolution aspect ratio similar to that of the predictive block.
- the following syntax elements describe how video encoder 20 may perform cropping of blocks to obtain blocks with different resolution aspect ratios.
- Cropping_resolution_idc[i] equal to “0” indicates that the target RSS does not exist, or that no cropping is needed.
- Cropping_resolution_idc[i] equal to “1” indicates that cropping at a left and/or right side is needed.
- Cropping_resolution_idc[i] equal to “2” indicates that cropping at a top and/or bottom is needed.
- Cropping_resolution_idc[i] equal to “3” indicates that cropping at both the left/right and the top/bottom is needed.
- Table II illustrates the various values of Cropping_resolution_idc[i], and the corresponding indications.
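- The value-to-direction mapping described above can be sketched as a small decoder. Only the value “3” is stated explicitly in the text; the “0,” “1,” and “2” assignments are inferred here from the order of the descriptions, so treat them as assumptions consistent with the missing Table II.

```python
def cropping_directions(idc):
    # decode cropping_resolution_idc[i] into a pair of flags:
    # (crop at left/right needed, crop at top/bottom needed)
    return (idc in (1, 3), idc in (2, 3))

# idc 0 -> no cropping; idc 3 -> crop both horizontally and vertically
flags = cropping_directions(3)
```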
- the RBSP of an SPS may also include syntax elements that may indicate the number of pixels to be cropped from the top, bottom, left, and/or right of a reference picture from an RSS. These additional cropping syntax elements are described in further detail below.
- cropped_left[i] Specifies a number of pixels to be cropped at a left side of a luma component of the reference picture from the target RSS, to specify the reference region.
- video encoder 20 may infer the value to be equal to “0.”
- cropped_right[i] Specifies a number of pixels to be cropped at a right side of the luma component of the reference picture from the target RSS, to specify the reference region.
- video encoder 20 may infer the value to be equal to “0.”
- cropped_top[i] Specifies a number of pixels to be cropped at a top of the luma component of the reference picture from the target RSS, to specify the reference region.
- video encoder 20 may infer the value to be equal to “0.”
- cropped_bottom[i] Specifies a number of pixels to be cropped at a bottom of the luma component of the reference picture from the target RSS, to specify the reference region.
- video encoder 20 may infer the value to be equal to “0.”
- video encoder 20 may signal the cropping window in other ways.
- video encoder 20 may signal the cropping window as the starting vertical and horizontal positions plus the width and height.
- video encoder 20 may signal the cropping window as the starting vertical and horizontal positions and the ending vertical and horizontal positions.
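- The two alternative signalings above describe the same rectangular region. The sketch below (hypothetical helper names; not the disclosure's syntax) normalizes both to a common start/end representation to show their equivalence.

```python
def window_from_pos_size(x0, y0, width, height):
    # cropping window signaled as starting position plus width and height
    return (x0, y0, x0 + width, y0 + height)

def window_from_pos_end(x0, y0, x1, y1):
    # cropping window signaled as starting and ending positions
    return (x0, y0, x1, y1)

# both signalings yield the same window (16, 8, 80, 40)
same = window_from_pos_size(16, 8, 64, 32) == window_from_pos_end(16, 8, 80, 40)
```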
- When inter-predicting a coded picture in the current RSS, prediction processing unit 41 may crop a decoded picture from the target RSS as specified by the above cropping syntax elements. Prediction processing unit 41 may also scale the cropped reference picture to the same resolution as the coded picture in the current RSS, and scale the motion vectors of the cropped block accordingly.
- Video encoder 20 may include DPB 64 , which may contain decoded pictures.
- DPB management unit 65 may manage DPB 64 .
- Each decoded picture contained within DPB 64 may be needed for either inter-prediction as a reference, or for future output.
- DPB 64 may be modified to support adaptive-resolution CVSs, and more generally to store frames of different sizes.
- Initially, the DPB may be empty (i.e., an indication of a proportion of DPB 64 that is unavailable to store decoded pictures, or DPB “fullness,” is set to “0”).
- DPB management unit 65 may increment the “fullness” of the DPB by the number of blocks (e.g., CUs or 8×8 pixel blocks) in the picture.
- DPB management unit 65 may decrease the fullness of the DPB by the number of blocks (e.g., CUs or 8×8 pixel blocks) in the removed picture.
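- Tracking fullness in block units rather than in whole frames is what lets the buffer hold frames of different sizes. The class below is a minimal sketch of such bookkeeping under that assumption; the class and method names are illustrative, not from the disclosure.

```python
class DpbTracker:
    """Tracks DPB fullness in units of 8x8 blocks, so frames of
    different resolutions consume proportionally different amounts
    of the buffer."""

    def __init__(self, max_blocks):
        self.max_blocks = max_blocks  # e.g., signaled max_dec_pic_buffering
        self.fullness = 0             # the DPB starts empty

    @staticmethod
    def blocks(width, height):
        # number of 8x8 blocks covering the picture (rounded up)
        return ((width + 7) // 8) * ((height + 7) // 8)

    def can_store(self, width, height):
        return self.fullness + self.blocks(width, height) <= self.max_blocks

    def store(self, width, height):
        # increment fullness by the picture's block count
        self.fullness += self.blocks(width, height)

    def remove(self, width, height):
        # decrement fullness by the removed picture's block count
        self.fullness -= self.blocks(width, height)
```

For example, a 1920x1080 frame occupies 240 x 135 = 32400 blocks, so a budget of 65000 blocks fits two such frames but not two frames plus a 640x360 frame.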
- the RBSP of an SPS may include a syntax element that specifies a size of the DPB in 8×8 blocks.
- The parameter, denoted max_dec_pic_buffering, specifies a required size of a decoded picture buffer (DPB), in units of 8×8 blocks, for decoding the CVS.
- This syntax element may apply to the entire CVS, and its value is identical for all SPSs that may be activated for the CVS. Further detail of the operation of the DPB is described with respect to FIG. 5 , below.
- FIG. 3 is a block diagram illustrating an example video decoder 30 that may implement the techniques described in this disclosure.
- video decoder 30 includes an entropy decoding unit 80 , prediction processing unit 81 , inverse quantization unit 86 , inverse transformation unit 88 , summer 90 , decoded picture buffer (DPB) 92 , and DPB management unit 93 .
- Prediction processing unit 81 includes motion compensation unit 82 and intra prediction unit 84 .
- Video decoder 30 may, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 20 from FIG. 2 .
- video decoder 30 receives an encoded video bitstream that represents video blocks of an encoded video slice and associated syntax elements from video encoder 20 .
- Entropy decoding unit 80 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors, and other syntax elements.
- Entropy decoding unit 80 forwards the motion vectors and other syntax elements to prediction processing unit 81 .
- Video decoder 30 may receive the syntax elements at the video slice level and/or the video block level.
- intra prediction unit 84 of prediction processing unit 81 may generate prediction data for a video block of the current video slice based on a signaled intra prediction mode and data from previously decoded blocks of the current picture.
- motion compensation unit 82 of prediction processing unit 81 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 80 .
- the predictive blocks may be produced from one of the reference pictures within one of the reference picture lists.
- Video decoder 30 may construct the reference frame lists, List 0 and List 1 , using default construction techniques based on reference pictures stored in decoded picture buffer 92 . In some examples, video decoder 30 may construct List 0 and List 1 from the reference pictures identified in the derived reference picture set.
- Motion compensation unit 82 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, motion compensation unit 82 uses some of the received syntax elements to determine a prediction mode (e.g., intra- or inter-prediction) used to code the video blocks of the video slice, an inter-prediction slice type (e.g., B slice or P slice), construction information for one or more of the reference picture lists for the slice, motion vectors for each inter-encoded video block of the slice, inter-prediction status for each inter-coded video block of the slice, and other information to decode the video blocks in the current video slice.
- Motion compensation unit 82 may also perform interpolation based on interpolation filters. Motion compensation unit 82 may use interpolation filters as used by video encoder 20 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 82 may determine the interpolation filters used by video encoder 20 from the received syntax elements and use the interpolation filters to produce predictive blocks.
- Inverse quantization unit 86 inverse quantizes, i.e., de-quantizes, the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 80 .
- the inverse quantization process may include use of a quantization parameter calculated by video encoder 20 for each video block in the video slice to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied.
- Inverse transform unit 88 applies an inverse transform, e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain.
- video decoder 30 forms a decoded video block by summing the residual blocks from inverse transform unit 88 with the corresponding predictive blocks generated by prediction processing unit 81 .
- Summer 90 represents the component or components that perform this summation operation.
- a deblocking filter may also be applied to filter the decoded blocks in order to remove blockiness artifacts.
- Other loop filters may also be used to smooth pixel transitions, or otherwise improve the video quality.
- DPB management unit 93 may store the decoded video blocks of a given picture in decoded picture buffer 92 , which stores reference pictures used for subsequent motion compensation.
- Decoded picture buffer 92 also stores decoded video for later presentation on a display device, such as display device 32 of FIG. 1 .
- prediction processing unit 81 and DPB management unit 93 represent example units for performing the example functions described above.
- prediction processing unit 81 may receive a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution.
- Prediction processing unit 81 may also receive a first sequence parameter set and a second sequence parameter set for the coded video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set. Prediction processing unit 81 may also use the first sequence parameter set and the second sequence parameter set to decode the coded video sequence.
- prediction processing unit 81 may also receive a first decoded frame of video data, wherein the first decoded frame is associated with a first resolution.
- DPB management unit 93 may determine whether DPB 92 is available to store the first decoded frame based on the first resolution and, in the event the decoded picture buffer is available to store the first decoded frame, store the first decoded frame in DPB 92 . DPB management unit 93 may then determine whether DPB 92 is available to store a second decoded frame of video data, wherein the second decoded frame is associated with a second resolution, based on the first resolution and the second resolution, and wherein the first decoded frame is different than the second decoded frame.
- video decoder 30 may perform any of the techniques of this disclosure. In some examples, video decoder 30 may perform some or all of the techniques described above with respect to video encoder 20 in FIG. 2 . In some examples, video decoder 30 may perform the techniques described with respect to FIG. 2 in a reciprocal ordering or manner to that described with respect to video encoder 20 .
- FIGS. 4A-4D are conceptual diagrams that illustrate examples of a coded bitstream including coded video data in accordance with the techniques of this disclosure.
- a coded bitstream 400 may comprise one or more coded video sequences (CVSs), in particular, CVS 402 and CVS 404 .
- each of CVS 402 and CVS 404 may comprise one or more frames, or “pictures,” PIC_ 1 ( 0 )-PIC_ 1 (N), and PIC_ 2 ( 0 )-PIC_ 2 (M), respectively.
- each of CVS 402 and CVS 404 may further comprise a single sequence parameter set (SPS), in particular, SPS 1 and SPS 2 , respectively.
- SPS 1 and SPS 2 may define parameters for the corresponding one of CVS 402 and CVS 404 , including LCU size, SCU size, and other syntax information for the respective CVS that is common to all frames, or “pictures” within the CVS.
- a particular CVS, CVS 406 may further comprise one or more picture parameter sets (PPSs), in particular, PPS 1 and PPS 2 .
- each of PPS 1 and PPS 2 may define parameters for CVS 406 , including syntax information that indicates picture resolution, that are common to one or more pictures within CVS 406 , but not to all pictures within CVS 406 .
- syntax information included within each of PPS 1 and PPS 2 (e.g., picture resolution syntax information) may apply to a sub-set of the pictures included within CVS 406 .
- PPS 1 may indicate picture resolution for PIC_ 1 ( 0 )-PIC_ 1 (N)
- PPS 2 may indicate picture resolution for PIC_ 2 ( 0 )-PIC_ 2 (M).
- CVS 406 may comprise pictures having different resolutions, wherein picture resolution for a particular one or more pictures (e.g., PIC_ 1 ( 0 )-PIC_ 1 (N)) within CVS 406 that share a common picture resolution may be specified by a corresponding one of PPS 1 and PPS 2 .
- a PPS may have to be signaled prior to each picture having a different picture resolution relative to a previous picture in the decoding order, to indicate the picture resolution for the currently decoded picture. Accordingly, in such cases, multiple PPSs may need to be signaled throughout decoding the CVS, which may increase coding overhead.
- a PPS RBSP may include parameters that can be referred to by coded slice NAL units of one or more coded pictures.
- Each PPS RBSP is initially considered not active at a start of a decoding process. In most examples, one PPS RBSP is considered active at any given moment during the decoding process, and activation of any particular PPS RBSP results in deactivation of a previously-active PPS RBSP, if any.
- when a PPS RBSP (with a particular value of the pic_parameter_set_id syntax element) is not active, and is referred to by a coded slice NAL unit (using the particular value of pic_parameter_set_id), the PPS referred to by that pic_parameter_set_id is activated.
- This PPS RBSP is referred to as an “active PPS RBSP,” until it is deactivated by an activation of another PPS.
- Video encoder 20 or decoder 30 may require a PPS with the referenced pic_parameter_set_id value to have been received before activating the PPS with that pic_parameter_set_id.
- a NAL unit may refer to PPS 1 .
- Video encoder 20 or decoder 30 may activate PPS 1 based on the reference to PPS 1 in the NAL unit.
- PPS 1 is the active PPS RBSP.
- PPS 1 remains the active PPS RBSP until a NAL unit references PPS 2 , at which point video encoder 20 or decoder 30 may activate PPS 2 .
- PPS 2 becomes the active PPS RBSP, and PPS 1 is no longer the active PPS RBSP.
- Any PPS NAL unit that has the same pic_parameter_set_id value for the active PPS RBSP for a coded picture may have the same content as that of the active PPS RBSP for the coded picture. That is, if the pic_parameter_set_id of the PPS NAL is the same as that of the active PPS RBSP, the content of the active PPS RBSP may not change. There may be an exception to this rule, however.
- a PPS NAL has the same pic_parameter_set_id as the active PPS RBSP, and the PPS NAL follows the last Video Coding Layer (VCL) NAL unit of the coded picture, and precedes the first VCL NAL unit of another coded picture, then the content of the active PPS RBSP may change (e.g., the pic_parameter_set_id value may indicate a different set of parameters).
- syntax information that indicates picture resolution for one or more pictures within a CVS, wherein the CVS comprises one or more pictures having different sizes may be indicated using multiple SPSs for the CVS, rather than using a plurality of PPSs, as described above with reference to FIGS. 4A-4B .
- an SPS RBSP may include parameters that can be referred to by one or more PPS RBSPs, or one or more Supplemental Enhancement Information (SEI) NAL units containing a buffering period SEI message.
- Each SPS is initially considered not active at a start of a decoding process. At most, one SPS may be considered active for each RSS at any given moment during the decoding process, and the activation of any particular SPS may result in a deactivation of a previously-active SPS for the same resolution sub-sequence, if any. Also, if there are “n” resolution sub-sequences within the CVS, at most “n” SPS RBSPs may be considered active for the entire CVS at any given moment during the decoding process.
- an SPS RBSP (with a particular value of seq_parameter_set_id) is not already active, and is referred to by activation of a PPS RBSP (using the particular value of seq_parameter_set_id), or is referred to by an SEI NAL unit containing a buffering period SEI message (using the particular value of seq_parameter_set_id), the SPS RBSP is activated.
- This SPS RBSP may be referred to as an “active SPS RBSP” for the associated RSS (the RSS in which the coded pictures refer to the active SPS RBSP through the PPS RBSPs), until it is deactivated by an activation of another SPS RBSP.
- Video encoder 20 or decoder 30 may require the SPS RBSP with a particular value of seq_parameter_set_id to be available to video encoder 20 or video decoder 30 prior to the activation of that SPS. Additionally, the SPS may remain active for the entire RSS in the CVS.
- an SPS RBSP may only be activated by a buffering period SEI message when the buffering period SEI message is part of an IDR access unit.
- Any SPS NAL unit containing the particular value of seq_parameter_set_id for the active SPS RBSP for a RSS in a CVS may have the same content as that of the active SPS RBSP for the RSS in the CVS, unless it follows a last access unit of the CVS, and precedes the first VCL NAL unit and the first SEI NAL unit containing a buffering period SEI message (when present) of another CVS.
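The per-RSS SPS activation described above (at most one active SPS per resolution sub-sequence, hence at most “n” active SPS RBSPs for “n” RSSs) can be sketched as follows. The class and its identifiers are illustrative assumptions; it only models the bookkeeping, not real NAL unit parsing.

```python
class SpsActivation:
    def __init__(self):
        self.received = {}       # seq_parameter_set_id -> SPS content
        self.active_by_rss = {}  # rss_id -> active seq_parameter_set_id

    def receive_sps(self, sps_id, content):
        """Make an SPS RBSP available prior to any activation."""
        self.received[sps_id] = content

    def activate(self, rss_id, sps_id):
        """Activation for an RSS deactivates that RSS's previous SPS, if any."""
        if sps_id not in self.received:
            raise ValueError("SPS must be available before activation")
        self.active_by_rss[rss_id] = sps_id

    def active_count(self) -> int:
        # With "n" RSSs, at most "n" SPS RBSPs are active at once.
        return len(self.active_by_rss)
```

With two RSSs referring to two SPSs (as in the two-resolution example later in this disclosure), exactly two SPS RBSPs are active at once, one per RSS.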
- when a PPS RBSP or SPS RBSP is conveyed within the bitstream, these constraints impose an order constraint on the NAL units that contain the PPS RBSP or the SPS RBSP, respectively. Otherwise, if the PPS RBSP or SPS RBSP is conveyed by other means not specified in this disclosure, it should be available to the decoding process in a timely fashion such that these constraints are obeyed.
- constraints that are expressed on the relationship between the values of the syntax elements (and the values of variables derived from those syntax elements) in SPS and PPS, and other syntax elements, are typically expressions of constraints that apply only to the active SPS and the active PPS. If any SPS RBSP is present that is not activated in the bitstream, its syntax elements usually have values that would conform to the specified constraints if it were activated by reference in an otherwise conforming bitstream. If any PPS RBSP is present that is not ever activated in the bitstream, the syntax elements of the PPS RBSP may have values that would conform to the specified constraints if the PPS were activated by reference in an otherwise-conforming bitstream.
- the values of parameters of the active PPS and the active SPS may be considered to be in effect.
- the values of the parameters of the PPS and SPS that are active for the operation of the decoding process for the VCL NAL units of the primary coded picture in the same access unit may be considered in effect unless otherwise specified in the SEI message semantics.
- CVS 408 may include one or more SPSs, in particular, SPS 1 and SPS 2 , that each indicate picture resolution for PIC_ 1 ( 0 ), PIC_ 1 ( 1 ), etc., and PIC_ 2 ( 0 ), PIC_ 2 ( 1 ), etc., respectively.
- SPS 1 indicates picture resolution information for PIC_ 1 ( 0 ), PIC_ 1 ( 1 ), etc.
- SPS 2 indicates picture resolution information for PIC_ 2 ( 0 ), PIC_ 2 ( 1 ), etc.
- CVS 408 may further comprise one or more PPSs (not shown), wherein the one or more PPSs may specify syntax information for one or more pictures of CVS 408 , but wherein the one or more PPSs do not include any syntax information that indicates picture resolution for any of the one or more pictures of CVS 408 .
- SPS 1 and SPS 2 may indicate picture resolution information for all pictures within CVS 408 , even in cases where pictures having different resolutions are alternated within a CVS in the decoding order. Accordingly, after indicating the picture resolution information for all pictures within CVS 408 using SPS 1 and SPS 2 , no additional indication of the information may be needed.
- the multiple SPSs may be located at the beginning of the corresponding CVS, e.g., CVS 408 , prior to any of PIC_ 1 ( 0 ), PIC_ 1 ( 1 ) and PIC_ 2 ( 0 ), PIC_ 2 ( 1 ).
- an SPS that indicates picture resolution information for one or more pictures may be located before a first one of such pictures in a decoding sequence. For example, as shown in FIG. 4D , SPS 2 is located within CVS 410 prior to a first one of pictures PIC_ 2 ( 0 ), PIC_ 2 ( 1 ), etc., but after a first one of PIC_ 1 ( 0 ), PIC_ 1 ( 1 ), etc.
- FIG. 5 is a conceptual diagram illustrating the operation of a decoded picture buffer of a hypothetical reference decoder (HRD) model in accordance with the techniques of this disclosure.
- FIG. 5 includes coded picture buffer (CPB) 502 , decoded picture buffer (DPB) 504 , and DPB management unit 506 .
- DPB management unit 506 may remove a picture in coded picture buffer (CPB) 502 .
- Video encoder 20 or decoder 30 may decode the picture, and DPB management unit 506 may store the decoded picture in decoded picture buffer 504 . Based on various criteria, such as an output time, output flag, or a picture count, DPB management unit 506 may remove a picture from DPB 504 .
- video encoder 20 or decoder 30 may output the decoded picture.
- CPB 502 may contain encoded pictures, which may be removed and decoded so that video encoder 20 or decoder 30 may utilize the decoded pictures that may be needed for inter-prediction as a reference, or for future output.
- DPB 504 may include a maximum capacity. In previous video coding standards, DPB 504 may include a maximum number of frames that can be stored in the DPB. However, to support adaptive-resolution CVSs, DPB management unit 506 may maintain a count of blocks contained within the DPB to measure the “fullness” of the DPB.
- DPB management unit 506 of video decoder 30 may remove decoded pictures based on an output time if the pictures are intended for output.
- DPB management unit 506 may remove decoded pictures based on the picture order count (POC) values if the pictures are intended for output.
- DPB management unit 506 may remove decoded pictures that are not needed for output (i.e., outputted already or not intended for output) when the decoded picture is not in the reference picture set, and prior to decoding the current picture.
- video encoder 20 and DPB management unit 506 of video encoder 20 may also perform any of the DPB management techniques described in this disclosure.
- DPB 504 may include a plurality of buffers, and each buffer may store a decoded picture that is to be used as a reference picture or is held for future output. Initially, the DPB is empty (i.e., the DPB fullness is set to zero). In the described example techniques, the removal of the decoded pictures from the DPB may occur before the decoding of the current picture, but after video decoder 30 parses the slice header of the first slice of the current picture.
- t r (n) is CPB removal time (i.e., decoding time) of the access unit n containing the current picture.
- the techniques occurring instantaneously may mean that, in the HRD model, it is assumed that decoding of a picture is instantaneous, with a time period for decoding a picture equal to zero.
- decoder 30 may invoke the derivation process for a reference picture set. If the current picture, which DPB management unit 506 may retrieve from CPB 502 , is an IDR picture, DPB management unit 506 may remove all decoded pictures from DPB 504 , and may set the DPB fullness to 0. If the decoded picture is not an IDR picture, DPB management unit 506 may remove all pictures not included in the reference picture set of the current picture from DPB 504 .
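The removal step just described can be sketched as follows. This is an illustrative sketch, not an implementation: pictures are plain dicts keyed by hypothetical names, and the reference picture set is a set of POC values. An IDR picture flushes the whole DPB and resets fullness to 0; otherwise only pictures outside the current picture's reference picture set are removed, with fullness decremented by each removed picture's 8×8-block count.

```python
def remove_before_decoding(dpb, is_idr: bool, reference_picture_set):
    """Remove pictures from the DPB prior to decoding the current picture."""
    if is_idr:
        # IDR picture: remove all decoded pictures and reset fullness to 0.
        dpb["pictures"].clear()
        dpb["fullness"] = 0
    else:
        # Non-IDR: keep only pictures in the reference picture set.
        kept = [p for p in dpb["pictures"] if p["poc"] in reference_picture_set]
        for p in dpb["pictures"]:
            if p not in kept:
                # Decrement fullness by the removed picture's 8x8 blocks.
                dpb["fullness"] -= (p["width"] * p["height"]) >> 6
        dpb["pictures"] = kept
```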
- the OutputFlag may indicate that video decoder 30 should output the picture (e.g., for display or for transmission in the case of an encoder).
- DPB management unit 506 may decrement the fullness of DPB 504 by the number of 8×8 blocks in the picture, i.e., (pic_width_in_luma_samples*pic_height_in_luma_samples)>>6.
- after DPB management unit 506 has removed any pictures from the DPB, video decoder 30 may decode and store the received picture “n” in the DPB. DPB management unit 506 may increment the DPB fullness by the number of 8×8 blocks in the stored decoded picture, i.e., (pic_width_in_luma_samples*pic_height_in_luma_samples)>>6.
- Each picture may also have an OutputFlag, as described above.
- the DPB output time, denoted as t o,dpb ( n ), of the picture may be derived by the following equation: t o,dpb ( n )=t r ( n )+t c *dpb_output_delay( n ), where t c is the clock tick.
- dpb_output_delay( n ) may be the value of dpb_output_delay specified in the picture timing SEI message associated with access unit “n.”
- if the value of OutputFlag is equal to 1, and t o,dpb ( n )=t r ( n ), video decoder 30 may output the current picture. If the value of OutputFlag is equal to 0, video decoder 30 may not output the current picture. Otherwise (i.e., if OutputFlag is equal to 1 and t o,dpb ( n )>t r ( n )), video decoder 30 may output the current picture later, at time t o,dpb ( n ).
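The output-time logic above can be sketched numerically. This is a minimal sketch assuming the conventional HRD relation t o,dpb ( n )=t r ( n )+t c *dpb_output_delay( n ); the function names are illustrative, and times are expressed here in integer clock ticks (e.g., a 90 kHz clock) purely for the example.

```python
def dpb_output_time(t_r: int, dpb_output_delay: int, t_c: int) -> int:
    """t_o,dpb(n) = t_r(n) + t_c * dpb_output_delay(n)."""
    return t_r + t_c * dpb_output_delay

def output_action(output_flag: int, t_o_dpb: int, t_r: int) -> str:
    """Decide whether the current picture is output, and when."""
    if output_flag == 0:
        return "not output"
    if t_o_dpb == t_r:
        return "output now"       # output time equals the CPB removal time
    return "output later"          # at time t_o,dpb(n) > t_r(n)
```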
- video decoder 30 may crop the picture in the decoded picture buffer.
- Video decoder 30 may utilize the cropping rectangle specified in the active sequence parameter set for the picture to determine the cropping rectangle.
- video decoder 30 may determine a difference between the DPB output time for a picture and the DPB output time for a picture following the picture in output order.
- the output time difference of picture “n,” Δt o,dpb ( n ), may be defined according to the following equation.
- Δt o,dpb ( n )=t o,dpb ( n n )−t o,dpb ( n )
- n n may denote the picture that follows picture “n” in output order and has OutputFlag equal to 1.
- the HRD may implement the techniques instantaneously when DPB management unit 506 removes an access unit from CPB 502 .
- video decoder 30 and DPB management unit 506 of video decoder 30 may implement the removing of decoded pictures from DPB 504
- video decoder 30 may not necessarily include CPB 502 .
- video decoder 30 and video encoder 20 may not require CPB 502 . Rather, CPB 502 is described as part of the HRD model for purposes of illustration only.
- DPB management unit 506 may remove the pictures from the DPB before the decoding of the current picture, but after parsing the slice header of the first slice of the current picture. Also, similar to the first perspective for removing decoded pictures, in the second perspective, video decoder 30 and DPB management unit 506 may perform similar functions to those described above with respect to the first perspective when the current picture is an IDR picture.
- DPB management unit 506 may empty, without output, buffers of the DPB that store a picture that is marked as “not needed for output” and that store pictures not included in the reference picture set of the current picture. DPB management unit 506 may also decrement the DPB fullness by the number of buffers that DPB management unit 506 emptied. When there is no empty buffer (i.e., the DPB fullness is equal to the DPB size), DPB management unit 506 may implement a “bumping” process described below. In some examples, when there is no empty buffer, DPB management unit 506 may implement the bumping process repeatedly until there is an empty buffer in which video decoder 30 can store the current decoded picture.
- video decoder 30 may implement the following steps to implement the bumping process.
- Video decoder 30 may first determine the picture to be outputted. For example, video decoder 30 may select the picture having the smallest PicOrderCnt (POC) value of all the pictures in DPB 504 that are marked as “needed for output.”
- Video decoder 30 may crop the selected picture using the cropping rectangle specified in the active sequence parameter set for the picture.
- Video decoder 30 may output the cropped picture, and may mark the picture as “not needed for output.”
- Video decoder 30 may check the buffer of DPB 504 that stored the cropped and outputted picture. If the picture is not included in the reference picture set, DPB management unit 506 may empty that buffer and may decrement the DPB fullness by the number of 8×8 blocks in the removed picture.
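The bumping steps listed above can be sketched in one function. This is an illustrative sketch under stated assumptions: pictures are plain dicts, the reference picture set is a set of POC values, and the cropping step is elided; none of the names come from an actual decoder.

```python
def bump(dpb, reference_picture_set):
    """Output one picture; free its buffer if it is no longer a reference.
    Returns the POC of the outputted picture."""
    # 1. Select the smallest POC among pictures marked "needed for output".
    pic = min((p for p in dpb["pictures"] if p["needed_for_output"]),
              key=lambda p: p["poc"])
    # 2-3. Crop (elided here) and output the picture, then mark it.
    pic["needed_for_output"] = False
    # 4. If the picture is not in the reference picture set, empty its
    #    buffer and decrement fullness by its number of 8x8 blocks.
    if pic["poc"] not in reference_picture_set:
        dpb["pictures"].remove(pic)
        dpb["fullness"] -= (pic["width"] * pic["height"]) >> 6
    return pic["poc"]
```

Repeating this until a buffer is free mirrors the repeated bumping described above when the DPB fullness equals the DPB size.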
- video encoder 20 may implement similar techniques. However, video encoder 20 implementing similar techniques is not required in every example. In some examples, video decoder 30 may implement these techniques, and video encoder 20 may not implement these techniques.
- a video coder may implement techniques to support CVSs having adaptive resolution.
- the reference picture set may identify the reference pictures that can potentially be used for inter-predicting the current picture and for inter-predicting one or more pictures following the current picture in decoding order.
- the DPB size or fullness may be signaled with respect to the number of 8×8 blocks of a picture stored in the DPB.
- the fullness of the DPB, i.e., the max_dec_pic_buffering syntax element, may be signaled based on the number of smallest coding units (SCUs) of a picture. For example, if the smallest SCU among all active SPSs is 16×16, then the unit of max_dec_pic_buffering may be 16×16 blocks.
- video encoder 20 or decoder 30 may signal the DPB size, indicated by the max_dec_pic_buffering syntax element, using units of frame buffers that are specific to the spatial resolution indicated by the SPS. For example, if there are two RSSs, rss 1 and rss 2 , with resolution res 1 and resolution res 2 , referring to SPS sps 1 and SPS sps 2 respectively, wherein res 1 is greater than res 2 , then max_dec_pic_buffering in sps 1 is counted in frame buffers of res 1 , and max_dec_pic_buffering in sps 2 is counted in frame buffers of res 2 .
- video encoder 20 or decoder 30 may be subject to the restriction that the DPB size, if counted in units of 8×8 blocks, indicated by the max_dec_pic_buffering value in sps 1 may not be less than that indicated by the max_dec_pic_buffering value in sps 2 . Consequently, in the DPB operations, when video decoder 30 removes one frame buffer of res 1 from DPB 504 , the freed buffer space may be sufficient for insertion of a decoded picture of either resolution. However, when decoder 30 removes one frame buffer of res 2 from DPB 504 , the freed buffer space may not be sufficient for insertion of a decoded picture of res 1 . Rather, video decoder 30 may remove multiple frame buffers of res 2 from DPB 504 in this case.
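The two-resolution accounting above can be made concrete with a small sketch. The helper names are assumptions for illustration: freeing one res 1 frame buffer always leaves room for a picture of either resolution, while several freed res 2 buffers may be needed to hold one res 1 picture.

```python
def frame_buffer_blocks(width: int, height: int) -> int:
    """Size of one frame buffer of a given resolution, in 8x8 blocks."""
    return (width * height) >> 6

def res2_buffers_needed_for_res1(res1, res2) -> int:
    """How many freed res2 frame buffers are needed to hold one res1 picture."""
    b1 = frame_buffer_blocks(*res1)
    b2 = frame_buffer_blocks(*res2)
    return -(-b1 // b2)  # ceiling division
```

For res 1 = 1920×1080 and res 2 = 960×540, one res 1 buffer (32400 blocks) holds a picture of either resolution, but four res 2 buffers (8100 blocks each) must be freed to fit one res 1 picture.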
- the video decoder 30 may derive the reference picture set in any manner, including the example techniques described above. Video decoder 30 may determine whether a decoded picture stored in the decoded picture buffer is not needed for output and is not identified in the reference picture set. When video decoder 30 has outputted the decoded picture and the decoded picture is not identified in the reference picture set, the video decoder 30 may remove the decoded picture from the decoded picture buffer. Subsequent to removing the decoded picture, video decoder 30 may code the current picture. For example, video decoder 30 may construct the reference picture list(s) as described above, and code the current picture based on the reference picture list(s).
- FIG. 6 is a flowchart illustrating an example operation of using a first sub-sequence and a second sub-sequence to decode video in accordance with the techniques of this disclosure.
- the method of FIG. 6 may be performed by a video coder corresponding to either video encoder 20 or video decoder 30 .
- the video coder may process a coded video sequence comprising a first sub-sequence and a second sub-sequence ( 601 ).
- the first sub-sequence may include one or more frames each having a first resolution
- the second sub-sequence may include one or more frames each having a second resolution.
- the first sub-sequence may be different than the second sub-sequence, and the first resolution may be different than the second resolution.
- the video coder may also process a first sequence parameter set (SPS) and a second sequence parameter set for the coded video sequence ( 602 ).
- the first sequence parameter set may indicate the first resolution of the one or more frames of the first sub-sequence
- the second sequence parameter set may indicate the second resolution of the one or more frames of the second sub-sequence.
- the first sequence parameter set may also be different than the second sequence parameter set.
- the video coder (e.g., video encoder 20 or video decoder 30 ) may then use the first sequence parameter set and the second sequence parameter set to code the coded video sequence.
- the video coder may comprise an encoder, e.g., encoder 20 of FIGS. 1-2 .
- processing SPSs and sub-sequences may comprise receiving the SPSs and sub-sequences.
- coding the first and second video sequences may comprise decoding the first and second video sequences.
- processing SPSs and sub-sequences may comprise generating the SPSs and sub-sequences.
- coding the first and second video sequences may comprise encoding the first and second video sequences.
- the video encoder may transmit the coded video sequence comprising the first sub-sequence and the second sub-sequence instead of receiving the video sequence comprising the first and second sub-sequences.
- the first resolution and the second resolution may each comprise a spatial resolution.
- the video coder may code the first sequence parameter set and the second sequence parameter in a received bitstream prior to either the first sub-sequence or the second sub-sequence.
- the video coder may be configured to receive both the first sequence parameter set and the second sequence parameter set prior to receiving either of the first sub-sequence and the second sub-sequence.
- the video coder may code the first sequence parameter set in a received bitstream prior to the first sub-sequence and the second sequence parameter set is coded in the received bitstream after at least one frame of the one or more frames of the first sub-sequence, and prior to the second sub-sequence.
- the video coder may be configured to receive the second sequence parameter set after receiving at least one frame of the one or more frames of the first sub-sequence, and prior to receiving the second sub-sequence.
- the video coder may interleave the one or more frames of the first sub-sequence and the one or more frames of the second sub-sequence in the coded video sequence.
- FIG. 7 is a flowchart illustrating an example operation of managing a decoded picture buffer.
- the method of FIG. 7 may be performed by a video coder corresponding to either video encoder 20 or video decoder 30 .
- a video coder may receive a coded video sequence comprising a first sub-sequence and a second sub-sequence ( 701 ).
- the first sub-sequence may include one or more frames each having a first resolution
- the second sub-sequence may include one or more frames each having a second resolution.
- the first sub-sequence may be different than the second sub-sequence, and the first resolution may be different than the second resolution.
- the video coder may receive a first decoded frame of video data, and the first decoded frame may be associated with a first resolution.
- the resolution may comprise a spatial resolution.
- the video coder may also determine whether a decoded picture buffer is available to store the first decoded frame based on the first resolution ( 702 ). In the event the decoded picture buffer is available to store the first decoded frame, the video coder may store the first decoded frame in the decoded picture buffer, and determine whether the decoded picture buffer is available to store a second decoded frame of video data. The second decoded frame of video data may be associated with a second resolution. The video coder may also determine whether the decoded picture buffer is available to store the second decoded frame based on the first resolution and the second resolution ( 704 ). The first decoded frame may also be different than the second decoded frame.
- the video coder may be configured to determine an amount of information that may be stored within the decoded picture buffer, determine an amount of information associated with the first decoded frame based on the first resolution, and compare the amount of information that may be stored within the decoded picture buffer and the amount of information associated with the first decoded frame.
- the video coder may be configured to determine an amount of information that may be stored within the decoded picture buffer based on the first resolution, determine an amount of information associated with the second decoded frame based on the second resolution, and compare the amount of information that may be stored within the decoded picture buffer and the amount of information associated with the second decoded frame.
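The availability determinations described above (steps 702 and 704 of FIG. 7) can be sketched as follows. This is a minimal illustration only: the `DecodedPictureBuffer` class, the capacity value, and the use of luma samples as the unit of "information" are assumptions made for the sketch, not part of the described techniques.

```python
class DecodedPictureBuffer:
    """Toy DPB whose capacity is measured in samples, not frame buffers.

    Illustrative assumption: the "amount of information" for a frame is
    taken to be width * height (luma samples); a real coder would also
    account for chroma planes and bit depth.
    """

    def __init__(self, capacity_samples):
        self.capacity = capacity_samples
        self.stored = []  # (width, height) of each buffered decoded frame

    def used(self):
        return sum(w * h for (w, h) in self.stored)

    def is_available_for(self, width, height):
        # Compare the DPB's free space against the amount of information
        # associated with the frame, derived from its spatial resolution.
        return self.used() + width * height <= self.capacity

    def insert(self, width, height):
        if not self.is_available_for(width, height):
            raise MemoryError("DPB has insufficient space for this resolution")
        self.stored.append((width, height))


dpb = DecodedPictureBuffer(capacity_samples=2 * 1920 * 1080)
dpb.insert(1920, 1080)                       # first decoded frame, first resolution (702)
can_store = dpb.is_available_for(1280, 720)  # second frame, second resolution (704)
```

Because availability is judged against each frame's own resolution, the same DPB may hold, for example, two 1920x1080 frames or a larger number of smaller frames.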
- such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer.
- any connection is properly termed a computer-readable medium.
- For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
- Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
- The instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry.
- The term "processor," as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein.
- the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
- the techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set).
- Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
Abstract
Techniques are described related to receiving first and second sub-sequences of video, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, receiving a first sequence parameter set and a second sequence parameter set for the coded video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set, and using the first sequence parameter set and the second sequence parameter set to decode the coded video sequence.
Description
- This application claims the benefit of:
- U.S. Provisional Application No. 61/545,525, filed Oct. 10, 2011, and
- U.S. Provisional Application No. 61/550,276, filed Oct. 21, 2011, the entire contents of each of which are hereby incorporated by reference.
- This disclosure relates to video coding and, more particularly, to techniques for coding video data.
- Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called “smart phones,” video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standard presently under development, and extensions of such standards. The video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video compression techniques.
- Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (i.e., a video picture or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as treeblocks, coding tree blocks (CTBs), coding tree units (CTUs), coding units (CUs) and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.
- Spatial or temporal prediction results in a predictive block for a block to be coded. Residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and the residual data indicating the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients, which then may be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned in order to produce a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.
- In general, this disclosure describes techniques for coding video sequences that include frames, or “pictures,” having different spatial resolutions. One aspect of this disclosure includes using multiple sequence parameter sets in a single resolution-adaptive coded video sequence to indicate a resolution of a sequence of pictures in coded video. As one example, the resolution-adaptive coded video sequence may comprise two or more sub-sequences which may be coded, wherein each sub-sequence may comprise a set of pictures with a common spatial resolution, and may refer to a same active sequence parameter set. Another aspect of this disclosure includes a novel activation process for activating a sequence parameter set when using multiple sequence parameter sets in a single resolution-adaptive coded video sequence, as described above.
- Yet another aspect of this disclosure includes novel techniques for managing a decoded picture buffer (DPB). As one example, a size of a DPB is not indicated using a number of frame buffers (e.g., a number of storage locations each capable of storing a frame, or “picture,” of a fixed size), consistent with some techniques, but rather using a different unit of size. As another example, before inserting a decoded picture into a DPB, the availability of the DPB to store the decoded picture is determined based on a spatial resolution of the decoded picture to be inserted, so as to ensure that the DPB includes sufficient empty buffer space for inserting the decoded picture. As still another example, after removing a decoded picture from a DPB, the availability of the DPB to store a subsequent decoded picture is determined based on a spatial resolution of the removed decoded picture, and a spatial resolution of the subsequent decoded picture to be inserted into the DPB. In other words, the proportion of the DPB unavailable to store decoded pictures, or a “fullness” of the DPB, after removing the decoded picture, is not decreased by an amount corresponding to a single decoded picture of a fixed size, consistent with some techniques, but rather by a varying amount, depending on the spatial resolution of the removed decoded picture.
-
FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may utilize techniques described in this disclosure. -
FIG. 2 is a block diagram illustrating an example video encoder that may implement the techniques described in this disclosure. -
FIG. 3 is a block diagram illustrating an example video decoder that may implement the techniques described in this disclosure. -
FIGS. 4A-4D are conceptual diagrams illustrating an example video sequence that includes a plurality of pictures that are encoded and transmitted in accordance with the techniques of this disclosure. -
FIG. 5 is a conceptual diagram illustrating the operation of a decoded picture buffer of a hypothetical reference decoder (HRD) model in accordance with the techniques of this disclosure. -
FIG. 6 is a flowchart illustrating an example operation of using a first sub-sequence and a second sub-sequence to decode video in accordance with the techniques of this disclosure. -
FIG. 7 is a flowchart illustrating an example operation of managing a decoded picture buffer in accordance with the techniques of this disclosure. - The techniques of this disclosure are generally related to techniques for using multiple sequence parameter sets (SPSs) for communicating video data at different resolutions, and techniques for managing the multiple SPSs. In the current High Efficiency Video Coding (HEVC) design, pictures in a same coded video sequence (CVS) have a same size, wherein the size is signaled in a sequence parameter set (SPS) for the CVS. Additional syntax information for the CVS, also signaled in the SPS, includes the Largest Coding Unit (LCU) size and the Smallest Coding Unit (SCU) size, which define a largest and a smallest block, or coding unit, size for each picture, respectively. In the context of H.264/AVC and High Efficiency Video Coding (HEVC), a CVS may refer to a sequence of coded pictures from an instantaneous decoding refresh (IDR) picture to another IDR picture, exclusive, in decoding order, or to the end of the coded video bitstream if the starting IDR picture is the last IDR picture in the coded video bitstream.
- However, HEVC may support resolution-adaptive video sequences that include frames with different resolutions. One method for adaptive frame size support is described in JCTVC-F158: Resolution switching for coding efficiency and resilience, Davies, 6th Meeting, Turin, IT, 14-22 Jul. 2011, referred to as JCTVC-F158 hereinafter.
- To support resolution-adaptive video, this disclosure describes techniques for coding multiple SPSs. Each SPS of the multiple SPSs may include information related to a sequence of pictures that has a different resolution. This disclosure also introduces a new sequence, referred to as a resolution sub-sequence (RSS), that may refer back to one of the multiple SPSs in order to indicate the resolution of a sequence of pictures. This disclosure also describes techniques for activating a single SPS when multiple parameter sets may be utilized within a single CVS, as well as different techniques and orders for transmitting the different SPSs.
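The use of multiple sequence parameter sets within one coded video sequence can be illustrated with a short sketch. The table and field names below are simplified stand-ins, not actual HEVC syntax structures; only the idea that each sub-sequence's pictures refer back to, and activate, one of several stored SPSs is taken from the description above.

```python
# Simplified SPS table: sps_id -> parameters signaled for a sub-sequence.
sps_table = {}

def receive_sps(sps_id, width, height):
    sps_table[sps_id] = {"width": width, "height": height}

def decode_picture(referenced_sps_id):
    # Activating the referenced SPS yields the resolution (and, in a real
    # codec, LCU/SCU sizes and other parameters) used to decode the picture.
    active_sps = sps_table[referenced_sps_id]
    return (active_sps["width"], active_sps["height"])

receive_sps(0, 1920, 1080)  # first sequence parameter set
receive_sps(1, 1280, 720)   # second sequence parameter set

# The first sub-sequence refers to SPS 0; the second refers to SPS 1.
first_sub_sequence = [decode_picture(0) for _ in range(3)]
second_sub_sequence = [decode_picture(1) for _ in range(2)]
```

Note that both SPSs are retained for the duration of the CVS, so frames of the two sub-sequences may be interleaved without re-signaling either parameter set.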
- The techniques of this disclosure are also related to techniques for managing a decoded picture buffer (DPB). For example, a video coder (e.g., a video encoder or a video decoder) includes a DPB. The DPB stores decoded pictures, including reference pictures. Reference pictures are pictures that can potentially be used for inter-predicting a picture. In other words, the video coder may predict a picture, during coding (encoding or decoding) of that picture, based on one or more reference pictures stored in the DPB.
- Decoded pictures used for predicting subsequent coded pictures, and for future output, are buffered in a Decoded Picture Buffer (DPB).
- To efficiently utilize memory of a DPB, DPB management processes, including a storage process of decoded pictures into the DPB, a marking process of reference pictures, and output and removal processes of decoded pictures from the DPB, are specified. DPB management includes at least the following aspects: (1) Picture identification and reference picture identification; (2) Reference picture list construction; (3) Reference picture marking; (4) Picture output from the DPB; (5) Picture insertion into the DPB; and (6) Picture removal from the DPB. Some introduction to reference picture marking and reference picture list construction is included below.
- Each CVS may include a number of reference pictures, which may be used to predict pixel values of other pictures (e.g., pictures that come before or after the reference picture). A video coder marks each reference picture, and stores the reference picture in the DPB. In previous video coding standards, such as H.264/AVC, the DPB includes a maximum number, referred to as M (num_ref_frames), of reference pictures used for inter-prediction in the active sequence parameter set. When a reference picture is decoded, the reference picture is marked as “used for reference.” If the decoding of the reference picture caused more than M pictures to be marked as “used for reference,” at least one picture must be marked as “unused for reference.” The DPB removal process then would remove pictures marked as “unused for reference” from the DPB if they are not needed for output as well.
- When a picture is decoded, the decoded picture may be either a non-reference picture or a reference picture. A reference picture may be a long-term reference picture or short-term reference picture, and when the decoded picture is marked as “unused for reference”, the decoded picture may become no longer needed for reference. In some video coding standards, there may be reference picture marking operations that change the status of the reference pictures.
- There may be at least two operation modes for reference picture marking: a sliding window operation mode and an adaptive memory control operation mode. The operation mode for reference picture marking may be selected on a picture basis. The sliding window operation mode may work as a first-in, first-out queue holding a fixed number of short-term reference pictures. In other words, the short-term reference pictures with the earliest decoding times may be the first to be removed (marked as pictures not used for reference), in an implicit fashion.
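The sliding window operation mode described above can be sketched as a first-in, first-out queue. The value of M and the picture identifiers are illustrative; real marking also interacts with long-term references and adaptive memory control, which are omitted here.

```python
from collections import deque

M = 4                    # e.g., num_ref_frames from the active SPS (example value)
short_term_refs = deque()
unused_for_reference = []

def mark_reference(pic_id):
    short_term_refs.append(pic_id)  # decoded picture marked "used for reference"
    if len(short_term_refs) > M:
        # Implicitly mark the short-term reference with the earliest
        # decoding time as "unused for reference" (first-in, first-out).
        unused_for_reference.append(short_term_refs.popleft())

for pic in range(6):
    mark_reference(pic)

# short_term_refs now holds the M most recently decoded references;
# pictures 0 and 1 were implicitly marked "unused for reference".
```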
- The video coder may also be tasked with constructing reference picture lists that indicate which reference pictures may be used for inter-prediction purposes. Two of these reference picture lists are referred to as
List 0 and List 1, respectively. The video coder firstly employs default construction techniques to construct List 0 and List 1 (e.g., preconfigured construction schemes for constructing List 0 and List 1). Optionally, after the initial List 0 and List 1 are constructed, the video decoder may decode syntax elements, when present, that instruct the video decoder to modify the initial List 0 and List 1. - The video encoder may signal syntax elements that are indicative of identifier(s) of reference pictures in the DPB, and the video encoder may also signal syntax elements that include indices, within
List 0, List 1, or both List 0 and List 1, that indicate which reference picture or pictures to use to decode a coded block of a current picture. The video decoder, in turn, uses the received identifier to identify the index value or values for a reference picture or reference pictures listed in List 0, List 1, or both List 0 and List 1. From the index value(s) as well as the identifier(s) of the reference picture or reference pictures, the video decoder retrieves the reference picture or reference pictures, or part(s) thereof, from the DPB, and decodes the coded block of the current picture based on the retrieved reference picture or pictures and one or more motion vectors that identify blocks within the reference picture or pictures that are used for decoding the coded block. - In the context of AVC and HEVC, a coded video sequence (CVS) refers to a sequence of coded frames, or “pictures,” ranging from an instantaneous decoding refresh (IDR) picture to another IDR picture, exclusive, in a decoding order, or to an end of a coded video bitstream if the starting IDR picture is the last IDR picture in the coded video bitstream.
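The list construction and index-based retrieval described above can be sketched as follows. The default ordering rule used here (past pictures nearest-first for List 0, future pictures nearest-first for List 1) is a simplification of the actual default construction processes, and the picture identifiers are hypothetical.

```python
# DPB contents: picture id -> decoded picture (placeholder strings here).
decoded_pictures = {10: "pic_A", 20: "pic_B", 30: "pic_C"}

def build_default_lists(dpb_ids, current_id):
    # Simplified default construction: List 0 favors past pictures,
    # List 1 favors future pictures, nearest to the current picture first.
    past = sorted((i for i in dpb_ids if i < current_id), reverse=True)
    future = sorted(i for i in dpb_ids if i > current_id)
    return past + future, future + past   # initial List 0, initial List 1

list0, list1 = build_default_lists(decoded_pictures.keys(), current_id=25)

# The encoder signals a reference index into List 0 (and/or List 1); the
# decoder uses it to retrieve the reference picture from the DPB.
ref_idx_l0 = 0
reference = decoded_pictures[list0[ref_idx_l0]]
```

Optional list-modification syntax, when present, would then reorder these initial lists before any indices are applied.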
- However, when coding a single CVS comprising pictures having at least two different spatial resolutions, with respect to some solutions based on HEVC, e.g., as described in JCTVC-F158, using a DPB having a size measured in pictures may cause a number of issues, which are described below.
- First, a sub-sequence of pictures with one resolution may have different coding parameters, such as an LCU size, than another sub-sequence of pictures with another, different resolution. Accordingly, it may not be sufficient to use a single active SPS to describe characteristics of a CVS comprising the sub-sequences of pictures with the different resolutions.
- Furthermore, different sub-sequences of a CVS may have reference pictures having different sizes, that is, different spatial resolutions. Accordingly, one set of particular parameters included in an SPS for the CVS, e.g., max_num_ref_frames, may be optimal for one sub-sequence, but sub-optimal for other sub-sequences included in the CVS.
- Additionally, some techniques for DPB management may no longer be effective when coding a single CVS that includes pictures having different resolutions. As one example, because the pictures having the different resolutions may correspond to the pictures having different sizes, a size of a DPB used to store the pictures can no longer be indicated using a number of frame buffers, e.g., a number of storage locations each capable of storing a frame, or “picture,” of a fixed size.
- Furthermore, to insert a decoded picture into the DPB, the DPB must include an empty frame buffer of a size that is sufficiently large to store the decoded picture. However, once again, because the pictures having the different resolutions may correspond to the pictures having different sizes, a frame buffer of a fixed size may not correspond to a size of a particular decoded picture to be inserted. Accordingly, merely determining whether the DPB includes an empty frame buffer of a fixed size may be insufficient to determine whether the DPB is available to store the decoded picture. As one example, the DPB may have less buffer space than is required to store the decoded picture.
- Similarly, after removing a decoded picture from the DPB, wherein the removed decoded picture has a resolution that corresponds to a size that is different than the size of the frame buffer, merely determining that the decoded picture has been removed from the DPB may be insufficient to determine whether the DPB is actually available to store a subsequent decoded picture having a particular resolution. Furthermore, the above determination is also insufficient to indicate the actual buffer space that may be available within the DPB for storing additional decoded pictures.
- In another example, a single empty frame buffer of a fixed size may exist within the DPB, and the DPB may store decoded picture(s) having a particular resolution in the frame buffer. However, if a video coder removes a decoded picture from the DPB, and the removed picture has a resolution that is smaller than the size of the frame buffer, sufficient buffer space may exist within the DPB to insert a decoded picture with a resolution that corresponds to a size that is larger than the size of the removed decoded picture. Accordingly, merely determining that a particular decoded picture has been removed from the DPB may be insufficient to indicate the actual buffer space that may be available within the DPB for storing additional decoded pictures.
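A small numeric sketch of the example above follows. The capacity and resolutions are arbitrary illustrative values; the point is that removing a small decoded picture can free enough space for a larger one, which a count of fixed-size frame buffers cannot express.

```python
capacity = 3_104_000                 # total DPB size, in samples (illustrative)
stored = [(1920, 1080), (640, 360)]  # (width, height) of buffered pictures

def free_samples():
    return capacity - sum(w * h for (w, h) in stored)

new_picture = 1280 * 720             # 921,600 samples to be inserted

fits_before = free_samples() >= new_picture   # DPB too full at first
stored.remove((640, 360))            # remove a smaller decoded picture
fits_after = free_samples() >= new_picture    # freed 640*360 = 230,400 samples
```

Under frame-buffer counting, removing one picture frees "one buffer" regardless of its size; under size-based accounting, the DPB's fullness decreases by exactly the removed picture's resolution-dependent amount.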
-
FIG. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize techniques described in this disclosure. In general, a reference picture set is defined as a set of reference pictures associated with a picture, consisting of all reference pictures that are prior to the associated picture in decoding order, that may be used for inter prediction of the associated picture or any picture following the associated picture in decoding order. In some examples, the reference pictures that are prior to the associated picture may be reference pictures until the next instantaneous decoding refresh (IDR) picture, or broken link access (BLA) picture. In other words, reference pictures in the reference picture set may all be prior to the current picture in decoding order. Also, the reference pictures in the reference picture set may be used for inter-predicting the current picture and/or inter-predicting any picture following the current picture in decoding order until the next IDR picture or BLA picture. -
- As used in this disclosure, reference pictures that can potentially be used for inter-prediction refer to reference pictures that can be used for inter-prediction, but do not necessarily have to be used for inter-prediction. For example, the reference picture set may identify reference pictures that can potentially be used for inter-prediction. However, this does not mean that all of the identified reference pictures must be used for inter-prediction. Rather, one or more of these identified reference pictures could be used for inter-prediction, but all do not necessarily have to be used for inter-prediction.
- As shown in
FIG. 1 ,system 10 includes asource device 12 that generates encoded video for decoding bydestination device 14.Source device 12 anddestination device 14 may each be an example of a video coding device.Source device 12 may transmit the encoded video todestination device 14 viacommunication channel 16 or may store the encoded video on astorage medium 17 or afile server 19, such that the encoded video may be accessed by thedestination device 14 as desired. -
Source device 12 and destination device 14 may comprise any of a wide range of devices, including wireless handsets such as so-called “smart” phones, so-called “smart” pads, or other such wireless devices equipped for wireless communication. Additional examples of source device 12 and destination device 14 include, but are not limited to, a digital television, a device in a digital direct broadcast system, a device in a wireless broadcast system, a personal digital assistant (PDA), a laptop computer, a desktop computer, a tablet computer, an e-book reader, a digital camera, a digital recording device, a digital media player, a video gaming device, a video game console, a cellular radio telephone, a satellite radio telephone, a video teleconferencing device, a video streaming device, a wireless communication device, or the like. - As indicated above, in many cases,
source device 12 and/or destination device 14 may be equipped for wireless communication. Hence, communication channel 16 may comprise a wireless channel, a wired channel, or a combination of wireless and wired channels suitable for transmission of encoded video data. Similarly, the file server 19 may be accessed by the destination device 14 through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. - The techniques of this disclosure, however, may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions, e.g., via the Internet, encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples,
system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony. - In the example of
FIG. 1 , source device 12 includes a video source 18, video encoder 20, a modulator/demodulator (MODEM) 22 and an output interface 24. In source device 12, video source 18 may include a source such as a video capture device, such as a video camera, a video archive containing previously captured video, a video feed interface to receive video from a video content provider, and/or a computer graphics system for generating computer graphics data as the source video, or a combination of such sources. As one example, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. However, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications. - The captured, pre-captured, or computer-generated video may be encoded by
video encoder 20. The encoded video information may be modulated by modem 22 according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14 via output interface 24. Modem 22 may include various mixers, filters, amplifiers or other components designed for signal modulation. Output interface 24 may include circuits designed for transmitting data, including amplifiers, filters, and one or more antennas. - The captured, pre-captured, or computer-generated video that is encoded by the
video encoder 20 may also be stored onto a storage medium 17 or a file server 19 for later consumption. The storage medium 17 may include Blu-ray discs, DVDs, CD-ROMs, flash memory, or any other suitable digital storage media for storing encoded video. The encoded video stored on the storage medium 17 may then be accessed by destination device 14 for decoding and playback. -
File server 19 may be any type of server capable of storing encoded video and transmitting that encoded video to the destination device 14. Example file servers include a web server (e.g., for a website), an FTP server, network attached storage (NAS) devices, a local disk drive, or any other type of device capable of storing encoded video data and transmitting it to a destination device. The transmission of encoded video data from the file server 19 may be a streaming transmission, a download transmission, or a combination of both. The file server 19 may be accessed by the destination device 14 through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, Ethernet, USB, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. -
Destination device 14, in the example of FIG. 1 , includes an input interface 26, a modem 28, a video decoder 30, and a display device 32. Input interface 26 of destination device 14 receives information over channel 16, as one example, or from storage medium 17 or file server 19, as alternate examples, and modem 28 demodulates the information to produce a demodulated bitstream for video decoder 30. The demodulated bitstream may include a variety of syntax information generated by video encoder 20 for use by video decoder 30 in decoding video data. Such syntax may also be included with the encoded video data stored on a storage medium 17 or a file server 19. As one example, the syntax may be embedded with the encoded video data, although aspects of this disclosure should not be considered limited to such a requirement. The syntax information defined by video encoder 20, which is also used by video decoder 30, may include syntax elements that describe characteristics and/or processing of video blocks, such as coding tree units (CTUs), coding tree blocks (CTBs), prediction units (PUs), coding units (CUs) or other units of coded video, e.g., video slices, video pictures, and video sequences or groups of pictures (GOPs). Each of video encoder 20 and video decoder 30 may form part of a respective encoder-decoder (CODEC) that is capable of encoding or decoding video data. -
Display device 32 may be integrated with, or external to, destination device 14. In some examples, destination device 14 may include an integrated display device and also be configured to interface with an external display device. In other examples, destination device 14 may be a display device. In general, display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device. - In the example of
FIG. 1 , communication channel 16 may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines, or any combination of wireless and wired media. Communication channel 16 may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet. Communication channel 16 generally represents any suitable communication medium, or collection of different communication media, for transmitting video data from source device 12 to destination device 14, including any suitable combination of wired or wireless media. Communication channel 16 may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14. -
Video encoder 20 and video decoder 30 may operate according to a video compression standard, such as ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, or ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions. In addition, there is a new video coding standard, namely the High Efficiency Video Coding (HEVC) standard, presently under development by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and ISO/IEC Moving Picture Experts Group (MPEG). A recent Working Draft (WD) of HEVC, referred to as HEVC WD8 hereinafter, is available, as of Jul. 20, 2012, from http://phenix.int-evry.fr/jct/doc_end_user/documents/10_Stockholm/wg11/JCTVC-J1003-v8.zip. - The techniques of this disclosure, however, are not limited to any particular coding standard. For purposes of illustration only, the techniques are described in accordance with the HEVC standard.
- Although not shown in
FIG. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP). -
Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more processors including microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable, non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. - Each of
video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device. In some instances, video encoder 20 and video decoder 30 may be commonly referred to as a video coder that codes information (e.g., pictures and syntax elements). The coding of information may refer to encoding when the video coder corresponds to video encoder 20. The coding of information may refer to decoding when the video coder corresponds to video decoder 30. -
FIG. 2 is a block diagram illustrating an example video encoder 20 that may implement the techniques described in this disclosure. Video encoder 20 may perform intra- and inter-coding of video blocks within video slices. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. Intra-mode (I mode) may refer to any of several spatial-based compression modes. Inter-modes, such as uni-directional prediction (P mode) or bi-prediction (B mode), may refer to any of several temporal-based compression modes. - In the example of
FIG. 2, video encoder 20 includes a partitioning unit 35, prediction processing unit 41, summer 50, transform processing unit 52, quantization unit 54, entropy encoding unit 56, decoded picture buffer (DPB) 64, and DPB management unit 65. Prediction processing unit 41 includes motion estimation unit 42, motion compensation unit 44, and intra prediction unit 46. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform unit 60, and summer 62. A deblocking filter (not shown in FIG. 2) may also be included to filter block boundaries to remove blockiness artifacts from reconstructed video. If desired, the deblocking filter would typically filter the output of summer 62. Additional loop filters (in loop or post loop) may also be used in addition to the deblocking filter. - As shown in
FIG. 2, video encoder 20 receives video data, and partitioning unit 35 partitions the data into video blocks. This partitioning may also include partitioning into slices, tiles, or other larger units, as well as video block partitioning, e.g., according to a quadtree structure of LCUs and CUs. Video encoder 20 generally illustrates the components that encode video blocks within a video slice to be encoded. The slice may be divided into multiple video blocks (and possibly into sets of video blocks referred to as tiles). Prediction processing unit 41 may select one of a plurality of possible coding modes, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes, for the current video block based on error results (e.g., coding rate and the level of distortion). Prediction processing unit 41 may provide the resulting intra- or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use as a reference picture. -
Intra prediction unit 46 within prediction processing unit 41 may perform intra-predictive coding of the current video block relative to one or more neighboring blocks in the same picture or slice as the current block to be coded to provide spatial compression. Motion estimation unit 42 and motion compensation unit 44 within prediction processing unit 41 perform inter-predictive coding of the current video block relative to one or more predictive blocks in one or more reference pictures to provide temporal compression. - Motion estimation unit 42 may be configured to determine the inter-prediction mode for a video slice according to a predetermined pattern for a video sequence. The predetermined pattern may designate video slices in the sequence as P slices or B slices. Motion estimation unit 42 and
motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, may indicate the displacement of a PU of a video block within a current video picture relative to a predictive block within a reference picture. - A predictive block is a block that is found to closely match the PU of the video block to be coded in terms of pixel difference, which may be determined by sum of absolute difference (SAD), sum of square difference (SSD), or other difference metrics. In some examples,
video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in decoded picture buffer 64. For example, video encoder 20 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Therefore, motion estimation unit 42 may perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision. - Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in decoded
picture buffer 64. Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44. - Motion compensation, performed by
motion compensation unit 44, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation, possibly performing interpolations to sub-pixel precision. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the predictive block to which the motion vector points in one of the reference picture lists. Video encoder 20 forms a residual video block by subtracting pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values. The pixel difference values form residual data for the block, and may include both luma and chroma difference components. Summer 50 represents the component or components that perform this subtraction operation. Motion compensation unit 44 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice. -
Intra-prediction unit 46 may intra-predict a current block, as an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, as described above. In particular, intra-prediction unit 46 may determine an intra-prediction mode to use to encode a current block. In some examples, intra-prediction unit 46 may encode a current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction unit 46 (or mode select unit 40, in some examples) may select an appropriate intra-prediction mode to use from the tested modes. For example, intra-prediction unit 46 may calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes, and select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as a bit rate (that is, a number of bits) used to produce the encoded block. Intra-prediction unit 46 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block. - After selecting an intra-prediction mode for a block,
intra-prediction unit 46 may provide information indicative of the selected intra-prediction mode for the block to entropy encoding unit 56. Entropy encoding unit 56 may encode the information indicating the selected intra-prediction mode in accordance with the techniques of this disclosure. Video encoder 20 may include in the transmitted bitstream configuration data, which may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables), definitions of encoding contexts for various blocks, and indications of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table to use for each of the contexts. - After
prediction processing unit 41 generates the predictive block for the current video block via either inter-prediction or intra-prediction, video encoder 20 forms a residual video block by subtracting the predictive block from the current video block. The residual video data in the residual block may be included in one or more TUs and applied to transform processing unit 52. Transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform. Transform processing unit 52 may convert the residual video data from a pixel domain to a transform domain, such as a frequency domain. - Transform processing
unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may then perform a scan of the matrix including the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan. - Following quantization,
entropy encoding unit 56 entropy encodes the quantized transform coefficients. For example, entropy encoding unit 56 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding methodology or technique. Following the entropy encoding by entropy encoding unit 56, the encoded bitstream may be transmitted to video decoder 30, or archived for later transmission or retrieval by video decoder 30. Entropy encoding unit 56 may also entropy encode the motion vectors and the other syntax elements for the current video slice being coded. -
Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block of a reference picture. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the reference pictures within one of the reference picture lists. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reference block for storage in decoded picture buffer 64. The reference block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-predict a block in a subsequent video frame or picture. - In accordance with this disclosure,
prediction processing unit 41 represents one example unit for performing the example functions described above. For example, prediction processing unit 41 may encode syntax elements that support the use of adaptive resolution CVSs. Prediction processing unit 41 may also generate SPSs that may be activated by one or more resolution sub-sequences, and transmit the SPSs and RSSs to a video decoder. Each of the SPSs may include resolution information for one or more sequences of pictures. Prediction processing unit 41 may also receive and order one or more SPSs and cause video encoder 20 to code information indicative of the reference pictures that belong to the reference picture set. In addition, DPB management unit 65 may also perform techniques related to the management of DPB 64. - Also, during the reconstruction process (e.g., the process used to reconstruct a picture for use as a reference picture and storage in DPB 64),
prediction processing unit 41 may construct the plurality of reference picture subsets that each identifies one or more of the reference pictures. Prediction processing unit 41 may also derive the reference picture set from the constructed plurality of reference picture subsets. Also, prediction processing unit 41 and DPB management unit 65 may implement any one or more of the sets of example pseudo code described below to implement one or more example techniques described in this disclosure. - In accordance with the techniques of this disclosure,
prediction processing unit 41 may generate a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution. The second sub-sequence may include one or more frames each having a second resolution. The first sub-sequence may be different than the second sub-sequence, and the first resolution may be different than the second resolution. Prediction processing unit 41 may further generate a first sequence parameter set and a second sequence parameter set for the video sequence. The first sequence parameter set may indicate the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set may indicate the second resolution of the one or more frames of the second sub-sequence. Also, the first sequence parameter set may be different than the second sequence parameter set. Prediction processing unit 41 may transmit the coded video sequence comprising the first sub-sequence and the second sub-sequence, and the first sequence parameter set and the second sequence parameter set. In some examples, the resolution may comprise a spatial resolution. -
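The relationship described above, between sub-sequences, their frames, and the sequence parameter set each sub-sequence activates, can be sketched as a simple data model. The class and field names below are illustrative only, not part of any standard or of the claimed syntax:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SequenceParameterSet:
    sps_id: int
    width: int    # spatial resolution of the sub-sequence, in luma samples
    height: int

@dataclass
class SubSequence:
    sps_id: int                # the SPS this sub-sequence activates
    frames: List[str] = field(default_factory=list)

# Two distinct SPSs, one per resolution sub-sequence of the coded video sequence.
sps_list = [SequenceParameterSet(sps_id=0, width=1920, height=1080),
            SequenceParameterSet(sps_id=1, width=1280, height=720)]

# The coded video sequence may interleave frames of the two sub-sequences;
# each frame inherits its resolution from the SPS its sub-sequence refers to.
cvs = [SubSequence(sps_id=0, frames=["f0", "f2"]),
       SubSequence(sps_id=1, frames=["f1", "f3"])]
```

In this sketch, both parameter sets exist before any frame of either sub-sequence, which mirrors the option of transmitting both SPSs ahead of the coded frames.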
Prediction processing unit 41 may also alter the coding of the sequence parameter sets. For example, prediction processing unit 41 may code the first sequence parameter set and the second sequence parameter set in a transmitted bitstream prior to either the first sub-sequence or the second sub-sequence. Prediction processing unit 41 may also interleave in the coded video sequence the one or more frames of the first sub-sequence and the one or more frames of the second sub-sequence. - In some examples, to transmit the first sequence parameter set and the second sequence parameter set of the coded video sequence,
prediction processing unit 41 may be configured to transmit both the first sequence parameter set and the second sequence parameter set prior to transmitting either of the first sub-sequence and the second sub-sequence. In another example, to transmit the first sequence parameter set and the second sequence parameter set of the coded video sequence, prediction processing unit 41 may be configured to transmit the second sequence parameter set after transmitting at least one frame of the one or more frames of the first sub-sequence, and prior to transmitting the second sub-sequence. - In some examples,
prediction processing unit 41 may code the first sequence parameter set in a transmitted bitstream prior to coding the first sub-sequence, and prediction processing unit 41 may also code the second sequence parameter set in the transmitted bitstream after at least one frame of the one or more frames of the first sub-sequence and prior to the second sub-sequence. -
Decoded picture buffer 64, decoded picture buffer management unit 65, and video encoder 20 may also perform the techniques of this disclosure. In some examples, decoded picture buffer 64 may receive a first decoded frame of video data, wherein the first decoded frame is associated with a first resolution. DPB management unit 65 may determine whether decoded picture buffer 64 is available to store the first decoded frame based on the first resolution and, in the event decoded picture buffer 64 is available to store the first decoded frame, store the first decoded frame in decoded picture buffer 64. DPB management unit 65 may further determine whether decoded picture buffer 64 is available to store a second decoded frame of video data, wherein the second decoded frame is associated with a second resolution, based on the first resolution and the second resolution, wherein the first decoded frame is different than the second decoded frame. - In some additional examples,
DPB management unit 65 may determine an amount of information that may be stored within decoded picture buffer 64, determine an amount of information associated with the first decoded frame based on the first resolution, and compare the amount of information that may be stored within decoded picture buffer 64 and the amount of information associated with the first decoded frame. - In one example, to determine whether decoded
picture buffer 64 is available to store the second decoded frame based on the first resolution and the second resolution, DPB management unit 65 may be configured to determine an amount of information that may be stored within decoded picture buffer 64 based on the first resolution, determine an amount of information associated with the second decoded frame based on the second resolution, and compare the amount of information that may be stored within decoded picture buffer 64 and the amount of information associated with the second decoded frame. DPB management unit 65 may also be configured to remove the first decoded frame from decoded picture buffer 64, and in some examples, the resolution may comprise a spatial resolution. - The techniques described in this disclosure may refer to
video encoder 20 signaling information. When video encoder 20 signals information, the techniques of this disclosure generally refer to any manner in which video encoder 20 provides the information in a coded bitstream. For example, when video encoder 20 signals syntax elements to video decoder 30, it may mean that video encoder 20 transmitted the syntax elements to video decoder 30 as part of a coded bitstream via output interface 24 and communication channel 16, or that video encoder 20 stored the syntax elements in a coded bitstream on storage medium 17 and/or file server 19 for eventual reception by video decoder 30. In this way, signaling from video encoder 20 to video decoder 30 should not be interpreted as requiring transmission directly from video encoder 20 to video decoder 30, although this may be one possibility for real-time video applications. In other examples, however, signaling from video encoder 20 to video decoder 30 should be interpreted as any technique with which video encoder 20 provides information in a bitstream for eventual reception by video decoder 30, either directly or via intermediate storage (e.g., in storage medium 17 and/or file server 19). -
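The resolution-aware DPB availability determinations described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the class and method names are invented, and the capacity accounting in 8×8 blocks follows the proposal elsewhere in this disclosure to express the required DPB size in units of 8×8 blocks:

```python
def blocks_8x8(width, height):
    # Amount of information a decoded frame occupies, in 8x8 luma blocks.
    return ((width + 7) // 8) * ((height + 7) // 8)

class DecodedPictureBuffer:
    def __init__(self, capacity_blocks):
        # Capacity in 8x8 blocks, e.g., derived from max_dec_pic_buffering.
        self.capacity = capacity_blocks
        self.used = 0
        self.frames = []

    def can_store(self, width, height):
        # Availability depends on the frame's resolution,
        # not on a fixed frame count.
        return self.used + blocks_8x8(width, height) <= self.capacity

    def store(self, frame, width, height):
        if not self.can_store(width, height):
            raise MemoryError("DPB cannot hold a frame of this resolution")
        self.frames.append((frame, width, height))
        self.used += blocks_8x8(width, height)

    def remove(self, frame):
        # Removing a frame frees capacity proportional to its resolution.
        for entry in self.frames:
            if entry[0] == frame:
                self.frames.remove(entry)
                self.used -= blocks_8x8(entry[1], entry[2])
                return
```

With this accounting, a buffer that already holds a 1920×1080 frame may have room for a small frame of a lower-resolution sub-sequence but not for a second large one, which is the comparison DPB management unit 65 is described as performing.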
Video encoder 20 and video decoder 30 may be configured to implement the example techniques described in this disclosure for coding, transmitting, receiving, and activating SPSs and RSSs, as well as for managing the DPB. For example, video encoder 20 may invoke the techniques to support adaptive resolution CVSs and to add and remove reference pictures from the DPB. Video decoder 30 may invoke the process in a similar manner. - To support SPSs in a single adaptive-resolution CVS,
prediction processing unit 41 may utilize RSSs. Each RSS may indicate information, such as a resolution of a series of coded video pictures of a CVS. Prediction processing unit 41 may use one resolution sub-sequence (RSS) at a given time. Each RSS may reference a single SPS. As an example, if there are “n” RSSs in a given CVS, there may be, altogether, “n” active SPSs when decoding the CVS. However, in some examples, multiple RSSs may refer to a single SPS in a CVS. The SPS or PPS may indicate the different resolution of each RSS. The SPS or PPS may include a resolution ID as well as a syntax element that indicates the resolution associated with each resolution ID. - In accordance with the techniques of this disclosure, a computer-readable storage medium may include a data structure that represents CVSs, SPSs, and RSSs. In particular, the data structure may include a coded video sequence comprising a first sub-sequence and a second sub-sequence. The first sub-sequence may include one or more frames each having a first resolution, and the second sub-sequence may include one or more frames each having a second resolution. The first sub-sequence may also be different than the second sub-sequence, and the first resolution may be different than the second resolution. The data structure may further comprise a first sequence parameter set and a second sequence parameter set for the coded video sequence. The first sequence parameter set may indicate the first resolution of the one or more frames of the first sub-sequence, the second sequence parameter set may indicate the second resolution of the one or more frames of the second sub-sequence, and the first sequence parameter set may be different than the second sequence parameter set.
-
Prediction processing unit 41 of video encoder 20 may order or restrict each of the RSSs according to spatial resolution characteristics of each RSS. In general, prediction processing unit 41 may order the SPSs based on their horizontal resolutions. As an example, if a horizontal size of a resolution “A” of an SPS is greater than that of a resolution “B” of an SPS, a vertical size of the resolution “A” may not be less than that of the resolution “B.” With this restriction, a resolution “C” of an SPS may be considered to be larger than a resolution “D” of an SPS as long as one of a horizontal size and a vertical size of the resolution “C” is greater than a corresponding size of the resolution “D.” Video encoder 20 may assign an RSS with a largest spatial resolution a resolution ID equal to “0,” an RSS with a second largest spatial resolution a resolution ID equal to “1,” and so forth. - In some examples,
prediction processing unit 41 may not signal a resolution ID. Rather, video encoder 20 may derive the resolution ID according to the spatial resolutions of the RSSs. Prediction processing unit 41 may still order each of the RSSs in each CVS according to the spatial resolutions of each RSS, as described above. The RSS with the largest spatial resolution is assigned a resolution ID equal to 0, the RSS with the second largest spatial resolution is assigned a resolution ID equal to 1, and so on. - For any RSS with a resolution ID equal to “rId,” during inter-prediction,
prediction processing unit 41 may refer to decoded pictures only within the same RSS, within an RSS with a resolution ID equal to “rId−1,” or within an RSS with a resolution ID equal to “rId+1.” Prediction processing unit 41 may not refer to decoded pictures within other RSSs when performing inter-prediction. - In some examples, there may be additional restrictions on inter-prediction amongst RSSs. In one instance,
prediction processing unit 41 may only perform inter-prediction of blocks from two adjacent RSSs, i.e., the RSS with the immediately larger spatial resolution and the RSS with the immediately smaller spatial resolution. In another example, prediction processing unit 41 may not be limited to performing inter-prediction using spatially-neighboring RSSs, and prediction processing unit 41 may perform inter-prediction using any RSS, not just spatially neighboring RSSs (e.g., RSSs with rId+1 or rId−1). - The techniques of this disclosure may also include processes and techniques for transmitting and activating picture parameter sets (PPSs). The use of PPSs may decouple the transmission of infrequently changing information from the transmission of coded block data for the CVSs.
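The RSS ordering rule and the adjacent-RSS reference restriction described above can be sketched together. Because the ordering restriction guarantees that a larger horizontal size never comes with a smaller vertical size, sorting by width (then height) in descending order yields the required ID assignment; the function names are illustrative only:

```python
def assign_resolution_ids(resolutions):
    # resolutions: one (width, height) pair per RSS in the CVS.
    # The largest spatial resolution receives resolution ID 0,
    # the second largest receives ID 1, and so on.
    ordered = sorted(resolutions, key=lambda wh: (wh[0], wh[1]), reverse=True)
    return {res: rid for rid, res in enumerate(ordered)}

def may_reference(current_rid, ref_rid):
    # Under the adjacency restriction, inter-prediction may use decoded
    # pictures only from the same RSS or from an RSS whose resolution ID
    # is rId - 1 or rId + 1.
    return abs(current_rid - ref_rid) <= 1
```

The second, looser example in the text (prediction from any RSS) simply drops the `may_reference` check.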
Video encoder 20 and video decoder 30 may, in some applications, convey or signal the SPSs and PPSs “out-of-band,” or using a different communication channel than that used to communicate the coded block data of the CVSs, e.g., using a reliable transport mechanism. - A PPS raw byte sequence payload (RBSP) may include parameters to which coded slice network abstraction layer (NAL) units of one or more coded pictures may refer. Each PPS RBSP is initially considered not active at a start of a decoding process. At most one PPS RBSP is considered active at any given moment during the decoding process, and activation of any particular PPS RBSP results in deactivation of a previously-active PPS RBSP, if any.
- In some examples,
prediction processing unit 41 of video encoder 20 and prediction processing unit 81 of video decoder 30 may support RSSs each having the same resolution aspect ratio. In other examples, video encoder 20 and video decoder 30 may support different RSSs having different resolution aspect ratios among the different RSSs. The resolution aspect ratio of an RSS may be defined as the proportion of the width of an RSS versus the height of the RSS. - In the example where
prediction processing units 41 and 81 support RSSs having different resolution aspect ratios, cropping of reference pictures may be needed when performing inter-prediction across RSSs. - In order to support CVSs with adaptive resolution, the techniques of this disclosure propose adding the following syntax structures to the SPS. The syntax elements may include a profile indicator or a flag that indicates the existence of more than one spatial resolution in the CVS. Alternatively, no flag may be added, but the existence of the more than one spatial resolution in the CVS may be indicated by a particular value of the profile indicator, which may be denoted as profile_idc. Additionally, the syntax elements may include a resolution ID, a syntax element that indicates a spatial relationship between the current resolution sub-sequence and an adjacent spatial resolution sub-sequence, and a syntax element that indicates the required size of the DPB in units of 8×8 blocks.
- According to the techniques of this disclosure, a modified SPS RBSP syntax structure may be expressed as shown below in Table I:
-
TABLE I

seq_parameter_set_rbsp( ) {                                    Descriptor
    profile_idc                                                u(8)
    reserved_zero_8bits /* equal to 0 */                       u(8)
    level_idc                                                  u(8)
    seq_parameter_set_id                                       ue(v)
    max_temporal_layers_minus1                                 u(3)
    pic_width_in_luma_samples                                  u(16)
    pic_height_in_luma_samples                                 u(16)
    bit_depth_luma_minus8                                      ue(v)
    bit_depth_chroma_minus8                                    ue(v)
    pcm_bit_depth_luma_minus1                                  u(4)
    pcm_bit_depth_chroma_minus1                                u(4)
    log2_max_pic_order_cnt_lsb_minus4                          ue(v)
    max_num_ref_frames                                         ue(v)
    log2_min_coding_block_size_minus3                          ue(v)
    log2_diff_max_min_coding_block_size                        ue(v)
    log2_min_transform_block_size_minus2                       ue(v)
    log2_diff_max_min_transform_block_size                     ue(v)
    log2_min_pcm_coding_block_size_minus3                      ue(v)
    max_transform_hierarchy_depth_inter                        ue(v)
    max_transform_hierarchy_depth_intra                        ue(v)
    chroma_pred_from_luma_enabled_flag                         u(1)
    loop_filter_across_slice_flag                              u(1)
    sample_adaptive_offset_enabled_flag                        u(1)
    adaptive_loop_filter_enabled_flag                          u(1)
    pcm_loop_filter_disable_flag                               u(1)
    cu_qp_delta_enabled_flag                                   u(1)
    temporal_id_nesting_flag                                   u(1)
    inter_4x4_enabled_flag                                     u(1)
    adaptive_spatial_resolution_flag                           u(1)
    if( adaptive_spatial_resolution_flag ) {
        resolution_id                                          ue(v)
        for( i = 0; i < 2; i++ ) {
            cropping_resolution_idc[ i ]                       u(2)
            if( cropping_resolution_idc[ i ] & 0x01 ) {
                cropped_left[ i ]                              ue(v)
                cropped_right[ i ]                             ue(v)
            }
            if( cropping_resolution_idc[ i ] & 0x10 ) {
                cropped_top[ i ]                               ue(v)
                cropped_bottom[ i ]                            ue(v)
            }
        }
    }
    max_dec_pic_buffering                                      ue(v)
    rbsp_trailing_bits( )
}

- An exemplary description of the new SPS syntax elements in Table I is set forth in more detail below.
- adaptive_spatial_resolution_flag: When equal to “1,” the flag indicates that a CVS containing an RSS referring to an SPS may contain pictures with different spatial resolutions. When equal to “0,” the flag indicates that all pictures in the CVS have a same spatial resolution, or equivalently, that there is only one RSS in the CVS. This syntax element applies to the entire CVS, and its value shall be identical for all SPSs that may be activated for the CVS.
- The adaptive_spatial_resolution flag is only one example of how adaptive resolution CVSs may be implemented. As another example, there may be one or more profiles defined that enable adaptive spatial resolution. Accordingly, the value of the profile_idc syntax element, which may indicate the selection of an adaptive resolution profile, may signal the enablement of adaptive resolution.
- resolution_id: Specifies an identifier of the RSS referring to the SPS. A value of resolution_id may be in a range of “0” to “7,” inclusive. An RSS with a largest spatial resolution among all RSSs in the CVS may have resolution_id equal to “0.”
- cropping_resolution_idc[i]: Indicates whether cropping is needed to specify a reference region of a reference picture from a target RSS, as defined below, used for inter-prediction as a reference when decoding a coded picture from a current RSS.
- The pseudocode that follows describes one example of how the numbering of an RSS using the resolution_id value that refers to an SPS may be implemented according to the techniques of this disclosure.
-
- Let “rId” be a resolution_id of the current RSS;
- The target RSS is the RSS with a resolution_id equal to: rId+(i==0?−1:1);
- If the current RSS has a resolution_id equal to 0, cropping_resolution_idc[0]=0
- If the current RSS has a largest resolution_id among all RSSs in the CVS, cropping_resolution_idc[1]=0
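The target-RSS derivation and the inferred defaults described in the pseudocode above can be sketched in Python as follows (a minimal illustration with hypothetical function names; not part of any codec specification):

```python
def target_resolution_id(rid, i):
    """Derive the resolution_id of the target RSS for index i.

    Per the pseudocode above, the target RSS has resolution_id equal to
    rId + (i == 0 ? -1 : 1): i == 0 points to the next-larger resolution,
    i == 1 to the next-smaller resolution.
    """
    return rid + (-1 if i == 0 else 1)


def default_cropping_idc(rid, max_rid):
    """Inferred cropping_resolution_idc defaults for an RSS.

    Returns a dict mapping i -> inferred value, or None where the value
    must be signaled. An RSS with resolution_id 0 has no larger target
    (cropping_resolution_idc[0] = 0); the RSS with the largest
    resolution_id has no smaller target (cropping_resolution_idc[1] = 0).
    """
    idc = {0: None, 1: None}
    if rid == 0:
        idc[0] = 0
    if rid == max_rid:
        idc[1] = 0
    return idc
```

For example, an RSS with resolution_id 3 has targets with resolution_id 2 (for i = 0) and 4 (for i = 1).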
- As described above, the techniques of this disclosure may enable RSSs and SPSs that may have different aspect ratios. When performing inter-prediction,
video encoder 20 may predict the pixel values of a block from a block of a reference picture that has a different aspect ratio. Because of the difference in the aspect ratios, video encoder 20 may crop a portion of the block of the reference picture in order to obtain a block with a resolution aspect ratio similar to that of the predictive block. The following syntax elements describe how video encoder 20 may perform cropping of blocks to obtain blocks with different resolution aspect ratios. - Cropping_resolution_idc[i] equal to “0” indicates that the target RSS does not exist, or that no cropping is needed.
- Cropping_resolution_idc[i] equal to “1” indicates that cropping at a left and/or right side is needed.
- Cropping_resolution_idc[i] equal to “2” indicates that cropping at a top and/or bottom is needed.
- Cropping_resolution_idc[i] equal to “3” indicates that cropping at both the left/right and the top/bottom is needed.
- Table II below illustrates the various values of Cropping_resolution_idc[i], and the corresponding indications.
-
TABLE II

cropping_resolution_idc[ i ]   Indication
0                              No cropping is needed
1                              Cropping may happen at the left and/or right side
2                              Cropping may happen at the top and/or bottom
3                              Cropping may happen at both left/right and top/bottom

- In addition to the “cropping_resolution_idc” value, the RBSP of an SPS may also include syntax elements that may indicate the number of pixels to be cropped from the top, bottom, left, and/or right of a reference picture from an RSS. These additional cropping syntax elements are described in further detail below.
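The conditional parsing of the cropping syntax elements in Table I, with the Table II semantics, can be sketched as follows. This is a hypothetical parser, not code from the disclosure; note that the `& 0x10` test in Table I is treated here as a test of bit 1 (`& 0x02`), since Table II assigns top/bottom cropping to the value 2 and `0x10` appears to be a typographical artifact:

```python
def parse_cropping(idc, read_ue):
    """Return (left, right, top, bottom) crop amounts for one target RSS.

    idc is cropping_resolution_idc[i] (0..3); read_ue is a callable that
    reads one ue(v)-coded value from the bitstream. Values that are not
    signaled are inferred to be 0, as specified for
    cropped_left/right/top/bottom.
    """
    left = right = top = bottom = 0
    if idc & 0x01:   # value 1 or 3: left/right cropping is signaled
        left = read_ue()
        right = read_ue()
    if idc & 0x02:   # value 2 or 3: top/bottom cropping is signaled
        top = read_ue()
        bottom = read_ue()
    return left, right, top, bottom
```

For example, with idc equal to 1 only cropped_left[i] and cropped_right[i] are read, and the top/bottom amounts are inferred as 0.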
- cropped_left[i]: Specifies a number of pixels to be cropped at a left side of a luma component of the reference picture from the target RSS, to specify the reference region. When not present,
video encoder 20 may infer the value to be equal to “0.” - cropped_right[i]: Specifies a number of pixels to be cropped at a right side of the luma component of the reference picture from the target RSS, to specify the reference region. When not present,
video encoder 20 may infer the value to be equal to “0.” - cropped_top[i]: Specifies a number of pixels to be cropped at a top of the luma component of the reference picture from the target RSS, to specify the reference region. When not present,
video encoder 20 may infer the value to be equal to “0.” - cropped_bottom[i]: Specifies a number of pixels to be cropped at a bottom of the luma component of the reference picture from the target RSS, to specify the reference region. When not present,
video encoder 20 may infer the value to be equal to “0.” - In addition to signaling a bottom, top, left, and/or right cropping,
video encoder 20 may signal the cropping window in other ways. As an example, video encoder 20 may signal the cropping window as the starting vertical and horizontal positions plus the width and height. As another example, video encoder 20 may signal the cropping window as the starting vertical and horizontal positions and the ending vertical and horizontal positions. - Before
prediction processing unit 41 may use a coded picture in the current RSS, prediction processing unit 41 may crop a decoded picture from the target RSS as specified by the above cropping syntax elements. Prediction processing unit 41 may also scale the cropped reference picture to be the same resolution as the coded picture in the current RSS, and scale the motion vectors of the cropped block accordingly. - As described above,
video encoder 20 may include DPB 64, which may contain decoded pictures. DPB management unit 65 may manage DPB 64. Each decoded picture contained within DPB 64 may be needed either for inter-prediction as a reference, or for future output. In accordance with the techniques of this disclosure, DPB 64 may be modified to support adaptive-resolution CVSs, and more generally to store frames of different sizes. - In accordance with the techniques of this disclosure, upon initialization, the DPB may be empty (i.e., an indication of a proportion of
DPB 64 that is unavailable to store decoded pictures, or DPB “fullness,” is set to “0”). When a decoded picture is stored in DPB 64, DPB management unit 65 may increment the “fullness” of the DPB by the number of blocks (e.g., CUs or 8×8 pixel blocks) in the picture. Similarly, when DPB management unit 65 removes a decoded picture from DPB 64, DPB management unit 65 may decrease the fullness of the DPB by the number of blocks (e.g., CUs or 8×8 pixel blocks) in the removed picture. - To support a DPB that utilizes a block count rather than a frame count to indicate the “fullness” of the DPB, the RBSP of an SPS may include a syntax element that specifies a size of the DPB in 8×8 blocks. The parameter, denoted as max_dec_pic_buffering, specifies a required size of a decoded picture buffer (DPB), in units of 8×8 blocks, for decoding the CVS. This syntax element may apply to the entire CVS, and its value is identical for all SPSs that may be activated for the CVS. Further detail of the operation of the DPB is described with respect to
FIG. 5 , below. -
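The block-based fullness accounting described above can be sketched in Python as follows (a minimal illustration; the class and its names are hypothetical, and fullness is counted in 8×8 luma blocks, i.e., (width × height) >> 6):

```python
class BlockCountDPB:
    """Toy DPB whose fullness is measured in 8x8 blocks, not frames."""

    def __init__(self, max_dec_pic_buffering):
        self.capacity = max_dec_pic_buffering  # DPB size in 8x8 blocks
        self.fullness = 0                      # initially empty
        self.pictures = []

    @staticmethod
    def blocks(width, height):
        # (pic_width_in_luma_samples * pic_height_in_luma_samples) >> 6
        return (width * height) >> 6

    def store(self, pic_id, width, height):
        cost = self.blocks(width, height)
        if self.fullness + cost > self.capacity:
            raise MemoryError("DPB full")
        self.pictures.append((pic_id, cost))
        self.fullness += cost

    def remove(self, pic_id):
        for i, (pid, cost) in enumerate(self.pictures):
            if pid == pic_id:
                del self.pictures[i]
                self.fullness -= cost
                return
        raise KeyError(pic_id)
```

With this accounting, a 1920×1080 picture occupies (1920×1080) >> 6 = 32400 blocks while a 960×540 picture occupies 8100 blocks, so the same DPB budget can hold different mixes of resolutions.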
FIG. 3 is a block diagram illustrating an example video decoder 30 that may implement the techniques described in this disclosure. In the example of FIG. 3, video decoder 30 includes an entropy decoding unit 80, prediction processing unit 81, inverse quantization unit 86, inverse transformation unit 88, summer 90, decoded picture buffer (DPB) 92, and DPB management unit 93. Prediction processing unit 81 includes motion compensation unit 82 and intra prediction unit 84. Video decoder 30 may, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 20 from FIG. 2. - During the decoding process,
video decoder 30 receives an encoded video bitstream that represents video blocks of an encoded video slice and associated syntax elements from video encoder 20. Entropy decoding unit 80 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors, and other syntax elements. Entropy decoding unit 80 forwards the motion vectors and other syntax elements to prediction processing unit 81. Video decoder 30 may receive the syntax elements at the video slice level and/or the video block level. - When the video slice is coded as an intra-coded (I) slice,
intra prediction unit 84 of prediction processing unit 81 may generate prediction data for a video block of the current video slice based on a signaled intra prediction mode and data from previously decoded blocks of the current picture. When the video picture is coded as an inter-coded (i.e., B or P) slice, motion compensation unit 82 of prediction processing unit 81 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 80. The predictive blocks may be produced from one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct the reference frame lists, List 0 and List 1, using default construction techniques based on reference pictures stored in decoded picture buffer 92. In some examples, video decoder 30 may construct List 0 and List 1 from the reference pictures identified in the derived reference picture set. -
Motion compensation unit 82 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, motion compensation unit 82 uses some of the received syntax elements to determine a prediction mode (e.g., intra- or inter-prediction) used to code the video blocks of the video slice, an inter-prediction slice type (e.g., B slice or P slice), construction information for one or more of the reference picture lists for the slice, motion vectors for each inter-encoded video block of the slice, inter-prediction status for each inter-coded video block of the slice, and other information to decode the video blocks in the current video slice. -
Motion compensation unit 82 may also perform interpolation based on interpolation filters. Motion compensation unit 82 may use interpolation filters as used by video encoder 20 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 82 may determine the interpolation filters used by video encoder 20 from the received syntax elements and use the interpolation filters to produce predictive blocks. -
Inverse quantization unit 86 inverse quantizes, i.e., de-quantizes, the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 80. The inverse quantization process may include use of a quantization parameter calculated by video encoder 20 for each video block in the video slice to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied. Inverse transform unit 88 applies an inverse transform, e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain. - After
prediction processing unit 81 generates the predictive block for the current video block based on either inter- or intra-prediction, video decoder 30 forms a decoded video block by summing the residual blocks from inverse transform unit 88 with the corresponding predictive blocks generated by prediction processing unit 81. Summer 90 represents the component or components that perform this summation operation. If desired, a deblocking filter may also be applied to filter the decoded blocks in order to remove blockiness artifacts. Other loop filters (either in the coding loop or after the coding loop) may also be used to smooth pixel transitions, or otherwise improve the video quality. DPB management unit 93 may store the decoded video blocks of a given picture in decoded picture buffer 92, which stores reference pictures used for subsequent motion compensation. Decoded picture buffer 92 also stores decoded video for later presentation on a display device, such as display device 32 of FIG. 1. - In accordance with this disclosure,
prediction processing unit 81 and DPB management unit 93 represent example units for performing the example functions described above. For example, prediction processing unit 81 may receive a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution. Prediction processing unit 81 may also receive a first sequence parameter set and a second sequence parameter set for the coded video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set. Prediction processing unit 81 may also use the first sequence parameter set and the second sequence parameter set to decode the coded video sequence. - As another example in accordance with the techniques of this disclosure,
prediction processing unit 81 may also receive a first decoded frame of video data, wherein the first decoded frame is associated with a first resolution. DPB management unit 93 may determine whether DPB 92 is available to store the first decoded frame based on the first resolution and, in the event the decoded picture buffer is available to store the first decoded frame, store the first decoded frame in DPB 92, and determine whether DPB 92 is available to store a second decoded frame of video data, wherein the second decoded frame is associated with a second resolution, based on the first resolution and the second resolution, wherein the first decoded frame is different than the second decoded frame. - In general,
video decoder 30 may perform any of the techniques of this disclosure. In some examples, video decoder 30 may perform some or all of the techniques described above with respect to video encoder 20 in FIG. 2. In some examples, video decoder 30 may perform the techniques described with respect to FIG. 2 in a reciprocal ordering or manner to that described with respect to video encoder 20. -
FIGS. 4A-4D are conceptual diagrams that illustrate examples of a coded bitstream including coded video data in accordance with the techniques of this disclosure. As shown in FIG. 4A, a coded bitstream 400 may comprise one or more coded video sequences (CVSs), in particular, CVS 402 and CVS 404. As also shown in FIG. 4A, each of CVS 402 and CVS 404 may comprise one or more frames, or “pictures,” PIC_1 (0)-PIC_1 (N), and PIC_2 (0)-PIC_2 (M), respectively. As still further shown in FIG. 4A, each of CVS 402 and CVS 404 may further comprise a single sequence parameter set (SPS), in particular, SPS1 and SPS2, respectively. As described above, each of SPS1 and SPS2 may define parameters for the corresponding one of CVS 402 and CVS 404, including LCU size, SCU size, and other syntax information for the respective CVS that is common to all frames, or “pictures,” within the CVS. - As shown in
FIG. 4B, a particular CVS, CVS 406, may further comprise one or more picture parameter sets (PPSs), in particular, PPS1 and PPS2. As described above, each of PPS1 and PPS2 may define parameters for CVS 406, including syntax information that indicates picture resolution, that are common to one or more pictures within CVS 406, but not to all pictures within CVS 406. For example, syntax information included within each of PPS1 and PPS2, e.g., picture resolution syntax information, may apply to a sub-set of the pictures included within CVS 406. As one example, PPS1 may indicate picture resolution for PIC_1 (0)-PIC_1 (N), and PPS2 may indicate picture resolution for PIC_2 (0)-PIC_2 (M). Accordingly, CVS 406 may comprise pictures having different resolutions, wherein picture resolution for a particular one or more pictures (e.g., PIC_1 (0)-PIC_1 (N)) within CVS 406 that share a common picture resolution may be specified by a corresponding one of PPS1 and PPS2.
- As described above, A PPS RBSP may include parameters that can be referred to by coded slice NAL units of one or more coded pictures. Each PPS RBSP is initially considered not active at a start of a decoding process. In most examples, one PPS RBSP is considered active at any given moment during the decoding process, and activation of any particular PPS RBSP results in deactivation of a previously-active PPS RBSP, if any.
- When a PPS RBSP (with a particular value of the pic_parameter_set_id syntax element) is not active, and is referred to by a coded slice NAL unit (using the particular value of pic_parameter_set_id), the PPS referred to by the pic_parameter_sed_id is activated. This PPS RBSP is referred to as an “active PPS RBSP,” until it is deactivated by an activation of another PPS.
Video encoder 20 or decoder 30 may require a PPS with the referenced pic_parameter_set_id value to have been received before activating the PPS with that pic_parameter_set_id. - As an example of the PPS activation process, a NAL unit may refer to PPS1.
Video encoder 20 or decoder 30 may activate PPS1 based on the reference to PPS1 in the NAL unit. PPS1 is the active PPS RBSP. PPS1 remains the active PPS RBSP until a NAL unit references PPS2, at which point video encoder 20 or decoder 30 may activate PPS2. Once activated, PPS2 becomes the active PPS RBSP, and PPS1 is no longer the active PPS RBSP.
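The activation rule described above can be sketched as a simple state machine in Python (a hypothetical structure for illustration; real parameter sets carry many more fields):

```python
class PPSActivation:
    """Tracks the single active PPS RBSP during decoding."""

    def __init__(self):
        self.received = {}    # pic_parameter_set_id -> PPS content
        self.active_id = None

    def receive(self, pps_id, content):
        self.received[pps_id] = content

    def activate(self, pps_id):
        """Called when a coded slice NAL unit refers to pps_id."""
        if pps_id not in self.received:
            raise ValueError(
                "PPS %d referenced before being received" % pps_id)
        # Activating one PPS RBSP deactivates the previously active one.
        self.active_id = pps_id
        return self.received[pps_id]
```

For example, activating PPS1 and then PPS2 leaves only PPS2 as the active PPS RBSP, mirroring the example above.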
- In accordance with the techniques of this disclosure, as shown in
FIGS. 4C-4D, syntax information that indicates picture resolution for one or more pictures within a CVS, wherein the CVS comprises one or more pictures having different sizes, may be indicated using multiple SPSs for the CVS, rather than using a plurality of PPSs, as described above with reference to FIGS. 4A-4B.
- When an SPS RBSP (with a particular value of seq_parameter_set_id) is not already active, and is referred to by activation of a PPS RBSP (using the particular value of seq_parameter_set_id), or is referred to by an SEI NAL unit containing a buffering period SEI message (using the particular value of seq_parameter_set_id), the SPS RBSP is activated. This SPS RBSP may be referred to as an “active SPS RBSP” for the associated RSS (the RSS in which the coded pictures refers to the active SPS RBSP through the PPS RBSPs), until it is deactivated by an activation of another SPS RBSP.
Video encoder 20 ordecoder 30 may require the SPS RBSP with a particular value of seq_parameter_set_id, to be available tovideo encoder 20 orvideo decoder 30 prior to the activation of that SPS. Additionally, the SPS may remain active for the entire RSS in the CVS. - Additionally, because an instantaneous decoder refresh (IDR) access unit begins a new CVS, and an activated SPS RBSP may remain active for the entire RSS in the CVS, an SPS RBSP may only be activated by a buffering period SEI message when the buffering period SEI message is part of an IDR access unit.
- Any SPS NAL unit containing the particular value of seq_parameter_set_id for the active SPS RBSP for a RSS in a CVS may have the same content as that of the active SPS RBSP for the RSS in the CVS, unless it follows a last access unit of the CVS, and precedes the first VCL NAL unit and the first SEI NAL unit containing a buffering period SEI message (when present) of another CVS.
- Also, if a PPS RBSP or an SPS RBSP is conveyed within the bitstream, these constraints impose an order constraint on the NAL units that contain the PPS RBSP or the SPS RBSP, respectively. Otherwise if PPS RBSP or SPS RBSP are conveyed by other means not specified in this disclosure, they should be available to the decoding process in a timely fashion such that these constraints are obeyed.
- The constraints that are expressed on the relationship between the values of the syntax elements (and the values of variables derived from those syntax elements) in SPS and PPS, and other syntax elements, are typically expressions of constraints that apply only to the active SPS and the active PPS. If any SPS RBSP is present that is not activated in the bitstream, its syntax elements usually have values that would conform to the specified constraints if it were activated by reference in an otherwise conforming bitstream. If any PPS RBSP is present that is not ever activated in the bitstream, the syntax elements of the PPS RBSP may have values that would conform to the specified constraints if the PPS were activated by reference in an otherwise-conforming bitstream.
- During the decoding process, the values of parameters of the active PPS and the active SPS may be considered to be in effect. For interpretation of SEI messages, the values of the parameters of the PPS and SPS that are active for the operation of the decoding process for the VCL NAL units of the primary coded picture in the same access unit may be considered in effect unless otherwise specified in the SEI message semantics.
- As one example, as shown in
FIG. 4C, CVS 408 may include one or more SPSs, in particular, SPS1 and SPS2, that each indicate picture resolution for PIC_1 (0), PIC_1 (1), etc., and PIC_2 (0), PIC_2 (1), etc., respectively. In other words, SPS1 indicates picture resolution information for PIC_1 (0), PIC_1 (1), etc., and SPS2 indicates picture resolution information for PIC_2 (0), PIC_2 (1), etc. In this example, CVS 408 may further comprise one or more PPSs (not shown), wherein the one or more PPSs may specify syntax information for one or more pictures of CVS 408, but wherein the one or more PPSs do not include any syntax information that indicates picture resolution for any of the one or more pictures of CVS 408.
CVS 408, even in cases where pictures having different resolutions are alternated within a CVS in the decoding order. Accordingly, after the indicating picture resolution information for all pictures withinCVS 408 using SPS1 and SPS2, no additional indication of the information may be needed. - As shown in
FIG. 4C, the multiple SPSs, e.g., SPS1 and SPS2, may be located at the beginning of the corresponding CVS, e.g., CVS 408, prior to any of PIC_1 (0), PIC_1 (1) and PIC_2 (0), PIC_2 (1). As shown in FIG. 4D, alternatively, an SPS that indicates picture resolution information for one or more pictures may be located before a first one of such pictures in a decoding sequence. For example, as shown in FIG. 4D, SPS2 is located within CVS 410 prior to a first one of pictures PIC_2 (0), PIC_2 (1), etc., but after a first one of PIC_1 (0), PIC_1 (1), etc. -
FIG. 5 is a conceptual diagram illustrating the operation of a decoded picture buffer of a hypothetical reference decoder (HRD) model in accordance with the techniques of this disclosure. FIG. 5 includes coded picture buffer (CPB) 502, decoded picture buffer (DPB) 504, and DPB management unit 506. DPB management unit 506 may remove a picture in coded picture buffer (CPB) 502. Video encoder 20 or decoder 30 may decode the picture, and DPB management unit 506 may store the decoded picture in decoded picture buffer 504. Based on various criteria, such as an output time, output flag, or a picture count, DPB management unit 506 may remove a picture from DPB 504. In some cases video encoder 20 or decoder 30 may output the decoded picture. CPB 502 may contain encoded pictures that are removed so that video encoder 20 or decoder 30 may utilize the decoded pictures that may be needed for inter-prediction as a reference, or for future output. In general, DPB 504 may include a maximum capacity. In previous video coding standards, DPB 504 may include a maximum number of frames that can be stored in the DPB. However, to support adaptive-resolution CVSs, DPB management unit 506 may maintain a count of blocks contained within the DPB to measure the “fullness” of the DPB. - This disclosure describes the removal techniques of decoded pictures in the DPB from at least two perspectives. In the first perspective, DPB management unit 506 of
video decoder 30 may remove decoded pictures based on an output time if the pictures are intended for output. In the second perspective, DPB management unit 506 may remove decoded pictures based on the picture order count (POC) values if the pictures are intended for output. In either perspective, DPB management unit 506 may remove decoded pictures that are not needed for output (i.e., outputted already or not intended for output) when the decoded picture is not in the reference picture set, and prior to decoding the current picture. Although described with respect to video decoder 30, video encoder 20 and DPB management unit 506 of video encoder 20 may also perform any of the DPB management techniques described in this disclosure. -
DPB 504 may include a plurality of buffers, and each buffer may store a decoded picture that is to be used as a reference picture or is held for future output. Initially, the DPB is empty (i.e., the DPB fullness is set to zero). In the described example techniques, the removal of the decoded pictures from the DPB may occur before the decoding of the current picture, but after video decoder 30 parses the slice header of the first slice of the current picture. - In the first perspective, the following techniques may occur instantaneously at time tr(n) in the following sequence. In this example, tr(n) is the CPB removal time (i.e., decoding time) of the access unit n containing the current picture. As described in this disclosure, the techniques occurring instantaneously may mean that, in the HRD model, it is assumed that decoding of a picture is instantaneous, with a time period for decoding a picture equal to zero. - In the first perspective, decoder 30 may invoke the derivation process for a reference picture set. If the current picture, which DPB management unit 506 may retrieve from CPB 502, is an IDR picture, DPB management unit 506 may remove all decoded pictures from DPB 504, and may set the DPB fullness to 0. If the decoded picture is not an IDR picture, DPB management unit 506 may remove all pictures not included in the reference picture set of the current picture from DPB 504. DPB management unit 506 may also remove all pictures having an OutputFlag value equal to “0”, or having a DPB output time less than or equal to the CPB removal time of the current picture, which may be referred to as “n” (i.e., to,dpb(m)<=tr(n)). The OutputFlag may indicate that video decoder 30 should output the picture (e.g., for display or for transmission in the case of an encoder). - Whenever DPB management unit 506 removes a picture from
DPB 504, DPB management unit 506 may decrement the fullness of DPB 504 by the number of 8×8 blocks in the picture, i.e., (pic_width_in_luma_samples*pic_height_in_luma_samples)>>6. - After DPB management unit 506 has removed any pictures from the DPB,
video decoder 30 may decode and store the received picture “n” in the DPB. DPB management unit 506 may increment the DPB fullness by the number of 8×8 blocks in the stored decoded picture, i.e., (pic_width_in_luma_samples*pic_height_in_luma_samples)>>6. - Each picture may also have an OutputFlag, as described above. When the picture has an OutputFlag value equal to 1, the DPB output time, denoted as to,dpb(n), of the picture may be derived by the following equation.
-
to,dpb(n) = tr(n) + tc * dpb_output_delay(n) - In the equation, dpb_output_delay(n) may be the value of dpb_output_delay specified in the picture timing SEI message associated with access unit “n.” - If the OutputFlag of a picture is equal to “1” and to,dpb(n)=tr(n), video decoder 30 may output the current picture. Otherwise, if the value of OutputFlag is equal to 0, video decoder 30 may not output the current picture. Otherwise (i.e., if OutputFlag is equal to 1 and to,dpb(n)>tr(n)), video decoder 30 may output the current picture later, at time to,dpb(n). - As described above, in some examples,
video decoder 30 may crop the picture in the decoded picture buffer. Video decoder 30 may utilize the cropping rectangle specified in the active sequence parameter set for the picture to determine the cropping rectangle. - In some examples,
video decoder 30 may determine a difference between the DPB output time for a picture and the DPB output time for a picture following the picture in output order. When picture “n” is a picture that is output and is not the last picture of the bitstream that is output, the output time of picture “n” Δto,dpb(n) may be defined according to the following equation. -
Δto,dpb(n) = to,dpb(nn) − to,dpb(n) - In the preceding equation, nn may denote the picture that follows after picture “n” in output order and has OutputFlag equal to 1.
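The two output-time formulas above can be sketched numerically in Python as follows (hypothetical helpers for illustration only; tc is the HRD clock tick, and each picture is represented here as a (tr, OutputFlag, dpb_output_delay) tuple):

```python
def dpb_output_time(t_r, t_c, dpb_output_delay):
    """to,dpb(n) = tr(n) + tc * dpb_output_delay(n)."""
    return t_r + t_c * dpb_output_delay


def output_schedule(pictures, t_c):
    """Compute (to,dpb, delta) pairs for the pictures that are output.

    pictures: list of (t_r, OutputFlag, dpb_output_delay) in output order.
    delta is Dt_o,dpb(n) = to,dpb(nn) - to,dpb(n), where nn is the next
    picture with OutputFlag equal to 1; the last output picture gets None,
    since the interval is undefined for the last output picture.
    """
    times = [dpb_output_time(t_r, t_c, delay)
             for (t_r, flag, delay) in pictures if flag == 1]
    deltas = [times[i + 1] - times[i] for i in range(len(times) - 1)] + [None]
    return list(zip(times, deltas))
```

For example, three pictures with removal times 0, 1, and 2, OutputFlags 1, 0, and 1, and delays 2, 0, and 3 (with tc = 1) yield output times 2 and 5 and a single output interval of 3.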
- In the second perspective for removing decoded pictures, the HRD may implement the techniques instantaneously when DPB management unit 506 removes an access unit from
CPB 502. Again, video decoder 30 and DPB management unit 506 of video decoder 30 may implement the removing of decoded pictures from DPB 504, and video decoder 30 may not necessarily include CPB 502. In some examples, video decoder 30 and video encoder 20 may not require CPB 502. Rather, CPB 502 is described as part of the HRD model for purposes of illustration only. - As above, in the second perspective for removing decoded pictures, DPB management unit 506 may remove the pictures from the DPB before the decoding of the current picture, but after parsing the slice header of the first slice of the current picture. Also, similar to the first perspective for removing decoded pictures, in the second perspective,
video decoder 30 and DPB management unit 506 may perform similar functions to those described above with respect to the first perspective when the current picture is an IDR picture. - Otherwise, if the current picture is not an IDR picture, DPB management unit 506 may empty, without output, buffers of the DPB that store a picture that is marked as “not needed for output” and that store pictures not included in the reference picture set of the current picture. DPB management unit 506 may also decrement the DPB fullness by the number of buffers that DPB management unit 506 emptied. When there is no empty buffer (i.e., the DPB fullness is equal to the DPB size), DPB management unit 506 may implement a “bumping” process described below. In some examples, when there is no empty buffer, DPB management unit 506 may implement the bumping process repeatedly until there is an empty buffer in which
video decoder 30 can store the current decoded picture. - In general,
video decoder 30 may implement the following steps to implement the bumping process. Video decoder 30 may first determine the picture to be outputted. For example, video decoder 30 may select the picture having the smallest PicOrderCnt (POC) value of all the pictures in DPB 504 that are marked as “needed for output.” Video decoder 30 may crop the selected picture using the cropping rectangle specified in the active sequence parameter set for the picture. Video decoder 30 may output the cropped picture, and may mark the picture as “not needed for output.” Video decoder 30 may check the buffer of DPB 504 that stored the cropped and outputted picture. If the picture is not included in the reference picture set, DPB management unit 506 may empty that buffer and may decrement the DPB fullness by the number of 8×8 blocks in the removed picture. - Although the above techniques for DPB management are described from the context of
video decoder 30 andDPB management unit 65, in some examples,video encoder 20, andDPB management unit 93 may implement similar techniques. However,video encoder 20 implementing similar techniques is not required in every example. In some examples,video decoder 30 may implement these techniques, andvideo encoder 20 may not implement these techniques. - In this manner, a video coder (e.g.,
video encoder 20 or video decoder 30) may implement techniques to support CVSs having adaptive resolution. Again, the reference picture set may identify the reference pictures that can potentially be used for inter-predicting the current picture and can potentially be used for inter-predicting one or more pictures following the current picture in decoding order. - In the above examples, the DPB size or fullness may be signaled with respect to the number of 8×8 blocks of a picture stored in the DPB. Alternatively, the fullness of the DPB, i.e., the max_dec_pic_buffering syntax element, may be signaled based on the number of smallest coding units (SCUs) of a picture. For example, if the smallest SCU among all active SPSs is 16×16, then the unit of max_dec_pic_buffering may be 16×16 blocks.
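The bumping process described above can be sketched in code. The following is a minimal illustrative model, not the codec's actual implementation: the DecodedPicture and Dpb names are hypothetical, and the cropping/output of the selected picture is elided.

```python
class DecodedPicture:
    """Hypothetical model of a picture stored in the DPB."""
    def __init__(self, poc, blocks_8x8, needed_for_output=True, in_rps=True):
        self.poc = poc                      # PicOrderCnt value
        self.blocks_8x8 = blocks_8x8        # picture size in 8x8 blocks
        self.needed_for_output = needed_for_output
        self.in_rps = in_rps                # in the current reference picture set?

class Dpb:
    """Hypothetical decoded picture buffer; fullness counted in 8x8 blocks."""
    def __init__(self):
        self.pictures = []
        self.fullness = 0

    def insert(self, pic):
        self.pictures.append(pic)
        self.fullness += pic.blocks_8x8

    def bump(self):
        # Step 1: select the picture with the smallest POC among those
        # marked "needed for output".
        pic = min((p for p in self.pictures if p.needed_for_output),
                  key=lambda p: p.poc)
        # Step 2: crop per the active SPS cropping rectangle and output (elided).
        # Step 3: mark the picture as "not needed for output".
        pic.needed_for_output = False
        # Step 4: if the picture is not in the reference picture set, empty its
        # buffer and decrement fullness by its size in 8x8 blocks.
        if not pic.in_rps:
            self.pictures.remove(pic)
            self.fullness -= pic.blocks_8x8
        return pic.poc
```

In this sketch, calling bump() repeatedly mirrors the behavior described for the non-IDR case, where the process repeats until an empty buffer exists for the current decoded picture.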
- As still another example,
video encoder 20 or decoder 30 may signal the DPB size, indicated by the max_dec_pic_buffering syntax element, using units of frame buffers that are specific to the spatial resolution indicated by the SPS. For example, if there are two RSSs, rss1 and rss2, with resolution res1 and resolution res2, referring to SPS sps1 and SPS sps2 respectively, wherein res1 is greater than res2, then max_dec_pic_buffering in sps1 is counted in frame buffers of res1, and max_dec_pic_buffering in sps2 is counted in frame buffers of res2. In this example, video encoder 20 or decoder 30 may be subject to the restriction that the DPB size indicated by the max_dec_pic_buffering value in sps1, if counted in units of 8×8 blocks, may not be less than that indicated by the max_dec_pic_buffering value in sps2. Consequently, in the DPB operations, when video decoder 30 removes one frame buffer of res1 from DPB 504, the freed buffer space may be sufficient for insertion of a decoded picture of either resolution. However, when decoder 30 removes one frame buffer of res2 from DPB 504, the freed buffer space may not be sufficient for insertion of a decoded picture of res1. Rather, video decoder 30 may remove multiple frame buffers of res2 from DPB 504 in this case. - The
video decoder 30 may derive the reference picture set in any manner, including the example techniques described above. Video decoder 30 may determine whether a decoded picture stored in the decoded picture buffer is not needed for output and is not identified in the reference picture set. When video decoder 30 has outputted the decoded picture and the decoded picture is not identified in the reference picture set, video decoder 30 may remove the decoded picture from the decoded picture buffer. Subsequent to removing the decoded picture, video decoder 30 may code the current picture. For example, video decoder 30 may construct the reference picture list(s) as described above, and code the current picture based on the reference picture list(s). -
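The resolution-specific frame-buffer accounting described above can be illustrated with a small sketch, assuming (for illustration only) that buffer sizes are measured in 8×8 blocks; the function names and the specific resolutions are hypothetical, not taken from the disclosure.

```python
def blocks_8x8(width, height):
    # Number of 8x8 blocks covering a width x height picture.
    return ((width + 7) // 8) * ((height + 7) // 8)

def buffers_to_free(removable_blocks, needed_blocks):
    """Count how many of the given buffers (sizes in 8x8 blocks) must be
    emptied before a picture of needed_blocks fits; None if not possible."""
    freed = 0
    for count, size in enumerate(removable_blocks, start=1):
        freed += size
        if freed >= needed_blocks:
            return count
    return None

res1 = blocks_8x8(1280, 720)   # 14400 blocks (higher resolution)
res2 = blocks_8x8(640, 360)    # 3600 blocks (lower resolution)

# Freeing one res1 buffer fits a picture of either resolution,
# but several res2 buffers must be freed to fit one res1 picture.
one_res1_frees = buffers_to_free([res1], res2)     # 1 buffer suffices
several_res2 = buffers_to_free([res2] * 4, res1)   # multiple buffers needed
```

This mirrors the restriction that the DPB size in sps1, counted in 8×8 blocks, is at least that in sps2: removing one high-resolution buffer always frees enough space for either resolution, while the reverse may require removing multiple buffers.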
FIG. 6 is a flowchart illustrating an example operation of using a first sub-sequence and a second sub-sequence to decode video in accordance with the techniques of this disclosure. For purposes of illustration only, the method of FIG. 6 may be performed by a video coder corresponding to either video encoder 20 or video decoder 30. In the method of FIG. 6, the video coder may process a coded video sequence comprising a first sub-sequence and a second sub-sequence (601). The first sub-sequence may include one or more frames each having a first resolution, and the second sub-sequence may include one or more frames each having a second resolution. The first sub-sequence may be different than the second sub-sequence, and the first resolution may be different than the second resolution. - The video coder (e.g.,
video encoder 20 or video decoder 30) may also process a first sequence parameter set (SPS) and a second sequence parameter set for the coded video sequence (602). The first sequence parameter set may indicate the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set may indicate the second resolution of the one or more frames of the second sub-sequence. The first sequence parameter set may also be different than the second sequence parameter set. The video coder (e.g., video encoder 20 or video decoder 30) may use the first sequence parameter set and the second sequence parameter set to code the coded video sequence (603). - In some examples, the video coder may comprise an encoder, e.g.,
encoder 20 of FIGS. 1-2. In the case where the video coder comprises a decoder, processing SPSs and sub-sequences may comprise receiving the SPSs and sub-sequences. In this case, coding the first and second video sequences may comprise decoding the first and second video sequences. - In the case where the video coder comprises an encoder, processing SPSs and sub-sequences may comprise generating the SPSs and sub-sequences. In this case, coding the first and second video sequences may comprise encoding the first and second video sequences. Additionally, in the case where the video coder comprises an encoder, the video encoder may transmit the coded video sequence comprising the first sub-sequence and the second sub-sequence instead of receiving the video sequence comprising the first and second sub-sequences. In some examples, the first resolution and the second resolution may each comprise a spatial resolution.
- In some examples, the video coder may code the first sequence parameter set and the second sequence parameter set in a received bitstream prior to either the first sub-sequence or the second sub-sequence.
- In another example, to receive the first sequence parameter set and the second sequence parameter set of the coded video sequence, the video coder may be configured to receive both the first sequence parameter set and the second sequence parameter set prior to receiving either of the first sub-sequence and the second sub-sequence.
- In another example, the video coder may code the first sequence parameter set in a received bitstream prior to the first sub-sequence, and may code the second sequence parameter set in the received bitstream after at least one frame of the one or more frames of the first sub-sequence and prior to the second sub-sequence.
- In another example, to receive the first sequence parameter set and the second sequence parameter set of the coded video sequence, the video coder may be configured to receive the second sequence parameter set after receiving at least one frame of the one or more frames of the first sub-sequence, and prior to receiving the second sub-sequence.
- In yet another example, the video coder may interleave the one or more frames of the first sub-sequence and the one or more frames of the second sub-sequence in the coded video sequence.
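The SPS orderings described in the preceding examples, whether both parameter sets arrive up front or the second arrives mid-stream, lead to the same per-frame behavior: each frame activates the sequence parameter set it references, which yields that frame's resolution. A minimal sketch, with hypothetical names and resolutions:

```python
sps_table = {}   # sps_id -> (width, height) declared by that SPS

def process_sps(sps_id, resolution):
    # An SPS only needs to be processed before the first frame that references it.
    sps_table[sps_id] = resolution

def decode_frame(sps_id):
    # Activating the referenced SPS yields the frame's resolution.
    return sps_table[sps_id]

process_sps(0, (1280, 720))   # SPS for the first sub-sequence
process_sps(1, (640, 360))    # SPS for the second sub-sequence

# Interleaved coded video sequence: frames alternate between sub-sequences,
# and each frame's resolution follows from the SPS it references.
resolutions = [decode_frame(s) for s in [0, 1, 0, 1]]
```

The same sketch covers the non-interleaved orderings: as long as process_sps(1, ...) runs before the first frame referencing SPS 1, it may come before any frame or after frames of the first sub-sequence.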
-
FIG. 7 is a flowchart illustrating an example operation of managing a decoded picture buffer. For purposes of illustration only, the method of FIG. 7 may be performed by a video coder corresponding to either video encoder 20 or video decoder 30. In the method of FIG. 7, a video coder may receive a coded video sequence comprising a first sub-sequence and a second sub-sequence (701). The first sub-sequence may include one or more frames each having a first resolution, and the second sub-sequence may include one or more frames each having a second resolution. The first sub-sequence may be different than the second sub-sequence, and the first resolution may be different than the second resolution. The video coder may receive a first decoded frame of video data, and the first decoded frame may be associated with a first resolution. In some examples, the first resolution may comprise a spatial resolution. - In accordance with the method illustrated in
FIG. 7, the video coder may also determine whether a decoded picture buffer is available to store the first decoded frame based on the first resolution (702). In the event the decoded picture buffer is available to store the first decoded frame, the video coder may store the first decoded frame in the decoded picture buffer, and determine whether the decoded picture buffer is available to store a second decoded frame of video data. The second decoded frame of video data may be associated with a second resolution. The video coder may also determine whether the decoded picture buffer is available to store the second decoded frame based on the first resolution and the second resolution (704). The first decoded frame may also be different than the second decoded frame. - In some examples, to determine whether the decoded picture buffer is available to store the first decoded frame based on the first resolution, the video coder may be configured to determine an amount of information that may be stored within the decoded picture buffer, determine an amount of information associated with the first decoded frame based on the first resolution, and compare the amount of information that may be stored within the decoded picture buffer and the amount of information associated with the first decoded frame.
- In an example, to determine whether the decoded picture buffer is available to store the second decoded frame based on the first resolution and the second resolution, the video coder may be configured to determine an amount of information that may be stored within the decoded picture buffer based on the first resolution, determine an amount of information associated with the second decoded frame based on the second resolution, and compare the amount of information that may be stored within the decoded picture buffer and the amount of information associated with the second decoded frame.
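The capacity comparison in the two preceding paragraphs can be sketched as follows, assuming (as an illustration only) that the amount of information is measured in 8×8 blocks derived from a frame's resolution; the function names are hypothetical.

```python
def blocks_8x8(width, height):
    # Amount of information for a frame, measured in 8x8 blocks.
    return ((width + 7) // 8) * ((height + 7) // 8)

def dpb_can_store(capacity_blocks, fullness_blocks, width, height):
    """Compare the space left in the DPB against the frame's size."""
    return capacity_blocks - fullness_blocks >= blocks_8x8(width, height)

# With 20000 blocks of capacity and 14400 in use, a 640x360 frame
# (3600 blocks) fits, but a 1280x720 frame (14400 blocks) does not.
fits_small = dpb_can_store(20000, 14400, 640, 360)
fits_large = dpb_can_store(20000, 14400, 1280, 720)
```

Because both frames are measured in the same units, the same comparison handles the first decoded frame (based on the first resolution) and the second decoded frame (based on both resolutions, via the remaining capacity).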
- In some examples, the video coder may be further configured to remove the first decoded frame from the decoded picture buffer. The video coder may also be an encoder, e.g.,
encoder 20 of FIGS. 1-2, or a decoder, e.g., decoder 30 of FIGS. 1-2, in some examples. - By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
- Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
- The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
- Various examples have been described. These and other examples are within the scope of the following claims.
Claims (35)
1. A method of decoding video data, the method comprising:
receiving a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution;
receiving a first sequence parameter set and a second sequence parameter set for the coded video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set; and
using the first sequence parameter set and the second sequence parameter set to decode the coded video sequence.
2. The method of claim 1, wherein the first sequence parameter set and the second sequence parameter set are coded in a received bitstream prior to either the first sub-sequence or the second sub-sequence.
3. The method of claim 1, wherein receiving the first sequence parameter set and the second sequence parameter set of the coded video sequence comprises:
receiving both the first sequence parameter set and the second sequence parameter set prior to receiving either of the first sub-sequence and the second sub-sequence.
4. The method of claim 1, wherein the first sequence parameter set is coded in a received bitstream prior to the first sub-sequence and the second sequence parameter set is coded in the received bitstream after at least one frame of the one or more frames of the first sub-sequence, and prior to the second sub-sequence.
5. The method of claim 1, wherein receiving the first sequence parameter set and the second sequence parameter set of the coded video sequence comprises:
receiving the second sequence parameter set after receiving at least one frame of the one or more frames of the first sub-sequence, and prior to receiving the second sub-sequence.
6. The method of claim 1, wherein the one or more frames of the first sub-sequence and the one or more frames of the second sub-sequence are interleaved in the coded video sequence.
7. The method of claim 1, wherein the first resolution and the second resolution each comprise a spatial resolution.
8. An apparatus for decoding video data, the apparatus comprising a video decoder configured to:
receive a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution;
receive a first sequence parameter set and a second sequence parameter set for the coded video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set; and
use the first sequence parameter set and the second sequence parameter set to decode the coded video sequence.
9. The apparatus of claim 8, wherein the first sequence parameter set and the second sequence parameter set are coded in a received bitstream prior to either the first sub-sequence or the second sub-sequence.
10. The apparatus of claim 8, wherein to receive the first sequence parameter set and the second sequence parameter set of the coded video sequence, the apparatus is configured to:
receive both the first sequence parameter set and the second sequence parameter set prior to receiving either of the first sub-sequence and the second sub-sequence.
11. The apparatus of claim 8, wherein the first sequence parameter set is coded in a received bitstream prior to the first sub-sequence and the second sequence parameter set is coded in the received bitstream after at least one frame of the one or more frames of the first sub-sequence, and prior to the second sub-sequence.
12. The apparatus of claim 8, wherein to receive the first sequence parameter set and the second sequence parameter set of the coded video sequence, the apparatus is configured to:
receive the second sequence parameter set after receiving at least one frame of the one or more frames of the first sub-sequence, and prior to receiving the second sub-sequence.
13. The apparatus of claim 8, wherein the one or more frames of the first sub-sequence and the one or more frames of the second sub-sequence are interleaved in the coded video sequence.
14. The apparatus of claim 8, wherein the first resolution and the second resolution each comprise a spatial resolution.
15. An apparatus for decoding video data, the apparatus comprising:
means for receiving a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution;
means for receiving a first sequence parameter set and a second sequence parameter set for the coded video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set; and
means for using the first sequence parameter set and the second sequence parameter set to decode the coded video sequence.
16. A computer-readable storage medium comprising instructions that, when executed, cause at least one processor to decode video data, wherein the instructions cause the at least one processor to:
receive a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution;
receive a first sequence parameter set and a second sequence parameter set for the coded video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set; and
use the first sequence parameter set and the second sequence parameter set to decode the coded video sequence.
17. A method of encoding video data, the method comprising:
generating a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution;
generating a first sequence parameter set and a second sequence parameter set for the video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set; and
transmitting the coded video sequence comprising the first sub-sequence and the second sub-sequence, and the first sequence parameter set and the second sequence parameter set.
18. The method of claim 17, wherein the first sequence parameter set and the second sequence parameter set are coded in a transmitted bitstream prior to either the first sub-sequence or the second sub-sequence.
19. The method of claim 17, wherein transmitting the first sequence parameter set and the second sequence parameter set of the coded video sequence comprises:
transmitting both the first sequence parameter set and the second sequence parameter set prior to transmitting either of the first sub-sequence and the second sub-sequence.
20. The method of claim 17, wherein the first sequence parameter set is coded in a transmitted bitstream prior to the first sub-sequence and the second sequence parameter set is coded in the transmitted bitstream after at least one frame of the one or more frames of the first sub-sequence and prior to the second sub-sequence.
21. The method of claim 17, wherein transmitting the first sequence parameter set and the second sequence parameter set of the coded video sequence comprises:
transmitting the second sequence parameter set after transmitting at least one frame of the one or more frames of the first sub-sequence, and prior to transmitting the second sub-sequence.
22. The method of claim 17, wherein the one or more frames of the first sub-sequence and the one or more frames of the second sub-sequence are interleaved in the coded video sequence.
23. The method of claim 17, wherein the first resolution and the second resolution each comprise a spatial resolution.
24. An apparatus for coding video data, the apparatus comprising a video coder configured to:
generate a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution;
generate a first sequence parameter set and a second sequence parameter set for the video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set; and
transmit the coded video sequence comprising the first sub-sequence and the second sub-sequence, and the first sequence parameter set and the second sequence parameter set.
25. The apparatus of claim 24, wherein the first sequence parameter set and the second sequence parameter set are coded in a transmitted bitstream prior to either the first sub-sequence or the second sub-sequence.
26. The apparatus of claim 24, wherein to transmit the first sequence parameter set and the second sequence parameter set of the coded video sequence, the apparatus is configured to:
transmit both the first sequence parameter set and the second sequence parameter set prior to transmitting either of the first sub-sequence and the second sub-sequence.
27. The apparatus of claim 24, wherein the first sequence parameter set is coded in a transmitted bitstream prior to the first sub-sequence and the second sequence parameter set is coded in the transmitted bitstream after at least one frame of the one or more frames of the first sub-sequence and prior to the second sub-sequence.
28. The apparatus of claim 24, wherein to transmit the first sequence parameter set and the second sequence parameter set of the coded video sequence, the apparatus is configured to:
transmit the second sequence parameter set after transmitting at least one frame of the one or more frames of the first sub-sequence, and prior to transmitting the second sub-sequence.
29. The apparatus of claim 24, wherein the one or more frames of the first sub-sequence and the one or more frames of the second sub-sequence are interleaved in the coded video sequence.
30. The apparatus of claim 24, wherein the first resolution and the second resolution each comprise a spatial resolution.
31. An apparatus for encoding video data, the apparatus comprising:
means for generating a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution;
means for generating a first sequence parameter set and a second sequence parameter set for the video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set; and
means for transmitting the coded video sequence comprising the first sub-sequence and the second sub-sequence, and the first sequence parameter set and the second sequence parameter set.
32. A computer readable storage medium comprising instructions that, when executed, cause at least one processor of a video encoding device to:
generate a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution;
generate a first sequence parameter set and a second sequence parameter set for the video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set; and
transmit the coded video sequence comprising the first sub-sequence and the second sub-sequence, and the first sequence parameter set and the second sequence parameter set.
33. A computer readable storage medium, comprising a data structure stored thereon, the data structure comprising:
a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution; and
a first sequence parameter set and a second sequence parameter set for the coded video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set.
34. The computer readable medium of claim 33, wherein the first sequence parameter set and the second sequence parameter set are coded in a bitstream on the data structure prior to either the first sub-sequence or the second sub-sequence.
35. The computer readable medium of claim 33, wherein the first sequence parameter set is coded in a bitstream on the data structure prior to the first sub-sequence and the second sequence parameter set is coded in the bitstream on the data structure after at least one frame of the one or more frames of the first sub-sequence, and prior to the second sub-sequence.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/648,174 US20130089154A1 (en) | 2011-10-10 | 2012-10-09 | Adaptive frame size support in advanced video codecs |
PCT/US2012/059577 WO2013055806A1 (en) | 2011-10-10 | 2012-10-10 | Adaptive frame size support in advanced video codecs |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201161545525P | 2011-10-10 | 2011-10-10 | |
US201161550276P | 2011-10-21 | 2011-10-21 | |
US13/648,174 US20130089154A1 (en) | 2011-10-10 | 2012-10-09 | Adaptive frame size support in advanced video codecs |
Publications (1)
Publication Number | Publication Date |
---|---|
US20130089154A1 (en) | 2013-04-11
Family
ID=48042061
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/647,257 Expired - Fee Related US9451284B2 (en) | 2011-10-10 | 2012-10-08 | Efficient signaling of reference picture sets |
US13/648,174 Abandoned US20130089154A1 (en) | 2011-10-10 | 2012-10-09 | Adaptive frame size support in advanced video codecs |
US13/648,189 Abandoned US20130089135A1 (en) | 2011-10-10 | 2012-10-09 | Adaptive frame size support in advanced video codecs |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/647,257 Expired - Fee Related US9451284B2 (en) | 2011-10-10 | 2012-10-08 | Efficient signaling of reference picture sets |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/648,189 Abandoned US20130089135A1 (en) | 2011-10-10 | 2012-10-09 | Adaptive frame size support in advanced video codecs |
Country Status (6)
Country | Link |
---|---|
US (3) | US9451284B2 (en) |
EP (1) | EP2767088A1 (en) |
JP (1) | JP5972984B2 (en) |
KR (1) | KR101569305B1 (en) |
CN (1) | CN103959793A (en) |
WO (3) | WO2013055681A1 (en) |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140003504A1 (en) * | 2012-07-02 | 2014-01-02 | Nokia Corporation | Apparatus, a Method and a Computer Program for Video Coding and Decoding |
US20140036998A1 (en) * | 2011-11-03 | 2014-02-06 | Matthias Narroschke | Quantization parameter for blocks coded in the PCM mode |
US20140185688A1 (en) * | 2012-12-28 | 2014-07-03 | Canon Kabushiki Kaisha | Coding system transform apparatus, coding system transform method, and storage medium |
US20150341654A1 (en) * | 2014-05-22 | 2015-11-26 | Apple Inc. | Video coding system with efficient processing of zooming transitions in video |
US20160086312A1 (en) * | 2013-05-15 | 2016-03-24 | Sony Corporation | Image processing apparatus and image processing method |
US9426462B2 (en) | 2012-09-21 | 2016-08-23 | Qualcomm Incorporated | Indication and activation of parameter sets for video coding |
US20160330453A1 (en) * | 2015-05-05 | 2016-11-10 | Cisco Technology, Inc. | Parameter Set Header |
US20170105018A1 (en) * | 2014-06-10 | 2017-04-13 | Hangzhou Hikvision Digital Technology Co., Ltd. | Image encoding method and device and image decoding method and device |
US9936197B2 (en) | 2011-10-28 | 2018-04-03 | Samsung Electronics Co., Ltd. | Method for inter prediction and device therefor, and method for motion compensation and device therefor |
US20180242015A1 (en) * | 2017-02-23 | 2018-08-23 | Netflix, Inc. | Techniques for selecting resolutions for encoding different shot sequences |
CN108449564A (en) * | 2017-02-16 | 2018-08-24 | 北京视联动力国际信息技术有限公司 | Method and system for supporting adaptive video source resolution in a video decoding chip |
US10666992B2 (en) | 2017-07-18 | 2020-05-26 | Netflix, Inc. | Encoding techniques for optimizing distortion and bitrate |
US20200186795A1 (en) * | 2018-12-07 | 2020-06-11 | Beijing Dajia Internet Information Technology Co., Ltd. | Video coding using multi-resolution reference picture management |
US10742708B2 (en) | 2017-02-23 | 2020-08-11 | Netflix, Inc. | Iterative techniques for generating multiple encoded versions of a media title |
US10798387B2 (en) * | 2016-12-12 | 2020-10-06 | Netflix, Inc. | Source-consistent techniques for predicting absolute perceptual video quality |
CN113228666A (en) * | 2018-12-31 | 2021-08-06 | 华为技术有限公司 | Supporting adaptive resolution change in video coding and decoding |
US11153585B2 (en) | 2017-02-23 | 2021-10-19 | Netflix, Inc. | Optimizing encoding operations when generating encoded versions of a media title |
US11166034B2 (en) | 2017-02-23 | 2021-11-02 | Netflix, Inc. | Comparing video encoders/decoders using shot-based encoding and a perceptual visual quality metric |
US20210392349A1 (en) * | 2019-03-01 | 2021-12-16 | Alibaba Group Holding Limited | Adaptive Resolution Video Coding |
US20220132153A1 (en) * | 2019-01-02 | 2022-04-28 | Tencent America LLC | Adaptive picture resolution rescaling for inter-prediction and display |
US11611768B2 (en) * | 2019-08-06 | 2023-03-21 | Op Solutions, Llc | Implicit signaling of adaptive resolution management based on frame type |
Families Citing this family (47)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102494145B1 (en) | 2011-09-22 | 2023-01-31 | 엘지전자 주식회사 | Method and apparatus for signaling image information, and decoding method and apparatus using same |
US9451284B2 (en) * | 2011-10-10 | 2016-09-20 | Qualcomm Incorporated | Efficient signaling of reference picture sets |
US20130094774A1 (en) * | 2011-10-13 | 2013-04-18 | Sharp Laboratories Of America, Inc. | Tracking a reference picture based on a designated picture on an electronic device |
US8768079B2 (en) | 2011-10-13 | 2014-07-01 | Sharp Laboratories Of America, Inc. | Tracking a reference picture on an electronic device |
US8855433B2 (en) * | 2011-10-13 | 2014-10-07 | Sharp Kabushiki Kaisha | Tracking a reference picture based on a designated picture on an electronic device |
GB2497914B (en) * | 2011-10-20 | 2015-03-18 | Skype | Transmission of video data |
PL3917140T3 (en) | 2012-01-19 | 2023-12-04 | Vid Scale, Inc. | Method and apparatus for signaling and construction of video coding reference picture lists |
CN104160706B (en) * | 2012-01-20 | 2018-12-28 | 诺基亚技术有限公司 | Method and apparatus for encoding an image, and method and apparatus for decoding a video bitstream |
US11039138B1 (en) | 2012-03-08 | 2021-06-15 | Google Llc | Adaptive coding of prediction modes using probability distributions |
JP2013187905A (en) * | 2012-03-08 | 2013-09-19 | Panasonic Corp | Methods and apparatuses for encoding and decoding video |
US9838706B2 (en) * | 2012-04-16 | 2017-12-05 | Telefonaktiebolaget Lm Ericsson (Publ) | Encoder, decoder and methods thereof for video encoding and decoding |
ES2772028T3 (en) * | 2012-04-16 | 2020-07-07 | Ericsson Telefon Ab L M | Provisions and methods thereof for video processing |
US9420286B2 (en) | 2012-06-15 | 2016-08-16 | Qualcomm Incorporated | Temporal motion vector prediction in HEVC and its extensions |
RS64224B1 (en) * | 2012-06-25 | 2023-06-30 | Huawei Tech Co Ltd | Gradual temporal layer access pictures in video compression |
US9167248B2 (en) | 2012-07-13 | 2015-10-20 | Qualcomm Incorporated | Reference picture list modification for video coding |
US9992490B2 (en) | 2012-09-26 | 2018-06-05 | Sony Corporation | Video parameter set (VPS) syntax re-ordering for easy access of extension parameters |
US9392268B2 (en) | 2012-09-28 | 2016-07-12 | Qualcomm Incorporated | Using base layer motion information |
US20140140406A1 (en) * | 2012-11-16 | 2014-05-22 | General Instrument Corporation | Devices and methods for processing of non-idr related syntax for high efficiency video coding (hevc) |
US10219006B2 (en) | 2013-01-04 | 2019-02-26 | Sony Corporation | JCTVC-L0226: VPS and VPS_extension updates |
US10419778B2 (en) | 2013-01-04 | 2019-09-17 | Sony Corporation | JCTVC-L0227: VPS_extension with updates of profile-tier-level syntax structure |
MY169901A (en) * | 2013-04-12 | 2019-06-13 | Ericsson Telefon Ab L M | Constructing inter-layer reference picture lists |
JP6361866B2 (en) * | 2013-05-09 | 2018-07-25 | サン パテント トラスト | Image processing method and image processing apparatus |
US9648326B2 (en) * | 2013-07-02 | 2017-05-09 | Qualcomm Incorporated | Optimizations on inter-layer prediction signalling for multi-layer video coding |
US11284103B2 (en) | 2014-01-17 | 2022-03-22 | Microsoft Technology Licensing, Llc | Intra block copy prediction with asymmetric partitions and encoder-side search patterns, search ranges and approaches to partitioning |
EP3158734A1 (en) * | 2014-06-19 | 2017-04-26 | Microsoft Technology Licensing, LLC | Unified intra block copy and inter prediction modes |
EP3917146A1 (en) | 2014-09-30 | 2021-12-01 | Microsoft Technology Licensing, LLC | Rules for intra-picture prediction modes when wavefront parallel processing is enabled |
EP3313079B1 (en) * | 2015-06-18 | 2021-09-01 | LG Electronics Inc. | Image filtering method in image coding system |
US20160373763A1 (en) * | 2015-06-18 | 2016-12-22 | Mediatek Inc. | Inter prediction method with constrained reference frame acquisition and associated inter prediction device |
US10623755B2 (en) * | 2016-05-23 | 2020-04-14 | Qualcomm Incorporated | End of sequence and end of bitstream NAL units in separate file tracks |
CN108668132A (en) * | 2018-05-07 | 2018-10-16 | 联发科技(新加坡)私人有限公司 | Method for managing a decoded picture buffer, image decoder, and storage medium |
CN110572712B (en) * | 2018-06-05 | 2021-11-02 | 杭州海康威视数字技术股份有限公司 | Decoding method and device |
BR112021002832A2 (en) | 2018-08-17 | 2021-05-04 | Huawei Technologies Co., Ltd. | reference image management in video encoding |
US11463736B2 (en) * | 2018-09-21 | 2022-10-04 | Sharp Kabushiki Kaisha | Systems and methods for signaling reference pictures in video coding |
US11196988B2 (en) | 2018-12-17 | 2021-12-07 | Apple Inc. | Reference picture management and list construction |
US11303913B2 (en) * | 2019-06-19 | 2022-04-12 | Qualcomm Incorporated | Decoded picture buffer indexing |
US11968374B2 (en) * | 2019-07-03 | 2024-04-23 | Beijing Xiaomi Mobile Software Co., Ltd. | Method and device for coding and decoding |
CN114503581A (en) | 2019-08-06 | 2022-05-13 | Op方案有限责任公司 | Adaptive block-based resolution management |
MX2022001593A (en) * | 2019-08-06 | 2022-03-11 | Op Solutions Llc | Adaptive resolution management signaling. |
WO2021026334A1 (en) * | 2019-08-06 | 2021-02-11 | Op Solutions | Adaptive resolution management signaling |
EP4011084A4 (en) | 2019-08-06 | 2023-08-09 | OP Solutions | Adaptive resolution management prediction rescaling |
US20220353536A1 (en) * | 2019-08-22 | 2022-11-03 | Sharp Kabushiki Kaisha | Systems and methods for signaling picture information in video coding |
BR112022005530A2 (en) * | 2019-09-24 | 2022-06-21 | Fraunhofer Ges Forschung | Decoding apparatus, encoding apparatus, video decoding method and video encoding method |
WO2021091253A1 (en) * | 2019-11-05 | 2021-05-14 | 엘지전자 주식회사 | Slice type-based image/video coding method and apparatus |
MX2022005534A (en) | 2019-11-08 | 2022-08-04 | Op Solutions Llc | Methods and systems for adaptive cropping. |
US11785214B2 (en) * | 2019-11-14 | 2023-10-10 | Mediatek Singapore Pte. Ltd. | Specifying video picture information |
CN115988219B (en) | 2020-01-12 | 2024-01-16 | 华为技术有限公司 | Method and apparatus for coordinating weighted prediction using non-rectangular fusion patterns |
US11533472B2 (en) * | 2020-05-21 | 2022-12-20 | Alibaba Group Holding Limited | Method for reference picture processing in video coding |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100238822A1 (en) * | 2009-03-18 | 2010-09-23 | Kyohei Koyabu | Image processing device, image processing method, information processing device, and information processing method |
US20110216243A1 (en) * | 2010-03-05 | 2011-09-08 | Canon Kabushiki Kaisha | Image processing apparatus capable of extracting frame image data from video data and method for controlling the same |
US20130089135A1 (en) * | 2011-10-10 | 2013-04-11 | Qualcomm Incorporated | Adaptive frame size support in advanced video codecs |
Family Cites Families (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI249356B (en) | 2002-11-06 | 2006-02-11 | Nokia Corp | Picture buffering for prediction references and display |
KR101094323B1 (en) | 2003-09-17 | 2011-12-19 | 톰슨 라이센싱 | Adaptive reference picture generation |
FI115589B (en) | 2003-10-14 | 2005-05-31 | Nokia Corp | Encoding and decoding redundant images |
US7400681B2 (en) | 2003-11-28 | 2008-07-15 | Scientific-Atlanta, Inc. | Low-complexity motion vector prediction for video codec with two lists of reference pictures |
US20050254526A1 (en) | 2004-05-12 | 2005-11-17 | Nokia Corporation | Parameter sets update in streaming applications |
KR100883603B1 (en) | 2005-04-13 | 2009-02-13 | 엘지전자 주식회사 | Method and apparatus for decoding video signal using reference pictures |
EP1869888B1 (en) | 2005-04-13 | 2016-07-06 | Nokia Technologies Oy | Method, device and system for effectively coding and decoding of video data |
KR100825743B1 (en) | 2005-11-15 | 2008-04-29 | 한국전자통신연구원 | A method of scalable video coding for varying spatial scalability of bitstream in real time and a codec using the same |
US8582663B2 (en) * | 2006-08-08 | 2013-11-12 | Core Wireless Licensing S.A.R.L. | Method, device, and system for multiplexing of video streams |
WO2008048499A2 (en) * | 2006-10-13 | 2008-04-24 | Thomson Licensing | Reference picture list management syntax for multiple view video coding |
US20080170793A1 (en) | 2007-01-12 | 2008-07-17 | Mitsubishi Electric Corporation | Image encoding device and image encoding method |
JP5026092B2 (en) | 2007-01-12 | 2012-09-12 | 三菱電機株式会社 | Moving picture decoding apparatus and moving picture decoding method |
US20090141809A1 (en) | 2007-12-04 | 2009-06-04 | Sony Corporation And Sony Electronics Inc. | Extension to the AVC standard to support the encoding and storage of high resolution digital still pictures in parallel with video |
KR101431543B1 (en) | 2008-01-21 | 2014-08-21 | 삼성전자주식회사 | Apparatus and method of encoding/decoding video |
WO2010086500A1 (en) | 2009-01-28 | 2010-08-05 | Nokia Corporation | Method and apparatus for video coding and decoding |
WO2010092740A1 (en) | 2009-02-10 | 2010-08-19 | パナソニック株式会社 | Image processing apparatus, image processing method, program and integrated circuit |
US20100218232A1 (en) | 2009-02-25 | 2010-08-26 | Cisco Technology, Inc. | Signalling of auxiliary information that assists processing of video according to various formats |
US20110013692A1 (en) | 2009-03-29 | 2011-01-20 | Cohen Robert A | Adaptive Video Transcoding |
US9008176B2 (en) * | 2011-01-22 | 2015-04-14 | Qualcomm Incorporated | Combined reference picture list construction for video coding |
US9008181B2 (en) * | 2011-01-24 | 2015-04-14 | Qualcomm Incorporated | Single reference picture list utilization for interprediction video coding |
US20120328005A1 (en) * | 2011-06-22 | 2012-12-27 | General Instrument Corporation | Construction of combined list using temporal distance |
MX2013014857A (en) * | 2011-06-30 | 2014-03-26 | Ericsson Telefon Ab L M | Reference picture signaling. |
MY166191A (en) * | 2011-06-30 | 2018-06-07 | Ericsson Telefon Ab L M | Absolute or explicit reference picture signaling |
US20140169449A1 (en) * | 2011-07-05 | 2014-06-19 | Telefonaktiebolaget L M Ericsson (Publ) | Reference picture management for layered video |
ES2800049T3 (en) * | 2011-08-25 | 2020-12-23 | Sun Patent Trust | Procedures and apparatus for encoding and decoding video using an updated buffer description |
ES2625097T3 (en) * | 2011-09-07 | 2017-07-18 | Sun Patent Trust | Image coding method and image coding apparatus |
US9462268B2 (en) | 2012-10-09 | 2016-10-04 | Cisco Technology, Inc. | Output management of prior decoded pictures at picture format transitions in bitstreams |
- 2012-10-08 US US13/647,257 patent/US9451284B2/en not_active Expired - Fee Related
- 2012-10-09 EP EP12784111.2A patent/EP2767088A1/en not_active Withdrawn
- 2012-10-09 US US13/648,174 patent/US20130089154A1/en not_active Abandoned
- 2012-10-09 US US13/648,189 patent/US20130089135A1/en not_active Abandoned
- 2012-10-09 WO PCT/US2012/059346 patent/WO2013055681A1/en active Application Filing
- 2012-10-09 KR KR1020147012508A patent/KR101569305B1/en not_active IP Right Cessation
- 2012-10-09 CN CN201280049743.7A patent/CN103959793A/en active Pending
- 2012-10-09 JP JP2014535786A patent/JP5972984B2/en not_active Expired - Fee Related
- 2012-10-10 WO PCT/US2012/059577 patent/WO2013055806A1/en active Application Filing
- 2012-10-10 WO PCT/US2012/059579 patent/WO2013055808A1/en active Application Filing
Cited By (48)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11962791B2 (en) | 2011-10-28 | 2024-04-16 | Samsung Electronics Co., Ltd. | Method for inter prediction and device therefor, and method for motion compensation and device therefor |
US9936197B2 (en) | 2011-10-28 | 2018-04-03 | Samsung Electronics Co., Ltd. | Method for inter prediction and device therefor, and method for motion compensation and device therefor |
US10575002B2 (en) | 2011-10-28 | 2020-02-25 | Samsung Electronics Co., Ltd. | Method for inter prediction and device therefor, and method for motion compensation and device therefor |
US10819989B2 (en) | 2011-10-28 | 2020-10-27 | Samsung Electronics Co., Ltd. | Method for inter prediction and device therefor, and method for motion compensation and device therefor |
US11206414B2 (en) | 2011-10-28 | 2021-12-21 | Samsung Electronics Co., Ltd. | Method for inter prediction and device therefor, and method for motion compensation and device therefor |
US9774851B2 (en) * | 2011-11-03 | 2017-09-26 | Sun Patent Trust | Quantization parameter for blocks coded in the PCM mode |
US20140036998A1 (en) * | 2011-11-03 | 2014-02-06 | Matthias Narroschke | Quantization parameter for blocks coded in the PCM mode |
US20140003504A1 (en) * | 2012-07-02 | 2014-01-02 | Nokia Corporation | Apparatus, a Method and a Computer Program for Video Coding and Decoding |
US9554146B2 (en) | 2012-09-21 | 2017-01-24 | Qualcomm Incorporated | Indication and activation of parameter sets for video coding |
US9426462B2 (en) | 2012-09-21 | 2016-08-23 | Qualcomm Incorporated | Indication and activation of parameter sets for video coding |
US20140185688A1 (en) * | 2012-12-28 | 2014-07-03 | Canon Kabushiki Kaisha | Coding system transform apparatus, coding system transform method, and storage medium |
US10424050B2 (en) * | 2013-05-15 | 2019-09-24 | Sony Semiconductor Solutions Corporation | Image processing apparatus and image processing method |
US20160086312A1 (en) * | 2013-05-15 | 2016-03-24 | Sony Corporation | Image processing apparatus and image processing method |
US10051281B2 (en) * | 2014-05-22 | 2018-08-14 | Apple Inc. | Video coding system with efficient processing of zooming transitions in video |
US20150341654A1 (en) * | 2014-05-22 | 2015-11-26 | Apple Inc. | Video coding system with efficient processing of zooming transitions in video |
US20170105018A1 (en) * | 2014-06-10 | 2017-04-13 | Hangzhou Hikvision Digital Technology Co., Ltd. | Image encoding method and device and image decoding method and device |
US10659800B2 (en) * | 2014-06-10 | 2020-05-19 | Hangzhou Hikvision Digital Technology Co., Ltd. | Inter prediction method and device |
US20160330453A1 (en) * | 2015-05-05 | 2016-11-10 | Cisco Technology, Inc. | Parameter Set Header |
US11758148B2 (en) | 2016-12-12 | 2023-09-12 | Netflix, Inc. | Device-consistent techniques for predicting absolute perceptual video quality |
US11503304B2 (en) | 2016-12-12 | 2022-11-15 | Netflix, Inc. | Source-consistent techniques for predicting absolute perceptual video quality |
US10834406B2 (en) | 2016-12-12 | 2020-11-10 | Netflix, Inc. | Device-consistent techniques for predicting absolute perceptual video quality |
US10798387B2 (en) * | 2016-12-12 | 2020-10-06 | Netflix, Inc. | Source-consistent techniques for predicting absolute perceptual video quality |
CN108449564A (en) * | 2017-02-16 | 2018-08-24 | 北京视联动力国际信息技术有限公司 | Method and system for supporting adaptive video source resolution in a video decoding chip |
US20180242015A1 (en) * | 2017-02-23 | 2018-08-23 | Netflix, Inc. | Techniques for selecting resolutions for encoding different shot sequences |
US11444999B2 (en) | 2017-02-23 | 2022-09-13 | Netflix, Inc. | Iterative techniques for generating multiple encoded versions of a media title |
US10897618B2 (en) | 2017-02-23 | 2021-01-19 | Netflix, Inc. | Techniques for positioning key frames within encoded video sequences |
US10917644B2 (en) | 2017-02-23 | 2021-02-09 | Netflix, Inc. | Iterative techniques for encoding video content |
CN110313183A (en) * | 2017-02-23 | 2019-10-08 | 奈飞公司 | Iterative techniques for encoding video content |
US11153585B2 (en) | 2017-02-23 | 2021-10-19 | Netflix, Inc. | Optimizing encoding operations when generating encoded versions of a media title |
US11166034B2 (en) | 2017-02-23 | 2021-11-02 | Netflix, Inc. | Comparing video encoders/decoders using shot-based encoding and a perceptual visual quality metric |
US11184621B2 (en) * | 2017-02-23 | 2021-11-23 | Netflix, Inc. | Techniques for selecting resolutions for encoding different shot sequences |
US11870945B2 (en) | 2017-02-23 | 2024-01-09 | Netflix, Inc. | Comparing video encoders/decoders using shot-based encoding and a perceptual visual quality metric |
US11871002B2 (en) | 2017-02-23 | 2024-01-09 | Netflix, Inc. | Iterative techniques for encoding video content |
US10715814B2 (en) | 2017-02-23 | 2020-07-14 | Netflix, Inc. | Techniques for optimizing encoding parameters for different shot sequences |
US11818375B2 (en) | 2017-02-23 | 2023-11-14 | Netflix, Inc. | Optimizing encoding operations when generating encoded versions of a media title |
US11758146B2 (en) | 2017-02-23 | 2023-09-12 | Netflix, Inc. | Techniques for positioning key frames within encoded video sequences |
US10742708B2 (en) | 2017-02-23 | 2020-08-11 | Netflix, Inc. | Iterative techniques for generating multiple encoded versions of a media title |
US10666992B2 (en) | 2017-07-18 | 2020-05-26 | Netflix, Inc. | Encoding techniques for optimizing distortion and bitrate |
US11910039B2 (en) | 2017-07-18 | 2024-02-20 | Netflix, Inc. | Encoding technique for optimizing distortion and bitrate |
US20200186795A1 (en) * | 2018-12-07 | 2020-06-11 | Beijing Dajia Internet Information Technology Co., Ltd. | Video coding using multi-resolution reference picture management |
US20220124317A1 (en) * | 2018-12-07 | 2022-04-21 | Beijing Dajia Internet Information Technology Co., Ltd. | Video coding using multi-resolution reference picture management |
CN113810689A (en) * | 2018-12-07 | 2021-12-17 | 北京达佳互联信息技术有限公司 | Video coding and decoding using multi-resolution reference picture management |
US12022059B2 (en) * | 2018-12-07 | 2024-06-25 | Beijing Dajia Internet Information Technology Co., Ltd. | Video coding using multi-resolution reference picture management |
US11652985B2 (en) | 2018-12-31 | 2023-05-16 | Huawei Technologies Co., Ltd. | Support of adaptive resolution change in video coding |
CN113228666A (en) * | 2018-12-31 | 2021-08-06 | 华为技术有限公司 | Supporting adaptive resolution change in video coding and decoding |
US20220132153A1 (en) * | 2019-01-02 | 2022-04-28 | Tencent America LLC | Adaptive picture resolution rescaling for inter-prediction and display |
US20210392349A1 (en) * | 2019-03-01 | 2021-12-16 | Alibaba Group Holding Limited | Adaptive Resolution Video Coding |
US11611768B2 (en) * | 2019-08-06 | 2023-03-21 | Op Solutions, Llc | Implicit signaling of adaptive resolution management based on frame type |
Also Published As
Publication number | Publication date |
---|---|
JP5972984B2 (en) | 2016-08-17 |
WO2013055808A1 (en) | 2013-04-18 |
WO2013055806A1 (en) | 2013-04-18 |
KR20140093229A (en) | 2014-07-25 |
US20130089134A1 (en) | 2013-04-11 |
JP2014532374A (en) | 2014-12-04 |
EP2767088A1 (en) | 2014-08-20 |
WO2013055681A1 (en) | 2013-04-18 |
US9451284B2 (en) | 2016-09-20 |
US20130089135A1 (en) | 2013-04-11 |
KR101569305B1 (en) | 2015-11-13 |
CN103959793A (en) | 2014-07-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11490119B2 (en) | Decoded picture buffer management | |
US20130089154A1 (en) | Adaptive frame size support in advanced video codecs | |
US9532052B2 (en) | Cross-layer POC alignment for multi-layer bitstreams that may include non-aligned IRAP pictures |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: QUALCOMM INCORPORATED, CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHEN, YING;WANG, YE-KUI;KARCZEWICZ, MARTA;REEL/FRAME:029472/0887
Effective date: 20121206
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |