CN113615184A - Explicit signaling of extended long-term reference picture retention - Google Patents
Explicit signaling of extended long-term reference picture retention
- Publication number: CN113615184A (application CN202080021301.6A)
- Authority: CN (China)
- Prior art keywords: long-term reference, decoder, frame, bitstream
- Legal status: Granted
Classifications
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
- H04N19/70—Methods or arrangements characterised by syntax aspects related to video coding, e.g. related to compression standards
- H04N19/109—Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
- H04N19/124—Quantisation
- H04N19/13—Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
- H04N19/159—Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
- H04N19/44—Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
- H04N19/58—Motion compensation with long-term prediction, i.e. the reference frame for a current frame not being the temporally closest one
- H04N19/593—Predictive coding involving spatial prediction techniques
- H04N19/86—Pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
- H04N19/96—Tree coding, e.g. quad-tree coding

All of the above fall under H04N19/00, methods or arrangements for coding, decoding, compressing or decompressing digital video signals.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
A decoder includes circuitry configured to receive a bitstream, store a plurality of long-term reference frames in a reference list, retain the long-term reference frames in the reference list for a length of time based on a retention time, and decode at least a portion of a video using the long-term reference frames retained in the reference list. Related apparatus, systems, techniques, and articles are also described.
Description
Cross Reference to Related Applications
This application claims priority to U.S. Provisional Patent Application No. 62/797,806, entitled "EXPLICIT SIGNALING OF EXTENDED LONG TERM REFERENCE PICTURE RETENTION," filed on January 28, 2019, which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates generally to the field of video compression. In particular, the present invention relates to explicit signaling of extended long-term reference picture retention.
Background
A video codec may include electronic circuitry or software that compresses or decompresses digital video. It can convert uncompressed video to a compressed format and vice versa. In video compression, the device that compresses the video (and/or performs some function thereof) may generally be referred to as an encoder, and the device that decompresses the video (and/or performs some function thereof) may be referred to as a decoder.
The format of the compressed data may conform to a standard video compression specification. Compression can be lossy in that the compressed video lacks some of the information present in the original video. As a result, the decompressed video may be of lower quality than the original uncompressed video because there is insufficient information to accurately reconstruct the original video.
There may be a complex relationship between video quality, the amount of data used to represent the video (e.g., as determined by bit rate), the complexity of encoding and decoding algorithms, susceptibility to data loss and errors, ease of editing, random access, end-to-end delay (e.g., latency), and so forth.
Motion compensation may include an approach to predict a video frame or a portion thereof, given a reference frame (e.g., a previous and/or future frame), by accounting for motion of the camera and/or of objects in the video. It can be employed in the encoding and decoding of video data for video compression, for example under standards such as MPEG-2 and MPEG-4 Part 10 (also known as Advanced Video Coding (AVC) and H.264). Motion compensation may describe a picture in terms of a transformation of a reference picture to the current picture. The reference picture may be previous in time or from the future when compared to the current picture, or it may include a long-term reference (LTR) frame. Compression efficiency may be improved when images can be accurately synthesized from previously transmitted and/or stored images.
Long-term reference (LTR) frames have been used in video coding standards such as MPEG-2, H.264 (also known as AVC or MPEG-4 Part 10), and H.265 (also known as High Efficiency Video Coding (HEVC)). A frame marked as an LTR frame in the video bitstream may be used as a reference until it is explicitly removed by bitstream signaling. LTR frames improve prediction and compression efficiency in scenes that have a static background over a long period (e.g., the background in video conferencing or in parking-lot surveillance video). Over time, however, the background of a scene may gradually change (e.g., a car parked in an open space becomes part of the background scene). LTR frames are therefore updated to allow better prediction and improved compression performance.
Current standards, such as H.264 and H.265, allow an LTR frame to be updated by signaling that a newly decoded frame is to be saved and made available as a reference frame. The update is signaled by the encoder, and the entire frame is updated; the cost of updating an entire frame can be high. Moreover, when the LTR frame is updated, the previous LTR frame is discarded. If the static background associated with the discarded LTR frame appears again in the video (e.g., when the video switches from a first scene to a second scene and then back to the first scene), the previous LTR frame must be encoded again in the bitstream, which reduces compression efficiency.
Disclosure of Invention
In one aspect, a decoder includes circuitry configured to receive a bitstream, store a plurality of long-term reference frames in a reference list, retain the long-term reference frames in the reference list for a length of time based on a retention time, and decode at least a portion of video using the long-term reference frames retained in the reference list.
In another aspect, a method includes receiving, by a decoder, a bitstream. The method includes storing, by a decoder, a plurality of long-term reference frames in a reference list. The method includes retaining, by a decoder, a long-term reference frame in a reference list for a length of time based on a retention time. The method includes decoding, by a decoder, at least a portion of a video using long-term reference frames retained in a reference list.
The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims.
Drawings
For the purpose of illustrating the invention, the drawings show aspects of one or more embodiments of the invention. It should be understood, however, that the present invention is not limited to the precise arrangements and instrumentalities shown in the attached drawings, wherein:
FIG. 1 shows an example reference list for frame prediction over a long period of time;
FIG. 2 is a process flow diagram illustrating an example process of extended long-term reference (eLTR) frame retention, in which eLTR frames are retained in a reference list;
FIG. 3 is a system block diagram illustrating an example decoder capable of decoding a bitstream using eLTR frames retained in a reference list;
FIG. 4 is a process flow diagram illustrating an example process for encoding video using eLTR frames retained in a reference list that can improve compression efficiency compared to some prior approaches in accordance with some aspects of the present subject matter;
FIG. 5 is a system block diagram illustrating an example video encoder capable of signaling eLTR retention in a reference list; and
FIG. 6 is a block diagram of a computing system that may be used to implement any one or more of the methods disclosed herein and any one or more portions thereof.
The drawings are not necessarily to scale and may be shown with phantom lines, diagrammatic representations and fragmentary views. In certain instances, details that are not necessary for an understanding of the embodiments or that render other details difficult to perceive may have been omitted. Like reference symbols in the various drawings indicate like elements.
Detailed Description
Where certain portions of a frame are occluded and then uncovered repeatedly over time, long-term reference (LTR) pictures may be used to better predict the video frame. Conventionally, an LTR is used for the duration of a scene or a group of pictures and is then replaced or discarded. Some embodiments of the current subject matter extend the utility of the LTR by selecting the best candidate LTR to retain in the reference list. In some embodiments, an explicitly signaled extended long-term reference (eLTR) frame may be retained in the reference list for an explicitly signaled duration. Some embodiments of the present subject matter may provide significant compression efficiency gains compared to some existing approaches.
Some embodiments of the present subject matter may enable selection and retention of eLTR frames in video coding. The eLTR may be retained in the picture reference list, which may be used by the current frame or a group of frames for prediction. The eLTR may remain in the reference list even though all other frames in the list may change over a relatively short period of time. For example, FIG. 1 shows an example reference list for prediction from frames over a long time period. As a non-limiting illustrative example, the video frames shown shaded may be reconstructed using reference frames. The reference list may contain frames that change over time as well as the retained eLTR.
In some embodiments, still referring to FIG. 1, the encoder performs eLTR selection and retention time calculation. The selected frame and retention time may be signaled to the decoder, for example, using a pair (eLTRn, TRn) indicating the index of the eLTR frame (eLTRn) and the retention time of frame n (TRn). The decoder may retain frame eLTRn in the reference list for the time period TRn. After the eLTRn frame has resided in the reference list for at least TRn, it may be marked as unavailable for further use. In some embodiments, the eLTRn frame may remain in memory, but in an unavailable state. In some implementations, the encoder may explicitly signal the decoder to mark the eLTRn frame as available or unavailable. For example, after the retention time TRn has elapsed, an eLTRn frame previously marked as unavailable may be marked as available again. This property may enable an eLTRn to be reused in the future, for example for video containing scenes that switch back and forth. In some embodiments, the encoder may include a signal in the bitstream for the decoder to remove the eLTRn frame from memory. The decoder may remove the eLTRn frame from the reference list and from memory based on such a signal.
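As a non-limiting illustration of the retention behavior just described, the following Python sketch models a decoder-side reference list. It is a minimal sketch under the assumptions that retention time is counted in decoded frames and that expired frames stay in memory in an unavailable state; all names (ELTRReferenceList, retain, and so on) are hypothetical and are not part of any standard API.

```python
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class ELTREntry:
    """A retained extended long-term reference (eLTR) frame."""
    frame: Any            # decoded picture data
    retention_time: int   # TRn, counted here in decoded frames
    age: int = 0          # frames decoded since retention began
    available: bool = True

class ELTRReferenceList:
    """Hypothetical decoder-side model of eLTR retention (illustrative only)."""

    def __init__(self) -> None:
        self.entries: Dict[int, ELTREntry] = {}  # eLTRn -> entry

    def retain(self, eltr_n: int, frame: Any, tr_n: int) -> None:
        """Store an eLTR frame together with its signaled retention time TRn."""
        self.entries[eltr_n] = ELTREntry(frame, tr_n)

    def on_frame_decoded(self) -> None:
        """Advance time by one decoded frame; entries whose retention time has
        elapsed are marked unavailable but stay in memory, so that a later
        explicit signal can make them available again."""
        for entry in self.entries.values():
            entry.age += 1
            if entry.age >= entry.retention_time:
                entry.available = False

    def mark(self, eltr_n: int, available: bool) -> None:
        """Apply an explicit availability signal from the bitstream."""
        self.entries[eltr_n].available = available

    def remove(self, eltr_n: int) -> None:
        """Apply an explicit removal signal: drop the frame from list and memory."""
        self.entries.pop(eltr_n, None)

    def reference(self, eltr_n: int):
        """Return the frame if it is currently usable as a prediction reference."""
        entry = self.entries.get(eltr_n)
        return entry.frame if entry is not None and entry.available else None
```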
FIG. 2 is a process flow diagram illustrating a non-limiting example of a process 200 of eLTR frame retention, in which eLTR frames are retained in a reference list. Such eLTR retention may improve compression efficiency compared to some existing video encoding and decoding approaches.
In step 210, still referring to FIG. 2, the decoder receives a bitstream. The bitstream may include, for example, data found in a stream of bits that is the input to a decoder when data compression is employed, and may include information necessary to decode the video. Receiving may include extracting and/or parsing blocks and associated signaling information from the bitstream. In some implementations, receiving the bitstream may include parsing eLTR frames, the indices of those frames (eLTRn), and the associated retention times (TRn), where a retention time is based on frames decoded and/or time within the video.
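What such parsing could look like is sketched below. The field layout is invented purely for illustration, since no bitstream syntax is specified here: a one-byte pair count followed by big-endian 16-bit (eLTRn, TRn) fields.

```python
import struct
from typing import List, Tuple

def parse_eltr_signals(payload: bytes) -> List[Tuple[int, int]]:
    """Parse hypothetical (eLTRn, TRn) pairs from a header payload.

    Assumed layout (illustrative only): one byte giving the number of pairs,
    then, per pair, two big-endian 16-bit fields: eLTRn followed by TRn."""
    count = payload[0]
    pairs = []
    offset = 1
    for _ in range(count):
        eltr_n, tr_n = struct.unpack_from(">HH", payload, offset)
        pairs.append((eltr_n, tr_n))
        offset += 4
    return pairs

# Example: one pair, eLTRn = 3, retained for TRn = 600 decoded frames.
assert parse_eltr_signals(bytes([1, 0, 3, 2, 88])) == [(3, 600)]
```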
With continued reference to FIG. 2, at step 220, the eLTR frames may be stored in a reference picture list.
At step 230, still referring to FIG. 2, a stored eLTR frame may be retained (e.g., held) in the reference list for a length of time based on its associated retention time (TRn).
At step 240, still referring to FIG. 2, at least a portion of the video may be decoded from the bitstream. Decoding may include decoding a current block. For example, a received current coded block contained in the bitstream may be decoded, for example, by using inter prediction. Decoding via inter prediction may include using a previous frame, a future frame, and/or an eLTR frame as a reference for computing a prediction, which may be combined with a residual contained in the bitstream.
With further reference to FIG. 2, for a subsequent current block, the eLTR frame may be used as a reference frame for inter prediction, as shown in the sketch after this paragraph. For example, a second coded block may be received. Whether inter prediction mode is enabled for the second coded block may be determined; the determination may include receiving an explicit signal from the bitstream indicating whether inter prediction mode is enabled. A second decoded block may be determined using the eLTR frame as a reference frame and according to the inter prediction mode. For example, decoding via inter prediction may include using the eLTR frame as a reference to compute a prediction, which may be combined with a residual contained in the bitstream.
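A minimal sketch of this inter-prediction step is shown below, assuming whole-pixel motion vectors, 8-bit samples, and in-bounds block access; sub-pixel interpolation, weighted prediction, and boundary handling are omitted. A retained eLTR frame obtained from the reference list (e.g., via reference(eLTRn) in the earlier sketch) could be passed as reference_frame.

```python
import numpy as np

def decode_inter_block(reference_frame: np.ndarray, residual: np.ndarray,
                       x: int, y: int, mv_x: int, mv_y: int) -> np.ndarray:
    """Reconstruct one block as motion-compensated prediction plus residual.

    reference_frame may be any frame in the reference list, including a
    retained eLTR frame. Assumes whole-pixel motion and an in-bounds block."""
    h, w = residual.shape
    prediction = reference_frame[y + mv_y : y + mv_y + h,
                                 x + mv_x : x + mv_x + w]
    reconstructed = prediction.astype(np.int32) + residual
    return np.clip(reconstructed, 0, 255).astype(np.uint8)  # 8-bit samples assumed
```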
FIG. 3 is a system block diagram illustrating a non-limiting example of a decoder 300 capable of decoding a bitstream 370 using eLTR frames retained in a reference list. The decoder 300 may include an entropy decoder processor 310, an inverse quantization and inverse transform processor 320, a deblocking filter 330, a frame buffer 340, a motion compensation processor 350, and an intra prediction processor 360. In some embodiments, the bitstream 370 may include parameters (e.g., fields in a bitstream header) that represent an eLTR index (eLTRn) and a retention time (TRn). The motion compensation processor 350 may reconstruct pixel information using eLTR frames and retain each eLTR frame according to its associated retention time (TRn). For example, when an eLTR frame (eLTRn) is received and retained in the reference list, it may be used as a reference for inter prediction mode at least during the associated retention time.
In operation, still referring to FIG. 3, the bitstream 370 may be received by the decoder 300 and input to the entropy decoder processor 310, which may entropy decode the bitstream into quantized coefficients. The quantized coefficients may be provided to the inverse quantization and inverse transform processor 320, which may perform inverse quantization and an inverse transform to create a residual signal; the residual signal may be added to the output of the motion compensation processor 350 or the intra prediction processor 360, according to the processing mode. The outputs of the motion compensation processor 350 and the intra prediction processor 360 may include block predictions based on previously decoded blocks and/or eLTR frames retained in the reference list. The sum of the prediction and the residual may be processed by the deblocking filter 330 and stored in the frame buffer 340.
FIG. 4 is a process flow diagram illustrating a non-limiting example of a process 400 of encoding video with eLTR frames retained in a reference list, which may improve compression efficiency compared to some existing approaches, in accordance with some aspects of the present subject matter. At step 410, a sequence of video frames may be encoded, including determining one or more eLTR frames. At step 420, an eLTR frame retention time (TRn) may be determined, for example, based on the length of time an eLTR frame is used by the encoder/decoder, where, for example, time is measured in frames decoded in the video.
At step 430, still referring to FIG. 4, additional signaling parameters may be determined. For example, it may be determined whether and when each eLTR frame is to be marked as unavailable or available, and whether and when each eLTR frame should be removed from memory.
At step 440, still referring to FIG. 4, the eLTR retention times and additional signaling parameters may be included in the bitstream.
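One plausible encoder-side derivation of these values is sketched below, under the assumption that the encoder already knows (e.g., from look-ahead or a completed analysis pass) when each eLTR frame entered the reference list and at which frames it was used as a reference; the helper and its inputs are illustrative, not part of any standard.

```python
from typing import Dict, List, Tuple

def compute_retention_signals(
    eltr_usage: Dict[int, Tuple[int, List[int]]]
) -> List[Tuple[int, int]]:
    """Derive (eLTRn, TRn) signaling pairs from observed reference usage.

    eltr_usage maps each eLTR index to (frame at which the eLTR entered the
    reference list, frame numbers at which it was used as a reference).
    TRn is expressed in decoded frames, per the description above."""
    signals = []
    for eltr_n, (start_frame, used_at) in eltr_usage.items():
        tr_n = max(used_at) - start_frame + 1  # retain through the last use
        signals.append((eltr_n, tr_n))
    return signals

# Example: eLTR 0 entered the list at frame 10 and was last used at frame 250,
# so it would be retained for 241 decoded frames.
assert compute_retention_signals({0: (10, [12, 40, 250])}) == [(0, 241)]
```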
FIG. 5 is a system block diagram illustrating a non-limiting example of a video encoder 500 capable of signaling eLTR retention in a reference list. The example video encoder 500 receives an input video 505, which may be initially segmented or divided according to a processing scheme, such as a tree-structured macroblock partitioning scheme (e.g., quadtree plus binary tree). An example of a tree-structured macroblock partitioning scheme may include partitioning a picture frame into large block elements, referred to for purposes of this disclosure as coding tree units (CTUs). In some embodiments, each CTU may be further partitioned one or more times into a number of sub-blocks called coding units (CUs). The result of this partitioning may include a group of sub-blocks, referred to for purposes of this disclosure as prediction units (PUs). Transform units (TUs) may also be utilized.
Still referring to FIG. 5, the example video encoder 500 may include an intra prediction processor 515, a motion estimation/compensation processor 520 (also referred to as an inter prediction processor) capable of supporting eLTR frame retention, a transform/quantization processor 525, an inverse quantization/inverse transform processor 530, an in-loop filter 535, a decoded picture buffer 540, and an entropy encoding processor 545. In some embodiments, the motion estimation/compensation processor 520 may determine the eLTR retention times and additional signaling parameters. Bitstream parameters representing eLTR frame retention and the additional parameters may be input to the entropy encoding processor 545 for inclusion in the output bitstream 550.
In operation, and with continued reference to FIG. 5, for each block of a frame of the input video 505, it may be determined whether the block is to be processed via intra-picture prediction or via motion estimation/compensation. A block may be provided to the intra prediction processor 515 or to the motion estimation/compensation processor 520. If the block is to be processed via intra prediction, the intra prediction processor 515 may perform processing to output a predictor. If the block is to be processed via motion estimation/compensation, the motion estimation/compensation processor 520 may perform processing that includes using an eLTR frame as a reference for inter prediction, if applicable.
With continued reference to FIG. 5, a residual may be formed by subtracting the predictor from the input video. The residual may be received by the transform/quantization processor 525, which may perform a transform process (e.g., a discrete cosine transform (DCT)) to produce coefficients that may then be quantized. The quantized coefficients and any associated signaling information may be provided to the entropy encoding processor 545 for entropy encoding and inclusion in the output bitstream 550; the entropy encoding processor 545 may support encoding of signaling information related to eLTR frame retention. In addition, the quantized coefficients may be provided to the inverse quantization/inverse transform processor 530, which may reproduce pixels that may be combined with the predictor and processed by the in-loop filter 535, the output of which may be stored in the decoded picture buffer 540 for use by the motion estimation/compensation processor 520 capable of supporting eLTR frame retention.
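As a toy illustration of this residual path (not the actual transform or quantizer design, which is not specified here), the following sketch applies a 2-D DCT with uniform scalar quantization and mirrors the decoder-side inverse steps:

```python
import numpy as np
from scipy.fft import dctn, idctn

def residual_roundtrip(block: np.ndarray, prediction: np.ndarray,
                       qstep: float = 16.0):
    """Toy residual pipeline: transform, quantize, then reconstruct.

    Returns the quantized coefficients (what would be entropy coded) and
    the reconstructed residual (what the decoder would add back)."""
    residual = block.astype(np.float64) - prediction.astype(np.float64)
    coeffs = dctn(residual, norm="ortho")            # forward 2-D DCT
    quantized = np.round(coeffs / qstep)             # uniform scalar quantizer
    recon_residual = idctn(quantized * qstep, norm="ortho")  # inverse path
    return quantized.astype(np.int32), recon_residual
```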
Still referring to FIG. 5, although some variations have been described in detail above, other modifications or additions are possible. For example, in some implementations, the current block may include any symmetric block (8×8, 16×16, 32×32, 64×64, 128×128, and so on) as well as any asymmetric block (8×4, 16×8, and so on).
In some embodiments, and with continued reference to FIG. 5, a quadtree plus binary decision tree (QTBT) may be implemented. In the QTBT, at the coding tree unit level, the partition parameters of the QTBT may be dynamically derived to adapt to local characteristics without transmitting any overhead. Subsequently, at the coding unit level, a joint-classifier decision tree structure may eliminate unnecessary iterations and control the risk of misprediction.
In some embodiments, the decoder may include an eLTR frame retention processor (not shown) that determines whether and when eLTR frames are marked as unavailable or removed from the reference list.
In some embodiments, the current subject matter may be applied to broadcast (and similar) scenarios in which the decoder tunes in in the middle of a retention period. To support standard playback, the encoder may mark (e)LTR frames as instantaneous decoding refresh (IDR) type frames. In this case, streaming may resume after the next available (e)LTR (IDR) frame. This approach may be similar to some current broadcast standards that designate intra frames as IDR frames.
The subject matter described herein provides a number of technical advantages. For example, some embodiments of the current subject matter may provide for decoding a block using an eLTR frame retained in the reference list. Such an approach may improve compression efficiency.
It should be noted that any one or more of the aspects and embodiments described herein may be conveniently implemented using digital electronic circuitry, integrated circuitry, specially designed application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), computer hardware, firmware, software, and/or combinations thereof, as implemented and/or embodied in one or more machines (e.g., one or more computing devices utilized as a user computing device for an electronic document, one or more server devices such as a document server, etc.). These various aspects or features may include implementation in one or more computer programs and/or software that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art. The aspects and embodiments discussed above that employ software and/or software modules may also include suitable hardware for facilitating the implementation of the machine-executable instructions of the software and/or software modules.
Such software may be a computer program product employing a machine-readable storage medium. A machine-readable storage medium may be any medium that can store and/or encode a sequence of instructions for execution by a machine (e.g., a computing device) and that cause the machine to perform any one of the methods and/or embodiments described herein. Examples of a machine-readable storage medium include, but are not limited to, magnetic disks, optical disks (e.g., CD-R, DVD-R, etc.), magneto-optical disks, read-only memory "ROM" devices, random-access memory "RAM" devices, magnetic cards, optical cards, solid-state memory devices, EPROM, EEPROM, Programmable Logic Devices (PLD), and/or any combination thereof. Machine-readable media as used herein is intended to include both a single medium and a collection of physically separate media, such as a collection of optical disks or one or more hard disk drives in combination with computer memory. As used herein, a machine-readable storage medium does not include transitory forms of signal transmission.
Such software may also include information (e.g., data) carried as a data signal on a data carrier (e.g., a carrier wave). For example, machine-executable information may be included as data-bearing signals embodied in data carriers, where the signals encode a sequence of instructions or portions thereof for execution by a machine (e.g., a computing device) and any related information (e.g., data structures and data) that cause the machine to perform any one of the methods and/or embodiments described herein.
Examples of a computing device include, but are not limited to, an electronic book reading device, a computer workstation, a terminal computer, a server computer, a handheld device (e.g., tablet computer, smartphone, etc.), a network appliance, a network router, network switch, network bridge, any machine capable of executing a sequence of instructions that specify actions to be taken by that machine, and any combination thereof. In one example, the computing device may include and/or be included in a kiosk.
FIG. 6 shows a diagram of one embodiment of a computing device in the exemplary form of a computer system 600 within which a set of instructions, for causing a control system to perform any one or more aspects and/or methods of the present disclosure, may be executed. It is also contemplated that multiple computing devices may be utilized to implement a specifically configured set of instructions for causing one or more devices to perform any one or more aspects and/or methods of the present disclosure. The computer system 600 includes a processor 604 and a memory 608, the processor 604 and the memory 608 communicating with each other and with other components via a bus 612. The bus 612 may include any of several types of bus structures including, but not limited to, a memory bus, a memory controller, a peripheral bus, a local bus, and any combinations thereof, using any of a variety of bus architectures.
The computer system 600 may also include a storage device 624. Examples of storage devices (e.g., storage device 624) include, but are not limited to, hard disk drives, magnetic disk drives, optical disk drives in combination with optical media, solid state storage devices, and any combination thereof. The storage device 624 may be connected to the bus 612 by an appropriate interface (not shown). Example interfaces include, but are not limited to, SCSI, Advanced Technology Attachment (ATA), Serial ATA, Universal Serial Bus (USB), IEEE 1394 (firewire), and any combination thereof. In one example, storage 624 (or one or more components thereof) may be removably interfaced with computer system 600 (e.g., via an external port connector (not shown)). In particular, storage 624 and associated machine-readable media 628 may provide non-volatile and/or volatile storage of machine-readable instructions, data structures, program modules, and/or other data for computer system 600. In one example, software 620 may reside, completely or partially, within machine-readable media 628. In another example, the software 620 may reside, completely or partially, within the processor 604.
The computer system 600 may also include an input device 632. In one example, a user of computer system 600 may enter commands and/or other information into computer system 600 via input device 632. Examples of input device 632 include, but are not limited to, an alphanumeric input device (e.g., a keyboard), a pointing device, a joystick, a gamepad, an audio input device (e.g., a microphone, a voice response system, etc.), a cursor control device (e.g., a mouse), a touchpad, an optical scanner, a video capture device (e.g., a still camera, a video camera), a touchscreen, and any combination thereof. Input device 632 may be connected to bus 612 via any of a variety of interfaces (not shown), including, but not limited to, a serial interface, a parallel interface, a game port, a USB interface, a firewire interface, a direct interface to bus 612, and any combination thereof. The input device 632 may comprise a touch screen interface that may be part of the display 636 or separate from the display 636, as will be discussed further below. Input device 632 may serve as a user selection device for selecting one or more graphical representations in a graphical interface as described above.
A user may also enter commands and/or other information into computer system 600 through storage 624 (e.g., a removable disk drive, a flash drive, etc.) and/or network interface device 640. A network interface device, such as network interface device 640, may be used to connect the computer system 600 to one or more of various networks, such as the network 644, and to one or more remote devices 648 connected thereto. Examples of network interface devices include, but are not limited to, a network interface card (e.g., a mobile network interface card, a LAN card), a modem, and any combination thereof. Examples of networks include, but are not limited to, a wide area network (e.g., the internet, an enterprise network), a local area network (e.g., a network associated with an office, a building, a campus, or other relatively small geographic space), a telephone network, a data network associated with a telephone/voice provider (e.g., a mobile communications provider data and/or voice network), a direct connection between two computing devices, and any combination thereof. Networks such as network 644 may employ wired and/or wireless communication modes. In general, any network topology may be used. Information (e.g., data, software 620, etc.) may be transferred to computer system 600 and/or from computer system 600 via network interface device 640.
The foregoing has been a detailed description of illustrative embodiments of the invention. Various modifications and additions can be made without departing from the spirit and scope of the invention. Features of each of the various embodiments described above may be combined with features of other described embodiments as appropriate in order to provide a multiplicity of feature combinations in associated new embodiments. Furthermore, while the foregoing describes a number of separate embodiments, this description is merely illustrative of the application of the principles of the present invention. Additionally, although particular methods herein may be shown and/or described as being performed in a specific order, the ordering is highly variable within ordinary skill to achieve the embodiments disclosed herein. Accordingly, this description is meant to be taken only by way of example, and not to otherwise limit the scope of the invention.
In the description above and in the claims, phrases such as "at least one of" or "one or more of" may occur followed by a conjunctive list of elements or features. The term "and/or" may also occur in a list of two or more elements or features. Unless implicitly or explicitly contradicted by the context in which it is used, such a phrase is intended to mean any of the listed elements or features individually, or any of the recited elements or features in combination with any of the other recited elements or features. For example, the phrases "at least one of A and B," "one or more of A and B," and "A and/or B" are each intended to mean "A alone, B alone, or A and B together." A similar interpretation applies to lists containing three or more items. For example, the phrases "at least one of A, B, and C," "one or more of A, B, and C," and "A, B, and/or C" are each intended to mean "A alone, B alone, C alone, A and B together, A and C together, B and C together, or A and B and C together." Furthermore, the use of the term "based on" above and in the claims is intended to mean "based at least in part on," such that an unrecited feature or element is also permissible.
The subject matter described herein may be embodied in systems, apparatus, methods, and/or articles of manufacture depending on the desired configuration. The embodiments set forth in the foregoing description do not represent all embodiments consistent with the subject matter described herein. Rather, they are merely a few examples consistent with aspects related to the described subject matter. Although some variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations may be provided in addition to those set forth herein. For example, the above-described embodiments may be directed to various combinations and subcombinations of the disclosed features and/or combinations and subcombinations of several further features disclosed above. Moreover, the logic flows depicted in the figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Other embodiments are possible within the scope of the following claims.
Claims (20)
1. A decoder comprising circuitry configured to:
receiving a bit stream;
storing a plurality of long-term reference frames in a reference list;
retaining long-term reference frames in the reference list for a length of time based on a retention time; and
decoding at least a portion of video using the long-term reference frames retained in the reference list.
2. The decoder of claim 1, wherein each of the stored long-term reference frames includes an associated retention time.
3. The decoder of claim 1, further configured to mark the long-term reference frame as unavailable after the long-term reference frame has resided in the reference list for at least the retention time.
4. The decoder of claim 3, further configured to mark the long-term reference frame as available based on a signal in the bitstream.
5. The decoder of claim 1, wherein the bitstream includes a signal indicating removal of the long-term reference frame from memory.
6. The decoder of claim 5, further configured to remove the long-term reference frame from the reference list based on the signal.
7. The decoder of claim 1, further comprising:
an entropy decoder processor configured to receive the bitstream and decode the bitstream into quantized coefficients;
an inverse quantization and inverse transform processor configured to process the quantized coefficients, including performing inverse discrete cosine processing;
a deblocking filter;
a frame buffer; and
an intra prediction processor.
8. The decoder of claim 1, further configured to:
receiving a coding block;
determining that inter-prediction mode has been enabled for the coding block; and
determining a decoded block using the long-term reference frame as a reference frame and according to the inter prediction mode.
9. The decoder of claim 8 wherein the decoded block forms part of a quad-tree plus binary decision tree.
10. The decoder of claim 8 wherein the decoded block is a non-leaf node of a quadtree plus binary decision tree.
11. A method, comprising:
receiving, by a decoder, a bitstream;
storing, by the decoder, a plurality of long-term reference frames in a reference list;
retaining, by the decoder, the long-term reference frame in the reference list for a length of time based on a retention time; and
decoding, by the decoder, at least a portion of video using the long-term reference frames retained in the reference list.
12. The method of claim 11, wherein each of the stored long-term reference frames includes an associated retention time.
13. The method of claim 11, further comprising marking the long-term reference frame as unavailable after the long-term reference frame has resided in the reference list for at least the retention time.
14. The method of claim 13, further comprising marking the long-term reference frame as available based on a signal in the bitstream.
15. The method of claim 11, wherein the bitstream includes a signal indicating removal of the long-term reference frame from memory.
16. The method of claim 15, further comprising removing the long-term reference frame from the reference list based on the signal.
17. The method of claim 11, wherein the decoder further comprises:
an entropy decoder processor configured to receive the bitstream and decode the bitstream into quantized coefficients;
an inverse quantization and inverse transform processor configured to process the quantized coefficients, including performing inverse discrete cosine processing;
a deblocking filter;
a frame buffer; and
an intra prediction processor.
18. The method of claim 11, further comprising:
receiving a coding block;
determining that inter-prediction mode has been enabled for the coding block; and
determining a decoded block using the long-term reference frame as a reference frame and according to the inter prediction mode.
19. The method of claim 18, wherein the decoded block forms part of a quad-tree plus binary decision tree.
20. The method of claim 18, wherein said decoded block is a non-leaf node of a quadtree plus binary decision tree.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411033336.8A CN118714324A (en) | 2019-01-28 | 2020-01-28 | Explicit signaling to extend long-term reference picture reservation |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201962797806P | 2019-01-28 | 2019-01-28 | |
US62/797,806 | 2019-01-28 | ||
PCT/US2020/015414 WO2020159993A1 (en) | 2019-01-28 | 2020-01-28 | Explicit signaling of extended long term reference picture retention |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202411033336.8A Division CN118714324A (en) | 2019-01-28 | 2020-01-28 | Explicit signaling to extend long-term reference picture reservation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113615184A true CN113615184A (en) | 2021-11-05 |
CN113615184B CN113615184B (en) | 2024-08-09 |
Family
ID=71841917
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202411033336.8A Pending CN118714324A (en) | 2019-01-28 | 2020-01-28 | Explicit signaling to extend long-term reference picture reservation |
CN202080021301.6A Active CN113615184B (en) | 2019-01-28 | 2020-01-28 | Explicit signaling to extend long-term reference picture reservation |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202411033336.8A Pending CN118714324A (en) | 2019-01-28 | 2020-01-28 | Explicit signaling to extend long-term reference picture reservation |
Country Status (8)
Country | Link |
---|---|
EP (1) | EP3918799A4 (en) |
JP (2) | JP7498502B2 (en) |
KR (1) | KR20210118155A (en) |
CN (2) | CN118714324A (en) |
BR (1) | BR112021014753A2 (en) |
MX (1) | MX2021009024A (en) |
SG (1) | SG11202108105YA (en) |
WO (1) | WO2020159993A1 (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101690202A (en) * | 2007-04-09 | 2010-03-31 | 思科技术公司 | Long term reference frame management with error feedback for compressed video communication |
CN103167283A (en) * | 2011-12-19 | 2013-06-19 | 华为技术有限公司 | Video coding method and device |
US20130279589A1 (en) * | 2012-04-23 | 2013-10-24 | Google Inc. | Managing multi-reference picture buffers for video data coding |
US20130329787A1 (en) * | 2012-06-07 | 2013-12-12 | Qualcomm Incorporated | Signaling data for long term reference pictures for video coding |
US20140003538A1 (en) * | 2012-06-28 | 2014-01-02 | Qualcomm Incorporated | Signaling long-term reference pictures for video coding |
WO2014111222A1 (en) * | 2013-01-16 | 2014-07-24 | Telefonaktiebolaget L M Ericsson (Publ) | Decoder and encoder and methods for coding of a video sequence |
KR101674556B1 (en) * | 2015-07-27 | 2016-11-10 | 인하대학교 산학협력단 | Method and apparatus for estimating motion using multi reference frames |
US9609341B1 (en) * | 2012-04-23 | 2017-03-28 | Google Inc. | Video data encoding and decoding using reference picture lists |
CN106961609A (en) * | 2016-01-08 | 2017-07-18 | 三星电子株式会社 | Application processor and mobile terminal for handling reference picture |
CN108432253A (en) * | 2016-01-21 | 2018-08-21 | 英特尔公司 | Long-term reference picture decoding |
US20180324458A1 (en) * | 2011-09-23 | 2018-11-08 | Velos Media, Llc | Decoded picture buffer management |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1994764B1 (en) * | 2006-03-15 | 2016-01-13 | BRITISH TELECOMMUNICATIONS public limited company | Video coding with reference picture set management |
US10003817B2 (en) | 2011-11-07 | 2018-06-19 | Microsoft Technology Licensing, Llc | Signaling of state information for a decoded picture buffer and reference picture lists |
CN105933800A (en) * | 2016-04-29 | 2016-09-07 | 联发科技(新加坡)私人有限公司 | Video play method and control terminal |
2020
- 2020-01-28 CN CN202411033336.8A patent/CN118714324A/en active Pending
- 2020-01-28 BR BR112021014753-5A patent/BR112021014753A2/en unknown
- 2020-01-28 WO PCT/US2020/015414 patent/WO2020159993A1/en unknown
- 2020-01-28 JP JP2021543479A patent/JP7498502B2/en active Active
- 2020-01-28 SG SG11202108105YA patent/SG11202108105YA/en unknown
- 2020-01-28 KR KR1020217027065A patent/KR20210118155A/en not_active Application Discontinuation
- 2020-01-28 MX MX2021009024A patent/MX2021009024A/en unknown
- 2020-01-28 CN CN202080021301.6A patent/CN113615184B/en active Active
- 2020-01-28 EP EP20748672.1A patent/EP3918799A4/en active Pending
2024
- 2024-05-24 JP JP2024084708A patent/JP2024100973A/en active Pending
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101690202A (en) * | 2007-04-09 | 2010-03-31 | 思科技术公司 | Long term reference frame management with error feedback for compressed video communication |
US20180324458A1 (en) * | 2011-09-23 | 2018-11-08 | Velos Media, Llc | Decoded picture buffer management |
CN103167283A (en) * | 2011-12-19 | 2013-06-19 | 华为技术有限公司 | Video coding method and device |
US20130279589A1 (en) * | 2012-04-23 | 2013-10-24 | Google Inc. | Managing multi-reference picture buffers for video data coding |
US9609341B1 (en) * | 2012-04-23 | 2017-03-28 | Google Inc. | Video data encoding and decoding using reference picture lists |
US20130329787A1 (en) * | 2012-06-07 | 2013-12-12 | Qualcomm Incorporated | Signaling data for long term reference pictures for video coding |
US20140003538A1 (en) * | 2012-06-28 | 2014-01-02 | Qualcomm Incorporated | Signaling long-term reference pictures for video coding |
WO2014111222A1 (en) * | 2013-01-16 | 2014-07-24 | Telefonaktiebolaget L M Ericsson (Publ) | Decoder and encoder and methods for coding of a video sequence |
KR101674556B1 (en) * | 2015-07-27 | 2016-11-10 | 인하대학교 산학협력단 | Method and apparatus for estimating motion using multi reference frames |
CN106961609A (en) * | 2016-01-08 | 2017-07-18 | 三星电子株式会社 | Application processor and mobile terminal for handling reference picture |
CN108432253A (en) * | 2016-01-21 | 2018-08-21 | 英特尔公司 | Long-term reference picture decoding |
Non-Patent Citations (3)
Title |
---|
BENJAMIN BROSS: "High efficiency video coding (HEVC) text specification draft 7(JCTVC-I1003_d9)", 《JOINT COLLABORATIVE TEAM ON VIDEO CODING (JCT-VC) OF ITU-T SG16 WP3 AND ISO/IEC JTC1/SC29/WG11 9TH MEETING: GENEVA, CH, 27 APRIL – 7 MAY 2012》, pages 1 - 278 * |
JIANLE CHEN ET AL: "Algorithm Description of Joint Exploration Test Model 7 (JEM 7)(JVET-G1001-v1)", 《JOINT VIDEO EXPLORATION TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 7TH MEETING: TORINO, IT, 13–21 JULY 2017》, pages 1 - 48 * |
YE-KUI WANG ET AL: "On reference picture management for VVC (JVET-M0128-v1)", 《JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 13TH MEETING: MARRAKECH, MA, 9–18 JAN. 2019》, pages 1 - 9 *
Also Published As
Publication number | Publication date |
---|---|
KR20210118155A (en) | 2021-09-29 |
JP7498502B2 (en) | 2024-06-12 |
BR112021014753A2 (en) | 2021-09-28 |
SG11202108105YA (en) | 2021-08-30 |
CN113615184B (en) | 2024-08-09 |
MX2021009024A (en) | 2021-10-13 |
JP2024100973A (en) | 2024-07-26 |
EP3918799A1 (en) | 2021-12-08 |
WO2020159993A1 (en) | 2020-08-06 |
CN118714324A (en) | 2024-09-27 |
EP3918799A4 (en) | 2022-03-23 |
JP2022524917A (en) | 2022-05-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114616826A (en) | Implicit identification of adaptive resolution management based on frame type | |
JP7482536B2 (en) | Shape-adaptive discrete cosine transform for geometric partitioning with an adaptive number of regions. | |
JP2024045720A (en) | Global motion constrained motion vector in inter prediction | |
CN114073083A (en) | Global motion for merge mode candidates in inter prediction | |
JP7542278B2 (en) | Adaptive block updating of unavailable reference frames using explicit and implicit signaling | |
CN113647104A (en) | Inter prediction in geometric partitioning with adaptive region number | |
CN114128260A (en) | Efficient coding of global motion vectors | |
CN114009042A (en) | Candidates in frames with global motion | |
CN114128291A (en) | Adaptive motion vector prediction candidates in frames with global motion | |
CN113170175A (en) | Adaptive temporal filter for unavailable reference pictures | |
CN114080811A (en) | Selective motion vector prediction candidates in frames with global motion | |
CN113615184B (en) | Explicit signaling of extended long-term reference picture retention | |
US11985318B2 (en) | Encoding video with extended long term reference picture retention | |
US11595652B2 (en) | Explicit signaling of extended long term reference picture retention | |
CN113597768B (en) | Online and offline selection of extended long-term reference picture retention | |
RU2792865C2 (en) | Method for adaptive update of unavailable reference frame blocks using explicit and implicit signaling | |
KR20210152567A (en) | Signaling of global motion vectors in picture headers | |
JP2024149767A (en) | Adaptive block updating of unavailable reference frames using explicit and implicit signaling |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |