CN111955009A - Unequal weight planar motion vector derivation


Info

Publication number
CN111955009A
CN111955009A
Authority
CN
China
Prior art keywords
neighboring
information associated
motion information
video
encoding
Prior art date
Legal status
Pending
Application number
CN201980025081.1A
Other languages
Chinese (zh)
Inventor
Krit Panusopone
Seungwook Hong
Yue Yu
Limin Wang
Current Assignee
ARRIS Enterprises LLC
Original Assignee
ARRIS Enterprises LLC
Priority date
Filing date
Publication date
Application filed by ARRIS Enterprises LLC
Priority claimed from PCT/US2019/027560 (WO2019204234A1)
Publication of CN111955009A

Abstract

A method of planar motion vector derivation is provided that may employ an unequally weighted combination of neighboring motion vectors. In some implementations, motion vector information associated with a bottom-right pixel or block adjacent to the current coding unit may be derived from motion information associated with the top row or top neighboring row of the current coding unit and motion information associated with the left column or left neighboring column of the current coding unit. Weighted or unweighted combinations of such values may then be used in a planar mode prediction model to derive the associated motion information for the bottom and/or right neighboring pixels or blocks.

Description

Unequal weight planar motion vector derivation
Claim of priority
This application claims priority under 35 U.S.C. § 119(e) to earlier-filed U.S. Provisional Application No. 62/657,831, filed April 15, 2018, the entire contents of which are hereby incorporated herein by reference.
Technical Field
The present disclosure relates to the field of video coding, and in particular to increasing coding efficiency and reducing the associated memory burden through a reduction in the number of stored co-located pictures.
Background
Technological improvements to the evolving video coding standards show a trend of increasing coding efficiency to enable higher bit rates, higher resolutions, and better video quality. A new video coding scheme called JVET was developed by the Joint Video Exploration Team, and an updated video coding scheme called Versatile Video Coding (VVC) is described in Draft 2 of the standard, titled Versatile Video Coding (Draft 2), published by JVET on October 17, 2018, the entire contents of which are hereby incorporated by reference. Similar to other video coding schemes like HEVC (High Efficiency Video Coding), both JVET and VVC are block-based hybrid spatial and temporal prediction coding schemes. However, relative to HEVC, JVET and VVC include many modifications to the bitstream structure, syntax, constraints, and mappings used to generate decoded pictures. JVET has been implemented in Joint Exploration Model (JEM) encoders and decoders, while VVC is expected to be implemented by the year 2020.
Current and contemplated video coding schemes typically rely on a simple assumption of neighboring pixel/block similarity to determine predicted intensity values for neighboring pixels or pixel blocks. The same process may be applied to the associated motion vectors (MVs). However, this assumption may lead to erroneous results. There is therefore a need for a system and method of unequal weight planar motion vector derivation.
Disclosure of Invention
A system of one or more computers may be configured to perform certain operations or actions by installing software, firmware, hardware, or a combination thereof on the system that, in operation, causes the system to perform certain described actions. One or more computer programs may be configured to perform particular operations or actions by including instructions that, when executed by a data processing apparatus, cause the apparatus to perform the actions. One general aspect includes: identifying a coding unit having a top adjacent row, a left adjacent column, a bottom adjacent row, and a right adjacent column; determining motion information associated with a lower right neighboring pixel located at an intersection of the bottom neighboring row and the right neighboring column based, at least in part, on motion information associated with the top neighboring row and the left neighboring column; determining motion information associated with the right-adjacent column based at least in part on the motion information associated with the lower right-adjacent pixel; and encoding the coding unit. Other embodiments of this aspect may include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
Implementations may include one or more of the following features: determining motion information associated with the bottom neighboring row based at least in part on the motion information associated with the lower right neighboring pixel; wherein a planar coding mode is adopted; determining a first weight value associated with the top adjacent row and determining a second weight value associated with the left adjacent column, wherein the step of determining motion information associated with the lower right adjacent pixel is based at least in part on a combination of the first weight value and the motion information associated with the top adjacent row and a combination of the second weight value and the motion information associated with the left adjacent column. Implementations of the described technology may further include hardware, methods or processes, or computer software on a computer-accessible medium.
Further, a general aspect may include a system for video encoding, comprising: storing in a memory a coding unit having a top adjacent row, a left adjacent column, a bottom adjacent row, and a right adjacent column; determining and storing in the memory motion information associated with a bottom-right neighboring pixel positioned at an intersection of the bottom neighboring row and the right neighboring column based at least in part on motion information associated with the top neighboring row and the left neighboring column; determining and storing in memory motion information associated with the right-adjacent column based at least in part on the motion information associated with the lower right-adjacent pixel; and encoding the coding unit. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
Implementations may include one or more of the following features: the system for video encoding further comprises: determining and storing in memory motion information associated with the bottom neighboring row based at least in part on the motion information associated with the lower right neighboring pixel. The system for video encoding further comprises: determining and storing in memory a first weight value associated with the top adjacent row and determining and storing in memory a second weight value associated with the left adjacent column, wherein the step of determining motion information associated with the lower right adjacent pixel is based at least in part on a combination of the first weight value and the motion information associated with the top adjacent row and a combination of the second weight value and the motion information associated with the left adjacent column. Implementations of the described technology may include hardware, methods or processes, or computer software on a computer-accessible medium.
Drawings
Further details of the invention are explained with the aid of the drawings, in which:
fig. 1 depicts the division of a frame into a plurality of Coding Tree Units (CTUs).
Fig. 2a-2c depict an exemplary partitioning of a CTU into Coding Units (CUs).
FIG. 3 depicts a quadtree plus binary tree (QTBT) representation of the CU partition of FIG. 2.
Fig. 4 depicts a simplified block diagram for CU encoding in a JVET or VVC encoder.
Fig. 5 depicts possible intra prediction modes for luma components in JVET or VVC.
Fig. 6 depicts a simplified block diagram for CU coding in a JVET or VVC decoder.
Fig. 7a-7b depict graphical representations of horizontal and vertical predictor operators, respectively.
FIG. 8 depicts an exemplary embodiment of the weight parameter S [ n ], where the sum of the width and height is 256.
FIG. 9 depicts an exemplary embodiment of the weight parameter S [ n ], where the sum of the width and height is 512.
FIG. 10 depicts a block flow diagram of a system and method of unequal weight planar motion vector derivation.
Fig. 11 depicts an embodiment of a computer system adapted and configured to provide variable template sizes for template matching.
Fig. 12 depicts an implementation of a video encoder/decoder adapted and configured to provide variable template sizes for template matching.
Detailed Description
Fig. 1 depicts the division of a frame into a plurality of Coding Tree Units (CTUs) 100. A frame may be an image in a video sequence. A frame may comprise a matrix or a set of matrices with pixel values representing intensity measures in an image. Thus, a set of these matrices may generate a video sequence. Pixel values may be defined to represent color and brightness in full color video coding, where the pixels are divided into three channels. For example, in the YCbCr color space, a pixel may have a luminance value Y representing the gray-level intensity in the image, and two chrominance values Cb and Cr representing the degree of difference in color from gray to blue and red. In other embodiments, pixel values may be represented by values in different color spaces or models. The resolution of the video may determine the number of pixels in a frame. Higher resolution may represent more pixels and better image definition, but may also result in higher bandwidth, storage and transmission requirements.
A frame of a video sequence may be encoded and decoded using JVET. JVET is a video coding scheme developed by the Joint Video Exploration Team. Multiple versions of JVET have been implemented in JEM (Joint Exploration Model) encoders and decoders. Similar to other video coding schemes like HEVC (High Efficiency Video Coding), JVET is a block-based hybrid spatial and temporal prediction coding scheme. During encoding with JVET, the frame is first divided into square blocks called CTUs 100, as shown in fig. 1. For example, the CTU100 may be a block of 128 × 128 pixels.
Fig. 2a depicts an exemplary partitioning of a CTU100 into CUs 102. Each CTU100 in a frame may be partitioned into one or more CUs (coding units) 102. CU102 may be used for prediction and transformation, as described below. Unlike HEVC, in JVET, CU102 may be rectangular or square and may be encoded without further partitioning into prediction units or transform units. A CU102 may be as large as its root CTU100 or a smaller subdivision of the root CTU100 as small as a 4 x 4 block.
In JVET, the CTU100 may be partitioned into CUs 102 according to a quadtree-plus-binary tree (QTBT) scheme, where the CTU100 may be recursively partitioned into square blocks according to a quadtree, and then those square blocks may be recursively partitioned horizontally or vertically according to a binary tree. Parameters such as CTU size, minimum sizes of the quad tree and binary tree leaf nodes, maximum size of the binary tree root node, and maximum depth of the binary tree may be set to control partitioning according to QTBT. In VVC, the CTU100 may also be divided into CUs by a ternary tree division.
By way of non-limiting example, fig. 2a shows CTUs 100 partitioned into CUs 102, where solid lines represent quadtree partitions and dashed lines represent binary tree partitions. As shown, binary tree partitioning allows for horizontal partitioning and vertical partitioning to define the structure of CTUs and their subdivision into CUs. Fig. 2b and 2c depict alternative non-limiting examples of the ternary tree partitioning of CUs, where the subdivision of CUs is unequal.
FIG. 3 depicts a QTBT representation of the partitioning of FIG. 2. A quadtree root node represents the CTU100, where each child node in the quadtree portion represents one of four square blocks split from a square parent block. The square blocks represented by the quadtree leaf nodes, which are the root nodes of binary trees, may then be divided zero or more times using binary trees. At each level of the binary tree portion, a block may be divided vertically or horizontally. A flag set to "0" indicates that the block is split horizontally, while a flag set to "1" indicates that the block is split vertically.
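By way of a non-limiting illustration, the following Python sketch enumerates the leaf CUs of such a QTBT description, using a flag of 0 for a horizontal binary split and 1 for a vertical binary split as described above. The node encoding and child ordering are assumptions made only for illustration and are not part of any standard syntax.

```python
# Illustrative sketch of QTBT leaf enumeration (not JVET reference code).
# A node is either a leaf ("L",), a quadtree split ("Q", c0, c1, c2, c3),
# or a binary split ("B", flag, c0, c1) with flag 0 = horizontal, 1 = vertical.

def qtbt_leaves(node, x, y, w, h):
    """Yield (x, y, w, h) rectangles for the leaf CUs of a QTBT description."""
    kind = node[0]
    if kind == "L":                       # leaf: final CU
        yield (x, y, w, h)
    elif kind == "Q":                     # quadtree: four square children
        hw, hh = w // 2, h // 2
        for i, child in enumerate(node[1:5]):
            dx, dy = (i % 2) * hw, (i // 2) * hh
            yield from qtbt_leaves(child, x + dx, y + dy, hw, hh)
    elif kind == "B":                     # binary tree split
        flag, c0, c1 = node[1], node[2], node[3]
        if flag == 0:                     # horizontal split: top / bottom halves
            yield from qtbt_leaves(c0, x, y, w, h // 2)
            yield from qtbt_leaves(c1, x, y + h // 2, w, h // 2)
        else:                             # vertical split: left / right halves
            yield from qtbt_leaves(c0, x, y, w // 2, h)
            yield from qtbt_leaves(c1, x + w // 2, y, w // 2, h)

# Example: a 128x128 CTU split by a quadtree, one child further split vertically.
ctu = ("Q", ("L",), ("B", 1, ("L",), ("L",)), ("L",), ("L",))
print(list(qtbt_leaves(ctu, 0, 0, 128, 128)))
```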
After quadtree partitioning and binary tree partitioning, the blocks represented by the leaf nodes of the QTBT represent the final CU102 to be encoded, e.g., using inter-prediction or intra-prediction encoding. For slices or full frames encoded with inter prediction, different partition structures may be used for the luma and chroma components. For example, for inter slices, CU102 may have Coding Blocks (CBs) for different color components, e.g., one luma CB and two chroma CBs. For slices or full frames encoded with intra prediction, the partitioning structure may be the same for the luma and chroma components.
Fig. 4 depicts a simplified block diagram for CU coding in a JVET encoder. The main stages of video coding include: partitioning as described above to identify CU102, followed by encoding CU102 using prediction at 404 or 406, generating residual CU 410 at 408, transforming at 412, quantizing at 416, and entropy encoding at 420. The encoder and encoding process shown in fig. 4 also includes a decoding process described in more detail below.
Given the current CU102, the encoder may obtain the predicted CU 402 spatially using intra-prediction at 404 or temporally using inter-prediction at 406. The basic idea of predictive coding is to transmit a differential or residual signal between the original signal and the prediction for the original signal. At the receiver side, the original signal may be reconstructed by adding the residual and the prediction, as will be described below. Because the differential signal has a lower correlation than the original signal, it requires fewer bits to transmit.
A slice, such as an entire picture or a portion of a picture, that is encoded entirely with intra-predicted CUs may be an I-slice that can be decoded without reference to other slices, and thus may be a possible point at which decoding can begin. A slice encoded with at least some inter-predicted CUs may be a predicted (P) or bi-predicted (B) slice that may be decoded based on one or more reference pictures. P slices may use intra prediction and inter prediction from previously encoded slices. Using inter prediction, P slices may be compressed further than I slices, but the slices they reference must be encoded first. B slices may use intra prediction or inter prediction, with data from previous and/or subsequent slices and interpolated prediction from two different frames, thereby improving the accuracy of the motion estimation process. In some cases, P and B slices may also or alternatively be encoded using intra block copy, in which data from other portions of the same slice is used.
As will be discussed below, intra-prediction or inter-prediction may be performed based on CU 434 reconstructed from a previously encoded CU102 (e.g., neighboring CU102 or CU102 in a reference picture).
When CU102 is encoded spatially with intra prediction at 404, an intra prediction mode may be found that best predicts the pixel values of CU102 based on samples from neighboring CU102 in the picture.
When encoding the luma component of a CU, the encoder may generate a list of candidate intra prediction modes. Although HEVC has 35 possible intra prediction modes for luma components, in JVET there are 67 possible intra prediction modes for luma components, and 85 prediction modes in VVC. These modes include a planar mode that uses a three-dimensional plane of values generated from neighboring pixels, a DC mode that uses an average value of neighboring pixels, the 65 directional modes shown in fig. 5 that use values copied from neighboring pixels along a direction indicated by a solid line, and 18 wide-angle prediction modes that can be used with non-square blocks.
When generating the list of candidate intra prediction modes for the luma component of the CU, the number of candidate modes on the list may depend on the size of the CU. The candidate list may include: a subset of HEVC's 35 modes with the lowest SATD (sum of absolute transform differences) cost; new directional modes added for JVET that are adjacent to the candidates found from the HEVC modes; and a set of six Most Probable Modes (MPMs) for CU102 identified based on the intra prediction modes used for previously encoded neighboring blocks, as well as modes from a default mode list.
A list of candidate intra prediction modes may also be generated when encoding the chroma components of the CU. The list of candidate modes may include modes generated with cross-component linear model projection from luma samples, intra prediction modes found for luma CBs at particular co-located positions in the chroma block, and chroma prediction modes previously found for neighboring blocks. The encoder may find the candidate modes on the lists with the lowest rate-distortion cost and use those intra prediction modes when encoding the luma and chroma components of the CU. Syntax may be encoded in a bitstream that indicates the intra-prediction mode used to encode each CU 102.
After the best intra prediction mode for CU102 has been selected, the encoder may use those modes to generate predicted CU 402. When the selected mode is a directional mode, a 4-tap filter may be used to improve directional accuracy. The column or row at the top or left side of the prediction block may be adjusted with a boundary prediction filter, e.g., a 2-tap or 3-tap filter.
The predicted CU 402 may be further smoothed using a position dependent intra prediction combining (PDPC) process that uses unfiltered samples of neighboring blocks to adjust the predicted CU 402 generated based on the filtered samples of the neighboring blocks, or adaptive reference sample smoothing using a 3-tap or 5-tap low pass filter to process the reference samples.
When CU102 is temporally encoded using inter prediction at 406, a set of Motion Vectors (MVs) may be found that point to samples in a reference picture that make the best prediction for the pixel values of CU 102. Inter prediction exploits temporal redundancy between slices by representing the displacement of blocks of pixels in the slices. The displacement is determined from the pixel values in the previous or subsequent slices by a process called motion compensation. A motion vector and associated reference index representing a pixel displacement relative to a particular reference picture may be provided to a decoder in a bitstream, along with a residual between the original pixel and the motion compensated pixel. The decoder may use the residual and the signaled motion vector and reference index to reconstruct the pixel blocks in the reconstructed slice.
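As a non-limiting illustration of motion-compensated reconstruction, the following Python sketch copies the reference block displaced by the signaled motion vector and adds the residual. It uses integer-pel displacements only; actual codecs interpolate fractional positions, and all names here are illustrative assumptions.

```python
# Illustrative sketch of motion-compensated reconstruction at the decoder
# (integer-pel only; real codecs interpolate fractional sample positions).
def motion_compensate(ref, x0, y0, w, h, mv):
    """Copy the w x h block at (x0 + mvx, y0 + mvy) from the reference picture."""
    dx, dy = mv
    return [[ref[y0 + dy + j][x0 + dx + i] for i in range(w)] for j in range(h)]

def reconstruct_block(ref, x0, y0, residual, mv):
    """Add the decoded residual to the motion-compensated prediction."""
    pred = motion_compensate(ref, x0, y0, len(residual[0]), len(residual), mv)
    return [[p + r for p, r in zip(prow, rrow)] for prow, rrow in zip(pred, residual)]

ref = [[(x + 2 * y) % 255 for x in range(32)] for y in range(32)]
residual = [[1, -2], [0, 3]]
print(reconstruct_block(ref, x0=8, y0=8, residual=residual, mv=(3, -1)))
```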
In JVET, the motion vector may be stored with an accuracy of 1/16 pixels, and the difference between the motion vector and the predicted motion vector of the CU may be encoded with a quarter-pixel or integer-pixel resolution.
In JVET, motion vectors may be found for multiple sub-CUs within CU102 using various techniques, such as Advanced Temporal Motion Vector Prediction (ATMVP), Spatial Temporal Motion Vector Prediction (STMVP), affine motion compensated prediction, pattern-matched motion vector derivation (PMMVD), and/or bi-directional optical flow (BIO).
Using ATMVP, the encoder may find a temporal vector for CU102 that points to the corresponding block in the reference picture. The temporal vector may be found based on the motion vectors and reference pictures found for the previously encoded neighboring CU 102. A motion vector may be found for each sub-CU within CU102 using the reference block to which the temporal vector of the entire CU102 points.
STMVP may find motion vectors for sub-CUs by scaling and averaging the motion vectors found for neighboring blocks previously encoded with inter prediction, together with a temporal vector.
Affine motion compensated prediction may be used to predict a motion vector field for each sub-CU in a block, based on two control motion vectors found for the corners of the block. For example, motion vectors may be derived for each 4 × 4 sub-block within CU102 based on the corner control motion vectors.
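By way of a non-limiting illustration of this idea, the following Python sketch derives per-sub-block MVs from two control-point MVs using a simplified four-parameter affine model evaluated at sub-block centers. The exact precision, rounding, and control-point handling of JEM/VVC are omitted, and the function and variable names are assumptions.

```python
# Illustrative sketch of 4-parameter affine sub-block MV derivation (not reference code).
# v0 = control-point MV at the top-left corner, v1 = at the top-right corner of a W x H block.

def affine_subblock_mvs(v0, v1, W, H, sub=4):
    """Return a dict mapping (x, y) sub-block top-left corners to derived MVs."""
    dvx = (v1[0] - v0[0]) / W          # horizontal gradient of the MV field
    dvy = (v1[1] - v0[1]) / W
    mvs = {}
    for y in range(0, H, sub):
        for x in range(0, W, sub):
            cx, cy = x + sub / 2, y + sub / 2    # evaluate at the sub-block centre
            mv_x = v0[0] + dvx * cx - dvy * cy
            mv_y = v0[1] + dvy * cx + dvx * cy
            mvs[(x, y)] = (mv_x, mv_y)
    return mvs

print(affine_subblock_mvs(v0=(2.0, 1.0), v1=(4.0, 1.5), W=16, H=16))
```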
PMMVD may use bilateral matching or template matching to find the initial motion vector for current CU 102. Bilateral matching may look at the current CU102 and reference blocks in two different reference pictures along the motion trajectory, while template matching may look at the corresponding blocks in the current CU102 and reference pictures identified by the template. The initial motion vectors found for CU102 may then be refined for each sub-CU one by one.
BIO may be used when performing inter prediction based on earlier and later reference pictures with bi-prediction, and BIO allows finding a motion vector for a sub-CU based on a gradient of a difference between two reference pictures.
In some cases, Local Illumination Compensation (LIC) may be used at the CU level to find the values of the scaling factor parameter and the offset parameter based on samples adjacent to the current CU102 and corresponding samples adjacent to the reference block identified by the candidate motion vector. In JVET, the LIC parameters may be changed and signaled at the CU level.
For some of the above approaches, the motion vectors found for each sub-CU of a CU may be signaled to the decoder at the CU level. For other methods, such as PMMVD and BIO, motion information is not signaled in the bitstream to save overhead, and the decoder can derive the motion vectors by the same process.
After having found the motion vectors for CU102, the encoder may use those motion vectors to generate predicted CU 402. In some cases, Overlapped Block Motion Compensation (OBMC) may be used when motion vectors have been found for individual sub-CUs, when generating the predicted CU 402 by combining those motion vectors with motion vectors previously found for one or more neighboring sub-CUs.
When using bi-prediction, JVET may use decoder-side motion vector refinement (DMVR) to find the motion vectors. DMVR allows motion vectors to be found based on two motion vectors found for bi-directional prediction using a bi-directional template matching process. In DMVR, a weighted combination of the predicted CU 402 generated with each of the two motion vectors can be found and the two motion vectors can be refined by replacing them with a new motion vector that best points to the combined predicted CU 402. Two refined motion vectors may be used to generate the final predicted CU 402.
At 408, as described above, once the predicted CU 402 has been found with intra prediction at 404 or inter prediction at 406, the encoder may subtract the predicted CU 402 from the current CU102 to find the residual CU 410.
The encoder may use one or more transform operations at 412 to transform the residual CU 410 into transform coefficients 414 that express the residual CU 410 in the transform domain, e.g., using a discrete cosine block transform (DCT transform) to convert the data into the transform domain. Compared to HEVC, JVET allows more types of transform operations, including DCT-II, DST-VII, DCT-VIII, DST-I, and DCT-V operations. The allowed transform operations may be grouped into subsets and an indication of which subsets are used and which particular operations in those subsets may be signaled by the encoder. In some cases, a large block size transform may be used to zero out high frequency transform coefficients in CUs 102 that are larger than a certain size, so that only low frequency transform coefficients are kept for those CUs 102.
In some cases, a mode dependent non-separable secondary transform (MDNSST) may be applied to the low frequency transform coefficients 414 after the forward kernel transform. The MDNSST operation may use a Hypercube-Givens Transform (HyGT) based on rotation data. When used, an index value identifying a particular MDNSST operation may be signaled by the encoder.
At 416, the encoder may quantize the transform coefficients 414 into quantized transform coefficients 418. The quantization for each coefficient may be calculated by dividing the coefficient value by a quantization step size, which is derived from a quantization parameter (QP). In some embodiments, Qstep is defined as 2^((QP-4)/6). Quantization may facilitate data compression because the high precision transform coefficients 414 may be converted into quantized transform coefficients 418 having a limited number of possible values. Quantization of the transform coefficients may then limit the amount of bits generated and transmitted by the transform process. However, while quantization is a lossy operation and the loss from quantization cannot be recovered, the quantization process makes a trade-off between the quality of the reconstructed sequence and the amount of information needed to represent the sequence. For example, a lower QP value may result in better quality decoded video, although a greater amount of data may be required for representation and transmission. Conversely, a high QP value may result in lower quality reconstructed video sequences, but with lower data and bandwidth requirements.
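As a simplified, non-limiting illustration of the relationship above, the following Python sketch quantizes and dequantizes a single coefficient with Qstep = 2^((QP-4)/6). Actual HEVC/JVET quantization uses integer scaling tables and rounding offsets; the function names here are illustrative.

```python
# Simplified illustration of QP-based quantization (floating point, for intuition only;
# actual HEVC/JVET quantization uses integer scaling and rounding offsets).
def qstep(qp):
    return 2 ** ((qp - 4) / 6)            # Qstep doubles every 6 QP steps

def quantize(coeff, qp):
    return round(coeff / qstep(qp))

def dequantize(level, qp):
    return level * qstep(qp)

c = 173.4
for qp in (22, 27, 32, 37):
    level = quantize(c, qp)
    print(qp, level, round(dequantize(level, qp), 2))   # coarser QP -> larger error
```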
Instead of using the same frame QP when encoding each CU102 of a frame, JVET may utilize a variance-based adaptive quantization technique, which allows each CU102 to use different quantization parameters for its encoding process. Variance-based adaptive quantization techniques adaptively reduce the quantization parameter for some blocks while increasing the quantization parameter in other blocks. To select a particular QP for CU102, the variance of the CU is calculated. In short, if the variance of a CU is higher than the average variance of a frame, a QP higher than the QP of the frame may be set for that CU 102. A lower QP may be allocated if CU102 exhibits a lower variance than the average variance of the frame.
At 420, the encoder may find the final compressed bits 422 by entropy encoding the quantized transform coefficients 418. Entropy coding aims to eliminate statistical redundancy of the information to be transmitted. In JVET, the quantized transform coefficients 418 may be encoded using CABAC (context adaptive binary arithmetic coding), a technique that uses a probability metric to eliminate statistical redundancy. For CUs 102 with non-zero quantized transform coefficients 418, the quantized transform coefficients 418 may be converted to binary. Each bit of the binary representation ("bin") may then be encoded using the context model. CU102 may be decomposed into three regions, each region having its own set of context models for the pixels within the region.
Multiple scan passes may be performed to encode the bins. During the passes encoding the first three bins (bin0, bin1, and bin2), an index value indicating which context model to use for the bin may be found by finding the sum of that bin position in up to five previously encoded neighboring quantized transform coefficients 418 identified by a template.
The context model may be based on the probability that the value of the binary bit is "0" or "1". When encoding values, the probabilities in the context model may be updated based on the actual number of values "0" and "1" encountered. Although HEVC uses a fixed table to reinitialize the context model for each new picture, in JVET the probability of the context model for a new inter-predicted picture can be initialized based on the context model developed for the previously encoded inter-predicted picture.
The encoder may generate a bitstream containing entropy encoded bits 422 of residual CU 410, prediction information such as a selected intra-prediction mode or motion vector, indicators of how to partition CU102 from CTU100 according to the QTBT structure, and/or other information about the encoded video. The bitstream may be decoded by a decoder, as described below.
In addition to using the quantized transform coefficients 418 to find the final compressed bits 422, the encoder may also use the quantized transform coefficients 418 to generate the reconstructed CU 434 by following the same decoding process that the decoder will use to generate the reconstructed CU 434. Thus, once the transform coefficients have been calculated and quantized by the encoder, the quantized transform coefficients 418 may be transmitted to a decoding loop in the encoder. After quantizing the transform coefficients of the CU, the decoding loop allows the encoder to generate the same reconstructed CU 434 as the decoder generated during the decoding process. Thus, when performing intra-prediction or inter-prediction on a new CU102, the encoder may use the same reconstructed CU 434 that the decoder would use for neighboring CUs 102 or reference pictures. Reconstructed CU102, reconstructed slice, or complete reconstructed frame may serve as a reference for other prediction stages.
At the decoding loop of the encoder (and see below for the same operation in the decoder), a dequantization process may be performed in order to obtain pixel values of the reconstructed image. To dequantize a frame, the quantized value for each pixel of the frame is multiplied by the quantization step, e.g., Qstep described above, to obtain reconstructed dequantized transform coefficients 426. For example, in the decoding process shown in fig. 4 in the encoder, the quantized transform coefficients 418 of the residual CU 410 may be dequantized at 424 to find dequantized transform coefficients 426. If an MDNSST operation was performed during encoding, the operation may be reversed after dequantization.
At 428, the dequantized transform coefficients 426 may be inverse transformed to find a reconstructed residual CU 430, for example by applying an inverse DCT to the values to obtain the reconstructed image. At 432, the reconstructed residual CU 430 may be added to the corresponding predicted CU 402 found with intra prediction at 404 or with inter prediction at 406, in order to find a reconstructed CU 434.
At 436, one or more filters may be applied to the reconstructed data during the decoding process (in the encoder or, as described below, in the decoder), at the picture level or CU level. For example, the encoder may apply a deblocking filter, a Sample Adaptive Offset (SAO) filter, and/or an Adaptive Loop Filter (ALF). The decoding process of the encoder may implement the filters to estimate, and transmit to the decoder, the optimal filter parameters that can account for potential artifacts in the reconstructed image. Such filtering improves the objective and subjective quality of the reconstructed video. In deblocking filtering, pixels near sub-CU boundaries may be modified, while in SAO, pixels in CTU100 may be modified using edge offset or band offset classification. The ALF of JVET may use a filter with a circularly symmetric shape for each 2 × 2 block. An indication of the size and identity of the filter used for each 2 x 2 block may be signaled.
If the reconstructed pictures are reference pictures, they may be stored in reference buffer 438 for inter-prediction of future CUs 102 at 406.
During the above steps, JVET allows color values to be adjusted using a content adaptive clipping operation so that they fall between lower and upper clipping boundaries. The clipping boundaries may change for each slice, and parameters identifying the boundaries may be signaled in the bitstream.
Fig. 6 depicts a simplified block diagram for CU coding in a JVET decoder. The JVET decoder may receive a bitstream containing information about the encoded CU 102. The bitstream may indicate how CU102 of a picture is partitioned from CTU100 according to the QTBT structure, prediction information (e.g., intra prediction mode or motion vector) of CU102, and bits 602 representing entropy encoded residual CU.
At 604, the decoder may decode the entropy-encoded bits 602 using the CABAC context models signaled by the encoder in the bitstream. The decoder may use the parameters signaled by the encoder to update the probabilities of the context models in the same way the context model probabilities were updated during encoding.
After reversing the entropy encoding at 604 to find the quantized transform coefficients 606, the decoder may dequantize them at 608 to find dequantized transform coefficients 610. If an MDNSST operation was performed during encoding, the operation may be reversed by the decoder after dequantization.
At 612, the dequantized transform coefficients 610 may be inverse transformed to find reconstructed residual CU 614. At 616, reconstructed residual CU 614 may be added to corresponding predicted CU 626 found with intra prediction at 622 or inter prediction at 624 in order to find reconstructed CU 618.
At 620, one or more filters may be applied to the reconstruction data at the picture level or CU level. For example, the decoder may apply a deblocking filter, a Sample Adaptive Offset (SAO) filter, and/or an Adaptive Loop Filter (ALF). As described above, the optimal filter parameters may be estimated using an in-loop filter located in the decoding loop of the encoder to improve the objective and subjective quality of the frame. These parameters are transmitted to the decoder to filter the reconstructed frame at 620 to match the filtered reconstructed frame in the encoder.
After generating a reconstructed picture by finding a reconstructed CU 618 and applying the signaled filter, the decoder may output the reconstructed picture as output video 628. If the reconstructed pictures are to be used as reference pictures, they may be stored in reference buffer 630 for inter prediction of future CUs 102 at 624.
The planar mode is often the most frequently used intra coding mode in VVC, HEVC, and JVET. Fig. 7a and 7b show the VVC, HEVC, and JVET planar predictor generation process for the horizontal predictor calculation (fig. 7a) 700 and the vertical predictor calculation (fig. 7b) 710 for a coding unit (block) with height H = 8 (702) and width W = 8 (704), where the (0,0) coordinate (706) corresponds to the top-left position within the coded CU.
The planar mode in VVC, HEVC, and JVET (HEVC Planar) generates a first order approximation of the prediction for the current Coding Unit (CU) by forming a plane based on the intensity values of neighboring pixels. Due to the raster scan coding order, the reconstructed left column neighboring pixels and the reconstructed top row neighboring pixels are available for the current CU, while the right column neighboring pixels and the bottom row neighboring pixels are not. The planar predictor generation process of VVC, HEVC, and JVET sets the intensity values of all right column neighboring pixels to be the same as the intensity value of the top right neighboring pixel, and the intensity values of all bottom row pixels to be the same as the intensity value of the bottom left neighboring pixel.
Once the neighboring pixels surrounding the prediction block are defined, the horizontal and vertical predictors for each pixel within the CU (Ph(x,y) and Pv(x,y), respectively) can be determined according to the following equations:
Ph(x,y)=(W-1-x)*R(-1,y)+(x+1)*R(W,-1)
Pv(x,y)=(H-1-y)*R(x,-1)+(y+1)*R(-1,H)
where R(x,y) represents the intensity value of the reconstructed neighboring pixel at coordinates (x,y), W is the block width, and H is the block height.
From these values, the final plane predictor P (x, y) is calculated by averaging the horizontal predictor and the vertical predictor, with a specific adjustment when the current CU is non-square according to the following equation:
P(x,y) = (H*Ph(x,y) + W*Pv(x,y) + H*W) / (2*H*W)
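Using the horizontal and vertical predictor equations above together with the width/height-weighted average shown above, the following Python sketch computes the planar intra prediction for a (possibly non-square) block. The neighbor array layout and function names are assumptions made for illustration, and integer division stands in for the rounding shift of an actual implementation; this is not the reference implementation.

```python
# Sketch of the planar intra predictor described above (illustration, not reference code).
# top[x]  = R(x, -1) for x in 0..W (top[W] is the top-right neighbor R(W, -1))
# left[y] = R(-1, y) for y in 0..H (left[H] is the bottom-left neighbor R(-1, H))
def planar_predict(top, left, W, H):
    pred = [[0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            ph = (W - 1 - x) * left[y] + (x + 1) * top[W]      # horizontal predictor
            pv = (H - 1 - y) * top[x] + (y + 1) * left[H]      # vertical predictor
            pred[y][x] = (H * ph + W * pv + W * H) // (2 * W * H)
    return pred

top = [100 + x for x in range(9)]        # W = 8, plus R(W, -1)
left = [100 + 2 * y for y in range(9)]   # H = 8, plus R(-1, H)
print(planar_predict(top, left, 8, 8)[0])
```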
this plane prediction concept can be applied to determine MVs with fine granularity. In VVC and jfet, each 4 × 4 subblock within a CU may have its own MV. If MV (x, y) is assumed to be MV of a sub-block containing pixel (x, y) and the earlier described concept of planar intra prediction, the reconstructed pixel R (x, y) can be converted to MV at sub-block level, MV (x, y), notably noting that R (x, y) is 1D intensity for intra planes and P (x, y) is 2D MV for inter planes. That is, the horizontal and vertical predictors Ph(x, y) and Pv(xY) can be converted into horizontal and vertical MV predictors (MVs, respectively)h(x, y) and MVv(x, y)). The final plane MV, MV (x, y), may then be calculated by averaging the horizontal and vertical predictors.
In some implementations, multiple reference slices may be used for temporal prediction. Thus, neighboring sub-blocks may use different references for their associated MVs. For simplicity, where more than one reference is employed, a single reference may be used when combining more than one MV. In some implementations, one possible option for combining multiple references is to select, among the references of the neighboring sub-blocks, the one that is closest in POC distance to the coded picture.
The planar derivation of MVs may require the MVs of the neighboring sub-blocks surrounding the CU. However, in some implementations, some neighboring sub-blocks may not be considered available for the planar derivation because they may not have a suitable MV. By way of non-limiting example, a neighboring sub-block may be encoded in intra mode, may not use the appropriate reference list, or may not use the appropriate reference slice. In such cases, a default MV or an alternative MV may be used in place of the particular MV of the neighboring sub-block; for example, the MV of the first available neighboring sub-block may be used as the alternative MV. In an alternative implementation, where the MV of a neighboring sub-block does not use the appropriate reference slice, one possible choice is to scale the available MV to the desired reference slice with a weighting factor according to the ratio of temporal distances.
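As a non-limiting illustration of these substitution options, the following Python sketch returns a default MV when a neighboring sub-block is unavailable (for example, intra coded) and otherwise scales the available MV to the desired reference slice by the ratio of temporal (POC) distances. The data layout, names, and exact scaling convention are assumptions made for illustration.

```python
# Illustrative sketch of handling unavailable neighboring MVs (names and the scaling
# convention are assumptions for illustration, not taken from any standard).
def scale_mv_to_reference(mv, cur_poc, avail_ref_poc, desired_ref_poc):
    """Scale an MV by the ratio of temporal distances between reference pictures."""
    w = (cur_poc - desired_ref_poc) / (cur_poc - avail_ref_poc)
    return (mv[0] * w, mv[1] * w)

def neighbor_mv_or_fallback(neighbor, default_mv, cur_poc, desired_ref_poc):
    """Return a usable MV for a neighboring sub-block, substituting when necessary."""
    if neighbor is None or neighbor.get("intra", False):
        return default_mv                      # e.g. MV of the first available neighbor
    if neighbor["ref_poc"] != desired_ref_poc:
        return scale_mv_to_reference(neighbor["mv"], cur_poc,
                                     neighbor["ref_poc"], desired_ref_poc)
    return neighbor["mv"]

nb = {"mv": (4.0, -2.0), "ref_poc": 6, "intra": False}
print(neighbor_mv_or_fallback(nb, (0.0, 0.0), cur_poc=8, desired_ref_poc=4))  # scaled by 2
```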
The present disclosure presents systems and methods that derive the MV of the lower right neighboring sub-block of the current CU (a lifting process), and then calculate the MVs of the bottom row and right column neighboring sub-blocks using the derived MV of the lower right neighboring sub-block together with the MVs of the other corner neighboring sub-blocks (e.g., the upper right neighboring sub-block and the lower left neighboring sub-block).
In some implementations, the lower right sub-block MV derivation process may be a weighted average of the upper right and lower left neighboring sub-blocks, as defined in the equations presented below:

MV(W,H) = (W*MV(W,-1) + H*MV(-1,H)) / (W+H)

MV(W,H) = (H*MV(W,-1) + W*MV(-1,H)) / (W+H)
in an alternative implementation, a flat plane may be assumed, and based on MVs of the top-left, top-right, and bottom-left neighboring sub-blocks, MVs may be derived based on the following equations:
MV(W,H)=MV(W,-1)+MV(-1,H)–MV(-1,-1)
where position (0,0) represents the top-left sub-block position of the current block, W is the width of the current block and H is the height of the current block. MV (x, y) may thus represent MV of the reconstructed sub-block containing the pixel at position (x, y) as well as estimated/predicted MV at the sub-block containing position (x, y).
Yet another non-limiting example may derive the lower right sub-block MV using the MV of the sub-block at the co-located position containing pixel (W,H) in the co-located reference (similar to TMVP derivation).
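The following Python sketch illustrates two of the bottom-right derivation options described above: the weighted average of the top-right and bottom-left neighboring MVs (using the first of the two weightings given above) and the flat-plane extrapolation. Component-wise arithmetic on (mvx, mvy) tuples and the function names are assumptions made for illustration.

```python
# Sketch of two bottom-right MV derivation options described above (illustration only).
def weighted_average_bottom_right(mv_top_right, mv_bottom_left, W, H):
    """MV(W, H) as a weighted average of MV(W, -1) and MV(-1, H)."""
    return tuple((W * a + H * b) / (W + H)
                 for a, b in zip(mv_top_right, mv_bottom_left))

def flat_plane_bottom_right(mv_top_right, mv_bottom_left, mv_top_left):
    """MV(W, H) = MV(W, -1) + MV(-1, H) - MV(-1, -1), assuming a flat MV plane."""
    return tuple(a + b - c
                 for a, b, c in zip(mv_top_right, mv_bottom_left, mv_top_left))

print(weighted_average_bottom_right((8.0, 2.0), (2.0, 6.0), W=16, H=8))
print(flat_plane_bottom_right((8.0, 2.0), (2.0, 6.0), (3.0, 1.0)))
```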
Once the MV of the lower right neighboring sub-block, MV(W,H), has been derived, the MVs of the bottom row neighboring sub-blocks, MVb(x,H), and of the right column neighboring sub-blocks, MVr(W,y), can be calculated. If linear interpolation is assumed, the MVs may be defined as follows:

MVb(x,H) = ((W-1-x)*MV(-1,H) + (x+1)*MV(W,H)) / W

MVr(W,y) = ((H-1-y)*MV(W,-1) + (y+1)*MV(W,H)) / H
however, in alternative implementations, models other than linear interpolation may be used to correlate the MVs.
Once the motion vectors of the neighboring sub-blocks surrounding the current CU are defined, the horizontal and vertical MV predictors for each sub-block within the CU (MVh(x,y) and MVv(x,y), respectively) may be determined according to the following equations:
MVh(x,y)=(W-1-x)*MV(-1,y)+(x+1)*MVr(W,y)
MVv(x,y)=(H-1-y)*MV(x,-1)+(y+1)*MVb(x,H)
where MVh(x,y) and MVv(x,y) are scaled versions of the horizontal and vertical MV predictors. However, these scale factors can be compensated for in the final MV predictor calculation step.
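By way of a non-limiting illustration, the following Python sketch evaluates the interpolated bottom-row and right-column neighbor MVs and the (scaled) horizontal and vertical MV predictors defined by the equations above. The array layout, helper names, and floating-point arithmetic are assumptions made for readability; an actual codec would use integer arithmetic, as discussed later in this disclosure.

```python
# Sketch of the intermediate MV predictors described above (illustration only).
# mv_top[x]  = MV(x, -1) for x in 0..W (mv_top[W] is the top-right neighbor MV(W, -1)),
# mv_left[y] = MV(-1, y) for y in 0..H (mv_left[H] is the bottom-left neighbor MV(-1, H)),
# mv_br      = the derived bottom-right neighbor MV(W, H); MVs are (mvx, mvy) tuples.
def add(a, b): return (a[0] + b[0], a[1] + b[1])
def mul(s, a): return (s * a[0], s * a[1])

def bottom_row_mv(x, mv_left, mv_br, W, H):
    # MVb(x,H) = ((W-1-x)*MV(-1,H) + (x+1)*MV(W,H)) / W
    return mul(1.0 / W, add(mul(W - 1 - x, mv_left[H]), mul(x + 1, mv_br)))

def right_col_mv(y, mv_top, mv_br, W, H):
    # MVr(W,y) = ((H-1-y)*MV(W,-1) + (y+1)*MV(W,H)) / H
    return mul(1.0 / H, add(mul(H - 1 - y, mv_top[W]), mul(y + 1, mv_br)))

def horizontal_pred(x, y, mv_top, mv_left, mv_br, W, H):
    # MVh(x,y) = (W-1-x)*MV(-1,y) + (x+1)*MVr(W,y)   (scaled by a factor of W)
    return add(mul(W - 1 - x, mv_left[y]), mul(x + 1, right_col_mv(y, mv_top, mv_br, W, H)))

def vertical_pred(x, y, mv_top, mv_left, mv_br, W, H):
    # MVv(x,y) = (H-1-y)*MV(x,-1) + (y+1)*MVb(x,H)   (scaled by a factor of H)
    return add(mul(H - 1 - y, mv_top[x]), mul(y + 1, bottom_row_mv(x, mv_left, mv_br, W, H)))

mv_top = [(0.5 * x, 1.0) for x in range(9)]    # W = 8, indices 0..8
mv_left = [(1.0, 0.25 * y) for y in range(9)]  # H = 8, indices 0..8
mv_br = (4.0, 2.0)
print(horizontal_pred(3, 2, mv_top, mv_left, mv_br, 8, 8))
```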
In some embodiments, the top-right and bottom-left corner sub-block positions may be set to MV (W-1, -1) and MV (-1, H-1), respectively. In such implementations, the interpolation for the intermediate predictor may be described, for example, by the following equation:
Figure BDA0002717291420000181
MVh(x,y)=(W-1-x)*MV(-1,y)+(x+1)*MVr(W-1,y)
MVv(x,y)=(H-1-y)*MV(x,-1)+(y+1)*MVb(x,H-1)
some embodiments may utilize unequal weight combining for final plane MV derivation. Unequal weights may be employed to take advantage of differences in the accuracy of the input intensities in the final interpolation process. In particular, a greater weight may be applied to sub-block locations that are closer to more reliable neighboring sub-block locations. In VVC and jfet, the processing order follows a raster scan at the CTU level and a z-scan for CUs within the CTU. Thus, the top row and left column neighbor sub-blocks are the actual reconstructed sub-blocks and are more reliable than the estimated bottom row and right column neighbor sub-blocks. An example application using unequal weights employed at the final MV derivation is described in the following equation:
MV(x,y) = (H*MVh(x,y)*(y+1) + W*MVv(x,y)*(x+1)) / (W*H*(x+y+2))
also, the example of unequal weight assignment shown in the above equation may be generalized as a general equation as shown below:
MV(x,y) = (A(x,y)*MVh(x,y) + B(x,y)*MVv(x,y) + c(x,y)) / D(x,y)
where A (x, y) and B (x, y) are position dependent weighting factors for the horizontal and vertical predictors, respectively, c (x, y) is a position dependent rounding factor, and D (x, y) is a position dependent scaling factor.
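As a non-limiting illustration of the unequal weight combination above, the following Python sketch divides out the scale factors of the horizontal and vertical predictors while weighting them by (y+1) and (x+1) respectively, per the first equation above. It takes the scaled predictors MVh(x,y) and MVv(x,y) (for example, as produced by the sketch given earlier); the name and the floating-point arithmetic are assumptions made for illustration.

```python
# Sketch of the unequal-weight final combination described above (illustration only).
def unequal_weight_planar_mv(x, y, mvh, mvv, W, H):
    # MV(x,y) = (H*MVh(x,y)*(y+1) + W*MVv(x,y)*(x+1)) / (W*H*(x+y+2))
    denom = W * H * (x + y + 2)
    return tuple((H * hv * (y + 1) + W * vv * (x + 1)) / denom
                 for hv, vv in zip(mvh, mvv))

# Example at the top-left sub-block position x = y = 0.
print(unequal_weight_planar_mv(0, 0, mvh=(16.0, 8.0), mvv=(24.0, 8.0), W=8, H=8))
```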
It should be noted that the unequal weight assignment may also be used in the horizontal and vertical MV predictor computation stages, and the lower right position adjustment and unequal weight assignment components may be used together or separately, depending on codec design considerations. In some implementations, the weighting factors and lifting processes may be modified according to picture type (I, P, B and/or any other known, convenient, and/or desired type), temporal layer, color component (Y, Cb, Cr, and/or any other known, convenient, and/or desired color component).
In some embodiments, a special merge mode with a unique merge candidate indicator may be used to signal the use of unequal weight plane MV derivation. In an implementation where this special merge mode is selected, the merge sub-blocks MV may be calculated according to the following equation:
Figure BDA0002717291420000191
Figure BDA0002717291420000192
the equations associated with the lower right sub-block position adjustment process and the unequal weight assignment process, respectively, as presented above, involve division operations, which may be costly in terms of computational complexity. However, these division operations can typically be converted to scaling operations to make them more efficient and implementation friendly, as presented in the following equation:
MV(W,H) = ((W*MV(W,-1) + H*MV(-1,H)) * S[W+H]) >> ShiftDenom

MV(W,H) = ((H*MV(W,-1) + W*MV(-1,H)) * S[W+H]) >> ShiftDenom

MVb(x,H) = (((W-1-x)*MV(-1,H) + (x+1)*MV(W,H)) * S[W]) >> ShiftDenom

MVr(W,y) = (((H-1-y)*MV(W,-1) + (y+1)*MV(W,H)) * S[H]) >> ShiftDenom

MV(x,y) = ((H*MVh(x,y)*(y+1) + W*MVv(x,y)*(x+1)) * S[x+y+2]) >> (ShiftDenom + log2W + log2H)
where S[n] is a weighting factor for the parameter n, and ShiftDenom is the amount of the downshift operation. Specifically, S[n] is the factor 1/n scaled by 2^ShiftDenom, and can be described as:

S[n] = Round(2^ShiftDenom / n)
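As a non-limiting illustration of the division-free formulation above, the following Python sketch builds S[n] = Round(2^ShiftDenom / n) and evaluates the scaled equations with multiplications and right shifts. Rounding offsets and clipping that an actual codec would apply are omitted, and the function and variable names are illustrative assumptions.

```python
# Non-limiting sketch of the division-free equations above: each division is replaced by a
# multiplication by S[n] = Round(2^ShiftDenom / n) and a right shift (illustration only).
SHIFT_DENOM = 10
S = [0, 1 << SHIFT_DENOM] + [round((1 << SHIFT_DENOM) / n) for n in range(2, 513)]

def bottom_right_mv_int(mv_tr, mv_bl, W, H):
    # MV(W,H) = ((W*MV(W,-1) + H*MV(-1,H)) * S[W+H]) >> ShiftDenom, per MV component
    return tuple(((W * a + H * b) * S[W + H]) >> SHIFT_DENOM for a, b in zip(mv_tr, mv_bl))

def bottom_row_mv_int(x, mv_bl, mv_br, W):
    # MVb(x,H) = (((W-1-x)*MV(-1,H) + (x+1)*MV(W,H)) * S[W]) >> ShiftDenom
    return tuple((((W - 1 - x) * a + (x + 1) * b) * S[W]) >> SHIFT_DENOM
                 for a, b in zip(mv_bl, mv_br))

def right_col_mv_int(y, mv_tr, mv_br, H):
    # MVr(W,y) = (((H-1-y)*MV(W,-1) + (y+1)*MV(W,H)) * S[H]) >> ShiftDenom
    return tuple((((H - 1 - y) * a + (y + 1) * b) * S[H]) >> SHIFT_DENOM
                 for a, b in zip(mv_tr, mv_br))

def final_mv_int(x, y, mvh, mvv, W, H):
    # MV(x,y) = ((H*MVh*(y+1) + W*MVv*(x+1)) * S[x+y+2]) >> (ShiftDenom + log2W + log2H)
    shift = SHIFT_DENOM + (W.bit_length() - 1) + (H.bit_length() - 1)
    return tuple(((H * hv * (y + 1) + W * vv * (x + 1)) * S[x + y + 2]) >> shift
                 for hv, vv in zip(mvh, mvv))

# MV components here are integers (e.g. in 1/16-pel units).
print(bottom_right_mv_int((128, 32), (32, 96), W=16, H=8))   # approximately (96, 53)
```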
an example 800 of S [ n ] is depicted in fig. 8, where the sum of width and height is 256 and ShiftDenom is 10, and another example 900 of S [ n ] is depicted in fig. 9, where the sum of width and height is 512 and ShiftDenom is 10.
In the examples presented in fig. 8 and fig. 9, memory sizes of 2570 bits (257 entries of 10 bits each for fig. 8) and 5130 bits (513 entries of 10 bits each for fig. 9) are required to populate the weight table. This memory size may be excessive and the memory burden heavy, so reducing this memory requirement may be beneficial for efficiency and memory management. The following are two examples of possible ways to reduce the size of S[n] and the memory burden.
Non-limiting examples of S [ n ] are presented below, where the sum of the width and height is 128 and ShiftDenom is 10.
S[n]={341,256,205,171,146,128,114,102,93,85,79,73,68,
64,60,57,54,51,49,47,45,43,41,39,38,37,35,34,33,
32,31,30,29,28,28,27,26,26,25,24,24,23,23,22,22,
21,21,20,20,20,19,19,19,18,18,18,17,17,17,17,16,
16,16,16,15,15,15,15,14,14,14,14,14,13,13,13,13,
13,13,12,12,12,12,12,12,12,12,11,11,11,11,11,11,
11,11,10,10,10,10,10,10,10,10,10,10,9,9,9,9,
9,9,9,9,9,9,9,9,9,8,8,8,8,8,8,8,8}
Another non-limiting example of S [ n ] is shown below, where the sum of the width and height is 128 and ShiftDenom is 9.
S[n]={171,128,102,85,73,64,57,51,47,43,39,37,34,
32,30,28,27,26,24,23,22,21,20,20,19,18,18,17,17,
16,16,15,15,14,14,13,13,13,12,12,12,12,11,11,11,
11,10,10,10,10,10,9,9,9,9,9,9,9,8,8,8,
8,8,8,8,8,7,7,7,7,7,7,7,7,7,7,6,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,5,5,
5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
5,5,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4}
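The reduced tables listed above can be regenerated directly from the definition S[n] = Round(2^ShiftDenom / n); the short Python sketch below does so for n = 3 through 128 (the omission of the first three entries is discussed below), reproducing the leading values shown (341, 256, 205, ... for a ShiftDenom of 10 and 171, 128, 102, ... for a ShiftDenom of 9). The function name is an illustrative assumption.

```python
# Sketch that regenerates the reduced weight tables above from S[n] = Round(2^ShiftDenom / n)
# for n = 3..128 (the entries for n = 0, 1, 2 are omitted, as discussed below).
def reduced_table(shift_denom, n_max=128):
    return [round((1 << shift_denom) / n) for n in range(3, n_max + 1)]

t10 = reduced_table(10)   # 126 entries: 341, 256, 205, 171, 146, 128, ...
t9 = reduced_table(9)     # 126 entries: 171, 128, 102, 85, 73, 64, ...
print(len(t10), t10[:6], t10[-3:])
print(len(t9), t9[:6])
```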
In the above examples, only 126 of the 129 necessary entries need to be stored, since the first two entries (1/0 and 1/1) are not used in the presented systems and methods. Further, the third entry, which represents 1/2, has the value 512 in the first example and 256 in the second example, and may be handled separately during the weight calculation. Accordingly, the weighted average calculations shown in the equations presented above:
MVb(x,H) = (((W-1-x)*MV(-1,H) + (x+1)*MV(W,H)) * S[W]) >> ShiftDenom

MVr(W,y) = (((H-1-y)*MV(W,-1) + (y+1)*MV(W,H)) * S[H]) >> ShiftDenom

can be modified as shown in the following equations:

MVb(x,H) = (((W-1-x)*MV(-1,H) + (x+1)*MV(W,H)) * S[W-3]) >> ShiftDenom

MVr(W,y) = (((H-1-y)*MV(W,-1) + (y+1)*MV(W,H)) * S[H-3]) >> ShiftDenom
However, a simple shift conversion of a division by n into a multiplication by S[n] followed by a right shift by ShiftDenom does not provide an exact output, and thus results in reduced coding efficiency. The inefficiency may be due to the conversion process, which tolerates errors that accumulate linearly with distance. In some implementations, this error can be reduced by exploiting the fact that the weights of the horizontal and vertical predictors are complementary, as expressed in the following equation:

(y+1)/(x+y+2) + (x+1)/(x+y+2) = 1
Accordingly, the weighting may be calculated based on the weights of either the horizontal or the vertical predictor, whichever is more accurate. This can be achieved by introducing the parameters horWeight and verWeight into the following equation:

MV(x,y) = ((H*MVh(x,y)*(y+1) + W*MVv(x,y)*(x+1)) * S[x+y+2]) >> (ShiftDenom + log2W + log2H)
This yields the following equation:

MV(x,y) = (H*MVh(x,y)*horWeight + W*MVv(x,y)*verWeight) >> (ShiftDenom + log2W + log2H)
horWeight = (y+1)*S[x+y+2]

verWeight = (1 << ShiftDenom) - horWeight
alternatively, the following equations may be employed when using a reduced table as identified above.
horWeight = (y+1)*S[x+y-1]

verWeight = (1 << ShiftDenom) - horWeight
horWeight = verWeight = (1 << (ShiftDenom-1)), where x = y = 0
In some embodiments, the table size used to store the weights S[n] may be further reduced, because the unequal weight planar MV derivation operates at the sub-block level rather than the pixel level, and the sub-block level may be configured with a coarser granularity (e.g., 4 × 4 in VVC and JVET).
Thus, when the sub-block size is N × N, MV(x,y) may be mapped to sub-block coordinates MV(x/N, y/N). With the size reduced from W × H to (W/N) × (H/N), the maximum size of the table is correspondingly lower, so a smaller table size can be used. Accordingly, the above equations may be reformulated and presented as:
Figure BDA0002717291420000231
Figure BDA0002717291420000232
Figure BDA0002717291420000233
fig. 10 depicts a flow diagram 1000 of a system and method of unequal weight planar motion vector derivation. In step 1002, a CU is received, and then in step 1004, motion information associated with the CU is determined. Then in step 1006, motion information associated with the lower right neighboring pixel or block is derived based on the motion information associated with the CU. Then in steps 1008 and 1010, motion information may be derived and/or defined in accordance with the systems and methods described herein based at least in part on motion information associated with the CU and motion information associated with the derived bottom-right neighboring pixel or block. Although fig. 10 depicts step 1008 as preceding step 1010, in some embodiments, steps 1008 and 1010 may occur in parallel and/or step 1010 may precede step 1008. In step 1012, it is determined whether an unequal weighting technique is used to derive the associated motion vector. If it is determined in step 1012 that weighting is not employed, the system may continue encoding in step 1014, as described herein. However, if it is determined in step 1012 that motion information was derived using the weighted combination technique, an indicator may be set in step 1016 and the system may continue encoding in step 1014.
Execution of the sequences of instructions necessary to implement an embodiment may be performed by the computer system 1100 shown in FIG. 11. In an embodiment, execution of the sequences of instructions is performed by a single computer system 1100. According to other embodiments, two or more computer systems 1100 coupled by communication link 1115 may execute sequences of instructions in coordination with each other. Although a description of only one computer system 1100 will be given below, it should be appreciated that embodiments may be practiced with any number of computer systems 1100.
A computer system 1100 according to an embodiment will now be described with reference to fig. 11, which is a block diagram of the functional components of the computer system 1100. As used herein, the term computer system 1100 is used broadly to describe any computing device that can store and independently execute one or more programs.
Each computer system 1100 may include a communication interface 1114 coupled to bus 1106. Communication interface 1114 provides a two-way communication between computer system 1100. Communication interface 1114 of each computer system 1100 transmits and receives electrical, electromagnetic or optical signals, which include data streams representing various types of signal information, such as instructions, messages and data. Communication link 1115 links one computer system 1100 to another computer system 1100. For example, communication link 1115 may be a LAN, in which case communication interface 1114 may be a LAN card; or communication link 1115 may be the PSTN, in which case communication interface 1114 may be an Integrated Services Digital Network (ISDN) card or a modem; or communication link 1115 may be the internet, in which case communication interface 1114 may be a dial-up, cable, or wireless modem.
Computer system 1100 can transmit and receive messages, data, and instructions, including programs, i.e., applications, code, through its respective communication link 1115 and communication interface 1114. The received program code may be executed by a corresponding processor 1107 as it is received, and/or stored in storage device 1110, or other associated non-volatile storage media, for later execution.
In an embodiment, computer system 1100 works in conjunction with a data storage system 1131, such as data storage system 1131 that contains databases 1132 that are easily accessible by computer system 1100. The computer system 1100 communicates with a data storage system 1131 through a data interface 1133. A data interface 1133 coupled to bus 1106 transmits and receives electrical, electromagnetic or optical signals, including data streams representing various types of signal information, such as instructions, messages, and data. In an embodiment, the functions of data interface 1133 may be performed by communication interface 1114.
Computer system 1100 includes a bus 1106 or other communication mechanism for communicating instructions, messages, and data (collectively, information), and one or more processors 1107 coupled with bus 1106 for processing information. Computer system 1100 also includes a main memory 1108, such as a Random Access Memory (RAM) or other dynamic storage device, coupled to bus 1106 for storing dynamic data and instructions to be executed by processor 1107. Main memory 1108 also may be used for storing temporary data, i.e., variables or other intermediate information during execution of instructions by processor 1107.
Computer system 1100 may also include a Read Only Memory (ROM)1109 or other static storage device coupled to bus 1106 for storing static data and instructions for processor 1107. A storage device 1110, such as a magnetic disk or optical disk, may also be provided and coupled to bus 1106 for storing data and instructions for processor 1107.
Computer system 1100 may be coupled via bus 1106 to a display device 1111, such as, but not limited to, a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), for displaying information to a user. An input device 1112, such as alphanumeric and other keys, may be coupled to bus 1106 for communicating information and command selections to processor 1107.
According to one embodiment, each computer system 1100 performs specific operations by a respective processor 1107 executing one or more sequences of one or more instructions contained in main memory 1108. Such instructions may be read into main memory 1108 from another computer-usable medium, such as ROM 1109 or storage device 1110. Execution of the sequences of instructions contained in main memory 1108 causes processor 1107 to perform the processes described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions. Thus, embodiments are not limited to any specific combination of hardware circuitry and/or software.
The term "computer-usable medium" as used herein refers to any medium that provides information or is usable by processor 1107. Such a medium may take many forms, including but not limited to, non-volatile media, and transmission media. Non-volatile media, that is, media that can hold information without power, include ROM 1109, CD ROMs, magnetic tape, and magnetic disks. Volatile media, i.e., media that cannot retain information without power, includes main memory 1108. Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 1106. Transmission media can also take the form of carrier waves (i.e., electromagnetic waves that can be modulated in frequency, amplitude, or phase to transmit information signals). Moreover, transmission media can take the form of acoustic or light waves (such as those generated during radio-wave and infra-red data communications).
In the foregoing specification, embodiments have been described with reference to specific elements thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the embodiments. For example, the reader is to understand that the specific ordering and combination of process actions shown in the process flow diagrams described herein is merely exemplary, and that embodiments may be practiced using different or additional process actions or different combinations or orderings of process actions. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
It should also be noted that the present invention may be implemented in a variety of computer systems. The various techniques described herein may be implemented in hardware or software, or a combination of both. The techniques are preferably implemented in computer programs executing on programmable computers each comprising a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Program code may be applied to the data entered using the input device to perform the functions described above and to generate output information. The output information is applied to one or more output devices. Each program is preferably implemented in a high-level procedural or object-oriented programming language to communicate with a computer system. However, programs may be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Each such computer program is preferably stored on a storage medium or device (e.g., ROM or magnetic diskette) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer to perform the procedures described above. It is also contemplated that the system may be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific predefined manner. Further, the storage element of an exemplary computing application may be a relational or sequential (flat file) type computing database, which is capable of storing data in various combinations and configurations.
Fig. 12 is a high-level view of a source device 1212 and a destination device 1214 that may incorporate features of the systems and devices described herein. As shown in fig. 12, exemplary video encoding system 1210 includes a source device 1212 and a destination device 1214, where, in this example, source device 1212 generates encoded video data. Accordingly, source device 1212 may be referred to as a video encoding device. Destination device 1214 may decode the encoded video data generated by source device 1212. Destination device 1214 may thus be referred to as a video decoding device. Source device 1212 and destination device 1214 may be examples of video coding devices.
Destination device 1214 may receive encoded video data from source device 1212 via channel 1216. Channel 1216 may comprise a medium or device capable of moving encoded video data from source device 1212 to destination device 1214. In one example, channel 1216 may comprise a communication medium that enables source device 1212 to transmit encoded video data directly to destination device 1214 in real-time.
In this example, source device 1212 may modulate the encoded video data according to a communication standard (e.g., a wireless communication protocol) and may transmit the modulated video data to destination device 1214. The communication medium may comprise a wireless or wired communication medium such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network, such as the internet. The communication medium may include a router, switch, base station, or other device that facilitates communication from source device 1212 to destination device 1214. In another example, channel 1216 may correspond to a storage medium that stores encoded video data generated by source device 1212.
In the example of fig. 12, source device 1212 includes a video source 1218, a video encoder 1220, and an output interface 1222. In some cases, output interface 1228 can include a modulator/demodulator (modem) and/or a transmitter. In source device 1212, video source 1218 may include a source, such as a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video feed interface that receives video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of such sources.
The video encoder 1220 may encode captured, pre-captured, or computer-generated video data. An input image may be received by the video encoder 1220 and stored in the input frame memory 1221, from which the general-purpose processor 1223 can load information and execute code. The program for driving the general-purpose processor may be loaded from a storage device, such as the exemplary memory module depicted in fig. 12. The general-purpose processor may use processing memory 1222 to perform the encoding, and the encoded information output by the general-purpose processor may be stored in a buffer, such as output buffer 1226.
The video encoder 1220 may include a resampling module 1225, which may be configured to code (e.g., encode) video data in a scalable video coding scheme that defines at least one base layer and at least one enhancement layer. As part of the encoding process, the resampling module 1225 may resample at least some of the video data, where the resampling may be performed in an adaptive manner using a resampling filter.
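For illustration only, the following sketch shows one way such a resampling step might look in software. It is a minimal example and not the resampling module 1225 itself: the 2:1 ratio, the 3-tap [1 2 1]/4 smoothing filter, and the border replication are assumptions of the sketch, since the disclosure describes the resampling as adaptive rather than tied to any particular filter.

#include <cstdint>
#include <vector>

// Downsample one row of luma samples horizontally by 2:1 using a [1 2 1]/4
// smoothing filter, replicating the first sample at the left border.
std::vector<uint8_t> downsampleRow(const std::vector<uint8_t>& src) {
    std::vector<uint8_t> dst;
    dst.reserve(src.size() / 2);
    for (std::size_t x = 0; x + 1 < src.size(); x += 2) {
        int left = (x > 0) ? src[x - 1] : src[x];             // border replication
        int sum  = left + 2 * src[x] + src[x + 1];
        dst.push_back(static_cast<uint8_t>((sum + 2) >> 2));  // rounded division by 4
    }
    return dst;
}

In a scalable arrangement, a routine of this kind would typically produce the base-layer samples, while the enhancement layer carries the residual needed to restore the original resolution; the filter taps and ratio would in practice be chosen adaptively as the text above indicates.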
Encoded video data, e.g., an encoded bitstream, may be transmitted directly to destination device 1214 via output interface 1228 of source device 1212. In the example of fig. 12, destination device 1214 includes an input interface 1238, a video decoder 1230, and a display device 1232. In some cases, input interface 1238 can include a receiver and/or a modem. Input interface 1238 of destination device 1214 receives the encoded video data over channel 1216. The encoded video data may include various syntax elements generated by the video encoder 1220 that represent the video data. Such syntax elements may be included with the encoded video data transmitted over the communication medium, stored on a storage medium, or stored on a file server.
The encoded video data may also be stored on a storage medium or file server for later access by destination device 1214 for decoding and/or playback. For example, the encoded bitstream may be temporarily stored in the input buffer 1231 and then loaded into the general-purpose processor 1233. The program for driving the general-purpose processor may be loaded from a storage device or a memory. The general-purpose processor may use processing memory 1232 to perform the decoding. The video decoder 1230 may also include a resampling module 1235 similar to the resampling module 1225 employed in the video encoder 1220.
Fig. 12 depicts the resampling module 1235 separate from the general purpose processor 1233, but those skilled in the art will appreciate that the resampling function may be performed by a program executed by the general purpose processor, and that the processing in the video encoder may be accomplished using one or more processors. The decoded image may be stored in output frame buffer 1236 and then sent to input interface 1238.
Display device 1232 may be integrated with destination device 1214 or may be external thereto. In some examples, destination device 1214 may include an integrated display device and may also be configured to interface with an external display device. In other examples, destination device 1214 may be a display device. In general, display device 1232 displays the decoded video data to a user.
The video encoder 1220 and the video decoder 1230 may operate according to a video compression standard. ITU-T VCEG (Q6/16) and ISO/IEC MPEG (JTC 1/SC 29/WG 11) are studying the potential requirements for standardization of future video coding technologies with compression capabilities significantly exceeding the current High Efficiency Video Coding (HEVC) standard (including its current and recent extensions for screen content coding and high dynamic range coding). The groups are working together on this exploration activity, known as the Joint Video Exploration Team (JVET), to evaluate compression technology designs proposed by their experts in this field. The recent development of JVET is described in "Algorithm Description of Joint Exploration Test Model 5 (JEM 5)" by J. Chen, E. Alshina, G. Sullivan, J. Ohm, and J. Boyce, JVET-E1001-v2.
Additionally or alternatively, the video encoder 1220 and video decoder 1230 may operate according to other proprietary or industry standards that work with the disclosed JVET features, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4 Part 10, Advanced Video Coding (AVC), or extensions of such a standard. Thus, although newly developed for JVET, the techniques of this disclosure are not limited to any particular coding standard or technique. Other examples of video compression standards and techniques include MPEG-2, ITU-T H.263, and proprietary or open source compression formats and related formats.
The video encoder 1220 and the video decoder 1230 may be implemented in hardware, software, firmware, or any combination thereof. For example, the video encoder 1220 and decoder 1230 may employ one or more processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, or any combinations thereof. When the video encoder 1220 and decoder 1230 are implemented in part in software, the device may store the instructions for the software in a suitable non-transitory computer-readable storage medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of the video encoder 1220 and the video decoder 1230 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in the respective device.
Aspects of the subject matter described herein may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer, such as general- purpose processors 1223 and 1233 described above. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Aspects of the subject matter described herein may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Examples of memory include Random Access Memory (RAM), Read Only Memory (ROM), or both. The memory may store instructions, e.g., source code or binary code, for performing the techniques described above. The memory may also be used to store variables or other intermediate information during execution of instructions to be executed by processors, such as the processors 1223 and 1233.
The storage device may also store instructions, e.g., source code or binary code, for performing the techniques described above. The storage device may additionally store data used and manipulated by the computer processor. For example, a storage device in video encoder 1220 or video decoder 1230 may be a database accessed by computer systems 1223 or 1233. Other examples of storage devices include Random Access Memory (RAM), read-only memory (ROM), hard drives, magnetic disks, optical disks, CD-ROMs, DVDs, flash memory, USB memory cards, or any other medium from which a computer can read.
The memory or storage device may be an example of a non-transitory computer-readable storage medium used by or in connection with a video encoder and/or decoder. The non-transitory computer readable storage medium contains instructions for controlling a computer system to be configured to perform the functions described in particular embodiments. The instructions, when executed by one or more computer processors, may be configured to perform the functions described in particular embodiments.
Also, it is noted that some embodiments have been described as a process which may be depicted as a flowchart or a block diagram. Although a flowchart or block diagram may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. Further, the order of the operations may be rearranged. The process may have additional steps not included in the figures.
Particular embodiments may be implemented in a non-transitory computer readable storage medium for use by or in connection with an instruction execution system, apparatus, system, or machine. The computer readable storage medium contains instructions for controlling a computer system to perform the method described in the specific embodiments. A computer system includes one or more computing devices. The instructions, when executed by one or more computer processors, may be configured to perform the functions described in particular embodiments.
As used in the specification herein and throughout the claims that follow, "a" and "the" include plural references unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of "in" includes "in" and "on" unless the context clearly dictates otherwise.
Although exemplary embodiments of the invention have been described in language specific to structural features and/or methodological acts above, it is to be understood that those skilled in the art will readily appreciate that many additional modifications are possible in the exemplary embodiments without materially departing from the novel teachings and advantages of the invention. Furthermore, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Accordingly, these and all such modifications are intended to be included within the scope of this invention as interpreted according to the breadth and scope of the appended claims.

Claims (20)

1. A method of video encoding, comprising:
identifying a coding unit having a top neighboring row, a left neighboring column, a bottom neighboring row, and a right neighboring column;
determining motion information associated with a bottom-right neighboring pixel positioned at an intersection of the bottom neighboring row and the right neighboring column based, at least in part, on motion information associated with the top neighboring row and the left neighboring column;
determining motion information associated with the right neighboring column based at least in part on the motion information associated with the bottom-right neighboring pixel; and
encoding the coding unit.
2. The method of video encoding according to claim 1, further comprising:
determining motion information associated with the bottom neighboring row based at least in part on the motion information associated with the bottom-right neighboring pixel.
3. The method of video encoding according to claim 2, wherein a planar coding mode is employed.
4. The method of video encoding according to claim 3, wherein the coding unit is encoded according to HEVC.
5. The method of video encoding according to claim 3, wherein the coding unit is encoded according to VVC.
6. The method of video encoding according to claim 3, wherein the coding unit is encoded according to JVET.
7. The method of video encoding according to claim 2, further comprising:
determining a first weight value associated with the top neighboring row; and
determining a second weight value associated with the left neighboring column,
wherein the step of determining motion information associated with the bottom-right neighboring pixel is based at least in part on a combination of the first weight value and the motion information associated with the top neighboring row and a combination of the second weight value and the motion information associated with the left neighboring column.
8. The method of video encoding according to claim 7, wherein a planar coding mode is employed.
9. The method of video encoding according to claim 8, wherein the coding unit is encoded according to HEVC.
10. The method of video encoding according to claim 8, wherein the coding unit is encoded according to VVC.
11. The method of video encoding according to claim 8, wherein the coding unit is encoded according to JVET.
12. A system for video encoding, comprising:
storing in a memory a coding unit having a top neighboring row, a left neighboring column, a bottom neighboring row, and a right neighboring column;
determining and storing in the memory motion information associated with a bottom-right neighboring pixel positioned at an intersection of the bottom neighboring row and the right neighboring column based, at least in part, on motion information associated with the top neighboring row and the left neighboring column;
determining and storing in the memory motion information associated with the right neighboring column based at least in part on the motion information associated with the bottom-right neighboring pixel; and
encoding the coding unit.
13. The system for video encoding of claim 12, further comprising:
determining and storing in the memory motion information associated with the bottom neighboring row based at least in part on the motion information associated with the bottom-right neighboring pixel.
14. The system for video encoding of claim 13, wherein a planar coding mode is employed.
15. The system for video encoding of claim 14, wherein the coding unit is encoded according to at least one of HEVC, JVET, and VVC.
16. The system for video encoding of claim 13, further comprising:
determining and storing in the memory a first weight value associated with the top neighboring row; and
determining and storing in the memory a second weight value associated with the left neighboring column,
wherein the step of determining motion information associated with the bottom-right neighboring pixel is based at least in part on a combination of the first weight value and the motion information associated with the top neighboring row and a combination of the second weight value and the motion information associated with the left neighboring column.
17. The system for video encoding of claim 16, wherein a planar coding mode is employed.
18. The system for video encoding of claim 17, wherein the coding unit is encoded according to HEVC.
19. The system for video encoding of claim 17, wherein the coding unit is encoded according to VVC.
20. The system for video encoding of claim 17, wherein the coding unit is encoded according to JVET.
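The following sketch is a purely illustrative, non-normative rendering of the derivation recited in claims 1, 2, and 7 above: a bottom-right neighboring motion vector is formed as an unequal-weight combination of motion information from the top neighboring row and the left neighboring column, and the right neighboring column is then derived from that vector. Reducing each neighboring row or column to a single vector by averaging, the use of an above-right neighbor, and the distance-based interpolation weights are assumptions of the sketch, not requirements of the claims.

#include <vector>

struct Mv { int x; int y; };   // motion vector components (units are an assumption of the sketch)

// Reduce a neighboring row or column to one representative vector by averaging
// (assumption: the claims do not fix how the neighbors are combined).
static Mv average(const std::vector<Mv>& mvs) {
    long sx = 0, sy = 0;
    for (const Mv& m : mvs) { sx += m.x; sy += m.y; }
    const int n = static_cast<int>(mvs.size());
    return n > 0 ? Mv{ static_cast<int>(sx / n), static_cast<int>(sy / n) } : Mv{0, 0};
}

// Claim 7: combine the top neighboring row and the left neighboring column with
// unequal weights wTop and wLeft to derive the bottom-right neighboring MV.
Mv deriveBottomRight(const std::vector<Mv>& topRow, const std::vector<Mv>& leftCol,
                     int wTop, int wLeft) {
    const Mv t = average(topRow);
    const Mv l = average(leftCol);
    const int wSum = wTop + wLeft;
    return { (wTop * t.x + wLeft * l.x + wSum / 2) / wSum,
             (wTop * t.y + wLeft * l.y + wSum / 2) / wSum };
}

// Claim 1, third step: derive the right neighboring column of an H-row coding unit
// from the bottom-right MV; interpolation against the above-right neighbor with
// distance-based weights is an assumption of this sketch.
std::vector<Mv> deriveRightColumn(const Mv& aboveRight, const Mv& bottomRight, int H) {
    std::vector<Mv> col(H);
    for (int y = 0; y < H; ++y) {
        const int wBottom = y + 1;
        const int wTop    = H - 1 - y;   // wTop + wBottom == H
        col[y] = { (wTop * aboveRight.x + wBottom * bottomRight.x + H / 2) / H,
                   (wTop * aboveRight.y + wBottom * bottomRight.y + H / 2) / H };
    }
    return col;
}

Under the same assumptions, the bottom neighboring row of claim 2 could be filled analogously by interpolating between a below-left neighbor and the derived bottom-right vector.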
CN201980025081.1A 2018-04-15 2019-04-15 Unequal weight planar motion vector derivation Pending CN111955009A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201862657831P 2018-04-15 2018-04-15
US62/657,831 2018-04-15
PCT/US2019/027560 WO2019204234A1 (en) 2018-04-15 2019-04-15 Unequal weight planar motion vector derivation

Publications (1)

Publication Number Publication Date
CN111955009A true CN111955009A (en) 2020-11-17

Family

ID=73337455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980025081.1A Pending CN111955009A (en) 2018-04-15 2019-04-15 Unequal weight planar motion vector derivation

Country Status (1)

Country Link
CN (1) CN111955009A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050013498A1 (en) * 2003-07-18 2005-01-20 Microsoft Corporation Coding of motion vector information
CN102934444A (en) * 2010-04-06 2013-02-13 三星电子株式会社 Method and apparatus for video encoding and method and apparatus for video decoding
CN103039075A (en) * 2010-05-21 2013-04-10 Jvc建伍株式会社 Image encoding apparatus, image encoding method, image encoding program, image decoding apparatus, image decoding method and image decoding program
US20120134415A1 (en) * 2010-11-29 2012-05-31 Mediatek Inc. Method and Apparatus of Extended Motion Vector Predictor
JP2012165278A (en) * 2011-02-08 2012-08-30 Jvc Kenwood Corp Image encoding device, image encoding method, and image encoding program
WO2017157259A1 (en) * 2016-03-15 2017-09-21 Mediatek Inc. Method and apparatus of video coding with affine motion compensation
KR20180033030A (en) * 2016-09-23 2018-04-02 세종대학교산학협력단 Method and apparatus for processing a video signal based on adaptive block patitioning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KRIT PANUSOPONE; SEUNGWOOK HONG: "Unequal Weight Planar Prediction and Constrained PDPC", Joint Video Exploration Team (JVET) of ITU-T SG 16, document JVET-E0068-r1, page 2 *
NA ZHANG: "Planar Motion Vector Prediction", Joint Video Experts Team (JVET) of ITU-T SG 16, document JVET-J0061-v2, page 2 *

Similar Documents

Publication Publication Date Title
US11936854B2 (en) Adaptive unequal weight planar prediction
US11936858B1 (en) Constrained position dependent intra prediction combination (PDPC)
US11956459B2 (en) Video bitstream coding
CN110959290B (en) Intra-frame mode JVT compiling method
US11259027B2 (en) System and method for constructing a plane for planar prediction
US11622107B2 (en) Unequal weight planar motion vector derivation
CN112655214A (en) Motion information storage for video coding and signaling
CN111903133A (en) Variable template size for template matching
CN112106369A (en) Reducing motion vector information transmission in bi-directional temporal prediction
CN111955009A (en) Unequal weight planar motion vector derivation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination