TWI833795B - Fast encoding methods for interweaved prediction - Google Patents
- Publication number
- TWI833795B (application TW108131752A)
- Authority
- TW
- Taiwan
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
- H04N19/119—Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/157—Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
- H04N19/159—Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/537—Motion estimation other than block-based
- H04N19/53—Multi-resolution motion estimation; Hierarchical motion estimation
Abstract
Description
This patent document relates to video processing technologies, devices, and systems.
Under the applicable patent law and/or the rules of the Paris Convention, this application timely claims the priority of and benefit from International Patent Application No. PCT/CN2018/103770, filed on September 3, 2018. The entire disclosure of International Patent Application No. PCT/CN2018/103770 is incorporated by reference as part of the disclosure of this patent document for all purposes.
Motion compensation (MC) is a technique in video processing that predicts the frames of a video, given previous and/or future frames, by accounting for the motion of the camera and/or of objects in the video. Motion compensation can be used in the encoding of video data for video compression.
This document discloses methods, systems, and apparatuses related to sub-block-based motion prediction in video motion compensation.
In one representative aspect, a method of video processing is disclosed. The method includes determining a prediction block of a video block using a first intermediate prediction block and a second intermediate prediction block based on a characteristic of the video block, and generating a coded representation of the video block using the prediction block. The first intermediate prediction block is generated by subdividing the video block into a first set of sub-blocks, and the second intermediate prediction block is generated by subdividing the video block into a second set of sub-blocks. At least one sub-block in the second set has a size different from that of a sub-block in the first set.
In another representative aspect, a method for improving the bandwidth usage and prediction accuracy of a block-based motion prediction video system is disclosed. The method includes selecting a set of pixels from a video frame to form a block, subdividing the block into a first set of sub-blocks according to a first pattern, generating a first intermediate prediction block based on the first set of sub-blocks, subdividing the block into a second set of sub-blocks according to a second pattern, generating a second intermediate prediction block based on the second set of sub-blocks, and determining a prediction block based on the first intermediate prediction block and the second intermediate prediction block. At least one sub-block in the second set has a size different from that of a sub-block in the first set.
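The final step of the aspect above, combining two intermediate predictions into one prediction block, can be sketched as a per-sample weighted blend. This is a minimal illustration, not the patent's normative procedure; the function name and the weight layout are assumptions for the sketch.

```python
def combine_predictions(p1, p2, w1):
    """Blend two intermediate prediction blocks sample-by-sample.

    p1, p2: 2-D lists holding intermediate predictions of the same block,
            produced with two different sub-block division patterns.
    w1:     2-D list of per-sample weights for p1; p2 gets (1 - w1).
    """
    return [[w1[i][j] * p1[i][j] + (1.0 - w1[i][j]) * p2[i][j]
             for j in range(len(p1[0]))]
            for i in range(len(p1))]


# Toy 4x4 block with equal weights: the result is the per-sample average.
p1 = [[100.0] * 4 for _ in range(4)]
p2 = [[120.0] * 4 for _ in range(4)]
w = [[0.5] * 4 for _ in range(4)]
pred = combine_predictions(p1, p2, w)
```

A per-sample weight array allows, for example, giving less weight near sub-block boundaries of one pattern where that pattern's prediction is least reliable.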
In another representative aspect, a method for improving block-based motion prediction in a video system is disclosed. The method includes selecting a set of pixels from a video frame to form a block, subdividing the block into multiple sub-blocks based on the size of the block or on information from another block that is spatially or temporally adjacent to the block, and generating motion vector predictions by applying a coding algorithm to the multiple sub-blocks. At least one of the multiple sub-blocks has a size different from that of the other sub-blocks.
In another representative aspect, an apparatus is disclosed that includes a processor and a non-transitory memory with instructions thereon. The instructions, when executed by the processor, cause the processor to select a set of pixels from a video frame to form a block, subdivide the block into a first set of sub-blocks according to a first pattern, generate a first intermediate prediction block based on the first set of sub-blocks, subdivide the block into a second set of sub-blocks according to a second pattern, wherein at least one sub-block in the second set has a size different from that of a sub-block in the first set, generate a second intermediate prediction block based on the second set of sub-blocks, and determine a prediction block based on the first intermediate prediction block and the second intermediate prediction block.
In yet another representative aspect, the various techniques described herein may be implemented as a computer program product stored on a non-transitory computer-readable medium. The computer program product includes program code for carrying out the methods described herein.
In yet another representative aspect, a video decoder apparatus may implement a method as described herein.
The details of one or more implementations are set forth in the accompanying attachments, the drawings, and the description below. Other features will be apparent from the description and drawings, and from the claims.
100, 200, 300, 400: blocks
101: sub-blocks
500: current CU
501: left
502: above
503: above right
504: below left
505: above left
600: CU
601: sub-CU
650: reference picture
651: block
700: 8×8 CU
701~704: A~D
711~714: a~d
900: block
901: padding area
1000, 1100: current CU
1001, 1002: MV0, MV1
1003, 1004: TD0, TD1
1010, 1011, 1100, 1110: reference pictures
1200: unilateral motion estimation
1300: current block
1301, 1302: pattern 0, pattern 1
1303, 1304, 1305: P1, P2, P
1500, 1550: methods
1502~1512, 1552~1556:
1600: computer system
1605: processors
1610: memory
1615: network adapter
1625: interconnect
1700: mobile device
1701: processor or controller
1702: memory
1703: I/O interfaces and sensors
1704: display device
1900: method
1902, 1904: operations
2000: system
2002: input
2004: coding component
2006, 2008: components
2010: display interface
A, B, C, D, E: sub-blocks
V0, V1, v1~v4: vectors
τ0, τ1: distances
Figure 1 is a schematic diagram showing an example of sub-block-based prediction.
Figure 2 shows an example of the affine motion field of a block described by two control-point motion vectors.
Figure 3 shows an example of the affine motion vector field of each sub-block of a block.
Figure 4 shows an example of motion vector prediction for a block 400 in AF_INTER mode.
Figure 5A shows an example of the selection order of candidate blocks for a current coding unit (CU).
Figure 5B shows another example of candidate blocks for a current CU in AF_MERGE mode.
Figure 6 shows an example of the alternative temporal motion vector prediction (ATMVP) motion prediction process for a CU.
Figure 7 shows an example of one CU with four sub-blocks and neighboring blocks.
Figure 8 shows an example optical flow trajectory in the bi-directional optical flow (BIO) method.
Figure 9A shows an example of access positions outside of a block.
Figure 9B shows an example of a padding area that can be used to avoid extra memory accesses and computations.
Figure 10 shows an example of bilateral matching used in the frame-rate up-conversion (FRUC) method.
Figure 11 shows an example of template matching used in the FRUC method.
Figure 12 shows an example of unilateral motion estimation (ME) in the FRUC method.
Figure 13 shows an example of interleaved prediction with two subdivision patterns in accordance with the disclosed technology.
Figure 14A shows an example subdivision pattern in which a block is subdivided into 4×4 sub-blocks in accordance with the disclosed technology.
Figure 14B shows an example subdivision pattern in which a block is subdivided into 8×8 sub-blocks in accordance with the disclosed technology.
Figure 14C shows an example subdivision pattern in which a block is subdivided into 4×8 sub-blocks in accordance with the disclosed technology.
Figure 14D shows an example subdivision pattern in which a block is subdivided into 8×4 sub-blocks in accordance with the disclosed technology.
Figure 14E shows an example subdivision pattern in which a block is subdivided into non-uniform sub-blocks in accordance with the disclosed technology.
Figure 14F shows another example subdivision pattern in which a block is subdivided into non-uniform sub-blocks in accordance with the disclosed technology.
Figure 14G shows yet another example subdivision pattern in which a block is subdivided into non-uniform sub-blocks in accordance with the disclosed technology.
Figure 15A is an example flowchart of a method for improving the bandwidth usage and prediction accuracy of a block-based motion prediction video system in accordance with the disclosed technology.
Figure 15B is another example flowchart of a method for improving the bandwidth usage and prediction accuracy of a block-based motion prediction video system in accordance with the disclosed technology.
Figure 16 is a block diagram illustrating an example of the architecture of a computer system or other control device that can be utilized to implement various portions of the disclosed technology.
Figure 17 shows a block diagram of an example embodiment of a mobile device that can be utilized to implement various portions of the disclosed technology.
Figures 18A to 18C show example embodiments of a partial interleaved prediction technique.
Figure 19 is a flowchart representation of a method for improving block-based motion prediction in a video system.
Figure 20 is a block diagram of an example video processing system in which the disclosed technology can be implemented.
Global motion compensation is one of many variants of motion compensation techniques and can be used to predict the motion of the camera. However, objects that move within a frame are not adequately represented by the various implementations of global motion compensation. Local motion estimation, such as block motion compensation, in which a frame is subdivided into blocks of pixels for motion prediction, can be used to account for objects that move within the frame.
Sub-block-based prediction, developed on top of block motion compensation, was first introduced into video coding standards by High Efficiency Video Coding (HEVC) Annex I (3D-HEVC). Figure 1 is a schematic diagram showing an example of sub-block-based prediction. With sub-block-based prediction, a block 100, such as a coding unit (CU) or a prediction unit (PU), is subdivided into several non-overlapping sub-blocks 101. Different sub-blocks may be assigned different motion information, such as reference indices or motion vectors (MVs). Motion compensation is then performed individually for each sub-block.
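The per-sub-block motion compensation just described can be sketched as follows. This is a deliberately simplified illustration with integer motion vectors and square sub-blocks; real codecs use fractional MVs with interpolation filters, and the function name is an assumption.

```python
def subblock_mc(ref, y0, x0, h, w, sub, mvs):
    """Predict an h-by-w block whose top-left corner in the frame is
    (y0, x0) by copying each sub-by-sub sub-block from the reference
    frame `ref`, displaced by that sub-block's own integer MV.

    mvs[(i, j)] is the (dy, dx) motion vector assigned to the sub-block
    in sub-block row i and sub-block column j.
    """
    pred = [[0] * w for _ in range(h)]
    for i in range(h // sub):
        for j in range(w // sub):
            dy, dx = mvs[(i, j)]
            for r in range(sub):
                for c in range(sub):
                    pred[i * sub + r][j * sub + c] = \
                        ref[y0 + i * sub + r + dy][x0 + j * sub + c + dx]
    return pred


# A 16x16 reference frame with distinct sample values.
ref = [[y * 16 + x for x in range(16)] for y in range(16)]
# An 8x8 block at (4, 4) split into four 4x4 sub-blocks; one sub-block
# carries a different MV than the others.
mvs = {(0, 0): (0, 0), (0, 1): (0, 0), (1, 0): (0, 0), (1, 1): (2, 1)}
pred = subblock_mc(ref, 4, 4, 8, 8, 4, mvs)
```

Because each sub-block fetches its own displaced region, neighboring sub-blocks of one block can track differently moving content, which is exactly what a single block-level MV cannot do.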
To explore future video coding technologies beyond HEVC, the Joint Video Exploration Team (JVET) was founded jointly by the Video Coding Experts Group (VCEG) and the Moving Picture Experts Group (MPEG) in 2015. Many methods have been adopted by JVET and added to reference software named the Joint Exploration Model (JEM). In JEM, sub-block-based prediction is adopted in several coding techniques, such as affine prediction, alternative temporal motion vector prediction (ATMVP), spatial-temporal motion vector prediction (STMVP), bi-directional optical flow (BIO), and frame-rate up-conversion (FRUC), which are discussed in detail below.
Affine prediction
In HEVC, only a translational motion model is applied for motion compensation prediction (MCP). However, the camera and the objects can undergo many kinds of motion, e.g., zoom in/out, rotation, perspective motion, and/or other irregular motions. JEM, on the other hand, applies a simplified affine transform motion compensation prediction. Figure 2 shows an example of the affine motion field of a block 200 described by two control-point motion vectors V0 and V1. The motion vector field (MVF) of block 200 can be described by the following equation:

vx = ((v1x − v0x) / w) · x − ((v1y − v0y) / w) · y + v0x
vy = ((v1y − v0y) / w) · x + ((v1x − v0x) / w) · y + v0y    (1)
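The affine field of equation (1) can be evaluated at any sample position from the two control-point MVs. A minimal sketch (the helper name is illustrative):

```python
def affine_mv(x, y, v0, v1, w):
    """Four-parameter affine motion vector at sample (x, y), equation (1).

    v0 = (v0x, v0y): MV of the top-left control point.
    v1 = (v1x, v1y): MV of the top-right control point.
    w:  block width (distance between the two control points).
    """
    v0x, v0y = v0
    v1x, v1y = v1
    a = (v1x - v0x) / w  # shared scale/rotation terms of the affine model
    b = (v1y - v0y) / w
    vx = a * x - b * y + v0x
    vy = b * x + a * y + v0y
    return vx, vy


# If both control points carry the same MV, the field is pure translation.
assert affine_mv(5, 3, (2, 1), (2, 1), 8) == (2, 1)
```

With only two control points, the model covers rotation and zoom in addition to translation, which is why it is called a simplified (four-parameter) affine model.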
As shown in Figure 2, (v0x, v0y) is the motion vector of the top-left corner control point, and (v1x, v1y) is the motion vector of the top-right corner control point. To simplify the motion compensation prediction, sub-block-based affine transform prediction can be applied. The sub-block size M×N is derived as follows:

M = clip3(4, w, (w × MvPre) / max(|v1x − v0x|, |v1y − v0y|))
N = clip3(4, h, (h × MvPre) / max(|v2x − v0x|, |v2y − v0y|))    (2)
Here, MvPre is the motion vector fractional accuracy (e.g., 1/16 in JEM), and (v2x, v2y) is the motion vector of the bottom-left control point, calculated according to equation (1). If necessary, M and N can be adjusted downward so that they are divisors of w and h, respectively.
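The sub-block size rule just described can be sketched as follows, with MvPre = 1/16 as in JEM. The helper names and the zero-difference guard (returning the full block dimension when the control-point MVs are identical) are assumptions of this sketch, and the downward divisor adjustment is a naive loop for illustration.

```python
def clip3(lo, hi, v):
    """Clamp v into the inclusive range [lo, hi]."""
    return max(lo, min(hi, v))


def affine_subblock_size(w, h, v0, v1, v2, mv_pre=1.0 / 16.0):
    """Sub-block size (M, N) following the clip3-based rule, then rounded
    down so that M divides w and N divides h."""
    v0x, v0y = v0
    v1x, v1y = v1
    v2x, v2y = v2
    dx = max(abs(v1x - v0x), abs(v1y - v0y))
    dy = max(abs(v2x - v0x), abs(v2y - v0y))
    m = w if dx == 0 else clip3(4, w, w * mv_pre / dx)
    n = h if dy == 0 else clip3(4, h, h * mv_pre / dy)
    m, n = int(m), int(n)
    while w % m:  # adjust downward until M is a divisor of w
        m -= 1
    while h % n:  # adjust downward until N is a divisor of h
        n -= 1
    return m, n
```

Intuitively, the larger the difference between the control-point MVs, the finer the sub-block grid has to be to follow the affine field.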
Figure 3 shows an example of the affine MVF of each sub-block of a block 300. To derive the motion vector of each M×N sub-block, the motion vector of the center sample of each sub-block can be calculated according to equation (1) and rounded to the motion vector fractional accuracy (e.g., 1/16 in JEM). Motion compensation interpolation filters can then be applied to generate the prediction of each sub-block with the derived motion vector. After MCP, the high-accuracy motion vector of each sub-block is rounded and saved with the same accuracy as a normal motion vector.
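Sampling the affine field at each sub-block's center and rounding to 1/16-sample units can be sketched as below. The exact center-coordinate convention and the helper name are assumptions of this illustration.

```python
def center_mv_sixteenths(i, j, m, n, v0, v1, w):
    """MV of the (i, j)-th m-by-n sub-block, evaluated from the two
    control-point MVs at the sub-block's center sample and rounded to
    1/16-sample units (the JEM fractional MV accuracy)."""
    cx = j * m + m / 2.0  # center of the sub-block, x coordinate
    cy = i * n + n / 2.0  # center of the sub-block, y coordinate
    v0x, v0y = v0
    v1x, v1y = v1
    a = (v1x - v0x) / w
    b = (v1y - v0y) / w
    vx = a * cx - b * cy + v0x
    vy = b * cx + a * cy + v0y
    return round(vx * 16), round(vy * 16)  # stored in 1/16-sample units


# Pure translation by (1, 0) samples: every sub-block stores MV (16, 0)
# in 1/16-sample units.
mv = center_mv_sixteenths(0, 0, 4, 4, (1, 0), (1, 0), 16)
```

Storing one rounded MV per sub-block is what lets ordinary interpolation filters be reused unchanged for the affine case.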
In JEM, there are two affine motion modes: AF_INTER mode and AF_MERGE mode. For CUs with both width and height larger than 8, AF_INTER mode can be applied. An affine flag at the CU level is signaled in the bitstream to indicate whether AF_INTER mode is used. In AF_INTER mode, a candidate list with motion vector pairs {(v0, v1) | v0 = {vA, vB, vC}, v1 = {vD, vE}} is constructed using the neighboring blocks. Figure 4 shows an example of motion vector prediction (MVP) for a block 400 in AF_INTER mode. As shown in Figure 4, v0 is selected from the motion vectors of sub-block A, B, or C. The motion vectors from the neighboring blocks can be scaled according to the reference list. The motion vectors can also be scaled according to the relationship among the picture order count (POC) of the reference of the neighboring block, the POC of the reference of the current CU, and the POC of the current CU. The approach for selecting v1 from the neighboring sub-blocks D and E is similar. If the number of candidates in the list is smaller than 2, the list is padded with motion vector pairs composed by duplicating each of the AMVP candidates. When the candidate list is larger than 2, the candidates can first be sorted according to the neighboring motion vectors (e.g., based on the similarity of the two motion vectors in a pair of candidates). In some implementations, the first two candidates are retained. In some embodiments, a rate-distortion (RD) cost check is used to determine which motion vector pair candidate is selected as the control point motion vector prediction (CPMVP) of the current CU. An index indicating the position of the CPMVP in the candidate list can be signaled in the bitstream. After the CPMVP of the current affine CU is determined, affine motion estimation is applied and the control point motion vector (CPMV) is found. The difference between the CPMV and the CPMVP is then signaled in the bitstream.
When a CU is coded in AF_MERGE mode, it obtains the first block coded in affine mode from the valid neighboring reconstructed blocks. Figure 5A shows an example of the selection order of candidate blocks for a current CU 500. As shown in Figure 5A, the selection order can be from left (501), above (502), above-right (503), below-left (504), to above-left (505) of the current CU 500. Figure 5B shows another example of candidate blocks for the current CU 500 in AF_MERGE mode. If the neighboring below-left block 501 is coded in affine mode, as shown in Figure 5B, the motion vectors v2, v3, and v4 of the top-left corner, above-right corner, and bottom-left corner of the CU containing sub-block 501 are derived. The motion vector v0 of the top-left corner of the current CU 500 is calculated based on v2, v3, and v4. The motion vector v1 of the above-right of the current CU is calculated accordingly.
After the CPMVs v0 and v1 of the current CU are computed according to the affine motion model in equation (1), the MVF of the current CU can be generated. In order to identify whether the current CU is coded with AF_MERGE mode, an affine flag can be signaled in the bitstream when there is at least one neighboring block coded in affine mode.
Alternative temporal motion vector prediction (ATMVP)
In the ATMVP method, the temporal motion vector prediction (TMVP) method is modified by fetching multiple sets of motion information (including motion vectors and reference indices) from blocks smaller than the current CU.
Figure 6 shows an example of the ATMVP motion prediction process for a CU 600. The ATMVP method predicts the motion vectors of the sub-CUs 601 within a CU 600 in two steps. The first step is to identify the corresponding block 651 in a reference picture 650 with a temporal vector. The reference picture 650 is also referred to as the motion source picture. The second step is to split the current CU 600 into sub-CUs 601 and obtain the motion vectors as well as the reference indices of each sub-CU from the block corresponding to each sub-CU.
In the first step, the reference picture 650 and the corresponding block are determined by the motion information of the spatial neighboring blocks of the current CU 600. To avoid the repetitive scanning process of the neighboring blocks, the first merge candidate in the merge candidate list of the current CU 600 is used. The first available motion vector and its associated reference index are set to be the temporal vector and the index to the motion source picture. In this way, the corresponding block may be more accurately identified compared with TMVP, where the corresponding block (sometimes called a collocated block) is always in a bottom-right or center position relative to the current CU.
In the second step, the corresponding block of each sub-CU is identified by the temporal vector in the motion source picture 650, by adding the temporal vector to the coordinates of the current CU. For each sub-CU, the motion information of its corresponding block (e.g., the smallest motion grid that covers the center sample) is used to derive the motion information for the sub-CU. After the motion information of a corresponding N×N block is identified, it is converted to the motion vectors and reference indices of the current sub-CU in the same way as the TMVP of HEVC, in which motion scaling and other procedures apply. For example, the decoder checks whether the low-delay condition is fulfilled (e.g., the POCs of all reference pictures of the current picture are smaller than the POC of the current picture) and possibly uses the motion vector MVx (e.g., the motion vector corresponding to reference picture list X) to predict the motion vector MVy for each sub-CU (e.g., with X being equal to 0 or 1 and Y being equal to 1−X).
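The HEVC-style temporal MV scaling applied when converting a collocated block's motion can be sketched as a ratio of POC distances. This is a simplified, hypothetical helper; real codecs use fixed-point arithmetic with clipping rather than floating-point division.

```python
def scale_mv(mv, poc_cur, poc_cur_ref, poc_col, poc_col_ref):
    """Scale a collocated block's MV by the ratio of POC distances.

    tb: POC distance from the current picture to its reference.
    td: POC distance from the collocated picture to its reference.
    """
    tb = poc_cur - poc_cur_ref
    td = poc_col - poc_col_ref
    scale = tb / td
    return mv[0] * scale, mv[1] * scale


# The collocated MV spans a POC distance of 8, while the current picture's
# reference is only 4 apart, so the MV is halved.
scaled = scale_mv((4, -2), 8, 4, 16, 8)
```

Scaling keeps the implied velocity constant: a displacement over 8 pictures is mapped to the proportional displacement over 4 pictures.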
Spatial-temporal motion vector prediction (STMVP)
In the STMVP method, the motion vectors of the sub-CUs are derived recursively, following raster scan order. Figure 7 shows an example of one CU with four sub-blocks and neighboring blocks. Consider an 8×8 CU 700 that contains four 4×4 sub-CUs, A (701), B (702), C (703), and D (704). The neighboring 4×4 blocks in the current frame are labeled a (711), b (712), c (713), and d (714).
The motion derivation for sub-CU A starts by identifying its two spatial neighbors. The first neighbor is the N×N block above sub-CU A 701 (block c 713). If this block c (713) is not available or is intra-coded, the other N×N blocks above sub-CU A (701) are checked (from left to right, starting at block c 713). The second neighbor is the block to the left of sub-CU A 701 (block b 712). If block b (712) is not available or is intra-coded, the other blocks to the left of sub-CU A 701 are checked (from top to bottom, starting at block b 712). The motion information obtained from the neighboring blocks for each list is scaled to the first reference frame for a given list. Next, the temporal motion vector prediction (TMVP) of sub-block A 701 is derived by following the same procedure as the TMVP specified in HEVC. The motion information of the collocated block at block D 704 is fetched and scaled accordingly. Finally, after retrieving and scaling the motion information, all available motion vectors are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub-CU.
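The final averaging step over the up-to-three available candidates (above neighbor, left neighbor, TMVP) can be sketched as follows; the function name and list layout are assumptions of this illustration.

```python
def stmvp_average(candidates):
    """Average the available candidate MVs for one reference list.

    candidates: [above_neighbor_mv, left_neighbor_mv, tmvp_mv], where an
    unavailable candidate is None. Returns None if nothing is available.
    """
    avail = [mv for mv in candidates if mv is not None]
    if not avail:
        return None
    n = len(avail)
    return (sum(mv[0] for mv in avail) / n,
            sum(mv[1] for mv in avail) / n)


# Two of the three candidates are available: the result is their average.
mv = stmvp_average([(2, 4), None, (4, 8)])
```

Averaging only over the candidates that actually exist is what makes the derivation robust at picture borders and next to intra-coded neighbors.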
Bidirectional Optical Flow (BIO)
The bidirectional optical flow (BIO) method is a sample-wise motion refinement performed on top of the block-wise motion compensation of bidirectionally predicted blocks. In some implementations, the sample-level motion refinement does not use signaling.
Let I^(k) be the luma value from reference k (k=0, 1) after block motion compensation, and let ∂I^(k)/∂x and ∂I^(k)/∂y be the horizontal and vertical components of the gradient of I^(k), respectively. Assuming the optical flow is valid, the motion vector field (v_x, v_y) is given by:

∂I^(k)/∂t + v_x ∂I^(k)/∂x + v_y ∂I^(k)/∂y = 0
Combining this optical flow equation with a Hermite interpolation of the motion trajectory of each sample yields a unique third-order polynomial that matches both the function values I^(k) and the derivatives ∂I^(k)/∂x, ∂I^(k)/∂y at the endpoints. The value of this polynomial at t=0 is the BIO prediction:

pred_BIO = 1/2 · ( I^(0) + I^(1) + (v_x/2) · (τ_1 ∂I^(1)/∂x − τ_0 ∂I^(0)/∂x) + (v_y/2) · (τ_1 ∂I^(1)/∂y − τ_0 ∂I^(0)/∂y) )
Figure 8 shows an exemplary optical flow trajectory in the bidirectional optical flow (BIO) method. Here, τ_0 and τ_1 denote the distances to the reference frames, computed from the POCs of Ref0 and Ref1: τ_0 = POC(current) − POC(Ref0), τ_1 = POC(Ref1) − POC(current). If both predictions come from the same temporal direction (both from the past or both from the future), the signs differ (e.g., τ_0 · τ_1 < 0). In this case, BIO is applied only if the predictions are not from the same time instant (e.g., τ_0 ≠ τ_1), both reference regions have non-zero motion (e.g., MVx_0, MVy_0, MVx_1, MVy_1 ≠ 0), and the block motion vectors are proportional to the temporal distances (e.g., MVx_0/MVx_1 = MVy_0/MVy_1 = −τ_0/τ_1).
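The applicability conditions just listed can be sketched as a check on the POCs and block MVs. `bio_enabled` is a hypothetical helper written for illustration, not an actual codec API; the proportionality test is done with cross-multiplication to avoid division.

```python
def bio_enabled(poc_cur, poc_ref0, poc_ref1, mv0, mv1):
    """Return True when BIO may be applied per the conditions in the text."""
    tau0 = poc_cur - poc_ref0   # distance to Ref0
    tau1 = poc_ref1 - poc_cur   # distance to Ref1
    if tau0 * tau1 > 0:
        return True             # true bi-directional case: one past, one future
    # Same temporal direction (tau0 * tau1 < 0): extra conditions apply.
    if tau0 == tau1:
        return False            # same time instant
    if mv0 == (0, 0) or mv1 == (0, 0):
        return False            # both reference regions need non-zero motion
    # MVx0/MVx1 = MVy0/MVy1 = -tau0/tau1, checked without division:
    return (mv0[0] * tau1 == -mv1[0] * tau0 and
            mv0[1] * tau1 == -mv1[1] * tau0)

print(bio_enabled(8, 4, 12, (1, 1), (1, 1)))  # -> True (one past, one future ref)
print(bio_enabled(8, 0, 4, (8, 0), (4, 0)))   # -> True (same side, proportional MVs)
```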
The motion vector field (v_x, v_y) is determined by minimizing the difference Δ between the values at points A and B. Figures 9A-9B show examples of the intersection of a motion trajectory with the reference frame planes. The model uses only the first linear term of a local Taylor expansion for Δ:

Δ = I^(0) − I^(1) + v_x (τ_1 ∂I^(1)/∂x + τ_0 ∂I^(0)/∂x) + v_y (τ_1 ∂I^(1)/∂y + τ_0 ∂I^(0)/∂y)
All values in the above equation depend on the sample position, denoted (i′, j′). Assuming the motion is consistent in the local surrounding area, Δ can be minimized inside a (2M+1)×(2M+1) square window Ω centered on the currently predicted point (i, j), where M equals 2:

(v_x, v_y) = argmin_{v_x, v_y} Σ_{[i′,j′]∈Ω} Δ²[i′, j′]
For this optimization problem, the JEM uses a simplified approach that minimizes first in the vertical direction and then in the horizontal direction. This yields:

v_x = (s_1 + r) > m ? clip3(−thBIO, thBIO, −s_3/(s_1 + r)) : 0    Equation (7)

v_y = (s_5 + r) > m ? clip3(−thBIO, thBIO, −(s_6 − v_x·s_2/2)/(s_5 + r)) : 0    Equation (8)
where

s_1 = Σ_{[i′,j′]∈Ω} (τ_1 ∂I^(1)/∂x + τ_0 ∂I^(0)/∂x)²
s_2 = Σ_{[i′,j′]∈Ω} (τ_1 ∂I^(1)/∂x + τ_0 ∂I^(0)/∂x)(τ_1 ∂I^(1)/∂y + τ_0 ∂I^(0)/∂y)
s_3 = Σ_{[i′,j′]∈Ω} (I^(1) − I^(0))(τ_1 ∂I^(1)/∂x + τ_0 ∂I^(0)/∂x)
s_5 = Σ_{[i′,j′]∈Ω} (τ_1 ∂I^(1)/∂y + τ_0 ∂I^(0)/∂y)²
s_6 = Σ_{[i′,j′]∈Ω} (I^(1) − I^(0))(τ_1 ∂I^(1)/∂y + τ_0 ∂I^(0)/∂y)    Equation (9)
To avoid division by zero or by a very small value, regularization parameters r and m can be introduced into equations (7) and (8):
r = 500 · 4^(d−8)    Equation (10)

m = 700 · 4^(d−8)    Equation (11)

Here, d is the bit depth of the video samples.
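For a given sample bit depth d, equations (10) and (11) evaluate as in this small sketch:

```python
def bio_regularization(d):
    """Return (r, m) per equations (10) and (11): 500*4^(d-8) and 700*4^(d-8)."""
    return 500 * 4 ** (d - 8), 700 * 4 ** (d - 8)

print(bio_regularization(8))   # -> (500, 700)
print(bio_regularization(10))  # -> (8000, 11200)
```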
To keep the memory access for BIO the same as for regular bi-predictive motion compensation, all prediction and gradient values I^(k), ∂I^(k)/∂x, ∂I^(k)/∂y are computed only for positions inside the current block. Figure 9A shows an example of access positions outside a block 900. As shown in Figure 9A, in equation (9) a (2M+1)×(2M+1) square window Ω centered on a currently predicted point on the boundary of the predicted block needs to access positions outside the block. In the JEM, the values of I^(k), ∂I^(k)/∂x, ∂I^(k)/∂y outside the block are set equal to the nearest available value inside the block. For example, this can be implemented as a padding area 901, as shown in Figure 9B.
With BIO, the motion field can be refined for each sample. To reduce the computational complexity, a block-based design of BIO is used in the JEM. The motion refinement can be calculated on a 4×4 block basis. In block-based BIO, the values of s_n in equation (9) are aggregated over all samples in a 4×4 block, and the aggregated values of s_n are then used to derive the BIO motion vector offset for that 4×4 block. More specifically, the following formula can be used for the block-based BIO derivation:

s_{n,b_k} = Σ_{(x,y)∈b_k} s_n(x, y)
Here, b_k denotes the set of samples belonging to the k-th 4×4 block of the predicted block. The s_n in equations (7) and (8) are replaced by ((s_{n,b_k}) >> 4) to derive the associated motion vector offsets.
In some cases, the MV refinement ("regiment") of BIO may be unreliable due to noise or irregular motion. Therefore, in BIO, the magnitude of the MV refinement is clipped to a threshold. The threshold is determined based on whether all reference pictures of the current picture come from one direction. For example, if all reference pictures of the current picture come from one direction, the threshold value is set to 12×2^(14−d); otherwise, it is set to 12×2^(13−d).
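The threshold selection and the clipping it implies can be sketched as follows; the helper names are hypothetical and integer refinement units are assumed.

```python
def bio_threshold(all_refs_one_direction, d):
    """12 * 2^(14-d) when all references lie on one side, else 12 * 2^(13-d)."""
    return 12 * 2 ** ((14 if all_refs_one_direction else 13) - d)

def clip_refinement(v, th):
    """Clip one MV refinement component to the range [-th, th]."""
    return max(-th, min(th, v))

th = bio_threshold(True, 8)           # 12 * 2^6 = 768 for 8-bit video
print(th, clip_refinement(1000, th))  # -> 768 768
```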
The gradients for BIO can be calculated at the same time as the motion-compensated interpolation, using operations consistent with the HEVC motion compensation process (e.g., a 2D separable finite impulse response (FIR) filter). In some embodiments, the input to this 2D separable FIR is the same reference frame samples as for the motion compensation process, together with the fractional position (fracX, fracY) according to the fractional part of the block motion vector. For the horizontal gradient ∂I/∂x, the signal is first interpolated vertically using BIOfilterS, corresponding to the fractional position fracY with de-scaling shift d−8; the gradient filter BIOfilterG is then applied in the horizontal direction, corresponding to the fractional position fracX with de-scaling shift 18−d. For the vertical gradient ∂I/∂y, the gradient filter BIOfilterG is first applied vertically, corresponding to the fractional position fracY with de-scaling shift d−8; signal displacement is then performed using BIOfilterS in the horizontal direction, corresponding to the fractional position fracX with de-scaling shift 18−d. The lengths of the interpolation filters for gradient calculation, BIOfilterG, and for signal displacement, BIOfilterS, can be kept short (e.g., 6-tap) to maintain reasonable complexity. Table 1 shows exemplary filters that can be used for gradient calculation at different fractional positions of the block motion vector in BIO. Table 2 shows exemplary interpolation filters that can be used for prediction signal generation in BIO.
Table 1: Exemplary filters for gradient calculation in BIO
In the JEM, BIO can be applied to all bi-predicted blocks when the two predictions come from different reference pictures. BIO can be disabled when local illumination compensation (LIC) is enabled for a CU.
In some embodiments, OBMC is applied to a block after the normal MC process. To reduce computational complexity, BIO may not be applied during the OBMC process. This means that BIO is applied in the MC process of a block when its own MV is used, but is not applied in the MC process when the MV of a neighboring block is used during the OBMC process.
Frame Rate Up-Conversion (FRUC)
A FRUC flag can be signaled for a CU when its merge flag is true. When the FRUC flag is false, a merge index can be signaled and the regular merge mode is used. When the FRUC flag is true, an additional FRUC mode flag can be signaled to indicate which method (e.g., bilateral matching or template matching) is to be used to derive the motion information of the block.
At the encoder side, the decision on whether to use the FRUC merge mode for a CU is based on RD cost selection, as is done for normal merge candidates. For example, multiple matching modes (e.g., bilateral matching and template matching) are checked for a CU using RD cost selection. The one leading to the minimal cost is further compared to the other CU modes. If a FRUC matching mode is the most efficient one, the FRUC flag is set to true for the CU and the related matching mode is used.
Typically, the motion derivation process in FRUC merge mode has two steps: a CU-level motion search is performed first, followed by sub-CU-level motion refinement. At the CU level, an initial motion vector is derived for the whole CU based on bilateral matching or template matching. First, a list of MV candidates is generated, and the candidate leading to the minimum matching cost is selected as the starting point for further CU-level refinement. Then a local search based on bilateral matching or template matching is performed around the starting point. The MV that results in the minimum matching cost is taken as the MV for the whole CU. Subsequently, the motion information is further refined at the sub-CU level, with the derived CU motion vector as the starting point.
For example, the following derivation process is performed for W×H CU motion information derivation. In the first stage, the MV of the whole W×H CU is derived. In the second stage, the CU is further split into M×M sub-CUs. The value of M is calculated as in (16), where D is a predefined splitting depth that is set to 3 by default in the JEM. The MV of each sub-CU is then derived.
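Assuming the referenced formula takes the common JEM form M = max(4, min(W, H)/2^D) (an assumption, since the equation itself is not reproduced in this text), the sub-CU size computation can be sketched as:

```python
def fruc_sub_cu_size(w, h, d=3):
    """Hypothetical reading of the formula: M = max(4, min(W, H) >> D), D = 3 by default."""
    return max(4, min(w, h) >> d)

print(fruc_sub_cu_size(64, 32))    # -> 4  (32 >> 3 = 4)
print(fruc_sub_cu_size(128, 128))  # -> 16 (128 >> 3 = 16)
```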
Figure 10 shows an example of bilateral matching used in the frame rate up-conversion (FRUC) method. Bilateral matching is used to derive the motion information of the current CU (1000) by finding the closest match between two blocks along the motion trajectory of the current CU in two different reference pictures (1010, 1011). Under the assumption of a continuous motion trajectory, the motion vectors MV0 (1001) and MV1 (1002) pointing to the two reference blocks are proportional to the temporal distances between the current picture and the two reference pictures, e.g., TD0 (1003) and TD1 (1004). In some embodiments, when the current picture 1000 is temporally between the two reference pictures (1010, 1011) and the temporal distances from the current picture to the two reference pictures are the same, the bilateral matching becomes a mirror-based bidirectional MV. Figure 11 shows an example of template matching used in the FRUC method. Template matching can be used to derive the motion information of the current CU 1100 by finding the closest match between a template in the current picture (e.g., the top and/or left neighboring blocks of the current CU) and a block in a reference picture 1110 (e.g., of the same size as the template). Apart from the FRUC merge mode described above, template matching can also be applied to the AMVP mode. In both the JEM and HEVC, AMVP has two candidates. With the template matching method, a new candidate can be derived. If the newly derived candidate from template matching is different from the first existing AMVP candidate, it is inserted at the very beginning of the AMVP candidate list, and the list size is then set to two (e.g., by removing the second existing AMVP candidate). When applied to the AMVP mode, only the CU-level search is applied.
The MV candidate set at the CU level can include: (1) the original AMVP candidates if the current CU is in AMVP mode, (2) all merge candidates, and (3) several MVs from the interpolated MV field (described later), together with the top and left neighboring motion vectors.
When bilateral matching is used, each valid MV of a merge candidate can be used as an input to generate an MV pair under the assumption of bilateral matching. For example, one valid MV of a merge candidate is (MVa, refa) in reference list A. Then the reference picture refb of its paired bilateral MV is found in the other reference list B, such that refa and refb are temporally on different sides of the current picture. If such a refb is not available in reference list B, refb is determined as a reference different from refa whose temporal distance to the current picture is the minimal one in list B. After refb is determined, MVb is derived by scaling MVa based on the temporal distances between the current picture and refa, refb.
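The scaling of MVa to MVb by the ratio of temporal distances can be sketched as below. The helper is hypothetical; integer MV units and Python floor division are used purely for illustration.

```python
def scale_mv_bilateral(mva, poc_cur, poc_refa, poc_refb):
    """Derive MVb by scaling MVa with the ratio of temporal distances td_b/td_a."""
    td_a = poc_cur - poc_refa
    td_b = poc_cur - poc_refb
    return (mva[0] * td_b // td_a, mva[1] * td_b // td_a)

# Mirrored case: refa and refb equidistant on opposite sides of the current picture,
# so MVb is simply the negation of MVa.
print(scale_mv_bilateral((4, -8), 8, 4, 12))  # -> (-4, 8)
```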
In some implementations, four MVs from the interpolated MV field can also be added to the CU-level candidate list. More specifically, the interpolated MVs at the positions (0, 0), (W/2, 0), (0, H/2), and (W/2, H/2) of the current CU are added. When FRUC is applied in AMVP mode, the original AMVP candidates are also added to the CU-level MV candidate set. In some implementations, at the CU level, up to 15 MVs for AMVP CUs and up to 13 MVs for merge CUs can be added to the candidate list.
The MV candidate set at the sub-CU level includes: (1) the MV determined from the CU-level search, (2) the top, left, top-left, and top-right neighboring MVs, (3) scaled versions of collocated MVs from the reference pictures, (4) one or more ATMVP candidates (e.g., up to four), and (5) one or more STMVP candidates (e.g., up to four). The scaled MVs from the reference pictures are derived as follows. The reference pictures in both lists are traversed, and the MV at the collocated position of the sub-CU in a reference picture is scaled to the reference of the starting CU-level MV. The ATMVP and STMVP candidates can be the first four. At the sub-CU level, one or more MVs (e.g., up to 17) are added to the candidate list.
Generation of the Interpolated MV Field
Before coding a frame, an interpolated motion field is generated for the whole picture based on unilateral ME. The motion field can then be used later as CU-level or sub-CU-level MV candidates.
In some embodiments, the motion field of each reference picture in both reference lists is traversed at the 4×4 block level. Figure 12 shows an example of unilateral motion estimation (ME) 1200 in the FRUC method. For each 4×4 block, if the motion associated with the block passes through a 4×4 block in the current picture (as shown in Figure 12) and that block has not been assigned any interpolated motion, the motion of the reference block is scaled to the current picture according to the temporal distances TD0 and TD1 (in the same way as the MV scaling of TMVP in HEVC), and the scaled motion is assigned to the block in the current frame. If no scaled MV is assigned to a 4×4 block, the block's motion is marked as unavailable in the interpolated motion field.
Interpolation and Matching Cost
When a motion vector points to a fractional sample position, motion-compensated interpolation is needed. To reduce complexity, bilinear interpolation can be used instead of the regular 8-tap HEVC interpolation for both bilateral matching and template matching.
The calculation of the matching cost differs slightly at the different steps. When a candidate is selected from the candidate set at the CU level, the matching cost can be the sum of absolute differences (SAD) of the bilateral matching or template matching. After the starting MV is determined, the matching cost C of the bilateral matching for the sub-CU-level search is calculated as follows:

C = SAD + w · (|MV_x − MV_x^s| + |MV_y − MV_y^s|)    Equation (14)
Here, w is a weighting factor. In some embodiments, w can be set empirically to 4. MV and MV^s denote the current MV and the starting MV, respectively. SAD can still be used as the matching cost of the template matching for the sub-CU-level search.
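Equation (14) with the empirical weight w = 4 can be sketched directly; `bilateral_cost` is a hypothetical helper name.

```python
def bilateral_cost(sad, mv, mv_start, w=4):
    """C = SAD + w * (|MVx - MVx_s| + |MVy - MVy_s|), per equation (14)."""
    return sad + w * (abs(mv[0] - mv_start[0]) + abs(mv[1] - mv_start[1]))

# SAD of 100, candidate MV (6, -2), starting MV (4, 0): penalty is 4 * (2 + 2) = 16.
print(bilateral_cost(100, (6, -2), (4, 0)))  # -> 116
```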
In FRUC mode, the MV is derived by using the luma samples only. The derived motion will be used for both luma and chroma in the MC inter prediction. After the MV is decided, the final MC is performed using an 8-tap interpolation filter for luma and a 4-tap interpolation filter for chroma.
MV refinement is a pattern-based MV search with the criterion of the bilateral matching cost or the template matching cost. In the JEM, two search patterns are supported: the unrestricted center-biased diamond search (UCBDS) and the adaptive cross search, for MV refinement at the CU level and sub-CU level, respectively. For both CU-level and sub-CU-level MV refinement, the MV is searched directly at quarter-luma-sample MV accuracy, followed by one-eighth-luma-sample MV refinement. The search range of the MV refinement for the CU and sub-CU steps is set equal to 8 luma samples.
In the bilateral matching merge mode, bi-prediction is applied, because the motion information of a CU is derived based on the closest match between two blocks along the motion trajectory of the current CU in two different reference pictures. In the template matching merge mode, the encoder can choose, for a CU, among uni-prediction from list0, uni-prediction from list1, and bi-prediction. The selection can be based on the template matching cost as follows: if costBi <= factor * min(cost0, cost1), bi-prediction is used; otherwise, if cost0 <= cost1, uni-prediction from list0 is used; otherwise, uni-prediction from list1 is used. Here, cost0 is the SAD of the list0 template matching, cost1 is the SAD of the list1 template matching, and costBi is the SAD of the bi-prediction template matching. For example, a factor value equal to 1.25 means the selection process is biased toward bi-prediction. The inter prediction direction selection can be applied to the CU-level template matching process.
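The three-way selection rule above can be sketched as a small decision function (a hypothetical helper, written for illustration):

```python
def select_inter_direction(cost0, cost1, cost_bi, factor=1.25):
    """Choose among bi-prediction and the two uni-predictions by template matching cost."""
    if cost_bi <= factor * min(cost0, cost1):
        return "bi"
    return "list0" if cost0 <= cost1 else "list1"

print(select_inter_direction(100, 120, 124))  # -> bi (124 <= 1.25 * 100)
print(select_inter_direction(100, 120, 130))  # -> list0
print(select_inter_direction(120, 100, 130))  # -> list1
```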
When the sub-block size is small, the sub-block-based prediction techniques discussed above can obtain more accurate motion information for each sub-block. However, smaller sub-blocks impose a higher bandwidth requirement in motion compensation. On the other hand, the motion information derived for smaller sub-blocks may be inaccurate, especially when there is some noise in a block. Therefore, having a fixed sub-block size within one block may be suboptimal.
This document describes techniques that can be used in various embodiments to address the bandwidth and accuracy problems introduced by fixed sub-block sizes by using non-uniform and/or variable sub-block sizes. These techniques, also called interleaved prediction, use different ways of dividing a block, so that motion information can be obtained more robustly without increasing the bandwidth consumption.
With the interleaved prediction techniques, a block is divided into sub-blocks with one or more dividing patterns. A dividing pattern represents the way a block is divided into sub-blocks, including the sizes of the sub-blocks and the positions of the sub-blocks. For each dividing pattern, a corresponding prediction block can be generated by deriving the motion information of each sub-block based on that dividing pattern. Therefore, in some embodiments, multiple prediction blocks can be generated by multiple dividing patterns even for one prediction direction. In some embodiments, only one dividing pattern may be applied for each prediction direction.
Figure 13 shows an example of interleaved prediction with two dividing patterns in accordance with the disclosed technology. A current block 1300 can be divided with multiple patterns. For example, as shown in Figure 13, the current block is divided with both pattern 0 (1301) and pattern 1 (1302). Two prediction blocks, P0 (1303) and P1 (1304), are generated. The final prediction block P (1305) of the current block 1300 can be generated by computing a weighted sum of P0 (1303) and P1 (1304).
More generally, given X dividing patterns, X prediction blocks of the current block, denoted P_0, P_1, ..., P_{X−1}, can be generated by sub-block-based prediction with the X dividing patterns. The final prediction of the current block, denoted P, can be generated as:

P(x, y) = ( Σ_{i=0}^{X−1} w_i(x, y) · P_i(x, y) ) / ( Σ_{i=0}^{X−1} w_i(x, y) )    Equation (15)
Here, (x, y) is the coordinate of a pixel in the block, and w_i(x, y) is the weight value of P_i. By way of example and not limitation, the weights can be expressed as:

Σ_{i=0}^{X−1} w_i(x, y) = 1 << N    Equation (16)
where N is a non-negative value. Alternatively, the bit-shift operation in equation (16) can also be expressed as:

Σ_{i=0}^{X−1} w_i(x, y) = 2^N    Equation (17)
Making the sum of the weights a power of two allows the weighted sum P to be computed more efficiently by performing a bit-shift operation instead of a floating-point division.
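A minimal sketch of the weighted combination, assuming per-pixel weights that sum to 2^N at every position so the division in equation (15) reduces to a right shift; plain nested lists stand in for sample buffers.

```python
def interleave_predictions(preds, weights, n):
    """P(x, y) = (sum_i w_i(x, y) * P_i(x, y)) >> N, with sum_i w_i(x, y) = 2^N."""
    h, w = len(preds[0]), len(preds[0][0])
    return [[sum(wt[y][x] * p[y][x] for p, wt in zip(preds, weights)) >> n
             for x in range(w)] for y in range(h)]

p0 = [[100, 104], [96, 100]]   # prediction from dividing pattern 0
p1 = [[108, 104], [104, 96]]   # prediction from dividing pattern 1
w0 = [[3, 3], [3, 3]]
w1 = [[1, 1], [1, 1]]          # w0 + w1 = 4 = 2^2 at every pixel, so N = 2
print(interleave_predictions([p0, p1], [w0, w1], 2))  # -> [[102, 104], [98, 99]]
```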
The dividing patterns can have different shapes, sizes, or positions of the sub-blocks. In some embodiments, a dividing pattern may include irregular sub-block sizes. Figures 14A-14G show several examples of dividing patterns for a 16×16 block. In Figure 14A, a block is divided into 4×4 sub-blocks in accordance with the disclosed technology; this pattern is also used in the JEM. Figure 14B shows an example of dividing a block into 8×8 sub-blocks in accordance with the disclosed technology. Figure 14C shows an example of dividing a block into 8×4 sub-blocks in accordance with the disclosed technology. Figure 14D shows an example of dividing a block into 4×8 sub-blocks in accordance with the disclosed technology. In Figure 14E, a portion of the block is divided into 4×4 sub-blocks in accordance with the disclosed technology, while the pixels at the block boundaries are divided into smaller sub-blocks with sizes such as 2×4, 4×2, or 2×2. Some sub-blocks can be merged to form larger sub-blocks. Figure 14F shows an example in which adjacent sub-blocks, such as 4×4 sub-blocks and 2×4 sub-blocks, are merged to form larger sub-blocks with sizes such as 6×4, 4×6, or 6×6. In Figure 14G, a portion of the block is divided into 8×8 sub-blocks, while the pixels at the block boundaries are divided into smaller sub-blocks with sizes such as 8×4, 4×8, or 4×4.
The shapes and sizes of the sub-blocks in sub-block-based prediction can be determined based on the shape and/or size of the coding block and/or coding block information. For example, in some embodiments, when the current block has a size of M×N, a sub-block has a size of 4×N (or 8×N, etc.); that is, the sub-block has the same height as the current block. In some embodiments, when the current block has a size of M×N, a sub-block has a size of M×4 (or M×8, etc.); that is, the sub-block has the same width as the current block. In some embodiments, when the current block has a size of M×N with M > N, a sub-block has a size of A×B with A > B (e.g., 8×4); alternatively, the sub-block can have a size of B×A (e.g., 4×8).
In some embodiments, the current block has a size of M×N. A sub-block has a size of A×B when M×N <= T (or Min(M, N) <= T, or Max(M, N) <= T, etc.), and a sub-block has a size of C×D when M×N > T (or Min(M, N) > T, or Max(M, N) > T, etc.), where A <= C and B <= D. For example, if M×N <= 256, a sub-block can have a size of 4×4. In some implementations, the sub-blocks have a size of 8×8.
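One of the threshold rules above, the M×N <= T variant with T = 256 and the 4×4/8×8 example sizes, can be sketched as:

```python
def sub_block_size(m, n, t=256):
    """4x4 sub-blocks for small blocks (M*N <= T), 8x8 sub-blocks otherwise."""
    return (4, 4) if m * n <= t else (8, 8)

print(sub_block_size(16, 16))  # -> (4, 4)  (256 <= 256)
print(sub_block_size(16, 32))  # -> (8, 8)
```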
In some embodiments, whether to apply interleaved prediction can be determined based on the inter prediction direction. For example, in some embodiments, interleaved prediction can be applied to bi-prediction but not to uni-prediction. As another example, when multiple hypotheses are applied, interleaved prediction can be applied to one prediction direction when there is more than one reference block.
In some embodiments, how to apply interleaved prediction can also be determined based on the inter prediction direction. In some embodiments, a bi-predicted block with sub-block-based prediction is divided into sub-blocks with two different dividing patterns for the two different reference lists. For example, the bi-predicted block is divided into 4×8 sub-blocks, as shown in Figure 14D, when predicted from reference list 0 (L0), and the same block is divided into 8×4 sub-blocks, as shown in Figure 14C, when predicted from reference list 1 (L1). The final prediction P is calculated as:

P(x, y) = ( w_0(x, y) · P_0(x, y) + w_1(x, y) · P_1(x, y) ) >> N
Here, P_0 and P_1 are the predictions from L0 and L1, respectively, and w_0 and w_1 are the weight values for L0 and L1, respectively. As shown in equation (16), the weight values can be determined as w_0(x, y) + w_1(x, y) = 1 << N (where N is a non-negative integer value). Because fewer sub-blocks are used for prediction in each direction (e.g., 4×8 sub-blocks as compared to 8×8 sub-blocks), the computation needs less bandwidth than existing sub-block-based methods. By using larger sub-blocks, the prediction results are also less susceptible to noise interference.
In some embodiments, a uni-predicted block with sub-block-based prediction is divided into sub-blocks with two or more different dividing patterns for the same reference list. For example, the prediction P^L for list L (L = 0 or 1) is calculated as:

P^L(x, y) = ( Σ_{i=0}^{X_L−1} w_i^L(x, y) · P_i^L(x, y) ) >> N

Here, X_L is the number of dividing patterns for list L, P_i^L(x, y) is the prediction generated with the i-th dividing pattern, and w_i^L(x, y) is the weight value of P_i^L(x, y). For example, when X_L is 2, two dividing patterns are applied for list L: the block is divided into 4×8 sub-blocks, as shown in Figure 14D, in the first dividing pattern, and into 8×4 sub-blocks, as shown in Figure 14C, in the second dividing pattern.
In one embodiment, a bi-predicted block with sub-block-based prediction is considered a combination of two uni-predicted blocks from L0 and L1, respectively. The prediction from each list can be derived as described in the above example. The final prediction P can be calculated as:

P(x, y) = ( a · P^0(x, y) + b · P^1(x, y) ) / (a + b)
Here, parameters a and b are two additional weights applied to the two intermediate prediction blocks. In this specific example, a and b can both be set to 1. Similar to the example above, because fewer sub-blocks are used for prediction in each direction (e.g., 4×8 sub-blocks as compared to 8×8 sub-blocks), the bandwidth usage is better than or on par with existing sub-block-based methods. At the same time, the prediction results can be improved by using larger sub-blocks.
In some embodiments, a single non-uniform pattern can be used in each uni-predicted block. For example, for each list L (e.g., L0 or L1), the block is divided into a different pattern (e.g., as shown in Figure 14E or Figure 14F). Using a smaller number of sub-blocks reduces the demand on bandwidth. The non-uniformity of the sub-blocks also increases the robustness of the prediction results.
In some embodiments, for a multi-hypothesis coded block, there can be more than one prediction block generated by different dividing patterns for each prediction direction (or reference picture list). The multiple prediction blocks can be used to generate the final prediction with additional weights applied. For example, the additional weight can be set to 1/M, where M is the total number of generated prediction blocks.
In some embodiments, the encoder can determine whether and how to apply interleaved prediction. The encoder can then send information corresponding to this determination to the decoder at the sequence level, picture level, view level, slice level, coding tree unit (CTU) (also known as largest coding unit (LCU)) level, CU level, PU level, TU level, or region level (which may include multiple CUs/PUs/TUs/LCUs). The information can be signaled in the sequence parameter set (SPS), view parameter set (VPS), picture parameter set (PPS), slice header (SH), CTU/LCU, CU, PU, TU, or the first block of a region.
In some implementations, interleaved prediction applies to existing sub-block methods such as affine prediction, ATMVP, STMVP, FRUC, or BIO. In this case, no additional signaling cost is needed. In some implementations, new sub-block merge candidates generated by interleaved prediction can be inserted into the merge list, e.g., interleaved prediction + ATMVP, interleaved prediction + STMVP, interleaved prediction + FRUC, and so on.
In some embodiments, the dividing patterns to be used by the current block can be derived based on information from spatial and/or temporal neighboring blocks. For example, instead of relying on the encoder to signal the relevant information, both the encoder and the decoder can adopt a set of predetermined rules to obtain the dividing patterns based on temporal adjacency (e.g., the dividing patterns previously used by the same block) or spatial adjacency (e.g., the dividing patterns used by neighboring blocks).
In some embodiments, the weight value w can be fixed. For example, all dividing patterns can be weighted equally: w_i(x, y) = 1. In some embodiments, the weight values can be determined based on the positions within the block as well as the dividing patterns used; for example, w_i(x, y) can differ for different (x, y). In some embodiments, the weight values can further depend on the sub-block-prediction-based coding technique (e.g., affine or ATMVP) and/or other coded information (e.g., skip or non-skip mode, and/or MV information). In some embodiments, the encoder can determine the weight values and transmit them to the decoder at the sequence level, picture level, slice level, CTU/LCU level, CU level, PU level, or region level (a region may include multiple CUs/PUs/TUs/LCUs). The weight values may be signaled in the sequence parameter set (SPS), the picture parameter set (PPS), the slice header (SH), the CTU/LCU, the CU, the PU, or the first block of a region. In some embodiments, the weight values can be derived from the weight values of spatially and/or temporally neighboring blocks.
Note that the interleaved prediction techniques disclosed herein can be applied to one, some, or all of the sub-block-based prediction coding techniques. For example, interleaved prediction can be applied to affine prediction while other sub-block-based coding techniques (e.g., ATMVP, STMVP, FRUC, or BIO) do not use interleaved prediction. As another example, affine, ATMVP, and STMVP can all apply the interleaved prediction techniques disclosed herein.
FIG. 15A is an example flowchart of a method 1500 for improving motion prediction in a video system in accordance with the disclosed technology. The method 1500 includes, at 1502, selecting a set of pixels from a video frame to form a block. The method 1500 includes, at 1504, dividing the block into a first set of sub-blocks according to a first pattern. The method 1500 includes, at 1506, generating a first intermediate prediction block based on the first set of sub-blocks. The method 1500 includes, at 1508, dividing the block into a second set of sub-blocks according to a second pattern, wherein at least one sub-block in the second set has a size that is different from the sizes of the sub-blocks in the first set. The method 1500 includes, at 1510, generating a second intermediate prediction block based on the second set of sub-blocks. The method 1500 also includes, at 1512, determining a prediction block based on the first intermediate prediction block and the second intermediate prediction block.
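For illustration only, the flow of method 1500 (operations 1502-1512) can be sketched in Python as follows. The per-sub-block predictor below is a stand-in that simply fills each sub-block with a distinct value, and the 4×4 sub-block size and the 2-sample stagger of the second pattern are assumptions for the demonstration; a real encoder would run affine, ATMVP, or another sub-block predictor at these steps.

```python
def divide(width, height, sub_w, sub_h, off_x=0, off_y=0):
    """Return sub-block rectangles (x, y, w, h) covering a width x height
    block. off_x/off_y stagger the grid to form a second dividing pattern,
    clipping partial sub-blocks at the block borders."""
    xs = sorted({0, width, *range(off_x % sub_w, width, sub_w)})
    ys = sorted({0, height, *range(off_y % sub_h, height, sub_h)})
    return [(x0, y0, x1 - x0, y1 - y0)
            for y0, y1 in zip(ys, ys[1:])
            for x0, x1 in zip(xs, xs[1:])]

def sub_block_prediction(rects, width, height, base):
    """Stand-in for a sub-block predictor: one constant value per sub-block."""
    pred = [[0] * width for _ in range(height)]
    for i, (x, y, w, h) in enumerate(rects):
        for yy in range(y, y + h):
            for xx in range(x, x + w):
                pred[yy][xx] = base + i
    return pred

def interleaved_prediction(width, height):
    # 1504/1506: first dividing pattern -> first intermediate prediction
    p0 = sub_block_prediction(divide(width, height, 4, 4), width, height, 10)
    # 1508/1510: staggered second pattern (border sub-blocks differ in size)
    p1 = sub_block_prediction(divide(width, height, 4, 4, 2, 2), width, height, 20)
    # 1512: final prediction as an equal-weight, rounded combination of P0 and P1
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)]
            for r0, r1 in zip(p0, p1)]
```

For an 8×8 block, the first pattern yields four 4×4 sub-blocks and the staggered pattern yields nine sub-blocks, the border ones being smaller, which matches the requirement that at least one sub-block in the second set has a different size.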
In some embodiments, at least one of (1) an affine prediction method, (2) an alternative temporal motion vector prediction method, (3) a spatial-temporal motion vector prediction method, (4) a bi-directional optical flow method, or (5) a frame-rate up-conversion method is used to generate the first intermediate prediction block or the second intermediate prediction block.
In some embodiments, the sub-blocks in the first set or the second set have rectangular shapes. In some embodiments, the sub-blocks in the first set of sub-blocks have non-uniform shapes. In some embodiments, the sub-blocks in the second set of sub-blocks have non-uniform shapes.
In some embodiments, the method includes determining the first pattern or the second pattern based on a size of the block. In some embodiments, the method includes determining the first pattern or the second pattern based on information from a second block that is temporally or spatially adjacent to the block.
In some embodiments, dividing the block into the first set of sub-blocks is performed for motion prediction of the block in a first direction. In some embodiments, dividing the block into the second set of sub-blocks is performed for motion prediction of the block in a second direction.
In some embodiments, dividing the block into the first set of sub-blocks and dividing the block into the second set of sub-blocks are both performed for motion prediction of the block in a first direction. In some embodiments, the method further includes performing motion prediction of the block in a second direction by: dividing the block into a third set of sub-blocks according to a third pattern; generating a third intermediate prediction block based on the third set of sub-blocks; dividing the block into a fourth set of sub-blocks according to a fourth pattern, wherein at least one sub-block in the fourth set has a size that is different from the sizes of the sub-blocks in the third set; generating a fourth intermediate prediction block based on the fourth set of sub-blocks; determining a second prediction block based on the third intermediate prediction block and the fourth intermediate prediction block; and determining a third prediction block based on the prediction block and the second prediction block.
In some embodiments, the method includes transmitting information about the first pattern and the second pattern used for dividing the block to a coding device in the block-based motion prediction video system. In some embodiments, the information about the first pattern and the second pattern is transmitted at one of: (1) a sequence level, (2) a picture level, (3) a view level, (4) a slice level, (5) a coding tree unit level, (6) a largest coding unit level, (7) a coding unit level, (8) a prediction unit level, (10) a tree unit level, or (11) a region level.
In some embodiments, determining the prediction result includes applying a first set of weights to the first intermediate prediction block to obtain a first weighted prediction block, applying a second set of weights to the second intermediate prediction block to obtain a second weighted prediction block, and computing a weighted sum of the first weighted prediction block and the second weighted prediction block to obtain the prediction block.
In some embodiments, the first set of weights or the second set of weights includes fixed weight values. In some embodiments, the first set of weights or the second set of weights is determined based on information from another block that is temporally or spatially adjacent to the block. In some embodiments, the first set of weights or the second set of weights is determined using the coding algorithm used for generating the first prediction block or the second prediction block. In some implementations, at least one value in the first set of weights is different from another value in the first set of weights. In some implementations, at least one value in the second set of weights is different from another value in the second set of weights. In some implementations, the sum of the weights is equal to a power of two.
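A power-of-two weight sum matters in practice because the normalization of the weighted sum becomes a right shift rather than a division. A minimal sketch, where the weight pair and the rounding offset are chosen purely for illustration and are not taken from this disclosure:

```python
def combine_rows(p0, p1, w0, w1):
    """Weighted sum of two intermediate prediction rows, normalized with a
    bit shift; requires w0 + w1 to be a power of two (e.g. 1+3 or 2+2)."""
    total = w0 + w1
    assert total > 0 and total & (total - 1) == 0, "w0 + w1 must be a power of two"
    shift = total.bit_length() - 1           # e.g. total 4 -> shift 2
    offset = 1 << (shift - 1) if shift else 0  # rounding offset before the shift
    return [(w0 * a + w1 * b + offset) >> shift for a, b in zip(p0, p1)]
```

For example, with weights (1, 3) the samples 100 and 200 combine to (100 + 600 + 2) >> 2 = 175, entirely in integer arithmetic.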
In some embodiments, the method includes transmitting the weights to a coding device in the block-based motion prediction video system. In some embodiments, the weights are transmitted at one of: (1) a sequence level, (2) a picture level, (3) a view level, (4) a slice level, (5) a coding tree unit level, (6) a largest coding unit level, (7) a coding unit level, (8) a prediction unit level, (10) a tree unit level, or (11) a region level.
FIG. 15B is an example flowchart of a method 1550 for improving block-based motion prediction in a video system in accordance with the disclosed technology. The method 1550 includes, at 1552, selecting a set of pixels from a video frame to form a block. The method 1550 includes, at 1554, dividing the block into multiple sub-blocks based on a size of the block or information from another block that is spatially or temporally adjacent to the block, wherein at least one sub-block of the multiple sub-blocks has a size that is different from the sizes of the other sub-blocks. The method 1550 also includes, at 1556, generating motion vector predictions by applying a coding algorithm to the multiple sub-blocks. In some embodiments, the coding algorithm includes at least one of (1) an affine prediction method, (2) an alternative temporal motion vector prediction method, (3) a spatial-temporal motion vector prediction method, (4) a bi-directional optical flow method, or (5) a frame-rate up-conversion method.
In methods 1500 and 1550, partial interleaving can be implemented. Using this scheme, the samples in a first subset of the prediction samples are computed as a weighted combination of the intermediate prediction blocks, while the samples in a second subset of the prediction samples are copied from a sub-block-based prediction, where the first subset and the second subset are based on the dividing patterns. The first subset and the second subset together can constitute the entire prediction block, e.g., the block currently being processed. As shown in FIGS. 18A-18C, in various examples the second subset, which is excluded from interleaving, can consist of (a) the corner sub-blocks, (b) the top-most and bottom-most rows of sub-blocks, or (c) the left-most or right-most columns of sub-blocks. The size of the block currently being processed can be used as a condition for deciding whether to exclude certain sub-blocks from interleaved prediction, for example, using the conditions described below.
As described further in this document, the encoding process can avoid checking the affine mode for blocks split from a parent block when the parent block itself is coded with a mode other than the affine mode.
FIG. 16 is a block diagram of an example of the architecture of a computer system or other control device 1600 that can be utilized to implement various portions of the disclosed technology. In FIG. 16, the computer system 1600 includes one or more processors 1605 and memory 1610 connected via an interconnect 1625. The interconnect 1625 may represent any one or more separate physical buses, point-to-point connections, or both, connected by appropriate bridges, adapters, or controllers. The interconnect 1625 may therefore include, for example, a system bus, a Peripheral Component Interconnect (PCI) bus, a HyperTransport or Industry Standard Architecture (ISA) bus, a Small Computer System Interface (SCSI) bus, a Universal Serial Bus (USB), an IIC (I2C) bus, or an Institute of Electrical and Electronics Engineers (IEEE) standard 674 bus, sometimes referred to as "Firewire".
The processor(s) 1605 may include a central processing unit (CPU) to control, for example, the overall operation of the host computer. In certain embodiments, the processor(s) 1605 accomplish this by executing software or firmware stored in the memory 1610. The processor(s) 1605 may be, or may include, one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application-specific integrated circuits (ASICs), programmable logic devices (PLDs), or the like, or a combination of such devices.
The memory 1610 can be or include the main memory of the computer system. The memory 1610 represents any suitable form of random access memory (RAM), read-only memory (ROM), flash memory, or the like, or a combination of such devices. In use, the memory 1610 may contain, among other things, a set of machine instructions which, when executed by the processor(s) 1605, causes the processor(s) 1605 to perform operations to implement embodiments of the disclosed technology.
Also connected to the processor(s) 1605 through the interconnect 1625 is an (optional) network adapter 1615. The network adapter 1615 provides the computer system 1600 with the ability to communicate with remote devices, such as storage clients and/or other storage servers, and may be, for example, an Ethernet adapter or a Fibre Channel adapter.
FIG. 17 shows a block diagram of an example embodiment of a mobile device 1700 that can be utilized to implement various portions of the disclosed technology. The mobile device 1700 can be a laptop, a smartphone, a tablet, a camcorder, or another type of device that is capable of processing video. The mobile device 1700 includes a processor or controller 1701 to process data, and memory 1702 in communication with the processor 1701 to store and/or buffer data. For example, the processor 1701 can include a central processing unit (CPU) or a microcontroller unit (MCU). In some implementations, the processor 1701 can include a field-programmable gate array (FPGA). In some implementations, the mobile device 1700 includes or is in communication with a graphics processing unit (GPU), a video processing unit (VPU), and/or a wireless communications unit for various visual and/or communications data-processing functions of the smartphone device. For example, the memory 1702 can include and store processor-executable code which, when executed by the processor 1701, configures the mobile device 1700 to perform various operations, such as receiving information, commands, and/or data, processing information and data, and transmitting or providing the processed information/data to another device, such as an actuator or an external display. To support various functions of the mobile device 1700, the memory 1702 can store information and data, such as instructions, software, values, images, and other data processed or referenced by the processor 1701. For example, various types of random access memory (RAM) devices, read-only memory (ROM) devices, flash memory devices, and other suitable storage media can be used to implement the storage functions of the memory 1702. In some implementations, the mobile device 1700 includes an input/output (I/O) unit 1703 to interface the processor 1701 and/or the memory 1702 with other modules, units, or devices. For example, the I/O unit 1703 can interface the processor 1701 and the memory 1702 with various types of wireless interfaces compatible with typical data communication standards, for example, between one or more computers in the cloud and a user device. In some implementations, the mobile device 1700 can interface with other devices using a wired connection via the I/O unit 1703. The mobile device 1700 can also interface with other external interfaces, such as data storage and/or a visual or audio display device 1704, to retrieve and transfer data and information that can be processed by the processor, stored in the memory, or exhibited on an output unit of the display device 1704 or an external device. For example, the display device 1704 can display a video frame modified based on the MVPs in accordance with the disclosed technology (e.g., a video frame that includes the prediction block 1305, as shown in FIG. 13).
In some embodiments, a video decoder apparatus may implement a method of video decoding in which the improved block-based motion prediction described herein is used for video decoding. The method may include forming a block of video using a set of pixels from a video frame. The block may be divided into a first set of sub-blocks according to a first pattern. A first intermediate prediction block may correspond to the first set of sub-blocks. The block may include a second set of sub-blocks according to a second pattern, wherein at least one sub-block in the second set has a size that is different from the sizes of the sub-blocks in the first set. The method may further determine a prediction block based on the first intermediate prediction block and a second intermediate prediction block that is generated from the second set of sub-blocks. Other features of this method may be similar to the above-described method 1500.
In some embodiments, a decoder-side method of video decoding may use block-based motion prediction to improve video quality by using blocks of a video frame for prediction, where a block corresponds to a set of pixels. The block may be divided into multiple sub-blocks based on a size of the block or information from another block that is spatially or temporally adjacent to the block, wherein at least one sub-block of the multiple sub-blocks has a size that is different from the sizes of the other sub-blocks. The decoder may use motion vector predictions that are generated by applying a coding algorithm to the multiple sub-blocks. Other features of this method are described with respect to FIG. 15B and the corresponding description.
In some embodiments, the video decoding methods may be implemented using a decoding apparatus that is implemented on a hardware platform as described with respect to FIG. 16 and FIG. 17.
Partial interleaving
In some embodiments, partial interleaved prediction may be implemented as follows.
In some embodiments, interleaved prediction is applied to a part of the current block. Prediction samples at some positions are calculated as a weighted sum of two or more sub-block-based predictions. Prediction samples at other positions are not used for the weighted sum; for example, these prediction samples are copied from a sub-block-based prediction with a certain dividing pattern.
In some embodiments, the current block is predicted by sub-block-based predictions P0 and P1 with dividing pattern D0 and dividing pattern D1, respectively. The final prediction is calculated as P = w0 × P0 + w1 × P1. At some positions, w0 ≠ 0 and w1 ≠ 0; but at some other positions, w0 = 1 and w1 = 0, i.e., interleaved prediction is not applied at those positions.
In some embodiments, interleaved prediction is not applied to the four corner sub-blocks, as shown in FIG. 18A.
In some embodiments, interleaved prediction is not applied to the left-most column of sub-blocks and the right-most column of sub-blocks, as shown in FIG. 18B.
In some embodiments, interleaved prediction is not applied to the top-most row of sub-blocks and the bottom-most row of sub-blocks, as shown in FIG. 18C.
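The three exclusion patterns of FIGS. 18A-18C can be viewed as a per-sub-block flag map in which excluded positions take w0 = 1 and w1 = 0. The sketch below is illustrative only; the mode names and the grid dimensions are assumptions made for the demonstration, not terminology from this disclosure:

```python
def interleaving_mask(rows, cols, mode):
    """Per-sub-block flag over a rows x cols sub-block grid. True means the
    weighted sum applies; False means the sample is copied from a single
    sub-block-based prediction (w0 = 1, w1 = 0)."""
    mask = [[True] * cols for _ in range(rows)]
    if mode == "corners":        # FIG. 18A: four corner sub-blocks excluded
        for r in (0, rows - 1):
            for c in (0, cols - 1):
                mask[r][c] = False
    elif mode == "columns":      # FIG. 18B: left-most and right-most columns
        for r in range(rows):
            mask[r][0] = mask[r][cols - 1] = False
    elif mode == "rows":         # FIG. 18C: top-most and bottom-most rows
        for c in range(cols):
            mask[0][c] = mask[rows - 1][c] = False
    return mask
```

On a 4×4 sub-block grid, the corner mode excludes 4 sub-blocks, while the column and row modes each exclude 8.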
Examples of techniques incorporated within encoder embodiments
In some embodiments, interleaved prediction is not applied in the motion estimation (ME) process.
For example, for 6-parameter affine prediction, interleaved prediction is not applied in the ME process.
For example, if the size of the current block satisfies certain conditions such as the following, interleaved prediction is not applied in the ME process. Here, suppose the width and height of the current block are W and H, respectively, and T, T1, and T2 are integer values:

W >= T1 and H >= T2;

W <= T1 and H <= T2;

W >= T1 or H >= T2;

W <= T1 or H <= T2;

W + H >= T;

W + H <= T;

W × H >= T;

W × H <= T.
For example, if the current block is split from a parent block, and the parent block does not select the affine mode at the encoder, interleaved prediction is omitted in the ME process.
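The size conditions and the parent-block rule above can be folded into a single encoder-side predicate. The sketch below combines two of the listed size conditions with illustrative thresholds (T1 = T2 = 8, T = 64); both the choice of conditions and the threshold values are assumptions for the example, since an actual encoder would select one condition and its own thresholds:

```python
def skip_interleaved_me(w, h, parent_chose_affine, t1=8, t2=8, t=64):
    """Return True when interleaved prediction should be omitted during
    motion estimation for a w x h block."""
    if not parent_chose_affine:   # parent block was not coded in affine mode
        return True
    if w <= t1 and h <= t2:       # the "W <= T1 and H <= T2" condition
        return True
    if w + h <= t:                # the "W + H <= T" condition
        return True
    return False
```

With these hypothetical thresholds, a 4×4 block (or any block split from a non-affine parent) skips interleaved prediction in ME, while a 128×64 block under an affine-coded parent does not.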
FIG. 19 is a flowchart representation of a method 1900 for improving block-based motion prediction in a video system. The method 1900 includes, at operation 1902, determining, based on a characteristic of a video block, a prediction block for the video block using a first intermediate prediction block and a second intermediate prediction block. The method 1900 includes, at operation 1904, generating a coded representation of the video block using the prediction block. The first intermediate prediction block is generated by dividing the video block into a first set of sub-blocks, and the second intermediate prediction block is generated by dividing the video block into a second set of sub-blocks. At least one sub-block in the second set has a size that is different from the sizes of the sub-blocks in the first set.
In some embodiments, the determining is performed during a motion estimation stage of the encoding process. In some embodiments, the characteristic of the video block indicates that the prediction block is not determined based on affine prediction.
In some embodiments, the block has a width of W and a height of H, and the characteristic of the video block indicates that the prediction block is determined because W and H fail to satisfy one or more conditions. In some embodiments, the one or more conditions include W >= T1 and H >= T2, T1 and T2 being predetermined integer values. In some embodiments, the one or more conditions include W <= T1 and H <= T2, T1 and T2 being predetermined integer values. In some embodiments, the one or more conditions include W >= T1 or H >= T2, T1 and T2 being predetermined integer values. In some embodiments, the one or more conditions include W <= T1 or H <= T2, T1 and T2 being predetermined integer values. In some embodiments, the one or more conditions include W + H >= T, T being a predetermined integer value. In some embodiments, the one or more conditions include W + H <= T, T being a predetermined integer value. In some embodiments, the one or more conditions include W × H >= T, T being a predetermined integer value. In some embodiments, the one or more conditions include W × H <= T, T being a predetermined integer value.
In some embodiments, the characteristic of the video block indicates that the video block is not split from a parent block. In some embodiments, the characteristic of the video block indicates that the block is split from an affine-coded parent block.
Alternatively, if the current block is split from a parent block, and the parent block does not select the affine mode at the encoder, the affine mode is not checked for the current block at the encoder.
FIG. 20 is a block diagram illustrating an example video processing system 2000 in which various techniques disclosed herein may be implemented. Various implementations may include some or all of the components of the system 2000. The system 2000 may include an input 2002 for receiving video content. The video content may be received in a raw or uncompressed format, e.g., 8- or 10-bit multi-component pixel values, or may be in a compressed or encoded format. The input 2002 may represent a network interface, a peripheral bus interface, or a storage interface. Examples of network interfaces include wired interfaces such as Ethernet, passive optical network (PON), etc., and wireless interfaces such as Wi-Fi or cellular interfaces.
The system 2000 may include a coding component 2004 that may implement the various coding or encoding methods described in this document. The coding component 2004 may reduce the average bitrate of the video from the input 2002 to the output of the coding component 2004 to produce a coded representation of the video. The coding techniques are therefore sometimes called video compression or video transcoding techniques. The output of the coding component 2004 may be either stored or transmitted via a connected communication, as represented by the component 2006. The stored or communicated bitstream (or coded) representation of the video received at the input 2002 may be used by the component 2008 for generating pixel values or displayable video that is sent to a display interface 2010. The process of generating user-viewable video from the bitstream representation is sometimes called video decompression. Furthermore, while certain video processing operations are referred to as "coding" operations or tools, it will be appreciated that the coding tools or operations are used at an encoder, and corresponding decoding tools or operations that reverse the results of the coding will be performed by a decoder.
Examples of a peripheral bus interface or a display interface may include a universal serial bus (USB), a high-definition multimedia interface (HDMI), DisplayPort, and so on. Examples of storage interfaces include SATA (serial advanced technology attachment), PCI, IDE interfaces, and the like. The techniques described in this document may be embodied in various electronic devices such as mobile phones, laptops, smartphones, or other devices that are capable of performing digital data processing and/or video display.
It will be appreciated from the foregoing that specific embodiments of the disclosed technology have been described herein for purposes of illustration, but that various modifications may be made without deviating from the scope of the invention. Accordingly, the presently disclosed technology is not limited except as by the appended claims.
The disclosed and other embodiments, modules, and functional operations described in this document can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them. The disclosed and other embodiments can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, a data processing apparatus. The computer-readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to a suitable receiver apparatus.
A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this document can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical, or optical disks. However, a computer need not have such devices. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.
Only a few implementations and examples are described, and other implementations, enhancements, and variations can be made based on what is described and illustrated in this patent document.
1500: method
1502: selecting a set of pixels from a video frame to form a block
1504: dividing the block into a first set of sub-blocks according to a first pattern
1506: generating a first intermediate prediction block based on the first set of sub-blocks
1508: dividing the block into a second set of sub-blocks according to a second pattern
1510: generating a second intermediate prediction block based on the second set of sub-blocks
1512: determining a prediction block based on the first intermediate prediction block and the second intermediate prediction block
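The divide-predict-combine flow of steps 1502-1512 can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented encoder: the 4x4 sub-block size, the 2-sample grid offset used for the second dividing pattern, the mean-fill stand-in for per-sub-block motion compensation, and the equal averaging weights are all assumptions made for demonstration.

```python
def subdivide(width, height, size, offset=0):
    """Return sub-block rectangles (x, y, w, h) covering a width x height block.

    A nonzero offset shifts the interior grid lines, producing the second,
    interweaved dividing pattern; boundary sub-blocks may then be smaller.
    """
    def cuts(extent):
        edges = [0] + list(range(offset or size, extent, size)) + [extent]
        edges = sorted(set(edges))          # dedupe, keep ascending order
        return list(zip(edges, edges[1:]))  # consecutive (start, end) pairs
    return [(x0, y0, x1 - x0, y1 - y0)
            for (y0, y1) in cuts(height)
            for (x0, x1) in cuts(width)]

def predict(block, rects):
    """Stub sub-block prediction: fill each sub-block with its mean value.

    A real encoder would instead motion-compensate each sub-block with its
    own motion vector (e.g. derived from an affine motion model).
    """
    h, w = len(block), len(block[0])
    out = [[0.0] * w for _ in range(h)]
    for (x, y, bw, bh) in rects:
        vals = [block[y + j][x + i] for j in range(bh) for i in range(bw)]
        mean = sum(vals) / len(vals)
        for j in range(bh):
            for i in range(bw):
                out[y + j][x + i] = mean
    return out

def interweaved_prediction(block, size=4):
    h, w = len(block), len(block[0])
    p1 = predict(block, subdivide(w, h, size))             # steps 1504-1506
    p2 = predict(block, subdivide(w, h, size, offset=2))   # steps 1508-1510
    # Step 1512: combine the two intermediate predictions (equal weights
    # here; the patent family also describes position-dependent weights).
    return [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(p1, p2)]
```

Because each sub-block of the second, shifted pattern straddles the internal boundaries of the first grid, averaging the two intermediate predictions smooths the blocking artifacts that a single sub-block grid would leave at those boundaries.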
Claims (7)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2018103770 | 2018-09-03 | ||
WOPCT/CN2018/103770 | 2018-09-03 |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202027501A (en) | 2020-07-16 |
TWI833795B (en) | 2024-03-01 |
Family
ID=67928865
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW108131752A (TWI833795B) | Fast encoding methods for interweaved prediction | 2018-09-03 | 2019-09-03 |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN110876063B (en) |
TW (1) | TWI833795B (en) |
WO (1) | WO2020049447A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11025951B2 (en) * | 2019-01-13 | 2021-06-01 | Tencent America LLC | Method and apparatus for video coding |
US11805245B2 (en) * | 2021-08-16 | 2023-10-31 | Mediatek Inc. | Latency reduction for reordering prediction candidates |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110122942A1 (en) * | 2009-11-20 | 2011-05-26 | Texas Instruments Incorporated | Techniques for perceptual encoding of video frames |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101491107B (en) * | 2006-07-07 | 2012-07-18 | Telefonaktiebolaget LM Ericsson | Method for decoding image element group, related encoder and decoder |
EP2136564A1 (en) * | 2007-01-09 | 2009-12-23 | Kabushiki Kaisha Toshiba | Image encoding and decoding method and device |
KR101408698B1 (en) * | 2007-07-31 | 2014-06-18 | Samsung Electronics Co., Ltd. | Method and apparatus for encoding/decoding image using weighted prediction |
CN101252686B (en) * | 2008-03-20 | 2010-04-14 | Shanghai Jiao Tong University | Lossless encoding and decoding method and system based on interleaved prediction |
CN108109629A (en) * | 2016-11-18 | 2018-06-01 | Nanjing University | Multiple-description speech coding method and system based on linear prediction residual classification and quantization |
2019
- 2019-09-03 WO PCT/IB2019/057400 patent/WO2020049447A1/en active Application Filing
- 2019-09-03 TW TW108131752A patent/TWI833795B/en active
- 2019-09-03 CN CN201910828199.XA patent/CN110876063B/en active Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110122942A1 (en) * | 2009-11-20 | 2011-05-26 | Texas Instruments Incorporated | Techniques for perceptual encoding of video frames |
Non-Patent Citations (2)
Title |
---|
Online document: Kai Zhang, Li Zhang, Hongbin Liu, Yue Wang, Pengwei Zhao, Dingkun Hong, CE4-related: Interweaved Prediction for Affine Motion Compensation, Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, JVET-K0102-v2, 11th Meeting: Ljubljana, SI, 10–18 July 2018, http://phenix.int-evry.fr/jvet/ * |
Online document: Kai Zhang, Li Zhang, Hongbin Liu, Yue Wang, Pengwei Zhao, Dingkun Hong, CE4-related: Simplified Affine Prediction, Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, JVET-K0103-v2, 11th Meeting: Ljubljana, SI, 10–18 July 2018, http://phenix.int-evry.fr/jvet/ * |
Also Published As
Publication number | Publication date |
---|---|
CN110876063A (en) | 2020-03-10 |
TW202027501A (en) | 2020-07-16 |
CN110876063B (en) | 2023-01-31 |
WO2020049447A1 (en) | 2020-03-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI750475B (en) | | Video processing method, device and recording medium based on interleaving prediction |
TWI736905B (en) | | Chroma dmvr |
CN112997480B (en) | | Rounding in paired average candidate calculation |
CN110572645B (en) | | Asymmetric weighted bidirectional predictive Merge |
TWI846728B (en) | | Affine mode calculations for different video block sizes |
TW202021356A (en) | | Efficient affine merge motion vector derivation |
CN112913239A (en) | | Reference picture based decoder-side motion vector derivation |
CN110740321B (en) | | Motion prediction based on updated motion vectors |
TWI722465B (en) | | Boundary enhancement for sub-block |
TWI705696B (en) | | Improvement on inter-layer prediction |
TWI833795B (en) | | Fast encoding methods for interweaved prediction |
CN110876064B (en) | | Partially interleaved prediction |
TWI850252B (en) | | Partial interweaved prediction |
TW202005388A (en) | | Concept of interweaved prediction |
TWI854996B (en) | | Calculating motion vector predictors |