WO2020042990A1 - Inter-frame prediction method and device, and encoding/decoding method and device applying the same - Google Patents

Inter-frame prediction method and device, and encoding/decoding method and device applying the same

Info

Publication number
WO2020042990A1
WO2020042990A1 · PCT/CN2019/101893 · CN2019101893W
Authority
WO
WIPO (PCT)
Prior art keywords
motion information
current coding
candidate motion
unit
current
Prior art date
Application number
PCT/CN2019/101893
Other languages
English (en)
Chinese (zh)
Inventor
杨海涛
徐巍炜
赵寅
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Priority claimed from CN201810990347.3A (CN110868589B)
Priority claimed from CN201811164177.XA (CN111010565B)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2020042990A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/107Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh

Definitions

  • the embodiments of the present application relate to the field of video encoding, and more specifically, to an inter prediction method in a video encoding and decoding process.
  • Video coding (video encoding and decoding) is widely used in digital video applications, for example, broadcast digital TV, video transmission over the Internet and mobile networks, real-time conversational applications such as video chat and video conferencing, DVD and Blu-ray discs, video content acquisition and editing systems, and camcorder security applications.
  • Video coding standards include MPEG-1 Video, MPEG-2 Video, ITU-T H.262/MPEG-2, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC), and extensions of such standards, such as scalability and/or three-dimensional (3D) extensions.
  • one of the goals of most video coding standards is to reduce the bit rate compared to previous standards without reducing the subjective quality of the picture.
  • VVC: Versatile Video Coding, the next-generation video coding standard after HEVC.
  • A frame of an image is divided into non-overlapping coding tree units (CTUs), and the CTU size can be set to 64×64 or 128×128.
  • a CTU is divided into one or more coding units (Coding Units, CU).
  • a CU contains basic coding information, including information such as prediction mode and transform coefficients.
  • the decoding end can perform corresponding prediction processing, inverse quantization, inverse transform, reconstruction, and filtering on the CU according to the encoded information to generate a reconstructed image corresponding to the CU.
  • a CU corresponds to a predicted image and a residual image, and the predicted image and the residual image are added to obtain a reconstructed image.
  • the predicted image is generated by intra prediction or inter prediction, and the residual image is generated by inverse quantization and inverse transform processing of the transform coefficients.
  • Inter prediction is a prediction technique based on motion compensation.
  • In inter-prediction coding, because the same objects in adjacent frames of an image sequence have a certain temporal correlation, each frame of the image sequence can be divided into many non-overlapping blocks, and all pixels within a block are considered to have the same motion.
  • the main processing process is to determine the motion information of the current block, and obtain the reference image block from the reference frame of the current block according to the motion information to generate a predicted image of the current block.
  • The current block refers to the image block on which the encoding/decoding process is in progress; the current block may be a luma block or a chroma block in a coding unit.
  • the motion information includes an inter prediction direction, a reference frame index (ref_idx), a motion vector (MV), and the like.
  • The inter prediction direction indicates whether the current block uses forward prediction, backward prediction, or bidirectional prediction.
  • the motion vector indicates the displacement vector of the reference image block in the reference frame for predicting the current block relative to the current block. Therefore, a motion vector corresponds to a reference image block in a reference frame.
  • Inter prediction of an image block can generate a predicted image using a single motion vector and pixels from one reference frame, which is called unidirectional prediction; it can also combine pixels from two reference frames using two motion vectors to generate a predicted image, which is called bidirectional prediction. That is, an image block usually contains one or two motion vectors; for some multi-hypothesis inter prediction techniques, an image block may contain more than two motion vectors.
  • a MV is a two-dimensional vector that contains horizontal and vertical displacement components.
  • An MV corresponds to two frames; each frame has a picture order count (POC) that indicates the order in which the images are displayed, so an MV also corresponds to a POC difference.
  • the POC difference has a linear relationship with the time interval.
  • Motion vector scaling usually uses a scaling method based on the POC difference to convert a motion vector between one pair of images into a motion vector between another pair of images.
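  • As an illustration (not part of the patent text), the following is a minimal Python sketch of POC-distance-based MV scaling. It shows only the linear relationship described above; real codecs such as HEVC use fixed-point integer arithmetic with rounding and clipping, and all names here are illustrative.

```python
def scale_mv(mv, poc_cur, poc_ref_src, poc_ref_dst):
    """Rescale an MV defined over (poc_cur, poc_ref_src) so that it spans
    (poc_cur, poc_ref_dst) instead, using the ratio of POC differences."""
    td = poc_cur - poc_ref_src  # POC difference of the source MV
    tb = poc_cur - poc_ref_dst  # POC difference of the target MV
    if td == 0:
        return mv
    mvx, mvy = mv
    return (mvx * tb // td, mvy * tb // td)

# An MV of (8, -4) over a POC distance of 2, rescaled to a distance of 4:
print(scale_mv((8, -4), poc_cur=10, poc_ref_src=8, poc_ref_dst=6))  # (16, -8)
```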
  • H.265 / HEVC, H.266 / VVC and other video coding standards divide a frame of image into non-overlapping Coding Tree Units (CTUs).
  • A CTU is divided into one or more coding units (CUs).
  • One CU contains coding information, including information such as prediction mode and transform coefficients.
  • The decoding end performs corresponding prediction, inverse quantization, inverse transform, and other decoding processing on the CU according to the coding information to generate a reconstructed image corresponding to the CU.
  • Because motion information occupies a large amount of data, motion information is usually transmitted by prediction.
  • Prediction methods are divided into inter prediction and intra prediction: intra prediction uses reference blocks in the same frame as prediction blocks, while inter prediction uses reference blocks in different frames as prediction blocks.
  • AMVP (Advanced Motion Vector Prediction) mode: the code stream identifies the inter prediction direction (forward, backward, or bidirectional), the reference frame index, the motion vector predictor index (MVP index), and the motion vector difference (MVD) used by the current block.
  • The motion vector predictor index indicates which MVP in the MVP list is used as the prediction value of the MV of the current block; the MVP and the MVD are added to obtain the MV.
  • Merge/skip mode: a merge index is identified in the code stream, a merge candidate is selected from the merge candidate list according to the merge index, and the motion information of the current block (including the prediction direction, the reference frame, and the motion vector) is determined by this merge candidate.
  • Merge mode implies that the current block has residual information: the motion vector obtained from the merge candidate list is used as the motion vector prediction value of the current block, the motion vector of the current block is obtained by adding the motion vector prediction value and the motion vector residual, and the motion vector residual is obtained by decoding the code stream. Skip mode implies that the current block has no residual information (or the residual is 0): the motion vector obtained from the candidate list is used directly as the motion vector of the current block for inter prediction. The two modes derive motion information in the same way, as the sketch below illustrates.
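  • The contrast between the two signaling modes can be summarized in a short sketch (illustrative only; candidate list construction is omitted and all names are assumptions, not the patent's API):

```python
def mv_from_amvp(mvp_list, mvp_idx, mvd):
    """AMVP mode: the code stream carries an MVP index plus a residual (MVD)."""
    mvp = mvp_list[mvp_idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

def motion_from_merge(merge_list, merge_idx):
    """Merge/skip mode: the code stream carries only an index; the prediction
    direction, reference frame and motion vector are copied from the candidate."""
    return merge_list[merge_idx]

mvp_list = [(4, 0), (0, 8)]
print(mv_from_amvp(mvp_list, mvp_idx=1, mvd=(2, -3)))  # (2, 5)
```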
  • Affine transformation mode: the motion vector of each sub-block in the current block is obtained from two or three control-point motion vectors by affine transformation.
  • AMVP mode and merge / skip mode each of them needs to establish a candidate list first.
  • For AMVP mode, a candidate motion vector list (AMVP candidate list) needs to be established, a better motion vector is selected from it as the motion vector prediction value of the current block, and the index value of that motion vector is written into the code stream; for merge/skip mode, a candidate motion information list needs to be established, from which the motion information of the current block is selected.
  • the candidate motion information in the motion information list includes unidirectional or bidirectional reference information, a reference frame index, and motion vector information corresponding to the reference direction.
  • Figure 6 shows the specific positions of candidate blocks in the spatial domain and candidate blocks in the time domain that need to be referenced in the AMVP and merge / skip modes to build the candidate motion vector list and the current candidate motion information list.
  • the blocks at the bottom right and center of the current block are the most suitable to provide good temporal motion vector prediction (TMVP).
  • the graph on the right of Figure 6 shows that two spatial MVP candidates A and B are derived from five spatially neighboring blocks.
  • AMVP allows a maximum of two candidate motion vectors, that is, the maximum value of the AMVP candidate list is 2, and the merge / skip mode allows more candidate motion information.
  • the maximum number of candidate motion information allowed in HEVC is 5. That is, the maximum value of the current candidate motion information list is 5.
  • The JVET-K0104 proposal proposes a method of adding historical candidate motion information (history candidates) to the fusion (i.e., merge) motion information candidate list and to the candidate motion vector prediction list, increasing the number of merge/skip fusion motion information candidates and of motion vector prediction candidates for inter MVP mode, which improves prediction efficiency.
  • the historical candidate motion information list is composed of historical candidate motion information, where the historical candidate motion information is motion information of a previously encoded block.
  • The JVET-K0104 proposal introduces the use of a historical candidate motion information list (history candidate list) and the construction method of the historical candidate motion information list.
  • The construction method of the fusion motion information candidate list incorporating historical candidate motion information (that is, the use of the historical candidate motion information list) is as follows:
  • Step 1: The candidates spatially adjacent to the current block and the temporal candidates are added to the fusion motion information candidate list of the current block.
  • the method is the same as the method in HEVC.
  • the spatial candidates include A0, A1, B0, B1, and B2, and the time domain candidates include T0 and T1.
  • time-domain candidates also include candidates provided by adaptive time-domain motion vector prediction (ATMVP) technology.
  • Step 2: The historical candidate motion information in the historical candidate motion information list is added to the fusion motion information candidate list in order, and a preset number of historical candidates is checked in order from the tail to the head of the historical candidate motion information list, as shown in FIG. 7.
  • Starting from the historical candidate at the tail of the historical candidate motion information list, each candidate is checked against the fusion motion information candidates obtained in step 1: if it differs from all of them, it is added to the fusion motion information candidate list; if it is the same as one of them, the next historical candidate in the list is checked instead.
  • Step 3: Other types of fusion motion information candidates are added, such as bi-predictive candidates and zero motion vector candidates. The three steps are sketched below.
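  • A simplified illustration of the three steps under assumed data types (the pruning policy and the zero-candidate format are assumptions, not the exact JVET-K0104 procedure):

```python
def build_merge_list(spatial_temporal, history, max_size, num_checked):
    merge_list = list(spatial_temporal)            # step 1: spatial/temporal
    for cand in reversed(history[-num_checked:]):  # step 2: tail -> head
        if len(merge_list) >= max_size:
            break
        if cand not in merge_list:                 # prune identical candidates
            merge_list.append(cand)
    while len(merge_list) < max_size:              # step 3: other candidates,
        merge_list.append(((0, 0), 0))             # e.g. zero-MV with ref_idx 0
    return merge_list
```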
  • the historical candidate motion information list is constructed using the motion information of the coded block in the current frame, and the historical candidate motion information list is accessed in a first-in, first-out manner.
  • the overall historical candidate motion information list in the encoding / decoding side is constructed and used as follows:
  • Step 1 Initialize the historical candidate motion information list at the beginning of slice decoding and clear it.
  • Step 2: Decode the current CU. If the current CU or current block uses the merge mode or the inter MVP prediction mode, a fusion motion information candidate list or a candidate motion vector prediction list is generated, and the historical candidate motion information in the historical candidate motion information list is added to the fusion motion information candidate list or the candidate motion vector prediction list.
  • Step 3 After decoding the current CU or the current block, the motion information of the current block is added as new historical candidate motion information to the historical candidate motion information list, and the historical candidate motion information list is updated, as shown in FIG. 8.
  • Specifically, the motion information of the current block is compared with the historical candidate motion information in the historical candidate motion information list. If some historical candidate motion information (for example, MV2 in FIG. 8) is the same as the motion information of the current block, that historical candidate MV2 is removed. Then the size of the historical candidate motion information list is checked; if the size of the list exceeds a preset size, the historical candidate motion information at the head of the list is removed. Finally, the motion information of the current block is added to the tail of the historical candidate motion information list.
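  • The first-in, first-out update described above corresponds to the following sketch (illustrative; motion information is reduced to hashable tuples for brevity):

```python
def update_history_list(history, new_motion, max_size):
    if new_motion in history:
        history.remove(new_motion)  # drop the identical older entry first
    if len(history) >= max_size:
        history.pop(0)              # list full: remove the head (oldest entry)
    history.append(new_motion)      # the newest entry lives at the tail
    return history

history = [((1, 0), 0), ((0, 2), 1)]
update_history_list(history, ((1, 0), 0), max_size=2)
print(history)  # [((0, 2), 1), ((1, 0), 0)] -- the duplicate moved to the tail
```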
  • However, the historical candidate motion information list used in the prior art is initialized only when each slice starts encoding/decoding, which is not conducive to parallel encoding at the row level and the CTU level.
  • Moreover, the method updates the historical candidate motion information list for each coding block, so construction and updating take a long time.
  • the present invention provides an inter prediction method and device, a codec method applying the method, and a codec device applying the device.
  • A method for inter prediction includes: initializing a historical candidate motion information list corresponding to a current coding tree unit, where the historical candidate motion information list includes N storage spaces, and the N storage spaces are used to store historical candidate motion information.
  • The initialized historical candidate motion information list includes at least M vacant storage spaces, where M ≤ N and M and N are integers; the current coding tree unit is included in a coding tree unit set (slice) composed of multiple coding tree units, and the current coding tree unit is not the first one in the coding tree unit set according to a predetermined processing order;
  • the motion information at L positions in the spatially adjacent blocks of the current coding tree unit is added to the historical candidate motion information list, where M ≤ L ≤ N and the L positions in the spatially adjacent blocks are obtained according to a preset rule; a current candidate motion information list of the current coding tree unit or of the current coding unit is constructed; and inter prediction is performed on the current coding tree unit or the current coding unit according to a combination of the current candidate motion information list and the historical candidate motion information list.
  • In the above technical solution, the historical candidate motion information list is initialized, that is, an independent historical candidate motion information list corresponding to the current coding tree unit is constructed, thereby cutting off the dependency between coding tree units caused by the construction of the historical candidate motion information list during coding. Each coding tree unit can thus be encoded independently according to its own historical candidate motion information list, which preserves considerable coding efficiency and is more conducive to designing row-level and CTU-level parallel encoding and decoding. Through parallel processing, encoding and decoding time can be greatly reduced while ensuring that the coding quality is essentially not degraded.
  • The above initialization process may also be performed conditionally: for example, if the current coding tree unit is located at the first position of a coding tree unit row, or at the first position of a parallel coding tree unit combination (the parallel coding tree unit combination is composed of K consecutive coding tree units located in the same coding tree unit row, where K is greater than or equal to 1 and less than the total number L of coding tree units in a coding tree unit row, and L is greater than or equal to 2), then the historical candidate motion information list corresponding to the current coding tree unit is initialized.
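  • As an illustration only (the addressing scheme and parameter names are assumptions, not the patent's), the initialization condition can be expressed as:

```python
def should_init_history_list(ctu_index_in_row, k):
    """True if the CTU is the first of its row (index 0) or the first of a
    group of k consecutive CTUs within that row."""
    return ctu_index_in_row % k == 0

print(should_init_history_list(0, 4))  # True: first CTU of the row
print(should_init_history_list(6, 4))  # False: inside a group of four
```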
  • In one implementation, M predetermined positions in the spatially adjacent blocks are adopted as the source positions of the M historical candidate motion information entries. Specifically, the M positions in the spatially adjacent blocks are obtained as follows: the first candidate motion information is obtained from a preset position X in the spatially adjacent blocks, the position X is taken as the starting point, and the remaining M-1 candidate motion information entries are obtained at a preset step interval.
  • That is, the motion vectors at the M positions are usually obtained in sequence from a preset starting position at a preset interval.
  • the preset interval can also be called the step size.
  • the step size can be fixed. For example, 4 or 8 pixels are used as the unit.
  • the step size can also be changed, for example, different step sizes can be set according to the size of the current coding tree unit.
  • The order of adding the motion information of the M positions may be a preset order, for example, a clockwise order starting from the spatially adjacent block at the lower left corner of the current coding tree unit and ending at the spatially adjacent block at the upper right corner of the current coding tree unit; in this order, the motion information is added to the historical candidate motion information list.
  • This acquisition method aims to match the processing order of the spatially adjacent blocks well and to simplify the read/write logic for historical motion information. A variety of different methods can therefore be adopted, for example, a counterclockwise order, or reading simultaneously in opposite directions starting from the spatially adjacent blocks at the two end points; one such scan is sketched below.
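  • A sketch of one such clockwise scan (the coordinates and the step size are assumptions; the patent does not fix these values):

```python
def neighbor_positions(ctu_x, ctu_y, ctu_size, m, step=8):
    """Clockwise scan: up the neighbouring column left of the CTU from the
    lower-left corner, then rightwards along the row above the CTU,
    sampling one position every `step` pixels."""
    positions = []
    x, y = ctu_x - 1, ctu_y + ctu_size - 1
    while y >= ctu_y and len(positions) < m:   # up the left edge
        positions.append((x, y))
        y -= step
    x, y = ctu_x, ctu_y - 1
    while x < ctu_x + ctu_size and len(positions) < m:
        positions.append((x, y))               # right along the top edge
        x += step
    return positions

print(neighbor_positions(64, 64, 64, m=8))
```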
  • Optionally, the method may further include combining the current candidate motion information list of the current coding tree unit, or of the current coding unit, with the historical candidate motion information list. Specifically, this may be: adding the historical candidate motion information to the current candidate motion information list of the current coding tree unit or of the current coding unit, and then performing inter prediction based on the current candidate motion information list of the current coding tree unit or of the current coding unit.
  • This processing method simplifies the indexing of the motion information of the current coding tree unit or coding unit: after the motion information in the historical candidate motion information list is added to the current candidate motion information list of the coding tree unit or coding unit, the original candidate motion information and the historical candidate motion information use a unified index order and index numbering, and no additional index for the current candidate motion information list needs to be established, which effectively simplifies the indexing process.
  • Optionally, the method may further include updating the historical candidate motion information list based on the motion information of the current coding unit. This enables the historical candidate motion information list corresponding to the current coding tree unit to be continuously updated, improving the accuracy of inter prediction.
  • The above updating of the historical candidate motion information list can be divided into two cases: if the M positions are not full, the motion information of the current coding unit is added as historical motion information to the vacant storage space closest to the N-M filled positions among the M positions in the historical candidate motion information list; or, if the M positions are filled, the earliest-added historical motion information is removed according to the first-in, first-out principle, the remaining historical motion information is shifted toward the removed position, and the motion information of the current coding unit is added as historical motion information to the tail of the historical candidate motion information list, where the tail is the end of the list containing the most recently added historical motion information.
  • This method provides flexibility in applying the historical candidate motion information list: the list can be used for inter prediction of the current block even when it is not completely filled, and when the list is full, the motion information/motion vector of the current coding block can still be used to update it. The two cases are sketched below.
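  • The two cases correspond to the following sketch (illustrative; `None` marks a vacant storage space, and the slot layout is an assumption, not the patent's storage format):

```python
def update_fixed_size_history(slots, new_motion):
    if None in slots:
        slots[slots.index(None)] = new_motion  # fill the nearest vacant slot
    else:
        slots.pop(0)                           # full: drop the oldest (head)
        slots.append(new_motion)               # newest entry goes to the tail
    return slots

slots = [((1, 0), 0), ((0, 2), 1), None, None]
update_fixed_size_history(slots, ((3, 3), 0))
print(slots)  # [((1, 0), 0), ((0, 2), 1), ((3, 3), 0), None]
```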
  • Alternatively, the historical candidate motion information list may no longer be updated; that is, inter prediction is performed on another coding unit using the same method as for the current coding unit, where the other coding unit follows the current coding unit in a preset processing order, and the historical motion information list used for inter prediction of the other coding unit includes the historical motion information in the list used for inter prediction of the current coding unit.
  • Alternatively, if the M positions are not full, the motion information of the current coding unit is added as historical motion information to the vacant position closest to the N-M filled positions among the M positions; if the M positions are filled, the next coding unit is inter-predicted based on the current candidate motion information list without further updating the historical list.
  • This processing method can allow parallel processing of coded blocks within the current coding tree unit.
  • If the historical candidate motion information list is not filled after the spatially adjacent image blocks are traversed, for example, when the current coding tree unit is located at the uppermost side or at the leftmost side of a frame of image, one of the following methods can be used to handle the unfilled part of the historical candidate motion information list.
  • Method 1: No motion information from any other source is filled in; instead, the motion information obtained when each coding unit in the current coding tree unit is coded is added to the historical candidate motion information list as historical candidate motion information.
  • Method 2: Fill in the motion information of coding blocks at preset non-adjacent positions of the current coding tree unit. The preset non-adjacent positions may be at a fixed interval from the adjacent positions or may follow a preset template.
  • Method 3: Fill in temporal motion information of coding blocks taken from the reference frame, at the position corresponding to the current coding tree unit and at preset positions within and near that corresponding position. The preset positions within the corresponding position of the current coding tree unit may be extracted at fixed intervals, or extracted according to a specific rule or order; the preset positions near the corresponding position may likewise be extracted according to a specific rule and order.
  • Method 4: Fill in temporal motion information of coding blocks taken from the reference frame, at preset positions within the position corresponding to the current coding tree unit and at preset non-adjacent positions. These preset positions may be extracted at fixed intervals, or extracted according to a specific rule or order.
  • Method 5: Fill in historical candidate motion information taken from the historical candidate motion information list of a coding tree unit adjacent to the current coding tree unit.
  • Filling the unfilled historical candidate motion information list by any of the above methods can enrich the historical candidate motion information in the list, so that the current candidate motion information list is complemented and the coding/decoding gain that the historical candidate motion information list can bring is fully exploited; a sketch of Method 2 follows.
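  • A sketch of Method 2 (highly speculative: `fetch_motion`, the interval, and the diagonal sampling pattern are all illustrative assumptions, not the patent's procedure):

```python
def fill_from_non_adjacent(history, n, ctu_x, ctu_y, interval, fetch_motion):
    """Fill remaining vacancies from non-adjacent positions at a fixed interval."""
    offset = interval
    while len(history) < n:
        motion = fetch_motion(ctu_x - offset, ctu_y - offset)
        if motion is None:            # position not coded / unavailable
            break
        if motion not in history:
            history.append(motion)
        offset += interval

# Toy lookup that returns a fake motion entry for any valid position:
fake_fetch = lambda x, y: ((x, y), 0) if x >= 0 and y >= 0 else None
history = []
fill_from_non_adjacent(history, 4, 64, 64, 16, fake_fetch)
print(history)  # four entries sampled at (48,48), (32,32), (16,16), (0,0)
```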
  • A second aspect of the present invention provides an encoding method using the inter prediction of the first aspect, which includes: performing inter prediction on the current coding tree unit or coding unit based on the inter prediction method of the first aspect to obtain an inter-predicted image; subtracting the obtained inter-predicted image from the original image of the current coding tree unit or current coding unit to obtain a residual image; and encoding the residual image and the motion information index to form a code stream.
  • Correspondingly, at the decoding end, the motion information index corresponding to the current coding tree unit or the current coding unit can be obtained by parsing the code stream, and the motion information of the current coding tree unit or the current coding unit is obtained according to the combination of the historical candidate motion information list and the current candidate motion information list.
  • A third aspect of the present invention provides an encoding method using the inter prediction of the first aspect, which includes: performing inter prediction on the current coding tree unit or coding unit based on the inter prediction method of the first aspect to obtain an inter-predicted image; subtracting the obtained inter-predicted image from the original image of the current coding tree unit or current coding unit to obtain a residual image; and encoding the residual image to form a code stream. In the process of obtaining the inter-predicted image based on the inter prediction method of the first aspect, the method further includes: obtaining, from the combination of the historical candidate motion information list and the current candidate motion information list, the motion information of the current coding tree unit or current coding unit and the motion information index of that motion information; performing inter prediction on the current coding tree unit or current coding unit according to the motion information to obtain the inter-predicted image; and encoding the motion information index.
  • In the above methods, updating the historical candidate motion information list at the coding tree unit level allows row-level and CTU-level encoding and decoding to be parallelized, which can effectively reduce encoding time.
  • The present invention also provides an inter prediction device, an encoding device, and a decoding device corresponding to the first, second, and third aspects of the invention. Each device includes a digital processor and a memory; an executable instruction set is stored in the memory, and the digital processor reads the instruction set stored in the memory to implement the method provided by the first, second, or third aspect of the present invention.
  • FIG. 1 is a block diagram of an example of a video encoding system for implementing an embodiment of the present invention
  • FIG. 2 is a block diagram of an example structure of a video encoder for implementing an embodiment of the present invention
  • FIG. 3 is a block diagram of an example structure of a video decoder for implementing an embodiment of the present invention
  • FIG. 4 is a block diagram showing an example of a video coding system including the encoder 20 of FIG. 2 and the decoder 30 of FIG. 3;
  • FIG. 5 is a block diagram showing an example of another encoding device or decoding device
  • FIG. 6 is a schematic diagram showing the positions of the neighboring blocks in the spatial domain and the neighboring blocks in the time domain of the current block;
  • FIG. 7 is a schematic diagram of adding historical candidate motion information to the current candidate motion information list
  • FIG. 8 is a schematic diagram of the construction of a historical candidate motion information list
  • FIG. 9 is a schematic diagram of the motion information of spatially adjacent image blocks to the left of and above the current coding tree unit;
  • FIG. 10 is a flowchart for acquiring two spatial candidate motion information entries A and B;
  • FIG. 11 is a flowchart of an example operation of a video encoder implementing an inter prediction method of the present invention according to an embodiment
  • FIG. 12 is a flowchart of a method for a video decoder to decode based on the inter prediction method of FIG. 11 according to another embodiment;
  • FIG. 13 is a flowchart of a method for a video encoder to encode based on the inter prediction method of FIG. 11 according to another embodiment;
  • FIG. 14 is a schematic diagram of an inter prediction device provided to implement the method described in FIG. 11 according to another embodiment
  • FIG. 15 is a schematic diagram of an inter prediction device provided to implement the method described in FIG. 12 according to another embodiment
  • FIG. 16 is a schematic diagram of an inter prediction device provided to implement the method described in FIG. 13 according to another embodiment
  • FIG. 17 is a schematic diagram of a device provided to implement any of the methods in FIGS. 11 to 13 according to another embodiment.
  • For example, the corresponding device may include one or more units, such as functional units, to perform the described one or more method steps (for example, one unit performing the one or more steps, or multiple units each performing one or more of the multiple steps), even if such one or more units are not explicitly described or illustrated in the drawings.
  • Correspondingly, the corresponding method may include one step to perform the functionality of the one or more units (for example, one step performing the functionality of the one or more units, or multiple steps each performing the functionality of one or more of the multiple units), even if such one or more steps are not explicitly described or illustrated in the drawings.
  • the features of the various exemplary embodiments and / or aspects described herein may be combined with each other, unless explicitly stated otherwise.
  • Video coding generally refers to processing a sequence of pictures that form a video or a video sequence.
  • In the field of video coding, the terms "picture", "frame", and "image" can be used as synonyms.
  • Video encoding used in this application means video encoding or video decoding.
  • Video encoding is performed on the source side and typically involves processing (e.g., by compressing) the original video picture to reduce the amount of data required to represent the video picture (thus storing and / or transmitting more efficiently).
  • Video decoding is performed on the destination side and usually involves inverse processing relative to the encoder to reconstruct the video picture.
  • the video pictures (or collectively referred to as pictures, which will be explained below) referred to in the embodiments should be understood as “encoding” or “decoding” related to a video sequence.
  • the combination of the encoding part and the decoding part is also called codec (encoding and decoding).
  • In the case of lossless video coding, the original video picture can be reconstructed; that is, the reconstructed video picture has the same quality as the original video picture (assuming no transmission loss or other data loss during storage or transmission).
  • In the case of lossy video coding, further compression is performed, for example, by quantization, to reduce the amount of data required to represent the video picture, and the decoder side cannot completely reconstruct the video picture; that is, the quality of the reconstructed video picture is lower or worse than that of the original video picture.
  • Each picture of a video sequence is usually partitioned into a set of non-overlapping blocks, usually encoded at the block level.
  • The encoder side usually processes, that is, encodes, the video at the block (video block) level.
  • the prediction block is generated by spatial (intra-picture) prediction and temporal (inter-picture) prediction.
  • The encoder duplicates the decoder processing loop, so that the encoder and the decoder generate the same predictions (for example, intra prediction and inter prediction) and/or reconstructions for processing, that is, for encoding, subsequent blocks.
  • the term "block” may be part of a picture or frame.
  • Embodiments of the present invention are described herein with reference to, for example, High-Efficiency Video Coding (HEVC), developed by the Joint Collaboration Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Motion Picture Experts Group (MPEG), or to the next-generation standard Versatile Video Coding (VVC).
  • In HEVC, a CTU is split into multiple CUs by using a quad-tree structure represented as a coding tree.
  • Each CU can be further split into one, two or four PUs according to the PU split type. The same prediction process is applied within a PU, and related information is transmitted to the decoder on the basis of the PU.
  • a CU may be partitioned into a transform unit (TU) according to other quad-tree structures similar to a coding tree for a CU.
  • In the latest developments of video compression technology, a quad-tree plus binary-tree (QTBT) partitioning structure is used to partition coding blocks.
  • the CU may be a square or rectangular shape.
  • a coding tree unit (CTU) is first divided by a quad tree structure.
  • the quad leaf nodes are further partitioned by a binary tree structure.
  • Binary-tree leaf nodes are called coding units (CUs), and these segments are used for prediction and transform processing without any further partitioning.
  • the CU, PU, and TU have the same block size in the QTBT coded block structure.
  • Embodiments of the encoder 20, the decoder 30, and the encoding system 10 are described below based on Figs. 1 to 3 (before the embodiment of the present invention is described in more detail based on Fig. 6).
  • FIG. 1 is a conceptual or schematic block diagram of an exemplary encoding system 10, for example, a video encoding system 10 that can utilize the technology of the present application (the present disclosure).
  • the encoding system 10 includes a source device 12 for providing the encoded data 13, such as the encoded picture 13, to a destination device 14 that decodes the encoded data 13, for example.
  • The source device 12 includes an encoder 20 and may additionally, that is, optionally, include a picture source 16, a pre-processing unit 18, for example, a picture pre-processing unit 18, and a communication interface or communication unit 22.
  • The picture source 16 may include or may be any kind of picture capture device, for example, for capturing real-world pictures; and/or any kind of picture or comment generation device (for screen content encoding, some text on the screen is also considered part of the picture or image to be encoded), for example, a computer graphics processor for generating computer-animated pictures; or any kind of device for obtaining and/or providing real-world pictures or computer-animated pictures (for example, screen content or virtual reality (VR) pictures), and/or any combination thereof (for example, augmented reality (AR) pictures).
  • a (digital) picture is or can be regarded as a two-dimensional array or matrix of sampling points with luminance values.
  • The sampling points in the array may also be called pixels (short for picture element) or pels.
  • the number of sampling points of the array or picture in the horizontal and vertical directions (or axes) defines the size and / or resolution of the picture.
  • three color components are usually used, that is, a picture can be represented as or contain three sampling arrays.
  • For example, in RGB format or color space, a picture includes corresponding red, green, and blue sampling arrays.
  • However, in video coding, each pixel is usually represented in a luminance/chrominance format or color space, for example, YCbCr, which includes the luminance component indicated by Y (sometimes also indicated by L) and the two chrominance components indicated by Cb and Cr.
  • The luminance (abbreviated as luma) component Y represents luminance or gray-level intensity (for example, both are the same in a grayscale picture), while the two chrominance (abbreviated as chroma) components Cb and Cr represent the chrominance or color information components.
  • a picture in the YCbCr format includes a luminance sampling array of luminance sampling values (Y), and two chrominance sampling arrays of chrominance values (Cb and Cr).
  • Pictures in RGB format can be transformed or converted to YCbCr format and vice versa; this process is also called color transformation or conversion. If a picture is black and white, the picture can include only an array of luminance samples.
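  • For concreteness, one common definition of this conversion is shown below (full-range BT.601 coefficients; video standards use several variants, so treat the exact coefficients as an assumption rather than the codec's own):

```python
def rgb_to_ycbcr(r, g, b):
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

print(rgb_to_ycbcr(255, 0, 0))  # a pure red pixel
```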
  • The picture source 16 may be, for example, a camera for capturing pictures, a memory such as a picture memory that includes or stores a previously captured or generated picture, and/or any kind of (internal or external) interface for obtaining or receiving a picture.
  • the camera may be, for example, an integrated camera that is local or integrated in the source device, and the memory may be local or, for example, an integrated memory that is integrated in the source device.
  • the interface may be, for example, an external interface for receiving pictures from an external video source.
  • the external video source is, for example, an external picture capture device, such as a camera, an external memory, or an external picture generation device.
  • The external picture generation device is, for example, an external computer graphics processor, a computer, or a server.
  • the interface may be any type of interface according to any proprietary or standardized interface protocol, such as a wired or wireless interface, an optical interface.
  • the interface for acquiring the picture data 17 may be the same interface as the communication interface 22 or a part of the communication interface 22.
  • the picture or picture data 17 may also be referred to as the original picture or the original picture data 17.
  • the pre-processing unit 18 is configured to receive (original) picture data 17 and perform pre-processing on the picture data 17 to obtain pre-processed pictures 19 or pre-processed picture data 19.
  • the pre-processing performed by the pre-processing unit 18 may include trimming, color format conversion (for example, conversion from RGB to YCbCr), color correction, or denoising. It is understood that the pre-processing unit 18 may be an optional component.
  • An encoder 20 (eg, video encoder 20) is used to receive the pre-processed picture data 19 and provide the encoded picture data 21 (details will be further described below, for example, based on FIG. 2 or FIGS. 4, 5).
  • the encoder 20 may select a prediction method that is most suitable for the current block (the current image block to be encoded) according to the rate-distortion cost evaluation, for example, using intra prediction or inter prediction.
  • If the encoder 20 selects the inter prediction mode, the encoder may perform the following method of inter prediction on the current block. The encoder 20 first initializes a historical candidate motion information list corresponding to the current coding tree unit, where the historical candidate motion information list includes N storage spaces used to store historical candidate motion information, the initialized list includes at least M vacant storage spaces, M ≤ N, M and N are integers, the current coding tree unit is included in a coding tree unit set (slice) composed of multiple coding tree units, and the current coding tree unit is not the first one in the coding tree unit set according to a predetermined processing order. Then, the motion information at L positions in the spatially adjacent blocks of the current coding tree unit is added to the historical candidate motion information list in the predetermined order, where the L positions in the spatially adjacent blocks are obtained according to a preset rule. A current candidate motion information list of the current coding tree unit or of the current coding unit is constructed, where the coding unit is obtained by dividing the coding tree unit. Finally, inter prediction is performed on the current coding tree unit or the current coding unit according to the combination of the current candidate motion information list and the historical candidate motion information list. Based on the above method, the encoder 20 can draw on more diverse motion information during encoding to predict the current block, and when applying the historical candidate motion information list it does not need to wait for the previous coding tree unit to finish encoding.
  • The above initialization of the historical candidate motion information list corresponding to the current coding tree unit may be performed conditionally: for example, if the current coding tree unit is located at the first position of a coding tree unit row, or at the first position of a parallel coding tree unit combination (the parallel coding tree unit combination is composed of K consecutive coding tree units located in the same coding tree unit row, where K is greater than or equal to 1 and less than the total number of coding tree units in a coding tree unit row), then the historical candidate motion information list corresponding to the current coding tree unit is initialized.
  • The communication interface 22 of the source device 12 can be used to receive the encoded picture data 21 and transmit it to other devices, for example, the destination device 14 or any other device, for storage or direct reconstruction, or to process the encoded picture data 21 and/or the encoded data 13 before correspondingly storing the encoded data 13 and/or transmitting the encoded data 13 to other devices, for example, the destination device 14 or any other device, for decoding or storage.
  • the destination device 14 includes a decoder 30 (for example, a video decoder 30), and in addition, optionally, it may include a communication interface or communication unit 28, a post-processing unit 32, and a display device 34.
  • The communication interface 28 of the destination device 14 is used, for example, to receive the encoded picture data 21 or the encoded data 13 directly from the source device 12 or from any other source, for example, a storage device such as an encoded picture data storage device.
  • The communication interface 22 and the communication interface 28 can be used for direct communication through a direct communication link between the source device 12 and the destination device 14, or for transmission or reception of the encoded picture data 21 or encoded data 13 through any type of network; the link is, for example, a direct wired or wireless connection, and the network is, for example, a wired or wireless network or any combination thereof, or any type of private or public network, or any combination thereof.
  • the communication interface 22 may be used, for example, to encapsulate the encoded picture data 21 into a suitable format, such as a packet, for transmission over a communication link or communication network.
  • the communication interface 28 forming a corresponding part of the communication interface 22 may be used, for example, to decapsulate the encoded data 13 to obtain the encoded picture data 21.
  • Both the communication interface 22 and the communication interface 28 may be configured as unidirectional communication interfaces, as indicated by the arrows for the encoded picture data 13 from the source device 12 to the destination device 14 in FIG. 1, or configured as bidirectional communication interfaces, and It can be used, for example, to send and receive messages to establish a connection, acknowledge, and exchange any other information related to a communication link and / or data transmission such as encoded picture data transmission.
  • the decoder 30 is configured to receive the encoded picture data 21 and provide the decoded picture data 31 or the decoded picture 31 (details will be further described below, for example, based on FIG. 3 or FIG. 5).
  • The decoder 30 may be configured to decode the data encoded by the encoder. Specifically, the decoder 30 may parse the bitstream to obtain a fusion candidate index, obtain the corresponding fusion candidate from the fusion candidate list according to the fusion candidate index, and use the fusion candidate as the motion information of the current block; perform inter prediction on the current block according to the motion information of the current block to obtain a predicted image of the current block; obtain a residual image of the current block; and add the predicted image of the current block and the residual image of the current block to obtain a reconstructed image of the current block.
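  • A toy sketch of this flow (1-D "blocks" and stand-in helpers; none of this is the actual module structure of the decoder 30):

```python
def motion_compensate(reference, mv, size):
    """Toy 1-D motion compensation: copy `size` samples at offset `mv`."""
    return reference[mv:mv + size]

def decode_merge_block(merge_idx, merge_list, reference, residual):
    mv = merge_list[merge_idx]                   # motion copied, not coded
    prediction = motion_compensate(reference, mv, len(residual))
    return [p + r for p, r in zip(prediction, residual)]  # reconstruction

reference = list(range(16))
print(decode_merge_block(1, [0, 4], reference, residual=[1, -1, 0, 2]))
# [5, 4, 6, 9]: prediction [4, 5, 6, 7] plus the residual
```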
  • The post-processor 32 of the destination device 14 is used to post-process the decoded picture data 31 (also referred to as reconstructed picture data), for example, the decoded picture 31, to obtain post-processed picture data 33, for example, a post-processed picture 33.
  • The post-processing performed by the post-processing unit 32 may include, for example, color format conversion (e.g., conversion from YCbCr to RGB), color correction, trimming, or resampling, or any other processing, for example, for preparing the decoded picture data 31 for display by the display device 34.
  • the display device 34 of the destination device 14 is used to receive the post-processed picture data 33 to display a picture to, for example, a user or a viewer.
  • the display device 34 may be or may include any kind of display for presenting a reconstructed picture, such as an integrated or external display or monitor.
  • the display may include a liquid crystal display (LCD), an organic light emitting diode (OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (LCoS), Digital light processor (DLP) or any other display of any kind.
  • Although FIG. 1 shows the source device 12 and the destination device 14 as separate devices, a device embodiment may also include both the source device 12 and the destination device 14 or the functionality of both, that is, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality. In such embodiments, the same hardware and/or software, separate hardware and/or software, or any combination thereof may be used to implement the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality.
  • Both the encoder 20 (e.g., video encoder 20) and the decoder 30 (e.g., video decoder 30) may be implemented as any of a variety of suitable circuits, for example, one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If the techniques are implemented partially in software, the device may store the software instructions in a suitable non-transitory computer-readable storage medium and may use one or more processors to execute the instructions in hardware to perform the techniques of the present disclosure.
  • Any one of the foregoing (including hardware, software, a combination of hardware and software, etc.) can be considered as one or more processors.
  • Each of the video encoder 20 and the video decoder 30 may be included in one or more encoders or decoders, and any of the encoders or decoders may be integrated as a combined encoder / decoder in a corresponding device (Codec).
  • the source device 12 may be referred to as a video encoding device or a video encoding device.
  • the destination device 14 may be referred to as a video decoding device or a video decoding device.
  • the source device 12 and the destination device 14 may be examples of a video encoding device or a video encoding apparatus.
  • The source device 12 and the destination device 14 may include any of a variety of devices, including any type of handheld or stationary device, for example, a notebook or laptop computer, a mobile phone, a smartphone, a tablet computer, a video camera, a desktop computer, a set-top box, a TV, a display device, a digital media player, a video game console, a video streaming device (such as a content service server or a content distribution server), a broadcast receiver device, or a broadcast transmitter device, and may use no operating system or any kind of operating system.
  • source device 12 and destination device 14 may be equipped for wireless communication. Therefore, the source device 12 and the destination device 14 may be wireless communication devices.
  • The video encoding system 10 shown in FIG. 1 is merely an example, and the techniques of the present application may be applicable to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between the encoding and decoding devices.
  • data may be retrieved from local storage, streamed over a network, and the like.
  • the video encoding device may encode the data and store the data to a memory, and / or the video decoding device may retrieve the data from the memory and decode the data.
  • encoding and decoding are performed by devices that do not communicate with each other, but only encode data to and / or retrieve data from memory and decode data.
  • video decoder 30 may be used to perform the reverse process.
  • video decoder 30 may be used to receive and parse such syntax elements, and decode related video data accordingly.
  • The video encoder 20 may entropy-encode into the encoded video bitstream one or more syntax elements that define the specific position of the fusion candidate in the fusion candidate list, together with syntax elements for the inter-coding type of the spatially non-adjacent blocks of the current block. In such examples, the video decoder 30 may parse such syntax elements and decode the related video data accordingly.
  • FIG. 2 shows a schematic / conceptual block diagram of an example of a video encoder 20 for implementing the techniques of the present invention.
  • In the example of FIG. 2, the video encoder 20 includes a residual calculation unit 204, a transform processing unit 206, a quantization unit 208, an inverse quantization unit 210, an inverse transform processing unit 212, a reconstruction unit 214, a buffer 216, a loop filter unit 220, a decoded picture buffer (DPB) 230, a prediction processing unit 260, and an entropy encoding unit 270.
  • the prediction processing unit 260 may include an inter prediction unit 244, an intra prediction unit 254, and a mode selection unit 262.
  • the inter prediction unit 244 may include a motion estimation unit and a motion compensation unit (not shown).
  • the video encoder 20 shown in FIG. 2 may also be referred to as a hybrid video encoder or a video encoder according to a hybrid video codec.
  • The residual calculation unit 204, the transform processing unit 206, the quantization unit 208, the prediction processing unit 260, and the entropy encoding unit 270 form the forward signal path of the encoder 20, while the inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the buffer 216, the loop filter 220, the decoded picture buffer (DPB) 230, and the prediction processing unit 260 form the backward signal path of the encoder, where the backward signal path of the encoder corresponds to the signal path of the decoder (see decoder 30 in FIG. 3).
  • the encoder 20 receives a picture 201 or a block 203 of the picture 201 through, for example, an input 202, for example, a picture in a picture sequence forming a video or a video sequence.
  • the picture block 203 can also be called the current picture block or the picture block to be encoded.
  • the picture 201 can be called the current picture or the picture to be encoded (especially in video encoding, to distinguish the current picture from other pictures, such as previously encoded and / or decoded pictures of the same video sequence, i.e., the video sequence that also includes the current picture).
  • An embodiment of the encoder 20 may include a segmentation unit (not shown in FIG. 2) for segmenting the picture 201 into multiple blocks, such as the block 203, and generally into multiple non-overlapping blocks.
  • the segmentation unit can be used to use the same block size, and a corresponding grid defining the block size, for all pictures of the video sequence, or to change the block size between pictures or between subsets or groups of pictures, and to split each picture into the corresponding blocks.
  • for VVC, the next-generation video coding standard, the quad-tree plus binary-tree (QTBT) block partitioning technique proposed by J. An et al. in "VCEG Recommendation COM16-C966" was introduced; simulations have shown that the proposed QTBT structure is more efficient than the quad-tree structure used in HEVC.
  • the CU can have a square or rectangular shape.
  • the coding tree unit (CTU) is first divided by the quad tree structure.
  • the quad-tree leaf nodes can be further divided by a binary tree structure.
  • there are two types of partitioning in binary tree partitioning: symmetric horizontal partitioning and symmetric vertical partitioning.
  • in each case, a node is divided by splitting it horizontally or vertically along the middle.
  • binary-tree leaf nodes are called coding units (CUs) and are used for prediction and transform processing without any further division.
  • CU, PU, and TU have the same size.
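  • As a non-normative illustration of the QTBT partitioning just described, the following sketch recursively splits a CTU first by quad-tree and then by symmetric horizontal / vertical binary splits. The thresholds and function names are illustrative assumptions (a real encoder chooses splits by rate-distortion optimization), not VVC's actual decision logic.

```cpp
// Sketch (illustrative parameters): recursive QTBT partitioning. A node is
// first quad-split; once binary splitting starts, quad splits are no longer
// allowed, matching the quad-tree-then-binary-tree structure above.
#include <cstdio>

enum class Split { None, Quad, BinHor, BinVer };

// Hypothetical decision function; real encoders decide by rate-distortion.
Split chooseSplit(int w, int h, bool quadAllowed) {
    if (quadAllowed && w == h && w > 32) return Split::Quad;   // assumed threshold
    if (w > 8 && w >= h)                 return Split::BinVer; // split the width
    if (h > 8)                           return Split::BinHor; // split the height
    return Split::None;                                        // leaf node: a CU
}

void partition(int x, int y, int w, int h, bool quadAllowed) {
    switch (chooseSplit(w, h, quadAllowed)) {
    case Split::Quad:   // four equal quadrants; quad splits may continue
        partition(x,         y,         w / 2, h / 2, true);
        partition(x + w / 2, y,         w / 2, h / 2, true);
        partition(x,         y + h / 2, w / 2, h / 2, true);
        partition(x + w / 2, y + h / 2, w / 2, h / 2, true);
        break;
    case Split::BinVer: // symmetric vertical binary split
        partition(x,         y, w / 2, h, false);
        partition(x + w / 2, y, w / 2, h, false);
        break;
    case Split::BinHor: // symmetric horizontal binary split
        partition(x, y, w, h / 2, false);
        partition(x, y + h / 2, w, h / 2, false);
        break;
    case Split::None:
        std::printf("CU at (%d,%d) size %dx%d\n", x, y, w, h);
        break;
    }
}
```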
  • a CU is sometimes composed of coding blocks (CBs) with different color components.
  • for P and B slices in 4:2:0 chroma format, one CU contains one luma CB and two chroma CBs.
  • CUs are sometimes composed of CBs of a single component; for example, in the case of I slices, one CU contains only one luma CB or only two chroma CBs.
  • U.S. Patent Application Publication No. 20170208336 proposes a block partition structure called multi-type-tree (MTT) to replace the CU structures based on QT, BT, and / or QTBT.
  • the MTT partition structure is still a recursive tree structure.
  • a plurality of different partition structures (for example, three or more) are used.
  • three or more different partition structures may be used at each depth of the tree structure for each corresponding non-leaf node of the tree structure.
  • the depth of a node in the tree structure may refer to the length (eg, the number of divisions) of a path from the node to the root of the tree structure.
  • a partition structure may generally refer to how many different blocks a block can be divided into.
  • the partition structure can be a quad tree partition structure that can divide a block into four blocks, a binary tree partition structure that can divide a block into two blocks, or a triple tree partition structure that can divide a block into three blocks.
  • the triple-tree partition structure does not divide a block along its center.
  • the partition structure can have multiple different partition types.
  • the division type may additionally define how to divide the block, including symmetrical or asymmetrical division, uniform or uneven division, and / or horizontal or vertical division.
  • the encoder 100 may be used to further divide the subtree using a particular partition type of one of three or more partition structures.
  • the encoder 100 may be used to determine specific partition types from QT, BT, triple-tree (TT), and other partition structures.
  • the QT partition structure may include a square quadtree or a rectangular quadtree partition type.
  • the encoder 100 may use a square quadtree partition to divide a square block by dividing the block horizontally and vertically along the center into four square blocks of equal size.
  • the encoder 100 may use a rectangular quadtree partition to divide a rectangular (eg, non-square) block by dividing the rectangular block horizontally and vertically along the center into four equal-sized rectangular blocks.
  • the BT partition structure may include at least one of a horizontally symmetric binary tree, a vertically symmetric binary tree, a horizontally asymmetric binary tree, or a vertically asymmetric binary tree partition type.
  • the encoder 100 may be used to horizontally divide a block along the center of the block into two symmetrical blocks of the same size.
  • the encoder 100 may be used to bisect a block vertically into two symmetrical blocks of the same size along the center of the block.
  • the encoder 100 may be used to horizontally divide a block into two blocks of different sizes.
  • one block may be 1/4 of the size of the parent block, while the other block may be 3/4 of the size of the parent block, similar to the PART_2N×nU or PART_2N×nD partition types.
  • the encoder 100 may be used to vertically divide a block into two blocks of different sizes.
  • one block may be 1/4 of the size of the parent block, and the other block may be 3/4 of the size of the parent block, similar to the PART_nL×2N or PART_nR×2N partition types.
  • the asymmetric binary tree partition type may divide the parent block into sections of different sizes.
  • one child block may be 3/8 of the parent block, and the other child block may be 5/8 of the parent block.
  • this type of division can be vertical or horizontal.
  • the difference between the TT partition structure and the QT or BT structure types is that the TT partition structure does not partition a block along the center; instead, the center area of the block is kept together in one sub-block. Unlike QT, which generates four blocks, or the binary tree, which generates two blocks, three blocks are generated by a TT partition.
  • Example partition types according to the TT partition structure include symmetric partition types (both horizontal and vertical) and asymmetric partition types (both horizontal and vertical).
  • the symmetric partition types according to the TT partition structure may be unequal / non-uniform or equal / uniform.
  • the asymmetric partition types according to the TT partition structure are unequal / non-uniform.
  • the TT partition structure may include at least one of the following partition types: horizontal equal / uniform symmetric triple-tree, vertical equal / uniform symmetric triple-tree, horizontal unequal / non-uniform symmetric triple-tree, vertical unequal / non-uniform symmetric triple-tree, horizontal unequal / non-uniform asymmetric triple-tree, or vertical unequal / non-uniform asymmetric triple-tree partition type.
  • the unequal / non-uniform symmetric triple-tree partition type is a partition type that is symmetric about the center line of the block but in which at least one of the three resulting blocks has a size different from the other two.
  • a preferred example is where the side blocks are each 1/4 of the block size and the center block is 1/2 of the block size.
  • the equal / uniform symmetric triple-tree partition type is a partition type that is symmetric about the center line of the block and in which the resulting blocks are all the same size. This type of division is possible if the block height or width (depending on whether the division is horizontal or vertical) is an integer multiple of 3.
  • the unequal / non-uniform asymmetric triple-tree partition type is a partition type that is not symmetric about the center line of the block and in which at least one of the resulting blocks is not the same size as the other two.
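  • To make the size relationships above concrete, the following sketch (illustrative names, not the reference implementation of any standard) computes the sub-block sizes along the split dimension for the symmetric BT, 1/4-3/4 asymmetric BT, and triple-tree splits described above.

```cpp
// Sketch: sub-block sizes along the split dimension (dim = block width for a
// vertical split, block height for a horizontal split). Names are illustrative.
#include <vector>

std::vector<int> symmetricBinary(int dim)  { return { dim / 2, dim / 2 }; }
std::vector<int> asymmetricBinary(int dim) { return { dim / 4, 3 * dim / 4 }; }      // PART_2NxnU-like 1/4-3/4 split
std::vector<int> symmetricTriple(int dim)  { return { dim / 4, dim / 2, dim / 4 }; } // side blocks 1/4, center 1/2
std::vector<int> uniformTriple(int dim)    { return { dim / 3, dim / 3, dim / 3 }; } // requires dim % 3 == 0
```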
  • the prediction processing unit 260 of the video encoder 20 may be used to perform any combination of the aforementioned segmentation techniques.
  • like picture 201, block 203 again is or can be regarded as a two-dimensional array or matrix of sampling points with luminance values (sample values), although of smaller size than picture 201.
  • block 203 may include, for example, one sample array (e.g., a luma array in the case of a black-and-white picture 201), three sample arrays (e.g., one luma array and two chroma arrays in the case of a color picture), or any other number and / or kind of arrays depending on the applied color format.
  • the number of sampling points in the horizontal and vertical directions (or axes) of the block 203 defines the size of the block 203.
  • the encoder 20 shown in FIG. 2 is used to encode a picture 201 block by block, for example, performing encoding and prediction on each block 203.
  • the residual calculation unit 204 is configured to calculate the residual block 205 based on the picture block 203 and the prediction block 265 (further details of the prediction block 265 are provided below), for example by subtracting, sample by sample (pixel by pixel), the sample values of the prediction block 265 from the sample values of the picture block 203, to obtain the residual block 205 in the sample domain.
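  • A minimal sketch of this sample-by-sample residual computation (function and type names are illustrative assumptions, not the unit's actual interface):

```cpp
// Sketch: residual computation as performed by residual calculation unit 204.
#include <cstdint>
#include <vector>

std::vector<int16_t> computeResidual(const std::vector<uint8_t>& block,     // picture block 203
                                     const std::vector<uint8_t>& predBlock) // prediction block 265
{
    std::vector<int16_t> residual(block.size());
    for (size_t i = 0; i < block.size(); ++i) {
        // The residual may be negative, so a wider signed type is used.
        residual[i] = static_cast<int16_t>(block[i]) - static_cast<int16_t>(predBlock[i]);
    }
    return residual;
}
```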
  • the transform processing unit 206 is configured to apply a transform such as discrete cosine transform (DCT) or discrete sine transform (DST) on the sample values of the residual block 205 to obtain transform coefficients 207 in the transform domain.
  • the transform coefficient 207 may also be referred to as a transform residual coefficient, and represents a residual block 205 in a transform domain.
  • the transform processing unit 206 may be used to apply an integer approximation of DCT / DST, such as the transform specified for HEVC / H.265. Compared to an orthogonal DCT transform, this integer approximation is usually scaled by a factor. To maintain the norm of the residual blocks processed by the forward and inverse transforms, an additional scaling factor is applied as part of the transform process.
  • the scaling factor is usually selected based on certain constraints, for example, the scaling factor being a power of two for shift operations, the bit depth of the transform coefficients, and the trade-off between accuracy and implementation cost.
  • for example, a specific scaling factor is specified for the inverse transform on the decoder 30 side (and for the corresponding inverse transform on the encoder 20 side by, for example, the inverse transform processing unit 212), and, accordingly, a corresponding scaling factor may be specified for the forward transform on the encoder 20 side through the transform processing unit 206.
  • the quantization unit 208 is used to quantize the transform coefficients 207, for example, by applying scalar quantization or vector quantization to obtain the quantized transform coefficients 209.
  • the quantized transform coefficient 209 may also be referred to as a quantized residual coefficient 209.
  • the quantization process can reduce the bit depth associated with some or all of the transform coefficients 207. For example, n-bit transform coefficients may be rounded down to m-bit transform coefficients during quantization, where n is greater than m.
  • the degree of quantization can be modified by adjusting the quantization parameter (QP). For scalar quantization, for example, different scales can be applied to achieve finer or coarser quantization.
  • a smaller quantization step size corresponds to a finer quantization, while a larger quantization step size corresponds to a coarser quantization.
  • An appropriate quantization step size can be indicated by a quantization parameter (QP).
  • the quantization parameter may be an index of a predefined set of suitable quantization steps.
  • smaller quantization parameters may correspond to fine quantization (smaller quantization step size)
  • larger quantization parameters may correspond to coarse quantization (larger quantization step size)
  • quantization may include division by a quantization step size, while the corresponding or inverse dequantization performed, for example, by the inverse quantization unit 210 may include multiplication by the quantization step size.
  • Embodiments according to some standards such as HEVC may use quantization parameters to determine the quantization step size.
  • the quantization step size can be calculated from the quantization parameter using a fixed-point approximation of an equation that includes division. Additional scaling factors may be introduced for quantization and inverse quantization to restore the norm of the residual block, which may be modified because of the scaling used in the fixed-point approximation of the equation for the quantization step size and quantization parameter.
  • inverse transform and inverse quantization scales can be combined.
  • a custom quantization table can be used and signaled from the encoder to the decoder in, for example, a bitstream. Quantization is a lossy operation, where the larger the quantization step, the greater the loss.
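  • The QP-to-step-size relationship described above can be illustrated with a minimal floating-point sketch, assuming the HEVC-style relation Qstep = 2^((QP-4)/6), in which the step size doubles every 6 QP values; real codecs use fixed-point approximations and scaling tables as discussed, so this is not the normative computation.

```cpp
// Sketch (floating point, assumed HEVC-style relation): QP-derived step size,
// scalar quantization, and the corresponding inverse quantization.
#include <cmath>
#include <cstdint>

double qpToStepSize(int qp) { return std::pow(2.0, (qp - 4) / 6.0); }

int32_t quantize(int32_t coeff, double step)   { return static_cast<int32_t>(std::lround(coeff / step)); }
int32_t dequantize(int32_t level, double step) { return static_cast<int32_t>(std::lround(level * step)); }
// Note: quantize followed by dequantize loses information; a larger step
// (larger QP) means coarser quantization and greater loss, as stated above.
```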
  • the inverse quantization unit 210 is configured to apply the inverse quantization of the quantization unit 208 to the quantized coefficients to obtain the dequantized coefficients 211, for example by applying, based on or using the same quantization step size as the quantization unit 208, the inverse of the quantization scheme applied by the quantization unit 208.
  • the dequantized coefficients 211 may also be referred to as dequantized residual coefficients 211, and correspond to the transform coefficients 207, although they usually differ from the transform coefficients because of the loss caused by quantization.
  • the inverse transform processing unit 212 is used to apply the inverse transform of the transform applied by the transform processing unit 206, for example an inverse discrete cosine transform (DCT) or an inverse discrete sine transform (DST), to obtain an inverse transform block 213 in the sample domain.
  • the inverse transform block 213 may also be referred to as an inverse transformed dequantized block 213 or an inverse transformed residual block 213.
  • the reconstruction unit 214 (for example, the summer 214) is used to add the inverse transform block 213 (that is, the reconstructed residual block 213) to the prediction block 265 to obtain the reconstructed block 215 in the sample domain.
  • the sample values of the reconstructed residual block 213 are added to the sample values of the prediction block 265.
  • a buffer unit 216 (or simply "buffer" 216), such as a line buffer 216, is used to buffer or store the reconstructed block 215 and corresponding sample values, for example, for intra prediction.
  • the encoder may be used to use any unfiltered reconstructed block and / or corresponding sample values stored in the buffer unit 216 for any category of estimation and / or prediction, such as intra-frame prediction.
  • an embodiment of the encoder 20 may be configured such that the buffer unit 216 is used not only for storing the reconstructed blocks 215 for intra prediction 254, but also for the loop filter unit 220 (not shown in FIG. 2), and / or, for example, such that the buffer unit 216 and the decoded picture buffer unit 230 form one buffer.
  • Other embodiments may be used to use the filtered block 221 and / or blocks or samples from the decoded picture buffer 230 (neither shown in FIG. 2) as the input or basis for the intra prediction 254.
  • the loop filter unit 220 (or simply "loop filter" 220) is configured to filter the reconstructed block 215 to obtain the filtered block 221, so as to smooth pixel transitions or otherwise improve video quality.
  • the loop filter unit 220 is intended to represent one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or other filters, such as a bilateral filter, Adaptive loop filters (adaptive loop filters, ALF), or sharpening or smoothing filters, or cooperative filters.
  • the loop filter unit 220 is shown as an in-loop filter in FIG. 2, in other configurations, the loop filter unit 220 may be implemented as a post-loop filter.
  • the filtered block 221 may also be referred to as a filtered reconstructed block 221.
  • the decoded picture buffer 230 may store the reconstructed encoded block after the loop filter unit 220 performs a filtering operation on the reconstructed encoded block.
  • an embodiment of the encoder 20 may be used to output loop filter parameters (e.g., sample adaptive offset information), for example directly, or after entropy encoding by the entropy encoding unit 270 or any other entropy encoding unit, so that, for example, the decoder 30 can receive and apply the same loop filter parameters for decoding.
  • the decoded picture buffer (DPB) 230 may be a reference picture memory that stores reference picture data for the video encoder 20 to encode video data.
  • DPB 230 can be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) (including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), and resistive RAM (RRAM)) or other types of memory devices.
  • the DPB 230 and the buffer 216 may be provided by the same memory device or separate memory devices.
  • a decoded picture buffer (DPB) 230 is used to store the filtered block 221.
  • the decoded picture buffer 230 may be further used to store other previously filtered blocks of the same current picture or of different pictures, such as previously reconstructed and filtered blocks 221, and may provide complete previously reconstructed, i.e., decoded, pictures (and corresponding reference blocks and samples) and / or partially reconstructed current pictures (and corresponding reference blocks and samples), for example for inter prediction.
  • a decoded picture buffer (DPB) 230 is used to store the reconstructed block 215.
  • the prediction processing unit 260, also referred to as the block prediction processing unit 260, is used to receive or obtain the block 203 (the current block 203 of the current picture 201) and reconstructed picture data, for example reference samples of the same (current) picture from the buffer 216 and / or reference picture data 231 of one or more previously decoded pictures from the decoded picture buffer 230, and to process such data for prediction, i.e., to provide a prediction block 265, which may be an inter-predicted block 245 or an intra-predicted block 255.
  • the mode selection unit 262 may be used to select a prediction mode (such as an intra or inter prediction mode) and / or a corresponding prediction block 245 or 255 used as the prediction block 265 to calculate the residual block 205 and reconstruct the reconstructed block 215.
  • an embodiment of the mode selection unit 262 may be used to select a prediction mode (e.g., from those prediction modes supported by the prediction processing unit 260) that provides the best match or the minimum residual (minimum residual means better compression for transmission or storage), or that provides minimum signaling overhead (minimum signaling overhead means better compression for transmission or storage), or that considers or balances both.
  • the mode selection unit 262 may be used to determine the prediction mode based on rate distortion optimization (RDO), that is, to select the prediction mode that provides the minimum rate distortion, or to select a prediction mode whose associated rate distortion at least fulfills the prediction mode selection criterion.
  • the encoder 20 is used to determine or select the best or optimal prediction mode from a set of (predetermined) prediction modes.
  • the prediction mode set may include, for example, an intra prediction mode and / or an inter prediction mode.
  • the set of intra prediction modes may include 35 different intra prediction modes, for example non-directional modes such as DC (or mean) mode and planar mode, or directional modes as defined in H.265, or may include 67 different intra prediction modes, for example non-directional modes such as DC (or mean) mode and planar mode, or directional modes as defined in the developing H.266.
  • the set of (possible) inter prediction modes depends on the available reference pictures (i.e., at least the partially decoded pictures previously stored in the DPB 230) and other inter prediction parameters, for example on whether the entire reference picture or only a part of it, such as a search window area around the area of the current block, is used to search for the best matching reference block, and / or on whether pixel interpolation, such as half-pixel and / or quarter-pixel interpolation, is applied.
  • a skip mode and / or a direct mode can also be applied.
  • the prediction processing unit 260 may be further configured to divide the block 203 into smaller block partitions or sub-blocks, for example, using a quad-tree (QT) partition, a binary-tree (BT) partition, or Triple-tree (TT) segmentation, or any combination thereof, and for performing predictions, for example, for each of block partitions or sub-blocks, where the mode selection includes selecting the tree structure of the partitioned block 203 and the selection applied to the block The prediction mode for each of the partitions or sub-blocks.
  • the inter prediction unit 244 may include a motion estimation (ME) unit (not shown in FIG. 2) and a motion compensation (MC) unit (not shown in FIG. 2).
  • the motion estimation unit is configured to receive or obtain the picture block 203 (the current picture block 203 of the current picture 201) and a decoded picture 231, or at least one or more previously reconstructed blocks, for example reconstructed blocks of one or more other / different previously decoded pictures 231, for motion estimation.
  • the video sequence may include the current picture and the previously decoded picture 31, or in other words, the current picture and the previously decoded picture 31 may be part of the picture sequence forming the video sequence or form the picture sequence.
  • the construction of the fusion candidate list of the present application can be implemented by the motion estimation module.
  • the encoder 20 may be used to select a reference block from a plurality of reference blocks of the same or different pictures among multiple other pictures, and to provide the reference picture (or reference picture index) and / or an offset (spatial offset) between the position (X, Y coordinates) of the reference block and the position of the current block to the motion estimation unit (not shown in FIG. 2) as inter prediction parameters.
  • This offset is also called a motion vector (MV).
  • the motion compensation unit is used for obtaining, for example, receiving inter prediction parameters, and performing inter prediction based on or using the inter prediction parameters to obtain the inter prediction block 245.
  • Motion compensation performed by a motion compensation unit may include taking out or generating a prediction block based on a motion / block vector determined through motion estimation (possibly performing interpolation on sub-pixel accuracy). Interpolation filtering can generate additional pixel samples from known pixel samples, potentially increasing the number of candidate prediction blocks that can be used to encode picture blocks.
  • the motion compensation unit 246 may locate the prediction block pointed to by the motion vector in a reference picture list.
  • Motion compensation unit 246 may also generate syntax elements associated with blocks and video slices for use by video decoder 30 when decoding picture blocks of video slices.
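  • A minimal integer-pel sketch of the motion compensation step described above (names are illustrative assumptions; a real codec would clip coordinates and apply interpolation filters for half- and quarter-pixel accuracy):

```cpp
// Sketch: fetch the prediction block that a motion vector points to in a
// reference picture, as done conceptually by motion compensation unit 246.
#include <cstdint>
#include <vector>

struct MotionVector { int x; int y; }; // spatial offset in whole pixels

void motionCompensate(const std::vector<uint8_t>& refPicture, int refStride,
                      int blockX, int blockY, int blockW, int blockH,
                      MotionVector mv, std::vector<uint8_t>& predBlock)
{
    predBlock.resize(static_cast<size_t>(blockW) * blockH);
    for (int y = 0; y < blockH; ++y)
        for (int x = 0; x < blockW; ++x)
            // Copy the sample the MV points at; bounds clipping and
            // sub-pixel interpolation are omitted for brevity.
            predBlock[y * blockW + x] =
                refPicture[(blockY + mv.y + y) * refStride + (blockX + mv.x + x)];
}
```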
  • the intra prediction unit 254 is configured to obtain, for example receive, the picture block 203 (the current picture block) and one or more previously reconstructed blocks of the same picture, for example reconstructed neighboring blocks, for intra estimation.
  • the encoder 20 may be used to select an intra prediction mode from a plurality of (predetermined) intra prediction modes.
  • embodiments of the encoder 20 may be used to select an intra prediction mode based on optimization criteria, for example based on a minimum residual (e.g., the intra prediction mode that provides the prediction block 255 most similar to the current picture block 203) or a minimum rate distortion.
  • the intra prediction unit 254 is further configured to determine the intra prediction block 255 based on the intra prediction parameters of the selected intra prediction mode. In any case, after selecting an intra prediction mode for a block, the intra prediction unit 254 is further configured to provide the intra prediction parameters to the entropy encoding unit 270, that is, to provide information indicating the selected intra prediction mode for the block. In one example, the intra prediction unit 254 may be used to perform any combination of the intra prediction techniques described below.
  • the entropy encoding unit 270 is configured to apply an entropy encoding algorithm or scheme (for example, a variable length coding (VLC) scheme, a context adaptive VLC (CAVLC) scheme, an arithmetic coding scheme, context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding method or technique) to some or all of the quantized residual coefficients 209, inter prediction parameters, intra prediction parameters, and / or loop filter parameters (or to apply none of them), to obtain the encoded picture data 21.
  • the encoded picture data 21 is output in the form of, for example, an encoded bit stream 21.
  • the encoded bitstream may be transmitted to video decoder 30 or archived for later transmission or retrieval by video decoder 30.
  • the entropy encoding unit 270 may also be used to entropy encode other syntax elements of the current video slice that is being encoded.
  • video encoder 20 may be used to encode a video stream.
  • the non-transform-based encoder 20 may directly quantize the residual signal without a transform processing unit 206 for certain blocks or frames.
  • the encoder 20 may have a quantization unit 208 and an inverse quantization unit 210 combined into a single unit.
  • FIG. 3 illustrates an exemplary video decoder 30 for implementing the technique of the present application, that is, performing fusion candidate list construction of a block to be decoded (current block) and decoding the compressed image based on the constructed fusion candidate list.
  • the video decoder 30 is configured to receive, for example, encoded picture data (eg, an encoded bit stream) 21 encoded by the encoder 20 to obtain a decoded picture 231.
  • video decoder 30 receives video data from video encoder 20, such as an encoded video bitstream and associated syntax elements representing picture blocks of encoded video slices.
  • the decoder 30 includes an entropy decoding unit 304, an inverse quantization unit 310, an inverse transform processing unit 312, a reconstruction unit 314 (such as a summer 314), a buffer 316, a loop filter 320, a decoded picture buffer 330, and a prediction processing unit 360.
  • the prediction processing unit 360 may include an inter prediction unit 344, an intra prediction unit 354, and a mode selection unit 362.
  • video decoder 30 may perform a decoding pass that is substantially inverse to the encoding pass described with reference to video encoder 20 of FIG. 2.
  • the entropy decoding unit 304 is configured to perform entropy decoding on the encoded picture data 21 to obtain, for example, quantized coefficients 309 and / or decoded coding parameters (not shown in FIG. 3), for example any or all of inter prediction parameters, intra prediction parameters, (filtered) loop filter parameters, and / or other syntax elements.
  • the entropy decoding unit 304 is further configured to forward the inter prediction parameters, the intra prediction parameters, and / or other syntax elements to the prediction processing unit 360.
  • Video decoder 30 may receive syntax elements at the video slice level and / or the video block level.
  • the inverse quantization unit 310 may be functionally the same as the inverse quantization unit 210
  • the inverse transformation processing unit 312 may be functionally the same as the inverse transformation processing unit 212
  • the reconstruction unit 314 may be functionally the same as the reconstruction unit 214
  • the buffer 316 may be functionally the same as the buffer 216
  • the loop filter 320 may be functionally the same as the loop filter 220
  • the decoded picture buffer 330 may be functionally the same as the decoded picture buffer 230.
  • the prediction processing unit 360 may include an inter prediction unit 344 and an intra prediction unit 354.
  • the inter prediction unit 344 may be functionally similar to the inter prediction unit 244 and the intra prediction unit 354 may be functionally similar to the intra prediction unit 254.
  • the prediction processing unit 360 is generally used to perform block prediction and / or obtain a prediction block 365 from the encoded data 21, and to receive or obtain (explicitly or implicitly) prediction-related parameters and / or information about the selected prediction mode, for example from the entropy decoding unit 304.
  • the intra prediction unit 354 of the prediction processing unit 360 is used to generate a prediction block 365 for a picture block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current frame or picture.
  • the inter prediction unit 344 (e.g., a motion compensation unit) of the prediction processing unit 360 is used to generate a prediction block 365 for a video block of the current video slice based on the motion vectors and the other syntax elements.
  • a prediction block may be generated from a reference picture in a reference picture list.
  • the video decoder 30 may construct the reference frame lists, List 0 and List 1, using default construction techniques based on the reference pictures stored in the DPB 330.
  • the prediction processing unit 360 is configured to determine prediction information for a video block of the current video slice by parsing motion vectors and other syntax elements, and to use the prediction information to generate a prediction block for the current video block being decoded. For example, the prediction processing unit 360 uses some of the received syntax elements to determine the prediction mode (e.g., intra or inter prediction) used to encode the video blocks of the video slice, the inter prediction slice type (e.g., B slice, P slice, or GPB slice), construction information for one or more of the reference picture lists of the slice, the motion vector of each inter-coded video block of the slice, the inter prediction status of each inter-coded video block of the slice, and other information, in order to decode the video blocks of the current video slice.
  • the inverse quantization unit 310 may be used to inverse-quantize (i.e., dequantize) the quantized transform coefficients provided in the bitstream and decoded by the entropy decoding unit 304.
  • the inverse quantization process may include using the quantization parameters calculated by video encoder 20 for each video block in the video slice to determine the degree of quantization that should be applied and also to determine the degree of inverse quantization that should be applied.
  • the inverse transform processing unit 312 is configured to apply an inverse transform (for example, an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients to generate a residual block in the pixel domain.
  • Reconstruction unit 314 (e.g., summer 314) is used to add inverse transform block 313 (i.e., reconstructed residual block 313) to prediction block 365 to obtain reconstructed block 315 in the sample domain, such as by The sample values of the reconstructed residual block 313 are added to the sample values of the prediction block 365.
  • the loop filter unit 320 (in the coding loop or after the coding loop) is used to filter the reconstructed block 315 to obtain the filtered block 321, so as to smooth pixel transitions or otherwise improve video quality.
  • the loop filter unit 320 is intended to represent one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or other filters such as a bilateral filter, Adaptive loop filters (adaptive loop filters, ALF), or sharpening or smoothing filters, or cooperative filters.
  • the loop filter unit 320 is shown as an in-loop filter in FIG. 3, in other configurations, the loop filter unit 320 may be implemented as a post-loop filter.
  • the decoded video block 321 in a given frame or picture is then stored in a decoded picture buffer 330 that stores reference pictures for subsequent motion compensation.
  • the decoder 30 is used, for example, to output a decoded picture 31 through an output 332 for presentation to or review by a user.
  • video decoder 30 may be used to decode the compressed bitstream.
  • the decoder 30 may generate an output video stream without the loop filter unit 320.
  • the non-transform-based decoder 30 may directly inversely quantize the residual signal without the inverse transform processing unit 312 for certain blocks or frames.
  • the video decoder 30 may have an inverse quantization unit 310 and an inverse transform processing unit 312 combined into a single unit.
  • FIG. 4 is an explanatory diagram of an example of a video encoding system 40 including the encoder 20 of FIG. 2 and / or the decoder 30 of FIG. 3 according to an exemplary embodiment.
  • the system 40 can implement the technology of the present application, which is used to construct a fusion candidate list of the current block based on the fusion candidate construction method proposed by the present invention, and to encode or decode an image based on the fusion candidate list.
  • the video encoding system 40 may include an imaging device 41, a video encoder 20, a video decoder 30 (and / or a video encoder implemented by the logic circuit 47 of the processing unit 46), an antenna 42, One or more processors 43, one or more memories 44, and / or a display device 45.
  • the imaging device 41, antenna 42, processing unit 46, logic circuit 47, video encoder 20, video decoder 30, processor 43, memory 44, and / or display device 45 can communicate with each other.
  • video encoding system 40 is shown with video encoder 20 and video decoder 30, in different examples, video encoding system 40 may include only video encoder 20 or only video decoder 30.
  • the video encoding system 40 may include an antenna 42.
  • the antenna 42 may be used to transmit or receive an encoded bit stream of video data.
  • the video encoding system 40 may include a display device 45.
  • the display device 45 may be used to present video data.
  • the logic circuit 47 may be implemented by the processing unit 46.
  • the processing unit 46 may include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, and the like.
  • the video encoding system 40 may also include an optional processor 43, which may similarly include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, and the like.
  • the logic circuit 47 may be implemented by hardware, such as dedicated hardware for video encoding, and the processor 43 may be implemented by general software, operating system, and the like.
  • the memory 44 may be any type of memory, such as volatile memory (e.g., static random access memory (SRAM), dynamic random access memory (DRAM), etc.) or non-volatile memory (e.g., flash memory, etc.).
  • the memory 44 may be implemented by a cache memory.
  • the logic circuit 47 may access the memory 44 (eg, for implementing an image buffer).
  • the logic circuit 47 and / or the processing unit 46 may include a memory (eg, a cache, etc.) for implementing an image buffer or the like.
  • video encoder 20 implemented by logic circuits may include an image buffer (eg, implemented by processing unit 46 or memory 44) and a graphics processing unit (eg, implemented by processing unit 46).
  • the graphics processing unit may be communicatively coupled to the image buffer.
  • the graphics processing unit may include a video encoder 20 implemented by a logic circuit 47 to implement the various modules discussed with reference to FIG. 2 and / or any other encoder system or subsystem described herein.
  • Logic circuits can be used to perform various operations discussed herein.
  • Video decoder 30 may be implemented in a similar manner by logic circuit 47 to implement the various modules discussed with reference to decoder 30 of FIG. 3 and / or any other decoder system or subsystem described herein.
  • video decoder 30 implemented by a logic circuit may include an image buffer (e.g., implemented by the processing unit 46 or the memory 44) and a graphics processing unit (e.g., implemented by the processing unit 46).
  • the graphics processing unit may be communicatively coupled to the image buffer.
  • the graphics processing unit may include a video decoder 30 implemented by a logic circuit 47 to implement various modules discussed with reference to FIG. 3 and / or any other decoder system or subsystem described herein.
  • the antenna 42 of the video encoding system 40 may be used to receive an encoded bit stream of video data.
  • the encoded bitstream may contain data, indicators, index values, mode selection data, etc. related to the encoded video frames discussed herein, such as data related to coding partitions (e.g., transform coefficients or quantized transform coefficients, optional indicators (as discussed), and / or data defining the coding partitions).
  • the video encoding system 40 may also include a video decoder 30 coupled to the antenna 42 and used to decode the encoded bitstream.
  • the display device 45 is used to present video frames.
  • FIG. 5 is a simplified block diagram of an apparatus 500 that can be used as either or both of the source device 12 and the destination device 14 in FIG. 1 according to an exemplary embodiment.
  • the apparatus 500 can implement the technology of the present application, and is used for constructing a fusion candidate list and encoding or decoding an image based on the constructed fusion candidate list.
  • the apparatus 500 may take the form of a computing system including a plurality of computing devices, or take the form of a single computing device such as a mobile phone, tablet computer, laptop computer, notebook computer, desktop computer, and the like.
  • the processor 502 in the apparatus 500 may be a central processing unit.
  • the processor 502 may be any other type of device or multiple devices capable of manipulating or processing information, existing or to be developed in the future.
  • speed and efficiency advantages can be achieved using more than one processor.
  • the memory 504 in the device 500 may be a read-only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device can be used as the memory 504.
  • the memory 504 may include code and data 506 accessed by the processor 502 using the bus 512.
  • the memory 504 may further include an operating system 508 and an application program 510, which contains at least one program that permits the processor 502 to perform the methods described herein.
  • the application program 510 may include applications 1 to N, and applications 1 to N further include a video encoding application that performs the fusion candidate list construction described herein.
  • the device 500 may also include additional memory in the form of a slave memory 514, which may be, for example, a memory card for use with a mobile computing device. Because a video communication session may contain a large amount of information, this information may be stored in whole or in part in the slave memory 514 and loaded into the memory 504 for processing as needed.
  • the apparatus 500 may also include one or more output devices, such as a display 518.
  • the display 518 may be a touch-sensitive display combining a display and a touch-sensitive element operable to sense a touch input.
  • the display 518 may be coupled to the processor 502 through a bus 512.
  • other output devices may be provided that allow the user to program or otherwise use the device 500, or provide other output devices as an alternative to the display 518.
  • the display can be implemented in different ways, including as a liquid crystal display (LCD), a cathode-ray tube (CRT) display, a plasma display, or a light emitting diode (LED) display, such as an organic LED (OLED) display.
  • the apparatus 500 may further include, or be in communication with, an image sensing device 520, for example a camera or any other image sensing device, now existing or developed in the future, that can sense an image, such as an image of a user operating the apparatus 500.
  • the image sensing device 520 may be placed directly facing a user of the running apparatus 500.
  • the position and optical axis of the image sensing device 520 may be configured such that its field of view includes an area immediately adjacent to the display 518 and the display 518 is visible from the area.
  • the device 500 may also include, or be in communication with, a sound sensing device 522, for example a microphone or any other sound sensing device, now existing or developed in the future, that can sense sound near the device 500.
  • the sound sensing device 522 may be placed directly facing the user of the operating device 500 and may be used to receive a sound, such as a voice or other sound, emitted by the user when the device 500 is running.
  • the processor 502 and memory 504 of the apparatus 500 are shown in FIG. 5 as being integrated in a single unit, other configurations may be used.
  • the operation of the processor 502 may be distributed among multiple directly-coupled machines (each machine has one or more processors), or distributed in a local area or other network.
  • the memory 504 may be distributed among multiple machines, such as a network-based memory or a memory among multiple machines running the apparatus 500.
  • the bus 512 of the device 500 may be formed of multiple buses.
  • the slave memory 514 may be directly coupled to other components of the device 500 or may be accessed through a network, and may include a single integrated unit, such as one memory card, or multiple units, such as multiple memory cards. Therefore, the apparatus 500 can be implemented in various configurations.
  • FIG. 11 is a flowchart of example operations of a method for constructing a fusion candidate list according to an embodiment of the present invention, performed by the video encoder 20 or the video decoder 30 shown in FIG. 1.
  • One or more functional units of the video encoder 20 or the video decoder 30 include a prediction processing unit 260/360, which can be used to execute the method of FIG. 11.
  • an improved method for updating the historical candidate motion information list is proposed. This method is applied to inter prediction and allows the historical candidate motion information list to be reconstructed at the CTU (coding tree unit) level; without adding an additional storage area, it retains considerable coding efficiency and is more conducive to designing row-level and CTU-level codec parallelism.
  • this method can ensure that the coding quality is basically not degraded during inter coding while greatly reducing the encoding and decoding time.
  • with reference to FIG. 11, the inter prediction method based on the candidate list includes:
  • S1101: initialize a historical candidate motion information list corresponding to the current coding tree unit, where the historical candidate motion information list includes N storage spaces used to store historical candidate motion information, and the initialized historical candidate motion information list includes at least M vacant storage spaces, M ≤ N, M and N being integers; the current coding tree unit is included in a coding tree unit set (slice) composed of multiple coding tree units, and the current coding tree unit is not the first one of the coding tree unit set in the predetermined processing order;
  • the initialization of the historical candidate motion information list can also be replaced by the following manner:
  • coding tree units are generally processed in a row-by-row operation mode. In this operation mode, the historical candidate motion information list is initialized if the current coding tree unit is the first of a coding tree unit row, or if the current coding tree unit is the first of a parallel coding tree unit set (a parallel coding tree unit set is composed of K consecutive coding tree units located in the same coding tree unit row, where K is greater than or equal to 1 and less than the total number L of coding tree units in a coding tree unit row, and L is greater than or equal to 2).
  • Initializing the historical candidate motion information list helps to improve the coding efficiency.
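  • A minimal sketch of such a CTU-level history candidate list and its initialization condition, assuming illustrative names and a fixed capacity N = 16 (the patent does not fix these values):

```cpp
// Sketch: a history candidate motion information list that is re-initialized
// (emptied) at the start of each CTU row or parallel CTU set, as described.
#include <cstddef>
#include <vector>

struct MotionInfo { int mvx, mvy, refIdx; }; // simplified motion information

struct HistoryCandidateList {
    static constexpr std::size_t N = 16;   // N storage spaces (assumed value)
    std::vector<MotionInfo> entries;       // oldest first, newest at the tail

    void initialize() { entries.clear(); } // all N spaces become vacant
};

bool shouldInitialize(int ctuIdxInRow, int parallelSetSize /* K >= 1 */) {
    // Initialize at the first CTU of a row, or at the first CTU of each
    // parallel coding tree unit set of K consecutive CTUs in the row.
    return ctuIdxInRow == 0 || (ctuIdxInRow % parallelSetSize) == 0;
}
```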
  • S1103: add motion information at L positions in the spatially adjacent blocks of the current coding tree unit to the historical candidate motion information list in a predetermined order, where M ≤ L ≤ N, and the L positions in the spatially adjacent blocks are obtained according to a preset rule;
  • S1105: construct a current candidate motion information list of the current coding tree unit or a current candidate motion information list of the current coding unit, where the coding unit is obtained by dividing the coding tree unit; and
  • perform inter prediction on the current coding tree unit or the current coding unit.
  • in this embodiment, the historical candidate motion information list is initialized, that is, an independent historical candidate motion information list corresponding to the current coding tree unit is constructed, thereby cutting the dependency created by the construction of the historical candidate motion information list during coding tree unit coding, so that each coding tree unit can be coded independently according to its own historical candidate motion information list. This retains considerable coding efficiency and is more conducive to designing row-level and CTU-level encoding / decoding parallelism; through parallel processing, the encoding and decoding time can be greatly reduced while ensuring that the coding quality is basically not lost.
  • this method enables the current coding tree unit to construct a completely new historical candidate motion information list, increasing the accuracy of inter prediction.
  • the M positions in the spatially adjacent blocks are obtained as follows: the first candidate motion information is obtained from a preset position in the spatially adjacent blocks, and then, using the position where the first candidate motion information was obtained as a starting point, the remaining M-1 candidate motion information is acquired at a preset step size.
  • the motion vectors at the M positions are usually obtained in sequence, starting from a preset starting position and proceeding at a preset interval.
  • the preset interval can also be called the step size.
  • the step size can be fixed, for example using 4 or 8 pixels as the unit.
  • the step size can also be changed, for example, different step sizes can be set according to the size of the current coding tree unit.
  • the order of adding the motion information / motion vectors of the M positions may be a preset order; for example, in a clockwise order, starting from the spatially adjacent block at the lower-left corner of the current coding tree unit and ending at the spatially adjacent block at the upper-right corner of the current coding tree unit, the motion information at L positions in the spatially adjacent blocks is added to the historical candidate motion information list.
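  • A sketch of one possible position-scanning pattern matching the description above, walking clockwise from the lower-left neighbour up the left edge and then along the top edge to the upper-right neighbour, sampling every `step` pixels; the geometry and names are illustrative assumptions, not the patent's prescribed rule:

```cpp
// Sketch: collect candidate positions from the spatial neighbourhood of the
// current CTU in clockwise order at a fixed step (e.g. 4 or 8 pixels).
#include <utility>
#include <vector>

std::vector<std::pair<int, int>> collectNeighbourPositions(
    int ctuX, int ctuY, int ctuSize, int step)
{
    std::vector<std::pair<int, int>> positions;
    // Left column of neighbours (x = ctuX - 1), bottom to top.
    for (int y = ctuY + ctuSize - 1; y >= ctuY; y -= step)
        positions.emplace_back(ctuX - 1, y);
    // Top row of neighbours (y = ctuY - 1), left to right.
    for (int x = ctuX; x < ctuX + ctuSize; x += step)
        positions.emplace_back(x, ctuY - 1);
    // Upper-right corner neighbour as the end point.
    positions.emplace_back(ctuX + ctuSize, ctuY - 1);
    return positions;
}
```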
  • optionally, the method may further include combining the current candidate motion information list of the current coding tree unit with the historical candidate motion information list, or combining the current candidate motion information list of the current coding unit with the historical candidate motion information list; specifically, the historical candidate motion information may be added to the current candidate motion information list of the current coding tree unit or to the current candidate motion information list of the current coding unit, after which inter prediction is performed based on the current candidate motion information list of the current coding tree unit or of the current coding unit.
  • optionally, the method may further include: obtaining the motion information of the current coding unit according to the combination of the current candidate motion information list of the current coding unit and the historical candidate motion information list, and performing inter prediction on the current coding unit according to the obtained motion information; and updating the historical candidate motion information list based on the motion information of the current coding unit.
  • This method enables the historical candidate motion information list corresponding to the current coding tree unit to be continuously updated to improve the accuracy of inter prediction.
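  • A sketch of combining the two lists as described above, assuming a simple duplicate check and illustrative types (the patent does not prescribe this exact pruning):

```cpp
// Sketch: append history candidates (newest first) to the current candidate
// motion information list, skipping duplicates, up to a maximum list size.
#include <cstddef>
#include <vector>

struct MotionInfo { int mvx, mvy, refIdx; }; // simplified motion information

bool sameMotion(const MotionInfo& a, const MotionInfo& b) {
    return a.mvx == b.mvx && a.mvy == b.mvy && a.refIdx == b.refIdx;
}

void appendHistoryCandidates(std::vector<MotionInfo>& currentList,
                             const std::vector<MotionInfo>& history, // newest at the tail
                             std::size_t maxListSize)
{
    // Consider the most recently added history entries first.
    for (auto it = history.rbegin();
         it != history.rend() && currentList.size() < maxListSize; ++it) {
        bool duplicate = false;
        for (const MotionInfo& c : currentList)
            if (sameMotion(c, *it)) { duplicate = true; break; }
        if (!duplicate) currentList.push_back(*it);
    }
}
```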
  • optionally, the above updating of the historical candidate motion information list can be divided into two cases: if the M positions are not yet filled, the motion information of the current coding unit is added as historical motion information to the vacant storage space, among the M positions of the historical candidate motion information list, closest to the N-M position; or, if the M positions are filled, the earliest-added historical motion information is removed from the historical candidate motion information list according to the first-in-first-out principle, the remaining historical motion information is shifted toward the position of the removed historical motion information, and the motion information of the current coding unit is added as historical motion information to the tail of the historical candidate motion information list, where the end of the historical candidate motion information list containing the most recently added historical motion information is the tail of the list.
  • this method provides flexibility in the application of the historical candidate motion information list: the list can be used for inter prediction of the current block even when it is not completely filled, and when the list is full, the motion information / motion vector of the current coding block can still be used to update it.
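  • A sketch of the first-in-first-out update described above, with illustrative types and an assumed capacity:

```cpp
// Sketch: FIFO update of the history candidate list with the motion
// information of the just-coded coding unit, covering both cases above.
#include <cstddef>
#include <vector>

struct MotionInfo { int mvx, mvy, refIdx; }; // simplified motion information

constexpr std::size_t kHistorySize = 16;     // N storage spaces (assumed)

void updateHistory(std::vector<MotionInfo>& history, const MotionInfo& cuMotion)
{
    if (history.size() < kHistorySize) {
        // Not all positions filled yet: the new entry goes into the next
        // vacant storage space (the tail holds the newest entry).
        history.push_back(cuMotion);
    } else {
        // List full: remove the earliest-added entry (first in, first out),
        // which shifts the remaining entries forward, then append at the tail.
        history.erase(history.begin());
        history.push_back(cuMotion);
    }
}
```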
  • optionally, once the M positions are filled, the motion information / motion vectors of the current coding block may instead no longer be used to update the historical candidate motion information list; that is, if the M positions are not filled, the motion information of the current coding unit is added as historical motion information to the vacant storage space, among the M positions of the historical candidate motion information list, closest to the N-M position; if the M positions are filled, inter prediction processing is performed on the next coding unit based on the current candidate motion information list without further updating.
  • this processing manner may allow parallel processing of coding blocks within the current coding tree unit; specifically, inter prediction may be performed on another coding unit based on the same method as for the current coding unit, where the other coding unit is located after the current coding unit in the preset processing order and belongs to the same coding tree unit as the current coding unit, and the historical motion information list used for inter prediction of the other coding unit includes the historical motion information in the historical motion information list used for inter prediction of the current coding unit.
  • in summary, the inventive solution allows the historical candidate motion information list to be reconstructed at the CTU (coding tree unit) level; while retaining considerable coding efficiency, it is more conducive to designing row-level and CTU-level codec parallelism, and it can greatly reduce the encoding and decoding time during inter coding while ensuring that the coding quality is basically not lost.
  • the initialization process of the historical candidate motion information list can refer to the prior art.
  • the process can be performed in the same way as in the JVET-K0104 proposal (that is, the historical candidate motion information list is emptied at the beginning of a slice (SLICE)).
  • Other methods for initializing the historical candidate motion information list may be adopted, which is not limited by the present invention; in this embodiment, the initialization is to clear the historical candidate motion information list.
  • the coding tree unit is a to-be-processed image block for which a prediction mode can be determined during the coding process; it may be further divided or not divided, and its definition is consistent with the definition of the coding tree unit in HEVC and VVC and corresponds to the macroblock in earlier standards. In the following, the term coding tree unit is used. If the coding tree unit is further divided, multiple coding units are formed; in this case, the coding tree unit can also be understood as a combination of coding units.
  • the motion information of the spatially neighboring blocks of the current coding tree unit is added to the historical candidate motion information list.
  • the motion information of the spatially adjacent image blocks includes the motion information (A, An) of the left spatially adjacent image blocks and the motion information (B, Bn, C) of the upper spatially adjacent image blocks, as shown in FIG. 10.
  • Bn and An in FIG. 10 are motion information extracted by a predetermined rule from all the upper and left adjacent coded / decoded image blocks.
  • the predetermined rule may be extraction at fixed intervals M and N (M and N being positive integers greater than 0), where the interval M applies to extracting motion information from the left adjacent image blocks and the interval N applies to the upper adjacent image blocks.
  • M and N are positive integers greater than 0
  • the interval M is suitable for extracting motion information in the adjacent image block on the left
  • the interval N is applicable to the upper adjacent
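  • As an illustration of such an interval-based rule, the following is a minimal sketch, not the normative process; the 4-sample motion-field granularity, the dictionary-based motion field, and the helper names are assumptions made for this example.

```python
GRID = 4  # assumed granularity (in luma samples) of the stored motion field

def neighbour_positions(ctu_x, ctu_y, ctu_w, ctu_h, m, n):
    """Sample the left neighbour column every m blocks and the upper
    neighbour row every n blocks (the intervals M and N from the text)."""
    left = [(ctu_x - GRID, y) for y in range(ctu_y, ctu_y + ctu_h, m * GRID)]
    top = [(x, ctu_y - GRID) for x in range(ctu_x, ctu_x + ctu_w, n * GRID)]
    return left + top

def collect_history(motion_field, positions, capacity):
    """Fill the historical candidate list from the sampled positions,
    skipping unavailable (e.g. intra-coded) blocks and duplicates."""
    history = []
    for pos in positions:
        mi = motion_field.get(pos)  # motion_field: dict position -> motion info
        if mi is not None and mi not in history and len(history) < capacity:
            history.append(mi)
    return history
```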
  • The extraction of motion information from the image blocks may also follow other predetermined rules; the present invention does not restrict the specific predetermined extraction rule.
  • The adjacent image blocks described above preferably refer to image blocks located in the same slice (SLICE) as the current coding tree unit.
  • If the historical candidate motion information list is not filled after traversing the spatially adjacent image blocks, or if the current coding tree unit is located at the top of a frame image, or at the left of a frame image, any of the following methods may be used to process the unfilled part of the historical candidate motion information list.
  • Motion information from any other source is no longer filled in; instead, the motion information of the current coding unit, obtained when the current coding unit in the current coding tree unit is coded, is added to the historical candidate motion information list as historical candidate motion information.
  • Alternatively, the motion information of coding blocks at preset non-adjacent positions in the spatial domain of the current coding tree unit is filled in.
  • The preset non-adjacent positions may be at a fixed interval from the adjacent positions, or may follow a preset template.
  • The preset positions corresponding to the current coding tree unit, whether in adjacent coding blocks or in preset non-adjacent coding blocks, may be extracted at fixed intervals, or may be extracted according to a specific rule and in a specific order.
  • The historical candidate motion information list is initialized when each coding tree unit starts coding, so that a coding tree unit does not need to wait for the last coding unit of the previous coding tree unit to finish coding before it can start processing; it can instead be processed in parallel with the previous coding tree unit, which greatly saves processing time.
  • After the historical candidate motion information list is initialized, the motion information of a preset number of spatially neighboring image blocks of the current coding tree unit is added to the historical candidate motion information list, where the preset number is a positive integer greater than 0.
  • One possible implementation of adding the motion information of a preset number of spatially adjacent image blocks of the current coding tree unit to the historical candidate motion information list is to additionally add the preset number of motion information entries as new historical candidate motion information on top of the existing entries, regardless of whether the list is full. Another possible implementation is to first clear the existing historical candidate motion information in the list down to a predetermined number of entries according to predetermined rules, and then add the preset number of motion information entries from the spatially adjacent image blocks of the current coding tree unit to the list as new historical candidate motion information.
  • This embodiment applies inter prediction to the current coding tree unit or the current coding unit after the historical candidate motion information list has been initialized or reconstructed as in Embodiment 1 or Embodiment 2, and specifically includes:
  • The method in Embodiment 1 or Embodiment 2 is used to update or reconstruct the historical candidate motion information list; for the specific reconstruction method, see Embodiment 1 or Embodiment 2.
  • Inter prediction of the current coding tree unit, or of at least one coding unit in the current coding tree unit, takes the luma block or chroma block contained in the coding unit or coding tree unit as the basic processing unit.
  • the coding tree unit or coding unit includes one luma coding block and two chroma coding blocks.
  • At least one color-component (chroma or luma) coding block, in the current coding unit or the current coding tree unit, that is undergoing encoding or decoding processing is referred to as the current block.
  • The above-mentioned inter prediction on the current coding tree unit, or on at least one coding unit in the current coding tree unit, may include:
  • historical candidate motion information in the historical candidate motion information list is added to the fused motion information candidate list or the candidate motion vector prediction list.
  • The historical candidate motion information in the historical candidate motion information list may also not be added to the fused motion information candidate list or the candidate motion vector prediction list; the historical candidate motion information list may instead be kept independent and indexed directly when the current block is predicted. If the historical candidate motion information is added to the fused motion information candidate list, the method in the JVET-K0104 proposal may be used, or other methods may be used; the present invention does not limit how the historical candidate motion information is added to the fused motion information candidate list.
  • The position of the historical candidate motion information in the fused motion information candidate list may be before other types of fusion candidates, such as the bi-predictive merge candidates and the zero motion vector fusion candidates.
  • the detailed process of generating the candidate list of fused motion information is as follows.
  • The fused motion information candidate list is constructed from the following candidates: a. up to four spatial fused motion information candidates obtained from five spatially neighboring blocks; b. one temporal fused motion information candidate obtained from two temporally co-located blocks; c. additional fused motion information candidates comprising combined bi-directional prediction candidates and zero motion vector candidates.
  • The first candidates in the fused motion information candidate list are the spatial neighbors. According to the right part of FIG. 6, by checking A1, B1, B0, A0, and B2 sequentially, up to four candidates can be inserted into the merge list in that order.
  • Some additional redundancy checks are performed before all the motion data of a neighboring block is used as a fused motion information candidate. These redundancy checks fall into two categories serving two different purposes: a. avoiding candidates with redundant motion data in the list; b. preventing the merging of two partitions whose redundancy could be expressed by other syntax.
  • With N denoting the number of spatial fused motion information candidates, a complete redundancy check would require N·(N-1)/2 motion data comparisons. In the case of five potential spatial candidates, ten motion data comparisons would be needed to ensure that all candidates in the merge list have different motion data.
  • The checks for redundant motion data have been reduced to a subset, which achieves a significant reduction in comparison logic while maintaining coding efficiency.
  • No more than two comparisons are performed per candidate, resulting in a total of five comparisons. Given the order {A1, B1, B0, A0, B2}, B1 checks only A1, B0 checks only B1, A0 checks only A1, and B2 checks A1 and B1.
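  • A sketch of this reduced redundancy check; the dictionary layout and the tuple representation of motion data are assumptions made for this example.

```python
CHECK_ORDER = ["A1", "B1", "B0", "A0", "B2"]
CHECK_AGAINST = {"A1": [], "B1": ["A1"], "B0": ["B1"],
                 "A0": ["A1"], "B2": ["A1", "B1"]}

def spatial_merge_candidates(neighbours, max_spatial=4):
    """neighbours: dict name -> motion-data tuple, or None if unavailable."""
    out = []
    for name in CHECK_ORDER:
        cand = neighbours.get(name)
        if cand is None:
            continue
        # compare only against the designated subset, not all earlier candidates
        if any(neighbours.get(o) == cand for o in CHECK_AGAINST[name]):
            continue
        out.append(cand)
        if len(out) == max_spatial:
            break
    return out
```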
  • For the partition redundancy check, consider a 2N×N partition whose bottom PU would merge with the top PU by selecting candidate B1. This would result in one CU having two PUs with the same motion data, which could equally be signaled as a 2N×2N CU. Overall, this check applies to all second PUs of the rectangular and asymmetric partitions 2N×N, 2N×nU, 2N×nD, N×2N, nR×2N, and nL×2N. Note that for the spatial fused motion information candidates only these redundancy checks are performed, and the motion data is copied from the candidate blocks as-is; therefore, no motion vector scaling is needed here.
  • The motion vector of the temporal fused motion information candidate is obtained in the same way as the TMVP. Since a fused motion information candidate comprises all motion data while the TMVP is only one motion vector, obtaining the entire motion data depends only on the slice type. For bi-directional prediction slices, a TMVP is obtained for each reference picture list; depending on the availability of the TMVP for each list, the prediction type is set to bi-directional prediction or to the list for which a TMVP is available. All related reference picture indexes are set equal to zero. For uni-directional prediction slices, only the TMVP of list 0 is obtained, together with a reference picture index equal to zero.
  • The length of the fused motion information candidate list is fixed. After the spatial and temporal fused motion information candidates have been added, the list may still not have reached its fixed length. To compensate for the coding efficiency loss that would occur with non-length-adaptive list index signaling, additional candidates are generated. Depending on the slice type, two kinds of candidates can be used to completely populate the list: a. combined bi-directional prediction candidates; b. zero motion vector candidates.
  • Another candidate can be generated from existing candidates. This is done by copying (Δx0, Δy0, Δt0) from one candidate, such as the first, and (Δx1, Δy1, Δt1) from another, such as the second.
  • Different combinations are predefined and given in Table 1.1.
  • Zero motion vector candidates are then calculated to complete the list. A zero motion vector candidate has one zero-displacement motion vector for uni-directional prediction slices and two zero-displacement motion vectors for bi-directional prediction slices.
  • The reference index is set equal to zero and is incremented by one for each additional candidate until the maximum number of reference indexes is reached. If that point is reached and further candidates are still missing, those candidates are created with a reference index equal to zero. For all these additional candidates no redundancy check is performed, since the results show that omitting these checks does not cause a loss of coding efficiency.
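  • A hedged sketch of this list-filling step; the candidate representation (per-list (mv, ref_idx) pairs) and the pair order are illustrative assumptions, not the normative derivation.

```python
PAIR_ORDER = [(0, 1), (1, 0), (0, 2), (2, 0)]  # illustrative predefined pairs

def fill_merge_list(cands, list_size, num_ref_idx, bi_slice):
    """cands: list of dicts {'l0': (mv, ref) or None, 'l1': (mv, ref) or None}."""
    # a. combined bi-directional prediction candidates (bi-predictive slices)
    if bi_slice:
        for i, j in PAIR_ORDER:
            if len(cands) == list_size:
                break
            if max(i, j) >= len(cands):
                continue
            l0, l1 = cands[i]["l0"], cands[j]["l1"]
            if l0 is not None and l1 is not None:
                cands.append({"l0": l0, "l1": l1})
    # b. zero motion vector candidates with increasing reference index
    ref = 0
    while len(cands) < list_size:
        zero = ((0, 0), ref)
        cands.append({"l0": zero, "l1": zero if bi_slice else None})
        ref = ref + 1 if ref + 1 < num_ref_idx else 0  # fall back to index 0
    return cands
```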
  • merge_flag indicates that the block merge is used to obtain motion data.
  • merge_idx further determines the candidates in the merge list that provide all the motion data required by the MCP.
  • The number of candidates in the merge list is also signaled in the slice header. Since the default is five, it is expressed as a difference from five (five_minus_max_num_merge_cand). In this way, using five candidates is signaled with the short codeword for 0, while using only one candidate is signaled with the longer codeword for 4.
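  • For illustration, five_minus_max_num_merge_cand is coded in the slice header with an unsigned Exp-Golomb code, so the default of five candidates costs a single bit; the sketch below shows the codewords (the coding choice matches HEVC's ue(v) descriptor as we understand it).

```python
def ue_golomb(v):
    """Unsigned Exp-Golomb codeword for v, as a bit string."""
    bits = bin(v + 1)[2:]              # binary representation of v + 1
    return "0" * (len(bits) - 1) + bits

# five_minus_max_num_merge_cand = 5 - max_num_merge_cand
print(ue_golomb(5 - 5))  # '1'     -> five candidates: shortest codeword
print(ue_golomb(5 - 1))  # '00101' -> one candidate: longer codeword
```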
  • The list construction process itself remains unchanged, but it is terminated once the list contains the maximum number of fused motion information candidates.
  • The maximum value for merge index coding is given by the number of spatial and temporal candidate motion information entries available in the list.
  • For example, when only two candidates are available, the index can be efficiently encoded as a flag.
  • However, the entire fused motion information candidate list would have to be constructed to know the actual number of candidates; if a neighboring block were unavailable due to a transmission error, the merge index could no longer be parsed.
  • the key application of the block merge concept in HEVC is the combination with the skip mode.
  • The skip mode is used for blocks whose motion data is inferred rather than explicitly signaled and whose prediction residual is zero, that is, no transform coefficients are transmitted.
  • The skip_flag is signaled at the beginning of each CU in an inter-picture prediction slice and implies the following: a. the CU contains only one PU (2N×2N partition type); b. the merge mode is used to obtain the motion data (merge_flag is equal to 1); c. no residual data is present in the code stream.
  • A parallel merge estimation level was introduced in HEVC, indicating a region within which fused motion information candidate lists can be derived independently, by checking whether a candidate block is located in that merge estimation region (MER).
  • Candidate blocks that are in the same MER as the current block are not included in the fused motion information candidate list; hence their motion data does not need to be available during list construction.
  • If this level is, for example, 32, then all prediction units in a 32×32 region can construct their fused motion information candidate lists in parallel, because no candidate in the same 32×32 MER is inserted into the list.
  • All potential fused motion information candidates for the first PU0 are available, because they lie outside the first 32×32 MER.
  • The fused motion information candidate lists of PUs 2-6 cannot contain motion data from these PUs. For example, when considering PU5, no fused motion information candidate is available and hence none is inserted into its list.
  • The merge list of PU5 therefore consists only of the temporal candidate motion information (if available) and zero-MV candidates.
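  • A sketch of the MER availability test implied above; the coordinate convention is an assumption made for this example.

```python
def same_mer(cand_xy, cur_xy, log2_mer_size):
    """True if the candidate block and the current PU share one MER, in
    which case the candidate is treated as unavailable for the list."""
    return (cand_xy[0] >> log2_mer_size == cur_xy[0] >> log2_mer_size and
            cand_xy[1] >> log2_mer_size == cur_xy[1] >> log2_mer_size)

# log2_parallel_merge_level_minus2 = 3 -> 32x32 merge estimation regions
LOG2_MER = 3 + 2
assert same_mer((16, 40), (24, 60), LOG2_MER)      # same 32x32 MER: excluded
assert not same_mer((16, 40), (48, 60), LOG2_MER)  # different MER: usable
```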
  • the parallel merge estimation level is adaptive, and is signaled as log2_parallel_merge_level_minus2 in the picture parameter set.
  • The inter MVP mode is also called AMVP (advanced motion vector prediction).
  • the detailed process of generating a candidate motion vector prediction list is as follows.
  • The initial design of the AMVP mode included five MVPs from three different classes of predictors: three motion vectors from spatial neighbors, the median of the three spatial predictors, and a scaled motion vector from a temporally co-located block.
  • The list of predictors was modified by reordering, to place the most probable motion predictor in the first position, and by removing redundant candidates, to ensure minimal signaling overhead.
  • The initial AMVP design went through many simplifications, such as removing the median predictor, reducing the number of candidates in the list from five to two, fixing the candidate order in the list, and reducing the number of redundancy checks.
  • The final design of the AMVP candidate list construction includes the following MVP candidates: a. up to two spatial candidate MVPs obtained from five spatially neighboring blocks; b. one temporal candidate MVP obtained from two temporally co-located blocks when both spatial candidate MVPs are unavailable or identical; c. zero motion vectors when neither spatial nor temporal candidates are available.
  • FIG. 10 depicts the process of obtaining two spatial candidate motion information A and B.
  • For candidate A, the motion data of the two blocks A0 and A1 in the lower-left corner is considered in a two-pass approach. The first pass checks whether any candidate block contains a reference index equal to the reference index of the current block.
  • the first motion vector found will be candidate A.
  • When all reference indexes differ from that of the current block, the associated motion vector cannot be used as-is. Therefore, in the second pass, the motion vector is scaled according to the temporal distance between the candidate's reference picture and the current reference picture.
  • Equation (1.1) shows how to scale the candidate motion vector mvcand according to the scaling factor.
  • The ScaleFactor is calculated from the temporal distance td between the current picture and the reference picture of the candidate block, and the temporal distance tb between the current picture and the reference picture of the current block.
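  • The referenced Equation (1.1) is not reproduced in this text; from the definitions of tb and td above, its conceptual form is presumably:

$$\mathrm{mv} = \mathrm{mv}_{\mathrm{cand}} \cdot \mathrm{ScaleFactor}, \qquad \mathrm{ScaleFactor} = \frac{tb}{td}$$

  (the standard evaluates this in fixed-point arithmetic; see the sketch a few bullets below).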
  • The temporal distance is expressed as a difference between picture order count (POC) values, which define the display order of pictures.
  • The scaling operation is basically the same as the scheme used for the temporal direct mode in H.264/AVC.
  • This factorization allows the ScaleFactor to be pre-calculated at the slice level, as it depends only on the reference picture list structure signaled in the slice header. Note that MV scaling is performed only when the current reference picture and the candidate reference picture are both short-term reference pictures.
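  • A fixed-point sketch consistent with the description above, written from memory of the HEVC-style derivation; treat the constants and clipping ranges as illustrative rather than normative.

```python
def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def idiv(a, b):
    """Integer division truncated toward zero (spec-style '/')."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def scale_mv(mv_cand, tb, td):
    """Scale one MV component by roughly tb/td using integers only."""
    tx = idiv(16384 + (abs(td) >> 1), td)
    scale = clip3(-4096, 4095, (tb * tx + 32) >> 6)
    prod = scale * mv_cand
    sign = 1 if prod >= 0 else -1
    return clip3(-32768, 32767, sign * ((abs(prod) + 127) >> 8))

assert scale_mv(100, 2, 4) == 50  # halving the temporal distance halves the MV
```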
  • The parameter td is defined as the POC difference between the co-located picture and the reference picture of the co-located candidate block.
  • For candidate B, the candidates B0 to B2 are checked sequentially in the same way as A0 and A1 were checked in the first pass.
  • The second pass is performed only when blocks A0 and A1 contain no motion information, that is, when they are unavailable or coded using intra-picture prediction.
  • If no candidate A has been found, candidate A is set equal to the unscaled candidate B, and candidate B is set equal to the second, unscaled or scaled, variant of candidate B.
  • the second pass searches the unscaled and scaled MVs obtained from the candidates B0 to B2. Overall, this design allows A0 and A1 to be processed independently of B0, B1, and B2.
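  • A simplified sketch of the two-pass derivation of candidate A; the block layout and the float fallback scaling are simplifications (the standard uses the fixed-point scaling shown earlier).

```python
def candidate_a(a_blocks, cur_poc, cur_ref_poc):
    """a_blocks: [(mv, ref_poc) or None] for A0 then A1; mv is an (x, y) tuple."""
    for b in a_blocks:                     # pass 1: same reference picture
        if b is not None and b[1] == cur_ref_poc:
            return b[0]
    for b in a_blocks:                     # pass 2: scale by POC distances
        if b is not None:
            tb, td = cur_poc - cur_ref_poc, cur_poc - b[1]
            return tuple(round(c * tb / td) for c in b[0])
    return None                            # A0/A1 unavailable or intra-coded
```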
  • For the TMVP (temporal motion vector predictor), HEVC provides the possibility of indicating, for each picture, which reference picture is considered the co-located picture. This is done by signaling the co-located reference picture list and reference picture index in the slice header, and by requiring that these syntax elements in all slices of a picture specify the same reference picture.
  • Since temporal MVP candidates introduce additional dependencies, their use may need to be disabled for error-robustness reasons.
  • In H.264/AVC, the temporal direct mode of bi-directional prediction slices can be disabled in the slice header (direct_spatial_mv_pred_flag).
  • The HEVC syntax extends this signaling by allowing the TMVP to be disabled at the sequence level or at the picture level (sps/slice_temporal_mvp_enabled_flag).
  • inter_pred_idc signals whether the reference list 0, 1 or both are used.
  • The corresponding reference picture (Δt) is signaled by the reference picture list index ref_idx_l0/1, and the MV (Δx, Δy) is indicated by the MVP index mvp_l0/1_flag together with its MVD.
  • a newly introduced flag mvd_l1_zero_flag in the slice header indicates whether the MVD of the second reference picture list is equal to zero and is therefore not signaled in the code stream.
  • The motion information of the current block can then be obtained from the candidate list constructed by the current method.
  • At the decoding end, if the current block is in merge/skip mode, the motion information of the current block is determined according to the fusion index carried in the code stream; if the current block is in the inter MVP mode, the motion information of the current block is determined according to the inter prediction direction, reference frame index, motion vector predictor index, and motion vector residual transmitted in the code stream.
  • At the decoding end, motion compensation is performed according to the motion information to obtain a predicted image. Further, the above step (4) may also include obtaining a residual image of the current block and adding the inter-predicted image and the residual image to obtain a reconstructed image of the current block; if the current block has no residual, the predicted image is the reconstructed image of the current block.
  • The third embodiment described above may further include updating the historical candidate motion information list with the motion information of the current block; this step may be performed after step (2) and before step (4), or after step (4).
  • the method in the JVET-K0104 proposal or other methods may be used to update the historical candidate motion information list.
  • The motion information of the current block is compared with the historical candidate motion information in the historical candidate motion information list; if a historical candidate motion information entry is identical to the motion information of the current block, that entry is removed from the list. Then the size of the historical candidate motion information list is checked; if it exceeds a preset size, the historical candidate motion information at the head of the list is removed and the remaining entries are shifted forward.
  • the motion information of the current block is added to the tail of the historical candidate motion information list.
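  • A minimal sketch of this JVET-K0104-style update; the list is an ordinary Python list whose head is the oldest entry, and equality of motion information is assumed to be exact.

```python
def update_history(history, mi, table_size):
    """Append the current block's motion information `mi`, removing an
    identical older entry first and enforcing the preset list size."""
    if mi in history:
        history.remove(mi)        # drop the identical older entry
    if len(history) >= table_size:
        history.pop(0)            # FIFO: remove the entry at the head
    history.append(mi)            # newest entry lives at the tail
    return history

h = [("mvA",), ("mvB",)]
update_history(h, ("mvA",), table_size=2)
assert h == [("mvB",), ("mvA",)]
```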
  • The above-mentioned step of determining whether the motion information of the current block is identical to some historical candidate motion information in the list may also be omitted; that is, identical motion information may then exist in the historical candidate motion information list. Moreover, "identical" may also mean identical after some processing; for example, two motion vectors may be considered the same if their values match after being right-shifted by 2 bits.
  • If the update of the historical candidate motion information list with the motion information of the current block is not included in the third embodiment, the coding blocks in the current coding tree unit can perform inter prediction using the same historical candidate motion information list, thereby allowing parallel operation during the processing of coding blocks within a coding tree unit.
  • FIG. 12 is a flowchart of an example operation of implementing image decoding by applying the inter prediction method in FIG. 11 in an embodiment of the present invention according to the video decoder 30 shown in FIG. 1.
  • One or more functional units of the video decoder 30 include a prediction processing unit 360, which may be used to perform the method of FIG. 12. In the example of FIG. 12, the picture is decoded based on the inter prediction method of FIG. 11.
  • the decoding method 1200 specifically includes:
  • S1201: initialize a historical candidate motion information list corresponding to the current coding tree unit, where the historical candidate motion information list includes N storage spaces used to store historical candidate motion information, and the initialized historical candidate motion information list includes at least M vacant storage spaces, where M ≤ N, M and N are integers; the current coding tree unit is included in a coding tree unit set (slice) composed of multiple coding tree units, and the current coding tree unit is not the first one in the coding tree unit set according to a predetermined processing order;
  • the initialization process can also be replaced as follows:
  • Coding tree units are generally processed in a row-by-row manner. In this mode of operation, the historical candidate motion information list is initialized if the current coding tree unit is the first of a coding tree unit row, or the first of a parallel coding tree unit combination (a parallel coding tree unit combination consists of K consecutive coding tree units located in the same coding tree unit row, where K is greater than or equal to 1 and less than the total number L of coding tree units in a coding tree unit row, and L is greater than or equal to 2).
  • Initializing the historical candidate motion information list helps to improve the coding efficiency.
  • S1203: add the motion information at L positions in the spatially adjacent blocks of the current coding tree unit to the historical candidate motion information list in a predetermined order, where M ≤ L ≤ N and the L positions in the spatially adjacent blocks are obtained according to a preset rule;
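  • A sketch combining S1201 and S1203; the reset condition and the ordering of the neighbour motion information are simplified assumptions for this example.

```python
def should_reset(ctu_col, k):
    """True for the first CTU of a row (ctu_col == 0) and for the first
    CTU of each K-CTU parallel combination within the row."""
    return ctu_col % k == 0

def seed_history(history, neighbour_mis, l, n_capacity):
    """S1203: add motion information at L neighbour positions, given in
    the predetermined order, to the (initialized) history list."""
    for mi in neighbour_mis[:l]:
        if mi is not None and len(history) < n_capacity:
            history.append(mi)
    return history
```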
  • S1205: construct the current candidate motion information list of the current coding tree unit or the current coding unit;
  • S1207: obtain the motion information of the current coding tree unit or the current coding unit from the combination of the historical candidate motion information list and the current candidate motion information list, and perform inter prediction on the current coding tree unit or the current coding unit according to that motion information to obtain an inter-predicted image;
  • This step includes: parsing from the bitstream the motion information index corresponding to the current coding tree unit or the current coding unit; obtaining, according to the index, the motion information of the current coding tree unit or the current coding unit from the combination of the historical candidate motion information list and the current candidate motion information list; and performing inter prediction on the current coding tree unit or the current coding unit according to the motion information to obtain an inter-predicted image;
  • S1209: add the obtained inter-predicted image and the residual image of the current coding tree unit or the current coding unit to obtain a reconstructed image of the current coding tree unit or the current coding unit.
  • the update of the historical candidate motion information list at the CTU level is adopted, allowing line-level and CTU-level codecs to be parallelized, which can effectively reduce the decoding time.
  • FIG. 13 is a flowchart of an example operation of implementing image encoding by applying the inter prediction method of FIG. 11, in an embodiment of the present invention, by the video encoder 20 shown in FIG. 1.
  • One or more functional units of the video encoder 20 include a prediction processing unit 260, which may be used to execute the method of FIG. 13. In the example of FIG. 13, the picture is encoded based on the inter prediction method of FIG. 11.
  • the encoding method 1300 specifically includes:
  • S1301: initialize a historical candidate motion information list corresponding to the current coding tree unit, where the historical candidate motion information list includes N storage spaces used to store historical candidate motion information, and the initialized historical candidate motion information list includes at least M vacant storage spaces, where M ≤ N, M and N are integers; the current coding tree unit is included in a coding tree unit set (slice) composed of multiple coding tree units, and the current coding tree unit is not the first one in the coding tree unit set according to a predetermined processing order;
  • the initialization process can also be replaced in the following manner:
  • Coding tree units are generally processed in a row-by-row manner. In this mode of operation, the historical candidate motion information list is initialized if the current coding tree unit is the first of a coding tree unit row, or the first of a parallel coding tree unit combination (a parallel coding tree unit combination consists of K consecutive coding tree units located in the same coding tree unit row, where K is greater than or equal to 1 and less than the total number L of coding tree units in a coding tree unit row, and L is greater than or equal to 2).
  • Initializing the historical candidate motion information list helps to improve the coding efficiency.
  • S1303: add the motion information at L positions in the spatially adjacent blocks of the current coding tree unit to the historical candidate motion information list in a predetermined order, where M ≤ L ≤ N and the L positions in the spatially adjacent blocks are obtained according to a preset rule;
  • In this mode, each entry of the current candidate motion information list includes a motion vector, a bidirectional- or unidirectional-reference indication, and the reference frame index corresponding to each reference direction.
  • S1305: construct the current candidate motion information list of the current coding tree unit or the current coding unit;
  • S1307: obtain the motion information of the current coding tree unit or the current coding unit, and the motion information index of that motion information, from the combination of the historical candidate motion information list and the current candidate motion information list;
  • S1311: subtract the obtained inter-predicted image from the original image of the current coding tree unit or the current coding unit to obtain a residual image;
  • S1313: encode the residual image and the motion information index to form a code stream.
  • the update of the historical candidate motion information list at the CTU level is adopted, allowing line-level and CTU-level codecs to be parallelized, which can effectively reduce the encoding time.
  • FIG. 14 is an inter prediction device 1400 provided by the present invention.
  • The inter prediction device has the functions of implementing the inter prediction method of FIG. 11 described above, and includes: an initialization module 1401, configured to initialize a historical candidate motion information list corresponding to the current coding tree unit, where the historical candidate motion information list includes N storage spaces used to store historical candidate motion information, and the initialized historical candidate motion information list includes at least M vacant storage spaces, where M ≤ N, M and N are integers; the current coding tree unit is included in a coding tree unit set (slice) composed of multiple coding tree units, and the current coding tree unit is not the first one in the coding tree unit set according to a predetermined processing order; and a historical candidate motion information list construction module 1403, configured to add, in a predetermined order, the motion information at L positions in the spatially adjacent blocks of the current coding tree unit to the historical candidate motion information list, where M ≤ L ≤ N and the L positions in the spatially adjacent blocks are obtained according to a preset rule.
  • The initialization module 1401 is configured to initialize the historical candidate motion information list corresponding to the current coding tree unit when the current coding tree unit is the first of a coding tree unit row or the first of a parallel coding tree unit combination.
  • The M positions in the spatially adjacent blocks are obtained as follows: the first candidate motion information is obtained from a preset position in the spatially adjacent blocks, and, with the position where the first candidate motion information was obtained as a starting point, the remaining M-1 candidate motion information entries are acquired at a preset step size; the preset step size is a fixed value, or changes according to a preset rule.
  • The prediction module 1405 is configured to obtain the motion information of the current coding unit according to a combination of the current candidate motion information list and the historical candidate motion information list of the current coding unit, and to perform inter prediction on the current coding unit according to the obtained motion information. The device 1400 further includes a historical motion information list update module 1407, configured to update the historical candidate motion information list based on the motion information of the current coding unit.
  • The historical motion information list update module updates the historical candidate motion information list according to the following rule: if the M positions are not full, the motion information of the current coding unit is added as historical motion information to the vacant storage space closest to the N-M positions among the M positions; or, if the M positions are full, on a first-in-first-out basis the earliest-added historical motion information is removed from the historical candidate motion information list, the remaining historical motion information is shifted toward the position of the removed entry, and the motion information of the current coding unit is added as historical motion information to the tail of the historical candidate motion information list, where the end of the list containing the most recently added historical motion information is its tail.
  • The current candidate motion information list construction module 1403 is further configured to add historical candidate motion information from the historical candidate motion information list to the current candidate motion information list of the current coding unit; correspondingly, the prediction module 1405 acquires the motion information of the current coding unit according to the current candidate motion information list of the current coding unit and performs inter prediction on the current coding unit according to the acquired motion information.
  • The prediction module 1405 is configured to obtain the motion information of the current coding unit according to a combination of the current candidate motion information list and the historical candidate motion information list of the current coding unit, and to perform inter prediction on the current coding unit according to the obtained motion information. The prediction module 1405 is further configured to perform inter prediction on another coding unit by the same method as for the current coding unit, where the other coding unit follows the current coding unit in a preset processing order and belongs to the same coding tree unit as the current coding unit, and the historical motion information list used for inter prediction of the other coding unit includes the historical motion information in the list used for inter prediction of the current coding unit.
  • The historical candidate motion information list construction module 1403 is configured to add the motion information at L positions in the spatially adjacent blocks of the current coding tree unit to the historical candidate motion information list in clockwise order, with the spatially adjacent block at the lower-left corner of the current coding tree unit as the starting point and the spatially adjacent block at the upper-right corner as the end point.
  • FIG. 15 is an encoding device 1500 provided by the present invention.
  • The encoding device has the functions of implementing the encoding method of FIG. 13 described above, and includes: an inter prediction device 1501 (the same as the inter prediction device 1400), configured to obtain the inter-predicted image of the current coding tree unit or the inter-predicted image of the current coding unit, where obtaining the inter-predicted image includes obtaining the motion information of the current coding tree unit or the current coding unit, and the motion information index of that motion information, from the combination of the historical candidate motion information list and the current candidate motion information list, and performing inter prediction on the current coding tree unit or the current coding unit according to the motion information; a residual calculation module 1503, configured to subtract the obtained inter-predicted image from the original image of the current coding tree unit or the current coding unit to obtain a residual image; and an encoding module 1505, configured to encode the residual image and the motion information index to form a code stream.
  • FIG. 16 is a decoding device 1600 provided by the present invention.
  • The decoding device has the functions of implementing the decoding method of FIG. 12 described above, and includes: an inter prediction device 1601 (the same as the inter prediction device 1400), configured to obtain the inter-predicted image of the current coding tree unit or the inter-predicted image of the current coding unit; and a reconstruction module 1603, configured to add the obtained inter-predicted image and the residual image of the current coding tree unit or the current coding unit to obtain a reconstructed image of the current coding tree unit or the current coding unit.
  • FIG. 17 is a general schematic diagram of a device for implementing the methods in FIGS. 11 to 13 provided by the present invention.
  • The device 1700 may be an inter prediction device, an encoding device, or a decoding device. The device includes a digital processor 1701 and a memory 1702; an executable instruction set is stored in the memory, and the digital processor reads the instruction set stored in the memory to implement the methods described in FIGS. 11 to 13.
  • A computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium such as a data storage medium, or a communication medium including any medium that facilitates transfer of a computer program from one place to another, for example according to a communication protocol.
  • computer-readable media generally may correspond to (1) tangible computer-readable storage media that is non-transitory, or (2) a communication medium such as a signal or carrier wave.
  • a data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, codes, and / or data structures used to implement the techniques described in this disclosure.
  • the computer program product may include a computer-readable medium.
  • Such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • any connection is properly termed a computer-readable medium.
  • If, for example, a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave are used to transmit instructions from a website, server, or other remote source, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of the medium.
  • The computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media.
  • As used herein, disks and discs include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • The instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuits.
  • processors may refer to any of the above-described structures or any other structure suitable for implementing the techniques described herein.
  • the functionality described herein may be provided within dedicated hardware and / or software modules for encoding and decoding, or incorporated in a composite codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
  • The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset).
  • This disclosure describes various components, modules, or units to emphasize functional aspects of devices for performing the disclosed techniques, but these do not necessarily need to be realized by different hardware units.
  • Rather, the various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by a collection of interoperable hardware units, including one or more processors as described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention relates to an inter-frame prediction method. The method comprises the following steps: if the current coding tree unit is the first in a coding tree unit row, or the first in a combination of parallel coding tree units, initializing a historical candidate motion information list corresponding to the current coding tree unit, the parallel coding tree units comprising K consecutive coding tree units in the same coding tree unit row, K being greater than or equal to 1 and less than the total number of coding tree units in the coding tree unit row, the historical candidate motion information list comprising N storage spaces, the initialized historical candidate motion information list comprising at least M vacant storage spaces, M ≤ N, the current coding tree unit being included in a coding tree unit set (slice) composed of a plurality of coding tree units, and the current coding tree unit not being the first in the coding tree unit set according to a predetermined processing order; adding the motion information at L positions in blocks spatially adjacent to the current coding tree unit to the historical candidate motion information list in a predetermined order, where M ≤ L ≤ N; and performing inter-frame prediction on the current coding tree unit or the current coding unit based on the historical candidate motion information list.
PCT/CN2019/101893 2018-08-28 2019-08-22 Procédé et dispositif de prédiction inter-trame, et procédé et dispositif de codage/décodage pour leur application WO2020042990A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201810990347.3 2018-08-28
CN201810990347.3A CN110868589B (zh) 2018-08-28 2018-08-28 帧间预测方法、装置及其应用的编/解方法及装置
CN201811164177.X 2018-10-04
CN201811164177.XA CN111010565B (zh) 2018-10-04 2018-10-04 帧间预测方法、装置及其应用的编/解方法及装置

Publications (1)

Publication Number Publication Date
WO2020042990A1 true WO2020042990A1 (fr) 2020-03-05

Family

ID=69643928

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/101893 WO2020042990A1 (fr) 2018-08-28 2019-08-22 Procédé et dispositif de prédiction inter-trame, et procédé et dispositif de codage/décodage pour leur application

Country Status (1)

Country Link
WO (1) WO2020042990A1 (fr)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107615765A (zh) * 2015-06-03 2018-01-19 联发科技股份有限公司 视频编解码系统中在帧内块复制模式和帧间预测模式之间的资源共享的方法和装置
CN108293111A (zh) * 2015-10-16 2018-07-17 Lg电子株式会社 用于改善在图像编码系统中进行预测的滤波方法和装置
WO2017084512A1 (fr) * 2015-11-20 2017-05-26 Mediatek Inc. Procédé et appareil de prédiction de vecteur de mouvement ou de dérivation de candidat à la fusion lors d'un codage vidéo
CN108293128A (zh) * 2015-11-20 2018-07-17 联发科技股份有限公司 视频编解码系统中全局运动补偿的方法及装置

Similar Documents

Publication Publication Date Title
CN112956190B (zh) 仿射运动预测
JP7395580B2 (ja) ビデオデコーダおよび方法
CN111107356B (zh) 图像预测方法及装置
CN110868589B (zh) 帧间预测方法、装置及其应用的编/解方法及装置
CN112823518A (zh) 用于译码块的三角划分块的帧间预测的装置及方法
JP7237144B2 (ja) ビデオ処理方法、ビデオ処理装置、エンコーダ、デコーダ、媒体、およびコンピュータプログラム
CN110881129B (zh) 视频解码方法及视频解码器
JP7164710B2 (ja) ビデオ復号化方法及びビデオ・デコーダ
CN113796071A (zh) 编码器、解码器及用于ibc融合列表的相应方法
CN113660497B (zh) 编码器、解码器和使用ibc合并列表的对应方法
CN112673626A (zh) 各分割约束元素之间的关系
JP2022535859A (ja) Mpmリストを構成する方法、クロマブロックのイントラ予測モードを取得する方法、および装置
CN113597769A (zh) 基于光流的视频帧间预测
CN110944184B (zh) 视频解码方法及视频解码器
CN111010565B (zh) 帧间预测方法、装置及其应用的编/解方法及装置
CN110958452B (zh) 视频解码方法及视频解码器
WO2020063598A1 (fr) Codeur vidéo, décodeur vidéo, et procédés correspondants
WO2020042990A1 (fr) Procédé et dispositif de prédiction inter-trame, et procédé et dispositif de codage/décodage pour leur application
JP7507827B2 (ja) ビデオ復号化方法及びビデオ・デコーダ
WO2020038357A1 (fr) Procédé de construction de liste de candidats à une fusion, dispositif, et procédé et dispositif de codage/décodage
WO2020048361A1 (fr) Procédé de décodage vidéo et décodeur vidéo

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19855052

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19855052

Country of ref document: EP

Kind code of ref document: A1