CN118251887A - Flip mode for chroma and intra template matching

Info

Publication number: CN118251887A
Application number: CN202380014496.5A
Authority: CN (China)
Legal status: Pending
Language: Chinese (zh)
Prior art keywords: block, current, mode, luma, chroma
Inventors: 赵欣, 李贵春, 陈联霏, 许晓中, 刘杉
Current and original assignee: Tencent America LLC
Application filed by Tencent America LLC
Priority claimed from U.S. Application No. 18/382,922 (published as US 2024/0236324 A9) and PCT/US2023/077597 (published as WO 2024/091913 A1)

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A video bitstream is received, the video bitstream including encoding information of a current block of a current picture. The encoding information indicates that the current block is encoded by a flip mode in which the positions of the samples of the current block are adjusted within the current block. A reference block is determined for the current block from a plurality of candidate reference blocks in a reconstructed region of the current picture based on a template matching (TM) cost, the TM cost indicating a difference between a template of the current block and respective templates of the plurality of candidate reference blocks. A reconstructed block of the current block is determined based on the determined reference block. Based on the flip mode, the current block is reconstructed by adjusting the positions of the samples of the reconstructed block within the reconstructed block.

Description

Flip mode for chroma and intra template matching
Cross Reference to Related Applications
The present application claims the benefit of priority from U.S. patent application Ser. No. 18/382,922, entitled "FLIPPING MODE FOR CHROMA AND INTRA TEMPLATE MATCHING", filed on October 23, 2023, which claims the benefit of priority from U.S. provisional application Ser. No. 63/418,944, entitled "Flipping Mode for Chroma and Intra Template Matching", filed on October 24, 2022. The contents of the prior applications are hereby incorporated by reference in their entirety.
Technical Field
Embodiments generally related to video coding are described.
Background
The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
Picture/video compression may facilitate transfer of picture/video data between different devices, memories, and networks, and minimize quality degradation of the picture/video data. In some examples, video codec techniques may compress video based on spatial and temporal redundancy. In one example, a video codec may use a technique called intra prediction, which may compress pictures based on spatial redundancy. For example, intra prediction may use reference data from the current image being reconstructed for sample prediction. In another example, a video codec may use a technique called inter prediction, which may compress pictures based on temporal redundancy. For example, inter prediction may predict samples in a current image from a previously reconstructed image and perform motion compensation. Motion compensation may be represented by Motion Vectors (MVs).
Disclosure of Invention
Aspects of the present disclosure include methods and apparatus for video encoding/decoding. In some examples, an apparatus for video decoding includes processing circuitry.
According to an aspect of the present disclosure, a video decoding method is provided. In the method, a video bitstream is received, the video bitstream including encoding information of a current block in a current picture. The encoding information indicates that the current block is encoded by a flip mode in which the positions of the samples of the current block are adjusted within the current block. A reference block is determined for the current block from a plurality of candidate reference blocks in a reconstructed region of the current picture based on a template matching (TM) cost, the TM cost indicating a difference between a template of the current block and respective templates of the plurality of candidate reference blocks. A reconstructed block of the current block is determined based on the determined reference block. Based on the flip mode, the current block is reconstructed by adjusting the positions of the samples of the reconstructed block within the reconstructed block.
In one example, the flip mode includes one of: (i) a vertical flip mode configured to adjust the positions of the samples of the current block such that the upper and lower portions of the current block are inverted within the current block, and (ii) a horizontal flip mode configured to adjust the positions of the samples of the current block such that the left and right portions of the current block are inverted within the current block.
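For illustration only (not part of the claimed method), the two flip types amount to simple reorderings of the sample array of a block. The following sketch, written with NumPy and using illustrative function names, shows the reordering and the property that applying the same flip twice restores the original samples, which is why the decoder can flip a reconstructed block back:

```python
import numpy as np

def apply_flip(block: np.ndarray, flip_mode: str) -> np.ndarray:
    """Reorder the sample positions within a block according to the flip mode."""
    if flip_mode == "vertical":
        # Upper and lower portions are inverted about the horizontal split line.
        return block[::-1, :]
    if flip_mode == "horizontal":
        # Left and right portions are inverted about the vertical split line.
        return block[:, ::-1]
    return block  # no flip applied

# Applying the same flip twice restores the block, so the decoder can
# "flip back" the reconstructed block to recover the original samples.
blk = np.arange(16).reshape(4, 4)
assert np.array_equal(apply_flip(apply_flip(blk, "vertical"), "vertical"), blk)
```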
In one example, first encoding information is received in the video bitstream. The first encoding information indicates whether the flip mode is applied to the reconstructed block. In response to the first encoding information indicating that the flip mode is to be applied to the reconstructed block, the type of the flip mode is determined based on second encoding information in the received video bitstream. Based on the determined type of the flip mode, the current block is reconstructed by adjusting the positions of the samples of the reconstructed block.
In one example, a plurality of candidate reference blocks are determined within a search region in a reconstruction region of a current image. A TM cost is determined between the template of the current block and the template of each of the plurality of candidate reference blocks. A reference block is determined from the plurality of candidate reference blocks, the reference block corresponding to a minimum TM cost of TM costs between the template of the current block and the templates of the plurality of candidate reference blocks.
In one example, the vertical extent of the search area is less than the horizontal extent of the search area based on the flip mode being a horizontal flip. In one example, the vertical extent of the search area is greater than the horizontal extent of the search area based on the flip mode being a vertical flip.
In one example, the determined reference block is flipped by adjusting the positions of the samples of the determined reference block within the determined reference block based on the flip mode. The reconstructed block is determined based on the flipped reference block.
According to another aspect of the present disclosure, a video decoding method is provided. In the method, a video bitstream is received, the video bitstream including a current chroma block in a current picture and a collocated luma block of the current chroma block. Whether a characteristic value of the collocated luma block is greater than a predefined threshold is determined. The characteristic value is associated with one or more predefined coding modes applied to the collocated luma block. Based on the characteristic value being greater than the predefined threshold, chroma coding mode information of the current chroma block is determined according to luma coding mode information of a predefined luma block. The current chroma block is reconstructed based on the determined chroma coding mode information.
In one example, the characteristic value indicates one of: (i) a number of collocated luma block positions in the collocated luma block that are encoded by the one or more predefined coding modes, (ii) a size of a luma region in the collocated luma block that is encoded by the one or more predefined coding modes, and (iii) a ratio of the luma region in the collocated luma block that is encoded by the one or more predefined coding modes.
In one example, the collocated luma block positions include a center position and four corner positions of the collocated luma block.
In an example, the one or more predefined coding modes include at least one of: intra Block Copy (IBC) flip mode, IBC mode indicated by Block Vector (BV), IBC rotation mode, and IBC geometry segmentation mode.
In an example, the one or more predefined coding modes include at least one of an intra template matching prediction (IntraTMP) flip mode and an IntraTMP mode indicated by a displacement vector.
In an example, the chroma coding mode information of the current chroma block is determined based on one of: (i) luma coding mode information of a first luma block that is coded by one of the one or more predefined coding modes along a predefined scanning order, (ii) luma coding mode information of a most common mode among the collocated luma block positions coded by the one or more predefined coding modes, and (iii) first luma coding mode information of the one or more predefined coding modes that is used N or more times along the predefined scanning order, where N is a positive integer.
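As a non-normative sketch of these three derivation rules, the helper below (illustrative names; the input is assumed to already contain only the mode information of qualifying luma positions, in the predefined scan order) selects the chroma coding mode information:

```python
from collections import Counter

def derive_chroma_mode(luma_modes_in_scan_order, rule, n=2):
    """Derive chroma coding mode info from collocated luma positions coded by
    one of the predefined coding modes, per the three example rules."""
    if not luma_modes_in_scan_order:
        return None
    if rule == "first":
        # (i) first luma block coded by a predefined mode along the scan order
        return luma_modes_in_scan_order[0]
    if rule == "most_common":
        # (ii) most common mode among the qualifying collocated luma positions
        return Counter(luma_modes_in_scan_order).most_common(1)[0][0]
    if rule == "first_n_times":
        # (iii) first mode information used N or more times along the scan order
        counts = Counter()
        for mode in luma_modes_in_scan_order:
            counts[mode] += 1
            if counts[mode] >= n:
                return mode
        return None
    raise ValueError(rule)
```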
In an example, chroma coding mode information for a current chroma block is determined based on luma coding mode information for samples at predefined collocated luma positions. The samples at the predefined collocated luma positions are encoded by one of the one or more predefined encoding modes, and the predefined collocated luma positions include one of a center position and an upper left corner position of the collocated luma block.
In an example, based on the one or more predefined coding modes including the IntraTMP mode, a plurality of candidate reference chroma blocks is determined within a search range in the current picture indicated by a block vector (BV) of the current chroma block. The BV of the current chroma block is determined based on one of a plurality of associated luma blocks. The plurality of associated luma blocks includes: (i) a first IntraTMP-coded block along a predefined scan order, (ii) the collocated luma block, and (iii) an IntraTMP-coded block corresponding to first IntraTMP mode information that is used N or more times along the predefined scan order. A plurality of flip modes is determined for each candidate reference chroma block of the plurality of candidate reference chroma blocks. A TM cost between a template of each candidate reference chroma block and a template of the current chroma block is determined based on each of the plurality of flip modes. From the plurality of flip modes, the flip mode corresponding to the minimum TM cost among the determined TM costs is selected. The flip mode of the current chroma block is determined as the selected flip mode.
In an example, a BV of the current chroma block is determined as a scaled BV of one of the plurality of associated luma blocks based on a scaling factor, wherein the scaling factor is determined based on a sub-sampling ratio associated with the current chroma block.
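The two preceding examples can be sketched together as follows. This is illustrative only: the floor-division scaling and the exhaustive flip-mode/candidate loop are assumptions standing in for the codec's actual BV scaling and TM search, and `tm_cost` is a caller-supplied cost function such as SAD:

```python
def scale_luma_bv_for_chroma(bv_luma, sub_w=2, sub_h=2):
    """Scale an associated luma BV to chroma resolution; sub_w and sub_h are
    the chroma sub-sampling ratios (2 and 2 for the 4:2:0 format)."""
    bvx, bvy = bv_luma
    return (bvx // sub_w, bvy // sub_h)

def pick_chroma_flip_mode(cur_template, candidate_templates_by_flip, tm_cost):
    """Choose the flip mode whose candidate reference chroma template yields
    the minimum TM cost against the template of the current chroma block."""
    best_mode, best_cost = None, float("inf")
    for flip_mode, templates in candidate_templates_by_flip.items():
        for tmpl in templates:
            cost = tm_cost(cur_template, tmpl)
            if cost < best_cost:
                best_mode, best_cost = flip_mode, cost
    return best_mode
```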
In an example, chroma coding mode information for a current chroma block is determined based on scaled luma coding mode information for a predefined luma block according to a scaling factor, wherein the scaling factor is determined based on a sampling format of the current chroma block.
According to another aspect of the invention, an apparatus is provided. The apparatus includes a processing circuit. The processing circuitry may be configured to perform any of the methods described for video decoding/encoding.
Aspects of the present invention also provide a non-transitory computer-readable medium storing instructions that, when executed by a computer, cause the computer to perform any of the methods for video decoding/encoding.
Drawings
Further features, properties and various advantages of the disclosed subject matter will become more apparent from the following detailed description and drawings, in which:
fig. 1 is a schematic diagram of an exemplary block diagram of a communication system (100).
Fig. 2 is a schematic diagram of an exemplary block diagram of a decoder.
Fig. 3 is a schematic diagram of an exemplary block diagram of an encoder.
Fig. 4A, 4B, 4C, and 4D illustrate exemplary reference samples for an Intra Block Copy (IBC) process.
Fig. 5 is a schematic diagram of an exemplary Block Vector (BV) adjustment based on horizontal flipping.
Fig. 6 is a schematic diagram of an exemplary BV adjustment based on vertical flip.
Fig. 7 is a schematic diagram of intra template matching prediction (IntraTMP).
Fig. 8 illustrates a flow chart outlining a first decoding process according to some embodiments of the present disclosure.
Fig. 9 shows a flowchart outlining a first encoding process according to some embodiments of the present disclosure.
Fig. 10 illustrates a flow chart outlining a second decoding process according to some embodiments of the present disclosure.
Fig. 11 illustrates a flow chart outlining a second encoding process according to some embodiments of the present disclosure.
FIG. 12 is a schematic diagram of a computer system according to an embodiment.
Detailed Description
Fig. 1 shows a block diagram of a video processing system (100) in some examples. As an example of an application of the disclosed subject matter, the video processing system (100) is a video encoder and video decoder in a streaming environment. The presently disclosed subject matter is equally applicable to other video-enabled applications including, for example, video conferencing, digital TV, streaming services, storing compressed video on digital media including CD, DVD, memory stick, etc.
The video processing system (100) includes an acquisition subsystem (113), which may include a video source (101), such as a digital camera, that creates a stream of uncompressed video images (102). In one example, the video image stream (102) includes samples taken by the digital camera. The video image stream (102), depicted as a bold line to emphasize its high data volume compared to the encoded video data (104) (or encoded video bitstream), can be processed by an electronic device (120) that includes a video encoder (103) coupled to the video source (101). The video encoder (103) may include hardware, software, or a combination thereof to implement or embody aspects of the disclosed subject matter as described in more detail below. The encoded video data (104) (or encoded video bitstream), depicted as a thin line to emphasize its lower data volume compared to the video image stream (102), may be stored on a streaming server (105) for future use. One or more streaming client subsystems, such as the client subsystems (106) and (108) in fig. 1, may access the streaming server (105) to retrieve copies (107) and (109) of the encoded video data (104). The client subsystem (106) may include, for example, a video decoder (110) in an electronic device (130). The video decoder (110) decodes an incoming copy (107) of the encoded video data and generates an output video image stream (111) that can be presented on a display (112) (e.g., a display screen) or another presentation device (not depicted). In some streaming systems, the encoded video data (104), (107), and (109) (e.g., video bitstreams) may be encoded according to certain video encoding/compression standards. Examples of such standards include ITU-T Recommendation H.265. In one example, a video coding standard under development is informally known as Versatile Video Coding (VVC). The disclosed subject matter may be used in the context of the VVC standard.
It should be noted that the electronic device (120) and the electronic device (130) may include other components (not shown). For example, the electronic device (120) may include a video decoder (not shown), and the electronic device (130) may also include a video encoder (not shown).
Fig. 2 shows an exemplary block diagram of a video decoder (210). The video decoder (210) may be disposed in the electronic device (230). The electronic device (230) may include a receiver (231) (e.g., a receiving circuit). The video decoder (210) may be used in place of the video decoder (110) in the example of fig. 1.
The receiver (231) may receive one or more encoded video sequences (e.g., contained in a bitstream) to be decoded by the video decoder (210). In one embodiment, one encoded video sequence is received at a time, where the decoding of each encoded video sequence is independent of the decoding of the other encoded video sequences. The encoded video sequence may be received from a channel (201), which may be a hardware/software link to a storage device storing the encoded video data. The receiver (231) may receive the encoded video data together with other data, e.g., encoded audio data and/or auxiliary data streams, which may be forwarded to their respective using entities (not shown). The receiver (231) may separate the encoded video sequence from the other data. To combat network jitter, a buffer memory (215) may be coupled between the receiver (231) and an entropy decoder/parser (220) (hereinafter "parser (220)"). In some applications, the buffer memory (215) is part of the video decoder (210). In other cases, the buffer memory (215) may be disposed outside the video decoder (210) (not shown). In still other cases, a buffer memory (not shown) may be provided outside the video decoder (210), for example to combat network jitter, and another buffer memory (215) may additionally be configured inside the video decoder (210), for example to handle playout timing. The buffer memory (215) may not be needed, or may be small, when the receiver (231) receives data from a store/forward device of sufficient bandwidth and controllability, or from an isochronous network. For use over best-effort packet networks such as the Internet, the buffer memory (215) may be required, may be comparatively large, may advantageously be of adaptive size, and may be implemented at least partially in an operating system or similar element (not shown) outside the video decoder (210).
The video decoder (210) may include a parser (220) to reconstruct symbols (221) from the encoded video sequence. Categories of those symbols include information used to manage the operation of the video decoder (210), and potentially information to control a presentation device such as the presentation device (212) (e.g., a display screen) that is not an integral part of the electronic device (230) but can be coupled to it, as shown in fig. 2. The control information for the presentation device(s) may be in the form of Supplemental Enhancement Information (SEI) messages or Video Usability Information (VUI) parameter set fragments (not shown). The parser (220) may parse/entropy-decode the received encoded video sequence. The encoding of the encoded video sequence may be in accordance with a video encoding technique or standard, and may follow various principles, including variable-length coding, Huffman coding, arithmetic coding with or without context sensitivity, and so forth. The parser (220) may extract from the encoded video sequence a set of subgroup parameters for at least one of the subgroups of pixels in the video decoder, based on at least one parameter corresponding to the group. The subgroups may include Groups of Pictures (GOPs), pictures, tiles, slices, macroblocks, Coding Units (CUs), blocks, Transform Units (TUs), Prediction Units (PUs), and so forth. The parser (220) may also extract from the encoded video sequence information such as transform coefficients, quantizer parameter values, motion vectors, and so on.
The parser (220) may perform entropy decoding/parsing operations on the video sequence received from the buffer memory (215) to create symbols (221).
Depending on the type of the encoded video image or portions thereof (e.g., inter and intra images, inter and intra blocks), and other factors, the reconstruction of the symbols (221) may involve multiple different units. Which units are involved, and how, can be controlled by the subgroup control information parsed from the encoded video sequence by the parser (220). For brevity, the flow of such subgroup control information between the parser (220) and the units below is not depicted.
In addition to the functional blocks already mentioned, the video decoder (210) can be conceptually subdivided into several functional units as described below. In practical implementations operating under commercial constraints, many of these units interact tightly with each other and may be at least partially integrated with each other. However, for the purpose of describing the disclosed subject matter, it is conceptually subdivided into the following functional units.
The first unit is the scaler/inverse transform unit (251). The scaler/inverse transform unit (251) receives quantized transform coefficients as well as control information from the parser (220) as symbol(s) (221), including which transform to use, block size, quantization factor, quantization scaling matrices, and so forth. The scaler/inverse transform unit (251) may output blocks including sample values that can be input into the aggregator (255).
In some cases, the output samples of the scaler/inverse transform unit (251) may belong to an intra-coded block. An intra-coded block does not use predictive information from previously reconstructed images, but can use predictive information from previously reconstructed portions of the current image. Such predictive information may be provided by an intra image prediction unit (252). In some cases, the intra image prediction unit (252) uses surrounding already-reconstructed information fetched from the current image buffer (258) to generate a block of the same size and shape as the block being reconstructed. The current image buffer (258) buffers, for example, the partially reconstructed current image and/or the fully reconstructed current image. In some cases, the aggregator (255) adds, on a per-sample basis, the prediction information generated by the intra image prediction unit (252) to the output sample information provided by the scaler/inverse transform unit (251).
In other cases, the output samples of the scaler/inverse transform unit (251) may belong to an inter-coded and potentially motion-compensated block. In such a case, a motion compensated prediction unit (253) may access the reference image memory (257) to fetch samples used for prediction. After motion compensation of the fetched samples according to the symbols (221) belonging to the block, these samples may be added by the aggregator (255) to the output of the scaler/inverse transform unit (251) (in this case called the residual samples or residual signal) to generate output sample information. The addresses within the reference image memory (257) from which the motion compensated prediction unit (253) fetches prediction samples may be controlled by motion vectors, available to the motion compensated prediction unit (253) in the form of symbols (221) that may have, for example, X, Y, and reference image components. Motion compensation may also include interpolation of sample values fetched from the reference image memory (257) when sub-sample-accurate motion vectors are in use, motion vector prediction mechanisms, and so forth.
The output samples of the aggregator (255) may be subject to various loop filtering techniques in a loop filter unit (256). Video compression techniques may include in-loop filter techniques that are controlled by parameters included in the encoded video sequence (also referred to as the encoded video bitstream) and made available to the loop filter unit (256) as symbols (221) from the parser (220). Video compression may also be responsive to meta-information obtained during the decoding of previous (in decoding order) portions of an encoded image or encoded video sequence, as well as to previously reconstructed and loop-filtered sample values.
The output of the loop filter unit (256) may be a sample stream, which may be output to the rendering device (212) and stored in a reference picture memory (257) for subsequent inter picture prediction.
Once fully reconstructed, some of the encoded pictures can be used as reference pictures for subsequent prediction. For example, once an encoded image corresponding to a current image is fully reconstructed and the encoded image has been identified (by, for example, a parser (220)) as a reference image, the current image buffer (258) may become part of a reference image memory (257) and a new current image buffer may be reallocated before starting to reconstruct a subsequent encoded image.
The video decoder (210) may perform decoding operations according to a predetermined video compression technique or standard, such as ITU-T Rec. H.265. The encoded video sequence may conform to the syntax specified by the video compression technique or standard being used, in the sense that the encoded video sequence adheres to both the syntax of the video compression technique or standard and the profiles documented in the video compression technique or standard. Specifically, a profile may select certain tools from all the tools available in the video compression technique or standard as the only tools available for use under that profile. For compliance, the complexity of the encoded video sequence may also be required to be within the bounds defined by the level of the video compression technique or standard. In some cases, levels restrict a maximum image size, a maximum frame rate, a maximum reconstruction sampling rate (measured in units of, for example, megasamples per second), a maximum reference image size, and so on. In some cases, the limits set by levels may be further restricted through Hypothetical Reference Decoder (HRD) specifications and metadata for HRD buffer management signaled in the encoded video sequence.
In one embodiment, the receiver (231) may receive additional (redundant) data along with the encoded video. The additional data may be included as part of the encoded video sequence. The additional data may be used by the video decoder (210) to properly decode the data and/or to more accurately reconstruct the original video data. The additional data may be in the form of, for example, temporal, spatial, or signal-to-noise ratio (SNR) enhancement layers, redundant slices, redundant images, forward error correction codes, and so on.
Fig. 3 shows an exemplary block diagram of a video encoder (303). The video encoder (303) is provided in the electronic device (320). The electronic device (320) includes a transmitter (340) (e.g., a transmission circuit). A video encoder (303) may be used in place of the video encoder (103) in the example of fig. 1.
The video encoder (303) may receive video samples from a video source (301) (not part of the electronic device (320) in the example of fig. 3) that may capture video pictures to be encoded by the video encoder (303). In another example, the video source (301) is part of an electronic device (320).
The video source (301) may provide the source video sequence to be encoded by the video encoder (303) in the form of a stream of digital video samples, which may have any suitable bit depth (e.g., 8-bit, 10-bit, 12-bit, ...), any color space (e.g., BT.601 YCrCb, RGB, ...), and any suitable sampling structure (e.g., YCrCb 4:2:0, YCrCb 4:4:4). In a media serving system, the video source (301) may be a storage device storing previously prepared video. In a videoconferencing system, the video source (301) may be a camera that captures local image information as a video sequence. The video data may be provided as a plurality of individual images that impart motion when viewed in sequence. The images themselves may be organized as spatial arrays of pixels, where each pixel may include one or more samples depending on the sampling structure, color space, and so on in use. The description below focuses on samples.
According to an embodiment, the video encoder (303) may encode and compress the images of the source video sequence into an encoded video sequence (343) in real time or under any other time constraint required. Enforcing the appropriate encoding speed is one function of the controller (350). In some embodiments, the controller (350) controls, and is functionally coupled to, other functional units as described below. For simplicity, the couplings are not shown. The parameters set by the controller (350) may include rate-control-related parameters (picture skip, quantizer, lambda value of rate-distortion optimization techniques, etc.), picture size, Group of Pictures (GOP) layout, maximum motion vector search range, and so forth. The controller (350) may be configured to have other suitable functions pertaining to the video encoder (303) optimized for a certain system design.
In some embodiments, the video encoder (303) is configured to operate in an encoding loop. As a simple description, in an example, the encoding loop may include a source encoder (330) (e.g., responsible for creating symbols, such as a symbol stream, based on the input image and reference image to be encoded) and a (local) decoder (333) embedded in the video encoder (303). A decoder (333) reconstructs the symbols to create sample data in a similar manner as a (remote) decoder creates sample data. The reconstructed sample stream (sample data) is input to a reference image memory (334). Since decoding of the symbol stream produces a bit-accurate result independent of the decoder location (local or remote), the content in the reference picture store (334) also corresponds bit-accurately between the local encoder and the remote encoder. In other words, the reference picture samples "seen" by the prediction portion of the encoder are exactly the same as the sample values "seen" when the decoder would use prediction during decoding. This reference picture synchronicity rationale (and drift that occurs in the event that synchronicity cannot be maintained due to channel errors, for example) is also used in some related art.
The operation of the "local" decoder (333) may be the same as that of a "remote" decoder, such as the video decoder (210) already described in detail above in connection with fig. 2. Briefly referring also to fig. 2, however, because symbols are available and the entropy encoder (345) and the parser (220) can losslessly encode/decode the symbols into an encoded video sequence, the entropy decoding portions of the video decoder (210), including the buffer memory (215) and the parser (220), may not be fully implemented in the local decoder (333).
In an embodiment, any decoder technique other than parsing/entropy decoding present in a decoder is also present in the corresponding encoder in the same or substantially the same functional form. For this reason, the presently disclosed subject matter focuses on decoder operation. The description of the encoder technique may be simplified because the encoder technique is reciprocal to the fully described decoder technique. A more detailed description is required only in certain areas and is provided below.
During operation, in some examples, the source encoder (330) may perform motion compensated predictive encoding. The motion compensated predictive coding predictively codes an input image with reference to one or more previously coded images from a video sequence designated as "reference images". In this way, the encoding engine (332) encodes differences between blocks of pixels of an input image and blocks of pixels of a reference image that may be selected as a prediction reference for the input image.
The local video decoder (333) may decode encoded video data of an image that may be designated as a reference image based on the symbol created by the source encoder (330). The operation of the encoding engine (332) may advantageously be a lossy process. When encoded video data may be decoded at a video decoder (not shown in fig. 3), the reconstructed video sequence may typically be a copy of the source video sequence with some errors. The local video decoder (333) replicates a decoding process that may be performed on a reference picture by the video decoder and may cause the reconstructed reference picture to be stored in the reference picture memory (334). In this way, the video encoder (303) may locally store a copy of the reconstructed reference picture that has common content (no transmission errors) with the reconstructed reference picture to be obtained by the remote video decoder.
The predictor (335) may perform prediction searches for the encoding engine (332). That is, for a new image to be encoded, the predictor (335) may search the reference image memory (334) for sample data (as candidate reference pixel blocks) or certain metadata, such as reference image motion vectors, block shapes, and so on, that may serve as an appropriate prediction reference for the new image. The predictor (335) may operate on a sample block-by-pixel block basis to find appropriate prediction references. In some cases, as determined by search results obtained by the predictor (335), the input image may have prediction references drawn from multiple reference images stored in the reference image memory (334).
The controller (350) may manage the encoding operations of the source encoder (330), including, for example, setting parameters and subgroup parameters for encoding video data.
The outputs of all of the above functional units may be entropy encoded in an entropy encoder (345). An entropy encoder (345) converts symbols generated by the various functional units into an encoded video sequence by applying lossless compression to the symbols according to techniques such as huffman coding, variable length coding, arithmetic coding, etc.
The transmitter (340) may buffer the encoded video sequence created by the entropy encoder (345) in preparation for transmission over a communication channel (360), which may be a hardware/software link to a storage device that is to store encoded video data. The transmitter (340) may combine the encoded video data from the video encoder (303) with other data to be transmitted, such as encoded audio data and/or an auxiliary data stream (source not shown).
The controller (350) may manage the operation of the video encoder (303). During encoding, the controller (350) may assign to each encoded image a certain encoded image type, which can affect the encoding techniques applicable to the respective image. For example, images may often be assigned one of the following image types:
intra pictures (I pictures), which can be encoded and decoded without using any other pictures in the sequence as prediction sources. Some video codecs allow for different types of intra pictures, including, for example, independent decoder refresh (INDEPENDENT DECODER REFRESH, "IDR") pictures.
A predictive picture (P picture), which can be encoded and decoded using intra prediction or inter prediction with at most one motion vector and reference index to predict the sample values of each block.
A bi-predictive picture (B picture), which can be encoded and decoded using intra prediction or inter prediction with at most two motion vectors and reference indices to predict the sample values of each block. Similarly, multiple predictive pictures can use more than two reference pictures and associated metadata for the reconstruction of a single block.
Source images may generally be spatially subdivided into a plurality of sample blocks (e.g., blocks of 4×4, 8×8, 4×8, or 16×16 samples each) and encoded block by block. The blocks may be coded predictively with reference to other (already coded) blocks, as determined by the coding assignment applied to the blocks' respective pictures. For example, blocks of an I picture may be coded non-predictively, or they may be coded predictively with reference to already coded blocks of the same picture (spatial prediction or intra prediction). Pixel blocks of a P picture may be coded predictively, via spatial prediction or via temporal prediction with reference to one previously coded reference picture. Blocks of a B picture may be coded predictively, via spatial prediction or via temporal prediction with reference to one or two previously coded reference pictures.
The video encoder (303) may perform encoding operations according to a predetermined video encoding technique or standard, such as ITU-T Rec. H.265. In its operation, the video encoder (303) may perform various compression operations, including predictive encoding operations that exploit temporal and spatial redundancies in the input video sequence. The encoded video data may therefore conform to the syntax specified by the video encoding technique or standard being used.
In one embodiment, the transmitter (340) may transmit additional data while transmitting the encoded video. The source encoder (330) may include such data as part of the encoded video sequence. The additional data may include temporal/spatial/SNR enhancement layers, redundant pictures and slices, other forms of redundant data, SEI messages, VUI parameter set fragments, and the like.
The acquired video may be provided as a plurality of source images (video images) in a time series. Intra-picture prediction (often abbreviated as intra-prediction) exploits spatial correlation in a given picture, while inter-picture prediction exploits (temporal or other) correlation between pictures. In an example, a particular image being encoded/decoded is divided into blocks, and the particular image being encoded/decoded is referred to as a current image. When a block in the current image is similar to a reference block in a reference image that has been previously encoded and still buffered in video, the block in the current image may be encoded by a vector called a motion vector. The motion vector points to a reference block in the reference picture and, in case multiple reference pictures are used, the motion vector may have a third dimension identifying the reference picture.
In some embodiments, bi-prediction techniques may be used in inter-picture prediction. According to bi-prediction techniques, two reference pictures are used, such as a first reference picture and a second reference picture, both preceding the current picture in video in decoding order (but possibly in the past and future, respectively, in display order). The block in the current image may be encoded by a first motion vector pointing to a first reference block in the first reference image and a second motion vector pointing to a second reference block in the second reference image. The block may be predicted by a combination of the first reference block and the second reference block.
Furthermore, merge mode techniques may be used for inter-image prediction to improve coding efficiency.
According to some embodiments of the present disclosure, predictions such as inter-picture prediction and intra-picture prediction are performed in units of blocks. For example, according to the HEVC standard, pictures in a sequence of video pictures are partitioned into coding tree units (CTUs) for compression, the CTUs in a picture having the same size, such as 64×64 pixels, 32×32 pixels, or 16×16 pixels. In general, a CTU includes three coding tree blocks (CTBs): one luma CTB and two chroma CTBs. Each CTU may be recursively split into one or more coding units (CUs) in a quadtree. For example, a 64×64-pixel CTU may be split into one 64×64-pixel CU, or 4 CUs of 32×32 pixels, or 16 CUs of 16×16 pixels. In an example, each CU is analyzed to determine a prediction type for the CU, such as an inter prediction type or an intra prediction type. Depending on temporal and/or spatial predictability, a CU is split into one or more prediction units (PUs). Generally, each PU includes a luma prediction block (PB) and two chroma PBs. In an embodiment, the prediction operation in coding (encoding/decoding) is performed in units of prediction blocks. Using a luma prediction block as an example of a prediction block, the prediction block includes a matrix of values (e.g., luma values) for pixels, such as 8×8 pixels, 16×16 pixels, 8×16 pixels, 16×8 pixels, and so on.
It should be noted that video encoder (103) and video encoder (303), as well as video decoder (110) and video decoder (210), may be implemented using any suitable technique. In one embodiment, the video encoder (103) and video encoder (303), and the video decoder (110) and video decoder (210) may be implemented using one or more integrated circuits. In another embodiment, the video encoder (103) and video encoder (303), and the video decoder (110) and video decoder (210) may be implemented using one or more processors executing software instructions.
The present disclosure includes a number of aspects related to inheriting luma Intra Block Copy (IBC) mode information and/or luma IntraTMP mode information for the chroma color components, and to applying a flip mode to IntraTMP-coded blocks.
IBC, or Current Picture Referencing (CPR), is a tool that can improve the coding efficiency of screen content material, employed, for example, in the HEVC Screen Content Coding (SCC) extension. Since the IBC mode can be implemented as a block-level coding mode, block matching (BM) can be performed at the encoder to find the best block vector (or motion vector) for each CU. Here, the block vector is used to indicate the displacement from the current block to the reference block, where the reference block has already been reconstructed within the current picture. The luma block vector of an IBC-coded CU may be defined with integer precision. The chroma block vector may also be rounded to integer precision. When combined with Adaptive Motion Vector Resolution (AMVR), the IBC mode can switch between 1-pixel and 4-pixel motion vector precision. An IBC-coded CU may be treated as a third prediction mode in addition to the intra and inter prediction modes. The IBC mode may be applicable to CUs with both width and height less than or equal to 64 luma samples.
On the encoder side, hash-based motion estimation may be performed for IBC. The encoder may perform a Rate Distortion (RD) check on blocks of no more than 16 luma samples in width or height. For the non-merge mode, a block vector search may be performed first using a hash-based search. If the hash-based search does not return valid candidates, a local search based on block matching may be performed.
In the hash-based search, hash key matching (32-bit CRC) between the current block and a reference block is extended to all allowed block sizes. The hash key calculation for every position in the current picture may be based on 4×4 sub-blocks. For a larger current block, a hash key may be determined to match the hash key of a reference block when all the hash keys of its 4×4 sub-blocks match the hash keys at the corresponding reference positions. If the hash keys of multiple reference blocks are found to match the hash key of the current block, the block vector cost of each matching reference block may be calculated, and the matching reference block with the lowest cost may be selected.
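A minimal sketch of the 4×4 sub-block hash keys is given below; it uses zlib's CRC-32 as the 32-bit hash and illustrative function names, and it omits the cost-based selection among multiple matches:

```python
import zlib
import numpy as np

def subblock_hashes(picture: np.ndarray, x: int, y: int, w: int, h: int):
    """32-bit CRC hash key for every 4x4 sub-block of the w x h region at (x, y)."""
    keys = []
    for dy in range(0, h, 4):
        for dx in range(0, w, 4):
            sub = picture[y + dy:y + dy + 4, x + dx:x + dx + 4]
            keys.append(zlib.crc32(sub.tobytes()))
    return keys

def hash_match(picture, cur_xy, ref_xy, w, h):
    """A larger block matches a reference position only when every 4x4
    sub-block hash matches the hash at the corresponding reference position."""
    (cx, cy), (rx, ry) = cur_xy, ref_xy
    return subblock_hashes(picture, cx, cy, w, h) == subblock_hashes(picture, rx, ry, w, h)
```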
In the block matching search, a search range may be set to cover the previous CTU and the current CTU. At the CU level, the IBC mode may be signaled using a flag, and may be signaled as IBC advanced motion vector prediction (Advanced Motion Vector Prediction, AMVP) mode or IBC skip/merge mode. Examples of AMVP mode and IBC skip/merge mode are as follows:
(1) IBC skip/merge mode: a merge candidate index may be used to indicate which block vector from a list of neighboring candidate IBC-coded blocks is used to predict the current block. The merge list may include spatial candidates, history-based MVP (HMVP) candidates, and pairwise candidates.
(2) IBC AMVP mode: the block vector difference may be encoded in the same way as a motion vector difference. The block vector prediction method may use two candidates as predictors, one from the left neighbor and one from the above neighbor (if IBC coded). When either neighboring candidate is not available, the default block vector may be used as a predictor. A flag may be signaled to indicate the block vector predictor index.
To reduce memory consumption and decoder complexity, IBC may be applied to the reconstructed portion of a predefined region that includes one region of the current CTU and some regions of the left CTU. Fig. 4A, 4B, 4C, and 4D illustrate exemplary reference regions of IBC mode, where each block may represent a 64×64 luma sampling unit.
Depending on the position of the current coding CU in the current CTU, the reference region of the IBC mode may be defined as follows:
(1) As shown in fig. 4A, if the current block falls into the upper left 64×64 block (402) of the current CTU (400A), the current block may refer to reference samples in the lower right 64×64 block (406) of the left CTU (400B) using the CPR mode in addition to samples already reconstructed in the current CTU. The current block may also reference the reference samples in the lower left 64 x 64 block (408) and the upper right 64 x 64 block (404) of the left CTU using CPR mode.
(2) As shown in fig. 4B, if the current block falls into the upper right 64×64 block (410) of the current CTU, the current block may refer to reference samples in the lower left 64×64 block (416) and the lower right 64×64 block (414) of the left CTU using the CPR mode if the luminance position (0, 64) with respect to the current CTU has not been reconstructed in addition to the samples already reconstructed in the current CTU. Otherwise, the current block may also refer to reference samples in the lower right 64×64 block (414) of the left CTU.
(3) As shown in fig. 4C, if the current block falls into the lower left 64×64 block (418) of the current CTU, the current block may refer to reference samples in the upper right 64×64 block (420) of the current CTU and the lower right 64×64 block (424) of the left CTU using the CPR mode if the luminance position (64, 0) with respect to the current CTU has not been reconstructed in addition to the samples already reconstructed in the current CTU. Otherwise, the current block may also reference the reference samples in the lower right 64 x 64 block (424) of the left CTU using CPR mode.
(4) As shown in fig. 4D, if the current block falls into the lower right 64 x 64 block (426) of the current CTU (400C), the current block may refer to only the samples that have been reconstructed in the current CTU using the CPR mode.
This restriction allows the IBC mode to be implemented using local on-chip memory in hardware implementations.
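The four cases above can be condensed into a small lookup, sketched below under simplifying assumptions: 128×128 CTUs made of four 64×64 units, and the conditional sub-cases of items (2) and (3) (which depend on whether a unit of the current CTU has begun reconstruction) are not modeled:

```python
def left_ctu_referenceable_units(x_in_ctu: int, y_in_ctu: int):
    """Return which 64x64 units of the left CTU may serve as IBC reference,
    given the position of the current block inside the current CTU."""
    col, row = x_in_ctu // 64, y_in_ctu // 64
    if (col, row) == (0, 0):   # case (1): current block in top-left unit
        return ["top-right", "bottom-left", "bottom-right"]
    if (col, row) == (1, 0):   # case (2): top-right unit (simplified)
        return ["bottom-left", "bottom-right"]
    if (col, row) == (0, 1):   # case (3): bottom-left unit (simplified)
        return ["bottom-right"]
    return []                  # case (4): bottom-right unit, current CTU only
```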
For IBC encoded blocks, e.g. in JVET-AA0070, a reconstruct-reorder IBC (RR-IBC) mode may be allowed. When applying RR-IBC, the samples in the reconstructed block may be flipped according to the flip type of the current block. On the encoder side, the original block (or current block) may be flipped before the motion search and residual calculation are performed on the original block, while the prediction block may be derived without flipping. At the decoder side, the reconstructed block may be flipped back to recover the original block.
For RR-IBC encoded blocks, two flipping methods, horizontal flipping and vertical flipping, can be supported. A syntax flag may first be signaled for the IBC AMVP encoding block, indicating whether the reconstructed block is to be flipped. If the syntax flag indicates that the reconstructed block is to be flipped, another flag may be further signaled specifying the type of flip (e.g., vertical flip or horizontal flip). For IBC merging, the flip type may inherit from neighboring blocks without syntax signaling. The current block and the reference block may be generally aligned horizontally or vertically in view of horizontal symmetry or vertical symmetry. Thus, when horizontal flipping is applied, the vertical component of the Block Vector (BV) may not be signaled and may be inferred to be equal to 0. Similarly, when vertical flipping is applied, the horizontal component of the BV may not be signaled and may be inferred to be equal to 0.
To better exploit the symmetry properties, a flip-aware BV adjustment method may be applied to refine the block vector candidates. Fig. 5 illustrates an exemplary BV adjustment based on horizontal flip, and fig. 6 illustrates an exemplary BV adjustment based on vertical flip. As shown in fig. 5 and fig. 6, (x_nbr, y_nbr) and (x_cur, y_cur) may represent the coordinates of the center samples of the neighboring block (e.g., (504) or (604)) and the current block (e.g., (502) or (602)), respectively. BV_nbr and BV_cur may represent the BVs of the neighboring block and the current block, respectively. In the horizontal flip case shown in fig. 5, where the neighboring block (504) is coded with horizontal flip, rather than directly inheriting the BV from the neighboring block (504), the horizontal component of BV_cur of the current block (502) is calculated by adding a motion offset to the horizontal component of BV_nbr (denoted BV_nbr^h) of the neighboring block (504). Thus, the horizontal component of BV_cur of the current block (502) may be defined as BV_cur^h = 2(x_nbr − x_cur) + BV_nbr^h. Similarly, in the vertical flip case shown in fig. 6, where the neighboring block (604) is coded with vertical flip, the vertical component of BV_cur of the current block (602) is calculated by adding a motion offset to the vertical component of BV_nbr (denoted BV_nbr^v) of the neighboring block (604). Thus, the vertical component of BV_cur of the current block (602) may be defined as BV_cur^v = 2(y_nbr − y_cur) + BV_nbr^v.
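The two adjustment formulas translate directly into code. The sketch below is illustrative (the names are not from the disclosure); BVs and block centers are (horizontal, vertical) pairs:

```python
def adjust_inherited_bv(flip_type, cur_center, nbr_center, bv_nbr):
    """Flip-aware BV adjustment when inheriting a BV from a flipped neighbor."""
    x_cur, y_cur = cur_center
    x_nbr, y_nbr = nbr_center
    bv_h, bv_v = bv_nbr
    if flip_type == "horizontal":
        # BV_cur^h = 2*(x_nbr - x_cur) + BV_nbr^h; vertical component inferred as 0
        return (2 * (x_nbr - x_cur) + bv_h, 0)
    if flip_type == "vertical":
        # BV_cur^v = 2*(y_nbr - y_cur) + BV_nbr^v; horizontal component inferred as 0
        return (0, 2 * (y_nbr - y_cur) + bv_v)
    return (bv_h, bv_v)
```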
Intra template matching prediction (IntraTMP) may be an intra prediction mode for the current block, for example in ECM software. IntraTMP may copy the best prediction block (e.g., the prediction block with the smallest difference from the current block) from the reconstructed portion of the current frame, where the L-shaped template of the reconstructed portion matches the current template of the current block. Within a predefined search range, the encoder may search for the template most similar to the current template in the reconstructed portion of the current frame and use the corresponding block as the prediction block for the current block. The encoder may then signal the use of the IntraTMP mode, and the same prediction operation may be performed at the decoder side.
The prediction signal may be generated by matching the L-shaped causal neighborhood of the current block with another block in a predefined search area. An exemplary IntraTMP process is shown in fig. 7. As shown in fig. 7, a current block (702) in a current frame (704) may include an L-shaped template (706) adjacent to the current block (702). The search area for determining the reference block of the current block (702) may include (1) R1: the current CTU, (2) R2: the top-left CTU, (3) R3: the top CTU, and (4) R4: the left CTU. In the search process, a cost function such as the sum of absolute differences (SAD) may be used. Within each region (e.g., R1, R2, R3, and R4), the decoder may search for the template (e.g., (710)) having the minimum SAD with respect to the current template (e.g., the template (706)), and the block corresponding to that template (e.g., (708)) may be used as the prediction block. The dimensions of all regions (SearchRange_w, SearchRange_h) may be set proportional to the block dimensions (BlkW, BlkH) of the current block (702) so as to have a fixed number of SAD comparisons per pixel. For example, the dimensions of the search area may be defined by the following equations (1) and (2):
SearchRange_w = a × BlkW    (1)
SearchRange_h = a × BlkH    (2)
where a may be a constant that controls the gain/complexity trade-off. In one example, a is equal to 5.
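For example, equations (1) and (2) can be evaluated as follows (a worked check assuming a = 5, the example value from the text):

```python
A = 5  # example constant controlling the gain/complexity trade-off

def intra_tmp_search_range(blk_w: int, blk_h: int, a: int = A):
    """SearchRange_w = a * BlkW and SearchRange_h = a * BlkH, per equations (1)-(2)."""
    return a * blk_w, a * blk_h

# A 16x8 current block yields an 80x40 search region in each of R1..R4.
assert intra_tmp_search_range(16, 8) == (80, 40)
```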
An intra template matching tool may be enabled for CUs having a width and/or height less than or equal to 64. The maximum CU size for intra template matching can be configured.
When decoder-side intra mode derivation (DIMD) is not used for the current CU, the intra template matching prediction mode can be signaled at the CU level through a dedicated flag.
Although the IntraTMP mode allows template matching with the L-shaped template as described above, introducing multiple types of templates for template matching may be advantageous for improving the accuracy of template matching and thereby the coding performance.
When the partition trees of the luma and chroma components are different, for example in VVC, the IBC mode may not be applied to the chroma color components. However, applying IBC to the chroma components may provide coding gain. When IBC is applied to a chroma component, for example on top of ECM, it needs to be designed how the luma IBC mode information (e.g., the IBC flip mode) is reused for the chroma components.
The IntraTMP mode assumes that there is no flip between the reference block and the current block, which may limit the coding performance of the IntraTMP mode.
In the present disclosure, a reconstruct-reorder (or flip) mode may be applied to intra template matching (i.e., RR-ITM). For a current block predicted using intra template matching, when applying RR-ITM, samples in a reconstructed block of the current block may be flipped according to the flip type of the current block. For example, on the encoder side, the original block (or current block) may be flipped before motion search and residual calculation, while the prediction block may be derived without flipping. At the decoder side, the reconstructed block may be flipped back to recover the original block.
In an example, on the encoder side, a flip mode, such as horizontal flip or vertical flip, may be determined for a current block in a current picture. The current block may be flipped (e.g., vertically or horizontally) such that the positions of the samples of the current block are flipped within the current block based on the determined flip mode. Based on template matching (TM) costs, a prediction block for the flipped current block may be determined from a plurality of candidate prediction blocks in the current picture. The TM cost may indicate a difference between the template of the flipped current block and the respective templates of the plurality of candidate prediction blocks. Furthermore, the current block may be encoded based on the determined prediction block. In an example, a residual block indicating a difference between the flipped current block and the determined prediction block may be encoded. In an example, the determined prediction block may also be flipped based on the flip mode, and a residual block indicating a difference between the flipped current block and the flipped prediction block may be encoded.
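A condensed, non-normative sketch of this encoder-side flow follows; `find_prediction` stands in for the template-matching search over the reconstructed region, and the names are illustrative:

```python
import numpy as np

def rr_itm_encode(cur_block: np.ndarray, flip_mode: str, find_prediction):
    """Encoder-side RR-ITM sketch: flip the original block first, derive the
    prediction block without flipping, and take the residual against the
    flipped block."""
    axis = 0 if flip_mode == "vertical" else 1
    flipped = np.flip(cur_block, axis=axis)    # flip before motion search
    pred = find_prediction(flipped)            # stand-in for the TM search
    residual = flipped.astype(np.int32) - pred.astype(np.int32)
    return residual
```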
In an example, the flip mode may include a vertical flip configured to adjust a position of a sample of a block (e.g., a current block, a reconstructed block of the current block, a predicted block of the current block, or a reference block of the current block) such that an upper portion and a lower portion of the block are inverted within the block relative to a horizontal split line, wherein the block is divided equally into the upper portion and the lower portion by the horizontal split line. In an example, the flip mode may include a horizontal flip configured to adjust a position of a sample of the block such that left and right portions of the block are inverted within the block relative to a vertical split line, wherein the block is divided equally into left and right portions by the vertical split line.
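Purely as a non-normative sketch (NumPy-based; the helper name flip_block is hypothetical), the two flip types described above amount to mirroring the sample positions about the corresponding split line:

```python
import numpy as np

def flip_block(block: np.ndarray, mode: str) -> np.ndarray:
    """Adjust sample positions within a block according to the flip mode.

    'vertical' inverts the upper and lower portions relative to the
    horizontal split line (a top-bottom mirror); 'horizontal' inverts the
    left and right portions relative to the vertical split line (a
    left-right mirror).
    """
    if mode == "vertical":
        return block[::-1, :]   # reverse row order: top <-> bottom
    if mode == "horizontal":
        return block[:, ::-1]   # reverse column order: left <-> right
    return block                # 'none': no adjustment

blk = np.arange(16).reshape(4, 4)
# Both flips are involutive: applying the same flip twice restores the block.
assert np.array_equal(flip_block(flip_block(blk, "vertical"), "vertical"), blk)
```

Because each flip is its own inverse, the decoder can undo the encoder-side flip by applying the same flip type again.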
In an example, at the decoder side, a video bitstream including encoding information of a current block in a current image may be received. The encoding information may indicate that the current block is encoded by a flip mode in which the positions of the samples of the current block were flipped within the current block when the current block was encoded at the encoder. Based on a template matching (TM) cost, a reference block may be determined for the current block from a plurality of candidate reference blocks in the reconstructed region of the current image. The TM cost may indicate a difference between the template of the current block and the respective templates of the plurality of candidate reference blocks. A reconstructed block of the current block may be determined based on the determined reference block. The current block may be reconstructed by adjusting the positions of the samples of the reconstructed block based on the flip mode. In an example, the reconstructed block may be determined as the sum of the determined reference block and a residual block, where the residual block is obtained from the encoded information. In an example, the determined reference block may first be flipped according to the flip mode, and the reconstructed block may be determined as the sum of the flipped reference block and the residual block.
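Reusing the hypothetical flip_block helper from the sketch above, the two decoder-side reconstruction orderings just described may be sketched as follows (illustrative only; array shapes are assumed to match):

```python
import numpy as np

def reconstruct_add_then_flip(ref: np.ndarray, residual: np.ndarray, mode: str) -> np.ndarray:
    """Variant 1: reconstructed block = reference + residual, then the sample
    positions of the reconstructed block are adjusted per the flip mode."""
    return flip_block(ref + residual, mode)

def reconstruct_flip_then_add(ref: np.ndarray, residual: np.ndarray, mode: str) -> np.ndarray:
    """Variant 2: the determined reference block is flipped first, and the
    reconstructed block is the sum of the flipped reference and the residual."""
    return flip_block(ref, mode) + residual
```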
In an example, a plurality of candidate reference blocks may be determined within a search region in a reconstruction region of a current image. A TM cost between the template of the current block and the template of each of the plurality of candidate reference blocks may be determined. The reference block may be determined from a plurality of candidate reference blocks, the reference block corresponding to a minimum TM cost among TM costs between a template of the current block and templates of the plurality of candidate reference blocks.
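The minimum-cost selection described above may be sketched as follows (illustrative only; here the TM cost is SAD, and each candidate is modeled as a hypothetical (template, block) pair gathered from the search area):

```python
import numpy as np

def sad(a: np.ndarray, b: np.ndarray) -> int:
    """Sum of absolute differences, used here as the TM cost."""
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def best_reference(cur_template: np.ndarray,
                   candidates: list[tuple[np.ndarray, np.ndarray]]) -> np.ndarray:
    """Return the candidate reference block whose template has the minimum
    TM cost against the current block's template."""
    costs = [sad(cur_template, tmpl) for tmpl, _ in candidates]
    return candidates[int(np.argmin(costs))][1]
```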
In one aspect, multiple flipping methods may be supported for an RR-ITM encoded block. In one example, the plurality of flipping methods includes two flipping methods, horizontal flipping and vertical flipping. In one aspect, one or more syntax elements may be signaled to indicate whether the reconstructed block is flipped and/or to indicate a type of flip. In an example, a syntax flag may first be signaled to indicate whether to flip the reconstructed block. If the syntax flag indicates that the reconstructed block is to be flipped, another flag may be further signaled to specify a flip type (e.g., horizontal flip or vertical flip).
In an example, the first encoding information (e.g., a first syntax flag) of the received video bitstream may indicate whether a flip mode is applied to the reconstructed block. The type of the flip mode is determined based on second encoding information (e.g., a second syntax flag) of the received video bitstream in response to the first encoding information indicating that the flip mode is to be applied to the reconstructed block. Based on the determined type of the flip mode, the current block may be reconstructed by adjusting the positions of the samples of the reconstructed block.
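A minimal decoder-side sketch of this two-flag scheme follows (illustrative only; the mapping of flag values to flip types is an assumption, since the description does not fix which value selects which type):

```python
def parse_flip_mode(read_flag) -> str:
    """Decode the flip signaling: a first flag tells whether the reconstructed
    block is flipped at all; only then is a second flag read for the type."""
    if not read_flag():          # first syntax flag: is a flip applied?
        return "none"
    return "horizontal" if read_flag() else "vertical"  # second flag: flip type

# Example with a canned bit source: bits [1, 0] -> flip applied, type "vertical".
bits = iter([1, 0])
print(parse_flip_mode(lambda: next(bits)))  # vertical
```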
In one aspect, one or more different search scopes may be applied in the event of a flip. In an example, when template matching is applied, different search ranges of the template matching process may be applied for horizontal flip and vertical flip.
In one aspect, for the horizontal flip mode, the vertical extent of the template matching search candidate locations (or the candidate search locations for template matching) may be less than the horizontal extent. For example, the search candidate positions for a horizontal flip may have the same vertical coordinate as the current block. In one aspect, for the vertical flip mode, the horizontal extent of the template matching search candidate locations may be less than the vertical extent. For example, the search candidate positions may have the same horizontal coordinate as the current block.
In one example, the vertical extent of the search area is less than the horizontal extent of the search area based on the flip mode being a horizontal flip. In one example, the vertical extent of the search area is greater than the horizontal extent of the search area based on the flip mode being a vertical flip.
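As a sketch of this asymmetry (illustrative only; it implements the extreme example above, where the shrunk dimension collapses to the current block's own row or column):

```python
def flip_aware_search_range(blk_w: int, blk_h: int, mode: str,
                            a: int = 5) -> tuple[int, int]:
    """Search-range extents adapted to the flip type."""
    w, h = a * blk_w, a * blk_h
    if mode == "horizontal":
        h = 0   # candidates keep the current block's vertical coordinate
    elif mode == "vertical":
        w = 0   # candidates keep the current block's horizontal coordinate
    return w, h  # the non-flipped mode keeps the full (w, h) search range
```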
In one aspect, the horizontal flip and/or vertical flip may have the same search candidates as the non-flipped mode.
In one aspect, when different coding block partitions are applied to the luma component and the chroma component of an image, and if the IBC mode is applied to a luma region, the related mode information of the luma region may be reused for predicting the collocated chroma color component (or chroma block). Further, the luma region corresponding to a chroma block may cover more than one luma coded block, and the boundaries of the luma blocks may not be completely aligned with the boundary of the chroma block.
In one aspect, the mode information reused for chroma may include, but is not limited to: the IBC flip mode (e.g., horizontal flip or vertical flip), the block vector (BV), the IBC rotation mode (e.g., rotation angle), and the IBC geometric segmentation mode (e.g., segmentation mode). In one aspect, when luma mode information is applied to chroma samples, the luma mode information used in calculations, such as BV components, size-related parameters, and distance-related parameters, may be scaled according to the chroma sampling format.
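For instance, a luma BV may be scaled to chroma resolution as follows (a sketch only; the subsampling factors are the standard ones for each chroma format, and integer division stands in for the codec's actual rounding rule, which may differ):

```python
def scale_luma_bv_for_chroma(bv_x: int, bv_y: int,
                             chroma_format: str = "4:2:0") -> tuple[int, int]:
    """Scale a luma block vector to chroma resolution per the sampling format."""
    sub_x, sub_y = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}[chroma_format]
    return bv_x // sub_x, bv_y // sub_y

print(scale_luma_bv_for_chroma(-8, 4))  # (-4, 2) for 4:2:0
```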
In one aspect, for some IBC modes, a chroma block may not inherit IBC mode information from a luma region (or luma block). Exemplary IBC modes may include, but are not limited to, IBC horizontal flip mode and/or IBC vertical flip mode.
In the present disclosure, it may be determined whether a characteristic value of a collocated luma block of a current chroma block is greater than a predefined threshold. The characteristic value may be associated with one or more predefined coding modes applied to the collocated luma block. Based on the characteristic value being greater than the predefined threshold, chroma coding mode information for the current chroma block may be determined from luma coding mode information for the predefined luma block. In one example, the characteristic value indicates one of: (i) a number of luma collocated block positions in the collocated luma block encoded by the one or more predefined coding modes, (ii) a size of the luma region in the collocated luma block encoded by the one or more predefined coding modes, and (iii) a ratio of the luma region in the collocated luma block encoded by the one or more predefined coding modes. In an example, the one or more predefined coding modes may include at least one of: Intra Block Copy (IBC) flip mode, IBC mode indicated by Block Vectors (BV), IBC rotation mode, and IBC geometry segmentation mode. In an example, the chroma coding mode information for the current chroma block may be determined based on scaled luma coding mode information for the predefined luma block according to a scaling factor, wherein the scaling factor is determined based on a sampling format of the current chroma block.
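The threshold test may be sketched as follows (illustrative only; LumaStats is a hypothetical container for statistics precomputed over the collocated luma block for the predefined coding modes):

```python
from dataclasses import dataclass

@dataclass
class LumaStats:
    n_positions: int    # predefined luma positions coded by the predefined modes
    covered_area: int   # luma samples covered by the predefined modes
    total_area: int     # total samples in the collocated luma region

def inherit_from_luma(stats: LumaStats, thr: float, criterion: str) -> bool:
    """Compare one of the three characteristic values above against a threshold."""
    value = {"count": stats.n_positions,
             "size": stats.covered_area,
             "ratio": stats.covered_area / stats.total_area}[criterion]
    return value > thr

# Example: 3 of 5 predefined positions are IBC-coded and the threshold thr1 is 2.
print(inherit_from_luma(LumaStats(3, 192, 256), thr=2, criterion="count"))  # True
```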
In one aspect, a set of luma collocated block positions may be predefined for the current chroma block. If the number of luma positions encoded by one or more predefined IBC modes exceeds thr1, the IBC mode information associated with a predefined luma block may be inherited for the current chroma block. thr1 may be a predefined threshold or may be signaled, for example, by a high-level syntax (e.g., at the sequence level, image level, slice level, etc.). In an example, the set of luma collocated block positions may include, but is not limited to, a center position and four corner positions of the collocated luma block of the current chroma block.
In one aspect, the IBC mode information associated with a first IBC encoded block along a given scan order may be acquired and reused for the collocated chroma block. In one aspect, the IBC mode information most commonly used at the predefined luma collocated block positions may be acquired and reused for the collocated chroma block. In one aspect, the first IBC mode information that has been used N or more times along a given scan order may be acquired and reused for the collocated chroma block. Exemplary values of N may be positive integers, such as 1, 2, 3, or 4.
In one aspect, in a collocated luma block region of a current chroma block, if a feature value (e.g., a luma region size covered by one or more predefined IBC modes) is greater than thr2, IBC mode information associated with the predefined luma block may be inherited for the current chroma block. thr2 may be a predefined threshold or may be signaled, for example, by a high level syntax (e.g., at sequence level, image level, slice level, etc.).
In one aspect, the IBC mode information associated with a first IBC encoded block along a given scan order may be acquired and reused for the collocated chroma block. In one aspect, the IBC mode information most commonly used at the predefined luma collocated block positions may be acquired and reused for the collocated chroma block. In one aspect, the first IBC mode information that has been used N or more times along a given scan order may be acquired and reused for the collocated chroma block. Exemplary values of N may be positive integers, such as 1, 2, 3, or 4.
In one aspect, in the collocated luma block region, if the feature value (e.g., the ratio of the collocated luma block region covered by one or more predefined IBC modes) is greater than thr3, IBC mode information associated with the predefined luma blocks may be inherited for the chroma blocks. thr3 may be a predefined threshold or may be signaled, for example, by a high level syntax (e.g., at sequence level, image level, slice level, etc.).
In one aspect, the IBC mode information associated with a first IBC encoded block along a given scan order may be acquired and reused for the collocated chroma block. In one aspect, the IBC mode information most commonly used at the predefined luma collocated block positions may be acquired and reused for the collocated chroma block. In one aspect, the first IBC mode information that has been used N or more times along a given scan order may be acquired and reused for the collocated chroma block. Exemplary values of N may be positive integers, such as 1, 2, 3, or 4.
In an example, the chroma coding mode information for the current chroma block may be determined based on one of: (i) luma coding mode information of a first luma block that is coded, along a predefined scanning order, by one of the one or more predefined coding modes; (ii) the luma coding mode information most commonly used among the luma collocated block positions coded by the one or more predefined coding modes; and (iii) first luma coding mode information of the one or more predefined coding modes that is used N times or more along the predefined scanning order, where N may be a positive integer.
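These three selection rules may be sketched as follows (illustrative only; scanned_modes is a hypothetical list of the mode information found at the predefined luma positions in scan order, with None marking positions not coded by the predefined modes):

```python
from collections import Counter

def pick_luma_mode_info(scanned_modes: list, n: int, rule: str):
    """Select which luma mode information the current chroma block inherits."""
    coded = [m for m in scanned_modes if m is not None]
    if not coded:
        return None
    if rule == "first":          # (i) first coded block along the scan order
        return coded[0]
    if rule == "most_used":      # (ii) most frequently used mode information
        return Counter(coded).most_common(1)[0][0]
    seen = Counter()             # (iii) first mode information used >= n times
    for m in coded:
        seen[m] += 1
        if seen[m] >= n:
            return m
    return None
```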
In one aspect, specific collocated luma positions may be predefined. If the sample at a predefined luma position is coded by the IBC mode, the IBC mode information associated with that sample may be inherited for the chroma block. Examples of specific collocated luma positions may include, but are not limited to, the center position of the collocated luma block of the chroma block, or the top-left position of the collocated luma block.
In an example, chroma coding mode information for a current chroma block may be determined based on luma coding mode information for samples at predefined collocated luma positions. The samples at the predefined collocated luminance locations may be encoded by one of one or more predefined encoding modes, and the predefined collocated luminance locations may include one of a center location and an upper left corner location of the collocated luminance block.
In one aspect, when different coded block partitions are applied to the luma component and the collocated chroma component of the image, and if the ITM mode (or IntraTMP) is applied to the luma component, the relevant mode information for the luma component may be reused to predict the collocated chroma color component.
In one aspect, the mode information reused for the chroma block may include, but is not limited to: intraTMP flip mode (e.g., horizontal flip or vertical flip), displacement vector for IntraTMP.
In one aspect, for some IntraTMP modes, the chroma block may not inherit IntraTMP mode information from the luma block. Exemplary IntraTMP modes may include, but are not limited to, intraTMP horizontal flip mode and/or IntraTMP vertical flip mode.
In one aspect, a set of luma collocated block positions may be predefined for the current chroma block. If the number of luma collocated block positions encoded by one or more predefined IntraTMP modes exceeds thr1, the IntraTMP mode information associated with a predefined luma block may be inherited for the current chroma block. thr1 may be a predefined threshold or may be signaled, for example, by a high-level syntax (e.g., at the sequence level, image level, slice level, etc.). In an example, the set of luma collocated block positions may include, but is not limited to, a center position and four corner positions of the collocated luma block of the current chroma block.
In an example, the one or more predefined encoding modes include at least one of an intra template matching prediction (IntraTMP) flip mode and an IntraTMP mode indicated by the displacement vector.
In one aspect, the IntraTMP mode information associated with a first IntraTMP encoded block along a given scan order may be acquired and reused for the collocated chroma block. In one aspect, the IntraTMP mode information most commonly used at the predefined luma collocated block positions may be acquired and reused for the collocated chroma block. In one aspect, the first IntraTMP mode information that has been used N or more times along a given scan order may be acquired and reused for the collocated chroma block. Exemplary values of N may be positive integers, such as 1, 2, 3, or 4.
In one aspect, in a collocated luma block region of a chroma block, if the luma region size covered by one or more predefined IntraTMP modes is greater than thr2, then IntraTMP mode information associated with the predefined luma block may be inherited for the chroma block. thr2 may be a predefined threshold or may be signaled, for example, by a high level syntax (e.g., at the sequence level, image level, slice level, etc.).
In one aspect, the IntraTMP mode information associated with a first IntraTMP encoded block along a given scan order may be acquired and reused for the collocated chroma block. In one aspect, the IntraTMP mode information most commonly used at the predefined luma collocated block positions may be acquired and reused for the collocated chroma block. In one aspect, the first IntraTMP mode information that has been used N or more times along a given scan order may be acquired and reused for the collocated chroma block. Exemplary values of N may be positive integers, such as 1, 2, 3, or 4.
In one aspect, in the collocated luma block region of the chroma block, if the ratio of the collocated luma block region covered by one or more predefined IntraTMP modes is greater than thr3, the IntraTMP mode information associated with the predefined luma blocks may be inherited for the chroma block. thr3 may be a predefined threshold or may be signaled, for example, by a high-level syntax (e.g., at the sequence level, image level, slice level, etc.).
In one aspect, the IntraTMP mode information associated with a first IntraTMP encoded block along a given scan order may be acquired and reused for the collocated chroma block. In one aspect, the IntraTMP mode information most commonly used at the predefined luma collocated block positions may be acquired and reused for the collocated chroma block. In one aspect, the first IntraTMP mode information that has been used N or more times along a given scan order may be acquired and reused for the collocated chroma block. Exemplary values of N may be positive integers, such as 1, 2, 3, or 4.
In one aspect, the search candidates pointed to by the block vectors of associated luma IntraTMP blocks (scaled by a factor based on the sub-sampling ratio) may be searched to find the best (or selected) template match with the smallest template matching cost. In an example, the associated luma IntraTMP block may include: (i) a first IntraTMP encoded block along a predefined scan order, (ii) the collocated luma block, and (iii) an IntraTMP encoded block corresponding to the first IntraTMP mode information that is used N or more times.
In one aspect, the flip mode of each search candidate may be inherited from the flip mode of the associated IntraTMP luma block. In one aspect, template matching may be performed with all possible flip modes (e.g., vertical flip or horizontal flip) for each candidate to find the best (or selected) flip mode with the smallest template matching cost. In one aspect, a flip mode may be signaled for each chroma coding unit.
In an example, based on one or more predefined coding modes including the IntraTMP mode, a plurality of candidate reference chroma blocks may be determined within a search range in the current image indicated by a block vector (BV) of the current chroma block. The BV of the current chroma block may be determined based on one of a plurality of associated luma blocks. The plurality of associated luma blocks includes: (i) a first IntraTMP encoded block along a predefined scan order, (ii) the collocated luma block, and (iii) an IntraTMP encoded block corresponding to the first IntraTMP mode information used N or more times along the predefined scan order. A plurality of flip modes may be determined for each of the plurality of candidate reference chroma blocks. A TM cost between the template of each candidate reference chroma block and the template of the current chroma block may be determined based on each of the plurality of flip modes. The flip mode corresponding to the minimum TM cost among these TM costs may then be determined, and the flip mode of the current chroma block may be set to that flip mode.
In an example, the BV of the current chroma block may be determined as a scaled BV of one of the plurality of associated luma blocks based on a scaling factor, wherein the scaling factor may be determined based on a sub-sampling ratio associated with the current chroma block.
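Combining the pieces above, a sketch follows (illustrative only, reusing the hypothetical sad and flip_block helpers from earlier sketches; whether the candidate's template is itself flipped when scoring a flip hypothesis is an assumption, since the description does not pin down the exact cost definition):

```python
def pick_chroma_flip_mode(cur_template, candidates,
                          flip_modes=("none", "horizontal", "vertical")):
    """Evaluate every flip mode on every candidate reference chroma block
    (reached via scaled luma BVs) and keep the pair with the minimum TM cost."""
    best = None
    for tmpl, block in candidates:          # candidates: (template, block) pairs
        for mode in flip_modes:
            cost = sad(cur_template, flip_block(tmpl, mode))
            if best is None or cost < best[0]:
                best = (cost, block, mode)
    if best is None:
        return None, "none"
    _, block, mode = best
    return block, mode
```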
In one aspect, specific collocated luma positions may be predefined. If the sample at a predefined luma position is coded by an IntraTMP mode, the IntraTMP mode information associated with that sample may be inherited for the chroma block. In an example, the specific collocated luma position may include, but is not limited to, the center position of the collocated luma region of the current chroma block.
Fig. 8 shows a flowchart outlining a process (800) according to one embodiment of the present disclosure. The process (800) may be used for a video decoder. In various embodiments, the process (800) is performed by processing circuitry, such as processing circuitry that performs the functions of the video decoder (110), processing circuitry that performs the functions of the video decoder (210), and so forth. In some embodiments, the process (800) may be implemented by software instructions, so that when the processing circuitry executes the software instructions, the processing circuitry may perform the process (800). The process starts (S801) and proceeds to step (S810).
At (S810), a video bitstream is received, the video bitstream including encoding information of a current block in a current picture. The encoding information indicates that the current block is encoded by a flip mode in which a position of a sample of the current block is adjusted within the current block.
At (S820), a reference block is determined for the current block from a plurality of candidate reference blocks in a reconstructed region of the current image based on a Template Matching (TM) cost. The TM cost indicates a difference between the template of the current block and respective templates of the plurality of candidate reference blocks.
At (S830), a reconstructed block of the current block is determined based on the determined reference block.
At (S840), the current block is reconstructed by adjusting the positions of the samples of the reconstructed block within the reconstructed block based on the flip mode.
In one example, the flip mode includes one of: (i) A vertical flip mode configured to adjust positions of samples of the current block such that upper and lower portions of the current block are inverted within the current block, and (ii) a horizontal flip mode configured to adjust positions of samples of the current block such that left and right portions of the current block are inverted within the current block.
In one example, first encoded information is received from a received video bitstream. The first encoding information indicates whether a roll-over mode is applied to the reconstructed block. The type of the flipped mode is determined based on the second encoded information in the received video bitstream in response to the first encoded information indicating that the flipped mode is to be applied to the reconstructed block. Based on the determined type of the flip mode, the current block is reconstructed by adjusting the positions of the samples of the reconstructed block.
In an example, a plurality of candidate reference blocks are determined within a search region in a reconstruction region of a current image. A TM cost is determined between the template of the current block and the template of each of the plurality of candidate reference blocks. A reference block is determined from the plurality of candidate reference blocks, the reference block corresponding to a minimum TM cost of TM costs between the template of the current block and the templates of the plurality of candidate reference blocks.
In one example, the vertical extent of the search area is less than the horizontal extent of the search area based on the flip mode being a horizontal flip. In one example, the vertical extent of the search area is greater than the horizontal extent of the search area based on the flip mode being a vertical flip.
In one example, the determined reference block is further flipped by adjusting the positions of the samples of the determined reference block within the determined reference block based on the flip mode. The reconstructed block is then determined based on the flipped reference block.
Then, the process proceeds to (S899) and terminates.
The process (800) may be suitably adapted. One or more steps in the process (800) may be modified and/or omitted. One or more additional steps may be added. The steps may be performed in any suitable order.
Fig. 9 shows a flowchart outlining a process (900) according to one embodiment of the present disclosure. The process (900) may be used in a video encoder. In various embodiments, the process (900) is performed by processing circuitry, such as processing circuitry that performs the functions of the video encoder (103), processing circuitry that performs the functions of the video encoder (303), and so forth. In some embodiments, the process (900) may be implemented by software instructions, so that when the processing circuitry executes the software instructions, the processing circuitry may perform the process (900). The process starts from (S901), and proceeds to step (S910).
At (S910), a current block in a current image is flipped by a flipping mode in which a position of a sample of the current block is adjusted within the current block.
At (S920), a reference block is determined for the flipped current block from a plurality of candidate reference blocks in a reconstruction region of the current image based on a Template Matching (TM) cost, wherein the TM cost indicates a difference between a template of the flipped current block and respective templates of the plurality of candidate reference blocks.
At (S930), the flipped current block is encoded based on the determined reference block.
At (S940), encoding information indicating that the current block is encoded by the flip mode is signaled.
Then, the process proceeds to (S999) and ends.
The process (900) may be suitably adapted. One or more steps of the process (900) may be modified and/or omitted. One or more additional steps may be added. The steps may be performed in any suitable order.
Fig. 10 shows a flowchart outlining a process (1000) according to one embodiment of the present disclosure. The process (1000) may be used for a video decoder. In various embodiments, the process (1000) is performed by processing circuitry, such as processing circuitry that performs the functions of the video decoder (110), processing circuitry that performs the functions of the video decoder (210), and so forth. In some embodiments, the process (1000) may be implemented by software instructions, so that when the processing circuit executes the software instructions, the processing circuit may perform the process (1000). The process starts from (S1001), and proceeds to step (S1010).
At (S1010), a video bitstream is received, the video bitstream including a current chroma block in a current image and a collocated luma block of the current chroma block.
At (S1020), it is determined whether the characteristic value of the collocated luminance block is greater than a predefined threshold. The characteristic value is associated with one or more predefined coding modes applied to the collocated luma block.
At (S1030), chroma coding mode information of the current chroma block is determined from luma coding mode information of the predefined luma block based on the feature value being greater than the predefined threshold.
At (S1040), the current chroma block is reconstructed based on the determined chroma coding-mode information.
In one example, the characteristic value indicates one of: (i) a number of luma co-located block positions in the co-located luma block encoded by the one or more predefined coding modes, (ii) a size of a luma region in the co-located luma block encoded by the one or more predefined coding modes, and (iii) a ratio of luma regions in the co-located luma block encoded by the one or more predefined coding modes.
In one example, the luma collocated block positions include a center position and four corner positions of the collocated luma block.
In an example, the one or more predefined coding modes include at least one of: intra Block Copy (IBC) flip mode, IBC mode indicated by Block Vectors (BV), IBC rotation mode, and IBC geometry segmentation mode.
In an example, the one or more predefined encoding modes include at least one of an intra template matching prediction (IntraTMP) flip mode and an IntraTMP mode indicated by the displacement vector.
In an example, the chroma coding mode information for the current chroma block is determined based on one of: (i) luma coding mode information of a first luma block that is coded, along a predefined scan order, by one of the one or more predefined coding modes; (ii) the luma coding mode information most commonly used among the luma collocated block positions coded by the one or more predefined coding modes; and (iii) first luma coding mode information of the one or more predefined coding modes that is used N times or more along the predefined scan order, where N is a positive integer.
In an example, chroma coding mode information for a current chroma block is determined based on luma coding mode information for samples at predefined collocated luma positions. The samples at the predefined collocated luminance locations are encoded by one of the one or more predefined encoding modes, and the predefined collocated luminance locations include one of a center location and an upper left corner location of the collocated luminance block.
In an example, based on one or more predefined coding modes including the IntraTMP mode, a plurality of candidate reference chroma blocks is determined within a search range in the current image indicated by a block vector (BV) of the current chroma block. The BV of the current chroma block is determined based on one of a plurality of associated luma blocks. The plurality of associated luma blocks includes: (i) a first IntraTMP encoded block along a predefined scan order, (ii) the collocated luma block, and (iii) an IntraTMP encoded block corresponding to the first IntraTMP mode information used N or more times along the predefined scan order. A plurality of flip modes is determined for each of the plurality of candidate reference chroma blocks. A TM cost between the template of each of the plurality of candidate reference chroma blocks and the template of the current chroma block is determined based on each of the plurality of flip modes. The flip mode corresponding to the minimum TM cost among these TM costs is determined, and the flip mode of the current chroma block is set to that flip mode.
In an example, a BV of the current chroma block is determined as a scaled BV of one of the plurality of associated luma blocks based on a scaling factor, wherein the scaling factor is determined based on a sub-sampling ratio associated with the current chroma block.
In an example, chroma coding mode information for a current chroma block is determined based on scaled luma coding mode information for a predefined luma block according to a scaling factor, wherein the scaling factor is determined based on a sampling format of the current chroma block.
Then, the process proceeds to (S1099) and ends.
The process (1000) may be suitably adapted. One or more steps in the process (1000) may be modified and/or omitted. One or more additional steps may be added. The steps may be performed in any suitable order.
FIG. 11 shows a flowchart outlining a process (1100) according to one embodiment of the present disclosure. The process (1100) may be used in a video encoder. In various embodiments, the process (1100) is performed by processing circuitry, e.g., processing circuitry that performs the functions of the video encoder (103), processing circuitry that performs the functions of the video encoder (303), etc. In some embodiments, the process (1100) may be implemented by software instructions, so that when the processing circuitry executes the software instructions, the processing circuitry may perform the process (1100). The process starts (S1101) and proceeds to step (S1110).
At (S1110), it is determined whether a feature value of a collocated luma block of a current chroma block in a current image is greater than a predefined threshold. The characteristic value is associated with one or more predefined coding modes applied to the collocated luma block.
At (S1120), chroma coding mode information of a current chroma block is determined from luma coding mode information of a predefined luma block in the current image based on the feature value being greater than a predefined threshold.
At (S1130), the current chroma block is encoded based on the determined chroma coding-mode information.
Then, the process proceeds to (S1199) and ends.
The process (1100) may be suitably adapted. One or more steps of the process (1100) may be modified and/or omitted. One or more additional steps may be added. The steps may be performed in any suitable order.
The techniques described above may be implemented as computer software using computer readable instructions and physically stored in one or more computer readable media. For example, FIG. 12 illustrates a computer system (1200) suitable for implementing certain embodiments of the disclosed subject matter.
The computer software may be coded using any suitable machine code or computer language that may be subject to assembly, compilation, linking, or similar mechanisms to create code comprising instructions that may be executed directly, or through interpretation, microcode execution, and the like, by one or more computer central processing units (CPUs), graphics processing units (GPUs), and the like.
The instructions may be executed on various types of computers or components thereof, such as computers or components thereof including personal computers, tablet computers, servers, smartphones, gaming devices, internet of things devices, and the like.
The components shown in fig. 12 for computer system (1200) are exemplary in nature, and are not intended to suggest any limitation as to the scope of use or functionality of the computer software implementing embodiments of the present disclosure. Nor should the configuration of components be construed as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary embodiment of the computer system (1200).
The computer system (1200) may include some human interface input devices. Such human interface input devices may be responsive to one or more human users' inputs by, for example, the following: tactile input (e.g., key strokes, data glove movements), audio input (e.g., voice, clapping hands), visual input (e.g., gestures), olfactory input (not depicted). The human interface device may also be used to obtain certain media that are not necessarily directly related to the conscious input of a person, such as audio (e.g., speech, music, ambient sound), pictures (e.g., scanned pictures, photographic pictures taken from still picture cameras), video (e.g., two-dimensional video, three-dimensional video including stereoscopic video).
The input human interface device may include one or more of the following (only one shown in each): a keyboard (1201), a mouse (1202), a touch pad (1203), a touch screen (1210), a data glove (not shown), a joystick (1205), a microphone (1206), a scanner (1207), a camera (1208).
The computer system (1200) may also include some human interface output devices. Such human interface output devices may stimulate one or more human user senses, for example, through tactile output, sound, light, and smell/taste. Such human interface output devices may include tactile output devices (e.g., tactile feedback from the touch screen (1210), the data glove (not shown), or the joystick (1205), though there may also be tactile feedback devices that do not serve as input devices), audio output devices (e.g., speaker (1209), headphones (not depicted)), visual output devices (e.g., screens (1210), including CRT screens, LCD screens, plasma screens, and OLED screens, each with or without touch-screen input capability and with or without tactile feedback capability, some of which can output two-dimensional visual output or more-than-three-dimensional output through means such as stereoscopic output, virtual reality glasses (not depicted), holographic displays, and smoke tanks (not depicted)), and printers (not depicted).
The computer system (1200) may also include human-accessible storage devices and their associated media, such as optical media including CD/DVD ROM/RW (1220) with CD/DVD or similar media (1221), thumb drives (1222), removable hard disk drives or solid-state drives (1223), legacy magnetic media such as magnetic tape and floppy disk (not depicted), specialized ROM/ASIC/PLD-based devices such as security dongles (not depicted), and the like.
It should also be appreciated by those skilled in the art that the term "computer-readable medium" as used in connection with the presently disclosed subject matter does not encompass transmission media, carrier waves, or other transitory signals.
The computer system (1200) may also include an interface (1254) to one or more communication networks (1255). The networks may be, for example, wireless networks, wired networks, or optical networks. The networks may further be local, wide-area, metropolitan, vehicular and industrial, real-time, delay-tolerant, and so on. Examples of networks include local area networks such as Ethernet, wireless LANs, cellular networks including GSM, 3G, 4G, 5G, LTE, etc., television wired or wireless wide-area digital networks including cable television, satellite television, and terrestrial broadcast television, and vehicular and industrial networks including CANBus. Some networks typically require an external network interface adapter attached to some general-purpose data port or peripheral bus (1249) (e.g., a USB port of the computer system (1200)); other network interfaces are typically integrated into the core of the computer system (1200) by attachment to a system bus as described below (e.g., an Ethernet interface in a PC computer system or a cellular network interface in a smartphone computer system). The computer system (1200) may communicate with other entities using any of these networks. Such communications may be receive-only unidirectional (e.g., broadcast television), send-only unidirectional (e.g., CANbus to certain CANbus devices), or bidirectional, for example, to other computer systems using a local or wide-area digital network. Certain protocols and protocol stacks may be used on each of those networks and network interfaces, as described above.
The human interface device, human accessible storage device, and network interface described above may be attached to a kernel (1240) of a computer system (1200).
The core (1240) may include one or more Central Processing Units (CPUs) (1241), Graphics Processing Units (GPUs) (1242), special-purpose programmable processing units in the form of Field Programmable Gate Arrays (FPGAs) (1243), hardware accelerators (1244) for certain tasks, graphics adapters (1250), and the like. These devices, as well as read-only memory (ROM) (1245), random access memory (RAM) (1246), and internal mass storage (1247) such as internal non-user-accessible hard drives, SSDs, and the like, may be connected by a system bus (1248). In some computer systems, the system bus (1248) may be accessible in the form of one or more physical plugs to enable expansion by additional CPUs, GPUs, and the like. The peripheral devices may be connected directly to the system bus (1248) of the core or through a peripheral bus (1249) to the system bus (1248) of the core. In an example, a display (1210) may be connected to the graphics adapter (1250). Architectures of the peripheral bus include PCI, USB, and the like.
The CPU (1241), GPU (1242), FPGA (1243), and accelerator (1244) may execute certain instructions that, in combination, make up the computer code described above. The computer code may be stored in the ROM (1245) or RAM (1246). Transitional data may also be stored in the RAM (1246), while permanent data may be stored, for example, in the internal mass storage (1247). Fast storage and retrieval to any of the memory devices may be enabled through the use of cache memory, which may be closely associated with one or more CPUs (1241), GPUs (1242), mass storage (1247), ROM (1245), RAM (1246), and the like.
The computer-readable medium may have thereon computer code for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present disclosure, or they may be of the kind well known and available to those having skill in the computer software arts.
As a non-limiting example, a computer system having the architecture (1200), and particularly the core (1240), may provide functionality as a result of one or more processors (including CPUs, GPUs, FPGAs, accelerators, and the like) executing software embodied in one or more tangible computer-readable media. Such computer-readable media may be media associated with the user-accessible mass storage described above, as well as certain storage of the core (1240) that is of a non-transitory nature, such as the core-internal mass storage (1247) or ROM (1245). Software implementing various embodiments of the present disclosure may be stored in such devices and executed by the core (1240). The computer-readable medium may include one or more memory devices or chips, according to particular needs. The software may cause the core (1240), and in particular the processors therein (including CPUs, GPUs, FPGAs, and the like), to perform particular processes or particular portions of particular processes described herein, including defining data structures stored in the RAM (1246) and modifying such data structures according to the processes defined by the software. Additionally or alternatively, the computer system may provide functionality as a result of logic hardwired or otherwise embodied in a circuit (e.g., the accelerator (1244)), which may operate in place of or together with software to perform particular processes or particular portions of particular processes described herein. Reference to software may encompass logic, and vice versa, where appropriate. Reference to a computer-readable medium may encompass a circuit (e.g., an integrated circuit (IC)) storing software for execution, a circuit embodying logic for execution, or both, where appropriate. The present disclosure encompasses any suitable combination of hardware and software.
The use of "at least one of" or "one of" in the present disclosure is intended to include any one of the described elements or any combination thereof. For example, references to the following are intended to encompass A only, B only, C only, or any combination thereof: at least one of A, B, or C; at least one of A, B, and C; at least one of A, B, and/or C; and at least one of A to C. References to one of A or B and one of A and B are intended to encompass A or B or (A and B). Where applicable, the use of "a" or "an" does not exclude any combination of such elements, e.g., when the elements are not mutually exclusive.
While this disclosure has described a number of exemplary embodiments, there are alterations, permutations, and various substitute equivalents, which fall within the scope of this invention. It will thus be appreciated that those skilled in the art will be able to devise numerous systems and methods which, although not explicitly shown or described herein, embody the principles of the disclosure and are thus within the spirit and scope of the disclosure.

Claims (20)

1. A video decoding method, the method comprising:
Receiving a video bitstream, the video bitstream comprising encoding information of a current block in a current image, the encoding information indicating encoding the current block by a flipped mode in which positions of samples of the current block are adjusted within the current block;
determining a reference block for the current block from a plurality of candidate reference blocks in a reconstructed region of the current image based on a Template Matching (TM) cost, the TM cost indicating a difference between a template of the current block and respective templates of the plurality of candidate reference blocks;
determining a reconstructed block of the current block based on the determined reference block; and
reconstructing the current block by adjusting the positions of the samples of the reconstructed block within the reconstructed block based on the flipped mode.
2. The method of claim 1, wherein the flip mode comprises one of: (i) A vertical flip mode configured to adjust positions of samples of the current block such that upper and lower portions of the current block are inverted within the current block, and (ii) a horizontal flip mode configured to adjust positions of samples of the current block such that left and right portions of the current block are inverted within the current block.
3. The method of claim 1, wherein reconstructing the current block further comprises:
Receiving first coding information from the received video bitstream, the first coding information indicating whether to apply the flipped mode to the reconstructed block;
Determining a type of the flipped mode based on second encoding information in the received video bitstream in response to the first encoding information indicating that the flipped mode is to be applied to the reconstructed block; and
Based on the determined type of the flip mode, the current block is reconstructed by adjusting the positions of the samples of the reconstructed block.
4. The method of claim 1, wherein determining the reference block from the plurality of candidate reference blocks further comprises:
determining the plurality of candidate reference blocks in a search area of a reconstruction area of the current image;
Determining a TM cost between the template of the current block and the template of each of the plurality of candidate reference blocks; and
The reference block is determined from the plurality of candidate reference blocks, the reference block corresponding to a minimum TM cost of TM costs between the template of the current block and the templates of the plurality of candidate reference blocks.
5. The method of claim 4, wherein,
Based on the flip mode being a horizontal flip, a vertical extent of the search area is less than a horizontal extent of the search area; and
Based on the flip mode being a vertical flip, a vertical extent of the search area is greater than a horizontal extent of the search area.
6. The method of claim 1, wherein determining the reconstructed block further comprises:
Flipping the determined reference block by adjusting a position of a sample of the determined reference block within the determined reference block based on the flipping pattern; and
The reconstructed block is determined based on the flipped reference block.
7. A video decoding method, the method comprising:
Receiving a video bit stream, wherein the video bit stream comprises a current chroma block in a current image and a juxtaposed luma block of the current chroma block;
Determining whether a characteristic value of the collocated luma block is greater than a predefined threshold value, the characteristic value being associated with one or more predefined coding modes applied to the collocated luma block;
Determining chroma coding mode information of the current chroma block according to the luma coding mode information of a predefined luma block in the current image based on the characteristic value being larger than the predefined threshold value; and
Reconstructing the current chroma block based on the determined chroma coding pattern information.
8. The method of claim 7, wherein the characteristic value indicates one of: (i) a number of luma co-located block positions in the co-located luma block encoded by the one or more predefined coding modes, (ii) a size of a luma region in the co-located luma block encoded by the one or more predefined coding modes, and (iii) a ratio of luma regions in the co-located luma block encoded by the one or more predefined coding modes.
9. The method of claim 8, wherein the luma co-located block positions comprise a center position and four corner positions of the co-located luma block.
10. The method of claim 7, wherein the one or more predefined coding modes comprise at least one of: intra Block Copy (IBC) flip mode, IBC mode indicated by Block Vectors (BV), IBC rotation mode, and IBC geometry segmentation mode.
11. The method of claim 7, wherein the one or more predefined encoding modes include at least one of an intra template matching prediction (IntraTMP) flip mode and an IntraTMP mode indicated by a displacement vector.
12. The method of claim 8, wherein determining the chroma coding mode information further comprises:
Determining the chroma coding mode information for the current chroma block based on one of: (i) luma coding mode information of a first luma block that is coded along a predefined scanning order by one of the one or more predefined coding modes, (ii) luma coding mode information of a most common mode of the luma co-located block positions that are coded by the one or more predefined coding modes, and (iii) first luma coding mode information of the one or more predefined coding modes that is used N or more times along the predefined scanning order, wherein N is a positive integer.
13. The method of claim 7, wherein determining the chroma coding mode information further comprises:
the chroma coding pattern information for the current chroma block is determined based on luma coding pattern information for samples at predefined collocated luma positions, wherein samples at the predefined collocated luma positions are encoded by one of the one or more predefined coding patterns and the predefined collocated luma positions include one of a center position and an upper left corner position of the collocated luma block.
14. The method of claim 11, the method further comprising:
Based on the one or more predefined coding modes including IntraTMP modes,
Determining a plurality of candidate reference chroma blocks within a search range in the current image indicated by a Block Vector (BV) of the current chroma block, wherein the BV of the current chroma block is determined based on one of a plurality of associated luma blocks comprising: (i) a first IntraTMP encoded block along a predefined scan order, (ii) the collocated luma block, and (iii) an IntraTMP encoded block corresponding to first IntraTMP mode information that is used N or more times along the predefined scan order;
Determining a plurality of flipped modes for each candidate reference chroma block of the plurality of candidate reference chroma blocks;
Determining a TM cost between a template of each candidate reference chroma block of the plurality of candidate reference chroma blocks and a template of the current chroma block according to each of the plurality of flipped modes;
Determining a flipped mode from the plurality of flipped modes according to the plurality of flipped modes, the flipped mode corresponding to a minimum TM cost of TM costs between a template of each candidate reference chroma block of the plurality of candidate reference chroma blocks and a template of the current chroma block; and
Determining a flipped mode of the current chroma block as the flipped mode determined from the plurality of flipped modes.
15. The method of claim 14, wherein the BV of the current chroma block is determined as a scaled BV of one of the plurality of associated luma blocks based on a scaling factor determined based on a sub-sampling ratio associated with the current chroma block.
16. The method of claim 7, wherein determining the chroma coding mode information further comprises:
The chroma coding mode information for the current chroma block is determined based on scaled luma coding mode information for the predefined luma block according to a scaling factor, the scaling factor being determined based on a sampling format of the current chroma block.
17. An apparatus, the apparatus comprising:
Processing circuitry configured to:
Receiving a video bitstream, the video bitstream comprising encoding information of a current block in a current image, the encoding information indicating encoding the current block by a flipped mode in which positions of samples of the current block are adjusted within the current block;
determining a reference block for the current block from a plurality of candidate reference blocks in a reconstructed region of the current image based on a Template Matching (TM) cost, the TM cost indicating a difference between a template of the current block and respective templates of the plurality of candidate reference blocks;
determining a reconstructed block of the current block based on the determined reference block; and
reconstructing the current block by adjusting the positions of the samples of the reconstructed block within the reconstructed block based on the flipped mode.
18. The apparatus of claim 17, wherein the flip mode comprises one of: (i) A vertical flip mode configured to adjust positions of samples of the current block such that upper and lower portions of the current block are inverted within the current block, and (ii) a horizontal flip mode configured to adjust positions of samples of the current block such that left and right portions of the current block are inverted within the current block.
19. The apparatus of claim 17, wherein the processing circuit is further configured to:
Receiving first coding information from the received video bitstream, the first coding information indicating whether to apply the flipped mode to the reconstructed block;
Determining a type of the flipped mode based on second encoding information in the received video bitstream in response to the first encoding information indicating that the flipped mode is to be applied to the reconstructed block; and
Based on the determined type of the flip mode, the current block is reconstructed by adjusting the positions of the samples of the reconstructed block.
20. The apparatus of claim 17, wherein the processing circuit is further configured to:
determining the plurality of candidate reference blocks in a search area of a reconstruction area of the current image;
Determining a TM cost between the template of the current block and the template of each of the plurality of candidate reference blocks; and
The reference block is determined from the plurality of candidate reference blocks, the reference block corresponding to a minimum TM cost of TM costs between the template of the current block and the templates of the plurality of candidate reference blocks.