US20250240445A1 - Video coding and decoding - Google Patents
Video coding and decoding
- Publication number
- US20250240445A1 (application US18/855,522)
- Authority
- US
- United States
- Prior art keywords
- candidates
- list
- motion vector
- canceled
- vector predictor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
- H04N19/517—Processing of motion vectors by encoding
- H04N19/52—Processing of motion vectors by encoding by predictive encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/46—Embedding additional information in the video signal during the compression process
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
Definitions
- the present invention relates to video coding and decoding.
- said at least one candidate is a bi-directional candidate with two spatially matched templates, and the cost of said bi-directional candidate is computed only using the available template.
- a method of generating a list of motion vector predictor candidates for predicting motion in an image portion comprising: adding a plurality of motion vector predictor candidates to a list; wherein at least one candidate in the list is derived from at least one spatially or temporally matched template; wherein templates inside a delimited area relative to said image portion are available and templates outside of the delimited area are non-available; and reordering the list unless at least one template is non-available.
- the reordering is performed unless all of the templates are non-available.
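The two reordering conditions above can be sketched as follows. This is a minimal illustration, assuming the delimited area is an axis-aligned rectangle and that each template is identified by the position of its top-left sample; the function names and the rectangle representation are hypothetical and not taken from the application.

```python
def template_available(position, area):
    """Return True when a template position lies inside the delimited area.

    `area` is (x0, y0, x1, y1) with exclusive upper bounds. The rectangular
    shape of the area is an assumption made for this sketch.
    """
    x, y = position
    x0, y0, x1, y1 = area
    return x0 <= x < x1 and y0 <= y < y1


def should_reorder(template_positions, area, require_all=True):
    """Decide whether the candidate list is reordered.

    require_all=True : reorder unless at least one template is non-available.
    require_all=False: reorder unless all templates are non-available.
    """
    flags = [template_available(p, area) for p in template_positions]
    return all(flags) if require_all else any(flags)


area = (0, 0, 64, 64)
positions = [(10, 10), (70, 10)]  # the second template falls outside the area
print(should_reorder(positions, area, require_all=True))   # False
print(should_reorder(positions, area, require_all=False))  # True
```

The `require_all` switch only selects between the two variants described in the text; which variant applies in a given mode is not specified here.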
- the method comprises adding a further at least one temporal candidate to the list.
- the method further comprises decreasing a maximum candidate number.
- the cost of a candidate in the list may be computed based on a comparative measure between at least one sample associated with the candidate and at least one other sample.
- the cost for a candidate may be computed based on the difference between the neighboring samples of the predictor's block and the neighboring samples of a current block.
- the cost for a candidate is computed by calculating a difference of two blocks' predictors.
- the cost for a candidate is computed by calculating a difference with another candidate in the list.
- the other candidate is a most probable candidate.
- the cost is based on sub-sampling of the neighbouring samples or of the samples of the predictors.
- the cost is based on samples corresponding to an image at another resolution.
- a value of the samples used to compute the cost is pre-processed.
- the cost corresponds to a distortion.
- Said distortion may be a SAD, SATD, SSE or SSIM.
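Two of the distortion measures named above, SAD and SSE, can be illustrated on flat lists of sample values. This is a minimal sketch; the function names and the list representation of samples are assumptions made for illustration, and SATD and SSIM are left out because they need a transform or windowed statistics.

```python
def sad(samples_a, samples_b):
    """Sum of absolute differences between two equal-length sample lists."""
    if len(samples_a) != len(samples_b):
        raise ValueError("sample lists must have the same length")
    return sum(abs(a - b) for a, b in zip(samples_a, samples_b))


def sse(samples_a, samples_b):
    """Sum of squared errors between two equal-length sample lists."""
    if len(samples_a) != len(samples_b):
        raise ValueError("sample lists must have the same length")
    return sum((a - b) ** 2 for a, b in zip(samples_a, samples_b))


current = [100, 102, 98, 101]
predicted = [101, 100, 98, 105]
print(sad(current, predicted))  # 1 + 2 + 0 + 4 = 7
print(sse(current, predicted))  # 1 + 4 + 0 + 16 = 21
```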
- a variable identifies the first candidate from the second set of motion vector predictor candidates.
- Each motion vector predictor candidate in the second set may be associated with a variable, and the reordering is performed in dependence on these variables.
- the method optionally comprises setting the non-reordered motion vector predictor candidates to the end of the list.
- the method may include performing a second reordering process on the non-reordered motion vector predictor candidates when the first set contains no more than one candidate.
- Said second reordering is, optionally, not applied for subblock merge mode.
- the method further comprises performing a second reordering process on the non-reordered motion vector predictor candidates when the mode has a number of candidates above a threshold.
- FIGS. 6 and 7 show the labelling scheme used to describe blocks situated relative to a current block
- FIGS. 9 ( a ), ( b ), ( c ), ( d ) illustrate the geometric mode
- FIG. 14 illustrates a modification of the first steps of the Merge candidates list derivation shown in FIG. 10 ;
- FIG. 16 illustrates a modification of the derivation of a pairwise candidate shown in FIG. 12 ;
- FIG. 17 illustrates the cost determination for a list of candidates
- FIG. 18 illustrates the reordering process of the list of Merge mode candidates
- FIG. 19 illustrates the pairwise candidate derivation during the reordering process of the list of Merge mode candidates
- FIG. 20 illustrates the Merge candidates list derivation of the present invention
- FIG. 21 illustrates the reordering process of the list of Merge mode candidates of the present invention
- FIG. 22 illustrates three examples of templates for a candidate outside an area
- FIG. 25 is a diagram showing a system comprising an encoder or a decoder and a communication network according to embodiments of the present invention.
- FIG. 26 is a schematic block diagram of a computing device for implementation of one or more embodiments of the invention.
- Coding units are the elementary coding elements and are constituted by two kinds of sub-unit called a Prediction Unit (PU) and a Transform Unit (TU).
- the maximum size of a PU or TU is equal to the CU size.
- a Prediction Unit corresponds to the partition of the CU for prediction of pixels values.
- Various different partitions of a CU into PUs are possible as shown by 606 including a partition into 4 square PUs and two different partitions into 2 rectangular PUs.
- a Transform Unit is an elementary unit that is subjected to spatial transformation using DCT.
- a CU can be partitioned into TUs based on a quadtree representation 607 .
- the VPS is a type of parameter set defined in HEVC, and applies to all of the layers of a bitstream.
- a layer may contain multiple temporal sub-layers, and all version 1 bitstreams are restricted to a single layer.
- HEVC has certain layered extensions for scalability and multiview and these will enable multiple layers, with a backwards compatible version 1 base layer.
- the data stream 204 provided by the server 201 may be composed of multimedia data representing video and audio data. Audio and video data streams may, in some embodiments of the invention, be captured by the server 201 using a microphone and a camera respectively. In some embodiments data streams may be stored on the server 201 or received by the server 201 from another data provider, or generated at the server 201 .
- the server 201 is provided with an encoder for encoding video and audio streams in particular to provide a compressed bitstream for transmission that is a more compact representation of the data presented as input to the encoder.
- the data communication between an encoder and a decoder may be performed using for example a media storage device such as an optical disc.
- the apparatus 300 can be connected to various peripherals, such as for example a digital camera 320 or a microphone 308 , each being connected to an input/output card (not shown) so as to supply multimedia data to the apparatus 300 .
- the executable code may be stored either in read only memory 306 , on the hard disk 304 or on a removable digital medium such as for example a disk 306 as described previously.
- the executable code of the programs can be received by means of the communication network 303 , via the interface 302 , in order to be stored in one of the storage means of the apparatus 300 before being executed, such as the hard disk 304 .
- the apparatus is a programmable apparatus which uses software to implement the invention.
- the present invention may be implemented in hardware (for example, in the form of an Application Specific Integrated Circuit or ASIC).
- Module 403 implements an Intra prediction process, in which the given block to be encoded is predicted by a predictor computed from pixels of the neighbourhood of said block to be encoded. An indication of the selected Intra predictor and the difference between the given block and its predictor is encoded to provide a residual if the Intra coding is selected.
- Temporal prediction is implemented by motion estimation module 404 and motion compensation module 405 .
- a reference image from among a set of reference images 416 is selected, and a portion of the reference image, also called reference area or image portion, which is the closest area (closest in terms of pixel value similarity) to the given block to be encoded, is selected by the motion estimation module 404 .
- Motion compensation module 405 then predicts the block to be encoded using the selected area.
- the difference between the selected reference area and the given block, also called a residual block is computed by the motion compensation module 405 .
- the selected reference area is indicated using a motion vector.
- a residual is computed by subtracting the predictor from the original block.
- the encoder 400 further comprises a selection module 406 for selection of the coding mode by applying an encoding cost criterion, such as a rate-distortion criterion.
- a transform such as DCT
- the transformed data obtained is then quantized by quantization module 408 and entropy encoded by entropy encoding module 409 .
- the encoded residual block of the current block being encoded is inserted into the bitstream 410 .
- FIG. 5 illustrates a block diagram of a decoder 60 which may be used to receive data from an encoder according to an embodiment of the invention.
- the decoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions to be executed by the CPU 311 of device 300 , a corresponding step of a method implemented by the decoder 60 .
- the decoder 60 receives a bitstream 61 comprising encoded units (e.g. data corresponding to a block or a coding unit), each one being composed of a header containing information on encoding parameters and a body containing the encoded video data.
- the encoded video data is entropy encoded, and the motion vector predictors' indexes are encoded, for a given block, on a predetermined number of bits.
- the received encoded video data is entropy decoded by module 62 .
- the residual data are then dequantized by module 63 and then an inverse transform is applied by module 64 to obtain pixel values.
- the mode data indicating the coding mode are also entropy decoded and based on the mode, an INTRA type decoding or an INTER type decoding is performed on the encoded blocks (units/sets/groups) of image data.
- an INTRA predictor is determined by intra prediction module 65 based on the intra prediction mode specified in the bitstream.
- the motion prediction information is extracted from the bitstream so as to find (identify) the reference area used by the encoder.
- the motion prediction information comprises the reference frame index and the motion vector residual.
- the motion vector predictor is added to the motion vector residual by motion vector decoding module 70 in order to obtain the motion vector.
- Motion vector decoding module 70 applies motion vector decoding for each current block encoded by motion prediction. Once an index of the motion vector predictor for the current block has been obtained, the actual value of the motion vector associated with the current block can be decoded and used to apply motion compensation by module 66 .
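The reconstruction performed by the motion vector decoding module can be sketched as a predictor lookup followed by an addition. This is a minimal sketch; the `MotionVector` type, the function name and the list of predictors are hypothetical and only illustrate the relationship between predictor index, residual and decoded vector.

```python
from typing import NamedTuple


class MotionVector(NamedTuple):
    x: int
    y: int


def decode_motion_vector(predictors, predictor_index, residual):
    """Add the motion vector residual to the predictor selected by its index."""
    mvp = predictors[predictor_index]
    return MotionVector(mvp.x + residual.x, mvp.y + residual.y)


predictors = [MotionVector(4, -2), MotionVector(0, 0)]
mv = decode_motion_vector(predictors, 0, MotionVector(1, 3))
print(mv)  # MotionVector(x=5, y=1)
```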
- the reference image portion indicated by the decoded motion vector is extracted from a reference image 68 to apply the motion compensation 66 .
- the motion vector field data 71 is updated with the decoded motion vector in order to be used for the prediction of subsequent decoded motion vectors.
- a decoded block is obtained.
- post filtering is applied by post filtering module 67 .
- a decoded video signal 69 is finally obtained and provided by the decoder 60 .
- HEVC uses 3 different INTER modes: the Inter mode (Advanced Motion Vector Prediction (AMVP)), the “classical” Merge mode (i.e. the “non-Affine Merge mode” or also known as “regular” Merge mode) and the “classical” Merge Skip mode (i.e. the “non-Affine Merge Skip” mode or also known as “regular” Merge Skip mode).
- AMVP: Advanced Motion Vector Prediction
- the main difference between these modes is the data signalling in the bitstream.
- the current HEVC standard includes a competitive based scheme for Motion vector prediction which was not present in earlier versions of the standard.
- Intra Block Copy: In the Screen Content Extension of HEVC, the new coding tool called Intra Block Copy (IBC) is signalled as any of those three INTER modes, the difference between IBC and the equivalent INTER mode being made by checking whether the reference frame is the current one. This can be implemented e.g. by checking the reference index of the list L0, and deducing that this is Intra Block Copy if this is the last frame in that list. Another way is to compare the Picture Order Count of the current and reference frames: if they are equal, this is Intra Block Copy.
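The two detection approaches described above can be sketched as simple predicates. This is an illustrative sketch only; the function names and the representation of list L0 as a Python list of Picture Order Count values are assumptions.

```python
def is_ibc_by_ref_index(ref_index, list_l0):
    """IBC is deduced when the reference index points at the last frame in L0."""
    return ref_index == len(list_l0) - 1


def is_ibc_by_poc(current_poc, reference_poc):
    """IBC is deduced when current and reference frames share the same POC."""
    return current_poc == reference_poc


list_l0 = [0, 4, 8]  # hypothetical POC values; the last entry is the current frame
print(is_ibc_by_ref_index(2, list_l0))  # True
print(is_ibc_by_poc(8, 4))              # False
```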
- IBC: Intra Block Copy
- FIG. 6 shows the labelling scheme used herein to describe blocks situated relative to a current block (i.e. the block currently being en/decoded) between frames (FIG. 6).
- new Merge modes have been added to the regular Merge mode of HEVC.
- MCP: motion compensation prediction
- the affine mode is a motion compensation mode like the Inter modes (AMVP, “classical” Merge, or “classical” Merge Skip). Its principle is to generate one motion information per pixel according to 2 or 3 neighbouring motion information. In the JEM, the affine mode derives one motion information for each 4×4 block as depicted in FIG. 8(a) (each square is a 4×4 block, and the whole block in FIG. 8(a) is a 16×16 block which is divided into 16 such 4×4 square blocks, each having a motion vector associated therewith).
- the Affine mode is available for the AMVP mode and the Merge modes (i.e. the classical Merge mode which is also referred to as “non-Affine Merge mode” and the classical Merge Skip mode which is also referred to as “non-Affine Merge Skip mode”), by enabling the affine mode with a flag.
- the subblock Merge mode of VVC contains subblock-based temporal merging candidates, which inherit the motion vector field of a block in a previous frame pointed to by a spatial motion vector candidate. This subblock candidate is followed by inherited affine motion candidates if the neighboring blocks have been coded with an inter affine mode of subblock merge, and then some constructed affine candidates are derived before some zero MV candidates.
- the CIIP uses the same motion vector candidates list as the regular Merge mode.
- the decoder side motion vector derivation (DMVR), in VVC, increases the accuracy of the MVs of the Merge mode.
- a bilateral-matching (BM) based decoder side motion vector refinement is applied.
- BM: bilateral matching
- a refined MV is searched around the initial MVs in the reference picture list L 0 and reference picture list L 1 .
- the BM method calculates the distortion between the two candidate blocks in the reference picture list L 0 and list L 1 .
- VVC also includes Adaptive Motion Vector Resolution (AMVR).
- AMVR allows the motion vector difference of the CU to be coded in different precision. For example for AMVP mode: quarter-luma-sample, half-luma-sample, integer-luma-sample or four-luma-sample are considered.
- the following table of the VVC specification gives the AMVR shift based on different syntax elements.
- the bi-prediction mode with CU-level weight is extended beyond simple averaging (as performed in HEVC) to allow weighted averaging of the two prediction signals P 0 and P 1 according to the following formula.
- the weight index, bcwIndex is signalled after the motion vector difference.
- the weight index is inferred from neighbouring blocks based on the merge candidate index.
- BCW is used only for CUs with 256 or more luma samples. Moreover, for low-delay pictures, all 5 weights are used, and for non-low-delay pictures, only 3 weights (w ∈ {3,4,5}) are used.
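The weighted averaging with CU-level weights can be sketched as below. The text here does not reproduce the formula or the full weight set, so two things are assumptions taken from the VVC design as commonly documented: the five weights (-2, 3, 4, 5, 10) and the integer blend ((8 - w) × P0 + w × P1 + 4) >> 3. The function names are hypothetical, and intermediate bit depth and clipping are ignored.

```python
# Assumption: weight sets as in the VVC design; this text only states that
# 5 weights exist and that (3, 4, 5) are used for non-low-delay pictures.
LOW_DELAY_WEIGHTS = (-2, 3, 4, 5, 10)
NON_LOW_DELAY_WEIGHTS = (3, 4, 5)


def bcw_allowed(width, height):
    """BCW is used only for CUs with 256 or more luma samples."""
    return width * height >= 256


def bcw_weights(is_low_delay):
    """Return the weights that can be selected for the picture type."""
    return LOW_DELAY_WEIGHTS if is_low_delay else NON_LOW_DELAY_WEIGHTS


def bcw_blend(p0, p1, w):
    """Blend two prediction signals; w = 4 reduces to a rounded average."""
    return [((8 - w) * a + w * b + 4) >> 3 for a, b in zip(p0, p1)]


print(bcw_allowed(16, 16))                       # True
print(bcw_blend([100, 50], [200, 60], 4))        # [150, 55]
print(bcw_blend([100, 50], [200, 60], 5))        # [163, 56]
```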
- the regular Merge list is derived as in FIG. 10 and FIG. 11 .
- variable cnt is incremented ( 1015 , 1009 , 1013 , 1017 , 1023 , 1027 , 1115 , 1108 ).
- the candidate B2 (1019) is added (1022) if it does not have the same motion information as A1 and B1 (1021).
- the temporal candidate is added.
- the history-based (HMVP) candidates are added (1101) if they do not have the same motion information as A1 and B1 (1103).
- the number of history-based candidates cannot exceed the maximum number of candidates of the Merge candidates list minus 1 (1102). So after the history-based candidates there is at least one free position in the Merge candidates list.
- the pairwise candidate is built ( 1106 ) and added in the Merge candidates list ( 1107 ).
- the number of candidates in the list can be greater than the maximum number of candidates in the final list, Maxcand. Yet this number of candidates in the initial list, MaxCandInitialList, is used for the derivation. Consequently, the zero candidates are added until MaxCandInitialList is reached, and not until Maxcand.
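The overall shape of the derivation, adding candidates with a duplicate check and then padding with zero candidates up to MaxCandInitialList, can be sketched as follows. This is a simplified illustration: the real derivation compares against specific neighbours (for example A1 and B1) rather than the whole list, and the function name, the tuple representation of motion information and the single zero candidate are assumptions.

```python
def build_initial_merge_list(candidates, max_cand_initial_list, zero_candidate=(0, 0)):
    """Build an initial list and pad it with zero candidates.

    `candidates` is an ordered iterable of motion vectors (tuples) or None for
    an unavailable position. Returns the padded list and the number of
    candidates that were present before padding.
    """
    merge_list = []
    for cand in candidates:
        if len(merge_list) >= max_cand_initial_list:
            break
        # Simplification: prune against every candidate already in the list.
        if cand is not None and cand not in merge_list:
            merge_list.append(cand)
    num_before_padding = len(merge_list)
    # Zero candidates are added until MaxCandInitialList, not until Maxcand.
    while len(merge_list) < max_cand_initial_list:
        merge_list.append(zero_candidate)
    return merge_list, num_before_padding


merge_list, count = build_initial_merge_list([(1, 2), None, (1, 2), (3, 4)], 5)
print(merge_list)  # [(1, 2), (3, 4), (0, 0), (0, 0), (0, 0)]
print(count)       # 2
```

Returning the count before padding makes it easy for a later stage to tell padded entries apart from derived ones.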
- the Merge candidates for the BM Merge mode are derived from spatial neighbouring coded blocks, TMVPs, non-adjacent blocks, HMVPs and the pairwise candidate, in a similar manner as for the regular Merge mode. A difference is that only those that meet the DMVR conditions are added to the candidate list. The Merge index is coded in a similar manner as for the regular Merge mode.
- the AMVP-Merge mode, also known as the bi-directional predictor, is defined as follows in JVET-X2025: it is composed of an AMVP predictor in one direction and a Merge predictor in the other direction.
- the mode can be enabled for a coding block when the selected Merge predictor and the AMVP predictor satisfy the DMVR condition, that is, when there is at least one reference picture from the past and one reference picture from the future relative to the current picture and the distances from the two reference pictures to the current picture are the same. In that case, the bilateral matching MV refinement is applied with the Merge MV candidate and the AMVP MVP as a starting point. Otherwise, if template matching functionality is enabled, template matching MV refinement is applied to whichever of the Merge predictor and the AMVP predictor has the higher template matching cost.
- the AMVP part of the mode is signalled as a regular uni-directional AMVP, i.e. the reference index and MVD are signalled, and it has a derived MVP index if template matching is used, or the MVP index is signalled when template matching is disabled.
- in intra prediction, the fusion for template-based intra mode derivation (TIMD) is described as follows in JVET-X2025: for each intra prediction mode in the MPMs, the SATD between the prediction and reconstruction samples of the template is calculated. The first two intra prediction modes with the minimum SATD are selected as the TIMD modes. These two TIMD modes are fused with the weights after applying the PDPC process, and such weighted intra prediction is used to code the current CU. Position dependent intra prediction combination (PDPC) is included in the derivation of the TIMD modes.
- PDPC: Position dependent intra prediction combination
- the MvTh is equal to a value which depends on the number of luma sample nbSamples in the current CU for the template matching regular Merge mode as defined as the following:
- FIG. 18 gives an example of this method on a regular Merge candidate list containing candidates as in the CTC.
- this method was also extended to reorder and select candidates to be included in the final list of Merge mode candidates.
- in JVET-X0087, all possible non-adjacent candidates (1540) and History based candidates (1501) are considered with temporal non-adjacent candidates in order to obtain a list of candidates.
- This list of candidates is built without considering the maximum number of candidates.
- This list of candidates is then reordered. Only the required number of candidates from this list are added to the final list of Merge candidates.
- the non-adjacent candidates and History based candidates are processed separately from the adjacent spatial and temporal candidates.
- the processed list is used to supplement the adjacent spatial and temporal Merge candidates already present in the Merge candidate list to generate a final Merge candidate list.
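The supplement step described above can be sketched as: rank the separately built list by cost, then take only as many entries as the final list still has room for. This is a minimal sketch under stated assumptions; the function name, the string labels and the cost lookup are hypothetical, and duplicate checks against the existing candidates are omitted.

```python
def supplement_merge_list(merge_list, extra_candidates, cost_of, max_cand):
    """Append the lowest-cost extra candidates until max_cand is reached.

    `extra_candidates` is built without regard to the maximum number of
    candidates (for example non-adjacent and history-based candidates).
    """
    remaining = max(0, max_cand - len(merge_list))
    ranked = sorted(extra_candidates, key=cost_of)  # stable: ties keep derivation order
    return merge_list + ranked[:remaining]


costs = {"N1": 30, "N2": 10, "H1": 20}
final = supplement_merge_list(["A1", "B1"], ["N1", "N2", "H1"], costs.get, max_cand=4)
print(final)  # ['A1', 'B1', 'N2', 'H1']
```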
- the Merge temporal candidate is selected from among several temporal candidates which are reordered using ARMC. In the same way, all possible Adjacent candidates are subject to ARMC and up to 9 of these candidates can be added to the list of Merge candidates.
- JVET-X0087 re-uses the cost computed during the reordering of the non-adjacent and History based candidates, to avoid additional computation costs.
- JVET-X0133 applies a systematic reordering on all candidates on the final list of merge candidates.
- Local Illumination Compensation (LIC) has been added. It is based on a linear model for illumination changes. The linear model is computed from the neighboring samples of the current block and the neighboring samples of the previous blocks.
- reordering motion vector predictor candidates in a list generally means that the most likely predictors are positioned higher up the list, and as such require fewer bits to encode. However, it has been noted that this does not always occur.
- the following examples aim to improve the reordering process by defining certain circumstances or categories of candidates where the reordering process is not used, reduced, and/or a secondary reordering process is performed.
- the list of candidates or predictors will be reordered according to a cost computed. For example, as mentioned above, several lists of Merge candidates or motion vector predictors are reordered according to template costs as well as Intra predictors.
- the list of candidates or predictors can be Intra or Inter blocks or a list of predictors to derive something other than a predictor.
- the MVD sign prediction method reorders a list of possible MVD sign indexes.
- the present disclosure mainly focuses on Merge candidates derivation.
- the list derived before reordering can come from previously decoded or encoded motion information. For example, spatial positions, temporal positions, or spatial/temporal non adjacent positions, or from a list of previous decoded candidates, or candidates derived from other candidates or candidates derived from other decoded samples or estimated samples. Such candidates are chosen on the basis that their corresponding samples are likely to be correlated to the samples to be en/decoded.
- the list of candidates does not always reach the maximum number of candidates that the final list can contain (Maxcand).
- the maximum number of candidates inside a list corresponds to the maximum index value that a decoder can decode.
- the list of candidates before reordering can contain more than this maximum and the additional candidates in this list can be removed thanks to a reordering algorithm based on cost values as described in FIG. 17 . So, during the derivation the maximum candidates can be higher than the final maximum number of candidates after the reordering.
- Coding efficiency is improved when the candidate(s) added to reach the maximum number of candidates are not considered during the reordering process (i.e. excluded from the reordering process). This is mainly because these candidates are duplicated, and when the reordering process sets them at an early position (earlier than the selected/best candidate), several consecutive Merge indexes are used to signal the same candidate, which increases the merge index rate or prevents the selection of the best candidate. A further reason is that, on average, these candidates are inefficient as they have no correlation with the current block, especially when the list of candidates is large.
- the zero candidates which are added to the list are not considered for the reordering process.
- the advantage of this is a complexity reduction as the number of computations of cost is limited.
- An additional advantage of this embodiment is a coding efficiency improvement. The zero candidates are often inefficient, and when they are reordered they can take the position of useful candidates. When the zero candidates are duplicated, several consecutive indexes at the beginning or in the middle of the list represent exactly the same candidate, and the candidates placed after them are signalled at a higher rate than needed.
- the reordering process may only be applied in some coding modes. In a particularly advantageous example, the reordering process is applied for one mode on the full list.
- the advantage is a coding efficiency improvement. Indeed, when the zero candidate is interesting in terms of coding efficiency, there is at least one mode for which it is represented with a minimum number of bits.
- the mode where the reordering process is applied on the full list is a mode with a number of candidates below a threshold (i.e. a small number of candidates) and/or where the zero candidates are often in the list but the number of zero candidates is low (e.g. below a threshold).
- a suitable threshold is 4.
- the reordering process is not performed on candidates added to fill the list, such as the zero candidates, for modes which have a number of candidates above a threshold and/or a number of zero candidates above a threshold.
- the accuracy of the process of determining the relative ‘cost’ of the candidates at the decoder side is a determiner of how effective the reordering process is.
- the following examples provide improvements to the cost determination process which result in a more accurate list, a lower complexity of calculating cost, or both.
- the cost for a candidate can be computed based on neighboring samples of that predictor's block and the neighboring samples of the current block. Such samples are readily available at the decoder side.
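A template cost of this kind can be sketched by comparing the row above and the column to the left of the current block with the same neighbourhood around the predictor's block. This is a minimal sketch: frames are lists of rows, the motion vector is an integer displacement, the template is one sample thick, all positions are assumed to lie inside the frame, and the function names are hypothetical.

```python
def template_samples(frame, x, y, width, height):
    """Collect the row above and the column left of the block at (x, y)."""
    top = [frame[y - 1][x + i] for i in range(width)]
    left = [frame[y + j][x - 1] for j in range(height)]
    return top + left


def template_cost(cur_frame, ref_frame, x, y, mv, width, height):
    """SAD between the current block's template and the predictor's template."""
    cur = template_samples(cur_frame, x, y, width, height)
    ref = template_samples(ref_frame, x + mv[0], y + mv[1], width, height)
    return sum(abs(a - b) for a, b in zip(cur, ref))


cur_frame = [[r * 4 + c for c in range(4)] for r in range(4)]
ref_frame = [[v + 1 for v in row] for row in cur_frame]
print(template_cost(cur_frame, ref_frame, 1, 1, (0, 0), 2, 2))  # 4
print(template_cost(cur_frame, cur_frame, 1, 1, (1, 1), 2, 2))  # 20
```

Only already reconstructed samples are read, which is why the same computation can be repeated at the decoder side.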
- the cost can also be computed relative to another candidate.
- one other candidate can be a most probable candidate or predictor.
- the cost of a candidate is computed from its samples and the samples of this most probable candidate.
- Another implementation would be to replace the value of NumMergeCandInList with the variable numMaxNonZeroCand in module 2101 or 1701. This is only an implementation issue and depends on the initialization of the variables.
- the candidates used to fill the list of candidates are set to the end of the list.
- these are ordered according to the order in which they have been derived in the initial list. This characteristic gives the coding efficiency as previously explained. For example, in FIG. 21, when the current number of candidates is greater than the value numMaxNonZeroCand (2113), corresponding to the zero candidates, the cost stays equal to the maximum value MAXVAL. Consequently, during the update candidates list process 2110, the zero candidates are at the end of the list.
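The behaviour described above, where padded candidates keep the maximum cost and therefore stay at the end in derivation order, can be sketched with a stable sort. This is an illustrative sketch: the value chosen for MAXVAL, the function name and the cost callback are assumptions, and only the variable name numMaxNonZeroCand is taken from the text.

```python
MAXVAL = 2**31 - 1  # assumption: any value larger than every real cost works


def reorder_candidates(cand_list, num_max_non_zero_cand, cost_of):
    """Reorder by ascending cost, leaving the padded candidates at the end.

    No cost is computed for indexes at or beyond num_max_non_zero_cand, which
    also limits the number of cost computations.
    """
    costs = []
    for index, cand in enumerate(cand_list):
        if index < num_max_non_zero_cand:
            costs.append(cost_of(cand))
        else:
            costs.append(MAXVAL)
    # sorted() is stable, so entries with equal cost keep their derivation order.
    order = sorted(range(len(cand_list)), key=costs.__getitem__)
    return [cand_list[i] for i in order]


costs = {"A": 50, "B": 10, "C": 30}
print(reorder_candidates(["A", "B", "C", "Z0", "Z1"], 3, costs.__getitem__))
# ['B', 'C', 'A', 'Z0', 'Z1']
```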
- the predictors which are duplicated in the list are not reordered during the reordering process.
- One implementation of this consists in changing the derivation of the candidates. For example, the zero candidates are added with a duplicate check (either when they all have been added or during the process of adding), a variable numNonDuplicateCand is set equal to the current number of candidates in the list. Then the zero candidates are added without duplicate check to fulfill the list.
- all candidates with higher indexes will not be reordered and kept at the end of the list. All previous features can be applied, for example the cost computation, signalling and processing of the non-reordered candidates.
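The behavior described above, where only the leading candidates are reordered and the filler candidates stay at the end in their derivation order, can be sketched as below. The function `reorder_candidates`, its parameter `num_reorderable` (standing in for a bound such as numMaxNonZeroCand or numNonDuplicateCand), and the concrete value of `MAXVAL` are assumptions made for illustration.

```python
from typing import List, Sequence

MAXVAL = 2**31 - 1  # assumed sentinel cost for candidates that are never evaluated


def reorder_candidates(
    candidates: Sequence[str], costs: Sequence[int], num_reorderable: int
) -> List[str]:
    """Sort the first `num_reorderable` candidates by increasing cost and keep
    the remaining ones at the end, in their original derivation order."""
    head = list(zip(costs[:num_reorderable], candidates[:num_reorderable]))
    # sorted() is stable, so candidates with equal cost keep their derivation order.
    head_sorted = [cand for _, cand in sorted(head, key=lambda pair: pair[0])]
    return head_sorted + list(candidates[num_reorderable:])


cands = ["A", "B", "C", "zero0", "zero1"]
costs = [30, 10, 20, MAXVAL, MAXVAL]

# Variant 1: the filler candidates are excluded from the reordering.
tail_kept = reorder_candidates(cands, costs, num_reorderable=3)

# Variant 2: all candidates are sorted, but the fillers keep cost MAXVAL,
# so a stable sort leaves them last and in derivation order.
all_sorted = reorder_candidates(cands, costs, num_reorderable=len(cands))
```

Both variants give the same list here; the first one avoids computing or comparing any cost for the filler candidates.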
- the candidates used to fill the list are not reordered in the list for some modes, except for one mode where only the duplicate candidates are not reordered.
- the cost is computed based on the size(s) of the template(s), for example being proportional to the size of the template(s).
- the advantage is that the cost of the unavailable template will be statistically closer to the real cost, so the comparison with the cost of the other candidates is fairer. This method may also be computationally simpler.
- FIG. 22 illustrates particular case 1 and case 2 , but for some cases, only a part of the template is missing. In that case a partial cost can be computed on all available samples.
- the proportionality is computed with shift operations. For example, a shift operation closest to the actual calculation can be used. For example, the denominator is approximated by the closest power of 2.
- the advantage is a simplification, especially for hardware implementation as no division is used. (A shift operation is less complex than a division).
- $\text{cost} = \text{costUp} + w \times (\text{costUp} / \text{width}) \times \text{height}$
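The proportional estimate with a shift in place of the division can be sketched as follows. This is a hedged illustration only: it assumes the left template is the unavailable one, takes the weight w as 1, and resolves ties between two powers of 2 toward the larger one; `nearest_pow2_shift` and `estimate_full_cost` are hypothetical names.

```python
def nearest_pow2_shift(n: int) -> int:
    """Return s such that 2**s is the power of two closest to n
    (ties are resolved toward the larger power, an arbitrary choice)."""
    if n <= 0:
        raise ValueError("n must be positive")
    lo = n.bit_length() - 1  # 2**lo <= n < 2**(lo + 1)
    dist_lo = n - (1 << lo)
    dist_hi = (1 << (lo + 1)) - n
    return lo + 1 if dist_lo >= dist_hi else lo


def estimate_full_cost(cost_up: int, width: int, height: int) -> int:
    """Estimate the total template cost when only the top template is available.

    The per-sample cost of the top template (cost_up / width) is scaled by the
    height of the missing left template; the division by width is replaced by
    a right shift by the closest power of two."""
    shift = nearest_pow2_shift(width)
    cost_left_estimate = (cost_up * height) >> shift
    return cost_up + cost_left_estimate


exact_width = estimate_full_cost(cost_up=64, width=16, height=8)   # 64 + (512 >> 4)
approx_width = estimate_full_cost(cost_up=64, width=12, height=8)  # 12 is treated as 16
```

When the width is already a power of 2 the shift is exact; otherwise the result is an approximation that trades a small error for the absence of a division.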
- top and left template with one line of samples as in the ECM. But longer templates, or more templates can be considered.
- the top right template, or the bottom left template may be considered.
- the maximum value is lower than a maximum value set for the candidates used to fill the list (e.g. the zero candidates). Indeed, these candidates are more efficient than the candidates which are added only to meet the requirement of having a certain number of candidates.
- the advantage is a coding efficiency improvement.
- the bi-prediction is computed for the samples available in both directions and the uni-prediction is considered for available samples in one direction when not available for the second direction.
- the advantage is a computation of a cost closer to the real cost.
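The mixed bi-prediction and uni-prediction handling described above can be sketched per sample as below. The use of `None` to mark an unavailable sample, the rounded average for bi-prediction, and the name `predict_template` are assumptions made for this illustration.

```python
from typing import List, Optional, Sequence


def predict_template(
    l0: Sequence[Optional[int]], l1: Sequence[Optional[int]]
) -> List[Optional[int]]:
    """Combine two prediction directions sample by sample.

    - both samples available: bi-prediction (rounded average, an assumption)
    - one sample available:   uni-prediction from that direction
    - none available:         the sample stays unavailable (None)"""
    out: List[Optional[int]] = []
    for a, b in zip(l0, l1):
        if a is not None and b is not None:
            out.append((a + b + 1) >> 1)
        elif a is not None:
            out.append(a)
        elif b is not None:
            out.append(b)
        else:
            out.append(None)
    return out


pred = predict_template([10, 20, None, None], [12, None, 30, None])
```

The cost would then be accumulated only over the samples that are not `None`, which matches the partial-cost idea mentioned earlier in this section.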
- the bitstream 101 is then communicated to the decoder 100 in a number of ways, for example it may be generated in advance by the encoder 150 and stored as data in a storage apparatus in the communication network 199 (e.g. on a server or a cloud storage) until a user requests the content (i.e. the bitstream data) from the storage apparatus, at which point the data is communicated/streamed to the decoder 100 from the storage apparatus.
- the system 191 may also comprise a content providing apparatus for providing/streaming, to the user (e.g. by communicating data for a user interface to be displayed on a user terminal), content information for the content stored in the storage apparatus (e.g.
- the encoder 150 generates the bitstream 101 and communicates/streams it directly to the decoder 100 as and when the user requests the content.
- the decoder 100 then receives the bitstream 101 (or a signal) and performs filtering with a deblocking filter according to the invention to obtain/generate a video signal 109 and/or audio signal, which is then used by a user terminal to provide the requested content to the user.
- Embodiments of the present invention can also be realized by a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g. a chip set).
- Various components, modules, or units are described herein to illustrate functional aspects of devices/apparatuses configured to perform those embodiments, but do not necessarily require realization by different hardware units. Rather, various modules/units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors in conjunction with suitable software/firmware.
- Embodiments of the present invention can be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium to perform the modules/units/functions of one or more of the above-described embodiments and/or that includes one or more processing unit or circuits for performing the functions of one or more of the above-described embodiments, and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiments and/or controlling the one or more processing unit or circuits to perform the functions of one or more of the above-described embodiments.
- the computer may include a network of separate computers or separate processing units to read out and execute the computer executable instructions.
- the computer executable instructions may be provided to the computer, for example, from a computer-readable medium such as a communication medium via a network or a tangible storage medium.
- the communication medium may be a signal/bitstream/carrier wave.
- the tangible storage medium is a “non-transitory computer-readable storage medium” which may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
- At least some of the steps/functions may also be implemented in hardware by a machine or a dedicated component, such as an FPGA (“Field-Programmable Gate Array”) or an ASIC (“Application-Specific Integrated Circuit”).
- FIG. 26 is a schematic block diagram of a computing device 3600 for implementation of one or more embodiments of the invention.
- the computing device 3600 may be a device such as a micro-computer, a workstation or a light portable device.
- the computing device 3600 comprises a communication bus connected to: a central processing unit (CPU) 3601 , such as a microprocessor; a random access memory (RAM) 3602 for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method for encoding or decoding at least part of an image according to embodiments of the invention, the memory capacity thereof can be expanded by an optional RAM connected to an expansion port for example; a read only memory (ROM) 3603 for storing computer programs for implementing embodiments of the invention; a network interface (NET) 3604 is typically connected to a communication network over which digital data to be processed are transmitted or received.
- the network interface (NET) 3604 can be a single network interface, or composed of a set of different network interfaces (for instance wired and wireless interfaces, or different kinds of wired or wireless interfaces). Data packets are written to the network interface for transmission or are read from the network interface for reception under the control of the software application running in the CPU 3601 ; a user interface (UI) 3605 may be used for receiving inputs from a user or to display information to a user; a hard disk (HD) 3606 may be provided as a mass storage device; an Input/Output module (IO) 3607 may be used for receiving/sending data from/to external devices such as a video source or display.
- the executable code may be stored either in the ROM 3603 , on the HD 3606 or on a removable digital medium such as, for example a disk.
- the executable code of the programs can be received by means of a communication network, via the NET 3604 , in order to be stored in one of the storage means of the computing device 3600 , such as the HD 3606 , before being executed.
- the CPU 3601 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to embodiments of the invention, which instructions are stored in one of the aforementioned storage means.
- a decoder according to an aforementioned embodiment is provided in a user terminal such as a computer, a mobile phone (a cellular phone), a tablet or any other type of device (e.g. a display apparatus) capable of providing/displaying content to a user.
- an encoder is provided in an image capturing apparatus which also comprises a camera, a video camera or a network camera (e.g. a closed-circuit television or video surveillance camera) which captures and provides the content for the encoder to encode. Two such examples are provided below with reference to FIGS. 27 and 28 .
- FIG. 27 is a diagram illustrating a network camera system 3700 including a network camera 3702 and a client apparatus 202 .
- the network camera 3702 includes an imaging unit 3706 , an encoding unit 3708 , a communication unit 3710 , and a control unit 3712 .
- the network camera 3702 and the client apparatus 202 are mutually connected to be able to communicate with each other via the network 200 .
- the encoding unit 3708 encodes the image data by using said encoding methods explained above, or a combination of encoding methods described above.
- the communication unit 3710 of the network camera 3702 transmits the encoded image data encoded by the encoding unit 3708 to the client apparatus 202 .
- the communication unit 3710 receives commands from client apparatus 202 .
- the commands include commands to set parameters for the encoding of the encoding unit 3708 .
- the control unit 3712 controls other units in the network camera 3702 in accordance with the commands received by the communication unit 3710 .
- the decoding unit 3716 decodes the encoded image data by using said decoding methods explained above, or a combination of the decoding methods explained above.
- the control unit 3718 of the client apparatus 202 controls other units in the client apparatus 202 in accordance with the user operation or commands received by the communication unit 3714 .
- the control unit 3718 of the client apparatus 202 controls a display apparatus 2120 so as to display an image decoded by the decoding unit 3716 .
- the control unit 3718 of the client apparatus 202 also controls the display apparatus 2120 so as to display a GUI (Graphical User Interface) for designating values of the parameters for the network camera 3702 , including the parameters for the encoding performed by the encoding unit 3708 .
- FIG. 28 is a diagram illustrating a smart phone 3800 .
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB2205318.5A GB2617568A (en) | 2022-04-11 | 2022-04-11 | Video coding and decoding |
| GB2205318.5 | 2022-04-11 | ||
| PCT/EP2023/059429 WO2023198701A2 (en) | 2022-04-11 | 2023-04-11 | Video coding and decoding |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20250240445A1 (en) | 2025-07-24 |
Family
ID=81653135
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/855,522 Pending US20250240445A1 (en) | 2022-04-11 | 2023-04-11 | Video coding and decoding |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US20250240445A1 |
| JP (1) | JP2025510514A |
| CN (1) | CN119013966A |
| GB (1) | GB2617568A |
| WO (1) | WO2023198701A2 |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2025149344A1 (en) * | 2024-01-09 | 2025-07-17 | Interdigital Ce Patent Holdings, Sas | Mmvd candidates list for dmvr candidates |
Family Cites Families (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108293131B (zh) * | 2015-11-20 | 2021-08-31 | 联发科技股份有限公司 | 基于优先级运动矢量预测子推导的方法及装置 |
| US10701393B2 (en) * | 2017-05-10 | 2020-06-30 | Mediatek Inc. | Method and apparatus of reordering motion vector prediction candidate set for video coding |
| WO2019103564A1 (ko) * | 2017-11-27 | 2019-05-31 | 엘지전자 주식회사 | 영상 코딩 시스템에서 인터 예측에 따른 영상 디코딩 방법 및 장치 |
| WO2019107916A1 (ko) * | 2017-11-30 | 2019-06-06 | 엘지전자 주식회사 | 영상 코딩 시스템에서 인터 예측에 따른 영상 디코딩 방법 및 장치 |
| WO2019194499A1 (ko) * | 2018-04-01 | 2019-10-10 | 엘지전자 주식회사 | 인터 예측 모드 기반 영상 처리 방법 및 이를 위한 장치 |
| WO2020004931A1 (ko) * | 2018-06-27 | 2020-01-02 | 엘지전자 주식회사 | 영상 코딩 시스템에서 인터 예측에 따른 영상 처리 방법 및 장치 |
| US10863193B2 (en) * | 2018-06-29 | 2020-12-08 | Qualcomm Incorporated | Buffer restriction during motion vector prediction for video coding |
| US10911768B2 (en) * | 2018-07-11 | 2021-02-02 | Tencent America LLC | Constraint for template matching in decoder side motion derivation and refinement |
| EP4409907A1 (en) * | 2021-09-29 | 2024-08-07 | Canon Kabushiki Kaisha | Video coding and decoding |
| CN118511521A (zh) * | 2021-10-29 | 2024-08-16 | 抖音视界有限公司 | 用于视频处理的方法、装置和介质 |
-
2022
- 2022-04-11 GB GB2205318.5A patent/GB2617568A/en active Pending
-
2023
- 2023-04-11 JP JP2024550328A patent/JP2025510514A/ja active Pending
- 2023-04-11 CN CN202380033208.0A patent/CN119013966A/zh active Pending
- 2023-04-11 US US18/855,522 patent/US20250240445A1/en active Pending
- 2023-04-11 WO PCT/EP2023/059429 patent/WO2023198701A2/en not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| WO2023198701A3 (en) | 2023-11-23 |
| GB2617568A (en) | 2023-10-18 |
| WO2023198701A2 (en) | 2023-10-19 |
| GB202205318D0 (en) | 2022-05-25 |
| JP2025510514A (ja) | 2025-04-15 |
| CN119013966A (zh) | 2024-11-22 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7804815B2 (ja) | ビデオ符号化及び復号化 | |
| US12587670B2 (en) | Video coding and decoding | |
| GB2611367A (en) | Video coding and decoding | |
| GB2585017A (en) | Video coding and decoding | |
| US20250267277A1 (en) | Video coding and decoding | |
| GB2617626A (en) | Data coding and decoding | |
| US20250240445A1 (en) | Video coding and decoding | |
| US20250254290A1 (en) | Data coding and decoding | |
| GB2628209A (en) | Image and video coding and decoding | |
| WO2024213439A1 (en) | Image and video coding and decoding | |
| EP4695999A1 (en) | Image and video coding and decoding | |
| EP4696000A1 (en) | Image and video coding and decoding | |
| WO2024213386A1 (en) | Image and video coding and decoding | |
| CN118020301A (zh) | 视频编码和解码 | |
| GB2629031A (en) | Image and video coding and decoding | |
| WO2025149564A2 (en) | Image and video coding and decoding | |
| HK40120887A (zh) | 编码方法、解码方法、编码器、解码器和存储介质 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: CANON KABUSHIKI KAISHA, JAPAN. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LAROCHE, GUILLAUME;ONNO, PATRICE;BELLESSORT, ROMAIN;SIGNING DATES FROM 20240902 TO 20240903;REEL/FRAME:068851/0923 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION COUNTED, NOT YET MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |