CN116074534A - Bidirectional inter-frame prediction method and device

Bidirectional inter-frame prediction method and device

Info

Publication number
CN116074534A
Authority
CN
China
Prior art keywords
block
prediction
template
current
pixel
Prior art date
Legal status
Pending
Application number
CN202211736966.2A
Other languages
Chinese (zh)
Inventor
张雪
江东
林聚财
殷俊
彭双
方诚
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202211736966.2A
Publication of CN116074534A

Classifications

    • H04N19/577 Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/109 Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Abstract

The application discloses a bidirectional inter-frame prediction method and device. The bidirectional inter prediction method comprises the following steps: determining a first reference block and a second reference block of the current block; determining a current template, a first reference template and a second reference template, wherein the current template is a pixel area adjacent to the current block, and the positional relationship between the current template and the current block, between the first reference template and the first reference block, and between the second reference template and the second reference block is the same; substituting pixels in the current template, the first reference template and the second reference template into an initial model to solve for the parameters of the initial model, thereby obtaining a prediction model; and predicting the current block using the first reference block, the second reference block and the prediction model to obtain a prediction block of the current block. The method and device can address the problem that the weight choices preset for bidirectional inter-frame prediction are limited and cannot adapt to relatively complex practical situations.

Description

Bidirectional inter-frame prediction method and device
Technical Field
The present disclosure relates to the field of video processing technologies, and in particular, to a bidirectional inter-frame prediction method and apparatus.
Background
Because the data volume of video images is relatively large, video image data usually needs to be encoded and compressed. The compressed data is called a video code stream, which is transmitted to a user terminal through a wired or wireless network and then decoded for viewing. The whole video coding flow comprises prediction, transformation, quantization, entropy coding and other processes, where prediction is divided into intra prediction and inter prediction.
Inter-prediction may include unidirectional inter-prediction and bidirectional inter-prediction. Generally, in the bi-directional inter prediction of a current block, two reference blocks are weighted according to their respective weights to generate a predicted block of the current block.
In existing video coding standards, the weights used for bidirectional inter-frame prediction are preset. That is, for any prediction block using bi-prediction weighting, only one of several preset weights can be used; the available weight choices are therefore limited and cannot adapt to relatively complex practical situations.
Disclosure of Invention
The application provides a bidirectional inter-frame prediction method and device, which can address the problem that the weight choices preset for bidirectional inter-frame prediction are limited and cannot adapt to relatively complex practical situations.
To solve the above problems, the present application provides a bi-directional inter prediction method, which includes:
determining a first reference block and a second reference block of the current block;
determining a current template, a first reference template and a second reference template, wherein the current template is a pixel area adjacent to the current block, and the positional relationship between the current template and the current block, between the first reference template and the first reference block, and between the second reference template and the second reference block is the same;
substituting pixels in the current template, the first reference template and the second reference template into an initial model to solve for the parameters of the initial model, thereby obtaining a prediction model;
and predicting the current block by adopting the first reference block, the second reference block and the prediction model to obtain a prediction block of the current block.
In an embodiment, predicting the current block using the first reference block, the second reference block, and the prediction model to obtain a predicted block of the current block includes:
and weighting a first prediction block and a second prediction block through the prediction model to obtain a prediction block of the current block, wherein the first prediction block is obtained by performing motion compensation on the first reference block, and the second prediction block is obtained by performing motion compensation on the second reference block.
In an embodiment, the weighting, by the prediction model, the first prediction block and the second prediction block to obtain the prediction block of the current block includes:
weighting each pixel in the first prediction block with the corresponding pixel in the second prediction block through the prediction model, and/or operating on the gradient of each pixel in the first prediction block and/or the gradient of the corresponding pixel in the second prediction block, and/or operating on the position information of the corresponding pixel in the current block, to obtain the predicted value of the corresponding pixel in the current block and thereby the prediction block.
In one embodiment,
the predicted value of each pixel in the current block is equal to the sum value, or to the sum value truncated to the value range of pixel values;
the sum value is equal to the sum of a weighted value, a gradient operation value, a position information operation value and/or a constant; the weighted value is obtained by weighting each pixel in the first prediction block with the corresponding pixel in the second prediction block, the gradient operation value is obtained by operating on the gradient of each pixel in the first prediction block and/or the gradient of the corresponding pixel in the second prediction block, and the position information operation value is obtained by operating on the position information of the corresponding pixel in the current block.
In an embodiment, the constant is a value calculated from a bit depth of an image to which the current block belongs.
In an embodiment, the computing the gradient of each pixel in the first prediction block and/or the gradient of the corresponding pixel in the second prediction block comprises:
fusing the gradient of each pixel in the first prediction block with the gradient of the corresponding pixel in the second prediction block to obtain a fused gradient; and
performing a linear operation on the fused gradient;
wherein the gradient of each pixel in the first prediction block is obtained from the gradient values of that pixel in at least one direction, and the gradient of the corresponding pixel in the second prediction block is obtained from the gradient values of that pixel in at least one direction.
In one embodiment,
the sum of all parameters in the prediction model is 1.
In one embodiment,
the method for solving the parameters of the initial model by utilizing the pixels in the current template and substituting the pixels in the first reference template and the second reference template into the initial model comprises the following steps:
calculating an optimal parameter solution for the initial model with the objective of minimizing the fused value of the differences at all pixel positions, wherein the difference at each pixel position is the difference between the predicted value at that position and the value at that position in the current template, and the predicted value at each position is obtained by substituting the motion-compensated pixel values at that position in the first reference template and the second reference template into the initial model; and/or,
substituting the pixel values in a first prediction template, a second prediction template and the current template into the initial model to construct a plurality of equations, and solving for the parameters of the initial model based on the equations, wherein the first prediction template is obtained by performing motion compensation on the first reference template, and the second prediction template is obtained by performing motion compensation on the second reference template.
In one embodiment,
the substituting pixels in the current template, the first reference template and the second reference template into an initial model to solve for the parameters of the initial model and obtain a prediction model includes:
substituting the pixels of a common area in the current template, the first reference template and the second reference template into the initial model to solve for the parameters of the initial model, thereby obtaining the prediction model;
wherein the common area is the intersection of the area of determinable pixel values in the current template, the area of determinable pixel values in the first reference template, and the area of determinable pixel values in the second reference template.
In an embodiment, the determining the current template, the first reference template and the second reference template includes: determining a plurality of current templates, a plurality of first reference templates and a plurality of second reference templates, which are equal in number and correspond to one another;
the solving for the parameters of the initial model includes: substituting the pixels in each current template and in the first reference template and the second reference template corresponding to that current template into the initial model to solve for its parameters, obtaining a prediction model corresponding to each current template;
the predicting the current block includes: predicting the current block using the first reference block, the second reference block and the prediction model corresponding to each current template to obtain a prediction block corresponding to each current template, and determining the prediction block of the current block based on the prediction blocks corresponding to the plurality of current templates.
In one embodiment, there are a plurality of initial models;
the solving for the parameters of the initial model includes: substituting the pixels in the current template, the first reference template and the second reference template into each initial model to solve for the parameters of that initial model, obtaining a prediction model corresponding to each initial model;
the predicting the current block includes: predicting the current block using the first reference block, the second reference block and the prediction model corresponding to each initial model to obtain a prediction block corresponding to each initial model, and determining the prediction block of the current block based on the prediction blocks corresponding to the plurality of initial models;
wherein different initial models differ in their formula and/or gradient calculation method.
To solve the above problem, the present application further provides a video encoding method, which includes:
determining a prediction block of a current block in an image based on the above prediction method;
the current block is encoded based on the prediction block.
In an embodiment, the encoding the current block based on the prediction block includes:
setting the value of a preset syntax element in the coded code stream, wherein different values of the preset syntax element indicate whether the prediction method is enabled.
In an embodiment, the encoding the current block based on the prediction block includes:
and coding the current template index and the initial model index corresponding to the prediction block to obtain a coded code stream.
In order to solve the above problem, the present application further provides a video decoding method, which includes:
determining a prediction block of a current block in an image based on the above prediction method;
the current block is decoded based on the prediction block.
To solve the above problems, the present application also provides an encoder including a processor; the processor is configured to execute instructions to implement the steps of the method as described above.
To solve the above problems, the present application also provides a decoder including a processor; the processor is configured to execute instructions to implement the steps of the method as described above.
To solve the above-described problems, the present application also provides a computer-readable storage medium storing instructions/program data that can be executed to implement the above-described method.
According to the present application, the parameters of the prediction model used to predict the current block from the first reference block and the second reference block are solved using the template of the current block and the templates of the two reference blocks. The bidirectional prediction coefficients can thus be calculated adaptively from the template areas and applied to the current block; the calculated parameters are not restricted to one of several preset choices, so different and better-suited parameters can be obtained adaptively for different images, which can increase prediction accuracy and improve coding and decoding efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a schematic diagram of an embodiment of a template arrangement in the related art;
FIG. 2 is a flow chart of an embodiment of a bi-directional inter prediction method of the present application;
FIG. 3 is a schematic diagram of an embodiment of template setup in the bi-directional inter prediction method of the present application;
FIG. 4 is a schematic diagram of another embodiment of template setup in the bi-directional inter prediction method of the present application;
FIG. 5 is a schematic diagram of an embodiment of gradient calculation in the bi-directional inter prediction method of the present application;
FIG. 6 is a schematic diagram of parameter calculation using templates in the bi-directional inter prediction method of the present application;
FIG. 7 is a schematic illustration of the determination of regions for calculating parameters in a template region in a bi-directional inter prediction method of the present application;
FIG. 8 is a schematic view of regions for calculating parameters in a template region in a bi-directional inter prediction method of the present application;
FIG. 9 is a schematic diagram of predicting a current block using a prediction model and two reference blocks in the bi-directional inter prediction method of the present application;
FIG. 10 is a schematic diagram of various template regions set in the bi-directional inter prediction method of the present application;
FIG. 11 is a flow chart illustrating an embodiment of a video encoding method according to the present application;
FIG. 12 is a schematic diagram of a scheme syntax set in the video coding method of the present application;
FIG. 13 is a schematic diagram of another arrangement of scheme syntax in the video coding method of the present application;
FIG. 14 is a schematic diagram of yet another arrangement of scheme syntax in the video coding method of the present application;
FIG. 15 is a schematic view of still another arrangement of scheme syntax in the video coding method of the present application;
FIG. 16 is a flow chart of an embodiment of a video decoding method of the present application;
FIG. 17 is a schematic diagram of an embodiment of an encoder of the present application;
FIG. 18 is a schematic diagram of an embodiment of a decoder of the present application;
fig. 19 is a schematic diagram of the structure of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure. In addition, the term "or" as used herein refers to a non-exclusive "or" (i.e., "and/or") unless otherwise indicated (e.g., "or otherwise" or in the alternative "). Also, the various embodiments described herein are not necessarily mutually exclusive, as some embodiments may be combined with one or more other embodiments to form new embodiments.
In the first related art, different prediction weight combinations are set; the bidirectional inter prediction process performs rate-distortion optimized (RDO) selection among the multiple weight combinations to pick the optimal one, and applies it to the two prediction blocks to obtain the prediction block of the current block.
Specifically, this related technique allows selection among multiple weight combinations, with w belonging to {1, 2, 3, 4, 5, 6, 7}: w/8 is the weight assigned to P_1, the prediction result of MV_2, and subtracting it from 1 gives (8 - w)/8, the weight assigned to P_0, the prediction result of MV_1. The two blocks are weighted and fused according to the assigned weights to obtain the prediction result P_bi-pred of the current block. The detailed weighting is given by the following formula:
P_bi-pred = ((8 - w) × P_0 + w × P_1 + 4) >> 3;
where >>3 is a right shift by 3 bits, equivalent to division by 8, and +4 is a common rounding offset in engineering implementations to avoid rounding errors.
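For illustration only (this sketch is not part of the original disclosure; the function name and NumPy-based layout are assumptions), the preset-weight fusion above can be written as:

```python
import numpy as np

def preset_weight_bipred(p0: np.ndarray, p1: np.ndarray, w: int) -> np.ndarray:
    """Fuse two motion-compensated prediction blocks with a preset weight.

    w/8 weights p1 and (8 - w)/8 weights p0; the +4 offset rounds the
    result before the 3-bit right shift (integer division by 8).
    """
    assert w in range(1, 8)
    return ((8 - w) * p0.astype(np.int32) + w * p1.astype(np.int32) + 4) >> 3
```

An encoder following this related art would evaluate each candidate w by rate-distortion cost and signal the chosen index.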
In a second related art, the cost may be calculated using a template matching technique to derive the weight combinations used to arrive at a bi-prediction block. Specifically, as shown in fig. 1, calculating the cost of each weight combination by using the template of the current block and the templates of the two reference blocks; selecting the weight combination with the minimum cost from a plurality of weight combinations; and predicting the current block by using the two reference blocks based on the selected weight combination to obtain a predicted block of the current block.
In both related arts, the weights used are preset weight combinations; for any prediction block using bi-prediction weighting, only one of the preset weight combinations can be used. The available weight choices are thus limited and cannot adapt to relatively complex practical situations.
Based on the above, the present application proposes a bidirectional inter prediction method that substitutes pixels from the template of the current block and the templates of the two reference blocks into a prediction model to solve for its parameters, thereby determining the weighted prediction model applied to the motion-compensated blocks of the first reference block and the second reference block. Because the model parameters are solved from the template of the current block and the templates of the two reference blocks, the bidirectional prediction coefficients can be calculated adaptively from the template areas and applied to the current block. The parameters so calculated are not restricted to one of several preset options, so different and better-suited parameters can be obtained adaptively for different images, which can increase prediction accuracy and improve coding and decoding efficiency.
Specifically, as shown in fig. 2, the bidirectional inter prediction method provided in the present application includes the following steps. It should be noted that the step numbers below are only for simplifying the description and are not intended to limit the execution order; the execution order of the steps of this embodiment may be changed arbitrarily without departing from the technical idea of the present application.
S101: a first reference block and a second reference block of the current block are determined.
The first reference block and the second reference block of the current block may be determined first, so that parameters of the prediction model are solved subsequently based on templates of the current block and the two reference blocks.
Alternatively, the first reference block and the second reference block of the current block may be determined by means of merge and/or AMVP or the like.
S102: a current template, a first reference template, and a second reference template are determined.
After determining the first reference block and the second reference block of the current block, the current template, the first reference template, and the second reference template may be determined.
As shown in fig. 1, the current template is a template of the current block, i.e., a pixel region adjacent to the current block. The first reference template is a template of the first reference block, i.e. a region of pixels adjacent to the first reference block. The second reference template is a template of the second reference block, i.e. a region of pixels adjacent to the second reference block. The positional relationship between the current template and the current block, the positional relationship between the first reference template and the first reference block, and the positional relationship between the second reference template and the second reference block are the same.
In one implementation, the method is pre-configured with a plurality of template regions, and the current template may be one or more of the plurality of template regions. After determining a current template, determining a corresponding first reference template and a corresponding second reference template based on the current template so as to solve a prediction model corresponding to the current template based on the current template and the corresponding first reference template and second reference template, thereby facilitating the subsequent prediction of a prediction block corresponding to the current template by using the prediction model corresponding to the current template. For example, as shown in fig. 3, assuming that the determined current template is an a region, a B region, and a D region adjacent to the current block, the first reference template corresponding to the current template is also an a region, a B region, and a D region adjacent to the first reference block, and the second reference template corresponding to the current template is also an a region, a B region, and a D region adjacent to the second reference block.
In another implementation, the maximum template area may be divided into a plurality of sub-areas; selecting at least one sub-area from the plurality of sub-areas as a current template, so that all pixels or partial pixel areas in the maximum template area can be used as the current template; and corresponding first and second reference templates are determined based on the current template. As shown in fig. 4, the maximum template area may include an a area, a B area, a C area, a D area, and an E area. For example, an a region, a B region, and a D region adjacent to the current block may be taken as template regions of the current block, and respective first and second reference templates may be determined. For another example, an a region, a B region, and a C region adjacent to the current block may be taken as template regions of the current block, and respective first and second reference templates may be determined. For another example, an a region, a D region, and an E region adjacent to the current block may be taken as template regions of the current block, and respective first and second reference templates may be determined. For another example, a B region adjacent to the current block may be taken as a template region of the current block, and the corresponding first and second reference templates may be determined. For another example, a D region adjacent to the current block may be taken as a template region of the current block, and the corresponding first and second reference templates may be determined.
Alternatively, as shown in fig. 4, the number of rows H and/or columns W of the current template is not limited, and may be equal to 1, 3, or 4, for example. And the number of rows H and the number of columns W of the current template may be equal or unequal. For example, the number of rows h=the number of columns w=1 of the current template. For another example, the number of rows h=the number of columns w=4 of the current template. For another example, the number of rows h=1 of the current template and the number of columns w=4 of the current template.
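As a rough illustration of how template pixels might be collected (the layout, names and NumPy representation are assumptions made for this sketch, not definitions from the original text):

```python
import numpy as np

def take_template(frame: np.ndarray, x0: int, y0: int, bw: int, bh: int,
                  rows: int = 1, cols: int = 1) -> dict:
    """Collect the reconstructed pixel strips adjacent to a bw x bh block
    whose top-left corner is (x0, y0); boundary checks are omitted.

    'top' is the rows-high strip above the block, 'left' the cols-wide
    strip to its left, playing the role of regions such as B and D.
    """
    return {
        "top":  frame[y0 - rows:y0, x0:x0 + bw],
        "left": frame[y0:y0 + bh, x0 - cols:x0],
    }
```

Applying the same extraction at the positions of the two reference blocks would yield the first and second reference templates, keeping the positional relationship identical as required above.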
S103: substituting the pixels in the current template, the first reference template and the second reference template into the initial model to solve for the parameters of the initial model, obtaining a prediction model.
After determining the templates of the current block and the two reference blocks, the parameters of the initial model can be solved based on the templates of the current block and the two reference blocks, and a prediction model is obtained.
Alternatively, the above-mentioned "substituting pixels in the first reference template and the second reference template into the initial model" may mean: substituting the motion-compensated values of the pixels in the first reference template and the second reference template into the initial model.
The initial model may include weighted terms of the two reference blocks, so that after the parameters of the initial model are solved in this step to obtain a prediction model, the prediction model is subsequently used to weight the motion-compensated blocks of the first reference block and the second reference block to obtain the prediction block of the current block. For example, the formula of the initial model may be:
P_bi-pred(x, y) = a × P_1(x, y) + b × P_2(x, y);
where, when solving the initial model parameters, P_1(x, y) and P_2(x, y) are the motion-compensated values of the (x, y) pixel in the first and second reference templates, respectively; and when predicting the current block with the prediction model, P_1(x, y) and P_2(x, y) are the motion-compensated values of the (x, y) pixel in the first reference block and the second reference block, and P_bi-pred(x, y) is the predicted value of the (x, y) pixel in the current block.
The initial model may further include position information terms, so that after the parameters are solved in this step to obtain a prediction model, position-related information is fused when the prediction model weights the motion-compensated blocks of the first reference block and the second reference block; the motion-compensated blocks of the two reference blocks can then be fused more flexibly, improving prediction accuracy and coding efficiency. For example, the formula of the initial model may be:
P_bi-pred(x, y) = a × P_1(x, y) + b × P_2(x, y) + c × pos_x + d × pos_y;
where, when solving the initial model parameters, pos_x and pos_y are the horizontal and vertical positions of the pixel at (x, y) in the current template relative to a first preset point in the current template (such as the upper-left or upper-right corner); and when predicting the current block with the solved prediction model, pos_x and pos_y are the horizontal and vertical positions of the pixel at (x, y) in the current block relative to a second preset point in the current block (such as the upper-left or upper-right corner).
In addition, the initial model may further include gradient terms, so that after the parameters are solved in this step to obtain a prediction model, texture-related information is fused when the prediction model weights the motion-compensated blocks of the first reference block and the second reference block; the motion-compensated blocks of the two reference blocks can then be fused more flexibly, improving prediction accuracy and coding efficiency. For example, the formula of the initial model may be:
P_bi-pred(x, y) = a × P_1(x, y) + b × P_2(x, y) + c × grad(x, y);
where grad(x, y) is the gradient value at the pixel at position (x, y). As another example, the formula of the initial model may be:
P_bi-pred(x, y) = a × P_1(x, y) + b × P_2(x, y) + c × pos_x + d × pos_y + e × grad(x, y).
The method for calculating the gradient value at the pixel point of the (x, y) position is not limited. Optionally, in solving the parameters of the initial model, grad (x, y) is obtained by using the gradient value at the (x, y) position pixel point in the first reference template and/or the gradient value at the (x, y) position pixel point in the second reference template; correspondingly, when the current block is predicted by using the prediction model obtained by solving, grad (x, y) is obtained by using the gradient value at the (x, y) position pixel point in the first reference block and/or the gradient value at the (x, y) position pixel point in the second reference block.
For example, in solving for parameters of the initial model, grad (x, y) is equal to the gradient value at the (x, y) position pixel point in the first reference template; accordingly, when predicting the current block using the solved prediction model, grad (x, y) is equal to the gradient value at the (x, y) position pixel point in the first reference block.
For another example, assume that the image frame to which the second reference block belongs is closer to the image frame to which the current block belongs than the image frame to which the first reference block belongs: when solving the parameters of the initial model, grad (x, y) is equal to the gradient value at the pixel point at the (x, y) position in the second reference template; accordingly, when predicting the current block using the solved prediction model, grad (x, y) is equal to the gradient value at the (x, y) position pixel point in the second reference block.
For another example, assume that the image frame to which the first reference block belongs is closer to the image frame to which the current block belongs than the image frame to which the second reference block belongs: then, in solving the parameters of the initial model, grad (x, y) is obtained by using the gradient value at the (x, y) position pixel point in the first reference template, specifically, grad (x, y) may be equal to the gradient value at the (x, y) position pixel point in the first reference template; accordingly, when predicting the current block using the solved prediction model, the grad (x, y) is obtained using the gradient value at the (x, y) position pixel point in the first reference block, and specifically, the grad (x, y) may be equal to the gradient value at the (x, y) position pixel point in the first reference block.
For another example, when solving the parameters of the initial model, the grad (x, y) is equal to the value obtained by adding the gradient value at the (x, y) position pixel point in the first reference template to the gradient value at the (x, y) position pixel point in the second reference template; accordingly, when the current block is predicted using the solved prediction model, grad (x, y) is equal to a value obtained by adding the gradient value at the (x, y) position pixel point in the first reference block to the gradient value at the (x, y) position pixel point in the second reference block.
For another example, in solving the parameters of the initial model, grad (x, y) is equal to the average of the gradient values at the (x, y) position pixels in the first reference template and the gradient values at the (x, y) position pixels in the second reference template; accordingly, when predicting the current block using the solved prediction model, grad (x, y) is equal to an average value of the gradient values at the (x, y) position pixels in the first reference block and the gradient values at the (x, y) position pixels in the second reference block.
The calculation mode of the gradient value at the (x, y) position pixel point in each reference template and the gradient value at the (x, y) position pixel point in each reference block is not limited, and specifically, the gradient of the (x, y) position pixel point is calculated through the (x, y) position pixel point and the surrounding pixel points. And, the gradient value at the (x, y) position pixel point in each reference template may be calculated by using the gradient value in at least one direction at the (x, y) position pixel point in each reference template, and correspondingly, the gradient value at the (x, y) position pixel point in each reference block may also be calculated by using the gradient value in at least one direction at the (x, y) position pixel point in each reference block. In a specific example, the gradient value at the (x, y) position pixel point in each reference template may be equal to the sum of squares of the gradient values in the plurality of directions at the (x, y) position pixel point in each reference template, and correspondingly, the gradient value at the (x, y) position pixel point in each reference block is equal to the sum of the squares of the gradient values in the plurality of directions at the (x, y) position pixel point in each reference block. In another specific example, the gradient values at the (x, y) position pixels in each reference template may be equal to weighted values of the gradient values in the plurality of directions at the (x, y) position pixels in each reference template, and correspondingly, the gradient values at the (x, y) position pixels in each reference block are equal to weighted values of the gradient values in the plurality of directions at the (x, y) position pixels in each reference block.
The gradient value in each direction at the pixel at position (x, y) in each reference template can be calculated by a gradient operator such as the Sobel operator, or from several pixels lying on a line through (x, y) along that direction. Correspondingly, the gradient value in each direction at the pixel at position (x, y) in each reference block can also be calculated by an arbitrary gradient operator, or from several points lying on a line through (x, y) along that direction. Illustratively, as shown in FIG. 5, the horizontal gradient of pixel A can be calculated by grad_A = P_E - P_C, grad_A = P_C - P_E, or grad_A = (P_E - P_A) + (P_A - P_C), where P_A, P_E and P_C are the pixel values at points A, E and C respectively, and grad_A denotes the value of grad(x, y) in the formula when (x, y) is point A.
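A sketch of one possible gradient computation (the neighbour layout is assumed; the exact point labelling of FIG. 5 is not reproduced here), combining horizontal and vertical Sobel responses as a sum of squares, one of the fusions described above:

```python
import numpy as np

def sobel_gradient_sq(p: np.ndarray, x: int, y: int) -> int:
    """Sum of squared horizontal and vertical Sobel responses at the
    interior pixel (x, y); a simpler one-direction gradient, such as the
    difference of the right and left neighbours, is equally admissible."""
    b = p.astype(np.int64)
    gx = (b[y - 1, x + 1] + 2 * b[y, x + 1] + b[y + 1, x + 1]
          - b[y - 1, x - 1] - 2 * b[y, x - 1] - b[y + 1, x - 1])
    gy = (b[y + 1, x - 1] + 2 * b[y + 1, x] + b[y + 1, x + 1]
          - b[y - 1, x - 1] - 2 * b[y - 1, x] - b[y - 1, x + 1])
    return int(gx * gx + gy * gy)
```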
Preferably, the gradient at the (x, y) pixel of the first reference block/first reference template and the gradient at the (x, y) pixel of the second reference block/second reference template are calculated by the same method. Of course, in other embodiments, the two calculation methods may also differ.
Furthermore, the initial model may also include a constant term. For example, the formula of the initial model can be:
P_bi-pred(x, y) = a × P_1(x, y) + b × P_2(x, y) + c × 2^(bitdepth-1) + d × pos_x + e × pos_y + f × grad(x, y);
where 2^(bitdepth-1) is the constant. As another example, the formula may be:
P_bi-pred(x, y) = a × P_1(x, y) + b × P_2(x, y) + c × 2^(bitdepth-1) + d × grad(x, y); or
P_bi-pred(x, y) = a × P_1(x, y) + b × P_2(x, y) + c × 2^(bitdepth-1) + d × pos_x + e × pos_y.
The constant term may be calculated from the bit depth of the image to which the current block belongs. Illustratively, in an image with a bit depth of 10, the constant is 2^(10-1) = 512; i.e., 2^(bitdepth-1) is the pixel value 512. Of course, in other embodiments the constant may also be calculated from parameters such as the resolution of the image to which the current block belongs, or it may simply be a set value.
In addition, a parameter normalization constraint may be imposed on the initial model, i.e., the sum of all parameters in the initial model is 1. The formula of the initial model may then be:
P_bi-pred(x, y) = a × P_1(x, y) + (1 - a) × P_2(x, y); or,
P_bi-pred(x, y) = a × P_1(x, y) + b × P_2(x, y) + (1 - a - b) × 2^(bitdepth-1); or,
P_bi-pred(x, y) = a × P_1(x, y) + b × P_2(x, y) + c × pos_x + (1 - a - b - c) × pos_y; or,
P_bi-pred(x, y) = a × P_1(x, y) + b × P_2(x, y) + (1 - a - b) × grad(x, y); or,
P_bi-pred(x, y) = a × P_1(x, y) + b × P_2(x, y) + c × pos_x + d × pos_y + (1 - a - b - c - d) × grad(x, y); or,
P_bi-pred(x, y) = a × P_1(x, y) + b × P_2(x, y) + c × 2^(bitdepth-1) + (1 - a - b - c) × grad(x, y); or,
P_bi-pred(x, y) = a × P_1(x, y) + b × P_2(x, y) + c × 2^(bitdepth-1) + d × pos_x + (1 - a - b - c - d) × pos_y; or,
P_bi-pred(x, y) = a × P_1(x, y) + b × P_2(x, y) + c × 2^(bitdepth-1) + d × pos_x + e × pos_y + (1 - a - b - c - d - e) × grad(x, y); or,
P_bi-pred(x, y) = (1 - a - b - c - d - e) × P_1(x, y) + a × P_2(x, y) + b × 2^(bitdepth-1) + c × pos_x + d × pos_y + e × grad(x, y).
Of course, in other embodiments, there may be no parameter normalization limitation in the initial model, i.e., the sum of all parameters in the initial model may not be equal to 1.
In addition, a value range can be set for at least one parameter of the initial model; when a solved parameter exceeds its value range, it is modified so that the parameters of the resulting prediction model stay within range. For example, in the formula P_bi-pred(x, y) = a × P_1(x, y) + (1 - a) × P_2(x, y), the value range of a can be set to [0, 1] (this restriction may also be omitted). In this example, with the range of a set to [0, 1]: if the calculated a is greater than 1, a is changed to 1; if the calculated a is less than 0, a is changed to 0.
In one implementation, the parameters of the initial model may be solved with an optimization method to obtain the prediction model. Specifically, with the objective of minimizing the fused value of the differences at all pixel positions, an optimal parameter solution can be found for the initial model to obtain the prediction model. The difference at each pixel position is the difference between the predicted value at that position and the value at that position in the current template, and the predicted value at each position is obtained by substituting the motion-compensated pixel values at that position in the first reference template and the second reference template into the initial model.
Specifically, as shown in fig. 6, the current template is denoted A, and the templates of the two reference blocks are denoted B and C, respectively. Let (x_a^i, y_a^i), (x_b^i, y_b^i) and (x_c^i, y_c^i) be the points at corresponding locations of regions A, B and C. Substituting the motion-compensated pixel values at (x_b^i, y_b^i) and (x_c^i, y_c^i) into the initial model yields the predicted value at (x_a^i, y_a^i); the true value of the template pixel at (x_a^i, y_a^i) is also available.
The optimal parameter solution is the weight parameter θ that minimizes the difference between the pixel values predicted at all position points and the pixel values actually available at those points in the current template A. With n points in each of A/B/C, the optimization formula can be expressed as:
θ = argmin_θ Σ_{i=0…n-1} d(P_bi-pred(x_i, y_i) - P_bi-rec(x_i, y_i));
where d(·) denotes the measurement method, e.g., the square (·)² or the absolute value |·|; P_bi-pred(x_i, y_i) denotes the prediction result for the current template (at A in fig. 6) obtained by substituting the pixel values of the two reference templates (at B and C in fig. 6) and other information into the weighting model; P_bi-rec(x_i, y_i) denotes the true reconstructed value of the current template (at A in fig. 6); and argmin_θ(·) means taking the weight parameter θ that minimizes the difference between the two.
The specific solving process of the optimization formula is not limited.
In a specific example of this implementation, assume that the formula of the initial model is:
P_bi-pred(x, y) = a × P_1(x, y) + (1 - a) × P_2(x, y);
where a is the parameter to be calculated using the template region.
In this example, the parameter a is solved with the optimization method of this implementation.
Specifically, the optimization objective may first be determined:
θ = argmin_θ Σ_{i=0…n-1} (P_bi-pred(x_i, y_i) - P_bi-rec(x_i, y_i))².
Substituting the n points of the first reference template and the second reference template into the formula then gives:
a = argmin_a Σ_{i=0…n-1} (a × P_1(x_i, y_i) + (1 - a) × P_2(x_i, y_i) - P_bi-rec(x_i, y_i))².
Expanding, this is equivalent to:
a = argmin_a Σ_{i=0…n-1} (P_1(x_i, y_i) - P_2(x_i, y_i))² × a² + Σ_{i=0…n-1} 2(P_1(x_i, y_i) - P_2(x_i, y_i))(P_2(x_i, y_i) - P_bi-rec(x_i, y_i)) × a;
setting the derivative of this quadratic in a to zero determines the parameter:
a = Σ_{i=0…n-1} (P_1(x_i, y_i) - P_2(x_i, y_i))(P_bi-rec(x_i, y_i) - P_2(x_i, y_i)) / Σ_{i=0…n-1} (P_1(x_i, y_i) - P_2(x_i, y_i))².
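The closed-form fit above translates directly into code; a minimal sketch (names assumed; template pixels passed as flat arrays):

```python
import numpy as np

def solve_weight_a(p1, p2, rec):
    """Least-squares fit of a in  rec ~ a*p1 + (1-a)*p2  over the template.

    Setting the derivative of the squared error to zero gives
    a = sum((p1 - p2) * (rec - p2)) / sum((p1 - p2)^2).
    """
    p1 = np.asarray(p1, dtype=np.float64)
    p2 = np.asarray(p2, dtype=np.float64)
    rec = np.asarray(rec, dtype=np.float64)
    d = p1 - p2
    denom = np.sum(d * d)
    if denom == 0.0:              # identical templates: any a is optimal
        return 0.5
    a = np.sum(d * (rec - p2)) / denom
    return float(np.clip(a, 0.0, 1.0))   # optional [0, 1] clamp, see above
```

The final clamp corresponds to the optional value-range restriction on a described earlier.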
In another implementation, at least one equation may be constructed by substituting the pixel values of the two reference templates (at B and C in FIG. 6) and other information, together with the pixels of the current template (at A in FIG. 6), into the initial model; solving the equation(s) yields the parameters of the initial model and thus the prediction model.
In a specific example of this implementation, the formula of the initial model is:
P_bi-pred(x, y) = a × P_1(x, y) + b × P_2(x, y) + c × 2^(bitdepth-1) + d × pos_x + e × pos_y + f × grad(x, y);
where a, b, c, d, e and f are the parameters to be calculated using the template regions.
The gradient grad(x, y) may be calculated as follows: using the Sobel operator, the square of the horizontal gradient plus the square of the vertical gradient is computed in each reference template, and grad(x, y) is the sum over the two reference templates, i.e., grad(x, y) = grad_A(x, y) + grad_a(x, y). Further, as shown in FIG. 5, the gradient at point A is:
grad_A = (P_F + 2P_C + P_H - P_G - 2P_E - P_I)² + (P_F + 2P_B + P_G - P_H - 2P_D - P_I)²;
where P_i denotes the pixel value at point i, with i being F, C, H, G, E, B, D or I as shown in fig. 5. Point A is the current position point in one reference template; the corresponding position point in the other reference template is denoted a, and the gradient grad_a of point a is calculated in the same way as grad_A of point A, which is not repeated here.
Model parameters may be calculated using the current template and the templates of the two reference blocks, as shown in fig. 6.
The optimization objective may be determined as:
θ = argmin_θ Σ_{i=0…n-1} (P_bi-pred(x_i, y_i) - P_bi-rec(x_i, y_i))².
This optimization problem can also be converted into solving a system of linear equations: for each point (x_i, y_i), the equation is:
a × P_1(x_i, y_i) + b × P_2(x_i, y_i) + c × 2^(bitdepth-1) + d × pos_xi + e × pos_yi + f × grad(x_i, y_i) = P_bi-rec(x_i, y_i);
For a template region of n points in total, n such equations are obtained, which together form a system of equations. The system can be solved, e.g., by LDL decomposition or other equation-system solvers, to obtain the parameter values a, b, c, d, e and f; substituting them into the formula gives the solved prediction model.
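A sketch of the six-parameter fit (names assumed; np.linalg.lstsq is used here in place of the LDL factorisation mentioned above, but it solves the same least-squares problem):

```python
import numpy as np

def fit_prediction_model(p1, p2, pos_x, pos_y, grad, rec, bitdepth=10):
    """Solve a..f in
    a*P1 + b*P2 + c*2^(bitdepth-1) + d*pos_x + e*pos_y + f*grad = rec
    in the least-squares sense, one row per template pixel; all inputs
    are flat arrays of equal length n."""
    p1 = np.asarray(p1, dtype=np.float64)
    const = np.full(p1.shape, 2.0 ** (bitdepth - 1))
    A = np.stack([p1,
                  np.asarray(p2, dtype=np.float64),
                  const,
                  np.asarray(pos_x, dtype=np.float64),
                  np.asarray(pos_y, dtype=np.float64),
                  np.asarray(grad, dtype=np.float64)], axis=1)
    theta, *_ = np.linalg.lstsq(A, np.asarray(rec, dtype=np.float64), rcond=None)
    return theta  # (a, b, c, d, e, f)
```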
When the parameters are solved by the method for solving the initial model parameters, all pixel points in the current template and the two reference templates can be utilized to solve the initial model parameters.
Of course, when the parameters are solved by the method for solving the initial model parameters, the initial model parameters can be solved by using partial pixels in the current template and the two reference templates.
Further, considering that the pixel values of some pixels in the current template and/or the two reference templates cannot be determined (e.g., because they have not yet been reconstructed), at least part of the pixels of a common area of the current template and the two reference templates may be used to solve the initial model parameters. The common area is the intersection of the area of determinable pixel values in the current template, the area of determinable pixel values in the first reference template, and the area of determinable pixel values in the second reference template; this avoids substituting pixels with undeterminable values into the formula and corrupting the parameter solution.
The common area should be less than or equal to the maximum template area. The common area may be determined as follows:
if, for a certain region of the maximum template area, only one or two of the current template and the two reference templates can determine pixel values, that region is cropped out, so that the common area equals the current template area minus this region;
or, if the current template cannot determine the pixel values of a certain region of the maximum template area, that region is cropped out, and the common area equals the current template area minus this region.
The pixel value of each point within the common area determined by the above method may be obtained directly or by filling. Pixel filling manners include, but are not limited to, the following:
A. filling with nearby pixel values;
B. filling with a statistic, such as the mean or mode, of all or some of the obtainable pixel values;
C. filling with a fixed value, such as 512 at a bit depth of 10.
In a specific example, as shown in fig. 7, region C of the current template is a region whose pixel values cannot be determined, i.e., the template pixels at the upper right of the current block are unavailable; and since part of the reference template pointed to by MV2 exceeds the image boundary, some pixels at the lower left cannot be acquired.
Thus, as shown in fig. 8, since region C of the current template cannot be obtained, the same region is also removed from the templates of the two reference blocks; and since only some pixels of region E of the reference template pointed to by MV2 cannot be obtained, the unobtainable part is filled by means of the obtainable pixels in that template. Referring to fig. 7, the resulting common region, i.e., the part of the three templates finally used, consists of region A, region B, region D and region E.
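A sketch of the common-region and filling logic, under the assumption that availability is tracked as boolean masks aligned with the maximum template area (the mask representation and names are illustrative):

```python
import numpy as np

def common_region(avail_cur, avail_ref1, avail_ref2):
    """Intersection of the three availability masks: a pixel is used only
    if all three templates can determine its value, as described above."""
    return avail_cur & avail_ref1 & avail_ref2

def fill_unavailable(tpl, avail, bitdepth=10):
    """Fill unavailable reference-template pixels, here with the mean of
    the available ones (option B); a fixed 2^(bitdepth-1) fill (option C)
    or nearest-pixel fill (option A) would work the same way."""
    out = np.asarray(tpl, dtype=np.float64).copy()
    if avail.any():
        out[~avail] = out[avail].mean()
    else:
        out[~avail] = 2.0 ** (bitdepth - 1)
    return out
```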
S104: predicting the current block using the first reference block, the second reference block and the prediction model to obtain the prediction block of the current block.
After the initial model parameters are solved in the above steps to obtain the prediction model, the prediction model can be applied to the prediction of the current block. Specifically, the two prediction blocks obtained by motion compensation of the two reference blocks can be used as input to the prediction model, which calculates and outputs the predicted value of each point in the current block, thereby yielding the prediction block of the current block.
Wherein, the calculation by the prediction model can be represented as: and weighting the first prediction block and the second prediction block by using a prediction model, wherein the first prediction block is obtained by performing motion compensation on the first reference block, and the second prediction block is obtained by performing motion compensation on the second reference block.
Alternatively, in the case where the predictive model includes gradient terms, the calculation by the predictive model may be embodied as: and weighting each pixel in the first prediction block and the corresponding pixel in the second prediction block by using the prediction model, and calculating the gradient of each pixel in the first prediction block and/or the gradient of the corresponding pixel in the second prediction block to obtain the predicted value of the corresponding pixel in the current block, thereby obtaining the prediction block of the current block.
Alternatively, in the case where the prediction model includes a positional information item, calculation by the prediction model may be embodied as: and weighting each pixel in the first prediction block and the corresponding pixel in the second prediction block by using the prediction model, and calculating the position information of the corresponding pixel in the current block to obtain the prediction value of the corresponding pixel in the current block, thereby obtaining the prediction block of the current block.
Further, in the case where the prediction model includes a gradient term and a position information term, calculation by the prediction model may be embodied as: and weighting each pixel in the first prediction block and the corresponding pixel in the second prediction block by using the prediction model, calculating the gradient of each pixel in the first prediction block and/or the gradient of the corresponding pixel in the second prediction block, and calculating the position information of the corresponding pixel in the current block to obtain the predicted value of the corresponding pixel in the current block, thereby obtaining the prediction block of the current block.
Specifically, the predicted value of the corresponding pixel in the current block may be equal to the sum of a weighting value, a gradient operation value, a position information operation value and/or a constant. The weighting value is obtained by weighting each pixel in the first prediction block and the corresponding pixel in the second prediction block; the gradient operation value is obtained by operating on the gradient of each pixel in the first prediction block and/or the gradient of the corresponding pixel in the second prediction block; and the position information operation value is obtained by operating on the position information of the corresponding pixel in the current block.
For the gradient operation method, the position information operation method and the constants, reference may be made to step S103; they are not repeated here.
Further, considering that the prediction result determined by the prediction model may exceed the value range of pixel values, after it is obtained, the prediction result of each pixel in the current block may be truncated to that range. In this case, the predicted value of the corresponding pixel in the current block equals the sum of the weighting value, the gradient operation value, the position information operation value and/or the constant, truncated to the pixel value range.
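As a minimal sketch, the truncation step can be written as a simple clip to the pixel range implied by the bit depth (the function name is hypothetical):

```python
def clip_pixel(value, bitdepth=10):
    """Truncate a model output to the legal pixel range [0, 2**bitdepth - 1]."""
    max_val = (1 << bitdepth) - 1
    return max(0, min(int(round(value)), max_val))
```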
In a specific example, as shown in fig. 9, two prediction blocks, i.e., a first prediction block and a second prediction block, are obtained by motion compensation of the two reference blocks B and C pointed to by the two MVs. The pixel values of each point (x, y) in the two prediction blocks are then substituted into the prediction model to obtain the predicted value of each point in the current block A.
For example, the predictive model is a formula as shown below:
P(x,y) = a×P1(x,y) + b×P2(x,y) + c×2^(bitdepth-1) + d×pos_x + e×pos_y + f×grad(x,y);
Based on the above formula, besides substituting P1(x,y) and P2(x,y) into the prediction model, the relative positions pos_x and pos_y of the point (x,y) with respect to the upper-left corner of the current block A and the gradient grad(x,y) are also substituted, so that the prediction model determines the prediction result of each pixel in the current block. The gradient calculation method is described in step S103 and is not repeated here.
The prediction result P(x,y) is then truncated to the defined pixel range; for example, in an image with a bit depth of 10, the pixel range is [0, 1023]. Using the formula P_bi-pred(x,y) = clip(P(x,y)), the prediction results of all pixels of the current block A are truncated to the range [0, 1023], yielding the predicted value P_bi-pred of each point in the current block and thus the prediction block of the current block.
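As an illustrative sketch of this example (not the normative implementation), the per-pixel model evaluation and truncation might look as follows; the coefficient tuple and the gradient function are assumed to have already been obtained from the template fitting of step S103:

```python
import numpy as np

def predict_block(P1, P2, coeffs, bitdepth=10, grad_fn=None):
    """Evaluate P(x,y) = a*P1 + b*P2 + c*2^(bitdepth-1) + d*pos_x + e*pos_y
    + f*grad(x,y), then clip to [0, 2^bitdepth - 1].
    P1, P2: motion-compensated prediction blocks (H x W arrays).
    coeffs: (a, b, c, d, e, f), already solved from the templates."""
    a, b, c, d, e, f = coeffs
    h, w = P1.shape
    pos_y, pos_x = np.mgrid[0:h, 0:w]  # positions relative to the top-left corner
    grad = grad_fn(P1, P2) if grad_fn else np.zeros_like(P1, dtype=float)
    pred = (a * P1 + b * P2 + c * 2.0 ** (bitdepth - 1)
            + d * pos_x + e * pos_y + f * grad)
    return np.clip(np.rint(pred), 0, (1 << bitdepth) - 1).astype(np.int32)
```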
In this embodiment, pixels of the current block's template and of the two reference blocks' templates are substituted into the model to solve its parameters, yielding a prediction model that predicts the current block from the first and second reference blocks. Because the parameters are solved from the template regions, the bi-directional prediction coefficients are calculated adaptively rather than chosen from a few preset options; different, better-suited parameters can thus be obtained for different images, which increases prediction accuracy and improves coding and decoding efficiency.
Alternatively, in performing bidirectional inter prediction with the above method, multiple current templates, multiple first reference templates and multiple second reference templates may be set for the current block, all in one-to-one correspondence. In this case, the prediction block corresponding to each current template can be determined using steps S103 and S104, and the prediction block of the current block is then determined based on the prediction blocks corresponding to the multiple current templates.
Specifically, determining the prediction block corresponding to each current template using steps S103 and S104 can be embodied as: using the pixels in each current template, substituting the pixels in its corresponding first and second reference templates into the initial model to solve the model parameters, thereby obtaining a prediction model for each current template; and predicting the current block with the first reference block, the second reference block and the prediction model of each current template to obtain the prediction block corresponding to each current template.
Determining the prediction block of the current block based on the prediction blocks corresponding to the multiple current templates may include: calculating a prediction cost for the prediction block of each current template to obtain the cost of each current template; and determining the prediction block of the current block based on the costs of the prediction blocks of all current templates.
Preferably, the prediction block corresponding to the current template with the minimum cost is used as the prediction block of the current block.
The multiple current templates can serve as multiple modes, i.e., different current template selections constitute different modes. The encoder and decoder can preset these modes; the encoder selects the optimal mode by rate-distortion optimization (RDO) or a similar method and encodes the optimal mode index into the bitstream, and the decoder decodes the bitstream and performs prediction in the optimal mode.
Illustratively, as shown in fig. 10, 6 current templates are set for the current block, i.e., 6 modes are set for the template area. In this case, the prediction block corresponding to each current template is determined, an optimal prediction block is selected from the 6 candidates by RDO or a similar method, and the current template corresponding to the optimal prediction block is taken as the optimal mode.
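A minimal sketch of this mode selection is given below. How the cost of a candidate block is computed (RDO, SAD against the template, etc.) is encoder policy and not fixed by the text, so `cost_fn` is left abstract and the names are hypothetical:

```python
def select_best_mode(candidates, cost_fn):
    """candidates: list of (mode_index, prediction_block) pairs.
    cost_fn: scores one candidate prediction block (encoder policy).
    Returns the (mode_index, prediction_block) with the minimum cost."""
    return min(candidates, key=lambda c: cost_fn(c[1]))

# Usage sketch: best_mode, best_block = select_best_mode(blocks_per_template, cost_fn)
```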
In addition, multiple initial models may be set when performing bidirectional inter prediction with the above method. In this case, the prediction block corresponding to each initial model can be determined using steps S103 and S104, and the prediction block of the current block is then determined based on the prediction blocks corresponding to the multiple initial models.
Specifically, determining the prediction block corresponding to each initial model using steps S103 and S104 can be embodied as: using the pixels in the current template, substituting the pixels in the first and second reference templates into each initial model to solve its parameters, thereby obtaining a prediction model for each initial model; and predicting the current block with the first reference block, the second reference block and the prediction model of each initial model to obtain the prediction block corresponding to each initial model.

Determining the prediction block of the current block based on the prediction blocks corresponding to the multiple initial models may include: calculating a prediction cost for the prediction block of each initial model to obtain the cost of each initial model; and determining the prediction block of the current block based on the costs of the prediction blocks of all initial models.
Preferably, the prediction block corresponding to the initial model with the minimum cost is used as the prediction block of the current block.
The formulas of different initial models may differ. Of course, in some cases the formulas of different initial models may be the same, with the gradient calculation methods of those models differing instead.
The multiple initial models can serve as multiple modes, i.e., different initial models constitute different modes; likewise, different gradient calculation methods can be set as different modes. The encoder and decoder can preset these modes; the encoder selects the optimal mode by RDO or a similar method and encodes the optimal mode index into the bitstream, and the decoder decodes the bitstream and performs prediction in the optimal mode.
Illustratively, 3 initial models are provided, i.e. 3 modes are provided in terms of initial models. The formulas of the 3 initial models are respectively as follows:
P_bi-pred(x,y) = a×P1(x,y) + b×P2(x,y) + c×2^(bitdepth-1);

P_bi-pred(x,y) = a×P1(x,y) + b×P2(x,y) + c×2^(bitdepth-1) + d×pos_x + e×pos_y;

P_bi-pred(x,y) = a×P1(x,y) + b×P2(x,y) + c×2^(bitdepth-1) + d×pos_x + e×pos_y + f×grad(x,y);
Under each initial model, the parameters are calculated using the current template and the two reference templates to obtain the prediction model corresponding to that initial model. The prediction block corresponding to each initial model is then calculated with its prediction model, after which the optimal mode is selected by RDO or a similar method and the corresponding prediction block is obtained.
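The parameter calculation itself can be sketched as an ordinary least-squares fit over the usable template pixels, which is one way to realize the optimization described in claim 8; the codec may use a different solver, and the flattened-array interface here is an assumption of the sketch. For the simpler models, the corresponding columns (positions, gradient) are simply omitted.

```python
import numpy as np

def solve_model_params(cur_tpl, ref1_tpl, ref2_tpl, pos_x, pos_y, grad, bitdepth=10):
    """Least-squares fit of (a..f) in
       cur ~= a*ref1 + b*ref2 + c*2^(bitdepth-1) + d*pos_x + e*pos_y + f*grad
    over all usable template pixels. All inputs are flattened 1-D arrays of
    equal length; ref1_tpl/ref2_tpl are the motion-compensated reference
    template pixels and grad is the per-pixel gradient as in step S103."""
    const = np.full_like(cur_tpl, 2.0 ** (bitdepth - 1), dtype=float)
    A = np.stack([ref1_tpl, ref2_tpl, const, pos_x, pos_y, grad], axis=1)
    params, *_ = np.linalg.lstsq(A, cur_tpl.astype(float), rcond=None)
    return params  # (a, b, c, d, e, f)
```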
In another specific example, assume the formula of the initial model is: P_bi-pred(x,y) = a×P1(x,y) + b×P2(x,y) + c×2^(bitdepth-1) + d×pos_x + e×pos_y + f×grad(x,y). Multiple gradient calculation methods are set for this initial model, so that different gradient calculation methods can be set as different modes. As shown in fig. 5, the gradient calculation methods in the initial model may include:

the Sobel operator computing the horizontal gradient, with the formula grad_A = P_F + 2P_C + P_H - P_G - 2P_E - P_I;

the Sobel operator computing the vertical gradient, with the formula grad_A = P_F + 2P_B + P_G - P_H - 2P_D - P_I;

the diagonal gradient, with the formula grad_A = P_C + 2P_H + P_D - P_B - 2P_G - P_E;

the anti-diagonal gradient, with the formula grad_A = P_C + 2P_F + P_B - P_D - 2P_I - P_E.

Under each gradient calculation method, the initial model parameters are calculated using the current template and the two reference template pixels to obtain the prediction model corresponding to that method; the current prediction block is then calculated with the prediction model, and the optimal mode and its corresponding prediction block are selected by RDO.
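The four gradient modes can be transcribed directly from the formulas above. This sketch takes the labelled 3×3 neighborhood of fig. 5 as input; the geometric meaning of the letters B–I is defined by that figure and is not assumed here:

```python
def neighborhood_gradients(P):
    """The four gradient modes, written directly from the formulas above.
    P: dict mapping the neighbor letters of fig. 5 ('B'..'I' around the
    centre pixel A) to their pixel values."""
    horizontal    = P['F'] + 2*P['C'] + P['H'] - P['G'] - 2*P['E'] - P['I']
    vertical      = P['F'] + 2*P['B'] + P['G'] - P['H'] - 2*P['D'] - P['I']
    diagonal      = P['C'] + 2*P['H'] + P['D'] - P['B'] - 2*P['G'] - P['E']
    anti_diagonal = P['C'] + 2*P['F'] + P['B'] - P['D'] - 2*P['I'] - P['E']
    return {'h': horizontal, 'v': vertical, 'd': diagonal, 'ad': anti_diagonal}
```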
Of course, in performing bidirectional inter prediction with the above method, multiple current templates, multiple first reference templates and multiple second reference templates (all in one-to-one correspondence) may be set for the current block, and multiple initial models may be set as well. In this case, the prediction block corresponding to each combination of current template and initial model can be determined using steps S103 and S104, and the prediction block of the current block is then determined based on the prediction blocks corresponding to the multiple current templates and multiple initial models.
Specifically, determining the prediction block corresponding to each current template and each initial model using steps S103 and S104 can be embodied as: using the pixels in each current template, substituting the pixels in its corresponding first and second reference templates into each initial model to solve that model's parameters, thereby obtaining a prediction model for each combination of current template and initial model; and predicting the current block with the first reference block, the second reference block and the prediction model of each combination to obtain the corresponding prediction block.

Determining the prediction block of the current block based on these prediction blocks may include: calculating a prediction cost for the prediction block of each combination of current template and initial model to obtain its cost; and determining the prediction block of the current block based on the costs of the prediction blocks of all combinations.
Preferably, the prediction block corresponding to the current template and initial model with the minimum cost is used as the prediction block of the current block.
Referring to fig. 11, fig. 11 is a flowchart illustrating an embodiment of a video encoding method according to the present application. It should be noted that if substantially the same results are obtained, the present embodiment is not limited to the flow sequence shown in fig. 11. In this embodiment, the video encoding method includes the following steps:
S301: determine a prediction block of the current block in the image based on any of the prediction methods described above.

S302: encode the current block based on the prediction block.
Optionally, when the current block is encoded, the value of a preset syntax element may be set in the encoded bitstream, where different values of the preset syntax element indicate whether the prediction method of the present application is enabled; that is, a switch syntax is set to indicate the use state of the prediction method of the present application.
For example, the syntax element enhanced_bcw_enable is set in the case of bi-prediction. When its value is 0, the prediction blocks obtained by motion compensation of the two reference blocks are weighted using the related art; when its value is 1, the prediction blocks obtained by motion compensation of the two reference blocks are weighted using the bidirectional inter prediction method described above.
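A decoder-side sketch of this switch might read as follows. The `reader.u(1)` call (read one bit) is a hypothetical bitstream-reader method used only for illustration, not a real API, and the actual binarization of the flag is not specified here:

```python
def parse_bcw_enable(reader):
    """Sketch of parsing the switch syntax; reader.u(n) is hypothetical."""
    enhanced_bcw_enable = reader.u(1)
    if enhanced_bcw_enable == 0:
        return 'related-art weighting'           # conventional bi-prediction weighting
    return 'enhanced bi-directional prediction'  # the method of this application
```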
In addition, when multiple schemes or modes are available in a codec flow that applies the bidirectional inter prediction method of the present application, a scheme syntax may be used to express the finally selected scheme or mode. For example, the scheme syntax expresses the finally selected current template, prediction model and/or gradient calculation method when multiple current templates, multiple prediction models and/or multiple gradient calculation methods are involved in the codec flow.
Alternatively, the modes may be set according to the current template; that is, if there are N selectable modes, a first scheme syntax is added that can take N index values.

The modes may also be set according to the prediction model; that is, if there are N selectable modes, a second scheme syntax is added that can take N index values.

The modes may further be set according to the gradient calculation method; that is, if there are N selectable modes, a third scheme syntax is added that can take N index values.
In one specific example, a syntax element enhanced_bcw_area_mode is added to indicate the selection of the template region (i.e., the current template). As shown in fig. 12, enhanced_bcw_enable equal to 1 indicates that the bidirectional inter prediction method of the present application is selected; in this case, the syntax element enhanced_bcw_area_mode is added, taking a value of 0 to 5, each value representing one of the 6 region selection methods in fig. 10, and the syntax element takes one of these index values to represent the selected template region setting method.
In another specific example, a syntax element enhanced_bcw_fun_mode is added to indicate the selection of the initial/prediction model. As shown in fig. 13, enhanced_bcw_enable equal to 1 indicates that the bidirectional inter prediction method of the present application is selected; in this case, the syntax element enhanced_bcw_fun_mode is added, taking a value of 0 to 2, each value representing one of the 3 model formulas, and the syntax element takes one of these index values to represent the selected formula.
In yet another specific example, a syntax element enhanced_bcw_grad_mode is added to indicate the selection of the gradient calculation method. As shown in fig. 14, enhanced_bcw_enable equal to 1 indicates that the bidirectional inter prediction method of the present application is selected; in this case, the syntax element enhanced_bcw_grad_mode is added, taking a value of 0 to 3, each value representing one of the 4 gradient calculation methods, and the syntax element takes one of these index values to represent the selected gradient calculation method.
In yet another specific example, the syntax element enhanced_bcw_fun_mode is added to represent the choice of formula in the initial/prediction model, and the syntax element enhanced_bcw_grad_mode is added to represent the choice of gradient calculation method. As shown in fig. 15, enhanced_bcw_enable equal to 1 indicates that the bidirectional inter prediction method of the present application is selected; in this case, enhanced_bcw_fun_mode is added, taking a value of 0 to 2 and representing the 3 model formulas, one of which is selected by the index value. When enhanced_bcw_fun_mode equals 2, the formula containing the gradient term is used: P_bi-pred(x,y) = a×P1(x,y) + b×P2(x,y) + c×2^(bitdepth-1) + d×pos_x + e×pos_y + f×grad(x,y). In this case, enhanced_bcw_grad_mode may additionally be added, taking a value of 0 to 3 and representing the 4 different gradient calculation methods, one of which is selected by the index value.
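The cascaded structure of this last example can be sketched as follows. The fixed-length two-bit reads are an assumption of the sketch; the actual binarization (e.g. truncated unary or context-coded bins) is not specified in the text, and `reader.u(n)` is again a hypothetical bit-reader method:

```python
def parse_bcw_mode_syntax(reader):
    """Sketch of the cascaded mode syntax: fun_mode is read when the tool
    is enabled, and grad_mode only when fun_mode == 2 (the formula with
    the gradient term)."""
    modes = {}
    if reader.u(1):                          # enhanced_bcw_enable
        modes['fun_mode'] = reader.u(2)      # 0..2: which model formula
        if modes['fun_mode'] == 2:           # formula containing grad(x,y)
            modes['grad_mode'] = reader.u(2) # 0..3: which gradient method
    return modes
```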
Referring to fig. 16, fig. 16 is a flowchart illustrating an embodiment of a video decoding method according to the present application. It should be noted that if substantially the same results are obtained, the present embodiment is not limited to the flow sequence shown in fig. 16. In this embodiment, the video decoding method includes the following steps:

S401: determine a prediction block of the current block in the image based on any of the prediction methods described above.

S402: decode the current block based on the prediction block.
Referring to fig. 17, fig. 17 is a schematic structural diagram of an embodiment of an encoder of the present application. The encoder 10 includes a processor 12 configured to execute instructions to implement the prediction method and the video encoding method described above. The specific implementation process is described in the above embodiments and is not repeated here.

The processor 12 may also be referred to as a CPU (Central Processing Unit). The processor 12 may be an integrated circuit chip with signal processing capability, or a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The general-purpose processor may be a microprocessor, or the processor 12 may be any conventional processor.
Encoder 10 may further include a memory 11 for storing instructions and data necessary for processor 12 to operate.
The processor 12 is configured to execute instructions to implement the methods provided by any of the embodiments of the prediction method and video coding method of the present application and any non-conflicting combinations described above.
Referring to fig. 18, fig. 18 is a schematic structural diagram of an embodiment of a decoder of the present application. The decoder 20 includes a processor 22 configured to execute instructions to implement the prediction method and the video decoding method described above. The specific implementation process is described in the above embodiments and is not repeated here.

The processor 22 may also be referred to as a CPU (Central Processing Unit). The processor 22 may be an integrated circuit chip with signal processing capability, or a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The general-purpose processor may be a microprocessor, or the processor 22 may be any conventional processor.
Decoder 20 may further include a memory 21 for storing instructions and data required for processor 22 to operate.
The processor 22 is configured to execute instructions to implement the methods provided by any of the embodiments of the prediction method and video decoding method of the present application and any non-conflicting combinations described above.
Referring to fig. 19, fig. 19 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application. The computer-readable storage medium 30 of the embodiments of the present application stores instructions/program data 31 which, when executed, implement the methods provided by any embodiment of the prediction method, video decoding method and video encoding method of the present application, and any non-conflicting combination thereof. In an embodiment, the instructions/program data 31 may form a program file stored in the storage medium 30 in the form of a software product, so that a computer device (which may be a personal computer, a server, a network device, etc.) or a processor performs all or part of the steps of the methods of the embodiments of the present application. The storage medium 30 includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media capable of storing program code, or a terminal device such as a computer, server, mobile phone or tablet.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article or apparatus that includes the element.
The foregoing is merely embodiments of the present application and does not limit the patent scope of the present application; any equivalent structures or equivalent process transformations made using the contents of the specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, are likewise included within the patent protection scope of the present application.

Claims (18)

1. A method of bi-directional inter prediction, the method comprising:
determining a first reference block and a second reference block of the current block;
determining a current template, a first reference template and a second reference template, wherein the current template is a pixel area adjacent to the current block, and the position relation between the current template and the current block, the position relation between the first reference template and the first reference block and the position relation between the second reference template and the second reference block are the same;
utilizing pixels in the current template, substituting the pixels in the first reference template and the second reference template into an initial model to solve parameters of the initial model, and obtaining a prediction model;
and predicting the current block by adopting the first reference block, the second reference block and the prediction model to obtain a prediction block of the current block.
2. The bi-directional inter prediction method according to claim 1, wherein predicting the current block using the first reference block, the second reference block, and the prediction model to obtain the predicted block of the current block comprises:
and weighting a first prediction block and a second prediction block through the prediction model to obtain a prediction block of the current block, wherein the first prediction block is obtained by performing motion compensation on the first reference block, and the second prediction block is obtained by performing motion compensation on the second reference block.
3. The bi-directional inter prediction method according to claim 2, wherein said weighting the first prediction block and the second prediction block by the prediction model to obtain the prediction block of the current block comprises:
and weighting each pixel in the first prediction block and the corresponding pixel in the second prediction block through the prediction model, and/or calculating the gradient of each pixel in the first prediction block and/or the gradient of the corresponding pixel in the second prediction block, and/or calculating the position information of the corresponding pixel in the current block to obtain the predicted value of the corresponding pixel in the current block so as to obtain the prediction block.
4. The method for bi-directional inter prediction as recited in claim 3, wherein,
the predicted value of each pixel in the current block is equal to the sum value or the sum value is truncated to the value after the pixel value is in the value range;
wherein the sum value is equal to the sum of a weighting value, a gradient operation value, a position information operation value and/or a constant; the weighted value is a value obtained by weighting each pixel in the first prediction block and a corresponding pixel in the second prediction block, the gradient operation value is a value obtained by operating a gradient of each pixel in the first prediction block and/or a gradient of a corresponding pixel in the second prediction block, and the position operation value is a value obtained by operating position information of a corresponding pixel in the current block.
5. The bi-directional inter prediction method according to claim 4, wherein the constant is a value calculated from a bit depth of an image to which the current block belongs, or a set value.
6. The bi-directional inter-prediction method of claim 3, wherein the operating on the gradient of each pixel in the first prediction block and/or the gradient of the corresponding pixel in the second prediction block comprises:
fusing the gradient of each pixel in the first prediction block and the gradient of the corresponding pixel in the second prediction block to obtain a fused gradient;
performing linear operation on the fusion gradient;
the gradient of each pixel in the first prediction block is obtained by using the gradient value of each pixel in the first prediction block in at least one direction, and the gradient of the corresponding pixel in the second prediction block is obtained by using the gradient value of the corresponding pixel in the second prediction block in at least one direction.
7. The method of bi-directional inter prediction as recited in claim 1, wherein,
the sum of all parameters in the prediction model is 1.
8. The method of bi-directional inter prediction as recited in claim 1, wherein,
the method for solving the parameters of the initial model by utilizing the pixels in the current template and substituting the pixels in the first reference template and the second reference template into the initial model comprises the following steps:
calculating a parameter optimal solution for the initial model by taking the minimum fusion value of the difference values of all pixel positions as an optimization target, wherein the difference value of each pixel position is equal to the difference between the predicted value of that pixel position and the value of that pixel position in the current template, and the predicted value of each pixel position is obtained by substituting into the initial model the values obtained by motion-compensating the pixel values of that pixel position in the first reference template and the second reference template; and/or,

substituting the pixel values in the first prediction template, the pixel values in the second prediction template and the pixel values in the current template into the initial model to construct a plurality of equations, and solving the parameters in the initial model based on the equations, wherein the first prediction template is obtained by performing motion compensation on the first reference template, and the second prediction template is obtained by performing motion compensation on the second reference template.
9. The method of bi-directional inter prediction as recited in claim 1, wherein,
the obtaining a prediction model by using pixels in the current template and substituting the pixels in the first reference template and the second reference template into an initial model to solve parameters of the initial model includes:
utilizing pixels of a public area in the current template, substituting the pixels of the public area in the first reference template and the second reference template into an initial model to solve parameters of the initial model, and obtaining the prediction model;
the common area is an intersection of an area capable of determining a pixel value in the current template, an area capable of determining a pixel value in the first reference template and an area capable of determining a pixel value in the second reference template.
10. The bi-directional inter-prediction method of claim 1, wherein the determining the current template, the first reference template, and the second reference template comprises: determining a plurality of current templates, a plurality of first reference templates and a plurality of second reference templates, wherein the plurality of current templates, the plurality of first reference templates and the plurality of second reference templates are all in one-to-one correspondence;
the obtaining a prediction model by using pixels in the current template and substituting the pixels in the first reference template and the second reference template into an initial model to solve parameters of the initial model includes: utilizing pixels in each current template, substituting the pixels in the first reference template and the second reference template corresponding to each current template into the initial model to solve parameters of the initial model, and obtaining a prediction model corresponding to each current template;
the predicting the current block by using the first reference block, the second reference block and the prediction model to obtain a prediction block of the current block includes: predicting the current block by adopting the first reference block, the second reference block and a prediction model corresponding to each current template to obtain a prediction block corresponding to each current template; and determining the prediction block of the current block based on the prediction blocks corresponding to the plurality of current templates.
11. The method of bi-directional inter prediction according to claim 1, wherein the number of the initial models is a plurality,
the obtaining a prediction model by using pixels in the current template and substituting the pixels in the first reference template and the second reference template into an initial model to solve parameters of the initial model includes: utilizing pixels in the current template, substituting the pixels in the first reference template and the second reference template into each initial model to solve parameters of each initial model, and obtaining a prediction model corresponding to each initial model;
the predicting the current block by using the first reference block, the second reference block and the prediction model to obtain a prediction block of the current block includes: predicting the current block by adopting the first reference block, the second reference block and the prediction model corresponding to each initial model to obtain the prediction block corresponding to each initial model; determining a prediction block of the current block based on the prediction blocks corresponding to the plurality of initial models;
wherein the formula and/or gradient calculation method of different initial models are different.
12. A method of video encoding, the method comprising:
Determining a prediction block of a current block in an image based on the prediction method of any one of claims 1-11;
the current block is encoded based on the prediction block.
13. The video coding method of claim 12, wherein the encoding the current block based on the prediction block comprises:
and setting values of preset syntax elements in the coded code stream, wherein different values of the preset syntax elements represent whether the prediction method is started or not.
14. The video coding method of claim 12, wherein the encoding the current block based on the prediction block comprises:
and coding the current template index and the initial model index corresponding to the prediction block to obtain a coded code stream.
15. A method of video decoding, the method comprising:
determining a prediction block of a current block in an image based on the prediction method of any one of claims 1-11;
the current block is decoded based on the prediction block.
16. An encoder, the encoder comprising a processor; the processor is configured to execute instructions to implement the steps of the method according to any one of claims 1-14.
17. A decoder, the decoder comprising a processor; the processor is configured to execute instructions to implement the steps of the method according to any one of claims 1-11 and 15.
18. A computer readable storage medium having stored thereon a program and/or instructions, which when executed, implement the steps of the method of any of claims 1-15.