WO2022067805A1 - Image prediction method, encoder, decoder, and computer storage medium - Google Patents

Image prediction method, encoder, decoder, and computer storage medium

Info

Publication number
WO2022067805A1
WO2022067805A1 · PCT/CN2020/119731
Authority
WO
WIPO (PCT)
Prior art keywords
current block
value
image component
reference image
predicted
Application number
PCT/CN2020/119731
Other languages
English (en)
French (fr)
Inventor
马彦卓
邱瑞鹏
霍俊彦
万帅
杨付正
Original Assignee
Oppo广东移动通信有限公司
Application filed by Oppo广东移动通信有限公司
Priority to EP20955825.3A (EP4224842A4)
Priority to CN202080105520.2A (CN116472707A)
Priority to PCT/CN2020/119731 (WO2022067805A1)
Publication of WO2022067805A1
Priority to US18/126,696 (US20230262251A1)


Classifications

    • H04N19/176: Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/117: Filters, e.g. for pre-processing or post-processing
    • H04N19/147: Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/186: Adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component
    • H04N19/50: Coding using predictive coding
    • H04N19/51: Motion estimation or motion compensation
    • H04N19/593: Predictive coding involving spatial prediction techniques
    • H04N19/11: Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/048: Activation functions
    • G06N3/088: Non-supervised learning, e.g. competitive learning

Definitions

  • the present application relates to the technical field of video coding and decoding, and in particular, to an image prediction method, an encoder, a decoder, and a computer storage medium.
  • H.265/High Efficiency Video Coding (HEVC) can no longer meet the needs of rapidly developing video applications.
  • To this end, the Joint Video Exploration Team (JVET) has developed the next-generation standard H.266/Versatile Video Coding (VVC), whose reference software is the VVC Test Model (VTM).
  • In H.266/VVC, a variety of prediction techniques have been developed by considering the correlation of video data in space, in time, and across components.
  • However, the existing prediction algorithms do not fully exploit the correlation between different components, which reduces the accuracy of cross-component prediction and in turn reduces compression coding efficiency.
  • CCLM: Cross-Component Linear Model.
  • The present application provides an image prediction method, an encoder, a decoder, and a computer storage medium, which can enhance the predicted value of an image component so that the enhanced target predicted value is closer to the real value, thereby effectively improving the prediction accuracy and, in turn, the encoding and decoding efficiency.
  • In a first aspect, an embodiment of the present application provides an image prediction method, which is applied to an encoder, and the method includes: determining the initial prediction value of the to-be-predicted image component of the current block; determining the sample value related to the reference image component of the current block; determining the side information of the current block according to the sample value related to the reference image component; filtering the initial predicted value by using a preset network model and the side information of the current block to obtain the target predicted value of the to-be-predicted image component of the current block; and encoding the to-be-predicted image component of the current block according to the target predicted value.
  • In a second aspect, an embodiment of the present application provides an image prediction method, which is applied to a decoder, and the method includes: parsing the code stream to determine the target prediction mode of the current block; determining, according to the target prediction mode, the initial prediction value of the to-be-predicted image component of the current block; determining the sample value related to the reference image component of the current block; determining the side information of the current block according to that sample value; filtering the initial predicted value by using a preset network model and the side information of the current block to obtain the target predicted value of the to-be-predicted image component of the current block; and decoding the to-be-predicted image component of the current block according to the target predicted value.
  • In a third aspect, an embodiment of the present application provides an encoder, where the encoder includes a first determining unit, a first prediction unit, and an encoding unit; wherein,
  • the first determining unit is configured to determine the initial prediction value of the to-be-predicted image component of the current block;
  • the first determining unit is further configured to determine the sample value related to the reference image component of the current block; and determine the side information of the current block according to the sample value related to the reference image component;
  • the first prediction unit is configured to filter the initial predicted value by using a preset network model and side information of the current block to obtain a target predicted value of the image component to be predicted of the current block;
  • the encoding unit is configured to encode the to-be-predicted image component of the current block according to the target prediction value.
  • In a fourth aspect, an embodiment of the present application provides an encoder, where the encoder includes a first memory and a first processor; wherein,
  • the first memory is configured to store a computer program executable on the first processor;
  • the first processor is configured to execute the method according to the first aspect when running the computer program.
  • In a fifth aspect, an embodiment of the present application provides a decoder, where the decoder includes a parsing unit, a second determining unit, a second prediction unit, and a decoding unit; wherein,
  • the parsing unit is configured to parse the code stream to determine the target prediction mode of the current block;
  • the second determining unit is configured to determine the initial prediction value of the image component to be predicted of the current block according to the target prediction mode;
  • the second determining unit is further configured to determine the sample value related to the reference image component of the current block; and determine the side information of the current block according to the sample value related to the reference image component;
  • the second prediction unit is configured to use a preset network model and side information of the current block to filter the initial predicted value to obtain the target predicted value of the image component to be predicted of the current block;
  • the decoding unit is configured to decode the to-be-predicted image component of the current block according to the target prediction value.
  • In a sixth aspect, an embodiment of the present application provides a decoder, where the decoder includes a second memory and a second processor; wherein,
  • the second memory is configured to store a computer program executable on the second processor;
  • the second processor is configured to execute the method according to the second aspect when running the computer program.
  • In a seventh aspect, an embodiment of the present application provides a computer storage medium, where the computer storage medium stores an image prediction program; when the image prediction program is executed by the first processor, the method described in the first aspect is implemented, and when it is executed by the second processor, the method described in the second aspect is implemented.
  • Embodiments of the present application provide an image prediction method, an encoder, a decoder, and a computer storage medium.
  • On the encoder side, an initial prediction value of the image component to be predicted of the current block is determined; sample values related to the reference image component of the current block are determined; the side information of the current block is determined according to the sample values related to the reference image component; the initial predicted value is filtered by using a preset network model and the side information of the current block to obtain the target predicted value of the to-be-predicted image component of the current block; and the to-be-predicted image component of the current block is encoded according to the target predicted value.
  • On the decoder side, the code stream is parsed to obtain the target prediction mode of the current block; the initial prediction value of the to-be-predicted image component of the current block is determined according to the target prediction mode; the sample value related to the reference image component of the current block is determined; the side information of the current block is determined according to that sample value; the initial predicted value is filtered by using a preset network model and the side information of the current block to obtain the target predicted value of the to-be-predicted image component of the current block; and the to-be-predicted image component of the current block is decoded according to the target predicted value.
  • In this way, the correlation between image components can be exploited to enhance the initial predicted value according to the side information of the current block and the preset network model, so that the enhanced target predicted value is closer to the real value; this effectively improves the prediction accuracy, improves the encoding and decoding efficiency, and at the same time improves the overall encoding and decoding performance.
  • FIG. 1 is a flowchart of an MIP prediction process provided by a related art scheme;
  • FIG. 2A is a block diagram of the composition of a video coding system provided by an embodiment of the application;
  • FIG. 2B is a block diagram of the composition of a video decoding system provided by an embodiment of the application;
  • FIG. 3 is a schematic flowchart of an image prediction method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an application scenario of an image prediction method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an application scenario of another image prediction method provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a network structure of a preset network model provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a network structure of a residual layer provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a network structure of another preset network model provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a network structure of another preset network model provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a network structure of still another preset network model provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of an application scenario of another image prediction method provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of an application scenario of still another image prediction method provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of an application scenario of still another image prediction method provided by an embodiment of the present application.
  • FIG. 14 is a schematic diagram of an application scenario of still another image prediction method provided by an embodiment of the present application.
  • FIG. 15 is a schematic diagram of an application scenario of still another image prediction method provided by an embodiment of the present application;
  • FIG. 16 is a schematic diagram of an application scenario of still another image prediction method provided by an embodiment of the present application;
  • FIG. 17 is a schematic diagram of an application scenario of still another image prediction method provided by an embodiment of the present application;
  • FIG. 19 is a schematic diagram of the composition and structure of an encoder provided by an embodiment of the application;
  • FIG. 20 is a schematic diagram of a specific hardware structure of an encoder provided by an embodiment of the application;
  • FIG. 21 is a schematic diagram of the composition and structure of a decoder provided by an embodiment of the present application;
  • FIG. 22 is a schematic diagram of a specific hardware structure of a decoder provided by an embodiment of the present application.
  • a first image component, a second image component, and a third image component are generally used to represent a coding block (Coding Block, CB).
  • the three image components are respectively a luminance component, a blue chrominance component and a red chrominance component.
  • the luminance component is usually represented by the symbol Y
  • the blue chrominance component is usually represented by the symbol Cb or U
  • the red chrominance component is usually represented by the symbol Cr or V; in this way, the video image can be represented in the YCbCr format or in the YUV format.
  • the first image component may be a luminance component
  • the second image component may be a blue chrominance component
  • the third image component may be a red chrominance component
  • For intra prediction, the current pixel value is predicted using already encoded and reconstructed pixels in the regions above and to the left of the current block as references. The intra prediction modes include, but are not limited to, the following:
  • Planar (PLANAR) mode: mainly used in areas where the image texture is relatively smooth and changes gradually; the reference pixels on the four adjacent boundaries of the pixel to be predicted in the current block are used for linear interpolation, and the results are summed and averaged (according to position) to obtain the predicted value of the current pixel.
  • DC mode: mainly used in flat areas with smooth texture and no gradient; the average value of all reference pixels in the row above and the column to the left is used as the predicted value of the pixels in the current block.
  • Angular modes: VVC adopts finer intra prediction directions, expanding the 33 angular prediction modes in HEVC to 65; the reference pixels in the row above and the column to the left are projected along the prediction angle to obtain the predicted value of the current pixel.
  • Position Dependent intra Prediction Combination (PDPC): a technique for modifying the predicted value; after intra prediction is performed in certain intra modes, a PDPC weighted-average calculation is applied to obtain the final predicted value.
  • WAIP: Wide-Angle Intra Prediction.
  • Multiple Reference Line (MRL) intra prediction: uses one of several selectable reference lines (adjacent rows above and adjacent columns to the left) to perform intra prediction.
  • Matrix-weighted Intra Prediction (MIP): the input taken from the row above and the column to the left is multiplied by a matrix trained offline via deep learning to obtain the predicted value of the current block.
  • For inter prediction, the pixel value of the current coding block is predicted in combination with motion information, including but not limited to the following inter prediction modes:
  • (a) Merge mode: the Merge mode in H.266/VVC creates a motion vector (Motion Vector, MV) candidate list for the current CU containing six motion information candidates; the six candidates are traversed, the rate-distortion cost of each is calculated, and the candidate with the smallest rate-distortion cost is finally selected as the optimal motion information of the Merge mode.
  • CIIP: Combined Inter and Intra Prediction.
  • (d) Geometric partitioning mode (GPM): a partitioning mode with shapes other than squares and rectangles. GPM quantizes 360° at unequal intervals into 24 angles, each angle with at most 4 offsets, giving a total of 64 GPM partitioning modes. The inter block is divided into two non-rectangular sub-partitions that are predicted unidirectionally and then fused by weighting to obtain the prediction values, so as to represent inter prediction data more flexibly, reduce prediction errors, and improve coding efficiency.
  • Block-based Affine Motion Compensated Prediction (Affine): the affine motion field of the block is described by the motion information of two control-point motion vectors (4 parameters) or three control-point motion vectors (6 parameters).
  • Subblock-based Temporal Motion Vector Prediction (SbTMVP): uses the motion field in the co-located image to improve the motion vector prediction and merge mode of CUs in the current image.
  • Bi-prediction with CU-level Weight (BCW): for the bi-prediction mode in VVC, a weighted average of the two prediction signals is allowed.
  • When encoding the chroma components of a CU, the luma component has already been encoded and its luma reconstruction values obtained; the luma reconstruction values of the same area can therefore be used to predict the chroma pixel values. This technique is called CCLM chroma prediction coding.
  • The input of the CCLM prediction process is the luma reconstruction value of the current block together with the adjacent reference pixels above and to the left; the output of the CCLM prediction process is the chroma prediction value of the current block.
  • H.266/VVC proposes a coding technique for the chrominance component, namely the CCLM technique.
  • The CCLM technique obtains the predicted value of the chrominance component of the coding block by applying an established linear model to the luma reconstruction values of the CU, as shown in formula (1) below.
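  • Based on the standard CCLM formulation in VVC, formula (1) can be written as:

        Pred_C(i, j) = α · Rec_L(i, j) + β    (1)

    where: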
  • Pred_C(i, j) represents the chrominance prediction value of the coding block;
  • Rec_L(i, j) represents the (down-sampled) luminance reconstruction value at the same position in the coding block;
  • α and β represent the model parameters.
  • The prediction process of the CCLM technique can be summarized in four steps: (1) determine the adjacent reference pixel range and availability for luminance and chrominance according to the CCLM prediction mode, and select the adjacent reference pixels used for the subsequent linear model derivation; (2) since, in the 4:2:0 color component sampling format, the chrominance component is half of the luminance component in both the horizontal and the vertical direction, the luminance block must be down-sampled so that the luminance pixels of the current CU correspond one-to-one to the chrominance pixels; (3) group the selected adjacent reference pixels and derive the linear model parameters to obtain the linear model; (4) calculate the chrominance prediction value according to the obtained linear model.
  • The model parameters (α and β) are derived as follows:
  • the CCLM mode can include a total of 3 modes, namely: LM mode, LM_T mode and LM_L mode.
  • the main difference between these three modes is the selection of adjacent reference pixel ranges.
  • W and H denote the width and height of the current chroma block;
  • Ref_top and Ref_left denote the number of adjacent reference pixels above the chroma block and the number of adjacent reference pixels to its left, respectively;
  • numLeftBelow and numTopRight denote the number of available adjacent reference pixels on the lower-left side and on the upper-right side of the current chroma block, respectively.
  • the positions of the four luminance sample points in LM mode are:
  • the positions of the four luminance sample points in LM_T mode are:
  • the positions of the four luminance sample points in LM_L mode are:
  • The four selected luminance sample points are down-sampled and then compared four times to find the two smallest points (denoted x0A and x1A) and the two largest points (denoted x0B and x1B); the corresponding chrominance sample points are denoted y0A and y1A, and y0B and y1B, respectively.
  • In the model derivation diagram, the horizontal axis (i.e., the X-axis) represents luminance (Luma) and the vertical axis (i.e., the Y-axis) represents chrominance (Chroma); the two points filled with black are the two smallest points, and the two points filled with white are the two largest points.
  • The grid-filled point between the two black-filled points, denoted (Xa, Ya), represents the mean luminance and mean chrominance of the two smallest points; the grid-filled point between the two white-filled points, denoted (Xb, Yb), represents the mean luminance and mean chrominance of the two largest points.
  • Xa, Ya, Xb and Yb are calculated as follows,
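  • Following the standard VVC CCLM derivation (the equation numbering is inferred from context):

        Xa = (x0A + x1A + 1) >> 1,  Ya = (y0A + y1A + 1) >> 1    (2)
        Xb = (x0B + x1B + 1) >> 1,  Yb = (y0B + y1B + 1) >> 1    (3)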
  • model parameters can be derived from Xa, Ya, Xb and Yb.
  • The derivation of the model parameter α is shown in formula (4), and the derivation of the model parameter β is shown in formula (5):
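  • In the standard derivation, with (Xa, Ya) the mean of the two smallest points and (Xb, Yb) the mean of the two largest points:

        α = (Yb - Ya) / (Xb - Xa)    (4)
        β = Ya - α · Xa              (5)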
  • the chrominance prediction value of the current block can be finally calculated according to formula (1).
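  • To make the above CCLM process concrete, the following is a minimal Python sketch of the parameter derivation and prediction, under the simplifying assumptions that the four reference luma/chroma sample pairs have already been selected and the luma block has already been down-sampled; all function and variable names are illustrative, not part of the standard:

        def cclm_predict(luma_rec, ref_luma, ref_chroma):
            # luma_rec: down-sampled luma reconstruction block (2-D list)
            # ref_luma / ref_chroma: the 4 selected reference sample pairs
            pairs = sorted(zip(ref_luma, ref_chroma))        # sort by luma value
            (x0a, y0a), (x1a, y1a) = pairs[0], pairs[1]      # two smallest points
            (x0b, y0b), (x1b, y1b) = pairs[2], pairs[3]      # two largest points
            xa, ya = (x0a + x1a + 1) >> 1, (y0a + y1a + 1) >> 1   # mean of min pair
            xb, yb = (x0b + x1b + 1) >> 1, (y0b + y1b + 1) >> 1   # mean of max pair
            alpha = (yb - ya) / (xb - xa) if xb != xa else 0.0    # formula (4)
            beta = ya - alpha * xa                                # formula (5)
            # formula (1): Pred_C(i, j) = alpha * Rec_L(i, j) + beta
            return [[alpha * v + beta for v in row] for row in luma_rec]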
  • As noted above, the existing prediction algorithms do not fully exploit the correlation between different components, which reduces the accuracy of cross-component prediction and in turn reduces compression coding efficiency.
  • the embodiments of the present application provide an image prediction method, which can be applied to both an encoder and a decoder.
  • The basic idea of the method is: after determining the initial prediction value of the image component to be predicted of the current block, determine the sample value related to the reference image component of the current block; then determine the side information of the current block according to the sample value related to the reference image component; then use the preset network model and the side information of the current block to filter the initial predicted value to obtain the target predicted value of the to-be-predicted image component of the current block; finally, encode or decode the to-be-predicted image component of the current block according to the target predicted value.
  • In this way, the correlation between image components can be exploited to enhance the initial predicted value according to the side information of the current block and the preset network model, so that the enhanced target predicted value is closer to the real value; this effectively improves the prediction accuracy and the encoding and decoding efficiency, and at the same time improves the overall encoding and decoding performance.
  • The video coding system 10 includes a transform and quantization unit 101, an intra estimation unit 102, an intra prediction unit 103, a motion compensation unit 104, a motion estimation unit 105, an inverse transform and inverse quantization unit 106, a filter control analysis unit 107, a filtering unit 108, an encoding unit 109, a decoded image buffer unit 110, and so on; the filtering unit 108 can implement deblocking filtering, Sample Adaptive Offset (SAO) filtering, and Adaptive Loop Filtering (ALF), and the encoding unit 109 can implement header information encoding and Context-based Adaptive Binary Arithmetic Coding (CABAC).
  • For the input original video signal, a video coding block is obtained by partitioning a Coding Tree Unit (CTU); the transform and quantization unit 101 then transforms the residual pixel information obtained after intra or inter prediction, including transforming the residual information from the pixel domain to the transform domain, and quantizes the resulting transform coefficients to further reduce the bit rate;
  • the intra estimation unit 102 and the intra prediction unit 103 are used to perform intra prediction on the video coding block; specifically, the intra estimation unit 102 and the intra prediction unit 103 are used to determine the intra prediction mode to be used to encode the video coding block;
  • the motion compensation unit 104 and the motion estimation unit 105 are used to perform inter-prediction encoding of the received video coding block relative to one or more blocks in one or more reference frames to provide temporal prediction information; the motion estimation performed by the motion estimation unit 105 generates motion vectors that estimate the motion of the video coding block, and the motion compensation unit 104 then performs motion compensation based on the motion vector determined by the motion estimation unit 105; after determining the intra prediction mode, the intra prediction unit 103 is also used to provide the selected intra prediction data to the encoding unit 109, and the motion estimation unit 105 also sends the calculated motion vector data to the encoding unit 109; in addition, the inverse transform and inverse quantization unit 106 is used for reconstruction of the video coding block: the residual block is reconstructed in the pixel domain, blocking artifacts of the reconstructed residual block are removed by the filter control analysis unit 107 and the filtering unit 108, and the reconstructed residual block is then added to a predictive block in the frame of the decoded image buffer unit 110 to generate a reconstructed video coding block; the encoding unit 109 is used to encode various coding parameters and quantized transform coefficients.
  • The context content can be based on adjacent coding blocks and can be used to encode information indicating the determined intra prediction mode, outputting the code stream of the video signal; the decoded image buffer unit 110 is used to store the reconstructed video coding blocks for prediction reference. As video image coding proceeds, new reconstructed video coding blocks are continuously generated, and all of these reconstructed video coding blocks are stored in the decoded image buffer unit 110.
  • Referring to FIG. 2B, it shows an example of a block diagram of a video decoding system provided by an embodiment of the present application; as shown in FIG. 2B, the video decoding system 20 includes a decoding unit 201, an inverse transform and inverse quantization unit 202, an intra prediction unit 203, a motion compensation unit 204, a filtering unit 205, a decoded image buffer unit 206, and so on, wherein the decoding unit 201 can implement header information decoding and CABAC decoding, and the filtering unit 205 can implement deblocking filtering, SAO filtering, and ALF filtering.
  • After the video signal is encoded as in FIG. 2A, the code stream of the video signal is output; the code stream is input into the video decoding system 20 and first passes through the decoding unit 201 to obtain the decoded transform coefficients; the inverse transform and inverse quantization unit 202 processes the transform coefficients to generate residual blocks in the pixel domain; the intra prediction unit 203 may be used to generate prediction data for the current video decoding block based on the determined intra prediction mode and data from previously decoded blocks of the current frame or picture; the motion compensation unit 204 determines prediction information for the video decoding block by parsing the motion vector and other associated syntax elements, and uses the prediction information to generate the predictive block for the video decoding block being decoded; a decoded video block is formed by summing the residual block from the inverse transform and inverse quantization unit 202 and the corresponding predictive block produced by the intra prediction unit 203 or the motion compensation unit 204; the decoded video signal may be passed through the filtering unit 205 to remove blocking artifacts and improve video quality; the decoded video blocks are then stored in the decoded image buffer unit 206, which stores reference images used for subsequent prediction.
  • the initial prediction value is enhanced after each prediction technique is performed.
  • Note that the image prediction method in this embodiment of the present application may be applied to a video coding system; that is, the image prediction method may be applied after the prediction part of FIG. 2A (the black bold block in FIG. 2A), or before or after the in-loop filtering part (the gray bold block in FIG. 2A).
  • The image prediction method in this embodiment of the present application may also be applied to a video decoding system; that is, the image prediction method may be applied after the prediction part in FIG. 2B, or before or after the in-loop filtering part (the gray bold block in FIG. 2B).
  • In other words, the image prediction method in the embodiment of the present application can be applied to a video encoding system, to a video decoding system, or even to both at the same time, which is not specifically limited in the embodiments of the present application.
  • the image prediction method provided by the embodiment of the present application is applied to a video encoding device, that is, an encoder.
  • the functions implemented by the method can be implemented by the first processor in the encoder calling a computer program, and of course the computer program can be stored in the first memory.
  • the encoder includes at least a first processor and a first memory.
  • FIG. 3 shows a schematic flowchart of an image prediction method provided by an embodiment of the present application. As shown in Figure 3, the method may include:
  • S301 Determine the initial prediction value of the image component to be predicted of the current block.
  • each image block to be encoded currently may be called a coding block (Coding Block, CB).
  • Each encoding block may include a first image component, a second image component, and a third image component; the current block is the coding block in the video image whose first image component, second image component, or third image component is currently to be predicted.
  • Assuming that the current block performs first image component prediction and the first image component is a luminance component, that is, the image component to be predicted is the luminance component, the current block may also be called a luminance block; or, assuming that the current block performs second image component prediction and the second image component is a chrominance component, that is, the image component to be predicted is the chrominance component, the current block may also be called a chrominance block.
  • the determining the initial prediction value corresponding to the to-be-predicted image component of the current block may include:
  • the target prediction mode is used to indicate the prediction mode adopted for the coding prediction of the current block.
  • In determining the target prediction mode, a simple decision strategy may be adopted, such as deciding according to the magnitude of the distortion value, or a complex decision strategy may be adopted, such as deciding according to the cost result of Rate Distortion Optimization (RDO), which is not limited in the embodiments of the present application.
  • the RDO method may be used to determine the target prediction mode of the current block.
  • the determining the target prediction mode of the current block may include:
  • Specifically, multiple candidate prediction modes may be used to respectively precode the image component to be predicted of the current block, producing a rate-distortion cost result for each candidate prediction mode; the optimal rate-distortion cost result is then selected from the obtained rate-distortion cost results, and the candidate prediction mode corresponding to the optimal rate-distortion cost result is determined as the target prediction mode of the current block.
  • candidate prediction modes generally include intra prediction modes, inter prediction modes, and inter-component prediction modes.
  • The intra prediction modes may include PLANAR mode, DC mode, angular prediction modes, PDPC mode, WAIP mode, MRL mode, MIP mode, and the like;
  • the inter prediction modes may include Merge mode, MMVD mode, CIIP mode, GPM mode, SbTMVP mode, BCW mode, and the like;
  • the inter-component prediction modes may include intra-image inter-component prediction modes and cross-image inter-component prediction modes.
  • the target prediction mode may be an intra prediction mode or an inter prediction mode.
  • the target prediction mode may also be an intra-picture component inter-prediction mode or a cross-picture component inter-prediction mode.
  • the inter-component prediction mode usually refers to an intra-image inter-component prediction mode, such as the CCLM mode.
  • In one case, after precoding with each candidate prediction mode, the rate-distortion cost result corresponding to that candidate prediction mode is obtained; the optimal rate-distortion cost result is then selected, and the candidate prediction mode corresponding to the optimal rate-distortion cost result is determined as the target prediction mode of the current block.
  • In another case, after precoding with each candidate prediction mode, the distortion value corresponding to that candidate prediction mode is obtained; the minimum distortion value is then selected from the obtained distortion values, and the candidate prediction mode corresponding to the minimum distortion value is determined as the target prediction mode of the current block. In this way, encoding the current block with the determined target prediction mode can make the prediction residual smaller and improve the encoding efficiency; a sketch of this mode decision follows below.
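  • A minimal Python sketch of this mode decision, where the cost function is a stand-in for whichever measure the encoder uses (e.g. D + λ·R for full RDO, or the plain distortion D) and all names are illustrative:

        def select_target_mode(block, candidate_modes, cost_of):
            # Precode the to-be-predicted image component of the current block
            # with every candidate prediction mode and keep the mode whose
            # rate-distortion cost (or distortion value) is smallest.
            best_mode, best_cost = None, float("inf")
            for mode in candidate_modes:
                cost = cost_of(block, mode)      # D + lambda * R, or just D
                if cost < best_cost:
                    best_mode, best_cost = mode, cost
            return best_mode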
  • S302 Determine a sample value related to a reference image component of the current block.
  • the reference image component may include one or more image components in the current image that are different from the image components to be predicted, where the current image is the image where the current block is located.
  • the current image here means that different image components in the current image can be used as the input for filtering enhancement in the embodiment of the present application.
  • the determining a sample value related to the reference image component of the current block may include:
  • a sample value associated with the reference image component of the current block is determined based on at least one of a predicted value and a reconstructed value of the reference image component of the current block.
  • The predicted value of the reference image component here can be interpreted as "the predicted value determined according to the prediction mode of the reference image component of the current block", and the reconstructed value of the reference image component here can be interpreted as "the reconstructed value obtained through encoding and reconstruction after the predicted value is obtained according to the prediction mode of the reference image component of the current block". That is to say, the sample value related to the reference image component may be determined based on the predicted value of the reference image component of the current block, on the reconstructed value of the reference image component of the current block, or on both the predicted value and the reconstructed value.
  • the determining a sample value related to the reference image component of the current block may include:
  • a sample value associated with the reference image component is determined according to at least one of a predicted value and a reconstructed value of the reference image component corresponding to adjacent pixels of the current block.
  • Here, the predicted value of the reference image component can be interpreted as "the predicted value determined according to the prediction mode of the image block, corresponding to the reference image component, in which the adjacent pixel is located", and the reconstructed value of the reference image component can be interpreted as "the reconstructed value obtained through encoding and reconstruction after the predicted value is obtained according to the prediction mode of that image block". That is to say, the sample value related to the reference image component may be determined from the predicted value of the reference image component corresponding to the adjacent pixels of the current block, from the reconstructed value of the reference image component corresponding to those adjacent pixels, or from both the predicted value and the reconstructed value.
  • the adjacent pixels of the current block may include at least one row of pixels adjacent to the current block.
  • the adjacent pixels of the current block may also include at least one column of pixels adjacent to the current block.
  • In some embodiments, the sample value related to the reference image component of the current block may include at least one of the following: the reconstructed value of the reference image component of the current block, the image component value corresponding to at least one row of pixels adjacent to the current block, and the image component value corresponding to at least one column of pixels adjacent to the current block.
  • the reference image component is different from the predicted image component, and the image component value may include the predicted image component reference value and/or the reference image component reference value.
  • the reference value here can be a predicted value or a reconstructed value.
  • Alternatively, the sample values related to the reference image component of the current block may include at least two of the following: the reconstructed value of the reference image component of the current block, the reference image component value corresponding to at least one row of pixels adjacent to the current block, and the reference image component value corresponding to at least one column of pixels adjacent to the current block, wherein the reference image component of the current block is different from the to-be-predicted image component.
  • the reference picture components may include one or more picture components in a reference picture, wherein the reference picture is different from the current picture in which the current block is located.
  • The reference image here is different from the current image where the current block is located; "reference image" means that different image components in the reference image can be used as the input for filtering enhancement in the embodiment of the present application.
  • the determining a sample value related to the reference image component of the current block may include:
  • the samples associated with the reference picture components are determined from the reconstructed values of one or more picture components in the prediction reference block of the current block.
  • the method may also include:
  • an inter prediction mode parameter of the current block is determined, wherein the inter prediction mode parameter includes a reference image index corresponding to the reference image and a motion vector indicating the prediction reference block in the reference image;
  • the reference image index refers to the image index sequence number corresponding to the reference image, and the motion vector is used to indicate the prediction reference block in the reference image.
  • the reference image index and the motion vector can be used as inter-frame prediction mode parameters and written into the code stream for transmission from the encoder to the decoder.
  • the side information of the current block can be determined according to the samples related to the reference image components.
  • S303 Determine the side information of the current block according to the samples related to the reference image components.
  • the technical key of the present application is to use the relevant parameters of one image component to perform enhancement filtering on the initial predicted value of another image component.
  • the relevant parameters of "one image component” are mainly the side information of the current block.
  • On the one hand, the sample value related to the reference image component of the current block can be directly determined as the side information of the current block; on the other hand, the sample value related to the reference image component can also be filtered first, and the filtered sample value then determined as the side information of the current block; this is not limited in this embodiment of the present application.
  • the determining the side information of the current block according to the samples related to the reference image components may include: determining the samples related to the reference image components as Side information of the current block.
  • In other embodiments, the determining the side information of the current block according to the sample value related to the reference image component may include: performing a first filtering process on the sample value related to the reference image component according to the color component sampling format to obtain a filtered sample value related to the reference image component; and determining the filtered sample value related to the reference image component as the side information of the current block.
  • the color component may include a luminance component, a blue chrominance component, and a red chrominance component
  • the color component sampling formats may include a 4:4:4 format, a 4:2:2 format, and a 4:2:0 format.
  • the 4:4:4 format means that neither the blue chrominance nor the red chrominance component is down-sampled relative to the luminance component.
  • the 4:2:2 format represents a 2:1 horizontal downsampling of the blue chrominance component or the red chrominance component relative to the luma component, with no vertical downsampling.
  • the 4:2:0 format represents a 2:1 horizontal downsampling and a 2:1 vertical downsampling of the blue chrominance component or the red chrominance component with respect to the luma component. That is, the 4:2:2 format and the 4:2:0 format are suitable for the above-mentioned first filtering process, while the 4:4:4 format is not suitable for the above-mentioned first filtering process.
  • Further, in some embodiments, the method may also include: writing the color component sampling format into the code stream; or determining the value of a bit field used to indicate the color component sampling format and writing that value into the code stream, which is then transmitted by the encoder to the decoder, so that the decoder can directly obtain the color component sampling format after parsing the code stream.
  • the performing the first filtering process on the samples related to the reference image components may include:
  • the filtered samples associated with the reference image component are set equal to the samples associated with the reference image component.
  • the resolution of the sample value related to the filtered reference image component is equal to the resolution of the initial prediction value.
  • Specifically, if the image component to be predicted is a chrominance component and the reference image component is a luminance component, the samples related to the reference image component need to be down-sampled so that the resolution of the down-sampled luminance component is the same as the resolution of the chrominance component; or, if the image component to be predicted is a luminance component and the reference image component is a chrominance component, the samples related to the reference image component need to be up-sampled so that the resolution of the up-sampled chrominance component is the same as the resolution of the luminance component.
  • If the to-be-predicted image component is the blue chrominance component and the reference image component is the red chrominance component, the resolution of the blue chrominance component is already the same as the resolution of the red chrominance component, so no resampling is needed when the related samples are subjected to the first filtering process; that is, the filtered samples related to the reference image component may be set equal to the samples related to the reference image component before filtering. A sketch of such resampling follows below.
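  • A minimal Python sketch of this first filtering process for the 4:2:0 case, assuming a simple 2x2 mean for down-sampling and nearest-neighbour replication for up-sampling (the filters actually used by a codec may differ):

        def downsample_luma_420(luma):
            # 2:1 horizontal and 2:1 vertical down-sampling so the luma
            # resolution matches the 4:2:0 chroma resolution (2x2 mean).
            return [[(luma[2*i][2*j] + luma[2*i][2*j + 1] +
                      luma[2*i + 1][2*j] + luma[2*i + 1][2*j + 1] + 2) >> 2
                     for j in range(len(luma[0]) // 2)]
                    for i in range(len(luma) // 2)]

        def upsample_chroma_2x(chroma):
            # 2:1 up-sampling in both directions so the chroma resolution
            # matches the luma resolution (nearest-neighbour replication).
            return [[chroma[i // 2][j // 2]
                     for j in range(2 * len(chroma[0]))]
                    for i in range(2 * len(chroma))]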
  • the side information of the current block can be determined according to the samples related to the reference image components, so as to use the side information of the current block to filter and enhance the initial predicted value of the to-be-predicted image component.
  • S304 Filter the initial predicted value by using a preset network model and side information of the current block to obtain a target predicted value of the image component to be predicted of the current block.
  • To improve prediction accuracy, the embodiment of the present application uses a preset network model, which can be called a neural network model with a local data short-circuit, or a semi-residual network; combined with the side information of the current block, it is used to filter and enhance the initial predicted value.
  • It should be noted that the side information of the current block is not equivalent to the "side information" as commonly understood in the art; the side information in this embodiment of the present application mainly serves to filter and enhance the predicted value of "another image component" by using "one image component".
  • Here, the side information may be parameters related to "one or more image components"; these parameters may be parameters used to obtain the predicted value or reconstructed value of "one or more image components", or they may directly be "one or more image components" themselves.
  • the method may further include: determining a preset network model.
  • The preset network model is obtained through model training. In some embodiments, this may specifically include: acquiring a training sample set, where the training sample set includes one or more images; training an initial network model by using the training sample set; and determining the trained initial network model as the preset network model.
  • the training sample set may include one or more images.
  • The training sample set can be a set of training samples stored locally by the encoder, a set of training samples obtained from a remote server according to link or address information, or even a set of decoded image samples in the video, which is not specifically limited in this embodiment of the present application.
  • Specifically, the initial network model can be trained with the training sample set through a cost function; when the loss value (Loss) of the cost function converges to a certain preset threshold, the initial network model obtained at that point is the preset network model; a brief training sketch follows below.
  • the cost function may be a rate-distortion cost function
  • the preset threshold may be specifically set according to the actual situation, which is not limited in this embodiment of the present application.
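  • A minimal PyTorch-style sketch of this training procedure; the MSE loss stands in for the unspecified cost function, and the data layout of `loader` is assumed purely for illustration:

        import torch

        def train_preset_model(model, loader, threshold=1e-4, lr=1e-4, max_epochs=1000):
            # Train the initial network model with the training sample set until
            # the loss of the cost function converges below the preset threshold;
            # the trained model is then used as the preset network model.
            opt = torch.optim.Adam(model.parameters(), lr=lr)
            loss_fn = torch.nn.MSELoss()                # stand-in cost function
            for _ in range(max_epochs):                 # safety cap on epochs
                epoch_loss = 0.0
                for side_info, init_pred, target in loader:
                    opt.zero_grad()
                    loss = loss_fn(model(side_info, init_pred), target)
                    loss.backward()
                    opt.step()
                    epoch_loss += loss.item()
                if epoch_loss / max(len(loader), 1) <= threshold:
                    break                               # loss has converged
            return model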
  • network model parameters in the preset network model may be determined first.
  • the determining the preset network model may include:
  • the preset network model is constructed according to the determined network model parameters.
  • The network model parameters may be determined through model training. Specifically, in some embodiments, this may include: acquiring a training sample set; constructing an initial network model, wherein the initial network model includes model parameters; training the initial network model by using the training sample set; and determining the model parameters in the trained initial network model as the network model parameters.
  • the network model parameters can be written into the code stream.
  • the decoder side can directly obtain network model parameters by parsing the code stream, and a preset network model can be constructed without model training on the decoder side.
  • the preset network model may include a neural network model and a first adder.
  • A Convolutional Neural Network (CNN) is a class of feedforward neural networks that involves convolution computations and has a deep structure. Convolutional neural networks have representation-learning ability and can perform shift-invariant classification of input information according to their hierarchical structure, so they are also called "Shift-Invariant Artificial Neural Networks" (SIANN).
  • Neural networks have developed to the stage of deep learning. Deep learning is a branch of machine learning: algorithms that attempt high-level abstraction of data using multiple processing layers containing complex structures or multiple nonlinear transformations. Its powerful expressive ability has achieved good results in video and image processing.
  • the residual layer may be composed of an activation function, a convolution layer and a second adder, but it is not specifically limited here.
  • The activation function may be a Rectified Linear Unit (ReLU), also known as a rectified linear function, which is a commonly used activation function in artificial neural networks and usually refers to the nonlinear function represented by the ramp function and its variants.
  • In some embodiments, the filtering of the initial predicted value by using the preset network model and the side information of the current block to obtain the target predicted value of the image component to be predicted of the current block may include:
  • the side information of the current block and the initial predicted value of the image component to be predicted are input into the preset network model, and the target predicted value of the image component to be predicted is output through the preset network model.
  • the preset network model may include a neural network model and a first adder
  • the side information of the current block and the initial predicted value of the image component to be predicted are input into the preset network model
  • Outputting the target predicted value of the to-be-predicted image component through the preset network model may specifically include: inputting the side information and the initial predicted value into the neural network model, and outputting an intermediate value;
• the first adder performs addition processing on the intermediate value and the initial predicted value to obtain the target predicted value.
  • the preset network model includes a neural network model 401 and a first adder 402 .
  • the input is the side information and the initial predicted value of the image component to be predicted.
• the intermediate value can be obtained; then the first adder 402 performs addition processing on the intermediate value and the initial predicted value of the image component to be predicted, and the final output is the target predicted value of the image component to be predicted.
  • the preset network model achieves a reduction from two-channel input to one-channel output.
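• As an illustrative sketch (not part of the application's text), this two-channel-in/one-channel-out structure with the local short-circuit can be written in Python/PyTorch as follows. The class and variable names are hypothetical, the concatenation is done outside the backbone for simplicity, and the two inputs are assumed to already have the same resolution:

```python
import torch
import torch.nn as nn

class SemiResidualPredictor(nn.Module):
    """Sketch of the preset network model: a neural network model plus a
    first adder that short-circuits the initial predicted value to the output."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone  # the neural network model (e.g., FIG. 6)

    def forward(self, side_info: torch.Tensor, init_pred: torch.Tensor) -> torch.Tensor:
        # Two-channel input: the side information and the initial predicted value.
        intermediate = self.backbone(torch.cat([side_info, init_pred], dim=1))
        # First adder: intermediate value + initial predicted value -> target predicted value.
        return init_pred + intermediate
```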
  • S305 Encode the to-be-predicted image component of the current block according to the target predicted value.
  • the image component to be predicted of the current block may be encoded. Specifically, according to the target predicted value, the residual value of the current block (ie, the difference between the target predicted value and the real value) can be calculated, and then the residual value is encoded and written into the code stream.
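• For intuition, the residual computation described above is simply the difference between the real value and the target predicted value; a minimal sketch (transform, quantization and entropy coding are omitted, and the names are illustrative):

```python
import numpy as np

def compute_residual(original_block: np.ndarray, target_pred: np.ndarray) -> np.ndarray:
    """Encoder side: residual value = real value - target predicted value."""
    return original_block.astype(np.int32) - target_pred.astype(np.int32)
```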
  • This embodiment provides an image prediction method, which is applied to an encoder.
• determining the initial predicted value of the to-be-predicted image component of the current block; determining the sample value related to the reference image component of the current block; determining the side information of the current block according to the sample value related to the reference image component;
• filtering the initial predicted value by using a preset network model and the side information of the current block to obtain the target predicted value of the to-be-predicted image component of the current block; and encoding the to-be-predicted image component of the current block according to the target predicted value.
• the correlation between image components can be used to predict and enhance the initial predicted value according to the side information of the current block and the preset network model, so that the enhanced target predicted value is closer to the real value, thereby effectively improving the prediction accuracy, improving the encoding and decoding efficiency, and at the same time improving the overall encoding and decoding performance.
• the technical solutions of the embodiments of the present application propose to use a neural network technology with local data short-circuit (also called a semi-residual network), combining the correlation between the side information around the current block and the current block, to filter and enhance the initial predicted value so as to improve prediction accuracy. That is, the embodiments of the present application focus on filtering the initial predicted value of one image component by using one or more other image components different from that image component.
  • the initial predicted value can be obtained by using a common non-inter-component prediction mode (such as intra-frame prediction mode and inter-frame prediction mode), or can be obtained by using an inter-component prediction mode (such as CCLM mode).
• FIG. 5 shows a schematic diagram of an application scenario of another image prediction method provided by an embodiment of the present application.
  • the image component to be predicted is a chrominance component (represented by Cb or Cr)
  • the reference image component is a luminance component (represented by L) of the current block.
  • the initial predicted values (Cb pred and Cr pred ) of the chroma blocks may be predicted by using the CCLM mode, or may be predicted by other intra-frame prediction modes or inter-frame prediction modes.
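• For background, the CCLM mode mentioned here predicts a chrominance sample from the co-located (downsampled) reconstructed luminance sample with a linear model of the form Pred C (i, j) = α · Rec L ′(i, j) + β, where the parameters α and β are derived from neighbouring reconstructed samples; this formulation is standard and is recalled here only for orientation.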
• when the side information of the current block is the reconstructed value of the luminance component of the current block and the image component to be predicted is the chrominance component,
  • the filtering of the initial predicted value by using the preset network model and the side information of the current block to obtain the target predicted value of the to-be-predicted image component of the current block may include:
• the filtered reconstructed value of the luminance component and the initial predicted value of the chrominance component are input into the preset network model, and the target predicted value of the chrominance component is output through the preset network model.
  • the side information of the current block is the reconstructed value of the luminance component of the current block, and the image component to be predicted is the chrominance component,
  • the filtering of the initial predicted value by using the preset network model and the side information of the current block to obtain the target predicted value of the to-be-predicted image component of the current block may include:
  • the reconstructed value of the luminance component of the current block and the filtered initial predicted value are input into the preset network model, and the target predicted value of the chrominance component is output through the preset network model.
  • the side information of the current block refers to the reconstructed value of the luminance component (L rec ), and the initial predicted value of the image component to be predicted of the current block refers to the initial predicted value of the chrominance component (Cb pred or Cr pred ).
• the initial predicted value of the chrominance component is refined by using the reconstructed value of the luminance component of the current block, to obtain the enhanced target predicted value of the chrominance component (Cb pred '/Cr pred ').
• the input of the preset network model is the luminance component reconstructed value L rec of the current block and the chrominance component predicted value Cb pred /Cr pred of the current block; wherein, the luminance component reconstructed value L rec of the current block is processed according to the color component sampling format. Assuming the 4:2:0 format, after 2× downsampling (if it is the 4:4:4 format, the downsampling step is not required), the size of the luminance component is aligned with the size of the chrominance component.
  • the preset network model effectively learns the correlation of the two inputs, and at the same time, one of the inputs of the preset network model (the chrominance component prediction value Cb pred /Cr pred ) is connected to the output of the model, so that the output Cb pred '/Cr pred ' is closer to the actual chrominance value (or referred to as the original chrominance value) of the current block than the original chrominance component prediction value Cb pred /Cr pred .
  • the preset network model may include two parts, the neural network model 601 and the first adder 602 .
• the neural network model 601 can be formed by stacking a convolution layer, a residual layer, an average pooling layer and a sampling conversion module; one of the inputs of the preset network model (the chrominance component predicted value Cb pred /Cr pred ) is connected to the output of the preset network model and added to the output of the neural network model to obtain Cb pred '/Cr pred '.
• the residual layer can also be called a residual block (Residual Block, Resblock); an example of its network structure is shown in FIG. 7.
• in the neural network model 601, for the two inputs (L rec and Cb pred /Cr pred ), 2× downsampling is first performed on L rec , and then the two are spliced through the concatenation layer (Concatenate); a convolution operation is then performed through the convolution layer to extract the feature map, the intermediate value is output after processing through the residual layers (Resblock), the average pooling layers, the sampling conversion modules and the two convolution layers, and finally the first adder 602 adds one of the inputs (the chrominance component predicted value Cb pred /Cr pred ) to the intermediate value to output Cb pred '/Cr pred '.
  • the convolutional layer can be divided into the first convolutional layer and the second convolutional layer.
• the first convolutional layer is Conv(3,64,1), that is, the convolution kernel is 3*3, the number of channels is 64, and the stride is 1;
  • the second convolution layer is Conv(3,1,1), that is, the convolution kernel is 3*3, the number of channels is 1, and the stride is 1.
  • the average pooling layer (Avg-pooling) has a down-sampling function, so a sampling conversion module can also be included in the neural network model.
• the sampling conversion module may refer to an up-sampling module (Up-sampling) or a down-sampling module (Down-sampling).
  • the sampling conversion module generally refers to an up-sampling module (Up-sampling), such as the example of the neural network model 601 in FIG. 6 .
  • the residual layer may be composed of a residual network 701 and a second adder 702 .
  • the residual network 701 may be composed of an activation function and a convolutional layer, the activation function may be represented by ReLU, and the convolutional layer is the first convolutional layer, namely Conv(3,64,1).
• the result obtained after the input (Input) of the residual layer passes through the residual network 701 is added to the input of the residual layer by the second adder 702, and the output (Output) of the residual layer is obtained after the addition.
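• A minimal Python/PyTorch sketch of this residual layer, assuming the activation precedes the Conv(3,64,1) layer within the residual network 701 and that padding preserves the spatial size (both assumptions):

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual layer of FIG. 7: residual network 701 (ReLU + Conv(3,64,1))
    plus the second adder 702 that adds the input back to the result."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.residual_network = nn.Sequential(
            nn.ReLU(inplace=True),
            # Conv(3,64,1): 3x3 kernel, 64 channels, stride 1; padding=1 is
            # an assumption made so that the spatial size is preserved.
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Second adder 702: Output = Input + residual_network(Input).
        return x + self.residual_network(x)
```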
• for the network structure in FIG. 6, the neural network model 601 stacks a total of 1 splicing layer, 2 first convolution layers, 6 residual layers, 2 average pooling layers, 2 up-sampling modules and 1 second convolution layer, as sketched below. It should be noted that the network structure is not unique, and may also be other stacking manners or other network structures, which are not specifically limited in this embodiment of the present application.
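• Since the text stresses that the stacking order is not unique, the following Python/PyTorch sketch is only one plausible arrangement that matches the stated layer counts, reusing the ResBlock sketched above; the initial 2× downsampling of L rec assumes the 4:2:0 format and is separate from the two counted average pooling layers:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):  # residual layer, repeated from the FIG. 7 sketch
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, 1, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)

class NeuralNetworkModel601(nn.Module):
    """One hypothetical stacking consistent with the stated layer counts:
    1 splicing layer, 2 Conv(3,64,1), 6 residual layers, 2 average pooling
    layers, 2 up-sampling modules, 1 Conv(3,1,1)."""

    def __init__(self):
        super().__init__()
        self.down = nn.AvgPool2d(2)               # 2x downsampling of L_rec (4:2:0 case)
        self.head = nn.Conv2d(2, 64, 3, 1, 1)     # first Conv(3,64,1), after splicing
        self.stage1 = nn.Sequential(ResBlock(), ResBlock(), nn.AvgPool2d(2))
        self.stage2 = nn.Sequential(ResBlock(), ResBlock(), nn.AvgPool2d(2))
        self.stage3 = nn.Sequential(ResBlock(), ResBlock())
        self.up1 = nn.Upsample(scale_factor=2)    # first up-sampling module
        self.mid = nn.Conv2d(64, 64, 3, 1, 1)     # second Conv(3,64,1)
        self.up2 = nn.Upsample(scale_factor=2)    # second up-sampling module
        self.tail = nn.Conv2d(64, 1, 3, 1, 1)     # Conv(3,1,1), produces the intermediate value

    def forward(self, l_rec: torch.Tensor, c_pred: torch.Tensor) -> torch.Tensor:
        x = torch.cat([self.down(l_rec), c_pred], dim=1)   # splicing (Concatenate) layer
        x = self.stage3(self.stage2(self.stage1(self.head(x))))
        return self.tail(self.up2(self.mid(self.up1(x))))  # intermediate value

# First adder 602: target = c_pred + NeuralNetworkModel601()(l_rec, c_pred)
```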
  • FIG. 8 shows a schematic diagram of a network structure of another preset network model provided by an embodiment of the present application.
• FIG. 9 shows a schematic diagram of a network structure of still another preset network model provided by an embodiment of the present application.
• FIG. 10 shows a schematic diagram of a network structure of still another preset network model provided by an embodiment of the present application.
• in FIG. 8, the luminance component reconstructed value (L rec ) is downsampled so that the luminance component reconstructed value (L rec ) and the chrominance component predicted value (Cb pred /Cr pred ) have the same resolution.
• in FIG. 9 and FIG. 10, the chrominance component predicted value (Cb pred /Cr pred ) is upsampled so that the luminance component reconstructed value (L rec ) and the chrominance component predicted value (Cb pred /Cr pred ) have the same resolution.
  • FIG. 8 , FIG. 9 , and FIG. 10 provide alternative examples of three network structures, to illustrate that the network structure of the preset network model is not unique, and may also be other stacking manners or other network structures.
  • FIG. 11 shows a schematic diagram of an application scenario of still another image prediction method provided by an embodiment of the present application.
  • the image component to be predicted is a chrominance component (denoted by Cb or Cr)
• the side information of the current block is the reconstructed value of the luminance component of the current block (denoted by L rec ) and the reconstructed value of the luminance component of the upper adjacent block (denoted by TopL rec ).
  • FIG. 12 shows a schematic diagram of an application scenario of still another image prediction method provided by an embodiment of the present application.
  • the image component to be predicted is a chrominance component (denoted by Cb or Cr)
• the side information of the current block is the reconstructed value of the luminance component of the current block (denoted by L rec ) and the reconstructed value of the luminance component of the left adjacent block (denoted by LeftL rec ).
  • FIG. 13 shows a schematic diagram of an application scenario of still another image prediction method provided by an embodiment of the present application.
  • the image component to be predicted is a chrominance component (represented by Cb or Cr)
• the side information of the current block is the reconstructed value of the luminance component of the current block (denoted by L rec ) and the predicted value of the chrominance component of the upper adjacent block (denoted by TopCb pred /TopCr pred ).
• FIG. 14 shows a schematic diagram of an application scenario of still another image prediction method provided by an embodiment of the present application.
  • the image component to be predicted is a chrominance component (represented by Cb or Cr)
• the side information of the current block is the reconstructed value of the luminance component of the current block (denoted by L rec ) and the predicted value of the chrominance component of the left adjacent block (denoted by LeftCb pred /LeftCr pred ).
  • the side information of the preset network model may be other side information.
• the side information can be the luminance components (TopL rec , LeftL rec ) and the chrominance components (TopCb pred /TopCr pred , LeftCb pred /LeftCr pred ) of the upper and left adjacent blocks of the current block, etc., as shown in FIG. 11, FIG. 12, FIG. 13 and FIG. 14.
• FIG. 15 shows a schematic diagram of an application scenario of still another image prediction method provided by an embodiment of the present application.
  • the image component to be predicted is a chrominance component (represented by Cb or Cr)
• the side information of the current block is the reconstructed value of the luminance component of the current block (denoted by L rec ), the reconstructed value of the luminance component of the upper adjacent block (denoted by TopL rec ), the reconstructed value of the luminance component of the left adjacent block (denoted by LeftL rec ), the predicted value of the chrominance component of the upper adjacent block (denoted by TopCb pred /TopCr pred ), and the predicted value of the chrominance component of the left adjacent block (denoted by LeftCb pred /LeftCr pred ).
• when the side information of the current block includes at least two of: the reconstructed values of the reference image components of the current block, the image component values corresponding to at least one row of pixels adjacent to the current block, and the image component values corresponding to at least one column of pixels adjacent to the current block,
  • the filtering of the initial predicted value by using the preset network model and the side information of the current block to obtain the target predicted value of the to-be-predicted image component of the current block may include:
• the joint side information and the initial predicted value of the to-be-predicted image component are input into the preset network model, and the target predicted value of the to-be-predicted image component is output through the preset network model.
• the side information of the preset network model may be joint side information. It is even possible to combine all of the side information as input, or to combine part of the side information as input, as shown in FIG. 15.
• FIG. 16 shows a schematic diagram of an application scenario of still another image prediction method provided by an embodiment of the present application.
• the upper and left adjacent two rows/columns (or multiple rows/columns) of the current luminance block are spliced with the current luminance block (filled with slashes) and downsampled by 2 times; the upper and left adjacent row/column (or multiple rows/columns) of the current chrominance block are spliced with the current chrominance block (filled with grid lines); and the two together are used as the input of the preset network model, as sketched after the next item.
  • the current chrominance block is connected to the output of the preset network model, and finally the target predicted value (Cb pred '/Cr pred ') of the filtered and enhanced chrominance component is obtained.
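• A small Python/NumPy sketch of the splicing step; the array names, the handling of the top-left corner samples and the 2D single-component layout are all assumptions:

```python
import numpy as np

def splice_with_neighbors(block: np.ndarray, top_rows: np.ndarray,
                          left_cols: np.ndarray, corner: np.ndarray) -> np.ndarray:
    """Splice the adjacent rows above and columns to the left with the block.

    block:     (H, W) current luminance/chrominance block
    top_rows:  (t, W) reconstructed rows above the block
    left_cols: (H, l) reconstructed columns to the left of the block
    corner:    (t, l) top-left corner samples
    """
    top = np.concatenate([corner, top_rows], axis=1)     # (t, l + W)
    bottom = np.concatenate([left_cols, block], axis=1)  # (H, l + W)
    return np.concatenate([top, bottom], axis=0)         # (t + H, l + W)
```

For the luminance branch of FIG. 16, the spliced array would then be downsampled by 2 before entering the network, as described above.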
• the joint side information includes the to-be-predicted image component reference value corresponding to at least one row of reference pixels adjacent to the upper side of the current block and the to-be-predicted image component reference value corresponding to at least one column of reference pixels adjacent to the left side of the current block.
• when the joint side information includes the to-be-predicted image component reference values, the method may further include: inputting the to-be-predicted image component reference value corresponding to at least one row of reference pixels adjacent to the upper side of the current block, the to-be-predicted image component reference value corresponding to at least one column of reference pixels adjacent to the left side of the current block, and the initial predicted value of the to-be-predicted image component of the current block into the preset network model, and outputting the target predicted value of the to-be-predicted image component of the current block through the preset network model.
• the upper and left adjacent rows/columns (or multiple rows/columns) of the current luminance block can be spliced with the current luminance block (filled with slashes) as the input of the preset network model.
  • the current luminance block is connected to the output of the preset network model, and finally the target predicted value (L pred ') of the filtered and enhanced luminance component is obtained.
• the reconstructed data before or after the loop filtering process can also be enhanced by using the input-data local short-circuit type network (i.e., the preset network model) proposed in the embodiments of the present application.
• a local short-circuit network for input data proposed in the embodiments of the present application can also be added, with other side information used as input, to improve the accuracy of the generated prediction information.
  • the above-mentioned preset network model can also be used to perform the filtering enhancement method. It should also be noted that, for the initial predicted value obtained after encoding of other luminance or chrominance components in video coding, the above-mentioned method can also be used to improve the accuracy of the predicted value.
  • the embodiment of the present application is to utilize a local input data short-circuit type deep learning network to enhance the accuracy of prediction values in video coding, thereby improving coding efficiency.
• the data to be enhanced (such as the initial predicted value of the chrominance component in the above embodiments) and the side information (such as the reconstructed value of the luminance component of the current block in the above embodiments) are used as inputs.
• the high correlation between the chrominance component and the luminance component is used to enhance the data to be enhanced, thereby improving the performance of the overall codec.
• the prediction data can be enhanced by the local input-data short-circuit network by exploiting the correlation between the data;
• the network enhancement can effectively improve the prediction accuracy of the chrominance components, thereby improving the coding efficiency; and the local input-data short-circuit network itself is also easy to train and to apply in actual scenarios.
• This embodiment provides an image prediction method, and the implementation of the foregoing embodiments is described in detail through the above examples. It can be seen that, by using the correlation between image components, the prediction enhancement of the initial predicted value according to the side information of the current block and the preset network model can make the enhanced target predicted value closer to the real value, thereby effectively improving the prediction accuracy, improving the encoding and decoding efficiency, and at the same time improving the overall encoding and decoding performance.
  • the image prediction method provided by the embodiment of the present application is applied to a video decoding device, that is, a decoder.
  • the functions implemented by the method can be implemented by calling a computer program by the second processor in the decoder.
  • the computer program can be stored in the second memory.
  • the decoder includes at least a second processor and a second memory.
• FIG. 18 shows a schematic flowchart of another image prediction method provided by an embodiment of the present application. As shown in FIG. 18, the method may include: S1801 Parse the code stream, and determine the target prediction mode of the current block.
• each decoding block may also include a first image component, a second image component, and a third image component; and the current block is the decoding block in the video image for which the first image component, the second image component, or the third image component is currently to be predicted.
• assuming that the current block performs the first image component prediction, and the first image component is a luminance component, that is, the image component to be predicted is a luminance component, then the current block may also be called a luminance block; or, assuming that the current block performs the second image component prediction, and the second image component is a chrominance component, that is, the image component to be predicted is a chrominance component, then the current block may also be called a chrominance block.
• after the encoder determines the target prediction mode, it writes the target prediction mode into the code stream. In this way, the decoder can obtain the target prediction mode of the current block by parsing the code stream.
• the target prediction mode may be an intra-frame prediction mode or an inter-frame prediction mode. Moreover, the target prediction mode may also be an intra-picture component prediction mode or a cross-picture component inter-prediction mode.
  • the inter-component prediction mode usually refers to a cross-image inter-component prediction mode, such as CCLM mode.
  • S1802 Determine the initial prediction value of the to-be-predicted image component of the current block according to the target prediction mode.
  • the target prediction mode is used to indicate the prediction mode adopted for the coding prediction of the current block.
  • the determining, according to the target prediction mode, the initial prediction value of the to-be-predicted image component of the current block may include:
  • the image component to be predicted of the current block can be predicted according to the target prediction mode, and the initial predicted value of the image component to be predicted of the current block can be obtained.
  • the reference image component may include one or more image components in the current image that are different from the image components to be predicted, where the current image is the image where the current block is located.
  • the determining a sample value related to the reference image component of the current block may include:
  • a sample value associated with the reference image component of the current block is determined based on at least one of a predicted value and a reconstructed value of the reference image component of the current block.
  • the determining a sample value related to the reference image component of the current block may include:
  • a sample value associated with the reference image component is determined according to at least one of a predicted value and a reconstructed value of the reference image component corresponding to adjacent pixels of the current block.
  • the adjacent pixels of the current block may include at least one row of pixels adjacent to the current block.
  • the adjacent pixels of the current block may also include at least one column of pixels adjacent to the current block.
  • the sample value related to the reference image component of the current block may include the reconstructed value of the reference image component of the current block, the reference image component value corresponding to at least one row of pixels adjacent to the current block, and the At least two items of reference image component values corresponding to at least one column of pixels adjacent to the current block, wherein the reference image component of the current block is different from the to-be-predicted image component.
  • the reference picture components may include one or more picture components in a reference picture, wherein the reference picture is different from the current picture in which the current block is located.
  • the determining a sample value related to the reference image component of the current block may include:
  • the samples associated with the reference picture components are determined from the reconstructed values of one or more picture components in the prediction reference block of the current block.
• when the target prediction mode indicates an inter-frame prediction mode, the method may also include: parsing the code stream to obtain the inter-frame prediction mode parameters of the current block, wherein the inter-frame prediction mode parameters include a reference image index and a motion vector; determining the reference image according to the reference image index; and determining the prediction reference block in the reference image according to the motion vector.
• for inter-frame prediction, in addition to the motion vector, a reference image is also required.
  • the reference image index refers to the image index sequence number corresponding to the reference image
  • the motion vector is used to indicate the prediction reference block in the reference image.
• the encoder can use the reference image index and the motion vector as inter-frame prediction mode parameters and write them into the code stream for transmission from the encoder to the decoder. In this way, the decoder can directly obtain the reference image index and the motion vector by parsing the code stream.
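• As a rough Python/NumPy sketch of how the motion vector locates the prediction reference block (integer-pel only; sub-pel interpolation and picture-boundary clipping are omitted, and all names are illustrative):

```python
import numpy as np

def prediction_reference_block(ref_picture: np.ndarray, x: int, y: int,
                               w: int, h: int, mv_x: int, mv_y: int) -> np.ndarray:
    """Fetch the prediction reference block pointed to by the motion vector
    from the reference picture selected by the reference image index."""
    x0, y0 = x + mv_x, y + mv_y
    return ref_picture[y0:y0 + h, x0:x0 + w]
```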
  • the side information of the current block can be determined according to the samples related to the reference image components.
  • S1804 Determine the side information of the current block according to the samples related to the reference image components.
  • the technical key of the present application is to use the relevant parameters of one image component to perform enhancement filtering on the initial predicted value of another image component.
  • the relevant parameters of "one image component” are mainly the side information of the current block.
• on the one hand, the sample value related to the reference image component of the current block can be directly determined as the side information of the current block; on the other hand, the sample value related to the reference image component can also be filtered first, and the filtered sample value determined as the side information of the current block, which is not limited in this embodiment of the present application.
  • the determining the side information of the current block according to the samples related to the reference image components may include: determining the samples related to the reference image components as Side information of the current block.
• the determining the side information of the current block according to the sample value related to the reference image component may include: performing a first filtering process on the sample value related to the reference image component according to the color component sampling format to obtain the filtered sample value related to the reference image component; and determining the filtered sample value related to the reference image component as the side information of the current block.
  • the color component may include a luminance component, a blue chrominance component, and a red chrominance component
  • the color component sampling formats may include a 4:4:4 format, a 4:2:2 format, and a 4:2:0 format.
  • the 4:2:2 format and the 4:2:0 format are applicable to the above-described first filtering process, and the 4:4:4 format is not applicable to the above-described first filtering process.
• the method may also include: parsing the code stream to obtain the color component sampling format.
• in some embodiments, the method may further include: parsing the parameter set data unit in the code stream to obtain the value of the bit field used to indicate the color component sampling format; and determining the color component sampling format according to the value of the bit field.
• after the encoder determines the color component sampling format, it can write the color component sampling format into the code stream; or, it can determine the value of the bit field used to indicate the color component sampling format and write the value of the bit field into the code stream; the code stream is then transmitted from the encoder to the decoder, so that the decoder can directly obtain the color component sampling format after parsing the code stream.
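• For orientation only: in mainstream codecs such as HEVC and VVC, such a bit field corresponds to the chroma_format_idc syntax element carried in a parameter set; assuming that convention (the actual bit field of this application may differ), the mapping is:

```python
# Assumed mapping following the chroma_format_idc convention of HEVC/VVC
# parameter sets; the bit field used by this application may differ.
CHROMA_FORMAT_IDC = {
    0: "monochrome",
    1: "4:2:0",
    2: "4:2:2",
    3: "4:4:4",
}
```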
• the performing the first filtering process on the samples related to the reference image component may include: when the resolution of the initial predicted value is smaller than the resolution of the samples related to the reference image component, performing down-sampling processing on the samples related to the reference image component; when the resolution of the initial predicted value is greater than the resolution of the samples related to the reference image component, performing up-sampling processing on the samples related to the reference image component; and when the resolution of the initial predicted value is equal to the resolution of the samples related to the reference image component, setting the filtered samples related to the reference image component equal to the samples related to the reference image component.
  • the resolution of the sample value related to the filtered reference image component is equal to the resolution of the initial prediction value.
• if the image component to be predicted is a chrominance component and the reference image component is a luminance component, then the samples related to the reference image component need to be downsampled, and the resolution of the luminance component after the downsampling process is the same as the resolution of the chrominance component; or, if the image component to be predicted is a luminance component and the reference image component is a chrominance component, then the samples related to the reference image component need to be upsampled, and the resolution of the chrominance component after the upsampling process is the same as the resolution of the luminance component.
• if the to-be-predicted image component is the blue chrominance component and the reference image component is the red chrominance component, since the resolution of the blue chrominance component is the same as the resolution of the red chrominance component, when the related samples are subjected to the first filtering process, the samples related to the reference image component after filtering may be set equal to the samples related to the reference image component before filtering, as sketched below.
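• A simplistic Python/NumPy sketch of this first filtering process, assuming a fixed resampling factor of 2 and even array sizes (real codecs would use normative resampling filters; the names are illustrative):

```python
import numpy as np

def first_filtering(ref_samples: np.ndarray, init_pred: np.ndarray) -> np.ndarray:
    """Align the reference-component samples with the resolution of the
    initial predicted value: downsample, upsample, or pass through."""
    rh, rw = ref_samples.shape
    ph, pw = init_pred.shape
    if ph < rh or pw < rw:
        # Initial predicted value has the smaller resolution: 2x average downsampling.
        return ref_samples.reshape(rh // 2, 2, rw // 2, 2).mean(axis=(1, 3))
    if ph > rh or pw > rw:
        # Initial predicted value has the larger resolution: 2x nearest upsampling.
        return ref_samples.repeat(2, axis=0).repeat(2, axis=1)
    # Equal resolution: the filtered samples are set equal to the input samples.
    return ref_samples
```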
  • the side information of the current block can be determined according to the samples related to the reference image components, so as to use the side information of the current block to filter and enhance the initial predicted value of the to-be-predicted image component.
  • S1805 Filter the initial predicted value by using a preset network model and side information of the current block to obtain a target predicted value of the image component to be predicted of the current block.
• the embodiments of the present application use a preset network model, which can be called a neural network model with local data short-circuit, or a semi-residual network; combined with the side information of the current block, it is used to filter and enhance the initial predicted value so as to improve the prediction accuracy.
  • the method may further include: determining a preset network model.
• the preset network model is obtained through model training. In some embodiments, it may specifically include: acquiring a training sample set, wherein the training sample set includes one or more images; constructing an initial network model; and training the initial network model by using the training sample set, and determining the trained initial network model as the preset network model.
  • the training sample set may include one or more images.
• the training sample set may be a set of training samples stored locally by the encoder, or a set of training samples obtained from a remote server according to link or address information, or even a set of decoded image samples in the video, which is not specifically limited in this embodiment of the present application.
  • the initial network model can be trained by using the training sample set through the cost function.
• when the loss value (Loss) of the cost function converges to a certain preset threshold, the initial network model obtained by training at this time is the preset network model.
  • the cost function may be a rate-distortion cost function
  • the preset threshold may be specifically set according to the actual situation, which is not limited in this embodiment of the present application.
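• An illustrative Python/PyTorch training loop consistent with this description; the L1 cost, the Adam optimizer and the learning rate are assumptions (the text only requires a cost function, e.g. a rate-distortion cost, whose loss converges to a preset threshold):

```python
import torch
import torch.nn.functional as F

def train_preset_model(model: torch.nn.Module, loader,
                       max_epochs: int = 100, threshold: float = 1e-4):
    """Train the initial network model on the training sample set until the
    loss converges to the preset threshold; the result is the preset model."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss = torch.tensor(float("inf"))
    for _ in range(max_epochs):
        for side_info, init_pred, original in loader:  # samples from the training set
            loss = F.l1_loss(model(side_info, init_pred), original)
            opt.zero_grad()
            loss.backward()
            opt.step()
        if loss.item() < threshold:  # loss (Loss) converged to the preset threshold
            break
    return model
```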
  • network model parameters in the preset network model may be determined first.
  • the determining the preset network model may include:
  • the preset network model is constructed according to the determined network model parameters.
• the network model parameters may be determined through model training. Specifically, in some embodiments, it may include: acquiring a training sample set; constructing an initial network model, wherein the initial network model includes model parameters; training the initial network model by using the training sample set; and determining the model parameters in the trained initial network model as the network model parameters.
  • the determining the preset network model may include:
  • the preset network model is determined according to the network model parameters.
  • the encoder obtains the network model parameters through model training
  • the network model parameters are written into the code stream.
  • the decoder can directly obtain the network model parameters by parsing the code stream, and a preset network model can be constructed without model training in the decoder.
  • the preset network model may include a neural network model and a first adder.
  • the neural network model may include at least a convolution layer, a residual layer, an average pooling layer and a sample rate conversion module.
  • the residual layer can be composed of an activation function, a convolutional layer and a second adder.
  • the sampling conversion module may be an up-sampling module or a down-sampling module.
• the average pooling layer and the sampling rate conversion module are equivalent to the effect of low-pass filtering, and the sampling rate conversion module usually refers to the up-sampling module, but the embodiment of the present application does not specifically limit it.
• filtering the initial predicted value by using a preset network model and side information of the current block to obtain the target predicted value of the image component to be predicted of the current block may include:
  • the side information of the current block and the initial predicted value of the image component to be predicted are input into the preset network model, and the target predicted value of the image component to be predicted is output through the preset network model.
  • the preset network model may include a neural network model and a first adder
  • the side information of the current block and the initial predicted value of the image component to be predicted are input into the preset network model
  • Outputting the target predicted value of the to-be-predicted image component through the preset network model may specifically include: inputting the side information and the initial predicted value into the neural network model, and outputting an intermediate value;
• the first adder performs addition processing on the intermediate value and the initial predicted value to obtain the target predicted value.
  • S1806 Decode the to-be-predicted image component of the current block according to the target predicted value.
  • the image component to be predicted of the current block may be decoded. Specifically, after obtaining the target predicted value, the residual value is obtained by parsing the code stream, and then the real image information can be decoded and restored by using the residual value and the target predicted value.
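• As a one-line sketch of this restoration step (Python/NumPy; the names are illustrative, and in-loop filtering is omitted):

```python
import numpy as np

def reconstruct_block(residual: np.ndarray, target_pred: np.ndarray,
                      bit_depth: int = 8) -> np.ndarray:
    """Decoder side: reconstructed value = parsed residual + target predicted
    value, clipped to the valid sample range."""
    return np.clip(residual.astype(np.int32) + target_pred.astype(np.int32),
                   0, (1 << bit_depth) - 1)
```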
  • This embodiment provides an image prediction method, which is applied to a decoder.
• Obtain the target prediction mode of the current block by parsing the code stream; determine the initial predicted value of the to-be-predicted image component of the current block according to the target prediction mode; determine the sample value related to the reference image component of the current block; determine the side information of the current block according to the sample value related to the reference image component; filter the initial predicted value by using a preset network model and the side information of the current block to obtain the target predicted value of the to-be-predicted image component of the current block; and decode the to-be-predicted image component of the current block according to the target predicted value.
• the correlation between image components can be used to predict and enhance the initial predicted value according to the side information of the current block and the preset network model, so that the enhanced target predicted value is closer to the real value, thereby effectively improving the prediction accuracy, improving the encoding and decoding efficiency, and at the same time improving the overall encoding and decoding performance.
  • FIG. 19 shows a schematic structural diagram of the composition of an encoder 190 provided by an embodiment of the present application.
• the encoder 190 may include: a first determining unit 1901, a first prediction unit 1902, and an encoding unit 1903; wherein,
  • a first determining unit 1901 configured to determine the initial prediction value of the image component to be predicted of the current block
  • the first determining unit 1901 is further configured to determine the sample value related to the reference image component of the current block; and determine the side information of the current block according to the sample value related to the reference image component;
  • the first predicting unit 1902 is configured to filter the initial predicted value by using a preset network model and side information of the current block to obtain the target predicted value of the image component to be predicted of the current block;
  • the encoding unit 1903 is configured to encode the to-be-predicted image component of the current block according to the target prediction value.
• the first determining unit 1901 is further configured to determine a target prediction mode of the current block; and the first prediction unit 1902 is further configured to predict the to-be-predicted image component of the current block according to the target prediction mode, and determine the initial predicted value of the to-be-predicted image component of the current block.
  • the encoder 190 may further include a precoding unit 1904;
  • the first determining unit 1901 is further configured to determine the to-be-predicted image component of the current block
  • a precoding unit 1904 configured to pre-encode the to-be-predicted image component by using one or more candidate prediction modes, and determine a rate-distortion cost result corresponding to the candidate prediction mode;
  • the first determining unit 1901 is further configured to select an optimal rate-distortion cost result from the rate-distortion cost result, and determine a candidate prediction mode corresponding to the optimal rate-distortion cost result as the target prediction mode of the current block .
  • the target prediction mode is an intra prediction mode or an inter prediction mode.
  • the target prediction mode is an intra-picture component prediction mode or a cross-picture component inter-prediction mode.
  • the reference image components include one or more image components in a current image that are different from the image components to be predicted, wherein the current image is the image in which the current block is located.
  • the first determining unit 1901 is specifically configured to determine a sample value related to the reference image component according to at least one of the predicted value and the reconstructed value of the reference image component of the current block.
  • the first determining unit 1901 is specifically configured to, according to at least one of the predicted value and the reconstructed value of the reference image component corresponding to the adjacent pixels of the current block, determine the relative value of the reference image component. sample value.
  • the adjacent pixels of the current block include at least one row of pixels adjacent to the current block.
  • the adjacent pixels of the current block include at least one column of pixels adjacent to the current block.
  • the reference picture components comprise one or more picture components in a reference picture, wherein the reference picture is different from the current picture in which the current block is located.
  • the first determining unit 1901 is specifically configured to determine the sample value related to the reference image component according to the reconstructed value of one or more image components in the prediction reference block of the current block.
  • the encoder 190 may further include a writing unit 1905;
• the first determining unit 1901 is further configured to, when the target prediction mode indicates an inter-frame prediction mode, determine the inter-frame prediction mode parameters of the current block, wherein the inter-frame prediction mode parameters include a reference image index indicating the reference image and a motion vector indicating the prediction reference block in the reference image;
  • the writing unit 1905 is configured to write the determined inter-frame prediction mode parameter into the code stream.
  • the first determining unit 1901 is further configured to determine a sample value related to the reference image component as the side information of the current block.
  • the encoder 190 may further include a first sampling unit 1906 configured to perform a first filtering process on the samples related to the reference image components according to the color component sampling format, to obtain the filtered reference image component-related samples; and determining the filtered reference image component-related samples as side information of the current block.
  • the writing unit 1905 is further configured to write the color component sampling format into the code stream.
  • the first determining unit 1901 is further configured to determine the value of the bit field to be written in the code stream; wherein, the value of the bit field is used to indicate the color component sampling format;
  • the writing unit 1905 is further configured to write the value of the bit field into the code stream.
• the first sampling unit 1906 is specifically configured to: when the resolution of the initial predicted value is smaller than the resolution of the sample value related to the reference image component, perform down-sampling processing on the sample value related to the reference image component; when the resolution of the initial predicted value is greater than the resolution of the sample value related to the reference image component, perform up-sampling processing on the sample value related to the reference image component; and when the resolution of the initial predicted value is equal to the resolution of the sample value related to the reference image component, set the filtered sample value related to the reference image component equal to the sample value related to the reference image component.
  • the resolution of the samples associated with the filtered reference image component is equal to the resolution of the initial predictor.
  • the samples related to the reference image component of the current block include reconstructed values of the reference image component of the current block, reference image component values corresponding to at least one row of pixels adjacent to the current block, and At least two items of reference image component values corresponding to at least one column of pixels adjacent to the current block, wherein the reference image component of the current block is different from the to-be-predicted image component.
  • the first prediction unit 1902 is specifically configured to input the side information of the current block and the initial prediction value of the image component to be predicted into the preset network model, and use the preset network model A target predicted value of the to-be-predicted image component is output.
  • the first determining unit 1901 is further configured to determine the preset network model.
  • the preset network model includes a neural network model and a first adder.
• the first prediction unit 1902 is specifically configured to input the side information and the initial predicted value into the neural network model and output an intermediate value; and to use the first adder to perform addition processing on the intermediate value and the initial predicted value to obtain the target predicted value.
  • the neural network model includes at least one of the following: a convolutional layer, a residual layer, an average pooling layer, and a sample rate conversion module.
• the residual layer includes at least one of the following: an activation function, a convolutional layer, and a second adder.
• the encoder 190 may further include a first training unit 1907, configured to obtain a training sample set, wherein the training sample set includes one or more images; construct an initial network model; train the initial network model by using the training sample set; and determine the trained initial network model as the preset network model.
  • the first determining unit 1901 is further configured to determine network model parameters; and construct the preset network model according to the determined network model parameters.
  • a "unit” may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course, it may also be a module, and it may also be non-modular.
  • each component in this embodiment may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of software function modules.
  • the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium.
• the technical solution of this embodiment, in essence, or the part that contributes to the prior art, or the whole or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method described in this embodiment.
• the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other media that can store program codes.
  • an embodiment of the present application provides a computer storage medium, which is applied to the encoder 190, where an image prediction program is stored in the computer storage medium, and when the image prediction program is executed by the first processor, any one of the foregoing embodiments is implemented the method described.
  • FIG. 20 shows a schematic diagram of the hardware structure of the encoder 190 provided by the embodiment of the present application.
  • it may include: a first communication interface 2001 , a first memory 2002 and a first processor 2003 ; each component is coupled together through a first bus system 2004 .
  • the first bus system 2004 is used to realize the connection and communication between these components.
  • the first bus system 2004 also includes a power bus, a control bus and a status signal bus.
• the various buses are labeled as the first bus system 2004 in FIG. 20; wherein,
  • the first communication interface 2001 is used for receiving and sending signals in the process of sending and receiving information with other external network elements;
  • a first memory 2002 for storing a computer program that can run on the first processor 2003;
  • the first processor 2003 is configured to, when running the computer program, execute:
• determine the initial predicted value of the to-be-predicted image component of the current block; determine the sample value related to the reference image component of the current block; determine the side information of the current block according to the sample value related to the reference image component; filter the initial predicted value by using a preset network model and the side information of the current block to obtain the target predicted value of the to-be-predicted image component of the current block; and encode the to-be-predicted image component of the current block according to the target predicted value.
  • the first memory 2002 in this embodiment of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories.
  • the non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically programmable read-only memory (Erasable PROM, EPROM). Erase programmable read-only memory (Electrically EPROM, EEPROM) or flash memory.
  • Volatile memory may be Random Access Memory (RAM), which acts as an external cache.
• by way of example but not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM) and Direct Rambus RAM (DR RAM).
  • the first processor 2003 may be an integrated circuit chip, which has signal processing capability. In the implementation process, each step of the above-mentioned method may be completed by an integrated logic circuit of hardware in the first processor 2003 or an instruction in the form of software.
  • the above-mentioned first processor 2003 can be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a ready-made programmable gate array (Field Programmable Gate Array, FPGA) Or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
  • the methods, steps, and logic block diagrams disclosed in the embodiments of this application can be implemented or executed.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the first memory 2002, and the first processor 2003 reads the information in the first memory 2002, and completes the steps of the above method in combination with its hardware.
  • the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof.
• the processing unit can be implemented in one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Digital Signal Processing Devices (DSP Device, DSPD), Programmable Logic Devices (PLD), Field-Programmable Gate Arrays (FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described herein, or a combination thereof.
  • the techniques described herein may be implemented through modules (e.g., procedures, functions, etc.) that perform the functions described herein.
  • Software codes may be stored in memory and executed by a processor.
  • the memory can be implemented in the processor or external to the processor.
  • the first processor 2003 is further configured to execute the method described in any one of the foregoing embodiments when running the computer program.
  • This embodiment provides an encoder, and the encoder may include a first determination unit, a first prediction unit, and a coding unit.
  • the correlation between image components can be used to predict and enhance the initial predicted value according to the side information of the current block and the preset network model, so that the enhanced target predicted value is closer to the real value, Thereby, the prediction accuracy is effectively improved, the encoding and decoding efficiency is further improved, and the overall encoding and decoding performance is improved at the same time.
  • FIG. 21 shows a schematic structural diagram of the composition of a decoder 210 provided by an embodiment of the present application.
  • the decoder 210 may include: a parsing unit 2101, a second determining unit 2102, a second predicting unit 2103 and a decoding unit 2104; wherein,
  • the parsing unit 2101 is configured to parse the code stream and determine the target prediction mode of the current block
  • the second determination unit 2102 is configured to determine, according to the target prediction mode, the initial prediction value of the to-be-predicted image component of the current block;
  • the second determining unit 2102 is further configured to determine the sample value related to the reference image component of the current block; and determine the side information of the current block according to the sample value related to the reference image component;
  • the second prediction unit 2103 is configured to use a preset network model and side information of the current block to filter the initial predicted value to obtain the target predicted value of the image component to be predicted of the current block;
  • the decoding unit 2104 is configured to decode the to-be-predicted image component of the current block according to the target prediction value.
  • the second prediction unit 2103 is further configured to predict the to-be-predicted image component of the current block according to the target prediction mode, to obtain an initial predicted value of the to-be-predicted image component of the current block.
  • the target prediction mode is an intra prediction mode or an inter prediction mode.
  • the target prediction mode is an intra-picture component prediction mode or a cross-picture component inter-prediction mode.
  • the reference image components include one or more image components in a current image that are different from the image components to be predicted, wherein the current image is the image in which the current block is located.
  • the second determining unit 2102 is specifically configured to determine a sample value related to the reference image component according to at least one of the predicted value and the reconstructed value of the reference image component of the current block.
  • the second determining unit 2102 is specifically configured to, according to at least one of the predicted value and the reconstructed value of the reference image component corresponding to the adjacent pixels of the current block, determine the relative value of the reference image component. sample value.
  • the adjacent pixels of the current block include at least one row of pixels adjacent to the current block.
  • the adjacent pixels of the current block include at least one column of pixels adjacent to the current block.
  • the reference picture components comprise one or more picture components in a reference picture, wherein the reference picture is different from the current picture in which the current block is located.
  • the second determining unit 2102 is specifically configured to determine a sample value related to the reference image component according to the reconstructed value of one or more image components in the prediction reference block of the current block.
  • the parsing unit 2101 is further configured to, when the target prediction mode indicates an inter prediction mode, parse the code stream to obtain an inter prediction mode parameter of the current block, wherein the inter prediction mode parameter includes a reference picture index and a motion vector;
  • the second determining unit 2102 is further configured to determine the reference image according to the reference image index; and determine the prediction reference block in the reference image according to the motion vector.
  • the second determining unit 2102 is further configured to determine the sample value related to the reference image component as the side information of the current block.
  • the decoder 210 may further include a second sampling unit 2105 configured to perform a first filtering process on the samples related to the reference image component according to the color component sampling format to obtain filtered reference image component-related samples, and determine the filtered reference image component-related samples as the side information of the current block.
  • the parsing unit 2101 is further configured to parse the code stream to obtain the color component sampling format.
  • the parsing unit 2101 is further configured to parse the parameter set data unit in the code stream, and obtain the value of the bit field used to indicate the color component sampling format;
  • the second determining unit 2102 is further configured to determine the color component sampling format according to the value of the bit field.
  • the second sampling unit 2105 is specifically configured to: when the resolution of the initial prediction value is smaller than the resolution of the sample value related to the reference image component, perform down-sampling processing on the sample value related to the reference image component; when the resolution of the initial prediction value is greater than the resolution of the sample value related to the reference image component, perform up-sampling processing on the sample value related to the reference image component; and when the resolution of the initial prediction value is equal to the resolution of the sample value related to the reference image component, set the filtered sample value related to the reference image component equal to the sample value related to the reference image component.
  • the resolution of the filtered reference image component-related samples is equal to the resolution of the initial prediction value.
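As an illustrative, non-normative sketch of the rule above, the first filtering process can be selected by a simple resolution comparison; the helpers `down_sample` and `up_sample` below are hypothetical placeholders, since the embodiment does not mandate a specific resampling filter.

```python
# Illustrative sketch only: selects the first filtering process by comparing
# resolutions. `down_sample`/`up_sample` are hypothetical placeholders for
# implementation-specific resampling filters.

def first_filtering(ref_samples, ref_resolution, pred_resolution,
                    down_sample, up_sample):
    """Return reference-component samples resampled to the resolution
    of the initial prediction value."""
    if pred_resolution < ref_resolution:
        return down_sample(ref_samples)   # e.g. 4:2:0 chroma predicted from luma
    if pred_resolution > ref_resolution:
        return up_sample(ref_samples)     # e.g. luma predicted from chroma
    return ref_samples                    # equal resolutions: pass through
```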
  • the samples related to the reference image component of the current block include at least two of: the reconstructed value of the reference image component of the current block, reference image component values corresponding to at least one row of pixels adjacent to the current block, and reference image component values corresponding to at least one column of pixels adjacent to the current block, wherein the reference image component of the current block is different from the to-be-predicted image component.
  • the second prediction unit 2103 is specifically configured to input the side information of the current block and the initial prediction value of the image component to be predicted into the preset network model, and output the target prediction value of the image component to be predicted through the preset network model.
  • the second determining unit 2102 is further configured to determine the preset network model.
  • the preset network model includes a neural network model and a first adder.
  • the second prediction unit 2103 is specifically configured to input the side information and the initial predicted value into the neural network model to output an intermediate value, and perform, through the first adder, addition processing on the intermediate value and the initial predicted value to obtain the target predicted value.
  • the neural network model includes at least one of the following: a convolutional layer, a residual layer, an average pooling layer, and a sample rate conversion module.
  • the residual layer includes at least one of an activation function, a convolutional layer, and a second adder.
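A minimal sketch of such a preset network model is given below, assuming a PyTorch implementation with a two-channel input (side information plus initial prediction), 3×3 convolutions, 64 channels and two residual layers; these concrete choices are illustrative assumptions, not requirements of the embodiment.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual layer: activation function, convolutional layer, second adder."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(nn.ReLU(), nn.Conv2d(ch, ch, 3, 1, 1))

    def forward(self, x):
        return x + self.body(x)          # second adder

class SemiResidualPredictor(nn.Module):
    """Preset network model: a neural network model plus a first adder that
    short-circuits the initial prediction to the output."""
    def __init__(self, ch=64, n_blocks=2):
        super().__init__()
        layers = [nn.Conv2d(2, ch, 3, 1, 1)]          # side info + initial prediction
        layers += [ResBlock(ch) for _ in range(n_blocks)]
        layers += [nn.Conv2d(ch, 1, 3, 1, 1)]
        self.net = nn.Sequential(*layers)

    def forward(self, side_info, init_pred):
        x = torch.cat([side_info, init_pred], dim=1)  # channel-wise concatenation
        mid = self.net(x)                             # intermediate value
        return init_pred + mid                        # first adder
```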
  • the decoder 210 may further include a second training unit 2106 configured to obtain a training sample set, wherein the training sample set includes one or more images; construct an initial network model; train the initial network model using the training sample set; and determine the trained initial network model as the preset network model.
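A possible training loop for obtaining the preset network model is sketched below, reusing the `SemiResidualPredictor` sketched above; the MSE cost against the original component values and the fixed convergence threshold are assumptions for illustration, since the embodiment leaves the concrete cost function (for example, a rate-distortion cost) and threshold open.

```python
import torch

def train_preset_model(model, loader, epochs=10, lr=1e-4, loss_threshold=1e-4):
    """Train the initial network model on the training sample set and
    return it as the preset network model once the cost converges."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    mse = torch.nn.MSELoss()                # assumed cost function
    last_loss = float("inf")
    for _ in range(epochs):
        for side_info, init_pred, target in loader:
            loss = mse(model(side_info, init_pred), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
            last_loss = loss.item()
        if last_loss < loss_threshold:      # cost converged below the preset threshold
            break
    return model
```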
  • the parsing unit 2101 is further configured to parse the code stream to obtain network model parameters of the preset network model;
  • the second determining unit 2102 is further configured to determine the preset network model according to the network model parameter.
  • a "unit” may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course, it may also be a module, and it may also be non-modular.
  • each component in this embodiment may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of software function modules.
  • the integrated unit may be stored in a computer-readable storage medium.
  • this embodiment provides a computer storage medium, which is applied to the decoder 210, where the computer storage medium stores an image prediction program, and the image prediction program, when executed by the second processor, implements the method described in any one of the foregoing embodiments.
  • FIG. 22 shows a schematic hardware structure of the decoder 210 provided by the embodiment of the present application.
  • it may include: a second communication interface 2201 , a second memory 2202 and a second processor 2203 ; each component is coupled together through a second bus system 2204 .
  • the second bus system 2204 is used to realize the connection communication between these components.
  • the second bus system 2204 also includes a power bus, a control bus, and a status signal bus.
  • the various buses are designated as the second bus system 2204 in FIG. 22 for the sake of clear illustration; wherein,
  • the second communication interface 2201 is used for receiving and sending signals in the process of sending and receiving information with other external network elements;
  • a second memory 2202 for storing computer programs that can run on the second processor 2203;
  • the second processor 2203 is configured to, when running the computer program, execute:
  • parse the code stream to obtain the target prediction mode of the current block; determine, according to the target prediction mode, the initial prediction value of the to-be-predicted image component of the current block; determine the sample value related to the reference image component of the current block; determine the side information of the current block according to the sample value related to the reference image component; filter the initial prediction value by using the preset network model and the side information of the current block to obtain the target prediction value of the to-be-predicted image component of the current block; and decode the to-be-predicted image component of the current block according to the target prediction value.
  • the second processor 2203 is further configured to execute the method described in any one of the foregoing embodiments when running the computer program.
  • the hardware function of the second memory 2202 is similar to that of the first memory 2002, and the hardware function of the second processor 2203 is similar to that of the first processor 2003; details are not described here.
  • This embodiment provides a decoder, and the decoder may include a parsing unit, a second determination unit, a second prediction unit, and a decoding unit.
  • the correlation between image components can also be used to enhance the initial predicted value according to the side information of the current block and the preset network model, so that the enhanced target predicted value is closer to the real value, thereby effectively improving the prediction accuracy, improving the encoding and decoding efficiency, and improving the overall encoding and decoding performance.
  • in the embodiments of the present application, after the initial prediction value of the image component to be predicted of the current block is determined, the sample value related to the reference image component of the current block is determined; the side information of the current block is then determined according to the sample value related to the reference image component; the preset network model and the side information of the current block are then used to filter the initial prediction value to obtain the target prediction value of the image component to be predicted of the current block; and finally the image component to be predicted of the current block is encoded or decoded according to the target prediction value.
  • in this way, the correlation between image components can be used to enhance the initial predicted value according to the side information of the current block and the preset network model, so that the enhanced target predicted value is closer to the real value, thereby effectively improving the prediction accuracy, improving the encoding and decoding efficiency, and improving the overall encoding and decoding performance.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A picture prediction method, an encoder, a decoder, and a computer storage medium; the picture prediction method is applied to an encoder. The method includes: determining an initial prediction value of an image component to be predicted of a current block; determining a sample value related to a reference image component of the current block; determining side information of the current block according to the sample value related to the reference image component; filtering the initial prediction value by using a preset network model and the side information of the current block to obtain a target prediction value of the image component to be predicted of the current block; and encoding the image component to be predicted of the current block according to the target prediction value.

Description

图像预测方法、编码器、解码器以及计算机存储介质 技术领域
本申请涉及视频编解码技术领域,尤其涉及一种图像预测方法、编码器、解码器以及计算机存储介质。
背景技术
随着人们对视频显示质量要求的提高,高清和超高清视频等新视频应用形式应运而生。H.265/高效率视频编码(High Efficiency Video Coding,HEVC)已经无法满足视频应用迅速发展的需求,联合视频研究组(Joint Video Exploration Team,JVET)制定了最新的视频编码标准H.266/多功能视频编码(Versatile Video Coding,VVC),其相应的测试模型为VVC的参考软件测试平台(VVC Test Model,VTM)。
在H.266/VVC中,考虑到视频数据在空间、时间、分量间的相关性,目前产生了多种预测技术。在现有的这多种预测技术中,目前已有的预测算法没有充分考虑不同分量间的相关性,导致降低了分量间预测的准确性,进而降低了压缩编码效率。另外,使用已有的跨分量线性模型(Cross-component Linear Model,CCLM)预测模式所确定的当前块的预测值与当前块周围相邻参考像素的样值之间还存在明显的不连续性,从而降低了预测准确性,进而降低了压缩编码效率。
发明内容
本申请提供了一种图像预测方法、编码器、解码器以及计算机存储介质,可以实现对图像分量的预测值进行增强,使得增强后的目标预测值更接近于真实值,从而能够有效提高预测精度,进而提高编解码效率。
本申请的技术方案可以如下实现:
第一方面,本申请实施例提供了一种图像预测方法,应用于编码器,该方法包括:
确定当前块的待预测图像分量的初始预测值;
确定所述当前块的参考图像分量相关的样值;
根据所述参考图像分量相关的样值,确定所述当前块的边信息;
利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值;
根据所述目标预测值,对所述当前块的待预测图像分量进行编码。
第二方面,本申请实施例提供了一种图像预测方法,应用于解码器,该方法包括:
解析码流,获取当前块的目标预测模式;
根据所述目标预测模式,确定所述当前块的待预测图像分量的初始预测值;
确定所述当前块的参考图像分量相关的样值;
根据所述参考图像分量相关的样值,确定所述当前块的边信息;
利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值;
根据所述目标预测值,对所述当前块的待预测图像分量进行解码。
第三方面,本申请实施例提供了一种编码器,所述编码器包括第一确定单元、第一预测单元和编码单元;其中,
所述第一确定单元,配置为确定当前块的待预测图像分量的初始预测值;
所述第一确定单元,还配置为确定所述当前块的参考图像分量相关的样值;以及根据所述参考图像分量相关的样值,确定所述当前块的边信息;
所述第一预测单元，配置为利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波，得到所述当前块的待预测图像分量的目标预测值；
所述编码单元,配置为根据所述目标预测值,对所述当前块的待预测图像分量进行编码。
第四方面,本申请实施例提供了一种编码器,所述编码器包括第一存储器和第一处理器;其中,
所述第一存储器,用于存储能够在所述第一处理器上运行的计算机程序;
所述第一处理器,用于在运行所述计算机程序时,执行如第一方面所述的方法。
第五方面,本申请实施例提供了一种解码器,所述解码器包括解析单元、第二确定单元、第二预测单元和解码单元;其中,
所述解析单元,配置为解析码流,确定当前块的目标预测模式;
所述第二确定单元,配置为根据所述目标预测模式,确定所述当前块的待预测图像分量的初始预测值;
所述第二确定单元,还配置为确定所述当前块的参考图像分量相关的样值;以及根据所述参考图像分量相关的样值,确定所述当前块的边信息;
所述第二预测单元,配置为利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值;
所述解码单元,配置为根据所述目标预测值,对所述当前块的待预测图像分量进行解码。
第六方面,本申请实施例提供了一种解码器,所述解码器包括第二存储器和第二处理器;其中,
所述第二存储器,用于存储能够在所述第二处理器上运行的计算机程序;
所述第二处理器,用于在运行所述计算机程序时,执行如第二方面所述的方法。
第七方面,本申请实施例提供了一种计算机存储介质,所述计算机存储介质存储有图像预测程序,所述图像预测程序被第一处理器执行时实现如第一方面所述的方法、或者被第二处理器执行时实现如第二方面所述的方法。
本申请实施例提供了一种图像预测方法、编码器、解码器以及计算机存储介质,在编码器侧,通过确定当前块的待预测图像分量的初始预测值;确定所述当前块的参考图像分量相关的样值;根据所述参考图像分量相关的样值,确定所述当前块的边信息;利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值;根据所述目标预测值,对所述当前块的待预测图像分量进行编码。在解码器侧,解析码流,获取当前块的目标预测模式;根据所述目标预测模式,确定所述当前块的待预测图像分量的初始预测值;确定所述当前块的参考图像分量相关的样值;根据所述参考图像分量相关的样值,确定所述当前块的边信息;利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值;根据所述目标预测值,对所述当前块的待预测图像分量进行解码。这样,可以利用图像分量之间的相关性,根据当前块的边信息和预设网络模型对初始预测值进行预测增强,使得增强后的目标预测值更接近于真实值,从而有效提高了预测精度,进而提高了编解码效率,同时提高了整体编解码的性能。
附图说明
图1为相关技术方案提供的一种MIP预测过程的流程框图;
图2A为本申请实施例提供的一种视频编码系统的组成框图;
图2B为本申请实施例提供的一种视频解码系统的组成框图;
图3为本申请实施例提供的一种图像预测方法的流程示意图;
图4为本申请实施例提供的一种图像预测方法的应用场景示意图;
图5为本申请实施例提供的另一种图像预测方法的应用场景示意图;
图6为本申请实施例提供的一种预设网络模型的网络结构示意图;
图7为本申请实施例提供的一种残差层的网络结构示意图;
图8为本申请实施例提供的另一种预设网络模型的网络结构示意图;
图9为本申请实施例提供的又一种预设网络模型的网络结构示意图;
图10为本申请实施例提供的再一种预设网络模型的网络结构示意图;
图11为本申请实施例提供的又一种图像预测方法的应用场景示意图;
图12为本申请实施例提供的再一种图像预测方法的应用场景示意图;
图13为本申请实施例提供的再一种图像预测方法的应用场景示意图;
图14为本申请实施例提供的再一种图像预测方法的应用场景示意图;
图15为本申请实施例提供的再一种图像预测方法的应用场景示意图;
图16为本申请实施例提供的再一种图像预测方法的应用场景示意图;
图17为本申请实施例提供的再一种图像预测方法的应用场景示意图;
图18为本申请实施例提供的另一种图像预测方法的流程示意图;
图19为本申请实施例提供的一种编码器的组成结构示意图;
图20为本申请实施例提供的一种编码器的具体硬件结构示意图;
图21为本申请实施例提供的一种解码器的组成结构示意图;
图22为本申请实施例提供的一种解码器的具体硬件结构示意图。
具体实施方式
为了能够更加详尽地了解本申请实施例的特点与技术内容,下面结合附图对本申请实施例的实现进行详细阐述,所附附图仅供参考说明之用,并非用来限定本申请实施例。
在视频图像中,一般采用第一图像分量、第二图像分量和第三图像分量来表征编码块(Coding Block,CB)。其中,这三个图像分量分别为一个亮度分量、一个蓝色色度分量和一个红色色度分量,具体地,亮度分量通常使用符号Y表示,蓝色色度分量通常使用符号Cb或者U表示,红色色度分量通常使用符号Cr或者V表示;这样,视频图像可以用YCbCr格式表示,也可以用YUV格式表示。
在本申请实施例中,第一图像分量可以为亮度分量,第二图像分量可以为蓝色色度分量,第三图像分量可以为红色色度分量,但是本申请实施例不作具体限定。
下面将针对目前各种预测技术进行相关技术方案描述。
可以理解,考虑到视频数据在空间、时间、分量间的相关性,目前的视频编码中存在多种预测技术。例如,在H.266/VVC中,就使用到了以下各种预测技术。这里,针对这多种预测技术,可以划分为帧内预测技术、帧间预测技术和分量间预测技术。
(1)对于帧内预测技术而言
利用当前块上侧、左侧区域已经编码且解码后的重建像素为参考,预测当前的像素值。具体来说,包括但不限于以下各种帧内预测模式:
(a)平面(PLANAR)模式:主要用于图像纹理相对平滑且有渐变过程的区域,使用当前块内待预测像素点的上下左右4个相邻边界上的参考像素点进行线性插值求和平均(基于位置),来得到当前像素点的预测值。
(b)直流(Direct Current,DC)模式:主要用于图像平坦,纹理平滑,且没有渐变的区域,将上一行和左一列的所有参考像素求均值作为当前块内像素的预测值。
(c)角度预测模式:VVC采纳了更加精细的帧内预测方向,将HEVC中的33种角度预测模式扩展到了65种,利用上一行和左一列的所有参考像素进行角度投影,得到当前像素预测值。
(d)基于位置的帧内预测组合(Position Dependent intra Prediction Combination,PDPC):为一种对预测值的修正的技术。部分帧内模式在进行帧内预测之后,进行PDPC的加权平均计算,得到最终预测值,
(e)宽角度帧内预测模式(Wide-Angle Intra Prediction,WAIP):对于非方形的编码块(for non-square blocks),不再限制45°~-135°的角度预测范围,依据编码单元(Coding Unit,CU)的宽高比,自适应替换原始的角度预测模式,扩展出28种宽角度模式。
(f)多参考行帧内预测模式(Multiple reference line intra prediction,MRL):用多个可选的参考行(上侧相邻和左侧相邻)进行帧内预测。
(g)基于矩阵的帧内预测模式(Matrix weighted Intra Prediction,MIP):通过训练出来矩阵,将上一行和左一列的输入与基于深度学习训练出的矩阵相乘,得到当前块的预测值。
等等。
(2)对于帧间预测技术而言
利用参考帧中已重建像素值为参考,结合运动信息对当前编码块的像素值进行预测。具体来说,包括但不限于以下各种帧间预测模式:
(a)合并(Merge)模式:H.266/VVC中的Merge模式会为当前CU建立一个运动矢量(Motion Vector,MV)候选列表,包含6个候选运动信息,通过遍历这6个候选运动信息,并进行率失真代价的计算,最终选取率失真代价最小(运动估计)的一个候选作为该Merge模式的最优运动信息。
(b)带有运动矢量差(Motion Vector Difference,MVD)的Merge模式(Merge mode with MVD,MMVD):选取Merge列表中的前两个候选作为初始的运动矢量基。然后对两个初始MV分别进行扩展, 主要在4种运动方向上进行8种偏移步长的搜索,即在初始MV上加对应的偏移值(即MVD)扩展得到2×8×4=64个新的运动矢量,从中选出率失真代价最小的一个MV作为MMVD的最优的Merge候选。
(c)联合帧内帧间预测(Combined Inter and Intra Prediction,CIIP):首先对当前CU进行两种预测,一是利用帧内Planar模式预测得到帧内预测块(用P_intra表示),另一种是通过常规Merge列表中最优的运动候选运动补偿得到帧间预测块(用P_inter表示)。然后对帧内和帧间的预测值进行加权平均得到最终的帧内帧间联合预测值。
(d)几何划分预测模式(Geometric partitioning mode,GPM):一种除了正方形和矩形之外的其他形状划分模式,GPM规定将360°不等间隔量化出24种角度,每种角度下最多有4种偏移参数,总共组合出64种GPM划分模式。将帧间块划分为两个非矩形的子分区分别进行单向预测后进行加权融合得到预测值,从而更灵活的表示帧间预测数据,降低预测误差,从而提高编码效率。
(e)基于块的仿射变换运动补偿预测(Affine Motion Compensated Prediction,Affine):块的仿射运动场由两个控制点(4参数)或三个控制点运动向量(6参数)的运动信息描述。
(f)基于子块的时态运动矢量预测(Subblock-based Temporal Motion Vector Prediction,SbTMVP):使用同位图像中的运动场来改进当前图像中CUs的运动矢量预测和merge模式。
(g)带有CU级权重的双向加权预测(Bi-prediction with CU-level Weight,BCW):对于VVC中的双向预测模式允许对两个预测信号进行加权平均。
等等。
(3)对于分量间预测技术而言
对CU的色度分量进行编码时,亮度分量已经完成编码而获得亮度重建值。这时候编码色度分量的像素值时,就可以利用同区域的亮度重建值对其进行预测,该技术称为CCLM色度预测编码。具体来说,CCLM预测过程的输入是当前块的亮度重建值,以及上相邻和左相邻的参考像素,CCLM预测过程的输出是当前块的色度预测值。下面以CCLM预测模式为例进行说明。
为了利用亮度分量与色度分量之间的相关信息,基于对图像局部亮度分量和色度分量呈线性关系的假设,H.266/VVC提出了一种针对色度分量的编码技术,即CCLM技术。这里,CCLM技术可以通过使用建立的线性模型对CU的亮度重建值进行计算来得到编码块色度分量的预测值。具体如下所示,
Pred_C(i,j) = α·Rec_L(i,j) + β     (1)
其中，Pred_C(i,j)表示编码块的色度预测值，Rec_L(i,j)表示同一编码块中（经过下采样的）亮度重建值，α和β表示模型参数。
可以理解,CCLM技术的预测过程主要总结为四个步骤:①根据不同的CCLM预测模式确定亮度、色度的相邻参考像素范围以及可用性,并选定用于后续线性模型推导的相邻参考像素;②由于4:2:0的颜色分量采样格式下色度分量在水平方向、垂直方向上都为亮度分量的一半,为了使当前CU的亮度像素和色度像素一一对应,需要对亮度块进行下采样;③根据选定的相邻参考像素进行分组处理并进行线性模型参数的推导计算,以得到线性模型;④根据得到的线性模型计算色度预测值。
这里,模型参数(α和β)的推导如下:
CCLM模式共可以包括3种模式，分别为：LM模式，LM_T模式和LM_L模式。这三种模式的主要区别在于选择的相邻参考像素范围不同。假定当前色度块的宽度和高度分别表示为W、H，用Ref_top和Ref_left分别表示色度块上侧相邻参考像素的个数和左侧相邻参考像素的个数，用numLeftBelow和numTopRight分别表示当前色度块左下侧可用的相邻参考像素的个数和右上侧可用的相邻参考像素的个数，如此，可以将这三种模式的相邻参考像素的选择描述如下：
(i)LM模式。使用当前块的上一行与宽度相等数量的相邻参考像素和左一列与高度相等数量的相邻参考像素，即Ref_top=W，Ref_left=H。
LM模式下四个亮度样本点的位置分别为:
［公式图像（LM模式四个亮度样本点位置），原文以图像形式给出］
(ii)LM_T模式。仅使用当前块的上一行的相邻参考像素,并将范围扩展至当前块的右上侧相邻区域,其范围表示如下:
［公式图像（LM_T模式相邻参考像素范围），原文以图像形式给出］
LM_T模式下四个亮度样本点的位置分别为:
［公式图像（LM_T模式四个亮度样本点位置），原文以图像形式给出］
(iii)LM_L模式。仅使用当前块的左一列的相邻参考像素,并将范围扩展至当前块的左下侧相邻区域,其范围表示如下:
［公式图像（LM_L模式相邻参考像素范围），原文以图像形式给出］
LM_L模式下四个亮度样本点的位置分别为:
［公式图像（LM_L模式四个亮度样本点位置），原文以图像形式给出］
之后，可以对所选取的四个亮度样本点进行下采样，再进行四次比较，然后找出最小的两个点（用x_0A和x_1A表示）和最大的两个点（用x_0B和x_1B表示），对应的色度样本点分别用y_0A和y_1A、y_0B和y_1B表示。如图1所示，水平轴（即X轴）用于表示亮度（Luma），垂直轴（即Y轴）用于表示色度（Chroma）。在图1中，两个用黑色填充的点为最小的两个点，两个用白色填充的点为最大的两个点；在两个黑色填充的点之间，用网格线填充的点用Xa和Ya分别表示亮度均值和色度均值；在两个白色填充的点之间，用网格线填充的点用Xb和Yb分别表示亮度均值和色度均值。其中，Xa、Ya、Xb和Yb的计算如下：
Xa = (x_0A + x_1A + 1) >> 1；Ya = (y_0A + y_1A + 1) >> 1；
Xb = (x_0B + x_1B + 1) >> 1；Yb = (y_0B + y_1B + 1) >> 1。
这样,根据Xa、Ya、Xb和Yb可以推导出模型参数。其中,模型参数α的推导如式(4)所示,模型参数β的推导如式(5)所示。
α = (Yb - Ya) / (Xb - Xa)     (4)
β = Yb - α·Xb     (5)
在得到α和β之后,最终可以根据式(1)计算得到当前块的色度预测值。
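作为一个非规范性的示意，可用如下Python代码按式(4)、式(5)由选出的最小两点与最大两点推导模型参数α、β，并按式(1)计算色度预测值；其中样本点的选取与亮度下采样步骤从略，函数名与浮点除法均为示意性假设（实际标准采用整数化实现）：

```python
# 示意性实现：由四个样本点推导CCLM模型参数并计算色度预测值。
# x0a/x1a、y0a/y1a为最小两点的亮度/色度样值，x0b/x1b、y0b/y1b为最大两点。
def cclm_params(x0a, x1a, y0a, y1a, x0b, x1b, y0b, y1b):
    xa = (x0a + x1a + 1) >> 1   # 最小两点亮度均值 Xa
    ya = (y0a + y1a + 1) >> 1   # 对应色度均值 Ya
    xb = (x0b + x1b + 1) >> 1   # 最大两点亮度均值 Xb
    yb = (y0b + y1b + 1) >> 1   # 对应色度均值 Yb
    alpha = (yb - ya) / (xb - xa) if xb != xa else 0.0   # 式(4)，避免除零
    beta = yb - alpha * xb                               # 式(5)
    return alpha, beta

def cclm_predict(rec_luma_ds, alpha, beta):
    # 式(1)：Pred_C(i,j) = alpha * Rec_L(i,j) + beta，输入为已下采样的亮度重建值
    return [[alpha * v + beta for v in row] for row in rec_luma_ds]
```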
在现有的这多种预测技术中,由于目前已有的预测算法没有充分考虑不同分量间的相关性,导致降低了分量间预测的准确性,进而降低了压缩编码效率。另外,使用已有的CCLM预测方法确定的当前块的预测值与所述当前块周围相邻参考像素的样值之间还存在明显的不连续性,也就降低了预测准确性,进而降低了压缩编码效率。
基于此,本申请实施例提供了一种图像预测方法,既可以应用于编码器,又可以应用于解码器。该方法的基本思想是:在确定当前块的待预测图像分量的初始预测值之后,确定所述当前块的参考图像分量相关的样值;然后根据所述参考图像分量相关的样值,确定所述当前块的边信息;再利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值;最后根据所述目标预测值,对所述当前块的待预测图像分量进行编码或者解码。这样,可以利用图像分量之间的相关性,根据当前块的边信息和预设网络模型对初始预测值进行预测增强,使得增强后的目标预测值更接近于真实值,从而有效提高了预测精度,进而提高了编解码效率,同时提高了整体编解码的性能。
下面将结合附图对本申请各实施例进行详细阐述。
参见图2A,其示出了本申请实施例提供的一种视频编码系统的组成框图示例;如图2A所示,该视频编码系统10包括变换与量化单元101、帧内估计单元102、帧内预测单元103、运动补偿单元104、运动估计单元105、反变换与反量化单元106、滤波器控制分析单元107、滤波单元108、编码单元109和解码图像缓存单元110等,其中,滤波单元108可以实现去方块(Deblocking)滤波、样本自适应缩进(Sample Adaptive 0ffset,SAO)滤波、以及自适应环路滤波(Adaptive loop Filter,ALF)等,编码单元109可以实现头信息编码及基于上下文的自适应二进制算术编码(Context-based Adaptive Binary Arithmatic Coding,CABAC)。针对输入的原始视频信号,通过编码树块(Coding Tree Unit,CTU)的划分可以得到一个视频编码块,然后对经过帧内或帧间预测后得到的残差像素信息通过变换与量化单元101对该视频编码块进行变换,包括将残差信息从像素域变换到变换域,并对所得的变换系数进行量化,用以进一步减少比特率;帧内估计单元102和帧内预测单元103是用于对该视频编码块进行帧内预测;明确地说,帧内估计单元102和帧内预测单元103用于确定待用以编码该视频编码块的帧内预测模式;运动补偿单元104和运动估计单元105用于执行所接收的视频编码块相对于一或多个参考帧中的一或多个块的帧间预测编码以提供时间预测信息;由运动估计单元105执行的运动估计为产生运动向量的过程,所述运动向量可以估计该视频编码块的运动,然后由运动补偿单元104基于由运动估计单元105所确定的运动向量执行运动补偿;在确定帧内预测模式之后,帧内预测单元103还用于将所选择的帧内预测数据提供到编码单元109,而且运动估计单元105将所计算确定的运动向量数据也发送到编码单元109;此外,反变换与反量化单元106是用于该视频编码块的重构建,在像素域中重构建残差块,该重构建残差块通过滤波器控制分析单元107和滤波单元108去除方块效应伪影,然后将该重构残差块添加到解码图像缓存单元110的帧中的一个预测性块,用以产生经重构建的视频编码块;编码单元109是用于编码各种编码参数及量化后的变换系数,在基于CABAC的编码算法中,上下文内容可基于相邻编码块,可用于编码指示所确定的帧内预测模式的信息,输出该视频信号的码流;而解码图像缓存单元110是用于存放重构建的视频编码块,用于预测参考。随着视频图像编码的进行,会不断生成新的重构建的视频编 码块,这些重构建的视频编码块都会被存放在解码图像缓存单元110中。
参见图2B,其示出了本申请实施例提供的一种视频解码系统的组成框图示例;如图2B所示,该视频解码系统20包括解码单元201、反变换与反量化单元202、帧内预测单元203、运动补偿单元204、滤波单元205和解码图像缓存单元206等,其中,解码单元201可以实现头信息解码以及CABAC解码,滤波单元205可以实现去方块滤波、SAO滤波以及ALF滤波等。输入的视频信号经过图2A的编码处理之后,输出该视频信号的码流;该码流输入视频解码系统20中,首先经过解码单元201,用于得到解码后的变换系数;针对该变换系数通过反变换与反量化单元202进行处理,以便在像素域中产生残差块;帧内预测单元203可用于基于所确定的帧内预测模式和来自当前帧或图片的先前经解码块的数据而产生当前视频解码块的预测数据;运动补偿单元204是通过剖析运动向量和其他关联语法元素来确定用于视频解码块的预测信息,并使用该预测信息以产生正被解码的视频解码块的预测性块;通过对来自反变换与反量化单元202的残差块与由帧内预测单元203或运动补偿单元204产生的对应预测性块进行求和,而形成解码的视频块;该解码的视频信号通过滤波单元205以便去除方块效应伪影,可以改善视频质量;然后将经解码的视频块存储于解码图像缓存单元206中,解码图像缓存单元206存储用于后续帧内预测或运动补偿的参考图像,同时也用于视频信号的输出,即得到了所恢复的原始视频信号。
需要说明的是,本申请实施例中的图像预测方法是在各项预测技术进行之后对初始预测值进行增强。这里,本申请实施例中的图像预测方法可以应用于视频编码系统,即该图像预测方法可以是在图2A的预测部分(如图2A中的黑色加粗框图部分)之后应用,或者还可以是在环路滤波部分(如图2A中的灰色加粗框图部分)之前或之后应用。本申请实施例中的图像预测方法还可以应用于视频解码系统,即该图像预测方法可以是在图2B的预测部分(如图2B中的黑色加粗框图部分)之后应用,或者还可以是在环路滤波部分(如图2B中的灰色加粗框图部分)之前或之后应用。也就是说,本申请实施例中的图像预测方法,既可以应用于视频编码系统,也可以应用于视频解码系统,甚至还可以同时应用于视频编码系统和视频解码系统,但是本申请实施例不作具体限定。
还需要说明的是,在进行详细阐述之前,说明书通篇中提到的“第一”、“第二”、“第三”等,仅仅是为了区分不同的特征,不具有限定优先级、先后顺序、大小关系等功能。
本申请的一实施例中,本申请实施例提供的图像预测方法应用于视频编码设备,即编码器。该方法所实现的功能可以通过编码器中的第一处理器调用计算机程序来实现,当然计算机程序可以保存在第一存储器中,可见,该编码器至少包括第一处理器和第一存储器。
基于上述图2A的应用场景示例,参见图3,其示出了本申请实施例提供的一种图像预测方法的流程示意图。如图3所示,该方法可以包括:
S301:确定当前块的待预测图像分量的初始预测值。
需要说明的是,视频图像可以划分为多个图像块,每个当前待编码的图像块可以称为编码块(Coding Block,CB)。这里,每个编码块可以包括第一图像分量、第二图像分量和第三图像分量;而当前块为视频图像中当前待进行第一图像分量或者第二图像分量或者第三图像分量预测的编码块。
其中,假定当前块进行第一图像分量预测,而且第一图像分量为亮度分量,即待预测图像分量为亮度分量,那么当前块也可以称为亮度块;或者,假定当前块进行第二图像分量预测,而且第二图像分量为色度分量,即待预测图像分量为色度分量,那么当前块也可以称为色度块。
在一些实施例中,所述确定当前块的待预测图像分量对应的初始预测值,可以包括:
确定所述当前块的目标预测模式;
根据所述目标预测模式对所述当前块的待预测图像分量进行预测,确定所述当前块的待预测图像分量的初始预测值。
需要说明的是,目标预测模式用于指示当前块编码预测采用的预测模式。这里,针对目标预测模式的确定,可以采用简单的决策策略,比如根据失真值的大小进行确定;也可以采用复杂的决策策略,比如根据率失真优化(Rate Distortion Optimization,RDO)的代价结果进行确定,本申请实施例不作任何限定。通常而言,本申请实施例可以采用RDO方式来确定当前块的目标预测模式。
具体地,在一些实施例中,所述确定所述当前块的目标预测模式,可以包括:
确定所述当前块的待预测图像分量;
利用一种或多种候选预测模式分别对所述待预测图像分量进行预编码,确定所述候选预测模式对应的率失真代价结果;
从所述率失真代价结果中选取最优率失真代价结果,并将所述最优率失真代价结果对应的候选预测模式确定为所述当前块的目标预测模式。
需要说明的是,在编码器侧,针对当前块可以采用一种或多种候选预测模式分别对当前块的待预测图像分量进行预编码处理。这里,候选预测模式通常包括有帧内预测模式、帧间预测模式和分量间预测 模式。其中,帧内预测模式可以包括PLANAR模式、DC模式、角度预测模式、PDCP模式、WAIP模式、MRL模式和MIP模式等,帧间预测模式可以包括Merge模式、MMVD模式、CIIP模式、GPM模式、SbTMVP模式和BCW模式等。分量间预测模式可以包括同图像分量间预测模式和跨图像分量间预测模式。
也就是说,在一些实施例中,目标预测模式可以是帧内预测模式或帧间预测模式。更甚者,目标预测模式还可以是同图像分量间预测模式或跨图像分量间预测模式。一般情况下,分量间预测模式通常是指跨图像分量间预测模式,比如CCLM模式。
这样,在利用一种或多种候选预测模式分别对当前块的待预测图像分量进行预编码之后,可以得到候选预测模式对应的率失真代价结果;然后从所得到的率失真代价结果中选取最优率失真代价结果,并将该最优率失真代价结果对应的候选预测模式确定为当前块的目标预测模式。除此之外,还可以在利用一种或多种候选预测模式分别对当前块的待预测图像分量进行预编码之后,可以得到候选预测模式对应的失真值;然后从所得到的失真值中选取最小失真值,并将最小失真值对应的候选预测模式确定为当前块的目标预测模式。如此,最终使用所确定的目标预测模式对当前块进行编码,可以使得预测残差较小,能够提高编码效率。
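下面给出目标预测模式选取过程的一个简化示意；其中rd_cost为假定的外部函数，表示利用某一候选预测模式对待预测图像分量进行预编码后得到的率失真代价：

```python
# 示意：遍历候选预测模式，选取率失真代价最小的模式作为目标预测模式
def select_target_mode(candidate_modes, rd_cost):
    best_mode, best_cost = None, float("inf")
    for mode in candidate_modes:
        cost = rd_cost(mode)              # 对该模式预编码并计算率失真代价
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode
```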
S302:确定所述当前块的参考图像分量相关的样值。
需要说明的是,参考图像分量可以包括当前图像中不同于待预测图像分量的一个或多个图像分量,其中,当前图像是所述当前块所处的图像。这里的当前图像是指可以使用当前图像中的不同图像分量作为本申请实施例中滤波增强的输入。
在一些实施例中,所述确定所述当前块的参考图像分量相关的样值,可以包括:
根据所述当前块的参考图像分量的预测值和重建值中至少之一,确定所述参考图像分量相关的样值。
需要说明的是,这里参考图像分量的预测值可以解释为“根据当前块参考图像分量的预测模式,确定该预测值”,这里参考图像分量的重建值可以解释为“根据当前块参考图像分量的预测模式在得到预测值之后,通过编码重建所得到的重建值”。也就是说,可以根据当前块的参考图像分量的预测值,或者可以根据当前块的参考图像分量的重建值,或者还可以根据当前块的参考图像分量的预测值和重建值,用以确定出参考图像分量相关的样值。
进一步地,在一些实施例中,所述确定所述当前块的参考图像分量相关的样值,可以包括:
根据所述当前块的相邻像素对应于所述参考图像分量的预测值和重建值中至少之一,确定所述参考图像分量相关的样值。
需要说明的是,这里参考图像分量的预测值可以解释为“根据所述相邻像素所处的图像块对应于参考图像分量的预测模式,确定该预测值”,这里参考图像分量的重建值可以解释为“根据所述相邻像素所处的图像块对应于参考图像分量的预测模式在得到预测值之后,通过编码重建所得到的重建值”。也就是说,可以根据当前块的相邻像素对应于参考图像分量的预测值,或者可以根据当前块的相邻像素对应于参考图像分量的重建值,或者还可以根据当前块的相邻像素对应于参考图像分量的预测值和重建值,用以确定出参考图像分量相关的样值。
还需要说明的是,当前块的相邻像素可以包括与所述当前块相邻的至少一行像素。或者,所述当前块的相邻像素也可以包括与所述当前块相邻的至少一列像素。
在本申请实施例中,针对当前块的参考图像分量相关的样值的确定,具体来讲,当前块的参考图像分量相关的样值至少可包括下述其中一项:当前块的参考图像分量的重建值、与当前块相邻的至少一行像素对应的图像分量值、与当前块相邻的至少一列像素对应的图像分量值。其中,参考图像分量与被预测图像分量不同,图像分量值可以包括被预测图像分量参考值和/或参考图像分量参考值。这里的参考值可以是预测值或者重建值。
也可以说,当前块的参考图像分量相关的样值可以包括所述当前块的参考图像分量的重建值、与所述当前块相邻的至少一行像素对应的参考图像分量值和与所述当前块相邻的至少一列像素对应的参考图像分量值中的至少两项,其中,所述当前块的参考图像分量不同于所述待预测图像分量。
另外,针对帧间预测模式而言,参考图像分量可以包括参考图像中一个或多个图像分量,其中,所述参考图像不同于所述当前块所处的当前图像。
需要说明的是,这里的参考图像不同于当前块所在的当前图像,而且参考图像是指可以使用参考图像中的不同图像分量作为本申请实施例中滤波增强的输入。
在一些实施例中,所述确定所述当前块的参考图像分量相关的样值,可以包括:
根据所述当前块的预测参考块中一个或多个图像分量的重建值,确定所述参考图像分量相关的样值。
进一步地,在一些实施例中,该方法还可以包括:
当目标预测模式指示帧间预测模式时,确定所述当前块的帧间预测模式参数,其中,所述帧间预测 模式参数包括指示所述参考图像对应的参考图像索引和指示所述参考图像中所述预测参考块的运动矢量;
将所确定的帧间预测模式参数写入码流。
需要说明的是,针对帧间预测模式,这时候除了当前块所在的当前图像之外,还需要有参考图像。其中,参考图像索引是指参考图像对应的图像索引序号,运动矢量则是用于指示参考图像中的预测参考块。
还需要说明的是,对于帧间预测模式,参考图像索引和运动矢量可以作为帧间预测模式参数并写入码流,以便由编码器传输到解码器。
这样,在得到当前块的参考图像分量相关的样值之后,可以根据参考图像分量相关的样值来确定当前块的边信息。
S303:根据所述参考图像分量相关的样值,确定所述当前块的边信息。
需要说明的是,在本申请实施例中,本申请的技术关键是使用一个图像分量的相关参数对另一个图像分量的初始预测值进行增强滤波。这里,“一个图像分量”的相关参数主要为当前块的边信息。其中,一方面可以直接将当前块的参考图像分量相关的样值确定为当前块的边信息;另一方面也可以对参考图像分量相关的样值进行上采样/下采样等滤波处理,将滤波后的样值确定为当前块的边信息,本申请实施例不作任何限定。
也就是说,在一种可能的实施方式中,所述根据所述参考图像分量相关的样值,确定所述当前块的边信息,可以包括:将所述参考图像分量相关的样值确定为所述当前块的边信息。
在另一种可能的实施方式中,所述根据所述参考图像分量相关的样值,确定所述当前块的边信息,可以包括:根据颜色分量采样格式,对所述参考图像分量相关的样值进行第一滤波处理,得到滤波后的参考图像分量相关的样值;将所述滤波后的参考图像分量相关的样值确定为所述当前块的边信息。
需要说明的是,颜色分量可以包括亮度分量、蓝色色度分量和红色色度分量,而颜色分量采样格式可以有4:4:4格式、4:2:2格式和4:2:0格式。其中,4:4:4格式表示相对于亮度分量,蓝色色度分量或红色色度分量均没有下采样。4:2:2格式表示蓝色色度分量或红色色度分量相对于亮度分量进行2:1的水平下采样,没有竖直下采样。4:2:0格式表示蓝色色度分量或红色色度分量相对于亮度分量进行2:1的水平下采样和2:1的竖直下采样。也就是说,4:2:2格式和4:2:0格式适用于上述的第一滤波处理,而4:4:4格式不适用于上述的第一滤波处理。
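作为示意，可将上述颜色分量采样格式与色度相对于亮度的（水平，垂直）下采样因子对应如下（仅为说明性映射）：

```python
# 示意：颜色分量采样格式 -> 色度相对亮度的（水平，垂直）下采样因子
CHROMA_SUBSAMPLING = {
    "4:4:4": (1, 1),  # 色度无下采样
    "4:2:2": (2, 1),  # 水平2:1下采样，无垂直下采样
    "4:2:0": (2, 2),  # 水平、垂直均2:1下采样
}
```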
进一步地,在一些实施例中,该方法还可以包括:
将所述颜色分量采样格式写入码流。
或者,在一些实施例中,该方法还可以包括:
确定待写入码流的比特字段的取值;其中,所述比特字段的取值用于指示所述颜色分量采样格式;
将所述比特字段的取值写入码流。
也就是说,编码器侧,在确定颜色分量采样格式之后,可以将颜色分量采样格式写入码流;或者,确定出用于指示颜色分量采样格式的比特字段的取值,该比特字段的取值写入码流;然后由编码器传输到解码器,以便解码器解析码流后直接获得颜色分量采样格式。
还需要说明的是,由于亮度分量和色度分量(比如蓝色色度分量或红色色度分量)的分辨率不同,这时候根据具体情况需要对参考图像分量相关的样值进行第一滤波处理,比如上采样/下采样处理。具体地,在一些实施例中,所述对所述参考图像分量相关的样值进行第一滤波处理,可以包括:
当所述初始预测值的分辨率小于所述参考图像分量相关的样值的分辨率时,对所述参考图像分量相关的样值进行下采样处理;
当所述初始预测值的分辨率大于所述参考图像分量相关的样值的分辨率时,对所述参考图像分量相关的样值进行上采样处理;
当所述初始预测值的分辨率等于所述参考图像分量相关的样值的分辨率时,将所述滤波后的参考图像分量相关的样值设置为等于所述参考图像分量相关的样值。
在本申请实施例中,滤波后的参考图像分量相关的样值的分辨率与初始预测值的分辨率相等。
示例性地,如果待预测图像分量为色度分量,而参考图像分量为亮度分量,那么需要对参考图像分量相关的样值进行下采样处理,而且下采样处理后的亮度分量的分辨率与色度分量的分辨率相同;或者,如果待预测图像分量为亮度分量,而参考图像分量为色度分量,那么需要对参考图像分量相关的样值进行上采样处理,而且上采样处理后的色度分量的分辨率与亮度分量的分辨率相同。此外,如果待预测图像分量为蓝色色度分量,而参考图像分量为红色色度分量,那么由于蓝色色度分量的分辨率和红色色度分量的分辨率相同,此时不需要对参考图像分量相关的样值进行第一滤波处理,即可以将滤波后的参考图像分量相关的样值设置为等于滤波之前参考图像分量相关的样值。
这样,在确定出参考图像分量相关的样值之后,可以根据参考图像分量相关的样值确定出当前块的边信息,以便利用当前块的边信息对待预测图像分量的初始预测值进行滤波增强。
S304:利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值。
需要说明的是,本申请实施例利用预设网络模型,该预设网络模型可以命名为局部数据短接的神经网络模型,或者命名为半残差网络(Semi-residual Network),然后结合当前块的边信息对初始预测值进行滤波增强,用以提高预测的准确度。
还需要说明的是,当前块的边信息并不等同于本领域人员常规理解的“side information”,本申请实施例的边信息主要为:使用“一个图像分量”对“另一个图像分量”的预测值进行滤波增强。另外,边信息可以为“一个或多个图像分量”相关的参数,这些参数可以是用于获得“一个或多个图像分量”的预测值或者重建值的参数,也可以直接是“一个或多个图像分量”的预测值或者重建值。也就是说,本申请实施例中,当前块的边信息可以是根据当前块的参考图像分量相关的样值确定的。
对于预设网络模型而言,在S304之前,该方法还可以包括:确定预设网络模型。
在本申请实施例中,预设网络模型是通过模型训练得到的。在一些实施例中,具体可以包括:
获取训练样本集;其中,所述训练样本集包括一个或多个图像;
构建初始网络模型,利用所述训练样本集对所述初始网络模型进行训练;
将训练后的初始网络模型确定为所述预设网络模型。
需要说明的是,训练样本集可以包括有一个或多个图像。训练样本集可以是编码器在本地存储的训练样本集合,也可以是根据链接或者地址信息从远程服务器上获取的训练样本集合,甚至也可以是视频中已经解码的图像样本集合,本申请实施例不作具体限定。
这样,在获取到训练样本集之后,可以利用训练样本集通过代价函数对初始网络模型进行训练,当该代价函数的损失值(Loss)收敛到一定预设阈值时,这时候训练得到的初始网络模型即为预设网络模型。这里,代价函数可以为率失真代价函数,预设阈值可以根据实际情况进行具体设定,本申请实施例不作任何限定。
还需要说明的是,所述确定预设网络模型,可以先确定出预设网络模型中的网络模型参数。在一些实施例中,所述确定所述预设网络模型,可以包括:
确定网络模型参数;
根据所确定的网络模型参数,构建所述预设网络模型。
在本申请实施例中,网络模型参数可以是通过模型训练确定的。具体地,在一些实施例中,可以包括:获取训练样本集;构建初始网络模型,其中,所述初始网络模型包括模型参数;利用所述训练样本集对所述初始网络模型进行训练,将训练后的初始网络模型中的模型参数确定为所述网络模型参数。
这时候,在编码器侧,通过模型训练得到网络模型参数之后,可以将网络模型参数写入码流。这样,解码器侧可直接通过解析码流来获得网络模型参数,而无需在解码器侧进行模型训练就能够构建出预设网络模型。
在一些实施例中,预设网络模型可以包括神经网络模型和第一加法器。
需要说明的是,神经网络模型可以包括至少一项:卷积层、残差层、平均池化层和采样率转换模块。这里,卷积神经网络(Convolutional Neural Networks,CNN)是一类包含卷积计算且具有深度结构的前馈神经网络(Feedforward Neural Networks),是深度学习(deep learning)的代表算法之一。卷积神经网络具有表征学习(representation learning)能力,能够按其阶层结构对输入信息进行平移不变分类(shift-invariant classification),因此也被称为“平移不变人工神经网络(Shift-Invariant Artificial Neural Networks,SIANN)”。神经网络已经发展到了深度学习阶段。深度学习是机器学习的分支,是一种试图使用包含复杂结构或由多重非线性变换构成的多个处理层对数据进行高层抽象的算法,其强大的表达能力使其在视频和图像处理上的表现取得了良好的效果。
还需要说明的是,残差层可以是由激活函数、卷积层和第二加法器组成,但是这里不作具体限定。其中,激活函数可以是线性整流函数(Rectified Linear Unit,ReLU),又可称为修正线性单元,是一种人工神经网络中常用的激活函数,通常指代以斜坡函数及其变种为代表的非线性函数。
进一步地,在一些实施例中,所述利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值,可以包括:
将所述当前块的边信息和所述待预测图像分量的初始预测值输入到所述预设网络模型,通过所述预设网络模型输出所述待预测图像分量的目标预测值。
需要说明的是,由于预设网络模型可以包括神经网络模型和第一加法器,那么将所述当前块的边信息和所述待预测图像分量的初始预测值输入到所述预设网络模型,通过所述预设网络模型输出所述待预 测图像分量的目标预测值,具体可以包括:将所述边信息和所述初始预测值输入所述神经网络模型,输出中间值;通过所述第一加法器对所述中间值和所述初始预测值进行相加处理,得到所述目标预测值。
也就是说,如图4所示,预设网络模型包括有神经网络模型401和第一加法器402。其中,输入为边信息和待预测图像分量的初始预测值,在经过神经网络模型401的处理后,可以得到中间值,然后通过第一加法器402对中间值和待预测图像分量的初始预测值进行相加处理,最终的输出为待预测图像分量的目标预测值。换句话说,该预设网络模型实现了从二通道输入降至一通道输出。
S305:根据所述目标预测值,对所述当前块的待预测图像分量进行编码。
需要说明的是,在得到待预测图像分量的目标预测值之后,可以对当前块的待预测图像分量进行编码。具体地,根据目标预测值,可以计算当前块的残差值(即目标预测值与真实值之间的差值),然后对残差值进行编码并写入码流。
本实施例提供了一种图像预测方法,应用于编码器。通过确定当前块的待预测图像分量的初始预测值;确定所述当前块的参考图像分量相关的样值;根据所述参考图像分量相关的样值,确定所述当前块的边信息;利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值;根据所述目标预测值,对所述当前块的待预测图像分量进行编码。这样,可以利用图像分量之间的相关性,根据当前块的边信息和预设网络模型对初始预测值进行预测增强,使得增强后的目标预测值更接近于真实值,从而有效提高了预测精度,进而提高了编解码效率,同时提高了整体编解码的性能。
本申请的另一实施例中,下面将结合几种具体应用场景对本申请实施例的图像预测方法进行详细阐述。
本申请实施例的技术方案提出利用局部数据短接的神经网络技术(或称为半残差网络),结合当前块周围的边信息与当前块的相关性,对预测后的初始预测值进行滤波增强,以提高预测准确度。也就是说,本申请实施例聚焦在对一个图像分量的初始预测值使用不同与该图像分量的其他一个或多个图像分量对该初始预测值进行滤波。其中,初始预测值可以使用普通的非分量间预测方式(如帧内预测模式和帧间预测模式)得到,也可以使用分量间预测方式(如CCLM模式)得到。
(1)参见图5,其示出了本申请实施例提供的另一种图像预测方法的应用场景示意图。其中,假定待预测图像分量为色度分量(用Cb或Cr表示),参考图像分量为当前块的亮度分量(用L表示)。注意,这里的色度块的初始预测值(Cb pred和Cr pred)可以是利用CCLM模式预测得到的,也可以是其他帧内预测模式或者帧间预测模式预测得到的。
在一种可能的实施方式中,如果当前块的边信息为当前块的亮度分量重建值,待预测图像分量为色度分量时,
所述利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值,可以包括:
基于颜色分量采样格式,对亮度分量重建值进行下采样滤波处理,得到滤波后的重建值;
将滤波后的重建值和色度分量的初始预测值输入到预设网络模型中,通过预设网络模型输出色度分量的目标预测值。
在另一种可能的实施方式中,如果当前块的边信息为当前块的亮度分量重建值,待预测图像分量为色度分量时,
所述利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值,可以包括:
基于颜色分量采样格式,对色度分量的初始预测值进行上采样滤波处理,得到滤波后的初始预测值;
将当前块的亮度分量重建值和滤波后的初始预测值输入到预设网络模型中,通过预设网络模型输出色度分量的目标预测值。
如图5所示,当前块的边信息是指亮度分量重建值(L rec),当前块的待预测图像分量的初始预测值是指色度分量初始预测值(Cb pred或Cr pred)。这时候利用当前块的亮度分量重建值对经过色度分量预测的初始预测值进行细化,经过图5中所示的预设网络模型后可以得到增强后的色度分量目标预测值(Cb pred’/Cr pred’)。
具体来讲,预设网络模型的输入为当前块的亮度分量重建值L rec以及当前块的色度分量预测值Cb pred/Cr pred;其中,对于当前块的亮度分量重建值L rec,根据不同的颜色分量采样格式,假定为4:2:0格式,这时候经过2倍下采样(如果为4:4:4格式,那么无需该下采样步骤)后,亮度分量大小与色度分量大小对齐。由于该预设网络模型有效地学习了两个输入的相关性,同时将该预设网络模型的其中一个输入(色度分量预测值Cb pred/Cr pred)连接至该模型的输出,使得输出Cb pred’/Cr pred’相比于原来的 色度分量预测值Cb pred/Cr pred,更接近当前块的真实色度值(或称为原始色度值)。
进一步地,如图6所示,其示出了本申请实施例提供的一种预设网络模型的网络结构示意图。这里,该预设网络模型可以包括神经网络模型601和第一加法器602两部分。其中,神经网络模型601可以由卷积层、残差层、平均池化层和采样转换模块堆叠而成,并且将预设网络模型的其中一个输入(色度分量预测值Cb pred/Cr pred)连接至预设网络模型的输出,与神经网络模型的输出相加得到Cb pred’/Cr pred’。对于残差层,也可以称为残差块(Residual Block,Resblock),其网络结构示例如图7所示。
需要说明的是,在图6中,在神经网络模型601中,针对两个输入(L rec和Cb pred/Cr pred),首先针对L rec进行两倍下采样,然后通过拼接层(Concatenate)进行拼接;再通过卷积层进行卷积操作以提取特征图,以及通过残差层(Resblock)、平均池化层、采样转换模块和两个卷积层等处理后输出中间值,最后利用第一加法器602对其中一个输入(色度分量预测值Cb pred/Cr pred)和中间值进行相加,输出Cb pred’/Cr pred’。这里,卷积层可以分为第一卷积层和第二卷积层,第一卷积层为Conv(3,64,1),即卷积核为3*3,通道数为64,步长为1;第二卷积层为Conv(3,1,1),即卷积核为3*3,通道数为1,步长为1。另外,平均池化层(Avg-pooling)具有下采样功能,因此在该神经网络模型中还可以包括有采样转换模块。采样转换模块可以是指上采样模块(Up-sampling),也可以是指下采样模块(Under-sampling)。在本申请实施例中,采样转换模块通常是指上采样模块(Up-sampling),如图6中神经网络模型601的示例。
还需要说明的是,在图7中,残差层可以是由残差网络701和第二加法器702组成。其中,残差网络701可以由激活函数和卷积层组成,激活函数可以用ReLU表示,卷积层为第一卷积层,即Conv(3,64,1)。这里,残差层的输入(Input)通过残差网络701后得到的输出,将会和残差层的输入由第二加法器702进行相加,相加后得到残差层的输出(Output)。
另外,在图6的网络结构示例中,对于神经网络模型601,总共可以包括有1个拼接层、2个第一卷积层、6个残差层、2个平均池化层、2个上采样模块和1个第二卷积层。需要注意的是,该网络结构并不唯一,还可以为其他堆叠方式或其他网络结构,本申请实施例不作具体限定。
示例性地,参见图8,其示出了本申请实施例提供的另一种预设网络模型的网络结构示意图。参见图9,其示出了本申请实施例提供的又一种预设网络模型的网络结构示意图。参见图10,其示出了本申请实施例提供的再一种预设网络模型的网络结构示意图。这里,图8中是通过对亮度分量重建值(L rec)进行下采样,以实现亮度分量重建值(L rec)的分辨率和色度分量预测值(Cb pred/Cr pred)的分辨率相同。图9和图10则是通过对色度分量预测值(Cb pred/Cr pred)进行上采样,以实现亮度分量重建值(L rec)的分辨率和色度分量预测值(Cb pred/Cr pred)的分辨率相同。换句话说,图8、图9和图10给出了三种网络结构的替换示例,用以说明预设网络模型的网络结构并不唯一,还可以为其他堆叠方式或其他网络结构。
(2)参见图11,其示出了本申请实施例提供的又一种图像预测方法的应用场景示意图。其中,假定待预测图像分量为色度分量(用Cb或Cr表示),当前块的边信息为当前块的亮度分量重建值(用L rec表示)和上相邻块的亮度分量重建值(用TopL rec表示)。
参见图12,其示出了本申请实施例提供的再一种图像预测方法的应用场景示意图。其中,假定待预测图像分量为色度分量(用Cb或Cr表示),当前块的边信息为当前块的亮度分量重建值(用L rec表示)和左相邻块的亮度分量重建值(用LeftL rec表示)。
参见图13,其示出了本申请实施例提供的再一种图像预测方法的应用场景示意图。其中,假定待预测图像分量为色度分量(用Cb或Cr表示),当前块的边信息为当前块的亮度分量重建值(用L rec表示)和上相邻块的色度分量预测值(用TopCb pred/TopCr pred表示)。
参见图14,其示出了本申请实施例提供的再一种图像预测方法的应用场景示意图。其中,假定待预测图像分量为色度分量(用Cb或Cr表示),当前块的边信息为当前块的亮度分量重建值(用L rec表示)和左相邻块的色度分量预测值(用LeftCb pred/LeftCr pred表示)。
需要说明的是,预设网络模型的边信息可以是其他边信息。对于色度分量的块来说,边信息可以为当前块的上相邻块和左相邻块的亮度分量(TopL rec,LeftL rec)和色度分量(TopCb pred/TopCr pred,LeftCb pred/LeftCr pred),等等,具体如图11、图12、图13和图14所示。
(3)参见图15,其示出了本申请实施例提供的再一种图像预测方法的应用场景示意图。其中,假定待预测图像分量为色度分量(用Cb或Cr表示),当前块的边信息为当前块的亮度分量重建值(用L rec表示)、上相邻块的亮度分量重建值(用TopLrec表示)、左相邻块的亮度分量重建值(用LeftLrec表示)、上相邻块的色度分量预测值(用TopCbpred/TopCrpred表示)和左相邻块的色度分量预测值(用LeftCbpred/LeftCrpred表示)。
在一些实施例中,如果当前块的边信息包括当前块的参考图像分量的重建值、与当前块相邻的至少 一行像素对应的图像分量值和与当前块相邻的至少一列像素对应的图像分量值中的至少两项时,
所述利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值,可以包括:
基于所述至少两项,组成联合边信息;
将所述联合边信息和所述被预测图像分量的初始预测值输入到所述预设网络模型中,通过所述预设网络模型输出所述被预测图像分量的目标预测值。
需要说明的是,预设网络模型的边信息可以是联合边信息。甚至可以联合全部边信息作为输入或者联合部分边信息作为输入,如图15所示。
(4)参见图16,其示出了本申请实施例提供的再一种图像预测方法的应用场景示意图。如图16所示,在得到当前块的色度分量初始预测值之后,将当前亮度块的上和左相邻两行/列(或多行/列)与当前亮度块(用斜线填充)拼接,并进行2倍下采样,当前色度块的上和左相邻一行/列(或多行/列)与当前色度块(用网格线填充)拼接,二者共同作为预设网络模型的输入。同时将当前色度块连接至预设网络模型的输出,最终得到滤波增强后色度分量的目标预测值(Cb pred’/Cr pred’)。
进一步地,在一些实施例中,联合边信息包括所述当前块上侧相邻的至少一行参考像素对应的被预测图像分量参考值和所述当前块左侧相邻的至少一列参考像素对应的被预测图像分量参考值,该方法还可以包括:
将所述当前块上侧相邻的至少一行参考像素对应的被预测图像分量参考值、所述当前块左侧相邻的至少一列参考像素对应的被预测图像分量参考值和所述当前块的被预测图像分量的初始预测值输入到所述预设网络模型中,通过所述预设网络模型输出所述当前块的被预测图像分量的目标预测值。
也就是说,如图17所示,在得到亮度分量的初始预测值(L pred)之后,可利用当前亮度块的上和左相邻两行(或多行)与当前亮度块(用斜线填充)拼接,作为预设网络模型的输入。同时将当前亮度块连接至预设网络模型的输出,最终得到滤波增强后的亮度分量的目标预测值(L pred’)。
除此之外,在本申请实施例中,在环路滤波环节之前或者之后,也可以使用本申请实施例所提出的输入数据局部短接型网络(即预设网络模型)对重建数据进行增强。或者,在其他用于有损或无损压缩编码网络框架中,在得到相应预测信息后,也可以增加一个本申请实施例提出的输入数据局部短接型网络,将其他边信息作为输入的方式以提高预测信息的精度。或者,在其他用于有损或无损压缩编码网络框架中,也可使用上述的预设网络模型进行滤波增强方法。还需要说明的是,针对视频编码中在其他亮度或色度分量编码后所得到的初始预测值,也可以利用上述的方式来提高预测值的精度。
简言之,本申请实施例在于利用一个局部输入数据短接型深度学习网络以增强视频编码中预测值的准确度,进而提高编码效率。一方面,待增强数据(如上述实施例中的色度分量初始预测值)和边信息(如上述实施例中的当前块的亮度分量重建值)同时作为输入,利用待增强数据和边信息(如色度分量与亮度分量)之间的高度相关性对待增强数据进行增强,进而提高整体编解码的性能。另一方面,同时将预设网络模型的部分输入,即待增强数据(如色度分量初始预测值)连接至预设网络模型的输出,还可以有利于网络模型的训练。这样,可以利用数据之间的相关性,用局部输入数据短接型网络对预测数据进行增强;比如利用色度分量与亮度分量之间的高度相关性,对色度分量的初始预测值利用神经网络增强以有效提高对色度分量的预测精度,进而提高编码效率;而且局部输入数据短接型网络自身也易于训练和应用于实际场景的运算。
本实施例提供了一种图像预测方法,通过上述实施例对前述实施例的实现进行具体阐述,从中可以看出,利用图像分量之间的相关性,根据当前块的边信息和预设网络模型对初始预测值进行预测增强,可以使得增强后的目标预测值更接近于真实值,从而有效提高了预测精度,进而提高了编解码效率,同时提高了整体编解码的性能。
本申请的又一实施例中,本申请实施例提供的图像预测方法应用于视频解码设备,即解码器。该方法所实现的功能可以通过解码器中的第二处理器调用计算机程序来实现,当然计算机程序可以保存在第二存储器中,可见,该解码器至少包括第二处理器和第二存储器。
基于上述图2B的应用场景示例,参见图18,其示出了本申请实施例提供的另一种图像预测方法的流程示意图。如图18所示,该方法可以包括:
S1801:解析码流,获取当前块的目标预测模式。
需要说明的是,视频图像可以划分为多个图像块,每个当前待解码的图像块可以称为解码块。这里,每个解码块也可以包括第一图像分量、第二图像分量和第三图像分量;而当前块为视频图像中当前待进行第一图像分量或者第二图像分量或者第三图像分量预测的解码块。
其中,假定当前块进行第一图像分量预测,而且第一图像分量为亮度分量,即待预测图像分量为亮 度分量,那么当前块也可以称为亮度块;或者,假定当前块进行第二图像分量预测,而且第二图像分量为色度分量,即待预测图像分量为色度分量,那么当前块也可以称为色度块。
还需要说明的是,编码器在确定出目标预测模式之后,编码器会将目标预测模式写入码流。这样,解码器通过解析码流,可以获得当前块的目标预测模式。其中,目标预测模式可以是帧内预测模式或帧间预测模式。更甚者,目标预测模式还可以是同图像分量间预测模式或跨图像分量间预测模式。一般情况下,分量间预测模式通常是指跨图像分量间预测模式,比如CCLM模式。
S1802:根据所述目标预测模式,确定所述当前块的待预测图像分量的初始预测值。
需要说明的是,目标预测模式用于指示当前块编码预测采用的预测模式。在一些实施例中,所述根据所述目标预测模式,确定所述当前块的待预测图像分量的初始预测值,可以包括:
根据所述目标预测模式对所述当前块的待预测图像分量进行预测,得到所述当前块的待预测图像分量的初始预测值。
也就是说,在得到目标预测模式之后,可以根据目标预测模式对当前块的待预测图像分量进行预测,能够得到当前块的待预测图像分量的初始预测值。
S1803:确定所述当前块的参考图像分量相关的样值。
需要说明的是,参考图像分量可以包括当前图像中不同于待预测图像分量的一个或多个图像分量,其中,当前图像是所述当前块所处的图像。
在一些实施例中,所述确定所述当前块的参考图像分量相关的样值,可以包括:
根据所述当前块的参考图像分量的预测值和重建值中至少之一,确定所述参考图像分量相关的样值。
进一步地,在一些实施例中,所述确定所述当前块的参考图像分量相关的样值,可以包括:
根据所述当前块的相邻像素对应于所述参考图像分量的预测值和重建值中至少之一,确定所述参考图像分量相关的样值。
还需要说明的是,当前块的相邻像素可以包括与所述当前块相邻的至少一行像素。或者,所述当前块的相邻像素也可以包括与所述当前块相邻的至少一列像素。
在本申请实施例中,当前块的参考图像分量相关的样值可以包括所述当前块的参考图像分量的重建值、与所述当前块相邻的至少一行像素对应的参考图像分量值和与所述当前块相邻的至少一列像素对应的参考图像分量值中的至少两项,其中,所述当前块的参考图像分量不同于所述待预测图像分量。
另外,针对帧间预测模式而言,参考图像分量可以包括参考图像中一个或多个图像分量,其中,所述参考图像不同于当前块所处的当前图像。
在一些实施例中,所述确定所述当前块的参考图像分量相关的样值,可以包括:
根据所述当前块的预测参考块中一个或多个图像分量的重建值,确定所述参考图像分量相关的样值。
进一步地,在一些实施例中,该方法还可以包括:
当目标预测模式指示帧间预测模式时,解析所述码流,获取所述当前块的帧间预测模式参数,其中,所述帧间预测模式参数包括参考图像索引和运动矢量;
根据所述参考图像索引,确定所述参考图像;
根据所述运动矢量,在所述参考图像中确定所述预测参考块。
需要说明的是,针对帧间预测模式,这时候除了当前块所在的当前图像之外,还需要有参考图像。其中,参考图像索引是指参考图像对应的图像索引序号,运动矢量则是用于指示参考图像中的预测参考块。这里,编码器可以将参考图像索引和运动矢量作为帧间预测模式参数并写入码流,以便由编码器传输到解码器。这样,解码器通过解析码流,就可以直接获得参考图像索引和运动矢量。
这样,在得到当前块的参考图像分量相关的样值之后,可以根据参考图像分量相关的样值来确定当前块的边信息。
S1804:根据所述参考图像分量相关的样值,确定所述当前块的边信息。
需要说明的是,在本申请实施例中,本申请的技术关键是使用一个图像分量的相关参数对另一个图像分量的初始预测值进行增强滤波。这里,“一个图像分量”的相关参数主要为当前块的边信息。其中,一方面可以直接将当前块的参考图像分量相关的样值确定为当前块的边信息;另一方面也可以对参考图像分量相关的样值进行上采样/下采样等滤波处理,将滤波后的样值确定为当前块的边信息,本申请实施例不作任何限定。
也就是说,在一种可能的实施方式中,所述根据所述参考图像分量相关的样值,确定所述当前块的边信息,可以包括:将所述参考图像分量相关的样值确定为所述当前块的边信息。
在另一种可能的实施方式中,所述根据所述参考图像分量相关的样值,确定所述当前块的边信息,可以包括:根据颜色分量采样格式,对所述参考图像分量相关的样值进行第一滤波处理,得到滤波后的参考图像分量相关的样值;将所述滤波后的参考图像分量相关的样值确定为所述当前块的边信息。
需要说明的是,颜色分量可以包括亮度分量、蓝色色度分量和红色色度分量,而颜色分量采样格式可以有4:4:4格式、4:2:2格式和4:2:0格式。这里,4:2:2格式和4:2:0格式适用于上述的第一滤波处理,而4:4:4格式不适用于上述的第一滤波处理。
进一步地,在一些实施例中,该方法还可以包括:
解析所述码流,获取所述颜色分量采样格式。
或者,在一些实施例中,该方法还可以包括:
解析所述码流中的参数集数据单元,获取用于指示所述颜色分量采样格式的比特字段的取值;
根据所述比特字段的取值,确定所述颜色分量采样格式。
也就是说,当编码器在确定颜色分量采样格式之后,可以将颜色分量采样格式写入码流;或者,确定出用于指示颜色分量采样格式的比特字段的取值,该比特字段的取值写入码流;然后由编码器传输到解码器,以便解码器通过解析码流后,就可以直接获得颜色分量采样格式。
还需要说明的是,由于亮度分量和色度分量(比如蓝色色度分量或红色色度分量)的分辨率不同,这时候根据具体情况需要对参考图像分量相关的样值进行第一滤波处理,比如上采样/下采样处理。具体地,在一些实施例中,所述对所述参考图像分量相关的样值进行第一滤波处理,可以包括:
当所述初始预测值的分辨率小于所述参考图像分量相关的样值的分辨率时,对所述参考图像分量相关的样值进行下采样处理;
当所述初始预测值的分辨率大于所述参考图像分量相关的样值的分辨率时,对所述参考图像分量相关的样值进行上采样处理;
当所述初始预测值的分辨率等于所述参考图像分量相关的样值的分辨率时,将所述滤波后的参考图像分量相关的样值设置为等于所述参考图像分量相关的样值。
在本申请实施例中,滤波后的参考图像分量相关的样值的分辨率与初始预测值的分辨率相等。
示例性地,如果待预测图像分量为色度分量,而参考图像分量为亮度分量,那么需要对参考图像分量相关的样值进行下采样处理,而且下采样处理后的亮度分量的分辨率与色度分量的分辨率相同;或者,如果待预测图像分量为亮度分量,而参考图像分量为色度分量,那么需要对参考图像分量相关的样值进行上采样处理,而且上采样处理后的色度分量的分辨率与亮度分量的分辨率相同。此外,如果待预测图像分量为蓝色色度分量,而参考图像分量为红色色度分量,那么由于蓝色色度分量的分辨率和红色色度分量的分辨率相同,此时不需要对参考图像分量相关的样值进行第一滤波处理,即可以将滤波后的参考图像分量相关的样值设置为等于滤波之前参考图像分量相关的样值。
这样,在确定出参考图像分量相关的样值之后,可以根据参考图像分量相关的样值确定出当前块的边信息,以便利用当前块的边信息对待预测图像分量的初始预测值进行滤波增强。
S1805:利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值。
需要说明的是,本申请实施例利用预设网络模型,该预设网络模型可以命名为局部数据短接的神经网络模型,或者命名为半残差网络(Semi-residual Network),然后结合当前块的边信息对初始预测值进行滤波增强,用以提高预测的准确度。
对于预设网络模型而言,在S1805之前,该方法还可以包括:确定预设网络模型。
在本申请实施例中,预设网络模型是通过模型训练得到的。在一些实施例中,具体可以包括:
获取训练样本集;其中,所述训练样本集包括一个或多个图像;
构建初始网络模型,利用所述训练样本集对所述初始网络模型进行训练;
将训练后的初始网络模型确定为所述预设网络模型。
需要说明的是,训练样本集可以包括有一个或多个图像。训练样本集可以是编码器在本地存储的训练样本集合,也可以是根据链接或者地址信息从远程服务器上获取的训练样本集合,甚至也可以是视频中已经解码的图像样本集合,本申请实施例不作具体限定。
这样,在获取到训练样本集之后,可以利用训练样本集通过代价函数对初始网络模型进行训练,当该代价函数的损失值(Loss)收敛到一定预设阈值时,这时候训练得到的初始网络模型即为预设网络模型。这里,代价函数可以为率失真代价函数,预设阈值可以根据实际情况进行具体设定,本申请实施例不作任何限定。
还需要说明的是,所述确定预设网络模型,可以先确定出预设网络模型中的网络模型参数。在一种可能的实施方式中,所述确定所述预设网络模型,可以包括:
确定网络模型参数;
根据所确定的网络模型参数,构建所述预设网络模型。
在本申请实施例中,网络模型参数可以是通过模型训练确定的。具体地,在一些实施例中,可以包 括:获取训练样本集;构建初始网络模型,其中,所述初始网络模型包括模型参数;利用所述训练样本集对所述初始网络模型进行训练,将训练后的初始网络模型中的模型参数确定为所述网络模型参数。
在另一种可能的实施方式中,所述确定所述预设网络模型,可以包括:
解析所述码流,获取所述预设网络模型的网络模型参数;
根据所述网络模型参数,确定所述预设网络模型。
这时候,编码器通过模型训练得到网络模型参数之后,将网络模型参数写入码流。这样,解码器可直接通过解析码流来获得网络模型参数,而无需在解码器进行模型训练就能够构建出预设网络模型。
在一些实施例中,预设网络模型可以包括神经网络模型和第一加法器。
需要说明的是,神经网络模型至少可以包括卷积层、残差层、平均池化层和采样率转换模块。这里,残差层可以是由激活函数、卷积层和第二加法器组成。采样转换模块可以是上采样模块,也可以是下采样模块。其中,在本申请实施例的神经网络模型中,平均池化层和采样率转换模块相当于低通滤波的效果,而且采样率转换模块通常是指上采样模块,但是本申请实施例不作具体限定。
进一步地,在一些实施例中,所述利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值,可以包括:
将所述当前块的边信息和所述待预测图像分量的初始预测值输入到所述预设网络模型,通过所述预设网络模型输出所述待预测图像分量的目标预测值。
需要说明的是,由于预设网络模型可以包括神经网络模型和第一加法器,那么将所述当前块的边信息和所述待预测图像分量的初始预测值输入到所述预设网络模型,通过所述预设网络模型输出所述待预测图像分量的目标预测值,具体可以包括:将所述边信息和所述初始预测值输入所述神经网络模型,输出中间值;通过所述第一加法器对所述中间值和所述初始预测值进行相加处理,得到所述目标预测值。
S1806:根据所述目标预测值,对所述当前块的待预测图像分量进行解码。
需要说明的是,在得到待预测图像分量的目标预测值之后,可以对当前块的待预测图像分量进行解码。具体地,在得到目标预测值之后,通过解析码流获得残差值,然后利用残差值和目标预测值就可以解码恢复出真实图像信息。
本实施例提供了一种图像预测方法,应用于解码器。通过解析码流,获取当前块的目标预测模式;根据所述目标预测模式,确定所述当前块的待预测图像分量的初始预测值;确定所述当前块的参考图像分量相关的样值;根据所述参考图像分量相关的样值,确定所述当前块的边信息;利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值;根据所述目标预测值,对所述当前块的待预测图像分量进行解码。这样,可以利用图像分量之间的相关性,根据当前块的边信息和预设网络模型对初始预测值进行预测增强,使得增强后的目标预测值更接近于真实值,从而有效提高了预测精度,进而提高了编解码效率,同时提高了整体编解码的性能。
基于前述实施例相同的发明构思,参见图19,其示出了本申请实施例提供的一种编码器190的组成结构示意图。如图190所示,该编码器190可以包括:第一确定单元1901、第一预测单元1902和编码单元1903;其中,
第一确定单元1901,配置为确定当前块的待预测图像分量的初始预测值;
第一确定单元1901,还配置为确定所述当前块的参考图像分量相关的样值;以及根据所述参考图像分量相关的样值,确定所述当前块的边信息;
第一预测单元1902,配置为利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值;
编码单元1903,配置为根据所述目标预测值,对所述当前块的待预测图像分量进行编码。
在一些实施例中,第一预测单元1901,还配置为确定所述当前块的目标预测模式;以及根据所述目标预测模式对所述当前块的待预测图像分量进行预测,确定所述当前块的待预测图像分量的初始预测值。
在一些实施例中,参见图19,编码器190还可以包括预编码单元1904;
第一确定单元1901,还配置为确定所述当前块的待预测图像分量;
预编码单元1904,配置为利用一种或多种候选预测模式分别对所述待预测图像分量进行预编码,确定所述候选预测模式对应的率失真代价结果;
第一确定单元1901,还配置为从所述率失真代价结果中选取最优率失真代价结果,并将所述最优率失真代价结果对应的候选预测模式确定为所述当前块的目标预测模式。
在一些实施例中,所述目标预测模式是帧内预测模式或帧间预测模式。
在一些实施例中,所述目标预测模式是同图像分量间预测模式或跨图像分量间预测模式。
在一些实施例中,所述参考图像分量包括当前图像中不同于所述待预测图像分量的一个或多个图像分量,其中,所述当前图像是所述当前块所处的图像。
在一些实施例中,第一确定单元1901,具体配置为根据所述当前块的参考图像分量的预测值和重建值中至少之一,确定所述参考图像分量相关的样值。
在一些实施例中,第一确定单元1901,具体配置为根据所述当前块的相邻像素对应于所述参考图像分量的预测值和重建值中至少之一,确定所述参考图像分量相关的样值。
在一些实施例中,所述当前块的相邻像素包括与所述当前块相邻的至少一行像素。
在一些实施例中,所述当前块的相邻像素包括与所述当前块相邻的至少一列像素。
在一些实施例中,所述参考图像分量包括参考图像中一个或多个图像分量,其中,所述参考图像不同于所述当前块所处的当前图像。
在一些实施例中,第一确定单元1901,具体配置为根据所述当前块的预测参考块中一个或多个图像分量的重建值,确定所述参考图像分量相关的样值。
在一些实施例中,参见图19,编码器190还可以包括写入单元1905;
第一确定单元1901,还配置为当所述目标预测模式指示帧间预测模式时,确定所述当前块的帧间预测模式参数,其中,所述帧间预测模式参数包括指示所述参考图像对应的参考图像索引和指示所述参考图像中所述预测参考块的运动矢量;
写入单元1905,配置为将所确定的帧间预测模式参数写入码流。
在一些实施例中,第一确定单元1901,还配置为将所述参考图像分量相关的样值确定为所述当前块的边信息。
在一些实施例中,参见图19,编码器190还可以包括第一采样单元1906,配置为根据颜色分量采样格式,对所述参考图像分量相关的样值进行第一滤波处理,得到滤波后的参考图像分量相关的样值;以及将所述滤波后的参考图像分量相关的样值确定为所述当前块的边信息。
在一些实施例中,写入单元1905,还配置为将所述颜色分量采样格式写入码流。
在一些实施例中,第一确定单元1901,还配置为确定待写入码流的比特字段的取值;其中,所述比特字段的取值用于指示所述颜色分量采样格式;
写入单元1905,还配置为将所述比特字段的取值写入码流。
在一些实施例中,第一采样单元1906,具体配置为当所述初始预测值的分辨率小于所述参考图像分量相关的样值的分辨率时,对所述参考图像分量相关的样值进行下采样处理;当所述初始预测值的分辨率大于所述参考图像分量相关的样值的分辨率时,对所述参考图像分量相关的样值进行上采样处理;当所述初始预测值的分辨率等于所述参考图像分量相关的样值的分辨率时,将所述滤波后的参考图像分量相关的样值设置为等于所述参考图像分量相关的样值。
在一些实施例中,所述滤波后的参考图像分量相关的样值的分辨率与所述初始预测值的分辨率相等。
在一些实施例中,所述当前块的参考图像分量相关的样值包括所述当前块的参考图像分量的重建值、与所述当前块相邻的至少一行像素对应的参考图像分量值和与所述当前块相邻的至少一列像素对应的参考图像分量值中的至少两项,其中,所述当前块的参考图像分量不同于所述待预测图像分量。
在一些实施例中,第一预测单元1902,具体配置为将所述当前块的边信息和所述待预测图像分量的初始预测值输入到所述预设网络模型,通过所述预设网络模型输出所述待预测图像分量的目标预测值。
在一些实施例中,第一确定单元1901,还配置为确定所述预设网络模型。
在一些实施例中,所述预设网络模型包括神经网络模型和第一加法器。
在一些实施例中,第一预测单元1902,具体配置为将所述边信息和所述初始预测值输入所述神经网络模型,输出中间值;以及通过所述第一加法器对所述中间值和所述初始预测值进行相加处理,得到所述目标预测值。
在一些实施例中,所述神经网络模型包括下述至少一项:卷积层、残差层、平均池化层和采样率转换模块。
在一些实施例中，所述残差层包括下述至少一项：激活函数、卷积层和第二加法器。
在一些实施例中,参见图19,编码器190还可以包括第一训练单元1907,配置为获取训练样本集;其中,所述训练样本集包括一个或多个图像;以及构建初始网络模型,利用所述训练样本集对所述初始网络模型进行训练,并将训练后的初始网络模型确定为所述预设网络模型。
在一些实施例中,第一确定单元1901,还配置为确定网络模型参数;以及根据所确定的网络模型参数,构建所述预设网络模型。
可以理解地,在本申请实施例中,“单元”可以是部分电路、部分处理器、部分程序或软件等等,当然也可以是模块,还可以是非模块化的。而且在本实施例中的各组成部分可以集成在一个处理单元中, 也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
所述集成的单元如果以软件功能模块的形式实现并非作为独立的产品进行销售或使用时,可以存储在一个计算机可读取存储介质中,基于这样的理解,本实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或processor(处理器)执行本实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
因此,本申请实施例提供了一种计算机存储介质,应用于编码器190,该计算机存储介质存储有图像预测程序,所述图像预测程序被第一处理器执行时实现前述实施例中任一项所述的方法。
基于上述编码器190的组成以及计算机存储介质,参见图20,其示出了本申请实施例提供的编码器190的硬件结构示意图。如图20所示,可以包括:第一通信接口2001、第一存储器2002和第一处理器2003;各个组件通过第一总线系统2004耦合在一起。可理解,第一总线系统2004用于实现这些组件之间的连接通信。第一总线系统2004除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图20中将各种总线都标为第一总线系统2004。其中,
第一通信接口2001,用于在与其他外部网元之间进行收发信息过程中,信号的接收和发送;
第一存储器2002,用于存储能够在第一处理器2003上运行的计算机程序;
第一处理器2003,用于在运行所述计算机程序时,执行:
确定当前块的待预测图像分量的初始预测值;
确定所述当前块的参考图像分量相关的样值;
根据所述参考图像分量相关的样值,确定所述当前块的边信息;
利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值;
根据所述目标预测值,对所述当前块的待预测图像分量进行编码。
可以理解,本申请实施例中的第一存储器2002可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDRSDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(Synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DRRAM)。本申请描述的系统和方法的第一存储器2002旨在包括但不限于这些和任意其它适合类型的存储器。
而第一处理器2003可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过第一处理器2003中的硬件的集成逻辑电路或者软件形式的指令完成。上述的第一处理器2003可以是通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于第一存储器2002,第一处理器2003读取第一存储器2002中的信息,结合其硬件完成上述方法的步骤。
可以理解的是,本申请描述的这些实施例可以用硬件、软件、固件、中间件、微码或其组合来实现。对于硬件实现,处理单元可以实现在一个或多个专用集成电路(Application Specific Integrated Circuits,ASIC)、数字信号处理器(Digital Signal Processing,DSP)、数字信号处理设备(DSP Device,DSPD)、可编程逻辑设备(Programmable Logic Device,PLD)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)、通用处理器、控制器、微控制器、微处理器、用于执行本申请所述功能的其它电子单元或其组合中。对于软件实现,可通过执行本申请所述功能的模块(例如过程、函数等)来实现本申请所述的 技术。软件代码可存储在存储器中并通过处理器执行。存储器可以在处理器中或在处理器外部实现。
可选地,作为另一个实施例,第一处理器2003还配置为在运行所述计算机程序时,执行前述实施例中任一项所述的方法。
本实施例提供了一种编码器,该编码器可以包括第一确定单元、第一预测单元和编码单元。这样,对于编码器来说,可以利用图像分量之间的相关性,根据当前块的边信息和预设网络模型对初始预测值进行预测增强,使得增强后的目标预测值更接近于真实值,从而有效提高了预测精度,进而提高了编解码效率,同时提高了整体编解码的性能。
基于前述实施例相同的发明构思,参见图21,其示出了本申请实施例提供的一种解码器210的组成结构示意图。如图21所示,该解码器210可以包括:解析单元2101、第二确定单元2102、第二预测单元2103和解码单元2104;其中,
解析单元2101,配置为解析码流,确定当前块的目标预测模式;
第二确定单元2102,配置为根据所述目标预测模式,确定所述当前块的待预测图像分量的初始预测值;
第二确定单元2102,还配置为确定所述当前块的参考图像分量相关的样值;以及根据所述参考图像分量相关的样值,确定所述当前块的边信息;
第二预测单元2103,配置为利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值;
解码单元2104,配置为根据所述目标预测值,对所述当前块的待预测图像分量进行解码。
在一些实施例中,第二预测单元2103,还配置为根据所述目标预测模式对所述当前块的待预测图像分量进行预测,得到所述当前块的待预测图像分量的初始预测值。
在一些实施例中,所述目标预测模式是帧内预测模式或帧间预测模式。
在一些实施例中,所述目标预测模式是同图像分量间预测模式或跨图像分量间预测模式。
在一些实施例中,所述参考图像分量包括当前图像中不同于所述待预测图像分量的一个或多个图像分量,其中,所述当前图像是所述当前块所处的图像。
在一些实施例中,第二确定单元2102,具体配置为根据所述当前块的参考图像分量的预测值和重建值中至少之一,确定所述参考图像分量相关的样值。
在一些实施例中,第二确定单元2102,具体配置为根据所述当前块的相邻像素对应于所述参考图像分量的预测值和重建值中至少之一,确定所述参考图像分量相关的样值。
在一些实施例中,所述当前块的相邻像素包括与所述当前块相邻的至少一行像素。
在一些实施例中,所述当前块的相邻像素包括与所述当前块相邻的至少一列像素。
在一些实施例中,所述参考图像分量包括参考图像中一个或多个图像分量,其中,所述参考图像不同于所述当前块所处的当前图像。
在一些实施例中,第二确定单元2102,具体配置为根据所述当前块的预测参考块中一个或多个图像分量的重建值,确定所述参考图像分量相关的样值。
在一些实施例中,解析单元2101,还配置为当所述目标预测模式指示帧间预测模式时,解析所述码流,获取所述当前块的帧间预测模式参数,其中,所述帧间预测模式参数包括参考图像索引和运动矢量;
第二确定单元2102,还配置为根据所述参考图像索引,确定所述参考图像;以及根据所述运动矢量,在所述参考图像中确定所述预测参考块。
在一些实施例中,第二确定单元2102,还配置为将所述参考图像分量相关的样值确定为所述当前块的边信息。
在一些实施例中,参见图21,解码器210还可以包括第二采样单元2105,配置为根据颜色分量采样格式,对所述参考图像分量相关的样值进行第一滤波处理,得到滤波后的参考图像分量相关的样值;以及将所述滤波后的参考图像分量相关的样值确定为所述当前块的边信息。
在一些实施例中,解析单元2101,还配置为解析所述码流,获取所述颜色分量采样格式。
在一些实施例中,解析单元2101,还配置为解析所述码流中的参数集数据单元,获取用于指示所述颜色分量采样格式的比特字段的取值;
第二确定单元2102,还配置为根据所述比特字段的取值,确定所述颜色分量采样格式。
在一些实施例中,第二采样单元2105,具体配置为当所述初始预测值的分辨率小于所述参考图像分量相关的样值的分辨率时,对所述参考图像分量相关的样值进行下采样处理;当所述初始预测值的分辨率大于所述参考图像分量相关的样值的分辨率时,对所述参考图像分量相关的样值进行上采样处理;当所述初始预测值的分辨率等于所述参考图像分量相关的样值的分辨率时,将所述滤波后的参考图像分 量相关的样值设置为等于所述参考图像分量相关的样值。
在一些实施例中,所述滤波后的参考图像分量相关的样值的分辨率与所述初始预测值的分辨率相等。
在一些实施例中,所述当前块的参考图像分量相关的样值包括所述当前块的参考图像分量的重建值、与所述当前块相邻的至少一行像素对应的参考图像分量值和与所述当前块相邻的至少一列像素对应的参考图像分量值中的至少两项,其中,所述当前块的参考图像分量不同于所述待预测图像分量。
在一些实施例中,第二预测单元2103,具体配置为将所述当前块的边信息和所述待预测图像分量的初始预测值输入到所述预设网络模型,通过所述预设网络模型输出所述待预测图像分量的目标预测值。
在一些实施例中,第二确定单元2102,还配置为确定所述预设网络模型。
在一些实施例中,所述预设网络模型包括神经网络模型和第一加法器。
在一些实施例中,第二预测单元2103,具体配置为将所述边信息和所述初始预测值输入所述神经网络模型,输出中间值;以及通过所述第一加法器对所述中间值和所述初始预测值进行相加处理,得到所述目标预测值。
在一些实施例中,所述神经网络模型包括下述至少一项:卷积层、残差层、平均池化层和采样率转换模块。
在一些实施例中,所述残差层包括下述至少一项:激活函数、卷积层和第二加法器。
在一些实施例中,参见图21,解码器210还可以包括第二训练单元2106,配置为获取训练样本集;其中,所述训练样本集包括一个或多个图像;以及构建初始网络模型,利用所述训练样本集对所述初始网络模型进行训练,并将训练后的初始网络模型确定为所述预设网络模型。
在一些实施例中,解析单元2101,还配置为解析所述码流,获取所述预设网络模型的网络模型参数;
第二确定单元2102,还配置为根据所述网络模型参数,确定所述预设网络模型。
可以理解地,在本实施例中,“单元”可以是部分电路、部分处理器、部分程序或软件等等,当然也可以是模块,还可以是非模块化的。而且在本实施例中的各组成部分可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
所述集成的单元如果以软件功能模块的形式实现并非作为独立的产品进行销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本实施例提供了一种计算机存储介质,应用于解码器210,该计算机存储介质存储有图像预测程序,所述图像预测程序被第二处理器执行时实现前述实施例中任一项所述的方法。
基于上述解码器210的组成以及计算机存储介质,参见图22,其示出了本申请实施例提供的解码器210的硬件结构示意。如图22所示,可以包括:第二通信接口2201、第二存储器2202和第二处理器2203;各个组件通过第二总线系统2204耦合在一起。可理解,第二总线系统2204用于实现这些组件之间的连接通信。第二总线系统2204除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图22中将各种总线都标为第二总线系统2204。其中,
第二通信接口2201,用于在与其他外部网元之间进行收发信息过程中,信号的接收和发送;
第二存储器2202,用于存储能够在第二处理器2203上运行的计算机程序;
第二处理器2203,用于在运行所述计算机程序时,执行:
解析码流,获取当前块的目标预测模式;
根据所述目标预测模式,确定所述当前块的待预测图像分量的初始预测值;
确定所述当前块的参考图像分量相关的样值;
根据所述参考图像分量相关的样值,确定所述当前块的边信息;
利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值;
根据所述目标预测值,对所述当前块的待预测图像分量进行解码。
可选地,作为另一个实施例,第二处理器2203还配置为在运行所述计算机程序时,执行前述实施例中任一项所述的方法。
可以理解,第二存储器2202与第一存储器2002的硬件功能类似,第二处理器2203与第一处理器2003的硬件功能类似;这里不再详述。
本实施例提供了一种解码器,该解码器可以包括解析单元、第二确定单元、第二预测单元和解码单元。这样,对于解码器来说,也可以利用图像分量之间的相关性,根据当前块的边信息和预设网络模型对初始预测值进行预测增强,使得增强后的目标预测值更接近于真实值,从而有效提高了预测精度,进而提高了编解码效率,同时提高了整体编解码的性能。
需要说明的是,在本申请中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
本申请所提供的几个方法实施例中所揭露的方法,在不冲突的情况下可以任意组合,得到新的方法实施例。
本申请所提供的几个产品实施例中所揭露的特征,在不冲突的情况下可以任意组合,得到新的产品实施例。
本申请所提供的几个方法或设备实施例中所揭露的特征,在不冲突的情况下可以任意组合,得到新的方法实施例或设备实施例。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。
工业实用性
本申请实施例中,在确定当前块的待预测图像分量的初始预测值之后,确定所述当前块的参考图像分量相关的样值;然后根据所述参考图像分量相关的样值,确定所述当前块的边信息;再利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值;最后根据所述目标预测值,对所述当前块的待预测图像分量进行编码或者解码。这样,可以利用图像分量之间的相关性,根据当前块的边信息和预设网络模型对初始预测值进行预测增强,使得增强后的目标预测值更接近于真实值,从而有效提高了预测精度,进而提高了编解码效率,同时提高了整体编解码的性能。

Claims (60)

  1. 一种图像预测方法,应用于编码器,所述方法包括:
    确定当前块的待预测图像分量的初始预测值;
    确定所述当前块的参考图像分量相关的样值;
    根据所述参考图像分量相关的样值,确定所述当前块的边信息;
    利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值;
    根据所述目标预测值,对所述当前块的待预测图像分量进行编码。
  2. 根据权利要求1所述的方法,其中,所述确定当前块的待预测图像分量对应的初始预测值,包括:
    确定所述当前块的目标预测模式;
    根据所述目标预测模式对所述当前块的待预测图像分量进行预测,确定所述当前块的待预测图像分量的初始预测值。
  3. 根据权利要求2所述的方法,其中,所述确定所述当前块的目标预测模式,包括:
    确定所述当前块的待预测图像分量;
    利用一种或多种候选预测模式分别对所述待预测图像分量进行预编码,确定所述候选预测模式对应的率失真代价结果;
    从所述率失真代价结果中选取最优率失真代价结果,并将所述最优率失真代价结果对应的候选预测模式确定为所述当前块的目标预测模式。
  4. 根据权利要求2或3所述的方法,其中,所述目标预测模式是帧内预测模式或帧间预测模式。
  5. 根据权利要求4所述的方法,其中,所述方法还包括:
    所述目标预测模式是同图像分量间预测模式或跨图像分量间预测模式。
  6. 根据权利要求1所述的方法,其中,所述参考图像分量包括当前图像中不同于所述待预测图像分量的一个或多个图像分量,其中,所述当前图像是所述当前块所处的图像。
  7. 根据权利要求6所述的方法,其中,所述确定所述当前块的参考图像分量相关的样值,包括:
    根据所述当前块的参考图像分量的预测值和重建值中至少之一,确定所述参考图像分量相关的样值。
  8. 根据权利要求6所述的方法,其中,所述确定所述当前块的参考图像分量相关的样值,包括:
    根据所述当前块的相邻像素对应于所述参考图像分量的预测值和重建值中至少之一,确定所述参考图像分量相关的样值。
  9. 根据权利要求8所述的方法,其中,所述当前块的相邻像素包括与所述当前块相邻的至少一行像素。
  10. 根据权利要求8所述的方法,其中,所述当前块的相邻像素包括与所述当前块相邻的至少一列像素。
  11. 根据权利要求4所述的方法,其中,所述参考图像分量包括参考图像中一个或多个图像分量,其中,所述参考图像不同于所述当前块所处的当前图像。
  12. 根据权利要求11所述的方法,其中,所述确定所述当前块的参考图像分量相关的样值,包括:
    根据所述当前块的预测参考块中一个或多个图像分量的重建值,确定所述参考图像分量相关的样值。
  13. 根据权利要求12所述的方法,其中,所述方法还包括:
    当所述目标预测模式指示帧间预测模式时,确定所述当前块的帧间预测模式参数,其中,所述帧间预测模式参数包括指示所述参考图像对应的参考图像索引和指示所述参考图像中所述预测参考块的运动矢量;
    将所确定的帧间预测模式参数写入码流。
  14. 根据权利要求1所述的方法,其中,所述根据所述参考图像分量相关的样值,确定所述当前块的边信息,包括:
    将所述参考图像分量相关的样值确定为所述当前块的边信息。
  15. 根据权利要求1所述的方法,其中,所述根据所述参考图像分量相关的样值,确定所述当前块的边信息,包括:
    根据颜色分量采样格式,对所述参考图像分量相关的样值进行第一滤波处理,得到滤波后的参考图像分量相关的样值;
    将所述滤波后的参考图像分量相关的样值确定为所述当前块的边信息。
  16. 根据权利要求15所述的方法,其中,所述方法还包括:
    将所述颜色分量采样格式写入码流。
  17. 根据权利要求16所述的方法,其中,所述方法还包括:
    确定待写入码流的比特字段的取值;其中,所述比特字段的取值用于指示所述颜色分量采样格式;
    将所述比特字段的取值写入码流。
  18. 根据权利要求15所述的方法,其中,所述对所述参考图像分量相关的样值进行第一滤波处理,包括:
    当所述初始预测值的分辨率小于所述参考图像分量相关的样值的分辨率时,对所述参考图像分量相关的样值进行下采样处理;
    当所述初始预测值的分辨率大于所述参考图像分量相关的样值的分辨率时,对所述参考图像分量相关的样值进行上采样处理;
    当所述初始预测值的分辨率等于所述参考图像分量相关的样值的分辨率时,将所述滤波后的参考图像分量相关的样值设置为等于所述参考图像分量相关的样值。
  19. 根据权利要求15所述的方法,其中,所述方法还包括:
    所述滤波后的参考图像分量相关的样值的分辨率与所述初始预测值的分辨率相等。
  20. 根据权利要求1所述的方法,其中,所述确定所述当前块的参考图像分量相关的样值,包括:
    所述当前块的参考图像分量相关的样值包括所述当前块的参考图像分量的重建值、与所述当前块相邻的至少一行像素对应的参考图像分量值和与所述当前块相邻的至少一列像素对应的参考图像分量值中的至少两项,其中,所述当前块的参考图像分量不同于所述待预测图像分量。
  21. 根据权利要求1所述的方法,其中,所述利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值,包括:
    将所述当前块的边信息和所述待预测图像分量的初始预测值输入到所述预设网络模型,通过所述预设网络模型输出所述待预测图像分量的目标预测值。
  22. 根据权利要求21所述的方法,其中,所述方法还包括:
    确定所述预设网络模型。
  23. 根据权利要求22所述的方法,其中,所述预设网络模型包括神经网络模型和第一加法器。
  24. 根据权利要求23所述的方法,其中,所述将所述当前块的边信息和所述待预测图像分量的初始预测值输入到所述预设网络模型,通过所述预设网络模型输出所述待预测图像分量的目标预测值,包括:
    将所述边信息和所述初始预测值输入所述神经网络模型,输出中间值;
    通过所述第一加法器对所述中间值和所述初始预测值进行相加处理,得到所述目标预测值。
  25. 根据权利要求23所述的方法,其中,所述神经网络模型包括下述至少一项:卷积层、残差层、平均池化层和采样率转换模块。
  26. 根据权利要求25所述的方法,其中,所述残差层包括下述至少一项:激活函数、卷积层和第二加法器。
  27. 根据权利要求22所述的方法,其中,所述确定所述预设网络模型,包括:
    获取训练样本集;其中,所述训练样本集包括一个或多个图像;
    构建初始网络模型,利用所述训练样本集对所述初始网络模型进行训练;
    将训练后的初始网络模型确定为所述预设网络模型。
  28. 根据权利要求22所述的方法,其中,所述确定所述预设网络模型,包括:
    确定网络模型参数;
    根据所确定的网络模型参数,构建所述预设网络模型。
  29. 一种图像预测方法,应用于解码器,所述方法包括:
    解析码流,获取当前块的目标预测模式;
    根据所述目标预测模式,确定所述当前块的待预测图像分量的初始预测值;
    确定所述当前块的参考图像分量相关的样值;
    根据所述参考图像分量相关的样值,确定所述当前块的边信息;
    利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值;
    根据所述目标预测值,对所述当前块的待预测图像分量进行解码。
  30. 根据权利要求29所述的方法，其中，所述根据所述目标预测模式，确定所述当前块的待预测图像分量的初始预测值，包括：
    根据所述目标预测模式对所述当前块的待预测图像分量进行预测,得到所述当前块的待预测图像分量的初始预测值。
  31. 根据权利要求30所述的方法,其中,所述目标预测模式是帧内预测模式或帧间预测模式。
  32. 根据权利要求31所述的方法,其中,所述方法还包括:
    所述目标预测模式是同图像分量间预测模式或跨图像分量间预测模式。
  33. 根据权利要求29所述的方法,其中,所述参考图像分量包括当前图像中不同于所述待预测图像分量的一个或多个图像分量,其中,所述当前图像是所述当前块所处的图像。
  34. 根据权利要求33所述的方法,其中,所述确定所述当前块的参考图像分量相关的样值,包括:
    根据所述当前块的参考图像分量的预测值和重建值中至少之一,确定所述参考图像分量相关的样值。
  35. 根据权利要求33所述的方法,其中,所述确定所述当前块的参考图像分量相关的样值,包括:
    根据所述当前块的相邻像素对应于所述参考图像分量的预测值和重建值中至少之一,确定所述参考图像分量相关的样值。
  36. 根据权利要求35所述的方法,其中,所述当前块的相邻像素包括与所述当前块相邻的至少一行像素。
  37. 根据权利要求35所述的方法,其中,所述当前块的相邻像素包括与所述当前块相邻的至少一列像素。
  38. 根据权利要求29所述的方法,其中,所述参考图像分量包括参考图像中一个或多个图像分量,其中,所述参考图像不同于所述当前块所处的当前图像。
  39. 根据权利要求38所述的方法,其中,所述确定所述当前块的参考图像分量相关的样值,包括:
    根据所述当前块的预测参考块中一个或多个图像分量的重建值,确定所述参考图像分量相关的样值。
  40. 根据权利要求39所述的方法,其中,所述方法还包括:
    当所述目标预测模式指示帧间预测模式时,解析所述码流,获取所述当前块的帧间预测模式参数,其中,所述帧间预测模式参数包括参考图像索引和运动矢量;
    根据所述参考图像索引,确定所述参考图像;
    根据所述运动矢量,在所述参考图像中确定所述预测参考块。
  41. 根据权利要求29所述的方法,其中,所述根据所述参考图像分量相关的样值,确定所述当前块的边信息,包括:
    将所述参考图像分量相关的样值确定为所述当前块的边信息。
  42. 根据权利要求29所述的方法,其中,所述根据所述参考图像分量相关的样值,确定所述当前块的边信息,包括:
    根据颜色分量采样格式,对所述参考图像分量相关的样值进行第一滤波处理,得到滤波后的参考图像分量相关的样值;
    将所述滤波后的参考图像分量相关的样值确定为所述当前块的边信息。
  43. 根据权利要求42所述的方法,其中,所述方法还包括:
    解析所述码流,获取所述颜色分量采样格式。
  44. 根据权利要求43所述的方法,其中,所述方法还包括:
    解析所述码流中的参数集数据单元,获取用于指示所述颜色分量采样格式的比特字段的取值;
    根据所述比特字段的取值,确定所述颜色分量采样格式。
  45. 根据权利要求42所述的方法,其中,所述对所述参考图像分量相关的样值进行第一滤波处理,包括:
    当所述初始预测值的分辨率小于所述参考图像分量相关的样值的分辨率时,对所述参考图像分量相关的样值进行下采样处理;
    当所述初始预测值的分辨率大于所述参考图像分量相关的样值的分辨率时,对所述参考图像分量相关的样值进行上采样处理;
    当所述初始预测值的分辨率等于所述参考图像分量相关的样值的分辨率时,将所述滤波后的参考图像分量相关的样值设置为等于所述参考图像分量相关的样值。
  46. 根据权利要求42所述的方法,其中,所述方法还包括:
    所述滤波后的参考图像分量相关的样值的分辨率与所述初始预测值的分辨率相等。
  47. 根据权利要求29所述的方法,其中,所述确定所述当前块的参考图像分量相关的样值,包括:
    所述当前块的参考图像分量相关的样值包括所述当前块的参考图像分量的重建值、与所述当前块相邻的至少一行像素对应的参考图像分量值和与所述当前块相邻的至少一列像素对应的参考图像分量值中的至少两项，其中，所述当前块的参考图像分量不同于所述待预测图像分量。
  48. 根据权利要求29所述的方法,其中,所述利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值,包括:
    将所述当前块的边信息和所述待预测图像分量的初始预测值输入到所述预设网络模型,通过所述预设网络模型输出所述待预测图像分量的目标预测值。
  49. 根据权利要求48所述的方法,其中,所述方法还包括:
    确定所述预设网络模型。
  50. 根据权利要求49所述的方法,其中,所述预设网络模型包括神经网络模型和第一加法器。
  51. 根据权利要求50所述的方法,其中,所述将所述当前块的边信息和所述待预测图像分量的初始预测值输入到所述预设网络模型,通过所述预设网络模型输出所述待预测图像分量的目标预测值,包括:
    将所述边信息和所述初始预测值输入所述神经网络模型,输出中间值;
    通过所述第一加法器对所述中间值和所述初始预测值进行相加处理,得到所述目标预测值。
  52. 根据权利要求50所述的方法,其中,所述神经网络模型包括下述至少一项:卷积层、残差层、平均池化层和采样率转换模块。
  53. 根据权利要求52所述的方法,其中,所述残差层包括下述至少一项:激活函数、卷积层和第二加法器。
  54. 根据权利要求49所述的方法,其中,所述确定所述预设网络模型,包括:
    获取训练样本集;其中,所述训练样本集包括一个或多个图像;
    构建初始网络模型,利用所述训练样本集对所述初始网络模型进行训练;
    将训练后的初始网络模型确定为所述预设网络模型。
  55. 根据权利要求54所述的方法,其中,所述确定所述预设网络模型,包括:
    解析所述码流,获取所述预设网络模型的网络模型参数;
    根据所述网络模型参数,确定所述预设网络模型。
  56. 一种编码器,所述编码器包括第一确定单元、第一预测单元和编码单元;其中,
    所述第一确定单元,配置为确定当前块的待预测图像分量的初始预测值;
    所述第一确定单元,还配置为确定所述当前块的参考图像分量相关的样值;以及根据所述参考图像分量相关的样值,确定所述当前块的边信息;
    所述第一预测单元,配置为利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值;
    所述编码单元,配置为根据所述目标预测值,对所述当前块的待预测图像分量进行编码。
  57. 一种编码器,所述编码器包括第一存储器和第一处理器;其中,
    所述第一存储器,用于存储能够在所述第一处理器上运行的计算机程序;
    所述第一处理器,用于在运行所述计算机程序时,执行如权利要求1至28任一项所述的方法。
  58. 一种解码器,所述解码器包括解析单元、第二确定单元、第二预测单元和解码单元;其中,
    所述解析单元,配置为解析码流,确定当前块的目标预测模式;
    所述第二确定单元,配置为根据所述目标预测模式,确定所述当前块的待预测图像分量的初始预测值;
    所述第二确定单元,还配置为确定所述当前块的参考图像分量相关的样值;以及根据所述参考图像分量相关的样值,确定所述当前块的边信息;
    所述第二预测单元,配置为利用预设网络模型和所述当前块的边信息对所述初始预测值进行滤波,得到所述当前块的待预测图像分量的目标预测值;
    所述解码单元,配置为根据所述目标预测值,对所述当前块的待预测图像分量进行解码。
  59. 一种解码器,所述解码器包括第二存储器和第二处理器;其中,
    所述第二存储器,用于存储能够在所述第二处理器上运行的计算机程序;
    所述第二处理器,用于在运行所述计算机程序时,执行如权利要求29至55任一项所述的方法。
  60. 一种计算机存储介质,其中,所述计算机存储介质存储有图像预测程序,所述图像预测程序被第一处理器执行时实现如权利要求1至28任一项所述的方法、或者被第二处理器执行时实现如权利要求29至55任一项所述的方法。
PCT/CN2020/119731 2020-09-30 2020-09-30 图像预测方法、编码器、解码器以及计算机存储介质 WO2022067805A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP20955825.3A EP4224842A4 (en) 2020-09-30 2020-09-30 IMAGE PREDICTION METHOD, ENCODER, DECODER AND COMPUTER STORAGE MEDIUM
CN202080105520.2A CN116472707A (zh) 2020-09-30 2020-09-30 图像预测方法、编码器、解码器以及计算机存储介质
PCT/CN2020/119731 WO2022067805A1 (zh) 2020-09-30 2020-09-30 图像预测方法、编码器、解码器以及计算机存储介质
US18/126,696 US20230262251A1 (en) 2020-09-30 2023-03-27 Picture prediction method, encoder, decoder and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/119731 WO2022067805A1 (zh) 2020-09-30 2020-09-30 图像预测方法、编码器、解码器以及计算机存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/126,696 Continuation US20230262251A1 (en) 2020-09-30 2023-03-27 Picture prediction method, encoder, decoder and computer storage medium

Publications (1)

Publication Number Publication Date
WO2022067805A1 true WO2022067805A1 (zh) 2022-04-07

Family

ID=80949435

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/119731 WO2022067805A1 (zh) 2020-09-30 2020-09-30 图像预测方法、编码器、解码器以及计算机存储介质

Country Status (4)

Country Link
US (1) US20230262251A1 (zh)
EP (1) EP4224842A4 (zh)
CN (1) CN116472707A (zh)
WO (1) WO2022067805A1 (zh)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106303535A (zh) * 2015-06-08 2017-01-04 上海天荷电子信息有限公司 参考像素取自不同程度重构像素的图像压缩方法和装置
CN111164651A (zh) * 2017-08-28 2020-05-15 交互数字Vc控股公司 用多分支深度学习进行滤波的方法和装置
US20200288135A1 (en) * 2017-10-09 2020-09-10 Canon Kabushiki Kaisha New sample sets and new down-sampling schemes for linear component sample prediction
CN110896478A (zh) * 2018-09-12 2020-03-20 北京字节跳动网络技术有限公司 交叉分量线性建模中的下采样
CN110557646A (zh) * 2019-08-21 2019-12-10 天津大学 一种智能视点间的编码方法
CN110602491A (zh) * 2019-08-30 2019-12-20 中国科学院深圳先进技术研究院 帧内色度预测方法、装置、设备及视频编解码系统

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4224842A4 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023245544A1 (zh) * 2022-06-23 2023-12-28 Oppo广东移动通信有限公司 编解码方法、码流、编码器、解码器以及存储介质

Also Published As

Publication number Publication date
US20230262251A1 (en) 2023-08-17
EP4224842A4 (en) 2023-12-06
EP4224842A1 (en) 2023-08-09
CN116472707A (zh) 2023-07-21

Similar Documents

Publication Publication Date Title
JP3861698B2 (ja) 画像情報符号化装置及び方法、画像情報復号装置及び方法、並びにプログラム
US20230262212A1 (en) Picture prediction method, encoder, decoder, and computer storage medium
US11843781B2 (en) Encoding method, decoding method, and decoder
CN113439440A (zh) 图像分量预测方法、编码器、解码器以及存储介质
US20230262251A1 (en) Picture prediction method, encoder, decoder and computer storage medium
WO2021238396A1 (zh) 帧间预测方法、编码器、解码器以及计算机存储介质
WO2020132908A1 (zh) 解码预测方法、装置及计算机存储介质
CN113766233B (zh) 图像预测方法、编码器、解码器以及存储介质
CN112313950B (zh) 视频图像分量的预测方法、装置及计算机存储介质
CN113395520B (zh) 解码预测方法、装置及计算机存储介质
CN115280778A (zh) 帧间预测方法、编码器、解码器以及存储介质
WO2022246809A1 (zh) 编解码方法、码流、编码器、解码器以及存储介质
WO2024007120A1 (zh) 编解码方法、编码器、解码器以及存储介质
WO2023197189A1 (zh) 编解码方法、装置、编码设备、解码设备以及存储介质
WO2024113311A1 (zh) 编解码方法、编解码器、码流以及存储介质
WO2022266971A1 (zh) 编解码方法、编码器、解码器以及计算机存储介质
WO2022257049A1 (zh) 编解码方法、码流、编码器、解码器以及存储介质
WO2024016156A1 (zh) 滤波方法、编码器、解码器、码流以及存储介质
US20240187599A1 (en) Image decoding method and apparatus therefor
TW202145783A (zh) 幀間預測方法、編碼器、解碼器以及電腦儲存媒介
CN115988223A (zh) 帧内预测模式的确定、图像编码以及图像解码方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20955825

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202080105520.2

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020955825

Country of ref document: EP

Effective date: 20230502