WO2024050723A1 - Image prediction method, device, and computer-readable storage medium - Google Patents
Image prediction method, device, and computer-readable storage medium
- Publication number
- WO2024050723A1 (PCT/CN2022/117595)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- prediction
- intra
- block
- prediction mode
- frame
- Prior art date
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
Definitions
- the embodiments of the present disclosure relate to, but are not limited to, the technical field of video data processing, and in particular, to a video encoding and decoding method, device, and storage medium.
- Digital video compression technology compresses massive digital image and video data to facilitate transmission and storage.
- Although digital video compression standards can save a large amount of video data, it is still necessary to pursue better digital video compression technology to reduce the bandwidth and traffic pressure of video transmission and to achieve more efficient video encoding, decoding, transmission, and storage.
- Embodiments of the present disclosure provide an image prediction method, including:
- obtaining intra-frame sub-block division information, wherein the intra-frame sub-block division information indicates that the current block uses intra-frame sub-block division; and
- the intra-frame sub-block division information is obtained by calculating spatial-domain continuity parameters, and the intra-frame sub-block division information is written into the code stream.
- An embodiment of the present disclosure also provides an image prediction device, including a processor and a memory storing a computer program that can run on the processor, wherein when the processor executes the computer program, the image prediction method described in any embodiment of the present disclosure is implemented.
- Embodiments of the present disclosure also provide a non-transitory computer-readable storage medium.
- the computer-readable storage medium stores a computer program, wherein when the computer program is executed by a processor, the method described in any embodiment of the present disclosure is implemented.
- Figure 1 is a schematic block diagram of a video encoding and decoding system related to an embodiment of the present disclosure.
- Figure 2 is a schematic block diagram of a video encoder involved in an embodiment of the present disclosure.
- Figure 3 is a schematic block diagram of a video decoder involved in an embodiment of the present disclosure.
- Figure 4A is a schematic flow chart of an image prediction method provided by an embodiment of the present disclosure.
- Figure 4B is a flow chart for calculating spatial-domain continuity parameters provided by an embodiment of the present disclosure.
- Figure 5 is a schematic flow chart of an image prediction method provided by an embodiment of the present disclosure.
- Figure 6 is a schematic flow chart of an image prediction method provided by an embodiment of the present disclosure.
- the present disclosure can be applied to the field of image encoding and decoding, the field of video encoding and decoding, the field of hardware video encoding and decoding, the field of dedicated circuit video encoding and decoding, the field of real-time video encoding and decoding, etc.
- the solution of the present disclosure can be combined with audio and video coding standards (AVS for short), such as the H.264/Advanced Video Coding (AVC) standard, the H.265/High Efficiency Video Coding (HEVC) standard, and the H.266/Versatile Video Coding (VVC) standard.
- the disclosed approach may operate in conjunction with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multi-view Video Coding (MVC) extensions.
- For ease of understanding, the video encoding and decoding system involved in the embodiments of the present disclosure is first introduced with reference to FIG. 1.
- FIG. 1 is a schematic block diagram of a video encoding and decoding system according to an embodiment of the present disclosure. It should be noted that FIG. 1 is only an example, and the video encoding and decoding system according to the embodiment of the present disclosure includes, but is not limited to, what is shown in FIG. 1 .
- the video encoding and decoding system 100 includes an encoding device 110 and a decoding device 120 .
- the encoding device is used to encode the video data (which can be understood as compression) to generate a code stream, and transmit the code stream to the decoding device.
- the decoding device decodes the code stream generated by the encoding device to obtain decoded video data.
- the encoding device 110 in the embodiments of the present disclosure can be understood as a device with a video encoding function,
- and the decoding device 120 can be understood as a device with a video decoding function. That is, the encoding device 110 and the decoding device 120 in the embodiments of the present disclosure cover a broad range of devices.
- Devices include, for example, smartphones, desktop computers, mobile computing devices, notebook (eg, laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, vehicle-mounted computers, and the like.
- the encoding device 110 may transmit the encoded video data (eg, code stream) to the decoding device 120 via the channel 130 .
- Channel 130 may include one or more media and/or devices capable of transmitting encoded video data from encoding device 110 to decoding device 120 .
- channel 130 includes one or more communication media that enables encoding device 110 to transmit encoded video data directly to decoding device 120 in real time.
- encoding device 110 may modulate the encoded video data according to the communication standard and transmit the modulated video data to decoding device 120.
- the communication media includes wireless communication media, such as radio frequency spectrum.
- the communication media may also include wired communication media, such as one or more physical transmission lines.
- channel 130 includes a storage medium that can store video data encoded by encoding device 110 .
- Storage media include a variety of locally accessible data storage media, such as optical discs, DVDs, flash memory, etc.
- the decoding device 120 may obtain the encoded video data from the storage medium.
- channel 130 may include a storage server that may store video data encoded by encoding device 110 .
- the decoding device 120 may download the stored encoded video data from the storage server.
- the storage server may store the encoded video data and may transmit the encoded video data to the decoding device 120, such as a web server (eg, for a website), a File Transfer Protocol (FTP) server, etc.
- FTP File Transfer Protocol
- the encoding device 110 includes a video encoder 112 and an output interface 113.
- the output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.
- the encoding device 110 may include a video source 111 in addition to the video encoder 112 and the output interface 113.
- Video source 111 may include at least one of a video capture device (e.g., a video camera), a video archive, a video input interface for receiving video data from a video content provider, or a computer graphics system for generating video data.
- the video encoder 112 encodes the video data from the video source 111 to generate a code stream.
- Video data may include one or more pictures or a sequence of pictures.
- the code stream contains the encoding information of an image or image sequence in the form of a bit stream.
- Encoded information may include encoded image data and associated data.
- the associated data may include sequence parameter set (SPS), picture parameter set (PPS) and other syntax structures.
- An SPS can contain parameters that apply to one or more sequences.
- a PPS can contain parameters that apply to one or more images.
- a syntax structure refers to a collection of zero or more syntax elements arranged in a specified order in a code stream.
- the video encoder 112 transmits the encoded video data directly to the decoding device 120 via the output interface 113 .
- the encoded video data can also be stored on a storage medium or storage server for subsequent reading by the decoding device 120 .
- decoding device 120 includes input interface 121 and video decoder 122.
- the decoding device 120 may also include a display device 123.
- the input interface 121 includes a receiver and/or a modem. Input interface 121 may receive encoded video data over channel 130.
- the video decoder 122 is used to decode the encoded video data to obtain decoded video data, and transmit the decoded video data to the display device 123 .
- the display device 123 displays the decoded video data.
- Display device 123 may be integrated with decoding device 120 or external to decoding device 120 .
- Display device 123 may include a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.
- Figure 1 is only an example, and the technical solutions of the embodiments of the present disclosure are not limited to Figure 1 .
- the technology of the present disclosure can also be applied to video encoding alone or video decoding alone.
- FIG. 2 is a schematic block diagram of a video encoder related to an embodiment of the present disclosure. It should be understood that the video encoder 200 can be used to perform lossy compression of images or lossless compression of images.
- the lossless compression can be visually lossless compression or mathematically lossless compression.
- the video encoder 200 can be applied to image data in a luminance-chrominance (YCbCr, YUV) format.
- the YUV ratio can be 4:2:0, 4:2:2, or 4:4:4, where Y represents brightness (Luma), Cb (U) represents blue chroma, and Cr (V) represents red chroma; U and V together represent chroma, which describes color and saturation.
- 4:2:0 means that every 4 pixels have 4 luminance components and 2 chrominance components (YYYYCbCr)
- 4:2:2 means that every 4 pixels have 4 luminance components and 4 chrominance components (YYYYCbCrCbCr)
- 4:4:4 means full pixel display (YYYYCbCrCbCrCbCrCbCr).
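- As an illustration of the sampling ratios above, the following sketch (not part of the patent text; the function name and signature are hypothetical) computes the chroma plane dimensions implied by each format:

```python
# A minimal sketch showing how the chroma plane dimensions relate to the
# luma plane for the YUV sampling formats listed above.
def chroma_plane_size(luma_w: int, luma_h: int, fmt: str) -> tuple[int, int]:
    """Return (width, height) of each chroma plane for a given YUV format."""
    if fmt == "4:2:0":   # chroma subsampled 2x horizontally and vertically
        return luma_w // 2, luma_h // 2
    if fmt == "4:2:2":   # chroma subsampled 2x horizontally only
        return luma_w // 2, luma_h
    if fmt == "4:4:4":   # no chroma subsampling
        return luma_w, luma_h
    raise ValueError(f"unknown format: {fmt}")

# e.g. a 1920x1080 4:2:0 frame carries two 960x540 chroma planes
assert chroma_plane_size(1920, 1080, "4:2:0") == (960, 540)
```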
- the video encoder 200 reads video data, and for each frame of image in the video data, divides one frame of image into several coding tree units (coding tree units, CTU).
- A CTU may also be called a "tree block", a "largest coding unit" (LCU for short), or a "coding tree block" (CTB for short).
- Each CTU can be associated with an equal-sized block of pixels within the image.
- Each pixel can correspond to one luminance (luminance or luma) sample and two chrominance (chrominance or chroma) samples. Therefore, each CTU can be associated with one block of luma samples and two blocks of chroma samples.
- a CTU size is, for example, 128×128, 64×64, 32×32, etc.
- a CTU can be further divided into several coding units (Coding Units, CUs) for encoding.
- CUs can be rectangular blocks or square blocks.
- A CU can be further divided into prediction units (PU for short) and transform units (TU for short), so that coding, prediction, and transformation are separated, making processing more flexible.
- the CTU is divided into CUs in a quad-tree manner, and the CU is divided into TUs and PUs in a quad-tree manner.
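- As a minimal sketch of this quad-tree division (the split callback is a hypothetical stand-in for the encoder's rate-distortion decision, and the 8×8 minimum CU size follows the HEVC figure given later in this text), the following recursion enumerates the leaf CUs of a CTU:

```python
# A block either stays a CU or splits into four equal quadrants, recursively.
def quadtree_partition(x, y, size, split_decision, min_cu=8):
    """Yield (x, y, size) for each leaf CU. `split_decision(x, y, size)`
    is assumed to be an encoder-side callback (e.g. driven by RD cost)."""
    if size > min_cu and split_decision(x, y, size):
        half = size // 2
        for dx in (0, half):
            for dy in (0, half):
                yield from quadtree_partition(x + dx, y + dy, half,
                                              split_decision, min_cu)
    else:
        yield (x, y, size)

# e.g. split every block larger than 32: a 64x64 CTU yields four 32x32 CUs
cus = list(quadtree_partition(0, 0, 64, lambda x, y, s: s > 32))
assert cus == [(0, 0, 32), (0, 32, 32), (32, 0, 32), (32, 32, 32)]
```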
- Video encoders and video decoders can support various PU sizes. Assuming that the size of a specific CU is 2N×2N, the video encoder and video decoder can support a PU size of 2N×2N or N×N for intra prediction, and support 2N×2N, 2N×N, N×2N, N×N, or similarly sized symmetric PUs for inter prediction. The video encoder and video decoder can also support 2N×nU, 2N×nD, nL×2N, and nR×2N asymmetric PUs for inter prediction.
- the video encoder 200 may include: a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, a loop filtering unit 260, a decoded image cache 270, and an entropy encoding unit 280. It should be noted that the video encoder 200 may include more, fewer, or different functional components.
- the current block may be called the current coding unit (CU) or the current prediction unit (PU), etc.
- the prediction block may also be called a predicted image block or an image prediction block
- the reconstructed image block may also be called a reconstruction block.
- prediction unit 210 includes inter prediction unit 211 and intra estimation unit 212. Since there is a strong correlation between adjacent pixels in a video frame, the intra-frame prediction method is used in video encoding and decoding technology to eliminate the spatial redundancy between adjacent pixels. Since there is a strong similarity between adjacent frames in the video, the interframe prediction method is used in video coding and decoding technology to eliminate the temporal redundancy between adjacent frames, thereby improving coding efficiency.
- the inter-frame prediction unit 211 can be used for inter-frame prediction.
- Inter-frame prediction can include motion estimation and motion compensation, and can refer to image information of different frames.
- Inter-frame prediction uses motion information to find a reference block from a reference frame and generates a prediction block based on the reference block to eliminate temporal redundancy. The frames used in inter-frame prediction can be P frames and/or B frames, where P frames are forward-predicted frames and B frames are bidirectionally predicted frames.
- Inter-frame prediction uses motion information to find reference blocks from reference frames and generate prediction blocks based on the reference blocks.
- the motion information includes the reference frame list where the reference frame is located, the reference frame index, and the motion vector.
- the motion vector can be in whole pixels or sub-pixels.
- the block of whole pixels or sub-pixels found in the reference frame according to the motion vector is called a reference block.
- Some technologies use the reference block directly as the prediction block, while others process the reference block to generate the prediction block. Generating a prediction block based on a reference block can also be understood as taking the reference block as an initial prediction block and then processing it to generate the prediction block of the current block.
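- As a minimal sketch of motion compensation (whole-pixel motion only; sub-pixel interpolation and reference-block processing are omitted, and the helper is a hypothetical illustration rather than any standard's procedure):

```python
import numpy as np

def motion_compensate(ref_frame: np.ndarray, x: int, y: int,
                      w: int, h: int, mv: tuple[int, int]) -> np.ndarray:
    """Copy the w x h block at (x+mvx, y+mvy) from the reference frame
    to serve as the prediction block for the block at (x, y)."""
    mvx, mvy = mv
    return ref_frame[y + mvy: y + mvy + h, x + mvx: x + mvx + w].copy()
```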
- the intra-frame estimation unit 212 only refers to the information of the same frame image and predicts the pixel information in the current coded image block to eliminate spatial redundancy.
- the frames used in intra prediction may be I frames.
- Intra-frame prediction has multiple prediction modes. Taking the international digital video coding standard H series as an example, the H.264/AVC standard has 8 angular prediction modes and 1 non-angular prediction mode, and H.265/HEVC extends this to 33 angular prediction modes and 2 non-angular prediction modes.
- the intra-frame prediction modes used by HEVC include planar mode (Planar), DC mode, and 33 angular modes, for a total of 35 prediction modes.
- the intra-frame modes used by VVC include Planar, DC, and 65 angular modes, for a total of 67 prediction modes.
- Residual unit 220 may generate a residual block of the CU based on the pixel block of the CU and the prediction block of the PU of the CU. For example, residual unit 220 may generate a residual block of the CU such that each sample in the residual block has a value equal to the difference between a sample in the pixel block of the CU and the corresponding sample in the prediction block of the PU of the CU.
- Transform/quantization unit 230 may quantize the transform coefficients. Transform/quantization unit 230 may quantize transform coefficients associated with the TU of the CU based on quantization parameter (QP) values associated with the CU. Video encoder 200 may adjust the degree of quantization applied to transform coefficients associated with the CU by adjusting the QP value associated with the CU.
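- As a minimal sketch of QP-controlled scalar quantization (the step-size mapping, which doubles roughly every 6 QP values as in H.265/HEVC, is an illustrative assumption rather than a formula from this patent):

```python
# Larger QP -> larger step size -> coarser quantization of the coefficients.
def quantize(coeff: float, qp: int) -> int:
    qstep = 0.625 * 2 ** (qp / 6)   # approximate HEVC-style step size
    return round(coeff / qstep)

def dequantize(level: int, qp: int) -> float:
    qstep = 0.625 * 2 ** (qp / 6)
    return level * qstep
```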
- Inverse transform/quantization unit 240 may apply inverse quantization and inverse transform to the quantized transform coefficients, respectively, to reconstruct the residual block from the quantized transform coefficients.
- Reconstruction unit 250 may add samples of the reconstructed residual block to corresponding samples of one or more prediction blocks generated by prediction unit 210 to produce a reconstructed image block associated with the TU. By reconstructing blocks of samples for each TU of a CU in this manner, video encoder 200 can reconstruct blocks of pixels of the CU.
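- As a minimal sketch of this reconstruction step (a hypothetical helper; the clipping range simply keeps samples within the valid bit depth):

```python
import numpy as np

def reconstruct(pred: np.ndarray, resid: np.ndarray,
                bit_depth: int = 8) -> np.ndarray:
    """Add the reconstructed residual to the prediction and clip
    to the valid sample range."""
    return np.clip(pred.astype(np.int32) + resid, 0,
                   (1 << bit_depth) - 1).astype(pred.dtype)
```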
- the loop filtering unit 260 is used to process the inversely transformed and inversely quantized pixels to compensate for distortion information and provide a better reference for subsequent encoding of pixels. For example, a deblocking filtering operation can be performed to reduce the blocking effect of the pixel blocks associated with the CU.
- the loop filtering unit 260 includes a deblocking filtering unit and a sample adaptive compensation/adaptive loop filtering (SAO/ALF) unit, where the deblocking filtering unit is used to remove blocking effects, and the SAO/ALF unit is used to remove ringing effects.
- Decoded image cache 270 may store reconstructed pixel blocks.
- Inter prediction unit 211 may perform inter prediction on PUs of other images using reference images containing reconstructed pixel blocks.
- intra estimation unit 212 may use the reconstructed pixel blocks in decoded image cache 270 to perform intra prediction on other PUs in the same image as the CU.
- Entropy encoding unit 280 may receive the quantized transform coefficients from transform/quantization unit 230 . Entropy encoding unit 280 may perform one or more entropy encoding operations on the quantized transform coefficients to generate entropy encoded data.
- Figure 3 is a schematic block diagram of a video decoder related to an embodiment of the present disclosure.
- the video decoder 300 includes an entropy decoding unit 310 , a prediction unit 320 , an inverse quantization/transformation unit 330 , a reconstruction unit 340 , a loop filtering unit 350 and a decoded image cache 360 . It should be noted that the video decoder 300 may include more, less, or different functional components.
- Video decoder 300 can receive the code stream.
- Entropy decoding unit 310 may parse the codestream to extract syntax elements from the codestream. As part of parsing the code stream, the entropy decoding unit 310 may parse entropy-encoded syntax elements in the code stream.
- the prediction unit 320, the inverse quantization/transformation unit 330, the reconstruction unit 340 and the loop filtering unit 350 may decode the video data according to the syntax elements extracted from the code stream, that is, generate decoded video data.
- prediction unit 320 includes inter prediction unit 321 and intra estimation unit 322.
- Intra estimation unit 322 may perform intra prediction to generate predicted blocks for the PU. Intra estimation unit 322 may use an intra prediction mode to generate predicted blocks for a PU based on pixel blocks of spatially neighboring PUs. Intra estimation unit 322 may also determine the intra prediction mode of the PU based on one or more syntax elements parsed from the codestream.
- the inter prediction unit 321 may construct a first reference image list (List 0) and a second reference image list (List 1) according to syntax elements parsed from the code stream. Additionally, if the PU uses inter-prediction encoding, entropy decoding unit 310 may parse the motion information of the PU. Inter prediction unit 321 may determine one or more reference blocks for the PU based on the motion information of the PU. Inter prediction unit 321 may generate a predictive block for the PU based on one or more reference blocks of the PU.
- Inverse quantization/transform unit 330 may inversely quantize (i.e., dequantize) transform coefficients associated with a TU. Inverse quantization/transform unit 330 may use the QP value associated with the CU of the TU to determine the degree of quantization.
- inverse quantization/transform unit 330 may apply one or more inverse transforms to the inverse quantized transform coefficients to produce a residual block associated with the TU.
- Reconstruction unit 340 uses the residual blocks associated with the TU of the CU and the prediction blocks of the PU of the CU to reconstruct the pixel blocks of the CU. For example, reconstruction unit 340 may add samples of the residual block to corresponding samples of the prediction block to reconstruct the pixel block of the CU to obtain a reconstructed image block.
- Loop filtering unit 350 may perform deblocking filtering operations to reduce blocking artifacts for blocks of pixels associated with the CU.
- Video decoder 300 may store the reconstructed image of the CU in decoded image cache 360 .
- the video decoder 300 may use the reconstructed image in the decoded image cache 360 as a reference image for subsequent prediction, or transmit the reconstructed image to a display device for presentation.
- the basic process of video encoding and decoding is as follows: at the encoding end, an image frame is divided into blocks.
- the prediction unit 210 uses intra prediction or inter prediction to generate a prediction block of the current block.
- the residual unit 220 may calculate a residual block based on the prediction block and the original block of the current block, that is, the difference between the prediction block and the original block of the current block.
- the residual block may also be called residual information.
- the residual block undergoes transformation and quantization by the transform/quantization unit 230 to remove information that is insensitive to human eyes, thereby eliminating visual redundancy.
- the residual block before transformation and quantization by the transform/quantization unit 230 may be called a time domain residual block, and the time domain residual block after transformation and quantization by the transform/quantization unit 230 may be called a frequency residual block or a frequency domain residual block.
- the entropy encoding unit 280 receives the quantized transform coefficients output by the transform/quantization unit 230, and may perform entropy encoding on the quantized transform coefficients to output a code stream. For example, the entropy encoding unit 280 may eliminate character redundancy according to the target context model and the probability information of the binary code stream.
- the entropy decoding unit 310 can parse the code stream to obtain the prediction information, quantization coefficient matrix, etc. of the current block.
- the prediction unit 320 uses intra prediction or inter prediction for the current block based on the prediction information to generate a prediction block of the current block.
- the inverse quantization/transform unit 330 uses the quantization coefficient matrix obtained from the code stream to perform inverse quantization and inverse transformation on the quantization coefficient matrix to obtain a residual block.
- the reconstruction unit 340 adds the prediction block and the residual block to obtain a reconstruction block.
- the reconstructed blocks constitute a reconstructed image, and the loop filtering unit 350 performs loop filtering on the reconstructed image based on the image or based on the blocks to obtain a decoded image.
- the encoding end also needs to perform operations similar to those of the decoding end to obtain the decoded image.
- the decoded image may also be called a reconstructed image, and the reconstructed image may be used as a reference frame for inter-frame prediction of subsequent frames.
- the block division information determined by the encoding end as well as mode information or parameter information such as prediction, transformation, quantization, entropy coding, loop filtering, etc., are carried in the code stream when necessary.
- the decoding end determines, by parsing the code stream and analyzing the existing information, the same block division information as the encoding end, as well as the same prediction, transformation, quantization, entropy coding, loop filtering, and other mode information or parameter information, thereby ensuring that the decoded image obtained by the encoding end is the same as the decoded image obtained by the decoding end.
- the above is the basic process of the video codec under the block-based hybrid coding framework. With the development of technology, some modules or steps of the framework or process may be optimized. The present disclosure is applicable to the basic process of the video codec under the block-based hybrid coding framework, but is not limited to this framework and process.
- Each frame in the video is divided into square largest coding units (LCU) or coding tree units (CTU) of the same size (such as 128x128, 64x64, etc.).
- Each largest coding unit or coding tree unit can be divided into rectangular coding units (CU) according to rules.
- In H.264/AVC, the input image is divided into fixed-size blocks as the basic unit of encoding, called macroblocks (MB, Macro Block), each including one luminance block and two chrominance blocks.
- the block size is 16×16. If 4:2:0 sampling is used, the chroma block size is half the luma block size.
- macroblocks are further divided into small blocks for prediction according to different prediction modes.
- intra-frame prediction macroblocks can be divided into small blocks of 16×16, 8×8, and 4×4, and each small block is subjected to intra-frame prediction separately.
- the macroblock is divided into 4×4 or 8×8 small blocks, and the prediction residuals in each small block are transformed and quantized respectively to obtain the quantized coefficients.
- Compared with H.264/AVC, H.265/HEVC has taken improvement measures in multiple encoding aspects.
- a CTU includes a luma coding tree block (CTB, Coding Tree Block) and two chroma coding tree blocks.
- the maximum size of the CU in the H.265/HEVC standard is generally 64×64.
- the CTU is iteratively divided into a series of coding units (CU, Coding Unit) using the quadtree (QT, Quad Tree) method.
- CU is the basic unit of intra-frame/inter-frame coding.
- a CU contains one luma coding block (CB, Coding Block) and two chroma coding blocks and related syntax structures.
- the maximum CU size is the CTU size, and the minimum CU size is 8×8.
- the leaf node CU obtained after coding tree division can be divided into three types according to different prediction methods: intra CU for intra-frame prediction, inter CU for inter-frame prediction and skipped CU.
- skipped CU can be regarded as a special case of inter CU, which does not contain motion information and prediction residual information.
- the leaf node CU contains one or more prediction units (PU, Prediction Unit).
- H.265/HEVC supports PU sizes from 4×4 to 64×64, with a total of eight division modes.
- In intra coding mode, there are two possible partitioning modes: Part_2Nx2N and Part_NxN.
- the prediction residual of a CU is divided into transform units (TU, Transform Unit) using a residual quadtree.
- a TU contains a luminance transform block (TB, Transform Block) and two chroma transform blocks. Only square divisions are allowed, dividing a CB into 1 or 4 TBs.
- all samples within a TU share the same transformation and quantization process, and the supported sizes range from 4×4 to 32×32.
- TBs can span the boundaries of PBs to further maximize the coding efficiency of inter-frame coding.
- In H.266/VVC, the image to be coded is first divided into coding tree units (CTU) similar to H.265/HEVC, but the maximum size is increased from 64×64 to 128×128.
- H.266/VVC proposes quadtree and nested multi-type tree (MTT, Multi-Type Tree) division.
- MTT includes binary tree (BT, Binary Tree) and ternary tree (TT, Ternary Tree) partitioning, unifies the CU, PU, and TU concepts of H.265/HEVC, and supports more flexible CU division shapes.
- the CTU is first divided according to the quadtree structure, and the leaf nodes are further divided through MTT; the multi-type tree leaf nodes become the coding units (CU).
- chroma can adopt a separate partition tree structure and does not have to be consistent with the luminance partition tree.
- the chroma division of I frames in H.266/VVC uses a chroma separation tree, while the chroma division of P frames and B frames is consistent with the luminance division.
- Intra sub-partition is an intra prediction technology based on sub-block partitioning.
- a coding block will continue to be divided horizontally or vertically into multiple (such as 2 or 4) sub-blocks of the same size, and each sub-block is decoded and reconstructed in turn.
- a 16x8 current block can be divided into four 16x2 sub-blocks horizontally, or divided into four 4x8 sub-blocks vertically.
- a 4x8 block can be divided into two 4x4 sub-blocks or two 2x8 sub-blocks.
- For 4x8 and 8x4 blocks, the number of sub-blocks can only be 2. Since a sub-block must contain at least 16 pixels, a 4x4 block will not be divided further, and no corresponding syntax declaration is needed at the encoding and decoding ends.
- the specific division rules are shown in Table 3. Whether to divide horizontally or vertically is decided by the RDO criterion at the encoding side. To save encoding time, the divided intra-frame sub-blocks share the intra-frame prediction mode of the current block (that is, the sub-blocks obtained by dividing the current block have the same intra-frame prediction mode as the current block).
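- As a minimal sketch of these division rules (an assumption-based illustration distilled from the examples above, not the normative Table 3):

```python
# Blocks with more than 16 samples split into 2 or 4 equal sub-blocks,
# horizontally or vertically; a 4x4 block is never divided.
def isp_subblocks(w: int, h: int, direction: str) -> list[tuple[int, int]]:
    """Return the sub-block sizes for an ISP split of a w x h block."""
    if w * h <= 16:                      # e.g. 4x4: never divided
        return [(w, h)]
    n = 2 if w * h == 32 else 4          # 4x8 / 8x4 split into 2, else 4
    if direction == "horizontal":        # stack sub-blocks top to bottom
        return [(w, h // n)] * n
    return [(w // n, h)] * n             # side-by-side, left to right

assert isp_subblocks(16, 8, "horizontal") == [(16, 2)] * 4
assert isp_subblocks(4, 8, "vertical") == [(2, 8)] * 2
assert isp_subblocks(4, 4, "horizontal") == [(4, 4)]
```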
- the image to be encoded is divided into multiple non-overlapping CTU blocks.
- Each CTU is processed in sequence according to raster scanning order, and the CTU is divided into several CUs in different ways.
- the main steps to determine the coding block division method are as follows: For a certain CTU, use conventional block division methods (quadtree, ternary tree, binary tree) for block division.
- For a certain division method Split[i], perform intra prediction in different prediction modes on each CU, and select the optimal conventional intra prediction mode bestRegIntraMode[i] and prediction cost bestRegIntraCost[i]. If the block size of a certain CU meets the ISP condition, continue to perform ISP subdivision on the CU to obtain several sub-blocks.
- prediction is performed according to the optimal block division method, and the resulting residual block is transformed, quantized, and entropy coded.
- the prediction information such as block division mode and prediction mode is encoded, and the code stream is output.
- Step S401 Obtain intra-frame sub-block division information.
- the intra-frame sub-block division information is used to indicate whether the current coding block uses intra-frame sub-block division.
- the encoder receives a video stream, which is composed of a series of image frames, and performs video encoding on each frame of image in the video stream.
- the video encoder divides the image into blocks to obtain the current encoding block.
- the current coding block is also called a current block, a current image block, a coding block, a current coding unit, a current block to be coded, a current image block to be coded, etc.
- the current coding block can also be divided into intra-frame sub-blocks.
- the indication information may be used to indicate whether to divide the current coding block into intra-frame sub-blocks.
- a flag bit can be used to represent the intra-frame sub-block division information. When the flag bit takes the first value, it indicates that the current block uses intra-frame sub-block division. When the flag bit takes the second value, it indicates that the current block does not use intra-frame sub-block division.
- the intra-frame sub-block division information can be obtained by calculating spatial continuity parameters.
- the spatial continuity parameter can represent the continuity of changes in pixels in a given image area.
- In some embodiments, when the continuity parameter is greater than or equal to a certain threshold, the intra-subblock partitioning information indicates that the current block uses intra-subblock partitioning; when the continuity parameter is less than the threshold, the intra-subblock partitioning information indicates that the current block does not use intra-subblock partitioning.
- In other embodiments, when the continuity parameter is greater than the threshold, the intra-subblock partitioning information indicates that the current block uses intra-subblock partitioning; when the continuity parameter is less than or equal to the threshold, the intra-subblock partitioning information indicates that the current block does not use intra-subblock partitioning.
- a given image region may be a single image, a single slice, a single coding tree unit, or a single coding block. In other embodiments, a given image region may be an image region composed of multiple spatially adjacent slices, multiple spatially adjacent coding tree units, or multiple spatially adjacent coding blocks. It can be understood that the method of calculating the spatial continuity parameter for a given image area can be applied to the above various possible given image areas, and the embodiments of the present disclosure do not limit this.
- the methods of determining the spatial continuity parameters of a given image area include but are not limited to the following methods:
- In one implementation, step S401 includes the following steps S401-A1 and S401-A2:
- S401-A1: Determine the gradient map of a given image area, and determine the gradient map of that gradient map.
- S401-A2: Determine the continuity parameter of the image area according to the gradient map of the gradient map.
- the gradient map of the image area can reflect the change trend of the pixels in the image area, and the gradient map of the gradient map of the image area can reflect the change rate of the pixels in the image area, that is, the continuity of change. Therefore, in this manner, the continuity parameter of the image area is determined by calculating the gradient map of the gradient map of the image area.
- the gradient map Gmap of the image area is determined according to the following formula (1):
- Gmap(x, y) = |I(x+1, y) − I(x, y)| + |I(x, y+1) − I(x, y)|  (1)
- where I(x, y) is the pixel value of image area I at position (x, y), I(x+1, y) is the pixel value of image area I at position (x+1, y), I(x, y+1) is the pixel value of image area I at position (x, y+1), and Gmap(x, y) is the gradient value of the pixel at position (x, y) in image area I.
- the gradient map GGmap of the gradient map of the image area is determined according to the following formula (2):
- GGmap(x, y) = |Gmap(x+1, y) − Gmap(x, y)| + |Gmap(x, y+1) − Gmap(x, y)|  (2)
- where Gmap(x+1, y) is the gradient value of the pixel at position (x+1, y) in image area I, Gmap(x, y+1) is the gradient value of the pixel at position (x, y+1) in image area I, both of which can be determined according to the above formula (1), and GGmap(x, y) is the gradient value of the gradient value of the pixel at position (x, y) in image area I.
- the above formula (1) and formula (2) show how the gradient value, and the gradient value of the gradient value, are determined for the pixel (x, y) of image area I. For the other pixels in image area I, the gradient value of each pixel is determined in the same way as for pixel (x, y), yielding the gradient map Gmap of image area I. The above formula (2) is then used to calculate the gradient value corresponding to each pixel in the gradient map Gmap of image area I, yielding the gradient map GGmap of the gradient map of image area I.
- the above formula (1) and formula (2) are just examples. Embodiments of the present disclosure can also use other methods to determine the gradient map of the image area and the gradient map of the gradient map. For example, formula (1) and formula (2) may be modified, and the gradient map, as well as the gradient map of the gradient map, determined using the modified formulas.
- the continuity parameter of the image area is determined based on the gradient map GGmap of the gradient map.
- Embodiments of the present disclosure do not limit the method of determining the continuity parameter of the image area based on the gradient map of the gradient map in S401-A2 above.
- In one example, the median of the gradient values in the gradient map GGmap of the gradient map of the image region is determined as the continuity parameter of the image region.
- In another example, the average of the gradient values in the gradient map GGmap of the gradient map of the image region is determined as the continuity parameter of the image region.
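- As a minimal sketch of this computation (assuming the forward-difference form of formulas (1) and (2) above; the function names and the edge handling at the last row/column are illustrative assumptions):

```python
import numpy as np

def gradient_map(img: np.ndarray) -> np.ndarray:
    """Formula (1): |I(x+1,y) - I(x,y)| + |I(x,y+1) - I(x,y)|."""
    p = img.astype(np.int32)
    gx = np.abs(np.diff(p, axis=1, append=p[:, -1:]))  # horizontal difference
    gy = np.abs(np.diff(p, axis=0, append=p[-1:, :]))  # vertical difference
    return gx + gy

def continuity_parameter(region: np.ndarray, use_median: bool = True) -> float:
    # Formula (2): the gradient map of the gradient map, then its median/mean.
    ggmap = gradient_map(gradient_map(region))
    return float(np.median(ggmap) if use_median else np.mean(ggmap))
```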
- intra-frame sub-block division makes more detailed divisions of coding blocks based on correlation in the spatial domain. Since the sub-blocks obtained by division have strong correlation, the residual of intra prediction can be further reduced, thereby improving compression efficiency. However, while improving the compression rate, intra-frame sub-block division adds more block division possibilities, resulting in more rate-distortion optimization calculations and greatly increased coding complexity. For video images with weak spatial correlation, the performance gain brought by intra-frame sub-block division is very limited, yet it still significantly increases the coding complexity.
- the continuity parameter of the image area is determined by determining the gradient map of the image area. The calculation method is simple and accurate, and the calculation cost is small, which further improves the coding efficiency.
- Step S402: When the intra-frame sub-block division information indicates that the current block uses intra-frame sub-block division, determine, based on at least one intra-frame sub-block division, at least one first intra-frame prediction mode and the prediction cost corresponding to the at least one first intra-frame prediction mode.
- For a coding block that can be divided into intra-frame sub-blocks, it can be further divided into multiple sub-blocks.
- the sub-block division method is not unique; that is, there are multiple sub-block divisions for the current block. In order to perform predictive coding on the current block, the prediction situations of the multiple sub-block divisions need to be considered.
- first intra prediction modes are used to predict the current block. It can be understood that the same first intra prediction mode is used for prediction on several sub-blocks of the current block. That is, based on a certain first intra prediction mode, intra prediction is performed on sub-blocks of the current block in sequence to obtain prediction values of all sample positions of the current block.
- the first intra-frame prediction modes include but are not limited to planar mode, direct current mode DC, multiple angle modes, matrix weighted intra-frame prediction mode (Matrix weighted intra-frame prediction, MIP), etc.
- Prediction cost is usually used to characterize the combined measure of code stream overhead and image distortion incurred when encoding is completed using a prediction mode.
- Commonly used prediction costs include but are not limited to rate distortion costs.
- Step S403 Determine at least one first other prediction mode and a prediction cost corresponding to the at least one first other prediction mode.
- the first other prediction modes include the inter prediction mode, combined inter and intra prediction (CIIP), etc.
- Prediction cost is usually used to characterize the combined measure of code stream overhead and image distortion incurred when encoding is completed using a prediction mode. Commonly used prediction costs include but are not limited to rate distortion costs.
- Step S404 Determine the prediction mode with the lowest prediction cost among at least one first intra prediction mode and at least one first other prediction mode as the encoding prediction mode of the current block, and use the prediction mode with the lowest cost to predict the current block.
- In step S402 and step S403, the current block has been predicted using at least one first intra prediction mode and at least one first other prediction mode to obtain the prediction costs corresponding to these prediction modes. By comparing these prediction costs, the prediction mode and prediction cost actually used in predictive coding can be determined, and the current block can be predicted based on that prediction mode.
- the prediction mode with the smallest prediction cost is determined as the prediction mode used in predictive encoding.
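- As a minimal sketch of this minimum-cost selection (the candidate names and cost values are hypothetical):

```python
def select_mode(costs: dict) -> tuple:
    """costs maps each candidate prediction mode to its prediction cost;
    returns the (mode, cost) pair with the smallest cost."""
    return min(costs.items(), key=lambda kv: kv[1])

# e.g. hypothetical costs for ISP intra candidates vs. other candidates
best = select_mode({"planar+ISP": 120.5, "DC+ISP": 131.0, "inter": 118.2})
assert best == ("inter", 118.2)
```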
- Step S405 Write the intra-frame sub-block division information into a code stream.
- the indication information may be used to indicate whether the current coding block is divided into intra-frame sub-blocks.
- a flag bit can be used to represent the intra-frame sub-block division information, and the flag bit is written into the code stream.
- the flag bit is a flag bit located at the sequence level.
- when the flag bit takes the first value, it indicates that, in the current image sequence, at least one block in the images uses intra-frame sub-block division.
- when the flag bit takes the second value, it indicates that all blocks in the image sequence do not use intra-frame sub-block division.
- the flag bit is a flag bit located at the image level.
- when the flag bit takes the first value, it indicates that, in the current image, at least one block in the slices uses intra-frame sub-block division.
- when the flag bit takes the second value, it indicates that all blocks in the image do not use intra-frame sub-block division.
- the flag bit is a flag bit located at the slice level.
- when the flag bit takes the first value, it indicates that, in the current slice, at least one block in the coding tree units uses intra-frame sub-block division.
- when the flag bit takes the second value, it indicates that all blocks in the slice do not use intra-frame sub-block division.
- the flag bit is a flag bit located at the coding tree level.
- when the flag bit takes the first value, it indicates that, in the current coding tree unit, at least one block uses intra-frame sub-block division.
- when the flag bit takes the second value, it indicates that all blocks in the coding tree unit do not use intra-frame sub-block division.
- the flag bit is a flag bit located at the block level. When the flag bit takes the first value, it indicates that the current block uses intra-frame sub-block division. When the flag bit takes the second value, it indicates that the current block does not use intra-frame sub-block division.
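- As a minimal sketch of how these levels interact (a hypothetical helper; the level names and the True/False flag convention are assumptions standing in for the first/second values described above):

```python
LEVELS = ("sequence", "picture", "slice", "coding_tree", "block")

def isp_allowed(flags: dict) -> bool:
    """flags maps a level name to its flag value (True = first value).
    The current block can use intra sub-block division only if no
    enclosing level has signalled that none of its blocks use it."""
    return all(flags.get(level, True) for level in LEVELS)

assert isp_allowed({"sequence": True, "block": True})
assert not isp_allowed({"sequence": True, "picture": False})
```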
- Step S501 Obtain intra-frame sub-block division information
- the encoder receives a video stream, which is composed of a series of image frames, and performs video encoding on each frame of image in the video stream.
- the video encoder divides the image frames into blocks to obtain the current encoding block.
- the current coding block is also called a current block, a current image block, a coding block, a current coding unit, a current block to be coded, a current image block to be coded, etc.
- the current coding block can also be divided into intra-frame sub-blocks.
- the indication information may be used to indicate whether to divide the current coding block into intra-frame sub-blocks.
- a flag bit may be used to represent the intra-frame sub-block division information. When the flag bit takes the first value, it indicates that the current block uses intra-frame sub-block division. When the flag bit takes the second value, it indicates that the current block does not use intra-frame sub-block division.
- the intra-frame sub-block division information can be obtained by calculating spatial continuity parameters.
- the spatial continuity parameter can represent the continuity of changes in pixels in a given image area.
- In some embodiments, when the continuity parameter is greater than or equal to a certain threshold, the intra-subblock partitioning information indicates that the current block uses intra-subblock partitioning; when the continuity parameter is less than the threshold, the intra-subblock partitioning information indicates that the current block does not use intra-subblock partitioning.
- In other embodiments, when the continuity parameter is greater than the threshold, the intra-subblock partitioning information indicates that the current block uses intra-subblock partitioning; when the continuity parameter is less than or equal to the threshold, the intra-subblock partitioning information indicates that the current block does not use intra-subblock partitioning.
- a given image region may be a single image, a single slice, a single coding tree unit, or a single coding block. In other embodiments, a given image region may be an image region composed of multiple spatially adjacent slices, multiple spatially adjacent coding tree units, or multiple spatially adjacent coding blocks. It can be understood that the method of calculating the spatial continuity parameter for a given image area can be applied to the above various possible given image areas, and the embodiments of the present disclosure do not limit this.
- step S501 includes steps S401-A1 and S401-A2.
- intra-frame sub-block division makes a more fine-grained division of coding blocks based on correlation in the spatial domain. Since the sub-blocks obtained by division have strong correlation, the residuals of intra-frame prediction can be further reduced, thereby improving compression efficiency. Obviously, intra-frame sub-block division will increase the possibility of block division while improving the compression rate, which will lead to more rate-distortion optimization calculations and greatly increase the coding complexity. For video images with weak spatial correlation, the performance gain brought by intra-frame sub-block division will be very limited, but it will still significantly increase the coding complexity.
- the continuity parameter of the image area is determined by determining the gradient map of the image area. The calculation method is simple and accurate, and the calculation overhead is small, which further improves the coding efficiency.
- Step S502 When the intra-frame sub-block division information indicates that the current block does not use intra-frame sub-block division, determine at least one second intra-frame prediction mode and the prediction cost corresponding to the at least one second intra-frame prediction mode.
- In this case, the current block is predictively encoded as a whole.
- the current block is predicted using different second intra prediction modes respectively.
- the second intra-frame prediction mode that can be used includes but is not limited to planar mode, direct current mode DC, multiple angle modes, matrix weighted intra-frame prediction mode (Matrix weighted intra-frame prediction, MIP), etc.
- Prediction cost is usually used to characterize the combined measure of code stream overhead and image distortion incurred when encoding is completed using a prediction mode.
- Commonly used prediction costs include but are not limited to rate distortion costs.
- Step S503 Determine at least one second other prediction mode and the prediction cost corresponding to the at least one second other prediction mode.
- the second other prediction modes include the inter prediction mode, combined inter and intra prediction, etc.
- Prediction cost is usually used to characterize the combined measure of code stream overhead and image distortion incurred when encoding is completed using a prediction mode. Commonly used prediction costs include but are not limited to rate distortion costs.
- Step S504 Determine the prediction mode with the smallest prediction cost among at least one second intra prediction mode and at least one second other prediction mode to be the encoding prediction mode of the current block, and use the prediction mode with the smallest cost to predict the current block.
- In step S502 and step S503, the current block has been predicted using at least one second intra prediction mode and at least one second other prediction mode to obtain the prediction costs corresponding to these prediction modes. By comparing these prediction costs, the prediction mode and prediction cost actually used in predictive coding can be determined, and the current block can be predicted based on that prediction mode.
- the prediction mode with the smallest prediction cost is determined as the prediction mode used in predictive encoding.
- Step S505 Write the intra-frame sub-block division information into a code stream.
- the indication information may be used to indicate whether the current coding block is divided into intra-frame sub-blocks.
- a flag bit can be used to represent the intra-frame sub-block division information, and the flag bit is written into the code stream.
- the flag bit is a flag bit located at the sequence level.
- when the flag bit takes the first value, it indicates that, in the current image sequence, at least one block in the images uses intra-frame sub-block division.
- when the flag bit takes the second value, it indicates that all blocks in the image sequence do not use intra-frame sub-block division.
- the flag bit is a flag bit located at the image level.
- when the flag bit takes the first value, it indicates that, in the current image, at least one block in the slices uses intra-frame sub-block division.
- when the flag bit takes the second value, it indicates that all blocks in the image do not use intra-frame sub-block division.
- the flag bit is a flag bit located at the slice level.
- when the flag bit takes the first value, it indicates that, in the current slice, at least one block in the coding tree units uses intra-frame sub-block division.
- when the flag bit takes the second value, it indicates that all blocks in the slice do not use intra-frame sub-block division.
- the flag bit is a flag bit located at the coding tree level.
- when the flag bit takes the first value, it indicates that, in the current coding tree unit, at least one block uses intra-frame sub-block division.
- when the flag bit takes the second value, it indicates that all blocks in the coding tree unit do not use intra-frame sub-block division.
- the flag bit is a flag bit located at the block level. When the flag bit takes the first value, it indicates that the current block uses intra-frame sub-block division. When the flag bit takes the second value, it indicates that the current block does not use intra-frame sub-block division.
- Step S601 Obtain the image frame to be encoded, and calculate the spatial continuity parameter of the image frame.
- the spatial continuity parameter can represent the continuity of changes in pixels in a given image area.
- methods for determining the spatial continuity parameters of a given image area include but are not limited to the following methods:
- In one way, the continuity parameter of the image area is determined by calculating the gradient map of the gradient map of the image area.
- the gradient map of the image area can reflect the change trend of the pixels in the image area, and the gradient map of the gradient map of the image area can reflect the change rate of the pixels in the image area, that is, the continuity of change. Therefore, in this manner, the continuity parameter of the image area is determined by calculating the gradient map of the gradient map of the image area.
- the gradient map Gmap of the image area is determined according to the following formula (1):
- Gmap(x, y) = |I(x+1, y) − I(x, y)| + |I(x, y+1) − I(x, y)|  (1)
- where I(x, y) is the pixel value of image area I at position (x, y), I(x+1, y) is the pixel value of image area I at position (x+1, y), I(x, y+1) is the pixel value of image area I at position (x, y+1), and Gmap(x, y) is the gradient value of the pixel at position (x, y) in image area I.
- the gradient map GGmap of the gradient map of the image area is determined according to the following formula (2):
GGmap(x, y) = |Gmap(x, y) - Gmap(x+1, y)| + |Gmap(x, y) - Gmap(x, y+1)|    (2)
- Gmap(x+1,y) is the gradient value of the pixel at position (x+1,y) in image area I
- Gmap(x,y+1) is the gradient value of the pixel at position (x,y+1) in image area I
- Gmap(x+1,y) and Gmap(x,y+1) can be determined according to the above formula (1)
- GGmap(x,y) is the gradient value of the gradient value of the pixel at position (x,y) in image area I.
- the above formula (1) and formula (2) show how the gradient value, and the gradient value of the gradient value, are determined for pixel point (x, y) of image area I. Applying the same procedure to every other pixel point in image area I gives the gradient value of each pixel in the image area, i.e. the gradient map Gmap of image area I. Applying formula (2) to the gradient value corresponding to each pixel in the gradient map Gmap then gives GGmap, the gradient map of the gradient map of image area I.
- the above formula (1) and formula (2) are just examples. Embodiments of the present disclosure can also use other methods to determine the gradient map of the image area and the gradient map of the gradient map; for example, formula (1) and formula (2) may be modified, and the modified formulas used to determine the gradient map and the gradient map of the gradient map.
- the continuity parameter of the image area is determined based on the gradient map GGmap of the gradient map.
- Embodiments of the present disclosure do not limit the method of determining the continuity parameter of the image area based on the gradient map of the gradient map in S601-A2 mentioned above.
- the median of the gradient values in the gradient map of the gradient map of the image region is determined as the continuity parameter of the image region.
- the average of the gradient values in the gradient map of the gradient map of the image region is determined as the continuity parameter of the image region.
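As a concrete illustration of the above, the following is a minimal sketch of formulas (1) and (2) and of the median/average reduction, assuming the image area is given as a 2-D numpy array of sample values; the cropping at the borders (so that both neighbors exist) is an illustrative choice, not part of the disclosure:

```python
# Minimal sketch of formulas (1) and (2) plus the median/mean reduction.
import numpy as np

def gradient_map(area: np.ndarray) -> np.ndarray:
    """Gmap(x, y) = |I(x, y) - I(x+1, y)| + |I(x, y) - I(x, y+1)| (formula (1))."""
    a = area.astype(np.int64)
    dx = np.abs(a[:, :-1] - a[:, 1:])   # difference with the next column
    dy = np.abs(a[:-1, :] - a[1:, :])   # difference with the next row
    return dx[:-1, :] + dy[:, :-1]      # crop so both terms align

def continuity_parameter(area: np.ndarray, use_median: bool = True) -> float:
    gmap = gradient_map(area)           # gradient map of the image area
    ggmap = gradient_map(gmap)          # gradient map of the gradient map (formula (2))
    return float(np.median(ggmap) if use_median else np.mean(ggmap))

smooth = np.tile(np.arange(16), (16, 1))  # a linear ramp changes with perfect continuity
print(continuity_parameter(smooth))       # 0.0
```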
- Step S602 Determine whether the continuity parameter is greater than or equal to the predetermined threshold. If yes, execute step S603. If not, execute step S604.
- Step S603 For the block to be encoded in the image frame to be encoded, determine that the intra-frame sub-block division information indicates that the block to be encoded uses intra-frame sub-block division, and determine the prediction mode with the smallest prediction cost based on the intra-frame sub-block division of the to-be-encoded block.
- Step S604 For the block to be encoded in the image frame to be encoded, determine that the intra-frame sub-block division information indicates that the block to be encoded does not use intra-frame sub-block division.
- the block to be encoded is predictively encoded as a whole, and the prediction mode with the smallest prediction cost is determined.
- the block to be encoded is predictively encoded as a whole, and the method of determining the prediction mode with the smallest prediction cost can be the same as the above-mentioned S502-S504, which will not be described again here.
- Step S605 Use the prediction mode with the smallest prediction cost, as determined in step S603 or step S604, to predict the block to be encoded.
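Put together, steps S601 to S605 could be sketched as follows; `continuity_parameter`, `best_mode_with_isp`, and `best_mode_whole_block` are hypothetical helper callables standing in for the procedures described above:

```python
# Steps S601-S605 condensed into one function; the three callables are
# assumed stand-ins for the procedures described in the surrounding steps.
def choose_prediction_mode(frame, block, threshold,
                           continuity_parameter,
                           best_mode_with_isp, best_mode_whole_block):
    use_isp = continuity_parameter(frame) >= threshold   # S601 + S602
    if use_isp:
        mode, cost = best_mode_with_isp(block)           # S603: search over ISP splits
    else:
        mode, cost = best_mode_whole_block(block)        # S604: whole-block search
    return use_isp, mode, cost                           # mode then drives S605
```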
- Step S606 Write the intra-frame sub-block division information into the code stream, and the indication information may be used to indicate whether to perform intra-frame sub-block division on the current coding block. For example, a flag bit can be used to represent the intra-frame sub-block division information, and the flag bit is written into the code stream.
- this flag bit is a flag bit located at the image level.
- when the flag bit takes the first value, it means that in the current image, blocks in at least one coding tree unit use intra-frame sub-block division.
- when the flag bit takes the second value, it means that no coding tree unit in the image uses intra-frame sub-block division.
- the given image area for calculating the spatial continuity parameter may also be a single slice, a single coding tree unit, or a single coding block. Therefore, the flag bit used to represent intra-frame sub-block division information may also be located at the slice level, coding tree level, or block level.
- the present disclosure also provides a video encoding device, including a processor and a memory storing a computer program that can be run on the processor, wherein when the processor executes the computer program, the video encoding method of any embodiment of the present disclosure is implemented.
- the present disclosure also provides a computer storage medium on which a computer program is stored; when the computer program is executed by a computer, the computer can perform the method of the above method embodiments.
- embodiments of the present disclosure also provide a computer program product containing instructions. When the instructions are executed by a computer, they cause the computer to perform the method of the above method embodiments.
- the computer program product includes one or more computer instructions.
- the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
- the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (such as infrared, radio, or microwave).
- the computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
- the available media may be magnetic media (such as floppy disks, hard disks, magnetic tapes), optical media (such as digital video discs (DVD)), or semiconductor media (such as solid state disks (SSD)), etc.
- the disclosed systems, devices and methods can be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a logical function division. In actual implementation, there may be other division methods.
- multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
- a unit described as a separate component may or may not be physically separate.
- a component shown as a unit may or may not be a physical unit, that is, it may be located in one place, or it may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- each functional unit in various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
Abstract
The present application provides a prediction method, apparatus, and computer-readable storage medium. A spatial-domain continuity parameter is calculated for a given image area, and intra-frame sub-block partitioning information is determined, indicating whether a coding block uses intra-frame sub-block partitioning. For the current coding block, as indicated by the intra-frame sub-block partitioning information, the prediction mode with the smallest prediction cost is selected for prediction. Unnecessary intra-frame sub-block partitioning is avoided in image areas with weak spatial correlation, reducing computational complexity and thereby improving coding efficiency.
Description
Embodiments of the present disclosure relate to, but are not limited to, the technical field of video data processing, and in particular to a video encoding/decoding method, device, and storage medium.
Digital video compression technology mainly compresses huge amounts of digital image and video data to facilitate transmission and storage. With the proliferation of Internet video and ever higher demands on video definition, although existing digital video compression standards can already save considerable video data, better digital video compression techniques are still needed to reduce the bandwidth and traffic pressure of digital video transmission and to achieve more efficient video coding, transmission, and storage.
To deliver an optimal video data compression result, encoding attempts usually have to be made under multiple concrete available configurations and the best one selected. Therefore, on the premise of meeting the playback and transmission requirements of video data, the coding field must on the one hand pursue better video compression schemes and on the other hand also take encoding/decoding efficiency into account.
Summary of the Invention
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the protection scope of the claims.
An embodiment of the present disclosure provides an image prediction method, including:
obtaining intra-frame sub-block partitioning information; when the intra-frame sub-block partitioning information indicates that the current block uses intra-frame sub-block partitioning,
determining, based on at least one intra-frame sub-block partitioning, at least one first intra prediction mode and the prediction cost corresponding to the at least one first intra prediction mode,
determining at least one first other prediction mode and the prediction cost corresponding to the at least one first other prediction mode,
wherein the intra-frame sub-block partitioning information is obtained by calculating a spatial-domain continuity parameter, and the intra-frame sub-block partitioning information is written into the code stream.
An embodiment of the present disclosure further provides an image prediction device, including a processor and a memory storing a computer program that can run on the processor, wherein when the processor executes the computer program, the image prediction method of any embodiment of the present disclosure is implemented.
An embodiment of the present disclosure further provides a non-transitory computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the image prediction method of any embodiment of the present disclosure is implemented.
Figure 1 is a schematic block diagram of a video encoding/decoding system according to an embodiment of the present application;
Figure 2 is a schematic block diagram of a video encoder according to an embodiment of the present application;
Figure 3 is a schematic block diagram of a video decoder according to an embodiment of the present application;
Figure 4A is a schematic flowchart of an image prediction method provided by an embodiment of the present application;
Figure 4B is a flowchart of calculating the spatial-domain continuity parameter provided by an embodiment of the present application;
Figure 5 is a schematic flowchart of an image prediction method provided by an embodiment of the present application;
Figure 6 is a schematic flowchart of an image prediction method provided by an embodiment of the present application.
The present disclosure is applicable to the fields of image encoding/decoding, video encoding/decoding, hardware video encoding/decoding, dedicated-circuit video encoding/decoding, real-time video encoding/decoding, and the like. For example, the solutions of the present disclosure may be combined with audio video coding standards (AVS), for example the H.264/audio video coding (AVC) standard, the H.265/high efficiency video coding (HEVC) standard, and the H.266/versatile video coding (VVC) standard. Alternatively, the solutions of the present disclosure may operate in combination with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its scalable video coding (SVC) and multi-view video coding (MVC) extensions. It should be understood that the techniques of the present disclosure are not limited to any particular coding standard or technique.
For ease of understanding, the video encoding/decoding system involved in embodiments of the present disclosure is first introduced with reference to Figure 1.
Figure 1 is a schematic block diagram of a video encoding/decoding system according to an embodiment of the present disclosure. It should be noted that Figure 1 is only an example, and the video encoding/decoding systems of embodiments of the present disclosure include but are not limited to what is shown in Figure 1. As shown in Figure 1, the video encoding/decoding system 100 includes an encoding device 110 and a decoding device 120. The encoding device is used to encode (which can be understood as compress) video data to generate a code stream and transmit the code stream to the decoding device. The decoding device decodes the code stream generated by the encoding device to obtain decoded video data.
The encoding device 110 of embodiments of the present disclosure can be understood as a device with a video encoding function, and the decoding device 120 as a device with a video decoding function; that is, embodiments of the present disclosure encompass a broad range of apparatuses for the encoding device 110 and the decoding device 120, including, for example, smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, and in-vehicle computers.
In some embodiments, the encoding device 110 may transmit the encoded video data (such as a code stream) to the decoding device 120 via a channel 130. The channel 130 may include one or more media and/or apparatuses capable of transmitting the encoded video data from the encoding device 110 to the decoding device 120.
In one example, the channel 130 includes one or more communication media that enable the encoding device 110 to transmit the encoded video data directly to the decoding device 120 in real time. In this example, the encoding device 110 may modulate the encoded video data according to a communication standard and transmit the modulated video data to the decoding device 120. The communication media include wireless communication media, such as the radio-frequency spectrum; optionally, the communication media may also include wired communication media, such as one or more physical transmission lines.
In another example, the channel 130 includes a storage medium that can store video data encoded by the encoding device 110. Storage media include a variety of locally accessible data storage media, such as optical discs, DVDs, and flash memory. In this example, the decoding device 120 may obtain the encoded video data from the storage medium.
In another example, the channel 130 may include a storage server that can store video data encoded by the encoding device 110. In this example, the decoding device 120 may download the stored encoded video data from the storage server. Optionally, the storage server may store the encoded video data and transmit it to the decoding device 120, for example a web server (e.g., for a website) or a file transfer protocol (FTP) server.
In some embodiments, the encoding device 110 includes a video encoder 112 and an output interface 113. The output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.
In some embodiments, in addition to the video encoder 112 and the output interface 113, the encoding device 110 may further include a video source 111.
The video source 111 may include at least one of a video capture apparatus (e.g., a video camera), a video archive, a video input interface, or a computer graphics system, where the video input interface is used to receive video data from a video content provider and the computer graphics system is used to generate video data.
The video encoder 112 encodes the video data from the video source 111 to generate a code stream. Video data may include one or more pictures or a sequence of pictures. The code stream contains the encoding information of a picture or picture sequence in the form of a bitstream. The encoding information may include encoded picture data and associated data. The associated data may include a sequence parameter set (SPS), a picture parameter set (PPS), and other syntax structures. An SPS may contain parameters applied to one or more sequences. A PPS may contain parameters applied to one or more pictures. A syntax structure refers to a set of zero or more syntax elements arranged in a specified order in the code stream.
The video encoder 112 transmits the encoded video data directly to the decoding device 120 via the output interface 113. The encoded video data may also be stored on a storage medium or storage server for subsequent reading by the decoding device 120.
In some embodiments, the decoding device 120 includes an input interface 121 and a video decoder 122.
In some embodiments, in addition to the input interface 121 and the video decoder 122, the decoding device 120 may further include a display apparatus 123.
The input interface 121 includes a receiver and/or a modem. The input interface 121 may receive the encoded video data through the channel 130.
The video decoder 122 is used to decode the encoded video data to obtain decoded video data, and to transmit the decoded video data to the display apparatus 123.
The display apparatus 123 displays the decoded video data. The display apparatus 123 may be integrated with the decoding device 120 or external to it. The display apparatus 123 may include a variety of display apparatuses, such as a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display apparatus.
In addition, Figure 1 is only an example; the technical solutions of embodiments of the present disclosure are not limited to Figure 1. For example, the techniques of the present disclosure may also be applied to one-sided video encoding or one-sided video decoding.
The video encoding framework involved in embodiments of the present disclosure is introduced below.
Figure 2 is a schematic block diagram of a video encoder according to an embodiment of the present disclosure. It should be understood that the video encoder 200 can be used for lossy compression of pictures as well as for lossless compression of pictures. The lossless compression may be visually lossless compression or mathematically lossless compression.
The video encoder 200 may be applied to image data in luminance-chrominance (YCbCr, YUV) format. For example, the YUV ratio may be 4:2:0, 4:2:2, or 4:4:4, where Y denotes luminance (Luma), Cb (U) denotes blue chrominance, Cr (V) denotes red chrominance, and U and V jointly denote chrominance (Chroma), describing color and saturation. In terms of color format, 4:2:0 means that every 4 pixels have 4 luminance components and 2 chrominance components (YYYYCbCr), 4:2:2 means that every 4 pixels have 4 luminance components and 4 chrominance components (YYYYCbCrCbCr), and 4:4:4 means full-pixel display (YYYYCbCrCbCrCbCrCbCr).
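As a rough illustration (not encoder code), the sample-count arithmetic implied by these ratios can be written as:

```python
# Per-frame sample counts of the Y, Cb, and Cr planes for common YUV formats.
def plane_sizes(width: int, height: int, fmt: str):
    luma = width * height
    chroma = {"4:4:4": luma, "4:2:2": luma // 2, "4:2:0": luma // 4}[fmt]
    return luma, chroma, chroma

print(plane_sizes(1920, 1080, "4:2:0"))  # (2073600, 518400, 518400)
```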
For example, the video encoder 200 reads video data and, for each frame of the video data, partitions the frame into several coding tree units (CTUs). In some examples, a CTB may be called a "tree block", a "largest coding unit" (LCU), or a "coding tree block" (CTB). Each CTU may be associated with a pixel block of equal size within the picture. Each pixel may correspond to one luminance (luma) sample and two chrominance (chroma) samples. Thus, each CTU may be associated with one luma sample block and two chroma sample blocks. A CTU size is, for example, 128×128, 64×64, or 32×32. A CTU can be further partitioned into several coding units (CUs) for encoding; a CU may be a rectangular or a square block. A CU can be further divided into prediction units (PUs) and transform units (TUs), so that encoding, prediction, and transform are decoupled and processing is more flexible. In one example, a CTU is partitioned into CUs in a quadtree manner, and a CU is partitioned into TUs and PUs in a quadtree manner.
Video encoders and video decoders may support various PU sizes. Assuming that the size of a particular CU is 2N×2N, video encoders and video decoders may support a PU size of 2N×2N or N×N for intra prediction, and symmetric PUs of 2N×2N, 2N×N, N×2N, N×N, or similar sizes for inter prediction. Video encoders and video decoders may also support asymmetric PUs of 2N×nU, 2N×nD, nL×2N, and nR×2N for inter prediction.
In some embodiments, as shown in Figure 2, the video encoder 200 may include a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, an in-loop filtering unit 260, a decoded picture buffer 270, and an entropy coding unit 280. It should be noted that the video encoder 200 may contain more, fewer, or different functional components.
Optionally, in the present disclosure, the current block may be called the current coding unit (CU), the current prediction unit (PU), or the like. A prediction block may also be called a predicted image block or image prediction block, and a reconstructed image block may also be called a reconstruction block or reconstructed image block.
In some embodiments, the prediction unit 210 includes an inter prediction unit 211 and an intra estimation unit 212. Because there is a strong correlation between adjacent pixels within one frame of video, intra prediction is used in video coding technology to eliminate the spatial redundancy between adjacent pixels. Because there is a strong similarity between adjacent frames of video, inter prediction is used in video coding technology to eliminate the temporal redundancy between adjacent frames, thereby improving coding efficiency.
The inter prediction unit 211 can be used for inter prediction. Inter prediction may include motion estimation and motion compensation and may refer to image information of different frames: inter prediction uses motion information to find a reference block in a reference frame and generates a prediction block from the reference block, in order to eliminate temporal redundancy. Frames used for inter prediction may be P frames and/or B frames, where P frames are forward-predicted frames and B frames are bidirectionally predicted frames. The motion information includes the reference frame list containing the reference frame, a reference frame index, and a motion vector. A motion vector may have integer-pixel or sub-pixel precision; if the motion vector has sub-pixel precision, interpolation filtering must be applied in the reference frame to produce the required sub-pixel block. Here, the integer-pixel or sub-pixel block found in the reference frame according to the motion vector is called the reference block. Some techniques use the reference block directly as the prediction block, while others further process the reference block to generate the prediction block. Further processing the reference block to generate the prediction block can also be understood as taking the reference block as the prediction block and then processing it to generate the prediction block of the current block.
The intra estimation unit 212 refers only to information of the same frame and predicts the pixel information within the current coded image block, in order to eliminate spatial redundancy. Frames used for intra prediction may be I frames.
There are multiple prediction modes for intra prediction. Taking the international digital video coding H-series standards as an example, the H.264/AVC standard has 8 angular prediction modes and 1 non-angular prediction mode, and H.265/HEVC extends these to 33 angular prediction modes and 2 non-angular prediction modes. The intra prediction modes used by HEVC are Planar, DC, and 33 angular modes, 35 prediction modes in total. The intra modes used by VVC are Planar, DC, and 65 angular modes, 67 prediction modes in total.
It should be noted that, as the number of angular modes increases, intra prediction becomes more accurate and better meets the demands of the development of high-definition and ultra-high-definition digital video.
The residual unit 220 may generate a residual block of a CU based on the pixel block of the CU and the prediction blocks of the PUs of the CU. For example, the residual unit 220 may generate the residual block of the CU such that each sample in the residual block has a value equal to the difference between a sample in the pixel block of the CU and the corresponding sample in the prediction block of a PU of the CU.
The transform/quantization unit 230 may quantize transform coefficients. The transform/quantization unit 230 may quantize the transform coefficients associated with a TU of a CU based on a quantization parameter (QP) value associated with the CU. The video encoder 200 may adjust the degree of quantization applied to the transform coefficients associated with a CU by adjusting the QP value associated with the CU.
The inverse transform/quantization unit 240 may apply inverse quantization and inverse transform, respectively, to the quantized transform coefficients to reconstruct a residual block from them.
The reconstruction unit 250 may add samples of the reconstructed residual block to corresponding samples of one or more prediction blocks generated by the prediction unit 210 to produce a reconstructed image block associated with a TU. By reconstructing the sample block of each TU of a CU in this way, the video encoder 200 can reconstruct the pixel block of the CU.
The in-loop filtering unit 260 is used to process the inversely transformed and inversely quantized pixels, compensating for distortion information and providing a better reference for subsequently encoded pixels; for example, it may perform a deblocking filtering operation to reduce blocking artifacts of the pixel blocks associated with a CU.
In some embodiments, the in-loop filtering unit 260 includes a deblocking filtering unit and a sample adaptive offset/adaptive loop filtering (SAO/ALF) unit, where the deblocking filtering unit is used to remove blocking artifacts and the SAO/ALF unit is used to remove ringing artifacts.
The decoded picture buffer 270 may store reconstructed pixel blocks. The inter prediction unit 211 may use reference pictures containing reconstructed pixel blocks to perform inter prediction on PUs of other pictures. In addition, the intra estimation unit 212 may use the reconstructed pixel blocks in the decoded picture buffer 270 to perform intra prediction on other PUs in the same picture as the CU.
The entropy coding unit 280 may receive the quantized transform coefficients from the transform/quantization unit 230 and may perform one or more entropy coding operations on them to generate entropy-coded data.
Figure 3 is a schematic block diagram of a video decoder according to an embodiment of the present disclosure.
As shown in Figure 3, the video decoder 300 includes an entropy decoding unit 310, a prediction unit 320, an inverse quantization/transform unit 330, a reconstruction unit 340, an in-loop filtering unit 350, and a decoded picture buffer 360. It should be noted that the video decoder 300 may contain more, fewer, or different functional components.
The video decoder 300 may receive a code stream. The entropy decoding unit 310 may parse the code stream to extract syntax elements from it. As part of parsing the code stream, the entropy decoding unit 310 may parse the entropy-coded syntax elements in the code stream. The prediction unit 320, the inverse quantization/transform unit 330, the reconstruction unit 340, and the in-loop filtering unit 350 may decode the video data according to the syntax elements extracted from the code stream, i.e., generate decoded video data.
In some embodiments, the prediction unit 320 includes an inter prediction unit 321 and an intra estimation unit 322.
The intra estimation unit 322 may perform intra prediction to generate the prediction block of a PU. The intra estimation unit 322 may use an intra prediction mode to generate the prediction block of the PU based on the pixel blocks of spatially neighboring PUs. The intra estimation unit 322 may also determine the intra prediction mode of the PU from one or more syntax elements parsed from the code stream.
The inter prediction unit 321 may construct a first reference picture list (list 0) and a second reference picture list (list 1) according to syntax elements parsed from the code stream. In addition, if a PU is encoded using inter prediction, the entropy decoding unit 310 may parse the motion information of the PU. The inter prediction unit 321 may determine one or more reference blocks of the PU according to the motion information of the PU, and may generate the prediction block of the PU from its one or more reference blocks.
The inverse quantization/transform unit 330 may inversely quantize (i.e., dequantize) the transform coefficients associated with a TU. The inverse quantization/transform unit 330 may use the QP value associated with the CU of the TU to determine the degree of quantization.
After inversely quantizing the transform coefficients, the inverse quantization/transform unit 330 may apply one or more inverse transforms to the inversely quantized transform coefficients in order to produce the residual block associated with the TU.
The reconstruction unit 340 uses the residual blocks associated with the TUs of a CU and the prediction blocks of the PUs of the CU to reconstruct the pixel block of the CU. For example, the reconstruction unit 340 may add samples of the residual block to corresponding samples of the prediction block to reconstruct the pixel block of the CU, obtaining the reconstructed image block.
The in-loop filtering unit 350 may perform a deblocking filtering operation to reduce blocking artifacts of the pixel blocks associated with a CU.
The video decoder 300 may store the reconstructed picture of the CU in the decoded picture buffer 360. The video decoder 300 may use the reconstructed pictures in the decoded picture buffer 360 as reference pictures for subsequent prediction, or transmit the reconstructed pictures to a display apparatus for presentation.
The basic flow of video encoding and decoding is as follows. At the encoding end, a frame is partitioned into blocks; for the current block, the prediction unit 210 uses intra or inter prediction to generate the prediction block of the current block. The residual unit 220 may calculate a residual block based on the prediction block and the original block of the current block, i.e., the difference between the prediction block and the original block, which may also be called residual information. Through transform, quantization, and related processes in the transform/quantization unit 230, information to which the human eye is insensitive can be removed from the residual block, eliminating visual redundancy. Optionally, the residual block before transform and quantization by the transform/quantization unit 230 may be called the time-domain residual block, and after transform and quantization it may be called the frequency residual block or frequency-domain residual block. The entropy coding unit 280 receives the quantized transform coefficients output by the transform/quantization unit 230, may entropy-code them, and outputs the code stream. For example, the entropy coding unit 280 may eliminate character redundancy according to a target context model and the probability information of the binary code stream.
At the decoding end, the entropy decoding unit 310 may parse the code stream to obtain the prediction information, quantized coefficient matrix, and so on of the current block, and the prediction unit 320 uses intra or inter prediction on the current block based on the prediction information to generate the prediction block of the current block. The inverse quantization/transform unit 330 performs inverse quantization and inverse transform on the quantized coefficient matrix obtained from the code stream to obtain the residual block. The reconstruction unit 340 adds the prediction block and the residual block to obtain the reconstruction block. The reconstruction blocks form the reconstructed picture, and the in-loop filtering unit 350 performs in-loop filtering on the reconstructed picture on a picture or block basis to obtain the decoded picture. The encoding end likewise needs operations similar to those of the decoding end to obtain the decoded picture. The decoded picture may also be called the reconstructed picture, and the reconstructed picture can serve as a reference frame for inter prediction of subsequent frames.
It should be noted that the block partitioning information determined at the encoding end, as well as mode information or parameter information for prediction, transform, quantization, entropy coding, in-loop filtering, and so on, are carried in the code stream when necessary. By parsing the code stream and analyzing the available information, the decoding end determines the same block partitioning information and the same mode or parameter information for prediction, transform, quantization, entropy coding, in-loop filtering, etc. as the encoding end, thereby ensuring that the decoded picture obtained at the encoding end is identical to that obtained at the decoding end.
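The per-block flow described above can be summarized, purely as an illustrative sketch with the real modules abstracted into callables, as:

```python
# A deliberately simplified sketch of the block-based hybrid coding loop;
# blocks are assumed to be numpy-like arrays supporting elementwise + and -.
def encode_block(original, predict, transform, quantize, entropy_encode):
    prediction = predict(original)            # intra or inter prediction (unit 210)
    residual = original - prediction          # residual block (unit 220)
    coeffs = quantize(transform(residual))    # transform/quantization (unit 230)
    return entropy_encode(coeffs)             # code stream bits (unit 280)

def decode_block(bits, prediction, entropy_decode, dequantize, inverse_transform):
    coeffs = entropy_decode(bits)                      # unit 310
    residual = inverse_transform(dequantize(coeffs))   # unit 330
    return prediction + residual                       # reconstruction (unit 340)
```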
The above is the basic flow of a video codec under the block-based hybrid coding framework. With the development of technology, some modules or steps of this framework or flow may be optimized. The present disclosure applies to the basic flow of a video codec under the block-based hybrid coding framework, but is not limited to this framework or flow.
Current mainstream video coding standards all adopt the block-based hybrid coding framework. Each frame of a video is partitioned into square largest coding units (LCU) or coding tree units (CTU) of equal size (e.g., 128x128 or 64x64). Each largest coding unit or coding tree unit can be partitioned into rectangular coding units (CU) according to rules.
In H.264/AVC, the input picture is partitioned into fixed-size blocks as the basic unit of encoding, called macroblocks (MB), each comprising one luma block and two chroma blocks; the luma block size is 16×16. With 4:2:0 sampling, the chroma block size is half of the luma block size. In the prediction stage, depending on the prediction mode, a macroblock is further divided into smaller blocks for prediction. In intra prediction, a macroblock can be divided into 16×16, 8×8, or 4×4 small blocks, each of which is intra-predicted separately. In the transform and quantization stage, the macroblock is divided into 4×4 or 8×8 small blocks, and the prediction residual in each small block is transformed and quantized separately to obtain quantized coefficients.
Compared with H.264/AVC, H.265/HEVC adopts improvements in multiple coding stages. In H.265/HEVC, a picture is partitioned into coding tree units (CTU), the basic unit of encoding (corresponding to the macroblock in H.264/AVC). A CTU contains one luma coding tree block (CTB) and two chroma coding tree blocks; the maximum CU size in the H.265/HEVC standard is generally 64×64. To adapt to diverse video content and video characteristics, a CTU is iteratively partitioned in a quadtree (QT) manner into a series of coding units (CU), the basic unit of intra/inter coding. A CU contains one luma coding block (CB), two chroma coding blocks, and associated syntax structures; the maximum CU size is the CTU, and the minimum CU size is 8×8. The leaf-node CUs obtained from the coding-tree partitioning can be classified into three types according to the prediction method: intra-predicted intra CUs, inter-predicted inter CUs, and skipped CUs. A skipped CU can be regarded as a special case of an inter CU, containing no motion information and no prediction residual information. A leaf-node CU contains one or more prediction units (PU); H.265/HEVC supports PU sizes from 4×4 to 64×64, with eight partitioning modes in total. For intra coding modes, there are two possible partitioning modes: Part_2Nx2N and Part_NxN. For the prediction residual signal, a CU is partitioned into transform units (TU) using a residual quadtree. A TU contains one luma transform block (TB) and two chroma transform blocks. Only square partitioning is allowed, dividing one CB into 1 or 4 PBs. A TU shares the same transform and quantization process, with supported sizes from 4×4 to 32×32. Unlike previous coding standards, in inter prediction a TB may cross PB boundaries to further maximize inter coding efficiency.
In H.266/VVC, a video picture is first partitioned into coding tree units (CTUs) similar to those of H.265/HEVC, but the maximum size is increased from 64×64 to 128×128. H.266/VVC introduces quadtree plus nested multi-type tree (MTT) partitioning, where MTT includes the binary tree (BT) and the ternary tree (TT); it unifies the concepts of CU, PU, and TU in H.265/HEVC and supports more flexible CU partitioning shapes. A CTU is partitioned according to the quadtree structure, and the leaf nodes are further partitioned by MTT. The multi-type-tree leaf nodes become coding units (CUs); when a CU is no larger than the maximum transform unit (64×64), no further partitioning is performed for subsequent prediction and transform. In most cases, CU, PU, and TU have the same size. Considering the different characteristics of luma and chroma and the parallelism of concrete implementations, in H.266/VVC chroma may use a separate partitioning tree structure rather than following the luma partitioning tree. In H.266/VVC, the chroma partitioning of I frames uses a chroma separate tree, while the chroma partitioning of P frames and B frames remains consistent with the luma partitioning.
On this basis, H.266/VVC also introduces a technique called intra sub-block partitioning. Intra sub-partition (ISP) is an intra prediction technique based on sub-block partitioning. When the ISP mode is used, a coding block is further partitioned horizontally or vertically into multiple (e.g., 2 or 4) equally sized sub-blocks, and each sub-block is decoded and reconstructed in turn. For example, a 16x8 current block can be partitioned horizontally into four 16x2 sub-blocks or vertically into four 4x8 sub-blocks. A 4x8 block can be partitioned into two 4x4 blocks or two 2x8 sub-blocks. Note that the number of sub-blocks may be only 2 (for 4x8 and 8x4 blocks). Moreover, each sub-block must contain at least 16 samples, so a 4x4 block is not partitioned further, and no corresponding syntax declaration is needed at the encoding or decoding end. The specific partitioning rules are shown in Table 3; whether the partitioning is horizontal or vertical is decided by the RDO criterion at the encoding end. To save encoding time, the partitioned intra sub-blocks share the intra prediction mode of the current block (that is, the sub-blocks obtained by partitioning the current block use the same intra prediction mode as the current block). A sketch of this splitting rule follows.
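An illustrative encoding of the splitting rule just described; `isp_sub_blocks` is a made-up helper, not standard syntax:

```python
# 4x4 blocks (16 samples) are never split, 4x8/8x4 blocks split into 2
# sub-blocks, and larger blocks split into 4 equal sub-blocks.
def isp_sub_blocks(width: int, height: int, horizontal: bool):
    if width * height <= 16:                     # 4x4: no ISP split at all
        return [(width, height)]
    n = 2 if (width, height) in ((4, 8), (8, 4)) else 4
    if horizontal:
        return [(width, height // n)] * n        # split along the height
    return [(width // n, height)] * n            # split along the width

print(isp_sub_blocks(16, 8, horizontal=True))    # four 16x2 sub-blocks
print(isp_sub_blocks(16, 8, horizontal=False))   # four 4x8 sub-blocks
print(isp_sub_blocks(4, 8, horizontal=True))     # two 4x4 sub-blocks
```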
Exemplarily, in some implementations, the picture to be encoded is partitioned into multiple non-overlapping CTU blocks. Each CTU is processed in raster scan order, and a CTU is partitioned into several CUs in different ways. The main steps for determining the coding-block partitioning are as follows. For a certain CTU, block partitioning is performed using conventional block partitioning methods (quadtree, ternary tree, binary tree). For a certain partitioning Split[i], intra prediction with different prediction modes is performed on each CU, and the optimal conventional intra prediction mode bestRegIntraMode[i] and prediction cost bestRegIntraCost[i] are selected. If the block size of a CU meets the ISP conditions, the CU is further ISP sub-partitioned into several sub-blocks; intra prediction with different prediction modes is performed on each sub-block, and the optimal prediction mode bestIspIntraMode[i] and prediction cost bestIspIntraCost[i] are selected. The optimal intra prediction mode bestIntraMode[i] and prediction cost bestIntraCost[i] of the current partitioning are then selected. Other methods (such as inter prediction) are used for prediction, and the optimal prediction mode bestOtherMode[i] and prediction cost bestOtherCost[i] are selected. bestRegIntraCost[i], bestIspIntraCost[i], and bestOtherCost[i] are compared to select the optimal prediction mode bestMode[i] and prediction cost bestCost[i] of the current partitioning. All block partitionings are traversed, and the block partitioning Split[opt] that minimizes the prediction cost of the current CTU, together with the corresponding prediction mode bestMode[opt], is selected. Finally, the residual block is obtained by prediction according to the optimal block partitioning; the residual block is transformed, quantized, and entropy-coded, prediction information such as the block partitioning mode and prediction mode is encoded, and the code stream is output.
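A condensed sketch of this mode decision loop might look as follows; the evaluator callables (each returning a (mode, cost) pair) and the `split.coding_units` interface are hypothetical stand-ins for the RDO machinery described above:

```python
# Traverse candidate partitionings of a CTU and keep the cheapest decision.
def best_ctu_decision(ctu, partitionings, eval_regular_intra, eval_isp_intra,
                      eval_other, isp_allowed):
    best_split, best_modes, best_cost = None, None, float("inf")
    for split in partitionings:                       # QT/BT/TT candidates, Split[i]
        total_cost, modes = 0.0, []
        for cu in split.coding_units(ctu):
            candidates = [eval_regular_intra(cu), eval_other(cu)]
            if isp_allowed(cu):                       # CU size meets ISP conditions
                candidates.append(eval_isp_intra(cu))
            mode, cost = min(candidates, key=lambda mc: mc[1])
            modes.append(mode)
            total_cost += cost
        if total_cost < best_cost:                    # keep Split[opt], bestMode[opt]
            best_split, best_modes, best_cost = split, modes, total_cost
    return best_split, best_modes, best_cost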
It can be seen that in these implementable H.266/VVC coding schemes, even considering only quadtree partitioning, there are 4^5+4^4+4^3+4^2+4^1+4^0=1365 partitioning modes, far more than the 341 modes of H.265/HEVC. With the binary tree and ternary tree partitionings added, the total number of partitioning possibilities is theoretically in the thousands. Therefore, these implementable QTMT technical schemes cause the encoding complexity of H.266/VVC to far exceed that of H.265/HEVC. If intra sub-block partitioning is also considered, the encoding complexity increases further.
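The counts quoted above can be checked as geometric series (a worked verification only, not additional disclosure):

$$\sum_{k=0}^{5} 4^{k} = \frac{4^{6}-1}{3} = 1365, \qquad \sum_{k=0}^{4} 4^{k} = \frac{4^{5}-1}{3} = 341.$$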
The image prediction method provided by an embodiment of the present disclosure is introduced below with reference to Figure 4A.
Step S401: Obtain intra-frame sub-block partitioning information.
The intra-frame sub-block partitioning information is used to indicate whether the current coding block uses intra-frame sub-block partitioning.
It can be understood that, during image encoding, the encoder receives a video stream consisting of a series of image frames and performs video encoding frame by frame; the video encoder partitions the image into blocks to obtain the current coding block. In some embodiments, the current coding block is also called the current block, the current image block, the coding block, the current coding unit, the current block to be encoded, the current image block to be encoded, and so on. The current coding block may further undergo intra-frame sub-block partitioning.
Further, indication information may be used to indicate whether intra-frame sub-block partitioning is performed on the current coding block. Exemplarily, a flag bit may be used to represent the intra-frame sub-block partitioning information. When the flag bit takes the first value, it indicates that the current block uses intra-frame sub-block partitioning; when the flag bit takes the second value, it indicates that the current block does not use intra-frame sub-block partitioning.
Further, the intra-frame sub-block partitioning information may be obtained by calculating a spatial-domain continuity parameter. The spatial-domain continuity parameter can represent the continuity of change of pixels in a given image area.
In some embodiments, if the spatial-domain continuity parameter is greater than or equal to a certain threshold, the intra-frame sub-block partitioning information indicates that the current block uses intra-frame sub-block partitioning; conversely, if the continuity parameter is less than the threshold, it indicates that the current block does not use intra-frame sub-block partitioning.
In other embodiments, if the spatial-domain continuity parameter is greater than a certain threshold, the intra-frame sub-block partitioning information indicates that the current block uses intra-frame sub-block partitioning; conversely, if the continuity parameter is less than or equal to the threshold, it indicates that the current block does not use intra-frame sub-block partitioning.
In some embodiments, the given image area may be a single picture, a single slice, a single coding tree unit, or a single coding block. In other embodiments, the given image area may be an image area composed of multiple spatially adjacent slices, multiple spatially adjacent coding tree units, or multiple spatially adjacent coding blocks. It can be understood that the method of calculating the spatial-domain continuity parameter for a given image area is applicable to all of these possible given image areas, which embodiments of the present disclosure do not limit.
With reference to Figure 4B, in embodiments of the present disclosure, ways of determining the spatial-domain continuity parameter of a given image area include but are not limited to the following:
determining the continuity parameter of the image area through gradient maps, i.e., the above step S401 includes the following steps S401-A1 and S401-A2:
S401-A1: Determine the gradient map of the given image area, and the gradient map of that gradient map;
S401-A2: Determine the continuity parameter of the image area according to the gradient map of the gradient map.
The gradient map of an image area can reflect the trend of change of the pixels in the image area, and the gradient map of the gradient map of an image area can reflect the rate of change of the pixels in the image area, i.e., the continuity of change. Therefore, in this way, the continuity parameter of the image area is determined by calculating the gradient map of the gradient map of the image area.
Exemplarily, the gradient map Gmap of the image area is determined according to the following formula (1):
Gmap(x,y)=|I(x,y)-I(x+1,y)|+|I(x,y)-I(x,y+1)| (1)
where I(x,y) is the pixel value of image area I at position (x,y), I(x+1,y) is the pixel value of image area I at position (x+1,y), I(x,y+1) is the pixel value of image area I at position (x,y+1), and Gmap(x,y) is the gradient value of the pixel at position (x,y) in image area I.
Exemplarily, the gradient map GGmap of the gradient map of the image area is determined according to the following formula (2):
GGmap(x,y)=|Gmap(x,y)-Gmap(x+1,y)|+|Gmap(x,y)-Gmap(x,y+1)| (2)
where Gmap(x+1,y) is the gradient value of the pixel at position (x+1,y) in image area I, and Gmap(x,y+1) is the gradient value of the pixel at position (x,y+1) in image area I; Gmap(x+1,y) and Gmap(x,y+1) can be determined according to formula (1) above. GGmap(x,y) is the gradient value of the gradient value of the pixel at position (x,y) in image area I.
Formulas (1) and (2) above show how the gradient value, and the gradient value of the gradient value, are determined for pixel (x,y) of image area I. By applying the same procedure to the other pixels of image area I, the gradient value of every pixel in image area I is determined, giving the gradient map Gmap of image area I. Applying formula (2) above to the gradient value corresponding to each pixel in the gradient map Gmap of image area I then gives the gradient map GGmap of the gradient map of image area I.
It should be noted that formulas (1) and (2) above are only examples; embodiments of the present disclosure may also determine the gradient map of the image area, and the gradient map of the gradient map, in other ways, for example by modifying formulas (1) and (2) and using the modified formulas to determine the gradient map and the gradient map of the gradient map.
After the gradient map of the gradient map of the image area is determined in the above manner, the continuity parameter of the image area is determined according to the gradient map of the gradient map, GGmap.
Embodiments of the present disclosure do not limit the way of determining the continuity parameter of the image area from the gradient map of the gradient map in S401-A2 above.
In one example, the median of the gradient values in the gradient map of the gradient map of the image area is determined as the continuity parameter of the image area.
In another example, the average of the gradient values in the gradient map of the gradient map of the image area is determined as the continuity parameter of the image area.
It is easy to understand that the fundamental principle of intra-frame sub-block partitioning is to partition the coding block more finely based on spatial-domain correlation. Because the resulting sub-blocks are strongly correlated, the intra prediction residual can be further reduced, improving compression efficiency. Obviously, while improving the compression ratio, intra-frame sub-block partitioning adds more block partitioning possibilities, leading to more rate-distortion optimization computation and greatly increasing encoding complexity. For video images with weak spatial-domain correlation, the performance gain brought by intra-frame sub-block partitioning is very limited, yet it still greatly increases encoding complexity.
It can be seen that obtaining the intra-frame sub-block partitioning information by calculating the spatial-domain continuity parameter avoids unnecessary intra-frame sub-block partitioning in image areas with weak spatial-domain correlation, reducing computational complexity and thus improving coding efficiency. Moreover, in the above approach, the continuity parameter of the image area is determined by computing the gradient map of the gradient map of the image area; the calculation method is simple and accurate with low computational overhead, further improving coding efficiency.
Step S402: When the intra-frame sub-block partitioning information indicates that the current block uses intra-frame sub-block partitioning, determine, based on at least one intra-frame sub-block partitioning, at least one first intra prediction mode and the prediction cost corresponding to the at least one first intra prediction mode.
A coding block eligible for intra-frame sub-block partitioning can be further partitioned into multiple sub-blocks. It should be noted that the way of sub-block partitioning is not unique; that is, multiple sub-block partitionings exist for the current block. To predictively encode the current block, the respective prediction results of these multiple sub-block partitionings must be considered.
In some embodiments, based on a certain sub-block partitioning, the current block is predicted using different first intra prediction modes respectively. It can be understood that the several sub-blocks of the current block are predicted using the same first intra prediction mode; that is, based on a certain first intra prediction mode, the sub-blocks of the current block are intra-predicted in turn to obtain predicted values for all sample positions of the current block.
Exemplarily, the first intra prediction modes that can be used include but are not limited to the Planar mode, the DC mode, various angular modes, the matrix weighted intra prediction (MIP) mode, and so on.
It can be understood that predicting the current block with each intra prediction mode allows the prediction cost of that intra prediction to be calculated. The prediction cost is usually used to characterize a combined measure of the code stream overhead and the image distortion introduced by encoding when the prediction mode is used to complete encoding. Commonly used prediction costs include but are not limited to the rate-distortion cost.
It can be seen that while the current block is predicted with different first intra prediction modes based on a certain sub-block partitioning, the corresponding multiple prediction costs are also obtained.
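A commonly used concrete form of the rate-distortion cost mentioned above (an assumption of this sketch; the disclosure does not mandate any particular cost function) is

$$J = D + \lambda \cdot R$$

where $D$ is the distortion (e.g., SAD or SSE) between the original block and its prediction, $R$ is the number of bits spent coding the mode and the residual, and $\lambda$ is the Lagrange multiplier; the candidate with the smallest $J$ is preferred.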
Step S403: Determine at least one first other prediction mode and the prediction cost corresponding to the at least one first other prediction mode.
The current coding block may also be predicted with several other prediction modes respectively. In some embodiments, the other prediction modes include inter prediction modes, combined inter and intra prediction (CIIP), and so on.
It can be understood that in frames using inter prediction, intra prediction is not prohibited. That is, for coding blocks in P frames and B frames, both intra and inter prediction modes must be considered when determining their prediction mode.
Similarly, predicting the current block with each other prediction mode allows the prediction cost of that other prediction to be calculated. The prediction cost is usually used to characterize a combined measure of the code stream overhead and the image distortion introduced by encoding when the prediction mode is used to complete encoding. Commonly used prediction costs include but are not limited to the rate-distortion cost.
It can be seen that while the current block is predicted with different first other prediction modes respectively, the corresponding multiple prediction costs are also obtained.
Step S404: Among the at least one first intra prediction mode and the at least one first other prediction mode, determine the prediction mode with the smallest prediction cost as the coding prediction mode of the current block, and predict the current block using the prediction mode with the smallest cost.
It is not difficult to see that in step S402 and step S403 the current block has already been predicted with at least one first intra prediction mode and at least one first other prediction mode, yielding the prediction costs corresponding to these prediction modes. By comparing these prediction costs, the prediction mode and prediction cost actually used in predictive coding can be determined, and the current block predicted based on that prediction mode.
In some embodiments, the prediction mode with the smallest prediction cost is determined as the prediction mode used in predictive coding.
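The comparison in step S404 amounts to taking the minimum over (mode, cost) pairs; a minimal sketch with made-up candidate values:

```python
# Pick the minimum-cost candidate among the pairs produced in S402 and S403.
def select_coding_mode(intra_candidates, other_candidates):
    return min(intra_candidates + other_candidates, key=lambda mc: mc[1])

mode, cost = select_coding_mode(
    [("Planar", 120.5), ("DC", 133.0), ("Angular2", 118.7)],   # first intra modes
    [("Inter", 95.2), ("CIIP", 101.4)],                        # first other modes
)
print(mode, cost)  # Inter 95.2
```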
Step S405: Write the intra-frame sub-block partitioning information into the code stream.
Indication information may be used to indicate whether intra-frame sub-block partitioning is performed on the current coding block. Exemplarily, a flag bit may be used to represent the intra-frame sub-block partitioning information, and the flag bit is written into the code stream.
In some embodiments, the flag bit is a flag bit located at the sequence level. When the flag bit takes the first value, it indicates that in the current image sequence, blocks in at least one picture use intra-frame sub-block partitioning; when the flag bit takes the second value, it indicates that no block in the image sequence uses intra-frame sub-block partitioning.
In some embodiments, the flag bit is a flag bit located at the picture level. When the flag bit takes the first value, it indicates that in the current picture, blocks in at least one slice use intra-frame sub-block partitioning; when the flag bit takes the second value, it indicates that no block in the picture uses intra-frame sub-block partitioning.
In some embodiments, the flag bit is a flag bit located at the slice level. When the flag bit takes the first value, it indicates that in the current slice, blocks in at least one coding tree unit use intra-frame sub-block partitioning; when the flag bit takes the second value, it indicates that no block in the slice uses intra-frame sub-block partitioning.
In some embodiments, the flag bit is a flag bit located at the coding tree level. When the flag bit takes the first value, it indicates that in the current coding tree unit, at least one block uses intra-frame sub-block partitioning; when the flag bit takes the second value, it indicates that no block in the coding tree unit uses intra-frame sub-block partitioning.
In some embodiments, the flag bit is a flag bit located at the block level. When the flag bit takes the first value, it indicates that the current block uses intra-frame sub-block partitioning; when the flag bit takes the second value, it indicates that the current block does not use intra-frame sub-block partitioning.
The image prediction method provided by another embodiment of the present disclosure is introduced below with reference to Figure 5.
Step S501: Obtain intra-frame sub-block partitioning information.
It can be understood that, during image encoding, the encoder receives a video stream consisting of a series of image frames and performs video encoding frame by frame; the video encoder partitions the image frame into blocks to obtain the current coding block. In some embodiments, the current coding block is also called the current block, the current image block, the coding block, the current coding unit, the current block to be encoded, the current image block to be encoded, and so on. The current coding block may further undergo intra-frame sub-block partitioning.
Further, indication information may be used to indicate whether intra-frame sub-block partitioning is performed on the current coding block. Exemplarily, a flag bit may be used to represent the intra-frame sub-block partitioning information. When the flag bit takes the first value, it indicates that the current block uses intra-frame sub-block partitioning; when the flag bit takes the second value, it indicates that the current block does not use intra-frame sub-block partitioning.
Further, the intra-frame sub-block partitioning information may be obtained by calculating a spatial-domain continuity parameter. The spatial-domain continuity parameter can represent the continuity of change of pixels in a given image area.
In some embodiments, if the spatial-domain continuity parameter is greater than or equal to a certain threshold, the intra-frame sub-block partitioning information indicates that the current block uses intra-frame sub-block partitioning; conversely, if the continuity parameter is less than the threshold, it indicates that the current block does not use intra-frame sub-block partitioning.
In other embodiments, if the spatial-domain continuity parameter is greater than a certain threshold, the intra-frame sub-block partitioning information indicates that the current block uses intra-frame sub-block partitioning; conversely, if the continuity parameter is less than or equal to the threshold, it indicates that the current block does not use intra-frame sub-block partitioning.
In some embodiments, the given image area may be a single picture, a single slice, a single coding tree unit, or a single coding block. In other embodiments, the given image area may be an image area composed of multiple spatially adjacent slices, multiple spatially adjacent coding tree units, or multiple spatially adjacent coding blocks. It can be understood that the method of calculating the spatial-domain continuity parameter for a given image area is applicable to all of these possible given image areas, which embodiments of the present disclosure do not limit.
In embodiments of the present disclosure, the spatial-domain continuity parameter of a given image area may be determined in the same manner as S401-A1 and S401-A2 above, which is not repeated here; that is, step S501 includes the steps S401-A1 and S401-A2.
It is easy to understand that the fundamental principle of intra-frame sub-block partitioning is to partition the coding block at a finer granularity based on spatial-domain correlation. Because the resulting sub-blocks are strongly correlated, the intra prediction residual can be further reduced, improving compression efficiency. Obviously, while improving the compression ratio, intra-frame sub-block partitioning adds more block partitioning possibilities, leading to more rate-distortion optimization computation and greatly increasing encoding complexity. For video images with weak spatial-domain correlation, the performance gain brought by intra-frame sub-block partitioning is very limited, yet it still greatly increases encoding complexity.
It can be seen that obtaining the intra-frame sub-block partitioning information by calculating the spatial-domain continuity parameter avoids unnecessary intra-frame sub-block partitioning in image areas with weak spatial-domain correlation, reducing computational complexity and thus improving coding efficiency. Moreover, determining the continuity parameter of the image area through the gradient map of its gradient map is simple and accurate with low computational overhead, further improving coding efficiency.
Step S502: When the intra-frame sub-block partitioning information indicates that the current block does not use intra-frame sub-block partitioning, determine at least one second intra prediction mode and the prediction cost corresponding to the at least one second intra prediction mode.
A coding block without intra-frame sub-block partitioning is no longer partitioned into multiple sub-blocks, so the block is predictively encoded as a whole.
In some embodiments, the current block is predicted using different second intra prediction modes respectively. Exemplarily, the second intra prediction modes that can be used include but are not limited to the Planar mode, the DC mode, various angular modes, the matrix weighted intra prediction (MIP) mode, and so on.
It can be understood that predicting the current block with each second intra prediction mode allows the prediction cost of that second intra prediction to be calculated. The prediction cost is usually used to characterize a combined measure of the code stream overhead and the image distortion introduced by encoding when the prediction mode is used to complete encoding. Commonly used prediction costs include but are not limited to the rate-distortion cost.
It can be seen that while the current block is predicted with different second intra prediction modes respectively, the corresponding multiple prediction costs are also obtained.
Step S503: Determine at least one second other prediction mode and the prediction cost corresponding to the at least one second other prediction mode.
The current coding block may also be predicted with several second other prediction modes respectively. In some embodiments, the second other prediction modes include inter prediction modes, combined inter and intra prediction, and so on.
It can be understood that in frames using inter prediction, intra prediction is not prohibited. That is, for coding blocks in P frames and B frames, both intra and inter prediction modes must be considered when determining their prediction mode.
Similarly, predicting the current block with each second other prediction mode allows the prediction cost of that second other prediction to be calculated. The prediction cost is usually used to characterize a combined measure of the code stream overhead and the image distortion introduced by encoding when the prediction mode is used to complete encoding. Commonly used prediction costs include but are not limited to the rate-distortion cost.
It can be seen that while the current block is predicted with different second other prediction modes respectively, the corresponding multiple prediction costs are also obtained.
Step S504: Among the at least one second intra prediction mode and the at least one second other prediction mode, determine the prediction mode with the smallest prediction cost as the coding prediction mode of the current block, and predict the current block using the prediction mode with the smallest cost.
It is not difficult to see that in step S502 and step S503 the current block has already been predicted with at least one second intra prediction mode and at least one second other prediction mode, yielding the prediction costs corresponding to these prediction modes. By comparing these prediction costs, the prediction mode and prediction cost actually used in predictive coding can be determined, and the current block predicted based on that prediction mode.
In some embodiments, the prediction mode with the smallest prediction cost is determined as the prediction mode used in predictive coding.
Step S505: Write the intra-frame sub-block partitioning information into the code stream.
Indication information may be used to indicate whether intra-frame sub-block partitioning is performed on the current coding block. Exemplarily, a flag bit may be used to represent the intra-frame sub-block partitioning information, and the flag bit is written into the code stream.
In some embodiments, the flag bit is a flag bit located at the sequence level. When the flag bit takes the first value, it indicates that in the current image sequence, blocks in at least one picture use intra-frame sub-block partitioning; when the flag bit takes the second value, it indicates that no block in the image sequence uses intra-frame sub-block partitioning.
In some embodiments, the flag bit is a flag bit located at the picture level. When the flag bit takes the first value, it indicates that in the current picture, blocks in at least one slice use intra-frame sub-block partitioning; when the flag bit takes the second value, it indicates that no block in the picture uses intra-frame sub-block partitioning.
In some embodiments, the flag bit is a flag bit located at the slice level. When the flag bit takes the first value, it indicates that in the current slice, blocks in at least one coding tree unit use intra-frame sub-block partitioning; when the flag bit takes the second value, it indicates that no block in the slice uses intra-frame sub-block partitioning.
In some embodiments, the flag bit is a flag bit located at the coding tree level. When the flag bit takes the first value, it indicates that in the current coding tree unit, at least one block uses intra-frame sub-block partitioning; when the flag bit takes the second value, it indicates that no block in the coding tree unit uses intra-frame sub-block partitioning.
In some embodiments, the flag bit is a flag bit located at the block level. When the flag bit takes the first value, it indicates that the current block uses intra-frame sub-block partitioning; when the flag bit takes the second value, it indicates that the current block does not use intra-frame sub-block partitioning.
The image prediction method provided by yet another embodiment of the present disclosure is introduced below with reference to Figure 6.
Step S601: Obtain the image frame to be encoded and calculate the spatial-domain continuity parameter of the image frame; the spatial-domain continuity parameter can represent the continuity of change of pixels in a given image area.
In embodiments of the present disclosure, ways of determining the spatial-domain continuity parameter of a given image area include but are not limited to the following:
determining the continuity parameter of the image area through gradient maps:
S601-A1: Determine the gradient map of the given image area, and the gradient map of that gradient map;
S601-A2: Determine the continuity parameter of the image area according to the gradient map of the gradient map.
The gradient map of an image area can reflect the trend of change of the pixels in the image area, and the gradient map of the gradient map of an image area can reflect the rate of change of the pixels in the image area, i.e., the continuity of change. Therefore, in this way, the continuity parameter of the image area is determined by calculating the gradient map of the gradient map of the image area.
Exemplarily, the gradient map Gmap of the image area is determined according to the following formula (1):
Gmap(x,y)=|I(x,y)-I(x+1,y)|+|I(x,y)-I(x,y+1)| (1)
where I(x,y) is the pixel value of image area I at position (x,y), I(x+1,y) is the pixel value of image area I at position (x+1,y), I(x,y+1) is the pixel value of image area I at position (x,y+1), and Gmap(x,y) is the gradient value of the pixel at position (x,y) in image area I.
Exemplarily, the gradient map GGmap of the gradient map of the image area is determined according to the following formula (2):
GGmap(x,y)=|Gmap(x,y)-Gmap(x+1,y)|+|Gmap(x,y)-Gmap(x,y+1)| (2)
where Gmap(x+1,y) is the gradient value of the pixel at position (x+1,y) in image area I, and Gmap(x,y+1) is the gradient value of the pixel at position (x,y+1) in image area I; Gmap(x+1,y) and Gmap(x,y+1) can be determined according to formula (1) above. GGmap(x,y) is the gradient value of the gradient value of the pixel at position (x,y) in image area I.
Formulas (1) and (2) above show how the gradient value, and the gradient value of the gradient value, are determined for pixel (x,y) of image area I. By applying the same procedure to the other pixels of image area I, the gradient value of every pixel in image area I is determined, giving the gradient map Gmap of image area I. Applying formula (2) above to the gradient value corresponding to each pixel in the gradient map Gmap of image area I then gives the gradient map GGmap of the gradient map of image area I.
It should be noted that formulas (1) and (2) above are only examples; embodiments of the present disclosure may also determine the gradient map of the image area, and the gradient map of the gradient map, in other ways, for example by modifying formulas (1) and (2) and using the modified formulas to determine the gradient map and the gradient map of the gradient map.
After the gradient map of the gradient map of the image area is determined in the above manner, the continuity parameter of the image area is determined according to the gradient map of the gradient map, GGmap.
Embodiments of the present disclosure do not limit the way of determining the continuity parameter of the image area from the gradient map of the gradient map in S601-A2 above.
In one example, the median of the gradient values in the gradient map of the gradient map of the image area is determined as the continuity parameter of the image area.
In another example, the average of the gradient values in the gradient map of the gradient map of the image area is determined as the continuity parameter of the image area.
Step S602: Determine whether the continuity parameter is greater than or equal to a predetermined threshold; if so, execute step S603, and if not, execute step S604.
Step S603: For a block to be encoded in the image frame to be encoded, determine that the intra-frame sub-block partitioning information indicates that the block to be encoded uses intra-frame sub-block partitioning, and determine the prediction mode with the smallest prediction cost based on the intra-frame sub-block partitioning of the block to be encoded.
It can be understood that the way of determining the prediction mode with the smallest prediction cost based on intra-frame sub-block partitioning may be the same as S402-S404 above, which is not repeated here.
Step S604: For a block to be encoded in the image frame to be encoded, determine that the intra-frame sub-block partitioning information indicates that the block to be encoded does not use intra-frame sub-block partitioning; the block to be encoded is predictively encoded as a whole, and the prediction mode with the smallest prediction cost is determined.
It can be understood that the way of predictively encoding the block to be encoded as a whole and determining the prediction mode with the smallest prediction cost may be the same as S502-S504 above, which is not repeated here.
Step S605: Predict the block to be encoded using the prediction mode with the smallest prediction cost, i.e., the prediction mode with the smallest prediction cost determined in step S603 or step S604.
Step S606: Write the intra-frame sub-block partitioning information into the code stream. Indication information may be used to indicate whether intra-frame sub-block partitioning is performed on the current coding block. Exemplarily, a flag bit may be used to represent the intra-frame sub-block partitioning information, and the flag bit is written into the code stream.
It can be understood that, in some embodiments, since the spatial-domain continuity parameter is calculated for a single picture to be encoded, the flag bit is a flag bit located at the picture level. When the flag bit takes the first value, it indicates that in the current picture, blocks in at least one coding tree unit use intra-frame sub-block partitioning; when the flag bit takes the second value, it indicates that no coding tree unit in the picture uses intra-frame sub-block partitioning.
In other embodiments, the given image area for which the spatial-domain continuity parameter is calculated may also be a single slice, a single coding tree unit, or a single coding block. Accordingly, the flag bit used to represent the intra-frame sub-block partitioning information may also be located at the slice level, the coding tree level, or the block level.
The present disclosure also provides a video encoding device, including a processor and a memory storing a computer program that can run on the processor, wherein when the processor executes the computer program, the video encoding method of any embodiment of the present disclosure is implemented.
The present disclosure also provides a computer storage medium on which a computer program is stored; when the computer program is executed by a computer, the computer can perform the methods of the above method embodiments. In other words, embodiments of the present disclosure also provide a computer program product containing instructions that, when executed by a computer, cause the computer to perform the methods of the above method embodiments.
When implemented in software, the embodiments may be realized wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to embodiments of the present disclosure are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center integrating one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., digital video discs (DVD)), or semiconductor media (e.g., solid state disks (SSD)), etc.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present disclosure.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units is only a logical functional division, and in actual implementation there may be other divisions, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment. For example, the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
The above is only the specific implementation of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed by the present disclosure, and all such changes or substitutions shall be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims (9)
- An image prediction method, characterized by comprising: obtaining intra-frame sub-block partitioning information; when the intra-frame sub-block partitioning information indicates that the current block uses intra-frame sub-block partitioning, determining, based on at least one intra-frame sub-block partitioning, at least one first intra prediction mode and the prediction cost corresponding to the at least one first intra prediction mode; determining at least one first other prediction mode and the prediction cost corresponding to the at least one first other prediction mode; determining, among the at least one first intra prediction mode and the at least one first other prediction mode, the prediction mode with the smallest prediction cost as the coding prediction mode of the current block, and predicting the current block using the prediction mode with the smallest prediction cost; wherein the intra-frame sub-block partitioning information is obtained by calculating a spatial-domain continuity parameter, and the intra-frame sub-block partitioning information is written into the code stream.
- The prediction method according to claim 1, characterized in that, when the intra-frame sub-block partitioning information indicates that the current block does not use intra-frame sub-block partitioning: at least one second intra prediction mode and the prediction cost corresponding to the at least one second intra prediction mode are determined; at least one second other prediction mode and the prediction cost corresponding to the at least one second other prediction mode are determined; among the at least one second intra prediction mode and the at least one second other prediction mode, the prediction mode with the smallest prediction cost is determined as the coding prediction mode of the current block, and the current block is predicted using the prediction mode with the smallest prediction cost.
- The prediction method according to claim 1 or 2, characterized in that calculating the spatial-domain continuity parameter comprises: calculating a first gradient map of a given image area; calculating the gradient map of the first gradient map to obtain a second gradient map; and taking the average value of the second gradient map as the spatial-domain continuity parameter.
- The prediction method according to claim 3, characterized in that the given image area is a single picture, a single slice, a single coding tree block, or a single coding block, and the current block is located within the given image area.
- The prediction method according to claim 1, characterized in that the intra-frame sub-block partitioning information is located in the code stream at the sequence level, the picture level, the slice level, the coding tree level, or the block level.
- The prediction method according to claim 1 or 2, characterized in that the at least one first other prediction mode includes inter prediction or combined inter and intra prediction.
- The prediction method according to claim 1 or 2, characterized in that the at least one second other prediction mode includes inter prediction or combined inter and intra prediction.
- An image prediction apparatus, characterized by comprising a processor and a memory; the memory is used to store a computer program; the processor is used to call and run the computer program stored in the memory, to implement the method according to any one of claims 1 to 7.
- A computer-readable storage medium, characterized by being used to store a computer program; the computer program causes a computer to perform the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/117595 WO2024050723A1 (zh) | 2022-09-07 | 2022-09-07 | Image prediction method and apparatus, and computer-readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024050723A1 true WO2024050723A1 (zh) | 2024-03-14 |
Family
ID=90192692
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22957696; Country of ref document: EP; Kind code of ref document: A1 |