WO2024016171A1 - Video encoding method, device, storage medium and code stream - Google Patents

Video encoding method, device, storage medium and code stream

Info

Publication number
WO2024016171A1
WO2024016171A1 · PCT/CN2022/106532 · CN2022106532W
Authority
WO
WIPO (PCT)
Prior art keywords
block
block division
encoded
division
texture feature
Prior art date
Application number
PCT/CN2022/106532
Other languages
English (en)
French (fr)
Inventor
唐桐
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority to PCT/CN2022/106532 priority Critical patent/WO2024016171A1/zh
Publication of WO2024016171A1 publication Critical patent/WO2024016171A1/zh

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/119 — Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks

Definitions

  • Embodiments of the present disclosure relate to, but are not limited to, the technical field of video data processing, and in particular, to a video encoding method, device, storage medium, and code stream.
  • Digital video compression technology mainly compresses huge digital image and video data to facilitate transmission and storage.
  • Although digital video compression standards can save a large amount of video data, it is still necessary to pursue better digital video compression techniques to reduce the bandwidth and traffic pressure of video transmission and achieve more efficient video encoding, decoding, transmission and storage.
  • An embodiment of the present disclosure provides a video encoding method, including:
  • determining a first direction block division flag corresponding to a data block to be encoded according to texture features corresponding to the data block; and, when the first direction block division flag is a first value, not using the first direction block division method when performing block division processing of the data block to be encoded.
  • An embodiment of the present disclosure also provides a video encoding device, including a processor and a memory storing a computer program that can be run on the processor, wherein when the processor executes the computer program, any one of the aspects of the present disclosure is implemented.
  • Embodiments of the present disclosure also provide a non-transitory computer-readable storage medium.
  • the computer-readable storage medium stores a computer program, wherein, when executed by a processor, the computer program implements the method described in any embodiment of the present disclosure.
  • An embodiment of the present disclosure also provides a code stream, wherein the code stream is generated according to the video encoding method described in any embodiment of the present disclosure.
  • Figure 1 is a structural block diagram of a video encoding and decoding system that can be used in embodiments of the present disclosure
  • Figure 2 is a structural block diagram of a video encoder that can be used in embodiments of the present disclosure
  • Figure 3 is a structural block diagram of a video decoder that can be used in embodiments of the present disclosure
  • Figure 4 is a schematic diagram of multiple types of trees that can be used in embodiments of the present disclosure.
  • Figure 5 is a schematic diagram of a block partitioning process that can be used in embodiments of the present disclosure
  • Figure 6 is a schematic diagram of a QTMT block division result that can be used in embodiments of the present disclosure
  • Figure 7 is a flow chart of a video encoding method that can be used in embodiments of the present disclosure.
  • Figure 8 is a schematic diagram of a first texture feature value calculation that can be used in embodiments of the present disclosure.
  • Figure 9 is a flow chart of another video encoding method that can be used in embodiments of the present disclosure.
  • Figure 10 is a flow chart of another video encoding method that can be used in embodiments of the present disclosure.
  • Figure 11 is a schematic structural diagram of a video encoding device that can be used in embodiments of the present disclosure.
  • Each frame in the video is divided into square largest coding units (LCUs, largest coding units) or coding tree units (CTUs) of the same size (such as 128×128 or 64×64).
  • Each largest coding unit or coding tree unit can be divided into rectangular coding units (CUs) according to rules.
  • Coding units may be further divided into prediction units (PUs), transform units (TUs), etc.
  • the hybrid coding framework includes prediction, transform, quantization, entropy coding, in-loop filtering and other modules.
  • the prediction module includes intra prediction and inter prediction.
  • Inter-frame prediction includes motion estimation (motion estimation) and motion compensation (motion compensation).
  • Since adjacent pixels within a frame are strongly correlated, intra-frame prediction is used in video encoding and decoding technology to eliminate the spatial redundancy between adjacent pixels. Since there is a strong similarity between adjacent frames in the video, inter-frame prediction is used in video encoding and decoding technology to eliminate the temporal redundancy between adjacent frames, thereby improving coding efficiency.
  • the mainstream video coding standards include H.264/Advanced Video Coding (AVC), H.265/High Efficiency Video Coding (HEVC), H.266/Versatile Video Coding (VVC), standards from MPEG (Moving Picture Experts Group), AOM (Alliance for Open Media) and AVS (Audio Video coding Standard), extensions of these standards, or any other customized standards.
  • these standards reduce the amount of data transmitted and the amount of data stored through video compression technology to achieve more efficient video encoding, decoding, transmission and storage.
  • the input image is divided into fixed-size blocks as the basic unit of encoding, called macroblocks (MB, Macro Block), each including one luminance block and two chrominance blocks.
  • the luminance block size is 16×16; if 4:2:0 sampling is used, the chroma block size is half the luma block size.
  • macroblocks are further divided into small blocks for prediction according to different prediction modes.
  • for intra-frame prediction, macroblocks can be divided into small blocks of 16×16, 8×8, and 4×4, and each small block is subjected to intra-frame prediction separately.
  • the macroblock is divided into 4×4 or 8×8 small blocks, and the prediction residuals in each small block are transformed and quantized respectively to obtain the quantized coefficients.
  • Compared with H.264/AVC, H.265/HEVC introduces improvements in multiple aspects of encoding.
  • a CTU includes a luma coding tree block (CTB, Coding Tree Block) and two chroma coding tree blocks.
  • the maximum size of the CU in the H.265/HEVC standard is generally 64×64.
  • the CTU is iteratively divided into a series of coding units (CU, Coding Unit) using the quadtree (QT) method.
  • CU is the basic unit of intra-frame/inter-frame coding.
  • a CU contains one luma coding block (CB, Coding Block) and two chroma coding blocks and related syntax structures.
  • the maximum CU size equals the CTU size, and the minimum CU size is 8×8.
  • the leaf node CU obtained after coding tree division can be divided into three types according to different prediction methods: intra CU for intra-frame prediction, inter CU for inter-frame prediction and skipped CU.
  • skipped CU can be regarded as a special case of inter CU, which does not contain motion information and prediction residual information.
  • the leaf node CU contains one or more prediction units (PU, Prediction Unit).
  • H.265/HEVC supports PU sizes from 4×4 to 64×64, with a total of eight division modes.
  • in intra coding mode there are two possible partitioning modes: Part_2Nx2N and Part_NxN.
  • the CU is divided into transform units (TU, Transform Unit) using a residual quadtree applied to the prediction residual.
  • a TU contains one luminance transform block (TB, Transform Block) and two chroma transform blocks. Only square divisions are allowed, dividing a CB into 1 or 4 TBs.
  • coefficients within the same TU share the same transform and quantization process, and the supported sizes are 4×4 to 32×32.
  • TBs can span the boundaries of PBs to further maximize the coding efficiency of inter-frame coding.
  • In H.266/VVC, the image to be encoded is first divided into coding tree units (CTUs) similar to H.265/HEVC, but the maximum size is increased from 64×64 to 128×128.
  • H.266/VVC proposes quadtree and nested multi-type tree (MTT, Multi-Type Tree) division.
  • MTT includes binary tree (BT, Binary Tree) and ternary tree (TT, Ternary Tree) divisions, unifies the block concepts of H.265/HEVC, and supports more flexible CU division shapes.
  • the CTU is first divided according to the quadtree structure, and the quadtree leaf nodes are further divided through MTT; multi-type tree leaf nodes become coding units (CUs).
  • chroma can adopt a separate partition tree structure and does not have to be consistent with the luminance partition tree.
  • the chroma division of I frames in H.266/VVC uses a chroma separation tree, while the chroma division of P frames and B frames is consistent with the luminance division.
  • Figure 1 is a block diagram of a video encoding and decoding system that can be used in embodiments of the present disclosure. As shown in Figure 1, the system is divided into an encoding side device 1 and a decoding side device 2.
  • the encoding side device 1 encodes video images to generate a code stream.
  • the decoding side device 2 can decode the code stream to obtain a reconstructed video image.
  • the encoding side device 1 and the decoding side device 2 may include one or more processors and a memory coupled to the one or more processors, such as a random access memory, an electrically erasable programmable read-only memory, a flash memory, or other media.
  • the encoding side device 1 and the decoding side device 2 can be implemented by various devices, such as desktop computers, mobile computing devices, notebook computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, vehicle-mounted computers, or other similar devices.
  • the decoding side device 2 can receive the code stream from the encoding side device 1 via the link 3 .
  • Link 3 includes one or more media or devices capable of moving the code stream from the encoding side device 1 to the decoding side device 2 .
  • the link 3 includes one or more communication media that enable the encoding side device 1 to directly send the code stream to the decoding side device 2 .
  • the encoding-side device 1 can modulate the code stream according to a communication standard (eg, a wireless communication protocol), and can send the modulated code stream to the decoding-side device 2 .
  • the one or more communication media may include wireless and/or wired communication media, such as the radio frequency (RF) spectrum or one or more physical transmission lines.
  • the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (eg, the Internet).
  • the one or more communication media may include routers, switches, base stations or other devices that facilitate communication from the encoding side device 1 to the decoding side device 2 .
  • the code stream may also be output from the output interface 15 to a storage device, and the decoding side device 2 may read the stored data from the storage device via streaming or downloading.
  • the storage device may include any of a variety of distributed-access or locally-accessed data storage media, such as a hard drive, Blu-ray Disc, digital versatile disc, CD-ROM, flash memory, volatile or non-volatile memory, file servers, etc.
  • the encoding side device 1 includes a data source 11 , an encoder 13 and an output interface 15 .
  • Data source 11 may include a video capture device (eg, a video camera), an archive containing previously captured data, a feed interface to receive data from a content provider, a computer graphics system to generate the data, or a combination of these sources.
  • the encoder 13 can encode the data from the data source 11 and output it to the output interface 15.
  • the output interface 15 can include at least one of a modulator, a modem and a transmitter.
  • the decoding side device 2 includes an input interface 21 , a decoder 23 and a display device 25 .
  • input interface 21 includes at least one of a receiver and a modem.
  • the input interface 21 may receive the code stream via link 3 or from a storage device.
  • the decoder 23 decodes the received code stream.
  • the display device 25 is used to display the decoded data.
  • the display device 25 can be integrated with other devices of the decoding side device 2 or set up separately.
  • the display device 25 may be, for example, a liquid crystal display, a plasma display, an organic light emitting diode display or other types of display devices.
  • the decoding side device 2 may not include the display device 25, or may include other devices or apparatuses that use the decoded data.
  • the encoder 13 and decoder 23 of Figure 1 may be implemented using any one or any combination of the following circuits: one or more microprocessors, digital signal processors, application-specific integrated circuits, field-programmable gate arrays, discrete logic, or hardware. If the present disclosure is implemented partly in software, the instructions for the software may be stored in a suitable non-volatile computer-readable storage medium and executed in hardware using one or more processors to implement the disclosed methods.
  • Figure 2 shows a structural block diagram of an exemplary video encoder.
  • the description is mainly based on the terminology and block division of the H.265/HEVC standard, but the structure of the video encoder can also be used for video coding under H.264/AVC, H.266/VVC and other similar standards.
  • the video encoder 20 is used to encode video data and generate a code stream.
  • video encoder 20 includes a prediction processing unit 100, a partitioning unit 101, a prediction residual generation unit 102, a transform processing unit 104, a quantization unit 106, an inverse quantization unit 108, an inverse transform processing unit 110, a reconstruction unit 112, a filter unit 113, a decoded picture buffer 114, and an entropy encoding unit 116.
  • the prediction processing unit 100 includes an inter prediction processing unit 121 and an intra prediction processing unit 126 .
  • video encoder 20 may include more, fewer, or different functional components than this example.
  • the prediction residual generation unit 102 and the reconstruction unit 112 are both represented by circles with a plus sign in the figure.
  • the dividing unit 101 cooperates with the prediction processing unit 100 to divide the received video data into slices, CTUs or other larger units.
  • the video data received by the dividing unit 101 may be a video sequence including video frames such as I frames, P frames, or B frames.
  • the prediction processing unit 100 may divide the CTU into CUs and perform intra prediction encoding or inter prediction encoding on the CUs.
  • a 2N×2N CU can be divided into 2N×2N or N×N prediction units (PU: prediction unit) for intra-frame prediction.
  • a 2N×2N CU can be divided into PUs of 2N×2N, 2N×N, N×2N, N×N or other sizes for inter-frame prediction, and asymmetric PU division can also be supported.
  • the inter prediction processing unit 121 may perform inter prediction on the PU to generate prediction data of the PU, where the prediction data includes prediction blocks of the PU, motion information of the PU, and various syntax elements.
  • Intra-prediction processing unit 126 may perform intra-prediction on the PU to generate prediction data for the PU.
  • the prediction data of the PU may include the prediction block of the PU and various syntax elements.
  • Intra-prediction processing unit 126 may try multiple selectable intra-prediction modes, and select an intra-prediction mode with the lowest cost to perform intra-prediction for the PU.
  • Prediction residual generation unit 102 may generate a prediction residual block of the CU based on the original block of the CU and the prediction block of the CU-divided PU.
  • the transform processing unit 104 may divide the CU into one or more transform units (TU: Transform Unit), and the prediction residual block associated with the TU is a sub-block obtained by dividing the prediction residual block of the CU.
  • a TU-associated coefficient block is generated by applying one or more transforms to a TU-associated prediction residual block.
  • the transformation processing unit 104 may apply a discrete cosine transform (DCT: Discrete Cosine Transform), a directional transform, or other transformations to the TU-associated prediction residual block, and may convert the prediction residual block from the pixel domain to the frequency domain.
  • the quantization unit 106 can quantize the coefficients in the coefficient block based on the selected quantization parameter (QP).
  • quantization may introduce quantization loss, and the degree of quantization of the coefficient block can be adjusted by adjusting the QP value.
  • Inverse quantization unit 108 and inverse transform unit 110 may respectively apply inverse quantization and inverse transform to the coefficient block to obtain a TU-associated reconstructed prediction residual block.
  • Reconstruction unit 112 may generate a reconstruction block of the CU based on the reconstructed prediction residual block and the prediction block generated by prediction processing unit 100 .
  • the filter unit 113 performs loop filtering on the reconstructed block and stores it in the decoded picture buffer 114.
  • Intra-prediction processing unit 126 may extract PU-neighbor reconstructed reference information from the reconstructed block cached in decoded picture buffer 114 to perform intra prediction on the PU.
  • Inter prediction processing unit 121 may perform inter prediction on PUs of other pictures using reference pictures buffered by decoded picture buffer 114 that contain reconstructed blocks.
  • the entropy coding unit 116 can perform entropy coding operations on the received data (such as syntax elements, quantized coefficient blocks, motion information, etc.), such as context-adaptive variable length coding (CAVLC: Context Adaptive Variable Length Coding) or context-based adaptive binary arithmetic coding (CABAC: Context-based Adaptive Binary Arithmetic Coding), and output the code stream (i.e., the encoded video code stream).
  • Figure 3 shows a structural block diagram of an exemplary video decoder.
  • the description is mainly based on the terminology and block division of the H.265/HEVC standard, but the structure of the video decoder can also be used for videos of H.264/AVC, H.266/VVC and other similar standards. decoding.
  • the video decoder 30 can decode the received code stream and output the decoded video data.
  • the video decoder 30 includes an entropy decoding unit 150, a prediction processing unit 152, an inverse quantization unit 154, an inverse transform processing unit 156, a reconstruction unit 158 (indicated by a circle with a plus sign in the figure), a filter unit 159, and a picture buffer 160.
  • video decoder 30 may include more, fewer, or different functional components.
  • the entropy decoding unit 150 may perform entropy decoding on the received code stream, and extract information such as syntax elements, quantized coefficient blocks, and motion information of the PU.
  • the prediction processing unit 152, the inverse quantization unit 154, the inverse transform processing unit 156, the reconstruction unit 158 and the filter unit 159 may all perform corresponding operations based on syntax elements extracted from the code stream.
  • inverse quantization unit 154 may inversely quantize the quantized TU-associated coefficient block.
  • Inverse transform processing unit 156 may apply one or more inverse transforms to the inverse quantized coefficient block to produce a reconstructed prediction residual block of the TU.
  • Prediction processing unit 152 includes inter prediction processing unit 162 and intra prediction processing unit 164. If the PU is encoded using intra prediction, intra prediction processing unit 164 may determine the intra prediction mode of the PU based on the syntax elements parsed from the code stream, and perform intra prediction based on the determined intra prediction mode and the reconstructed reference information of the PU's neighbors obtained from picture buffer 160, to generate the prediction block of the PU. If the PU is encoded using inter prediction, inter prediction processing unit 162 may determine one or more reference blocks for the PU based on the motion information of the PU and corresponding syntax elements, and generate a prediction block for the PU based on these reference blocks.
  • Reconstruction unit 158 may obtain the reconstruction block of the CU based on the TU-associated reconstructed prediction residual block and the prediction block of the PU generated by prediction processing unit 152 (ie, intra prediction data or inter prediction data).
  • the filter unit 159 may perform loop filtering on the reconstruction block of the CU to obtain a reconstructed picture.
  • the reconstructed picture is stored in picture buffer 160.
  • the picture buffer 160 can provide reference pictures for subsequent motion compensation, intra-frame prediction, inter-frame prediction, etc., and can also output the reconstructed video data as decoded video data for presentation on the display device.
  • video coding includes two parts: encoding and decoding.
  • the encoding on the encoder side and the decoding on the decoder side can also be collectively referred to as coding.
  • based on the context of the relevant steps, those skilled in the art can determine whether the coding mentioned later refers to encoding on the encoder side or decoding on the decoder side.
  • the terms "coding block" or "video block" may be used in this application to refer to one or more blocks of samples, and the syntax structures used for encoding (decoding) those blocks of samples; example types of coding blocks or video blocks include the CTU, CU, PU, TU and subblock in H.265/HEVC, the CTU and CU in H.266/VVC, or macroblocks, macroblock partitions, etc. in other video codec standards.
  • CTU is the abbreviation of Coding Tree Unit, which is the coding processing unit in H.265/HEVC or H.266/VVC, equivalent to the macroblock in H.264/AVC.
  • a coding tree unit includes one luma coding tree block (CTB) and two chroma coding tree blocks (Cr, Cb) at the same position.
  • Coding unit CU is the basic unit for performing various types of encoding operations or decoding operations in the video encoding and decoding process, such as CU-based prediction, transformation, entropy coding and other operations.
  • CU refers to a two-dimensional sampling point array, which can be a square array or a rectangular array.
  • a 4×8 CU can be viewed as a rectangular sampling point array composed of a total of 32 sampling points.
  • CU can also be called image block.
  • CU includes: one luminance coding block and two chroma (Cr, Cb) coding blocks, and related syntax structures.
  • Prediction Unit also called prediction block, includes: one luma prediction block and two chroma (Cr, Cb) prediction blocks.
  • the residual block refers to the residual image block formed by subtracting the prediction block from the current block to be encoded, after the prediction block of the current block is generated through inter-frame prediction and/or intra-frame prediction; it can also be called the difference data or residual image.
  • the difference data or residual image includes: one luminance residual block and two chrominance (Cr, Cb) residual blocks.
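As a minimal illustration of the residual described above (an illustrative sketch, not the codec's actual sample arithmetic), the residual block is simply the element-wise difference between the current block and its prediction block:

```python
def residual_block(current, prediction):
    """Element-wise difference between the current block and its prediction block."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, prediction)]

cur = [[120, 122], [119, 121]]
pred = [[118, 120], [118, 120]]
res = residual_block(cur, pred)  # small values, cheaper to code than raw samples
```

With a good prediction, the residual values are close to zero and therefore cost far fewer bits after transform, quantization and entropy coding.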
  • the coefficient block is either a transform block containing transform coefficients, obtained by transforming the residual block, or, if the residual block is not transformed, the residual block itself containing the residual data (residual signal).
  • the coefficients include the coefficients of the transform block obtained by transforming the residual block, or the coefficients of the residual block; entropy coding of the coefficients means entropy coding the quantized coefficients of the transform block, or, if no transform is applied to the residual data, entropy coding the quantized coefficients of the residual block.
  • the untransformed residual signal and the transformed residual signal may also be collectively referred to as coefficients.
  • for effective compression, the coefficients need to be quantized; the quantized coefficients can also be called levels.
  • the transform block TU includes: one luminance transform block and two chroma (Cr, Cb) transform blocks.
  • Quantization is usually used to reduce the dynamic range of coefficients, thereby achieving the purpose of expressing video with fewer code words.
  • the quantized value is usually called a level.
  • the quantization operation usually divides the coefficient by the quantization step size, where the quantization step size is determined by a quantization factor carried in the code stream. Inverse quantization is accomplished by multiplying the level by the quantization step size. For an N×M block, quantization of all coefficients can be completed independently.
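The divide-then-round operation described above can be sketched as follows. The `qstep` relation is the approximate HEVC/VVC-style mapping in which the step size doubles every 6 QP; real codecs implement this with integer scaling tables, so this is an illustrative model only:

```python
def qstep(qp):
    """Approximate HEVC/VVC-style QP-to-step relation: step doubles every 6 QP."""
    return 2 ** ((qp - 4) / 6)

def quantize(coeffs, qp):
    """Scalar quantization: divide each coefficient by the step and round to a level."""
    step = qstep(qp)
    return [round(c / step) for c in coeffs]

def dequantize(levels, qp):
    """Inverse quantization: multiply each level by the step size."""
    step = qstep(qp)
    return [lv * step for lv in levels]
```

Each coefficient is quantized independently, matching the per-coefficient description above; a larger QP gives a larger step and therefore coarser levels.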
  • This technology is widely used in many international video compression standards, such as H.265/HEVC, H.266/VVC, etc.
  • a specific scanning sequence can transform a two-dimensional block of coefficients into a one-dimensional coefficient stream.
  • the scanning order can be Z-shaped, horizontal, vertical or any other order of scanning.
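A zigzag ("Z-shaped") scan of this kind can be sketched as below; this is an illustrative implementation of the idea, not the scan tables of any particular standard:

```python
def zigzag_scan(block):
    """Scan an N x N coefficient block in zigzag ('Z') order into a 1-D list."""
    n = len(block)
    coords = [(i, j) for i in range(n) for j in range(n)]
    # Walk anti-diagonals; alternate the traversal direction so the path zigzags
    coords.sort(key=lambda p: (p[0] + p[1],
                               p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[i][j] for i, j in coords]
```

Because non-zero coefficients cluster in the upper-left corner after the transform, this ordering tends to place the non-zero values first, followed by long runs of zeros that entropy coding compresses well.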
  • the quantization operation can make use of the correlation between coefficients and the characteristics of the quantized coefficients to select a better quantization method, thereby achieving the purpose of optimizing quantization.
  • the residual block is usually much simpler than the original image, so determining the residual after prediction and then encoding can significantly improve the compression efficiency.
  • the residual block is not directly encoded but is usually transformed first. The transform converts the residual image from the spatial domain to the frequency domain and removes the correlation of the residual image. After the residual image is transformed into the frequency domain, since the energy is mostly concentrated in the low-frequency region, most of the non-zero transform coefficients are concentrated in the upper-left corner. Quantization is then used for further compression, and since the human eye is insensitive to high frequencies, larger quantization step sizes can be used in high-frequency areas to further improve compression efficiency.
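The energy-compaction behavior described above can be demonstrated with a naive, unnormalized 1-D DCT-II; this is illustrative only (real codecs use normalized integer transforms), but it shows how a smooth residual concentrates its energy in the low-frequency coefficients:

```python
import math

def dct_ii(x):
    """Naive, unnormalized 1-D DCT-II of a list of samples."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
            for k in range(N)]

# A smooth (slowly varying) residual row: energy concentrates at low frequencies
row = [10, 11, 12, 13, 13, 12, 11, 10]
coeffs = dct_ii(row)
```

For this smooth input, the DC coefficient `coeffs[0]` (the sum of the samples) dominates and the high-frequency coefficients are near zero, which is why quantization after the transform is so effective.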
  • the CTU can be further divided into multiple CUs.
  • H.266/VVC adopts a more complex coding unit division structure than H.265/HEVC: the quadtree with nested multi-type tree (QTMT).
  • CTU is first divided using a quadtree, and then the leaf nodes of the quadtree can be further divided using MT.
  • the process of CU partitioning is shown in Figure 5.
  • the default CTU size of VVC is 128×128, and the minimum CU size is 4×4; the CTU is first divided into 4 sub-CUs using the QT method by default; once a CU uses the MT division method, QT division cannot be performed subsequently.
  • QT nodes can be divided according to the five ways in Figure 5, and MT nodes can be divided according to the four ways in the figure.
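The split geometries involved in QTMT can be sketched as below. The `partition` recursion is deliberately a toy policy that always QT-splits down to a fixed size; a real encoder chooses among QT/BT/TT splits by rate-distortion cost and, per the rule above, stops QT splitting once an MT split has been used:

```python
def split_qt(x, y, w, h):
    """Quadtree split into four equal sub-blocks."""
    hw, hh = w // 2, h // 2
    return [(x, y, hw, hh), (x + hw, y, hw, hh),
            (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]

def split_bt_h(x, y, w, h):
    """Horizontal binary split (two equal bands)."""
    return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]

def split_bt_v(x, y, w, h):
    """Vertical binary split (two equal columns)."""
    return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]

def split_tt_h(x, y, w, h):
    """Horizontal ternary split in 1:2:1 proportions."""
    q = h // 4
    return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]

def partition(x, y, w, h, min_size=32):
    """Toy recursion: QT-split every block down to min_size."""
    if w <= min_size:
        return [(x, y, w, h)]
    leaves = []
    for sub in split_qt(x, y, w, h):
        leaves += partition(*sub, min_size=min_size)
    return leaves

cus = partition(0, 0, 128, 128)  # 16 leaf CUs of size 32x32
```

Each tuple is `(x, y, width, height)`; the leaves of the recursion correspond to the coding units a CTU is decomposed into.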
  • QTMT division results of a CTU are shown in Figure 6.
  • on the encoder side, QTMT block division is performed in the intra/inter prediction processing unit. For each candidate block division, the corresponding reference blocks are found for prediction, and the division mode with the minimum rate-distortion cost is selected (that is, the block division method is optimized to determine the corresponding optimal division method, as described in embodiments of the present disclosure), thereby obtaining the final prediction residual; transformation and quantization are then performed to complete the block coding.
  • on the decoder side, QTMT block division is likewise handled in the intra/inter prediction processing unit, and the division tree of the current CU is determined based on prediction information such as the block division mode in order to complete the subsequent decoding steps.
  • encoding is performed according to the following methods:
  • the input image is divided into multiple non-overlapping CU blocks (i.e., the largest CU block: CTU);
  • if the current CU can continue to be divided, repeat process (2) for the divided sub-CUs; otherwise, end the encoding of the current CU, determine the optimal block division method and optimal prediction mode of the current CU, and calculate the residual;
  • the residual block is transformed, quantized, and entropy encoded, prediction information such as the block division mode is encoded, and the code stream is output for transmission.
  • decoding is performed according to the following method: first, the input code stream is entropy decoded, inverse quantized, and inverse transformed to obtain the residual value; then, the image is reconstructed based on the residual block.
  • the CU reconstruction process mainly contains the following 3 steps:
  • for image content such as text, the division is more likely to be a horizontal division.
  • related implementable QTMT technical solutions may therefore incur considerable unnecessary overhead from attempting vertical divisions.
  • Embodiments of the present disclosure provide a video encoding scheme. According to the spatial distribution characteristics of pixels in the image corresponding to the data block to be encoded (also known as the texture features of the image), unnecessary block division methods are skipped when performing block-division-related processing, so as to reduce encoding overhead, save computing resources, and shorten encoding time.
  • Embodiments of the present disclosure provide a video coding solution, as shown in Figure 7, including:
  • Step 710 Determine the first direction block division identifier corresponding to the data block to be encoded according to the texture characteristics corresponding to the data block to be encoded;
  • Step 720 When the first direction block division flag is a first value, the first direction block division method is not used when performing block division processing of the data block to be encoded.
  • the block division process of the data block to be encoded does not use the first direction block division method, including:
  • the available division methods do not include the first direction block division method.
  • optimizing the available division methods for the data block to be encoded includes: using at least one available division method to attempt block division of the current data block to be encoded, and selecting the method with the smallest rate-distortion cost as the optimal division method; alternatively, a division method with a rate-distortion cost smaller than a set rate-distortion threshold is selected as the optimal division method.
  • the specific optimization process is implemented according to related solutions; its specific aspects will not be discussed in detail in the embodiments of this application.
  • the block division scheme provided by embodiments of the present disclosure further considers, within the range of available division methods determined by related achievable block division schemes, the texture features corresponding to video data blocks;
  • when trying multiple partitioning methods for video data blocks that meet the set conditions, the horizontal or vertical block division method is skipped to reduce coding complexity and improve coding efficiency.
  • the first value is FALSE or 0; or other set values.
  • the first direction includes: a horizontal direction or a vertical direction.
  • the first direction includes: a tilt direction of a set angle, for example, tilted 45 degrees to the left of vertical, tilted 30 degrees to the right of vertical, etc.; it is not limited to specific angles.
  • the texture feature includes a first texture feature value
  • the first texture feature value is determined according to the following method:
  • the maximum gradient among the gradients of all components of the single column pixel vector is determined to be the first texture feature value.
  • the first texture feature value is determined according to the following method:
  • the maximum gradient among the gradients of all components of the single row pixel vector is determined to be the first texture feature value.
  • the first direction block division identifier is determined according to the following method:
  • the first direction block division identifier is determined to be a first value.
  • the first direction block division flag includes: a horizontal binary tree block division allowed flag and a horizontal ternary tree block division allowed flag;
  • the first direction block division flag includes: a vertical binary tree block division allowed flag and a vertical ternary tree block division allowed flag.
  • the allowed horizontal binary tree block division flag is allowSplitBtHor defined in the H.266/VVC specification.
  • the allowed horizontal ternary tree block division flag is allowSplitTtHor defined in the H.266/VVC specification.
  • the allowed vertical binary tree block division flag is allowSplitBtVer defined in the H.266/VVC specification.
  • the allowed vertical ternary tree block division flag is allowSplitTtVer defined in the H.266/VVC specification.
  • when the first direction is the horizontal direction and the first texture feature value corresponding to the video data block to be encoded is less than the first feature threshold, this indicates that a single line of text characters contained in the image corresponding to the video data block occupies the entire vertical extent of the data block, so there is no need to divide the video data block in the horizontal direction; when the first direction is the vertical direction and the first texture feature value corresponding to the video data block to be encoded is less than the first feature threshold, this indicates that a single column of text characters contained in the image corresponding to the video data block occupies the entire horizontal extent of the data block, and there is no need to divide the video data block in the vertical direction.
  • the texture features include: a second texture feature value and a third texture feature value
  • the second texture feature value and the third texture feature value are determined according to the following method:
  • the maximum value among the gradients of the last N-M components in the single column pixel vector is determined to be the third texture feature value.
  • the second texture feature value and the third texture feature value are determined according to the following method:
  • the maximum value among the gradients of the last N-M components in the single row of pixel vectors is determined to be the third texture feature value.
  • the first direction block division identifier is determined according to the following method:
  • the first direction block division identifier is determined to be the first value.
  • the first direction block division flag includes: a horizontal binary tree block division allowed flag
  • the first direction block division flag includes: an allowed vertical binary tree block division flag.
  • the allowed horizontal binary tree block division flag is allowSplitBtHor defined in the H.266/VVC specification.
  • the allowed vertical binary tree block division flag is allowSplitBtVer defined in the H.266/VVC specification.
  • the embodiment of the present disclosure records a solution in which the first direction block division flag is determined to be FALSE; the first direction block division flag being FALSE indicates that the horizontal block division method is not used when performing block division processing of the data block to be encoded.
  • the embodiment of the present disclosure records a solution in which the first direction block division flag is determined to be FALSE; the first direction block division flag being FALSE indicates that the vertical block division method is not used when performing block division processing of the data block to be encoded.
  • whether the first direction block division flag of the data block to be encoded is determined to be TRUE still needs to be judged according to other constraints in the related scheme, and the final result may be FALSE or TRUE.
  • the specific judgments of the other constraints in the related schemes are not detailed here.
  • when the first direction is another direction, a similar judgment scheme also applies; detailed examples are not provided here.
  • the y-th component line(y) in the single column pixel vector Line is calculated according to the following formula:
  • CU(x,y) is the pixel value of the video data block to be encoded at position (x, y), and CUWidth is the width of the video data block to be encoded.
  • the gradient of each component in the single column pixel vector is calculated according to the following formula:
  • G(y) is the gradient of the y-th component in the single column pixel vector.
  • the first texture feature value MG is determined based on the maximum gradient value of each component in a single column vector:
  • the gradient of each component of the single-column vector is calculated, and the maximum gradient value MG among them is determined as the first texture feature value.
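Under one plausible reading of the formulas referenced above (the single-column vector component line(y) as the average of row y over the CUWidth pixels, and the gradient G(y) as the absolute difference between adjacent components; both forms are assumptions, since the exact formulas are not reproduced in this text), the first texture feature value MG can be sketched as:

```python
def column_vector(cu):
    """line(y): average of row y of the CU over CUWidth pixels (assumed form)."""
    cu_width = len(cu[0])
    return [sum(row) / cu_width for row in cu]

def gradients(vec):
    """G(y): absolute difference between adjacent components (assumed form)."""
    return [abs(vec[y] - vec[y - 1]) for y in range(1, len(vec))]

def first_texture_feature(cu):
    """MG: maximum gradient over all components of the single-column vector."""
    return max(gradients(column_vector(cu)))
```

For the vertical-direction case the same computation would run over a single-row vector of column averages instead.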
  • if the first texture feature value of the upper video data block in Figure 8 is less than the first feature threshold of 80, then when block division optimization is performed for that video data block, the horizontal binary tree and horizontal ternary tree block division attempts are skipped;
  • if the first texture feature value of the lower video data block is greater than the first feature threshold of 80, then block division is attempted according to the existing solution.
  • the first feature threshold is 80; then, when the first texture feature value MG of the data block to be encoded < the first feature threshold (80), the first direction block division flag is determined to be FALSE.
  • the horizontal binary tree blocking method and the horizontal ternary tree blocking method are skipped.
  • the first feature threshold may be set to other values and is not limited to the above example.
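With the example threshold of 80, the gating described above can be sketched as follows. The flag still depends on the specification's other constraints, represented here only by a boolean placeholder argument; this is an illustration, not the H.266/VVC derivation itself.

```python
FIRST_FEATURE_THRESHOLD = 80  # example value from the text; configurable

def horizontal_split_allowed(mg, other_constraints_ok=True):
    """Force allowSplitBtHor / allowSplitTtHor to FALSE when MG is below
    the first feature threshold; otherwise defer to the remaining
    H.266/VVC constraints (modeled here as a boolean placeholder)."""
    if mg < FIRST_FEATURE_THRESHOLD:
        return False
    return other_constraints_ok
```

A low MG thus skips the horizontal split attempts regardless of the other conditions, while a high MG leaves the decision to them.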
  • the following is added to the judgment conditions for determining the allowed horizontal binary split flag (allowSplitBtHor) in the allowed binary split process:
  • the allowed binary tree splitting flag (allowBtSplit) is FALSE.
  • the following is added to the judgment conditions for determining the allowed horizontal ternary split flag (allowSplitTtHor) in the allowed ternary split process:
  • the allowed ternary tree splitting flag (allowTtSplit) is FALSE.
  • the following is added to the judgment conditions for determining the allowed vertical binary tree split flag (allowSplitBtVer) in the allowed binary split process:
  • the allowed binary tree splitting flag (allowBtSplit) is FALSE.
  • the following is added to the judgment conditions for determining the allowed vertical ternary split flag (allowSplitTtVer) in the allowed ternary split process:
  • the allowed ternary tree splitting flag (allowTtSplit) is FALSE.
  • the above embodiment describes determining the first direction block division flag based on the first texture feature value of the image corresponding to the video data block to be encoded.
  • the judgment conditions for determining the allowed binary tree split flag (allowBtSplit) in the allowed binary split process also include other conditions. When any one or more of all the conditions are met (TRUE), the allowed binary tree split flag (allowBtSplit) is FALSE, and correspondingly the allowed horizontal binary tree split flag (allowSplitBtHor) is FALSE; when none of the conditions is met, the allowed binary tree split flag (allowBtSplit) is TRUE, and correspondingly the allowed horizontal binary tree split flag (allowSplitBtHor) is TRUE.
  • the allowed horizontal ternary tree splitting flag (allowSplitTtHor) and the allowed ternary tree splitting flag (allowTtSplit) are also determined in a similar manner and will not be discussed in detail here.
  • the x-th component line(x) in the single-line pixel vector Line is calculated according to the following formula:
  • CU(x,y) is the pixel value of the video data block to be encoded at position (x, y), and CUHeight is the height of the video data block to be encoded.
  • G(x) is the gradient of the x-th component in the single-row pixel vector.
  • the first texture feature value MG is determined based on the maximum gradient value of each component in a single row vector:
  • the second texture feature value MG1 and the third texture feature value MG2 are determined according to the gradient value of each component in a single column vector as follows:
  • N = CUHeight is the height of the video data block to be encoded, and M = CUHeight/2.
  • N = CUWidth is the width of the video data block to be encoded, and M = CUWidth/2.
  • the solution in which the first direction block division flag is determined to be the first value based on the second texture feature value and the third texture feature value can more accurately filter out the data blocks to be encoded that do not require binary tree division in the first direction.
  • the second feature threshold is 80; then, when the second texture feature value MG1 of the data block to be encoded < the second feature threshold (80) and the third texture feature value MG2 < the second feature threshold (80), the first direction block division flag is determined to be the first value. In an embodiment of the present disclosure, when performing optimization among multiple partitioning methods, the first direction binary tree block division is skipped.
  • the second feature threshold may be set to other values and is not limited to the above example.
  • the first feature threshold and the second feature threshold are set independently and may be the same or different.
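A sketch of the MG1/MG2 variant, reusing the assumed averaged single-column vector and adjacent-difference gradient, with N = CUHeight and M = CUHeight // 2. How the gradient index range is split between the two halves is an assumption for illustration.

```python
def second_third_texture_features(cu):
    """MG1: max gradient among the first M components; MG2: max gradient
    among the last N - M components of the single-column pixel vector
    (vector and gradient forms are assumed, as above)."""
    vec = [sum(row) / len(row) for row in cu]
    grad = [0.0] + [abs(vec[y] - vec[y - 1]) for y in range(1, len(vec))]
    m = len(vec) // 2  # M = CUHeight // 2
    return max(grad[:m]), max(grad[m:])

def skip_first_direction_bt(mg1, mg2, threshold=80):
    """Both halves smooth: the first direction binary tree split is skipped."""
    return mg1 < threshold and mg2 < threshold
```

A block that is smooth in both halves passes the skip condition, while a block with strong gradients in either half does not.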
  • the following is added to the judgment conditions for determining the allowed horizontal binary split flag (allowSplitBtHor) in the allowed binary split process:
  • the allowed binary tree split flag (allowBtSplit) is FALSE.
  • the following is added to the judgment conditions for determining the allowed horizontal ternary split flag (allowSplitTtHor) in the allowed ternary split process:
  • the allowed ternary tree splitting flag (allowTtSplit) is FALSE.
  • the following is added to the judgment conditions for determining the allowed vertical binary tree split flag (allowSplitBtVer) in the allowed binary split process:
  • the allowed binary tree split flag (allowBtSplit) is FALSE.
  • the following is added to the judgment conditions for determining the allowed vertical ternary split flag (allowSplitTtVer) in the allowed ternary split process:
  • the allowed ternary tree splitting flag (allowTtSplit) is FALSE.
  • the judgment conditions for determining the allowed binary tree split flag (allowBtSplit) in the allowed binary split process also include other conditions. When any one or more of all the conditions are met (TRUE), the allowed binary tree split flag (allowBtSplit) is FALSE, and correspondingly the allowed horizontal binary tree split flag (allowSplitBtHor) is FALSE; when none of the conditions is met, the allowed binary tree split flag (allowBtSplit) is TRUE, and correspondingly the allowed horizontal binary tree split flag (allowSplitBtHor) is TRUE.
  • the allowed horizontal ternary tree splitting flag (allowSplitTtHor) and the allowed ternary tree splitting flag (allowTtSplit) are also determined in a similar manner and will not be discussed in detail here.
  • An embodiment of the present disclosure provides a video encoding method, as shown in Figure 9, including:
  • Step 910: According to the method described in steps 710 and 720, optimize the available division methods for the data block to be encoded, determine the optimal division method corresponding to the current block division depth of the data block to be encoded, and determine whether to continue dividing to the next depth;
  • Step 920: If it is determined not to continue dividing at the next depth, after dividing the data block to be encoded according to the determined optimal division method, complete the subsequent encoding steps of the data block to be encoded.
  • the video encoding method further includes:
  • Step 930: When it is determined to continue dividing to the next depth, after dividing the data block to be encoded according to the determined optimal division method, perform the next-depth block division and encoding steps on each divided data sub-block to be encoded in turn.
  • in step 910, after the first direction block division flag is determined according to the encoding method of any embodiment of the present disclosure, when the first direction block division flag is the first value, the first direction block division method is not used when optimizing the available division methods for the data block to be encoded; that is, when the available division methods are optimized for the data block to be encoded, the available division methods do not include the first direction block division method.
  • step 910 includes:
  • when the available division methods are optimized for the data block to be encoded, the available division methods do not include the first direction block division method;
  • step 910 is performed again for each data sub-block. It can be seen that when step 910 is executed again for each data sub-block, its corresponding block division depth is increased by 1.
  • optimizing the available division methods for the data block to be encoded and determining the optimal division method corresponding to the current block division depth includes: using at least one available division method to attempt block division of the current data block to be encoded, and selecting the method with the smallest rate-distortion cost as the optimal division method; or selecting a division method with a rate-distortion cost smaller than a set rate-distortion threshold as the optimal division method.
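The filtering of available division methods by the per-direction allow flags in step 910 can be sketched as follows. The mode names are illustrative shorthand, not VVC syntax elements; only the four flag identifiers come from the text.

```python
def candidate_splits(flags):
    """Return the multi-type-tree split modes still available after the
    per-direction allow flags are applied; a flag absent from `flags`
    defaults to allowed (TRUE)."""
    modes = {
        "BT_HOR": "allowSplitBtHor",
        "TT_HOR": "allowSplitTtHor",
        "BT_VER": "allowSplitBtVer",
        "TT_VER": "allowSplitTtVer",
    }
    return [mode for mode, flag in modes.items() if flags.get(flag, True)]
```

When the texture analysis forces the horizontal flags to FALSE, only the vertical split modes remain for the rate-distortion trials, which is exactly the saving described above.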
  • the specific optimization process is implemented according to related solutions; its specific aspects will not be discussed in detail in the embodiments of this application.
  • using the encoding method provided by the embodiments of the present disclosure, implemented on the VVC reference software VTM11.0, text-based screen content videos (WebBrowsing, WordEditing, ChineseDocumentEditing, etc.) are tested in intra-frame mode.
  • the average encoding performance loss is 0.71%
  • the encoding time is reduced by 12.6% on average.
  • the data shows that the embodiments of the present disclosure can effectively save encoding time without substantially reducing encoding performance. That is, for text-based screen content videos, this technology can significantly reduce coding complexity while maintaining coding performance that is basically equivalent to that of related technologies.
  • the video encoding scheme analyzes the spatial distribution of pixels in the image corresponding to the video data block to determine whether its texture features indicate that block division processing in the first direction is not required. For example, if a single line of words occupies the entire vertical extent of the block, each horizontal sub-block produced by horizontal division would contain only part of a word, and the possibility of finding similar matching blocks is small; therefore, it is determined that this type of data block meets the condition of not performing horizontal division.
  • when performing block-division-related processing for such a data block, horizontal division is not required. For example, when performing the optimization of available division methods, the available division methods do not include the horizontal block division method; that is, attempts at the horizontal block division method are skipped. This saves computing resources and shortens coding time.
  • the embodiments of the present disclosure illustrate a comparison scheme based on MG and the first feature threshold, a comparison scheme based on MG1, MG2, and the second feature threshold, and calculation schemes for MG, MG1, and MG2.
  • other equivalent variant judgment schemes can also be used to determine that the spatial distribution of pixels in the image corresponding to the video data block is unsuitable for division in the first direction, and these unsuitable block division attempts can then be skipped in subsequent block division processing.
  • An embodiment of the present disclosure also provides a video encoding device, as shown in Figure 11, including a processor and a memory storing a computer program that can run on the processor, wherein the processor, when executing the computer program, implements the video encoding method described in any embodiment of the present disclosure.
  • An embodiment of the present disclosure also provides a non-transitory computer-readable storage medium.
  • the computer-readable storage medium stores a computer program, wherein, when the computer program is executed by a processor, it implements the video encoding method described in any embodiment of the present disclosure.
  • An embodiment of the present disclosure also provides a code stream, wherein the code stream is generated according to the video encoding method described in any embodiment of the present disclosure.
  • the block division and encoding/decoding methods analyze the pixel spatial distribution of the image of the data block to be encoded to determine the corresponding texture features; according to the texture features, unsuitable block division methods are skipped when performing block-division-related processing, which can effectively reduce the amount of computation on the encoding side and shorten the encoding time.
  • Computer-readable media may include computer-readable storage media that corresponds to tangible media, such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, such as according to a communications protocol.
  • Computer-readable media generally may correspond to non-transitory, tangible computer-readable storage media or communication media such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure.
  • a computer program product may include computer-readable media.
  • Such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
  • Disks and discs, as used herein, include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs; disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuits.
  • processors may refer to any of the structures described above or any other structure suitable for implementing the techniques described herein.
  • the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec.
  • the techniques may be implemented entirely in one or more circuits or logic elements.
  • The techniques of the present disclosure may be implemented in a wide variety of devices or equipment, including wireless handsets, integrated circuits (ICs), or a set of ICs (e.g., a chipset).
  • Various components, modules or units are depicted in embodiments of the present disclosure to emphasize functional aspects of devices configured to perform the described techniques, but do not necessarily require implementation by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperating hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • Communication media typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure provides a video encoding method, device, storage medium, and code stream. The video encoding method includes: determining, according to texture features corresponding to a data block to be encoded, a first direction block division flag corresponding to the data block to be encoded; and, when the first direction block division flag is a first value, not using the first direction block division method when performing block division processing of the data block to be encoded. According to the video encoding scheme provided by the embodiments of the present disclosure, unsuitable block division methods are filtered out according to the texture features of the image in a video data block, and these unsuitable block division methods are skipped during block-division-related processing in video encoding, which can effectively reduce the amount of computation on the encoding side and shorten the encoding time.

Description

A video encoding method, device, storage medium, and code stream. Technical Field
Embodiments of the present disclosure relate to, but are not limited to, the technical field of video data processing, and in particular to a video encoding method, device, storage medium, and code stream.
Background Art
Digital video compression technology mainly compresses massive digital image and video data to facilitate transmission and storage. With the explosive growth of Internet video and people's increasing demands for video definition, although existing digital video compression standards can save considerable video data, better digital video compression technology is still needed to reduce the bandwidth and traffic pressure of digital video transmission and to achieve more efficient video encoding, decoding, transmission, and storage.
To provide optimal video data compression results, the encoding side often needs to attempt encoding under multiple specific available configurations and select the best. Therefore, on the premise of meeting video playback and transmission requirements, the coding field needs, on the one hand, to pursue better video compression solutions and, on the other hand, to take encoding efficiency into account.
Summary of the Invention
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
Embodiments of the present disclosure provide a video encoding method, including:
determining, according to texture features corresponding to a data block to be encoded, a first direction block division flag corresponding to the data block to be encoded;
when the first direction block division flag is a first value, not using the first direction block division method when performing block division processing of the data block to be encoded.
Embodiments of the present disclosure also provide a video encoding device, including a processor and a memory storing a computer program runnable on the processor, wherein the processor, when executing the computer program, implements the video encoding method described in any embodiment of the present disclosure.
Embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the video encoding method described in any embodiment of the present disclosure.
Embodiments of the present disclosure also provide a code stream, wherein the code stream is generated according to the video encoding method described in any embodiment of the present disclosure.
Other aspects will become apparent upon reading and understanding the drawings and detailed description.
Brief Description of the Drawings
The drawings are provided for an understanding of the embodiments of the present disclosure, constitute a part of the specification, and together with the embodiments serve to explain the technical solutions of the present disclosure without limiting them.
Figure 1 is a structural block diagram of a video encoding and decoding system usable in embodiments of the present disclosure;
Figure 2 is a structural block diagram of a video encoder usable in embodiments of the present disclosure;
Figure 3 is a structural block diagram of a video decoder usable in embodiments of the present disclosure;
Figure 4 is a schematic diagram of the multi-type tree involved in embodiments of the present disclosure;
Figure 5 is a schematic diagram of a block division flow usable in embodiments of the present disclosure;
Figure 6 is a schematic diagram of a QTMT block division result usable in embodiments of the present disclosure;
Figure 7 is a flowchart of a video encoding method usable in embodiments of the present disclosure;
Figure 8 is a schematic diagram of first texture feature value calculation usable in embodiments of the present disclosure;
Figure 9 is a flowchart of another video encoding method usable in embodiments of the present disclosure;
Figure 10 is a flowchart of another video encoding method usable in embodiments of the present disclosure;
Figure 11 is a schematic structural diagram of a video encoding device usable in embodiments of the present disclosure.
Detailed Description
The present disclosure describes a number of embodiments, but the description is exemplary rather than limiting, and it will be apparent to those of ordinary skill in the art that more embodiments and implementations are possible within the scope of the embodiments described herein.
In the present disclosure, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment described herein as "exemplary" or "for example" should not be construed as preferable to or more advantageous than other embodiments.
In describing representative exemplary embodiments, the specification may have presented methods and/or processes as a particular sequence of steps. However, to the extent that the method or process does not depend on the particular order of the steps described herein, it should not be limited to that particular order. As those of ordinary skill in the art will understand, other step orders are also possible. Therefore, the particular order of steps set forth in the specification should not be construed as limiting the claims. In addition, claims directed to the method and/or process should not be limited to performing their steps in the order written; those skilled in the art can readily appreciate that these orders may vary while remaining within the spirit and scope of the embodiments of the present disclosure.
Current mainstream video coding standards all adopt a block-based hybrid coding framework. Each frame of a video is partitioned into square largest coding units (LCUs) or coding tree units (CTUs) of equal size (e.g., 128x128, 64x64). Each largest coding unit or coding tree unit may be divided into rectangular coding units (CUs) according to rules, and a coding unit may be further divided into prediction units (PUs), transform units (TUs), and so on. The hybrid coding framework includes modules such as prediction, transform, quantization, entropy coding, and in-loop filtering. The prediction module includes intra prediction and inter prediction, and inter prediction includes motion estimation and motion compensation. Because strong correlation exists between adjacent pixels within a video frame, intra prediction is used in video coding to eliminate spatial redundancy between adjacent pixels; because strong similarity exists between adjacent frames of a video, inter prediction is used to eliminate temporal redundancy between adjacent frames, thereby improving coding efficiency.
Internationally, mainstream video coding standards include H.264/Advanced Video Coding (AVC), H.265/High Efficiency Video Coding (HEVC), H.266/Versatile Video Coding (VVC), MPEG (Moving Picture Experts Group), AOM (Alliance for Open Media), AVS (Audio Video coding Standard), extensions of these standards, and any other custom standards. These standards use video compression techniques to reduce the amount of data transmitted and stored, so as to achieve more efficient video encoding, decoding, transmission, and storage.
In H.264/AVC, the input image is divided into fixed-size blocks that serve as the basic coding unit, called macroblocks (MBs), each comprising one luma block and two chroma blocks, with the luma block sized 16×16. With 4:2:0 sampling, the chroma block size is half the luma block size. In the prediction stage, the macroblock is further divided into smaller blocks for prediction depending on the prediction mode. In intra prediction, the macroblock can be divided into 16×16, 8×8, or 4×4 sub-blocks, each of which is intra-predicted separately. In the transform and quantization stage, the macroblock is divided into 4×4 or 8×8 sub-blocks, and the prediction residual of each sub-block is transformed and quantized to obtain quantized coefficients.
Compared with H.264/AVC, H.265/HEVC adopts improvements in multiple coding stages. In H.265/HEVC, a picture is partitioned into coding tree units (CTUs), the basic unit of coding (corresponding to the macroblock in H.264/AVC). A CTU contains one luma coding tree block (CTB) and two chroma coding tree blocks; the maximum CU size in the H.265/HEVC standard is generally 64×64. To adapt to diverse video content and characteristics, a CTU is iteratively divided in a quadtree (QT) fashion into a series of coding units (CUs), which are the basic units of intra/inter coding. A CU contains one luma coding block (CB), two chroma coding blocks, and associated syntax structures; the maximum CU size is the CTU, and the minimum CU size is 8×8. The leaf-node CUs obtained from the coding-tree division fall into three types according to the prediction method: intra-predicted intra CUs, inter-predicted inter CUs, and skipped CUs. A skipped CU can be regarded as a special case of an inter CU containing no motion information or prediction residual information. A leaf-node CU contains one or more prediction units (PUs); H.265/HEVC supports PUs from 4×4 to 64×64, with eight partition modes in total. For intra coding modes there are two possible partition modes: Part_2Nx2N and Part_NxN. For the prediction residual signal, the CU is divided into transform units (TUs) using a residual quadtree. A TU contains one luma transform block (TB) and two chroma transform blocks. Only square divisions are allowed, dividing one CB into 1 or 4 PBs. A given TU undergoes the same transform and quantization process, with supported sizes from 4×4 to 32×32. Unlike previous coding standards, in inter prediction a TB may cross PB boundaries to further maximize inter-coding efficiency.
In H.266/VVC, the video picture is first divided into coding tree units (CTUs) similar to those of H.265/HEVC, but the maximum size is increased from 64×64 to 128×128. H.266/VVC introduces quadtree plus nested multi-type tree (MTT) partitioning, where the MTT includes binary tree (BT) and ternary tree (TT) splits; it unifies the H.265/HEVC concepts of CU, PU, and TU and supports more flexible CU shapes. The CTU is first divided according to the quadtree structure, and the leaf nodes are further divided by the MTT. Multi-type-tree leaf nodes become coding units (CUs); when a CU is no larger than the maximum transform unit (64×64), no further division is performed for subsequent prediction and transform. In most cases, CU, PU, and TU have the same size. Considering the different characteristics of luma and chroma and the parallelism of practical implementations, in H.266/VVC chroma may use a separate division tree rather than following the luma division tree. Chroma division of I frames in H.266/VVC uses a separate chroma tree, while chroma division of P and B frames follows the luma division.
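The quadtree-plus-multi-type-tree geometry described above can be summarized by the sub-block sizes each split produces; the ternary split uses a 1:2:1 ratio. The mode names below are illustrative shorthand, not VVC syntax elements.

```python
def split_sizes(width, height, mode):
    """Sub-CU (width, height) sizes produced by the H.266/VVC QT/MTT splits."""
    if mode == "QT":
        return [(width // 2, height // 2)] * 4
    if mode == "BT_HOR":
        return [(width, height // 2)] * 2
    if mode == "BT_VER":
        return [(width // 2, height)] * 2
    if mode == "TT_HOR":  # 1:2:1 horizontal ternary split
        return [(width, height // 4), (width, height // 2), (width, height // 4)]
    if mode == "TT_VER":  # 1:2:1 vertical ternary split
        return [(width // 4, height), (width // 2, height), (width // 4, height)]
    raise ValueError(f"unknown split mode: {mode}")
```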
Figure 1 is a block diagram of a video encoding and decoding system usable in embodiments of the present disclosure. As shown in Figure 1, the system is divided into an encoding-side device 1 and a decoding-side device 2. The encoding-side device 1 encodes video images to produce a code stream; the decoding-side device 2 can decode the code stream to obtain reconstructed video images. The encoding-side device 1 and decoding-side device 2 may each include one or more processors and memory coupled to the one or more processors, such as random access memory, electrically erasable programmable read-only memory, flash memory, or other media. The encoding-side device 1 and decoding-side device 2 can be implemented with various apparatuses, such as desktop computers, mobile computing devices, notebook computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, in-vehicle computers, or other similar apparatuses.
The decoding-side device 2 may receive the code stream from the encoding-side device 1 via a link 3. The link 3 includes one or more media or devices capable of moving the code stream from the encoding-side device 1 to the decoding-side device 2. In one example, the link 3 includes one or more communication media that enable the encoding-side device 1 to send the code stream directly to the decoding-side device 2. The encoding-side device 1 may modulate the code stream according to a communication standard (e.g., a wireless communication protocol) and may send the modulated code stream to the decoding-side device 2. The one or more communication media may include wireless and/or wired communication media, such as the radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the Internet). The one or more communication media may include routers, switches, base stations, or other equipment facilitating communication from the encoding-side device 1 to the decoding-side device 2. In another example, the code stream may also be output from an output interface 15 to a storage device, from which the decoding-side device 2 can read the stored data via streaming or download. The storage device may include any of a variety of distributed-access or locally accessed data storage media, such as hard disk drives, Blu-ray discs, digital versatile discs, compact discs, flash memory, volatile or non-volatile memory, file servers, and so on.
In the example shown in Figure 1, the encoding-side device 1 includes a data source 11, an encoder 13, and an output interface 15. In some examples, the data source 11 may include a video capture device (e.g., a camera), an archive containing previously captured data, a feed interface for receiving data from a content provider, a computer graphics system for generating data, or a combination of these sources. The encoder 13 may encode the data from the data source 11 and output it to the output interface 15, which may include at least one of a regulator, a modem, and a transmitter.
In the example shown in Figure 1, the decoding-side device 2 includes an input interface 21, a decoder 23, and a display device 25. In some examples, the input interface 21 includes at least one of a receiver and a modem. The input interface 21 may receive the code stream via the link 3 or from the storage device. The decoder 23 decodes the received code stream. The display device 25 displays the decoded data; it may be integrated with other apparatuses of the decoding-side device 2 or provided separately, and may be, for example, a liquid crystal display, a plasma display, an organic light-emitting diode display, or another type of display device. In other examples, the decoding-side device 2 may not include the display device 25, or may include other apparatuses or equipment that use the decoded data.
The encoder 13 and decoder 23 of Figure 1 may each be implemented using any one, or any combination, of the following circuits: one or more microprocessors, digital signal processors, application-specific integrated circuits, field-programmable gate arrays, discrete logic, or hardware. If the present disclosure is implemented partly in software, the instructions for the software may be stored in a suitable non-volatile computer-readable storage medium, and one or more processors may be used to execute the instructions in hardware to carry out the methods of the present disclosure.
图2所示是一种示例性的视频编码器的结构框图。在该示例中,主要基于H.265/HEVC标准的术语和块划分方式进行描述,但该视频编码器的结构也可以用于H.264/AVC、H.266/VVC及其他类似标准的视频编码。
如图所示,视频编码器20用于对视频数据编码,生成码流。如图所示,视频编码器20包含预测处理单元100、划分单元101、预测残差产生单元102、变换处理单元104、量化单元106、反量化单元108、反变换处理单元110、重建单元112、滤波器单元113、已解码图片缓冲器114,以及熵编码单元116。预测处理单元100包含帧间预测处理单元121和帧内预测处理单元126。在其他实施例中,视频编码器20可以包含比该示例更多、更少或不同功能组件。预测残差产生单元102和重建单元112在图中均用带加号的圆圈表示。
划分单元101与预测处理单元100配合将接收的视频数据划分为切片(Slice)、CTU或其它较大的单元。划分单元101接收的视频数据可以是包括I帧、P帧或B帧等视频帧的视频序列。
预测处理单元100可以将CTU划分为CU,对CU执行帧内预测编码或帧间预测编码。对CU做帧内编码时,可以将2N×2N的CU划分为2N×2N或N×N的预测单元(PU:prediction unit)进行帧内预测。对CU做帧间预测时,可以将2N×2N的CU划分为2N×2N、2N×N、N×2N、N×N或其他大小的PU进行帧间预测,也可以支持对PU的不对称划分。
帧间预测处理单元121可对PU执行帧间预测,产生PU的预测数据,所述预测数据包括PU的预测块、PU的运动信息和各种语法元素。
帧内预测处理单元126可对PU执行帧内预测,产生PU的预测数据。PU的预测数据可包含PU的预测块和各种语法元素。帧内预测处理单元126可尝试多种可选择的帧内预测模式,从中选取代价最小的一种帧内预测模式来执行对PU的帧内预测。
预测残差产生单元102可基于CU的原始块和CU划分的PU的预测块,产生CU的预测残差块。
变换处理单元104可将CU划分为一个或多个变换单元(TU:Transform Unit),TU关联的预测残差块是CU的预测残差块划分得到的子块。通过将一种或多种变换应用于TU关联的预测残差块来产生TU关联的系数块。例如,变换处理单元104可将离散余弦变换(DCT:Discrete Cosine Transform)、方向性变换或其他的变换应用于TU关联的预测残差块,可将预测残差块从像素域转换到频域。
量化单元106可基于选定的量化参数(QP)对系数块中的系数进行量化,量化可能会带来量化损失(quantization loss),通过调整QP值可以调整对系数块的量化程度。
反量化单元108和反变换单元110可分别将反量化和反变换应用于系数块,得到TU关联的重建预测残差块。
重建单元112可基于所述重建预测残差块和预测处理单元100产生的预测块,产生CU的重建块。
滤波器单元113对所述重建块执行环路滤波后存储在已解码图片缓冲器114中。帧内预测处理单元126可以从已解码图片缓冲器114缓存的重建块中提取PU邻近的已重建参考信息以对PU执行帧内预测。帧间预测处理单元121可使用已解码图片缓冲器114缓存的含有重建块的参考图片对其他图片的PU执行帧间预测。
熵编码单元116可以对接收的数据(如语法元素、量化后的系数块、运动信息等)执行熵编码操作,如执行上下文自适应可变长度编码(CAVLC:Context Adaptive Variable Length Coding)、上下文自适应二进制算术编码(CABAC:Context-based Adaptive Binary Arithmetic Coding)等,输出码流(即已编码视频码流)。
图3所示是一种示例性的视频解码器的结构框图。在该示例中,主要基于H.265/HEVC标准的术语和块划分方式进行描述,但该视频解码器的结构也可以用于H.264/AVC、H.266/VVC及其他类似标准的视频解码。
视频解码器30可对接收的码流解码,输出已解码视频数据。如图所示,视频解码器30包含熵解码单元150、预测处理单元152、反量化单元154、反变换处理单元156、重建单元158(图中用带加号的圆圈表示)、滤波器单元159,以及图片缓冲器160。在其它实施例中,视频解码器30可以包含更多、更少或不同的功能组件。
熵解码单元150可对接收的码流进行熵解码,提取语法元素、量化后的系数块和PU的运动信息等信息。预测处理单元152、反量化单元154、反变换处理单元156、重建单元158以及滤波器单元159均可基于从码流提取的语法元素来执行相应的操作。
作为执行重建操作的功能组件,反量化单元154可对量化后的TU关联的系数块进行反量化。反变换处理单元156可将一种或多种反变换应用于反量化后的系数块以便产生TU的重建预测残差块。
预测处理单元152包含帧间预测处理单元162和帧内预测处理单元164。如果PU使用帧内预测编码,帧内预测处理单元164可基于从码流解析出的语法元素确定PU的帧内预测模式,根据确定的帧内预测模式和从图片缓冲器160获取的PU邻近的已重建参考信息执行帧内预测,产生PU的预测块。如果PU使用帧间预测编码,帧间预测处理单元162可基于PU的运动信息和相应的语法元素来确定PU的一个或多个参考块,基于所述参考块来产生PU的预测块。
重建单元158可基于TU关联的重建预测残差块和预测处理单元152产生的PU的预测块(即帧内预测数据或帧间预测数据),得到CU的重建块。
滤波器单元159可对CU的重建块执行环路滤波,得到重建的图片。重建的图片存储在图片缓冲器160中。图片缓冲器160可提供参考图片以用于后续运动补偿、帧内预测、帧间预测等,也可将重建的视频数据作为已解码视频数据输出,以在显示装置上呈现。
因为视频编码包括编码和解码两部分,为方便后文描述,编码器端的编码和解码器端的解码也可以统称为编码或译码。根据相关步骤的上下文记载,本领域技术人员可以知晓后续提及的编码(译码)是指编码器端的编码,还是指解码器端的解码。本申请中可使用术语“编码块”或“视频块”以指代样本的一个或多个块,以及对于一个或多个样本块进行编码(译码)的语法结构;编码块或视频块的实例类型可包括H.265/HEVC中的CTU、CU、PU、TU、subblock,H.266/VVC中的CTU、CU,或其他视频编解码标准中的宏块、宏块分割区等。
下面先对本公开实施例中涉及到的一些概念进行介绍。本公开实施例的相关记载采用了H.265/HEVC或H.266/VVC中的术语,以易于解释。然而,并不限定本公开实施例提供的方案受限于H.265/HEVC或H.266/VVC,实际上,本公开实施例提供的技术方案也可以实施于H.264/AVC、MPEG、AOM、AVS等,以及这些标准的后续和扩展中。
CTU是Coding Tree Unit的缩写,是H.265/HEVC或H.266/VVC中的编码处理单元,相当于H.264/AVC中的宏块。根据YUV采样格式,一个编码树单元(CTU)应当是包含了同一位置处的一个亮度编码树块(CTB)和两个色度编码树块(CTB)(Cr,Cb)。
编码单元CU(Coding Unit),是视频编解码过程中进行各种类型的编码操作或解码操作的基本单元,例如基于CU的预测、变换、熵编码等等操作。CU是指一个二维采样点阵列,可以是正方形阵列,或者可以是矩形阵列。例如,一个4x8大小的CU可看做4x8共32个采样点构成的方形采样点阵列。CU也可称为图像块。CU包括:一个亮度编码块和两个色度(Cr,Cb)编码块,及相关语法结构。
预测单元(Prediction Unit),也称为预测块,包括:一个亮度预测块和两个色度(Cr,Cb)预测块。
残差块,是指在经帧间预测和/或帧内预测产生当前块的预测块后,将从待编码的当前块减去所述预测块形成的残差图像块,也可称为残差数据或残差图像,包括:一个亮度残差块和两个色度(Cr,Cb)残差块。
系数块,包括对残差块进行变换得到含有变换系数的变换块(TU,Transform Unit),或者对残差块不进行变换,包括含有残差数据(残差信号)的残差块。本公开实施例中,系数包括对残差块进行变换得到的变换块的系数,或者残差块的系数,对系数进行熵编码包括对变换块的系数经量化后进行熵编码,或者,如果未将变换应用于残差数据,包括对残差块的系数经量化后进行熵编码。也可以将未经变换的残差信号和经变换的残差信号统称为系数(coefficient)。为进行有效的压缩,一般系数需进行量化处理,经量化后的系数也可以称为级别。其中,变换块TU包括:一个亮度变换块和两个色度(Cr,Cb)变换块。
量化通常被用于降低系数的动态范围,从而达到用更少的码字表达视频的目的。量化后的数值通常称为级别(level)。量化的操作通常是用系数除以量化步长,量化步长由在码流传递的量化因子决定。反量化则是通过级别乘以量化步长来完成。对于一个N×M大小的块,所有系数的量化可以独立的完成,这一技术被广泛地应用在很多国际视频压缩标准,例如H.265/HEVC、H.266/VVC等。特定的扫描顺序可以把一个二维的系数块变换成一维系数流。扫描顺序可以是Z型,水平,垂直或者其它任何一种顺序的扫描。在国际视频压缩标准中,量化操作可以利用系数间的相关性,利用已量化系数的特性来选择更优的量化方式,从而达到优化量化的目的。
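作为说明,标量量化与反量化的过程可以用如下Python片段示意(这仅是一个最小示例,取整方式与量化步长均为示例假设,并非任何标准的规范实现):

```python
def quantize(coeffs, qstep):
    # 量化:系数除以量化步长并取整,得到级别(level)
    return [int(round(c / qstep)) for c in coeffs]

def dequantize(levels, qstep):
    # 反量化:级别乘以量化步长,重建近似系数(存在量化损失)
    return [lv * qstep for lv in levels]

# 示例:量化步长取8
levels = quantize([37, -14, 3], 8)   # -> [5, -2, 0]
recon = dequantize(levels, 8)        # -> [40, -16, 0]
```

可以看到,重建系数与原系数并不完全相同,这就是正文所述的量化损失;量化步长越大,动态范围压缩越明显,损失也越大。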
可以看到,残差块通常要比原始图像简单很多,因而预测后确定残差,再进行编码可以显著提升压缩效率。对残差块也不是直接进行编码,而是通常先进行变换。变换是把残差图像从空间域变换到频率域,去除残差图像的相关性。残差图像变换到频率域以后,由于能量大多集中在低频区域,变换后的非零系数大多集中在左上角。变换后,接下来利用量化来进一步压缩。而且由于人眼对高频不敏感,高频区域可以使用更大的量化步长以进一步提升压缩效率。
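“变换后能量集中在低频”这一点,可以用纯Python实现的一维DCT-II做一个简单示意(仅为说明性示例,实际标准采用的是整数近似变换,信号取值也为假设):

```python
import math

def dct_1d(x):
    # 一维DCT-II:把空间域信号变换到频率域
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

residual = [10, 11, 12, 13, 13, 12, 11, 10]  # 一段平缓的残差信号
coeffs = dct_1d(residual)
# coeffs[0](直流/最低频分量)远大于其余高频分量,能量集中在低频
```

对这样的系数分布,后续量化时高频分量多数被量化为零,从而实现压缩。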
CTU可以进一步划分为多个CU,H.266/VVC采用了比H.265/HEVC更为复杂的编码单元划分结构 (QTMT,嵌套多类型树的四叉树),在HEVC四叉树(QT)划分的基础上增加了两种二叉树(BT)划分和两种三叉树(TT)划分。其中BT和TT统称为多类型树MT,如图4所示。CTU首先使用四叉树进行划分,然后四叉树的叶子结点可以进一步采用MT进行划分。具体地,CU划分的流程如图5所示。
其中,VVC的CTU默认尺寸是128*128,最小CU尺寸为4*4;CTU默认先用QT方式划分成4个子CU;一旦一个CU用了MT划分方式,则后续不能再进行QT划分。
理论上,QT节点可以按照图5中5种方式划分,MT节点按照图中4种方式划分。例如,一个CTU可能的QTMT划分结果如图6所示。
在编码器中,QTMT块划分位于帧内/帧间预测处理单元,根据不同的块划分寻找相应的参考块进行预测,找到率失真代价最小的划分模式(即本公开实施例中所述的进行块划分方式优选处理,确定对应的优选划分方式),从而得到最终的预测残差,进行下一步的变换和量化等流程,完成块编码。在解码器中,QTMT块划分位于帧内/帧间预测处理单元,根据块划分模式等预测信息确定当前CU的划分树后进一步完成后续解码步骤。
一些可实现的编码方案中,根据以下方法进行编码:
(1)输入图像被划分成不重叠的多个CU块(即最大CU块:CTU);
(2)按照光栅扫描顺序依次处理每个CU,先按照标准规定确定可能的划分方式,再依次尝试每一种块划分方式,取率失真代价最小的方式为最优划分方式;
(3)若当前CU可以继续划分,则对划分后的子CU重复(2)过程;否则,结束当前CU编码,并确定当前CU的最优块划分方式和最优预测模式,同时计算得到残差块,对残差块进行变换、量化、熵编码,对块划分模式等预测信息进行编码,输出码流等待传输。
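上述(1)~(3)的递归选优流程可以用如下Python片段概括(rd_cost、split均为假设的外部函数,这里只示意“不划分与划分代价比较”的骨架,并非标准实现):

```python
def best_partition(block, depth, max_depth, rd_cost, split):
    # 返回(最小率失真代价, 划分决策):递归比较“不划分”与“继续划分”
    cost_here = rd_cost(block)                     # 不再划分时的率失真代价
    if depth >= max_depth:                         # 达到最大深度,只能不划分
        return cost_here, 'no_split'
    children = split(block)                        # 例如QT划分为若干子块
    cost_split = sum(best_partition(c, depth + 1, max_depth, rd_cost, split)[0]
                     for c in children)
    if cost_here <= cost_split:
        return cost_here, 'no_split'
    return cost_split, 'split'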
相应地,一些可实现的解码方案中,根据以下方法进行解码:首先,输入码流进行熵解码,反量化,反变换,得到残差值;接着,根据残差块重建图像,CU重建过程主要包含以下3个步骤:
(a)根据块划分模式等预测信息确定当前CU的划分树。
(b)利用运动矢量等信息找到预测块。
(c)将当前CU的残差值和预测值叠加,得到重建CU。最后,将重建图像送入DBF/SAO/ALF滤波器,滤波后的图像送入缓冲区,等待视频播放。
可以看到,这些可实现的H.266/VVC编解码方案中,即使只考虑四叉树划分,也有4^5+4^4+4^3+4^2+4^1+4^0=1365种划分模式,远远超过H.265/HEVC的341种模式。加上还有二叉树、三叉树划分方式,理论上总的划分次数则有几千种。因此,这些可实现的QTMT技术方案会导致H.266/VVC的编码复杂度远超过H.265/HEVC。例如,编码一个高清视频序列(1080p)可能需要几天时间。
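文中四叉树划分模式数量的估算可以复核如下(示意性计算,按各划分深度的节点数求和):

```python
def qt_mode_count(max_extra_depth):
    # 仅考虑四叉树时的划分模式数量:4^0 + 4^1 + ... + 4^k
    return sum(4 ** k for k in range(max_extra_depth + 1))

print(qt_mode_count(5))  # H.266/VVC:1365
print(qt_mode_count(4))  # H.265/HEVC:341
```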
研究发现,对于文本类屏幕内容视频图像中的文字,不论是英文单词还是汉字,亦或者其它文字,一般都是从左往右书写而成的,因此对于文字的分割更大概率上是垂直分割。举个例子,若图像中包含单词OBBO,那么对这个单词区域做块划分时,更大概率是把字母O或者B分割出来做预测(帧间预测或者帧内块拷贝预测),才能找到相似的匹配块(如参考图像中包含单词WTO)。因此,在编码文本类内容的视频图像时,相关可实现的QTMT技术方案可能产生大量不必要的与水平分割相关的开销,浪费了计算资源,增加了编码时间。
相似地,针对从上往下的书写模式,对于文字的分割更大概率是水平分割,对应该情况,相关可实现的QTMT技术方案可能产生大量不必要的与垂直分割相关的开销。
本公开实施例提供一种视频编码方案,根据待编码数据块对应图像中的像素空间分布特征(也称为图像的纹理特征或纹理特征),在进行相关分块划分处理时,跳过不必要的分块方式,以减小编码开销,节约计算资源,缩短编码时间。
本公开实施例提供一种视频编码方案,如图7所示,包括:
步骤710,根据待编码数据块对应的纹理特征,确定所述待编码数据块对应的第一方向分块划分标识;
步骤720,在所述第一方向分块划分标识为第一取值的情况下,进行所述待编码数据块的块划分处理时不使用第一方向分块划分方式。
在本公开一实施例中,所述进行所述待编码数据块的块划分处理时不使用第一方向分块划分方式,包括:
针对所述待编码数据块进行可用划分方式选优处理时,所述可用划分方式中不包括第一方向分块划分方式。
需要说明的是,针对所述待编码数据块进行可用划分方式优选处理包括:采用至少一种可用划分方式对当前待编码数据块进行块划分尝试,选取率失真代价最小的方式为优选划分方式;或者,选取率失真代价小于设定率失真阈值的一个划分方式为优选划分方式。具体的优选处理过程根据相关方案实施,具体方面在本申请实施例中不详细讨论。
可以看到,相比于相关的可实现的块划分方案,本公开实施例提供的块划分方案,在相关的可实现的块划分方案中已确定的可用划分方式范围内,进一步考虑视频数据块对应的纹理特征,对于符合设定条件的视频数据块在进行多划分方式尝试时,跳过水平或垂直分块划分方式,以减小编码复杂度,提升编码效率。
在本公开一实施例中,所述第一取值为FALSE或者0;或者其他设定取值。
在本公开一实施例中,所述第一方向包括:水平方向或垂直方向。
在本公开一实施例中,所述第一方向包括:设定角度的倾斜方向。例如,垂直向左倾斜45度方向,垂直向右倾斜30度方向,等;不限于特定的角度。
在本公开一实施例中,所述纹理特征包括第一纹理特征值;
所述第一方向为水平方向的情况下,所述第一纹理特征值根据以下方式确定:
对所述待编码数据块进行水平方向像素聚合,获得单列像素向量;
计算所述单列像素向量中各分量的梯度;
确定所述单列像素向量的全部分量的梯度中的最大梯度为所述第一纹理特征值。
本公开一实施例中,所述第一方向为垂直方向的情况下,所述第一纹理特征值根据以下方式确定:
对所述待编码数据块进行垂直方向像素聚合,获得单行像素向量;
计算所述单行像素向量中各分量的梯度;
确定所述单行像素向量的全部分量的梯度中的最大梯度为所述第一纹理特征值。
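以上水平、垂直两个方向的第一纹理特征值计算可以用如下Python片段示意(此处以逐行/逐列求和作为“像素聚合”,是否归一化等细节为示例假设):

```python
def texture_feature(cu, direction):
    # cu 为二维像素列表,cu[y][x] 对应位置(x, y)处的像素值
    if direction == 'horizontal':
        line = [sum(row) for row in cu]        # 单列像素向量:line(y) = Σ_x CU(x, y)
    else:
        line = [sum(col) for col in zip(*cu)]  # 单行像素向量:line(x) = Σ_y CU(x, y)
    # 相邻分量梯度 G(i) = |line(i) - line(i+1)|,取最大值作为第一纹理特征值 MG
    grads = [abs(line[i] - line[i + 1]) for i in range(len(line) - 1)]
    return max(grads)
```

例如,一个上部平坦、下部亮度突变的块,其水平聚合后的单列向量会在突变行附近产生较大梯度,MG随之较大;反之,各行内容一致的块在垂直方向聚合后梯度接近零。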
在本公开一实施例中,所述第一方向分块划分标识根据以下方式确定:
在所述第一纹理特征值小于第一特征阈值的情况下,确定所述第一方向分块划分标识为第一取值。
相应地,所述第一方向分块划分标识包括:允许水平二叉树分块划分标识和允许水平三叉树分块划分标识;
或者,
所述第一方向分块划分标识包括:允许垂直二叉树分块划分标识和允许垂直三叉树分块划分标识。
在本公开一实施例中,所述允许水平二叉树分块划分标识为H.266/VVC规范中定义的allowSplitBtHor。
在本公开一实施例中,所述允许水平三叉树分块划分标识为H.266/VVC规范中定义的allowSplitTtHor。
在本公开一实施例中,所述允许垂直二叉树分块划分标识为H.266/VVC规范中定义的allowSplitBtVer。
在本公开一实施例中,所述允许垂直三叉树分块划分标识为H.266/VVC规范中定义的allowSplitTtVer。
可以理解,在本公开一实施例中,第一方向为水平方向时,在根据待编码视频数据块对应确定的第一纹理特征值小于第一特征阈值的情况下,表明该视频数据块对应的图像包含的单行的文本字符占满整个数据块的垂直方向,则无需对该视频数据块进行水平方向的划分;第一方向为垂直方向时,在根据待编码视频数据块对应确定的第一纹理特征值小于第一特征阈值的情况下,表明该视频数据块对应的图像包含的单列的文本字符占满整个数据块的水平方向,则无需对该视频数据块进行垂直方向的划分。
在本公开一实施例中,所述纹理特征包括:第二纹理特征值和第三纹理特征值;
所述第一方向为水平方向的情况下,所述第二纹理特征值和第三纹理特征值根据以下方式确定:
对所述待编码数据块进行水平方向像素聚合,获得单列像素向量;
计算所述单列像素向量中全部N个分量的梯度;
确定所述单列像素向量中前M个分量的梯度中的最大值为第二纹理特征值;
确定所述单列像素向量中后N-M个分量的梯度中的最大值为第三纹理特征值。
在本公开一实施例中,所述第一方向为垂直方向的情况下,所述第二纹理特征值和第三纹理特征值根据以下方式确定:
对所述待编码数据块进行垂直方向像素聚合,获得单行像素向量;
计算所述单行像素向量中全部N个分量的梯度;
确定所述单行像素向量中前M个分量的梯度中的最大值为第二纹理特征值;
确定所述单行像素向量中后N-M个分量的梯度中的最大值为第三纹理特征值。
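第二、第三纹理特征值的分段取最大过程可以用如下Python片段示意(前M个与后N−M个分量的梯度边界处理为示例假设):

```python
def split_texture_features(line, m=None):
    # line 为聚合得到的像素向量;m 默认取向量长度的一半
    n = len(line)
    m = n // 2 if m is None else m
    grads = [abs(line[i] - line[i + 1]) for i in range(n - 1)]
    mg1 = max(grads[:m])   # 第二纹理特征值:前M个分量的梯度最大值
    mg2 = max(grads[m:])   # 第三纹理特征值:其余分量的梯度最大值
    return mg1, mg2
```

这样,只有当块的前半段与后半段都不存在明显梯度突变(即两个特征值都小于阈值)时,才跳过第一方向的二叉树划分,比只用单一最大梯度的判断更精细。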
在本公开一实施例中,所述第一方向分块划分标识根据以下方式确定:
在所述第二纹理特征值小于第二特征阈值,且所述第三纹理特征值小于第二特征阈值的情况下,确定所述第一方向分块划分标识为第一取值。
相应地,所述第一方向分块划分标识包括:允许水平二叉树分块划分标识;
或者,
所述第一方向分块划分标识包括:允许垂直二叉树分块划分标识。
在本公开一实施例中,所述允许水平二叉树分块划分标识为H.266/VVC规范中定义的allowSplitBtHor。
在本公开一实施例中,所述允许垂直二叉树分块划分标识为H.266/VVC规范中定义的allowSplitBtVer。
需要说明的是,以第一取值为FALSE,第一方向为水平方向为例,本公开实施例中记载了第一方向分块划分标识确定为FALSE的方案;第一方向分块划分标识为FALSE,则指示进行所述待编码数据块的块划分处理时不使用水平分块划分方式。而对于不满足上述判断规则,未确定为FALSE的情况下,所述待编码数据块是否被确定其第一方向分块划分标识为TRUE还需要根据相关方案中其他约束进行判断,最终的判断结果可以是FALSE,也可以是TRUE。以第一取值为FALSE,第一方向为垂直方向为例,本公开实施例中记载了第一方向分块划分标识确定为FALSE的方案;第一方向分块划分标识为FALSE,则指示进行所述待编码数据块的块划分处理时不使用垂直分块划分方式。而对于不满足上述判断规则,未确定为FALSE的情况下,所述待编码数据块是否被确定其第一方向分块划分标识为TRUE还需要根据相关方案中其他约束进行判断,最终的判断结果可以是FALSE,也可以是TRUE。相关方案中其他约束的具体判断在此不详细示例。对于第一方向为其他方向的情况,也适用类似的判断方案,在此不详细示例。
在本公开一实施例中,所述单列像素向量Line中第y个分量line(y)根据以下公式计算:
line(y)=∑_{x=0}^{CUWidth-1}CU(x,y)
其中,CU(x,y)为所述待编码视频数据块在位置(x,y)处的像素值,CUWidth为所述待编码视频数据块的宽度。
在本公开一实施例中,所述单列像素向量中各分量的梯度根据以下公式计算:
G(y)=|line(y)-line(y+1)|
其中,G(y)为所述单列像素向量中第y个分量的梯度。
在本公开一实施例中,根据单列向量中各分量的最大梯度值确定的所述第一纹理特征值MG:
MG=maximum(G)
例如,对图8所示上下两个视频数据块分别进行水平像素聚合后,再计算单列向量的各分量的梯度,进而确定各自的最大梯度值MG为所述第一纹理特征值。其中,图8中上方视频数据块的第一纹理特征值小于第一特征阈值(80),则对该视频数据块进行分块划分优选处理时,跳过水平二叉树和水平三叉树分块划分尝试;图8中下方视频数据块的第一纹理特征值大于第一特征阈值(80),则根据已有方案进行分块划分尝试。
在本公开一实施例中,所述第一特征阈值为80;则在所述待编码数据块的第一纹理特征值MG<第一特征阈值(80)的情况下,确定所述第一方向分块划分标识为FALSE。在本公开一实施例中,进行多划分方式尝试选优处理时,跳过水平二叉树分块和水平三叉树分块方式。
或者,根据文本字符的特点,设定第一特征阈值为其他数值,不限于上述示例。
在本公开一实施例中,根据H.266/VVC规范,在允许的二叉树分块处理(Allowed binary split process)中确定允许水平二叉树分块标识(allowSplitBtHor)的判断条件中增加:
如果MG<第一特征阈值,则允许二叉树分块标识(allowBtSplit)为FALSE。
在本公开一实施例中,根据H.266/VVC规范,在允许的三叉树分块处理(Allowed ternary split process)中确定允许水平三叉树分块标识(allowSplitTtHor)的判断条件中增加:
如果MG<第一特征阈值,则允许三叉树分块标识(allowTtSplit)为FALSE。
在本公开一实施例中,根据H.266/VVC规范,在允许的二叉树分块处理(Allowed binary split process)中确定允许垂直二叉树分块标识(allowSplitBtVer)的判断条件中增加:
如果MG<第一特征阈值,则允许二叉树分块标识(allowBtSplit)为FALSE。
在本公开一实施例中,根据H.266/VVC规范,在允许的三叉树分块处理(Allowed ternary split process)中确定允许垂直三叉树分块标识(allowSplitTtVer)的判断条件中增加:
如果MG<第一特征阈值,则允许三叉树分块标识(allowTtSplit)为FALSE。
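上述在规范判断条件中增加纹理条件的做法,可以概括为如下示意逻辑(base_conditions_ok代表H.266/VVC规范中其余全部判断条件的综合结果,为假设的输入,并非规范中的真实变量):

```python
def allow_directional_split(mg, base_conditions_ok, threshold=80):
    # 在规范原有判断之上增加的纹理条件:
    # MG 小于阈值时直接禁止该方向的二叉树/三叉树划分(示意)
    if mg < threshold:
        return False    # 相当于将 allowBtSplit / allowTtSplit 置为 FALSE
    return base_conditions_ok
```

注意该条件只能把划分标识“判否”,MG不小于阈值时结果仍取决于规范中其余条件,这与下文的说明一致。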
需要说明的是,第一方向为水平方向,第一取值为FALSE的情况下,上述实施例记载了基于所述待编码视频数据块对应图像的第一纹理特征值,确定第一方向分块划分标识为FALSE的情况;第一方向分块划分标识为FALSE,则指示进行所述待编码数据块的块划分处理时不使用水平分块划分方式。并不表明,如果MG>=第一特征阈值,相应的允许水平二叉树分块标识(allowSplitBtHor)或允许水平三叉树分块标识(allowSplitTtHor)就为TRUE。根据H.266/VVC规范,在允许的二叉树分块处理(Allowed binary split process)中确定允许二叉树分块标识(allowBtSplit)判断条件中还包括其他条件,全部条件中任一项或多项满足(为TRUE),允许二叉树分块标识(allowBtSplit)都为FALSE,相应地,允许水平二叉树分块标识(allowSplitBtHor)都为FALSE;全部条件都不满足时,允许二叉树分块标识(allowBtSplit)才为TRUE,相应地,允许水平二叉树分块标识(allowSplitBtHor)才为TRUE。允许水平三叉树分块标识(allowSplitTtHor)和允许三叉树分块标识(allowTtSplit)也根据相似的方式确定,在此不详细讨论。
垂直方向的相关标识也根据相似方式确定,在此不详细讨论。
一些可实现的实施例中,其他判断条件可以参考H.266/VVC规范中6.4.2和6.4.3章节内容。
在本公开一实施例中,所述单行像素向量Line中第x个分量line(x)根据以下公式计算:
line(x)=∑_{y=0}^{CUHeight-1}CU(x,y)
其中,CU(x,y)为所述待编码视频数据块在位置(x,y)处的像素值,CUHeight为所述待编码视频数据块的高度。
所述单行像素向量中各分量的梯度根据以下公式计算:
G(x)=|line(x)-line(x+1)|
其中,G(x)为所述单行像素向量中第x个分量的梯度。
在本公开一实施例中,根据单行向量中各分量的最大梯度值确定的所述第一纹理特征值MG:
MG=maximum(G)。
在本公开一实施例中,根据单列向量中各分量的梯度值确定第二纹理特征值MG1和第三纹理特征值MG2如下:
MG1=max(G(0),G(1),…,G(M-1))
MG2=max(G(M),G(M+1),…,G(N-2))
在本公开一实施例中,N=CUHeight为所述待编码视频数据块的高度,M=CUHeight/2。相应地,
MG1=max(G(0),G(1),…,G(CUHeight/2-1))
MG2=max(G(CUHeight/2),G(CUHeight/2+1),…,G(CUHeight-2))
在本公开一实施例中,N=CUWidth为所述待编码视频数据块的宽度,M=CUWidth/2。相应地,
MG1=max(G(0),G(1),…,G(CUWidth/2-1))
MG2=max(G(CUWidth/2),G(CUWidth/2+1),…,G(CUWidth-2))
可以看到,根据第二纹理特征值和第三纹理特征值来确定所述第一方向分块划分标识为第一取值的方案,能够更精确地筛选出无需进行第一方向二叉树划分的待编码数据块。
在本公开一实施例中,所述第二特征阈值为80;则在所述待编码数据块的第二纹理特征值MG1<第二特征阈值(80),且第三纹理特征值MG2<第二特征阈值(80)的情况下,确定所述第一方向分块划分标识为第一取值。在本公开一实施例中,进行多划分方式尝试选优处理时,跳过第一方向二叉树分块。
或者,根据文本字符的特点,设定第二特征阈值为其他数值,不限于上述示例。第一特征阈值与第二特征阈值独立设置,可以相同,也可以不相同。
在本公开一实施例中,根据H.266/VVC规范,在允许的二叉树分块处理(Allowed binary split process)中确定允许水平二叉树分块标识(allowSplitBtHor)的判断条件中增加:
如果MG1<第二特征阈值,且MG2<第二特征阈值,则允许二叉树分块标识(allowBtSplit)为FALSE。
在本公开一实施例中,根据H.266/VVC规范,在允许的三叉树分块处理(Allowed ternary split process)中确定允许水平三叉树分块标识(allowSplitTtHor)的判断条件中增加:
如果MG1<第二特征阈值,且MG2<第二特征阈值,则允许三叉树分块标识(allowTtSplit)为FALSE。
在本公开一实施例中,根据H.266/VVC规范,在允许的二叉树分块处理(Allowed binary split process)中确定允许垂直二叉树分块标识(allowSplitBtVer)的判断条件中增加:
如果MG1<第二特征阈值,且MG2<第二特征阈值,则允许二叉树分块标识(allowBtSplit)为FALSE。
在本公开一实施例中,根据H.266/VVC规范,在允许的三叉树分块处理(Allowed ternary split process)中确定允许垂直三叉树分块标识(allowSplitTtVer)的判断条件中增加:
如果MG1<第二特征阈值,且MG2<第二特征阈值,则允许三叉树分块标识(allowTtSplit)为FALSE。
需要说明的是,第一方向为水平方向,第一取值为FALSE的情况下,上述实施例记载了基于所述待编码数据块对应第二纹理特征值和第三纹理特征值,确定第一方向分块划分标识为FALSE的情况;第一方向分块划分标识为FALSE,则指示进行所述待编码数据块的块划分处理时不使用水平分块划分方式。并不表明,如果MG1>=第二特征阈值,或MG2>=第二特征阈值,相应的允许水平二叉树分块标识(allowSplitBtHor)或允许水平三叉树分块标识(allowSplitTtHor)就为TRUE。根据H.266/VVC规范,在允许的二叉树分块处理(Allowed binary split process)中确定允许二叉树分块标识(allowBtSplit)判断条件中还包括其他条件,全部条件中任一项或多项满足(为TRUE),允许二叉树分块标识(allowBtSplit)都为FALSE,相应地,允许水平二叉树分块标识(allowSplitBtHor)都为FALSE;全部条件都不满足时,允许二叉树分块标识(allowBtSplit)才为TRUE,相应地,允许水平二叉树分块标识(allowSplitBtHor)才为TRUE。允许水平三叉树分块标识(allowSplitTtHor)和允许三叉树分块标识(allowTtSplit)也根据相似的方式确定,在此不详细讨论。
垂直方向的相关标识也根据相似方式确定,在此不详细讨论。
一些可实现的实施例中,其他判断条件可以参考H.266/VVC规范中6.4.2和6.4.3章节内容。
本公开实施例提供一种视频编码方法,如图9所示,包括:
步骤910,根据步骤710和720所述的方法,对所述待编码数据块进行可用划分方式优选处理,确定所述待编码数据块的当前块划分深度对应的优选划分方式,并判断是否继续下一深度的划分;
步骤920,在确定不继续下一深度的划分的情况下,按照所确定的优选划分方式对所述待编码数据块进行划分后,完成所述待编码数据块的后续编码步骤。
在本公开一实施例中,如图10所示,所述视频编码方法还包括:
步骤930,在确定继续下一深度的划分的情况下,按照所确定的优选划分方式对所述待编码数据块进行划分后,对划分得到的各待编码数据子块,依次进行下一深度的块划分和编码步骤。
可以理解,步骤910中根据本公开任一实施例所述编码方法确定所述第一方向分块划分标识后,在所述第一方向分块划分标识为第一取值的情况下,针对所述待编码数据块进行可用划分方式选优处理时,不使用第一方向分块划分方式;即针对所述待编码数据块进行可用划分方式选优处理时,所述可用划分方式中不包括第一方向分块划分方式。
在本公开一实施例中,步骤910包括:
根据待编码数据块对应的纹理特征,确定所述待编码数据块对应的第一方向分块划分标识;
在所述第一方向分块划分标识为第一取值的情况下,针对所述待编码数据块进行可用划分方式选优处理时,所述可用划分方式中不包括第一方向分块划分方式;
对所述待编码数据块进行可用划分方式优选处理后,得到至少一个可用划分方式对应的划分评价结果;
从所述至少一个可用划分方式对应的划分评价结果中,选择一个划分方式为所述待编码视频数据块的当前块划分深度对应的优选划分方式。
可以理解的是,在确定需要继续下一深度的划分时,根据所确定的优选划分方式对当前视频数据块进行划分,得到多个数据子块,针对每一个数据子块再次从步骤910开始执行。可以看到,各数据子块再次从步骤910开始执行时,其对应的块划分深度加1。
需要说明的是,判断是否继续下一深度的划分根据相关可实现方案执行,具体方面在本申请实施例中不进一步讨论。
针对所述待编码数据块进行可用划分方式优选处理,确定当前块划分深度对应的优选划分方式包括:采用至少一种可用划分方式对当前待编码数据块进行块划分尝试,选取率失真代价最小的方式为优选划分方式;或者,选取率失真代价小于设定率失真阈值的一个划分方式为优选划分方式。具体的优选处理过程根据相关方案实施,具体方面在本申请实施例中不详细讨论。
一些可实现的实施例中,采用本公开实施例提供的编码方法,在VVC参考软件VTM11.0上实现后,在帧内模式下对文本类屏幕内容视频(WebBrowsing,WordEditing,ChineseDocumentEditing等)进行测试,对于文本类屏幕内容视频自适应跳过水平二叉树划分和水平三叉树划分,编码平均性能损失0.71%,编码时间平均下降12.6%。数据表明本公开实施例方案可有效节省编码时间,且几乎不降低编码性能。即对于文本类屏幕内容视频,本技术可以在保持和相关技术基本相当的编码性能的情况下显著减少编码复杂度。
可以理解,本公开实施例提供的视频编码方案,对视频数据块对应图像中像素的空间分布进行分析,来判断其包含的纹理特征是否满足无需进行第一方向分块划分处理的条件。例如,单行单词占满整个块的竖直方向,则表明进行水平划分后,各水平子块只是单词的部分,找到相似的匹配块的可能性较小,因此,确定该类数据块满足不进行水平分块的条件,针对该数据块进行分块划分相关处理时,不需要进行水平方向的划分。例如,在进行可用划分方式选优处理时,所述可用划分方式中不包括水平分块划分方式,即跳过水平分块划分方式的尝试。由此可以节约计算资源,缩短编码时间。
需要说明的是,本公开实施例中示例了根据MG和第一特征阈值的比较方案,还示例了根据MG1,MG2和第二特征阈值的比较方案,以及MG,MG1,MG2的计算方案。根据这些示例,还可以采用其他等效变形的判断方案,来确定所述视频数据块对应图像中像素的空间分布不适合采用第一方向划分,进而在后续分块处理中跳过这些不适合的分块划分尝试。
本公开一实施例还提供了一种视频编码设备,如图11所示,包括处理器以及存储有可在所述处理器上运行的计算机程序的存储器,其中,所述处理器执行所述计算机程序时实现如本公开任一实施例所述的视频编码方法。
本公开一实施例还提供了一种非瞬态计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,其中,所述计算机程序被处理器执行时实现如本公开任一实施例所述的视频编码方法。
本公开一实施例还提供了一种码流,其中,所述码流根据如本公开任一实施例所述的视频编码方法生成。
可以看到,本公开实施例提供的块划分以及编解码方法,通过对待编码数据块图像的像素空间分布进行分析,确定对应的纹理特征,根据纹理特征,在进行块划分相关处理时,跳过不适合的分块划分方式,可以有效减小编码端计算量,缩短编码时间。
在一或多个示例性实施例中,所描述的功能可以硬件、软件、固件或其任一组合来实施。如果以软件实施,那么功能可作为一个或多个指令或代码存储在计算机可读介质上或经由计算机可读介质传输,且由基于硬件的处理单元执行。计算机可读介质可包含对应于例如数据存储介质等有形介质的计算机可读存储介质,或包含促进计算机程序例如根据通信协议从一处传送到另一处的任何介质的通信介质。以此方式,计算机可读介质通常可对应于非暂时性的有形计算机可读存储介质或例如信号或载波等通信介质。数据存储介质可为可由一或多个计算机或者一或多个处理器存取以检索用于实施本公开中描述的技术的指令、代码和/或数据结构的任何可用介质。计算机程序产品可包含计算机可读介质。
举例来说且并非限制,此类计算机可读存储介质可包括RAM、ROM、EEPROM、CD-ROM或其它光盘存储装置、磁盘存储装置或其它磁性存储装置、快闪存储器或可用来以指令或数据结构的形式存储所要程序代码且可由计算机存取的任何其它介质。而且,还可以将任何连接称作计算机可读介质。举例来说,如果使用同轴电缆、光纤电缆、双绞线、数字订户线(DSL)或例如红外线、无线电及微波等无线技术从网站、服务器或其它远程源传输指令,则同轴电缆、光纤电缆、双绞线、DSL或例如红外线、无线电及微波等无线技术包含于介质的定义中。然而应了解,计算机可读存储介质和数据存储介质不包含连接、载波、信号或其它瞬时(瞬态)介质,而是针对非瞬时有形存储介质。如本文中所使用,磁盘及光盘包含压缩光盘(CD)、激光光盘、光学光盘、数字多功能光盘(DVD)、软磁盘或蓝光光盘等,其中磁盘通常以磁性方式再生数据,而光盘使用激光以光学方式再生数据。上文的组合也应包含在计算机可读介质的范围内。
可由例如一或多个数字信号处理器(DSP)、通用微处理器、专用集成电路(ASIC)、现场可编程逻辑阵列(FPGA)或其它等效集成或离散逻辑电路等一或多个处理器来执行指令。因此,如本文中所使用的术语“处理器”可指上述结构或适合于实施本文中所描述的技术的任一其它结构中的任一者。另外,在一些方面中,本文描述的功能性可提供于经配置以用于编码和解码的专用硬件和/或软件模块内,或并入在组合式编解码器中。并且,可将所述技术完全实施于一个或多个电路或逻辑元件中。
本公开实施例的技术方案可在广泛多种装置或设备中实施,包含无线手机、集成电路(IC)或一组IC(例如,芯片组)。本公开实施例中描述各种组件、模块或单元以强调经配置以执行所描述的技术的装置的功能方面,但不一定需要通过不同硬件单元来实现。而是,如上所述,各种单元可在编解码器硬件单元中组合或由互操作硬件单元(包含如上所述的一个或多个处理器)的集合结合合适软件和/或固件来提供。
本领域普通技术人员可以理解,上文中所公开方法中的全部或某些步骤、系统、装置中的功能模块/单元可以被实施为软件、固件、硬件及其适当的组合。在硬件实施方式中,在以上描述中提及的功能模块/单元之间的划分不一定对应于物理组件的划分;例如,一个物理组件可以具有多个功能,或者一个功能或步骤可以由若干物理组件合作执行。某些组件或所有组件可以被实施为由处理器,如数字信号处理器或微处理器执行的软件,或者被实施为硬件,或者被实施为集成电路,如专用集成电路。这样的软件可以分布在计算机可读介质上,计算机可读介质可以包括计算机存储介质(或非暂时性介质)和通信介质(或暂时性介质)。如本领域普通技术人员公知的,术语计算机存储介质包括在用于存储信息(诸如计算机可读指令、数据结构、程序模块或其他数据)的任何方法或技术中实施的易失性和非易失性、可移除和不可移除介质。计算机存储介质包括但不限于RAM、ROM、EEPROM、闪存或其他存储器技术、CD-ROM、数字多功能盘(DVD)或其他光盘存储、磁盒、磁带、磁盘存储或其他磁存储装置、或者可以用于存储期望的信息并且可以被计算机访问的任何其他的介质。此外,本领域普通技术人员公知的是,通信介质通常包含计算机可读指令、数据结构、程序模块或者诸如载波或其他传输机制之类的调制数据信号中的其他数据,并且可包括任何信息递送介质。

Claims (14)

  1. 一种视频编码方法,其特征在于,包括:
    根据待编码数据块对应的纹理特征,确定所述待编码数据块对应的第一方向分块划分标识;
    在所述第一方向分块划分标识为第一取值的情况下,进行所述待编码数据块的块划分处理时不使用第一方向分块划分方式。
  2. 如权利要求1所述的视频编码方法,其特征在于,
    所述进行所述待编码数据块的块划分处理时不使用第一方向分块划分方式,包括:
    针对所述待编码数据块进行可用划分方式选优处理时,所述可用划分方式中不包括第一方向分块划分方式。
  3. 如权利要求1或2所述的视频编码方法,其特征在于,
    所述第一方向包括:水平方向或垂直方向。
  4. 如权利要求1所述的视频编码方法,其特征在于,
    所述纹理特征包括第一纹理特征值;
    所述第一方向为水平方向的情况下,所述第一纹理特征值根据以下方式确定:
    对所述待编码数据块进行水平方向像素聚合,获得单列像素向量;
    计算所述单列像素向量中各分量的梯度;
    确定所述单列像素向量的全部分量的梯度中的最大梯度为所述第一纹理特征值;
    或者,
    所述第一方向为垂直方向的情况下,所述第一纹理特征值根据以下方式确定:
    对所述待编码数据块进行垂直方向像素聚合,获得单行像素向量;
    计算所述单行像素向量中各分量的梯度;
    确定所述单行像素向量的全部分量的梯度中的最大梯度为所述第一纹理特征值。
  5. 如权利要求4所述的视频编码方法,其特征在于,
    所述第一方向分块划分标识根据以下方式确定:
    在所述第一纹理特征值小于第一特征阈值的情况下,确定所述第一方向分块划分标识为第一取值。
  6. 如权利要求4所述的视频编码方法,其特征在于,
    所述第一方向分块划分标识包括:允许水平二叉树分块划分标识和允许水平三叉树分块划分标识;
    或者,
    所述第一方向分块划分标识包括:允许垂直二叉树分块划分标识和允许垂直三叉树分块划分标识。
  7. 如权利要求1所述的视频编码方法,其特征在于,
    所述纹理特征包括:第二纹理特征值和第三纹理特征值;
    所述第一方向为水平方向的情况下,所述第二纹理特征值和第三纹理特征值根据以下方式确定:
    对所述待编码数据块进行水平方向像素聚合,获得单列像素向量;
    计算所述单列像素向量中全部N个分量的梯度;
    确定所述单列像素向量中前M个分量的梯度中的最大值为第二纹理特征值;
    确定所述单列像素向量中后N-M个分量的梯度中的最大值为第三纹理特征值;
    或者,
    所述第一方向为垂直方向的情况下,所述第二纹理特征值和第三纹理特征值根据以下方式确定:
    对所述待编码数据块进行垂直方向像素聚合,获得单行像素向量;
    计算所述单行像素向量中全部N个分量的梯度;
    确定所述单行像素向量中前M个分量的梯度中的最大值为第二纹理特征值;
    确定所述单行像素向量中后N-M个分量的梯度中的最大值为第三纹理特征值。
  8. 如权利要求7所述的视频编码方法,其特征在于,
    所述第一方向分块划分标识根据以下方式确定:
    在所述第二纹理特征值小于第二特征阈值,且所述第三纹理特征值小于第二特征阈值的情况下,确定所述第一方向分块划分标识为第一取值。
  9. 如权利要求7所述的视频编码方法,其特征在于,
    所述第一方向分块划分标识包括:允许水平二叉树分块划分标识;
    或者,
    所述第一方向分块划分标识包括:允许垂直二叉树分块划分标识。
  10. 如权利要求4或7所述的视频编码方法,其特征在于,
    所述单列像素向量Line中第y个分量line(y)根据以下公式计算:
    line(y)=∑_{x=0}^{CUWidth-1}CU(x,y)
    其中,CU(x,y)为所述待编码数据块在位置(x,y)处的像素值,CUWidth为所述待编码数据块的宽度;
    或者,
    所述单行像素向量Line中第x个分量line(x)根据以下公式计算:
    line(x)=∑_{y=0}^{CUHeight-1}CU(x,y)
    其中,CU(x,y)为所述待编码数据块在位置(x,y)处的像素值,CUHeight为所述待编码数据块的高度。
  11. 如权利要求10所述的视频编码方法,其特征在于,
    所述单列像素向量中各分量的梯度根据以下公式计算:
    G(y)=|line(y)-line(y+1)|
    其中,G(y)为所述单列像素向量中第y个分量的梯度;
    或者,
    所述单行像素向量中各分量的梯度根据以下公式计算:
    G(x)=|line(x)-line(x+1)|
    其中,G(x)为所述单行像素向量中第x个分量的梯度。
  12. 一种视频编码设备,其特征在于,包括处理器以及存储有可在所述处理器上运行的计算机程序的存储器,其中,所述处理器执行所述计算机程序时实现如权利要求1至11中任一项所述的视频编码方法。
  13. 一种非瞬态计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,其中,所述计算机程序被处理器执行时实现如权利要求1至11中任一项所述的视频编码方法。
  14. 一种码流,其中,所述码流根据如权利要求1至11中任一项所述的视频编码方法生成。
PCT/CN2022/106532 2022-07-19 2022-07-19 一种视频编码方法、设备、存储介质及码流 WO2024016171A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/106532 WO2024016171A1 (zh) 2022-07-19 2022-07-19 一种视频编码方法、设备、存储介质及码流

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/106532 WO2024016171A1 (zh) 2022-07-19 2022-07-19 一种视频编码方法、设备、存储介质及码流

Publications (1)

Publication Number Publication Date
WO2024016171A1 true WO2024016171A1 (zh) 2024-01-25

Family

ID=89616748

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/106532 WO2024016171A1 (zh) 2022-07-19 2022-07-19 一种视频编码方法、设备、存储介质及码流

Country Status (1)

Country Link
WO (1) WO2024016171A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018155983A1 (ko) * 2017-02-24 2018-08-30 주식회사 케이티 비디오 신호 처리 방법 및 장치
CN111147867A (zh) * 2019-12-18 2020-05-12 重庆邮电大学 一种多功能视频编码cu划分快速决策方法及存储介质
CN111818332A (zh) * 2020-06-09 2020-10-23 复旦大学 一种适用于vvc标准的帧内预测划分判决的快速算法
CN112104868A (zh) * 2020-11-05 2020-12-18 电子科技大学 一种针对vvc帧内编码单元划分的快速决策方法
US20210136369A1 (en) * 2017-05-26 2021-05-06 Sk Telecom Co., Ltd. Apparatus and method for video encoding or decoding supporting various block sizes
CN113691811A (zh) * 2021-07-30 2021-11-23 浙江大华技术股份有限公司 编码块划分方法、装置、系统及存储介质



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22951449

Country of ref document: EP

Kind code of ref document: A1