CN116600107B

CN116600107B - HEVC-SCC quick coding method and device based on IPMS-CNN and spatial neighboring CU coding modes

Info

Publication number: CN116600107B
Application number: CN202310893891.7A
Authority: CN
Inventors: 陈婧; 李志鸿; 曾焕强; 林琦; 朱建清
Original assignee: Huaqiao University
Current assignee: Huaqiao University
Priority date: 2023-07-20
Filing date: 2023-07-20
Publication date: 2023-11-21
Anticipated expiration: 2043-07-20
Also published as: CN116600107A

Abstract

The application discloses an HEVC-SCC quick coding method and device based on IPMS-CNN and space domain adjacent CU coding modes, which combines a method for predicting a large-size CU mode by a convolutional neural network with a method for predicting a small-size CU mode based on the number of modes adopted by the space adjacent CU, aims at reducing coding time while maintaining coding quality and reducing calculation complexity, and comprises the steps of firstly constructing a database and training a convolutional neural network model selected by an IBC/PLT mode; secondly, the input CTU passes through a mode selection network, and a mode prediction label of the CTU is output; finally, the mode selected by the current CU is predicted by counting the number of modes used by the neighboring 3 CUs. The application can save the encoding time and reduce the calculation complexity of the screen content video.

Description

HEVC-SCC fast coding method based on IPMS-CNN and spatial adjacent CU coding mode methods and devices

技术领域Technical field

本发明涉及视频编码领域，特别涉及一种基于IPMS-CNN和空域相邻CU编码模式的HEVC-SCC快速编码方法及装置。The present invention relates to the field of video coding, and in particular to a HEVC-SCC fast coding method and device based on IPMS-CNN and spatial adjacent CU coding mode.

背景技术Background technique

近年来，随着计算机视觉、多媒体技术和人机交互等领域的快速发展，屏幕共享、无线显示、远程教育等屏幕内容视频（Screen Content Video，SCV）的应用程序不断涌现，对传统处理自然视频的视频编码方法提出了巨大的挑战。传统处理自然视频的标准，例如高效视频编码（High Effcient Video Coding，HEVC），是专门为压缩相机拍摄采集的自然视频内容而制定的。而屏幕内容视频主要是由计算机生成的，它通常具有大面积的均匀平面、重复的图案和字符、颜色种类有限但具有高饱和度、图像对比度高、有锐利的边缘等特点。如果仍用传统的视频编码标准来处理屏幕内容视频，压缩效果往往不佳。因此为了利用屏幕内容视频的这些特殊特征，联合视频编码组在HEVC的基础上开发了屏幕内容编码（Screen Content Coding，SCC）标准：HEVC-SCC。该标准增加了四种新的模式：帧内块复制（Intra Block Copy，IBC，调色板模式（Palette Mode，PLT），自适应颜色变换（AdaptiveColor Transform，ACT），自适应运动矢量分辨率（Adaptive Motion Vector Resolution，AMVR）。In recent years, with the rapid development of computer vision, multimedia technology and human-computer interaction, screen content video (SCV) applications such as screen sharing, wireless display, and distance education have continued to emerge, which has challenged the traditional processing of natural videos. Video coding methods present huge challenges. Traditional standards for processing natural videos, such as High Effcient Video Coding (HEVC), are specifically developed to compress natural video content captured by cameras. On-screen content video is mainly computer-generated, and it usually features a large area of uniform plane, repeated patterns and characters, a limited variety of colors but high saturation, high image contrast, and sharp edges. If traditional video encoding standards are still used to process screen content videos, the compression effect is often poor. Therefore, in order to take advantage of these special characteristics of screen content videos, the Joint Video Coding Group developed a Screen Content Coding (SCC) standard based on HEVC: HEVC-SCC. The standard adds four new modes: Intra Block Copy (IBC), Palette Mode (PLT), Adaptive Color Transform (ACT), Adaptive Motion Vector Resolution ( Adaptive Motion Vector Resolution (AMVR).

在这四种模式，IBC和PLT是提高压缩性能的两种主要模式。IBC模式有助于在同一帧内编码重复的模式，而PLT模式的目标是用一些有限的主要颜色进行编码。尽管这两种工具的加入能显著提高SCC的编码性能，但是其编码复杂度也显著增加。Among these four modes, IBC and PLT are the two main modes that improve compression performance. IBC mode helps to encode repeated patterns within the same frame, while PLT mode aims to encode with a limited number of primary colors. Although the addition of these two tools can significantly improve the coding performance of SCC, its coding complexity also increases significantly.

发明内容Contents of the invention

本发明的主要目的在于克服现有技术中的上述缺陷，提出一种基于IPMS-CNN和空域相邻CU编码模式的HEVC-SCC快速编码方法及装置，在保证主观质量的同时，能够节省编码时间，降低屏幕内容视频的计算复杂度，加速HEVC-SCC的编码过程。The main purpose of the present invention is to overcome the above-mentioned defects in the existing technology and propose a HEVC-SCC fast encoding method and device based on IPMS-CNN and spatial adjacent CU encoding mode, which can save encoding time while ensuring subjective quality. , reduce the computational complexity of screen content video and accelerate the encoding process of HEVC-SCC.

本发明采用如下技术方案：The present invention adopts the following technical solutions:

一方面，一种基于IPMS-CNN和空域相邻CU编码模式的HEVC-SCC快速编码方法，包括：On the one hand, a HEVC-SCC fast coding method based on IPMS-CNN and spatial adjacent CU coding mode includes:

数据集制作步骤，建立不同分辨率的视频序列数据集并编码，获取不同量化参数下HEVC-SCC的各个CU是否使用IBC/PLT模式的真实标签；In the data set production step, video sequence data sets of different resolutions are established and encoded to obtain the real labels of whether each CU of HEVC-SCC uses IBC/PLT mode under different quantization parameters;

网络模型构建步骤，构建包括输入层、特征提取层和输出表达层的网络模型IPMS-CNN；其中，特征提取层中搭建三个卷积层提取三种特征图，加上下采样后得到的特征图，共提取四种不同尺寸的特征图；The network model construction step is to build a network model IPMS-CNN including an input layer, a feature extraction layer and an output expression layer. Among them, three convolutional layers are built in the feature extraction layer to extract three types of feature maps, plus the feature maps obtained after downsampling. , a total of four feature maps of different sizes are extracted;

网络模型训练步骤，基于制作的数据集，对构建的网络模型进行训练，获得训练好的IBC/PLT模式选择卷积神经网络IPMS-CNN模型；The network model training step is to train the constructed network model based on the produced data set to obtain the trained IBC/PLT mode selection convolutional neural network IPMS-CNN model;

网络模型预测步骤，将LCU输入到训练好的IPMS-CNN，获得模式预测标签，以预测出CTU的模式选择；In the network model prediction step, the LCU is input into the trained IPMS-CNN to obtain the mode prediction label to predict the mode selection of the CTU;

当前CU模式预测步骤，计算相邻3个CU所使用的IBC/PLT模式数量和相邻3个CU所使用的Intra模式数量，根据两个数量关系联合预测当前8×8CU 模式； Current CU mode prediction step, calculates the number of IBC/PLT modes used by three adjacent CUs and the number of Intra modes used by three adjacent CUs , jointly predict the current 8×8CU mode based on the two quantitative relationships;

编码步骤，编码器基于网络模型预测步骤调用预测标签，与当前CU模式预测步骤一起预测CU划分结果。In the encoding step, the encoder calls the prediction label based on the network model prediction step, and predicts the CU division result together with the current CU mode prediction step.

优选的，所述数据集制作步骤，具体包括：Preferably, the data set creation steps specifically include:

自制三种不同分辨率的视频序列数据集，数据集涵盖图片数据集和视频数据集，包含TGM/M、A、CC三种类型视频序列；Self-made video sequence data sets of three different resolutions. The data sets cover image data sets and video data sets, including three types of video sequences: TGM/M, A, and CC;

接着通过标准编码软件平台进行编码，在全帧内配置下，设置不同量化参数QP下的各个CU的IBC/PLT模式的模式标签。Then encode through the standard encoding software platform, and set the mode tag of the IBC/PLT mode of each CU under different quantization parameters QP under the full intra-frame configuration.

优选的，所述的数据集包括：训练集、验证集和测试集；所述训练集、验证集和测试集中的每一个集均包含三个子集；第一个子集的分辨率为1024×576，第二个子集的分辨率为1792×1024，第三个子集为2304×1280。Preferably, the data set includes: a training set, a verification set and a test set; each of the training set, verification set and test set contains three subsets; the resolution of the first subset is 1024× 576, the second subset has a resolution of 1792×1024, and the third subset has a resolution of 2304×1280.

优选的，所述量化参数包括四个量化等级，分别为22、27、32和37。Preferably, the quantization parameters include four quantization levels, which are 22, 27, 32 and 37 respectively.

优选的，网络模型构建步骤中，特征提取层中搭建三个卷积层，提取三种特征图，同时经过下采样后的特征图会直接送入网络的连接层中。Preferably, in the network model building step, three convolutional layers are built in the feature extraction layer to extract three types of feature maps. At the same time, the downsampled feature maps will be directly sent to the connection layer of the network.

优选的，所述输出表达层中包括全连接层；所述全连接层的特征向量中添加有量化参数QP。Preferably, the output expression layer includes a fully connected layer; a quantization parameter QP is added to the feature vector of the fully connected layer.

优选的，所述网络模型的损失函数如下：Preferably, the loss function of the network model is as follows:

其中，表示真实值和预测值的交叉熵，、、分别表示第一级64×64、第二级32×32、第三级16×16CU的真实模式标签，表示64×64CTU的真实模式标签，,表示4个32×32CTU的真实模式标签，表示4×4个16 ×16CTU的真实模式标签；同理，、、分别表示第一级64×64、第二级32 ×32、第三级16×16的预测标签，表示64×64CTU的预测模式标签，,则表示4个32×32CTU的预测模式标签，表示4×4个16×16CTU的预测模式标签；网络的预测标签和真实标签都经过了二值化，范围在[0，1]之间。 in, Represents the cross entropy of the true value and the predicted value, , , Respectively represent the real mode labels of the first level 64×64, the second level 32×32, and the third level 16×16CU, Real mode tag representing 64×64CTU, , Represents 4 real mode tags of 32×32CTU, Represents 4×4 real mode labels of 16×16CTU; similarly, , , Represents the prediction labels of 64×64 in the first level, 32×32 in the second level, and 16×16 in the third level respectively. Represents the prediction mode label of 64×64CTU, , then represents four prediction mode labels of 32×32CTU, Represents 4 × 4 prediction mode labels of 16 × 16 CTU; both the prediction label and the real label of the network have been binarized, ranging from [0, 1].

优选的，所述网络模型预测步骤中，网络模型输出21个二进制标签表示64×64、32×32、16×16的CTU是否会划分以及在此基础上是否会选择IBC/PLT模式。Preferably, in the network model prediction step, the network model outputs 21 binary labels indicating whether the 64×64, 32×32, and 16×16 CTUs will be divided and whether the IBC/PLT mode will be selected based on this.

优选的，所述当前CU模式预测步骤，具体包括：Preferably, the current CU mode prediction step specifically includes:

当CU尺寸为8×8时，计算相邻3个CU所使用的IBC、PLT模式数量和Intra模式数量；When the CU size is 8×8, calculate the number of IBC, PLT modes and Intra modes used by three adjacent CUs;

具体的，当时，候选模式只有Intra模式；当且时，候选模式为IBC和PLT模式；当且时，候选模式为Intra、IBC和PLT模式。 Specifically, when When , the only candidate mode is Intra mode; when and When , the candidate modes are IBC and PLT modes; when and , the candidate modes are Intra, IBC and PLT modes.

另一方面，一种基于IPMS-CNN和空域相邻CU编码模式的HEVC-SCC快速编码装置，包括：On the other hand, a HEVC-SCC fast coding device based on IPMS-CNN and spatial adjacent CU coding mode includes:

数据集制作模块，建立不同分辨率的视频序列数据集并编码，获取不同量化参数下HEVC-SCC的各个CU是否使用IBC/PLT模式的真实标签；The data set production module creates and encodes video sequence data sets of different resolutions to obtain the real labels of whether each CU of HEVC-SCC uses IBC/PLT mode under different quantization parameters;

网络模型构建模块，构建包括输入层、特征提取层和输出表达层的网络模型IPMS-CNN；其中，特征提取层中搭建三个卷积层，提取三种特征图；The network model building module builds a network model IPMS-CNN including an input layer, a feature extraction layer and an output expression layer; among them, three convolutional layers are built in the feature extraction layer to extract three types of feature maps;

网络模型训练模块，基于制作的数据集，对构建的网络模型进行训练，获得训练好的网络模型IPMS-CNN；The network model training module trains the constructed network model based on the produced data set and obtains the trained network model IPMS-CNN;

网络模型预测模块，将CTU输入到训练好的IPMS-CNN，获得模式预测标签，以预测出CTU的模式选择；The network model prediction module inputs the CTU into the trained IPMS-CNN and obtains the mode prediction label to predict the mode selection of the CTU;

当前CU模式预测模块，计算相邻3个CU所使用的IBC/PLT模式数量和相邻3个CU所使用的Intra模式数量，根据两个数量关系联合预测当前8×8CU 模式； Current CU mode prediction module, calculates the number of IBC/PLT modes used by three adjacent CUs and the number of Intra modes used by three adjacent CUs , jointly predict the current 8×8CU mode based on the two quantitative relationships;

编码模块，编码器基于网络模型预测步骤调用预测标签，与当前CU模式预测步骤一起预测CU划分结果。Encoding module, the encoder calls the prediction label based on the network model prediction step, and predicts the CU division result together with the current CU mode prediction step.

与现有技术相比，本发明的有益效果如下：Compared with the prior art, the beneficial effects of the present invention are as follows:

（1）本发明首先搭建数据库，训练IBC/PLT模式选择的卷积神经网络模型（IPMS-CNN）；其次将输入的CTU通过模式选择网络，输出CTU的模式预测标签；最后通过统计相邻3个CU所使用的模式数量来预测当前CU选择的模式，本发明能够在保持编码质量的同时减少编码时间，降低屏幕内容视频的计算复杂度；(1) This invention first builds a database and trains the convolutional neural network model (IPMS-CNN) for IBC/PLT mode selection; secondly, the input CTU is passed through the mode selection network and the mode prediction label of the CTU is output; finally, the adjacent 3 The number of modes used by each CU is used to predict the mode selected by the current CU. This invention can reduce encoding time while maintaining encoding quality, and reduce the computational complexity of screen content video;

（2）本发明采用四种尺度特征融合的网络结构，其中，下采样后的特征图会与后续经过卷积层得到的特征图一起送到连接层中，下采样后的图像提供了一部分深层特征，而卷积层的特征图则提供了一些浅层特征，这些浅层特征和深层特征的结合，不仅可以增加训练数据的数量，还可以给全连接层提供更多的特征信息，提高模型的特征表达能力和预测模式选择的准确性；(2) The present invention adopts a four-scale feature fusion network structure. Among them, the down-sampled feature map will be sent to the connection layer together with the subsequent feature map obtained through the convolution layer. The down-sampled image provides a part of the deep layer. Features, and the feature map of the convolutional layer provides some shallow features. The combination of these shallow features and deep features can not only increase the amount of training data, but also provide more feature information to the fully connected layer, improving the model. Feature expression ability and accuracy of prediction model selection;

（3）本发明在输出表达层的全连接层中添加QP这个外部向量，可以使模型更好地学习如何在不同的QP下选择最佳的编码模式，能够让模型更好地适应各种QP值，进而生成更高质量的重建视频；(3) The present invention adds the external vector QP to the fully connected layer of the output expression layer, which allows the model to better learn how to select the best encoding mode under different QPs, and allows the model to better adapt to various QPs. value, thereby generating a higher quality reconstructed video;

（4）本发明通过将卷积神经网络预测大尺寸CU模式的方法与基于空间相邻CU所采用模式数量预测小尺寸CU模式的方法结合，可以更加精确地预测CU的模式，从而降低模式选择的复杂度。(4) The present invention combines the method of predicting large-size CU modes by convolutional neural network with the method of predicting small-size CU modes based on the number of modes used by spatially adjacent CUs, so that the mode of CUs can be predicted more accurately, thereby reducing mode selection. complexity.

附图说明Description of the drawings

图1为本发明的基于IPMS-CNN和空域相邻CU编码模式的HEVC-SCC快速编码方法的流程图；Figure 1 is a flow chart of the HEVC-SCC fast coding method based on IPMS-CNN and spatial adjacent CU coding mode of the present invention;

图2为本发明的IPMS-CNN卷积神经网络结构示意图；Figure 2 is a schematic structural diagram of the IPMS-CNN convolutional neural network of the present invention;

图3为本发明的当前8×8CU和相邻CU示意图；Figure 3 is a schematic diagram of the current 8×8 CU and adjacent CUs of the present invention;

图4为本发明的将卷积神经网络预测大尺寸CU模式的方法与基于空间相邻CU所采用模式数量预测小尺寸CU模式的方法相连接的详细流程图；Figure 4 is a detailed flow chart of the present invention that connects the method of predicting large-sized CU patterns by a convolutional neural network and the method of predicting small-sized CU patterns based on the number of patterns used in spatially adjacent CUs;

图5为本发明的MFF-CNN网络结构示意图；Figure 5 is a schematic diagram of the MFF-CNN network structure of the present invention;

图6为本发明的基于IPMS-CNN和空域相邻CU编码模式的HEVC-SCC快速编码装置的结构框图。Figure 6 is a structural block diagram of the HEVC-SCC fast coding device based on IPMS-CNN and spatial adjacent CU coding mode of the present invention.

具体实施方式Detailed ways

下面结合具体实施例，进一步阐述本发明。应理解，这些实施例仅用于说明本发明而不用于限制本发明的范围。此外应理解，在阅读了本发明讲授的内容之后，本领域技术人员可以对本发明作各种改动或修改，这些等价形式同样落于本申请所附权利要求书所限定的范围。The present invention will be further described below in conjunction with specific embodiments. It should be understood that these examples are only used to illustrate the invention and are not intended to limit the scope of the invention. In addition, it should be understood that after reading the teachings of the present invention, those skilled in the art can make various changes or modifications to the present invention, and these equivalent forms also fall within the scope defined by the appended claims of this application.

为了解决HEVC-SCC中CU划分复杂度高的问题，本实施例提出一种基于多尺度特征融合（MFF-CNN）的HEVC-SCC帧内CU快速划分编码方法，用于在不影响主观质量的同时加快编码时间，降低编码复杂度。In order to solve the problem of high CU division complexity in HEVC-SCC, this embodiment proposes a HEVC-SCC intra-frame CU fast division coding method based on multi-scale feature fusion (MFF-CNN), which is used to achieve high performance without affecting subjective quality. At the same time, it speeds up the encoding time and reduces the encoding complexity.

具体的，参见图1所示，一种基于IPMS-CNN和空域相邻CU编码模式的HEVC-SCC快速编码方法，包括：Specifically, see Figure 1, a HEVC-SCC fast coding method based on IPMS-CNN and spatial adjacent CU coding mode, including:

S101，数据集制作步骤，建立不同分辨率的视频序列数据集并编码，获取不同量化参数下HEVC-SCC的各个CU是否使用IBC/PLT模式的真实标签；S101, the data set production step, establish and encode video sequence data sets of different resolutions, and obtain the real labels of whether each CU of HEVC-SCC uses IBC/PLT mode under different quantization parameters;

S102，网络模型构建步骤，构建包括输入层、特征提取层和输出表达层的网络模型IPMS-CNN；其中，特征提取层中搭建三个卷积层，特征提取层中搭建三个卷积层提取三种特征图，加上下采样后得到的特征图，共提取四种不同尺寸的特征图；S102, network model construction step, construct a network model IPMS-CNN including an input layer, a feature extraction layer and an output expression layer; among them, three convolutional layers are built in the feature extraction layer, and three convolutional layers are built in the feature extraction layer to extract Three types of feature maps, plus the feature map obtained after downsampling, extract a total of four feature maps of different sizes;

S103，网络模型训练步骤，基于制作的数据集，对构建的网络模型进行训练，获得训练好的IBC/PLT模式选择卷积神经网络（IBC /PLT Mode Selection ConvolutionNeural Network，IPMS-CNN）模型；S103, network model training step, based on the produced data set, train the constructed network model to obtain the trained IBC/PLT Mode Selection Convolution Neural Network (IPMS-CNN) model;

S104，将LCU输入到训练好的IPMS-CNN，获得64×64、32×32、16×16CU的模式预测标签，以预测出CTU的模式选择；S104, input the LCU into the trained IPMS-CNN, and obtain the mode prediction labels of 64×64, 32×32, and 16×16CU to predict the mode selection of the CTU;

S105，8×8 CU模式预测步骤，8×8CU的模式预测则通过统计空域相邻CU所采用的模式数量来预测当前CU可能选择的模式。计算相邻3个CU所使用的IBC/PLT模式数量和相邻3个CU所使用的Intra模式数量，根据两个数量关系联合预测当前8×8CU模式； S105, 8×8 CU mode prediction step. The mode prediction of 8×8 CU predicts the mode that the current CU may select by counting the number of modes used by adjacent CUs in the spatial domain. Calculate the number of IBC/PLT modes used by three adjacent CUs and the number of Intra modes used by three adjacent CUs , jointly predict the current 8×8CU mode based on two quantitative relationships;

S106，编码步骤，编码器基于网络模型预测步骤调用预测标签，与当前CU模式预测步骤一起预测CU划分结果。S106, encoding step, the encoder calls the prediction label based on the network model prediction step, and predicts the CU division result together with the current CU mode prediction step.

本实施例中，所述数据集制作步骤，具体包括：In this embodiment, the data set creation steps specifically include:

制作三种不同分辨率的视频序列数据集：1024×576、1792×102、2304×1280，数据集涵盖图片数据集和视频数据集，且每种分辨率都包含TGM/M、A、CC三种类型视频序列；Create video sequence data sets of three different resolutions: 1024×576, 1792×102, and 2304×1280. The data sets cover image data sets and video data sets, and each resolution includes TGM/M, A, and CC. types of video sequences;

将所有数据集序列通过HM16.12+SCM8.3进行编码，在全帧内（All Intra，AI）配置下，设置量化参数QP：22、27、32、37，获取四个QP下的各个CU的IBC/PLT模式的模式标签，用于下一步的网络模型训练。Encode all data set sequences through HM16.12+SCM8.3. Under the All Intra (AI) configuration, set the quantization parameters QP: 22, 27, 32, 37, and obtain each CU under the four QPs. The mode label of the IBC/PLT mode is used for the next step of network model training.

参见图2所示，本实施例的IPMS-CNN网络有三个组成部分：输入层、特征提取层和输出表达层。在网络的特征提取层中搭建三个卷积层，提取三种特征图，同时将经过下采样后的特征图直接送入网络的连接层中。因此，网络可以获得12种不同尺度的特征图。网络采用交叉熵损失函数作为目标函数，公式如下：As shown in Figure 2, the IPMS-CNN network in this embodiment has three components: input layer, feature extraction layer and output expression layer. Three convolutional layers are built in the feature extraction layer of the network, three types of feature maps are extracted, and the downsampled feature maps are directly sent to the connection layer of the network. Therefore, the network can obtain feature maps of 12 different scales. The network uses the cross-entropy loss function as the objective function, and the formula is as follows:

其中，表示真实值和预测值的交叉熵，、、分别表示第一级（64×64）、第二级（32×32）、第三级（16×16）CU的真实模式标签，表示64× 64CTU的真实模式标签，,则表示4个32×32CTU的真实模式标签，则表示4×4个16×16CTU的真实模式标签；同理，、、分别表示第一级（64×64）、第二级（32×32）、第三级（16×16）的预测标签，表示64×64CTU的预测模式标签，,则表示4个32×32CTU的预测模式标签，则表示4×4个16 ×16CTU的预测模式标签；网络的预测标签和真实标签都经过了二值化，范围在[0，1]之间。 in, Represents the cross entropy of the true value and the predicted value, , , Represents the real mode labels of the first-level (64×64), second-level (32×32), and third-level (16×16) CUs respectively. Represents the true mode tag of 64×64CTU, , It means four real mode labels of 32×32CTU, It means 4×4 real mode labels of 16×16CTU; similarly, , , Represents the prediction labels of the first level (64×64), the second level (32×32), and the third level (16×16) respectively. Represents the prediction mode label of 64×64CTU, , then represents four prediction mode labels of 32×32CTU, It represents 4 × 4 prediction mode labels of 16 × 16 CTU; the predicted labels and real labels of the network have been binarized, and the range is between [0, 1].

由于量化参数QP是一种可以控制视频压缩质量的参数，QP越大，其压缩比越高，但是压缩后的图像质量越低。而在HEVC-SCC中，QP对于SCC模式的选择也非常重要，因为其可以控制量化误差的大小，从而影响最终的图像质量。当对同一视频帧进行编码时，QP越大，编码后的CU模式选择越粗糙，往往选择Intra模式；而QP越小，则编码后的QP越有可能会找到合适的参考从而选择IBC或者PLT模式。因此，在输出表达层的全连接层中添加QP这个外部向量，可以使模型更好地学习如何在不同的QP下选择最佳的编码模式，能够让模型更好地适应各种QP值，进而生成更高质量的重建视频。Since the quantization parameter QP is a parameter that can control the quality of video compression, the larger the QP, the higher the compression ratio, but the lower the quality of the compressed image. In HEVC-SCC, QP is also very important for the selection of SCC mode, because it can control the size of the quantization error, thus affecting the final image quality. When encoding the same video frame, the larger the QP, the rougher the encoded CU mode selection, and Intra mode is often selected; and the smaller the QP, the more likely the encoded QP will find a suitable reference and select IBC or PLT. model. Therefore, adding the external vector QP to the fully connected layer of the output expression layer can enable the model to better learn how to select the best encoding mode under different QP, allowing the model to better adapt to various QP values, and then Generate higher quality reconstructed videos.

本实施例中，所述网络模型最终输出21个二进制标签表示64×64、32×32、16×16的CTU是否会划分以及在此基础上是否会选择IBC/PLT模式。In this embodiment, the network model finally outputs 21 binary labels indicating whether the 64×64, 32×32, and 16×16 CTUs will be divided and whether the IBC/PLT mode will be selected based on this.

所述当前CU模式预测步骤，具体包括：The current CU mode prediction step specifically includes:

当时，候选模式只有Intra模式；当且时，候选模式为IBC和PLT模式；当且时，候选模式为Intra、IBC和PLT模式。 when When , the only candidate mode is Intra mode; when and When , the candidate modes are IBC and PLT modes; when and , the candidate modes are Intra, IBC and PLT modes.

参见图3所示，为本实施例当前8×8CU和相邻CU示意图。Refer to Figure 3, which is a schematic diagram of the current 8×8 CU and adjacent CUs in this embodiment.

参见图4所示，为本实施例的将卷积神经网络预测大尺寸CU模式的方法与基于空间相邻CU所采用模式数量预测小尺寸CU模式的方法相连接的详细流程图，具体如下：Refer to Figure 4, which is a detailed flowchart of this embodiment that connects the method of predicting large-sized CU patterns by a convolutional neural network and the method of predicting small-sized CU patterns based on the number of patterns used by spatially adjacent CUs. The details are as follows:

（a）网络输入LCU，调用多尺度特征融合的卷积神经网络模型MFF-CNN模型（Multi-scale Feature Fusion Convolution Neural Network）（见图5）、IPMS-CNN模式选择模型，分别输出21个二进制标签表示64×64、32×32、16×16的CTU是否会划分以及在此基础上是否会选择IBC/PLT模式；(a) The network inputs LCU, calls the multi-scale feature fusion convolutional neural network model MFF-CNN model (Multi-scale Feature Fusion Convolution Neural Network) (see Figure 5) and IPMS-CNN mode selection model, and outputs 21 binary values respectively. The label indicates whether the 64×64, 32×32, and 16×16 CTUs will be divided and whether the IBC/PLT mode will be selected based on this;

（b）当CU尺寸为8×8时，计算相邻3个CU所使用的IBC、PLT模式数量和Intra模式数量。当时，候选模式只有Intra模式；当且时，候选模式为IBC、PLT模式；当且时，候选模式为Intra、 IBC、PLT模式； (b) When the CU size is 8×8, calculate the number of IBC, PLT modes and Intra modes used by three adjacent CUs. when When , the only candidate mode is Intra mode; when and When , the candidate modes are IBC and PLT modes; when and When , the candidate modes are Intra, IBC, and PLT modes;

（c）编码器根据步骤（a）调用网络标签，与步骤（b）一起预测CU划分结果，从而跳过不必要的模式遍历，减少编码时间，加速屏幕内容视频的编码过程。(c) The encoder calls the network label according to step (a) and predicts the CU division result together with step (b), thereby skipping unnecessary mode traversal, reducing encoding time, and accelerating the encoding process of screen content video.

参见图5所示，为本实施例MFF-CNN网络结构示意图。Refer to Figure 5, which is a schematic diagram of the MFF-CNN network structure in this embodiment.

参见图6所示，本实施例还公开了一种基于IPMS-CNN和空域相邻CU编码模式的HEVC-SCC快速编码装置，包括：Referring to Figure 6, this embodiment also discloses a HEVC-SCC fast coding device based on IPMS-CNN and spatial adjacent CU coding mode, including:

数据集制作模块601，建立不同分辨率的视频序列数据集并编码，获取不同量化参数下HEVC-SCC的各个CU是否使用IBC/PLT模式的真实标签；The data set production module 601 creates and encodes video sequence data sets of different resolutions to obtain the real labels of whether each CU of HEVC-SCC uses IBC/PLT mode under different quantization parameters;

网络模型构建模块602，构建包括输入层、特征提取层和输出表达层的网络模型IPMS-CNN；其中，特征提取层中搭建三个卷积层，提取三种特征图；The network model building module 602 constructs a network model IPMS-CNN including an input layer, a feature extraction layer and an output expression layer; among them, three convolutional layers are built in the feature extraction layer to extract three types of feature maps;

网络模型训练模块603，基于制作的数据集，对构建的网络模型进行训练，获得训练好的网络模型IPMS-CNN；The network model training module 603 trains the constructed network model based on the produced data set and obtains the trained network model IPMS-CNN;

网络模型预测模块604，将CTU输入到训练好的IPMS-CNN，获得模式预测标签，以预测出CTU的模式选择；The network model prediction module 604 inputs the CTU into the trained IPMS-CNN and obtains the mode prediction label to predict the mode selection of the CTU;

当前CU模式预测模块605，计算相邻3个CU所使用的IBC/PLT模式数量和相邻3个CU所使用的Intra模式数量，根据两个数量关系联合预测当前8× 8CU模式； The current CU mode prediction module 605 calculates the number of IBC/PLT modes used by three adjacent CUs. and the number of Intra modes used by three adjacent CUs , jointly predict the current 8×8CU mode based on the two quantitative relationships;

编码模块606，编码器基于网络模型预测步骤调用预测标签，与当前CU模式预测步骤一起预测CU划分结果。In the encoding module 606, the encoder calls the prediction tag based on the network model prediction step, and predicts the CU division result together with the current CU mode prediction step.

一种基于IPMS-CNN和空域相邻CU编码模式的HEVC-SCC快速编码装置各模块的具体实现同一种基于IPMS-CNN和空域相邻CU编码模式的HEVC-SCC快速编码方法，本实施例不再重复说明。Specific implementation of each module of a HEVC-SCC fast encoding device based on IPMS-CNN and spatial adjacent CU coding mode. The same HEVC-SCC fast encoding method based on IPMS-CNN and spatial adjacent CU coding mode. This embodiment does not Repeat the explanation again.

上述仅为本发明的具体实施方式，但本发明的设计构思并不局限于此，凡利用此构思对本发明进行非实质性的改动，均应属于侵犯本发明保护范围的行为。The above are only specific embodiments of the present invention, but the design concept of the present invention is not limited thereto. Any non-substantive changes to the present invention using this concept shall constitute an infringement of the protection scope of the present invention.

Claims

1. A HEVC-SCC fast coding method based on IPMS-CNN and spatial adjacent CU coding mode, which is characterized by:

In the data set production step, video sequence data sets of different resolutions are established and encoded to obtain the real labels of whether each CU of HEVC-SCC uses IBC/PLT mode under different quantization parameters;

The network model construction step is to build a network model IPMS-CNN including an input layer, a feature extraction layer and an output expression layer. Among them, three convolutional layers are built in the feature extraction layer to extract three types of feature maps, plus the feature maps obtained after downsampling. , a total of four feature maps of different sizes are extracted;

The network model training step is to train the constructed network model based on the produced data set to obtain the trained IBC/PLT mode selection convolutional neural network IPMS-CNN model;

In the network model prediction step, the LCU is input into the trained IPMS-CNN to obtain the mode prediction label to predict the mode selection of the CTU;

The current CU mode prediction step calculates the number of IBC/PLT modes used by three adjacent CUs NumSCC _Neigh and the number of Intra modes used by three adjacent CUs NumIntra _Neigh , and jointly predicts the current 8×8 CU mode based on the relationship between the two quantities; In the encoding step, the encoder calls the prediction label based on the network model prediction step, and predicts the CU division result together with the current CU mode prediction step;

In the network model prediction step, the network model outputs 21 binary labels indicating whether the 64×64, 32×32, and 16×16 CTUs will be divided and whether the IBC/PLT mode will be selected on this basis;

The current CU mode prediction step specifically includes:

When the CU size is 8×8, calculate the number of IBC, PLT modes and Intra modes used by three adjacent CUs; specifically, when NumScc _Neigh = 0, the only candidate mode is the Intra mode; when NumSCC _Neigh ≠ 0 and When NumIntra _Neigh = 0, the candidate modes are IBC and PLT modes; when NumSCC _Neigh ≠0 and NumIntra _Neigh ≠0, the candidate modes are Intra, IBC and PLT modes.

2. The HEVC-SCC fast coding method based on IPMS-CNN and spatial adjacent CU coding mode according to claim 1, characterized in that the data set production step specifically includes:

Self-made video sequence data sets of three different resolutions. The data sets cover image data sets and video data sets, including three types of video sequences: TGM/M, A, and CC;

Then encode through the standard encoding software platform, and set the mode tag of the IBC/PLT mode of each CU under different quantization parameters QP under the full intra-frame configuration.

3. The HEVC-SCC fast coding method based on IPMS-CNN and spatial adjacent CU coding mode according to claim 1, characterized in that the data set includes: a training set, a verification set and a test set; Each set in the training set, validation set, and test set contains three subsets; the first subset has a resolution of 1024×576, the second subset has a resolution of 1792×1024, and the third subset has a resolution of 2304× 1280.

4. The HEVC-SCC fast coding method based on IPMS-CNN and spatial adjacent CU coding mode according to claim 1, characterized in that the quantization parameters include four quantization levels, respectively 22, 27, 32 and 37.

5. The HEVC-SCC fast coding method based on IPMS-CNN and spatial adjacent CU coding mode according to claim 1, characterized in that, in the network model building step, three convolutional layers are built in the feature extraction layer to extract Three types of feature maps, and the downsampled feature maps will be directly sent to the connection layer of the network.

6. The HEVC-SCC fast coding method based on IPMS-CNN and spatial adjacent CU coding mode according to claim 1, characterized in that the output expression layer includes a fully connected layer; characteristics of the fully connected layer The quantization parameter QP is added to the vector.

7. The HEVC-SCC fast coding method based on IPMS-CNN and spatial adjacent CU coding mode according to claim 1, characterized in that the loss function of the network model is as follows:

Among them, H(*,*) represents the cross entropy of the true value and the predicted value, Respectively representing the real mode labels of the first level 64×64, the second level 32×32, and the third level 16×16CU, /> Represents the true mode tag of 64×64CTU, /> Represents four real mode tags of 32×32CTU,/> Represents 4×4 real mode tags of 16×16CTU; similarly, /> Represents the prediction labels of the first level 64×64, the second level 32×32, and the third level 16×16 respectively, /> Indicates the prediction mode label of 64×64CTU, /> It means four prediction mode labels of 32×32CTU,/> Represents 4 × 4 prediction mode labels of 16 × 16 CTU; both the prediction label and the real label of the network have been binarized, ranging from [0, 1].

8. A HEVC-SCC fast coding device based on IPMS-CNN and spatial adjacent CU coding mode, which is characterized by including:

The data set production module creates and encodes video sequence data sets of different resolutions to obtain the real labels of whether each CU of HEVC-SCC uses IBC/PLT mode under different quantization parameters;

The network model building module builds a network model IPMS-CNN including an input layer, a feature extraction layer and an output expression layer; among them, three convolutional layers are built in the feature extraction layer to extract three types of feature maps;

The network model training module trains the constructed network model based on the produced data set and obtains the trained network model IPMS-CNN;

The network model prediction module inputs the CTU into the trained IPMS-CNN and obtains the mode prediction label to predict the mode selection of the CTU;

The current CU mode prediction module calculates the number of IBC/PLT modes used by three adjacent CUs NumSCC _Neigh and the number of Intra modes used by three adjacent CUs NumIntra _Neight , and jointly predicts the current 8×8 CU mode based on the relationship between the two quantities; Coding module, the encoder calls the prediction label based on the network model prediction step, and predicts the CU division result together with the current CU mode prediction step;

In the network model prediction module, the network model outputs 21 binary labels indicating whether the 64×64, 32×32, and 16×16 CTUs will be divided and whether the IBC/PLT mode will be selected on this basis;

The current CU mode prediction module specifically includes: