CN114513660A - Interframe image mode decision method based on convolutional neural network - Google Patents

Interframe image mode decision method based on convolutional neural network

Info

Publication number
CN114513660A
CN114513660A (application CN202210407485.0A; granted publication CN114513660B)
Authority
CN
China
Prior art keywords
layer
residual
coding
mode
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210407485.0A
Other languages
Chinese (zh)
Other versions
CN114513660B (en)
Inventor
蒋先涛
张纪庄
郭咏梅
郭咏阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo Kangda Kaineng Medical Technology Co ltd
Original Assignee
Ningbo Kangda Kaineng Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo Kangda Kaineng Medical Technology Co ltd filed Critical Ningbo Kangda Kaineng Medical Technology Co ltd
Priority to CN202210407485.0A priority Critical patent/CN114513660B/en
Publication of CN114513660A publication Critical patent/CN114513660A/en
Application granted granted Critical
Publication of CN114513660B publication Critical patent/CN114513660B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/109Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Computing arrangements based on biological models using neural network models
    • G06N3/04Architectures, e.g. interconnection topology
    • G06N3/0454Architectures, e.g. interconnection topology using a combination of multiple neural nets
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/192Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding the adaptation method, adaptation tool or adaptation type being iterative or recursive

Abstract

The invention discloses an interframe image mode decision method based on a convolutional neural network, which relates to the technical field of image processing and comprises the following steps: acquiring a coded image and a residual image of a next coding depth after a target inter-frame image executes a merging mode; extracting bottom layer characteristics in the input information through a convolution layer in the multilayer tree CNN by taking the connected coded image and the residual image as the input information; performing layer-by-layer convolution based on bottom layer characteristics through residual layers of preset layer levels in the multilayer tree-shaped CNN; performing full connection of each layer of convolution output through a full connection layer in the multilayer tree-shaped CNN and obtaining the current coding depth of a target inter-frame image and a partition mode of a coding block under partition; and coding the target inter-frame image according to the partition mode of each coding block under each coding depth. The method utilizes the advantages of low coding bit rate requirement of the merging mode and the characteristic learning of the convolutional neural network, and reduces the coding time while maintaining the rate distortion performance.

Description

Interframe image mode decision method based on convolutional neural network
Technical Field
The invention relates to the technical field of image processing, in particular to an interframe image mode decision method based on a convolutional neural network.
Background
With the development of multimedia technology, new video formats such as Ultra High Definition (UHD), Virtual Reality (VR), 360-degree video, etc. have appeared. Accordingly, there is an increasing demand for new video coding standards that support higher resolution and higher coding efficiency. Versatile Video Coding (VVC) was developed by the Joint Video Experts Team (JVET) of VCEG and MPEG, and the standard was finalized in July 2020. VVC, as the latest video coding standard, employs several new coding schemes and tools, such as a Coding Tree Unit (CTU) with a maximum size of 128 × 128, a quadtree plus multi-type tree (QT + MTT) partition structure for Coding Units (CUs), affine motion compensated prediction, and the like. These new techniques achieve approximately 50% gain over the HEVC standard in terms of bit rate reduction. However, the computational complexity of encoding and decoding also increases dramatically.
The VVC encoder exploits the redundancy that exists between pictures. After block division, motion compensation is performed on each coding block. There are two main coding methods for the inter prediction mode: the Advanced Motion Vector Prediction (AMVP) mode and the merge mode. In the AMVP mode, the optimal values of a plurality of motion vector candidates, motion vector difference values, reference picture numbers, and unidirectional/bidirectional prediction modes are encoded. In the merge mode, only the optimal values of the plurality of candidate motion vectors are encoded. The AMVP mode has the advantage that parameters are determined and encoded freely, but the number of bits required for encoding the parameters is high, and a complicated encoding process and motion estimation are required. For the merge mode, the number of bits required for encoding is very small, but the prediction value is less accurate. Some studies on coding under the VVC coding standard have been completed; however, few studies have considered the characteristics of inter prediction, and related studies have shown that CNN-based methods are suitable for processing images. Therefore, we formulate the problem as using a Convolutional Neural Network (CNN) to decide the split mode of the Coding Tree Unit (CTU), and propose a new CNN-based fast mode decision method for VVC inter prediction.
Disclosure of Invention
In order to reduce the high coding computation complexity of interframe coding caused by adopting an advanced motion vector prediction mode, the invention provides an interframe image mode decision method based on a convolutional neural network, which comprises the following steps of:
s1: acquiring a coded image and a residual image of a next coding depth after a target inter-frame image executes a merging mode;
s2: extracting bottom layer characteristics in the input information through a convolutional layer in the multilayer tree-shaped CNN by taking the connected coded image and the residual image as the input information;
s3: performing layer-by-layer convolution based on bottom layer characteristics through residual layers of preset layer levels in the multilayer tree-shaped CNN, and acquiring convolution output of each layer;
s4: performing full connection of each layer of convolution output through a full connection layer in the multilayer tree-shaped CNN and obtaining the current coding depth of a target inter-frame image and a partition mode of a coding block under partition;
s5: and judging whether the current coding depth reaches the maximum depth, if so, coding the target inter-frame image according to the partition mode of each coding block under each coding depth, and if not, entering the next coding depth and returning to the step of S1.
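As a minimal sketch of the control flow of steps S1 to S5 above, the loop below stubs out the merge-mode encoding pass and the multi-layer tree CNN with hypothetical placeholder functions (merge_mode_encode, cnn_decide_modes) and an assumed maximum coding depth; it illustrates only the depth iteration, not the actual encoder.

```python
# Sketch of the S1-S5 decision loop; the two helper functions are placeholders.

MAX_DEPTH = 4  # assumed maximum coding depth, for illustration only

def merge_mode_encode(image, depth):
    """S1: stub returning (coded_image, residual_image) for the next depth."""
    return image, [0] * len(image)

def cnn_decide_modes(coded_image, residual_image, depth):
    """S2-S4: stub standing in for the multi-layer tree CNN decision."""
    return {"block_0": "QT" if depth < 2 else "Non-split"}

def decide_inter_modes(image):
    decisions = {}
    depth = 0
    while True:
        coded, residual = merge_mode_encode(image, depth)            # S1
        decisions[depth] = cnn_decide_modes(coded, residual, depth)  # S2-S4
        if depth >= MAX_DEPTH:                                       # S5
            break
        depth += 1  # not yet at maximum depth: enter next coding depth
    return decisions

modes = decide_inter_modes([0] * 16)
```

With these stubs, one partition decision is produced per coding depth from 0 up to the maximum depth.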
Further, the step of S1 is preceded by the step of:
s0: and training a multilayer tree CNN based on partition division mode selection results under each coding depth acquired by the advanced motion vector mode and corresponding inter-frame images.
Further, in the step S0, the multi-layer tree CNN is trained based on a weighted classification cross entropy loss function, where the weighted classification cross entropy loss function may be expressed as the following formula:

loss = Σ_{l=1}^{L} α · w_l · CE_l

where loss is the weighted classification loss, L is the total number of residual layers in the multi-layer tree CNN, α is a constant that is initially 1, w_l is the weight of the l-th residual layer, and CE_l is the cross entropy loss of the multi-layer tree CNN at the l-th residual layer.
Further, the partition dividing mode includes a non-dividing mode, a quadtree mode, a horizontal binary tree mode, a vertical binary tree mode, a horizontal ternary tree mode, and a vertical ternary tree mode.
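The geometry of the six partition modes can be sketched as follows, assuming the usual VVC conventions that a quadtree split produces four equal quadrants, a binary tree split produces two halves, and a ternary tree split produces 1/4, 1/2, 1/4 parts:

```python
def split_block(w, h, mode):
    """Return the (width, height) sub-blocks produced by each partition mode."""
    if mode == "Non-split":   # mode 0: keep the block as is
        return [(w, h)]
    if mode == "QT":          # mode 1: four equal quadrants
        return [(w // 2, h // 2)] * 4
    if mode == "BT_H":        # mode 2: two stacked halves (top / bottom)
        return [(w, h // 2)] * 2
    if mode == "BT_V":        # mode 3: two side-by-side halves
        return [(w // 2, h)] * 2
    if mode == "TT_H":        # mode 4: 1/4, 1/2, 1/4 horizontal stripes
        return [(w, h // 4), (w, h // 2), (w, h // 4)]
    if mode == "TT_V":        # mode 5: 1/4, 1/2, 1/4 vertical stripes
        return [(w // 4, h), (w // 2, h), (w // 4, h)]
    raise ValueError(mode)

subs = split_block(128, 128, "TT_H")  # split a full-size CTU
```

Every mode preserves the total block area, which is why the partition decision can recurse independently on each sub-block.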
Further, the step of S4 is followed by the step of:
s41: judging whether the partition mode of the coding block under the current coding depth and partition is a non-partition mode, if so, entering the step S42, and if not, entering the step S5;
s42: and stopping the partition mode decision of the subsequent coding depth of the coding block, and after the partition mode decision of all the coding blocks, coding the target inter-frame image according to the partition mode of each coding block under each coding depth.
Further, after the step of S3, the method further includes the steps of:
s31: and performing information vector connection on the convolution output, the image number information and the quantization parameters of the coding blocks under the current coding depth and partition division.
Further, the multi-layered tree CNN includes:
the convolution layer comprises a 3 × 3 convolution kernel and is used for extracting bottom layer characteristics in the input information;
the transition residual error layer is used for outputting a first residual error block according to the bottom layer characteristics;
a head-end residual layer for outputting a first convolution output and a second residual block by convolution between the bottom-layer features and the first residual block;
a middle residual layer for outputting a second convolution output and a third residual block through convolution between the bottom layer features and the second residual block;
a final residual layer for outputting a third convolution output by convolution between the bottom layer features and a third residual block;
the full connection layer is used for fully connecting the first convolution output, the second convolution output and the third convolution output and outputting a partition division mode decision;
the convolutional layer, the transition residual layer, the head residual layer, the middle residual layer and the tail residual layer are sequentially connected.
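The wiring just listed can be sketched as follows; the functions are placeholders that track only the channel widths (32/32/64/128/256), with the actual convolutions, BN, and ReLU omitted. Each residual layer consumes the bottom-layer features together with the residual block from the previous layer, and the head-end, middle, and tail-end layers each emit a convolution output for the full connection layer.

```python
# Placeholder wiring of the multi-layer tree CNN; channel counts only.

def conv_layer(x):
    """Conv 3x3 with 32 output channels: extracts bottom-layer features."""
    return {"features": 32}

def residual_layer(features, prev_block, out_channels):
    """Placeholder ResBlock: returns (conv_output_channels, residual_block)."""
    return out_channels, {"block": out_channels}

def forward():
    feat = conv_layer({"channels": 2})["features"]   # coded + residual image
    _, block1 = residual_layer(feat, None, 32)       # transition residual layer
    out1, block2 = residual_layer(feat, block1, 64)  # head-end residual layer
    out2, block3 = residual_layer(feat, block2, 128) # middle residual layer
    out3, _ = residual_layer(feat, block3, 256)      # tail-end residual layer
    return [out1, out2, out3]                        # fed to the FC layer

fc_inputs = forward()
```

The three tapped outputs are what the full connection layer concatenates before emitting the partition division mode decision.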
Furthermore, an information vector connection layer is respectively connected between the head end residual error layer, the middle residual error layer, the tail end residual error layer and the full connection layer, and the information vector connection layer is used for performing information vector connection on convolution output, corresponding image number information and quantization parameters of the coding blocks.
Compared with the prior art, the invention at least has the following beneficial effects:
(1) the interframe image mode decision method based on the convolutional neural network combines the merge mode (Merge) in the inter prediction mode with a Convolutional Neural Network (CNN), and reduces the time required by coding while keeping the rate distortion performance by utilizing the advantages of the low coding bit rate requirement of the merge mode and the feature learning of the convolutional neural network;
(2) adding image number information and quantization parameters of coding blocks under the current coding depth and partition division into the multilayer tree-shaped CNN, so that the multilayer tree-shaped CNN can better accord with the parameter characteristics in the actual coding process, and the decision accuracy is further improved;
(3) aiming at the partition problem of the coding block, considering that the partition of the block is similar to a tree-shaped split structure, the characteristics of the layered split tree formed when the coding block is partitioned are learned through the multilayer tree-shaped CNN by setting weights that vary with iteration;
(4) in the training process of the multilayer tree-shaped CNN, higher weights are set for corresponding levels in different training stages, so that the trained multilayer tree-shaped CNN can solve complex problems more effectively.
Drawings
FIG. 1 is a diagram of method steps for an interframe image mode decision method based on a convolutional neural network;
fig. 2 is a schematic diagram of a multi-layer tree CNN architecture.
Detailed Description
The following are specific embodiments of the present invention and are further described with reference to the drawings, but the present invention is not limited to these embodiments.
Example one
The VVC inherits the quadtree partition of HEVC, and in order to better adapt to the coding of ultra high definition video, the maximum coding tree unit size allowed by the VVC is 128 × 128. For the VVC inter-frame coding and QT + MTT partitioning problem, a new computational complexity optimization method is needed. With the QT + MTT partition structure, a coding unit can be partitioned by a Quadtree (QT), a Binary Tree (BT), or a Ternary Tree (TT). In addition, horizontal (H) and vertical (V) direction splitting may also be used for BT and TT. Therefore, the coding unit has 6 splitting modes (i.e. the non-split mode Non-split, quadtree mode QT, horizontal binary tree mode BT_H, vertical binary tree mode BT_V, horizontal ternary tree mode TT_H and vertical ternary tree mode TT_V, which are referred to by the numbers 0 to 5 in the present invention, respectively). More specifically, the coding tree units are first partitioned by the QT structure. Then, the coding unit in each QT leaf node is further partitioned by a QT or MTT structure.
Since the VVC expands the maximum allowable coding tree unit size of HEVC and introduces multi-type tree partitioning, in order to better encode an inter-frame image, the VVC generally adopts an Advanced Motion Vector Prediction (AMVP) mode that requires a complex encoding process and motion estimation and a high number of bits for encoding parameters. As a result, VVC inter prediction using the AMVP mode requires a large amount of computation to determine the optimal coding mode, which in turn results in coding efficiency that is not as high as desired. Meanwhile, considering that the block division of the coding unit is similar to a tree-shaped split structure, as shown in fig. 1, the invention provides an interframe image mode decision method based on a convolutional neural network, which comprises the following steps:
s1: acquiring a coded image and a residual image of a next coding depth after a target inter-frame image executes a merging mode;
s2: extracting bottom layer characteristics in the input information through a convolutional layer in the multilayer tree-shaped CNN by taking the connected coded image and the residual image as the input information;
s3: performing layer-by-layer convolution based on bottom layer characteristics through residual layers of preset layer levels in the multilayer tree-shaped CNN, and acquiring convolution output of each layer;
s4: performing full connection of each layer of convolution output through a full connection layer in the multilayer tree-shaped CNN and obtaining the current coding depth of a target inter-frame image and a partition mode of a coding block under partition;
s5: and judging whether the current coding depth reaches the maximum depth, if so, coding the target inter-frame image according to the partition mode of each coding block under each coding depth, and if not, entering the next coding depth and returning to the step of S1.
The multilayer tree-shaped CNN appearing in the above steps is designed for the QT + MTT structure in the VVC coding standard. As shown in fig. 2, the network is mainly composed of one convolutional layer and four residual layers of different sizes (ResBlock, each comprising a BN layer, a ReLU layer, and a Conv layer connected in sequence), and is divided into three hierarchical levels. Among them, the convolutional layer (Conv 3, 32), the transition residual layer (ResBlock, 32), the head-end residual layer (ResBlock, 64), the middle residual layer (ResBlock, 128), and the tail-end residual layer (ResBlock, 256) are connected in this order. Firstly, the motion vector prediction result with lower prediction precision obtained after the merge mode is executed on the corresponding inter-frame image under the current coding depth is acquired, comprising a coded image and a residual image of the next coding depth. Then, the coded image and the residual image are connected as the input information of the multi-layer tree-shaped CNN (both are used, but kept independent of each other).
In the multilayer tree CNN, extraction of pixel-level bottom layer features is performed by a convolution layer having a convolution kernel size of 3 × 3. And then acquiring a first residual block based on the bottom layer characteristics through the transition residual layer. Then, by the head end residual layer, the middle residual layer and the tail end residual layer, according to the residual block (namely, the first residual block, the second residual block and the third residual block) output by the previous residual layer, the image residual information in the residual block is convoluted with the bottom layer characteristics, and the convolution output of the corresponding residual block and the hierarchy (namely, the first convolution output of the head end residual layer, the second convolution output of the middle residual layer and the third convolution output of the tail end residual layer) is output. Finally, the convolution outputs of the three levels of residual layers are fully connected through a full connection layer (FC), and the current coding depth of the target inter-frame image and the partition mode of the coding block under partition are output according to the convolution outputs.
Since the primary merging mode + the multi-layer tree CNN can only make a decision on motion vector selection and partition mode decision at one coding depth, the above operations need to be repeated to make motion vector selection and partition mode decision at each coding depth under the condition that the maximum coding depth is not reached. And when the maximum coding depth is reached, the target inter-frame image can be selected and coded according to the partition mode and the motion vector of each coding block under each coding depth.
It should be noted that, because of the existence of the non-partition mode, which indicates that the coding block at the current coding depth already achieves the optimal partition effect and requires no further partitioning, in the operation process of the multi-layer tree CNN, the step of S4 is followed by the steps of:
s41: judging whether the partition mode of the coding block under the current coding depth and partition is a non-partition mode, if so, entering the step S42, and if not, entering the step S5;
s42: and stopping the partition mode decision of the subsequent coding depth of the coding block, and after the partition mode decision of all the coding blocks, coding the target inter-frame image according to the partition mode of each coding block under each coding depth.
In order to make the parameter quantity of the multi-layer tree-shaped CNN involved in calculation in the block division decision process consistent with that in the actual operation process and improve the training performance, an information vector connection layer (info. vector) is respectively connected between the head-end residual layer, the middle residual layer, the tail-end residual layer and the full connection layer and is used for performing information vector connection on the convolution output, the corresponding image number information and the quantization parameter of the coding block (if the information is unavailable, the information is set to zero). Correspondingly, after the step of S3, the method further includes the steps of:
s31: and performing information vector connection on the convolution output, the image number information and the quantization parameters of the coding block under the current coding depth and partition division.
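The information vector connection of step S31 can be sketched as appending the image number information and the quantization parameter to each flattened convolution output, with unavailable fields set to zero as described above (the function and field names here are illustrative, not from the patent):

```python
def info_vector_connect(conv_output, image_number=None, qp=None):
    """Concatenate a flattened convolution output with side information.

    Unavailable side information is set to zero, as in the description.
    """
    side = [
        float(image_number) if image_number is not None else 0.0,
        float(qp) if qp is not None else 0.0,
    ]
    return list(conv_output) + side

vec = info_vector_connect([0.5, -1.0, 2.0], image_number=8, qp=32)
vec_missing = info_vector_connect([0.5, -1.0, 2.0])  # side info unavailable
```

This keeps the vector length fed to the full connection layer constant whether or not the side information is available.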
Of course, it is obvious that the multi-layer tree CNN only based on the above structural and functional description cannot be applied to the actual inter-frame image coding process, and a training process is necessarily required before the actual operation. Therefore, before the multi-layer tree CNN is put into use, that is, before the step S1, the method further includes the steps of:
s0: and training a multilayer tree CNN based on partition division mode selection results under each coding depth acquired by the advanced motion vector mode and corresponding inter-frame images.
Since the partition division mode selection results obtained by the method of the invention are not yet available in the initial training stage, training data acquisition relies on the advanced motion vector prediction mode at the initial stage. After the multilayer tree CNN training is finished and the network has been put into operation for a period of time, it can be updated using the partition division mode selection results obtained by the method itself.
In the initial stage of designing the multilayer tree CNN, the cross entropy loss function adopted by training is as follows:

CE = −Σ_{i=1}^{C} p_i · log(q_i)

where s is the input sample formed from the pixel values of the input information, p_i and q_i respectively represent the true probability and the predicted probability that the input sample s takes the block splitting pattern of category i, and C is the total number of categories. Further, the predicted probability q_i can be expressed as:

q_i = e^{z_i} / Σ_{j=1}^{C} e^{z_j}

where j is the serial number of the category, e is the base of the natural logarithm, and z_j is the output value of the network for category j.
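As a small numerical sketch of the softmax and cross entropy computation (pure Python, with illustrative logit values that are not from the patent):

```python
import math

def softmax(z):
    """q_i = e^{z_i} / sum_j e^{z_j} over the C split-mode categories."""
    m = max(z)                      # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [v / s for v in exps]

def cross_entropy(p_true, q_pred):
    """CE = -sum_i p_i * log(q_i)."""
    return -sum(p * math.log(q) for p, q in zip(p_true, q_pred) if p > 0)

# 6 categories: Non-split, QT, BT_H, BT_V, TT_H, TT_V
logits = [2.0, 0.5, 0.1, 0.1, -1.0, -1.0]  # illustrative network outputs
q = softmax(logits)
p = [0, 1, 0, 0, 0, 0]                     # one-hot ground truth: QT
loss = cross_entropy(p, q)
```

With a one-hot ground truth, the cross entropy reduces to the negative log probability the network assigns to the true split mode.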
Therefore, in order to make the multi-layer tree-shaped CNN more suitable for the block division characteristics of the coding units under the VVC coding standard, the invention trains the multi-layer tree-shaped CNN by using a weighted classification cross entropy loss function, which can be expressed as:

loss = Σ_{l=1}^{L} α · w_l · CE_l

where loss is the weighted classification loss, L is the total number of residual layers in the multi-layer tree CNN, α is a constant that is initially 1, w_l is the weight of the l-th residual layer, and CE_l is the cross entropy loss of the multi-layer tree CNN at the l-th residual layer. In addition, α needs to be updated iteratively through training after the multilayer tree-shaped CNN has been in application for a period of time. It can be seen from the formula that, when the residual layers at different levels are trained, more weight can be given to the loss of the head-end residual layer in the early stage of training, and as learning progresses, the residual layers at lower levels (the middle residual layer and the tail-end residual layer) also obtain more weight, so that the multi-layer tree-shaped CNN trained under this cross entropy loss function can solve problems under complex conditions more effectively.
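A minimal sketch of the weighted classification loss, with illustrative per-layer weights that shift emphasis from the head-end residual layer toward the deeper layers as training progresses (the weight values are assumptions for illustration):

```python
def weighted_loss(ce_per_layer, weights, alpha=1.0):
    """loss = sum_l alpha * w_l * CE_l over the L residual layers."""
    assert len(ce_per_layer) == len(weights)
    return sum(alpha * w * ce for w, ce in zip(weights, ce_per_layer))

ce = [0.9, 0.7, 0.5]  # CE_l for head-end, middle, tail-end residual layers
early = weighted_loss(ce, [0.6, 0.3, 0.1])  # early training: head layer dominates
late = weighted_loss(ce, [0.1, 0.3, 0.6])   # later training: deeper layers dominate
```

Reweighting changes which layer's error drives the gradient, without altering the per-layer cross entropy terms themselves.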
In summary, the interframe image mode decision method based on the convolutional neural network combines the merge mode (Merge) in the inter prediction mode with a Convolutional Neural Network (CNN), and utilizes the advantages of the low coding bit rate requirement of the merge mode and the feature learning of the convolutional neural network, so as to reduce the time required by coding while maintaining the rate-distortion performance.
The image number information, the current coding depth and the quantization parameters of the coding blocks under the partition division are added into the multilayer tree-shaped CNN, so that the multilayer tree-shaped CNN can better accord with the parameter characteristics in the actual coding process, and the decision accuracy is further improved.
Aiming at the problem of coding block partitioning, considering that the partitioning of blocks is similar to a tree-shaped splitting structure, the hierarchical splitting tree characteristics formed when the coding block is partitioned are learned through the multilayer tree-shaped CNN by setting weights that vary with iteration. Meanwhile, in the training process of the multilayer tree-shaped CNN, higher weights are set for the corresponding levels in different training stages, so that the trained multilayer tree-shaped CNN can solve complex problems more effectively.
Example two
In order to better verify the technical effect of the method of the present invention, this embodiment is described by a set of specific experimental data. Specifically, the performance of the algorithm is verified by comparing the rate distortion and the computational complexity of the algorithm and a VVC reference model (VTM) encoder, and standard VVC video sequences are adopted in the experimental tests. In training the multi-layer tree CNN, an Adam optimizer is used, with an initial learning rate of 0.0008. To evaluate the performance of the proposed algorithm, the BDBR (Bjøntegaard Delta Bit Rate) is used to evaluate its overall rate-distortion characteristics, and the reduction in coding computation complexity is measured as the average coding time saving (ΔT).
ΔT = (1/N) · Σ_{QP} [(T_ref(QP) − T_prop(QP)) / T_ref(QP)] × 100%

where T_ref(QP) and T_prop(QP) are respectively the coding time of the reference software and the coding time of the algorithm proposed by this patent under the different quantization parameter (QP) values, and N is the number of QP values tested. The experimental results are shown in table 1, and it can be seen that the method of the present invention can reduce the encoding time by 34%, while the encoding efficiency is only lost by 1.1%, thus confirming the effectiveness of the present invention.
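The average coding time saving can be computed as sketched below; the per-QP timing numbers are hypothetical placeholders chosen for illustration, not the patent's measurements:

```python
def delta_t(t_ref, t_prop):
    """Average percentage coding-time saving over the tested QP values."""
    assert set(t_ref) == set(t_prop)
    savings = [(t_ref[qp] - t_prop[qp]) / t_ref[qp] for qp in t_ref]
    return 100.0 * sum(savings) / len(savings)

# Hypothetical per-QP coding times in seconds (reference encoder vs. proposed).
t_ref = {22: 100.0, 27: 80.0, 32: 60.0, 37: 50.0}
t_prop = {22: 66.0, 27: 52.8, 32: 39.6, 37: 33.0}
saving = delta_t(t_ref, t_prop)
```

Each per-QP saving is a ratio of times, so the metric is insensitive to the absolute speed of the test machine.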
Table 1: list of experimental results
It should be noted that all the directional indicators (such as up, down, left, right, front, and rear … …) in the embodiment of the present invention are only used to explain the relative position relationship between the components, the movement situation, etc. in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indicator is changed accordingly.
Moreover, descriptions of the present invention as relating to "first," "second," "a," etc. are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "connected," "secured," and the like are to be construed broadly, and for example, "secured" may be a fixed connection, a removable connection, or an integral part; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or they may be interconnected within two elements or in a relationship where two elements interact with each other unless otherwise specifically limited. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In addition, the technical solutions in the embodiments of the present invention may be combined with each other, but it must be based on the realization of those skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination of technical solutions should not be considered to exist, and is not within the protection scope of the present invention.

Claims (8)

1. An interframe image mode decision method based on a convolutional neural network is characterized by comprising the following steps:
s1: acquiring a coded image and a residual image of a next coding depth after a target inter-frame image executes a merging mode;
s2: extracting bottom layer characteristics in the input information through a convolutional layer in the multilayer tree-shaped CNN by taking the connected coded image and the residual image as the input information;
s3: performing layer-by-layer convolution based on bottom layer characteristics through residual layers of preset layer levels in the multilayer tree-shaped CNN, and acquiring convolution output of each layer;
s4: performing full connection of each layer of convolution output through a full connection layer in the multilayer tree-shaped CNN and obtaining the current coding depth of a target inter-frame image and a partition mode of a coding block under partition;
s5: and judging whether the current coding depth reaches the maximum depth, if so, coding the target inter-frame image according to the partition mode of each coding block under each coding depth, and if not, entering the next coding depth and returning to the step of S1.
2. The convolutional neural network-based interframe image mode decision method as claimed in claim 1, wherein said step of S1 is preceded by the step of:
s0: and training a multilayer tree CNN based on the partition mode selection result under each coding depth acquired by the advanced motion vector mode and the corresponding interframe image.
3. The convolutional neural network-based interframe image mode decision method as claimed in claim 2, wherein in the step S0, the multi-layer tree-shaped CNN is trained based on a weighted classification cross entropy loss function, and the weighted classification cross entropy loss function can be expressed as the following formula:

loss = Σ_{l=1}^{L} α · w_l · CE_l

where loss is the weighted classification loss, L is the total number of residual layers in the multi-layer tree CNN, α is a constant that is initially 1, w_l is the weight of the l-th residual layer, and CE_l is the cross entropy loss of the multi-layer tree CNN at the l-th residual layer.
4. The convolutional neural network-based interframe image mode decision method as claimed in claim 1, wherein the partition modes include the undivided mode, the quadtree mode, the horizontal binary tree mode, the vertical binary tree mode, the horizontal ternary tree mode, and the vertical ternary tree mode.
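The six candidate modes of claim 4 map directly onto a small enumeration; the identifier names below are illustrative, mirroring the quadtree / binary-tree / ternary-tree split types:

```python
from enum import Enum

class PartitionMode(Enum):
    """The six candidate partition modes of claim 4 (names illustrative)."""
    NO_SPLIT = 0       # undivided mode: the block is not partitioned further
    QUAD_TREE = 1      # quadtree mode: split into four equal sub-blocks
    HORIZONTAL_BT = 2  # horizontal binary tree: split into top/bottom halves
    VERTICAL_BT = 3    # vertical binary tree: split into left/right halves
    HORIZONTAL_TT = 4  # horizontal ternary tree: split into three horizontal parts
    VERTICAL_TT = 5    # vertical ternary tree: split into three vertical parts
```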
5. The convolutional neural network-based interframe image mode decision method as claimed in claim 4, wherein step S4 is followed by the steps of:
S41: judging whether the partition mode of the coding block at the current coding depth is the undivided mode; if so, entering step S42; if not, entering step S5;
S42: stopping the partition mode decision at subsequent coding depths for the coding block, and after the partition mode decisions of all coding blocks are completed, coding the target inter-frame image according to the partition mode of each coding block at each coding depth.
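The early-termination rule of S41/S42, combined with the depth check of S5, reduces to a single predicate per coding block; `NO_SPLIT` below is an assumed label for the undivided (non-partition) mode:

```python
NO_SPLIT = "NO_SPLIT"  # assumed label for the undivided mode

def should_descend(mode, depth, max_depth):
    """S41/S42 plus S5: a coding block only descends to the next coding
    depth if it chose to split further (not the undivided mode) and the
    maximum depth is not yet reached; otherwise its decision is final."""
    return mode != NO_SPLIT and depth < max_depth
```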
6. The convolutional neural network-based interframe image mode decision method as claimed in claim 1, wherein after step S3, the method further comprises the step of:
S31: performing information vector concatenation of the convolution output with the image number information and the quantization parameter of the coding block at the current coding depth and partition.
7. The convolutional neural network-based interframe image mode decision method as claimed in claim 1, wherein the multilayer tree-shaped CNN comprises:
a convolutional layer, comprising a 3 × 3 convolution kernel, for extracting bottom-layer features from the input information;
a transition residual layer, for outputting a first residual block according to the bottom-layer features;
a head residual layer, for outputting a first convolution output and a second residual block through convolution between the bottom-layer features and the first residual block;
a middle residual layer, for outputting a second convolution output and a third residual block through convolution between the bottom-layer features and the second residual block;
a tail residual layer, for outputting a third convolution output through convolution between the bottom-layer features and the third residual block;
a full connection layer, for fully connecting the first convolution output, the second convolution output, and the third convolution output and outputting a partition mode decision;
wherein the convolutional layer, the transition residual layer, the head residual layer, the middle residual layer, and the tail residual layer are connected in sequence.
8. The convolutional neural network-based interframe image mode decision method as claimed in claim 7, wherein an information vector connection layer is respectively connected between each of the head residual layer, the middle residual layer, and the tail residual layer and the full connection layer, the information vector connection layer being used to perform information vector concatenation between the corresponding convolution output and the image number information and quantization parameter of the coding block.
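Read together, claims 7 and 8 describe a fixed layer chain whose three convolution outputs are each fused with the image number and quantization parameter information vector before the fully connected decision. The structural sketch below uses placeholder callables for every layer (no real convolutions) purely to show the data flow; all names are illustrative:

```python
def tree_cnn_forward(x, conv, trans, head, mid, tail, fc, info_vec):
    """Data-flow sketch of the multilayer tree-shaped CNN (claims 7-8)."""
    feat = conv(x)             # 3x3 convolutional layer: bottom-layer features
    r1 = trans(feat)           # transition residual layer -> first residual block
    out1, r2 = head(feat, r1)  # head residual layer -> first output, second block
    out2, r3 = mid(feat, r2)   # middle residual layer -> second output, third block
    out3 = tail(feat, r3)      # tail residual layer -> third output
    # claim 8: each convolution output is concatenated with the image
    # number / quantization parameter information vector
    fused = [out + info_vec for out in (out1, out2, out3)]
    return fc(fused)           # full connection layer -> partition mode decision
```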
CN202210407485.0A 2022-04-19 2022-04-19 Interframe image mode decision method based on convolutional neural network Active CN114513660B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210407485.0A CN114513660B (en) 2022-04-19 2022-04-19 Interframe image mode decision method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210407485.0A CN114513660B (en) 2022-04-19 2022-04-19 Interframe image mode decision method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN114513660A true CN114513660A (en) 2022-05-17
CN114513660B CN114513660B (en) 2022-09-06

Family

ID=81555492

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210407485.0A Active CN114513660B (en) 2022-04-19 2022-04-19 Interframe image mode decision method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN114513660B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609601A (en) * 2017-09-28 2018-01-19 北京计算机技术及应用研究所 A kind of ship seakeeping method based on multilayer convolutional neural networks
CN108924558A (en) * 2018-06-22 2018-11-30 电子科技大学 A kind of predictive encoding of video method neural network based
CN110087087A (en) * 2019-04-09 2019-08-02 同济大学 VVC interframe encode unit prediction mode shifts to an earlier date decision and block divides and shifts to an earlier date terminating method
WO2020190297A1 (en) * 2019-03-21 2020-09-24 Google Llc Using rate distortion cost as a loss function for deep learning
US20200344474A1 (en) * 2017-12-14 2020-10-29 Interdigital Vc Holdings, Inc. Deep learning based image partitioning for video compression
CN112261414A (en) * 2020-09-27 2021-01-22 电子科技大学 Video coding convolution filtering method divided by attention mechanism fusion unit
CN112702599A (en) * 2020-12-24 2021-04-23 重庆理工大学 VVC intra-frame rapid coding method based on deep learning
CN112887712A (en) * 2021-02-03 2021-06-01 重庆邮电大学 HEVC intra-frame CTU partitioning method based on convolutional neural network
WO2021228513A1 (en) * 2020-05-15 2021-11-18 Huawei Technologies Co., Ltd. Learned downsampling based cnn filter for image and video coding using learned downsampling feature
US20220086463A1 (en) * 2020-09-16 2022-03-17 Qualcomm Incorporated End-to-end neural network based video coding
US20220094928A1 (en) * 2019-11-07 2022-03-24 Bitmovin, Inc. Fast Multi-Rate Encoding for Adaptive Streaming Using Machine Learning
CN114286093A (en) * 2021-12-24 2022-04-05 杭州电子科技大学 Rapid video coding method based on deep neural network

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
F. GALPIN,ET AL.: "AHG9: CNN-based driving of block partitioning for intra slices encoding", 《JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11,10TH MEETING: SAN DIEGO, US, 10–20 APR. 2018》 *
GARY SULLIVAN,ET AL.: "Meeting Report of the 19th Meeting of the Joint Video Experts Team (JVET)", 《JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 19TH MEETING, BY TELECONFERENCE,22 JUNE-1 JULY 2020》 *
GENWEI TANG,ET AL.: "Adaptive CU Split Decision with Pooling-variable CNN for VVC Intra Encoding", 《2019 IEEE VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP)》 *
GERHARD TECH,ET AL.: "Fast Partitioning for VVC Intra-Picture Encoding with a CNN Minimizing the Rate-Distortion-Time Cost", 《 2021 DATA COMPRESSION CONFERENCE (DCC)》 *
SAMEENA JAVAID,ET AL.: "VVC/H.266 Intra Mode QTMT Based CU Partition Using CNN", 《IEEE ACCESS ( VOLUME: 10)》 *
TIM HELLMAN,ET AL.: "AHG16/AHG17: Simplification of CU Splitting Controls", 《JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11,16TH MEETING: GENEVA, CH, 1–11 OCTOBER 2019 》 *
TING-LAN LIN,ET AL.: "Fast Binary Tree Partition Decision in H.266/FVC Intra Coding", 《 2018 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS-TAIWAN (ICCE-TW)》 *
ZHAOQING PAN,ET AL.: "A CNN-Based Fast Inter Coding Method for VVC", 《IEEE SIGNAL PROCESSING LETTERS ( VOLUME: 28)》 *
PENG SHUANG, ET AL.: "Fast QTMT partitioning based on deep learning", 《TELECOMMUNICATIONS SCIENCE》 *

Also Published As

Publication number Publication date
CN114513660B (en) 2022-09-06

Similar Documents

Publication Publication Date Title
US11350124B2 (en) Image processing method and image processing device
WO2016141609A1 (en) Image prediction method and related device
CN100527842C (en) Background-based motion estimation coding method
WO2021032205A1 (en) System, method, codec device for intra frame and inter frame joint prediction
JP2002532026A (en) Improvement of motion estimation and block matching pattern
CN101816183A (en) Method and apparatus for inter prediction encoding/decoding an image using sub-pixel motion estimation
TW202025730A (en) Intra mode coding based on history information
JP2006518157A (en) Method and apparatus for object-based motion compensation
WO2022063265A1 (en) Inter-frame prediction method and apparatus
CN110062239B (en) Reference frame selection method and device for video coding
CN114513660B (en) Interframe image mode decision method based on convolutional neural network
CN114900691B (en) Encoding method, encoder, and computer-readable storage medium
CN110581993A (en) Coding unit rapid partitioning method based on intra-frame coding in multipurpose coding
CN113965753A (en) Inter-frame image motion estimation method and system based on code rate control
CN113810715A (en) Video compression reference image generation method based on void convolutional neural network
CN105828084B (en) HEVC (high efficiency video coding) inter-frame coding processing method and device
CN110392264B (en) Alignment extrapolation frame method based on neural network
CN101268623A (en) Variable shape motion estimation in video sequence
JP4438949B2 (en) Motion compensated predictive coding apparatus, motion compensated predictive coding method, and program
CN112804525B (en) IBC mode intra block copy prediction method, device, medium and equipment
KR102226693B1 (en) Fast motion estimation method and apparatus for the same
Nortje et al. Deep motion estimation for parallel inter-frame prediction in video compression
CN114466199A (en) Reference frame generation method and system applicable to VVC (variable valve timing) coding standard
CN110944198A (en) Chroma mode intra coding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant