CN111901596B - Video hybrid encoding and decoding method, device and medium based on deep learning - Google Patents

Video hybrid encoding and decoding method, device and medium based on deep learning

Info

Publication number
CN111901596B
Authority
CN
China
Prior art keywords
frame image
decoding
video
bottleneck layer
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010604772.1A
Other languages
Chinese (zh)
Other versions
CN111901596A (en)
Inventor
贾川民
马思伟
王苫社
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN202010604772.1A
Publication of CN111901596A
Application granted
Publication of CN111901596B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/124 Quantisation
    • H04N19/184 Adaptive coding characterised by the coding unit, the unit being bits, e.g. of the compressed video stream
    • H04N19/60 Transform coding
    • H04N19/82 Details of filtering operations specially adapted for video compression, involving filtering within a prediction loop

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract


The invention discloses a deep-learning-based video hybrid encoding and decoding method, device and medium. The encoding method includes: extracting bottleneck layer features from a specified frame image; reconstructing a first frame image according to the bottleneck layer features; quantizing and entropy-encoding the bottleneck layer features to obtain intra-frame coded data; and compensating, transforming, quantizing and entropy-encoding a first subsequent frame image of the current video to obtain first prediction residual data. The decoding method includes: entropy-decoding the intra-frame coded data to obtain the bottleneck layer features and decode the specified frame image; and entropy-decoding, inverse-quantizing, inverse-transforming and compensating the first prediction residual data, then loop-filtering the compensated data to decode the first subsequent frame image. The encoding and decoding devices correspond to the respective encoding and decoding methods. The invention provides a brand-new video encoding and decoding scheme that achieves efficient compression and fast decoding of video and greatly improves video compression performance.


Description

Video hybrid coding and decoding method, device and medium based on deep learning
Technical Field
The present invention relates to the field of video coding technologies, and in particular, to a video hybrid coding and decoding method and apparatus, and a computer storage medium.
Background
At present, the traditional hybrid coding framework mainly performs predictive transform coding on image blocks of different sizes, and improvement schemes often focus on local rate-distortion optimization of individual coding tools. For example, entropy estimation models are continually improved: probability estimation models based on Gaussian mixture models and Gaussian hyper-prior entropy estimation models have been proposed, and context models built on autoregressive coding frameworks help end-to-end image coding frameworks obtain higher coding gain. However, existing video hybrid coding schemes often struggle to meet higher video compression requirements.
Therefore, how to further improve the video compression rate through hybrid coding has long been a key technical problem for those skilled in the art.
Disclosure of Invention
To address the difficulty of further improving the compression rate of existing video hybrid coding schemes, the invention provides a deep-learning-based video hybrid encoding and decoding method, device and medium that better overcome the shortcomings of existing schemes.
To achieve the technical purpose, the present invention specifically discloses a video hybrid coding method based on deep learning, which includes, but is not limited to, the following processes.
Extracting bottleneck layer features from a specified frame image of the current video.
Reconstructing a first frame image according to the bottleneck layer features, and sequentially quantizing and entropy-encoding the bottleneck layer features to obtain intra-frame coded data for writing into a code stream.
Compensating a first subsequent frame image of the current video by using the first frame image, and sequentially transforming, quantizing and entropy-encoding the compensated image to obtain first prediction residual data for writing into the code stream.
Further, the hybrid encoding method further includes:
compensating a second subsequent frame image of the current video by using the first frame image or the first subsequent frame image, and then transforming, quantizing and entropy-encoding the compensated image to obtain second prediction residual data for writing into the code stream.
Wherein the designated frame image, the first subsequent frame image, and the second subsequent frame image appear in the current video in a front-to-back order.
Further, the process of compensating the second subsequent frame image by using the first subsequent frame image comprises:
sequentially performing transformation, quantization, inverse quantization, inverse transformation, secondary compensation and loop filtering on the first subsequent frame image, taking the loop-filtered image as a reconstructed second frame image, and compensating the second subsequent frame image by using the second frame image.
Further, the process of extracting the bottleneck layer feature from the specified frame image of the current video includes:
sequentially grouping all frame images of the current video in order from front to back to obtain a plurality of groups of images, and taking the first frame image of each group as the specified frame image, wherein each group of images includes the first subsequent frame image and the second subsequent frame image.
Extracting the bottleneck layer features from the first frame image of the current group of images.
Further, before the quantization and entropy coding are sequentially performed on the bottleneck layer features, the method further includes:
performing rate estimation on the bottleneck layer features to be quantized by using a hyper-prior network.
In order to achieve the technical purpose, the invention also discloses a video decoding method based on deep learning, which is used for decoding data generated by the video hybrid coding method according to any embodiment of the invention; the video decoding method based on deep learning comprises the following steps:
entropy decoding intra-frame coded data in the received code stream to obtain bottleneck layer characteristics, and decoding a specified frame image according to the bottleneck layer characteristics.
performing entropy decoding, inverse quantization and inverse transformation on first prediction residual data in the received code stream, compensating by using the specified frame image, and performing loop filtering on the compensated data, so as to decode a first subsequent frame image by means of loop filtering.
The deep-learning-based video decoding method further comprises:
performing entropy decoding, inverse quantization and inverse transformation on second prediction residual data in the received code stream, compensating by using the first subsequent frame image or the specified frame image, and performing loop filtering on the compensated image, so as to decode a second subsequent frame image by means of loop filtering.
In order to achieve the technical object, the present invention also specifically discloses a video hybrid coding device based on deep learning, which includes, but is not limited to, the following structure.
The analysis network module is configured to extract bottleneck layer features from a specified frame image of the current video.
The encoding-end generation network module is configured to reconstruct a first frame image according to the bottleneck layer features.
The quantization module quantizes the bottleneck layer features.
The entropy coding module is configured to entropy-encode the quantized bottleneck layer features to obtain intra-frame coded data for writing into the code stream.
The encoding-end compensation module is configured to compensate a first subsequent frame image of the current video by using the first frame image.
The transformation module is configured to transform the compensated image.
The quantization module is further configured to quantize the transformed image.
The entropy coding module is further configured to entropy-encode the quantized image to obtain first prediction residual data for writing into the code stream.
In order to achieve the technical object, the invention also discloses a video decoding device based on deep learning, which is used for decoding the data generated by the video hybrid coding device of any embodiment of the invention; the video decoding apparatus based on the deep learning includes, but is not limited to, the following structure.
The entropy decoding module is configured to entropy-decode intra-frame coded data in the received code stream to obtain bottleneck layer features, and is further configured to entropy-decode first prediction residual data in the received code stream.
The decoding-end generation network module is configured to decode the specified frame image according to the bottleneck layer features.
The decoding-end inverse quantization module is configured to inverse-quantize the entropy-decoded first prediction residual data.
The decoding-end inverse transformation module is configured to inverse-transform the inverse-quantized first prediction residual data.
The decoding-end compensation module is configured to compensate the inverse-transformed first prediction residual data by using the specified frame image.
The decoding-end loop filtering module is configured to perform loop filtering on the compensated first prediction residual data, so as to decode a first subsequent frame image by means of loop filtering.
To achieve the above technical object, the present invention further provides a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements a deep learning based video hybrid encoding method in any embodiment of the present invention or a deep learning based video decoding method in any embodiment of the present invention.
The invention has the following beneficial effects: it provides a novel video encoding and decoding scheme that achieves efficient compression and fast decoding of video, greatly improves video compression performance, and better addresses the difficulties of existing video hybrid coding schemes, such as hard-to-improve compression ratios and slow decoding.
Compared with conventional techniques, and in particular with schemes that generate intra-frame prediction pixels from neighboring pixels, the method does not need to encode residual information during intra-frame encoding, so its coding efficiency is far higher than that of conventional coding schemes and video coding performance is greatly improved.
In addition, the weight data of the analysis network and the generation network in the deep-learning self-encoder are stored offline and need not be transmitted in the code stream, so the self-encoder effectively reduces the bitstream size.
Drawings
Fig. 1 is a flow chart of a video hybrid coding method based on deep learning according to some embodiments of the present invention.
Fig. 2 illustrates a schematic diagram of a coding end framework in some embodiments of the invention.
Fig. 3 shows a decoding end frame diagram in some embodiments of the invention.
FIG. 4 illustrates a schematic diagram of the operational status of an analysis network module and a generation network module in some embodiments of the invention.
Fig. 5 shows a rate-distortion performance comparison between the invention and various conventional coding algorithms.
Detailed Description
The following describes the deep-learning-based video hybrid encoding and decoding method and apparatus, and the computer-readable storage medium, in detail with reference to the drawings of the specification.
Example one:
As shown in figs. 1 and 2, this embodiment provides a deep-learning-based video hybrid coding method in which the intra-frame coding work is performed by a deep-learning self-encoder. In particular, the video hybrid encoding method may include, but is not limited to, the following steps.
First, an intra-frame coding process is performed: bottleneck layer features are extracted from a specified frame image of the current video. In some preferred embodiments of the present invention, this proceeds as follows. All frame images of the current video are sequentially grouped in order from front to back to obtain multiple groups of images, and the first frame image of each group is taken as the specified frame image; for example, a video of 1600 frames can be divided into 100 groups of 16 frames each. Each group of images includes a first subsequent frame image and a second subsequent frame image. The bottleneck layer features are extracted from the first frame image of the current group. This extraction can be realized by a deep-learning self-encoder (auto-encoder) built on a convolutional neural network; the self-encoder has a bottleneck layer, a different self-encoder can be trained for each rate point (quantization parameter value), and different network weights can be configured during training to realize coding at different rates. Continuing the example above, if a group of images is numbered 1 to 16 from front to back, image 1 is the specified frame image, image 2 is a first subsequent frame image, and each of images 3-16 may be a first subsequent frame image or a second subsequent frame image. A minimal sketch of the grouping and of such a self-encoder follows.
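For concreteness, the sketch below shows a frame-grouping helper and a convolutional self-encoder with a bottleneck layer, written in PyTorch. The layer counts, channel widths, kernel sizes and GOP size are illustrative assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

class IntraAutoEncoder(nn.Module):
    """Self-encoder whose narrowest layer serves as the bottleneck layer."""

    def __init__(self, bottleneck_channels=192):
        super().__init__()
        # Analysis network: maps the specified frame image to bottleneck features.
        self.analysis = nn.Sequential(
            nn.Conv2d(3, 128, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(128, 128, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(128, bottleneck_channels, 5, stride=2, padding=2),
        )
        # Generation network: reconstructs the first frame image from the features.
        self.generation = nn.Sequential(
            nn.ConvTranspose2d(bottleneck_channels, 128, 5, stride=2,
                               padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 128, 5, stride=2,
                               padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 3, 5, stride=2,
                               padding=2, output_padding=1),
        )

    def forward(self, frame):
        features = self.analysis(frame)       # bottleneck layer features
        recon = self.generation(features)     # reconstructed intra frame
        return features, recon

def group_frames(num_frames, gop_size=16):
    """E.g. 1600 frames -> 100 groups of 16; frame 0 of each group is intra."""
    return [list(range(s, min(s + gop_size, num_frames)))
            for s in range(0, num_frames, gop_size)]
```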
As shown in fig. 2, before extracting the bottleneck layer features, this embodiment determines whether the current frame is the specified frame Fn. That is, the encoding mode must be selected as either an intra mode or a non-intra mode; in intra mode, bottleneck layer feature extraction is performed on the first frame of each group of pictures. Non-intra modes in this embodiment include, but are not limited to, an inter mode, a skip mode, a merge mode, an affine motion mode, and the like.
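A toy sketch of this mode decision follows, under the assumption (consistent with the 16-frame grouping example above) that the first frame of each group is coded in intra mode and every other frame in some non-intra mode, abstracted here as "inter".

```python
def select_mode(frame_index, gop_size=16):
    # Frame 0 of each group is the specified frame Fn and is coded intra;
    # the actual non-intra sub-mode (inter/skip/merge/affine) is chosen later.
    return "intra" if frame_index % gop_size == 0 else "inter"

assert select_mode(0) == "intra" and select_mode(5) == "inter"
```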
Second, the first frame image is reconstructed according to the bottleneck layer features, and the bottleneck layer features are sequentially quantized and entropy-coded to obtain intra-frame coded data for writing into the code stream and transmitting through it. The invention can thus improve overall video coding efficiency by improving intra-frame coding efficiency. In some preferred embodiments of the present invention, before the bottleneck layer features are sequentially quantized and entropy-coded, rate estimation is performed on the features to be quantized using a hyper-prior network. With this rate estimation, the compression rate can be further improved, yielding a smaller code stream and faster transmission under the same conditions.
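The patent does not spell out the hyper-prior network itself. The sketch below assumes, as is common in learned image coding, rounding quantization and a conditional Gaussian entropy model whose mean and scale would be produced by the hyper-prior branch (PyTorch assumed; tensor shapes are illustrative).

```python
import torch

def quantize(features):
    # Hard rounding at inference; training would typically substitute
    # additive uniform noise or a straight-through estimator.
    return torch.round(features)

def estimate_rate_bits(q, mu, sigma):
    # Probability mass of each integer symbol under N(mu, sigma^2),
    # integrated over the quantization bin [q - 0.5, q + 0.5].
    dist = torch.distributions.Normal(mu, sigma)
    p = dist.cdf(q + 0.5) - dist.cdf(q - 0.5)
    return -torch.log2(p.clamp_min(1e-9)).sum()

features = torch.randn(1, 192, 16, 16) * 3   # toy bottleneck features
mu = torch.zeros_like(features)               # mean and scale would come
sigma = torch.ones_like(features)             # from the hyper-prior branch
q = quantize(features)
print(f"estimated rate: {estimate_rate_bits(q, mu, sigma).item():.0f} bits")
```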
Third, an inter-frame motion-compensated prediction coding process is performed: the first frame image is used to compensate a first subsequent frame image of the current video, and the compensated image is sequentially transformed, quantized and entropy-coded to obtain first prediction residual data for writing into the code stream and transmitting through it; any group of images of the current video may generate first prediction residual data one or more times. The inter coding mode of this embodiment may be a block-based inter motion-compensated predictive coding mode. The transform may be an orthogonal transform, including but not limited to the discrete cosine transform (DCT), wavelet transform, or Hadamard transform; the transform exposes signals to which the human eye is insensitive. Quantization achieves compression by removing non-critical or unimportant signals, such as those insensitive to the human eye, and entropy coding achieves further compression.
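As an illustration of the transform and quantization steps, here is a self-contained 8x8 orthonormal DCT with uniform quantization in NumPy. The quantization step size is a placeholder, not a value from the patent.

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis: rows are frequencies, columns are samples.
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

C = dct_matrix(8)

def transform_quantize(residual_block, qstep=16.0):
    coeffs = C @ residual_block @ C.T     # forward 2-D DCT
    return np.round(coeffs / qstep)       # uniform quantization

def dequantize_inverse(levels, qstep=16.0):
    return C.T @ (levels * qstep) @ C     # inverse quantize + inverse 2-D DCT

block = np.random.randn(8, 8) * 10        # toy motion-compensated residual
levels = transform_quantize(block)
recon = dequantize_inverse(levels)
print("max reconstruction error:", np.abs(block - recon).max())
```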
Finally, this embodiment compensates a second subsequent frame image of the current video by using the first frame image or the first subsequent frame image, then transforms, quantizes and entropy-codes the compensated image to obtain second prediction residual data for writing into the code stream. Compensating the second subsequent frame image with the first frame image is similar to compensating the first subsequent frame image with the first frame image and is not repeated here. In some preferred embodiments of the present invention, compensating the second subsequent frame image with the first subsequent frame image proceeds as follows: the first subsequent frame image is sequentially subjected to transformation, quantization, inverse quantization, inverse transformation, secondary compensation and loop filtering (Loop Filter), and the loop-filtered image is taken as the reconstructed second frame image, which is then used to compensate the second subsequent frame image. The specified frame image, the first subsequent frame image and the second subsequent frame image appear in the current video in front-to-back order; it should be understood that a preceding second subsequent frame image may be used to compensate a following one. This embodiment can be executed in a loop: after one group of frame images is encoded, the bottleneck layer feature extraction step above is restarted for the next group.
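The reference-frame reconstruction loop sketched below reuses the 8x8 DCT helpers from the previous example. The simple blur standing in for the loop filter and the function name are illustrative assumptions, since the patent does not specify the filter.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def reconstruct_reference(frame, prediction, qstep=16.0):
    """frame, prediction: HxW arrays with H and W multiples of 8."""
    residual = frame - prediction                 # motion-compensated residual
    recon = np.empty_like(residual)
    h, w = residual.shape
    for y in range(0, h, 8):                      # blockwise transform/quantize,
        for x in range(0, w, 8):                  # then inverse quantize/transform
            levels = transform_quantize(residual[y:y+8, x:x+8], qstep)
            recon[y:y+8, x:x+8] = dequantize_inverse(levels, qstep)
    compensated = prediction + recon              # secondary compensation
    return uniform_filter(compensated, size=3)    # stand-in loop filter
```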
As shown in fig. 5, which compares the rate-distortion performance of the invention with that of various conventional coding algorithms, the deep-learning-based video coding method provided by the invention performs better.
Example two:
As shown in figs. 3 and 4, this embodiment provides a deep-learning-based video decoding method, which corresponds to the video encoding method of any embodiment of the present invention and is used to decode the data generated by the video hybrid encoding method of any embodiment of the present invention.
Specifically, the video decoding method based on deep learning of the present embodiment includes, but is not limited to, the following steps.
First, entropy decoding is performed on the intra-frame coded data in the received code stream to obtain the bottleneck layer features, and the specified frame image is decoded according to them.
Then, entropy decoding, inverse quantization and inverse transformation are performed on the first prediction residual data in the received code stream, compensation is performed using the specified frame image, and loop filtering is applied to the compensated data, so that the first subsequent frame image is decoded by means of loop filtering.
Finally, entropy decoding, inverse quantization and inverse transformation can be performed on the second prediction residual data in the received code stream, compensation is performed using the first subsequent frame image or the specified frame image, and loop filtering is applied to the compensated image, so that the second subsequent frame image is decoded by means of loop filtering.
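Mirroring the encoder, here is a hedged sketch of decoding one subsequent frame from already entropy-decoded residual levels, again reusing the helpers above. The identity "motion prediction" (copying the reference) and the blur loop filter are stand-ins for components the patent leaves unspecified.

```python
def decode_subsequent_frame(levels_blocks, reference, qstep=16.0):
    """levels_blocks: dict mapping (y, x) -> 8x8 array of quantized coefficients."""
    prediction = reference.copy()                 # stand-in motion compensation
    recon = prediction.copy()
    for (y, x), levels in levels_blocks.items():  # inverse quantize + inverse
        recon[y:y+8, x:x+8] = (                   # transform, then compensate
            prediction[y:y+8, x:x+8] + dequantize_inverse(levels, qstep)
        )
    return uniform_filter(recon, size=3)          # stand-in loop filter
```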
Example three:
the present embodiment is based on the same inventive concept as the first embodiment, and can specifically provide a video hybrid coding device based on deep learning. The device can provide a video coding framework fusing a depth intra-frame self-encoder and inter-frame motion compensation prediction, an intra-frame mode or a non-intra-frame mode can be selected at a coding end, efficient coding of video is achieved, and further compression of the video is completed.
The analysis network module takes the original signal as input and outputs the bottleneck layer features; that is, it is used to extract the bottleneck layer features from the specified frame image of the current video. In this embodiment, all frame images of the current video are sequentially grouped in order from front to back to obtain multiple groups of images, and the first frame image of each group is taken as the specified frame image; each group of images may include a first subsequent frame image and a second subsequent frame image, and the bottleneck layer features are extracted from the first frame image of the current group.
The encoding-end generation network module is used to reconstruct the first frame image according to the bottleneck layer features, thereby obtaining P illustrated in fig. 2. At the encoding end, the analysis network module and the encoding-end generation network module of this embodiment together form a deep-learning self-encoder built on a convolutional neural network, and this self-encoder has a bottleneck layer.
The quantization module quantizes the bottleneck layer features.
The entropy coding module is used to entropy-encode the quantized bottleneck layer features to obtain intra-frame coded data for writing into the code stream.
The encoding end compensation module is used for compensating a first subsequent frame image of the current video by using the first frame image P; and is also used for compensating a second subsequent frame image in the current video by using the first frame image P or the first subsequent frame image.
The transformation module is used for transforming the compensated image; the transformation module may also be used to transform the first subsequent frame image.
The hyper-prior coding module can be used to perform rate estimation, via a hyper-prior network, on the bottleneck layer features to be quantized.
The quantization module is also used for quantizing the transformed image; and the quantization module may also be used to quantize the transformed first subsequent frame image.
The encoding-end inverse quantization module is used to inverse-quantize the quantized first subsequent frame image.
The encoding-end inverse transformation module is used to inverse-transform the inverse-quantized first subsequent frame image to obtain Dn′ shown in fig. 2.
The encoding-end secondary compensation module can be used to secondarily compensate the inverse-transformed first subsequent frame image to obtain uFn′ shown in fig. 2.
The encoding-end loop filtering module is configured to perform loop filtering on the secondarily compensated first subsequent frame image, so as to use the loop-filtered image as a reconstructed second frame image (also denoted by P in fig. 2) and to compensate the second subsequent frame image with it.
The entropy coding module can also be used to entropy-encode the quantized image to obtain the first prediction residual data for writing into the code stream, and to entropy-encode the transformed and quantized second subsequent frame image to obtain the second prediction residual data for writing into the code stream. The specified frame image, the first subsequent frame image and the second subsequent frame image in this embodiment may appear in the current video in order from front to back.
Example four:
the video decoding method in the second embodiment can be based on the same inventive concept, and this embodiment specifically provides a video decoding apparatus based on deep learning, which can be used for decoding data generated by the video hybrid encoding apparatus in any embodiment of the present invention.
As shown in fig. 3 and 4, the video decoding apparatus based on deep learning in the present embodiment includes, but is not limited to, the following modules. In addition, the decoding end of the invention does not need to analyze the network, so the structure of the decoding end of the invention is more simplified.
The entropy decoding module is used to entropy-decode intra-frame coded data in the received code stream to obtain the bottleneck layer features; it is also used to entropy-decode the first prediction residual data and the second prediction residual data in the received code stream.
The decoding-end generation network module takes the entropy-decoded bottleneck layer features as input and outputs a reconstructed intra-coded image; that is, it decodes the corresponding specified frame image P from the bottleneck layer features.
The decoding-end inverse quantization module is used to inverse-quantize the entropy-decoded first and second prediction residual data.
The decoding-end inverse transformation module is configured to inverse-transform the inverse-quantized first and second prediction residual data to obtain Dn′ in fig. 3.
The decoding-end compensation module is configured to compensate the inverse-transformed first prediction residual data with the specified frame image to obtain uFn′ in fig. 3, and to compensate the inverse-transformed second prediction residual data with the first subsequent frame image (also denoted by P) or the specified frame image.
The decoding-end loop filtering module is used to perform loop filtering on the compensated first and second prediction residual data, so as to decode the first and second subsequent frame images by means of loop filtering.
Example five:
the present embodiment provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements a deep learning based video hybrid encoding method or a deep learning based video decoding method according to any one of the embodiments of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable storage medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber device, and a portable Compact Disc Read-Only Memory (CD-ROM). Additionally, the computer-readable storage medium may even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, for instance via optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having appropriate combinational logic gates, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "the present embodiment," "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and simplifications made in the spirit of the present invention are intended to be included in the scope of the present invention.

Claims (10)

1. A deep-learning-based video hybrid encoding method, characterized by comprising:
extracting bottleneck layer features from a specified frame image of a current video, a first frame image being taken as the specified frame image;
reconstructing a first frame image according to the bottleneck layer features, and sequentially quantizing and entropy-encoding the bottleneck layer features to obtain intra-frame coded data for writing into a code stream;
performing inter-frame motion compensation on a first subsequent frame image of the current video by using the first frame image, and sequentially transforming, quantizing and entropy-encoding the compensated image to obtain first prediction residual data for writing into the code stream.

2. The deep-learning-based video hybrid encoding method according to claim 1, further comprising:
compensating a second subsequent frame image of the current video by using the first frame image or the first subsequent frame image, and then transforming, quantizing and entropy-encoding the compensated image to obtain second prediction residual data for writing into the code stream;
wherein the specified frame image, the first subsequent frame image and the second subsequent frame image appear in the current video in order from front to back.

3. The deep-learning-based video hybrid encoding method according to claim 2, wherein the process of compensating the second subsequent frame image by using the first subsequent frame image comprises:
sequentially performing transformation, quantization, inverse quantization, inverse transformation, secondary compensation and loop filtering on the first subsequent frame image, and taking the loop-filtered image as a reconstructed second frame image, so as to compensate the second subsequent frame image by using the second frame image.

4. The deep-learning-based video hybrid encoding method according to any one of claims 1 to 3, wherein the process of extracting the bottleneck layer features from the specified frame image of the current video comprises:
sequentially grouping all frame images of the current video in order from front to back to obtain a plurality of groups of images, and taking the first frame image of each group of images as the specified frame image, wherein each group of images includes the first subsequent frame image and the second subsequent frame image;
extracting the bottleneck layer features from the first frame image of the current group of images.

5. The deep-learning-based video hybrid encoding method according to claim 1, further comprising, before the bottleneck layer features are sequentially quantized and entropy-encoded:
performing rate estimation on the bottleneck layer features to be quantized by using a hyper-prior network.

6. A deep-learning-based video decoding method, characterized by being used to decode data generated by the video hybrid encoding method according to any one of claims 2 to 5, the deep-learning-based video decoding method comprising:
performing entropy decoding on intra-frame coded data in a received code stream to obtain bottleneck layer features, and decoding a specified frame image according to the bottleneck layer features;
performing entropy decoding, inverse quantization and inverse transformation on first prediction residual data in the received code stream and compensating by using the specified frame image, and then performing loop filtering on the compensated data, so as to decode a first subsequent frame image by means of loop filtering.

7. The deep-learning-based video decoding method according to claim 6, further comprising:
performing entropy decoding, inverse quantization and inverse transformation on second prediction residual data in the received code stream and compensating by using the first subsequent frame image or the specified frame image, and then performing loop filtering on the compensated image, so as to decode a second subsequent frame image by means of loop filtering.

8. A deep-learning-based video hybrid encoding device, characterized by comprising:
an analysis network module, configured to extract bottleneck layer features from a specified frame image of a current video, a first frame image being taken as the specified frame image;
an encoding-end generation network module, configured to reconstruct a first frame image according to the bottleneck layer features;
a quantization module, configured to quantize the bottleneck layer features;
an entropy coding module, configured to entropy-encode the quantized bottleneck layer features to obtain intra-frame coded data for writing into a code stream;
an encoding-end compensation module, configured to perform inter-frame motion compensation on a first subsequent frame image of the current video by using the first frame image;
a transformation module, configured to transform the compensated image;
the quantization module being further configured to quantize the transformed image;
the entropy coding module being further configured to entropy-encode the quantized image to obtain first prediction residual data for writing into the code stream.

9. A deep-learning-based video decoding device, characterized by being used to decode data generated by the video hybrid encoding device according to claim 8, the deep-learning-based video decoding device comprising:
an entropy decoding module, configured to entropy-decode intra-frame coded data in a received code stream to obtain bottleneck layer features, and further configured to entropy-decode first prediction residual data in the received code stream;
a decoding-end generation network module, configured to decode a specified frame image according to the bottleneck layer features;
a decoding-end inverse quantization module, configured to inverse-quantize the entropy-decoded first prediction residual data;
a decoding-end inverse transformation module, configured to inverse-transform the inverse-quantized first prediction residual data;
a decoding-end compensation module, configured to compensate the inverse-transformed first prediction residual data by using the specified frame image;
a decoding-end loop filtering module, configured to perform loop filtering on the compensated first prediction residual data, so as to decode a first subsequent frame image by means of loop filtering.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the computer program implements the video hybrid encoding method according to any one of claims 1 to 5 or the video decoding method according to claim 6 or 7.
CN202010604772.1A 2020-06-29 2020-06-29 Video hybrid encoding and decoding method, device and medium based on deep learning Active CN111901596B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010604772.1A CN111901596B (en) 2020-06-29 2020-06-29 Video hybrid encoding and decoding method, device and medium based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010604772.1A CN111901596B (en) 2020-06-29 2020-06-29 Video hybrid encoding and decoding method, device and medium based on deep learning

Publications (2)

Publication Number Publication Date
CN111901596A (en) 2020-11-06
CN111901596B (en) 2021-10-22

Family

ID=73206537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010604772.1A Active CN111901596B (en) 2020-06-29 2020-06-29 Video hybrid encoding and decoding method, device and medium based on deep learning

Country Status (1)

Country Link
CN (1) CN111901596B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12170786B2 (en) 2021-09-17 2024-12-17 Samsung Electronics Co., Ltd. Device and method for encoding and decoding image using AI

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113873248A (en) * 2021-09-26 2021-12-31 深圳市万利翔实业有限公司 A kind of digital video data encoding and decoding method and device
CN113766237B (en) * 2021-09-30 2024-07-02 咪咕文化科技有限公司 Encoding method, decoding method, device, equipment and readable storage medium
CN114125443B (en) * 2021-11-19 2024-12-24 展讯通信(上海)有限公司 Video bit rate control method, device and electronic device
WO2023126568A1 (en) * 2021-12-27 2023-07-06 Nokia Technologies Oy A method, an apparatus and a computer program product for video encoding and video decoding
CN115022637A (en) * 2022-04-26 2022-09-06 华为技术有限公司 A kind of image coding method, image decompression method and device
WO2024049627A1 (en) * 2022-09-02 2024-03-07 Interdigital Vc Holdings, Inc. Video compression for both machine and human consumption using a hybrid framework
CN115529457B (en) * 2022-09-05 2024-05-14 清华大学 Video compression method and device based on deep learning
CN115209147B (en) * 2022-09-15 2022-12-27 深圳沛喆微电子有限公司 Camera video transmission bandwidth optimization method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108632527A (en) * 2017-03-24 2018-10-09 安讯士有限公司 Controller, video camera and the method for controlling video camera
CN110677651A (en) * 2019-09-02 2020-01-10 合肥图鸭信息科技有限公司 Video compression method
CN110753225A (en) * 2019-11-01 2020-02-04 合肥图鸭信息科技有限公司 Video compression method and device and terminal equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8493499B2 (en) * 2010-04-07 2013-07-23 Apple Inc. Compression-quality driven image acquisition and processing system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108632527A (en) * 2017-03-24 2018-10-09 安讯士有限公司 Controller, video camera and the method for controlling video camera
CN110677651A (en) * 2019-09-02 2020-01-10 合肥图鸭信息科技有限公司 Video compression method
CN110753225A (en) * 2019-11-01 2020-02-04 合肥图鸭信息科技有限公司 Video compression method and device and terminal equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
贾川民 (Jia Chuanmin) et al.; "基于神经网络的图像视频编码" [Image and Video Coding Based on Neural Networks]; 电信科学 [Telecommunications Science]; 2019-05-31; full text *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12170786B2 (en) 2021-09-17 2024-12-17 Samsung Electronics Co., Ltd. Device and method for encoding and decoding image using AI

Also Published As

Publication number Publication date
CN111901596A (en) 2020-11-06

Similar Documents

Publication Publication Date Title
CN111901596B (en) Video hybrid encoding and decoding method, device and medium based on deep learning
US11159789B2 (en) Generative adversarial network based intra prediction for video coding
Golinski et al. Feedback recurrent autoencoder for video compression
JP5606591B2 (en) Video compression method
CN110798690B (en) Video decoding method, and method, device and equipment for training loop filtering model
CN112715027A (en) Neural network driven codec
CN113766249B (en) Loop filtering method, device, equipment and storage medium in video coding and decoding
JP2011515981A (en) Method and apparatus for encoding or decoding video signal
KR20080018469A (en) Image conversion method and apparatus, inverse conversion method and apparatus
KR102245682B1 (en) Apparatus for compressing image, learning apparatus and method thereof
CN113259671B (en) Loop filtering method, device, equipment and storage medium in video coding and decoding
KR101739603B1 (en) Method and apparatus for reusing tree structures to encode and decode binary sets
EP4018410A1 (en) Watermark-based image reconstruction
CN111757109A (en) High-real-time parallel video coding and decoding method, system and storage medium
CN112954350B (en) Video post-processing optimization method and device based on frame classification
KR100679027B1 (en) Method and apparatus for coding an image without loss of DC components
Belyaev et al. Error concealment for 3-D DWT based video codec using iterative thresholding
CN115883851A (en) Filtering, encoding and decoding methods and devices, computer readable medium and electronic equipment
WO2021263251A1 (en) State transition for dependent quantization in video coding
CN115883842B (en) Filtering and encoding and decoding method, device, computer readable medium and electronic device
WO2024140951A1 (en) A neural network based image and video compression method with integer operations
WO2024140849A1 (en) Method, apparatus, and medium for visual data processing
US20230239470A1 (en) Video encoding and decoding methods, encoder, decoder, and storage medium
WO2024149392A1 (en) Method, apparatus, and medium for visual data processing
WO2024149308A1 (en) Method, apparatus, and medium for video processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant