CN111901596B - Video hybrid coding and decoding method, device and medium based on deep learning - Google Patents

Video hybrid coding and decoding method, device and medium based on deep learning

Info

Publication number
CN111901596B
Authority
CN
China
Prior art keywords
frame image
decoding
video
bottleneck layer
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010604772.1A
Other languages
Chinese (zh)
Other versions
CN111901596A
Inventor
贾川民 (Jia Chuanmin)
马思伟 (Ma Siwei)
王苫社 (Wang Shanshe)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN202010604772.1A
Publication of CN111901596A
Application granted
Publication of CN111901596B
Legal status: Active


Classifications

    • H: ELECTRICITY; H04: ELECTRIC COMMUNICATION TECHNIQUE; H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/13: Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/124: Quantisation
    • H04N19/184: Adaptive coding characterised by the coding unit, the unit being bits, e.g. of the compressed video stream
    • H04N19/60: Transform coding
    • H04N19/82: Details of filtering operations specially adapted for video compression, involving filtering within a prediction loop

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a video hybrid coding and decoding method, device and medium based on deep learning. The encoding method comprises: extracting bottleneck layer features from a specified frame image; reconstructing a first frame image from the bottleneck layer features; quantizing and entropy coding the bottleneck layer features to obtain intra-frame coded data; and compensating, transforming, quantizing and entropy coding a first subsequent frame image of the current video to obtain first prediction residual data. The decoding method comprises: entropy decoding the intra-frame coded data to obtain the bottleneck layer features and decode the specified frame image; and entropy decoding, inverse quantizing, inverse transforming and compensating the first prediction residual data, then loop filtering the compensated data to decode the first subsequent frame image. The encoding and decoding devices correspond to the respective encoding and decoding methods. The invention provides a novel video coding and decoding scheme that achieves efficient compression and fast decoding of video and greatly improves video compression performance.

Description

Video hybrid coding and decoding method, device and medium based on deep learning
Technical Field
The present invention relates to the field of video coding technologies, and in particular, to a video hybrid coding and decoding method and apparatus, and a computer storage medium.
Background
At present, the traditional hybrid coding framework mainly performs predictive transform coding on image blocks of different sizes, and improvements usually focus on local rate-distortion optimization of individual coding tools. Entropy estimation models, for example, have been steadily refined: probability models based on Gaussian mixtures and on Gaussian super-prior (hyperprior) distributions have been proposed, and context models built on autoregressive coding frameworks help end-to-end image coding achieve higher coding gain. However, existing video hybrid coding schemes struggle to meet increasingly demanding video compression requirements.
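For orientation, the following is a minimal sketch of how such an entropy model assigns a rate to quantized latents, using the Gaussian-mixture probability model mentioned above; it is an illustration from the learned-compression literature, not code from the patent, and the names, shapes and trailing mixture dimension K are assumptions.

```python
import torch

def gmm_bits(y, weights, means, scales, eps=1e-9):
    """Estimated bits for integer-quantized latents y under a Gaussian
    mixture probability model (illustrative sketch; parameter layout with
    a trailing mixture dimension K is an assumption)."""
    comp = torch.distributions.Normal(means, scales)
    # probability mass of each component on the unit interval around y
    mass = comp.cdf(y.unsqueeze(-1) + 0.5) - comp.cdf(y.unsqueeze(-1) - 0.5)
    p = (weights * mass).sum(dim=-1)               # mix the K components
    return (-torch.log2(p.clamp_min(eps))).sum()   # estimated bits
```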
Therefore, how to further improve the video compression ratio through hybrid coding has long been a key technical problem for those skilled in the art.
Disclosure of Invention
To overcome the difficulty of further improving the compression ratio of existing video hybrid coding schemes, the invention provides a video hybrid coding and decoding method, device and medium based on deep learning that better address the shortcomings of existing schemes.
To achieve this technical purpose, the invention discloses a video hybrid coding method based on deep learning, which includes, but is not limited to, the following steps.
Bottleneck layer features are extracted from a specified frame image of the current video.
A first frame image is reconstructed from the bottleneck layer features, and the bottleneck layer features are sequentially quantized and entropy coded to obtain intra-frame coded data for writing into the code stream.
The first frame image is used to compensate a first subsequent frame image of the current video, and the compensated image is sequentially transformed, quantized and entropy coded to obtain first prediction residual data for writing into the code stream.
Further, the hybrid encoding method also comprises:
compensating a second subsequent frame image of the current video with the first frame image or the first subsequent frame image, then transforming, quantizing and entropy coding the compensated image to obtain second prediction residual data for writing into the code stream.
The specified frame image, the first subsequent frame image and the second subsequent frame image appear in the current video in front-to-back order.
Further, the process of compensating the second subsequent frame image with the first subsequent frame image comprises:
sequentially transforming, quantizing, inverse quantizing, inverse transforming, secondarily compensating and loop filtering the first subsequent frame image, taking the loop-filtered image as a reconstructed second frame image, and compensating the second subsequent frame image with that second frame image.
Further, the process of extracting bottleneck layer features from the specified frame image of the current video comprises:
grouping all frame images of the current video sequentially from front to back to obtain a plurality of groups of images, and taking the first frame image of each group as the specified frame image, where each group of images includes the first subsequent frame image and the second subsequent frame image;
and extracting bottleneck layer features from the first frame image of the current group of images.
Further, before the bottleneck layer features are sequentially quantized and entropy coded, the method also comprises:
performing rate estimation on the bottleneck layer features to be quantized using a super-prior network.
To achieve the same technical purpose, the invention also discloses a video decoding method based on deep learning, used for decoding the data generated by the video hybrid coding method of any embodiment of the invention. The video decoding method based on deep learning comprises the following steps:
entropy decoding the intra-frame coded data in the received code stream to obtain the bottleneck layer features, and decoding the specified frame image from the bottleneck layer features;
performing entropy decoding, inverse quantization and inverse transformation on the first prediction residual data in the received code stream, compensating with the specified frame image, and loop filtering the compensated data so as to decode the first subsequent frame image.
The video decoding method based on deep learning further comprises:
performing entropy decoding, inverse quantization and inverse transformation on the second prediction residual data in the received code stream, compensating with the first subsequent frame image or the specified frame image, and loop filtering the compensated image so as to decode the second subsequent frame image.
To achieve the same technical purpose, the invention also discloses a video hybrid coding device based on deep learning, which includes, but is not limited to, the following modules.
An analysis network module, used to extract bottleneck layer features from the specified frame image of the current video.
An encoding-end generation network module, used to reconstruct the first frame image from the bottleneck layer features.
A quantization module, which quantizes the bottleneck layer features.
An entropy coding module, used to entropy code the quantized bottleneck layer features to obtain intra-frame coded data for writing into the code stream.
An encoding-end compensation module, used to compensate the first subsequent frame image of the current video with the first frame image.
A transformation module, used to transform the compensated image.
The quantization module is further configured to quantize the transformed image.
The entropy coding module is further configured to entropy code the quantized image to obtain first prediction residual data for writing into the code stream.
To achieve the same technical purpose, the invention also discloses a video decoding device based on deep learning, used for decoding the data generated by the video hybrid coding device of any embodiment of the invention. The video decoding device based on deep learning includes, but is not limited to, the following modules.
An entropy decoding module, used to entropy decode the intra-frame coded data in the received code stream to obtain the bottleneck layer features, and also to entropy decode the first prediction residual data in the received code stream.
A decoding-end generation network module, used to decode the specified frame image from the bottleneck layer features.
A decoding-end inverse quantization module, used to inverse quantize the entropy-decoded first prediction residual data.
A decoding-end inverse transformation module, used to inverse transform the inverse-quantized first prediction residual data.
A decoding-end compensation module, used to compensate the inverse-transformed first prediction residual data with the specified frame image.
A decoding-end loop filtering module, used to loop filter the compensated first prediction residual data so as to decode the first subsequent frame image.
To achieve the same technical purpose, the invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the deep learning based video hybrid encoding method or the deep learning based video decoding method of any embodiment of the invention.
The beneficial effects of the invention are as follows: the invention provides a novel video coding and decoding scheme that achieves efficient compression and fast decoding of video, greatly improves video compression performance, and better addresses the difficulties of existing video hybrid coding schemes, such as hard-to-improve compression ratios and slow decoding.
Compared with conventional techniques, especially schemes that generate intra-frame prediction pixels from neighborhood pixels, the method does not need to encode residual information during intra-frame coding, so its coding efficiency is far higher than that of conventional coding schemes and video coding performance is greatly improved.
In addition, the weight data of the analysis network and the generation network in the deep learning self-encoder are stored offline and need not be transmitted in the code stream, so the deep learning self-encoder effectively reduces the bit rate of the stream.
Drawings
Fig. 1 is a flow chart of a video hybrid coding method based on deep learning according to some embodiments of the present invention.
Fig. 2 illustrates a schematic diagram of a coding end framework in some embodiments of the invention.
Fig. 3 shows a decoding end frame diagram in some embodiments of the invention.
FIG. 4 illustrates a schematic diagram of the operational status of an analysis network module and a generation network module in some embodiments of the invention.
Fig. 5 shows rate-distortion performance curves comparing the invention with various conventional coding algorithms under the same conditions.
Detailed Description
The video hybrid encoding and decoding method and apparatus based on deep learning and the computer readable storage medium are explained in detail below with reference to the drawings of the specification.
Embodiment one:
As shown in fig. 1 and 2, this embodiment provides a video hybrid coding method based on deep learning in which the intra-frame coding work is performed by a deep learning self-encoder. Specifically, the video hybrid encoding method may include, but is not limited to, the following steps.
Firstly, an intra-frame coding process is carried out: bottleneck layer features are extracted from the specified frame image of the current video. In some preferred embodiments of the present invention, this proceeds as follows. All frame images of the current video are grouped sequentially from front to back into a plurality of groups of images, and the first frame image of each group is taken as the specified frame image; if the current video has 1600 frames, for example, it can be divided into 100 groups of 16 frames each. Each group of images comprises a first subsequent frame image and a second subsequent frame image. Bottleneck layer features are then extracted from the first frame image of the current group. This extraction can be realized by a deep learning self-encoder (auto-encoder) built on a convolutional neural network and equipped with a bottleneck layer; a separate self-encoder can be trained for each rate point (quantization parameter value), and different network weights can be configured during training to realize coding at different rates. Continuing the example above, if the frames of a group are numbered 1 to 16 from front to back, frame 1 is the specified frame image, frame 2 is a first subsequent frame image, and frames 3 to 16 may each be a first subsequent frame image or a second subsequent frame image.
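To fix ideas, here is a minimal sketch of the grouping and of a convolutional self-encoder with a bottleneck layer; the layer counts, kernel sizes and channel widths are illustrative assumptions, not the patent's trained networks.

```python
import torch
import torch.nn as nn

class IntraAutoEncoder(nn.Module):
    """Sketch of a deep learning self-encoder with a bottleneck layer."""

    def __init__(self, c=128):
        super().__init__()
        # analysis network: specified frame image -> bottleneck layer features
        self.analysis = nn.Sequential(
            nn.Conv2d(3, c, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(c, c, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(c, c, 5, stride=2, padding=2),  # bottleneck layer
        )
        # generation network: bottleneck layer features -> reconstructed frame
        self.generation = nn.Sequential(
            nn.ConvTranspose2d(c, c, 5, 2, 2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(c, c, 5, 2, 2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(c, 3, 5, 2, 2, output_padding=1),
        )

    def forward(self, frame):
        y = self.analysis(frame)            # bottleneck layer features
        return self.generation(y), y        # reconstruction and features

def group_frames(frames, gop_size=16):
    """Group the frame list front to back; frame 0 of each group is intra."""
    return [frames[i:i + gop_size] for i in range(0, len(frames), gop_size)]
```

One self-encoder of this shape would be trained per rate point, as the embodiment describes, so no explicit quantization step appears inside the network.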
As shown in fig. 2, before extracting bottleneck layer features, this embodiment determines whether the current frame is the specified frame Fn. That is, the embodiment selects between the intra mode and a non-intra mode as the encoding mode, and in intra mode the bottleneck layer feature extraction is performed on the first frame of each group of pictures. Non-intra modes in this embodiment include, but are not limited to, inter mode, skip mode, merge mode and affine motion mode.
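As a toy illustration of this mode decision (the fixed group size and the per-frame granularity are simplifying assumptions; a real encoder would also weigh rate-distortion cost per block):

```python
def select_mode(frame_idx, gop_size=16):
    # the first frame of each group of pictures is coded in intra mode via
    # the self-encoder; every other frame uses some non-intra mode
    # (inter, skip, merge, affine motion, ...)
    return "intra" if frame_idx % gop_size == 0 else "non-intra"
```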
Secondly, the first frame image is reconstructed from the bottleneck layer features, and the bottleneck layer features are sequentially quantized and entropy coded to obtain intra-frame coded data for writing into, and transmitting through, the code stream. In this way the invention improves overall video coding efficiency by improving intra-frame coding efficiency. In some preferred embodiments of the present invention, before the bottleneck layer features are sequentially quantized and entropy coded, the method further comprises performing rate estimation on the bottleneck layer features to be quantized using a super-prior network. With this rate estimation, the invention further improves the compression ratio, producing a smaller code stream and faster transmission under the same conditions.
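The sketch below shows one common way a super-prior network can supply such a rate estimate for the quantized bottleneck features: side information z predicts a scale per latent, and the rate is the negative log of the Gaussian mass on the unit interval around each value. The zero-mean Gaussian form and the module names are assumptions borrowed from the learned-compression literature, not details given in the patent.

```python
import torch

def hyperprior_rate(y_hat, hyper_encoder, hyper_decoder, eps=1e-9):
    """Rate estimate for quantized bottleneck features y_hat via a
    super-prior network (hyper_encoder / hyper_decoder are hypothetical
    torch modules supplied by the caller)."""
    z_hat = torch.round(hyper_encoder(y_hat))      # quantized side information
    sigma = torch.exp(hyper_decoder(z_hat))        # predicted scale per latent
    gauss = torch.distributions.Normal(torch.zeros_like(sigma), sigma)
    # probability mass on the unit interval around each integer latent
    p = gauss.cdf(y_hat + 0.5) - gauss.cdf(y_hat - 0.5)
    return (-torch.log2(p.clamp_min(eps))).sum()   # estimated bits
```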
Thirdly, an encoding process of inter-frame motion compensated prediction is performed: the first frame image is used to compensate a first subsequent frame image of the current video, and the compensated image is sequentially transformed, quantized and entropy coded to obtain first prediction residual data for writing into, and transmitting through, the code stream; any group of images of the current video may yield first prediction residual data one or more times. The inter coding mode of this embodiment may be block-based inter motion compensated predictive coding. The transform may be an orthogonal transform, including but not limited to the discrete cosine transform (DCT), wavelet transform or Hadamard transform; the transform exposes signal components to which the human eye is insensitive. Quantization then achieves compression by discarding non-critical or unimportant signal components, such as those imperceptible to the human eye, and entropy coding compresses the result further.
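A minimal sketch of this per-block residual pipeline, using a 2-D DCT as the orthogonal transform; the uniform scalar quantizer and the step size are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dctn, idctn

def code_residual_block(residual, qstep=16.0):
    """Transform, quantize and reconstruct one residual block.

    Returns the quantized levels (which would go to the entropy coder)
    and the dequantized reconstruction the encoder keeps so that its
    prediction loop matches what the decoder will see.
    """
    coeffs = dctn(residual, norm="ortho")         # orthogonal transform
    levels = np.round(coeffs / qstep)             # drop sub-threshold detail
    recon = idctn(levels * qstep, norm="ortho")   # dequantized reconstruction
    return levels, recon
```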
Finally, this embodiment compensates a second subsequent frame image of the current video with the first frame image or the first subsequent frame image, then transforms, quantizes and entropy codes the compensated image to obtain second prediction residual data for writing into the code stream. The step of compensating the second subsequent frame image with the first frame image resembles the step of compensating the first subsequent frame image with the first frame image and is not repeated. In some preferred embodiments of the present invention, the process of compensating the second subsequent frame image with the first subsequent frame image comprises: sequentially transforming, quantizing, inverse quantizing, inverse transforming, secondarily compensating and loop filtering (Loop Filter) the first subsequent frame image, then taking the loop-filtered image as a reconstructed second frame image and compensating the second subsequent frame image with it. The specified frame image, the first subsequent frame image and the second subsequent frame image appear in the current video from front to back, and it should be understood that an earlier second subsequent frame image may be used to compensate a later one. This embodiment can run in a loop: after one group of frame images is encoded, the bottleneck layer feature extraction step restarts for the next group.
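Reusing code_residual_block from the previous sketch, the encoder-side reconstruction of the reference frame described here might look as follows; loop_filter stands in for whatever in-loop filter is used, which is an assumption (for example a deblocking or a learned CNN filter).

```python
def reconstruct_reference(prediction, source_frame, qstep, loop_filter):
    """Encoder-side loop: code the residual, add the prediction back
    (secondary compensation), then loop filter; the result serves as the
    reconstructed second frame image that compensates later frames."""
    residual = source_frame - prediction
    _, recon_residual = code_residual_block(residual, qstep)
    reconstructed = prediction + recon_residual    # secondary compensation
    return loop_filter(reconstructed)              # next reference frame
```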
As shown in fig. 5, which compares the rate-distortion performance of the invention with various conventional coding algorithms under the same conditions, the video coding method based on deep learning provided by the invention performs better.
Embodiment two:
As shown in fig. 3 and 4, this embodiment provides a video decoding method based on deep learning that corresponds to the video encoding method of any embodiment of the invention and decodes the data generated by that video hybrid encoding method.
Specifically, the video decoding method based on deep learning of the present embodiment includes, but is not limited to, the following steps.
Firstly, the intra-frame coded data in the received code stream is entropy decoded to obtain the bottleneck layer features, and the specified frame image is decoded from the bottleneck layer features.
Next, the first prediction residual data in the received code stream is entropy decoded, inverse quantized and inverse transformed, compensated with the specified frame image, and the compensated data is loop filtered so as to decode the first subsequent frame image.
Then, the second prediction residual data in the received code stream can be entropy decoded, inverse quantized and inverse transformed, compensated with the first subsequent frame image or the specified frame image, and the compensated image loop filtered, thereby decoding the second subsequent frame image.
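Under the same illustrative assumptions as the encoder sketches in embodiment one (entropy decoding is taken as already done, yielding the quantized levels), the decoder-side mirror of these steps might look like this:

```python
import numpy as np
from scipy.fft import idctn

def decode_inter_frame(levels, reference, qstep, loop_filter):
    """Inverse quantization and inverse transform of the prediction
    residual, compensation with the reference frame, then loop filtering."""
    residual = idctn(np.asarray(levels) * qstep, norm="ortho")
    return loop_filter(reference + residual)       # decoded subsequent frame
```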
Embodiment three:
the present embodiment is based on the same inventive concept as the first embodiment, and can specifically provide a video hybrid coding device based on deep learning. The device can provide a video coding framework fusing a depth intra-frame self-encoder and inter-frame motion compensation prediction, an intra-frame mode or a non-intra-frame mode can be selected at a coding end, efficient coding of video is achieved, and further compression of the video is completed.
An analysis network module takes the original signal as input and outputs the bottleneck layer features; it is thus used to extract the bottleneck layer features from the specified frame image of the current video. In this embodiment, all frame images of the current video are grouped sequentially from front to back into a plurality of groups of images, with the first frame image of each group taken as the specified frame image; each group of images can include a first subsequent frame image and a second subsequent frame image, and the bottleneck layer features are extracted from the first frame image of the current group.
An encoding-end generation network module is used to reconstruct the first frame image from the bottleneck layer features, yielding the image P illustrated in fig. 2. At the encoding end, the analysis network module and the encoding-end generation network module together form a deep learning self-encoder built on a convolutional neural network; this self-encoder has a bottleneck layer.
A quantization module is used to quantize the bottleneck layer features.
An entropy coding module is used to entropy code the quantized bottleneck layer features to obtain intra-frame coded data for writing into the code stream.
An encoding-end compensation module is used to compensate a first subsequent frame image of the current video with the first frame image P, and also to compensate a second subsequent frame image of the current video with the first frame image P or the first subsequent frame image.
A transformation module is used to transform the compensated image; it may also be used to transform the first subsequent frame image.
A super-prior coding module can be used to perform rate estimation on the bottleneck layer features to be quantized, using a super-prior network.
The quantization module is also used to quantize the transformed image, and may also be used to quantize the transformed first subsequent frame image.
An encoding-end inverse quantization module is used to inverse quantize the quantized first subsequent frame image.
An encoding-end inverse transformation module is used to inverse transform the inverse-quantized first subsequent frame image to obtain Dn′ shown in fig. 2.
An encoding-end secondary compensation module can be used to secondarily compensate the inverse-transformed first subsequent frame image to obtain uFn′ shown in fig. 2.
An encoding-end loop filtering module is configured to loop filter the secondarily compensated first subsequent frame image, so that the loop-filtered image serves as the reconstructed second frame image (also denoted P in fig. 2) and is used to compensate the second subsequent frame image.
The entropy coding module can also be used to entropy code the quantized image to obtain first prediction residual data for writing into the code stream, and to entropy code the transformed and quantized second subsequent frame image to obtain second prediction residual data for writing into the code stream. The specified frame image, the first subsequent frame image and the second subsequent frame image in this embodiment may appear in the current video in front-to-back order.
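To show how these modules might fit together, here is a hypothetical wiring of the encoding device; every attribute is a stand-in for a network or tool described above, and the control flow is a sketch rather than the patent's exact procedure (in particular, the reference refresh inside the loop is only indicated by a comment).

```python
class HybridEncoder:
    """Illustrative composition of the encoding-device modules."""

    def __init__(self, analysis_net, generation_net, quantizer,
                 entropy_coder, transform, compensator, loop_filter):
        self.analysis_net = analysis_net      # frame -> bottleneck features
        self.generation_net = generation_net  # features -> reconstructed P
        self.quantizer = quantizer
        self.entropy_coder = entropy_coder
        self.transform = transform
        self.compensator = compensator        # motion-compensated residual
        self.loop_filter = loop_filter

    def encode_group(self, frames):
        y = self.analysis_net(frames[0])                  # intra path
        intra_bits = self.entropy_coder(self.quantizer(y))
        reference = self.generation_net(y)                # first frame image P
        residual_bits = []
        for frame in frames[1:]:                          # inter path
            residual = self.compensator(frame, reference)
            coded = self.entropy_coder(self.quantizer(self.transform(residual)))
            residual_bits.append(coded)
            # a full encoder would also inverse-quantize, inverse-transform,
            # secondarily compensate and loop-filter here to refresh reference
        return intra_bits, residual_bits
```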
Embodiment four:
the video decoding method in the second embodiment can be based on the same inventive concept, and this embodiment specifically provides a video decoding apparatus based on deep learning, which can be used for decoding data generated by the video hybrid encoding apparatus in any embodiment of the present invention.
As shown in fig. 3 and 4, the video decoding apparatus based on deep learning in the present embodiment includes, but is not limited to, the following modules. In addition, the decoding end of the invention does not need to analyze the network, so the structure of the decoding end of the invention is more simplified.
An entropy decoding module is used to entropy decode the intra-frame coded data in the received code stream to obtain the bottleneck layer features; it is also used to entropy decode the first prediction residual data, and the second prediction residual data, in the received code stream.
A decoding-end generation network module takes the entropy-decoded bottleneck layer features as input and outputs the reconstructed intra-frame coded image; it is thus used to decode the corresponding specified frame image P from the bottleneck layer features.
A decoding-end inverse quantization module is used to inverse quantize the entropy-decoded first and second prediction residual data.
A decoding-end inverse transformation module is used to inverse transform the inverse-quantized first and second prediction residual data to obtain Dn′ in fig. 3.
A decoding-end compensation module is used to compensate the inverse-transformed first prediction residual data with the specified frame image to obtain uFn′ in fig. 3, and also to compensate the inverse-transformed second prediction residual data with the first subsequent frame image (also denoted P) or the specified frame image.
A decoding-end loop filtering module is used to loop filter the compensated first and second prediction residual data so as to decode the first and second subsequent frame images.
Embodiment five:
the present embodiment provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements a deep learning based video hybrid encoding method or a deep learning based video decoding method according to any one of the embodiments of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable storage medium for use by, or in connection with, an instruction execution system, apparatus or device, such as a computer-based system, a processor-containing system, or another system that can fetch and execute the instructions. For the purposes of this description, a "computer-readable storage medium" can be any means that can contain, store, communicate, propagate or transport the program for use by or in connection with the instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection with one or more wires (electronic device), a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable storage medium may even be paper or another suitable medium on which the program is printed, as the program can be captured electronically, for instance by optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, the steps or methods may be implemented by software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any of the following techniques known in the art, alone or in combination, may be used: discrete logic circuits with logic gates implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and the like.
In the description herein, references to the description of the term "the present embodiment," "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, for example two or three, unless specifically limited otherwise.
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalents and simplifications made within the spirit of the present invention are intended to fall within its scope.

Claims (10)

1. A video hybrid coding method based on deep learning, characterized by comprising the following steps:
extracting bottleneck layer features from a specified frame image of a current video, a first frame image being taken as the specified frame image;
reconstructing a first frame image from the bottleneck layer features, and sequentially quantizing and entropy coding the bottleneck layer features to obtain intra-frame coded data for writing into a code stream;
and performing inter-frame motion compensation on a first subsequent frame image of the current video using the first frame image, and sequentially transforming, quantizing and entropy coding the compensated image to obtain first prediction residual data for writing into a code stream.
2. The method of claim 1, further comprising:
compensating a second subsequent frame image of the current video with the first frame image or the first subsequent frame image, then transforming, quantizing and entropy coding the compensated image to obtain second prediction residual data for writing into a code stream;
wherein the specified frame image, the first subsequent frame image and the second subsequent frame image appear in the current video in front-to-back order.
3. The method of claim 2, wherein the process of compensating the second subsequent frame image with the first subsequent frame image comprises:
sequentially transforming, quantizing, inverse transforming, secondarily compensating and loop filtering the first subsequent frame image, taking the loop-filtered image as a reconstructed second frame image, and compensating the second subsequent frame image with the second frame image.
4. The method of any one of claims 1 to 3, wherein the process of extracting the bottleneck layer features from the specified frame image of the current video comprises:
grouping all frame images of the current video sequentially from front to back to obtain a plurality of groups of images, and taking the first frame image of each group of images as the specified frame image, wherein each group of images includes the first subsequent frame image and the second subsequent frame image;
and extracting bottleneck layer features from the first frame image of the current group of images.
5. The deep learning based video hybrid coding method of claim 1, further comprising, before the sequential quantization and entropy coding of the bottleneck layer features:
performing rate estimation on the bottleneck layer features to be quantized using a super-prior network.
6. A video decoding method based on deep learning, for decoding data generated by the video hybrid coding method of any one of claims 2 to 5; the video decoding method based on deep learning comprises the following steps:
entropy decoding intra-frame coded data in a received code stream to obtain bottleneck layer features, and decoding a specified frame image from the bottleneck layer features;
and performing entropy decoding, inverse quantization and inverse transformation on first prediction residual data in the received code stream, compensating with the specified frame image, and loop filtering the compensated data so as to decode a first subsequent frame image.
7. The video decoding method based on deep learning of claim 6, further comprising:
performing entropy decoding, inverse quantization and inverse transformation on second prediction residual data in the received code stream, compensating with the first subsequent frame image or the specified frame image, and loop filtering the compensated image so as to decode a second subsequent frame image.
8. An apparatus for video hybrid encoding based on deep learning, comprising:
an analysis network module, used to extract bottleneck layer features from a specified frame image of a current video, a first frame image being taken as the specified frame image;
an encoding-end generation network module, used to reconstruct a first frame image from the bottleneck layer features;
a quantization module, used to quantize the bottleneck layer features;
an entropy coding module, used to entropy code the quantized bottleneck layer features to obtain intra-frame coded data for writing into a code stream;
an encoding-end compensation module, used to perform inter-frame motion compensation on a first subsequent frame image of the current video using the first frame image;
a transformation module, used to transform the compensated image;
the quantization module being further used to quantize the transformed image;
and the entropy coding module being further used to entropy code the quantized image to obtain first prediction residual data for writing into a code stream.
9. A video decoding apparatus based on deep learning, for decoding the data generated by the video hybrid coding apparatus of claim 8; the deep learning based video decoding device comprises:
an entropy decoding module, used to entropy decode intra-frame coded data in a received code stream to obtain bottleneck layer features, and further used to entropy decode first prediction residual data in the received code stream;
a decoding-end generation network module, used to decode a specified frame image from the bottleneck layer features;
a decoding-end inverse quantization module, used to inverse quantize the entropy-decoded first prediction residual data;
a decoding-end inverse transformation module, used to inverse transform the inverse-quantized first prediction residual data;
a decoding-end compensation module, used to compensate the inverse-transformed first prediction residual data with the specified frame image;
and a decoding-end loop filtering module, used to loop filter the compensated first prediction residual data so as to decode a first subsequent frame image.
10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, carries out the video hybrid encoding method of any one of claims 1 to 5 or the video decoding method of claim 6 or 7.
CN202010604772.1A 2020-06-29 2020-06-29 Video hybrid coding and decoding method, device and medium based on deep learning Active CN111901596B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010604772.1A CN111901596B (en) 2020-06-29 2020-06-29 Video hybrid coding and decoding method, device and medium based on deep learning


Publications (2)

Publication Number Publication Date
CN111901596A CN111901596A (en) 2020-11-06
CN111901596B 2021-10-22

Family

ID=73206537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010604772.1A Active CN111901596B (en) 2020-06-29 2020-06-29 Video hybrid coding and decoding method, device and medium based on deep learning

Country Status (1)

Country Link
CN (1) CN111901596B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113766237B (en) * 2021-09-30 2024-07-02 咪咕文化科技有限公司 Encoding method, decoding method, device, equipment and readable storage medium
CN114125443A (en) * 2021-11-19 2022-03-01 展讯通信(上海)有限公司 Video code rate control method and device and electronic equipment
WO2023126568A1 (en) * 2021-12-27 2023-07-06 Nokia Technologies Oy A method, an apparatus and a computer program product for video encoding and video decoding
CN115022637A (en) * 2022-04-26 2022-09-06 华为技术有限公司 Image coding method, image decompression method and device
WO2024049627A1 (en) * 2022-09-02 2024-03-07 Interdigital Vc Holdings, Inc. Video compression for both machine and human consumption using a hybrid framework
CN115529457B (en) * 2022-09-05 2024-05-14 清华大学 Video compression method and device based on deep learning
CN115209147B (en) * 2022-09-15 2022-12-27 深圳沛喆微电子有限公司 Camera video transmission bandwidth optimization method, device, equipment and storage medium


Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8493499B2 (en) * 2010-04-07 2013-07-23 Apple Inc. Compression-quality driven image acquisition and processing system

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN108632527A (en) * 2017-03-24 2018-10-09 安讯士有限公司 Controller, video camera and the method for controlling video camera
CN110677651A (en) * 2019-09-02 2020-01-10 合肥图鸭信息科技有限公司 Video compression method
CN110753225A (en) * 2019-11-01 2020-02-04 合肥图鸭信息科技有限公司 Video compression method and device and terminal equipment

Non-Patent Citations (1)

Title
Jia Chuanmin et al.; "Image and Video Coding Based on Neural Networks" (基于神经网络的图像视频编码); Telecommunications Science (电信科学); 2019-05-31; full text *

Also Published As

Publication number Publication date
CN111901596A (en) 2020-11-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant