CN117896525A - Video processing method, model training method, device, electronic equipment and storage medium - Google Patents

Video processing method, model training method, device, electronic equipment and storage medium

Info

Publication number
CN117896525A
Authority
CN
China
Prior art keywords
video
residual error
encoding
quantized
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410066673.0A
Other languages
Chinese (zh)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rongming Microelectronics Jinan Co ltd
Original Assignee
Rongming Microelectronics Jinan Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rongming Microelectronics Jinan Co ltd filed Critical Rongming Microelectronics Jinan Co ltd
Priority to CN202410066673.0A priority Critical patent/CN117896525A/en
Publication of CN117896525A publication Critical patent/CN117896525A/en
Pending legal-status Critical Current

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application relates to a video processing method, a model training method, a device, an electronic device and a storage medium. The video processing method comprises the following steps: acquiring a video to be processed; compressing the video to be processed by using a first encoding and decoding algorithm to obtain a first residual; inputting the first residual into a trained super prior machine learning model for lossy compression to obtain a quantized second residual; and encoding the quantized second residual by using lossless encoding to obtain a compressed video. With this scheme, when the video to be processed is encoded, the super prior machine learning model and the lossless coding algorithm are used together to further compress, step by step, the video that has undergone the primary compression, so that lossless compression can be realized and existing hardware support can be fully utilized to encode and decode the compressed video efficiently.

Description

Video processing method, model training method, device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of video processing, in particular to a method and a device for video processing and model training, electronic equipment and a storage medium.
Background
In video application scenarios, video coding is an essential step, and the encoding workflow is complex.
Video compression uses standard video codecs, which first reduce entropy by various methods including motion estimation, inter prediction, scaling and filtering. The essence of this process is to reduce the number of bits needed for encoding by exploiting the redundancy of the video content in the temporal and spatial domains; this stage of compression is often lossy. Once the encoding steps above have produced an entropy-reduced data stream, that data is losslessly compressed, and the compression efficiency depends on how much entropy remains in the data. The losslessly compressed bit stream is then sent to the receiving end for decoding. Although this approach works well in practice, its lossless compression stage is not optimal; in other words, the compressed frames still contain relatively high entropy, or equivalently have a relatively high bit rate. Accordingly, there is a need for a video processing scheme that achieves low-entropy compression quickly and efficiently.
Disclosure of Invention
The application aims to provide a video processing method, a model training method, a device, an electronic device and a storage medium that improve video coding efficiency.
According to a first aspect of an embodiment of the present application, there is provided a video processing method, including:
acquiring a video to be processed;
compressing the video to be processed by using a first encoding and decoding algorithm to obtain a first residual;
inputting the first residual into a trained super prior machine learning model for lossy compression to obtain a quantized second residual;
and encoding the quantized second residual by using lossless encoding to obtain a compressed video.
In one embodiment, the inputting the first residual into a trained super prior machine learning model for lossy compression to obtain a quantized second residual includes:
quantizing the first residual to obtain a quantized first residual;
performing lossy compression on the quantized first residual by using the trained super prior machine learning model to obtain a second residual that reduces the entropy of the first residual;
and quantizing the second residual to obtain the quantized second residual.
In one embodiment, the encoding the quantized second residual by using lossless encoding to obtain a compressed video includes:
performing lossless coding on the quantized second residual by using arithmetic coding to obtain the compressed video.
In one embodiment, the compressing the video to be processed by using a first encoding and decoding algorithm to obtain a first residual includes:
encoding the video to be processed by using the first encoding and decoding algorithm to obtain an entropy-reduced first residual after the encoding processing.
In one embodiment, after obtaining the compressed video, the method further comprises:
decoding the compressed super prior by using the arithmetic coding to obtain the quantized second residual;
decoding the quantized second residual by using the trained super prior machine learning model to obtain the first residual;
and decoding the compressed video by using the first residual and the first encoding and decoding algorithm to obtain the original video.
According to a second aspect of an embodiment of the present application, there is provided a model training method, the method comprising:
compressing the video to be processed by using a first encoding and decoding algorithm to obtain a first residual sample;
inputting the first residual sample into a super prior machine learning model to be trained for training, to obtain a super prior that enables lossless coding to output an optimal probability; the trained super prior machine learning model and the lossless coding are used for encoding the video to be processed to obtain the compressed video.
In one embodiment, the inputting the first residual sample into a super prior machine learning model to be trained for training, to obtain a super prior that enables lossless coding to output an optimal probability, includes:
inputting the first residual sample into the super prior machine learning model to be trained, and feeding the super prior obtained by training into the lossless coding to obtain the encoded bit rate;
comparing the encoded bit rates;
and if an encoded bit rate is the lowest bit rate, taking the corresponding super prior as the super prior that enables the lossless coding to output the optimal probability.
According to a third aspect of an embodiment of the present application, there is provided a video processing apparatus including:
an acquisition module, configured to acquire the video to be processed;
a first coding module, configured to compress the video to be processed by using a first encoding and decoding algorithm to obtain a first residual;
a machine learning module, configured to input the first residual into a trained super prior machine learning model for lossy compression to obtain a quantized second residual;
and a second coding module, configured to encode the quantized second residual by using lossless encoding to obtain a compressed video.
According to a fourth aspect of embodiments of the present application, there is provided an electronic device comprising a memory and a processor, the memory being for storing a computer program executable by the processor; the processor is configured to execute the computer program in the memory to implement the method of the first aspect or the second aspect.
According to a fifth aspect of embodiments of the present application, there is provided a computer readable storage medium having a computer program stored thereon, wherein the method of the first aspect or the second aspect is implemented when the computer program in the storage medium is executed by a processor.
Compared with the prior art, the application has the following beneficial effects: a video to be processed is acquired; the video to be processed is compressed by using a first encoding and decoding algorithm to obtain a first residual; the first residual is input into a trained super prior machine learning model for lossy compression to obtain a quantized second residual; and the quantized second residual is encoded by using lossless encoding to obtain a compressed video. With this scheme, when the video to be processed is encoded, the super prior machine learning model and the lossless coding algorithm are used together to further compress, step by step, the video that has undergone the primary compression, so that lossless compression can be realized and existing hardware support can be fully utilized to encode and decode the compressed video efficiently.
Drawings
Fig. 1 is a flow chart of a video processing method according to an embodiment of the present application.
Fig. 2 is a flowchart of a method for generating a second residual error according to an embodiment of the present application.
Fig. 3 is a flow chart of a model training method according to an embodiment of the present application.
Fig. 4a is a schematic diagram of video encoding and training process according to an embodiment of the present application.
Fig. 4b is a schematic diagram of a video decoding process according to an embodiment of the present application.
Fig. 5 is a block diagram of a video processing apparatus according to an exemplary embodiment.
Fig. 6 is a block diagram of a model training apparatus, according to an example embodiment.
Fig. 7 is a block diagram of an electronic device, according to an example embodiment.
Detailed Description
Unless defined otherwise, technical or scientific terms used in the specification and claims should be given the ordinary meaning as understood by one of ordinary skill in the art to which the invention pertains. In the following, specific embodiments of the present invention will be described with reference to the drawings, and it should be noted that in the course of the detailed description of these embodiments, it is not possible in the present specification to describe all features of an actual embodiment in detail for the sake of brevity. Modifications and substitutions of embodiments of the invention may be made by those skilled in the art without departing from the spirit and scope of the invention, and the resulting embodiments are also within the scope of the invention.
Video compression uses standard video codecs, which first reduce entropy by various methods including motion estimation, inter prediction, scaling and filtering. The essence of this process is to reduce the number of bits needed for encoding by exploiting the redundancy of the video content in the temporal and spatial domains; this stage of compression is often lossy. Once the encoding steps above have produced an entropy-reduced data stream, that data is losslessly compressed, and the compression efficiency depends on how much entropy remains in the data. The losslessly compressed bit stream is then sent to the receiving end for decoding. Many existing codecs use context-adaptive binary arithmetic coding (CABAC) for lossless compression. However, this is not optimal in terms of lossless compression; for example, the bit rate after compression is still relatively high.
It should be noted that context-adaptive binary arithmetic coding (CABAC) is a form of entropy coding that stores frequently occurring symbols with fewer bits and rarely occurring symbols with more bits. It achieves this by encoding the entire message into a single number, an arbitrary-precision fraction q, where 0.0 <= q < 1.0, and it represents the message encoded so far as a range defined by two numbers. In addition, context-adaptive binary arithmetic coding uses context-specific probabilities. For example, the residual has one set of weights, the motion vectors have another set, and the encoded metadata has yet another set. For arithmetic coding to be effective, the probability of each symbol must be estimated accurately; inaccurate estimates lead to less than ideal results.
The number q is then converted to a binary output. The exact value produced depends on the symbol probabilities: the more accurately the probability of each symbol is predicted, the higher the coding efficiency.
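The interval-narrowing idea described above can be illustrated with a minimal sketch. It is a plain (non-binary, non-adaptive) arithmetic coder written with exact fractions, not CABAC itself; the function names `cumulative_ranges`, `encode` and `decode` and the two-symbol alphabet are assumptions made only for this illustration.

```python
from fractions import Fraction

def cumulative_ranges(probs):
    """Assign each symbol a sub-interval of [0, 1) whose width is its probability."""
    ranges, start = {}, Fraction(0)
    for sym, p in probs.items():
        ranges[sym] = (start, start + p)
        start += p
    return ranges

def encode(message, probs):
    """Narrow [low, high) once per symbol; any q in the final interval identifies the message."""
    ranges = cumulative_ranges(probs)
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        sym_low, sym_high = ranges[sym]
        low, high = low + width * sym_low, low + width * sym_high
    return low, high

def decode(q, probs, length):
    """Replay the same narrowing, picking the sub-interval that contains q at each step."""
    ranges = cumulative_ranges(probs)
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        width = high - low
        for sym, (sym_low, sym_high) in ranges.items():
            lo, hi = low + width * sym_low, low + width * sym_high
            if lo <= q < hi:
                out.append(sym)
                low, high = lo, hi
                break
    return out

# Hypothetical two-symbol model: "a" is three times as likely as "b".
probs = {"a": Fraction(3, 4), "b": Fraction(1, 4)}
low, high = encode("aab", probs)
print(high - low)             # width equals 3/4 * 3/4 * 1/4
print(decode(low, probs, 3))  # recovers the original symbols
```

The width of the final interval is the product of the symbol probabilities, so a message that the model considers likely ends in a wide interval and needs roughly -log2(width) bits to identify. This is why accurate probabilities translate directly into a lower bit rate.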
Although some attempts in the prior art use machine learning model techniques to implement video encoding and decoding, these models often require more computational processing, meaning that the machine learning model is larger, more data needs to be processed, and the data processing speed is slower.
Furthermore, existing codecs have been deployed on a large scale in terms of hardware acceleration infrastructure. Such hardware acceleration infrastructure can accelerate video processing and compression speeds, and therefore even with machine learning model techniques, consideration needs to be given to whether existing hardware acceleration infrastructure can be supported. If the infrastructure is rebuilt for the machine learning model, the cost is high.
Therefore, a solution is needed that can further improve the processing efficiency and processing capacity of video codec based on fully utilizing the existing hardware acceleration infrastructure (i.e., the existing equipment for video codec).
Fig. 1 is a schematic flow chart of a video processing method according to an embodiment of the present application. The execution subject of the method may be a local computer, a cloud computing server, or the like. The method specifically comprises the following steps:
Step 101: acquire the video to be processed.
Step 102: compress the video to be processed by using a first encoding and decoding algorithm to obtain a first residual.
Step 103: input the first residual into a trained super prior machine learning model for lossy compression to obtain a quantized second residual.
Step 104: encode the quantized second residual by using lossless encoding to obtain a compressed video.
The first codec algorithm described herein may be, for example, any of the following:
H.264/AVC (Advanced Video Coding): H.264 is a widely used video codec standard with high compression efficiency and good video quality. It supports various video resolutions and bit rates, and is widely used in video communication, streaming media, broadcasting and similar fields.
H.265/HEVC (High Efficiency Video Coding): H.265 is the successor to H.264 and is also an efficient video codec standard. Compared with H.264, H.265 can provide a higher compression ratio and better video quality, and is suitable for transmitting and storing high-resolution video such as 4K and 8K.
VP9: VP9 is an open source video codec developed by Google for compressing high-quality video. VP9 is comparable to H.265 in compression efficiency and has extensive platform and browser support.
AV1 (AOMedia Video 1): AV1 is an open source video codec developed by the Alliance for Open Media (AOMedia) that is intended to provide efficient video compression and better video quality. AV1 has high compression efficiency and is suitable for streaming media, video communication and online video platforms.
MPEG-2 (Moving Picture Experts Group 2): MPEG-2 is a video codec standard widely used in digital television, DVD, broadcasting and similar fields. It has high compatibility and broad support, but its compression efficiency is low compared with modern coding standards.
The implementation of the first codec algorithm, in turn, relies on existing hardware infrastructure that meets industry specifications and supports the first codec algorithm but not necessarily other algorithms, such as some machine learning model algorithms. Examples of such hardware infrastructure include:
Graphics processor (GPU): GPUs are hardware devices dedicated to graphics and image processing, and may also be used for video encoding and decoding. Modern GPUs typically have powerful parallel computing capabilities and efficient video processing units that can speed up the execution of video codec algorithms.
Video codec chip (Codec Chip): some dedicated hardware chips are designed to accelerate the video codec process. These chips typically contain dedicated hardware circuitry and instruction sets that execute video codec algorithms efficiently, providing higher speed and lower power consumption.
System-on-Chip (SoC): an SoC is a chip that integrates a plurality of functional modules, including a video codec. SoCs are commonly used in mobile devices, smart televisions, embedded systems and the like, providing hardware-accelerated video codec functionality.
Hardware video codec accelerator (Video Codec Accelerator): this is a dedicated hardware accelerator for the video codec process. It typically contains specialized hardware circuitry and instruction sets that execute video codec algorithms efficiently, providing higher speed and lower power consumption.
The hardware infrastructure above is listed only by way of example; many other existing hardware infrastructures are not described in detail here, and the examples in this embodiment do not limit the implementation of the technical solution of the present application.
Here, the residual (or residual frame) refers to a difference or error between the current frame and the predicted frame in video coding. In video coding, inter-frame prediction is typically used to reduce redundancy of data. The predicted frame is generated by a reference frame and a motion vector for approximately predicting the content of the current frame. Then, a residual frame is obtained by calculating the difference between the current frame and the predicted frame. The residual frame contains details that cannot be predicted in the current frame. It captures the differences between the current frame and the predicted frame, including motion, texture, subtle changes, etc. Since the residual frame typically contains higher frequency components and detail information, it requires more bits to encode.
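The relationship between the current frame, the predicted frame and the residual frame can be sketched as follows. The 2x2 pixel values and the function names `residual_frame` and `reconstruct` are assumptions chosen only for illustration; a real codec computes the predicted frame by motion compensation.

```python
def residual_frame(current, predicted):
    """Per-pixel difference between the current frame and the predicted frame."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, predicted)]

def reconstruct(predicted, residual):
    """Adding the residual back to the prediction restores the current frame."""
    return [[p + r for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(predicted, residual)]

# Hypothetical 2x2 luma samples.
current = [[10, 12], [14, 16]]
predicted = [[10, 11], [15, 16]]

residual = residual_frame(current, predicted)
print(residual)                          # small values clustered around zero
print(reconstruct(predicted, residual))  # identical to the current frame
```

When the prediction is good, most residual values are zero or close to zero, which is what makes the residual cheaper to code than the raw frame.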
The super prior (hyperprior) referred to herein is a prior distribution used to model data during encoding and decoding. It is a probabilistic model that describes the statistical properties of the data and plays an important role in the encoding and decoding process. In the arithmetic coding described herein, the super prior is used to model the symbols; in other words, the weights used in the arithmetic coding are provided by the super prior model. The symbols are the basic units of the encoding and decoding process and may be pixel values, pixel differences, or other data elements. The super prior model may predict the probability distribution of the current symbol from the previous symbols, thereby enabling more efficient encoding. How the super prior model is selected and built is critical to encoding and decoding performance: the more accurately it models the statistical properties of the data, the better the compression efficiency and fidelity. In short, the super prior is a prior distribution that models the data for lossless compression and is used to predict the probability distribution of symbols during encoding and decoding so as to achieve efficient compression.
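The effect of the probability model on the bit count can be made concrete with a minimal sketch. An ideal entropy coder spends about -log2 p(s) bits on a symbol s, so the same residual symbols cost fewer bits under a model that matches their statistics. The two probability tables below are hand-written assumptions standing in for what a super prior model might output; they are not taken from the application.

```python
import math

def ideal_code_length_bits(symbols, model):
    """Total bits an ideal entropy coder would spend under the given probability model."""
    return sum(-math.log2(model[s]) for s in symbols)

# Hypothetical quantized residual symbols, mostly zero.
residuals = [0, 0, 0, 1, 0, -1, 0, 0]

# Two candidate models: one uninformed, one peaked at zero.
uniform_model = {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3}
peaked_model = {-1: 0.125, 0: 0.75, 1: 0.125}

print(ideal_code_length_bits(residuals, uniform_model))
print(ideal_code_length_bits(residuals, peaked_model))  # fewer bits than the uniform model
```

The peaked model wins because it assigns a high probability to the symbol that actually dominates the data.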
In practical application, the existing hardware acceleration infrastructure can be fully utilized to complete the first compression processing of the video to be processed, and a compressed first residual error is obtained. The compressed first residual, compared to the original video to be processed, already achieves a significant reduction of the bit rate.
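As a rough illustration of why a residual is cheaper to code than the raw samples, the sketch below compares the empirical entropy of a smooth row of pixel values with that of its prediction residual. The previous-sample predictor and the sample values are assumptions standing in for the first codec algorithm, which in practice is far more elaborate.

```python
import math
from collections import Counter

def empirical_entropy_bits(values):
    """Shannon entropy, in bits per symbol, of the observed value frequencies."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Hypothetical row of pixel values forming a smooth ramp.
row = [10, 11, 12, 13, 14, 15, 16, 17]

# Stand-in predictor: each sample is predicted by the one before it.
residual = [row[0]] + [b - a for a, b in zip(row, row[1:])]

print(empirical_entropy_bits(row))       # every value is distinct
print(empirical_entropy_bits(residual))  # almost every value is the same
```

The raw row has eight distinct values, while the residual is dominated by a single value, so its entropy per symbol is much lower.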
Furthermore, the first residual is input into the trained super prior machine learning model, which performs lossy compression on it. During this processing, the trained machine learning model can provide a more suitable super prior, so that the best entropy reduction is achieved when the first residual is compressed; in other words, data compressed with the super prior provided by the machine learning model has the lowest bit rate and therefore the better compression effect. At the same time, the trained machine learning model processes only the residual rather than the original video to be processed. Compared with a machine learning model that encodes and decodes the video to be processed directly, the super prior machine learning model therefore handles less data, is lighter and more energy-efficient, consumes fewer resources, and processes data faster and more efficiently.
On this basis, the second residual is further compressed losslessly by using a lossless compression algorithm, so that no unnecessary loss is introduced when the video is compressed by this scheme and the decoded video is closer to the original video to be processed.
In this scheme, in order to make full use of the existing hardware acceleration infrastructure, the first residual is obtained by encoding the video to be processed with the first codec algorithm, so that the preliminary compression of the video is completed on existing hardware. The data stream of the first residual is much smaller than that of the original video to be processed, which means that the super prior machine learning model can be smaller, receives less input data, places low demands on hardware, processes data more efficiently, and can provide a super prior that further reduces entropy. After the second residual is obtained by using the super prior machine learning model, it is further compressed by using a lossless compression algorithm so as to better meet the requirement of lossless video compression.
In one or more embodiments of the present application, fig. 2 is a schematic flow chart of a method for generating a second residual error according to an embodiment of the present application. As can be seen from fig. 2, the step of inputting the first residual error into a trained super prior machine learning model to perform lossy compression processing to obtain a quantized second residual error specifically includes the following steps:
Step 201: quantize the first residual to obtain a quantized first residual.
Step 202: perform lossy compression on the quantized first residual by using the trained super prior machine learning model to obtain a second residual that reduces the entropy of the first residual.
Step 203: quantize the second residual to obtain the quantized second residual.
In practical application, the first residual is quantized first. The first residual is obtained by processing the original video to be processed with the first codec algorithm and computing the difference. Quantization converts continuous values into a discrete representation. By mapping the values of the first residual to a limited discrete set, the precision of the data representation is reduced, thereby achieving compression.
After the quantized first residual is obtained, it is lossy-compressed by using the trained super prior machine learning model. The super prior model is a prior distribution model of the data statistics that can predict the probability distribution of the current data from previous data and context information. By using the super prior model, the quantized first residual can be compressed more efficiently, so that the entropy of the data is reduced.
The output of this lossy compression is referred to as the second residual. After the second residual is obtained, it is quantized, that is, mapped to a finite set of discrete values to reduce the precision of the data representation.
Through the above procedure, the first residual derived from the original frame is quantized and lossy-compressed, producing two residuals: a quantized first residual and a quantized second residual. The super prior machine learning model is used to find the optimal super prior, which further reduces entropy, lowers the bit rate of the compressed data and yields the best compression effect. At decoding, these residuals are processed in reverse, by decoding and dequantizing, to recover the original frame.
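A minimal sketch of uniform scalar quantization shows how mapping values to a discrete set reduces precision. The step size of 2 and the sample values are assumptions for illustration; the application does not specify the quantizer design.

```python
def quantize(values, step):
    """Map each value to the index of the nearest multiple of `step`."""
    return [round(v / step) for v in values]

def dequantize(indices, step):
    """Map each index back to its representative value."""
    return [i * step for i in indices]

# Hypothetical residual values and step size.
values = [-3.7, -0.2, 0.4, 2.6, 7.9]
step = 2

indices = quantize(values, step)
restored = dequantize(indices, step)
print(indices)   # small integers, cheap to entropy-code
print(restored)  # each value is off by at most half a step
```

A larger step produces fewer distinct indices and therefore lower entropy, at the cost of a larger reconstruction error.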
In one or more embodiments of the present application, the encoding the quantized second residual using lossless encoding to obtain a compressed video includes:
performing lossless coding on the quantized second residual by using arithmetic coding to obtain the compressed video.
In practical applications, arithmetic coding is a lossless compression technique that is used here for symbol coding of the quantized second residual. Symbol coding is the process of mapping symbols to a binary code (assuming that binary arithmetic coding is employed herein). Arithmetic coding uses an adaptive probability model to determine the coding probability of each symbol from the probability distribution of the current symbol, and encodes the symbols into a binary sequence. Arithmetic coding maps the sequence of symbols to a real value between 0 and 1. The arithmetically encoded data are combined to form the compressed video data, which may be stored or transmitted and later decoded to recover the original video frames. It should be noted that this embodiment is not limited to a specific form of arithmetic coding (for example, CABAC); other, non-binary forms of arithmetic coding may also be used.
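The benefit of an adaptive probability model can be estimated without writing a full coder, by summing the ideal cost -log2 p of each symbol while the model updates its counts. The count-based model below is a simple assumption used for illustration; it is neither the CABAC context model nor the super prior model of the application.

```python
import math

def adaptive_code_length_bits(symbols, alphabet):
    """Ideal bits spent when probabilities are re-estimated from counts after each symbol."""
    counts = {s: 1 for s in alphabet}  # start from a uniform guess
    total_bits = 0.0
    for s in symbols:
        p = counts[s] / sum(counts.values())
        total_bits += -math.log2(p)
        counts[s] += 1  # a decoder can apply the same update after decoding s
    return total_bits

def static_uniform_bits(symbols, alphabet):
    """Ideal bits spent when every symbol is assumed equally likely."""
    return len(symbols) * math.log2(len(alphabet))

# Hypothetical quantized residual symbols dominated by zeros.
symbols = [0] * 12 + [1, -1]
alphabet = [-1, 0, 1]

print(adaptive_code_length_bits(symbols, alphabet))
print(static_uniform_bits(symbols, alphabet))  # the adaptive model needs fewer bits
```

Because the encoder and decoder update their counts identically, no side information about the probabilities has to be transmitted in this sketch.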
The lossless compression algorithm described herein may be of many kinds, including, for example:
Huffman coding: Huffman coding is a lossless compression algorithm based on variable-length coding. It achieves compression by constructing an optimal coding table from the occurrence frequency of the symbols, so that frequently occurring symbols are represented by shorter codes.
Arithmetic coding: arithmetic coding is a lossless compression algorithm based on symbol probabilities. It encodes the entire data sequence into a value between 0 and 1, which is mapped back to the original data sequence according to the probability distribution of the symbols. Arithmetic coding can achieve higher compression rates, but requires high-precision arithmetic during encoding and decoding.
Predictive coding: predictive coding is a lossless compression algorithm based on prediction errors. It uses a predictive model to estimate the current symbol from previous symbols and encodes the prediction error. Common predictive coding methods include differential coding and adaptive coding.
Context-adaptive arithmetic coding: this form of arithmetic coding is commonly used in video coding standards such as H.264 and H.265. It models context information and uses an adaptive probability model to achieve higher compression efficiency.
Lossless image compression algorithms: these include prediction-based algorithms (e.g., PNG and JPEG-LS), transform-based algorithms (e.g., the lossless mode of JPEG 2000) and dictionary-based algorithms (e.g., the LZW coding used by GIF). These algorithms are optimized for the characteristics of image data to achieve lossless compression.
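Of the algorithms listed above, Huffman coding is the simplest to sketch. The example below builds a prefix code from symbol frequencies using a priority queue; the function name `huffman_code` and the sample symbols are assumptions for illustration only.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code in which frequent symbols receive shorter codewords."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries are (frequency, tie_breaker, partial_code_table).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # the two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

# Hypothetical quantized residual symbols.
symbols = [0] * 6 + [1] * 2 + [-1]
code = huffman_code(symbols)
encoded = "".join(code[s] for s in symbols)
print(code)
print(len(encoded))  # total number of bits for the nine symbols
```

Unlike arithmetic coding, Huffman coding spends a whole number of bits on every symbol, which is one reason arithmetic coding can reach lower bit rates on highly skewed distributions.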
In one or more embodiments of the present application, the compressing the video to be processed with a first codec algorithm to obtain a first residual error includes:
encoding the video to be processed by using the first encoding and decoding algorithm to obtain an entropy-reduced first residual after the encoding processing.
In practical application, the video to be processed is first preprocessed: before encoding, the original video data may require preprocessing steps such as color space conversion and resolution adjustment. Inter prediction is then performed: video coding typically uses inter prediction to reduce redundant information, predicting the current frame by comparison with a previous reference frame. This step produces a predicted frame and a residual frame (that is, the first residual described above). Motion estimation and motion vector coding are also performed: motion estimation determines the motion vectors between the current frame and the reference frame so that pixels in the predicted frame can be predicted accurately, and motion vector coding compresses those motion vectors.
In one or more embodiments of the present application, after obtaining the compressed video, the method further includes:
decoding the compressed super prior by using the arithmetic coding to obtain the quantized second residual;
decoding the quantized second residual by using the trained super prior machine learning model to obtain the first residual;
and decoding the compressed video by using the first residual and the first encoding and decoding algorithm to obtain the original video.
In practical application, the decoding process is as follows: first, the compressed super prior is decoded using an arithmetic coding algorithm and converted back into a symbolic representation. And decoding the quantized second residual error by using the decoded super prior. This may be achieved by an inverse quantization operation, restoring the quantized values to the original second residual values. And decoding the quantized second residual error after decoding by using the trained super prior machine learning model to obtain a first residual error. The super a priori machine learning model may predict the value of the first residual from previous data and context information. Finally, the compressed video is decoded using the resulting first residual and the first codec algorithm to recover the original video. The first codec algorithm may be an algorithm that predicts from a previous frame and the first residual, e.g., a motion compensation algorithm. By combining the first residual with the predicted frame, the original video frame can be restored.
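The mirrored order of the decoding stages can be sketched with simple stand-ins: `zlib` replaces the arithmetic coder, a fixed-step quantizer replaces the learned model, and a given predicted row replaces motion compensation. These substitutions, the function names and the sample values are all assumptions for illustration; the sketch shows only the data flow, not the method of the application.

```python
import zlib

def encode_row(current, predicted, step):
    """Residual, then quantization, then lossless packing (stand-ins for the encoder stages)."""
    residual = [c - p for c, p in zip(current, predicted)]
    indices = [round(r / step) for r in residual]
    return zlib.compress(bytes(i & 0xFF for i in indices))  # assumes indices fit in one signed byte

def decode_row(payload, predicted, step):
    """Undo the stages in reverse order: unpack, dequantize, add the prediction."""
    raw = zlib.decompress(payload)
    indices = [b - 256 if b > 127 else b for b in raw]  # restore signed values
    residual = [i * step for i in indices]              # inverse quantization
    return [p + r for p, r in zip(predicted, residual)]

# Hypothetical samples whose residuals are exact multiples of the step.
predicted = [100, 100, 100, 100]
current = [100, 104, 96, 100]

payload = encode_row(current, predicted, step=4)
print(decode_row(payload, predicted, step=4))  # matches the current row
```

The recovery is exact here only because the residuals happen to be multiples of the step; in general the quantization stage bounds the error rather than removing it, while the lossless stage itself never adds any error.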
In this scheme, the model and the algorithms used for decoding are the same as those used for encoding. The compressed super prior is therefore first decoded with the arithmetic coding algorithm, the trained super prior machine learning model is then used for decoding to obtain the first residual, and finally the conventional first encoding and decoding algorithm is used to obtain the lossless original video. By combining the super prior machine learning model with the lossless coding algorithm, fast and efficient lossless decoding of the compressed video is achieved.
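The order of the decoding stages can be illustrated with a small round trip. The sketch below rests on several assumptions that are not in the application: `zlib` stands in for the arithmetic coder (both are lossless, but they are different algorithms), a uniform quantizer with a hypothetical step size `STEP` stands in for the trained super prior machine learning model, and frames are flat lists of pixel values. It shows only that the lossless stage returns the quantized residual exactly, and that any reconstruction error comes from quantization.

```python
import struct
import zlib

STEP = 4  # hypothetical quantization step size


def encode_frame(frame, prediction):
    """Residual -> quantize -> lossless coding (zlib as a stand-in)."""
    residual = [f - p for f, p in zip(frame, prediction)]
    quantized = [round(r / STEP) for r in residual]
    payload = struct.pack(f"<{len(quantized)}h", *quantized)
    return zlib.compress(payload)


def decode_frame(bitstream, prediction):
    """Lossless decoding -> inverse quantization -> add the prediction back."""
    payload = zlib.decompress(bitstream)
    quantized = struct.unpack(f"<{len(payload) // 2}h", payload)
    residual = [q * STEP for q in quantized]
    return [p + r for p, r in zip(prediction, residual)]


frame = [120, 121, 119, 130, 90, 60]
prediction = [118, 118, 118, 118, 95, 70]

bitstream = encode_frame(frame, prediction)
reconstructed = decode_frame(bitstream, prediction)
max_error = max(abs(a - b) for a, b in zip(frame, reconstructed))
```

With a uniform quantizer the per-pixel error is bounded by half the step size, so `max_error` here cannot exceed `STEP / 2`.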
Based on the same concept, an embodiment of the present application also provides a model training method. Fig. 3 is a schematic flowchart of a model training method according to an embodiment of the present application. As can be seen from Fig. 3, the method includes the following steps:
Step 301: compress the video to be processed by using a first encoding and decoding algorithm to obtain a first residual sample.
Step 302: input the first residual sample into a super prior machine learning model to be trained and train the model to obtain a super prior that enables the lossless coding to output an optimal probability; the trained super prior machine learning model and the lossless coding are used to encode the video to be processed to obtain the compressed video.
In step 302, inputting the first residual sample into the super prior machine learning model to be trained and training the model to obtain a super prior that enables the lossless coding to output an optimal probability includes:
Step 3021: input the first residual sample into the super prior machine learning model to be trained, and feed the super prior obtained by training into the lossless coding to obtain an encoded bit rate.
Step 3022: compare the encoded bit rates.
Step 3023: if an encoded bit rate is the lowest bit rate, the corresponding super prior is the super prior that enables the lossless coding to output the optimal probability.
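Steps 3021 to 3023 amount to choosing the probability model under which the lossless coder spends the fewest bits. The sketch below illustrates that selection under stated assumptions: the candidate super priors are hand-written probability tables rather than outputs of a trained network, and the bit count is the ideal code length (the sum of negative base-2 log probabilities), which an arithmetic coder approaches closely. The names `ideal_bits` and `select_prior` are hypothetical.

```python
import math


def ideal_bits(symbols, prob):
    """Ideal code length in bits of `symbols` under the probability table `prob`."""
    return sum(-math.log2(prob[s]) for s in symbols)


def select_prior(symbols, candidates):
    """Return the name of the candidate with the lowest bit cost, plus all costs."""
    costs = {name: ideal_bits(symbols, prob) for name, prob in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs


# A quantized residual sample that is mostly zero, as residuals tend to be.
symbols = [0, 0, 0, 1, 0, -1, 0, 0]

candidates = {
    "uniform": {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3},
    "peaked": {-1: 0.125, 0: 0.75, 1: 0.125},
}

best, costs = select_prior(symbols, candidates)
```

The table that concentrates probability on zero matches the data better and therefore yields the lower bit cost. A trained model would play the role of generating such tables from the input signal.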
In practical application, the super prior machine learning model is trained with the first residual as the training sample. By learning the relationship between the characteristics of the input signal and the lowest bit rate, the model generates prior knowledge that helps reduce the bit rate during arithmetic coding. Training is straightforward, the model structure is simple, and the trained machine learning model is lightweight.
The super prior may be considered herein as an encoder (encoder) that generates a series of encoding probability distributions based on the properties and statistics of the input signal. These probability distributions may guide the process of arithmetic coding to achieve more efficient data compression. The super prior machine learning model is trained and the optimal super prior is found, so that the compression performance of the data can be further improved by utilizing the super prior machine learning model, and the required bit number is reduced.
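How a probability distribution guides arithmetic coding can be shown with a toy coder. The sketch below is an assumption-laden illustration, not the coder used in the application: it uses exact rational arithmetic from Python's `fractions` module instead of the finite-precision integer arithmetic of practical codecs, the probability table is hand-written, and the names `arithmetic_encode` and `arithmetic_decode` are hypothetical.

```python
from fractions import Fraction


def cumulative(prob):
    """Map each symbol to its [low, high) slice of the unit interval."""
    ranges, low = {}, Fraction(0)
    for sym, p in prob.items():
        ranges[sym] = (low, low + p)
        low += p
    return ranges


def arithmetic_encode(symbols, prob):
    """Narrow the interval once per symbol; return a point inside it and its width."""
    ranges = cumulative(prob)
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        s_low, s_high = ranges[s]
        low, width = low + width * s_low, width * (s_high - s_low)
    return low + width / 2, width


def arithmetic_decode(code, count, prob):
    """Recover `count` symbols by locating the code point and rescaling."""
    ranges = cumulative(prob)
    out = []
    for _ in range(count):
        for sym, (s_low, s_high) in ranges.items():
            if s_low <= code < s_high:
                out.append(sym)
                code = (code - s_low) / (s_high - s_low)
                break
    return out


prob = {0: Fraction(3, 4), 1: Fraction(1, 8), -1: Fraction(1, 8)}
symbols = [0, 0, 1, 0, -1, 0]

code, width = arithmetic_encode(symbols, prob)
decoded = arithmetic_decode(code, len(symbols), prob)
```

The final interval width equals the product of the symbol probabilities, so likely symbols shrink the interval less and cost fewer bits. This is why a better-matched distribution lowers the bit rate while decoding remains exact.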
The super prior machine learning model referred to herein may be a convolutional-neural-network-based machine learning model capable of compressing video. The video is first quantized by the model (and is therefore lossy), and the network then creates a model that provides the arithmetic coding weights for the compressed bitstream. Although the model itself is lossy, the actual compression after quantization is arithmetic coding and is lossless, just as in a conventional codec.
As can be seen from the above, machine learning is combined with a conventional codec algorithm in order to use the hardware infrastructure of existing standard codecs while at least partially using machine learning to improve compression performance. Any standard video or image technique may be employed to reduce the entropy of the video input and to quantize the signal in a lossy manner. This output is then used to train a machine learning (AI) model to find the super prior that results in the lowest bit rate for arithmetic coding. The super prior machine learning model is also encoded in a lossy manner and sent to the decoding end, so that lossless and efficient video encoding and decoding are realized. The scheme thereby balances the computational efficiency of existing hardware and software compression algorithms against the compression efficiency of convolutional neural network video compression techniques.
For ease of understanding, the video encoding and decoding process is illustrated by the specific embodiments below. Fig. 4a is a schematic diagram of the video encoding and training process according to an embodiment of the present application. As can be seen from Fig. 4a, the video is first encoded by a conventional method, which generates a first residual directly related to the original frame.
The first residual of this first encoding is then quantized, and the quantized first residual is encoded by the trained super prior machine learning model. Because the encoded frame and the original frame are typically very similar, the super prior machine learning model can be simpler and more accurate. The purpose of the model is to reduce the entropy of the residual map while preserving residual accuracy. Next, the new residual map generated by the model (i.e., the second residual) is quantized, with a small loss of quality. Finally, the quantized second residual is losslessly coded.
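The effect of quantization on the entropy of a residual can be checked numerically. The sketch below assumes a uniform quantizer with a hypothetical step size of 4 and measures the empirical (zeroth-order) entropy of a made-up residual sample before and after quantization; the trained model in the application is expected to reduce entropy further, which this sketch does not attempt to show.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Empirical entropy in bits per symbol."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def quantize(values, step):
    """Uniform quantization: map each value to the index of its nearest bin."""
    return [round(v / step) for v in values]


# Made-up residual values clustered around zero.
residual = [-7, -3, -2, -1, 0, 0, 1, 1, 2, 3, 5, 9]
quantized = quantize(residual, 4)

before = entropy_bits(residual)
after = entropy_bits(quantized)
```

Quantization merges nearby values into the same bin, so fewer distinct symbols remain and `after` is smaller than `before`. The lossless coder then needs fewer bits per symbol, at the cost of the small quality loss mentioned above.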
Fig. 4b is a schematic diagram of a video decoding process according to an embodiment of the present application. As can be seen from Fig. 4b, arithmetic decoding is performed on the compressed super prior, and a super prior for decoding the compressed video is then created by a decoding model (the same model used in the encoding stage, i.e., the super prior machine learning model).
The output of the decoder may be fed into any standard video entropy decoder and used to recreate a lossy decompressed version of the original video or image.
Fig. 5 is a block diagram of a video processing apparatus according to an exemplary embodiment. As shown in fig. 5, in this embodiment, the video processing apparatus includes:
An obtaining module 51, configured to obtain a video to be processed.
A first encoding module 52, configured to compress the video to be processed by using a first encoding and decoding algorithm to obtain a first residual.
A machine learning module 53, configured to input the first residual into a trained super prior machine learning model for lossy compression processing to obtain a quantized second residual.
A second encoding module 54, configured to encode the quantized second residual by using lossless coding to obtain a compressed video.
The machine learning module 53 is further configured to: quantize the first residual to obtain a quantized first residual; perform lossy compression processing on the quantized first residual by using the trained super prior machine learning model to obtain a second residual that reduces the entropy of the first residual; and quantize the second residual to obtain the quantized second residual.
The second encoding module 54 is further configured to perform lossless coding on the quantized second residual by using arithmetic coding to obtain the compressed video.
The first encoding module 52 is further configured to encode the video to be processed by using the first encoding and decoding algorithm to obtain a first residual whose entropy is reduced by the encoding.
Optionally, the apparatus further includes a decoding module 55, configured to: decode the compressed super prior by using the arithmetic coding to obtain the quantized second residual; decode the quantized second residual by using the trained super prior machine learning model to obtain the first residual; and decode the compressed video by using the first residual and the first encoding and decoding algorithm to obtain the original video.
FIG. 6 is a block diagram of a model training apparatus, according to an example embodiment. As shown in fig. 6, in this embodiment, the model training apparatus includes:
A compression module 61, configured to compress the video to be processed by using a first encoding and decoding algorithm to obtain a first residual sample.
A training module 62, configured to input the first residual sample into a super prior machine learning model to be trained and train the model to obtain a super prior that enables the lossless coding to output an optimal probability; the trained super prior machine learning model and the lossless coding are used to encode the video to be processed to obtain the compressed video.
The training module 62 is further configured to: input the first residual sample into the super prior machine learning model to be trained, and feed the super prior obtained by training into the lossless coding to obtain an encoded bit rate; compare the encoded bit rates; and, if an encoded bit rate is the lowest bit rate, determine that the corresponding super prior is the super prior that enables the lossless coding to output the optimal probability.
An embodiment of the present application also provides an electronic device, which includes a processor and a memory; the memory is used to store a computer program executable by the processor; the processor is configured to execute the computer program in the memory to implement the video processing method or the model training method according to any of the above embodiments.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the video processing method or the model training method described in any of the above embodiments.
The specific manner in which the processor performs the operations in the apparatus of the above embodiments has been described in detail in connection with the embodiments of the method, and will not be described in detail herein.
Fig. 7 is a block diagram of an electronic device, according to an example embodiment. For example, the electronic device 900 may be provided as a server. Referring to FIG. 7, device 900 includes a processing component 922 that further includes one or more processors, and memory resources represented by memory 932, for storing instructions, such as application programs, executable by processing component 922. The application programs stored in memory 932 may include one or more modules that each correspond to a set of instructions. Further, processing component 922 is configured to execute instructions to perform the above-described method for video processing.
The device 900 may also include a power component 926 configured to perform power management for the device 900, a wired or wireless network interface 950 configured to connect the device 900 to a network, and an input/output (I/O) interface 958. The device 900 may operate based on an operating system stored in the memory 932, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, or the like.
In an exemplary embodiment, a non-transitory computer-readable storage medium is also provided, such as the memory 932, that includes instructions executable by the processing component 922 of the device 900 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
In the present invention, the terms "first", "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The term "plurality" refers to two or more, unless explicitly defined otherwise.
The embodiments are described above to help those of ordinary skill in the art understand and apply the present application. It will be apparent to those skilled in the art that various modifications can be made to these embodiments and that the general principles described herein can be applied to other embodiments without inventive effort. Therefore, the present application is not limited to the embodiments herein, and improvements and modifications made by those skilled in the art based on the present disclosure fall within the scope and spirit of the present application.

Claims (10)

1. A method of video processing, the method comprising:
acquiring a video to be processed;
compressing the video to be processed by using a first encoding and decoding algorithm to obtain a first residual error;
inputting the first residual error into a trained super prior machine learning model for lossy compression processing to obtain a quantized second residual error; and
encoding the quantized second residual error by using lossless encoding to obtain a compressed video.
2. The method of claim 1, wherein said inputting the first residual into a trained super a priori machine learning model for lossy compression to obtain a quantized second residual comprises:
performing quantization processing on the first residual error to obtain a quantized first residual error;
performing lossy compression processing on the quantized first residual error by using the trained super prior machine learning model to obtain a second residual error that reduces the entropy of the first residual error; and
performing quantization processing on the second residual error to obtain the quantized second residual error.
3. The method of claim 1, wherein said encoding said quantized second residual using lossless coding results in compressed video, comprising:
and carrying out lossless coding processing on the quantized second residual error by utilizing arithmetic coding to obtain the compressed video.
4. The method of claim 1, wherein compressing the video to be processed using a first codec algorithm to obtain a first residual comprises:
encoding the video to be processed by using the first encoding and decoding algorithm to obtain a first residual error whose entropy is reduced by the encoding processing.
5. The method of claim 1, further comprising, after obtaining the compressed video:
decoding the compressed super prior by using arithmetic coding to obtain the quantized second residual error;
decoding the quantized second residual error by using the trained super prior machine learning model to obtain the first residual error; and
decoding the compressed video by using the first residual error and the first encoding and decoding algorithm to obtain an original video.
6. A method of model training, the method comprising:
compressing the video to be processed by using a first encoding and decoding algorithm to obtain a first residual error sample;
inputting the first residual error sample into a super prior machine learning model to be trained for training, to obtain a super prior that enables lossless coding to output an optimal probability; wherein the trained super prior machine learning model and the lossless coding are used for encoding the video to be processed to obtain a compressed video.
7. The method of claim 6, wherein inputting the first residual sample into a super prior machine learning model to be trained to perform training to obtain a super prior that results in an optimal probability of lossless encoding output, comprising:
inputting the first residual error sample into the super prior machine learning model to be trained, and inputting the super prior obtained by training into the lossless coding to obtain an encoded bit rate;
comparing the encoded bit rates; and
if an encoded bit rate is the lowest bit rate, determining that the corresponding super prior is the super prior that enables the lossless coding to output the optimal probability.
8. A video processing apparatus, the apparatus comprising:
an acquisition module, configured to acquire a video to be processed;
a first encoding module, configured to compress the video to be processed by using a first encoding and decoding algorithm to obtain a first residual error;
a machine learning module, configured to input the first residual error into a trained super prior machine learning model for lossy compression processing to obtain a quantized second residual error; and
a second encoding module, configured to encode the quantized second residual error by using lossless encoding to obtain a compressed video.
9. An electronic device comprising a processor and a memory having stored therein at least one instruction, at least one program, code set, or instruction set that is loaded and executed by the processor to implement the method of any one of claims 1 to 5, or the method of any one of claims 6 to 7.
10. A computer readable medium having stored thereon at least one instruction, at least one program, code set, or instruction set, loaded and executed by a processor to implement a method according to any of claims 1 to 5, or any of claims 6 to 7.
CN202410066673.0A 2024-01-16 2024-01-16 Video processing method, model training method, device, electronic equipment and storage medium Pending CN117896525A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410066673.0A CN117896525A (en) 2024-01-16 2024-01-16 Video processing method, model training method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410066673.0A CN117896525A (en) 2024-01-16 2024-01-16 Video processing method, model training method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117896525A true CN117896525A (en) 2024-04-16

Family

ID=90650705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410066673.0A Pending CN117896525A (en) 2024-01-16 2024-01-16 Video processing method, model training method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117896525A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination