CN110234011B - Video compression method and system - Google Patents
Video compression method and system
- Publication number
- CN110234011B CN201910318187.2A
- Authority
- CN
- China
- Prior art keywords
- data
- residual
- frame
- dimension
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/625—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Discrete Mathematics (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention discloses a video compression method and system. The method comprises the following steps: determining a frame to be encoded and a reference frame in a target video, and calculating residual data of the frame to be encoded relative to the reference frame; extracting an expected vector and a variance vector of the residual data respectively; and performing normal-distribution sampling on the expected vector and the variance vector to obtain compressed data of the frame to be encoded, wherein the dimensionality of the compressed data is lower than that of the residual data. The technical solution provided by the application can compress video files effectively.
Description
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a video compression method and system.
Background
As video definition continues to improve, the data volume of video files keeps growing. To save the bandwidth needed for transmitting video files, efficient and stable video compression schemes are required.
In current mainstream video compression schemes, the video data is first quantized, and the video file is then encoded after the quantization result has been scanned. Specifically, the video data may be quantized with a quantization table, and the quantization result may then be scanned in zigzag order. This allows some of the 0 values in the video data to be discarded, thereby reducing the data amount of the video file.
However, the prior-art video compression scheme compresses well only for video files containing many 0 values; for video files with few 0 values, little data can be discarded and the compression effect is not ideal. If the data is instead compressed by increasing the quantization step size, the distortion of the compressed video file becomes higher. A more effective video compression scheme is therefore needed.
Disclosure of Invention
The application aims to provide a video compression method and a video compression system, which can effectively compress video files.
To achieve the above object, one aspect of the present application provides a video compression method, including: determining a frame to be encoded and a reference frame in a target video, and calculating residual data of the frame to be encoded relative to the reference frame; extracting an expected vector and a variance vector of the residual data respectively; and performing normal-distribution sampling on the expected vector and the variance vector to obtain compressed data of the frame to be encoded, wherein the dimensionality of the compressed data is lower than that of the residual data.
To achieve the above object, another aspect of the present application further provides a video compression system, including: the residual data calculation unit is used for determining a frame to be coded and a reference frame in a target video and calculating residual data of the frame to be coded relative to the reference frame; a vector extraction unit for extracting an expected vector and a variance vector of the residual data, respectively; and the data compression unit is used for performing normal distribution sampling on the expected vector and the variance vector to obtain compressed data of the frame to be coded, wherein the dimensionality of the compressed data is lower than that of the residual data.
To achieve the above object, another aspect of the present application further provides a video compression apparatus, which includes a memory for storing a computer program and a processor, wherein the computer program, when executed by the processor, implements the video compression method described above.
As can be seen from the above, according to the technical solution provided by the present application, for a frame to be encoded in a target video, a reference frame of the frame to be encoded may be determined in advance. The reference frame, when compressed, may retain the content of the full frame. For the frame to be encoded, the residual data of the frame to be encoded relative to the reference frame can be calculated, so that when the frame is encoded, only the residual data needs to be encoded, which greatly reduces the amount of data required for encoding. To further reduce the amount of data required for encoding, characteristic parameters that can characterize the residual data may be extracted from the residual data. In the present application, the characteristic parameters may be an expected vector and a variance vector of the residual data. The dimensions of the extracted expected vector and variance vector are lower than those of the original residual data, so that data dimension reduction is realized. Normal-distribution sampling may then be performed on the expected vector and the variance vector. One purpose of this is to remove noise in the expected vector and the variance vector, thereby improving the accuracy of data compression. On the other hand, the data after normal-distribution sampling conforms to the natural distribution rule of the data: after sampling, the expected vector and the variance vector are in effect preliminarily restored toward the original residual data, while the sampled data differs from the original residual data only in having a lower dimensionality. In this way, the data after normal-distribution sampling retains relatively high fidelity while keeping a low dimensionality, so that data compression efficiency is improved without sacrificing fidelity. The data after normal-distribution sampling can therefore serve as the compressed data of the frame to be encoded and can be used for subsequent transmission or decoding. In summary, according to the technical solution provided by the application, the amount of data required for video compression is reduced by using residual data, and the video is compressed effectively by extracting the expected vector and the variance vector and performing normal-distribution sampling on them.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the steps of a video compression method in an embodiment of the present invention;
FIG. 2 is a diagram illustrating image processing in units of macroblocks according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a compression model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a neural network in an embodiment of the present invention;
FIG. 5 is a functional block diagram of a video compression system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The application provides a video compression method which can be applied to equipment with a data processing function. Referring to fig. 1, the method may include the following steps.
S1: determining a frame to be coded and a reference frame in a target video, and calculating residual data of the frame to be coded relative to the reference frame.
In this embodiment, the target video may be a video to be encoded (i.e., to be compressed), and a frame to be encoded and a reference frame corresponding to it may be determined in the target video. Specifically, the similarity between the frame to be encoded and a candidate frame may be calculated with an algorithm such as SATD (Sum of Absolute Transformed Differences) or SAD (Sum of Absolute Differences), and when the calculated similarity reaches a specified threshold, the candidate frame may be used as the reference frame corresponding to the frame to be encoded. Of course, in practical applications, the selection of the reference frame and the frame to be encoded may follow other criteria and is not limited to the similarity-based scheme described here; the present application does not restrict how the reference frame and the frame to be encoded are determined.
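For illustration only (not part of the original specification), a minimal Python sketch of the SAD-based similarity check described above might look like the following; the array inputs, the threshold value, and the helper names are assumptions introduced for this example.

```python
import numpy as np

def sad(frame_a: np.ndarray, frame_b: np.ndarray) -> float:
    """Sum of Absolute Differences between two equally sized frames."""
    return float(np.abs(frame_a.astype(np.int32) - frame_b.astype(np.int32)).sum())

def pick_reference(frame_to_encode: np.ndarray, candidates: list, max_sad: float):
    """Return the first candidate frame similar enough (hypothetical SAD threshold)
    to serve as the reference frame for the frame to be encoded."""
    for candidate in candidates:
        if sad(frame_to_encode, candidate) <= max_sad:
            return candidate
    return None  # no sufficiently similar reference frame found
```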
In this embodiment, after the frame to be encoded and the reference frame are determined, the difference between the two frames may be determined and the frame to be encoded may be encoded according to that difference, so that the amount of data required for encoding the frame is greatly reduced.
Specifically, residual data of the frame to be encoded relative to the reference frame may be calculated. To do so, a residual between the frame to be encoded and the reference frame may first be calculated, where the residual is the pixel-wise difference between corresponding pixels of the two frames. For example, if the reference frame and the frame to be encoded are both 28 × 28 video frames, the calculated residual may be a vector with 28 × 28 = 784 elements. Because of the relatively high similarity between the reference frame and the frame to be encoded, the vector representing the residual contains many 0 values, and these 0 values greatly reduce the encoding burden in the subsequent encoding process.
In one embodiment, the reference frame and the frame to be encoded are each divided into a predetermined number of macroblocks (MacroBlock), and the residual calculation described above may then be performed in units of macroblocks. Specifically, the frame to be encoded may be divided into a predetermined number of target macroblocks, and for each target macroblock a corresponding reference macroblock may be determined in the reference frame. Referring to fig. 2, the reference macroblock and the target macroblock pointed to by the two ends of a dotted line cover the same number of pixels and regions of the same size. In this way, the frame to be encoded and the reference frame are divided into pairs of target macroblocks and reference macroblocks. The local residual between each target macroblock and its corresponding reference macroblock may then be calculated: the pixel values of corresponding pixels in the target macroblock and the reference macroblock are subtracted, yielding a pixel difference at each pixel position. The combination of the pixel differences within a target macroblock serves as the local residual between that target macroblock and its reference macroblock. After the local residuals of all target macroblocks are calculated, the combination of these local residuals can be used as the residual between the frame to be encoded and the reference frame.
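As an illustrative sketch (assuming square 16 × 16 macroblocks and frame sides that are multiples of the block size, neither of which is fixed by the patent text), the macroblock-wise residual calculation could be written as:

```python
import numpy as np

def macroblock_residual(frame_to_encode: np.ndarray,
                        reference_frame: np.ndarray,
                        block: int = 16) -> np.ndarray:
    """Compute the local residual of each target macroblock against its reference
    macroblock; the combination of all local residuals is the frame residual."""
    h, w = frame_to_encode.shape
    residual = np.zeros((h, w), dtype=np.int32)
    for y in range(0, h, block):
        for x in range(0, w, block):
            target_mb = frame_to_encode[y:y + block, x:x + block].astype(np.int32)
            reference_mb = reference_frame[y:y + block, x:x + block].astype(np.int32)
            residual[y:y + block, x:x + block] = target_mb - reference_mb  # pixel differences
    return residual
```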
In this embodiment, the calculated residual may be converted from the time domain to the frequency domain in order to further increase the number of 0 values and thereby reduce the amount of data required for encoding. Specifically, the calculated residual may be processed with a Discrete Cosine Transform (DCT); the DCT-transformed data separates the high-frequency part from the low-frequency part, so that the data amount becomes smaller and the number of 0 values becomes larger. The frequency-domain residual obtained by this conversion can then be used as the residual data of the frame to be encoded relative to the reference frame.
Of course, in practical applications, the DCT may also be performed in units of macroblocks. Specifically, after the local residual of each target macroblock is calculated, each local residual may be converted from the time domain to the frequency domain, and the combination of the converted frequency-domain local residuals is used as the residual data of the frame to be encoded relative to the reference frame.
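A possible sketch of the block-wise time-domain-to-frequency-domain conversion, using SciPy's type-II DCT as one concrete choice (the block size and the orthonormal normalization are assumptions):

```python
import numpy as np
from scipy.fft import dctn

def residual_to_frequency(residual: np.ndarray, block: int = 16) -> np.ndarray:
    """Apply a 2-D DCT to each macroblock-sized local residual; the combination of
    the transformed blocks serves as the frequency-domain residual data."""
    h, w = residual.shape
    frequency = np.zeros((h, w), dtype=np.float64)
    for y in range(0, h, block):
        for x in range(0, w, block):
            local = residual[y:y + block, x:x + block].astype(np.float64)
            frequency[y:y + block, x:x + block] = dctn(local, norm="ortho")
    return frequency
```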
S3: and respectively extracting an expected vector and a variance vector of the residual data.
In this embodiment, after the residual data of the frame to be encoded is obtained by calculation, the residual data may be encoded. Specifically, the residual data may be encoded by a trained compression model. Referring to fig. 3, the compression model may include two units, an encoder and a decoder. The encoder receives the input residual data, extracts characteristic parameters of the residual data, and represents the residual data by these characteristic parameters. In particular, the characteristic parameters may be an expected vector and a variance vector of the residual data.
As shown in fig. 3, in practical applications the encoder may include a trained Deep Neural Network (DNN). The DNN is fitted to a large amount of input sample data, and the fitted model can then extract the corresponding expected vector and variance vector from input residual data. Specifically, in the training phase, a large number of residual data samples, together with the actual expected vector and actual variance vector corresponding to each sample, may be prepared in advance. These residual data samples may then be input in batches to the DNN to be trained. For example, the residual data of 100 video frames may be selected for each batch; assuming that each residual contains 784 elements, a residual data matrix of 100 × 784 is input per batch. The DNN to be trained processes the input residual data matrix with its initial neurons, producing a predicted expected vector and a predicted variance vector. In the training stage these predictions may differ considerably from the actual expected vector and actual variance vector, so the error may be fed back to the DNN, which adjusts the weight coefficients of its internal neurons until, when residual data samples are input again, the actual expected vector and actual variance vector can be correctly predicted. A DNN trained on a large number of residual data samples can therefore accurately predict the expected vector and variance vector corresponding to the residual data of the current frame to be encoded.
Referring to fig. 3 and fig. 4, the trained DNN may include a plurality of Fully Connected Layers (FCLs), which implement both the extraction of the expected vector and the variance vector and the dimension reduction of the data. Specifically, in fig. 4 it is assumed that the residual data of 100 frames to be encoded are input, so the input residual data may be represented as a 100 × 784 matrix. Through the first fully-connected layer in the deep neural network, the residual data may be reduced from a first dimension (100 × 784) to a second dimension (100 × 256). In particular, the dimension reduction may be implemented by means of a convolution kernel, which computes a weighted average of the pixel values in a region of the frame to be encoded and replaces those pixel values with the weighted average, thereby achieving the dimension-reduction effect. Subsequently, the expected vector and the variance vector of the second-dimension residual data can be extracted through a second fully-connected layer and a third fully-connected layer in the deep neural network respectively. Compared with the second-dimension data, the extracted expected vector and variance vector are further reduced in dimension: their dimension (100 × 128) is lower than the second dimension (100 × 256).
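The fully-connected encoder just described (first dimension 784, second dimension 256, expected and variance vectors of dimension 128) could be sketched in PyTorch as follows; the choice of PyTorch, the ReLU placement, and the use of a log-variance parameterization are assumptions for illustration, not details taken from the patent.

```python
import torch
import torch.nn as nn

class ResidualEncoder(nn.Module):
    """First FC layer reduces residual data from the first dimension (784) to the
    second dimension (256); the second and third FC layers then extract the
    expected vector and the variance vector (dimension 128) respectively."""
    def __init__(self, first_dim: int = 784, second_dim: int = 256, latent_dim: int = 128):
        super().__init__()
        self.fc1 = nn.Linear(first_dim, second_dim)          # first fully-connected layer
        self.fc_mean = nn.Linear(second_dim, latent_dim)     # second FC layer: expected vector
        self.fc_logvar = nn.Linear(second_dim, latent_dim)   # third FC layer: (log-)variance vector

    def forward(self, residual: torch.Tensor):
        hidden = torch.relu(self.fc1(residual))              # shape (batch, 256)
        return self.fc_mean(hidden), self.fc_logvar(hidden)  # each of shape (batch, 128)
```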
S5: performing normal-distribution sampling on the expected vector and the variance vector to obtain compressed data of the frame to be encoded, wherein the dimensionality of the compressed data is lower than that of the residual data.
In this embodiment, in order to make the extracted expected vector and variance vector conform to the natural distribution rule of the data, normal-distribution sampling may be performed on them, which preliminarily restores the original residual data. The data obtained by normal-distribution sampling differs from the original residual data only in that its dimensionality is lower.
Specifically, referring to fig. 4, the deep neural network may further include a normal sampling layer for performing normal-distribution sampling. The expected vector and the variance vector extracted from the second-dimension (100 × 256) residual data by the preceding fully-connected layers may be input into this normal sampling layer for normal-distribution sampling, yielding compressed data of a third dimension (100 × 128). In this way, not only is the dimension of the compressed data lower than the second dimension, but noise in the expected vector and the variance vector is also removed by the normal-distribution sampling. Finally, the compressed data output by the normal sampling layer can be used as the compressed data of the frame to be encoded.
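A minimal sketch of the normal sampling layer, using the usual reparameterization form and assuming the variance vector is carried as a log-variance (an assumption; the patent does not fix this parameterization):

```python
import torch

def normal_sampling_layer(mean: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Sample element-wise from N(mean, exp(logvar)); the sampled tensor has the
    third dimension (e.g. 100 x 128) and serves as the compressed data."""
    std = torch.exp(0.5 * logvar)   # standard deviation from the (log-)variance vector
    noise = torch.randn_like(std)   # standard normal noise
    return mean + noise * std
```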
In one embodiment, to measure the data distortion introduced by normal-distribution sampling, the relative entropy (KL divergence) of the residual data may be calculated from the expected vector and the variance vector, and the distortion after normal-distribution sampling may be characterized by this relative entropy. In one application example, the relative entropy can be represented by the following formula:
where KL represents the relative entropy, ε_i represents the variance vector corresponding to the i-th frame to be encoded, and μ_i represents the expected vector corresponding to the i-th frame to be encoded.
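The formula itself is not reproduced in this text. For reference, a commonly used closed form for the relative entropy between the sampling distribution N(μ_i, ε_i) and a standard normal prior, consistent with the variables defined above, would be the following (this exact expression is an assumption and is not quoted from the patent):

```latex
\mathrm{KL}\big(\mathcal{N}(\mu_i,\varepsilon_i)\,\big\|\,\mathcal{N}(0,I)\big)
  = \frac{1}{2}\sum_{j}\left(\mu_{i,j}^{2} + \varepsilon_{i,j} - \log \varepsilon_{i,j} - 1\right)
```

where j runs over the elements of the 128-dimensional expected and variance vectors.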
Through the calculation result of the relative entropy, the process of the normal distribution sampling can be adjusted, so that the distortion degree after the normal distribution sampling is kept in a reasonable range.
In this embodiment, the compressed data produced by the encoder may be transmitted in a subsequent process. After the compressed data is received, it can be decoded using the decoder shown in fig. 3. Specifically, referring to fig. 3 and 4, a decoding neural network may be constructed based on the trained DNN, and the compressed data of the frame to be encoded is inversely reconstructed by this decoding neural network, so that the compressed data is restored to decoded data matching the dimension of the residual data.
Specifically, the decoding neural network may be an inverse network of the trained DNN and may include two fully-connected layers. As shown in fig. 4, the compressed data of the frame to be encoded may first be input to the first fully-connected layer of the decoding neural network, which restores the compressed data from the low dimension of 100 × 128 to the second dimension of 100 × 256. The data restored to the second dimension may then be input into the second fully-connected layer of the decoding neural network, which restores it from the second dimension of 100 × 256 to the first dimension of 100 × 784. In this way, decoded data having the same dimension as the original residual data is obtained, and the data restored to the first dimension can be used as the decoded data matching the dimension of the residual data.
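The decoding neural network just described (128 → 256 → 784, with the ReLU and Sigmoid activations mentioned later in the description) could be sketched as follows; as before, PyTorch and the exact layer definitions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ResidualDecoder(nn.Module):
    """Inverse network of the encoder: restores compressed data from the third
    dimension (128) to the second dimension (256) and then to the first (784)."""
    def __init__(self, latent_dim: int = 128, second_dim: int = 256, first_dim: int = 784):
        super().__init__()
        self.fc1 = nn.Linear(latent_dim, second_dim)  # first FC layer of the decoder (ReLU)
        self.fc2 = nn.Linear(second_dim, first_dim)   # second FC layer of the decoder (Sigmoid)

    def forward(self, compressed: torch.Tensor) -> torch.Tensor:
        hidden = torch.relu(self.fc1(compressed))     # data restored to the second dimension
        return torch.sigmoid(self.fc2(hidden))        # decoded data matching the residual dimension
```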
In one embodiment, in order to evaluate the overall encoding and decoding effect of the encoder and decoder, after the compressed data of the frame to be encoded is inversely reconstructed, the error between the restored decoded data and the residual data may be calculated, and the cross entropy may be computed from this error and the relative entropy; the distortion of the decoded data relative to the residual data can then be characterized by the cross entropy. In one application example, the cross entropy can be simplified to the following formula:
C = -log P(X'|X) + KL
where C represents the cross entropy, X' represents the decoded data, X represents the residual data, -log P(X'|X) represents the error between the decoded data and the residual data, and KL represents the relative entropy.
In this way, in combination with the relative entropy and the cross entropy, the neural networks of the encoder and the decoder can be corrected so that the distortion after encoding and decoding is within an allowable range or is minimized.
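A sketch of how the combined objective C = -log P(X'|X) + KL might be evaluated during training, using a binary cross-entropy reconstruction term as one possible choice for -log P(X'|X) and assuming the residual data has been normalized to [0, 1] (both assumptions, not fixed by the patent):

```python
import torch
import torch.nn.functional as F

def compression_loss(decoded: torch.Tensor, residual: torch.Tensor,
                     mean: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Reconstruction error between decoded data and residual data plus the relative
    entropy computed from the extracted expected and (log-)variance vectors."""
    recon = F.binary_cross_entropy(decoded, residual, reduction="sum")  # -log P(X'|X) term
    kl = 0.5 * torch.sum(mean.pow(2) + logvar.exp() - logvar - 1.0)     # relative entropy (KL)
    return recon + kl
```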
In this embodiment, the decoded data obtained by decoding may subsequently be encoded with an existing encoding method (for example, CABAC or VLC), which is not limited in this application.
In practical applications, a suitable activation function needs to be selected for each fully-connected layer. For example, the first fully-connected layer described above may use a ReLU (Rectified Linear Unit) activation function. As another example, in the decoding neural network described above, the first fully-connected layer and the second fully-connected layer may use a ReLU activation function and a Sigmoid activation function, respectively. Of course, in practice other activation functions, such as Tanh, can be selected flexibly according to the fitting effect and the actual requirements.
Referring to fig. 5, the present application further provides a video compression system, which includes:
the residual data calculation unit is used for determining a frame to be coded and a reference frame in a target video and calculating residual data of the frame to be coded relative to the reference frame;
a vector extraction unit for extracting an expected vector and a variance vector of the residual data, respectively;
and the data compression unit is used for performing normal distribution sampling on the expected vector and the variance vector to obtain compressed data of the frame to be coded, wherein the dimensionality of the compressed data is lower than that of the residual data.
In one embodiment, the residual data calculation unit includes:
and the frequency domain conversion module is used for calculating a residual error between the frame to be coded and the reference frame, converting the residual error from a time domain to a frequency domain, and taking the residual error of the frequency domain obtained through conversion as residual error data of the frame to be coded relative to the reference frame.
In one embodiment, the system further comprises:
the neural network input unit is used for inputting the residual error data into a deep neural network which completes training, and the deep neural network comprises a plurality of full connection layers;
a dimensionality reduction unit for reducing the residual data from a first dimension to a second dimension through a first fully-connected layer in the deep neural network;
correspondingly, the vector extraction unit is further configured to extract an expected vector and a variance vector of the residual data of the second dimension through a second fully-connected layer and a third fully-connected layer in the deep neural network, respectively; wherein the desired vector and the variance vector are lower in dimension than the second dimension.
In one embodiment, the system further comprises:
and the decoding unit is used for carrying out reverse reconstruction on the compressed data of the frame to be coded so as to restore the compressed data into decoded data matched with the dimensionality of the residual data.
In one embodiment, the decoding unit includes:
a decoding network input module, configured to input compressed data of the frame to be encoded into a first fully-connected layer in a decoding neural network, so as to restore the compressed data to the second dimension;
and the data restoration module is used for inputting the data restored to the second dimension into a second fully-connected layer of the decoding neural network so as to restore the second-dimension data to the first dimension, and for taking the data restored to the first dimension as the decoded data matching the dimension of the residual data.
In one embodiment, the system further comprises:
the relative entropy calculating unit is used for calculating the relative entropy of the residual data according to the expected vector and the variance vector, and representing the distortion degree after normal distribution sampling through the relative entropy;
and the cross entropy calculation unit is used for calculating an error between the restored decoded data and the residual data, calculating a cross entropy of the error and the relative entropy, and representing the distortion degree of the decoded data relative to the residual data through the cross entropy.
As can be seen from the above, according to the technical solution provided by the present application, for a frame to be encoded in a target video, a reference frame of the frame to be encoded may be determined in advance. The reference frame, when compressed, may retain the content of the full frame. For the frame to be encoded, the residual data of the frame to be encoded relative to the reference frame can be calculated, so that when the frame is encoded, only the residual data needs to be encoded, which greatly reduces the amount of data required for encoding. To further reduce the amount of data required for encoding, characteristic parameters that can characterize the residual data may be extracted from the residual data. In the present application, the characteristic parameters may be an expected vector and a variance vector of the residual data. The dimensions of the extracted expected vector and variance vector are lower than those of the original residual data, so that data dimension reduction is realized. Normal-distribution sampling may then be performed on the expected vector and the variance vector. One purpose of this is to remove noise in the expected vector and the variance vector, thereby improving the accuracy of data compression. On the other hand, the data after normal-distribution sampling conforms to the natural distribution rule of the data: after sampling, the expected vector and the variance vector are in effect preliminarily restored toward the original residual data, while the sampled data differs from the original residual data only in having a lower dimensionality. In this way, the data after normal-distribution sampling retains relatively high fidelity while keeping a low dimensionality, so that data compression efficiency is improved without sacrificing fidelity. The data after normal-distribution sampling can therefore serve as the compressed data of the frame to be encoded and can be used for subsequent transmission or decoding. In summary, according to the technical solution provided by the application, the amount of data required for video compression is reduced by using residual data, and the video is compressed effectively by extracting the expected vector and the variance vector and performing normal-distribution sampling on them.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (15)
1. A method of video compression, the method comprising:
determining a frame to be coded and a reference frame in a target video, and calculating residual error data of the frame to be coded relative to the reference frame;
inputting the residual data into a deep neural network which completes training, wherein the deep neural network comprises a plurality of full connection layers, reducing the dimensionality of the residual data through the full connection layers, and extracting an expected vector and a variance vector of the residual data after dimensionality reduction;
the deep neural network further comprises a normal sampling layer, the expected vector and the variance vector of the residual data after dimensionality reduction are input into the normal sampling layer for normal distribution sampling, compressed data are obtained and serve as the compressed data of the frame to be coded, and dimensionality of the compressed data is lower than that of the residual data after dimensionality reduction.
2. The method of claim 1, wherein calculating residual data of the frame to be encoded relative to the reference frame comprises:
and calculating a residual error between the frame to be coded and the reference frame, converting the residual error from a time domain to a frequency domain, and taking the residual error of the frequency domain obtained by conversion as residual error data of the frame to be coded relative to the reference frame.
3. The method of claim 2, wherein computing a residual between the frame to be encoded and a reference frame comprises:
dividing the frame to be coded into a preset number of target macro blocks, and determining a reference macro block corresponding to each target macro block in the reference frame;
respectively calculating local residual errors between each target macro block and the corresponding reference macro block, and taking the combination of the local residual errors corresponding to each target macro block as the residual error between the frame to be coded and the reference frame;
correspondingly, each local residual is converted from a time domain to a frequency domain, and the combination of the local residuals of the converted frequency domain is used as residual data of the frame to be coded relative to the reference frame.
4. The method of claim 1, wherein the reducing the dimensionality of the residual data by the plurality of fully-connected layers and extracting the desired vector and the variance vector of the residual data after the reducing the dimensionality comprises:
reducing the residual data from a first dimension to a second dimension through a first fully-connected layer in the deep neural network;
extracting an expected vector and a variance vector of residual data of the second dimension through a second full connection layer and a third full connection layer in the deep neural network respectively; wherein the desired vector and the variance vector are lower in dimension than the second dimension.
5. The method according to claim 4, wherein inputting the desired vector and the variance vector of the residual data after dimensionality reduction into the normal sampling layer for normal distribution sampling to obtain compressed data comprises:
inputting the expected vector and the variance vector of the residual data of the second dimension into the normal sampling layer for normal distribution sampling to obtain compressed data of a third dimension; wherein the third dimension is lower than the second dimension.
6. The method of claim 4, wherein after obtaining the compressed data of the frame to be encoded, the method further comprises:
and performing reverse reconstruction on the compressed data of the frame to be coded so as to restore the compressed data into decoded data matched with the dimensionality of the residual data.
7. The method of claim 6, wherein inversely reconstructing the compressed data of the frame to be encoded comprises:
inputting compressed data of the frame to be encoded into a first full-connection layer in a decoding neural network so as to restore the compressed data to the second dimension;
inputting the data restored to the second dimension into a second fully-connected layer of the decoding neural network so as to restore the data of the second dimension to the first dimension, and using the data restored to the first dimension as decoding data matched with the dimension of the residual data.
8. The method of claim 6, wherein after normally distributing the desired vector and the variance vector, the method further comprises:
calculating the relative entropy of the residual data according to the expected vector and the variance vector, and representing the distortion degree after normal distribution sampling through the relative entropy;
accordingly, after inversely reconstructing the compressed data of the frame to be encoded, the method further comprises:
and calculating an error between the restored decoded data and the residual data, calculating a cross entropy of the error and the relative entropy, and representing the distortion degree of the decoded data relative to the residual data through the cross entropy.
9. A video compression system, the system comprising:
the residual data calculation unit is used for determining a frame to be coded and a reference frame in a target video and calculating residual data of the frame to be coded relative to the reference frame;
the neural network input unit is used for inputting the residual data into a deep neural network which completes training, and the deep neural network comprises a plurality of full-connection layers and normal sampling layers;
a dimensionality reduction unit for reducing the dimensionality of the residual data by the plurality of fully connected layers;
the vector extraction unit is used for extracting an expected vector and a variance vector of the residual error data after dimension reduction;
and the data compression unit is used for inputting the expected vector and the variance vector of the residual data after dimensionality reduction into the normal sampling layer for normal distribution sampling to obtain compressed data, and taking the compressed data as the compressed data of the frame to be coded, wherein the dimensionality of the compressed data is lower than that of the residual data after dimensionality reduction.
10. The system of claim 9, wherein the residual data calculation unit comprises:
and the frequency domain conversion module is used for calculating a residual error between the frame to be coded and the reference frame, converting the residual error from a time domain to a frequency domain, and taking the residual error of the frequency domain obtained through conversion as residual error data of the frame to be coded relative to the reference frame.
11. The system of claim 9, wherein:
the dimensionality reduction unit is specifically configured to reduce the residual data from a first dimension to a second dimension through a first full-link layer in the deep neural network;
the vector extraction unit is specifically configured to extract an expected vector and a variance vector of the residual data of the second dimension through a second full connection layer and a third full connection layer in the deep neural network, respectively; wherein the desired vector and the variance vector are lower in dimension than the second dimension.
12. The system of claim 11, further comprising:
and the decoding unit is used for carrying out reverse reconstruction on the compressed data of the frame to be coded so as to restore the compressed data into decoded data matched with the dimensionality of the residual data.
13. The system of claim 12, wherein the decoding unit comprises:
a decoding network input module, configured to input compressed data of the frame to be encoded into a first fully-connected layer in a decoding neural network, so as to restore the compressed data to the second dimension;
and the data restoration module is used for inputting the data restored to the second dimension into a second fully-connected layer of the decoding neural network so as to restore the second-dimension data to the first dimension, and for taking the data restored to the first dimension as the decoded data matching the dimension of the residual data.
14. The system of claim 12, further comprising:
the relative entropy calculating unit is used for calculating the relative entropy of the residual data according to the expected vector and the variance vector, and representing the distortion degree after normal distribution sampling through the relative entropy;
and the cross entropy calculation unit is used for calculating an error between the restored decoded data and the residual data, calculating a cross entropy of the error and the relative entropy, and representing the distortion degree of the decoded data relative to the residual data through the cross entropy.
15. A video compression device, characterized in that the video compression device comprises a memory for storing a computer program which, when executed by a processor, implements the method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910318187.2A CN110234011B (en) | 2019-04-19 | 2019-04-19 | Video compression method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910318187.2A CN110234011B (en) | 2019-04-19 | 2019-04-19 | Video compression method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110234011A CN110234011A (en) | 2019-09-13 |
CN110234011B true CN110234011B (en) | 2021-09-24 |
Family
ID=67860744
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910318187.2A Expired - Fee Related CN110234011B (en) | 2019-04-19 | 2019-04-19 | Video compression method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110234011B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110933429B (en) * | 2019-11-13 | 2021-11-12 | 南京邮电大学 | Video compression sensing and reconstruction method and device based on deep neural network |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102158703A (en) * | 2011-05-04 | 2011-08-17 | 西安电子科技大学 | Distributed video coding-based adaptive correlation noise model construction system and method |
CN103546749A (en) * | 2013-10-14 | 2014-01-29 | 上海大学 | Method for optimizing HEVC (high efficiency video coding) residual coding by using residual coefficient distribution features and bayes theorem |
CN104299201A (en) * | 2014-10-23 | 2015-01-21 | 西安电子科技大学 | Image reconstruction method based on heredity sparse optimization and Bayes estimation model |
CN104702961A (en) * | 2015-02-17 | 2015-06-10 | 南京邮电大学 | Code rate control method for distributed video coding |
CN109587487A (en) * | 2017-09-28 | 2019-04-05 | 上海富瀚微电子股份有限公司 | The appraisal procedure and system of the structural distortion factor of a kind of pair of RDO strategy |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060193527A1 (en) * | 2005-01-11 | 2006-08-31 | Florida Atlantic University | System and methods of mode determination for video compression |
US8121190B2 (en) * | 2006-10-05 | 2012-02-21 | Siemens Aktiengesellschaft | Method for video coding a sequence of digitized images |
-
2019
- 2019-04-19 CN CN201910318187.2A patent/CN110234011B/en not_active Expired - Fee Related
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102158703A (en) * | 2011-05-04 | 2011-08-17 | 西安电子科技大学 | Distributed video coding-based adaptive correlation noise model construction system and method |
CN103546749A (en) * | 2013-10-14 | 2014-01-29 | 上海大学 | Method for optimizing HEVC (high efficiency video coding) residual coding by using residual coefficient distribution features and bayes theorem |
CN104299201A (en) * | 2014-10-23 | 2015-01-21 | 西安电子科技大学 | Image reconstruction method based on heredity sparse optimization and Bayes estimation model |
CN104702961A (en) * | 2015-02-17 | 2015-06-10 | 南京邮电大学 | Code rate control method for distributed video coding |
CN109587487A (en) * | 2017-09-28 | 2019-04-05 | 上海富瀚微电子股份有限公司 | The appraisal procedure and system of the structural distortion factor of a kind of pair of RDO strategy |
Non-Patent Citations (1)
Title |
---|
H.265/HEVC编码加速算法研究 (Research on H.265/HEVC encoding acceleration algorithms); Wang Jianfu; 《优秀博士论文电子期刊》 (Electronic Journal of Excellent Doctoral Dissertations); 2015-09-15; full text *
Also Published As
Publication number | Publication date |
---|---|
CN110234011A (en) | 2019-09-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111711824B (en) | Loop filtering method, device and equipment in video coding and decoding and storage medium | |
US5699121A (en) | Method and apparatus for compression of low bit rate video signals | |
CN111641832B (en) | Encoding method, decoding method, device, electronic device and storage medium | |
CN102138326B (en) | Method and apparatus for inverse quantizing image, and method and apparatus for decoding image | |
CN107645662B (en) | Color image compression method | |
CN110677651A (en) | Video compression method | |
CN101578880A (en) | Video decoding method and video encoding method | |
CN110753225A (en) | Video compression method and device and terminal equipment | |
WO2020261314A1 (en) | Image encoding method and image decoding method | |
CN113766249A (en) | Loop filtering method, device and equipment in video coding and decoding and storage medium | |
Sun et al. | Dictionary learning for image coding based on multisample sparse representation | |
KR20040015477A (en) | Motion Estimation Method and Apparatus Which Refer to Discret Cosine Transform Coefficients | |
CN116916036A (en) | Video compression method, device and system | |
CN110234011B (en) | Video compression method and system | |
CN108182712B (en) | Image processing method, device and system | |
CN110730347A (en) | Image compression method and device and electronic equipment | |
KR20130006578A (en) | Residual coding in compliance with a video standard using non-standardized vector quantization coder | |
Akbari et al. | Downsampling based image coding using dual dictionary learning and sparse representations | |
KR20010101951A (en) | Improving compressed image appearance using stochastic resonance and energy replacement | |
CN116982262A (en) | State transition for dependent quantization in video coding | |
CN110717948A (en) | Image post-processing method, system and terminal equipment | |
CN114501034B (en) | Image compression method and medium based on discrete Gaussian mixture super prior and Mask | |
Tao et al. | Prior-information-based remote sensing image compression with Bayesian dictionary learning | |
CN117459737B (en) | Training method of image preprocessing network and image preprocessing method | |
US12069238B2 (en) | Image compression method and apparatus for machine vision |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20210924 |