CN110677651A - Video compression method - Google Patents

Video compression method Download PDF

Info

Publication number
CN110677651A
Authority
CN
China
Prior art keywords
frame
network
reconstruction
residual error
optical flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910824115.5A
Other languages
Chinese (zh)
Inventor
武祥吉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Map Duck Mdt Infotech Ltd
Original Assignee
Hefei Map Duck Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Map Duck Mdt Infotech Ltd filed Critical Hefei Map Duck Mdt Infotech Ltd
Priority to CN201910824115.5A priority Critical patent/CN110677651A/en
Publication of CN110677651A publication Critical patent/CN110677651A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • H04N19/147Data rate or code amount at the encoder output according to rate distortion criteria

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention is applicable to the field of video compression and provides a video compression method, a video compression apparatus, and a terminal device. The video compression method comprises the following steps: calculating the optical flow between the (m-1)-th reconstructed frame and the m-th frame, and warping the (m-1)-th reconstructed frame with the optical flow to obtain the m-th optical-flow predicted frame; training a separable convolution network on the m-th optical-flow predicted frame, the (m-1)-th reconstructed frame, and the m-th frame, and obtaining a motion vector and pixel weights from the separable convolution network; obtaining the m-th predicted frame from the (m-1)-th reconstructed frame, the m-th optical-flow predicted frame, the motion vector, and the pixel weights; calculating the residual between the m-th predicted frame and the m-th frame, and compressing the residual with a residual network to obtain a compressed residual; decoding the compressed residual and adding it to the m-th predicted frame to obtain the m-th reconstructed frame; and forming the compressed video from all the m-th reconstructed frames. The invention solves the problem of large-motion blur in video compression.

Description

Video compression method
Technical Field
The invention belongs to the technical field of video compression, and in particular relates to a video compression method, a video compression apparatus, and a terminal device.
Background
The invention solves the problem of large-motion blur in the video compression process by combining optical flow with separable convolution.
Disclosure of Invention
In view of this, embodiments of the present invention provide a video compression method, apparatus, and terminal device to solve the problem of large-motion blur in the prior art.
A first aspect of an embodiment of the present invention provides a video compression method, including:
obtaining an optical flow from the (m-1)-th reconstructed frame and the m-th frame of a video to be compressed, and warping the (m-1)-th reconstructed frame with the optical flow to obtain the m-th optical-flow predicted frame, wherein 1 < m ≤ N, m and N are positive integers, and N is the total number of frames of the video to be compressed;
training a separable convolution network on the m-th optical-flow predicted frame, the (m-1)-th reconstructed frame, and the m-th frame, and obtaining a motion vector and pixel weights from the separable convolution network;
obtaining the m-th predicted frame from the (m-1)-th reconstructed frame, the m-th optical-flow predicted frame, the motion vector, and the pixel weights;
calculating the residual between the m-th predicted frame and the m-th frame, and compressing the residual with a residual network to obtain a compressed residual;
decoding the compressed residual and adding it to the m-th predicted frame to obtain the m-th reconstructed frame;
and forming the compressed video from all the m-th reconstructed frames.
A second aspect of an embodiment of the present invention provides a video compression apparatus, including:
an optical-flow predicted frame module, configured to obtain an optical flow from the (m-1)-th reconstructed frame and the m-th frame of the video to be compressed, and to warp the (m-1)-th reconstructed frame with the optical flow to obtain the m-th optical-flow predicted frame, wherein 1 < m ≤ N, m and N are positive integers, and N is the total number of frames of the video to be compressed;
a network construction module, configured to train a separable convolution network on the m-th optical-flow predicted frame, the (m-1)-th reconstructed frame, and the m-th frame, and to obtain a motion vector and pixel weights from the separable convolution network;
a predicted frame module, configured to obtain the m-th predicted frame from the (m-1)-th reconstructed frame, the m-th optical-flow predicted frame, the motion vector, and the pixel weights;
a residual module, configured to calculate the residual between the m-th predicted frame and the m-th frame, and to compress the residual with a residual network to obtain a compressed residual;
a reconstructed frame module, configured to decode the compressed residual and add it to the m-th predicted frame to obtain the m-th reconstructed frame;
and a video compression module, configured to form the compressed video from all the m-th reconstructed frames.
A third aspect of embodiments of the present invention provides a video compression terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method provided in the first aspect when executing the computer program.
A fourth aspect of embodiments of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method as provided in the first aspect above.
Compared with prior video compression technology, the present method solves the problem of large-motion blur in video compression.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings required by the embodiments or the prior-art descriptions are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of an implementation of a video compression method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of an implementation of a training method for the residual network according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a three-layer convolutional neural network provided in an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a self-coding network provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of the implementation flow of a training method for the separable convolution network according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a video compression apparatus according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a video compression terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Example one
Fig. 1 shows the implementation flow of a video compression method according to an embodiment of the present invention; the method may be executed by a terminal device and is detailed as follows:
S101, obtaining an optical flow from the (m-1)-th reconstructed frame and the m-th frame of the video to be compressed, and warping the (m-1)-th reconstructed frame with the optical flow to obtain the m-th optical-flow predicted frame, where 1 < m ≤ N, m and N are positive integers, and N is the total number of frames of the video to be compressed.
Optionally, the optical flow represents the motion trajectories of pixels in the video. The optical flow flow_1-2 between the 1st reconstructed frame and the 2nd frame is computed in advance, and the warped 2nd optical-flow predicted frame is obtained by applying the image-warping (warp) operation to the 1st reconstructed frame with flow_1-2.
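To make the warping step concrete, here is a minimal PyTorch sketch; the tensor sizes and the zero flow are hypothetical stand-ins, and any optical-flow estimator could supply flow_1_2:

```python
import torch
import torch.nn.functional as F

def warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp `frame` (N, C, H, W) with optical flow `flow` (N, 2, H, W):
    each output pixel (x, y) is bilinearly sampled from (x + u, y + v)."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys)).float().to(frame.device)   # (2, H, W)
    coords = base.unsqueeze(0) + flow                        # (N, 2, H, W)
    # grid_sample expects coordinates normalized to [-1, 1].
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                     # (N, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)

# Hypothetical usage: recon_1 is the 1st reconstructed frame, flow_1_2 the
# flow from it to frame 2 (zero flow here, i.e. an identity warp).
recon_1 = torch.rand(1, 3, 64, 64)
flow_1_2 = torch.zeros(1, 2, 64, 64)
flow_pred_2 = warp(recon_1, flow_1_2)  # the 2nd optical-flow predicted frame
```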
S102, training a separable convolution network on the m-th optical-flow predicted frame, the (m-1)-th reconstructed frame, and the m-th frame, and obtaining a motion vector and pixel weights from the separable convolution network.
Optionally, a separable convolution network is constructed; it is a convolutional neural network. Further, the convolutional neural network may include at least one convolutional layer: a convolutional layer contains convolution kernels, the image input to the layer is convolved with the kernels to remove redundant image information, and an image containing the feature information is output. If the convolution kernel is larger than 1×1, the convolutional layer outputs feature maps smaller than the input image; after several convolutional layers, the input image is shrunk in multiple stages into a set of feature maps smaller than the image fed into the network. Optionally, the convolutional neural network may further include pooling layers, Inception modules, fully connected layers, and the like, which are not limited here.
Optionally, the m-th optical-flow predicted frame, the (m-1)-th reconstructed frame, and the m-th frame are added, that is, the values of corresponding pixels in the three frames are summed, and the resulting added frame is input into the separable convolution network to obtain an added-frame reconstruction. The 1st reconstructed frame is obtained by reconstructing the 1st frame with a 1st-frame reconstruction network.
Optionally, a 1st-frame reconstruction network is constructed; it may be a convolutional neural network, it is trained with a back-propagation algorithm, and its input may be random noise. Further, after the reconstruction network is trained, the 1st reconstructed frame is obtained from the parameters of the reconstruction network.
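A minimal sketch of this idea follows; the decoder architecture, the noise shape, and the step budget are assumptions, since the method only requires that a network fed with random noise be fitted to frame 1 by back-propagation:

```python
import torch
import torch.nn as nn

# Hypothetical 1st-frame reconstruction network: a small conv decoder.
recon_net = nn.Sequential(
    nn.Conv2d(8, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
)

frame1 = torch.rand(1, 3, 64, 64)   # the 1st frame to be reconstructed
noise = torch.randn(1, 8, 64, 64)   # fixed random-noise input
opt = torch.optim.Adam(recon_net.parameters(), lr=1e-3)

for _ in range(200):                          # back-propagation training
    loss = torch.mean((recon_net(noise) - frame1) ** 2)
    opt.zero_grad()
    loss.backward()
    opt.step()

frame1_recon = recon_net(noise).detach()      # the 1st reconstructed frame
```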
Further, the separable convolution network is trained with a back-propagation algorithm on the added frame and its reconstruction. Specifically, the difference between the added frame and the added-frame reconstruction is computed as the loss function, a gradient update is performed with this loss, and the parameters of the separable convolution network are adjusted accordingly; the added frame is then fed into the updated network to obtain a new added-frame reconstruction, the loss is recomputed, and the parameters are adjusted again. This cycle of feeding the added frame into the network, computing the loss between the reconstruction and the added frame, and updating the parameters by gradient descent is repeated until a preset number of iterations is reached or the quality of the reconstructed image reaches a preset target.
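The loop just described can be sketched as follows; sep_conv_net, the MSE loss, and the stopping criteria are placeholders for whatever concrete choices an implementation makes:

```python
import torch
import torch.nn as nn

def train_sep_conv(sep_conv_net, added_frame, max_steps=500, target_mse=1e-4):
    """Repeat: reconstruct the added frame, take the difference as the loss,
    update the parameters by gradient descent; stop on a step budget or
    when reconstruction quality reaches the target."""
    opt = torch.optim.Adam(sep_conv_net.parameters(), lr=1e-4)
    for _ in range(max_steps):
        recon = sep_conv_net(added_frame)               # added-frame reconstruction
        loss = torch.mean((added_frame - recon) ** 2)   # difference as the loss
        opt.zero_grad()
        loss.backward()                                 # gradient update
        opt.step()                                      # adjust network parameters
        if loss.item() < target_mse:                    # quality target reached
            break
    return sep_conv_net

# Toy stand-in network and input.
net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 3, 3, padding=1))
train_sep_conv(net, torch.rand(1, 3, 64, 64))
```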
Further, the motion vector and the pixel weights are derived from the separable convolution network.
Optionally, the separable convolution network is designed as an encoder-decoder structure. The high-dimensional feature information of the bottleneck layer in the network, i.e. the weight parameters of that layer, is extracted and then quantized and entropy-coded; the bottleneck layer is the smallest convolutional layer in the separable convolution network. The decoder stage produces a separable convolution kernel and a mask, i.e. the motion vector and the pixel weights; the number of bits consumed by compressing the motion vector is bit_M.
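The encoder-decoder shape described here can be sketched as below; the layer sizes, the straight-through rounding, and the two decoder heads are assumptions, since the text fixes only the overall encoder-bottleneck-decoder structure and its outputs:

```python
import torch
import torch.nn as nn

class SepConvEncoderDecoder(nn.Module):
    """Hypothetical encoder-decoder sketch: the bottleneck (smallest conv
    layer) holds the features that would be quantized and entropy-coded;
    the decoder emits a per-pixel kernel and a mask. The text uses K = 20;
    K = 5 keeps the toy small."""
    def __init__(self, k: int = 5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),             # bottleneck
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 32, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.kernel_head = nn.Conv2d(32, k * k, 3, padding=1)  # "motion vector"
        self.mask_head = nn.Conv2d(32, 1, 3, padding=1)        # pixel weights

    def forward(self, x):
        z = self.encoder(x)
        # Straight-through rounding: the forward pass sees the quantized
        # bottleneck (what would be entropy-coded); gradients pass through.
        z_q = z + (torch.round(z) - z).detach()
        feat = self.decoder(z_q)
        return self.kernel_head(feat), self.mask_head(feat), z_q

kernels, mask, z_q = SepConvEncoderDecoder()(torch.rand(1, 3, 64, 64))
```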
S103, obtaining the m-th predicted frame from the (m-1)-th reconstructed frame, the m-th optical-flow predicted frame, the motion vector, and the pixel weights.
Optionally, the pixel-weight mask Mask is converted into a probability distribution over (0, 1), i.e. the pixel weight distribution, through a sigmoid activation function. Each pixel of the (m-1)-th reconstructed frame and of the m-th optical-flow predicted frame is convolved with a motion-vector kernel of size 20×20, and the results are dot-multiplied with the corresponding pixel weight distributions to obtain the predicted frame P_prediction.
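A sketch of this prediction step is given below, using F.unfold to apply a distinct kernel at every pixel; the kernel size (5 instead of 20, to keep the toy small) and the exact blending of the two candidates with the sigmoid weights are assumptions:

```python
import torch
import torch.nn.functional as F

def local_conv(frame, kernels, k):
    """Apply a distinct k x k kernel at every pixel.
    frame: (N, C, H, W); kernels: (N, k*k, H, W), one kernel per pixel."""
    n, c, h, w = frame.shape
    patches = F.unfold(frame, k, padding=k // 2)      # (N, C*k*k, H*W)
    patches = patches.view(n, c, k * k, h, w)
    return (patches * kernels.unsqueeze(1)).sum(2)    # (N, C, H, W)

def predict_frame(recon_prev, flow_pred, kernels, mask, k=5):
    w = torch.sigmoid(mask)                  # pixel weight distribution in (0, 1)
    a = local_conv(recon_prev, kernels, k)   # (m-1)-th reconstructed frame
    b = local_conv(flow_pred, kernels, k)    # m-th optical-flow predicted frame
    # Assumed blending: dot-multiply each candidate with its weight map.
    return w * a + (1.0 - w) * b             # the m-th predicted frame P_prediction

p = predict_frame(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64),
                  torch.rand(1, 25, 64, 64), torch.randn(1, 1, 64, 64))
```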
S104, calculating the residual between the m-th predicted frame and the m-th frame, and compressing the residual with a residual network to obtain a compressed residual.
Specifically, the m-th predicted frame is subtracted from the m-th frame to obtain the residual. The residual is compressed with a pre-built residual compression network to obtain the compressed residual; the residual compression network may be a self-coding network, in which the high-dimensional feature information of the bottleneck layer is extracted and then quantized and entropy-coded, finally yielding the compressed residual.
Further, the training method of the residual network comprises the following steps:
S201, extracting the features of the image through the residual network, where the residual network is a self-coding network.
Specifically, the three-layer convolutional neural network shown in Fig. 3 is used to extract the features of the image. Optionally, the result of each convolutional layer serves as the input for computing the final feature; that is, the normalized feature obtained after each convolution is convolved again and passed on as the cascaded input.
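As an illustration, a sketch of such a cascaded three-layer extractor follows; the channel counts, strides, and the use of batch normalization as the per-layer normalization are assumptions:

```python
import torch
import torch.nn as nn

class CascadeFeatureExtractor(nn.Module):
    """Three conv layers; each layer's normalized output is fed to the
    next layer as the cascaded input, matching the structure of Fig. 3."""
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c_in, c_out, 5, stride=2, padding=2),
                          nn.BatchNorm2d(c_out), nn.ReLU())
            for c_in, c_out in [(3, 64), (64, 64), (64, 64)]
        ])

    def forward(self, x):
        for layer in self.layers:   # the result of each layer is the input
            x = layer(x)            # to the next: cascaded computation
        return x                    # the final feature y

y = CascadeFeatureExtractor()(torch.rand(1, 3, 64, 64))  # -> (1, 64, 8, 8)
```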
S202, estimating the features with a probability model to obtain a code-rate estimate.
Specifically, the distribution of the features is estimated with the probability model, and the code rate is then estimated from the entropy to obtain the code-rate estimate.
The data distribution of natural images is generally considered to be Gaussian-like, so a Laplacian distribution with zero mean and scale σ̂ can be used to model the probability distribution of each feature y_i:

p(y_i | μ, σ̂_i) = (1 / (2σ̂_i)) · exp(−|y_i − μ| / σ̂_i),

where μ represents the mean of the distribution (zero here) and σ̂ represents the compression feature produced by the hyper-parameter network.
Further, a self-coding network may be employed to learn the scale σ̂; the structure of this self-coding network is shown in Fig. 4. The hyper-parameter self-coding network outputs the compression feature σ̂, learning the standard-deviation distribution. Within the hyper-parameter self-coding network, the variable z is expressed as z = h_e(y), where h_e denotes the encoder of the hyper-parameter learning network; quantization is then performed, ẑ = round(z), and the quantized representation ẑ can be transmitted as an additional variable.
The code rate of the features can be modeled using the entropy of the quantized representation: the prior distribution can be fitted in a parameterized way, and the prior probability model is then learned in a data-driven manner.
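The rate estimation described here might look like the following sketch, which models each quantized feature with a zero-mean Laplacian whose scale is predicted by a hyper-parameter encoder; the architecture of h_e and the CDF-difference likelihood are assumptions (the scale is predicted directly from y here, rather than through a quantized side-information path):

```python
import torch
import torch.nn as nn

def laplace_cdf(x, scale):
    # CDF of a zero-mean Laplacian with scale parameter `scale`.
    return 0.5 - 0.5 * torch.sign(x) * torch.expm1(-x.abs() / scale)

def rate_bits(y_hat, scale):
    """Estimated bits: p(y_hat) is the Laplacian mass on [y_hat-0.5, y_hat+0.5]."""
    p = laplace_cdf(y_hat + 0.5, scale) - laplace_cdf(y_hat - 0.5, scale)
    return -torch.log2(p.clamp_min(1e-9)).sum()

# Hypothetical hyper-parameter encoder h_e predicting the scale sigma-hat.
h_e = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(64, 64, 3, padding=1), nn.Softplus())

y = torch.randn(1, 64, 8, 8)        # feature from the encoder
scale = h_e(y) + 1e-6               # learned standard-deviation distribution
y_hat = torch.round(y)              # quantized feature
print(rate_bits(y_hat, scale))      # code-rate estimate in bits
```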
S203, quantizing the features to obtain the quantized features.
Specifically, during training an additive uniform noise is used in place of the quantizer, represented as ỹ = y + ε, where ε is random noise drawn uniformly from [−1/2, 1/2]. The entropy of the variable ỹ approximates the entropy of the rounded variable ŷ = round(y), so round(y) can be used as the quantization operation when the model is actually deployed; in this manner the code rate can still be estimated accurately.
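A minimal sketch of this train/deploy quantizer:

```python
import torch

def quantize(y: torch.Tensor, training: bool) -> torch.Tensor:
    """Additive uniform noise in [-1/2, 1/2] during training (a
    differentiable proxy whose entropy matches that of round(y));
    hard rounding when the model is actually used."""
    if training:
        return y + torch.empty_like(y).uniform_(-0.5, 0.5)   # y-tilde
    return torch.round(y)                                     # y-hat

y = torch.randn(4)
print(quantize(y, training=True))    # noisy proxy used for rate estimation
print(quantize(y, training=False))   # integer symbols to entropy-code
```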
S204, inputting the quantized features into the residual network for decoding to obtain a reconstructed image.
Specifically, the quantized features are decoded by the self-decoding network to obtain the reconstructed image.
S205, comparing the reconstructed image with the original image and combining with the code-rate estimate to obtain a rate-distortion optimization result.
Specifically, the reconstructed image is compared with the original image to obtain a distortion residual, and the rate-distortion optimization result is obtained from the code-rate estimate and the distortion residual.
In the compression model, the distortion D can be expressed as the mean square error (MSE), D = MSE(x, x̂) = ‖x − x̂‖², where x is the input image and x̂ its reconstruction, or computed with a subjective distortion metric such as MS-SSIM. The loss function R + λD, which weights the code rate against the distortion, is used to optimize the self-coding compression algorithm end to end: the loss function is defined first, and the network parameters are then optimized with a back-propagation algorithm.
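A sketch of the R + λD objective; the value of λ and the toy tensors are assumptions:

```python
import torch

def rd_loss(x, x_hat, rate_bits, lam=0.01):
    """End-to-end objective R + lambda * D with D = MSE;
    `lam` is an assumed trade-off value."""
    d = torch.mean((x - x_hat) ** 2)
    return rate_bits + lam * d

x = torch.rand(1, 3, 64, 64)
x_hat = x + 0.05 * torch.randn_like(x)               # toy reconstruction
loss = rd_loss(x, x_hat, rate_bits=torch.tensor(1024.0))
# loss would be back-propagated to update the self-coding network parameters
```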
S206, adjusting the parameters of the residual network according to the rate-distortion optimization result.
Specifically, the parameters of the convolutional neural network are updated with a gradient back-propagation algorithm.
S105, decoding the compressed residual and adding it to the m-th predicted frame to obtain the m-th reconstructed frame.
Optionally, the compressed residual is decoded by the residual compression network, i.e. the self-coding network, and added to the m-th predicted frame. Specifically, the data matrix of the decoded compressed residual is added element-wise to the data matrix of the m-th predicted frame; the resulting matrix is the m-th reconstructed frame.
S106, forming the compressed video from all the m-th reconstructed frames.
Specifically, all reconstructed frames are arranged and combined in the order of the frames in the video to be compressed to obtain the compressed video.
Optionally, as shown in Fig. 5, training the separable convolution network comprises:
S501, comparing the m-th reconstructed frame with the m-th frame to obtain a distortion residual;
Specifically, the m-th reconstructed frame is compared with the m-th frame to obtain the distortion residual. The distortion residual Loss may be expressed with the mean square error, Loss = MSE(P_m − P_mRecon), where P_m denotes the m-th frame and P_mRecon denotes the m-th reconstructed frame; alternatively, a subjective distortion metric such as MS-SSIM may be used for the calculation.
S502, extracting the parameters of the bottleneck layer of the separable convolution network, and quantizing and entropy-coding them to obtain a motion bit stream;
Specifically, the high-dimensional feature information of the bottleneck layer of the separable convolution network, i.e. the weight parameters of the smallest convolutional layer in the network, is extracted, then quantized and entropy-coded to obtain the motion bit stream bit_M.
S503, extracting the parameters of the bottleneck layer of the residual network, and quantizing and entropy-coding them to obtain a residual bit stream;
Specifically, the high-dimensional feature information of the bottleneck layer of the residual network, i.e. the weight parameters of the smallest convolutional layer in the residual network (a self-coding network), is extracted, then quantized and entropy-coded to obtain the residual bit stream bit_R.
S504, calculating a rate-distortion residual from the distortion residual, the motion bit stream, and the residual bit stream;
Specifically, the rate-distortion residual Rd-Loss is calculated from the obtained distortion residual Loss, the motion bit stream bit_M, and the residual bit stream bit_R as Rd-Loss = Rate + λ·Loss, where Rate = bit_M + bit_R and λ is a preset value used to apportion the compression model's emphasis between prediction accuracy and degree of compression.
S505, adjusting the weight parameters of the separable convolution network according to the rate-distortion residual.
Optionally, the gradient update is W′ = W − α·ΔW, where W denotes the weight parameters of the separable convolution network, W′ the updated weight parameters, α the preset learning rate, and ΔW the computed gradient.
Further, an existing adaptive gradient optimizer can be used for the gradient update; in particular, an Adam optimizer may be used. Rd-Loss, the weight parameters of the separable convolution network, and the preset learning rate are fed to the Adam optimizer to obtain the updated weight parameters. Optionally, the updated weight parameters replace the original weight parameters, yielding the new separable convolution network.
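For illustration, one Adam update step in PyTorch; the stand-in network and loss are hypothetical:

```python
import torch

net = torch.nn.Conv2d(3, 3, 3, padding=1)  # stand-in for the separable conv net
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)  # preset learning rate

x = torch.rand(1, 3, 32, 32)
rd_loss = torch.mean((net(x) - x) ** 2)    # stand-in for Rd-Loss

optimizer.zero_grad()
rd_loss.backward()    # computes the gradients (the delta-W terms)
optimizer.step()      # Adam writes the updated weights W' back into the network
```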
Optionally, the training of the separable convolution network is repeated until a preset number of iterations is reached, i.e. until the video compression performance of the network reaches the preset target.
In this embodiment, combining optical-flow computation with separable convolution improves the compression performance and solves the problem of large-motion blur in the video compression process.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Example two
Fig. 6 shows a block diagram of a video compression apparatus according to an embodiment of the present invention; for convenience of description, only the portions related to this embodiment are shown. The video compression apparatus 6 includes: an optical-flow predicted frame module 61, a network construction module 62, a predicted frame module 63, a residual module 64, a reconstructed frame module 65, and a video compression module 66.
The optical-flow predicted frame module 61 is configured to obtain an optical flow from the (m-1)-th reconstructed frame and the m-th frame of the video to be compressed, and to warp the (m-1)-th reconstructed frame with the optical flow to obtain the m-th optical-flow predicted frame, where 1 < m ≤ N, m and N are positive integers, and N is the total number of frames of the video to be compressed;
the network construction module 62 is configured to train a separable convolution network on the m-th optical-flow predicted frame, the (m-1)-th reconstructed frame, and the m-th frame, and to obtain a motion vector and pixel weights from the separable convolution network;
the predicted frame module 63 is configured to obtain the m-th predicted frame from the (m-1)-th reconstructed frame, the m-th optical-flow predicted frame, the motion vector, and the pixel weights;
the residual module 64 is configured to calculate the residual between the m-th predicted frame and the m-th frame, and to compress the residual with a residual network to obtain a compressed residual;
the reconstructed frame module 65 is configured to decode the compressed residual and add it to the m-th predicted frame to obtain the m-th reconstructed frame;
and the video compression module 66 is configured to form the compressed video from all the m-th reconstructed frames.
Optionally, the network construction module 62 includes:
a construction unit, configured to construct the separable convolution network;
an input unit, configured to add the m-th optical-flow predicted frame, the (m-1)-th reconstructed frame, and the m-th frame, and to input the added frame into the separable convolution network, where the 1st reconstructed frame is obtained by reconstructing the 1st frame with a 1st-frame reconstruction network;
a network training unit, configured to train the separable convolution network with a back-propagation algorithm;
a parameter extraction unit, configured to extract the parameters of the bottleneck layer of the separable convolution network, the bottleneck layer being the smallest convolutional layer in the network;
and an output unit, configured to obtain the motion vector and the pixel weights from the bottleneck-layer parameters through the separable convolution network.
Optionally, the predicted frame module 63 includes:
a convolution unit, configured to convolve each pixel of the (m-1)-th reconstructed frame and of the m-th optical-flow predicted frame with the motion vector;
and a dot-multiplication unit, configured to dot-multiply the results of the convolution unit with the respective pixel weight distributions to obtain the m-th predicted frame, where the pixel weight distribution is obtained by passing the pixel weights through a sigmoid activation function.
This embodiment solves the problem of large-motion blur in the video compression process by combining optical flow with separable convolution.
Example three
Fig. 7 is a schematic diagram of a video compression terminal device according to an embodiment of the present invention. As shown in fig. 7, the video compression terminal device 7 of this embodiment includes: a processor 70, a memory 71 and a computer program 72, such as a video compression program, stored in said memory 71 and executable on said processor 70. The processor 70, when executing the computer program 72, implements the steps in the various video compression method embodiments described above, such as the steps 101 to 106 shown in fig. 1. Alternatively, the processor 70, when executing the computer program 72, implements the functions of the modules/units in the above-described device embodiments, such as the functions of the modules 61 to 66 shown in fig. 6.
Illustratively, the computer program 72 may be partitioned into one or more modules/units that are stored in the memory 71 and executed by the processor 70 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, used to describe the execution of the computer program 72 in the video compression terminal device 7. For example, the computer program 72 may be divided into an optical-flow predicted frame module, a network construction module, a predicted frame module, a residual module, a reconstructed frame module, and a video compression module, whose specific functions are as follows:
the network construction module is used for training a separation convolution network according to the mth frame optical flow prediction frame, the (m-1) th frame reconstruction frame and the mth frame, and obtaining a motion vector and a pixel weight based on the separation convolution network;
the predicted frame module is used for obtaining an m frame predicted frame according to the m-1 frame reconstructed frame, the m frame optical flow predicted frame, the motion vector and the pixel weight;
the residual error module is used for calculating the residual error between the predicted frame of the mth frame and compressing the residual error according to a residual error network to obtain a compressed residual error;
a reconstructed frame module, configured to decode the compressed residual and add the compressed residual to the mth frame prediction frame to obtain an mth frame reconstructed frame;
and the video compression module is used for compressing the video to be compressed based on the trained separation convolutional network.
The video compression terminal device 7 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The video compression terminal equipment may include, but is not limited to, a processor 70, a memory 71. It will be appreciated by a person skilled in the art that fig. 7 is only an example of a video compression terminal device 7 and does not constitute a limitation of the video compression terminal device 7, and that it may comprise more or less components than those shown, or some components may be combined, or different components may be combined, for example, the video compression terminal device may further comprise input and output devices, network access devices, buses, etc.
The Processor 70 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 71 may be an internal storage unit of the video compression terminal device 7, such as a hard disk or a memory of the video compression terminal device 7. The memory 71 may also be an external storage device of the video compression terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are equipped on the video compression terminal device 7. Further, the memory 71 may also include both an internal storage unit and an external storage device of the video compression terminal device 7. The memory 71 is used to store the computer program and other programs and data required by the video compression terminal device. The memory 71 may also be used to temporarily store data that has been output or is to be output.
From the above, the present invention solves the problem of large-motion blur in the video compression process by combining optical flow with separable convolution.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain other components which may be suitably increased or decreased as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media which may not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A method of video compression, comprising:
obtaining an optical flow from the (m-1)-th reconstructed frame and the m-th frame of a video to be compressed, and warping the (m-1)-th reconstructed frame with the optical flow to obtain the m-th optical-flow predicted frame, wherein 1 < m ≤ N, m and N are positive integers, and N is the total number of frames of the video to be compressed;
training a separable convolution network on the m-th optical-flow predicted frame, the (m-1)-th reconstructed frame, and the m-th frame, and obtaining a motion vector and pixel weights from the separable convolution network;
obtaining the m-th predicted frame from the (m-1)-th reconstructed frame, the m-th optical-flow predicted frame, the motion vector, and the pixel weights;
calculating the residual between the m-th predicted frame and the m-th frame, and compressing the residual with a residual network to obtain a compressed residual;
decoding the compressed residual and adding it to the m-th predicted frame to obtain the m-th reconstructed frame;
and forming the compressed video from all the m-th reconstructed frames.
2. The video compression method of claim 1, wherein said training a separable convolution network on the m-th optical-flow predicted frame, the (m-1)-th reconstructed frame, and the m-th frame comprises:
constructing a separable convolution network;
adding the m-th optical-flow predicted frame, the (m-1)-th reconstructed frame, and the m-th frame, and inputting the added frame into the separable convolution network, wherein the 1st reconstructed frame is obtained by reconstructing the 1st frame with a 1st-frame reconstruction network;
training the separable convolution network with a back-propagation algorithm;
extracting the parameters of the bottleneck layer of the separable convolution network, the bottleneck layer being the smallest convolutional layer in the network;
and obtaining the motion vector and the pixel weights from the bottleneck-layer parameters through the separable convolution network.
3. The video compression method of claim 1, wherein said obtaining the m-th predicted frame from the (m-1)-th reconstructed frame, the m-th optical-flow predicted frame, the motion vector, and the pixel weights comprises:
convolving each pixel of the (m-1)-th reconstructed frame and of the m-th optical-flow predicted frame with the motion vector, and dot-multiplying the results with the respective pixel weight distributions to obtain the m-th predicted frame, wherein the pixel weight distribution is obtained by passing the pixel weights through a sigmoid activation function.
4. The video compression method of claim 2, wherein the training method of the 1st-frame reconstruction network comprises:
generating random noise;
inputting the random noise into the 1st-frame reconstruction network to obtain a 1st-frame reconstruction image;
calculating a loss function between the 1st-frame reconstruction image and the 1st frame;
performing a gradient update according to the loss function;
and adjusting the parameters of the 1st-frame reconstruction network based on the gradient update.
5. The video compression method of claim 1, wherein said forming the compressed video from all the m-th reconstructed frames comprises:
arranging and combining all the reconstructed frames in the order of the frames in the video to be compressed to obtain the compressed video.
6. A video compression apparatus, comprising:
an optical-flow predicted frame module, configured to obtain an optical flow from the (m-1)-th reconstructed frame and the m-th frame of the video to be compressed, and to warp the (m-1)-th reconstructed frame with the optical flow to obtain the m-th optical-flow predicted frame, wherein 1 < m ≤ N, m and N are positive integers, and N is the total number of frames of the video to be compressed;
a network construction module, configured to train a separable convolution network on the m-th optical-flow predicted frame, the (m-1)-th reconstructed frame, and the m-th frame, and to obtain a motion vector and pixel weights from the separable convolution network;
a predicted frame module, configured to obtain the m-th predicted frame from the (m-1)-th reconstructed frame, the m-th optical-flow predicted frame, the motion vector, and the pixel weights;
a residual module, configured to calculate the residual between the m-th predicted frame and the m-th frame, and to compress the residual with a residual network to obtain a compressed residual;
a reconstructed frame module, configured to decode the compressed residual and add it to the m-th predicted frame to obtain the m-th reconstructed frame;
and a video compression module, configured to form the compressed video from all the m-th reconstructed frames.
7. The video compression apparatus of claim 6, wherein the predicted frame module comprises:
a convolution unit, configured to convolve each pixel of the (m-1)-th reconstructed frame and of the m-th optical-flow predicted frame with the motion vector;
and a dot-multiplication unit, configured to dot-multiply the results of the convolution unit with the respective pixel weight distributions to obtain the m-th predicted frame, wherein the pixel weight distribution is obtained by passing the pixel weights through a sigmoid activation function.
8. The video compression apparatus of claim 6, wherein the network construction module comprises:
a construction unit, configured to construct the separable convolution network;
an input unit, configured to add the m-th optical-flow predicted frame, the (m-1)-th reconstructed frame, and the m-th frame, and to input the added frame into the separable convolution network, wherein the 1st reconstructed frame is obtained by reconstructing the 1st frame with a 1st-frame reconstruction network;
a network training unit, configured to train the separable convolution network with a back-propagation algorithm;
a parameter extraction unit, configured to extract the parameters of the bottleneck layer of the separable convolution network, the bottleneck layer being the smallest convolutional layer in the network;
and an output unit, configured to obtain the motion vector and the pixel weights from the bottleneck-layer parameters through the separable convolution network.
9. A video compression terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
CN201910824115.5A 2019-09-02 2019-09-02 Video compression method Pending CN110677651A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910824115.5A CN110677651A (en) 2019-09-02 2019-09-02 Video compression method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910824115.5A CN110677651A (en) 2019-09-02 2019-09-02 Video compression method

Publications (1)

Publication Number Publication Date
CN110677651A true CN110677651A (en) 2020-01-10

Family

ID=69075886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910824115.5A Pending CN110677651A (en) 2019-09-02 2019-09-02 Video compression method

Country Status (1)

Country Link
CN (1) CN110677651A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111277835A (en) * 2020-02-18 2020-06-12 济南浪潮高新科技投资发展有限公司 Monitoring video compression and decompression method combining yolo3 and flownet2 network
CN111277837A (en) * 2020-01-21 2020-06-12 济南浪潮高新科技投资发展有限公司 Motion compensation method applied to video compression technology
CN111294604A (en) * 2020-02-13 2020-06-16 济南浪潮高新科技投资发展有限公司 Video compression method based on deep learning
CN111726621A (en) * 2020-04-24 2020-09-29 中国科学院微电子研究所 Video conversion method and device
CN111901596A (en) * 2020-06-29 2020-11-06 北京大学 Video hybrid coding and decoding method, device and medium based on deep learning
CN111935487A (en) * 2020-08-12 2020-11-13 汪礼君 Image compression method and system based on video stream detection
CN111967478A (en) * 2020-07-08 2020-11-20 特斯联科技集团有限公司 Feature map reconstruction method and system based on weight inversion, storage medium and terminal
CN112637604A (en) * 2020-12-15 2021-04-09 深圳大学 Low-delay video compression method and device
CN112866697A (en) * 2020-12-31 2021-05-28 杭州海康威视数字技术股份有限公司 Video image coding and decoding method and device, electronic equipment and storage medium
CN113160277A (en) * 2021-01-29 2021-07-23 北京小米松果电子有限公司 Image processing method and device, electronic equipment and storage medium
CN113706414A (en) * 2021-08-26 2021-11-26 荣耀终端有限公司 Training method of video optimization model and electronic equipment
CN114374846A (en) * 2022-01-10 2022-04-19 昭通亮风台信息科技有限公司 Video compression method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106973293A (en) * 2017-04-21 2017-07-21 中国科学技术大学 The light field image coding method predicted based on parallax
CN108900848A (en) * 2018-06-12 2018-11-27 福建帝视信息科技有限公司 A kind of video quality Enhancement Method based on adaptive separable convolution
CN109118431A (en) * 2018-09-05 2019-01-01 武汉大学 A kind of video super-resolution method for reconstructing based on more memories and losses by mixture
US20190228264A1 (en) * 2017-03-08 2019-07-25 Tencent Technology (Shenzhen) Company Limited Method and apparatus for training neural network model used for image processing, and storage medium
CN110191299A (en) * 2019-04-15 2019-08-30 浙江大学 A kind of multiplex frame interpolation method based on convolutional neural networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190228264A1 (en) * 2017-03-08 2019-07-25 Tencent Technology (Shenzhen) Company Limited Method and apparatus for training neural network model used for image processing, and storage medium
CN106973293A (en) * 2017-04-21 2017-07-21 中国科学技术大学 The light field image coding method predicted based on parallax
CN108900848A (en) * 2018-06-12 2018-11-27 福建帝视信息科技有限公司 A kind of video quality Enhancement Method based on adaptive separable convolution
CN109118431A (en) * 2018-09-05 2019-01-01 武汉大学 A kind of video super-resolution method for reconstructing based on more memories and losses by mixture
CN110191299A (en) * 2019-04-15 2019-08-30 浙江大学 A kind of multiplex frame interpolation method based on convolutional neural networks

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111277837A (en) * 2020-01-21 2020-06-12 济南浪潮高新科技投资发展有限公司 Motion compensation method applied to video compression technology
CN111294604A (en) * 2020-02-13 2020-06-16 济南浪潮高新科技投资发展有限公司 Video compression method based on deep learning
CN111294604B (en) * 2020-02-13 2022-03-22 山东新一代信息产业技术研究院有限公司 Video compression method based on deep learning
CN111277835A (en) * 2020-02-18 2020-06-12 济南浪潮高新科技投资发展有限公司 Monitoring video compression and decompression method combining yolo3 and flownet2 network
CN111726621A (en) * 2020-04-24 2020-09-29 中国科学院微电子研究所 Video conversion method and device
CN111726621B (en) * 2020-04-24 2022-12-30 中国科学院微电子研究所 Video conversion method and device
CN111901596B (en) * 2020-06-29 2021-10-22 北京大学 Video hybrid coding and decoding method, device and medium based on deep learning
CN111901596A (en) * 2020-06-29 2020-11-06 北京大学 Video hybrid coding and decoding method, device and medium based on deep learning
CN111967478A (en) * 2020-07-08 2020-11-20 特斯联科技集团有限公司 Feature map reconstruction method and system based on weight inversion, storage medium and terminal
CN111967478B (en) * 2020-07-08 2023-09-05 特斯联科技集团有限公司 Feature map reconstruction method, system, storage medium and terminal based on weight overturn
CN111935487A (en) * 2020-08-12 2020-11-13 汪礼君 Image compression method and system based on video stream detection
CN112637604B (en) * 2020-12-15 2022-08-16 深圳大学 Low-delay video compression method and device
CN112637604A (en) * 2020-12-15 2021-04-09 深圳大学 Low-delay video compression method and device
CN112866697B (en) * 2020-12-31 2022-04-05 杭州海康威视数字技术股份有限公司 Video image coding and decoding method and device, electronic equipment and storage medium
CN112866697A (en) * 2020-12-31 2021-05-28 杭州海康威视数字技术股份有限公司 Video image coding and decoding method and device, electronic equipment and storage medium
CN113160277A (en) * 2021-01-29 2021-07-23 北京小米松果电子有限公司 Image processing method and device, electronic equipment and storage medium
CN113706414A (en) * 2021-08-26 2021-11-26 荣耀终端有限公司 Training method of video optimization model and electronic equipment
CN113706414B (en) * 2021-08-26 2022-09-09 荣耀终端有限公司 Training method of video optimization model and electronic equipment
CN114374846A (en) * 2022-01-10 2022-04-19 昭通亮风台信息科技有限公司 Video compression method, device, equipment and storage medium
CN114374846B (en) * 2022-01-10 2024-03-26 昭通亮风台信息科技有限公司 Video compression method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110677651A (en) Video compression method
CN110753225A (en) Video compression method and device and terminal equipment
US10834415B2 (en) Devices for compression/decompression, system, chip, and electronic device
CN109889839B (en) Region-of-interest image coding and decoding system and method based on deep learning
CN111641832B (en) Encoding method, decoding method, device, electronic device and storage medium
CN109451308A (en) Video compression method and device, electronic equipment and storage medium
CN110248190B (en) Multilayer residual coefficient image coding method based on compressed sensing
CN114581544A (en) Image compression method, computer device and computer storage medium
CN111163314A (en) Image compression method and system
CN110830808A (en) Video frame reconstruction method and device and terminal equipment
CN113747163A (en) Image coding and decoding method and compression method based on context reorganization modeling
Akbari et al. Learned multi-resolution variable-rate image compression with octave-based residual blocks
CN110650339A (en) Video compression method and device and terminal equipment
CN115022637A (en) Image coding method, image decompression method and device
CN110730347A (en) Image compression method and device and electronic equipment
Zhuang et al. A robustness and low bit-rate image compression network for underwater acoustic communication
CN114663536B (en) Image compression method and device
CN111083482A (en) Video compression network training method and device and terminal equipment
CN110956669A (en) Image compression coding method and system
CN111163320A (en) Video compression method and system
CN111161363A (en) Image coding model training method and device
CN113554719B (en) Image encoding method, decoding method, storage medium and terminal equipment
CN110234011B (en) Video compression method and system
CN110913220A (en) Video frame coding method and device and terminal equipment
CN110717948A (en) Image post-processing method, system and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200110