WO2021022686A1

WO2021022686A1 - Video compression method and apparatus, and terminal device

Info

Publication number: WO2021022686A1
Application number: PCT/CN2019/114953
Authority: WO
Inventors: 周雷
Original assignee: 合肥图鸭信息科技有限公司
Priority date: 2019-08-08
Filing date: 2019-11-01
Publication date: 2021-02-11
Also published as: CN110650339A

Abstract

Disclosed are a video compression method and apparatus, a terminal device, and a computer storage medium. The video compression method comprises: representing a video to be compressed as a frame sequence including N frames; constructing a corresponding compression network for each frame in the frame sequence; training the compression network corresponding to an mth frame according to a reconstructed frame of a (m-1)th frame, obtaining a reconstructed frame of the mth frame according to the trained compression network corresponding to the mth frame, and simultaneously storing a weighting parameter of the compression network corresponding to the mth frame; and forming, according to the order of the frame sequence, a compressed video by using the stored weighting parameter of the compression network corresponding to each frame. In the present invention, a new video compression framework that is more concise than existing video compression frameworks is constructed by first constructing a compression network corresponding to each frame, and then training the compression network corresponding to a next frame by means of a reconstructed frame of a previous frame and reconstructing the next frame. The video compression performance is improved by means of learning prior information of a video.

Description

Video compression method, device and terminal equipment

This application claims priority to the Chinese patent application filed with the Chinese Patent Office on August 8, 2019, with application number 201910727889.6 and the invention title "a video compression method, device and terminal equipment", the entire content of which is incorporated into this application by reference.

Technical field

This application relates to the technical field of video compression, and in particular to a video compression method, device and terminal equipment.

Background technique

With the continuous development of multimedia information technology, a large amount of video information has emerged. As a comprehensive medium for expressing information, video data has become an important information carrier in daily life. With the increasing popularity of surveillance imaging equipment, massive amounts of video data are being generated, and video storage is a very important link in the application of video surveillance systems. Large amounts of video data usually need to be stored for a long time so that recorded video material can be retrieved and played back in the future.

The most common way to store video data is to simply store the original video data for the entire period. This method saves a lot of background information, that is, the saved information contains a lot of redundancy, which incurs huge storage costs and makes it difficult to keep long-term video records. Therefore, the original video data must be compressed before storage to reduce storage space and extend storage time. With the development of video compression technology, many video compression algorithms and technologies have emerged. For example, the MPEG compression algorithm has the advantage that the decoded image quality is relatively clear, but it has a low compression rate and takes up a lot of bandwidth; the H.264 compression technology has a high compression rate, but the decoded picture quality is relatively poor, and the occupied bandwidth varies greatly with the complexity of the motion in the picture.

Technical problem

One of the objectives of the embodiments of the present application is to provide a video compression method, device, and terminal equipment, aiming to solve the problem of low video compression performance.

Technical solutions

In order to solve the above technical problems, the technical solutions adopted in the embodiments of this application are:

In the first aspect, a video compression method is provided, including:

Express the video to be compressed as a frame sequence containing N frames, where N is a positive integer;

Respectively construct a corresponding compression network for each frame in the frame sequence;

Train the compression network corresponding to the m-th frame according to the reconstructed frame of the (m-1)-th frame, obtain the reconstructed frame of the m-th frame according to the trained compression network corresponding to the m-th frame, and at the same time store the weight parameter of the compression network corresponding to the m-th frame, where 1≤m≤N, and m is a positive integer;

Form a compressed video from the stored weight parameters of the compression network corresponding to each frame, in the order of the frame sequence.

In a second aspect, a video compression device is provided, including:

The frame sequence module is used to represent the video to be compressed as a frame sequence containing N frames, where N is a positive integer;

The network construction module is used to construct a corresponding compression network for each frame in the frame sequence;

The reconstruction module is used to train the compression network corresponding to the m-th frame according to the reconstructed frame of the (m-1)-th frame, obtain the reconstructed frame of the m-th frame according to the trained compression network corresponding to the m-th frame, and at the same time store the weight parameter of the compression network corresponding to the m-th frame, where 1≤m≤N, and m is a positive integer;

The compressed video forming module is used to form a compressed video from the stored weight parameters of the compression network corresponding to each frame, in the order of the frame sequence.

In a third aspect, a video compression terminal device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the steps of the above method are implemented.

The beneficial effect of the video compression method provided by the embodiments of this application is that this embodiment constructs a new video compression framework: it first constructs a compression network corresponding to each frame, and then trains the compression network of the next frame using the reconstructed frame of the previous frame and reconstructs the next frame. By learning the prior information of the video, the video compression performance is improved: at the same reconstruction quality, the compression rate is higher; at the same compression rate, the reconstruction quality is better.

The beneficial effect of the video compression device provided in the embodiments of the present application is that, in this embodiment, a compression network corresponding to each frame is constructed by the network construction module, and the reconstruction module then trains the compression network of the next frame according to the reconstructed frame of the previous frame and reconstructs the next frame. By learning the prior information of the video, the video compression performance is improved: at the same reconstruction quality, the compression rate is higher; at the same compression rate, the reconstruction quality is better.

The beneficial effect of the video compression terminal device provided by the embodiments of this application is that this embodiment constructs a new video compression framework: it first constructs a compression network corresponding to each frame, and then trains the compression network of the next frame using the reconstructed frame of the previous frame and reconstructs the next frame. By learning the prior information of the video, the video compression performance is improved: at the same reconstruction quality, the compression rate is higher; at the same compression rate, the reconstruction quality is better.

Description of the drawings

In order to describe the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or of the exemplary technology are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative work.

FIG. 1 is a schematic diagram of the implementation process of a video compression method provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of the implementation process of training a compression network provided by another embodiment of the present application;

FIG. 3 is a schematic diagram of a compression network structure provided by another embodiment of the present application;

FIG. 4 is a schematic diagram of an implementation process of generating a reconstructed frame provided by another embodiment of the present application;

FIG. 5 is a schematic diagram of a video compression device provided by another embodiment of the present application;

FIG. 6 is a schematic diagram of a video compression terminal device provided by another embodiment of the present application.

Embodiments of the invention

In order to make the purpose, technical solutions, and advantages of this application clearer, the following further describes this application in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, and are not used to limit the present application.

It should be noted that when a component is referred to as being "fixed to" or "installed on" another component, it can be directly on the other component or indirectly on the other component. When a component is said to be "connected" to another component, it can be directly or indirectly connected to the other component. The terms "upper", "lower", "left", "right", etc. indicate an orientation or positional relationship based on the orientation or positional relationship shown in the drawings. They are used only for ease of description and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and therefore cannot be construed as a limitation of the present application. Those of ordinary skill in the art can understand the specific meaning of the above terms according to the specific situation. The terms "first" and "second" are used only for ease of description and cannot be understood as indicating or implying relative importance or implicitly indicating the number of technical features. "Plurality" means two or more, unless otherwise specifically defined.

In order to illustrate the technical solutions described in this application, detailed descriptions are given below in conjunction with specific drawings and embodiments.

Example one

Fig. 1 shows the implementation process of the video compression method provided in the first embodiment of the present invention. The execution subject of the method may be a terminal device. The details are as follows:

In step S101, the video to be compressed is represented as a frame sequence containing N frames, where N is a positive integer.

Optionally, when consecutive images change at more than 24 frames per second, then according to the principle of persistence of vision the human eye cannot distinguish the individual static images, and the result appears as a smooth, continuous visual effect. Such continuous images are called a video. A video is therefore composed of several frames, that is, a video is a frame sequence containing N frames, where N is a positive integer.

In step S102, a corresponding compression network is constructed for each frame in the aforementioned frame sequence.

Optionally, a compression network is constructed for each frame in the aforementioned frame sequence, where the compression network may be a convolutional neural network. Optionally, the compression network may be constructed manually, by a network search method, or by a combination of the two, which is not limited here.

Optionally, the training parameters of the network (such as the learning rate, batch size, and weight decay) can be set with a hyperparameter optimization (HO) framework such as random search, grid search, Bayesian optimization, reinforcement learning, or an evolutionary algorithm. The parameters that define the network structure (such as the number of layers of the network, the operator of each layer, and the size of the filters in the convolutions) can be tuned with neural architecture search (NAS). It should be understood that these are only examples of methods for adjusting parameters when constructing a network, and they should not limit the process of constructing the network in any way.
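As an illustration of the random search option named above, the following is a minimal sketch in plain Python. The function name `random_search`, the dictionary-based search space, and the `evaluate` callback (assumed to return a loss where lower is better) are hypothetical and are not taken from the application.

```python
import random


def random_search(evaluate, search_space, trials=20, seed=0):
    """Sample `trials` random configurations and keep the one with the lowest score.

    evaluate:     callable taking a config dict and returning a loss (assumed: lower is better)
    search_space: dict mapping a hyperparameter name to a list of candidate values
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(trials):
        # Draw one candidate value per hyperparameter, independently.
        config = {name: rng.choice(values) for name, values in search_space.items()}
        score = evaluate(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

In practice `evaluate` would train a compression network with the sampled learning rate, batch size, and weight decay and return its validation loss; here it is left as a callback so the sketch stays independent of any training framework.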

Step S103: Train the compression network corresponding to the m-th frame according to the reconstructed frame of the (m-1)-th frame, obtain the reconstructed frame of the m-th frame according to the trained compression network corresponding to the m-th frame, and at the same time store the weight parameter of the compression network corresponding to the m-th frame, where 1≤m≤N, and m is a positive integer.

Among them, when m=1, the m-1th frame is the pre-input data matrix. That is, when m-1=0, since the 0th frame does not exist in the frame sequence, a data matrix needs to be input in advance. Optionally, the aforementioned data matrix may be a randomly generated noise matrix, or a matrix whose values are all 1, and the data matrix is not limited here.
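The pre-input data matrix for m=1 could be produced as in the following minimal sketch. The function name `make_initial_frame`, the nested-list layout [row][column][channel], and the fixed random seed are assumptions made for illustration; the application does not state how the matrix is generated or shared.

```python
import random


def make_initial_frame(height, width, channels, mode="noise", seed=0):
    """Build the stand-in for the non-existent 0th frame as nested lists [H][W][C]."""
    if mode == "ones":
        # Matrix whose values are all 1.
        return [[[1.0] * channels for _ in range(width)] for _ in range(height)]
    if mode == "noise":
        # Randomly generated noise matrix; the fixed seed (an assumption) makes it reproducible.
        rng = random.Random(seed)
        return [[[rng.random() for _ in range(channels)] for _ in range(width)]
                for _ in range(height)]
    raise ValueError(f"unknown mode: {mode}")
```

A reproducible matrix matters because whoever reconstructs the first frame has to feed the same input into the first compression network that was used during training.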

Optionally, FIG. 2 shows a flowchart of training a compression network. The training of the compression network corresponding to the m-th frame according to the reconstructed frame of the m-1th frame includes:

S201: Use the m-1th frame reconstructed frame as the input of the compression network corresponding to the mth frame to obtain the mth frame reconstructed image;

Optionally, the reconstructed frame of the (m-1)-th frame is input into the compression network corresponding to the m-th frame. The compression network may be a convolutional neural network, and the convolutional neural network may include at least one convolutional layer. Optionally, the convolutional layer may include a convolution kernel; the image input to the convolutional layer is convolved with the convolution kernel to remove redundant image information, and an image containing feature information is output. If the size of the convolution kernel is greater than 1×1, the convolutional layer can output multiple feature maps whose size is smaller than that of the input image. After processing by multiple convolutional layers, the image input to the convolutional neural network is reduced in size over multiple levels, yielding multiple feature maps that are smaller than the image input to the neural network. Optionally, in the embodiment of the present invention, inputting the reconstructed frame of the (m-1)-th frame into the compression network to generate the corresponding reconstructed image of the m-th frame may be a deconvolution operation, which is the reverse of the process described above of removing redundant information from an input image to generate feature maps.

For example, as shown in FIG. 3, the compression network may be a convolutional neural network including four convolution or deconvolution layers. The sizes of the weight parameter matrices of the four layers are H₁*W₁*N₁*C₁, H₂*W₂*N₂*C₂, H₃*W₃*N₃*C₃, and H₄*W₄*N₄*C₄, and their strides are S₁, S₂, S₃, and S₄, respectively. Here, H is the height of the weight parameter matrix, W is its width, N is its number of output channels, and C is its number of input channels. The number of input channels of each of the four layers is tied to the number of output channels of the previous layer. Illustratively, in FIG. 3, C₁=C, C₂=N₁, C₃=N₂, and C₄=N₃. Assuming that the input reconstructed frame of the (m-1)-th frame is a matrix of size 1*H*W*C, the convolutional neural network generates an output matrix of size 1*H'*W'*C', where H'=H*S₁*S₂*S₃*S₄, W'=W*S₁*S₂*S₃*S₄, and C'=C₄. This is only one example of a compression network and should not limit the structure of the compression network.
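The spatial size relation and the channel chaining in this example can be checked with a short sketch. The helper names `upsampled_size` and `channels_are_chained` are hypothetical, and the sketch assumes that each deconvolution layer scales height and width by exactly its stride, as the relation for H' and W' implies.

```python
from math import prod


def upsampled_size(height, width, strides):
    """Return (H', W') after deconvolution layers that each scale the size by their stride."""
    scale = prod(strides)  # S1 * S2 * S3 * S4 for the four-layer example
    return height * scale, width * scale


def channels_are_chained(input_channels, layers):
    """Check C1 = C and Ci = N(i-1) for weight shapes given as (H_i, W_i, N_i, C_i)."""
    expected = input_channels
    for _, _, n_out, c_in in layers:
        if c_in != expected:
            return False
        expected = n_out  # the next layer must accept this layer's output channels
    return True
```

For instance, a 4*6 input passed through four layers with stride 2 gives a 64*96 output, since the combined scale factor is 2*2*2*2 = 16.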

S202: Calculate the loss function of the m-th frame reconstruction image and the m-th frame, perform gradient update according to the above-mentioned loss function, and adjust the weight parameter of the compression network corresponding to the m-th frame;

Optionally, the loss function between the reconstructed image of the m-th frame and the m-th frame may be the MSE (Mean Square Error). Specifically, the MSE is given by formula (1):

$$\mathrm{MSE} = \frac{1}{H \cdot W \cdot C} \sum_{i=1}^{H} \sum_{j=1}^{W} \sum_{k=1}^{C} \left( X'_{i,j,k} - X_{i,j,k} \right)^2 \qquad (1)$$

Here, H is the height of the reconstructed image of the m-th frame, W is its width, C is its number of channels, X' represents the reconstructed image of the m-th frame, X represents the m-th frame, X'_{i,j,k} represents the value at the i-th row and j-th column of the k-th channel of the reconstructed image of the m-th frame, and X_{i,j,k} represents the value at the i-th row and j-th column of the k-th channel of the m-th frame.
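The mean square error between a reconstructed image and the original frame can be computed directly from the definitions of H, W, C, X' and X given here. The following is a minimal sketch in plain Python; the function name `mse` and the nested-list layout [row][column][channel] are assumptions made for illustration.

```python
def mse(reconstructed, original):
    """Mean squared error over two frames stored as nested lists [row][column][channel]."""
    total = 0.0
    count = 0
    for rec_row, org_row in zip(reconstructed, original):
        for rec_px, org_px in zip(rec_row, org_row):
            for rec_v, org_v in zip(rec_px, org_px):
                total += (rec_v - org_v) ** 2
                count += 1  # ends up equal to H * W * C
    if count == 0:
        raise ValueError("empty frame")
    return total / count
```

For a 1*1 image with two channels, reconstructed values (1, 2) against original values (1, 4) give (0 + 4) / 2 = 2.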

Optionally, the gradient update formula is shown in formula (2):

W′=W-αΔW (2)

Here, W represents the weight parameter of the network, W′ represents the weight parameter after the update, α is the preset learning rate, and ΔW is the calculated gradient.
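Formula (2) amounts to one step of plain gradient descent. A minimal sketch follows, with the weights and gradients flattened into lists; the function name `gradient_step` is hypothetical.

```python
def gradient_step(weights, gradients, learning_rate):
    """Apply W' = W - alpha * dW element by element and return the updated weights."""
    if len(weights) != len(gradients):
        raise ValueError("weights and gradients must have the same length")
    return [w - learning_rate * g for w, g in zip(weights, gradients)]
```

With weights (1.0, 2.0), gradients (0.5, -1.0) and a learning rate of 0.1, the updated weights are approximately (0.95, 2.1).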

Optionally, when performing gradient update, an existing adaptive gradient optimizer can be used for calculation. Specifically, the Adam optimizer can be used. Optionally, input the above-mentioned MSE calculation result, the weight parameter of the network, and the preset learning rate in the Adam optimizer to obtain the updated weight parameter.
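For reference, one update of the Adam optimizer can be sketched as follows. This is the standard Adam rule with its commonly used default constants, not code from the application; the sketch takes the gradient of the MSE with respect to the weights as input (how that gradient is obtained is outside its scope), and the names `new_adam_state` and `adam_step` are hypothetical.

```python
import math


def new_adam_state(n):
    """Create zeroed first and second moment estimates for n weights."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}


def adam_step(weights, gradients, state, learning_rate=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update in place on `state` and return the new weights."""
    state["t"] += 1
    t = state["t"]
    new_weights = []
    for i, (w, g) in enumerate(zip(weights, gradients)):
        # Exponential moving averages of the gradient and of its square.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Bias correction compensates for the zero initialisation of m and v.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        new_weights.append(w - learning_rate * m_hat / (math.sqrt(v_hat) + eps))
    return new_weights
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so each weight moves by roughly the learning rate regardless of the gradient's magnitude.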

Optionally, replace the original weight parameter in the compression network with the updated weight parameter obtained by the foregoing calculation to become the compression network corresponding to the new m-th frame.

S203: Repeat S201 to S202 until the compression network corresponding to the mth frame meets the preset condition.

Optionally, repeating S201 to S202 until the compression network corresponding to the mth frame meets a preset condition includes:

Repeat S201 to S202 until the compression network corresponding to the mth frame above reaches the preset reconstruction quality

or

Repeat S201 to S202 until the preset number of times is reached.

Optionally, the number of repeated executions of S201 to S202 reaches a preset number of times, where the preset number is manually preset in the video compression program or preset in the terminal device that loads the video compression program.

Optionally, S201 to S202 are repeatedly executed until the compression network corresponding to the m-th frame reaches a preset reconstruction quality. The reconstruction quality of the compression network corresponding to the m-th frame can be represented by the peak signal-to-noise ratio (PSNR) and the bits per pixel (BPP).

Specifically, a test image set is fed into the compression network corresponding to the m-th frame to test its reconstruction quality, which can be represented by the PSNR and the BPP. Optionally, at a fixed BPP, it is determined whether the PSNR reaches a preset threshold. The higher the PSNR, the less information is lost from the frame during compression. Optionally, the test image set may include the 24 images of the Kodak standard test set, which is not limited here.
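The two stopping rules for S203 can be sketched as a small loop. The sketch below is a simplification: it derives the PSNR from the MSE returned by each training pass instead of evaluating a separate test image set at a fixed BPP, and it assumes 8-bit pixel values (peak value 255). The names `psnr`, `train_until` and `train_step` are hypothetical.

```python
import math


def psnr(mse_value, max_value=255.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    if mse_value == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse_value)


def train_until(train_step, target_psnr=None, max_iters=None, max_value=255.0):
    """Repeat one pass of S201-S202 until the preset quality or the preset count is reached.

    train_step: callable that performs one pass and returns the current MSE.
    Returns (number of passes executed, PSNR after the last pass).
    """
    if target_psnr is None and max_iters is None:
        raise ValueError("need at least one stopping condition")
    iters = 0
    while True:
        current = psnr(train_step(), max_value)
        iters += 1
        if target_psnr is not None and current >= target_psnr:
            return iters, current  # preset reconstruction quality reached
        if max_iters is not None and iters >= max_iters:
            return iters, current  # preset number of repetitions reached
```

Passing both `target_psnr` and `max_iters` gives a loop that stops on whichever condition is met first, which guards against a target quality that the network never reaches.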

FIG. 4 is a schematic diagram of the implementation process of generating a reconstructed frame provided by an embodiment of the present invention. The process includes:

S401: Use the weight parameter of the compression network corresponding to the mth frame as the compression feature;

Optionally, the weight parameter of the compression network corresponding to the m-th frame is extracted, and the weight parameter of the compression network corresponding to the m-th frame is used as the compression feature.

S402: Obtain coded data by encoding the compression feature;

The foregoing coding may be an entropy coding scheme such as Shannon coding, Huffman coding, or arithmetic coding, which is not limited here.

S403: Decode the aforementioned encoded data to generate reconstruction weight parameters;

S404: Initialize the compression network corresponding to the mth frame according to the reconstruction weight parameter;

Optionally, the reconstruction weight parameter generated in step S403 replaces the weight parameter in the compression network corresponding to the m-th frame to form a new compression network corresponding to the m-th frame.

S405: Input the m-1th frame into the compression network corresponding to the mth frame after the reconstruction weight parameter is updated, to obtain the mth frame reconstructed frame.
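Steps S401 to S403 can be illustrated with Huffman coding, one of the entropy coding schemes named above. The sketch below first maps the weights to integers with a uniform quantizer; that quantization step and its step size are assumptions, since the application does not describe how real-valued weights are turned into symbols. All function names are hypothetical.

```python
import heapq
from collections import Counter


def quantize(weights, step):
    """Map real-valued weights to integer symbols (assumed uniform quantizer)."""
    return [round(w / step) for w in weights]


def dequantize(symbols, step):
    """Recover approximate weights (the reconstruction weight parameters)."""
    return [s * step for s in symbols]


def huffman_table(symbols):
    """Return {symbol: bitstring} built from the symbol frequencies."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, unique tie breaker, {symbol: code so far}).
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, table):
    """S402: concatenate the code words of all symbols."""
    return "".join(table[s] for s in symbols)


def decode(bits, table):
    """S403: walk the bit string and emit a symbol whenever a code word completes."""
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out
```

The entropy coding itself is lossless, so any difference between the trained weights and the reconstruction weight parameters in this sketch comes from the assumed quantizer. Reconstructing the frame from the decoded weights (S404 and S405) lets the encoder see the same frame that a decoder would later produce.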

In step S104, the weight parameters of the compression network corresponding to each of the above-mentioned stored frames are formed into a compressed video in the order of the above-mentioned frame sequence.

Optionally, in step S103, the weight parameters of the compression network corresponding to each frame are stored. According to the sequence of the above-mentioned frame sequence, the collection of the weight parameters of the compression network corresponding to each frame is the compressed video.
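The overall flow of steps S101 to S104 can be sketched as two loops over the frame sequence. The callbacks `train_network` and `run_network` are hypothetical stand-ins for training and applying a compression network, and `decompress_video` is an assumption about how the stored weights would be replayed, since the application does not spell out the decoding procedure.

```python
def compress_video(frames, initial_frame, train_network, run_network):
    """Return the ordered list of per-frame weights that stands for the compressed video.

    train_network(previous_reconstruction, target_frame) -> weights
    run_network(weights, previous_reconstruction)        -> reconstructed frame
    """
    stored_weights = []
    previous = initial_frame  # pre-input data matrix used when m = 1
    for frame in frames:
        weights = train_network(previous, frame)   # train the network of frame m
        previous = run_network(weights, previous)  # reconstructed frame m feeds frame m+1
        stored_weights.append(weights)             # stored in frame order
    return stored_weights


def decompress_video(stored_weights, initial_frame, run_network):
    """Replay the stored weights in order to rebuild the reconstructed frames (assumed decoder)."""
    frames = []
    previous = initial_frame
    for weights in stored_weights:
        previous = run_network(weights, previous)
        frames.append(previous)
    return frames
```

Because every network takes the previous reconstructed frame as input, the frames can only be rebuilt sequentially, starting from the same pre-input data matrix that was used during compression.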

In this embodiment, a new video compression framework is constructed: first, a compression network corresponding to each frame is constructed, and then the compression network of the next frame is trained through the reconstruction frame of the previous frame and the next frame is reconstructed. By learning the prior information of the video, the video compression performance is improved: under the same reconstruction quality condition, the compression rate is higher; under the same compression rate, the reconstruction quality is better.

It should be understood that the size of the sequence number of each step in the foregoing embodiment does not mean the order of execution, and the execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiment of the present invention.

Example two

FIG. 5 shows a structural block diagram of a video compression device provided by an embodiment of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown. The video compression device 5 includes: a frame sequence module 51, a network construction module 52, a reconstruction module 53, and a compressed video forming module 54.

Wherein, the aforementioned frame sequence module 51 is used to represent the video to be compressed as a frame sequence containing N frames, where N is a positive integer;

The network construction module 52 is used to construct a corresponding compression network for each frame in the above frame sequence;

The reconstruction module 53 is used to train the compression network corresponding to the m-th frame according to the reconstructed frame of the (m-1)-th frame, obtain the reconstructed frame of the m-th frame according to the trained compression network corresponding to the m-th frame, and at the same time store the weight parameter of the compression network corresponding to the m-th frame, where 1≤m≤N, and m is a positive integer;

The compressed video forming module 54 is used to form a compressed video from the stored weight parameters of the compression network corresponding to each frame, in the order of the frame sequence.

Optionally, the above-mentioned reconstruction module 53 includes:

A reconstruction unit, configured to use the m-1th frame reconstructed frame as the input of the compression network corresponding to the mth frame to obtain the mth frame reconstructed image;

A parameter adjustment unit, configured to calculate the loss function of the m-th frame reconstruction image and the m-th frame, perform gradient update according to the above-mentioned loss function, and adjust the weight parameters of the compression network corresponding to the m-th frame;

The repeated execution unit is configured to repeatedly execute the reconstruction unit to the parameter adjustment unit until the compression network corresponding to the mth frame meets the preset condition.

Optionally, the repeated execution of the reconstruction unit to the parameter adjustment unit until the compression network corresponding to the m-th frame meets a preset condition includes:

Repeatedly execute the reconstruction unit and the parameter adjustment unit until the compression network corresponding to the m-th frame reaches the preset reconstruction quality;

or

Repeatedly execute the reconstruction unit and the parameter adjustment unit until the preset number of times is reached.

Optionally, the foregoing reconstruction module 53 further includes:

A parameter extraction unit, configured to use the weight parameter of the compression network corresponding to the mth frame as the compression feature;

The coding unit is used to obtain coded data by coding the aforementioned compression features;

A decoding unit, configured to decode the aforementioned encoded data to generate reconstruction weight parameters;

An initialization unit, configured to initialize the compression network corresponding to the mth frame according to the reconstruction weight parameter;

The reconstructed frame generating unit is configured to input the m-1th frame into the compression network corresponding to the mth frame after the reconstruction weight parameter is updated to obtain the mth frame reconstructed frame.

Optionally, when m=1, the m-1th frame is a pre-input data matrix.

In this embodiment, the network construction module 52 constructs the compression network corresponding to each frame, and then the reconstruction module 53 trains the compression network of the next frame according to the reconstructed frame of the previous frame and reconstructs the next frame. By learning the prior information of the video, the video compression performance is improved: under the same reconstruction quality condition, the compression rate is higher; under the same compression rate, the reconstruction quality is better.

Example three

FIG. 6 is a schematic diagram of a video compression terminal device provided by an embodiment of the present invention. As shown in FIG. 6, the video compression terminal device 6 of this embodiment includes: a processor 60, a memory 61, and a computer program 62, such as a video compression program, stored in the memory 61 and executable on the processor 60. When the processor 60 executes the computer program 62, the steps in the foregoing embodiments of the video compression method, such as steps S101 to S104 shown in FIG. 1, are implemented. Alternatively, when the processor 60 executes the computer program 62, the functions of the modules/units in the foregoing device embodiments, such as the functions of the modules 51 to 54 shown in FIG. 5, are realized.

Exemplarily, the foregoing computer program 62 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 61 and executed by the processor 60 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 62 in the video compression terminal device 6. For example, the computer program 62 can be divided into a frame sequence module, a network construction module, a reconstruction module, and a compressed video forming module. The specific functions of each module are as follows:

The frame sequence module is used to represent the video to be compressed as a frame sequence containing N frames, where N is a positive integer;

The network construction module is used to construct a corresponding compression network for each frame in the above frame sequence;

The reconstruction module is used to train the compression network corresponding to the m-th frame according to the reconstructed frame of the (m-1)-th frame, obtain the reconstructed frame of the m-th frame according to the trained compression network corresponding to the m-th frame, and at the same time store the weight parameter of the compression network corresponding to the m-th frame, where 1≤m≤N, and m is a positive integer;

The compressed video forming module is used to form a compressed video from the stored weight parameters of the compression network corresponding to each frame, in the order of the frame sequence.

The aforementioned video compression terminal device 6 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The video compression terminal device may include, but is not limited to, a processor 60 and a memory 61. Those skilled in the art can understand that FIG. 6 is only an example of the video compression terminal device 6 and does not constitute a limitation on the video compression terminal device 6. It may include more or fewer components than those shown in the figure, combine certain components, or use different components. For example, the video compression terminal device may also include input and output devices, network access devices, buses, and so on.

The processor 60 may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The above-mentioned memory 61 may be an internal storage unit of the video compression terminal device 6, such as a hard disk or a memory of the video compression terminal device 6. The memory 61 may also be an external storage device of the video compression terminal device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card equipped on the video compression terminal device 6. Optionally, the memory 61 may also include both an internal storage unit of the video compression terminal device 6 and an external storage device. The memory 61 is used to store the computer program and other programs and data required by the video compression terminal device. The memory 61 may also be used to temporarily store data that has been output or will be output.

As can be seen from the above, this embodiment constructs a new video compression framework: a compression network is first constructed for each frame, and the compression network of the next frame is then trained using the reconstructed frame of the previous frame and used to reconstruct the next frame. By learning the prior information of the video, the video compression performance is improved: at the same reconstruction quality, the compression rate is higher; at the same compression rate, the reconstruction quality is better.
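By way of illustration only, the frame-by-frame procedure summarized above may be sketched as follows. The sketch is not the network of the embodiment: the "compression network" here is a deliberately tiny stand-in (one gain applied to the previous reconstruction plus a coarse bias grid), the 16-bit fixed-point quantization stands in for the unspecified weight encoding, and the all-ones matrix stands in for the pre-input data matrix used when m=1. All names (`forward`, `train`, `encode`, `decode`, `compress`, `decompress`, `BLOCK`, `SCALE`) and all hyperparameters are hypothetical.

```python
import numpy as np

BLOCK = 2            # assumption: each bias weight covers a BLOCK x BLOCK patch
SCALE = 1.0 / 1024   # assumption: fixed-point step used to store the weights


def forward(weights, prev_recon):
    """Toy stand-in network: gain * previous reconstruction + upsampled coarse bias."""
    gain, coarse = weights[0], weights[1:]
    rows = prev_recon.shape[0] // BLOCK
    bias = np.kron(coarse.reshape(rows, -1), np.ones((BLOCK, BLOCK)))
    return gain * prev_recon + bias


def train(frame, prev_recon, lr=0.4, max_steps=500, target_mse=1e-8):
    """Fit one frame's weights by gradient descent on the mean squared error."""
    h, w = frame.shape
    weights = np.zeros(1 + (h // BLOCK) * (w // BLOCK))
    weights[0] = 1.0                                  # start as "copy previous frame"
    for _ in range(max_steps):                        # stop: preset number of steps
        err = forward(weights, prev_recon) - frame    # reconstruct, compare to frame
        if np.mean(err ** 2) < target_mse:            # stop: preset quality reached
            break
        grad_out = 2.0 * err / err.size
        grad_gain = np.sum(grad_out * prev_recon)
        grad_coarse = grad_out.reshape(h // BLOCK, BLOCK, w // BLOCK, BLOCK).sum(axis=(1, 3))
        weights -= lr * np.concatenate(([grad_gain], grad_coarse.ravel()))
    return weights


def encode(weights):
    """Quantize the weights to int16; this is the data that would be stored."""
    return np.clip(np.round(weights / SCALE), -32768, 32767).astype(np.int16)


def decode(code):
    """Recover approximate weights from the stored int16 values."""
    return code.astype(np.float64) * SCALE


def compress(frames):
    """Return one encoded weight set per frame, in frame order."""
    prev_recon = np.ones_like(frames[0])   # assumed pre-input matrix for m = 1
    stream = []
    for frame in frames:
        code = encode(train(frame, prev_recon))
        stream.append(code)
        # Chain on the reconstruction from the *decoded* weights, so that the
        # decoder can reproduce exactly the same input for the next frame.
        prev_recon = forward(decode(code), prev_recon)
    return stream


def decompress(stream, shape):
    """Rebuild all frames from the stored weight sets alone."""
    recon, frames = np.ones(shape), []
    for code in stream:
        recon = forward(decode(code), recon)
        frames.append(recon)
    return frames


# Usage: three 8x8 block-constant frames, which this toy network can represent.
rng = np.random.default_rng(0)
frames = [np.kron(rng.random((4, 4)), np.ones((BLOCK, BLOCK))) for _ in range(3)]
stream = compress(frames)
decoded = decompress(stream, frames[0].shape)
```

In this sketch the compressed video is simply the list `stream`: each 64-pixel frame is represented by 17 stored integers. Chaining on the reconstruction obtained from the decoded weights, rather than from the unquantized weights, is a design choice made here so that the encoder and decoder stay in step.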

Those skilled in the art will clearly understand that, for convenience and conciseness of description, the division into the functional units and modules described above is used only as an example. In practical applications, the functions may be allocated to different functional units and modules as required; that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from one another and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the foregoing system, reference may be made to the corresponding process in the foregoing method embodiment, which is not repeated here.

In the above-mentioned embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail or recorded in an embodiment, reference may be made to related descriptions of other embodiments.

A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered as going beyond the scope of the present invention.

In the embodiments provided by the present invention, it should be understood that the disclosed device/terminal device and method may be implemented in other ways. For example, the device/terminal device embodiments described above are merely illustrative. The division into the modules or units described above is only a division by logical function; in actual implementation there may be other ways of dividing them. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated modules/units described above are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes of the method embodiments above may also be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the foregoing method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, etc. It should be noted that the content covered by the computer-readable medium may be appropriately added to or reduced in accordance with the requirements of legislation and patent practice in a given jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.

The above are only optional embodiments of the application, and are not used to limit the application. For those skilled in the art, this application can have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of this application shall be included in the scope of the claims of this application.

Claims

1. A video compression method, characterized in that it comprises:

representing a video to be compressed as a frame sequence containing N frames, where N is a positive integer;

constructing a corresponding compression network for each frame in the frame sequence;

training the compression network corresponding to the m-th frame according to the reconstructed frame of the (m-1)-th frame, obtaining the reconstructed frame of the m-th frame according to the trained compression network corresponding to the m-th frame, and storing the weight parameters of the compression network corresponding to the m-th frame, where 1≤m≤N and m is a positive integer; and

forming a compressed video, in the order of the frame sequence, from the stored weight parameters of the compression network corresponding to each frame.
2. The video compression method according to claim 1, wherein the training of the compression network corresponding to the m-th frame according to the reconstructed frame of the (m-1)-th frame comprises:

S1: using the reconstructed frame of the (m-1)-th frame as the input of the compression network corresponding to the m-th frame to obtain a reconstructed image of the m-th frame;

S2: calculating a loss function between the reconstructed image of the m-th frame and the m-th frame, performing a gradient update according to the loss function, and adjusting the weight parameters of the compression network corresponding to the m-th frame;

S3: repeating S1 to S2 until the compression network corresponding to the m-th frame meets a preset condition.
3. The video compression method according to claim 2, wherein the repeating of S1 to S2 until the compression network corresponding to the m-th frame meets a preset condition comprises:

repeating S1 to S2 until the compression network corresponding to the m-th frame reaches a preset reconstruction quality;

or

repeating S1 to S2 until a preset number of repetitions is reached.
4. The video compression method according to claim 1, wherein the obtaining of the reconstructed frame of the m-th frame according to the trained compression network corresponding to the m-th frame comprises:

taking the weight parameters of the compression network corresponding to the m-th frame as a compression feature;

encoding the compression feature to obtain encoded data;

decoding the encoded data to generate reconstructed weight parameters;

initializing the compression network corresponding to the m-th frame according to the reconstructed weight parameters; and

inputting the (m-1)-th frame into the compression network corresponding to the m-th frame, as updated with the reconstructed weight parameters, to obtain the reconstructed frame of the m-th frame.
5. The video compression method according to any one of claims 1 to 3, wherein the training of the compression network corresponding to the m-th frame according to the reconstructed frame of the (m-1)-th frame, and the obtaining of the reconstructed frame of the m-th frame according to the trained compression network corresponding to the m-th frame, comprise:

when m=1, the (m-1)-th frame is a pre-input data matrix.
6. A video compression device, characterized by comprising:

a frame sequence module, configured to represent a video to be compressed as a frame sequence containing N frames, where N is a positive integer;

a network construction module, configured to construct a corresponding compression network for each frame in the frame sequence;

a reconstruction module, configured to train the compression network corresponding to the m-th frame according to the reconstructed frame of the (m-1)-th frame, obtain the reconstructed frame of the m-th frame according to the trained compression network corresponding to the m-th frame, and store the weight parameters of the compression network corresponding to the m-th frame, where 1≤m≤N and m is a positive integer; and

a compressed video forming module, configured to form a compressed video, in the order of the frame sequence, from the stored weight parameters of the compression network corresponding to each frame.
7. The video compression device of claim 6, wherein the reconstruction module comprises:

a reconstruction unit, configured to use the reconstructed frame of the (m-1)-th frame as the input of the compression network corresponding to the m-th frame to obtain a reconstructed image of the m-th frame;

a parameter adjustment unit, configured to calculate a loss function between the reconstructed image of the m-th frame and the m-th frame, perform a gradient update according to the loss function, and adjust the weight parameters of the compression network corresponding to the m-th frame; and

a repeated execution unit, configured to repeatedly invoke the reconstruction unit and the parameter adjustment unit until the compression network corresponding to the m-th frame meets a preset condition.
8. The video compression device of claim 7, wherein the reconstruction module further comprises:

a parameter extraction unit, configured to use the weight parameters of the compression network corresponding to the m-th frame as a compression feature;

an encoding unit, configured to encode the compression feature to obtain encoded data;

a decoding unit, configured to decode the encoded data to generate reconstructed weight parameters;

an initialization unit, configured to initialize the compression network corresponding to the m-th frame according to the reconstructed weight parameters; and

a reconstructed frame generating unit, configured to input the (m-1)-th frame into the compression network corresponding to the m-th frame, as updated with the reconstructed weight parameters, to obtain the reconstructed frame of the m-th frame.
9. A video compression terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.