CN111083482A - Video compression network training method and device and terminal equipment

Video compression network training method and device and terminal equipment

Info

Publication number: CN111083482A
Application number: CN201911411588.9A
Authority: CN (China)
Inventor: 郭烈强
Assignee (current and original): Hefei Tucodec Information Technology Co., Ltd.
Priority/filing date: 2019-12-31
Publication date: 2020-04-28
Legal status: Pending
Other languages: Chinese (zh)
Prior art keywords: frame, video compression, compression network, training, video

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/146: Data rate or code amount at the encoder output
    • H04N 19/149: Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • H04N 19/65: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using error resilience
    • H04N 19/85: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression

Abstract

The invention is applicable to the technical field of video compression and provides a video compression network training method, apparatus, and terminal device. The method comprises the following steps: representing a training video as a frame sequence comprising N frames, where N is a positive integer; constructing a video compression network corresponding to the training video; training the video compression network according to the reconstructed (m-1)-th frame and updating the weight parameters, where 1 < m ≤ N and m is a positive integer; based on the video compression network with the updated weight parameters, obtaining the reconstructed m-th frame from the reconstructed (m-1)-th frame and the m-th frame; and inputting each frame of the frame sequence into the video compression network and updating the weight parameters to obtain the trained video compression network. The invention uses a video sequence as the training sample and uses the reconstructed frame of the current frame, recovered by the video compression network itself, as the reference frame, so that the video compression network learns the feature distribution of reconstructed frames and attains better reconstruction capability.

Description

Video compression network training method and device and terminal equipment
Technical Field
The invention belongs to the technical field of video compression, and particularly relates to a video compression network training method, a video compression network training device and terminal equipment.
Background
In the prior art, a single-code-point model suffers relatively severe performance degradation when tested on video sequences, mainly because of a mismatch between training and testing: during training, the current model compresses the current frame using two I-frames as reference frames, whereas during a performance test the recovered reconstructed frame must be used as the reference frame. Because the model has never seen such reference frames during training, its reconstruction quality drops.
Therefore, a new technical solution is needed to solve the above problems.
Disclosure of Invention
In view of this, embodiments of the present invention provide a video compression network training method, apparatus, and terminal device, so as to solve the above problems in the prior art.
A first aspect of an embodiment of the present invention provides a video compression network training method, including:
representing a training video as a sequence of frames comprising N frames, wherein N is a positive integer;
constructing a video compression network corresponding to the training video;
training the video compression network according to the reconstructed (m-1)-th frame and updating the weight parameters, wherein 1 < m ≤ N and m is a positive integer;
based on the video compression network with the updated weight parameters, obtaining the reconstructed m-th frame according to the reconstructed (m-1)-th frame and the m-th frame;
and inputting each frame in the frame sequence into the video compression network and updating the weight parameters to obtain the trained video compression network.
A second aspect of an embodiment of the present invention provides a video compression network training apparatus, including:
a video frame module for representing the training video as a sequence of frames comprising N frames, wherein N is a positive integer;
the network construction module is used for constructing a video compression network corresponding to the training video;
the training module is used for training the video compression network according to the reconstructed (m-1)-th frame and updating the weight parameters, wherein 1 < m ≤ N and m is a positive integer;
the parameter updating module is used for obtaining the reconstructed m-th frame according to the reconstructed (m-1)-th frame and the m-th frame based on the video compression network after the weight parameters are updated;
and the training completion module is used for obtaining the trained video compression network after each frame in the frame sequence is input into the video compression network and the weight parameters are updated.
A third aspect of the embodiments of the present invention provides a video compression network training terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method provided in the first aspect when executing the computer program.
A fourth aspect of embodiments of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method as provided in the first aspect above.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
the invention uses a video sequence as the training sample and uses the reconstructed frame of the current frame, recovered by the video compression network, as the reference frame, so that the video compression network learns the feature distribution of reconstructed frames and attains better reconstruction capability.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart illustrating an implementation of a video compression network training method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a video compression network training apparatus according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a video compression network training terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Example one
Fig. 1 shows the implementation flow of a video compression network training method provided in an embodiment of the present invention; the execution subject of the method may be a terminal device. The details are as follows:
Step S101: representing a training video as a frame sequence including N frames, where N is a positive integer.
Optionally, according to the principle of persistence of vision, when successive images change at more than 24 frames per second, the human eye cannot distinguish the individual static pictures and perceives a smooth, continuous visual effect; such a sequence of continuous pictures is called a video. A video is therefore composed of a number of frames, i.e., a video is a frame sequence containing N frames, where N is a positive integer.
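For illustration only, a minimal sketch of decoding a training video into such an N-frame sequence, assuming OpenCV is available; the file name is a hypothetical placeholder:

```python
import cv2
import numpy as np

def video_to_frame_sequence(path):
    """Decode a video file into a list of N frames, each an (H, W, C) array in [0, 1]."""
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()              # frame is a BGR uint8 ndarray
        if not ok:                          # no more frames: all N frames collected
            break
        frames.append(frame.astype(np.float32) / 255.0)
    cap.release()
    return frames                           # len(frames) == N

# e.g. frames = video_to_frame_sequence("train_clip.mp4")  # hypothetical file name
```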
Step S102: constructing a video compression network corresponding to the training video.
Optionally, a video compression network corresponding to the training video is constructed, where the video compression network may be a convolutional neural network. Optionally, the compression network may be constructed manually, constructed by a network search method, or by a combination of the two, which is not limited herein.
Further, the training parameters of the network (such as the learning rate, batch-processing parameters, and weight decay) may be set by random search, grid search, Bayesian optimization, reinforcement learning, an evolutionary algorithm, or the like. The parameters defining the network structure (such as the number of layers, the operator of each layer, and the filter sizes of the convolutions) may be tuned by Neural Architecture Search (NAS). It should be understood that only some parameter-tuning methods used when constructing the network are exemplified here, and the process of constructing the network is not limited in any way. The sketch below illustrates the simplest of these options.
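As one hedged illustration, random search over a few of the named training parameters might look as follows; the search ranges and trial count are arbitrary assumptions, and train_and_evaluate is a hypothetical stand-in for a full training-and-validation run:

```python
import random

def random_search(trials=20):
    """Sample training parameters at random and keep the best-scoring configuration."""
    best_config, best_score = None, float("-inf")
    for _ in range(trials):
        config = {
            "learning_rate": 10 ** random.uniform(-5, -3),   # assumed range 1e-5 .. 1e-3
            "batch_size": random.choice([4, 8, 16]),
            "weight_decay": 10 ** random.uniform(-6, -4),
        }
        score = train_and_evaluate(config)   # hypothetical helper, e.g. returns test PSNR
        if score > best_score:
            best_config, best_score = config, score
    return best_config
```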
Step S103: training the video compression network according to the reconstructed (m-1)-th frame and updating the weight parameters, wherein 1 < m ≤ N and m is a positive integer.
S1031: taking the reconstructed (m-1)-th frame as the input of the video compression network to obtain the reconstructed m-th frame.
Optionally, the reconstructed (m-1)-th frame is input into the video compression network, which may be a convolutional neural network comprising at least one convolutional layer. Further, a convolutional layer may contain convolution kernels; the image input to the layer is convolved with the kernels to remove redundant image information, and an image containing feature information is output. If the kernel size is larger than 1 x 1, the convolutional layer outputs feature maps smaller than the input image, so after several convolutional layers the image fed into the convolutional neural network is contracted in multiple stages into feature maps smaller than that image. Further, in the embodiment of the present invention, generating the corresponding reconstructed m-th frame from the reconstructed (m-1)-th frame may use a deconvolution operation, which is the reverse of the redundancy-removing process described above.
When m = 2, the reconstructed (m-1)-th frame is the 1st frame of the training video. That is, each frame needs to be reconstructed from its previous frame, and since the 1st frame (m-1 = 1) is not preceded by any frame, the 1st frame itself is used directly as the input for compressing the 2nd frame.
S1032: calculating a loss function between the reconstructed m-th frame and the m-th frame, performing a gradient update according to the loss function, and adjusting the weight parameters of the video compression network.
Optionally, the loss function between the reconstructed m-th frame and the m-th frame may be the mean square error (MSE). Specifically, the MSE is given by formula (1):
$$\mathrm{MSE} = \frac{1}{H \cdot W \cdot C} \sum_{k=1}^{C} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( X'_{i,j,k} - X_{i,j,k} \right)^2 \tag{1}$$
where H is the height of the reconstructed m-th frame, W is its width, and C is its number of channels; X' denotes the reconstructed m-th frame and X denotes the m-th frame; X'_{i,j,k} denotes the value at row i, column j of channel k in the reconstructed m-th frame, and X_{i,j,k} denotes the value at row i, column j of channel k in the m-th frame.
Optionally, the gradient update is given by formula (2):
$$W' = W - \alpha \Delta W \tag{2}$$
where W denotes a weight parameter of the network, W' denotes the updated weight parameter, α is the preset learning rate, and ΔW is the calculated gradient.
Optionally, an existing adaptive gradient optimizer can be used to perform the gradient update; in particular, the Adam optimizer may be used. Further, the MSE result, the weight parameters of the network, and the preset learning rate are fed to the Adam optimizer, which yields the updated weight parameters.
Further, the original weight parameters in the video compression network are replaced with the updated weight parameters obtained by this calculation.
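A hedged sketch of S1031 and S1032 under the same assumptions (PyTorch and the hypothetical VideoCompressionNet above): F.mse_loss implements the mean square error of formula (1), and the Adam optimizer applies the gradient update of formula (2) to the weight parameters; the learning-rate value is a placeholder:

```python
import torch
import torch.nn.functional as F

net = VideoCompressionNet()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)  # α, the preset learning rate

def training_step(prev_recon, cur_frame):
    """One S1031/S1032 iteration: reconstruct the m-th frame, then update the weights."""
    recon = net(prev_recon, cur_frame)     # reconstructed m-th frame X'
    loss = F.mse_loss(recon, cur_frame)    # formula (1): mean over H, W, C
    optimizer.zero_grad()
    loss.backward()                        # ΔW, the calculated gradient
    optimizer.step()                       # formula (2), as Adam's adaptive variant
    return recon.detach(), loss.item()
```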
S1033: repeating S1031 to S1032 until the video compression network satisfies a preset condition.
Optionally, repeating S1031 to S1032 until the video compression network satisfies the preset condition includes:
repeating S1031 to S1032 until the video compression network reaches a preset reconstruction quality;
or
repeating S1031 to S1032 until the number of repetitions reaches a preset number of times.
Further, in the latter case, S1031 to S1032 are repeated until the preset number of repetitions is reached, where the preset number is set in advance in the video compression program or in the terminal device that loads the video compression program.
Further, in the former case, S1031 to S1032 are repeated until the video compression network reaches the preset reconstruction quality. The reconstruction quality of the video compression network can be characterized by the peak signal-to-noise ratio (PSNR) and the bits per pixel (BPP). Specifically, a test image set is fed into the video compression network, and its reconstruction quality is measured in terms of PSNR and BPP. Optionally, at a fixed BPP it is determined whether the PSNR reaches a preset threshold; the higher the PSNR, the less information the frame loses in compression. Optionally, the test image set may be the set of 24 Kodak standard test images, which is not limited herein.
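For the reconstruction-quality branch of the preset condition, a minimal sketch of the PSNR check; the BPP measurement depends on entropy-coding details the patent does not specify, so only PSNR is shown, and the threshold value is an assumed placeholder:

```python
import torch

def psnr(x, x_recon, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means less information lost."""
    mse = torch.mean((x - x_recon) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

# e.g. keep repeating S1031-S1032 while psnr(frame, recon) < 38.0  (assumed threshold)
```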
Step S104: based on the video compression network with the updated weight parameters, obtaining the reconstructed m-th frame according to the reconstructed (m-1)-th frame and the m-th frame.
Optionally, the reconstructed (m-1)-th frame and the m-th frame are input into the video compression network with the updated weight parameters to obtain the reconstructed m-th frame.
Step S105: inputting each frame in the frame sequence into the video compression network and updating the weight parameters to obtain the trained video compression network.
Optionally, steps S101 to S105 constitute the training process for one video training sample. This embodiment may use a plurality of video training samples: the video compression network trained on one sample through steps S101 to S105 continues to be trained as the video compression network for the next sample; the number of video training samples is not limited herein. A sketch of the whole loop follows.
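Gathering steps S101 to S105, a hedged sketch of how one video training sample could drive the loop, reusing the hypothetical helpers above; the fixed number of inner repetitions stands in for the preset condition of S1033:

```python
import torch

def train_on_video(frames, inner_steps=4):
    """Train over one frame sequence; each frame's reconstruction becomes the
    reference frame for the next, so the network learns on reconstructed references."""
    # S101: frames is the N-frame sequence. The 1st frame has no predecessor,
    # so it serves directly as the reference for compressing the 2nd frame.
    prev_recon = torch.as_tensor(frames[0]).permute(2, 0, 1).unsqueeze(0)
    for m in range(2, len(frames) + 1):             # 1 < m <= N
        cur = torch.as_tensor(frames[m - 1]).permute(2, 0, 1).unsqueeze(0)
        # S103: train according to the reconstructed (m-1)-th frame, updating weights.
        for _ in range(inner_steps):                # S1033 preset condition (placeholder)
            recon, loss = training_step(prev_recon, cur)
        # S104: with the updated weights, the reconstructed m-th frame becomes
        # the reference frame for the next iteration.
        prev_recon = recon
    return net
```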
In the embodiment, the video sequence is used as the training sample, and the reconstructed frame of the current frame recovered by the video compression network is used as the reference frame, so that the video compression network learns the feature distribution of the reconstructed frame, and the video compression network has better reconstruction capability.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Example two
Fig. 2 shows a block diagram of a video compression network training apparatus according to an embodiment of the present invention, and for convenience of description, only the parts related to the embodiment of the present invention are shown. The video compression network training device 2 includes: a video frame module 21, a network construction module 22, a training module 23, a parameter updating module 24 and a training completion module 25.
The video frame module 21 is configured to represent the training video as a frame sequence including N frames, where N is a positive integer;
a network construction module 22, configured to construct a video compression network corresponding to the training video;
the training module 23 is configured to train the video compression network according to the reconstructed (m-1)-th frame and update the weight parameters, where 1 < m ≤ N and m is a positive integer;
a parameter updating module 24, configured to obtain the reconstructed m-th frame according to the reconstructed (m-1)-th frame and the m-th frame based on the video compression network after the weight parameters are updated;
and a training completion module 25, configured to obtain a trained video compression network after each frame in the frame sequence is input into the video compression network and the weight parameter is updated.
Optionally, the training module 23 includes:
a reconstructed-frame unit, configured to take the reconstructed (m-1)-th frame as the input of the video compression network to obtain the reconstructed m-th frame;
a parameter adjusting unit, configured to calculate a loss function between the reconstructed m-th frame and the m-th frame, perform a gradient update according to the loss function, and adjust the weight parameters of the video compression network;
and a loop unit, configured to repeatedly execute the reconstructed-frame unit through the parameter adjusting unit until the video compression network satisfies a preset condition.
Further, repeatedly executing the reconstructed-frame unit through the parameter adjusting unit until the video compression network satisfies the preset condition includes:
repeatedly executing the reconstructed-frame unit through the parameter adjusting unit until the video compression network reaches the preset reconstruction quality;
or
repeatedly executing the reconstructed-frame unit through the parameter adjusting unit until the number of repetitions reaches the preset number of times.
Optionally, taking the reconstructed (m-1)-th frame as the input of the video compression network to obtain the reconstructed m-th frame includes:
when m = 2, the reconstructed (m-1)-th frame is the 1st frame of the training video.
Example three
Fig. 3 is a schematic diagram of a video compression network training terminal device according to an embodiment of the present invention. As shown in fig. 3, the video compression network training terminal device 3 of this embodiment includes: a processor 30, a memory 31, and a computer program 32, such as a video compression network training program, stored in the memory 31 and executable on the processor 30. The processor 30, when executing the computer program 32, implements the steps of the video compression network training method embodiments, such as the steps 101 to 105 shown in fig. 1. Alternatively, the processor 30 implements the functions of the modules/units in the device embodiments, such as the modules 21 to 25 shown in fig. 2, when executing the computer program 32.
Illustratively, the computer program 32 may be divided into one or more modules/units, which are stored in the memory 31 and executed by the processor 30 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used for describing the execution process of the computer program 32 in the video compression network training terminal device 3. For example, the computer program 32 may be divided into a video frame module, a network building module, a training module, a parameter updating module, and a training completion module, and the specific functions of each module are as follows:
a video frame module for representing the training video as a sequence of frames comprising N frames, wherein N is a positive integer;
the network construction module is used for constructing a video compression network corresponding to the training video;
the training module is used for training the video compression network according to the reconstructed (m-1)-th frame and updating the weight parameters, wherein 1 < m ≤ N and m is a positive integer;
the parameter updating module is used for obtaining the reconstructed m-th frame according to the reconstructed (m-1)-th frame and the m-th frame based on the video compression network after the weight parameters are updated;
and the training completion module is used for obtaining the trained video compression network after each frame in the frame sequence is input into the video compression network and the weight parameters are updated.
The video compression network training terminal device 3 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The video compression network training terminal device may include, but is not limited to, a processor 30 and a memory 31. Those skilled in the art will appreciate that fig. 3 is only an example of the video compression network training terminal device 3, and does not constitute a limitation of the video compression network training terminal device 3, and may include more or less components than those shown, or combine some components, or different components, for example, the video compression network training terminal device may further include an input-output device, a network access device, a bus, etc.
The processor 30 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 31 may be an internal storage unit of the video compression network training terminal device 3, such as a hard disk or memory of the device. The memory 31 may also be an external storage device of the video compression network training terminal device 3, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the device. Further, the memory 31 may include both an internal storage unit and an external storage device of the video compression network training terminal device 3. The memory 31 is used for storing the computer program and the other programs and data required by the video compression network training terminal device, and may also be used to temporarily store data that has been output or is to be output.
As can be seen from the above, in the embodiment, the video sequence is used as the training sample, and the reconstructed frame of the current frame recovered by the video compression network is used as the reference frame, so that the video compression network learns the feature distribution of the reconstructed frame, and the video compression network can have better reconstruction capability.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned functions may be distributed as different functional units and modules according to needs, that is, the internal structure of the apparatus may be divided into different functional units or modules to implement all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the above modules or units is only one logical function division, and there may be other division manners in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain other components which may be suitably increased or decreased as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media which may not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A video compression network training method is characterized by comprising the following steps:
representing a training video as a sequence of frames comprising N frames, wherein N is a positive integer;
constructing a video compression network corresponding to the training video;
training the video compression network according to the reconstructed (m-1)-th frame and updating the weight parameters, wherein 1 < m ≤ N and m is a positive integer;
based on the video compression network with the updated weight parameters, obtaining the reconstructed m-th frame according to the reconstructed (m-1)-th frame and the m-th frame;
and inputting each frame in the frame sequence into the video compression network and updating the weight parameters to obtain the trained video compression network.
2. The method of claim 1, wherein the training of the video compression network according to the reconstructed (m-1)-th frame and the updating of the weight parameters comprise:
S1: taking the reconstructed (m-1)-th frame as the input of the video compression network to obtain the reconstructed m-th frame;
S2: calculating a loss function between the reconstructed m-th frame and the m-th frame, performing a gradient update according to the loss function, and adjusting the weight parameters of the video compression network;
s3: repeating the execution of S1 to S2 until the video compression network satisfies a preset condition.
3. The video compression network training method of claim 2, wherein the repeatedly performing S1 through S2 until the video compression network satisfies a preset condition comprises:
repeatedly performing S1 to S2 until the video compression network reaches a preset reconstruction quality;
or
repeatedly performing S1 to S2 until the number of repetitions reaches a preset number of times.
4. The method of claim 2, wherein taking the reconstructed (m-1)-th frame as the input of the video compression network to obtain the reconstructed m-th frame comprises:
when m = 2, the reconstructed (m-1)-th frame is the 1st frame of the training video.
5. A video compression network training apparatus, comprising:
a video frame module for representing the training video as a sequence of frames comprising N frames, wherein N is a positive integer;
the network construction module is used for constructing a video compression network corresponding to the training video;
the training module is used for training the video compression network according to the reconstructed (m-1)-th frame and updating the weight parameters, wherein 1 < m ≤ N and m is a positive integer;
the parameter updating module is used for obtaining the reconstructed m-th frame according to the reconstructed (m-1)-th frame and the m-th frame based on the video compression network after the weight parameters are updated;
and the training completion module is used for obtaining the trained video compression network after each frame in the frame sequence is input into the video compression network and the weight parameters are updated.
6. The video compression network training apparatus of claim 5, wherein the training module comprises:
a reconstructed-frame unit, configured to take the reconstructed (m-1)-th frame as the input of the video compression network to obtain the reconstructed m-th frame;
a parameter adjusting unit, configured to calculate a loss function between the reconstructed m-th frame and the m-th frame, perform a gradient update according to the loss function, and adjust the weight parameters of the video compression network;
and a loop unit, configured to repeatedly execute the reconstructed-frame unit through the parameter adjusting unit until the video compression network satisfies a preset condition.
7. The video compression network training apparatus of claim 6, wherein repeatedly executing the reconstructed-frame unit through the parameter adjusting unit until the video compression network satisfies the preset condition comprises:
repeatedly executing the reconstructed-frame unit through the parameter adjusting unit until the video compression network reaches the preset reconstruction quality;
or
repeatedly executing the reconstructed-frame unit through the parameter adjusting unit until the number of repetitions reaches the preset number of times.
8. The video compression network training apparatus of claim 6, wherein taking the reconstructed (m-1)-th frame as the input of the video compression network to obtain the reconstructed m-th frame comprises:
when m = 2, the reconstructed (m-1)-th frame is the 1st frame of the training video.
9. A video compression network training terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
CN201911411588.9A 2019-12-31 2019-12-31 Video compression network training method and device and terminal equipment Pending CN111083482A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911411588.9A CN111083482A (en) 2019-12-31 2019-12-31 Video compression network training method and device and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911411588.9A CN111083482A (en) 2019-12-31 2019-12-31 Video compression network training method and device and terminal equipment

Publications (1)

Publication Number Publication Date
CN111083482A (en) 2020-04-28

Family

ID=70320806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911411588.9A Pending CN111083482A (en) 2019-12-31 2019-12-31 Video compression network training method and device and terminal equipment

Country Status (1)

Country Link
CN (1) CN111083482A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111787323A (en) * 2020-05-23 2020-10-16 清华大学 Variable bit rate generation type compression method based on counterstudy
CN113784108A (en) * 2021-08-25 2021-12-10 盐城香农智能科技有限公司 VR (virtual reality) tour and sightseeing method and system based on 5G transmission technology

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180173994A1 (en) * 2016-12-15 2018-06-21 WaveOne Inc. Enhanced coding efficiency with progressive representation
CN107197260A (en) * 2017-06-12 2017-09-22 清华大学深圳研究生院 Video coding post-filter method based on convolutional neural networks
US20190124346A1 (en) * 2017-10-19 2019-04-25 Arizona Board Of Regents On Behalf Of Arizona State University Real time end-to-end learning system for a high frame rate video compressive sensing network
CN108111860A (en) * 2018-01-11 2018-06-01 安徽优思天成智能科技有限公司 Video sequence lost frames prediction restoration methods based on depth residual error network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHARALAMPOS N. DOUKAS; THOMAS A. PLIAKAS; ILIAS MAGLOGIANNIS: "Advanced scalable medical video transmission based on H.264 temporal and spatial compression", 《AFRICON 2007》 *
李强; 汪洋; 明艳: "Principle analysis and testing of quality-scalable video coding" (质量可伸缩视频编码原理分析及测试), 《数字通信》 (Digital Communication) *


Similar Documents

Publication Publication Date Title
Yu et al. A unified learning framework for single image super-resolution
CN108932697B (en) Distortion removing method and device for distorted image and electronic equipment
CN110677651A (en) Video compression method
CN109087273B (en) Image restoration method, storage medium and system based on enhanced neural network
CN109671026B (en) Gray level image noise reduction method based on void convolution and automatic coding and decoding neural network
CN110569961A (en) neural network training method and device and terminal equipment
CN109949219B (en) Reconstruction method, device and equipment of super-resolution image
KR20180004208A (en) Convergence Neural Network based complete reference image quality evaluation
CN110753225A (en) Video compression method and device and terminal equipment
CN110189260B (en) Image noise reduction method based on multi-scale parallel gated neural network
CN110830808A (en) Video frame reconstruction method and device and terminal equipment
CN111105375B (en) Image generation method, model training method and device thereof, and electronic equipment
CN113222855B (en) Image recovery method, device and equipment
CN111083482A (en) Video compression network training method and device and terminal equipment
CN116485741A (en) No-reference image quality evaluation method, system, electronic equipment and storage medium
CN110650339A (en) Video compression method and device and terminal equipment
CN111784699A (en) Method and device for carrying out target segmentation on three-dimensional point cloud data and terminal equipment
CN113744159A (en) Remote sensing image defogging method and device and electronic equipment
CN110944212A (en) Video frame reconstruction method and device and terminal equipment
CN110677671A (en) Image compression method and device and terminal equipment
CN113298931B (en) Reconstruction method and device of object model, terminal equipment and storage medium
CN113705553B (en) Visual task execution method, device, electronic equipment, storage medium and system
CN112288748B (en) Semantic segmentation network training and image semantic segmentation method and device
CN111489289B (en) Image processing method, image processing device and terminal equipment
CN110913220A (en) Video frame coding method and device and terminal equipment

Legal Events

Code: PB01 - Publication
Code: SE01 - Entry into force of request for substantive examination
Code: WD01 - Invention patent application deemed withdrawn after publication (application publication date: 2020-04-28)