CN111866511B - Video damage repairing method based on convolution long-short term memory neural network - Google Patents
- Publication number
- CN111866511B (application CN202010794331.2A)
- Authority
- CN
- China
- Prior art keywords
- video
- network
- frame
- term memory
- short term
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/90—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
- H04N19/96—Tree coding, e.g. quad-tree coding
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Signal Processing (AREA)
- Software Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention relates to a video damage repair method based on a convolutional long short-term memory neural network, which comprises the following steps: 1. compress an undamaged video code stream A1 with a video codec to obtain a damaged video code stream A2; 2. losslessly decompose A1 and A2 frame by frame to obtain frame sequences B1 and B2 respectively, and store B1 and B2 in frame-by-frame correspondence; 3. extract the block depth information S1 of A2 with the video codec, partition A2 frame by frame according to S1, and superpose the result on B2 to generate a frame sequence B3 carrying video coding information; 4. construct a video repair network; 5. with B1 as labels, input B2 and B3 into the video repair network and train it with the loss function as the optimization target; 6. input the video sequence to be repaired and its coding information into the trained video repair network to obtain the repaired video sequence. Damaged video repaired by the method has high definition, and repair is applied only to the damaged areas of the video.
Description
Technical field:
The invention relates to a video damage repair method, in particular to a video damage repair method based on a convolutional long short-term memory neural network.
Background art:
Video is widely used in many fields such as consumer applications, security surveillance, military, and aerospace. As technology develops, demands on video quality keep rising, both visually and in other fields. In security surveillance, for example, a face captured by a camera may need to be identified, yet its identity cannot be recognized from facial features because the image quality is too low. The reason is that during transmission, encoding, and decoding, video undergoes repeated transcoding and compression, and the accumulation of multiple low-quality transcodings produces severe noise that easily causes errors when a computer extracts information. In addition, some high-definition videos must be reduced in resolution during transmission to meet bandwidth constraints, which degrades the viewing experience. It is therefore especially important to enhance and restore low-quality video into high-resolution video.
Video damage repair methods based on deep learning have thoroughly surpassed traditional methods in repair quality. Thanks to complex network structures and enormous parameter counts, deep learning can fit complex application scenarios well. In super-resolution research, for example, emerging network structures such as VDSR and DnCNN have raised objective metrics and subjective quality to a level that traditional methods find difficult to reach. By learning video enhancement directly from a data set, deep learning is well suited to restoring video whose noise has complex causes, but inference consumes a large amount of computing resources; its processing speed limits its further development, so the network must be optimized to meet the requirements of everyday scenarios.
DnCNN, a conventional image denoising technique based on deep convolutional residual learning, remains one of the best-performing methods in the field of image denoising. Denoising autoencoders are also prominent in image denoising: the algorithm is unsupervised learning based on a neural network that represents image features through hidden-layer units; its inputs and outputs are easy to obtain, few changes of image size need to be considered, and it learns the data characteristics well. Prior work has proposed a self-encoding denoising method with good recovery and denoising effects on defective, lost, and blurred digital pictures. Image reconstruction algorithms such as the Trainable Nonlinear Reaction Diffusion (TNRD) model, which improves on nonlinear diffusion models, are well suited to image restoration.
Enhancement algorithms based on deep learning achieve good results on subjective and objective metrics, but as the network deepens, problems such as vanishing gradients, exploding gradients, and non-convergence of training arise, limiting further improvement of video definition.
Summary of the invention:
The technical problem to be solved by the invention is: to repair damaged video and further improve its definition, repairing specifically the damaged areas of the video so as to avoid harming the undamaged areas.
The technical scheme of the invention is as follows:
A video damage repair method based on a convolutional long short-term memory neural network comprises the following steps:
Step 1, compressing a high-resolution undamaged video code stream A1 with a video codec, setting the quantization parameter (QP) to a non-zero value, to obtain a damaged video code stream A2;
Step 2, losslessly decomposing video code stream A1 frame by frame to obtain a frame sequence B1; losslessly decomposing video code stream A2 frame by frame to obtain a frame sequence B2; storing frame sequence B1 and frame sequence B2 in frame-by-frame correspondence;
Step 3, extracting the block depth information S1 of video code stream A2 with the video codec, partitioning A2 frame by frame according to S1, and superposing the partition result on frame sequence B2 through an iterative algorithm to generate a frame sequence B3 carrying video coding information;
Step 4, constructing a video repair network based on the convolutional long short-term memory neural network;
Step 5, designing a hybrid-structure Loss function Loss for the video repair network;
Step 6, taking the hybrid-structure Loss function Loss as the optimization target and frame sequence B1 as the labels, inputting frame sequence B2 and frame sequence B3 into the video repair network and training it to obtain a trained video repair network;
Step 7, inputting the video sequence to be repaired and its coding information into the trained video repair network to obtain the repaired video sequence.
In step 1, the encoding mode of the video codec is set to the low-latency all-P-frame (low_delay_P) mode, the quantization parameter (QP) is set to 37, the quantization parameter offset (QPoffset) is 0, and the remaining encoding parameters are kept unchanged.
The specific process of step 3 is as follows:
Step 3.1, parsing the coding header information of video code stream A2 with a video codec, and extracting the information of slices, coding tree units (CTU), coding units (CU), prediction units (PU), and transform units (TU) from it to form the block depth information S1;
Step 3.2, partitioning video code stream A2 frame by frame according to the block depth information S1, and superposing the partition result on frame sequence B2 through an iterative algorithm to generate a frame sequence B3 carrying macroblock partition information and transform unit partition information.
In the block depth information S1, 0 indicates no division, 1 indicates division into blocks of 64 × 64 size, 2 indicates division into blocks of 32 × 32 size, 3 indicates division into blocks of 16 × 16 size, and 4 indicates division into blocks of 8 × 8 size.
The specific process of step 4 is as follows:
Step 4.1, constructing a coding information extraction network: it comprises 5 groups of 2 convolution blocks each, every convolution block consisting of a convolution layer, a batch normalization (BN) layer, and an activation function (ReLU) layer; its input is the frame sequence B3 carrying video coding information, and its output is H1;
Step 4.2, constructing a feature fusion network: it comprises 5 groups of 3 convolution blocks each, every convolution block consisting of a convolution layer, a batch normalization (BN) layer, and an activation function (ReLU) layer; its input is the channel-wise concatenation (concat) of output H1 and the corresponding frames of sequence B2, and its output is H2;
Step 4.3, constructing a bidirectional long short-term memory network: it comprises 2 groups of 5 convolutional long short-term memory (ConvLSTM) cell units each, and every ConvLSTM cell unit simultaneously receives bidirectional information conduction and outputs corresponding content; its input is output H2, and its output is H3;
Step 4.4, constructing a feature compression network: it comprises 4 layers of convolution blocks, every convolution block consisting of a convolution layer and an activation function (ReLU) layer; its input is output H3, and its output is H4.
In step 5, the hybrid-structure Loss function Loss contains an MSE loss and an L1 loss, both calculated between the repaired image and video code stream A1; the repaired image is generated by the video repair network constructed in step 4. The calculation formulas of the MSE loss and the L1 loss are as follows:

MSE = (1/n) Σ (I(x_i, y_j) - O(x_i, y_j))^2

L1 = (1/n) Σ |I(x_i, y_j) - O(x_i, y_j)|

In the formulas, I(x_i, y_j) represents the pixel value at coordinates (x_i, y_j) in the repaired image, O(x_i, y_j) represents the pixel value at coordinates (x_i, y_j) in video code stream A1, and n represents the total number of pixels; the repaired image and the n pixels of the corresponding frame of A1 are differenced one by one, the MSE loss being the mean of the squared differences and the L1 loss the mean of the absolute differences;
the hybrid Loss function Loss is expressed as follows:
Loss=0.5*MSE+0.5*L1。
In step 7, the video to be repaired and its code stream block-partition information are input into the trained video repair network to obtain the repaired video sequence.
The video codec is the HEVC official video codec.
The HEVC official video codec is HM-16.0.
The invention has the beneficial effects that:
1. The method repairs the current frame using the spatial information of the current frame of the video to be repaired together with the temporal information contained in the frames before and after it; meanwhile, a deep learning model based on a convolutional long short-term memory network is constructed, and a hybrid-structure loss function guarantees detail reconstruction, so that the damaged video is not only repaired but its definition is further improved.
2. The invention extracts from the complex coding and decoding information the items most relevant to video damage, such as macroblock partitioning and transform unit partitioning, so that the neural network model repairs the damaged areas of the video in a targeted way and avoids harming the undamaged areas.
The specific implementation mode is as follows:
the video damage repairing method based on the convolution long-short term memory neural network comprises the following steps:
step 1, compressing a high-resolution undamaged video code stream A1 by using an HEVC official video codec HM-16.0, and changing a quantization parameter QP (quantization parameter) to a non-zero value to obtain a damaged video code stream A2;
step 2, decomposing the video code stream A1 lossless frame by frame to obtain a frame sequence B1; carrying out lossless frame-by-frame decomposition on the video code stream A2 to obtain a frame sequence B2; storing frame sequence B1 and frame sequence B2 in a frame-by-frame correspondence;
step 3, extracting the block depth information S1 of the video code stream A2 by using a video codec, dividing the video code stream A2 frame by frame according to the block depth information S1, and superposing a division result on a frame sequence B2 through an iterative algorithm to generate a frame sequence B3 carrying video coding information;
step 4, constructing a video repair network based on the convolution long-term and short-term memory neural network;
step 5, designing a Loss function Loss of a mixed structure of the video restoration network;
step 6, taking a Loss function Loss of the mixed structure as an optimization target, taking a frame sequence B1 as a label of the video repair network, inputting the frame sequence B2 and the frame sequence B3 into the video repair network, and training the video repair network to obtain a trained video repair network;
and 7, inputting the video sequence to be repaired and the coding information thereof into the trained video repairing network to obtain the repaired video sequence.
In step 1, the encoding mode of the HEVC official video codec HM-16.0 is set to the low-latency all-P-frame (low_delay_P) mode, the quantization parameter (QP) is set to 37, the quantization parameter offset (QPoffset) is 0, and the remaining encoding parameters are kept unchanged.
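By way of illustration, the following is a minimal sketch of steps 1 and 2 under these settings. It assumes the HM-16.0 TAppEncoder binary and ffmpeg are on the PATH and that the standard HM configuration file cfg/encoder_lowdelay_P_main.cfg is used (with QPoffset left at 0 there); the file names, resolution, frame count, and frame rate are illustrative placeholders, not values fixed by the method.

```python
# Sketch of steps 1-2: compress A1 with HM-16.0 to obtain the damaged stream A2,
# then losslessly decompose both streams into PNG frame sequences B1 and B2.
# Assumptions: TAppEncoder (HM-16.0) and ffmpeg are on PATH; the HM config file
# path, file names, resolution, frame count, and frame rate are placeholders.
import subprocess
from pathlib import Path

def encode_damaged(yuv_in: str, bitstream_out: str, w: int, h: int, n_frames: int) -> None:
    """Step 1: low_delay_P encoding with QP = 37 (QPoffset = 0 stays in the cfg)."""
    subprocess.run([
        "TAppEncoder",
        "-c", "cfg/encoder_lowdelay_P_main.cfg",   # low-latency all-P-frame mode
        "-i", yuv_in, "-b", bitstream_out,
        "-wdt", str(w), "-hgt", str(h),
        "-f", str(n_frames), "-fr", "30",
        "-q", "37",                                # non-zero QP introduces the damage
    ], check=True)

def decompose(bitstream: str, frame_dir: str) -> None:
    """Step 2: lossless frame-by-frame decomposition (PNG stores decoded frames exactly)."""
    Path(frame_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(["ffmpeg", "-i", bitstream, "-vsync", "0", f"{frame_dir}/%05d.png"],
                   check=True)

# encode_damaged("source.yuv", "A2.bin", 1920, 1080, 100)
# decompose("A1.bin", "B1")
# decompose("A2.bin", "B2")   # B1 and B2 correspond frame by frame via the index
```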
The specific process of step 3 is as follows:
Step 3.1, parsing the coding header information of video code stream A2 with the HEVC official video codec HM-16.0, and extracting the information of slices, coding tree units (CTU), coding units (CU), prediction units (PU), and transform units (TU) from it to form the block depth information S1;
Step 3.2, partitioning video code stream A2 frame by frame according to the block depth information S1, and superposing the partition result on frame sequence B2 through an iterative algorithm to generate a frame sequence B3 carrying macroblock partition information and transform unit partition information.
In the block depth information S1, 0 indicates no division, 1 indicates division into blocks of 64 × 64 size, 2 indicates division into blocks of 32 × 32 size, 3 indicates division into blocks of 16 × 16 size, and 4 indicates division into blocks of 8 × 8 size.
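The text does not fix how the iterative superposition of step 3.2 is implemented; the following is one possible sketch. It assumes S1 is available as one depth value (0 to 4, with the meanings above) per 8 × 8 minimal unit of each frame, that frame dimensions are multiples of 64, and that the coding information is attached to the B2 frame as an extra boundary-mask channel; all of these representational choices are assumptions.

```python
# One possible reading of step 3.2: from a per-frame depth map S1 (one value in
# 0..4 per 8x8 minimal unit, meanings as defined above), iteratively split each
# 64x64 CTU, draw the implied block boundaries into a mask, and stack the mask
# onto the B2 frame as an extra channel to form a B3 frame.
import numpy as np

def draw_partitions(depth8: np.ndarray, x0: int, y0: int, size: int,
                    level: int, mask: np.ndarray) -> None:
    """Recursively split while the stored depth exceeds the current level,
    marking every block outline in `mask` (1 = boundary pixel)."""
    if depth8[y0 // 8, x0 // 8] > level and size > 8:
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                draw_partitions(depth8, x0 + dx, y0 + dy, half, level + 1, mask)
    mask[y0, x0:x0 + size] = 1                     # top edge
    mask[y0 + size - 1, x0:x0 + size] = 1          # bottom edge
    mask[y0:y0 + size, x0] = 1                     # left edge
    mask[y0:y0 + size, x0 + size - 1] = 1          # right edge

def make_b3_frame(frame_b2: np.ndarray, depth8: np.ndarray) -> np.ndarray:
    """Superpose the partition boundaries of one frame onto its B2 image."""
    h, w = frame_b2.shape[:2]                      # assumed multiples of 64
    mask = np.zeros((h, w), dtype=frame_b2.dtype)
    for cy in range(0, h, 64):                     # iterate over 64x64 CTUs
        for cx in range(0, w, 64):
            draw_partitions(depth8, cx, cy, 64, 1, mask)
    return np.dstack([frame_b2, mask])             # image channels + coding-info channel
```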
The specific process of step 4 is as follows:
Step 4.1, constructing a coding information extraction network: it comprises 5 groups of 2 convolution blocks each, every convolution block consisting of a convolution layer, a batch normalization (BN) layer, and an activation function (ReLU) layer; its input is the frame sequence B3 carrying video coding information, and its output is H1;
Step 4.2, constructing a feature fusion network: it comprises 5 groups of 3 convolution blocks each, every convolution block consisting of a convolution layer, a batch normalization (BN) layer, and an activation function (ReLU) layer; its input is the channel-wise concatenation (concat) of output H1 and the corresponding frames of sequence B2, and its output is H2;
Step 4.3, constructing a bidirectional long short-term memory network: it comprises 2 groups of 5 convolutional long short-term memory (ConvLSTM) cell units each, and every ConvLSTM cell unit simultaneously receives bidirectional information conduction and outputs corresponding content; its input is output H2, and its output is H3;
Step 4.4, constructing a feature compression network: it comprises 4 layers of convolution blocks, every convolution block consisting of a convolution layer and an activation function (ReLU) layer; its input is output H3, and its output is H4.
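A minimal PyTorch sketch of this four-part network follows. The text fixes the block counts (5 × 2 extraction blocks, 5 × 3 fusion blocks, 2 groups of 5 ConvLSTM cells, 4 compression blocks) but not channel widths, kernel sizes, or the ConvLSTM internals; those, together with the reading of "2 groups of 5 cells" as forward and backward passes over a 5-frame window, are assumptions here.

```python
# Minimal PyTorch sketch of the step-4 network; widths and kernel sizes are assumed.
import torch
import torch.nn as nn

def conv_block(cin, cout, bn=True):
    """One convolution block: conv (+ optional BN) + ReLU."""
    layers = [nn.Conv2d(cin, cout, 3, padding=1)]
    if bn:
        layers.append(nn.BatchNorm2d(cout))
    layers.append(nn.ReLU(inplace=True))
    return layers

class ConvLSTMCell(nn.Module):
    """One ConvLSTM cell; all four gates come from a single 3x3 convolution."""
    def __init__(self, cin, chid):
        super().__init__()
        self.chid = chid
        self.gates = nn.Conv2d(cin + chid, 4 * chid, 3, padding=1)

    def forward(self, x, h, c):
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class BiConvLSTM(nn.Module):
    """Step 4.3: one forward and one backward pass over the frame sequence."""
    def __init__(self, ch):
        super().__init__()
        self.fwd, self.bwd = ConvLSTMCell(ch, ch), ConvLSTMCell(ch, ch)
        self.merge = nn.Conv2d(2 * ch, ch, 1)      # fuse both directions

    def _run(self, cell, seq):                     # seq: (B, T, C, H, W)
        b, t, _, hgt, wid = seq.shape
        h = seq.new_zeros(b, cell.chid, hgt, wid)
        c = torch.zeros_like(h)
        outs = []
        for k in range(t):
            h, c = cell(seq[:, k], h, c)
            outs.append(h)
        return torch.stack(outs, dim=1)

    def forward(self, seq):
        f = self._run(self.fwd, seq)
        bw = self._run(self.bwd, seq.flip(1)).flip(1)
        return self.merge(torch.cat([f, bw], dim=2).flatten(0, 1)).view_as(seq)

class RepairNet(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        # 4.1: coding-information extraction, 5 x 2 conv blocks; B3 carries an
        # extra coding-info channel on top of the 3 image channels.
        ext = conv_block(4, ch)
        for _ in range(9):
            ext += conv_block(ch, ch)
        self.extract = nn.Sequential(*ext)
        # 4.2: feature fusion, 5 x 3 conv blocks, fed with concat(H1, B2).
        fus = conv_block(ch + 3, ch)
        for _ in range(14):
            fus += conv_block(ch, ch)
        self.fuse = nn.Sequential(*fus)
        self.bilstm = BiConvLSTM(ch)               # 4.3
        # 4.4: feature compression, 4 conv blocks without BN.
        cmp_ = []
        for _ in range(3):
            cmp_ += conv_block(ch, ch, bn=False)
        cmp_ += conv_block(ch, 3, bn=False)
        self.compress = nn.Sequential(*cmp_)

    def forward(self, b2, b3):                     # b2: (B,T,3,H,W), b3: (B,T,4,H,W)
        b, t = b2.shape[:2]
        h1 = self.extract(b3.flatten(0, 1))        # per-frame extraction -> H1
        h2 = self.fuse(torch.cat([h1, b2.flatten(0, 1)], dim=1))     # -> H2
        h3 = self.bilstm(h2.view(b, t, *h2.shape[1:]))               # -> H3
        h4 = self.compress(h3.flatten(0, 1))       # -> H4, the repaired frames
        return h4.view(b, t, *h4.shape[1:])

# net = RepairNet()
# out = net(torch.rand(1, 5, 3, 64, 64), torch.rand(1, 5, 4, 64, 64))  # (1,5,3,64,64)
```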
In step 5, the hybrid-structure Loss function Loss contains an MSE loss and an L1 loss, both calculated between the repaired image and video code stream A1; the repaired image is generated by the video repair network constructed in step 4. The calculation formulas of the MSE loss and the L1 loss are as follows:

MSE = (1/n) Σ (I(x_i, y_j) - O(x_i, y_j))^2

L1 = (1/n) Σ |I(x_i, y_j) - O(x_i, y_j)|

In the formulas, I(x_i, y_j) represents the pixel value at coordinates (x_i, y_j) in the repaired image, O(x_i, y_j) represents the pixel value at coordinates (x_i, y_j) in video code stream A1, and n represents the total number of pixels; the repaired image and the n pixels of the corresponding frame of A1 are differenced one by one, the MSE loss being the mean of the squared differences and the L1 loss the mean of the absolute differences;
the hybrid Loss function Loss is expressed as follows:
Loss=0.5*MSE+0.5*L1。
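In code, the hybrid loss is a direct translation of the formulas above; this PyTorch sketch assumes the output and label frames are float tensors of identical shape:

```python
# Hybrid loss: equal-weighted MSE and L1 terms between the repaired frames
# and the corresponding uncompressed A1 frames (the B1 labels).
import torch
import torch.nn.functional as F

def hybrid_loss(repaired: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
    mse = F.mse_loss(repaired, label)   # (1/n) * sum of squared pixel differences
    l1 = F.l1_loss(repaired, label)     # (1/n) * sum of absolute pixel differences
    return 0.5 * mse + 0.5 * l1
```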
In step 7, the video to be repaired and its code stream block-partition information are input into the trained video repair network to obtain the repaired video sequence.
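Steps 6 and 7 then reduce to an ordinary supervised training loop followed by a forward pass. The sketch below reuses RepairNet and hybrid_loss from the sketches above; the optimizer choice, learning rate, epoch count, and the data loader yielding (b1, b2, b3) clip tensors are illustrative assumptions.

```python
# Sketch of steps 6-7: train with B1 as labels and (B2, B3) as inputs,
# then repair an unseen sequence.
import torch

def train(net, loader, epochs=100, lr=1e-4, device="cuda"):
    net = net.to(device).train()
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):
        for b1, b2, b3 in loader:        # labels, damaged frames, frames + coding info
            out = net(b2.to(device), b3.to(device))
            loss = hybrid_loss(out, b1.to(device))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return net

@torch.no_grad()
def repair(net, b2, b3, device="cuda"):
    """Step 7: feed the sequence to be repaired and its coding information."""
    net.eval()
    return net(b2.to(device), b3.to(device)).clamp(0, 1).cpu()
```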
Claims (8)
1. A video damage repair method based on a convolutional long short-term memory neural network, characterized by comprising the following steps:
Step 1, compressing a high-resolution undamaged video code stream A1 with a video codec, setting the quantization parameter to a non-zero value, to obtain a damaged video code stream A2;
Step 2, losslessly decomposing video code stream A1 frame by frame to obtain a frame sequence B1; losslessly decomposing video code stream A2 frame by frame to obtain a frame sequence B2; storing frame sequence B1 and frame sequence B2 in frame-by-frame correspondence;
Step 3, extracting the block depth information S1 of video code stream A2 with the video codec, partitioning A2 frame by frame according to S1, and superposing the partition result on frame sequence B2 through an iterative algorithm to generate a frame sequence B3 carrying video coding information;
Step 4, constructing a video repair network based on the convolutional long short-term memory neural network;
Step 5, designing a hybrid-structure Loss function Loss for the video repair network;
Step 6, taking the hybrid-structure Loss function Loss as the optimization target and frame sequence B1 as the labels, inputting frame sequence B2 and frame sequence B3 into the video repair network and training it to obtain a trained video repair network;
Step 7, inputting the video sequence to be repaired and its coding information into the trained video repair network to obtain the repaired video sequence;
in step 5, the hybrid-structure Loss function Loss contains an MSE loss and an L1 loss, both calculated between the repaired image generated by the video repair network constructed in step 4 and video code stream A1, with the calculation formulas:

MSE = (1/n) Σ (I(x_i, y_j) - O(x_i, y_j))^2

L1 = (1/n) Σ |I(x_i, y_j) - O(x_i, y_j)|

where I(x_i, y_j) represents the pixel value at coordinates (x_i, y_j) in the repaired image, O(x_i, y_j) represents the pixel value at coordinates (x_i, y_j) in video code stream A1, and n represents the total number of pixels;
the hybrid Loss function Loss is expressed as follows:
Loss=0.5*MSE+0.5*L1。
2. The video damage repair method based on a convolutional long short-term memory neural network as claimed in claim 1, wherein: in step 1, the encoding mode of the video codec is set to the low-latency all-P-frame coding mode, the quantization parameter is set to 37, and the quantization parameter offset is 0.
3. The video damage repair method based on a convolutional long short-term memory neural network as claimed in claim 1, wherein: the specific process of step 3 is as follows:
Step 3.1, parsing the coding header information of video code stream A2 with a video codec, and extracting the information of slices, coding tree units, coding units, prediction units, and transform units from it to form the block depth information S1;
Step 3.2, partitioning video code stream A2 frame by frame according to the block depth information S1, and superposing the partition result on frame sequence B2 through an iterative algorithm to generate a frame sequence B3 carrying macroblock partition information and transform unit partition information.
4. The video damage repair method based on a convolutional long short-term memory neural network as claimed in claim 3, wherein: in the block depth information S1, 0 indicates no division, 1 indicates division into blocks of 64 × 64 size, 2 indicates division into blocks of 32 × 32 size, 3 indicates division into blocks of 16 × 16 size, and 4 indicates division into blocks of 8 × 8 size.
5. The video damage repair method based on a convolutional long short-term memory neural network as claimed in claim 1, wherein: the specific process of step 4 is as follows:
Step 4.1, constructing a coding information extraction network: it comprises 5 groups of 2 convolution blocks each, every convolution block consisting of a convolution layer, a batch normalization layer, and an activation function layer; its input is frame sequence B3, and its output is H1;
Step 4.2, constructing a feature fusion network: it comprises 5 groups of 3 convolution blocks each, every convolution block consisting of a convolution layer, a batch normalization layer, and an activation function layer; its input is the channel-wise concatenation of output H1 and the corresponding frames of sequence B2, and its output is H2;
Step 4.3, constructing a bidirectional long short-term memory network: it comprises 2 groups of 5 convolutional long short-term memory cell units each, every cell unit receiving bidirectional information conduction and outputting corresponding content; its input is output H2, and its output is H3;
Step 4.4, constructing a feature compression network: it comprises 4 layers of convolution blocks, every convolution block consisting of a convolution layer and an activation function layer; its input is output H3, and its output is H4.
6. The video damage repair method based on a convolutional long short-term memory neural network as claimed in claim 1, wherein: in step 7, the video to be repaired and its code stream block-partition information are input into the trained video repair network to obtain the repaired video sequence.
7. The video damage repair method based on a convolutional long short-term memory neural network as claimed in claim 1, wherein: the video codec is an HEVC video codec.
8. The video damage repair method based on a convolutional long short-term memory neural network as claimed in claim 7, wherein: the HEVC video codec is HEVC video codec HM-16.0.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010794331.2A CN111866511B (en) | 2020-08-10 | 2020-08-10 | Video damage repairing method based on convolution long-short term memory neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010794331.2A CN111866511B (en) | 2020-08-10 | 2020-08-10 | Video damage repairing method based on convolution long-short term memory neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111866511A CN111866511A (en) | 2020-10-30 |
CN111866511B true CN111866511B (en) | 2022-03-15 |
Family
ID=72971781
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010794331.2A Active CN111866511B (en) | 2020-08-10 | 2020-08-10 | Video damage repairing method based on convolution long-short term memory neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111866511B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115941962A (en) * | 2021-09-30 | 2023-04-07 | 深圳市中兴微电子技术有限公司 | Video coding unit dividing method, device, computer equipment and readable medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110288537A (en) * | 2019-05-20 | 2019-09-27 | 湖南大学 | Facial image complementing method based on the depth production confrontation network from attention |
CN110545426A (en) * | 2019-08-29 | 2019-12-06 | 西安电子科技大学 | Spatial domain scalable video coding method based on coding damage repair (CNN) |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11064180B2 (en) * | 2018-10-15 | 2021-07-13 | City University Of Hong Kong | Convolutional neural network based synthesized view quality enhancement for video coding |
- 2020-08-10: application CN202010794331.2A granted as patent CN111866511B (status: active)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110288537A (en) * | 2019-05-20 | 2019-09-27 | 湖南大学 | Facial image complementing method based on the depth production confrontation network from attention |
CN110545426A (en) * | 2019-08-29 | 2019-12-06 | 西安电子科技大学 | Spatial domain scalable video coding method based on coding damage repair (CNN) |
Non-Patent Citations (2)
Title |
---|
Mora, Andre, "Real-time Image Recovery Using Temporal Image Fusion," 2013 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2013), 2013-12-31, full text *
Shen Zheji, "Research on Image Inpainting Based on Deep Neural Networks," China Master's Theses Full-text Database, Information Science and Technology, 2020-01-31, full text *
Also Published As
Publication number | Publication date |
---|---|
CN111866511A (en) | 2020-10-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110248190B (en) | Multilayer residual coefficient image coding method based on compressed sensing | |
CN113822147B (en) | Deep compression method for semantic tasks of collaborative machine | |
CN112422993B (en) | HEVC video quality enhancement method combined with convolutional neural network | |
CN113055674B (en) | Compressed video quality enhancement method based on two-stage multi-frame cooperation | |
Hu et al. | Fvc: An end-to-end framework towards deep video compression in feature space | |
CN109361919A (en) | A kind of image coding efficiency method for improving combined super-resolution and remove pinch effect | |
CN110677624B (en) | Monitoring video-oriented foreground and background parallel compression method based on deep learning | |
CN111669588B (en) | Ultra-high definition video compression coding and decoding method with ultra-low time delay | |
CN115689917A (en) | Efficient space-time super-resolution video compression restoration method based on deep learning | |
CN112218094A (en) | JPEG image decompression effect removing method based on DCT coefficient prediction | |
CN111866511B (en) | Video damage repairing method based on convolution long-short term memory neural network | |
CN112188217B (en) | JPEG compressed image decompression effect removing method combining DCT domain and pixel domain learning | |
CA2921884C (en) | Multi-level spatial resolution increase of video | |
Hu et al. | HDVC: Deep Video Compression with Hyperprior-Based Entropy Coding | |
CN116437089B (en) | Depth video compression method based on key target | |
CN116012272A (en) | Compressed video quality enhancement method based on reconstructed flow field | |
CN111432208B (en) | Method for determining intra-frame prediction mode by using neural network | |
Xie et al. | Just noticeable visual redundancy forecasting: a deep multimodal-driven approach | |
Huang et al. | Mode information guided CNN for quality enhancement of screen content coding | |
CN113709483A (en) | Adaptive generation method and device for interpolation filter coefficient | |
Xie et al. | Visual Redundancy Removal of Composite Images via Multimodal Learning | |
CN117692652B (en) | Visible light and infrared video fusion coding method based on deep learning | |
Wang et al. | Enhanced Residual SwinV2 Transformer for Learned Image Compression | |
CN116996697B (en) | HEVC (high efficiency video coding) frame-oriented video recovery method | |
Zhang et al. | An Improved Quantization Algorithm for Electric Power Inspection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||