CN108924385B - Video de-jittering method based on width learning - Google Patents
- Publication number: CN108924385B (application CN201810682319.5A)
- Authority
- CN
- China
- Prior art keywords
- video
- frame
- features
- output
- function
- Prior art date
- Legal status: Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/14—Picture signal circuitry for video frequency region
- H04N5/21—Circuitry for suppressing or minimising disturbance, e.g. moiré or halo
Abstract
The invention provides a video de-jittering method based on width learning. Input data for a training set and a test set are obtained from the current frame to be processed of the original video, the corresponding frame of the processed video, and the previous frame of the corresponding frame of the output video of a non-learning processing method. Primary features for video temporal continuity are then extracted with a mapping function, and an activation function is applied to these primary features to obtain enhancement features. The primary features and the enhancement features are combined into all the features extracted by the nth network. In the training set, an energy function constrained by video temporal continuity and video content fidelity is constructed, the weight satisfying the energy function is solved by the least angle regression method, and this weight serves as the target weight connecting the feature layer and the output layer. Finally, in the test set, the de-jittered output frame of the test set is obtained from the target weight and all the extracted features.
Description
Technical Field
The invention relates to the field of computer vision and image processing, and in particular to a video de-jittering method based on width learning.
Background
Video de-jittering methods remove the jitter present in a video, which typically includes hue jitter and brightness jitter. A video de-jittering algorithm removes the jitter between video frames by enforcing temporal continuity across frames, and outputs a temporally continuous, jitter-free video.
In the prior art, a common approach to video de-jittering is based on jitter compensation, which removes the jitter in a video by aligning the hue or brightness between frames. Although this approach can reduce the jitter to some extent, it must first select several frames from the processed, jittery video as key frames, and it is difficult to guarantee that these key frames are themselves temporally consistent; furthermore, if a selected key frame itself exhibits jitter, aligning other frames to it cannot guarantee that the jitter of the processed video is removed. Another approach maintains temporal consistency between video frames by minimizing an energy function containing a temporal-consistency optimization term, but such methods are designed for specific applications, which limits their generalization ability. Typical application-specific algorithms of this type include eigen-map decomposition, color classification, color harmonization, and white balance. Because these algorithms for removing video jitter are tied to particular applications, they are unsuitable for most other situations, which limits the generalization ability of this class of algorithms.
In view of the above shortcomings of the prior art, how to design a new video de-jittering method that improves on or eliminates these defects, so that the jitter in the processed video can be removed to the greatest extent, is an urgent problem in the development of computer vision.
Disclosure of Invention
To overcome the defects of existing video de-jittering methods, the invention provides a video de-jittering method based on width learning, which builds a width-learning de-jittering model from the characteristics of the input video and the processed video in order to remove video jitter.
According to an aspect of the present invention, there is provided a video de-jittering method based on width learning, including the following steps:
a) obtaining input data X_n of the training set and input data F_n of the test set from the current frame I_n to be processed of the original video, the corresponding frame P_n of the video processed frame by frame with an image processing method, and the previous frame O_{n-1} of the corresponding frame of the output video of a non-learning processing method, where X_n = [I_n | P_n | O_{n-1}] and F_n = [I_n | P_n];
b) extracting from the input data X_n, using a mapping function, the primary features Z_i^n for video temporal continuity, where the primary features are expressed as:
Z_i^n = φ_i(X_n·W_ei + β_ei), i = 1, 2, …, m;
c) performing feature enhancement on the extracted primary features with an activation function to obtain the enhancement features H_j^n, where the enhancement features are expressed as:
H_j^n = ξ_j(Z^n·W_hj + β_hj), j = 1, 2, …, p,
where W_hj and β_hj are randomly generated weights and biases, ξ_j is the activation function, and Z^n = [Z_1^n, …, Z_m^n] denotes the m groups of primary features taken together;
d) combining the above extracted primary features Z^n and enhancement features H^n = [H_1^n, …, H_p^n] to obtain all the features A_n = [Z^n | H^n] extracted by the nth network;
e) in the training set, constructing an energy function E constrained by the video temporal continuity C_t and the video content fidelity C_f, where the energy function E is defined as:
E = ||A_n·ω_n − O_n||² + λ_1·||ω_n||_1 + λ_2·||ω_n||_2 + λ_t·C_t + λ_f·C_f,
solving the weight ω_n satisfying the energy function E by the least angle regression method, and using the weight ω_n as the target weight of the width learning network for connecting the feature layer and the output layer;
f) in the test set, obtaining the output Y_n of the test set of the width learning network from the target weight ω_n and all the features A_n extracted by the nth network:
Y_n = A_n·ω_n,
where the output Y_n of the test set is the de-jittered output frame of the width-learning-based video.
In one embodiment, the activation function ξ_j is a sigmoid function or a tangent function.
In one embodiment, the weight ω_n is used to minimize the difference between the output frame of the test set and the previous frame, so as to compute the energy loss cost term for temporal continuity between adjacent frames of the output video:
C_t = ||A_n·ω_n − O_{n-1}||².
In one embodiment, the weight ω_n is used to minimize the difference between the nth video frame of the output video of the test set and the nth video frame of the processed video, so as to compute the energy loss cost term for video content fidelity:
C_f = ||A_n·ω_n − P_n||².
In one embodiment, when the weight ω_n is used as the target weight of the width learning network connecting the feature layer and the output layer, the constraints of video temporal continuity and video content fidelity are satisfied simultaneously.
In one embodiment, the image processing method applied to the frame-by-frame processed video includes color classification, spatial white balance, color harmonization, and high dynamic range mapping.
With the width-learning-based video de-jittering method described above, input data for the training set and the test set are first obtained from the current frame to be processed of the original video, the corresponding frame of the video processed frame by frame with an image processing method, and the previous frame of the corresponding frame of the output video of a non-learning processing method. A mapping function then extracts, from the training-set input data, the primary features used to achieve video temporal continuity, and an activation function enhances these primary features to obtain the enhancement features. The extracted primary and enhancement features are combined into all the features extracted by the nth network; in the training set, an energy function constrained by video temporal continuity and video content fidelity is constructed, and the weight satisfying the energy function is solved by the least angle regression method and used as the target weight connecting the feature layer and the output layer of the width learning network; finally, in the test set, the de-jittered output frame of the test set is obtained from the target weight and all the extracted features. Compared with the prior art, the method takes the original input video, the processed video, and the output video obtained by a traditional de-jittering method as input, applies a width learning network built by extracting features layer by layer, and obtains the de-jittered output video under the constraints of video temporal continuity and video content fidelity.
Drawings
The various aspects of the present invention will become more apparent to the reader after reading the detailed description of the invention with reference to the attached drawings, in which:
FIG. 1 illustrates a flowchart of the width-learning-based video de-jittering method of the present invention;
FIG. 2 shows an architecture schematic of a width learning network for implementing the video de-jittering method of FIG. 1;
FIG. 3A shows a video frame from the original video Interview;
FIG. 3B shows a video frame from the original video Cable;
FIG. 3C shows a video frame from the original video Chicken;
FIG. 3D shows a video frame from the original video CheckingEmail;
FIG. 3E shows a video frame from the original video Travel; and
FIG. 4 compares the video de-jittering results of the method of FIG. 1 with two prior-art video de-jittering methods on the original videos of FIGS. 3A to 3E.
Detailed Description
To make the technical content disclosed in the present application more detailed and complete, reference is made to the drawings of the embodiments of the invention, and the implementation details and technical solutions of the invention are described in more detail below.
Fig. 1 shows a flowchart of the width-learning-based video de-jittering method of the present invention; fig. 2 shows an architecture schematic of the width learning network implementing the video de-jittering method of fig. 1; figs. 3A to 3E respectively show a video frame from the original videos Interview, Cable, Chicken, CheckingEmail, and Travel; and fig. 4 compares the video de-jittering results obtained with the method of fig. 1 and with two prior-art video de-jittering methods on the original videos of figs. 3A to 3E.
The hardware used for the invention is a computer with a 2.40 GHz CPU and 8 GB of memory, and the software tool is Matlab 2014b.
Referring to fig. 1, in this embodiment, the width-learning-based video de-jittering method of the present application is implemented mainly by the following steps.
First, in step S1, the input data X_n of the training set and the input data F_n of the test set are obtained from the current frame I_n to be processed of the original video, the corresponding frame P_n of the video processed frame by frame with an image processing method, and the previous frame O_{n-1} of the corresponding frame of the output video of a non-learning (i.e., traditional) processing method, where X_n = [I_n | P_n | O_{n-1}] and F_n = [I_n | P_n].
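For illustration, a minimal NumPy sketch of this input construction is given below. The frame variables, the image size, and the choice of flattening each frame into a row vector are assumptions made for the example and are not prescribed by the method itself.

```python
import numpy as np

# Hypothetical frame data: each frame is an H x W x 3 RGB image with values in [0, 1].
H, W = 120, 160
I_n   = np.random.rand(H, W, 3)   # current frame of the original (jittery) video
P_n   = np.random.rand(H, W, 3)   # same frame after frame-by-frame image processing
O_nm1 = np.random.rand(H, W, 3)   # previous frame of the non-learning method's output

def flatten(frame):
    """Flatten a frame into a single row vector (an assumed data layout)."""
    return frame.reshape(1, -1)

# Training input X_n = [I_n | P_n | O_{n-1}] and test input F_n = [I_n | P_n],
# formed by horizontal concatenation of the flattened frames.
X_n = np.hstack([flatten(I_n), flatten(P_n), flatten(O_nm1)])
F_n = np.hstack([flatten(I_n), flatten(P_n)])
```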
When training the width learning network, both the video content fidelity between the corresponding output frame O_n and P_n and the temporal continuity between the output frame O_n and its previous frame O_{n-1} are taken into account, so the corresponding frames of the original video, the processed video, and the original output video are taken as the input X_n = [I_n | P_n | O_{n-1}] of the primary feature mapping function. The i-th group of primary features is obtained by the mapping function Z_i^n = φ_i(X_n·W_ei + β_ei), where φ_i can be any activation function, such as a sigmoid or tangent function, and W_ei and β_ei are randomly generated weights and biases of appropriate dimensions. In the neural network for reconstructing the nth frame O_n, if there are m groups of primary mapping features, we let Z^n = [Z_1^n, …, Z_m^n] denote the m groups of primary mapping features in the nth video de-jittering width learning network, as shown in fig. 2.
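The following sketch, continuing the snippet above, shows one possible way to generate the m groups of primary mapping features; the sigmoid choice of φ_i, the feature width k, and the scaling of the random weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def primary_features(X_n, m=10, k=200, seed=0):
    """Map the input X_n into m groups of primary features Z_i = phi_i(X_n W_ei + beta_ei)."""
    rng = np.random.default_rng(seed)
    Z_groups = []
    for _ in range(m):
        W_e  = 0.01 * rng.standard_normal((X_n.shape[1], k))  # random weights W_ei
        beta = 0.01 * rng.standard_normal((1, k))             # random bias beta_ei
        Z_groups.append(sigmoid(X_n @ W_e + beta))
    return np.hstack(Z_groups)   # Z^n = [Z_1 | ... | Z_m]
```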
Next, in step S2, the m groups of primary features generated in step S1 are enhanced and retrained to obtain the enhancement features H_j^n = ξ_j(Z^n·W_hj + β_hj), where ξ_j(·) can be any sigmoid or tangent function, and W_hj and β_hj are randomly generated weights and biases of appropriate dimensions. In the neural network for reconstructing the nth frame O_n, if there are p groups of enhancement features, we let H^n = [H_1^n, …, H_p^n] denote the p groups of enhancement features in the nth video de-jittering width learning network, as shown in fig. 2.
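A corresponding sketch for the enhancement features is shown below, again continuing the earlier snippets; the use of tanh for ξ_j and the number of groups p are assumed values for illustration.

```python
import numpy as np

def enhancement_features(Z_n, p=5, k=200, seed=1):
    """Re-map the concatenated primary features Z^n into p groups of
    enhancement features H_j = xi_j(Z^n W_hj + beta_hj)."""
    rng = np.random.default_rng(seed)
    H_groups = []
    for _ in range(p):
        W_h  = 0.01 * rng.standard_normal((Z_n.shape[1], k))  # random weights W_hj
        beta = 0.01 * rng.standard_normal((1, k))             # random bias beta_hj
        H_groups.append(np.tanh(Z_n @ W_h + beta))            # tanh as xi_j
    return np.hstack(H_groups)   # H^n = [H_1 | ... | H_p]
```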
After obtaining the m groups of primary features Z^n and the p groups of enhancement features H^n in the width learning network of the nth video de-jittering step, we let A_n = [Z^n | H^n] denote all the features extracted in the nth de-jittering width learning network. We then connect A_n to the output layer O_n through the target weight ω_n to be solved. Once the target weight ω_n has been solved, the output of the test set of the width learning network is Y_n = A_n·ω_n. Note that in the training set the output frame O_n is known, obtained by a known, traditional non-learning de-jittering method; in the training phase of the width learning network, the only unknown is the target weight ω_n connecting the feature layer and the output layer. In the test set, the output frame Y_n is unknown and is solved with the trained width learning network, i.e., Y_n = A_n·ω_n.
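Continuing the sketch, the feature layer A_n and the test-set output can be formed as follows; ω_n is left as a zero placeholder here because solving for it is only sketched after the energy function below.

```python
# Continuing the earlier snippets: build the feature layer A_n = [Z^n | H^n].
Z_n = primary_features(X_n)            # m groups of primary features
H_n = enhancement_features(Z_n)        # p groups of enhancement features
A_n = np.hstack([Z_n, H_n])            # all features extracted by the nth network

# omega_n connects the feature layer to the output frame; during training it is
# the only unknown.  A zero placeholder is used here until it is solved below.
omega_n = np.zeros((A_n.shape[1], flatten(O_nm1).shape[1]))
Y_n = A_n @ omega_n                    # test-set output Y_n = A_n . omega_n
```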
In steps S31 and S32, when solving the unknown weight ω_n of the width learning network for video de-jittering, both video temporal continuity and video content fidelity must be considered.
In detail, when considering the temporal continuity between adjacent frames of the video, we denote the energy loss cost of temporal continuity between adjacent frames of the output video by C_t, where the target weight ω_n is used to minimize the difference between the output frame of the test set and the previous frame, so that the energy loss cost term can be computed as:
C_t = ||A_n·ω_n − O_{n-1}||²,
where ||·|| denotes the L2 norm (the square root of the sum of the squares of the elements of a vector), and O_{n-1} denotes, in the training set, the (n−1)th frame obtained by the traditional video de-jittering method and, in the test set, the (n−1)th frame output by the network with the solved target weight ω_n.
Similarly, to ensure that the content of the dynamic scenes in the processed video is preserved as much as possible in the output video, the difference between the processed video and the output video must be minimized when considering video content fidelity, and we denote the energy loss cost between the output video and the processed video by C_f. The target weight ω_n is used to minimize the difference between the nth video frame of the output video of the test set and the nth video frame of the processed video, so that the energy loss cost term for video content fidelity can be computed as:
C_f = ||A_n·ω_n − P_n||²,
where P_n denotes the nth frame of the processed video.
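The two cost terms can be written directly as squared-error functions, as in the sketch below; the flattened-vector representation of the frames is an assumption carried over from the earlier snippets.

```python
import numpy as np

def temporal_cost(A_n, omega_n, O_prev_vec):
    """C_t = ||A_n . omega_n - O_{n-1}||^2: temporal continuity between adjacent output frames."""
    return float(np.sum((A_n @ omega_n - O_prev_vec) ** 2))

def fidelity_cost(A_n, omega_n, P_n_vec):
    """C_f = ||A_n . omega_n - P_n||^2: fidelity of the output to the processed frame."""
    return float(np.sum((A_n @ omega_n - P_n_vec) ** 2))
```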
In step S4, combining the video temporal continuity constraint and the video content fidelity difference, the energy function E constrained by the video temporal continuity C_t and the video content fidelity C_f is constructed, the weight ω_n satisfying the energy function E is solved by the least angle regression method, and the weight ω_n is used as the target weight of the width learning network connecting the feature layer and the output layer. The energy function E can be expressed as:
E = ||A_n·ω_n − O_n||² + λ_1·||ω_n||_1 + λ_2·||ω_n||_2 + λ_t·C_t + λ_f·C_f,
where the first term minimizes the difference between the output frame A_n·ω_n obtained on the training set and the output frame O_n obtained by the traditional video de-jittering method, thereby improving the accuracy of the width learning model; the second term λ_1·||ω_n||_1 and the third term λ_2·||ω_n||_2 are regularization terms used to prevent overfitting, where λ_1 and λ_2 are the regularization coefficients of the L1 norm and the L2 norm, respectively; and λ_t and λ_f are the coefficients of video temporal continuity and video content fidelity, respectively.
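The sketch below solves a simplified version of this energy function: the L1 term, which the patent handles with least angle regression, is dropped, so the remaining ridge-regularized least-squares problem has a closed-form solution obtained by stacking the three quadratic terms into one system. This is an illustrative approximation, not the least angle regression procedure described in the patent.

```python
import numpy as np

def solve_omega(A_n, O_n_vec, O_prev_vec, P_n_vec, lam2=1e-3, lam_t=1.0, lam_f=1.0):
    """Minimize ||A_n w - O_n||^2 + lam_t*C_t + lam_f*C_f + lam2*||w||_2^2
    by stacking the three quadratic terms into a single least-squares system."""
    A_tilde = np.vstack([A_n, np.sqrt(lam_t) * A_n, np.sqrt(lam_f) * A_n])
    b_tilde = np.vstack([O_n_vec, np.sqrt(lam_t) * O_prev_vec, np.sqrt(lam_f) * P_n_vec])
    d = A_n.shape[1]
    return np.linalg.solve(A_tilde.T @ A_tilde + lam2 * np.eye(d), A_tilde.T @ b_tilde)

# Example use, continuing the earlier snippets (O_n would be the traditional
# method's output frame in the training set):
# omega_n = solve_omega(A_n, flatten(O_n), flatten(O_nm1), flatten(P_n))
```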
The unknown weight ω_n in the above formula is solved by least angle regression to determine the width-learning-based video de-jittering model. As shown in FIGS. 3A-3E and FIG. 4, when the video de-jittering method of FIG. 1 is compared with traditional video de-jittering methods on the Interview, Cable, Chicken, CheckingEmail and Travel videos, the Peak Signal-to-Noise Ratio (PSNR) values of the output videos obtained by the prior-art de-jittering method of Lang et al. (curve 2), the prior-art de-jittering method of Bonneel et al. (curve 3), and the video de-jittering method of the present application (curve 1) are shown by the vertical dashed lines in FIG. 4. The jitter in the Interview, Cable, Chicken, CheckingEmail and Travel videos of FIGS. 3A to 3E comes from processing the respective original videos frame by frame with image-based color classification, spatial white balance, eigen-map decomposition, high dynamic range mapping and defogging methods, which do not consider the temporal consistency of the video between adjacent frames. Since the PSNR value reflects the quality of the output video and the de-jittering effect, a higher PSNR value indicates better output quality and a better de-jittering effect. As can be seen from the figures, the video de-jittering method of the present application (curve 1) achieves better de-jittering performance under the PSNR metric than the conventional de-jittering methods (curves 2 and 3).
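For reference, a PSNR computation of the kind used in this comparison can be sketched as follows; the peak value of 1.0 assumes pixel intensities normalized to [0, 1].

```python
import numpy as np

def psnr(frame_a, frame_b, peak=1.0):
    """Peak signal-to-noise ratio between two frames with intensities in [0, peak]."""
    mse = np.mean((np.asarray(frame_a, dtype=np.float64) -
                   np.asarray(frame_b, dtype=np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```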
In summary, the width-learning-based video de-jittering method described above first obtains the input data of the training set and the test set from the current frame to be processed of the original video, the corresponding frame of the frame-by-frame processed video, and the previous frame of the corresponding frame of the output video of the non-learning processing method; it then extracts the primary features for video temporal continuity with a mapping function, enhances them with an activation function, and combines the primary and enhancement features into all the features extracted by the nth network; in the training set it constructs an energy function constrained by video temporal continuity and video content fidelity, and solves the weight satisfying the energy function by the least angle regression method as the target weight connecting the feature layer and the output layer; finally, in the test set, it obtains the de-jittered output frame from the target weight and all the extracted features. Compared with the prior art, the method takes the original input video, the processed video, and the output video obtained by a traditional de-jittering method as input, applies a width learning network built by extracting features layer by layer, and obtains the de-jittered output video under the constraints of video temporal continuity and video content fidelity.
Specific embodiments of the present invention have been described above with reference to the drawings. However, it will be understood by those skilled in the art that elements thereof may be replaced by equivalents without departing from the true spirit and scope of the present invention, and such modifications and substitutions are intended to be included within the scope of the invention as set forth in the following claims.
Claims (7)
1. A video de-jittering method based on width learning, characterized by comprising the following steps:
a) obtaining input data X_n of the training set and input data F_n of the test set from the current frame I_n to be processed of the original video, the corresponding frame P_n of the video processed frame by frame with an image processing method, and the previous frame O_{n-1} of the corresponding frame of the output video of a non-learning processing method, where X_n = [I_n | P_n | O_{n-1}] and F_n = [I_n | P_n];
b) extracting from the input data X_n, using a mapping function, the primary features Z_i^n for video temporal continuity, where the primary features are expressed as:
Z_i^n = φ_i(X_n·W_ei + β_ei), i = 1, 2, …, m;
c) performing feature enhancement on the extracted primary features with an activation function to obtain the enhancement features H_j^n, where the enhancement features are expressed as:
H_j^n = ξ_j(Z^n·W_hj + β_hj), j = 1, 2, …, p,
where W_hj and β_hj are randomly generated weights and biases, ξ_j is the activation function, and Z^n = [Z_1^n, …, Z_m^n] denotes the m groups of primary features taken together;
d) combining the above extracted primary features Z^n and enhancement features H^n = [H_1^n, …, H_p^n] to obtain all the features A_n = [Z^n | H^n] extracted by the nth network;
e) in the training set, constructing an energy function E constrained by the video temporal continuity C_t and the video content fidelity C_f, where the energy function E is defined as:
E = ||A_n·ω_n − O_n||² + λ_1·||ω_n||_1 + λ_2·||ω_n||_2 + λ_t·C_t + λ_f·C_f,
solving the weight ω_n satisfying the energy function E by the least angle regression method, and using the weight ω_n as the target weight of the width learning network for connecting the feature layer and the output layer, where λ_1 and λ_2 are the regularization coefficients of the L1 norm and the L2 norm, respectively, and λ_t and λ_f are the coefficients of video temporal continuity and video content fidelity, respectively;
f) in the test set, obtaining the output Y_n of the test set of the width learning network from the target weight ω_n and all the features A_n extracted by the nth network:
Y_n = A_n·ω_n,
where the output Y_n of the test set is the de-jittered output frame of the width-learning-based video.
3. The video de-jittering method of claim 1, wherein the activation function ξ_j is a sigmoid function or a tangent function.
4. The video de-jittering method of claim 1, wherein the weight ω_n is used to minimize the difference between the output frame of the test set and the previous frame, so as to compute the energy loss cost term for temporal continuity between adjacent frames of the output video:
C_t = ||A_n·ω_n − O_{n-1}||².
5. The video de-jittering method of claim 1, wherein the weight ω_n is used to minimize the difference between the nth video frame of the output video of the test set and the nth video frame of the processed video, so as to compute the energy loss cost term for video content fidelity:
C_f = ||A_n·ω_n − P_n||².
6. The video de-jittering method of claim 1, wherein, when the weight ω_n is used as the target weight of the width learning network connecting the feature layer and the output layer, the constraints of video temporal continuity and video content fidelity are satisfied simultaneously.
7. The video de-jittering method of claim 1, wherein the image processing method applied to the frame-by-frame processed video includes color classification, spatial white balance, color harmonization, and high dynamic range mapping.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810682319.5A CN108924385B (en) | 2018-06-27 | 2018-06-27 | Video de-jittering method based on width learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108924385A CN108924385A (en) | 2018-11-30 |
CN108924385B true CN108924385B (en) | 2020-11-03 |
Family
ID=64421608
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810682319.5A Active CN108924385B (en) | 2018-06-27 | 2018-06-27 | Video de-jittering method based on width learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108924385B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109905565B (en) * | 2019-03-06 | 2021-04-27 | 南京理工大学 | Video de-jittering method based on motion mode separation |
CN110222234B (en) * | 2019-06-14 | 2021-07-23 | 北京奇艺世纪科技有限公司 | Video classification method and device |
CN110472741B (en) * | 2019-06-27 | 2022-06-03 | 广东工业大学 | Three-domain fuzzy wavelet width learning filtering system and method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2549068B (en) * | 2016-03-22 | 2021-09-29 | Toshiba Europe Ltd | Image adjustment |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101616310A (en) * | 2009-07-17 | 2009-12-30 | 清华大学 | The target image stabilizing method of binocular vision system of variable visual angle and resolution |
CN103929568A (en) * | 2013-01-11 | 2014-07-16 | 索尼公司 | Method For Stabilizing A First Sequence Of Digital Image Frames And Image Stabilization Unit |
CN107481185A (en) * | 2017-08-24 | 2017-12-15 | 深圳市唯特视科技有限公司 | A kind of style conversion method based on video image optimization |
CN107808144A (en) * | 2017-11-10 | 2018-03-16 | 深圳市唯特视科技有限公司 | One kind carries out self-supervision insertion posture learning method based on video time-space relationship |
Non-Patent Citations (3)
Title |
---|
Video Processing Via Implicit and Mixture Motion Models; Xin Li et al.; IEEE Transactions on Circuits and Systems for Video Technology; 2007-08-31; full text *
Maritime video de-jittering using smooth optical flow estimation; Wang Feng et al.; Journal of Image and Graphics; 2016-03-31; full text *
An improved stereoscopic video stabilization method based on spatio-temporal consistency; Kong Yue et al.; Video Engineering; 2016-06-30; full text *
Also Published As
Publication number | Publication date |
---|---|
CN108924385A (en) | 2018-11-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109949255B (en) | Image reconstruction method and device | |
CN108924385B (en) | Video de-jittering method based on width learning | |
CN110570346B (en) | Method for performing style migration on calligraphy based on cyclic generation countermeasure network | |
CN110287819A (en) | Moving target detection method under dynamic background based on low-rank and sparse decomposition | |
CN111080686B (en) | Method for highlight removal of image in natural scene | |
CN112465727A (en) | Low-illumination image enhancement method without normal illumination reference based on HSV color space and Retinex theory | |
Bugeau et al. | Patch-based image colorization | |
CN110809126A (en) | Video frame interpolation method and system based on adaptive deformable convolution | |
CN109544475A (en) | Bi-Level optimization method for image deblurring | |
CN114897711A (en) | Method, device and equipment for processing images in video and storage medium | |
Pan et al. | Real image denoising via guided residual estimation and noise correction | |
CN114693545A (en) | Low-illumination enhancement method and system based on curve family function | |
CN110443754B (en) | Method for improving resolution of digital image | |
CN109905565B (en) | Video de-jittering method based on motion mode separation | |
CN111275751A (en) | Unsupervised absolute scale calculation method and system | |
CN108347549B (en) | Method for improving video jitter based on time consistency of video frames | |
CN110610508A (en) | Static video analysis method and system | |
CN115941871A (en) | Video frame insertion method and device, computer equipment and storage medium | |
Bhatnagar et al. | Reversible Data Hiding scheme for color images based on skewed histograms and cross-channel correlation | |
CN111951183B (en) | Low-rank total variation hyperspectral image restoration method based on near-end alternating penalty algorithm | |
CN108600762B (en) | Progressive video frame generation method combining motion compensation and neural network algorithm | |
Wang et al. | Frequency Compensated Diffusion Model for Real-scene Dehazing | |
CN111667401A (en) | Multi-level gradient image style migration method and system | |
Wei et al. | Multi-Source Collaborative Gradient Discrepancy Minimization for Federated Domain Generalization | |
Wu et al. | Semantic image inpainting based on generative adversarial networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |