CN108924385B - Video de-jittering method based on width learning - Google Patents
- Publication number: CN108924385B (application CN201810682319.5A)
- Authority
- CN
- China
- Prior art keywords
- video
- frame
- features
- output
- function
- Prior art date
- Legal status: Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/14—Picture signal circuitry for video frequency region
- H04N5/21—Circuitry for suppressing or minimising disturbance, e.g. moiré or halo
Abstract
The invention provides a video de-jittering method based on width learning. Input data for a training set and a test set are obtained from the current frame to be processed of the original video, the corresponding frame of the processed video, and the previous frame of the corresponding frame of the output video of a non-learning processing method. Primary features for video temporal continuity are then extracted with a mapping function, and an activation function is applied to these primary features to obtain enhancement features. The primary features and the enhancement features are combined into all the features extracted by the nth network. In the training set, an energy function constrained by video temporal continuity and video content fidelity is constructed, the weight satisfying the energy function is solved by the least angle regression method, and this weight serves as the target weight connecting the feature layer and the output layer. Finally, in the test set, the de-jittered output frame of the test set is obtained from the target weight and all the extracted features.
Description
Technical Field
The invention relates to the field of computer vision and image processing, and in particular to a video de-jittering method based on width learning.
Background
Video de-jittering methods remove the jitter present in a video, which typically includes hue jitter and brightness jitter. A video de-jittering algorithm removes the jitter between video frames by enforcing temporal continuity across frames, and outputs a temporally continuous, jitter-free video.
In the prior art, a common approach to video de-jittering is based on jitter compensation, which removes the jitter in a video by aligning the hue or brightness between frames. Although this approach can reduce the jitter to some extent, it must first select several frames from the processed, jittery video as key frames, and it is difficult to guarantee that these key frames are themselves temporally consistent; furthermore, if a selected key frame itself exhibits jitter, aligning other frames to it cannot guarantee that the jitter of the processed video is removed. Another approach maintains temporal consistency between video frames by minimizing an energy function containing a temporal-consistency optimization term, but such methods are designed for specific applications, which limits their generalization ability. Typical application-specific algorithms of this type include eigen-map decomposition, color classification, color harmonization, and white balance. Because these algorithms for removing video jitter are tied to particular applications, they are unsuitable for most other situations, which limits the generalization ability of this class of algorithms.
In view of the above shortcomings of the prior art, how to design a new video de-jittering method that improves on or eliminates these defects, so that the jitter in the processed video can be removed to the greatest extent, is an urgent problem in the development of computer vision.
Disclosure of Invention
To overcome the defects of existing video de-jittering methods, the invention provides a video de-jittering method based on width learning, which builds a width-learning de-jittering model from the characteristics of the input video and the processed video in order to remove video jitter.
According to an aspect of the present invention, there is provided a video de-jittering method based on width learning, including the following steps:
a) obtaining input data X_n of the training set and input data F_n of the test set from the current frame I_n to be processed of the original video, the corresponding frame P_n of the video processed frame by frame with an image processing method, and the previous frame O_{n-1} of the corresponding frame of the output video of a non-learning processing method, where X_n = [I_n | P_n | O_{n-1}] and F_n = [I_n | P_n];
b) extracting from the input data X_n, using a mapping function, the primary features Z_i^n for video temporal continuity, where the primary features are expressed as:
Z_i^n = φ_i(X_n·W_ei + β_ei), i = 1, 2, …, m;
c) performing feature enhancement on the extracted primary features with an activation function to obtain the enhancement features H_j^n, where the enhancement features are expressed as:
H_j^n = ξ_j(Z^n·W_hj + β_hj), j = 1, 2, …, p,
where W_hj and β_hj are randomly generated weights and biases, ξ_j is the activation function, and Z^n = [Z_1^n, …, Z_m^n] denotes the m groups of primary features taken together;
d) combining the above extracted primary features Z^n and enhancement features H^n = [H_1^n, …, H_p^n] to obtain all the features A_n = [Z^n | H^n] extracted by the nth network;
e) in the training set, constructing an energy function E constrained by the video temporal continuity C_t and the video content fidelity C_f, where the energy function E is defined as:
E = ||A_n·ω_n − O_n||² + λ_1·||ω_n||_1 + λ_2·||ω_n||_2 + λ_t·C_t + λ_f·C_f,
solving the weight ω_n satisfying the energy function E by the least angle regression method, and using the weight ω_n as the target weight of the width learning network for connecting the feature layer and the output layer;
f) in the test set, obtaining the output Y_n of the test set of the width learning network from the target weight ω_n and all the features A_n extracted by the nth network:
Y_n = A_n·ω_n,
where the output Y_n of the test set is the de-jittered output frame of the width-learning-based video.
In one embodiment, the activation function ξ_j is a sigmoid function or a tangent function.
In one embodiment, the weight ω_n is used to minimize the difference between the output frame of the test set and the previous frame, so as to compute the energy loss cost term for temporal continuity between adjacent frames of the output video:
C_t = ||A_n·ω_n − O_{n-1}||².
In one embodiment, the weight ω_n is used to minimize the difference between the nth video frame of the output video of the test set and the nth video frame of the processed video, so as to compute the energy loss cost term for video content fidelity:
C_f = ||A_n·ω_n − P_n||².
In one embodiment, when the weight ω_n is used as the target weight of the width learning network connecting the feature layer and the output layer, the constraints of video temporal continuity and video content fidelity are satisfied simultaneously.
In one embodiment, the image processing method applied to the frame-by-frame processed video includes color classification, spatial white balance, color harmonization, and high dynamic range mapping.
With the width-learning-based video de-jittering method described above, input data for the training set and the test set are first obtained from the current frame to be processed of the original video, the corresponding frame of the video processed frame by frame with an image processing method, and the previous frame of the corresponding frame of the output video of a non-learning processing method. A mapping function then extracts, from the training-set input data, the primary features used to achieve video temporal continuity, and an activation function enhances these primary features to obtain the enhancement features. The extracted primary and enhancement features are combined into all the features extracted by the nth network; in the training set, an energy function constrained by video temporal continuity and video content fidelity is constructed, and the weight satisfying the energy function is solved by the least angle regression method and used as the target weight connecting the feature layer and the output layer of the width learning network; finally, in the test set, the de-jittered output frame of the test set is obtained from the target weight and all the extracted features. Compared with the prior art, the method takes the original input video, the processed video, and the output video obtained by a traditional de-jittering method as input, applies a width learning network built by extracting features layer by layer, and obtains the de-jittered output video under the constraints of video temporal continuity and video content fidelity.
Drawings
The various aspects of the present invention will become more apparent to the reader after reading the detailed description of the invention with reference to the attached drawings, in which:
FIG. 1 illustrates a flowchart of the width-learning-based video de-jittering method of the present invention;
FIG. 2 shows an architecture schematic of a width learning network for implementing the video de-jittering method of FIG. 1;
FIG. 3A shows a video frame from the original video Interview;
FIG. 3B shows a video frame from the original video Cable;
FIG. 3C shows a video frame from the original video Chicken;
FIG. 3D shows a video frame from the original video CheckingEmail;
FIG. 3E shows a video frame from the original video Travel; and
FIG. 4 compares the video de-jittering results of the method of FIG. 1 with two prior-art video de-jittering methods on the original videos of FIGS. 3A to 3E.
Detailed Description
To make the technical content disclosed in the present application more detailed and complete, reference is made to the drawings of the embodiments of the invention, and the implementation details and technical solutions of the invention are described in more detail below.
Fig. 1 shows a flowchart of the width-learning-based video de-jittering method of the present invention; fig. 2 shows an architecture schematic of the width learning network implementing the video de-jittering method of fig. 1; figs. 3A to 3E respectively show a video frame from the original videos Interview, Cable, Chicken, CheckingEmail, and Travel; and fig. 4 compares the video de-jittering results obtained with the method of fig. 1 and with two prior-art video de-jittering methods on the original videos of figs. 3A to 3E.
The hardware used for the invention is a computer with a 2.40 GHz CPU and 8 GB of memory, and the software tool is Matlab 2014b.
Referring to fig. 1, in this embodiment, the width-learning-based video de-jittering method of the present application is implemented mainly by the following steps.
First, in step S1, the input data X_n of the training set and the input data F_n of the test set are obtained from the current frame I_n to be processed of the original video, the corresponding frame P_n of the video processed frame by frame with an image processing method, and the previous frame O_{n-1} of the corresponding frame of the output video of a non-learning (i.e., traditional) processing method, where X_n = [I_n | P_n | O_{n-1}] and F_n = [I_n | P_n].
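For illustration, a minimal NumPy sketch of this input construction is given below. The frame variables, the image size, and the choice of flattening each frame into a row vector are assumptions made for the example and are not prescribed by the method itself.

```python
import numpy as np

# Hypothetical frame data: each frame is an H x W x 3 RGB image with values in [0, 1].
H, W = 120, 160
I_n   = np.random.rand(H, W, 3)   # current frame of the original (jittery) video
P_n   = np.random.rand(H, W, 3)   # same frame after frame-by-frame image processing
O_nm1 = np.random.rand(H, W, 3)   # previous frame of the non-learning method's output

def flatten(frame):
    """Flatten a frame into a single row vector (an assumed data layout)."""
    return frame.reshape(1, -1)

# Training input X_n = [I_n | P_n | O_{n-1}] and test input F_n = [I_n | P_n],
# formed by horizontal concatenation of the flattened frames.
X_n = np.hstack([flatten(I_n), flatten(P_n), flatten(O_nm1)])
F_n = np.hstack([flatten(I_n), flatten(P_n)])
```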
When training the width learning network, both the video content fidelity between the corresponding output frame O_n and P_n and the temporal continuity between the output frame O_n and its previous frame O_{n-1} are taken into account, so the corresponding frames of the original video, the processed video, and the original output video are taken as the input X_n = [I_n | P_n | O_{n-1}] of the primary feature mapping function. The i-th group of primary features is obtained by the mapping function Z_i^n = φ_i(X_n·W_ei + β_ei), where φ_i can be any activation function, such as a sigmoid or tangent function, and W_ei and β_ei are randomly generated weights and biases of appropriate dimensions. In the neural network for reconstructing the nth frame O_n, if there are m groups of primary mapping features, we let Z^n = [Z_1^n, …, Z_m^n] denote the m groups of primary mapping features in the nth video de-jittering width learning network, as shown in fig. 2.
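The following sketch, continuing the snippet above, shows one possible way to generate the m groups of primary mapping features; the sigmoid choice of φ_i, the feature width k, and the scaling of the random weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def primary_features(X_n, m=10, k=200, seed=0):
    """Map the input X_n into m groups of primary features Z_i = phi_i(X_n W_ei + beta_ei)."""
    rng = np.random.default_rng(seed)
    Z_groups = []
    for _ in range(m):
        W_e  = 0.01 * rng.standard_normal((X_n.shape[1], k))  # random weights W_ei
        beta = 0.01 * rng.standard_normal((1, k))             # random bias beta_ei
        Z_groups.append(sigmoid(X_n @ W_e + beta))
    return np.hstack(Z_groups)   # Z^n = [Z_1 | ... | Z_m]
```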
Next, in step S2, the m groups of primary features generated in step S1 are enhanced and retrained to obtain the enhancement features H_j^n = ξ_j(Z^n·W_hj + β_hj), where ξ_j(·) can be any sigmoid or tangent function, and W_hj and β_hj are randomly generated weights and biases of appropriate dimensions. In the neural network for reconstructing the nth frame O_n, if there are p groups of enhancement features, we let H^n = [H_1^n, …, H_p^n] denote the p groups of enhancement features in the nth video de-jittering width learning network, as shown in fig. 2.
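A corresponding sketch for the enhancement features is shown below, again continuing the earlier snippets; the use of tanh for ξ_j and the number of groups p are assumed values for illustration.

```python
import numpy as np

def enhancement_features(Z_n, p=5, k=200, seed=1):
    """Re-map the concatenated primary features Z^n into p groups of
    enhancement features H_j = xi_j(Z^n W_hj + beta_hj)."""
    rng = np.random.default_rng(seed)
    H_groups = []
    for _ in range(p):
        W_h  = 0.01 * rng.standard_normal((Z_n.shape[1], k))  # random weights W_hj
        beta = 0.01 * rng.standard_normal((1, k))             # random bias beta_hj
        H_groups.append(np.tanh(Z_n @ W_h + beta))            # tanh as xi_j
    return np.hstack(H_groups)   # H^n = [H_1 | ... | H_p]
```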
After obtaining the m groups of primary features Z^n and the p groups of enhancement features H^n in the width learning network of the nth video de-jittering step, we let A_n = [Z^n | H^n] denote all the features extracted in the nth de-jittering width learning network. We then connect A_n to the output layer O_n through the target weight ω_n to be solved. Once the target weight ω_n has been solved, the output of the test set of the width learning network is Y_n = A_n·ω_n. Note that in the training set the output frame O_n is known, obtained by a known, traditional non-learning de-jittering method; in the training phase of the width learning network, the only unknown is the target weight ω_n connecting the feature layer and the output layer. In the test set, the output frame Y_n is unknown and is solved with the trained width learning network, i.e., Y_n = A_n·ω_n.
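Continuing the sketch, the feature layer A_n and the test-set output can be formed as follows; ω_n is left as a zero placeholder here because solving for it is only sketched after the energy function below.

```python
# Continuing the earlier snippets: build the feature layer A_n = [Z^n | H^n].
Z_n = primary_features(X_n)            # m groups of primary features
H_n = enhancement_features(Z_n)        # p groups of enhancement features
A_n = np.hstack([Z_n, H_n])            # all features extracted by the nth network

# omega_n connects the feature layer to the output frame; during training it is
# the only unknown.  A zero placeholder is used here until it is solved below.
omega_n = np.zeros((A_n.shape[1], flatten(O_nm1).shape[1]))
Y_n = A_n @ omega_n                    # test-set output Y_n = A_n . omega_n
```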
In steps S31 and S32, when solving the unknown weight ω_n of the width learning network for video de-jittering, both video temporal continuity and video content fidelity must be considered.
In detail, when considering the temporal continuity between adjacent frames of the video, we denote the energy loss cost of temporal continuity between adjacent frames of the output video by C_t, where the target weight ω_n is used to minimize the difference between the output frame of the test set and the previous frame, so that the energy loss cost term can be computed as:
C_t = ||A_n·ω_n − O_{n-1}||²,
where ||·|| denotes the L2 norm (the square root of the sum of the squares of the elements of a vector), and O_{n-1} denotes, in the training set, the (n−1)th frame obtained by the traditional video de-jittering method and, in the test set, the (n−1)th frame output by the network with the solved target weight ω_n.
Similarly, to ensure that the content of the dynamic scenes in the processed video is preserved as much as possible in the output video, the difference between the processed video and the output video must be minimized when considering video content fidelity, and we denote the energy loss cost between the output video and the processed video by C_f. The target weight ω_n is used to minimize the difference between the nth video frame of the output video of the test set and the nth video frame of the processed video, so that the energy loss cost term for video content fidelity can be computed as:
C_f = ||A_n·ω_n − P_n||²,
where P_n denotes the nth frame of the processed video.
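The two cost terms can be written directly as squared-error functions, as in the sketch below; the flattened-vector representation of the frames is an assumption carried over from the earlier snippets.

```python
import numpy as np

def temporal_cost(A_n, omega_n, O_prev_vec):
    """C_t = ||A_n . omega_n - O_{n-1}||^2: temporal continuity between adjacent output frames."""
    return float(np.sum((A_n @ omega_n - O_prev_vec) ** 2))

def fidelity_cost(A_n, omega_n, P_n_vec):
    """C_f = ||A_n . omega_n - P_n||^2: fidelity of the output to the processed frame."""
    return float(np.sum((A_n @ omega_n - P_n_vec) ** 2))
```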
In step S4, combining the video temporal continuity constraint and the video content fidelity difference, the energy function E constrained by the video temporal continuity C_t and the video content fidelity C_f is constructed, the weight ω_n satisfying the energy function E is solved by the least angle regression method, and the weight ω_n is used as the target weight of the width learning network connecting the feature layer and the output layer. The energy function E can be expressed as:
E = ||A_n·ω_n − O_n||² + λ_1·||ω_n||_1 + λ_2·||ω_n||_2 + λ_t·C_t + λ_f·C_f,
where the first term minimizes the difference between the output frame A_n·ω_n obtained on the training set and the output frame O_n obtained by the traditional video de-jittering method, thereby improving the accuracy of the width learning model; the second term λ_1·||ω_n||_1 and the third term λ_2·||ω_n||_2 are regularization terms used to prevent overfitting, where λ_1 and λ_2 are the regularization coefficients of the L1 norm and the L2 norm, respectively; and λ_t and λ_f are the coefficients of video temporal continuity and video content fidelity, respectively.
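The sketch below solves a simplified version of this energy function: the L1 term, which the patent handles with least angle regression, is dropped, so the remaining ridge-regularized least-squares problem has a closed-form solution obtained by stacking the three quadratic terms into one system. This is an illustrative approximation, not the least angle regression procedure described in the patent.

```python
import numpy as np

def solve_omega(A_n, O_n_vec, O_prev_vec, P_n_vec, lam2=1e-3, lam_t=1.0, lam_f=1.0):
    """Minimize ||A_n w - O_n||^2 + lam_t*C_t + lam_f*C_f + lam2*||w||_2^2
    by stacking the three quadratic terms into a single least-squares system."""
    A_tilde = np.vstack([A_n, np.sqrt(lam_t) * A_n, np.sqrt(lam_f) * A_n])
    b_tilde = np.vstack([O_n_vec, np.sqrt(lam_t) * O_prev_vec, np.sqrt(lam_f) * P_n_vec])
    d = A_n.shape[1]
    return np.linalg.solve(A_tilde.T @ A_tilde + lam2 * np.eye(d), A_tilde.T @ b_tilde)

# Example use, continuing the earlier snippets (O_n would be the traditional
# method's output frame in the training set):
# omega_n = solve_omega(A_n, flatten(O_n), flatten(O_nm1), flatten(P_n))
```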
The unknown weight ω_n in the above formula is solved by least angle regression to determine the width-learning-based video de-jittering model. As shown in FIGS. 3A-3E and FIG. 4, when the video de-jittering method of FIG. 1 is compared with traditional video de-jittering methods on the Interview, Cable, Chicken, CheckingEmail and Travel videos, the Peak Signal-to-Noise Ratio (PSNR) values of the output videos obtained by the prior-art de-jittering method of Lang et al. (curve 2), the prior-art de-jittering method of Bonneel et al. (curve 3), and the video de-jittering method of the present application (curve 1) are shown by the vertical dashed lines in FIG. 4. The jitter in the Interview, Cable, Chicken, CheckingEmail and Travel videos of FIGS. 3A to 3E comes from processing the respective original videos frame by frame with image-based color classification, spatial white balance, eigen-map decomposition, high dynamic range mapping and defogging methods, which do not consider the temporal consistency of the video between adjacent frames. Since the PSNR value reflects the quality of the output video and the de-jittering effect, a higher PSNR value indicates better output quality and a better de-jittering effect. As can be seen from the figures, the video de-jittering method of the present application (curve 1) achieves better de-jittering performance under the PSNR metric than the conventional de-jittering methods (curves 2 and 3).
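For reference, a PSNR computation of the kind used in this comparison can be sketched as follows; the peak value of 1.0 assumes pixel intensities normalized to [0, 1].

```python
import numpy as np

def psnr(frame_a, frame_b, peak=1.0):
    """Peak signal-to-noise ratio between two frames with intensities in [0, peak]."""
    mse = np.mean((np.asarray(frame_a, dtype=np.float64) -
                   np.asarray(frame_b, dtype=np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```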
In summary, the width-learning-based video de-jittering method described above first obtains the input data of the training set and the test set from the current frame to be processed of the original video, the corresponding frame of the frame-by-frame processed video, and the previous frame of the corresponding frame of the output video of the non-learning processing method; it then extracts the primary features for video temporal continuity with a mapping function, enhances them with an activation function, and combines the primary and enhancement features into all the features extracted by the nth network; in the training set it constructs an energy function constrained by video temporal continuity and video content fidelity, and solves the weight satisfying the energy function by the least angle regression method as the target weight connecting the feature layer and the output layer; finally, in the test set, it obtains the de-jittered output frame from the target weight and all the extracted features. Compared with the prior art, the method takes the original input video, the processed video, and the output video obtained by a traditional de-jittering method as input, applies a width learning network built by extracting features layer by layer, and obtains the de-jittered output video under the constraints of video temporal continuity and video content fidelity.
Specific embodiments of the present invention have been described above with reference to the drawings. However, it will be understood by those skilled in the art that elements thereof may be replaced by equivalents without departing from the true spirit and scope of the present invention, and such modifications and substitutions are intended to be included within the scope of the invention as set forth in the following claims.
Claims (7)
1. A video de-jittering method based on width learning, characterized by comprising the following steps:
a) obtaining input data X_n of the training set and input data F_n of the test set from the current frame I_n to be processed of the original video, the corresponding frame P_n of the video processed frame by frame with an image processing method, and the previous frame O_{n-1} of the corresponding frame of the output video of a non-learning processing method, where X_n = [I_n | P_n | O_{n-1}] and F_n = [I_n | P_n];
b) extracting from the input data X_n, using a mapping function, the primary features Z_i^n for video temporal continuity, where the primary features are expressed as:
Z_i^n = φ_i(X_n·W_ei + β_ei), i = 1, 2, …, m;
c) performing feature enhancement on the extracted primary features with an activation function to obtain the enhancement features H_j^n, where the enhancement features are expressed as:
H_j^n = ξ_j(Z^n·W_hj + β_hj), j = 1, 2, …, p,
where W_hj and β_hj are randomly generated weights and biases, ξ_j is the activation function, and Z^n = [Z_1^n, …, Z_m^n] denotes the m groups of primary features taken together;
d) combining the above extracted primary features Z^n and enhancement features H^n = [H_1^n, …, H_p^n] to obtain all the features A_n = [Z^n | H^n] extracted by the nth network;
e) in the training set, constructing an energy function E constrained by the video temporal continuity C_t and the video content fidelity C_f, where the energy function E is defined as:
E = ||A_n·ω_n − O_n||² + λ_1·||ω_n||_1 + λ_2·||ω_n||_2 + λ_t·C_t + λ_f·C_f,
solving the weight ω_n satisfying the energy function E by the least angle regression method, and using the weight ω_n as the target weight of the width learning network for connecting the feature layer and the output layer, where λ_1 and λ_2 are the regularization coefficients of the L1 norm and the L2 norm, respectively, and λ_t and λ_f are the coefficients of video temporal continuity and video content fidelity, respectively;
f) in the test set, obtaining the output Y_n of the test set of the width learning network from the target weight ω_n and all the features A_n extracted by the nth network:
Y_n = A_n·ω_n,
where the output Y_n of the test set is the de-jittered output frame of the width-learning-based video.
3. The video de-jittering method of claim 1, wherein the activation function ξ_j is a sigmoid function or a tangent function.
4. The video de-jittering method of claim 1, wherein the weight ω_n is used to minimize the difference between the output frame of the test set and the previous frame, so as to compute the energy loss cost term for temporal continuity between adjacent frames of the output video:
C_t = ||A_n·ω_n − O_{n-1}||².
5. The video de-jittering method of claim 1, wherein the weight ω_n is used to minimize the difference between the nth video frame of the output video of the test set and the nth video frame of the processed video, so as to compute the energy loss cost term for video content fidelity:
C_f = ||A_n·ω_n − P_n||².
6. The video de-jittering method of claim 1, wherein, when the weight ω_n is used as the target weight of the width learning network connecting the feature layer and the output layer, the constraints of video temporal continuity and video content fidelity are satisfied simultaneously.
7. The video de-jittering method of claim 1, wherein the image processing method applied to the frame-by-frame processed video includes color classification, spatial white balance, color harmonization, and high dynamic range mapping.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810682319.5A CN108924385B (en) | 2018-06-27 | 2018-06-27 | Video de-jittering method based on width learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108924385A CN108924385A (en) | 2018-11-30 |
CN108924385B true CN108924385B (en) | 2020-11-03 |
Family
ID=64421608
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810682319.5A Active CN108924385B (en) | 2018-06-27 | 2018-06-27 | Video de-jittering method based on width learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108924385B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109905565B (en) * | 2019-03-06 | 2021-04-27 | 南京理工大学 | Video de-jittering method based on motion mode separation |
CN110222234B (en) * | 2019-06-14 | 2021-07-23 | 北京奇艺世纪科技有限公司 | Video classification method and device |
CN110472741B (en) * | 2019-06-27 | 2022-06-03 | 广东工业大学 | Three-domain fuzzy wavelet width learning filtering system and method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2549068B (en) * | 2016-03-22 | 2021-09-29 | Toshiba Europe Ltd | Image adjustment |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101616310A (en) * | 2009-07-17 | 2009-12-30 | 清华大学 | The target image stabilizing method of binocular vision system of variable visual angle and resolution |
CN103929568A (en) * | 2013-01-11 | 2014-07-16 | 索尼公司 | Method For Stabilizing A First Sequence Of Digital Image Frames And Image Stabilization Unit |
CN107481185A (en) * | 2017-08-24 | 2017-12-15 | 深圳市唯特视科技有限公司 | A kind of style conversion method based on video image optimization |
CN107808144A (en) * | 2017-11-10 | 2018-03-16 | 深圳市唯特视科技有限公司 | One kind carries out self-supervision insertion posture learning method based on video time-space relationship |
Non-Patent Citations (3)
Title |
---|
Video Processing Via Implicit and Mixture Motion Models; Xin Li et al.; IEEE Transactions on Circuits and Systems for Video Technology; 2007-08-31; full text *
Maritime video de-jittering using smooth optical flow estimation; Wang Feng et al.; Journal of Image and Graphics; 2016-03-31; full text *
An improved stereoscopic video stabilization method based on spatio-temporal consistency; Kong Yue et al.; Video Engineering; 2016-06-30; full text *
Also Published As
Publication number | Publication date |
---|---|
CN108924385A (en) | 2018-11-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109949255B (en) | Image reconstruction method and device | |
CN108924385B (en) | Video de-jittering method based on width learning | |
CN110570346B (en) | Method for performing style migration on calligraphy based on cyclic generation countermeasure network | |
CN110287819A (en) | Moving target detection method under dynamic background based on low-rank and sparse decomposition | |
CN111080686B (en) | Method for highlight removal of image in natural scene | |
CN112465727A (en) | Low-illumination image enhancement method without normal illumination reference based on HSV color space and Retinex theory | |
Bugeau et al. | Patch-based image colorization | |
CN110809126A (en) | Video frame interpolation method and system based on adaptive deformable convolution | |
CN109544475A (en) | Bi-Level optimization method for image deblurring | |
CN114897711A (en) | Method, device and equipment for processing images in video and storage medium | |
Pan et al. | Real image denoising via guided residual estimation and noise correction | |
CN114693545A (en) | Low-illumination enhancement method and system based on curve family function | |
CN110443754B (en) | Method for improving resolution of digital image | |
CN109905565B (en) | Video de-jittering method based on motion mode separation | |
CN111275751A (en) | Unsupervised absolute scale calculation method and system | |
CN108347549B (en) | Method for improving video jitter based on time consistency of video frames | |
CN110610508A (en) | Static video analysis method and system | |
CN115941871A (en) | Video frame insertion method and device, computer equipment and storage medium | |
Bhatnagar et al. | Reversible Data Hiding scheme for color images based on skewed histograms and cross-channel correlation | |
CN111951183B (en) | Low-rank total variation hyperspectral image restoration method based on near-end alternating penalty algorithm | |
CN108600762B (en) | Progressive video frame generation method combining motion compensation and neural network algorithm | |
Wang et al. | Frequency Compensated Diffusion Model for Real-scene Dehazing | |
CN111667401A (en) | Multi-level gradient image style migration method and system | |
Wei et al. | Multi-Source Collaborative Gradient Discrepancy Minimization for Federated Domain Generalization | |
Wu et al. | Semantic image inpainting based on generative adversarial networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |