CN112215140A - 3-dimensional signal processing method based on space-time countermeasure - Google Patents

3-dimensional signal processing method based on space-time countermeasure

Info

Publication number
CN112215140A
CN112215140A (application CN202011083124.2A)
Authority
CN
China
Prior art keywords
space
time
frames
resolution
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011083124.2A
Other languages
Chinese (zh)
Inventor
侯兴松
李瑞敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Tianbiyou Technology Co ltd
Original Assignee
Suzhou Tianbiyou Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Tianbiyou Technology Co ltd
Priority to CN202011083124.2A
Publication of CN112215140A
Legal status: Pending (current)

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/01 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N 7/0117 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving conversion of the spatial resolution of the incoming video signal

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Graphics (AREA)
  • Signal Processing (AREA)
  • Television Systems (AREA)

Abstract

The invention discloses a 3-dimensional signal processing method based on space-time countermeasure, in which a spatio-temporal adversarial network comprises a recurrent generator, an optical flow estimation network and a spatio-temporal discriminator: the recurrent generator recursively produces high-resolution video frames from a low-resolution input; the optical flow estimation network learns the motion compensation between frames; and the spatio-temporal discriminator considers both spatial and temporal aspects and penalizes unrealistic temporal discontinuities in the results without over-smoothing the image content. The invention resolves the marked loss of visual quality under diverse and blurred motion in video super-resolution, while fully exploiting the temporal information in the video to ensure the spatio-temporal consistency of the super-resolved video.

Description

3-dimensional signal processing method based on space-time countermeasure
Technical Field
The invention relates to the field of video super-resolution, in particular to a video super-resolution method based on a space-time countermeasure network.
Background
The spatial resolution of a video depends on the spatial density of the image sensor, motion, system noise and other factors. The temporal resolution of a video depends on the frame rate and exposure time of the camera. When the temporal resolution is low, the video may exhibit motion blur and motion aliasing. In recent years, with the application and development of deep learning in computer vision, CNN-based video object detection and action recognition have made remarkable progress. However, most neural networks for object detection and action recognition are trained on high-resolution video, so directly applying the trained networks to low-resolution video yields unsatisfactory results and significantly degraded performance. In aerial photography and remote-sensing video, targets are often small and difficult to detect, especially at low resolution. One possible solution is to super-resolve the video before detection and recognition.
Early studies on video super-resolution reconstruction often treated it as a simple extension of image super-resolution reconstruction, so the temporal redundancy between adjacent frames was not fully exploited. Previous multi-frame/video super-resolution methods are mainly reconstruction-based and exploit inter-frame coherence. Most are built on a Bayesian framework and use optical flow techniques for sub-pixel motion estimation. These methods can guarantee high fidelity in the presence of small global motion, but they tend to fail when the motion is larger and more complex.
In recent years, research that combines the representational power of deep learning with inter-frame temporal consistency to improve visual quality and fidelity has also achieved certain results. To capture temporal consistency, most existing methods use a sliding frame window and generate one high-resolution frame from several low-resolution frames as input. To process spatio-temporal information simultaneously, existing methods typically employ temporal fusion techniques such as motion compensation, bidirectional recurrent convolutional networks (BRCN) and LSTM. Three-dimensional convolution (C3D) shows excellent performance in video learning, and some researchers have improved BRCN using C3D so that the model can flexibly access different temporal contexts in a natural way, but the network is still shallow. As for the loss function, existing video super-resolution methods still mainly use standard losses such as mean squared error; work on photo-realistic video super-resolution has proposed adversarial losses, but mainly with a purely spatial discriminator.
Disclosure of Invention
The invention aims to provide a 3-dimensional signal processing method based on space-time countermeasure (a video super-resolution method based on a spatio-temporal adversarial network) that resolves the marked loss of visual quality under diverse and blurred motion in video super-resolution, while fully exploiting the temporal information in the video to ensure the spatio-temporal consistency of the super-resolved video.
In order to achieve this purpose, the technical scheme of the invention is to design a 3-dimensional signal processing method based on space-time countermeasure (a video super-resolution method based on a spatio-temporal adversarial network) and to construct a spatio-temporal adversarial network comprising a recurrent generator G, an optical flow estimation network F and a spatio-temporal discriminator D(s,t);
the recurrent generator G recursively generates high-resolution video frames from a low-resolution input; the optical flow estimation network F learns the motion compensation between frames; the spatio-temporal discriminator, the core component of the method, considers both spatial and temporal aspects and penalizes unrealistic temporal discontinuities in the results without over-smoothing the image content;
the recurrent generator is based on a recurrent convolutional network coupled with the optical flow estimation network F; the recurrent generator produces a high-resolution (HR) output g_t from the low-resolution (LR) frame x_t and recursively reuses the previously generated HR output g_{t-1}; the recurrent generator learns only residual information, which is then added to the bicubically interpolated low-resolution input;
the spatio-temporal discriminator receives two sets of inputs, ground-truth frames and generated frames; the two sets have the same structure, each comprising three adjacent HR frames, three correspondingly upsampled LR frames, and three warped HR frames; training through the loss function provides the recurrent generator with gradient information on the realism of spatial detail and of temporal changes; by taking both spatial and temporal inputs into account, the spatio-temporal discriminator D(s,t) automatically balances spatial and temporal aspects, avoiding inconsistent sharpness and overly smooth results.
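As an illustration only, the two discriminator input stacks described above could be assembled as follows (a minimal PyTorch sketch; the channel-wise concatenation, the tensor shapes and the function name are assumptions, since the patent does not fix a concrete layout):

```python
import torch

def make_discriminator_input(hr_triplet, lr_up_triplet, warped_triplet):
    """Stack three HR frames, three upsampled LR frames and three warped
    HR frames along the channel axis: (B, 3 groups * 3 frames * 3 ch, H, W)."""
    return torch.cat(hr_triplet + lr_up_triplet + warped_triplet, dim=1)

# The ground-truth stack and the generated stack share the same structure:
B, H, W = 2, 128, 128
triplet = lambda: [torch.rand(B, 3, H, W) for _ in range(3)]
real_input = make_discriminator_input(triplet(), triplet(), triplet())
fake_input = make_discriminator_input(triplet(), triplet(), triplet())
assert real_input.shape == fake_input.shape == (B, 27, H, W)
```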
Preferably, the 3-dimensional signal processing method based on the spatio-temporal countermeasure comprises the following steps (one recurrent step is sketched in code after the list):
1) the previous frame x_{t-1} and the current frame x_t are input into the optical flow estimation network F to generate motion information v_t;
2) v_t is upscaled by bicubic interpolation;
3) the upscaled v_t is fused with the high-resolution frame g_{t-1} obtained by super-resolving the previous frame, giving the warped result w(g_{t-1}, v_t);
4) w(g_{t-1}, v_t) and x_t are input into the recurrent generator network G, which learns the residual r_t between them;
5) x_t is bicubically interpolated and the residual learned by the recurrent generator network is added to it to generate the high-resolution frame g_t;
6) the spatio-temporal discriminator receives two sets of inputs, real frames and generated frames; the two sets have the same structure, each comprising three adjacent HR frames, three corresponding bicubically interpolated LR frames, and three warped HR frames;
7) the generator is penalized by the discriminator.
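Steps 1)-5) can be sketched as a single recurrent step (a minimal PyTorch sketch, not the patent's implementation; F and G are placeholder networks, and the flow-to-sampling-grid conversion follows the common grid_sample convention, which is an assumption):

```python
import torch
import torch.nn.functional as nnf

def warp(frame, flow):
    """Warp an HR frame with a dense flow field of shape (B, 2, H, W)."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=frame.device),
                            torch.arange(w, device=frame.device), indexing="ij")
    abs_x = xs.unsqueeze(0) + flow[:, 0]            # absolute x sampling positions
    abs_y = ys.unsqueeze(0) + flow[:, 1]            # absolute y sampling positions
    norm_x = 2.0 * abs_x / (w - 1) - 1.0            # normalise to [-1, 1]
    norm_y = 2.0 * abs_y / (h - 1) - 1.0
    grid = torch.stack((norm_x, norm_y), dim=-1)    # (B, H, W, 2) for grid_sample
    return nnf.grid_sample(frame, grid, align_corners=True)

def sr_step(x_prev, x_t, g_prev, F, G, scale=4):
    v_t = F(x_prev, x_t)                                         # step 1
    v_up = scale * nnf.interpolate(v_t, scale_factor=scale,      # step 2
                                   mode="bicubic", align_corners=False)
    w_g = warp(g_prev, v_up)                                     # step 3
    r_t = G(w_g, x_t)                                            # step 4: residual only
    x_up = nnf.interpolate(x_t, scale_factor=scale,
                           mode="bicubic", align_corners=False)
    return x_up + r_t                                            # step 5
```

Note that the flow values are multiplied by the scale factor when upscaled, so that displacements measured in LR pixel units remain valid in HR pixel units.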
Preferably, a PP loss function is applied for training; the spatio-temporal adversarial PP-Loss successfully eliminates drifting artifacts while retaining appropriate high-frequency details; furthermore, such a loss construction effectively increases the size of the training data set and thus also represents a useful form of data augmentation.
[Formula image in the original: definition of the PP loss]
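The source renders this formula only as an image. For orientation, in the TecoGAN work cited against this patent the PP ("ping-pong") loss is usually written as below; this is an assumption about the intended formula, not a transcription of the image:

```latex
% The clip is super-resolved forward, giving g_1..g_n, and again on the
% time-reversed clip, giving g'_n..g'_1; matching frames from the two
% passes are pulled together, which suppresses drifting artifacts.
\mathcal{L}_{pp} = \sum_{t=1}^{n-1} \left\lVert g_t - g'_t \right\rVert_2
```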
Preferably, a PCD alignment module and a TSA fusion module are added; the PCD module is an alignment module combining a pyramid structure, a cascade structure and deformable convolution, in which frames are aligned at the feature level by progressive refinement with deformable convolution; to effectively fuse different frames under diverse motion and blur conditions, a spatio-temporal attention fusion module is added: in the TSA fusion module, the attention mechanism is applied in both time and space to emphasize the features important for subsequent restoration.
The invention provides a 3-dimensional signal processing method based on space-time countermeasure that resolves the marked loss of visual quality under diverse and blurred motion in video super-resolution, while fully exploiting the temporal information in the video to ensure the spatio-temporal consistency of the super-resolved video.
Compared with the prior art, the invention has the following beneficial technical effects:
the invention provides a space-time discriminator, which is different from the simple mean square error loss adopted in a video super-resolution method based on a deep neural network, and the scheme of the invention provides the space-time countermeasure loss and considers the inconsistency of time and space; the scheme of the invention adopts the GAN network and learns the spatiotemporal information of the video through the antagonism training of the generator and the discriminator.
Further, to address how to align multiple frames under large motion and how to effectively fuse different frames under diverse motion and blur conditions, the PCD alignment module and the TSA fusion module are added to the generator, so that optical flow information is estimated more accurately and fusion produces super-resolved frames with higher spatio-temporal consistency.
Drawings
FIG. 1 is a general flow chart of the present invention;
FIG. 2 is a network framework diagram of the present invention;
FIG. 3 is a PCD module frame diagram.
Detailed Description
The embodiments of the present invention are described below with reference to the accompanying drawings. The following examples serve only to illustrate the technical solutions of the invention more clearly and do not limit its scope of protection.
As shown in fig. 1 to 3, the technical solution of the present invention is as follows:
1. Generator: based on a recurrent convolutional network coupled with the optical flow estimation network F; the generator produces a high-resolution (HR) output g_t from the low-resolution (LR) frame x_t and recursively reuses the previously generated HR output g_{t-1}; the generator learns only residual information, which is then added to the bicubically interpolated low-resolution input.
2. Discriminator: receives two sets of inputs, ground-truth frames and generated frames; the two sets have the same structure, each comprising three adjacent HR frames, three correspondingly upsampled LR frames, and three warped HR frames.
3. The generator network and the adversarial network in the invention adopt the VGG-19 network framework.
4. The optical flow estimation network F is augmented with a PCD module; the PCD alignment module shown in FIG. 3 aligns the features of each frame using deformable convolution (sketched below). In FIG. 3, t is the reference frame and (t+i) an adjacent frame; as indicated by the black dotted lines, the features of the l-th layer of frame (t+i) are obtained by downsampling the features of layer (l-1); the gray dashed lines indicate that the offsets and aligned features of the l-th layer are predicted from the upsampled offsets and aligned features of layer (l+1); the gray-background part of the figure indicates that, after the pyramid structure, a cascaded alignment stage with deformable convolution further refines the aligned features.
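A single pyramid level of this deformable alignment can be sketched as follows (PyTorch with torchvision's DeformConv2d; the offset predictor and channel sizes are illustrative assumptions, and the full PCD module stacks this over a feature pyramid plus a cascaded refinement stage):

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformAlign(nn.Module):
    """Align a neighbour frame's features to the reference frame's features."""
    def __init__(self, ch=64, k=3):
        super().__init__()
        # offsets are predicted from the concatenated neighbour/reference features;
        # a k x k deformable kernel needs 2 * k * k offset channels
        self.offset_conv = nn.Conv2d(2 * ch, 2 * k * k, 3, padding=1)
        self.deform_conv = DeformConv2d(ch, ch, k, padding=k // 2)

    def forward(self, neigh_feat, ref_feat):
        offset = self.offset_conv(torch.cat([neigh_feat, ref_feat], dim=1))
        return self.deform_conv(neigh_feat, offset)

aligned = DeformAlign()(torch.rand(1, 64, 32, 32), torch.rand(1, 64, 32, 32))
```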
5. The TSA module computes embeddings from the aligned features of frame t and its adjacent frames through simple convolution filters, takes their dot product, and limits the output to [0, 1] with a sigmoid function, which yields the frame similarity in the embedding space; the purpose of temporal attention is precisely to compute frame similarity in this embedding space, the intuition being that adjacent frames more similar to the reference frame should receive more attention. The temporal attention map is then multiplied pixel-wise with the original aligned features; these features are aggregated by an additional fusion convolutional layer; a spatial attention mask is computed from the fused features, after which the fused features are modulated by the mask (by element-wise multiplication and addition), as sketched below.
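The temporal-then-spatial attention described above could look roughly like this (a sketch consistent with the description; the embedding convolutions, the fusion layer and the final modulation are illustrative assumptions, not the patent's exact configuration):

```python
import torch
import torch.nn as nn

class TSAFusion(nn.Module):
    def __init__(self, ch=64, n_frames=3, center=1):
        super().__init__()
        self.center = center                              # index of reference frame t
        self.emb_ref = nn.Conv2d(ch, ch, 3, padding=1)    # embedding of reference features
        self.emb_nbr = nn.Conv2d(ch, ch, 3, padding=1)    # embedding of each frame
        self.fuse = nn.Conv2d(n_frames * ch, ch, 1)       # fusion over weighted frames
        self.spatial = nn.Conv2d(ch, ch, 3, padding=1)    # spatial attention mask

    def forward(self, aligned):                           # aligned: (B, T, C, H, W)
        b, t, c, h, w = aligned.shape
        ref = self.emb_ref(aligned[:, self.center])
        weighted = []
        for i in range(t):
            emb = self.emb_nbr(aligned[:, i])
            # frame similarity in the embedding space, squashed to [0, 1] by sigmoid
            sim = torch.sigmoid((emb * ref).sum(dim=1, keepdim=True))
            weighted.append(aligned[:, i] * sim)          # pixel-wise temporal attention
        fused = self.fuse(torch.cat(weighted, dim=1))
        mask = torch.sigmoid(self.spatial(fused))
        return fused * mask + fused                       # modulate: multiply and add

out = TSAFusion()(torch.rand(2, 3, 64, 32, 32))           # -> (2, 64, 32, 32)
```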
6. The detailed execution flow of the invention is as follows:
6.1) the previous frame x_{t-1} and the current frame x_t are input into the optical flow network F to generate motion information v_t;
6.2) v_t is upscaled by bicubic interpolation;
6.3) the upscaled v_t is fused with the high-resolution frame g_{t-1} obtained by super-resolving the previous frame, giving the warped result w(g_{t-1}, v_t);
6.4) w(g_{t-1}, v_t) and x_t are input into the generator network G, which learns the residual r_t between them;
6.5) x_t is bicubically interpolated and the residual learned by the generator network is added to it to generate the high-resolution frame g_t;
6.6) the discriminator receives two sets of inputs, real frames and generated frames; the two sets have the same structure, each comprising three adjacent HR frames, three corresponding bicubically interpolated LR frames, and three warped HR frames;
6.7) training uses the PP loss function, which provides the generator with gradient information on the realism of spatial detail and of temporal changes; by simultaneously taking the spatio-temporal input into account, the discriminator D(s,t) automatically balances spatial and temporal aspects, avoiding inconsistent sharpness and overly smooth results;
[Formula image in the original: the PP loss, as given above]
7. The trained network model is tested and evaluated; the evaluation criterion is PSNR (see the helper sketched below).
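For completeness, a small PSNR helper (an illustrative sketch; the patent only names PSNR as the criterion and does not specify the computation):

```python
import torch

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two image tensors in [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```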
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the technical principle of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention.

Claims (4)

1. A 3-dimensional signal processing method based on space-time countermeasure, characterized in that a spatio-temporal adversarial network is constructed, comprising a recurrent generator G, an optical flow estimation network F and a spatio-temporal discriminator D(s,t);
the recurrent generator G recursively generates high-resolution video frames from a low-resolution input; the optical flow estimation network F learns the motion compensation between frames; the spatio-temporal discriminator penalizes unrealistic temporal discontinuities in the results without over-smoothing the image content;
the recurrent generator is based on a recurrent convolutional network coupled with the optical flow estimation network F; the recurrent generator produces a high-resolution (HR) output g_t from the low-resolution (LR) frame x_t and recursively reuses the previously generated HR output g_{t-1}; the recurrent generator learns residual information, which is then added to the bicubically interpolated low-resolution input;
the spatio-temporal discriminator receives two sets of inputs, ground-truth frames and generated frames; the two sets have the same structure, each comprising three adjacent HR frames, three correspondingly upsampled LR frames, and three warped HR frames; training through the loss function provides the recurrent generator with gradient information on the realism of spatial detail and of temporal changes; by taking both spatial and temporal inputs into account, the spatio-temporal discriminator D(s,t) automatically balances spatial and temporal aspects, avoiding inconsistent sharpness and overly smooth results.
2. The spatio-temporal countermeasure-based 3-dimensional signal processing method according to claim 1, comprising the steps of:
1) the previous frame x_{t-1} and the current frame x_t are input into the optical flow estimation network F to generate motion information v_t;
2) v_t is upscaled by bicubic interpolation;
3) the upscaled v_t is fused with the high-resolution frame g_{t-1} obtained by super-resolving the previous frame, giving the warped result w(g_{t-1}, v_t);
4) w(g_{t-1}, v_t) and x_t are input into the recurrent generator network G, which learns the residual r_t between them;
5) x_t is bicubically interpolated and the residual learned by the recurrent generator network is added to it to generate the high-resolution frame g_t;
6) the spatio-temporal discriminator receives two sets of inputs, real frames and generated frames; the two sets have the same structure, each comprising three adjacent HR frames, three corresponding bicubically interpolated LR frames, and three warped HR frames;
7) the generator is penalized by the discriminator.
3. The spatio-temporal countermeasure-based 3-dimensional signal processing method according to claim 1, characterized in that a PP loss function is applied for training; the spatio-temporal adversarial PP-Loss successfully eliminates drifting artifacts while retaining appropriate high-frequency details; furthermore, such a loss construction effectively increases the size of the training data set and thus also represents a useful form of data augmentation.
4. The space-time countermeasure based 3-dimensional signal processing method according to claim 1, characterized in that a PCD alignment module and a TSA fusion module are added; the PCD module is an alignment module combining a pyramid structure, a cascade structure and deformable convolution, in which frames are aligned at the feature level by progressive refinement with deformable convolution; to effectively fuse different frames under diverse motion and blur conditions, a spatio-temporal attention fusion module is added: in the TSA fusion module, the attention mechanism is applied in both time and space to emphasize the features important for subsequent restoration.
CN202011083124.2A 2020-10-12 2020-10-12 3-dimensional signal processing method based on space-time countermeasure Pending CN112215140A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011083124.2A CN112215140A (en) 2020-10-12 2020-10-12 3-dimensional signal processing method based on space-time countermeasure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011083124.2A CN112215140A (en) 2020-10-12 2020-10-12 3-dimensional signal processing method based on space-time countermeasure

Publications (1)

Publication Number Publication Date
CN112215140A 2021-01-12

Family

ID=74054420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011083124.2A Pending CN112215140A (en) 2020-10-12 2020-10-12 3-dimensional signal processing method based on space-time countermeasure

Country Status (1)

Country Link
CN (1) CN112215140A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190091806A (en) * 2018-01-29 2019-08-07 한국과학기술원 Video sequences generating system using generative adversarial networks and the method thereof
CN110070511A (en) * 2019-04-30 2019-07-30 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN110533615A (en) * 2019-08-30 2019-12-03 上海大学 A kind of old film large area method for repairing damage based on generation confrontation network
CN111311490A (en) * 2020-01-20 2020-06-19 陕西师范大学 Video super-resolution reconstruction method based on multi-frame fusion optical flow

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MENGYU CHU et al.: "Learning Temporal Coherence via Self-Supervision for GAN-based Video Generation", arXiv, 21 May 2020 (2020-05-21), pages 1-19 *
XINTAO WANG et al.: "EDVR: Video Restoration With Enhanced Deformable Convolutional Networks", 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 9 April 2020 (2020-04-09), pages 1954-1963 *
陈聪颖 (Chen Congying): "Research and Application Based on Video Super-Resolution", China Master's Theses Full-text Database, Information Science and Technology, no. 07, 15 July 2020 (2020-07-15), pages 23-26 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113077385A (en) * 2021-03-30 2021-07-06 上海大学 Video super-resolution method and system based on countermeasure generation network and edge enhancement
CN113642498A (en) * 2021-08-20 2021-11-12 浙江大学 Video target detection system and method based on multilevel space-time feature fusion
CN113642498B (en) * 2021-08-20 2024-05-03 浙江大学 Video target detection system and method based on multilevel space-time feature fusion

Similar Documents

Publication Publication Date Title
Liu et al. Video super-resolution based on deep learning: a comprehensive survey
CN110969577B (en) Video super-resolution reconstruction method based on deep double attention network
CN111028150B (en) Rapid space-time residual attention video super-resolution reconstruction method
CN111667424B (en) Unsupervised real image denoising method
CN110136062B (en) Super-resolution reconstruction method combining semantic segmentation
CN108259994B (en) Method for improving video spatial resolution
CN113837938B (en) Super-resolution method for reconstructing potential image based on dynamic vision sensor
CN106952228A (en) The super resolution ratio reconstruction method of single image based on the non local self-similarity of image
CN111739082A (en) Stereo vision unsupervised depth estimation method based on convolutional neural network
CN112529776B (en) Training method of image processing model, image processing method and device
CN108989731B (en) Method for improving video spatial resolution
CN108280804A (en) A kind of multi-frame image super-resolution reconstruction method
CN111242999B (en) Parallax estimation optimization method based on up-sampling and accurate re-matching
CN110363068A (en) A kind of high-resolution pedestrian image generation method based on multiple dimensioned circulation production confrontation network
CN112215140A (en) 3-dimensional signal processing method based on space-time countermeasure
Yuan et al. Single image dehazing via NIN-DehazeNet
WO2024040973A1 (en) Multi-scale fused dehazing method based on stacked hourglass network
Liang et al. Video super-resolution reconstruction based on deep learning and spatio-temporal feature self-similarity
Zhang et al. Learning stacking regressors for single image super-resolution
Xin et al. Video face super-resolution with motion-adaptive feedback cell
CN114494050A (en) Self-supervision video deblurring and image frame inserting method based on event camera
CN116895037A (en) Frame insertion method and system based on edge information and multi-scale cross fusion network
CN116091337A (en) Image enhancement method and device based on event signal nerve coding mode
Qin et al. Remote sensing image super-resolution using multi-scale convolutional neural network
CN112907456B (en) Deep neural network image denoising method based on global smooth constraint prior model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination