GB2600359A - Video interpolation using one or more neural networks


Info

Publication number
GB2600359A
Authority
GB
United Kingdom
Prior art keywords
neural networks
frame
training
processor
pseudo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2201524.2A
Inventor
Reda Fitsum
Sun Deqing
Dundar Aysegul
Shoeybi Mohammad
Liu Guilin
Shih Kevin
Tao Andrew
Kautz Jan
Catanzaro Bryan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nvidia Corp
Original Assignee
Nvidia Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nvidia Corp filed Critical Nvidia Corp
Publication of GB2600359A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/01 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N7/0117 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving conversion of the spatial resolution of the incoming video signal
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/01 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N7/0127 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level by changing the field or frame frequency of the incoming video signal, e.g. frame rate converter
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/01 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N7/0135 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving interpolation processes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)

Abstract

Apparatuses, systems, and techniques to enhance video. In at least one embodiment, one or more neural networks are used to create, from a first video, a second video having a higher frame rate, higher resolution, or reduced number of missing or corrupt video frames.
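To make the abstract's idea concrete, here is a minimal illustrative sketch, not taken from the specification: PyTorch is assumed, and the model name InterpNet, its toy architecture, and the helper double_frame_rate are hypothetical. It shows how a trained interpolation network could be applied to a lower frame rate video to produce a higher frame rate one.

```python
# Hypothetical sketch of frame-rate upconversion with a trained interpolation
# network. InterpNet is a toy stand-in, not the architecture from the patent.
import torch

class InterpNet(torch.nn.Module):
    """Maps two frames (each a 1 x C x H x W tensor) to a predicted in-between frame."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.blend = torch.nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, frame_a, frame_b):
        # Concatenate the two frames along the channel axis and predict the midpoint frame.
        return self.blend(torch.cat([frame_a, frame_b], dim=1))

def double_frame_rate(frames, model):
    """Insert one predicted frame between every pair of consecutive frames,
    roughly doubling the frame rate of the input sequence."""
    output = []
    with torch.no_grad():
        for a, b in zip(frames[:-1], frames[1:]):
            output.append(a)
            output.append(model(a, b))  # synthesized intermediate frame
    output.append(frames[-1])
    return output
```

For example, applying double_frame_rate to a 30 fps clip decoded into a list of frame tensors would yield an approximately 60 fps sequence in which every other frame is synthesized by the network.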

Claims (37)

What is claimed is:
1. A processor comprising: one or more arithmetic logic units (ALUs) to be configured to generate higher frame rate video from lower frame rate video using one or more neural networks.
2. The processor of claim 1, wherein one or more neural networks are trained using unsupervised training with at least one cycle consistency constraint.
3. The processor of claim 2, wherein the unsupervised training includes generating, from a frame triplet, a set of intermediate frames and generating a version of a middle triplet frame from the intermediate frames for determining a loss value to be minimized.
4. The processor of claim 1, wherein the one or more neural networks are refined using pseudo-supervised training for a domain other than was used for training the one or more neural networks.
5. The processor of claim 4, wherein the pseudo-supervised training includes generating, using one or more already trained neural networks, versions of an intermediate frame using each of two adjacent video frames for determining a loss value to be minimized.
6. The processor of claim 1, wherein the one or more neural networks utilize one or more image interpolation algorithms.
7. The processor of claim 1, wherein the one or more ALUs are further to be configured to generate enhanced video, using the one or more neural networks, having a higher resolution or lower frame drop rate than input video.
8. A system comprising: one or more processors to be configured to generate higher frame rate video from lower frame rate video using one or more neural networks; and one or more memories to store the one or more neural networks.
9. The system of claim 8, wherein one or more neural networks are trained using unsupervised training with at least one cycle consistency constraint.
10. The system of claim 9, wherein the cycle consistency constraint includes generating, from a frame triplet, a set of intermediate frames and generating a version of a middle triplet frame from the intermediate frames for determining a loss value to be minimized.
11. The system of claim 8, wherein the one or more neural networks are refined using pseudo-supervised training for a domain other than was used for training the one or more neural networks.
12. The system of claim 11, wherein the pseudo-supervised training includes generating, using one or more already trained neural networks, versions of an intermediate frame using each of two adjacent video frames for determining a loss value to be minimized.
13. The system of claim 8, wherein the one or more neural networks utilize one or more image interpolation algorithms.
14. A machine-readable medium having stored thereon a set of instructions, which if performed by one or more processors, cause the one or more processors to at least: generate higher frame rate video from lower frame rate video using one or more neural networks.
15. The machine-readable medium of claim 14, wherein one or more neural networks are trained using unsupervised training with at least one cycle consistency constraint.
16. The machine-readable medium of claim 15, wherein the cycle consistency constraint includes generating, from a frame triplet, a set of intermediate frames and generating a version of a middle triplet frame from the intermediate frames for determining a loss value to be minimized.
17. The machine-readable medium of claim 14, wherein the one or more neural networks are refined using pseudo-supervised training for a domain other than was used for training the one or more neural networks.
18. The machine-readable medium of claim 17, wherein the pseudo-supervised training includes generating, using one or more already trained neural networks, versions of an intermediate frame using each of two adjacent video frames for determining a loss value to be minimized.
19. The machine-readable medium of claim 14, wherein the one or more neural networks utilize one or more image interpolation algorithms.
20. A processor comprising: one or more arithmetic logic units (ALUs) to train one or more neural networks, at least in part, to generate higher frame rate video from lower frame rate video.
21. The processor of claim 20, wherein one or more neural networks are trained using unsupervised training with at least one cycle consistency constraint.
22. The processor of claim 21, wherein the cycle consistency constraint includes generating, from a frame triplet, a set of intermediate frames and generating a version of a middle triplet frame from the intermediate frames for determining a loss value to be minimized.
23. The processor of claim 20, wherein the one or more neural networks are refined using pseudo-supervised training for a domain other than was used for training the one or more neural networks.
24. The processor of claim 23, wherein the pseudo-supervised training includes generating, using one or more already trained neural networks, versions of an intermediate frame using each of two adjacent video frames for determining a loss value to be minimized.
25. The processor of claim 20, wherein the one or more neural networks utilize one or more image interpolation algorithms.
26. A system comprising: one or more processors to calculate parameters corresponding to one or more neural networks, at least in part, to generate higher frame rate video from lower frame rate video; and one or more memories to store the parameters.
27. The system of claim 26, wherein one or more neural networks are trained using unsupervised training with at least one cycle consistency constraint.
28. The system of claim 27, wherein the cycle consistency constraint includes generating, from a frame triplet, a set of intermediate frames and generating a version of a middle triplet frame from the intermediate frames for determining a loss value to be minimized.
29. The system of claim 26, wherein the one or more neural networks are refined using pseudo-supervised training for a domain other than was used for training the one or more neural networks.
30. The system of claim 29, wherein the pseudo-supervised training includes generating, using one or more already trained neural networks, versions of an intermediate frame using each of two adjacent video frames for determining a loss value to be minimized.
31. The system of claim 26, wherein the one or more neural networks utilize one or more image interpolation algorithms.
32. A machine-readable medium having stored thereon a set of instructions, which if performed by one or more processors, cause the one or more processors to at least: cause one or more neural networks to be trained, at least in part, to generate higher frame rate video from lower frame rate video; and one or more memories to store the parameters.
33. The machine-readable medium of claim 32, wherein one or more neural networks are trained using unsupervised training with at least one cycle consistency constraint.
34. The machine-readable medium of claim 33, wherein the cycle consistency constraint includes generating, from a frame triplet, a set of intermediate frames and generating a version of a middle triplet frame from the intermediate frames for determining a loss value to be minimized.
35. The machine-readable medium of claim 32, wherein the one or more neural networks are refined using pseudo-supervised training for a domain other than was used for training the one or more neural networks.
36. The machine-readable medium of claim 35, wherein the pseudo-supervised training includes generating, using one or more already trained neural networks, versions of an intermediate frame using each of two adjacent video frames for determining a loss value to be minimized.
37. The machine-readable medium of claim 32, wherein the one or more neural networks utilize one or more image interpolation algorithms.
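Claims 2-5 and their counterparts under the other independent claims describe two training strategies: an unsupervised cycle consistency loss computed over a frame triplet, and a pseudo-supervised refinement for a new domain using one or more already trained networks. The sketch below is one illustrative reading of that claim language, not the patented implementation; it reuses the hypothetical InterpNet interface from the earlier snippet, and the choice of L1 reconstruction loss and the teacher/student framing of the pseudo-supervised case are assumptions.

```python
# Hypothetical loss sketches for the two training strategies named in the claims.
# "model", "student", and "teacher" are assumed to follow the toy InterpNet
# interface from the earlier snippet: model(frame_a, frame_b) -> in-between frame.
import torch
import torch.nn.functional as F

def cycle_consistency_loss(model, f0, f1, f2):
    """Unsupervised loss over a frame triplet (f0, f1, f2): interpolate between
    (f0, f1) and between (f1, f2), then interpolate between those two predictions;
    the result should reproduce the middle frame f1 of the triplet."""
    i_01 = model(f0, f1)           # intermediate frame between f0 and f1
    i_12 = model(f1, f2)           # intermediate frame between f1 and f2
    f1_cycled = model(i_01, i_12)  # re-derived version of the middle frame
    return F.l1_loss(f1_cycled, f1)

def pseudo_supervised_loss(student, teacher, f0, f2):
    """Refinement loss for a new domain: an already trained network (teacher)
    produces a version of the intermediate frame from two adjacent frames of the
    new domain, and the network being refined (student) is trained to match it."""
    with torch.no_grad():
        pseudo_target = teacher(f0, f2)  # pseudo ground truth, teacher weights frozen
    prediction = student(f0, f2)
    return F.l1_loss(prediction, pseudo_target)
```

In a training loop, either loss would be minimized by backpropagation through the network being trained, while the already trained teacher network stays fixed (hence the torch.no_grad block).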
GB2201524.2A 2019-09-03 2020-08-19 Video interpolation using one or more neural networks Pending GB2600359A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US16/559,312 US20210067735A1 (en) 2019-09-03 2019-09-03 Video interpolation using one or more neural networks
PCT/US2020/046978 WO2021045904A1 (en) 2019-09-03 2020-08-19 Video interpolation using one or more neural networks

Publications (1)

Publication Number Publication Date
GB2600359A true GB2600359A (en) 2022-04-27

Family

ID=72292682

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2201524.2A Pending GB2600359A (en) 2019-09-03 2020-08-19 Video interpolation using one or more neural networks

Country Status (5)

Country Link
US (1) US20210067735A1 (en)
CN (1) CN114303160A (en)
DE (1) DE112020003165T5 (en)
GB (1) GB2600359A (en)
WO (1) WO2021045904A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI729826B (en) * 2020-05-26 2021-06-01 友達光電股份有限公司 Display method
US11763544B2 (en) 2020-07-07 2023-09-19 International Business Machines Corporation Denoising autoencoder image captioning
US11651522B2 (en) * 2020-07-08 2023-05-16 International Business Machines Corporation Adaptive cycle consistency multimodal image captioning
JP2022046219A (en) * 2020-09-10 2022-03-23 キヤノン株式会社 Image processing method, image processing device, image processing program, learning method, learning device and learning program
CN113891027B (en) * 2021-12-06 2022-03-15 深圳思谋信息科技有限公司 Video frame insertion model training method and device, computer equipment and storage medium
CN114782497B (en) * 2022-06-20 2022-09-27 中国科学院自动化研究所 Motion function analysis method and electronic device
US20240037150A1 (en) * 2022-08-01 2024-02-01 Qualcomm Incorporated Scheduling optimization in sequence space

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109068174A (en) * 2018-09-12 2018-12-21 上海交通大学 Video frame rate upconversion method and system based on cyclic convolution neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Anonymous, "GitHub - NVIDIA/unsupervised-video-interpolation: Unsupervised Video Interpolation using Cycle Consistency", (20200131), URL: https://github.com/NVIDIA/unsupervised-video-interpolation, (20201103), XP055746680 [XP] 1-37 * page 1 - page 9 * *
Anonymous, "Volta (microarchitecture) - Wikipedia", (20190716), URL: https://en.wikipedia.org/w/index.php?title=Volta_(microarchitecture)&oldid=906608016, (20201104), XP055746770 [A] 1-37 * page 1 - page 2 * *
FITSUM A. REDA; DEQING SUN; AYSEGUL DUNDAR; MOHAMMAD SHOEYBI; GUILIN LIU; KEVIN J. SHIH; ANDREW TAO; JAN KAUTZ; BRYAN CATANZARO: "Unsupervised Video Interpolation Using Cycle Consistency", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 13 June 2019 (2019-06-13), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081381443 *

Also Published As

Publication number Publication date
CN114303160A (en) 2022-04-08
DE112020003165T5 (en) 2022-03-31
US20210067735A1 (en) 2021-03-04
WO2021045904A1 (en) 2021-03-11

Similar Documents

Publication Publication Date Title
GB2600359A (en) Video interpolation using one or more neural networks
GB2600346A (en) Video upsampling using one or more neural networks
GB2600869A (en) Video prediction using one or more neural networks
PH12020552224A1 (en) MULTIPLE HISTORY BASED NON-ADJACENT MVPs FOR WAVEFRONT PROCESSING OF VIDEO CODING
BR112019023395A2 (en) LOW LATENCY MATRIX MULTIPLICATION UNIT
US11100192B2 (en) Apparatus and methods for vector operations
WO2009042101A3 (en) Processing an input image to reduce compression-related artifacts
NO20084862L (en) Bandwidth enhancement for 3D display
GB2583243A (en) Blockchain validation system
NZ608999A (en) Composite video streaming using stateless compression
RU2018110382A (en) REPRODUCING AUGMENTATION OF IMAGE DATA
CN101843483A (en) Method and device for realizing water-fat separation
BR9507794A (en) Apparatus for decoding a video device sequence to produce images partially dependent on the input of the interactive user device to a primarily intended data carrier device to receive a data model method to transform an image source method to generate an apparatus field to transform an image source method to generate a field and method to reduce the effects of misalignment
JP2017535142A5 (en)
US11568524B2 (en) Tunable models for changing faces in images
Johnson et al. Motion correction in MRI using deep learning
MX2021013065A (en) Global motion constrained motion vector in inter prediction.
GB2606066A (en) Training one or more neural networks using synthetic data
CN109190619A (en) A kind of Image Description Methods based on target exposure mask
GB2606060A (en) Upsampling an image using one or more neural networks
KR102654862B1 (en) Apparatus and Method of processing image
GB2600896A (en) Image generation using one or more neural networks
GB2600300A (en) Image generation using one or more neural networks
SG10201902668XA (en) Synchronizing video outputs towards a single display frequency
US11734557B2 (en) Neural network with frozen nodes