CN110647936B - Training method and device for video super-resolution reconstruction model and electronic equipment

Training method and device for video super-resolution reconstruction model and electronic equipment

Info

Publication number
CN110647936B
CN110647936B
Authority
CN
China
Prior art keywords
video
resolution
neural network
network sub
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910898183.6A
Other languages
Chinese (zh)
Other versions
CN110647936A (en)
Inventor
李超
丁予康
何栋梁
刘霄
张赫男
文石磊
丁二锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910898183.6A priority Critical patent/CN110647936B/en
Publication of CN110647936A publication Critical patent/CN110647936A/en
Application granted granted Critical
Publication of CN110647936B publication Critical patent/CN110647936B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4046Scaling the whole image or part thereof using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Abstract

The application discloses a training method and apparatus for a video super-resolution reconstruction model, and an electronic device, relating to the technical field of computer vision. The specific implementation scheme is as follows: acquiring a plurality of sample data, wherein each sample data comprises a sample video of a first resolution and a sample video of a second resolution, the second resolution being greater than the first resolution; establishing a video super-resolution reconstruction model, wherein the model comprises a plurality of levels of neural network sub-modules, the input of the first-level neural network sub-module serves as the input of the model, and each non-first-level neural network sub-module receives the feature data output by the sub-module of the previous level; and training the neural network sub-modules at each level with the plurality of sample data. The method and apparatus help handle motion blur in video and improve the visual effect.

Description

Training method and device for video super-resolution reconstruction model and electronic equipment
Technical Field
The present application relates to the field of computer technology, and in particular, to the field of computer vision.
Background
Existing super-resolution reconstruction techniques focus mainly on still images. Although image super-resolution can be applied directly to each frame of a video, the result of frame-by-frame super-resolution reconstruction is mediocre: it cannot resolve the motion blur in the video, and the user's visual experience suffers.
Disclosure of Invention
The embodiment of the application provides a training method and device for a video super-resolution reconstruction model and electronic equipment, and aims to solve one or more technical problems in the prior art.
In a first aspect, an embodiment of the present application provides a training method for a video super-resolution reconstruction model, including:
acquiring a plurality of sample data, wherein each sample data comprises a sample video of a first resolution and a sample video of a second resolution, the second resolution being greater than the first resolution;
establishing a video super-resolution reconstruction model, wherein the video super-resolution reconstruction model comprises a plurality of levels of neural network sub-modules, the input of the first-level neural network sub-module serves as the input of the video super-resolution reconstruction model, and each non-first-level neural network sub-module receives the feature data output by the neural network sub-module of the previous level;
and training the neural network sub-modules of each level using the plurality of sample data.
Based on this training method, a video super-resolution reconstruction model built as a multi-level neural network is obtained. Performing super-resolution reconstruction directly on the video through this model allows the temporal information in a video segment to be modeled accurately, improving the visual effect of video super-resolution. A minimal sketch of such a cascade is shown below.
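The following sketch, written in PyTorch, illustrates the cascade described above: the first level consumes the low-resolution clip, and each later level consumes the feature tensor output by the previous level. The class names (LevelBlock, VideoSRModel), the 3D-convolution layers, and the channel counts are illustrative assumptions, not details taken from the patent.

```python
# Minimal sketch of the multi-level model, assuming clips shaped
# (batch, channels, frames, height, width). All layer choices are assumptions.
import torch
import torch.nn as nn

class LevelBlock(nn.Module):
    """One neural network sub-module: 3D convolutions that mix the temporal
    and spatial information of a video clip."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)

class VideoSRModel(nn.Module):
    """Cascade of n levels: level 1 takes the low-resolution clip as the model
    input; every non-first level takes the previous level's feature data."""
    def __init__(self, n_levels: int = 3, feat_ch: int = 64):
        super().__init__()
        chans = [3] + [feat_ch] * n_levels
        self.levels = nn.ModuleList(
            LevelBlock(chans[i], chans[i + 1]) for i in range(n_levels)
        )

    def forward(self, lr_clip):
        feats, x = [], lr_clip
        for level in self.levels:
            x = level(x)        # non-first levels consume the previous output
            feats.append(x)     # keep every level's features for supervision
        return feats
```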
In one embodiment, training the neural network sub-modules of each level with the plurality of sample data includes: taking the first-resolution sample video as the input of the video super-resolution reconstruction model, supervising the feature data output by the neural network sub-module of each level against the second-resolution sample video, and training the neural network sub-modules level by level.
In the above embodiment, the first-resolution sample video serves as the input and the second-resolution sample video serves as the supervisory signal, and a progressive supervised training method is applied to the multi-level neural network sub-modules, which markedly improves the visual effect of the super-resolution.
In one embodiment, supervising the feature data output by each level's neural network sub-module against the second-resolution sample video includes: for each level, generating an output video from the feature data output by that neural network sub-module, and supervising the output video against the second-resolution sample video.
In one embodiment, generating an output video from the feature data output by the neural network sub-module includes: generating an output video of the second resolution through up-sampling according to the feature data output by the neural network sub-module.
In the above embodiment, an output video of the second resolution is obtained from the sub-module's feature data through up-sampling and compared against the second-resolution video sample, so that the training effect of the neural network sub-module can be assessed. A sketch of such an up-sampling supervision head follows.
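Below is a minimal sketch of a per-level supervision head that up-samples a level's feature data to the second resolution so it can be compared with the high-resolution sample video. The PixelShuffle-based upscaling and the name UpsampleSupervisionHead are assumptions; the patent only requires that the generated output match the second resolution.

```python
# Sketch of an up-sampling supervision head, assuming the feature tensors
# produced by the VideoSRModel sketch above. The scale factor is an assumption.
import torch.nn as nn

class UpsampleSupervisionHead(nn.Module):
    def __init__(self, feat_ch: int = 64, scale: int = 4):
        super().__init__()
        # 1x1x1 conv maps features to 3 * scale^2 channels for pixel shuffle.
        self.to_rgb = nn.Conv3d(feat_ch, 3 * scale * scale, kernel_size=1)
        self.shuffle = nn.PixelShuffle(scale)   # applied frame by frame below

    def forward(self, feats):
        # feats: (B, C, T, H, W) -> output video (B, 3, T, scale*H, scale*W)
        x = self.to_rgb(feats)
        b, c, t, h, w = x.shape
        x = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        x = self.shuffle(x)                     # (B*T, 3, scale*H, scale*W)
        return x.reshape(b, t, 3, x.shape[-2], x.shape[-1]).permute(0, 2, 1, 3, 4)
```

During training this output is compared against the second-resolution sample video; the comparison loss is sketched further below.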
In one embodiment, after training the neural network sub-modules of each level using the plurality of sample data, the method further comprises:
determining a combination of weights for the neural network sub-modules at each level;
and under the combination of the weights, retraining the video super-resolution reconstruction model by adopting a plurality of sample data.
In the above embodiment, after the neural network sub-modules of each level are trained step by step, the video super-resolution reconstruction model is trained as a whole on top of that step-by-step training, which markedly improves the visual effect of the super-resolution. An outline of the step-by-step stage is sketched below.
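An illustrative outline of stage one, the step-by-step training, assuming the model and head sketches above; the joint retraining of stage two is sketched later with the weight combination. The optimizer, step count, and L1 loss are assumptions, not details from the patent.

```python
# Stage one: train each level under its own supervision head, one level at a
# time. `loader` is assumed to yield (first-resolution, second-resolution)
# clip pairs.
import torch
import torch.nn.functional as F

def train_level_by_level(model, heads, loader, steps_per_level=1000):
    for i, head in enumerate(heads):
        params = [*model.levels[i].parameters(), *head.parameters()]
        opt = torch.optim.Adam(params, lr=1e-4)
        for _, (lr_clip, hr_clip) in zip(range(steps_per_level), loader):
            out = head(model(lr_clip)[i])     # level i's up-sampled output
            loss = F.l1_loss(out, hr_clip)    # second-resolution clip as target
            opt.zero_grad(); loss.backward(); opt.step()
```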
In one embodiment, the feature data includes temporal information and spatial information for the pixel points.
In a second aspect, an embodiment of the present application provides a method for reconstructing super-resolution of video, including:
receiving an original video to be super-resolved;
inputting an original video into a video super-resolution reconstruction model;
acquiring a video output by a video super-resolution reconstruction model, and taking the video as a video after super-resolution;
the video super-resolution reconstruction model comprises a plurality of levels of nerve network sub-modules, wherein the input of a nerve network sub-module of a first level is used as the input of the video super-resolution reconstruction model, and the nerve network sub-module of a non-first level receives the characteristic data output by the nerve network sub-module of the upper level.
In the embodiment, the super-resolution reconstruction is directly performed on the video by adopting the model of the multi-stage neural network, which is beneficial to accurately modeling the time domain information in the video segment, thereby improving the visual effect of the super-resolution of the video.
In one embodiment, each level of neural network submodule is trained on a plurality of sample data, each sample data including a sample video at a first resolution and a sample video at a second resolution, the second resolution being greater than the first resolution.
In a third aspect, an embodiment of the present application provides a training device for a video super-resolution reconstruction model, including:
a sample data determining unit configured to acquire a plurality of sample data, each sample data including a sample video of a first resolution and a sample video of a second resolution, the second resolution being greater than the first resolution;
the model building unit is used for establishing a video super-resolution reconstruction model, wherein the video super-resolution reconstruction model comprises a plurality of levels of neural network sub-modules, the input of the first-level neural network sub-module serves as the input of the video super-resolution reconstruction model, and each non-first-level neural network sub-module receives the feature data output by the neural network sub-module of the previous level;
and the training unit is used for training the neural network submodules at all levels by adopting a plurality of sample data.
In one embodiment, the training unit comprises:
an input subunit, configured to take a sample video with a first resolution as an input of a video super-resolution reconstruction model;
and the supervision subunit is used for supervising the feature data output by the neural network sub-modules of each level according to the second-resolution sample video, and training the neural network sub-modules of each level step by step.
In one embodiment, the training device further comprises:
a weight determining unit for determining a combination of weights of the neural network sub-modules of the respective levels;
and the retraining unit is used for retraining the video super-resolution reconstruction model by adopting a plurality of sample data under the combination of the weights.
In a fourth aspect, an embodiment of the present application further provides a device for reconstructing super-resolution of video, including:
the original video receiving unit is used for receiving an original video to be super-resolved;
the original video input unit is used for inputting the original video into the video super-resolution reconstruction model;
the super-resolution video acquisition unit is used for acquiring a video output by the video super-resolution reconstruction model and taking the video as a video after super resolution;
the video super-resolution reconstruction model comprises a plurality of levels of neural network sub-modules, wherein the input of the first-level neural network sub-module serves as the input of the video super-resolution reconstruction model, and each non-first-level neural network sub-module receives the feature data output by the neural network sub-module of the previous level; wherein the feature data includes temporal information and spatial information of the pixel points.
In a fifth aspect, an embodiment of the present application provides an electronic device, where a function of the electronic device may be implemented by hardware, or may be implemented by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above.
In one possible design, the structure of the electronic device includes a processor and a memory, where the memory is used for storing a program for supporting the electronic device to execute the training method of the video super-resolution reconstruction model or the method of video super-resolution reconstruction, and the processor is configured to execute the program stored in the memory. The electronic device may also include a communication interface for communicating with other devices or communication networks.
In a sixth aspect, embodiments of the present application provide a non-transitory computer-readable storage medium storing computer software instructions for an electronic device, including a program for executing the above training method of the video super-resolution reconstruction model or the above method of video super-resolution reconstruction.
One embodiment of the above application has the following advantage or benefit: it improves the effect of super-resolution reconstruction. Because a multi-level neural network and a spatial up-sampling supervision module are used to perform super-resolution processing on the input video segment, motion blur in the video can be handled and the visual effect improved.
Other effects of the above alternative will be described below in connection with specific embodiments.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
Fig. 1 is a flowchart of a training method of a video super-resolution reconstruction model according to a first embodiment of the present application;
Fig. 2 is a schematic flowchart of the steps following step S13 in the training method of a video super-resolution reconstruction model according to the first embodiment of the present application;
Fig. 3 is a diagram showing an example structure of a video super-resolution reconstruction model for the training method according to the first embodiment of the present application;
Fig. 4 is a diagram showing an example structure of the video super-resolution reconstruction model during training, for the training method according to the first embodiment of the present application;
Fig. 5 is a flowchart of a method of video super-resolution reconstruction according to a second embodiment of the present application;
Fig. 6 is a block diagram of a training apparatus for a video super-resolution reconstruction model according to a third embodiment of the present application;
Fig. 7 is a block diagram of the training unit 63 of the training apparatus for a video super-resolution reconstruction model according to the third embodiment of the present application;
Fig. 8 is a block diagram of another implementation of a training apparatus for a video super-resolution reconstruction model according to the third embodiment of the present application;
Fig. 9 is a block diagram of an apparatus for video super-resolution reconstruction according to a fourth embodiment of the present application;
Fig. 10 is a block diagram of an electronic device for implementing the training method of a video super-resolution reconstruction model or the method of video super-resolution reconstruction according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 shows a flowchart of a training method of a video super-resolution reconstruction model according to an embodiment of the present application, including:
s11, acquiring a plurality of sample data, wherein each sample data comprises a sample video with a first resolution and a sample video with a second resolution, and the second resolution is larger than the first resolution;
s12, establishing a video super-resolution reconstruction model, wherein the video super-resolution reconstruction model comprises a plurality of levels of neural network sub-modules, the input of a first level of neural network sub-module is used as the input of the video super-resolution reconstruction model, and the neural network sub-modules of the non-first level receive characteristic data output by the neural network sub-module of the upper level;
and S13, training the neural network sub-modules of all levels by adopting a plurality of sample data.
In the embodiment, the super-resolution reconstruction is directly carried out on the video by adopting the model of the multi-stage neural network, which is beneficial to accurately modeling the time domain information in the video segment, thereby improving the visual effect of the super-resolution of the video.
The resolution of a video is a parameter that measures the amount of data in an image, typically expressed in ppi (pixels per inch). In everyday usage, a figure such as 320x180 in a video's properties refers to its effective pixels in the horizontal and vertical directions, while resolution in the strict sense refers to the effective pixel value per unit length, i.e., ppi.
Super-resolution reconstruction (Super-Resolution) is the process of increasing the resolution of an original image by hardware or software means, obtaining a high-resolution image from a series of low-resolution images. It may be referred to simply as super-resolution.
In one embodiment, the feature data output by a neural network sub-module may be output in the form of a tensor. A tensor is a multilinear function that can represent linear relationships among vectors, scalars, and other tensors.
In one embodiment, the video super-resolution reconstruction model further includes an up-sampling processing module, configured to generate an output video of a target resolution through up-sampling according to the feature data output by the last-level neural network sub-module.
The target resolution is greater than the resolution of the video input to the video super-resolution reconstruction model. For example, when training with second-resolution video samples as the supervisory signal, the target resolution is the second resolution. In the up-sampling process, the height and width of the up-sampled video are enlarged to match the height and width of the second-resolution video samples, where the height is the number of pixels in the vertical direction and the width is the number of pixels in the horizontal direction. A minimal illustration of this dimension matching follows.
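A minimal illustration of the height/width matching, assuming 5-D clips shaped (batch, channels, frames, height, width); trilinear interpolation is an assumed choice, since the patent only requires that the output height and width coincide with the second-resolution samples.

```python
import torch.nn.functional as F

def upsample_to(video, hr_height, hr_width):
    # video: (B, C, T, H, W); keep the frame count T, enlarge H and W to
    # match the second-resolution sample video.
    t = video.shape[2]
    return F.interpolate(video, size=(t, hr_height, hr_width),
                         mode="trilinear", align_corners=False)
```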
In one embodiment, step S13 includes: taking the first-resolution sample video as the input of the video super-resolution reconstruction model, supervising the feature data output by the neural network sub-module of each level against the second-resolution sample video, and training the neural network sub-modules level by level. Here the second-resolution sample video serves as the supervisory signal of every level's neural network sub-module; it can be understood as the expected output value of each level.
This embodiment adopts progressive multi-level supervised training: each neural network sub-module in the multi-level network is trained step by step, improving the output of every level. Compared with reconstructing super-resolution frame by frame from individual images, this improves the visual effect of the model's super-resolution.
In one embodiment, supervising the feature data output by each level's neural network sub-module against the second-resolution sample video includes: for each level, generating an output video from the feature data output by that neural network sub-module, and supervising the output video against the second-resolution sample video.
In one embodiment, generating an output video from the feature data output by the neural network sub-module includes: up-sampling the feature data output by the neural network sub-module to generate an output video of the second resolution.
In one example, when training the video super-resolution reconstruction model, a plurality of supervision sub-modules may be set in the model, each supervision sub-module being connected to one neural network sub-module and configured to train that sub-module by supervising, against the second-resolution sample video, the feature data output by its connected neural network sub-module.
In one example, each supervision sub-module is configured as an up-sampling supervision sub-module, which up-samples the feature data output by its neural network sub-module to generate an output video of the second resolution, and supervises that output video against the second-resolution sample video.
Through the up-sampling process, the height of the up-sampled video is enlarged to equal the height of the second-resolution video samples, and its width is enlarged to equal their width, where the height is the number of pixels in the vertical direction and the width is the number of pixels in the horizontal direction.
In one embodiment, step S13 includes: training the neural network sub-modules of each level with the plurality of sample data until the training of each level's neural network sub-module converges.
In one embodiment, referring to fig. 2, after step S13, the method further includes:
s21, determining the combination of the weights of the neural network sub-modules at each level.
S22, under the combination of the weights, training the video super-resolution reconstruction model again by adopting a plurality of sample data.
The above weights can be understood as the degree to which the video generated from a neural network sub-module's output features can reach the expected output value of the super-resolution reconstruction model (e.g., with the second-resolution video samples as the expected output value). For example, the weight of the first-level neural network sub-module may be set to 20% and the weight of the second-level sub-module to 40%, and so on.
In one example, a corresponding weight is supplied to each supervision sub-module, and the neural network sub-module of each level is supervised by its supervision sub-module using that weight together with the second-resolution video samples; a sketch of this weighted supervision follows.
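A minimal sketch of supervision combined with a weight per level, matching the 20%/40% example above; the weight values and the L1 loss are assumptions for illustration only.

```python
import torch.nn.functional as F

def weighted_supervision_loss(level_outputs, hr_clip, weights=(0.2, 0.4, 1.0)):
    # level_outputs: one up-sampled second-resolution video per supervision
    # sub-module; hr_clip: the second-resolution sample video.
    return sum(w * F.l1_loss(out, hr_clip)
               for w, out in zip(weights, level_outputs))
```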
In one embodiment, the feature data includes temporal information and spatial information for the pixel points.
Referring to Fig. 3, which shows an example structure of the video super-resolution reconstruction model. Specifically, the model comprises a first neural network sub-module, a second neural network sub-module, ..., and an nth neural network sub-module, where n is a positive integer and n >= 3. The output of the last neural network sub-module in the data flow (the nth sub-module in Fig. 3) is connected to an up-sampling module, which up-samples the feature data output by the nth sub-module to generate a video of the second resolution; this second-resolution video is output as the super-resolved video.
Referring to Fig. 4, which shows the structure of the super-resolution reconstruction model of Fig. 3 during training. As shown, n up-sampling supervision sub-modules are connected one-to-one with the n neural network sub-modules to supervise them. Each up-sampling supervision sub-module generates an output video of the second resolution through up-sampling according to the feature data output by its neural network sub-module, and supervises that output against the second-resolution sample video. The earlier sketches can be assembled into this setup as outlined below.
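The following sketch wires the earlier pieces together for the structure of Figs. 3 and 4; the level count, the optimizer, and the loader of (first-resolution, second-resolution) clip pairs are assumptions for illustration.

```python
# Stage two: joint retraining of the whole model under the weight combination,
# reusing VideoSRModel, UpsampleSupervisionHead, and weighted_supervision_loss
# from the sketches above.
import torch
import torch.nn as nn

def retrain_jointly(model, heads, loader, epochs=1):
    opt = torch.optim.Adam([*model.parameters(), *heads.parameters()], lr=1e-4)
    for _ in range(epochs):
        for lr_clip, hr_clip in loader:       # (first, second) resolution pairs
            feats = model(lr_clip)            # one feature tensor per level
            outs = [head(f) for head, f in zip(heads, feats)]
            loss = weighted_supervision_loss(outs, hr_clip)
            opt.zero_grad(); loss.backward(); opt.step()

# Example wiring for n = 3 levels:
# model = VideoSRModel(n_levels=3)
# heads = nn.ModuleList(UpsampleSupervisionHead() for _ in range(3))
# retrain_jointly(model, heads, loader)
```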
Fig. 5 shows a flowchart of a method of video super-resolution reconstruction provided by a second embodiment of the present application, the method comprising:
s51, receiving an original video with super resolution to be obtained; super resolution in this embodiment is super resolution reconstruction;
s52, inputting the original video into a video super-resolution reconstruction model;
and S53, acquiring a video output by the video super-resolution reconstruction model, and taking the video as a video after super-resolution.
The video super-resolution reconstruction model comprises a plurality of levels of neural network sub-modules, wherein the input of the first-level neural network sub-module serves as the input of the video super-resolution reconstruction model, and each non-first-level neural network sub-module receives the feature data output by the neural network sub-module of the previous level.
In one embodiment, each level of neural network submodule is trained on a plurality of sample data, each sample data including a sample video at a first resolution and a sample video at a second resolution, the second resolution being greater than the first resolution.
For the training process of the video super-resolution reconstruction model, refer to the training method provided in the previous embodiment, which is not repeated here.
Fig. 6 shows a training apparatus 6 for a video super-resolution reconstruction model provided by a third embodiment of the present application. Referring to Fig. 6, the apparatus includes:
a sample data determining unit 61 for acquiring a plurality of sample data, each sample data including a sample video of a first resolution and a sample video of a second resolution, the second resolution being greater than the first resolution;
a model building unit 62, configured to build a video super-resolution reconstruction model, where the video super-resolution reconstruction model includes a plurality of levels of neural network sub-modules, and an input of a neural network sub-module of a first level is used as an input of the video super-resolution reconstruction model, and a neural network sub-module of a non-first level receives feature data output from a neural network sub-module of a previous level;
the training unit 63 is configured to train the neural network sub-modules of the respective levels by using the plurality of sample data.
In one embodiment, referring to fig. 7, the training unit 63 includes:
an input subunit 71, configured to take the sample video of the first resolution as an input of a video super-resolution reconstruction model;
and the supervision subunit 72 is configured to respectively supervise the feature data output by the neural network sub-modules at each level according to the sample video of the second resolution, and perform step-by-step training on the neural network sub-modules at each level.
In one embodiment, referring to fig. 8, the training device 6 of the video super-resolution reconstruction model further includes:
a weight determining unit 81 for determining a combination of weights of the neural network sub-modules of the respective levels;
and the retraining unit 82 is configured to retrain the video super-resolution reconstruction model with a plurality of sample data under the combination of weights.
Fig. 9 shows a schematic structural diagram of an apparatus 9 for video super-resolution reconstruction according to a fourth embodiment of the present application. Referring to Fig. 9, the apparatus includes:
an original video receiving unit 91 for receiving an original video to be super-resolved;
an original video input unit 92 for inputting an original video into the video super-resolution reconstruction model;
a super-resolution video acquisition unit 93, configured to acquire a video output by the video super-resolution reconstruction model as a super-resolution video;
the video super-resolution reconstruction model comprises a plurality of levels of neural network sub-modules, wherein the input of the first-level neural network sub-module serves as the input of the video super-resolution reconstruction model, and each non-first-level neural network sub-module receives the feature data output by the neural network sub-module of the previous level; wherein the feature data includes temporal information and spatial information of the pixel points.
For the functions of the modules in the devices of the embodiments of the present invention, refer to the corresponding descriptions in the above methods, which are not repeated here.
According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
Fig. 10 is a block diagram of an electronic device implementing the training method of a video super-resolution reconstruction model or the method of video super-resolution reconstruction according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the application described and/or claimed herein.
As shown in Fig. 10, the electronic device includes: one or more processors 1001, a memory 1002, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a graphical user interface (Graphical User Interface, GUI) on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 1001 is illustrated in Fig. 10.
Memory 1002 is a non-transitory computer-readable storage medium provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the training method of the video super-resolution reconstruction model or the method of video super-resolution reconstruction provided by the application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the training method of the video super-resolution reconstruction model or the method of video super-resolution reconstruction provided by the present application.
The memory 1002, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the training method of the video super-resolution reconstruction model or the method of video super-resolution reconstruction in the embodiments of the present application (e.g., the sample data determining unit 61, the model building unit 62, and the training unit 63 shown in Fig. 6). By running the non-transitory software programs, instructions, and modules stored in the memory 1002, the processor 1001 executes the various functional applications and data processing of the server, that is, implements the training method of the video super-resolution reconstruction model or the method of video super-resolution reconstruction in the above method embodiments.
Memory 1002 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created according to the use of an electronic device implementing a training method of a video super-resolution reconstruction model or a method of video super-resolution reconstruction, or the like. In addition, the memory 1002 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 1002 may optionally include memory remotely located with respect to the processor 1001, which may be connected via a network to an electronic device implementing a training method of the video super-resolution reconstruction model or a method of video super-resolution reconstruction. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device implementing the training method of the video super-resolution reconstruction model or the method of video super-resolution reconstruction may further include: an input device 1003 and an output device 1004. The processor 1001, the memory 1002, the input device 1003, and the output device 1004 may be connected by a bus or other means; connection by a bus is illustrated in Fig. 10.
The input device 1003 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device implementing the training method of the video super-resolution reconstruction model or the method of video super-resolution reconstruction, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, or a joystick. The output device 1004 may include a display device, auxiliary lighting means (e.g., LEDs), tactile feedback means (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (Liquid Crystal Display, LCD), a light emitting diode (Light Emitting Diode, LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be implemented in digital electronic circuitry, integrated circuitry, application specific integrated circuits (Application Specific Integrated Circuits, ASIC), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computing programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable logic devices (programmable logic device, PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., CRT (Cathode Ray Tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area network (Local Area Network, LAN), wide area network (Wide Area Network, WAN) and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the present application, a multi-level neural network and a spatial up-sampling supervision module perform super-resolution processing on input video segments and directly output the corresponding super-resolved video segments, thereby handling motion blur in the video and improving the visual effect.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved, and are not limited herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (10)

1. The training method of the video super-resolution reconstruction model is characterized by comprising the following steps of:
obtaining a plurality of sample data, each sample data comprising a sample video of a first resolution and a sample video of a second resolution, the second resolution being greater than the first resolution;
establishing a video super-resolution reconstruction model, wherein the video super-resolution reconstruction model comprises a plurality of levels of neural network sub-modules, the input of the first-level neural network sub-module serves as the input of the video super-resolution reconstruction model, and each non-first-level neural network sub-module receives feature data output by the neural network sub-module of the previous level; the feature data comprise temporal information and spatial information of pixel points; setting a plurality of supervision sub-modules in the video super-resolution reconstruction model, wherein each supervision sub-module is connected to one neural network sub-module and is configured to train that neural network sub-module by supervising, according to the second-resolution sample video, the feature data output by its correspondingly connected neural network sub-module;
training the neural network sub-modules of each level by adopting the plurality of sample data;
determining a combination of weights for the neural network sub-modules of each level, wherein the weight of a neural network sub-module is the degree to which the video generated from that sub-module's output features can reach the expected output value of the video super-resolution reconstruction model;
and under the combination of the weights, retraining the video super-resolution reconstruction model by adopting the plurality of sample data.
2. The method of claim 1, wherein training the neural network sub-modules of each level using the plurality of sample data comprises: taking the first-resolution sample video as the input of the video super-resolution reconstruction model, supervising the feature data output by the neural network sub-module of each level according to the second-resolution sample video, and training the neural network sub-modules of each level step by step.
3. The method of claim 2, wherein supervising the feature data output by the neural network sub-modules of each level according to the second-resolution sample video comprises: for each level of neural network sub-module, generating an output video from the feature data output by that sub-module; and supervising the output video according to the second-resolution sample video.
4. The method of claim 3, wherein the generating an output video from the feature data output by the neural network sub-module comprises: and generating the output video with the second resolution through up-sampling processing according to the characteristic data output by the neural network sub-module.
5. A method for video super-resolution reconstruction, comprising:
receiving an original video to be super-resolved;
inputting the original video into a video super-resolution reconstruction model;
acquiring a video output by the video super-resolution reconstruction model, and taking the video as a video after super-resolution;
the video super-resolution reconstruction model comprises a plurality of levels of neural network sub-modules, wherein the input of the first-level neural network sub-module serves as the input of the video super-resolution reconstruction model, and each non-first-level neural network sub-module receives feature data output by the neural network sub-module of the previous level; the feature data comprise temporal information and spatial information of pixel points; a plurality of supervision sub-modules are provided in the video super-resolution reconstruction model, each supervision sub-module being connected to one neural network sub-module and configured to train that neural network sub-module by supervising, according to the second-resolution sample video, the feature data output by its correspondingly connected neural network sub-module;
after the neural network sub-modules of each level are trained with a plurality of sample data, the video super-resolution reconstruction model is retrained with the plurality of sample data under a combination of weights of the neural network sub-modules of each level, wherein the weight of a neural network sub-module is the degree to which the video generated from that sub-module's output features can reach the expected output value of the video super-resolution reconstruction model;
each of the sample data includes a sample video of a first resolution and a sample video of the second resolution, the second resolution being greater than the first resolution.
6. A training device for a video super-resolution reconstruction model, comprising:
a sample data determining unit configured to acquire a plurality of sample data, each of the sample data including a sample video of a first resolution and a sample video of a second resolution, the second resolution being greater than the first resolution;
the model building unit is used for establishing a video super-resolution reconstruction model, wherein the video super-resolution reconstruction model comprises a plurality of levels of neural network sub-modules, the input of the first-level neural network sub-module serves as the input of the video super-resolution reconstruction model, and each non-first-level neural network sub-module receives feature data output by the neural network sub-module of the previous level; the feature data comprise temporal information and spatial information of pixel points; a plurality of supervision sub-modules are set in the video super-resolution reconstruction model, each supervision sub-module being connected to one neural network sub-module and configured to train that neural network sub-module by supervising, according to the second-resolution sample video, the feature data output by its correspondingly connected neural network sub-module;
the training unit is used for training the neural network sub-modules at all levels by adopting the plurality of sample data;
a weight determination unit for determining a combination of weights of the neural network sub-modules of each level, wherein the weight of a neural network sub-module is the degree to which the video generated from that sub-module's output features can reach the expected output value of the video super-resolution reconstruction model;
and the retraining unit is used for retraining the video super-resolution reconstruction model by adopting the plurality of sample data under the combination of the weights.
7. The apparatus of claim 6, wherein the training unit comprises:
an input subunit, configured to take the sample video of the first resolution as an input of the video super-resolution reconstruction model;
and the supervision subunit is used for respectively supervising the characteristic data output by the neural network sub-modules at each level according to the sample video with the second resolution and carrying out step-by-step training on the neural network sub-modules at each level.
8. An apparatus for video super-resolution reconstruction, comprising:
the original video receiving unit is used for receiving an original video to be super-resolved;
the original video input unit is used for inputting the original video into a video super-resolution reconstruction model;
the super-resolution video acquisition unit is used for acquiring a video output by the video super-resolution reconstruction model and taking the video as a video after super resolution;
the video super-resolution reconstruction model comprises a plurality of levels of neural network sub-modules, wherein the input of the first-level neural network sub-module serves as the input of the video super-resolution reconstruction model, and each non-first-level neural network sub-module receives feature data output by the neural network sub-module of the previous level; the feature data comprise temporal information and spatial information of pixel points; a plurality of supervision sub-modules are provided in the video super-resolution reconstruction model, each supervision sub-module being connected to one neural network sub-module and configured to train that neural network sub-module by supervising, according to the second-resolution sample video, the feature data output by its correspondingly connected neural network sub-module;
after the neural network sub-modules of each level are trained with a plurality of sample data, the video super-resolution reconstruction model is retrained with the plurality of sample data under a combination of weights of the neural network sub-modules of each level, wherein the weight of a neural network sub-module is the degree to which the video generated from that sub-module's output features can reach the expected output value of the video super-resolution reconstruction model;
each of the sample data includes a sample video of a first resolution and a sample video of the second resolution, the second resolution being greater than the first resolution.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
10. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-5.
CN201910898183.6A 2019-09-20 2019-09-20 Training method and device for video super-resolution reconstruction model and electronic equipment Active CN110647936B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910898183.6A CN110647936B (en) 2019-09-20 2019-09-20 Training method and device for video super-resolution reconstruction model and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910898183.6A CN110647936B (en) 2019-09-20 2019-09-20 Training method and device for video super-resolution reconstruction model and electronic equipment

Publications (2)

Publication Number Publication Date
CN110647936A CN110647936A (en) 2020-01-03
CN110647936B true CN110647936B (en) 2023-07-04

Family

ID=69011011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910898183.6A Active CN110647936B (en) 2019-09-20 2019-09-20 Training method and device for video super-resolution reconstruction model and electronic equipment

Country Status (1)

Country Link
CN (1) CN110647936B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553840B (en) * 2020-04-10 2023-06-27 北京百度网讯科技有限公司 Image super-resolution model training and processing method, device, equipment and medium
CN112488947A (en) * 2020-12-04 2021-03-12 北京字跳网络技术有限公司 Model training and image processing method, device, equipment and computer readable medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805808A (en) * 2018-04-04 2018-11-13 东南大学 A method of improving video resolution using convolutional neural networks
CN108830790A * 2018-05-16 2018-11-16 宁波大学 A fast video super-resolution reconstruction method based on a simplified convolutional neural network
US10225607B1 (en) * 2018-01-25 2019-03-05 Novatek Microelectronics Corp. Video processing apparatus and video processing method thereof

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106254722A * 2016-07-15 2016-12-21 北京邮电大学 A video super-resolution reconstruction method and device
US10621695B2 (en) * 2017-10-31 2020-04-14 Disney Enterprises, Inc. Video super-resolution using an artificial neural network
CN109949255B (en) * 2017-12-20 2023-07-28 华为技术有限公司 Image reconstruction method and device
CN108898549B (en) * 2018-05-29 2022-08-09 Oppo广东移动通信有限公司 Picture processing method, picture processing device and terminal equipment
CN109146076A (en) * 2018-08-13 2019-01-04 东软集团股份有限公司 model generating method and device, data processing method and device
CN108900894B (en) * 2018-08-16 2021-03-02 广州视源电子科技股份有限公司 Video data processing method, device and system
CN109325915B (en) * 2018-09-11 2022-11-08 合肥工业大学 Super-resolution reconstruction method for low-resolution monitoring video
CN109685717A (en) * 2018-12-14 2019-04-26 厦门理工学院 Image super-resolution rebuilding method, device and electronic equipment
CN109872276A (en) * 2019-01-29 2019-06-11 北京字节跳动网络技术有限公司 Method and apparatus for generating image super-resolution model
CN110136066B (en) * 2019-05-23 2023-02-24 北京百度网讯科技有限公司 Video-oriented super-resolution method, device, equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10225607B1 (en) * 2018-01-25 2019-03-05 Novatek Microelectronics Corp. Video processing apparatus and video processing method thereof
CN108805808A (en) * 2018-04-04 2018-11-13 东南大学 A method of improving video resolution using convolutional neural networks
CN108830790A * 2018-05-16 2018-11-16 宁波大学 A fast video super-resolution reconstruction method based on a simplified convolutional neural network

Also Published As

Publication number Publication date
CN110647936A (en) 2020-01-03

Similar Documents

Publication Publication Date Title
CN111524166B (en) Video frame processing method and device
EP3926526A2 (en) Optical character recognition method and apparatus, electronic device and storage medium
US11710215B2 (en) Face super-resolution realization method and apparatus, electronic device and storage medium
CN111654723B (en) Video quality improving method and device, electronic equipment and storage medium
CN112541482B (en) Depth information complement model training method, device, equipment and storage medium
CN110647936B (en) Training method and device for video super-resolution reconstruction model and electronic equipment
CN111327926A (en) Video frame insertion method and device, electronic equipment and storage medium
US20230103430A1 (en) Method and apparatus for generating vector representation of knowledge graph
CN111832701B (en) Model distillation method, model distillation device, electronic equipment and storage medium
US20210329195A1 (en) Method and apparatus for interpolating frame to video, and electronic device
CN112149741B (en) Training method and device for image recognition model, electronic equipment and storage medium
CN111754407B (en) Layout method, device and equipment for image display and storage medium
EP3952312A1 (en) Method and apparatus for video frame interpolation, and device and storage medium
US11641446B2 (en) Method for video frame interpolation, and electronic device
US20220101642A1 (en) Method for character recognition, electronic device, and storage medium
CN111340905B (en) Image stylization method, device, equipment and medium
CN111539897A (en) Method and apparatus for generating image conversion model
CN112560875A (en) Deep information completion model training method, device, equipment and storage medium
CN113014937B (en) Video frame insertion method, device, equipment and storage medium
CN112184851B (en) Image editing method, network training method, related device and electronic equipment
CN111553840B (en) Image super-resolution model training and processing method, device, equipment and medium
CN112508964B (en) Image segmentation method, device, electronic equipment and storage medium
CN112561059B (en) Method and apparatus for model distillation
CN110647934B (en) Training method and device for video super-resolution reconstruction model and electronic equipment
CN112560854A (en) Method, apparatus, device and storage medium for processing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant