CN113596576A - Video super-resolution method and device - Google Patents

Video super-resolution method and device

Info

Publication number
CN113596576A
Authority
CN
China
Prior art keywords
processed, image, video, video sub, super
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110823352.7A
Other languages
Chinese (zh)
Inventor
周琛晖
阮良
陈功
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Netease Zhiqi Technology Co Ltd
Original Assignee
Hangzhou Netease Zhiqi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Netease Zhiqi Technology Co Ltd filed Critical Hangzhou Netease Zhiqi Technology Co Ltd
Priority to CN202110823352.7A
Publication of CN113596576A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440263Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the spatial resolution, e.g. for displaying on a connected PDA
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/01Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N7/0135Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving interpolation processes

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Image Processing (AREA)

Abstract

The application relates to the field of computer vision and provides a video super-resolution method and device, which are used for solving the problem that real-time communication (RTC) video images cannot be processed in real time. The method comprises the following steps: dividing a video image to be processed into a plurality of video sub-images to be processed, performing texture classification processing on the plurality of video sub-images to be processed, and respectively determining the texture category of each video sub-image to be processed; performing corresponding super-resolution processing on each video sub-image to be processed according to its texture category to obtain a corresponding super-resolution video sub-image; and then splicing all the obtained super-resolution video sub-images to obtain the corresponding target video image. Because the super-resolution processing applied to each video sub-image to be processed matches its texture category, the processing time of video super-resolution can be reduced, the processing speed of video super-resolution is improved, and the real-time requirement of an RTC scene is met.

Description

Video super-resolution method and device
Technical Field
The application relates to the field of computer vision, and provides a method and a device for video super-resolution.
Background
In recent years, with the rapid development of deep learning technology, super-resolution technology has also shown wide application prospects in the fields of image restoration, image enhancement and the like.
In particular, video super-resolution methods based on deep convolutional networks, as applied in the field of Real-Time Communication (RTC), can to a certain extent resolve the severe video distortion caused by decoding operations and recover image details well. However, the deep convolutional network in such methods contains thousands of model parameters, so processing video images in an RTC scene with it takes considerable time and cannot meet the real-time requirement of the RTC scene.
Disclosure of Invention
The embodiment of the application provides a method and a device for super-resolution of videos, which aim to solve the problem that RTC video images cannot be processed in real time.
In a first aspect, an embodiment of the present application provides a method for super-resolution of a video, including:
dividing a video image to be processed into a plurality of video sub-images to be processed, performing texture classification processing on the plurality of video sub-images to be processed, and respectively determining the texture category of each video sub-image to be processed;
performing corresponding super-resolution processing on each video sub-image to be processed according to the texture category of each video sub-image to be processed to obtain a corresponding super-resolution video sub-image;
and splicing the obtained super-resolution video sub-images to obtain the corresponding target video image.
Optionally, the performing texture classification processing on the multiple to-be-processed video sub-images to respectively determine the texture category of each to-be-processed video sub-image includes:
performing feature extraction on each of the plurality of to-be-processed video sub-images to obtain the corresponding texture feature maps;
performing normalization processing on each texture feature map to obtain the texture class prediction rate of each to-be-processed video sub-image;
for each to-be-processed video sub-image, respectively executing the following operation: determining the texture class to which one to-be-processed video sub-image belongs according to the comparison result between the texture class prediction rate of the one to-be-processed video sub-image and a plurality of set thresholds.
Optionally, the performing corresponding super-resolution processing on each to-be-processed video sub-image according to its texture category to obtain a corresponding super-resolution video sub-image includes performing any one of the following operations on one to-be-processed video sub-image:
performing first super-resolution processing on the video sub-image to be processed according to the texture category of the video sub-image to be processed to obtain a corresponding first super-resolution video sub-image;
and performing second super-resolution processing on the video sub-image to be processed according to the texture category of the video sub-image to be processed to obtain a corresponding second super-resolution video sub-image.
Optionally, the performing the first super-resolution processing on the one to-be-processed video sub-image to obtain a corresponding first super-resolution video sub-image includes:
performing interpolation processing on the to-be-processed video sub-image to obtain the corresponding first super-resolution video sub-image; and the pixel density of the to-be-processed video sub-image is smaller than that of the corresponding first super-resolution video sub-image.
Optionally, the interpolation processing is any one of nearest neighbor interpolation, linear interpolation, bilinear interpolation, and bicubic interpolation.
Optionally, the performing second super-resolution processing on the to-be-processed video sub-image to obtain a corresponding second super-resolution video sub-image includes:
performing feature extraction on the to-be-processed video sub-image to obtain a corresponding image feature map set;
and performing pixel rearrangement processing on each initial pixel point in the image feature map set to obtain a corresponding second super-resolution video sub-image, so that a plurality of initial pixel points at the same position in each image feature map are mapped to a target pixel point on the second super-resolution video sub-image.
In a second aspect, an embodiment of the present application further provides an apparatus for video super-resolution, including:
the texture classification unit is used for dividing a video image to be processed into a plurality of video sub-images to be processed, performing texture classification processing on the plurality of video sub-images to be processed and respectively determining the texture category of each video sub-image to be processed;
the super-resolution processing unit is used for performing corresponding super-resolution processing on each to-be-processed video sub-image according to the texture category of each to-be-processed video sub-image to obtain a corresponding super-resolution video sub-image;
and the generating unit is used for splicing all the obtained super-resolution video sub-images to obtain corresponding target video images.
Optionally, the texture classifying unit is configured to:
performing feature extraction on each of the plurality of to-be-processed video sub-images to obtain the corresponding texture feature maps;
performing normalization processing on each texture feature map to obtain the texture class prediction rate of each to-be-processed video sub-image;
for each to-be-processed video sub-image, respectively executing the following operation: determining the texture class to which one to-be-processed video sub-image belongs according to the comparison result between the texture class prediction rate of the one to-be-processed video sub-image and a plurality of set thresholds.
Optionally, the super-resolution processing unit performs any one of the following operations for one to-be-processed video sub-image:
performing first super-resolution processing on the video sub-image to be processed according to the texture category of the video sub-image to be processed to obtain a corresponding first super-resolution video sub-image;
and performing second super-resolution processing on the video sub-image to be processed according to the texture category of the video sub-image to be processed to obtain a corresponding second super-resolution video sub-image.
Optionally, the super-resolution processing unit is configured to:
performing interpolation processing on the to-be-processed video sub-image to obtain a corresponding first super-resolution video sub-image; and the pixel density of the to-be-processed video sub-image is smaller than that of the corresponding first super-resolution video sub-image.
Optionally, the interpolation processing is any one of nearest neighbor interpolation, linear interpolation, bilinear interpolation, and bicubic interpolation.
Optionally, the super-resolution processing unit is configured to:
performing feature extraction on the to-be-processed video sub-image to obtain a corresponding image feature map set;
and performing pixel rearrangement processing on each initial pixel point in the image feature map set to obtain a corresponding second super-resolution video sub-image, so that a plurality of initial pixel points at the same position in each image feature map are mapped to a target pixel point on the second super-resolution video sub-image.
In a third aspect, an embodiment of the present application further provides a computer device, including a processor and a memory, where the memory stores program code, and when the program code is executed by the processor, the processor is caused to execute the steps of any one of the above-mentioned methods for video super-resolution.
In a fourth aspect, the present application further provides a computer-readable storage medium including program code, which, when run on a computer device, causes the computer device to perform the steps of any of the above-mentioned video super-resolution methods.
The beneficial effects of this application are as follows:
The embodiment of the application provides a video super-resolution method and device, wherein the method comprises the following steps: dividing a video image to be processed into a plurality of video sub-images to be processed, performing texture classification processing on the plurality of video sub-images to be processed, and respectively determining the texture category of each video sub-image to be processed; performing corresponding super-resolution processing on each video sub-image to be processed according to its texture category to obtain a corresponding super-resolution video sub-image; and then splicing all the obtained super-resolution video sub-images to obtain the corresponding target video image. Compared with performing super-resolution processing on the whole video image, performing super-resolution processing matched to the texture category of each video sub-image to be processed greatly reduces the computation of the super-resolution network, reduces the processing time of video super-resolution, improves the processing speed of video super-resolution, and better meets the real-time requirement of the RTC scene.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1a is a schematic flowchart of a video super-resolution method according to an embodiment of the present application;
FIG. 1b is a block diagram of a video image to be processed according to an embodiment of the present disclosure;
fig. 1c is a schematic flowchart of a texture classification process performed on a plurality of video sub-images to be processed according to an embodiment of the present application;
FIG. 1d is a schematic view of a sub-pixel provided in an embodiment of the present application;
FIG. 1e is a logic diagram illustrating the operation of the sub-pixel convolution layer according to the present embodiment;
FIG. 2a is a schematic flow chart diagram illustrating an embodiment of the present application;
FIG. 2b is a logic diagram of one embodiment of the present application;
FIG. 3 is a schematic structural diagram of an apparatus for super-resolution video in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a computer device in an embodiment of the present application;
fig. 5 is a schematic diagram of the hardware composition of a computing device to which an embodiment of the present application is applied.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application; obviously, the described embodiments are some, but not all, embodiments of the technical solutions of the present application. All other embodiments obtained by a person skilled in the art without any inventive step, based on the embodiments described in the present application, are within the scope of protection of the present application.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments described herein are capable of operation in sequences other than those illustrated or described herein.
In recent years, with the rapid development of deep learning technology, super-resolution technology has also shown wide application prospects in fields such as image restoration and image enhancement. Super-resolution is the process of applying hardware or software methods to a series of low-resolution images to raise the resolution of the original image and obtain a high-resolution image.
In particular, video super-resolution methods based on deep convolutional networks, as applied in the RTC field, can to a certain extent resolve severe video distortion and recover image details well. However, the deep convolutional network in such methods contains thousands of model parameters, so processing video images in an RTC scene with it takes considerable time and cannot meet the real-time requirement of the RTC scene.
In view of this, the present application provides a new method and apparatus for super-resolution of video to solve the problem that RTC video images cannot be processed in real time.
Referring to the flow chart diagram shown in fig. 1a, a method for video super-resolution proposed in the embodiment of the present application is described.
S101: dividing a video image to be processed into a plurality of video sub-images to be processed, performing texture classification processing on the plurality of video sub-images to be processed, and respectively determining the texture category of each video sub-image to be processed.
As shown in fig. 1b, a video image to be processed contains both simple pictures with little texture information and complex pictures with rich texture information. If the entire to-be-processed video image were fed directly into a super-resolution network built on a deep convolutional network, processing would be slow for two reasons: the deep convolutional network contains too many model parameters, and the network would also perform complex super-resolution processing on simple pictures with little texture information that do not need it.
Therefore, in the embodiment of the present application, a complete video image to be processed is divided into a plurality of video sub-images to be processed, and then the plurality of video sub-images to be processed are input into the texture classification network together for texture classification processing, so as to determine the texture class of each video sub-image to be processed.
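For illustration only, the division step might look like the following Python sketch; the 64-pixel tile size, the edge-padding policy, and the helper name split_into_tiles are assumptions made for this sketch, not details taken from the patent.

import numpy as np

def split_into_tiles(frame, tile=64):
    """Split an (H, W, C) frame into tile x tile sub-images, padding the
    bottom/right edges so every sub-image has the same shape."""
    h, w = frame.shape[:2]
    pad_h = (-h) % tile                      # rows needed to reach a multiple of tile
    pad_w = (-w) % tile                      # columns needed likewise
    padded = np.pad(frame, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")
    tiles = []
    for y in range(0, padded.shape[0], tile):        # row-major order
        for x in range(0, padded.shape[1], tile):
            tiles.append(padded[y:y + tile, x:x + tile])
    return tiles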
Referring to the flow diagram shown in FIG. 1c, the operations performed within the texture classification network are as follows.
S1011: performing feature extraction on each of the plurality of to-be-processed video sub-images to obtain the corresponding texture feature maps;
S1012: performing normalization processing on each texture feature map to obtain the texture class prediction rate of each to-be-processed video sub-image;
S1013: for each to-be-processed video sub-image, respectively executing the following operation: determining the texture class to which the to-be-processed video sub-image belongs according to the comparison result between its texture class prediction rate and a plurality of set thresholds.
In an embodiment of the present application, the texture classification network is composed of an input layer, a hidden layer, and an output layer. The input layer reads the plurality of to-be-processed video sub-images. The hidden layer comprises a plurality of convolution layers, each with a corresponding convolution kernel, which perform feature extraction on the input to-be-processed video sub-images or on the texture feature map output by the preceding convolution layer to obtain a new texture feature map;
the output layer comprises a fully connected layer and a plurality of set thresholds. The fully connected layer normalizes the texture feature map output by the last convolution layer, mapping the abstract feature map to a texture class prediction rate in (0, 1). The plurality of set thresholds divide (0, 1) into threshold ranges, each corresponding to one texture class; the threshold range in which a to-be-processed video sub-image falls is therefore determined by comparing its texture class prediction rate with the set thresholds, and the texture class to which the sub-image belongs follows from the mapping between threshold ranges and texture classes.
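A minimal PyTorch sketch of a classifier with this shape is given below; the layer widths, the single sigmoid output, and the example thresholds are illustrative assumptions rather than the patent's actual network.

import torch
import torch.nn as nn

class TextureClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(               # hidden layer: stacked convolutions
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, 1)                   # output layer: fully connected

    def forward(self, x):
        score = self.fc(self.features(x).flatten(1))
        return torch.sigmoid(score)                  # texture class prediction rate in (0, 1)

def texture_class(pred, thresholds=(0.5,)):
    """Map a prediction rate to a texture class via the set thresholds."""
    return sum(pred >= t for t in thresholds)        # 0 = simple, 1 = complex, ...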
S102: and performing corresponding super-resolution processing on each video sub-image to be processed according to the texture category of each video sub-image to be processed to obtain a corresponding super-resolution video sub-image.
In order to improve processing efficiency and save processing time, the embodiment of the application tailors the super-resolution processing to the characteristics of the different texture classes.
For convenience of description, the super-resolution process is described below taking one to-be-processed video sub-image as an example.
Operation 1: and performing first super-resolution processing on the video sub-image to be processed according to the texture category of the video sub-image to be processed to obtain a corresponding first super-resolution video sub-image.
If the texture type represents that the video sub-image to be processed is a simple image containing less texture information, the first super-resolution processing is executed, so that the complexity of the whole super-resolution network can be reduced, the model parameters for constructing the super-resolution network are reduced, the processing speed of the super-resolution network is increased, and the processing time is saved.
In the embodiment of the application, the to-be-processed video sub-image may either undergo interpolation processing or be input into a simple super-resolution network with a small number of parameters for super-resolution processing.
The interpolation processing may be any one of nearest neighbor interpolation, linear interpolation, bilinear interpolation, and bicubic interpolation; whichever is used, its essence is to increase the pixel density of the original image, so the pixel density of the to-be-processed video sub-image is smaller than that of the corresponding first super-resolution video sub-image. The greater the pixel density, i.e., the number of pixels per inch of screen, the more detail the displayed picture shows.
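A minimal sketch of this interpolation branch, assuming OpenCV and a fixed 2x scale factor (both illustrative choices):

import cv2

def interpolate_x2(tile):
    """Upscale a simple-texture tile with bicubic interpolation."""
    h, w = tile.shape[:2]
    return cv2.resize(tile, (w * 2, h * 2), interpolation=cv2.INTER_CUBIC)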
The simple super-resolution network may be a neural network constructed based on a Convolutional Neural Network (CNN) or a Recurrent Neural Network (RNN), and comprises at least an input layer, an output layer, and a hidden layer, where the hidden layer comprises a plurality of ordinary convolution layers and a sub-pixel convolution layer. The ordinary convolution layers extract the image features of the to-be-processed video sub-image, and the sub-pixel convolution layer performs pixel rearrangement processing on each initial pixel point in the image feature map set to obtain the first super-resolution video sub-image.
During camera imaging, the image data generated by discretizing the captured image is limited by the capability of the photosensitive element itself, and each actual physical pixel on the imaging surface only represents the color near it. As shown in fig. 1d, innumerable tiny pixels lie between any two adjacent actual physical pixels; these tiny pixels are called sub-pixels.
As shown in fig. 1e, the sub-pixel convolution layer maps a plurality of initial pixel points at the same position of the multi-channel image feature map set to a target pixel point on the first super-resolution video sub-image; through this sub-pixel interpolation, the image feature maps are mapped from a low-resolution space to a high-resolution space, improving the resolution of the original image and yielding a high-resolution image.
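This pixel rearrangement is available off the shelf as PyTorch's PixelShuffle operator, which maps r*r feature-map channels at one position to an r x r block of output pixels; the shapes below are purely illustrative.

import torch
import torch.nn as nn

r = 2                                        # upscale factor
shuffle = nn.PixelShuffle(r)
feats = torch.randn(1, 3 * r * r, 64, 64)    # (N, C * r^2, H, W) feature maps
out = shuffle(feats)
print(out.shape)                             # torch.Size([1, 3, 128, 128])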
Operation 2: and performing second super-resolution processing on the video subimage to be processed according to the texture category of the video subimage to be processed to obtain a corresponding second super-resolution video subimage.
If the texture type indicates that the to-be-processed video sub-image is a complex picture containing rich texture information, a complex super-resolution network is called to perform the second super-resolution processing. The complex super-resolution network is also a neural network constructed based on a CNN or RNN and comprises at least an input layer, an output layer, and a hidden layer, where the hidden layer comprises a plurality of ordinary convolution layers and a sub-pixel convolution layer. The complex super-resolution network contains more parameters than the simple one, so using it on complex pictures yields a better picture effect.
At the same time, the complex super-resolution network contains far fewer parameters than the very large super-resolution networks used in the related art, and it also processes less image data. Using the complex super-resolution network on complex pictures therefore reduces the complexity of the overall super-resolution scheme, reduces the model parameters needed to build it, improves processing speed, and saves processing time.
Specifically, the plurality of convolution layers in the complex super-resolution network perform feature extraction on the received to-be-processed video sub-image to obtain the corresponding image feature map set; the sub-pixel convolution layer in the network then performs pixel rearrangement processing on each initial pixel point in the image feature map set to obtain the corresponding second super-resolution video sub-image, so that a plurality of initial pixel points at the same position in each image feature map are mapped to a target pixel point on the second super-resolution video sub-image.
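As a hedged illustration, an ESPCN-style model matches this description (ordinary convolutions for feature extraction followed by one sub-pixel layer for pixel rearrangement); the depth and channel counts below are assumptions, not the patent's parameters.

import torch.nn as nn

class ComplexSR(nn.Module):
    def __init__(self, scale=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 5, padding=2), nn.ReLU(),       # feature extraction
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3 * scale * scale, 3, padding=1),  # r^2 maps per output channel
            nn.PixelShuffle(scale),                          # pixel rearrangement
        )

    def forward(self, x):
        return self.body(x)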
S103: splicing the obtained super-resolution video sub-images to obtain the corresponding target video image.
After the super-resolution processing, the pixel density of each super-resolution video sub-image increases and its picture size grows, so the stitched target video image has a larger picture size and a higher resolution than the original image.
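Assuming the sub-images came from a row-major tiling such as the split_into_tiles sketch above and were all upscaled by the same factor, the stitching step reduces to stacking them back into a grid:

import numpy as np

def stitch_tiles(tiles, rows, cols):
    """Lay out rows x cols equally sized tiles back into one frame."""
    grid = [np.hstack(tiles[r * cols:(r + 1) * cols]) for r in range(rows)]
    return np.vstack(grid)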
Two neural networks are used in the embodiment of the application: the texture classification network and the super-resolution network. In the training stage, the texture classification network is first trained with a number of training samples; after its training finishes, the texture classification network and the super-resolution network are optimized simultaneously in a weakly supervised learning manner, which fine-tunes the model parameters of the texture classification network.
The formula L = w1·L1 + w2·L2 gives the total loss value output in each training round when the two neural networks are optimized jointly; when the total loss value satisfies the iteration stop condition, training stops, and the trained texture classification network and super-resolution network are output. Here, L is the total loss value, L1 is the classification loss function, L2 is the image super-resolution loss function, w1 is the weight of the classification loss function, and w2 is the weight of the image super-resolution loss function.
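Written out as code, the joint objective could look like the sketch below; the concrete loss choices (cross-entropy for L1, L1 distance for L2) and the equal weights are assumptions for illustration.

import torch.nn.functional as F

def joint_loss(class_logits, class_labels, sr_out, sr_target, w1=0.5, w2=0.5):
    l1 = F.cross_entropy(class_logits, class_labels)  # classification loss L1
    l2 = F.l1_loss(sr_out, sr_target)                 # image super-resolution loss L2
    return w1 * l1 + w2 * l2                          # total loss L = w1*L1 + w2*L2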
Referring to the flowchart shown in fig. 2a and the logic diagram shown in fig. 2b, a method for super-resolution of video according to an embodiment of the present application will be described by taking an embodiment as an example.
S201: dividing a video image to be processed into a plurality of video sub-images to be processed, performing texture classification processing on the plurality of video sub-images to be processed, and respectively determining the texture category of each video sub-image to be processed;
S202: according to the texture category of each video sub-image to be processed, performing interpolation processing on the to-be-processed video sub-images containing little texture information to obtain the corresponding first super-resolution video sub-images, and calling the complex super-resolution network to process the to-be-processed video sub-images containing rich texture information to obtain the corresponding second super-resolution video sub-images;
S203: splicing the obtained super-resolution video sub-images to obtain the corresponding target video image.
After the super-resolution processing, the originally blurry low-resolution image has been turned into a high-resolution, high-definition image.
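Purely as illustrative glue, the three steps of this embodiment can be chained as below, reusing the hypothetical helpers sketched earlier (split_into_tiles, TextureClassifier, interpolate_x2, ComplexSR, stitch_tiles) and an assumed 0.5 decision threshold:

import numpy as np
import torch

def super_resolve_frame(frame, clf, sr_net, tile=64):
    tiles = split_into_tiles(frame, tile)
    out_tiles = []
    with torch.no_grad():
        for t in tiles:
            x = torch.from_numpy(t).permute(2, 0, 1).float().unsqueeze(0) / 255.0
            if clf(x).item() < 0.5:              # simple texture: interpolation branch
                out_tiles.append(interpolate_x2(t))
            else:                                # complex texture: network branch
                y = sr_net(x).squeeze(0).permute(1, 2, 0)
                out_tiles.append((y.clamp(0, 1) * 255).byte().numpy())
    rows = -(-frame.shape[0] // tile)            # ceil(H / tile)
    cols = -(-frame.shape[1] // tile)            # ceil(W / tile)
    return stitch_tiles(out_tiles, rows, cols)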
Based on the same inventive concept as the above method embodiment, the present application embodiment further provides an apparatus for video super-resolution, which, referring to the schematic structural diagram shown in fig. 3, may include a texture classification unit 301, a super-resolution processing unit 302, and a generation unit 303, wherein,
the texture classifying unit 301 is configured to divide a video image to be processed into a plurality of video sub-images to be processed, perform texture classification processing on the plurality of video sub-images to be processed, and determine texture categories of the video sub-images to be processed respectively;
a super-resolution processing unit 302, configured to perform corresponding super-resolution processing on each to-be-processed video sub-image according to the texture type of each to-be-processed video sub-image, so as to obtain a corresponding super-resolution video sub-image;
and the generating unit 303 is configured to perform stitching processing on each obtained super-resolution video sub-image to obtain the corresponding target video image.
Optionally, the texture classifying unit 301 is configured to:
performing feature extraction on each of the plurality of to-be-processed video sub-images to obtain the corresponding texture feature maps;
performing normalization processing on each texture feature map to obtain the texture class prediction rate of each to-be-processed video sub-image;
for each to-be-processed video sub-image, respectively executing the following operation: determining the texture class to which one to-be-processed video sub-image belongs according to the comparison result between the texture class prediction rate of the one to-be-processed video sub-image and a plurality of set thresholds.
Optionally, the super-resolution processing unit 302 performs any one of the following operations on one to-be-processed video sub-image:
performing first super-resolution processing on the video sub-image to be processed according to the texture category of the video sub-image to be processed to obtain a corresponding first super-resolution video sub-image;
and performing second super-resolution processing on the video sub-image to be processed according to the texture category of the video sub-image to be processed to obtain a corresponding second super-resolution video sub-image.
Optionally, the super-resolution processing unit 302 is configured to:
performing interpolation processing on the to-be-processed video sub-image to obtain a corresponding first super-resolution video sub-image; and the pixel density of the to-be-processed video sub-image is smaller than that of the corresponding first super-resolution video sub-image.
Optionally, the interpolation processing is any one of nearest neighbor interpolation, linear interpolation, bilinear interpolation, and bicubic interpolation.
Optionally, the super-resolution processing unit 302 is configured to:
performing feature extraction on the to-be-processed video sub-image to obtain a corresponding image feature map set;
and performing pixel rearrangement processing on each initial pixel point in the image feature map set to obtain a corresponding second super-resolution video sub-image, so that a plurality of initial pixel points at the same position in each image feature map are mapped to a target pixel point on the second super-resolution video sub-image.
For convenience of description, the above parts are separately described as modules (or units) according to functional division. Of course, the functionality of the various modules (or units) may be implemented in the same one or more pieces of software or hardware when implementing the present application.
Having described the method and apparatus for video super-resolution of the exemplary embodiments of the present application, a computer device according to another exemplary embodiment of the present application is next described.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method, or program product. Accordingly, aspects of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
Based on the same inventive concept as the method embodiment described above, a computer device is also provided in the embodiment of the present application, and referring to fig. 4, the computer device 400 may at least include a processor 401 and a memory 402. Wherein the memory 402 stores program code which, when executed by the processor 401, causes the processor 401 to perform the steps of any of the above-mentioned methods of video super-resolution.
In some possible implementations, a computing device according to the present application may include at least one processor, and at least one memory. Wherein the memory stores program code which, when executed by the processor, causes the processor to perform the steps of the method for video super-resolution according to various exemplary embodiments of the present application described above in the present specification. For example, a processor may perform the steps as shown in FIG. 1 a.
A computing device 500 according to this embodiment of the present application is described below with reference to fig. 5. The computing device 500 of fig. 5 is only one example and should not be used to limit the scope of use and functionality of embodiments of the present application.
As shown in fig. 5, computing device 500 is embodied in the form of a general purpose computing device. Components of computing device 500 may include, but are not limited to: the at least one processing unit 501, the at least one memory unit 502, and a bus 503 connecting the various system components (including the memory unit 502 and the processing unit 501).
Bus 503 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a processor or local bus using any of a variety of bus architectures.
The storage unit 502 may include readable media in the form of volatile memory, such as a random access memory (RAM) 5021 and/or a cache memory unit 5022, and may further include a read-only memory (ROM) 5023.
The storage unit 502 may also include a program/utility 5025 having a set (at least one) of program modules 5024, such program modules 5024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Computing device 500 may also communicate with one or more external devices 504 (e.g., a keyboard, a pointing device, etc.), with one or more devices that enable a user to interact with computing device 500, and/or with any devices (e.g., a router, a modem, etc.) that enable computing device 500 to communicate with one or more other computing devices. Such communication may occur through input/output (I/O) interfaces 505. Moreover, computing device 500 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 506. As shown, the network adapter 506 communicates with the other modules of computing device 500 over the bus 503. It should be understood that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with computing device 500, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Based on the same inventive concept as the above-mentioned method embodiments, various aspects of the method for video super-resolution provided by the present application may also be implemented in the form of a program product comprising program code for causing a computer device to perform the steps in the method for video super-resolution according to various exemplary embodiments of the present application described above in this specification when the program product is run on the computer device, for example, the computer device may perform the steps as shown in fig. 1 a.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method for super-resolution of video, comprising:
dividing a video image to be processed into a plurality of video sub-images to be processed, performing texture classification processing on the plurality of video sub-images to be processed, and respectively determining the texture category of each video sub-image to be processed;
performing corresponding super-resolution processing on each video sub-image to be processed according to the texture category of each video sub-image to be processed to obtain a corresponding super-resolution video sub-image;
and splicing the obtained super-resolution video sub-images to obtain corresponding target video images.
2. The method of claim 1, wherein the performing texture classification processing on the plurality of to-be-processed video sub-images to determine the texture class of each to-be-processed video sub-image comprises:
performing feature extraction on each of the plurality of to-be-processed video sub-images to obtain the corresponding texture feature maps;
performing normalization processing on each texture feature map to obtain the texture class prediction rate of each to-be-processed video sub-image;
for each to-be-processed video sub-image, respectively executing the following operation: determining the texture class to which one to-be-processed video sub-image belongs according to the comparison result between the texture class prediction rate of the one to-be-processed video sub-image and a plurality of set thresholds.
3. The method according to claim 1, wherein the respective video sub-images to be processed are subjected to corresponding super-resolution processing according to texture categories of the respective video sub-images to be processed, so as to obtain corresponding super-resolution video sub-images, wherein any one of the following operations is performed for one video sub-image to be processed:
performing first super-resolution processing on the video sub-image to be processed according to the texture category of the video sub-image to be processed to obtain a corresponding first super-resolution video sub-image;
and performing second super-resolution processing on the video sub-image to be processed according to the texture category of the video sub-image to be processed to obtain a corresponding second super-resolution video sub-image.
4. The method of claim 3, wherein said performing the first super-resolution processing on said one to-be-processed video sub-image to obtain a corresponding first super-resolution video sub-image comprises:
performing interpolation processing on the to-be-processed video sub-image to obtain the corresponding first super-resolution video sub-image; and the pixel density of the to-be-processed video sub-image is smaller than that of the corresponding first super-resolution video sub-image.
5. The method according to claim 4, wherein the interpolation process is any one of nearest neighbor interpolation, linear interpolation, bilinear interpolation, bicubic interpolation.
6. The method of claim 3, wherein performing a second super resolution process on said one video sub-image to be processed to obtain a corresponding second super resolution video sub-image comprises:
performing feature extraction on the to-be-processed video sub-image to obtain a corresponding image feature map set;
and performing pixel rearrangement processing on each initial pixel point in the image feature map set to obtain a corresponding second super-resolution video sub-image, so that a plurality of initial pixel points at the same position in each image feature map are mapped to a target pixel point on the second super-resolution video sub-image.
7. An apparatus for super-resolution of video, comprising:
the texture classification unit is used for dividing a video image to be processed into a plurality of video sub-images to be processed, performing texture classification processing on the plurality of video sub-images to be processed and respectively determining the texture category of each video sub-image to be processed;
the super-resolution processing unit is used for performing corresponding super-resolution processing on each to-be-processed video sub-image according to the texture category of each to-be-processed video sub-image to obtain a corresponding super-resolution video sub-image;
and the generating unit is used for splicing all the obtained super-resolution video sub-images to obtain corresponding target video images.
8. The apparatus of claim 7, wherein the texture classification unit is to:
performing feature extraction on each of the plurality of to-be-processed video sub-images to obtain the corresponding texture feature maps;
performing normalization processing on each texture feature map to obtain the texture class prediction rate of each to-be-processed video sub-image;
for each to-be-processed video sub-image, respectively executing the following operation: determining the texture class to which one to-be-processed video sub-image belongs according to the comparison result between the texture class prediction rate of the one to-be-processed video sub-image and a plurality of set thresholds.
9. A computer device, characterized in that it comprises a processor and a memory, wherein the memory stores program code which, when executed by the processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that it comprises program code which, when run on a computer device, causes the computer device to carry out the steps of the method according to any one of claims 1 to 6.
CN202110823352.7A 2021-07-21 2021-07-21 Video super-resolution method and device Pending CN113596576A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110823352.7A CN113596576A (en) 2021-07-21 2021-07-21 Video super-resolution method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110823352.7A CN113596576A (en) 2021-07-21 2021-07-21 Video super-resolution method and device

Publications (1)

Publication Number Publication Date
CN113596576A true CN113596576A (en) 2021-11-02

Family

ID=78248568

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110823352.7A Pending CN113596576A (en) 2021-07-21 2021-07-21 Video super-resolution method and device

Country Status (1)

Country Link
CN (1) CN113596576A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102592129A (en) * 2012-01-02 2012-07-18 西安电子科技大学 Scenario-driven image characteristic point selection method for smart phone
CN108734659A (en) * 2018-05-17 2018-11-02 华中科技大学 A kind of sub-pix convolved image super resolution ratio reconstruction method based on multiple dimensioned label
CN112862681A (en) * 2021-01-29 2021-05-28 中国科学院深圳先进技术研究院 Super-resolution method, device, terminal equipment and storage medium
CN112580753A (en) * 2021-02-24 2021-03-30 杭州科技职业技术学院 Emergency forced landing position determining method based on texture feature mask
CN112597983A (en) * 2021-03-04 2021-04-02 湖南航天捷诚电子装备有限责任公司 Method for identifying target object in remote sensing image and storage medium and system thereof

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114900717A (en) * 2022-05-13 2022-08-12 杭州网易智企科技有限公司 Video data transmission method, device, medium and computing equipment
CN114900717B (en) * 2022-05-13 2023-09-26 杭州网易智企科技有限公司 Video data transmission method, device, medium and computing equipment
CN118134765A (en) * 2024-04-30 2024-06-04 国家超级计算天津中心 Image processing method, apparatus and storage medium

Similar Documents

Publication Publication Date Title
Yu et al. Path-restore: Learning network path selection for image restoration
WO2021135254A1 (en) License plate number recognition method and apparatus, electronic device, and storage medium
CN112308200B (en) Searching method and device for neural network
CN110136062B (en) Super-resolution reconstruction method combining semantic segmentation
US20190138816A1 (en) Method and apparatus for segmenting video object, electronic device, and storage medium
CN109951635B (en) Photographing processing method and device, mobile terminal and storage medium
CN112950471A (en) Video super-resolution processing method and device, super-resolution reconstruction model and medium
CN112634296A (en) RGB-D image semantic segmentation method and terminal for guiding edge information distillation through door mechanism
CN111079507B (en) Behavior recognition method and device, computer device and readable storage medium
CN113596576A (en) Video super-resolution method and device
TW202037145A (en) Method, device for video processing and computer storage medium thereof
CN111340820B (en) Image segmentation method and device, electronic equipment and storage medium
CN110855957B (en) Image processing method and device, storage medium and electronic equipment
CN111105375A (en) Image generation method, model training method and device thereof, and electronic equipment
CN113724136B (en) Video restoration method, device and medium
CN111951192A (en) Shot image processing method and shooting equipment
CN110503002B (en) Face detection method and storage medium
Li et al. NDNet: Spacewise multiscale representation learning via neighbor decoupling for real-time driving scene parsing
CN111353965A (en) Image restoration method, device, terminal and storage medium
CN113807354B (en) Image semantic segmentation method, device, equipment and storage medium
CN111726621B (en) Video conversion method and device
Yu et al. MagConv: Mask-guided convolution for image inpainting
WO2024074042A1 (en) Data storage method and apparatus, data reading method and apparatus, and device
Lyu et al. Cascaded parallel crowd counting network with multi-resolution collaborative representation
WO2023206343A1 (en) Image super-resolution method based on image pre-training strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination