CN114897688A - Video processing method, video processing device, computer equipment and medium - Google Patents


Info

Publication number
CN114897688A
CN114897688A
Authority
CN
China
Prior art keywords: pair, sample, resolution, image, video
Prior art date
Legal status
Pending
Application number
CN202210458499.5A
Other languages
Chinese (zh)
Inventor
磯部駿
陶鑫
戴宇荣
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202210458499.5A priority Critical patent/CN114897688A/en
Publication of CN114897688A publication Critical patent/CN114897688A/en
Pending legal-status Critical Current

Classifications

    (All codes fall under G PHYSICS; G06 COMPUTING; G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL.)
    • G06T 3/4053: Super resolution, i.e. output image resolution higher than sensor resolution
    • G06T 5/50: Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T 7/223: Analysis of motion using block-matching
    • G06T 7/254: Analysis of motion involving subtraction of images
    • G06T 7/269: Analysis of motion using gradient-based methods
    • G06T 2207/10016: Video; image sequence
    • G06T 2207/20212: Image combination
    • G06T 2207/20224: Image subtraction
    • G06T 2207/30168: Image quality inspection

Abstract

The disclosure relates to a video processing method and apparatus, a computer device, and a medium, and belongs to the field of internet technologies. In the disclosed embodiments, motion information of an image pair in a sample video is acquired, and a sample image block pair whose motion information satisfies a motion condition is obtained from the image pair. By setting a motion condition and judging whether the motion between the two frames of the image pair satisfies it, image blocks containing more motion information can be obtained. Model training is then performed based on the obtained image block pairs, so that the super-resolution model can focus on the motion changes between adjacent frames. This enriches the information referenced during model training, allows a highly accurate super-resolution model to be trained, and thereby improves the accuracy of super-resolution reconstruction and of video processing.

Description

Video processing method, video processing device, computer equipment and medium
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to a video processing method and apparatus, a computer device, and a medium.
Background
In the field of video technology, super-resolution reconstruction has a wide range of applications and research significance: it reconstructs, from a single image, a high-resolution image at the corresponding moment with higher pixel density and more complete details. With the development of deep learning, super-resolution reconstruction based on neural networks has advanced rapidly.
At present, before super-resolution reconstruction is performed on a video with a neural network, the network must be trained to obtain a super-resolution model with a super-resolution reconstruction function. The training data are usually obtained by randomly cropping image blocks from a number of sample videos, and those image blocks are then used as the training data for model training.
However, because of the uncertainty of random cropping, image blocks that carry little information are often selected. A super-resolution model trained on such blocks has low accuracy, which reduces the accuracy of super-resolution reconstruction and hence of video processing.
Disclosure of Invention
The present disclosure provides a video processing method, apparatus, computer device, and medium that can train a highly accurate super-resolution model, thereby improving the accuracy of super-resolution reconstruction and of video processing. The technical solution of the disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a video processing method, including:
acquiring motion information of an image pair in a sample video, wherein the image pair comprises a pair of adjacent images in the sample video, and the motion information represents pixel motion between two frames of images of the image pair;
obtaining a sample image block pair with motion information meeting motion conditions from the image pair, wherein two sample image blocks in the sample image block pair are respectively positioned in two frames of images of the image pair;
and performing model training based on a plurality of pairs of sample image blocks in the sample video to obtain a super-resolution model, wherein the super-resolution model is used for performing super-resolution reconstruction on the video.
In the disclosed embodiments, motion information of an image pair in the sample video is acquired, and a sample image block pair whose motion information satisfies a motion condition is obtained from the image pair. By setting a motion condition and judging whether the motion between the two frames of the image pair satisfies it, image blocks containing more motion information can be obtained. Model training is then performed based on the obtained image block pairs, so that the super-resolution model can focus on the motion changes between adjacent frames. This enriches the information referenced during model training, allows a highly accurate super-resolution model to be trained, and thereby improves the accuracy of super-resolution reconstruction and of video processing.
In some embodiments, obtaining motion information for an image pair in a sample video comprises any one of:
acquiring optical flow information of an image pair in the sample video, wherein the optical flow information represents the amount by which pixels of the earlier frame of the image pair move to reach the later frame; or acquiring temporal difference information of the image pair in the sample video, wherein the temporal difference information represents the motion change of the two frames of the image pair over time.
The disclosed embodiments thus provide two ways of acquiring the motion information of an image pair, an optical-flow method and a temporal-difference method, which can acquire the motion information quickly and make the acquisition more flexible. The optical flow information represents the amount by which pixels of the earlier frame move to reach the later frame, and therefore also characterizes the pixel motion between the two frames, so it can later be used to judge the motion condition. Likewise, the temporal difference information represents the motion change of the two frames over time, also characterizes the pixel motion between them, and can likewise be used to judge the motion condition.
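As an illustrative sketch (not the patent's exact formulation), the temporal-difference variant can be computed as a per-pixel frame difference, and a dense optical-flow field, obtained from any off-the-shelf estimator, can be reduced to a per-pixel motion magnitude; the function names here are assumptions:

```python
import numpy as np

def temporal_difference(prev_frame: np.ndarray, next_frame: np.ndarray) -> np.ndarray:
    """Per-pixel temporal difference between two adjacent frames.

    Frames are H x W (grayscale) or H x W x C arrays; the result is an
    H x W map whose magnitude reflects how much each pixel changed.
    """
    diff = next_frame.astype(np.float32) - prev_frame.astype(np.float32)
    if diff.ndim == 3:  # average over colour channels
        diff = diff.mean(axis=2)
    return np.abs(diff)

def flow_magnitude(flow: np.ndarray) -> np.ndarray:
    """Per-pixel magnitude of a dense optical-flow field of shape H x W x 2."""
    return np.sqrt(flow[..., 0] ** 2 + flow[..., 1] ** 2)
```

Either map can then serve as the per-pixel motion information against which the motion condition is judged.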
In some embodiments, the motion information includes a motion parameter indicative of a degree of pixel motion between two images of the image pair; from the pair of images, obtaining a pair of sample image blocks whose motion information satisfies a motion condition includes:
and acquiring a moving image block pair with a motion parameter larger than or equal to a first threshold and a static image block pair with a motion parameter smaller than a second threshold from the image pair, wherein the first threshold is larger than or equal to the second threshold.
In the disclosed embodiments, a motion parameter is set so that it can later be used to judge whether the motion information of the two frames of an image pair satisfies the motion condition, which makes the judgment more efficient and speeds up the acquisition of sample image block pairs. Setting the motion condition that the motion parameter is greater than or equal to the first threshold yields moving image block pairs; since these contain pixels in motion, training on them lets the super-resolution model focus on the motion changes between adjacent frames. Setting the motion condition that the motion parameter is less than the second threshold yields still image block pairs; since these contain stationary pixels, training on them lets the model focus on the structural information of the images. Together they enrich the information referenced during training, allow a highly accurate super-resolution model to be trained, and thereby improve the accuracy of super-resolution reconstruction and of video processing.
In some embodiments, obtaining, from the pair of images, a pair of moving image blocks having a motion parameter greater than or equal to a first threshold and a pair of still image blocks having a motion parameter less than a second threshold comprises: determining, from the image pair, a first pixel whose motion parameter is greater than or equal to the first threshold, and cropping an image block pair of a target size starting from the first pixel to obtain the moving image block pair; and determining a second pixel whose motion parameter is less than the second threshold, and cropping an image block pair of the target size starting from the second pixel to obtain the still image block pair.
In the disclosed embodiments, to obtain a moving image block pair, the first pixel is determined first and an image block pair of the target size is then cropped starting from it; the still image block pair is obtained in the same way from the second pixel. Both pairs can therefore be obtained quickly, improving the efficiency of acquisition.
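A minimal sketch of this cropping step, assuming a per-pixel motion map and the two thresholds are already available; the names, the first-match pixel choice, and the clamping of the crop to the image bounds are illustrative, not prescribed by the patent:

```python
import numpy as np

def crop_pair(img_a, img_b, top, left, size):
    """Crop the same size x size block from both frames of an image pair."""
    return (img_a[top:top + size, left:left + size],
            img_b[top:top + size, left:left + size])

def sample_block_pairs(img_a, img_b, motion_map, t_move, t_still, size):
    """Return one moving and one still sample image block pair.

    motion_map is an H x W per-pixel motion parameter (same size as the
    frames); t_move >= t_still. The chosen pixel is used as the top-left
    starting point of the crop, clamped so the block stays in bounds."""
    h, w = motion_map.shape

    def first_pixel(mask):
        ys, xs = np.nonzero(mask)
        if len(ys) == 0:
            return None
        return min(ys[0], h - size), min(xs[0], w - size)

    pairs = {}
    for name, mask in (("moving", motion_map >= t_move),
                       ("still", motion_map < t_still)):
        start = first_pixel(mask)
        if start is not None:
            pairs[name] = crop_pair(img_a, img_b, *start, size)
    return pairs
```

Because both crops use identical coordinates, the two blocks of each pair stay spatially aligned across the two frames.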
In some embodiments, the first threshold and the second threshold are determined based on motion parameters of a plurality of pixels included in any one of the pair of images.
In the disclosed embodiments, the first threshold for moving image block pairs and the second threshold for still image block pairs are determined from the motion parameters of the pixels of either frame of the image pair, so that whether a pixel is moving or stationary is judged against thresholds derived from the image itself, improving the accuracy of the motion-condition judgment.
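One plausible instantiation (the percentile choice is an assumption; the patent only requires that the thresholds be derived from the pixels' motion parameters, with the first threshold at least the second):

```python
import numpy as np

def motion_thresholds(motion_map, hi_pct=75.0, lo_pct=25.0):
    """Derive (first_threshold, second_threshold) from the distribution of
    per-pixel motion parameters of one frame of the image pair.

    hi_pct / lo_pct are illustrative percentile choices; any statistic of
    the motion parameters satisfying first >= second would fit the text."""
    first = float(np.percentile(motion_map, hi_pct))
    second = float(np.percentile(motion_map, lo_pct))
    return first, second
```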
In some embodiments, the number of pairs of moving image blocks is the same as the number of pairs of still image blocks.
In the disclosed embodiments, equal numbers of moving and still image block pairs are obtained and later used together as training data for the super-resolution model. Because moving image block pairs contain pixels in motion and still image block pairs contain stationary pixels, training on both lets the super-resolution model focus on the motion changes between adjacent frames as well as on the structural information of the images. This enriches the information referenced during training, allows a highly accurate super-resolution model to be trained, and thereby improves the accuracy of super-resolution reconstruction and of video processing.
In some embodiments, the sample video includes sample video at a first resolution and sample video at a second resolution, the first resolution being less than the second resolution; from the pair of images, obtaining a pair of sample image blocks whose motion information satisfies a motion condition includes:
acquiring a first sample image block pair of which the motion information meets the motion condition from the image pair of the sample video with the first resolution, and acquiring a second sample image block pair of which the motion information meets the motion condition from the image pair of the sample video with the second resolution, wherein the first sample image block pair is the sample image block pair with the first resolution, and the second sample image block pair is the sample image block pair with the second resolution;
performing model training based on a plurality of pairs of sample image blocks in the sample video to obtain a super-resolution model, wherein the method comprises the following steps:
and performing model training based on the plurality of pairs of the first sample image block pairs in the sample video with the first resolution and the plurality of pairs of the second sample image block pairs in the sample video with the second resolution to obtain the super-resolution model.
In the disclosed embodiments, sample image block pairs at the first resolution are obtained from the sample video at the first resolution, and sample image block pairs at the second resolution from the sample video at the second resolution, so that both can subsequently be used together as training data to obtain the super-resolution model.
In some embodiments, obtaining the first sample image block pair whose motion information satisfies the motion condition from the pair of images of the sample video at the first resolution, and obtaining the second sample image block pair whose motion information satisfies the motion condition from the pair of images of the sample video at the second resolution includes any one of:
acquiring the first sample image block pair from the image pair of the sample video with the first resolution based on the motion information of the image pair in the sample video with the first resolution, and acquiring the second sample image block pair from the image pair of the sample video with the second resolution based on the first sample image block pair;
acquiring the second sample image block pair from the image pair of the sample video with the second resolution based on the motion information of the image pair in the sample video with the second resolution, and acquiring the first sample image block pair from the image pair of the sample video with the first resolution based on the second sample image block pair;
the method includes the steps of conducting up-sampling processing on the basis of motion information of an image pair in a sample video with the first resolution to obtain motion information of the image pair in the sample video with the second resolution, obtaining a second sample image block pair from the image pair of the sample video with the second resolution on the basis of the motion information of the image pair in the sample video with the second resolution, and obtaining a first sample image block pair from the image pair of the sample video with the first resolution on the basis of the second sample image block pair.
In the embodiment of the present disclosure, three ways of obtaining the first sample image block pair and the second sample image block pair are provided, so that the flexibility of obtaining the first sample image block pair and the second sample image block pair is improved.
In some embodiments, obtaining the second sample image block pair from the pair of images of the sample video of the second resolution based on the first sample image block pair comprises: acquiring the second sample image block pair from the image pair of the sample video with the second resolution based on a resolution factor and the first sample image block pair, wherein the resolution factor represents a resolution ratio of the second resolution to the first resolution;
obtaining the first sample image block pair from the pair of images of the sample video of the first resolution based on the second sample image block pair comprises: acquiring the first sample image block pair from the image pair of the sample video at the first resolution based on the resolution factor and the second sample image block pair.
In the disclosed embodiments, once the first sample image block pair has been acquired, the resolution factor can be used to convert it from the first resolution to the second resolution, i.e. to locate the second sample image block pair; conversely, once the second sample image block pair has been acquired, the resolution factor can be used to convert it back to the first resolution, i.e. to locate the first sample image block pair. This improves the efficiency of acquiring sample image block pairs.
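The coordinate conversion implied here can be sketched as follows, with integer block coordinates and an integer resolution factor assumed for simplicity (function names are this sketch's, not the patent's):

```python
def map_block_to_hr(lr_top, lr_left, lr_size, factor):
    """Map a low-resolution block's position and size to the corresponding
    high-resolution block using the resolution factor (HR / LR ratio)."""
    return lr_top * factor, lr_left * factor, lr_size * factor

def map_block_to_lr(hr_top, hr_left, hr_size, factor):
    """Inverse mapping: HR block coordinates back to the LR frame."""
    return hr_top // factor, hr_left // factor, hr_size // factor
```

For example, with a resolution factor of 4, a 32 x 32 LR block at (10, 20) corresponds to a 128 x 128 HR block at (40, 80).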
In some embodiments, performing model training based on the plurality of pairs of the first sample image block pairs in the sample video of the first resolution and the plurality of pairs of the second sample image block pairs in the sample video of the second resolution, obtaining the super-resolution model includes:
in the ith iteration process of model training, inputting a plurality of pairs of the first sample image blocks into the super-resolution model determined in the (i-1) th iteration process to obtain an image training result of the ith iteration process, wherein i is a positive integer greater than 1;
and adjusting the model parameters of the super-resolution model determined in the (i-1) th iteration process based on the image training result of the ith iteration process and the plurality of pairs of second sample image blocks, performing the (i + 1) th iteration process based on the adjusted model parameters, and repeating the training iteration process until the training meets the target condition.
In the disclosed embodiments, a network model with better parameters is obtained as the super-resolution model through iterative training, yielding a model with stronger super-resolution reconstruction capability and thus higher accuracy.
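The loop structure can be illustrated with a deliberately tiny stand-in model; a real super-resolution model would be a neural network, and everything below (the single gain parameter, nearest-neighbour upsampling, the convergence test) is an assumption for illustration only:

```python
import numpy as np

def train_sr_model(lr_blocks, hr_blocks, num_iters=100, step=0.1):
    """Toy stand-in for the iterative loop: fit a single per-pixel gain
    applied after nearest-neighbour upsampling, by gradient descent on MSE.

    Mirrors the loop in the text: predict from the first sample block
    pairs, compare with the second sample block pairs, adjust the model
    parameter, and repeat until a target condition is met."""
    factor = hr_blocks[0].shape[0] // lr_blocks[0].shape[0]
    gain = 1.0  # the single "model parameter"
    for _ in range(num_iters):
        grad = 0.0
        for lr_b, hr_b in zip(lr_blocks, hr_blocks):
            up = np.kron(lr_b, np.ones((factor, factor)))  # naive upsample
            pred = gain * up
            grad += 2.0 * np.mean((pred - hr_b) * up)  # d(MSE)/d(gain)
        grad /= len(lr_blocks)
        gain -= step * grad
        if abs(grad) < 1e-8:  # "training meets the target condition"
            break
    return gain
```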
In some embodiments, before performing model training based on a plurality of pairs of the sample image blocks in the sample video to obtain a super-resolution model, the method further includes any one of:
performing mirror image inversion on pixels at the inner edge position of the sample image block pair, and adding the pixels after mirror image inversion to the outer edge position of the sample image block pair;
copying pixels on the inner edge position of the sample image block pair, and adding the copied pixels to the outer edge position of the sample image block pair;
at the outer edge positions of the pair of sample image blocks, zero pixels are added.
In the disclosed embodiments, the pixels at the outer edges of the sample image block pairs are expanded by mirror inversion, pixel copying, or zero padding, which increases the pixel information available at the image edges. Subsequent model training on the expanded sample image block pairs makes the super-resolution model pay more attention to reconstructing image edges and avoids noise there, so that a highly accurate super-resolution model can be trained, which in turn improves the accuracy of super-resolution reconstruction and of video processing.
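All three expansion options map directly onto standard padding modes; a minimal sketch using NumPy's `pad` (the mode names here are this sketch's, not the patent's):

```python
import numpy as np

def pad_block(block, pad, mode):
    """Expand a sample image block at its outer edge.

    mode 'mirror' reflects the pixels at the inner edge, 'copy'
    replicates them, and 'zero' adds zero pixels, matching the three
    options listed above."""
    np_mode = {"mirror": "reflect", "copy": "edge", "zero": "constant"}[mode]
    return np.pad(block, pad, mode=np_mode)
```

Applying the same padding to both blocks of a pair keeps them aligned for training.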
In some embodiments, after model training is performed on a plurality of pairs of sample image blocks in the sample video to obtain a super-resolution model, the method further includes: performing super-resolution reconstruction on the video through the super-resolution model to obtain an initial super-resolution video; and, for any frame of the super-resolution video, cropping the pixels at the edges of the frame and generating a target super-resolution video based on the cropped frames.
In the disclosed embodiments, after super-resolution reconstruction of the video, the pixels at the edges of each frame of the output initial super-resolution video are cropped to remove redundant pixels, yielding the target super-resolution video.
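A minimal sketch of the edge-cropping step, assuming a fixed border width (the function name and the symmetric border are assumptions):

```python
import numpy as np

def crop_border(frame, border):
    """Cut `border` pixels from every edge of a reconstructed frame,
    removing the redundant edge pixels before assembling the output video."""
    return frame[border:frame.shape[0] - border,
                 border:frame.shape[1] - border]
```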
According to a second aspect of the embodiments of the present disclosure, there is provided a video processing apparatus, the apparatus including:
an information acquisition unit configured to perform acquisition of motion information of a pair of images in a sample video, the pair of images including a pair of adjacent images in the sample video, the motion information representing a pixel motion situation between two frames of images of the pair of images;
an image acquisition unit configured to perform acquisition of a sample image block pair, of which motion information satisfies a motion condition, from the image pair, two sample image blocks of the sample image block pair being respectively located in two frames of images of the image pair;
and the training unit is configured to perform model training based on a plurality of pairs of sample image blocks in the sample video to obtain a super-resolution model, and the super-resolution model is used for performing super-resolution reconstruction on the video.
In some embodiments, the information obtaining unit is configured to perform any one of: acquiring optical flow information of an image pair in the sample video, wherein the optical flow information represents the amount by which pixels of the earlier frame of the image pair move to reach the later frame; or acquiring temporal difference information of the image pair in the sample video, wherein the temporal difference information represents the motion change of the two frames of the image pair over time.
In some embodiments, the motion information includes a motion parameter indicative of a degree of pixel motion between two images of the image pair; the image acquisition unit is configured to perform acquisition of a moving image block pair having a motion parameter greater than or equal to a first threshold value and a still image block pair having a motion parameter less than a second threshold value from the image pair, the first threshold value being greater than or equal to the second threshold value.
In some embodiments, the image acquisition unit is configured to perform: determining, from the image pair, a first pixel whose motion parameter is greater than or equal to the first threshold, and cropping an image block pair of a target size starting from the first pixel to obtain the moving image block pair; and determining a second pixel whose motion parameter is less than the second threshold, and cropping an image block pair of the target size starting from the second pixel to obtain the still image block pair.
In some embodiments, the first threshold and the second threshold are determined based on motion parameters of a plurality of pixels included in any one of the pair of images.
In some embodiments, the number of pairs of moving image blocks is the same as the number of pairs of still image blocks.
In some embodiments, the sample video includes sample video at a first resolution and sample video at a second resolution, the first resolution being less than the second resolution; the image acquisition unit is configured to perform acquiring a first sample image block pair of which the motion information satisfies the motion condition from the image pair of the sample video of the first resolution, and acquiring a second sample image block pair of which the motion information satisfies the motion condition from the image pair of the sample video of the second resolution, wherein the first sample image block pair is the sample image block pair of the first resolution, and the second sample image block pair is the sample image block pair of the second resolution; the training unit is configured to perform model training based on the plurality of pairs of the first sample image block pairs in the sample video with the first resolution and the plurality of pairs of the second sample image block pairs in the sample video with the second resolution to obtain the super-resolution model.
In some embodiments, the image acquisition unit comprises any one of: a first image obtaining subunit configured to perform obtaining the first sample image block pair from the image pair of the sample video of the first resolution based on the motion information of the image pair in the sample video of the first resolution, and obtaining the second sample image block pair from the image pair of the sample video of the second resolution based on the first sample image block pair; a second image obtaining subunit configured to perform obtaining, based on the motion information of the pair of images in the sample video at the second resolution, the pair of second sample image blocks from the pair of images in the sample video at the second resolution, and obtaining, based on the pair of second sample image blocks, the pair of first sample image blocks from the pair of images in the sample video at the first resolution; the third image acquisition subunit is configured to perform upsampling processing based on the motion information of the image pair in the sample video with the first resolution to obtain the motion information of the image pair in the sample video with the second resolution, acquire the second sample image block pair from the image pair of the sample video with the second resolution based on the motion information of the image pair in the sample video with the second resolution, and acquire the first sample image block pair from the image pair of the sample video with the first resolution based on the second sample image block pair.
In some embodiments, the first image obtaining subunit is configured to acquire the second sample image block pair from the image pair of the sample video at the second resolution based on a resolution factor and the first sample image block pair, the resolution factor representing the ratio of the second resolution to the first resolution; and the second image obtaining subunit is configured to acquire the first sample image block pair from the image pair of the sample video at the first resolution based on the resolution factor and the second sample image block pair.
In some embodiments, the training unit is configured to perform: in the ith iteration process of model training, inputting a plurality of pairs of the first sample image blocks into the super-resolution model determined in the (i-1) th iteration process to obtain an image training result of the ith iteration process, wherein i is a positive integer greater than 1; and adjusting the model parameters of the super-resolution model determined in the (i-1) th iteration process based on the image training result of the ith iteration process and the plurality of pairs of second sample image blocks, performing the (i + 1) th iteration process based on the adjusted model parameters, and repeating the training iteration process until the training meets the target condition.
In some embodiments, the apparatus further comprises any one of: a mirror inversion unit configured to perform mirror inversion of pixels at an inner edge position of the pair of sample image blocks, the mirror-inverted pixels being added to an outer edge position of the pair of sample image blocks; a copying unit configured to copy pixels at an inner edge position of the pair of sample image blocks, and add the copied pixels to an outer edge position of the pair of sample image blocks; an adding unit configured to perform adding zero pixels at an outer edge position of the pair of sample image blocks.
In some embodiments, the apparatus further comprises: a super-resolution unit configured to perform super-resolution reconstruction on the video through the super-resolution model to obtain an initial super-resolution video; and a cropping unit configured to crop the pixels at the edges of any frame of the super-resolution video and generate the target super-resolution video based on the cropped frames.
According to a third aspect of embodiments of the present disclosure, there is provided a computer apparatus comprising:
one or more processors;
a memory for storing the processor executable program code;
wherein the processor is configured to execute the program code to implement the video processing method described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium including: the program code in the computer readable storage medium, when executed by a processor of a computer device, enables the computer device to perform the video processing method described above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the video processing method described above.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a schematic diagram of an implementation environment for a video processing method according to an example embodiment;
FIG. 2 is a flow diagram illustrating a video processing method according to an exemplary embodiment;
FIG. 3 is a flow diagram illustrating a video processing method according to an exemplary embodiment;
FIG. 4 is a schematic diagram illustrating an optical flow graph in accordance with an exemplary embodiment;
FIG. 5 is a schematic diagram illustrating an image block in accordance with an exemplary embodiment;
FIG. 6 is a diagram illustrating results of a super-resolution test according to an exemplary embodiment;
FIG. 7 is a block diagram illustrating a video processing device according to an example embodiment;
FIG. 8 is a block diagram illustrating a terminal in accordance with an exemplary embodiment;
FIG. 9 is a block diagram illustrating a server in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
It should be noted that the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, displayed data, etc.), and signals involved in the embodiments of the present disclosure are authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the motion information involved in the embodiments of the present disclosure is obtained with sufficient authorization. In some embodiments, a permission query page is provided for querying whether to grant permission to acquire the information; the permission query page displays a grant-authorization control and a deny-authorization control, and when a trigger operation on the grant-authorization control is detected, the information is acquired by using the video processing method provided in the embodiments of the present disclosure, so as to implement training of the super-resolution model.
The video processing method provided by the embodiments of the present disclosure can be applied to super-resolution reconstruction scenarios of video, for example, super-resolution reconstruction of live video, super-resolution reconstruction of game video, and the like. In some embodiments, the video processing method may be applied in the training phase of a super-resolution model, to train a super-resolution model with super-resolution reconstruction capability. Super-resolution reconstruction uses an image of a first resolution to reconstruct, for the corresponding moment, an image of a second resolution with higher pixel density and more complete details, the first resolution being smaller than the second resolution. Accordingly, super-resolution reconstruction of a video uses a video of the first resolution to reconstruct a video of the second resolution with higher pixel density and more complete details at the corresponding moments. It should be appreciated that a high-resolution image provides more information, and its information is easier to further mine and process than that of a low-resolution image.
In some embodiments, in a video transmission scene, a video with a first resolution may be transmitted, and after receiving the video with the first resolution, super-resolution reconstruction is performed by using a super-resolution model obtained by training in the embodiments of the present disclosure to obtain a video with a higher resolution, so that resources consumed for video transmission can be effectively reduced. In still other embodiments, in a video rendering scene, a video with a first resolution may be rendered, and then when the video with the first resolution is displayed, super-resolution reconstruction is performed by using the super-resolution model trained according to the embodiments of the present disclosure, so as to obtain a video with a higher resolution, and thus, resources consumed by video rendering can be effectively reduced.
Fig. 1 is a schematic diagram of an implementation environment of a video processing method according to an exemplary embodiment, and referring to fig. 1, the implementation environment includes: a terminal 101 and a server 102.
The terminal 101 may be at least one of a smartphone, a smart watch, a desktop computer, a laptop computer, a virtual reality terminal, an augmented reality terminal, a wireless terminal, and the like. The terminal 101 has a communication function and can access a wired or wireless network. The terminal 101 may be generally referred to as one of a plurality of terminals, and this embodiment is only illustrated with the terminal 101. Those skilled in the art will appreciate that the number of terminals may be greater or fewer.
The server 102 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. The number of servers 102 may be greater or fewer, which is not limited in the embodiments of the present disclosure. Of course, the server 102 may also include other functional servers to provide more comprehensive and diverse services.
In some embodiments, the video processing method provided by the embodiment of the present disclosure is executed by the terminal 101, for example, after the terminal 101 detects a processing operation on a sample video, a sample image block pair whose motion information satisfies a motion condition in an image pair of the sample video is obtained by using the video processing method provided by the embodiment of the present disclosure, and then model training is performed based on multiple pairs of sample image block pairs in the sample video to obtain a super-resolution model.
In other embodiments, the video processing method provided by the embodiment of the present disclosure is executed by the server 102, for example, after the server 102 receives a processing request for a sample video, a sample image block pair whose motion information satisfies a motion condition in an image pair of the sample video is obtained by using the video processing method provided by the embodiment of the present disclosure, and then model training is performed based on a plurality of pairs of sample image block pairs in the sample video to obtain a super-resolution model; or, the server 102 periodically obtains sample image block pairs, of which the motion information satisfies the motion condition, from the image pairs of the sample video, and then performs model training based on the pairs of sample image block pairs in the sample video to obtain the super-resolution model.
In other embodiments, the server 102 and the terminal 101 are connected directly or indirectly through wired or wireless communication, which is not limited in the embodiments of the present disclosure. Accordingly, in some embodiments, if the terminal 101 detects a processing operation on the sample video, it sends a processing request for the sample video to the server 102 to request the server 102 to obtain, by using the video processing method provided in the embodiments of the present disclosure, the sample image block pairs whose motion information satisfies the motion condition from the image pairs of the sample video, and then perform model training based on multiple pairs of sample image blocks in the sample video to obtain the super-resolution model. Hereinafter, the terminal 101 or the server 102 is referred to as the computer device in the embodiments of the present disclosure.
Fig. 2 is a flow chart illustrating a video processing method according to an exemplary embodiment, as shown in fig. 2, the method being performed by a computer device, which may be provided as the terminal or the server shown in fig. 1, described above, and illustratively, the method includes the steps of:
in step 201, a computer device obtains motion information for a pair of images in a sample video, the pair of images including a pair of adjacent images in the sample video, the motion information indicating pixel motion between two frames of the pair of images.
In step 202, the computer device obtains, from the pair of images, a pair of sample image blocks whose motion information satisfies a motion condition, where two sample image blocks in the pair of sample image blocks are located in two frames of the pair of images, respectively.
In step 203, the computer device performs model training based on the pairs of sample image blocks in the sample video to obtain a super-resolution model, where the super-resolution model is used to perform super-resolution reconstruction on the video.
According to the technical solution provided by the embodiments of the present disclosure, the motion information of the image pairs in the sample video is obtained, and the image block pairs whose motion information satisfies the motion condition are extracted from those image pairs. By setting a motion condition and using the motion information of the two frames of an image pair to judge whether that condition is satisfied, image blocks containing more motion information can be obtained. Model training is then performed based on the obtained image block pairs, so that the super-resolution model can focus on the motion changes between adjacent frames. This enriches the information referenced during model training, so that a highly accurate super-resolution model can be trained, which improves the accuracy of super-resolution reconstruction and thus the accuracy of video processing.
In some embodiments, obtaining motion information for an image pair in a sample video comprises any one of:
acquiring optical flow information of the image pair in the sample video, wherein the optical flow information represents the amount by which pixels of the previous frame image in the image pair move to reach the next frame image; or acquiring time-series difference information of the image pair in the sample video, wherein the time-series difference information represents the motion change of the two frames of the image pair over time.
In some embodiments, the motion information includes a motion parameter indicative of a degree of pixel motion between two images of the image pair; from the pair of images, obtaining a pair of sample image blocks whose motion information satisfies a motion condition includes:
acquiring, from the image pair, a moving image block pair whose motion parameter is greater than or equal to a first threshold and a static image block pair whose motion parameter is less than a second threshold, wherein the first threshold is greater than or equal to the second threshold.
In some embodiments, obtaining, from the pair of images, a pair of moving image blocks having a motion parameter greater than or equal to a first threshold and a pair of still image blocks having a motion parameter less than a second threshold comprises: determining, from the image pair, a first pixel whose motion parameter is greater than or equal to the first threshold, and cropping an image block pair of a target size starting from the first pixel to obtain the moving image block pair; and determining a second pixel whose motion parameter is less than the second threshold, and cropping an image block pair of the target size starting from the second pixel to obtain the static image block pair.
In some embodiments, the first threshold and the second threshold are determined based on motion parameters of a plurality of pixels included in any one of the pair of images.
In some embodiments, the number of pairs of moving image blocks is the same as the number of pairs of still image blocks.
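The thresholded selection of moving and static block pairs described above can be sketched as follows. This is an illustrative sketch only, assuming NumPy, two frames held as 2-D arrays, and a per-pixel motion-parameter map (for example, optical-flow magnitude); the function names are hypothetical, and for brevity only the first qualifying pixel of each kind is used as a crop starting point:

```python
import numpy as np

def crop_pair(frame_a, frame_b, y, x, size):
    """Crop a block of the target size from both frames, starting at (y, x)."""
    return frame_a[y:y + size, x:x + size], frame_b[y:y + size, x:x + size]

def select_block_pairs(frame_a, frame_b, motion, t1, t2, size):
    """Pick one moving pair (motion >= t1) and one static pair (motion < t2).

    `motion` is a per-pixel motion-parameter map; t1 >= t2, as required by
    the motion condition in the disclosure.
    """
    h, w = motion.shape
    moving, static = None, None
    # Only consider starting points where a full target-size block fits.
    for y in range(h - size + 1):
        for x in range(w - size + 1):
            if moving is None and motion[y, x] >= t1:
                moving = crop_pair(frame_a, frame_b, y, x, size)
            if static is None and motion[y, x] < t2:
                static = crop_pair(frame_a, frame_b, y, x, size)
    return moving, static
```

In practice one would sample many such starting points (keeping the counts of moving and static pairs equal, as stated above) rather than only the first of each.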
In some embodiments, the sample video includes sample video at a first resolution and sample video at a second resolution, the first resolution being less than the second resolution; from the pair of images, obtaining a pair of sample image blocks whose motion information satisfies a motion condition includes:
acquiring a first sample image block pair of which the motion information meets the motion condition from the image pair of the sample video with the first resolution, and acquiring a second sample image block pair of which the motion information meets the motion condition from the image pair of the sample video with the second resolution, wherein the first sample image block pair is the sample image block pair with the first resolution, and the second sample image block pair is the sample image block pair with the second resolution;
performing model training based on a plurality of pairs of sample image blocks in the sample video to obtain a super-resolution model, wherein the method comprises the following steps:
performing model training based on the plurality of pairs of first sample image blocks in the sample video of the first resolution and the plurality of pairs of second sample image blocks in the sample video of the second resolution to obtain the super-resolution model.
In some embodiments, obtaining the first sample image block pair whose motion information satisfies the motion condition from the pair of images of the sample video at the first resolution, and obtaining the second sample image block pair whose motion information satisfies the motion condition from the pair of images of the sample video at the second resolution includes any one of:
acquiring the first sample image block pair from the image pair of the sample video with the first resolution based on the motion information of the image pair in the sample video with the first resolution, and acquiring the second sample image block pair from the image pair of the sample video with the second resolution based on the first sample image block pair;
acquiring the second sample image block pair from the image pair of the sample video with the second resolution based on the motion information of the image pair in the sample video with the second resolution, and acquiring the first sample image block pair from the image pair of the sample video with the first resolution based on the second sample image block pair;
the method includes the steps of conducting up-sampling processing on the basis of motion information of an image pair in a sample video with the first resolution to obtain motion information of the image pair in the sample video with the second resolution, obtaining a second sample image block pair from the image pair of the sample video with the second resolution on the basis of the motion information of the image pair in the sample video with the second resolution, and obtaining a first sample image block pair from the image pair of the sample video with the first resolution on the basis of the second sample image block pair.
In some embodiments, obtaining the second sample image block pair from the pair of images of the sample video of the second resolution based on the first sample image block pair comprises: acquiring the second sample image block pair from the image pair of the sample video with the second resolution based on a resolution factor and the first sample image block pair, wherein the resolution factor represents a resolution ratio of the second resolution to the first resolution;
obtaining the first sample image block pair from the pair of images of the sample video of the first resolution based on the second sample image block pair comprises: acquiring the first sample image block pair from the image pair of the sample video of the first resolution based on the resolution factor and the second sample image block pair.
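The coordinate mapping implied by the resolution factor can be illustrated with a small sketch (hypothetical helper names; integer block coordinates and a factor that evenly divides the high-resolution coordinates are assumed):

```python
def lr_block_to_hr(y, x, size, factor):
    """Map a low-resolution block (top-left y, x and side length) to the
    co-located high-resolution block by scaling with the resolution factor."""
    return y * factor, x * factor, size * factor

def hr_block_to_lr(y, x, size, factor):
    """Inverse mapping: high-resolution block coordinates to low resolution.
    Assumes the coordinates are multiples of the factor."""
    return y // factor, x // factor, size // factor
```

For example, with a resolution factor of 4, a 32x32 low-resolution block at (3, 5) corresponds to a 128x128 high-resolution block at (12, 20), so either member of a sample pair determines the other.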
In some embodiments, performing model training based on the plurality of pairs of the first sample image block pairs in the sample video of the first resolution and the plurality of pairs of the second sample image block pairs in the sample video of the second resolution, obtaining the super-resolution model includes:
in the ith iteration process of model training, inputting a plurality of pairs of the first sample image blocks into the super-resolution model determined in the (i-1) th iteration process to obtain an image training result of the ith iteration process, wherein i is a positive integer greater than 1;
and adjusting the model parameters of the super-resolution model determined in the (i-1) th iteration process based on the image training result of the ith iteration process and the plurality of pairs of second sample image blocks, performing the (i + 1) th iteration process based on the adjusted model parameters, and repeating the training iteration process until the training meets the target condition.
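The iterative procedure above can be sketched with a toy stand-in for the super-resolution model: a single scalar gain trained by gradient descent, assuming NumPy. A real model would update network weights, but the control flow is the same — iteration i reuses the parameters produced by iteration i-1, compares the output against the second (high-resolution) sample blocks, and stops when a target condition is met:

```python
import numpy as np

def train_sr_model(lr_blocks, hr_blocks, max_iters=1000, loss_tol=1e-5, lr=0.1):
    """Iterative training skeleton with a scalar parameter w as a stand-in
    for the super-resolution model's parameters."""
    w = 0.0  # parameters carried over from the "previous" iteration
    loss = None
    for i in range(1, max_iters + 1):
        pred = [w * b for b in lr_blocks]                # forward pass
        err = [p - h for p, h in zip(pred, hr_blocks)]   # residuals vs HR blocks
        loss = np.mean([np.mean(e ** 2) for e in err])   # mean squared error
        if loss < loss_tol:                              # target condition met
            break
        grad = np.mean([np.mean(2 * e * b) for e, b in zip(err, lr_blocks)])
        w -= lr * grad                                   # adjust model parameters
    return w, loss
```

The target condition here is a loss threshold capped by a maximum iteration count, mirroring the two stopping criteria given elsewhere in the disclosure.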
In some embodiments, before performing model training based on a plurality of pairs of the sample image blocks in the sample video to obtain a super-resolution model, the method further includes any one of:
performing mirror image inversion on pixels at the inner edge position of the sample image block pair, and adding the pixels after mirror image inversion to the outer edge position of the sample image block pair;
copying pixels on the inner edge position of the sample image block pair, and adding the copied pixels to the outer edge position of the sample image block pair;
at the outer edge positions of the pair of sample image blocks, zero pixels are added.
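The three padding options above map directly onto NumPy's padding modes; a minimal sketch (assuming NumPy; the helper name is illustrative):

```python
import numpy as np

def pad_block(block, width, mode):
    """Extend a sample image block at its outer edge.

    mode='reflect'  mirrors the inner-edge pixels outward (mirror inversion),
    mode='edge'     copies the inner-edge pixels outward,
    mode='constant' adds zero-valued pixels.
    """
    return np.pad(block, width, mode=mode)
```

For a multi-channel block, the pad width would be applied only to the spatial axes; this sketch assumes a single-channel 2-D block.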
In some embodiments, after performing model training based on a plurality of pairs of sample image blocks in the sample video to obtain the super-resolution model, the method further includes: performing super-resolution reconstruction on the video through the super-resolution model to obtain an initial super-resolution video; and for any frame image in the super-resolution video, cropping the pixels at the edge positions of the image, and generating the target super-resolution video of the video based on the cropped image.
Fig. 2 shows the basic flow of the present disclosure; the solution provided by the present disclosure is further explained below based on a specific implementation. Fig. 3 is a flow chart of a video processing method according to an exemplary embodiment. Referring to fig. 3, the method includes:
in step 301, a computer device obtains motion information for a pair of images in a sample video, the pair of images including a pair of adjacent images in the sample video, the motion information indicating pixel motion between two frames of the pair of images.
In the embodiments of the present disclosure, the computer device may be provided as a terminal or a server. The computer device has the functions of constructing training data for a super-resolution model and performing model training based on that training data, the super-resolution model being used to perform super-resolution reconstruction on a video. Super-resolution reconstruction uses an image of a first resolution to reconstruct, for the corresponding moment, an image of a second resolution with higher pixel density and more complete details, the first resolution being smaller than the second resolution. Accordingly, super-resolution reconstruction of a video uses a video of the first resolution to reconstruct a video of the second resolution with higher pixel density and more complete details at the corresponding moments.
The sample video refers to a training video for training the super-resolution model, and in some embodiments, the number of the sample videos is multiple. In some embodiments, the sample video is a video stored locally at the terminal, or the sample video is a video stored by the server, or the sample video is a video stored by a video library associated with the server, and so on, which is not limited in this disclosure.
Pixel motion refers to how the pixels of an image sequence move over time, such as the displacement and speed of the motion; accordingly, the motion information describes how pixels of the previous frame image in the image pair move to the next frame image. In some embodiments, the motion information includes either optical flow information or time-series difference information; the process by which the computer device acquires the motion information of an image pair in the sample video is described below based on (1) and (2):
(1) the computer device acquires optical flow information of a pair of images in the sample video, the optical flow information indicating an amount of movement of a pixel of a previous frame image in the pair of images to a subsequent frame image. In this embodiment, by acquiring the optical flow information of the image pair, since the optical flow information can indicate the moving amount of the pixel of the previous frame image in the image pair moving to the next frame image, the pixel movement between the two frame images in the image pair can also be indicated, so that the optical flow information can be used for the subsequent determination process of the movement condition.
The amount of movement is the displacement of the pixel motion. In some embodiments, the optical flow information includes the amount by which pixels of the previous frame image move to the next frame image in the horizontal direction and the amount by which they move in the vertical direction. In some embodiments, the optical flow information takes the form of optical flow maps, such as an optical flow map for the horizontal direction (e.g., the x-axis direction) and an optical flow map for the vertical direction (e.g., the y-axis direction).
In some embodiments, the computer device obtains the optical flow information of the image pair in the sample video through an optical flow network, the optical flow network is used for predicting the optical flow information of two adjacent frames of images, and the corresponding process is as follows: and the computer equipment inputs the image sequence included in the sample video into the optical flow network, and processes the image pairs in the image sequence through the optical flow network to obtain the optical flow information of the image pairs in the sample video.
In some embodiments, the optical flow network is trained based on a sample image pair and the optical flow information of the sample image pair. Accordingly, the training process of the optical flow network is as follows: the computer device performs model training based on the sample image pair and the optical flow information of the sample image pair to obtain the optical flow network. Further, in some embodiments, during the m-th iteration of model training, the server inputs the sample image pair into the optical flow network determined in the (m-1)-th iteration to obtain the optical flow training result predicted in the m-th iteration; it then adjusts the model parameters of the optical flow network determined in the (m-1)-th iteration based on the optical flow training result predicted in the m-th iteration and the optical flow information of the sample image pair, performs the (m+1)-th iteration based on the adjusted parameters, and repeats this iterative process until the training meets the target condition, where m is a positive integer greater than 1. In some embodiments, the target condition is that the number of training iterations of the model reaches a target number, which is a preset number of training iterations, such as 1000; alternatively, the target condition is that the loss value satisfies a target threshold condition, such as a loss value less than 0.00001. The embodiments of the present disclosure do not limit the setting of the target condition.
In this way, iterative training yields an optical flow network with better model parameters and better prediction capability, improving the prediction accuracy of the optical flow network.
It should be noted that the above embodiments take the example of using the optical flow network to acquire the optical flow information of the image pair, and in other embodiments, the computer device can also use other methods to acquire the optical flow information of the image pair, for example, the computer device uses an optical flow prediction algorithm to predict the optical flow information of the image pair, such as a sparse optical flow prediction algorithm, a dense optical flow prediction algorithm, and so on. The embodiments of the present disclosure do not limit this.
(2) The computer device obtains time-series difference information of the image pair in the sample video, wherein the time-series difference information represents motion change of two frames of the image pair in time series. In this embodiment, by acquiring the time-series difference information of the image pair, since the time-series difference information can represent the motion change of the two frames of images of the image pair in time series, the motion of the pixels between the two frames of images of the image pair can also be represented, so that the subsequent determination process of the motion condition can be performed by using the time-series difference information.
The motion change corresponds to the speed of the pixel motion. In some embodiments, the time-series difference information takes the form of a time-series difference map. For example, the horizontal axis of the time-series difference map represents the timestamps of the two frames of images, and the vertical axis represents the motion change of the two frames over time.
In some embodiments, the computer device obtains the time-series difference information of the image pair in the sample video through a time-series difference network, which is used for predicting the time-series difference information of two adjacent frames of images. The corresponding process is as follows: the computer device inputs the image sequence included in the sample video into the time-series difference network, and processes the image pairs in the image sequence through the network to obtain the time-series difference information of the image pairs in the sample video.
In some embodiments, the time-series difference network is trained based on a sample image pair and the time-series difference information of the sample image pair. Accordingly, the training process of the time-series difference network is as follows: the computer device performs model training based on the sample image pair and the time-series difference information of the sample image pair to obtain the time-series difference network. Further, in some embodiments, during the n-th iteration of model training, the server inputs the sample image pair into the time-series difference network determined in the (n-1)-th iteration to obtain the time-series difference training result predicted in the n-th iteration; it then adjusts the model parameters of the time-series difference network determined in the (n-1)-th iteration based on the time-series difference training result predicted in the n-th iteration and the time-series difference information of the sample image pair, performs the (n+1)-th iteration based on the adjusted parameters, and repeats this iterative process until the training meets the target condition, where n is a positive integer greater than 1. In some embodiments, the target condition is that the number of training iterations of the model reaches a target number, which is a preset number of training iterations, such as 1000; alternatively, the target condition is that the loss value satisfies a target threshold condition, such as a loss value less than 0.00001. The embodiments of the present disclosure do not limit the setting of the target condition.
In this way, iterative training yields a time-series difference network with better model parameters and better prediction capability, improving the prediction accuracy of the time-series difference network.
It should be noted that the above embodiments take the use of a time-series difference network to obtain the time-series difference information of the image pair as an example; in other embodiments, the computer device can also obtain the time-series difference information of the image pair by other methods, for example, by predicting it with a temporal difference algorithm such as a TD (Temporal Difference) algorithm. The embodiments of the present disclosure are not limited thereto.
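A minimal sketch of frame differencing, one simple way to realize the time-series difference described here (assuming NumPy; this is an illustration of the idea, not the disclosure's network-based method):

```python
import numpy as np

def temporal_difference(prev_frame, next_frame):
    """Per-pixel temporal difference of two adjacent frames: the absolute
    intensity change, a simple proxy for motion change over time."""
    return np.abs(next_frame.astype(float) - prev_frame.astype(float))
```

Pixels with a large temporal difference indicate strong motion between the two frames, so such a map can serve as the motion-parameter map used when selecting sample image block pairs.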
In the above embodiment, two manners of acquiring the motion information of the image pair, namely an optical flow method and a time sequence difference method, are provided, so that the motion information of the image pair can be quickly acquired, and the flexibility of acquiring the motion information is improved.
In step 302, the computer device obtains, from the pair of images, a pair of sample image blocks whose motion information satisfies a motion condition, where two sample image blocks in the pair of sample image blocks are respectively located in two frames of the pair of images.
In other words, in step 302, a sample image block whose motion information satisfies the motion condition is obtained from the previous frame of the image pair, and a sample image block whose motion information satisfies the motion condition is obtained from the next frame of the image pair, so as to obtain the sample image block pair.
In some embodiments, the motion information includes a motion parameter that represents the degree of pixel motion between the two images of the image pair. It should be understood that the larger the value of the motion parameter, the more intense the pixel motion between the two images, and the smaller the value of the motion parameter, the gentler the pixel motion between the two images. Based on the two types of motion information shown in step 301, in some embodiments, the motion parameter is a movement amount provided by the optical flow information (e.g., an optical flow graph), or a motion variation value provided by the time sequence difference information (e.g., a time sequence difference graph). It should be understood that the motion parameter is derived from the difference between the pixel positions in the next frame and the previous frame of the image pair. Therefore, by setting the motion parameter, whether the motion information of the two frames of the image pair satisfies the motion condition can subsequently be judged using the motion parameter, which improves the efficiency of judging the motion condition and thus the efficiency of obtaining the sample image block pair.
In some embodiments, the motion condition is that the motion parameter is greater than or equal to the first threshold, and accordingly, the process of the computer device obtaining the sample image block pair whose motion information satisfies the motion condition is: acquiring a moving image block pair with a motion parameter larger than or equal to a first threshold value from the image pair; in still other embodiments, the motion condition is that the motion parameter is smaller than the second threshold, and accordingly, the process of the computer device obtaining the sample image block pair whose motion information satisfies the motion condition is as follows: obtaining a static image block pair with a motion parameter smaller than a second threshold value from the image pair; in other embodiments, the motion condition is that the motion parameter is greater than or equal to the first threshold and the motion parameter is less than the second threshold, and accordingly, the process of the computer device obtaining the sample image block pair whose motion information satisfies the motion condition is: from the image pair, a moving image block pair with a motion parameter greater than or equal to a first threshold and a still image block pair with a motion parameter less than a second threshold are obtained.
Here, the first threshold is greater than or equal to the second threshold. In some embodiments, the first threshold and the second threshold are determined based on the motion parameters of the plurality of pixels included in either frame of the image pair. Illustratively, taking the first threshold equal to the second threshold as an example, the mean value of the motion parameters of the plurality of pixels included in either frame of the image pair is determined as both the first threshold and the second threshold; taking the first threshold greater than the second threshold as an example, the mean value of the motion parameters of the plurality of pixels included in either frame of the image pair is determined, a target value is added to the mean value to obtain the first threshold, and the target value is subtracted from the mean value to obtain the second threshold. The target value is a predetermined value, such as 3 or 5 or another value. In this way, the first threshold corresponding to moving image block pairs and the second threshold corresponding to still image block pairs are determined based on the motion parameters of the pixels in either frame of the image pair, so that whether a pixel in the image is a moving pixel or a still pixel is determined based on the first threshold and the second threshold, which improves the accuracy of judging the motion condition.
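As a minimal sketch of the threshold derivation above, assuming the motion information is stored as a NumPy array of per-pixel motion parameters (the array values and the target value of 2 here are hypothetical):

```python
import numpy as np

def motion_thresholds(motion_map, target_value=2.0):
    """Derive the first/second thresholds from the mean motion parameter of
    all pixels in one frame of the image pair (mean + target, mean - target)."""
    mean = float(motion_map.mean())
    return mean + target_value, mean - target_value

# Hypothetical 4x4 motion-parameter map (e.g. optical-flow magnitudes).
motion = np.array([[0., 0., 1., 9.],
                   [0., 1., 8., 9.],
                   [0., 0., 1., 8.],
                   [0., 0., 0., 1.]])
t1, t2 = motion_thresholds(motion, target_value=2.0)
moving_mask = motion >= t1   # candidate moving pixels (first pixels)
still_mask = motion < t2     # candidate still pixels (second pixels)
```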
In the above embodiment, the motion condition that the motion parameter is greater than or equal to the first threshold is set to obtain moving image block pairs. Since a moving image block pair contains pixels in a motion state, when the moving image block pair is subsequently used for model training, the super-resolution model can focus on the motion change between two adjacent frames based on the moving image block pair. Likewise, the motion condition that the motion parameter is less than the second threshold is set to obtain still image block pairs. Since a still image block pair contains pixels in a static state, when it is subsequently used for model training, the super-resolution model can focus on the structural information of the images themselves based on the still image block pair. This enriches the information referenced by the model training, so that a super-resolution model with high accuracy can be trained, which further improves the accuracy of super-resolution reconstruction and thus of video processing.
For the above process of obtaining a pair of moving image blocks, in some embodiments, the computer device determines, from the pair of images, a first pixel of which the motion parameter is greater than or equal to the first threshold, and intercepts the pair of image blocks with a target size using the first pixel as a starting point to obtain the pair of moving image blocks.
The first pixel is a pixel with a motion parameter greater than or equal to the first threshold, that is, a pixel in a motion state. The target size is a predetermined size, such as a size P × P, where P is a positive integer greater than 1. In some embodiments, the computer device determines, from the pair of images, a position of a first pixel of which the motion parameter is greater than or equal to the first threshold, and cuts out a pair of image blocks of a target size with the position of the first pixel as a starting point to obtain the pair of motion image blocks. In some embodiments, the location of the first pixel is expressed in terms of the coordinates of the first pixel.
In some embodiments, after determining the first pixel, the computer device intercepts an image block pair of the target size with the first pixel as an upper left corner vertex; or, taking the first pixel as the top right corner vertex, and intercepting an image block pair with a target size; or, taking the first pixel as the vertex of the lower left corner, and intercepting an image block pair with a target size; or, taking the first pixel as the vertex of the lower right corner, and intercepting the image block pair with the target size. The embodiments of the present disclosure are not limited thereto.
In the above embodiment, in the process of acquiring the moving image block pair, the first pixel is determined, and then the image block pair with the target size is intercepted by using the first pixel as the starting point, so that the moving image block pair can be acquired quickly, and the efficiency of acquiring the moving image block pair is improved.
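The interception step above can be sketched as follows; `crop_block` is a hypothetical helper, and boundary handling (when a block of the target size would extend past the image border) is omitted for brevity:

```python
import numpy as np

def crop_block(image, y, x, p, corner="top_left"):
    """Crop a P x P image block using pixel (y, x) as the given corner vertex.
    Boundary clamping is omitted; only valid interior positions are assumed."""
    if corner == "top_left":
        return image[y:y + p, x:x + p]
    if corner == "top_right":
        return image[y:y + p, x - p + 1:x + 1]
    if corner == "bottom_left":
        return image[y - p + 1:y + 1, x:x + p]
    return image[y - p + 1:y + 1, x - p + 1:x + 1]  # bottom_right

# Synthetic previous/next frames of an image pair.
frame_prev = np.arange(100).reshape(10, 10)
frame_next = frame_prev + 1
# Suppose (2, 3) is a first pixel (motion parameter >= first threshold):
block_pair = (crop_block(frame_prev, 2, 3, 4), crop_block(frame_next, 2, 3, 4))
```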
In some embodiments, the computer device determines a second pixel of the image pair whose motion parameter is smaller than the second threshold from the image pair, and intercepts the image block pair with the target size using the second pixel as a starting point to obtain the still image block pair.
The second pixel is a pixel whose motion parameter is smaller than the second threshold, that is, a pixel in a static state. In some embodiments, the computer device determines, from the pair of images, a position of a second pixel of which the motion parameter is smaller than the second threshold, and intercepts the pair of image blocks of the target size with the position of the second pixel as a starting point, resulting in the pair of still image blocks. In some embodiments, the location of the second pixel is expressed in terms of the coordinates of the second pixel.
In some embodiments, after determining the second pixel, the computer device intercepts an image block pair of the target size with the second pixel as an upper left corner vertex; or, taking the second pixel as the top right corner vertex, and intercepting an image block pair with a target size; or, taking the second pixel as the vertex of the lower left corner, and intercepting an image block pair with a target size; or, taking the second pixel as the vertex of the lower right corner, and intercepting the image block pair with the target size. The embodiments of the present disclosure are not limited thereto.
In the above embodiment, in the process of obtaining the still image block pair, the second pixel is determined first, and then the image block pair with the target size is captured by using the second pixel as a starting point, so that the still image block pair can be obtained quickly, and the efficiency of obtaining the still image block pair is improved.
For the above-mentioned embodiment of simultaneously acquiring moving image block pairs and still image block pairs, in some embodiments, the number of moving image block pairs is the same as the number of still image block pairs. In this way, equal numbers of moving image block pairs and still image block pairs are obtained, so that they can subsequently be used together as training data for model training to obtain the super-resolution model. Because the moving image block pairs contain pixels in a motion state and the still image block pairs contain pixels in a static state, performing model training based on both enables the super-resolution model to focus on the motion change between two adjacent frames based on the moving image block pairs and on the structural information of the images themselves based on the still image block pairs. This enriches the information referenced by the model training, allows a super-resolution model with high accuracy to be trained, and further improves the accuracy of super-resolution reconstruction and thus of video processing.
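A minimal sketch of keeping equal numbers of moving and still image block pairs, as described above (the pair lists are hypothetical placeholders for cropped block pairs):

```python
import random

def balance_pairs(moving_pairs, still_pairs, seed=0):
    """Keep equal numbers of moving and still image block pairs so the training
    data carries both motion changes and structural information."""
    k = min(len(moving_pairs), len(still_pairs))
    rng = random.Random(seed)
    return rng.sample(moving_pairs, k), rng.sample(still_pairs, k)

moving = ["m0", "m1", "m2", "m3"]  # placeholder moving image block pairs
still = ["s0", "s1"]               # placeholder still image block pairs
m_sel, s_sel = balance_pairs(moving, still)
```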
Fig. 4 is a schematic diagram of an optical flow graph according to an exemplary embodiment. Referring to fig. 4, the image shown on the left side of fig. 4 is an image pair in a sample video; it should be noted that the image pair is presented on the left side of fig. 4 with the two frames of the image pair overlapped as an example. The image shown on the right side of fig. 4 is the optical flow graph of the image pair; it should be noted that the optical flow graph shown in fig. 4 presents the degree of pixel motion through color brightness as an example, that is, the brighter the color, the stronger the pixel motion, and the darker the color, the gentler the pixel motion. Of course, the optical flow graph shown in fig. 4 is provided as an exemplary illustration; in other embodiments, the optical flow graph can be represented in other forms, for example as a two-dimensional vector representing the degree of pixel motion, such as the amount of pixel movement in the horizontal direction (x-axis direction) and the amount of pixel movement in the vertical direction (y-axis direction).
From the image pair shown on the left side of fig. 4, a sample image block pair whose motion information (movement amount) satisfies the motion condition is acquired based on the optical flow graph shown on the right side of fig. 4. Fig. 5 is a schematic diagram of image blocks shown according to an exemplary embodiment. Referring to fig. 5, the image block (a), the image block (b), and the image block (c) shown in fig. 5 are image blocks in the optical flow graph, where the image block (a) and the image block (b) are image blocks whose motion parameters are greater than or equal to the first threshold, that is, moving image blocks, and the image block (c) is an image block whose motion parameter is less than the second threshold, that is, a still image block. Correspondingly, fig. 5 also shows the sample image blocks corresponding to these image blocks in the sample video, namely the sample image block (d), the sample image block (e), and the sample image block (f). It should be understood that, after determining the image blocks whose motion information satisfies the motion condition based on the optical flow graph, the embodiments of the present disclosure need to obtain the corresponding sample image blocks from the original images of the sample video.
In an embodiment of the present disclosure, the sample video includes a sample video with a first resolution and a sample video with a second resolution, where the first resolution is smaller than the second resolution, and accordingly, the step 302 may be replaced with: the computer device obtains a first sample image block pair of which the motion information satisfies the motion condition from the image pair of the sample video with the first resolution, and obtains a second sample image block pair of which the motion information satisfies the motion condition from the image pair of the sample video with the second resolution.
The sample video of the first resolution and the sample video of the second resolution are acquired based on a shooting device (such as a camera). The first pair of sample image blocks is a pair of sample image blocks in the sample video of the first resolution, i.e. the first pair of sample image blocks is a pair of sample image blocks of the first resolution. The second sample image block pair is a sample image block pair in the sample video of the second resolution, that is, the second sample image block pair is a sample image block pair of the second resolution.
In this embodiment, a pair of sample image blocks of a first resolution is obtained in a sample video of the first resolution, and a pair of sample image blocks of a second resolution is obtained in a sample video of the second resolution, so that the pair of sample image blocks of the first resolution and the pair of sample image blocks of the second resolution are subsequently used as training data to perform model training to obtain a super-resolution model.
It should be noted that, based on the difference in resolution, the size of the first sample image block pair is different from the size of the second sample image block pair, and in some embodiments, the target size includes a first size and a second size, where the first sample image block pair is a first size, the second sample image block pair is a second size, the first size is smaller than the second size, and a ratio of the second size to the first size is a resolution ratio, and the resolution ratio represents a resolution ratio of the second resolution to the first resolution, such as 4. Illustratively, assuming that the first size is a P × P size, the second size is a 4P × 4P size, where P is a positive integer greater than 1.
In some embodiments, the process of the computer device obtaining the first sample image block pair and the second sample image block pair includes any one of the following three implementations:
The first method comprises the following steps: the computer device obtains the first sample image block pair from the image pair of the sample video of the first resolution based on the motion information of the image pair in the sample video of the first resolution, and obtains the second sample image block pair from the image pair of the sample video of the second resolution based on the first sample image block pair.
In some embodiments, a computer device obtains the first pair of sample image blocks of a first size from the pair of images of the sample video of the first resolution based on motion information of the pair of images in the sample video of the first resolution, and obtains the second pair of sample image blocks of a second size from the pair of images of the sample video of the second resolution based on the first pair of sample image blocks.
For the process of obtaining the first sample image block pair from the image pair of the sample video of the first resolution based on the motion information of that image pair, refer to the above process of obtaining a sample image block pair based on the motion parameter and the motion condition, which is not repeated here.
In some embodiments, the computer device obtains the second sample image block pair from the image pair of the sample video of the second resolution based on the resolution ratio and the first sample image block pair.
In some embodiments, the computer device determines the position of the second sample image block pair based on the resolution ratio and the position of the first sample image block pair, and obtains the second sample image block pair from the image pair of the sample video of the second resolution based on the position of the second sample image block pair.
The positions of the first sample image block pairs are represented by vertex coordinates of the first sample image block pairs, the positions of the second sample image block pairs are represented by vertex coordinates of the second sample image block pairs, and the vertex coordinates are top left corner vertex coordinates or bottom left corner vertex coordinates or top right corner vertex coordinates or bottom right corner vertex coordinates. In an alternative embodiment, the computer device performs a multiplication operation based on the resolution ratio and the vertex coordinates of the first sample image block pair to obtain vertex coordinates of the second sample image block pair, and in the image pair of the sample video of the second resolution, intercepts the image block pair of the second size based on the vertex coordinates of the second sample image block pair to obtain the second sample image block pair.
Illustratively, taking the resolution ratio of 4 as an example, taking the vertex coordinate of the first sample image block pair as the vertex coordinate (x, y) at the upper left corner as an example, the computer device performs a product operation based on the resolution ratio of 4 and the vertex coordinate (x, y) at the upper left corner of the first sample image block pair to obtain the vertex coordinate (4x, 4y) at the upper left corner of the second sample image block pair, and further, based on the vertex coordinate (4x, 4y) at the upper left corner of the second sample image block pair, from the image pair of the sample video at the second resolution, takes (4x, 4y) as the vertex at the upper left corner, cuts out the image block pair at the second size to obtain the second sample image block pair.
In this embodiment, after the first sample image block pair is acquired, its position can be converted from the first resolution to the second resolution based on the resolution ratio, that is, the second sample image block pair is acquired, which improves the efficiency of acquiring the sample image block pairs.
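The position and size mapping of the first method can be sketched as follows, assuming top-left vertex coordinates and a resolution ratio of 4 (the frames here are synthetic stand-ins for sample video frames):

```python
import numpy as np

def map_low_to_high(x, y, p, ratio=4):
    """Map the top-left vertex (x, y) and size P of a first-resolution block
    to the corresponding second-resolution vertex and size via the ratio."""
    return x * ratio, y * ratio, p * ratio

hi_frame = np.arange(64 * 64).reshape(64, 64)   # stand-in second-resolution frame
x, y, p = 3, 2, 4                               # first sample block: 4 x 4 at (3, 2)
hx, hy, hp = map_low_to_high(x, y, p, ratio=4)  # second sample block vertex and size
hi_block = hi_frame[hy:hy + hp, hx:hx + hp]     # intercept the second-size block
```

The second method simply inverts this mapping, dividing the second sample image block pair's vertex coordinates by the resolution ratio instead of multiplying.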
The second method comprises the following steps: the computer device obtains the second sample image block pair from the image pair of the sample video of the second resolution based on the motion information of the image pair in the sample video of the second resolution, and obtains the first sample image block pair from the image pair of the sample video of the first resolution based on the second sample image block pair.
In some embodiments, the computer device obtains the second pair of sample image blocks of the second size from the pair of images of the sample video of the second resolution based on the motion information of the pair of images in the sample video of the second resolution, and obtains the first pair of sample image blocks of the first size from the pair of images of the sample video of the first resolution based on the second pair of sample image blocks.
For the process of obtaining the second sample image block pair from the image pair of the sample video of the second resolution based on the motion information of that image pair, refer to the above process of obtaining a sample image block pair based on the motion parameter and the motion condition, which is not repeated here.
In some embodiments, the computer device obtains the first sample image block pair from the image pair of the sample video of the first resolution based on the resolution ratio and the second sample image block pair.
In some embodiments, the computer device determines the position of the first sample image block pair based on the resolution ratio and the position of the second sample image block pair, and obtains the first sample image block pair from the image pair of the sample video of the first resolution based on the position of the first sample image block pair.
In an optional embodiment, the computer device performs a division operation based on the resolution ratio and the vertex coordinates of the second sample image block pair to obtain vertex coordinates of the first sample image block pair, and in the image pair of the sample video of the first resolution, intercepts the image block pair of the first size based on the vertex coordinates of the first sample image block pair to obtain the first sample image block pair.
Illustratively, taking the resolution ratio of 4 as an example, taking the vertex coordinate of the second sample image block pair as the top left vertex coordinate (x, y) as an example, the computer device performs a division operation based on the resolution ratio of 4 and the top left vertex coordinate (x, y) of the second sample image block pair to obtain the top left vertex coordinate (x/4, y/4) of the first sample image block pair, and further, based on the top left vertex coordinate (x/4, y/4) of the first sample image block pair, cuts the image block pair of the first size from the image pair of the sample video of the first resolution with (x/4, y/4) as the top left vertex to obtain the first sample image block pair.
In this embodiment, after the second sample image block pair is acquired, its position can be converted from the second resolution to the first resolution based on the resolution ratio, that is, the first sample image block pair is acquired, which improves the efficiency of acquiring the sample image block pairs.
The third method comprises the following steps: the computer device performs upsampling processing on the motion information of the image pair in the sample video of the first resolution to obtain the motion information of the image pair in the sample video of the second resolution, obtains the second sample image block pair from the image pair of the sample video of the second resolution based on the motion information of the image pair in the sample video of the second resolution, and obtains the first sample image block pair from the image pair of the sample video of the first resolution based on the second sample image block pair.
In some embodiments, the computer device performs the upsampling process based on an upsampling mode of linear interpolation, or performs the upsampling process based on an upsampling mode of deep learning (such as deconvolution). The content of the upsampling process is not limited in the embodiments of the present disclosure.
Then, the second sample image block pair is obtained from the image pair of the sample video of the second resolution based on the motion information of the image pair in the sample video of the second resolution, and the first sample image block pair is obtained from the image pair of the sample video of the first resolution based on the second sample image block pair, in the same manner as in the second method.
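The upsampling step of the third method can be sketched with nearest-neighbour interpolation, a simple stand-in for the linear interpolation or deconvolution mentioned above; whether displacement values also need rescaling by the ratio is an assumption noted in the comment, not something stated in this disclosure:

```python
import numpy as np

def upsample_motion_map(motion_map, ratio=4):
    """Nearest-neighbour upsampling of a per-pixel motion map from the first
    resolution to the second resolution."""
    up = np.repeat(np.repeat(motion_map, ratio, axis=0), ratio, axis=1)
    # Assumption: if the map stores displacement values, those values may also
    # need to be multiplied by the ratio so they are expressed in
    # second-resolution pixels.
    return up

low = np.array([[0., 5.],
                [1., 8.]])          # hypothetical first-resolution motion map
high = upsample_motion_map(low, ratio=2)
```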
In the above embodiments, three ways of obtaining the first sample image block pair and the second sample image block pair are provided, so that the flexibility of obtaining the first sample image block pair and the second sample image block pair is improved.
It should be noted that, in step 301, the image sequence included in the sample video is input into the optical flow network or the time sequence difference network to obtain the motion information of the image pairs in the sample video, that is, the motion information of all image pairs in the sample video is obtained, and then step 302 is executed to obtain the sample image block pairs of all image pairs in the sample video. In other embodiments, the computer device is also capable of obtaining the motion information of any one image pair (any two adjacent frames) in the sample video, executing step 302 based on the motion information of that image pair to obtain the sample image block pair of that image pair, and then sequentially obtaining the sample image blocks in the remaining images of the sample video based on the positions of that sample image block pair, so as to obtain all sample image blocks in the sample video. In this way, the computer device only needs to execute the above steps on one image pair, which greatly reduces the processing load of the computer device and improves its processing efficiency.
In step 303, the computer device performs pixel expansion at the outer edge position of the pair of sample image blocks, and performs step 304 based on the pair of sample image blocks after pixel expansion.
Wherein the outer edge positions include outer edge positions of four sides of the sample image block, i.e., upper, lower, left, and right sides. In some embodiments, for any image block of the sample image block pair, the computer device performs pixel expansion at outer edge positions of four sides of the image block, respectively, and performs step 304 based on the sample image block pair after pixel expansion.
In some embodiments, the process of pixel expansion by the computer device comprises any one of:
in some embodiments, the computer device mirror-inverts pixels at inner edge locations of the pair of sample image blocks, adding the mirror-inverted pixels to outer edge locations of the pair of sample image blocks. In some embodiments, for any image block of the pair of sample image blocks, the computer device mirror-inverts pixels on an inner edge location of the sample image block, adding the mirror-inverted pixels to an outer edge location of the sample image block.
Here, mirror inversion is also known as symmetric flipping. In some embodiments, for any image block of the pair of sample image blocks, the computer device mirror-inverts the pixels at the inner edge positions of the upper, lower, left, and right sides of the sample image block, and adds the mirror-inverted pixels to the outer edge positions of the corresponding sides of the sample image block.
In still other embodiments, the computer device copies pixels at inner edge locations of the pair of sample image blocks, and adds the copied pixels to outer edge locations of the pair of sample image blocks. In some embodiments, for any image block of the pair of sample image blocks, the computer device copies pixels at inner edge locations of the sample image block, adding the copied pixels to outer edge locations of the sample image block.
In some embodiments, for any image block of the pair of sample image blocks, the computer device copies pixels at inner edge positions of the upper, lower, left, and right four sides of the sample image block, and adds the copied pixels to outer edge positions of the upper, lower, left, and right four sides of the sample image block, respectively.
In other embodiments, the computer device adds zero pixels at outer edge positions of the pair of sample image blocks. In some embodiments, for any image block of the pair of sample image blocks, the computer device adds zero pixels at outer edge positions of the sample image block. The zero pixel is a pixel with a pixel value of 0, and the pixel value is used to represent the brightness (or called gray) at the position of the pixel, and usually, the pixel with the pixel value of 0 is also a black pixel.
In some embodiments, for any image block of the pair of sample image blocks, the computer device adds zero pixels at outer edge positions of the upper, lower, left, and right four sides of the sample image block, respectively.
In some embodiments, based on the first sample image block pair and the second sample image block pair shown in step 302, and because of the difference in resolution, the pixel expansion amount of the first sample image block pair is different from that of the second sample image block pair. In some embodiments, the pixel expansion amount of the first sample image block pair is a first expansion amount, and the pixel expansion amount of the second sample image block pair is a second expansion amount, where the first expansion amount is smaller than the second expansion amount, and the ratio of the second expansion amount to the first expansion amount is the resolution ratio. For example, taking the resolution ratio of 4 as an example, assuming that the first expansion amount is s, the second expansion amount is 4s, where s is a positive integer greater than 1.
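The three expansion manners map naturally onto NumPy's padding modes: 'symmetric' mirror-inverts the inner-edge pixels, 'edge' copies them, and 'constant' adds zero pixels. A sketch with the expansion amounts s and 4s chosen per a resolution ratio of 4:

```python
import numpy as np

# Expansion amounts per a resolution ratio of 4: s for the low-resolution
# block, 4*s for the high-resolution block.
s = 2
low_block = np.arange(16.0).reshape(4, 4)     # first sample image block (P x P)
hi_block = np.arange(256.0).reshape(16, 16)   # second sample image block (4P x 4P)

low_pad = np.pad(low_block, s, mode="symmetric")   # mirror-invert inner-edge pixels
hi_pad = np.pad(hi_block, 4 * s, mode="symmetric")
low_copy = np.pad(low_block, s, mode="edge")       # copy inner-edge pixels
low_zero = np.pad(low_block, s, mode="constant")   # add zero pixels
```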
In the above embodiment, pixel expansion is performed at the outer edges of the sample image block pair by mirror inversion, pixel copying, or adding zero pixels, which increases the pixel information at the image edges. Subsequent model training is then performed based on the pixel-expanded sample image block pairs, so that the super-resolution model pays more attention to the reconstruction of image edges and the problem of noise at image edges is avoided. As a result, a super-resolution model with high accuracy can be trained, which improves the accuracy of super-resolution reconstruction and thus of video processing.
It should be noted that step 303 is an optional step, and in other embodiments, after the sample image block pair whose motion information satisfies the motion condition is obtained based on step 302, step 304 is executed without executing step 303.
In step 304, the computer device performs model training based on the pairs of sample image blocks in the sample video to obtain a super-resolution model, where the super-resolution model is used to perform super-resolution reconstruction on the video.
Based on the sample video of the first resolution and the sample video of the second resolution shown in step 302, in some embodiments, the computer device performs model training based on a plurality of pairs of the first sample image block pairs in the sample video of the first resolution and a plurality of pairs of the second sample image block pairs in the sample video of the second resolution to obtain the super-resolution model.
Further, in some embodiments, in the ith iteration process of the model training, the computer device inputs the plurality of pairs of first sample image blocks into the super-resolution model determined in the (i-1)th iteration process to obtain an image training result of the ith iteration process, where i is a positive integer greater than 1; the computer device then adjusts the model parameters of the super-resolution model determined in the (i-1)th iteration process based on the image training result of the ith iteration process and the plurality of pairs of second sample image blocks, performs the (i+1)th iteration process based on the adjusted model parameters, and repeats this training iteration process until the training meets a target condition.
In some embodiments, the computer device inputs a plurality of pairs of the first sample image blocks into the initial model in a first iteration process of model training to obtain an image training result of the first iteration process, adjusts a model parameter of the initial model based on the image training result of the first iteration process and the plurality of pairs of the second sample image blocks, performs a next iteration process based on the adjusted model parameter, and repeats the iteration process of the training until the training meets a target condition.
In this embodiment, a network model with better model parameters is obtained as the super-resolution model through iterative training, so that a super-resolution model with a stronger super-resolution reconstruction capability is obtained and the accuracy of the super-resolution model is improved.
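The iterative adjustment described above can be sketched as a plain supervised loop. The linear model, mean-squared-error loss, and gradient step below are placeholder assumptions for illustration (a real super-resolution model would be a deep network trained in a framework such as PyTorch); only the loop structure mirrors the embodiment: inputs are low-resolution sample blocks, targets are the corresponding high-resolution sample blocks, and parameters are adjusted each iteration until the training meets a target condition:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the super-resolution model: a single linear
# map from flattened low-resolution patches to high-resolution patches.
def forward(params, lr_batch):
    return lr_batch @ params

def train(lr_patches, hr_patches, step=0.1, max_iters=2000, target_loss=1e-4):
    """Adjust model parameters iteration by iteration until the training
    meets the target condition (loss below target, or iteration budget)."""
    params = np.zeros((lr_patches.shape[1], hr_patches.shape[1]))
    loss = float("inf")
    for i in range(max_iters):                 # i-th iteration process
        pred = forward(params, lr_patches)     # image training result
        err = pred - hr_patches
        loss = float(np.mean(err ** 2))
        if loss < target_loss:                 # target condition is met
            break
        grad = 2.0 * lr_patches.T @ err / len(lr_patches)
        params -= step * grad                  # adjusted model parameters
    return params, loss

# Toy data whose high-resolution targets are an exact linear map of the input.
lr = rng.normal(size=(64, 16))
hr = lr @ rng.normal(size=(16, 64))
params, final_loss = train(lr, hr)
```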
In some embodiments, after the super-resolution model is obtained based on the training of the foregoing steps 301 to 304, in the testing stage of the super-resolution model, the computer device further performs super-resolution reconstruction on a video through the super-resolution model to obtain an initial super-resolution video; for any frame of image in the initial super-resolution video, the computer device crops pixels at the inner edge position of the image and generates a target super-resolution video of the video based on the cropped images. In this embodiment, after super-resolution reconstruction of the video, the pixels at the inner edge position of each image in the output initial super-resolution video are cropped to remove redundant pixels, thereby obtaining the target super-resolution video of the video.
Schematically, assuming that q pixels are expanded at the outer edge position of the video before super-resolution reconstruction, where q is a positive integer greater than 1, then after the initial super-resolution video is obtained based on the super-resolution model (taking a resolution ratio of 4 as an example), 4q pixels are cropped at the inner edge position of each frame of image of the initial super-resolution video, and the cropped video is determined as the target super-resolution video.
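A minimal sketch of this test-stage crop, assuming a resolution ratio of 4 and NumPy frame arrays; the helper name is hypothetical:

```python
import numpy as np

def crop_super_resolved_frame(frame, q, ratio=4):
    """Remove the ratio*q border pixels introduced by padding q pixels
    at the outer edge position before super-resolution reconstruction."""
    c = ratio * q
    return frame[c:-c, c:-c]

# A frame padded by q=2 pixels before 4x super-resolution carries an
# 8-pixel redundant border in the output.
frame = np.zeros((128, 128), dtype=np.uint8)
cropped = crop_super_resolved_frame(frame, q=2)
# cropped.shape == (112, 112)
```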
According to the technical solution provided by the embodiments of the present disclosure, motion information of image pairs in a sample video is obtained, and the motion information is used to obtain, from each image pair, image block pairs whose motion information satisfies a motion condition. By setting a motion condition and using the motion information of the two frames of an image pair to judge whether that condition is satisfied, image blocks containing more motion information are obtained, and model training is then performed based on the obtained image block pairs. The super-resolution model can therefore focus on the motion change between two adjacent frames, the information referenced by model training is enriched, and a super-resolution model with high accuracy can be trained, which improves the accuracy of super-resolution reconstruction and, in turn, the accuracy of video processing.
In the embodiments of the present disclosure, fig. 6 is a schematic diagram of a super-resolution test result according to an exemplary embodiment. Among the three images shown in fig. 6, the left image is an original image, the middle image is an image after super-resolution reconstruction using the related-art scheme of randomly cropping training data, and the right image is an image after super-resolution reconstruction using the embodiments of the present disclosure. In the related art, image blocks are obtained only by random cropping, so the trained super-resolution model pays more attention to reconstruction of the image interior and neglects reconstruction of the image edge, and noise is therefore easily generated at the image edge.
Fig. 7 is a block diagram illustrating a video processing apparatus according to an exemplary embodiment, and referring to fig. 7, the apparatus includes an information acquisition unit 701, an image acquisition unit 702, and a training unit 703.
An information acquisition unit 701 configured to perform acquisition of motion information of a pair of images in a sample video, the pair of images including a pair of adjacent images in the sample video, the motion information representing a pixel motion situation between two frames of images of the pair of images;
an image obtaining unit 702 configured to perform obtaining, from the pair of images, a pair of sample image blocks of which motion information satisfies a motion condition, two sample image blocks of the pair of sample image blocks being respectively located in two frames of the pair of images;
a training unit 703 configured to perform model training based on a plurality of pairs of the sample image blocks in the sample video, so as to obtain a super-resolution model, where the super-resolution model is used for performing super-resolution reconstruction on the video.
According to the technical solution provided by the embodiments of the present disclosure, motion information of image pairs in a sample video is obtained, and the motion information is used to obtain, from each image pair, image block pairs whose motion information satisfies a motion condition. By setting a motion condition and using the motion information of the two frames of an image pair to judge whether that condition is satisfied, image blocks containing more motion information are obtained, and model training is then performed based on the obtained image block pairs. The super-resolution model can therefore focus on the motion change between two adjacent frames, the information referenced by model training is enriched, and a super-resolution model with high accuracy can be trained, which improves the accuracy of super-resolution reconstruction and, in turn, the accuracy of video processing.
In some embodiments, the information obtaining unit 701 is configured to perform any one of the following: acquiring optical flow information of an image pair in the sample video, wherein the optical flow information represents the amount by which pixels of a previous frame image in the image pair move to a next frame image; or acquiring time sequence difference information of the image pair in the sample video, wherein the time sequence difference information represents the motion change of the two frames of the image pair in time sequence.
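As an illustrative sketch of the second option, time sequence difference information can be computed as the per-pixel absolute difference between the two frames of an image pair; this helper is an assumption for illustration, not the disclosed implementation (the first option, optical flow, would typically use a dedicated estimator):

```python
import numpy as np

def temporal_difference(prev_frame, next_frame):
    """Per-pixel motion parameter: absolute temporal difference between
    the two frames of an image pair."""
    return np.abs(next_frame.astype(np.float32) - prev_frame.astype(np.float32))

prev = np.zeros((4, 4), dtype=np.uint8)
nxt = prev.copy()
nxt[1, 1] = 200                       # a single pixel changes between frames
diff = temporal_difference(prev, nxt)
# diff is nonzero only at the moving pixel
```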
In some embodiments, the motion information includes a motion parameter indicative of a degree of pixel motion between two images of the image pair; the image obtaining unit 702 is configured to perform obtaining, from the pair of images, a pair of moving image blocks having a motion parameter greater than or equal to a first threshold value and a pair of still image blocks having a motion parameter less than a second threshold value, the first threshold value being greater than or equal to the second threshold value.
In some embodiments, the image acquisition unit 702 is configured to perform: determining a first pixel of which the motion parameter is greater than or equal to the first threshold from the image pair, and intercepting an image block pair with a target size by taking the first pixel as a starting point to obtain the motion image block pair; and determining a second pixel of which the motion parameter is smaller than the second threshold, and intercepting the image block pair with the target size by taking the second pixel as a starting point to obtain the static image block pair.
In some embodiments, the first threshold and the second threshold are determined based on motion parameters of a plurality of pixels included in any one of the pair of images.
In some embodiments, the number of pairs of moving image blocks is the same as the number of pairs of still image blocks.
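The behavior of units 701 and 702 described above can be sketched as follows, using the temporal difference as the motion parameter and cutting equal numbers of moving and still image block pairs of a target size. The thresholds are passed explicitly here for clarity, whereas in the embodiments they are determined from the motion parameters of the pixels in the frame; all names are hypothetical:

```python
import numpy as np

def select_patch_pairs(prev, nxt, first_thr, second_thr, size=2, n_pairs=2):
    """Cut n_pairs moving and n_pairs still image block pairs of a target
    size, using per-pixel temporal difference as the motion parameter."""
    motion = np.abs(nxt.astype(np.float32) - prev.astype(np.float32))
    h, w = motion.shape
    valid = motion[: h - size + 1, : w - size + 1]   # valid starting points
    moving_starts = np.argwhere(valid >= first_thr)[:n_pairs]
    still_starts = np.argwhere(valid < second_thr)[:n_pairs]

    def cut(starts):
        # Each pair takes the same size x size window from both frames.
        return [(prev[y:y + size, x:x + size], nxt[y:y + size, x:x + size])
                for y, x in starts]
    return cut(moving_starts), cut(still_starts)

prev = np.zeros((8, 8), dtype=np.uint8)
nxt = prev.copy()
nxt[0:2, 0:2] = 255                  # motion confined to the top-left corner
moving, still = select_patch_pairs(prev, nxt, first_thr=128.0, second_thr=1.0)
# two moving pairs and two still pairs, each patch 2x2
```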
In some embodiments, the sample video includes sample video at a first resolution and sample video at a second resolution, the first resolution being less than the second resolution; the image obtaining unit 702 is configured to perform obtaining a first sample image block pair of which the motion information satisfies the motion condition from the image pair of the sample video of the first resolution, and obtaining a second sample image block pair of which the motion information satisfies the motion condition from the image pair of the sample video of the second resolution, where the first sample image block pair is the sample image block pair of the first resolution, and the second sample image block pair is the sample image block pair of the second resolution; the training unit 703 is configured to perform model training based on the plurality of pairs of the first sample image block pairs in the sample video with the first resolution and the plurality of pairs of the second sample image block pairs in the sample video with the second resolution, so as to obtain the super-resolution model.
In some embodiments, the image acquisition unit 702 includes any one of: a first image obtaining subunit configured to perform obtaining the first sample image block pair from the image pair of the sample video of the first resolution based on the motion information of the image pair in the sample video of the first resolution, and obtaining the second sample image block pair from the image pair of the sample video of the second resolution based on the first sample image block pair; a second image obtaining subunit configured to perform obtaining, based on the motion information of the pair of images in the sample video at the second resolution, the pair of second sample image blocks from the pair of images in the sample video at the second resolution, and obtaining, based on the pair of second sample image blocks, the pair of first sample image blocks from the pair of images in the sample video at the first resolution; the third image acquisition subunit is configured to perform upsampling processing based on the motion information of the image pair in the sample video with the first resolution to obtain the motion information of the image pair in the sample video with the second resolution, acquire the second sample image block pair from the image pair of the sample video with the second resolution based on the motion information of the image pair in the sample video with the second resolution, and acquire the first sample image block pair from the image pair of the sample video with the first resolution based on the second sample image block pair.
In some embodiments, the first image obtaining subunit is configured to perform obtaining the second sample image block pair from the image pair of the sample video of the second resolution based on a resolution ratio and the first sample image block pair, the resolution ratio representing the ratio of the second resolution to the first resolution; the second image obtaining subunit is configured to perform obtaining the first sample image block pair from the image pair of the sample video of the first resolution based on the resolution ratio and the second sample image block pair.
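A minimal sketch of the resolution-ratio-based correspondence between the two sample image block pairs; the coordinate convention (top-left corner plus side length, each multiplied by the ratio) is an assumption for illustration:

```python
def map_patch_location_up(y, x, size, ratio):
    """Map a low-resolution patch (top-left y, x and side length) to the
    corresponding high-resolution patch under the given resolution ratio."""
    return y * ratio, x * ratio, size * ratio

def map_patch_location_down(y, x, size, ratio):
    """Inverse mapping: from a high-resolution patch back to low resolution."""
    return y // ratio, x // ratio, size // ratio

# A 32x32 patch at (12, 20) in the low-resolution frame corresponds to a
# 128x128 patch at (48, 80) in the 4x high-resolution frame, and back.
hi = map_patch_location_up(12, 20, 32, 4)
lo = map_patch_location_down(*hi, 4)
```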
In some embodiments, the training unit 703 is configured to perform: in the ith iteration process of model training, inputting a plurality of pairs of the first sample image blocks into the super-resolution model determined in the (i-1) th iteration process to obtain an image training result of the ith iteration process, wherein i is a positive integer greater than 1; and adjusting the model parameters of the super-resolution model determined in the (i-1) th iteration process based on the image training result of the ith iteration process and the plurality of pairs of second sample image blocks, performing the (i + 1) th iteration process based on the adjusted model parameters, and repeating the training iteration process until the training meets the target condition.
In some embodiments, the apparatus further comprises any one of: a mirror inversion unit configured to perform mirror inversion of pixels at an inner edge position of the pair of sample image blocks, the mirror-inverted pixels being added to an outer edge position of the pair of sample image blocks; a copying unit configured to copy pixels at an inner edge position of the pair of sample image blocks, and add the copied pixels to an outer edge position of the pair of sample image blocks; an adding unit configured to perform adding zero pixels at an outer edge position of the pair of sample image blocks.
In some embodiments, the apparatus further comprises: a super-resolution unit configured to perform super-resolution reconstruction on the video through the super-resolution model to obtain an initial super-resolution video; and a cropping unit configured to crop pixels at the inner edge position of any frame image in the initial super-resolution video and generate the target super-resolution video of the video based on the cropped images.
It should be noted that the video processing apparatus provided in the above embodiment is illustrated only by the division of the above functional modules when performing video processing; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the video processing apparatus and the video processing method provided by the above embodiments belong to the same concept; their specific implementation processes are described in detail in the method embodiments and are not repeated here.
The computer device mentioned in the embodiments of the present disclosure may be provided as a terminal. Fig. 8 is a block diagram illustrating a terminal 800 according to an exemplary embodiment. The terminal 800 may be a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 800 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, the terminal 800 includes: a processor 801 and a memory 802.
The processor 801 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 801 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 801 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 801 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 801 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 802 may include one or more computer-readable storage media, which may be non-transitory. Memory 802 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 802 is used to store at least one program code for execution by the processor 801 to implement processes performed by the terminal in the video processing method provided by the method embodiments in the present disclosure.
In some embodiments, the terminal 800 may further include: a peripheral interface 803 and at least one peripheral. The processor 801, memory 802, and peripheral interface 803 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 803 by a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 804, a display screen 805, a camera assembly 806, an audio circuit 807, a positioning assembly 808, and a power supply 809.
The peripheral interface 803 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 801 and the memory 802. In some embodiments, the processor 801, memory 802, and peripheral interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802, and the peripheral interface 803 may be implemented on separate chips or circuit boards, which are not limited by this embodiment.
The Radio Frequency circuit 804 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 804 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 804 converts an electrical signal into an electromagnetic signal to be transmitted, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 804 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 804 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 804 may also include NFC (Near Field Communication) related circuits, which are not limited by this disclosure.
The display 805 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 805 is a touch display, the display 805 also has the ability to capture touch signals on or above the surface of the display 805. The touch signal may be input to the processor 801 as a control signal for processing. At this point, the display 805 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 805 may be one, disposed on a front panel of the terminal 800; in other embodiments, the display 805 may be at least two, respectively disposed on different surfaces of the terminal 800 or in a folded design; in other embodiments, the display 805 may be a flexible display disposed on a curved surface or a folded surface of the terminal 800. Even further, the display 805 may be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The Display 805 can be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and other materials.
The camera assembly 806 is used to capture images or video. Optionally, camera assembly 806 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 806 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuit 807 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 801 for processing or inputting the electric signals to the radio frequency circuit 804 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 800. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 801 or the radio frequency circuit 804 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 807 may also include a headphone jack.
The positioning component 808 is used to locate the current geographic position of the terminal 800 for navigation or LBS (Location Based Service).
Power supply 809 is used to provide power to various components in terminal 800. The power supply 809 can be ac, dc, disposable or rechargeable. When the power source 809 comprises a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal 800 also includes one or more sensors 810. The one or more sensors 810 include, but are not limited to: acceleration sensor 811, gyro sensor 812, pressure sensor 813, fingerprint sensor 814, optical sensor 815 and proximity sensor 816.
The acceleration sensor 811 may detect the magnitude of acceleration in three coordinate axes of the coordinate system established with the terminal 800. For example, the acceleration sensor 811 may be used to detect the components of the gravitational acceleration in three coordinate axes. The processor 801 may control the display 805 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 811. The acceleration sensor 811 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 812 may detect a body direction and a rotation angle of the terminal 800, and the gyro sensor 812 may cooperate with the acceleration sensor 811 to acquire a 3D motion of the user with respect to the terminal 800. From the data collected by the gyro sensor 812, the processor 801 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensors 813 may be disposed on the side frames of terminal 800 and/or underneath display 805. When the pressure sensor 813 is disposed on the side frame of the terminal 800, the holding signal of the user to the terminal 800 can be detected, and the processor 801 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 813. When the pressure sensor 813 is disposed at a lower layer of the display screen 805, the processor 801 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 805. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The optical sensor 815 is used to collect the ambient light intensity. In one embodiment, the processor 801 may control the display brightness of the display 805 based on the ambient light intensity collected by the optical sensor 815: when the ambient light intensity is high, the display brightness of the display 805 is increased; when the ambient light intensity is low, the display brightness of the display 805 is reduced. In another embodiment, the processor 801 may also dynamically adjust the shooting parameters of the camera assembly 806 based on the ambient light intensity collected by the optical sensor 815.
The proximity sensor 816, also known as a distance sensor, is typically provided on the front panel of the terminal 800. The proximity sensor 816 is used to collect the distance between the user and the front surface of the terminal 800. In one embodiment, when the proximity sensor 816 detects that the distance between the user and the front surface of the terminal 800 gradually decreases, the processor 801 controls the display 805 to switch from the bright-screen state to the off-screen state; when the proximity sensor 816 detects that the distance between the user and the front surface of the terminal 800 gradually increases, the processor 801 controls the display 805 to switch from the off-screen state to the bright-screen state.
Those skilled in the art will appreciate that the configuration shown in fig. 8 is not intended to be limiting of terminal 800 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
The computer device mentioned in the embodiments of the present disclosure may be provided as a server. Fig. 9 is a block diagram of a server according to an exemplary embodiment. The server 900 may vary greatly in configuration or performance, and may include one or more processors (CPUs) 901 and one or more memories 902, where the one or more memories 902 store at least one program code that is loaded and executed by the one or more processors 901 to implement the processes executed by the server in the video processing methods provided by the above method embodiments. Of course, the server 900 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and the server 900 may also include other components for implementing device functions, which are not described here again.
In an exemplary embodiment, there is also provided a computer readable storage medium, such as a memory 902, comprising program code executable by a processor 901 of the server 900 to perform the video processing method described above. Alternatively, the computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact-Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided, comprising a computer program which, when executed by a processor, implements the video processing method described above.
In some embodiments, a computer program according to embodiments of the present disclosure may be deployed to be executed on one computer device, on multiple computer devices located at one site, or on multiple computer devices distributed at multiple sites and interconnected by a communication network; the multiple computer devices distributed at multiple sites and interconnected by a communication network may constitute a blockchain system.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (16)

1. A method of video processing, the method comprising:
acquiring motion information of an image pair in a sample video, wherein the image pair comprises a pair of adjacent images in the sample video, and the motion information represents pixel motion conditions between two frames of images of the image pair;
obtaining a sample image block pair with motion information meeting motion conditions from the image pair, wherein two sample image blocks in the sample image block pair are respectively positioned in two frames of images of the image pair;
and performing model training based on a plurality of pairs of sample image blocks in the sample video to obtain a super-resolution model, wherein the super-resolution model is used for performing super-resolution reconstruction on the video.
2. The video processing method according to claim 1, wherein the obtaining of the motion information of the image pair in the sample video comprises any one of:
acquiring optical flow information of an image pair in the sample video, wherein the optical flow information represents the amount by which pixels of a previous frame image in the image pair move to a next frame image; or,
acquiring time sequence difference information of the image pair in the sample video, wherein the time sequence difference information represents the motion change of two frames of the image pair in time sequence.
3. The video processing method of claim 1, wherein the motion information comprises a motion parameter indicating a degree of pixel motion between two images of the image pair;
the obtaining, from the image pair, a sample image block pair whose motion information satisfies a motion condition includes:
and acquiring a moving image block pair with a motion parameter larger than or equal to a first threshold and a static image block pair with a motion parameter smaller than a second threshold from the image pairs, wherein the first threshold is larger than or equal to the second threshold.
4. The video processing method according to claim 3, wherein the obtaining, from the image pair, of a moving image block pair whose motion parameter is greater than or equal to a first threshold and a still image block pair whose motion parameter is less than a second threshold comprises:
determining, from the image pair, a first pixel whose motion parameter is greater than or equal to the first threshold, and cropping an image block pair of a target size with the first pixel as a starting point to obtain the moving image block pair;
determining a second pixel whose motion parameter is less than the second threshold, and cropping an image block pair of the target size with the second pixel as a starting point to obtain the still image block pair.
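A minimal sketch of the patch selection of claims 3–4, under assumed names and a NumPy representation; the claims do not fix how candidate pixels are enumerated, so scanning in array order is an illustrative choice.

```python
import numpy as np

def extract_patch_pair(frame_a, frame_b, motion, threshold, patch=8, moving=True):
    """Crop a target-size image block pair starting at the first pixel
    whose motion parameter satisfies the condition: >= threshold for a
    moving pair, < threshold for a still pair (claims 3-4).

    Returns (block_a, block_b) taken at the same position in both
    frames, or None if no qualifying pixel leaves room for the patch."""
    h, w = motion.shape
    mask = motion >= threshold if moving else motion < threshold
    ys, xs = np.nonzero(mask)
    for y, x in zip(ys, xs):
        if y + patch <= h and x + patch <= w:   # patch must fit in the frame
            return (frame_a[y:y + patch, x:x + patch],
                    frame_b[y:y + patch, x:x + patch])
    return None
```

Both blocks of the returned pair share one starting pixel, so they are co-located in the two frames of the image pair, as the claims require.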
5. The video processing method according to claim 3 or 4, wherein the first threshold and the second threshold are determined based on the motion parameters of a plurality of pixels included in either image of the image pair.
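Claim 5 only requires that the two thresholds be derived from the pixels' motion parameters; one plausible (hypothetical) realization is percentile cut-offs over the motion map:

```python
import numpy as np

def motion_thresholds(motion: np.ndarray, hi_pct: float = 90.0, lo_pct: float = 50.0):
    """Derive (first_threshold, second_threshold) from the motion
    parameters of the pixels of one image of the pair (claim 5).
    The percentile choice is illustrative; the claim only requires
    first_threshold >= second_threshold."""
    first = float(np.percentile(motion, hi_pct))
    second = float(np.percentile(motion, lo_pct))
    return first, min(second, first)   # enforce first >= second
```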
6. The video processing method according to claim 3 or 4, wherein the number of moving image block pairs is the same as the number of still image block pairs.
7. The video processing method according to claim 1, wherein the sample video comprises a sample video of a first resolution and a sample video of a second resolution, the first resolution being smaller than the second resolution;
the obtaining, from the image pair, a sample image block pair whose motion information satisfies a motion condition includes:
acquiring a first sample image block pair of which the motion information meets the motion condition from the image pair of the sample video with the first resolution, and acquiring a second sample image block pair of which the motion information meets the motion condition from the image pair of the sample video with the second resolution, wherein the first sample image block pair is the sample image block pair with the first resolution, and the second sample image block pair is the sample image block pair with the second resolution;
the performing model training based on the plurality of pairs of sample image blocks in the sample video to obtain the super-resolution model comprises:
performing model training based on the plurality of first sample image block pairs in the sample video of the first resolution and the plurality of second sample image block pairs in the sample video of the second resolution to obtain the super-resolution model.
8. The video processing method according to claim 7, wherein the obtaining a first sample image block pair whose motion information satisfies the motion condition from the image pair of the sample video of the first resolution, and obtaining a second sample image block pair whose motion information satisfies the motion condition from the image pair of the sample video of the second resolution comprises any one of:
acquiring the first sample image block pair from the image pair of the sample video of the first resolution based on the motion information of the image pair in the sample video of the first resolution, and acquiring the second sample image block pair from the image pair of the sample video of the second resolution based on the first sample image block pair; or
acquiring the second sample image block pair from the image pair of the sample video of the second resolution based on the motion information of the image pair in the sample video of the second resolution, and acquiring the first sample image block pair from the image pair of the sample video of the first resolution based on the second sample image block pair; or
performing up-sampling processing on the motion information of the image pair in the sample video of the first resolution to obtain the motion information of the image pair in the sample video of the second resolution, acquiring the second sample image block pair from the image pair of the sample video of the second resolution based on the obtained motion information, and acquiring the first sample image block pair from the image pair of the sample video of the first resolution based on the second sample image block pair.
9. The video processing method according to claim 8, wherein said obtaining the second sample image block pair from the image pair of the sample video of the second resolution based on the first sample image block pair comprises:
acquiring the second sample image block pair from the image pair of the sample video with the second resolution based on a resolution factor and the first sample image block pair, wherein the resolution factor represents a resolution ratio of the second resolution to the first resolution;
the obtaining the first sample image block pair from the image pair of the sample video of the first resolution based on the second sample image block pair comprises:
acquiring the first sample image block pair from the image pair of the sample video of the first resolution based on the resolution factor and the second sample image block pair.
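The coordinate mapping of claims 8–9 reduces to scaling a block's position and size by the resolution factor; these helper names and the integer-scale assumption are illustrative, not from the patent.

```python
def map_patch_to_high_res(y: int, x: int, size: int, scale: int):
    """Scale a first-resolution (low-res) block's top-left corner and
    size by the resolution factor to locate the co-located block in the
    second-resolution (high-res) image pair (claim 9)."""
    return y * scale, x * scale, size * scale

def map_patch_to_low_res(y: int, x: int, size: int, scale: int):
    """Inverse mapping, used when the block pair is located in the
    high-res images first (the second and third options of claim 8)."""
    return y // scale, x // scale, size // scale
```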
10. The video processing method according to claim 7, wherein the performing model training based on the plurality of first sample image block pairs in the sample video of the first resolution and the plurality of second sample image block pairs in the sample video of the second resolution to obtain the super-resolution model comprises:
in the ith iteration process of the model training, inputting the plurality of first sample image block pairs into the super-resolution model determined in the (i-1)th iteration process to obtain an image training result of the ith iteration process, wherein i is a positive integer greater than 1;
adjusting the model parameters of the super-resolution model determined in the (i-1)th iteration process based on the image training result of the ith iteration process and the plurality of second sample image block pairs, performing the (i+1)th iteration process based on the adjusted model parameters, and repeating the iteration process until the training meets a target condition.
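A toy sketch of the claim-10 loop, under heavy simplifying assumptions: the "super-resolution model" here is a single gain applied to a nearest-neighbour upsample, trained by gradient descent, so that the iterate/compare/adjust/stop structure is visible. A real implementation would use a neural network; none of these names or hyper-parameters come from the patent.

```python
import numpy as np

def train_sr_gain(lr_patches, hr_patches, scale=2, lr_rate=0.1,
                  max_iters=200, tol=1e-6):
    """Claim-10 iteration structure with a one-parameter stand-in model.

    Each iteration feeds the first-sample (low-res) block pairs through
    the current model, compares the result against the second-sample
    (high-res) block pairs, adjusts the model parameter, and repeats
    until a target condition (loss change below tol) is met."""
    g = 0.0                       # the single "model parameter"
    prev_loss = np.inf
    for _ in range(1, max_iters + 1):
        loss, grad, n = 0.0, 0.0, 0
        for lr, hr in zip(lr_patches, hr_patches):
            up = np.kron(lr, np.ones((scale, scale)))  # nearest-neighbour upsample
            err = g * up - hr                          # compare with high-res block
            loss += float((err ** 2).mean())
            grad += float(2.0 * (err * up).mean())
            n += 1
        loss /= n
        g -= lr_rate * grad / n                        # adjust the model parameter
        if abs(prev_loss - loss) < tol:                # target condition reached
            break
        prev_loss = loss
    return g
```

The stopping test on the loss change plays the role of the claim's "target condition"; any convergence criterion or fixed iteration budget could be substituted.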
11. The video processing method according to claim 1, wherein before performing model training based on the plurality of pairs of sample image blocks in the sample video to obtain a super-resolution model, the method further comprises any one of:
performing mirror inversion on pixels at the inner edge positions of the sample image block pair, and adding the mirror-inverted pixels to the outer edge positions of the sample image block pair;
copying pixels at the inner edge positions of the sample image block pair, and adding the copied pixels to the outer edge positions of the sample image block pair; or
adding zero pixels at the outer edge positions of the sample image block pair.
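The three padding options of claim 11 correspond directly to NumPy's pad modes; this sketch uses a single block for brevity, with the pad width chosen arbitrarily.

```python
import numpy as np

patch = np.arange(9, dtype=np.float32).reshape(3, 3)
pad = 1  # border width added at the outer edge positions

mirrored = np.pad(patch, pad, mode="reflect")    # mirror-invert the inner-edge pixels
copied   = np.pad(patch, pad, mode="edge")       # copy the inner-edge pixels
zeroed   = np.pad(patch, pad, mode="constant")   # add zero pixels
```

The crop of claim 12 is the inverse step after reconstruction; a plausible (hypothetical) relation is slicing the border back off, e.g. `img[pad:-pad, pad:-pad]` with the width scaled by the resolution factor.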
12. The video processing method according to claim 11, wherein after the model training is performed based on the plurality of pairs of sample image blocks in the sample video to obtain the super-resolution model, the method further comprises:
performing super-resolution reconstruction on the video through the super-resolution model to obtain an initial super-resolution video;
for any frame image in the initial super-resolution video, cropping the pixels at the edge positions of the image, and generating a target super-resolution video of the video based on the cropped image.
13. A video processing apparatus, characterized in that the apparatus comprises:
an information acquisition unit configured to acquire motion information of an image pair in a sample video, the image pair comprising a pair of adjacent images in the sample video, the motion information representing the pixel motion between the two frame images of the image pair;
an image acquisition unit configured to obtain, from the image pair, a sample image block pair whose motion information satisfies a motion condition, the two sample image blocks of the sample image block pair being respectively located in the two frame images of the image pair;
a training unit configured to perform model training based on a plurality of pairs of sample image blocks in the sample video to obtain a super-resolution model, wherein the super-resolution model is used for performing super-resolution reconstruction on a video.
14. A computer device, characterized in that the computer device comprises:
one or more processors;
a memory for storing program code executable by the one or more processors;
wherein the processor is configured to execute the program code to implement the video processing method of any of claims 1 to 12.
15. A computer-readable storage medium, characterized in that program code in the computer-readable storage medium, when executed by a processor of a computer device, enables the computer device to perform the video processing method of any of claims 1 to 12.
16. A computer program product comprising a computer program, characterized in that the computer program realizes the video processing method of any of claims 1 to 12 when executed by a processor.
CN202210458499.5A 2022-04-24 2022-04-24 Video processing method, video processing device, computer equipment and medium Pending CN114897688A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210458499.5A CN114897688A (en) 2022-04-24 2022-04-24 Video processing method, video processing device, computer equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210458499.5A CN114897688A (en) 2022-04-24 2022-04-24 Video processing method, video processing device, computer equipment and medium

Publications (1)

Publication Number Publication Date
CN114897688A true CN114897688A (en) 2022-08-12

Family

ID=82720560

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210458499.5A Pending CN114897688A (en) 2022-04-24 2022-04-24 Video processing method, video processing device, computer equipment and medium

Country Status (1)

Country Link
CN (1) CN114897688A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116757970A (en) * 2023-08-18 2023-09-15 腾讯科技(深圳)有限公司 Training method of video reconstruction model, video reconstruction method, device and equipment
CN116757970B (en) * 2023-08-18 2023-11-17 腾讯科技(深圳)有限公司 Training method of video reconstruction model, video reconstruction method, device and equipment

Similar Documents

Publication Publication Date Title
CN109712224B (en) Virtual scene rendering method and device and intelligent device
CN110097576B (en) Motion information determination method of image feature point, task execution method and equipment
CN108449641B (en) Method, device, computer equipment and storage medium for playing media stream
CN108196755B (en) Background picture display method and device
CN108132790B (en) Method, apparatus and computer storage medium for detecting a garbage code
CN108920606B (en) Map data processing method, map data processing device, terminal equipment and storage medium
CN110839174A (en) Image processing method and device, computer equipment and storage medium
CN111178343A (en) Multimedia resource detection method, device, equipment and medium based on artificial intelligence
CN113763228A (en) Image processing method, image processing device, electronic equipment and storage medium
CN110807769B (en) Image display control method and device
CN110677713B (en) Video image processing method and device and storage medium
CN110572710B (en) Video generation method, device, equipment and storage medium
CN111625315A (en) Page display method and device, electronic equipment and storage medium
CN114897688A (en) Video processing method, video processing device, computer equipment and medium
CN111275607B (en) Interface display method and device, computer equipment and storage medium
CN110232417B (en) Image recognition method and device, computer equipment and computer readable storage medium
CN113301422B (en) Method, terminal and storage medium for acquiring video cover
CN111770339B (en) Video encoding method, device, equipment and storage medium
CN112329909B (en) Method, apparatus and storage medium for generating neural network model
CN110460856B (en) Video encoding method, video encoding device, video encoding apparatus, and computer-readable storage medium
CN109189525B (en) Method, device and equipment for loading sub-page and computer readable storage medium
CN116527993A (en) Video processing method, apparatus, electronic device, storage medium and program product
CN109597951B (en) Information sharing method and device, terminal and storage medium
CN113407774A (en) Cover determining method and device, computer equipment and storage medium
CN111061918A (en) Graph data processing method and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination