CN117058040A - Image quality restoration model training method, device, equipment and storage medium - Google Patents


Publication number: CN117058040A
Authority: CN (China)
Prior art keywords: target, video segment, model, definition video, quality
Legal status: Pending
Application number: CN202311095690.9A
Other languages: Chinese (zh)
Inventors: 刘梦梦, 胡思行, 蒋念娟, 吕江波, 沈小勇
Current and original assignees: Beijing Simou Intelligent Technology Co ltd; Shenzhen Smartmore Technology Co Ltd
Priority application: CN202311095690.9A (pending)


Classifications

    • G06N3/0475 Generative networks
    • G06N3/045 Combinations of networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/048 Activation functions
    • G06N3/094 Adversarial learning
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • Y02T10/40 Engine management systems


Abstract

The invention discloses a training method, apparatus, device, and storage medium for an image quality restoration model. The method comprises the following steps: determining a target high-definition video segment associated with a target repair task; performing degradation processing on the target high-definition video segment to obtain a corresponding target low-quality video segment; and inputting the target low-quality video segment into a pre-constructed first network model to obtain a model output, then training the first network model on that output and the corresponding target high-definition video segment until a preset first model training end condition is met, yielding the target image quality restoration model. Embodiments of the invention improve the restoration effect and restoration quality of video image quality.

Description

Image quality restoration model training method, device, equipment and storage medium
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for training an image quality restoration model.
Background
Animation is a popular art form that plays an important role in film, television, and games. However, many classic animated works were produced at low resolution owing to the technical limitations and cost considerations of their time. Old animations appear blurred and distorted when played on modern high-definition televisions and screens, and suffer from many image quality problems. Processing and restoring such animations to high-definition quality therefore has significant value for the animation industry.
In the prior art, image quality restoration is generally carried out with deep-learning-based real-world video super-resolution methods. However, these methods tend to produce unclean animation restoration results, with problems such as hollow line artifacts and clutter noise, so the restoration effect and restoration quality of the video are poor.
Disclosure of Invention
The invention provides a training method, a training device, training equipment and a storage medium for an image quality restoration model, which are used for improving the restoration effect and restoration quality of video image quality.
According to an aspect of the present invention, there is provided an image quality restoration model training method, the method including:
determining a target high-definition video segment associated with a target repair task;
performing degradation processing on the target high-definition video segment to obtain a target low-quality video segment corresponding to the target high-definition video segment;
and inputting the target low-quality video segment into a first pre-constructed network model to obtain model output, and performing model training on the first network model according to the model output and the corresponding target high-definition video segment until a preset first model training ending condition is met to obtain a target image quality restoration model.
According to another aspect of the present invention, there is provided an image quality restoration model training apparatus including:
the high-definition segment determining module is used for determining a target high-definition video segment associated with a target repair task;
the low-quality segment determining module is used for performing degradation processing on the target high-definition video segment to obtain a target low-quality video segment corresponding to the target high-definition video segment;
the target repair model determining module is used for inputting the target low-quality video segment into a first network model constructed in advance to obtain model output, and performing model training on the first network model according to the model output and the corresponding target high-definition video segment until a preset first model training ending condition is met to obtain a target image quality repair model.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the image quality restoration model training method according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement the image quality restoration model training method according to any one of the embodiments of the present invention when executed.
According to the technical scheme of the invention, a target high-definition video segment associated with a target repair task is determined; degradation processing is performed on it to obtain a corresponding target low-quality video segment; the target low-quality video segment is input into a pre-constructed first network model to obtain a model output; and the first network model is trained on the model output and the corresponding target high-definition video segment until a preset first model training end condition is met, yielding the target image quality restoration model. This achieves accurate training of the first network model and gives the trained target image quality restoration model higher accuracy, thereby improving the restoration effect and restoration quality when the model is used for video image quality restoration.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1A is a flowchart of a training method for an image quality restoration model according to a first embodiment of the present invention;
FIG. 1B is a block diagram of a multi-scale cyclic network of a first network model according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a training method for an image quality restoration model according to a second embodiment of the present invention;
FIG. 3A is a flowchart of a training method for an image quality restoration model according to a third embodiment of the present invention;
FIG. 3B is a diagram of a second network model according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of an image quality restoration model training device according to a fourth embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device implementing the image quality restoration model training method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1A is a flowchart of an image quality restoration model training method according to an embodiment of the present invention, where the method may be performed by an image quality restoration model training device, and the image quality restoration model training device may be implemented in hardware and/or software, and the image quality restoration model training device may be configured in an electronic device. As shown in fig. 1A, the method includes:
s110, determining a target high-definition video segment associated with the target repair task.
The target repair task may be a task in which video image quality is to be repaired. The videos to be repaired in different target repair tasks are of different video types, and image frames of different video types have different image characteristics. Therefore, a target high-definition video segment conforming to the image characteristics of the video to be repaired of the target repair task can be determined.
The target high-definition video segment may be a video clip assembled from high-definition image frames, whose image characteristics are consistent with those of the video to be repaired in the target repair task. Specifically, at least one high-definition video segment is cut from a high-definition video whose image characteristics are consistent with those of the video to be repaired, and the cut segments are taken as target high-definition video segments.
Illustratively, determining image features of an image frame of a video to be repaired in a target repair task; acquiring a high-definition video with the same video type as the video to be repaired or similar texture characteristics; and intercepting at least one video clip from the high-definition video as a target high-definition video clip. For example, the video type of the video to be repaired is an animation video, and the texture features of the video image of the video to be repaired are rich, and then a high-definition video which is the same as the video type of the video to be repaired and has rich image feature textures can be selected from the high-definition video database.
S120, performing degradation processing on the target high-definition video segment to obtain a target low-quality video segment corresponding to the target high-definition video segment.
Degradation processing reduces the picture quality of high-definition video. There may be multiple target high-definition video segments, each corresponding to its own target low-quality video segment.
For example, a preset degradation processing mode may be adopted to perform degradation processing on the target high-definition video segment, so as to obtain the corresponding target low-quality video segment. The degradation processing mode may be preset by a person skilled in the art; for example, it may be resolution degradation. Specifically, different low resolutions can be preset, and the versions of the target high-definition video segment at these low resolutions are taken as target low-quality video segments.
It should be noted that, in order to degrade the target high-definition video segment accurately and to different extents, so that the segment covers a variety of degradation features and the resulting target low-quality video segment is more realistic, the degradation processing may also be performed in the following manner.
In an alternative embodiment, performing degradation processing on the target high-definition video segment to obtain the corresponding target low-quality video segment includes: obtaining at least one degradation processing mode; combining the degradation processing modes to obtain at least one combined degradation processing mode; and performing degradation processing on the target high-definition video segment according to the combined degradation processing mode to obtain the target low-quality video segment corresponding to the target high-definition video segment. The degradation processing modes include at least one of a video compression processing mode, an image compression processing mode, a downsampling processing mode, a jaggy (aliasing) adding processing mode, a noise adding processing mode, and a blurring processing mode.
Video compression may be performed by encoding and decoding the video with FFmpeg (Fast Forward MPEG, a multimedia framework for video encoding and decoding). Image compression may be based on the JPEG (Joint Photographic Experts Group) compression algorithm. Downsampling may be random, i.e. with different factors, for example 2-times or 4-times downsampling. Noise addition may include Gaussian color-channel noise (Gaussian Color Noise) and Gaussian gray-channel noise (Gaussian Gray Noise). Blurring may generate different blur data based on different kernels, for example a Gaussian blur kernel, a bivariate generalized Gaussian kernel, and an anisotropic flat (plateau-shaped) kernel. It should be noted that the degradation processing modes are not limited to those described in this embodiment.
A combined degradation processing mode may be obtained by combining the degradation processing modes in a predetermined order. For example, the combined order may be image compression, downsampling, blurring, video compression, noise addition, and jaggy addition.
In an optional embodiment, performing degradation processing on the target high-definition video segment according to the combined degradation processing mode to obtain the corresponding target low-quality video segment may proceed as follows: performing video compression on the target high-definition video segment to obtain a first degraded video segment; performing image compression on the first degraded video segment to obtain a second degraded video segment; downsampling the second degraded video segment to obtain a third degraded video segment; adding jaggies to the third degraded video segment to obtain a fourth degraded video segment; adding noise to the fourth degraded video segment to obtain a fifth degraded video segment; and blurring the fifth degraded video segment to obtain the target low-quality video segment.
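As an illustration of the combined degradation pipeline above, the following is a minimal per-frame sketch. It is not the patent's reference implementation: the OpenCV-based helpers, parameter ranges, and kernel sizes are assumptions, and the per-clip video compression step (an FFmpeg encode/decode round trip) is only indicated in a comment.

```python
import random

import cv2
import numpy as np

def jpeg_compress(img, quality_range=(30, 90)):
    """Image compression degradation: JPEG encode/decode at a random quality."""
    q = random.randint(*quality_range)
    ok, buf = cv2.imencode(".jpg", img, [cv2.IMWRITE_JPEG_QUALITY, q])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)

def downsample(img, factors=(2, 4)):
    """Random-factor downsampling, e.g. 2x or 4x."""
    f = random.choice(factors)
    h, w = img.shape[:2]
    return cv2.resize(img, (w // f, h // f), interpolation=cv2.INTER_AREA)

def add_jaggies(img):
    """Jaggy (aliasing) degradation via nearest-neighbour down/up resizing."""
    h, w = img.shape[:2]
    small = cv2.resize(img, (w // 2, h // 2), interpolation=cv2.INTER_NEAREST)
    return cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)

def add_gaussian_noise(img, sigma_range=(1.0, 10.0), gray=False):
    """Gaussian color-channel or gray-channel noise."""
    sigma = random.uniform(*sigma_range)
    shape = img.shape[:2] + (1,) if gray else img.shape
    noise = np.random.normal(0.0, sigma, shape)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

def gaussian_blur(img, ksizes=(3, 5, 7)):
    """Blur degradation with a randomly sized Gaussian kernel."""
    k = random.choice(ksizes)
    return cv2.GaussianBlur(img, (k, k), 0)

def degrade_frame(img):
    """One combined order from the embodiment: video compression (applied
    per clip with FFmpeg, outside this per-frame sketch) -> image
    compression -> downsampling -> jaggies -> noise -> blur."""
    img = jpeg_compress(img)
    img = downsample(img)
    img = add_jaggies(img)
    img = add_gaussian_noise(img, gray=random.random() < 0.5)
    img = gaussian_blur(img)
    return img
```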
S130, inputting the target low-quality video segment into a first network model constructed in advance to obtain model output, and performing model training on the first network model according to the model output and the corresponding target high-definition video segment until a preset first model training ending condition is met to obtain a target image quality restoration model.
The first network model may be a network model for performing image quality restoration; it may be constructed in advance by a person skilled in the art or may be an existing image quality restoration model. For example, the first network model may be an RLSP (Recurrent Latent Space Propagation) network model.
For example, the target low-quality video segment may be input into the pre-constructed first network model to obtain a model output, i.e. a predicted high-definition video segment. The first network model is then trained on the predicted high-definition video segment and the target high-definition video segment until a preset first model training end condition is met, yielding the target image quality restoration model. The first model training end condition may be preset by a related technician; for example, it may be reaching a set threshold on the number of model iterations, such as 100.
The target high-definition video segment serves as ground-truth data for the target low-quality video segment; during training it participates as the label data of the target low-quality video segment, against which the predicted high-definition video segment output by the model is compared. To further improve the accuracy of the trained target image quality restoration model, training can also be ended once the model's output is sufficiently accurate.
In an alternative embodiment, performing model training on the first network model according to the model output and the corresponding target high-definition video segment until a preset first model training end condition is met, to obtain a target image quality repair model, including: determining a first loss value and a second loss value according to the model output of the first network model and the corresponding target high-definition video segment; determining whether the first network model meets a preset first model training ending condition according to the first loss value and the second loss value; if yes, the first network model after model training is finished is taken as a target image quality restoration model.
Wherein the first loss value may be an L1 loss value and the second loss value may be a perceptual loss value. The first model training end condition may be that the first and second loss values stabilize or reach a preset loss threshold, at which point model training may be terminated.
Illustratively, the first loss value and the second loss value are determined, based on preset loss functions, from the predicted high-definition video segment output by the first network model and the target high-definition video segment. If both loss values stabilize or reach the preset loss threshold, training is stopped and the target image quality restoration model is obtained; otherwise, training of the first network model continues until the first model training end condition is met.
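A minimal training-step sketch follows, under stated assumptions: the perceptual (second) loss is computed on VGG-19 features, a common but here assumed choice, and the clip tensor layout, loss weight, and omitted input normalization are likewise assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg19

class PerceptualLoss(nn.Module):
    """Second loss value: L1 distance between deep feature maps.
    VGG-19 is an assumed backbone; the patent does not fix one."""
    def __init__(self, layer_idx=35):
        super().__init__()
        self.features = vgg19(weights="IMAGENET1K_V1").features[:layer_idx].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)

    def forward(self, pred, target):
        return F.l1_loss(self.features(pred), self.features(target))

def train_step(model, optimizer, lq_clip, hq_clip, perc_loss, w_perc=1.0):
    """One iteration on (B, T, C, H, W) clips: first loss value (L1) plus
    weighted second loss value (perceptual), then a gradient step."""
    optimizer.zero_grad()
    pred = model(lq_clip)                      # predicted high-definition segment
    b, t, c, h, w = pred.shape
    loss_l1 = F.l1_loss(pred, hq_clip)                       # first loss value
    loss_p = perc_loss(pred.reshape(b * t, c, h, w),
                       hq_clip.reshape(b * t, c, h, w))      # second loss value
    loss = loss_l1 + w_perc * loss_p
    loss.backward()
    optimizer.step()
    return loss_l1.item(), loss_p.item()
```

Training would stop once both loss values stabilize or fall below the preset loss threshold, per the end condition above.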
In order to fully extract the image features of the video frames during training, the internal structure of the RLSP cell module can be optimized and improved on the basis of the RLSP model: a multi-scale design is adopted, and feature fusion at different scales is fully exploited, thereby improving the image quality restoration effect for video.
In an alternative embodiment, the first network model comprises a multi-scale cyclic network; the multi-scale circulation network comprises a plurality of circulation branches, and the branch scales of the circulation branches are different; at least one residual block for extracting the image information features is respectively arranged on each circulation branch.
The multi-scale recurrent network may be a network obtained by partially improving the structure of the RLSP cell module in an RLSP model. The scales of the recurrent branches differ; for example, four branches may be included, assigned four feature extraction scales such as ×1, ×0.5, ×0.25, and ×0.125.
A block diagram of the multi-scale recurrent network of the first network model is shown in fig. 1B. The network comprises 15 residual blocks (ResBlk) distributed over the four scales ×1, ×0.5, ×0.25, and ×0.125. Each branch receives the output of the feature fusion layer, downsamples it to its own scale, extracts features through its residual blocks, and samples the result back to the initial size, so that features at different scales are fused and the extracted features are more diverse. The structure is a unidirectional recurrent one: in each time step, the recurrent block can only access the hidden state of the previous time step and the output of the previous frame. This simple and efficient structure is highly practical, has low computational overhead, and is well suited to the super-resolution repair task for animated video.
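The following sketch shows one way the cell of fig. 1B could be organized. The channel width, the split of the 15 ResBlk blocks across the four branches, and bilinear resampling are assumptions; only the multi-scale structure itself is taken from the figure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlk(nn.Module):
    """Plain residual block: two 3x3 convolutions with a skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class MultiScaleCell(nn.Module):
    """Recurrent cell sketch: a fusion convolution over the concatenated
    current frame, previous output and hidden state, then residual branches
    at x1, x0.5, x0.25 and x0.125 whose re-upsampled outputs are summed."""
    SCALES = (1.0, 0.5, 0.25, 0.125)

    def __init__(self, in_ch, ch=64, blocks_per_branch=(6, 3, 3, 3)):
        super().__init__()
        self.fusion = nn.Conv2d(in_ch, ch, 3, padding=1)  # feature fusion layer
        self.branches = nn.ModuleList(
            nn.Sequential(*[ResBlk(ch) for _ in range(n)])
            for n in blocks_per_branch)  # 6+3+3+3 = 15 ResBlks (assumed split)

    def forward(self, fused_in):
        x = self.fusion(fused_in)
        h, w = x.shape[-2:]
        out = torch.zeros_like(x)
        for scale, branch in zip(self.SCALES, self.branches):
            y = x if scale == 1.0 else F.interpolate(
                x, scale_factor=scale, mode="bilinear", align_corners=False)
            y = branch(y)                     # feature extraction at this scale
            if scale != 1.0:                  # sample back to the initial size
                y = F.interpolate(y, size=(h, w), mode="bilinear",
                                  align_corners=False)
            out = out + y                     # fuse features across scales
        return out
```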
The content detail of animated video usually consists of elements such as lines, outlines, and flat color blocks, and differs greatly from real-world video. Appropriate resolution adjustment does not harm the preservation of detail and can reduce picture artifacts, which helps keep the animation natural and improves quality during super-resolution recovery. The multi-scale design fuses features of different scales, so the details and structures of the animated video are recovered better, and the recurrent structure effectively exploits temporal information together with multi-scale feature fusion, improving super-resolution recovery performance.
According to the technical scheme of this embodiment, a target high-definition video segment associated with a target repair task is determined; degradation processing is performed on it to obtain a corresponding target low-quality video segment; the target low-quality video segment is input into a pre-constructed first network model to obtain a model output; and the first network model is trained on the model output and the corresponding target high-definition video segment until a preset first model training end condition is met, yielding the target image quality restoration model. This achieves accurate training of the first network model and gives the trained target image quality restoration model higher accuracy, thereby improving the restoration effect and restoration quality when the model is used for video image quality restoration.
Example two
Fig. 2 is a flowchart of a training method for an image quality restoration model according to a second embodiment of the present invention, where the present embodiment is optimized and improved based on the above technical solutions.
Further, the step of determining the target high-definition video segment associated with the target repair task is refined into: obtaining high-definition animation video data of the same video type as the video to be repaired of the target repair task; extracting image frames from the high-definition animation video data according to a preset number of image frames to obtain candidate high-definition video segments; and selecting the target high-definition video segments from the candidates according to the target repair task. This completes the determination of the target high-definition video segment. For parts of this embodiment not described in detail, refer to the descriptions of the other embodiments.
As shown in fig. 2, the method comprises the following specific steps:
s210, obtaining high-definition video data with the same video type as the video to be repaired of the target repair task.
For example, high-definition animation video data of the same video type as the video to be repaired of the target repair task can be obtained. For example, if the video type of the video to be repaired of the target repair task is an animated video type, high definition video data of the animated video type may be acquired.
And S220, extracting image frames of the high-definition video data according to the preset image frame number to obtain candidate high-definition video fragments.
The preset image frame number can be preset by a related technician according to actual requirements. For example, the preset image frame number may be 100 frames.
For example, image frame extraction may be performed on the high-definition video data according to the preset number of image frames to obtain candidate high-definition video segments. It should be noted that frame extraction takes consecutive image frames from the high-definition video data, and at least one candidate high-definition video segment may be extracted from the same high-definition video data; a minimal extraction sketch follows.
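The sketch below assumes OpenCV and a non-overlapping stride; the patent allows multiple, possibly overlapping, clips per video.

```python
import cv2

def extract_clips(video_path, clip_len=100, stride=100):
    """Cut a high-definition video into candidate clips of `clip_len`
    consecutive frames; stride < clip_len would give overlapping clips."""
    cap = cv2.VideoCapture(video_path)
    clips, frames = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
        if len(frames) == clip_len:
            clips.append(list(frames))
            frames = frames[stride:]  # empty when stride >= clip_len
    cap.release()
    return clips
```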
S230, selecting the target high-definition video segments from the candidate high-definition video segments according to the target repair task.
It should be noted that, the requirements for selecting the target high-definition video clips for different target repair tasks are different. For example, for target repair tasks with rich picture content, the requirements for transition frames and image textures are higher, while for target repair tasks with simple picture content, the requirements for transition frames and image textures are lower. Therefore, the selection modes of the target high-definition video clips are different according to different repair tasks.
In an alternative embodiment, selecting the target high definition video clip from the candidate high definition video clips according to the target repair task includes: performing transition frame detection on the candidate high-definition video segments to obtain a transition frame detection result; determining a target gradient value of the candidate high-definition video segment according to the image gradient value corresponding to the image frame of the candidate high-definition video segment; and selecting the target high-definition video segment from the candidate high-definition video segments based on the task type of the target repair task according to the transition frame detection result and the target gradient value of the candidate high-definition video segment.
For example, for a task requiring detection of a transition frame, in order to avoid occurrence of a video segment with a transition frame in a sample training set, the detection of the transition frame may be performed on a candidate high-definition video segment, so as to obtain a detection result of the transition frame. The transition frame detection result may include that a transition frame exists in the video segment and that a transition frame does not exist in the video segment.
The image gradient value characterizes the richness of the image texture: the richer the texture, the higher the gradient value; the smoother the texture, the lower the gradient value.
For example, the image gradient value corresponding to each image frame in the candidate high-definition video segment may be determined, and the average of these per-frame gradient values taken as the target gradient value of the candidate segment. The larger the target gradient value, the richer the texture features of the candidate high-definition video segment; the smaller it is, the simpler the texture features. A sketch of both computations appears below.
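This sketch assumes a Sobel-magnitude gradient and a naive mean-frame-difference cut detector; the operators and the threshold are assumptions, as the patent does not specify them.

```python
import cv2
import numpy as np

def frame_gradient(frame):
    """Image gradient value of one frame: mean Sobel gradient magnitude,
    used as a texture-richness proxy."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    return float(np.mean(np.sqrt(gx * gx + gy * gy)))

def clip_target_gradient(clip):
    """Target gradient value: average of the per-frame gradient values."""
    return sum(frame_gradient(f) for f in clip) / len(clip)

def has_transition_frame(clip, diff_thresh=40.0):
    """Naive transition (shot-cut) detector: flag a clip when the mean
    absolute difference between consecutive frames exceeds a threshold."""
    prev = None
    for frame in clip:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        if prev is not None and float(np.mean(np.abs(gray - prev))) > diff_thresh:
            return True
        prev = gray
    return False
```

A texture-rich, transition-free task would then keep clips for which `has_transition_frame(clip)` is false and `clip_target_gradient(clip)` exceeds a task-dependent threshold, matching the selection logic described next.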
The task type can be determined from the video to be repaired of the target repair task. If the video image textures of the video to be repaired are rich, the task type includes determination of the target gradient value; if they are simple, it need not. The task type may also specify whether transition frames are permitted.
For example, if the task type of the target repair task requires high-definition video segments with rich textures and no transition frames, then candidate high-definition video segments whose transition frame detection result indicates no transition frame and whose target gradient value is large are selected as the target high-definition video segments.
Optionally, for candidate high-definition video segments that contain transition frames, if the task requires transition-free segments, the transition frames can be removed from those segments to obtain target high-definition video segments without transition frames.
In this way, the target high-definition video segments are selected from the candidates based on the task type of the target repair task, using the transition frame detection results and target gradient values. The selection is therefore targeted: the task requirements of the target repair task are fully considered, and combining the transition frame results with the target gradient values achieves accurate selection of the target high-definition video segments.
Alternatively, when the target low-quality video segment and the target high-definition video segment are used as the sample training set, the transition frame and the target gradient value can be used as sample label (label) data in the sample set.
S240, performing degradation processing on the target high-definition video segment to obtain a target low-quality video segment corresponding to the target high-definition video segment.
S250, inputting the target low-quality video segment into a first network model constructed in advance to obtain model output, and carrying out model training on the first network model according to the model output and the corresponding target high-definition video segment until a preset first model training ending condition is met to obtain a target image quality restoration model.
According to the technical scheme of this embodiment, high-definition animation video data of the same video type as the video to be repaired is obtained; image frames are extracted according to the preset number of image frames to obtain candidate high-definition video segments; and the target high-definition video segments are selected from the candidates according to the target repair task. This realizes targeted selection of the target high-definition video segments, improves the accurate construction of the sample training set, and further improves the training accuracy of the target image quality restoration model.
Example III
Fig. 3A is a flowchart of an image quality restoration model training method according to a third embodiment of the present invention, where optimization and improvement are performed based on the above technical solutions.
Further, the step of performing degradation processing on the target high-definition video segment to obtain the corresponding target low-quality video segment is refined into: inputting the target high-definition video segment into a pre-trained target degradation model to obtain a target reference degraded video segment; and obtaining the target low-quality video segment corresponding to the target high-definition video segment from the target reference degraded video segment based on a preset degradation processing mode. This completes the generation of the target low-quality video segment. For parts of this embodiment not described in detail, refer to the descriptions of the other embodiments.
As shown in fig. 3A, the method comprises the following specific steps:
s310, determining a target high-definition video segment associated with the target repair task.
S320, inputting the target high-definition video segment into a pre-trained target degradation model to obtain a target reference degraded video segment.
The target degradation model may be trained in advance by related technicians; it may be an existing video degradation model, or a model obtained by improving an existing video degradation model and then training the improved model.
The method comprises the steps of inputting a target high-definition video segment into a target degradation model obtained through training in advance, obtaining a model output result of the target degradation model, and taking the model output result as a target reference degradation video segment.
It should be noted that, in order to further improve the segment quality of the target reference degraded video segment, the target degraded model may also be obtained in the following manner, and by improving the accuracy of the model, the accuracy of the output result of the model is improved.
In an alternative embodiment, the target degradation model is trained as follows: acquiring reference low-quality video data, and extracting image frames from it to obtain reference low-quality video segments; inputting a reference low-quality video segment into the target image quality restoration model to obtain the corresponding reference high-definition video segment; and inputting the reference high-definition video segment into a pre-constructed second network model to obtain a model output, then training the second network model on the model output and the corresponding reference low-quality video segment until a preset second model training end condition is met, yielding the target degradation model.
The acquired reference low-quality video data needs to have video diversity, and specifically includes different degradation features, such as blurring, noise, compression, and the like, so that different types of degradation features can be captured.
For example, image frames may be extracted from the reference low-quality video data according to a preset number of image frames to obtain reference low-quality video segments, and a reference low-quality video segment is input into the target image quality restoration model to obtain the corresponding reference high-definition video segment. It can be understood that, to obtain degradation effects of different degrees, the reference low-quality video segment may undergo resolution adjustment by different factors, for example 2 times, 4 times, or 8 times; among the reference high-definition video segments output by the model for these factors, the output closest to the high-definition level required by the target repair task is selected as the reference high-definition video segment.
The L1 loss value, the perceptual loss value, and the GAN (Generative Adversarial Network) loss value may be obtained from the output low-quality video segment of the model and the reference low-quality video segment.
The second model training end condition may be that the L1 loss value, the perceptual loss value, and the GAN loss value stabilize or reach a preset loss threshold, at which point model training may be terminated.
Illustratively, the L1 loss value, the perceptual loss value, and the GAN loss value are determined, based on preset loss functions, from the output low-quality video segment of the second network model and the reference low-quality video segment. If all three stabilize or reach the preset loss threshold, training is stopped and the target degradation model is obtained; otherwise, training of the second network model continues until the second model training end condition is met.
The second network model may be an existing network model or one constructed in advance by related technicians. A model structure diagram of the second network model is shown in fig. 3B. The second network model comprises a pixel unshuffle layer and four 3 × 3 convolutional layers, with an activation function between the convolutional layers. The convolutional layers extract features; the pixel unshuffle layer converts the high-resolution image to a low-resolution one.
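A sketch of the fig. 3B generator follows. The pixel unshuffle layer and the four 3 × 3 convolutions are taken from the figure; the channel width, downscale factor, and LeakyReLU activation are assumptions.

```python
import torch.nn as nn

class DegradationNet(nn.Module):
    """Second network model sketch: pixel unshuffle maps the high-resolution
    input to low resolution, then four 3x3 convolutions with activations
    in between predict the degraded (low-quality) frame."""
    def __init__(self, scale=2, ch=64):
        super().__init__()
        in_ch = 3 * scale * scale           # channels after pixel unshuffle
        self.net = nn.Sequential(
            nn.PixelUnshuffle(scale),
            nn.Conv2d(in_ch, ch, 3, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, hd_frame):            # (B, 3, H, W) -> (B, 3, H/s, W/s)
        return self.net(hd_frame)
```

During training, the network's output would be compared with the reference low-quality segment under the L1, perceptual, and GAN losses described above, with a discriminator (not shown) providing the adversarial signal.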
S330, obtaining the target low-quality video segment corresponding to the target high-definition video segment from the target reference degraded video segment based on a preset degradation processing mode.
The degradation processing mode may include at least one of a video compression processing mode, an image compression processing mode, a downsampling processing mode, a jaggy adding processing mode, a noise adding processing mode, and a blurring processing mode.
For example, the target reference degraded video segment may be subjected to secondary degradation using the degradation processing modes to obtain the target low-quality video segment.
In an alternative embodiment: the target reference degraded video segment is downsampled to obtain a first reference degraded video segment; the first reference degraded video segment is input into the target degradation model again to obtain a second reference degraded video segment; the second reference degraded video segment is downsampled again to obtain a third reference degraded video segment; jaggies are added to the third segment to obtain a fourth; noise is added to the fourth to obtain a fifth; the fifth is blurred to obtain a sixth; and the sixth reference degraded video segment is video-compressed to obtain the target low-quality video segment. A sketch of this chain follows.
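This sketch reuses the classical degradation helpers from the first embodiment and the DegradationNet sketch above; the `to_tensor`/`to_image` converters between uint8 frames and batched float tensors are assumed.

```python
import torch

def two_stage_degrade(clip_frames, degrade_net, to_tensor, to_image):
    """Secondary degradation in the embodiment's order: downsample ->
    learned degradation model -> downsample -> jaggies -> noise -> blur,
    followed by per-clip video compression (indicated in a comment)."""
    out = []
    for frame in clip_frames:
        x = downsample(frame)                        # first reference segment
        with torch.no_grad():
            x = to_image(degrade_net(to_tensor(x)))  # second (model pass)
        x = downsample(x)                            # third
        x = add_jaggies(x)                           # fourth
        x = add_gaussian_noise(x)                    # fifth
        x = gaussian_blur(x)                         # sixth
        out.append(x)
    # Video compression would then be applied to `out` as a whole, e.g. by
    # an FFmpeg encode/decode round trip, yielding the target low-quality
    # video segment.
    return out
```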
S340, inputting the target low-quality video segment into a first network model constructed in advance to obtain model output, and carrying out model training on the first network model according to the model output and the corresponding target high-definition video segment until a preset first model training ending condition is met to obtain a target image quality restoration model.
According to the technical scheme of this embodiment, the target high-definition video segment is input into the pre-trained target degradation model to obtain a target reference degraded video segment, and the target low-quality video segment corresponding to the target high-definition video segment is then obtained from it based on a preset degradation processing mode. Introducing the reference high-definition video data is equivalent to introducing pseudo high-quality sample data, which guides the network during training to generate low-quality video data that better matches real degradation. The degradation of actual low-quality video can thus be simulated better, and the sample training set comes closer to the real behavior of genuine low-quality video data, improving the target image quality restoration model's ability to handle various kinds of quality damage.
Example IV
Fig. 4 is a schematic structural diagram of an image quality restoration model training apparatus according to the fourth embodiment of the present invention. The apparatus provided by this embodiment is applicable to restoring the image quality of low-quality video; it may be implemented in hardware and/or software and, as shown in fig. 4, specifically comprises: a high-definition segment determining module 401, a low-quality segment determining module 402, and a target repair model determining module 403.
Wherein,
a high definition segment determining module 401, configured to determine a target high definition video segment associated with a target repair task;
the low-quality segment determining module 402 is configured to perform degradation processing on the target high-definition video segment to obtain a target low-quality video segment corresponding to the target high-definition video segment;
the target repair model determining module 403 is configured to input the target low-quality video segment to a first network model constructed in advance to obtain a model output, and perform model training on the first network model according to the model output and the corresponding target high-definition video segment until a preset first model training end condition is met, to obtain a target image quality repair model.
According to the technical scheme of this embodiment, a target high-definition video segment associated with a target repair task is determined; degradation processing is performed on it to obtain a corresponding target low-quality video segment; the target low-quality video segment is input into a pre-constructed first network model to obtain a model output; and the first network model is trained on the model output and the corresponding target high-definition video segment until a preset first model training end condition is met, yielding the target image quality restoration model. This achieves accurate training of the first network model and gives the trained target image quality restoration model higher accuracy, thereby improving the restoration effect and restoration quality when the model is used for video image quality restoration.
Optionally, the high-definition segment determining module 401 includes:
the high-definition video data acquisition unit is used for acquiring high-definition video data with the same video type as the video to be repaired of the target repair task;
the candidate high-definition segment extraction unit is used for extracting image frames of the high-definition video data according to the preset image frame number to obtain candidate high-definition video segments;
and the high-definition segment determining unit is used for selecting the target high-definition video segment from the candidate high-definition video segments according to the target repair task.
Optionally, the high-definition segment determining unit includes:
the detection result determining subunit is used for carrying out transition frame detection on the candidate high-definition video segments to obtain a transition frame detection result; the method comprises the steps of,
the gradient value determining subunit is used for determining a target gradient value of the candidate high-definition video segment according to the image gradient value corresponding to the image frame of the candidate high-definition video segment;
and the high-definition segment determining subunit is used for selecting the target high-definition video segment from the candidate high-definition video segments based on the task type of the target repair task according to the transition frame detection result and the target gradient value of the candidate high-definition video segment.
Optionally, the low-quality fragment determining module 402 includes:
a degradation processing mode obtaining unit for obtaining at least one degradation processing mode;
a combination mode determining unit, configured to perform mode combination on each degradation processing mode to obtain at least one combination degradation processing mode;
the first low-quality segment determining unit is used for performing degradation processing on the target high-definition video segment according to the combined degradation processing mode, to obtain the target low-quality video segment corresponding to the target high-definition video segment;
the degradation processing mode comprises at least one of a video compression processing mode, an image compression processing mode, a downsampling processing mode, a sawtooth adding processing mode, a noise adding processing mode and a blurring processing mode.
Optionally, the low-quality fragment determining module 402 includes:
the reference degradation segment determining unit is used for inputting the target high-definition video segment into a target degradation model obtained through training in advance to obtain a target reference degradation video segment;
and the second low-quality segment determining unit is used for obtaining a target low-quality video segment corresponding to the target high-definition video segment based on a preset degradation processing mode according to the target reference degradation video segment.
Optionally, the target degradation model is obtained through training in the following manner:
acquiring reference low-quality video data, and extracting image frames of the reference low-quality video data to obtain a reference low-quality video segment;
inputting the reference low-quality video segment into the target image quality restoration model to obtain a reference high-definition video segment corresponding to the reference low-quality video segment;
and inputting the reference high-definition video segment into a pre-constructed second network model to obtain a model output, and training the second network model on the model output and the corresponding reference low-quality video segment until a preset second model training end condition is met, to obtain the target degradation model.
Optionally, the first network model includes a multi-scale loop network; the multi-scale circulation network comprises a plurality of circulation branches, and the branch scales of the circulation branches are different; and at least one residual block for extracting the image information features is respectively arranged on each circulation branch.
Optionally, the target repair model determining module 403 includes:
the loss value determining unit is used for determining a first loss value and a second loss value according to the model output of the first network model and the corresponding target high-definition video segment;
the end condition judging unit is used for determining whether the first network model meets the preset first model training end condition according to the first loss value and the second loss value;
and the target repair model determining unit is used for taking the first network model after model training is finished as a target image quality repair model if a preset first model training finishing condition is met.
The image quality restoration model training device provided by the embodiment of the invention can execute the image quality restoration model training method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example five
Fig. 5 shows a schematic diagram of an electronic device 50 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic equipment may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 5, the electronic device 50 includes at least one processor 51, and a memory, such as a Read Only Memory (ROM) 52, a Random Access Memory (RAM) 53, etc., communicatively connected to the at least one processor 51, in which the memory stores a computer program executable by the at least one processor, and the processor 51 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 52 or the computer program loaded from the storage unit 58 into the Random Access Memory (RAM) 53. In the RAM 53, various programs and data required for the operation of the electronic device 50 can also be stored. The processor 51, the ROM 52 and the RAM 53 are connected to each other via a bus 54. An input/output (I/O) interface 55 is also connected to bus 54.
Various components in the electronic device 50 are connected to the I/O interface 55, including: an input unit 56, such as a keyboard or a mouse; an output unit 57, such as various types of displays and speakers; a storage unit 58, such as a magnetic disk or an optical disk; and a communication unit 59, such as a network card, a modem, or a wireless communication transceiver. The communication unit 59 allows the electronic device 50 to exchange information/data with other devices via a computer network such as the Internet and/or various telecommunication networks.
The processor 51 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 51 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, digital signal processors (DSPs), and any suitable processor, controller, or microcontroller. The processor 51 performs the various methods and processes described above, such as the image quality restoration model training method.
In some embodiments, the image quality restoration model training method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 58. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 50 via the ROM 52 and/or the communication unit 59. When the computer program is loaded into the RAM 53 and executed by the processor 51, one or more steps of the image quality restoration model training method described above may be performed. Alternatively, in other embodiments, the processor 51 may be configured to perform the image quality restoration model training method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may be implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor capable of receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowcharts and/or block diagrams to be implemented. A computer program may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service scalability found in traditional physical hosts and VPS services.
It should be appreciated that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (11)

1. An image quality restoration model training method, characterized by comprising:
determining a target high-definition video segment associated with a target repair task;
performing degradation processing on the target high-definition video segment to obtain a target low-quality video segment corresponding to the target high-definition video segment;
and inputting the target low-quality video segment into a pre-constructed first network model to obtain a model output, and performing model training on the first network model according to the model output and the corresponding target high-definition video segment until a preset first model training ending condition is met, so as to obtain a target image quality restoration model.
2. The method of claim 1, wherein the determining a target high-definition video segment associated with a target repair task comprises:
acquiring high-definition video data of the same video type as the video to be repaired in the target repair task;
extracting image frames from the high-definition video data according to a preset number of image frames to obtain candidate high-definition video segments;
and selecting the target high-definition video segment from the candidate high-definition video segments according to the target repair task.
3. The method of claim 2, wherein the selecting the target high-definition video segment from the candidate high-definition video segments according to the target repair task comprises:
performing transition frame detection on the candidate high-definition video segments to obtain a transition frame detection result, and determining a target gradient value of each candidate high-definition video segment according to image gradient values corresponding to the image frames of the candidate high-definition video segment;
and selecting the target high-definition video segment from the candidate high-definition video segments, based on the task type of the target repair task, according to the transition frame detection result and the target gradient values of the candidate high-definition video segments.
4. The method of claim 1, wherein the performing degradation processing on the target high-definition video segment to obtain a target low-quality video segment corresponding to the target high-definition video segment comprises:
obtaining at least one degradation processing mode;
combining the degradation processing modes to obtain at least one combined degradation processing mode;
and performing degradation processing on the target high-definition video segment according to the combined degradation processing mode to obtain the target low-quality video segment corresponding to the target high-definition video segment;
wherein the degradation processing mode comprises at least one of a video compression processing mode, an image compression processing mode, a downsampling processing mode, an aliasing (jagged-edge) addition processing mode, a noise addition processing mode, and a blurring processing mode (an illustrative sketch of such combined degradation follows the claims).
5. The method of claim 1, wherein the performing degradation processing on the target high-definition video segment to obtain a target low-quality video segment corresponding to the target high-definition video segment comprises:
inputting the target high-definition video segment into a pre-trained target degradation model to obtain a target reference degraded video segment;
and obtaining the target low-quality video segment corresponding to the target high-definition video segment from the target reference degraded video segment based on a preset degradation processing mode.
6. The method of claim 5, wherein the target degradation model is trained by:
acquiring reference low-quality video data, and extracting image frames of the reference low-quality video data to obtain a reference low-quality video segment;
inputting the reference low-quality video segment into the target image quality restoration model to obtain a reference high-definition video segment corresponding to the reference low-quality video segment;
and inputting the reference high-definition video segment into a pre-constructed second network model to obtain a model output, and performing model training on the second network model according to the model output and the corresponding reference low-quality video segment until a preset second model training ending condition is met, so as to obtain the target degradation model.
7. The method of any of claims 1-6, wherein the first network model comprises a multi-scale recurrent network; the multi-scale recurrent network comprises a plurality of recurrent branches with different branch scales; and each recurrent branch is provided with at least one residual block for extracting image-information features.
8. The method according to any one of claims 1-6, wherein the performing model training on the first network model according to the model output and the corresponding target high-definition video segment until a preset first model training ending condition is met, so as to obtain a target image quality restoration model, comprises:
determining a first loss value and a second loss value according to the model output of the first network model and the corresponding target high-definition video segment;
determining whether the first network model meets a preset first model training ending condition according to the first loss value and the second loss value;
if yes, taking the first network model obtained when model training ends as the target image quality restoration model.
9. An image quality restoration model training device, characterized by comprising:
the high-definition segment determining module is used for determining a target high-definition video segment associated with a target repair task;
the low-quality segment determining module is used for carrying out degradation treatment on the target high-definition video segment to obtain a target low-quality video segment corresponding to the target high-definition video segment;
the target repair model determining module is used for inputting the target low-quality video segment into a pre-constructed first network model to obtain a model output, and performing model training on the first network model according to the model output and the corresponding target high-definition video segment until a preset first model training ending condition is met, so as to obtain a target image quality restoration model.
10. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the image quality restoration model training method of any one of claims 1-8.
11. A computer-readable storage medium storing computer instructions which, when executed, cause a processor to implement the image quality restoration model training method according to any one of claims 1-8.
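As a non-limiting illustration of the combined degradation processing recited in claim 4 (the sketch referenced there), the code below randomly samples and composes several degradation modes and applies them frame by frame; the concrete operators, their parameters, and the use of OpenCV and NumPy are all assumptions made for demonstration:

import random
import cv2
import numpy as np

def jpeg_compress(frame, quality=30):
    # Image compression processing mode: encode to JPEG and decode back.
    ok, buf = cv2.imencode('.jpg', frame, [cv2.IMWRITE_JPEG_QUALITY, quality])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)

def downsample(frame, factor=2):
    # Downsampling processing mode: shrink, then resize back to the original size.
    h, w = frame.shape[:2]
    small = cv2.resize(frame, (w // factor, h // factor))
    return cv2.resize(small, (w, h))

def add_noise(frame, sigma=5.0):
    # Noise addition processing mode: additive Gaussian noise.
    noisy = frame.astype(np.float32) + np.random.normal(0.0, sigma, frame.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def blur(frame, ksize=5):
    # Blurring processing mode: Gaussian blur.
    return cv2.GaussianBlur(frame, (ksize, ksize), 0)

DEGRADATION_MODES = [jpeg_compress, downsample, add_noise, blur]

def combined_degradation(hd_clip, num_modes=2):
    # Sample a combined degradation processing mode and apply it to every
    # frame, yielding the target low-quality video segment.
    modes = random.sample(DEGRADATION_MODES, num_modes)
    lq_clip = []
    for frame in hd_clip:
        for mode in modes:
            frame = mode(frame)
        lq_clip.append(frame)
    return lq_clip

A call such as combined_degradation(list_of_bgr_frames) would then pair each high-definition segment with a synthesized low-quality counterpart for training.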
CN202311095690.9A 2023-08-28 2023-08-28 Image quality restoration model training method, device, equipment and storage medium Pending CN117058040A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311095690.9A CN117058040A (en) 2023-08-28 2023-08-28 Image quality restoration model training method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117058040A true CN117058040A (en) 2023-11-14

Family

ID=88667473

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311095690.9A Pending CN117058040A (en) 2023-08-28 2023-08-28 Image quality restoration model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117058040A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination