CN114513684B - Method for constructing video image quality enhancement model, video image quality enhancement method and device - Google Patents

Method for constructing video image quality enhancement model, video image quality enhancement method and device

Info

Publication number
CN114513684B
Authority
CN
China
Prior art keywords
enhancement
enhancement model
model
definition
training data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011277818.XA
Other languages
Chinese (zh)
Other versions
CN114513684A (en)
Inventor
李志华
高政
杨松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Feihu Information Technology Tianjin Co Ltd
Original Assignee
Feihu Information Technology Tianjin Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Feihu Information Technology Tianjin Co Ltd filed Critical Feihu Information Technology Tianjin Co Ltd
Priority to CN202011277818.XA priority Critical patent/CN114513684B/en
Publication of CN114513684A publication Critical patent/CN114513684A/en
Application granted granted Critical
Publication of CN114513684B publication Critical patent/CN114513684B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234363Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the spatial resolution, e.g. for clients with a lower screen resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440263Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the spatial resolution, e.g. for displaying on a connected PDA
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/14Picture signal circuitry for video frequency region
    • H04N5/21Circuitry for suppressing or minimising disturbance, e.g. moiré or halo
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a method for constructing a video image quality enhancement model, and a video image quality enhancement method and device. A definition enhancement model, a color enhancement model and a resolution enhancement model are constructed by a machine learning method and integrated into a video transcoding program in the order of definition enhancement, color enhancement and resolution enhancement, so that definition enhancement and/or color enhancement and/or resolution enhancement of video image quality is realized in the video transcoding program according to users' video image quality enhancement requests. Different video image quality enhancement requests of users are thereby met, and the universality of the video image quality enhancement method is improved.

Description

Method for constructing video image quality enhancement model, video image quality enhancement method and device
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method for constructing a video image quality enhancement model, a method and an apparatus for enhancing video image quality.
Background
With the development of the times, people's requirements on video image quality are continuously rising. At present, however, factors such as outdated shooting equipment, poor shooting technique, and damage to the image quality of videos during production, transcoding and transmission result in a large number of low-quality videos, which seriously affect the viewing experience and add extra bitrate overhead under otherwise identical conditions. It is therefore important to improve the image quality of low-quality video.
Video image quality enhancement methods generally fall into traditional methods and deep learning methods. Traditional video enhancement methods are mostly sets of enhancement rules established by experts according to attribute information of the video (brightness, color temperature and the like); their enhancement effect depends on experience and their accuracy is low. Most existing deep learning methods are studied for a single special scene and train one end-to-end model, so their universality is not strong.
Disclosure of Invention
In view of the above, the invention provides a method for constructing a video image quality enhancement model, and a video image quality enhancement method and device, which realize definition enhancement, color enhancement and resolution enhancement of video frames with strong universality and a good enhancement effect.
In order to achieve the above purpose, the specific technical scheme provided by the invention is as follows:
A method for constructing a video image quality enhancement model comprises the following steps:
Acquiring definition enhancement model training data, resolution enhancement model training data and color enhancement model training data;
Training a self-coding network by utilizing the definition enhancement model training data to obtain a definition enhancement model, wherein the self-coding network comprises an encoder and a decoder, and the encoder and the decoder respectively consist of a convolutional neural network;
training a two-way generation type countermeasure network by utilizing the color enhancement model training data to obtain a color enhancement model;
Training the convolutional neural network by utilizing the resolution enhancement model training data to obtain a resolution enhancement model;
The sharpness enhancement model, the color enhancement model and the resolution enhancement model are converted into a preset format and integrated into a video transcoding program in the order of sharpness enhancement, color enhancement and resolution enhancement.
Optionally, the self-coding network further includes a noise estimation sub-network, and the noise estimation sub-network is obtained after training the convolutional neural network by using noise estimation training data.
Optionally, the training the self-coding network by using the sharpness enhancement model training data to obtain a sharpness enhancement model includes:
inputting the definition enhancement model training data into the noise estimation sub-network to obtain a noise value of the definition enhancement model training data;
Sequentially inputting the definition enhancement model training data and the noise value into the encoder and the decoder to obtain output data of the self-coding network;
Inputting the output data of the self-coding network and the real reference image data of the definition enhancement model training data into a first loss function to obtain an output value of the first loss function;
And when the output value of the first loss function converges, obtaining the definition enhancement model.
Optionally, the first loss function is a weighted sum of a minimum absolute value deviation function L1-loss, a minimum square error function L2-loss, and a smoothing loss function smooth-loss.
Optionally, the training the two-way generation type countermeasure network by using the color enhancement model training data to obtain a color enhancement model includes:
Inputting the training data of the color enhancement model into the two-way generation type countermeasure network to obtain output data of the two-way generation type countermeasure network;
Inputting the output data of the two-way generation type countermeasure network and the real reference image data of the color enhancement model training data into a second loss function to obtain an output value of the second loss function, wherein the second loss function is a cycle consistency loss function;
And when the output value of the second loss function converges, obtaining the color enhancement model.
Optionally, the convolutional neural network corresponding to the resolution enhancement model uses residual networks as basic modules, and a preset cascade mechanism is added between the residual networks.
Optionally, training the convolutional neural network by using the resolution enhancement model training data to obtain a resolution enhancement model, including:
inputting the training data of the resolution enhancement model into a convolutional neural network to obtain output data of the convolutional neural network;
Inputting the output data of the convolutional neural network and the real reference image data of the resolution enhancement model training data into a third loss function to obtain an output value of the third loss function, wherein the third loss function is a function based on a feature pyramid;
and when the output value of the third loss function converges, obtaining the resolution enhancement model.
A video image quality enhancement method, comprising:
Under the condition that a video image quality enhancement request is received, analyzing the video image quality enhancement request to obtain a video frame to be enhanced and enhancement processing options, wherein the enhancement processing options are at least any one of a definition enhancement option, a color enhancement option and a resolution enhancement option;
Inputting the video frame to be enhanced into a video image enhancement model corresponding to the enhancement processing option to obtain a video frame after video image enhancement processing, wherein the video image enhancement model is pre-constructed according to the method for constructing the video image enhancement model disclosed by the embodiment, the definition enhancement option corresponds to the definition enhancement model, the color enhancement option corresponds to the color enhancement model, and the resolution enhancement option corresponds to the resolution enhancement model.
Optionally, when the video image quality enhancement request includes more than one enhancement processing options, the inputting the video frame to be enhanced into a video image quality enhancement model corresponding to the enhancement processing options, to obtain a video frame after video image quality enhancement processing, includes:
And inputting the video frames to be enhanced into a video image quality enhancement model corresponding to the enhancement processing options according to the sequence of definition enhancement, color enhancement and resolution enhancement, so as to obtain video frames after video image quality enhancement processing.
A construction device of a video image quality enhancement model comprises:
the training data acquisition unit is used for acquiring the definition enhancement model training data, the resolution enhancement model training data and the color enhancement model training data;
The definition enhancement model building unit is used for training a self-coding network by utilizing the definition enhancement model training data to obtain a definition enhancement model, wherein the self-coding network comprises an encoder and a decoder, and the encoder and the decoder respectively consist of convolutional neural networks;
The color enhancement model building unit is used for training the two-way generation type countermeasure network by utilizing the color enhancement model training data to obtain a color enhancement model;
The resolution enhancement model building unit is used for training the convolutional neural network by utilizing the resolution enhancement model training data to obtain a resolution enhancement model;
And the model integration unit is used for converting the definition enhancement model, the color enhancement model and the resolution enhancement model into preset formats and integrating the definition enhancement model, the color enhancement model and the resolution enhancement model into a video transcoding program according to the sequence of definition enhancement, color enhancement and resolution enhancement.
Optionally, the self-coding network further includes a noise estimation sub-network, and the noise estimation sub-network is obtained after training the convolutional neural network by using noise estimation training data.
Optionally, the sharpness enhancement model building unit is specifically configured to:
inputting the definition enhancement model training data into the noise estimation sub-network to obtain a noise value of the definition enhancement model training data;
Sequentially inputting the definition enhancement model training data and the noise value into the encoder and the decoder to obtain output data of the self-coding network;
Inputting the output data of the self-coding network and the real reference image data of the definition enhancement model training data into a first loss function to obtain an output value of the first loss function;
And when the output value of the first loss function converges, obtaining the definition enhancement model.
Optionally, the first loss function is a weighted sum of a minimum absolute value deviation function L1-loss, a minimum square error function L2-loss, and a smoothing loss function smooth-loss.
Optionally, the color enhancement model building unit is specifically configured to:
Inputting the training data of the color enhancement model into the two-way generation type countermeasure network to obtain output data of the two-way generation type countermeasure network;
Inputting the output data of the two-way generation type countermeasure network and the real reference image data of the color enhancement model training data into a second loss function to obtain an output value of the second loss function, wherein the second loss function is a cycle consistency loss function;
And when the output value of the second loss function converges, obtaining the color enhancement model.
Optionally, the convolutional neural network corresponding to the resolution enhancement model uses residual networks as basic modules, and a preset cascade mechanism is added between the residual networks.
Optionally, the resolution enhancement model building unit is specifically configured to:
inputting the training data of the resolution enhancement model into a convolutional neural network to obtain output data of the convolutional neural network;
Inputting the output data of the convolutional neural network and the real reference image data of the resolution enhancement model training data into a third loss function to obtain an output value of the third loss function, wherein the third loss function is a function based on a feature pyramid;
and when the output value of the third loss function converges, obtaining the resolution enhancement model.
A video image quality enhancement apparatus comprising:
The enhancement request analysis unit is used for analyzing the video image quality enhancement request under the condition of receiving the video image quality enhancement request to obtain a video frame to be enhanced and enhancement processing options, wherein the enhancement processing options are at least any one of definition enhancement options, color enhancement options and resolution enhancement options;
The enhancement processing unit is configured to input the video frame to be enhanced into a video image quality enhancement model corresponding to the enhancement processing option, so as to obtain a video frame after video image quality enhancement processing, where the video image quality enhancement model is pre-constructed according to the method for constructing a video image quality enhancement model disclosed in the foregoing embodiment, the sharpness enhancement option corresponds to a sharpness enhancement model, the color enhancement option corresponds to a color enhancement model, and the resolution enhancement option corresponds to a resolution enhancement model.
Optionally, when the video image quality enhancement request includes more than one enhancement processing option, the enhancement processing unit is specifically configured to input the video frame to be enhanced into a video image quality enhancement model corresponding to the enhancement processing option according to a sequence of sharpness enhancement, color enhancement, and resolution enhancement, so as to obtain a video frame after video image quality enhancement processing.
Compared with the prior art, the invention has the following beneficial effects:
The invention discloses a method for constructing a video image quality enhancement model, in which a definition enhancement model, a color enhancement model and a resolution enhancement model are constructed by a machine learning method and integrated into a video transcoding program in the order of definition enhancement, color enhancement and resolution enhancement. Definition enhancement and/or color enhancement and/or resolution enhancement of video image quality is thus realized in the video transcoding program according to users' video image quality enhancement requests, satisfying different video image quality enhancement requests of users and improving the universality of the video image quality enhancement method.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a method for constructing a video image quality enhancement model according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a self-coding network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a single-path generation type countermeasure network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a convolutional neural network model according to an embodiment of the present invention;
Fig. 5 is a flow chart of a video image quality enhancement method according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a device for constructing a video image quality enhancement model according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a video image quality enhancement device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, the embodiment of the invention discloses a method for constructing a video image quality enhancement model, which specifically comprises the following steps:
s101: acquiring definition enhancement model training data, resolution enhancement model training data and color enhancement model training data;
Sharpness enhancement is largely divided into denoising and deblurring, where the noise includes Gaussian noise, compression noise and the like, and the blur here refers to the most common motion blur. The definition enhancement model training data is therefore obtained in two ways: on the one hand from actual low-definition videos on the video platform, and on the other hand by designing data expansion rules that conform to the distribution of actual low-quality video, i.e. simulating degradations such as adding random noise and filtering with different operators.
The resolution enhancement model training data is likewise partly taken from real low-resolution videos of the video platform and partly generated by script simulation, i.e. adding random noise or filtering with different operators while reducing the image size, so as to generate data close to real low-resolution video.
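As an illustration of the simulated portion of both data sets, the following is a minimal degradation-script sketch assuming OpenCV and NumPy; every parameter range and kernel choice here is an illustrative assumption rather than a value taken from this disclosure. With scale=1 it yields training pairs for definition enhancement; with scale greater than 1 it also reduces the image size for resolution enhancement.

    import cv2
    import numpy as np

    def degrade(img, scale=1, sigma_range=(1.0, 10.0)):
        """Return a degraded copy of a clean frame img (HxWx3, uint8)."""
        out = img.astype(np.float32)
        # add random Gaussian noise with a randomly drawn strength
        sigma = np.random.uniform(*sigma_range)
        out += np.random.normal(0.0, sigma, out.shape)
        # filter with a randomly chosen operator
        k = int(np.random.choice([3, 5]))
        if np.random.rand() < 0.5:
            out = cv2.GaussianBlur(out, (k, k), 0)
        else:
            out = cv2.blur(out, (k, k))
        # reduce the image size for resolution enhancement training data
        if scale > 1:
            h, w = out.shape[:2]
            out = cv2.resize(out, (w // scale, h // scale),
                             interpolation=cv2.INTER_AREA)
        return np.clip(out, 0, 255).astype(np.uint8)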
In this embodiment, the color enhancement model is trained with the two-way generation type countermeasure network in an end-to-end, unsupervised manner, so the color enhancement model training data is simply film library resources of the video platform, i.e. only a set of images with the desired color effect needs to be collected.
S102: training a self-coding network by utilizing the definition enhancement model training data to obtain a definition enhancement model, wherein the self-coding network comprises an encoder and a decoder, and the encoder and the decoder respectively consist of convolutional neural networks;
The self-coding network used to train the sharpness enhancement model includes an encoder and a decoder. Referring to fig. 2, the left half is the encoder and the right half is the decoder, each comprising 11 convolutional layers and 2 pooling layers; distinct symbols in the figure denote the feature maps, convolution layers, pooling layers and upsampling layers. In the encoder, the input data is first converted stepwise, through the multiple convolution and pooling layers, into a feature map of spatial size 1×1 and channel number 256, which is then converted back to the original size and channel number of the input data in the decoder.
Skip connections are widely adopted between the feature maps in the network structure to combine information from different convolution layers, which facilitates gradient propagation and accelerates convergence. Both the encoder and decoder structures adopt the residual network resnet as a basic module.
In order to process videos with different low-definition conditions, a noise estimation sub-network is designed: before entering the encoder, the training data is first fed into the noise estimation network, and the training data together with the noise value output by the noise estimation network is then fed sequentially into the encoder and the decoder, so that a robust output is obtained.
The noise estimation sub-network adopts an ordinary fully convolutional network; during training its output is supervised with a real reference noise map so that it learns to estimate the noise level, and embedding this sub-network makes the whole network insensitive to the noise level of the input image.
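As a rough illustration only, the following PyTorch sketch shows this arrangement: a fully convolutional noise estimation sub-network whose output noise map is concatenated with the input image before the encoder and decoder. Layer counts, channel widths and block types are simplified assumptions and do not reproduce the 11-convolution-layer, 2-pooling-layer, resnet-based configuration of fig. 2.

    import torch
    import torch.nn as nn

    class NoiseEstimator(nn.Module):
        # ordinary fully convolutional network predicting a noise map
        def __init__(self, ch=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, 1, 3, padding=1))

        def forward(self, x):
            return self.net(x)

    class DefinitionAutoencoder(nn.Module):
        def __init__(self, ch=64):
            super().__init__()
            self.noise = NoiseEstimator()
            self.encoder = nn.Sequential(          # image + noise map in
                nn.Conv2d(4, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
                nn.Conv2d(ch, 2 * ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2))
            self.decoder = nn.Sequential(
                nn.Upsample(scale_factor=2),
                nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Upsample(scale_factor=2),
                nn.Conv2d(ch, 3, 3, padding=1))    # back to input channels

        def forward(self, x):
            sigma = self.noise(x)                  # estimated noise map
            z = torch.cat([x, sigma], dim=1)       # input + noise value
            return self.decoder(self.encoder(z))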
In order to ensure the training effect of the sharpness enhancement model, the training effect is evaluated through the first loss function, which is a weighted sum of the minimum absolute value deviation function L1-loss, the minimum square error function L2-loss and the smoothing loss function smooth-loss; this both ensures fast convergence of the network and keeps model training stable.
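A compact sketch of such a first loss function follows; the weights are placeholders, and since the exact form of the smoothing term is not specified here, it is read as a total-variation penalty on the output, which is an assumption.

    import torch.nn.functional as F

    def first_loss(pred, ref, w1=1.0, w2=1.0, ws=0.1):
        l1 = F.l1_loss(pred, ref)    # minimum absolute value deviation
        l2 = F.mse_loss(pred, ref)   # minimum square error
        # smooth-loss read here as total variation (assumption)
        tv = (pred[:, :, 1:, :] - pred[:, :, :-1, :]).abs().mean() \
           + (pred[:, :, :, 1:] - pred[:, :, :, :-1]).abs().mean()
        return w1 * l1 + w2 * l2 + ws * tv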
The output of the training data after passing through the whole self-coding network is fed, together with the real reference image data, into the first loss function, and the result is back-propagated to update the network parameters. The optimizer is the Adam method built into the PyTorch framework, with parameters beta1 = 0.9 and beta2 = 0.999; the batch_size is set to 16 and the initial learning rate to 0.001. Supervised training runs for 100 epochs with a step-decay strategy, i.e. every 20 epochs the learning rate is reduced to 10% of its previous value.
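In PyTorch terms this training configuration corresponds roughly to the sketch below, assuming a DataLoader named loader that yields (low-quality, reference) pairs with batch size 16, together with the model and loss from the sketches above.

    import torch

    model = DefinitionAutoencoder()
    opt = torch.optim.Adam(model.parameters(), lr=0.001,
                           betas=(0.9, 0.999))
    # every 20 epochs the learning rate drops to 10% of its value
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=20, gamma=0.1)

    for epoch in range(100):
        for lq, ref in loader:
            opt.zero_grad()
            loss = first_loss(model(lq), ref)
            loss.backward()
            opt.step()
        sched.step()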
S103: training the two-way generation type countermeasure network by utilizing the training data of the color enhancement model to obtain the color enhancement model;
The two-way generative adversarial network (GAN) structure fully exploits the advantages of GANs in image generation, relying on the film library resources of the video platform for end-to-end unsupervised training. The structure of a single-path GAN is shown in fig. 3; the two-way GAN consists of two single-path GANs in parallel, with coupling mechanisms added between them. The generator (Generator) is used to generate the color-enhanced image, and the discriminator (Discriminator) distinguishes the real target image from the enhanced image produced by the generator. The color grading problem is understood as an image translation problem, i.e. translating an image of one style into an image of another style. Taking the style migration algorithm Cycle-GAN as a reference, the second loss function adopts a cycle consistency loss function, which greatly reduces instability during GAN network training. Unlike other tasks, paired training data is difficult to obtain for color enhancement; adopting unsupervised GAN training means that only a set of pictures with the desired color effect needs to be collected, which greatly reduces the difficulty of data collection. During training, the batch size parameter is set to 4, the learning rates of the generator and the discriminator are set to 0.00001, and a step-decay strategy is applied to the learning rate for unsupervised training.
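The cycle consistency part of the second loss function can be sketched as follows, with G and F_back standing for the two single-path generators (low-quality to enhanced style and back). The weighting factor lam follows the common Cycle-GAN convention and is an assumption here, and the adversarial terms of the full objective are omitted.

    import torch.nn.functional as F

    def cycle_consistency_loss(x, y, G, F_back, lam=10.0):
        # x -> G(x) -> F_back(G(x)) should reconstruct x, and vice versa
        forward_cycle = F.l1_loss(F_back(G(x)), x)
        backward_cycle = F.l1_loss(G(F_back(y)), y)
        return lam * (forward_cycle + backward_cycle)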
S104: training the convolutional neural network by utilizing the training data of the resolution enhancement model to obtain a resolution enhancement model;
Referring to fig. 4, the convolutional neural network for training the resolution enhancement model uses resnet as its basic module. In order to reduce the total number of parameters, a cascade mechanism is added between the resnet modules, i.e. the outputs of intermediate layers are cascaded to higher layers and finally converge at the last convolution layer. The first and last layers of the network are mean-shift layers with 1×1 convolution kernels, which perform de-meaning and its inverse operation respectively and require no parameter updates during training. The other convolution layers use 3×3 kernels with ReLU as the activation function. The upsampling layer uses PixelShuffle to magnify the output feature map.
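A reduced PyTorch sketch of this architecture follows. The block count, channel width and the RGB means used by the frozen mean-shift layers are illustrative assumptions (the means shown suit images scaled to [0, 1]); the cascade is realised by concatenating intermediate outputs and fusing them before the upsampling and final layers.

    import torch
    import torch.nn as nn

    class MeanShift(nn.Conv2d):
        # frozen 1x1 convolution that subtracts (sign=-1) or adds
        # (sign=+1) per-channel means; no parameter updates in training
        def __init__(self, rgb_mean=(0.45, 0.44, 0.40), sign=-1):
            super().__init__(3, 3, kernel_size=1)
            self.weight.data = torch.eye(3).view(3, 3, 1, 1)
            self.bias.data = sign * torch.tensor(rgb_mean)
            for p in self.parameters():
                p.requires_grad = False

    class ResBlock(nn.Module):
        def __init__(self, ch):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1))

        def forward(self, x):
            return x + self.body(x)

    class CascadedSR(nn.Module):
        def __init__(self, ch=64, n_blocks=3, scale=2):
            super().__init__()
            self.sub_mean = MeanShift(sign=-1)
            self.add_mean = MeanShift(sign=+1)
            self.head = nn.Conv2d(3, ch, 3, padding=1)
            self.blocks = nn.ModuleList(
                [ResBlock(ch) for _ in range(n_blocks)])
            # cascade: intermediate outputs converge at a fusion layer
            self.fuse = nn.Conv2d(ch * (n_blocks + 1), ch, 1)
            self.up = nn.Sequential(
                nn.Conv2d(ch, ch * scale * scale, 3, padding=1),
                nn.PixelShuffle(scale))            # magnify feature map
            self.tail = nn.Conv2d(ch, 3, 3, padding=1)

        def forward(self, x):
            x = self.sub_mean(x)
            feats = [self.head(x)]
            for blk in self.blocks:
                feats.append(blk(feats[-1]))
            y = self.fuse(torch.cat(feats, dim=1))
            return self.add_mean(self.tail(self.up(y)))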
The convolutional neural network evaluates the training effect through a third loss function designed with the feature pyramid idea, using a weighted sum of terms from several intermediate layers and the final output layer as its final expression. The shallow layers of the network contain more basic information such as textures and lines, while the higher layers contain more semantic information; designing the loss function with the feature pyramid idea allows some detail parts to be finely depicted while super-resolving the image, so that both the global and the detail mapping from low-resolution to high-resolution images is fully learned.
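The exact terms and coefficients of this third loss function are not given here; the sketch below is one simple reading in which prediction and reference are compared across an image pyramid, so that coarse levels constrain global structure while the full-resolution term constrains detail. The level count and weights are placeholder assumptions.

    import torch.nn.functional as F

    def pyramid_loss(pred, ref, weights=(1.0, 0.5, 0.25)):
        loss = 0.0
        for i, w in enumerate(weights):
            loss = loss + w * F.l1_loss(pred, ref)
            if i < len(weights) - 1:          # move one pyramid level up
                pred = F.avg_pool2d(pred, 2)
                ref = F.avg_pool2d(ref, 2)
        return loss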
The optimizer is Adam, with the batch size set to 64 and the initial learning rate to 0.0001; supervised training likewise applies a step-decay strategy to the learning rate.
S105: the sharpness enhancement model, the color enhancement model and the resolution enhancement model are converted into a preset format and integrated into a video transcoding program in the order of sharpness enhancement, color enhancement and resolution enhancement.
The model training algorithms in the embodiment of the invention are developed with the PyTorch framework; after the network structure design is completed, the algorithms are trained on NVIDIA Tesla P40 GPUs. The training parameters are adjusted continuously according to the training output until each algorithm converges below the desired precision. The trained models are converted to the pb format of the TensorFlow framework so that they can be integrated into the ffmpeg transcoding process. The final usage flow is roughly: source video, decoding into video frames, dividing video scenes, selecting different model combinations for video enhancement as required, and merging and outputting the video frames.
The embodiment also discloses a video image quality enhancement method, which uses the video image quality enhancement model constructed in the above embodiment to perform video image quality enhancement processing, referring to fig. 5, and the method comprises the following steps:
S201: under the condition that a video image quality enhancement request is received, analyzing the video image quality enhancement request to obtain a video frame to be enhanced and enhancement processing options, wherein the enhancement processing options are at least any one of a definition enhancement option, a color enhancement option and a resolution enhancement option;
That is, the user sends a corresponding video image quality enhancement request according to the enhancement requirements for different video frames, and the enhancement processing options in the request may be any one of the definition enhancement option, the color enhancement option and the resolution enhancement option, any two of these options, or all three of them.
S202: and inputting the video frames to be enhanced into a video image quality enhancement model corresponding to the enhancement processing options to obtain video frames subjected to video image quality enhancement processing, wherein the definition enhancement options correspond to the definition enhancement model, the color enhancement options correspond to the color enhancement model, and the resolution enhancement options correspond to the resolution enhancement model.
When the video image quality enhancement request includes more than one enhancement processing option, the video frames to be enhanced are input into the video image quality enhancement models corresponding to those options in the sequence of definition enhancement, color enhancement and resolution enhancement, obtaining the video frames after video image quality enhancement processing.
Taking as an example a video image quality enhancement request that includes the definition enhancement option, the color enhancement option and the resolution enhancement option: the video frame to be enhanced is input into the definition enhancement model, the output of the definition enhancement model is input into the color enhancement model, the output of the color enhancement model is input into the resolution enhancement model, and the output of the resolution enhancement model is the video frame after video image quality enhancement processing.
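This fixed ordering amounts to a simple chain, as in the sketch below; models is a hypothetical dictionary mapping option names to the loaded enhancement models, each callable on a video frame.

    def enhance_frame(frame, options, models):
        # apply requested models in the fixed order:
        # definition -> color -> resolution
        for name in ('definition', 'color', 'resolution'):
            if name in options:
                frame = models[name](frame)
        return frame

    # e.g. a request carrying all three enhancement options:
    # out = enhance_frame(frame,
    #                     {'definition', 'color', 'resolution'}, models)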
It can be seen that the video image quality enhancement method disclosed in this embodiment constructs a definition enhancement model, a color enhancement model and a resolution enhancement model by a machine learning method and integrates them into a video transcoding program in the order of definition enhancement, color enhancement and resolution enhancement, so that definition enhancement and/or color enhancement and/or resolution enhancement of video image quality is realized in the video transcoding program according to the user's video image quality enhancement request, satisfying different video image quality enhancement requests of users and improving the universality of the video image quality enhancement method.
Based on the method for constructing a video image quality enhancement model disclosed in the foregoing embodiment, the present embodiment correspondingly discloses a device for constructing a video image quality enhancement model, please refer to fig. 6, which includes:
a training data obtaining unit 401, configured to obtain definition enhancement model training data, resolution enhancement model training data, and color enhancement model training data;
A sharpness enhancement model building unit 402, configured to train a self-coding network by using the sharpness enhancement model training data to obtain a sharpness enhancement model, where the self-coding network includes an encoder and a decoder, and the encoder and the decoder are respectively composed of convolutional neural networks;
A color enhancement model building unit 403, configured to train the two-way generation type countermeasure network by using the color enhancement model training data, so as to obtain a color enhancement model;
the resolution enhancement model building unit 404 is configured to train the convolutional neural network by using the resolution enhancement model training data to obtain a resolution enhancement model;
A model integration unit 405, configured to convert the sharpness enhancement model, the color enhancement model and the resolution enhancement model into a preset format and integrate them into a video transcoding program in the order of sharpness enhancement, color enhancement and resolution enhancement.
Optionally, the self-coding network further includes a noise estimation sub-network, and the noise estimation sub-network is obtained after training the convolutional neural network by using noise estimation training data.
Optionally, the sharpness enhancement model building unit 402 is specifically configured to:
inputting the definition enhancement model training data into the noise estimation sub-network to obtain a noise value of the definition enhancement model training data;
Sequentially inputting the definition enhancement model training data and the noise value into the encoder and the decoder to obtain output data of the self-coding network;
Inputting the output data of the self-coding network and the real reference image data of the definition enhancement model training data into a first loss function to obtain an output value of the first loss function;
And when the output value of the first loss function converges, obtaining the definition enhancement model.
Optionally, the first loss function is a weighted sum of a minimum absolute value deviation function L1-loss, a minimum square error function L2-loss, and a smoothing loss function smooth-loss.
Optionally, the color enhancement model building unit 403 is specifically configured to:
Inputting the training data of the color enhancement model into the two-way generation type countermeasure network to obtain output data of the two-way generation type countermeasure network;
Inputting the output data of the two-way generation type countermeasure network and the real reference image data of the color enhancement model training data into a second loss function to obtain an output value of the second loss function, wherein the second loss function is a cycle consistency loss function;
And when the output value of the second loss function converges, obtaining the color enhancement model.
Optionally, the convolutional neural network corresponding to the resolution enhancement model uses residual networks as basic modules, and a preset cascade mechanism is added between the residual networks.
Optionally, the resolution enhancement model building unit 404 is specifically configured to:
inputting the training data of the resolution enhancement model into a convolutional neural network to obtain output data of the convolutional neural network;
Inputting the output data of the convolutional neural network and the real reference image data of the resolution enhancement model training data into a third loss function to obtain an output value of the third loss function, wherein the third loss function is a function based on a feature pyramid;
and when the output value of the third loss function converges, obtaining the resolution enhancement model.
Based on the video image quality enhancement method disclosed in the above embodiment, the present embodiment correspondingly discloses a video image quality enhancement device, please refer to fig. 7, which includes:
An enhancement request parsing unit 501, configured to parse the video image quality enhancement request to obtain a video frame to be enhanced and an enhancement processing option, where the enhancement processing option is at least any one of a sharpness enhancement option, a color enhancement option and a resolution enhancement option, when the video image quality enhancement request is received;
The enhancement processing unit 502 is configured to input the video frame to be enhanced into a video image quality enhancement model corresponding to the enhancement processing option, to obtain a video frame after video image quality enhancement processing, where the video image quality enhancement model is pre-constructed according to a method for constructing a video image quality enhancement model disclosed in the foregoing embodiment, the sharpness enhancement option corresponds to a sharpness enhancement model, the color enhancement option corresponds to a color enhancement model, and the resolution enhancement option corresponds to a resolution enhancement model.
Optionally, when the video image quality enhancement request includes more than one enhancement processing option, the enhancement processing unit is specifically configured to input the video frame to be enhanced into a video image quality enhancement model corresponding to the enhancement processing option according to a sequence of sharpness enhancement, color enhancement, and resolution enhancement, so as to obtain a video frame after video image quality enhancement processing.
The embodiment discloses a device for constructing a video image quality enhancement model and a video image quality enhancement device, in which a definition enhancement model, a color enhancement model and a resolution enhancement model are constructed by a machine learning method and integrated into a video transcoding program in the order of definition enhancement, color enhancement and resolution enhancement. Definition enhancement and/or color enhancement and/or resolution enhancement of video image quality is thus realized in the video transcoding program according to users' video image quality enhancement requests, satisfying different video image quality enhancement requests of users and improving the universality of the video image quality enhancement method.
The above embodiments may be combined in any manner, and features described in different embodiments of the present specification may be replaced with or combined with one another, so as to enable those skilled in the art to make or use the present application.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. The method for constructing the video image quality enhancement model is characterized by comprising the following steps of:
Acquiring definition enhancement model training data, resolution enhancement model training data and color enhancement model training data;
Training a self-coding network by utilizing the definition enhancement model training data to obtain a definition enhancement model, wherein the self-coding network comprises an encoder and a decoder, and the encoder and the decoder respectively consist of a convolutional neural network;
training a two-way generation type countermeasure network by utilizing the color enhancement model training data to obtain a color enhancement model;
Training the convolutional neural network by utilizing the resolution enhancement model training data to obtain a resolution enhancement model;
Converting the definition enhancement model, the color enhancement model and the resolution enhancement model into preset formats, and integrating the definition enhancement model, the color enhancement model and the resolution enhancement model into a video transcoding program according to the sequence of definition enhancement, color enhancement and resolution enhancement;
the self-coding network further comprises a noise estimation sub-network, wherein the noise estimation sub-network is obtained by training the convolutional neural network through noise estimation training data;
the training of the self-coding network by using the definition enhancement model training data to obtain a definition enhancement model comprises the following steps:
inputting the definition enhancement model training data into the noise estimation sub-network to obtain a noise value of the definition enhancement model training data;
Sequentially inputting the definition enhancement model training data and the noise value into the encoder and the decoder to obtain output data of the self-coding network;
Inputting the output data of the self-coding network and the real reference image data of the definition enhancement model training data into a first loss function to obtain an output value of the first loss function;
And when the output value of the first loss function converges, obtaining the definition enhancement model.
2. The method of claim 1, wherein the first loss function is a weighted sum of a minimum absolute value deviation function L1-loss, a minimum square error function L2-loss, and a smoothing loss function smooth-loss.
3. The method of claim 1, wherein training the two-way generation type countermeasure network with the color enhancement model training data to obtain a color enhancement model, comprising:
Inputting the training data of the color enhancement model into the two-way generation type countermeasure network to obtain output data of the two-way generation type countermeasure network;
Inputting the output data of the two-way generation type countermeasure network and the real reference image data of the color enhancement model training data into a second loss function to obtain an output value of the second loss function, wherein the second loss function is a cycle consistency loss function;
And when the output value of the second loss function converges, obtaining the color enhancement model.
4. The method of claim 1, wherein the convolutional neural network corresponding to the resolution enhancement model uses residual networks as basic modules, and a preset cascade mechanism is added between the residual networks.
5. The method of claim 4, wherein training the convolutional neural network using the resolution enhancement model training data to obtain a resolution enhancement model comprises:
inputting the training data of the resolution enhancement model into a convolutional neural network to obtain output data of the convolutional neural network;
Inputting the output data of the convolutional neural network and the real reference image data of the resolution enhancement model training data into a third loss function to obtain an output value of the third loss function, wherein the third loss function is a function based on a feature pyramid;
and when the output value of the third loss function converges, obtaining the resolution enhancement model.
6. A method for enhancing video quality, comprising:
Under the condition that a video image quality enhancement request is received, analyzing the video image quality enhancement request to obtain a video frame to be enhanced and enhancement processing options, wherein the enhancement processing options are at least any one of a definition enhancement option, a color enhancement option and a resolution enhancement option;
Inputting the video frame to be enhanced into a video image enhancement model corresponding to the enhancement processing option to obtain a video frame after video image enhancement processing, wherein the video image enhancement model is pre-constructed according to the method for constructing a video image enhancement model according to any one of claims 1-5, the definition enhancement option corresponds to a definition enhancement model, the color enhancement option corresponds to a color enhancement model, and the resolution enhancement option corresponds to a resolution enhancement model.
7. The method according to claim 6, wherein when the video image enhancement request includes more than one enhancement processing options, the inputting the video frame to be enhanced into the video image enhancement model corresponding to the enhancement processing options, to obtain the video frame after the video image enhancement processing, includes:
And inputting the video frames to be enhanced into a video image quality enhancement model corresponding to the enhancement processing options according to the sequence of definition enhancement, color enhancement and resolution enhancement, so as to obtain video frames after video image quality enhancement processing.
8. A device for constructing a video image quality enhancement model, comprising:
the training data acquisition unit is used for acquiring the definition enhancement model training data, the resolution enhancement model training data and the color enhancement model training data;
The definition enhancement model building unit is used for training a self-coding network by utilizing the definition enhancement model training data to obtain a definition enhancement model, wherein the self-coding network comprises an encoder and a decoder, and the encoder and the decoder respectively consist of convolutional neural networks;
The color enhancement model building unit is used for training the two-way generation type countermeasure network by utilizing the color enhancement model training data to obtain a color enhancement model;
The resolution enhancement model building unit is used for training the convolutional neural network by utilizing the resolution enhancement model training data to obtain a resolution enhancement model;
The model integration unit is used for converting the definition enhancement model, the color enhancement model and the resolution enhancement model into preset formats and integrating the definition enhancement model, the color enhancement model and the resolution enhancement model into a video transcoding program according to the sequence of definition enhancement, color enhancement and resolution enhancement;
the self-coding network further comprises a noise estimation sub-network, wherein the noise estimation sub-network is obtained by training the convolutional neural network through noise estimation training data;
the training of the self-coding network by using the definition enhancement model training data to obtain a definition enhancement model comprises the following steps:
inputting the definition enhancement model training data into the noise estimation sub-network to obtain a noise value of the definition enhancement model training data;
Sequentially inputting the definition enhancement model training data and the noise value into the encoder and the decoder to obtain output data of the self-coding network;
Inputting the output data of the self-coding network and the real reference image data of the definition enhancement model training data into a first loss function to obtain an output value of the first loss function;
And when the output value of the first loss function converges, obtaining the definition enhancement model.
9. A video image quality enhancement apparatus, comprising:
The enhancement request analysis unit is used for analyzing the video image quality enhancement request under the condition of receiving the video image quality enhancement request to obtain a video frame to be enhanced and enhancement processing options, wherein the enhancement processing options are at least any one of definition enhancement options, color enhancement options and resolution enhancement options;
The enhancement processing unit is configured to input the video frame to be enhanced into a video image quality enhancement model corresponding to the enhancement processing option, to obtain a video frame after video image quality enhancement processing, where the video image quality enhancement model is pre-constructed according to a method for constructing a video image quality enhancement model according to any one of claims 1 to 5, the sharpness enhancement option corresponds to a sharpness enhancement model, the color enhancement option corresponds to a color enhancement model, and the resolution enhancement option corresponds to a resolution enhancement model.
CN202011277818.XA 2020-11-16 2020-11-16 Method for constructing video image quality enhancement model, video image quality enhancement method and device Active CN114513684B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011277818.XA CN114513684B (en) 2020-11-16 2020-11-16 Method for constructing video image quality enhancement model, video image quality enhancement method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011277818.XA CN114513684B (en) 2020-11-16 2020-11-16 Method for constructing video image quality enhancement model, video image quality enhancement method and device

Publications (2)

Publication Number Publication Date
CN114513684A CN114513684A (en) 2022-05-17
CN114513684B (en) 2024-05-28

Family

ID=81547015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011277818.XA Active CN114513684B (en) 2020-11-16 2020-11-16 Method for constructing video image quality enhancement model, video image quality enhancement method and device

Country Status (1)

Country Link
CN (1) CN114513684B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108235058A (en) * 2018-01-12 2018-06-29 广州华多网络科技有限公司 Video quality processing method, storage medium and terminal
CN110020684A (en) * 2019-04-08 2019-07-16 西南石油大学 A kind of image de-noising method based on residual error convolution autoencoder network
CN110189268A (en) * 2019-05-23 2019-08-30 西安电子科技大学 Underwater picture color correcting method based on GAN network
CN110263801A (en) * 2019-03-08 2019-09-20 腾讯科技(深圳)有限公司 Image processing model generation method and device, electronic equipment
CN111882489A (en) * 2020-05-15 2020-11-03 东北石油大学 Super-resolution graph recovery method for simultaneously enhancing underwater images

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108235058A (en) * 2018-01-12 2018-06-29 广州华多网络科技有限公司 Video quality processing method, storage medium and terminal
CN110263801A (en) * 2019-03-08 2019-09-20 腾讯科技(深圳)有限公司 Image processing model generation method and device, electronic equipment
CN110020684A (en) * 2019-04-08 2019-07-16 西南石油大学 A kind of image de-noising method based on residual error convolution autoencoder network
CN110189268A (en) * 2019-05-23 2019-08-30 西安电子科技大学 Underwater picture color correcting method based on GAN network
CN111882489A (en) * 2020-05-15 2020-11-03 东北石油大学 Super-resolution graph recovery method for simultaneously enhancing underwater images

Also Published As

Publication number Publication date
CN114513684A (en) 2022-05-17

Similar Documents

Publication Publication Date Title
CN110363716B (en) High-quality reconstruction method for generating confrontation network composite degraded image based on conditions
CN111739082B (en) Stereo vision unsupervised depth estimation method based on convolutional neural network
WO2020037965A1 (en) Method for multi-motion flow deep convolutional network model for video prediction
CN111028150B (en) Rapid space-time residual attention video super-resolution reconstruction method
CN111260560B (en) Multi-frame video super-resolution method fused with attention mechanism
CN109949222B (en) Image super-resolution reconstruction method based on semantic graph
CN110751649A (en) Video quality evaluation method and device, electronic equipment and storage medium
CN112019828B (en) Method for converting 2D (two-dimensional) video into 3D video
CN112017116B (en) Image super-resolution reconstruction network based on asymmetric convolution and construction method thereof
Yu et al. A review of single image super-resolution reconstruction based on deep learning
CN116958534A (en) Image processing method, training method of image processing model and related device
Yao et al. Bidirectional translation between uhd-hdr and hd-sdr videos
CN109949217A (en) Video super-resolution method for reconstructing based on residual error study and implicit motion compensation
Chen et al. Image denoising via deep network based on edge enhancement
CN113992920A (en) Video compressed sensing reconstruction method based on deep expansion network
Löhdefink et al. GAN-vs. JPEG2000 image compression for distributed automotive perception: Higher peak SNR does not mean better semantic segmentation
CN117576179A (en) Mine image monocular depth estimation method with multi-scale detail characteristic enhancement
CN112862675A (en) Video enhancement method and system for space-time super-resolution
CN114513684B (en) Method for constructing video image quality enhancement model, video image quality enhancement method and device
CN117196940A (en) Super-resolution reconstruction method suitable for real scene image based on convolutional neural network
CN116977169A (en) Data processing method, apparatus, device, readable storage medium, and program product
CN117242421A (en) Smart client for streaming of scene-based immersive media
CN115396683A (en) Video optimization processing method and device, electronic equipment and computer readable medium
Wang et al. Image quality enhancement using hybrid attention networks
Ma et al. Reduced-reference stereoscopic image quality assessment using gradient sparse representation and structural degradation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant