CN114513684A - Method for constructing video image quality enhancement model, method and device for enhancing video image quality - Google Patents
Method for constructing video image quality enhancement model, method and device for enhancing video image quality
- Publication number
- CN114513684A (application CN202011277818.XA)
- Authority
- CN
- China
- Prior art keywords
- enhancement
- enhancement model
- model
- video
- training data
- Prior art date
- Legal status
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234363—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the spatial resolution, e.g. for clients with a lower screen resolution
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440263—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the spatial resolution, e.g. for displaying on a connected PDA
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/14—Picture signal circuitry for video frequency region
- H04N5/21—Circuitry for suppressing or minimising disturbance, e.g. moiré or halo
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention provides a method for constructing a video image quality enhancement model, together with a video image quality enhancement method and device. Using machine learning, a definition enhancement model, a color enhancement model and a resolution enhancement model are constructed and integrated into a video transcoding program in the sequence definition enhancement, color enhancement, resolution enhancement. Definition and/or color and/or resolution enhancement of the video image quality is then carried out within the transcoding program according to a user's video image quality enhancement request, so that different video image quality enhancement requests can be satisfied and the universality of the video image quality enhancement method is improved.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to a method for constructing a video image quality enhancement model and a method and device for enhancing video image quality.
Background
With the development of the times, people's requirements on video quality keep rising. At present, however, a large number of low-quality videos exist, caused by factors such as outdated shooting equipment, poor shooting technique, and image quality damage during production, transcoding and transmission; these seriously affect the viewing experience and, under otherwise identical conditions, add extra bit-rate overhead. Improving the image quality of low-quality video is therefore of real significance.
Video quality enhancement methods generally fall into conventional methods and deep learning methods. Conventional methods are mostly sets of enhancement rules drawn up by experts from video attribute information (brightness, color temperature and the like); their effect depends on experience and their accuracy is low. Existing deep learning methods are mostly developed for one particular scene and train an end-to-end model, so their universality is limited.
Disclosure of Invention
In view of this, the present invention provides a method for constructing a video image quality enhancement model, and a method and device for enhancing video image quality, which achieve definition, color and resolution enhancement of video frames with strong universality and a good enhancement effect.
In order to achieve the above purpose, the invention provides the following specific technical scheme:
a method for constructing a video image quality enhancement model comprises the following steps:
acquiring definition enhancement model training data, resolution enhancement model training data and color enhancement model training data;
training a self-coding network (autoencoder) by using the definition enhancement model training data to obtain a definition enhancement model, wherein the self-coding network comprises an encoder and a decoder, each composed of a convolutional neural network;
training a two-way generative adversarial network by using the color enhancement model training data to obtain a color enhancement model;
training a convolutional neural network by using the resolution enhancement model training data to obtain a resolution enhancement model;
and converting the definition enhancement model, the color enhancement model and the resolution enhancement model into a preset format, and integrating the converted models into a video transcoding program according to the sequence of definition enhancement, color enhancement and resolution enhancement.
Optionally, the self-coding network further includes a noise estimation sub-network, and the noise estimation sub-network is obtained by training the convolutional neural network with noise estimation training data.
Optionally, training a self-coding network by using the sharpness enhancement model training data to obtain a sharpness enhancement model includes:
inputting the definition enhancement model training data into the noise estimation sub-network to obtain a noise value of the definition enhancement model training data;
inputting the sharpness enhancement model training data and the noise value into the encoder and the decoder in sequence to obtain output data of the self-coding network;
inputting the output data of the self-coding network and the real reference image data of the sharpness enhancement model training data into a first loss function to obtain an output value of the first loss function;
and when the output value of the first loss function is converged, obtaining the definition enhancement model.
Optionally, the first loss function is a weighted sum of the least absolute deviation function L1-loss, the least squares error function L2-loss and the smooth loss function smooth-loss.
Optionally, training the two-way generative adversarial network by using the color enhancement model training data to obtain a color enhancement model includes:
inputting the color enhancement model training data into the two-way generative adversarial network to obtain output data of the two-way generative adversarial network;
inputting the output data of the two-way generative adversarial network and the real reference image data of the color enhancement model training data into a second loss function to obtain an output value of the second loss function, wherein the second loss function is a cycle consistency loss function;
and when the output value of the second loss function is converged, obtaining the color enhancement model.
Optionally, the convolutional neural network corresponding to the resolution enhancement model uses residual networks (ResNet) as basic modules, with a preset cascade mechanism added between the residual networks.
Optionally, training the convolutional neural network by using the resolution enhancement model training data to obtain a resolution enhancement model includes:
inputting the resolution enhancement model training data into a convolutional neural network to obtain output data of the convolutional neural network;
inputting the output data of the convolutional neural network and the real reference image data of the resolution enhancement model training data into a third loss function to obtain an output value of the third loss function, wherein the third loss function is a function based on a characteristic pyramid;
and when the output value of the third loss function is converged, obtaining the resolution enhancement model.
A video image quality enhancement method comprises the following steps:
under the condition of receiving a video image quality enhancement request, analyzing the video image quality enhancement request to obtain a video frame to be enhanced and an enhancement processing option, wherein the enhancement processing option is at least one of a definition enhancement option, a color enhancement option and a resolution enhancement option;
inputting the video frame to be enhanced into a video image quality enhancement model corresponding to the enhancement processing option to obtain a video frame after video image quality enhancement processing, wherein the video image quality enhancement model is pre-constructed according to the construction method of the video image quality enhancement model disclosed by the embodiment, the definition enhancement option corresponds to the definition enhancement model, the color enhancement option corresponds to the color enhancement model, and the resolution enhancement option corresponds to the resolution enhancement model.
Optionally, when the video quality enhancement request includes more than one enhancement processing option, the inputting the to-be-enhanced video frame into a video quality enhancement model corresponding to the enhancement processing option to obtain a video frame after video quality enhancement processing includes:
and inputting the video frame to be enhanced into a video image quality enhancement model corresponding to the enhancement processing option according to the sequence of sharpness enhancement, color enhancement and resolution enhancement to obtain the video frame after the video image quality enhancement processing.
An apparatus for constructing a video quality enhancement model, comprising:
a training data acquisition unit for acquiring sharpness enhancement model training data, resolution enhancement model training data and color enhancement model training data;
the sharpness enhancement model building unit is used for training a self-coding network by utilizing the sharpness enhancement model training data to obtain a sharpness enhancement model, wherein the self-coding network comprises an encoder and a decoder, and the encoder and the decoder are respectively composed of a convolutional neural network;
the color enhancement model building unit is used for training a two-way generative adversarial network by using the color enhancement model training data to obtain a color enhancement model;
the resolution enhancement model construction unit is used for training the convolutional neural network by utilizing the resolution enhancement model training data to obtain a resolution enhancement model;
and the model integration unit is used for converting the definition enhancement model, the color enhancement model and the resolution enhancement model into a preset format and integrating the converted models into a video transcoding program according to the sequence of definition enhancement, color enhancement and resolution enhancement.
Optionally, the self-coding network further includes a noise estimation sub-network, and the noise estimation sub-network is obtained by training the convolutional neural network with noise estimation training data.
Optionally, the sharpness enhancement model constructing unit is specifically configured to:
inputting the definition enhancement model training data into the noise estimation sub-network to obtain a noise value of the definition enhancement model training data;
inputting the sharpness enhancement model training data and the noise value into the encoder and the decoder in sequence to obtain output data of the self-coding network;
inputting the output data of the self-coding network and the real reference image data of the sharpness enhancement model training data into a first loss function to obtain an output value of the first loss function;
and when the output value of the first loss function is converged, obtaining the definition enhancement model.
Optionally, the first loss function is a weighted sum of the least absolute deviation function L1-loss, the least squares error function L2-loss and the smooth loss function smooth-loss.
Optionally, the color enhancement model building unit is specifically configured to:
inputting the color enhancement model training data into the two-way generative adversarial network to obtain output data of the two-way generative adversarial network;
inputting the output data of the two-way generative adversarial network and the real reference image data of the color enhancement model training data into a second loss function to obtain an output value of the second loss function, wherein the second loss function is a cycle consistency loss function;
and when the output value of the second loss function is converged, obtaining the color enhancement model.
Optionally, the convolutional neural network corresponding to the resolution enhancement model uses residual networks (ResNet) as basic modules, with a preset cascade mechanism added between the residual networks.
Optionally, the resolution enhancement model constructing unit is specifically configured to:
inputting the resolution enhancement model training data into a convolutional neural network to obtain output data of the convolutional neural network;
inputting the output data of the convolutional neural network and the real reference image data of the resolution enhancement model training data into a third loss function to obtain an output value of the third loss function, wherein the third loss function is a function based on a characteristic pyramid;
and when the output value of the third loss function is converged, obtaining the resolution enhancement model.
A video quality enhancement apparatus includes:
the enhancement request analysis unit is used for analyzing the video quality enhancement request under the condition of receiving the video quality enhancement request, to obtain a video frame to be enhanced and an enhancement processing option, wherein the enhancement processing option is at least one of a definition enhancement option, a color enhancement option and a resolution enhancement option;
and an enhancement processing unit, configured to input the video frame to be enhanced into a video quality enhancement model corresponding to the enhancement processing option, to obtain a video frame after video quality enhancement processing, where the video quality enhancement model is pre-constructed according to the method for constructing a video quality enhancement model disclosed in the foregoing embodiment, the definition enhancement option corresponds to a definition enhancement model, the color enhancement option corresponds to a color enhancement model, and the resolution enhancement option corresponds to a resolution enhancement model.
Optionally, when the video quality enhancement request includes more than one enhancement processing option, the enhancement processing unit is specifically configured to input the video frame to be enhanced into a video quality enhancement model corresponding to the enhancement processing option according to a sequence of sharpness enhancement, color enhancement, and resolution enhancement, so as to obtain a video frame after video quality enhancement processing.
Compared with the prior art, the invention has the following beneficial effects:
the invention discloses a method for constructing a video image quality enhancement model, which utilizes a machine learning method, realizes the definition enhancement and/or the color enhancement and/or the resolution enhancement of the video image quality in a video transcoding program according to the video image quality enhancement request of a user by constructing a definition enhancement model, a color enhancement model and a resolution enhancement model and integrating the definition enhancement model, the color enhancement model and the resolution enhancement model in a video transcoding program according to the sequence of definition enhancement, color enhancement and/or resolution enhancement, meets different video image quality enhancement requests of the user, and improves the universality of the video image quality enhancement method.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in their description are briefly introduced below. Obviously, the drawings described below show only embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart illustrating a method for constructing a video quality enhancement model according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a self-coding network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a one-way generation type countermeasure network according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a convolutional neural network model according to an embodiment of the present invention;
fig. 5 is a schematic flowchart illustrating a video quality enhancement method according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a video quality enhancement model building apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a video quality enhancement apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, an embodiment of the present invention discloses a method for constructing a video quality enhancement model, which specifically includes the following steps:
S101: acquiring definition enhancement model training data, resolution enhancement model training data and color enhancement model training data;
Definition enhancement mainly covers denoising and deblurring, where the noise includes Gaussian noise, compression noise and the like, and blur refers to the most common case, motion blur. The definition enhancement model training data therefore comes partly from actual low-definition videos on the video platform, and is partly generated by rule-based simulation, i.e. adding random noise and filtering with different operators, under a data expansion rule designed to match the distribution of real low-quality video.
Part of the resolution enhancement model training data uses real low-resolution videos from the video platform, and part is generated by script simulation: while the image size is reduced, random noise or filtering with different operators is added, producing data close to real low-resolution video.
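As a rough sketch of this simulated degradation (for both the definition and the resolution training pairs), the following Python snippet adds random Gaussian noise, filters with a blur operator and optionally downscales; the parameter values and the helper name are illustrative assumptions, not values taken from this disclosure:

```python
import cv2
import numpy as np

def simulate_low_quality(img, max_sigma=10.0, blur_ksize=5, scale=None):
    """Degrade a clean frame (H x W x 3, uint8) into a simulated low-quality sample."""
    out = img.astype(np.float32)
    if scale is not None:  # downscale first when building resolution-enhancement pairs
        h, w = out.shape[:2]
        out = cv2.resize(out, (w // scale, h // scale), interpolation=cv2.INTER_AREA)
    # Filter with a blur operator (a Gaussian kernel here; motion blur also fits).
    out = cv2.GaussianBlur(out, (blur_ksize, blur_ksize), 0)
    # Add random Gaussian noise with a randomly drawn strength.
    sigma = np.random.uniform(0.0, max_sigma)
    out += np.random.normal(0.0, sigma, out.shape)
    # sigma doubles as the ground-truth label for the noise estimation sub-network below.
    return np.clip(out, 0, 255).astype(np.uint8), sigma
```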
In this embodiment, a two-way generative adversarial network is used to train the color enhancement model end to end without supervision, so the color enhancement model training data comes from the video platform's film library; only a set of pictures with the desired color effect needs to be collected.
S102: training a self-coding network by using the definition enhancement model training data to obtain a definition enhancement model, wherein the self-coding network comprises an encoder and a decoder, and the encoder and the decoder respectively consist of a convolutional neural network;
The self-coding network for training the sharpness enhancement model includes an encoder and a decoder; see fig. 2, where the left half is the encoder and the right half is the decoder, each with 11 convolutional layers and 2 pooling layers. In the figure, dedicated symbols (not reproduced here) mark the feature maps, the convolutional layers, the pooling layers and the upsampling layers. In the encoder, the input data is converted step by step, through the multilayer convolutional and pooling layers, into a feature map with a spatial size of 1 x 1 and 256 channels; the decoder then converts it back to the original size and channel count of the input data.
A skip-connection structure is used extensively between the feature maps in this network to combine information from different convolutional layers, which aids gradient propagation and accelerates convergence. Both the encoder and the decoder adopt the residual network (ResNet) as their basic module.
In order to process videos with different low-definition conditions, a noise estimation sub-network is designed: training data passes through the noise estimation network before entering the encoder, and the training data together with the noise value output by the noise estimation network is then fed sequentially into the encoder and the decoder, yielding robust output.
The noise estimation sub-network adopts a plain fully convolutional network; its ground-truth output during training is the noise strength used when the noise map was generated in simulation. Embedding this network makes the whole network insensitive to the noise level of the input image.
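A minimal PyTorch sketch of this arrangement follows. The real network has 11 convolutional layers and 2 pooling layers per side plus ResNet-style blocks, so the depths and channel counts here are placeholder assumptions; only the wiring (noise map concatenated with the input, skip connection between encoder and decoder) mirrors the description:

```python
import torch
import torch.nn as nn

class NoiseEstimator(nn.Module):
    """Plain fully convolutional sub-network predicting a per-pixel noise map."""
    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1))

    def forward(self, x):
        return self.body(x)

class SharpnessAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.noise = NoiseEstimator()
        self.enc1 = nn.Sequential(nn.Conv2d(4, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.down = nn.MaxPool2d(2)
        self.enc2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True))
        self.up = nn.Upsample(scale_factor=2, mode='nearest')
        self.dec = nn.Conv2d(128 + 64, 3, 3, padding=1)

    def forward(self, x):
        n = self.noise(x)                       # estimated noise map
        f1 = self.enc1(torch.cat([x, n], 1))    # image and noise enter the encoder together
        f2 = self.enc2(self.down(f1))
        u = self.up(f2)
        return self.dec(torch.cat([u, f1], 1))  # skip connection merges encoder features
```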
In order to ensure the training effect of the sharpness enhancement model, this embodiment evaluates it through the first loss function, a weighted sum of the least absolute deviation function L1-loss, the least squares error function L2-loss and the smooth loss function smooth-loss, which ensures fast network convergence and stable model training.
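A sketch of such a combined loss is below; the weights and the total-variation-style form chosen for smooth-loss are assumptions, since the disclosure does not spell out the exact expression:

```python
import torch.nn.functional as F

def first_loss(pred, target, w1=1.0, w2=1.0, ws=0.1):
    """Weighted sum of L1-loss, L2-loss and a smoothness term (assumed weights)."""
    l1 = F.l1_loss(pred, target)
    l2 = F.mse_loss(pred, target)
    # Total-variation-style smoothness penalty on the output image.
    smooth = (pred[..., :, 1:] - pred[..., :, :-1]).abs().mean() \
           + (pred[..., 1:, :] - pred[..., :-1, :]).abs().mean()
    return w1 * l1 + w2 * l2 + ws * smooth
```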
The output of the training data after it passes through the whole self-coding network is fed, together with the real reference image data, into the first loss function, and the result is back-propagated to update the network parameters. The optimizer is the Adam implementation provided by the PyTorch framework, with beta1 = 0.9 and beta2 = 0.999; batch_size is set to 16 and the initial learning rate to 0.001. Supervised training runs for 100 epochs with a stepwise decay strategy, i.e. every 20 epochs the learning rate drops to 10% of its previous value.
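The stated hyperparameters map onto PyTorch roughly as follows; `loader` is a hypothetical data loader yielding batches of 16 noisy/clean training pairs, and `SharpnessAutoencoder` and `first_loss` refer to the sketches above:

```python
import torch

model = SharpnessAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))
# Stepwise decay: every 20 epochs the learning rate drops to 10% of its previous value.
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=20, gamma=0.1)

for epoch in range(100):
    for noisy, clean in loader:  # hypothetical loader, batch_size=16
        opt.zero_grad()
        loss = first_loss(model(noisy), clean)
        loss.backward()
        opt.step()
    sched.step()
```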
S103: training a two-way generative adversarial network by using the color enhancement model training data to obtain a color enhancement model;
The two-way generative adversarial network (GAN) structure fully exploits the advantages of GAN in image generation and relies on the video platform's film library for end-to-end unsupervised training. The structure of a single-path GAN is shown in FIG. 3; the two-way GAN places two single-path GANs in parallel and adds connecting mechanisms between them. The generator produces a color-enhanced image, and the discriminator distinguishes the true target image from the enhanced image produced by the generator. The color-grading problem is treated as an image translation problem, i.e. translating an image of one style into an image of another style. Borrowing from the style transfer algorithm Cycle-GAN, the second loss function adopts the cycle consistency loss, which greatly reduces instability during GAN training. Unlike other tasks, paired training data is hard to obtain for color enhancement, so the GAN is trained without supervision; only a set of pictures with the desired color effect needs to be collected, which greatly lowers the difficulty of data collection. During training, the batch size is set to 4, the learning rate of both the generator and the discriminator to 0.00001, and the learning rate follows a stepwise decay strategy for the unsupervised training.
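The cycle consistency term can be written as in standard Cycle-GAN; `G_ab` and `G_ba` stand for the two generators of the two-way structure, the adversarial terms are omitted, and the weight `lam` is an assumption (the disclosure gives no value):

```python
import torch.nn.functional as F

def cycle_consistency_loss(real_a, real_b, G_ab, G_ba, lam=10.0):
    """Translating to the other color style and back should reproduce the input."""
    rec_a = G_ba(G_ab(real_a))  # A -> B -> A
    rec_b = G_ab(G_ba(real_b))  # B -> A -> B
    return lam * (F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b))
```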
S104: training the convolutional neural network by using the resolution enhancement model training data to obtain a resolution enhancement model;
Referring to fig. 4, the convolutional neural network for training the resolution enhancement model uses ResNet blocks as basic modules. To reduce the total number of parameters, a cascade mechanism is added between the ResNet modules: the output of each intermediate layer is cascaded to higher layers and finally converges at the last convolutional layer. The first and last layers of the network are mean-shift layers with 1 x 1 convolution kernels, which perform mean subtraction and its inverse respectively and need no parameter updates during training. Every other convolution kernel is 3 x 3, the activation function is ReLU, and the upsampling layer uses PixelShuffle to upscale the output feature map.
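A reduced sketch of such a cascaded residual network is given below; the mean-shift layers are omitted, and the block count, channel width and scale factor are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return x + self.conv2(F.relu(self.conv1(x)))

class CascadedSRNet(nn.Module):
    def __init__(self, ch=64, n_blocks=3, scale=2):
        super().__init__()
        self.head = nn.Conv2d(3, ch, 3, padding=1)
        self.blocks = nn.ModuleList(ResBlock(ch) for _ in range(n_blocks))
        # The cascaded intermediate outputs all converge at this last convolution.
        self.fuse = nn.Conv2d(ch * (n_blocks + 1), ch, 1)
        self.up = nn.Sequential(nn.Conv2d(ch, ch * scale * scale, 3, padding=1),
                                nn.PixelShuffle(scale))  # upscales the feature map
        self.tail = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, x):
        feats = [self.head(x)]
        for block in self.blocks:
            feats.append(block(feats[-1]))      # cascade each block's output forward
        return self.tail(self.up(self.fuse(torch.cat(feats, dim=1))))
```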
The convolutional neural network evaluates the training effect through a third loss function, the third loss function design adopts the characteristic pyramid idea, and the polynomial sum of some intermediate layers and the final output layer is used as a final expression. The shallow layer of the network contains more basic information including textures, lines and the like, the high layer of the network contains more semantic information, and the loss function is designed by adopting the idea of the feature pyramid.
The optimizer is Adam, with the batch size set to 64 and the initial learning rate to 0.0001; the learning rate follows a stepwise decay strategy for the supervised training.
S105: converting the definition enhancement model, the color enhancement model and the resolution enhancement model into a preset format, and integrating the converted models into a video transcoding program according to the sequence of definition enhancement, color enhancement and resolution enhancement.
The model training algorithms in this embodiment are developed with the PyTorch framework; once the network structures were designed, the algorithms were trained on an NVIDIA Tesla P40 GPU. The training parameters are adjusted continually according to the training output until each algorithm converges below the desired precision. The trained models are converted into the pb format of the TensorFlow framework so that they can be integrated into the ffmpeg transcoding flow. The final usage flow is roughly: source video → decoding into video frames → video scene segmentation → selecting different model combinations for video enhancement → merging and outputting video frames.
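One possible conversion path from a trained PyTorch model to a TensorFlow pb graph goes through ONNX; the disclosure only names pb as the target format, so the onnx/onnx-tf route and the input resolution below are assumptions:

```python
import torch

model = SharpnessAutoencoder().eval()   # from the sketch above
dummy = torch.randn(1, 3, 256, 256)     # assumed input resolution
torch.onnx.export(model, dummy, "sharpness.onnx", opset_version=11)

# Then, with the onnx and onnx-tf packages installed (assumed tooling):
#   import onnx
#   from onnx_tf.backend import prepare
#   prepare(onnx.load("sharpness.onnx")).export_graph("sharpness.pb")
```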
The embodiment also discloses a video quality enhancement method, which performs video quality enhancement processing by using the video quality enhancement model constructed by the embodiment, please refer to fig. 5, and the method comprises the following steps:
S201: under the condition that a video image quality enhancement request is received, analyzing the video image quality enhancement request to obtain a video frame to be enhanced and an enhancement processing option, wherein the enhancement processing option is at least one of a definition enhancement option, a color enhancement option and a resolution enhancement option;
That is to say, a corresponding video image quality enhancement request is sent according to the user's enhancement requirements for different video frames; the enhancement processing option in the request may be any one, any two, or all three of the definition enhancement option, the color enhancement option and the resolution enhancement option.
S202: and inputting the video frame to be enhanced into a video image quality enhancement model corresponding to the enhancement processing option to obtain the video frame after the video image quality enhancement processing, wherein the definition enhancement option corresponds to the definition enhancement model, the color enhancement option corresponds to the color enhancement model, and the resolution enhancement option corresponds to the resolution enhancement model.
When the video image quality enhancement request comprises more than one enhancement processing option, inputting the video frame to be enhanced into the video image quality enhancement model corresponding to the enhancement processing option according to the sequence of sharpness enhancement, color enhancement and resolution enhancement, and obtaining the video frame after the video image quality enhancement processing.
Taking as an example a video image quality enhancement request that includes the definition enhancement option, the color enhancement option and the resolution enhancement option: the video frame to be enhanced is input into the definition enhancement model, the output of the definition enhancement model is input into the color enhancement model, and the output of the color enhancement model is input into the resolution enhancement model; the output of the resolution enhancement model is the video frame after the video image quality enhancement processing.
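A minimal sketch of this fixed-order chaining, with `models` as a mapping from option name to a frame-level callable (the names are illustrative):

```python
def enhance_frames(frames, requested_options, models):
    """Apply the requested models in the fixed order sharpness -> color -> resolution."""
    for name in ("sharpness", "color", "resolution"):
        if name in requested_options:
            frames = [models[name](frame) for frame in frames]
    return frames
```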
The video image quality enhancement method disclosed in this embodiment uses machine learning to construct a definition enhancement model, a color enhancement model and a resolution enhancement model, integrates them into a video transcoding program in the sequence definition enhancement, color enhancement, resolution enhancement, and carries out definition and/or color and/or resolution enhancement of the video image quality according to the user's video image quality enhancement request, satisfying different video image quality enhancement requests and improving the universality of the video image quality enhancement method.
Based on the method for constructing a video quality enhancement model disclosed in the foregoing embodiment, this embodiment correspondingly discloses a device for constructing a video quality enhancement model, please refer to fig. 6, which includes:
a training data obtaining unit 401, configured to obtain sharpness enhancement model training data, resolution enhancement model training data, and color enhancement model training data;
a sharpness enhancement model construction unit 402, configured to train a self-coding network with the sharpness enhancement model training data to obtain a sharpness enhancement model, where the self-coding network includes an encoder and a decoder, and the encoder and the decoder are respectively composed of a convolutional neural network;
a color enhancement model construction unit 403, configured to train a two-way generative adversarial network by using the color enhancement model training data to obtain a color enhancement model;
a resolution enhancement model construction unit 404, configured to train the convolutional neural network by using the resolution enhancement model training data to obtain a resolution enhancement model;
a model integration unit 405, configured to convert the sharpness enhancement model, the color enhancement model and the resolution enhancement model into a preset format and integrate the converted models into a video transcoding program according to the sequence of sharpness enhancement, color enhancement and resolution enhancement.
Optionally, the self-coding network further includes a noise estimation sub-network, and the noise estimation sub-network is obtained by training the convolutional neural network with noise estimation training data.
Optionally, the sharpness enhancement model constructing unit 402 is specifically configured to:
inputting the definition enhancement model training data into the noise estimation sub-network to obtain a noise value of the definition enhancement model training data;
inputting the sharpness enhancement model training data and the noise value into the encoder and the decoder in sequence to obtain output data of the self-coding network;
inputting the output data of the self-coding network and the real reference image data of the sharpness enhancement model training data into a first loss function to obtain an output value of the first loss function;
and when the output value of the first loss function is converged, obtaining the definition enhancement model.
Optionally, the first loss function is a weighted sum of the least absolute deviation function L1-loss, the least squares error function L2-loss and the smooth loss function smooth-loss.
Optionally, the color enhancement model building unit 403 is specifically configured to:
inputting the color enhancement model training data into the two-way generative adversarial network to obtain output data of the two-way generative adversarial network;
inputting the output data of the two-way generative adversarial network and the real reference image data of the color enhancement model training data into a second loss function to obtain an output value of the second loss function, wherein the second loss function is a cycle consistency loss function;
and when the output value of the second loss function is converged, obtaining the color enhancement model.
Optionally, the convolutional neural network corresponding to the resolution enhancement model uses residual networks (ResNet) as basic modules, with a preset cascade mechanism added between the residual networks.
Optionally, the resolution enhancement model constructing unit 404 is specifically configured to:
inputting the resolution enhancement model training data into a convolutional neural network to obtain output data of the convolutional neural network;
inputting the output data of the convolutional neural network and the real reference image data of the resolution enhancement model training data into a third loss function to obtain an output value of the third loss function, wherein the third loss function is a function based on a characteristic pyramid;
and when the output value of the third loss function is converged, obtaining the resolution enhancement model.
Based on the video quality enhancement method disclosed in the above embodiment, the present embodiment correspondingly discloses a video quality enhancement apparatus, please refer to fig. 7, which includes:
an enhancement request parsing unit 501, configured to parse a received video quality enhancement request to obtain a video frame to be enhanced and an enhancement processing option, wherein the enhancement processing option is at least one of a definition enhancement option, a color enhancement option and a resolution enhancement option;
an enhancement processing unit 502, configured to input the video frame to be enhanced into a video quality enhancement model corresponding to the enhancement processing option, to obtain a video frame after video quality enhancement processing, where the video quality enhancement model is pre-constructed according to the method for constructing a video quality enhancement model disclosed in the foregoing embodiment, the definition enhancement option corresponds to a definition enhancement model, the color enhancement option corresponds to a color enhancement model, and the resolution enhancement option corresponds to a resolution enhancement model.
Optionally, when the video quality enhancement request includes more than one enhancement processing option, the enhancement processing unit is specifically configured to input the video frame to be enhanced into a video quality enhancement model corresponding to the enhancement processing option according to a sequence of sharpness enhancement, color enhancement, and resolution enhancement, so as to obtain a video frame after video quality enhancement processing.
The device for constructing a video image quality enhancement model and the video image quality enhancement device disclosed above use machine learning to construct a definition enhancement model, a color enhancement model and a resolution enhancement model, integrate them into a video transcoding program in the sequence definition enhancement, color enhancement, resolution enhancement, and carry out definition and/or color and/or resolution enhancement of the video image quality according to the user's video image quality enhancement request, satisfying different video image quality enhancement requests and improving the universality of the video image quality enhancement method.
The above embodiments can be combined arbitrarily, and the features described in the different embodiments of this specification can be replaced with or combined with one another, so that those skilled in the art can implement or use the present application.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (11)
1. A method for constructing a video image quality enhancement model is characterized by comprising the following steps:
acquiring definition enhancement model training data, resolution enhancement model training data and color enhancement model training data;
training a self-coding network by using the definition enhancement model training data to obtain a definition enhancement model, wherein the self-coding network comprises an encoder and a decoder, and the encoder and the decoder respectively consist of a convolutional neural network;
training a two-way generative adversarial network by using the color enhancement model training data to obtain a color enhancement model;
training a convolutional neural network by using the resolution enhancement model training data to obtain a resolution enhancement model;
and converting the definition enhancement model, the color enhancement model and the resolution enhancement model into a preset format, and integrating the converted models into a video transcoding program according to the sequence of definition enhancement, color enhancement and resolution enhancement.
2. The method of claim 1, wherein the self-encoding network further comprises a noise estimation sub-network that is derived from training a convolutional neural network with noise estimation training data.
3. The method of claim 2, wherein training a self-coding network with the sharpness enhancement model training data to obtain a sharpness enhancement model comprises:
inputting the definition enhancement model training data into the noise estimation sub-network to obtain a noise value of the definition enhancement model training data;
inputting the sharpness enhancement model training data and the noise value into the encoder and the decoder in sequence to obtain output data of the self-coding network;
inputting the output data of the self-coding network and the real reference image data of the sharpness enhancement model training data into a first loss function to obtain an output value of the first loss function;
and when the output value of the first loss function is converged, obtaining the definition enhancement model.
4. The method of claim 1, wherein the first loss function is a weighted sum of a least absolute deviation function L1-loss, a least squares error function L2-loss and a smooth loss function smooth-loss.
5. The method of claim 1, wherein training the two-way generative adversarial network using the color enhancement model training data to obtain a color enhancement model comprises:
inputting the color enhancement model training data into the two-way generative adversarial network to obtain output data of the two-way generative adversarial network;
inputting the output data of the two-way generative adversarial network and the real reference image data of the color enhancement model training data into a second loss function to obtain an output value of the second loss function, wherein the second loss function is a cycle consistency loss function;
and when the output value of the second loss function is converged, obtaining the color enhancement model.
6. The method according to claim 1, wherein the convolutional neural network corresponding to the resolution enhancement model uses residual networks as basic modules, and a preset cascade mechanism is added between the residual networks.
7. The method of claim 6, wherein training the convolutional neural network using the resolution enhancement model training data to obtain a resolution enhancement model comprises:
inputting the resolution enhancement model training data into a convolutional neural network to obtain output data of the convolutional neural network;
inputting the output data of the convolutional neural network and the real reference image data of the resolution enhancement model training data into a third loss function to obtain an output value of the third loss function, wherein the third loss function is a function based on a characteristic pyramid;
and when the output value of the third loss function is converged, obtaining the resolution enhancement model.
8. A method for enhancing video quality, comprising:
under the condition of receiving a video image quality enhancement request, analyzing the video image quality enhancement request to obtain a video frame to be enhanced and an enhancement processing option, wherein the enhancement processing option is at least one of a definition enhancement option, a color enhancement option and a resolution enhancement option;
inputting the video frame to be enhanced into a video image quality enhancement model corresponding to the enhancement processing option to obtain a video frame after video image quality enhancement processing, wherein the video image quality enhancement model is pre-constructed according to the method for constructing a video image quality enhancement model according to any one of claims 1 to 7, the definition enhancement option corresponds to a definition enhancement model, the color enhancement option corresponds to a color enhancement model, and the resolution enhancement option corresponds to a resolution enhancement model.
9. The method of claim 8, wherein when the video quality enhancement request includes more than one enhancement processing option, the inputting the video frame to be enhanced into a video quality enhancement model corresponding to the enhancement processing option to obtain a video frame after video quality enhancement comprises:
and inputting the video frame to be enhanced into a video image quality enhancement model corresponding to the enhancement processing option according to the sequence of sharpness enhancement, color enhancement and resolution enhancement to obtain the video frame after the video image quality enhancement processing.
10. An apparatus for constructing a video quality enhancement model, comprising:
a training data acquisition unit for acquiring sharpness enhancement model training data, resolution enhancement model training data, and color enhancement model training data;
the sharpness enhancement model building unit is used for training a self-coding network by utilizing the sharpness enhancement model training data to obtain a sharpness enhancement model, wherein the self-coding network comprises an encoder and a decoder, and the encoder and the decoder are respectively composed of a convolutional neural network;
the color enhancement model building unit is used for training a two-way generative adversarial network by using the color enhancement model training data to obtain a color enhancement model;
the resolution enhancement model construction unit is used for training the convolutional neural network by utilizing the resolution enhancement model training data to obtain a resolution enhancement model;
and the model integration unit is used for converting the definition enhancement model, the color enhancement model and the resolution enhancement model into a preset format and integrating the converted models into a video transcoding program according to the sequence of definition enhancement, color enhancement and resolution enhancement.
11. A video quality enhancement apparatus, comprising:
the enhancement request analysis unit is used for analyzing the video quality enhancement request under the condition of receiving the video quality enhancement request, to obtain a video frame to be enhanced and an enhancement processing option, wherein the enhancement processing option is at least one of a definition enhancement option, a color enhancement option and a resolution enhancement option;
an enhancement processing unit, configured to input the video frame to be enhanced into a video quality enhancement model corresponding to the enhancement processing option, so as to obtain a video frame after video quality enhancement processing, wherein the video quality enhancement model is pre-constructed according to the method for constructing a video image quality enhancement model according to any one of claims 1 to 7, the sharpness enhancement option corresponds to a sharpness enhancement model, the color enhancement option corresponds to a color enhancement model, and the resolution enhancement option corresponds to a resolution enhancement model.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011277818.XA (granted as CN114513684B) | 2020-11-16 | 2020-11-16 | Method for constructing video image quality enhancement model, video image quality enhancement method and device |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN114513684A | 2022-05-17 |
| CN114513684B | 2024-05-28 |

Family ID: 81547015
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202011277818.XA (CN114513684B, active) | Method for constructing video image quality enhancement model, video image quality enhancement method and device | 2020-11-16 | 2020-11-16 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN114513684B (en) |
Citations (5)

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| CN108235058A (*) | 2018-01-12 | 2018-06-29 | Video quality processing method, storage medium and terminal |
| CN110020684A (*) | 2019-04-08 | 2019-07-16 | A kind of image de-noising method based on residual error convolution autoencoder network |
| CN110189268A (*) | 2019-05-23 | 2019-08-30 | Underwater picture color correcting method based on GAN network |
| CN110263801A (*) | 2019-03-08 | 2019-09-20 | Image processing model generation method and device, electronic equipment |
| CN111882489A (*) | 2020-05-15 | 2020-11-03 | Super-resolution graph recovery method for simultaneously enhancing underwater images |
Also Published As

| Publication number | Publication date |
|---|---|
| CN114513684B (en) | 2024-05-28 |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |