CN114513684B - Method for constructing video image quality enhancement model, video image quality enhancement method and device - Google Patents
- Publication number: CN114513684B (Application CN202011277818.XA)
- Authority
- CN
- China
- Prior art keywords
- enhancement
- enhancement model
- model
- sharpness
- training data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234363—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the spatial resolution, e.g. for clients with a lower screen resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440263—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the spatial resolution, e.g. for displaying on a connected PDA
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/14—Picture signal circuitry for video frequency region
- H04N5/21—Circuitry for suppressing or minimising disturbance, e.g. moiré or halo
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention provides a method for constructing a video image quality enhancement model, together with a video image quality enhancement method and device. A sharpness enhancement model, a color enhancement model, and a resolution enhancement model are built with machine learning and integrated into a video transcoding program in the order sharpness enhancement, color enhancement, resolution enhancement. The transcoding program can then apply sharpness and/or color and/or resolution enhancement according to a user's video image quality enhancement request, satisfying different enhancement requests and improving the generality of the video image quality enhancement method.
Description
Technical Field
The present invention relates to the field of image processing, and in particular to a method for constructing a video image quality enhancement model and a method and apparatus for enhancing video image quality.
Background
As times have developed, people's expectations for video image quality have risen steadily. Yet a large amount of low-quality video still exists, caused by outdated shooting equipment, poor shooting technique, and damage to image quality during production, transcoding, and transmission. Such video seriously degrades the viewing experience and, under otherwise equal conditions, increases bitrate overhead. Improving the image quality of low-quality video is therefore important.
Video image quality enhancement methods generally fall into conventional methods and deep learning methods. Conventional methods mostly rely on enhancement rules established by experts from video attributes (brightness, color temperature, and so on); their effect depends on experience and their accuracy is low. Existing deep learning methods are mostly studied for a single specialized scenario, training one end-to-end model, and so generalize poorly.
Disclosure of Invention
In view of the above, the invention provides a method for constructing a video image quality enhancement model and a method and device for enhancing video image quality that achieve sharpness, color, and resolution enhancement of video frames with strong generality and good enhancement quality.
In order to achieve the above purpose, the specific technical scheme provided by the invention is as follows:
A method for constructing a video image quality enhancement model comprises the following steps:
acquiring sharpness enhancement model training data, resolution enhancement model training data, and color enhancement model training data;
training a self-coding network with the sharpness enhancement model training data to obtain a sharpness enhancement model, wherein the self-coding network comprises an encoder and a decoder, each composed of a convolutional neural network;
training a two-way generative adversarial network with the color enhancement model training data to obtain a color enhancement model;
training the convolutional neural network with the resolution enhancement model training data to obtain a resolution enhancement model;
converting the sharpness enhancement model, the color enhancement model, and the resolution enhancement model into a preset format and integrating them into a video transcoding program in the order sharpness enhancement, color enhancement, resolution enhancement.
Optionally, the self-coding network further includes a noise estimation sub-network, obtained by training a convolutional neural network with noise estimation training data.
Optionally, training the self-coding network with the sharpness enhancement model training data to obtain a sharpness enhancement model includes:
inputting the sharpness enhancement model training data into the noise estimation sub-network to obtain noise values of the training data;
inputting the sharpness enhancement model training data and the noise values sequentially into the encoder and the decoder to obtain output data of the self-coding network;
inputting the output data of the self-coding network and the ground-truth reference image data of the sharpness enhancement model training data into a first loss function to obtain an output value of the first loss function;
and obtaining the sharpness enhancement model when the output value of the first loss function converges.
Optionally, the first loss function is a weighted sum of the least absolute deviation loss (L1 loss), the least squares error loss (L2 loss), and a smoothing loss (smooth loss).
Optionally, training the two-way generative adversarial network with the color enhancement model training data to obtain a color enhancement model includes:
inputting the color enhancement model training data into the two-way generative adversarial network to obtain output data of the network;
inputting the output data of the network and the ground-truth reference image data of the color enhancement model training data into a second loss function to obtain an output value of the second loss function, wherein the second loss function is a cycle consistency loss function;
and obtaining the color enhancement model when the output value of the second loss function converges.
Optionally, the convolutional neural network corresponding to the resolution enhancement model uses residual networks as basic modules, with a preset cascade mechanism added between the residual networks.
Optionally, training the convolutional neural network with the resolution enhancement model training data to obtain a resolution enhancement model includes:
inputting the resolution enhancement model training data into the convolutional neural network to obtain output data of the network;
inputting the output data of the network and the ground-truth reference image data of the resolution enhancement model training data into a third loss function to obtain an output value of the third loss function, wherein the third loss function is a feature-pyramid-based function;
and obtaining the resolution enhancement model when the output value of the third loss function converges.
A video image quality enhancement method, comprising:
In response to receiving a video image quality enhancement request, parsing the request to obtain a video frame to be enhanced and enhancement processing options, wherein the enhancement processing options are at least one of a sharpness enhancement option, a color enhancement option, and a resolution enhancement option;
inputting the video frame to be enhanced into the video image quality enhancement model corresponding to the enhancement processing options to obtain a video frame after video image quality enhancement processing, wherein the video image quality enhancement models are pre-constructed by the method for constructing a video image quality enhancement model disclosed in the embodiments above; the sharpness enhancement option corresponds to the sharpness enhancement model, the color enhancement option to the color enhancement model, and the resolution enhancement option to the resolution enhancement model.
Optionally, when the video image quality enhancement request includes more than one enhancement processing option, inputting the video frame to be enhanced into the video image quality enhancement model corresponding to the options to obtain the enhanced frame includes:
inputting the video frame to be enhanced into the corresponding video image quality enhancement models in the order sharpness enhancement, color enhancement, resolution enhancement, to obtain a video frame after video image quality enhancement processing.
A device for constructing a video image quality enhancement model, comprising:
a training data acquisition unit for acquiring sharpness enhancement model training data, resolution enhancement model training data, and color enhancement model training data;
a sharpness enhancement model building unit for training a self-coding network with the sharpness enhancement model training data to obtain a sharpness enhancement model, wherein the self-coding network comprises an encoder and a decoder, each composed of convolutional neural networks;
a color enhancement model building unit for training a two-way generative adversarial network with the color enhancement model training data to obtain a color enhancement model;
a resolution enhancement model building unit for training a convolutional neural network with the resolution enhancement model training data to obtain a resolution enhancement model;
and a model integration unit for converting the sharpness enhancement model, the color enhancement model, and the resolution enhancement model into a preset format and integrating them into a video transcoding program in the order sharpness enhancement, color enhancement, resolution enhancement.
Optionally, the self-coding network further includes a noise estimation sub-network, obtained by training a convolutional neural network with noise estimation training data.
Optionally, the sharpness enhancement model building unit is specifically configured to:
input the sharpness enhancement model training data into the noise estimation sub-network to obtain noise values of the training data;
input the training data and the noise values sequentially into the encoder and the decoder to obtain output data of the self-coding network;
input the output data of the self-coding network and the ground-truth reference image data of the training data into the first loss function to obtain its output value;
and obtain the sharpness enhancement model when the output value of the first loss function converges.
Optionally, the first loss function is a weighted sum of the least absolute deviation loss (L1 loss), the least squares error loss (L2 loss), and a smoothing loss (smooth loss).
Optionally, the color enhancement model building unit is specifically configured to:
input the color enhancement model training data into the two-way generative adversarial network to obtain output data of the network;
input the output data of the network and the ground-truth reference image data of the color enhancement model training data into the second loss function, a cycle consistency loss function, to obtain its output value;
and obtain the color enhancement model when the output value of the second loss function converges.
Optionally, the convolutional neural network corresponding to the resolution enhancement model uses residual networks as basic modules, with a preset cascade mechanism added between the residual networks.
Optionally, the resolution enhancement model building unit is specifically configured to:
input the resolution enhancement model training data into the convolutional neural network to obtain its output data;
input the output data of the network and the ground-truth reference image data of the resolution enhancement model training data into the third loss function, a feature-pyramid-based function, to obtain its output value;
and obtain the resolution enhancement model when the output value of the third loss function converges.
A video image quality enhancement apparatus comprising:
an enhancement request parsing unit for, upon receiving a video image quality enhancement request, parsing the request to obtain the video frame to be enhanced and the enhancement processing options, wherein the enhancement processing options are at least one of a sharpness enhancement option, a color enhancement option, and a resolution enhancement option;
an enhancement processing unit for inputting the video frame to be enhanced into the video image quality enhancement model corresponding to the enhancement processing options to obtain a video frame after video image quality enhancement processing, wherein the video image quality enhancement models are pre-constructed by the method for constructing a video image quality enhancement model disclosed in the embodiments above; the sharpness enhancement option corresponds to the sharpness enhancement model, the color enhancement option to the color enhancement model, and the resolution enhancement option to the resolution enhancement model.
Optionally, when the video image quality enhancement request includes more than one enhancement processing option, the enhancement processing unit is specifically configured to input the video frame to be enhanced into the corresponding models in the order sharpness enhancement, color enhancement, resolution enhancement, to obtain a video frame after video image quality enhancement processing.
Compared with the prior art, the invention has the following beneficial effects:
The disclosed method for constructing a video image quality enhancement model builds a sharpness enhancement model, a color enhancement model, and a resolution enhancement model by machine learning and integrates them into a video transcoding program in the order sharpness enhancement, color enhancement, resolution enhancement. The transcoding program can then apply sharpness and/or color and/or resolution enhancement according to a user's video image quality enhancement request, satisfying different enhancement requests and improving the generality of the video image quality enhancement method.
Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for the embodiments are briefly introduced below. The drawings described below show only embodiments of the present invention; a person skilled in the art may derive other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of a method for constructing a video image quality enhancement model according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a self-coding network according to an embodiment of the present invention;
- FIG. 3 is a schematic diagram of a single-path generative adversarial network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a convolutional neural network model according to an embodiment of the present invention;
Fig. 5 is a flow chart of a video image quality enhancement method according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a device for constructing a video image quality enhancement model according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a video image quality enhancement device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort shall fall within the protection scope of the present invention.
Referring to fig. 1, the embodiment of the invention discloses a method for constructing a video image quality enhancement model, which specifically comprises the following steps:
S101: acquiring sharpness enhancement model training data, resolution enhancement model training data, and color enhancement model training data;
Sharpness enhancement mainly covers denoising and deblurring, where the noise includes Gaussian noise, compression noise, and the like, and the blur refers chiefly to the most common case, motion blur. The sharpness enhancement model training data is therefore drawn on the one hand from actual low-sharpness videos on the video platform, and on the other hand generated by data-expansion rules designed to match the distribution of real low-quality video, simulating different degradations such as adding random noise and filtering with different operators.
The resolution enhancement model training data likewise comes partly from real low-resolution videos of the video platform and partly from script simulation: the image size is reduced while random noise or filtering with different operators is added, producing data close to real low-resolution video.
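The patent does not give a concrete degradation script; the following is a minimal sketch of the kind of simulation described above, assuming OpenCV and NumPy, with illustrative (not patent-specified) noise levels, kernel sizes, and scale factor:

```python
import cv2
import numpy as np

def degrade(hq_frame: np.ndarray, scale: int = 2) -> np.ndarray:
    """Synthesize a low-quality frame from a high-quality reference."""
    h, w = hq_frame.shape[:2]
    # Downscale to mimic a low-resolution source.
    lq = cv2.resize(hq_frame, (w // scale, h // scale),
                    interpolation=cv2.INTER_AREA)
    # Add random Gaussian noise to mimic sensor/compression noise.
    sigma = np.random.uniform(1.0, 10.0)
    lq = lq.astype(np.float32) + np.random.normal(0.0, sigma, lq.shape)
    # Filter with a randomly sized blur operator to mimic motion/defocus blur.
    k = int(np.random.choice([3, 5, 7]))
    lq = cv2.GaussianBlur(lq, (k, k), 0)
    return np.clip(lq, 0, 255).astype(np.uint8)
```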
In this embodiment the color enhancement model is trained with a two-way generative adversarial network, end to end and unsupervised, so its training data is simply the video platform's film library: only a set of images with the desired color effect needs to be collected.
S102: training a self-coding network with the sharpness enhancement model training data to obtain a sharpness enhancement model, where the self-coding network includes an encoder and a decoder, each composed of convolutional neural networks;
The self-coding network used to train the sharpness enhancement model includes an encoder and a decoder. Referring to fig. 2, the left half is the encoder and the right half is the decoder, each with 11 convolutional layers and 2 pooling layers; the legend in fig. 2 distinguishes feature maps, convolutional layers, pooling layers, and upsampling layers. In the encoder, the input data is converted step by step through the convolutional and pooling layers into a feature map of spatial size 1×1 with 256 channels, which the decoder then converts back to the original size and channel count of the input data.
Skip connections are used extensively between feature maps in the network structure to combine information from different convolutional layers, which aids gradient propagation and accelerates convergence. Both the encoder and the decoder adopt a residual network (ResNet) as their basic module.
To handle videos with different kinds of low sharpness, a noise estimation sub-network is designed: before entering the encoder, training data is first fed into the noise estimation network, and the training data together with the noise values it outputs is then fed sequentially into the encoder and the decoder, yielding a robust output.
The noise estimation sub-network is an ordinary fully convolutional network; during training, its output is fitted to the ground-truth noise magnitude of the simulated noise map. Embedding this sub-network makes the whole model insensitive to the noise level of the input image.
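As a rough illustration of this arrangement, the sketch below is an assumption-laden stand-in (not the patent's exact 11-convolution/2-pooling configuration, and with skip connections omitted): a noise estimation sub-network predicts a per-pixel noise map that is concatenated with the input before the encoder.

```python
import torch
import torch.nn as nn

class NoiseEstimator(nn.Module):
    """Plain fully convolutional sub-network predicting a per-pixel noise map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, x):
        return self.net(x)

class SharpnessEnhancer(nn.Module):
    def __init__(self):
        super().__init__()
        self.noise_net = NoiseEstimator()
        # Encoder/decoder stand-ins; the patent's version uses residual
        # blocks plus skip connections between encoder and decoder maps.
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, x):
        noise = self.noise_net(x)                    # estimate noise first
        z = self.encoder(torch.cat([x, noise], 1))   # condition on noise map
        return self.decoder(z)
```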
To guarantee the training quality of the sharpness enhancement model, training is evaluated with the first loss function, a weighted sum of the least absolute deviation loss (L1 loss), the least squares error loss (L2 loss), and a smoothing loss (smooth loss); this both ensures fast network convergence and stabilizes model training.
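A minimal sketch of such a composite loss follows; the weights are assumed, and the smoothing term is interpreted here as a total-variation penalty on the output, one common reading of a smoothing loss:

```python
import torch.nn.functional as F

def first_loss(pred, target, w1=1.0, w2=1.0, ws=0.1):
    """Weighted sum of L1, L2, and a total-variation smoothing term."""
    l1 = F.l1_loss(pred, target)            # least absolute deviation
    l2 = F.mse_loss(pred, target)           # least squares error
    tv = (pred[..., :, 1:] - pred[..., :, :-1]).abs().mean() \
       + (pred[..., 1:, :] - pred[..., :-1, :]).abs().mean()
    return w1 * l1 + w2 * l2 + ws * tv
```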
The output of the training data after passing through the whole self-coding network is fed, together with the ground-truth reference image data, into the first loss function, and the result is back-propagated to update the network parameters. The optimizer is the Adam implementation built into the PyTorch framework, with beta1 = 0.9 and beta2 = 0.999; the batch size is set to 16 and the initial learning rate to 0.001. Supervised training runs for 100 epochs under a step-decay schedule in which the learning rate drops to 10% of its previous value every 20 epochs.
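Expressed as code, the training setup described above might look as follows; `model` and `train_loader` (a DataLoader built with batch size 16) are assumed to be defined, and `first_loss` is the sketch above:

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
# Step decay: multiply the learning rate by 0.1 every 20 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

for epoch in range(100):
    for lq, ref in train_loader:            # pairs of degraded/reference frames
        optimizer.zero_grad()
        loss = first_loss(model(lq), ref)   # compare against reference images
        loss.backward()                     # back-propagate the loss
        optimizer.step()
    scheduler.step()                        # decay once per epoch
```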
S103: training the two-way generative adversarial network with the color enhancement model training data to obtain the color enhancement model;
The two-way generative adversarial network (GAN) structure fully exploits the strengths of GANs in image generation and is trained end to end, unsupervised, on the video platform's film library. The structure of a single-path GAN is shown in fig. 3; the two-way GAN consists of two single-path GANs in parallel with coupling mechanisms added between them. The generator produces a color-enhanced image, and the discriminator distinguishes real target images from the enhanced images the generator produces. Color grading is treated here as an image translation problem, i.e. translating an image of one style into an image of another style. Following the style transfer algorithm CycleGAN, the second loss function adopts a cycle consistency loss, which greatly reduces instability during GAN training. Unlike other tasks, paired training data for color enhancement is hard to obtain; with unsupervised GAN training, only a set of images with the desired color effect needs to be collected, greatly lowering the difficulty of data collection. During training, the batch size is set to 4, the learning rate of both the generator and the discriminator is set to 0.00001, and the same step-decay schedule is applied to the learning rate for unsupervised training.
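A compact sketch of the cycle consistency term follows; `G` and `F_back` stand for the two generators (plain-to-enhanced and back), the adversarial terms are omitted, and the weight `lam` is an assumed value:

```python
import torch.nn.functional as F

def cycle_consistency_loss(x, y, G, F_back, lam=10.0):
    # Forward cycle: x -> G(x) -> F_back(G(x)) should reconstruct x;
    # backward cycle: y -> F_back(y) -> G(F_back(y)) should reconstruct y.
    return lam * (F.l1_loss(F_back(G(x)), x) + F.l1_loss(G(F_back(y)), y))
```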
S104: training the convolutional neural network with the resolution enhancement model training data to obtain the resolution enhancement model;
Referring to fig. 4, the convolutional neural network that trains the resolution enhancement model uses ResNet blocks as basic modules. To reduce the total parameter count, a cascade mechanism is added between the ResNet modules: the output of each intermediate layer is cascaded forward to higher layers and finally converges at the last convolutional layer. The first and last layers of the network are mean-shift layers with 1×1 convolution kernels that perform de-meaning and its inverse operation, respectively, and need no parameter updates during training. The other convolutional layers use 3×3 kernels with ReLU as the activation function, and the upsampling layer uses PixelShuffle to enlarge the output feature map.
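The sketch below illustrates the cascading idea under assumed channel counts: each residual block's output is concatenated with all earlier feature maps and fused by a 1×1 convolution, and PixelShuffle performs the upsampling (the mean-shift layers are omitted for brevity):

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)              # residual connection

class CascadeSR(nn.Module):
    def __init__(self, ch=64, n_blocks=3, scale=2):
        super().__init__()
        self.head = nn.Conv2d(3, ch, 3, padding=1)
        self.blocks = nn.ModuleList([ResBlock(ch) for _ in range(n_blocks)])
        # 1x1 fusions merging all cascaded feature maps at each stage.
        self.fuse = nn.ModuleList(
            [nn.Conv2d(ch * (i + 2), ch, 1) for i in range(n_blocks)])
        self.up = nn.Sequential(
            nn.Conv2d(ch, ch * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),          # rearrange channels into space
            nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, x):
        feats = [self.head(x)]
        h = feats[0]
        for block, fuse in zip(self.blocks, self.fuse):
            feats.append(block(h))
            h = fuse(torch.cat(feats, dim=1))  # cascade all earlier outputs
        return self.up(h)
```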
The network's training is evaluated with a third loss function designed around the feature pyramid idea: the final expression is a weighted sum of terms over several intermediate layers and the final output layer. Shallow layers of a network carry more basic information such as textures and lines, while deep layers carry more semantic information; designing the loss with the feature pyramid idea lets fine details be rendered while super-resolving the image, so the mapping from low-resolution to high-resolution images is fully learned both globally and in detail.
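One way such a loss could be realized is sketched below, assuming each selected intermediate map has been projected to image space (e.g. by an auxiliary 1×1 convolution) so it can be compared against the target downsampled to the same resolution; the weights are illustrative:

```python
import torch.nn.functional as F

def pyramid_loss(intermediate_preds, final_pred, target, weights=(0.25, 0.5)):
    """Final-output term plus weighted terms on intermediate predictions."""
    loss = F.l1_loss(final_pred, target)
    for w, p in zip(weights, intermediate_preds):
        # Compare each intermediate prediction against the target
        # resized to the same spatial resolution.
        ref = F.interpolate(target, size=p.shape[-2:],
                            mode="bilinear", align_corners=False)
        loss = loss + w * F.l1_loss(p, ref)
    return loss
```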
The optimizer is again Adam; the batch size is set to 64 and the initial learning rate to 0.0001, again with the step-decay schedule, for supervised training.
S105: converting the sharpness enhancement model, the color enhancement model, and the resolution enhancement model into a preset format and integrating them into the video transcoding program in the order sharpness enhancement, color enhancement, resolution enhancement.
The model training algorithms in this embodiment are developed with the PyTorch framework; once the network structures are designed, the models are trained on NVIDIA Tesla P40 GPUs. Training parameters are adjusted continuously according to the training output until each algorithm converges below the desired error. The trained models are converted to the pb format of the TensorFlow framework so they can be integrated into the FFmpeg transcoding flow. The final usage flow is roughly: source video, decode into video frames, segment into scenes, apply the selected model combination for enhancement, then merge the frames and output the video.
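At a high level, that flow could be organized as in the sketch below; `decode_frames`, `split_scenes`, and `encode_video` are hypothetical helper names standing in for the FFmpeg decode/encode steps, not a real FFmpeg API:

```python
ORDER = ("sharpness", "color", "resolution")

def enhance_video(src_path, dst_path, options, models):
    frames = decode_frames(src_path)         # source video -> frames
    out_frames = []
    for scene in split_scenes(frames):       # per-scene model selection
        for name in ORDER:                   # fixed enhancement order
            if name in options:
                scene = [models[name](f) for f in scene]
        out_frames.extend(scene)
    encode_video(out_frames, dst_path)       # merge frames -> output video
```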
This embodiment also discloses a video image quality enhancement method that performs enhancement with the video image quality enhancement models constructed in the above embodiment. Referring to fig. 5, the method includes the following steps:
S201: in response to receiving a video image quality enhancement request, parsing the request to obtain the video frame to be enhanced and the enhancement processing options, where the options are at least one of a sharpness enhancement option, a color enhancement option, and a resolution enhancement option;
That is, a user sends a video image quality enhancement request according to the enhancement needed for particular video frames; the enhancement processing options in the request may be any one, any two, or all three of the sharpness, color, and resolution enhancement options.
S202: inputting the video frame to be enhanced into the video image quality enhancement model corresponding to the enhancement processing options to obtain the video frame after enhancement processing, where the sharpness enhancement option corresponds to the sharpness enhancement model, the color enhancement option to the color enhancement model, and the resolution enhancement option to the resolution enhancement model.
When the request includes more than one enhancement processing option, the video frame to be enhanced is passed through the corresponding models in the order sharpness enhancement, color enhancement, resolution enhancement to obtain the enhanced frame.
Taking a request that includes all three options as an example: the video frame to be enhanced is input into the sharpness enhancement model, the sharpness model's output is input into the color enhancement model, the color model's output is input into the resolution enhancement model, and the resolution model's output is the video frame after video image quality enhancement processing.
The video image quality enhancement method disclosed in this embodiment thus builds sharpness, color, and resolution enhancement models by machine learning, integrates them into a video transcoding program in the order sharpness enhancement, color enhancement, resolution enhancement, and applies sharpness and/or color and/or resolution enhancement within the transcoding program according to a user's request, satisfying different enhancement requests and improving the generality of the method.
Based on the method for constructing a video image quality enhancement model disclosed in the foregoing embodiment, this embodiment correspondingly discloses a device for constructing a video image quality enhancement model. Referring to fig. 6, the device includes:
a training data acquisition unit 401 for acquiring sharpness enhancement model training data, resolution enhancement model training data, and color enhancement model training data;
a sharpness enhancement model building unit 402 for training a self-coding network with the sharpness enhancement model training data to obtain a sharpness enhancement model, wherein the self-coding network comprises an encoder and a decoder, each composed of convolutional neural networks;
a color enhancement model building unit 403 for training a two-way generative adversarial network with the color enhancement model training data to obtain a color enhancement model;
a resolution enhancement model building unit 404 for training a convolutional neural network with the resolution enhancement model training data to obtain a resolution enhancement model;
and a model integration unit 405 for converting the sharpness enhancement model, the color enhancement model, and the resolution enhancement model into a preset format and integrating them into a video transcoding program in the order sharpness enhancement, color enhancement, resolution enhancement.
Optionally, the self-coding network further includes a noise estimation sub-network, obtained by training a convolutional neural network with noise estimation training data.
Optionally, the sharpness enhancement model building unit 402 is specifically configured to:
input the sharpness enhancement model training data into the noise estimation sub-network to obtain noise values of the training data;
input the training data and the noise values sequentially into the encoder and the decoder to obtain output data of the self-coding network;
input the output data of the self-coding network and the ground-truth reference image data of the training data into the first loss function to obtain its output value;
and obtain the sharpness enhancement model when the output value of the first loss function converges.
Optionally, the first loss function is a weighted sum of the least absolute deviation loss (L1 loss), the least squares error loss (L2 loss), and a smoothing loss (smooth loss).
Optionally, the color enhancement model building unit 403 is specifically configured to:
input the color enhancement model training data into the two-way generative adversarial network to obtain output data of the network;
input the output data of the network and the ground-truth reference image data of the color enhancement model training data into the second loss function, a cycle consistency loss function, to obtain its output value;
and obtain the color enhancement model when the output value of the second loss function converges.
Optionally, the convolutional neural network corresponding to the resolution enhancement model uses residual networks as basic modules, with a preset cascade mechanism added between the residual networks.
Optionally, the resolution enhancement model building unit 404 is specifically configured to:
input the resolution enhancement model training data into the convolutional neural network to obtain its output data;
input the output data of the network and the ground-truth reference image data of the resolution enhancement model training data into the third loss function, a feature-pyramid-based function, to obtain its output value;
and obtain the resolution enhancement model when the output value of the third loss function converges.
Based on the video image quality enhancement method disclosed in the foregoing embodiment, this embodiment correspondingly discloses a video image quality enhancement device. Referring to fig. 7, the device includes:
an enhancement request parsing unit 501 for, upon receiving a video image quality enhancement request, parsing the request to obtain the video frame to be enhanced and the enhancement processing options, where the options are at least one of a sharpness enhancement option, a color enhancement option, and a resolution enhancement option;
an enhancement processing unit 502 for inputting the video frame to be enhanced into the video image quality enhancement model corresponding to the enhancement processing options to obtain the enhanced video frame, where the models are pre-constructed by the method for constructing a video image quality enhancement model disclosed in the foregoing embodiment; the sharpness enhancement option corresponds to the sharpness enhancement model, the color enhancement option to the color enhancement model, and the resolution enhancement option to the resolution enhancement model.
Optionally, when the video image quality enhancement request includes more than one enhancement processing option, the enhancement processing unit is specifically configured to input the video frame to be enhanced into the corresponding models in the order sharpness enhancement, color enhancement, resolution enhancement, to obtain the enhanced video frame.
The device for constructing a video image quality enhancement model and the video image quality enhancement device disclosed in this embodiment build a sharpness enhancement model, a color enhancement model, and a resolution enhancement model by machine learning and integrate them into a video transcoding program in the order sharpness enhancement, color enhancement, resolution enhancement, so that the transcoding program can apply sharpness and/or color and/or resolution enhancement according to a user's video image quality enhancement request, satisfying different enhancement requests and improving the generality of the video image quality enhancement method.
The above embodiments may be combined in any manner, and features described in different embodiments in this specification may be replaced or combined with one another. The above description of the disclosed embodiments enables those skilled in the art to make or use the present application.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (9)
1. A method for constructing a video image quality enhancement model, characterized by comprising the following steps:
acquiring sharpness enhancement model training data, resolution enhancement model training data, and color enhancement model training data;
training a self-coding network with the sharpness enhancement model training data to obtain a sharpness enhancement model, wherein the self-coding network comprises an encoder and a decoder, each composed of a convolutional neural network;
training a two-way generative adversarial network with the color enhancement model training data to obtain a color enhancement model;
training the convolutional neural network with the resolution enhancement model training data to obtain a resolution enhancement model;
converting the sharpness enhancement model, the color enhancement model, and the resolution enhancement model into a preset format and integrating them into a video transcoding program in the order sharpness enhancement, color enhancement, resolution enhancement;
wherein the self-coding network further comprises a noise estimation sub-network obtained by training the convolutional neural network with noise estimation training data;
and wherein training the self-coding network with the sharpness enhancement model training data to obtain the sharpness enhancement model comprises:
inputting the sharpness enhancement model training data into the noise estimation sub-network to obtain noise values of the training data;
inputting the sharpness enhancement model training data and the noise values sequentially into the encoder and the decoder to obtain output data of the self-coding network;
inputting the output data of the self-coding network and the ground-truth reference image data of the sharpness enhancement model training data into a first loss function to obtain an output value of the first loss function;
and obtaining the sharpness enhancement model when the output value of the first loss function converges.
2. The method of claim 1, wherein the first loss function is a weighted sum of the least absolute deviation loss (L1 loss), the least squares error loss (L2 loss), and a smoothing loss (smooth loss).
3. The method of claim 1, wherein training the two-way generative adversarial network with the color enhancement model training data to obtain a color enhancement model comprises:
inputting the color enhancement model training data into the two-way generative adversarial network to obtain output data of the network;
inputting the output data of the network and the ground-truth reference image data of the color enhancement model training data into a second loss function to obtain an output value of the second loss function, wherein the second loss function is a cycle consistency loss function;
and obtaining the color enhancement model when the output value of the second loss function converges.
4. The method of claim 1, wherein the convolutional neural network corresponding to the resolution enhancement model uses residual networks as basic modules, with a preset cascade mechanism added between the residual networks.
5. The method of claim 4, wherein training the convolutional neural network with the resolution enhancement model training data to obtain a resolution enhancement model comprises:
inputting the resolution enhancement model training data into the convolutional neural network to obtain output data of the network;
inputting the output data of the network and the ground-truth reference image data of the resolution enhancement model training data into a third loss function to obtain an output value of the third loss function, wherein the third loss function is a feature-pyramid-based function;
and obtaining the resolution enhancement model when the output value of the third loss function converges.
6. A video image quality enhancement method, comprising:
in response to receiving a video image quality enhancement request, parsing the request to obtain a video frame to be enhanced and enhancement processing options, wherein the enhancement processing options are at least one of a sharpness enhancement option, a color enhancement option, and a resolution enhancement option;
inputting the video frame to be enhanced into the video image quality enhancement model corresponding to the enhancement processing options to obtain a video frame after video image quality enhancement processing, wherein the video image quality enhancement models are pre-constructed by the method for constructing a video image quality enhancement model according to any one of claims 1-5, the sharpness enhancement option corresponds to the sharpness enhancement model, the color enhancement option to the color enhancement model, and the resolution enhancement option to the resolution enhancement model.
7. The method according to claim 6, wherein when the video image quality enhancement request includes more than one enhancement processing option, inputting the video frame to be enhanced into the video image quality enhancement model corresponding to the enhancement processing options to obtain a video frame after video image quality enhancement processing comprises:
inputting the video frame to be enhanced into the video image quality enhancement models corresponding to the enhancement processing options in the order sharpness enhancement, color enhancement, resolution enhancement, to obtain a video frame after video image quality enhancement processing.
8. A device for constructing a video image quality enhancement model, characterized by comprising:
a training data acquisition unit for acquiring sharpness enhancement model training data, resolution enhancement model training data, and color enhancement model training data;
a sharpness enhancement model building unit for training a self-coding network with the sharpness enhancement model training data to obtain a sharpness enhancement model, wherein the self-coding network comprises an encoder and a decoder, each composed of convolutional neural networks;
a color enhancement model building unit for training a two-way generative adversarial network with the color enhancement model training data to obtain a color enhancement model;
a resolution enhancement model building unit for training a convolutional neural network with the resolution enhancement model training data to obtain a resolution enhancement model;
and a model integration unit for converting the sharpness enhancement model, the color enhancement model, and the resolution enhancement model into a preset format and integrating them into a video transcoding program in the order sharpness enhancement, color enhancement, resolution enhancement;
wherein the self-coding network further comprises a noise estimation sub-network obtained by training the convolutional neural network with noise estimation training data;
and wherein training the self-coding network with the sharpness enhancement model training data to obtain the sharpness enhancement model comprises:
inputting the sharpness enhancement model training data into the noise estimation sub-network to obtain noise values of the training data;
inputting the sharpness enhancement model training data and the noise values sequentially into the encoder and the decoder to obtain output data of the self-coding network;
inputting the output data of the self-coding network and the ground-truth reference image data of the sharpness enhancement model training data into a first loss function to obtain an output value of the first loss function;
and obtaining the sharpness enhancement model when the output value of the first loss function converges.
9. A video image quality enhancement apparatus, comprising:
an enhancement request parsing unit for, upon receiving a video image quality enhancement request, parsing the request to obtain a video frame to be enhanced and enhancement processing options, wherein the enhancement processing options are at least one of a sharpness enhancement option, a color enhancement option, and a resolution enhancement option;
and an enhancement processing unit for inputting the video frame to be enhanced into the video image quality enhancement model corresponding to the enhancement processing options to obtain a video frame after video image quality enhancement processing, wherein the video image quality enhancement models are pre-constructed by the method for constructing a video image quality enhancement model according to any one of claims 1-5, the sharpness enhancement option corresponds to the sharpness enhancement model, the color enhancement option to the color enhancement model, and the resolution enhancement option to the resolution enhancement model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011277818.XA CN114513684B (en) | 2020-11-16 | 2020-11-16 | Method for constructing video image quality enhancement model, video image quality enhancement method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011277818.XA CN114513684B (en) | 2020-11-16 | 2020-11-16 | Method for constructing video image quality enhancement model, video image quality enhancement method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114513684A CN114513684A (en) | 2022-05-17 |
CN114513684B (en) | 2024-05-28
Family
ID=81547015
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011277818.XA Active CN114513684B (en) | 2020-11-16 | 2020-11-16 | Method for constructing video image quality enhancement model, video image quality enhancement method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114513684B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108235058A (en) * | 2018-01-12 | 2018-06-29 | 广州华多网络科技有限公司 | Video quality processing method, storage medium and terminal |
CN110020684A (en) * | 2019-04-08 | 2019-07-16 | 西南石油大学 | A kind of image de-noising method based on residual error convolution autoencoder network |
CN110189268A (en) * | 2019-05-23 | 2019-08-30 | 西安电子科技大学 | Underwater picture color correcting method based on GAN network |
CN110263801A (en) * | 2019-03-08 | 2019-09-20 | 腾讯科技(深圳)有限公司 | Image processing model generation method and device, electronic equipment |
CN111882489A (en) * | 2020-05-15 | 2020-11-03 | 东北石油大学 | Super-resolution graph recovery method for simultaneously enhancing underwater images |
Also Published As
Publication number | Publication date |
---|---|
CN114513684A (en) | 2022-05-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |