CN113344786B - Video transcoding method, device, medium and equipment based on geometric generation model - Google Patents
- Publication number
- CN113344786B (application number CN202110652621.8A)
- Authority
- CN
- China
- Prior art keywords
- video
- geometric
- model
- image
- resolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/136—Segmentation; Edge detection involving thresholding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/20—Contour coding, e.g. using detection of edges
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention discloses a video transcoding method, device, medium and equipment based on a geometric generative model. The method comprises the following steps: constructing a geometric generative model and training it to obtain a super-resolution reconstruction model; acquiring an MPEG-4 video; decoding the MPEG-4 video and storing the decoded video as a sequence of still pictures; performing super-resolution magnification and reconstruction on each decoded frame with the super-resolution reconstruction model; and encoding the reconstructed images into a video in the H.265 format. The geometric generative model addresses the mode collapse problem of generative adversarial networks (GANs) from a geometric perspective and avoids adversarial learning, so that the generated images are richer and more realistic.
Description
Technical Field
The present invention relates to video transcoding, and in particular to a video transcoding method, apparatus, medium, and device based on a geometric generative model.
Background
With the spread of the internet, the network has become the main way people obtain information in daily life. Industries such as online video and live streaming have grown rapidly, viewers expect higher video quality, and demand for high-definition and even ultra-high-definition video keeps increasing. In recent years 4K and 8K video devices have become common in China, but there are not enough ultra-high-definition video resources on the network, so most such devices can only play ordinary high-definition video and cannot exploit their hardware. The shortage has two causes: ultra-high-definition video is expensive to produce, so its supply cannot keep up with public demand; and limited network transmission bandwidth makes it difficult to store and distribute. To address the production cost, existing standard-definition video can be reconstructed into ultra-high-definition video with video super-resolution technology; to address storage and transmission, a better video coding method is needed to compress the video and reduce the network bandwidth it occupies.
At present, most video on the network is compressed according to the MPEG and H.264 standards. With the rise of ultra-high-definition video, including 4K and 8K video, a higher compression ratio is needed to reduce storage consumption and transmission bandwidth. For this reason a video coding expert group formulated the newer video coding standard H.265 in 2012. Compared with H.264, H.265 achieves a higher compression ratio and can halve the network transmission bandwidth while keeping picture quality essentially unchanged; it also supports resolutions up to 8K. H.265 is therefore of great significance for the storage and transmission of ultra-high-definition video.
In the prior art, video transcoding is generally required to recover high-definition and ultra-high-definition video from lower-resolution video. Existing video transcoding techniques usually rely on interpolation, but traditional interpolation gives clearly inadequate results and transcodes inefficiently. Deep-learning-based single-image super-resolution (SISR) mainly uses a convolutional neural network (CNN) as the learning model: it learns the high-frequency information missing from the low-resolution image, such as texture detail, from a large amount of data, and performs end-to-end conversion from a low-resolution image to a high-resolution image. Compared with traditional interpolation, deep learning has great advantages and significantly improves evaluation metrics such as PSNR (peak signal-to-noise ratio) and SSIM (structural similarity). Super-resolution methods based on generative adversarial networks (GANs) can generate more realistic and sharper textures and other details, so GANs are widely used in image generation, commonly for super-resolution, image translation, and human pose generation.
Disclosure of Invention
The first objective of the present invention is to overcome the drawbacks and deficiencies of the prior art by providing a video transcoding method based on a geometric generative model, in which the geometric generative model addresses the mode collapse problem of generative adversarial networks from a geometric perspective and avoids adversarial learning, so that the generated images are richer and more realistic.
The second objective of the present invention is to provide a video transcoding apparatus based on geometric generative model.
A third object of the present invention is to provide a storage medium.
It is a fourth object of the invention to provide a computing device.
The first purpose of the invention is realized by the following technical scheme: a video transcoding method based on a geometric generation model comprises the following steps:
constructing a geometric generation model, and training to obtain a super-resolution reconstruction model;
acquiring an MPEG-4 video;
decoding the MPEG-4 video, and storing the decoded MPEG-4 video in a continuous static picture form;
performing super-resolution magnification and reconstruction, through the super-resolution reconstruction model, on each frame of picture obtained by decoding the MPEG-4 video;
and encoding the super-resolution reconstructed images into a video in the H.265 format.
Preferably, the constructed geometric generative model comprises an encoding part and a decoding part;
the coding part comprises two down-sampling layers which are connected in sequence and are respectively a first down-sampling layer and a second down-sampling layer;
the decoding part comprises a first up-sampling layer, a second up-sampling layer and a convolutional layer; the upsampling processing of the first upsampling layer and the second upsampling layer is completed by adopting sub-pixel convolution;
wherein:
the features input to the geometric generative model are fed to the first downsampling layer;
the features output by the first downsampling layer are used as the input of the second downsampling layer, and the output of the second downsampling layer is used as the output of the encoding part;
the features input to the decoding part pass through a channel attention residual block and are then input to the first upsampling layer;
the features output by the first downsampling layer and the features output by the first upsampling layer are concatenated along the channel dimension, passed through a channel attention residual block, and then used as the input of the second upsampling layer;
and the features output by the second upsampling layer and the features input to the geometric generative model are concatenated along the channel dimension and used as the input of the convolutional layer, whose output is the output of the geometric generative model.
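The sub-pixel convolution used by the two upsampling layers ends with a rearrangement step that trades channels for spatial resolution. The following is a minimal sketch of that rearrangement only (the convolution that produces the extra channels is omitted), assuming the common depth-to-space channel ordering; the function name `pixel_shuffle` and the nested-list feature layout are illustrative choices, not taken from the patent.

```python
def pixel_shuffle(feat, r):
    """Rearrange a (C*r*r) x H x W feature map into C x (H*r) x (W*r).

    feat is a nested list indexed as feat[channel][row][col].
    The channel ordering is an assumption (common depth-to-space layout).
    """
    c_in, h, w = len(feat), len(feat[0]), len(feat[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for y in range(h * r):
            for x in range(w * r):
                # Each output pixel is read from one of the r*r source channels.
                src = c * r * r + (y % r) * r + (x % r)
                out[c][y][x] = feat[src][y // r][x // r]
    return out


# Four 1x2 channels become one 2x4 channel.
feat = [[[0, 1]], [[10, 11]], [[20, 21]], [[30, 31]]]
up = pixel_shuffle(feat, 2)
```

Because every output pixel is written exactly once from a learned channel, this style of upsampling avoids the uneven kernel overlap that transposed convolutions can produce.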
Preferably, in the geometric generative model:
first, the encoding part maps the input data distribution $\nu$ to a hidden space $Z$ to obtain the feature distribution $\mu$ of the hidden space:
$$\mu = f_\theta : \Sigma \to Z$$
where $\Sigma$ denotes the submanifold on which the input data distribution $\nu$ is supported, $f_\theta$ denotes the encoding map, and $\theta$ is the parameter to be learned;
then, the optimal transport map $T: Z \to Z$ for the hidden space distribution $\mu$ is computed, i.e. the map that transforms the uniform distribution $\zeta$ into the hidden space distribution $\mu$:
$$T : Z \to Z, \quad \zeta \to \mu$$
the optimal transport map $T$ of the hidden space distribution is computed by convex optimization;
finally, the distribution obtained after the transformation $T$ is input into the decoding part, which generates the final high-resolution image.
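To illustrate how an optimal transport map can be obtained by convex optimization, here is a minimal sketch under strong simplifying assumptions: a one-dimensional hidden space, a uniform source on [0, 1] approximated by grid samples, and a target distribution given as a few weighted points. It fits the heights of a piecewise-linear Brenier potential by gradient descent on the convex semi-discrete transport energy. The function names, step size, iteration count, and toy data are all hypothetical and are not taken from the patent.

```python
def fit_brenier_heights(targets, weights, samples, lr=0.05, steps=500):
    """Gradient descent on the convex semi-discrete optimal transport energy.

    The potential is u(x) = max_i (x * targets[i] + heights[i]).
    The gradient w.r.t. heights[i] is (source mass of cell i) - weights[i].
    """
    n = len(targets)
    heights = [0.0] * n
    for _ in range(steps):
        mass = [0.0] * n
        for x in samples:
            # Each sample belongs to the cell whose affine piece is largest.
            k = max(range(n), key=lambda i: x * targets[i] + heights[i])
            mass[k] += 1.0 / len(samples)
        heights = [h - lr * (m - w) for h, m, w in zip(heights, mass, weights)]
    return heights


def transport(x, targets, heights):
    """Map a source point to its target: the gradient of the fitted potential."""
    k = max(range(len(targets)), key=lambda i: x * targets[i] + heights[i])
    return targets[k]


# Toy data (hypothetical): four target points with equal weight.
targets = [0.0, 0.3, 0.6, 0.9]
weights = [0.25, 0.25, 0.25, 0.25]
samples = [(k + 0.5) / 400 for k in range(400)]  # grid approximation of uniform [0, 1]

heights = fit_brenier_heights(targets, weights, samples)
mapped = [transport(x, targets, heights) for x in samples]
```

After fitting, each target point should receive roughly its prescribed share of the uniform source mass, and the map is monotone, as expected for optimal transport in one dimension.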
Further, the training process of the geometric generative model is as follows:
acquiring low-resolution images whose corresponding high-resolution images are known, and using them as training samples;
upsampling each low-resolution training image by bicubic interpolation to obtain a target-resolution image;
and taking the target-resolution image corresponding to the low-resolution image as the input of the geometric generative model, taking the high-resolution image corresponding to the low-resolution image as the label image, and training the geometric generative model to obtain the super-resolution reconstruction model.
Furthermore, in the training process of the geometric generative model, edge perception loss processing is performed, specifically:
firstly, performing edge detection on a label image to obtain an edge value of each position;
when calculating the training error, the edge position is given a higher weight.
Furthermore, when edge detection is performed on the label image, a Laplacian filter is used as the edge detection operator, and the final training loss function $L_{final}$ is obtained by balancing the Charbonnier loss against the edge perception loss:
$$L_{final} = L_{Charbonnier} + \lambda \, \| M * (I_{SR} - I_{HR}) \|$$
where $L_{Charbonnier}$ is the Charbonnier loss, a variant of the L1 loss, which in its standard form is:
$$L_{Charbonnier} = \sqrt{\| I_{SR} - I_{HR} \|^2 + \epsilon^2}$$
$\epsilon$ is a constant, and $\lambda$ is a scaling factor that balances the Charbonnier loss and the edge perception loss;
$\| M * (I_{SR} - I_{HR}) \|$ is the edge perception loss, $I_{SR}$ denotes the high-resolution image reconstructed by the geometric generative model, and $I_{HR}$ denotes the label image;
when edge detection is performed on the label image, the edge value of each position is obtained, and a threshold $\delta$ is then set to mark whether each position is an edge position: a position whose edge value is greater than the threshold is marked 1, otherwise 0:
$$M(i,j) = \begin{cases} 1, & L(i,j) > \delta \\ 0, & \text{otherwise} \end{cases}$$
in the formula, $L(i,j)$ is the edge value obtained by Laplacian edge detection at position $(i,j)$ of the label image, $M(i,j)$ is the edge position label at position $(i,j)$ of the label image, and $M$ is the matrix formed by the $M(i,j)$. The Laplacian edge value is computed as:
$$L(x,y) = f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4f(x,y)$$
wherein $(x, y)$ represents the current position in the image, $f(x, y)$ represents the gray value at the current position, and $f(x-1, y)$, $f(x+1, y)$, $f(x, y-1)$ and $f(x, y+1)$ represent the gray values at positions $(x-1, y)$, $(x+1, y)$, $(x, y-1)$ and $(x, y+1)$ respectively.
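The combined loss can be sketched in a few lines. This is an illustrative reading, not the patent's implementation: it assumes single-channel images stored as nested lists, a per-pixel Charbonnier term averaged over the image, and a mean absolute value for the norm of the masked difference (the source does not state which norm is used). The value 0.1 for the weighting factor follows the embodiment; `eps` and all function names are hypothetical.

```python
import math


def charbonnier(sr, hr, eps=1e-3):
    """Mean per-pixel Charbonnier loss, a smooth variant of the L1 loss."""
    values = [math.sqrt((a - b) ** 2 + eps ** 2)
              for row_sr, row_hr in zip(sr, hr)
              for a, b in zip(row_sr, row_hr)]
    return sum(values) / len(values)


def edge_loss(sr, hr, mask):
    """Mean absolute error restricted to edge positions (mask == 1).

    Using the mean absolute value as the norm is an assumption.
    """
    values = [m * abs(a - b)
              for row_sr, row_hr, row_m in zip(sr, hr, mask)
              for a, b, m in zip(row_sr, row_hr, row_m)]
    return sum(values) / len(values)


def final_loss(sr, hr, mask, lam=0.1, eps=1e-3):
    """L_final = L_Charbonnier + lam * edge perception loss."""
    return charbonnier(sr, hr, eps) + lam * edge_loss(sr, hr, mask)


# Toy example: the reconstruction is wrong at one edge pixel.
hr = [[0.0, 1.0], [0.0, 1.0]]
sr = [[0.0, 0.5], [0.0, 1.0]]
mask = [[0, 1], [0, 1]]
loss = final_loss(sr, hr, mask)
```

An error at a masked position contributes to both terms, so edge pixels are effectively weighted more heavily than flat regions.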
The second purpose of the invention is realized by the following technical scheme: a video transcoding device based on a geometric generative model, comprising:
the model building module is used for building a geometric generation model;
the model training module is used for training the constructed geometric generation model to obtain a super-resolution reconstruction model;
the acquisition module is used for acquiring an MPEG-4 video;
the video decoding module is used for decoding the MPEG-4 video and storing the decoded video in a continuous static picture form;
the video reconstruction module is used for carrying out super-resolution amplification reconstruction on each frame of picture obtained after decoding the MPEG-4 video through a super-resolution reconstruction model;
and the video coding module is used for coding the super-resolution amplified and reconstructed image into a video in an H.265 format.
The third purpose of the invention is realized by the following technical scheme: a storage medium storing a program which, when executed by a processor, implements the geometric generative model-based video transcoding method of embodiment 1.
The fourth purpose of the invention is realized by the following technical scheme: a computing device comprising a processor and a memory for storing a processor-executable program, wherein the processor, when executing the program stored in the memory, implements the video transcoding method based on a geometric generative model of embodiment 1.
Compared with the prior art, the invention has the following advantages and effects:
(1) In the video transcoding method based on a geometric generative model of the invention, a geometric generative model is first constructed and trained to obtain a super-resolution reconstruction model; an MPEG-4 video to be transcoded is acquired; the MPEG-4 video is decoded and stored as a sequence of still pictures; each decoded frame is magnified and reconstructed by the super-resolution reconstruction model; and the reconstructed images are encoded into a video in the H.265 format. Because the method performs image super-resolution reconstruction from a geometric perspective through the geometric generative model, it avoids the adversarial learning process of a generative adversarial network, which makes training more stable, addresses the mode collapse problem of generative adversarial networks, and yields richer and more realistic images. The method conveniently converts low-resolution MPEG-4 video into high-resolution H.265 video. H.265 achieves a higher compression ratio, can halve the network transmission bandwidth while keeping picture quality essentially unchanged, and supports resolutions up to 8K, which greatly benefits the storage and transmission of ultra-high-definition video.
(2) In the video transcoding method based on a geometric generative model, after the MPEG-4 video is decoded, the super-resolution reconstruction model magnifies and reconstructs each frame; the magnification factor can be four. Compared with images reconstructed by conventional interpolation, the deep-learning-based super-resolution reconstruction model achieves a better magnification result. Finally, the reconstructed information is supplemented into the coded image, which ensures that the image information shows no deviation after encoding and decoding.
(3) In the video transcoding method based on a geometric generative model, the geometric generative model comprises an encoding part and a decoding part. The encoding part comprises two sequentially connected downsampling layers, a first downsampling layer and a second downsampling layer; the decoding part comprises a first upsampling layer, a second upsampling layer and a convolutional layer, and both upsampling layers are implemented with sub-pixel convolution. After an image is input into the geometric generative model, its features are compressed by the two downsampling operations of the encoding part and then restored by the upsampling operations of the decoding part; before each upsampling, a channel attention residual block adjusts the feature map, which improves the feature extraction capability of the model. In addition, implementing the upsampling layers with sub-pixel convolution effectively reduces checkerboard artifacts.
(4) In the video transcoding method based on a geometric generative model, the features output by the encoding part undergo the hidden-feature spatial distribution transformation, which transforms the hidden feature vectors, and the transformed feature vectors are then input to the decoding part. This image super-resolution method based on a geometric generative model uses an Encoder-Decoder structure to realize the encoding and decoding maps between the data manifold and the hidden space, and computes the Brenier potential in the hidden space by a geometric method, namely convex optimization, to obtain the optimal transport map that maps the uniform distribution to the distribution of the data in the hidden space. The learning model is very simple, and the probability transformation part is theoretically transparent: the energy being optimized is convex, which guarantees the existence, uniqueness and numerical stability of the optimal transport map. Part of the network black box of the geometric generative model thus becomes transparent, giving a semi-transparent generative model, and the order of approximation from the discrete solution to the smooth solution is also theoretically guaranteed.
(5) In the video transcoding method based on a geometric generative model, to alleviate edge blurring, an edge perception loss is used when training the geometric generative model: edge detection is first performed on the label image (the real image) to obtain the edge information of each position, and the edge positions are then given higher weight when the training error is computed, so that the network pays more attention to edge features during training.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a diagram of the geometric generative model in the method of the present invention.
FIG. 3 is a schematic diagram of the transformation process of the spatial distribution of the geometric generative model in the method of the invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Example 1
The embodiment discloses a video transcoding method based on a geometric generation model, which is characterized in that a common standard definition video in an MPEG-4 format is reconstructed through video super-resolution and then is encoded into an ultra-high definition video in an H.265 standard so as to facilitate the transmission and storage of the ultra-high definition video. As shown in fig. 1, the method comprises the steps of:
S1. Construct a geometric generative model. In this embodiment, as shown in fig. 2, the geometric generative model is an improved Encoder-Decoder (encoding-decoding) structure comprising an encoding part and a decoding part, wherein:
the coding part comprises two down-sampling layers which are connected in sequence and are respectively a first down-sampling layer and a second down-sampling layer;
the decoding part comprises a first up-sampling layer, a second up-sampling layer and a convolutional layer; the up-sampling processing of the first up-sampling layer and the second up-sampling layer is completed by adopting sub-pixel convolution;
wherein:
(1) The features input to the geometric generative model are fed to the first downsampling layer. In this embodiment, the input of the geometric generative model is the target-resolution image obtained by upsampling the low-resolution image with bicubic interpolation.
(2) The output characteristic of the first downsampling layer is taken as the input of the second downsampling layer, and the output of the second downsampling layer is taken as the output of the coding part.
(3) The features input by the decoding part are input to the first up-sampling layer after feature map adjustment is carried out on the features by a channel attention residual block.
(4) The features output by the first down-sampling layer and the features output by the first up-sampling layer are spliced in channel dimensions, subjected to feature map adjustment by a channel attention residual block RACB, and then input to the second up-sampling layer.
(5) And the output characteristics of the second up-sampling layer and the input characteristics of the geometric generation model are spliced in the channel dimension and then serve as the input of the convolution layer, and the output of the convolution layer is the output of the geometric generation model.
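The data flow in items (1) to (5) can be checked by tracing tensor shapes through the network. The sketch below does only that, with no learned weights. The source does not give channel widths or sampling factors, so the base width of 64, the factor-2 down/upsampling, and the channel doubling in the second downsampling layer are hypothetical; channel attention residual blocks are assumed to preserve shape.

```python
def trace_shapes(in_ch, height, width, base_ch=64, out_ch=3):
    """Trace (channels, height, width) through the encoder-decoder.

    base_ch, the factor-2 sampling and the channel widths are assumptions.
    """
    if height % 4 or width % 4:
        raise ValueError("height and width must be divisible by 4")
    shapes = {"input": (in_ch, height, width)}
    # Encoding part: two downsampling layers in sequence.
    shapes["down1"] = (base_ch, height // 2, width // 2)
    shapes["down2"] = (base_ch * 2, height // 4, width // 4)
    # Decoding part: attention block (shape-preserving), then first upsampling.
    shapes["up1"] = (base_ch, height // 2, width // 2)
    # Skip connection: concatenate down1 and up1 along the channel dimension.
    shapes["concat1"] = (shapes["down1"][0] + shapes["up1"][0],
                         height // 2, width // 2)
    # Attention block (shape-preserving), then second upsampling.
    shapes["up2"] = (base_ch, height, width)
    # Skip connection: concatenate up2 with the model input.
    shapes["concat2"] = (shapes["up2"][0] + in_ch, height, width)
    # Final convolution produces the output image.
    shapes["output"] = (out_ch, height, width)
    return shapes


shapes = trace_shapes(in_ch=3, height=64, width=64)
```

Note that the output has the same spatial size as the input: the magnification itself happens in the bicubic upsampling before the model, and the network refines the upsampled image.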
As shown in fig. 3, in the geometric generative model of this embodiment, the encoding part maps the input data distribution $\nu$ to the hidden space $Z$ and obtains the feature distribution $\mu$:
$$\mu = f_\theta : \Sigma \to Z$$
where $\Sigma$ denotes the submanifold on which the input data distribution $\nu$ is supported, $f_\theta$ denotes the encoding map, and $\theta$ is the parameter to be learned. The optimal transport map $T: Z \to Z$ for the hidden space distribution $\mu$ is then computed, i.e. the uniform distribution $\zeta$ is transformed into the hidden space distribution $\mu$:
$$T : Z \to Z, \quad \zeta \to \mu$$
The optimal transport map $T$ of the hidden space distribution can be computed by convex optimization, a transparent geometric method. Manifold dimensionality reduction and probability transformation are thereby separated, and part of the black box is replaced by a transparent optimal transport model, giving a semi-transparent network model. Finally, the distribution obtained after the transformation $T$ is input into the decoding part to generate the final high-resolution image.
S2. Train the geometric generative model to obtain a super-resolution reconstruction model. The specific training process is as follows:
S21. Acquire low-resolution images whose corresponding high-resolution images are known, and use them as training samples.
In this embodiment, a training sample set is constructed using a data set DIV2K commonly used in the super resolution field, which contains 800 pairs of training images consisting of low-resolution and high-resolution images, and 100 pairs of images for verification.
S22. Upsample each low-resolution training image by bicubic interpolation to obtain a target-resolution image.
S23. Take the target-resolution image corresponding to the low-resolution image as the input of the geometric generative model, take the high-resolution image corresponding to the low-resolution image as the label image, and train the geometric generative model to obtain the super-resolution reconstruction model.
In the step, training is completed to obtain an encoding part and a decoding part of the Encoder-Decoder, wherein a hidden vector between the encoding part and the decoding part is hidden feature distribution of the image.
In this step, edge perception loss processing is performed during the training of the geometric generative model, specifically:
(1) First, edge detection is performed on the label image to obtain the edge value of each position. In this embodiment, a Laplacian filter is used as the edge detection operator. The Laplacian is a high-pass linear filter based on image derivatives; it measures the curvature of the image function through the second derivative. For an image, a larger first derivative of the pixel value indicates a sharper change in gray level, i.e. an image edge; at an edge, where the first derivative reaches its maximum, the second derivative is 0, and the Laplacian uses this property to detect edge positions. Since the image is two-dimensional, second-order partial derivatives are taken in both the x and y directions:
$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$$
For discrete digital images, the second-order difference form is commonly used:
$$\frac{\partial^2 f}{\partial x^2} = f(x+1,y) + f(x-1,y) - 2f(x,y)$$
$$\frac{\partial^2 f}{\partial y^2} = f(x,y+1) + f(x,y-1) - 2f(x,y)$$
Combining these gives the formula for the Laplacian:
$$\nabla^2 f(x,y) = f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4f(x,y)$$
That is, four times the gray value at the current position is subtracted from the sum of the gray values of its four neighbors, which can be written as a template matrix:
$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$
If the two diagonals are also considered, an 8-neighborhood template is obtained:
$$\begin{bmatrix} 1 & 1 & 1 \\ 1 & -8 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$
wherein $(x, y)$ represents the current position in the image, $f(x, y)$ represents the gray value at the current position, and $f(x-1, y)$, $f(x+1, y)$, $f(x, y-1)$ and $f(x, y+1)$ represent the gray values at positions $(x-1, y)$, $(x+1, y)$, $(x, y-1)$ and $(x, y+1)$ respectively.
In this embodiment, when edge detection is performed on the label image, the edge value of each position is obtained, and a threshold $\delta$ is then set to mark whether each position is an edge position: a position whose edge value is greater than the threshold is marked 1, otherwise 0, specifically:
$$M(i,j) = \begin{cases} 1, & L(i,j) > \delta \\ 0, & \text{otherwise} \end{cases}$$
in the formula, $L(i,j)$ is the edge value obtained by Laplacian edge detection at position $(i,j)$ of the label image, $M(i,j)$ is the edge position label at position $(i,j)$ of the label image, and the threshold $\delta$ may be 0.1.
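The Laplacian templates and the thresholding step can be sketched as follows. This is a minimal illustration assuming a single-channel image stored as a nested list with gray values in [0, 1]; border pixels are handled by replicating the nearest pixel, which is an assumption the source does not address. The comparison follows the text literally (edge value greater than the threshold), and the function and constant names are hypothetical.

```python
KERNEL_4 = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]  # 4-neighborhood template
KERNEL_8 = [[1, 1, 1], [1, -8, 1], [1, 1, 1]]  # 8-neighborhood template


def laplacian(img, kernel=KERNEL_4):
    """Apply a 3x3 Laplacian template; borders replicate the nearest pixel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index
                    acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc
    return out


def edge_mask(img, delta=0.1, kernel=KERNEL_4):
    """Mark positions whose Laplacian edge value exceeds delta with 1, else 0."""
    return [[1 if v > delta else 0 for v in row]
            for row in laplacian(img, kernel)]


# Toy label image: a vertical step edge between columns 1 and 2.
img = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]
mask = edge_mask(img)
```

With this literal comparison only the dark side of the step is marked, because the Laplacian response is negative on the bright side; thresholding the absolute value instead would mark both sides.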
(2) When the training error is calculated, compared with other positions in the label image, the edge positions are given higher weights, and the edge features are more concerned when the network is trained.
In this embodiment, the final training loss function $L_{final}$, obtained from the edge perception loss and the Charbonnier loss, is:
$$L_{final} = L_{Charbonnier} + \lambda \| M * (I_{SR} - I_{HR}) \|$$
where $L_{Charbonnier}$ is the Charbonnier loss, a variant of the L1 loss, whose standard form is:
$$L_{Charbonnier} = \sqrt{\| I_{SR} - I_{HR} \|^2 + \epsilon^2}$$
$\epsilon$ is a constant, and $\lambda$ is a weighting coefficient that balances the Charbonnier loss and the edge perception loss; it is generally 0.1.
$\| M * (I_{SR} - I_{HR}) \|$ is the edge perception loss, $I_{SR}$ represents the high-resolution image reconstructed by the geometric generative model, $I_{HR}$ represents the label image, and M is the matrix formed by M(i, j).
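As a rough illustration of how the combined loss could be evaluated, the sketch below works on images stored as nested lists. Several choices are assumptions rather than details from the text: the per-pixel Charbonnier penalty sqrt(d² + ε²), averaging over pixels, the L1 norm for the edge term, and the value ε = 1e-3. The function names are hypothetical.

```python
import math


def charbonnier(sr, hr, eps=1e-3):
    """Mean Charbonnier penalty sqrt(d^2 + eps^2) over all pixels (assumed reduction)."""
    diffs = [s - h for row_s, row_h in zip(sr, hr) for s, h in zip(row_s, row_h)]
    return sum(math.sqrt(d * d + eps * eps) for d in diffs) / len(diffs)


def edge_loss(sr, hr, mask):
    """Mean absolute error counted only where mask is 1 (L1 norm assumed)."""
    terms = [
        m * abs(s - h)
        for row_s, row_h, row_m in zip(sr, hr, mask)
        for s, h, m in zip(row_s, row_h, row_m)
    ]
    return sum(terms) / len(terms)


def final_loss(sr, hr, mask, lam=0.1, eps=1e-3):
    """L_final = L_Charbonnier + lambda * edge perception loss."""
    return charbonnier(sr, hr, eps) + lam * edge_loss(sr, hr, mask)


# One edge pixel is wrong by 1.0, one non-edge pixel is exact.
loss = final_loss([[1.0, 0.0]], [[0.0, 0.0]], [[1, 0]], lam=0.1, eps=0.0)
```

Because the mask multiplies the error before the norm is taken, errors at edge positions are counted in both terms, which is how edge positions end up with a higher effective weight.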
S3. Acquire the MPEG-4 video, decode it, and store the decoded video as a sequence of still pictures.
In MPEG-4 video coding and decoding, the encoding process applies the DCT and then quantization to each image block in turn, and the decoding process applies inverse quantization followed by the inverse DCT. Finally, the reconstructed information is added back to the coded image, which ensures that the image information does not deviate between encoding and decoding.
In this step, the MPEG-4 video can be decoded with the multimedia framework ffmpeg, using the following command:
ffmpeg -i xx.y4m -vsync 0 xx%3d.bmp -y
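If the decoding step is driven from a script, the same ffmpeg invocation can be assembled as an argument list. The sketch below is only an illustration: the helper names `build_decode_cmd` and `decode_to_frames` are hypothetical, and the file names are the placeholders used in the text.

```python
import subprocess


def build_decode_cmd(src="xx.y4m", pattern="xx%3d.bmp"):
    """Return the ffmpeg argument list that writes every frame of src as a BMP file."""
    # -vsync 0 passes frames through without duplicating or dropping any;
    # -y overwrites existing output files without asking.
    return ["ffmpeg", "-i", src, "-vsync", "0", pattern, "-y"]


def decode_to_frames(src="xx.y4m", pattern="xx%3d.bmp"):
    """Run ffmpeg; requires ffmpeg on PATH (not exercised in this sketch)."""
    # check=True raises CalledProcessError if ffmpeg exits with a non-zero status.
    subprocess.run(build_decode_cmd(src, pattern), check=True)


cmd = build_decode_cmd()
```

Passing the arguments as a list avoids shell quoting problems with the `%3d` frame-number pattern.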
S4. Perform super-resolution amplification reconstruction, through the super-resolution reconstruction model, on each frame of picture obtained by decoding the MPEG-4 video.
In this embodiment, the super-resolution reconstruction model obtained in step S1 enlarges each frame of picture by a factor of four. Before a picture is input into the super-resolution reconstruction model, it is first up-sampled by Bicubic interpolation to the target resolution, and the resulting image is then input into the model.
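To make the Bicubic pre-upsampling step concrete, here is a one-dimensional sketch of cubic convolution interpolation by a factor of four. It is an illustration only: the kernel parameter a = -0.5, the pixel-center alignment, and the edge clamping are assumptions, and a real pipeline would normally call an image library's bicubic resize, which applies the same kernel along rows and then columns.

```python
import math


def cubic_weight(t, a=-0.5):
    """Cubic convolution kernel; a = -0.5 is an assumed, commonly used value."""
    t = abs(t)
    if t <= 1:
        return (a + 2) * t**3 - (a + 3) * t**2 + 1
    if t < 2:
        return a * t**3 - 5 * a * t**2 + 8 * a * t - 4 * a
    return 0.0


def upsample_1d(signal, scale=4):
    """Upsample a 1-D list by `scale` using four-tap cubic convolution."""
    n = len(signal)
    out = []
    for i in range(n * scale):
        # Map the centre of output sample i back into input coordinates.
        src = (i + 0.5) / scale - 0.5
        base = math.floor(src)
        value = 0.0
        for k in range(base - 1, base + 3):
            # Clamp the tap index so samples beyond the ends reuse the border value.
            sample = signal[min(max(k, 0), n - 1)]
            value += sample * cubic_weight(src - k)
        out.append(value)
    return out


upsampled = upsample_1d([3.0, 3.0, 3.0, 3.0, 3.0], scale=4)
```

The four kernel weights sum to one for any fractional offset, so a constant signal stays constant after upsampling; the model then only has to restore detail on top of this smooth estimate.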
S5. Encode the images obtained by super-resolution amplification reconstruction into a video in H.265 format. Because the enlarged, reconstructed pictures have a high resolution and are unsuitable for storage and transmission, this embodiment compresses them with H.265 coding to reduce redundant information. In this step, the encoding can be performed with the multimedia framework ffmpeg, using the following command:
ffmpeg -i xx%3d.bmp -pix_fmt yuv420p -vsync 0 xx.y4m -y -vcodec libx265
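A scripted variant of the encoding step might look like the sketch below. It deliberately differs from the command in the text in two ways, both of which are assumptions of this sketch: `-vcodec libx265` is placed before the output file, because ffmpeg applies per-output options to the output that follows them, and the output is written to a hypothetical `out.mp4`, a container that can carry H.265 streams. The helper name `build_encode_cmd` is also hypothetical.

```python
def build_encode_cmd(pattern="xx%3d.bmp", dst="out.mp4"):
    """Return an ffmpeg argument list that encodes numbered BMP frames with libx265.

    The option order and the .mp4 output name are assumptions of this sketch,
    not the command given in the source text.
    """
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file without asking
        "-i", pattern,           # numbered input frames, e.g. xx001.bmp
        "-pix_fmt", "yuv420p",   # 4:2:0 chroma subsampling
        "-vsync", "0",           # pass frames through unchanged
        "-vcodec", "libx265",    # H.265 encoder, placed before the output it applies to
        dst,
    ]


encode_cmd = build_encode_cmd()
```

The list can be passed to `subprocess.run(encode_cmd, check=True)` on a machine where ffmpeg was built with libx265 support.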
Those skilled in the art will appreciate that all or part of the steps of the method in this embodiment may be implemented by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium. It should be noted that although the method operations of embodiment 1 are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that order, or that all of the illustrated operations must be performed, to achieve the desired results. Rather, the order of the depicted steps may change, some steps may be performed concurrently, some steps may be omitted, multiple steps may be combined into one, and/or one step may be split into several.
Example 2
The embodiment discloses a video transcoding device based on a geometric generation model, which comprises a model construction module, a model training module, an acquisition module, a video decoding module, a video reconstruction module and a video coding module, wherein the functions of the modules are as follows:
The model construction module is used to construct the geometric generative model. In this embodiment, the geometric generative model is shown in fig. 2, and its specific structure is described in embodiment 1.
The model training module is used to train the constructed geometric generative model to obtain the super-resolution reconstruction model.
The acquisition module is used to acquire the MPEG-4 video.
The video decoding module is used to decode the MPEG-4 video and store the decoded video as a sequence of still pictures.
The video reconstruction module is used to perform super-resolution amplification reconstruction, through the super-resolution reconstruction model, on each frame of picture obtained by decoding the MPEG-4 video.
The video coding module is used to encode the images obtained by super-resolution amplification reconstruction into a video in H.265 format.
For specific implementation of each module in this embodiment, reference may be made to embodiment 1, and details are not described here. It should be noted that, the apparatus provided in this embodiment is only illustrated by dividing the functional modules, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure is divided into different functional modules to complete all or part of the functions described above.
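One possible way to picture the module division is as a small container of interchangeable callables. The sketch below is hypothetical throughout: the class name `TranscodingDevice`, the field names, and the stub functions are illustrative, and the model construction and training modules are assumed to have already produced the `reconstruct` callable.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class TranscodingDevice:
    acquire: Callable[[], Any]             # acquisition module: returns the input video
    decode: Callable[[Any], List[Any]]     # video decoding module: video -> frames
    reconstruct: Callable[[Any], Any]      # video reconstruction module: frame -> upscaled frame
    encode: Callable[[List[Any]], Any]     # video coding module: frames -> encoded video

    def run(self) -> Any:
        """Chain the modules in the order described for the device."""
        video = self.acquire()
        frames = self.decode(video)
        upscaled = [self.reconstruct(frame) for frame in frames]
        return self.encode(upscaled)


# Stub modules that operate on strings instead of real video data.
device = TranscodingDevice(
    acquire=lambda: "abc",
    decode=list,
    reconstruct=lambda frame: frame * 4,
    encode="".join,
)
result = device.run()
```

Keeping each module behind a plain callable reflects the remark above that the functions may be redistributed among modules as needed: any stage can be swapped without touching the others.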
Example 3
The present embodiment discloses a storage medium storing a program, which when executed by a processor, implements the video transcoding method based on the geometric generative model described in embodiment 1, as follows:
constructing a geometric generation model, and training to obtain a super-resolution reconstruction model;
acquiring an MPEG-4 video;
decoding the MPEG-4 video, and storing the decoded MPEG-4 video in a continuous static picture form;
performing super-resolution amplification reconstruction, through the super-resolution reconstruction model, on each frame of picture obtained by decoding the MPEG-4 video;
encoding the images obtained by super-resolution amplification reconstruction into a video in H.265 format.
In this embodiment, specific implementation of each process may be referred to in embodiment 1, which is not described herein again.
In this embodiment, the storage medium may be a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a USB flash drive, a removable hard disk, or another medium.
Example 4
The embodiment discloses a computing device, which includes a processor and a memory for storing a processor-executable program, and is characterized in that when the processor executes the program stored in the memory, the geometric model-based video transcoding method described in embodiment 1 is implemented as follows:
constructing a geometric generation model, and training to obtain a super-resolution reconstruction model;
acquiring an MPEG-4 video;
decoding the MPEG-4 video, and storing the decoded MPEG-4 video in a continuous static picture form;
performing super-resolution amplification reconstruction, through the super-resolution reconstruction model, on each frame of picture obtained by decoding the MPEG-4 video;
encoding the images obtained by super-resolution amplification reconstruction into a video in H.265 format.
In this embodiment, the specific implementation of each process described above may refer to embodiment 1, which is not described in detail herein.
In this embodiment, the computing device may be a desktop computer, a notebook computer, a PDA handheld terminal, a tablet computer, or other terminal devices.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
Claims (8)
1. A video transcoding method based on a geometric generative model is characterized by comprising the following steps:
constructing a geometric generation model, and training to obtain a super-resolution reconstruction model;
the constructed geometric generative model comprises an encoding part and a decoding part;
the encoding part comprises two sequentially connected down-sampling layers, namely a first down-sampling layer and a second down-sampling layer;
the decoding part comprises a first up-sampling layer, a second up-sampling layer and a convolution layer; the upsampling processing of the first upsampling layer and the second upsampling layer is completed by adopting sub-pixel convolution;
wherein:
the features input to the geometric generative model are fed into the first down-sampling layer;
the features output by the first down-sampling layer serve as the input of the second down-sampling layer, and the output of the second down-sampling layer serves as the output of the encoding part;
the features input to the decoding part pass through a channel attention residual block and are then fed into the first up-sampling layer;
the features output by the first down-sampling layer and the features output by the first up-sampling layer are concatenated along the channel dimension, pass through a channel attention residual block, and then serve as the input of the second up-sampling layer;
the features output by the second up-sampling layer and the features input to the geometric generative model are concatenated along the channel dimension and then serve as the input of the convolution layer, and the output of the convolution layer is the output of the geometric generative model;
in the geometry generation model:
first, the encoding part maps the input data distribution ν to the hidden space Z to obtain the feature distribution μ of the hidden space:
$$\mu = f_\theta : \Sigma \to Z$$
where Σ denotes a submanifold of the input data distribution ν, $f_\theta$ denotes the encoding map, and θ is the parameter to be learned;
then the optimal transport mapping T: Z → Z of the hidden space distribution μ is calculated, which transforms the uniform distribution ζ into the hidden space distribution μ:
$$T : Z \to Z, \quad \zeta \mapsto \mu$$
the optimal transport mapping T of the hidden space distribution is calculated by convex optimization;
finally, the distribution obtained after the transformation T is input into the decoding part, which generates the final high-resolution image;
acquiring an MPEG-4 video;
decoding the MPEG-4 video, and storing the decoded MPEG-4 video in a continuous static picture form;
performing super-resolution amplification reconstruction, through the super-resolution reconstruction model, on each frame of picture obtained by decoding the MPEG-4 video;
encoding the images obtained by super-resolution amplification reconstruction into a video in H.265 format.
2. The video transcoding method based on a geometric generative model according to claim 1, wherein the training process of the geometric generative model is as follows:
acquiring low-resolution images whose corresponding high-resolution images are known, and using them as training samples;
performing upsampling processing on a low-resolution image serving as a training sample through Bicubic to obtain a target resolution image;
and taking the target resolution image corresponding to the low-resolution image as the input of the geometric generation model, taking the high-resolution image corresponding to the low-resolution image as the label image, and training the geometric generation model to obtain the super-resolution reconstruction model.
3. The video transcoding method based on the geometric generative model as claimed in claim 2, wherein during the training process of the geometric generative model, edge perception loss processing is performed, specifically:
firstly, performing edge detection on a label image to obtain an edge value of each position;
when calculating the training error, the edge position is given a higher weight.
4. The video transcoding method based on a geometric generative model according to claim 3, wherein, when edge detection is performed on the label image, a Laplacian filter is used as the edge detection operator, and the final training loss function $L_{final}$, obtained from the edge perception loss and the Charbonnier loss, is:
$$L_{final} = L_{Charbonnier} + \lambda \| M * (I_{SR} - I_{HR}) \|$$
where $L_{Charbonnier}$ is the Charbonnier loss, a variant of the L1 loss, whose standard form is:
$$L_{Charbonnier} = \sqrt{\| I_{SR} - I_{HR} \|^2 + \epsilon^2}$$
$\epsilon$ is a constant, and $\lambda$ is a scaling coefficient that balances the Charbonnier loss and the edge perception loss;
$\| M * (I_{SR} - I_{HR}) \|$ is the edge perception loss, $I_{SR}$ represents the high-resolution image reconstructed by the geometric generative model, and $I_{HR}$ represents the label image;
after edge detection on the label image has produced an edge value for each position, a threshold δ is set to mark whether each position is an edge position; positions whose edge value is greater than the threshold are marked 1, and all others are marked 0:
$$M(i,j) = \begin{cases} 1, & L(i,j) > \delta \\ 0, & \text{otherwise} \end{cases}$$
where L(i, j) is the edge value obtained by Laplacian edge detection at position (i, j) of the label image, M(i, j) is the edge position label at position (i, j) of the label image, and M is the matrix formed by M(i, j).
5. The video transcoding method based on a geometric generative model according to claim 4, wherein the formula of the Laplacian filter $\nabla^2 f$ is:
$$\nabla^2 f = f(x-1,y) + f(x+1,y) + f(x,y-1) + f(x,y+1) - 4f(x,y)$$
wherein (x, y) represents the current position of the image, f (x, y) represents the gray-scale value at the current position (x, y) of the image, f (x-1, y) represents the gray-scale value at the position (x-1, y) of the image, f (x +1, y) represents the gray-scale value at the position (x +1, y) of the image, f (x, y-1) represents the gray-scale value at the position (x, y-1) of the image, and f (x, y + 1) represents the gray-scale value at the position (x, y + 1) of the image.
6. A video transcoding apparatus based on geometric generative model, comprising:
the model building module is used for building a geometric generation model;
the constructed geometric generative model comprises an encoding part and a decoding part;
the encoding part comprises two sequentially connected down-sampling layers, namely a first down-sampling layer and a second down-sampling layer;
the decoding part comprises a first up-sampling layer, a second up-sampling layer and a convolution layer; the up-sampling processing of the first up-sampling layer and the second up-sampling layer is completed by adopting sub-pixel convolution;
wherein:
the features input to the geometric generative model are fed into the first down-sampling layer;
the features output by the first down-sampling layer serve as the input of the second down-sampling layer, and the output of the second down-sampling layer serves as the output of the encoding part;
the features input to the decoding part pass through a channel attention residual block and are then fed into the first up-sampling layer;
the features output by the first down-sampling layer and the features output by the first up-sampling layer are concatenated along the channel dimension, pass through a channel attention residual block, and then serve as the input of the second up-sampling layer;
the features output by the second up-sampling layer and the features input to the geometric generative model are concatenated along the channel dimension and then serve as the input of the convolution layer, and the output of the convolution layer is the output of the geometric generative model;
in the geometry generation model:
first, the encoding part maps the input data distribution ν to the hidden space Z to obtain the feature distribution μ of the hidden space:
$$\mu = f_\theta : \Sigma \to Z$$
where Σ denotes a submanifold of the input data distribution ν, $f_\theta$ denotes the encoding map, and θ is the parameter to be learned;
then the optimal transport mapping T: Z → Z of the hidden space distribution μ is calculated, which transforms the uniform distribution ζ into the hidden space distribution μ:
$$T : Z \to Z, \quad \zeta \mapsto \mu$$
the optimal transport mapping T of the hidden space distribution is calculated by convex optimization;
finally, the distribution obtained after the transformation T is input into the decoding part, which generates the final high-resolution image;
the model training module is used for training the constructed geometric generation model to obtain a super-resolution reconstruction model;
the acquisition module is used for acquiring MPEG-4 video;
the video decoding module is used for decoding the MPEG-4 video and storing the decoded video in a continuous static picture form;
the video reconstruction module is used for carrying out super-resolution amplification reconstruction on each frame of picture obtained after decoding the MPEG-4 video through a super-resolution reconstruction model;
and the video coding module is used for coding the super-resolution amplified and reconstructed image into a video in an H.265 format.
7. A storage medium storing a program, wherein the program, when executed by a processor, implements the method for transcoding a video according to any one of claims 1 to 5.
8. A computing device comprising a processor and a memory for storing a processor-executable program, wherein the processor, when executing the program stored in the memory, implements the video transcoding method based on a geometric generative model according to any one of claims 1 to 5.
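The sub-pixel convolution named in claims 1 and 6 is commonly realized as an ordinary convolution that produces C·r² channels, followed by a rearrangement of those channels into an image that is r times larger in each direction. The sketch below shows only that rearrangement step on nested lists. The function name `pixel_shuffle`, the channel ordering, and the factor r = 2 per up-sampling layer are assumptions for illustration; the claims do not state them.

```python
def pixel_shuffle(features, r):
    """Rearrange a (C*r*r, H, W) nested list into a (C, H*r, W*r) nested list."""
    channels = len(features) // (r * r)
    h, w = len(features[0]), len(features[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(channels)]
    for c in range(channels):
        for i in range(r):
            for j in range(r):
                # Input channel c*r*r + i*r + j fills sub-pixel offset (i, j).
                src = features[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out


# Four 1 x 1 feature maps become a single 2 x 2 map when r = 2.
maps = [[[1]], [[2]], [[3]], [[4]]]
shuffled = pixel_shuffle(maps, 2)
```

Because the enlargement comes from reordering learned channels rather than from a fixed interpolation kernel, each up-sampling layer can learn its own up-scaling filters.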
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110652621.8A CN113344786B (en) | 2021-06-11 | 2021-06-11 | Video transcoding method, device, medium and equipment based on geometric generation model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113344786A (en) | 2021-09-03 |
CN113344786B (en) | 2023-02-14 |
Family
ID=77476713
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230252605A1 (en) * | 2022-02-10 | 2023-08-10 | Lemon Inc. | Method and system for a high-frequency attention network for efficient single image super-resolution |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109862370A (en) * | 2017-11-30 | 2019-06-07 | 北京大学 | Video super-resolution processing method and processing device |
CN111970513A (en) * | 2020-08-14 | 2020-11-20 | 成都数字天空科技有限公司 | Image processing method and device, electronic equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10944996B2 (en) * | 2019-08-19 | 2021-03-09 | Intel Corporation | Visual quality optimized video compression |
Non-Patent Citations (3)
Title |
---|
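Closed-loop Matters: Dual Regression Networks for Single Image Super-Resolution; Yong Guo et al.; Conference on Computer Vision and Pattern Recognition; 20200613; p. 5410 and Fig. 3 * |
Super-resolution reconstruction of vehicle-mounted images based on weight quantization and information compression; Xu Dezhi et al.; Journal of Computer Applications; 20191210; pp. 3644-3649 * |
Blind restoration of super-resolution images based on joint interpolation-restoration; Xie Songhua et al.; Journal of Computer Applications; 20100228; p. 341 * |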
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||