CN110324664A - Neural-network-based video frame interpolation method and training method for its model - Google Patents
- Publication number
- CN110324664A (application CN201910612434.XA)
- Authority
- CN
- China
- Prior art keywords
- frame
- training
- video
- network
- reference frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234345—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
Abstract
The present invention provides a neural-network-based video frame interpolation method and a training method for its model. After a current group of training reference frames is determined from a preset training set, the training reference frames are input into a preset initial model; a feature extraction network generates initial feature maps of the training reference frames at a preset number of levels; a feature fusion network fuses the initial feature maps of the preset number of levels into a fused feature map; the fused feature map is then input into an output network, which outputs a training supplementary video frame lying between the first training frame and the second training frame; a loss value of the training supplementary video frame is determined by a preset prediction loss function; the next group of training reference frames is then input to continue training the initial model until its parameters converge, at which point training ends and the video frame interpolation model is obtained. Through the feature extraction and feature fusion processes, the present invention obtains comprehensive feature information of the reference frames, yielding supplementary video frames with a better interpolation effect and thereby improving the viewing experience.
Description
Technical field
The present invention relates to the technical field of image processing, and in particular to a neural-network-based video frame interpolation method and a training method for its model.
Background art
In the related art, frame interpolation for video is generally performed with a motion compensation method or an optical-flow-based method. When interpolation is performed by motion compensation, the reference frame image is divided into a static part and a moving part, and the motion vector of an object is estimated from the moving part so as to determine the image data of the frame to be interpolated; however, when an object moves quickly between two video frames, the interpolation result is poor. When interpolation is performed by an optical-flow-based method, brightness is assumed to be constant between adjacent frames, and the correspondence between the previous frame and the current frame is found from the temporal variation of pixels in the image sequence and the correlation between adjacent frames, so as to determine the frame to be interpolated; when an abrupt brightness change occurs between two video frames, the interpolation result is poor. Because the above approaches consider only partial information of the reference frames during interpolation, the interpolation effect is poor and the viewing experience suffers.
Summary of the invention
In view of this, the object of the present invention is to provide a neural-network-based video frame interpolation method and a training method for its model, so as to improve the interpolation effect.
In a first aspect, an embodiment of the present invention provides a training method for a neural-network-based video frame interpolation model, comprising: determining a current group of training reference frames from a preset training set, the training reference frames comprising a first training frame and a second training frame; inputting the training reference frames into a preset initial model, the initial model comprising a feature extraction network, a feature fusion network and an output network; generating, through the feature extraction network, initial feature maps of the training reference frames at a preset number of levels; fusing, through the feature fusion network, the initial feature maps of the preset number of levels into a fused feature map; inputting the fused feature map into the output network and outputting a training supplementary video frame between the first training frame and the second training frame; determining a loss value of the training supplementary video frame by a preset prediction loss function; and training the initial model according to the loss value until the parameters of the initial model converge, so as to obtain the video frame interpolation model.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, wherein the feature extraction network comprises multiple groups of first convolutional networks connected in sequence, each group of first convolutional networks comprising a convolutional layer and an average pooling layer connected to each other.
With reference to the first possible implementation of the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, wherein the initial feature maps span multiple levels whose scales differ; the step of fusing the initial feature maps of the preset number of levels into a fused feature map through the feature fusion network comprises: arranging the multi-level initial feature maps in order of scale, the initial feature map of the top level having the smallest scale and the initial feature map of the bottom level the largest; taking the initial feature map of the top level as the fused feature map of the top level; for each level other than the top level, merging the initial feature map of the current level with the fused feature map of the level above it to obtain the fused feature map of the current level; and taking the fused feature map of the lowest level as the final fused feature map.
With reference to the second possible implementation of the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, wherein the feature fusion network comprises multiple groups of second convolutional networks connected in sequence, each group of second convolutional networks comprising a bilinear interpolation layer and a convolutional layer connected to each other; the step of merging the initial feature map of the current level with the fused feature map of the level above it to obtain the fused feature map of the current level comprises: interpolating, through the bilinear interpolation layer, the fused feature map of the level above the current level to obtain a fused feature map whose size matches the initial feature map of the current level; and convolving, through the convolutional layer, the initial feature map of the current level with the interpolated fused feature map of the level above to obtain the fused feature map of the current level.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, wherein the output network comprises a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer and a feature synthesis layer; the first, second, third and fourth convolutional layers are each connected to the feature fusion network, and each is also connected to the feature synthesis layer; the step of inputting the fused feature map into the output network and outputting a training supplementary video frame between the first training frame and the second training frame comprises: performing a first convolution operation on the feature data corresponding to the first training frame in the fused feature map through the first convolutional layer to output a first vertical feature map; performing a second convolution operation on the feature data corresponding to the first training frame in the fused feature map through the second convolutional layer to output a first horizontal feature map; performing a third convolution operation on the feature data corresponding to the second training frame in the fused feature map through the third convolutional layer to output a second vertical feature map; performing a fourth convolution operation on the feature data corresponding to the second training frame in the fused feature map through the fourth convolutional layer to output a second horizontal feature map; and superposing, through the feature synthesis layer, the first vertical feature map, the first horizontal feature map, the second vertical feature map and the second horizontal feature map to obtain the training supplementary video frame.
In a second aspect, an embodiment of the present invention further provides a neural-network-based video frame interpolation method, comprising: obtaining a first reference frame and a second reference frame of a video to be interpolated; inputting the first reference frame and the second reference frame into a pre-established video frame interpolation model to generate a supplementary video frame, the model having been trained by the above training method for a neural-network-based video frame interpolation model; and inserting the supplementary video frame between the first reference frame and the second reference frame.
In a third aspect, an embodiment of the present invention further provides a training apparatus for a neural-network-based video frame interpolation model, comprising: a training reference frame determining module, configured to determine a current group of training reference frames from a preset training set, the training reference frames comprising a first training frame and a second training frame; a training reference frame input module, configured to input the training reference frames into a preset initial model comprising a feature extraction network, a feature fusion network and an output network; a feature extraction module, configured to generate, through the feature extraction network, initial feature maps of the training reference frames at a preset number of levels; a feature fusion module, configured to fuse, through the feature fusion network, the initial feature maps of the preset number of levels into a fused feature map; a supplementary frame determining module, configured to input the fused feature map into the output network and output a training supplementary video frame between the first training frame and the second training frame; a loss value obtaining module, configured to determine the loss value of the training supplementary video frame by a preset prediction loss function; and a training module, configured to train the initial model according to the loss value until the parameters of the initial model converge, so as to obtain the video frame interpolation model.
In a fourth aspect, an embodiment of the present invention further provides a neural-network-based video frame interpolation apparatus, comprising: a reference frame obtaining module, configured to obtain a first reference frame and a second reference frame of a video to be interpolated; a supplementary frame generating module, configured to input the first reference frame and the second reference frame into a pre-established video frame interpolation model to generate a supplementary video frame, the model having been trained by the above training method for a neural-network-based video frame interpolation model; and a supplementary frame inserting module, configured to insert the supplementary video frame between the first reference frame and the second reference frame.
In a fifth aspect, an embodiment of the present invention further provides a server comprising a processor and a memory, the memory storing machine-executable instructions executable by the processor; the processor executes the machine-executable instructions to implement the steps of the above training method for a neural-network-based video frame interpolation model or of the above neural-network-based video frame interpolation method.
In a sixth aspect, an embodiment of the present invention further provides a machine-readable storage medium storing machine-executable instructions which, when called and executed by a processor, cause the processor to implement the steps of the above training method for a neural-network-based video frame interpolation model or of the above neural-network-based video frame interpolation method.
Embodiments of the present invention bring the following beneficial effects.
Embodiments of the present invention provide a neural-network-based video frame interpolation method, a training method for its model, an apparatus and a server. After a current group of training reference frames is determined from a preset training set, the training reference frames are input into a preset initial model; the feature extraction network generates initial feature maps of the training reference frames at a preset number of levels; the feature fusion network fuses these initial feature maps into a fused feature map; the fused feature map is then input into the output network, which outputs a training supplementary video frame between the first training frame and the second training frame; a loss value of the training supplementary video frame is determined by a preset prediction loss function; the next group of training reference frames is then input to continue training the initial model until its parameters converge, at which point training ends and the video frame interpolation model is obtained. In this approach, the feature extraction network and the feature fusion network capture relatively rich and comprehensive feature information of the reference frames, so that the trained video frame interpolation model can produce supplementary video frames with a better interpolation effect, thereby improving the viewing experience.
Other features and advantages of the present invention will be set forth in the following description, or may be deduced from the description or determined unambiguously therefrom, or may be learnt by practising the above techniques of the invention. To make the above objects, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
In order to illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. It is apparent that the drawings described below show some embodiments of the present invention, and that a person of ordinary skill in the art may obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a training method for a neural-network-based video frame interpolation model provided by an embodiment of the present invention;
Fig. 2 is a structural schematic diagram of the initial model in a training method for a neural-network-based video frame interpolation model provided by an embodiment of the present invention;
Fig. 3 is a flowchart of another training method for a neural-network-based video frame interpolation model provided by an embodiment of the present invention;
Fig. 4 is a flowchart of a neural-network-based video frame interpolation method provided by an embodiment of the present invention;
Fig. 5 is a schematic data-flow diagram of the neural network framework in an adaptive deep-learning-based video frame interpolation method provided by an embodiment of the present invention;
Fig. 6 is a structural schematic diagram of a training apparatus for a neural-network-based video frame interpolation model provided by an embodiment of the present invention;
Fig. 7 is a structural schematic diagram of a neural-network-based video frame interpolation apparatus provided by an embodiment of the present invention;
Fig. 8 is a structural schematic diagram of a server provided by an embodiment of the present invention.
Specific embodiments
The technical solutions of the present invention are described clearly and completely below in conjunction with the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention rather than all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
In the prior art, frame interpolation is generally performed with a motion compensation method or an optical-flow-based method.
The basic idea of the motion compensation method is as follows: divide the image into a static part and a moving part, estimate the motion vector of the object, obtain the image data of the previous frame according to the estimated motion vector, and then obtain the predicted pixels of the previous frame image data by means of a prediction filter. However, when an object moves quickly, the supplementary frame obtained in this way is prone to blurring or even serious distortion; when the motion vector estimated for an occluded region is inaccurate, the smoothness of the motion vector field is hard to guarantee; and when interpolating across a scene transition, the supplementary frame exhibits serious warping.
The basic idea of the optical-flow-based interpolation method is as follows: use the temporal variation of pixels in the image sequence and the correlation between adjacent frames to find the correspondence between the previous frame and the current frame, and thereby compute the motion information of objects between adjacent frames. However, since the basic assumption of the optical-flow method is constant brightness between adjacent frames, an abrupt brightness change violates this assumption and leads to visual artifacts in the interpolation result. In addition, the optical-flow method requires the capture times of adjacent video frames to be continuous, or the motion of objects between adjacent frames to be comparatively "small", and is therefore unsuitable for interpolating between widely spaced images.
On this basis, embodiments of the present invention provide a neural-network-based video frame interpolation method, a training method for its model, an apparatus and a server, which can be applied to frame interpolation for video, such as 2D or 3D video, or to related image processing.
To facilitate understanding of this embodiment, a training method for a neural-network-based video frame interpolation model disclosed in an embodiment of the present invention is first described in detail.
Referring to the flowchart of a training method for a neural-network-based video frame interpolation model shown in Fig. 1, the method comprises the following steps:
Step S100: determine a current group of training reference frames from a preset training set; the training reference frames comprise a first training frame and a second training frame.
The preset training set contains multiple groups of video frames. Since this method is mainly used for training the video frame interpolation model, and the two reference frames of a frame to be interpolated usually have a certain similarity, the similarity scale of two reference frames can be divided according to the intended scope of application of the video frame interpolation model, so as to determine the current group of training reference frames. For example, if two reference frames belong to different scenes, the similarity between them is low, and under normal circumstances no interpolation is needed between them; therefore, two frames whose similarity exceeds some threshold can be set as training reference frames, so as to satisfy the requirement that the first reference frame and the second reference frame belong to the same scene.
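A minimal sketch of such a similarity threshold, assuming a crude metric: the patent only requires "similarity above some threshold", so both the inverse-mean-absolute-difference metric and the 0.8 threshold below are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def frames_similar(frame_a, frame_b, threshold=0.8):
    """Crude same-scene check via inverse mean absolute pixel
    difference on [0, 1]-scaled images. Metric and threshold are
    illustrative assumptions."""
    a = frame_a.astype(np.float64) / 255.0
    b = frame_b.astype(np.float64) / 255.0
    return 1.0 - np.abs(a - b).mean() >= threshold

# Nearby frames of one scene pass; a frame from a much brighter scene fails.
f1 = np.full((4, 4, 3), 100, dtype=np.uint8)
f2 = np.full((4, 4, 3), 110, dtype=np.uint8)
f3 = np.full((4, 4, 3), 250, dtype=np.uint8)
```

In practice a more robust metric (histogram correlation, SSIM, or a learned embedding) would likely be used, but the thresholding logic is the same.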
Step S102: input the training reference frames into the preset initial model.
Usually, the two reference frames of the same video have the same size; if they differ, the image sizes of the two reference frames can be adjusted before they are input into the preset initial network. In a specific implementation, the first training frame and the second training frame can be spliced into one image and input into the preset initial model for processing. The initial model may comprise a feature extraction network, a feature fusion network and an output network, which respectively perform feature extraction, feature fusion and the final output of the supplementary video frame. In addition, when the training reference frames are colour images, the initial model usually processes them in three channels.
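The splicing step can be realised in several ways; one common choice, assumed here since the patent does not fix the layout, is to stack the two RGB frames along the channel axis into a single 6-channel input tensor:

```python
import numpy as np

# Hypothetical splicing of two same-sized RGB reference frames into
# one model input; channel concatenation is an assumption, the
# patent only says the frames are spliced into an image.
frame1 = np.zeros((64, 64, 3), dtype=np.float32)
frame2 = np.ones((64, 64, 3), dtype=np.float32)
model_input = np.concatenate([frame1, frame2], axis=-1)
```

Spatial side-by-side splicing would work equally well as long as training and inference use the same layout.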
Step S104: generate, through the feature extraction network, initial feature maps of the training reference frames at a preset number of levels.
The feature extraction network may be a neural network of various forms, such as a fully convolutional network or a fully connected network. After the training reference frames are input into the feature extraction network, initial feature maps of a preset number of levels are obtained; this preset number of levels is related to the number of convolutional layers in the feature extraction network and can be set as required. In a specific implementation, the initial feature map output by the previous convolutional layer can be used as the input of the current convolutional layer, which performs a convolution operation on it and outputs the initial feature map of the current layer; at this point, the scale of the initial feature map of the current layer is smaller than that of the initial feature map of the layer below.
Step S106: fuse, through the feature fusion network, the initial feature maps of the preset number of levels into a fused feature map.
Since different initial feature maps are obtained by convolution with different convolution kernels, different initial feature maps contain features of the training reference frames of different kinds or dimensions. Fusing these features through the feature fusion network for the subsequent output of the supplementary frame allows the supplementary frame to restore the corresponding details better. The fusion process can also be realised by convolution; therefore the feature fusion network may likewise be a neural network of various forms, such as a fully convolutional network or a fully connected network. When the scales of the initial feature maps differ, a sampling layer can also be added to transform the scale of the initial feature maps or of the feature maps in the fusion process.
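The top-down fusion described in the second implementation of the first aspect can be sketched as follows. Nearest-neighbour upsampling stands in for the bilinear interpolation layer and averaging stands in for the learned convolution, so this shows only the data flow, not the trained operation:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling; stands in for the bilinear
    interpolation layer of the fusion network."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse_top_down(initial_maps):
    """initial_maps is ordered from the largest (bottom-level) to the
    smallest (top-level) scale. The top-level map seeds the fusion;
    each fused map is upsampled to the next finer scale and merged
    with that level's initial map (averaging replaces the patent's
    convolution). The bottom-level result is the final fused map."""
    fused = initial_maps[-1]
    for feat in reversed(initial_maps[:-1]):
        fused = (feat + upsample2x(fused)) / 2.0
    return fused

fused = fuse_top_down([np.ones((8, 8, 4)), np.ones((4, 4, 4)), np.ones((2, 2, 4))])
```

The coarse-to-fine pass mirrors feature-pyramid-style fusion: semantic information from the smallest map is progressively combined with the finer-scale detail maps.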
Step S108: input the fused feature map into the output network and output the training supplementary video frame between the first training frame and the second training frame.
The fused feature map contains the features of the first training frame and the second training frame, and the features of the supplementary video frame bear a certain relationship to them. The output network may also include a convolutional neural network structure that extracts, from the features of the first training frame and of the second training frame respectively, the features belonging to the supplementary video frame, thereby synthesising the training supplementary video frame corresponding to the current group of training reference frames.
Step S110: the loss value of the training supplementary video frame is determined by a preset prediction loss function.

The above prediction loss function may include a perceptual loss function, an SSIM (structural similarity index) loss function, and the like; the loss function may be selected as required or according to historical experience.
Step S112: the initial model is trained according to the loss value until the parameters in the initial model converge, yielding the video frame-supplementing model.

The above loss value reflects how well the training supplementary video frame matches the ideal supplementary video frame. A target loss value may be set in advance; during model training, the parameters of the model are adjusted in the direction that approaches this loss value, and once it is reached the parameters of the initial model have converged, giving a mature video frame-supplementing model. This process requires a large amount of sample data; in practice, the training reference frames used while training the initial model may be non-repeating sample groups, or there may be sample groups that repeat one another.
The embodiment of the invention provides a training method for a neural-network-based video frame-supplementing model. After the current training reference frames are determined from a preset training set, they are input to a preset initial model; the feature extraction network generates initial feature maps of the training reference frames at a preset number of levels; the feature fusion network fuses these initial feature maps into a fusion feature map; the fusion feature map is then input to the output network, which outputs the training supplementary video frame between the first and second training frames; the loss value of the training supplementary video frame is determined by a preset prediction loss function; the next group of training reference frames is then input and training continues until the parameters of the initial model converge, at which point training ends and the video frame-supplementing model is obtained. In this manner, richer and more comprehensive feature information of the reference frames is captured by the feature extraction network and the feature fusion network, so the trained video frame-supplementing model can produce supplementary frames with a better supplementing effect, thereby improving the user's viewing experience.
The embodiment of the invention also provides another training method for a neural-network-based video frame-supplementing model. This method focuses on the fusion of the initial feature maps by the feature fusion network and on the output of the training supplementary frame by the output network.

The method is based on the initial model shown in Fig. 2. The initial model includes a feature extraction network, a feature fusion network and an output network. The feature extraction network includes multiple groups of sequentially connected first convolutional networks, each group comprising an interconnected convolutional layer and average pooling layer; Fig. 2 takes five layers of first convolutional networks as an example. The feature fusion network includes sequentially connected second convolutional networks, each group comprising an interconnected bilinear interpolation layer and convolutional layer; Fig. 2 takes five layers of second convolutional networks as an example. The output network includes a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer and a feature synthesis layer; the first, second, third and fourth convolutional layers are each connected to the feature fusion network, and each is also connected to the feature synthesis layer.
The flow chart of this method is shown in Fig. 3 and includes the following steps:

Step S300: the current training reference frames are determined from a preset training set; the training reference frames include a first training frame and a second training frame.

Step S302: the training reference frames are input to a preset initial model.

Step S304: initial feature maps of the training reference frames at a preset number of levels are generated by the feature extraction network. With the structure of the initial model shown in Fig. 2, the preset number of levels is 5, and each layer of the first convolutional network outputs one initial feature map. After the training reference frames are input to the initial network, they are processed by the convolutional layer and average pooling layer of the first-level first convolutional network, which outputs the first-level initial feature map; the first-level initial feature map is then processed by the convolutional layer and average pooling layer of the second-level first convolutional network, which outputs the second-level initial feature map; and so on, until the initial feature maps of all 5 levels are obtained.
Step S306: the multiple layers of initial feature maps are arranged in sequence according to the scale of each layer's initial feature map, with the initial feature map of the top level having the smallest scale and that of the bottom level the largest.

Step S308: the initial feature map of the top level is taken as the fusion feature map of the top level.

Step S310: for each level other than the top level, the initial feature map of the current level is fused with the fusion feature map of the level above it to obtain the fusion feature map of the current level.

Step S312: the fusion feature map of the lowest level is taken as the final fusion feature map.
Based on the structure of the feature fusion network shown in Fig. 2, step S310 can be implemented as follows:

(1) The fusion feature map of the level above the current level is interpolated by the bilinear interpolation layer to obtain a fusion feature map matching the size of the current level's initial feature map.

(2) The current level's initial feature map and the interpolated fusion feature map of the level above are convolved by the convolutional layer to obtain the fusion feature map of the current level. In practice, before the convolutional layer processes the initial feature map and the fusion feature map, the corresponding parts of the two maps may be superimposed.

Usually, in order to fuse the initial feature maps of all levels, the number of levels of the feature fusion network equals that of the feature extraction network, as shown in Fig. 2; in practice a different number of levels may also be used as required. Since the feature fusion network needs to process the initial feature maps, the first convolutional networks of the feature extraction network are also correspondingly connected to the second convolutional networks of the feature fusion network, as shown in Fig. 2.
Step S314: a first convolution operation is performed by the first convolutional layer on the feature data in the fusion feature map corresponding to the first training frame, outputting a first vertical feature map.

Step S316: a second convolution operation is performed by the second convolutional layer on the feature data in the fusion feature map corresponding to the first training frame, outputting a first horizontal feature map.

Step S318: a third convolution operation is performed by the third convolutional layer on the feature data in the fusion feature map corresponding to the second training frame, outputting a second vertical feature map.

Step S320: a fourth convolution operation is performed by the fourth convolutional layer on the feature data in the fusion feature map corresponding to the second training frame, outputting a second horizontal feature map.
The convolution kernels of the first, second, third and fourth convolutional layers are one-dimensional kernels; compared with two-dimensional kernels, the computation load is smaller and the computation time shorter. The one-dimensional kernels respectively extract the vertical and horizontal features of the first and second training frames from the fusion feature map. With the structure shown in Fig. 2, the above four steps can be carried out in parallel, reducing computation time.

Step S322: the feature synthesis layer performs feature superposition processing on the first vertical feature map, first horizontal feature map, second vertical feature map and second horizontal feature map to obtain the training supplementary video frame.
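For a single output pixel, the synthesis of step S322 can be sketched as follows (an interpretive stand-in, assuming the vertical/horizontal pairs act as separable kernels applied to per-frame patches; the uniform kernel values and patch size n=5 are hypothetical):

```python
import numpy as np

def synthesize_pixel(p1, p2, k1v, k1h, k2v, k2h):
    """Feature superposition for one output pixel: each (vertical, horizontal)
    pair forms a 2D kernel via outer product, is convolved with the patch from
    its frame, and the two contributions are summed."""
    return (np.outer(k1v, k1h) * p1).sum() + (np.outer(k2v, k2h) * p2).sum()

n = 5
rng = np.random.default_rng(0)
p1 = rng.standard_normal((n, n))   # patch from the first training frame
p2 = rng.standard_normal((n, n))   # patch from the second training frame
# hypothetical kernel estimates: uniform weights, each 1D kernel summing to 1
k1v, k1h, k2v, k2h = (np.full(n, 1.0 / n) for _ in range(4))
out = synthesize_pixel(p1, p2, k1v, k1h, k2v, k2h)
print(out)
```

With uniform kernels this reduces to averaging both patches; in the trained model the four sub-networks would instead predict content-dependent kernels per pixel.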
Step S324: the loss value of the training supplementary video frame is determined by a preset prediction loss function.

Step S326: the initial model is trained according to the loss value until the parameters in the initial model converge, yielding the video frame-supplementing model.
In the above method, multiple first convolutional networks composed of a convolutional layer and an average pooling layer are used when generating the initial feature maps, and multiple second convolutional networks composed of a convolutional layer and a bilinear interpolation layer are used when fusing the initial feature maps into the fusion feature map, so richer and more comprehensive features of the training reference frames are obtained. The output network extracts the vertical and horizontal features of the first and second training frames from the fusion feature map in parallel through four convolutional layers and finally synthesizes the supplementary video frame. This approach yields a better frame-supplementing effect while reducing the computation load and computation time.
Based on the above embodiments of the training method for the frame-supplementing model, the embodiment of the invention also provides a neural-network-based video frame-supplementing method, whose flow chart is shown in Fig. 4. The method includes the following steps:

Step S400: the first reference frame and the second reference frame of the video to be frame-supplemented are obtained.

The above first and second reference frames may be two adjacent frames in the video frame sequence of the video to be supplemented, or other video frames may lie between them. The selection of the first and second reference frames may also follow the requirements imposed on the training reference frames during the training of the frame-supplementing model, for example that the two video frames belong to the same scene.
Step S402: the first reference frame and the second reference frame are input to the pre-established video frame-supplementing model to generate a supplementary video frame; the model is obtained by the training method of the neural-network-based video frame-supplementing model described above.

The scales of the first and second reference frames input to the model should be identical; if they differ, they must be adjusted to the same scale before being input to the pre-established video frame-supplementing model.

Step S404: the supplementary video frame is inserted between the first reference frame and the second reference frame.

The entire processing flow of this method is end-to-end: no subsequent processing of the video frames is needed, the frame-rate conversion effect is good, and compared with conventional methods it provides higher-quality video frame interpolation.
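The inference flow of steps S400 to S404 can be sketched end to end (the blend model below is a hypothetical stand-in for the trained frame-supplementing model, and nearest-neighbour resizing stands in for whatever scale adjustment is used):

```python
import numpy as np

def resize_nearest(frame, h, w):
    """Nearest-neighbour resize so both reference frames share one scale."""
    ys = np.arange(h) * frame.shape[0] // h
    xs = np.arange(w) * frame.shape[1] // w
    return frame[np.ix_(ys, xs)]

def interpolate(ref1, ref2, model):
    if ref1.shape != ref2.shape:            # scales must match before input
        ref2 = resize_nearest(ref2, *ref1.shape)
    return model(ref1, ref2)

# Stand-in for the trained frame-supplementing model (a plain average blend).
blend_model = lambda a, b: 0.5 * (a + b)

video = [np.zeros((4, 4)), np.ones((8, 8))]   # reference frames of differing scale
mid = interpolate(video[0], video[1], blend_model)
video.insert(1, mid)                          # step S404: insert between the refs
print(len(video), mid.shape, float(mid.mean()))  # 3 (4, 4) 0.5
```

The sequence grows by one frame, with the generated frame placed between the two reference frames.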
Based on the above embodiments, the invention further provides an adaptive deep-learning-based video frame-supplementing method, which includes the following steps:

Step (1): design a fully convolutional neural network architecture.

Specifically, a fully convolutional neural network is used. The network includes a contracting component for feature extraction (equivalent to the feature extraction network above) and an expanding component containing up-sampling layers to perform prediction (equivalent to the feature fusion network above); skip connections are further used so that the expanding component can obtain features from the contracting part of the network. The information flow is directed to the last expanding layer, which splits into four sub-networks (equivalent to the four convolutional layers in the output network above), each sub-network computing one of the kernels. The data-flow diagram is shown in Fig. 5: the structure formed by the contracting and expanding components is equivalent to an encoder-decoder network, the extracted features are fed to the four sub-networks, and the estimated pixel-dependent kernels are convolved with the input frames to generate the interpolated frame I. Each sub-network estimates, in a dense per-pixel manner, one of the four 1D kernels for every output pixel (equivalent to the training process). Besides convolutional layers, each sub-network also contains a bilinear interpolation layer, whose role is to enlarge the extracted features to match the input frames. In Fig. 5, I1' denotes the features extracted from reference frame I1 that belong to the supplementary frame I', and I2' denotes the features extracted from reference frame I2 that belong to the supplementary frame I'.
Specifically: video frame interpolation aims at obtaining the intermediate frame from the two input frames I1 and I2. Traditional video frame interpolation involves two steps, motion estimation and pixel synthesis, usually realized through optical flow and pixel interpolation. When optical flow becomes unreliable due to occlusion, motion blur and similar problems, the interpolation results obtained in this way may be inaccurate.
In this method, for each output pixel, a pair of two-dimensional convolution kernels K1(x, y) and K2(x, y) is estimated using a convolution-based approach, and the color of the output pixel is computed by convolving them with I1 and I2. The mathematical description for each output pixel is:

    I'(x, y) = K1(x, y) * P1(x, y) + K2(x, y) * P2(x, y)

where P1(x, y) and P2(x, y) are the patches of I1 and I2 centered at (x, y), equivalent to the feature matrices obtained after I1 and I2 are processed by the above contracting and expanding components (equivalent to the above fusion feature map).
The consumption problem brought by computing large kernels is solved by estimating a pair of one-dimensional kernels to approximate each two-dimensional kernel. For K1 and K2, the estimated pairs <k1,v, k1,h> and <k2,v, k2,h> give the approximations K1 ≈ k1,v * k1,h and K2 ≈ k2,v * k2,h, where k1,v and k1,h are respectively the vertical and horizontal vectors of K1, and k2,v and k2,h are respectively the vertical and horizontal vectors of K2. This reduces the number of parameters of each kernel from the original n*n to 2n.
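The parameter saving and the outer-product approximation can be checked directly; for an exactly separable (rank-1) kernel such as a 2D Gaussian, the pair of 1D vectors reconstructs the full kernel with no error (the kernel size n=51 and the Gaussian width are illustrative choices only):

```python
import numpy as np

n = 51
# A separable 2D kernel: outer product of a vertical and a horizontal 1D Gaussian.
t = np.arange(n) - n // 2
g = np.exp(-t**2 / (2 * 8.0**2))
k_v = g / g.sum()                 # vertical 1D kernel
k_h = g / g.sum()                 # horizontal 1D kernel
K = np.outer(k_v, k_h)            # the full 2D kernel it represents

# Parameter count: 2n values per kernel instead of n*n.
print(K.size, k_v.size + k_h.size)  # 2601 102
```

A 51x51 kernel thus needs only 102 estimated values instead of 2601, which is the memory-consumption saving the method relies on.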
To estimate the four groups of one-dimensional kernels, the information flow is directed to the last expanding layer, which splits into four sub-networks, each computing one of the kernels. The combination of the four kernels could also be modeled as a unified expression, but convergence during training is faster when four sub-networks are used.

Meanwhile, to solve the artifact problem observed in experiments, these artifacts are handled with bilinear interpolation, performing the up-sampling in the decoder of the network.
Step (2): construct the loss function, so that the effect of the VGG-19 network based on feature reconstruction loss is better.

The loss is defined using a perceptual loss function, whose mathematical description is:

    L = || φ(Î) − φ(Igt) ||2^2

where φ denotes the features extracted from an image, Î denotes the predicted value, and Igt denotes the ground truth.
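A minimal sketch of this feature reconstruction loss, with a single fixed convolution standing in for φ (in the patent φ is feature extraction from a VGG-19 network; the stand-in kernel and image sizes here are assumptions):

```python
import numpy as np

def phi(img, k):
    """Stand-in feature extractor: one fixed 3x3 'same' convolution.
    A real implementation would use VGG-19 feature maps instead."""
    p = np.pad(img, 1)
    h, w = img.shape
    return np.array([[(p[i:i + 3, j:j + 3] * k).sum() for j in range(w)]
                     for i in range(h)])

def perceptual_loss(pred, gt, k):
    """|| phi(pred) - phi(gt) ||_2^2 over the feature maps."""
    d = phi(pred, k) - phi(gt, k)
    return float((d ** 2).sum())

rng = np.random.default_rng(0)
k = rng.standard_normal((3, 3))
gt = rng.standard_normal((16, 16))
print(perceptual_loss(gt, gt, k))              # 0.0 for a perfect prediction
print(perceptual_loss(gt + 0.1, gt, k) > 0.0)  # True
```

Because the distance is taken in feature space rather than pixel space, the loss rewards reconstructions whose structure matches the ground truth, which is the motivation for using it here.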
Step (3): initialize the neural network parameters with the convolution-aware initialization method and train with AdaMax, using image regions of 128*128 pixels and avoiding image regions that contain no useful information, to improve the training effect.

The neural network parameters are initialized with the convolution-aware initialization method and trained with AdaMax, where β1 = 0.9 is the exponential decay rate of the first-order moment estimate, β2 = 0.999 is the exponential decay rate of the second-order moment estimate, the learning rate is 0.001, and the mini-batch size is 16. Image regions of 128 × 128 pixels are used rather than entire video frames; avoiding image regions that contain no useful information improves the training effect.
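The AdaMax update with the stated hyperparameters (β1 = 0.9, β2 = 0.999, learning rate 0.001) can be sketched on a toy quadratic objective; this is the standard infinity-norm variant of Adam, not the patent's training code, and the objective is purely illustrative:

```python
import numpy as np

def adamax_minimize(grad_fn, x0, lr=0.001, beta1=0.9, beta2=0.999,
                    eps=1e-8, steps=5000):
    """AdaMax: first-moment estimate m (decay beta1) and infinity-norm
    second-moment estimate u (decay beta2)."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    u = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g          # first-order moment estimate
        u = np.maximum(beta2 * u, np.abs(g))     # exponentially weighted inf-norm
        x = x - (lr / (1 - beta1 ** t)) * m / (u + eps)
    return x

# Toy objective: f(x) = ||x - 3||^2, gradient 2(x - 3); minimum at x = 3.
x_star = adamax_minimize(lambda x: 2 * (x - 3.0), np.zeros(4))
print(np.round(x_star, 2))
```

In the patent this optimizer would be applied to the network weights with mini-batches of 16 cropped 128 × 128 regions.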
The training set is generated as follows. All video frames are divided into groups of three frames, a frame is chosen at random within each group, and the three-frame group centered on that frame is extracted from the video. Since video resolution has a considerable influence on the model, higher-resolution videos are chosen and scaled to a resolution of 1280*720 to reduce the influence of video compression. To avoid selecting three-frame groups containing frames with little or no motion, the optical flow between the first and last frames of each group is computed first, together with the average flow magnitude. Then 500,000 three-frame groups are selected as candidates, where groups with larger motion are more likely to be chosen; in this way a training set with larger motion is obtained.

At the same time, since some videos are composed of many shots, the color difference between different frames is computed to detect shot switches, and groups spanning different shots are deleted. Finally, the entropy of the optical flow in each sample is computed, and the 250,000 three-frame groups with the largest entropy are selected to form the training dataset. In this training dataset, the flow magnitude of about 70% of the pixels is at least 20 pixels; the average is 25 pixels and the maximum is 38 pixels.
The training data are augmented during training. Each sample in the training set is 150 × 150 pixels, and patches of 128 × 128 pixels are used for training, so data augmentation can be performed by randomly cropping the training data, preventing the network from learning spatial priors present in the training set. The amount of motion of each sample is enhanced by shifting the crop windows in the first and last frames while keeping the crop window of the intermediate frame fixed; by performing this operation consistently and shifting the crop windows of the first and last frames in opposite directions, the intermediate frame remains valid. Experiments found that a shift of about 6 pixels works well, increasing the flow magnitude by about 8.5 pixels. The cropped patches are also randomly flipped vertically or horizontally and their temporal order is randomly swapped, so that the motion within the training dataset is symmetric and the network is prevented from becoming biased.
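The augmentation steps above can be sketched as one function (a minimal sketch; the exact shift sampling and flip probabilities are assumptions consistent with the 150 → 128 crop and the roughly 6-pixel shift described):

```python
import numpy as np

def augment(first, mid, last, rng, patch=128, shift=6):
    """Random crop with opposite-direction shift of the first/last windows,
    random horizontal/vertical flip, random temporal order swap."""
    h, w = mid.shape
    # base crop window, leaving room for +/- shift in the outer frames
    y = rng.integers(shift, h - patch - shift + 1)
    x = rng.integers(shift, w - patch - shift + 1)
    dy, dx = rng.integers(-shift, shift + 1, size=2)
    f = first[y + dy:y + dy + patch, x + dx:x + dx + patch]
    m = mid[y:y + patch, x:x + patch]                # middle window stays fixed
    l = last[y - dy:y - dy + patch, x - dx:x - dx + patch]  # opposite shift
    for ax in (0, 1):                                # random flips
        if rng.random() < 0.5:
            f, m, l = np.flip(f, ax), np.flip(m, ax), np.flip(l, ax)
    if rng.random() < 0.5:                           # random temporal swap
        f, l = l, f
    return f, m, l

rng = np.random.default_rng(0)
frames = [rng.standard_normal((150, 150)) for _ in range(3)]
f, m, l = augment(*frames, rng)
print(f.shape, m.shape, l.shape)  # (128, 128) (128, 128) (128, 128)
```

Shifting the outer windows in opposite directions adds apparent motion while leaving the intermediate frame's crop the correct midpoint.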
After the video frame-supplementing model is trained and before the model performs video frame interpolation, it can be determined whether the two reference frames belong to the same scene. In this judgment, the pixel values at corresponding pixel positions of the first and second reference frames are subtracted to obtain the pixel difference at each position; the total pixel difference is computed from the per-position differences; and it is judged whether the total pixel difference is greater than or equal to a preset difference threshold. If not, the first and second reference frames are confirmed to belong to the same scene; if so, the first and second reference frames are confirmed to belong to different scenes. When the two reference frames are not in the same scene, frame supplementing is usually unnecessary.
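This same-scene check can be sketched directly (the threshold value and frame contents below are hypothetical; the patent only specifies a preset difference threshold):

```python
import numpy as np

def same_scene(ref1, ref2, diff_threshold):
    """Subtract corresponding pixel positions, sum to a total pixel difference;
    below the preset threshold means the two reference frames share a scene."""
    total_diff = np.abs(ref1.astype(float) - ref2.astype(float)).sum()
    return bool(total_diff < diff_threshold)

frame_a = np.full((8, 8), 100.0)
frame_b = frame_a + 1.0            # near-identical: same scene
frame_c = np.full((8, 8), 250.0)   # very different: scene cut
threshold = 1000.0                 # hypothetical preset difference threshold
print(same_scene(frame_a, frame_b, threshold))  # True  -> frame supplementing proceeds
print(same_scene(frame_a, frame_c, threshold))  # False -> skip frame supplementing
```

Interpolating across a scene cut would blend unrelated content, so the check gates the whole pipeline.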
The above adaptive deep-learning-based frame-supplementing method comprises two steps: jointly computing the features between adjacent frames, and generating the intermediate frame from the result. The input frames are convolved with spatially adaptive convolution kernels; video frame interpolation is formulated as convolving local patches of the input frames with pairs of one-dimensional kernels, which solves the problem of growing memory occupation brought by computing larger kernels. With the optimized deep fully convolutional neural network, kernel estimation and fusion into the entire intermediate frame can be computed in one pass, and the network can be trained with a perceptual loss to generate high-quality intermediate frames.

For cases of blur in frame supplementing, since the method uses adaptive convolution, it preserves the sharpness of the original images to the greatest extent and produces no ghosting artifacts. For occlusion regions and sudden brightness changes, where optical-flow methods are unreliable, this method estimates the convolution kernels through deep learning and synthesizes pixels automatically, so the frame-supplementing effect is stable. In addition, edge-aware pixel interpolation can be achieved: in the above video frame-supplementing model, only a few entries of each kernel have non-zero values, and along image edges the kernels are anisotropic, with orientations well aligned to the edge direction.
Corresponding to the above embodiments of the training method for the frame-supplementing model, the embodiment of the invention also provides a training device for a neural-network-based video frame-supplementing model, whose structural diagram is shown in Fig. 6, comprising:

a training reference frame determining module 600, configured to determine the current training reference frames from a preset training set, the training reference frames including a first training frame and a second training frame;

a training reference frame input module 602, configured to input the training reference frames to a preset initial model, the initial model including a feature extraction network, a feature fusion network and an output network;

a feature extraction module 604, configured to generate, by the feature extraction network, initial feature maps of the training reference frames at a preset number of levels;

a feature fusion module 606, configured to fuse, by the feature fusion network, the initial feature maps of the preset number of levels into a fusion feature map;

a supplementary frame determining module 608, configured to input the fusion feature map to the output network and output the training supplementary video frame between the first and second training frames;

a loss value obtaining module 610, configured to determine the loss value of the training supplementary video frame by a preset prediction loss function; and

a training module 612, configured to train the initial model according to the loss value until the parameters in the initial model converge, obtaining the video frame-supplementing model.

The device provided by the embodiment of the invention has the same technical effect and realization principle as the foregoing method embodiments; for brevity, where the device embodiment is silent, reference may be made to the corresponding content of the foregoing method embodiments.
Corresponding to the above video frame-supplementing method embodiment, the embodiment of the invention also provides a neural-network-based video frame-supplementing device, whose structural diagram is shown in Fig. 7, comprising: a reference frame obtaining module 700, configured to obtain the first and second reference frames of the video to be frame-supplemented; a supplementary frame generating module 702, configured to input the first and second reference frames to the pre-established video frame-supplementing model to generate a supplementary video frame, the model being obtained by the training method of the neural-network-based video frame-supplementing model described above; and a supplementary frame inserting module 703, configured to insert the supplementary video frame between the first and second reference frames.

The device provided by the embodiment of the invention has the same technical effect and realization principle as the foregoing method embodiments; for brevity, where the device embodiment is silent, reference may be made to the corresponding content of the foregoing method embodiments.
As shown in Fig. 8, the embodiment of the invention also provides a server, which includes a processor 130 and a memory 131; the memory 131 stores machine-executable instructions executable by the processor 130, and the processor 130 executes the machine-executable instructions to realize the above training method of the neural-network-based video frame-supplementing model or the neural-network-based video frame-supplementing method.

Further, the server shown in Fig. 8 also includes a bus 132 and a communication interface 133; the processor 130, the communication interface 133 and the memory 131 are connected by the bus 132.

The memory 131 may include high-speed random access memory (RAM) and may further include non-volatile memory, for example at least one disk memory. The communication connection between this system network element and at least one other network element is realized through at least one communication interface 133 (which may be wired or wireless), and the Internet, a wide area network, a local network, a metropolitan area network, etc. may be used. The bus 132 may be an ISA bus, a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, etc.; for ease of representation it is indicated by only one double-headed arrow in Fig. 8, but this does not mean that there is only one bus or only one type of bus.
The processor 130 may be an integrated circuit chip with signal processing capability. During realization, the steps of the above methods may be completed by integrated logic circuits of hardware in the processor 130 or by instructions in the form of software. The above processor 130 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can realize or execute the methods, steps and logic diagrams disclosed in the embodiments of the invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc. The steps of the methods disclosed in the embodiments of the invention may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in this field, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or register. The storage medium is located in the memory 131, and the processor 130 reads the information in the memory 131 and completes the steps of the methods of the foregoing embodiments in combination with its hardware.
The embodiment of the invention also provides a machine-readable storage medium storing machine-executable instructions; when called and executed by a processor, the machine-executable instructions cause the processor to realize the above training method of the neural-network-based video frame-supplementing model or the neural-network-based video frame-supplementing method; for the specific realization see the method embodiments, which will not be repeated here.
The computer program product of the neural-network-based video frame-supplementing method, the training method and device of its model, and the server provided by the embodiments of the invention includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the methods described in the foregoing method embodiments, and for the specific realization see the method embodiments, which will not be repeated here.
If the functions are realized in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the invention in essence, or the part contributing to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
Finally, it should be noted that the embodiments described above are only specific embodiments of the invention, used to illustrate its technical solution rather than to limit it, and the protection scope of the invention is not limited thereto. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field can still, within the technical scope disclosed by the invention, modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features; these modifications, variations or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the invention and should all be covered within the protection scope of the invention. Therefore, the protection scope of the invention shall be subject to the protection scope of the claims.
Claims (10)
1. A training method for a neural-network-based video frame supplementation model, comprising:
determining a current training reference frame based on a preset training set, the training reference frame comprising a first training frame and a second training frame;
inputting the training reference frame into a preset initial model, the initial model comprising a feature extraction network, a feature fusion network, and an output network;
generating, by the feature extraction network, initial feature maps of the training reference frame at a preset number of levels;
fusing, by the feature fusion network, the initial feature maps of the preset number of levels into a fused feature map;
inputting the fused feature map into the output network, and outputting a training supplementary video frame between the first training frame and the second training frame;
determining a loss value of the training supplementary video frame by a preset prediction loss function; and
training the initial model according to the loss value until the parameters of the initial model converge, thereby obtaining the video frame supplementation model.
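For illustration only (not part of the claims): a minimal NumPy sketch of the training flow in claim 1. The three networks are replaced by trivial stand-ins, the only "parameter" is a single blend weight, and the loss is assumed to be L1 since the claim does not specify a loss function; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_extraction(frames):
    # Stand-in: stack the two reference frames along the channel axis.
    return np.concatenate(frames, axis=-1)

def feature_fusion(feats):
    # Stand-in: identity fusion.
    return feats

def output_network(fused, w):
    # Stand-in: per-pixel blend of the two frames' channels by weight w.
    c = fused.shape[-1] // 2
    return w * fused[..., :c] + (1.0 - w) * fused[..., c:]

def l1_loss(pred, target):
    # The claim leaves the prediction loss unspecified; L1 is assumed.
    return np.abs(pred - target).mean()

# Toy training triplet: (first frame, ground-truth middle frame, second frame).
f1 = rng.random((8, 8, 3)); gt = rng.random((8, 8, 3)); f2 = rng.random((8, 8, 3))

w, lr = 0.9, 0.5   # single trainable parameter, for illustration only
losses = []
for _ in range(50):
    fused = feature_fusion(feature_extraction([f1, f2]))
    pred = output_network(fused, w)
    losses.append(l1_loss(pred, gt))
    # Numeric gradient of the loss w.r.t. the blend weight.
    grad = (l1_loss(output_network(fused, w + 1e-4), gt)
            - l1_loss(output_network(fused, w - 1e-4), gt)) / 2e-4
    w -= lr * grad   # "training according to the loss value"
```

In the actual model the parameters are the convolutional weights of the three networks and the update would use backpropagation; the loop above only mirrors the claim's forward–loss–update structure.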
2. The method according to claim 1, wherein the feature extraction network comprises a plurality of sequentially connected groups of first convolutional networks, each group of the first convolutional networks comprising a convolutional layer and an average pooling layer connected to each other.
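For illustration only: claim 2's extraction network stacks groups of (convolution, average pooling), so each group halves the spatial scale and yields one pyramid level. A NumPy sketch with a placeholder 3×3 kernel (the real kernels are learned):

```python
import numpy as np

def conv2d_same(img, kernel):
    """Naive 'same'-padded 2D convolution for a single-channel image."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (padded[i:i + kh, j:j + kw] * kernel).sum()
    return out

def avg_pool2(img):
    """2x2 average pooling: halves each spatial dimension."""
    h, w = img.shape
    return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

rng = np.random.default_rng(0)
frame = rng.random((32, 32))
kernel = np.full((3, 3), 1.0 / 9.0)  # placeholder for learned weights

# Three (conv -> average pool) groups produce a 3-level feature pyramid.
levels, x = [], frame
for _ in range(3):
    x = avg_pool2(conv2d_same(x, kernel))
    levels.append(x)
sizes = [lv.shape for lv in levels]
```

The group count (three here) stands in for the claim's "preset number of levels".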
3. The method according to claim 2, wherein the initial feature maps have multiple levels and the scales of the initial feature maps differ between levels; and the step of fusing, by the feature fusion network, the initial feature maps of the preset number of levels into a fused feature map comprises:
arranging the multiple levels of initial feature maps in order according to their scales, wherein the initial feature map of the top level has the smallest scale and the initial feature map of the bottom level has the largest scale;
taking the initial feature map of the top level as the fused feature map of the top level;
for each level other than the top level, fusing the initial feature map of the current level with the fused feature map of the level above the current level to obtain the fused feature map of the current level; and
taking the fused feature map of the lowest level as the final fused feature map.
4. The method according to claim 3, wherein the feature fusion network comprises a plurality of sequentially connected groups of second convolutional networks, each group of the second convolutional networks comprising a bilinear interpolation layer and a convolutional layer connected to each other; and the step of fusing the initial feature map of the current level with the fused feature map of the level above the current level to obtain the fused feature map of the current level comprises:
performing, by the bilinear interpolation layer, interpolation on the fused feature map of the level above the current level to obtain a fused feature map matching the size of the initial feature map of the current level; and
performing, by the convolutional layer, a convolution calculation on the initial feature map of the current level and the interpolated fused feature map of the level above the current level to obtain the fused feature map of the current level.
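For illustration only: a NumPy sketch of the coarse-to-fine fusion of claims 3 and 4. The top (smallest) level's map is taken as-is, each lower level bilinearly upsamples the fused map from above to its own size and merges; the learned fusion convolution is replaced by a simple average, which is an assumption.

```python
import numpy as np

def bilinear_upsample2(img):
    """Bilinear 2x upsampling of a single-channel map (align-centers convention)."""
    h, w = img.shape
    ys = (np.arange(2 * h) + 0.5) / 2 - 0.5
    xs = (np.arange(2 * w) + 0.5) / 2 - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1); y1 = np.clip(y0 + 1, 0, h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1); x1 = np.clip(x0 + 1, 0, w - 1)
    wy = np.clip(ys - y0, 0, 1)[:, None]
    wx = np.clip(xs - x0, 0, 1)[None, :]
    top = (1 - wx) * img[y0][:, x0] + wx * img[y0][:, x1]
    bot = (1 - wx) * img[y1][:, x0] + wx * img[y1][:, x1]
    return (1 - wy) * top + wy * bot

rng = np.random.default_rng(0)
# Initial feature pyramid, ordered bottom (largest scale) to top (smallest).
pyramid = [rng.random((16, 16)), rng.random((8, 8)), rng.random((4, 4))]

fused = pyramid[-1]                         # top level: fused map = initial map
for level in reversed(pyramid[:-1]):        # proceed downward through the levels
    up = bilinear_upsample2(fused)          # interpolate to the current level's size
    fused = 0.5 * (level + up)              # stand-in for the fusion convolution
final_shape = fused.shape                   # lowest level: final fused feature map
```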
5. The method according to claim 1, wherein the output network comprises a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, and a feature synthesis layer; the first, second, third, and fourth convolutional layers are each connected to the feature fusion network and each connected to the feature synthesis layer; and the step of inputting the fused feature map into the output network and outputting a training supplementary video frame between the first training frame and the second training frame comprises:
performing, by the first convolutional layer, a first convolution operation on the feature data corresponding to the first training frame in the fused feature map to output a first vertical feature map;
performing, by the second convolutional layer, a second convolution operation on the feature data corresponding to the first training frame in the fused feature map to output a first horizontal feature map;
performing, by the third convolutional layer, a third convolution operation on the feature data corresponding to the second training frame in the fused feature map to output a second vertical feature map;
performing, by the fourth convolutional layer, a fourth convolution operation on the feature data corresponding to the second training frame in the fused feature map to output a second horizontal feature map; and
performing, by the feature synthesis layer, feature superposition on the first vertical feature map, the first horizontal feature map, the second vertical feature map, and the second horizontal feature map to obtain the training supplementary video frame.
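For illustration only: the vertical/horizontal pairs of claim 5 suggest separable filtering, as in separable-convolution frame interpolation. The sketch below assumes one global (vertical, horizontal) 1D kernel pair per reference frame rather than the per-pixel kernels a real model would predict, and synthesizes the middle frame by superposing the two filtered frames; the uniform kernel values are placeholders.

```python
import numpy as np

def sep_conv(frame, kv, kh_):
    """Filter a single-channel frame with the separable kernel kv (outer) kh_,
    using 'same' zero padding."""
    n = kv.size
    p = n // 2
    padded = np.pad(frame, p)
    out = np.zeros_like(frame, dtype=float)
    for i in range(frame.shape[0]):
        for j in range(frame.shape[1]):
            patch = padded[i:i + n, j:j + n]
            out[i, j] = kv @ patch @ kh_   # vertical then horizontal weighting
    return out

rng = np.random.default_rng(0)
frame1 = rng.random((8, 8))
frame2 = rng.random((8, 8))

# Stand-ins for the four branches' outputs: a vertical and a horizontal kernel
# per reference frame. Each separable kernel sums to 0.5, so the superposition
# below is an average of the two filtered frames.
k1v = k2v = np.full(5, 0.2)   # vertical kernels (sum to 1)
k1h = k2h = np.full(5, 0.1)   # horizontal kernels (sum to 0.5)

# Feature synthesis layer: superpose the two filtered reference frames.
interpolated = sep_conv(frame1, k1v, k1h) + sep_conv(frame2, k2v, k2h)
```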
6. A neural-network-based video frame supplementation method, comprising:
obtaining a first reference frame and a second reference frame of a video to be frame-supplemented;
inputting the first reference frame and the second reference frame into a pre-established video frame supplementation model to generate a supplementary video frame, wherein the video frame supplementation model is obtained by training according to the training method of any one of claims 1-5; and
inserting the supplementary video frame between the first reference frame and the second reference frame.
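For illustration only: applying claim 6 over a whole clip, each adjacent pair of frames serves as the reference frames and the generated frame is inserted between them, doubling the effective frame rate. The trained model is replaced here by a plain average, which is an assumption.

```python
import numpy as np

def frame_supplement_model(ref1, ref2):
    # Stand-in for the trained model of claims 1-5: a plain average.
    return 0.5 * (ref1 + ref2)

# Toy clip whose frames are constant images valued at their timestamps 0, 1, 2.
video = [np.full((4, 4), float(t)) for t in range(3)]

out = []
for a, b in zip(video, video[1:]):
    out.extend([a, frame_supplement_model(a, b)])  # insert between each pair
out.append(video[-1])

timestamps = [f[0, 0] for f in out]
```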
7. A training apparatus for a neural-network-based video frame supplementation model, comprising:
a training reference frame determining module, configured to determine a current training reference frame based on a preset training set, the training reference frame comprising a first training frame and a second training frame;
a training reference frame input module, configured to input the training reference frame into a preset initial model, the initial model comprising a feature extraction network, a feature fusion network, and an output network;
a feature extraction module, configured to generate, by the feature extraction network, initial feature maps of the training reference frame at a preset number of levels;
a feature fusion module, configured to fuse, by the feature fusion network, the initial feature maps of the preset number of levels into a fused feature map;
a supplementary frame determining module, configured to input the fused feature map into the output network and output a training supplementary video frame between the first training frame and the second training frame;
a loss value obtaining module, configured to determine a loss value of the training supplementary video frame by a preset prediction loss function; and
a training module, configured to train the initial model according to the loss value until the parameters of the initial model converge, thereby obtaining the video frame supplementation model.
8. A neural-network-based video frame supplementation apparatus, comprising:
a reference frame obtaining module, configured to obtain a first reference frame and a second reference frame of a video to be frame-supplemented;
a supplementary frame generation module, configured to input the first reference frame and the second reference frame into a pre-established video frame supplementation model to generate a supplementary video frame, wherein the video frame supplementation model is obtained by training according to the training method for a neural-network-based video frame supplementation model of any one of claims 1-5; and
a supplementary frame insertion module, configured to insert the supplementary video frame between the first reference frame and the second reference frame.
9. A server, comprising a processor and a memory, wherein the memory stores machine-executable instructions executable by the processor, and the processor executes the machine-executable instructions to implement the steps of the training method for a neural-network-based video frame supplementation model according to any one of claims 1 to 5 or of the neural-network-based video frame supplementation method according to claim 6.
10. A machine-readable storage medium storing machine-executable instructions, wherein, when called and executed by a processor, the machine-executable instructions cause the processor to implement the steps of the training method for a video frame supplementation model according to any one of claims 1 to 5 or of the neural-network-based video frame supplementation method according to claim 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910612434.XA CN110324664B (en) | 2019-07-11 | 2019-07-11 | Video frame supplementing method based on neural network and training method of model thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110324664A true CN110324664A (en) | 2019-10-11 |
CN110324664B CN110324664B (en) | 2021-06-04 |
Family
ID=68123055
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910612434.XA Active CN110324664B (en) | 2019-07-11 | 2019-07-11 | Video frame supplementing method based on neural network and training method of model thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110324664B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090244389A1 (en) * | 2008-03-27 | 2009-10-01 | Nao Mishima | Apparatus, Method, and Computer Program Product for Generating Interpolated Images |
CN102811333A (en) * | 2011-05-30 | 2012-12-05 | Jvc建伍株式会社 | Image processing apparatus and interpolation frame generating method |
CN104463172A (en) * | 2014-12-09 | 2015-03-25 | 中国科学院重庆绿色智能技术研究院 | Face feature extraction method based on face feature point shape drive depth model |
CN106326858A (en) * | 2016-08-23 | 2017-01-11 | 北京航空航天大学 | Road traffic sign automatic identification and management system based on deep learning |
CN108416440A (en) * | 2018-03-20 | 2018-08-17 | 上海未来伙伴机器人有限公司 | A kind of training method of neural network, object identification method and device |
CN109377445A (en) * | 2018-10-12 | 2019-02-22 | 北京旷视科技有限公司 | Model training method, the method, apparatus and electronic system for replacing image background |
CN109544482A (en) * | 2018-11-29 | 2019-03-29 | 厦门美图之家科技有限公司 | A kind of convolutional neural networks model generating method and image enchancing method |
CN109785336A (en) * | 2018-12-18 | 2019-05-21 | 深圳先进技术研究院 | Image partition method and device based on multipath convolutional neural networks model |
US10311337B1 (en) * | 2018-09-04 | 2019-06-04 | StradVision, Inc. | Method and device for providing integrated feature map using ensemble of multiple outputs from convolutional neural network |
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111242081A (en) * | 2020-01-19 | 2020-06-05 | 深圳云天励飞技术有限公司 | Video detection method, target detection network training method, device and terminal equipment |
CN111353428A (en) * | 2020-02-28 | 2020-06-30 | 北京市商汤科技开发有限公司 | Action information identification method and device, electronic equipment and storage medium |
CN111353428B (en) * | 2020-02-28 | 2022-05-24 | 北京市商汤科技开发有限公司 | Action information identification method and device, electronic equipment and storage medium |
CN113875228B (en) * | 2020-04-30 | 2023-06-30 | 京东方科技集团股份有限公司 | Video frame inserting method and device and computer readable storage medium |
CN113875228A (en) * | 2020-04-30 | 2021-12-31 | 京东方科技集团股份有限公司 | Video frame insertion method and device and computer readable storage medium |
CN113630621A (en) * | 2020-05-08 | 2021-11-09 | 腾讯科技(深圳)有限公司 | Video processing method, related device and storage medium |
CN113658230A (en) * | 2020-05-12 | 2021-11-16 | 武汉Tcl集团工业研究院有限公司 | Optical flow estimation method, terminal and storage medium |
CN113658230B (en) * | 2020-05-12 | 2024-05-28 | 武汉Tcl集团工业研究院有限公司 | Optical flow estimation method, terminal and storage medium |
CN111654746A (en) * | 2020-05-15 | 2020-09-11 | 北京百度网讯科技有限公司 | Video frame insertion method and device, electronic equipment and storage medium |
US11363271B2 (en) | 2020-05-15 | 2022-06-14 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method for video frame interpolation, related electronic device and storage medium |
CN111654746B (en) * | 2020-05-15 | 2022-01-21 | 北京百度网讯科技有限公司 | Video frame insertion method and device, electronic equipment and storage medium |
CN112040311A (en) * | 2020-07-24 | 2020-12-04 | 北京航空航天大学 | Video image frame supplementing method, device and equipment and storage medium |
CN111967382A (en) * | 2020-08-14 | 2020-11-20 | 北京金山云网络技术有限公司 | Age estimation method, and training method and device of age estimation model |
CN112422870B (en) * | 2020-11-12 | 2021-09-17 | 复旦大学 | Deep learning video frame insertion method based on knowledge distillation |
CN112422870A (en) * | 2020-11-12 | 2021-02-26 | 复旦大学 | Deep learning video frame insertion method based on knowledge distillation |
CN112330711A (en) * | 2020-11-26 | 2021-02-05 | 北京奇艺世纪科技有限公司 | Model generation method, information extraction method and device and electronic equipment |
CN112330711B (en) * | 2020-11-26 | 2023-12-05 | 北京奇艺世纪科技有限公司 | Model generation method, information extraction device and electronic equipment |
CN112565653B (en) * | 2020-12-01 | 2023-04-07 | 咪咕文化科技有限公司 | Video frame insertion method, system, electronic equipment and storage medium |
CN112565653A (en) * | 2020-12-01 | 2021-03-26 | 咪咕文化科技有限公司 | Video frame insertion method, system, electronic equipment and storage medium |
CN112804561A (en) * | 2020-12-29 | 2021-05-14 | 广州华多网络科技有限公司 | Video frame insertion method and device, computer equipment and storage medium |
WO2022141819A1 (en) * | 2020-12-29 | 2022-07-07 | 广州华多网络科技有限公司 | Video frame insertion method and apparatus, and computer device and storage medium |
CN113132664A (en) * | 2021-04-19 | 2021-07-16 | 科大讯飞股份有限公司 | Frame interpolation generation model construction method and video frame interpolation method |
CN113542651A (en) * | 2021-05-28 | 2021-10-22 | 北京迈格威科技有限公司 | Model training method, video frame interpolation method and corresponding device |
CN113542651B (en) * | 2021-05-28 | 2023-10-27 | 爱芯元智半导体(宁波)有限公司 | Model training method, video frame inserting method and corresponding devices |
CN113837136A (en) * | 2021-09-29 | 2021-12-24 | 深圳市慧鲤科技有限公司 | Video frame insertion method and device, electronic equipment and storage medium |
CN113837136B (en) * | 2021-09-29 | 2022-12-23 | 深圳市慧鲤科技有限公司 | Video frame insertion method and device, electronic equipment and storage medium |
CN114007135B (en) * | 2021-10-29 | 2023-04-18 | 广州华多网络科技有限公司 | Video frame insertion method and device, equipment, medium and product thereof |
CN114007135A (en) * | 2021-10-29 | 2022-02-01 | 广州华多网络科技有限公司 | Video frame insertion method and device, equipment, medium and product thereof |
CN115002379B (en) * | 2022-04-25 | 2023-09-26 | 武汉大学 | Video frame inserting method, training device, electronic equipment and storage medium |
CN115002379A (en) * | 2022-04-25 | 2022-09-02 | 武汉大学 | Video frame insertion method, training method, device, electronic equipment and storage medium |
CN115134676A (en) * | 2022-09-01 | 2022-09-30 | 有米科技股份有限公司 | Video reconstruction method and device for audio-assisted video completion |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110324664A (en) | Neural-network-based video frame supplementation method and training method of the model thereof | |
Niklaus et al. | Video frame interpolation via adaptive separable convolution | |
Xue et al. | Video enhancement with task-oriented flow | |
Jin et al. | Light field spatial super-resolution via deep combinatorial geometry embedding and structural consistency regularization | |
Liu et al. | Video frame synthesis using deep voxel flow | |
Cheng et al. | Multiple video frame interpolation via enhanced deformable separable convolution | |
CN111275626B (en) | Video deblurring method, device and equipment based on ambiguity | |
Giachetti et al. | Real-time artifact-free image upscaling | |
CN103597839B (en) | Video-frequency compression method, video reconstruction method and system and encoder | |
Nazeri et al. | Edge-informed single image super-resolution | |
JP6094863B2 (en) | Image processing apparatus, image processing method, program, integrated circuit | |
US9525858B2 (en) | Depth or disparity map upscaling | |
US20200334894A1 (en) | 3d motion effect from a 2d image | |
WO2021115403A1 (en) | Image processing method and apparatus | |
CN104506872B (en) | A kind of method and device of converting plane video into stereoscopic video | |
CN112543317A (en) | Method for converting high-resolution monocular 2D video into binocular 3D video | |
CN110135576A (en) | A kind of unsupervised learning method for video deblurring | |
CN110443883A (en) | A kind of individual color image plane three-dimensional method for reconstructing based on dropblock | |
CN109785270A (en) | A kind of image super-resolution method based on GAN | |
CN115298708A (en) | Multi-view neural human body rendering | |
CN106415657A (en) | Method and device for enhancing quality of an image | |
CN116248955A (en) | VR cloud rendering image enhancement method based on AI frame extraction and frame supplement | |
CN108924528A (en) | A kind of binocular stylization real-time rendering method based on deep learning | |
CN112634127A (en) | Unsupervised stereo image redirection method | |
CN108769644B (en) | Binocular animation stylized rendering method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||