CN110446071A - Neural network-based multimedia processing method, apparatus, device and medium - Google Patents
- Publication number
- CN110446071A (application CN201910745322.1A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- picture
- amplification factor
- multimedia file
- samples pictures
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234363—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the spatial resolution, e.g. for clients with a lower screen resolution
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440263—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the spatial resolution, e.g. for displaying on a connected PDA
Abstract
The invention discloses a neural network-based multimedia processing method, apparatus, device and medium, belonging to the field of multimedia technology. With the technical solution provided by the invention, resolution amplification at multiple different amplification factors can be achieved with a single neural network, so there is no need to provide a different neural network for each amplification factor, which greatly reduces the storage space required. Flexible multimedia processing can also be achieved by cascading the same neural network, which greatly improves practicability.
Description
Technical field
The present invention relates to the field of multimedia technology, and in particular to a neural network-based multimedia processing method, apparatus, device and medium.
Background
With the development of multimedia technology, users' demands on multimedia are increasingly diverse. For example, there is a demand for adjusting the resolution of a video or image, and adjusting the resolution is likely to cause a loss of detail information. A neural network can therefore be trained to increase the resolution, so that multimedia quality is preserved when the resolution is adjusted.
However, to achieve resolution adjustment at different amplification factors, a different neural network is currently trained for each amplification factor. This not only makes training costly, but the training process is also cumbersome, redundant and very time-consuming, so that the actual efficiency of multimedia processing is very low.
Summary of the invention
The embodiment of the invention provides a kind of multi-media processing method, device, equipment and medium neural network based, solutions
It has determined the low problem of actual efficiency of existing multi-media processing.The technical solution is as follows:
In one aspect, a neural network-based multimedia processing method is provided, the method comprising:
Obtaining a first multimedia file and a target amplification factor of the first multimedia file;
Determining a target number of processing cycles for the first multimedia file according to the target amplification factor and a preset amplification factor;
Inputting the first multimedia file into a neural network corresponding to the preset amplification factor, the neural network being used to perform resolution amplification at the preset amplification factor on an input multimedia file;
Processing the first multimedia file through the neural network, inputting the intermediate multimedia file obtained by the processing into the neural network again, and, when the number of processing cycles reaches the target number of processing cycles, outputting the resulting second multimedia file.
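The cycle count and the cascade described in the method above can be sketched in Python. This is a minimal sketch, not the patented implementation: it assumes that the target amplification factor is an exact power of the preset factor, so that the cycle count is the logarithm of the target factor to the base of the preset factor. The names `target_cycle_count`, `cascade` and `upscale_once` are hypothetical, and `upscale_once` stands in for one pass through the neural network.

```python
import math
from typing import Callable, TypeVar

Media = TypeVar("Media")


def target_cycle_count(target_factor: float, preset_factor: float) -> int:
    """Return n such that preset_factor ** n == target_factor (assumed rule)."""
    if preset_factor <= 1 or target_factor < 1:
        raise ValueError("expected preset_factor > 1 and target_factor >= 1")
    n = round(math.log(target_factor) / math.log(preset_factor))
    # Reject targets that cannot be reached by a whole number of passes.
    if not math.isclose(preset_factor ** n, target_factor, rel_tol=1e-9):
        raise ValueError("target factor is not a power of the preset factor")
    return n


def cascade(media: Media, upscale_once: Callable[[Media], Media], cycles: int) -> Media:
    """Feed the intermediate result back into the same network `cycles` times."""
    for _ in range(cycles):
        media = upscale_once(media)
    return media
```

Under these assumptions, a preset factor of 2 and a target factor of 8 give three cycles, so the same network is applied three times in a row.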
In one aspect, a neural network-based multimedia processing apparatus is provided, the apparatus comprising:
An obtaining module, configured to obtain a first multimedia file and a target amplification factor of the first multimedia file;
A determining module, configured to determine a target number of processing cycles for the first multimedia file according to the target amplification factor and a preset amplification factor;
An input module, configured to input the first multimedia file into a neural network corresponding to the preset amplification factor, the neural network being used to perform resolution amplification at the preset amplification factor on an input multimedia file;
A processing module, configured to process the first multimedia file through the neural network and input the intermediate multimedia file obtained by the processing into the neural network again, until the number of processing cycles reaches the target number of processing cycles, and then output the resulting second multimedia file.
In a possible implementation, a training module is configured to, in any iteration of the training process and for any picture group, execute one of the following training steps at random:
Selecting the second sample picture and the third sample picture in the picture group as training samples, inputting the second sample picture into the neural network, and adjusting the parameters of the neural network based on the output picture corresponding to the second sample picture and on the third sample picture;
Selecting the first sample picture in the picture group, inputting the first sample picture into the neural network, taking the first output picture corresponding to the first sample picture and the third sample picture as training samples, inputting the first output picture into the neural network, and adjusting the parameters of the neural network based on the output picture corresponding to the first output picture and on the third sample picture;
When a training termination condition is met, outputting the neural network used in the current iteration as the neural network.
In a possible implementation, the apparatus further comprises:
A random number generation module, configured to generate, in any iteration of the training process, a random number based on the current iteration number, the random number being uniformly distributed on the interval [0, 1];
When the random number is greater than the value of an objective function, the training module is triggered to execute the step of selecting the second sample picture and the third sample picture in the picture group as training samples for training;
When the random number is less than the value of the objective function, the training module is triggered to execute the step of selecting the first sample picture in the picture group for training;
Wherein the objective function is a monotonically non-increasing function.
In a possible implementation, the first sample picture is obtained by downsampling the second sample picture by the preset amplification factor, and the second sample picture is obtained by downsampling the third sample picture by the preset amplification factor.
In a possible implementation, the preset amplification factor is an integer or a non-integer.
In one aspect, a computer device is provided, comprising one or more processors and one or more memories, the one or more memories storing at least one piece of program code, the at least one piece of program code being loaded and executed by the one or more processors to implement the operations performed by the above neural network-based multimedia processing method.
In one aspect, a computer-readable storage medium is provided, the storage medium storing at least one piece of program code, the at least one piece of program code being loaded and executed by a processor to implement the operations performed by the above neural network-based multimedia processing method.
With the technical solution provided by the invention, resolution amplification at multiple different amplification factors can be achieved with a single neural network, so there is no need to provide a different neural network for each amplification factor, which greatly reduces the storage space required. Flexible multimedia processing can also be achieved by cascading the same neural network, which greatly improves practicability.
Brief description of the drawings
To describe the technical solutions in embodiments of the invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a structural block diagram of the multimedia service system provided by an embodiment of the invention;
Fig. 2 is a flow chart of a training process of the neural network provided by an embodiment of the invention;
Fig. 3 is a schematic diagram of picture groups used for training provided by an embodiment of the invention;
Fig. 4 is a schematic diagram of a training flow provided by an embodiment of the invention;
Fig. 5 is a flow chart of the multimedia processing method provided by an embodiment of the invention;
Fig. 6 is a schematic diagram of the processing flows for several different amplification factors provided by an embodiment of the invention;
Fig. 7 is a structural schematic diagram of a neural network-based multimedia processing apparatus provided by an embodiment of the invention;
Fig. 8 is a structural schematic diagram of a computer device provided by an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in embodiments of the invention are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are some rather than all of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the protection scope of the invention.
Artificial intelligence (AI) is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, with both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing and machine learning/deep learning.
Machine learning (ML) is an interdisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It studies how computers can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to keep improving their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in every field of artificial intelligence. Machine learning and deep learning generally include technologies such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from demonstration.
With the research and progress of artificial intelligence technology, artificial intelligence has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare and smart customer service. It is believed that, as technology develops, artificial intelligence will be applied in more fields and deliver increasingly important value.
The solution provided by embodiments of the invention involves technologies such as neural network-based multimedia processing and machine learning, and is described through the following embodiments:
The implementation scenario involved in embodiments of the invention is introduced below:
Fig. 1 is a structural block diagram of the multimedia service system provided by an embodiment of the invention. The multimedia service system 100 includes a terminal 110 and a multimedia service platform 140.
The terminal 110 is connected to the multimedia service platform 140 through a wireless or wired network. The terminal 110 may be a fixed terminal or a mobile terminal; for example, the terminal may be at least one of a smartphone, a game console, a desktop computer, a tablet computer, an e-book reader, an MP3 player, an MP4 player and a laptop computer. An application that supports multimedia services is installed and runs on the terminal 110. The application may be any one of a social application, an instant messaging application and a multimedia sharing program. Illustratively, the terminal 110 is a terminal used by a user, and a user account can be logged in to the application running on the terminal 110.
The multimedia service platform 140 includes at least one of a single server, multiple servers, a cloud computing platform and a virtualization center. Optionally, the multimedia service platform 140 includes a multimedia server, a multimedia database and a user information database. The multimedia server is used to provide multimedia services to the terminal 110. There may be one or more multimedia servers. When there are multiple multimedia servers, at least two of them provide different multimedia services, and/or at least two of them provide the same service, for example in a load-balancing manner; this is not limited in embodiments of the invention. The multimedia database is used to store the multimedia files of the multimedia service platform, such as audio files, video files and picture files. The user information database is used to provide user-related information, so that personalized service functions can subsequently be provided to the terminal. Of course, the multimedia service platform 140 may also include other functional servers in order to provide more comprehensive and diverse services.
The terminal 110 may refer to one of multiple terminals; this embodiment is illustrated with the terminal 110 only. Those skilled in the art will appreciate that the number of terminals may be larger or smaller. For example, there may be only one terminal, or there may be dozens or hundreds of terminals or more, in which case the multimedia service system also includes other terminals. Embodiments of the invention do not limit the number or device type of the terminals.
The training process of the neural network involved in embodiments of the invention is introduced below:
Fig. 2 is a flow chart of a training process of the neural network provided by an embodiment of the invention. Referring to Fig. 2, the process includes:
201. The server obtains a training data set. The training data set includes multiple picture groups, and each picture group includes a first sample picture, a second sample picture and a third sample picture whose resolutions increase by the preset amplification factor.
Here, resolutions increasing by the preset amplification factor means that the resolutions of the pictures in a group are in a power relationship with respect to the amplification factor. For example, resolutions of r, r*r and r*r*r are said to be in a power relationship with amplification factor r, where r is a positive number.
In a possible implementation, the pictures of a picture group can be obtained by processing one real picture. That is, the first sample picture is obtained by downsampling the second sample picture by the preset amplification factor, and the second sample picture is obtained by downsampling the third sample picture by the preset amplification factor. A real picture may be an original picture or a picture obtained by downsampling an original picture, whereas a picture output by the model is referred to as an output picture, to distinguish the two.
The pictures in a picture group may be stored in the format [x_pre, x, y]. Taking a preset amplification factor of r as an example, in such a picture group x is y downsampled to 1/r, and x_pre is y downsampled to 1/(r*r), where "*" denotes multiplication. Unlike the training data sets used in existing training processes, embodiments of the invention add x_pre, so that the trained model can amplify not only a real picture by the preset factor but also an output picture by the preset factor.
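As an illustration of the [x_pre, x, y] layout, the following Python sketch builds one picture group from a single real picture. It is only a sketch under stated assumptions: pictures are plain lists of rows of numbers, r is an integer, and downsampling is done by block averaging, which is one possible choice that the text does not prescribe. The names `downsample` and `make_picture_group` are hypothetical.

```python
def downsample(picture, r):
    """Shrink an H x W grid by an integer factor r using block averaging (assumed method)."""
    height, width = len(picture), len(picture[0])
    if height % r or width % r:
        raise ValueError("picture size must be divisible by r")
    result = []
    for top in range(0, height, r):
        row = []
        for left in range(0, width, r):
            block = [picture[top + dy][left + dx] for dy in range(r) for dx in range(r)]
            row.append(sum(block) / (r * r))
        result.append(row)
    return result


def make_picture_group(y, r):
    """Return [x_pre, x, y]: x is y at 1/r, x_pre is y at 1/(r*r)."""
    x = downsample(y, r)
    x_pre = downsample(x, r)
    return [x_pre, x, y]
```

With r = 2 and an 8 x 8 picture y, this gives a 4 x 4 picture x and a 2 x 2 picture x_pre, so the three resolutions increase by the preset factor from one picture to the next.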
For example, as shown in Fig. 3, to train a neural network that performs amplification based on a factor of 2, picture groups with the resolutions shown in each bracket in Fig. 3 may be used as training data. Of course, for the training of one neural network, different picture groups may share pictures, that is, a given picture may belong to different picture groups. As shown in Fig. 3, the picture with resolution r/2 may belong to both picture group 1 and picture group 2.
By configuring a training data set with the above structure, the resolutions of the sample pictures used for training fall into three levels that increase by the preset amplification factor. The input of the trained network can therefore be a picture at any of several resolutions, that is, a picture at 1/r resolution, or at 1/(r*r) or 1/(r*r*r) resolution, and in each case the output picture of the network can be an enlarged picture that retains detail information.
202. During training of the neural network, the server generates a random number based on the current iteration number i; the random number is uniformly distributed on the interval [0, 1].
The training process may include multiple iterations. In each iteration, in order to select the corresponding sample pictures at random, a random number may be generated with the iteration number i as the independent variable, where i is a positive integer. Of course, the random number may also be generated by any random number generation algorithm, which is not limited in embodiments of the invention.
203. The server compares the random number with the value of the objective function. When the random number is greater than the value of the objective function, step 204 is executed; when the random number is less than the value of the objective function, step 205 is executed.
Here, the objective function is a monotonically non-increasing function whose independent variable is the iteration number. It can be expressed as ratio = Function(i), where i is an integer greater than 1. A monotonically non-increasing function is one whose value does not increase as its argument increases. The objective function may take, but is not limited to, the following forms:
For example, ratio may be a linear function, which can be expressed as:
ratio = max(ratio_min, k - c*epoch)
ratio may be an inverse sigmoid function, which can be expressed as:
ratio = max(ratio_min, 1/(1 + exp(c*(epoch - k))))
ratio may be an exponential function, which can be expressed as:
ratio = max(ratio_min, k^epoch), k < 1
Here, ratio_min denotes the minimum value of the objective function, which may be a value in [0, 1]; epoch denotes the iteration number; k and c are constants that determine the offset and slope of the function; and max denotes taking the maximum.
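The three forms of the objective function can be written directly in Python. The following is a minimal sketch: the function names and all default values of k, c and ratio_min are illustrative assumptions and are not taken from the text. The exponent in the inverse sigmoid form is capped only to avoid floating-point overflow for very large iteration numbers.

```python
import math


def linear_ratio(epoch, k=1.0, c=0.01, ratio_min=0.1):
    """ratio = max(ratio_min, k - c*epoch)"""
    return max(ratio_min, k - c * epoch)


def inverse_sigmoid_ratio(epoch, k=50.0, c=0.1, ratio_min=0.1):
    """ratio = max(ratio_min, 1/(1 + exp(c*(epoch - k))))"""
    exponent = min(c * (epoch - k), 700.0)  # cap to keep math.exp in range
    return max(ratio_min, 1.0 / (1.0 + math.exp(exponent)))


def exponential_ratio(epoch, k=0.98, ratio_min=0.1):
    """ratio = max(ratio_min, k^epoch), with 0 < k < 1"""
    if not 0.0 < k < 1.0:
        raise ValueError("k must lie strictly between 0 and 1")
    return max(ratio_min, k ** epoch)
```

With positive c, each of these functions does not increase as epoch grows and never falls below ratio_min, which matches the monotonically non-increasing requirement stated above.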
204. The server selects the second sample picture and the third sample picture in the picture group as training samples, inputs the second sample picture into the neural network, and adjusts the parameters of the neural network based on the output picture corresponding to the second sample picture and on the third sample picture.
Taking sample pictures [x_pre, x, y] in a picture group as an example, when the random number is greater than the value of the objective function, the sample pictures corresponding to x and y may be selected as training samples. The sample picture corresponding to x is used as the network input to obtain an output picture, referred to here as y'. A loss function is then computed based on the sample picture corresponding to y and on y', and the parameters of the neural network are adjusted based on the trend of the loss function. Training in this way gives the neural network the ability to perform resolution amplification on real pictures.
205. The server selects the first sample picture in the picture group, inputs the first sample picture into the neural network, takes the first output picture corresponding to the first sample picture and the third sample picture as training samples, inputs the first output picture into the neural network, and adjusts the parameters of the neural network based on the output picture corresponding to the first output picture and on the third sample picture.
Again taking sample pictures [x_pre, x, y] in a picture group as an example, when the random number is less than the value of the objective function, the sample picture corresponding to x_pre may first be input into the current neural network, and the resulting output picture is referred to as x'. Here x' is not a real picture. x' and the sample picture corresponding to y are then used as training samples: x' is used as the network input to obtain y'', a loss function is computed based on y and y'', and the parameters of the neural network are adjusted based on the trend of the loss function. Training in this way gives the neural network the ability to perform resolution amplification on output pictures.
In steps 203 to 205 above, each training iteration selects, with a certain probability, whether to train with a real picture or with an output picture derived from a real picture. In the early stage of training, the value of the objective function is relatively large, and the randomly generated number is less than that value with high probability, so most of the time training uses real pictures, allowing the network to learn the parameters needed to amplify real pictures. As training proceeds, the value of the objective function gradually decreases and the proportion of training that uses output pictures derived from real pictures gradually increases, so that the network transitions from being able to amplify only real pictures to also amplifying its own output pictures. The neural network thus gradually acquires the cascading property and becomes a general-purpose network for resolution amplification.
206. When a training termination condition is met, the server outputs the neural network used in the current iteration as the neural network.
The training termination condition may be that the number of iterations reaches a target number, that the loss function satisfies a preset condition, or that performance measured on a validation data set has not improved for a period of time. The target number may be a preset number of iterations used to decide when training ends, which avoids wasting training resources. The preset condition may be, for example, that the loss value remains constant or stops decreasing for a period of time during training, which indicates that the training has achieved its effect. For the overall training process, refer to the flow chart provided in Fig. 4, which shows one method flow that may be used in an implementation: first prepare the training data set, then construct the neural network; at the i-th iteration, obtain some of the sample pictures in the training data set and decide, based on the random number, which pictures to train on; after training on a batch of sample pictures, check whether the termination condition is met. If it is met, training stops and the trained neural network is output; if not, training continues on the training data set.
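The termination conditions above can be sketched as a small check that runs after each batch. The function name `should_stop` and the parameters `max_iters`, `patience` and `min_delta` are hypothetical; the source does not specify thresholds or how long "a period of time" is.

```python
def should_stop(iteration, val_losses, max_iters=1000, patience=5, min_delta=1e-4):
    """Return True when training should end.

    iteration  -- number of iterations completed so far
    val_losses -- loss values recorded so far (oldest first)
    """
    # Condition 1: the iteration count has reached the target number.
    if iteration >= max_iters:
        return True
    # Condition 2: the loss has not improved over the last `patience` records.
    if len(val_losses) > patience:
        best_before = min(val_losses[:-patience])
        recent_best = min(val_losses[-patience:])
        if recent_best > best_before - min_delta:
            return True
    return False
```

A training loop would call this after each batch and, once it returns True, output the network used in the current iteration.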
In the model training process above, if the number of pictures at each resolution in every picture group is extended, a neural network that can amplify the resolution of video can be obtained by training. That is, when obtaining the training data set for a neural network that performs resolution amplification on video, the [x_pre, x, y] format described in the above embodiment can still be used: the sample picture corresponding to y may be a high-resolution center frame; the sample pictures corresponding to x are the center frame and the N frames before and the N frames after it at 1/r of the resolution of y, 2N+1 frames in total; and x_pre is the center frame and the 2N frames before and the 2N frames after it at 1/r^2 of the resolution of y, 4N+1 frames in total. With a neural network trained on a data set in the [x_pre, x, y] format, the sample pictures corresponding to x can be input and the center frame at the resolution of y reconstructed by the neural network; alternatively, the sample pictures corresponding to x_pre can be input first to reconstruct x' at 1/r resolution, and x' then input so that the neural network reconstructs the center frame at the resolution of y. Here N is an integer greater than 0.
Compared with the related art, in which a model trained for a given amplification factor r is applicable only when the amplification factor is r, the training process provided by the embodiment of the present invention can, at a lower training cost, obtain a neural network that achieves an amplification factor of r^n (n being any positive integer). Furthermore, the neural network obtained by this training process can perform amplification by multiple factors without training networks for several different amplification factors. The parameter count of this single network is much smaller than that of multiple networks, so for the same range of amplification factors the neural network obtained by the embodiment of the present invention has fewer model parameters to store and saves storage space.
Furthermore, the above training process greatly improves the practicality of the model: a single neural network can be applied to multiple amplification ranges, so that the neural network serves as a flexible, general-purpose module that achieves larger amplification ranges through cascading.
In one possible implementation, the neural network may be designed on the basis of, for example, VESPCN or VSR-DUF, so that the models for all amplification factors related by powers are reduced to a single model, greatly reducing the number of models that need to be trained. VESPCN is the work of Wenzhe Shi and Jose Caballero of the start-up Magic Pony Technology. Optionally, to support non-integer amplification factors, the neural network may also be based on the Meta-SR model, in which one model handles resolution amplification for a range of non-integer amplification factors, so that a single model is applicable to arbitrary amplification factors.
The use of the neural network, that is, the multimedia processing procedure, is described below with reference to Fig. 5. The method may be applied in a server or a computer device: for example, the server may carry out the multimedia processing method with the trained neural network, or the trained neural network may be sent to any computer device to perform the multimedia processing. The embodiment of the present invention is described only with a computer device as the executing entity. The method includes:
501. The computer device obtains a first multimedia file and a target amplification factor of the first multimedia file.
The first multimedia file may be a picture or a video, which is not limited in the embodiment of the present invention. The target amplification factor is the factor by which the first multimedia file is to be amplified. It may be a preset factor; for example, for a given multimedia file, the target amplification factor for resolution is 4. The target amplification factor may also be computed from the current resolution and a target resolution: if the current resolution of the first multimedia file is x and the user selects a target resolution of x*r, the target amplification factor r is obtained by calculation. Here the resolution x is a positive number.
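As a small worked example of deriving the target amplification factor from two resolutions, the sketch below divides the target resolution by the current one. Representing a resolution as a (width, height) pair and the name `target_factor` are assumptions made for illustration; the source treats the resolution as a single value x.

```python
from fractions import Fraction


def target_factor(current, target):
    """Return the amplification factor between two (width, height) resolutions."""
    factor_w = Fraction(target[0], current[0])
    factor_h = Fraction(target[1], current[1])
    if factor_w != factor_h:
        # A single factor r only exists when both axes scale equally.
        raise ValueError("width and height ratios differ")
    return factor_w


r = target_factor((960, 540), (3840, 2160))
```

Using `Fraction` keeps the ratio exact, which matters later when checking whether the factor is an exact power of a preset factor.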
502. The computer device determines a target number of processing cycles for the first multimedia file according to the target amplification factor and a preset amplification factor.
The preset amplification factor may be the amplification factor supported by a neural network model preconfigured in the computer device. If several neural network models, each corresponding to a different preset amplification factor, are configured in the computer device, the target amplification factor can be compared with the preset amplification factors one by one, and the preset amplification factor for which the ratio is an integer is used as the basis for this calculation.
After obtaining the target amplification factor, the computer device can determine how many processing cycles are needed. It obtains the power relationship between the target amplification factor and the preset amplification factor and takes the exponent as the target number of processing cycles for the first multimedia file. For example, if the target amplification factor is 4 and the preset amplification factor is 2, the target number of processing cycles is 2.
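The cycle-count calculation can be sketched as follows, assuming integer amplification factors. The function names `cycle_count` and `pick_preset` are hypothetical, and the selection rule used here (take the first preset factor of which the target is an exact power) is one reading of the comparison described above.

```python
def cycle_count(target, preset):
    """Return k such that preset ** k == target, or None if no such k exists."""
    if preset <= 1 or target < preset:
        return None
    cycles, value = 0, 1
    while value < target:
        value *= preset
        cycles += 1
    return cycles if value == target else None


def pick_preset(target, presets):
    """Try each preset factor in turn; return (preset, cycles) or None."""
    for preset in presets:
        cycles = cycle_count(target, preset)
        if cycles is not None:
            return preset, cycles
    return None
```

For instance, `cycle_count(4, 2)` gives 2 and `cycle_count(8, 2)` gives 3, matching the examples in the text. Integer multiplication is used instead of a floating-point logarithm to avoid rounding errors.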
503. The computer device inputs the first multimedia file into the neural network corresponding to the preset amplification factor; the neural network is used to amplify the resolution of an input multimedia file by the preset amplification factor.
504. The computer device processes the first multimedia file through the neural network and inputs the resulting intermediate multimedia file into the neural network again, until the number of processing cycles reaches the target number of processing cycles; it then stops and outputs the second multimedia file obtained when processing stops.
In response to an amplification request, the computer device may call the neural network corresponding to the preset amplification factor that is preconfigured in the computer device, take the first multimedia file as the network input, and feed the network output back into the neural network as its input. This is repeated until the number of processing cycles reaches the target number of processing cycles, at which point the resulting second multimedia file is output. Continuing the example with a target amplification factor of 4, a preset amplification factor of 2 and a target number of processing cycles of 2, see the second input/output flow in Fig. 6: after the first multimedia file is input into the network, the output multimedia file is input into the neural network again, and the multimedia file output this time is output as the second multimedia file. If the target amplification factor is 8, the target number of processing cycles is 3; see the third input/output flow in Fig. 6: after the first multimedia file is input into the network, the output multimedia file is input into the neural network again, the multimedia file output this time is once more input into the neural network, and the resulting multimedia file is output as the second multimedia file.
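The cyclic processing of steps 503 and 504 amounts to applying the same network repeatedly. In the minimal sketch below, `upscale2` is a nearest-neighbour stand-in for a trained x2 network and `process` is a hypothetical name; neither comes from the source.

```python
def upscale2(img):
    """Stand-in for the trained x2 network: nearest-neighbour upscaling."""
    return [[px for px in row for _ in range(2)] for row in img for _ in range(2)]


def process(net, media, cycles):
    """Feed the output of `net` back into `net` until `cycles` passes are done."""
    out = media
    for _ in range(cycles):
        out = net(out)   # intermediate result becomes the next input
    return out


first = [[7, 8]]                      # a 1x2 "picture"
second = process(upscale2, first, 3)  # three x2 passes give x8 overall
```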
It should be noted that when the multimedia file is a video, the corresponding neural network is trained on pictures that are video frames, and when the multimedia file is a picture, the corresponding neural network may be trained on pictures. Therefore, the file type of the multimedia file may also be identified before step 502, so that the neural network corresponding to the file type is called for multimedia processing.
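One simple way to identify the file type is from the file name, as in the sketch below, which uses Python's standard `mimetypes` module. Detecting the type by extension, the helper names, and the placeholder strings standing in for the two networks are all assumptions; the source does not say how the type is identified.

```python
import mimetypes


def media_kind(filename):
    """Return "image", "video", or None based on the guessed MIME type."""
    mime, _encoding = mimetypes.guess_type(filename)
    if mime is None:
        return None
    top_level = mime.split("/")[0]
    return top_level if top_level in ("image", "video") else None


def pick_network(filename, networks):
    """Select the network registered for the file's media kind."""
    kind = media_kind(filename)
    if kind is None:
        raise ValueError("unsupported file type: " + filename)
    return networks[kind]


# Placeholder values; real entries would be the trained networks.
networks = {"image": "picture_network", "video": "video_network"}
chosen = pick_network("clip.mp4", networks)
```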
Through the above multimedia processing procedure, resolution amplification by several different amplification factors can be achieved with one neural network, without configuring a different neural network for each amplification factor, which greatly reduces the storage space required. Flexible multimedia processing is achieved by cascading the same neural network, which greatly improves practicality.
Take a video application as an example. When a user watches video resources with a video application, high-definition video must be requested from the server continuously. Downloading high-definition video consumes the user's mobile data traffic, and delay during transmission can cause a poor experience such as stuttering while the user watches the video. With the technical solution provided by the embodiment of the present invention, the user can choose both the resolution for watching the video and the resolution of the video downloaded from the server. When the downloaded resolution is lower than the viewing resolution, the video application can compute the ratio of the two and select a suitable amplification factor to perform super-resolution on the video, that is, carry out the multimedia processing procedure to amplify the resolution. In this way, users with poor network conditions or who want to save data can choose a lower download resolution and still watch video at a higher resolution, saving data and avoiding problems such as stuttering while achieving a better audio-visual effect. By choosing suitable resolutions, the user can find the best trade-off among viewing cost, smoothness and image quality.
Table 1 below shows experimental data for resolution amplification based on different methods. The values in the table are PSNR values; the larger the PSNR value, the better the performance.
Table 1
In Table 1, apart from the method provided by the embodiment of the present invention, the other methods train two independent networks for x2 and x4, whereas the method provided by the embodiment of the present invention trains only one neural network and achieves the x4 amplification factor by reusing that network. The data in the table show that the training method proposed in this technical solution achieves resolution amplification with performance comparable to the earlier methods, which demonstrates the feasibility of the cascade structure.
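For reference, PSNR (peak signal-to-noise ratio) is conventionally computed as 10 * log10(MAX^2 / MSE). The sketch below uses that standard definition; the default peak value of 255 assumes 8-bit pictures, and the source does not state which settings were used for Table 1.

```python
import math


def psnr(reference, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D pictures."""
    ref_flat = [p for row in reference for p in row]
    test_flat = [p for row in test for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(ref_flat, test_flat)) / len(ref_flat)
    if mse == 0:
        return math.inf          # identical pictures
    return 10 * math.log10(max_val ** 2 / mse)
```

A smaller mean squared error gives a larger PSNR, which is why larger values in the table indicate better reconstruction.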
Fig. 7 is a schematic structural diagram of a neural-network-based multimedia processing apparatus provided by an embodiment of the present invention. Referring to Fig. 7, the apparatus includes:
an obtaining module 701, configured to obtain a first multimedia file and a target amplification factor of the first multimedia file;
a determining module 702, configured to determine a target number of processing cycles for the first multimedia file according to the target amplification factor and a preset amplification factor;
an input module 703, configured to input the first multimedia file into the neural network corresponding to the preset amplification factor, the neural network being used to amplify the resolution of an input multimedia file by the preset amplification factor;
a processing module 704, configured to process the first multimedia file through the neural network, input the resulting intermediate multimedia file into the neural network again, and, when the number of processing cycles reaches the target number of processing cycles, output the resulting second multimedia file.
In one possible implementation, the apparatus further includes:
a training module, configured to obtain a training data set, the training data set including multiple picture groups, each picture group including a first sample picture, a second sample picture and a third sample picture whose resolutions increase by the preset amplification factor; and to train a neural network based on the multiple picture groups to obtain the neural network corresponding to the preset amplification factor.
In one possible implementation, the training module is configured to, in any iteration of the training process and for any picture group, randomly execute one of the following training steps:
selecting the second sample picture and the third sample picture in the picture group as a training sample, inputting the second sample picture into the neural network, and adjusting the parameters of the neural network based on the third sample picture and the output picture corresponding to the second sample picture;
selecting the first sample picture in the picture group, inputting the first sample picture into the neural network, taking the first output picture corresponding to the first sample picture and the third sample picture as a training sample, inputting the first output picture into the neural network, and adjusting the parameters of the neural network based on the third sample picture and the output picture corresponding to the first output picture;
and, when a training termination condition is met, to output the neural network used in the current iteration as the neural network.
In one possible implementation, the apparatus further includes:
a random number generation module, configured to generate, in any iteration of the training process, a random number based on the current iteration number, the random number being uniformly distributed on the interval [0, 1];
when the random number is greater than the value of the objective function, the training module is triggered to execute the step of selecting the second sample picture and the third sample picture in the picture group as a training sample for training;
when the random number is less than the value of the objective function, the training module is triggered to execute the step of selecting the first sample picture in the picture group for training;
wherein the objective function is a monotone non-increasing function.
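The selection rule in this implementation can be written down directly. In the sketch below, the linearly decaying `objective` is only a hypothetical example of a monotone non-increasing function, since the source does not give its form, and sending the tie case (random number equal to the objective value) to the first-sample branch is likewise an assumption.

```python
import random


def objective(iteration, total_iters):
    """Hypothetical schedule: decays linearly from 1 to 0 and never increases."""
    return max(0.0, 1.0 - iteration / total_iters)


def select_step(rand, f_value):
    """Choose the training step from the random number and the objective value."""
    if rand > f_value:
        return "second_and_third"   # train on the second and third sample pictures
    return "first"                  # start from the first sample picture (ties too)


rng = random.Random(0)              # fixed seed only to make the sketch repeatable
values = [objective(i, 100) for i in range(0, 101, 10)]
choice = select_step(rng.uniform(0.0, 1.0), objective(50, 100))
```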
In one possible implementation, the first sample picture is the second sample picture down-sampled by the preset amplification factor, and the second sample picture is the third sample picture down-sampled by the preset amplification factor.
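Building one picture group from a high-resolution picture can be sketched as two successive down-sampling passes. Average pooling is used here purely as an assumed down-sampling method, and the names `downsample` and `make_picture_group` are hypothetical; the source only states that each picture is a down-sampled version of the next.

```python
def downsample(img, r):
    """Average-pool r x r blocks; picture dimensions must be divisible by r."""
    height, width = len(img), len(img[0])
    if height % r or width % r:
        raise ValueError("picture size must be divisible by r")
    return [
        [
            sum(img[i * r + di][j * r + dj] for di in range(r) for dj in range(r)) / (r * r)
            for j in range(width // r)
        ]
        for i in range(height // r)
    ]


def make_picture_group(y, r):
    """Return [x_pre, x, y]: y down-sampled twice, once, and not at all."""
    x = downsample(y, r)        # second sample picture, 1/r of y
    x_pre = downsample(x, r)    # first sample picture, 1/r^2 of y
    return [x_pre, x, y]


y = [[float(4 * i + j) for j in range(4)] for i in range(4)]
x_pre, x, _ = make_picture_group(y, 2)
```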
In one possible implementation, the preset amplification factor is an integer or a non-integer.
All of the above optional technical solutions may be combined in any manner to form optional embodiments of the present invention, which are not described one by one here.
It should be understood that, when the neural-network-based multimedia processing apparatus provided by the above embodiment performs multimedia processing, the division into the functional modules described above is only an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the neural-network-based multimedia processing apparatus provided by the above embodiment and the embodiments of the neural-network-based multimedia processing method belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
Fig. 8 is a schematic structural diagram of a computer device provided by an embodiment of the present invention. The computer device may be provided as a server. The computer device may vary considerably with configuration or performance, and may include one or more processors (central processing units, CPU) 801 and one or more memories 802, where at least one instruction is stored in the memory 802 and is loaded and executed by the processor 801 to implement the neural-network-based multimedia processing method provided by the method embodiments above. The computer device may also have components such as a wired or wireless network interface, a keyboard and an input/output interface for input and output, and may further include other components for implementing device functions, which are not described here.
In an exemplary embodiment, a computer-readable storage medium is also provided, for example a memory including program code, where the program code can be executed by a processor in a computer device to complete the neural-network-based multimedia processing method in the above embodiments. For example, the computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, and the like.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments may be completed by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A neural-network-based multimedia processing method, characterized in that the method comprises:
obtaining a first multimedia file and a target amplification factor of the first multimedia file;
determining a target number of processing cycles for the first multimedia file according to the target amplification factor and a preset amplification factor;
inputting the first multimedia file into a neural network corresponding to the preset amplification factor, the neural network being used to amplify the resolution of an input multimedia file by the preset amplification factor;
processing the first multimedia file through the neural network, inputting the resulting intermediate multimedia file into the neural network again, and, when the number of processing cycles reaches the target number of processing cycles, outputting the resulting second multimedia file.
2. The method according to claim 1, characterized in that the training process of the neural network corresponding to the preset amplification factor comprises:
obtaining a training data set, the training data set comprising multiple picture groups, each picture group comprising a first sample picture, a second sample picture and a third sample picture whose resolutions increase by the preset amplification factor;
training a neural network based on the multiple picture groups to obtain the neural network corresponding to the preset amplification factor.
3. The method according to claim 2, characterized in that training the neural network based on the multiple picture groups to obtain the neural network corresponding to the preset amplification factor comprises:
in any iteration of the training process, for any picture group, randomly executing one of the following training steps:
selecting the second sample picture and the third sample picture in the picture group as a training sample, inputting the second sample picture into the neural network, and adjusting the parameters of the neural network based on the third sample picture and the output picture corresponding to the second sample picture;
selecting the first sample picture in the picture group, inputting the first sample picture into the neural network, taking the first output picture corresponding to the first sample picture and the third sample picture as a training sample, inputting the first output picture into the neural network, and adjusting the parameters of the neural network based on the third sample picture and the output picture corresponding to the first output picture;
when a training termination condition is met, outputting the neural network used in the current iteration as the neural network.
4. The method according to claim 3, characterized in that the method further comprises:
in any iteration of the training process, generating a random number based on the current iteration number, the random number being uniformly distributed on the interval [0, 1];
when the random number is greater than the value of an objective function, executing the step of selecting the second sample picture and the third sample picture in the picture group as a training sample for training;
when the random number is less than the value of the objective function, executing the step of selecting the first sample picture in the picture group for training;
wherein the objective function is a monotone non-increasing function.
5. The method according to claim 2, characterized in that the first sample picture is the second sample picture down-sampled by the preset amplification factor, and the second sample picture is the third sample picture down-sampled by the preset amplification factor.
6. The method according to claim 1, characterized in that the preset amplification factor is an integer or a non-integer.
7. A neural-network-based multimedia processing apparatus, characterized in that the apparatus comprises:
an obtaining module, configured to obtain a first multimedia file and a target amplification factor of the first multimedia file;
a determining module, configured to determine a target number of processing cycles for the first multimedia file according to the target amplification factor and a preset amplification factor;
an input module, configured to input the first multimedia file into a neural network corresponding to the preset amplification factor, the neural network being used to amplify the resolution of an input multimedia file by the preset amplification factor;
a processing module, configured to process the first multimedia file through the neural network, input the resulting intermediate multimedia file into the neural network again, and, when the number of processing cycles reaches the target number of processing cycles, output the resulting second multimedia file.
8. The apparatus according to claim 7, characterized in that the apparatus further comprises:
a training module, configured to obtain a training data set, the training data set comprising multiple picture groups, each picture group comprising a first sample picture, a second sample picture and a third sample picture whose resolutions increase by the preset amplification factor; and to train a neural network based on the multiple picture groups to obtain the neural network corresponding to the preset amplification factor.
9. A computer device, characterized in that the computer device comprises one or more processors and one or more memories, at least one piece of program code being stored in the one or more memories, the at least one piece of program code being loaded and executed by the one or more processors to implement the operations performed by the neural-network-based multimedia processing method according to any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that at least one piece of program code is stored in the storage medium, the at least one piece of program code being loaded and executed by a processor to implement the operations performed by the neural-network-based multimedia processing method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910745322.1A CN110446071A (en) | 2019-08-13 | 2019-08-13 | Multi-media processing method, device, equipment and medium neural network based |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910745322.1A CN110446071A (en) | 2019-08-13 | 2019-08-13 | Multi-media processing method, device, equipment and medium neural network based |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110446071A true CN110446071A (en) | 2019-11-12 |
Family
ID=68435081
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910745322.1A Pending CN110446071A (en) | 2019-08-13 | 2019-08-13 | Multi-media processing method, device, equipment and medium neural network based |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110446071A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101841641A (en) * | 2010-03-29 | 2010-09-22 | 中山大学 | Video amplification method and system based on subdivision method |
CN105260986A (en) * | 2015-10-13 | 2016-01-20 | 武汉大学 | Anti-fuzzy image amplification method |
CN107358575A (en) * | 2017-06-08 | 2017-11-17 | 清华大学 | A kind of single image super resolution ratio reconstruction method based on depth residual error network |
CN108600782A (en) * | 2018-04-08 | 2018-09-28 | 深圳市零度智控科技有限公司 | Video super-resolution method, device and computer readable storage medium |
US20190045168A1 (en) * | 2018-09-25 | 2019-02-07 | Intel Corporation | View interpolation of multi-camera array images with flow estimation and image super resolution using deep learning |
US20190139205A1 (en) * | 2017-11-09 | 2019-05-09 | Samsung Electronics Co., Ltd. | Method and apparatus for video super resolution using convolutional neural network with two-stage motion compensation |
CN110363709A (en) * | 2019-07-23 | 2019-10-22 | 腾讯科技(深圳)有限公司 | A kind of image processing method, image presentation method, model training method and device |
Non-Patent Citations (1)
Title |
---|
孙跃文等: "基于深度学习的辐射图像超分辨率重建方法", 《原子能科学技术》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6355800B1 (en) | Learning device, generating device, learning method, generating method, learning program, and generating program | |
CN110310628B (en) | Method, device and equipment for optimizing wake-up model and storage medium | |
CN110085244B (en) | Live broadcast interaction method and device, electronic equipment and readable storage medium | |
CN107392783A (en) | Social contact method and device based on virtual reality | |
CN106406522B (en) | Virtual reality scene content adjusting method and device | |
WO2019084560A1 (en) | Neural architecture search | |
CN111476871A (en) | Method and apparatus for generating video | |
CN106471572B (en) | Method, system and the robot of a kind of simultaneous voice and virtual acting | |
CN117238451B (en) | Training scheme determining method, device, electronic equipment and storage medium | |
KR20200057823A (en) | Apparatus for video data argumentation and method for the same | |
JP7030095B2 (en) | Methods, devices, servers, computer-readable storage media and computer programs for generating narration | |
CN112307258B (en) | Short video click rate prediction method based on double-layer capsule network | |
CN109961152A (en) | Personalized interactive method, system, terminal device and the storage medium of virtual idol | |
CN117808946A (en) | Method and system for constructing secondary roles based on large language model | |
CN110990632B (en) | Video processing method and device | |
CN110446071A (en) | Multi-media processing method, device, equipment and medium neural network based | |
CN111294662B (en) | Barrage generation method, device, equipment and storage medium | |
CN109885668A (en) | A kind of expansible field interactive system status tracking method and apparatus | |
CN111760276B (en) | Game behavior control method, device, terminal, server and storage medium | |
CN109472028A (en) | Method and apparatus for generating information | |
CN111949860B (en) | Method and apparatus for generating a relevance determination model | |
US10699127B1 (en) | Method and apparatus for adjusting parameter | |
Rahman et al. | Wealth adjustment using a synergy between communication, cooperation, and one-fifth of wealth variables in an artificial society | |
CN117348736B (en) | Digital interaction method, system and medium based on artificial intelligence | |
CN110222190A (en) | Data enhancement methods, system, equipment and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20191112 |