CN110446071A - Neural network-based multimedia processing method, apparatus, device and medium

Info

Publication number
CN110446071A
Authority
CN
China
Prior art keywords
neural network
picture
amplification factor
multimedia file
samples pictures
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910745322.1A
Other languages
Chinese (zh)
Inventor
陈为
张博文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234363 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the spatial resolution, e.g. for clients with a lower screen resolution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440263 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the spatial resolution, e.g. for displaying on a connected PDA

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a neural network-based multimedia processing method, apparatus, device and medium, belonging to the field of multimedia technology. With the technical solution provided by the invention, a single neural network can upscale resolution at multiple different magnification factors, so there is no need to provide a separate neural network for each magnification factor, which greatly reduces the required storage space. Flexible multimedia processing can also be achieved by cascading the same neural network, which greatly improves practicality.

Description

Neural network-based multimedia processing method, apparatus, device and medium
Technical field
The present invention relates to the field of multimedia technology, and in particular to a neural network-based multimedia processing method, apparatus, device and medium.
Background
With the development of multimedia technology, users' demands on multimedia are increasingly diverse. For example, there is a demand for adjusting the resolution of videos or images. Adjusting the resolution is likely to cause a loss of detail information, so a neural network can be trained to increase the resolution, so that multimedia quality is preserved when the resolution is adjusted.
However, to support resolution adjustment at different magnification factors, a different neural network is currently trained for each magnification factor. This not only incurs a high training cost, but the training process is also cumbersome, redundant and very time-consuming, so the practical efficiency of multimedia processing is very low.
Summary of the invention
The embodiments of the present invention provide a neural network-based multimedia processing method, apparatus, device and medium, which solve the problem of the low practical efficiency of existing multimedia processing. The technical solution is as follows:
In one aspect, a neural network-based multimedia processing method is provided, the method comprising:
obtaining a first multimedia file and a target magnification factor of the first multimedia file;
determining a target number of loop iterations for the first multimedia file according to the target magnification factor and a preset magnification factor;
inputting the first multimedia file into the neural network corresponding to the preset magnification factor, the neural network being used to upscale the resolution of an input multimedia file by the preset magnification factor;
processing the first multimedia file through the neural network and inputting the intermediate multimedia file obtained by the processing into the neural network again, until the number of loop iterations reaches the target number of loop iterations, and then outputting the resulting second multimedia file.
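The loop in the method above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the target magnification factor is an integer power of the preset factor, and it substitutes a nearest-neighbour upscaler (the hypothetical `nearest_upscale`) for the trained neural network. The names `target_loop_count` and `cascade_upscale` are likewise illustrative.

```python
import math


def target_loop_count(target_factor, preset_factor):
    """Number of passes n such that preset_factor ** n == target_factor.

    Assumption: the target factor is an integer power of the preset factor.
    """
    if preset_factor <= 1 or target_factor <= 1:
        raise ValueError("factors must be greater than 1")
    n = round(math.log(target_factor) / math.log(preset_factor))
    if n < 1 or not math.isclose(preset_factor ** n, target_factor):
        raise ValueError("target factor is not a power of the preset factor")
    return n


def nearest_upscale(image, r):
    """Stand-in for the trained network: repeat every pixel r times per axis."""
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(r)]
        out.extend(list(wide) for _ in range(r))
    return out


def cascade_upscale(image, target_factor, preset_factor, network=nearest_upscale):
    """Feed the intermediate result back into the same network n times."""
    current = image
    for _ in range(target_loop_count(target_factor, preset_factor)):
        current = network(current, preset_factor)
    return current


small = [[1, 2], [3, 4]]
big = cascade_upscale(small, target_factor=4, preset_factor=2)
print(len(big), len(big[0]))  # 8 8
```

With a preset factor of 2 and a target factor of 4, the same 2x stand-in network runs twice, so a 2x2 input becomes 8x8.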
In another aspect, a neural network-based multimedia processing apparatus is provided, the apparatus comprising:
an obtaining module, configured to obtain a first multimedia file and a target magnification factor of the first multimedia file;
a determining module, configured to determine a target number of loop iterations for the first multimedia file according to the target magnification factor and a preset magnification factor;
an input module, configured to input the first multimedia file into the neural network corresponding to the preset magnification factor, the neural network being used to upscale the resolution of an input multimedia file by the preset magnification factor;
a processing module, configured to process the first multimedia file through the neural network and input the intermediate multimedia file obtained by the processing into the neural network again, until the number of loop iterations reaches the target number of loop iterations, and then output the resulting second multimedia file.
In a possible implementation, a training module is configured to, in any iteration of the training process, perform one of the following training steps at random for any picture group:
selecting the second sample picture and the third sample picture in the picture group as a training sample, inputting the second sample picture into the neural network, and adjusting the parameters of the neural network based on the output picture corresponding to the second sample picture and the third sample picture;
selecting the first sample picture in the picture group, inputting the first sample picture into the neural network, taking the first output picture corresponding to the first sample picture and the third sample picture as a training sample, inputting the first output picture into the neural network, and adjusting the parameters of the neural network based on the output picture corresponding to the first output picture and the third sample picture;
when a training termination condition is met, outputting the neural network used in the current iteration as the neural network.
In a possible implementation, the apparatus further comprises:
a random number generation module, configured to generate, in any iteration of the training process, a random number based on the current iteration number, the random number being uniformly distributed on the interval [0, 1];
when the random number is greater than the value of the objective function, the training module is triggered to perform the step of selecting the second sample picture and the third sample picture in the picture group as a training sample for training;
when the random number is less than the value of the objective function, the training module is triggered to perform the step of selecting the first sample picture in the picture group for training;
wherein the objective function is a monotonically non-increasing function.
In a possible implementation, the first sample picture is obtained by downsampling the second sample picture by the preset magnification factor, and the second sample picture is obtained by downsampling the third sample picture by the preset magnification factor.
In a possible implementation, the preset magnification factor is an integer or a non-integer.
In another aspect, a computer device is provided. The computer device includes one or more processors and one or more memories. The one or more memories store at least one piece of program code, which is loaded and executed by the one or more processors to implement the operations performed by the neural network-based multimedia processing method described above.
In another aspect, a computer-readable storage medium is provided. The storage medium stores at least one piece of program code, which is loaded and executed by a processor to implement the operations performed by the neural network-based multimedia processing method described above.
With the technical solution provided by the invention, a single neural network can upscale resolution at multiple different magnification factors, so there is no need to provide a separate neural network for each magnification factor, which greatly reduces the required storage space. Flexible multimedia processing can also be achieved by cascading the same neural network, which greatly improves practicality.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a structural block diagram of a multimedia service system provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a neural network training process provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of picture groups used for training, provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of a training flow provided by an embodiment of the present invention;
Fig. 5 is a flowchart of a multimedia processing method provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of the processing flows for several different magnification factors, provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a neural network-based multimedia processing apparatus provided by an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a computer device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Artificial intelligence (AI) comprises the theories, methods, technologies and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best result. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines can perceive, reason and make decisions.
Artificial intelligence is an interdisciplinary field that covers a wide range of areas, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating/interaction systems and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing and machine learning/deep learning.
Machine learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other subjects. It studies how computers can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to keep improving their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout every area of artificial intelligence. Machine learning and deep learning generally include technologies such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from demonstration.
With the research and progress of artificial intelligence technology, it is being studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare and smart customer service. As the technology develops, artificial intelligence is expected to be applied in more fields and to deliver increasingly important value.
The solution provided by the embodiments of the present invention involves technologies such as neural network-based multimedia processing and machine learning, and is described through the following embodiments.
The implementation scenario involved in the embodiments of the present invention is introduced first.
Fig. 1 is a structural block diagram of a multimedia service system provided by an embodiment of the present invention. The multimedia service system 100 includes a terminal 110 and a multimedia service platform 140.
The terminal 110 is connected to the multimedia service platform 140 through a wireless or wired network. The terminal 110 may be a fixed terminal or a mobile terminal; for example, it may be at least one of a smartphone, a game console, a desktop computer, a tablet computer, an e-book reader, an MP3 player, an MP4 player and a laptop computer. An application that supports multimedia services is installed and runs on the terminal 110. The application may be any one of a social application, an instant messaging application and a multimedia sharing program. Illustratively, the terminal 110 is a terminal used by a user, and a user account can be logged in to the application running on the terminal 110.
The multimedia service platform 140 includes at least one of a single server, multiple servers, a cloud computing platform and a virtualization center. Optionally, the multimedia service platform 140 includes a multimedia server, a multimedia database and a user information database. The multimedia server provides multimedia services to the terminal 110. There may be one or more multimedia servers. When there are multiple multimedia servers, at least two of them may provide different multimedia services, and/or at least two of them may provide the same service, for example in a load-balanced manner; this is not limited in the embodiments of the present invention. The multimedia database stores the multimedia files of the multimedia service platform, such as audio files, video files and picture files. The user information database provides user-related information so that personalized service functions can subsequently be provided for the terminal. Of course, the multimedia service platform may also include other functional servers in order to provide more comprehensive and diverse services.
The terminal 110 may refer to any one of multiple terminals; this embodiment uses only the terminal 110 as an example. A person skilled in the art will appreciate that the number of terminals may be larger or smaller. For example, there may be only one terminal, or there may be tens or hundreds of terminals or more, in which case the multimedia service system also includes other terminals. The embodiments of the present invention do not limit the number or device type of the terminals.
The training process of the neural network involved in the embodiments of the present invention is introduced below.
Fig. 2 is a flowchart of a neural network training process provided by an embodiment of the present invention. Referring to Fig. 2, the process includes:
201. The server obtains a training data set. The training data set includes multiple picture groups, and each picture group includes a first sample picture, a second sample picture and a third sample picture whose resolutions increase by the preset magnification factor.
Here, resolutions that increase by the magnification factor means that the resolutions of a group of pictures are related by powers of the magnification factor. For example, r, r*r and r*r*r can be said to have a power relationship with magnification factor r, where r is a positive number.
In a possible implementation, the pictures of a picture group can be obtained by processing one real picture. That is, the first sample picture is obtained by downsampling the second sample picture by the preset magnification factor, and the second sample picture is obtained by downsampling the third sample picture by the preset magnification factor. A real picture here refers to an original picture or a picture obtained by downsampling an original picture, whereas a picture output by the model is called an output picture, to distinguish the two.
The pictures in a picture group can be stored in the format [x_pre, x, y]. Taking a preset magnification factor of r as an example, x is y downsampled to 1/r of its resolution, and x_pre is y downsampled to 1/(r*r) of its resolution, where "*" denotes multiplication. Unlike the training data sets used in existing training processes, the embodiment of the present invention adds x_pre, so that the trained model can upscale by the preset factor not only a real picture but also an output picture.
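A picture group in the [x_pre, x, y] format could be built from one real picture as sketched below. This is only an illustration under stated assumptions: r is taken to be an integer, the picture is a grayscale list of rows whose sides are divisible by r*r, and block averaging (the hypothetical `downsample`) stands in for whatever downsampling filter is actually used, which the text does not specify.

```python
def downsample(image, r):
    """Shrink a grayscale image by integer factor r using r x r block averages."""
    height, width = len(image), len(image[0])
    if height % r or width % r:
        raise ValueError("image sides must be divisible by r")
    out = []
    for top in range(0, height, r):
        row = []
        for left in range(0, width, r):
            block = [image[top + dy][left + dx]
                     for dy in range(r) for dx in range(r)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def make_picture_group(y, r):
    """Return [x_pre, x, y]: x is y at 1/r resolution, x_pre is y at 1/(r*r)."""
    x = downsample(y, r)
    x_pre = downsample(x, r)
    return [x_pre, x, y]


# A 4x4 real picture with pixel values 0..15.
y = [[float(row * 4 + col) for col in range(4)] for row in range(4)]
x_pre, x, _ = make_picture_group(y, r=2)
print(len(x), len(x_pre))  # 2 1
```

Because x_pre is derived from x rather than directly from y, the three pictures are related by successive downsampling steps, which matches the relationship between the first, second and third sample pictures described above.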
For example, as shown in Fig. 3, to train a neural network that performs 2x upscaling, picture groups with the resolutions shown in each bracket in Fig. 3 can be used as training data. Of course, for the training of one neural network, different picture groups may share pictures; that is, a given picture may belong to more than one picture group. As shown in Fig. 3, the picture with resolution r/2 can belong to both picture group 1 and picture group 2.
With a training data set configured in this way, the resolutions of the sample pictures used for training fall into three levels that increase by the preset magnification factor. The trained network can therefore take pictures of several different resolutions as input: a picture at 1/r resolution, or a picture at 1/(r*r) or 1/(r*r*r) resolution. In each case the output picture of the network can be an enlarged picture that retains detail information.
202. During the training of the neural network, the server generates a random number based on the current iteration number i. The random number is uniformly distributed on the interval [0, 1].
The training process may include many iterations. To select the corresponding sample pictures at random in each iteration, a random number can be generated with the iteration number i as the independent variable, where i is a positive integer. Of course, the random number can also be generated by any random number generation algorithm, which is not limited in the embodiments of the present invention.
203. The server compares the random number with the value of the objective function. When the random number is greater than the value of the objective function, step 204 is performed; when the random number is less than the value of the objective function, step 205 is performed.
The objective function is a monotonically non-increasing function whose independent variable is the number of iterations. It can be expressed as ratio = function(i), where i is an integer greater than 1. A monotonically non-increasing function is one whose value does not increase as its argument increases. The objective function can take, but is not limited to, the following forms:
For example, ratio can be a linear function,
which can be expressed as ratio = max(ratio_min, k - c*epoch).
ratio can be an inverse sigmoid function,
which can be expressed as ratio = max(ratio_min, 1/(1 + exp(c*(epoch - k)))).
ratio can be an exponential function,
which can be expressed as ratio = max(ratio_min, k^epoch), k < 1.
Here, ratio_min is the minimum value of the objective function and can be a value in [0, 1]; epoch is the number of iterations; k and c are constants that determine the offset and slope of the function; and max denotes taking the maximum.
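The three forms of the objective function can be written directly in code. The sketch below is illustrative only: the function names and the parameter values in the demonstration are made up, c is assumed to be positive so that the linear and inverse sigmoid forms are non-increasing, and k is assumed to lie in (0, 1) for the exponential form.

```python
import math


def linear_ratio(epoch, k, c, ratio_min):
    """ratio = max(ratio_min, k - c*epoch); non-increasing when c >= 0."""
    return max(ratio_min, k - c * epoch)


def inverse_sigmoid_ratio(epoch, k, c, ratio_min):
    """ratio = max(ratio_min, 1/(1 + exp(c*(epoch - k)))); k shifts the midpoint."""
    return max(ratio_min, 1.0 / (1.0 + math.exp(c * (epoch - k))))


def exponential_ratio(epoch, k, ratio_min):
    """ratio = max(ratio_min, k^epoch) with 0 < k < 1."""
    return max(ratio_min, k ** epoch)


# Illustrative parameter values only.
for epoch in (1, 10, 25, 50):
    print(epoch,
          linear_ratio(epoch, k=1.0, c=0.05, ratio_min=0.1),
          inverse_sigmoid_ratio(epoch, k=25, c=0.2, ratio_min=0.1),
          exponential_ratio(epoch, k=0.95, ratio_min=0.1))
```

In all three forms, ratio_min acts as a floor: once the decaying term drops below it, the value of the objective function stays constant, which is still consistent with being monotonically non-increasing.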
204. The server selects the second sample picture and the third sample picture in the picture group as a training sample, inputs the second sample picture into the neural network, and adjusts the parameters of the neural network based on the output picture corresponding to the second sample picture and the third sample picture.
Taking the sample pictures in a picture group as [x_pre, x, y] for illustration: when the random number is greater than the value of the objective function, the sample pictures corresponding to x and y are selected as the training sample. The sample picture corresponding to x is used as the network input to obtain an output picture, denoted y' here. A loss function is then calculated from y' and the sample picture corresponding to y, and the parameters of the neural network are adjusted according to the trend of the loss function. Training in this way gives the neural network the ability to upscale the resolution of a real picture.
205. The server selects the first sample picture in the picture group and inputs it into the neural network, takes the first output picture corresponding to the first sample picture and the third sample picture as a training sample, inputs the first output picture into the neural network, and adjusts the parameters of the neural network based on the output picture corresponding to the first output picture and the third sample picture.
Still taking the sample pictures in a picture group as [x_pre, x, y] for illustration: when the random number is less than the value of the objective function, the sample picture corresponding to x_pre is first input into the current neural network, and the output picture is denoted x'. Here x' is not a real picture. x' and the sample picture corresponding to y are then used as the training sample: x' is used as the network input to obtain y", a loss function is calculated from y and y", and the parameters of the neural network are adjusted according to the trend of the loss function. Training in this way gives the neural network the ability to upscale the resolution of an output picture.
In steps 203 to 205 above, each training iteration chooses with a certain probability whether to train on a real picture or on an output picture derived from a real picture. In the early stage of training, the value of the objective function is relatively large, and the randomly generated number is smaller than that value with high probability, so in most cases real pictures are used for training, which lets the network learn the parameters needed to upscale real pictures. As training proceeds, the value of the objective function gradually decreases and the proportion of iterations trained on output pictures gradually increases, so the network gradually transitions from being able to upscale only real pictures to also being able to upscale its own output pictures. The neural network thus gradually acquires the cascading property and becomes a general-purpose network for resolution upscaling.
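The choice between the two training branches can be sketched as a small helper that returns the network input and the target for one iteration. It applies the comparison exactly as worded in step 203 (greater than the objective value selects step 204, otherwise step 205). The name `select_training_pair`, the optional argument `u` and the placeholder network are hypothetical; treating a tie as step 205 is an assumption, since the text does not cover equality. Note also that Python's `random.random()` draws from [0, 1) rather than the closed interval [0, 1].

```python
import random


def select_training_pair(group, network, ratio, u=None):
    """Return (network_input, target) for one training iteration.

    group   -- picture group in the format [x_pre, x, y]
    network -- callable that upscales a picture by the preset factor
    ratio   -- current value of the objective function
    u       -- random number; drawn uniformly from [0, 1) when omitted
    """
    x_pre, x, y = group
    if u is None:
        u = random.random()
    if u > ratio:
        # Step 204: train on the real low-resolution picture x.
        return x, y
    # Step 205: upscale x_pre first, then train on the network's own output x'.
    # Assumption: a tie (u == ratio) also takes this branch.
    x_out = network(x_pre)
    return x_out, y


# Placeholder "network" that only labels its input, to make the branches visible.
def fake_network(picture):
    return "up(" + picture + ")"


group = ["x_pre", "x", "y"]
print(select_training_pair(group, fake_network, ratio=0.3, u=0.9))
print(select_training_pair(group, fake_network, ratio=0.3, u=0.1))
```

In a real training loop, the returned input would be passed through the network once more and the loss computed against the returned target, after which the parameters would be updated.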
206. When the training termination condition is met, the server outputs the neural network used in the current iteration as the neural network.
The training termination condition may be that the number of iterations reaches a target number or that the loss function meets a preset condition; it may also be that, when verified on a validation data set, the capability of the network has not improved for a period of time. The target number can be a preset number of iterations used to determine when training ends, which avoids wasting training resources. The preset condition can be, for example, that the loss function value remains unchanged or stops decreasing for a period of time during training, which indicates that the training process has achieved its training effect. For the overall training process, refer to the flowchart in Fig. 4, which shows one method flow that can be used in practice: first prepare the training data set and then construct the neural network. At the i-th iteration, obtain some of the sample pictures from the training data set and decide, based on the random number, which pictures to train with. After a batch of sample pictures has been trained on, check whether the termination condition is met. If it is met, training can stop and the trained neural network is output; if not, training continues on the training data set.
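One way to express such a termination check is sketched below. It combines two of the conditions mentioned above: an iteration budget and a loss that has stopped decreasing. The function name `should_stop` and the parameters `patience` and `min_delta` are hypothetical; the text does not say how long "a period of time" is or how a plateau is measured.

```python
def should_stop(iteration, target_iterations, losses, patience=5, min_delta=0.0):
    """Return True when training should end.

    iteration         -- number of iterations completed so far
    target_iterations -- preset iteration budget
    losses            -- loss values recorded so far, oldest first
    patience          -- size of the recent window checked for improvement (assumed)
    min_delta         -- smallest decrease that counts as an improvement (assumed)
    """
    if iteration >= target_iterations:
        return True
    if len(losses) <= patience:
        return False  # not enough history to judge a plateau
    best_before = min(losses[:-patience])
    best_recent = min(losses[-patience:])
    # Stop if the recent window did not beat the earlier best by more than min_delta.
    return best_recent >= best_before - min_delta


print(should_stop(3, 100, [1.0, 0.9, 0.8]))
print(should_stop(7, 100, [1.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]))
```

The same function could be fed a validation metric instead of the training loss to cover the case where capability on a validation data set stops improving.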
In the above process, if the number of pictures of each resolution in each picture group is extended, training can yield a neural network capable of upscaling the resolution of video. That is, for a neural network used to upscale video resolution, the training data set can still use the [x_pre, x, y] format described in the above embodiment. The sample picture corresponding to y can be a high-resolution center frame. The sample pictures corresponding to x are the center frame at 1/r of the resolution of y together with the N frames before and after it, 2N+1 frames in total. x_pre is the center frame at 1/r^2 of the resolution of y together with the 2N frames before and after it, 4N+1 frames in total. With a neural network trained on a data set in the [x_pre, x, y] format, the sample pictures corresponding to x can be input and the center frame at the resolution of y reconstructed by the neural network. Alternatively, the sample pictures corresponding to x_pre can be input first to reconstruct x' at 1/r resolution, and x' can then be input so that the neural network reconstructs the center frame at the resolution of y. Here N is an integer greater than 0.
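The frame counts for a video picture group can be illustrated by computing which frame indices go into x_pre, x and y. This is a sketch with hypothetical names (`frame_window`, `video_group_indices`); clamping indices at the clip boundaries is an assumption, since the text does not say how the first and last frames of a clip are handled.

```python
def frame_window(center, half_width, num_frames):
    """Indices of the center frame plus half_width frames on each side.

    Assumption: indices outside the clip are clamped to the nearest valid frame.
    """
    return [min(max(i, 0), num_frames - 1)
            for i in range(center - half_width, center + half_width + 1)]


def video_group_indices(center, n, num_frames):
    """Frame indices for one video picture group in the [x_pre, x, y] format."""
    return {
        "x_pre": frame_window(center, 2 * n, num_frames),  # 4N+1 frames at 1/r^2
        "x": frame_window(center, n, num_frames),          # 2N+1 frames at 1/r
        "y": [center],                                     # high-resolution center frame
    }


indices = video_group_indices(center=10, n=2, num_frames=100)
print(len(indices["x_pre"]), len(indices["x"]), len(indices["y"]))  # 9 5 1
```

A plausible reading of the 4N+1 count is that each of the 2N+1 frames of x' needs its own window of N neighbours on each side at the lower resolution, although the text does not state this explicitly.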
In the related art, a model trained for a certain amplification factor r is applicable only when the amplification factor is r. By contrast, the training process provided in the embodiment of the present invention can, at a lower training cost, obtain a neural network that realizes an amplification factor of r^n (n being any positive integer). Further, the neural network obtained by the above training process can realize multiple amplification factors without training several networks for different amplification factors. The parameter count of this single network is much smaller than that of multiple networks, so for the same range of amplification factors the neural network obtained in the embodiment of the present invention needs fewer stored model parameters and saves storage space.
Further, the above training process greatly improves the practicability of the model: one neural network can be applied to multiple amplification ranges, so that the neural network can serve as a flexible general-purpose module that realizes larger amplification ranges by cascading.
In a possible implementation, the neural network may be designed based on, for example, VESPCN or VSR-DUF, so that the models for all amplification factors related by a power relationship are reduced to a single model, which greatly reduces the number of models that need to be trained. VESPCN is the work of Wenzhe Shi and Jose Caballero of the start-up company Magic Pony Technology. Optionally, in order to realize non-integer amplification factors, the neural network may also be based on the Meta-SR model. Meta-SR allows one model to perform resolution amplification for non-integer amplification factors within a certain range, so that a single model can be made suitable for arbitrary amplification factors.
The use of the neural network, that is, the multimedia processing procedure, is described below with reference to Fig. 5. Referring to Fig. 5, the method can be applied in a server or a computer device. For example, a server may apply the trained neural network to perform the multimedia processing method, or may send the trained neural network to any computer device to perform multimedia processing. The embodiment of the present invention is described only with the computer device as the executing entity. The method includes:
501. The computer device obtains a first multimedia file and a target amplification factor of the first multimedia file.
The first multimedia file may be a picture or a video, which is not limited in the embodiment of the present invention. The target amplification factor is the factor by which the first multimedia file is to be amplified. The target amplification factor may be a preset factor; for example, for a certain multimedia file, the target amplification factor of the resolution is 4. The target amplification factor may also be calculated from the current resolution and a target resolution: if the current resolution of the first multimedia file is x and the user has selected a target resolution of x*r, the target amplification factor r can be obtained by calculation. Here the resolution x is a positive number.
502. The computer device determines a target number of processing cycles for the first multimedia file according to the target amplification factor and a preset amplification factor.
The preset amplification factor may be an amplification factor supported by a neural network model preconfigured in the computer device. If multiple neural network models are provided in the computer device, each corresponding to a different preset amplification factor, the target amplification factor can be compared with the preset amplification factors one by one, and a preset amplification factor for which the ratio is an integer is used as the basis for this calculation.
After obtaining the target amplification factor, the computer device needs to determine how many times the cyclic processing is to be performed. To this end, the power relationship between the target amplification factor and the preset amplification factor can be obtained, and the exponent in that power relationship is used as the target number of processing cycles for the first multimedia file. For example, if the target amplification factor is 4 and the preset amplification factor is 2, the target number of processing cycles is 2, since 2^2 = 4.
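The calculation in step 502 can be sketched as follows. The function names `target_cycle_count` and `choose_preset` are hypothetical, the sketch is restricted to integer factors, and trying the largest preset factor first is an assumption (the description only says the presets are compared one by one).

```python
from typing import Iterable, Tuple


def target_cycle_count(target_factor: int, preset_factor: int) -> int:
    """Return n such that preset_factor ** n == target_factor, or raise ValueError."""
    if preset_factor < 2 or target_factor < 1:
        raise ValueError("factors must satisfy preset >= 2 and target >= 1")
    cycles, reached = 0, 1
    # Multiply up from 1 so that only exact integer arithmetic is used.
    while reached < target_factor:
        reached *= preset_factor
        cycles += 1
    if reached != target_factor:
        raise ValueError(f"{target_factor} is not an integer power of {preset_factor}")
    return cycles


def choose_preset(target_factor: int, preset_factors: Iterable[int]) -> Tuple[int, int]:
    """Pick a preset factor that reaches the target by repeated application.

    Assumption: larger presets are tried first, which minimises the cycle count.
    """
    for preset in sorted(preset_factors, reverse=True):
        try:
            return preset, target_cycle_count(target_factor, preset)
        except ValueError:
            continue
    raise ValueError(f"no preset factor in the list reaches {target_factor}")
```

With a single x2 network, `target_cycle_count(4, 2)` gives 2 and `target_cycle_count(8, 2)` gives 3, matching the examples in the description.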
503. The computer device inputs the first multimedia file into the neural network corresponding to the preset amplification factor, the neural network being used to perform resolution amplification by the preset amplification factor on an input multimedia file.
504. The computer device processes the first multimedia file through the neural network and inputs the intermediate multimedia file obtained by the processing into the neural network again, until the number of processing cycles reaches the target number of processing cycles, and then outputs the second multimedia file obtained when processing stops.
In response to an amplification request, the computer device can call the neural network corresponding to the preset amplification factor that is preconfigured in the computer device, take the first multimedia file as the network input, take the network output as the next network input and feed it into the neural network again, and repeat this process until the number of processing cycles reaches the target number of processing cycles, at which point the second multimedia file obtained is output. Taking again a target amplification factor of 4, a preset amplification factor of 2 and a target number of processing cycles of 2 as an example, and referring to the second group of input/output flows in Fig. 6: after the first multimedia file is input into the network, the output multimedia file is input into the neural network again, and the multimedia file output this time is output as the second multimedia file. If the target amplification factor is 8, the target number of processing cycles is 3. Referring to the third group of input/output flows in Fig. 6: after the first multimedia file is input into the network, the output multimedia file is input into the neural network again, the multimedia file output by this second pass is input into the neural network once more, and the resulting multimedia file is output as the second multimedia file.
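The cyclic processing of steps 503 and 504 can be sketched as a simple loop. The trained network is not available here, so `nearest_neighbour_x2` is a stand-in placeholder that merely duplicates pixels; the names `cascade` and `Image` are likewise assumptions made for illustration.

```python
from typing import Callable, List

Image = List[List[int]]  # a grayscale picture as rows of pixel values


def nearest_neighbour_x2(img: Image) -> Image:
    """Placeholder for the trained x2 network: repeats every pixel in a 2x2 block."""
    out: Image = []
    for row in img:
        wide = [pixel for pixel in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def cascade(net: Callable[[Image], Image], first_file: Image, cycles: int) -> Image:
    """Feed the output of `net` back into `net` until `cycles` passes are done."""
    current = first_file
    for _ in range(cycles):
        current = net(current)  # the intermediate file becomes the next input
    return current
```

For a target amplification factor of 8 and a preset factor of 2, `cascade(nearest_neighbour_x2, picture, 3)` turns a 2x2 picture into a 16x16 one, mirroring the third input/output flow described for Fig. 6.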
It should be noted that when the multimedia file is a video, the corresponding neural network is trained on pictures that serve as video frames, and when the multimedia file is a picture, the corresponding neural network may be trained on pictures. Therefore, the file type of the multimedia file may also be identified before step 502, so that the corresponding neural network is called for multimedia processing based on the file type.
Through the above multimedia processing procedure, resolution amplification by multiple different amplification factors can be realized with one neural network, without providing different neural networks for different amplification factors. This greatly reduces the required storage space, and flexible multimedia processing can be realized by cascading the same neural network, which greatly improves practicability.
Taking a video application as an example, when a user watches video resources in a video application, high-definition video must be requested from the server continuously. Downloading high-definition video consumes the user's mobile data, and the delay during transmission can cause a poor experience such as stuttering while the user is watching. With the technical solution provided in the embodiment of the present invention, the user can choose both the resolution at which the video is watched and the resolution of the video downloaded from the server. When the resolution of the downloaded video is lower than the resolution at which it is watched, the video application can calculate the ratio of the two and select a suitable amplification factor to perform super-resolution on the video, that is, carry out the multimedia processing procedure, so as to amplify the resolution. In this way, users with poor network conditions or users who want to save data can choose to download video at a lower resolution and still watch video at a higher resolution, obtaining a better viewing experience while saving data and avoiding problems such as stuttering. By selecting a suitable resolution, the user can find the best trade-off among viewing cost, smoothness and image quality.
Table 1 below shows experimental data for resolution amplification based on different methods. The values in the table are PSNR values; the larger the PSNR value, the better the performance.
Table 1
In Table 1, the methods other than the one provided by the embodiment of the present invention each trained two independent networks, one for x2 and one for x4, whereas the method provided in the embodiment of the present invention trained only one neural network and realizes the x4 amplification factor by reusing that neural network. The data in the table show that the training method proposed by this technical solution realizes resolution amplification with performance comparable to the previous methods, which demonstrates the feasibility of the cascade structure.
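The PSNR metric used for Table 1 is the standard peak signal-to-noise ratio, 10*log10(MAX^2 / MSE). The sketch below computes it for two flattened pictures; the function name `psnr` and the default peak value of 255 (8-bit pixels) are assumptions, since the description does not state the bit depth or evaluation protocol.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], reconstructed: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(reconstructed) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical pictures
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A reconstruction closer to the reference has a smaller mean squared error and therefore a larger PSNR, which is why larger values in the table indicate better performance.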
Fig. 7 is a schematic structural diagram of a neural-network-based multimedia processing apparatus provided in an embodiment of the present invention. Referring to Fig. 7, the apparatus includes:
An obtaining module 701, configured to obtain a first multimedia file and a target amplification factor of the first multimedia file;
A determining module 702, configured to determine a target number of processing cycles for the first multimedia file according to the target amplification factor and a preset amplification factor;
An input module 703, configured to input the first multimedia file into the neural network corresponding to the preset amplification factor, the neural network being used to perform resolution amplification by the preset amplification factor on an input multimedia file;
A processing module 704, configured to process the first multimedia file through the neural network, input the intermediate multimedia file obtained by the processing into the neural network again, and output the second multimedia file obtained when the number of processing cycles reaches the target number of processing cycles.
In a possible implementation, the apparatus further includes:
A training module, configured to obtain a training data set, the training data set including multiple picture groups, each picture group including a first sample picture, a second sample picture and a third sample picture whose resolutions increase successively by the preset amplification factor; and to train a neural network based on the multiple picture groups to obtain the neural network corresponding to the preset amplification factor.
In a possible implementation, the training module is configured to, in any iteration of the training process and for any picture group, randomly execute one of the following training steps:
Select the second sample picture and the third sample picture in the picture group as a training sample, input the second sample picture into the neural network, and adjust the parameters of the neural network based on the output picture corresponding to the second sample picture and the third sample picture;
Select the first sample picture in the picture group and input the first sample picture into the neural network; take the first output picture corresponding to the first sample picture together with the third sample picture as a training sample, input the first output picture into the neural network, and adjust the parameters of the neural network based on the output picture corresponding to the first output picture and the third sample picture;
When the training termination condition is met, output the neural network used in the current iteration as the neural network.
In a possible implementation, the apparatus further includes:
A random number generation module, configured to generate, in any iteration of the training process, a random number based on the current iteration number, the random number obeying a uniform distribution on the interval [0,1];
When the random number is greater than the value of an objective function, the training module is triggered to execute the step of selecting the second sample picture and the third sample picture in the picture group as a training sample for training;
When the random number is less than the value of the objective function, the training module is triggered to execute the step of selecting the first sample picture in the picture group for training;
The objective function is a monotone non-increasing function.
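The random choice between the two training steps can be sketched as below. The description only requires the objective function to be monotone non-increasing, so the linear decay in `objective` is a hypothetical choice, as are the names `pick_branch`, `single_pass` and `two_pass`; the case where the random number exactly equals the objective value is not specified in the text and is assigned to the two-pass branch here.

```python
import random


def objective(iteration: int, total: int) -> float:
    """Hypothetical monotone non-increasing schedule: linear decay from 1 to 0."""
    return max(0.0, 1.0 - iteration / total)


def pick_branch(iteration: int, total: int, rng: random.Random) -> str:
    """Choose which training step to run in this iteration.

    'single_pass': train on (second sample, third sample).
    'two_pass':    feed the first sample through the network twice,
                   then compare with the third sample.
    """
    u = rng.random()  # uniform draw from [0, 1)
    return "single_pass" if u > objective(iteration, total) else "two_pass"
```

Because the schedule never increases, the probability of taking the single-pass step never decreases as training proceeds; any other non-increasing schedule could be substituted for the linear one.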
In a possible implementation, the first sample picture is a picture obtained by downsampling the second sample picture by the preset amplification factor, and the second sample picture is a picture obtained by downsampling the third sample picture by the preset amplification factor.
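Building one picture group from a high-resolution picture can be sketched as follows. The description does not name the downsampling kernel, so the box averaging used in `downsample` is an assumption (bicubic downsampling is another common choice), and the function names are hypothetical.

```python
from typing import List, Tuple

Picture = List[List[float]]


def downsample(img: Picture, r: int) -> Picture:
    """Shrink a picture by integer factor r by averaging each r x r block."""
    height, width = len(img) // r, len(img[0]) // r
    return [
        [
            sum(img[y * r + dy][x * r + dx] for dy in range(r) for dx in range(r)) / (r * r)
            for x in range(width)
        ]
        for y in range(height)
    ]


def make_picture_group(third: Picture, r: int) -> Tuple[Picture, Picture, Picture]:
    """Return (first, second, third) with resolutions increasing by factor r."""
    second = downsample(third, r)   # 1/r of the third picture's resolution
    first = downsample(second, r)   # 1/r^2 of the third picture's resolution
    return first, second, third
```

The returned triple corresponds to the [x_pre, x, y] format of the training data set, with `first` as x_pre, `second` as x and `third` as y.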
In a possible implementation, the preset amplification factor is an integer or a non-integer.
All of the above optional technical solutions can be combined in any manner to form optional embodiments of the present invention, which are not described one by one here.
It should be understood that, when the neural-network-based multimedia processing apparatus provided by the above embodiment performs multimedia processing, the division into the above functional modules is only an example. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the neural-network-based multimedia processing apparatus provided by the above embodiment and the embodiment of the neural-network-based multimedia processing method belong to the same concept; for the specific implementation process, see the method embodiment, which is not repeated here.
Fig. 8 is a schematic structural diagram of a computer device provided in an embodiment of the present invention. The computer device may be provided as a server. The computer device may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPU) 801 and one or more memories 802, where at least one instruction is stored in the memory 802, and the at least one instruction is loaded and executed by the processor 801 to implement the neural-network-based multimedia processing method provided by each of the above method embodiments. The computer device may also have components such as a wired or wireless network interface, a keyboard and an input/output interface for input and output, and may further include other components for implementing device functions, which are not described here.
In an exemplary embodiment, a computer-readable storage medium is further provided, for example a memory including program code, where the program code can be executed by a processor in a computer device to complete the neural-network-based multimedia processing method in the above embodiments. For example, the computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, or the like.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be completed by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A neural-network-based multimedia processing method, characterized in that the method comprises:
Obtaining a first multimedia file and a target amplification factor of the first multimedia file;
Determining a target number of processing cycles for the first multimedia file according to the target amplification factor and a preset amplification factor;
Inputting the first multimedia file into a neural network corresponding to the preset amplification factor, the neural network being used to perform resolution amplification by the preset amplification factor on an input multimedia file;
Processing the first multimedia file through the neural network, inputting the intermediate multimedia file obtained by the processing into the neural network again, and outputting the second multimedia file obtained when the number of processing cycles reaches the target number of processing cycles.
2. The method according to claim 1, characterized in that the training process of the neural network corresponding to the preset amplification factor comprises:
Obtaining a training data set, the training data set including multiple picture groups, each picture group including a first sample picture, a second sample picture and a third sample picture whose resolutions increase successively by the preset amplification factor;
Training a neural network based on the multiple picture groups to obtain the neural network corresponding to the preset amplification factor.
3. The method according to claim 2, characterized in that training the neural network based on the multiple picture groups to obtain the neural network corresponding to the preset amplification factor comprises:
In any iteration of the training process, for any picture group, randomly executing one of the following training steps:
Selecting the second sample picture and the third sample picture in the picture group as a training sample, inputting the second sample picture into the neural network, and adjusting the parameters of the neural network based on the output picture corresponding to the second sample picture and the third sample picture;
Selecting the first sample picture in the picture group and inputting the first sample picture into the neural network; taking the first output picture corresponding to the first sample picture together with the third sample picture as a training sample, inputting the first output picture into the neural network, and adjusting the parameters of the neural network based on the output picture corresponding to the first output picture and the third sample picture;
When a training termination condition is met, outputting the neural network used in the current iteration as the neural network.
4. The method according to claim 3, characterized in that the method further comprises:
In any iteration of the training process, generating a random number based on the current iteration number, the random number obeying a uniform distribution on the interval [0,1];
When the random number is greater than the value of an objective function, executing the step of selecting the second sample picture and the third sample picture in the picture group as a training sample for training;
When the random number is less than the value of the objective function, executing the step of selecting the first sample picture in the picture group for training;
Wherein the objective function is a monotone non-increasing function.
5. The method according to claim 2, characterized in that the first sample picture is a picture obtained by downsampling the second sample picture by the preset amplification factor, and the second sample picture is a picture obtained by downsampling the third sample picture by the preset amplification factor.
6. The method according to claim 1, characterized in that the preset amplification factor is an integer or a non-integer.
7. A neural-network-based multimedia processing apparatus, characterized in that the apparatus comprises:
An obtaining module, configured to obtain a first multimedia file and a target amplification factor of the first multimedia file;
A determining module, configured to determine a target number of processing cycles for the first multimedia file according to the target amplification factor and a preset amplification factor;
An input module, configured to input the first multimedia file into a neural network corresponding to the preset amplification factor, the neural network being used to perform resolution amplification by the preset amplification factor on an input multimedia file;
A processing module, configured to process the first multimedia file through the neural network, input the intermediate multimedia file obtained by the processing into the neural network again, and output the second multimedia file obtained when the number of processing cycles reaches the target number of processing cycles.
8. The apparatus according to claim 7, characterized in that the apparatus further comprises:
A training module, configured to obtain a training data set, the training data set including multiple picture groups, each picture group including a first sample picture, a second sample picture and a third sample picture whose resolutions increase successively by the preset amplification factor; and to train a neural network based on the multiple picture groups to obtain the neural network corresponding to the preset amplification factor.
9. A computer device, characterized in that the computer device comprises one or more processors and one or more memories, at least one piece of program code being stored in the one or more memories, the at least one piece of program code being loaded and executed by the one or more processors to implement the operations performed by the neural-network-based multimedia processing method according to any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that at least one piece of program code is stored in the storage medium, the at least one piece of program code being loaded and executed by a processor to implement the operations performed by the neural-network-based multimedia processing method according to any one of claims 1 to 6.
CN201910745322.1A 2019-08-13 2019-08-13 Multi-media processing method, device, equipment and medium neural network based Pending CN110446071A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910745322.1A CN110446071A (en) 2019-08-13 2019-08-13 Multi-media processing method, device, equipment and medium neural network based

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910745322.1A CN110446071A (en) 2019-08-13 2019-08-13 Multi-media processing method, device, equipment and medium neural network based

Publications (1)

Publication Number Publication Date
CN110446071A true CN110446071A (en) 2019-11-12

Family

ID=68435081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910745322.1A Pending CN110446071A (en) 2019-08-13 2019-08-13 Multi-media processing method, device, equipment and medium neural network based

Country Status (1)

Country Link
CN (1) CN110446071A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101841641A (en) * 2010-03-29 2010-09-22 中山大学 Video amplification method and system based on subdivision method
CN105260986A (en) * 2015-10-13 2016-01-20 武汉大学 Anti-fuzzy image amplification method
CN107358575A (en) * 2017-06-08 2017-11-17 清华大学 A kind of single image super resolution ratio reconstruction method based on depth residual error network
CN108600782A (en) * 2018-04-08 2018-09-28 深圳市零度智控科技有限公司 Video super-resolution method, device and computer readable storage medium
US20190045168A1 (en) * 2018-09-25 2019-02-07 Intel Corporation View interpolation of multi-camera array images with flow estimation and image super resolution using deep learning
US20190139205A1 (en) * 2017-11-09 2019-05-09 Samsung Electronics Co., Ltd. Method and apparatus for video super resolution using convolutional neural network with two-stage motion compensation
CN110363709A (en) * 2019-07-23 2019-10-22 腾讯科技(深圳)有限公司 A kind of image processing method, image presentation method, model training method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孙跃文等: "基于深度学习的辐射图像超分辨率重建方法", 《原子能科学技术》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191112