CN110516749A - Model training method, video processing method, apparatus, medium, and computing device - Google Patents
Model training method, video processing method, apparatus, medium, and computing device
- Publication number
- CN110516749A (application number CN201910811249.3)
- Authority
- CN
- China
- Prior art keywords
- video
- model
- label
- video clip
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
Abstract
Embodiments of the present invention provide a model training method, comprising: obtaining multiple video clips; adding a label to each of the multiple video clips, where the label characterizes the amount of effective information contained in the clip; establishing a neural network model that includes a time dimension; and training the neural network model with the labeled video clips to obtain an optimized model, which is used to extract from a video file the target video clip containing the most effective information. Embodiments of the present invention further provide a video processing method, a model training apparatus, a video processing apparatus, a medium, and a computing device.
Description
Technical field
Embodiments of the present invention relate to the field of computer technology and, more specifically, to a model training method, a video processing method, an apparatus, a medium, and a computing device.
Background
This section is intended to provide background or context for the embodiments of the invention set forth in the claims. The description herein is not admitted to be prior art merely by inclusion in this section.
In the prior art, distinguishing highlight segments from non-highlight segments in a video file is usually treated as a binary classification problem, and a two-class model is trained with a cross-entropy loss. This training process often ignores the temporal features of the video clips, so the resulting model has limited ability to distinguish video clips of different degrees of quality.
Summary of the invention
In this context, embodiments of the present invention are intended to provide a model training method, a video processing method, an apparatus, a medium, and a computing device.
In a first aspect of embodiments of the present invention, a model training method is provided, comprising: obtaining multiple video clips, and adding a label to each of them, where the label characterizes the amount of effective information contained in the clip. A neural network model including a time dimension is then established and trained with the labeled video clips to obtain an optimized model, which is used to extract from a video file the target video clip containing the most effective information.
In one embodiment of the invention, the labels include a first label, a second label, and a third label, where the amount of effective information characterized by the first label is greater than that characterized by the second label, which in turn is greater than that characterized by the third label.
In another embodiment of the invention, training the neural network model with the labeled video clips includes: constructing multiple sample pairs from the labeled video clips, each pair consisting of two video clips with different labels, and then training the neural network model with these sample pairs to obtain the optimized model.
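As a non-authoritative sketch of this pairing step (the function name, data layout, and numeric labels here are illustrative assumptions, not the patent's implementation), sample pairs can be formed from any two clips whose labels differ, ordering each pair so the clip with more effective information comes first:

```python
from itertools import combinations

def build_pairs(labeled_clips):
    """Form training pairs from clips whose labels differ.

    `labeled_clips` is a list of (clip, label) tuples, where a larger
    label value stands for more effective information. Each returned
    pair puts the higher-labeled clip first.
    """
    pairs = []
    for (clip_a, lab_a), (clip_b, lab_b) in combinations(labeled_clips, 2):
        if lab_a == lab_b:
            continue  # a pair needs two different labels
        pairs.append((clip_a, clip_b) if lab_a > lab_b else (clip_b, clip_a))
    return pairs
```

For example, three clips labeled 2, 1, and 1 yield only the two cross-label pairs, with the label-2 clip first in each.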
In a further embodiment of the invention, training the neural network model with the sample pairs to obtain the optimized model includes the following, for any sample pair consisting of a first video clip and a second video clip, where the amount of effective information characterized by the first clip's label is greater than that characterized by the second clip's label. The sample pair is input to the neural network model to obtain a first score for the first video clip and a second score for the second video clip. A first value is obtained by subtracting the second score from the first score, and a loss value is determined from this first value using a loss function, where the loss function is monotonically decreasing. When the loss value is less than or equal to a predetermined threshold, the currently trained neural network model is determined to be the optimized model. When the loss value is greater than the predetermined threshold, the parameters of the currently trained neural network model are further optimized, and the above operations are repeated until the optimized model is obtained.
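The monotonically decreasing loss described above can be sketched, for illustration only, as a hinge-style ranking loss; the particular function and the margin value are assumptions, since the patent only requires monotone decrease in the score difference:

```python
def ranking_loss(first_score, second_score, margin=1.0):
    """Monotonically decreasing in (first_score - second_score): the further
    the first (more informative) clip scores above the second, the smaller
    the loss."""
    first_value = first_score - second_score  # the "first value" of the text
    return max(0.0, margin - first_value)
```

When the first clip outscores the second by at least the margin, the loss reaches zero, which is the kind of condition checked against the predetermined threshold above.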
In yet another embodiment of the invention, inputting the two video clips of a sample pair to the neural network model includes: for each video clip, extracting from it an image sequence containing a predetermined number of frames, and then inputting that image sequence to the neural network model.
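One way to realize this extraction step, sketched here under the assumption that frames are indexed from 0 and sampled uniformly (the patent does not prescribe a sampling scheme):

```python
def sample_frame_indices(total_frames, num_frames):
    """Return `num_frames` frame indices spread uniformly over a clip."""
    if total_frames <= 0 or num_frames <= 0:
        return []
    step = total_frames / num_frames
    # Clamp so short clips repeat their last frame rather than overrun.
    return [min(total_frames - 1, int(i * step)) for i in range(num_frames)]
```

The decoded frames at these indices would then form the fixed-length image sequence fed to the model.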
In yet another embodiment of the invention, establishing a neural network model including a time dimension includes: establishing a convolutional neural network model, and configuring it with one or more convolution kernels that include a time dimension.
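To illustrate how a kernel with a time dimension differs from a 2D kernel, here is a naive "valid" 3D convolution over a T x H x W volume, written with plain lists for clarity. This is only a didactic sketch of the operation; a real model would use an optimized library, and nothing here is claimed as the patent's implementation:

```python
def conv3d_valid(volume, kernel):
    """Slide a t x h x w kernel over a T x H x W volume (no padding, stride 1)."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    t, h, w = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for i in range(T - t + 1):      # the extra loop over time is what
        plane = []                   # distinguishes 3D from 2D convolution
        for j in range(H - h + 1):
            row = []
            for k in range(W - w + 1):
                row.append(sum(volume[i + a][j + b][k + c] * kernel[a][b][c]
                               for a in range(t)
                               for b in range(h)
                               for c in range(w)))
            plane.append(row)
        out.append(plane)
    return out
```

Because the kernel also spans consecutive frames, each output value mixes information across time, which is how the model can capture the temporal features of a clip.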
In yet another embodiment of the invention, obtaining the multiple video clips includes: obtaining multiple video samples and Graphics Interchange Format (GIF) files associated with them. Then, for each video sample, the start position and end position of its associated GIF file within the video sample are determined, and the video clip spanning that start position to that end position is extracted from the video sample.
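The start/end bookkeeping can be sketched as follows; the colon-separated timestamp format and the fixed frame rate are assumptions made purely for illustration:

```python
def timestamp_to_seconds(ts):
    """Parse "HH:MM:SS", "MM:SS", or "SS" into whole seconds."""
    seconds = 0
    for part in ts.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

def clip_frame_range(start_ts, end_ts, fps):
    """Frame indices covered by a GIF's span inside its source video."""
    return (timestamp_to_seconds(start_ts) * fps,
            timestamp_to_seconds(end_ts) * fps)
```

Given this range, the clip itself would be cut from the source video (not from the compressed GIF), as the description below explains.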
In yet another embodiment of the invention, adding a label to each of the video clips includes: adding the first label to any video clip extracted from a video sample between such a start position and end position.
In yet another embodiment of the invention, obtaining the multiple video clips includes: obtaining multiple video samples, and then performing video segmentation on them to obtain multiple video clips.
In a second aspect of embodiments of the present invention, a video processing method is provided, comprising: obtaining a video file, and then processing the video file with an optimized model so as to extract from it the target video clip containing the most effective information, where the optimized model is trained by the model training method described in any of the above embodiments.
In one embodiment of the invention, processing the video file with the optimized model to extract the target video clip containing the most effective information includes: performing video segmentation on the video file to obtain multiple candidate video clips; inputting each candidate clip to the optimized model to obtain its score; and then taking the highest-scoring candidate clip as the target video clip.
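The segment-score-select loop can be sketched as below; the fixed-length, non-overlapping segmentation and the `score_fn` stand-in for the optimized model are assumptions, since the patent leaves the segmentation scheme open:

```python
def pick_target_clip(frames, clip_len, score_fn):
    """Split `frames` into fixed-length candidate clips, score each with
    `score_fn` (standing in for the optimized model), and return the
    best clip together with its score."""
    clips = [frames[i:i + clip_len]
             for i in range(0, len(frames) - clip_len + 1, clip_len)]
    scores = [score_fn(clip) for clip in clips]
    best = max(range(len(clips)), key=scores.__getitem__)
    return clips[best], scores[best]
```

With a trivial stand-in scorer such as `sum`, `pick_target_clip(list(range(10)), 5, sum)` selects the later, higher-valued clip, mirroring how the highest-scoring candidate becomes the target video clip.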
In another embodiment of the invention, inputting a candidate video clip to the optimized model includes: extracting from the candidate clip a test image sequence containing a predetermined number of frames, and inputting that test image sequence to the optimized model.
In a further embodiment of the invention, the method further includes: making a GIF file based on the target video clip, and displaying that GIF file as the cover of the video file.
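One plausible way to keep the generated cover GIF lightweight is to subsample the target clip's frames before encoding; the frame budget of 20 is an illustrative assumption, not part of the patent:

```python
def cover_gif_frames(clip_frames, max_frames=20):
    """Subsample a clip to at most `max_frames` frames for a cover GIF."""
    n = len(clip_frames)
    if n <= max_frames:
        return list(clip_frames)
    step = n / max_frames
    return [clip_frames[int(i * step)] for i in range(max_frames)]
```

An image library such as Pillow or imageio could then write these frames out as the actual GIF file; that encoding step is omitted here.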
In a third aspect of embodiments of the present invention, a model training apparatus is provided, comprising a first acquisition module, a labeling module, a modeling module, and a training module. The first acquisition module obtains multiple video clips. The labeling module adds a label to each of the multiple video clips, where the label characterizes the amount of effective information contained in the clip. The modeling module establishes a neural network model including a time dimension. The training module trains the neural network model with the labeled video clips to obtain an optimized model, which is used to extract from a video file the target video clip containing the most effective information.
In one embodiment of the invention, the labels include a first label, a second label, and a third label. The amount of effective information characterized by the first label is greater than that characterized by the second label, which in turn is greater than that characterized by the third label.
In another embodiment of the invention, the training module includes a pair-construction submodule and a pair-training submodule. The pair-construction submodule constructs multiple sample pairs from the labeled video clips, each pair consisting of two video clips with different labels. The pair-training submodule trains the neural network model with these sample pairs to obtain the optimized model.
In a further embodiment of the invention, the pair-training submodule is specifically configured as follows. Any sample pair consists of a first video clip and a second video clip, where the amount of effective information characterized by the first clip's label is greater than that characterized by the second clip's label. For any sample pair, the pair is input to the neural network model to obtain a first score for the first video clip and a second score for the second video clip. A first value is then obtained by subtracting the second score from the first score, and a loss value is determined from it with a loss function, where the loss function is monotonically decreasing. When the loss value is less than or equal to a predetermined threshold, the neural network model is determined to be the optimized model. When the loss value is greater than the predetermined threshold, the parameters of the neural network model are optimized, and the above operations are repeated until the optimized model is obtained.
In yet another embodiment of the invention, the pair-training submodule may input a sample pair to the neural network model by, for each video clip of the pair, extracting from the clip an image sequence containing a predetermined number of frames and inputting that image sequence to the neural network model.
In yet another embodiment of the invention, the modeling module is specifically configured to establish a convolutional neural network model and to configure it with one or more convolution kernels that include a time dimension.
In yet another embodiment of the invention, the first acquisition module includes a sample-acquisition submodule and a clip-extraction submodule. The sample-acquisition submodule obtains multiple video samples and GIF files associated with them. For each video sample, the clip-extraction submodule determines the start position and end position of the associated GIF file within the video sample, and extracts from the sample the video clip spanning the start position to the end position.
In yet another embodiment of the invention, the labeling module is specifically configured to add the first label to any video clip extracted from a video sample between such a start position and end position.
In yet another embodiment of the invention, the first acquisition module includes a sample-acquisition submodule and a segmentation submodule. The sample-acquisition submodule obtains multiple video samples; the segmentation submodule performs video segmentation on them to obtain multiple video clips.
In a fourth aspect of embodiments of the present invention, a video processing apparatus is provided, comprising a second acquisition module and an extraction module. The second acquisition module obtains a video file. The extraction module processes the video file with an optimized model to extract from it the target video clip containing the most effective information, where the optimized model is trained by the model training method described in any of the above embodiments.
In one embodiment of the invention, the extraction module includes a video-segmentation submodule, an evaluation submodule, and a determination submodule. The video-segmentation submodule performs video segmentation on the video file to obtain multiple candidate video clips. The evaluation submodule inputs each candidate video clip to the optimized model to obtain its score. The determination submodule takes the highest-scoring candidate clip as the target video clip.
In another embodiment of the invention, the evaluation submodule is specifically configured to extract from a candidate video clip a test image sequence containing the predetermined number of frames, and to input that test image sequence to the optimized model.
In a further embodiment of the invention, the video processing apparatus further includes a display module configured to make a GIF file based on the target video clip and to display that GIF file as the cover of the video file.
In a fifth aspect of embodiments of the present invention, a medium is provided that stores computer-executable instructions which, when executed by a processor, implement the method described in any of the above embodiments.
In a sixth aspect of embodiments of the present invention, a computing device is provided, comprising a memory, a processor, and executable instructions stored in the memory and runnable on the processor, where the processor, when executing the instructions, implements the method described in any of the above embodiments.
With the video processing method and apparatus of embodiments of the present invention, a neural network model including a time dimension is trained with labeled video clips, where the labels characterize the amount of effective information contained in the clips. The resulting optimized model can evaluate the amount of effective information contained in an unseen video clip, and can therefore be used to extract from an unseen video file the highlight clip containing the most effective information.
Brief description of the drawings
By reading the following detailed description with reference to the accompanying drawings, the above and other objects, features, and advantages of exemplary embodiments of the invention will become easy to understand. The drawings show several embodiments of the invention by way of example and not limitation, in which:
Figures 1A-1B schematically show application scenarios of the model training method, video processing method, and apparatus according to embodiments of the present invention;
Fig. 2 schematically shows a flow chart of a model training method according to an embodiment of the invention;
Fig. 3 schematically shows a flow chart of a model training process according to an embodiment of the invention;
Fig. 4 schematically shows an example diagram of a model training process according to an embodiment of the invention;
Fig. 5 schematically shows a flow chart of a video processing method according to an embodiment of the invention;
Fig. 6 schematically shows an example diagram of a video processing procedure according to an embodiment of the invention;
Fig. 7 schematically shows a block diagram of a model training apparatus according to an embodiment of the invention;
Fig. 8 schematically shows a block diagram of a video processing apparatus according to an embodiment of the invention;
Fig. 9 schematically shows a schematic diagram of a computer-readable storage medium product according to an embodiment of the invention; and
Fig. 10 schematically shows a block diagram of a computing device according to an embodiment of the invention.
In the drawings, identical or corresponding reference numerals indicate identical or corresponding parts.
Detailed description of embodiments
The principle and spirit of the invention are described below with reference to several illustrative embodiments. It should be understood that these embodiments are provided only so that those skilled in the art can better understand and implement the invention, and not to limit the scope of the invention in any way. Rather, they are provided so that this disclosure will be thorough and complete and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the present invention can be implemented as a system, device, apparatus, method, or computer program product. Therefore, the present disclosure may be embodied in the form of complete hardware, complete software (including firmware, resident software, microcode, etc.), or a combination of hardware and software.
According to embodiments of the present invention, a model training method, a video processing method, an apparatus, a medium, and a computing device are proposed.
Herein, it is to be understood that the terms involved include: Graphics Interchange Format (GIF) files, convolutional neural networks (Convolutional Neural Network, CNN), 3D convolutional networks, video clips, and so on. A GIF file is a continuous-tone lossless compression format based on the Lempel-Ziv-Welch (LZW) algorithm. Unlike other static image display formats, a GIF file adds a time dimension and is a continuous, automatically playing dynamic picture. A CNN is a feedforward neural network whose artificial neurons respond to surrounding cells within a partial coverage area; it performs outstandingly on large-scale image processing. When processing images, the convolution kernels used are usually 2-dimensional, with each kernel acting on a feature map (Feature Map); adding pooling layers and the like then captures the local and global features of an image well. Convolutional layers combined with pooling layers, fully connected layers, and so on can build a complete neural network model, whose convolution kernel parameters are learned by the back-propagation algorithm. A 3D convolutional network moves from processing 2-dimensional image data to processing 3-dimensional video data, and the convolution kernels it uses expand from 2 dimensions to 3. The computation is similar to 2-dimensional convolution, except that the added time dimension makes it possible to capture video-level features well.
A video is in fact composed of multiple video clips. These clips have a certain temporal correlation as well as a certain independence. Temporal correlation means that connecting them in chronological order forms a continuous, smooth, complete video. Independence means that when certain clips are shown individually, a user can understand their content without their context. The amount of information contained in each clip also differs. In addition, any number of elements in the drawings is for example rather than limitation, and any naming is only for distinction and carries no limiting meaning.
The principle and spirit of the present invention are explained in detail below with reference to several representative embodiments of the invention.
Overview of the invention
With model training based on the prior art, the resulting model has limited ability to distinguish video clips of different degrees of quality. To this end, embodiments of the invention provide a model training method, a video processing method, and apparatuses. The model training method includes: obtaining multiple video clips; adding a label to each of the multiple video clips, where the label characterizes the amount of effective information contained in the clip; establishing a neural network model including a time dimension; and then training the neural network model with the labeled video clips to obtain an optimized model, where the optimized model is used to extract from a video file the target video clip containing the most effective information. Having described the basic principle of the invention, various non-limiting embodiments of the invention are described in detail below.
Application scenarios overview
Application scenarios of the model training method, video processing method, and apparatus of embodiments of the present invention are first elaborated with reference to Figures 1A and 1B.
Figures 1A-1B schematically show application scenarios of the model training method, video processing method, and apparatus according to embodiments of the present invention.
As shown in Figure 1A, the scenario may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 provides a medium for communication links between the terminal devices 101, 102, and 103 and the server 105, and may include various connection types, such as wired or wireless communication links.
The terminal devices 101, 102, and 103 can be various electronic devices with identical or different computing capabilities that support video playback, including but not limited to smartphones, tablet computers, laptop portable computers, desktop computers, and so on. Various client applications, such as video playback applications (examples only), can be installed on the terminal devices 101, 102, and 103. As shown in Figure 1B, a user watches video data published by the server 105 through a video playback application on the terminal device 101.
A user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104, so as to receive or send messages and the like. The server 105 can be a server providing various services, such as a background management server (merely illustrative) that provides video data, the optimized model, and the like to the terminal devices 101, 102, and 103. The background management server can analyze and otherwise process a received user request, and feed the processing result (such as a web page, information, or data generated according to the user request) back to the terminal device.
It should be noted that the model training method and/or video processing method provided by embodiments of the present disclosure can generally be executed by the server 105. Correspondingly, the model training apparatus and/or video processing apparatus provided by embodiments of the present disclosure can generally be set in the server 105. The model training method and/or video processing method provided by embodiments of the present disclosure can also be executed by the terminal devices 101, 102, and 103; correspondingly, the model training apparatus and/or video processing apparatus provided by embodiments of the present disclosure can be set in the terminal devices 101, 102, and 103. Alternatively, the model training method and/or video processing method provided by embodiments of the present disclosure can also be executed by other servers or server clusters able to communicate with the terminal devices 101, 102, and 103 and/or the server 105; correspondingly, the model training apparatus and/or video processing apparatus provided by embodiments of the present disclosure can also be set in such other servers or server clusters.
It should be understood that the numbers and types of terminal devices, networks, and servers in Figures 1A-1B are only schematic. According to actual needs, there can be any number and any type of terminal devices, networks, and servers.
Illustrative methods
The model training method and video processing method according to exemplary embodiments of the present invention are described below with reference to Figs. 2-6 in the context of the application scenarios of Figures 1A and 1B. It should be noted that the above application scenarios are shown merely to facilitate understanding of the spirit and principle of the invention; embodiments of the invention are not limited in this respect. Rather, embodiments of the invention can be applied to any applicable scene.
Fig. 2 schematically shows a flow chart of a model training method according to an embodiment of the invention.
As shown in Fig. 2, the method may include the following operations S210-S240.
In operation S210, multiple video clips are obtained.
In operation S220, a label is added to each of the multiple video clips.
Here, labels correspond one-to-one to video clips, and the label added to a video clip characterizes the amount of effective information contained in the clip. The more effective information a video clip contains, the more wonderful it should be for a viewer. If any two video clips have the same label, their degrees of quality are comparable; if they have different labels, their degrees of quality differ.
In operation S230, the neural network model comprising time dimension is established.
For the information in a video clip, temporal features are extremely important. Operation S230 extends the commonly used two-dimensional neural network model and establishes a neural network model including a time dimension, which can then capture the timing information of video clips. For example, a convolutional network including a time dimension, i.e., a 3D convolutional network, can be established, whose convolution kernels are 3D kernels obtained by adding a time dimension to 2D kernels. Illustratively, the 3D convolutional network can use any of a variety of neural network models such as C3D, P3D, I3D, or (2+1)D, without restriction here.
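The cost of adding that time dimension can be illustrated with a simple parameter count; the channel and kernel sizes below are arbitrary examples, not values from the patent:

```python
def conv_param_count(kernel_dims, in_channels, out_channels):
    """Weights + biases of one convolution layer with the given kernel shape."""
    weights = in_channels * out_channels
    for d in kernel_dims:
        weights *= d
    return weights + out_channels  # one bias per output channel

# A 3x3 2D kernel versus a 3x3x3 3D kernel over the same channels: the
# extra time extent multiplies the weight count by the kernel's time depth.
params_2d = conv_param_count((3, 3), 3, 64)      # 1792
params_3d = conv_param_count((3, 3, 3), 3, 64)   # 5248
```

This is one reason the computation is described as similar to 2D convolution: only the extra kernel extent along time changes.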
In operation S240, the neural network model is trained with the labeled video clips to obtain an optimized model.
Specifically, the neural network model including the time dimension is trained with the multiple labeled video clips, and the parameters of the neural network model are continually adjusted according to a loss function. When the loss function converges, training is determined to be complete and the optimized model is obtained. The optimized model is used to extract from any video file the target video clip containing the most effective information, i.e., to extract the most wonderful video clip from any video file as the target video clip.
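The adjust-until-convergence loop can be illustrated with a deliberately tiny stand-in: a one-parameter scorer s(x) = w * x trained by gradient descent on a logistic ranking loss, which is monotonically decreasing in the score difference. Everything here (the scalar feature, learning rate, and threshold) is an illustrative assumption, not the patent's training procedure:

```python
import math

def train_tiny_ranker(pairs, lr=0.1, threshold=0.05, max_steps=500):
    """`pairs` holds (x_more, x_less) feature pairs, with the feature of the
    more-informative clip first. Returns the learned weight w."""
    w = 0.0
    for _ in range(max_steps):
        loss, grad = 0.0, 0.0
        for x_more, x_less in pairs:
            diff = w * (x_more - x_less)          # score difference
            loss += math.log1p(math.exp(-diff))   # decreasing in diff
            grad += -(x_more - x_less) / (1.0 + math.exp(diff))
        if loss / len(pairs) <= threshold:        # converged: stop adjusting
            break
        w -= lr * grad / len(pairs)               # adjust the parameters
    return w
```

After training, the learned weight scores the more-informative feature higher, mirroring how the converged model ranks more wonderful clips above less wonderful ones.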
Those skilled in the art will understand that the method shown in Fig. 2 trains a neural network model including a time dimension with labeled video clips, where the labels characterize the amount of effective information contained in the clips. The resulting optimized model can evaluate the amount of effective information contained in an unseen video clip, and can therefore be used to extract from an unseen video file the highlight clip containing the most effective information.
In one embodiment of the invention, the labels added to video clips may include a first label, a second label, and a third label, where the amount of effective information characterized by the first label is greater than that characterized by the second label, which in turn is greater than that characterized by the third label.
In one embodiment of the invention, multiple video clips can be obtained as training samples in at least one of the following ways (1)-(2).
Way (1): first obtain multiple video samples and GIF files associated with them. Then, for each video sample, determine the start position and end position of the associated GIF file within the video sample, and extract from the sample the video clip spanning the start position to the end position.
For example, the open-source dataset Video2GIF contains GIF files generated by a large number of users together with the original videos, from which multiple GIF files and their corresponding video samples can be obtained. Suppose GIF1 and video sample 1 are obtained, and GIF1 is determined to start at 13 minutes 21 seconds and end at 25 minutes 11 seconds within video sample 1. The clip from 13 minutes 21 seconds to 25 minutes 11 seconds can then be extracted from video sample 1 as a subsequent training sample. Note that a GIF file is essentially the extracted clip after compression and other processing; to obtain the full-quality video clip, the GIF file cannot be used directly, and the clip must instead be re-extracted from the video sample according to the start and end positions of the GIF file.
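By way of a non-limiting sketch (not part of the original disclosure), the re-extraction step of mode (1) could be carried out with the FFmpeg tool; the file names here are hypothetical, and stream-copy extraction is one assumed choice among several:

```python
import subprocess

def build_clip_command(video_path, start_s, end_s, out_path):
    """ffmpeg command that copies the segment [start_s, end_s] (seconds)
    out of video_path without re-encoding, preserving original quality."""
    return [
        "ffmpeg", "-ss", str(start_s), "-to", str(end_s),
        "-i", video_path, "-c", "copy", "-y", out_path,
    ]

def extract_clip(video_path, start_s, end_s, out_path):
    # Assumes ffmpeg is installed on the system.
    subprocess.run(build_clip_command(video_path, start_s, end_s, out_path),
                   check=True)

# GIF1 spans 13 min 21 s to 25 min 11 s of video sample 1:
cmd = build_clip_command("video_sample_1.mp4", 13 * 60 + 21, 25 * 60 + 11,
                         "clip_1.mp4")
```

Re-extracting from the source video rather than decoding the GIF avoids the compression losses noted above.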
Since each GIF file was manually clipped from the corresponding video sample by a user, the GIF reflects content of relatively high quality: the corresponding video clip contains more effective information and has a higher degree of highlight quality. Therefore, when labeling the video clips extracted by mode (1), the label class characterizing the larger amount of effective information can be added. Illustratively, the first label is added to each video clip extracted from a video sample between the start position and the end position.
Furthermore, since the GIFs in the open-source dataset were created by a large number of users, they cover content reflecting many different interests; the video clips extracted by mode (1) therefore also cover a variety of scenes, meeting the needs of the subsequent model training.
Mode (2): first obtain multiple video samples, then perform video segmentation on the multiple video samples to obtain multiple video clips.
For example, multiple video samples can be obtained from multiple channels. For each video sample, video segmentation based on scene-change detection is performed using the FFmpeg tool, yielding multiple video clips. Labels can then be added to these clips manually. When a clip shows a complete segment that attracts the viewer's attention, such as a conflict between objects, a sudden highlight, or an unexpected event (e.g., fireworks bursting or a goal being scored in a match), the first label can be added to the clip, indicating a high degree of highlight quality. When a clip shows ordinary motion or scenery (e.g., walking or drinking water), the second label can be added, indicating an average degree of highlight quality. When a clip shows a static or nearly static scene (e.g., a person talking or pondering without any other action), or shows invalid, redundant, repetitive information (e.g., rapid flicker or violent shaking), the third label can be added, indicating a low degree of highlight quality: such a clip will not attract the viewer's attention and may even cause discomfort such as dizziness.
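As a minimal sketch of the scene-based splitting in mode (2) (illustrative only; the 0.4 scene-change threshold and helper names are assumptions, and parsing ffmpeg's showinfo log is omitted):

```python
def build_scene_split_command(video_path, threshold=0.4):
    """ffmpeg command whose showinfo log lines reveal the timestamps of
    frames where the scene-change score exceeds the threshold."""
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"select='gt(scene,{threshold})',showinfo",
        "-f", "null", "-",
    ]

def boundaries_to_segments(duration, boundaries):
    """Turn detected scene-change timestamps into (start, end) clips
    covering the whole video."""
    points = [0.0] + sorted(boundaries) + [duration]
    return [(points[i], points[i + 1]) for i in range(len(points) - 1)]

# A 60 s video with scene changes detected at 12.5 s and 40.0 s:
segments = boundaries_to_segments(60.0, [40.0, 12.5])
```

Each resulting `(start, end)` segment would then be cut out (for example with the stream-copy extraction sketched earlier) and labeled manually as described above.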
In one embodiment of the invention, a convolutional neural network model is established, and one or more convolution kernels containing a time dimension are set for the convolutional neural network model.
Fig. 3 schematically shows a flowchart of a model training process according to an embodiment of the invention, illustrating how operation S240 trains the neural network model using the multiple labeled video clips to obtain the optimized model.
As shown in Fig. 3, the process may include the following operations S241~S242.
In operation S241, multiple sample pairs are constructed based on the multiple labeled video clips.
Each sample pair includes two video clips with different labels. For example, from the acquired video clips, two clips are chosen at a time to form a sample pair, where the labels of the two clips fall into one of the following combinations: (first label, second label), (first label, third label), or (second label, third label). Each sample pair therefore contains one video clip of relatively high highlight quality and one video clip of relatively low highlight quality.
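The pair construction above can be sketched as follows (a hypothetical helper, not the patent's implementation; clip identifiers are placeholders):

```python
from itertools import product

# Higher rank = more effective information, per the first/second/third labels.
LABEL_RANK = {"first": 3, "second": 2, "third": 1}

def build_pairs(clips):
    """clips: list of (clip_id, label). Returns (higher, lower) sample
    pairs whose labels differ, higher-quality clip first."""
    pairs = []
    for (a, la), (b, lb) in product(clips, clips):
        if LABEL_RANK[la] > LABEL_RANK[lb]:
            pairs.append((a, b))
    return pairs

clips = [("c1", "first"), ("c2", "second"), ("c3", "third")]
pairs = build_pairs(clips)
# yields the (first, second), (first, third), (second, third) combinations
```

Clips sharing the same label are never paired, matching the requirement that each sample pair carry two different labels.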
In operation S242, the neural network model is trained using the multiple sample pairs to obtain the optimized model.
It can be appreciated that this embodiment trains the neural network using sample pairs composed of two differently labeled video clips, so that during signal propagation the network gradually learns the difference between the relatively high-quality and relatively low-quality clip in each pair, resulting in an optimized model that can discriminate video clips of different highlight quality.
Illustratively, a convolutional neural network model with one or more convolution kernels containing a time dimension is established as the neural network model used in this example. Training the neural network model with the multiple sample pairs to obtain the optimized model may proceed as follows. Any sample pair includes a first video clip and a second video clip, where the effective information characterized by the label of the first clip is greater than the effective information characterized by the label of the second clip. The sample pair is input to the neural network model, yielding a first score for the first video clip and a second score for the second video clip. The second score is subtracted from the first score to obtain a first value, and a loss value is determined from the first value using a loss function, the loss function being monotonically decreasing. When the loss value is less than or equal to a predetermined threshold, the neural network model is determined to be the optimized model; when the loss value is greater than the predetermined threshold, the parameters of the neural network model are optimized and the above operations are repeated until the optimized model is obtained.
Fig. 4 schematically shows an example diagram of a model training process according to an embodiment of the invention.
In the example shown in Fig. 4, the neural network model is implemented as a 3D convolutional network combined with fully connected layers. A sample pair consisting of a first video clip S+ and a second video clip S- is input to the neural network model. For instance, the first video clip may be a clip with the first label extracted by mode (1) above, and the second video clip a clip with the second label obtained by mode (2) above. After propagation through the operations of the multiple layers, the neural network model outputs a first score h(S+) for the first video clip and a second score h(S-) for the second video clip, where both h(S+) and h(S-) take values in the interval [0, 1]. A loss value loss_p(S+, S-) is determined based on the first score h(S+), the second score h(S-), and the loss function of the neural network model, and can be expressed by formula (1):
loss_p(S+, S-) = max(0, 1 - h(S+) + h(S-))^p
Formula (1)
It can be appreciated that the goal of the training process is to make the score of the relatively high-quality clip in each sample pair exceed the score of the relatively low-quality clip; the further a pair is from this goal, the larger the resulting loss value. When the loss function converges, training of the neural network model is determined to be complete and the optimized model is obtained.
With continued reference to Fig. 4, when the two video clips of a sample pair are input to the neural network model, an image sequence containing a predetermined number of images is first extracted from each clip; this extraction may be uniform. The image sequence is then input to the neural network model, which extracts Convolutional 3D (C3D) features from the sequence and passes them through one or more fully connected layers, a scoring layer, and so on, finally producing the clip's score. For example, the input may be an image sequence consisting of 16 images of size 224 x 224, and the output a score between 0 and 1. For the trained optimized model, the higher a clip's score, the more effective information the clip contains and the higher its highlight quality.
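A toy stand-in for this scoring network is sketched below (PyTorch; the layer sizes are illustrative assumptions and do not reproduce the actual C3D architecture, only its shape: 3D convolutions over time and space, fully connected layers, and a sigmoid score in [0, 1]):

```python
import torch
import torch.nn as nn

class ClipScorer(nn.Module):
    """Minimal 3D-conv clip scorer: input (batch, 3, frames, H, W),
    output one score per clip in [0, 1]."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 8, kernel_size=3, padding=1),  # kernel spans time too
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),                    # collapse time and space
        )
        self.score = nn.Sequential(nn.Flatten(), nn.Linear(8, 1), nn.Sigmoid())

    def forward(self, x):
        return self.score(self.features(x)).squeeze(-1)

model = ClipScorer()
# 2 clips of 16 frames each; small 32x32 frames keep the example fast.
s = model(torch.randn(2, 3, 16, 32, 32))
```

The time dimension of the convolution kernel is what lets the network see motion across the 16 frames rather than scoring each frame in isolation.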
Fig. 5 schematically shows a flowchart of a video processing method according to an embodiment of the invention.
As shown in Fig. 5, the method may include the following operations S510~S530.
In operation S510, a video file is obtained.
In operation S520, an optimized model is obtained.
The optimized model is obtained by training a neural network model according to the model training method described above; the training process has been described in detail above and is not repeated here.
In operation S530, the video file is processed using the optimized model to extract from the video file the target video segment containing the largest amount of effective information.
Processing the video file with the optimized model yields a score for each of one or more video clips in the video file, where a higher clip score characterizes a larger amount of effective information contained in that clip.
Illustratively, processing the video file with the optimized model to extract the target video segment containing the largest amount of effective information may proceed as follows: first perform video segmentation on the video file to obtain multiple candidate video clips; input each candidate clip to the optimized model to obtain its score; then compare the scores of the candidate clips and take the highest-scoring candidate clip as the target video segment of the video file.
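The segment-then-score-then-select procedure reduces to a one-line selection once a scoring function is available (a sketch; the clip names and scores reuse the Fig. 6 example further below):

```python
def pick_target_segment(segments, score_fn):
    """Score each candidate segment with the optimized model (score_fn)
    and return the highest-scoring one as the target video segment."""
    return max(segments, key=score_fn)

# Fig. 6 example: clip t+n has the highest score (0.8).
scores = {"t-n": 0.6, "t": 0.7, "t+n": 0.8, "t+m": 0.1}
best = pick_target_segment(list(scores), scores.get)
```

Here `score_fn` stands in for a full pipeline (frame sampling plus a forward pass through the optimized model); only the final argmax step is shown.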
In one embodiment of the invention, the target video segment extracted from the video file can be used to produce a cover for the video file, so that a cover containing a large amount of effective information attracts users' attention and effectively converts users into viewers of the video file. The video processing method according to an embodiment of the invention may therefore further include: producing a GIF file based on the target video segment, and displaying that GIF file as the cover of the video file.
Fig. 6 schematically shows an example diagram of a video processing procedure according to an embodiment of the invention.
As shown in Fig. 6, the video file is first divided into multiple candidate video clips, which may include, for example, clip t-n, clip t, clip t+n, clip t+m, and so on, where t, m, and n are positive integers and t is greater than n. The segmentation may be based on scenes, or on the behavior of target objects in the video file. Each candidate clip contains multiple consecutive images, and a test image sequence containing a predetermined number of images is extracted from each candidate clip. For example, a test image sequence of 16 images is extracted from each candidate clip, and the extraction is uniform: 16 images are drawn uniformly, in order, from the 100 consecutive images of a candidate clip to form its test image sequence. The test image sequence of each candidate clip is then input to the optimized model, which scores each candidate clip and outputs its score.
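The uniform draw of 16 images from a longer clip can be sketched as an index computation (a hypothetical helper for illustration):

```python
def uniform_indices(num_frames, num_samples=16):
    """Evenly spaced frame indices for a test image sequence, e.g.
    16 frames drawn uniformly from a 100-frame candidate clip."""
    step = num_frames / num_samples
    return [int(i * step) for i in range(num_samples)]

idx = uniform_indices(100)  # 16 indices spread over frames 0..99, in order
```

Sampling by index like this keeps the images in their original temporal order, which matters because the model's convolution kernels span the time dimension.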
Illustratively, clip t-n scores 0.6, clip t scores 0.7, clip t+n scores 0.8, clip t+m scores 0.1, and so on. The highest-scoring clip t+n is determined to be the target video segment, and a corresponding GIF file is produced from it to serve as the cover of the video file.
As can be seen from the above, the video processing method according to an embodiment of the invention can intelligently extract from a video file the highlight segment containing the largest amount of effective information, produce a GIF file from it, and display that GIF as the video file's cover in front of the user. This helps users quickly learn what is interesting in the video file and attracts them to watch it, which can improve user experience and increase click-through rate and viewing time.
Exemplary apparatus
Having described the methods of the exemplary embodiments of the invention, the model training apparatus and video processing apparatus of the exemplary embodiments of the invention are next described in detail with reference to Fig. 7 and Fig. 8.
Fig. 7 schematically shows a block diagram of a model training apparatus according to an embodiment of the invention.
As shown in Fig. 7, the model training apparatus 700 includes a first acquisition module 710, a labeling module 720, a modeling module 730, and a training module 740.
The first acquisition module 710 is configured to obtain multiple video clips.
The labeling module 720 is configured to add labels to the multiple video clips respectively, where a label characterizes the effective information contained in a video clip.
The modeling module 730 is configured to establish a neural network model containing a time dimension.
The training module 740 is configured to train the neural network model using the multiple labeled video clips to obtain an optimized model, where the optimized model is used to extract from a video file the target video segment containing the largest amount of effective information.
In one embodiment of the invention, the labels include a first label, a second label, and a third label. The effective information characterized by the first label is greater than the effective information characterized by the second label, and the effective information characterized by the second label is greater than the effective information characterized by the third label.
In another embodiment of the invention, the training module 740 includes a sample-pair construction submodule and a sample-pair training submodule. The sample-pair construction submodule is configured to construct multiple sample pairs based on the multiple labeled video clips, where each sample pair includes two video clips with different labels. The sample-pair training submodule is configured to train the neural network model using the multiple sample pairs to obtain the optimized model.
In a further embodiment of the invention, the sample-pair training submodule is specifically configured to: for any sample pair including a first video clip and a second video clip, where the effective information characterized by the label of the first clip is greater than that characterized by the label of the second clip, input the sample pair to the neural network model to obtain a first score for the first clip and a second score for the second clip; subtract the second score from the first score to obtain a first value; determine a loss value from the first value using a loss function, the loss function being monotonically decreasing; when the loss value is less than or equal to a predetermined threshold, determine that the neural network model is the optimized model; and when the loss value is greater than the predetermined threshold, optimize the parameters of the neural network model and repeat the above operations until the optimized model is obtained.
In yet another embodiment of the invention, the sample-pair training submodule may input a sample pair to the neural network model as follows: for each video clip of the sample pair, extract from the clip an image sequence containing a predetermined number of images, and input the image sequence to the neural network model.
In yet another embodiment of the invention, the modeling module 730 is specifically configured to establish a convolutional neural network model and to set, for the convolutional neural network model, one or more convolution kernels containing a time dimension.
In yet another embodiment of the invention, the first acquisition module 710 includes a sample acquisition submodule and a clip extraction submodule. The sample acquisition submodule is configured to obtain multiple video samples and GIF files associated with the multiple video samples. The clip extraction submodule is configured to, for any video sample among the multiple video samples, determine the start position and end position of the associated GIF file within that video sample, and extract from the video sample the video clip running from the start position to the end position.
In yet another embodiment of the invention, the labeling module 720 is specifically configured to add the first label to each video clip extracted from a video sample between the start position and the end position.
In yet another embodiment of the invention, the first acquisition module 710 includes a sample acquisition submodule and a segmentation submodule. The sample acquisition submodule is configured to obtain multiple video samples, and the segmentation submodule is configured to perform video segmentation on the multiple video samples to obtain multiple video clips.
Fig. 8 schematically shows a block diagram of a video processing apparatus according to an embodiment of the invention.
As shown in Fig. 8, the video processing apparatus 800 includes a second acquisition module 810 and an extraction module 820.
The second acquisition module 810 is configured to obtain a video file.
The extraction module 820 is configured to process the video file using an optimized model to extract from the video file the target video segment containing the largest amount of effective information, where the optimized model is trained based on the model training method described in any one of the above embodiments.
In one embodiment of the invention, the extraction module 820 includes a video segmentation submodule, an evaluation submodule, and a determination submodule. The video segmentation submodule is configured to perform video segmentation on a video file to obtain multiple candidate video clips. The evaluation submodule is configured to input each candidate video clip to the optimized model to obtain the clip's score. The determination submodule is configured to take the highest-scoring candidate clip as the target video segment.
In another embodiment of the invention, the evaluation submodule is specifically configured to extract from each candidate video clip a test image sequence containing the predetermined number of images and to input the test image sequence to the optimized model.
In a further embodiment of the invention, the video processing apparatus 800 further includes a display module configured to produce a GIF file based on the target video segment and to display the GIF file as the cover of the video file.
It should be noted that, for the embodiments of the modules/units/subunits in the apparatus part, the implementation, the technical problems solved, the functions achieved, and the technical effects attained are the same as or similar to those of the corresponding steps in the method part, and are not repeated here.
Exemplary media
Having described the methods and apparatus of the exemplary embodiments of the invention, a medium for implementing the model training method and/or video processing method of the exemplary embodiments of the invention is introduced next.
An embodiment of the invention provides a medium storing computer-executable instructions which, when executed by a processor, implement the model training method and/or video processing method described in any one of the above method embodiments.
In some possible embodiments, aspects of the invention may also be implemented in the form of a program product including program code; when the program product runs on a computing device, the program code causes the computing device to execute the operation steps of the model training method and/or video processing method of the various exemplary embodiments of the invention described in the "Exemplary methods" section of this specification.
The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
Fig. 9 schematically shows a computer-readable storage medium product according to an embodiment of the invention. As shown in Fig. 9, a program product 90 for implementing the model training method and/or video processing method according to an embodiment of the invention may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a computing device such as a personal computer. However, the program product of the invention is not limited thereto; in this document, a readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
The program code contained on a readable medium may be transmitted by any suitable medium, including, but not limited to, wireless, wireline, optical cable, RF, and the like, or any suitable combination of the above.
Program code for carrying out operations of the invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
Exemplary computing device
Having described the methods, media, and apparatus of the exemplary embodiments of the invention, a computing device for implementing the model training method and/or video processing method according to another exemplary embodiment of the invention is introduced next.
An embodiment of the invention further provides a computing device including a memory, a processor, and executable instructions stored in the memory and runnable on the processor; when the processor executes the instructions, it implements the model training method and/or video processing method described in any one of the above method embodiments.
Those of ordinary skill in the art will appreciate that aspects of the invention may be implemented as a system, method, or program product. Accordingly, aspects of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may be referred to collectively herein as a "circuit", "module", or "system".
In some possible embodiments, a computing device for implementing the model training method and/or video processing method according to the invention may include at least one processing unit and at least one storage unit. The storage unit stores program code which, when executed by the processing unit, causes the processing unit to execute the operation steps of the model training method and/or video processing method of the various exemplary embodiments of the invention described in the "Exemplary methods" section of this specification.
A computing device 100 according to this embodiment of the invention for implementing the model training method and/or video processing method is described below with reference to Fig. 10. The computing device 100 shown in Fig. 10 is only an example and should not impose any limitation on the functionality or scope of use of the embodiments of the invention.
As shown in Fig. 10, the computing device 100 takes the form of a general-purpose computing device. Components of the computing device 100 may include, but are not limited to: the at least one processing unit 1001, the at least one storage unit 1002, and a bus 1003 connecting different system components (including the storage unit 1002 and the processing unit 1001).
The bus 1003 includes a data bus, an address bus, and a control bus.
The storage unit 1002 may include volatile memory, such as random access memory (RAM) 10021 and/or cache memory 10022, and may further include read-only memory (ROM) 10023.
The storage unit 1002 may also include a program/utility 10025 having a set of (at least one) program modules 10024, such program modules 10024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.
The computing device 100 may also communicate with one or more external devices 1004 (such as a keyboard, a pointing device, a Bluetooth device, etc.); such communication may occur via an input/output (I/O) interface 1005. Moreover, the computing device 100 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via a network adapter 1006. As shown, the network adapter 1006 communicates with the other modules of the computing device 100 via the bus 1003. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computing device 100, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
It should be noted that, although several units/modules or subunits/submodules of the model training apparatus and video processing apparatus are mentioned in the above detailed description, this division is merely exemplary and not mandatory. In fact, according to embodiments of the invention, the features and functions of two or more units/modules described above may be embodied in one unit/module; conversely, the features and functions of one unit/module described above may be further divided and embodied by multiple units/modules.
Furthermore, although the operations of the methods of the invention are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desired results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be decomposed into multiple steps for execution.
Although the spirit and principles of the invention have been described with reference to several specific embodiments, it should be understood that the invention is not limited to the specific embodiments disclosed, nor does the division into aspects mean that features in those aspects cannot be combined to advantage; such division is merely for convenience of expression. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
Claims (10)
1. A model training method, comprising:
obtaining multiple video clips;
adding labels to the multiple video clips respectively, wherein the labels are used to characterize the effective information contained in the video clips;
establishing a neural network model containing a time dimension; and
training the neural network model using the multiple labeled video clips to obtain an optimized model, the optimized model being used to extract from a video file a target video segment containing the largest amount of effective information.
2. The method according to claim 1, wherein the labels include a first label, a second label, and a third label, the effective information characterized by the first label being greater than the effective information characterized by the second label, and the effective information characterized by the second label being greater than the effective information characterized by the third label.
3. The method according to claim 2, wherein training the neural network model using the multiple labeled video clips comprises:
constructing multiple sample pairs based on the multiple labeled video clips, each sample pair including two video clips with different labels; and
training the neural network model using the multiple sample pairs to obtain the optimized model.
4. The method according to claim 3, wherein training the neural network model using the multiple sample pairs to obtain the optimized model comprises:
for any sample pair including a first video clip and a second video clip, the effective information characterized by the label of the first video clip being greater than the effective information characterized by the label of the second video clip,
inputting the sample pair to the neural network model to obtain a first score for the first video clip and a second score for the second video clip;
subtracting the second score from the first score to obtain a first value;
determining a loss value from the first value using a loss function, the loss function being a monotonically decreasing function;
when the loss value is less than or equal to a predetermined threshold, determining that the neural network model is the optimized model; and
when the loss value is greater than the predetermined threshold, optimizing the parameters of the neural network model and repeating the above operations until the optimized model is obtained.
5. The method according to claim 4, wherein inputting the sample pair into the neural network model comprises:
for each video clip of the sample pair, extracting from the video clip an image sequence containing a predetermined number of images; and
inputting the image sequence into the neural network model.
6. A video processing method, comprising:
obtaining a video file; and
processing the video file using an optimized model to extract, from the video file, a target video segment containing the maximum effective information, the optimized model being obtained by the model training method according to any one of claims 1 to 5.
7. A model training apparatus, comprising:
a first obtaining module for obtaining a plurality of video clips;
a labeling module for adding a label to each of the plurality of video clips, wherein the label is used to characterize the effective information contained in the video clip;
a modeling module for establishing a neural network model comprising a time dimension; and
a training module for training the neural network model using the plurality of labeled video clips to obtain an optimized model, the optimized model being used to extract, from a video file, a target video segment containing the maximum effective information.
8. A video processing apparatus, comprising:
a second obtaining module for obtaining a video file; and
an extraction module for processing the video file using an optimized model to extract, from the video file, a target video segment containing the maximum effective information, the optimized model being obtained by the model training method according to any one of claims 1 to 5.
9. A medium storing computer-executable instructions which, when executed by a processor, implement:
the model training method according to any one of claims 1 to 5; and/or
the video processing method according to claim 6.
10. A computing device, comprising: a memory, a processor, and executable instructions stored in the memory and executable on the processor, wherein the processor, when executing the instructions, implements:
the model training method according to any one of claims 1 to 5; and/or
the video processing method according to claim 6.
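Claims 3 and 4 describe a pairwise ranking training loop: sample pairs with unequal labels are scored, the second score is subtracted from the first, and a monotonically decreasing loss of that difference is minimized until it falls at or below a predetermined threshold. The sketch below illustrates that loop under two illustrative assumptions not stated in the claims: the scorer is a simple linear model over precomputed clip features, and the loss is the logistic ranking loss log(1 + exp(-diff)), one common monotonically decreasing choice.

```python
import math

def ranking_loss(score_diff):
    # log(1 + exp(-diff)): monotonically decreasing in the score difference,
    # as claim 4 requires. The specific function is an assumption; any
    # monotone decreasing loss would satisfy the claim.
    return math.log(1.0 + math.exp(-score_diff))

def train_pairwise(pairs, weights, lr=0.1, threshold=0.05, max_steps=2000):
    """pairs: list of (x_high, x_low) feature vectors, where x_high comes
    from the clip whose label characterizes more effective information."""
    mean_loss = float("inf")
    for _ in range(max_steps):
        total, grad = 0.0, [0.0] * len(weights)
        for x_high, x_low in pairs:
            s_high = sum(w * x for w, x in zip(weights, x_high))  # first score
            s_low = sum(w * x for w, x in zip(weights, x_low))    # second score
            diff = s_high - s_low                                 # "first value"
            total += ranking_loss(diff)
            g = -1.0 / (1.0 + math.exp(diff))  # d(loss)/d(diff)
            for i in range(len(weights)):
                grad[i] += g * (x_high[i] - x_low[i])
        mean_loss = total / len(pairs)
        if mean_loss <= threshold:
            break  # loss at or below the predetermined threshold: model is optimized
        for i in range(len(weights)):
            weights[i] -= lr * grad[i] / len(pairs)  # optimize the parameters
    return weights, mean_loss
```

Training pushes the score of the higher-labeled clip above the score of the lower-labeled clip, which is exactly what lets the optimized model rank candidate segments by effective information at inference time.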
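Claim 5 feeds each clip into the model as an image sequence of a predetermined length. One plausible way to pick those frames is uniform sampling over the clip; this is an assumption for illustration, since the claim does not specify how the images are selected.

```python
def sample_frame_indices(num_frames, num_images):
    """Evenly spaced frame indices, so a clip of any length yields an
    image sequence containing exactly `num_images` frames."""
    if num_frames <= 0 or num_images <= 0:
        return []
    step = num_frames / num_images
    # Clamp to the last valid index in case of rounding at the tail.
    return [min(int(i * step), num_frames - 1) for i in range(num_images)]
```

For example, a 100-frame clip sampled down to a predetermined quantity of 4 images yields indices 0, 25, 50, and 75.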
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910811249.3A CN110516749A (en) | 2019-08-29 | 2019-08-29 | Model training method, method for processing video frequency, device, medium and calculating equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110516749A true CN110516749A (en) | 2019-11-29 |
Family
ID=68629341
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910811249.3A Pending CN110516749A (en) | 2019-08-29 | 2019-08-29 | Model training method, method for processing video frequency, device, medium and calculating equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110516749A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111314665A (en) * | 2020-03-07 | 2020-06-19 | 上海中科教育装备集团有限公司 | Key video segment extraction system and method for video post-scoring |
CN111984821A (en) * | 2020-06-22 | 2020-11-24 | 汉海信息技术(上海)有限公司 | Method and device for determining dynamic cover of video, storage medium and electronic equipment |
CN113032624A (en) * | 2021-04-21 | 2021-06-25 | 北京奇艺世纪科技有限公司 | Video viewing interest degree determining method and device, electronic equipment and medium |
CN113132753A (en) * | 2019-12-30 | 2021-07-16 | 阿里巴巴集团控股有限公司 | Data processing method and device and video cover generation method and device |
CN113259601A (en) * | 2020-02-11 | 2021-08-13 | 北京字节跳动网络技术有限公司 | Video processing method and device, readable medium and electronic equipment |
EP3961491A1 (en) * | 2020-08-25 | 2022-03-02 | Beijing Xiaomi Pinecone Electronics Co., Ltd. | Method for extracting video clip, apparatus for extracting video clip, and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107864411A (en) * | 2017-10-31 | 2018-03-30 | 广东小天才科技有限公司 | A kind of picture output method and terminal device |
CN109934249A (en) * | 2018-12-14 | 2019-06-25 | 网易(杭州)网络有限公司 | Data processing method, device, medium and calculating equipment |
CN110166827A (en) * | 2018-11-27 | 2019-08-23 | 深圳市腾讯信息技术有限公司 | Determination method, apparatus, storage medium and the electronic device of video clip |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107864411A (en) * | 2017-10-31 | 2018-03-30 | 广东小天才科技有限公司 | A kind of picture output method and terminal device |
CN110166827A (en) * | 2018-11-27 | 2019-08-23 | 深圳市腾讯信息技术有限公司 | Determination method, apparatus, storage medium and the electronic device of video clip |
CN109934249A (en) * | 2018-12-14 | 2019-06-25 | 网易(杭州)网络有限公司 | Data processing method, device, medium and calculating equipment |
Non-Patent Citations (1)
Title |
---|
YIFAN JIAO et al.: "Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection", IEEE TRANSACTIONS ON MULTIMEDIA * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113132753A (en) * | 2019-12-30 | 2021-07-16 | 阿里巴巴集团控股有限公司 | Data processing method and device and video cover generation method and device |
CN113259601A (en) * | 2020-02-11 | 2021-08-13 | 北京字节跳动网络技术有限公司 | Video processing method and device, readable medium and electronic equipment |
US11996124B2 (en) | 2020-02-11 | 2024-05-28 | Beijing Bytedance Network Technology Co., Ltd. | Video processing method, apparatus, readable medium and electronic device |
CN111314665A (en) * | 2020-03-07 | 2020-06-19 | 上海中科教育装备集团有限公司 | Key video segment extraction system and method for video post-scoring |
CN111984821A (en) * | 2020-06-22 | 2020-11-24 | 汉海信息技术(上海)有限公司 | Method and device for determining dynamic cover of video, storage medium and electronic equipment |
EP3961491A1 (en) * | 2020-08-25 | 2022-03-02 | Beijing Xiaomi Pinecone Electronics Co., Ltd. | Method for extracting video clip, apparatus for extracting video clip, and storage medium |
US11847818B2 (en) | 2020-08-25 | 2023-12-19 | Beijing Xiaomi Pinecone Electronics Co., Ltd. | Method for extracting video clip, device for extracting video clip, and storage medium |
CN113032624A (en) * | 2021-04-21 | 2021-06-25 | 北京奇艺世纪科技有限公司 | Video viewing interest degree determining method and device, electronic equipment and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110531860B (en) | Animation image driving method and device based on artificial intelligence | |
CN110516749A (en) | Model training method, method for processing video frequency, device, medium and calculating equipment | |
US20220180882A1 (en) | Training method and device for audio separation network, audio separation method and device, and medium | |
CN110580500B (en) | Character interaction-oriented network weight generation few-sample image classification method | |
CN110519636B (en) | Voice information playing method and device, computer equipment and storage medium | |
CN111476871B (en) | Method and device for generating video | |
CN106126524B (en) | Information pushing method and device | |
CN111611436A (en) | Label data processing method and device and computer readable storage medium | |
EP3095091A1 (en) | Method and apparatus of processing expression information in instant communication | |
CN110209810B (en) | Similar text recognition method and device | |
CN113766299B (en) | Video data playing method, device, equipment and medium | |
CN112203115B (en) | Video identification method and related device | |
CN111738010B (en) | Method and device for generating semantic matching model | |
CN110880324A (en) | Voice data processing method and device, storage medium and electronic equipment | |
KR102284862B1 (en) | Method for providing video content for programming education | |
CN113395578A (en) | Method, device and equipment for extracting video theme text and storage medium | |
CN111159380A (en) | Interaction method and device, computer equipment and storage medium | |
CN110531849A (en) | A kind of intelligent tutoring system of the augmented reality based on 5G communication | |
CN114461853A (en) | Training sample generation method, device and equipment of video scene classification model | |
CN110855487A (en) | Network user similarity management method, device and storage medium | |
CN110309753A (en) | A kind of race process method of discrimination, device and computer equipment | |
CN110516153B (en) | Intelligent video pushing method and device, storage medium and electronic device | |
CN110585730B (en) | Rhythm sensing method and device for game and related equipment | |
CN112016077A (en) | Page information acquisition method and device based on sliding track simulation and electronic equipment | |
CN112115703B (en) | Article evaluation method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191129 |