CN109885728A - Video summarization method based on meta learning - Google Patents

Video summarization method based on meta learning

Info

Publication number
CN109885728A
CN109885728A (application CN201910037959.5A; granted as CN109885728B)
Authority
CN
China
Prior art keywords
video
model
parameter
learner
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910037959.5A
Other languages
Chinese (zh)
Other versions
CN109885728B (en)
Inventor
Li Xuelong
Li Hongli
Dong Yongsheng
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN201910037959.5A priority Critical patent/CN109885728B/en
Publication of CN109885728A publication Critical patent/CN109885728A/en
Application granted granted Critical
Publication of CN109885728B publication Critical patent/CN109885728B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention relates to a video summarization method based on meta-learning. Following the idea of meta-learning, the method treats the summarization of each video as an independent video-summarization task and trains a learner model over the space of such tasks, thereby improving the generalization ability of the model and exploring the mechanism of video summarization. Specifically, the invention uses the video summarization long short-term memory neural network (vsLSTM) as the learner model. The method of the invention mainly comprises: (1) randomly dividing all tasks (the summarization problem of each video in the datasets) into a training set and a test set; (2) training the learner model across the tasks of the training set with the two-stage learning scheme proposed by this method, thereby exploring the video summarization mechanism; (3) evaluating the performance of the model on the test set to complete the performance evaluation.

Description

Video summarization method based on meta learning
Technical field
The invention belongs to the technical field of computer vision and addresses a key problem in machine learning and pattern recognition. The invention summarizes a video by extracting its key frames, which reduces the time people spend browsing videos, and can be applied to video retrieval, video management, and the like.
Background art
With the wide availability of capture devices such as mobile phones and portable cameras, massive amounts of video data have emerged, and a large volume of video data is produced and propagated every day. On the one hand, these data provide people with rich information; on the other hand, the time consumed in browsing and retrieving them is considerable. Against this background, video summarization, as a video condensation technique, has received extensive attention from researchers in the field of computer vision.
Video summarization refers to analyzing the spatio-temporal redundancy present in video structure and content in a semi-automatic or fully automatic way, removing the redundant segments (frames) of the original video, and extracting the meaningful segments (frames). It not only improves the efficiency with which people browse videos but also lays a foundation for subsequent video analysis and processing, and it is widely used in video retrieval, video management, and so on. It has attracted attention since its emergence, and many representative methods have been proposed. However, because different people focus on different content when browsing a video, no summarization method so far is universal or fully satisfies people's needs; the study of video summarization algorithms therefore still has broad room for exploration.
Because of the inherent structure and sequential nature of video data and the excellent sequence-modeling performance of the long short-term memory neural network (LSTM), most recent methods adopt LSTM as the basic model. For example, Zhang et al., in K. Zhang, W. L. Chao, F. Sha, and K. Grauman, "Video summarization with long short-term memory," in Proc. Eur. Conf. Comput. Vis., pp. 766-782, 2016, proposed the video summarization LSTM (vsLSTM) and the determinantal point process LSTM (dppLSTM), two representative video summarization network models of recent years refined from the basic LSTM model, which model the temporal dependencies of different lengths in a video well. Zhou and Qiao, in K. Zhou and Y. Qiao, "Deep reinforcement learning for unsupervised video summarization with diversity-representativeness reward," arXiv:1801.00054, 2017, proposed unsupervised and supervised versions of the deep summarization network (DSN), incorporating the idea of deep reinforcement learning into the training of the LSTM network to better capture the structural characteristics of video data. Ji et al., in Z. Ji, K. Xiong, Y. Pang, and X. Li, "Video summarization with attention-based encoder-decoder networks," arXiv:1708.09545, 2017, proposed an attention-based encoder-decoder video summarization architecture (AVS), which combines an encoder with LSTM as the basic model and a decoder based on an attention mechanism to extract the key frames of a video.
The existing methods have the following problems:
1) they focus more on the structure or sequential nature of the video data than on the video summarization task itself;
2) they do not yet explicitly require the model to explore the mechanism of video summarization, so the generalization ability of the model is limited.
Summary of the invention
Technical problem to be solved
To address the deficiencies of the existing methods above, the present invention provides a video summarization method based on meta-learning. Following the idea of meta-learning, the method treats the summarization of each video as an independent video-summarization task and trains the model over the space of such tasks, so that it focuses more on the video summarization task itself; through learning in the space of video summarization tasks, the method explicitly requires the model to explore a video summarization mechanism, thereby improving the generalization ability of the model.
Technical solution
A video summarization method based on meta-learning, characterized by the following steps:
Step 1: prepare the datasets
Use the open-source video summarization datasets SumMe, TVSum, Youtube, and OVP: when SumMe is the test set, Youtube and OVP form the training set and TVSum the validation set; when TVSum is the test set, Youtube and OVP form the training set and SumMe the validation set;
Step 2: extract video frame features
Input each video frame to the GoogLeNet network and take the output of its penultimate layer as the frame's deep feature; use a color histogram, GIST, HOG, and dense SIFT as traditional features, where the color histogram is extracted from the RGB form of the video frame and the other traditional features are extracted from the corresponding grayscale image;
Step 3: train the video summarization model
Use the two-stage network training algorithm based on the meta-learning idea to learn the parameter θ of the learner model, the vsLSTM network f_θ. Before training, randomly initialize the model parameter θ to θ_0; the i-th iteration updates the model parameter from θ_{i-1} to θ_i, and each training iteration consists of a two-stage stochastic gradient descent process:
The first stage updates the parameter from θ_{i-1} to θ'_n. Randomly select a task T_j from the training set, compute the performance f_{θ_{i-1}}(T_j) of the learner on this task under the current parameter θ_{i-1} and the loss L_{T_j}(f_{θ_{i-1}}), take the derivative of L_{T_j}(f_{θ_{i-1}}) with respect to θ_{i-1}, and update the learner parameter from θ_{i-1} to θ'_1; the performance of the learner model f_{θ'_1} on the same task can then be computed again to update the parameter to θ'_2. This parameter update is carried out n times, where n is a positive integer, as shown in formula (1):

θ'_k = θ'_{k-1} − α∇_{θ'_{k-1}} L_{T_j}(f_{θ'_{k-1}}),  k = 1, …, n,  θ'_0 = θ_{i-1}    (1)

where α denotes the learning rate, and L_{T_j}(f_{θ'_{k-1}}) denotes the L1 loss of the learner model f_{θ'_{k-1}} on task T_j, the parameter of the learner model being θ'_{k-1}. The L1 loss is defined as:

L(y, x) = (1/N) Σ_{i=1}^{N} |y_i − x_i|    (2)

where y denotes the output vector of the model, x denotes the ground-truth vector, and N denotes the number of elements in the vector;
The second stage updates the parameter from θ'_n to θ_i: randomly select a task T_m from the training set, compute the performance f_{θ'_n}(T_m) of the learner on this task under the parameter θ'_n and the loss L_{T_m}(f_{θ'_n}), take the derivative of L_{T_m}(f_{θ'_n}) with respect to θ_{i-1}, and update the learner parameter to θ_i, as shown in formula (3):

θ_i = θ_{i-1} − β∇_{θ_{i-1}} L_{T_m}(f_{θ'_n})    (3)

where β denotes the meta-learning rate, used as a hyperparameter in the method of the invention, and L_{T_m}(f_{θ'_n}) denotes the L1 loss of the learner model f_{θ'_n} on task T_m, the parameter of the learner model being θ'_n;
This two-stage training algorithm acts as the meta-learner model that guides the training of the learner model vsLSTM so as to explore the video summarization mechanism; by maximizing the generalization ability of the learner model on the test set, i.e., minimizing the expected generalization error of the learner model on the test set, the parameter θ of the learner model is obtained after multiple iterations;
Step 4: input the video frame features from step 2 into the learner model vsLSTM trained in step 3 to obtain, for each frame, the probability of being selected into the video summary.
The specific procedure of step 4 is as follows: first, according to the probabilities or scores output by vsLSTM, divide the video into temporally disjoint segments; then take the average of the frame scores within each segment as the score of that segment, and sort the segments in descending order of score; keep segments in order starting from the highest-scoring one and, to keep the selected summary from becoming too long, stop when the total length of the kept segments reaches 15% of the original video length; the video segments selected at this point form the summary of the original video.
Beneficial effects
The video summarization method based on meta-learning proposed by the present invention has the following beneficial effects:
1) it is the first to apply the idea of meta-learning to the video summarization problem;
2) it proposes a simple and effective training method for video summarization models that makes the model focus more on the video summarization task itself;
3) aiming to improve the generalization ability of the model, it explicitly requires the video summarization model to explore the video summarization mechanism;
4) qualitative and quantitative comparison experiments prove that the algorithm of the invention is advanced and effective and has high practical application value.
Brief description of the drawings
Fig. 1 is the overall conceptual flow chart of the present invention
Fig. 2 is a schematic diagram of one iteration of the training method proposed by the present invention
Fig. 3 is a schematic diagram of the performance of the present invention under different hyperparameters
Fig. 4 shows visualization results of the present invention
Specific embodiment
The invention will now be further described in conjunction with the embodiments and the accompanying drawings:
The technical solution of the present invention comprises the following steps:
1) Prepare the datasets
This method uses the open-source video summarization datasets SumMe (M. Gygli, H. Grabner, H. Riemenschneider, and L. Van Gool, "Creating summaries from user videos," in Proc. Eur. Conf. Comput. Vis., pp. 505-520, 2014), TVSum (Y. Song, J. Vallmitjana, A. Stent, and A. Jaimes, "TVSum: summarizing web videos using titles," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 5179-5187, 2015), Youtube (S. E. F. De Avila, A. P. B. Lopes, A. da Luz Jr., and A. de Albuquerque Araújo, "VSUMM: a mechanism designed to produce static video summaries and a novel evaluation method," Pattern Recognit. Lett., vol. 32, no. 1, pp. 56-68, 2011), and OVP (Open Video Project, http://www.open-video.org/).
To examine the generalization ability of the model, SumMe and TVSum are used in turn as the test set, with the other three datasets used for training and validation. When SumMe is the test set, TVSum, Youtube, and OVP serve for training and validation: Youtube and OVP form the training set and TVSum the validation set. When TVSum is the test set, Youtube and OVP form the training set and SumMe the validation set.
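For illustration only, the dataset roles described above can be written down as a small configuration mapping; this is a minimal sketch, and the key names are illustrative rather than taken from the patent:

    # Dataset roles for the two evaluation settings described above.
    # Key names are illustrative only.
    SPLITS = {
        "SumMe_as_test": {"train": ["Youtube", "OVP"], "val": ["TVSum"], "test": ["SumMe"]},
        "TVSum_as_test": {"train": ["Youtube", "OVP"], "val": ["SumMe"], "test": ["TVSum"]},
    }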
2) Extract video frame features
To verify the effectiveness of the model, the method of the present invention uses two kinds of features, deep and traditional, respectively. Each video frame is input to the GoogLeNet network model (C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich et al., "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015) and the output of its penultimate layer is taken as the frame's deep feature. The traditional features are a color histogram, GIST, HOG (Histogram of Oriented Gradients), and dense SIFT (Scale-Invariant Feature Transform), where the color histogram is extracted from the RGB form of the video frame and the other traditional features are extracted from the corresponding grayscale image.
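As an illustration of the deep-feature step, the following minimal Python sketch extracts penultimate-layer (1024-dimensional) GoogLeNet features per frame. It assumes a recent torchvision with an ImageNet-pretrained GoogLeNet, obtains the penultimate activation by replacing the final classifier with an identity, and uses a hypothetical frame format (HxWx3 uint8 RGB arrays) that the patent does not specify:

    import torch
    import torchvision.models as models
    import torchvision.transforms as T

    # Assumed setup: torchvision GoogLeNet pretrained on ImageNet; replacing
    # the classifier with Identity makes the forward pass return the 1024-d
    # penultimate-layer feature instead of class logits.
    model = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
    model.fc = torch.nn.Identity()
    model.eval()

    preprocess = T.Compose([
        T.ToPILImage(),
        T.Resize((224, 224)),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    @torch.no_grad()
    def frame_features(frames):
        # frames: list of HxWx3 uint8 RGB arrays -> (num_frames, 1024) tensor
        batch = torch.stack([preprocess(f) for f in frames])
        return model(batch)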
3) Train the video summarization model
This method proposes the two-stage network training algorithm MetaL-VS based on the meta-learning idea. Each training iteration consists of a two-stage stochastic gradient descent procedure; this two-stage training algorithm acts as the meta-learner model that guides the training of the learner model, with vsLSTM serving as the learner model, so as to explore the video summarization mechanism.
As shown in Fig. 1, following the idea of meta-learning, this method treats the summarization of each video as an independent video-summarization task, and the model is trained over the space of such tasks; finally, by treating the summarization of a test video as a new task, the model produces the summary of that video. Specifically, the method proposes the two-stage network training algorithm based on the meta-learning idea to learn the parameter θ of the learner model f_θ (the implementation uses the vsLSTM network as the learner model). As shown in Fig. 2, the i-th iteration updates the model parameter from θ_{i-1} to θ_i (before training, the model parameter is randomly initialized to θ_0), and each training iteration consists of a two-stage stochastic gradient descent process.
The first stage updates the parameter from θ_{i-1} to θ'_n (n = 2 in the illustrated case): randomly select a task T_j from the training set, compute the performance f_{θ_{i-1}}(T_j) of the learner on this task under the current parameter θ_{i-1} and the loss L_{T_j}(f_{θ_{i-1}}), take the derivative of L_{T_j}(f_{θ_{i-1}}) with respect to θ_{i-1}, and update the learner parameter from θ_{i-1} to θ'_1; the performance of the learner model f_{θ'_1} on the same task can then be computed again to update the parameter to θ'_2. In principle this parameter update can be carried out n times (n a positive integer), as shown in formula (1):

θ'_k = θ'_{k-1} − α∇_{θ'_{k-1}} L_{T_j}(f_{θ'_{k-1}}),  k = 1, …, n,  θ'_0 = θ_{i-1}    (1)

where α denotes the learning rate, used as a hyperparameter in the method of the invention, and L_{T_j}(f_{θ'_{k-1}}) denotes the L1 loss of the learner model f_{θ'_{k-1}} on task T_j, the parameter of the learner model being θ'_{k-1}. The L1 loss is defined as:

L(y, x) = (1/N) Σ_{i=1}^{N} |y_i − x_i|    (2)

where y denotes the output vector of the model, x denotes the ground-truth vector, and N denotes the number of elements in the vector.
The second stage updates the parameter from θ'_n to θ_i: randomly select a task T_m from the training set, compute the performance f_{θ'_n}(T_m) of the learner on this task under the parameter θ'_n and the loss L_{T_m}(f_{θ'_n}), take the derivative of L_{T_m}(f_{θ'_n}) with respect to θ_{i-1}, and update the learner parameter to θ_i, as shown in formula (3):

θ_i = θ_{i-1} − β∇_{θ_{i-1}} L_{T_m}(f_{θ'_n})    (3)

where β denotes the meta-learning rate, used as a hyperparameter in the method of the invention, and L_{T_m}(f_{θ'_n}) denotes the L1 loss of the learner model f_{θ'_n} on task T_m, the parameter of the learner model being θ'_n.
This two-stage training algorithm acts as the meta-learner model that guides the training of the learner model (vsLSTM) so as to explore the video summarization mechanism. By maximizing the generalization ability of the learner model on the test set (minimizing its expected generalization error on the test set), the parameter θ of the learner model is obtained after multiple iterations.
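As a concrete illustration of one MetaL-VS iteration, the following PyTorch-style sketch follows formulas (1)-(3) with a first-order approximation: formula (3) differentiates through the inner updates with respect to θ_{i-1}, which the sketch approximates by applying the adapted model's gradient directly to the original parameters. The names vslstm and tasks are assumed stand-ins for the learner network and a list of (frame features, ground-truth scores) video tasks; the default hyperparameters follow the values reported in the experiments below:

    import copy
    import random
    import torch
    import torch.nn.functional as F

    def meta_iteration(vslstm, tasks, alpha=1e-4, beta=1e-3, n=1):
        # Stage 1 (formula (1)): n inner SGD steps on a random task T_j,
        # starting from a copy of the current parameters theta_{i-1}.
        features_j, gt_j = random.choice(tasks)
        adapted = copy.deepcopy(vslstm)
        inner_opt = torch.optim.SGD(adapted.parameters(), lr=alpha)
        for _ in range(n):
            loss = F.l1_loss(adapted(features_j), gt_j)  # L1 loss, formula (2)
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()

        # Stage 2 (formula (3), first-order approximation): evaluate the
        # adapted model on another random task T_m and apply its gradient to
        # the ORIGINAL parameters with the meta-learning rate beta.
        features_m, gt_m = random.choice(tasks)
        meta_loss = F.l1_loss(adapted(features_m), gt_m)
        grads = torch.autograd.grad(meta_loss, list(adapted.parameters()))
        with torch.no_grad():
            for p, g in zip(vslstm.parameters(), grads):
                p -= beta * g
        return float(meta_loss)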
4) Output the video summary
The input of the video summarization model is the video frame features (deep or traditional features), and the output is, for each frame, the probability of being selected into the summary (the output is a vector whose length equals the number of frames and whose elements each lie between 0 and 1; each element of the vector indicates the probability that the corresponding video frame is selected into the video summary, which can also be understood as an importance score for that frame). Following the method of K. Zhang, W. L. Chao, F. Sha, and K. Grauman, "Video summarization with long short-term memory," in Proc. Eur. Conf. Comput. Vis., pp. 766-782, 2016, the output of this method is converted into a summary. Inputting the features of each frame of a test video into the trained learner model and processing the output yields the video summary.
Specific steps: first, according to the probabilities or scores output by vsLSTM, divide the video into temporally disjoint segments with Kernel Temporal Segmentation (KTS) (following K. Zhang, W. L. Chao, F. Sha, and K. Grauman, "Video summarization with long short-term memory," in Proc. Eur. Conf. Comput. Vis., pp. 766-782, 2016); then take the average of the frame scores within each segment as the score of that segment and sort the segments in descending order of score; keep segments in order starting from the highest-scoring one (in descending order of segment score) and, to keep the selected summary from becoming too long, stop when the total length of the kept segments reaches 15% of the original video length. The video segments selected at this point form the summary of the original video.
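A minimal sketch of this selection step, assuming frame_scores is the per-frame vsLSTM output for one video and segments is a list of (start, end) frame-index pairs produced by a temporal segmentation such as KTS (the segmentation itself is not shown); all names are illustrative:

    import numpy as np

    def build_summary(frame_scores, segments, budget=0.15):
        # Score each segment by the mean score of its frames.
        seg_scores = [frame_scores[s:e].mean() for s, e in segments]
        order = np.argsort(seg_scores)[::-1]  # descending by segment score
        mask = np.zeros(len(frame_scores), dtype=bool)
        kept = 0
        for idx in order:
            s, e = segments[idx]
            if kept + (e - s) > budget * len(frame_scores):
                break  # stop at the 15% length budget
            mask[s:e] = True  # keep this segment in the summary
            kept += e - s
        return mask  # per-frame indicator of the summary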
1) Simulation conditions
The simulations of the present invention were carried out as Python programs under Anaconda on a machine with an Intel i5-3470 3.2 GHz CPU, 16 GB of memory, and the CentOS operating system. The datasets used in the experiments were obtained from the following public databases:
SumMe dataset(http://classif.ai/dataset/ethz-cvl-video-summe)
TVSum dataset(https://github.com/yalesong/tvsum)
Youtube dataset(http://www.npdi.dcc.ufmg.br/VSUMM)
OVP dataset(http://www.open-video.org)
The SumMe dataset contains 25 annotated videos, and TVSum, Youtube, and OVP each contain 50 annotated videos. When training the learner model, the training set includes the ground truth, while the ground truth of the test set is hidden. When SumMe is the test set, 10 videos are randomly selected from TVSum as the validation set, and the remaining TVSum videos together with the videos in Youtube and OVP form the training set; when TVSum is the test set, 25 videos are randomly selected from TVSum as the test set, the remaining TVSum videos serve as the validation set, and the other three datasets form the training set. In our experiments, the test set is used to verify the effectiveness of our method. The performance metric is the F-score F:

F = 2PR / (P + R)

where P denotes precision and R denotes recall:

P = |A ∩ B| / |A|,   R = |A ∩ B| / |B|

where A denotes the summary generated by the model and B denotes the ground truth.
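A short sketch of this metric, computing precision, recall, and F-score from per-frame boolean masks for the generated summary A and the ground truth B; measuring the overlap per frame is a common convention in this literature, assumed here since the patent does not spell out the units:

    import numpy as np

    def f_score(pred_mask, gt_mask):
        # Overlap between generated summary A (pred_mask) and ground truth B (gt_mask).
        overlap = np.logical_and(pred_mask, gt_mask).sum()
        precision = overlap / max(pred_mask.sum(), 1)  # P = |A ∩ B| / |A|
        recall = overlap / max(gt_mask.sum(), 1)       # R = |A ∩ B| / |B|
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)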
2) Simulation content
(1) To show the process of exploring the hyperparameters that make the method of the present invention perform better (the learning rate lr, the meta-learning rate mlr, and the number n of first-stage parameter updates), we evaluated the model's performance under different hyperparameters in our experiments.
Fig. 3 shows the model's performance under different hyperparameters. It can be seen from the figure that the model performs best on both datasets when lr is 0.0001 and mlr is 0.001.
Table 1 shows the F-score of the model on both datasets for different values of the hyperparameter n; bold numbers indicate the best result. Because the experiments were limited by the memory of the graphics card used, the maximum value of n was 2; an out-of-memory error occurred when n was greater than 2. As can be seen from the table, the model performs best on both datasets when the value of the hyperparameter n is 1.
Table 1. Performance (F-score) of the model on both datasets for different values of the hyperparameter n

n        1        2
SumMe    44.1%    42.5%
TVSum    58.2%    58.1%
(2) To prove the effectiveness of the proposed algorithm, in experiment 2 we compare it with representative methods of recent years. The first comparison method was proposed by Gygli et al. in 2015; for details see M. Gygli, H. Grabner, and L. Van Gool, "Video summarization by learning submodular mixtures of objectives," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 3090-3098. The second comparison method is vsLSTM; for details see K. Zhang, W. L. Chao, F. Sha, and K. Grauman, "Video summarization with long short-term memory," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 766-782. The third comparison method was proposed by Zhang et al. in 2016; for details see K. Zhang, W. L. Chao, F. Sha, and K. Grauman, "Summary transfer: exemplar-based subset selection for video summarization," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 1059-1067. The fourth comparison method is SUM-GANsup; for details see B. Mahasseni, M. Lam, and S. Todorovic, "Unsupervised video summarization with adversarial LSTM networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017. The fifth comparison method is DR-DSNsup; for details see K. Zhou and Y. Qiao, "Deep reinforcement learning for unsupervised video summarization with diversity-representativeness reward," arXiv:1801.00054, 2017. The sixth comparison method was proposed by Li et al. in 2017; for details see X. Li, B. Zhao, and X. Lu, "A general framework for edited video and raw video summarization," IEEE Trans. Image Process., vol. 26, no. 8, pp. 3652-3664, 2017. Table 2 compares the quantitative F-score results; bold numbers indicate the best result. As can be seen from the table, the method MetaL-VS proposed herein performs best in the comparison. The comparison with representative methods of recent years in this field thus further demonstrates the advanced nature of the invention.
Fig. 4 shows visualization results of MetaL-VS, where the Air_Force_One and car_over_camera videos are from the SumMe dataset and the AwmHb44_ouw and qqR6AEXwxoQ videos are from the TVSum dataset. The blue part of each histogram is the ground truth, i.e., the manually annotated probability of each frame being a summary frame; the red part is the result of MetaL-VS; the pictures below the histograms are a few example frames from the MetaL-VS summary results. It can be seen from the figure that, although there are some deviations, MetaL-VS selects frames of high importance from the original video and ignores frames that are not important enough. The visualization thus shows the effectiveness of the invention.
Table 2. F-score comparison of seven video summarization methods

Method                          SumMe     TVSum
Gygli et al.                    39.7%     -
vsLSTM                          40.7%     56.9%
Zhang et al.                    40.9%     -
SUM-GANsup                      41.7%     56.3%
DR-DSNsup                       42.1%     58.1%
Li et al.                       43.1%     52.7%
MetaL-VS (present invention)    44.1%     58.2%
(3) To test the robustness of the method MetaL-VS of the present invention to traditional features, we compared its video summarization performance on traditional features with two representative methods of the last two years. The first comparison method is SUM-GANsup; for details see B. Mahasseni, M. Lam, and S. Todorovic, "Unsupervised video summarization with adversarial LSTM networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017. The second comparison method is dppLSTM; for details see K. Zhang, W. L. Chao, F. Sha, and K. Grauman, "Video summarization with long short-term memory," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 766-782. Table 3 compares the quantitative F-scores; bold numbers indicate the best result. As can be seen from the table, MetaL-VS achieves performance on par with the classic methods of the last two years and, on the SumMe dataset, exceeds the two comparison methods by 4 and 2.8 percentage points, respectively. The performance of MetaL-VS on traditional features shows that the invention has a certain robustness and generalization ability with respect to traditional features.
Table 3. F-score comparison when using traditional features

Method                          SumMe     TVSum
SUM-GANsup                      39.5%     59.5%
dppLSTM                         40.7%     57.9%
MetaL-VS (present invention)    43.5%     57.9%
The method of the present invention is the first to explore applying meta-learning to the field of video summarization. Based on the idea of meta-learning, the video summarization model is trained over the space of video summarization tasks. This helps the model focus more on the video summarization task itself rather than merely on the structured, sequential video data, better supports the model's exploration of the video summarization mechanism, and helps improve the generalization ability of the model. Qualitative and quantitative comparison experiments prove that the algorithm of the invention is advanced and effective.

Claims (2)

1. A video summarization method based on meta-learning, characterized by the following steps:
Step 1: prepare the datasets
Use the open-source video summarization datasets SumMe, TVSum, Youtube, and OVP: when SumMe is the test set, Youtube and OVP form the training set and TVSum the validation set; when TVSum is the test set, Youtube and OVP form the training set and SumMe the validation set;
Step 2: extract video frame features
Input each video frame to the GoogLeNet network and take the output of its penultimate layer as the frame's deep feature; use a color histogram, GIST, HOG, and dense SIFT as traditional features, where the color histogram is extracted from the RGB form of the video frame and the other traditional features are extracted from the corresponding grayscale image;
Step 3: train the video summarization model
Use the two-stage network training algorithm based on the meta-learning idea to learn the parameter θ of the learner model, the vsLSTM network f_θ; before training, randomly initialize the model parameter θ to θ_0; the i-th iteration updates the model parameter from θ_{i-1} to θ_i, and each training iteration consists of a two-stage stochastic gradient descent process:
The first stage updates the parameter from θ_{i-1} to θ'_n: randomly select a task T_j from the training set, compute the performance f_{θ_{i-1}}(T_j) of the learner on this task under the current parameter θ_{i-1} and the loss L_{T_j}(f_{θ_{i-1}}), take the derivative of L_{T_j}(f_{θ_{i-1}}) with respect to θ_{i-1}, and update the learner parameter from θ_{i-1} to θ'_1; the performance of the learner model f_{θ'_1} on the same task can then be computed again to update the parameter to θ'_2; this parameter update is carried out n times, where n is a positive integer, as shown in the following formula:

θ'_k = θ'_{k-1} − α∇_{θ'_{k-1}} L_{T_j}(f_{θ'_{k-1}}),  k = 1, …, n,  θ'_0 = θ_{i-1}    (1)

where α denotes the learning rate, and L_{T_j}(f_{θ'_{k-1}}) denotes the L1 loss of the learner model f_{θ'_{k-1}} on task T_j, the parameter of the learner model being θ'_{k-1}; the L1 loss is defined as:

L(y, x) = (1/N) Σ_{i=1}^{N} |y_i − x_i|    (2)

where y denotes the output vector of the model, x denotes the ground-truth vector, and N denotes the number of elements in the vector;
The second stage updates the parameter from θ'_n to θ_i: randomly select a task T_m from the training set, compute the performance f_{θ'_n}(T_m) of the learner on this task under the parameter θ'_n and the loss L_{T_m}(f_{θ'_n}), take the derivative of L_{T_m}(f_{θ'_n}) with respect to θ_{i-1}, and update the learner parameter to θ_i, as shown in formula (3):

θ_i = θ_{i-1} − β∇_{θ_{i-1}} L_{T_m}(f_{θ'_n})    (3)

where β denotes the meta-learning rate, used as a hyperparameter in the method of the invention, and L_{T_m}(f_{θ'_n}) denotes the L1 loss of the learner model f_{θ'_n} on task T_m, the parameter of the learner model being θ'_n;
This two-stage training algorithm acts as the meta-learner model that guides the training of the learner model vsLSTM so as to explore the video summarization mechanism; by maximizing the generalization ability of the learner model on the test set, i.e., minimizing the expected generalization error of the learner model on the test set, the parameter θ of the learner model is obtained after multiple iterations;
Step 4: input the video frame features from step 2 into the learner model vsLSTM trained in step 3 to obtain, for each frame, the probability of being selected into the video summary.
2. The video summarization method based on meta-learning according to claim 1, characterized in that the specific procedure of step 4 is as follows: first, according to the probabilities or scores output by vsLSTM, divide the video into temporally disjoint segments; then take the average of the frame scores within each segment as the score of that segment and, according to the segment scores, sort the segments in descending order; keep segments in order starting from the highest-scoring one and, to keep the selected summary from becoming too long, stop when the total length of the kept segments reaches 15% of the original video length; the video segments selected at this point form the summary of the original video.
CN201910037959.5A 2019-01-16 2019-01-16 Video abstraction method based on meta-learning Active CN109885728B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910037959.5A CN109885728B (en) 2019-01-16 2019-01-16 Video abstraction method based on meta-learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910037959.5A CN109885728B (en) 2019-01-16 2019-01-16 Video abstraction method based on meta-learning

Publications (2)

Publication Number Publication Date
CN109885728A true CN109885728A (en) 2019-06-14
CN109885728B CN109885728B (en) 2022-06-07

Family

ID=66926054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910037959.5A Active CN109885728B (en) 2019-01-16 2019-01-16 Video abstraction method based on meta-learning

Country Status (1)

Country Link
CN (1) CN109885728B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239501A (en) * 2014-09-10 2014-12-24 中国电子科技集团公司第二十八研究所 Mass video semantic annotation method based on Spark
US20180357543A1 (en) * 2016-01-27 2018-12-13 Bonsai AI, Inc. Artificial intelligence system configured to measure performance of artificial intelligence over time
CN107103614A (en) * 2017-04-12 2017-08-29 合肥工业大学 The dyskinesia detection method encoded based on level independent element
CN107590505A (en) * 2017-08-01 2018-01-16 天津大学 The learning method of joint low-rank representation and sparse regression
CN109064493A (en) * 2018-08-01 2018-12-21 北京飞搜科技有限公司 A kind of method for tracking target and device based on meta learning
CN109213896A (en) * 2018-08-06 2019-01-15 杭州电子科技大学 Underwater video abstraction generating method based on shot and long term memory network intensified learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BEHROOZ MAHASSENI et al.: "Unsupervised Video Summarization with Adversarial LSTM Networks", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
K. ZHANG et al.: "Video summarization with long short-term memory", Proceedings of the European Conference on Computer Vision *
SACHIN RAVI et al.: "Optimization as a Model for Few-Shot Learning", published as a conference paper at ICLR 2017 *
XUELONG LI et al.: "Meta Learning for Task-Driven Video Summarization", IEEE Transactions on Industrial Electronics *
CUI Jianshuang et al.: "An automatic selection framework for optimization algorithms based on meta-learning recommendation and an empirical analysis", Journal of Computer Applications *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062284A (en) * 2019-12-06 2020-04-24 浙江工业大学 Visual understanding and diagnosing method of interactive video abstract model
CN111062284B (en) * 2019-12-06 2023-09-29 浙江工业大学 Visual understanding and diagnosis method for interactive video abstract model
CN111031390A (en) * 2019-12-17 2020-04-17 南京航空航天大学 Dynamic programming-based method for summarizing video of determinant point process with fixed output size
CN111031390B (en) * 2019-12-17 2022-10-21 南京航空航天大学 Method for summarizing process video of outputting determinant point with fixed size
CN111526434A (en) * 2020-04-24 2020-08-11 西北工业大学 Converter-based video abstraction method
CN112884160A (en) * 2020-12-31 2021-06-01 北京爱笔科技有限公司 Meta learning method and related device
CN112884160B (en) * 2020-12-31 2024-03-12 北京爱笔科技有限公司 Meta learning method and related device
JP7378172B2 (en) 2022-02-03 2023-11-13 インハ インダストリー パートナーシップ インスティテュート Unsupervised video summarization method and apparatus with efficient keyframe selection reward function

Also Published As

Publication number Publication date
CN109885728B (en) 2022-06-07

Similar Documents

Publication Publication Date Title
CN109885728A (en) Video summarization method based on meta learning
Qi et al. Attentive relational networks for mapping images to scene graphs
Kordopatis-Zilos et al. Near-duplicate video retrieval by aggregating intermediate cnn layers
Escorcia et al. Daps: Deep action proposals for action understanding
Mei et al. Patch based video summarization with block sparse representation
CN106033426A (en) A latent semantic min-Hash-based image retrieval method
CN105718532A (en) Cross-media sequencing method based on multi-depth network structure
Yang et al. Cascaded split-and-aggregate learning with feature recombination for pedestrian attribute recognition
Chen et al. Multi-scale adaptive task attention network for few-shot learning
Li et al. Meta learning for task-driven video summarization
CN113239159B (en) Cross-modal retrieval method for video and text based on relational inference network
Shen et al. Hierarchical Attention Based Spatial-Temporal Graph-to-Sequence Learning for Grounded Video Description.
Zhang et al. Self-guided adaptation: Progressive representation alignment for domain adaptive object detection
CN105701516B (en) A kind of automatic image marking method differentiated based on attribute
CN103473308B (en) High-dimensional multimedia data classifying method based on maximum margin tensor study
CN105701225A (en) Cross-media search method based on unification association supergraph protocol
Song et al. A weighted topic model learned from local semantic space for automatic image annotation
Lin et al. Scene recognition using multiple representation network
Chen et al. Learning to focus: cascaded feature matching network for few-shot image recognition
Wang et al. Fast and accurate action detection in videos with motion-centric attention model
Papagiannopoulou et al. Concept-based image clustering and summarization of event-related image collections
Li et al. Meta-reweighted regularization for unsupervised domain adaptation
Fu et al. Video summarization with a dual attention capsule network
Mithun et al. Generating diverse image datasets with limited labeling
Lv et al. Retrieval oriented deep feature learning with complementary supervision mining

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant