CN110110140A - Video summarization method based on attention expansion coding and decoding network - Google Patents

Video summarization method based on attention expansion coding and decoding network

Info

Publication number
CN110110140A
CN110110140A CN201910319879.9A
Authority
CN
China
Prior art keywords
sequence
video frame
video
abstract
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910319879.9A
Other languages
Chinese (zh)
Inventor
冀中
焦放
庞彦伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201910319879.9A priority Critical patent/CN110110140A/en
Publication of CN110110140A publication Critical patent/CN110110140A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 - Querying
    • G06F16/738 - Presentation of query results
    • G06F16/739 - Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 - Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 - Assembly of content; Generation of multimedia applications
    • H04N21/854 - Content authoring
    • H04N21/8549 - Creating video summaries, e.g. movie trailer

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A video summarization method based on an attention-extended encoder-decoder network: video summarization is treated as a sequence-to-sequence learning process that exploits the temporal correlation between video frames. An original video from SumMe or TVSum is passed through a pre-trained network to obtain a video frame feature sequence; this feature sequence is fed to the encoder network of the attention-extended encoder-decoder network to obtain the semantic information sequence of the video frames, and a decoder network with multiplicative attention then produces a score for each video frame. The scores of all video frames form a summary sequence, whose semantic information sequence is obtained by a retrospective encoder; a global semantic discrimination loss is constructed, a moving-average model is introduced, and the semantic correlation between the summary sequence and the video frame feature sequence is learned, yielding a new summary sequence that retains the important information of the original video. The set final summary is then selected from the new summary sequence. The invention enhances the robustness of the model.

Description

Video summarization method based on attention expansion coding and decoding network
Technical field
The present invention relates to video summarization, and more particularly to a video summarization method based on an attention-extended encoder-decoder network for video processing and indexing.
Background art
With the rapid development of information technology, video data has grown explosively, and the large volume of video contains much redundant and repeated information, which makes it increasingly difficult for users to quickly obtain the information they need. Video summarization technology arose in this context. Its goal is to generate a compact and comprehensive summary that gives the user the maximum amount of information about the target video in the shortest time, meeting people's need to browse the important content of a video quickly and accurately and improving their ability to acquire information.
Research on video summarization generally falls into two classes: supervised and unsupervised learning methods. Unsupervised summarization methods focus on learning the intrinsic structure of the data, using low-level visual features to locate the important parts of a video; the approaches studied include clustering, sparse optimization, and energy minimization. Most current research is based on supervised learning over manual annotations: by maximizing the similarity between the generated summary and the manual annotations, the generated summary carries more of the original video's information, and the performance of such algorithms is generally better than that of unsupervised video summarization techniques.
Current research mainly treats video summarization as a sequence-to-sequence learning process, using long short-term memory networks (LSTM) and their variants to model the temporal correlation between video frames. The original video frame sequence is taken as input, the importance score of each corresponding video frame is output, the importance scores are sorted, and key frames or key shots are finally selected according to the scores to obtain the final summary.
However, current supervised video summarization methods require the generated summary to be as close as possible to the original video: a loss function is constructed between the generated summary and the corresponding ground truth, and the generated summary is continually optimized through backpropagation so that it approaches the corresponding manual labels as closely as possible and the final summary is rich in original video information. This constraint focuses only on the local correspondence between the generated summary and the true annotations, so the generated summary depends entirely on the true annotations. But public benchmark datasets contain little data with supervision information, so the model easily overfits during training; it is hard to obtain a good deep model, which affects the performance of the generated summary. Moreover, video summarization is essentially a mapping from the original video to the summary, and much key information may be lost in this mapping; how to make full use of the original video's semantic information and mitigate the information loss in the mapping is therefore also a problem to be solved. In addition, when training a neural network with stochastic gradient descent, abrupt changes in the parameters should be avoided during updates to prevent parameter fluctuations from affecting the result.
The above methods focus only on the local correspondence between the generated summary and the true annotations; they neither consider a global constraint on the generated summary nor make full use of the video's semantic information. Nor do they explicitly propose a solution for handling outliers during parameter updates, which likewise affects the final summarization performance.
Summary of the invention
The technical problem to be solved by the invention is to provide a video summarization method based on an attention-extended encoder-decoder network.
The technical solution adopted by the invention is as follows: a video summarization method based on an attention-extended encoder-decoder network, comprising: regarding video summarization as a sequence-to-sequence learning process and exploiting the temporal correlation between video frames; obtaining a video frame feature sequence from the original video in SumMe or TVSum through a pre-trained network; taking the video frame feature sequence as the input of the encoder network in the attention-extended encoder-decoder network to obtain the semantic information sequence of the video frames; then obtaining the score of each video frame through the decoder network with multiplicative attention; forming a summary sequence from the scores of all video frames; obtaining the semantic information sequence of the summary sequence through a retrospective encoder; constructing a global semantic discrimination loss; introducing a moving-average model; and learning the semantic correlation between the summary sequence and the video frame feature sequence to obtain a new summary sequence that retains the important information of the original video, the set final summary being selected from the new summary sequence.
The important information of the original video is the importance score information annotated in SumMe or TVSum.
The method specifically comprises the following steps:
1) Down-sample the original video in SumMe or TVSum at 2 fps to obtain video frames, pass the video frames through GoogLeNet pre-trained on the ImageNet dataset, and extract the video frame feature sequence X = {x1, x2, ..., xT};
2) Input the video frame feature sequence into the encoder of the attention-extended encoder-decoder network, which consists of a bidirectional long short-term memory network; encoding yields the semantic information sequence of the video frames, V = {v1, v2, ..., vT};
3) Input the semantic information sequence of the video frames into the decoder, which consists of a long short-term memory network, and introduce an attention mechanism in the decoder; decoding yields the score of each video frame, and the scores of all video frames form the summary sequence Y = {y1, y2, ..., yL};
4) Input the generated summary sequence into a retrospective encoder consisting of a bidirectional long short-term memory network; encoding yields the semantic information sequence of the summary sequence, U = {u1, u2, ..., uT}. Then, on the basis of the local discriminant loss formed by the semantic information sequence of the summary sequence and the corresponding importance score information annotated in SumMe or TVSum, further introduce the global discrimination loss formed by the semantic information sequence of the video frames and the semantic information sequence of the summary sequence, generating a representative new summary sequence; the loss function built from the local discriminant loss and the global discrimination loss is:
L = Lo + λLs
Wherein: the local discriminant loss is Lo = (1/T) Σi (yi − gi)², where gi denotes the importance score annotated for the i-th video frame in SumMe or TVSum and yi denotes the generated score of each video frame;
The global discrimination loss is Ls = ||V − U||², where V denotes the semantic information sequence of the video frames and U denotes the semantic information sequence of the summary sequence; λ is a trade-off parameter with value 0.001;
5) Introduce a moving-average model: record the average of the past values of each parameter of the networks constituting the encoder, the decoder, and the retrospective encoder over a set time, so that each parameter changes smoothly and parameter mutation is suppressed. Repeat steps 1) to 4) until the scores of all video frames are obtained and form the new summary sequence; the set final summary is finally selected from the new summary sequence.
The semantic information sequence of the video frames in step 2), V = {v1, v2, ..., vT}, is obtained by fusing the hidden state ht^f of the forward long short-term memory network with the hidden state ht^b of the backward long short-term memory network, i.e. vt = [ht^f; ht^b]; V fuses the contextual information of the video frames, with t from 1 to T.
In step 3), the attention mechanism is introduced into the decoding process to improve the accuracy of the score prediction for each video frame. The attention mechanism fuses the summary sequence to guide the prediction of the current video frame's importance score: a similarity function et,i = f(st−1, vi) measures the similarity between the hidden state st−1 of the long short-term memory network at the previous step and the semantic information vi of the current video frame; the attention weights are obtained by normalization, αt,i = exp(et,i) / Σk exp(et,k); the decoder input at this step is then the new semantic information of the video frames, v′t = Σi αt,i vi, from which the score of each video frame is obtained.
On the basis of the local constraint between the generated summary and the manual annotations, the video summarization method based on the attention-extended encoder-decoder network of the invention introduces a global constraint built in the semantic space from the semantic information of the original video and the semantic information of the generated summary, incorporates an attention mechanism into the decoding process, and smooths the parameters during training, enhancing the robustness of the model. Its advantages are mainly reflected in:
1. Novelty: an encoder-decoder model combining attention and retrospective encoding is proposed. It considers the temporal correlation before and after each video frame and better fuses the effective information between video frames, so that the mapping from the original video to the summary is as complete as possible. It also introduces a global semantic discrimination loss, using the semantic information of the original video to guide summary generation and mitigating the scarcity of supervision information in the datasets.
2. Effectiveness: in experiments on SumMe and TVSum and on the corresponding augmented datasets, the method of the invention reaches the current state of the art; in particular, the results on TVSum and its augmented dataset improve on the latest approaches by 3.1% and 2.8%, respectively.
3. Practicality: the method can be used in the field of multimedia signal processing to reduce as far as possible the time users spend obtaining relevant resources, and can improve users' search experience.
Brief description of the drawings
Fig. 1 is a flow chart of the video summarization method based on the attention-extended encoder-decoder network of the present invention.
Specific embodiment
The video summarization method based on the attention-extended encoder-decoder network of the invention is described in detail below with reference to the embodiments and the accompanying drawing.
To enhance the ability of the generated summary to retain the important and relevant information of the original video, the present invention borrows the idea of retrospective encoding and introduces a global semantic discrimination loss: the semantic information of the original video guides the generation of the summary and constrains the summarization process as a whole, and no label information is needed in this step, which reduces the model's dependence on labeled data. Unlike retrospective encoding, however, the present invention takes as its starting point the constraint of obtaining the range information of the video frames and constructing maximal semantic information, fuses the contextual information of the video frames, and does not introduce a mismatch loss between videos; instead, video summarization is regarded as a regression over a single video sequence, whose input is the video frame sequence and whose output is the importance score of each corresponding frame, which reduces the number of model parameters and improves training efficiency. On this basis, the invention adds a multiplicative attention mechanism to the decoder part of the model: not only is the last hidden state of the encoder used as input to the decoder, but a similarity function is built between the current decoder input and the decoder output at the previous step to obtain an importance score for the current decoder input, so that each decoder input is weighted according to its importance. The model thus obtains more of the original video's semantic information, predicts the importance score of each video frame better during decoding, and obtains an ideal set of video frames from which the final summary is generated.
As shown in Fig. 1, the video summarization method based on the attention-extended encoder-decoder network of the invention comprises: regarding video summarization as a sequence-to-sequence learning process and exploiting the temporal correlation between video frames; obtaining a video frame feature sequence from the original video in SumMe or TVSum through a pre-trained network; taking the video frame feature sequence as the input of the encoder network in the attention-extended encoder-decoder network to obtain the semantic information sequence of the video frames; then obtaining the score of each video frame through the decoder network with multiplicative attention; forming a summary sequence from the scores of all video frames; obtaining the semantic information sequence of the summary sequence through a retrospective encoder; constructing the global semantic discrimination loss; introducing the moving-average model; and learning the semantic correlation between the summary sequence and the video frame feature sequence to obtain a new summary sequence that retains the important information of the original video, the important information of the original video being the importance score information annotated in SumMe or TVSum. The set final summary is finally selected from the new summary sequence.
The video summarization method based on the attention-extended encoder-decoder network of the invention specifically comprises the following steps:
1) Down-sample the original video in SumMe or TVSum at 2 fps to obtain video frames, pass the video frames through GoogLeNet pre-trained on the ImageNet dataset, and extract the video frame feature sequence X = {x1, x2, ..., xT};
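As an illustrative sketch of step 1), the 2 fps down-sampling reduces to an index computation. `sample_frame_indices` is a hypothetical helper name, and the GoogLeNet feature-extraction step is only indicated in a comment, since it requires an external pre-trained model (e.g. torchvision's) that is outside the scope of this sketch.

```python
def sample_frame_indices(num_frames: int, video_fps: float, target_fps: float = 2.0):
    """Indices of the frames kept when down-sampling a video to target_fps."""
    step = video_fps / target_fps              # keep one frame every `step` originals
    n_kept = int(num_frames / step)
    return [int(round(i * step)) for i in range(n_kept)]

# A 10-second clip at 30 fps (300 frames) keeps 20 frames at 2 fps; each kept
# frame would then be passed through GoogLeNet pre-trained on ImageNet to
# obtain one feature vector x_t, giving X = {x_1, ..., x_T}.
idx = sample_frame_indices(num_frames=300, video_fps=30.0)
```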
2) Input the video frame feature sequence into the encoder of the attention-extended encoder-decoder network, which consists of a bidirectional long short-term memory network (Bi-LSTM); encoding yields the semantic information sequence of the video frames, V = {v1, v2, ..., vT};
The semantic information sequence of the video frames, V = {v1, v2, ..., vT}, is obtained by fusing the hidden state ht^f of the forward long short-term memory network (LSTM) with the hidden state ht^b of the backward long short-term memory network: vt = [ht^f; ht^b]. V fuses the contextual information of the video frames, with t from 1 to T.
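The hidden-state fusion vt = [ht^f; ht^b] can be sketched with a toy bidirectional encoder. A plain tanh RNN cell stands in for the LSTM cells purely for brevity, and all dimensions and weight names are illustrative assumptions.

```python
import numpy as np

def bidirectional_encode(X, Wf, Uf, Wb, Ub):
    """Run a forward and a backward recurrent pass over the frame features X
    (T x D) and fuse them: v_t is the concatenation of the two hidden states."""
    T = X.shape[0]
    H = Wf.shape[0]
    h_fwd = np.zeros((T, H))
    h_bwd = np.zeros((T, H))
    h = np.zeros(H)
    for t in range(T):                          # forward pass over the frames
        h = np.tanh(Wf @ X[t] + Uf @ h)
        h_fwd[t] = h
    h = np.zeros(H)
    for t in reversed(range(T)):                # backward pass
        h = np.tanh(Wb @ X[t] + Ub @ h)
        h_bwd[t] = h
    return np.concatenate([h_fwd, h_bwd], axis=1)   # V = {v_1, ..., v_T}

rng = np.random.default_rng(0)
T, D, H = 5, 8, 4
X = rng.normal(size=(T, D))
V = bidirectional_encode(X,
                         rng.normal(size=(H, D)), rng.normal(size=(H, H)),
                         rng.normal(size=(H, D)), rng.normal(size=(H, H)))
```

In the actual method the two recurrent passes would be LSTM cells rather than plain tanh units; the concatenation step is the fusion described above.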
3) Input the semantic information sequence of the video frames into the decoder, which consists of a long short-term memory network, and introduce an attention mechanism in the decoder; decoding yields the score of each video frame, and the scores of all video frames form the summary sequence Y = {y1, y2, ..., yL};
The present invention introduces the attention mechanism into the decoding process to improve the accuracy of the score prediction for each video frame. The attention mechanism fuses the summary sequence to guide the prediction of the current video frame's importance score: a similarity function et,i = f(st−1, vi) measures the similarity between the hidden state st−1 of the long short-term memory network at the previous step and the semantic information vi of the current video frame; the attention weights are obtained by normalization, αt,i = exp(et,i) / Σk exp(et,k); the decoder input at this step is then the new semantic information of the video frames, v′t = Σi αt,i vi, from which the score of each video frame is obtained.
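A minimal sketch of the multiplicative attention step described above. The bilinear score e_i = s_prev^T W v_i is one common form of multiplicative attention and is an assumption here, since the exact similarity function is not reproduced in the text; the softmax normalization and weighted sum follow the description directly.

```python
import numpy as np

def multiplicative_attention(s_prev, V, W):
    """Score each frame semantic v_i against the previous decoder hidden
    state s_{t-1} with a bilinear form e_i = s_prev^T W v_i, normalize with
    a softmax to get weights alpha, and return the attended context
    v'_t = sum_i alpha_i v_i."""
    e = V @ (W.T @ s_prev)                 # one similarity score per frame
    e = e - e.max()                        # softmax numerical stability
    alpha = np.exp(e) / np.exp(e).sum()    # attention weights, sum to 1
    return alpha, alpha @ V                # weights and context vector

rng = np.random.default_rng(1)
T, Dv, H = 5, 6, 4                         # frames, semantic dim, decoder dim
alpha, context = multiplicative_attention(rng.normal(size=H),
                                          rng.normal(size=(T, Dv)),
                                          rng.normal(size=(H, Dv)))
```

Replacing the bilinear score with a dot product or a small MLP would change only the `e` line; the normalization and the weighted sum are the parts fixed by the text.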
4) Input the generated summary sequence into a retrospective encoder consisting of a bidirectional long short-term memory network; encoding yields the semantic information sequence of the summary sequence, U = {u1, u2, ..., uT}. Then, on the basis of the local discriminant loss formed by the semantic information sequence of the summary sequence and the corresponding importance score information annotated in SumMe or TVSum, further introduce the global discrimination loss formed by the semantic information sequence of the video frames and the semantic information sequence of the summary sequence, generating a representative new summary sequence; the loss function built from the local discriminant loss and the global discrimination loss is:
L = Lo + λLs
Wherein: the local discriminant loss is Lo = (1/T) Σi (yi − gi)², where gi denotes the importance score annotated for the i-th video frame in SumMe or TVSum and yi denotes the generated score of each video frame;
The global discrimination loss is Ls = ||V − U||², where V denotes the semantic information sequence of the video frames and U denotes the semantic information sequence of the summary sequence; λ is a trade-off parameter with value 0.001;
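Under one plausible reading of the two losses (mean squared error for the local term, mean squared semantic distance for the global term; the text does not reproduce the exact formulas), the combined objective L = Lo + λLs can be sketched as:

```python
import numpy as np

def summary_loss(y, g, V, U, lam=0.001):
    """L = Lo + lambda * Ls, with Lo the local discriminant loss between
    generated scores y and annotated scores g, and Ls the global semantic
    discrimination loss between frame semantics V and summary semantics U.
    Both concrete forms (mean squared differences) are assumptions."""
    Lo = float(np.mean((y - g) ** 2))      # local: match annotated importance
    Ls = float(np.mean((V - U) ** 2))      # global: match semantic sequences
    return Lo + lam * Ls
```

With λ = 0.001 as given in the text, the global term acts as a mild semantic regularizer on top of the local supervised loss.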
5) Introduce a moving-average model: record the average of the past values of each parameter of the networks constituting the encoder, the decoder, and the retrospective encoder over a set time, so that each parameter changes smoothly and parameter mutation is suppressed, which increases the robustness of the video summarization method based on the attention-extended encoder-decoder network. Repeat steps 1) to 4) until the scores of all video frames are obtained and form the new summary sequence; the set final summary is finally selected from the new summary sequence.
The moving-average model dynamically sets the decay rate according to the training round when training the neural network with stochastic gradient descent, thereby controlling the update amplitude of the parameters: at the start of training the parameters update quickly, while near the optimum they update more slowly and with smaller amplitude, so that through training each parameter finally stabilizes near a value close to the true weight. In the test phase, the smoothed parameters are used for prediction, improving the performance of the final model on test data. The parameters are updated as s_r = d * s_{r−1} + (1 − d) * v, where s_r denotes the parameter after the r-th training round, s_{r−1} is the previously updated parameter, and v is the parameter value at the current round. The decay rate is d = min{id, (1 + r)/(10 + r)}, where id is the initial decay rate set and r is the number of rounds of parameter updates.
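The parameter smoothing above is a standard exponential moving average; a minimal sketch follows, with `init_decay` playing the role of the initial decay rate `id` (the default value 0.999 is a typical choice, not one given in the text):

```python
def ema_update(shadow: float, current: float, r: int, init_decay: float = 0.999) -> float:
    """One moving-average step: s_r = d * s_{r-1} + (1 - d) * v, with the
    decay rate d = min(init_decay, (1 + r) / (10 + r)). Early in training
    (small r) d is small, so the smoothed value moves quickly; later d
    approaches init_decay and updates become smooth."""
    d = min(init_decay, (1.0 + r) / (10.0 + r))
    return d * shadow + (1.0 - d) * current

# First round: d = 0.1, so the shadow value moves 90% of the way to `current`.
s = ema_update(shadow=0.0, current=1.0, r=0)
```

At test time the smoothed (shadow) values would be used in place of the raw parameters, as the text describes.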

Claims (5)

1. A video summarization method based on an attention-extended encoder-decoder network, characterized by comprising: regarding video summarization as a sequence-to-sequence learning process and exploiting the temporal correlation between video frames; obtaining a video frame feature sequence from the original video in SumMe or TVSum through a pre-trained network; taking the video frame feature sequence as the input of the encoder network in the attention-extended encoder-decoder network to obtain the semantic information sequence of the video frames; then obtaining the score of each video frame through the decoder network with multiplicative attention; forming a summary sequence from the scores of all video frames; obtaining the semantic information sequence of the summary sequence through a retrospective encoder; constructing a global semantic discrimination loss; introducing a moving-average model; learning the semantic correlation between the summary sequence and the video frame feature sequence to obtain a new summary sequence that retains the important information of the original video; and finally selecting the set final summary from the new summary sequence.
2. The video summarization method based on the attention-extended encoder-decoder network according to claim 1, characterized in that the important information of the original video is the importance score information annotated in SumMe or TVSum.
3. The video summarization method based on the attention-extended encoder-decoder network according to claim 1, characterized by specifically comprising the following steps:
1) down-sampling the original video in SumMe or TVSum at 2 fps to obtain video frames, passing the video frames through GoogLeNet pre-trained on the ImageNet dataset, and extracting the video frame feature sequence X = {x1, x2, ..., xT};
2) inputting the video frame feature sequence into the encoder of the attention-extended encoder-decoder network, which consists of a bidirectional long short-term memory network, and encoding to obtain the semantic information sequence of the video frames, V = {v1, v2, ..., vT};
3) inputting the semantic information sequence of the video frames into the decoder, which consists of a long short-term memory network, introducing an attention mechanism in the decoder, decoding to obtain the score of each video frame, and forming the summary sequence Y = {y1, y2, ..., yL} from the scores of all video frames;
4) inputting the generated summary sequence into a retrospective encoder consisting of a bidirectional long short-term memory network, and encoding to obtain the semantic information sequence of the summary sequence, U = {u1, u2, ..., uT}; then, on the basis of the local discriminant loss formed by the semantic information sequence of the summary sequence and the corresponding importance score information annotated in SumMe or TVSum, further introducing the global discrimination loss formed by the semantic information sequence of the video frames and the semantic information sequence of the summary sequence, and generating a representative new summary sequence, wherein the loss function built from the local discriminant loss and the global discrimination loss is:
L = Lo + λLs
wherein the local discriminant loss is Lo = (1/T) Σi (yi − gi)², gi denotes the importance score annotated for the i-th video frame in SumMe or TVSum, and yi denotes the generated score of each video frame;
the global discrimination loss is Ls = ||V − U||², where V denotes the semantic information sequence of the video frames, U denotes the semantic information sequence of the summary sequence, and λ is a trade-off parameter with value 0.001;
5) introducing a moving-average model, recording the average of the past values of each parameter of the networks constituting the encoder, the decoder, and the retrospective encoder over a set time so that each parameter changes smoothly and parameter mutation is suppressed, and repeating steps 1) to 4) until the scores of all video frames are obtained and form the new summary sequence, the set final summary being finally selected from the new summary sequence.
4. The video summarization method based on the attention-extended encoder-decoder network according to claim 3, characterized in that the semantic information sequence of the video frames in step 2), V = {v1, v2, ..., vT}, is obtained by fusing the hidden state ht^f of the forward long short-term memory network with the hidden state ht^b of the backward long short-term memory network, i.e. vt = [ht^f; ht^b]; V fuses the contextual information of the video frames, with t from 1 to T.
5. The video summarization method based on the attention-extended encoder-decoder network according to claim 3, characterized in that the attention mechanism introduced into the decoding process in step 3) is used to improve the accuracy of the score prediction for each video frame; the attention mechanism fuses the summary sequence to guide the prediction of the current video frame's importance score, i.e. a similarity function et,i = f(st−1, vi) measures the similarity between the hidden state st−1 of the long short-term memory network at the previous step and the semantic information vi of the current video frame, the attention weights are obtained by normalization, αt,i = exp(et,i) / Σk exp(et,k), the decoder input at this step is the new semantic information of the video frames, v′t = Σi αt,i vi, and the score of each video frame is obtained from the new semantic information sequence.
CN201910319879.9A 2019-04-19 2019-04-19 Video summarization method based on attention expansion coding and decoding network Pending CN110110140A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910319879.9A CN110110140A (en) 2019-04-19 2019-04-19 Video summarization method based on attention expansion coding and decoding network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910319879.9A CN110110140A (en) 2019-04-19 2019-04-19 Video summarization method based on attention expansion coding and decoding network

Publications (1)

Publication Number Publication Date
CN110110140A true CN110110140A (en) 2019-08-09

Family

ID=67486054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910319879.9A Pending CN110110140A (en) 2019-04-19 2019-04-19 Video summarization method based on attention expansion coding and decoding network

Country Status (1)

Country Link
CN (1) CN110110140A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110418156A (en) * 2019-08-27 2019-11-05 上海掌门科技有限公司 Information processing method and device
CN110929094A (en) * 2019-11-20 2020-03-27 北京香侬慧语科技有限责任公司 Video title processing method and device
CN111046966A (en) * 2019-12-18 2020-04-21 江南大学 Image subtitle generating method based on measurement attention mechanism
CN111414471A (en) * 2020-03-20 2020-07-14 北京百度网讯科技有限公司 Method and apparatus for outputting information
CN111460979A (en) * 2020-03-30 2020-07-28 上海大学 Key lens video abstraction method based on multi-layer space-time frame
CN111526434A * 2020-04-24 2020-08-11 西北工业大学 Transformer-based video summarization method
CN111984820A (en) * 2019-12-19 2020-11-24 重庆大学 Video abstraction method based on double-self-attention capsule network
CN112468888A (en) * 2020-11-26 2021-03-09 广东工业大学 Video abstract generation method and system based on GRU network
CN112818828A (en) * 2021-01-27 2021-05-18 中国科学技术大学 Weak supervision time domain action positioning method and system based on memory network
CN113111218A (en) * 2021-03-23 2021-07-13 华中师范大学 Unsupervised video abstraction method of bidirectional LSTM model based on visual saliency modulation
CN113204670A (en) * 2021-05-24 2021-08-03 合肥工业大学 Attention model-based video abstract description generation method and device
CN113301422A (en) * 2021-05-24 2021-08-24 腾讯音乐娱乐科技(深圳)有限公司 Method, terminal and storage medium for acquiring video cover
CN114254158A (en) * 2022-02-25 2022-03-29 北京百度网讯科技有限公司 Video generation method and device, and neural network training method and device
CN115544244A (en) * 2022-09-06 2022-12-30 内蒙古工业大学 Cross fusion and reconstruction-based multi-mode generative abstract acquisition method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101308501A (en) * 2008-06-30 2008-11-19 腾讯科技(深圳)有限公司 Method, system and device for generating a video summary
CN104346440A (en) * 2014-10-10 2015-02-11 浙江大学 Neural-network-based cross-media Hash indexing method
US20150220543A1 (en) * 2009-08-24 2015-08-06 Google Inc. Relevance-based image selection
CN107484017A (en) * 2017-07-25 2017-12-15 天津大学 Supervised video summary generation method based on an attention model
CN107729821A (en) * 2017-09-27 2018-02-23 浙江大学 Video summarization method based on one-dimensional sequence learning
CN108228570A (en) * 2018-01-31 2018-06-29 延安大学 Document representation method based on entity burst characteristics
CN108334889A (en) * 2017-11-30 2018-07-27 腾讯科技(深圳)有限公司 Abstract description generation method and device, abstract descriptive model training method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ke Zhang, Kristen Grauman, Fei Sha: "Retrospective Encoders for Video Summarization", European Conference on Computer Vision (ECCV) *
Zhong Ji, Kailin Xiong, Yanwei Pang, Xuelong Li: "Video Summarization with Attention-Based Encoder-Decoder Networks", arXiv *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110418156A (en) * 2019-08-27 2019-11-05 上海掌门科技有限公司 Information processing method and device
CN110929094A (en) * 2019-11-20 2020-03-27 北京香侬慧语科技有限责任公司 Video title processing method and device
CN110929094B (en) * 2019-11-20 2023-05-16 北京香侬慧语科技有限责任公司 Video title processing method and device
CN111046966A (en) * 2019-12-18 2020-04-21 江南大学 Image subtitle generating method based on measurement attention mechanism
CN111984820A (en) * 2019-12-19 2020-11-24 重庆大学 Video abstraction method based on double-self-attention capsule network
CN111984820B (en) * 2019-12-19 2023-10-27 重庆大学 Video abstraction method based on double self-attention capsule network
CN111414471A (en) * 2020-03-20 2020-07-14 北京百度网讯科技有限公司 Method and apparatus for outputting information
CN111460979A (en) * 2020-03-30 2020-07-28 上海大学 Key lens video abstraction method based on multi-layer space-time frame
CN111526434A (en) * 2020-04-24 2020-08-11 西北工业大学 Converter-based video abstraction method
CN112468888A (en) * 2020-11-26 2021-03-09 广东工业大学 Video abstract generation method and system based on GRU network
CN112818828A (en) * 2021-01-27 2021-05-18 中国科学技术大学 Weak supervision time domain action positioning method and system based on memory network
CN112818828B (en) * 2021-01-27 2022-09-09 中国科学技术大学 Weak supervision time domain action positioning method and system based on memory network
CN113111218A (en) * 2021-03-23 2021-07-13 华中师范大学 Unsupervised video abstraction method of bidirectional LSTM model based on visual saliency modulation
CN113204670A (en) * 2021-05-24 2021-08-03 合肥工业大学 Attention model-based video abstract description generation method and device
CN113204670B (en) * 2021-05-24 2022-12-09 合肥工业大学 Attention model-based video abstract description generation method and device
CN113301422A (en) * 2021-05-24 2021-08-24 腾讯音乐娱乐科技(深圳)有限公司 Method, terminal and storage medium for acquiring video cover
CN114254158B (en) * 2022-02-25 2022-06-10 北京百度网讯科技有限公司 Video generation method and device, and neural network training method and device
CN114254158A (en) * 2022-02-25 2022-03-29 北京百度网讯科技有限公司 Video generation method and device, and neural network training method and device
CN115544244A (en) * 2022-09-06 2022-12-30 内蒙古工业大学 Cross fusion and reconstruction-based multi-mode generative abstract acquisition method
CN115544244B (en) * 2022-09-06 2023-11-17 内蒙古工业大学 Multi-mode generation type abstract acquisition method based on cross fusion and reconstruction

Similar Documents

Publication Publication Date Title
CN110110140A (en) Video summarization method based on attention expansion coding and decoding network
CN110348016A Text summary generation method based on a sentence-association attention mechanism
Perez-Martin et al. Improving video captioning with temporal composition of a visual-syntactic embedding
CN109522411A Neural-network-based writing assistance method
CN109767759A End-to-end speech recognition method based on a modified CLDNN structure
CN112650886B (en) Cross-modal video time retrieval method based on cross-modal dynamic convolution network
Zhu et al. Dual learning for semi-supervised natural language understanding
CN109189862A Knowledge base construction method for scientific and technological information analysis
Chen et al. Joint multiple intent detection and slot filling via self-distillation
CN110349597A Speech detection method and device
CN113190656B (en) Chinese named entity extraction method based on multi-annotation frame and fusion features
CN111899766B (en) Speech emotion recognition method based on optimization fusion of depth features and acoustic features
CN114037945A (en) Cross-modal retrieval method based on multi-granularity feature interaction
Liu et al. Jointly encoding word confusion network and dialogue context with BERT for spoken language understanding
WO2023231513A1 (en) Conversation content generation method and apparatus, and storage medium and terminal
Yang et al. Research on students’ adaptive learning system based on deep learning model
Peng et al. Dual contrastive learning network for graph clustering
CN114943216B (en) Case microblog attribute level view mining method based on graph attention network
CN114548090B (en) Fast relation extraction method based on convolutional neural network and improved cascade labeling
Chunlei et al. Research and Implementation of Automatic Composition System Based on DLGN
Liu et al. A Survey of Speech Recognition Based on Deep Learning
Dong Using deep learning and genetic algorithms for melody generation and optimization in music
CN117633239B (en) End-to-end face emotion recognition method combining combined category grammar
Saha et al. Word Sense Induction with Knowledge Distillation from BERT
CN112528667B (en) Domain migration method and device on semantic analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190809