CN110110140A - Video summarization method based on attention expansion coding and decoding network - Google Patents
- Publication number
- CN110110140A CN110110140A CN201910319879.9A CN201910319879A CN110110140A CN 110110140 A CN110110140 A CN 110110140A CN 201910319879 A CN201910319879 A CN 201910319879A CN 110110140 A CN110110140 A CN 110110140A
- Authority
- CN
- China
- Prior art keywords
- sequence
- video frame
- video
- abstract
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/73—Querying
- G06F16/738—Presentation of query results
- G06F16/739—Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
- H04N21/8549—Creating video summaries, e.g. movie trailer
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Computer Security & Cryptography (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
A video summarization method based on an attention expansion encoding-decoding network. Video summarization is treated as a sequence-to-sequence learning process that exploits the temporal correlation between video frames. An original video from SumMe or TVSum is passed through a pre-trained network to obtain a video-frame feature sequence, which serves as the input of the encoder network in the attention expansion encoding-decoding network and yields a semantic information sequence of the video frames; a decoding network with multiplicative attention then produces a score for each video frame. The scores of all video frames form a summary sequence, whose own semantic information sequence is obtained by a retrospective encoder. A global semantic discrimination loss is constructed, a moving-average model is introduced, and the semantic correlation between the summary sequence and the video-frame feature sequence is learned, yielding a new summary sequence that retains the important information of the original video; the final summary is selected from this new summary sequence. The invention enhances the robustness of the model.
Description
Technical field
The present invention relates to video summarization, and more particularly to a video summarization method based on an attention expansion encoding-decoding network for video processing and indexing.
Background technique
With the rapid development of information technology, video data has grown explosively, and large collections of videos contain much redundant and repeated information, making it increasingly difficult for users to obtain the information they need quickly. Video summarization techniques arose in response: their goal is to generate a compact yet comprehensive summary that gives the user the maximum information about a target video in the shortest time, meeting the need to browse the important content of a video quickly and accurately and improving people's ability to acquire information.
Research on video summarization generally falls into two classes: supervised and unsupervised learning methods. Unsupervised methods focus on learning the intrinsic structure of the data and use low-level visual features to locate the important parts of a video; the approaches studied include clustering, sparse optimization, and energy minimization. Current research is mostly based on supervised learning with manual annotations: by maximizing the similarity between the generated summary and the manual annotation, the generated summary retains more of the original video's information, and the performance of such algorithms is generally better than that of video summarization techniques based on unsupervised learning.
Current video summarization research mainly treats the task as a sequence-to-sequence learning process, using long short-term memory networks (LSTM) and their variants to model the temporal correlation between video frames. The original video frame sequence is taken as input, an importance score is output for each frame, the scores are sorted, and key frames or key shots are selected according to the scores to obtain the final summary.
However, current supervised video summarization methods require the generated summary to be as close as possible to the original video: a loss function is constructed between the generated summary and the corresponding ground truth, and the summary is then continually optimized through backpropagation so that it matches the manual labels as closely as possible, making the generated summary rich in the original video's information. This constraint focuses only on the local correspondence between the generated summary and the ground truth, so the generated summary depends entirely on the annotations. Because the publicly available benchmark datasets contain relatively little supervisory information, models easily overfit during training, a good deep model is hard to obtain, and the performance of the final summary suffers. Moreover, video summarization is essentially a mapping from the original video to the summary, and much key information may be lost in this mapping; how to make full use of the semantic information of the original video and mitigate the information loss during the mapping is therefore a problem to be solved. In addition, when a neural network is trained with stochastic gradient descent, parameter updates should avoid abrupt changes so that parameter fluctuations do not affect the results.
The methods above attend only to the local correspondence between the generated summary and the ground truth; they do not impose a global constraint on the generated summary, fail to make full use of the semantic information of the video, and propose no explicit solution for handling outliers during parameter updates, all of which affect the final summarization performance.
Summary of the invention
The technical problem to be solved by the invention is to provide a video summarization method based on an attention expansion encoding-decoding network.
The technical scheme adopted by the invention is a video summarization method based on an attention expansion encoding-decoding network, comprising: treating video summarization as a sequence-to-sequence learning process that exploits the temporal correlation between video frames; passing an original video from SumMe or TVSum through a pre-trained network to obtain a video-frame feature sequence; feeding the feature sequence to the encoder network of the attention expansion encoding-decoding network to obtain a semantic information sequence of the video frames; obtaining a score for each video frame through a decoding network with multiplicative attention; forming a summary sequence from the scores of all video frames; obtaining the semantic information sequence of the summary sequence with a retrospective encoder; constructing a global semantic discrimination loss and introducing a moving-average model to learn the semantic correlation between the summary sequence and the video-frame feature sequence, yielding a new summary sequence that retains the important information of the original video; and finally selecting the final summary from the new summary sequence.
The original video's important information is the importance score information annotated in SumMe or TVSum.
The method specifically comprises the following steps:
1) The original video from SumMe or TVSum is downsampled at 2 fps to obtain video frames, which are passed through a GoogLeNet network pre-trained on the ImageNet dataset to extract the video-frame feature sequence X = {x1, x2, ..., xT};
2) The video-frame feature sequence is input to the encoder of the attention expansion encoding-decoding network, which is composed of a bidirectional long short-term memory network; encoding yields the semantic information sequence of the video frames, V = {v1, v2, ..., vT};
3) The semantic information sequence of the video frames is input to a decoder composed of a long short-term memory network, into which an attention mechanism is introduced; decoding yields a score for each video frame, and the scores of all video frames form the summary sequence Y = {y1, y2, ..., yL};
4) The generated summary sequence is input to a retrospective encoder composed of a bidirectional long short-term memory network; encoding yields the semantic information sequence of the summary sequence, U = {u1, u2, ..., uT}. On the basis of the local discriminant loss formed by the semantic information sequence of the summary sequence and the corresponding importance score information annotated in SumMe or TVSum, a global discriminant loss formed by the semantic information sequence of the video frames and that of the summary sequence is then introduced, generating a representative new summary sequence. The loss function built from the local and global discriminant losses is:
L = L_o + λL_s
where in the local discriminant loss L_o, g_i denotes the importance score annotated for the i-th video frame in SumMe or TVSum and y_i denotes the generated score of each video frame; in the global discriminant loss L_s, V denotes the semantic information sequence of the video frames and U denotes the semantic information sequence of the summary sequence; λ is a trade-off parameter with value 0.001;
5) A moving-average model is introduced: for each parameter of the networks forming the encoder, the decoder, and the retrospective encoder, the average of its past values over a set period is recorded so that each parameter changes smoothly and abrupt parameter changes are suppressed. Steps 1) through 4) are repeated until the scores of all video frames are obtained, forming the new summary sequence, and the final summary is finally selected from the new summary sequence.
The semantic information sequence of the video frames in step 2), V = {v1, v2, ..., vT}, is obtained by fusing the hidden states of the forward long short-term memory network and the hidden states of the backward long short-term memory network; V fuses the contextual information of the video frames, with t ranging from 1 to T.
5. the video summarization method according to claim 3 based on attention expansion coding and decoding network, feature exist
Attention mechanism is introduced into decoding process for improving the accurate of each video frame score prediction in decoding process in, step 3)
Property, the attention mechanism is the prediction of fusion abstract sequence guidance current video frame importance scores, that is, passes through similarity
Function measures the similarity that previous shot and long term memory network hides the semantic information sequence of layer state and current video frame, similar
Spending function isAgain by normalizing the power weight that gains attentionWherein attention weightThe input for corresponding to decoder at this time is the new semantic information sequence V of video framet, whereinObtain corresponding to the score of each video frame by the new semantic information sequence of video frame.
On the basis of the local constraint between the generated summary and the manual annotation, the video summarization method based on the attention expansion encoding-decoding network of the invention introduces a global constraint in the semantic space between the semantic information of the original video and that of the generated summary, incorporates an attention mechanism into the decoding process, and smooths parameters during training, enhancing the robustness of the model. Its advantages are mainly:
1. Novelty: an encoding-decoding model combining attention with retrospective encoding is proposed. It considers the temporal correlation before and after each video frame and better fuses the effective inter-frame information, keeping the mapping from original video to summary as complete as possible. A global semantic discrimination loss is introduced so that the semantic information of the original video guides summary generation, alleviating the scarcity of supervisory information in the datasets.
2. Validity: experiments on SumMe and TVSum and on the corresponding augmented datasets show that the method achieves the current state of the art; in particular, the results on TVSum and its augmented dataset improve on current state-of-the-art approaches by 3.1% and 2.8%, respectively.
3. Practicality: the method can be used in multimedia signal processing to minimize the time users spend acquiring relevant resources and to improve the user search experience.
Detailed description of the invention
Fig. 1 is a flow chart of the video summarization method based on the attention expansion encoding-decoding network of the present invention.
Specific embodiment
The video summarization method based on the attention expansion encoding-decoding network of the present invention is described in detail below with reference to the embodiments and the accompanying drawings.
To strengthen the generated summary's ability to retain the important and relevant information of the original video, the invention borrows the idea of retrospective encoding and introduces a global semantic discrimination loss: the semantic information of the original video guides summary generation and constrains the summarization process globally. No label information is needed in this process, which reduces the model's dependence on labeled data. Unlike retrospective encoding, however, the invention starts from obtaining the video frames' range information and maximizing the constructed semantic information, fuses the contextual information of the video frames, and does not introduce a mismatch loss between videos; instead, video summarization is treated as a regression over a single video sequence, with the video frame sequence as input and the importance score of each frame as output, which reduces the model's parameters and improves training efficiency. On this basis, a multiplicative attention mechanism is added to the decoder portion of the model: rather than using only the encoder's last hidden state as the decoder input, a similarity function is built between the decoder's current input and its output at the previous step to obtain the importance score of the current decoder input, and each decoder input is weighted according to its importance. The model thereby obtains more of the original video's semantic information, predicts the importance score of each frame more accurately during decoding, obtains an ideal set of video segments, and generates the final summary.
As shown in Fig. 1, the video summarization method based on the attention expansion encoding-decoding network of the invention comprises: treating video summarization as a sequence-to-sequence learning process that exploits the temporal correlation between video frames; passing an original video from SumMe or TVSum through a pre-trained network to obtain a video-frame feature sequence; feeding the feature sequence to the encoder network of the attention expansion encoding-decoding network to obtain a semantic information sequence of the video frames; obtaining a score for each video frame through a decoding network with multiplicative attention; forming a summary sequence from the scores of all video frames; obtaining the semantic information sequence of the summary sequence with a retrospective encoder; constructing a global semantic discrimination loss and introducing a moving-average model to learn the semantic correlation between the summary sequence and the video-frame feature sequence, yielding a new summary sequence that retains the original video's important information, i.e., the importance score information annotated in SumMe or TVSum; and finally selecting the final summary from the new summary sequence.
The video summarization method based on the attention expansion encoding-decoding network of the invention specifically comprises the following steps:
1) The original video from SumMe or TVSum is downsampled at 2 fps to obtain video frames, which are passed through a GoogLeNet network pre-trained on the ImageNet dataset to extract the video-frame feature sequence X = {x1, x2, ..., xT};
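As an illustration of step 1), the 2 fps downsampling can be sketched as follows. The helper name is hypothetical, and the closing comment about the feature layer is an assumption: the patent specifies only 2 fps sampling and an ImageNet-pre-trained GoogLeNet, not which activation is used.

```python
def sample_frame_indices(total_frames, native_fps, target_fps=2.0):
    """Return the indices of the frames kept when a video is
    downsampled to target_fps, as in step 1)."""
    step = native_fps / target_fps        # e.g. 30 fps video -> every 15th frame
    count = int(total_frames / step)
    return [int(round(i * step)) for i in range(count)]

# A 10-second clip at 30 fps keeps 20 frames: indices 0, 15, 30, ..., 285.
indices = sample_frame_indices(total_frames=300, native_fps=30.0)
# Each kept frame would then be passed through GoogLeNet pre-trained on
# ImageNet to form the feature sequence X = {x1, ..., xT} (a common but
# here assumed choice is the 1024-d pool5 activation).
```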
2) The video-frame feature sequence is input to the encoder of the attention expansion encoding-decoding network, which is composed of a bidirectional long short-term memory network (Bi-LSTM); encoding yields the semantic information sequence of the video frames, V = {v1, v2, ..., vT}.
The semantic information sequence V = {v1, v2, ..., vT} is obtained by fusing the hidden states of the forward long short-term memory network (LSTM) and the hidden states of the backward long short-term memory network (LSTM); V fuses the contextual information of the video frames, with t ranging from 1 to T.
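A minimal NumPy sketch of the bidirectional encoder in step 2): one LSTM pass runs forward and one backward over the feature sequence, and the hidden states are fused into V = {v1, ..., vT}. Concatenation is assumed as the fusion, since the patent states only that the two hidden states are fused (its formula appears as an image); weights are random for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_run(X, W, U, b, reverse=False):
    """Minimal LSTM forward pass; W, U, b are stacked for the
    i, f, o, g gates. Returns the hidden state at every time step."""
    T = X.shape[0]
    d = b.shape[0] // 4
    h, c = np.zeros(d), np.zeros(d)
    H = np.zeros((T, d))
    order = range(T - 1, -1, -1) if reverse else range(T)
    for t in order:
        z = X[t] @ W + h @ U + b
        i, f, o = sigmoid(z[:d]), sigmoid(z[d:2*d]), sigmoid(z[2*d:3*d])
        g = np.tanh(z[3*d:])
        c = f * c + i * g
        h = o * np.tanh(c)
        H[t] = h
    return H

rng = np.random.default_rng(0)
T, din, d = 6, 4, 8
X = rng.normal(size=(T, din))                       # frame features x_1..x_T
W = rng.normal(size=(din, 4 * d))
U = rng.normal(size=(d, 4 * d))
b = np.zeros(4 * d)
# Bi-LSTM: run forward and backward, then fuse into V = {v_t}
V = np.concatenate([lstm_run(X, W, U, b),
                    lstm_run(X, W, U, b, reverse=True)], axis=-1)
```

Each row of `V` fuses context from both directions of the sequence, matching the description that V carries the contextual information of the video frames.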
3) The semantic information sequence of the video frames is input to a decoder composed of a long short-term memory network, into which an attention mechanism is introduced; decoding yields a score for each video frame, and the scores of all video frames form the summary sequence Y = {y1, y2, ..., yL}.
The invention introduces the attention mechanism into the decoding process to improve the accuracy of the score prediction for each video frame. The attention mechanism fuses the summary sequence to guide the prediction of the current video frame's importance score: a similarity function measures the similarity between the previous hidden state of the long short-term memory network and the semantic information sequence of the current video frame, and normalization then yields the attention weights. The input to the decoder at this point is the new semantic information sequence V_t of the video frames, from which the score of each video frame is obtained.
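One decoding step of multiplicative attention can be sketched as follows. The bilinear form v_i^T W_a s_{t-1} is an assumed instance of the similarity function (the patent's exact formula appears only as an image): each frame's semantic vector is scored against the decoder's previous hidden state, the scores are normalized into attention weights, and the weighted sum forms the new decoder input V_t.

```python
import numpy as np

def multiplicative_attention(s_prev, V, W_a):
    """One step of multiplicative attention: score each semantic
    vector v_i against the decoder's previous hidden state s_{t-1},
    normalize to attention weights, and build the new input V_t."""
    scores = V @ W_a @ s_prev             # similarity e_i = v_i^T W_a s_{t-1}
    w = np.exp(scores - scores.max())     # numerically stable softmax
    alpha = w / w.sum()                   # attention weights, sum to 1
    context = alpha @ V                   # V_t = sum_i alpha_i * v_i
    return context, alpha

rng = np.random.default_rng(1)
T, ds, dv = 7, 8, 8
V = rng.normal(size=(T, dv))              # semantic sequence from the encoder
s_prev = rng.normal(size=ds)              # decoder hidden state at step t-1
W_a = rng.normal(size=(dv, ds))           # hypothetical attention matrix
context, alpha = multiplicative_attention(s_prev, V, W_a)
```

Because the weights are normalized, frames judged more similar to the decoder state contribute more to V_t, which is how the decoder input is "endowed with different weights according to importance."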
4) The generated summary sequence is input to a retrospective encoder composed of a bidirectional long short-term memory network; encoding yields the semantic information sequence of the summary sequence, U = {u1, u2, ..., uT}. On the basis of the local discriminant loss formed by the semantic information sequence of the summary sequence and the corresponding importance score information annotated in SumMe or TVSum, a global discriminant loss formed by the semantic information sequence of the video frames and that of the summary sequence is then introduced, generating a representative new summary sequence. The loss function built from the local and global discriminant losses is:
L = L_o + λL_s
where in the local discriminant loss L_o, g_i denotes the importance score annotated for the i-th video frame in SumMe or TVSum and y_i denotes the generated score of each video frame; in the global discriminant loss L_s, V denotes the semantic information sequence of the video frames and U denotes the semantic information sequence of the summary sequence; λ is a trade-off parameter with value 0.001;
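The combined loss L = L_o + λL_s can be sketched as below. The patent gives the exact forms of L_o and L_s only as images, so mean-squared-error forms are assumed for both terms here; only the combination with the stated trade-off λ = 0.001 comes from the text.

```python
import numpy as np

def total_loss(y, g, V, U, lam=0.001):
    """Assumed MSE instantiation of L = L_o + lambda * L_s:
    L_o compares generated scores y with annotated scores g (local);
    L_s compares the semantic sequences V and U (global)."""
    L_o = np.mean((np.asarray(y) - np.asarray(g)) ** 2)
    L_s = np.mean((np.asarray(V) - np.asarray(U)) ** 2)
    return L_o + lam * L_s
```

Note the global term needs no frame-level labels, which is the point made above: the semantic constraint guides summary generation without depending on annotations.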
5) A moving-average model is introduced: for each parameter of the networks forming the encoder, the decoder, and the retrospective encoder, the average of its past values over a set period is recorded so that each parameter changes smoothly and abrupt changes are suppressed, increasing the robustness of the video summarization method based on the attention expansion encoding-decoding network. Steps 1) through 4) are repeated until the scores of all video frames are obtained, forming the new summary sequence, and the final summary is finally selected from the new summary sequence.
In the moving-average model, when the neural network is trained with stochastic gradient descent, the decay rate is set dynamically according to the number of update rounds, which controls the update amplitude of the parameters: early in training the parameters update quickly, while near the optimum they update more slowly and with smaller amplitude, so that through training the parameters finally stabilize near their true values. In the test phase, prediction uses the smoothed parameters, improving the final model's performance on the test data. A parameter is updated as s_r = d·s_{r-1} + (1-d)·v, where s_r denotes the parameter after the r-th round of training, s_{r-1} is the previously updated parameter, and v is the parameter at the current round. The decay rate is d = min{id, (1+r)/(10+r)}, where id is the set initial decay rate and r is the number of rounds of model parameter updates.
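The update s_r = d·s_{r-1} + (1-d)·v with dynamic decay d = min{id, (1+r)/(10+r)} described above can be sketched directly (the function name and the default initial decay of 0.999 are illustrative assumptions):

```python
def ema_update(shadow, param, r, init_decay=0.999):
    """One moving-average step: s_r = d * s_{r-1} + (1 - d) * v,
    with decay d = min(init_decay, (1 + r) / (10 + r))."""
    d = min(init_decay, (1.0 + r) / (10.0 + r))
    return d * shadow + (1.0 - d) * param

# Early rounds give a small d, so the shadow value tracks the raw
# parameter quickly; as r grows, d approaches init_decay and the
# shadow changes slowly, suppressing abrupt parameter changes.
```

At test time the smoothed value `shadow` would stand in for the raw parameter, as the description states.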
Claims (5)
1. A video summarization method based on an attention expansion encoding-decoding network, characterized by comprising: treating video summarization as a sequence-to-sequence learning process that exploits the temporal correlation between video frames; passing an original video from SumMe or TVSum through a pre-trained network to obtain a video-frame feature sequence; feeding the video-frame feature sequence to the encoder network of the attention expansion encoding-decoding network to obtain a semantic information sequence of the video frames; obtaining a score for each video frame through a decoding network with multiplicative attention; forming a summary sequence from the scores of all video frames; obtaining the semantic information sequence of the summary sequence with a retrospective encoder; constructing a global semantic discrimination loss and introducing a moving-average model to learn the semantic correlation between the summary sequence and the video-frame feature sequence, yielding a new summary sequence that retains the important information of the original video; and finally selecting the final summary from the new summary sequence.
2. The video summarization method based on an attention expansion encoding-decoding network according to claim 1, characterized in that the original video's important information is the importance score information annotated in SumMe or TVSum.
3. The video summarization method based on an attention expansion encoding-decoding network according to claim 1, characterized by specifically comprising the following steps:
1) the original video from SumMe or TVSum is downsampled at 2 fps to obtain video frames, which are passed through a GoogLeNet network pre-trained on the ImageNet dataset to extract the video-frame feature sequence X = {x1, x2, ..., xT};
2) the video-frame feature sequence is input to the encoder of the attention expansion encoding-decoding network, composed of a bidirectional long short-term memory network, and encoding yields the semantic information sequence of the video frames, V = {v1, v2, ..., vT};
3) the semantic information sequence of the video frames is input to a decoder composed of a long short-term memory network, an attention mechanism is introduced into the decoder, decoding yields a score for each video frame, and the scores of all video frames form the summary sequence Y = {y1, y2, ..., yL};
4) the generated summary sequence is input to a retrospective encoder composed of a bidirectional long short-term memory network, and encoding yields the semantic information sequence of the summary sequence, U = {u1, u2, ..., uT}; on the basis of the local discriminant loss formed by the semantic information sequence of the summary sequence and the corresponding importance score information annotated in SumMe or TVSum, a global discriminant loss formed by the semantic information sequence of the video frames and that of the summary sequence is introduced, generating a representative new summary sequence, wherein the loss function built from the local and global discriminant losses is:
L = L_o + λL_s
where in the local discriminant loss L_o, g_i denotes the importance score annotated for the i-th video frame in SumMe or TVSum and y_i denotes the generated score of each video frame; in the global discriminant loss L_s, V denotes the semantic information sequence of the video frames and U denotes the semantic information sequence of the summary sequence; λ is a trade-off parameter with value 0.001;
5) a moving-average model is introduced: for each parameter of the networks forming the encoder, the decoder, and the retrospective encoder, the average of its past values over a set period is recorded so that each parameter changes smoothly and abrupt changes are suppressed; steps 1) through 4) are repeated until the scores of all video frames are obtained, forming the new summary sequence, and the final summary is finally selected from the new summary sequence.
4. The video summarization method based on an attention expansion encoding-decoding network according to claim 3, characterized in that the semantic information sequence of the video frames in step 2), V = {v1, v2, ..., vT}, is obtained by fusing the hidden states of the forward long short-term memory network and the hidden states of the backward long short-term memory network; V fuses the contextual information of the video frames, with t ranging from 1 to T.
5. The video summarization method based on an attention expansion encoding-decoding network according to claim 3, characterized in that the attention mechanism introduced into the decoding process in step 3) improves the accuracy of the score prediction for each video frame; the attention mechanism fuses the summary sequence to guide the prediction of the current video frame's importance score: a similarity function measures the similarity between the previous hidden state of the long short-term memory network and the semantic information sequence of the current video frame, and normalization yields the attention weights; the input to the decoder at this point is the new semantic information sequence V_t of the video frames, from which the score of each video frame is obtained.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910319879.9A CN110110140A (en) | 2019-04-19 | 2019-04-19 | Video summarization method based on attention expansion coding and decoding network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110110140A true CN110110140A (en) | 2019-08-09 |
Family
ID=67486054
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910319879.9A Pending CN110110140A (en) | 2019-04-19 | 2019-04-19 | Video summarization method based on attention expansion coding and decoding network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110110140A (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110418156A (en) * | 2019-08-27 | 2019-11-05 | 上海掌门科技有限公司 | Information processing method and device |
CN110929094A (en) * | 2019-11-20 | 2020-03-27 | 北京香侬慧语科技有限责任公司 | Video title processing method and device |
CN111046966A (en) * | 2019-12-18 | 2020-04-21 | 江南大学 | Image subtitle generating method based on measurement attention mechanism |
CN111414471A (en) * | 2020-03-20 | 2020-07-14 | 北京百度网讯科技有限公司 | Method and apparatus for outputting information |
CN111460979A (en) * | 2020-03-30 | 2020-07-28 | 上海大学 | Key lens video abstraction method based on multi-layer space-time frame |
CN111526434A (en) * | 2020-04-24 | 2020-08-11 | 西北工业大学 | Converter-based video abstraction method |
CN111984820A (en) * | 2019-12-19 | 2020-11-24 | 重庆大学 | Video abstraction method based on double-self-attention capsule network |
CN112468888A (en) * | 2020-11-26 | 2021-03-09 | 广东工业大学 | Video abstract generation method and system based on GRU network |
CN112818828A (en) * | 2021-01-27 | 2021-05-18 | 中国科学技术大学 | Weak supervision time domain action positioning method and system based on memory network |
CN113111218A (en) * | 2021-03-23 | 2021-07-13 | 华中师范大学 | Unsupervised video abstraction method of bidirectional LSTM model based on visual saliency modulation |
CN113204670A (en) * | 2021-05-24 | 2021-08-03 | 合肥工业大学 | Attention model-based video abstract description generation method and device |
CN113301422A (en) * | 2021-05-24 | 2021-08-24 | 腾讯音乐娱乐科技(深圳)有限公司 | Method, terminal and storage medium for acquiring video cover |
CN114254158A (en) * | 2022-02-25 | 2022-03-29 | 北京百度网讯科技有限公司 | Video generation method and device, and neural network training method and device |
CN115544244A (en) * | 2022-09-06 | 2022-12-30 | 内蒙古工业大学 | Cross fusion and reconstruction-based multi-mode generative abstract acquisition method |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101308501A (en) * | 2008-06-30 | 2008-11-19 | Tencent Technology (Shenzhen) Co., Ltd. | Method, system and device for generating a video summary |
US20150220543A1 (en) * | 2009-08-24 | 2015-08-06 | Google Inc. | Relevance-based image selection |
CN104346440A (en) * | 2014-10-10 | 2015-02-11 | Zhejiang University | Neural-network-based cross-media hash indexing method |
CN107484017A (en) * | 2017-07-25 | 2017-12-15 | Tianjin University | Supervised video summary generation method based on attention model |
CN107729821A (en) * | 2017-09-27 | 2018-02-23 | Zhejiang University | Video summarization method based on one-dimensional sequence learning |
CN108334889A (en) * | 2017-11-30 | 2018-07-27 | Tencent Technology (Shenzhen) Co., Ltd. | Abstract description generation method and device, and abstract description model training method and device |
CN108228570A (en) * | 2018-01-31 | 2018-06-29 | Yan'an University | Document representation method based on entity burst features |
Non-Patent Citations (2)
Title |
---|
Ke Zhang, Kristen Grauman, Fei Sha: "Retrospective Encoders for Video Summarization", European Conference on Computer Vision (ECCV) * |
Zhong Ji, Kailin Xiong, Yanwei Pang, Xuelong Li: "Video Summarization with Attention-Based Encoder-Decoder Networks", arXiv * |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110418156A (en) * | 2019-08-27 | 2019-11-05 | Shanghai Zhangmen Technology Co., Ltd. | Information processing method and device |
CN110929094A (en) * | 2019-11-20 | 2020-03-27 | Beijing Shannon Huiyu Technology Co., Ltd. | Video title processing method and device |
CN110929094B (en) * | 2019-11-20 | 2023-05-16 | Beijing Shannon Huiyu Technology Co., Ltd. | Video title processing method and device |
CN111046966A (en) * | 2019-12-18 | 2020-04-21 | Jiangnan University | Image caption generation method based on a metric attention mechanism |
CN111984820A (en) * | 2019-12-19 | 2020-11-24 | Chongqing University | Video summarization method based on a dual self-attention capsule network |
CN111984820B (en) * | 2019-12-19 | 2023-10-27 | Chongqing University | Video summarization method based on a dual self-attention capsule network |
CN111414471A (en) * | 2020-03-20 | 2020-07-14 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method and apparatus for outputting information |
CN111460979A (en) * | 2020-03-30 | 2020-07-28 | Shanghai University | Key-shot video summarization method based on a multi-layer spatio-temporal framework |
CN111526434A (en) * | 2020-04-24 | 2020-08-11 | Northwestern Polytechnical University | Transformer-based video summarization method |
CN112468888A (en) * | 2020-11-26 | 2021-03-09 | Guangdong University of Technology | Video summary generation method and system based on a GRU network |
CN112818828A (en) * | 2021-01-27 | 2021-05-18 | University of Science and Technology of China | Weakly supervised temporal action localization method and system based on a memory network |
CN112818828B (en) * | 2021-01-27 | 2022-09-09 | University of Science and Technology of China | Weakly supervised temporal action localization method and system based on a memory network |
CN113111218A (en) * | 2021-03-23 | 2021-07-13 | Central China Normal University | Unsupervised video summarization method using a visual-saliency-modulated bidirectional LSTM model |
CN113204670A (en) * | 2021-05-24 | 2021-08-03 | Hefei University of Technology | Attention-model-based video summary description generation method and device |
CN113204670B (en) * | 2021-05-24 | 2022-12-09 | Hefei University of Technology | Attention-model-based video summary description generation method and device |
CN113301422A (en) * | 2021-05-24 | 2021-08-24 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Method, terminal and storage medium for acquiring a video cover |
CN114254158B (en) * | 2022-02-25 | 2022-06-10 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Video generation method and device, and neural network training method and device |
CN114254158A (en) * | 2022-02-25 | 2022-03-29 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Video generation method and device, and neural network training method and device |
CN115544244A (en) * | 2022-09-06 | 2022-12-30 | Inner Mongolia University of Technology | Multimodal generative summarization method based on cross-fusion and reconstruction |
CN115544244B (en) * | 2022-09-06 | 2023-11-17 | Inner Mongolia University of Technology | Multimodal generative summarization method based on cross-fusion and reconstruction |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110110140A (en) | Video summarization method based on attention expansion coding and decoding network | |
CN110348016A (en) | Text summary generation method based on a sentence-association attention mechanism | |
Perez-Martin et al. | Improving video captioning with temporal composition of a visual-syntactic embedding | |
CN109522411A (en) | Neural-network-based writing assistance method | |
CN109767759A (en) | End-to-end speech recognition method based on a modified CLDNN structure | |
CN112650886B (en) | Cross-modal video moment retrieval method based on a cross-modal dynamic convolution network | |
Zhu et al. | Dual learning for semi-supervised natural language understanding | |
CN109189862A (en) | Knowledge base construction method for scientific and technological information analysis | |
Chen et al. | Joint multiple intent detection and slot filling via self-distillation | |
CN110349597A (en) | Speech detection method and device | |
CN113190656B (en) | Chinese named entity extraction method based on multi-annotation frame and fusion features | |
CN111899766B (en) | Speech emotion recognition method based on optimized fusion of deep features and acoustic features | |
CN114037945A (en) | Cross-modal retrieval method based on multi-granularity feature interaction | |
Liu et al. | Jointly encoding word confusion network and dialogue context with BERT for spoken language understanding | |
WO2023231513A1 (en) | Conversation content generation method and apparatus, and storage medium and terminal | |
Yang et al. | Research on students’ adaptive learning system based on deep learning model | |
Peng et al. | Dual contrastive learning network for graph clustering | |
CN114943216B (en) | Case microblog attribute level view mining method based on graph attention network | |
CN114548090B (en) | Fast relation extraction method based on convolutional neural network and improved cascade labeling | |
Chunlei et al. | Research and Implementation of Automatic Composition System Based on DLGN | |
Liu et al. | A Survey of Speech Recognition Based on Deep Learning | |
Dong | Using deep learning and genetic algorithms for melody generation and optimization in music | |
CN117633239B (en) | End-to-end facial emotion recognition method incorporating combinatory categorial grammar | |
Saha et al. | Word Sense Induction with Knowledge Distillation from BERT | |
CN112528667B (en) | Domain transfer method and device for semantic parsing | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 2019-08-09 |