CN107911755B - Multi-video abstraction method based on sparse self-encoder

Multi-video abstraction method based on sparse self-encoder

Info

Publication number
CN107911755B
CN107911755B (application CN201711113383.3A)
Authority
CN
China
Prior art keywords
frame
video
subsets
topic
encoder
Prior art date
Legal status
Active
Application number
CN201711113383.3A
Other languages
Chinese (zh)
Other versions
CN107911755A (en)
Inventor
冀中 (Ji Zhong)
马亚茹 (Ma Yaru)
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201711113383.3A
Publication of CN107911755A
Application granted
Publication of CN107911755B


Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8549Creating video summaries, e.g. movie trailer

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Television Signal Processing For Recording (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A sparse auto-encoder based multi-video summarization method, comprising: extracting visual features of the video frames; inputting the visual features of the video frames into a sparse auto-encoder, which learns, respectively: the compressed representation of the video frames, i.e. the activations of the hidden-layer neurons, the connection weight between the input layer and the hidden layer, and the connection weight between the hidden layer and the output layer; generating a weight curve from the learned connection weight between the input layer and the hidden layer; selecting each local maximum of the weight curve to form the key frame set; and ordering the key frames to produce the summary. Aiming at the characteristics of existing multi-video summarization data sets, a sparse auto-encoder based multi-video summarization technique suited to those characteristics is designed, so that the specific information of the data is fully utilized with the assistance of effective prior information.

Description

Multi-video abstraction method based on sparse self-encoder
Technical Field
The invention relates to a multi-video summarization method, in particular to a multi-video summarization method based on a sparse self-encoder.
Background
With the rapid development of information technology, video data is emerging in large quantities and has become one of the important ways for people to acquire information. However, the dramatic increase in the number of videos brings a large amount of redundant and repetitive information, which makes it difficult for a user to quickly acquire the desired information. A technology that can integrate and analyze massive video data under the same topic is therefore urgently needed, to meet people's demand for browsing the main information of videos quickly and accurately and to improve their information acquisition capability. As one of the effective ways to solve this problem, multi-video summarization has attracted increasing attention from researchers over the past decades. Multi-video summarization is a content-based video data compression technology: it analyzes and integrates multiple videos on related topics under the same event, extracts their main content, and presents the extracted content to the user according to a certain logical relationship. Currently, a multi-video summary is mainly evaluated from three aspects: 1) coverage; 2) novelty; 3) importance. Coverage means that the extracted content can cover the main content of the multiple videos on the same topic. Novelty means that duplicate, redundant information is removed from the multi-video summary. Importance means that important key shots are extracted from the video set according to some prior information, so as to capture the important content of the multiple videos.
Although many single-video summarization methods have been proposed, research on multi-video summarization is sparser and still at a preliminary stage, mainly for two reasons. 1) The first is the diversity of topics under the same event and the cross-correlation of topics between videos. Topic diversity means that multiple videos of the same event emphasize different information and contain several sub-topics. Topic cross-correlation means that the content of videos under the same event overlaps: the videos share similar content but differ in the amount of information they carry. 2) The second is that, for the same content, the audio information, text information, and visual information expressed by different videos may differ greatly. These factors make it difficult to study multi-video summarization with traditional single-video summarization methods.
In the past decades, multi-video summarization methods have been proposed for the characteristics of multi-video data sets. The multi-video summarization method based on complex-graph clustering is a relatively classical one. It constructs a complex graph from the keywords of the videos' transcript information and the videos' key frames, and realizes the summary using a graph clustering algorithm on that basis. But the method is mainly aimed at news videos, and it loses its meaning for video sets without transcript information. In addition, because the content contained in multiple videos under the same topic is both diverse and redundant, clustering only satisfies the maximum-coverage condition on video content; moreover, the clustering effect is poor when only the visual information of the videos is used, and although combining other modalities helps to a certain extent, it brings higher complexity.
A multi-video collection carries information in multiple modalities, such as the text, visual, and audio information of a video. The Balanced AV-MMR (Balanced Audio-Video Maximal Marginal Relevance) algorithm combines visual and audio information and, following the idea of maximal marginal relevance, designs a multi-video summarization algorithm that iteratively selects key shots.
In recent years, novel methods have been proposed. Among them, realizing multi-video summarization via the visual co-occurrence characteristics (Visual Co-occurrence) of videos is one. This method observes that important visual concepts recur across multiple videos under the same topic; based on this characteristic it proposes a maximal biclique finding algorithm (Maximal Biclique Finding) and extracts the sparse co-occurrence patterns of the videos, thereby realizing multi-video summarization. However, the method only suits specific data sets, and it loses its significance for video sets with little repetition across videos.
In addition, in order to exploit more related information, researchers have proposed using sensors such as the GPS and compass on a mobile phone to acquire information such as the geographic position during video shooting, thereby helping to determine the important information in a video and generate the multi-video summary. Web-image prior information has also been used as auxiliary information in this field to better realize multi-video summarization. At present, owing to the complexity of multi-video data, research on multi-video summarization has not yet achieved ideal results. How to better utilize the information of multi-video data to better realize multi-video summarization has therefore become a research hotspot. To this end, it is proposed herein to implement multi-video summarization using the sparse auto-encoder algorithm (Sparse Autoencoder). The sparse autoencoder is an unsupervised deep learning framework with a three-layer network structure. It learns a compressed representation of the input data by approximating the output to the input through the idea of nonlinear reconstruction. Using this idea, the invention designs an algorithm for extracting key frames and a bottom-up sorting algorithm for the extracted key frames, so that the presentation of the key frames is more logical and the readability of the summary is improved.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a sparse self-encoder-based multi-video abstraction method which can effectively utilize video theme related information and improve video browsing efficiency of a user.
The technical scheme adopted by the invention is as follows: a multi-video summarization method based on a sparse self-encoder comprises the following steps:
1) extracting the visual features of the video frames, and expressing them as $X = \{x_1, x_2, \dots, x_i, \dots, x_n\}$, $x_i \in \mathbb{R}^m$, where $x_i$ represents the visual feature of the $i$-th frame;
2) inputting the visual features of the video frames into a sparse self-encoder, and learning through the sparse self-encoder to obtain, respectively: the compressed representation of the video frames, i.e. the activations of the hidden-layer neurons, the connection weight $W^{(1)}$ between the input layer and the hidden layer, and the connection weight $W^{(2)}$ between the hidden layer and the output layer;
3) using the obtained connection weight $W^{(1)}$ between the input layer and the hidden layer to generate a weight curve, i.e. the 2-norm of each column of $W^{(1)}$, formulated as

$$s_i = \left\| W^{(1)}_{\cdot i} \right\|_2, \quad i = 1, 2, \dots, n$$
4) Selecting each local maximum of the weight curve as a key frame set;
5) and sequencing the key frames to realize the abstract.
The visual feature of the video frame in the step 1) is one of a depth feature, a color feature and a visual bag-of-words feature.
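As a concrete illustration of step 1), the following is a minimal NumPy sketch of the color-feature option (one normalized RGB histogram per frame); the function name, the number of bins, and the frame format are illustrative assumptions, not part of the invention:

```python
import numpy as np

def color_histogram_features(frames, bins=8):
    """Sketch of step 1) with color features: one normalized RGB
    histogram per frame. `frames` is a list of HxWx3 uint8 arrays;
    `bins` per channel is an illustrative choice."""
    feats = []
    for frame in frames:
        hist, _ = np.histogramdd(frame.reshape(-1, 3),
                                 bins=(bins, bins, bins),
                                 range=((0, 256),) * 3)
        hist = hist.ravel()
        feats.append(hist / hist.sum())   # normalize to unit mass
    return np.stack(feats)                # X of shape (n, bins**3); rows are x_i
```

Depth features (e.g. activations of a pretrained CNN) or visual bag-of-words features would replace this function while keeping the same output layout.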
The local processing in step 4) is to divide the video frame index corresponding to the abscissa of the weight curve into a plurality of local spaces according to a set interval, and to use the frame corresponding to the maximum value of the weight curve in each local space as a key frame.
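Steps 3) and 4) admit a short sketch: compute the weight curve as the per-frame column norms of $W^{(1)}$ and take one local maximum per window. This assumes the arrangement described below in which each input-layer neuron corresponds to one video frame, so $W^{(1)}$ has one column per frame; the window length stands in for the "set interval" and is an illustrative choice:

```python
import numpy as np

def select_key_frames(W1, interval=30):
    """Sketch of steps 3)-4): weight curve from the 2-norm of each
    column of W^(1) (one column per frame), then one local maximum
    per window of `interval` frames."""
    curve = np.linalg.norm(W1, axis=0)            # s_i = ||W1[:, i]||_2
    key_frames = [start + int(np.argmax(curve[start:start + interval]))
                  for start in range(0, curve.size, interval)]
    return key_frames, curve
```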
Step 5) comprises the following steps:
(1) dividing a key frame set containing k elements into k subsets;
(2) respectively calculating the temporal relevance between every two of the k subsets to obtain the temporal relevance vector $F_{chro}$ of the k subsets; the temporal relevance between each two subsets, i.e. any one element of the vector $F_{chro}$, is calculated as:

$$f_{chro}(A \succ B) = \begin{cases} 1, & V(a_l) = V(b_1)\ \text{and}\ N(a_l) < N(b_1) \\ \dfrac{1}{1 + \left| T(a_l) - T(b_1) \right|}, & V(a_l) \neq V(b_1) \\ 0, & \text{otherwise} \end{cases}$$
wherein A and B represent any two of the k subsets; $a_l$ represents the last frame in set A, and $b_1$ represents the first frame in set B; $T(a_l)$ represents the time of frame $a_l$; $V(a_l) = V(b_1)$ indicates that frame $a_l$ and frame $b_1$ belong to the same video, while $V(a_l) \neq V(b_1)$ indicates that they do not belong to the same video; $N(a_l) < N(b_1)$ indicates that frame $a_l$ appears earlier than frame $b_1$ in the same video; $f_{chro}(A \succ B)$ represents the temporal relevance of ranking set A before set B;
(3) calculating the topic closeness between every two of the k subsets to obtain the topic closeness vector $F_{topic}$ of the k subsets; the topic closeness between each two subsets, i.e. any one element of the vector $F_{topic}$, is calculated as:

$$f_{topic}(A \succ B) = \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} sim(a, b)$$
where $sim(a, b)$ denotes the cosine similarity between an arbitrary frame $a$ belonging to set A and an arbitrary frame $b$ belonging to set B, and $f_{topic}(A \succ B)$ represents the topic closeness of ranking set A before set B;
(4) superposing the temporal relevance vector and the topic closeness vector to obtain the relevance vector F of the k subsets, calculated as:

$$F = F_{chro} + F_{topic}$$
The key frames are then sorted according to the relevance vector of the k subsets: first, the two subsets with the maximum relevance are selected and combined into a new set, and then the remaining subsets are combined pairwise in order of relevance to form several new sets;
(5) repeating the calculation of the steps (2), (3) and (4) on all the generated new sets until all the video frames are contained in one set, and ending the iteration;
(6) sequencing the video frames in the set obtained in step (5) according to their index order to realize the summary.
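The sorting procedure of step 5) can be sketched as the following bottom-up merge; the case form of $f_{chro}$ follows the definitions above, while the exact constants and the one-merge-per-iteration greedy loop (rather than merging all remaining pairs at once) are simplifying assumptions:

```python
import numpy as np

def order_key_frames(key_frames, video_id, frame_no, time, feats):
    """Sketch of step 5): bottom-up merging of key-frame subsets by
    F = F_chro + F_topic. video_id[i], frame_no[i], time[i] give the
    video, frame number, and time of frame i; feats[i] is its feature."""
    def f_chro(A, B):                      # temporal relevance of A before B
        al, b1 = A[-1], B[0]               # last frame of A, first frame of B
        if video_id[al] == video_id[b1]:
            return 1.0 if frame_no[al] < frame_no[b1] else 0.0
        return 1.0 / (1.0 + abs(time[al] - time[b1]))

    def f_topic(A, B):                     # mean pairwise cosine similarity
        return float(np.mean([np.dot(feats[a], feats[b])
                              / (np.linalg.norm(feats[a]) * np.linalg.norm(feats[b]))
                              for a in A for b in B]))

    sets = [[f] for f in key_frames]       # (1) k singleton subsets
    while len(sets) > 1:                   # (5) iterate until one set remains
        best, pair = -np.inf, None
        for i in range(len(sets)):         # (2)-(4) score every ordered pair
            for j in range(len(sets)):
                if i != j:
                    score = f_chro(sets[i], sets[j]) + f_topic(sets[i], sets[j])
                    if score > best:
                        best, pair = score, (i, j)
        i, j = pair                        # merge the most relevant pair, A before B
        merged = sets[i] + sets[j]
        sets = [s for k, s in enumerate(sets) if k not in (i, j)] + [merged]
    return sets[0]                         # (6) final presentation order
```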
The multi-video summarization method based on a sparse self-encoder of the invention is designed for the characteristics of existing multi-video summarization data sets, so that it fully utilizes the specific information of the data with the assistance of effective prior information. Its main advantages are as follows:
(1) Novelty: the sparse self-encoder method is applied to multi-video summarization for the first time, and a bottom-up key frame ordering algorithm is provided.
(2) Effectiveness: compared with a typical clustering method and with the minimum sparse reconstruction method used in single-video summarization, the sparse self-encoder based multi-video summarization method of the invention is shown to perform significantly better, and is therefore more suitable for the multi-video summarization problem.
(3) Practicability: the method is simple and feasible, and can be used in the field of multimedia information processing.
Drawings
Fig. 1 is a flow chart of a sparse auto-encoder based multi-video summarization method of the present invention.
Detailed Description
The following describes a sparse auto-encoder based multi-video summarization method according to the present invention in detail with reference to the following embodiments and the accompanying drawings.
The invention discloses a multi-video summarization method based on a sparse self-encoder, which aims to obtain a compressed representation of the input video frames, provide short and essential video content to the user, and improve the user's video browsing efficiency. A sparse self-encoder can be viewed as a nonlinear reconstruction process that obtains a compressed representation of the input data. The invention therefore applies the sparse self-encoder to multi-video summarization and designs a key frame selection algorithm based on the learned compressed representation. Here the input-layer neurons represent the set of video frames, and the hidden-layer neurons represent the compressed representation of the input data to be learned, also called a dictionary, used to reconstruct the input vectors. The output layer has the same number of neurons as the input layer and represents an approximation of the input.
The sparse self-encoder is a three-layer neural network with one hidden layer, and is an unsupervised deep learning algorithm. The algorithm attempts to approximate an identity function, i.e. to make the output approximately equal to the input. In order to reproduce the input signal as faithfully as possible at the output, the self-encoder must capture the most important factors that represent the input data, finding the principal components that can represent the original information. This process can be viewed as automatically obtaining a compressed representation of the input data.
Sparsity can be explained simply as follows: if a neuron is considered activated when its output is close to 1 and inhibited otherwise, then the constraint that each neuron be inhibited most of the time is called the sparsity constraint.
The specific principle of the sparse self-encoder is as follows:
given a set of input video frames X ═ X1,x2,....,xn},xi∈RmCharacterizing visual features of video frames, W(1)Representing the connection weight of the input layer and the hidden layer, W(2)Connection weight, h, representing hidden and output layers(W,b)(x) Representing the output. As used herein
Figure BDA0001464186350000041
Represents the output of the ith neuron of layer 2, i.e. hidden layer:
Figure BDA0001464186350000042
the activation function f is a sigmoid function, and a nonlinear element is introduced, as shown in formula (2):
Figure BDA0001464186350000043
the goal of the self-encoder is to approximate the input by the output, so its objective function is:
Figure BDA0001464186350000044
where b denotes a bias vector.
An auto-encoder typically requires the number $s_2$ of hidden-layer units to be smaller than the number of input-layer neurons; when the number of hidden units approaches, or is even larger than, the number of input-layer neurons, the self-encoder can still learn a compressed representation of the input data by imposing a sparsity condition on the hidden layer. The invention adopts the KL divergence to control sparsity, with the specific expression:

$$\sum_{j=1}^{s_2} KL(\rho \,\|\, \hat{\rho}_j) \quad (4)$$

$$KL(\rho \,\|\, \hat{\rho}_j) = \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j} \quad (5)$$
here

$$\hat{\rho}_j = \frac{1}{n} \sum_{i=1}^{n} a_j^{(2)}(x_i)$$

represents the average activation of the $j$-th neuron of the hidden layer, and $\rho$ is the sparsity parameter, a constant close to zero.
The total cost function is then expressed as follows:

$$J_{sparse}(W, b) = J(W, b) + \beta \sum_{j=1}^{s_2} KL(\rho \,\|\, \hat{\rho}_j) \quad (6)$$
where β is an adjustable parameter for balancing the two terms.
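To make formulas (1)-(6) concrete, here is a minimal NumPy sketch of the sparse auto-encoder cost and its gradients; the function name, the parameter defaults, and the plain gradient-descent usage are illustrative assumptions rather than the invention's reference implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_autoencoder_cost(W1, b1, W2, b2, X, rho=0.05, beta=3.0):
    """Cost J_sparse of formulas (1)-(6) and its gradients.
    Each column of X is one training sample; in the patent's setup the
    input-layer neurons index the video frames."""
    n = X.shape[1]
    A2 = sigmoid(W1 @ X + b1)                 # hidden activations, formulas (1)-(2)
    A3 = sigmoid(W2 @ A2 + b2)                # output, approximates X
    rho_hat = A2.mean(axis=1, keepdims=True)  # average activation per hidden unit

    recon = 0.5 * np.sum((A3 - X) ** 2) / n   # reconstruction term, formula (3)
    kl = np.sum(rho * np.log(rho / rho_hat)   # KL sparsity penalty, formulas (4)-(5)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    cost = recon + beta * kl                  # total cost, formula (6)

    # Backpropagation; the sparsity term enters the hidden-layer delta.
    d3 = (A3 - X) * A3 * (1 - A3)
    sparse_grad = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))
    d2 = (W2.T @ d3 + sparse_grad) * A2 * (1 - A2)
    return cost, (d2 @ X.T / n, d2.mean(axis=1, keepdims=True),
                  d3 @ A2.T / n, d3.mean(axis=1, keepdims=True))

# Usage sketch: a few gradient-descent steps on random data.
rng = np.random.default_rng(0)
n_in, s2, n_samples = 120, 30, 400            # illustrative sizes
X = rng.random((n_in, n_samples))
W1 = 0.01 * rng.standard_normal((s2, n_in)); b1 = np.zeros((s2, 1))
W2 = 0.01 * rng.standard_normal((n_in, s2)); b2 = np.zeros((n_in, 1))
for _ in range(100):
    cost, (gW1, gb1, gW2, gb2) = sparse_autoencoder_cost(W1, b1, W2, b2, X)
    W1 -= 0.5 * gW1; b1 -= 0.5 * gb1; W2 -= 0.5 * gW2; b2 -= 0.5 * gb2
```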
As shown in fig. 1, the sparse auto-encoder based multi-video summarization method of the present invention comprises the following steps:
1) extracting the visual features of the video frames, and expressing them as $X = \{x_1, x_2, \dots, x_i, \dots, x_n\}$, $x_i \in \mathbb{R}^m$, where $x_i$ represents the visual feature of the $i$-th frame; the visual feature of a video frame is one of a depth feature, a color feature, and a visual bag-of-words feature.
2) inputting the visual features of the video frames into a sparse self-encoder, and learning through the sparse self-encoder to obtain, respectively: the compressed representation of the video frames, i.e. the activations of the hidden-layer neurons, the connection weight $W^{(1)}$ between the input layer and the hidden layer, and the connection weight $W^{(2)}$ between the hidden layer and the output layer;
3) using the obtained connection weight $W^{(1)}$ between the input layer and the hidden layer to generate a weight curve, i.e. the 2-norm of each column of $W^{(1)}$, formulated as

$$s_i = \left\| W^{(1)}_{\cdot i} \right\|_2, \quad i = 1, 2, \dots, n$$
4) Selecting each local maximum of the weight curve as a key frame set;
the local means that the video frame index corresponding to the abscissa of the weight curve is divided into a plurality of local spaces according to a set interval, and a frame corresponding to the maximum value of the weight curve in each local space is used as a key frame.
5) Sorting the key frames to realize the abstract, comprising the following steps:
(1) dividing a key frame set containing k elements into k subsets;
(2) respectively calculating the temporal relevance between every two of the k subsets to obtain the temporal relevance vector $F_{chro}$ of the k subsets; the temporal relevance between each two subsets, i.e. any one element of the vector $F_{chro}$, is calculated as:

$$f_{chro}(A \succ B) = \begin{cases} 1, & V(a_l) = V(b_1)\ \text{and}\ N(a_l) < N(b_1) \\ \dfrac{1}{1 + \left| T(a_l) - T(b_1) \right|}, & V(a_l) \neq V(b_1) \\ 0, & \text{otherwise} \end{cases}$$
wherein A and B represent any two of the k subsets; $a_l$ represents the last frame in set A, and $b_1$ represents the first frame in set B; $T(a_l)$ represents the time of frame $a_l$; $V(a_l) = V(b_1)$ indicates that frame $a_l$ and frame $b_1$ belong to the same video, while $V(a_l) \neq V(b_1)$ indicates that they do not belong to the same video; $N(a_l) < N(b_1)$ indicates that frame $a_l$ appears earlier than frame $b_1$ in the same video; $f_{chro}(A \succ B)$ represents the temporal relevance of ranking set A before set B;
(3) calculating the topic closeness between every two of the k subsets to obtain the topic closeness vector $F_{topic}$ of the k subsets; the topic closeness between each two subsets, i.e. any one element of the vector $F_{topic}$, is calculated as:

$$f_{topic}(A \succ B) = \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} sim(a, b)$$
where $sim(a, b)$ denotes the cosine similarity between an arbitrary frame $a$ belonging to set A and an arbitrary frame $b$ belonging to set B, and $f_{topic}(A \succ B)$ represents the topic closeness of ranking set A before set B;
(4) superposing the temporal relevance vector and the topic closeness vector to obtain the relevance vector F of the k subsets, calculated as:

$$F = F_{chro} + F_{topic}$$
The key frames are then sorted according to the relevance vector of the k subsets: first, the two subsets with the maximum relevance are selected and combined into a new set, and then the remaining subsets are combined pairwise in order of relevance to form several new sets;
(5) repeating the calculation of the steps (2), (3) and (4) on all the generated new sets until all the video frames are contained in one set, and ending the iteration;
(6) sequencing the video frames in the set obtained in step (5) according to their index order to realize the summary.

Claims (3)

1. A multi-video summarization method based on a sparse self-encoder is characterized by comprising the following steps:
1) extracting the visual features of the video frames, and expressing them as $X = \{x_1, x_2, \dots, x_i, \dots, x_n\}$, $x_i \in \mathbb{R}^m$; $x_i$ represents the visual feature of the $i$-th frame;
2) inputting the visual features of the video frames into a sparse self-encoder, and learning through the sparse self-encoder to obtain, respectively: the compressed representation of the video frames, i.e. the activations of the hidden-layer neurons, the connection weight $W^{(1)}$ between the input layer and the hidden layer, and the connection weight $W^{(2)}$ between the hidden layer and the output layer;
3) using the obtained connection weight $W^{(1)}$ between the input layer and the hidden layer to generate a weight curve, i.e. the 2-norm of each column of $W^{(1)}$, formulated as

$$s_i = \left\| W^{(1)}_{\cdot i} \right\|_2, \quad i = 1, 2, \dots, n$$
4) Selecting each local maximum of the weight curve as a key frame set;
5) sorting the key frames to realize the abstract, comprising the following steps:
(1) dividing a key frame set containing k elements into k subsets;
(2) respectively calculating the temporal relevance between every two of the k subsets to obtain the temporal relevance vector $F_{chro}$ of the k subsets; the temporal relevance between each two subsets, i.e. any one element of the vector $F_{chro}$, is calculated as:

$$f_{chro}(A \succ B) = \begin{cases} 1, & V(a_l) = V(b_1)\ \text{and}\ N(a_l) < N(b_1) \\ \dfrac{1}{1 + \left| T(a_l) - T(b_1) \right|}, & V(a_l) \neq V(b_1) \\ 0, & \text{otherwise} \end{cases}$$
wherein A and B represent any two of the k subsets; $a_l$ represents the last frame in set A, and $b_1$ represents the first frame in set B; $T(a_l)$ represents the time of frame $a_l$; $V(a_l)$ represents the video in which frame $a_l$ lies, and $N(a_l)$ represents the rank, i.e. the frame number, of frame $a_l$ in that video; $V(b_1)$ and $N(b_1)$ are defined likewise for frame $b_1$; $V(a_l) = V(b_1)$ indicates that frame $a_l$ and frame $b_1$ belong to the same video, while $V(a_l) \neq V(b_1)$ indicates that they do not belong to the same video; $N(a_l) < N(b_1)$ indicates that frame $a_l$ appears earlier than frame $b_1$ in the same video; $f_{chro}(A \succ B)$ represents the temporal relevance of ranking set A before set B;
(3) calculating the topic closeness between every two of the k subsets to obtain the topic closeness vector $F_{topic}$ of the k subsets; the topic closeness between each two subsets, i.e. any one element of the vector $F_{topic}$, is calculated as:

$$f_{topic}(A \succ B) = \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} sim(a, b)$$
where $sim(a, b)$ denotes the cosine similarity between an arbitrary frame $a$ belonging to set A and an arbitrary frame $b$ belonging to set B, and $f_{topic}(A \succ B)$ represents the topic closeness of ranking set A before set B;
(4) superposing the temporal relevance vector and the topic closeness vector to obtain the relevance vector F of the k subsets, calculated as:

$$F = F_{chro} + F_{topic}$$
The key frames are then sorted according to the relevance vector of the k subsets: first, the two subsets with the maximum relevance are selected and combined into a new set, and then the remaining subsets are combined pairwise in order of relevance to form several new sets;
(5) repeating the calculation of the steps (2), (3) and (4) on all the generated new sets until all the video frames are contained in one set, and ending the iteration;
(6) sequencing the video frames in the set obtained in step (5) according to their index order to realize the summary.
2. The sparse auto-encoder based multi-video summarization method according to claim 1, wherein the visual feature of the video frame of step 1) is one of a depth feature, a color feature and a visual bag-of-words feature.
3. The sparse auto-encoder based multi-video summarization method according to claim 1, wherein "local" in step 4) means that the video frame indices on the abscissa of the weight curve are divided into a plurality of local spaces at a set interval, and the frame corresponding to the maximum of the weight curve within each local space is taken as the key frame.
CN201711113383.3A 2017-11-10 2017-11-10 Multi-video abstraction method based on sparse self-encoder Active CN107911755B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711113383.3A CN107911755B (en) 2017-11-10 2017-11-10 Multi-video abstraction method based on sparse self-encoder


Publications (2)

Publication Number Publication Date
CN107911755A (en) 2018-04-13
CN107911755B (en) 2020-10-20

Family

ID=61844876

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711113383.3A Active CN107911755B (en) 2017-11-10 2017-11-10 Multi-video abstraction method based on sparse self-encoder

Country Status (1)

Country Link
CN (1) CN107911755B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109857906B (en) * 2019-01-10 2023-04-07 天津大学 Multi-video abstraction method based on query unsupervised deep learning
CN110110636B (en) * 2019-04-28 2021-03-02 清华大学 Video logic mining device and method based on multi-input single-output coding and decoding model
CN110298270B (en) * 2019-06-14 2021-12-31 天津大学 Multi-video abstraction method based on cross-modal importance perception
CN113008559B (en) * 2021-02-23 2022-02-22 西安交通大学 Bearing fault diagnosis method and system based on sparse self-encoder and Softmax

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104113789A (en) * 2014-07-10 2014-10-22 杭州电子科技大学 On-line video abstraction generation method based on depth learning
CN104331442A (en) * 2014-10-24 2015-02-04 华为技术有限公司 Video classification method and device
CN106993240A (en) * 2017-03-14 2017-07-28 天津大学 Many video summarization methods based on sparse coding
CN107203636A (en) * 2017-06-08 2017-09-26 天津大学 Many video summarization methods based on the main clustering of hypergraph

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8204359B2 (en) * 2007-03-20 2012-06-19 At&T Intellectual Property I, L.P. Systems and methods of providing modified media content


Also Published As

Publication number Publication date
CN107911755A (en) 2018-04-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant