CN107943990B - Multi-video abstraction method based on prototype analysis technology with weight - Google Patents


Info

Publication number
CN107943990B
CN107943990B
Authority
CN
China
Prior art keywords
video
weight
prototype
frames
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711249015.1A
Other languages
Chinese (zh)
Other versions
CN107943990A (en)
Inventor
冀中 (Ji Zhong)
江俊杰 (Jiang Junjie)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201711249015.1A
Publication of CN107943990A
Application granted
Publication of CN107943990B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/7847 - Retrieval characterised by using metadata automatically derived from the content using low-level visual features of the video content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/73 - Querying
    • G06F 16/738 - Presentation of query results
    • G06F 16/739 - Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/7844 - Retrieval characterised by using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of video processing and provides a multi-video summarization technique based on weighted prototype (archetypal) analysis that is suited to the characteristics of multi-video data, so that the specific information of the data is fully exploited with the assistance of effective prior information. The technical scheme adopted by the invention is a multi-video summarization method based on the weighted prototype analysis technique: first, a weighted graph model is used to model the relationships between video frames, thereby obtaining the weight matrix required by the weighted prototype analysis; then, the weighted prototype analysis is used to obtain key frames and generate a video summary of a given length. The invention is mainly applied to video-processing scenarios.

Description

Multi-video abstraction method based on prototype analysis technology with weight
Technical Field
The invention relates to the technical field of video processing, and in particular to a multi-video summarization method based on the weighted prototype analysis technique.
Background
With the rapid development of information technology, video data has appeared in large quantities and has become one of the important ways in which people acquire information. However, the dramatic increase in the number of videos brings a large amount of redundant and repeated information, which makes it difficult for users to quickly obtain the information they want. Under these circumstances, a technology that can integrate and analyze massive video data under the same topic is urgently needed, so as to let people browse the main information of videos quickly and accurately and improve their ability to acquire information. As one of the effective ways to solve this problem, multi-video summarization has attracted increasing attention from researchers over the past decades. Multi-video summarization is a content-based video data compression technology: it analyzes and integrates multiple videos on related topics under the same event, extracts their main content, and presents the extracted content to the user according to a certain logical relationship. Currently, multi-video summaries are mainly analyzed from three aspects: 1) coverage; 2) redundancy (novelty); 3) importance. Coverage means that the extracted content can cover the main content of the multiple videos on the same topic. Redundancy means removing duplicate, redundant information from the multi-video summary. Importance means extracting important key shots from the video set according to some prior information, so as to extract the important content of the multiple videos.
Although many single-video summarization methods have been proposed, research on multi-video summarization is comparatively scarce and still at a preliminary stage. This is mainly due to two reasons: 1) the diversity of sub-topics under the same event and the topical overlap between videos; topic diversity means that the videos of the same event emphasize different information and contain multiple sub-topics, while topical overlap means that videos of the same event share similar content yet differ in the amount of information they carry; 2) the audio information, text information, and visual information with which different videos express the same content may differ considerably. These factors make multi-video summarization difficult to study with traditional single-video summarization techniques.
Over the past decades, multi-video summarization methods have been proposed for the characteristics of multi-video datasets. Multi-video summarization based on complex-graph clustering is a relatively classical method: it constructs a complex graph from keywords extracted from the videos' transcript information together with the videos' key frames, and realizes the summary with a graph-clustering algorithm on this basis. However, this method is mainly aimed at news video and loses its meaning for video sets without transcript information. In addition, because the content of multiple videos under the same topic is both diverse and redundant, a clustering method alone can hardly satisfy the requirement of maximum coverage of the video content; clustering on the visual information alone performs poorly for multi-video summarization, and although combining other modalities helps to a certain extent, it increases the complexity.
Multi-video data carries information in multiple modalities, such as the text, visual, and audio information of a video. Balanced AV-MMR (Balanced Audio-Video Maximal Marginal Relevance) is a multi-video summarization technique that effectively exploits multi-modal video information: it analyzes the visual information, the audio information, and the semantic information contained in them, where the semantic information includes audio, human faces, temporal characteristics, and other cues of great significance for video summarization. The method uses the multi-modal information of the videos effectively, but the extracted video summaries still do not achieve a good result.
In recent years, novel methods have also been proposed. Among them, realizing multi-video summarization with the visual co-occurrence characteristics of videos is a representative example. This method observes that important visual concepts appear repeatedly in multiple videos under the same topic, and accordingly proposes a maximal biclique finding algorithm to extract sparse co-occurring patterns of the videos, thereby realizing the multi-video summary. However, the method is only suitable for specific datasets and loses its meaning for video sets with little repetition across videos.
In addition, to exploit more related information, researchers have proposed using sensors such as the GPS and compass on a mobile phone to acquire the geographic position and other information during video capture, thereby helping to determine the important information in the videos and to generate the multi-video summary. Prior information from web images has also been used as auxiliary information in this field to better realize multi-video summarization. At present, owing to the complexity of multi-video data, research on multi-video summarization has not achieved the desired effect; how to better use the information of multi-video data has therefore become a research hotspot. To this end, it is proposed herein to realize multi-video summarization using the archetypal analysis (prototype analysis) technique.
Archetypal Analysis (AA) represents each data point in a dataset as a mixture of a set of archetypes, while the archetypes themselves are constrained to be sparse mixtures of the data points in the dataset and are typically located on its boundary. AA models are widely used in different fields, such as economics, astrophysics, and pattern recognition, and their usefulness for feature extraction and dimensionality reduction has been exploited by machine-learning applications in computer vision, neuroimaging, chemistry, text mining, and collaborative filtering.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a multi-video summarization technique based on weighted prototype analysis that is suited to the characteristics of multi-video data, so that the specific information of the data is fully exploited with the assistance of effective prior information. The technical scheme adopted by the invention is a multi-video summarization method based on the weighted prototype analysis technique: first, a weighted graph model is used to model the relationships between video frames, thereby obtaining the weight matrix required by the weighted prototype analysis; then, the weighted prototype analysis is used to obtain key frames and generate a video summary of a given length.
The specific steps for obtaining the weight matrix required by the weighted prototype analysis are as follows:
Construct a weighted simple graph: given l videos under the same event, preprocess them to obtain n candidate key frames, represented by the feature vectors X = {f_1, f_2, f_3, ..., f_n}, f_i ∈ R^m, where f_i denotes the m-dimensional feature vector of the i-th candidate key frame. Taking the candidate key frames as vertices, construct a visual similarity graph G = (X, E, W), where X denotes the vertices, E the connecting edges between video frames, and W the connection weights of the edges. To compute W, first calculate the cosine similarity A(f_i, f_j) between the video frames, as in equation (1):
A(f_i, f_j) = sim(i, j) = (f_i · f_j) / (‖f_i‖ ‖f_j‖)   (1)
where sim(i, j) denotes the cosine similarity between the i-th frame and the j-th frame.
Construct the weighted graph model: using the similarity between videos, add an extra weight to the connecting edges between video frames of different videos. To present this relationship, design a weight matrix W_v, computed as in equation (2):
W_v(i, j) = 1 + sim(v(f_i), v(f_j)) if v(f_i) ≠ v(f_j); otherwise W_v(i, j) = 1   (2)
where v(f) denotes the video containing frame f, and sim(v(f_i), v(f_j)) denotes the similarity between the video containing f_i and the video containing f_j; this similarity is the cosine similarity computed from the videos' text information. The expression above adds weight only to connecting edges between frames of different videos, while the weights of connecting edges between frames within the same video remain unchanged.
Calculate the average similarity between each video frame and all the network images, and use it as an importance criterion for the video frame, as computed in equation (3):
W_q(i) = (1/k) Σ_{j=1}^{k} sim(f_i, g_j)   (3)
where g_j denotes the j-th network image and sim(f_i, g_j) denotes the cosine similarity between video frame f_i and g_j;
the connection weight matrix W of the edges of the constructed weighted graph model is then computed as in equation (4):
W = A ⊙ W_v ⊙ W_q   (4)
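As a concrete illustration of equations (1)-(4), the following Python sketch assembles the weight matrix W from the frame features, the videos' text features, and the query-based web-image features. It is a minimal sketch rather than the patent's reference implementation: all names (cosine_sim, build_weight_matrix, frame_feats, video_ids, video_text_feats, web_feats) are illustrative assumptions, the additive form of the cross-video weight in equation (2) is a reconstruction, and lifting the per-frame score of equation (3) to edge weights by an outer product is likewise an assumed reading.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def build_weight_matrix(frame_feats, video_ids, video_text_feats, web_feats):
    """frame_feats: (n, m) visual features of the n candidate key frames.
    video_ids: length-n int array, video_ids[i] = index of frame i's video.
    video_text_feats: (l, d) text features of the l videos.
    web_feats: (k, m) visual features of the k query-based web images."""
    video_ids = np.asarray(video_ids)

    # Equation (1): pairwise cosine similarity between video frames.
    A = cosine_sim(frame_feats, frame_feats)

    # Equation (2): edges between frames of different videos receive an
    # extra weight from the text similarity of their source videos, while
    # edges within one video keep weight 1 (assumed additive form).
    video_sim = cosine_sim(video_text_feats, video_text_feats)
    Wv = 1.0 + video_sim[np.ix_(video_ids, video_ids)]
    Wv[video_ids[:, None] == video_ids[None, :]] = 1.0

    # Equation (3): per-frame importance = average cosine similarity to
    # all web images; lifted to edge weights by an outer product (an
    # assumption, since the text defines only the per-frame score).
    wq = cosine_sim(frame_feats, web_feats).mean(axis=1)   # shape (n,)
    Wq = np.outer(wq, wq)

    # Equation (4): Hadamard (element-wise) product of the three factors.
    return A * Wv * Wq
```

The element-wise product in the last line mirrors equation (4): each edge weight combines the visual similarity, the cross-video boost, and the query-image importance.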
the specific steps in one example are as follows:
1) Extract the visual features of the video frames and of the query-based network images, together with the corresponding text features of the videos: the visual features of the video frames are denoted X = {f_1, f_2, f_3, ..., f_n}, f_i ∈ R^m; the visual features of the network images are denoted {g_1, g_2, ..., g_k}, g_j ∈ R^m, where g_j denotes the m-dimensional feature vector of the j-th network image; the text features of the videos are denoted {t_1, t_2, ..., t_l}, t_a ∈ R^d, where t_a denotes the text feature of the a-th video;
2) Construct the weighted complete graph: to model the correlations between video frames, take the video frames as vertices, construct the weighted simple graph G, and solve for the matrix W using equations (1)-(4);
3) Take the weight matrix W obtained in step 2) as the weight of the prototype analysis problem and construct the weighted input matrix X̃ from W and X;
4) Given X̃, perform the weighted prototype analysis and alternately obtain the optimal solution matrices P and Q with an estimation algorithm, where P denotes the coefficient matrix with which the prototypes reconstruct the input and Q denotes the coefficient matrix with which the input reconstructs the prototypes;
5) Calculate the importance score S_i of each prototype;
6) Sort the prototypes in descending order of importance and select those whose importance scores exceed a threshold ε;
7) Starting from the prototype with the highest importance score, select the video frame whose row index corresponds to the largest element in that prototype's column of Q; compare this frame with all previously selected frames, and if its similarity to any of them exceeds a threshold, do not include the frame in the summary. If the summary has not reached the required length after iterating over all prototypes, perform a further round of selection, choosing for each column of Q the row index of the second-largest element; iterate this process until the required summary length is reached. A Python sketch of steps 5)-7) is given after this list.
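The following Python sketch illustrates the ranking-and-selection loop of steps 5)-7) under stated assumptions: the importance scores S are taken as given, since the scoring formula of step 5) survives in the source only as an image, and the names and default thresholds (select_key_frames, eps, sim_thresh, summary_len) are hypothetical.

```python
import numpy as np

def select_key_frames(Q, S, frame_feats, eps=0.1, sim_thresh=0.8,
                      summary_len=10):
    """Q: (n, z) coefficients with which the input reconstructs prototypes.
    S: (z,) importance score of each prototype (step 5).
    frame_feats: (n, m) frame features, used for the redundancy check."""
    # Step 6): prototypes with score above eps, in descending score order.
    order = [i for i in np.argsort(-S) if S[i] > eps]

    # Unit-normalise once so the redundancy check is a plain dot product.
    feats = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)

    selected = []
    rank = 0   # 0: largest entry per column of Q, 1: second largest, ...
    while len(selected) < summary_len and rank < Q.shape[0]:
        for proto in order:            # step 7): scan prototypes by score
            if len(selected) >= summary_len:
                break
            # Row index of the (rank+1)-th largest entry in column proto.
            cand = int(np.argsort(-Q[:, proto])[rank])
            if cand in selected:
                continue
            # Exclude the frame if it is too similar to any chosen frame.
            if all(feats[cand] @ feats[s] <= sim_thresh for s in selected):
                selected.append(cand)
        rank += 1                      # next round: next-largest entries
    return selected
```

Each outer pass consumes the next-largest entry of every retained column of Q, which reproduces the round-by-round behaviour described in step 7).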
The characteristics and beneficial effects of the invention are as follows:
the invention mainly aims at the characteristics of the existing multi-video abstract data set, designs a multi-video abstract technology which is suitable for the characteristics and is based on a prototype analysis method with weight, and makes full use of the specific information of the data under the assistance of effective prior information. The main advantages are mainly as follows:
(1) Novelty: the weighted prototype analysis method is applied to query-oriented multi-video summarization for the first time, and a weighted graph model is used to model the relationships between video frames, jointly introducing the videos' text information and the query-based network-image information into the multi-video summary.
(2) Effectiveness: compared with a typical clustering method and with the minimum sparse reconstruction method applied to single-video summarization, the performance of the weighted-prototype-analysis-based multi-video summarization method designed by the invention is clearly superior, which makes it better suited to the multi-video summarization problem.
(3) Practicality: the method is simple and feasible and can be used in the field of multimedia information processing.
Description of the drawings:
FIG. 1 is a flow chart of video key-shot extraction based on the weighted prototype analysis method provided by the invention.
Detailed Description
Aiming at the large amount of redundant and repeated information in multimedia video data, the invention combines the visual information and text information of the videos with other prior information related to their topics, and uses the idea of prototype analysis to improve the traditional multi-video summarization method, thereby effectively exploiting the information related to the video topics and improving users' video-browsing efficiency.
The method provided by the invention mainly comprises the following steps: 1) first, a weighted graph model is designed to construct the associations between video frames; 2) then, using the weighted prototype analysis technique, a key-frame selection method suited to the characteristics of query-oriented multi-video summarization datasets is designed.
Archetypal Analysis (AA) represents each data point in a dataset as a mixture of a set of archetypes, while the archetypes themselves are constrained to be sparse mixtures of the data points in the dataset and are typically located on its boundary.
Given an n × m matrix X = {f_1, f_2, ..., f_i, ..., f_n}, f_i ∈ R^m, and z < n, the prototype analysis problem factorizes the matrix X into two stochastic matrices P ∈ R^{n×z} and Q ∈ R^{n×z}, where P denotes the coefficient matrix with which the prototypes reconstruct the input and Q denotes the coefficient matrix with which the input reconstructs the prototypes, as follows:
X ≈ P A   with   A = Q^T X   (5)
The prototype analysis algorithm first initializes the matrices P and Q to compute the prototype matrix A, and then updates P and Q according to equation (5) until the residual sum of squares (RSS) converges to a sufficiently small value or the maximum number of iterations is reached.
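The patent does not specify the estimation algorithm, so the following Python sketch assumes simple projected-gradient updates onto the probability simplex for the alternating optimization of P and Q; it is an illustrative reconstruction of plain (unweighted) archetypal analysis with the RSS-based stopping rule described above, and all names are hypothetical.

```python
import numpy as np

def project_rows_to_simplex(V):
    """Euclidean projection of every row of V onto the probability simplex."""
    u = np.sort(V, axis=1)[:, ::-1]               # rows sorted descending
    css = np.cumsum(u, axis=1) - 1.0
    ind = np.arange(1, V.shape[1] + 1)
    rho = np.sum(u - css / ind > 0, axis=1)       # support size per row
    theta = css[np.arange(V.shape[0]), rho - 1] / rho
    return np.maximum(V - theta[:, None], 0.0)

def archetypal_analysis(X, z, n_iter=500, lr=1e-2, tol=1e-6, seed=0):
    """Plain AA: X (n, m) ~= P @ (Q.T @ X), with stochastic P, Q (n, z)."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    P = project_rows_to_simplex(rng.random((n, z)))       # rows sum to 1
    Q = project_rows_to_simplex(rng.random((n, z)).T).T   # columns sum to 1
    prev_rss = np.inf
    for _ in range(n_iter):
        E = X - P @ (Q.T @ X)                    # residual with current Q
        P = project_rows_to_simplex(P + lr * E @ X.T @ Q)        # update P
        E = X - P @ (Q.T @ X)                    # residual with updated P
        Q = project_rows_to_simplex((Q + lr * X @ E.T @ P).T).T  # update Q
        rss = float(np.sum(E ** 2))              # residual sum of squares
        if abs(prev_rss - rss) < tol:
            break                                # RSS has converged
        prev_rss = rss
    return P, Q
```

For the weighted variant used by the invention, the same routine can then be run on a reweighted input matrix, matching how steps 3)-4) above first construct X̃ and then solve the analysis on it.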
The prototype analysis problem above treats all video frames as having the same weight: when the prototypes are obtained with equation (5), every data point (video frame) and its corresponding residual are normalized by the same weight. In a multi-video summary, however, the video frames are not all alike; some are more important than others. The invention therefore uses weighted prototype analysis to obtain the key frames.
First, the invention uses a weighted graph model to model the relationships between video frames, thereby obtaining the weight matrix required by the weighted prototype analysis.
To model the relationships between video frames, the invention constructs a weighted simple graph. Given l videos under the same event, preprocessing yields n candidate key frames X = {f_1, f_2, f_3, ..., f_n}, f_i ∈ R^m. Taking the candidate key frames as vertices, the invention constructs a visual similarity graph G = (X, E, W), where X denotes the vertices, E the connecting edges between video frames, and W the connection weights of the edges. To compute W, the invention first calculates the cosine similarity A(f_i, f_j) between the video frames, as in equation (1):
A(f_i, f_j) = sim(i, j) = (f_i · f_j) / (‖f_i‖ ‖f_j‖)   (1)
where sim(i, j) denotes the cosine similarity between the i-th frame and the j-th frame.
It has been observed that distinguishing the inter-frame similarity within a video from the inter-frame similarity between videos helps improve the quality of the multi-video summary. Therefore, to reflect the influence of the relationships between videos on the similarity between frames, a weighted graph model is constructed: the invention uses the similarity between videos to add an extra weight to the connecting edges between video frames of different videos. To present this relationship, the invention designs a weight matrix W_v, computed as in equation (2):
W_v(i, j) = 1 + sim(v(f_i), v(f_j)) if v(f_i) ≠ v(f_j); otherwise W_v(i, j) = 1   (2)
where v(f) denotes the video containing frame f, and sim(v(f_i), v(f_j)) denotes the similarity between the video containing f_i and the video containing f_j, namely the cosine similarity computed from the videos' text information. The expression above adds weight only to connecting edges between frames of different videos, while the weights of connecting edges between frames within the same video remain unchanged.
Recently, with more and more user-generated information such as images and videos available on websites, using this external information as an aid to generate summaries has become a natural idea. We treat the query images as prior information for obtaining the important content of the videos. Query images are uploaded as information complementary to the videos after careful selection by users; they therefore present the main content of an event in a more semantic way and carry less redundant and noisy information than the videos. All of this indicates that query images, as prior information, facilitate the generation of the multi-video summary. The invention therefore first calculates the average similarity between each video frame and all the network images and uses it as an importance criterion for the video frame, as computed in equation (3):
W_q(i) = (1/k) Σ_{j=1}^{k} sim(f_i, g_j)   (3)
where g_j denotes the j-th network image, sim(f_i, g_j) denotes the cosine similarity between video frame f_i and g_j, and W_q(i) denotes the average of the cosine similarities between video frame f_i and all the network images.
The connection weight matrix W of the edges of the weighted graph model constructed in this way is computed as in equation (4):
W = A ⊙ W_v ⊙ W_q   (4)
After the weight matrix W is obtained, the invention applies the weighted prototype analysis technique to obtain the key frames. The weighted prototype analysis problem can be regarded as a minimization problem, which can also be rewritten in an equivalent form.
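Both displays survive in the source only as image references. The following LaTeX gives one plausible reading, modeled on the weighted archetypal analysis literature cited in the non-patent references at the end of this document; it is an assumption, not the patent's verbatim equations.

```latex
% A plausible reconstruction of the weighted archetypal analysis problem.
% W is the edge-weight matrix of equations (1)-(4); P, Q are stochastic.
\begin{align*}
  \min_{P,\,Q}\;&
    \bigl\lVert W\bigl(X - P\,Q^{\top}X\bigr)\bigr\rVert_F^{2}
  \quad\text{s.t.}\quad
    P\mathbf{1}_{z}=\mathbf{1}_{n},\; P\ge 0,\quad
    Q^{\top}\mathbf{1}_{n}=\mathbf{1}_{z},\; Q\ge 0,
  \\[4pt]
  \intertext{which, taking the reweighted input of step 3) to be
  $\tilde{X}=WX$, can be treated as plain archetypal analysis on $\tilde{X}$:}
  \min_{P,\,Q}\;&
    \bigl\lVert \tilde{X} - P\,Q^{\top}\tilde{X}\bigr\rVert_F^{2}.
\end{align*}
```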
Therefore, the multi-video summarization method based on weighted archetypal analysis mainly comprises three stages: preliminary data preparation, solving the weight matrix required by the prototype analysis with the weighted graph model, and solving the weighted prototype analysis itself.
FIG. 1 depicts the flow of extracting key shots from videos with the weighted prototype analysis method combined with web-image prior information. The main idea of the method is to soft-cluster the video frames into weighted prototypes (archetypes), then sort the video frames according to the prototypes and select the top-ranked frames as key frames, generating a video summary of a given length. The specific steps are as follows:
1) Extract the visual features of the video frames and of the query-based network images, together with the corresponding text features of the videos. The visual features of the video frames are denoted X = {f_1, f_2, f_3, ..., f_n}, f_i ∈ R^m; the visual features of the network images are denoted {g_1, g_2, ..., g_k}, g_j ∈ R^m; the text features of the videos are denoted {t_1, t_2, ..., t_l}, t_a ∈ R^d.
2) Construct the weighted complete graph. To model the correlations between video frames, the invention takes the video frames as vertices, constructs the weighted simple graph G = (X, E, W), and solves for the matrix W using equations (1)-(4).
3) Take the weight matrix W obtained in step 2) as the weight of the prototype analysis problem and construct the weighted input matrix X̃ from W and X.
4) Given X̃, perform the weighted prototype analysis and alternately obtain the optimal solutions P and Q with an estimation algorithm.
5) Calculate the importance score S_i of each prototype.
6) Sort the prototypes in descending order of importance and select those whose importance scores exceed a threshold ε.
7) Starting from the prototype with the highest importance score, select the video frame whose row index corresponds to the largest element in that prototype's column of Q; compare this frame with all previously selected frames, and if its similarity to any of them exceeds a certain threshold, exclude the frame from the summary. If the summary has not reached the required length after iterating over all prototypes, perform a further round of selection, choosing for each column of Q the row index of the second-largest element. Iterate this process until the required summary length is reached.

Claims (2)

1. A multi-video summarization method based on the weighted prototype analysis technique, characterized in that: first, a weighted graph model is used to model the relationships between video frames, thereby obtaining the weight matrix required by the weighted prototype analysis; second, key frames are obtained with the weighted prototype analysis to generate a video summary of a given length; the specific steps for obtaining the weight matrix required by the weighted prototype analysis are as follows:
constructing a weighted simple graph: given l videos under the same event, preprocessing them to obtain n candidate key frames, represented by the feature vectors X = {f_1, f_2, f_3, ..., f_n}, f_i ∈ R^m, where f_i denotes the m-dimensional feature vector of the i-th candidate key frame; taking the candidate key frames as vertices, constructing a visual similarity graph G = (X, E, W), where X denotes the vertices, E the connecting edges between video frames, and W the connection weights of the edges; to compute W, first calculating the cosine similarity A(f_i, f_j) between the video frames, as in equation (1):
A(f_i, f_j) = sim(i, j) = (f_i · f_j) / (‖f_i‖ ‖f_j‖)   (1)
where sim(i, j) denotes the cosine similarity between the i-th frame and the j-th frame;
constructing the weighted graph model: using the similarity between videos, adding an extra weight to the connecting edges between video frames of different videos, and designing a weight matrix W_v to present this relationship, computed as in equation (2):
W_v(i, j) = 1 + sim(v(f_i), v(f_j)) if v(f_i) ≠ v(f_j); otherwise W_v(i, j) = 1   (2)
where v(f) denotes the video containing frame f, and sim(v(f_i), v(f_j)) denotes the similarity between the video containing f_i and the video containing f_j, namely the cosine similarity computed from the videos' text information; the expression above adds weight only to connecting edges between frames of different videos, while the weights of connecting edges between frames within the same video remain unchanged;
calculating the average similarity between each video frame and all the network images, and using it as an importance criterion for the video frame, as computed in equation (3):
W_q(i) = (1/k) Σ_{j=1}^{k} sim(f_i, g_j)   (3)
where g_j denotes the j-th network image and sim(f_i, g_j) denotes the cosine similarity between video frame f_i and g_j;
the calculation of the connection weight matrix W of the edges of the constructed weighted graph model is shown in equation (4):
W = A ⊙ W_v ⊙ W_q   (4).
2. The multi-video summarization method based on the weighted prototype analysis technique according to claim 1, characterized by comprising the following specific steps:
1) extracting the visual features of the video frames and of the query-based network images, together with the corresponding text features of the videos: the visual features of the video frames are denoted X = {f_1, f_2, f_3, ..., f_n}, f_i ∈ R^m; the visual features of the network images are denoted {g_1, g_2, ..., g_k}, g_j ∈ R^m, where g_j denotes the m-dimensional feature vector of the j-th network image; the text features of the videos are denoted {t_1, t_2, ..., t_l}, t_a ∈ R^d, where t_a denotes the text feature of the a-th video;
2) constructing the weighted complete graph: to model the correlations between video frames, taking the video frames as vertices, constructing the weighted simple graph G, and solving for the matrix W using equations (1)-(4);
3) taking the weight matrix W obtained in step 2) as the weight of the prototype analysis problem and constructing the weighted input matrix X̃ from W and X;
4) given X̃, performing the weighted prototype analysis and alternately obtaining the optimal solution matrices P and Q with an estimation algorithm, where P denotes the coefficient matrix with which the prototypes reconstruct the input and Q denotes the coefficient matrix with which the input reconstructs the prototypes;
5) calculating the importance score S_i of each prototype;
6) sorting the prototypes in descending order of importance and selecting those whose importance scores exceed a threshold ε;
7) starting from the prototype with the highest importance score, selecting the video frame whose row index corresponds to the largest element in that prototype's column of Q, comparing this frame with all previously selected frames, and, if its similarity to any of them exceeds a threshold, not including the frame in the summary; if the summary has not reached the required length after iterating over all prototypes, performing a further round of selection, choosing for each column of Q the row index of the second-largest element; iterating this process until the required summary length is reached.
CN201711249015.1A 2017-12-01 2017-12-01 Multi-video abstraction method based on prototype analysis technology with weight Active CN107943990B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711249015.1A CN107943990B (en) 2017-12-01 2017-12-01 Multi-video abstraction method based on prototype analysis technology with weight

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711249015.1A CN107943990B (en) 2017-12-01 2017-12-01 Multi-video abstraction method based on prototype analysis technology with weight

Publications (2)

Publication Number Publication Date
CN107943990A CN107943990A (en) 2018-04-20
CN107943990B true CN107943990B (en) 2020-02-14

Family

ID=61948265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711249015.1A Active CN107943990B (en) 2017-12-01 2017-12-01 Multi-video abstraction method based on prototype analysis technology with weight

Country Status (1)

Country Link
CN (1) CN107943990B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110769279B (en) 2018-07-27 2023-04-07 北京京东尚科信息技术有限公司 Video processing method and device
CN109857906B (en) * 2019-01-10 2023-04-07 天津大学 Multi-video abstraction method based on query unsupervised deep learning
CN110147469B (en) * 2019-05-14 2023-08-08 腾讯音乐娱乐科技(深圳)有限公司 Data processing method, device and storage medium
CN110298270B (en) * 2019-06-14 2021-12-31 天津大学 Multi-video abstraction method based on cross-modal importance perception
CN111062284B (en) * 2019-12-06 2023-09-29 浙江工业大学 Visual understanding and diagnosis method for interactive video abstract model
CN111339359B (en) * 2020-02-18 2020-12-22 中山大学 Sudoku-based video thumbnail automatic generation method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106993240A (en) * 2017-03-14 2017-07-28 天津大学 Many video summarization methods based on sparse coding
CN107203636A (en) * 2017-06-08 2017-09-26 天津大学 Many video summarization methods based on the main clustering of hypergraph

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106993240A (en) * 2017-03-14 2017-07-28 天津大学 Many video summarization methods based on sparse coding
CN107203636A (en) * 2017-06-08 2017-09-26 天津大学 Many video summarization methods based on the main clustering of hypergraph

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ercan Canhasi et al., "Weighted archetypal analysis of the multi-element graph for query-focused multi-document summarization", Expert Systems with Applications, 2014, pp. 535–543 *
Ercan Canhasi et al., "Weighted hierarchical archetypal analysis for multi-document summarization", Computer Speech and Language, 2016, pp. 24–46 *

Also Published As

Publication number Publication date
CN107943990A (en) 2018-04-20

Similar Documents

Publication Publication Date Title
CN107943990B (en) Multi-video abstraction method based on prototype analysis technology with weight
Hidasi et al. Parallel recurrent neural network architectures for feature-rich session-based recommendations
Peng et al. Cross-media shared representation by hierarchical learning with multiple deep networks.
CN107102989B (en) Entity disambiguation method based on word vector and convolutional neural network
Yang et al. Unsupervised extraction of video highlights via robust recurrent auto-encoders
CN107203636B (en) Multi-video abstract acquisition method based on hypergraph master set clustering
Gupta et al. Nonnegative shared subspace learning and its application to social media retrieval
US8451292B2 (en) Video summarization method based on mining story structure and semantic relations among concept entities thereof
Cao et al. Hybrid representation learning for cross-modal retrieval
CN107423282A (en) Semantic Coherence Sexual Themes and the concurrent extracting method of term vector in text based on composite character
CN103559196A (en) Video retrieval method based on multi-core canonical correlation analysis
CN101299241A (en) Method for detecting multi-mode video semantic conception based on tensor representation
Zhang et al. Recognition of emotions in user-generated videos with kernelized features
US20140229486A1 (en) Method and apparatus for unsupervised learning of multi-resolution user profile from text analysis
CN107911755B (en) Multi-video abstraction method based on sparse self-encoder
Wen et al. Seeking the shape of sound: An adaptive framework for learning voice-face association
CN106993240B (en) Multi-video abstraction method based on sparse coding
Liu et al. Using collaborative filtering algorithms combined with Doc2Vec for movie recommendation
CN109034953B (en) Movie recommendation method
Gao et al. Deep spatial pyramid features collaborative reconstruction for partial person reid
Guo et al. Attention based consistent semantic learning for micro-video scene recognition
Ji et al. Multi-video summarization with query-dependent weighted archetypal analysis
Dai et al. Two-stage model for social relationship understanding from videos
Wang et al. Balance act: Mitigating hubness in cross-modal retrieval with query and gallery banks
Zhang et al. Referring expression comprehension with semantic visual relationship and word mapping

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant