CN107943990A - Multi-video summarization method based on weighted archetypal analysis - Google Patents
- Publication number
- CN107943990A CN107943990A CN201711249015.1A CN201711249015A CN107943990A CN 107943990 A CN107943990 A CN 107943990A CN 201711249015 A CN201711249015 A CN 201711249015A CN 107943990 A CN107943990 A CN 107943990A
- Authority
- CN
- China
- Prior art keywords
- video
- weight
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7847—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/73—Querying
- G06F16/738—Presentation of query results
- G06F16/739—Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7844—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Library & Information Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to the technical field of video processing, and proposes a multi-video summarization method based on weighted archetypal analysis that suits the characteristics of such data, so that, with the aid of effective prior information, the information specific to the data can be fully exploited. The technical solution adopted by the present invention is a multi-video summarization method based on the weighted archetypal analysis technique: first, the relations between video frames are modeled with a weighted graph model, yielding the weight matrix required by weighted archetypal analysis; then key frames are obtained by weighted archetypal analysis, and a video summary of a given length is generated. The present invention is mainly applied to video processing.
Description
Technical field
The present invention relates to the technical field of video processing and, specifically, to a multi-video summarization method based on weighted archetypal analysis.
Background technology
With the rapid development of information technology, video data has proliferated and has become one of the important channels through which people obtain information. However, owing to the sharp increase in the number of videos, massive video data contains redundant and repeated information, which makes it difficult for users to quickly obtain the information they need. There is therefore an urgent need for a technique that can integrate and analyze the massive video data under the same topic, so as to meet people's need to browse the main information of videos quickly and accurately and to improve their ability to acquire information. Multi-video summarization, as one of the effective ways to solve the above problems, has attracted the attention of more and more researchers over the past few decades. Multi-video summarization is a content-based video data compression technique that aims to analyze and integrate multiple videos on related topics under the same event, extract the main content of the videos, and present the extracted content to the user according to a certain logical relation. At present, multi-video summaries are mainly analyzed from three aspects: 1) coverage; 2) redundancy; 3) importance. Coverage means that the extracted video content can cover the main content of the multiple videos under the same topic. Redundancy refers to removing repeated, redundant information from the multi-video summary. Importance means extracting the important key shots in the video set according to some prior information, so as to extract the important content of the multiple videos.
Although many single-video summarization methods have been proposed, research on multi-video summarization remains limited and is still at an early stage. There are two main reasons. 1) The first is the diversity of topics across the multiple videos under the same event and the overlap of topics between videos. Topic diversity means that the multiple videos under the same event emphasize different information and contain multiple sub-topics. Topic overlap means that the content of the videos under the same event intersects: there is similar content as well as different information. 2) The second is that the audio, text, and visual information with which multiple videos present the same content may differ considerably. These factors make multi-video summarization difficult to address with traditional single-video summarization.

Over the past few decades, methods targeting the characteristics of multi-video data sets have been proposed. Among them, multi-video summarization based on complex-graph clustering is a comparatively classical method. It extracts keywords from the script information associated with the videos and the key frames of the videos, builds a complex graph, and on this basis realizes summarization with a graph-clustering algorithm. However, this method mainly targets news video; for video sets without script information it loses its meaning. In addition, because the content of multiple videos under the same topic is both diverse and redundant, clustering alone only satisfies the maximal-coverage condition on video content: for multi-video summarization, clustering on visual information alone performs poorly, and although combining other modalities helps somewhat, the complexity is large.
Multi-video summarization involves information from multiple modalities, such as the text, visual, and audio information of a video. Balanced AV-MMR (Balanced Audio Video Maximal Marginal Relevance) is a multi-video summarization technique that effectively exploits this multi-modal information: it analyzes the visual and audio information of the videos together with the semantic information they carry, including audio, face, and temporal features, all of which are significant for video summarization. The method uses the multi-modal information of video efficiently, but the extracted summaries do not reach the desired quality.
In recent years, some novel methods have been proposed. Among them, exploiting the visual co-occurrence of videos is one of the relatively new approaches to multi-video summarization. This method assumes that important visual concepts tend to appear repeatedly across the multiple videos under the same topic and, based on this property, proposes a Maximal Biclique Finding algorithm that extracts sparse co-occurrence patterns across the videos, thereby realizing multi-video summarization. However, the method applies only to specific data sets; for video sets with little repetition across videos it loses its meaning.
In addition, to exploit more related information, researchers have proposed using sensors such as the GPS and compass on mobile phones to obtain information such as the geographical position during mobile video capture, using it to help judge the important information in the videos and generate multi-video summaries. The prior art also teaches using web images as auxiliary prior information to better realize multi-video summarization. At present, owing to the complexity of multi-video data, research on multi-video summarization has not reached the desired quality. How to better exploit the information of multi-video data and better realize multi-video summarization has therefore become a research hotspot. To this end, this document proposes realizing multi-video summarization with archetypal analysis (Archetypal Analysis).

Archetypal analysis (Archetypal Analysis, AA) regards each data point in a data set as a mixture of a set of individual, observable archetypes, while the archetypes themselves are restricted to sparse mixtures of the data points and generally lie on the boundary of the data set. AA models are widely used in different fields, such as economics, astrophysics, and pattern recognition, and their usefulness for feature extraction and dimensionality reduction has been exploited by machine learning algorithms in fields such as computer vision, neuroimaging, chemistry, text mining, and collaborative filtering.
Summary of the invention
To overcome the deficiencies of the prior art, the present invention aims to propose a multi-video summarization method based on weighted archetypal analysis that suits the characteristics of such data, so that, with the aid of effective prior information, the information specific to the data can be fully exploited. The technical solution adopted by the present invention is a multi-video summarization method based on the weighted archetypal analysis technique: first, the relations between video frames are modeled with a weighted graph model, yielding the weight matrix required by weighted archetypal analysis; then key frames are obtained by weighted archetypal analysis, and a video summary of a given length is generated.
The specific steps for obtaining the weight matrix required by weighted archetypal analysis are:

Build a weighted simple graph. Given l videos under the same event, n candidate key frames are obtained after preprocessing and expressed as feature vectors X={f_1,f_2,f_3,...,f_n}, f_i ∈ R^m, where f_i denotes the m-dimensional feature vector of the i-th candidate key frame. Taking the candidate key frames as vertices, construct a visual-similarity graph G=(X,E,W), where X denotes the vertices, E the connecting edges between video frames, and W the visual connection weights of the edges. To compute W, first compute the cosine similarity A(f_i,f_j) between video frames according to equation (1):

A(f_i, f_j) = sim(f_i, f_j) / Σ_{f_j ∈ X, j ≠ i} sim(f_i, f_j)    (1)

where sim(i,j) denotes the cosine similarity between the i-th and j-th frames.

Build a weighted graph model. Using the similarity between videos, an extra weight is added to the connecting edges between frames of different videos. To express this relation, the weight matrix W_v is designed according to equation (2):

W_v = 1                           if v(f_i) = v(f_j)
W_v = 1 + sim(v(f_i), v(f_j))     if v(f_i) ≠ v(f_j)    (2)

where v(f) denotes the video containing frame f and sim(v(f_i),v(f_j)) denotes the similarity between the video containing f_i and the video containing f_j; the similarity here is the cosine similarity computed from the text information of the videos. The expression above only increases the weight of connecting edges between frames of different videos, while the weights of edges between frames within the same video remain unchanged.

Compute the average similarity between a video frame and all web images and take it as the importance criterion of the frame, according to formula (3):

W_q = Σ_{j=1}^{k} sim(f_i, g_j)    (3)

where g_j denotes the j-th web image and sim(f_i,g_j) the cosine similarity between video frame f_i and g_j.

The connection-weight matrix W of the edges of the constructed weighted graph model is computed as in equation (4):

W = A ⊙ W_v ⊙ W_q    (4)
The steps in one example are as follows:

1) Extract the visual features of the video frames and of the query-based web images, and the text features of the videos: the visual features of the video frames are expressed as X={f_1,f_2,f_3,...,f_n}, f_i ∈ R^m; the visual features of the web images as {g_1,g_2,...,g_k}, g_k ∈ R^m, where g_k denotes the m-dimensional feature vector of the k-th web image; and the text features of the videos as {t_1,t_2,...,t_l}, t_a ∈ R^d, where t_a denotes the text feature of the a-th video;

2) Build the weighted complete graph: to model the dependency relations between video frames, take the video frames as vertices, build the weighted simple graph G=(X,E,W), and solve for the matrix W with formulas (1)-(4);

3) Use the weight matrix W obtained in step 2) as the weights of the archetypal analysis problem, and build the input matrix;

4) Perform weighted archetypal analysis on the input matrix, obtaining the optimal solution matrices P and Q by an alternating estimation algorithm, where P denotes the coefficient matrix with which the archetypes reconstruct the input and Q the coefficient matrix with which the input reconstructs the archetypes;

5) Compute the importance score S_i of each archetype;

6) Sort the archetypes in descending order and select those whose importance score exceeds a threshold ε;

7) Starting from the archetype with the largest importance score, select the video frame whose row index corresponds to the largest element in the corresponding column of Q, and judge its similarity to all previously selected frames; if the similarity exceeds a threshold, the frame is not included in the summary. If, after iterating over all archetypes, the summary has not reached the required length, perform the next round of selection, choosing from each column of Q the row index of the second-largest value; then iterate the above process until the required summary length is met.
Features and benefits of the present invention:

The present invention targets the characteristics of existing multi-video summarization data sets and designs a multi-video summarization technique based on weighted archetypal analysis that, with the aid of effective prior information, fully exploits the information specific to the data. Its main advantages are:

(1) Novelty: weighted archetypal analysis is applied for the first time to query-oriented multi-video summarization, and a weighted graph model is used to incorporate both the text information of the videos and the query-based web-image information into the multi-video summarization model to model the relations between video frames.

(2) Effectiveness: experiments confirm that, compared with the typical clustering methods and minimum sparse reconstruction methods applied to single-video summarization, the multi-video summarization method based on weighted archetypal analysis designed by the present invention clearly outperforms both, and is thus better suited to multi-video summarization problems.

(3) Practicality: simple and feasible; usable in the field of multimedia signal processing.
Brief description of the drawings:

Fig. 1 is the flow chart of video key-shot extraction with the weighted archetypal analysis method provided by the present invention.
Embodiments

Addressing characteristics of multimedia video data such as redundancy and abundant duplicate information, the present invention combines the visual information and text information of the videos with other topic-related prior information, and improves traditional multi-video summarization methods with the idea of archetypal analysis, achieving efficient use of topic-related video information and improving the efficiency with which users browse videos.

The method provided by the present invention is broadly divided into: 1) first, a weighted graph model is designed to construct the relevance between video frames; 2) then, the weighted archetypal analysis technique is used to design a key-frame selection method suited to the characteristics of query-oriented multi-video summarization data sets.

Archetypal analysis (Archetypal Analysis, AA) regards each data point in a data set as a mixture of a set of individual, observable archetypes, while the archetypes themselves are restricted to sparse mixtures of the data points and generally lie on the boundary of the data set.
Given the n × m matrix X={f_1,f_2,...,f_i,...,f_n}, f_i ∈ R^m, and with z << n, the archetypal analysis problem factorizes X into two stochastic matrices P ∈ R^{n×z} and Q ∈ R^{n×z}, where P denotes the coefficient matrix with which the archetypes reconstruct the input and Q the coefficient matrix with which the input reconstructs the archetypes:

X ≈ PA with A = Qᵀ X    (5)

The archetypal analysis algorithm first initializes the matrices P and Q and computes the archetype matrix A, then alternately updates P and Q until the residual sum of squares (RSS) converges to a sufficiently small value or the maximum number of iterations is reached.
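The alternating scheme just described can be sketched as follows. This is a minimal projected-gradient implementation of plain (unweighted) archetypal analysis under the X ≈ P(QᵀX) factorization; the simplex projection, step sizes, and stopping rule are illustrative choices, not the patent's exact update equations.

```python
import numpy as np

def project_rows_to_simplex(V):
    """Euclidean projection of each row of V onto the probability simplex
    (sorting-based algorithm)."""
    n, d = V.shape
    U = np.sort(V, axis=1)[:, ::-1]
    css = np.cumsum(U, axis=1) - 1.0
    ind = np.arange(1, d + 1)
    rho = np.count_nonzero(U - css / ind > 0, axis=1)
    theta = css[np.arange(n), rho - 1] / rho
    return np.maximum(V - theta[:, None], 0.0)

def archetypal_analysis(X, z, n_iter=300, seed=0):
    """Plain AA: find row-stochastic P (n x z) and column-stochastic Q (n x z)
    minimizing ||X - P (Q^T X)||_F^2 by alternating projected gradient steps."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    P = project_rows_to_simplex(rng.random((n, z)))
    Q = project_rows_to_simplex(rng.random((z, n))).T   # columns lie on the simplex
    for _ in range(n_iter):
        A = Q.T @ X                                     # z x m archetype matrix
        G = (P @ A - X) @ A.T                           # gradient w.r.t. P (up to a factor 2)
        P = project_rows_to_simplex(P - G / (np.linalg.norm(A @ A.T) + 1e-9))
        R = X - P @ (Q.T @ X)                           # residual with the updated P
        GQ = -X @ R.T @ P                               # gradient w.r.t. Q (up to a factor 2)
        step = 1.0 / (np.linalg.norm(X @ X.T) * np.linalg.norm(P.T @ P) + 1e-9)
        Q = project_rows_to_simplex((Q - step * GQ).T).T
    return P, Q
```

Both subproblems are convex quadratics, and the conservative Frobenius-norm step bounds keep each projected step non-increasing in the RSS.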
However, the archetypal analysis problem above treats all video frames as frames of equal weight: each data point (video frame) and its residual enter the minimization with the same weight when the archetypes are obtained. In multi-video summarization, video frames are not equivalent; they differ in importance. The present invention therefore obtains the key frames with weighted archetypal analysis.

The present invention first models the relations between video frames with a weighted graph model, obtaining the weight matrix required by weighted archetypal analysis.
To model the relations between video frames, the present invention constructs a weighted simple graph. Given l videos under the same event, n candidate key frames X={f_1,f_2,f_3,...,f_n}, f_i ∈ R^m are obtained after preprocessing. The present invention takes the candidate key frames as vertices and constructs the visual-similarity graph G=(X,E,W), where X denotes the vertices, E the connecting edges between video frames, and W the visual connection weights of the edges. To compute W, the present invention first computes the cosine similarity A(f_i,f_j) between video frames according to equation (1):

A(f_i, f_j) = sim(f_i, f_j) / Σ_{f_j ∈ X, j ≠ i} sim(f_i, f_j)    (1)

where sim(i,j) denotes the cosine similarity between the i-th and j-th frames.
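Equation (1) can be evaluated for all frame pairs at once. The sketch below assumes nonnegative feature vectors (so the cosine similarities are nonnegative) and excludes the diagonal from the normalization, as the summation index j ≠ i indicates.

```python
import numpy as np

def frame_similarity_A(X):
    """Row-normalized cosine similarity between candidate key frames, eq. (1):
    A(f_i, f_j) = sim(f_i, f_j) / sum_{j != i} sim(f_i, f_j)."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    S = Xn @ Xn.T                        # pairwise cosine similarities
    np.fill_diagonal(S, 0.0)             # the sum in eq. (1) runs over j != i
    return S / (S.sum(axis=1, keepdims=True) + 1e-12)
```

Each row of the returned matrix sums to one, so A can be read as a transition-like normalization of the visual-similarity graph.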
Observation shows that distinguishing the inter-frame similarity relations within a video from those across videos helps improve the quality of multi-video summaries. To reflect the influence of the relation between videos on the inter-frame similarity relations, a weighted graph model is constructed here: using the similarity between videos, the present invention adds an extra weight to the connecting edges between frames of different videos. To express this relation, the present invention designs the weight matrix W_v according to equation (2):

W_v = 1                           if v(f_i) = v(f_j)
W_v = 1 + sim(v(f_i), v(f_j))     if v(f_i) ≠ v(f_j)    (2)

where v(f) denotes the video containing frame f, and sim(v(f_i),v(f_j)) denotes the similarity between the video containing f_i and the video containing f_j; the similarity here is the cosine similarity computed from the text information of the videos. The expression above only increases the weight of connecting edges between frames of different videos, while the weights of edges between frames within the same video remain unchanged.
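A direct reading of equation (2) gives the following construction; the frame-to-video assignment and the video-level text similarities are assumed to be available from preprocessing.

```python
import numpy as np

def video_weight_matrix(video_of, text_sim):
    """Eq. (2): W_v[i, j] = 1 if frames i and j come from the same video,
    else 1 + cosine similarity of the two videos' text features.
    video_of: length-n array, video index of each frame.
    text_sim: l x l matrix of video-level text cosine similarities."""
    video_of = np.asarray(video_of)
    same = video_of[:, None] == video_of[None, :]          # same-video mask
    Wv = 1.0 + text_sim[video_of[:, None], video_of[None, :]]
    Wv[same] = 1.0                                         # within-video edges keep weight 1
    return Wv
```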
Recently, with more and more user-generated information, such as images and videos, retrievable from websites, a natural idea is to use this external information to assist summary generation. We regard the query images as prior information for obtaining the important content of the videos. Query images are uploaded by users, after careful selection, as complementary information for a video; they therefore present the main content of an event in a more semantic way and carry less redundancy and noise. All of this shows that query images, as prior information, contribute to the generation of multi-video summaries. The present invention therefore first computes the average similarity between a video frame and all web images and takes it as the importance criterion of the frame, according to formula (3):

W_q = Σ_{j=1}^{k} sim(f_i, g_j)    (3)

where g_j denotes the j-th web image, sim(f_i,g_j) denotes the cosine similarity between video frame f_i and g_j, and W_q denotes the sum of the cosine similarities between frame f_i and all web images.
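Equation (3) can likewise be evaluated for all frames at once; the sketch below assumes one feature vector per row for both the video frames and the web images.

```python
import numpy as np

def query_prior_scores(X, G):
    """Eq. (3): W_q[i] = sum_j sim(f_i, g_j), the summed cosine similarity
    between frame feature f_i (rows of X) and web-image features g_j (rows of G)."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    Gn = G / (np.linalg.norm(G, axis=1, keepdims=True) + 1e-12)
    return (Xn @ Gn.T).sum(axis=1)       # one prior score per frame
```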
The connection-weight matrix W of the edges of the constructed weighted graph model is therefore computed as in equation (4):

W = A ⊙ W_v ⊙ W_q    (4)

After the weight matrix W is obtained, the invention obtains the key frames with the weighted archetypal analysis technique. The weighted archetypal analysis problem can be regarded as a minimization problem in which the reconstruction residual of each video frame enters with its weight from W; this problem can in turn be rewritten as a standard archetypal analysis problem on a weighted input matrix.

Thus, the resulting multi-video summarization method based on weighted archetypal analysis (Weighted archetypal analysis) mainly comprises three stages: preliminary data preparation; solving the weight matrix required by archetypal analysis with the weighted graph model; and solving the weighted archetypal analysis.
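The combination in equation (4) and the subsequent weighting of the input can be sketched as follows. Two points are assumptions rather than the patent's stated formulas: W_q is a per-frame quantity, so it is broadcast across the rows of the n × n matrices; and the edge weights are folded into per-frame weights by row aggregation before scaling the frames, since the exact construction of the input matrix from W is given by a formula not reproduced above.

```python
import numpy as np

def combined_weight_matrix(A, Wv, Wq):
    """Eq. (4): W = A * W_v * W_q (elementwise). Broadcasting the per-frame
    scores W_q over rows is an assumed reading."""
    return A * Wv * Wq[:, None]

def weighted_input(X, W):
    """Fold the n x n edge weights into per-frame weights and scale the frames.
    Row-sum aggregation and square-root scaling are illustrative assumptions."""
    w = W.sum(axis=1)
    w = w / (w.max() + 1e-12)            # normalize so the top frame keeps unit scale
    return np.sqrt(w)[:, None] * X
```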
Fig. 1 is described with reference to the web-image prior information: the flow chart of extracting the key shots in the videos with the weighted archetypal analysis method. The main idea of the method is to soft-cluster (soft-clustered) the video frames into weighted archetypes (archetypes), then rank the video frames according to the archetypes and select the top-ranked frames as key frames, generating a video summary of the given length. The specific steps of the invention are as follows:

1) Extract the visual features of the video frames and of the query-based web images, and the text features of the videos. The visual features of the video frames are expressed as X={f_1,f_2,f_3,...,f_n}, f_i ∈ R^m; the visual features of the web images as {g_1,g_2,...,g_k}, g_j ∈ R^m; and the text features of the videos as {t_1,t_2,...,t_l}, t_a ∈ R^d.

2) Build the weighted complete graph. To model the dependency relations between video frames, the present invention takes the video frames as vertices, builds the weighted simple graph G=(X,E,W), and solves for the matrix W with formulas (1)-(4).

3) Use the weight matrix W obtained in step 2) as the weights of the archetypal analysis problem, and build the input matrix.

4) Perform weighted archetypal analysis on the input matrix, obtaining the optimal solutions P and Q by an alternating estimation algorithm.

5) Compute the importance score S_i of each archetype.

6) Sort the archetypes in descending order and select those whose importance score exceeds a threshold ε.

7) Starting from the archetype with the largest importance score, select the video frame whose row index corresponds to the largest element in the corresponding column of Q, and judge its similarity to all previously selected frames; if the similarity exceeds a certain threshold, the frame is not included in the summary. If, after iterating over all archetypes, the summary has not reached the required length, perform the next round of selection, choosing from each column of Q the row index of the second-largest value. Then iterate the above process until the required summary length is met.
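Steps 5)-7) can be sketched as the following greedy selection loop. The archetype importance scores are taken as an input because their exact formula is not reproduced above, and the threshold values are illustrative.

```python
import numpy as np

def select_keyframes(Q, scores, frame_sim, budget, score_eps=0.0, sim_thresh=0.9):
    """Greedy key-frame selection over archetypes (steps 5-7).
    Q: n x z matrix (inputs reconstructing archetypes); column p scores how
    strongly each frame contributes to archetype p.
    scores: importance score per archetype; frame_sim: n x n frame similarities."""
    order = [p for p in np.argsort(scores)[::-1] if scores[p] > score_eps]
    chosen = []
    for rank in range(Q.shape[0]):           # rank 0: largest entry per column, 1: second largest, ...
        for p in order:
            if len(chosen) >= budget:
                return chosen
            cand = int(np.argsort(Q[:, p])[::-1][rank])
            if cand in chosen:
                continue
            if any(frame_sim[cand, f] > sim_thresh for f in chosen):
                continue                     # too similar to an already selected frame
            chosen.append(cand)
    return chosen
```

Each round visits the retained archetypes in score order; only when a full pass cannot fill the budget does the loop fall back to the next-largest entry of each column, matching the round-by-round description above.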
Claims (3)
1. A multi-video summarization method based on the weighted archetypal analysis technique, characterized in that, first, the relations between video frames are modeled with a weighted graph model, yielding the weight matrix required by weighted archetypal analysis; then key frames are obtained by weighted archetypal analysis, and a video summary of a given length is generated.
2. The multi-video summarization method based on the weighted archetypal analysis technique of claim 1, characterized in that the specific steps for obtaining the weight matrix required by weighted archetypal analysis are:

building a weighted simple graph: given l videos under the same event, n candidate key frames are obtained after preprocessing and expressed as feature vectors X={f_1,f_2,f_3,...,f_n}, f_i ∈ R^m, where f_i denotes the m-dimensional feature vector of the i-th candidate key frame; taking the candidate key frames as vertices, a visual-similarity graph G=(X,E,W) is constructed, where X denotes the vertices, E the connecting edges between video frames, and W the visual connection weights of the edges; to compute W, the cosine similarity A(f_i,f_j) between video frames is first computed according to equation (1):
A(f_i, f_j) = sim(f_i, f_j) / Σ_{f_j ∈ X, j ≠ i} sim(f_i, f_j)    (1)
where sim(i,j) denotes the cosine similarity between the i-th and j-th frames;

building a weighted graph model: using the similarity between videos, an extra weight is added to the connecting edges between frames of different videos; to express this relation, the weight matrix W_v is designed according to equation (2):
W_v = 1                           if v(f_i) = v(f_j)
W_v = 1 + sim(v(f_i), v(f_j))     if v(f_i) ≠ v(f_j)    (2)
where v(f) denotes the video containing frame f, and sim(v(f_i),v(f_j)) denotes the similarity between the video containing f_i and the video containing f_j, the similarity here being the cosine similarity computed from the text information of the videos; the expression above only increases the weight of connecting edges between frames of different videos, while the weights of edges between frames within the same video remain unchanged;

computing the average similarity between a video frame and all web images as the importance criterion of the frame, according to formula (3):
W_q = Σ_{j=1}^{k} sim(f_i, g_j)    (3)
where g_j denotes the j-th web image and sim(f_i,g_j) the cosine similarity between video frame f_i and g_j;

the connection-weight matrix W of the edges of the constructed weighted graph model being computed as in equation (4):

W = A ⊙ W_v ⊙ W_q    (4).
3. more video summarization methods of the archetypal analysis technology based on Weight as claimed in claim 1, it is characterized in that, one
Comprised the following steps that in example:
1) video frame text feature corresponding with the visual signature and video of the network image based on inquiry is extracted:Video frame regards
Feel character representation is X={ f1,f2,f3,...,fn},fi∈Rm, the visual signature of network image is expressed as { g1,g2,...,gk},
gk∈Rm, gkRepresent the m dimensional feature vectors of kth network image, the Text Representation of video is { t1,t2,...,tl},ta∈
Rd, taRepresent the text feature of a-th of video;
2) Build a weighted complete graph: to model the dependency relations between video frames, take the video frames as vertices, construct a weighted simple graph G=(X, E, W), and solve the matrix W using formulas (1)-(4);
3) Use the weight matrix W obtained in step 2) as the weights of the archetypal analysis problem, and use the given formula to construct the input matrix;
4) Perform weighted archetypal analysis on the constructed input matrix, and alternately obtain the optimal decomposition matrices P and Q using an estimation algorithm, where P denotes the coefficient matrix with which the archetypes reconstruct the input, and Q denotes the coefficient matrix with which the input reconstructs the archetypes;
5) Calculate the importance score Si of each archetype according to the given formula;
6) Sort the archetypes in descending order of importance score, and select the archetypes whose importance scores exceed a certain threshold ε;
7) Starting from the archetype with the largest importance score, select the video frame corresponding to the index of the largest element in its corresponding row of Q, and judge the similarity between this frame and all previously selected frames; if the similarity exceeds the threshold, the frame is not included in the summary. If, after all archetypes have been iterated over, the required summary length has not yet been reached, the next round of selection is carried out, choosing the key frame given by the index corresponding to the second-largest value in each column of Q; this process is then iterated until the required summary length is met.
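Steps 6)-7) above can be sketched as a selection routine. This is a simplified reading under stated assumptions: rows of Q are taken to correspond to archetypes and columns to frames, the threshold filter of step 6) is assumed to have been applied beforehand, and `select_keyframes` together with all its inputs are illustrative names, not the patent's actual estimation algorithm:

```python
import numpy as np

def select_keyframes(Q, scores, sims, sim_thresh, max_len):
    """Pick key frames from Q by descending archetype importance.

    Q: (num_archetypes, num_frames) coefficient matrix (orientation assumed)
    scores: importance score S_i per archetype
    sims: (num_frames, num_frames) frame-to-frame similarity matrix
    sim_thresh: frames more similar than this to a chosen frame are skipped
    max_len: required summary length (number of key frames)
    """
    order = np.argsort(scores)[::-1]  # archetypes by descending importance
    chosen = []
    # Round 0 takes each archetype's largest entry, round 1 the second largest, ...
    for rank in range(Q.shape[1]):
        for a in order:
            frame = int(np.argsort(Q[a])[::-1][rank])
            if frame in chosen:
                continue
            if any(sims[frame, c] > sim_thresh for c in chosen):
                continue  # too similar to an already-selected frame
            chosen.append(frame)
            if len(chosen) == max_len:
                return chosen
    return chosen

# Toy example: archetype 1 (higher score) contributes frame 1 first, then archetype 0 frame 0.
Q = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1]])
scores = np.array([1.0, 2.0])
sims = np.eye(3)  # distinct frames are dissimilar
print(select_keyframes(Q, scores, sims, 0.8, 2))  # → [1, 0]
```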
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711249015.1A CN107943990B (en) | 2017-12-01 | 2017-12-01 | Multi-video abstraction method based on prototype analysis technology with weight |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107943990A true CN107943990A (en) | 2018-04-20 |
CN107943990B CN107943990B (en) | 2020-02-14 |
Family
ID=61948265
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711249015.1A Active CN107943990B (en) | 2017-12-01 | 2017-12-01 | Multi-video abstraction method based on prototype analysis technology with weight |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107943990B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106993240A (en) * | 2017-03-14 | 2017-07-28 | Tianjin University | Multi-video summarization method based on sparse coding |
CN107203636A (en) * | 2017-06-08 | 2017-09-26 | Tianjin University | Multi-video summarization method based on hypergraph master set clustering |
Non-Patent Citations (2)
Title |
---|
ERCAN CANHASI ET AL: "Weighted archetypal analysis of the multi-element graph for query-focused multi-document summarization", Expert Systems with Applications * |
ERCAN CANHASI ET AL: "Weighted hierarchical archetypal analysis for multi-document summarization", Computer Speech and Language * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110769279A (en) * | 2018-07-27 | 2020-02-07 | Beijing Jingdong Shangke Information Technology Co., Ltd. | Video processing method and apparatus |
US11445272B2 (en) | 2018-07-27 | 2022-09-13 | Beijing Jingdong Shangke Information Technology Co, Ltd. | Video processing method and apparatus |
CN109857906A (en) * | 2019-01-10 | 2019-06-07 | Tianjin University | Multi-video summarization method based on query-driven unsupervised deep learning |
CN109857906B (en) * | 2019-01-10 | 2023-04-07 | Tianjin University | Multi-video abstraction method based on query unsupervised deep learning |
CN110147469A (en) * | 2019-05-14 | 2019-08-20 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Data processing method, device and storage medium |
CN110147469B (en) * | 2019-05-14 | 2023-08-08 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Data processing method, device and storage medium |
CN110298270A (en) * | 2019-06-14 | 2019-10-01 | Tianjin University | Multi-video summarization method based on cross-modal importance perception |
CN111062284B (en) * | 2019-12-06 | 2023-09-29 | Zhejiang University of Technology | Visual understanding and diagnosis method for interactive video abstract model |
CN111339359A (en) * | 2020-02-18 | 2020-06-26 | Sun Yat-sen University | Sudoku-based automatic video thumbnail generation method |
Also Published As
Publication number | Publication date |
---|---|
CN107943990B (en) | 2020-02-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107943990A (en) | Multi-video summarization method based on weighted archetypal analysis technique | |
CN111931062B (en) | Training method and related device of information recommendation model | |
Al-Rousan et al. | Video-based signer-independent Arabic sign language recognition using hidden Markov models | |
CN105138991B (en) | Video emotion recognition method based on fusion of emotion-salient features | |
CN107203636B (en) | Multi-video abstract acquisition method based on hypergraph master set clustering | |
CN109902912B (en) | Personalized image aesthetic evaluation method based on character features | |
US20230353828A1 (en) | Model-based data processing method and apparatus | |
CN111949886B (en) | Sample data generation method and related device for information recommendation | |
EP3408836A1 (en) | Crowdshaping realistic 3d avatars with words | |
Xu et al. | Mining and application of tourism online review text based on natural language processing and text classification technology | |
CN113239159B (en) | Cross-modal retrieval method for video and text based on relational inference network | |
Zhang et al. | Retargeting semantically-rich photos | |
CN111954087B (en) | Method and device for intercepting images in video, storage medium and electronic equipment | |
CN106993240A (en) | Multi-video summarization method based on sparse coding | |
CN115293348A (en) | Pre-training method and device for multi-mode feature extraction network | |
Punyani et al. | Human age-estimation system based on double-level feature fusion of face and gait images | |
CN110351580A (en) | TV programme special recommendation method and system based on Non-negative Matrix Factorization | |
CN115203471A (en) | Attention mechanism-based multimode fusion video recommendation method | |
Lai et al. | Improving graph-based sentence ordering with iteratively predicted pairwise orderings | |
CN115033736A (en) | Video abstraction method guided by natural language | |
CN113343029B (en) | Complex video character retrieval method with enhanced social relationship | |
Guo et al. | Deep attentive factorization machine for app recommendation service | |
Lu et al. | Zero-shot video grounding with pseudo query lookup and verification | |
CN111897999A (en) | LDA-based deep learning model construction method for video recommendation | |
CN116701706B (en) | Data processing method, device, equipment and medium based on artificial intelligence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||