CN107943990A - Multi-video summarization method based on weighted archetypal analysis - Google Patents


Info

Publication number
CN107943990A
CN107943990A
Authority
CN
China
Prior art keywords
video
weight
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711249015.1A
Other languages
Chinese (zh)
Other versions
CN107943990B (en)
Inventor
冀中
江俊杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201711249015.1A priority Critical patent/CN107943990B/en
Publication of CN107943990A publication Critical patent/CN107943990A/en
Application granted granted Critical
Publication of CN107943990B publication Critical patent/CN107943990B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/738 Presentation of query results
    • G06F16/739 Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of video processing. It proposes a multi-video summarization method based on weighted archetypal analysis that suits the characteristics of multi-video data, so that, with the aid of effective prior information, the distinctive information of the data is fully exploited. The technical solution adopted by the present invention is: first, model the relations between video frames with a weighted graph model, thereby obtaining the weight matrix required by weighted archetypal analysis; then obtain key frames via weighted archetypal analysis and generate a video summary of the given length. The invention is mainly applied to video processing.

Description

Multi-video summarization method based on weighted archetypal analysis
Technical field
The present invention relates to the field of video processing and, specifically, to a multi-video summarization method based on weighted archetypal analysis.
Background technology
With the rapid development of information technology, video data has proliferated and become one of the main channels through which people obtain information. However, the sharp increase in the number of videos brings a great deal of redundant and repeated information, which makes it difficult for users to quickly obtain the information they need. There is therefore an urgent need for a technique that can integrate and analyze the massive video data under a single topic, so that people can browse the main information of videos quickly and accurately, improving their ability to acquire information. Multi-video summarization, one of the effective ways to address this problem, has attracted the attention of more and more researchers over the past few decades. It is a content-based video data compression technique that analyzes and integrates multiple videos on a related topic under the same event, extracts their main content, and presents the extracted content to the user according to a certain logical relation. At present, multi-video summaries are analyzed mainly along three dimensions: 1) coverage; 2) novelty; 3) importance. Coverage means that the extracted video content covers the main content of the multiple videos under the same topic. Novelty means that repeated and redundant information is removed from the multi-video summary. Importance means that important key shots in the video set are extracted according to certain prior information, so that the important content of the multiple videos is captured.
Although many single-video summarization methods have been proposed, research on multi-video summarization remains comparatively scarce and is still at an early stage, mainly for two reasons. First, the multiple videos under one event exhibit both topic diversity and topic overlap: topic diversity means that the videos emphasize different information and contain multiple sub-topics, while topic overlap means that their content partly intersects, with similar content coexisting alongside different content. Second, the audio, text, and visual information with which different videos present the same content may differ considerably. These factors make multi-video summarization difficult to tackle with traditional single-video methods.
Over the past few decades, several multi-video summarization methods tailored to the characteristics of multi-video data have been proposed. Among them, methods based on complex-graph clustering are relatively classical. Such methods extract keywords from the script information associated with the videos together with key frames of the videos, build a complex graph, and on this basis apply a graph clustering algorithm to produce the summary. However, these methods mainly target news video; for video sets without script information they lose their meaning. Moreover, because the content of multiple videos under one topic is both diverse and redundant, clustering alone can at best satisfy the maximal-coverage condition for video content: for multi-video summarization, clustering only on the visual information of the videos performs poorly, and although combining other modalities helps to some extent, its complexity is considerable.
Multi-video data carry information in several modalities, such as the text, visual, and audio information of the videos. Balanced AV-MMR (Balanced Audio Video Maximal Marginal Relevance) is a multi-video summarization technique that effectively exploits this multi-modal information: it analyzes the visual and audio streams of the videos together with the semantic information they contain, including speech, faces, and temporal features, all of which are significant for video summarization. Although the process uses the multi-modal information of the videos efficiently, the summaries it extracts do not reach the desired quality.
In recent years, some novel methods have been proposed. One relatively new approach realizes multi-video summarization by exploiting the visual co-occurrence of videos: it assumes that important visual concepts tend to appear repeatedly in the multiple videos under one topic, and accordingly proposes a Maximal Biclique Finding algorithm that extracts the sparse co-occurrence patterns of the videos to form the summary. However, this method is only applicable to specific data sets; for video sets with little repetition across videos, it loses its meaning.
In addition, in order to exploit more related information, researchers have proposed using sensors such as the GPS and compass of a mobile phone to obtain information like the geographical location during mobile video capture, and using it to help judge which information in a video is important when generating multi-video summaries. Others have taught the use of Web images as auxiliary prior information to better realize multi-video summarization. At present, owing to the complexity of multi-video data, research on multi-video summarization has not reached a satisfactory level. How to better exploit the information in multi-video data and thereby better realize multi-video summarization has therefore become a hot topic among researchers in the field. To this end, the present work realizes multi-video summarization with archetypal analysis.
Archetypal analysis (AA) regards each data point in a data set as a mixture of individual, observable prototypes, and restricts the prototypes themselves to sparse mixtures of the data points in the data set; the prototypes generally lie on the boundary of the data set. AA models have been widely applied in different fields, such as economics, astrophysics, and pattern recognition, and their usefulness for feature extraction and dimensionality reduction is exploited by machine learning algorithms in areas ranging from computer vision, neuroimaging, and chemistry to text mining and collaborative filtering.
Summary of the invention
To overcome the deficiencies of the prior art, the present invention proposes a multi-video summarization method based on weighted archetypal analysis that suits the characteristics described above, so that, with the aid of effective prior information, the distinctive information of the data is fully exploited. The technical solution adopted by the present invention is: first, model the relations between video frames with a weighted graph model, thereby obtaining the weight matrix required by weighted archetypal analysis; then obtain key frames via weighted archetypal analysis and generate a video summary of the given length.
The weight matrix required by weighted archetypal analysis is obtained in the following steps:
Build a weighted simple graph. Given l videos under the same event, preprocessing yields n candidate key frames, represented as feature vectors X = {f_1, f_2, f_3, ..., f_n}, f_i ∈ R^m, where f_i is the m-dimensional feature vector of the i-th candidate key frame. Taking the candidate key frames as vertices, construct a visual similarity graph G = (X, E, W), where X denotes the vertices, E the edges between video frames, and W the visual weights of the edges. To compute W, first compute the normalized cosine similarity A(f_i, f_j) between video frames, as in equation (1):

$$A(f_i,f_j)=\frac{\mathrm{sim}(f_i,f_j)}{\sum_{f_j\in X,\;j\neq i}\mathrm{sim}(f_i,f_j)} \qquad (1)$$

where sim(i, j) denotes the cosine similarity between the i-th and j-th video frames;
Build a weighted graph model: using the similarity between videos, add an extra weight to the edges connecting frames of different videos. To capture this relation, design the weight matrix W_v, computed as in equation (2):

$$W_v(i,j)=\begin{cases}1, & \text{if } v(f_i)=v(f_j)\\ 1+\mathrm{sim}\bigl(v(f_i),v(f_j)\bigr), & \text{if } v(f_i)\neq v(f_j)\end{cases} \qquad (2)$$

where v(f) denotes the video containing frame f, and sim(v(f_i), v(f_j)) denotes the similarity between the video containing f_i and the video containing f_j, i.e., the cosine similarity obtained from the videos' text information. The expression above adds a weight only to edges between frames of different videos; the weights of edges between frames within the same video remain unchanged;
Compute the similarity of each video frame to all network images, and take this similarity as the importance criterion of the frame, as in formula (3):

$$W_q(i)=\sum_{j=1}^{k}\mathrm{sim}(f_i,g_j) \qquad (3)$$

where g_j denotes the j-th network image and sim(f_i, g_j) the cosine similarity between video frame f_i and g_j;
The edge weight matrix W of the constructed weighted graph model is then computed as in equation (4):

$$W=A\odot W_v\odot W_q \qquad (4)$$

where ⊙ denotes the element-wise (Hadamard) product.
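As an illustrative sketch of equations (1)-(4) — not the patent's reference implementation — the weight matrix W can be assembled as below. Treating the per-frame query weight W_q as scaling row i of the Hadamard product is an interpretation on our part, since the text writes all three factors as one element-wise product; the function and variable names are our own.

```python
import numpy as np

def cosine_sim(u, v):
    # Cosine similarity between two feature vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def build_weight_matrix(F, video_of, T, G):
    """F: n x m frame features; video_of[i]: index of the video containing frame i;
    T: l x d video text features; G: k x m query (network) image features."""
    n = F.shape[0]
    S = np.array([[cosine_sim(F[i], F[j]) for j in range(n)] for i in range(n)])
    # Equation (1): cosine similarity, row-normalized over all other frames.
    A = np.zeros((n, n))
    for i in range(n):
        denom = S[i].sum() - S[i, i]
        for j in range(n):
            if j != i:
                A[i, j] = S[i, j] / denom
    # Equation (2): edges across different videos get an extra text-similarity weight.
    Wv = np.ones((n, n))
    for i in range(n):
        for j in range(n):
            if video_of[i] != video_of[j]:
                Wv[i, j] = 1.0 + cosine_sim(T[video_of[i]], T[video_of[j]])
    # Equation (3): summed similarity of each frame to all query images.
    Wq = np.array([sum(cosine_sim(F[i], g) for g in G) for i in range(n)])
    # Equation (4): element-wise product; Wq broadcasts over rows (one value per frame).
    return A * Wv * Wq[:, None]
```

With non-negative features (as with typical histogram-style visual descriptors), all entries of W are non-negative and the diagonal is zero, since equation (1) excludes self-similarity.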
In one example, the concrete steps are as follows:
1) Extract the visual features of the video frames and of the query-based network images, together with the corresponding text features of the videos: the visual features of the video frames are X = {f_1, f_2, f_3, ..., f_n}, f_i ∈ R^m; the visual features of the network images are {g_1, g_2, ..., g_k}, g_k ∈ R^m, where g_k is the m-dimensional feature vector of the k-th network image; the text features of the videos are {t_1, t_2, ..., t_l}, t_a ∈ R^d, where t_a is the text feature of the a-th video;
2) Build the weighted complete graph: to model the dependency relations between video frames, take the frames as vertices, build the weighted simple graph G = (X, E, W), and solve for the matrix W with formulas (1)-(4);
3) Use the weight matrix W obtained in step 2 as the weights of the archetypal analysis problem, and build the input matrix X̃ from W and X;
4) Perform weighted archetypal analysis on X̃, and obtain the optimal solution matrices P and Q with an alternating estimation algorithm, where P is the coefficient matrix with which the prototypes reconstruct the input and Q the coefficient matrix with which the input reconstructs the prototypes;
5) Compute the importance score S_i of each prototype;
6) Sort the prototypes in descending order and keep those whose importance score exceeds a threshold ε;
7) Starting from the prototype with the highest importance score, select the video frame whose row index corresponds to the largest element in the corresponding column of Q, and judge its similarity to all previously selected frames; if the similarity exceeds the threshold, do not include the frame in the summary. If the required summary length has not been reached after iterating over all the prototypes, perform another selection round, choosing the row index of the second-largest value in each column of Q; iterate this process until the required summary length is reached.
The features and beneficial effects of the present invention are:
The present invention targets the characteristics of existing multi-video summarization data sets and designs a multi-video summarization method based on weighted archetypal analysis suited to those characteristics, so that, with the aid of effective prior information, the distinctive information of the data is fully exploited. Its main advantages are:
(1) Novelty: the weighted archetypal analysis method is, for the first time, applied to query-oriented multi-video summarization, and the weighted graph model jointly incorporates the text information of the videos and the query-based network image information into the multi-video summarization model to capture the relations between video frames.
(2) Effectiveness: experiments confirm that, compared with the typical clustering methods and minimum sparse reconstruction methods applied to single-video summarization, the performance of the proposed multi-video summarization method based on weighted archetypal analysis is clearly superior to both, and it is therefore better suited to multi-video summarization problems.
(3) Practicality: the method is simple and feasible, and can be used in the field of multimedia signal processing.
Brief description of the drawings:
Fig. 1 is the flow chart of video key-shot extraction with the weighted archetypal analysis method provided by the invention.
Embodiment
Addressing characteristics of multimedia video data such as redundancy and abundant duplicate information, the present invention combines the visual information and text information of the videos with other topic-related prior information, and improves traditional multi-video summarization methods with the idea of archetypal analysis, achieving the goals of using topic-related video information efficiently and improving the efficiency with which users browse videos.
The method provided by the present invention is broadly divided into: 1) first, designing a weighted graph model to construct the relevance between video frames; 2) then, using weighted archetypal analysis to design a key-frame selection method suited to the characteristics of query-oriented multi-video summarization data sets.
Archetypal analysis (AA) regards each data point in a data set as a mixture of individual, observable prototypes, and restricts the prototypes themselves to sparse mixtures of the data points in the data set; the prototypes generally lie on the boundary of the data set.
Given the n × m matrix X = {f_1, f_2, ..., f_i, ..., f_n}, f_i ∈ R^m, and z ≪ n, the archetypal analysis problem factorizes X into two stochastic matrices P ∈ R^{n×z} and Q ∈ R^{n×z}, where P is the coefficient matrix with which the prototypes reconstruct the input and Q the coefficient matrix with which the input reconstructs the prototypes:

$$X \approx PA \quad \text{with} \quad A = Q^{\top}X \qquad (4)$$

The archetypal analysis algorithm first initializes the matrices P and Q and computes the prototype matrix A, then updates P and Q with equation (5) until the residual sum of squares (RSS) converges to a sufficiently small value or the maximum number of iterations is reached.
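The alternating scheme can be sketched as follows. This is a generic instantiation for illustration only — the simplex projections and backtracking line search are our assumptions, since the patent's own update rule (equation (5)) is not reproduced in the text.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection of a vector onto the probability simplex
    # (sort-based algorithm; entries become non-negative and sum to 1).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, v.size + 1) > css - 1)[0][-1]
    return np.maximum(v - (css[rho] - 1) / (rho + 1.0), 0.0)

def archetypal_analysis(X, z, iters=50):
    """X: n x m data matrix; z: number of archetypes.
    Returns P (n x z) and Q (n x z) with X ~ P @ (Q.T @ X);
    rows of P and columns of Q are kept on the simplex."""
    rng = np.random.default_rng(0)
    n = X.shape[0]
    P = np.apply_along_axis(project_simplex, 1, rng.random((n, z)))
    Q = np.apply_along_axis(project_simplex, 0, rng.random((n, z)))

    def loss(P_, Q_):
        # Residual sum of squares ||X - P A||^2 with A = Q^T X.
        return float(np.linalg.norm(X - P_ @ (Q_.T @ X)) ** 2)

    for _ in range(iters):
        # Projected gradient step in P; backtracking keeps the RSS non-increasing.
        A = Q.T @ X
        gP = -2.0 * (X - P @ A) @ A.T
        lr, cur = 1.0, loss(P, Q)
        for _ in range(40):
            cand = np.apply_along_axis(project_simplex, 1, P - lr * gP)
            if loss(cand, Q) <= cur:
                P = cand
                break
            lr *= 0.5
        # Projected gradient step in Q (column-wise simplex projection).
        R = X - P @ (Q.T @ X)
        gQ = -2.0 * X @ R.T @ P
        lr, cur = 1.0, loss(P, Q)
        for _ in range(40):
            cand = np.apply_along_axis(project_simplex, 0, Q - lr * gQ)
            if loss(P, cand) <= cur:
                Q = cand
                break
            lr *= 0.5
    return P, Q
```

Because a candidate step is accepted only when it does not increase the residual, the RSS is monotonically non-increasing, matching the convergence criterion described above.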
However, the archetypal analysis problem above regards all video frames as frames of equal weight: each data point (video frame) and its corresponding residual enter the minimization of equation (5) with identical weight when the prototypes are obtained. In multi-video summarization, video frames are not equivalent; their importance differs. The invention therefore obtains the key frames with weighted archetypal analysis.
The present invention first models the relations between video frames with a weighted graph model, thereby obtaining the weight matrix required by weighted archetypal analysis.
To model the relations between video frames, the present invention constructs a weighted simple graph. Given l videos under the same event, preprocessing yields n candidate key frames X = {f_1, f_2, f_3, ..., f_n}, f_i ∈ R^m. Taking the candidate key frames as vertices, the invention constructs the visual similarity graph G = (X, E, W), where X denotes the vertices, E the edges between video frames, and W the visual weights of the edges. To compute W, the invention first computes the normalized cosine similarity A(f_i, f_j) between video frames, as in equation (1):

$$A(f_i,f_j)=\frac{\mathrm{sim}(f_i,f_j)}{\sum_{f_j\in X,\;j\neq i}\mathrm{sim}(f_i,f_j)} \qquad (1)$$

where sim(i, j) denotes the cosine similarity between the i-th and j-th video frames.
Observation shows that distinguishing the similarity relations between frames within a video from the frame relations across videos helps improve the quality of multi-video summaries. To reflect the influence of inter-video relations on inter-frame similarity, a weighted graph model is built here: using the similarity between videos, the invention adds an extra weight to the edges connecting frames of different videos. To capture this relation, the invention designs the weight matrix W_v, computed as in equation (2):

$$W_v(i,j)=\begin{cases}1, & \text{if } v(f_i)=v(f_j)\\ 1+\mathrm{sim}\bigl(v(f_i),v(f_j)\bigr), & \text{if } v(f_i)\neq v(f_j)\end{cases} \qquad (2)$$

where v(f) denotes the video containing frame f, and sim(v(f_i), v(f_j)) denotes the similarity between the video containing f_i and the video containing f_j, i.e., the cosine similarity obtained from the videos' text information. The expression above adds a weight only to edges between frames of different videos; the weights of edges between frames within the same video remain unchanged.
Recently, the user-generated information retrievable from websites, such as images and videos, has grown rapidly, and a natural idea is to use such external information to assist summary generation. We regard the query images as prior information about the important content of the videos. Query images are uploaded by users after careful selection, as complements to the videos; they therefore present the main content of an event in a more semantic way and carry less redundant and noisy information than the videos themselves. All of this indicates that query images, used as prior information, help the generation of multi-video summaries. The invention therefore computes the similarity of each video frame to all network images and takes it as the frame's importance criterion, as in formula (3):

$$W_q(i)=\sum_{j=1}^{k}\mathrm{sim}(f_i,g_j) \qquad (3)$$

where g_j denotes the j-th network image, sim(f_i, g_j) the cosine similarity between video frame f_i and g_j, and W_q(i) the sum of the cosine similarities between f_i and all network images.
The edge weight matrix W of the constructed weighted graph model is therefore computed as in equation (4):

$$W=A\odot W_v\odot W_q \qquad (4)$$
Having obtained the weight matrix W, the invention acquires the key frames with weighted archetypal analysis. The weighted archetypal analysis problem can be viewed as a minimization problem, which can in turn be rewritten as an ordinary archetypal analysis problem on a weighted input matrix.
Thus, the multi-video summarization method based on weighted archetypal analysis mainly comprises three stages: preliminary data preparation; solving for the weight matrix required by archetypal analysis with the weighted graph model; and solving the weighted archetypal analysis itself.
Fig. 1, combined with Web image prior information, depicts the flow of extracting key shots from videos with the weighted archetypal analysis method. The main idea of the method is to soft-cluster the video frames into weighted prototypes (archetypes), rank the video frames according to these prototypes, and select the top-ranked frames as key frames, generating a video summary of the given length. The concrete steps of the invention are as follows:
1) Extract the visual features of the video frames and of the query-based network images, together with the corresponding text features of the videos: the visual features of the video frames are X = {f_1, f_2, f_3, ..., f_n}, f_i ∈ R^m; the visual features of the network images are {g_1, g_2, ..., g_k}, g_j ∈ R^m; the text features of the videos are {t_1, t_2, ..., t_l}, t_a ∈ R^d.
2) Build the weighted complete graph: to model the dependency relations between video frames, the invention takes the frames as vertices, builds the weighted simple graph G = (X, E, W), and solves for the matrix W with formulas (1)-(4).
3) Use the weight matrix W obtained in step 2 as the weights of the archetypal analysis problem, and build the input matrix X̃ from W and X.
4) Perform weighted archetypal analysis on X̃, and obtain the optimal solutions P and Q with an alternating estimation algorithm.
5) Compute the importance score S_i of each prototype.
6) Sort the prototypes in descending order and keep those whose importance score exceeds a threshold ε.
7) Starting from the prototype with the highest importance score, select the video frame whose row index corresponds to the largest element in the corresponding column of Q, and judge its similarity to all previously selected frames; if the similarity exceeds a certain threshold, do not include the frame in the summary. If the required summary length has not been reached after iterating over all the prototypes, perform another selection round, choosing the row index of the second-largest value in each column of Q; iterate this process until the required summary length is reached.
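Steps 6) and 7) above can be sketched as follows. The importance scores S_i are taken as an input, since the score formula of step 5 is not reproduced in the text; the frame-similarity function, thresholds, and all names here are illustrative placeholders rather than the patent's reference code.

```python
import numpy as np

def select_key_frames(Q, scores, frame_sim, eps, sim_thresh, summary_len):
    """Q: n x z input-to-prototype coefficient matrix; scores: importance score
    per prototype (computed as in step 5, not reproduced here); frame_sim(i, j):
    similarity between frames i and j; eps: prototype score threshold;
    sim_thresh: redundancy threshold; summary_len: number of frames wanted."""
    # Step 6: keep prototypes whose score exceeds eps, in descending score order.
    order = [p for p in np.argsort(scores)[::-1] if scores[p] > eps]
    selected = []
    rank = 0  # 0 -> largest element per column, 1 -> second largest, ...
    while len(selected) < summary_len and rank < Q.shape[0]:
        for p in order:
            if len(selected) >= summary_len:
                break
            # Step 7: the frame whose row holds the (rank+1)-th largest
            # value in the column of Q for prototype p.
            frame = int(np.argsort(Q[:, p])[::-1][rank])
            if frame in selected:
                continue
            # Redundancy removal: skip frames too similar to ones already chosen.
            if any(frame_sim(frame, s) > sim_thresh for s in selected):
                continue
            selected.append(frame)
        rank += 1  # next round uses the second-largest values, and so on
    return selected
```

The outer `while` realizes the multi-round selection of step 7: if a pass over all retained prototypes does not fill the summary, the next pass moves to the second-largest entry of each column of Q.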

Claims (3)

1. A multi-video summarization method based on weighted archetypal analysis, characterized in that the relations between video frames are first modeled with a weighted graph model, thereby obtaining the weight matrix required by weighted archetypal analysis; key frames are then obtained with weighted archetypal analysis, and a video summary of the given length is generated.
2. The multi-video summarization method based on weighted archetypal analysis of claim 1, characterized in that the weight matrix required by weighted archetypal analysis is obtained in the following steps:
build a weighted simple graph: given l videos under the same event, preprocessing yields n candidate key frames, represented as feature vectors X = {f_1, f_2, f_3, ..., f_n}, f_i ∈ R^m, where f_i is the m-dimensional feature vector of the i-th candidate key frame; taking the candidate key frames as vertices, construct the visual similarity graph G = (X, E, W), where X denotes the vertices, E the edges between video frames, and W the visual weights of the edges; to compute W, first compute the normalized cosine similarity A(f_i, f_j) between video frames, as in equation (1):
$$A(f_i,f_j)=\frac{\mathrm{sim}(f_i,f_j)}{\sum_{f_j\in X,\;j\neq i}\mathrm{sim}(f_i,f_j)} \qquad (1)$$
where sim(i, j) denotes the cosine similarity between the i-th and j-th video frames;
build a weighted graph model: using the similarity between videos, add an extra weight to the edges connecting frames of different videos; to capture this relation, design the weight matrix W_v, computed as in equation (2):
$$W_v(i,j)=\begin{cases}1, & \text{if } v(f_i)=v(f_j)\\ 1+\mathrm{sim}\bigl(v(f_i),v(f_j)\bigr), & \text{if } v(f_i)\neq v(f_j)\end{cases} \qquad (2)$$
where v(f) denotes the video containing frame f, and sim(v(f_i), v(f_j)) denotes the similarity between the video containing f_i and the video containing f_j, i.e., the cosine similarity obtained from the videos' text information; the expression above adds a weight only to edges between frames of different videos, while the weights of edges between frames within the same video remain unchanged;
compute the similarity of each video frame to all network images and take it as the frame's importance criterion, as in formula (3):
$$W_q=\sum_{j=1}^{k}\mathrm{sim}(f_i,g_j) \qquad (3)$$
where g_j denotes the j-th network image and sim(f_i, g_j) denotes the cosine similarity between video frame f_i and g_j;
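A minimal sketch of formula (3) in numpy (names are illustrative, not the patent's):

```python
import numpy as np

def query_weight_scores(X, G):
    """Sketch of formula (3): for each frame f_i, sum the cosine
    similarities to all k query-based network images g_j.
    X: (n, m) frame visual features; G: (k, m) network-image features."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Gn = G / np.linalg.norm(G, axis=1, keepdims=True)
    return (Xn @ Gn.T).sum(axis=1)  # length-n vector of W_q scores
```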
The connection weight matrix W over the edges of the constructed weighted graph model is computed as in equation (4):
$$
W = A \odot W_v \odot W_q
\qquad (4)
$$
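Equation (4) combines the three matrices by elementwise (Hadamard) product. Since formula (3) yields one score per frame, it has to be lifted to a matrix before the product; the text does not spell out how, so the symmetric square-root outer-product lift below is purely our assumption:

```python
import numpy as np

def edge_weight_matrix(A, Wv, wq):
    """Sketch of formula (4): W = A ⊙ W_v ⊙ W_q, with A the adjacency
    matrix of the complete graph over frames and Wv from formula (2).
    Lifting the per-frame score vector wq to a symmetric matrix via a
    square-root outer product is an assumption, not the patent's choice."""
    Wq = np.sqrt(np.outer(wq, wq))
    return A * Wv * Wq  # '*' on numpy arrays is the elementwise product
```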
3. The multi-video summarization method based on weighted archetypal analysis of claim 1, characterized in that, in one embodiment, the specific steps are as follows:
1) Extract the visual features of the video frames, the visual features of the query-based network images, and the corresponding text features of the videos: the visual features of the video frames are represented as X = {f_1, f_2, f_3, ..., f_n}, f_i ∈ R^m; the visual features of the network images are represented as {g_1, g_2, ..., g_k}, g_k ∈ R^m, where g_k denotes the m-dimensional feature vector of the k-th network image; the text features of the videos are represented as {t_1, t_2, ..., t_l}, t_a ∈ R^d, where t_a denotes the text feature of the a-th video;
2) Construct the weighted complete graph: to model the correlations between video frames, take the video frames as vertices and build a weighted simple graph G = (X, E, W), solving for the matrix W with formulas (1)-(4);
3) Take the weight matrix W obtained in step 2) as the weights of the archetypal analysis problem, and construct the weighted input matrix from it according to the given formula;
4) Perform weighted archetypal analysis on the constructed input, using an alternating estimation algorithm to obtain the optimal decomposition matrices P and Q, where P denotes the coefficient matrix with which the archetypes (prototypes) reconstruct the input and Q denotes the coefficient matrix with which the input reconstructs the archetypes;
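The alternating estimation of step 4) is not given explicitly in the text; the projected-gradient routine below is a generic archetypal-analysis sketch under our own assumptions (the weighting is taken to be already folded into the input, as in step 3). It fits X ≈ (XQ)P with the columns of Q (n×p) and P (p×n) constrained to the probability simplex:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def weighted_archetypal_analysis(X, p, n_iter=200, lr=1e-2, seed=0):
    """Generic sketch (not the patent's exact algorithm): alternately take
    projected gradient steps on ||X Q P - X||_F^2 for P (archetypes ->
    input) and Q (input -> archetypes)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    Q = np.apply_along_axis(project_simplex, 0, rng.random((n, p)))
    P = np.apply_along_axis(project_simplex, 0, rng.random((p, n)))
    for _ in range(n_iter):
        Z = X @ Q                # archetypes: convex combinations of frames
        P = np.apply_along_axis(project_simplex, 0, P - lr * Z.T @ (Z @ P - X))
        R = X @ Q @ P - X        # residual with the updated P
        Q = np.apply_along_axis(project_simplex, 0, Q - lr * X.T @ R @ P.T)
    return P, Q
```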
5) Compute the importance score S_i of each archetype according to the given formula;
6) Sort the archetypes in descending order of importance score, and keep those whose score exceeds a certain threshold ε;
7) Starting from the archetype with the highest importance score, select in its corresponding column of Q the row index of the largest element and take the video frame with that index; judge the similarity between this frame and all previously selected frames, and if the similarity exceeds a threshold, do not include the frame in the summary. If, after iterating over all archetypes, the required summary length has not been reached, carry out the next round of selection, choosing key frames via the row index of the second-largest value in each column of Q; this process is iterated until the required summary length is met.
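Steps 6) and 7) can be sketched as the following greedy loop. All parameter names (budget, thresholds) are illustrative, and we read the row/column wording as: each archetype owns a column of Q whose largest entries index candidate frames:

```python
import numpy as np

def select_keyframes(Q, scores, X, budget, sim_thresh=0.9, eps=0.1):
    """Greedy sketch of steps 6)-7): visit archetypes in descending score
    order (only those scoring above eps); in round r take the frame whose
    entry is the r-th largest in the archetype's column of Q, skipping
    frames too similar (cosine) to ones already chosen, until the summary
    budget is met."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    order = [a for a in np.argsort(scores)[::-1] if scores[a] > eps]
    chosen = []
    for rank in range(Q.shape[0]):   # round 0: largest entry, round 1: second largest, ...
        for a in order:
            if len(chosen) >= budget:
                return chosen
            cand = int(np.argsort(Q[:, a])[::-1][rank])
            if cand not in chosen and all(Xn[cand] @ Xn[c] <= sim_thresh for c in chosen):
                chosen.append(cand)
    return chosen
```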
CN201711249015.1A 2017-12-01 2017-12-01 Multi-video abstraction method based on prototype analysis technology with weight Active CN107943990B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711249015.1A CN107943990B (en) 2017-12-01 2017-12-01 Multi-video abstraction method based on prototype analysis technology with weight

Publications (2)

Publication Number Publication Date
CN107943990A true CN107943990A (en) 2018-04-20
CN107943990B CN107943990B (en) 2020-02-14

Family

ID=61948265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711249015.1A Active CN107943990B (en) 2017-12-01 2017-12-01 Multi-video abstraction method based on prototype analysis technology with weight

Country Status (1)

Country Link
CN (1) CN107943990B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106993240A (en) * 2017-03-14 2017-07-28 Tianjin University Multi-video summarization method based on sparse coding
CN107203636A (en) * 2017-06-08 2017-09-26 Tianjin University Multi-video summarization method based on hypergraph dominant set clustering

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ERCAN CANHASI ET AL: "Weighted archetypal analysis of the multi-element graph for query-focused multi-document summarization", Expert Systems with Applications *
ERCAN CANHASI ET AL: "Weighted hierarchical archetypal analysis for multi-document summarization", Computer Speech and Language *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110769279A (en) * 2018-07-27 2020-02-07 Beijing Jingdong Shangke Information Technology Co., Ltd. Video processing method and device
US11445272B2 (en) 2018-07-27 2022-09-13 Beijing Jingdong Shangke Information Technology Co, Ltd. Video processing method and apparatus
CN109857906A (en) * 2019-01-10 2019-06-07 Tianjin University Query-based multi-video summarization method using unsupervised deep learning
CN109857906B (en) * 2019-01-10 2023-04-07 Tianjin University Multi-video abstraction method based on query unsupervised deep learning
CN110147469A (en) * 2019-05-14 2019-08-20 Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. Data processing method, device and storage medium
CN110147469B (en) * 2019-05-14 2023-08-08 Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. Data processing method, device and storage medium
CN110298270A (en) * 2019-06-14 2019-10-01 Tianjin University Multi-video summarization method based on cross-modal importance perception
CN111062284B (en) * 2019-12-06 2023-09-29 Zhejiang University of Technology Visual understanding and diagnosis method for interactive video abstract model
CN111339359A (en) * 2020-02-18 2020-06-26 Sun Yat-sen University Automatic video thumbnail generation method based on nine-square grid

Also Published As

Publication number Publication date
CN107943990B (en) 2020-02-14

Similar Documents

Publication Publication Date Title
CN107943990A (en) More video summarization methods of archetypal analysis technology based on Weight
CN111931062B (en) Training method and related device of information recommendation model
Al-Rousan et al. Video-based signer-independent Arabic sign language recognition using hidden Markov models
CN105138991B (en) A kind of video feeling recognition methods merged based on emotion significant characteristics
CN107203636B (en) Multi-video abstract acquisition method based on hypergraph master set clustering
CN109902912B (en) Personalized image aesthetic evaluation method based on character features
US20230353828A1 (en) Model-based data processing method and apparatus
CN111949886B (en) Sample data generation method and related device for information recommendation
EP3408836A1 (en) Crowdshaping realistic 3d avatars with words
Xu et al. Mining and application of tourism online review text based on natural language processing and text classification technology
CN113239159B (en) Cross-modal retrieval method for video and text based on relational inference network
Zhang et al. Retargeting semantically-rich photos
CN111954087B (en) Method and device for intercepting images in video, storage medium and electronic equipment
CN106993240A (en) Multi-video summarization method based on sparse coding
CN115293348A (en) Pre-training method and device for multi-mode feature extraction network
Punyani et al. Human age-estimation system based on double-level feature fusion of face and gait images
CN110351580A (en) Special TV program recommendation method and system based on non-negative matrix factorization
CN115203471A (en) Attention mechanism-based multimode fusion video recommendation method
Lai et al. Improving graph-based sentence ordering with iteratively predicted pairwise orderings
CN115033736A (en) Video abstraction method guided by natural language
CN113343029B (en) Complex video character retrieval method with enhanced social relationship
Guo et al. Deep attentive factorization machine for app recommendation service
Lu et al. Zero-shot video grounding with pseudo query lookup and verification
CN111897999A (en) LDA-based deep learning model construction method for video recommendation
CN116701706B (en) Data processing method, device, equipment and medium based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant