CN102902756A - Video abstraction extraction method based on story plots - Google Patents

Video abstraction extraction method based on story plots

Info

Publication number
CN102902756A
CN102902756A, CN2012103581835A, CN201210358183A
Authority
CN
China
Prior art keywords
scene
shot
video
intensity
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012103581835A
Other languages
Chinese (zh)
Other versions
CN102902756B (en)
Inventor
朱松豪
范莉莉
邹黎明
梁志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University filed Critical Nanjing Post and Telecommunication University
Priority to CN201210358183.5A priority Critical patent/CN102902756B/en
Publication of CN102902756A publication Critical patent/CN102902756A/en
Application granted granted Critical
Publication of CN102902756B publication Critical patent/CN102902756B/en
Expired - Fee Related
Anticipated expiration

Landscapes

  • Studio Devices (AREA)

Abstract

The invention discloses a video summary extraction method based on story plots. The method comprises the following steps: performing key frame, shot, and scene detection on the original video; detecting highlight scenes among the scenes according to the video story plot; and selecting summary segments from the highlight scenes according to actual requirements, splicing them in temporal order, and generating the summary of the original video. The method further screens the highlight scenes according to the progress intensity between scenes, rejects candidate summary segments that are too short to convey useful information, and adjusts the candidate summary segments according to the completeness of spoken sentences. Because the method selects suitable summary segments according to the development of the story plot, the generated summary conforms to human logical thinking and helps guarantee the integrity of the movie content.

Description

Video summary extraction method based on story plots
Technical field
The present invention relates to a video summary extraction method, and in particular to a video summary extraction method based on story plots, belonging to the technical field of image processing.
Background technology
As more and more movie data appears on networks, personal computers, and digital devices, the demand for effective and practical methods to organize and manage these massive data grows ever stronger. Among such methods, movie summarization not only provides a concise description of how the plot of the original movie develops, but also helps viewers grasp the movie's theme before watching the whole film. The purpose of movie summarization is therefore to select suitable segments, according to the development of the plot, to constitute a movie summary. However, how to reasonably select movie segments and effectively integrate them into a summary remains a problem requiring further research.
A search of the prior art literature shows the following. Ma et al. (Y. Ma, X. Hua, L. Lu, and H. Zhang. A generic framework of user attention model and its application in video summarization. IEEE Transactions on Multimedia, 7(5):907-919, 2005) proposed movie summarization based on a user attention model; Li et al. (K. Li, L. Guo, C. Faraco, et al. Human-centered attention models for video summarization. In Proceedings of IEEE International Conference on Multimodal Interfaces, 2010:27-30) proposed a human-centered attention model for movie summarization; Lu et al. (S. Lu, I. King, and M. Lyu. Video summarization by video structure analysis and graph optimization. In Proceedings of IEEE International Conference on Multimedia and Expo, 2004:1959-1962) realized movie summarization by video structure analysis and graph optimization. These movie summarization methods mainly focus on generating summaries by extracting low-level or mid-level audiovisual features. However, from the viewpoint of human understanding, because of the gap between low-level audiovisual features and high-level semantics, low-level audiovisual features cannot describe the progress of the movie plot well. Film-making theory tells us that the essence of any movie is to tell a story; therefore, an ideal movie summary should clearly describe the progress of the original movie's plot. From the audience's point of view, what attracts a viewer to a movie is the desire to know how the story develops next. That is, the story plot structures the highlight content of a movie and provides a meaningful description of it.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the deficiencies of existing video summarization methods and to provide a video summary extraction method based on story plots, which selects suitable summary segments according to the development of the plot, thereby both conforming to human logical thinking and helping to guarantee the integrity of the movie content.
The video summary extraction method based on story plots of the present invention comprises the following steps:
Step A: performing key frame, shot, and scene detection on the original video;
Step B: detecting highlight scenes among the scenes according to the video story plot;
Step C: selecting summary segments from the highlight scenes according to actual requirements, splicing them in temporal order, and generating the summary of the original video.
The detection of highlight scenes comprises:
Dialogue scene detection: first, scenes containing alternately appearing face shots are detected according to face information as candidate dialogue scenes; then, the scenes containing speech are selected from the candidate dialogue scenes as dialogue scenes;
Action scene detection: a scene is regarded as an action scene when it satisfies the following three conditions simultaneously: each shot in the scene contains fewer than 25 frames, the average activity intensity of each shot exceeds 200, and the average audio energy of each shot exceeds 100;
Suspense scene detection: a scene is regarded as a suspense scene when it satisfies the following three conditions simultaneously: the average illumination intensity of the scene is less than 50; the audio energy envelope of the first several shots of the scene does not exceed 5, while the audio energy envelope change between some two adjacent shots exceeds 50; the activity intensity of the first several shots of the scene does not exceed 5, while the activity intensity change between some two adjacent shots exceeds 100.
Further, the dialogue scene detection also comprises the detection of emotional dialogue scenes: the average fundamental frequency and the short-term intensity change of each dialogue scene are extracted respectively, and the dialogue scenes for which both exceed predetermined thresholds are selected as emotional dialogue scenes.
Further, the action scene detection also comprises:
Gunfight scene detection: selecting the action scenes whose orange, yellow, and red color features all exceed predetermined thresholds as gunfight scenes;
Fight scene detection: selecting the action scenes containing shouting audio features as fight scenes;
Chase scene detection: selecting the action scenes containing screeching and horn audio features as chase scenes.
Preferably, step C specifically comprises the following substeps:
Step C1: calculating the progress intensity between any two highlight scenes according to the following formula:
$$\mathrm{PIF}(AS_u, AS_v) = \alpha\, TT_n(AS_u, AS_v) + \beta\, ST_n(AS_u, AS_v) + \gamma\, RT_n(AS_u, AS_v)$$
where $\mathrm{PIF}(AS_u, AS_v)$ denotes the progress intensity between two different scenes $AS_u$ and $AS_v$; $TT_n(AS_u, AS_v)$, $ST_n(AS_u, AS_v)$ and $RT_n(AS_u, AS_v)$ are the normalized forms of the temporal transition intensity $TT(AS_u, AS_v)$, the spatial transition intensity $ST(AS_u, AS_v)$ and the rhythm transition intensity $RT(AS_u, AS_v)$ between $AS_u$ and $AS_v$, respectively; and $\alpha$, $\beta$, $\gamma$ are weight coefficients satisfying $\alpha+\beta+\gamma=1$. Wherein:
The temporal transition intensity $TT(AS_u, AS_v)$ is computed as
$$TT(AS_u, AS_v) = \Big| \sum_{p=1}^{P} N(AS_u, Sh_l, Kf_p) - \sum_{q=1}^{Q} N(AS_v, Sh_w, Kf_q) \Big|$$
where $N(AS_u, Sh_l, Kf_p)$ is the number of faces appearing in key frame $p$ of the last shot $l$ in scene $AS_u$, $N(AS_v, Sh_w, Kf_q)$ is the number of faces appearing in key frame $q$ of the first shot $w$ in scene $AS_v$, and $P$ and $Q$ are the numbers of key frames in shots $l$ and $w$, respectively;
The spatial transition intensity $ST(AS_u, AS_v)$ is computed as
$$ST(AS_u, AS_v) = \Big| \frac{1}{P}\sum_{p=1}^{P} RA(p) - \frac{1}{Q}\sum_{q=1}^{Q} RA(q) \Big| + \Big| \frac{1}{P}\sum_{p=1}^{P} GA(p) - \frac{1}{Q}\sum_{q=1}^{Q} GA(q) \Big| + \Big| \frac{1}{P}\sum_{p=1}^{P} BA(p) - \frac{1}{Q}\sum_{q=1}^{Q} BA(q) \Big| + \Big| \frac{1}{P}\sum_{p=1}^{P} LA(p) - \frac{1}{Q}\sum_{q=1}^{Q} LA(q) \Big|$$
where $RA(p)$, $GA(p)$, $BA(p)$ and $LA(p)$ denote the mean red, green, blue and luminance values of the background region of key frame $p$ in the last shot $l$ of scene $AS_u$; $RA(q)$, $GA(q)$, $BA(q)$ and $LA(q)$ denote the corresponding mean values of the background region of key frame $q$ in the first shot $w$ of scene $AS_v$; and $P$ and $Q$ are the numbers of key frames in shots $l$ and $w$, respectively;
The rhythm transition intensity $RT(AS_u, AS_v)$ is computed as
$$RT(AS_u, AS_v) = \Big| \frac{1}{M}\sum_{m=1}^{M} \mathrm{Len}(Sh_m) - \frac{1}{N}\sum_{n=1}^{N} \mathrm{Len}(Sh_n) \Big|$$
where $\mathrm{Len}(Sh_m)$ is the number of frames contained in the $m$-th shot $Sh_m$ of scene $AS_u$, $\mathrm{Len}(Sh_n)$ is the number of frames contained in the $n$-th shot $Sh_n$ of scene $AS_v$, and $M$ and $N$ are the numbers of shots in scenes $AS_u$ and $AS_v$, respectively.
Step C2: sorting the progress intensities in descending order, and selecting all the highlight scenes corresponding to the K largest progress intensities as candidate summary segments; the value of K is less than or equal to the total number of highlight scenes detected in step B;
Step C3: selecting the final summary segments from the candidate summary segments, splicing them in temporal order, and generating the summary of the original video.
In selecting the final summary segments from the candidate summary segments, the candidate summary segments may be used directly as the final summary segments, or a random selection may be made according to the required summary length. So that the video segments in the finally generated summary can be presented to the audience more fluently, the present invention further rejects candidate summary segments whose duration is too short to convey useful information, and adjusts the candidate summary segments according to sentence completeness, specifically as follows:
First, highlight scenes shorter than 1 second are rejected from the candidate summary segments; then, complete-sentence detection is performed on each remaining candidate summary segment, and the candidate summary segments are adjusted accordingly: if the boundary of a complete sentence exceeds the boundary of a candidate summary segment, the boundary of that candidate summary segment is extended to the boundary of the complete sentence. The adjusted candidate summary segments are the final summary segments.
Compared with the prior art, the present invention has the following beneficial effects:
The present invention selects suitable summary segments to generate the video summary according to the development of the story plot, which both conforms to human logical thinking and helps guarantee the integrity of the movie content. In addition, compared with low-level or mid-level audiovisual features, plot development characteristics express high-level semantic meaning; therefore, a summary generated by this method can be considered closer to a semantic description of the video content.
Description of drawings
Fig. 1 shows the progress intensities between the highlight scenes in the example;
Fig. 2 shows the summary segment selection when the given summary length is 7 seconds;
Fig. 3 shows the summary segment selection when the given summary length is 10 seconds.
Embodiment
The technical solution of the present invention is described in detail below with reference to the accompanying drawings.
The object of the present invention is to provide a video summary extraction method based on story plots. Its realization approach is as follows: first, the movie structure is analyzed using temporal correlation, including effective segmentation into shots and scenes; then, the content of scenes of interest is analyzed and audiovisual expressive features are extracted to realize plot analysis; finally, according to the transition intensity between scenes, a movie summary that conforms to human viewing habits is generated.
A preferred embodiment of the video summary extraction method based on story plots of the present invention specifically comprises the following steps:
Step A: performing key frame, shot, and scene detection on the original video.
1. Shot segmentation
The shot is the elementary unit of video data; therefore, dividing video data into meaningful shots is the first step of summary extraction. From the image-processing point of view, shot segmentation is the process of clustering the picture frames taken at the same place into the same class. Various existing shot detection methods can be adopted. For example, Zhuang et al. proposed an unsupervised method for shot boundary detection (Y. Zhuang, Y. Rui, T. Huang, and S. Mehrotra. Adaptive key frame extraction using unsupervised clustering. In Proceedings of IEEE International Conference on Image Processing, 1998:866-870); Boreczky et al. adopted hidden Markov models to solve the shot boundary problem (J. Boreczky, L. Wilcox. A hidden Markov model framework for video segmentation using audio and image features. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 1998:3741-3744); Lienhart used neural networks for shot boundary detection (R. Lienhart. Reliable dissolve detection. In Proceedings of IEEE International Conference on Storage and Retrieval for Media Databases, 2001:219-230). To make the shot detection more accurate, this embodiment realizes shot segmentation as follows. First, candidate shot boundaries are detected: the difference of content information between frames is used to determine the initial boundaries of candidate shots, and on this basis the exact boundaries of the candidate shots are determined according to the difference of content information in the neighborhood of the initial shot boundaries. Second, the transition type (gradual, abrupt, etc.) of the real candidate shots is determined according to the two-dimensional entropy characteristics of the picture frames, while invalid candidate shots generated by rapid object motion, camera shake, flashlights, and similar situations are removed.
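By way of illustration only, the following Python sketch implements a simplified stand-in for the candidate-boundary idea using OpenCV histogram differences and an adaptive threshold; it is not the exact two-stage procedure described above, and the tuning parameter `k` is an assumption.

```python
# Illustrative sketch of candidate shot-boundary detection via frame
# histogram differences (a simplification, not the patent's exact method).
import cv2
import numpy as np

def detect_shot_boundaries(video_path, k=3.0):
    """Return frame indices where a shot cut likely occurs.

    A cut is declared where the HSV-histogram difference between
    consecutive frames exceeds mean + k * std of all differences
    (k is an assumed tuning parameter).
    """
    cap = cv2.VideoCapture(video_path)
    prev_hist, diffs = None, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        hist = cv2.normalize(hist, None).flatten()
        if prev_hist is not None:
            diffs.append(float(np.abs(hist - prev_hist).sum()))
        prev_hist = hist
    cap.release()
    if not diffs:
        return []
    diffs = np.asarray(diffs)
    threshold = diffs.mean() + k * diffs.std()   # adaptive threshold
    return [i + 1 for i, d in enumerate(diffs) if d > threshold]
```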
2. Scene detection
Compared with content analysis at the shot level, the content information at the scene level is more meaningful and more complete. This is because, from the image-processing point of view, scene detection can be defined as the process of clustering shots with spatio-temporal correlation into the same scene. Various existing scene detection methods can be used in the present invention. For example, Yeung et al. proposed detecting scene boundaries using scene transition graphs (M. Yeung, B. Yeo, and B. Liu. Segmentation of video by clustering and graph analysis. Computer Vision and Image Understanding, 1998, 71(1):94-109); Tavanapong et al. detected scene boundaries in combination with film-making theory (W. Tavanapong and J. Zhou. Shot clustering techniques for story browsing. IEEE Transactions on Multimedia, 2004, 6(4):517-526); Zhai et al. adopted Markov chain Monte Carlo methods to solve the scene boundary detection problem (Y. Zhai and M. Shah. A general framework for temporal video scene segmentation. In Proceedings of IEEE International Conference on Computer Vision, 2005:1111-1116). This embodiment preferably realizes scene detection as follows. First, under a time-window constraint mechanism (to guarantee that shots from the same scene are not divided into different scenes, and to prevent shots from different scenes from being mistakenly grouped into the same scene), the spatial correlation of the semantic content information between shots is determined. Then, on the basis of the established spatio-temporal correlation, the scene boundaries are accurately established according to the differences of semantic content information between shots. By introducing the time-constraint mechanism into scene detection, the method avoids under-segmentation or over-segmentation of scenes, thereby obtaining accurate scene segments.
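A rough sketch of time-window constrained grouping follows, assuming per-shot feature vectors from any content descriptor; the similarity measure, window size, and threshold are all assumptions rather than the patent's exact semantic-content measure.

```python
import numpy as np

def group_shots_into_scenes(shot_features, window=5, sim_threshold=0.8):
    """Greedy time-window constrained grouping of shots into scenes.

    shot_features: list of 1-D feature vectors, one per shot, in time order.
    A new shot joins the current scene if it is similar enough to any of
    the last `window` shots of that scene; otherwise a new scene starts.
    """
    def cosine(a, b):
        return float(np.dot(a, b) /
                     (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    scenes = [[0]]
    for i in range(1, len(shot_features)):
        recent = scenes[-1][-window:]              # time-window constraint
        if any(cosine(shot_features[i], shot_features[j]) > sim_threshold
               for j in recent):
            scenes[-1].append(i)                   # same scene
        else:
            scenes.append([i])                     # scene boundary
    return scenes
```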
Step B: detecting highlight scenes among the scenes according to the video story plot.
The story plot is of great importance to the effective management of movies and the semantic understanding of movie content, and the highlight plots (scenes) within it are the core of the whole movie content; building the video summary from highlight scenes therefore better embodies the core content of the video. The present invention detects the three most representative kinds of highlight scene, namely dialogue scenes, action scenes, and suspense scenes, by analyzing the audiovisual features of the video, as follows:
1. Detection of dialogue scenes
Dialogue scenes in a movie often convey important information and help the viewer understand the development of the plot. The present invention first uses a face detection method to detect scenes containing alternately appearing face shots as candidate dialogue scenes; then, an audio analysis method (for example, a hidden Markov model) is used to distinguish speech from other audio, and the scenes containing speech are selected from the candidate dialogue scenes as dialogue scenes.
Among dialogue scenes, emotional dialogue scenes attract the viewer's attention more easily and have an important impact on the development of the whole plot. It is therefore necessary to detect emotional dialogue scenes among the general dialogue scenes. The present invention adopts two typical audio features, the average fundamental frequency and the short-term intensity change, to identify emotional dialogue scenes: the average fundamental frequency and the short-term intensity change of each dialogue scene are extracted respectively, and the dialogue scenes for which both exceed predetermined thresholds are selected as emotional dialogue scenes.
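A minimal sketch of how these two audio cues could be computed, assuming librosa for pitch tracking and RMS energy; the thresholds are placeholders, not values from the patent.

```python
import librosa
import numpy as np

def is_emotional_dialogue(wav_path, f0_threshold=200.0,
                          intensity_var_threshold=0.02):
    """Flag a dialogue scene as 'emotional' when both its average
    fundamental frequency and its short-term intensity variation exceed
    the given (assumed) thresholds."""
    y, sr = librosa.load(wav_path, sr=None)
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"),
        fmax=librosa.note_to_hz("C7"), sr=sr)
    avg_f0 = np.nanmean(f0)                  # average fundamental frequency
    rms = librosa.feature.rms(y=y)[0]        # short-term intensity
    intensity_var = float(np.std(rms))       # short-term intensity change
    return avg_f0 > f0_threshold and intensity_var > intensity_var_threshold
```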
2. Detection of action scenes
In action/war/adventure videos, many action scenes often appear, such as gunfight scenes, fight scenes, and chase scenes.
A scene is regarded as an action scene when it satisfies the following three conditions simultaneously: each shot in the scene contains fewer than 25 frames, the average activity intensity of each shot exceeds 200, and the average audio energy of each shot exceeds 100. On this basis, action scenes can be further divided into the following three more familiar kinds of scene (a sketch of the three-condition rule follows the subdivision below).
(1) Gunfight scenes. From everyday experience and film-making theory, pictures of gunfire, explosions, and bleeding often appear in gunfight scenes. Careful analysis of the color histograms shows that the most significant colors of these three kinds of pictures are orange, yellow, and red, respectively. Therefore, gunfight scenes are identified by color pre-processing: the action scenes whose orange, yellow, and red color features all exceed predetermined thresholds are selected as gunfight scenes.
(2) Fight scenes and chase scenes. Careful examination of the audio information of these two kinds of scenes shows that each has its own distinctive audio information: fight scenes usually contain shouting, while chase scenes often contain screeching and horn sounds. Therefore, an audio analysis method (for example, a hidden Markov model) can be adopted to distinguish these distinctive kinds of audio information and thereby distinguish fight scenes from chase scenes: the action scenes containing shouting audio features are selected as fight scenes, and the action scenes containing screeching and horn audio features are selected as chase scenes.
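The three-condition action rule referenced above is a direct threshold check; a sketch, assuming per-shot statistics (frame count, activity intensity, audio energy) have already been computed upstream:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ShotStats:
    n_frames: int          # frames in the shot
    activity: float        # average activity (motion) intensity
    audio_energy: float    # average audio energy

def is_action_scene(shots: List[ShotStats]) -> bool:
    """Apply the three thresholds to every shot of a scene:
    fewer than 25 frames per shot, activity > 200, audio energy > 100."""
    return all(s.n_frames < 25 and s.activity > 200 and s.audio_energy > 100
               for s in shots)
```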
3. Detection of suspense scenes
Many suspense scenes appear in horror movies and detective movies. A scene is called a suspense scene when it satisfies the following three conditions simultaneously (a rule sketch follows the list):
(1) the average illumination intensity of the scene is less than 50;
(2) the audio energy envelope of the first several shots of the scene does not exceed 5, while the audio energy envelope change between some two adjacent shots exceeds 50;
(3) the activity intensity of the first several shots of the scene does not exceed 5, while the activity intensity change between some two adjacent shots exceeds 100.
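A sketch of these rules under the same per-shot statistics assumption; `n_lead` (how many opening shots count as "the first several") is an assumed parameter.

```python
def is_suspense_scene(illumination: float,
                      audio_envelope: list,
                      activity: list,
                      n_lead: int = 3) -> bool:
    """Suspense rule sketch: dark scene, quiet and static opening shots,
    plus at least one abrupt jump between adjacent shots.

    audio_envelope / activity: one value per shot, in time order.
    """
    dark = illumination < 50
    quiet_start = all(e <= 5 for e in audio_envelope[:n_lead])
    audio_jump = any(abs(b - a) > 50
                     for a, b in zip(audio_envelope, audio_envelope[1:]))
    static_start = all(a <= 5 for a in activity[:n_lead])
    motion_jump = any(abs(b - a) > 100
                      for a, b in zip(activity, activity[1:]))
    return dark and quiet_start and audio_jump and static_start and motion_jump
```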
Step C: selecting summary segments from the highlight scenes according to actual requirements, splicing them in temporal order, and generating the summary of the original video.
A video summary can be generated directly from the highlight scenes obtained in step B; for example, a suitable subset of highlight scenes can be selected at random according to the required summary duration, or a particular class of highlight scenes can be selected to constitute the summary according to the dominant story type of the video. The present invention further screens the highlight scenes according to the transition intensity between highlight plots, so as to better describe the development of the story plot through the changes between plots. Step C specifically comprises the following substeps:
Step C1: calculating the progress intensity between any two highlight scenes.
The transition types between highlight plots include the following three kinds: temporal transitions, spatial transitions, and rhythm transitions. From everyday experience and film editing principles, the less correlated two plots are, the larger the corresponding transition intensity. Therefore, the scene transition intensity here is not only the leading indicator for evaluating the development of the movie plot, but also an important basis for generating the movie summary.
(1) Generally speaking, the temporal transition between two different scenes can be described by the corresponding numbers of faces. The temporal transition intensity between two scenes $AS_u$ and $AS_v$ is expressed as:
$$TT(AS_u, AS_v) = \Big| \sum_{p=1}^{P} N(AS_u, Sh_l, Kf_p) - \sum_{q=1}^{Q} N(AS_v, Sh_w, Kf_q) \Big| \qquad (1)$$
where $N(AS_u, Sh_l, Kf_p)$ is the number of faces appearing in key frame $p$ of the last shot $l$ in scene $AS_u$, $N(AS_v, Sh_w, Kf_q)$ is the number of faces appearing in key frame $q$ of the first shot $w$ in scene $AS_v$, and $P$ and $Q$ are the numbers of key frames in shots $l$ and $w$, respectively.
If the following inequality holds, a temporal transition exists between scenes $AS_u$ and $AS_v$:
$$TT(AS_u, AS_v) > \frac{1}{P+Q} \Big[ \sum_{p=1}^{P} N(AS_u, Sh_l, Kf_p) + \sum_{q=1}^{Q} N(AS_v, Sh_w, Kf_q) \Big] \qquad (2)$$
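The following is a direct transcription of formulas (1) and (2), assuming per-key-frame face counts (e.g. from a face detector) are available:

```python
def temporal_transition_intensity(faces_last_shot_u, faces_first_shot_w):
    """TT(AS_u, AS_v): absolute difference between the total face count
    over the key frames of the last shot of AS_u and that of the first
    shot of AS_v (formula (1))."""
    return abs(sum(faces_last_shot_u) - sum(faces_first_shot_w))

def has_temporal_transition(faces_last_shot_u, faces_first_shot_w):
    """Inequality (2): TT exceeds the mean face count per key frame."""
    P, Q = len(faces_last_shot_u), len(faces_first_shot_w)
    tt = temporal_transition_intensity(faces_last_shot_u, faces_first_shot_w)
    return tt > (sum(faces_last_shot_u) + sum(faces_first_shot_w)) / (P + Q)
```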
(2) A spatial transition indicates that the same actor appears in two different scenes, and can be detected by judging changes in the background region. The spatial transition intensity is computed as:
$$ST(AS_u, AS_v) = \Big| \frac{1}{P}\sum_{p=1}^{P} RA(p) - \frac{1}{Q}\sum_{q=1}^{Q} RA(q) \Big| + \Big| \frac{1}{P}\sum_{p=1}^{P} GA(p) - \frac{1}{Q}\sum_{q=1}^{Q} GA(q) \Big| + \Big| \frac{1}{P}\sum_{p=1}^{P} BA(p) - \frac{1}{Q}\sum_{q=1}^{Q} BA(q) \Big| + \Big| \frac{1}{P}\sum_{p=1}^{P} LA(p) - \frac{1}{Q}\sum_{q=1}^{Q} LA(q) \Big| \qquad (3)$$
where $RA(p)$, $GA(p)$, $BA(p)$ and $LA(p)$ denote the mean red, green, blue and luminance values of the background region of key frame $p$ in the last shot $l$ of scene $AS_u$; $RA(q)$, $GA(q)$, $BA(q)$ and $LA(q)$ denote the corresponding mean values of the background region of key frame $q$ in the first shot $w$ of scene $AS_v$; and $P$ and $Q$ are the numbers of key frames in shots $l$ and $w$, respectively.
If the following inequality holds, a spatial transition exists between scenes $AS_u$ and $AS_v$:
$$ST(AS_u, AS_v) > \frac{1}{2}\Big| \frac{1}{P}\sum_{p=1}^{P} RA(p) + \frac{1}{Q}\sum_{q=1}^{Q} RA(q) \Big| + \frac{1}{2}\Big| \frac{1}{P}\sum_{p=1}^{P} GA(p) + \frac{1}{Q}\sum_{q=1}^{Q} GA(q) \Big| + \frac{1}{2}\Big| \frac{1}{P}\sum_{p=1}^{P} BA(p) + \frac{1}{Q}\sum_{q=1}^{Q} BA(q) \Big| + \frac{1}{2}\Big| \frac{1}{P}\sum_{p=1}^{P} LA(p) + \frac{1}{Q}\sum_{q=1}^{Q} LA(q) \Big| \qquad (4)$$
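Likewise, a transcription of formulas (3) and (4), assuming each key frame's background region has already been segmented and reduced to mean (R, G, B, luminance) values:

```python
import numpy as np

def spatial_transition_intensity(bg_u, bg_w):
    """ST(AS_u, AS_v) per formula (3).

    bg_u, bg_w: arrays of shape (n_keyframes, 4) holding mean
    (R, G, B, luminance) of the background region of each key frame of
    the last shot of AS_u and the first shot of AS_v, respectively.
    """
    mean_u = np.asarray(bg_u).mean(axis=0)   # per-channel key-frame averages
    mean_w = np.asarray(bg_w).mean(axis=0)
    return float(np.abs(mean_u - mean_w).sum())

def has_spatial_transition(bg_u, bg_w):
    """Inequality (4): compare ST against half the per-channel sums."""
    mean_u = np.asarray(bg_u).mean(axis=0)
    mean_w = np.asarray(bg_w).mean(axis=0)
    threshold = 0.5 * float(np.abs(mean_u + mean_w).sum())
    return spatial_transition_intensity(bg_u, bg_w) > threshold
```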
(3) Rhythm transitions operate on shot durations and represent the tension or relaxation of the atmosphere. The rhythm transition intensity between scenes $AS_u$ and $AS_v$ is computed as:
$$RT(AS_u, AS_v) = \Big| \frac{1}{M}\sum_{m=1}^{M} \mathrm{Len}(Sh_m) - \frac{1}{N}\sum_{n=1}^{N} \mathrm{Len}(Sh_n) \Big| \qquad (5)$$
where $\mathrm{Len}(Sh_m)$ is the number of frames contained in the $m$-th shot $Sh_m$ of scene $AS_u$, $\mathrm{Len}(Sh_n)$ is the number of frames contained in the $n$-th shot $Sh_n$ of scene $AS_v$, and $M$ and $N$ are the numbers of shots in scenes $AS_u$ and $AS_v$, respectively.
When the following inequality holds, a rhythm transition exists between scenes $AS_u$ and $AS_v$:
$$RT(AS_u, AS_v) > \frac{1}{2}\Big| \frac{1}{M}\sum_{m=1}^{M} \mathrm{Len}(Sh_m) + \frac{1}{N}\sum_{n=1}^{N} \mathrm{Len}(Sh_n) \Big| \qquad (6)$$
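And a transcription of formulas (5) and (6) over per-scene shot lengths in frames (the 1/2 factor in (6) follows the reconstruction above):

```python
def rhythm_transition_intensity(shot_lengths_u, shot_lengths_v):
    """RT(AS_u, AS_v): absolute difference between the mean shot lengths
    of AS_u and AS_v, in frames (formula (5))."""
    mean_u = sum(shot_lengths_u) / len(shot_lengths_u)
    mean_v = sum(shot_lengths_v) / len(shot_lengths_v)
    return abs(mean_u - mean_v)

def has_rhythm_transition(shot_lengths_u, shot_lengths_v):
    """Inequality (6), as reconstructed here: RT exceeds half the sum of
    the two mean shot lengths."""
    mean_u = sum(shot_lengths_u) / len(shot_lengths_u)
    mean_v = sum(shot_lengths_v) / len(shot_lengths_v)
    return (rhythm_transition_intensity(shot_lengths_u, shot_lengths_v)
            > 0.5 * abs(mean_u + mean_v))
```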
The present invention adopts a Progress Intensity Function (PIF) to describe the plot progress between two scenes $AS_u$ and $AS_v$, expressed as follows:
$$\mathrm{PIF}(AS_u, AS_v) = \alpha\, TT_n(AS_u, AS_v) + \beta\, ST_n(AS_u, AS_v) + \gamma\, RT_n(AS_u, AS_v) \qquad (7)$$
where $\mathrm{PIF}(AS_u, AS_v)$ denotes the progress intensity between the two different scenes $AS_u$ and $AS_v$; $TT_n(AS_u, AS_v)$, $ST_n(AS_u, AS_v)$ and $RT_n(AS_u, AS_v)$ are the normalized forms of the temporal transition intensity $TT(AS_u, AS_v)$, the spatial transition intensity $ST(AS_u, AS_v)$ and the rhythm transition intensity $RT(AS_u, AS_v)$ between $AS_u$ and $AS_v$, respectively; and $\alpha$, $\beta$, $\gamma$ are weight coefficients satisfying $\alpha+\beta+\gamma=1$.
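A sketch of formula (7) over all scene pairs; the patent does not spell out the normalization, so min-max normalization over all pairs is an assumption here, as are the equal default weights.

```python
import numpy as np

def progress_intensity_matrix(tt, st, rt, alpha=1/3, beta=1/3, gamma=1/3):
    """PIF over all scene pairs per formula (7).

    tt, st, rt: square arrays of raw TT/ST/RT values for each scene pair.
    The normalization method (min-max over all pairs) and the equal
    weights are assumptions; the patent only requires
    alpha + beta + gamma = 1.
    """
    def minmax(x):
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)

    assert abs(alpha + beta + gamma - 1.0) < 1e-9
    return alpha * minmax(tt) + beta * minmax(st) + gamma * minmax(rt)
```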
Step C2: sorting the progress intensities in descending order, and selecting all the highlight scenes corresponding to the K largest progress intensities as candidate summary segments; the value of K is less than or equal to the total number of highlight scenes detected in step B.
Because the required summary time is usually short, a subset of highlight scenes with the largest progress intensities relative to the other highlight scenes can be picked out as candidate summary segments according to the required summary duration.
Step C3: selecting the final summary segments from the candidate summary segments, splicing them in temporal order, and generating the summary of the original video.
The candidate summary segments essentially already meet the summary requirements, so they can be used directly as the final summary segments and spliced chronologically to generate the video summary. Fig. 2 shows an example of generating a video summary with this scheme. In this example, four highlight scenes, KS-1, KS-2, KS-3, and KS-4, with durations of 2, 3, 3, and 4 seconds respectively, were obtained through step B. Their pairwise progress intensities are shown in Fig. 1. Ordering the progress intensities from high to low, the final summary segments can be determined for a given summary length (see the selection sketch below). For example, when the given summary length is 7 seconds, KS-1 and KS-3 should be selected, as shown in Fig. 2; when the given summary length is 10 seconds, KS-1, KS-3, and KS-4 should be selected, as shown in Fig. 3.
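A greedy sketch of this selection step: rank scenes by the progress intensity they contribute, take scenes while the duration budget allows, then splice in time order. Scoring a scene by its maximum pairwise PIF is an assumption; the patent only describes choosing the scenes tied to the largest pairwise intensities.

```python
import numpy as np

def select_summary_segments(durations, pif, budget_seconds):
    """Pick highlight scenes under a duration budget.

    durations: per-scene lengths in seconds, in temporal order.
    pif: square matrix of pairwise progress intensities.
    Each scene is scored by its largest pairwise PIF (an assumption),
    then scenes are taken greedily in descending score while they fit.
    The chosen scenes are returned in temporal order for splicing.
    """
    pif = np.asarray(pif, dtype=float)
    scores = pif.max(axis=1)                  # per-scene score
    chosen, used = [], 0.0
    for i in np.argsort(-scores):             # descending score
        if used + durations[i] <= budget_seconds:
            chosen.append(int(i))
            used += durations[i]
    return sorted(chosen)                      # splice chronologically

# Hypothetical usage mirroring Figs. 1-3: four scenes of 2, 3, 3 and
# 4 seconds with a 7-second budget; the result depends on the PIF values.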
So that the video segments in the finally generated summary can be presented to the audience more fluently, the present invention further rejects candidate summary segments whose duration is too short to convey useful information, and adjusts the candidate summary segments according to sentence completeness.
The aim of summarization is to generate highly compressed video segments that contain as much useful plot information as possible and can be presented to the audience smoothly. For each generated summary segment to be meaningful, no segment can be too short; statistical studies indicate that a video sequence lasting less than 1 second cannot convey any useful information. Therefore, the present invention first directly rejects, as useless video segments, the candidate summary segments whose duration is less than 1 second.
To present the video segments to the audience as smoothly as possible, suitable adjustments according to sentence completeness also need to be made to the remaining candidate summary segments, as follows:
Complete-sentence detection is performed on each remaining candidate summary segment, and the candidate summary segments are adjusted according to the detection results. Various existing methods can be adopted for complete-sentence detection. For example, Schreiner proposed a complete-sentence detection method based on the modulation spectrum (O. Schreiner. Modulation spectrum for pitch and speech pause detection. In Proceedings of European Conference on Speech Communication and Technology, 2003); Liu et al. adopted conditional random fields to detect the boundaries of complete sentences (Y. Liu, A. Stolcke, E. Shriberg, and M. Harper. Using Conditional Random Fields for Sentence Boundary Detection in Speech. Annual Meeting of the Association for Computational Linguistics, 2005); Szczurowska et al. used Kohonen networks to detect sentence boundary completeness (I. Szczurowska, W. Kuniszyk-Jóźkowiak, and E. Smolka. Speech nonfluency detection using Kohonen networks. Neural Computing and Applications, 2009, 18(7):677-687). This embodiment adopts the following method for complete-sentence detection: audio energy and the second-order zero-crossing rate are used to detect pause segments in the continuous speech sequence; a minimum pause duration and a minimum sentence duration are adopted to smooth the segmentation result of the previous step; and longer pauses are used to detect the sentence segments.
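A simplified sketch of the pause-detection step using frame energy and zero-crossing rate; frame sizes and thresholds are assumptions, and the second-order zero-crossing rate is approximated here by the ordinary zero-crossing rate.

```python
import numpy as np

def detect_pauses(y, sr, frame_len=0.025, energy_th=1e-4, zcr_th=0.1,
                  min_pause=0.2):
    """Return (start_sec, end_sec) pause intervals in a speech signal.

    A frame counts as silent when both its energy and zero-crossing rate
    are low; runs of silent frames shorter than `min_pause` are dropped
    (the smoothing step). All thresholds are assumed placeholder values.
    """
    n = int(frame_len * sr)
    frames = [y[i:i + n] for i in range(0, len(y) - n, n)]
    silent = [float(np.mean(f ** 2)) < energy_th and
              float(np.mean(np.abs(np.diff(np.sign(f))) > 0)) < zcr_th
              for f in frames]
    pauses, start = [], None
    for i, s in enumerate(silent + [False]):   # sentinel closes last run
        if s and start is None:
            start = i
        elif not s and start is not None:
            t0, t1 = start * frame_len, i * frame_len
            if t1 - t0 >= min_pause:           # smoothing: drop short runs
                pauses.append((t0, t1))
            start = None
    return pauses
```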
The candidate summary segments are adjusted according to the detection results: if the boundary of a complete sentence exceeds the boundary of a candidate summary segment, the boundary of that candidate summary segment is extended to the boundary of the complete sentence. A complete sentence's boundary can exceed a candidate summary segment's boundary in two ways: either one side of the complete sentence exceeds the candidate summary segment boundary, or both the front and rear boundaries exceed it (that is, the complete sentence covers the candidate summary segment); in both cases the boundary of the candidate summary segment is adjusted to the boundary of the complete sentence. The adjusted candidate summary segments are the final summary segments. Assembling the final summary segments in chronological order yields the video summary.
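A sketch of this boundary adjustment: given a segment and the sentence intervals that overlap it, the segment is stretched to cover any sentence it cuts into (the interval representation is an assumption).

```python
def adjust_to_sentences(segment, sentences):
    """Extend a (start, end) summary segment so that every overlapping
    complete sentence fits inside it, per the adjustment rule above."""
    start, end = segment
    for s_start, s_end in sentences:
        overlaps = s_start < end and s_end > start
        if overlaps:
            start = min(start, s_start)   # one-sided or full extension
            end = max(end, s_end)
    return (start, end)

# Example: a 4.0-7.5 s segment cutting into a sentence spanning
# 3.2-5.0 s becomes 3.2-7.5 s.
print(adjust_to_sentences((4.0, 7.5), [(3.2, 5.0), (8.0, 9.0)]))
```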
Compared with existing video summary extraction methods, the present invention selects suitable summary segments to generate the video summary according to the development of the story plot, which better conforms to human logical thinking, helps guarantee the integrity of the movie content, and accurately embodies the main plot of the video.

Claims (9)

1. A video summary extraction method based on story plots, characterized by comprising the following steps:
Step A: performing key frame, shot, and scene detection on the original video;
Step B: detecting highlight scenes among the scenes according to the video story plot;
Step C: selecting summary segments from the highlight scenes according to actual requirements, splicing them in temporal order, and generating the summary of the original video.
2. The video summary extraction method based on story plots according to claim 1, characterized in that the detection of highlight scenes comprises:
dialogue scene detection: first detecting, according to face information, the scenes containing alternately appearing face shots as candidate dialogue scenes; then selecting from the candidate dialogue scenes the scenes containing speech as dialogue scenes;
action scene detection: regarding a scene as an action scene when it satisfies the following three conditions simultaneously: each shot in the scene contains fewer than 25 frames, the average activity intensity of each shot exceeds 200, and the average audio energy of each shot exceeds 100;
suspense scene detection: regarding a scene as a suspense scene when it satisfies the following three conditions simultaneously: the average illumination intensity of the scene is less than 50; the audio energy envelope of the first several shots of the scene does not exceed 5, while the audio energy envelope change between some two adjacent shots exceeds 50; the activity intensity of the first several shots of the scene does not exceed 5, while the activity intensity change between some two adjacent shots exceeds 100.
3. The video summary extraction method based on story plots according to claim 2, characterized in that the dialogue scene detection also comprises the detection of emotional dialogue scenes: extracting respectively the average fundamental frequency and the short-term intensity change of each dialogue scene, and selecting the dialogue scenes for which both exceed predetermined thresholds as emotional dialogue scenes.
4. The video summary extraction method based on story plots according to claim 2, characterized in that the action scene detection also comprises:
gunfight scene detection: selecting the action scenes whose orange, yellow, and red color features all exceed predetermined thresholds as gunfight scenes;
fight scene detection: selecting the action scenes containing shouting audio features as fight scenes;
chase scene detection: selecting the action scenes containing screeching and horn audio features as chase scenes.
5. The video summary extraction method based on story plots according to any one of claims 1-4, characterized in that step C specifically comprises the following substeps:
Step C1: calculating the progress intensity between any two highlight scenes according to the following formula:
$$\mathrm{PIF}(AS_u, AS_v) = \alpha\, TT_n(AS_u, AS_v) + \beta\, ST_n(AS_u, AS_v) + \gamma\, RT_n(AS_u, AS_v)$$
where $\mathrm{PIF}(AS_u, AS_v)$ denotes the progress intensity between two different scenes $AS_u$ and $AS_v$; $TT_n(AS_u, AS_v)$, $ST_n(AS_u, AS_v)$ and $RT_n(AS_u, AS_v)$ are the normalized forms of the temporal transition intensity $TT(AS_u, AS_v)$, the spatial transition intensity $ST(AS_u, AS_v)$ and the rhythm transition intensity $RT(AS_u, AS_v)$ between $AS_u$ and $AS_v$, respectively; and $\alpha$, $\beta$, $\gamma$ are weight coefficients satisfying $\alpha+\beta+\gamma=1$; wherein
the temporal transition intensity $TT(AS_u, AS_v)$ is computed as
$$TT(AS_u, AS_v) = \Big| \sum_{p=1}^{P} N(AS_u, Sh_l, Kf_p) - \sum_{q=1}^{Q} N(AS_v, Sh_w, Kf_q) \Big|$$
where $N(AS_u, Sh_l, Kf_p)$ is the number of faces appearing in key frame $p$ of the last shot $l$ in scene $AS_u$, $N(AS_v, Sh_w, Kf_q)$ is the number of faces appearing in key frame $q$ of the first shot $w$ in scene $AS_v$, and $P$ and $Q$ are the numbers of key frames in shots $l$ and $w$, respectively;
the spatial transition intensity $ST(AS_u, AS_v)$ is computed as
$$ST(AS_u, AS_v) = \Big| \frac{1}{P}\sum_{p=1}^{P} RA(p) - \frac{1}{Q}\sum_{q=1}^{Q} RA(q) \Big| + \Big| \frac{1}{P}\sum_{p=1}^{P} GA(p) - \frac{1}{Q}\sum_{q=1}^{Q} GA(q) \Big| + \Big| \frac{1}{P}\sum_{p=1}^{P} BA(p) - \frac{1}{Q}\sum_{q=1}^{Q} BA(q) \Big| + \Big| \frac{1}{P}\sum_{p=1}^{P} LA(p) - \frac{1}{Q}\sum_{q=1}^{Q} LA(q) \Big|$$
where $RA(p)$, $GA(p)$, $BA(p)$ and $LA(p)$ denote the mean red, green, blue and luminance values of the background region of key frame $p$ in the last shot $l$ of scene $AS_u$; $RA(q)$, $GA(q)$, $BA(q)$ and $LA(q)$ denote the corresponding mean values of the background region of key frame $q$ in the first shot $w$ of scene $AS_v$; and $P$ and $Q$ are the numbers of key frames in shots $l$ and $w$, respectively;
the rhythm transition intensity $RT(AS_u, AS_v)$ is computed as
$$RT(AS_u, AS_v) = \Big| \frac{1}{M}\sum_{m=1}^{M} \mathrm{Len}(Sh_m) - \frac{1}{N}\sum_{n=1}^{N} \mathrm{Len}(Sh_n) \Big|$$
where $\mathrm{Len}(Sh_m)$ is the number of frames contained in the $m$-th shot $Sh_m$ of scene $AS_u$, $\mathrm{Len}(Sh_n)$ is the number of frames contained in the $n$-th shot $Sh_n$ of scene $AS_v$, and $M$ and $N$ are the numbers of shots in scenes $AS_u$ and $AS_v$, respectively;
Step C2: sorting the progress intensities in descending order, and selecting all the highlight scenes corresponding to the K largest progress intensities as candidate summary segments, the value of K being less than or equal to the total number of highlight scenes detected in step B;
Step C3: selecting the final summary segments from the candidate summary segments, splicing them in temporal order, and generating the summary of the original video.
6. The video summary extraction method based on story plots according to claim 5, characterized in that selecting the final summary segments from the candidate summary segments is specifically performed as follows: first, the highlight scenes shorter than 1 second are rejected from the candidate summary segments; then, complete-sentence detection is performed on each remaining candidate summary segment, and the candidate summary segments are adjusted accordingly: if the boundary of a complete sentence exceeds the boundary of a candidate summary segment, the boundary of that candidate summary segment is extended to the boundary of the complete sentence; the adjusted candidate summary segments are the final summary segments.
7. The video summary extraction method based on story plots according to claim 6, characterized in that the complete-sentence detection is performed as follows: audio energy and the second-order zero-crossing rate are used to detect pause segments in the continuous speech sequence; a minimum pause duration and a minimum sentence duration are adopted to smooth the segmentation result of the previous step; and longer pauses are used to detect the sentence segments.
8. The video summary extraction method based on story plots according to claim 5, characterized in that the shot detection is specifically performed as follows: first, candidate shot boundary detection is carried out: the difference of content information between frames is used to determine the initial boundaries of candidate shots, and on this basis the exact boundaries of the candidate shots are determined according to the difference of content information in the neighborhood of the initial shot boundaries; second, the transition type of the real candidate shots is determined according to the two-dimensional entropy characteristics of the picture frames, while invalid candidate shots are removed.
9. The video summary extraction method based on story plots according to claim 5, characterized in that the scene detection is specifically performed as follows: first, under a time-window constraint mechanism, the spatial correlation of the semantic content information between shots is determined; then, on the basis of the established spatio-temporal correlation, the scene boundaries are accurately established according to the differences of semantic content information between shots.
CN201210358183.5A 2012-09-24 2012-09-24 Video abstraction extraction method based on story plots Expired - Fee Related CN102902756B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210358183.5A CN102902756B (en) 2012-09-24 2012-09-24 Video abstraction extraction method based on story plots


Publications (2)

Publication Number Publication Date
CN102902756A true CN102902756A (en) 2013-01-30
CN102902756B CN102902756B (en) 2016-02-03

Family

ID=47574988

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210358183.5A Expired - Fee Related CN102902756B (en) 2012-09-24 2012-09-24 Video abstraction extraction method based on story plots

Country Status (1)

Country Link
CN (1) CN102902756B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070046669A1 (en) * 2003-06-27 2007-03-01 Young-Sik Choi Apparatus and method for automatic video summarization using fuzzy one-class support vector machines
CN101127866A (en) * 2007-08-10 2008-02-20 西安交通大学 A method for detecting wonderful section of football match video
CN101431689A (en) * 2007-11-05 2009-05-13 华为技术有限公司 Method and device for generating video abstract

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104954717A (en) * 2014-03-24 2015-09-30 宇龙计算机通信科技(深圳)有限公司 Terminal and video title generation method
CN104954717B (en) * 2014-03-24 2018-07-24 宇龙计算机通信科技(深圳)有限公司 A kind of terminal and video title generation method
CN104954889A (en) * 2014-03-28 2015-09-30 宇龙计算机通信科技(深圳)有限公司 Generating method and generating system of titles
CN104954889B (en) * 2014-03-28 2019-06-11 宇龙计算机通信科技(深圳)有限公司 Head generation method and generation system
CN104731944A (en) * 2015-03-31 2015-06-24 努比亚技术有限公司 Video searching method and device
CN104811745A (en) * 2015-04-28 2015-07-29 无锡天脉聚源传媒科技有限公司 Video content displaying method and device
CN104883478B (en) * 2015-06-17 2018-11-16 北京金山安全软件有限公司 Video processing method and device
CN104883478A (en) * 2015-06-17 2015-09-02 北京金山安全软件有限公司 Video processing method and device
US10553254B2 (en) 2015-06-17 2020-02-04 Beijing Kingsoft Internet Security Software Co., Ltd. Method and device for processing video
CN105847964A (en) * 2016-03-28 2016-08-10 乐视控股(北京)有限公司 Movie and television program processing method and movie and television program processing system
CN106210902A (en) * 2016-07-06 2016-12-07 华东师范大学 A kind of cameo shot clipping method based on barrage comment data
CN106210902B (en) * 2016-07-06 2019-06-11 华东师范大学 A kind of cameo shot clipping method based on barrage comment data
CN106649713A (en) * 2016-12-21 2017-05-10 中山大学 Movie visualization processing method and system based on content
CN106649713B (en) * 2016-12-21 2020-05-12 中山大学 Movie visualization processing method and system based on content
CN111052751B (en) * 2017-09-19 2022-02-01 索尼公司 Calibration system for audience response capture and analysis of media content
US11218771B2 (en) 2017-09-19 2022-01-04 Sony Corporation Calibration system for audience response capture and analysis of media content
CN111052751A (en) * 2017-09-19 2020-04-21 索尼公司 Calibration system for audience response capture and analysis of media content
CN109729421A (en) * 2017-10-27 2019-05-07 优酷网络技术(北京)有限公司 A kind of generation method and device of video presentation content
CN109089127B (en) * 2018-07-10 2021-05-28 武汉斗鱼网络科技有限公司 Video splicing method, device, equipment and medium
CN109040773A (en) * 2018-07-10 2018-12-18 武汉斗鱼网络科技有限公司 A kind of video improvement method, apparatus, equipment and medium
CN109089127A (en) * 2018-07-10 2018-12-25 武汉斗鱼网络科技有限公司 A kind of video-splicing method, apparatus, equipment and medium
CN110830852A (en) * 2018-08-07 2020-02-21 北京优酷科技有限公司 Video content processing method and device
CN110830852B (en) * 2018-08-07 2022-08-12 阿里巴巴(中国)有限公司 Video content processing method and device
CN109151616B (en) * 2018-08-07 2020-09-08 石家庄铁道大学 Video key frame extraction method
CN109151616A (en) * 2018-08-07 2019-01-04 石家庄铁道大学 Video key frame extracting method
CN109525892B (en) * 2018-12-03 2021-09-10 易视腾科技股份有限公司 Video key scene extraction method and device
CN109525892A (en) * 2018-12-03 2019-03-26 易视腾科技股份有限公司 Video Key situation extracting method and device
CN109889879A (en) * 2019-03-25 2019-06-14 联想(北京)有限公司 Information control method and electronic equipment
CN109977262A (en) * 2019-03-25 2019-07-05 北京旷视科技有限公司 The method, apparatus and processing equipment of candidate segment are obtained from video
CN112153462B (en) * 2019-06-26 2023-02-14 腾讯科技(深圳)有限公司 Video processing method, device, terminal and storage medium
CN112153462A (en) * 2019-06-26 2020-12-29 腾讯科技(深圳)有限公司 Video processing method, device, terminal and storage medium
CN111263234B (en) * 2020-01-19 2021-06-15 腾讯科技(深圳)有限公司 Video clipping method, related device, equipment and storage medium
CN111263234A (en) * 2020-01-19 2020-06-09 腾讯科技(深圳)有限公司 Video clipping method, related device, equipment and storage medium
CN111309916A (en) * 2020-03-05 2020-06-19 北京奇艺世纪科技有限公司 Abstract extraction method and device, storage medium and electronic device
CN111309916B (en) * 2020-03-05 2023-06-30 北京奇艺世纪科技有限公司 Digest extracting method and apparatus, storage medium, and electronic apparatus
CN114245171A (en) * 2021-12-15 2022-03-25 百度在线网络技术(北京)有限公司 Video editing method, video editing device, electronic equipment and media
CN114245171B (en) * 2021-12-15 2023-08-29 百度在线网络技术(北京)有限公司 Video editing method and device, electronic equipment and medium
CN115834977A (en) * 2022-11-18 2023-03-21 贝壳找房(北京)科技有限公司 Video processing method, electronic device, storage medium, and computer program product
CN115834977B (en) * 2022-11-18 2023-09-08 贝壳找房(北京)科技有限公司 Video processing method, electronic device, storage medium and computer program product

Also Published As

Publication number Publication date
CN102902756B (en) 2016-02-03

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20130130

Assignee: Jiangsu Nanyou IOT Technology Park Ltd.

Assignor: Nanjing Post & Telecommunication Univ.

Contract record no.: 2016320000210

Denomination of invention: Video abstraction extraction method based on story plots

Granted publication date: 20160203

License type: Common License

Record date: 20161114

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
EC01 Cancellation of recordation of patent licensing contract

Assignee: Jiangsu Nanyou IOT Technology Park Ltd.

Assignor: Nanjing Post & Telecommunication Univ.

Contract record no.: 2016320000210

Date of cancellation: 20180116

EC01 Cancellation of recordation of patent licensing contract
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160203

Termination date: 20180924

CF01 Termination of patent right due to non-payment of annual fee