CN105744356A - Content-based video segmentation method - Google Patents
- Publication number
- CN105744356A CN105744356A CN201610066554.0A CN201610066554A CN105744356A CN 105744356 A CN105744356 A CN 105744356A CN 201610066554 A CN201610066554 A CN 201610066554A CN 105744356 A CN105744356 A CN 105744356A
- Authority
- CN
- China
- Prior art keywords
- segmentation
- video
- word
- section
- dialogue
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
Abstract
The invention discloses a content-based video segmentation method comprising the following steps: first, the content and time points of each caption paragraph in a video are obtained from the subtitle file, and paragraphs separated by short time intervals are merged into larger paragraphs; second, word segmentation is performed on each large paragraph, sentence similarity is computed from word similarity, highly similar sentences are merged into sections, and initial video segmentation positions are obtained from the corresponding time information; finally, image-based shot extraction is performed on the video and combined with the preliminary segmentation positions to locate the final, accurate segmentation positions.
Description
Technical field
The present invention relates to the technical fields of video processing and natural language processing, and in particular to a video segmentation method.
Background technology
Science and education video is a common video type. With the arrival of the Internet era, the medium through which users watch such video has gradually shifted from television to computers and networks. When watching a video, viewers often jump ahead to skip the parts they do not wish to watch and go directly to the content that interests them.
During such jumps, it is difficult for users to land precisely on the position they want to watch; reaching the desired point usually takes several adjustments, which severely degrades the viewing experience. If a video is segmented in advance, a user who wants to skip the current section can use the segment information provided with the video to jump quickly and accurately to the start of the next section, without slow manual adjustment. For video websites this is undoubtedly a significant competitive advantage.
Most current automatic video segmentation schemes are based on scene detection: video within the same scene is grouped into one segment, and a scene-change frame is taken as the starting point of a segment. However, a video often uses a large number of scenes to tell the same story, so scene-based segmentation can produce too many segments, in extreme cases several segments within a few seconds, which is not a reasonable segmentation.
For a video that has a standard subtitle file, however, the subtitle time axis and its textual content can be analyzed from a natural-language perspective to measure the similarity between passages of dialogue; this similarity is used to produce a segmentation, which is then combined with the relatively clean scene transitions typical of science and education video to obtain accurate segment boundaries.
Summary of the invention
The purpose of the invention is to solve the automatic segmentation problem of science and education video by providing a content-based automatic segmentation method, characterized by comprising the following steps:
Extract from the subtitle file the dialogue stream S = {s1, s2, s3, ..., sn} of the video, the start time B = {b1, b2, b3, ..., bn} of each dialogue, and the end time E = {e1, e2, e3, ..., en} of each dialogue.
For all adjacent dialogues si, si-1, set a threshold λ; when bi - ei-1 < λ, classify si and si-1 into the same section. The dialogue stream S is thereby divided into m sections, where the i-th section starts from the k-th dialogue and consists of l consecutive dialogues, i.e. Si = {sk, sk+1, sk+2, ..., sk+l-1}.
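The gap-based merging above can be sketched as follows. This is an illustration only, not the patented implementation; the function name `merge_dialogues`, the seconds-based timestamps, and the toy values are assumptions:

```python
# Illustrative sketch: merge adjacent subtitle dialogues into sections
# when the gap before a dialogue is below a threshold lambda (lam).
# starts/ends correspond to the B and E lists in the text.

def merge_dialogues(starts, ends, lam):
    """Group dialogue indices: dialogue i joins the current section
    when starts[i] - ends[i-1] < lam, else it starts a new section."""
    sections = [[0]]
    for i in range(1, len(starts)):
        if starts[i] - ends[i - 1] < lam:
            sections[-1].append(i)
        else:
            sections.append([i])
    return sections

# Toy example: the first two dialogues are 1 s apart, the third 10 s apart.
B = [0.0, 5.0, 20.0]
E = [4.0, 10.0, 25.0]
print(merge_dialogues(B, E, lam=3.0))  # [[0, 1], [2]]
```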
Apply the ICTCLAS word segmentation tool to each sentence sk; after removing non-content words, obtain the word list Ck = {ck1, ck2, ck3, ..., ckh} of sk.
Compute the similarity between any two sentences sx and sy by the following formula:
where f(cxi) is the term vector of word cxi and f(cyi) is the term vector of word cyi. The similarity of two words cxi and cyi is obtained as the dot product f(cxi)·f(cyi) of their term vectors.
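The word-level part of this step is the dot product of term vectors; the patent's exact sentence-level aggregation formula is not reproduced in the text, so the sketch below assumes the mean of all pairwise word similarities, which is one common choice. The toy 2-d "term vectors" are purely hypothetical:

```python
import numpy as np

# Word similarity = dot product of the two term vectors (as in the text).
def word_sim(u, v):
    return float(np.dot(u, v))

# Sentence similarity: ASSUMED here to be the mean over all word pairs,
# since the exact aggregation formula is not reproduced in the text.
def sentence_sim(vecs_x, vecs_y):
    sims = [word_sim(u, v) for u in vecs_x for v in vecs_y]
    return sum(sims) / len(sims)

# Hypothetical 2-d term vectors for two short sentences.
cx = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
cy = [np.array([1.0, 0.0])]
print(sentence_sim(cx, cy))  # 0.5
```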
For every Si, compute, using the similarity defined above, the similarity matrix Mi between all sentences in Si. To make the structure of the matrix more apparent, Mi is transformed as follows:
First, apply the following transformation to every value x in Mi:
where g(x) is:
Then, for every value mx,y in the transformed Mi, take a statistic over its 1×11 window: count how many of the values {mx,y-5, mx,y-4, ..., mx,y+4, mx,y+5} are smaller than mx,y, and replace mx,y with that count.
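The 1×11 window statistic can be sketched as a rank filter along each row. Border handling (clipping the window at the matrix edge) is an assumption, since the text does not specify it:

```python
import numpy as np

# Sketch of the window statistic above: each value m[x, y] is replaced by
# the count of values in {m[x, y-5], ..., m[x, y+5]} smaller than m[x, y].
# ASSUMPTION: at the borders the window is clipped to the matrix.

def rank_window_filter(M, half=5):
    M = np.asarray(M, dtype=float)
    out = np.zeros(M.shape, dtype=int)
    rows, cols = M.shape
    for x in range(rows):
        for y in range(cols):
            lo, hi = max(0, y - half), min(cols, y + half + 1)
            window = M[x, lo:hi]
            out[x, y] = int(np.sum(window < M[x, y]))
    return out

M = np.array([[0.1, 0.9, 0.5]])
print(rank_window_filter(M))  # [[0 2 1]]
```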
The replaced matrix then needs to be split; the splitting method is as follows:
First, define the accuracy of a splitting scheme: let ax,y = (y - x + 1)², and suppose splitting Si yields the splitting scheme BF = {bf1, bf2, ..., bfp}; the accuracy D of scheme BF can then be computed from BF:
Splitting is an iterative process, and the i-th step proceeds as follows: starting from the previous split BFi-1, perform one further split at every position where a split can be made, forming the candidate schemes BF'i = {BF'i1, BF'i2, BF'i3, ..., BF'iw}; compute the accuracy D of each candidate, and take the candidate BF'iu with the maximum D as the next splitting scheme BFi.
The previous step yields a series of segmentation steps BF1, BF2, BF3, ..., BFg with corresponding accuracies D1, D2, D3, ..., Dg. Take the segmentation gain δDi = Di+1 - Di; after each split, compute the mean μ and variance ν of the set of δDi values, and stop when the stopping condition on δDi, expressed in terms of μ and ν, is satisfied.
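The greedy, iterative splitting can be sketched as below. Two caveats: the patent's exact accuracy formula is not reproduced in the text, so this sketch assumes D is the sum over diagonal blocks of (block sum) / a with a = (y - x + 1)², i.e. the mean similarity inside each block; and the μ/ν stopping rule is simplified to a fixed number of steps:

```python
import numpy as np

# ASSUMED accuracy: sum over diagonal blocks of block-sum / (size**2).
def block_density(M, x, y):
    a = (y - x + 1) ** 2
    return M[x:y + 1, x:y + 1].sum() / a

def accuracy(M, bounds):
    """bounds: sorted interior boundary indices; boundary b starts a block."""
    edges = [0] + list(bounds) + [M.shape[0]]
    return sum(block_density(M, edges[i], edges[i + 1] - 1)
               for i in range(len(edges) - 1))

def greedy_split(M, steps):
    """Each iteration adds the single boundary that maximizes accuracy.
    (The mu/nu stopping rule of the text is replaced by a step count.)"""
    bounds = []
    for _ in range(steps):
        candidates = [sorted(bounds + [b])
                      for b in range(1, M.shape[0]) if b not in bounds]
        if not candidates:
            break
        bounds = max(candidates, key=lambda bf: accuracy(M, bf))
    return bounds

# Two obvious blocks: sentences {0,1} similar to each other, and {2,3}.
M = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])
print(greedy_split(M, steps=1))  # [2]
```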
The whole video is then segmented using its image features; the RGB histogram difference of two adjacent frames is computed by the following formula:
where H(f1, R, i) is the histogram value of the i-th red level of frame f1, H(f1, G, i) is the histogram value of the i-th green level of frame f1, and H(f1, B, i) is the histogram value of the i-th blue level of frame f1.
Then the mean μ and variance ν of the histogram differences of all adjacent frame pairs in the video are computed; when the difference between two frames exceeds the threshold μ + aν, the second frame is taken as an edge shot, where a is a tuning parameter.
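The shot-boundary rule above can be sketched as follows. Summing absolute bin differences over the R, G and B channels is an assumption, since the exact difference formula is not reproduced in the text; the bin count and toy frames are likewise illustrative:

```python
import numpy as np

def hist_diff(f1, f2, bins=16):
    """ASSUMED difference: sum of absolute per-bin RGB histogram
    differences between two HxWx3 frames with values in [0, 255]."""
    d = 0.0
    for c in range(3):  # R, G, B channels
        h1, _ = np.histogram(f1[..., c], bins=bins, range=(0, 256))
        h2, _ = np.histogram(f2[..., c], bins=bins, range=(0, 256))
        d += np.abs(h1 - h2).sum()
    return d

def edge_shots(frames, a=1.0):
    """Flag frame i+1 as an edge shot when the difference between frames
    i and i+1 exceeds mu + a * nu (mu = mean, nu = variance of all diffs)."""
    diffs = [hist_diff(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
    mu, nu = np.mean(diffs), np.var(diffs)
    return [i + 1 for i, d in enumerate(diffs) if d > mu + a * nu]

# Toy sequence: nine dark frames followed by one bright frame.
frames = [np.zeros((2, 2, 3))] * 9 + [np.full((2, 2, 3), 255.0)]
print(edge_shots(frames, a=0.1))  # [9]
```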
A suitable shot is then sought near each segmentation position found from the text. If the text places a segmentation position within a time interval, the shot with the largest change within that interval is taken as the cut; if no edge shot exists within that interval, the shot nearest to that interval is chosen as the cut; if the text places a segmentation position at a single frame, the shot nearest to that frame is chosen as the cut.
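The three alignment cases above can be sketched in one function. The representation of a position as either an interval tuple or a single time, and of each shot as a (time, change magnitude) pair, is an assumption for illustration:

```python
# Sketch of the alignment rule: snap each text-derived segmentation
# position to a shot boundary. `position` is either an interval (t0, t1)
# or a single time; `shots` is a list of (time, change) pairs for the
# detected edge shots. Tie-breaking details are assumptions.

def align_to_shot(position, shots):
    if isinstance(position, tuple):  # interval case
        t0, t1 = position
        inside = [s for s in shots if t0 <= s[0] <= t1]
        if inside:
            # shot with the largest change inside the interval
            return max(inside, key=lambda s: s[1])[0]
        # no edge shot inside: nearest shot to the interval
        return min(shots, key=lambda s: min(abs(s[0] - t0), abs(s[0] - t1)))[0]
    # single-frame case: nearest shot to the given time
    return min(shots, key=lambda s: abs(s[0] - position))[0]

shots = [(3.0, 0.2), (7.0, 0.9), (15.0, 0.5)]
print(align_to_shot((5.0, 10.0), shots))  # 7.0
print(align_to_shot(14.0, shots))         # 15.0
```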
Accompanying drawing explanation
Fig. 1 is an example of the original similarity matrix;
Fig. 2 is an example of the optimized similarity matrix;
Fig. 3 is an example of selecting the splitting scheme.
Detailed description of the invention
A content-based video segmentation method comprises the following steps:
Extract from the subtitle file the dialogue stream S = {s1, s2, s3, ..., sn} of the video, the start time B = {b1, b2, b3, ..., bn} of each dialogue, and the end time E = {e1, e2, e3, ..., en} of each dialogue.
For all adjacent dialogues si, si-1, set a threshold λ; when bi - ei-1 < λ, classify si and si-1 into the same section. The dialogue stream S is thereby divided into m sections, where the i-th section starts from the k-th dialogue and consists of l consecutive dialogues, i.e. Si = {sk, sk+1, sk+2, ..., sk+l-1}.
Apply the ICTCLAS word segmentation tool to each sentence sk; after removing non-content words, obtain the word list Ck = {ck1, ck2, ck3, ..., ckh} of sk.
Compute the similarity between any two sentences sx and sy by the following formula:
where f(cxi) is the term vector of word cxi and f(cyi) is the term vector of word cyi. The similarity of two words cxi and cyi is obtained as the dot product f(cxi)·f(cyi) of their term vectors.
For every Si, compute, using the similarity defined above, the similarity matrix Mi between all sentences in Si, as shown in Fig. 1. To make the structure of the matrix more apparent, Mi is transformed as follows:
First, apply the following transformation to every value x in Mi:
where g(x) is:
Then, for every value mx,y in the transformed Mi, take a statistic over its 1×11 window: count how many of the values {mx,y-5, mx,y-4, ..., mx,y+4, mx,y+5} are smaller than mx,y, and replace mx,y with that count, as shown in Fig. 2.
As shown in Fig. 3, the replaced matrix then needs to be split; the splitting method is as follows:
First, define the accuracy of a splitting scheme: let ax,y = (y - x + 1)², and suppose splitting Si yields the splitting scheme BF = {bf1, bf2, ..., bfp}; the accuracy D of scheme BF can then be computed from BF:
Splitting is an iterative process, and the i-th step proceeds as follows: starting from the previous split BFi-1, perform one further split at every position where a split can be made, forming the candidate schemes BF'i = {BF'i1, BF'i2, BF'i3, ..., BF'iw}; compute the accuracy D of each candidate, and take the candidate BF'iu with the maximum D as the next splitting scheme BFi.
The previous step yields a series of segmentation steps BF1, BF2, BF3, ..., BFg with corresponding accuracies D1, D2, D3, ..., Dg. Take the segmentation gain δDi = Di+1 - Di; after each split, compute the mean μ and variance ν of the set of δDi values, and stop when the stopping condition on δDi, expressed in terms of μ and ν, is satisfied.
The whole video is then segmented using its image features; the RGB histogram difference of two adjacent frames is computed by the following formula:
where H(f1, R, i) is the histogram value of the i-th red level of frame f1, H(f1, G, i) is the histogram value of the i-th green level of frame f1, and H(f1, B, i) is the histogram value of the i-th blue level of frame f1.
Then the mean μ and variance ν of the histogram differences of all adjacent frame pairs in the video are computed; when the difference between two frames exceeds the threshold μ + aν, the second frame is taken as an edge shot, where a is a tuning parameter.
A suitable shot is then sought near each segmentation position found from the text. If the text places a segmentation position within a time interval, the shot with the largest change within that interval is taken as the cut; if no edge shot exists within that interval, the shot nearest to that interval is chosen as the cut; if the text places a segmentation position at a single frame, the shot nearest to that frame is chosen as the cut.
Claims (2)
1. A content-based video segmentation method, characterized by comprising the following steps:
S01: extract from the subtitle file the dialogue stream S={s1,s2,s3,...,sn} of the video, the start time B={b1,b2,b3,...,bn} of each dialogue, and the end time E={e1,e2,e3,...,en} of each dialogue;
S02: for all adjacent dialogues si,si-1, set a threshold λ; when bi-ei-1 < λ, classify si and si-1 into the same section, so that the dialogue stream S is divided into m sections, where the i-th section starts from the k-th dialogue and consists of l consecutive dialogues, i.e. Si={sk,sk+1,sk+2,...,sk+l-1};
S03: apply a word segmentation tool to each sentence sk; after removing non-content words, obtain the word list Ck={ck1,ck2,ck3,...,ckh} of sk;
S04: compute the similarity between any two sentences sx and sy by the following formula:
where f(cxi) is the term vector of word cxi and f(cyi) is the term vector of word cyi; the similarity of two words cxi,cyi is obtained as the dot product f(cxi)·f(cyi) of their term vectors;
S05: for every Si, use step S04 to compute the similarity matrix Mi between all sentences in Si, split Mi, and obtain the splitting scheme of the corresponding dialogue;
S06: extract edge shots from the whole video using its image features;
S07: find the optimal cut near each segmentation position found from the text: if the text places a segmentation position within a time interval, take the shot with the largest change within that interval as the cut; if no edge shot exists within that interval, choose the shot nearest to that interval as the cut; if the text places a segmentation position at a single frame, choose the shot nearest to that frame as the cut.
2. The content-based video segmentation method according to claim 1, characterized in that the method for splitting Mi in step S05 is:
1) transform Mi: first, apply the following transformation to every value x in Mi:
where g(x) is:
then, for every value mx,y in the transformed Mi, take a statistic over its 1×11 window: count how many of the values {mx,y-5,mx,y-4,...,mx,y+4,mx,y+5} are smaller than mx,y, and replace mx,y with that count;
2) split the replaced Mi as follows:
first, define the accuracy of a splitting scheme: let ax,y=(y-x+1)², and suppose splitting Si yields the splitting scheme BF={bf1,bf2,...,bfp}; the accuracy D of scheme BF can then be computed from BF:
splitting is an iterative process, and the k-th step proceeds as follows: starting from the previous split BFk-1, perform one further split at every position where a split can be made, forming the candidate schemes BF'k={BF'k1,BF'k2,BF'k3,...BF'kw}; compute the accuracy D of each candidate, and take the candidate BF'ku with the maximum D as the next splitting scheme BFk;
3) step 2) yields a series of segmentation steps BF1,BF2,BF3,...,BFg with corresponding accuracies D1,D2,D3,...,Dg; take the segmentation gain δDi=Di+1-Di; after each split, compute the mean μ and variance ν of the set of δDi values, and stop when the stopping condition on δDi, expressed in terms of μ and ν, is satisfied.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610066554.0A CN105744356B (en) | 2016-01-29 | 2016-01-29 | A kind of video segmentation method based on content |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610066554.0A CN105744356B (en) | 2016-01-29 | 2016-01-29 | A kind of video segmentation method based on content |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105744356A true CN105744356A (en) | 2016-07-06 |
CN105744356B CN105744356B (en) | 2019-03-12 |
Family
ID=56247165
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610066554.0A Active CN105744356B (en) | 2016-01-29 | 2016-01-29 | A kind of video segmentation method based on content |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105744356B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107438204A (en) * | 2017-07-26 | 2017-12-05 | 维沃移动通信有限公司 | A kind of method and mobile terminal of media file loop play |
CN108235141A (en) * | 2018-03-01 | 2018-06-29 | 北京网博视界科技股份有限公司 | Live video turns method, apparatus, server and the storage medium of fragmentation program request |
CN108419123A (en) * | 2018-03-28 | 2018-08-17 | 广州市创新互联网教育研究院 | A kind of virtual sliced sheet method of instructional video |
WO2020119464A1 (en) * | 2018-12-12 | 2020-06-18 | 华为技术有限公司 | Video splitting method and electronic device |
CN112291634A (en) * | 2019-07-25 | 2021-01-29 | 腾讯科技(深圳)有限公司 | Video processing method and device |
WO2021047532A1 (en) * | 2019-09-10 | 2021-03-18 | Huawei Technologies Co., Ltd. | Method and system for video segmentation |
CN113160273A (en) * | 2021-03-25 | 2021-07-23 | 常州工学院 | Intelligent monitoring video segmentation method based on multi-target tracking |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030161396A1 (en) * | 2002-02-28 | 2003-08-28 | Foote Jonathan T. | Method for automatically producing optimal summaries of linear media |
CN101047795A (en) * | 2006-03-30 | 2007-10-03 | 株式会社东芝 | Moving image division apparatus, caption extraction apparatus, method and program |
CN101719144A (en) * | 2009-11-04 | 2010-06-02 | 中国科学院声学研究所 | Method for segmenting and indexing scenes by combining captions and video image information |
US20110069939A1 (en) * | 2009-09-23 | 2011-03-24 | Samsung Electronics Co., Ltd. | Apparatus and method for scene segmentation |
CN102833638A (en) * | 2012-07-26 | 2012-12-19 | 北京数视宇通技术有限公司 | Automatic video segmentation and annotation method and system based on caption information |
-
2016
- 2016-01-29 CN CN201610066554.0A patent/CN105744356B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030161396A1 (en) * | 2002-02-28 | 2003-08-28 | Foote Jonathan T. | Method for automatically producing optimal summaries of linear media |
CN101047795A (en) * | 2006-03-30 | 2007-10-03 | 株式会社东芝 | Moving image division apparatus, caption extraction apparatus, method and program |
US20110069939A1 (en) * | 2009-09-23 | 2011-03-24 | Samsung Electronics Co., Ltd. | Apparatus and method for scene segmentation |
CN101719144A (en) * | 2009-11-04 | 2010-06-02 | 中国科学院声学研究所 | Method for segmenting and indexing scenes by combining captions and video image information |
CN102833638A (en) * | 2012-07-26 | 2012-12-19 | 北京数视宇通技术有限公司 | Automatic video segmentation and annotation method and system based on caption information |
Non-Patent Citations (4)
Title |
---|
M XU, LT CHIA, H YI, D RAJAN: "Affective content detection in sitcom using subtitle and audio", 《Multi-Media Modelling Conference Proceedings, 2006 12th International》 *
ZHU Yingying, ZHOU Dongru: "Video segmentation based on video, audio and text" (in Chinese), 《Computer Engineering and Applications》 *
LI Songbin, WANG Lingfang, WANG Jinlin: "Video segmentation method based on script and subtitle information" (in Chinese), 《Computer Engineering》 *
TIAN Bo: "Research on content-based video segmentation technology" (in Chinese), 《China Masters' Theses Full-text Database》 *
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107438204B (en) * | 2017-07-26 | 2019-12-17 | 维沃移动通信有限公司 | Method for circularly playing media file and mobile terminal |
CN107438204A (en) * | 2017-07-26 | 2017-12-05 | 维沃移动通信有限公司 | A kind of method and mobile terminal of media file loop play |
CN108235141B (en) * | 2018-03-01 | 2020-11-20 | 北京网博视界科技股份有限公司 | Method, device, server and storage medium for converting live video into fragmented video on demand |
CN108235141A (en) * | 2018-03-01 | 2018-06-29 | 北京网博视界科技股份有限公司 | Live video turns method, apparatus, server and the storage medium of fragmentation program request |
CN108419123A (en) * | 2018-03-28 | 2018-08-17 | 广州市创新互联网教育研究院 | A kind of virtual sliced sheet method of instructional video |
CN108419123B (en) * | 2018-03-28 | 2020-09-04 | 广州市创新互联网教育研究院 | Virtual slicing method for teaching video |
WO2020119464A1 (en) * | 2018-12-12 | 2020-06-18 | 华为技术有限公司 | Video splitting method and electronic device |
US11902636B2 (en) | 2018-12-12 | 2024-02-13 | Petal Cloud Technology Co., Ltd. | Video splitting method and electronic device |
CN112291634A (en) * | 2019-07-25 | 2021-01-29 | 腾讯科技(深圳)有限公司 | Video processing method and device |
WO2021047532A1 (en) * | 2019-09-10 | 2021-03-18 | Huawei Technologies Co., Ltd. | Method and system for video segmentation |
US10963702B1 (en) | 2019-09-10 | 2021-03-30 | Huawei Technologies Co., Ltd. | Method and system for video segmentation |
CN114342353A (en) * | 2019-09-10 | 2022-04-12 | 华为技术有限公司 | Method and system for video segmentation |
CN113160273A (en) * | 2021-03-25 | 2021-07-23 | 常州工学院 | Intelligent monitoring video segmentation method based on multi-target tracking |
Also Published As
Publication number | Publication date |
---|---|
CN105744356B (en) | 2019-03-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105744356A (en) | Content-based video segmentation method | |
CN110147788B (en) | Feature enhancement CRNN-based metal plate strip product label character recognition method | |
CN103559196B (en) | Video retrieval method based on multi-core canonical correlation analysis | |
CN104376105B (en) | The Fusion Features system and method for image low-level visual feature and text description information in a kind of Social Media | |
CN108804578B (en) | Unsupervised video abstraction method based on consistency segment generation | |
CN103559237B (en) | Semi-automatic image annotation sample generating method based on target tracking | |
US8203554B2 (en) | Method and apparatus for identifying visual content foregrounds | |
CN102799669B (en) | Automatic grading method for commodity image vision quality | |
US10497166B2 (en) | Home filling method using estimated spatio-temporal background information, and recording medium and apparatus for performing the same | |
KR102024867B1 (en) | Feature extracting method of input image based on example pyramid and apparatus of face recognition | |
EP2568429A1 (en) | Method and system for pushing individual advertisement based on user interest learning | |
Varga et al. | Fully automatic image colorization based on Convolutional Neural Network | |
CN106792005B (en) | Content detection method based on audio and video combination | |
Sun et al. | Specific comic character detection using local feature matching | |
CN103942794A (en) | Image collaborative cutout method based on confidence level | |
US20210319230A1 (en) | Keyframe Extractor | |
CN104636761A (en) | Image semantic annotation method based on hierarchical segmentation | |
CN111210402A (en) | Face image quality scoring method and device, computer equipment and storage medium | |
CN104978565A (en) | Universal on-image text extraction method | |
CN101216943A (en) | A method for video moving object subdivision | |
Nugraha et al. | Video recognition of American sign language using two-stream convolution neural networks | |
CN102930292A (en) | Object identification method based on p-SIFT (Scale Invariant Feature Transform) characteristic | |
CN109213974B (en) | Electronic document conversion method and device | |
CN102831161A (en) | Semi-supervision sequencing study method for image searching based on manifold regularization | |
WO2020022329A1 (en) | Object detection/recognition device, method, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |