CN109101920A - Video time domain unit partitioning method - Google Patents

Video time domain unit partitioning method

Info

Publication number
CN109101920A
CN109101920A
Authority
CN
China
Prior art keywords
video
time domain
formula
domain unit
MSSI
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810889588.9A
Other languages
Chinese (zh)
Other versions
CN109101920B (en)
Inventor
Zhang Yunzuo
Park Chun Hui
Zhang Shasha
Li Menxuan
Yao Huisong
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shijiazhuang Tiedao University
Original Assignee
Shijiazhuang Tiedao University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shijiazhuang Tiedao University filed Critical Shijiazhuang Tiedao University
Priority to CN201810889588.9A priority Critical patent/CN109101920B/en
Publication of CN109101920A publication Critical patent/CN109101920A/en
Application granted granted Critical
Publication of CN109101920B publication Critical patent/CN109101920B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/49: Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/136: Segmentation; Edge detection involving thresholding
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/194: Segmentation; Edge detection involving foreground-background segmentation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10016: Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a video time domain unit partitioning method, relating to the technical field of video processing methods. The method comprises the following steps: extracting a horizontal spatio-temporal slice at the center of the video caption; computing the minimal spatiotemporal semantic information (MSSI) of each video frame from the extracted horizontal spatio-temporal slice; detecting mutations of the MSSI of the video; and partitioning video time domain units with the MSSI mutations as boundaries. The method achieves high accuracy when partitioning time domain units. Compared with the reference methods, it has strong interference resistance, since it extracts only a single row of pixels of the video for detection; its computational complexity is low and its computation amount small, giving it a clear advantage in computation time.

Description

Video time domain unit partitioning method
Technical field
The present invention relates to the technical field of video processing methods, and more particularly to a video time domain unit partitioning method.
Background art
Video is an unstructured data stream whose essence is a series of temporally continuous image frames. These image frames have only a mutual temporal ordering, without any structural information. Video temporal segmentation detects hierarchical structural units from a digital video sequence according to the video content or a specific representation, and builds index information for the structural units of different levels, so that the video data can be stored, managed, analyzed, and processed according to its particular content. Video temporal segmentation hierarchically partitions the video data stream along the time axis, completing the transformation from the original unstructured video stream to structured video entities. The original video stream is divided into meaningful, easily managed video time domain units, forming a hierarchical structure of the video content.
Video temporal segmentation is the foundation and a key step of video condensation, retrieval, and browsing. The shot is the structural unit most widely used in video analysis, and existing video temporal segmentation methods based on shot boundary detection usually take the degree of change of video features as the basis for partitioning video shots. These video features include color, shape, edges, and motion vectors. Pixel-domain processing algorithms mainly exploit color histogram features for video shot segmentation, while compressed-domain video partitioning algorithms generally exploit motion vector features. The roadmaps of these two classes of algorithms are almost identical: a shot boundary is determined by comparing the feature difference between adjacent video frames with a given threshold. If the feature difference is greater than the given threshold, the current position is regarded as a shot boundary; otherwise it is not. The accuracy of shot boundary detection therefore depends on the definition of the feature difference and the setting of the threshold. However, the shot is not the smallest temporal semantic unit; the sub-shot is more accurate in describing video semantic changes. But sub-shot detection must be carried out on the basis of shot segmentation, with complex steps and heavy computation.
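For illustration only, the following minimal sketch shows the conventional roadmap described above, using a color-histogram feature and a fixed threshold; the threshold value, the histogram binning, and the choice of the Bhattacharyya distance are assumptions made for the example, not part of the present invention.

```python
# Minimal sketch of conventional threshold-based shot boundary detection
# (illustrative only; threshold and histogram setup are assumptions).
import cv2

def detect_shot_boundaries(path, threshold=0.4):
    cap = cv2.VideoCapture(path)
    boundaries, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # 8x8x8-bin BGR color histogram as the frame feature
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Bhattacharyya distance: larger value = larger feature difference
            diff = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if diff > threshold:
                boundaries.append(idx)  # current position regarded as a boundary
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries
```

The sketch makes the stated weakness concrete: the result depends entirely on how the feature difference and the threshold are chosen.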
Summary of the invention
The technical problem to be solved by the invention is to provide a video partitioning method that has strong interference resistance, low computational complexity, and a small amount of computation, and that partitions video accurately.
In order to solve the above technical problem, the technical solution adopted by the present invention is a video time domain unit partitioning method, characterized by comprising the following steps:
extracting a horizontal spatio-temporal slice at the center of the video caption;
computing the minimal spatiotemporal semantic information MSSI of each video frame from the extracted horizontal spatio-temporal slice;
detecting mutations of the MSSI of the video;
partitioning video time domain units with the MSSI mutations as boundaries, as chained together in the sketch below.
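Read end to end, the four steps combine as in the following minimal driver sketch; the helper names extract_horizontal_slice, compute_mssi, and detect_boundaries are hypothetical, and are elaborated in the per-step sketches in the detailed description below. The defaults tau=10 and w0=20 follow the experimental settings reported later.

```python
# End-to-end sketch of the four claimed steps (helper names are hypothetical;
# see the per-step sketches in the detailed description).
def partition_time_domain_units(path, y_c, tau=10, w0=20):
    S = extract_horizontal_slice(path, y_c)   # step 1: slice at caption center
    mssi = compute_mssi(S, tau=tau)           # step 2: per-frame MSSI
    return detect_boundaries(mssi, w0=w0)     # steps 3-4: mutations -> boundaries
```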
In a further technical solution, for a video V(x, y, t), the horizontal spatio-temporal slice S at the caption center may be expressed as
S_i^j = V(j, y_c, i), j ∈ [1, W], i ∈ [1, L]    (1)
In the formula, S_i^j denotes the pixel of video V at position x = j, t = i, with y_c the mid-height of the caption; W is the width of a video frame and L is the length of the video.
In a further technical solution, the method further comprises a step of preprocessing the horizontal spatio-temporal slice S, as follows:
the preprocessing is performed with an adaptive Gaussian mixture background model; each column of the horizontal spatio-temporal slice S serves as one input to the Gaussian model, and the model parameters are updated column by column; the update rules for the Gaussian mean μ and variance δ² are given by formula (2);
in formula (2), S_{t+1} is the brightness of the (t+1)-th column of the slice S, and α is the adjustment rate, defined by formula (3), in which M_n is the number of matches;
each pixel S_i^j of the slice S is tested for whether it obeys the N(μ, δ) distribution, and the foreground caption is then computed by formula (4).
According to formula (4), the caption on the horizontal spatio-temporal slice S is separated from the background as foreground; the minimal spatiotemporal semantic information MSSI of the i-th frame of video V(x, y, t) can then be computed by formula (5), whose per-pixel term is given by formula (6). In these formulas, τ measures the magnitude of the minimal spatiotemporal semantic information of a single pixel; pixels whose MSSI is below τ are regarded as interference and discarded.
In a further technical solution, a mutation of the MSSI occurs at a video time domain unit boundary; denoting the mutation rate by Δ, then according to formula (5), Δ can be computed by formula (7).
From formula (7), Δ covers both a sudden increase and a sudden decrease of the MSSI, and both cases correspond to boundaries of video time domain units; the boundary B function of the video time domain unit is accordingly defined by formula (8).
In the formula, w_0 denotes the significance threshold for the MSSI mutation between the current caption frame and its preceding caption frame.
The B function curve is computed according to formula (8); the peaks of the curve correspond to video time domain unit boundaries, and the partitioning of video time domain units can be completed from the B function curve.
The beneficial effects of adopting the above technical solution are as follows: the method defines the video caption as the sub-shot unit carrying minimal semantics and, correspondingly, maps mutations of the minimal spatiotemporal semantic information (Minimal Spatiotemporal Semantic Information, MSSI) to the boundaries of video time domain units. The method achieves high accuracy when partitioning time domain units; compared with the reference methods, it has strong interference resistance, since it extracts only a single row of pixels of the video for detection, its computational complexity is low and its computation amount small, and it holds a clear advantage in computation time.
Brief description of the drawings
The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a schematic diagram of video spatio-temporal slices in different directions in an embodiment of the present invention;
Fig. 2 is an example of the horizontal spatio-temporal slice at the caption center in an embodiment of the present invention;
Fig. 3 is a flowchart of the method of an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth to facilitate a full understanding of the present invention, but the present invention can also be implemented in ways other than those described here. Those skilled in the art can make similar generalizations without departing from the spirit of the present invention, and therefore the present invention is not limited by the specific embodiments disclosed below.
Overall, as shown in Fig. 3, the invention discloses a video time domain unit partitioning method that includes the following steps:
First, a horizontal spatio-temporal slice is extracted at the center of the video caption;
second, the minimal spatiotemporal semantic information MSSI of each video frame is computed from the extracted horizontal spatio-temporal slice;
then, mutations of the MSSI of the video are detected;
finally, video time domain units are partitioned with the MSSI mutations as boundaries.
The above method is explained below with reference to the specific technical means:
The video spatio-temporal slice is an efficient video analysis tool with advantages such as low computation and strong robustness. It extracts only a row or column of the image space, yet retains the complete temporal information of the video, and the loss of spatial information can be compensated by fusing the information of multiple slices. Analyzing video with the aid of rich historical temporal information effectively avoids interference. Spatio-temporal slices are usually taken in three directions: horizontal, vertical, and diagonal, as shown in Fig. 1. The video spatio-temporal slices of different directions reflect different target object information and video scene information. The analysis object of this method is the video caption; captions are mostly located at the bottom of the video and arranged horizontally, so the horizontal spatio-temporal slice is selected. An example of the horizontal spatio-temporal slice at the caption center is shown in Fig. 2.
For a video V(x, y, t), the horizontal spatio-temporal slice S at the caption center may be expressed as
S_i^j = V(j, y_c, i), j ∈ [1, W], i ∈ [1, L]    (1)
In the formula, S_i^j denotes the pixel of video V at position x = j, t = i, with y_c the mid-height of the caption; W is the width of a video frame and L is the length of the video.
As formula (1) shows, the horizontal spatio-temporal slice extracts only a single row of pixels of the caption image space, yet retains the complete temporal information of the video, and its spatial information reflects semantic cues such as the structure and presence of the caption. It is therefore feasible to analyze the minimal semantic information of a video with the video spatio-temporal slice, and the amount of data to be processed is greatly reduced.
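As a concrete reading of formula (1), the following minimal sketch stacks one pixel row per frame into an L × W slice; the caption row index y_c is assumed to be known for the given video, and OpenCV is used only for decoding.

```python
# Sketch of formula (1): stack one pixel row per frame into an L x W slice.
# y_c (the caption mid-height) is an assumed, video-specific input.
import cv2
import numpy as np

def extract_horizontal_slice(path, y_c):
    cap = cv2.VideoCapture(path)
    rows = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        rows.append(gray[y_c, :])        # S_i^j = V(j, y_c, i), j in [1, W]
    cap.release()
    return np.stack(rows, axis=0)        # slice S, shape (L, W)
```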
Video captions contain rich video semantic information; the video semantic content corresponding to one and the same caption is relatively complete and remains essentially unchanged. Based on this observation, the method defines the video caption as the sub-shot unit carrying minimal semantics and, correspondingly, maps mutations of the minimal spatiotemporal semantic information MSSI to the boundaries of time domain units. Existing sub-shot detection methods are carried out on the basis of shot segmentation, with complex steps and heavy computation, and can hardly meet the practical demand for efficient processing of massive video data. The MSSI of a video can be analyzed and characterized by the caption spatio-temporal slice, so the method uses the video caption spatio-temporal slice to detect mutations of the MSSI.
As can be seen from Fig. 3, for an input video sequence, slice extraction is performed first: the horizontal spatio-temporal slice S at the caption center is extracted from the input video sequence according to formula (1). The caption information in the slice S characterizes the minimal spatiotemporal semantic information MSSI; to obtain an accurate MSSI, the horizontal spatio-temporal slice S is first preprocessed. The preprocessing is performed with an adaptive Gaussian mixture background model: each column of the horizontal spatio-temporal slice S serves as one input to the Gaussian model, and the model parameters are updated column by column. The update rules for the Gaussian mean μ and variance δ² are given by formula (2).
In formula (2), S_{t+1} is the brightness of the (t+1)-th column of the slice S, and α is the adjustment rate, defined by formula (3), in which M_n is the number of matches.
Each pixel S_i^j of the slice S is tested for whether it obeys the N(μ, δ) distribution, and the foreground caption is then computed by formula (4).
According to formula (4), the caption on the slice S is separated from the background as foreground. The minimal spatiotemporal semantic information MSSI of the i-th frame of video V(x, y, t) can then be computed by formula (5), whose per-pixel term is given by formula (6).
τ measures the magnitude of the minimal spatiotemporal semantic information of a single pixel; pixels whose MSSI is below τ are regarded as interference and discarded.
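Formulas (2) through (6) appear as figures in the original document, so the sketch below is only one plausible reading of the preprocessing and MSSI computation: a single running Gaussian per slice column (a common special case of the adaptive mixture model), a k·δ foreground test, and the MSSI of a frame taken as the number of foreground pixels whose response exceeds τ. All of these concrete choices are assumptions.

```python
# Hedged sketch of the preprocessing and MSSI computation (formulas (2)-(6)
# are figures in the original patent; the update rule, foreground test, and
# per-pixel response below are standard adaptive-Gaussian-background
# assumptions, not a verbatim transcription).
import numpy as np

def compute_mssi(S, tau=10, alpha=0.05, k=2.5):
    L, W = S.shape
    mu = S[0].astype(np.float64)          # per-column Gaussian mean
    var = np.full(W, 15.0 ** 2)           # per-column Gaussian variance
    mssi = np.zeros(L)
    for i in range(L):
        row = S[i].astype(np.float64)
        fg = np.abs(row - mu) > k * np.sqrt(var)    # foreground (caption) test
        # per-pixel response; responses below tau are treated as interference
        response = np.where(fg, np.abs(row - mu), 0.0)
        mssi[i] = np.count_nonzero(response > tau)
        # running update of the per-column mean and variance
        mu = (1 - alpha) * mu + alpha * row
        var = (1 - alpha) * var + alpha * (row - mu) ** 2
    return mssi
```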
To complete the partitioning of video time domain units, the boundaries of the time domain units must first be detected. A mutation of the MSSI occurs at a video time domain unit boundary; denoting the mutation rate by Δ, then according to formula (5), Δ can be computed by formula (7).
As can be seen from formula (7), Δ covers both a sudden increase and a sudden decrease of the MSSI, and both cases correspond to boundaries of video time domain units. For simplicity, the boundary B function of the video time domain unit is defined by formula (8).
In the formula, w_0 denotes the significance threshold for the MSSI mutation between the current caption frame and its preceding caption frame.
The B function curve is computed according to formula (8); the peaks of the curve correspond to video time domain unit boundaries, and the partitioning of video time domain units can be completed from the B function curve.
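Formulas (7) and (8) are likewise figures in the original; under the description above (Δ covers both sudden increases and decreases, and B peaks where the mutation is significant relative to w_0), one plausible sketch of the boundary decision is:

```python
# Hedged sketch of formulas (7)-(8): mutation rate and boundary B function.
import numpy as np

def detect_boundaries(mssi, w0=20):
    delta = np.diff(mssi)                 # sudden increase (>0) or decrease (<0)
    b = np.abs(delta) > w0                # B "peaks": significant mutations only
    return np.flatnonzero(b) + 1          # frame indices of unit boundaries
```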
In order to verify the effectiveness of the method, it was compared with an existing mainstream approach (Petersohn C. Sub-Shots: Basic Units of Video. International Workshop on Systems, Signals and Image Processing, 2007, and EURASIP Conference Focused on Speech and Image Processing, Multimedia Communications and Services. IEEE, 2007: 323-326). The comparative experiments were carried out on five different types of captioned videos, as shown in Table 1:
Table 1. Test video information
Video 1 is an open course of Renmin University of China; its caption text is Chinese, the captions separate clearly from the background, and the shot changes are abrupt. Video 2 is a TEDxSuzhou talk; its caption text is mixed Chinese and English, the captions separate clearly from the background, and the shot changes are abrupt. Video 3 is an open course of Zhejiang University; its caption text is Chinese, the captions separate clearly from the background, and the shot changes combine abrupt cuts with gradual transitions. Video 4 is a TED talk; its caption text is English, the captions are embedded in the background and strongly affected by it, and the shot changes are abrupt. Video 5 is an open course of Oxford University; its caption text is mixed Chinese and English and overlaps with the background, and the shot changes combine abrupt cuts with gradual transitions in varied forms. The experimental parameters were set to τ = 10 and w_0 = 20. The experiments were completed on an ordinary personal computer with an Intel(R) Core(TM) i3 M380 @ 2.53 GHz CPU and 8 GB of memory.
The comparison covers three aspects: processing time, recall rate, and accuracy rate. The recall rate R_r is defined as R_r = F_CZ / F_Cs, and the accuracy rate R_a is defined as R_a = F_CZ / F_Ct.
In the formulas, F_CZ denotes the number of correctly extracted video time domain units, F_Cs the number of video time domain units actually present, and F_Ct the total number of extracted video time domain units.
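Under the definitions just given, the two measures reduce to the usual recall/precision computation. In the sketch below, the rule that a detection matches a ground-truth boundary within ±tol frames is an assumption, since the document does not state a matching tolerance.

```python
# Sketch of recall/accuracy evaluation for detected boundaries.
# Matching a detection to ground truth within +/- tol frames is an assumption.
def evaluate(detected, ground_truth, tol=5):
    matched = {g for g in ground_truth
               if any(abs(d - g) <= tol for d in detected)}
    f_cz, f_cs, f_ct = len(matched), len(ground_truth), len(detected)
    recall = f_cz / f_cs if f_cs else 0.0       # R_r = F_CZ / F_Cs
    accuracy = f_cz / f_ct if f_ct else 0.0     # R_a = F_CZ / F_Ct
    return recall, accuracy
```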
The comparison results are shown in Tables 2, 3, 4, 5, and 6, respectively:
Table 2. Method comparison on Video 1
Table 3. Method comparison on Video 2
Table 4. Method comparison on Video 3
Table 5. Method comparison on Video 4
Table 6. Method comparison on Video 5
The above experimental results show that the method achieves high accuracy when partitioning time domain units. Compared with the reference method, it has strong interference resistance, since it extracts only a single row of pixels of the video for detection; its computational complexity is low and its computation amount small, giving it a clear advantage in computation time.

Claims (4)

1. A video time domain unit partitioning method, characterized by comprising the following steps:
extracting a horizontal spatio-temporal slice at the center of the video caption;
computing the minimal spatiotemporal semantic information MSSI of each video frame from the extracted horizontal spatio-temporal slice;
detecting mutations of the MSSI of the video;
partitioning video time domain units with the MSSI mutations as boundaries.
2. The video time domain unit partitioning method as claimed in claim 1, characterized in that:
for a video V(x, y, t), the horizontal spatio-temporal slice S at the caption center may be expressed as
S_i^j = V(j, y_c, i), j ∈ [1, W], i ∈ [1, L]    (1)
in the formula, S_i^j denotes the pixel of video V at position x = j, t = i, with y_c the mid-height of the caption; W is the width of a video frame and L is the length of the video.
3. The video time domain unit partitioning method as claimed in claim 2, characterized in that the method further comprises a step of preprocessing the horizontal spatio-temporal slice S, as follows:
the preprocessing is performed with an adaptive Gaussian mixture background model; each column of the horizontal spatio-temporal slice S serves as one input to the Gaussian model, and the model parameters are updated column by column; the update rules for the Gaussian mean μ and variance δ² are given by formula (2);
in formula (2), S_{t+1} is the brightness of the (t+1)-th column of the slice S, and α is the adjustment rate, defined by formula (3), in which M_n is the number of matches;
each pixel S_i^j of the slice S is tested for whether it obeys the N(μ, δ) distribution, and the foreground caption is then computed by formula (4);
according to formula (4), the caption on the horizontal spatio-temporal slice S is separated from the background as foreground; the minimal spatiotemporal semantic information MSSI of the i-th frame of video V(x, y, t) can then be computed by formula (5), whose per-pixel term is given by formula (6);
τ measures the magnitude of the minimal spatiotemporal semantic information of a single pixel, and pixels whose MSSI is below τ are regarded as interference and discarded.
4. The video time domain unit partitioning method as claimed in claim 3, characterized in that:
a mutation of the MSSI occurs at a video time domain unit boundary; denoting the mutation rate by Δ, then according to formula (5), Δ can be computed by formula (7);
from formula (7), Δ covers both a sudden increase and a sudden decrease of the MSSI, and both cases correspond to boundaries of video time domain units; the boundary B function of the video time domain unit is accordingly defined by formula (8);
in the formula, w_0 denotes the significance threshold for the MSSI mutation between the current caption frame and its preceding caption frame;
the B function curve is computed according to formula (8); the peaks of the curve correspond to video time domain unit boundaries, and the partitioning of video time domain units can be completed from the B function curve.
CN201810889588.9A 2018-08-07 2018-08-07 Video time domain unit segmentation method Active CN109101920B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810889588.9A CN109101920B (en) 2018-08-07 2018-08-07 Video time domain unit segmentation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810889588.9A CN109101920B (en) 2018-08-07 2018-08-07 Video time domain unit segmentation method

Publications (2)

Publication Number Publication Date
CN109101920A 2018-12-28
CN109101920B 2021-06-25

Family

ID=64848631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810889588.9A Active CN109101920B (en) 2018-08-07 2018-08-07 Video time domain unit segmentation method

Country Status (1)

Country Link
CN (1) CN109101920B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822866A (en) * 2021-09-23 2021-12-21 深圳爱莫科技有限公司 Widely-adaptive axle number identification method, system, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101827224A (en) * 2010-04-23 2010-09-08 河海大学 Detection method of anchor shot in news video
CN105931270A (en) * 2016-04-27 2016-09-07 石家庄铁道大学 Video keyframe extraction method based on movement trajectory analysis
CN106101487A (en) * 2016-07-04 2016-11-09 石家庄铁道大学 Video spatiotemporal motion track extraction method
CN106210444A (en) * 2016-07-04 2016-12-07 石家庄铁道大学 Kinestate self adaptation key frame extracting method
US20160379055A1 (en) * 2015-06-25 2016-12-29 Kodak Alaris Inc. Graph-based framework for video object segmentation and extraction in feature space
CN108307229A (en) * 2018-02-02 2018-07-20 新华智云科技有限公司 A kind of processing method and equipment of video-audio data



Also Published As

Publication number Publication date
CN109101920B (en) 2021-06-25

Similar Documents

Publication Publication Date Title
Sindagi et al. Multi-level bottom-top and top-bottom feature fusion for crowd counting
CN107622258B (en) Rapid pedestrian detection method combining static underlying characteristics and motion information
CN103593464B (en) Video fingerprint detecting and video sequence matching method and system based on visual features
CN109151501A (en) A kind of video key frame extracting method, device, terminal device and storage medium
CN108600865B (en) A kind of video abstraction generating method based on super-pixel segmentation
CN108304808A (en) A kind of monitor video method for checking object based on space time information Yu depth network
CN112950477B (en) Dual-path processing-based high-resolution salient target detection method
CN103020606A (en) Pedestrian detection method based on spatio-temporal context information
CN102495887B (en) Video lens partitioning method based on color matrixes of key regions and application thereof
CN111401368B (en) News video title extraction method based on deep learning
CN106384359A (en) Moving target tracking method and television set
Wu et al. Overview of video-based vehicle detection technologies
CN109151616B (en) Video key frame extraction method
CN109215047B (en) Moving target detection method and device based on deep sea video
CN110378929A (en) A kind of across camera pedestrian track tracking of business place
CN109101920A (en) Video time domain unit partioning method
Leyva et al. Video anomaly detection based on wake motion descriptors and perspective grids
CN111160099B (en) Intelligent segmentation method for video image target
Zhang et al. Accurate overlay text extraction for digital video analysis
Zhao et al. Learning a perspective-embedded deconvolution network for crowd counting
CN113627383A (en) Pedestrian loitering re-identification method for panoramic intelligent security
Chen et al. ESTN: Exacter Spatiotemporal Networks for Violent Action Recognition
Chun et al. A method for original image recovery for caption areas in video
Yang et al. Objective performance evaluation of video segmentation algorithms with ground-truth
Ntalianis et al. An active contour-based video object segmentation scheme for stereoscopic video sequences

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Zhang Yunzuo

Inventor after: Zhang Shasha

Inventor after: Guo Yaning

Inventor after: Li Menxuan

Inventor after: Park Chun Hui

Inventor after: Yao Huisong

Inventor before: Zhang Yunzuo

Inventor before: Park Chun Hui

Inventor before: Zhang Shasha

Inventor before: Li Menxuan

Inventor before: Yao Huisong

GR01 Patent grant