CN102750349B - Video browsing method based on video semantic modeling - Google Patents


Info

Publication number
CN102750349B
Authority
CN
China
Prior art keywords
video
agent
semantic
browsing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201210188991.1A
Other languages
Chinese (zh)
Other versions
CN102750349A (en)
Inventor
谢小鹏
张昱
肖海兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201210188991.1A
Publication of CN102750349A
Application granted
Publication of CN102750349B
Expired - Fee Related
Anticipated expiration


Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a video browsing method based on video semantic modeling. The method involves a ViMeta-VU system, a video semantic browsing interface, a 2.5-dimensional affine converter, and intelligent agents. The ViMeta-VU system is a semantic video object classification and tracking system; the 2.5-dimensional affine converter performs automatic segmentation of semantic video object planes by combining statistical change detection with spatio-temporal filtering. The method comprises video segmentation, video semantic representation and modeling, and multiple access of video with mobile agents. On the basis of video semantic information, the method introduces intelligent agents that provide behaviors such as browsing, concentrating, and participating. The solution provides both fast access to video objects and flexibility, meets the needs of video shot retrieval, and helps improve the efficiency of video browsing.

Description

A video browsing method based on video semantic modeling
Technical field
The present invention relates to the field of image processing, and in particular to a video browsing method based on video semantic modeling.
Background technology
Much research has been done on techniques for providing video over the Internet. These techniques include the design of video servers [McCanne & Jacobson 1995], the provision of proxies [Floyd 1997], adaptive video coding [Rowe 1994], and modifications of low-level network protocols [McManus & Ross 1996, Chen 1998]. Currently, Microsoft's Windows Media Player and RealNetworks' RealPlayer use a pre-buffer that holds the video file while it is transmitted and decoded. This greatly reduces start-up delay, but playback can still freeze when transmission lags behind the video stream. Feng et al. [1998] proposed a pre-buffer management scheme that embeds hint information in the video stream; the scheme also requires monitoring the available bandwidth.
All of the above methods suffer from a fatal problem: the servers, proxies, and network protocols are not scalable, are unpredictable, and can serve only a limited number of accesses at a time. However efficient the server, proxy, or protocol, it becomes congested as the number of accesses grows, which can happen in a video-on-demand system. Applications that rely on pre-buffering or adaptively coded video streams remain impractical: a buffer large enough to be safe amounts to downloading the entire video, and adding cache management introduces extra overhead that greatly reduces the efficiency of the video server.
Summary of the invention
The object of the present invention is to overcome the above shortcomings and deficiencies of the prior art by providing a video browsing method based on video semantic modeling that improves the efficiency of video browsing.
The present invention is achieved through the following technical solution. A video browsing method based on video semantic modeling comprises a ViMeta-VU system, a video semantic browsing interface, a 2.5-dimensional affine converter, and intelligent agents. The ViMeta-VU system is a semantic video object classification and tracking system; the 2.5-dimensional affine converter performs automatic segmentation of semantic video object planes by combining statistical change detection with spatio-temporal filtering. The method comprises the following steps:
(1) Video segmentation: detect inter-frame motion and splice multiple frames;
(2) Video semantic representation and modeling: video semantic objects and visual objects are collectively referred to as perceptual objects; a frame is a complete unit; stacked frames form a continuous video sequence;
(3) Multiple access of video and mobile agents: agent AL and agent AM form a group of agents that jointly browse the video files of remote sites; agent AL is the agent installed on the local user's computer, and agent AM is the agent installed on the remote computer.
Agent AL obtains the video semantic feature information the user wants to browse from the user's interactive browsing of video segments; this information is then sent as an XML document to agent AM, which browses, retrieves the frame sequences, and sends the resulting pairs back to agent AL.
Compared with the prior art, the present invention has the following advantages and effects:
Video is a very complex medium, and the difficulty of manipulating and processing video data stems mainly from the lack of semantic understanding of the data. On the basis of video semantic information, the present invention introduces intelligent agents that provide behaviors such as browsing, concentrating, and participating. The solution provides not only fast access to video objects but also the flexibility to meet the needs of video shot retrieval. Applying the present invention helps improve the efficiency of video browsing. The method focuses on providing video objects over the Internet and has many applications, such as Web-based education, video-on-demand, electronic news, and computer-supported collaboration.
Brief description of the drawings
Fig. 1 is the video semantic structure diagram of the present invention.
Embodiment
The present invention is described in more detail below in conjunction with specific embodiments, but the embodiments of the present invention are not limited thereto; for technical parameters not specially noted, conventional techniques may be followed.
Embodiment
As shown in Fig. 1, the video browsing method based on video semantic modeling comprises a ViMeta-VU system, a video semantic browsing interface, a 2.5-dimensional affine converter, and intelligent agents. The ViMeta-VU system is a semantic video object classification and tracking system; the 2.5-dimensional affine converter performs automatic segmentation of semantic video object planes by combining statistical change detection with spatio-temporal filtering. The method comprises the following steps:
(1) Video segmentation: detect inter-frame motion and splice multiple frames.
An essential step in content-based segmentation is detecting the motion between frames and splicing multiple frames together. For the motion between frames, the method describes three rotation angles (roll, pitch, and yaw), denoted (α, β, γ), and three translations, denoted (T_x, T_y, T_z). Consider a spatial point (x, y, z) with image coordinates (u, v); in the next frame the point moves to (x', y', z') and its image moves to (u', v'). Suppose the camera focal length f becomes f' after the movement. Under the pinhole camera model, the coordinate relations between them are:
$$\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} a & b & c \\ i & j & k \\ l & m & n \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} - \begin{pmatrix} T_x \\ T_y \\ T_z \end{pmatrix}$$

and

$$u' = f'\,\frac{au + bv + cf - f\,T_x/z}{lu + mv + nf - f\,T_z/z}, \qquad v' = f'\,\frac{iu + jv + kf - f\,T_y/z}{lu + mv + nf - f\,T_z/z} \qquad (1)$$
If the rotation angles are smaller than 5 degrees, equation (1) can be approximated as:
$$u' \approx f'\,\frac{u + \alpha v - \gamma f - f\,T_x/z}{\gamma u - \beta v + f - f\,T_z/z}, \qquad v' \approx f'\,\frac{-\alpha u + v + \beta f - f\,T_y/z}{\gamma u - \beta v + f - f\,T_z/z} \qquad (2)$$
Letting

$$s = (\gamma u - \beta v + f - f\,T_z/z)/f' \qquad (3a)$$

we obtain:

$$s \cdot u' = u + \alpha v - \gamma f - f\,T_x/z, \qquad s \cdot v' = -\alpha u + v + \beta f - f\,T_y/z \qquad (3)$$
In the case of simple rotation, zoom, and small movement, the two-dimensional affine model is defined as:
$$s \cdot u' = u + \alpha v - T_u, \qquad s \cdot v' = v - \alpha u - T_v \qquad (4)$$
where s is the scale factor, (T_u, T_v) is the motion vector, and α is the rotation angle. Given more than three corresponding point pairs between two frames, the least-squares solution of the motion parameters in equation (4) can be obtained. In situations involving combined motion modes, however, the motion cannot be estimated without depth information.
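The patent gives no implementation of this step; the following Python/NumPy sketch shows one way the least-squares solution of equation (4) could be computed from matched point pairs. The function name and the stacked linear formulation are illustrative assumptions, not part of the claimed method.

```python
import numpy as np

def estimate_affine_2d(pts, pts_next):
    """Least-squares fit of the 2D affine model of equation (4):
        s*u' = u + alpha*v - Tu
        s*v' = v - alpha*u - Tv
    pts, pts_next: (N, 2) arrays of (u, v) and (u', v') coordinates of
    matched points in two consecutive frames, N >= 3.
    Returns (s, alpha, Tu, Tv)."""
    pts = np.asarray(pts, dtype=float)
    nxt = np.asarray(pts_next, dtype=float)
    n = len(pts)
    # Rewrite each pair of equations linearly in (s, alpha, Tu, Tv):
    #   s*u' - alpha*v + Tu = u
    #   s*v' + alpha*u + Tv = v
    A = np.zeros((2 * n, 4))
    b = np.zeros(2 * n)
    A[0::2, 0] = nxt[:, 0]   # u' multiplies s
    A[0::2, 1] = -pts[:, 1]  # -v multiplies alpha
    A[0::2, 2] = 1.0         # coefficient of Tu
    b[0::2] = pts[:, 0]
    A[1::2, 0] = nxt[:, 1]   # v' multiplies s
    A[1::2, 1] = pts[:, 0]   # u multiplies alpha
    A[1::2, 3] = 1.0         # coefficient of Tv
    b[1::2] = pts[:, 1]
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return tuple(x)  # s, alpha, Tu, Tv
```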
This limitation is addressed by introducing a 2.5-dimensional inter-frame motion model that contains a depth-related parameter for each point.
The model consists of three parts. For dolly motion (including rotation and zoom), we have T_x ≈ 0 and T_y ≈ 0, so the 2.5D dolly model is:
$$s_i \cdot u_i' = u_i + \alpha v_i - T_u, \qquad s_i \cdot v_i' = v_i - \alpha u_i - T_v \qquad (i = 1 \ldots N) \qquad (5)$$
For horizontal tracking motion (involving horizontal translation, roll, and zoom), we have T_z ≈ 0 and T_y ≈ 0. The 2.5D H-tracking model is:
$$s \cdot u_i' = u_i + \alpha v_i - T_{ui}, \qquad s \cdot v_i' = v_i - \alpha u_i - T_v \qquad (i = 1 \ldots N) \qquad (6)$$
Similarly, for vertical tracking motion (involving tilt, roll, and zoom), we have T_z ≈ 0 and T_x ≈ 0. The 2.5D V-tracking model is:
$$s \cdot u_i' = u_i + \alpha v_i - T_u, \qquad s \cdot v_i' = v_i - \alpha u_i - T_{vi} \qquad (i = 1 \ldots N) \qquad (7)$$
In most cases, the dominant camera motion fits one of these 2.5D motion patterns. Note that each of (5), (6), and (7) has N+3 variables in 2N equations. The N per-point variables differ from case to case and are what make the problem tractable: s_i is the depth-related parameter, T_ui is the per-point horizontal translation, and T_vi is the per-point vertical translation.
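In the same illustrative spirit, the over-determined dolly system (5) — 2N equations in the N+3 unknowns (s_1, ..., s_N, α, T_u, T_v) — could be solved by stacking it into a single linear least-squares problem, as sketched below; the H-tracking model (6) and V-tracking model (7) differ only in which N variables are per-point. The function name and formulation are assumptions for illustration.

```python
import numpy as np

def estimate_dolly_2p5d(pts, pts_next):
    """Least-squares fit of the 2.5D dolly model of equation (5):
        s_i*u'_i = u_i + alpha*v_i - Tu
        s_i*v'_i = v_i - alpha*u_i - Tv   (i = 1..N)
    2N equations in the N+3 unknowns (s_1..s_N, alpha, Tu, Tv).
    Returns (s, alpha, Tu, Tv), with s an array of the per-point
    depth-related parameters."""
    pts = np.asarray(pts, dtype=float)
    nxt = np.asarray(pts_next, dtype=float)
    n = len(pts)
    A = np.zeros((2 * n, n + 3))
    b = np.empty(2 * n)
    for i, ((u, v), (u2, v2)) in enumerate(zip(pts, nxt)):
        # s_i*u' - alpha*v + Tu = u
        A[2 * i, i] = u2
        A[2 * i, n] = -v
        A[2 * i, n + 1] = 1.0
        b[2 * i] = u
        # s_i*v' + alpha*u + Tv = v
        A[2 * i + 1, i] = v2
        A[2 * i + 1, n] = u
        A[2 * i + 1, n + 2] = 1.0
        b[2 * i + 1] = v
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[:n], x[n], x[n + 1], x[n + 2]
```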
(2) Video semantic representation and modeling: video semantic objects and visual objects are collectively referred to as perceptual objects; a frame is a complete unit; stacked frames form a continuous video sequence.
An I-frame is the identification frame of the frame sequence to which it belongs. This is consistent with the definition of an I-frame in the MPEG compression standard.
The video semantic structural model comprises five levels (a data-model sketch follows the list):
Perceptual level: visually perceptible phenomena become significant at this level, and a viewer can identify the differences between perceived objects here. This level covers visual objects (separated at different fidelities), visual characteristics (color, texture, motion, etc.), and cinematographic characteristics (close shot, full shot, close-up, etc.);
Diegetic level: the four-dimensional spatio-temporal world portrayed by a video image or a series of video images, including spatio-temporal descriptions of agents, objects, and actions;
Connotative level: metaphors, analogies, and possible associations of meaning (beyond the diegetic) with on-screen objects and events. The codes captured at the connotative level define what the culture of a social group considers "normal" within that group;
Subtextual level: symbols that acquire more specialized senses and meanings, which can also be adapted for filmmaking or for a database;
Cinematic level: the details of video production technique embodied in the formal presentation of the film and its artifacts. This level includes camera movement, camera operations (pan, tilt, zoom, medium shot, long shot, aerial shot, etc.), multi-camera views (direction, cutting, editing, cross-cutting, etc.), lighting schemes, and optical effects.
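Purely as an illustration of how these five levels could be carried as metadata alongside a video, the following Python sketch organizes them into a data model; every class and field name here is an assumption, not part of the patent.

```python
from dataclasses import dataclass, field

# A minimal sketch of the five-level semantic structure as a data
# model; all class and field names are illustrative assumptions.

@dataclass
class PerceptualLevel:
    visual_objects: list[str] = field(default_factory=list)
    visual_characteristics: list[str] = field(default_factory=list)  # color, texture, motion...
    cinematographic: list[str] = field(default_factory=list)         # full shot, close-up...

@dataclass
class DiegeticLevel:
    agents: list[str] = field(default_factory=list)
    objects: list[str] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)

@dataclass
class VideoSemanticModel:
    perceptual: PerceptualLevel
    diegetic: DiegeticLevel
    connotative: list[str] = field(default_factory=list)  # metaphors, analogies
    subtextual: list[str] = field(default_factory=list)   # specialized symbols
    cinematic: list[str] = field(default_factory=list)    # camera ops, lighting, effects
```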
(3) Multiple access of video and mobile agents: agent AL (the local agent) and agent AM (the mobile agent) form a group of agents that jointly browse the video files of remote sites; agent AL is the agent installed on the local user's computer, and agent AM is the agent installed on the remote computer.
Agent AL obtains the video semantic feature information the user wants to browse from the user's interactive browsing of video segments. This information is then sent as an XML document to agent AM, which browses, retrieves the frame sequences, and sends the resulting <xml, video> pairs back to agent AL. Agent AL then performs the following three steps:
Save the <xml, video> pairs; play the video to the user; obtain more information from the user.
After several rounds of this, agent AL accumulates a series of <xml, video> pairs on which repeated semantic pattern analysis is performed. The resulting behavior pattern, treated as a simple finite state machine, yields a rule SR describing the user's browsing behavior for the given video or case. Agent AL sends SR to agent AM, which uses SR to drive its browsing behavior and retrieve further matching video files on other remote computers. Once the browsing work at a given site is complete, the mobile agent AM can move on to other sites to browse more films. A sketch of this accumulation loop follows.
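A minimal sketch of agent AL's loop, assuming hypothetical send_query, play, and ask_user interfaces for the XML transport, the video player, and the browsing interface (none of which are specified by the patent):

```python
import xml.etree.ElementTree as ET

def agent_al_session(send_query, play, ask_user):
    """One browsing session of the local agent AL (illustrative).

    send_query, play, and ask_user are hypothetical stand-ins for the
    XML transport to agent AM, the video player, and the browsing
    interface, respectively.
    """
    pairs = []                      # accumulated <xml, video> pairs
    features = ask_user()           # dict of semantic features, or None
    while features is not None:
        query = ET.Element("query")
        for name, value in features.items():
            ET.SubElement(query, "feature", name=name).text = value
        xml_doc = ET.tostring(query, encoding="unicode")
        video = send_query(xml_doc)     # agent AM browses and retrieves
        pairs.append((xml_doc, video))  # step 1: save the <xml, video> pair
        play(video)                     # step 2: play the video to the user
        features = ask_user()           # step 3: obtain more information
    return derive_rule_sr(pairs)

def derive_rule_sr(pairs):
    # Illustrative placeholder: a real rule SR would encode the browsing
    # pattern as a simple finite state machine over the queried features.
    return [xml_doc for xml_doc, _ in pairs]
```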
The workflow of the working group formed by agent AL, agent AM, and the browsed sites is as follows:
The information provided by the user is sent to agent AM as a semantic query Q; agent AM answers the query by analyzing XML documents; the XML documents are searched using the semantic query Q; and the rules R are derived from the user's browsing behavior. A sketch of the query-answering step is given below.
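As an illustration of the query-answering step, the sketch below assumes a hypothetical site index in which each <video> element carries <feature> children; the patent does not specify the XML schema, so the layout and function name are assumptions.

```python
import xml.etree.ElementTree as ET

def answer_query(query_xml, site_index_xml):
    """Return the ids of videos whose XML description satisfies the
    semantic query Q (illustrative schema assumptions throughout)."""
    wanted = {f.get("name"): f.text
              for f in ET.fromstring(query_xml).iter("feature")}
    matches = []
    for video in ET.fromstring(site_index_xml).iter("video"):
        have = {f.get("name"): f.text for f in video.iter("feature")}
        # A video matches when it carries every requested feature value.
        if all(have.get(k) == v for k, v in wanted.items()):
            matches.append(video.get("id"))
    return matches
```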
The method can be generalized: multiple AM agents equipped with the rules cooperate to browse multiple sites at the same time and retrieve all the matching video clips.
As described above, the present invention can thus be realized in a preferred manner.
The above embodiment is only a preferred embodiment of the present invention, but the embodiments of the present invention are not limited thereto; any change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and shall fall within the protection scope of the present invention.

Claims (2)

1. A video browsing method based on video semantic modeling, characterized in that it comprises a ViMeta-VU system, a video semantic browsing interface, a 2.5-dimensional affine converter, and intelligent agents; the ViMeta-VU system is a semantic video object classification and tracking system; the 2.5-dimensional affine converter performs automatic segmentation of semantic video object planes by combining statistical change detection with spatio-temporal filtering; and the method comprises the following steps:
(1) Video segmentation: detect inter-frame motion and splice multiple frames;
(2) Video semantic representation and modeling: video semantic objects and visual objects are collectively referred to as perceptual objects; a frame is a complete unit; stacked frames form a continuous video sequence;
(3) Multiple access of video and mobile agents: agent AL and agent AM form a group of agents that jointly browse the video files of remote sites; agent AL is the agent installed on the local user's computer, and agent AM is the agent installed on the remote computer.
2. The video browsing method based on video semantic modeling according to claim 1, characterized in that: agent AL obtains the video semantic feature information the user wants to browse from the user's interactive browsing of video segments; this information is then sent as an XML document to agent AM, which browses, retrieves the frame sequences, and sends the resulting pairs back to agent AL.
CN201210188991.1A 2012-06-08 2012-06-08 Video browsing method based on video semantic modeling Expired - Fee Related CN102750349B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210188991.1A CN102750349B (en) 2012-06-08 2012-06-08 Video browsing method based on video semantic modeling


Publications (2)

Publication Number Publication Date
CN102750349A CN102750349A (en) 2012-10-24
CN102750349B 2014-10-08

Family

ID=47030534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210188991.1A Expired - Fee Related CN102750349B (en) 2012-06-08 2012-06-08 Video browsing method based on video semantic modeling

Country Status (1)

Country Link
CN (1) CN102750349B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101222578A (en) * 2007-12-07 2008-07-16 西安电子科技大学 Video semantic unit detection method based on optical flow tensor and HMM discrimination
CN101299241A (en) * 2008-01-14 2008-11-05 浙江大学 Method for detecting multi-modal video semantic concepts based on tensor representation
CN101616264A (en) * 2008-06-27 2009-12-30 中国科学院自动化研究所 News video categorization method and system
CN102193918A (en) * 2010-03-01 2011-09-21 汉王科技股份有限公司 Video retrieval method and device

Also Published As

Publication number Publication date
CN102750349A (en) 2012-10-24

Similar Documents

Publication Publication Date Title
Canel et al. Scaling video analytics on constrained edge nodes
Xiang et al. Polarization-driven semantic segmentation via efficient attention-bridged fusion
Guo et al. A survey on deep learning based approaches for scene understanding in autonomous driving
CN103795976A (en) Full space-time three-dimensional visualization method
CN1293782A (en) Descriptor for video sequence and image retrieval system using said descriptor
Wang et al. TransCD: scene change detection via transformer-based architecture
US9727992B2 (en) Unifying augmented reality and big data
Kataoka et al. Temporal and fine-grained pedestrian action recognition on driving recorder database
CN1300503A (en) Camera motion parameters estimation method
Jia et al. Fast and accurate object detector for autonomous driving based on improved YOLOv5
Fu et al. Sequential reinforced 360-degree video adaptive streaming with cross-user attentive network
CN115331183A (en) Improved YOLOv5s infrared target detection method
Gu et al. Embedded and real-time vehicle detection system for challenging on-road scenes
Suo et al. HIT-UAV: A high-altitude infrared thermal dataset for Unmanned Aerial Vehicle-based object detection
Zhang et al. Dimension embeddings for monocular 3d object detection
Gharahbagh et al. Best frame selection to enhance training step efficiency in video-based human action recognition
Zhang et al. Where are they going? Predicting human behaviors in crowded scenes
Sun et al. Automated human use mapping of social infrastructure by deep learning methods applied to smart city camera systems
Wang et al. Event stream-based visual object tracking: A high-resolution benchmark dataset and a novel baseline
Chen et al. Fast virtual view synthesis for an 8k 3d light-field display based on cutoff-nerf and 3d voxel rendering
CN102750349B (en) Video browsing method based on video semantic modeling
CN112085767A (en) Passenger flow statistical method and system based on deep optical flow tracking
Folenta et al. Determining vehicle turn counts at multiple intersections by separated vehicle classes using CNNs
Priyadharshini et al. PanoSyn: immersive video synopsis for spherical surveillance video
Boisclair et al. Attention transfer from human to neural networks for road object detection in winter

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20141008

Termination date: 20210608
