CN1293782A - Descriptor for video sequence and image retrieval system using said descriptor - Google Patents

Descriptor for video sequence and image retrieval system using said descriptor

Info

Publication number
CN1293782A
CN1293782A CN00800099A
Authority
CN
China
Prior art keywords
descriptor
video
motion
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN00800099A
Other languages
Chinese (zh)
Inventor
B·莫赖 (B. Mory)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics NV filed Critical Koninklijke Philips Electronics NV
Publication of CN1293782A publication Critical patent/CN1293782A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • G06F16/786Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content using motion, e.g. object motion or camera motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content

Abstract

The present invention relates to a descriptor for the representation, from a video indexing viewpoint, of the motions of a camera or of any kind of observer or observing device within any sequence of frames in a video scene. The motions are at least one or several of the following basic operations: fixed, panning (horizontal rotation), tracking (horizontal transverse movement), tilting (vertical rotation), booming (vertical transverse movement), zooming (change of the focal length), dollying (translation along the optical axis) and rolling (rotation around the optical axis), or any combination of at least two of these operations. Each of said motion types, except fixed, is oriented and subdivided into two components that stand for two different directions, and is represented by means of a histogram in which the values correspond to a predefined size of displacement. The invention also relates to an image retrieval system in which a video indexing device uses said descriptor.

Description

Descriptor for a video sequence and image retrieval system using said descriptor
Field of the Invention
The present invention relates to a descriptor for representing, from a video indexing viewpoint, the motion of a camera, or of any kind of observer or observing device, within any frame sequence of a video scene. Said motion is at least one or several of the following basic operations: fixed, panning (horizontal rotation), tracking (horizontal transverse movement, also called travelling in film language), tilting (vertical rotation), booming (vertical transverse movement), zooming (change of the focal length), dollying (translation along the optical axis) and rolling (rotation around the optical axis), or any combination of at least two of these operations. The invention may be widely used in the applications targeted by the future MPEG-7 standard.
Background of the Invention
The archiving of image and video information is a very important task in several application fields, such as television, road traffic, remote sensing, meteorology, medical imaging, and so on. Yet identifying the information relevant to a given query, or efficiently browsing large video files, remains very difficult. The most frequent approach used for databases consists of assigning keywords to each stored video and performing the retrieval on the basis of these words.
MPEG has produced three standards: MPEG-1 for the storage of audiovisual sequences, MPEG-2 for the broadcasting of audiovisual sequences, and MPEG-4 for object-based interactive multimedia applications. The future standard, MPEG-7, will provide a solution to audiovisual information retrieval by specifying a standard set of descriptors that can be used to describe various types of multimedia information. MPEG-7 will also standardize ways to define other descriptors as well as structures (description schemes) for the descriptors and their relationships (a description scheme simply represents the information contained in a scene). This description will be associated with the content itself, so as to allow fast and efficient searching for the material of interest to the user (still pictures, graphics, 3D models, audio, speech, video, ...).
Summary of the Invention
It is an object of the invention to propose a solution for representing, within any frame sequence of a video scene, the motion of a camera (or of any kind of observer or observing device).
To this end, the invention relates to a descriptor such as defined in the introductory part of this description, characterized in that each motion type, except fixed, is further subdivided into two components that stand for two different directions, and is represented by means of a histogram in which the values correspond to a predefined size of displacement.
Although the efficiency also depends on the search method adopted by the database system, the efficiency of this descriptor is beyond doubt, since each motion component (every possible motion parameter and the associated speed, the precision of these motion speeds being preferably half a pixel per frame, which seems sufficient for all possible applications) is described independently and accurately. Its easy understanding allows a large number of possible queries to be parametrized. The range of applications of this descriptor is very wide, since the camera motion is a key feature for all applications based on video content (query-and-retrieval systems, but also video surveillance, video editing, ...). Moreover, although the proposed descriptor does not really aim at a scalability of the description with respect to the amount of data, it offers the possibility of being used in a hierarchical scheme, thus allowing the camera motion state to be represented over a wide range of time scales.
Brief Description of the Drawings
The invention will now be described, by way of example, with reference to the accompanying drawings, in which:
Figs. 1 to 3 illustrate the basic camera operations;
Fig. 4 gives a general scheme of a complete camera motion analysis system implementing a method for the instantaneous estimation of the camera features;
Fig. 5 is a perspective projection diagram describing the external coordinate system OXYZ moving with the camera, and illustrating, for a focal length f, the retinal coordinates (x, y) corresponding to a point P of the three-dimensional scene, together with the different camera motion parameters;
Fig. 6 illustrates the zoom model used within the camera model;
Fig. 7 depicts a filtering technique used in the scheme of Fig. 4;
Fig. 8 illustrates an image retrieval system applying the descriptor according to the invention on the basis of a classification.
Detailed Description Of The Invention
From a video indexing viewpoint, camera operations are of great importance. Since object motion and global motion are the key features that differentiate still images from video, any authoritative content-based video indexing system should include a way of representing motion effectively and extensively. In situations involving various degrees of camera motion, it is clear that the part of a video in which the camera is fixed differs in spatio-temporal content from the part in which the camera is travelling or panning. As with every other distinguishing feature, this global motion must be described and represented within the future MPEG-7 framework if taking the camera motion into account may be an issue for any type of video and any type of application. Adding a description of the global motion to video documents allows users, whether non-expert or professional, to formulate queries that take the camera motion state into account. Such queries, together with other features, should allow video shots to be retrieved according to information directly or semantically related to the camera motion.
As is well known, conventional camera work comprises, as generally defined, eight basic operations (see Figs. 1, 2 and 3), introduced above: fixed, panning, tracking, tilting, booming, zooming, dollying and rolling, together with all possible combinations of at least two of these operations. The fixed operation is very common and needs no further comment. Panning and tilting are frequently used, in particular when the camera centre is stationary (for example on a tripod); these operations make it possible to follow an object or to view a very large scene (for example a landscape or a skyscraper). Zooming is often used to focus the attention on a specific part of the scene. Tracking and dollying are most of the time used to follow moving objects (travelling, for instance). Rolling is the result of, for example, aerobatic shots. All seven camera motion operations (the fixed case being trivial) induce different image point velocities, which can be modelled and extracted automatically.
Considering these operations, a generic camera motion descriptor should characterize the feature "camera motion", that is to say, it should be able to represent all these motion types independently, so that all their combinations can be handled without restriction. The scheme described here is consistent with this approach. Each motion type, except fixed, can be further divided into two components standing for two different directions. Indeed, as shown in Figs. 1 to 3, panning and tracking can be directed either leftwards or rightwards, tilting and booming either upwards or downwards, zooming can be either zooming in or zooming out, dollying either forwards or backwards, and rolling either leftwards (positive direction) or rightwards (opposite direction). Distinguishing between these two possible directions therefore makes it possible always to represent these 15 motion types with positive values, and to represent them by means of histograms, in the manner described below.
The instantaneous case of motion is considered first. Assuming that each motion type is independent and has its own speed, these motion types are described in a uniform way. Since the local velocity induced by each motion type may depend on the scene depth (in the case of translations) or on the position of the image point (in the cases of zooming, dollying and rotations), a unified unit has been chosen to represent it. Speeds are expressed as values in pixels per frame in the image plane, which is close to the human perception of speed. In the case of a translation, the amplitude of the motion vectors is averaged over the whole image, because the local speed depends on the depth of the objects. In the case of a rotation such as panning or tilting, the speed is the one induced at the centre point of the image, where there is locally no distortion due to side effects. In the cases of zooming, dollying or rolling, the motion vector field is divergent (geometrically proportional to the distance from the image centre), so the speed is represented by the pixel displacement at the corners of the image.
The speed of each motion is thus expressed as a pixel displacement value, which meets the efficiency requirement, and it is suggested to represent it with a half-pixel precision. In order to work with integer values, the speeds are then always rounded to the nearest half-pixel value and multiplied by 2. Given these definitions, any instantaneous camera motion can be represented by a histogram over the motion types, in which the values correspond to half-pixel displacements (obviously, in the fixed case this field is meaningless in terms of speed, which is why a specific data type is needed, from which the fixed case is removed).
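As an illustration of the quantization rule above, the following sketch (the helper names and data layout are assumptions made here, not part of the patent) turns a set of per-frame motion-type speeds, in pixels per frame, into the integer half-pixel histogram just described.

```python
# Minimal sketch (assumed helpers, not from the patent): build the instantaneous
# camera-motion histogram with half-pixel precision, stored as integers (x2).

# The 14 oriented motion types (7 basic operations x 2 directions); "fixed" is
# handled separately, since a speed value would be meaningless for it.
MOTION_TYPES = [
    "pan_left", "pan_right", "track_left", "track_right",
    "tilt_up", "tilt_down", "boom_up", "boom_down",
    "zoom_in", "zoom_out", "dolly_forward", "dolly_backward",
    "roll_left", "roll_right",
]

def instantaneous_histogram(speeds: dict) -> dict:
    """speeds: motion type -> speed in pixels/frame (always positive).
    Returns motion type -> integer number of half-pixels per frame."""
    histogram = {}
    for motion in MOTION_TYPES:
        v = speeds.get(motion, 0.0)
        # Round to the nearest half pixel, then multiply by 2 to get an integer.
        histogram[motion] = int(round(v * 2.0))
    return histogram

if __name__ == "__main__":
    # Example: a pan to the right of 3.3 pixels/frame with a slight zoom-in.
    print(instantaneous_histogram({"pan_right": 3.3, "zoom_in": 0.6}))
    # -> pan_right: 7, zoom_in: 1, all other types 0
```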
The case of the long-term representation of the camera motion must also be considered. Indeed, describing only instantaneous motions would be very heavy and time-consuming. It is therefore also suggested to define a more or less hierarchical description, that is to say to describe the camera motion at any time scale. Given a temporal window $[n_0, n_0 + N]$ of the video data (N being the total number of frames in the window), and assuming that the speed of each motion type is known for each frame, the number of frames N(motion type) in which each motion type has a non-zero amplitude can be computed, and the presence in time can be expressed as a percentage, defined (for panning, for example) according to the following formula:

$$\mathrm{presence(pan)} = \frac{N(\mathrm{pan})}{N} \times 100 \qquad (1)$$
Such expression formula is generalized to the motion of any type.So the existence in time of all possible camera motion all will represent that wherein the value between 0 to 100 is corresponding to number percent with Motion Types Histogram.Obviously, if window is reduced to an independent frame, these values just can only be 0 or 100 so, specifically get what value and depend on that given motion is existence or does not have this fact in this frame.
Finally, in order to allow direct access to the data representing the video and efficient comparisons between descriptors, it is suggested to add to the description the temporal boundaries of the described window, which may be a whole video sequence, a shot (a shot being a series of frames without any discontinuity; splitting a video sequence into temporally coherent elements allows, for instance, a natural indexing), a micro-segment (a part of a shot) or a single frame. The speeds then correspond to the average of the instantaneous speeds over the whole temporal window (when the given type of motion is present).
The descriptor proposed above, by means of a starting point, an end point, the temporal presence (expressed as a percentage) and the speed magnitude expressed in a uniform unit (half pixel per frame) for each motion type, can be used to describe any camera motion state of a given frame sequence. The main merits and advantages of this descriptor are its genericity (this camera motion descriptor takes into account every possible physical motion in every possible direction), its precision (the magnitude of any camera motion is described with a half-pixel accuracy, which is sufficient even for professional applications) and its flexibility, since the camera motion descriptor can be associated with a very wide range of time scales, from a single frame to a whole video sequence (it can also be associated with consecutive time segments).
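As a data-structure sketch (the field names are assumptions made here, not normative MPEG-7 syntax), the descriptor summarized above could be held as follows.

```python
# Sketch of the camera-motion descriptor for one temporal window: boundaries,
# temporal presence (percentages) and mean speed (half pixels/frame) per type.
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class CameraMotionDescriptor:
    start_frame: int                      # first frame of the described window
    end_frame: int                        # last frame of the described window
    presence: Dict[str, float] = field(default_factory=dict)  # type -> 0..100 %
    mean_speed: Dict[str, int] = field(default_factory=dict)  # type -> half-pixels/frame,
                                                              # averaged over frames where present

# Example: a shot of 250 frames, mostly a right pan with an occasional zoom-in.
shot_descriptor = CameraMotionDescriptor(
    start_frame=1200,
    end_frame=1449,
    presence={"pan_right": 92.0, "zoom_in": 18.0},
    mean_speed={"pan_right": 6, "zoom_in": 2},
)
```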
Moreover, the camera motion description proposed here satisfies all the requirements and evaluation criteria of the official MPEG-7 documents, in particular the visual requirements. Indeed, the MPEG-7 requirements specify that:
(a) "MPEG-7 visual descriptions shall at least support the feature 'motion' (in this case, in order to handle requests exploiting temporal composition information)": this is obviously the case of the present invention;
and:
(b) "MPEG-7 shall support the description of multimedia data for a steadily increasing range of visualization capabilities; MPEG-7 shall thus allow the indexed data to be visualized, however coarsely": the feature targeted by the proposed descriptor, namely the camera motion, is related to "motion", and, as far as visualization is concerned, one can imagine representing the camera operations in a textual or graphical way, in order to obtain a summary of the motion of the whole video (for example in a storyboard).
With respect to the visual data formats and classes, the MPEG-7 requirements also specify that:
(c) "MPEG-7 shall support the description of the following data formats: digital video and film (e.g. MPEG-1, MPEG-2, MPEG-4), analogue video and film, still pictures (e.g. JPEG), graphics (e.g. CAD), 3D models (e.g. VRML), composition data associated with video, and so on": this is indeed the case of the present invention; even if the automatic extraction of motion data may be easier on digitally compressed video data, the proposal, being related to the video content itself, still targets all video data formats, digital as well as analogue, in which the motion information is embedded in the content (for example the motion vectors of the MPEG-1, MPEG-2 and MPEG-4 formats);
(d) "MPEG-7 shall support descriptions specifically applicable to the following visual data types: natural video, still pictures, graphics, 2D animation, 3D models, composition information": this point is also satisfied, since the proposal can be applied to any animated visual data, such as natural video, animations or cartoons.
Other general features of the MPEG-7 requirements are also concerned, for instance:
(e) abstraction levels for multimedia material: the proposed solution is generic and can be applied in a hierarchical scheme, allowing the camera motion to be represented over a very wide range of time scales (the different abstraction levels that can thus be represented are the global motion types and magnitudes in a whole sequence, in a video shot, in a small segment of a shot, or even in a single frame);
(f) cross-modality: queries based on visual descriptions allow the retrieval of features quite different from those of the visual content (for example audio data) or of other specific features of the described video content (knowing, for instance, that a zoom is likely to occur before an object is filmed in close-up, or that landscape shots generally involve panning, the camera motion descriptor can help searches that concern different feature types);
(g) feature prioritization: prioritizing the information contained in the descriptor (once the query parameters have been determined) allows matching functions that strongly depend on the meanings implied by the user's preferences and requirements;
(h) feature hierarchy: although the camera motion description is not designed in a hierarchical manner, it is possible, in order to process the data and the queries more efficiently, to construct descriptions of various levels, for example one representing the motion of a video scene, within which each shot is also described, and so on recursively down to the frame level;
(i) description of temporal range: the camera motion descriptor can be associated with different temporal ranges of the audiovisual material, from the whole video (for instance "this film has been shot with a fixed camera throughout") down to the frame level (thus allowing a very fine description), or with consecutive time periods, for example the different small segments of a shot (e.g. "this shot starts with a 20-second zoom and ends with a short 2-second tilt"); the association can therefore be either hierarchical (the descriptor relates to the whole data, or to a temporal subset of it) or sequential (the descriptor relates to consecutive time periods);
(j) direct data manipulation: the proposal allows it.
Moreover, the proposed descriptor obviously also takes the functional requirements into account, for instance:
(k) content-based retrieval: the main goal of this proposal is indeed to allow an effective ("you get exactly what you are looking for") and efficient ("you get what you are looking for quickly") retrieval of multimedia data on the basis of their content, whatever the semantics involved; the effectiveness is mainly guaranteed by the accuracy of the description, which independently takes into account every possible motion operation and magnitude involved, while the efficiency depends on the database engine used and on the chosen search strategy;
(l) similarity-based retrieval: with the descriptor according to the invention, such a retrieval can be carried out according to a degree of similarity, and the database contents can be ranked accordingly;
(m) streamed and stored descriptions: nothing in the content of the proposed descriptor prevents either of these operations from being carried out;
(n) referencing of analogue data: likewise, the proposed descriptor imposes no restriction on the referenced objects, the time references or the analogue formats of any other data;
(o) linking: since the definition of the temporal window within which the description is valid is included in said description, the proposed descriptor allows the referenced data to be located precisely.
The motion parameters on the basis of which the proposed descriptor is constructed must now be defined. Although some techniques already exist for estimating these motion parameters (of the camera or of the associated observing device), they often have drawbacks, and it is therefore preferred to estimate the camera motion parameters by means of an improved method, such as the one described in the international patent application filed on December 24, 1999 under the reference PCT/EP99/10409 (PHF99503).
The global implementation of this estimation method is illustrated in Fig. 4. It may be noted that, since MPEG-7 is intended to be a multimedia content description standard, it should not prescribe any coding type: the construction process of a descriptor must work on all types of coded data, whether compressed or not. However, since most of the video data available from the input frames are usually in MPEG format (and therefore compressed), it is advantageous to use directly the motion vectors provided by the MPEG motion compensation. Otherwise, if uncompressed video data are available, a block matching method can be used in a motion vector generation device 41 in order to obtain said vectors.
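Where uncompressed video is available, the block matching mentioned for device 41 could be sketched as follows; the block size, the search range and the sum-of-absolute-differences criterion are assumptions chosen here for illustration, not values specified by the patent.

```python
# Sketch: exhaustive block matching between two successive grey-level frames,
# as one possible motion-vector generation step (device 41) when the video is
# not already MPEG-compressed.
import numpy as np

def block_matching(prev: np.ndarray, curr: np.ndarray, block=16, search=8) -> np.ndarray:
    H, W = prev.shape
    rows, cols = H // block, W // block
    vectors = np.zeros((rows, cols, 2))
    for bi in range(rows):
        for bj in range(cols):
            y0, x0 = bi * block, bj * block
            ref = curr[y0:y0 + block, x0:x0 + block].astype(float)
            best, best_v = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y1, x1 = y0 + dy, x0 + dx
                    if y1 < 0 or x1 < 0 or y1 + block > H or x1 + block > W:
                        continue
                    cand = prev[y1:y1 + block, x1:x1 + block].astype(float)
                    sad = np.abs(ref - cand).sum()   # sum of absolute differences
                    if sad < best:
                        best, best_v = sad, (dx, dy)
            vectors[bi, bj] = best_v
    return vectors  # (rows, cols, 2): displacement of each block, in pixels
```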
Whatever the situation, once the motion vectors (between two successive frames) have been read or extracted from the video sequence, they are provided to a downsampling and filtering device 42, in order to reduce the amount of data and the heterogeneity of said motion vectors. This operation is followed by an instantaneous estimation of the camera features in a device 43. This estimation is for instance based on the following method.
Before this method is described, the camera model is first presented. Consider a single-lens camera moving in a static environment. As can be seen in Fig. 5, let O be the optical centre of the camera, OXYZ an external coordinate system fixed with respect to the camera, OZ the optical axis, and X, Y, Z respectively the horizontal, vertical and axial directions. Let T_x, T_y, T_z be the translational velocities of OXYZ with respect to the scene, and R_x, R_y, R_z its angular velocities. If (X, Y, Z) are the instantaneous coordinates of a point P of the three-dimensional scene, the velocity components of P are:

$$\dot X = -T_x - R_y Z + R_z Y \qquad (2)$$
$$\dot Y = -T_y - R_z X + R_x Z \qquad (3)$$
$$\dot Z = -T_z - R_x Y + R_y X \qquad (4)$$

The image position p of P in the image plane is given by relation (5):

$$(x, y) = \left(\frac{f X}{Z}, \frac{f Y}{Z}\right) \qquad (5)$$

(where f is the focal length of the camera), and it moves across the image plane with the induced velocity:

$$(u_x, u_y) = (\dot x, \dot y) \qquad (6)$$

After substitution and computation, the following relations are obtained:

$$u_x = \frac{f \dot X}{Z} - \frac{f X \dot Z}{Z^2} \qquad (7)$$
$$u_x = \frac{f}{Z}(-T_x - R_y Z + R_z Y) - \frac{f X}{Z^2}(-T_z - R_x Y + R_y X) \qquad (8)$$

and

$$u_y = \frac{f \dot Y}{Z} - \frac{f Y \dot Z}{Z^2} \qquad (9)$$
$$u_y = \frac{f}{Z}(-T_y - R_z X + R_x Z) - \frac{f Y}{Z^2}(-T_z - R_x Y + R_y X) \qquad (10)$$

They can also be written as:

$$u_x(x, y) = -\frac{f}{Z}\left(T_x - \frac{x}{f} T_z\right) + \frac{x y}{f} R_x - f\left(1 + \frac{x^2}{f^2}\right) R_y + y R_z \qquad (11)$$
$$u_y(x, y) = -\frac{f}{Z}\left(T_y - \frac{y}{f} T_z\right) + f\left(1 + \frac{y^2}{f^2}\right) R_x - \frac{x y}{f} R_y - x R_z \qquad (12)$$
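Equations (11) and (12) translate directly into code; the sketch below (the function and parameter names are chosen here for illustration) returns the image velocity induced at a point (x, y) by given translational and rotational camera velocities.

```python
# Sketch: image velocity (u_x, u_y) at retinal point (x, y) for a camera with
# translational velocity (Tx, Ty, Tz), angular velocity (Rx, Ry, Rz), focal
# length f and object depth Z, following equations (11)-(12).

def image_velocity(x, y, Tx, Ty, Tz, Rx, Ry, Rz, f, Z):
    u_x = (-(f / Z) * (Tx - (x / f) * Tz)
           + (x * y / f) * Rx
           - f * (1.0 + x * x / (f * f)) * Ry
           + y * Rz)
    u_y = (-(f / Z) * (Ty - (y / f) * Tz)
           + f * (1.0 + y * y / (f * f)) * Rx
           - (x * y / f) * Ry
           - x * Rz)
    return u_x, u_y

if __name__ == "__main__":
    # A pure pan (rotation about the vertical axis) gives an almost uniform
    # horizontal flow near the image centre:
    print(image_velocity(0.0, 0.0, 0, 0, 0, Rx=0.0, Ry=0.01, Rz=0.0, f=500.0, Z=1e3))
    # -> (-5.0, 0.0) pixels/frame
```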
Furthermore, in order to include the zoom in the camera model, it is assumed that a zoom can be approximated by a magnification with respect to angles. This assumption holds if the distance from the camera to the nearest object of the scene is large compared with the change of focal length caused by the zoom, which is usually the case.
Fig. 6 considers a pure zoom operation. Given a point of the image plane located at (x, y) at instant t and at (x', y') at the next instant t', the image velocity in the x direction caused by the zoom operation, u_x = x' - x, can be expressed as a function of R_zoom (defined as (θ' - θ)/θ, as illustrated in Fig. 6), as shown below.
We have tan(θ') = x'/f and tan(θ) = x/f, hence:

$$u_x = x' - x = [\tan(\theta') - \tan(\theta)] \cdot f \qquad (13)$$

The expression of tan(θ') can be written as:

$$\tan(\theta') = \tan[(\theta' - \theta) + \theta] = \frac{\tan(\theta' - \theta) + \tan(\theta)}{1 - \tan(\theta' - \theta)\tan(\theta)} \qquad (14)$$

Assuming that the angular difference (θ' - θ) is very small, that is to say that tan(θ' - θ) can be approximated by (θ' - θ), and that (θ' - θ)·tan(θ) << 1, we obtain:

$$u_x = x' - x = f\left[\frac{(\theta' - \theta) + \tan(\theta)}{1 - (\theta' - \theta)\tan(\theta)} - \tan(\theta)\right] \qquad (15)$$
$$u_x = \frac{f(\theta' - \theta)\,(1 + \tan^2(\theta))}{1 - (\theta' - \theta)\tan(\theta)} \qquad (16)$$
$$u_x = f\,\theta\, R_{zoom}\,\frac{1 + \tan^2(\theta)}{1 - (\theta' - \theta)\tan(\theta)} \qquad (17)$$

which is in practice equivalent to:

$$u_x = x' - x = f\,\theta\, R_{zoom}\,(1 + \tan^2(\theta)) \qquad (18)$$

This result can be written as:

$$u_x = f \tan^{-1}\!\left(\frac{x}{f}\right) R_{zoom} \left(1 + \frac{x^2}{f^2}\right) \qquad (19)$$

Similarly, u_y can be written as:

$$u_y = f \tan^{-1}\!\left(\frac{y}{f}\right) R_{zoom} \left(1 + \frac{y^2}{f^2}\right) \qquad (20)$$

The velocity u = (u_x, u_y) corresponds to the motion induced on the image plane by a zoom operation. A general model, which takes into account all the rotations, the translations (along the X and Y axes) and the zoom operation, is now defined.
This general model can be written as the sum of a rotational velocity, representing the rotations and the zoom, and a translational velocity, representing the translations along the X and Y directions (i.e. tracking and booming, respectively):

$$u_x(x, y) = u_x^{rot}(x, y) + u_x^{tr}(x, y), \qquad u_y(x, y) = u_y^{rot}(x, y) + u_y^{tr}(x, y) \qquad (21)$$

where:

$$u_x^{tr} = -\frac{f}{Z}\left(T_x - \frac{x}{f} T_z\right), \qquad u_y^{tr} = -\frac{f}{Z}\left(T_y - \frac{y}{f} T_z\right) \qquad (22)$$

$$u_x^{rot}(x, y) = \frac{x y}{f} R_x - f\left(1 + \frac{x^2}{f^2}\right) R_y + y R_z + f \tan^{-1}\!\left(\frac{x}{f}\right)\left(1 + \frac{x^2}{f^2}\right) R_{zoom}$$
$$u_y^{rot}(x, y) = f\left(1 + \frac{y^2}{f^2}\right) R_x - \frac{x y}{f} R_y - x R_z + f \tan^{-1}\!\left(\frac{y}{f}\right)\left(1 + \frac{y^2}{f^2}\right) R_{zoom} \qquad (23)$$

where only the translational term depends on the object depth Z.
A technique making use of the camera equations (21) to (23) in order to extract the camera motion parameters from an image sequence is described in the article "Qualitative estimation of camera motion parameters from video sequences", by M.V. Srinivasan et al., Pattern Recognition, vol. 30, no. 4, 1997, pp. 593-605. More precisely, the third part of said article (pp. 595-597) presents the basic principle of this technique: by finding the optimal values of R_x, R_y, R_z that generate a flow field which, once subtracted from the initial optical flow field, leaves a residual flow field in which all vectors are parallel, this iterative technique minimizes the deviation from parallelism of the residual flow vectors.
At each step of this iterative method, the optical flow induced by the current camera motion parameters is computed according to one of two different camera models. The first model assumes that the angular size of the field of view (or the focal length f) is known: this means that the ratios x/f and y/f of equation (23) can be computed for every point of the image, and the exact flow can then be calculated using said equation. When the field of view of the camera is large and known, this first model, which takes the zoom and tilt distortions into account, gives more accurate results. Unfortunately, the focal length is sometimes unknown, in which case a second model must be used, bearing in mind that when the field of view is large it is only valid in a limited image region. According to this second model, a small-field approximation (x/f and y/f much smaller than 1) is necessary before applying equation (23), which leads to the following equations (24) and (25):

$$u_x^{rot} \approx -f R_y + y R_z + x R_{zoom} \qquad (24)$$
$$u_y^{rot} \approx f R_x - x R_z + y R_{zoom} \qquad (25)$$
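For clarity, the two rotational models used in the iteration are sketched below: the first requires the focal length f (equation (23), including the zoom terms), the second is the small-field approximation of equations (24) and (25). The function names are chosen here for illustration only.

```python
# Sketch of the two rotational flow models: exact (known focal length f) and
# small-field approximation (unknown f), following equations (23)-(25).
import math

def rotational_flow_exact(x, y, Rx, Ry, Rz, Rzoom, f):
    """Rotational + zoom image velocity when the focal length f is known (eq. 23)."""
    u_x = ((x * y / f) * Rx
           - f * (1.0 + x * x / (f * f)) * Ry
           + y * Rz
           + f * math.atan(x / f) * (1.0 + x * x / (f * f)) * Rzoom)
    u_y = (f * (1.0 + y * y / (f * f)) * Rx
           - (x * y / f) * Ry
           - x * Rz
           + f * math.atan(y / f) * (1.0 + y * y / (f * f)) * Rzoom)
    return u_x, u_y

def rotational_flow_small_field(x, y, Rx, Ry, Rz, Rzoom, f):
    """Small-field approximation, valid when x/f and y/f << 1 (eqs. 24-25)."""
    u_x = -f * Ry + y * Rz + x * Rzoom
    u_y = f * Rx - x * Rz + y * Rzoom
    return u_x, u_y
```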
The estimation thus carried out in the device 43 yields one feature vector for each frame. Over the whole sequence considered, this set of feature vectors is then finally received by a long-term motion analysis device 44. This device 44 outputs the motion descriptor, which can be used for content-based retrieval and, especially within the future MPEG-7 video indexing framework, for indexing the sequence according to the camera motion.
Two main problems justify the preprocessing step carried out in the device 42: the non-uniformity of the motion vectors, especially in low-frequency parts of the image or in areas of very uniform texture, and the fact that the blocks are too small. The purpose of the downsampling and filtering process is to reduce the number of vectors by downsampling the original field, while filtering out the vectors that do not fit the global information. A confidence mask is computed for each vector: it is a criterion, varying between 0 and 1 according to the level of confidence of each motion vector, that can be used to decide whether these vectors should be taken into account. An example of a confidence mask can be obtained by considering that a motion vector cannot deviate too much from any theoretical camera motion: neighbouring vectors have close values. The level of confidence can therefore be measured according to a distance from each vector to its neighbours, which can be represented, for instance, by their mean or, preferably, by their median (since the median is less sensitive to large isolated errors). The confidence mask $C_{i,j}$ is then defined on the basis of this distance (equation (26)).
Fig. 7 illustrates this filtering technique: the filtered field (on the right) contains only one quarter of the number of blocks of the original field (on the left). The motion vector representing each new block is computed from the motion vectors of the four original blocks, whose levels of confidence are computed with respect to their neighbours as shown in the figure. The motion vector of the new block is the confidence-weighted mean of those of the old, smaller blocks:

$$\vec V_{new} = \frac{\sum_{k=1}^{4} C_k \,\vec V_k}{\sum_{k=1}^{4} C_k} \qquad (27)$$
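The downsampling of Fig. 7 and the weighted mean of equation (27) could be sketched as follows; the median-distance confidence used here is only one plausible instance of equation (26), whose exact expression is not reproduced in this text.

```python
# Sketch: 2x2 downsampling of a block motion-vector field with a confidence-
# weighted mean (eq. 27). The confidence C_ij used here (inverse of the distance
# to the median of the 8 neighbours) is an assumed example of eq. (26).
import numpy as np

def confidence_mask(field: np.ndarray) -> np.ndarray:
    """field: (H, W, 2) motion vectors. Returns (H, W) confidences in (0, 1]."""
    H, W, _ = field.shape
    conf = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            neigh = [field[a, b] for a in range(max(0, i - 1), min(H, i + 2))
                     for b in range(max(0, j - 1), min(W, j + 2)) if (a, b) != (i, j)]
            med = np.median(np.array(neigh), axis=0)
            conf[i, j] = 1.0 / (1.0 + np.linalg.norm(field[i, j] - med))
    return conf

def downsample(field: np.ndarray, conf: np.ndarray) -> np.ndarray:
    """Each new block gets the confidence-weighted mean of its 4 original blocks."""
    H, W, _ = field.shape
    out = np.zeros((H // 2, W // 2, 2))
    for i in range(0, H - 1, 2):
        for j in range(0, W - 1, 2):
            v = field[i:i + 2, j:j + 2].reshape(-1, 2)
            c = conf[i:i + 2, j:j + 2].reshape(-1, 1)
            out[i // 2, j // 2] = (c * v).sum(axis=0) / max(c.sum(), 1e-9)
    return out
```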
The device 43, which uses the filtered motion vector field to compute, for each frame, the feature vector containing the camera motion information between the two frames considered, may for instance use the estimation algorithm now described in detail.
First, the confidence mask is computed using equation (26). The parallelization process then begins: every motion vector considered in the computation of the cost function or of the resulting vectors is weighted by its confidence mask. The optimal values of R_x, R_y, R_z, R_zoom and of the focal length f, which yield a residual field in which all vectors are parallel, can then be computed using the following equations:

$$\vec R_{estim} = [\hat R_x, \hat R_y, \hat R_z, \hat R_{zoom}, \hat f] = \arg\min \{ P(\bar R) \} \qquad (28)$$

where

$$P(\bar R) = \sum_i \sum_j \left\| \vec V_{i,j}^{\,residual}(\bar R) \right\|^2 \cdot \theta_{i,j} \cdot C_{i,j} \qquad (29)$$

with:

$$\vec V_{i,j}^{\,residual}(\bar R) = \vec V_{i,j} - \vec V_{i,j}^{\,rot}(\bar R) \qquad (30)$$

and:

$$\theta_{i,j} = \mathrm{angle}\left(\vec v_{i,j}^{\,residual}, \bar v^{\,residual}\right), \qquad \bar v^{\,residual} = \frac{\sum_i \sum_j \vec V_{i,j}^{\,residual} \cdot C_{i,j}}{\sum_i \sum_j C_{i,j}} \qquad (31)$$
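A compact way to read equations (28) to (31) is the following optimization sketch, which reuses the rotational model of equation (23) and a generic simplex minimizer; it is an illustrative reading under assumptions made here (f known and fixed, names invented), not the iterative procedure of the cited article.

```python
# Sketch: estimate (Rx, Ry, Rz, Rzoom) by minimizing the cost P of eq. (29),
# the residual of eq. (30) being the observed flow minus the rotational flow.
import numpy as np
from scipy.optimize import minimize

def cost_P(params, points, flow, conf, f):
    Rx, Ry, Rz, Rzoom = params
    residuals = []
    for (x, y), v in zip(points, flow):
        ux = (x * y / f) * Rx - f * (1 + x**2 / f**2) * Ry + y * Rz \
             + f * np.arctan(x / f) * (1 + x**2 / f**2) * Rzoom
        uy = f * (1 + y**2 / f**2) * Rx - (x * y / f) * Ry - x * Rz \
             + f * np.arctan(y / f) * (1 + y**2 / f**2) * Rzoom
        residuals.append(np.array(v) - np.array([ux, uy]))       # eq. (30)
    residuals = np.array(residuals)
    mean_res = (conf[:, None] * residuals).sum(0) / conf.sum()   # eq. (31)
    total = 0.0
    for r, c in zip(residuals, conf):
        cos = np.clip(np.dot(r, mean_res) /
                      (np.linalg.norm(r) * np.linalg.norm(mean_res) + 1e-12), -1, 1)
        theta = np.arccos(cos)                                   # angular deviation
        total += (np.linalg.norm(r) ** 2) * theta * c            # eq. (29)
    return total

def estimate_rotation(points, flow, conf, f):
    res = minimize(cost_P, x0=np.zeros(4), args=(points, flow, conf, f),
                   method="Nelder-Mead")
    return res.x  # [Rx_hat, Ry_hat, Rz_hat, Rzoom_hat]
```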
In the case of a non-translational motion, these residual vectors cannot be parallel over a large field of view, but they should ideally be very close to zero. The ratio β can then be computed according to equation (32):

$$\beta = \frac{\left\| \sum \vec v_{i,j}^{\,residual}(\bar R_{estim}) \right\|}{\sum \left\| \vec v_{i,j}^{\,residual}(\bar R_{estim}) \right\|} \qquad (32)$$

It indicates the degree of parallelism of the residual field: it is the ratio of the magnitude of the resultant of the residual flow vectors to the sum of the magnitudes of the residual flow vectors. β = 1 means that the residual vectors are perfectly aligned, while β = 0 means that the directions of the residual vectors are completely random. Besides, in order to check whether an obvious tracking component is present in the camera motion, the strength of the residual flow field is compared with that of the original flow field, by computing the ratio α given by the following equation (33):

$$\alpha = \frac{\mathrm{mean}^{(*)}\left(\left\| \vec v_{i,j}^{\,residual}(\bar R_{estim}) \right\|\right)}{\mathrm{mean}^{(*)}\left(\left\| \vec v_{i,j} \right\|\right)} \qquad (33)$$

where the operator "mean^(*)" denotes the mean of its argument weighted by the confidence mask. These two ratios are used to check whether a tracking component is present and how large it is, namely (a minimal sketch of this test follows the list below):
a) if β ≈ 0, there is no tracking motion at all;
b) if β ≈ 1:
- if α ≈ 0, the tracking motion is negligible;
- if α ≈ 1, the tracking motion is significant: $\hat T_x = -\bar v_x^{\,residual}$ and $\hat T_y = -\bar v_y^{\,residual}$. These ratios thus provide relevant additional information about the result.
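The sketch below gives one reading of the β/α test of equations (32) and (33) and of rules a) and b); the thresholds are illustrative assumptions, not values specified by the patent.

```python
# Sketch: parallelism ratio beta (eq. 32), relative residual strength alpha
# (eq. 33), and the tracking decision of rules a) and b).
import numpy as np

def tracking_test(residuals, flow, conf, beta_min=0.7, alpha_min=0.2):
    residuals, flow, conf = np.asarray(residuals), np.asarray(flow), np.asarray(conf)
    beta = np.linalg.norm(residuals.sum(axis=0)) / \
           (np.linalg.norm(residuals, axis=1).sum() + 1e-12)           # eq. (32)
    w = conf / (conf.sum() + 1e-12)
    alpha = (w * np.linalg.norm(residuals, axis=1)).sum() / \
            ((w * np.linalg.norm(flow, axis=1)).sum() + 1e-12)         # eq. (33)
    if beta < beta_min:        # rule a): beta ~ 0, residual directions random
        return beta, alpha, (0.0, 0.0)
    if alpha < alpha_min:      # rule b) with alpha ~ 0: negligible tracking
        return beta, alpha, (0.0, 0.0)
    mean_res = (conf[:, None] * residuals).sum(axis=0) / conf.sum()
    Tx_hat, Ty_hat = -mean_res  # rule b) with alpha ~ 1: significant tracking
    return beta, alpha, (Tx_hat, Ty_hat)
```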
It must be noted that, since the depth of each block is unknown, the estimated translational motion components $\hat T_x$ and $\hat T_y$ do not really represent the translational components of the camera model, but rather a weighted mean, over the whole image, of the corresponding translational image velocities. They are nevertheless obviously good representations of the tracking motion present in the image.
The present invention is not limited to the foregoing description, from which modifications or applications may be deduced without departing from the scope of the invention. For example, the invention also relates to an image retrieval system, for instance of the type shown in Fig. 8, comprising a camera 81 for the acquisition of the video sequences (available in the form of sequential video bitstreams); a video indexing device 82 for carrying out, on the basis of a classification, the indexing method that uses the motion descriptor (of the camera or of any observing device) obtained as described above; a database 83 that stores the data resulting from said classification (these data, sometimes called metadata, allow a later retrieval or browsing by a user); a graphical user interface 84 for carrying out the requested retrieval from the database; and a video monitor 85 for displaying the retrieved information.
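To make the system of Fig. 8 concrete, a schematic pipeline is sketched below; the class and method names are invented here for illustration and do not correspond to any actual device interface.

```python
# Schematic sketch of the Fig. 8 retrieval system: camera (81) -> indexing
# device (82) -> metadata database (83), queried through a GUI (84) and
# displayed on a monitor (85). All names are illustrative placeholders.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class IndexedShot:
    video_id: str
    start_frame: int
    end_frame: int
    camera_motion: Dict[str, float]   # descriptor fields, e.g. presence per type

class MetadataDatabase:
    def __init__(self):
        self._shots: List[IndexedShot] = []

    def store(self, shot: IndexedShot) -> None:
        self._shots.append(shot)

    def query(self, motion_type: str, min_presence: float) -> List[IndexedShot]:
        """E.g. find shots in which a right pan is present more than 50% of the time."""
        return [s for s in self._shots
                if s.camera_motion.get(motion_type, 0.0) >= min_presence]

# Usage: the indexing device stores one descriptor per shot; the GUI issues queries.
db = MetadataDatabase()
db.store(IndexedShot("tape_01", 1200, 1449, {"pan_right": 92.0, "zoom_in": 18.0}))
print(db.query("pan_right", min_presence=50.0))
```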

Claims (7)

1. A descriptor for the representation, from a video indexing viewpoint, of the motion of a camera, or of any kind of observer or observing device, within any frame sequence of a video scene, said motion being at least one or several of the following basic operations: fixed, panning (horizontal rotation), tracking (horizontal transverse movement, also called travelling in film language), tilting (vertical rotation), booming (vertical transverse movement), zooming (change of the focal length), dollying (translation along the optical axis) and rolling (rotation around the optical axis), or any combination of at least two of these operations, wherein each of said motion types, except fixed, is further subdivided into two components that stand for two different directions, and is represented by means of a histogram in which the values correspond to a predefined size of displacement.
2. A descriptor as claimed in claim 1, by means of which, each motion type being assumed to be independent, its speed is described in a uniform way by selecting a common unit for its representation.
3. A descriptor as claimed in claim 2, by means of which the speed of each motion type is represented by a pixel displacement value with a half-pixel precision.
4. A descriptor as claimed in claim 3, by means of which, in order to work with integer values, the speeds are rounded to the nearest half-pixel value and multiplied by 2.
5. A descriptor as claimed in claims 1 and 3, characterized in that said description is hierarchical, the represented motions being handled at any time scale.
6. A descriptor as claimed in claim 4, characterized in that, given a temporal window $[n_0, n_0 + N]$ of the video data (N being the total number of frames in the window) and the speed of each motion type in each frame, the number N(motion type) of frames in which each motion type has a significant speed is computed, and the presence in time is expressed as a percentage, defined as:

$$\mathrm{presence(motion\ type)} = \frac{N(\mathrm{motion\ type})}{N} \times 100$$

so that the temporal presence of every possible motion can be represented by a histogram over the motion types, in which the values, between 0 and 100, correspond to percentages, these values being only 0 or 100, depending on whether the given motion is present or absent in the frame, when the window is reduced to a single frame.
7. Application of a descriptor as claimed in any one of claims 1 to 6 in an image retrieval system comprising a camera for the acquisition of video sequences; a video indexing device; a database; a graphical user interface for carrying out the requested retrieval from said database; and a video monitor for displaying the retrieved information, the video indexing operation being based, in said video indexing device, on the classification obtained by means of said descriptor of the camera motion.
CN00800099A 1999-02-01 2000-01-28 Descriptor for video sequence and image retrieval system using said descriptor Pending CN1293782A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP99400219.4 1999-02-01
EP99400219 1999-02-01

Publications (1)

Publication Number Publication Date
CN1293782A true CN1293782A (en) 2001-05-02

Family

ID=8241866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN00800099A Pending CN1293782A (en) 1999-02-01 2000-01-28 Descriptor for video sequence and image retrieval system using said descriptor

Country Status (6)

Country Link
US (1) US7010036B1 (en)
EP (1) EP1068576A1 (en)
JP (1) JP2002536746A (en)
KR (1) KR20010042310A (en)
CN (1) CN1293782A (en)
WO (1) WO2000046695A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100391263C (en) * 2005-10-27 2008-05-28 复旦大学 Method of judging motion cause by means of video motion vectors
CN100461865C (en) * 2005-10-21 2009-02-11 广达电脑股份有限公司 A system to estimate motion vector
CN101420595B (en) * 2007-10-23 2012-11-21 华为技术有限公司 Method and equipment for describing and capturing video object
CN111337031A (en) * 2020-02-24 2020-06-26 南京航空航天大学 Spacecraft landmark matching autonomous position determination method based on attitude information
CN113177445A (en) * 2021-04-16 2021-07-27 新华智云科技有限公司 Video mirror moving identification method and system

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000276469A (en) * 1999-03-23 2000-10-06 Canon Inc Method and device for information retrieval and storage medium
WO2001061448A1 (en) 2000-02-18 2001-08-23 The University Of Maryland Methods for the electronic annotation, retrieval, and use of electronic images
US7275067B2 (en) * 2000-07-19 2007-09-25 Sony Corporation Method and apparatus for providing multiple levels of abstraction in descriptions of audiovisual content
KR20020031015A (en) * 2000-10-21 2002-04-26 오길록 Non-linear quantization and similarity matching methods for edge histogram bins
EP1293911A1 (en) * 2001-08-21 2003-03-19 Deutsche Thomson-Brandt Gmbh Method and apparatus for generating editing-related metadata
FR2833797B1 (en) * 2001-12-19 2004-02-13 Thomson Licensing Sa METHOD FOR ESTIMATING THE DOMINANT MOVEMENT IN A SEQUENCE OF IMAGES
KR100491724B1 (en) * 2002-10-14 2005-05-27 한국전자통신연구원 Spatial Image Information System and Method Supporting Efficient Storage and Retrieaval of Spatial Images
US8824553B2 (en) 2003-05-12 2014-09-02 Google Inc. Video compression method
US7904815B2 (en) * 2003-06-30 2011-03-08 Microsoft Corporation Content-based dynamic photo-to-video methods and apparatuses
KR100612852B1 (en) * 2003-07-18 2006-08-14 삼성전자주식회사 GoF/GoP Texture descriptor method, and Texture-based GoF/GoP retrieval method and apparatus using the GoF/GoP texture descriptor
US7312819B2 (en) * 2003-11-24 2007-12-25 Microsoft Corporation Robust camera motion analysis for home video
US20050215239A1 (en) * 2004-03-26 2005-09-29 Nokia Corporation Feature extraction in a networked portable device
US8804829B2 (en) * 2006-12-20 2014-08-12 Microsoft Corporation Offline motion description for video generation
JP5409189B2 (en) * 2008-08-29 2014-02-05 キヤノン株式会社 Imaging apparatus and control method thereof
CN102479065B (en) * 2010-11-26 2014-05-07 Tcl集团股份有限公司 Rotary display and display method thereof
US8333520B1 (en) 2011-03-24 2012-12-18 CamMate Systems, Inc. Systems and methods for detecting an imbalance of a camera crane
US8540438B1 (en) 2011-03-24 2013-09-24 CamMate Systems. Inc. Systems and methods for positioning a camera crane
US20140317480A1 (en) * 2013-04-23 2014-10-23 Microsoft Corporation Automatic music video creation from a set of photos
US20220002128A1 (en) * 2020-04-09 2022-01-06 Chapman/Leonard Studio Equipment, Inc. Telescoping electric camera crane

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2677312B2 (en) * 1991-03-11 1997-11-17 工業技術院長 Camera work detection method
KR100215586B1 (en) * 1992-11-09 1999-08-16 모리시타 요이찌 Digest image auto-generating apparatus and digest image auto-generating method
JPH06276467A (en) * 1993-03-23 1994-09-30 Toshiba Corp Video index generating system
US5929940A (en) * 1995-10-25 1999-07-27 U.S. Philips Corporation Method and device for estimating motion between images, system for encoding segmented images
JP3226020B2 (en) * 1997-05-28 2001-11-05 日本電気株式会社 Motion vector detection device
EP0884912B1 (en) * 1997-06-09 2003-08-27 Hitachi, Ltd. Image sequence decoding method
US6195458B1 (en) * 1997-07-29 2001-02-27 Eastman Kodak Company Method for content-based temporal segmentation of video
JP3149840B2 (en) * 1998-01-20 2001-03-26 日本電気株式会社 Apparatus and method for detecting motion vector
US6389168B2 (en) * 1998-10-13 2002-05-14 Hewlett Packard Co Object-based parsing and indexing of compressed video streams
JP2000175149A (en) * 1998-12-09 2000-06-23 Matsushita Electric Ind Co Ltd Video detector and summarized video image production device
JP2000222584A (en) * 1999-01-29 2000-08-11 Toshiba Corp Video information describing method, method, and device for retrieving video
JP3508013B2 (en) * 1999-05-25 2004-03-22 日本電信電話株式会社 MPEG video search device based on camera work information and recording medium storing MPEG video search program

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100461865C (en) * 2005-10-21 2009-02-11 广达电脑股份有限公司 A system to estimate motion vector
CN100391263C (en) * 2005-10-27 2008-05-28 复旦大学 Method of judging motion cause by means of video motion vectors
CN101420595B (en) * 2007-10-23 2012-11-21 华为技术有限公司 Method and equipment for describing and capturing video object
US8687064B2 (en) 2007-10-23 2014-04-01 Huawei Technologies Co., Ltd. Method and device for describing and capturing video object
CN111337031A (en) * 2020-02-24 2020-06-26 南京航空航天大学 Spacecraft landmark matching autonomous position determination method based on attitude information
CN113177445A (en) * 2021-04-16 2021-07-27 新华智云科技有限公司 Video mirror moving identification method and system

Also Published As

Publication number Publication date
KR20010042310A (en) 2001-05-25
US7010036B1 (en) 2006-03-07
JP2002536746A (en) 2002-10-29
WO2000046695A1 (en) 2000-08-10
EP1068576A1 (en) 2001-01-17

Similar Documents

Publication Publication Date Title
CN1293782A (en) Descriptor for video sequence and image retrieval system using said descriptor
US20220012495A1 (en) Visual feature tagging in multi-view interactive digital media representations
US20180261000A1 (en) Selecting time-distributed panoramic images for display
CN1300503A (en) Camera motion parameters estimation method
US6956573B1 (en) Method and apparatus for efficiently representing storing and accessing video information
CN1162793C (en) Method and apparatus for representing and searching for an object using shape
WO2000045338A1 (en) System and method for representing trajectories of moving object for content-based indexing and retrieval of visual animated data
EP0976089A1 (en) Method and apparatus for efficiently representing, storing and accessing video information
CN1851710A (en) Embedded multimedia key frame based video search realizing method
CN101064846A (en) Time-shifted television video matching method combining program content metadata and content analysis
CN110688905A (en) Three-dimensional object detection and tracking method based on key frame
CN110009675A (en) Generate method, apparatus, medium and the equipment of disparity map
CN110889349A (en) VSLAM-based visual positioning method for sparse three-dimensional point cloud chart
CN102236714A (en) Extensible markup language (XML)-based interactive application multimedia information retrieval method
CN103279473A (en) Method, system and mobile terminal for searching massive amounts of video content
Maiwald et al. A 4D information system for the exploration of multitemporal images and maps using photogrammetry, web technologies and VR/AR
CN115331183A (en) Improved YOLOv5s infrared target detection method
KR102464271B1 (en) Pose acquisition method, apparatus, electronic device, storage medium and program
CN110390336B (en) Method for improving feature point matching precision
CN116843867A (en) Augmented reality virtual-real fusion method, electronic device and storage medium
Ferreira et al. Towards key-frame extraction methods for 3D video: a review
CN115239763A (en) Planar target tracking method based on central point detection and graph matching
Juan et al. Content-based video retrieval system research
CN111553921B (en) Real-time semantic segmentation method based on channel information sharing residual error module
Choudhary et al. Real time video summarization on mobile platform

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication