Summary of the Invention
The present invention provides a new spatio-temporal Harris corner detection method and device, with the aim of providing a detection algorithm for spatio-temporal Harris corners that can fully reflect the spatio-temporal correlation of a video, and can therefore extract interest points that contain distinctive appearance information in the spatial domain while simultaneously exhibiting clear motion changes in the time domain.
The invention provides a new spatio-temporal Harris corner detection method, including:
obtaining the three-dimensional geometric algebra space of a video based on the spatial-domain information and time-domain information of the video images contained in the video, and constructing motion vectors for the pixels of the three-dimensional geometric algebra space;
obtaining the appearance-motion vector of each pixel using the motion vector of the pixel and a preset appearance-motion vector algorithm;
constructing a spatio-temporal second-order matrix from the appearance-motion vectors of the pixels, and constructing a spatio-temporal Harris corner response function according to the spatio-temporal second-order matrix;
calculating, according to the spatio-temporal Harris corner response function, the response values of a given pixel p and of all pixels in its neighborhood; if the spatio-temporal Harris corner response value of pixel p is greater than the response values of all the pixels in its neighborhood, then p is a spatio-temporal Harris corner of the video.
Further, constructing the motion vectors of the pixels of the three-dimensional geometric algebra space includes:
constructing the motion vector v_p pointing from the pixel at p in the current video frame of the three-dimensional geometric algebra space G³ to the pixel at p_r in the next video frame, where the motion vector v_p is:
v_p = p_r − p;
where p is the pixel at position p in the current video frame of the three-dimensional geometric algebra space G³, and p_r is the pixel in the next video frame, within the neighborhood centered at p′, whose pixel value differs least from that of the pixel at p in the current frame; p′ is the pixel in the next video frame that has the same position as the pixel at p in the current video frame.
Further, the preset appearance-motion vector algorithm is:
f′(p) = f(p) + v_p;
where v_p denotes the motion vector of the pixel at p in the video image, f(p) denotes the pixel value of the pixel at p, and f′(p) denotes the appearance-motion vector of the pixel at p.
Further, constructing the spatio-temporal second-order matrix from the appearance-motion vectors of the pixels includes:
calculating the gradients f′x, f′y, f′t of the appearance-motion vector f′(p) of each pixel along the three directions x, y and t;
constructing the second-order gradient matrix N from the gradients f′x, f′y, f′t, where the second-order gradient matrix N is:

N = [ f′x²    f′xf′y   f′xf′t
      f′xf′y  f′y²     f′yf′t
      f′xf′t  f′yf′t   f′t²   ]

carrying out a convolution of the matrix N with a Gaussian weighting function ω(p) to obtain the spatio-temporal second-order matrix M(p), where the spatio-temporal second-order matrix M(p) is:

M(p) = ω(p) ⊛ N = [ A  D  E
                    D  B  F
                    E  F  C ]

where ω denotes the Gaussian weighting function ω(p), ⊛ is the convolution symbol, and A, B, C, D, E and F respectively denote the elements of the matrix M(p).
Further, the spatio-temporal Harris corner response function is:
R = det(M) − k(trace(M))³ = (ABC + 2DEF − BE² − AF² − CD²) − k(A + B + C)³
where R is the spatio-temporal Harris corner response value and k is an empirical constant; det(M) denotes the determinant of the matrix M(p) and trace(M) denotes its trace, with the following expressions:
det(M) = λ₁λ₂λ₃ = ABC + 2DEF − BE² − AF² − CD²
trace(M) = λ₁ + λ₂ + λ₃ = A + B + C
where λ₁, λ₂ and λ₃ are the eigenvalues of the matrix M, and A, B, C, D, E and F respectively denote the elements of the matrix M(p).
The present invention also provides a new spatio-temporal Harris corner detection device, including:
a preprocessing module, configured to obtain the three-dimensional geometric algebra space of a video based on the spatial-domain information and time-domain information of the video images contained in the video;
a motion vector construction module, configured to construct the motion vectors of the pixels of the three-dimensional geometric algebra space;
an appearance-motion vector construction module, configured to obtain the appearance-motion vector of each pixel using the motion vector of the pixel and a preset appearance-motion vector algorithm;
a spatio-temporal second-order matrix construction module, configured to construct a spatio-temporal second-order matrix from the appearance-motion vectors of the pixels;
a response function construction module, configured to construct a spatio-temporal Harris corner response function according to the spatio-temporal second-order matrix;
a spatio-temporal Harris corner acquisition module, configured to calculate, according to the spatio-temporal Harris corner response function, the response values of a given pixel p and of all pixels in its neighborhood; if the spatio-temporal Harris corner response value of pixel p is greater than the response values of all the pixels in its neighborhood, then p is a spatio-temporal Harris corner of the video.
Further, the motion vector construction module is specifically configured to construct, according to the following formula, the motion vector v_p pointing from the pixel at p in the current video frame of the three-dimensional geometric algebra space G³ to the pixel at p_r in the next video frame:
v_p = p_r − p;
where p is the pixel at position p in the current video frame of the three-dimensional geometric algebra space G³, and p_r is the pixel in the next video frame, within the neighborhood centered at p′, whose pixel value differs least from that of the pixel at p in the current frame; p′ is the pixel in the next video frame that has the same position as the pixel at p in the current video frame.
Further, the preset appearance-motion vector algorithm is:
f′(p) = f(p) + v_p;
where v_p denotes the motion vector of the pixel at p in the video image, f(p) denotes the pixel value of the pixel at p, and f′(p) denotes the appearance-motion vector of the pixel at p.
Further, the spatio-temporal second-order matrix construction module includes:
a gradient calculation module, configured to calculate the gradients f′x, f′y, f′t of the appearance-motion vector f′(p) of each pixel along the three directions x, y and t;
a second-order gradient matrix construction module, configured to construct the second-order gradient matrix N from the gradients f′x, f′y, f′t, where the second-order gradient matrix N is:

N = [ f′x²    f′xf′y   f′xf′t
      f′xf′y  f′y²     f′yf′t
      f′xf′t  f′yf′t   f′t²   ]

a spatio-temporal second-order matrix acquisition module, configured to carry out a convolution of the matrix N with a Gaussian weighting function ω(p) to obtain the spatio-temporal second-order matrix M(p):

M(p) = ω(p) ⊛ N = [ A  D  E
                    D  B  F
                    E  F  C ]

where ω denotes the Gaussian weighting function ω(p), ⊛ is the convolution symbol, and A, B, C, D, E and F respectively denote the elements of the matrix M(p).
Further, the spatio-temporal Harris corner response function is:
R = det(M) − k(trace(M))³ = (ABC + 2DEF − BE² − AF² − CD²) − k(A + B + C)³
where R is the spatio-temporal Harris corner response value and k is an empirical constant; det(M) denotes the determinant of the matrix M(p) and trace(M) denotes its trace, with the following expressions:
det(M) = λ₁λ₂λ₃ = ABC + 2DEF − BE² − AF² − CD²
trace(M) = λ₁ + λ₂ + λ₃ = A + B + C
where λ₁, λ₂ and λ₃ are the eigenvalues of the matrix M, and A, B, C, D, E and F respectively denote the elements of the matrix M(p).
Compared with the prior art, the beneficial effects of the present invention are as follows. The new spatio-temporal Harris corner detection method and device provided by the present invention obtain the three-dimensional geometric algebra space of a video based on the spatial-domain information and time-domain information of the video images contained in the video, construct the motion vectors of the pixels of that space, and combine the motion vector of each pixel to build its appearance-motion vector, thereby obtaining a unified model of the appearance and motion information of the video. On the basis of this unified model a spatio-temporal second-order matrix is built, a spatio-temporal Harris corner response function is constructed from it, the response values of a given pixel p and of all pixels in its neighborhood are calculated, and the response value of p is compared with those of the other pixels in its neighborhood to judge whether p is a spatio-temporal Harris corner of the video. Compared with the prior art, the present invention regards a video as a three-dimensional solid structure and uses geometric algebra to establish a unified model of the appearance and motion information of the video, a single model of quantities of different geometric dimensions and of the relations between them, and proposes a detection algorithm for spatio-temporal Harris corners based on this model, so that the detection algorithm can fully reflect the spatio-temporal correlation of the video and the extracted spatio-temporal interest points contain distinctive appearance information in the spatial domain while simultaneously exhibiting clear motion changes in the time domain.
Detailed Description of the Embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
The prior art suffers from the problem that it cannot fully reflect the spatio-temporal correlation of a video and therefore cannot truly extract spatio-temporal interest points, that is, spatio-temporal Harris corners, that contain distinctive appearance information in the spatial domain while simultaneously exhibiting clear motion changes in the time domain.
To solve the above technical problem, the present invention proposes a new spatio-temporal Harris corner detection method. The present invention regards a video as a three-dimensional solid structure and uses geometric algebra to establish a unified model of the appearance and motion information of the video, a single model of quantities of different geometric dimensions and of the relations between them, and proposes a new detection algorithm for spatio-temporal Harris corners based on this model, so that the detection algorithm can fully reflect the spatio-temporal correlation of the video and the extracted spatio-temporal interest points contain distinctive appearance information in the spatial domain while simultaneously exhibiting clear motion changes in the time domain.
Referring to Fig. 1, a new spatio-temporal Harris corner detection method according to an embodiment of the present invention includes:
Step S1: obtain the three-dimensional geometric algebra space of the video based on the spatial-domain information and time-domain information of the video images contained in the video, and construct the motion vectors of the pixels of the three-dimensional geometric algebra space.
In the embodiment of the present invention, a video is modeled as follows. The sequence of video images (video frames) contained in the video can be expressed as a three-dimensional solid structure comprising spatial-domain and time-domain information. It should be noted that the appearance and motion information of the video can be represented with the geometric algebra of this three-dimensional structure. In view of the conciseness of geometric (Clifford) algebra operations on vector data and geometric information, the video is modeled here using three-dimensional geometric algebra as the mathematical framework; the representation model of the video image sequence under the geometric algebra framework is described below.
Let R³ be the three-dimensional Euclidean space formed by the spatial-domain and time-domain information of the video image sequence, with orthonormal basis {e₁, e₂, e₃}. The geometric algebra space spanned by this orthonormal basis is the three-dimensional geometric algebra space of the video mentioned above; in the embodiment of the present invention it is hereinafter abbreviated as G³. One canonical basis of G³ is:
{1, e₁, e₂, e₃, e₁∧e₂, e₂∧e₃, e₁∧e₃, e₁∧e₂∧e₃} (1)
where "∧" denotes the outer-product operation of geometric algebra. e₁∧e₂, e₂∧e₃ and e₁∧e₃ are the three independent bivectors obtained from the three orthogonal basis vectors e₁, e₂ and e₃; the geometric meaning of each of these bivectors is the plane represented by two vectors of the space G³. e₁∧e₂∧e₃ is the trivector e₁∧e₂∧e₃ = (e₁∧e₂)e₃, whose geometric interpretation is the oriented volume obtained by moving the bivector e₁∧e₂ along the vector e₃. {e₁, e₂, e₃} can be regarded as the basis vectors {x, y, t} of the three-dimensional vector subspace of G³.
The video sequence F can then be expressed as:
F = f(p) (2)
where p ∈ G³ and p = xe₁ + ye₂ + te₃; x and y denote the spatial-domain coordinates, with 0 < x < Sx and 0 < y < Sy; t denotes the time-domain coordinate, with 0 < t < St. f(p) denotes the appearance value of the pixel at p in the video F.
If p₁, p₂ ∈ G³ with p₁ = x₁e₁ + y₁e₂ + t₁e₃ and p₂ = x₂e₁ + y₂e₂ + t₂e₃, their geometric product can be expressed as:
p₁p₂ = p₁·p₂ + p₁∧p₂ (3)
where "·" denotes the inner product and "∧" denotes the outer product; that is, the geometric product of two vectors is the sum of their inner product (p₁·p₂) and their outer product (p₁∧p₂).
In G³, the distance between p₁ and p₂ can be represented by Δp, i.e.:
Δp = p₁ − p₂ = (x₁−x₂)e₁ + (y₁−y₂)e₂ + (t₁−t₂)e₃ (4)
Δp denotes a vector pointing from p₂ to p₁; it is not only a measure of the distance between the two pixels but can also reflect the motion of pixels in the video sequence.
The above introduces the three-dimensional geometric algebra space of the video in the embodiment of the present invention.
In the embodiment of the present invention, constructing the motion vectors of the pixels of the three-dimensional geometric algebra space specifically means constructing the motion vector v_p pointing from the pixel at p in the current video frame of G³ to the pixel at p_r in the next video frame, where the motion vector v_p is:
v_p = p_r − p; (5)
where p, p′ ∈ G³ with p = x_i e₁ + y_j e₂ + t_k e₃ and p′ = x_i e₁ + y_j e₂ + (t_k + 1)e₃, and S is the set of points of the l × l neighborhood centered at p′ in the plane t = t_k + 1.
Here p is the pixel at position p in the current video frame of G³, and p_r is the pixel of the neighborhood S in the next video frame whose pixel value differs least from that of the pixel at p in the current frame, i.e. p_r = argmin over q ∈ S of |f(q) − f(p)|; p′ is the pixel in the next video frame that has the same position as the pixel at p in the current video frame. v_p reflects the change of motion of the pixel at p, including changes of motion direction and motion speed, and the modulus of v_p can be used to reflect the magnitude of the motion change. In general, the greater the change of motion direction or motion speed of the pixel at p, the larger the modulus of v_p, and vice versa.
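As a purely illustrative sketch (not part of the embodiment), the neighborhood search behind formula (5) can be written as follows, assuming grayscale frames stored as NumPy arrays indexed (y, x) and an l × l search neighborhood:

```python
import numpy as np

def motion_vector(frame_cur, frame_next, x, y, l=3):
    """Sketch of formula (5): v_p = p_r - p, where p_r is the pixel in the
    l x l neighborhood of p' (same position, next frame) whose pixel value
    differs least from f(p). The frame layout and l = 3 are assumptions."""
    h, w = frame_cur.shape
    half = l // 2
    best, v = None, (0, 0)
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            xn, yn = x + dx, y + dy
            if 0 <= xn < w and 0 <= yn < h:
                d = abs(float(frame_next[yn, xn]) - float(frame_cur[y, x]))
                if best is None or d < best:
                    best, v = d, (dx, dy)
    # the temporal component of v_p is 1: one frame forward
    return np.array([v[0], v[1], 1.0])
```

A pixel that simply shifts one column to the right between frames yields v_p = (1, 0, 1) under this sketch.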
Step S2: obtain the appearance-motion vector of each pixel using the motion vector of the pixel and the preset appearance-motion vector algorithm.
The preset appearance-motion vector algorithm is:
f′(p) = f(p) + v_p (6)
where f′(p) denotes the appearance-motion vector of the pixel at p in the video image, v_p denotes the motion vector of the pixel at p, and f(p) denotes the appearance value of the pixel at p; in the embodiment of the present invention, f(p) is the pixel value of the pixel at p in the video image.
The newly defined f′(p) is a quantity containing both scalar information and vector information; it reflects not only the appearance information but also the changes of motion direction and motion speed.
Based on the above definitions, the unified model of appearance and motion information of the video, UMAM (unified model of appearance and motion), abbreviated as F′, is built as follows:
F′ = f′(p) (7)
where f′(p), the AMV (appearance-motion vector) of the pixel p ∈ G³, serves as the independent variable. From the above analysis, UMAM contains not only the appearance information of the video but can also reflect its local motion information, including motion direction and speed; the spatio-temporal Harris corners mentioned in the embodiment of the present invention are UMAM-Harris corners.
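For illustration only, formula (6) can be sketched by storing the multivector f′(p), the sum of a scalar appearance value and a motion vector in G³, as a flat 4-component array; this layout is an assumption of the sketch, not the embodiment's representation:

```python
import numpy as np

def appearance_motion_vector(f_p, v_p):
    """Sketch of formula (6): f'(p) = f(p) + v_p. In G^3 the sum of a
    scalar (appearance value f(p)) and a vector (motion vector v_p) is a
    multivector; here it is stored as [f, v_x, v_y, v_t] for illustration."""
    return np.concatenate(([float(f_p)], np.asarray(v_p, dtype=float)))
```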
Step S3: construct the spatio-temporal second-order matrix from the appearance-motion vectors of the pixels, and construct a spatio-temporal Harris corner response function according to the spatio-temporal second-order matrix.
To better understand the technical solution in the embodiment of the present invention, the weighted correlation function of the AMV on the three-dimensional geometric algebra space G³ of the video is described below.
Let p ∈ G³ and let p′ be a point in the neighborhood of p with coordinate (p + Δp); the weighted correlation function of f′(p) and f′(p′) is defined as:
c(p, Δp) = ω(p) ⊛ (f′(p + Δp) − f′(p))² (8)
where ω(p) is the three-dimensional Gaussian kernel function G(p, σ), ⊛ denotes the convolution operation, W(p) is the Gaussian window function of size l × l × l centered at p, and σ is the scale factor of the Gaussian kernel function.
Let p, σ ∈ G³. The embodiment of the present invention provides a Gaussian function G(p, σ) in G³, defined in terms of the outer product "∧" and the inner product "·" of G³, where σ is the scale factor of the Gaussian function and the size of the Gaussian window function is l × l × l with l = 6σ + 1.
To better understand the technical solution in the embodiment of the present invention, it is demonstrated below that the above Gaussian function G(p, σ) is a valid Gaussian function in the three-dimensional geometric algebra space G³ of the video.
Proof: the term |p·σ|² can be further expanded, as can σ∧σ∧σ; substituting these expansions, formula (9) can be rewritten so that, once converted into its form in G³, G(p, σ) coincides with the common three-dimensional Gaussian function. Therefore, the Gaussian function G(p, σ) provided in the embodiment of the present invention is a valid Gaussian function in G³.
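Since G(p, σ) is shown above to coincide with the common three-dimensional Gaussian, a window obeying l = 6σ + 1 can be sketched directly in that ordinary form; this is an illustration of the window it reduces to, not the G³ definition itself:

```python
import numpy as np

def gaussian_window(sigma=1.0):
    """Ordinary separable 3-D Gaussian window of size l = 6*sigma + 1,
    normalized to sum to 1, to which the G^3 Gaussian G(p, sigma) reduces."""
    l = int(6 * sigma + 1)
    half = l // 2
    ax = np.arange(-half, half + 1)
    xx, yy, tt = np.meshgrid(ax, ax, ax, indexing="ij")
    g = np.exp(-(xx**2 + yy**2 + tt**2) / (2 * sigma**2))
    return g / g.sum()
```

With σ = 1, as taken in step S33 below in the embodiment, this gives a 7 × 7 × 7 window.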
Then, a Taylor series expansion of f′(p + Δp) is carried out, keeping the first-order approximation:
f′(p + Δp) ≈ f′(p) + Δx f′x + Δy f′y + Δt f′t
where Δx, Δy and Δt are the distances between p and p′ along the x, y and t directions, and f′x, f′y, f′t are the gradients of f′(p) along the three directions, i.e.:
f′x = ∂f′(p)/∂x (14)
f′y = ∂f′(p)/∂y (15)
f′t = ∂f′(p)/∂t (16)
The results of the above formulas (14), (15) and (16) are vectors.
Therefore, formula (8) can be approximated as:
c(p, Δp) ≈ (Δx, Δy, Δt) M(p) (Δx, Δy, Δt)ᵀ (17)
where:

M(p) = ω ⊛ [ f′x²    f′xf′y   f′xf′t        [ A  D  E
             f′xf′y  f′y²     f′yf′t    =     D  B  F    (18)(19)
             f′xf′t  f′yf′t   f′t²   ]        E  F  C ]

In formula (19), f′x, f′y, f′t respectively denote the gradients of the AMV along the x, y and t directions in G³, ω is the Gaussian weighting function ω(p) in formula (8), ⊛ is the convolution symbol, and A, B, C, D, E and F respectively denote the elements of the matrix M(p).
Based on the above weighted correlation function, the refinement of step S3 in the embodiment of the present invention is described below; as shown in Fig. 2, step S3 includes:
S31: using formulas (14), (15) and (16), calculate the gradients f′x, f′y, f′t of the appearance-motion vector f′(p) of each pixel along the three directions x, y and t.
S32: construct the second-order gradient matrix N from the gradients f′x, f′y, f′t of the appearance-motion vector f′(p), where the second-order gradient matrix N is:

N = [ f′x²    f′xf′y   f′xf′t
      f′xf′y  f′y²     f′yf′t
      f′xf′t  f′yf′t   f′t²   ]

Specifically, the acquisition process of the second-order gradient matrix N is given by formula (17).
S33: carry out a convolution of the matrix N with the Gaussian weighting function ω(p) to obtain the spatio-temporal second-order matrix M(p), where the spatio-temporal second-order matrix M(p) is:

M(p) = ω(p) ⊛ N = [ A  D  E
                    D  B  F
                    E  F  C ]

where ω is the Gaussian weighting function ω(p) in formula (10), the scale factor of the Gaussian function is taken as σ = 1, ⊛ is the convolution symbol, and A, B, C, D, E and F respectively denote the elements of the matrix M(p).
Specifically, the acquisition process of the spatio-temporal second-order matrix M(p) is given by formula (17).
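A minimal NumPy sketch of steps S31 to S33 follows, assuming f′ has been reduced to a scalar volume indexed (y, x, t); the embodiment's f′ is vector-valued, so this scalar stand-in is an assumption that only keeps the tensor construction visible:

```python
import numpy as np

def spacetime_second_moment(fprime):
    """Sketch of S31-S33 for a scalar volume fprime indexed (y, x, t).
    Gradients f'_x, f'_y, f'_t are taken with np.gradient; the Gaussian
    weighting uses sigma = 1 and l = 7 (l = 6*sigma + 1), so each axis of
    fprime should be at least 7. Returns the six volumes A..F forming
    M(p) = [[A, D, E], [D, B, F], [E, F, C]]."""
    fy, fx, ft = np.gradient(fprime.astype(float))
    prods = {"A": fx * fx, "B": fy * fy, "C": ft * ft,
             "D": fx * fy, "E": fx * ft, "F": fy * ft}
    sigma, l = 1.0, 7
    ax = np.arange(l) - l // 2
    g = np.exp(-ax**2 / (2 * sigma**2))
    g /= g.sum()
    def smooth(v):
        # separable convolution: one 1-D Gaussian pass per axis
        for axis in range(3):
            v = np.apply_along_axis(
                lambda m: np.convolve(m, g, mode="same"), axis, v)
        return v
    return {k: smooth(v) for k, v in prods.items()}
```

The diagonal entries A, B and C are smoothed squares, hence non-negative everywhere.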
S34: construct a spatio-temporal Harris corner response function according to the spatio-temporal second-order matrix.
The spatio-temporal Harris corner response function is:
R = det(M) − k(trace(M))³ = (ABC + 2DEF − BE² − AF² − CD²) − k(A + B + C)³
where R is the spatio-temporal Harris corner response value and k is an empirical constant; in the embodiment of the present invention k = 0.04 is taken. det(M) denotes the determinant of the matrix M(p) and trace(M) denotes its trace, with the following expressions:
det(M) = λ₁λ₂λ₃ = ABC + 2DEF − BE² − AF² − CD²
trace(M) = λ₁ + λ₂ + λ₃ = A + B + C
where λ₁, λ₂ and λ₃ are the eigenvalues of the matrix M, and A, B, C, D, E and F respectively denote the elements of the matrix M(p).
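The response function of step S34 is direct arithmetic on the six elements of M(p); a sketch with the embodiment's k = 0.04:

```python
def harris_response(A, B, C, D, E, F, k=0.04):
    """Spatio-temporal Harris response of step S34:
    R = det(M) - k * trace(M)^3
      = (A*B*C + 2*D*E*F - B*E**2 - A*F**2 - C*D**2) - k*(A + B + C)**3."""
    det = A * B * C + 2 * D * E * F - B * E**2 - A * F**2 - C * D**2
    tr = A + B + C
    return det - k * tr**3
```

For a diagonal M = diag(2, 3, 4), det(M) = 24 and trace(M) = 9, so R = 24 − 0.04 · 729.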
Step S4: calculate, according to the spatio-temporal Harris corner response function, the response values of a given pixel p and of all pixels in its neighborhood; if the spatio-temporal Harris corner response value of pixel p is greater than the response values of all the pixels in its neighborhood, then p is a spatio-temporal Harris corner of the video.
Specifically, the UMAM-Harris corner response values of the other pixels in the h × h × h neighborhood of point p are compared with that of p; if R(p) is the maximum within the h × h × h neighborhood of p, then p is a UMAM-Harris corner of the video.
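The local-maximum test of step S4 can be sketched as a brute-force scan over an R volume indexed (y, x, t); h = 3 here is illustrative, and border pixels without a full neighborhood are skipped as an assumption of the sketch:

```python
import numpy as np

def local_maxima(R, h=3):
    """Step S4: p is a corner if R(p) is strictly greater than R at every
    other pixel of its h x h x h neighborhood."""
    half = h // 2
    corners = []
    ny, nx, nt = R.shape
    for y in range(half, ny - half):
        for x in range(half, nx - half):
            for t in range(half, nt - half):
                win = R[y - half:y + half + 1,
                        x - half:x + half + 1,
                        t - half:t + half + 1]
                # strict maximum: R(p) equals the window max, and uniquely so
                if R[y, x, t] == win.max() and \
                        np.count_nonzero(win == win.max()) == 1:
                    corners.append((y, x, t))
    return corners
```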
Referring to Fig. 3, which is a structural schematic diagram of a new spatio-temporal Harris corner detection device in an embodiment of the present invention, the detection device includes:
a preprocessing module 1, configured to obtain the three-dimensional geometric algebra space of the video based on the spatial-domain information and time-domain information of the video images contained in the video;
a motion vector construction module 2, configured to construct the motion vectors of the pixels of the three-dimensional geometric algebra space.
Specifically, the motion vector construction module is configured to construct, according to the following formula, the motion vector v_p pointing from the pixel at p in the current video frame of the three-dimensional geometric algebra space G³ to the pixel at p_r in the next video frame:
v_p = p_r − p;
where p is the pixel at position p in the current video frame of G³, and p_r is the pixel in the next video frame, within the neighborhood centered at p′, whose pixel value differs least from that of the pixel at p in the current frame; p′ is the pixel in the next video frame that has the same position as the pixel at p in the current video frame.
An appearance-motion vector construction module 3 is configured to obtain the appearance-motion vector of each pixel using the motion vector of the pixel and the preset appearance-motion vector algorithm; the appearance-motion vectors of all the pixels of the three-dimensional geometric algebra space constitute the established unified model of the appearance and motion information of the video.
Specifically, the preset appearance-motion vector algorithm is:
f′(p) = f(p) + v_p;
where v_p denotes the motion vector of the pixel at p in the video image, f(p) denotes the pixel value of the pixel at p, and f′(p) denotes the appearance-motion vector of the pixel at p.
A spatio-temporal second-order matrix construction module 4 is configured to construct the spatio-temporal second-order matrix from the appearance-motion vectors of the pixels.
Specifically, as shown in Fig. 4, the refined functional modules of the spatio-temporal second-order matrix construction module include:
a gradient calculation module 41, configured to calculate the gradients f′x, f′y, f′t of the appearance-motion vector f′(p) of each pixel along the three directions x, y and t;
a second-order gradient matrix construction module 42, configured to construct the second-order gradient matrix N from the gradients f′x, f′y, f′t, where the second-order gradient matrix N is:

N = [ f′x²    f′xf′y   f′xf′t
      f′xf′y  f′y²     f′yf′t
      f′xf′t  f′yf′t   f′t²   ]

a spatio-temporal second-order matrix acquisition module 43, configured to carry out a convolution of the matrix N with the Gaussian weighting function ω(p) to obtain the spatio-temporal second-order matrix M(p):

M(p) = ω(p) ⊛ N = [ A  D  E
                    D  B  F
                    E  F  C ]

where ω denotes the Gaussian weighting function ω(p), ⊛ is the convolution symbol, and A, B, C, D, E and F respectively denote the elements of the matrix M(p).
A response function construction module 5 is configured to construct a spatio-temporal Harris corner response function according to the spatio-temporal second-order matrix.
Specifically, the spatio-temporal Harris corner response function is:
R = det(M) − k(trace(M))³ = (ABC + 2DEF − BE² − AF² − CD²) − k(A + B + C)³
where R is the spatio-temporal Harris corner response value and k is an empirical constant; in the embodiment of the present invention k = 0.04 is taken. det(M) denotes the determinant of the matrix M(p) and trace(M) denotes its trace, with the following expressions:
det(M) = λ₁λ₂λ₃ = ABC + 2DEF − BE² − AF² − CD²
trace(M) = λ₁ + λ₂ + λ₃ = A + B + C
where λ₁, λ₂ and λ₃ are the eigenvalues of the matrix M, and A, B, C, D, E and F respectively denote the elements of the matrix M(p).
A spatio-temporal Harris corner acquisition module 6 is configured to calculate, according to the spatio-temporal Harris corner response function, the response values of a given pixel p and of all pixels in its neighborhood; if the spatio-temporal Harris corner response value of pixel p is greater than the response values of all the pixels in its neighborhood, then p is a spatio-temporal Harris corner of the video.
Specifically, the UMAM-Harris corner response values of the other pixels in the h × h × h neighborhood of point p are compared with that of p; if R(p) is the maximum within the h × h × h neighborhood of p, then p is a UMAM-Harris corner of the video.
In the embodiment of the present invention, the three-dimensional geometric algebra space of the video is obtained based on the spatial-domain information and time-domain information of the video images contained in the video; the motion vectors of the pixels of this space are constructed, and the motion vector of each pixel is combined to build its appearance-motion vector, thereby obtaining the unified model of the appearance and motion information of the video. On the basis of this unified model the spatio-temporal second-order matrix is built and a spatio-temporal Harris corner response function is constructed from it; the response values of a given pixel p and of all pixels in its neighborhood are calculated, and the response value of p is compared with those of the other pixels in its neighborhood to judge whether p is a spatio-temporal Harris corner of the video. Compared with the prior art, the present invention regards a video as a three-dimensional solid structure and uses geometric algebra to establish a unified model of the appearance and motion information of the video, a single model of quantities of different geometric dimensions and of the relations between them, and proposes a detection algorithm for spatio-temporal Harris corners based on this model, so that the detection algorithm can fully reflect the spatio-temporal correlation of the video and the extracted spatio-temporal interest points contain distinctive appearance information in the spatial domain while simultaneously exhibiting clear motion changes in the time domain.
To verify the validity of the new spatio-temporal Harris corner detection method in the embodiment of the present invention, the proposed UMAM-Harris detection algorithm is evaluated on UCF101, a currently popular video action recognition dataset. UCF101 is one of the largest real-scene video action recognition datasets at present; it contains 101 action categories and 13320 videos in total. Each action category consists of 25 groups, each group containing 4 to 7 videos; the videos within a group are performed by the same person in the same scene, with the same background and shooting angle. According to the moving object, the 101 video classes are divided into 5 major categories: Human-Object Interaction, Body-Motion Only, Human-Human Interaction, Playing Musical Instruments, and Sports. UCF101 provides diverse conditions, including camera motion, complex backgrounds, occlusion, varying illumination and low resolution, and is currently a rather challenging video action recognition dataset.
The spatio-temporal interest points used in the experiments include the UMAM-Harris corners extracted by the proposed UMAM-Harris detection algorithm, the Harris3D features extracted by the Harris3D detection algorithm, and the SIFT3D features extracted by the SIFT3D detection algorithm. For each video, we extract its spatio-temporal interest points and then compute the ST-SIFT descriptor of each extracted interest point. The spatio-temporal interest points of the training videos are subsampled; PCA is then used to reduce the dimensionality of the descriptors, a pre-trained Gaussian mixture model is used to compute Fisher vectors, and finally an SVM model is trained. For testing, we likewise apply PCA dimensionality reduction to the test videos and compute the Fisher vectors of their spatio-temporal interest points, and classification is finally performed with the pre-trained SVM model. To keep the videos of the training and test sets disjoint, the test set contains 7 groups of each action category, and the remaining 18 groups are used for training.
To evaluate the technical solution in the embodiment of the present invention, the proposed algorithm is assessed on the UCF101 human action dataset. In the experimental section, we first extract the spatio-temporal interest points of a particular video with the UMAM-Harris detection algorithm, then use the algorithm based on UMAM-Harris corners to classify the video actions of the UCF101 dataset, and compare the results with existing methods.
Under identical experimental settings, we apply the proposed UMAM-Harris detection algorithm as well as the Harris3D and SIFT3D detection algorithms to the extraction of spatio-temporal interest points from a table-tennis video named "v_TableTennisShot_g01_c01" in the UCF101 database; the extraction results are shown in Fig. 5, Fig. 6 and Fig. 7.
Panels a, b, c and d of Figs. 5, 6 and 7 show the distributions of the spatio-temporal interest points extracted by the three detection algorithms, Harris3D, SIFT3D and UMAM-Harris, when the video frame index is set to 42, 47, 94 and 113, respectively. The dots in the video frames mark the positions of the extracted spatio-temporal interest points; Fig. 5 corresponds to the distribution of the Harris3D features, Fig. 6 to the distribution of the SIFT3D features, and Fig. 7 to the distribution of the UMAM-Harris corners.
From the above experimental results it can be seen that the Harris3D features detected by the Harris3D detection algorithm are mainly distributed on the athlete and can capture the joint motions of the athlete; however, a small amount of noise distributed on the video background is also detected. The SIFT3D features extracted by the SIFT3D detection algorithm are very abundant but contain considerable background noise, and these pseudo spatio-temporal interest points would interfere with subsequent feature description and classification. Compared with the extraction results of the Harris3D and SIFT3D detection algorithms, the UMAM-Harris corners extracted by the UMAM-Harris algorithm are mostly distributed on the athlete and contain very little background noise. This is because the UMAM-Harris detection algorithm considers not only that moving objects exhibit significant gray-level changes in both the spatial and temporal domains, but also the information about the motion speed and motion direction of the moving objects, so that the UMAM-Harris corners extracted from the video contain motion information while background noise is suppressed. These UMAM-Harris corners can therefore accurately locate the positions of moving objects and better characterize the actions with significant changes in the video. We can thus safely draw the conclusion that the UMAM-Harris detection algorithm, while retaining the robustness and validity of the features detected by the Harris3D detection algorithm, truly extracts spatio-temporal interest points that contain distinctive appearance information and rich motion information.
The embodiment of the present invention regards a video as a three-dimensional solid structure, establishes a unified model of the appearance and motion information of the video using Clifford algebra, and develops on this basis a new spatio-temporal Harris corner detection algorithm, the UMAM-Harris corner detection algorithm. Experimental results show that the UMAM-Harris detection algorithm can truly extract spatio-temporal interest points that reflect the distinctive appearance information of the video in the spatial domain while simultaneously exhibiting motion changes in the time domain, and can effectively improve the classification accuracy of video action recognition.
The foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.