Disclosure of Invention
The invention provides a novel space-time Harris corner detection method and a novel space-time Harris corner detection device, and aims to provide a detection algorithm that fully reflects the space-time domain correlation of a video and can therefore extract space-time Harris corners that contain apparent information in the space domain and reflect clear motion change in the time domain.
The invention provides a novel space-time Harris corner detection method, which comprises the following steps:
obtaining a three-dimensional geometric algebraic space of a video based on spatial information and time domain information of a video image contained in the video, and constructing a motion vector of a pixel point of the three-dimensional geometric algebraic space;
obtaining an apparent-motion vector of each pixel point by using the motion vector of the pixel point and a preset apparent-motion vector algorithm;
combining the appearance-motion vectors of the pixel points to construct a space-time second-order matrix, and constructing a space-time Harris corner response function according to the space-time second-order matrix;
and calculating, according to the space-time Harris corner response function, the space-time Harris corner response function values of a pixel point p and of all pixel points in the neighborhood of p, wherein if the space-time Harris corner response function value of the pixel point p is larger than those of all pixel points in its neighborhood, the point p is a space-time Harris corner of the video.
Further, the constructing the motion vector of the pixel point in the three-dimensional geometric algebraic space includes:
constructing, in the three-dimensional geometric algebraic space, the motion vector v_p that points from the pixel point at p in the current frame video image to the pixel point at p_r in the next frame video image, the motion vector v_p being:
v_p = p_r - p;
wherein p is the pixel point at p in the current frame video image of the three-dimensional geometric algebraic space, and p_r is the pixel point, within the neighborhood centered at p' in the next frame video image, whose pixel value differs least from that of the pixel point at p in the current frame video image; p' is the pixel point in the next frame video image at the same position as the pixel point at p in the current frame video image.
Further, the preset apparent-motion vector algorithm is:
f'(p) = f(p) + v_p;
wherein v_p represents the motion vector of the pixel point at p in the video image, f(p) represents the pixel value of the pixel point at p in the video image, and f'(p) represents the apparent-motion vector of the pixel point at p in the video image.
Further, the constructing a spatio-temporal second-order matrix by combining the apparent-motion vectors of the pixel points includes:
calculating the gradients f'_x, f'_y and f'_t of the apparent-motion vector f'(p) of each pixel point in the x, y and t directions;
constructing, from the gradients f'_x, f'_y and f'_t of the apparent-motion vector f'(p) of the pixel point in the x, y and t directions, a second-order gradient matrix N, wherein the second-order gradient matrix N is:

N = [ f'_x²      f'_x f'_y  f'_x f'_t ]
    [ f'_x f'_y  f'_y²      f'_y f'_t ]
    [ f'_x f'_t  f'_y f'_t  f'_t²     ]

performing a convolution of the matrix N with a Gaussian weighting function ω(p) to obtain a space-time second-order matrix M(p), wherein the space-time second-order matrix M(p) is:

M(p) = ω(p) ⊗ N = [ A  D  E ]
                  [ D  B  F ]
                  [ E  F  C ]

where ω denotes the Gaussian weighting function ω(p), ⊗ denotes the convolution operator, and A, B, C, D, E and F correspond to the elements of the matrix M(p), respectively.
Further, the spatio-temporal Harris corner response function is:
R = det(M) - k(trace(M))³ = (ABC + 2DEF - BE² - AF² - CD²) - k(A + B + C)³
wherein R is the space-time Harris corner response function value and k is an empirical constant; det(M) denotes the determinant of the matrix M(p) and trace(M) denotes the trace of the matrix M(p), expressed as follows:
det(M) = λ_1 λ_2 λ_3 = ABC + 2DEF - BE² - AF² - CD²
trace(M) = λ_1 + λ_2 + λ_3 = A + B + C
wherein λ_1, λ_2 and λ_3 are the eigenvalues of the matrix M, and A, B, C, D, E and F correspond to the elements of the matrix M(p), respectively.
The invention also provides a novel space-time Harris corner detection device, which comprises:
the preprocessing module is used for obtaining a three-dimensional geometric algebraic space of the video based on spatial information and time domain information of a video image contained in the video;
the motion vector construction module is used for constructing motion vectors of pixel points of the three-dimensional geometric algebraic space;
the appearance-motion vector construction module is used for obtaining an appearance-motion vector of each pixel point by utilizing the motion vector of the pixel point and a preset appearance-motion vector algorithm;
the space-time second-order matrix construction module is used for constructing a space-time second-order matrix by combining the apparent-motion vectors of the pixel points;
the response function constructing module is used for constructing a space-time Harris corner response function according to the space-time second-order matrix;
and the space-time Harris corner acquisition module is used for calculating, according to the space-time Harris corner response function, the space-time Harris corner response function values of a pixel point p and of all pixel points in the neighborhood of p, wherein if the space-time Harris corner response function value of the pixel point p is larger than those of all pixel points in its neighborhood, the point p is a space-time Harris corner of the video.
Further, the motion vector construction module is specifically configured to construct, according to the following formula, the motion vector v_p of the three-dimensional geometric algebraic space that points from the pixel point at p in the current frame video image to the pixel point at p_r in the next frame video image, the motion vector v_p being:
v_p = p_r - p;
wherein p is the pixel point at p in the current frame video image of the three-dimensional geometric algebraic space, and p_r is the pixel point, within the neighborhood centered at p' in the next frame video image, whose pixel value differs least from that of the pixel point at p in the current frame video image; p' is the pixel point in the next frame video image at the same position as the pixel point at p in the current frame video image.
Further, the preset apparent-motion vector algorithm is:
f'(p) = f(p) + v_p;
wherein v_p represents the motion vector of the pixel point at p in the video image, f(p) represents the pixel value of the pixel point at p in the video image, and f'(p) represents the apparent-motion vector of the pixel point at p in the video image.
Further, the spatio-temporal second-order matrix constructing module includes:
a gradient calculation module, used for calculating the gradients f'_x, f'_y and f'_t of the apparent-motion vector f'(p) of each pixel point in the x, y and t directions;
a second-order gradient matrix construction module, used for constructing, from the gradients f'_x, f'_y and f'_t of the apparent-motion vector f'(p) of the pixel point in the x, y and t directions, a second-order gradient matrix N, wherein the second-order gradient matrix N is:

N = [ f'_x²      f'_x f'_y  f'_x f'_t ]
    [ f'_x f'_y  f'_y²      f'_y f'_t ]
    [ f'_x f'_t  f'_y f'_t  f'_t²     ]

a space-time second-order matrix obtaining module, used for performing a convolution of the matrix N with a Gaussian weighting function ω(p) to obtain a space-time second-order matrix M(p), wherein the space-time second-order matrix M(p) is:

M(p) = ω(p) ⊗ N = [ A  D  E ]
                  [ D  B  F ]
                  [ E  F  C ]

where ω denotes the Gaussian weighting function ω(p), ⊗ denotes the convolution operator, and A, B, C, D, E and F correspond to the elements of the matrix M(p), respectively.
Further, the spatio-temporal Harris corner response function is:
R = det(M) - k(trace(M))³ = (ABC + 2DEF - BE² - AF² - CD²) - k(A + B + C)³
wherein R is the space-time Harris corner response function value and k is an empirical constant; det(M) denotes the determinant of the matrix M(p) and trace(M) denotes the trace of the matrix M(p), expressed as follows:
det(M) = λ_1 λ_2 λ_3 = ABC + 2DEF - BE² - AF² - CD²
trace(M) = λ_1 + λ_2 + λ_3 = A + B + C
wherein λ_1, λ_2 and λ_3 are the eigenvalues of the matrix M, and A, B, C, D, E and F correspond to the elements of the matrix M(p), respectively.
Compared with the prior art, the invention has the following beneficial effects. The invention provides a novel space-time Harris corner detection method and device: a three-dimensional geometric algebraic space of the video is obtained from the spatial information and time domain information of the video images contained in the video; motion vectors of the pixel points of this space are constructed and combined into apparent-motion vectors, yielding a unified model of video appearance and motion information; a space-time second-order matrix is constructed on the basis of this unified model, and a space-time Harris corner response function is constructed from the space-time second-order matrix; the response function values of a pixel point p and of all pixel points in its neighborhood are then calculated and compared to judge whether the point p is a space-time Harris corner of the video. Compared with the prior art, the invention treats the video as a three-dimensional structure, uses geometric algebra to establish a unified model of video appearance and motion information, modeling the geometric bodies of different dimensions of the appearance and motion information and the relations between them, and proposes a space-time Harris corner detection algorithm based on this model, so that the detection algorithm fully reflects the time-space domain correlation of the video and the extracted space-time interest points contain unique appearance information in the space domain while representing clear motion change in the time domain.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the prior art, the space-time domain correlation of a video cannot be fully reflected, so that space-time interest points (Harris corner points) that contain unique apparent information in the space domain and represent clear motion change in the time domain cannot truly be extracted.
In order to solve the above technical problem, the invention provides a novel space-time Harris corner detection method, which treats the video as a three-dimensional structure, uses geometric algebra to establish a unified model of video appearance and motion information (modeling the geometric bodies of different dimensions of the appearance and motion information and the relations between them), and proposes a novel space-time Harris corner detection algorithm based on this model, so that the detection algorithm fully reflects the time-space domain correlation of the video and the extracted space-time interest points contain unique appearance information in the space domain while representing clear motion change in the time domain.
Referring to fig. 1, a new spatio-temporal Harris corner detection method according to an embodiment of the present invention includes:
step S1, obtaining a three-dimensional geometric algebraic space of a video based on spatial domain information and time domain information of a video image contained in the video, and constructing motion vectors of pixel points of the three-dimensional geometric algebraic space;
In the embodiment of the present invention, the video is modeled as follows. The sequence of video images (video frames) contained in the video can be represented as a three-dimensional stereo structure comprising spatial information and temporal information; it should be noted that both the appearance information and the motion information of the video can be represented by geometric algebra within this three-dimensional structure. In view of the simplicity with which geometric (Clifford) algebra operates on vector data and geometric information, the video is modeled herein using three-dimensional geometric algebra as the mathematical framework; the representation model of a video image sequence under the geometric algebra framework is explained below.
Let R³ be the three-dimensional Euclidean space formed by the spatial information and time domain information of the video image sequence contained in a video, with orthonormal basis {e_1, e_2, e_3}. The geometric algebra spanned over R³ by this orthonormal basis is the three-dimensional geometric algebraic space of the video; in the embodiment of the present invention, this geometric algebraic space is denoted G_3 for short. One set of canonical bases of G_3 is:
{1, e_1, e_2, e_3, e_1∧e_2, e_2∧e_3, e_1∧e_3, e_1∧e_2∧e_3} (1)
wherein ∧ denotes the outer product of geometric algebra; e_1∧e_2, e_2∧e_3 and e_1∧e_3 are the three independent double outer products (bivectors) formed from the three orthonormal basis vectors e_1, e_2 and e_3, each geometrically representing the plane spanned by its two vectors in G_3; e_1∧e_2∧e_3 is the triple outer product, e_1∧e_2∧e_3 = (e_1∧e_2)e_3, whose geometric interpretation is the directed geometric body obtained by moving the double outer product e_1∧e_2 along the vector e_3. {e_1, e_2, e_3} can be regarded as the basis vectors {x, y, t} of the three-dimensional vector subspace of G_3.
The video sequence F can be represented as:
F = f(p) (2)
wherein p ∈ G_3 and p = x e_1 + y e_2 + t e_3; x and y represent the spatial coordinates, with 0 < x < S_x and 0 < y < S_y; t represents the time domain coordinate, with 0 < t < S_t. f(p) represents the apparent value of the pixel at p in the video F.
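For illustration only, a minimal Python sketch (the array name, sizes and the random stand-in frames are hypothetical, not part of the invention) of holding the video sequence F = f(p) as a three-dimensional array indexed by the coordinates (x, y, t):

```python
import numpy as np

# A grayscale video as a 3-D array: axis 0 = x, axis 1 = y, axis 2 = t,
# so video[x, y, t] plays the role of f(p) with p = x e_1 + y e_2 + t e_3.
Sx, Sy, St = 64, 48, 30                      # illustrative sizes
rng = np.random.default_rng(0)
video = rng.random((Sx, Sy, St))             # stand-in for real frames

def f(video, x, y, t):
    """Apparent (pixel) value of the point p = x e_1 + y e_2 + t e_3."""
    return video[x, y, t]

print(f(video, 10, 20, 5))
```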
Let p_1, p_2 ∈ G_3, with p_1 = x_1 e_1 + y_1 e_2 + t_1 e_3 and p_2 = x_2 e_1 + y_2 e_2 + t_2 e_3. Then their geometric product can be expressed as:
p_1 p_2 = p_1·p_2 + p_1∧p_2 (3)
wherein · denotes the inner product and ∧ denotes the outer product; that is, the geometric product of two vectors is the sum of their inner product (p_1·p_2) and their outer product (p_1∧p_2).
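As a worked example of formula (3), a small sketch (the function name is hypothetical) that evaluates the inner product and the three bivector coefficients of the outer product for two vectors of G_3:

```python
def geometric_product(p1, p2):
    """Geometric product of two G_3 vectors p = (x, y, t): returns the
    scalar inner product and the outer-product coefficients on the
    bivector basis (e_1^e_2, e_2^e_3, e_1^e_3)."""
    x1, y1, t1 = p1
    x2, y2, t2 = p2
    inner = x1 * x2 + y1 * y2 + t1 * t2      # p1 . p2
    outer = (x1 * y2 - y1 * x2,              # e_1 ^ e_2 coefficient
             y1 * t2 - t1 * y2,              # e_2 ^ e_3 coefficient
             x1 * t2 - t1 * x2)              # e_1 ^ e_3 coefficient
    return inner, outer

inner, outer = geometric_product((1.0, 2.0, 3.0), (4.0, 5.0, 6.0))
print(inner, outer)                          # p1 p2 = p1 . p2 + p1 ^ p2
```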
In G_3, the difference between p_1 and p_2 can be represented by Δp, i.e.:
Δp = p_1 - p_2 = (x_1 - x_2)e_1 + (y_1 - y_2)e_2 + (t_1 - t_2)e_3 (4)
Δp is the vector pointing from p_2 to p_1; it is not only a measure of the distance between two pixel points, but also reflects the motion of pixel points in the video sequence.
The above is an introduction of the three-dimensional geometric algebraic space of the video in the embodiment of the present invention.
In the embodiment of the present invention, constructing the motion vector of a pixel point of the three-dimensional geometric algebraic space specifically means constructing, in G_3, the motion vector v_p that points from the pixel point at p in the current frame video image to the pixel point at p_r in the next frame video image, the motion vector v_p being:
v_p = p_r - p; (5)
wherein p, p' ∈ G_3, p = x_i e_1 + y_j e_2 + t_k e_3, p' = x_i e_1 + y_j e_2 + (t_k + 1)e_3, and S is the set of points of the l × l neighborhood centered at p' on the plane t = t_k + 1.
Here p is the pixel point at p in the current frame video image of the three-dimensional geometric algebraic space G_3, and p_r ∈ S is the pixel point, within the neighborhood centered at p' in the next frame video image, whose pixel value differs least from that of the pixel point at p in the current frame video image; p' is the pixel point in the next frame video image at the same position as the pixel point at p in the current frame video image. v_p reflects the motion change of the pixel point at p, including the change of motion direction and the change of motion speed, and the modulus of v_p can be used to express the magnitude of the motion change. In general, the larger the change of the motion direction or motion speed of the pixel point at p, the larger the modulus of v_p, and vice versa.
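A minimal block-matching sketch of formula (5), assuming the matching criterion is the absolute pixel difference |f(q) - f(p)| over the l × l search window S; the function name and the zero-motion handling at the last frame are illustrative assumptions:

```python
import numpy as np

def motion_vector(video, x, y, t, l=5):
    """v_p = p_r - p: p_r is the point of the l x l neighborhood of p'
    (same (x, y), frame t+1) whose pixel value differs least from f(p)."""
    Sx, Sy, St = video.shape
    if t + 1 >= St:
        return np.zeros(3)                   # no next frame: zero motion
    r = l // 2
    fp = video[x, y, t]
    best_diff, best = np.inf, (0, 0)
    for dx in range(-r, r + 1):
        for dy in range(-r, r + 1):
            qx, qy = x + dx, y + dy
            if 0 <= qx < Sx and 0 <= qy < Sy:
                diff = abs(video[qx, qy, t + 1] - fp)
                if diff < best_diff:
                    best_diff, best = diff, (dx, dy)
    # v_p = p_r - p: spatial displacement plus one frame along e_3
    return np.array([best[0], best[1], 1.0])
```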
Step S2, obtaining the appearance-motion vector of each pixel point by using the motion vector of the pixel point and a preset appearance-motion vector algorithm;
the preset apparent-motion vector algorithm is as follows:
f'(p) = f(p) + v_p (6)
where f'(p) represents the apparent-motion vector of the pixel point at p in the video image, v_p represents the motion vector of the pixel point at p in the video image, and f(p) represents the apparent value of the pixel point at p in the video image; in the embodiment of the present invention, f(p) is the pixel value of the pixel point at p in the video image.
The newly defined f'(p) is a multivector containing both scalar information and vector information; it reflects not only the appearance information but also the changes of motion direction and motion speed.
Based on the above definition, a unified model of video appearance and motion, UMAM (Unified Model of Appearance and Motion), abbreviated F', is constructed as follows:
F' = f'(p) (7)
wherein f'(p), the apparent-motion vector (AMV), is a function taking the pixel point p ∈ G_3 as its argument. As can be seen from the above analysis, the UMAM not only contains the apparent information of the video but also reflects the local motion information of the video, including motion direction and speed; the spatio-temporal Harris corners mentioned in the embodiments of the present invention refer to UMAM-Harris corners.
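A sketch of formula (6), storing the AMV f'(p) = f(p) + v_p as the four coefficients [scalar, e_1, e_2, e_3] of a G_3 multivector; this array representation is an illustrative choice, not prescribed by the text:

```python
import numpy as np

def apparent_motion_vector(fp, vp):
    """f'(p) = f(p) + v_p: a multivector with a scalar part (the pixel
    value) and a vector part (the motion), as [scalar, e_1, e_2, e_3]."""
    return np.array([fp, vp[0], vp[1], vp[2]])

# Example: pixel value 0.7 moving one pixel along x, one frame along t.
amv = apparent_motion_vector(0.7, np.array([1.0, 0.0, 1.0]))
print(amv)        # [0.7 1.  0.  1. ] -> apparent plus motion information
```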
Step S3, constructing a space-time second-order matrix by combining the apparent-motion vectors of the pixel points, and constructing a space-time Harris corner response function according to the space-time second-order matrix;
In order to better understand the technical solution of the embodiment of the present invention, the weighted correlation function of the AMV on the three-dimensional geometric algebraic space G_3 of the video is described below.
Let p ∈ G_3, and let p' be a point in the neighborhood of p with coordinates (p + Δp). Then the weighted correlation function of f'(p) and f'(p') is defined as follows:

c(p, Δp) = ω(p) ⊗ ||f'(p + Δp) - f'(p)||² (8)

wherein ω(p) is a three-dimensional Gaussian kernel function G(p, σ), ⊗ represents the convolution operation, ω(p) is a Gaussian window function of size l × l × l centered at p, and σ is the scale factor of the Gaussian kernel function.
Let p ∈ G_3. The embodiment of the invention gives a Gaussian function G(p, σ) in G_3 (formula (9)), wherein σ is the scale factor of the Gaussian function G(p, σ) in G_3, ∧ represents the outer product, · represents the inner product, and the size of the Gaussian window function is l × l × l, where l = 6σ + 1.
In order to better understand the technical solution in the embodiment of the present invention, it is proved below that the above Gaussian function G(p, σ) is a valid Gaussian function in the three-dimensional geometric algebraic space G_3 of the video.
Proof: |p·σ|⁻² can be further expanded, σ∧σ can likewise be further expanded, and equation (9) can then be rewritten accordingly. From this derivation it can be seen that G(p, σ), rewritten in the form over G_3, is consistent with the general three-dimensional Gaussian function; therefore, the Gaussian function G(p, σ) given in the embodiment of the present invention is a valid Gaussian function in G_3.
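Since the proof reduces G(p, σ) to the general three-dimensional Gaussian form, the window ω(p) can be sketched as follows (the exact geometric-algebraic form of formula (9) is not reproduced; l = 6σ + 1 as stated above):

```python
import numpy as np

def gaussian_window(sigma=1.0):
    """Normalized 3-D Gaussian window of size l x l x l, l = 6*sigma + 1."""
    l = int(6 * sigma + 1)
    r = l // 2
    ax = np.arange(-r, r + 1)
    x, y, t = np.meshgrid(ax, ax, ax, indexing="ij")
    g = np.exp(-(x**2 + y**2 + t**2) / (2.0 * sigma**2))
    return g / g.sum()                       # weights omega(p)

print(gaussian_window(1.0).shape)            # (7, 7, 7)
```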
Then, a Taylor series expansion of f'(p + Δp) is taken to first order:

f'(p + Δp) ≈ f'(p) + Δx f'_x + Δy f'_y + Δt f'_t (13)

wherein Δx, Δy and Δt are the distances between p and p' in the x, y and t directions, and f'_x, f'_y and f'_t are the gradients of f'(p) in the x, y and t directions, i.e.:

f'_x = ∂f'(p)/∂x (14)
f'_y = ∂f'(p)/∂y (15)
f'_t = ∂f'(p)/∂t (16)

The results in equations (14), (15) and (16) above are all vectors.
Thus, equation (8) can be approximated as:

c(p, Δp) ≈ [Δx Δy Δt] M(p) [Δx Δy Δt]^T (17)

wherein:

M(p) = ω(p) ⊗ N,  N = [ f'_x²      f'_x f'_y  f'_x f'_t ]
                      [ f'_x f'_y  f'_y²      f'_y f'_t ]
                      [ f'_x f'_t  f'_y f'_t  f'_t²     ],

M(p) = [ A  D  E ]
       [ D  B  F ]
       [ E  F  C ] (18)

A = ω ⊗ f'_x², B = ω ⊗ f'_y², C = ω ⊗ f'_t², D = ω ⊗ (f'_x f'_y), E = ω ⊗ (f'_x f'_t), F = ω ⊗ (f'_y f'_t) (19)

In formula (19), f'_x, f'_y and f'_t respectively denote the gradients of the AMV in the x, y and t directions, ω is the Gaussian weighting function ω(p) in equation (8), ⊗ is the convolution operator, and A, B, C, D, E and F correspond to the elements of the matrix M(p), respectively.
Based on the weighted correlation function, the refinement of step S3 in the embodiment of the present invention is described below; as shown in fig. 2, step S3 includes:
S31, calculating, by combining equations (14), (15) and (16), the gradients f'_x, f'_y and f'_t of the apparent-motion vector f'(p) of each pixel point in the x, y and t directions;
S32, constructing, from the gradients f'_x, f'_y and f'_t of the apparent-motion vector f'(p) of the pixel point in the x, y and t directions, a second-order gradient matrix N, wherein the second-order gradient matrix N is:

N = [ f'_x²      f'_x f'_y  f'_x f'_t ]
    [ f'_x f'_y  f'_y²      f'_y f'_t ]
    [ f'_x f'_t  f'_y f'_t  f'_t²     ]
Specifically, the acquisition of the second-order gradient matrix N is shown in equation (18).
S33, performing a convolution of the matrix N with the Gaussian weighting function ω(p) to obtain the space-time second-order matrix M(p), where the space-time second-order matrix M(p) is:

M(p) = ω(p) ⊗ N = [ A  D  E ]
                  [ D  B  F ]
                  [ E  F  C ]

where ω is the Gaussian weighting function ω(p), with the scale factor of the Gaussian function taken as σ = 1, ⊗ is the convolution operator, and A, B, C, D, E and F correspond to the elements of the matrix M(p), respectively.
Specifically, the acquisition of the spatio-temporal second-order matrix M(p) is shown in equations (18) and (19).
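A sketch of steps S31 to S33 under a simplifying assumption: the AMV field is reduced to one scalar channel per pixel (the method itself differentiates the multivector-valued f'(p)); np.gradient and scipy.ndimage.convolve stand in for the derivatives and the ⊗ operation, and the element layout follows the matrix M(p) above:

```python
import numpy as np
from scipy.ndimage import convolve

def structure_tensor(F_prime, omega):
    """S31-S33: gradients of the AMV field in x, y, t and the Gaussian
    smoothing M(p) = omega (*) N; returns the six distinct elements of
    the symmetric matrix M = [[A, D, E], [D, B, F], [E, F, C]]."""
    fx, fy, ft = np.gradient(F_prime)        # f'_x, f'_y, f'_t  (S31)
    A = convolve(fx * fx, omega)             # omega (*) f'_x^2
    B = convolve(fy * fy, omega)             # omega (*) f'_y^2
    C = convolve(ft * ft, omega)             # omega (*) f'_t^2
    D = convolve(fx * fy, omega)             # omega (*) f'_x f'_y
    E = convolve(fx * ft, omega)             # omega (*) f'_x f'_t
    F = convolve(fy * ft, omega)             # omega (*) f'_y f'_t
    return A, B, C, D, E, F

# Usage with the gaussian_window sketched earlier (sigma = 1):
# A, B, C, D, E, F = structure_tensor(F_prime, gaussian_window(1.0))
```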
S34, constructing a space-time Harris corner response function according to the space-time second-order matrix;
the spatio-temporal Harris corner response function is:
R = det(M) - k(trace(M))³ = (ABC + 2DEF - BE² - AF² - CD²) - k(A + B + C)³
wherein R is the space-time Harris corner response function value and k is an empirical constant, taken as k = 0.04 in the embodiment of the invention; det(M) denotes the determinant of the matrix M(p) and trace(M) denotes the trace of the matrix M(p), expressed as follows:
det(M) = λ_1 λ_2 λ_3 = ABC + 2DEF - BE² - AF² - CD²
trace(M) = λ_1 + λ_2 + λ_3 = A + B + C
wherein λ_1, λ_2 and λ_3 are the eigenvalues of the matrix M, and A, B, C, D, E and F correspond to the elements of the matrix M(p), respectively.
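A per-pixel sketch of the response function, following the det(M) and trace(M) expressions above, with k = 0.04 as in the embodiment:

```python
def harris_response(A, B, C, D, E, F, k=0.04):
    """R = det(M) - k * trace(M)^3 for the symmetric 3x3 matrix
    M = [[A, D, E], [D, B, F], [E, F, C]], evaluated elementwise."""
    det = A * B * C + 2 * D * E * F - B * E**2 - A * F**2 - C * D**2
    trace = A + B + C
    return det - k * trace**3
```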
Step S4, calculating, according to the space-time Harris corner response function, the space-time Harris corner response function values of a pixel point p and of all pixel points in its neighborhood; if the response function value of the pixel point p is larger than those of all pixel points in its neighborhood, the point p is a space-time Harris corner of the video.
Specifically, the UMAM-Harris corner response function values of the point p and of the other pixel points in its h × h × h neighborhood are compared; if R(p) is the maximum within the h × h × h neighborhood, the point p is a UMAM-Harris corner of the video.
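A sketch of this local-maximum test; scipy.ndimage.maximum_filter and the optional response threshold are implementation assumptions of this illustration, not named in the text:

```python
import numpy as np
from scipy.ndimage import maximum_filter

def spatiotemporal_corners(R, h=5, threshold=0.0):
    """p is a UMAM-Harris corner if R(p) is the maximum of R over its
    h x h x h neighborhood (a threshold suppresses flat regions)."""
    local_max = maximum_filter(R, size=h)
    mask = (R == local_max) & (R > threshold)
    return np.argwhere(mask)                 # (x, y, t) corner positions
```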
Referring to fig. 3, a schematic structural diagram of a new spatio-temporal Harris corner detection apparatus in an embodiment of the present invention is shown, where the detection apparatus includes:
the video processing device comprises a preprocessing module 1, a video processing module and a video processing module, wherein the preprocessing module is used for obtaining a three-dimensional geometric algebraic space of a video based on spatial information and time domain information of a video image contained in the video;
the motion vector construction module 2 is used for constructing motion vectors of pixel points of the three-dimensional geometric algebraic space;
Specifically, the motion vector construction module is used for constructing, according to the following formula, the motion vector v_p of the three-dimensional geometric algebraic space that points from the pixel point at p in the current frame video image to the pixel point at p_r in the next frame video image, the motion vector v_p being:
v_p = p_r - p;
wherein p is the pixel point at p in the current frame video image of the three-dimensional geometric algebraic space, and p_r is the pixel point, within the neighborhood centered at p' in the next frame video image, whose pixel value differs least from that of the pixel point at p in the current frame video image; p' is the pixel point in the next frame video image at the same position as the pixel point at p in the current frame video image.
and the appearance-motion vector construction module 3, used for obtaining the appearance-motion vector of each pixel point from the motion vector of the pixel point and a preset appearance-motion vector algorithm; the appearance-motion vectors of all pixel points of the three-dimensional geometric algebraic space constitute the established unified model of video appearance and motion information.
Specifically, the preset apparent-motion vector algorithm is:
f'(p) = f(p) + v_p;
wherein v_p represents the motion vector of the pixel point at p in the video image, f(p) represents the pixel value of the pixel point at p in the video image, and f'(p) represents the apparent-motion vector of the pixel point at p in the video image.
The space-time second-order matrix construction module 4 is used for constructing a space-time second-order matrix by combining the apparent-motion vectors of the pixel points;
Specifically, as shown in fig. 4, the spatio-temporal second-order matrix construction module includes:
a gradient calculation module 41, used for calculating the gradients f'_x, f'_y and f'_t of the apparent-motion vector f'(p) of each pixel point in the x, y and t directions;
a second-order gradient matrix construction module 42, used for constructing, from the gradients f'_x, f'_y and f'_t of the apparent-motion vector f'(p) of the pixel point in the x, y and t directions, a second-order gradient matrix N, wherein the second-order gradient matrix N is:

N = [ f'_x²      f'_x f'_y  f'_x f'_t ]
    [ f'_x f'_y  f'_y²      f'_y f'_t ]
    [ f'_x f'_t  f'_y f'_t  f'_t²     ]

a space-time second-order matrix obtaining module 43, used for performing a convolution of the matrix N with the Gaussian weighting function ω(p) to obtain the space-time second-order matrix M(p), where the space-time second-order matrix M(p) is:

M(p) = ω(p) ⊗ N = [ A  D  E ]
                  [ D  B  F ]
                  [ E  F  C ]

where ω denotes the Gaussian weighting function ω(p), ⊗ denotes the convolution operator, and A, B, C, D, E and F correspond to the elements of the matrix M(p), respectively.
A response function constructing module 5, configured to construct a spatio-temporal Harris corner response function according to the spatio-temporal second order matrix;
specifically, the spatio-temporal Harris corner response function is:
R = det(M) - k(trace(M))³ = (ABC + 2DEF - BE² - AF² - CD²) - k(A + B + C)³
wherein R is the space-time Harris corner response function value and k is an empirical constant, taken as k = 0.04 in the embodiment of the invention; det(M) denotes the determinant of the matrix M(p) and trace(M) denotes the trace of the matrix M(p), expressed as follows:
det(M) = λ_1 λ_2 λ_3 = ABC + 2DEF - BE² - AF² - CD²
trace(M) = λ_1 + λ_2 + λ_3 = A + B + C
wherein λ_1, λ_2 and λ_3 are the eigenvalues of the matrix M, and A, B, C, D, E and F correspond to the elements of the matrix M(p), respectively.
and the space-time Harris corner acquisition module 6, used for calculating, according to the space-time Harris corner response function, the space-time Harris corner response function values of a pixel point p and of all pixel points in the neighborhood of p; if the response function value of the pixel point p is larger than those of all pixel points in its neighborhood, the point p is a space-time Harris corner of the video.
Specifically, the UMAM-Harris corner response function values of the point p and of the other pixel points in its h × h × h neighborhood are compared; if R(p) is the maximum within the h × h × h neighborhood, the point p is a UMAM-Harris corner of the video.
In the embodiment of the invention, a three-dimensional geometric algebraic space of the video is obtained from the spatial information and time domain information of the video images contained in the video; motion vectors of the pixel points of this space are constructed and combined into apparent-motion vectors, yielding a unified model of video appearance and motion information; a space-time second-order matrix is constructed on the basis of this unified model, and a space-time Harris corner response function is constructed from the space-time second-order matrix; the space-time Harris corner response function values of a pixel point p and of all pixel points in its neighborhood are calculated according to the response function, and the value at p is compared with those of the other pixel points in its neighborhood to judge whether the point p is a space-time Harris corner of the video. Compared with the prior art, the invention treats the video as a three-dimensional structure, uses geometric algebra to establish a unified model of video appearance and motion information, modeling the geometric bodies of different dimensions of the appearance and motion information and the relations between them, and proposes a space-time Harris corner detection algorithm based on this model, so that the detection algorithm fully reflects the time-space domain correlation of the video and the extracted space-time interest points contain unique appearance information in the space domain while representing clear motion change in the time domain.
In order to verify the effectiveness of the new spatio-temporal Harris corner detection method in the embodiment of the present invention, the proposed UMAM-Harris detection algorithm is evaluated on the popular video behavior recognition dataset UCF101. UCF101 is one of the largest current real-scene video behavior recognition datasets; it comprises 101 categories of behaviors and 13320 video clips in total, each category consisting of 25 groups of 4 to 7 videos, where the videos within one group are performed by the same person in the same scene with the same background and shooting angle. The 101 categories fall into 5 major groups according to the moving object: Human-Object Interaction, Body-Motion Only, Human-Human Interaction, Playing Musical Instruments, and Sports. UCF101 covers varied conditions, including camera motion, complex backgrounds, occlusion, changing illumination and low resolution, making it a challenging video behavior recognition dataset.
The spatio-temporal interest points used in the experiments comprise the UMAM-Harris corners extracted by the UMAM-Harris detection algorithm, the Harris3D features extracted by the Harris3D detection algorithm, and the SIFT3D features extracted by the SIFT3D detection algorithm. For each video, its spatio-temporal interest points are extracted and the ST-SIFT descriptor of each extracted interest point is computed. The spatio-temporal interest points of the training videos are subsampled; PCA is then used to reduce the dimensionality of the descriptors, a pre-trained Gaussian mixture model is used to compute Fisher vectors, and an SVM model is finally trained. For testing, PCA dimensionality reduction is applied to the videos under test, the Fisher vectors of their spatio-temporal interest points are computed, and classification is performed with the pre-trained SVM model. To keep the training set disjoint from the test set, the test set contains 7 groups of each type of behavior and the remaining 18 groups are used for training.
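A condensed sketch of this recognition pipeline under stated simplifications: scikit-learn is assumed, descriptor shapes and counts are illustrative, and a soft-assignment pooling stands in for the full Fisher vector encoding used in the text:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.svm import LinearSVC

def encode(descriptors, pca, gmm):
    """PCA-reduce the interest-point descriptors of one video, then pool
    the GMM posteriors into one fixed-length vector (a simplified
    stand-in for the Fisher vector)."""
    z = pca.transform(descriptors)
    return gmm.predict_proba(z).mean(axis=0)

# Illustrative data: one 128-D descriptor set per training video.
rng = np.random.default_rng(0)
train_desc = [rng.random((200, 128)) for _ in range(10)]
train_labels = np.array([0, 1] * 5)

stacked = np.vstack(train_desc)
pca = PCA(n_components=32).fit(stacked)
gmm = GaussianMixture(n_components=8, random_state=0).fit(pca.transform(stacked))

X = np.array([encode(d, pca, gmm) for d in train_desc])
svm = LinearSVC().fit(X, train_labels)       # train; reuse for test videos
print(svm.predict(X[:3]))
```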
In order to test and evaluate the technical solution in the embodiment of the present invention, the proposed algorithm is evaluated on the UCF101 human behavior dataset. In the experimental part, the UMAM-Harris detection algorithm is first used to extract spatio-temporal interest points from a specific video; video behaviors on the UCF101 dataset are then classified using the UMAM-Harris corner based algorithm and compared with existing methods.
Under the same experimental setup, spatio-temporal interest points are extracted from the table tennis video named "v_TableTennisShot_g01_c01" in the UCF101 database using the proposed UMAM-Harris detection algorithm, the Harris3D detection algorithm and the SIFT3D detection algorithm respectively; the extraction results are shown in fig. 5, fig. 6 and fig. 7.
In fig. 5, fig. 6 and fig. 7, panels a, b, c and d correspond to video frames 42, 47, 94 and 113 and show the distributions of the spatio-temporal interest points extracted by the Harris3D, SIFT3D and UMAM-Harris detection algorithms; the dots on the video frames mark the positions of the extracted spatio-temporal interest points. Fig. 5 shows the distribution of the Harris3D features, fig. 6 the distribution of the SIFT3D features, and fig. 7 the distribution of the UMAM-Harris corners.
From the above experimental results, it can be seen that the Harris3D features detected by the Harris3D detection algorithm are mainly distributed on the athlete's body, so the joint movements of the athlete can be detected, although a small amount of noise distributed over the video background is also detected. The SIFT3D features extracted by the SIFT3D detection algorithm are very rich but contain more background noise points; these pseudo spatio-temporal interest points interfere with subsequent feature description and classification. Compared with the extraction results of the Harris3D and SIFT3D detection algorithms, most of the UMAM-Harris corners extracted by the UMAM-Harris algorithm are distributed on the athlete's body and contain few background noise points. This is because the UMAM-Harris detection algorithm considers not only the obvious gray-level changes of the moving object in the spatial and temporal domains but also information such as its motion speed and motion direction, so that UMAM-Harris corners containing motion information can be extracted from the video while background noise points are suppressed. The UMAM-Harris corners can therefore accurately locate the moving object and better represent the significantly changing behavior in the video. From this we can conclude that the UMAM-Harris detection algorithm truly extracts spatio-temporal interest points containing unique apparent information and rich motion information, while retaining the robustness and effectiveness of the features detected by the Harris3D detection algorithm.
The embodiment of the invention regards the video as a three-dimensional structure, establishes a unified model of video appearance and motion information using Clifford algebra, and on this basis develops a new space-time Harris corner detection algorithm, the UMAM-Harris corner detection algorithm. Experimental results show that the UMAM-Harris detection algorithm truly extracts spatio-temporal interest points that reflect the unique appearance information of the video in the space domain and the motion change in the time domain, and that it can effectively improve the classification accuracy of video behavior recognition.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.