CN107230220A - Novel space-time Harris corner detection method and device - Google Patents

Novel space-time Harris corner detection method and device

Info

Publication number
CN107230220A
CN107230220A (application CN201710384274.9A)
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710384274.9A
Other languages
Chinese (zh)
Other versions
CN107230220B (en)
Inventor
李岩山
夏荣杰
谢维信
刘鹏
张勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Yifang Technology Co ltd
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN201710384274.9A priority Critical patent/CN107230220B/en
Publication of CN107230220A publication Critical patent/CN107230220A/en
Application granted granted Critical
Publication of CN107230220B publication Critical patent/CN107230220B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
  • Image Analysis (AREA)

Abstract

The present invention, applicable to the field of video image processing, provides a novel space-time Harris corner detection method, including: obtaining the three-dimensional geometric algebraic space of a video based on the spatial-domain information and time-domain information of the video images contained in the video, and constructing the motion vector of each pixel point of the three-dimensional geometric algebraic space; obtaining the apparent-motion vector of each pixel point by using the motion vector of the pixel point and a preset apparent-motion vector algorithm; constructing a space-time second-order matrix from the apparent-motion vectors of the pixel points and constructing a space-time Harris corner response function from that matrix; and calculating the function value of a pixel point according to the space-time Harris corner response function and judging whether the pixel point is a space-time Harris corner of the video. The detection method and device provided by the present invention can extract space-time Harris corners that contain apparent information in the spatial domain while exhibiting clear motion change in the time domain.

Description

Novel space-time Harris corner detection method and device
Technical Field
The invention belongs to the field of video images, and particularly relates to a novel space-time Harris corner detection method and device.
Background
Spatio-temporal interest points are a class of locally invariant features of video and, owing to their uniqueness and descriptive power, are widely applied to video behavior recognition. The Harris3D detection algorithm, extended from the Harris corner detection algorithm for two-dimensional images, is one of the most widely used space-time interest point detection algorithms at present; it can detect simple and unique Harris3D features in video scenes with cluttered backgrounds and has the advantages of robustness and effectiveness.
The Harris3D detection algorithm is built on the traditional Euclidean geometric model, i.e. the ordinary three-dimensional xyz model: the video is regarded as a cube and xyt is treated as equivalent to xyz, so the time axis t is handled in the same way as the spatial axes x and y. However, the spatial domain and the time domain of a video are different in nature while also being correlated with each other, so they cannot be treated identically; simply extending a feature detection algorithm for two-dimensional images to video splits the correlation of video pixel points in the space-time domain. Moreover, the Harris3D detection algorithm detects Harris3D features only from the perspective of pixel-value changes, so it can capture only blurred motion information of the video, whereas only interest points containing clear motion information can provide the information needed for video behavior recognition. In addition, a video contains geometric objects of different dimensions, such as apparent information (pixel points, edges, textures and the like) and motion information, and the traditional Euclidean geometric model cannot establish a unified model for analyzing and processing these geometric objects of different dimensions and the relationships between them.
Therefore, a detection algorithm is needed that can fully reflect the space-time domain correlation of the video and thereby truly extract space-time interest points, namely space-time Harris corners, that contain unique apparent information in the spatial domain and exhibit clear motion change in the time domain.
Disclosure of Invention
The invention provides a novel space-time Harris corner detection method and device, aiming to provide a detection algorithm that can fully reflect the space-time domain correlation of a video and thus extract space-time Harris corners that contain apparent information in the spatial domain and reflect clear motion change in the time domain.
The invention provides a novel space-time Harris corner detection method, which comprises the following steps:
obtaining a three-dimensional geometric algebraic space of a video based on spatial information and time domain information of a video image contained in the video, and constructing a motion vector of a pixel point of the three-dimensional geometric algebraic space;
obtaining an apparent-motion vector of each pixel point by using the motion vector of the pixel point and a preset apparent-motion vector algorithm;
combining the apparent-motion vectors of the pixel points to construct a space-time second-order matrix, and constructing a space-time Harris corner response function according to the space-time second-order matrix;
and calculating the space-time Harris corner response function values of a pixel point p and of all pixel points in its neighborhood according to the space-time Harris corner response function; if the space-time Harris corner response function value of the pixel point p is larger than those of all pixel points in its neighborhood, the point p is a space-time Harris corner of the video.
Further, the constructing the motion vector of the pixel point in the three-dimensional geometric algebraic space includes:
constructing, in the three-dimensional geometric algebraic space, the motion vector v_p pointing from the pixel point at p in the current frame video image to the pixel point p_r in the next frame video image, the motion vector v_p being:
v_p = p_r - p;
wherein p is the pixel point in the current frame video image of the three-dimensional geometric algebraic space, p_r is the pixel point, within the neighborhood centered on p' in the next frame video image, whose pixel value differs least from that of the pixel point at p in the current frame video image, and p' is the pixel point in the next frame video image located at the same position as the pixel point p in the current frame video image.
Further, the preset apparent-motion vector algorithm is:
f'(p) = f(p) + v_p
wherein v_p represents the motion vector of the pixel point at p in the video image, f(p) represents the pixel value of the pixel point at p in the video image, and f'(p) represents the apparent-motion vector of the pixel point at p in the video image.
Further, the constructing a spatio-temporal second-order matrix by combining the apparent-motion vectors of the pixel points includes:
calculating the gradients f'_x, f'_y and f'_t of the apparent-motion vector f'(p) of each pixel point in the x, y and t directions;
constructing a second-order gradient matrix N from the gradients f'_x, f'_y, f'_t of the apparent-motion vector f'(p) of the pixel point in the x, y and t directions, wherein the second-order gradient matrix N is:
N = [ f'_x·f'_x   f'_x·f'_y   f'_x·f'_t ]
    [ f'_x·f'_y   f'_y·f'_y   f'_y·f'_t ]
    [ f'_x·f'_t   f'_y·f'_t   f'_t·f'_t ];
performing convolution on the matrix N with a Gaussian weighting function ω(p) to obtain the space-time second-order matrix M(p), wherein the space-time second-order matrix M(p) is:
M(p) = ω(p) ⊗ N = [ A  D  E ]
                  [ D  B  F ]
                  [ E  F  C ];
where ω denotes the Gaussian weighting function ω(p), ⊗ is the convolution symbol, and A, B, C, D, E and F correspond to the elements of the matrix M(p).
Further, the spatio-temporal Harris corner response function is:
R = det(M) - k(trace(M))^3 = (ABC + 2DEF - BE^2 - AF^2 - CD^2) - k(A + B + C)^3
wherein R is the space-time Harris corner response function value and k is an empirical constant; det(M) denotes the determinant of the matrix M(p) and trace(M) denotes the trace of the matrix M(p), expressed as follows:
det(M) = λ_1 λ_2 λ_3 = ABC + 2DEF - BE^2 - AF^2 - CD^2
trace(M) = λ_1 + λ_2 + λ_3 = A + B + C
wherein λ_1, λ_2 and λ_3 are the eigenvalues of the matrix M, and A, B, C, D, E and F correspond to the elements of the matrix M(p).
The invention also provides a novel space-time Harris corner detection device, which comprises:
the preprocessing module is used for obtaining a three-dimensional geometric algebraic space of the video based on spatial information and time domain information of a video image contained in the video;
the motion vector construction module is used for constructing motion vectors of pixel points of the three-dimensional geometric algebraic space;
the apparent-motion vector construction module is used for obtaining the apparent-motion vector of each pixel point by using the motion vector of the pixel point and a preset apparent-motion vector algorithm;
the space-time second-order matrix construction module is used for constructing a space-time second-order matrix by combining the apparent-motion vectors of the pixel points;
the response function constructing module is used for constructing a space-time Harris corner response function according to the space-time second-order matrix;
and the space-time Harris corner acquisition module is used for calculating the space-time Harris corner response function values of a pixel point p and of all pixel points in its neighborhood according to the space-time Harris corner response function; if the space-time Harris corner response function value of the pixel point p is larger than those of all pixel points in its neighborhood, the point p is a space-time Harris corner of the video.
Further, the motion vector construction module is specifically configured to construct, in the three-dimensional geometric algebraic space, the motion vector v_p pointing from the pixel point at p in the current frame video image to the pixel point p_r in the next frame video image according to the following formula, the motion vector v_p being:
v_p = p_r - p;
wherein p is the pixel point in the current frame video image of the three-dimensional geometric algebraic space, p_r is the pixel point, within the neighborhood centered on p' in the next frame video image, whose pixel value differs least from that of the pixel point at p in the current frame video image, and p' is the pixel point in the next frame video image located at the same position as the pixel point p in the current frame video image.
Further, the preset apparent-motion vector algorithm is:
f'(p) = f(p) + v_p
wherein v_p represents the motion vector of the pixel point at p in the video image, f(p) represents the pixel value of the pixel point at p in the video image, and f'(p) represents the apparent-motion vector of the pixel point at p in the video image.
Further, the spatio-temporal second-order matrix constructing module includes:
a gradient calculation module for calculating the gradients f'_x, f'_y and f'_t of the apparent-motion vector f'(p) of each pixel point in the x, y and t directions;
a second-order gradient matrix construction module for constructing a second-order gradient matrix N from the gradients f'_x, f'_y, f'_t of the apparent-motion vector f'(p) of the pixel point in the x, y and t directions, wherein the second-order gradient matrix N is:
N = [ f'_x·f'_x   f'_x·f'_y   f'_x·f'_t ]
    [ f'_x·f'_y   f'_y·f'_y   f'_y·f'_t ]
    [ f'_x·f'_t   f'_y·f'_t   f'_t·f'_t ];
a space-time second-order matrix obtaining module for performing convolution on the matrix N with a Gaussian weighting function ω(p) to obtain the space-time second-order matrix M(p), the space-time second-order matrix M(p) being:
M(p) = ω(p) ⊗ N = [ A  D  E ]
                  [ D  B  F ]
                  [ E  F  C ];
where ω denotes the Gaussian weighting function ω(p), ⊗ is the convolution symbol, and A, B, C, D, E and F correspond to the elements of the matrix M(p).
Further, the spatio-temporal Harris corner response function is:
R = det(M) - k(trace(M))^3 = (ABC + 2DEF - BE^2 - AF^2 - CD^2) - k(A + B + C)^3
wherein R is the space-time Harris corner response function value and k is an empirical constant; det(M) denotes the determinant of the matrix M(p) and trace(M) denotes the trace of the matrix M(p), expressed as follows:
det(M) = λ_1 λ_2 λ_3 = ABC + 2DEF - BE^2 - AF^2 - CD^2
trace(M) = λ_1 + λ_2 + λ_3 = A + B + C
wherein λ_1, λ_2 and λ_3 are the eigenvalues of the matrix M, and A, B, C, D, E and F correspond to the elements of the matrix M(p).
Compared with the prior art, the invention has the following beneficial effects. The invention provides a novel space-time Harris corner detection method and device: the three-dimensional geometric algebraic space of a video is obtained based on the spatial-domain information and time-domain information of the video images contained in the video; the motion vectors of the pixel points of the three-dimensional geometric algebraic space are constructed and combined into apparent-motion vectors, yielding a unified model of video appearance and motion information; a space-time second-order matrix is constructed on the basis of this unified model and a space-time Harris corner response function is constructed from it; and the space-time Harris corner response function values of a pixel point p and of all pixel points in its neighborhood are calculated and compared to judge whether the point p is a space-time Harris corner of the video. Compared with the prior art, the invention regards the video as a three-dimensional structure, uses geometric algebra to establish a unified model of video appearance and motion information, that is, a single model for the geometric objects of different dimensions carried by the appearance and motion information and for the relationships between them, and proposes a space-time Harris corner detection algorithm based on this model, so that the detection algorithm can fully reflect the space-time domain correlation of the video and the extracted space-time interest points contain unique appearance information in the spatial domain while representing clear motion change in the time domain.
Drawings
FIG. 1 is a flow chart of a new spatio-temporal Harris corner detection method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of the refinement step of step S3 provided by the embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a new spatio-temporal Harris corner detection apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a refinement function module of the spatio-temporal second-order matrix building module according to an embodiment of the present invention;
fig. 5a, fig. 5b, fig. 5c and fig. 5d are schematic diagrams illustrating distribution of Harris3D features when video frames are set to 42, 47, 94 and 113 respectively according to an embodiment of the present invention;
fig. 6a, fig. 6b, fig. 6c and fig. 6d are schematic diagrams of distribution of SIFT3D features when video frames provided by the embodiment of the present invention are respectively set to 42, 47, 94 and 113;
fig. 7a, fig. 7b, fig. 7c, and fig. 7d are schematic diagrams illustrating distribution of UMAM-Harris corners when video frames are set to 42, 47, 94, and 113, respectively, according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the prior art, because the space-time domain correlation of the video cannot be fully reflected, space-time interest points, i.e. space-time Harris corners, that contain unique apparent information in the spatial domain and represent clear motion change in the time domain cannot be truly extracted.
To solve this technical problem, the invention provides a novel space-time Harris corner detection method that regards the video as a three-dimensional structure, uses geometric algebra to establish a unified model of video appearance and motion information, that is, a single model for the geometric objects of different dimensions carried by the appearance information and motion information and for the relationships between them, and proposes a novel space-time Harris corner detection algorithm based on this model, so that the detection algorithm can fully reflect the space-time domain correlation of the video and the extracted space-time interest points contain unique appearance information in the spatial domain while representing clear motion change in the time domain.
Referring to fig. 1, a new spatio-temporal Harris corner detection method according to an embodiment of the present invention includes:
step S1, obtaining a three-dimensional geometric algebraic space of a video based on spatial domain information and time domain information of a video image contained in the video, and constructing motion vectors of pixel points of the three-dimensional geometric algebraic space;
in the embodiment of the present invention, modeling is performed in a video, a sequence of video images (video frames) included in the video may be represented as a three-dimensional stereo structure, including spatial information and temporal information, and it should be noted that both appearance information and motion information in the video may be represented by geometric algebra in the three-dimensional stereo structure. In view of the simplicity of the operation of geometric (Clifford) algebra on vector data and geometric information, the video is modeled herein using three-dimensional geometric algebra as a mathematical framework, and a representation model of a video image sequence under the geometric algebra framework will be explained below.
Let R^3 be the three-dimensional Euclidean space formed by the spatial-domain information and time-domain information of the video image sequence contained in the video, with orthonormal basis {e_1, e_2, e_3}. The geometric algebra spanned by this orthonormal basis over R^3 is the geometric algebraic space of the video, i.e. the three-dimensional geometric algebraic space, which in the embodiment of the present invention is subsequently written as G_3. One set of canonical basis elements of G_3 is:
{1, e_1, e_2, e_3, e_1∧e_2, e_2∧e_3, e_1∧e_3, e_1∧e_2∧e_3}   (1)
where ∧ denotes the outer product of geometric algebra; e_1∧e_2, e_2∧e_3 and e_1∧e_3 are the three independent double outer products formed from the three orthogonal bases e_1, e_2 and e_3, and geometrically each represents the plane spanned by the two corresponding vectors; e_1∧e_2∧e_3 is the triple outer product, e_1∧e_2∧e_3 = (e_1∧e_2)e_3, whose geometric interpretation is the directed geometric body obtained by moving the double outer product e_1∧e_2 along the vector e_3. {e_1, e_2, e_3} can be regarded as the basis vectors {x, y, t} of the three-dimensional vector subspace of G_3.
The video sequence F can be represented as:
F=f(p) (2)
wherein p ∈ G_3 and p = x e_1 + y e_2 + t e_3; x and y represent the spatial coordinates, with 0 < x < S_x and 0 < y < S_y; t represents the time-domain coordinate, with 0 < t < S_t. f(p) represents the apparent value of the pixel at p in the video F.
Let p_1, p_2 ∈ G_3, with p_1 = x_1 e_1 + y_1 e_2 + t_1 e_3 and p_2 = x_2 e_1 + y_2 e_2 + t_2 e_3; then their geometric product can be expressed as:
p_1 p_2 = p_1 · p_2 + p_1 ∧ p_2   (3)
where · denotes the inner product and ∧ denotes the outer product. This means that the geometric product of two vectors is the sum of their inner product (p_1 · p_2) and their outer product (p_1 ∧ p_2).
In G_3, the displacement between p_1 and p_2 can be represented by Δp, i.e.:
Δp = p_1 - p_2 = (x_1 - x_2) e_1 + (y_1 - y_2) e_2 + (t_1 - t_2) e_3   (4)
It represents the vector pointing from p_2 to p_1; it is not only a measure of the distance between the two pixel points but also reflects the motion of the pixel points in the video sequence.
The above is an introduction of the three-dimensional geometric algebraic space of the video in the embodiment of the present invention.
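To make formulas (3) and (4) concrete, the following is a minimal numpy sketch (for illustration only, not part of the patent text) that represents vectors of G_3 by their coefficients on {e_1, e_2, e_3} and returns the geometric product as an inner-product scalar plus the bivector coefficients on {e_1∧e_2, e_2∧e_3, e_1∧e_3}; the example values are arbitrary.

```python
import numpy as np

def geometric_product(p1, p2):
    """Formula (3): the geometric product of two vectors of G_3 is the sum of
    their inner product (a scalar) and their outer product (a bivector).
    Vectors are given by coefficients on {e1, e2, e3}; the bivector is
    returned by its coefficients on {e1^e2, e2^e3, e1^e3} from formula (1)."""
    inner = float(np.dot(p1, p2))
    outer = np.array([p1[0] * p2[1] - p1[1] * p2[0],   # coefficient of e1^e2
                      p1[1] * p2[2] - p1[2] * p2[1],   # coefficient of e2^e3
                      p1[0] * p2[2] - p1[2] * p2[0]])  # coefficient of e1^e3
    return inner, outer

# Formula (4): the displacement between two pixel positions of G_3 is itself
# a vector and reflects both spatial and temporal separation.
p1 = np.array([3.0, 5.0, 2.0])   # x1*e1 + y1*e2 + t1*e3
p2 = np.array([1.0, 4.0, 1.0])   # x2*e1 + y2*e2 + t2*e3
delta_p = p1 - p2                # (x1-x2)e1 + (y1-y2)e2 + (t1-t2)e3
```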
In the embodiment of the present invention, constructing the motion vector of a pixel point of the three-dimensional geometric algebraic space specifically means constructing the motion vector v_p that points from the pixel point at p in the current frame video image to the pixel point p_r in the next frame video image, the motion vector v_p being:
v_p = p_r - p;   (5)
where p ∈ G_3, p = x_i e_1 + y_j e_2 + t_k e_3, p' = x_i e_1 + y_j e_2 + (t_k + 1) e_3, and S is the set of points of the l × l neighborhood centered at p' on the plane t = t_k + 1.
Here p is the pixel point at p in the current frame video image of the three-dimensional geometric algebraic space, and p_r is the pixel point in S, i.e. in the neighborhood centered on p' in the next frame video image, whose pixel value differs least from that of the pixel point at p in the current frame video image; p' is the pixel point in the next frame video image located at the same position as the pixel point p in the current frame video image. v_p reflects the motion change of the pixel point at p, including the change of motion direction and of motion speed, and this change can be expressed by the modulus of v_p. In general, the larger the change of motion direction or motion speed of the pixel point at p, the larger the modulus of v_p, and vice versa.
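A minimal Python sketch of formula (5) is given below for illustration; it is not the patent's implementation, the video is assumed to be a grey-level array of shape (T, H, W), and the neighborhood size l is an assumed parameter.

```python
import numpy as np

def motion_vector(video, x, y, t, l=5):
    """Formula (5): v_p = p_r - p. p = (x, y, t) indexes a pixel of the
    current frame; p' is the same spatial position in frame t+1; p_r is the
    pixel in the l*l neighborhood S of p' whose value is closest to f(p)."""
    r = l // 2
    target = float(video[t, y, x])
    best_diff, best_v = np.inf, np.array([0.0, 0.0, 1.0])
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy < video.shape[1] and 0 <= xx < video.shape[2]:
                diff = abs(float(video[t + 1, yy, xx]) - target)
                if diff < best_diff:
                    # p_r - p expressed on (e1, e2, e3): one frame forward in t
                    best_diff, best_v = diff, np.array([dx, dy, 1.0])
    return best_v
```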
Step S2, obtaining the apparent-motion vector of each pixel point by using the motion vector of the pixel point and a preset apparent-motion vector algorithm;
the preset apparent-motion vector algorithm is as follows:
f'(p) = f(p) + v_p   (6)
where f'(p) represents the apparent-motion vector of the pixel point at p in the video image, v_p represents the motion vector of the pixel point at p in the video image, and f(p) represents the apparent value of the pixel point at p in the video image; in the embodiment of the present invention f(p) is the pixel value of the pixel point at p in the video image.
The newly defined f'(p) contains both scalar information and vector information; it reflects not only the apparent information but also the change of motion direction and motion speed.
Based on the above definition, a unified model of video appearance and motion, UMAM (Unified Model of Appearance and Motion), abbreviated F', is constructed as follows:
F' = f'(p)   (7)
where f'(p) is a function taking the pixel point p ∈ G_3 as its argument, also called the apparent-motion vector (AMV). As can be seen from the above analysis, the UMAM not only contains the apparent information of the video but also reflects the local motion information of the video, including motion direction and speed; the space-time Harris corners mentioned in the embodiments of the present invention refer to UMAM-Harris corners.
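As an illustration only, the sketch below builds the UMAM volume F' of formulas (6)-(7), reusing the motion_vector helper from the previous sketch; storing f'(p) = f(p) + v_p as a 4-channel array [f, v_x, v_y, v_t] is an assumed encoding of the scalar-plus-vector quantity, not a data layout prescribed by the patent.

```python
import numpy as np

def apparent_motion_volume(video, l=5):
    """Formulas (6)-(7): F' = f'(p) with f'(p) = f(p) + v_p, stored here as
    one 4-channel value per pixel: channel 0 is the apparent value f(p),
    channels 1..3 hold the motion vector v_p on (e1, e2, e3)."""
    T, H, W = video.shape
    F = np.zeros((T, H, W, 4), dtype=np.float64)
    F[..., 0] = video                       # scalar (apparent) part
    for t in range(T - 1):                  # the last frame has no successor
        for y in range(H):
            for x in range(W):
                F[t, y, x, 1:] = motion_vector(video, x, y, t, l)
    return F
```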
Step S3, constructing a space-time second-order matrix by combining the apparent-motion vectors of the pixel points, and constructing a space-time Harris corner response function according to the space-time second-order matrix;
in order to better understand the technical solution of the embodiment of the present invention, the following describes a three-dimensional geometric algebraic space of a videoWeighted correlation function of the upper AMV.
Let p ∈ G_3 and let p' be a point in the neighborhood of p with coordinates p + Δp; then the weighted correlation function of f'(p) and f'(p') is defined as follows:
where ω(p) is a three-dimensional Gaussian kernel function G(p, σ), ⊗ denotes the convolution operation over a Gaussian window function of size l × l × l centered at p, and σ is the scale factor of the Gaussian kernel function.
Let p ∈ G_3; then the Gaussian function G(p, σ) on G_3 provided by the embodiment of the invention is defined as follows:
where σ is the scale factor of the Gaussian function G(p, σ) on G_3, ∧ denotes the outer product, · denotes the inner product, and the size of the Gaussian window function is l × l × l with l = 6σ + 1.
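For illustration, a minimal sketch of such a three-dimensional Gaussian weighting window follows; it assumes a separable Gaussian on the x, y and t axes with the window size l = 6σ + 1 stated above, and the normalization to unit sum is an added convenience rather than something the patent specifies.

```python
import numpy as np

def gaussian_window(sigma=1.0):
    """Three-dimensional Gaussian weighting function omega(p) used as the
    smoothing window: size l = 6*sigma + 1 along each of x, y and t."""
    l = int(6 * sigma + 1)
    r = l // 2
    ax = np.arange(-r, r + 1, dtype=np.float64)
    x, y, t = np.meshgrid(ax, ax, ax, indexing="ij")
    g = np.exp(-(x**2 + y**2 + t**2) / (2.0 * sigma**2))
    return g / g.sum()
```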
In order to better understand the technical solution in the embodiment of the present invention, it is proved below that the above Gaussian function G(p, σ) is a valid Gaussian function on the three-dimensional geometric algebraic space G_3 of the video.
Proof: the term |p · σ^(-2)| can be further expanded as follows.
σ ∧ σ can be further expanded as follows.
Then, equation (9) can be written as:
Based on the above derivation it can be seen that, when converted into this form, G(p, σ) is consistent with the general three-dimensional Gaussian function; therefore, the Gaussian function G(p, σ) provided in the embodiment of the present invention is a valid Gaussian function on G_3.
Then, a Taylor series expansion of f'(p + Δp) is performed and the first-order approximation is taken:
f'(p + Δp) ≈ f'(p) + Δx·f'_x + Δy·f'_y + Δt·f'_t
where Δx, Δy and Δt are the distances between p and p' in the x, y and t directions, and f'_x, f'_y, f'_t are the gradients of f'(p) in the x, y and t directions, i.e.
f'_x = ∂f'(p)/∂x   (14)
f'_y = ∂f'(p)/∂y   (15)
f'_t = ∂f'(p)/∂t   (16)
The results of equations (14), (15) and (16) above are all vectors.
Thus, equation (8) can be approximated (formulas (17) and (18)) by the quadratic form [Δx Δy Δt] M(p) [Δx Δy Δt]^T,
where
M(p) = ω(p) ⊗ [ f'_x·f'_x   f'_x·f'_y   f'_x·f'_t ]
               [ f'_x·f'_y   f'_y·f'_y   f'_y·f'_t ]
               [ f'_x·f'_t   f'_y·f'_t   f'_t·f'_t ]
     = [ A  D  E ]
       [ D  B  F ]
       [ E  F  C ]   (19)
In formula (19), f'_x, f'_y and f'_t respectively represent the gradients of the AMV in the x, y and t directions, ω is the Gaussian weighting function ω(p) in formula (8), ⊗ is the convolution symbol, and A, B, C, D, E and F correspond to the elements of the matrix M(p).
Based on the weighted correlation function, the refinement of step S3 in the embodiment of the present invention is described below; as shown in Fig. 2, step S3 includes:
S31, calculating, by combining formulas (14), (15) and (16), the gradients f'_x, f'_y and f'_t of the apparent-motion vector f'(p) of each pixel point in the x, y and t directions;
S32, constructing a second-order gradient matrix N from the gradients f'_x, f'_y, f'_t of the apparent-motion vector f'(p) of the pixel point in the x, y and t directions, wherein the second-order gradient matrix N is:
N = [ f'_x·f'_x   f'_x·f'_y   f'_x·f'_t ]
    [ f'_x·f'_y   f'_y·f'_y   f'_y·f'_t ]
    [ f'_x·f'_t   f'_y·f'_t   f'_t·f'_t ]
Specifically, the acquisition process of the second-order gradient matrix N is shown in formula (17).
S33, performing convolution on the matrix N with the Gaussian weighting function ω(p) to obtain the space-time second-order matrix M(p), wherein the space-time second-order matrix M(p) is:
M(p) = ω(p) ⊗ N = [ A  D  E ]
                  [ D  B  F ]
                  [ E  F  C ]
where ω is the Gaussian weighting function ω(p) in formula (10), the scale factor of the Gaussian function is taken as σ = 1, ⊗ is the convolution symbol, and A, B, C, D, E and F correspond to the elements of the matrix M(p).
Specifically, the acquisition process of the space-time second-order matrix M(p) is shown in formula (17).
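Steps S31-S33 can be sketched in a few lines of Python; this is an illustrative sketch only, assuming the 4-channel UMAM volume from the earlier sketch, finite-difference gradients (the patent does not fix a differencing scheme), and the gaussian_window helper defined above.

```python
import numpy as np
from scipy.ndimage import convolve

def spacetime_second_order_matrix(F, sigma=1.0):
    """Steps S31-S33: channel-wise gradients f'_x, f'_y, f'_t of the UMAM
    volume F (shape (T, H, W, 4)), the dot products forming the entries of
    N, and Gaussian smoothing of each entry, giving the elements A..F of
    M(p) = [[A, D, E], [D, B, F], [E, F, C]]."""
    ft, fy, fx = np.gradient(F, axis=(0, 1, 2))      # f'_t, f'_y, f'_x (vector-valued)
    dot = lambda a, b: np.sum(a * b, axis=-1)        # inner product of the 4-channel values
    w = gaussian_window(sigma)                       # omega(p), size l = 6*sigma + 1
    smooth = lambda img: convolve(img, w, mode="nearest")
    A = smooth(dot(fx, fx)); B = smooth(dot(fy, fy)); C = smooth(dot(ft, ft))
    D = smooth(dot(fx, fy)); E = smooth(dot(fx, ft)); F_ = smooth(dot(fy, ft))
    return A, B, C, D, E, F_
```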
S34, constructing a space-time Harris corner response function according to the space-time second-order matrix;
the spatio-temporal Harris corner response function is:
R = det(M) - k(trace(M))^3 = (ABC + 2DEF - BE^2 - AF^2 - CD^2) - k(A + B + C)^3
where R is the space-time Harris corner response function value and k is an empirical constant, taken as k = 0.04 in the embodiment of the invention; det(M) denotes the determinant of the matrix M(p) and trace(M) denotes the trace of the matrix M(p), expressed as follows:
det(M) = λ_1 λ_2 λ_3 = ABC + 2DEF - BE^2 - AF^2 - CD^2
trace(M) = λ_1 + λ_2 + λ_3 = A + B + C
where λ_1, λ_2 and λ_3 are the eigenvalues of the matrix M, and A, B, C, D, E and F correspond to the elements of the matrix M(p).
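A direct Python transcription of this response function, given for illustration and continuing from the elements A..F returned by the previous sketch:

```python
def corner_response(A, B, C, D, E, F, k=0.04):
    """R = det(M) - k*trace(M)^3 with
    det(M) = ABC + 2DEF - B*E^2 - A*F^2 - C*D^2 and trace(M) = A + B + C;
    k = 0.04 is the empirical constant used in the embodiment."""
    det_M = A * B * C + 2 * D * E * F - B * E**2 - A * F**2 - C * D**2
    trace_M = A + B + C
    return det_M - k * trace_M**3
```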
Step S4, calculating the space-time Harris corner response function values of a pixel point p and of all pixel points in its neighborhood according to the space-time Harris corner response function; if the space-time Harris corner response function value of the pixel point p is larger than those of all pixel points in its neighborhood, the point p is a space-time Harris corner of the video.
Specifically, the UMAM-Harris corner response function values of the point p and of the other pixel points in its h × h × h neighborhood are compared; if R(p) is the maximum within the h × h × h neighborhood, the point p is a UMAM-Harris corner of the video.
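Step S4 amounts to a local-maximum test on the response volume R; the sketch below uses scipy's maximum filter for that test, and the optional threshold on R is an added assumption, not a condition stated in the patent.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def umam_harris_corners(R, h=5, threshold=0.0):
    """A pixel p is kept as a UMAM-Harris corner when R(p) is the maximum
    within its h*h*h neighborhood; returns (t, y, x) coordinates."""
    local_max = R == maximum_filter(R, size=h, mode="nearest")
    return np.argwhere(local_max & (R > threshold))
```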
Referring to fig. 3, a schematic structural diagram of a new spatio-temporal Harris corner detection apparatus in an embodiment of the present invention is shown, where the detection apparatus includes:
the preprocessing module 1 is used for obtaining the three-dimensional geometric algebraic space of the video based on the spatial-domain information and time-domain information of the video images contained in the video;
the motion vector construction module 2 is used for constructing motion vectors of pixel points of the three-dimensional geometric algebraic space;
Specifically, the motion vector construction module is configured to construct, in the three-dimensional geometric algebraic space, the motion vector v_p pointing from the pixel point at p in the current frame video image to the pixel point p_r in the next frame video image according to the following formula, the motion vector v_p being:
v_p = p_r - p;
wherein p is the pixel point in the current frame video image of the three-dimensional geometric algebraic space, p_r is the pixel point, within the neighborhood centered on p' in the next frame video image, whose pixel value differs least from that of the pixel point at p in the current frame video image, and p' is the pixel point in the next frame video image located at the same position as the pixel point p in the current frame video image.
And the apparent-motion vector construction module 3 is used for obtaining the apparent-motion vector of each pixel point by using the motion vector of the pixel point and a preset apparent-motion vector algorithm; the apparent-motion vectors of the pixel points of the three-dimensional geometric algebraic space constitute the established unified model of video appearance and motion information.
Specifically, the preset apparent-motion vector algorithm is:
f'(p) = f(p) + v_p
wherein v_p represents the motion vector of the pixel point at p in the video image, f(p) represents the pixel value of the pixel point at p in the video image, and f'(p) represents the apparent-motion vector of the pixel point at p in the video image.
The space-time second-order matrix construction module 4 is used for constructing a space-time second-order matrix by combining the apparent-motion vectors of the pixel points;
specifically, as shown in fig. 4, the refinement function module of the spatio-temporal second-order matrix construction module includes:
a gradient calculation module 41 for calculating the gradients f'_x, f'_y and f'_t of the apparent-motion vector f'(p) of each pixel point in the x, y and t directions;
a second-order gradient matrix construction module 42 for constructing a second-order gradient matrix N from the gradients f'_x, f'_y, f'_t of the apparent-motion vector f'(p) of the pixel point in the x, y and t directions, wherein the second-order gradient matrix N is:
N = [ f'_x·f'_x   f'_x·f'_y   f'_x·f'_t ]
    [ f'_x·f'_y   f'_y·f'_y   f'_y·f'_t ]
    [ f'_x·f'_t   f'_y·f'_t   f'_t·f'_t ];
a space-time second-order matrix obtaining module 43 for performing convolution on the matrix N with a Gaussian weighting function ω(p) to obtain the space-time second-order matrix M(p), the space-time second-order matrix M(p) being:
M(p) = ω(p) ⊗ N = [ A  D  E ]
                  [ D  B  F ]
                  [ E  F  C ];
where ω denotes the Gaussian weighting function ω(p), ⊗ is the convolution symbol, and A, B, C, D, E and F correspond to the elements of the matrix M(p).
A response function constructing module 5, configured to construct a spatio-temporal Harris corner response function according to the spatio-temporal second order matrix;
specifically, the spatio-temporal Harris corner response function is:
R = det(M) - k(trace(M))^3 = (ABC + 2DEF - BE^2 - AF^2 - CD^2) - k(A + B + C)^3
where R is the space-time Harris corner response function value and k is an empirical constant, taken as k = 0.04 in the embodiment of the invention; det(M) denotes the determinant of the matrix M(p) and trace(M) denotes the trace of the matrix M(p), expressed as follows:
det(M) = λ_1 λ_2 λ_3 = ABC + 2DEF - BE^2 - AF^2 - CD^2
trace(M) = λ_1 + λ_2 + λ_3 = A + B + C
where λ_1, λ_2 and λ_3 are the eigenvalues of the matrix M, and A, B, C, D, E and F correspond to the elements of the matrix M(p).
And the space-time Harris corner acquisition module 6 is used for calculating the space-time Harris corner response function values of a pixel point p and of all pixel points in its neighborhood according to the space-time Harris corner response function; if the space-time Harris corner response function value of the pixel point p is larger than those of all pixel points in its neighborhood, the point p is a space-time Harris corner of the video.
Specifically, the UMAM-Harris corner response function values of the point p and of the other pixel points in its h × h × h neighborhood are compared; if R(p) is the maximum within the h × h × h neighborhood, the point p is a UMAM-Harris corner of the video.
In the embodiment of the invention, the three-dimensional geometric algebraic space of a video is obtained based on the spatial-domain information and time-domain information of the video images contained in the video; the motion vectors of the pixel points of the three-dimensional geometric algebraic space are constructed and combined into apparent-motion vectors, yielding a unified model of video appearance and motion information; a space-time second-order matrix is constructed on the basis of the unified model and a space-time Harris corner response function is constructed from it; the space-time Harris corner response function values of a pixel point p and of all pixel points in its neighborhood are calculated according to the response function, and the values of the point p and of the other pixel points in its neighborhood are compared to judge whether the point p is a space-time Harris corner of the video. Compared with the prior art, the invention regards the video as a three-dimensional structure, uses geometric algebra to establish a unified model of video appearance and motion information, that is, a single model for the geometric objects of different dimensions carried by the appearance and motion information and for the relationships between them, and proposes a space-time Harris corner detection algorithm based on this model, so that the detection algorithm can fully reflect the space-time domain correlation of the video and the extracted space-time interest points contain unique appearance information in the spatial domain while representing clear motion change in the time domain.
To verify the effectiveness of the new space-time Harris corner detection method, the embodiment of the present invention evaluates the proposed UMAM-Harris detection algorithm on the popular video behavior recognition dataset UCF101. UCF101 is one of the largest current video behavior recognition datasets of real scenes; it contains 101 behavior categories and 13320 videos in total, each category consists of 25 groups, each group contains 4-7 videos, and the videos within a group are performed by the same person in the same scene with the same background and shooting angle. According to the moving objects, the 101 categories are divided into 5 major classes: Human-Object Interaction, Body-Motion Only, Human-Human Interaction, Playing Musical Instruments, and Sports. UCF101 covers varied conditions, including camera motion, complex backgrounds, occlusion, different lighting and low resolution, and is currently one of the most challenging video behavior recognition datasets.
The space-time interest points used in the experiments comprise the UMAM-Harris corners extracted by the UMAM-Harris detection algorithm, the Harris3D features extracted by the Harris3D detection algorithm and the SIFT3D features extracted by the SIFT3D detection algorithm. For each video, the space-time interest points are extracted and the ST-SIFT descriptor of each extracted interest point is computed. For the training videos, the space-time interest points are subsampled, PCA is used to reduce the dimensionality of the descriptors, a pre-trained Gaussian mixture model is used to compute Fisher vectors, and finally an SVM model is trained. For testing, PCA dimensionality reduction is applied to the videos to be tested, the Fisher vectors of their space-time interest points are computed, and classification is performed with the pre-trained SVM model. To keep the training set and the test set from sharing videos, the test set contains 7 groups of each type of behavior and the remaining 18 groups are used for training.
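The training side of this pipeline can be sketched as follows with scikit-learn; this is a simplified illustration only: the averaged GMM posterior is used as a stand-in for the Fisher-vector encoding described above, and all parameter values (n_pca, n_gauss) are assumptions rather than the settings used in the experiments.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.svm import LinearSVC

def train_action_classifier(train_descs, train_labels, n_pca=64, n_gauss=32):
    """PCA on interest-point descriptors, a GMM codebook, a per-video
    encoding (soft bag-of-words here, as a simplified stand-in for Fisher
    vectors) and a linear SVM. train_descs is a list with one
    (n_points, dim) descriptor array per training video."""
    pca = PCA(n_components=n_pca).fit(np.vstack(train_descs))
    reduced = [pca.transform(d) for d in train_descs]
    gmm = GaussianMixture(n_components=n_gauss, covariance_type="diag").fit(np.vstack(reduced))
    encode = lambda d: gmm.predict_proba(d).mean(axis=0)   # per-video encoding
    X = np.array([encode(d) for d in reduced])
    svm = LinearSVC().fit(X, train_labels)
    return pca, gmm, svm
```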
In order to test and evaluate the technical solution in the embodiment of the present invention, the algorithm proposed by us is evaluated on the UCF101 human behavior data set. In an experimental part, a UMAM-Harris detection algorithm is used for extracting spatio-temporal interest points on a specific video, then an algorithm based on UMAM-Harris corner points is used for classifying video behaviors on a UCF101 data set, and the classification is compared with a current existing method.
Under the same experimental setup, we extract space-time interest points from a table tennis video named "v_TableTennisShot_g01_c01" in the UCF101 database using the proposed UMAM-Harris detection algorithm, the Harris3D detection algorithm and the SIFT3D detection algorithm respectively; the extraction results are shown in Fig. 5, Fig. 6 and Fig. 7.
Panels a, b, c and d of Figs. 5, 6 and 7 show the distributions of the space-time interest points extracted by the three detection algorithms Harris3D, SIFT3D and UMAM-Harris for video frames 42, 47, 94 and 113; the dots on the video frames mark the positions of the extracted space-time interest points. Fig. 5 corresponds to the distribution of Harris3D features, Fig. 6 to the distribution of SIFT3D features, and Fig. 7 to the distribution of UMAM-Harris corners.
From the above experimental results it can be seen that the Harris3D features detected by the Harris3D detection algorithm are mainly distributed on the athlete's body, so the articulated motion of the athlete can be detected, but a small number of noise points distributed on the video background are also detected. The SIFT3D features extracted by the SIFT3D detection algorithm are very rich but contain more background noise points, and these pseudo space-time interest points interfere with subsequent feature description and classification. Compared with the extraction results of the Harris3D and SIFT3D detection algorithms, most of the UMAM-Harris corners extracted by the UMAM-Harris algorithm are distributed on the athlete's body and contain very few background noise points. This is because the UMAM-Harris detection algorithm considers not only the obvious grey-level changes of the moving object in the spatial and time domains but also its motion speed, motion direction and other information, so that UMAM-Harris corners containing motion information can be extracted from the video while background noise points are suppressed. The UMAM-Harris corners can therefore accurately locate the moving object and better represent the behaviors with significant change in the video. From this we can conclude that the UMAM-Harris detection algorithm truly extracts space-time interest points containing unique apparent information and rich motion information while retaining the robustness and effectiveness of the features detected by the Harris3D detection algorithm.
The embodiment of the invention regards the video as a three-dimensional structure, establishes a unified model of video appearance and motion information with Clifford algebra, and on this basis develops a new space-time Harris corner detection algorithm, the UMAM-Harris detection algorithm. Experimental results show that the UMAM-Harris detection algorithm can truly extract space-time interest points that reflect the unique appearance information of the video in the spatial domain and the motion change in the time domain, and can effectively improve the classification accuracy of video behavior recognition.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A new space-time Harris corner detection method is characterized by comprising the following steps:
obtaining a three-dimensional geometric algebraic space of a video based on spatial information and time domain information of a video image contained in the video, and constructing a motion vector of a pixel point of the three-dimensional geometric algebraic space;
obtaining an apparent-motion vector of each pixel point by using the motion vector of the pixel point and a preset apparent-motion vector algorithm;
combining the apparent-motion vectors of the pixel points to construct a space-time second-order matrix, and constructing a space-time Harris corner response function according to the space-time second-order matrix;
and calculating a space-time Harris corner response function value of a certain pixel point p and all pixel points in the neighborhood of the certain pixel point p according to the space-time Harris corner response function, wherein if the space-time Harris corner response function value of the pixel point p is larger than the space-time Harris corner response function values of all pixel points in the neighborhood of the certain pixel point p, the point p is the space-time Harris corner of the video.
2. The spatio-temporal Harris corner detection method of claim 1, wherein the constructing motion vectors for pixel points of the three-dimensional geometric algebraic space comprises:
constructing, in the three-dimensional geometric algebraic space, the motion vector v_p pointing from the pixel point at p in the current frame video image to the pixel point p_r in the next frame video image, the motion vector v_p being:
v_p = p_r - p;
wherein p is the pixel point in the current frame video image of the three-dimensional geometric algebraic space, p_r is the pixel point, within the neighborhood centered on p' in the next frame video image, whose pixel value differs least from that of the pixel point at p in the current frame video image, and p' is the pixel point in the next frame video image located at the same position as the pixel point p in the current frame video image.
3. The spatiotemporal Harris corner detection method of claim 2, wherein the preset apparent-motion vector algorithm is:
f'(p) = f(p) + v_p
wherein v_p represents the motion vector of the pixel point at p in the video image, f(p) represents the pixel value of the pixel point at p in the video image, and f'(p) represents the apparent-motion vector of the pixel point at p in the video image.
4. The spatio-temporal Harris corner detection method of claim 3, wherein said constructing a spatio-temporal second order matrix in combination with the apparent-motion vectors of the pixel points comprises:
calculating the gradients f'_x, f'_y and f'_t of the apparent-motion vector f'(p) of each pixel point in the x, y and t directions;
constructing a second-order gradient matrix N from the gradients f'_x, f'_y, f'_t of the apparent-motion vector f'(p) of the pixel point in the x, y and t directions, wherein the second-order gradient matrix N is:
N = [ f'_x·f'_x   f'_x·f'_y   f'_x·f'_t ]
    [ f'_x·f'_y   f'_y·f'_y   f'_y·f'_t ]
    [ f'_x·f'_t   f'_y·f'_t   f'_t·f'_t ];
performing convolution calculation on the matrix N by using a Gaussian weighting function omega (p) to obtain a space-time second-order matrix M (p), wherein the space-time second-order matrix M (p) is as follows:
M(p) = ω(p) ⊗ N
     = [ f'_x·f'_x⊗ω   f'_x·f'_y⊗ω   f'_x·f'_t⊗ω ]
       [ f'_x·f'_y⊗ω   f'_y·f'_y⊗ω   f'_y·f'_t⊗ω ]
       [ f'_x·f'_t⊗ω   f'_y·f'_t⊗ω   f'_t·f'_t⊗ω ]
     = [ A  D  E ]
       [ D  B  F ]
       [ E  F  C ]
where ω denotes the Gaussian weighting function ω(p), ⊗ is the convolution symbol, and A, B, C, D, E and F correspond to the elements of the matrix M(p).
5. The spatio-temporal Harris corner detection method of claim 4, wherein the spatio-temporal Harris corner response function is:
R = det(M) - k(trace(M))^3 = (ABC + 2DEF - BE^2 - AF^2 - CD^2) - k(A + B + C)^3, wherein R is the space-time Harris corner response function value and k is an empirical constant; det(M) denotes the determinant of the matrix M(p) and trace(M) denotes the trace of the matrix M(p), expressed as follows:
det(M) = λ_1 λ_2 λ_3 = ABC + 2DEF - BE^2 - AF^2 - CD^2
trace(M) = λ_1 + λ_2 + λ_3 = A + B + C
wherein λ_1, λ_2 and λ_3 are the eigenvalues of the matrix M, and A, B, C, D, E and F correspond to the elements of the matrix M(p).
6. A new space-time Harris corner detection device is characterized by comprising:
the preprocessing module is used for obtaining a three-dimensional geometric algebraic space of the video based on spatial information and time domain information of a video image contained in the video;
the motion vector construction module is used for constructing motion vectors of pixel points of the three-dimensional geometric algebraic space;
the apparent-motion vector construction module is used for obtaining the apparent-motion vector of each pixel point by using the motion vector of the pixel point and a preset apparent-motion vector algorithm;
the space-time second-order matrix construction module is used for constructing a space-time second-order matrix by combining the apparent-motion vectors of the pixel points;
the response function constructing module is used for constructing a space-time Harris corner response function according to the space-time second-order matrix;
and the space-time Harris corner acquisition module is used for calculating the space-time Harris corner response function values of a pixel point p and of all pixel points in its neighborhood according to the space-time Harris corner response function; if the space-time Harris corner response function value of the pixel point p is larger than those of all pixel points in its neighborhood, the point p is a space-time Harris corner of the video.
7. The spatio-temporal Harris corner detection apparatus as claimed in claim 6, wherein the motion vector construction module is specifically configured to construct, according to the following formula, the motion vector v_p pointing from the pixel point at p in the current frame video image of the three-dimensional geometric algebraic space to the pixel point at p_r in the next frame video image, the motion vector v_p being:

v_p = p_r - p;

wherein p is a pixel point in the current frame video image of the three-dimensional geometric algebraic space; p_r is the pixel point, within the neighborhood centered at p′ in the next frame video image of the three-dimensional geometric algebraic space, whose pixel value differs least from the pixel value at position p in the current frame video image; and p′ is the pixel point in the next frame video image located at the same position as the pixel point p in the current frame video image.
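An illustrative block-matching sketch of the motion vector construction described in claim 7, operating on single-channel frames for simplicity; the search radius is an assumption, not a value taken from the patent.

```python
import numpy as np

def motion_vector(frame_cur, frame_next, p, radius=2):
    """Return v_p = p_r - p for a pixel point p = (row, col) of the current frame.

    p_r is the position inside the (2*radius+1)^2 neighborhood of the next
    frame, centered at p' (the same coordinates as p), whose pixel value
    differs least from frame_cur[p]; the search radius is an assumption.
    """
    r, c = p
    h, w = frame_next.shape
    best_diff, p_r = np.inf, (r, c)
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w:
                diff = abs(float(frame_next[rr, cc]) - float(frame_cur[r, c]))
                if diff < best_diff:
                    best_diff, p_r = diff, (rr, cc)
    return np.array(p_r) - np.array(p)  # v_p = p_r - p
```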
8. The spatiotemporal Harris corner detection apparatus of claim 7, wherein the preset apparent-motion vector algorithm is:
f′(p) = f(p) + v_p

wherein v_p represents the motion vector of the pixel point at p in the video image, f(p) represents the pixel value of the pixel point at p in the video image, and f′(p) represents the apparent-motion vector of the pixel point at p in the video image.
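An illustrative sketch of the preset apparent-motion vector algorithm of claim 8; since the claims form f′(p) = f(p) + v_p in a geometric-algebra space, stacking the scalar pixel value with the two motion components is only a plain-array stand-in for that multivector sum, used here as an assumption for illustration.

```python
import numpy as np

def apparent_motion(frame_cur, v_field):
    """Form the per-pixel apparent-motion representation f'(p) = f(p) + v_p.

    frame_cur: (H, W) pixel values f(p); v_field: (H, W, 2) motion vectors
    v_p (e.g. gathered from the motion_vector sketch above).  The result is
    an (H, W, 3) array stacking f(p) with the motion components as a
    plain-array stand-in for the geometric-algebra sum used in the patent.
    """
    return np.dstack([frame_cur.astype(float), v_field.astype(float)])
```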
9. The spatio-temporal Harris corner detection apparatus of claim 8, wherein the spatio-temporal second order matrix construction module comprises:
a gradient calculation module for calculating the gradients f′_x, f′_y, f′_t of the apparent-motion vector f′(p) of each pixel point in the three directions x, y and t;

a second-order gradient matrix construction module for constructing a second-order gradient matrix N by using the gradients f′_x, f′_y, f′_t of the apparent-motion vector f′(p) of the pixel point in the three directions x, y and t, wherein the second-order gradient matrix N is:
$$
N=
\begin{bmatrix}
f'_x \cdot f'_x & f'_x \cdot f'_y & f'_x \cdot f'_t \\
f'_x \cdot f'_y & f'_y \cdot f'_y & f'_y \cdot f'_t \\
f'_x \cdot f'_t & f'_y \cdot f'_t & f'_t \cdot f'_t
\end{bmatrix};
$$
the space-time second-order matrix obtaining module is used for performing convolution on the matrix N with a Gaussian weighting function ω(p) to obtain a space-time second-order matrix M(p), wherein the space-time second-order matrix M(p) is:
$$
M(p)=\omega(p)\otimes
\begin{bmatrix}
f'_x \cdot f'_x & f'_x \cdot f'_y & f'_x \cdot f'_t \\
f'_x \cdot f'_y & f'_y \cdot f'_y & f'_y \cdot f'_t \\
f'_x \cdot f'_t & f'_y \cdot f'_t & f'_t \cdot f'_t
\end{bmatrix}
=
\begin{bmatrix}
f'_x \cdot f'_x \otimes \omega & f'_x \cdot f'_y \otimes \omega & f'_x \cdot f'_t \otimes \omega \\
f'_x \cdot f'_y \otimes \omega & f'_y \cdot f'_y \otimes \omega & f'_y \cdot f'_t \otimes \omega \\
f'_x \cdot f'_t \otimes \omega & f'_y \cdot f'_t \otimes \omega & f'_t \cdot f'_t \otimes \omega
\end{bmatrix}
=
\begin{bmatrix}
A & D & E \\
D & B & F \\
E & F & C
\end{bmatrix}
$$
wherein ω denotes the Gaussian weighting function ω(p), ⊗ is the convolution operator, and A, B, C, D, E and F correspond to the elements of matrix M(p), respectively.
10. The spatio-temporal Harris corner detection apparatus of claim 9, wherein the spatio-temporal Harris corner response function is:
R = det(M) - k(trace(M))³ = (ABC + 2DEF - BE² - AF² - CD²) - k(A + B + C)³

wherein R is the space-time Harris corner response function value and k is an empirical constant; det(M) denotes the determinant of matrix M(p) and trace(M) denotes the trace of matrix M(p), expressed as:

det(M) = λ₁λ₂λ₃ = ABC + 2DEF - BE² - AF² - CD²

trace(M) = λ₁ + λ₂ + λ₃ = A + B + C

wherein λ₁, λ₂ and λ₃ are the eigenvalues of matrix M, and A, B, C, D, E and F correspond to the elements of matrix M(p).
CN201710384274.9A 2017-05-26 2017-05-26 Novel space-time Harris corner detection method and device Active CN107230220B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710384274.9A CN107230220B (en) 2017-05-26 2017-05-26 Novel space-time Harris corner detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710384274.9A CN107230220B (en) 2017-05-26 2017-05-26 Novel space-time Harris corner detection method and device

Publications (2)

Publication Number Publication Date
CN107230220A true CN107230220A (en) 2017-10-03
CN107230220B CN107230220B (en) 2020-02-21

Family

ID=59933617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710384274.9A Active CN107230220B (en) 2017-05-26 2017-05-26 Novel space-time Harris corner detection method and device

Country Status (1)

Country Link
CN (1) CN107230220B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216941A (en) * 2008-01-17 2008-07-09 上海交通大学 Motion estimation method under violent illumination variation based on corner matching and optic flow method
CN101336856A (en) * 2008-08-08 2009-01-07 西安电子科技大学 Information acquisition and transfer method of auxiliary vision system
CN102222348A (en) * 2011-06-28 2011-10-19 南京大学 Method for calculating three-dimensional object motion vector
CN105139412A (en) * 2015-09-25 2015-12-09 深圳大学 Hyperspectral image corner detection method and system
CN105704496A (en) * 2016-03-25 2016-06-22 符锌砂 Adaptive template matching algorithm based on edge detection
CN106550174A (en) * 2016-10-28 2017-03-29 大连理工大学 A kind of real time video image stabilization based on homography matrix

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANSHAN LI et al.: "A Harris Corner Detection Algorithm for Multispectral Images Based on the Correlation", 6th International Conference on Wireless, Mobile and Multi-Media (ICWMMN 2015) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109960965A (en) * 2017-12-14 2019-07-02 翔升(上海)电子技术有限公司 Methods, devices and systems based on unmanned plane identification animal behavior
CN110837770A (en) * 2019-08-30 2020-02-25 深圳大学 Video behavior self-adaptive segmentation method and device based on multiple Gaussian models
CN110837770B (en) * 2019-08-30 2022-11-04 深圳大学 Video behavior self-adaptive segmentation method and device based on multiple Gaussian models
CN111292255A (en) * 2020-01-10 2020-06-16 电子科技大学 Filling and correcting technology based on RGB image
CN111292255B (en) * 2020-01-10 2023-01-17 电子科技大学 Filling and correcting technology based on RGB image

Also Published As

Publication number Publication date
CN107230220B (en) 2020-02-21

Similar Documents

Publication Publication Date Title
Zhu et al. Face alignment in full pose range: A 3d total solution
Zhou et al. Monocap: Monocular human motion capture using a cnn coupled with a geometric prior
Varol et al. Learning from synthetic humans
Du et al. Marker-less 3D human motion capture with monocular image sequence and height-maps
Tran et al. Extreme 3D Face Reconstruction: Seeing Through Occlusions.
Rogez et al. Mocap-guided data augmentation for 3d pose estimation in the wild
Lee et al. Ensemble deep learning for skeleton-based action recognition using temporal sliding lstm networks
Zhao et al. A simple, fast and highly-accurate algorithm to recover 3d shape from 2d landmarks on a single image
Holte et al. 3D human action recognition for multi-view camera systems
Wang et al. Learning view invariant gait features with two-stream GAN
Abdul-Azim et al. Human action recognition using trajectory-based representation
US20120306874A1 (en) Method and system for single view image 3 d face synthesis
CN107230220B (en) Novel space-time Harris corner detection method and device
CN109684969A (en) Stare location estimation method, computer equipment and storage medium
CN107403182A (en) The detection method and device of space-time interest points based on 3D SIFT frameworks
Kanaujia et al. 3D human pose and shape estimation from multi-view imagery
Kanaujia et al. Part segmentation of visual hull for 3d human pose estimation
Chao et al. Structural feature representation and fusion of human spatial cooperative motion for action recognition
Du The computer vision simulation of athlete’s wrong actions recognition model based on artificial intelligence
Hassner et al. SIFTing through scales
CN111126508A (en) Hopc-based improved heterogeneous image matching method
Li Three‐Dimensional Diffusion Model in Sports Dance Video Human Skeleton Detection and Extraction
Zhao Sports motion feature extraction and recognition based on a modified histogram of oriented gradients with speeded up robust features
Sun et al. Devil in the details: Delving into accurate quality scoring for DensePose
Razzaghi et al. A new invariant descriptor for action recognition based on spherical harmonics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210304

Address after: 518000 area a, 21 / F, Konka R & D building, 28 Keji South 12 road, Yuehai street, Nanshan District, Shenzhen City, Guangdong Province

Patentee after: Easy city square network technology Co.,Ltd.

Address before: 518000 No. 3688 Nanhai Road, Shenzhen, Guangdong, Nanshan District

Patentee before: SHENZHEN University

CP03 Change of name, title or address

Address after: 402760 no.1-10 Tieshan Road, Biquan street, Bishan District, Chongqing

Patentee after: Chongqing Yifang Technology Co.,Ltd.

Address before: 518000 area a, 21 / F, Konka R & D building, 28 Keji South 12 road, Yuehai street, Nanshan District, Shenzhen City, Guangdong Province

Patentee before: Easy city square network technology Co.,Ltd.