CN107341815B - Violent motion detection method based on multi-view stereoscopic vision scene stream - Google Patents

Violent motion detection method based on multi-view stereoscopic vision scene stream

Info

Publication number
CN107341815B
CN107341815B (application CN201710404056.7A)
Authority
CN
China
Prior art keywords
motion
scene
flow
dimensional
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710404056.7A
Other languages
Chinese (zh)
Other versions
CN107341815A (en)
Inventor
项学智
肖德广
宋凯
翟明亮
吕宁
尹力
郭鑫立
王帅
张荣芳
于泽婷
张玉琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN201710404056.7A priority Critical patent/CN107341815B/en
Publication of CN107341815A publication Critical patent/CN107341815A/en
Application granted granted Critical
Publication of CN107341815B publication Critical patent/CN107341815B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T7/00 Image analysis > G06T7/20 Analysis of motion
        • G06T7/215 Motion-based segmentation
        • G06T7/251 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving models
        • G06T7/292 Multi-camera tracking

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a violent motion detection method based on multi-view stereoscopic vision scene flow. Step one: acquire multiple groups of image sequences with a calibrated multi-view camera. Step two: preprocess the image sequences. Step three: design the data term of the scene flow energy functional. Step four: design the smoothing term of the scene flow energy functional. Step five: optimize the energy functional, computing from the lowest-resolution image of the pyramid obtained in step two up to the full-resolution image. Step six: cluster the scene flow motion regions. Step seven: construct a motion-direction dispersion evaluation model and judge whether the motion is violent. Step eight: construct a kinetic energy evaluation model for the motion region. Step nine: set thresholds and trigger an alarm when n consecutive frames meet the evaluation conditions. The invention adopts scene flow estimation based on multi-view stereo vision; multiple groups of image sequences of the same scene are acquired by a calibrated multi-view camera. Violent motion can be detected efficiently using the 3-dimensional scene flow.

Description

Violent motion detection method based on multi-view stereoscopic vision scene stream
Technical Field
The invention relates to a method for detecting violent movement, in particular to a method for detecting violent movement based on multi-view stereoscopic vision scene flow.
Background
With the rapid development of information technology, and especially the breakthroughs in computer vision and artificial intelligence, much of the work that once had to be done manually can now be done by computer. In video surveillance, for example, the most common mode of operation is for a person to watch the monitoring display and react to abnormal events as they occur. Because no one can stay focused for long on every event occurring in a video, missed and false alarms inevitably happen. It is therefore very important to process video frames with a computer and determine automatically whether an abnormal event has occurred.
A video surveillance camera is typically fixed in position, so the task is object detection against a static background. The classical methods for object detection in a static background are background subtraction, inter-frame differencing, and optical flow. Background subtraction has a small computational cost and can update its background model as the background changes, but it is strongly affected by background variation. Inter-frame differencing is also computationally cheap, but performs poorly in stability and robustness. Both methods struggle to achieve good results for violent motion detection. The optical flow method computes an optical flow field from two adjacent frames; the computed field is 2-dimensional, i.e. it contains only planar motion information, and depth information is lost. Without depth information, violent motion is difficult to evaluate and judge, and false alarms are easily triggered.
The scene flow contains 3-dimensional motion information and 3-dimensional surface depth information; unlike the three methods above, it describes the true motion of object surfaces. The scene flow therefore provides enough information to judge whether motion is violent, i.e. it can effectively solve the problem of violent motion judgment.
Disclosure of Invention
The invention aims to provide a method for detecting violent motion based on multi-view stereoscopic scene flow, which has strong detection adaptability.
The purpose of the invention is realized as follows:
the method comprises the following steps: acquiring a plurality of groups of image sequences by using a calibrated multi-view camera;
step two: preprocessing an image sequence, performing multi-resolution down-sampling on the image sequence by adopting an image pyramid, performing coordinate system conversion according to internal and external parameters of a camera, and establishing a relation between an image coordinate system and a camera coordinate system;
step three: designing the scene flow energy functional data term: the data term directly fuses 3-dimensional scene flow information and 3-dimensional surface depth information, adopts a structure-tensor constancy assumption, and introduces a robust penalty function;
step four: designing the scene flow energy functional smoothing term: the smoothing term adopts flow-driven anisotropic smoothing that simultaneously constrains the 3-dimensional flow field V(u, v, w) and the 3-dimensional surface depth Z, and also introduces a robust penalty function;
step five: optimizing and solving the energy functional: the energy functional is minimized to obtain the Euler-Lagrange equations, which are then solved; computation starts from the lowest-resolution image of the pyramid obtained in step two and proceeds coarse-to-fine until the full-resolution image is reached;
step six: clustering the motion areas of the scene flow, clustering the motion areas by using a clustering algorithm, separating the motion areas from background areas, and removing the background areas;
step seven: constructing a motion direction discrete degree evaluation model, and judging whether the motion is violent;
step eight: constructing a kinetic energy size evaluation model of a motion area;
step nine: and setting a threshold value, and triggering an alarm when n continuous frames meet the evaluation condition.
The present invention may further comprise:
1. In step one, multiple groups of image sequences are acquired with a calibrated multi-view camera, from which the scene flow V(u, v, w) and the depth information Z are subsequently obtained.
2. In step two, in establishing the relationship between the image coordinate system and the camera coordinate system, the relationship between the 2-dimensional optical flow and the 3-dimensional scene flow is established as

[formula image in original]

where (u, v) is the 2-dimensional optical flow and (u0, v0) are the optical center coordinates.
3. The design of the data term described in step three specifically includes using the constancy assumption based on the structure tensor:

the structure-tensor constancy assumption for the N cameras between times t and t+1 is defined as:

[formula image in original]

the structure-tensor constancy assumption between the reference camera C0 and the other N-1 cameras at time t is defined as:

[formula image in original]

the structure-tensor constancy assumption between the reference camera C0 and the other N-1 cameras at time t+1 is defined as:

[formula image in original]

In the above data-term formulas, a robust penalty function with parameter 0.0001 is introduced so that the term approximates the L1 norm. A binary occlusion mask, obtained by an occlusion boundary region detection technique for stereo images, is also used: it is 0 when a pixel is an occlusion point and 1 for non-occluded points. I_T is the local structure tensor of the 2-dimensional image, whose formula is given as an image in the original.
4. The design of the scene flow energy functional smoothing term described in step four specifically includes directly regularizing the 3-dimensional flow field and the depth information and designing a flow-driven anisotropic smoothness assumption, with S_m(V) and S_d(Z) constraining the 3-dimensional flow field and the depth information respectively; the smoothing terms are designed as:

S_m(V) = ψ(|u_x|²) + ψ(|u_y|²) + ψ(|v_x|²) + ψ(|v_y|²) + ψ(|w_x|²) + ψ(|w_y|²)

S_d(Z) = ψ(|Z_x|²) + ψ(|Z_y|²)

where the subscripts x and y denote partial derivatives of u(x, y), v(x, y), w(x, y) and Z(x, y); the overall scene flow estimation energy functional is

[formula image in original]
5. The clustering of the scene flow motion regions in step six specifically includes clustering the scene flow V(u, v, w) obtained in step five with a clustering algorithm and separating the background from the motion regions; the feature information of the scene flow specifically includes: the three components u, v, w of each point's scene flow; each point's scene flow module value

|V| = √(u² + v² + w²);

and the included angles θx, θy, θz between each point's scene flow and the xoy, xoz and yoz planes; each point is represented by the 7-dimensional feature vector V_{i,j} = (u, v, w, |V|, θx, θy, θz);

the specific process is as follows: the input is the similarity matrix S_{N×N} formed by the pairwise similarities of all N data points; initially all samples are treated as potential cluster centers; then, to find suitable cluster centers x_k, the attraction degree r(i, k) and the reliability degree a(i, k) are continuously collected from the data samples according to the formulas

r(i, k) = s(i, k) − max_{k′≠k} {a(i, k′) + s(i, k′)}

a(i, k) = min{0, r(k, k) + Σ_{i′∉{i,k}} max{0, r(i′, k)}},  i ≠ k

which are iterated to update the attraction and reliability until m cluster center points are generated, where r(i, k) describes how well point k is suited as the cluster center of data point i, and a(i, k) describes how appropriate it is for point i to select point k as its cluster center;

a flag is then set: flag = 1 for a motion region and flag = 0 for a background region; the number of pixels in the motion region is counted as count, and the motion region is taken as the spatial neighborhood over which the sums in the following models run.
6. The construction of the motion-direction dispersion evaluation model described in step seven specifically includes defining the Z axis of the camera coordinate system as the reference vector direction and calculating the included angle φ_{i,j}(t) between each motion vector and the reference direction as

φ_{i,j}(t) = arccos( w_{i,j}(t) / √(u_{i,j}(t)² + v_{i,j}(t)² + w_{i,j}(t)²) )

and then calculating the variance D(φ_{i,j}(t)) of φ_{i,j}(t) over the pixels of each frame's motion region, where φ̄(t) is the average of all the included angles:

D(φ_{i,j}(t)) = (1/count) Σ_{(i,j)} ( φ_{i,j}(t) − φ̄(t) )²
7. The total kinetic energy W(t) of each frame's motion region is calculated from the computed scene flow:

[formula image in original]

and the average kinetic energy of the motion region is calculated from the total kinetic energy of each frame's motion region as

W̄(t) = W(t) / count
8. In step nine, an angle variance threshold φ_th and a kinetic energy threshold W_th are set; when D(φ_{i,j}(t)) > φ_th and W̄(t) > W_th are satisfied for n consecutive frames, the motion is judged to be violent and an alarm is triggered.
The invention adopts scene flow estimation based on multi-view stereo vision; multiple groups of image sequences of the same scene are acquired by a calibrated multi-view camera. The invention obtains the scene flow information of a multi-view scene sequence together with the 3-dimensional surface depth information of the scene, and can effectively detect violent motion using the 3-dimensional scene flow.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 shows a stereo correspondence relationship between image sequences acquired by the multi-view camera.
FIG. 3 is a flow chart of an algorithm for solving a scene flow.
Detailed Description
With reference to fig. 1, the violent motion detection based on multi-view stereoscopic vision scene flow of the present invention mainly comprises the following steps:
s1, a plurality of groups of image sequences are obtained by using a calibrated multi-view camera.
And S2, preprocessing the input image, and performing multi-resolution down-sampling on the image sequence by adopting an image pyramid. And converting a coordinate system according to the internal and external parameters of the camera, and establishing a relation between an image coordinate system and a camera coordinate system.
S3, designing the scene flow energy functional data term. Unlike most previous formulations, which couple optical flow with disparity, the method directly fuses 3-dimensional scene flow information with 3-dimensional surface depth information. The data term adopts the structure-tensor constancy assumption and introduces a robust penalty function.
S4, designing the scene flow energy functional smoothing term. The smoothing term adopts flow-driven anisotropic smoothing that simultaneously constrains the 3-dimensional flow field V(u, v, w) and the 3-dimensional surface depth Z, and a robust penalty function is introduced into the smoothing term.
S5, optimization of the energy functional. In order to solve for the 3-dimensional motion V(u, v, w) and the 3-dimensional surface depth Z, the energy functional must be minimized to obtain the Euler-Lagrange equations, which are then solved. A coarse-to-fine multi-resolution computation scheme is introduced to handle the large displacements present in a scene flow; computation starts from the lowest-resolution image of the pyramid obtained in S2 and proceeds until the full-resolution image is reached.

S6, clustering the scene flow motion regions. The motion regions are clustered with a clustering algorithm, separated from the background region, and the background region is removed, which facilitates the construction of the subsequent violent motion judgment model.
S7, compared with a steady motion state, the motion directions of the 3-dimensional scene flow obtained in the target region are disordered in a violent motion state. Based on this, a motion-direction dispersion evaluation model can be constructed to judge whether the motion is violent.
S8, compared with a steady motion state, the 3-dimensional scene flow values obtained in the target region are larger in a violent motion state. Based on this, a kinetic energy evaluation model of the motion region can be constructed.
S9, corresponding thresholds are set manually, and an alarm is triggered when n consecutive frames meet the evaluation conditions.
The invention will now be described in more detail by way of example with reference to the accompanying drawings.
S1, as shown in fig. 2, an image sequence is acquired with a calibrated multi-view camera. A point in the real scene moves from position P at time t to position P′ at time t+1; the corresponding points of these two positions in the imaging plane of each camera C_i are p_i and p_i′ respectively. The position at time t+1 is

P′ = P + V(u, v, w), i.e. P′ = (X + u, Y + v, Z + w)^T

where V(u, v, w) is the real-world 3-dimensional motion vector: u is the instantaneous velocity in the horizontal direction, v the instantaneous velocity in the vertical direction, and w the instantaneous velocity in the depth direction. The mapping of V(u, v, w) to 2 dimensions is the 2-dimensional optical flow (u, v).
S2, the method directly estimates the 3-dimensional scene flow from multi-view stereo vision: the real-world 3-dimensional motion flow field V(u, v, w) and the 3-dimensional surface depth Z are constrained directly in the energy functional. The scene flow energy functional is based on 2-dimensional plane images, so 3-dimensional space must be mapped to 2-dimensional space through a perspective projection transformation, establishing the mapping relation between the 2-dimensional optical flow and the 3-dimensional scene flow. Let I(x_i, y_i, t) be the pixel of camera C_i's image sequence at time t, and let M_i be the projection matrix of camera C_i. The real coordinate P(X, Y, Z)^T in the camera coordinate system at time t is mapped to the image sequence by the relational expression

(x_i, y_i)^T = [M_i]_{1,2} (X, Y, Z, 1)^T / [M_i]_3 (X, Y, Z, 1)^T    (1)

where M_i is a 3×4 projection matrix, [M_i]_{1,2} denotes its first two rows and [M_i]_3 its third row. The projection matrix is shown in equation (2): C is the camera intrinsic parameter matrix, determined only by the internal structure of the camera, and [R T] is the camera extrinsic parameter matrix, determined by the orientation of the camera relative to the world coordinate system,

M_i = C [R T]    (2)

The scene flow energy functional obtained from this relation has 6 unknowns, P(X, Y, Z)^T and V(u, v, w). As shown in equation (3), a relationship between X, Y and Z can be established, reducing the 6 unknowns to 4; Z and V are then solved from the N pairs of image sequences, where (o_x, o_y) is the camera principal point:

[formula (3): image in original]

The relationship between the 2-dimensional optical flow v(u, v) and the 3-dimensional scene flow V(u, v, w) is shown in equation (4):

[formula (4): image in original]
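Equations (3) and (4) are not reproduced in the source. Under a standard pinhole model they would plausibly take the following form, where f denotes the focal length; f is notation introduced here for illustration, not a symbol defined in the visible text:

```latex
% Plausible reconstruction of equations (3)-(4) under a pinhole model.
% f (focal length) is assumed notation; (o_x, o_y) is the principal point.
\begin{align*}
  X &= \frac{(x - o_x)\,Z}{f}, & Y &= \frac{(y - o_y)\,Z}{f} \tag{3$'$}\\
  u_{2D} &= \frac{f\,u - (x - o_x)\,w}{Z}, &
  v_{2D} &= \frac{f\,v - (y - o_y)\,w}{Z} \tag{4$'$}
\end{align*}
```

Equation (4′) follows by differentiating the projection in (3′) with respect to time, with (u, v, w) = (Ẋ, Ẏ, Ż).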
and performing image pyramid on the obtained N image sequences, performing multi-resolution down-sampling on the images, wherein the sampling factor eta is 0.9, and performing Gaussian filtering on the images obtained in each layer to filter partial noise.
S3, designing the scene flow energy functional data term. The structure-tensor constancy assumption is used in both the spatial and temporal domains, and from it the following equations are derived:

the structure-tensor constancy of the N cameras between times t and t+1 is defined by equation (5):

[formula (5): image in original]

the structure-tensor constancy assumption between the reference camera C0 and the other N-1 cameras at time t is defined by equation (6):

[formula (6): image in original]

the structure-tensor constancy assumption between the reference camera C0 and the other N-1 cameras at time t+1 is defined by equation (7):

[formula (7): image in original]

In the above data-term formulas, a robust penalty function with parameter 0.0001 is used so that the term approximates the TV-L1 norm, which reduces the influence of outliers on the solution of the functional. I_T is the local structure tensor of the 2-dimensional image, as shown in equation (8):

[formula (8): image in original]

A binary occlusion mask is introduced whose role is to ignore occlusion-point pixels: it is 0 when a pixel is an occlusion point and 1 for non-occluded points. It is computed by an occlusion boundary region detection technique for stereo images, adopting an occlusion boundary detection algorithm based on a confidence map, which can effectively detect the occlusion regions between the reference camera C0 and the other cameras.
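Equation (8) is only an image in the source; a common definition of the 2-dimensional local structure tensor, together with the robust penalty described above, is sketched here under the assumption that the tensor is the Gaussian-smoothed outer product of the image gradient:

```python
import cv2
import numpy as np

def robust_penalty(s2, eps=1e-4):
    """psi(s^2) = sqrt(s^2 + eps^2): a differentiable approximation of the
    (TV-)L1 norm; eps = 0.0001 follows the parameter named in the text."""
    return np.sqrt(s2 + eps ** 2)

def structure_tensor(image, sigma=1.0):
    """Per-pixel 2x2 structure tensor J = grad(I) grad(I)^T, smoothed at
    scale sigma (sigma is an assumption; equation (8) is not reproduced)."""
    ix = cv2.Sobel(image, cv2.CV_64F, 1, 0, ksize=3)
    iy = cv2.Sobel(image, cv2.CV_64F, 0, 1, ksize=3)
    jxx = cv2.GaussianBlur(ix * ix, (0, 0), sigma)
    jxy = cv2.GaussianBlur(ix * iy, (0, 0), sigma)
    jyy = cv2.GaussianBlur(iy * iy, (0, 0), sigma)
    return jxx, jxy, jyy   # tensor components at every pixel
```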
S4, designing the scene flow energy functional smoothing term. For the reference camera C0, the depth information Z and the 3-dimensional flow field V(u, v, w) are assumed to be piecewise smooth. The smoothing term directly regularizes the 3-dimensional flow field and the depth information; the flow field is smooth in 3-dimensional space, and a flow-driven anisotropic smoothness assumption is designed to guarantee the smoothness of the scene flow:

S_m(V) = ψ(|u_x|²) + ψ(|u_y|²) + ψ(|v_x|²) + ψ(|v_y|²) + ψ(|w_x|²) + ψ(|w_y|²)    (9)

S_d(Z) = ψ(|Z_x|²) + ψ(|Z_y|²)    (10)

S_m(V) and S_d(Z) constrain the 3-dimensional flow field and the depth information respectively, and anisotropic smoothing based on 3-dimensional flow driving is adopted for the flow field. The entire energy functional can be written as shown in equation (11):

[formula (11): image in original]
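A minimal numerical sketch of the smoothing terms (9) and (10), assuming simple forward differences for the partial derivatives:

```python
import numpy as np

def smoothness_terms(u, v, w, z, eps=1e-4):
    """S_m(V) and S_d(Z): robust-penalized squared spatial derivatives of
    the flow components and the depth (equations (9) and (10))."""
    psi = lambda s2: np.sqrt(s2 + eps ** 2)

    def grads(f):                      # forward differences in x and y
        fx = np.diff(f, axis=1, append=f[:, -1:])
        fy = np.diff(f, axis=0, append=f[-1:, :])
        return fx, fy

    s_m = 0.0
    for comp in (u, v, w):
        cx, cy = grads(comp)
        s_m += psi(cx ** 2).sum() + psi(cy ** 2).sum()
    zx, zy = grads(z)
    s_d = psi(zx ** 2).sum() + psi(zy ** 2).sum()
    return s_m, s_d
```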
s5, a solution scheme of the scene flow is adopted, namely the values of Z and V when the energy functional is minimized to the maximum extent are found. The common approach is to minimize the energy functional to obtain the Euler-Lagrange equation and then solve the Euler-Lagrange equation. The euler-lagrange equation after minimization of the energy functional can be written as:
Figure BDA0001310506470000079
before minimizing the energy functional, the structural tensor constancy assumption in the data item is abbreviated for simplicity as follows:
Figure BDA00013105064700000710
Δi=IT(p0,t)-IT(pi,t) (14)
Figure BDA00013105064700000711
according to the variation principle, the energy functional E (Z, V) is minimized, the partial derivatives of u and Z are respectively calculated and are equal to 0, and the following Euler-Lagrangian equation can be obtained:
Figure BDA0001310506470000081
Figure BDA0001310506470000082
for v, w minimizes the energy functional to obtain an Eulerian-Lagrangian equation similar to equation (16). The nonlinear problem exists in data items and smoothing items, and the most critical in the process of solving the scene stream is how to avoid trapping in a local minimum to obtain a global optimal solution.
Since violent motion is being detected, large-displacement motion will be present. To handle large displacements, a coarse-to-fine multi-resolution computation scheme is adopted. Using the image pyramid already obtained in S2, the initial value of the scene flow is set to 0; computation starts at the lowest resolution, and the result of each level is used as the initial value for the next resolution, until the full-resolution image is reached. This effectively removes the computational inaccuracy caused by large displacements. The specific solution process is shown in fig. 3, where L is the image pyramid level: only V(u, v, w) is computed when L ≥ K, and both the scene flow V(u, v, w) and Z are computed when 0 < L < K.
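A schematic of this coarse-to-fine strategy; `solve_level` stands in for the per-level Euler-Lagrange solver and is a hypothetical interface, and the upsampling details are assumptions:

```python
import cv2
import numpy as np

def coarse_to_fine(pyramids, K, solve_level):
    """Coarse-to-fine scene flow solving (fig. 3): start at the coarsest
    pyramid level with zero flow, solve, then upsample each result as the
    initial value for the next finer level until full resolution.
    `solve_level(images, init, with_depth)` is a hypothetical solver."""
    n_levels = len(pyramids[0])           # pyramids: one list per camera
    state = None                          # (u, v, w, Z)
    for L in range(n_levels - 1, -1, -1):
        images = [pyr[L] for pyr in pyramids]
        h, w = images[0].shape[:2]
        if state is None:
            state = tuple(np.zeros((h, w)) for _ in range(4))
        else:
            # upsample the previous solution; a full implementation would
            # also rescale u, v by the resolution ratio
            state = tuple(cv2.resize(s, (w, h)) for s in state)
        with_depth = L < K                # L >= K: flow only (per fig. 3)
        state = solve_level(images, state, with_depth)
    return state                          # full-resolution (u, v, w, Z)
```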
S6, clustering of the 3-dimensional scene flow motion V(u, v, w). Owing to noise and error, the scene flow computed in S5 is not necessarily zero-valued in the background region, and a nonzero background scene flow would affect the subsequent violent motion judgment. The motion regions are therefore clustered with a clustering algorithm, separating the background from the motion regions and excluding the background region, so that violent motion can be evaluated effectively.
The clustering algorithm aims to find an optimal set of class-representative points such that the sum of the similarities of all data points to their nearest class representative is maximized. Briefly, the input to the algorithm is the similarity matrix S_{N×N} formed by the pairwise similarities of all N data points, and at start-up all samples are considered potential cluster centers. For each sample point, the information establishing its degree of attraction to other sample points is defined as follows:

attraction degree: r(i, k) describes how well point k is suited as the cluster center for data point i;

reliability: a(i, k) describes how appropriate it is for point i to select point k as its cluster center.

To find suitable cluster centers x_k, the algorithm continuously gathers the evidence r(i, k) and a(i, k) from the data samples. The iterative formulas for r(i, k) and a(i, k) are as follows:

r(i, k) = s(i, k) − max_{k′≠k} {a(i, k′) + s(i, k′)}    (18)

a(i, k) = min{0, r(k, k) + Σ_{i′∉{i,k}} max{0, r(i′, k)}},  i ≠ k    (19)

The algorithm iterates equations (18) and (19) to update the attraction and reliability until m high-quality cluster center points are generated, and the remaining data points are assigned to the corresponding clusters.
The computed scene flow includes the scene flow of the background region and that of the moving objects, which differ significantly: the scene flow at each point differs in magnitude and direction. The algorithm therefore takes each point's scene flow direction information and magnitude information as that point's features, forms its feature vector, and feeds it into the clustering algorithm for classification.

The feature information of the scene flow specifically includes: the three components u, v, w of each point's scene flow; each point's scene flow module value

|V| = √(u² + v² + w²);

and the included angles θx, θy, θz between each point's scene flow and the xoy, xoz and yoz planes. Each point is represented by the 7-dimensional feature vector V_{i,j} = (u, v, w, |V|, θx, θy, θz). The scene flow is clustered with this 7-dimensional feature vector, and the resulting cluster regions comprise a background region and motion regions. In general, with a stationary camera, the cluster whose motion vectors are close to 0 is judged to belong to the background region, and the other cluster regions are motion regions.
After the motion region and the background region are separated, a flag is set: flag = 1 for the motion region and flag = 0 for the background region. The number of pixels in the motion region is counted as count, and the motion region is taken as the spatial neighborhood over which the sums in S7 and S8 run. A sketch of this clustering step is given below.
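A minimal sketch of the clustering step built on scikit-learn's affinity propagation implementation; the plane-angle computation, the negative-squared-Euclidean similarity implied by the default settings, and all parameter values are assumptions:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def segment_motion(u, v, w):
    """Cluster per-pixel 7-D scene flow features V_ij = (u, v, w, |V|,
    theta_x, theta_y, theta_z); the cluster with near-zero motion is
    treated as background (flag = 0), the rest as motion (flag = 1)."""
    mag = np.sqrt(u**2 + v**2 + w**2) + 1e-12
    # signed angle of the flow vector with the xoy, xoz and yoz planes
    tx = np.arcsin(np.clip(w / mag, -1, 1))   # vs xoy plane
    ty = np.arcsin(np.clip(v / mag, -1, 1))   # vs xoz plane
    tz = np.arcsin(np.clip(u / mag, -1, 1))   # vs yoz plane
    feats = np.stack([u, v, w, mag, tx, ty, tz], axis=-1).reshape(-1, 7)

    ap = AffinityPropagation(damping=0.9, random_state=0).fit(feats)
    labels = ap.labels_.reshape(u.shape)
    # background = cluster whose mean |V| is smallest (close to 0)
    bg = min(np.unique(labels), key=lambda c: mag[labels == c].mean())
    flag = (labels != bg).astype(np.uint8)
    return flag, int(flag.sum())      # mask and pixel count `count`
```

Affinity propagation is quadratic in the number of samples, so in practice the per-pixel features would be subsampled before clustering.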
S7, based on the motion region scene flow V(u, v, w) obtained in S6, a suitable evaluation model is built to evaluate whether the motion is violent.
A motion-direction evaluation model is established from the directions of the scene flow. The per-frame motion region of the moving object in the camera coordinate system has been separated in S6. For normal, steady motion, analysis of the motion-vector directions of the motion region shows that they are mainly concentrated in one direction, whereas the motion directions of violent motion are distributed relatively randomly. If a direction histogram of the motion vectors of the motion points is constructed, the histogram formed by violent motion is relatively dispersed, while the histogram formed by steady motion is relatively concentrated.
To evaluate the direction of each motion vector quantitatively, the Z axis of the camera coordinate system is defined as the reference vector direction, and the direction of a motion vector is determined by calculating the included angle between each motion point of the motion region and the reference direction. From S5, each pixel point of the nth frame has, in the camera coordinate system, a horizontal velocity u_{i,j}(t), a vertical velocity v_{i,j}(t) and a depth-direction velocity w_{i,j}(t). Its angle φ_{i,j}(t) with the reference vector is given by equation (20):

φ_{i,j}(t) = arccos( w_{i,j}(t) / √(u_{i,j}(t)² + v_{i,j}(t)² + w_{i,j}(t)²) )    (20)

To judge whether the motion is violent, the variance D(φ_{i,j}(t)) of φ_{i,j}(t) over all motion points is calculated as in equation (21), where φ̄(t) is the mean of all the included angles:

D(φ_{i,j}(t)) = (1/count) Σ_{(i,j)} ( φ_{i,j}(t) − φ̄(t) )²    (21)
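A short numerical sketch of equations (20) and (21):

```python
import numpy as np

def direction_variance(u, v, w, flag):
    """Equations (20)-(21): angle of each motion-region flow vector with
    the camera Z axis, then the variance of those angles over the region."""
    mag = np.sqrt(u**2 + v**2 + w**2) + 1e-12
    phi = np.arccos(np.clip(w / mag, -1.0, 1.0))  # angle with Z axis, eq. (20)
    phi_m = phi[flag == 1]                        # motion-region pixels only
    if phi_m.size == 0:
        return 0.0
    return float(phi_m.var())                     # D(phi) = mean (phi - mean)^2
```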
S8, an evaluation model of the kinetic energy of the motion is established from the motion energy of the motion region. The total kinetic energy W(t) of each frame's motion region of the scene flow is calculated as in equation (22):

[formula (22): image in original]

and the average kinetic energy of each pixel point in the motion region is calculated from the total kinetic energy of each frame's motion region as in equation (23):

W̄(t) = W(t) / count    (23)
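Equation (22) is not reproduced in the source; a plausible form, assuming unit mass per pixel so that each pixel contributes ½|V|², is sketched below together with the per-pixel average of equation (23):

```python
import numpy as np

def average_kinetic_energy(u, v, w, flag):
    """W(t) under a unit-mass-per-pixel assumption (0.5 * |V|^2 per pixel),
    then W_bar(t) = W(t) / count as in equation (23)."""
    m = flag == 1
    total = 0.5 * float((u[m]**2 + v[m]**2 + w[m]**2).sum())  # assumed eq. (22)
    count = int(m.sum())
    return total / max(count, 1)                              # W_bar(t)
```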
S9, an angle variance threshold φ_th and a per-pixel kinetic energy threshold W_th are set manually. From S7 and S8, when D(φ_{i,j}(t)) > φ_th and W̄(t) > W_th are satisfied for n consecutive frames, the motion is judged to be abnormal violent motion and an alarm is triggered.
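A sketch of the n-consecutive-frame decision rule; the threshold values in the usage comment are placeholders to be tuned per scene:

```python
def make_alarm_trigger(phi_th, w_th, n):
    """Per-frame decision rule of S9: fire once D(phi) > phi_th and
    W_bar > W_th have held for n consecutive frames."""
    streak = 0
    def update(d_phi, w_bar):
        nonlocal streak
        streak = streak + 1 if (d_phi > phi_th and w_bar > w_th) else 0
        return streak >= n            # True -> trigger the alarm
    return update

# usage (threshold values are placeholders, tuned per scene):
# alarm = make_alarm_trigger(phi_th=0.8, w_th=5.0, n=10)
# if alarm(direction_variance(u, v, w, flag),
#          average_kinetic_energy(u, v, w, flag)): raise_alert()
```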
The invention uses 3-dimensional scene flow for violent motion detection for the first time, and achieves effective violent motion detection and alarm.

Claims (8)

1. A violent motion detection method based on multi-view stereoscopic vision scene flow is characterized by comprising the following steps:
the method comprises the following steps: acquiring a plurality of groups of image sequences by using a calibrated multi-view camera;
step two: preprocessing an image sequence, performing multi-resolution down-sampling on the image sequence by adopting an image pyramid, performing coordinate system conversion according to internal and external parameters of a camera, and establishing a relation between an image coordinate system and a camera coordinate system;
step three: designing the scene flow energy functional data term: the data term directly fuses 3-dimensional scene flow information and 3-dimensional surface depth information, adopts a structure-tensor constancy assumption, and introduces a robust penalty function;
step four: designing the scene flow energy functional smoothing term: the smoothing term adopts flow-driven anisotropic smoothing that simultaneously constrains the 3-dimensional flow field V(u, v, w) and the 3-dimensional surface depth Z, and also introduces a robust penalty function;
step five: optimizing and solving the energy functional: the energy functional is minimized to obtain the Euler-Lagrange equations, which are then solved; computation starts from the lowest-resolution image of the pyramid obtained in step two and proceeds coarse-to-fine until the full-resolution image is reached;
step six: clustering the motion areas of the scene flow, clustering the motion areas by using a clustering algorithm, separating the motion areas from background areas, and removing the background areas;
step seven: constructing a motion direction discrete degree evaluation model, and judging whether the motion is violent;
step eight: constructing a kinetic energy size evaluation model of a motion area;
step nine: and setting a threshold value, and triggering an alarm when n continuous frames meet the evaluation condition.
2. The method of claim 1, wherein in step two, in establishing the relationship between the image coordinate system and the camera coordinate system, the relationship between the 2-dimensional optical flow and the 3-dimensional scene flow is established as

[formula image in original]

where (u, v) is the 2-dimensional optical flow and (u0, v0) are the optical center coordinates.
3. The method of claim 1, wherein the design of the data term described in step three specifically includes using the constancy assumption based on the structure tensor:

the structure-tensor constancy assumption for the N cameras between times t and t+1 is defined as:

[formula image in original]

the structure-tensor constancy assumption between the reference camera C0 and the other N-1 cameras at time t is defined as:

[formula image in original]

the structure-tensor constancy assumption between the reference camera C0 and the other N-1 cameras at time t+1 is defined as:

[formula image in original]

in the above data-term formulas, a robust penalty function is used to make the term approximate the L1 norm; a binary occlusion mask, obtained by an occlusion boundary region detection technique for stereo images, is 0 when a pixel is an occlusion point and 1 for non-occluded points; and I_T is the local structure tensor of the 2-dimensional image, whose formula is given as an image in the original.
4. The method of claim 1, wherein the design of the scene flow energy functional smoothing term described in step four specifically includes directly regularizing the 3-dimensional flow field and the depth information and designing a flow-driven anisotropic smoothness assumption, with S_m(V) and S_d(Z) constraining the 3-dimensional flow field and the depth information respectively; the smoothing terms are designed as:

S_m(V) = ψ(|u_x|²) + ψ(|u_y|²) + ψ(|v_x|²) + ψ(|v_y|²) + ψ(|w_x|²) + ψ(|w_y|²)

S_d(Z) = ψ(|Z_x|²) + ψ(|Z_y|²)

and the overall scene flow estimation energy functional is

[formula image in original]
5. the method of claim 1, wherein the method comprises: the clustering of the scene flow motion areas in the sixth step specifically includes clustering the scene flow V (u, V, w) obtained in the fifth step by using a clustering algorithm, and separating the background from the motion areas, wherein the feature information of the scene flow specifically includes: each point scene flow u, v, w three components, each point scene flow module value is
Figure FDA00013105064600000210
The included angle theta between each point scene flow and the xoy plane, the xoz plane and the yoz planex,θy,θzEach point represents V by a 7-dimensional feature vectori,j=(u,v,w,|V|,θx,θy,θz);
The specific process is as follows: the input is a similarity matrix S formed by the similarity between every two of all N data pointsN×NThe initial stage treats all samples as potential cluster centers and then x in order to find the appropriate cluster centerkContinuously collecting the attraction degree r (i, k) and the reliability degree a (i, k) from the data samples, and matching the formula
Figure FDA00013105064600000211
Figure FDA0001310506460000031
Continuously iterating to update the attraction degree and the reliability degree until m cluster center points are generated, wherein r (i, k) is used for describing the degree that the point k is suitable as the cluster center of the data point i, and a (i, k) is used for describing the degree that the point k is selected as the cluster center of the point i;
setting a flag for a motion area, wherein if the motion area is a motion area, the flag is 1, if the motion area is a background area, the flag is 0, counting the number of pixels in the motion area as count, and setting the motion area as a spatial neighborhood
Figure FDA0001310506460000032
6. The method of claim 1, wherein the construction of the motion-direction dispersion evaluation model described in step seven specifically includes defining the Z axis of the camera coordinate system as the reference vector direction and calculating the included angle φ_{i,j}(t) between each motion vector and the reference direction as

φ_{i,j}(t) = arccos( w_{i,j}(t) / √(u_{i,j}(t)² + v_{i,j}(t)² + w_{i,j}(t)²) )

and then calculating the variance D(φ_{i,j}(t)) of φ_{i,j}(t) over the pixels of each frame's motion region, where φ̄(t) is the average of all the included angles:

D(φ_{i,j}(t)) = (1/count) Σ_{(i,j)} ( φ_{i,j}(t) − φ̄(t) )²
7. The method of claim 1, wherein the total kinetic energy W(t) of each frame's motion region is calculated from the computed scene flow:

[formula image in original]

and the average kinetic energy of the motion region is calculated from the total kinetic energy of each frame's motion region as

W̄(t) = W(t) / count
8. The method of claim 1, wherein in step nine an angle variance threshold φ_th and a kinetic energy threshold W_th are set, and when D(φ_{i,j}(t)) > φ_th and W̄(t) > W_th are satisfied for n consecutive frames, the motion is judged to be violent and an alarm is triggered.
CN201710404056.7A 2017-06-01 2017-06-01 Violent motion detection method based on multi-view stereoscopic vision scene stream Active CN107341815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710404056.7A CN107341815B (en) 2017-06-01 2017-06-01 Violent motion detection method based on multi-view stereoscopic vision scene stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710404056.7A CN107341815B (en) 2017-06-01 2017-06-01 Violent motion detection method based on multi-view stereoscopic vision scene stream

Publications (2)

Publication Number Publication Date
CN107341815A CN107341815A (en) 2017-11-10
CN107341815B (en) 2020-10-16

Family

ID=60221390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710404056.7A Active CN107341815B (en) 2017-06-01 2017-06-01 Violent motion detection method based on multi-view stereoscopic vision scene stream

Country Status (1)

Country Link
CN (1) CN107341815B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102707596B1 (en) * 2018-08-07 2024-09-19 삼성전자주식회사 Device and method to estimate ego motion
CN109726718B (en) * 2019-01-03 2022-09-16 电子科技大学 Visual scene graph generation system and method based on relation regularization
CN109978968B (en) * 2019-04-10 2023-06-20 广州虎牙信息科技有限公司 Video drawing method, device and equipment of moving object and storage medium
CN112015170A (en) * 2019-05-29 2020-12-01 北京市商汤科技开发有限公司 Moving object detection and intelligent driving control method, device, medium and equipment
CN112581494B (en) * 2020-12-30 2023-05-02 南昌航空大学 Binocular scene flow calculation method based on pyramid block matching
CN112614151B (en) * 2021-03-08 2021-08-31 浙江大华技术股份有限公司 Motion event detection method, electronic device and computer-readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9659372B2 (en) * 2012-05-17 2017-05-23 The Regents Of The University Of California Video disparity estimate space-time refinement method and codec

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104680544A (en) * 2015-03-18 2015-06-03 哈尔滨工程大学 Method for estimating variational scene flow based on three-dimensional flow field regularization
CN106485675A (en) * 2016-09-27 2017-03-08 哈尔滨工程大学 A kind of scene flows method of estimation guiding anisotropy to smooth based on 3D local stiffness and depth map
CN106504202A (en) * 2016-09-27 2017-03-15 哈尔滨工程大学 A kind of based on the non local smooth 3D scene flows methods of estimation of self adaptation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Variational Method for Scene Flow Estimation from Stereo Sequences; Frédéric Huguet et al.; 2007 IEEE 11th International Conference on Computer Vision; 2007-10-21; pp. 1-17 *
Moving object detection and tracking based on binocular scene flow; Yang Wenkang; China Master's Theses Full-text Database, Information Science & Technology; 2017-02-15; I138-3458 *

Also Published As

Publication number Publication date
CN107341815A (en) 2017-11-10

Similar Documents

Publication Publication Date Title
CN107341815B (en) Violent motion detection method based on multi-view stereoscopic vision scene stream
CN111462200B (en) Cross-video pedestrian positioning and tracking method, system and equipment
CN110910421B (en) Weak and small moving object detection method based on block characterization and variable neighborhood clustering
CN102098440A (en) Electronic image stabilizing method and electronic image stabilizing system aiming at moving object detection under camera shake
Xu et al. Dynamic obstacle detection based on panoramic vision in the moving state of agricultural machineries
CN110599522A (en) Method for detecting and removing dynamic target in video sequence
CN111260691B (en) Space-time regular correlation filtering tracking method based on context awareness regression
CN106530407A (en) Three-dimensional panoramic splicing method, device and system for virtual reality
CN105957060B (en) A kind of TVS event cluster-dividing method based on optical flow analysis
CN111582036A (en) Cross-view-angle person identification method based on shape and posture under wearable device
Ellenfeld et al. Deep fusion of appearance and frame differencing for motion segmentation
CN112509014B (en) Robust interpolation light stream computing method matched with pyramid shielding detection block
CN109166079B (en) Mixed synthesis motion vector and brightness clustering occlusion removing method
Hu et al. An integrated background model for video surveillance based on primal sketch and 3D scene geometry
Li et al. Real-time action recognition by feature-level fusion of depth and inertial sensor
Rougier et al. 3D head trajectory using a single camera
CN111160255B (en) Fishing behavior identification method and system based on three-dimensional convolution network
Panagiotakis et al. Shape-based individual/group detection for sport videos categorization
CN108647589A (en) It is a kind of based on regularization form than fall down detection method
CN117876419B (en) Dual-view-field aerial target detection and tracking method
Briassouli et al. Fusion of frequency and spatial domain information for motion analysis
CN118314162B (en) Dynamic visual SLAM method and device for time sequence sparse reconstruction
Nagmode et al. A novel approach to detect and track moving object using Partitioning and Normalized Cross Correlation
Nagmode et al. Moving Object detection and tracking based on correlation and wavelet Transform Technique to optimize processing time
Hadfield et al. Go with the flow: Hand trajectories in 3D via clustered scene flow

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant