CN106203360A - Dense-scene crowd grouping detection algorithm based on a multi-stage filtering model - Google Patents

Dense-scene crowd grouping detection algorithm based on a multi-stage filtering model

Info

Publication number
CN106203360A
Authority
CN
China
Prior art keywords
point
tau
target
image frame
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610559499.9A
Other languages
Chinese (zh)
Inventor
赵倩
邵洁
赵琰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai University of Electric Power
Original Assignee
Shanghai University of Electric Power
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai University of Electric Power filed Critical Shanghai University of Electric Power
Priority to CN201610559499.9A priority Critical patent/CN106203360A/en
Publication of CN106203360A publication Critical patent/CN106203360A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20224Image subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30232Surveillance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A dense-scene crowd grouping detection algorithm based on a multi-stage filtering model, relating to the field of image analysis technology; the technical problem solved is improving the grouping detection result. The algorithm extracts KLT feature points from the foreground region obtained by background subtraction with a Gaussian mixture model and, by analysing the motion characteristics of the feature points, applies a spatio-temporal proximity filter operator, a velocity-direction filter operator and a motion-correlation filter operator stage by stage, traversing the feature points in all foreground regions to achieve grouping detection. The algorithm provided by the present invention is suitable for video image analysis.

Description

Crowd Grouping Detection Algorithm for Dense Scenes Based on a Multi-stage Filtering Model

Technical Field

The present invention relates to image analysis technology, and in particular to a dense-scene crowd grouping detection algorithm based on a multi-stage filtering model.

Background Art

Crowd motion analysis and the detection of group events and abnormal behaviour in dense scenes are hot topics in current intelligent-surveillance research, and an important direction for video surveillance systems and intelligent transportation. They have significant research value for maintaining public order and making urban traffic planning more scientific.

In dense crowd scenes, the crowd usually moves as an irregular combination of motions, that is, disordered motion. This complexity greatly increases the difficulty of analysis, so relatively few earlier studies have addressed it. The traditional approach to dense scenes treats the scene as a collection of moving targets, with the aim of extracting target trajectories and recognising behaviour. The processing methods fall into two classes: those centred on individual target objects, and those based on the scene as a whole.

Target-centred methods regard the dense crowd as a collection of many individuals, extract each individual's speed and direction, and then detect groups with a weighted connection graph or a bottom-up multi-layer clustering. These algorithms are valid only when the target density is low and the pixel resolution of individual targets is high. In disordered dense scenes, however, occlusion between people is severe and the occlusion relations are usually unknown, so correct target segmentation becomes difficult and the grouping result is poor.

Scene-based methods mainly include optical flow, dynamic textures and grid particles. These algorithms handle scenes with heavy occlusion in dense crowds and scenes with complex motion patterns poorly; they also require segmentation of individual pedestrians and sample training, and thus need prior information to be supplied in advance. Moreover, the neighbourhood feature points they rely on are unstable, so the grouping detection result needs further improvement.

Summary of the Invention

In view of the above defects in the prior art, the technical problem to be solved by the present invention is to provide a dense-scene crowd grouping detection algorithm based on a multi-stage filtering model that handles scenes with heavy occlusion in dense crowds and scenes with complex motion patterns well, requires neither segmentation of individual pedestrians nor sample training, needs no prior information, keeps neighbourhood feature points stable, and achieves a good grouping detection result.

To solve the above technical problem, the present invention provides a dense-scene crowd grouping detection algorithm based on a multi-stage filtering model, characterised in that the specific steps are as follows:

S1) For the target video, model the background of each image frame with a Gaussian mixture model, then segment each frame into foreground and background by background subtraction, thereby obtaining the foreground region of each image frame;
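
By way of illustration only, step S1 might be sketched as follows in Python with OpenCV; the library, the MOG2 subtractor and all parameter values are assumptions, since the patent specifies the technique (mixture-of-Gaussians background modelling plus background subtraction) rather than an implementation:

```python
import cv2

# Sketch of step S1: mixture-of-Gaussians background modelling followed by
# background subtraction. OpenCV's MOG2 subtractor and the parameter values
# are assumptions; the patent names the technique, not a library.
cap = cv2.VideoCapture("crowd.mp4")   # hypothetical input video
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

foreground_masks = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)    # 255 = foreground, 0 = background
    mask = cv2.medianBlur(mask, 5)    # suppress isolated noise pixels
    foreground_masks.append(mask)
cap.release()
```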

S2) For the target video, extract feature points from the foreground region of each image frame with the KLT tracking algorithm, thereby obtaining the feature-point coordinate set of each image frame;
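
A minimal sketch of step S2, assuming Shi-Tomasi corners seed the KLT (Lucas-Kanade) tracker and that the foreground mask from S1 restricts detection; function name and parameter values are illustrative:

```python
import cv2

def klt_feature_points(prev_gray, gray, fg_mask, prev_pts=None):
    """Extract/track KLT feature points inside the foreground region,
    returning the coordinate set of the current frame (the set zeta_t)."""
    if prev_pts is None or len(prev_pts) == 0:
        prev_pts = cv2.goodFeaturesToTrack(
            prev_gray, maxCorners=500, qualityLevel=0.01,
            minDistance=5, mask=fg_mask)
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, prev_pts, None, winSize=(15, 15), maxLevel=2)
    return next_pts[status.ravel() == 1]   # keep successfully tracked points
```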

S3) For the feature-point coordinate set of each image frame, take each feature point in the set in turn as the target point and obtain its initial cohort point set according to steps S3.1 to S3.9; the initial cohort point sets of all target points form the initial cohort point-set sequence of that image frame;

The steps for obtaining the initial cohort point set of a target point are as follows:

S3.1) Take a feature point from ζt as the target point i, and let N_i^t denote the nearest neighbourhood of target point i (initially empty), where ζt is the feature-point coordinate set of the t-th image frame of the target video;

S3.2) Compute the Gaussian weight between target point i and each feature point in ζt:

$$w_{i,j} = \exp\!\left(-\frac{\mathrm{dist}(i,j)}{r}\right) = \exp\!\left(-\frac{\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}}{r}\right)$$

where w_{i,j} is the Gaussian weight between target point i and feature point j, j ∈ ζt and j ≠ i, (x_i, y_i) are the coordinates of target point i, (x_j, y_j) are the coordinates of feature point j, and r takes the value 20;

S3.3) Apply spatio-temporal proximity filtering to every feature point in ζt other than target point i, as follows:

For any feature point j in ζt, j ∈ ζt and j ≠ i, if w_{i,j} ≥ 0.5 w_max, classify feature point j into N_i^t, where w_max is the largest of the Gaussian weights between target point i and the other feature points in ζt;

S3.4) Let N_i^{t→t+d} = ∩_{τ=t}^{t+d} N_i^τ be the intersection of the neighbourhoods of target point i over frames t to t+d, where d = 3;

S3.5) Compute the velocity angle between each feature point in N_i^{t→t+d} and target point i from t → t+d:

$$\theta_{i,j} = \frac{1}{d+1} \sum_{\tau=t}^{t+d} \left| \theta_i^{\tau} - \theta_j^{\tau} \right|$$

where θ_{i,j} is the velocity angle between feature point j and target point i from t → t+d, and d = 3. Here θ_i^τ and θ_j^τ are the velocity directions of target point i and feature point j in the τ-th image frame, each taken in [0°, 360°) as the direction of the displacement from the point's coordinates (x^τ, y^τ) in the τ-th frame to its coordinates (x^{τ+1}, y^{τ+1}) in the (τ+1)-th frame of the target video; the case analysis in the original text resolves the quadrant from the signs of the velocity components, which is equivalent to

$$\theta^{\tau} = \operatorname{atan2}\!\left(y^{\tau+1} - y^{\tau},\; x^{\tau+1} - x^{\tau}\right) \bmod 360^{\circ};$$

S3.6) For any feature point j in N_i^{t→t+d}, if θ_{i,j} deviates from alignment by 45° or more, i.e. if θ_{i,j} ≥ 45° and (360° − θ_{i,j}) ≥ 45°, remove feature point j from N_i^{t→t+d};
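
A sketch of the second filter stage (S3.5-S3.6), assuming `tracks[p][tau]` gives the (x, y) position of point p in frame τ (an illustrative data layout); the per-frame direction uses the four-quadrant arctangent described above:

```python
import numpy as np

def direction_filter(tracks, i, neighbours, t, d=3, max_angle=45.0):
    """Second filter stage (S3.5-S3.6): prune neighbours whose average
    velocity-direction difference with target point i reaches 45 degrees."""
    def theta(p, tau):
        (x0, y0), (x1, y1) = tracks[p][tau], tracks[p][tau + 1]
        return np.degrees(np.arctan2(y1 - y0, x1 - x0)) % 360.0

    kept = []
    for j in neighbours:
        theta_ij = np.mean([abs(theta(i, tau) - theta(j, tau))
                            for tau in range(t, t + d + 1)])
        # aligned points have theta_ij near 0 or near 360 on the circle
        if min(theta_ij, 360.0 - theta_ij) < max_angle:
            kept.append(j)
    return kept
```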

S3.7) Compute the motion correlation between each feature point in N_i^{t→t+d} and target point i from t → t+d:

$$C_{i,j} = \frac{1}{d+1} \sum_{\tau=t}^{t+d} \frac{v_{\tau}^{i} \cdot v_{\tau}^{j}}{\lVert v_{\tau}^{i} \rVert \, \lVert v_{\tau}^{j} \rVert}$$

$$v_{\tau}^{i} = (v_{x_i}^{\tau}, v_{y_i}^{\tau}) = (x_i^{\tau+1} - x_i^{\tau},\; y_i^{\tau+1} - y_i^{\tau}), \qquad \lVert v_{\tau}^{i} \rVert = \sqrt{(v_{x_i}^{\tau})^{2} + (v_{y_i}^{\tau})^{2}}$$

$$v_{\tau}^{j} = (v_{x_j}^{\tau}, v_{y_j}^{\tau}) = (x_j^{\tau+1} - x_j^{\tau},\; y_j^{\tau+1} - y_j^{\tau}), \qquad \lVert v_{\tau}^{j} \rVert = \sqrt{(v_{x_j}^{\tau})^{2} + (v_{y_j}^{\tau})^{2}}$$

where C_{i,j} is the motion correlation between feature point j and target point i from t → t+d, d = 3, and v_τ^i and v_τ^j are the velocities of target point i and feature point j in the τ-th image frame, computed from their coordinates in the τ-th and (τ+1)-th frames of the target video;

S3.8) Set Cth = 0.6; for any feature point j in N_i^{t→t+d}, if C_{i,j} ≤ Cth, remove feature point j from N_i^{t→t+d};
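
The third filter stage (S3.7-S3.8) might look as follows under the same `tracks` assumption; C_{i,j} is the average cosine similarity of the two velocity sequences:

```python
import numpy as np

def correlation_filter(tracks, i, neighbours, t, d=3, c_th=0.6):
    """Third filter stage (S3.7-S3.8): prune neighbours whose motion
    correlation C_{i,j} with target point i does not exceed C_th = 0.6."""
    def velocity(p, tau):
        (x0, y0), (x1, y1) = tracks[p][tau], tracks[p][tau + 1]
        return np.array([x1 - x0, y1 - y0], dtype=float)

    kept = []
    for j in neighbours:
        sims = []
        for tau in range(t, t + d + 1):
            vi, vj = velocity(i, tau), velocity(j, tau)
            denom = np.linalg.norm(vi) * np.linalg.norm(vj)
            sims.append(vi.dot(vj) / denom if denom > 0 else 0.0)
        if np.mean(sims) > c_th:     # survivors join the initial cohort set
            kept.append(j)
    return kept
```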

S3.9) Define the surviving set N_i^{t→t+d} as the initial cohort point set of target point i;

S4) For the initial cohort point-set sequence of each image frame, sort the initial cohort point sets in descending order of the number of feature points they contain;

S5) For the initial cohort point-set sequence of each image frame, label each initial cohort point set in the sequence according to steps S5.1 to S5.3;

S5.1) Set K = 1 and L = 1, and mark all feature points in the K-th initial cohort point set with label L;

S5.2) Set K = K + 1;

If none of the feature points in the K-th initial cohort point set has been marked, set L = L + 1 and mark all feature points in the K-th initial cohort point set with label L;

If at least one feature point in the K-th initial cohort point set has already been marked with label L, mark all feature points in the K-th initial cohort point set with label L;

S5.3) Repeat step S5.2 until every initial cohort point set in the initial cohort point-set sequence has been marked;
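
Steps S4-S5 amount to a greedy label propagation over the size-ordered cohort sets; a sketch follows (the tie-breaking when a set touches several existing labels is not specified by the text, so the first label seen is adopted here as an assumption):

```python
def label_cohorts(cohort_sets):
    """S4-S5: sort initial cohort sets by size (descending) and propagate
    labels; an all-unlabelled set starts a new group, while a set sharing
    a labelled point adopts that label."""
    ordered = sorted(cohort_sets, key=len, reverse=True)   # step S4
    labels = {}                                            # point -> group id
    next_label = 1
    for cohort in ordered:                                 # steps S5.1-S5.3
        seen = [labels[p] for p in cohort if p in labels]
        group = seen[0] if seen else next_label
        if not seen:
            next_label += 1
        for p in cohort:
            labels[p] = group
    return labels
```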

S6) For the feature-point coordinate set of each image frame, assign feature points with the same label to the same group; in the frame, mark feature points with the same label in the same colour and feature points with different labels in different colours, thereby achieving grouping detection.
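
Tying the stages together, one frame of the full pipeline (S3-S6) could be driven as below, assuming each point is tracked with a stable index over frames t to t+d+1 and reusing the sketches above; this is an illustration under those assumptions, not the patent's literal implementation:

```python
import cv2
import numpy as np

def detect_groups(tracks, t, frame, d=3):
    """Per-frame pipeline (S3-S6): build each point's initial cohort set
    with the three filter stages, label the sets, and draw same-group
    points in the same colour on `frame`."""
    num_pts = len(tracks)
    cohorts = []
    for i in range(num_pts):
        neigh = None                      # S3.2-S3.4: intersect neighbourhoods
        for tau in range(t, t + d + 1):   # over frames t..t+d
            pts = np.array([tracks[p][tau] for p in range(num_pts)])
            n_tau = set(proximity_filter(pts, i))
            neigh = n_tau if neigh is None else neigh & n_tau
        n = direction_filter(tracks, i, sorted(neigh), t, d)    # S3.5-S3.6
        n = correlation_filter(tracks, i, n, t, d)              # S3.7-S3.8
        cohorts.append(set(n) | {i})                            # S3.9
    labels = label_cohorts(cohorts)                             # S4-S5
    rng = np.random.default_rng(0)
    palette = {g: tuple(int(c) for c in rng.integers(0, 256, 3))
               for g in set(labels.values())}
    for p, g in labels.items():                                 # S6
        x, y = tracks[p][t]
        cv2.circle(frame, (int(x), int(y)), 3, palette[g], -1)
    return labels
```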

The dense-scene crowd grouping detection algorithm based on a multi-stage filtering model provided by the present invention has the following beneficial effects:

1) Estimating the motion state of the crowd from the motion states of feature points on the moving targets avoids the severe occlusion problem in dense crowds and handles scenes with complex motion patterns well;

2) Neither segmentation of individual pedestrians nor sample training is required, and no prior information is needed;

3) In the spatio-temporal proximity filter, the number of feature points in the nearest neighbourhood adjusts automatically according to the shortest distance to an adjacent feature point, and the neighbourhood of a feature point is taken as the intersection of the neighbourhoods obtained over consecutive video frames, which guarantees the stability of neighbourhood feature points;

4) The algorithm considers not only the correlation of group motion but also the consistency of velocity direction, giving a good grouping detection result.

Brief Description of the Drawings

Fig. 1 is a schematic diagram of the dense-scene crowd grouping detection algorithm based on a multi-stage filtering model according to an embodiment of the present invention.

Detailed Description

Embodiments of the present invention are described in further detail below with reference to the accompanying drawing. The embodiments do not limit the present invention; all similar structures and similar variations of the present invention fall within its scope of protection. The Chinese enumeration comma (、) used in the present text denotes an "and" relationship.

As shown in Fig. 1, an embodiment of the present invention provides a dense-scene crowd grouping detection algorithm based on a multi-stage filtering model, characterised in that the specific steps are as follows:

S1) For the target video, model the background of each image frame with a Gaussian mixture model, then segment each frame into foreground and background by background subtraction, thereby obtaining the foreground region of each image frame;

S2) For the target video, extract feature points from the foreground region of each image frame with the KLT tracking algorithm, thereby obtaining the feature-point coordinate set of each image frame;

S3) For the feature-point coordinate set of each image frame, take each feature point in the set in turn as the target point and obtain its initial cohort point set according to steps S3.1 to S3.9; the initial cohort point sets of all target points form the initial cohort point-set sequence of that image frame;

The steps for obtaining the initial cohort point set of a target point are as follows:

S3.1) Take a feature point from ζt as the target point i, and let N_i^t denote the nearest neighbourhood of target point i (initially empty), where ζt is the feature-point coordinate set of the t-th image frame of the target video;

S3.2) Compute the Gaussian weight between target point i and each feature point in ζt:

$$w_{i,j} = \exp\!\left(-\frac{\mathrm{dist}(i,j)}{r}\right) = \exp\!\left(-\frac{\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}}{r}\right)$$

where w_{i,j} is the Gaussian weight between target point i and feature point j, j ∈ ζt and j ≠ i, (x_i, y_i) are the coordinates of target point i, (x_j, y_j) are the coordinates of feature point j, and r is a constant with a typical value of 20. w_{i,j} decreases as the distance dist(i,j) increases and increases as it decreases, which means that the denser the crowd, the larger w_{i,j};

S3.3) Apply spatio-temporal proximity filtering to every feature point in ζt other than target point i, as follows:

For any feature point j in ζt, j ∈ ζt and j ≠ i, if w_{i,j} ≥ 0.5 w_max, classify feature point j into N_i^t, where w_max is the largest of the Gaussian weights between target point i and the other feature points in ζt;

In the resulting N_i^t, the number of feature points adjusts automatically according to w_max, the Gaussian weight between target point i and the feature point closest to it;

S3.4) Let N_i^{t→t+d} = ∩_{τ=t}^{t+d} N_i^τ be the intersection of the neighbourhoods of target point i over frames t to t+d, where d = 3; t → t+d denotes the t-th through (t+d)-th image frames of the target video;

S3.5) Compute the velocity angle between each feature point in N_i^{t→t+d} and target point i from t → t+d:

$$\theta_{i,j} = \frac{1}{d+1} \sum_{\tau=t}^{t+d} \left| \theta_i^{\tau} - \theta_j^{\tau} \right|$$

where θ_{i,j} is the velocity angle between feature point j and target point i from t → t+d, and d = 3. Here θ_i^τ and θ_j^τ are the velocity directions of target point i and feature point j in the τ-th image frame, each taken in [0°, 360°) as the direction of the displacement from the point's coordinates (x^τ, y^τ) in the τ-th frame to its coordinates (x^{τ+1}, y^{τ+1}) in the (τ+1)-th frame of the target video; the case analysis in the original text resolves the quadrant from the signs of the velocity components, which is equivalent to

$$\theta^{\tau} = \operatorname{atan2}\!\left(y^{\tau+1} - y^{\tau},\; x^{\tau+1} - x^{\tau}\right) \bmod 360^{\circ};$$

S3.6) For any feature point j in N_i^{t→t+d}, if θ_{i,j} deviates from alignment by 45° or more, i.e. if θ_{i,j} ≥ 45° and (360° − θ_{i,j}) ≥ 45°, remove feature point j from N_i^{t→t+d};

S3.7) Compute the motion correlation between each feature point in N_i^{t→t+d} and target point i from t → t+d:

$$C_{i,j} = \frac{1}{d+1} \sum_{\tau=t}^{t+d} \frac{v_{\tau}^{i} \cdot v_{\tau}^{j}}{\lVert v_{\tau}^{i} \rVert \, \lVert v_{\tau}^{j} \rVert}$$

$$v_{\tau}^{i} = (v_{x_i}^{\tau}, v_{y_i}^{\tau}) = (x_i^{\tau+1} - x_i^{\tau},\; y_i^{\tau+1} - y_i^{\tau}), \qquad \lVert v_{\tau}^{i} \rVert = \sqrt{(v_{x_i}^{\tau})^{2} + (v_{y_i}^{\tau})^{2}}$$

$$v_{\tau}^{j} = (v_{x_j}^{\tau}, v_{y_j}^{\tau}) = (x_j^{\tau+1} - x_j^{\tau},\; y_j^{\tau+1} - y_j^{\tau}), \qquad \lVert v_{\tau}^{j} \rVert = \sqrt{(v_{x_j}^{\tau})^{2} + (v_{y_j}^{\tau})^{2}}$$

where C_{i,j} is the motion correlation between feature point j and target point i from t → t+d, d = 3, and v_τ^i and v_τ^j are the velocities of target point i and feature point j in the τ-th image frame, computed from their coordinates in the τ-th and (τ+1)-th frames of the target video;

S3.8) Set Cth = 0.6; for any feature point j in N_i^{t→t+d}, if C_{i,j} ≤ Cth, remove feature point j from N_i^{t→t+d};

S3.9) Define the surviving set N_i^{t→t+d} as the initial cohort point set of target point i; all feature points in this set belong to the same group as target point i;

S4) For the initial cohort point-set sequence of each image frame, sort the initial cohort point sets in descending order of the number of feature points they contain;

S5) For the initial cohort point-set sequence of each image frame, label each initial cohort point set in the sequence according to steps S5.1 to S5.3;

S5.1) Set K = 1 and L = 1, and mark all feature points in the K-th initial cohort point set with label L;

S5.2) Set K = K + 1;

If none of the feature points in the K-th initial cohort point set has been marked, set L = L + 1 and mark all feature points in the K-th initial cohort point set with label L;

If at least one feature point in the K-th initial cohort point set has already been marked with label L, mark all feature points in the K-th initial cohort point set with label L;

S5.3) Repeat step S5.2 until every initial cohort point set in the initial cohort point-set sequence has been marked;

S6) For the feature-point coordinate set of each image frame, assign feature points with the same label to the same group; in the frame, mark feature points with the same label in the same colour and feature points with different labels in different colours, thereby achieving grouping detection.

Claims (1)

1. A dense-scene crowd grouping detection algorithm based on a multi-stage filtering model, characterised in that the specific steps are as follows:

S1) For the target video, model the background of each image frame with a Gaussian mixture model, then segment each frame into foreground and background by background subtraction, thereby obtaining the foreground region of each image frame;

S2) For the target video, extract feature points from the foreground region of each image frame with the KLT tracking algorithm, thereby obtaining the feature-point coordinate set of each image frame;

S3) For the feature-point coordinate set of each image frame, take each feature point in the set in turn as the target point and obtain its initial cohort point set according to steps S3.1 to S3.9; the initial cohort point sets of all target points form the initial cohort point-set sequence of that image frame;

The steps for obtaining the initial cohort point set of a target point are as follows:

S3.1) Take a feature point from ζt as the target point i, and let N_i^t denote the nearest neighbourhood of target point i (initially empty), where ζt is the feature-point coordinate set of the t-th image frame of the target video;

S3.2) Compute the Gaussian weight between target point i and each feature point in ζt:

$$w_{i,j} = \exp\!\left(-\frac{\mathrm{dist}(i,j)}{r}\right) = \exp\!\left(-\frac{\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}}{r}\right)$$

where w_{i,j} is the Gaussian weight between target point i and feature point j, j ∈ ζt and j ≠ i, (x_i, y_i) are the coordinates of target point i, (x_j, y_j) are the coordinates of feature point j, and r takes the value 20;

S3.3) Apply spatio-temporal proximity filtering to every feature point in ζt other than target point i, as follows:

For any feature point j in ζt, j ∈ ζt and j ≠ i, if w_{i,j} ≥ 0.5 w_max, classify feature point j into N_i^t, where w_max is the largest of the Gaussian weights between target point i and the other feature points in ζt;

S3.4) Let N_i^{t→t+d} = ∩_{τ=t}^{t+d} N_i^τ be the intersection of the neighbourhoods of target point i over frames t to t+d, where d = 3;

S3.5) Compute the velocity angle between each feature point in N_i^{t→t+d} and target point i from t → t+d:

$$\theta_{i,j} = \frac{1}{d+1} \sum_{\tau=t}^{t+d} \left| \theta_i^{\tau} - \theta_j^{\tau} \right|$$

where θ_{i,j} is the velocity angle between feature point j and target point i from t → t+d, and d = 3. Here θ_i^τ and θ_j^τ are the velocity directions of target point i and feature point j in the τ-th image frame, each taken in [0°, 360°) as the direction of the displacement from the point's coordinates (x^τ, y^τ) in the τ-th frame to its coordinates (x^{τ+1}, y^{τ+1}) in the (τ+1)-th frame of the target video, with the quadrant resolved from the signs of the velocity components, equivalently θ^τ = atan2(y^{τ+1} − y^τ, x^{τ+1} − x^τ) mod 360°;

S3.6) For any feature point j in N_i^{t→t+d}, if θ_{i,j} deviates from alignment by 45° or more, i.e. if θ_{i,j} ≥ 45° and (360° − θ_{i,j}) ≥ 45°, remove feature point j from N_i^{t→t+d};

S3.7) Compute the motion correlation between each feature point in N_i^{t→t+d} and target point i from t → t+d:

$$C_{i,j} = \frac{1}{d+1} \sum_{\tau=t}^{t+d} \frac{v_{\tau}^{i} \cdot v_{\tau}^{j}}{\lVert v_{\tau}^{i} \rVert \, \lVert v_{\tau}^{j} \rVert}$$

$$v_{\tau}^{i} = (v_{x_i}^{\tau}, v_{y_i}^{\tau}) = (x_i^{\tau+1} - x_i^{\tau},\; y_i^{\tau+1} - y_i^{\tau}), \qquad \lVert v_{\tau}^{i} \rVert = \sqrt{(v_{x_i}^{\tau})^{2} + (v_{y_i}^{\tau})^{2}}$$

$$v_{\tau}^{j} = (v_{x_j}^{\tau}, v_{y_j}^{\tau}) = (x_j^{\tau+1} - x_j^{\tau},\; y_j^{\tau+1} - y_j^{\tau}), \qquad \lVert v_{\tau}^{j} \rVert = \sqrt{(v_{x_j}^{\tau})^{2} + (v_{y_j}^{\tau})^{2}}$$

where C_{i,j} is the motion correlation between feature point j and target point i from t → t+d, d = 3, and v_τ^i and v_τ^j are the velocities of target point i and feature point j in the τ-th image frame, computed from their coordinates in the τ-th and (τ+1)-th frames of the target video;

S3.8) Set Cth = 0.6; for any feature point j in N_i^{t→t+d}, if C_{i,j} ≤ Cth, remove feature point j from N_i^{t→t+d};

S3.9) Define the surviving set N_i^{t→t+d} as the initial cohort point set of target point i;

S4) For the initial cohort point-set sequence of each image frame, sort the initial cohort point sets in descending order of the number of feature points they contain;

S5) For the initial cohort point-set sequence of each image frame, label each initial cohort point set in the sequence according to steps S5.1 to S5.3;

S5.1) Set K = 1 and L = 1, and mark all feature points in the K-th initial cohort point set with label L;

S5.2) Set K = K + 1;

If none of the feature points in the K-th initial cohort point set has been marked, set L = L + 1 and mark all feature points in the K-th initial cohort point set with label L;

If at least one feature point in the K-th initial cohort point set has already been marked with label L, mark all feature points in the K-th initial cohort point set with label L;

S5.3) Repeat step S5.2 until every initial cohort point set in the initial cohort point-set sequence has been marked;

S6) For the feature-point coordinate set of each image frame, assign feature points with the same label to the same group; in the frame, mark feature points with the same label in the same colour and feature points with different labels in different colours, thereby achieving grouping detection.
CN201610559499.9A 2016-07-15 2016-07-15 Dense-scene crowd grouping detection algorithm based on a multi-stage filtering model Pending CN106203360A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610559499.9A CN106203360A (en) 2016-07-15 2016-07-15 Dense-scene crowd grouping detection algorithm based on a multi-stage filtering model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610559499.9A CN106203360A (en) 2016-07-15 2016-07-15 Dense-scene crowd grouping detection algorithm based on a multi-stage filtering model

Publications (1)

Publication Number Publication Date
CN106203360A (en) 2016-12-07

Family

ID=57474613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610559499.9A Pending CN106203360A (en) Dense-scene crowd grouping detection algorithm based on a multi-stage filtering model

Country Status (1)

Country Link
CN (1) CN106203360A (en)


Citations (4)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102592138A (en) * 2011-12-30 2012-07-18 上海电力学院 Object tracking method for dense scenes based on multi-module sparse projection
CN104239865A (en) * 2014-09-16 2014-12-24 宁波熵联信息技术有限公司 Pedestrian detection and tracking method based on multi-stage detection
CN104933412A (en) * 2015-06-16 2015-09-23 电子科技大学 Abnormal-state detection method for medium- and high-density crowds
CN104933726A (en) * 2015-07-02 2015-09-23 中国科学院上海高等研究院 Dense crowd segmentation method based on spatio-temporal information constraints

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QIAN ZHAO ET AL.: "A Multistage Filtering for Detecting Group in the Crowd", ICALIP 2016 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107071344A (en) * 2017-01-22 2017-08-18 深圳英飞拓科技股份有限公司 Large-scale distributed surveillance video data processing method and device
CN107016358A (en) * 2017-03-24 2017-08-04 上海电力学院 Real-time detection method for small groups in medium-density scenes
CN107016358B (en) * 2017-03-24 2020-03-20 上海电力学院 Real-time detection method for small group in medium-density scene
CN109977800A (en) * 2019-03-08 2019-07-05 上海电力学院 Dense-scene crowd grouping detection method combining multiple features
CN109977809A (en) * 2019-03-08 2019-07-05 上海电力学院 Adaptive crowd grouping detection method
CN111382784A (en) * 2020-03-04 2020-07-07 厦门脉视数字技术有限公司 Moving target tracking method
CN111382784B (en) * 2020-03-04 2021-11-26 厦门星纵智能科技有限公司 Moving target tracking method

Similar Documents

Publication Publication Date Title
CN107705560B (en) Road congestion detection method integrating visual features and convolutional neural network
CN103268480B (en) A kind of Visual Tracking System and method
CN110348376A (en) A kind of pedestrian's real-time detection method neural network based
CN114023062B (en) Traffic flow information monitoring method based on deep learning and edge calculation
CN108052880A (en) Traffic monitoring scene actual situation method for detecting lane lines
CN104751491A (en) Method and device for tracking crowds and counting pedestrian flow
CN106203360A (en) Intensive scene crowd based on multistage filtering model hives off detection algorithm
CN111695514A (en) Vehicle detection method in foggy days based on deep learning
CN107657226A (en) A kind of Population size estimation method based on deep learning
CN113516853B (en) Multi-lane traffic flow detection method for complex monitoring scene
CN107301375B (en) A smoke detection method for video images based on dense optical flow
CN104318258A (en) Time domain fuzzy and kalman filter-based lane detection method
CN102831618A (en) Hough forest-based video target tracking method
CN104680140B (en) Crowd massing situation detection method based on image
CN106446922B (en) A Method for Analyzing Abnormal Behavior of Crowds
CN105069451B (en) A kind of Car license recognition and localization method based on binocular camera
CN104408745A (en) Real-time smog scene detection method based on video image
CN107563349A (en) A kind of Population size estimation method based on VGGNet
CN102930248A (en) Crowd abnormal behavior detection method based on machine learning
CN101329765A (en) Multi-camera target matching feature fusion method
CN103577875A (en) CAD (computer-aided design) people counting method based on FAST (features from accelerated segment test)
CN102214309A (en) Special human body recognition method based on head and shoulder model
CN103593646A (en) Dense crowd abnormal behavior detection method based on micro-behavior analysis
CN105354857B (en) A kind of track of vehicle matching process for thering is viaduct to block
CN114565675A (en) A method for removing dynamic feature points in the front end of visual SLAM

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20161207

WD01 Invention patent application deemed withdrawn after publication