CN115695949A - Video concentration method based on target track motion mode - Google Patents

Video concentration method based on target track motion mode

Info

Publication number
CN115695949A
Authority
CN
China
Prior art keywords
target
track
video
tracks
motion
Prior art date
Legal status
Pending
Application number
CN202211322752.0A
Other languages
Chinese (zh)
Inventor
汪陈伍
武君胜
王佩
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date: 2022-10-27
Filing date: 2022-10-27
Publication date: 2023-02-03
Application filed by Northwestern Polytechnical University
Priority to CN202211322752.0A
Publication of CN115695949A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a video concentration method based on target track motion patterns, and relates to the technical field of image processing and video surveillance. The method comprises two stages: 1) Generation and grouping of target track motion patterns, comprising extraction of video target motion tracks, generation of track motion patterns, and grouping of the track motion patterns; 2) On-line video concentration based on the track motion patterns, comprising single-target track extraction, video background picture generation, track motion pattern matching, in-group target track rearrangement, and condensed video generation. Offline video data are trained to obtain the target track motion patterns and groups of the surveillance video scene; on-line video concentration is then applied to the surveillance video stream according to the trained track motion patterns and groups, which improves the execution efficiency of video concentration and the visual effect of the condensed video.

Description

Video concentration method based on target track motion mode
Technical Field
The invention belongs to the technical field of image processing and video monitoring, and particularly relates to a video concentration method based on a target track motion mode.
Background
With the development of computer networks and video technology, large numbers of surveillance cameras have been deployed in cities to meet public-safety needs. These cameras work continuously 24 hours a day and capture and store a huge amount of video data every day. However, browsing video data is time-consuming and labor-intensive, and massive amounts of information cannot be acquired accurately and in time by manpower alone, so how to use video data effectively is a difficult problem.
Video concentration is a target-based video summarization technology that compresses a long video into a short one for fast retrieval and browsing of the original surveillance data. It takes the target as the basic processing unit: the background picture of the surveillance video is extracted by background modeling, foreground targets are obtained with target detection and instance segmentation, and consecutive foreground detections are associated by target tracking to generate target tracks (also called target tubes). The target tracks are shifted along the time axis and rearranged, and the rearranged tracks are fused with the background picture to generate the condensed video. Video concentration not only eliminates the temporal and spatial redundancy of the source video, but also well preserves the dynamic characteristics of the moving targets.
However, shifting target tracks in video concentration also brings new problems. First, pseudo collisions between video targets can occur: two target tracks without temporal overlap in the source video cannot collide, but after the tracks are shifted along the time axis, tracks that never collided may collide, which is called a pseudo collision. Second, target track rearrangement is very time-consuming: rearrangement is the most important step of video concentration and is usually cast as a multi-objective optimization problem via a loss function, solved by global search methods such as simulated annealing or Markov chain Monte Carlo, which generally converge slowly. Third, the condensed video may have a poor visual browsing effect: a condensed video usually shows many targets per frame, and if targets with different directions and speeds are gathered into the same frames and many pseudo collisions occur among them, the user's visual browsing experience suffers. How to reduce pseudo collisions while keeping a high compression ratio, improve execution efficiency, and improve the visual browsing effect is the problem to be solved in the technical field of video concentration.
Disclosure of Invention
In order to solve the technical problems in the background art, the invention provides a video concentration method based on target track motion patterns, which improves the execution efficiency of video concentration and the visual browsing effect, and reduces pseudo collisions among targets.
The technical solution of the invention is as follows: the invention provides a video concentration method based on a target track motion mode, which comprises the following steps:
1) Generation and grouping of target track motion patterns. Historical surveillance video data are input, a model is constructed to train on the video data, and target track motion patterns and groups are generated;
2) On-line video concentration based on the track motion patterns. A surveillance video stream is input, video moving-target tracks are extracted, and on-line video concentration is performed with the track motion patterns and grouping results of step 1) to generate a condensed video.
The step 1) comprises the following steps:
11) Inputting offline surveillance video data, performing target detection, instance segmentation and target tracking with deep learning models, obtaining the instance mask sequence of each target across video frames (the instance mask sequence is the target track), and completing the extraction of all moving-target tracks in the video;
12 On the basis of extracting the target tracks, clustering all the target tracks by adopting a clustering algorithm, clustering the target tracks into different classes, and generating a target track motion mode;
13 On the basis of target track motion mode generation, according to the principle of less collision and direction consistency, the target track motion modes are divided to generate track motion mode groups.
The step 2) comprises the following steps:
21) Input the surveillance video stream, perform background modeling with a Gaussian mixture model, extract the background picture of the surveillance video, and update the background at a set time interval;
22) Input the surveillance video stream, subtract the background from the current frame, and obtain the foreground mask through dilation and erosion operations; then track the targets with Kalman filtering and the Hungarian matching algorithm to extract the target motion tracks;
23 Matching the extracted target motion track with the target track motion mode, and automatically attributing the target track to the corresponding track motion mode group if the matching is successful; if the matching fails, the target track is assigned to an abnormal track motion mode group;
24 For the target motion tracks belonging to the same track motion mode grouping, carrying out video concentration processing by adopting a target track rearrangement method based on dynamic search space local optimization; defining an energy loss function for target motion tracks belonging to abnormal track motion mode groups, and performing video concentration processing of target track rearrangement;
25) For each video concentration group, fuse the rearranged target tracks (target tubes) with the extracted video background picture frame by frame using a Poisson fusion algorithm, and merge the fused consecutive video frames to generate the condensed video.
The step 12) further comprises the following steps:
121) The target track is represented with the equidistant nearest-neighbor sampling point method as a coordinate vector
$$T = \left[(x_1, y_1), (x_2, y_2), \dots, (x_{2^{\omega}+1}, y_{2^{\omega}+1})\right]$$
where $\omega$ is the fold parameter used in the method and $(x_i, y_i)$ are the coordinates of a track sampling point;
122) A criterion of target track similarity is determined: the Euclidean distance between track coordinate vectors is taken as the distance between two tracks, and a smaller distance means higher similarity. The distance between the $i$-th and $j$-th tracks is $D_{i,j}$, calculated as:
$$D_{i,j} = \sum_{k=1}^{2^{\omega}+1} \sqrt{\left(x_k^i - x_k^j\right)^2 + \left(y_k^i - y_k^j\right)^2}$$
where $x_k^i$ and $y_k^i$ are the x and y coordinate values of the $k$-th sampling point of the $i$-th track, and $\omega$ is the fold parameter.
123) Compute the pairwise distances between tracks to generate a similarity measurement matrix, set the neighbor radius γ and the minimum core-point number threshold Ω, and cluster the target tracks with the DBSCAN clustering algorithm; each cluster is determined as one target track motion pattern;
124) For each target track motion pattern, a representative track is generated: the mean abscissa x and mean ordinate y of the coordinate vectors of all target tracks belonging to the same cluster are computed per sampling point to form the coordinate vector of the representative track. Assuming there are $n$ target tracks in a cluster, $\omega$ is the fold parameter, $2^{\omega}+1$ is the number of sampling points per track, and $x_k^i$ and $y_k^i$ are the x and y coordinates of the $k$-th sampling point of the $i$-th track, the calculation is:
$$\bar{x}_k = \frac{1}{n}\sum_{i=1}^{n} x_k^i, \qquad \bar{y}_k = \frac{1}{n}\sum_{i=1}^{n} y_k^i$$
Then the coordinate vector of the representative track is:
$$\bar{T} = \left[(\bar{x}_1, \bar{y}_1), (\bar{x}_2, \bar{y}_2), \dots, (\bar{x}_{2^{\omega}+1}, \bar{y}_{2^{\omega}+1})\right]$$
the step 13) further comprises the following steps:
131) Count the sizes (height and width) of all target detection boxes of the current motion pattern group in the video, and compute the average height and width of the detection boxes;
132) Superimpose the average detection-box height and width onto the representative track so that the representative track carries one detection box in each frame; the collision value between two representative tracks is then computed by summing the numbers of intersection pixels over all pairs of detection boxes from the two tracks;
133) Divide 360° into θ equal parts in a rectangular coordinate system, representing θ direction classes; the direction of a representative track is the direction of the line from its start point to its end point; the representative track directions are classified, and the relation between different representative track directions is judged by whether they belong to the same direction class;
134) Group the representative tracks according to their collision and direction relations; each group is one target track motion pattern group. A collision threshold is set so that the representative tracks within a group have no or few collisions; meanwhile, the direction relation is considered so that motion patterns forming a circular flow are preferentially placed in one group, keeping the subsequent in-group condensed video fluent.
The step 23) further comprises the following steps:
231) The extracted target motion track is represented with the equidistant nearest-neighbor sampling point method as a coordinate vector
$$T = \left[(x_1, y_1), (x_2, y_2), \dots, (x_{2^{\omega}+1}, y_{2^{\omega}+1})\right]$$
where $\omega$ is the fold parameter used in the algorithm and $(x_i, y_i)$ are the coordinates of a track sampling point;
232 Calculating the distances of all sampling points between the target track and the representative tracks of all motion modes to serve as a basis for judging the similarity between the tracks, wherein the smaller the distance value is, the higher the similarity is;
233) The target track motion pattern with the shortest distance is found, the shortest distance is denoted $d_{min}$ and compared with a set threshold $\alpha$; if $d_{min} \le \alpha$, the target track is attributed to that target track motion pattern and to the corresponding track motion pattern group;
234) If $d_{min} > \alpha$, the target track is attributed to the abnormal track motion pattern.
The step 24) further comprises the following steps:
241) An energy loss function is defined, whose loss terms comprise a condensed-length loss term, a pseudo-collision loss term and a time-order disorder loss term; the energy loss function has the form:
$$E = \sum_{i \in P_g} \Big( \alpha_r E_r(l_i) + \sum_{j \in P_c} \big[ \alpha_c E_c(l_i, l_j) + \alpha_t E_t(l_i, l_j) \big] \Big)$$
where $l_i$ denotes the start position of track $i$ in the condensed video, $P_g$ denotes all tracks belonging to the same motion pattern group as track $i$, and $P_c$ denotes all tracks belonging to the same motion pattern as track $i$; $E_r$ is the condensed-video length loss term, $E_c$ the condensed-video pseudo-collision loss term, and $E_t$ the condensed-video track time-order disorder loss term; $\alpha_r$, $\alpha_c$ and $\alpha_t$ are the weights of the respective loss terms, balancing their effects;
242) The range of target track rearrangement positions is determined with the dynamic search space method. The start position $l_i$ of target track $O_i$ in the condensed video is a dynamically changing value; its minimum is denoted $l_{min}$ and its maximum $l_{max}$. The minimum is the maximum of the start positions of all tracks already arranged in the group, and the maximum is the maximum of the end positions of all tracks already arranged in the group:
$$l_{min} = \max_{k \in P_c} \, l_k, \qquad l_{max} = \max_{k \in P_c} \big( l_k + len(O_k) \big)$$
where $P_c$ denotes the tracks of the same motion pattern as track $i$ that have already been arranged, and $len(O_k)$ is the frame length of the $k$-th track;
243) Within the start-position search space and under the energy loss function, the target tracks in the group are condensed and rearranged with a locally optimized greedy algorithm.
The step 121) further comprises the following steps:
1211 Mark the coordinates of the starting point and the end point of the target track, and draw a straight line from the starting point to the end point;
1212) Divide the straight line into $2^{\omega}$ equal parts, where the fold parameter $\omega$ represents the number of halvings and typically has the value range $\omega \in \{3, 4, 5, 6\}$; the line then has $2^{\omega}-1$ division points;
1213) Taking each division point on the straight line as the center of a circle, find the point on the target track closest to that division point and take it as a sampling point of the target track;
1214) The start and end points of the target track, together with the $(2^{\omega}-1)$ sampling points, form the coordinate vector representing the target track, of the form
$$T = \left[(x_1, y_1), (x_2, y_2), \dots, (x_{2^{\omega}+1}, y_{2^{\omega}+1})\right]$$
The invention has the beneficial effects that:
(1) The target track is represented with the equidistant nearest-neighbor sampling point method, which represents the position and position-change information of the track more accurately, unifies the representation length of tracks of different lengths, facilitates the distance calculation between tracks, and improves the efficiency and accuracy of track clustering and track matching;
(2) A two-stage processing mode is adopted: motion patterns are generated by training on offline video data, and the video stream is condensed on line. The training stage makes full use of historical data, so the generated track motion patterns are more complete and accurate; the on-line stage matches tracks against the existing motion patterns and rearranges target tracks with dynamic-search-space local optimization, improving the execution efficiency of video concentration;
(3) Track motion patterns are grouped according to the principle of fewer collisions and direction consistency; in the on-line stage, detected target tracks are assigned to different motion pattern groups and target track rearrangement is performed within each group, so that few pseudo collisions occur among targets in the condensed video, the condensed video contains target tracks of the same circular flow, and the visual browsing effect is good.
Drawings
Fig. 1 is a schematic diagram illustrating the steps of the video concentration method according to the present invention.
FIG. 2 is a schematic diagram of a method for equidistant nearest neighbor sampling points according to the present invention.
FIG. 3 is a schematic diagram of a method for measuring similarity between target tracks according to the present invention.
Fig. 4 is a schematic diagram of grouping target track motion patterns according to the present invention.
Detailed Description
Referring to fig. 2, in the equidistant nearest-neighbor sampling point method for a target track, a straight line connects the start point and the end point of the track, and the line is divided into 8, 16, 32 or 64 equal parts according to the resolution of the video image. Taking an 8-part division as an example, a circle is drawn around each division point on the line and the nearest point on the track is searched for as a sampling point of the target track. The coordinates of the sampling points on the track (the start point, the end point and the 7 interior points, 9 in total) together form the coordinate vector representing the target track.
Referring to fig. 3, in the method for measuring similarity between target tracks, after the tracks are represented with equidistant nearest-neighbor sampling points, the Euclidean distances between corresponding sampling points of the two tracks are calculated in sequence and summed; the sum is the distance between the two target tracks. The smaller the distance value, the higher the similarity.
Referring to fig. 4, there are 6 track motion patterns, where a and b, c and d, and e and f are the representative tracks of three pairs of motion patterns with similar tracks but opposite directions, and large collisions exist within each of these three pairs of representative tracks. The principle of motion pattern grouping is to place motion patterns whose representative tracks collide little into one group, while the motion patterns within a group follow one circular flow direction as far as possible. After grouping, there are two groups: a, c and e form one group, and b, d and f form the other.
Referring to fig. 1, the present invention provides a video concentration method based on target track motion pattern grouping, comprising:
1) Generation and grouping of target track motion patterns. Historical surveillance video data are input, a model is constructed to train on the video data, and target track motion patterns and groups are generated.
The specific process of the step 1) is as follows:
11) Inputting offline surveillance video data, performing target detection and instance segmentation with a YOLACT++ model, and performing target tracking with a DeepSORT model to obtain the instance mask sequence of each target across video frames, the instance mask sequence being the target track; this completes the extraction of all moving-target tracks in the video;
12 On the basis of target track extraction, clustering operation is carried out on all target tracks by adopting a DBSCAN clustering algorithm, the target tracks are clustered into different classes, and a target track motion mode is generated;
the specific process of the step 12) is as follows:
121) The target track is represented with the equidistant nearest-neighbor sampling point algorithm as a coordinate vector
$$T = \left[(x_1, y_1), (x_2, y_2), \dots, (x_{2^{\omega}+1}, y_{2^{\omega}+1})\right]$$
where $\omega$ is the fold parameter used in the algorithm and $(x_i, y_i)$ are the coordinates of a track sampling point;
as shown in fig. 2, the specific process of step 121) is as follows:
1211 Mark the coordinates of the starting point and the end point of the target track, and draw a straight line from the starting point to the end point;
1212) Divide the straight line into $2^{\omega}$ equal parts, where the fold parameter $\omega$ represents the number of halvings and typically has the value range $\omega \in \{3, 4, 5, 6\}$; the line then has $2^{\omega}-1$ division points;
1213) Taking each division point on the straight line as the center of a circle, find the point on the target track closest to that division point and take it as a sampling point of the target track;
1214) The start and end points of the target track, together with the $(2^{\omega}-1)$ sampling points, form the coordinate vector representing the target track, of the form
$$T = \left[(x_1, y_1), (x_2, y_2), \dots, (x_{2^{\omega}+1}, y_{2^{\omega}+1})\right]$$
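As a non-limiting illustration of steps 1211)-1214), the following is a minimal Python/NumPy sketch of the equidistant nearest-neighbor sampling, assuming the track is given as a per-frame sequence of centroid coordinates; the function and variable names are illustrative, not part of the invention.

```python
import numpy as np

def sample_track(track_xy, omega=3):
    # track_xy: (N, 2) per-frame centroid coordinates of one target (assumed input).
    # omega: fold parameter; the start-end line is halved omega times, giving
    # 2**omega segments and 2**omega - 1 interior division points.
    track_xy = np.asarray(track_xy, dtype=float)
    start, end = track_xy[0], track_xy[-1]
    # Interior division points of the straight line from start to end.
    ts = np.linspace(0.0, 1.0, 2 ** omega + 1)[1:-1]
    div_pts = start + ts[:, None] * (end - start)
    # The track point nearest to each division point becomes a sampling point.
    d = np.linalg.norm(track_xy[None, :, :] - div_pts[:, None, :], axis=2)
    nearest = track_xy[d.argmin(axis=1)]
    # Coordinate vector: start point, the 2**omega - 1 sampled points, end point.
    return np.vstack([start, nearest, end])  # shape (2**omega + 1, 2)
```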
122) Determine the similarity criterion of target tracks: the Euclidean distance between track coordinate vectors is taken as the distance between two tracks, and a smaller distance means higher similarity. As shown in fig. 3, the distance between the $i$-th and $j$-th tracks is $D_{i,j}$, calculated as:
$$D_{i,j} = \sum_{k=1}^{2^{\omega}+1} \sqrt{\left(x_k^i - x_k^j\right)^2 + \left(y_k^i - y_k^j\right)^2}$$
where $x_k^i$ and $y_k^i$ are the x and y coordinate values of the $k$-th sampling point of the $i$-th track, and $\omega$ is the fold parameter.
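A corresponding sketch of the distance $D_{i,j}$ of step 122), under the same assumptions (coordinate vectors as returned by the sampling sketch above):

```python
import numpy as np

def track_distance(T_i, T_j):
    # D_ij of step 122): sum over the 2**omega + 1 corresponding sampling
    # points of the pointwise Euclidean distances.
    return float(np.linalg.norm(np.asarray(T_i) - np.asarray(T_j), axis=1).sum())
```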
123) Compute the pairwise distances between tracks to generate a similarity measurement matrix, set the neighbor radius γ and the minimum core-point number threshold Ω, and cluster the target tracks with the DBSCAN clustering algorithm; each cluster is determined as one target track motion pattern;
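Step 123) can be sketched with scikit-learn's DBSCAN on a precomputed distance matrix, where eps stands for the neighbor radius γ and min_samples for the core-point threshold Ω; this is one possible realization, not necessarily the inventors' implementation.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_tracks(coord_vectors, gamma, min_core_pts):
    # Similarity measurement matrix: pairwise track distances of step 122),
    # reusing track_distance from the sketch above.
    n = len(coord_vectors)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = track_distance(coord_vectors[i], coord_vectors[j])
    # Each label >= 0 is one target track motion pattern; -1 marks noise tracks.
    return DBSCAN(eps=gamma, min_samples=min_core_pts,
                  metric="precomputed").fit_predict(dist)
```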
124) For each target track motion pattern, a representative track is generated: the mean abscissa x and mean ordinate y of the coordinate vectors of all target tracks belonging to the same cluster are computed per sampling point to form the coordinate vector of the representative track. Assuming there are $n$ target tracks in a cluster, $\omega$ is the fold parameter, $2^{\omega}+1$ is the number of sampling points per track, and $x_k^i$ and $y_k^i$ are the x and y coordinates of the $k$-th sampling point of the $i$-th track, the calculation is:
$$\bar{x}_k = \frac{1}{n}\sum_{i=1}^{n} x_k^i, \qquad \bar{y}_k = \frac{1}{n}\sum_{i=1}^{n} y_k^i$$
Then the coordinate vector of the representative track is:
$$\bar{T} = \left[(\bar{x}_1, \bar{y}_1), (\bar{x}_2, \bar{y}_2), \dots, (\bar{x}_{2^{\omega}+1}, \bar{y}_{2^{\omega}+1})\right]$$
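Step 124) reduces to a per-sampling-point mean; a one-function sketch under the same assumptions:

```python
import numpy as np

def representative_track(cluster_vectors):
    # Step 124): per-sampling-point mean of the coordinate vectors in one cluster.
    return np.mean(np.stack(cluster_vectors), axis=0)  # shape (2**omega + 1, 2)
```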
13 On the basis of target track motion mode generation, according to the principle of less collision and direction consistency, the target track motion modes are divided to generate track motion mode groups.
The specific process of the step 13) is as follows:
131) Count the sizes (height and width) of all target detection boxes of the current motion pattern group in the video, and compute the average height and width of the detection boxes;
132) Superimpose the average detection-box height and width onto the representative track so that the representative track carries one detection box in each frame; the collision value between two representative tracks is then computed by summing the numbers of intersection pixels over all pairs of detection boxes from the two tracks;
133) Divide 360° into 8 equal parts in a rectangular coordinate system, representing 8 direction classes; the direction of a representative track is the direction of the line from its start point to its end point; the representative track directions are classified, and the relation between different representative track directions is judged by whether they belong to the same class;
134) As shown in fig. 4, the representative tracks are grouped according to their collision and direction relations; each group is one target track motion pattern group. A collision threshold is set so that the representative tracks within a group have no or few collisions; meanwhile, the direction relation is considered so that motion patterns forming a circular flow are preferentially placed in one group, keeping the subsequent in-group condensed video fluent.
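The direction classification of step 133) and a simplified greedy version of the grouping of step 134) might look as follows; the collision matrix is assumed precomputed as in step 132), and the circular-flow preference of the patent is not modeled in this sketch.

```python
import numpy as np

def direction_class(rep_track, theta=8):
    # Step 133): angle of the start-to-end line, quantized into theta classes.
    dx, dy = rep_track[-1] - rep_track[0]
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0
    return int(angle // (360.0 / theta))

def group_patterns(n_patterns, collision, threshold):
    # Step 134), simplified: a pattern joins the first group whose members it
    # collides with by at most `threshold`; otherwise it opens a new group.
    groups = []
    for i in range(n_patterns):
        for g in groups:
            if all(collision[i][j] <= threshold for j in g):
                g.append(i)
                break
        else:
            groups.append([i])
    return groups
```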
2) On-line video concentration based on the track motion patterns. A surveillance video stream is input, video moving-target tracks are extracted, and on-line video concentration is performed with the track motion patterns and grouping results of step 1) to generate a condensed video.
The specific process of step 2) is as follows:
21) Input the surveillance video stream, perform background modeling with a Gaussian mixture model, extract the background picture of the surveillance video, and update the background at a set time interval;
22) Input the surveillance video stream, subtract the background from the current frame, and obtain the foreground mask through dilation and erosion operations; then track the targets with Kalman filtering and the Hungarian matching algorithm to extract the target motion tracks;
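Steps 21)-22) map directly onto OpenCV primitives; a hedged sketch follows, with the tracker (Kalman filtering plus Hungarian matching) omitted and the input path purely illustrative.

```python
import cv2

cap = cv2.VideoCapture("surveillance.mp4")  # illustrative input
mog = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=False)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg = mog.apply(frame)                           # current frame minus modeled background
    fg = cv2.erode(cv2.dilate(fg, kernel), kernel)  # dilation, then erosion
    background = mog.getBackgroundImage()           # background picture, updated over time
cap.release()
```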
23 Matching the extracted target motion track with the target track motion mode, and if the matching is successful, automatically attributing the target track to a corresponding track motion mode group; if the matching fails, the target track is assigned to an abnormal track motion mode group;
the specific process of the step 23) is as follows:
231) The extracted target motion track is represented with the equidistant nearest-neighbor sampling point method. As shown in fig. 2, the target motion track is represented as a coordinate vector
$$T = \left[(x_1, y_1), (x_2, y_2), \dots, (x_{2^{\omega}+1}, y_{2^{\omega}+1})\right]$$
where $\omega$ is the fold parameter used in the algorithm and $(x_i, y_i)$ are the coordinates of a track sampling point;
232 Calculating the distances of all sampling points between the target track and the representative tracks of all motion modes to serve as a basis for judging the similarity between the tracks, wherein the smaller the distance value is, the higher the similarity is;
233) The target track motion pattern with the shortest distance is found, the shortest distance is denoted $d_{min}$ and compared with a set threshold $\alpha$; if $d_{min} \le \alpha$, the target track is attributed to that target track motion pattern and to the corresponding track motion pattern group;
234) If $d_{min} > \alpha$, the target track is attributed to the abnormal track motion pattern.
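Steps 231)-234) amount to a nearest-pattern lookup with a rejection threshold; a sketch reusing track_distance from the earlier sketch, where -1 stands for the abnormal track motion pattern:

```python
import numpy as np

def match_pattern(track_vec, rep_tracks, alpha):
    # Distance to every pattern's representative track; d_min decides the group.
    dists = [track_distance(track_vec, rep) for rep in rep_tracks]
    k = int(np.argmin(dists))
    return k if dists[k] <= alpha else -1  # -1: the abnormal track motion pattern
```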
24) For target motion tracks belonging to the same track motion pattern group, video concentration is performed with a target track rearrangement method based on dynamic-search-space local optimization; for target motion tracks belonging to the abnormal track motion pattern group, video concentration with target track rearrangement is performed directly according to the energy loss function;
the specific process of the step 24) is as follows:
241) An energy loss function is defined, whose loss terms comprise a condensed-length loss term, a pseudo-collision loss term and a time-order disorder loss term; the energy loss function has the form:
$$E = \sum_{i \in P_g} \Big( \alpha_r E_r(l_i) + \sum_{j \in P_c} \big[ \alpha_c E_c(l_i, l_j) + \alpha_t E_t(l_i, l_j) \big] \Big)$$
where $l_i$ denotes the start position of track $i$ in the condensed video, $P_g$ denotes all tracks belonging to the same motion pattern group as track $i$, and $P_c$ denotes all tracks belonging to the same motion pattern as track $i$; $E_r$ is the condensed-video length loss term, $E_c$ the condensed-video pseudo-collision loss term, and $E_t$ the condensed-video track time-order disorder loss term; $\alpha_r$, $\alpha_c$ and $\alpha_t$ are the weights of the respective loss terms, balancing their effects;
242) The range of target track rearrangement positions is determined with the dynamic search space method. The start position $l_i$ of target track $O_i$ in the condensed video is a dynamically changing value; its minimum is denoted $l_{min}$ and its maximum $l_{max}$. The minimum is the maximum of the start positions of all tracks already arranged in the group, and the maximum is the maximum of the end positions of all tracks already arranged in the group:
$$l_{min} = \max_{k \in P_c} \, l_k, \qquad l_{max} = \max_{k \in P_c} \big( l_k + len(O_k) \big)$$
where $P_c$ denotes the tracks of the same motion pattern as track $i$ that have already been arranged, and $len(O_k)$ is the frame length of the $k$-th track;
243) Within the start-position search space and under the energy loss function, the target tracks in the group are condensed and rearranged with a locally optimized greedy algorithm.
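Steps 242)-243) can be sketched as a greedy placement over the dynamic search space; here `energy` stands in for the loss function of step 241) and is an assumption of this sketch, as is the list-of-lengths input.

```python
def rearrange_group(track_lengths, energy):
    # track_lengths[i] = len(O_i) in frames; energy(l, placed) evaluates the
    # loss of starting the next track at frame l against the (start, length)
    # pairs already arranged in the group -- both names are illustrative.
    placed, starts = [], []
    for length in track_lengths:
        if placed:
            l_min = max(s for s, _ in placed)      # max start position so far
            l_max = max(s + n for s, n in placed)  # max end position so far
        else:
            l_min = l_max = 0
        best = min(range(l_min, l_max + 1), key=lambda l: energy(l, placed))
        placed.append((best, length))
        starts.append(best)
    return starts
```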
25) For each video concentration group, fuse the rearranged target tracks (target tubes) with the extracted video background picture frame by frame using a Poisson fusion algorithm, and merge the fused consecutive video frames to generate the condensed video.
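Step 25)'s per-frame Poisson fusion is available in OpenCV as seamlessClone; in this sketch, patch, patch_mask and center (the paste position of the target tube in the current frame) are assumed inputs.

```python
import cv2

# One target of one condensed frame; repeat per target and per frame.
fused = cv2.seamlessClone(patch, background, patch_mask, center, cv2.NORMAL_CLONE)
```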

Claims (8)

1. A video concentration method based on a target track motion mode is characterized by comprising the following specific processes:
1) Generation and grouping of target track motion patterns: inputting historical surveillance video data, constructing a model to train on the video data, and generating target track motion patterns and groups;
2) On-line video concentration based on the track motion patterns: inputting a surveillance video stream, extracting video moving-target tracks, and performing on-line video concentration with the track motion patterns and grouping results of step 1) to generate a condensed video.
2. The method for concentrating video based on target track motion pattern according to claim 1, wherein the specific process of step 1) is as follows:
11) Inputting offline surveillance video data, performing target detection, instance segmentation and target tracking with deep learning models, obtaining the instance mask sequence of each target across video frames (the instance mask sequence being the target track), and completing the extraction of all moving-target tracks in the video;
12 On the basis of extracting the target tracks, clustering all the target tracks by adopting a clustering algorithm, clustering the target tracks into different classes, and generating a target track motion mode;
13 On the basis of target track motion mode generation, according to the principle of less collision and direction consistency, the target track motion modes are divided to generate track motion mode groups.
3. The video concentration method based on a target track motion pattern according to claim 1, wherein the specific process of step 2) is as follows:
21) Inputting the surveillance video stream, performing background modeling with a Gaussian mixture model, extracting the background picture of the surveillance video, and updating the background at a set time interval;
22) Inputting the surveillance video stream, subtracting the background from the current frame, and obtaining the foreground mask through dilation and erosion operations; then tracking the targets with Kalman filtering and the Hungarian matching algorithm to extract the target motion tracks;
23 Matching the extracted target motion track with the target track motion mode, and automatically attributing the target track to the corresponding track motion mode group if the matching is successful; if the matching fails, the target track is assigned to an abnormal track motion mode group;
24 For the target motion tracks belonging to the same track motion mode grouping, carrying out video concentration processing by adopting a target track rearrangement method based on dynamic search space local optimization; defining an energy loss function for target motion tracks belonging to abnormal track motion mode groups, and performing video concentration processing of target track rearrangement;
25) For each video concentration group, performing image fusion processing frame by frame on the rearranged target tracks (target tubes) and the extracted video background picture with a Poisson fusion algorithm, and merging the fused consecutive video frames to generate a condensed video.
4. The method as claimed in claim 2, wherein the clustering algorithm is used to cluster all target tracks, and the target tracks are clustered into different classes to generate the target track motion pattern, wherein the step 12) comprises the following specific steps:
121 The target track is expressed by adopting an equal-spacing nearest adjacent sampling point algorithm, the target track is expressed as a coordinate vector, and the length of the coordinate vector is the number of sampling points;
122 Determine the similarity criterion of the target track, and take the euclidean distance between the coordinate vectors of the tracks as the distance between the two tracks, wherein the smaller the distance value is, the higher the similarity is.
123 Calculating and generating a similarity measurement matrix of the distances between the tracks, setting the radius gamma of adjacent points and the threshold omega of the minimum number of core points by adopting a DBSCAN clustering algorithm, and clustering the target tracks, wherein each cluster is determined as a target track motion mode;
124 For each target trajectory motion pattern, a representative trajectory is generated. And respectively calculating the average value of the abscissa x and the average value of the ordinate y of the coordinate vectors of all target tracks belonging to the same cluster according to the sampling points to form the coordinate vector representing the track.
5. The method as in claim 2 wherein the step 13) comprises the following steps:
131) Counting the sizes (height and width) of all target detection boxes of the current motion pattern group in the video, and computing the average height and width of the detection boxes;
132) Superimposing the average detection-box height and width onto the representative track so that the representative track carries one detection box in each frame, and then computing the collision value between two representative tracks by summing the numbers of intersection pixels over all pairs of detection boxes from the two tracks;
133) Dividing 360° into θ equal parts in a rectangular coordinate system, representing θ direction classes, taking the direction of the line from the start point to the end point of a representative track as the track's direction, classifying the representative track directions, and judging the relation between different representative track directions by whether they belong to the same direction class;
134) Grouping the representative tracks according to their collision and direction relations, each group being one target track motion pattern group; a collision threshold is set so that the representative tracks within a group have no or few collisions; meanwhile, the direction relation is considered so that motion patterns forming a circular flow are preferentially placed in one group, keeping the subsequent in-group condensed video fluent.
6. The method according to claim 3, wherein the extracted target motion track is matched with the target track motion patterns, and if the matching succeeds, the target track is automatically attributed to the corresponding track motion pattern group; the specific process of step 23) is as follows:
231 The extracted target motion track is expressed by adopting an equidistant nearest adjacent sampling point representation method, and the target motion track is expressed as a coordinate vector;
232 Calculating the distances of all sampling points between the target track and the representative tracks of all motion modes, and taking the distances as the basis for judging the similarity between the tracks, wherein the smaller the distance value is, the higher the similarity is;
233) Finding the target track motion pattern with the shortest distance, denoting the shortest distance $d_{min}$, and comparing it with a set threshold $\alpha$; if $d_{min} \le \alpha$, the target track is attributed to that target track motion pattern and to the corresponding track motion pattern group;
234) If $d_{min} > \alpha$, the target track is attributed to the abnormal track motion pattern.
7. The method according to claim 3, wherein for target motion tracks belonging to the same track motion pattern group, video concentration is performed with the target track rearrangement method based on dynamic-search-space local optimization; the specific process of step 24) is as follows:
241) Defining an energy loss function, whose loss terms comprise a condensed-length loss term, a pseudo-collision loss term and a time-order disorder loss term; the energy loss function has the form:
$$E = \sum_{i \in P_g} \Big( \alpha_r E_r(l_i) + \sum_{j \in P_c} \big[ \alpha_c E_c(l_i, l_j) + \alpha_t E_t(l_i, l_j) \big] \Big)$$
where $l_i$ denotes the start position of track $i$ in the condensed video, $P_g$ denotes all tracks belonging to the same motion pattern group as track $i$, and $P_c$ denotes all tracks belonging to the same motion pattern as track $i$; $E_r$ is the condensed-video length loss term, $E_c$ the condensed-video pseudo-collision loss term, and $E_t$ the condensed-video track time-order disorder loss term; $\alpha_r$, $\alpha_c$ and $\alpha_t$ are the weights of the respective loss terms, balancing their effects;
242) Determining the range of target track rearrangement positions with the dynamic search space method. The start position $l_i$ of target track $O_i$ in the condensed video is a dynamically changing value; its minimum is denoted $l_{min}$ and its maximum $l_{max}$. The minimum is the maximum of the start positions of all tracks already arranged in the group, and the maximum is the maximum of the end positions of all tracks already arranged in the group:
$$l_{min} = \max_{k \in P_c} \, l_k, \qquad l_{max} = \max_{k \in P_c} \big( l_k + len(O_k) \big)$$
where $P_c$ denotes the tracks of the same motion pattern as track $i$ that have already been arranged, and $len(O_k)$ is the frame length of the $k$-th track;
243) Within the start-position search space and under the energy loss function, condensing and rearranging the target tracks in the group with a locally optimized greedy algorithm.
8. The method as claimed in claim 4, wherein the step 121) comprises the following steps:
1211 Mark the coordinates of the starting point and the end point of the target track, and draw a straight line from the starting point to the end point;
1212) Dividing the straight line into $2^{\omega}$ equal parts, where the fold parameter $\omega$ represents the number of halvings and typically has the value range $\omega \in \{3, 4, 5, 6\}$, the line then having $2^{\omega}-1$ division points;
1213) Taking each division point on the straight line as the center of a circle, finding the point on the target track closest to that division point, and taking it as a sampling point of the target track;
1214) The start and end points of the target track, together with the $(2^{\omega}-1)$ sampling points, form the coordinate vector representing the target track, of the form
$$T = \left[(x_1, y_1), (x_2, y_2), \dots, (x_{2^{\omega}+1}, y_{2^{\omega}+1})\right]$$
CN202211322752.0A 2022-10-27 2022-10-27 Video concentration method based on target track motion mode Pending CN115695949A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211322752.0A CN115695949A (en) 2022-10-27 2022-10-27 Video concentration method based on target track motion mode


Publications (1)

Publication Number Publication Date
CN115695949A 2023-02-03

Family

ID=85098422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211322752.0A Pending CN115695949A (en) 2022-10-27 2022-10-27 Video concentration method based on target track motion mode

Country Status (1)

Country Link
CN (1) CN115695949A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116074642A (en) * 2023-03-28 2023-05-05 石家庄铁道大学 Monitoring video concentration method based on multi-target processing unit



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination