CN116647690B - Video concentration method based on space-time rotation - Google Patents

Video concentration method based on space-time rotation

Info

Publication number
CN116647690B
CN116647690B
Authority
CN
China
Prior art keywords
target
time
space
collision
rotation
Prior art date
Legal status
Active
Application number
CN202310626770.6A
Other languages
Chinese (zh)
Other versions
CN116647690A (en)
Inventor
张云佐
郭凯娜
朱鹏飞
张天
Current Assignee
Shijiazhuang Tiedao University
Original Assignee
Shijiazhuang Tiedao University
Priority date
Filing date
Publication date
Application filed by Shijiazhuang Tiedao University
Priority to CN202310626770.6A
Publication of CN116647690A
Application granted
Publication of CN116647690B


Classifications

    • H04N19/42: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals, characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/70: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals, characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N5/917: Television signal processing for bandwidth reduction
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/08: Learning methods
    • G06V10/82: Arrangements for image or video recognition or understanding using neural networks
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • Y02T10/40: Engine management systems

Abstract

The invention discloses a video concentration method based on space-time rotation, belonging to the field of computer vision and comprising the following steps: 1) according to a target duty-ratio threshold, apply dynamic time-domain translation or space-time rotation, respectively, to avoid pseudo collisions during video concentration; 2) define an angle critical threshold that divides space-time rotation into adaptive space-time rotation and critical space-time rotation, avoiding pseudo collisions while guaranteeing the compression rate; 3) design a time-sequence function for time-order judgment, avoiding temporal confusion among targets; 4) finally, stitch the target tubes to generate the condensed video. The method makes full use of the spatio-temporal information in the video, avoids pseudo collisions while maintaining a good frame concentration rate, and preserves the temporal order among targets, yielding a good visual effect.

Description

Video concentration method based on space-time rotation
Technical Field
The invention relates to a video concentration method based on space-time rotation, and belongs to the technical field of computer vision.
Background
In recent years, with rising living standards, people's awareness of security precautions has continuously strengthened, and surveillance equipment, one of the effective means of security precaution, is widely deployed in busy places such as banks and supermarkets. Large numbers of cameras record surveillance video 24 hours a day for security purposes, producing a vast amount of video data; although the captured video contains information useful for security, storing, managing and reviewing such large amounts of video data is becoming increasingly difficult. In surveillance video, there may be no activity at all, or activity may occur only in a small image region within a given period; finding the required information by watching the video manually consumes a great deal of manpower, material resources and time, and long periods of focused viewing easily fatigue staff, so that information is misjudged through mistaken or missed viewing. Video concentration technology provides an effective solution to these problems: it optimally rearranges the motion trajectories in the original picture to generate a condensed video suitable for efficient browsing and searching, greatly reducing the video duration without losing the information of the original video. Faced with massive surveillance video data, quickly browsing and querying the required content has become an urgent practical need, and the application of video concentration technology is particularly important given the video data being generated at every moment in today's society.
Furthermore, ghatak et al evaluate the performance of converting the tube optimization rearrangement problem into solving an energy function optimal solution problem, namely simulated annealing (Simulated Annealing, SA), cultural algorithm (Cultural Algorithm, CA), teaching-based optimization algorithm (Teaching-Learning-Based Optimization, TLBO), forest optimization algorithm (Forest Optimization Algorithm, FOA), gray wolf optimizer (Gray Wolf Optimizer, GWO), non-dominant ranking genetic algorithm II (Non-dominated Sorting Genetic Algorithm-II, NSGA-II), JAYA algorithm, elite-JAYA algorithm (elist-JAYA) and Adaptive Multi-Population-based JAYA algorithm (SAMP-JAYA). In a subsequent improvement, ghatak et al propose to improve the energy minimization process using a hybrid algorithm combining SA and JAYA. Yao et al use genetic algorithms (Genetic Algorithm, GA) to generate new minimisation energy function formulas. In addition, xu et al propose an optimization scheme based on GA to solve the problem of merging of target tubes in the process of concentrated video generation, which is superior to the SA-based method in terms of information loss and time consumption. Ghatak et al explored the concept of multi-frame and scaling and proposed a HGWOSA optimization algorithm that mixed GWO and SA together to obtain globally optimal results at low computational cost. Moussa et al optimized rearranging the target tubes using a particle swarm algorithm (Particle Swarm Optimization, PSO), reduced false collisions, maintained a time sequence, and calculated interactions between targets.
The most challenging task in object-based video concentration is obtaining the optimal rearrangement of the target tubes so as to show the most motion information in the shortest time span. During video concentration, two objects that never collide in the original video may collide in the condensed video; this is called a pseudo collision. To address problems such as loss of target interactivity and false collisions, many improved video concentration techniques have been proposed. Nie et al. proposed a video concentration technique that moves objects in both the time and space domains to generate a condensed video with fewer false collisions. Li et al. proposed a solution to the false-collision problem that minimizes the size of identified false-collision targets in the time domain; although this technically resolves the false collisions, a vehicle and a person close to each other in the same scene of the condensed video may end up the same size, which does not accord with reality. He et al. defined the collision states between moving objects as no collision, same-direction collision and reverse collision, further analyzed motion collisions, and also proposed a collision-graph-based optimization strategy that fills in target tubes deterministically, reducing computational complexity.
Unlike methods that optimally rearrange target tubes by solving an energy function, Huang et al. demonstrated the superiority of online optimization, which rearranges target tubes while targets are still being detected, without waiting for the optimization process to begin. The biggest problem with this method, however, is that it ignores false collisions entirely in order to improve time performance; another problem is that the optimization threshold is set manually rather than by a decision technique, which trades computation time against compression rate and lowers accuracy. Feng et al. introduced a background generation method that selects the most active video frames from the images with the greatest background variation, so that the resulting condensed-video background changes with time and motion. Hsia et al. proposed an optimized rearrangement method that selects target tubes using region trees, and introduced an efficient search technique for the moving-target database, reducing computational complexity.
Disclosure of Invention
In existing video concentration methods, severe pseudo collisions and disordered target timing make the generated condensed video a poor reflection of the original video, and existing methods based on solving the optimal solution of an energy function handle local pseudo-collision avoidance poorly. The invention therefore aims to provide a video concentration method that simultaneously preserves target temporal order, avoids pseudo collisions, and maintains a high video compression rate, achieving high-quality and efficient video concentration through a designed space-time rotation algorithm and time-sequence function.
To achieve the above object, an embodiment of the present invention provides a video concentration method based on space-time rotation, characterized by comprising the following steps:
1) Perform target detection and tracking on the input video using the YOLOv4 and DeepSort algorithms, and extract the target tubes;
2) Analyze the pseudo collisions generated among targets during video concentration, define a target duty-ratio threshold, and process the pseudo-collision targets accordingly;
3) Propose a dynamic time-domain translation method, applying dynamic time-domain translation to moving targets larger than the target duty-ratio threshold to avoid pseudo collisions, thereby ensuring the robustness of the space-time rotation algorithm on surveillance videos whose moving targets differ in size;
4) Apply space-time rotation to moving targets smaller than the target duty-ratio threshold, define an angle critical threshold, and divide the space-time rotation method into adaptive space-time rotation and critical space-time rotation;
5) For pseudo-collision targets whose angle critical threshold is less than 0, adopt adaptive space-time rotation to improve fidelity to the original video;
6) For pseudo-collision targets whose angle critical threshold is greater than 0, adopt critical space-time rotation to improve the compression rate of the condensed video.
A further technical solution is as follows: first, a target interactivity function is defined to analyze the interactivity among targets, which are divided into management sets for common processing; second, the pseudo collisions among targets are analyzed, and dynamic time-domain translation or space-time rotation is applied according to the target duty-ratio threshold to avoid them, with an angle critical threshold defined to divide space-time rotation into adaptive space-time rotation and critical space-time rotation, avoiding pseudo collisions while ensuring the compression rate; then, a time-sequence function is designed for time-order judgment, avoiding temporal confusion among targets; finally, the target tubes are stitched to generate the condensed video.
A further technical solution is as follows: assume the target tubes involved in the pseudo collision are T_i and T_j, where T_i has already been rearranged and belongs to the condensed video while T_j has not yet been rearranged. The collision overlap ratio of T_i and T_j at time t is defined as:
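The formula itself appears only as an image in the source and is not reproduced here. As a plausible stand-in, the sketch below treats the collision overlap ratio as the intersection-over-union of the two tubes' bounding boxes at time t; the tube representation (a dict from frame index to box) and the (x1, y1, x2, y2) box format are assumptions for illustration, not the patent's definitions.

```python
# Hypothetical sketch: collision overlap ratio of two target tubes at time t,
# assumed here to be the IoU of their bounding boxes (the patent's exact
# formula is not reproduced in this text).

def box_at(tube, t):
    """Bounding box of a tube at condensed time t, or None if absent."""
    return tube.get(t)  # tube: dict {frame index: (x1, y1, x2, y2)}

def overlap_ratio(tube_i, tube_j, t):
    bi, bj = box_at(tube_i, t), box_at(tube_j, t)
    if bi is None or bj is None:
        return 0.0  # one tube is absent at t, so no collision is possible
    ix1, iy1 = max(bi[0], bj[0]), max(bi[1], bj[1])
    ix2, iy2 = min(bi[2], bj[2]), min(bi[3], bj[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_i = (bi[2] - bi[0]) * (bi[3] - bi[1])
    area_j = (bj[2] - bj[0]) * (bj[3] - bj[1])
    union = area_i + area_j - inter
    return inter / union if union > 0 else 0.0
```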
the further technical proposal is that: defining a target duty ratio threshold, and respectively carrying out dynamic time domain translation and time-space domain rotation according to the threshold to avoid pseudo collision; and secondly, defining an angle critical threshold value, and dividing the time-space domain rotation into self-adaptive time-space rotation and critical time-space rotation.
A further technical solution is as follows: a target duty-ratio threshold is defined:
For a pseudo-collision target whose target-frame area is larger than γ, dynamic time-domain translation is adopted to delay target tube T_j, whose start-time label is modified as follows:
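Both formulas above are likewise images in the source. A minimal sketch of the dynamic time-domain translation idea, assuming the start-time label of T_j is simply delayed until no shared frame still overlaps; the patent's closed-form update involving the target speed v is not reproduced, so the incremental search is an assumption. It reuses overlap_ratio from the previous sketch.

```python
def translate_start_label(tube_i, tube_j, max_delay=300):
    """Hypothetical dynamic time-domain translation: delay T_j's start-time
    label frame by frame until it no longer pseudo-collides with the
    already-placed T_i. Returns the shifted tube and the delay applied."""
    for delay in range(max_delay + 1):
        shifted = {t + delay: box for t, box in tube_j.items()}
        if all(overlap_ratio(tube_i, shifted, t) == 0.0 for t in shifted):
            return shifted, delay
    return tube_j, 0  # budget exhausted: leave T_j for other handling
```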
the further technical proposal is that: for a target tube T with a target frame area smaller than gamma i And T j Avoiding pseudo collision by adopting a time-space domain rotation method, and defining an angle critical threshold value:
the further technical proposal is that: for the monitoring video with eta less than or equal to 0, a self-adaptive space-time rotation mode is provided to avoid pseudo collision, and a target tube T is arranged j The starting point is taken as the circle center to rotate so as to be far away from the collision point until two targets are just free from collision, and the dotted line is a target tube T j Space-time trajectory after rotation.
A further technical solution is as follows: assume target tube T_j enters the surveillance zone at a known coordinate. Target tube T_j is rotated to avoid the false collision, and its direction vector after rotation is defined as:
The rotation angle of target tube T_j is calculated as follows:
the further technical proposal is that: for the monitoring video with eta > 0, a critical space-time rotation method is provided to avoid pseudo collision on the premise of ensuring the compression rateTarget tube T j And rotating the target tubes by taking the starting point as the center of a circle to enable the target tubes to be far away from the collision point until two target tubes have no overlapping frames.
A further technical solution is as follows: the target tubes are classified according to their motion-position relation within the monitored area, defined by the following formula:
where the two barycentric coordinates are those at which the m-th target exits and enters the surveillance zone, respectively.
A further technical solution is as follows: with the starting point at which target tube T_j enters the monitored area as the circle center, when λ ≥ 0 the critical intersection of T_i and T_j lies at the endpoint of T_i, and T_i is then called the proximal segment; when λ < 0 the critical intersection lies at the endpoint of T_j, and T_j is then the proximal segment. When performing the optimal rearrangement of target tubes, different target tubes serving as the proximal segment must be treated differently to avoid collisions.
The further technical proposal is that: t (T) i For the near heart segment, T is j Rotate until it is equal to T i The two target tubes will not have a collision relationship,the specific definition of (2) is as follows:
wherein,respectively T i Coordinates and T of exit monitoring area j Into the coordinates of the monitored area. Known->Is T j Can calculate the rotation angle +.>
A further technical solution is as follows: when T_j is the proximal segment, T_j is rotated until the endpoint of T_j intersects T_i, after which the two target tubes have no collision relationship. Given the known conditions, and assuming the intersection of the rotated T_j with T_i has coordinates (μ, ζ), the rotated direction vector is specifically defined as:
wherein, (μ, ζ) is calculated according to known conditions:
the beneficial effects of adopting above-mentioned technical scheme to produce lie in: aiming at the problem of pseudo collision among targets in a concentrated video, the invention provides a high-efficiency concentration scheme of a monitoring video based on space-time rotation, and firstly, a target interactivity function is provided for carrying out interactive analysis on an extracted target pipe to determine a target with interactive behavior; secondly, a dynamic time domain translation and space-time rotation algorithm is provided on the problem of processing the pseudo collision, and the self-adaptive space-time rotation and critical space-time rotation are respectively carried out according to the collision mode of the target, so that the compression rate is ensured while the pseudo collision is avoided; and finally, filling a target tube by using a rotation search mode, and defining a time sequence function to ensure the consistency of the generated concentrated video and the original video time sequence. Experimental results show that the method avoids pseudo collision among targets on the premise of ensuring the compression rate, and can realize high-quality video concentration.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading the following detailed description of non-limiting embodiments, given with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of a monitoring video concentration process based on space-time rotation according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of dynamic time domain translation according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of adaptive spatio-temporal rotation according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of critical spatiotemporal rotation provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of a pseudo collision according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of critical spatiotemporal rotation (a) according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of critical spatiotemporal rotation (b) according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention, but the invention may also be practiced in ways other than those described here; those skilled in the art will readily appreciate that the invention is not limited to the specific embodiments disclosed below.
As shown in Fig. 1, the overall flow of surveillance-video concentration based on space-time rotation provided by an embodiment of the present invention includes:
1) Perform target detection and tracking on the input video using the YOLOv4 and DeepSort algorithms, and extract the target tubes;
2) Analyze the pseudo collisions generated among targets during video concentration, define a target duty-ratio threshold, and process the pseudo-collision targets accordingly;
3) Propose a dynamic time-domain translation method, applying dynamic time-domain translation to moving targets larger than the target duty-ratio threshold to avoid pseudo collisions, thereby ensuring the robustness of the space-time rotation algorithm on surveillance videos whose moving targets differ in size;
4) Apply space-time rotation to moving targets smaller than the target duty-ratio threshold, define an angle critical threshold, and divide the space-time rotation method into adaptive space-time rotation and critical space-time rotation;
5) For pseudo-collision targets whose angle critical threshold is less than 0, adopt adaptive space-time rotation to improve fidelity to the original video;
6) For pseudo-collision targets whose angle critical threshold is greater than 0, adopt critical space-time rotation to improve the compression rate of the condensed video.
As shown in Fig. 2, to ensure the robustness of the space-time rotation algorithm on surveillance videos whose moving targets differ in size, the invention proposes three processing modes for the space-time rotation algorithm according to the different pseudo-collision modes. First, a target duty-ratio threshold is defined, and dynamic time-domain translation or time-space-domain rotation is applied according to this threshold to avoid pseudo collisions; second, an angle critical threshold is defined, dividing time-space-domain rotation into adaptive space-time rotation and critical space-time rotation. The target duty-ratio threshold γ is defined as:
where w and h are the width and height of the target frame, W and H are the width and height of the video frame, and f is the frame rate of the input video. For a pseudo-collision target whose target-frame area is larger than γ, dynamic time-domain translation is adopted to delay T_j: assuming the moving speed of target j is v, the start-time label of T_j is modified as follows:
Fig. 2 shows the video-concentration result when the target-frame area is greater than γ: two targets that do not intersect in the original video appear at the same spatial location at the same time, creating a false collision. The dynamic time-domain translation method modifies target tube T_j; it can be seen that the condensed video after dynamic time-domain translation contains no pseudo collision, and because the target tube need not be translated over its whole duration, the influence on the compression rate is very small.
As shown in Fig. 3, to ensure a good visual effect in the condensed video, an adaptive space-time rotation mode is proposed for surveillance video with η ≤ 0 to avoid pseudo collisions. After the two target tubes are translated along the time axis, a small number of video frames overlap, and targets that are collision-free in the original video produce a pseudo collision after concentration. As shown in Fig. 3, target tube T_j is rotated about its starting point, taken as the circle center, away from the collision point until the two targets are just collision-free; the dotted line is the space-time trajectory of T_j after rotation.
To demonstrate the adaptive spatio-temporal rotation method more intuitively, Fig. 3 shows a two-dimensional schematic of target tubes T_i and T_j producing the maximum collision area at time t, with the motion direction vectors of T_i and T_j drawn for each. With the starting point at which T_j enters the monitored area as the circle center, T_j is rotated until the false collision disappears; as shown in Fig. 3, the dotted line indicates the direction of movement after rotation, given by the rotated direction vector, and the marked angle is the rotation angle of the target tube.
To improve the compression rate of the condensed video: for surveillance video with η > 0, a large number of video frames produce pseudo collisions, and delaying the start-time label of the target tube would greatly sacrifice the video compression rate; the critical space-time rotation method is therefore proposed to avoid pseudo collisions while guaranteeing the compression rate.
As shown in Fig. 4, after the two target tubes are translated along the time axis, a large number of overlapping video frames remain, and targets that are collision-free in the original video exhibit pseudo collisions after concentration. Target tube T_j is rotated about its starting point, taken as the circle center, away from the collision point until the two target tubes have no overlapping frames; as shown in Fig. 4, the dotted line is the space-time trajectory of T_j after rotation.
To illustrate the critical spatiotemporal rotation method more intuitively, Fig. 5 shows a two-dimensional schematic of the motion direction vectors of target tubes T_i and T_j at time t. The tubes are classified according to their motion-position relation within the monitored area, defined by the following formula:
as shown in FIG. 5, the target tube T j When lambda is more than or equal to 0, T is the center of a starting point entering a monitoring area i And T j Is T at critical intersection point i Is called T at this time i Is a proximal segment; lambda < 0, T i And T j Is T at critical intersection point j Is called T at this time j Is a proximal segment, as shown in fig. 5. In performing the target tube optimization rearrangement, different target tubes as proximal segments need to be treated differently to avoid collisions, as will be described separately belowFor T i And T j Introduction of rotation angle as proximal segmentA corresponding calculation method.
As shown in Fig. 6, T_i is the proximal segment. T_j is rotated until it no longer collides with T_i, and its rotated direction vector is specifically defined as:
as shown in FIG. 7, T j Is a proximal segment. Will T j Rotate until T j Endpoint and T of (2) i Intersecting, there will be no collision relationship between the two target tubes.
The rotation angle is the angle between the direction vectors of T_j before and after rotation. Given the known conditions, and assuming the intersection of the rotated T_j with T_i has coordinates (μ, ζ), the rotated direction vector is specifically defined as:
where (μ, ζ) is calculated from the known conditions:
from this, the target tube T can be obtained j Is of the rotation angle of (a)And (3) avoiding pseudo collision between target pipes, and generating a concentrated video which accords with the visual effect of human eyes and restores the relation between moving targets.
To verify the validity of the above embodiment, experimental comparisons were made on a public dataset and on surveillance video captured in real scenes. The experimental environment was a Windows 10 system with an Intel(R) Core(TM) i5-8265U CPU, an NVIDIA GeForce MX graphics card, and 16 GB of memory. The proposed method was experimentally verified on 12 segments of surveillance video.
Tables 1 and 2 show the experimental results for the frame concentration rate (FR) and the collision overlap rate (OR), respectively, on the 12 video segments. To compare performance between the different methods more intuitively, the average of the experimental results over all videos is also used for comparison in each table.
Table 1. Comparison of FR results with the CE, PSO, IV and CF methods
Table 2. Comparison of OR results with the CE, PSO, IV and CF methods
As can be seen from Table 2, the average OR of the proposed method is significantly better than those of the CE, PSO and IV methods: those methods merely translate the target tube along the time axis, whereas the proposed method considers both the spatial and temporal dimensions and can better avoid false collisions, yielding a superior OR. The average ORs of the proposed method and the CF method are close (0.0680 versus 0.0633): the CF method shifts the target tube along the time axis while also changing the size and moving speed of targets, and although this attains an OR close to the proposed method's, its visual quality is poor and its fidelity to the original video content needs improvement. Generally speaking, an improvement in FR comes at the cost of an increase in OR. From the combined analysis of Tables 1 and 2: compared with the CE method, the proposed method greatly improves FR while maintaining a good OR value; compared with the PSO method, it effectively reduces the OR of the condensed video and optimizes FR; and compared with the IV method, it is clearly superior in OR, and in FR it is close to yet still better than the IV method.
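The table bodies are images in the source. For concreteness, a sketch of plausible metric definitions, assuming FR is the ratio of condensed to original frame counts and OR is the fraction of condensed frames containing any pairwise tube overlap; both readings are assumptions, not the paper's formal definitions, and overlap_ratio is the earlier sketch.

```python
from itertools import combinations

def frame_concentration_rate(n_condensed, n_original):
    """Assumed FR: condensed length over original length."""
    return n_condensed / n_original

def collision_overlap_rate(tubes, n_condensed):
    """Assumed OR: fraction of condensed frames in which at least two
    rearranged tubes' boxes overlap."""
    colliding = set()
    for a, b in combinations(tubes, 2):
        for t in set(a) & set(b):
            if overlap_ratio(a, b, t) > 0.0:
                colliding.add(t)
    return len(colliding) / n_condensed
```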
The foregoing describes in detail specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the claims without affecting the spirit of the invention.

Claims (3)

1. A video concentration method based on space-time rotation, comprising the steps of:
1) Performing target detection and tracking on the input video using the YOLOv4 and DeepSort algorithms, and extracting the target tubes;
2) Analyzing the pseudo collisions generated among targets during video concentration, defining a target duty-ratio threshold γ, and processing the pseudo-collision targets accordingly, wherein the target duty-ratio threshold is calculated from the width W, height H and frame rate f of the input video and the width w and height h of the target frame, by the following formula:
3) Proposing a dynamic time-domain translation method, applying dynamic time-domain translation to moving targets larger than the target duty-ratio threshold to avoid pseudo collisions, thereby ensuring the robustness of the space-time rotation algorithm on surveillance videos whose moving targets differ in size;
4) Performing space-time rotation on moving targets smaller than the target duty-ratio threshold, defining an angle critical threshold η, and dividing the space-time rotation method into adaptive space-time rotation and critical space-time rotation, where the angle critical threshold is calculated by the following formula:
in which cos θ_{i,j} is the cosine of the included angle between the direction vectors of target tubes T_i and T_j;
5) For pseudo-collision targets whose angle critical threshold is less than 0, adopting adaptive space-time rotation to improve fidelity to the original video: assume target tube T_j enters the surveillance zone at a known coordinate; rotating target tube T_j avoids the false collision, and the direction vector after rotation is specifically defined as:
where the pre-rotation direction vector of target j is given, r is the distance of the target from its entry into the surveillance zone to its current position, and the barycentric coordinates of target tubes T_i and T_j at time t are given together with the lower-right coordinates of the target frames of T_i and T_j at time t; the rotation angle of target tube T_j is calculated as follows:
6) For pseudo-collision targets whose angle critical threshold is greater than 0, adopting critical space-time rotation to improve the compression rate of the condensed video: target tube T_j is rotated about its starting point, taken as the circle center, away from the collision point until the two target tubes have no overlapping frames, and the target tubes are classified according to their motion-position relation within the monitored area, defined by the following formula:
where the first coordinate denotes the barycenter at which target tube T_i exits the monitored area, and the other two denote the barycenters at which target tube T_j exits and enters the surveillance zone, respectively; with the starting point at which T_j enters the monitored area as the circle center, when λ ≥ 0 the critical intersection of T_i and T_j lies at the endpoint of T_i, and T_i is then called the proximal segment; when λ < 0 the critical intersection lies at the endpoint of T_j, and T_j is then the proximal segment; when the target tubes are optimally rearranged, different target tubes serving as the proximal segment must be treated differently to avoid collisions;
When T_i is the proximal segment, T_j is rotated until it no longer collides with T_i, and the rotated direction vector is specifically defined as:
Knowing the rotated direction vector of T_j, the rotation angle can be calculated;
When T_j is the proximal segment, T_j is rotated until the endpoint of T_j intersects T_i, after which the two target tubes have no collision relationship; given the known conditions, and assuming the intersection of the rotated T_j with T_i has coordinates (μ, ζ), the rotated direction vector is specifically defined as:
wherein (μ, ζ) is calculated according to known conditions:
where the remaining coordinate denotes the barycenter at which target tube T_i enters the surveillance area.
2. The video concentration method based on space-time rotation as claimed in claim 1, wherein the preprocessing operation on the input video is: first, moving targets in the input video are detected using the YOLOv4 target detection algorithm; then, the DeepSort target tracking algorithm is adopted to obtain the target motion trajectories and form the target tubes; finally, the collision relations among target tubes are analyzed and handled respectively according to the collision mode.
3. The video concentration method based on space-time rotation as claimed in claim 1, wherein for pseudo-collision targets whose target-frame area is larger than γ, dynamic time-domain translation is adopted to delay the start-time label of the target tube: assuming the moving speed of the target is v, and according to the intersection-over-union ratio of the target collision, the start-time label of the target tube is modified as follows:
CN202310626770.6A 2023-05-30 2023-05-30 Video concentration method based on space-time rotation Active CN116647690B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310626770.6A CN116647690B (en) 2023-05-30 2023-05-30 Video concentration method based on space-time rotation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310626770.6A CN116647690B (en) 2023-05-30 2023-05-30 Video concentration method based on space-time rotation

Publications (2)

Publication Number Publication Date
CN116647690A CN116647690A (en) 2023-08-25
CN116647690B 2024-03-01

Family

ID=87615081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310626770.6A Active CN116647690B (en) 2023-05-30 2023-05-30 Video concentration method based on space-time rotation

Country Status (1)

Country Link
CN (1) CN116647690B (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI511058B (en) * 2014-01-24 2015-12-01 Univ Nat Taiwan Science Tech A system and a method for condensing a video
US20160080835A1 (en) * 2014-02-24 2016-03-17 Lyve Minds, Inc. Synopsis video creation based on video metadata

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109511019A (en) * 2017-09-14 2019-03-22 中兴通讯股份有限公司 A kind of video summarization method, terminal and computer readable storage medium
CN111709972A (en) * 2020-06-11 2020-09-25 石家庄铁道大学 Space constraint-based method for quickly concentrating wide-area monitoring video
CN114374885A (en) * 2021-12-31 2022-04-19 北京百度网讯科技有限公司 Video key segment determination method and device, electronic equipment and readable storage medium
CN114926495A (en) * 2022-05-17 2022-08-19 中南大学 Data processing method, trajectory visualization method and analysis method of traffic video stream
CN116074642A (en) * 2023-03-28 2023-05-05 石家庄铁道大学 Monitoring video concentration method based on multi-target processing unit

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Unsupervised video summarization using deep non-local video summarization networks; Sha-Sha Zang, Hui Yu, Yan Song, Ru Zeng; Neurocomputing; full text *
Sketch-interaction-based video summarization method and cognitive analysis; 马翠霞, 刘永进, 付秋芳, 刘烨, 傅小兰, 戴国忠, 王宏安; 中国科学: 信息科学 (08); full text *

Also Published As

Publication number Publication date
CN116647690A (en) 2023-08-25

Similar Documents

Publication Publication Date Title
Chen et al. Anomaly detection in surveillance video based on bidirectional prediction
Feng et al. Online content-aware video condensation
Xu et al. Multiple human detection and tracking based on head detection for real-time video surveillance
Lamas et al. Human pose estimation for mitigating false negatives in weapon detection in video-surveillance
Lin et al. Social mil: Interaction-aware for crowd anomaly detection
Yang et al. Masked relation learning for deepfake detection
US20160029031A1 (en) Method for compressing a video and a system thereof
Yu et al. Remotenet: Efficient relevant motion event detection for large-scale home surveillance videos
CN107122751A (en) A kind of face tracking and facial image catching method alignd based on face
Zhao et al. Adversarial deep tracking
Ni et al. An improved ssd-like deep network-based object detection method for indoor scenes
Jiang et al. An efficient attention module for 3d convolutional neural networks in action recognition
Jiang et al. An Approach for Crowd Density and Crowd Size Estimation.
Zhang et al. Exploiting Offset-guided Network for Pose Estimation and Tracking.
Yuan et al. Structural target-aware model for thermal infrared tracking
CN112884808A (en) Video concentrator set partitioning method for reserving target real interaction behavior
Tao et al. An adaptive frame selection network with enhanced dilated convolution for video smoke recognition
Yu et al. The multi-level classification and regression network for visual tracking via residual channel attention
Li et al. Human-related anomalous event detection via memory-augmented Wasserstein generative adversarial network with gradient penalty
CN116647690B (en) Video concentration method based on space-time rotation
Fan et al. the application of artificial intelligence in distribution network engineering field
CN108921147B (en) Black smoke vehicle identification method based on dynamic texture and transform domain space-time characteristics
Zhang et al. Learning target-aware background-suppressed correlation filters with dual regression for real-time UAV tracking
CN117011342A (en) Attention-enhanced space-time transducer vision single-target tracking method
Wu et al. Dss-net: Dynamic self-supervised network for video anomaly detection

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant