CN111640137A - Monitoring video key frame evaluation method - Google Patents


Info

Publication number
CN111640137A
CN111640137A
Authority
CN
China
Prior art keywords
target
motion
frame
video
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010481238.6A
Other languages
Chinese (zh)
Inventor
张云佐
杨攀亮
李怡
宋洲臣
张莎莎
郑丽娟
佟宽章
朴春慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shijiazhuang Tiedao University
Original Assignee
Shijiazhuang Tiedao University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shijiazhuang Tiedao University filed Critical Shijiazhuang Tiedao University
Priority to CN202010481238.6A priority Critical patent/CN111640137A/en
Publication of CN111640137A publication Critical patent/CN111640137A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
        • G06T 7/00 — Image analysis
        • G06T 7/20 — Analysis of motion
        • G06T 7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
        • G06T 3/00 — Geometric image transformation in the plane of the image
        • G06T 3/40 — Scaling the whole image or part thereof
        • G06T 3/4007 — Interpolation-based scaling, e.g. bilinear interpolation
        • G06T 2207/00 — Indexing scheme for image analysis or image enhancement
        • G06T 2207/10 — Image acquisition modality
        • G06T 2207/10016 — Video; Image sequence
        • G06T 2207/30 — Subject of image; Context of image processing
        • G06T 2207/30241 — Trajectory
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
        • G06V 20/00 — Scenes; Scene-specific elements
        • G06V 20/40 — Scenes; Scene-specific elements in video content
        • G06V 20/46 — Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

The invention discloses a monitoring video key frame evaluation method, relating to the technical field of video monitoring image processing. The method comprises the following steps: extracting the motion trajectory of the target in the video sequence by means of moving target detection and tracking, forming the target's original motion trajectory set; processing the video sequence to extract its key frames, and reconstructing the target's motion trajectory set from the trajectory at the extracted key frames using a linear interpolation algorithm; and comparing the original motion trajectory set with the reconstructed one: the higher the degree of coincidence, the stronger the ability of the key frame extraction method to capture changes in the target's motion state, and the lower the coincidence, the weaker that ability. The method can effectively evaluate monitoring video key frames in terms of capturing global and local changes of the target's motion state.

Description

Monitoring video key frame evaluation method
Technical Field
The invention relates to the technical field of monitoring video image processing methods, in particular to a monitoring video key frame evaluation method.
Background
At present, evaluation methods for key frames are not unified, and judgments of key frame extraction quality vary from person to person. Subjective evaluation, however, cannot be used for automatic analysis of video. Meanwhile, the existing widely used key frame evaluation criteria are not suitable for evaluating surveillance video key frames. The prior art includes at least the following two evaluation methods:
The fidelity criterion is a metric based on the semi-Hausdorff distance that evaluates the fidelity of a key frame set. Fidelity is computed as the maximum of the minimum distances between the key frame set and the frames of the video shot, also known as the semi-Hausdorff distance.
Assuming that S is a video sequence containing N frames, S can be expressed as:
S={F(t+nΔt),n=1,2,...,N} (1)
key frame set KF selected from S:
KF={F(t+n_kΔt),k=1,2,...,K} (2)
Let Diff(·,·) denote the distance function between two images; the distance between the key frame set KF and a frame F(t+nΔt) of the video sequence S can then be calculated as:
d(F(t+nΔt),KF)=min_{1≤k≤K} Diff(F(t+nΔt),F(t+n_kΔt)) (3)
the semi-Hausdorff distance between the set of keyframe frames KF and the video sequence S can then be defined as:
Figure BDA0002517470500000012
the fidelity of the key frame set and the original video sequence can be calculated through the four formulas, and the fidelity criterion is widely applied.
The SRD criterion studies the key frame extraction problem from the perspective of shot reconstruction, and defines a shot reconstruction degree based on the motion dynamics of the video shot. Compared with the widely used fidelity criterion, SRD focuses more on capturing the detailed dynamics of a shot, as shown in fig. 1.
The rectangular points and circular points shown in fig. 1 have the same fidelity; equivalently, their distances from the curve are all 0. However, their generalization abilities are clearly quite different: the circular points do not capture the evolution trend of the curve, while the rectangular points do. In other words, the generalization ability of the circular points is inferior to that of the rectangular points. The SRD criterion therefore focuses more on the local details and evolution trend of video shots. Its basic idea is that if the shot reconstructed by interpolating the key frames closely approximates the original shot, the extracted key frame set captures the detailed dynamics of the shot well; in other words, the motion dynamics of the shot are well preserved.
Given some frame interpolation algorithm (denoted FIP), the optimal key frame set under the SRD criterion is selected as follows.
Assuming θ is a certain key frame extraction algorithm, for a video sequence of N frames, it can be expressed as:
S={F(t+nΔt),n=1,2,...,N}, (5)
the key frame set KF extracted by the key frame extraction algorithm θ can be represented as:
KF={F(t+n_k(θ)Δt),k=1,2,...,R} (6)
Suppose n_k(θ)≤n<n_{k+1}(θ). Based on F(t+n_k(θ)Δt) and F(t+n_{k+1}(θ)Δt), the video frame F(t+nΔt) is reconstructed by equation (7), yielding F*(t+nΔt,θ):
F*(t+nΔt,θ)=FIP(F(t+n_k(θ)Δt),F(t+n_{k+1}(θ)Δt),n_k(θ),n,n_{k+1}(θ)) (7)
Under the SRD-oriented criterion, the key frame set can be optimized by formula (8):
θ*=argmax_θ SRD(S,KF(θ)) (8)
in which the shot reconstruction degree is defined as:
SRD(S,KF(θ))=Σ_{n=1}^{N} Sim(F(t+nΔt),F*(t+nΔt,θ)) (9)
where Sim(·,·) can be calculated by the PSNR formula.
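A sketch of the SRD computation under formulas (7)–(9), assuming the FIP is plain linear blending of the two bracketing key frames; capping PSNR at 100 dB for identical frames is also an assumption (PSNR is infinite when the MSE is 0):

```python
import bisect
import numpy as np

def fip(fa, fb, na, n, nb):
    """Equation (7): linear frame interpolation between the key frames
    at indices na and nb (assumed form of FIP)."""
    w = (n - na) / (nb - na)
    return (1 - w) * fa + w * fb

def psnr(a, b, peak=255.0, cap=100.0):
    """Sim(.,.) of equation (9); identical frames return `cap` dB
    instead of infinity."""
    mse = float(np.mean((a - b) ** 2))
    return cap if mse == 0 else min(cap, 10 * np.log10(peak ** 2 / mse))

def srd(frames, key_idx):
    """Shot reconstruction degree: sum of Sim between each original
    frame and its reconstruction from the bracketing key frames."""
    key_idx = sorted(key_idx)
    total = 0.0
    for n in range(len(frames)):
        j = max(bisect.bisect_right(key_idx, n) - 1, 0)  # n_k <= n
        na = key_idx[j]
        nb = key_idx[min(j + 1, len(key_idx) - 1)]       # n_{k+1}
        rec = frames[na] if na == nb else fip(frames[na], frames[nb], na, n, nb)
        total += psnr(frames[n], rec)
    return total
```

Among candidate key frame sets of the same size, the one with the larger SRD reconstructs the shot more faithfully.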
Analyzing the principles and calculation processes of the two methods shows that the fidelity criterion ignores the local details and evolution trend of the video, while the SRD criterion, from the perspective of shot reconstruction, reconstructs video frames from the key frames and then calculates the PSNR between the reconstructed and original video. However, in surveillance video the similarity between adjacent frames is high, so the SRD criterion may miss changes of motion state that occur during the motion of the object.
Analyzing and summarizing these objective evaluation methods, it can be found that existing video key frame evaluation methods cannot adequately judge whether the extracted key frames capture the changes of the target's motion state in surveillance video.
Disclosure of Invention
The technical problem the invention aims to solve is how to provide a method that effectively evaluates monitoring video key frames in terms of capturing global and local changes of the target's motion state.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows: a monitoring video key frame evaluation method is characterized by comprising the following steps:
extracting the motion trail of the target in the video sequence by utilizing the detection and tracking of the moving target to form an original motion trail set of the target;
processing the video sequence, extracting key frames in the video sequence, and reconstructing a motion trail set of a target by utilizing a linear interpolation algorithm according to a target motion trail of the extracted key frames;
and comparing the original motion track set with the reconstructed motion track set: the higher the degree of coincidence, the stronger the ability of the key frame extraction method to capture changes in the target's motion state, and the lower the coincidence, the weaker that ability.
The further technical scheme is that the original motion track set of the target is obtained by the following method:
and calculating the pixel position of the central point of the moving target in the video sequence, and then sequentially connecting according to the sequence of the video sequence to form a target motion track curve.
The further technical scheme is that the original motion trail set is as follows:
firstly, carrying out moving target detection and target tracking on an input N-frame monitoring video sequence to extract the moving track of a target, wherein the moving track set is expressed as follows:
T={P(t+nΔt),n=1,2,...,N}
in the formula, T represents the motion trajectory set of the N-frame surveillance video target, P(t+nΔt) represents the motion trajectory of the nth frame of the video sequence, t represents time, and Δt represents the time interval of each frame.
The further technical scheme is that the method for reconstructing the target motion track set comprises the following steps:
the monitored video sequence is processed by a key frame extraction algorithm α, and the motion track set of the extracted key frames can be expressed as follows:
βT={P(t+n_β(α)Δt),β=1,2,...,R}
in the formula, βT represents the motion trajectory set of the extracted key frames, α represents a certain key frame extraction algorithm, t represents time, and Δt represents the time interval of each frame. Assuming n_β(α)≤n<n_{β+1}(α), then between P(t+n_β(α)Δt) and P(t+n_{β+1}(α)Δt) the motion track of P(t+nΔt) can be obtained by a linear interpolation algorithm, represented as:
P*(t+nΔt,α)=PLI(P(t+n_β(α)Δt),P(t+n_{β+1}(α)Δt))
in the formula, PLI represents a point-wise linear interpolation algorithm, and P*(t+nΔt,α) represents the motion track of the nth frame obtained by linear interpolation.
The beneficial effects produced by the above technical scheme are as follows: the method first detects and tracks the moving target, extracts the target's motion trajectory in the video, and forms the original motion trajectory set. Then, from the target motion trajectory at the extracted key frames, it reconstructs the target's motion trajectory set using a linear interpolation algorithm. Finally, it compares the original and reconstructed motion trajectory sets: the higher the degree of coincidence, the stronger the ability of the key frame extraction method to capture changes in the target's motion state. Experimental results show that the method can effectively evaluate monitoring video key frames in terms of capturing global and local changes of the target's motion state.
Drawings
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is an exemplary diagram of the invalidation of fidelity criteria in the prior art;
FIG. 2 is a diagram of a motion trajectory of a video object in an embodiment of the present invention;
FIG. 3 is a flow chart of a method according to an embodiment of the invention;
FIG. 4 is a diagram of the result of reconstructing motion trajectories (extracting 8 key frames) of three methods according to the embodiment of the present invention;
FIG. 5 is a diagram of the result of reconstructing motion trajectories (extracting 12 key frames) of three methods according to the embodiment of the present invention;
fig. 6 is a diagram of the result of reconstructing a motion trajectory (extracting 16 key frames) by three methods in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
The embodiment of the invention discloses a monitoring video key frame evaluation method, which is explained in detail by combining specific contents as follows:
starting from the Motion track of the target in the surveillance video, the present application provides a Motion Track Reconstruction (MTR), which is a key frame evaluation mechanism of the surveillance video.
The motion track of the target:
the target motion track is used as a descriptor which can provide key relevant characteristics for describing motion changes in the video, and can provide a concise and information-rich clue for researching the target motion characteristics. The target motion trajectory refers to the spatial position of the target when the target passes through the monitoring scene. The method can provide important analysis and basic data for researching the change of motion states of target acceleration, deceleration, speed, stretching and the like. For surveillance video, people are more concerned about moving objects and are attracted by changes in object motion. The movement track of the target not only can reflect the position information of the target, but also can reflect the change of the movement process of the target. The motion trajectory of an example video object is shown in fig. 2.
As can be seen from fig. 2, the curve is the target motion trajectory formed by the pixel positions of the moving target's center point in each frame of the video. When the target's direction of motion changes, its speed changes, or it makes local actions (stretching out a hand, squatting, etc.), the curve changes accordingly. People are easily attracted by such changes in the target's motion, so the trajectory curve can reflect changes of the target's local and global motion states and largely matches human visual perception. Based on this analysis, and starting from the observation that the target motion trajectory reflects changes of the target's motion state, the application proposes a key frame evaluation mechanism based on reconstructing the target motion trajectory. The mechanism evaluates a video key frame extraction method by its ability to capture changes of the target's motion state: the closer the trajectory reconstructed from the extracted key frames is to the original trajectory, the stronger the extracted key frames' ability to capture all changes of the target's motion state, the more motion-state-change information they contain, and the better they match human visual perception.
Motion trajectory reconstruction
MTR denotes the degree of coincidence between the target motion trajectory reconstructed from the extracted key frames and the trajectory in the original video. The basic idea of motion trajectory reconstruction is to treat key frame extraction as a trajectory reconstruction problem: the motion trajectory of the moving target in the surveillance video sequence should be restored as faithfully as possible, where the trajectory covers not only the target's overall motion but also its local motion. The motion trajectory reconstruction evaluation mechanism is shown in fig. 3.
As shown in fig. 3, the method first extracts the motion trajectory of the target in the video using moving target detection and tracking, forming the original motion trajectory set. Then, from the target motion trajectory at the extracted key frames, it reconstructs the target's motion trajectory set using a linear interpolation algorithm. Finally, it compares the original and reconstructed motion trajectory sets: the higher the degree of coincidence, the stronger the ability of the key frame extraction method to capture changes in the target's motion state.
Observing fig. 3: under the MTR evaluation mechanism, the motion trajectory of the target is first extracted by performing moving target detection and target tracking on the input N-frame surveillance video sequence; the motion trajectory set can be represented as:
T={P(t+nΔt),n=1,2,...,N} (10)
In formula (10), T represents the motion trajectory set of the N-frame surveillance video target, P(t+nΔt) represents the motion trajectory of the nth frame of the video sequence, t represents time, and Δt represents the time interval of each frame.
The monitored video sequence is processed by a key frame extraction algorithm α, and the motion trajectory set of the extracted key frames can be expressed as follows:
βT={P(t+n_β(α)Δt),β=1,2,...,R} (11)
In equation (11), βT represents the motion trajectory set of the extracted key frames and α represents a certain key frame extraction algorithm. Assuming n_β(α)≤n<n_{β+1}(α), then between P(t+n_β(α)Δt) and P(t+n_{β+1}(α)Δt) the motion trajectory of P(t+nΔt) can be obtained by a linear interpolation algorithm, represented as:
P*(t+nΔt,α)=PLI(P(t+n_β(α)Δt),P(t+n_{β+1}(α)Δt)) (12)
In the formula, PLI represents a point-wise linear interpolation algorithm, and P*(t+nΔt,α) represents the motion trajectory of the nth frame obtained by the linear interpolation algorithm.
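The MTR loop of formulas (10)–(12) can be sketched as follows. PLI is realized here with per-coordinate linear interpolation (`np.interp`); the mean Euclidean deviation used to score coincidence is an assumed concrete metric, since the text speaks only of a "degree of coincidence":

```python
import numpy as np

def reconstruct_trajectory(traj, key_idx):
    """Formula (12): piecewise-linear interpolation (PLI) of the
    trajectory through the key-frame points."""
    traj = np.asarray(traj, dtype=float)
    key_idx = sorted(key_idx)
    xs = np.arange(len(traj))
    rec = np.empty_like(traj)
    for d in range(traj.shape[1]):  # interpolate each coordinate separately
        rec[:, d] = np.interp(xs, key_idx, traj[key_idx, d])
    return rec

def mean_deviation(original, reconstructed):
    """Mean Euclidean distance between original and reconstructed
    trajectories: lower deviation = higher coincidence = stronger
    ability to capture motion-state changes (assumed metric)."""
    a = np.asarray(original, dtype=float)
    b = np.asarray(reconstructed, dtype=float)
    return float(np.mean(np.linalg.norm(a - b, axis=1)))

# A target that jinks upward mid-sequence; key frames only at the ends.
traj = [(0, 0), (1, 2), (2, 0)]
rec = reconstruct_trajectory(traj, key_idx=[0, 2])
print(mean_deviation(traj, rec))  # ≈ 0.667 — the middle detour is lost
```

A key frame set that includes the frame where the motion state changes (index 1 here) drives the deviation to zero, matching the mechanism's intent.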
In conclusion, the new motion trajectory reconstruction evaluation mechanism outperforms the SRD criterion in terms of human visual perception and can assess whether the video key frames extracted by a key frame extraction method satisfy the main human visual perception. Unlike the SRD criterion, the motion trajectory reconstruction mechanism evaluates, from the viewpoint of capturing changes of the target's motion state, the ability of a key frame extraction method to capture those changes in the video.
Experimental results and analysis:
to verify the performance of the motion trajectory reconstruction mechanism described herein. And evaluating the key frame extraction result of Video5 in the table 1 by using a motion track reconstruction evaluation mechanism through three methods of ME, monitoring Video key frame extraction (marked as MV) based on motion speed and monitoring Video key frame extraction (marked as MSE) based on frequency domain analysis.
The completion environment of the verification experiment is the same as the operation environment for verifying the key frame extraction method. The experimental parameters of the MV and MSE key frame extraction method used at this time are set to be N equal to 1, and then a specified number of key frames are extracted.
When 8 key frames are extracted, the motion trajectory reconstruction results of the three methods are shown in fig. 4.
Observing the target motion trajectory reconstruction results in fig. 4, the reconstructions of all three methods differ considerably from the original target motion process, but compared with the trajectory reconstructed from the ME key frames, those of the MV and MSE methods are closer to the original target motion trajectory. The MV and MSE key frame extraction methods have a stronger ability to capture the target's motion state and preferentially extract video frames in which the target's motion state changes, thus better matching human visual perception. This shows that the method of the present application can evaluate a key frame extraction method's ability to capture changes in the target's motion state.
When 12 key frames are extracted, the results of reconstructing the target motion trajectory with the key frames of the different methods are shown in fig. 5. Observing fig. 5, with 12 key frames the coincidence between the reconstructed and original target motion trajectories is higher than with 8 key frames. Compared with the trajectory reconstructed from the ME key frames, the trajectories reconstructed by the MV and MSE methods coincide with the original target motion trajectory to a higher degree.
When 16 key frames are extracted, the reconstruction results of the three methods are shown in fig. 6. As can be seen from fig. 6, compared with ME, the trajectories reconstructed from the MV and MSE key frames coincide with the original target motion trajectory to a higher degree, with the MSE reconstruction being the closest to the original. The trajectory reconstructed from the MV key frames is also closer to the original than that of the ME key frames; in particular, the earlier portion of the reconstructed trajectory is essentially identical to the original.
From the above reconstruction results, it can be seen that as the number of extracted key frames increases, the reconstructed target motion trajectory approaches the original trajectory. It can be concluded that the MV and MSE key frame extraction results reconstruct the target motion trajectory better than the ME key frames do. The reason is that MV and MSE were proposed precisely to better capture changes of the target's motion state, whereas ME is a key frame extraction method designed for the SRD criterion. This demonstrates, from the converse direction, that the method described in the present application can evaluate extracted surveillance video key frames in terms of capturing global and local changes of the target's motion state.

Claims (4)

1. A monitoring video key frame evaluation method is characterized by comprising the following steps:
extracting the motion trail of the target in the video sequence by utilizing the detection and tracking of the moving target to form an original motion trail set of the target;
processing the video sequence, extracting key frames in the video sequence, and reconstructing a motion trail set of a target by utilizing a linear interpolation algorithm according to a target motion trail of the extracted key frames;
and comparing the original motion track set with the reconstructed motion track set: the higher the degree of coincidence, the stronger the ability of the key frame extraction method to capture changes in the target's motion state, and the lower the coincidence, the weaker that ability.
2. The surveillance video key-frame evaluation method of claim 1, characterized by: the original motion track set of the target is obtained by the following method:
and calculating the pixel position of the central point of the moving target in the video sequence, and then sequentially connecting according to the sequence of the video sequence to form a target motion track curve.
3. The surveillance video key-frame evaluation method of claim 1, characterized by: the original motion trajectory set is as follows:
firstly, carrying out moving target detection and target tracking on an input N-frame monitoring video sequence to extract the moving track of a target, wherein the moving track set is expressed as follows:
T={P(t+nΔt),n=1,2,...,N}
in the formula, T represents the motion trajectory set of the N-frame surveillance video target, P(t+nΔt) represents the motion trajectory of the nth frame of the video sequence, t represents time, and Δt represents the time interval of each frame.
4. The surveillance video key-frame evaluation method of claim 1, characterized by: the method for reconstructing the target motion track set comprises the following steps:
the monitored video sequence is processed by a key frame extraction algorithm α, and the motion track set of the extracted key frames can be expressed as follows:
βT={P(t+n_β(α)Δt),β=1,2,...,R}
in the formula, βT represents the motion trajectory set of the extracted key frames, α represents a certain key frame extraction algorithm, t represents time, and Δt represents the time interval of each frame. Assuming n_β(α)≤n<n_{β+1}(α), then between P(t+n_β(α)Δt) and P(t+n_{β+1}(α)Δt) the motion track of P(t+nΔt) can be obtained by a linear interpolation algorithm, represented as:
P*(t+nΔt,α)=PLI(P(t+n_β(α)Δt),P(t+n_{β+1}(α)Δt))
in the formula, PLI represents a point-wise linear interpolation algorithm, and P*(t+nΔt,α) represents the motion trajectory of the nth frame obtained by the linear interpolation algorithm.
CN202010481238.6A 2020-05-31 2020-05-31 Monitoring video key frame evaluation method Pending CN111640137A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010481238.6A CN111640137A (en) 2020-05-31 2020-05-31 Monitoring video key frame evaluation method


Publications (1)

Publication Number Publication Date
CN111640137A true CN111640137A (en) 2020-09-08

Family

ID=72332265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010481238.6A Pending CN111640137A (en) 2020-05-31 2020-05-31 Monitoring video key frame evaluation method

Country Status (1)

Country Link
CN (1) CN111640137A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130266218A1 (en) * 2012-04-06 2013-10-10 Adobe Systems Incorporated Detecting and Tracking Point Features with Primary Colors
CN104156986A (en) * 2014-08-28 2014-11-19 大连大学 Motion capture data key frame extracting method based on local linear imbedding
CN104200489A (en) * 2014-08-11 2014-12-10 大连大学 Motion capture data key frame extraction method based on multi-population genetic algorithm
CN105931270A (en) * 2016-04-27 2016-09-07 石家庄铁道大学 Video keyframe extraction method based on movement trajectory analysis
CN111192286A (en) * 2018-11-14 2020-05-22 西安中兴新软件有限责任公司 Image synthesis method, electronic device and storage medium


Non-Patent Citations (2)

Title
刘贤梅 et al., "Human Motion Data Reconstruction Based on Quaternion Spline Interpolation", Computer Engineering and Applications *
李顺意 et al., "Motion Key Frame Extraction Based on Inter-Frame Distance", Computer Engineering *

Similar Documents

Publication Publication Date Title
US6643387B1 (en) Apparatus and method for context-based indexing and retrieval of image sequences
Sujatha et al. A study on keyframe extraction methods for video summary
Chen et al. Unsupervised curriculum domain adaptation for no-reference video quality assessment
Okade et al. Robust learning-based camera motion characterization scheme with applications to video stabilization
Xu et al. Find who to look at: Turning from action to saliency
CN114519880A (en) Active speaker identification method based on cross-modal self-supervision learning
Shih et al. MSN: statistical understanding of broadcasted baseball video using multi-level semantic network
Li et al. One-class double compression detection of advanced videos based on simple Gaussian distribution model
Renò et al. Real-time tracking of a tennis ball by combining 3d data and domain knowledge
Cao et al. Video shot motion characterization based on hierarchical overlapped growing neural gas networks
CN111640137A (en) Monitoring video key frame evaluation method
Smeaton et al. An evaluation of alternative techniques for automatic detection of shot boundaries in digital video
Bloom et al. Player tracking and stroke recognition in tennis video
Dorai et al. Dynamic behavior analysis in compressed fingerprint videos
Liu et al. Effective feature extraction for play detection in american football video
CN111079567A (en) Sampling method, model generation method, video behavior identification method and device
Jaiswal et al. Video forensics in temporal domain using machine learning techniques
Gao et al. A video shot boundary detection algorithm based on feature tracking
Thinh et al. A video-based tracking system for football player analysis using efficient convolution operators
Zhang et al. Research on the key-frame extraction of surveillance video based on motion velocity
Sao et al. Video Shot Boundary Detection Based On Nodal Analysis of Graph Theoritic Approach
CN111639600A (en) Video key frame extraction method based on center offset
Tardini et al. Shot detection and motion analysis for automatic mpeg-7 annotation of sports videos
Sun et al. Spatio-temporal Prompting Network for Robust Video Feature Extraction
Liu et al. Object tracking using spatio-temporal future prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200908