CN110602444B - Video summarization method based on Weber-Fisher's law and time domain masking effect - Google Patents


Info

Publication number
CN110602444B
Authority
CN
China
Legal status
Active
Application number
CN201910723748.7A
Other languages
Chinese (zh)
Other versions
CN110602444A (en)
Inventor
刘颖
王玲
公衍超
王富平
薛刚
梁伟
卢津
王昊
李兴
Current Assignee
Xian University of Posts and Telecommunications
Original Assignee
Xian University of Posts and Telecommunications
Application filed by Xian University of Posts and Telecommunications
Priority to CN201910723748.7A
Publication of CN110602444A
Application granted
Publication of CN110602444B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8549 Creating video summaries, e.g. movie trailer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

A video summarization method based on the Weber-Fechner law and the time domain masking effect consists of Gaussian filtering, region partitioning, frame-difference Euclidean distance computation, Weber-Fechner model construction, threshold determination, denoising model construction, key frame extraction, and synthesis of the key frames into a video. Processing the video region block by block avoids missed detections when the target is far from the camera. Combining the Euclidean-distance frame difference method with the Weber-Fechner model copes effectively with complex surveillance environments and avoids repeatedly adjusting the threshold. A denoising model based on the time domain masking effect filters out interfering noise before the video is synthesized, improving the display quality of the synthesized video. The method does not rely on color information in the video frames, so it is also effective for night-time surveillance video.

Description

Video summarization method based on the Weber-Fechner law and the time domain masking effect
Technical Field
The invention belongs to the technical field of video analysis, and particularly relates to a video summarization method.
Background
With the widespread adoption of communication tools and surveillance equipment and the rapid growth of the film and television industry, the resulting massive volume of video data puts heavy pressure on storage and makes it hard to retrieve key video information quickly. Searching videos for key information manually is inefficient, and human sensory fatigue easily leads to missed and false detections. Video summarization is therefore important for browsing and using video data quickly and efficiently.
Current video summary generation methods fall mainly into three categories: key frame extraction, spatio-temporal transformation of moving objects, and highlight scene recognition. Key frame extraction methods include those based on motion analysis, on shot boundaries, on image content, and on the compressed video stream. The frame difference method is a commonly used motion-analysis method: it takes the pixel-wise temporal difference between two or three adjacent frames of an image sequence to extract moving targets. Threshold selection is critical in the frame difference method: a threshold that is too low fails to suppress noise, while one that is too high ignores fine changes in the image.
The prior art described above has the following disadvantages:
(1) Existing video summarization methods process the entire video frame globally, so moving targets that are far from the camera are easily missed.
(2) For moving target detection, existing frame difference methods are simple and mostly use a fixed threshold, which is ill-suited to complex real-world environments.
(3) For summary video synthesis, existing methods synthesize the video directly from the frames that exceed the threshold, without fully accounting for the effect of noise.
Disclosure of Invention
The technical problem to be solved by the present invention is to overcome the above disadvantages of the prior art by providing a video summarization method based on the Weber-Fechner law and the time domain masking effect that reduces noise interference with target detection, avoids missed detections, and can process videos of complex environments.
The technical solution adopted to solve this technical problem comprises the following steps:
(1) Gaussian filtering
Remove noise from frames 1 through N of the video by Gaussian filtering, where N is the total number of frames in the video and is a finite positive integer.
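The patent does not give the kernel size or standard deviation of the Gaussian filter. The following is a minimal, dependency-free sketch of separable Gaussian smoothing of one luminance frame, assuming σ = 1.0, a 5-tap kernel (radius 2) and clamped borders; the names `gaussian_kernel` and `gaussian_blur` are hypothetical, and in practice a library routine such as OpenCV's `cv2.GaussianBlur` would normally be used instead.

```python
import math


def gaussian_kernel(sigma: float, radius: int) -> list[float]:
    """Normalized 1-D Gaussian kernel of length 2*radius + 1 (sigma, radius assumed)."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_blur(img: list[list[float]], sigma: float = 1.0, radius: int = 2) -> list[list[float]]:
    """Separable Gaussian blur of a 2-D luminance image; out-of-range pixels are clamped."""
    k = gaussian_kernel(sigma, radius)
    h, w = len(img), len(img[0])
    # Horizontal pass: convolve each row with the kernel.
    tmp = [[sum(k[d + radius] * img[y][min(max(x + d, 0), w - 1)]
                for d in range(-radius, radius + 1))
            for x in range(w)] for y in range(h)]
    # Vertical pass: convolve each column of the intermediate result.
    return [[sum(k[d + radius] * tmp[min(max(y + d, 0), h - 1)][x]
                 for d in range(-radius, radius + 1))
             for x in range(w)] for y in range(h)]
```

Because the kernel is normalized, a flat region keeps its brightness while isolated pixel noise is spread out and attenuated.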
(2) Region partitioning
Divide frames 1 through N of the video into non-overlapping square pixel blocks, ordered from left to right and from top to bottom, each with a side length of m pixels. Round the number of blocks in the width and height directions of the frame to integers, and rescale frames 1 through N according to the following formulas:
w=m×s (1)
h=m×t (2)
where w is the frame width, h is the frame height, s is the number of blocks in the horizontal direction (a positive integer), t is the number of blocks in the vertical direction (a positive integer), and m ∈ {16, ..., 64}.
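A sketch of how s, t and the rescaled frame size of equations (1) and (2) might be computed. It assumes that "rounding" means rounding W/m and H/m (W, H being the original frame size) to the nearest integer with halves rounded up; the helper name `block_grid` is hypothetical.

```python
import math


def block_grid(width: int, height: int, m: int = 32) -> tuple[int, int, int, int]:
    """Return (s, t, w, h): blocks per row, blocks per column, rescaled width and height.

    Assumption: W/m and H/m are rounded to the nearest integer (halves up),
    with at least one block in each direction.
    """
    if not 16 <= m <= 64:
        raise ValueError("the method takes m in {16, ..., 64}")
    s = max(1, math.floor(width / m + 0.5))   # blocks in the horizontal direction
    t = max(1, math.floor(height / m + 0.5))  # blocks in the vertical direction
    return s, t, m * s, m * t                 # w = m*s (1), h = m*t (2)
```

For a 1920×1080 frame with m = 32 this gives s = 60 and t = 34, so the frame would be rescaled to 1920×1088 (for example with OpenCV's `cv2.resize(frame, (w, h))`) before partitioning.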
(3) Determining frame difference Euclidean distance
Determine the frame-difference Euclidean distance D(k, i, j) of the j-th pixel of the i-th block of the k-th frame of the video according to formula (3):
[Formula (3): equation image not reproduced in this copy]
where k ∈ {1, ..., N-2}, i ∈ {1, ..., s×t}, and j ∈ {1, ..., m²}. The index i runs over the blocks of a frame from left to right and from top to bottom, so i = 1 for the top-left block and i = s×t for the bottom-right block; j runs over the pixels of a block in the same order, so j = 1 for the top-left pixel and j = m² for the bottom-right pixel. x(k, i, j) is the luminance component value of the j-th pixel of the i-th block of the k-th frame of the video.
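The block and pixel numbering above is a raster (row-major) order at both levels. The hypothetical helper below converts a 0-based pixel position (row, col) in the rescaled frame into the 1-based indices (i, j) used in formula (3).

```python
def block_pixel_index(row: int, col: int, s: int, m: int) -> tuple[int, int]:
    """Map a 0-based pixel (row, col) of the rescaled frame to 1-based (i, j).

    Blocks are numbered left to right, top to bottom (i = 1 .. s*t);
    pixels inside a block are numbered the same way (j = 1 .. m*m).
    """
    block_row, in_row = divmod(row, m)   # which block row, and row inside the block
    block_col, in_col = divmod(col, m)   # which block column, and column inside the block
    i = block_row * s + block_col + 1
    j = in_row * m + in_col + 1
    return i, j
```

For m = 32 and s = 60, pixel (0, 32) lies in block i = 2 at j = 1, and the bottom-right pixel of a 1920×1088 frame maps to i = 2040 = s×t and j = 1024 = m².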
The frame-difference Euclidean distance D(k, i) of the i-th block of the k-th frame of the video is determined according to formula (4):
[Formula (4): equation image not reproduced in this copy]
(4) Constructing the Weber-Fechner model
The maximum frame-difference Euclidean distance p_k over the blocks of the k-th frame of the video is determined according to equation (5):
$p_k = \max\{D(k,1), D(k,2), \ldots, D(k, s\times t)\}$ (5)
The maximum block frame-difference Euclidean distance α over frames 1 through N-2 of the video is determined according to equation (6):
$\alpha = \max\{p_1, p_2, \ldots, p_{N-2}\}$ (6)
where the minimum value of α is 500.
The Weber-Fechner model β is constructed as follows:
$\beta = a\lg\alpha - b$ (7)
where a ∈ [3, 4] and b ∈ [5, 7].
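Equations (5) to (7) reduce the block distances to one value per frame and then to a single logarithmic factor. A minimal sketch, assuming the block distances D(k, i) of formula (4) (not reproduced here) are already available as one list per frame, and reading "the minimum value of α is 500" as clamping α to at least 500; `weber_fechner_beta` is a hypothetical name, and the defaults are the preferred values a = 3.5, b = 6.

```python
import math


def weber_fechner_beta(D: list[list[float]], a: float = 3.5, b: float = 6.0,
                       alpha_min: float = 500.0) -> tuple[list[float], float, float]:
    """Equations (5)-(7) for block distances D[k-1][i-1] = D(k, i), k = 1..N-2.

    Returns (p, alpha, beta), where p[k-1] = p_k.
    """
    p = [max(row) for row in D]          # (5) p_k = max over blocks i of D(k, i)
    alpha = max(max(p), alpha_min)       # (6) alpha = max_k p_k; assumed clamp to >= 500
    beta = a * math.log10(alpha) - b     # (7) beta = a*lg(alpha) - b
    return p, alpha, beta
```

If every p_k is below 500, α is clamped to 500 and β = 3.5·lg 500 − 6 ≈ 3.45.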
(5) Determining the threshold
The mean u of the per-frame block maxima p_k over the first n frames of the video is determined according to formula (8):
$u = \frac{1}{n}\sum_{k=1}^{n} p_k$ (8)
where n ∈ {15, ..., 50}.
The threshold value T is determined by equation (9):
T=β×u (9)
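Equations (8) and (9) scale the Weber-Fechner factor by the average activity at the start of the video. A sketch, with the hypothetical name `adaptive_threshold`, taking the per-frame maxima p_k from the previous step and defaulting to the preferred n = 30:

```python
def adaptive_threshold(p: list[float], beta: float, n: int = 30) -> tuple[float, float]:
    """Equations (8)-(9): u is the mean of p_1..p_n and T = beta * u."""
    if not 15 <= n <= 50:
        raise ValueError("n must lie in {15, ..., 50}")
    if len(p) < n:
        raise ValueError("need p_k for at least the first n frames")
    u = sum(p[:n]) / n   # (8) mean of the first n per-frame maxima
    return u, beta * u   # (9) T = beta * u
```

Because T is tied to the video's own early frame-difference level, a noisy video and a clean one end up with different thresholds without manual tuning.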
(6) Constructing the denoising model
The absolute difference r between α and u is determined according to equation (10):
r=|α-u| (10)
where the minimum value of r is 26.
The denoising model f is constructed as follows:
$f = \mathrm{round}(c\lg r - d)$ (11)
where round() rounds to the nearest integer, c ∈ [0.5, 0.64], and d ∈ [0, 0.2].
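A sketch of equations (10) and (11), assuming "the minimum value of r is 26" means r is clamped to at least 26 and that round() rounds halves up (Python's built-in `round()` rounds half to even, so it is not used here). The name `denoise_length` is hypothetical; the defaults are the preferred c = 0.58, d = 0.1.

```python
import math


def denoise_length(alpha: float, u: float, c: float = 0.58, d: float = 0.1,
                   r_min: float = 26.0) -> int:
    """Equations (10)-(11): r = |alpha - u| and f = round(c*lg(r) - d)."""
    r = max(abs(alpha - u), r_min)   # (10), with the assumed clamp r >= 26
    x = c * math.log10(r) - d        # argument of round() in (11)
    return math.floor(x + 0.5)       # round half up
```

The returned f is the run length used in step (7): runs of frames no longer than f are treated as too short to stand on their own.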
(7) Extracting key frames
1) Compare the maximum block frame-difference Euclidean distance p_k of the k-th frame of the video with the threshold T: if p_k ≥ T, mark the k-th frame as 1; otherwise, mark it as 0.
2) For frames 1 through N-2 of the video, check the frame marks in playback order. If frames marked 1 occur consecutively for more than f frames, take them as key frames and save them to a specified folder. For a run of consecutive frames marked 0 that is at most f frames long, save those frames as key frames to the specified folder if any one of the following conditions holds:
① the runs of frames marked 1 immediately before and immediately after them in playback order are both longer than f frames;
② the first of these frames marked 0 is the 1st frame of the video, and the run of frames marked 1 immediately after them in playback order is longer than f frames;
③ the run of frames marked 1 immediately before them in playback order is longer than f frames, and the last of these frames marked 0 is frame N-2 of the video.
3) For frames N-1 and N of the video: if frame N-2 has been judged a key frame, also extract frames N-1 and N and save them to the specified folder.
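The marking and run rules of step (7) can be expressed as run-length processing of the 0/1 marks. The sketch below is one possible reading of rules 1) to 3), not the patent's own implementation; `select_key_frames` is a hypothetical name, frame numbers are 1-based, and it returns key-frame numbers instead of writing files to a folder.

```python
def select_key_frames(p: list[float], T: float, f: int) -> list[int]:
    """Return 1-based key-frame numbers among frames 1..N, where N = len(p) + 2.

    p[k - 1] holds p_k for frames k = 1..N-2.
    """
    marks = [1 if pk >= T else 0 for pk in p]  # rule 1): mark each frame

    # Group the marks into runs (label, first, last) with 0-based indices.
    runs = []
    start = 0
    for k in range(1, len(marks) + 1):
        if k == len(marks) or marks[k] != marks[start]:
            runs.append((marks[start], start, k - 1))
            start = k

    def run_len(run):
        return run[2] - run[1] + 1

    keep = set()
    for idx, run in enumerate(runs):
        label, first, last = run
        long_run = run_len(run) > f
        if label == 1:
            if long_run:  # runs of 1s longer than f are key frames
                keep.update(range(first, last + 1))
            continue
        if long_run:  # runs of 0s longer than f are never kept
            continue
        # Runs alternate, so the neighbours of a 0-run are always 1-runs.
        prev_long = idx > 0 and run_len(runs[idx - 1]) > f
        next_long = idx + 1 < len(runs) and run_len(runs[idx + 1]) > f
        starts_video = first == 0             # first 0-frame is frame 1
        ends_at_n2 = last == len(marks) - 1   # last 0-frame is frame N-2
        if (prev_long and next_long) or (starts_video and next_long) or (prev_long and ends_at_n2):
            keep.update(range(first, last + 1))  # conditions ①, ② or ③

    frames = sorted(k + 1 for k in keep)
    if frames and frames[-1] == len(p):  # rule 3): frame N-2 is a key frame
        frames += [len(p) + 1, len(p) + 2]  # also keep frames N-1 and N
    return frames
```

For example, with T = 5 and f = 1, p = [0, 9, 9, 0, 9, 9, 0, 0, 9, 0] yields frames 1 to 6: the single 0-frames at frames 1 and 4 are kept because they border runs of 1s longer than f, while the isolated 1 at frame 9 is discarded because its run is not longer than f.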
(8) Synthesizing the summary video from the key frames
Combine the key frames saved to the specified folder in step (7) into the summary video in playback order.
In region partitioning step (2), m is preferably 32.
In step (4), constructing the Weber-Fechner model, a is preferably 3.5 and b is preferably 6.
In step (5), determining the threshold, n is preferably 30.
In step (6), constructing the denoising model, c is preferably 0.58 and d is preferably 0.1.
The invention reads in a video and obtains its basic information, applies Gaussian filtering, grayscale conversion and similar operations to all video frames, and divides every frame into square blocks. It computes the frame-difference Euclidean distance between corresponding blocks of adjacent frames, determines the maximum frame-difference Euclidean distance over the blocks of all frames, and builds a model based on the Weber-Fechner law to determine the threshold adaptively. It then constructs a denoising model based on the time domain masking effect, extracts the frames that meet the stated conditions as key frames, and synthesizes the key frames saved in the specified folder into a summary video. This addresses the low efficiency of manually searching videos for key information and the missed and false detections caused by human sensory fatigue, and improves the accuracy of retrieving key video information.
The invention has the following advantages:
(1) Processing the video region block by block effectively avoids missed detections when the target is far from the camera.
(2) Combining the Euclidean-distance frame difference method with a model based on the Weber-Fechner law of human vision determines a suitable threshold for each video for moving target detection. This copes effectively with complex real surveillance environments and avoids repeatedly adjusting the threshold for videos with different content.
(3) A denoising model based on the time domain masking effect filters out interfering noise before the video is synthesized, effectively improving the display quality of the synthesized video.
(4) The invention does not rely on color information in the video frames, so it is also effective for night-time surveillance video.
Drawings
FIG. 1 is a flowchart of Example 1 of the present invention.
FIG. 2 is a partial image of a yard surveillance video.
FIG. 3 is a partial image of an elevator surveillance video.
FIG. 4 is a partial image of a checkout counter surveillance video.
FIG. 5 is a partial image of a road surveillance video.
FIG. 6 is a partial image of a residential compound entrance surveillance video.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, but the present invention is not limited to the examples.
Example 1
As shown in FIG. 1, the video summarization method based on the Weber-Fechner law and the time domain masking effect of this embodiment consists of the following steps:
(1) Gaussian filtering
Remove noise from frames 1 through N of the video by Gaussian filtering, where N is the total number of frames in the video and is a finite positive integer.
(2) Region partitioning
Divide frames 1 through N of the video into non-overlapping square pixel blocks, ordered from left to right and from top to bottom, each with a side length of m pixels. Round the number of blocks in the width and height directions of the frame to integers, and rescale frames 1 through N according to the following formulas:
w=m×s (1)
h=m×t (2)
where w is the frame width, h is the frame height, s is the number of blocks in the horizontal direction (a positive integer), t is the number of blocks in the vertical direction (a positive integer), and m = 32.
(3) Determining frame difference Euclidean distance
Determine the frame-difference Euclidean distance D(k, i, j) of the j-th pixel of the i-th block of the k-th frame of the video according to formula (3):
[Formula (3): equation image not reproduced in this copy]
where k ∈ {1, ..., N-2}, i ∈ {1, ..., s×t}, and j ∈ {1, ..., m²}. The index i runs over the blocks of a frame from left to right and from top to bottom, so i = 1 for the top-left block and i = s×t for the bottom-right block; j runs over the pixels of a block in the same order, so j = 1 for the top-left pixel and j = m² for the bottom-right pixel, which here is 32² = 1024. x(k, i, j) is the luminance component value of the j-th pixel of the i-th block of the k-th frame of the video.
The frame-difference Euclidean distance D(k, i) of the i-th block of the k-th frame of the video is determined according to formula (4):
[Formula (4): equation image not reproduced in this copy]
(4) Constructing the Weber-Fechner model
The maximum frame-difference Euclidean distance p_k over the blocks of the k-th frame of the video is determined according to equation (5):
$p_k = \max\{D(k,1), D(k,2), \ldots, D(k, s\times t)\}$ (5)
The maximum block frame-difference Euclidean distance α over frames 1 through N-2 of the video is determined according to equation (6):
$\alpha = \max\{p_1, p_2, \ldots, p_{N-2}\}$ (6)
where the minimum value of α is 500.
The Weber-Fechner model β is constructed as follows:
$\beta = a\lg\alpha - b$ (7)
where a ∈ [3, 4] and b ∈ [5, 7]; in this embodiment, a = 3.5 and b = 6.
(5) Determining the threshold
The mean u of the per-frame block maxima p_k over the first n frames of the video is determined according to formula (8):
$u = \frac{1}{n}\sum_{k=1}^{n} p_k$ (8)
where n ∈ {15, ..., 50}; in this embodiment, n = 30.
The threshold value T is determined by equation (9):
T=β×u (9)
(6) Constructing the denoising model
The absolute difference r between α and u is determined according to equation (10):
r=|α-u| (10)
where the minimum value of r is 26.
The denoising model f is constructed as follows:
$f = \mathrm{round}(c\lg r - d)$ (11)
where round() rounds to the nearest integer, c ∈ [0.5, 0.64], and d ∈ [0, 0.2]; in this embodiment, c = 0.58 and d = 0.1.
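As an arithmetic check of equations (7) and (11) with this embodiment's parameters, evaluated at the stated lower bounds α = 500 and r = 26 (lg 500 ≈ 2.69897, lg 26 ≈ 1.41497, results rounded to three decimals):

```latex
\beta = 3.5\lg 500 - 6 \approx 3.5 \times 2.69897 - 6 \approx 3.446
f = \mathrm{round}(0.58\lg 26 - 0.1) \approx \mathrm{round}(0.58 \times 1.41497 - 0.1) = \mathrm{round}(0.721) = 1
```

At these lower bounds β is positive and f equals 1; larger α and r give larger β and f.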
(7) Extracting key frames
1) Compare the maximum block frame-difference Euclidean distance p_k of the k-th frame of the video with the threshold T: if p_k ≥ T, mark the k-th frame as 1; otherwise, mark it as 0;
2) For frames 1 through N-2 of the video, check the frame marks in playback order. If frames marked 1 occur consecutively for more than f frames, take them as key frames and save them to a specified folder. For a run of consecutive frames marked 0 that is at most f frames long, save those frames as key frames to the specified folder if any one of the following conditions holds:
① the runs of frames marked 1 immediately before and immediately after them in playback order are both longer than f frames;
② the first of these frames marked 0 is the 1st frame of the video, and the run of frames marked 1 immediately after them in playback order is longer than f frames;
③ the run of frames marked 1 immediately before them in playback order is longer than f frames, and the last of these frames marked 0 is frame N-2 of the video;
3) For frames N-1 and N of the video: if frame N-2 has been judged a key frame, also extract frames N-1 and N and save them to the specified folder.
(8) Synthesizing the summary video from the key frames
Combine the key frames saved to the specified folder in step (7) into the summary video in playback order.
Example 2
The video summarization method based on the Weber-Fechner law and the time domain masking effect of this embodiment comprises the following steps:
(1) Gaussian filtering
This step is the same as in Example 1.
(2) Region partitioning
Divide frames 1 through N of the video into non-overlapping square pixel blocks, ordered from left to right and from top to bottom, each with a side length of m pixels. Round the number of blocks in the width and height directions of the frame to integers, and rescale frames 1 through N according to the following formulas:
w=m×s (1)
h=m×t (2)
where w is the frame width, h is the frame height, s is the number of blocks in the horizontal direction (a positive integer), t is the number of blocks in the vertical direction (a positive integer), and m = 16.
(3) Determining frame difference Euclidean distance
Determine the frame-difference Euclidean distance D(k, i, j) of the j-th pixel of the i-th block of the k-th frame of the video according to formula (3):
[Formula (3): equation image not reproduced in this copy]
where k ∈ {1, ..., N-2}, i ∈ {1, ..., s×t}, and j ∈ {1, ..., m²}. The index i runs over the blocks of a frame from left to right and from top to bottom, so i = 1 for the top-left block and i = s×t for the bottom-right block; j runs over the pixels of a block in the same order, so j = 1 for the top-left pixel and j = m² for the bottom-right pixel, which here is 16² = 256. x(k, i, j) is the luminance component value of the j-th pixel of the i-th block of the k-th frame of the video.
The frame-difference Euclidean distance D(k, i) of the i-th block of the k-th frame of the video is determined according to formula (4):
[Formula (4): equation image not reproduced in this copy]
(4) Constructing the Weber-Fechner model
The maximum frame-difference Euclidean distance p_k over the blocks of the k-th frame of the video is determined according to equation (5):
$p_k = \max\{D(k,1), D(k,2), \ldots, D(k, s\times t)\}$ (5)
The maximum block frame-difference Euclidean distance α over frames 1 through N-2 of the video is determined according to equation (6):
$\alpha = \max\{p_1, p_2, \ldots, p_{N-2}\}$ (6)
where the minimum value of α is 500.
The Weber-Fechner model β is constructed as follows:
$\beta = a\lg\alpha - b$ (7)
where a ∈ [3, 4] and b ∈ [5, 7]; in this embodiment, a = 3 and b = 5.
(5) Determining the threshold
The mean u of the per-frame block maxima p_k over the first n frames of the video is determined according to formula (8):
$u = \frac{1}{n}\sum_{k=1}^{n} p_k$ (8)
where n = 15.
The threshold value T is determined by equation (9):
T=β×u (9)
(6) Constructing the denoising model
The absolute difference r between α and u is determined according to equation (10):
r=|α-u| (10)
where the minimum value of r is 26.
The denoising model f is constructed as follows:
$f = \mathrm{round}(c\lg r - d)$ (11)
where round() rounds to the nearest integer, c ∈ [0.5, 0.64], and d ∈ [0, 0.2]; in this embodiment, c = 0.5 and d = 0.
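With this embodiment's parameters, the same arithmetic check at the lower bounds α = 500 and r = 26 gives (results rounded to three decimals):

```latex
\beta = 3\lg 500 - 5 \approx 3 \times 2.69897 - 5 \approx 3.097
f = \mathrm{round}(0.5\lg 26 - 0) \approx \mathrm{round}(0.5 \times 1.41497) = \mathrm{round}(0.707) = 1
```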
The remaining steps are the same as in Example 1.
Example 3
(1) Gaussian filtering
This step is the same as in Example 1.
(2) Region partitioning
Divide frames 1 through N of the video into non-overlapping square pixel blocks, ordered from left to right and from top to bottom, each with a side length of m pixels. Round the number of blocks in the width and height directions of the frame to integers, and rescale frames 1 through N according to the following formulas:
w=m×s (1)
h=m×t (2)
where w is the frame width, h is the frame height, s is the number of blocks in the horizontal direction (a positive integer), t is the number of blocks in the vertical direction (a positive integer), and m = 64.
(3) Determining frame difference Euclidean distance
Determine the frame-difference Euclidean distance D(k, i, j) of the j-th pixel of the i-th block of the k-th frame of the video according to formula (3):
[Formula (3): equation image not reproduced in this copy]
where k ∈ {1, ..., N-2}, i ∈ {1, ..., s×t}, and j ∈ {1, ..., m²}. The index i runs over the blocks of a frame from left to right and from top to bottom, so i = 1 for the top-left block and i = s×t for the bottom-right block; j runs over the pixels of a block in the same order, so j = 1 for the top-left pixel and j = m² for the bottom-right pixel, which here is 64² = 4096. x(k, i, j) is the luminance component value of the j-th pixel of the i-th block of the k-th frame of the video.
The frame-difference Euclidean distance D(k, i) of the i-th block of the k-th frame of the video is determined according to formula (4):
[Formula (4): equation image not reproduced in this copy]
(4) Constructing the Weber-Fechner model
The maximum frame-difference Euclidean distance p_k over the blocks of the k-th frame of the video is determined according to equation (5):
$p_k = \max\{D(k,1), D(k,2), \ldots, D(k, s\times t)\}$ (5)
The maximum block frame-difference Euclidean distance α over frames 1 through N-2 of the video is determined according to equation (6):
$\alpha = \max\{p_1, p_2, \ldots, p_{N-2}\}$ (6)
where the minimum value of α is 500.
The Weber-Fechner model β is constructed as follows:
$\beta = a\lg\alpha - b$ (7)
where a ∈ [3, 4] and b ∈ [5, 7]; in this embodiment, a = 4 and b = 7.
(5) Determining the threshold
The mean u of the per-frame block maxima p_k over the first n frames of the video is determined according to formula (8):
$u = \frac{1}{n}\sum_{k=1}^{n} p_k$ (8)
where n = 15.
The threshold value T is determined by equation (9):
T=β×u (9)
(6) Constructing the denoising model
The absolute difference r between α and u is determined according to equation (10):
r=|α-u| (10)
where the minimum value of r is 26.
The denoising model f is constructed as follows:
$f = \mathrm{round}(c\lg r - d)$ (11)
where round() rounds to the nearest integer, c ∈ [0.5, 0.64], and d ∈ [0, 0.2]; in this embodiment, c = 0.64 and d = 0.2.
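With this embodiment's parameters, the arithmetic check at the lower bounds α = 500 and r = 26 gives (results rounded to three decimals):

```latex
\beta = 4\lg 500 - 7 \approx 4 \times 2.69897 - 7 \approx 3.796
f = \mathrm{round}(0.64\lg 26 - 0.2) \approx \mathrm{round}(0.64 \times 1.41497 - 0.2) = \mathrm{round}(0.706) = 1
```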
The remaining steps are the same as in Example 1.
To verify the benefits of the invention, the inventors ran experiments on the test videos using the video summarization method based on the Weber-Fechner law and the time domain masking effect of Example 1.
1. Conditions of the experiment
The experiments were run on a computer with a 64-bit Windows 10 operating system, an Intel Core i7-7700HQ 4-core CPU and 16 GB of memory, using the MATLAB R2018a platform.
2. Test video introduction
The test videos were shot both in the daytime and at night, include both sharp and blurry footage, and contain both complex and simple textures. Part of the courtyard video is shown in FIG. 2, the elevator video in FIG. 3, the checkout counter surveillance video in FIG. 4, the road surveillance video in FIG. 5, and the residential compound entrance surveillance video in FIG. 6. All of these videos were cropped to a specific region, and the attributes of each video are listed in Table 1.
Table 1. Attributes of the test videos
[Table image not reproduced in this copy]
3. Evaluation method
Video summaries are commonly evaluated both objectively and subjectively. Objective evaluation measures the quality of a summary video with evaluation functions; common choices are accuracy, error rate, precision, recall, and F-score. Subjective evaluation has people score the summary video or rate it on a quality scale.
(1) Objective evaluation
The precision P, recall R and F-score are calculated as follows:
$P = \frac{N_{mAS}}{N_{AS}}, \quad R = \frac{N_{mAS}}{N_{US}}$
$F = \frac{2PR}{P + R}$
where $N_{mAS}$, $N_{AS}$ and $N_{US}$ are the number of matched key frames, the number of automatically extracted key frames, and the number of key frames manually selected by the user, respectively.
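The objective measures can be computed directly from the three key-frame counts. A minimal sketch, with the hypothetical name `summary_scores`, assuming the F-score is the usual harmonic mean of P and R and returning 0 when a denominator is zero:

```python
def summary_scores(n_matched: int, n_auto: int, n_user: int) -> tuple[float, float, float]:
    """Precision, recall and F-score from key-frame counts N_mAS, N_AS and N_US."""
    precision = n_matched / n_auto if n_auto else 0.0   # P = N_mAS / N_AS
    recall = n_matched / n_user if n_user else 0.0      # R = N_mAS / N_US
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0  # assumed harmonic mean
    return precision, recall, f_score
```

Because Table 2 reports averages over the five videos, the averaged F-score need not equal the harmonic mean of the averaged P and R.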
The frame difference method and the method of the invention were compared experimentally. The results are shown in Table 2; each value is the average over the five test videos described above.
Table 2. Precision, recall and F-score of the different methods
[Table image not reproduced in this copy]
As Table 2 shows, the precision, recall and F-score of the method of the invention are 86.2%, 78.5% and 79.6% respectively, all higher than the corresponding values for the frame difference method. Since higher precision, recall and F-score indicate a better synthesized summary video, the method of the invention outperforms the frame difference method.
(2) Subjective evaluation
Twenty testers (10 male and 10 female) aged 18 to 24, all with normal vision, took part in the subjective evaluation experiment. In a suitable indoor environment, each tester first watched the test video from 75 cm away from the computer screen to learn its key information, then watched the video summary synthesized by the method, and rated it according to the following criteria.
The rating levels are good, fair and poor. A summary rated good loses essentially no key information, one rated fair loses a little key information, and one rated poor loses a large amount of key information.
Each tester's ratings were converted into percentages; the subjective evaluation results are shown in Table 3.
Table 3. Subjective evaluation results of the users
[Table image not reproduced in this copy]
As Table 3 shows, 74% of the ratings for the five video summaries were good, which broadly matches the recall of 78.5% and indicates that the key frames extracted by the method agree with the testers' assessments; 20% of the ratings were fair and 6% were poor.
4. Conclusion
Under the same test data and evaluation criteria, the video summaries generated by the method of the invention achieve a higher overall F-score, recall, precision and subjective quality than those of the frame difference method, reflect the key information of the video better, and improve the quality of the video summary.

Claims (5)

1. A video summarization method based on the Weber-Fechner law and the time domain masking effect, characterized by comprising the following steps:
(1) Gaussian filtering
Removing noise from frames 1 through N of the video by Gaussian filtering, where N is the total number of frames of the video and is a finite positive integer;
(2) Region partitioning
Dividing frames 1 through N of the video into non-overlapping square pixel blocks ordered from left to right and from top to bottom, each with a side length of m pixels, rounding the number of blocks in the width and height directions of the video frame to integers, and rescaling frames 1 through N according to the following formulas:
w=m×s (1)
h=m×t (2)
wherein w is the frame width, h is the frame height, s is the number of blocks in the horizontal direction and is a positive integer, t is the number of blocks in the vertical direction and is a positive integer, and m ∈ {16, ..., 64};
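The block bookkeeping of step (2) can be sketched as follows. The sketch assumes that "rounding" means rounding to the nearest integer (half up) and leaves the interpolation used for the rescale to an image library. `block_grid` and `split_blocks` are hypothetical names, and the code uses 0-based indices where the claim counts from 1.

```python
import math


def block_grid(width: int, height: int, m: int = 32) -> tuple[int, int, int, int]:
    """Return (s, t, w, h): block counts rounded to the nearest integer and the rescaled frame size."""
    s = max(1, math.floor(width / m + 0.5))   # blocks per row (round half up, assumed)
    t = max(1, math.floor(height / m + 0.5))  # blocks per column
    return s, t, m * s, m * t                 # w = m*s (1), h = m*t (2)


def split_blocks(frame: list[list[float]], m: int) -> list[list[float]]:
    """Split a frame already rescaled to (m*t) rows x (m*s) columns into s*t blocks.

    Blocks are listed left-to-right, top-to-bottom (index i), and each block is a
    flat list of m*m luminance values in the same raster order (index j).
    """
    t, s = len(frame) // m, len(frame[0]) // m
    blocks = []
    for by in range(t):
        for bx in range(s):
            blocks.append([frame[by * m + y][bx * m + x] for y in range(m) for x in range(m)])
    return blocks
```

For example, a 1920×1080 frame with m = 32 gives s = 60 and t = 34 (1080/32 = 33.75 rounds to 34), so frames would be rescaled to 1920×1088.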
(3) determining the frame-difference Euclidean distance
determining the frame-difference Euclidean distance D(k, i, j) of the jth pixel of the ith block of the kth frame of the video according to formula (3):
[Formula (3): image not reproduced]
wherein k ∈ {1, ..., N-2}, i ∈ {1, ..., s×t} and j ∈ {1, ..., m²}; i is numbered from left to right and from top to bottom within the frame, with i = 1 the top-left block and i = s×t the bottom-right block; j is numbered from left to right and from top to bottom within the block, with j = 1 the top-left pixel and j = m² the bottom-right pixel; X(k, i, j) is the luminance component value of the jth pixel of the ith block of the kth frame of the video;
the frame-difference Euclidean distance D(k, i) of the ith block of the kth frame of the video is determined according to formula (4):
[Formula (4): image not reproduced]
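Formulas (3) and (4) survive only as image references, so their exact form cannot be recovered from this text. The sketch below is a hypothetical stand-in that only illustrates the data flow. It assumes that D(k, i, j) is the Euclidean norm of the two successive luminance differences across frames k, k+1 and k+2, which would be consistent with k running only to N-2, and that D(k, i) sums D(k, i, j) over the block's m² pixels. Both choices are assumptions, not the patented formulas, and the function names are hypothetical.

```python
import math


def pixel_distance(x0: float, x1: float, x2: float) -> float:
    """HYPOTHETICAL stand-in for formula (3): Euclidean norm of the two
    successive frame differences of one pixel across frames k, k+1, k+2."""
    return math.hypot(x1 - x0, x2 - x1)


def block_distances(blocks_k: list[list[float]],
                    blocks_k1: list[list[float]],
                    blocks_k2: list[list[float]]) -> list[float]:
    """HYPOTHETICAL stand-in for formula (4): D(k, i) as the sum of D(k, i, j)
    over the pixels of block i. Each argument is the block list of one frame,
    in the raster order produced by the block-splitting step."""
    return [
        sum(pixel_distance(a, b, c) for a, b, c in zip(b0, b1, b2))
        for b0, b1, b2 in zip(blocks_k, blocks_k1, blocks_k2)
    ]
```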
(4) constructing the Weber-Fechner model
the maximum p_k of the block frame-difference Euclidean distances of the kth frame of the video is determined according to equation (5):
$p_k = \max\{D(k,1), D(k,2), \ldots, D(k, s \times t)\}$ (5)
the maximum α of the block frame-difference Euclidean distances over the 1st through (N-2)th frames of the video is determined according to equation (6):
$\alpha = \max\{p_1, p_2, \ldots, p_{N-2}\}$ (6)
wherein α has a minimum value of 500;
the Weber-Fechner model β is constructed as follows:
$\beta = a \lg \alpha - b$ (7)
wherein a ∈ [3, 4], b ∈ [5, 7], and lg denotes the base-10 logarithm;
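Equations (5) to (7) translate directly into code. This sketch assumes that lg is the base-10 logarithm and reads "α has a minimum value of 500" as clamping α from below. The defaults a = 3.5 and b = 6 are the preferred values from claim 3, and the function names are hypothetical.

```python
import math


def frame_maxima(block_dists: list[list[float]]) -> list[float]:
    """p_k for each frame k: the largest block distance D(k, i) in that frame (equation 5)."""
    return [max(d) for d in block_dists]


def weber_fechner_beta(p: list[float], a: float = 3.5, b: float = 6.0,
                       alpha_floor: float = 500.0) -> float:
    """beta = a*lg(alpha) - b (equation 7), with alpha = max_k p_k (equation 6).

    Treating the stated minimum of 500 as a lower clamp on alpha is an assumption.
    """
    alpha = max(max(p), alpha_floor)
    return a * math.log10(alpha) - b
```

With α clamped to 500, β = 3.5 × lg 500 - 6 ≈ 3.45, so under this reading the threshold in step (5) would be roughly 3.45 times the average early-frame maximum.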
(5) determining the threshold
the mean u of the per-frame maxima p_k of the block frame-difference Euclidean distances over the first n frames of the video is determined according to equation (8):
$u = \frac{1}{n} \sum_{k=1}^{n} p_k$ (8)
wherein n ∈ {15, ..., 50};
the threshold value T is determined by equation (9):
T=β×u (9)
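Step (5) reduces to an average and a product. The sketch below assumes that equation (8) is the plain arithmetic mean of p_1, ..., p_n, as the surrounding text describes. n = 30 is the preferred value from claim 4, and `threshold` is a hypothetical name.

```python
def threshold(p: list[float], beta: float, n: int = 30) -> tuple[float, float]:
    """Return (u, T): u is the mean of p_k over the first n frames (equation 8)
    and T = beta * u (equation 9)."""
    head = p[:n]  # if fewer than n frames exist, all of them are used (assumption)
    u = sum(head) / len(head)
    return u, beta * u
```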
(6) constructing the denoising model
The absolute value r of the difference between α and u is determined according to equation (10):
r=|α-u| (10)
wherein r is at least 26;
constructing a denoising model f as follows:
$f = \operatorname{round}(c \lg r - d)$ (11)
wherein round(·) denotes rounding to the nearest integer, c ∈ [0.5, 0.64], and d ∈ [0, 0.2];
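Equations (10) and (11) yield f, the run-length limit used in step (7). The sketch assumes base-10 lg, half-up rounding, and a lower clamp of 26 on r (one reading of "r is at least 26"). c = 0.58 and d = 0.1 are the preferred values from claim 5, and `denoise_length` is a hypothetical name.

```python
import math


def denoise_length(alpha: float, u: float, c: float = 0.58, d: float = 0.1,
                   r_floor: float = 26.0) -> int:
    """f = round(c*lg(r) - d) (equation 11) with r = |alpha - u| (equation 10).

    The lower clamp on r and the half-up rounding are assumptions.
    """
    r = max(abs(alpha - u), r_floor)
    return math.floor(c * math.log10(r) - d + 0.5)
```

With the preferred constants, r = 400 gives f = round(0.58 × lg 400 - 0.1) = round(1.41) = 1, and r = 100000 gives f = round(2.8) = 3. The logarithm keeps f small even when the distance range grows by orders of magnitude.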
(7) extracting key frames
1) comparing the maximum p_k of the block frame-difference Euclidean distances of the kth frame of the video with the threshold T: if p_k ≥ T, marking the kth frame as 1; otherwise, marking the kth frame as 0;
2) for the 1st through (N-2)th frames of the video, checking the frame marks in playback order: if frames marked 1 occur consecutively in a run of more than f frames, taking these frames as key frames and saving them to a designated folder; for a run of consecutive frames marked 0 containing f frames or fewer, saving these frames to the designated folder as key frames if any one of the following conditions is satisfied:
(i) the runs of frames marked 1 immediately before and immediately after this run of 0s in playback order each contain more than f consecutive frames;
(ii) the first frame of this run of 0s is the 1st frame of the video, and the run of frames marked 1 immediately after it in playback order contains more than f consecutive frames;
(iii) the run of frames marked 1 immediately before this run of 0s in playback order contains more than f consecutive frames, and the last frame of this run of 0s is the (N-2)th frame of the video;
3) for the (N-1)th and Nth frames of the video: if the (N-2)th frame has been judged a key frame, extracting the (N-1)th and Nth frames and saving them to the designated folder;
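The run-length rules of step (7) can be sketched as follows. The sketch makes three assumptions: the marks for frames 1 to N-2 arrive as a 0/1 list; frames N-1 and N, which carry no mark because the distances run only to frame N-2, inherit the decision for frame N-2; and runs of 1s with f or fewer frames are discarded, since the claim lists only what to keep. `runs` and `select_key_frames` are hypothetical names, and the function returns 1-based frame numbers.

```python
def runs(marks: list[int]) -> list[tuple[int, int, int]]:
    """Group 0/1 marks into (value, start, length) runs; start is a 0-based index."""
    out: list[tuple[int, int, int]] = []
    for idx, v in enumerate(marks):
        if out and out[-1][0] == v:
            val, start, length = out[-1]
            out[-1] = (val, start, length + 1)
        else:
            out.append((v, idx, 1))
    return out


def select_key_frames(marks: list[int], f: int) -> list[int]:
    """Return 1-based key-frame numbers.

    marks[k-1] is the mark of frame k for k = 1..N-2, so N = len(marks) + 2.
    """
    rs = runs(marks)
    keep = [False] * len(marks)
    for r, (v, start, length) in enumerate(rs):
        # For a 0-run the neighbouring runs are 1-runs, because marks alternate.
        prev_long = r > 0 and rs[r - 1][2] > f
        next_long = r + 1 < len(rs) and rs[r + 1][2] > f
        if v == 1:
            selected = length > f                        # long run of 1s
        else:
            selected = length <= f and (
                (prev_long and next_long)                # short gap between two long 1-runs
                or (start == 0 and next_long)            # short 0-run at the start of the video
                or (start + length == len(marks) and prev_long)  # short 0-run ending at frame N-2
            )
        if selected:
            for idx in range(start, start + length):
                keep[idx] = True
    n_total = len(marks) + 2
    key = [idx + 1 for idx, k in enumerate(keep) if k]
    if keep and keep[-1]:  # frame N-2 is a key frame, so frames N-1 and N are kept too
        key += [n_total - 1, n_total]
    return key
```

For marks 1,1,1,0,1,1,1,0,0,0,0,1 with f = 2, the single 0 at frame 4 lies between two long runs of 1s and is kept, whereas the four 0s and the isolated trailing 1 are dropped, giving key frames 1 to 7. This is how the temporal masking idea shows up in code: brief dips or spikes shorter than f frames take on the label of their surroundings.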
(8) composing the summary video from key frames
combining the key frames saved in the designated folder in step (7) into the summary video in playback order.
2. The video summarization method based on the Weber-Fechner law and temporal masking effect of claim 1, wherein in the region partitioning step (2), m is 32.
3. The video summarization method based on the Weber-Fechner law and temporal masking effect of claim 1, wherein in step (4) of constructing the Weber-Fechner model, a is 3.5 and b is 6.
4. The video summarization method based on the Weber-Fechner law and temporal masking effect of claim 1, wherein in step (5) of determining the threshold, n is 30.
5. The video summarization method based on the Weber-Fechner law and temporal masking effect of claim 1, wherein in step (6) of constructing the denoising model, c is 0.58 and d is 0.1.
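Claims 2 to 5 fix preferred values inside the ranges recited in claim 1. The hypothetical configuration object below collects them and checks them against the claim 1 ranges. Reading m ∈ {16, ..., 64} as an integer range is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SummaryParams:
    """Parameters of claim 1, defaulting to the preferred values of claims 2-5."""
    m: int = 32      # block side length; claim 1: m in {16, ..., 64}
    a: float = 3.5   # Weber-Fechner model; claim 1: a in [3, 4]
    b: float = 6.0   # claim 1: b in [5, 7]
    n: int = 30      # frames averaged for u; claim 1: n in {15, ..., 50}
    c: float = 0.58  # denoising model; claim 1: c in [0.5, 0.64]
    d: float = 0.1   # claim 1: d in [0, 0.2]

    def validate(self) -> None:
        """Raise ValueError if any parameter lies outside the range recited in claim 1."""
        checks = {
            "m": 16 <= self.m <= 64,
            "a": 3 <= self.a <= 4,
            "b": 5 <= self.b <= 7,
            "n": 15 <= self.n <= 50,
            "c": 0.5 <= self.c <= 0.64,
            "d": 0 <= self.d <= 0.2,
        }
        bad = [name for name, ok in checks.items() if not ok]
        if bad:
            raise ValueError(f"parameters out of claimed range: {', '.join(bad)}")
```

Freezing the dataclass keeps one parameter set fixed for a whole run. Calling `validate()` once up front catches out-of-range overrides before any frames are processed.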
CN201910723748.7A 2019-08-07 2019-08-07 Video summarization method based on Weber-Fisher's law and time domain masking effect Active CN110602444B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910723748.7A CN110602444B (en) 2019-08-07 2019-08-07 Video summarization method based on Weber-Fisher's law and time domain masking effect

Publications (2)

Publication Number Publication Date
CN110602444A CN110602444A (en) 2019-12-20
CN110602444B CN110602444B (en) 2020-10-02

Family

ID=68853612

Country Status (1)

Country Link
CN (1) CN110602444B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011101448A2 (en) * 2010-02-19 2011-08-25 Skype Limited Data compression for video
CN109523562A (en) * 2018-12-14 2019-03-26 Harbin University of Science and Technology A kind of Infrared Image Segmentation based on human-eye visual characteristic

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN104112064B (en) * 2014-07-01 2017-02-22 Henan University of Science and Technology Method for establishing touch comfort level model based on Weber-Fechner law
CN104331905A (en) * 2014-10-31 2015-02-04 Zhejiang University Surveillance video abstraction extraction method based on moving object detection

Non-Patent Citations (1)

Title
A Generic Framework of User Attention Model and; Yu-Fei Ma, Xian-Sheng Hua, Lie Lu, and Hong-Jiang Zhang; IEEE Transactions on Multimedia, Vol. 7, No. 5, October 2005; 20090930; pp. 1-13 *

Also Published As

Publication number Publication date
CN110602444A (en) 2019-12-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant