CN107707975A

CN107707975A - Intelligent video clipping method based on a surveillance platform

Info

Publication number: CN107707975A
Application number: CN201710855228.2A
Authority: CN
Inventors: 王霞; 李岳楠; 张为
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2017-09-20
Filing date: 2017-09-20
Publication date: 2018-02-16

Abstract

The present invention relates to an intelligent video clipping method based on a surveillance platform, comprising: extracting the moving foreground, counting the number of moving-foreground points in each frame, and applying a threshold; extracting the moving-foreground contours and performing human detection; counting the video frames suspected of containing people and, if too few such frames remain, skipping the subsequent clipping and saving the original video unchanged; otherwise, scanning each input video frame at multiple scales, detecting and judging it with a pre-trained combined human detection classifier, and designating the frames judged to contain a human body as key video frames; extending the key video frames obtained in the preceding step by several frames before and after, and storing the frame numbers of all key frames after extension in a single key-frame-number queue; and clipping the video.

Description

Intelligent video clipping method based on a surveillance platform

Technical field

The invention belongs to the field of intelligent video surveillance within the security industry, and specifically relates to an early-stage intelligent video processing technique built on existing surveillance platforms.

Background art

In the traditional security field, staff must review a video repeatedly, often many times over, before they can find useful clues in its details and then analyze and process the video further on the basis of those clues. This process places heavy demands on staff: they must watch the screen closely and stay highly concentrated, so the workload becomes excessive. Moreover, human attention lasts only for a limited period; after a certain length of time attention lapses, so key frames containing high-value information may be overlooked and the original purpose of the review is not achieved.

Given this situation, if lengthy surveillance videos could first be given a preliminary clipping by an effective algorithm, shortening them before they are handed to staff for further processing, the burden on staff would be greatly reduced and work efficiency improved.

With the continued development of image pattern recognition, computer vision, and related technologies, new intelligent algorithms keep being proposed. In recent years machine learning and deep learning have attracted growing attention, more and more researchers have joined the field, and the related research keeps deepening. Analyzing people in a video presupposes a preliminary detection of those people, and existing person detection algorithms provide solid theoretical support for solving practical engineering problems. The usual approach to human detection today is to extract key features of the human body and train a corresponding classifier. The classic early method is the combination of the histogram of oriented gradients (HOG) feature designed by Dalal et al. with an SVM classifier, which brought a breakthrough in pedestrian detection; comparable features include LBP and Haar, and comparable classifiers include those designed with Adaboost or neural networks. In addition, most current work on detecting people concerns pedestrian detection, mostly of pedestrians seen from a level viewpoint, whose body contours are fairly complete. In practice, however, a monocular camera is fixed in place: from the moment a pedestrian enters the camera's field of view until they leave it, their contour is not always complete, and sometimes only part of the body appears in the surveillance video. Moreover, the size of the human figure in the video varies with distance, so a single classifier on its own is unlikely to meet practical needs. If a human detection classifier that adapts to the varied conditions of surveillance video can be designed, its higher person detection rate retains more key frames and so better preserves the quality of the clipped video.

Meanwhile, China is paying increasing attention to intelligent security. Video surveillance systems are now found throughout industry and daily life, and further analysis of the captured surveillance video has become a vital task. Using a reasonably effective person detection algorithm to clip videos at an early stage can greatly reduce the workload of staff, substantially shorten the video analysis cycle, and improve the efficiency of the departments concerned, which is of real value to the intelligent security field.

Summary of the invention

The object of the invention is to overcome the high labor cost of analyzing and processing surveillance video in the existing security field by providing a method, based on existing surveillance platforms, that clips video intelligently. The technical solution is as follows:

An intelligent video clipping method based on a surveillance platform, comprising the following steps:

1) Input the video and read it frame by frame; normalize all video frames to a uniform size, then preprocess them for the subsequent steps;

2) For the preprocessed video frames, extract the moving foreground and count the number of moving-foreground points in each frame;

3) Apply a threshold to the foreground point count: retain the video frames whose count exceeds the threshold and exclude those with too few foreground points;

4) For the retained video frames that satisfy the threshold condition, extract the moving-foreground contours, apply appropriate morphological processing to them, and draw their bounding rectangles; to suit practical detection needs, moderately enlarge each bounding rectangle, scaling its width by 1.1 and its height by 1.2;

5) Process the resulting moving-foreground bounding rectangles further by computing their area and width-to-height ratio; if a bounding rectangle's area is too small or its aspect ratio does not match the general characteristics of a human body, exclude that moving foreground and, in turn, the video frame;

6) The preceding analysis leaves a preliminary set of video frames suspected of containing people. Count them; if too few frames remain, skip the subsequent clipping and save the original video unchanged; otherwise, scan each input video frame at multiple scales, detect and judge it with the pre-trained combined human detection classifier, and designate the frames judged to contain a human body as key video frames;

Here, the pre-trained combined human detection classifier works as follows:

Each input video frame is first scanned at multiple scales and then passed to a combined human detection classifier that adapts to the varied conditions in surveillance video. The combined classifier comprises an upper-body detection cascade classifier and a lower-body detection cascade classifier. The upper-body cascade classifier captures the head-and-shoulder region of the human body and the corresponding body-frame structure, which ensures a high human detection rate; when only part of a human body appears in the image, especially when the head is not visible, the lower-body cascade classifier compensates;

7) Taking into account the correlation of information between video frames, extend each key video frame obtained in the preceding step by several frames before and after, then store the frame numbers of all key frames after extension in a single key-frame-number queue;

8) Read the original video frame by frame again and check whether each frame's number is in the key-frame-number queue obtained earlier; if it is, retain the frame. Finally, reassemble the retained key video frames into a new video and save it.

The invention does not depend on the color information of the video frames, so it can also process surveillance video recorded at night. By preserving most of the key video frames that contain people fairly completely and reasonably removing frames of little analytical value, it shortens the video, achieves effective early-stage clipping, significantly reduces the workload of staff, lowers labor cost, and improves efficiency.

Brief description of the drawings

Fig. 1 is a block diagram of the video surveillance system on which the method of the invention is based

Fig. 2 is a block diagram of the human detection system designed in the method of the invention

Fig. 3 shows some test results of the human detection system designed in the method of the invention

Fig. 4 is a flowchart of the method of the invention

Embodiment

The general processing architecture of video surveillance systems in the existing security field is as follows: the picture captured by an analog camera is carried over a cable, with one branch going directly to a monitor for display and the other to a DVR. The analog signal entering the DVR is converted into a digital bitstream, which is encoded and stored in the DVR as files; in addition, the DVR can be reached over the network at any time to extract the bitstream for display and analysis. See Fig. 1 for details. The early-stage video processing software built on the method proposed by the invention extracts the transcoded video files stored in the DVR, preprocesses the video, and feeds it to the human detection system. The system scans the video frames at multiple scales, frame by frame, and detects people with the pre-trained combined human detection classifier. Frames judged to contain people are retained as key frames and moderately extended with neighboring frames, and the key frames after extension are reassembled into a new video. Removing frames of little analytical value shortens the video and achieves effective early-stage intelligent clipping.

The method of the invention is shown in detail in Fig. 4.

Each part is described in detail below:

1. Video frame preprocessing

For computational efficiency, the video frames that are read are normalized to a uniform CIF size. Because much surveillance video is recorded at night, the frames are converted from RGB to grayscale so that processing does not depend on color information, and appropriate histogram equalization is applied to avoid interference from abrupt brightness changes between frames.
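The preprocessing chain described here (resize to CIF, convert to grayscale, equalize the histogram) can be sketched in plain Python as below. This is only an illustrative sketch: the 352x288 CIF resolution and the BT.601 luma weights are standard values rather than values stated in the patent, nearest-neighbour resizing is an assumption, and the function names are hypothetical. A real implementation would more likely rely on an image library.

```python
CIF_WIDTH, CIF_HEIGHT = 352, 288  # standard CIF resolution (assumed target size)

def to_gray(rgb_frame):
    """Convert rows of (R, G, B) tuples to 8-bit grayscale using BT.601 luma weights."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb_frame]

def resize_nearest(gray, width=CIF_WIDTH, height=CIF_HEIGHT):
    """Nearest-neighbour resize (an assumed, deliberately simple interpolation)."""
    src_h, src_w = len(gray), len(gray[0])
    return [[gray[y * src_h // height][x * src_w // width] for x in range(width)]
            for y in range(height)]

def equalize_hist(gray):
    """Classic histogram equalization for 8-bit grayscale values."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # single gray level: nothing to stretch
        return [row[:] for row in gray]
    return [[round((cdf[v] - cdf_min) * 255 / (total - cdf_min)) for v in row]
            for row in gray]

def preprocess(rgb_frame):
    """Grayscale conversion, CIF normalization, then histogram equalization."""
    return equalize_hist(resize_nearest(to_gray(rgb_frame)))
```

Equalizing after the resize keeps the histogram pass on the smaller CIF frame, which matches the stated concern for computational efficiency.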

2. Moving-foreground extraction

This scheme extracts the foreground with a Gaussian mixture model. Background modeling with timely background updates, followed by appropriate morphological filtering of the extracted moving foreground (such as dilation and erosion), yields a fairly complete moving foreground.
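The morphological filtering mentioned here can be illustrated with a small pure-Python sketch of dilation followed by erosion (a morphological closing), which fills small holes in a binary foreground mask. The Gaussian mixture background model itself is not reproduced. The 3x3 structuring element, the order of the two operations, and the function names are assumptions, since the patent only says "appropriate" filtering.

```python
def _neighbourhood(mask, y, x):
    """Values in the 3x3 neighbourhood of (y, x); positions outside the mask are ignored."""
    h, w = len(mask), len(mask[0])
    return [mask[ny][nx]
            for ny in range(max(0, y - 1), min(h, y + 2))
            for nx in range(max(0, x - 1), min(w, x + 2))]

def dilate(mask):
    """A pixel becomes foreground if any pixel in its neighbourhood is foreground."""
    return [[1 if any(_neighbourhood(mask, y, x)) else 0
             for x in range(len(mask[0]))]
            for y in range(len(mask))]

def erode(mask):
    """A pixel stays foreground only if its whole neighbourhood is foreground."""
    return [[1 if all(_neighbourhood(mask, y, x)) else 0
             for x in range(len(mask[0]))]
            for y in range(len(mask))]

def close_mask(mask):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(mask))
```

For example, a 3x3 foreground blob with a one-pixel hole in its center comes back from `close_mask` as a solid 3x3 blob of the same extent.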

3. Foreground point counting and processing

The foreground points in the extracted moving foreground are counted and an appropriate threshold is applied; video frames containing too few foreground points are discarded.
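A minimal sketch of this counting and thresholding step, assuming the foreground is available as a binary mask (rows of 0/1 values). The patent does not give a threshold value, so `min_points` is a hypothetical parameter that would need tuning per camera.

```python
def count_foreground_points(mask):
    """Count the non-zero (foreground) pixels in a binary mask."""
    return sum(1 for row in mask for value in row if value)

def passes_foreground_threshold(mask, min_points):
    """Keep the frame only if it has more than `min_points` foreground pixels.

    `min_points` is a hypothetical tuning parameter; no value is given in the source.
    """
    return count_foreground_points(mask) > min_points
```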

4. Moving-foreground contour extraction

For the retained video frames that meet the condition, the moving-foreground contours are extracted and given appropriate morphological processing, and their bounding rectangles are drawn. To suit practical detection needs, each bounding rectangle is moderately enlarged, with its width scaled by about 1.1 and its height by about 1.2, which raises the detection rate of the subsequent classifier step and improves precision.
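The rectangle enlargement can be sketched as follows. The scale factors 1.1 and 1.2 come from the text; enlarging about the rectangle's center and clamping the result to the frame are assumptions, as the patent does not say how the enlarged box is positioned. The function name and the `(x, y, w, h)` box layout are hypothetical.

```python
def enlarge_box(box, frame_w, frame_h, w_scale=1.1, h_scale=1.2):
    """Enlarge an (x, y, w, h) box about its center and clamp it to the frame.

    Centered growth and clamping are assumptions, not details from the source.
    """
    x, y, w, h = box
    new_w, new_h = w * w_scale, h * h_scale
    cx, cy = x + w / 2, y + h / 2
    left = max(0, int(round(cx - new_w / 2)))
    top = max(0, int(round(cy - new_h / 2)))
    right = min(frame_w, int(round(cx + new_w / 2)))
    bottom = min(frame_h, int(round(cy + new_h / 2)))
    return (left, top, right - left, bottom - top)
```

With a 352x288 frame, `enlarge_box((100, 100, 40, 100), 352, 288)` yields `(98, 90, 44, 120)`, while a box touching the frame edge grows only toward the inside.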

5. Video frame pre-selection

The resulting moving-foreground bounding rectangles are processed further by computing their area and width-to-height ratio. If a bounding rectangle's area is too small or its aspect ratio does not match the general characteristics of a human body, that moving foreground is excluded and, in turn, so is the video frame.
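A sketch of the pre-selection test under stated assumptions: the patent gives no numeric limits, so `min_area`, `min_aspect`, and `max_aspect` below are hypothetical placeholders, and a frame is treated as a candidate if at least one of its boxes passes.

```python
def looks_like_person(box, min_area=400, min_aspect=0.2, max_aspect=1.0):
    """Check an (x, y, w, h) box against area and width-to-height limits.

    All three limits are hypothetical values chosen for illustration only.
    """
    _, _, w, h = box
    if w <= 0 or h <= 0:
        return False
    if w * h < min_area:
        return False
    return min_aspect <= w / h <= max_aspect

def frame_is_candidate(boxes, **limits):
    """A frame is kept if any of its foreground boxes could be a person (assumed rule)."""
    return any(looks_like_person(box, **limits) for box in boxes)
```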

6. Classifier detection

The preceding analysis leaves a preliminary set of video frames suspected of containing people. These frames are counted; if too few remain, the subsequent clipping is skipped and the original video is saved unchanged. Otherwise, the frames are passed on to the human detection system, where each input video frame is scanned at multiple scales and detected and judged with the trained human detection classifier; frames judged to contain a human body are designated key video frames.
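The gating logic of this step might look like the sketch below. `min_candidates` is a hypothetical threshold (the patent only says "too few"), and `detect_person` stands in for the combined classifier as any callable that takes a frame index and returns a boolean.

```python
def select_key_frames(candidate_indices, detect_person, min_candidates):
    """Return key-frame indices, or None when the video should be saved unclipped.

    `detect_person` and `min_candidates` are hypothetical stand-ins for the
    trained classifier and the unspecified frame-count threshold.
    """
    if len(candidate_indices) < min_candidates:
        return None  # too few suspected frames: keep the original video
    return [index for index in candidate_indices if detect_person(index)]
```

Returning `None` rather than an empty list keeps "skip clipping" distinct from "clipping ran but found no people".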

The design process of the pre-trained human detection classifier is shown in Fig. 2.

Each part is described in detail below:

Because the contour of a human body in real surveillance video is not necessarily complete and varies in size, the following scheme is adopted: each input video frame is first scanned at multiple scales and then passed to the purpose-designed combined human detection classifier, which adapts to the varied conditions in surveillance video. This achieves a high human detection rate and preserves most of the key video frames fairly completely.
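Multi-scale scanning is commonly implemented as a sliding window whose size grows by a fixed factor. The sketch below generates such windows; the base window size, scale step, and stride are all hypothetical values, since the patent does not specify them.

```python
def sliding_windows(frame_w, frame_h, base_w=32, base_h=64,
                    scale_step=1.25, stride_frac=0.25):
    """Yield (x, y, w, h) windows over the frame at increasing scales.

    Every default value here is an illustrative assumption.
    """
    scale = 1.0
    while True:
        win_w, win_h = int(base_w * scale), int(base_h * scale)
        if win_w > frame_w or win_h > frame_h:
            break  # the window no longer fits in the frame
        stride = max(1, int(win_w * stride_frac))
        for y in range(0, frame_h - win_h + 1, stride):
            for x in range(0, frame_w - win_w + 1, stride):
                yield (x, y, win_w, win_h)
        scale *= scale_step
```

Each window would then be cropped, resized to the classifier's input size, and passed to the detector, so that people at different distances from the camera are all tested at a comparable apparent size.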

To limit video processing time, a cascade classifier strategy is adopted. Specifically, the combined human detection classifier comprises an upper-body detection cascade classifier and a lower-body detection cascade classifier. The upper-body cascade classifier captures the head-and-shoulder region and the corresponding body-frame structure, which ensures a high human detection rate; when only part of a human body appears in the image, especially when the head is not visible, the lower-body cascade classifier compensates.

Specifically, the combined classifier is designed using histogram of oriented gradients feature extraction and the cascade Adaboost algorithm. Cascade Adaboost chains multiple strong classifiers in series, each of which is itself a combination of multiple weak classifiers; both cascade classifiers designed here have 18 stages. Training on a large number of positive and negative samples first yielded upper-body and lower-body detection cascade classifiers with reasonably good performance. The two classifiers were then combined, applied to a test set, and the results analyzed. Based on those results, the positive and negative samples were extended and refined, adding upper-body positive samples from various viewing angles and negative samples that tend to cause interference, such as trees, buildings, utility poles, and roadblocks. The greater sample diversity effectively improved classifier performance, raising the detection rate and lowering the false alarm rate.
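How a cascade of Adaboost stages rejects windows early, and how the two cascades are combined, can be sketched as follows. The weak learners here are simple decision stumps over a feature vector, and the two cascades are joined with a logical OR; both choices are assumptions made for illustration, since the patent describes the weak classifiers and the combination rule only in general terms. The toy stages below are hand-written, not trained.

```python
def strong_classifier(features, weak_learners, threshold):
    """Weighted vote of decision stumps given as (index, cutoff, polarity, alpha)."""
    score = 0.0
    for index, cutoff, polarity, alpha in weak_learners:
        vote = 1 if polarity * features[index] < polarity * cutoff else -1
        score += alpha * vote
    return score >= threshold

def cascade_predict(features, stages):
    """Accept only if every stage accepts; the first rejection stops evaluation."""
    for weak_learners, threshold in stages:
        if not strong_classifier(features, weak_learners, threshold):
            return False
    return True

def combined_predict(features, upper_stages, lower_stages):
    """Assumed combination rule: a person is reported if either cascade fires."""
    return (cascade_predict(features, upper_stages)
            or cascade_predict(features, lower_stages))

# Hand-written toy cascades (hypothetical; real ones would have 18 trained stages).
UPPER = [([(0, 0.5, -1, 1.0)], 0.0)]                                # feature 0 > 0.5
LOWER = [([(1, 0.5, -1, 1.0)], 0.0), ([(2, 0.3, 1, 1.0)], 0.0)]     # f1 > 0.5, f2 < 0.3
```

Because most windows are rejected by the first few stages, only a small fraction ever reach the later, more expensive ones, which is the processing-time benefit the text refers to.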

Overall, scanning the video frames at multiple scales and applying the combined human detection classifier preserves most of the key frames containing people in the original video fairly completely, which effectively safeguards the quality of the clipped video.

7. Key frame storage

Taking into account the correlation of information between video frames, each key video frame obtained in the preceding step is extended by 15 frames before and after, and the frame numbers of all key frames after extension are stored in a single key-frame-number queue.
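The frame extension can be sketched as below. The margin of 15 frames comes from the text; storing the frame numbers in a Python `set` rather than a queue is a substitution made here so that overlapping extensions merge automatically and later membership checks are fast. The function name is hypothetical.

```python
def expand_key_frames(key_indices, total_frames, margin=15):
    """Return the set of frame numbers within `margin` frames of any key frame.

    A set is used in place of the queue described in the text (an assumption
    made for cheap de-duplication and lookup).
    """
    kept = set()
    for key in key_indices:
        start = max(0, key - margin)
        stop = min(total_frames, key + margin + 1)
        kept.update(range(start, stop))
    return kept
```

For a 110-frame video with key frames 5 and 100, this keeps frames 0 to 20 and 85 to 109, clamped at both ends of the video.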

8. Reassembly into a new video

The original video is read frame by frame again, and each frame's number is checked against the key-frame-number queue obtained earlier; if it is present, the frame is retained. Finally, the retained key video frames are reassembled into a new video and saved. By preserving the key video frames that carry key information fairly completely and eliminating frames of little analytical value, the method shortens the video and achieves effective early-stage intelligent clipping.
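The second pass over the video reduces to a filter on frame numbers. In this sketch `frames` is any iterable of decoded frames and `kept_indices` is the collection of retained frame numbers; both names are hypothetical, and writing the result back to a video file is left out because it depends on the codec library in use.

```python
def reassemble(frames, kept_indices):
    """Yield, in original order, only the frames whose number is in `kept_indices`."""
    for index, frame in enumerate(frames):
        if index in kept_indices:
            yield frame
```

Written as a generator, the filter never holds the whole video in memory, so frames can be read, tested, and written out one at a time.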

Claims

1. An intelligent video clipping method based on a surveillance platform, comprising the following steps:

1) Input the video and read it frame by frame; normalize all video frames to a uniform size, then preprocess them for the subsequent steps;

2) For the preprocessed video frames, extract the moving foreground and count the number of moving-foreground points in each frame;

3) Apply a threshold to the foreground point count: retain the video frames whose count exceeds the threshold and exclude those with too few foreground points;

4) For the retained video frames that satisfy the threshold condition, extract the moving-foreground contours, apply appropriate morphological processing to them, and draw their bounding rectangles; to suit practical detection needs, moderately enlarge each bounding rectangle, scaling its width by 1.1 and its height by 1.2;

5) Process the resulting moving-foreground bounding rectangles further by computing their area and width-to-height ratio; if a bounding rectangle's area is too small or its aspect ratio does not match the general characteristics of a human body, exclude that moving foreground and, in turn, the video frame;

6) The preceding analysis leaves a preliminary set of video frames suspected of containing people. Count them; if too few frames remain, skip the subsequent clipping and save the original video unchanged; otherwise, scan each input video frame at multiple scales, detect and judge it with the pre-trained combined human detection classifier, and designate the frames judged to contain a human body as key video frames;

Here, the pre-trained combined human detection classifier works as follows:

Each input video frame is first scanned at multiple scales and then passed to a combined human detection classifier that adapts to the varied conditions in surveillance video. The combined classifier comprises an upper-body detection cascade classifier and a lower-body detection cascade classifier. The upper-body cascade classifier captures the head-and-shoulder region of the human body and the corresponding body-frame structure, which ensures a high human detection rate; when only part of a human body appears in the image, especially when the head is not visible, the lower-body cascade classifier compensates;

7) Taking into account the correlation of information between video frames, extend each key video frame obtained in the preceding step by several frames before and after, then store the frame numbers of all key frames after extension in a single key-frame-number queue;

8) Read the original video frame by frame again and check whether each frame's number is in the key-frame-number queue obtained earlier; if it is, retain the frame. Finally, reassemble the retained key video frames into a new video and save it.