Summary of the invention
Video summarization has significant value and a broad market outlook for the effective use of massive volumes of surveillance video. However, existing video summarization techniques cannot reliably handle complex scenes in which weak, small targets appear or targets occlude and merge with one another. There is therefore a need to enable fast browsing of surveillance video, to reduce the miss rate and false-alarm rate when extracting moving-target events in complex scenes, to reduce the probability of tracking errors and lost targets during trajectory generation, and thereby to remove users' doubts about existing video summarization techniques.
The present invention proposes a reliable video summary generation method for complex scenes, comprising:
Step 1: perform background modeling on the acquired video to obtain the moving foreground of the current frame in the video sequence;
Step 2: using the prediction information computed while tracking the previous frame, select a suitable object detector to classify and detect targets in the moving foreground of the current frame, obtaining the type and detection result of each moving target;
Step 3: based on the type and detection result of each moving target, use a multi-object tracking method to compute the motion trajectory of each moving target in the current frame, compute the corresponding prediction information for the next frame, and return to Step 2 to process the next frame, until all video frames have been processed;
Step 4: generate the video summary from the motion trajectories and types of all moving targets obtained.
In Step 1, the moving foreground obtained for the current video frame is further post-processed, specifically:
Step 11: using a morphological structuring element, apply morphological opening and closing to the foreground targets, obtaining a foreground with smooth contours, eliminating small noise blobs, and shrinking larger ones;
Step 12: compute the area of each foreground target; if the number of pixels in a foreground target is less than the threshold $T_1 = 5$, filter out the target; otherwise, retain it.
The object detectors used in Step 2 are obtained in advance by offline training that combines histograms of oriented gradients (HOG) with a linear support vector machine (SVM), and are used to detect the type, contour, and position of moving targets.
The positive sample set used to train the object detectors contains the following three classes of images that appear in surveillance video: 1) pedestrians and frontal views of bicycles; 2) side views of bicycles; 3) motor vehicles. Training yields three object detectors.
Step 2 specifically comprises:
Step 21: select the object detector according to the prediction information fed back from tracking of the previous frame;
Step 22: perform target detection with the selected object detector;
Step 23: if a moving target is detected, output the detection result for that moving target in the current frame for use in tracking;
Step 24: if no moving target is detected, pass the moving-foreground information obtained from background modeling to the tracking process.
In Step 21, the prediction information fed back from tracking of the previous frame, comprising the type, position, area, aspect ratio, and number of moving targets in the current frame, is used to select the type and scale of the object detector and to locate the detector's search position.
Step 3 specifically comprises the following steps:
Step 31: based on the detection results for moving targets in the current frame, compute the similarity between the color histogram features of moving targets in the current frame and in the previous frame;
Step 32: based on the detection results in the previous frame, use a Kalman filter to predict the position in the current frame of each moving target from the previous frame and, combining this with the detection results of the current frame, compute the Euclidean distance between each target's predicted position and the actual position of the detected target;
Step 33: using the results of Steps 31 and 32, apply the Hungarian algorithm to match the moving targets detected in the current frame with the active trajectories, and update the motion trajectories of the moving targets according to the matching result.
In Step 4, the motion trajectory, class, and duration of appearance of each moving target are presented in a single video snapshot image, and the video summary is formed as a set of such snapshot images.
For surveillance video of complex scenes, the present invention uses video content analysis to detect and track the moving targets in the original video and to extract the moving-target events. For each event it collects the motion trajectory and related information and presents them compactly to the user as images. By viewing the picture recorded for each moving-target event, the user achieves the purpose of watching the original video in far less time. When detecting and tracking moving targets, the method fully accounts for scene complexity; the technical solution ensures reliable results and keeps the miss rate for moving-target events very low, so the technique can be widely applied in practical work by many departments, for example in public security investigations.
Embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
The reliable video summary generation method for complex scenes proposed by the present invention is as follows.
First, background modeling is performed on the image sequence of the original video to obtain moving-foreground blobs, which are then post-processed. Second, a set of offline-trained object detectors detects and classifies targets within the foreground blobs. Third, a multi-object tracker based on the Hungarian algorithm is built. Finally, a motion trajectory is generated for each moving target, and the target and its trajectory are merged into one image, which the present invention calls a "video snapshot". Notably, the method introduces a mutual feedback mechanism between foreground target detection and tracking, so that detection and tracking reinforce each other. This improves the accuracy and speed of both, effectively reducing the miss rate for weak targets and the false-detection rate for densely occluded targets, as well as the probabilities that a target trajectory is mis-tracked, lost, or incomplete. Original videos that can be summarized include, but are not limited to: live video streams captured by video surveillance systems, video files stored by such systems, ordinary multimedia video files, TV programs, and films.
Fig. 1 is a flowchart of the reliable video summary generation method for complex scenes according to the present invention. As shown in Fig. 1, the method is implemented as follows:
Step S101: collect the video data to be summarized.
Step S102: store the collected original video to form an original video database. The original video may be captured in real time by a surveillance camera or may be recorded surveillance footage played back.
Step S103: perform background modeling on the frame sequence of the original video in the database to obtain the moving foreground and the background of each frame, and post-process the foreground.
In embodiments of the present invention, any of a number of background modeling algorithms may be used; they are not enumerated here. The purpose of background modeling is to determine the background and the foreground of the video scene. A scene consists of background and foreground: the background is the region that stays unchanged, or changes only slightly, over a long period of the video, while the foreground is the region that changes significantly. For example, in surveillance video of a crossroads, the cars driving on the road and the pedestrians walking by appear in the scene only briefly and are therefore treated as moving foreground, whereas the road, the traffic lights, and the trees on both sides of the road remain in the scene for a long time and can be regarded as background. Background modeling of the original video thus allows the moving foreground to be extracted.
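Because the method leaves the background model open, the following is only a minimal sketch of the background/foreground split described above. It assumes grayscale frames and a simple exponential running-average background; the class name `RunningAverageBackground` and the parameters `alpha` and `diff_threshold` are hypothetical and are not taken from the method.

```python
import numpy as np

class RunningAverageBackground:
    """Hypothetical running-average background model for grayscale frames."""

    def __init__(self, first_frame, alpha=0.05, diff_threshold=25.0):
        self.bg = first_frame.astype(np.float64)   # initial background estimate
        self.alpha = alpha                          # background learning rate (assumed)
        self.diff_threshold = diff_threshold        # intensity change that counts as foreground (assumed)

    def apply(self, frame):
        """Return a boolean foreground mask and update the background model."""
        frame = frame.astype(np.float64)
        fg = np.abs(frame - self.bg) > self.diff_threshold
        bg_pixels = ~fg
        # Blend new observations into the background only where no motion was
        # detected, so slowly changing regions are absorbed but moving objects are not.
        self.bg[bg_pixels] = ((1.0 - self.alpha) * self.bg[bg_pixels]
                              + self.alpha * frame[bg_pixels])
        return fg
```

Pixels that differ strongly from the learned background form the moving foreground, while regions that stay stable keep refining the background estimate.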
Because real surveillance scenes are complex, however, the foreground can contain noise. For example, when wind moves the trees along the roadside, the shaking leaves change position and are therefore classified as moving foreground.
For this reason, in a preferred embodiment of the present invention the extracted moving foreground is post-processed using morphological operations, specifically:
First, a morphological structuring element is used to apply morphological opening and closing to the foreground targets. This yields a foreground with smooth contours, eliminates small noise blobs, and shrinks larger ones.
Then, the area of each foreground target is computed. If the number of pixels in a foreground target is less than the threshold $T_1 = 5$, the target is regarded as noise and filtered out; otherwise it is retained. This removes noise from the moving foreground and smooths the foreground edges.
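The post-processing above can be sketched in plain NumPy as follows. The 3 × 3 square structuring element and 4-connectivity for measuring blob area are assumptions, since the method does not fix them; only the threshold $T_1 = 5$ comes from the text. All function names are hypothetical.

```python
from collections import deque

import numpy as np

def _shifted_views(mask, k):
    """Yield the k*k shifted copies of a zero-padded boolean mask."""
    r = k // 2
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    h, w = mask.shape
    for dy in range(k):
        for dx in range(k):
            yield padded[dy:dy + h, dx:dx + w]

def erode(mask, k=3):
    out = np.ones_like(mask, dtype=bool)
    for view in _shifted_views(mask, k):
        out &= view          # pixel survives only if its whole k x k window is set
    return out

def dilate(mask, k=3):
    out = np.zeros_like(mask, dtype=bool)
    for view in _shifted_views(mask, k):
        out |= view          # pixel is set if any pixel in its k x k window is set
    return out

def remove_small_blobs(mask, t1=5):
    """Drop 4-connected foreground blobs with fewer than t1 pixels."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    out = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or seen[y, x]:
                continue
            blob, queue = [], deque([(y, x)])
            seen[y, x] = True
            while queue:                     # breadth-first flood fill of one blob
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(blob) >= t1:              # keep blobs whose area reaches T1
                for cy, cx in blob:
                    out[cy, cx] = True
    return out

def postprocess_foreground(mask, k=3, t1=5):
    mask = np.asarray(mask, dtype=bool)
    opened = dilate(erode(mask, k), k)       # opening: removes small specks
    closed = erode(dilate(opened, k), k)     # closing: fills small gaps, smooths contours
    return remove_small_blobs(closed, t1)
```

Opening before closing removes isolated noise pixels first, so that closing does not merge them into nearby objects.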
Step S104: use the object detectors to detect and classify targets in the moving foreground, obtaining the contour, type, position, and related information of each moving target.
Background modeling followed by post-processing yields the contours of the moving targets in the video, but in complex real-world surveillance scenes the contours alone are far from sufficient. Background modeling often produces merged targets, fragmented targets, occluded targets, and overly dense foreground, all of which tend to lower detector accuracy. For example, in surveillance video of a crowded railway station pedestrians frequently occlude one another, so the moving foreground shows the outline of a group; on a busy road vehicles frequently occlude one another, so the foreground shows the outline of several cars. Moreover, the contour alone does not give the class of a moving target (pedestrian, bicycle, or motor vehicle). In addition, in some frames background modeling fragments the foreground: for a pedestrian walking along a road the foreground should be a single pedestrian outline, but background modeling may extract the upper body as one moving object and the lower body as another. For all these reasons, target detection must be performed on the moving foreground.
In a preferred embodiment of the present invention, a set of offline-trained object detectors performs target detection and classification on regions such as the moving foreground. Offline training of the detectors consists of three steps.
First, collect the sample set, which is divided into a positive set and a negative set. For example, the positive set may be divided into three classes formed from the pedestrians, bicycles, and motor vehicles appearing in sequences of surveillance frames: 1) pedestrians and frontal views of bicycles; 2) side views of bicycles; 3) motor vehicles. The surveillance footage consists of over a hundred videos collected by the video surveillance system of the Ke Shan branch office of Quzhou City, Zhejiang Province. The pedestrians and vehicles in the positive set were labeled manually to form a surveillance-video sample dataset, which has been published online for public research. The negative set consists of surveillance images that contain no pedestrians, bicycles, or motor vehicles.
Second, extract sample features. In a preferred embodiment of the present invention, histograms of oriented gradients (HOG) are used for feature extraction. HOG is a feature descriptor for object detection that counts occurrences of gradient orientations in local portions of an image: the image is divided into small connected regions called cells, a histogram of gradient orientations is computed for each cell, and the combination of these histograms forms the descriptor of the sample. HOG features are extracted from both the positive and the negative samples.
Finally, train on the samples to obtain a set of detectors. In a preferred embodiment of the present invention, a linear support vector machine (SVM) is trained on the samples to obtain detectors for target detection. For example, because the positive training set is divided into three classes, three object detectors are obtained. If a moving target contains a frontal or side view of a pedestrian, bicycle, or motor vehicle, the multi-class detector formed by these three detectors can detect the target's position, contour, and related information and provide its class (pedestrian, bicycle, or motor vehicle).
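The feature-extraction and training steps above can be illustrated with a deliberately simplified sketch. It assumes a HOG variant with unsigned orientations, hard binning, and a single global L2 normalization (standard HOG also uses block normalization and vote interpolation), and it trains the linear SVM with a Pegasos-style stochastic subgradient method rather than any particular SVM library. The function names and default parameters are hypothetical.

```python
import numpy as np

def hog_descriptor(img, cell=8, bins=9, eps=1e-6):
    """Simplified HOG: unsigned gradient orientations, hard binning per cell,
    and one global L2 normalization (no block normalization or interpolation)."""
    img = np.asarray(img, dtype=np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]         # centered [-1, 0, 1] derivative
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # orientation in [0, 180)
    idx = np.minimum((ang // (180.0 / bins)).astype(int), bins - 1)
    ncy, ncx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((ncy, ncx, bins))
    for cy in range(ncy):
        for cx in range(ncx):
            rows = slice(cy * cell, (cy + 1) * cell)
            cols = slice(cx * cell, (cx + 1) * cell)
            # magnitude-weighted orientation histogram of this cell
            hist[cy, cx] = np.bincount(idx[rows, cols].ravel(),
                                       weights=mag[rows, cols].ravel(),
                                       minlength=bins)
    vec = hist.ravel()
    return vec / (np.linalg.norm(vec) + eps)

def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized hinge loss.
    X: (n, d) features; y: labels in {-1, +1}. A constant 1 feature acts as the bias."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * (Xb[i] @ w) < 1.0    # check the margin before shrinking w
            w *= 1.0 - eta * lam
            if violates:
                w += eta * y[i] * Xb[i]
    return w

def svm_score(w, x):
    """Decision value of a trained detector; > 0 means the positive class."""
    return float(np.append(x, 1.0) @ w)
```

One such detector would be trained per positive class (three in the method), each against the shared negative set.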
The offline training steps above need to be performed only once, to obtain the detectors. Once the detectors are available, generating a video summary for any surveillance video only requires calling the trained detectors.
During summary generation, the detectors perform target detection on the moving foreground of the original video, and every foreground region in every frame must be examined. For each foreground region, one detector must be selected from the trained set.
To meet the demands of moving-target detection in complex scenes and to improve detector performance and speed, the present invention introduces prediction information fed back from the tracking process into target detection. This feedback predicts the type, area, aspect ratio, position, and number of moving targets in the current frame. From this prediction the most suitable detector can be selected, and parameters such as the detector scale and search position can be determined. Specifically, the predicted type helps select the right detector type and avoids running all three detectors on the same foreground region, which would multiply the detection time; the predicted area and aspect ratio help select the detector scale accurately; and the predicted position helps the detector localize its search range effectively. Through this feedback from tracking to detection, the present invention substantially improves detector performance and greatly reduces detection time.
A new target in the moving foreground has only just appeared, so the tracking process cannot yet provide a prediction for it. A detector class is therefore chosen according to features of the target such as its aspect ratio and area, and applied to the new target. The detection score is compared with a threshold: if it is above the threshold, the result of that detector is adopted; otherwise another detector class is tried. The resulting type, area, position, and other information of the moving target are then output. Each detector has its own threshold; the present invention sets it to the minimum detection score over the positive training set obtained during offline training of that detector.
With this mechanism, the object detectors can detect the moving foreground in every frame fairly accurately even in complex scenes.
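The threshold rule for new targets can be sketched as a small cascade. Only the threshold definition (minimum score on the detector's positive training samples) and the "try another class if below threshold" logic come from the text. The detector names, the callable interface, and the aspect-ratio ordering rule with its 1.5 cut-off are hypothetical; area is not used in this sketch.

```python
def detector_threshold(positive_scores):
    """Per the method: the minimum score a detector gave its positive training samples."""
    return min(positive_scores)

def candidate_order(width, height):
    """Hypothetical ordering rule: tall, narrow blobs try the pedestrian detector first;
    wider blobs try the vehicle and side-bicycle detectors first (1.5 is assumed)."""
    if height / width >= 1.5:
        return ["pedestrian_front_bicycle", "side_bicycle", "motor_vehicle"]
    return ["motor_vehicle", "side_bicycle", "pedestrian_front_bicycle"]

def classify_new_target(region, width, height, detectors, thresholds):
    """Try detectors in order and accept the first whose score reaches its threshold.
    detectors: name -> callable(region) -> float (hypothetical interface)."""
    for name in candidate_order(width, height):
        score = detectors[name](region)
        if score >= thresholds[name]:
            return name, score
    return None, None   # no detector accepted the blob
```

Stopping at the first accepted detector avoids running all three detectors on every new blob, in line with the speed argument above.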
Fig. 2 is a flowchart of the moving-target detection method of the present invention. As shown in Fig. 2, the detection process comprises:
Step S1021: obtain the target prediction information fed back by the motion tracking of the previous frame, including the type, position, area, and number of moving targets.
Step S1022: determine the type, scale, and search position of the detector from the prediction. Scale selection applies mainly to pedestrian targets: according to the predicted pedestrian height H, pedestrian targets are classified as very small (H < 30 pixels), small (30 < H < 50 pixels), normal size (50 < H < 90 pixels), or large (H > 90 pixels), and each size class has a corresponding pedestrian detector covering 7 search-window scales. To ensure that the searched region fully contains the target to be detected, the original predicted region is enlarged with an expansion coefficient of 1.4.
Step S1023: perform target detection with the selected detector.
Step S1024: if a target is detected in Step S1023, output the detection result, such as the target's position and area.
Step S1025: if no target is detected in Step S1023, output the moving-foreground region corresponding to the predicted area and position directly as the detection result.
In particular, during the first 3 frames after a moving target appears, the motion tracking of the previous frame cannot provide accurate prediction information. In this case, the detector scale is selected directly from the height of the moving foreground, the 3 classes of detectors are applied to the foreground target one after another, and the result of the highest-scoring detector is chosen. This determines the target type, area, aspect ratio, and position, which are then output.
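The scale and search-region rules of Step S1022, and the highest-score choice for the first 3 frames, can be sketched as below. The height classes and the 1.4 coefficient come from the text; treating 1.4 as a factor on each side of the box (rather than on its area), the handling of heights exactly at 30, 50, or 90 pixels, and the optional clipping to the image are assumptions. The 7 search-window scales per class are not specified and are not modeled. All names are hypothetical.

```python
def pedestrian_size_class(height_px):
    """Size class from the predicted pedestrian height H (pixels).
    Behavior exactly at 30, 50 and 90 pixels is an assumption."""
    if height_px < 30:
        return "very_small"
    if height_px < 50:
        return "small"
    if height_px < 90:
        return "normal"
    return "large"

def expand_region(x, y, w, h, factor=1.4, img_w=None, img_h=None):
    """Enlarge a predicted box (x, y, w, h) about its center; optionally clip to the image."""
    cx, cy = x + w / 2.0, y + h / 2.0
    nw, nh = w * factor, h * factor
    x0, y0 = cx - nw / 2.0, cy - nh / 2.0
    x1, y1 = x0 + nw, y0 + nh
    if img_w is not None:
        x0, x1 = max(0.0, x0), min(float(img_w), x1)
    if img_h is not None:
        y0, y1 = max(0.0, y0), min(float(img_h), y1)
    return x0, y0, x1 - x0, y1 - y0

def pick_best(scores):
    """First 3 frames of a new target: keep the detector with the highest score.
    scores: dict detector name -> score."""
    return max(scores.items(), key=lambda kv: kv[1])
```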
Step S105: use the multi-object tracking method to obtain the trajectories of moving targets. Three kinds of trajectory are distinguished: an active trajectory is one currently being tracked and shown in the real-time results; a historical trajectory is one not currently being tracked but which may be converted back into an active trajectory; a dead trajectory is one that has ended completely and is no longer tracked.
The method obtains the motion trajectories of moving targets by multi-target tracking based on the Hungarian algorithm, which solves the optimal correspondence among multiple moving targets. The similarity between moving targets is described using their color and position information. Color is quantified with a color histogram, a statistic of the color distribution in an image that gives the proportion of each color; it is simple to compute and is invariant to scale, translation, and rotation. Position is computed with a Kalman filter, which is the optimal linear estimator under the minimum mean-square-error criterion: it minimizes the estimation error variance while keeping the estimate unbiased, and thereby improves target tracking.
As shown in Fig. 3, obtaining the motion trajectories of moving targets by Hungarian-algorithm-based multi-target tracking can be divided into the following steps:
Step S1051: compute 8 × 8 × 8 color histogram features for all moving targets detected in Step S104, and then compute the similarity between the color histogram features of moving targets in the current frame and those in the previous frame. Preferably, the present invention computes the color histogram of each moving target in the RGB color space: each of the three color components R, G, and B is quantized into 8 levels, dividing the color space into 8 × 8 × 8 subspaces, each corresponding to one histogram bin, and the number of pixels falling into the subspace of each bin is counted to form the color histogram. The similarity is then computed between the color histogram features of the moving targets corresponding to the previous frame's active trajectories and those of the current frame's moving targets. Preferably, the present invention uses the Hellinger distance to measure the similarity of two histogram distributions:
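$$d_H(h_1, h_2) = \sqrt{1 - \sum_{q=1}^{N} \sqrt{h_1(q)\, h_2(q)}}$$
where $h_1(q)$ and $h_2(q)$ are the two color histogram vectors, each normalized to sum to 1, and $N = 8 \times 8 \times 8 = 512$ is the number of bins.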
The more similar the color histograms of two targets, that is, the smaller the Hellinger distance between their histogram vectors, the more likely the two targets match; this matching probability follows a Gaussian distribution. For example, in a surveillance picture of a highway there is a white car W on the left and a black car B on the right, and the method must track these two moving targets to obtain their trajectories. Suppose the color histograms computed for W and B detected in the previous frame are $h_1$ and $h_2$, and those computed for W and B in the current frame are $h_3$ and $h_4$. Computing the Hellinger distances between $h_1$ and $h_3$, $h_1$ and $h_4$, $h_2$ and $h_3$, and $h_2$ and $h_4$ shows that the distances for $h_1$ and $h_3$ and for $h_2$ and $h_4$ are much smaller than those for $h_1$ and $h_4$ and for $h_2$ and $h_3$. It follows that $h_1$ and $h_3$ are the histograms of W in the two consecutive frames, and $h_2$ and $h_4$ are those of B. This information helps match the targets appearing in consecutive frames.
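The 8 × 8 × 8 RGB histogram and the Hellinger distance of Step S1051 can be sketched as follows, assuming a target is given as an array of RGB pixels. The function names are hypothetical; the distance is the normalized-histogram form given above.

```python
import numpy as np

def rgb_histogram(pixels, levels=8):
    """levels x levels x levels RGB histogram of a target's pixels, normalized to sum to 1.
    pixels: array of shape (..., 3) holding uint8 R, G, B values."""
    p = np.asarray(pixels, dtype=np.int64).reshape(-1, 3)
    q = p * levels // 256                                   # quantize 0..255 -> 0..levels-1
    bins = (q[:, 0] * levels + q[:, 1]) * levels + q[:, 2]  # one bin per RGB subspace
    hist = np.bincount(bins, minlength=levels ** 3).astype(np.float64)
    return hist / hist.sum()

def hellinger(h1, h2):
    """Hellinger distance between two normalized histograms (0 = identical, 1 = disjoint)."""
    bc = float(np.sum(np.sqrt(h1 * h2)))                    # Bhattacharyya coefficient
    return float(np.sqrt(max(0.0, 1.0 - bc)))
```

In the white/black car example, the histograms of the same car in consecutive frames give a distance near 0, while cross pairs give a distance near 1.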
Step S1052: predict the positions of moving targets with a Kalman filter, based on the active trajectories of moving targets in the previous frame. From each active trajectory in frame t-1, the Kalman filter predicts where the corresponding moving target will appear in frame t. Step S104 provides the detection results for frame t, that is, the determined positions of the moving targets in frame t. In this step, the Euclidean distance is computed between each target's predicted position in frame t and each detection result of the detection module in frame t. The smaller the Euclidean distance, the closer the predicted position is to the actual position and the more likely the two targets match; this matching probability follows a Gaussian distribution. For example, for the left vehicle W and the right vehicle B in the surveillance picture described above, suppose that in frame t-1 the Kalman filter predicts the positions of the detected objects W and B, giving the predicted positions $l_1'$ and $l_2'$ in frame t, and that after W and B are detected in frame t in Step S104 their actual positions are $l_1$ and $l_2$. Because a vehicle's position cannot change drastically between two consecutive frames, the Euclidean distances between $l_1'$ and $l_1$ and between $l_2'$ and $l_2$ are far smaller than those between $l_1'$ and $l_2$ and between $l_2'$ and $l_1$. This information helps match the targets appearing in consecutive frames.
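The position prediction of Step S1052 can be sketched with a constant-velocity Kalman filter. The method does not state its motion model or noise settings, so the state layout [x, y, vx, vy], the time step of one frame, and the values of `q`, `r`, and `p0` are assumptions; the class and function names are hypothetical.

```python
import numpy as np

class ConstantVelocityKalman:
    """Kalman filter with state [x, y, vx, vy] and position measurements [x, y]."""

    def __init__(self, x, y, q=1.0, r=4.0, p0=100.0):
        self.s = np.array([x, y, 0.0, 0.0])        # start at rest at the first detection
        self.P = np.eye(4) * p0                    # large initial uncertainty
        self.F = np.array([[1.0, 0.0, 1.0, 0.0],   # x += vx (one frame per step)
                           [0.0, 1.0, 0.0, 1.0],   # y += vy
                           [0.0, 0.0, 1.0, 0.0],
                           [0.0, 0.0, 0.0, 1.0]])
        self.H = np.array([[1.0, 0.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0, 0.0]])
        self.Q = np.eye(4) * q                     # process noise (assumed)
        self.R = np.eye(2) * r                     # measurement noise (assumed)

    def predict(self):
        """Propagate the state one frame ahead and return the predicted position."""
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2].copy()

    def update(self, zx, zy):
        """Correct the state with the detected position of the matched target."""
        innovation = np.array([zx, zy]) - self.H @ self.s
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.s = self.s + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P

def euclidean(p, q):
    """Distance between a predicted position and a detected position."""
    return float(np.linalg.norm(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)))
```

Each active trajectory would own one filter: `predict()` gives the frame-t position compared against detections, and `update()` is called once the trajectory is matched.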
Step S1053: perform multi-target matching with the Hungarian algorithm, a classic algorithm for the maximum matching problem on bipartite graphs, using both color and position information. For example, suppose there are m active trajectories in frame t-1 and Step S104 detects n moving targets in frame t. The Hellinger-based similarities between the color histogram features of the active trajectories in frame t-1 and the moving targets in frame t form an m × n matrix $M_1$, and the Euclidean distances between the predicted positions in frame t of the active trajectories of frame t-1 and the determined positions of the moving targets in frame t form an m × n matrix $M_2$. Multiplying corresponding elements of $M_1$ and $M_2$ gives the m × n matrix M, which is used as the input to the Hungarian algorithm. The Hungarian algorithm returns the matching between the m active trajectories in frame t-1 and the n moving targets in frame t. If the similarity of a matched pair is less than the threshold $T_2 = 0.5$, the pair is regarded as unmatched; otherwise the match succeeds.
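A sketch of Step S1053 is given below, with a textbook Hungarian (Kuhn-Munkres with potentials) implementation. How the entries of $M_1$ and $M_2$ are scaled is not stated; this sketch assumes that each distance is first turned into a Gaussian matching likelihood (consistent with the remark that the matching probabilities follow a Gaussian distribution), so that the element-wise product is a joint similarity to maximize and $T_2 = 0.5$ can be applied to it. The `sigma` values and all function names are hypothetical.

```python
import math

def hungarian_min_cost(cost):
    """Hungarian algorithm (Kuhn-Munkres with potentials) for an n x m cost
    matrix with n <= m. Returns assign[i] = column chosen for row i."""
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)       # row / column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)         # p[j]: row assigned to column j (1-based)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (m + 1), [False] * (m + 1)
        while True:                               # grow the alternating tree until a free column is found
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                                 # augment along the stored path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assign[p[j] - 1] = j - 1
    return assign

def gaussian_similarity(distance, sigma):
    """Map a distance to a (0, 1] matching likelihood; sigma is an assumed scale."""
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))

def match_tracks(sim, t2=0.5):
    """sim[i][j]: joint similarity of active track i and detection j (M1 * M2).
    Returns (track, detection) pairs from the optimal assignment with sim >= t2."""
    m = len(sim)
    n = len(sim[0]) if m else 0
    if m == 0 or n == 0:
        return []
    if m <= n:
        assign = hungarian_min_cost([[-s for s in row] for row in sim])  # maximize similarity
        pairs = list(enumerate(assign))
    else:
        cols = [[-sim[i][j] for i in range(m)] for j in range(n)]        # transpose so rows <= cols
        pairs = [(i, j) for j, i in enumerate(hungarian_min_cost(cols))]
    return [(i, j) for i, j in pairs if sim[i][j] >= t2]
```

In the first test case below, a greedy choice would give track 0 its best detection and force track 1 onto a poor one, whereas the optimal assignment maximizes the total similarity.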
Step S1054: according to the matching result of the previous step, generate the motion trajectory of each moving target in the current frame, and at the same time predict the target's position and other information in the next frame.
If active trajectory $m_i$ of frame t-1 is successfully matched with moving target $n_j$ of frame t, the trajectory of target $n_j$ over the first t-1 frames is taken to be $m_i$, and $m_i$ is updated with the detection result for $n_j$ obtained in frame t in Step S104. The tracking of target $n_j$ in frame t is then complete.
If a moving target of frame t does not match any active trajectory of frame t-1, the target has no trajectory yet and is a new target. If an active trajectory of frame t-1 does not match any moving target of frame t, the target has disappeared; the active trajectory is then matched against the historical trajectories. If a match is found, the active trajectory and the historical trajectory are merged into a new active trajectory; otherwise, the active trajectory becomes a historical trajectory.
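The trajectory bookkeeping of Step S1054 can be sketched as a small state machine. The active/historical/dead states, new-target creation, active-to-historical merging, and termination after N = 50 unmatched frames (stated under Step S106) follow the text. The `can_merge` test for re-associating an active trajectory with a historical one is not specified, so it is passed in as a hypothetical callback; the extra "merged" state and all names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    tid: int
    points: list = field(default_factory=list)  # (frame, x, y) samples
    state: str = "active"                        # "active", "history", "merged" or "dead"
    missed: int = 0                              # frames spent as a historical trajectory

def step_tracks(tracks, matches, detections, frame, can_merge, n_end=50):
    """Advance all trajectories by one frame.

    matches: (track_index, detection_index) pairs accepted by the matcher.
    detections: list of (x, y) detected positions in this frame.
    can_merge(active, historical) -> bool is a hypothetical re-association test.
    """
    matched_t = {ti for ti, _ in matches}
    matched_d = {di for _, di in matches}
    for ti, di in matches:                       # matched: extend the active trajectory
        tracks[ti].points.append((frame, *detections[di]))
    unmatched_active = [tr for ti, tr in enumerate(tracks)
                        if tr.state == "active" and ti not in matched_t]
    for tr in unmatched_active:                  # target disappeared: try historical trajectories
        partner = next((h for h in tracks
                        if h.state == "history" and can_merge(tr, h)), None)
        if partner is not None:                  # merge into a new active trajectory
            partner.points = sorted(partner.points + tr.points)
            partner.state, partner.missed = "active", 0
            tr.state = "merged"
        else:
            tr.state = "history"
    for tr in tracks:                            # age historical trajectories
        if tr.state == "history":
            tr.missed += 1
            if tr.missed >= n_end:               # unmatched for N frames: terminate
                tr.state = "dead"
    next_id = max((t.tid for t in tracks), default=-1) + 1
    for di, (x, y) in enumerate(detections):    # unmatched detection: new target
        if di not in matched_d:
            tracks.append(Track(next_id, [(frame, x, y)]))
            next_id += 1
    return tracks
```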
After updating the active trajectory of target $n_j$ in frame t, the present invention uses the Kalman filter to predict the position of $n_j$ in frame t+1 and saves the type, position, area, aspect ratio, and other information of $n_j$ for use in target detection in frame t+1.
Step S106: generate the video summary.
The steps above yield the motion trajectory, type, and other information of each moving target. In this method, if a historical trajectory still cannot be matched with any moving foreground after N frames of matching, the historical trajectory is considered terminated; N = 50 in this algorithm. After a historical trajectory terminates, it is merged with its moving target to generate an image, called a "video snapshot", in which the class of the moving target and the duration of its appearance are marked. For example, if a fleeing suspect vehicle appears in road surveillance video, its trajectory from entering the picture to leaving it can be marked, with an appearance time from 28.92 s to 41.54 s. The start point of the trajectory is marked in a lighter color and the end point in a darker color, with the trajectory points in between changing gradually from light to dark. The image of the vehicle at the midpoint of the trajectory is then merged with the trajectory, and the vehicle's class "motor vehicle" and appearance time "28.92s-41.54s" are labeled next to the target. After the whole video has been processed, a series of video snapshots is generated; together they form the video summary, which the user can browse to understand the video content quickly.
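The snapshot annotation rules can be sketched as follows: a light-to-dark color ramp along the trajectory, the midpoint at which the target image is pasted, and the class and time label. The specific RGB endpoints, the linear interpolation, and the function names are assumptions; drawing and image compositing are omitted.

```python
def trajectory_colors(n_points, light=(255, 200, 200), dark=(120, 0, 0)):
    """Linearly interpolate RGB colors from light (start point) to dark (end point)."""
    if n_points == 1:
        return [dark]
    colors = []
    for i in range(n_points):
        t = i / (n_points - 1)                   # 0 at the start point, 1 at the end point
        colors.append(tuple(round(a + (b - a) * t) for a, b in zip(light, dark)))
    return colors

def midpoint_index(points):
    """Index of the trajectory point where the target image is pasted."""
    return len(points) // 2

def snapshot_label(category, start_s, end_s):
    """Text drawn next to the target, e.g. 'motor vehicle 28.92s-41.54s'."""
    return f"{category} {start_s:.2f}s-{end_s:.2f}s"
```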
The present invention introduces a mutual feedback mechanism between the foreground target detection method and the target tracking method, so that detection and tracking reinforce each other and the performance of both is greatly improved. On the one hand, the detection process uses prediction information fed back from tracking: by making full use of the predicted type, position, area, aspect ratio, and number of foreground targets to select the detector type and scale, the method significantly improves detector performance, effectively reduces the miss rate for weak targets and the false-detection rate for densely occluded targets, and shortens detection time. On the other hand, the tracking process uses the results of target detection, which reduces the probabilities that a target trajectory is mis-tracked, lost, or incomplete. Compared with conventional video summarization methods, the present invention can extract the trajectories of foreground moving targets in complex scenes (the events of interest to users) accurately, quickly, and completely, and can therefore generate reliable video summaries in complex scenes.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention. It should be understood that they are merely specific embodiments of the invention and are not intended to limit it; any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the invention shall fall within its scope of protection.