CN103413330A

CN103413330A - Method for reliably generating video abstraction in complex scene

Info

Publication number: CN103413330A
Application number: CN2013103895057A
Authority: CN
Inventors: 郝红卫; 袁飞; 唐矗; 田澍; 冯媛
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2013-08-30
Filing date: 2013-08-30
Publication date: 2013-11-27

Abstract

The invention discloses a method for generating video summaries in complex scenes, which includes: performing background modeling on the acquired video to obtain the motion foreground of the current frame in the video sequence; using prediction information to select a suitable target detector and applying it to the motion foreground of the current frame to obtain the type and detection result of each moving object; according to the type and detection result of the obtained moving objects, using a multi-target tracking method to compute the motion trajectory of each moving object in the current frame, and computing the prediction information for the next frame accordingly; then detecting the next frame, and so on until the end of the video sequence; and generating a video summary from the obtained information, such as the trajectories and types of all moving objects. The invention introduces a mutual feedback mechanism between foreground target detection and tracking, which effectively reduces the missed-detection rate of small, weak targets and the false-detection rate of densely occluded targets, as well as the probability that a moving-target trajectory follows the wrong target, loses its target, or is incomplete, so that the invention can be applied to a variety of complex scenes.

Description

A reliable method for generating video summaries in complex scenes

Technical field

The present invention relates to the technical field of image processing, and in particular to a reliable method for generating video summaries in complex scenes.

Background technology

In modern society, video surveillance systems play an increasingly important role in all walks of life. Departments such as public security, fire protection, traffic, and industrial production have a particularly urgent demand for video surveillance of public places, and video surveillance systems have become an important part of maintaining public order, strengthening social management, and ensuring production safety. The number of surveillance cameras is growing rapidly, and a massive amount of surveillance video is produced every day. Finding clues in this video therefore consumes large amounts of manpower, material resources, and time, which lowers the effective utilization of surveillance video. According to statistics from ReportLinker, in 2011 there were more than 165 million surveillance cameras worldwide, producing 1.4 trillion hours of surveillance data. If 20% of that data is important and must be watched manually, a workforce of more than 100 million people would be needed (each working 8 hours a day, 300 days a year). Therefore, enabling users of a video surveillance system to browse an entire video quickly and lock onto the object of a search rapidly is of great significance for improving the utilization of massive surveillance video.

In the field of image processing, video summarization can be used to improve the efficiency of video browsing: the content of interest to the user is extracted from the video and rearranged compactly to produce a short video. To extract the content of interest automatically, the simplest method is to extract key frames from the original video and assemble them into a video summary (reference: Kim, C., Hwang, J.N.: An integrated scheme for object-based video abstraction. In: Proceedings of the Eighth ACM International Conference on Multimedia (2000) 303-311). However, key frames cannot describe a whole video completely, so important information in the video may be lost, and because video content is so varied, choosing suitable key frames is itself a difficult problem. Another method first analyzes the video content, extracts information about the moving targets in the original video, and then arranges the extracted motion information compactly to generate the video summary (reference: Pritch, Y., Rav-Acha, A., Peleg, S.: Nonchronological video synopsis and indexing. IEEE Trans. Pattern Anal. Mach. Intell. 30 (2008) 1971-1984). This method preserves the dynamic content of the video better. For this method, the key problem is how to extract accurately all the events that interest the user.

The scenes captured by surveillance video are very complex. Some scenes contain many fast-moving vehicles, such as highways. Some contain many pedestrians who frequently occlude one another, such as railway stations or crossroads. In others, the moving targets occupy only a very small pixel area in the picture. This complexity makes the accurate detection of moving targets very challenging. Current video summarization techniques cannot solve the problem of detecting moving targets in complex scenes well. They usually miss a large proportion of moving targets and cannot accurately extract the key events in the video, so the generated video summary omits important information from the original video.

Summary of the invention

Video summarization is of great significance for the effective use of massive surveillance video and has broad market prospects, but existing video summarization techniques cannot effectively handle situations that arise in complex scenes, such as small, weak targets and targets that occlude or merge with one another. The purpose of the present invention is to enable fast browsing of surveillance video, to reduce the miss rate and false-detection rate when extracting moving-target events in complex scenes, to reduce the probability of tracking errors and lost targets while trajectories are generated, and to dispel users' misgivings about existing video summarization techniques.

The present invention proposes a reliable method for generating video summaries in complex scenes, comprising:

Step 1: perform background modeling on the acquired video to obtain the motion foreground of the current frame in the video sequence;

Step 2: using the prediction information computed during tracking of the previous frame, select a suitable target detector to classify and detect the motion foreground of the current frame, obtaining the type and detection result of each moving target;

Step 3: according to the type and detection result of the obtained moving targets, use a multi-target tracking method to compute the motion trajectory of each moving target in the current frame, compute the prediction information for the next frame accordingly, and return to step 2 to detect the next frame, until all video frames have been processed;

Step 4: generate the video summary according to the motion trajectories and types of all the moving targets obtained.

In step 1, the motion foreground obtained for the current video frame is also post-processed, which specifically comprises:

Step 11: using morphological structuring elements, apply a morphological opening and a morphological closing to the foreground targets, which yields a foreground with smooth contours, eliminates noise spots of small area, and shrinks noise spots of larger area;

Step 12: compute the area of each foreground target; if the number of pixels in the foreground target is less than the threshold T₁ = 5, filter out the foreground target; otherwise, retain it.

The target detectors in step 2 are obtained in advance by off-line training with a combination of histograms of oriented gradients and a linear support vector machine, and are used to detect the type, contour, and position of moving targets.

The positive sample set used to train the target detectors comprises the following three classes of images that occur in surveillance video: 1) pedestrians and front-view bicycles; 2) side-view bicycles; 3) motor vehicles. Training yields three target detectors.

Step 2 specifically comprises:

Step 21: determine the target detector according to the prediction information fed back by the tracking process of the previous frame;

Step 22: perform target detection with the determined target detector;

Step 23: if a moving target is detected, output the detection result for the moving target in the current frame, for use by the tracking process;

Step 24: if no moving target is detected, output the motion-foreground information obtained from background modeling to the tracking process.

In step 21, the prediction information fed back by the tracking process of the previous frame comprises the type, position, area, aspect ratio, and number of moving targets in the current frame. It is used to select the type and scale of the target detector and to locate the region in which the detector is applied.

Step 3 specifically comprises the following steps:

Step 31: according to the detection result for the moving targets of the current frame, compute the similarity between the color histogram features of the moving targets in the current frame and those in the previous frame;

Step 32: according to the moving-target detection result of the previous frame, use a Kalman filter to predict where each moving target of the previous frame will be in the current frame; then, using the detection result for the current frame, compute the Euclidean distance between the predicted position of each moving target and the actual position given by the detection result;

Step 33: according to the results of step 31 and step 32, use the Hungarian algorithm to match the moving targets detected in the current frame with the active trajectories, obtain the matching result, and update the motion trajectories of the moving targets accordingly.

In step 4, the motion trajectory, class, and duration of appearance of each moving target are presented in a single video snapshot image, and the video summary is formed from such video snapshot images.

The present invention addresses surveillance video in complex scenes. Using a novel video content analysis technique, it detects and tracks the moving targets in the original video and extracts the moving-target events. For each moving-target event it then collects the motion trajectory and related information and presents them compactly to the user in the form of an image. By viewing the pictures that record each moving-target event, the user achieves the purpose of watching the original video, which greatly shortens the time spent watching. When detecting and tracking the moving targets in the video, the method takes the complexity of the scene fully into account. The technical scheme adopted ensures the reliability of the results and keeps the miss rate of moving-target events at a very low level, so that the invention can be widely used in the practical work of many departments, such as public security investigation.

Brief description of the drawings

Fig. 1 is a flow chart of the reliable video summary generation method in complex scenes according to the present invention.

Fig. 2 is a flow chart of the moving-target detection method of the present invention.

Fig. 3 is a flow chart of the target tracking method of the present invention.

Embodiment

To make the purpose, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.

The present invention proposes a reliable method for generating video summaries in complex scenes. The method is as follows:

First, background modeling is performed on the image sequence of the original video to obtain motion-foreground blocks, which are then post-processed. Second, a series of target detectors trained off-line is used to detect and classify targets in the motion-foreground blocks. Third, a multi-target tracking method based on the Hungarian algorithm is built. Finally, a motion trajectory is generated for each moving target, and the moving target and its trajectory are fused into one image; the present invention calls the fused image a "video snapshot". It is worth pointing out that the method introduces a mutual feedback mechanism between foreground target detection and tracking. The feedback mechanism lets target detection and target tracking reinforce each other, improving the accuracy and speed of both, and effectively reduces the miss rate of small, weak targets, the false-detection rate of densely occluded targets, and the probability that a trajectory follows the wrong target, loses its target, or is incomplete. The original video that can be summarized includes, but is not limited to: live video streams collected by a video surveillance system, video files stored by a video surveillance system, conventional multimedia video files, TV programs, and films.

Fig. 1 shows the flow chart of the reliable video summary generation method in complex scenes according to the present invention. As shown in Fig. 1, the method is implemented in the following steps:

Step S101: collect the video data for which a summary is to be generated;

Step S102: store the collected original video to form an original video database. The original video may be captured in real time by a surveillance camera, or it may be a recorded surveillance video that is played back.

Step S103: perform background modeling on the frames of the original video in the original video database to obtain the motion foreground and the background of each video frame, and post-process the foreground.

In embodiments of the present invention, background modeling may use any of several relevant algorithms, which are not enumerated here. The purpose of background modeling is to determine the background and the foreground of the video scene. A scene consists of background and foreground: the background is the region that remains unchanged or changes only slightly over a long period of the video, and the foreground is, correspondingly, the region that changes significantly. For example, in surveillance video of a crossroads, the cars driving on the road and the pedestrians walking by are present in the scene only briefly, so they are treated as motion foreground, whereas the road, the traffic lights, and the trees on both sides of the road are present for a long time and can be regarded as background. By performing background modeling on the original video, the motion foreground can be extracted.
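The source does not fix a particular background modeling algorithm. As one possible illustration only, the sketch below uses a running-average background with a per-pixel difference threshold. The function names, the blending factor alpha, and the threshold value are assumptions made for this example and do not come from the source.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the running-average background model."""
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


def foreground_mask(background, frame, diff_threshold=25.0):
    """Mark pixels whose intensity differs strongly from the background."""
    return [
        [1 if abs(f - b) > diff_threshold else 0 for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


# A static 3x3 grayscale scene with one bright object entering at the centre.
background = [[10.0] * 3 for _ in range(3)]
frame = [[10.0, 10.0, 10.0], [10.0, 200.0, 10.0], [10.0, 10.0, 10.0]]
mask = foreground_mask(background, frame)           # foreground of this frame
background = update_background(background, frame)   # slowly absorb the change
```

A small alpha makes the model adapt slowly, so briefly present objects stay in the foreground while lasting changes are eventually absorbed into the background.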

However, because real surveillance scenes are complex, noise spots can get mixed into the foreground targets. For example, when the wind moves the trees along a road, the leaves shake, and because their position changes, the shaking leaves are classified as motion foreground.

For this reason, in the preferred embodiment of the present invention, the motion foreground obtained is post-processed using morphological operations, specifically:

First, using morphological structuring elements, a morphological opening and a morphological closing are applied to the foreground targets. This yields a foreground with smooth contours, eliminates noise spots of small area, and shrinks noise spots of larger area.

Then, the area of each foreground target is computed. If the number of pixels in the foreground target is less than the threshold T₁ = 5, the foreground target is considered noise and is filtered out; otherwise, it is retained. This removes the noise in the motion foreground and smooths the edges of the foreground.
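The area-filtering step can be sketched as follows. This is a minimal illustration, assuming a binary mask stored as nested lists and 4-connected regions; the source specifies neither the connectivity nor the data layout, and the function name is hypothetical. The morphological opening and closing are not shown.

```python
from collections import deque


def filter_small_regions(mask, min_area=5):
    """Remove 4-connected foreground regions with fewer than min_area pixels."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    cleaned = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first search collects one connected region.
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) >= min_area:  # keep regions with at least T1 pixels
                for ry, rx in region:
                    cleaned[ry][rx] = 1
    return cleaned


mask = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
cleaned = filter_small_regions(mask)  # the two isolated pixels are dropped
```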

Step S104: use the target detectors to detect and classify targets in the motion foreground, obtaining information such as the contour, type, and position of each moving target.

After background modeling has extracted the motion foreground and the foreground has been post-processed, the contours of the moving targets in the video are available. Because real surveillance scenes are complex, however, the contour of a moving target alone is far from sufficient. Background modeling frequently suffers from merged targets, fragmented targets, occluded targets, and overly dense foreground targets, and these phenomena often reduce detection accuracy. For example, in surveillance video of a busy railway station, pedestrians often occlude one another, and the motion foreground then shows the contour of a whole group. In surveillance video of a road with heavy traffic, vehicles often occlude one another, and the motion foreground then shows the joint contour of several cars. Moreover, the contour of a moving target alone does not give its class (pedestrian/bicycle/motor vehicle). In addition, in some frames background modeling fragments the motion foreground: for example, when a pedestrian walks along a road, the motion foreground should be the contour of the pedestrian, but the extracted foreground may treat the upper body as one moving object and the lower body as another. For all these reasons, target detection must be performed on the motion foreground.

In the preferred embodiment of the present invention, a series of target detectors trained off-line is used to detect and classify targets in regions such as the motion foreground. The off-line training of the target detectors has three steps:

First, collect the sample set. The sample set is divided into a positive sample set and a negative sample set. For example, the positive sample set can be divided into three classes: 1) pedestrians and front-view bicycles; 2) side-view bicycles; 3) motor vehicles. It is composed of the pedestrians, bicycles, and motor vehicles that appear in a series of surveillance video frames. The surveillance video comes from hundreds of videos collected by the video surveillance system of the Ke Shan branch office in Quzhou City, Zhejiang Province. The pedestrians and vehicles in the positive sample set are annotated manually to form a surveillance video sample data set, which is published online and available for public research. The negative sample set consists of surveillance images that contain no pedestrians, bicycles, or motor vehicles.

Second, extract the sample features. In the preferred embodiment of the present invention, the histogram of oriented gradients (HOG) is used for feature extraction. HOG is a feature descriptor for target detection that counts the occurrences of gradient orientations in local regions of an image. The HOG technique divides the whole image into small connected regions called cells and generates a histogram of oriented gradients for each cell; the combination of these histograms serves as the descriptor of the sample. HOG features must be extracted from both the positive and the negative samples in the sample set.
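The cell-histogram idea can be illustrated with a simplified sketch. It assumes a grayscale image stored as nested lists, 8 x 8 pixel cells, and 9 unsigned orientation bins, none of which is stated in the source. It also omits the vote interpolation and block normalization used by common HOG implementations, so it is not a substitute for a full HOG descriptor.

```python
import math


def cell_histograms(image, cell_size=8, bins=9):
    """Return one unsigned-orientation gradient histogram per cell."""
    height, width = len(image), len(image[0])
    rows, cols = height // cell_size, width // cell_size
    hists = [[[0.0] * bins for _ in range(cols)] for _ in range(rows)]
    bin_width = 180.0 / bins
    for y in range(rows * cell_size):
        for x in range(cols * cell_size):
            # Central differences, clamped at the image border.
            gx = image[y][min(x + 1, width - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, height - 1)][x] - image[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = int(angle // bin_width) % bins
            # Each pixel votes for its orientation bin, weighted by magnitude.
            hists[y // cell_size][x // cell_size][index] += magnitude
    return hists


def descriptor(image, cell_size=8, bins=9):
    """Concatenate all cell histograms into one feature vector."""
    return [v for row in cell_histograms(image, cell_size, bins) for h in row for v in h]


# A 16x16 horizontal intensity ramp: every gradient points along the x axis.
ramp = [[float(x) for x in range(16)] for _ in range(16)]
features = descriptor(ramp)  # 2 x 2 cells, 9 bins each
```

For the ramp image all gradient energy falls into the first orientation bin of every cell, which is the behaviour a linear SVM trained on such vectors would rely on to separate shapes with different edge directions.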

Finally, train on the samples to obtain a series of detectors. In the preferred embodiment of the present invention, a linear support vector machine (SVM) is trained on the training samples to obtain a series of detectors for target detection. For example, because the positive training set is divided into three classes, three target detectors are obtained. If a moving target contains a front or side view of a pedestrian, bicycle, or motor vehicle, the multi-class detector formed by these three target detectors can detect information such as the position and contour of the moving target and provide its class (pedestrian/bicycle/motor vehicle).

The off-line training steps above need to be carried out only once, in order to obtain the detectors. Once the detectors have been obtained, generating a video summary for a different surveillance video only requires calling the trained detectors.

While a video summary is being generated, this series of detectors performs target detection on the motion foreground of the original video, and every motion-foreground region of every video frame must be examined. To examine a motion-foreground region, one detector must be selected from the set of trained detectors.

To meet the needs of moving-target detection in complex scenes and to improve the performance and speed of the detectors, the present invention introduces into the detection process the prediction information fed back by the tracking process. This feedback predicts the type, area, aspect ratio, position, and number of the moving targets in the current frame. Based on the prediction information, the most suitable detector can be selected, and parameters such as the scale and the detection position of the detector can be determined. The predicted type of the moving target helps to select the detector type accurately, which avoids running all three detectors on the same motion foreground and multiplying the detection time. The predicted area and aspect ratio of the moving target help to select the detector scale accurately. The predicted position of the moving target helps the detector locate its search range effectively. Through this feedback from the tracking process to the detection process, the present invention substantially improves detector performance and greatly reduces detection time.

For a new target in the motion foreground, which has only just appeared, the tracking process cannot yet provide prediction information. In this case one class of detector is selected according to features such as the aspect ratio and area of the moving target, and the new target is detected with it. The detection score is compared with a threshold: if it is above the threshold, the result of that detector is adopted; otherwise another class of detector is used. The detection result, including the type, area, and position of the moving target, is then output. The threshold differs for each detector; the present invention sets it to the minimum detection score obtained on the positive sample set during the detector's off-line training.

With the above mechanism, the target detectors can detect the motion foreground of each frame fairly accurately even in complex scenes.

Fig. 2 shows the flow chart of the moving-target detection method of the present invention. As shown in Fig. 2, the moving-target detection process specifically comprises:

Step S1021: obtain the target prediction information fed back by the moving-target tracking process of the previous frame, including the type, position, area, and number of the moving targets.

Step S1022: determine the type, scale, and detection position of the detector according to the target prediction information. Scale selection mainly concerns pedestrian targets. According to the pedestrian height H in the target prediction information, pedestrian targets are divided into very small targets (H < 30 pixels), small targets (30 < H < 50 pixels), normal-size targets (50 < H < 90 pixels), and large targets (H > 90 pixels); each size class corresponds to a pedestrian detector that covers 7 search-window scales. To ensure that the search region fully contains the target to be detected, the originally predicted region is enlarged by an expansion coefficient of 1.4.
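The size classes and the region expansion of step S1022 can be sketched as below. Two points are assumptions of this example, because the source leaves them open: heights exactly equal to 30, 50, or 90 pixels are assigned to the larger class, and the coefficient 1.4 is applied to both the width and the height of the predicted box about its centre. The function names are hypothetical.

```python
def pedestrian_size_class(height_px):
    """Map a predicted pedestrian height in pixels to a size class."""
    if height_px < 30:
        return "very small"
    if height_px < 50:
        return "small"
    if height_px < 90:
        return "normal"
    return "large"


def expand_region(x, y, width, height, factor=1.4):
    """Enlarge a box about its centre; returns (x, y, width, height)."""
    new_w, new_h = width * factor, height * factor
    cx, cy = x + width / 2.0, y + height / 2.0
    return (cx - new_w / 2.0, cy - new_h / 2.0, new_w, new_h)


size_class = pedestrian_size_class(75)             # "normal"
search_region = expand_region(100, 100, 20, 40)    # box grown around (110, 120)
```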

Step S1023: perform target detection with the selected detector.

Step S1024: if step S1023 detects a target, output the detection result, such as the position and area of the target.

Step S1025: if step S1023 does not detect a target, output as the detection result the motion foreground that corresponds to the area and position given in the prediction information.

In particular, during the first 3 frames after a moving target appears, the tracking process of the previous frame cannot provide accurate prediction information. In that case the detector scale is selected directly from the height of the motion foreground, and the three classes of detectors are applied to the foreground target in turn. The result of the detector with the highest score is taken, which determines the type, area, aspect ratio, and position of the target, and this information is output.
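The selection logic of steps S1021 to S1025 can be sketched as follows. The detectors are represented by hypothetical callables that return a box and a score, or None when nothing is found; the real detectors, their score ranges, and the class names used as dictionary keys are not specified in this form by the source.

```python
def detect(foreground_box, detectors, predicted_type=None):
    """Return (type, box, score) for one motion-foreground region.

    detectors maps a type name to a hypothetical callable that returns
    (box, score), or None when nothing is found.
    """
    if predicted_type is not None:
        # Tracking feedback is available: run only the predicted detector.
        candidates = [predicted_type]
    else:
        # New target without reliable prediction: try every detector.
        candidates = list(detectors)
    best = None
    for name in candidates:
        result = detectors[name](foreground_box)
        if result is not None and (best is None or result[1] > best[2]):
            best = (name, result[0], result[1])
    if best is None:
        # Nothing detected: fall back to the motion-foreground region itself.
        return (predicted_type, foreground_box, 0.0)
    return best


# Stub detectors standing in for the three trained HOG + SVM detectors.
detectors = {
    "pedestrian": lambda box: (box, 0.4),
    "bicycle": lambda box: None,
    "motor vehicle": lambda box: (box, 0.9),
}
box = (10, 20, 50, 30)
new_target = detect(box, detectors)                           # highest score wins
tracked = detect(box, detectors, predicted_type="pedestrian")
missed = detect(box, detectors, predicted_type="bicycle")     # foreground fallback
```

Running a single detector when a prediction exists is what avoids the multiplied detection time that the description attributes to running all three detectors on every region.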

Step S105: use a multi-target tracking method to obtain the trajectories of the moving targets. Trajectories fall into three kinds: an active trajectory is one that is currently being tracked and is shown in the real-time result; a historical trajectory is one that is not currently tracked but may become active again; a dead trajectory is one that has ended completely and is no longer tracked.

The present invention obtains the motion trajectories of the moving targets with a multi-target tracking scheme based on the Hungarian algorithm, which is used to compute the optimal correspondence among multiple moving targets. The similarity between moving targets is described by their color information and position information. The color information is quantified with a color histogram. A color histogram is a statistic of the color distribution in an image, giving the proportion of the image occupied by each color; it is simple to compute and is invariant to scale, translation, and rotation. The position information is computed with a Kalman filter. Kalman filtering is an optimal estimation method for linear systems under the minimum mean square error criterion: its basic idea is to minimize the variance of the estimation error while keeping the estimate unbiased, which improves the tracking result.

As shown in Fig. 3, obtaining the motion trajectories of the moving targets with the multi-target tracking scheme based on the Hungarian algorithm can be divided into the following steps:

Step S1051: compute the 8 × 8 × 8 color histogram features of all moving targets detected in step S104, and then compute the similarity between the color histogram features of the moving targets in the current frame and those of the moving targets in the previous frame. Preferably, the present invention computes the color histogram of each moving target in the RGB color space. The three color components of the RGB color space are first quantized: each component is divided into 8 intervals, so that the color space is partitioned into 8 × 8 × 8 subspaces, each of which corresponds to one dimension (bin) of the histogram. The number of pixels falling into the subspace of each bin is counted, which gives the color histogram. The similarity is then computed between the color histogram features of the moving targets that correspond to the active trajectories of the previous frame and those of the moving targets in the current frame. Preferably, the present invention uses the Hellinger distance to measure the similarity of two histogram distributions:

$$d(h_1, h_2) = \sqrt{1 - \frac{1}{\sqrt{\bar{h}_1 \, \bar{h}_2 \, N^2}} \sum_{q=1}^{N} \sqrt{h_1(q) \, h_2(q)}}$$

where h₁(q) and h₂(q) are the two color histogram vectors, N = 8 × 8 × 8, and

$$\bar{h}_k = \frac{1}{N} \sum_{j=1}^{N} h_k(j).$$

The more similar the color histograms of two targets are, that is, the smaller the Hellinger distance between their color histogram vectors, the more likely it is that the two targets match; this matching probability is modeled as a Gaussian distribution. For example, suppose that in a highway surveillance picture there is a white car W on the left and a black car B on the right, and the method must track both moving targets to obtain their trajectories. In the previous frame, the color histograms computed for the two detected moving objects W and B are h₁ and h₂; in the current frame, the color histograms computed for W and B are h₃ and h₄. Computing the Hellinger distances between h₁ and h₃, h₁ and h₄, h₂ and h₃, and h₂ and h₄ shows that the distances for the pairs (h₁, h₃) and (h₂, h₄) are much smaller than those for the pairs (h₁, h₄) and (h₂, h₃). It follows that h₁ and h₃ are the color histograms of W in the two consecutive frames and that h₂ and h₄ are the color histograms of B in the two consecutive frames. This information helps to match the targets that appear in two consecutive frames.
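The histogram construction and the Hellinger distance defined above can be sketched as follows. The pixel lists and function names are made up for illustration; only the 8 × 8 × 8 quantization and the distance formula are taken from the text.

```python
import math


def color_histogram(pixels, bins_per_channel=8):
    """Count (r, g, b) pixels into an 8x8x8 histogram stored as a flat list."""
    step = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        hist[index] += 1.0
    return hist


def hellinger(h1, h2):
    """Hellinger distance between two histograms of equal length."""
    n = len(h1)
    mean1, mean2 = sum(h1) / n, sum(h2) / n
    if mean1 == 0.0 or mean2 == 0.0:
        return 1.0  # an empty histogram matches nothing
    overlap = sum(math.sqrt(a * b) for a, b in zip(h1, h2))
    value = 1.0 - overlap / math.sqrt(mean1 * mean2 * n * n)
    return math.sqrt(max(value, 0.0))  # clamp tiny negative rounding errors


# Hypothetical pixels of a white car W and a black car B in two frames.
h1 = color_histogram([(250, 250, 250)] * 40)   # W, previous frame
h2 = color_histogram([(5, 5, 5)] * 40)         # B, previous frame
h3 = color_histogram([(245, 248, 251)] * 35)   # W, current frame
d_same = hellinger(h1, h3)    # close to 0: same car
d_other = hellinger(h2, h3)   # 1: no overlapping bins
```

Because the formula divides by the histogram means, the two histograms do not need to contain the same number of pixels, which matters when a target changes size between frames.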

Step S1052: according to the active-trajectory information of the moving targets in the previous frame, use a Kalman filter to predict the positions of the moving targets. From each active trajectory in frame t-1, the Kalman filter predicts the position at which the moving target will appear in frame t. Step S104 provides the detection result for the moving targets of frame t, that is, their actual positions in frame t. In this step, the Euclidean distance is computed in turn between the predicted position of each moving target in frame t and the detection results of frame t. The smaller the Euclidean distance, the closer the predicted position is to the actual position and the more likely it is that the two targets match; this matching probability is modeled as a Gaussian distribution. For example, for the left vehicle W and the right vehicle B in the surveillance picture mentioned above, suppose that in frame t-1 the Kalman filter predicts the positions of the two detected moving objects W and B in frame t as l₁' and l₂', and that after step S104 detects W and B in frame t, their actual positions are l₁ and l₂. Because the position of a vehicle cannot change greatly between two consecutive frames, the Euclidean distances between l₁' and l₁ and between l₂' and l₂ are far smaller than those between l₁' and l₂ and between l₂' and l₁. This information helps to match the targets that appear in two consecutive frames.
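The source does not give the state model of the Kalman filter. The sketch below assumes a constant-velocity model with one independent filter per image coordinate; the noise parameters and initial covariance are arbitrary illustration values, and the class name is hypothetical.

```python
import math


class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate (state: position, velocity)."""

    def __init__(self, position, process_noise=1e-2, measurement_noise=1.0):
        self.p, self.v = float(position), 0.0
        # 2x2 covariance stored as scalars: [[c_pp, c_pv], [c_pv, c_vv]].
        self.c_pp, self.c_pv, self.c_vv = 1.0, 0.0, 100.0
        self.q, self.r = process_noise, measurement_noise

    def predict(self):
        """Advance one frame: x = F x, P = F P F^T + Q with F = [[1, 1], [0, 1]]."""
        self.p += self.v
        self.c_pp += 2.0 * self.c_pv + self.c_vv + self.q
        self.c_pv += self.c_vv
        self.c_vv += self.q
        return self.p

    def update(self, measured_position):
        """Correct the state with a position measurement (H = [1, 0])."""
        innovation = measured_position - self.p
        s = self.c_pp + self.r
        k_p, k_v = self.c_pp / s, self.c_pv / s
        self.p += k_p * innovation
        self.v += k_v * innovation
        # P = (I - K H) P, written out for the 2x2 case.
        c_pp, c_pv, c_vv = self.c_pp, self.c_pv, self.c_vv
        self.c_pp = (1.0 - k_p) * c_pp
        self.c_pv = (1.0 - k_p) * c_pv
        self.c_vv = c_vv - k_v * c_pv


# A target first seen at (0, 5), then detected at x = 2, 4, 6 with constant y.
kx, ky = AxisKalman(0.0), AxisKalman(5.0)
for x, y in [(2.0, 5.0), (4.0, 5.0), (6.0, 5.0)]:
    kx.predict()
    ky.predict()
    kx.update(x)
    ky.update(y)
predicted = (kx.predict(), ky.predict())  # predicted position in the next frame

# Euclidean distance from the prediction to two candidate detections.
near = math.hypot(predicted[0] - 8.0, predicted[1] - 5.0)
far = math.hypot(predicted[0] - 30.0, predicted[1] - 5.0)
```

The detection that continues the motion lies much closer to the predicted position than the unrelated one, which is the property the matching step relies on.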

Step S1053: use the Hungarian algorithm to match multiple targets on the basis of the color information and the position information. The Hungarian algorithm is a classic algorithm for solving the maximum matching problem in bipartite graphs. For example, suppose that there are m active trajectories in frame t-1 and that step S104 detects n moving targets in frame t. The similarity between the color histogram features of the active trajectories of frame t-1 and those of the moving targets of frame t is computed from the Hellinger distance, giving an m × n matrix M₁. The Euclidean distance between the predicted position in frame t of each active trajectory of frame t-1 and the actual position of each moving target in frame t is also computed, giving an m × n matrix M₂. Multiplying the corresponding elements of M₁ and M₂ gives an m × n matrix M, which is used as the input of the Hungarian algorithm. The Hungarian algorithm returns the matching between the m active trajectories of frame t-1 and the n moving targets of frame t. If the similarity of a matched pair is less than the threshold T₂ = 0.5, the pair is considered unmatched; otherwise, the match is successful.
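The matching step can be sketched as follows. The source does not say how the two distances are turned into similarities; this example assumes a Gaussian kernel, in line with the statement that the matching probability follows a Gaussian distribution, and the sigma values are arbitrary. The Hungarian algorithm is written here in its potential-based O(n³) form and is run on negated similarities so that total similarity is maximized. At least one trajectory and one detection are assumed.

```python
import math


def gaussian_similarity(distance, sigma):
    """Map a distance to a similarity in (0, 1]; sigma is an assumed parameter."""
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


def hungarian(cost):
    """Minimum-cost assignment for an n x m matrix with n <= m.

    Returns a list whose i-th entry is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)
    p, way = [0] * (m + 1), [0] * (m + 1)  # p[j]: row matched to column j (1-based)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


def match_tracks(similarity, threshold=0.5):
    """Match trajectories (rows) to detections (columns); drop pairs below T2."""
    rows, cols = len(similarity), len(similarity[0])
    if rows <= cols:
        cost = [[-s for s in row] for row in similarity]
        pairs = list(enumerate(hungarian(cost)))
    else:  # more trajectories than detections: solve the transposed problem
        cost = [[-similarity[i][j] for i in range(rows)] for j in range(cols)]
        pairs = [(i, j) for j, i in enumerate(hungarian(cost))]
    return sorted((i, j) for i, j in pairs if similarity[i][j] >= threshold)


m1 = [[0.1, 0.9], [0.8, 0.2]]     # Hellinger distances (hypothetical values)
m2 = [[3.0, 60.0], [55.0, 4.0]]   # Euclidean distances in pixels (hypothetical)
m = [[gaussian_similarity(a, 0.5) * gaussian_similarity(b, 20.0)
      for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]
matches = match_tracks(m)  # list of (trajectory index, detection index)
```

Trajectories and detections that do not appear in the returned list are the unmatched ones handled in the next step.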

Step S1054: according to the matching results of the previous step, generate the motion trajectory of each moving target in the current frame, and at the same time predict information such as the position of the target in the next frame.

If active track m_i of frame t-1 is successfully matched with moving target n_j of frame t, then m_i is regarded as the motion trajectory of target n_j over the first t-1 frames, and m_i is updated with the detection result for n_j obtained in frame t in step S104. At this point, the tracking of target n_j in frame t is complete.

If a moving target of frame t matches no active track of frame t-1, the target has no motion trajectory and is a new target. If an active track of frame t-1 matches no moving target of frame t, the target has disappeared, and this active track is matched against the historical tracks: if a match is found, the active track and the historical track are merged into a new active track; otherwise, the active track becomes a historical track.
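The track bookkeeping described above can be sketched as follows. The Track class, the update_tracks function and the same_object predicate are hypothetical; in particular, the original text does not state in this passage how an active track is compared with a historical track, so that test is passed in as a caller-supplied function. The termination count uses N = 50, the value the description gives for step S106.

```python
from dataclasses import dataclass, field

N_TERMINATE = 50  # frames a historical track may stay unmatched (N in the text)


@dataclass
class Track:
    track_id: int
    points: list = field(default_factory=list)  # (frame, x, y) samples
    idle: int = 0                                # frames spent unmatched in history


def update_tracks(active, history, detections, matches, frame, same_object, next_id):
    """Apply one frame of bookkeeping; returns (active, history, finished, next_id)."""
    matched_tracks = {i for i, _ in matches}
    matched_dets = {j for _, j in matches}
    new_active = []
    # Matched pairs: extend the active track with the detection of frame t.
    for i, j in matches:
        active[i].points.append((frame, *detections[j]))
        new_active.append(active[i])
    # Unmatched detections are new targets and start fresh tracks.
    for j, det in enumerate(detections):
        if j not in matched_dets:
            new_active.append(Track(next_id, [(frame, *det)]))
            next_id += 1
    # Historical tracks age by one frame; those idle for N frames terminate.
    aged, finished = [], []
    for h in history:
        h.idle += 1
        (finished if h.idle >= N_TERMINATE else aged).append(h)
    # Unmatched active tracks: merge with a historical track, or become one.
    for i, trk in enumerate(active):
        if i in matched_tracks:
            continue
        partner = next((h for h in aged if same_object(h, trk)), None)
        if partner is None:
            trk.idle = 0
            aged.append(trk)
        else:
            aged = [h for h in aged if h is not partner]
            merged = Track(partner.track_id, sorted(partner.points + trk.points))
            new_active.append(merged)
    return new_active, aged, finished, next_id


# Track 0 is matched to detection 0; track 1 loses its target; detection 1 is new.
act, hist, done, nid = update_tracks(
    [Track(0, [(0, 10, 10)]), Track(1, [(0, 50, 50)])], [],
    [(12, 10), (90, 90)], [(0, 0)], 1, lambda h, t: False, 2)
```

Terminated tracks are returned separately in finished so that a later stage can turn each of them into a video snapshot.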

After the active track of target n_j has been updated in frame t, the present invention uses the Kalman filter to predict the position of n_j in frame t+1 and stores information such as the type, position, area and aspect ratio of n_j for use in target detection in frame t+1.

Step S106: generate the video summary.

The above steps yield information such as the motion trajectory and type of each moving target. In this method, a historical track is regarded as terminated when it still cannot be matched with the motion foreground after N frames of matching; N = 50 in this algorithm. After a historical track terminates, it is merged with the moving target to generate a single image, called a "video snapshot", in which the category of the moving target and the time span of its appearance are marked. For example, if a fleeing suspect vehicle appears in a road surveillance video, the trajectory of the vehicle from entering the picture to disappearing from it is marked, with a time of appearance from 28.92 seconds to 41.54 seconds. The start of the trajectory is drawn in a lighter color and the end in a darker color, and the color of the trajectory points in between grades from light to dark. The image of the vehicle at the midpoint of the trajectory is then merged with the trajectory, and the category "motor vehicle" and the time of appearance "28.92s-41.54s" are marked beside the moving target. After the whole video has been processed, a series of "video snapshots" has been generated; together they form the video summary, which the user can browse to understand the video content quickly.
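The annotation of a single video snapshot can be sketched as follows. The sketch covers only the light-to-dark color gradient, the choice of the trajectory midpoint and the text label; drawing and image merging are omitted. The function names, the two RGB end colors and the sample coordinates are hypothetical; the category and time span are taken from the example above.

```python
def trajectory_colors(n_points, light=(200, 230, 250), dark=(0, 40, 120)):
    """Interpolate RGB colors from light (start) to dark (end) along a trajectory."""
    colors = []
    for i in range(n_points):
        t = i / (n_points - 1) if n_points > 1 else 1.0
        colors.append(tuple(round(a + (b - a) * t) for a, b in zip(light, dark)))
    return colors


def snapshot_annotation(category, points):
    """Build the annotation for one video snapshot.

    points: list of (time_s, x, y) samples of a terminated trajectory.
    """
    start, end = points[0][0], points[-1][0]
    mid = points[len(points) // 2]
    return {
        "label": f"{category} {start:.2f}s-{end:.2f}s",
        "thumbnail_time": mid[0],  # time of the target image merged into the snapshot
        "colors": trajectory_colors(len(points)),
    }


# Three samples of the suspect vehicle's trajectory (made-up coordinates).
track = [(28.92, 10, 300), (35.20, 320, 280), (41.54, 630, 260)]
info = snapshot_annotation("motor vehicle", track)
```

Each trajectory point is paired with one color from the list, so the first point takes the lightest color and the last point the darkest.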

The present invention introduces a mutual feedback mechanism between the foreground target detection method and the target tracking method, so that detection and tracking reinforce each other and the performance of both improves. On the one hand, the detection process uses the prediction information fed back from the tracking process: the detection method selects the detector type and scale from the predicted type, position, area, aspect ratio and number of motion foreground targets, which improves detector performance, reduces the missed detection rate for weak and small targets and the false detection rate for densely occluded targets, and shortens detection time. On the other hand, the tracking process uses the results of target detection, which reduces the probability that a trajectory follows the wrong target, loses its target, or is incomplete. Compared with traditional video summarization methods, the present invention can extract the motion trajectories of foreground moving targets in complex scenes (the events of interest to the user) accurately, quickly and completely, and can therefore generate a reliable video summary in complex scenes.

The specific embodiments described above further explain the purpose, technical solution and beneficial effects of the present invention. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for generating a video summary in a complex scene, comprising:

2. the method for claim 1, is characterized in that, in step 1 also the sport foreground to the current video image frame that obtains carry out aftertreatment, specifically comprise:

3. the method for claim 1, it is characterized in that, object detector described in step 2 is with the combination of histograms of oriented gradients and linear SVM, to carry out off-line training in advance to obtain, for detection of type, profile and the positional information of moving target.

4. The method of claim 3, characterized in that the positive sample set used to train the target detectors comprises the following three classes of images occurring in surveillance video: 1) pedestrians and bicycles viewed from the front; 2) bicycles viewed from the side; 3) motor vehicles; and training yields three target detectors.

5. the method for claim 1, is characterized in that, step 2 specifically comprises:

Step 22: perform target detection with the determined target detector;

6. The method of claim 5, characterized in that, in step 21, the type and scale of the target detector are selected, and the detection position of the target detector is located, according to the prediction information fed back from the tracking process of the previous frame, which comprises the type, position, area, aspect ratio and number of moving targets in the current frame.

7. the method for claim 1, is characterized in that, step 3 specifically comprises the following steps:

8. the method for claim 1, is characterized in that, in step 4, the duration of the movement locus of the moving target of acquisition, classification and appearance is presented in a secondary video snapshot image, forms video frequency abstract in the mode of video snapshot image.