Disclosure of Invention
The invention aims to provide a video abstract generating and indexing method to overcome the defects in the prior art.
The purpose of the invention is realized by the following technical scheme:
a video abstract generating and indexing method comprises the following steps:
1) background modeling: performing background modeling on a target frame image in an original video to realize background extraction and separating a background from the original video;
2) extracting a moving target: comparing and segmenting the current image and the background model, and determining a moving target according to a comparison result;
3) tracking a moving target: matching the targets segmented from each frame of image by using the inter-frame spatial distribution information and target characteristics, so as to realize target tracking and record the motion track of each target;
4) correcting the position of the moving target: correcting the tracked target set, mainly correcting the positions of the targets of each sequence in the image; and
5) abstract synthesis and video index establishment: superposing the moving targets on the background, and synchronously playing activities that do not occur simultaneously in the original video, under the condition that the video summary has no occlusion or only slight occlusion, so as to generate a summary video that is relatively compact in time and space and contains the necessary activities of the original video; when synthesizing the video, summary video data are recorded to form an index data file.
Further, the background modeling in the step 1) adopts a color background model method.
Preferably, the color background model specifically adopts a Gaussian mixture background algorithm.
Further, if the moving target extracted in the step 2) suffers from holes or noise interference, morphological opening and closing operations are performed to eliminate the holes and noise; if the same target is split into two or more targets in the step 2), the mutual spatial distances between all the targets extracted in each frame are calculated, and targets whose distance is smaller than the threshold Λ are identified as the same target.
Further, the step 3) specifically comprises the following steps:
a) the tracking module performs matching tracking between the moving targets of adjacent frames by using the spatial distribution information and the color characteristics; moving targets that match successfully are regarded as the same target, and the motion track is recorded; a moving target that fails to match is regarded as a new moving target;
b) the tracked results are stored in a set Ω, in which each target ω_i is expressed as follows:
ω_i = {β_1, β_2, …, β_n}
wherein β_1, β_2, …, β_n represent the sequence of the target ω_i occurring in the video.
Further, the matching tracking adopts the following two methods:
the first method comprises the following steps: matching the target β segmented from a new frame with the targets in the set Ω, and defining the following functions:
time difference function:
wherein,a newly extracted object is represented that is,representation collectionOne object of (1).To representThe time stamp of (a) is stored,to representThe time stamp of (c).Is a defined time difference threshold.
distance difference function:
D(β, ω_i) = 1, if d(β, ω_i) < λ_d; otherwise D(β, ω_i) = 0
wherein β represents a newly extracted target, ω_i represents one target in the set Ω, d(β, ω_i) represents the spatial distance between β and ω_i, and λ_d is a defined distance difference threshold.
comparison function:
C(β, ω_i) = T(β, ω_i) × D(β, ω_i)
If the comparison function C(β, ω_i) is 1, the color histogram distance between β and ω_i is calculated; if it meets the histogram distance threshold, the matching is successful and β is added to the sequence of ω_i. If the matching is unsuccessful or C(β, ω_i) is 0, β is a new target and is added to the set Ω.
the second method comprises the following steps: according to the first method, the target β is first compared with the last instance β_n of the sequence of the target ω_i; if the matching is unsuccessful, β is compared with β_{n−1}, and so on, until the last M instances of the sequence of ω_i have been compared.
Further, the step 4) of correcting the position of the moving target specifically includes the following steps:
firstly, after the video is completely processed, the widths and heights of the targets in the sequence of each target ω_i in Ω are counted and sorted.
The sorted widths of ω_i are expressed as follows:
W_i = {w_1, w_2, …, w_n}, where w_1 ≥ w_2 ≥ … ≥ w_n
The sorted heights of ω_i are expressed as follows:
H_i = {h_1, h_2, …, h_n}, where h_1 ≥ h_2 ≥ … ≥ h_n
secondly, the averages of the sorted sequences are respectively calculated to obtain the width w̄ and height h̄ of the target ω_i, and each target position in the target sequence is corrected according to w̄ and h̄.
Further, during the summary synthesis in the step 5), the code, position and first-occurrence timestamp of each moving target participating in combination in each frame of the video summary are recorded, and these values are maintained in an index file.
The invention has the following beneficial effects: the tracking success rate is ensured after the moving target is tracked, the quality of the generated video summary is greatly improved, and the video indexing function allows a user to quickly review the video, conveniently locate the corresponding position in the original video, and watch the actual situation in full.
Detailed Description
As shown in fig. 1, the embodiment of the present invention comprises the steps of background modeling, moving target extraction, moving target tracking, moving target position correction, summary synthesis, and video indexing. The specific steps are as follows:
1. Background modeling
The background modeling module may use various image background modeling algorithms, which fall into two categories: color background models and texture background models. The idea of the color background model is to model the color value (grayscale or color) of each pixel in the image: if the pixel color value at the current image coordinate (x, y) differs from the pixel color value at (x, y) in the background model, the current pixel is considered foreground; otherwise, it is considered background.
The background modeling module of this embodiment uses a Gaussian mixture background algorithm in the color background model category. The Gaussian mixture model (GMM) is developed from the single Gaussian model: a density distribution function of arbitrary shape is smoothly approximated by a weighted sum of several Gaussian probability density functions. The Gaussian mixture model assumes that the color of each pixel is described by K Gaussian distributions, where K is generally 3 to 5. In this embodiment, K is 3.
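As an illustration only, the following minimal sketch shows how this background modeling step could be realized with OpenCV's Gaussian mixture background subtractor; the video path and the shadow-detection setting are assumptions, not part of the embodiment.

```python
# Illustrative sketch of Gaussian-mixture background modeling (not the
# embodiment's exact implementation). Assumes OpenCV; "input.mp4" is a
# placeholder path.
import cv2

cap = cv2.VideoCapture("input.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
subtractor.setNMixtures(3)  # K = 3 Gaussians per pixel, as in this embodiment

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Non-zero mask pixels differ from the background model (foreground);
    # zero pixels match the model (background).
    fg_mask = subtractor.apply(frame)

background = subtractor.getBackgroundImage()  # current background estimate
cap.release()
```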
2. Moving object extraction
After the background model is established, the current image is compared with the background model, and the moving targets to be detected are determined from the comparison result. In general, the obtained foreground contains considerable noise; to eliminate it, this embodiment performs an opening operation and a closing operation on the extracted moving target image, and then discards relatively small contours.
After the targets are extracted, the total number of pixels contained in each target is counted; if the total number of pixels of a target is less than 400, the target is regarded as interference, eliminated, and not processed further.
To solve the problem of the same target being split into two or more targets, the mutual spatial distances between all targets in the current frame are calculated, and targets whose distance, in pixels, is smaller than the threshold Λ are identified as the same target. In this embodiment, Λ is 15 pixels.
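A minimal sketch of this extraction step follows, assuming OpenCV and NumPy. The 3×3 structuring element and the box-merging loop are illustrative choices; the 400-pixel threshold and Λ = 15 follow this embodiment, with contour area used as an approximation of the pixel count.

```python
import cv2
import numpy as np

def rect_distance(a, b):
    """Pixel distance between the closest points of two boxes (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = max(bx - (ax + aw), ax - (bx + bw), 0)
    dy = max(by - (ay + ah), ay - (by + bh), 0)
    return (dx ** 2 + dy ** 2) ** 0.5

def extract_targets(fg_mask, min_area=400, merge_dist=15):
    """Open/close to remove noise and holes, drop small contours,
    then merge boxes closer than the threshold Λ (= merge_dist)."""
    kernel = np.ones((3, 3), np.uint8)  # kernel size is an assumption
    mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours
             if cv2.contourArea(c) >= min_area]  # area approximates pixel count
    merged = True
    while merged:  # repeatedly union any two boxes closer than Λ
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if rect_distance(boxes[i], boxes[j]) < merge_dist:
                    x = min(boxes[i][0], boxes[j][0])
                    y = min(boxes[i][1], boxes[j][1])
                    x2 = max(boxes[i][0] + boxes[i][2], boxes[j][0] + boxes[j][2])
                    y2 = max(boxes[i][1] + boxes[i][3], boxes[j][1] + boxes[j][3])
                    boxes[i] = (x, y, x2 - x, y2 - y)
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes
```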
3. Moving object tracking module
For a moving target of the current frame, because the inter-frame time interval is short, the size and position of the space it occupies change little between frames. The tracking module of this embodiment therefore performs matching tracking between the moving targets of adjacent frames by using the spatial distribution information and the color characteristics: if the matching is successful, the targets are considered the same target and the motion track is recorded; if the matching is unsuccessful, the target is considered a new moving target.
The tracked results are stored in a set Ω, in which each target ω_i is expressed as follows:
ω_i = {β_1, β_2, …, β_n}
wherein β_1, β_2, …, β_n represent the sequence of the target ω_i occurring in the video.
If the extraction module extracts targets poorly in a certain frame, tracking failure may result. To improve the tracking success rate, the following two methods are adopted:
1) the targets segmented from a new frame are matched not with the targets of the previous frame but with the targets in the set Ω. The following functions are defined:
time difference function:
T(β, ω_i) = 1, if |t(β) − t(ω_i)| < λ_t; otherwise T(β, ω_i) = 0
wherein β represents a newly extracted target, ω_i represents one target in the set Ω, t(β) represents the timestamp of β, t(ω_i) represents the timestamp of ω_i (the timestamp of its most recent instance), and λ_t is a defined time difference threshold.
distance difference function:
D(β, ω_i) = 1, if d(β, ω_i) < λ_d; otherwise D(β, ω_i) = 0
wherein β represents a newly extracted target, ω_i represents one target in the set Ω, d(β, ω_i) represents the spatial distance between β and ω_i, and λ_d is a defined distance difference threshold.
comparison function:
C(β, ω_i) = T(β, ω_i) × D(β, ω_i)
If the comparison function C(β, ω_i) is 1, the color histogram distance between β and ω_i is calculated; if it meets the histogram distance threshold, the matching is successful and β is added to the sequence of ω_i. If the matching is unsuccessful or C(β, ω_i) is 0, β is a new target and is added to the set Ω.
2) in the above method, the target β is compared only with the latest instance of the target sequence; if that last instance was not well extracted, tracking failure can still occur. Therefore, the target β is first compared with the last instance β_n of the sequence of the target ω_i; if the matching is unsuccessful, β is compared with β_{n−1}, and so on, until the last M instances of the sequence of ω_i have been compared.
In this embodiment, the frame number at which a target appears in the video stream is taken as its timestamp; the first frame is numbered 0, and the numbers increase sequentially. The time difference threshold λ_t of this embodiment is 15, which means that the timestamps of the target β to be matched and of the target ω_i in the set Ω should be within 15 frames of each other.
In this embodiment, the distance between the target β to be matched and the target ω_i in the set Ω is calculated as the pixel distance between the closest points of the two targets. The distance difference threshold λ_d is 20, which means that the distance between β and ω_i should be within 20 pixels.
In the tracking module of this embodiment, M is 10, which means that the target β to be matched can be compared with the last 10 instances of the sequence of ω_i, in reverse order of their time of occurrence.
In this embodiment, the color histograms of the target β to be matched and of the target ω_i in the set Ω are computed, and the Bhattacharyya distance between the two histograms is calculated to describe their similarity. If the Bhattacharyya distance is less than 0.6, β and ω_i match successfully, and β is added to the sequence of ω_i. If β cannot be matched with any target in Ω, β is given a new target code and added to the set Ω.
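Putting the matching functions together, the following sketch is one possible reading of this tracking step. It assumes each target instance is kept as a dict with a timestamp, a bounding box and a cv2.calcHist color histogram, reuses the rect_distance helper from the extraction sketch (box distance approximating the closest-point distance), and uses this embodiment's threshold values.

```python
import cv2

LAMBDA_T = 15      # time difference threshold λt, in frames
LAMBDA_D = 20      # distance difference threshold λd, in pixels
HIST_THRESH = 0.6  # Bhattacharyya distance threshold
M = 10             # instances of a sequence to compare, newest first

def time_diff(beta, seq):
    """Time difference function T(β, ω_i): 1 if timestamps are within λt."""
    return 1 if abs(beta["timestamp"] - seq[-1]["timestamp"]) < LAMBDA_T else 0

def dist_diff(beta, seq):
    """Distance difference function D(β, ω_i): 1 if distance is within λd."""
    return 1 if rect_distance(beta["box"], seq[-1]["box"]) < LAMBDA_D else 0

def track(beta, omega):
    """Match a newly extracted target β against the set Ω (second method)."""
    for seq in omega:
        if time_diff(beta, seq) * dist_diff(beta, seq) == 1:  # C(β, ω_i)
            for inst in reversed(seq[-M:]):  # last M instances, newest first
                d = cv2.compareHist(beta["hist"], inst["hist"],
                                    cv2.HISTCMP_BHATTACHARYYA)
                if d < HIST_THRESH:
                    seq.append(beta)  # match: extend the sequence of ω_i
                    return
    omega.append([beta])  # no match: β becomes a new target in Ω
```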
4. Moving object position correction module
After the video is completely processed, the moving target position correction module counts the widths and heights of the targets in the sequence of each target ω_i in Ω and corrects the positions of the targets of each sequence. The widths and heights in the sequence are sorted from large to small. After sorting, the widths of ω_i are expressed as follows:
W_i = {w_1, w_2, …, w_n}, where w_1 ≥ w_2 ≥ … ≥ w_n
and the sorted heights of ω_i are expressed as follows:
H_i = {h_1, h_2, …, h_n}, where h_1 ≥ h_2 ≥ … ≥ h_n
The average of the first N widths and the first N heights after sorting is calculated to obtain the average width and the average height, where N is taken as 20% of the length of the sequence. During correction, according to the principle of target center alignment, the target width is modified symmetrically left and right, and the target height is modified symmetrically top and bottom.
The positions of a target sequence in the image before correction are shown in fig. 2, and the positions after correction are shown in fig. 3. Correcting the positions of the moving targets mitigates the problem of incomplete target extraction and improves the quality of the generated video summary.
5. Summary synthesis and video indexing
This module mainly synthesizes the tracked moving targets with the video background, and synchronously plays activities that do not occur simultaneously in the original video, under the condition of no occlusion (or only slight occlusion) in the video summary, thereby generating a summary video that is relatively compact in time and space and contains the necessary activities of the original video.
For each frame of the video summary, selecting which moving targets appear simultaneously is the key to the composition. This embodiment decides by calculating an energy loss function for each moving target. The function consists of a moving target time difference loss term and a moving target collision loss term, and the moving targets whose energy loss function values meet the condition are selected and combined.
Before generating each frame of the video summary, the moving targets are divided into three sets: merging complete (S1), merging in progress (S2), and to be merged (S3). The energy loss functions between the targets in S3 and the set S2 are calculated sequentially in order of occurrence time, and targets whose loss meets the loss threshold are merged into the same frame of the summary video.
When merging, a background image needs to be provided; the background corresponding to the earliest occurrence time among the moving targets in the frame is selected as the background image.
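The exact form of the energy loss function is not spelled out above, so the following sketch only illustrates the idea under stated assumptions: the time difference loss is taken as proportional to how far a target is shifted from its original start time, the collision loss as the pixel overlap with the targets already being merged, and the weights and threshold are placeholders.

```python
LOSS_THRESH = 5000.0  # placeholder threshold, not from the embodiment

def box_overlap(a, b):
    """Overlap area, in pixels, of two boxes (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    return ix * iy

def energy_loss(candidate, s2, shift, w_t=1.0, w_c=10.0):
    """Assumed loss for placing `candidate` (from S3) at summary offset
    `shift` alongside the targets already being merged (set S2)."""
    time_loss = w_t * shift  # time difference term: penalize delaying the target
    collision = 0
    for other in s2:
        for k, inst in enumerate(candidate["sequence"]):
            j = shift + k - other["offset"]  # other's index at the same summary frame
            if 0 <= j < len(other["sequence"]):
                collision += box_overlap(inst["box"], other["sequence"][j]["box"])
    return time_loss + w_c * collision
```

Under these assumptions, a candidate from S3 would be merged at the smallest shift whose loss stays below LOSS_THRESH and moved from S3 to S2.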
When merging the moving targets, the code, position and first-occurrence timestamp of each moving target participating in the merging are recorded for each frame, and these values are stored in an index file.
When the user clicks on the summary video, whether the mouse position is within the envelope of a moving target is judged; if it falls within the range of a certain target, the index file is queried to obtain the time at which that target appears in the original video.
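A sketch of this index lookup follows, assuming the index file maps each summary frame number to the records written during merging (target code, bounding box, first-occurrence timestamp); JSON and the field names are illustrative choices, not specified by the embodiment.

```python
import json

def find_original_time(index_path, summary_frame, x, y):
    """Return the first-occurrence timestamp of the clicked target, or None."""
    with open(index_path) as f:
        index = json.load(f)  # assumed layout: {"<frame>": [record, ...]}
    for rec in index.get(str(summary_frame), []):
        bx, by, bw, bh = rec["box"]
        if bx <= x <= bx + bw and by <= y <= by + bh:  # click inside envelope?
            return rec["first_timestamp"]  # time in the original video
    return None
```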
The present invention is not limited to the above preferred embodiment. Any other product, in whatever form, obtained by anyone in light of the present invention falls within the protection scope of the present invention as long as its technical solution is the same as or similar to that of the present application, regardless of any change in shape or structure.