Disclosure of Invention
The invention aims to provide a video abstract generating and indexing method to overcome the defects in the prior art.
The purpose of the invention is realized by the following technical scheme:
a video abstract generating and indexing method comprises the following steps:
1) background modeling: performing background modeling on a target frame image in an original video to realize background extraction and separating a background from the original video;
2) extracting a moving target: comparing and segmenting the current image and the background model, and determining a moving target according to a comparison result;
3) tracking a moving target: matching the targets segmented from each frame of image by using the inter-frame spatial distribution information and target characteristics, so as to realize target tracking and record the motion track of each target;
4) correcting the position of the moving target: correcting the tracked target set, mainly correcting the positions of the targets of each sequence in the image; and
5) abstract synthesis and video index establishment: superposing the moving targets on the background, and synchronously playing activities that do not occur simultaneously in the original video, under the condition that the video summary has no occlusion or only slight occlusion, so as to generate a summary video that is relatively compact in time and space and contains the necessary activities of the original video; when synthesizing the video, summary video data are recorded to form an index data file.
Further, the background modeling in the step 1) adopts a color background model method.
Preferably, the color background model specifically adopts a Gaussian mixture background algorithm.
Further, if the moving target extracted in the step 2) suffers from holes or noise interference, morphological opening and closing operations are performed to eliminate the holes and noise; if the same target is split into two or more targets in the step 2), the mutual spatial distances between all the targets extracted in each frame are calculated, and targets whose distance is smaller than the threshold Λ are identified as the same target.
Further, the step 3) specifically comprises the following steps:
a) the tracking module performs matching tracking between the moving targets of adjacent frames by using the spatial distribution information and the color characteristics; moving targets that match successfully are regarded as the same target, and the motion track is recorded; a moving target that fails to match is regarded as a new moving target;
b) the tracked results are stored in a set Ω, in which each target ω_i is expressed as follows:
ω_i = {β_1, β_2, …, β_n}
wherein β_1, β_2, …, β_n represent the sequence of the target ω_i occurring in the video.
Further, the matching tracking adopts the following two methods:
the first method comprises the following steps: matching the target β segmented from a new frame with the targets in the set Ω, and defining the following functions:
time difference function:
wherein,a newly extracted object is represented that is,representation collectionOne object of (1).To representThe time stamp of (a) is stored,to representThe time stamp of (c).Is a defined time difference threshold.
distance difference function:
D(β, ω_i) = 1, if d(β, ω_i) < λ_d; otherwise D(β, ω_i) = 0
wherein β represents a newly extracted target, ω_i represents one target in the set Ω, d(β, ω_i) represents the spatial distance between β and ω_i, and λ_d is a defined distance difference threshold.
comparison function:
C(β, ω_i) = T(β, ω_i) × D(β, ω_i)
If the comparison function C(β, ω_i) is 1, the color histogram distance between β and ω_i is calculated; if it meets the histogram distance threshold, the matching is successful and β is added to the sequence of ω_i. If the matching is unsuccessful or C(β, ω_i) is 0, β is a new target and is added to the set Ω.
the second method comprises the following steps: according to the first method, the target β is first compared with the last instance β_n of the sequence of the target ω_i; if the matching is unsuccessful, β is compared with β_{n−1}, and so on, until the last M instances of the sequence of ω_i have been compared.
Further, the step 4) of correcting the position of the moving target specifically includes the following steps:
firstly, after the video is completely processed, the widths and heights of the targets in the sequence of each target ω_i in Ω are counted and sorted.
The sorted widths of ω_i are expressed as follows:
W_i = {w_1, w_2, …, w_n}, where w_1 ≥ w_2 ≥ … ≥ w_n
The sorted heights of ω_i are expressed as follows:
H_i = {h_1, h_2, …, h_n}, where h_1 ≥ h_2 ≥ … ≥ h_n
secondly, the averages of the sorted sequences are respectively calculated to obtain the width w̄ and height h̄ of the target ω_i, and each target position in the target sequence is corrected according to w̄ and h̄.
Further, during the summary synthesis in the step 5), the code, position and first-occurrence timestamp of each moving target participating in combination in each frame of the video summary are recorded, and these values are maintained in an index file.
The invention has the following beneficial effects: the tracking success rate is ensured after the moving target is tracked, the quality of the generated video summary is greatly improved, and the video indexing function allows a user to quickly review the video, conveniently locate the corresponding position in the original video, and watch the actual situation in full.
Detailed Description
As shown in fig. 1, the embodiment of the present invention comprises the steps of background modeling, moving target extraction, moving target tracking, moving target position correction, summary synthesis, and video indexing. The specific steps are as follows:
1. Background modeling
The background modeling module may use various image background modeling algorithms, which fall into two categories: color background models and texture background models. The idea of the color background model is to model the color value (grayscale or color) of each pixel in the image: if the pixel color value at the current image coordinate (x, y) differs from the pixel color value at (x, y) in the background model, the current pixel is considered foreground; otherwise, it is considered background.
The background modeling module of this embodiment uses a Gaussian mixture background algorithm in the color background model category. The Gaussian mixture model (GMM) is developed from the single Gaussian model: a density distribution function of arbitrary shape is smoothly approximated by a weighted sum of several Gaussian probability density functions. The Gaussian mixture model assumes that the color of each pixel is described by K Gaussian distributions, where K is generally 3 to 5. In this embodiment, K is 3.
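As an illustration only, the following minimal sketch shows how this background modeling step could be realized with OpenCV's Gaussian mixture background subtractor; the video path and the shadow-detection setting are assumptions, not part of the embodiment.

```python
# Illustrative sketch of Gaussian-mixture background modeling (not the
# embodiment's exact implementation). Assumes OpenCV; "input.mp4" is a
# placeholder path.
import cv2

cap = cv2.VideoCapture("input.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
subtractor.setNMixtures(3)  # K = 3 Gaussians per pixel, as in this embodiment

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Non-zero mask pixels differ from the background model (foreground);
    # zero pixels match the model (background).
    fg_mask = subtractor.apply(frame)

background = subtractor.getBackgroundImage()  # current background estimate
cap.release()
```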
2. Moving object extraction
After the background model is established, the current image is compared with the background model, and the moving targets to be detected are determined from the comparison result. In general, the obtained foreground contains considerable noise; to eliminate it, this embodiment performs an opening operation and a closing operation on the extracted moving target image, and then discards relatively small contours.
After the targets are extracted, the total number of pixels contained in each target is counted; if the total number of pixels of a target is less than 400, the target is regarded as interference, eliminated, and not processed further.
To solve the problem of the same target being split into two or more targets, the mutual spatial distances between all targets in the current frame are calculated, and targets whose distance, in pixels, is smaller than the threshold Λ are identified as the same target. In this embodiment, Λ is 15 pixels.
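A minimal sketch of this extraction step follows, assuming OpenCV and NumPy. The 3×3 structuring element and the box-merging loop are illustrative choices; the 400-pixel threshold and Λ = 15 follow this embodiment, with contour area used as an approximation of the pixel count.

```python
import cv2
import numpy as np

def rect_distance(a, b):
    """Pixel distance between the closest points of two boxes (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = max(bx - (ax + aw), ax - (bx + bw), 0)
    dy = max(by - (ay + ah), ay - (by + bh), 0)
    return (dx ** 2 + dy ** 2) ** 0.5

def extract_targets(fg_mask, min_area=400, merge_dist=15):
    """Open/close to remove noise and holes, drop small contours,
    then merge boxes closer than the threshold Λ (= merge_dist)."""
    kernel = np.ones((3, 3), np.uint8)  # kernel size is an assumption
    mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours
             if cv2.contourArea(c) >= min_area]  # area approximates pixel count
    merged = True
    while merged:  # repeatedly union any two boxes closer than Λ
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if rect_distance(boxes[i], boxes[j]) < merge_dist:
                    x = min(boxes[i][0], boxes[j][0])
                    y = min(boxes[i][1], boxes[j][1])
                    x2 = max(boxes[i][0] + boxes[i][2], boxes[j][0] + boxes[j][2])
                    y2 = max(boxes[i][1] + boxes[i][3], boxes[j][1] + boxes[j][3])
                    boxes[i] = (x, y, x2 - x, y2 - y)
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes
```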
3. Moving object tracking module
For a moving target of the current frame, because the inter-frame time interval is short, the size and position of the space it occupies change little between frames. The tracking module of this embodiment therefore performs matching tracking between the moving targets of adjacent frames by using the spatial distribution information and the color characteristics: if the matching is successful, the targets are considered the same target and the motion track is recorded; if the matching is unsuccessful, the target is considered a new moving target.
The tracked results are stored in a set Ω, in which each target ω_i is expressed as follows:
ω_i = {β_1, β_2, …, β_n}
wherein β_1, β_2, …, β_n represent the sequence of the target ω_i occurring in the video.
If the extraction module extracts targets poorly in a certain frame, tracking failure may result. To improve the tracking success rate, the following two methods are adopted:
1) the targets segmented from a new frame are matched not with the targets of the previous frame but with the targets in the set Ω. The following functions are defined:
time difference function:
T(β, ω_i) = 1, if |t(β) − t(ω_i)| < λ_t; otherwise T(β, ω_i) = 0
wherein β represents a newly extracted target, ω_i represents one target in the set Ω, t(β) represents the timestamp of β, t(ω_i) represents the timestamp of ω_i (the timestamp of its most recent instance), and λ_t is a defined time difference threshold.
distance difference function:
D(β, ω_i) = 1, if d(β, ω_i) < λ_d; otherwise D(β, ω_i) = 0
wherein β represents a newly extracted target, ω_i represents one target in the set Ω, d(β, ω_i) represents the spatial distance between β and ω_i, and λ_d is a defined distance difference threshold.
comparison function:
C(β, ω_i) = T(β, ω_i) × D(β, ω_i)
If the comparison function C(β, ω_i) is 1, the color histogram distance between β and ω_i is calculated; if it meets the histogram distance threshold, the matching is successful and β is added to the sequence of ω_i. If the matching is unsuccessful or C(β, ω_i) is 0, β is a new target and is added to the set Ω.
2) in the above method, the target β is compared only with the latest instance of the target sequence; if that last instance was not well extracted, tracking failure can still occur. Therefore, the target β is first compared with the last instance β_n of the sequence of the target ω_i; if the matching is unsuccessful, β is compared with β_{n−1}, and so on, until the last M instances of the sequence of ω_i have been compared.
In this embodiment, the frame number at which a target appears in the video stream is taken as its timestamp; the first frame is numbered 0, and the numbers increase sequentially. The time difference threshold λ_t of this embodiment is 15, which means that the timestamps of the target β to be matched and of the target ω_i in the set Ω should be within 15 frames of each other.
In this embodiment, the distance between the target β to be matched and the target ω_i in the set Ω is calculated as the pixel distance between the closest points of the two targets. The distance difference threshold λ_d is 20, which means that the distance between β and ω_i should be within 20 pixels.
In the tracking module of this embodiment, M is 10, which means that the target β to be matched can be compared with the last 10 instances of the sequence of ω_i, in reverse order of their time of occurrence.
In this embodiment, the color histograms of the target β to be matched and of the target ω_i in the set Ω are computed, and the Bhattacharyya distance between the two histograms is calculated to describe their similarity. If the Bhattacharyya distance is less than 0.6, β and ω_i match successfully, and β is added to the sequence of ω_i. If β cannot be matched with any target in Ω, β is given a new target code and added to the set Ω.
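Putting the matching functions together, the following sketch is one possible reading of this tracking step. It assumes each target instance is kept as a dict with a timestamp, a bounding box and a cv2.calcHist color histogram, reuses the rect_distance helper from the extraction sketch (box distance approximating the closest-point distance), and uses this embodiment's threshold values.

```python
import cv2

LAMBDA_T = 15      # time difference threshold λt, in frames
LAMBDA_D = 20      # distance difference threshold λd, in pixels
HIST_THRESH = 0.6  # Bhattacharyya distance threshold
M = 10             # instances of a sequence to compare, newest first

def time_diff(beta, seq):
    """Time difference function T(β, ω_i): 1 if timestamps are within λt."""
    return 1 if abs(beta["timestamp"] - seq[-1]["timestamp"]) < LAMBDA_T else 0

def dist_diff(beta, seq):
    """Distance difference function D(β, ω_i): 1 if distance is within λd."""
    return 1 if rect_distance(beta["box"], seq[-1]["box"]) < LAMBDA_D else 0

def track(beta, omega):
    """Match a newly extracted target β against the set Ω (second method)."""
    for seq in omega:
        if time_diff(beta, seq) * dist_diff(beta, seq) == 1:  # C(β, ω_i)
            for inst in reversed(seq[-M:]):  # last M instances, newest first
                d = cv2.compareHist(beta["hist"], inst["hist"],
                                    cv2.HISTCMP_BHATTACHARYYA)
                if d < HIST_THRESH:
                    seq.append(beta)  # match: extend the sequence of ω_i
                    return
    omega.append([beta])  # no match: β becomes a new target in Ω
```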
4. Moving object position correction module
After the video is completely processed, the moving target position correction module counts the widths and heights of the targets in the sequence of each target ω_i in Ω and corrects the positions of the targets of each sequence. The widths and heights in the sequence are sorted from large to small. After sorting, the widths of ω_i are expressed as follows:
W_i = {w_1, w_2, …, w_n}, where w_1 ≥ w_2 ≥ … ≥ w_n
and the sorted heights of ω_i are expressed as follows:
H_i = {h_1, h_2, …, h_n}, where h_1 ≥ h_2 ≥ … ≥ h_n
The average of the first N widths and the first N heights after sorting is calculated to obtain the average width and the average height, where N is taken as 20% of the length of the sequence. During correction, according to the principle of target center alignment, the target width is modified symmetrically left and right, and the target height is modified symmetrically top and bottom.
The positions of a target sequence in the image before correction are shown in fig. 2, and the positions after correction are shown in fig. 3. Correcting the positions of the moving targets mitigates the problem of incomplete target extraction and improves the quality of the generated video summary.
5. Summary synthesis and video indexing
This module mainly synthesizes the tracked moving targets with the video background, and synchronously plays activities that do not occur simultaneously in the original video, under the condition of no occlusion (or only slight occlusion) in the video summary, thereby generating a summary video that is relatively compact in time and space and contains the necessary activities of the original video.
For each frame of the video summary, selecting which moving targets appear simultaneously is the key to the composition. This embodiment decides by calculating an energy loss function for each moving target. The function consists of a moving target time difference loss term and a moving target collision loss term, and the moving targets whose energy loss function values meet the condition are selected and combined.
Before generating each frame of the video summary, the moving targets are divided into three sets: merging complete (S1), merging in progress (S2), and to be merged (S3). The energy loss functions between the targets in S3 and the set S2 are calculated sequentially in order of occurrence time, and targets whose loss meets the loss threshold are merged into the same frame of the summary video.
When merging, a background image needs to be provided; the background corresponding to the earliest occurrence time among the moving targets in the frame is selected as the background image.
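The exact form of the energy loss function is not spelled out above, so the following sketch only illustrates the idea under stated assumptions: the time difference loss is taken as proportional to how far a target is shifted from its original start time, the collision loss as the pixel overlap with the targets already being merged, and the weights and threshold are placeholders.

```python
LOSS_THRESH = 5000.0  # placeholder threshold, not from the embodiment

def box_overlap(a, b):
    """Overlap area, in pixels, of two boxes (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    return ix * iy

def energy_loss(candidate, s2, shift, w_t=1.0, w_c=10.0):
    """Assumed loss for placing `candidate` (from S3) at summary offset
    `shift` alongside the targets already being merged (set S2)."""
    time_loss = w_t * shift  # time difference term: penalize delaying the target
    collision = 0
    for other in s2:
        for k, inst in enumerate(candidate["sequence"]):
            j = shift + k - other["offset"]  # other's index at the same summary frame
            if 0 <= j < len(other["sequence"]):
                collision += box_overlap(inst["box"], other["sequence"][j]["box"])
    return time_loss + w_c * collision
```

Under these assumptions, a candidate from S3 would be merged at the smallest shift whose loss stays below LOSS_THRESH and moved from S3 to S2.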
When merging the moving targets, the code, position and first-occurrence timestamp of each moving target participating in the merging are recorded for each frame, and these values are stored in an index file.
When the user clicks on the summary video, whether the mouse position is within the envelope of a moving target is judged; if it falls within the range of a certain target, the index file is queried to obtain the time at which that target appears in the original video.
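A sketch of this index lookup follows, assuming the index file maps each summary frame number to the records written during merging (target code, bounding box, first-occurrence timestamp); JSON and the field names are illustrative choices, not specified by the embodiment.

```python
import json

def find_original_time(index_path, summary_frame, x, y):
    """Return the first-occurrence timestamp of the clicked target, or None."""
    with open(index_path) as f:
        index = json.load(f)  # assumed layout: {"<frame>": [record, ...]}
    for rec in index.get(str(summary_frame), []):
        bx, by, bw, bh = rec["box"]
        if bx <= x <= bx + bw and by <= y <= by + bh:  # click inside envelope?
            return rec["first_timestamp"]  # time in the original video
    return None
```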
The present invention is not limited to the above preferred embodiment. Any other product, in whatever form, obtained by anyone in light of the present invention falls within the protection scope of the present invention as long as its technical solution is the same as or similar to that of the present application, regardless of any change in shape or structure.