Summary of the invention
The present invention provides a video summary generation method and apparatus, which save the storage space occupied by the motion trajectories of irrelevant targets.
To achieve the above object, the technical scheme of the present invention is realized as follows:
The present invention provides a video summary generation method, comprising the steps of:
Step A: obtaining direction information, set by the user, of the moving object to be detected;
Step B: performing foreground detection on each frame of the video image, tracking the detected targets, and extracting the motion trajectories of the targets;
Step C: evaluating the similarity between the direction of an extracted motion trajectory and the direction information set by the user, and storing those motion trajectories whose direction differs from the user-set direction information by less than a predetermined threshold;
Step D: superimposing each frame image of the stored motion trajectories onto the corresponding background to generate the video summary.
Wherein, the direction information in Step A comprises at least one direction vector.
Wherein, Step C comprises the step of:
evaluating the similarity between the direction of the extracted motion trajectory and the direction information set by the user according to the following criterion:
β = (1/n)·Σ_{i=1..n} ∠(a, b_i)
wherein β is the average angle between the extracted motion trajectory and the direction vector set by the user, the sum running over the n segment vectors of the trajectory; T is the predetermined threshold; a is the direction vector, set by the user, of the moving object to be detected; b_i = p_{i+1} - p_i; and p_i is the i-th center point of the extracted motion trajectory;
if the computed value of β is less than the predetermined threshold T, the motion trajectory is stored.
Wherein, performing foreground detection in Step B comprises the step of:
performing background modeling on the image using a mixture-of-Gaussians function and extracting the moving targets.
Wherein, the step of performing background modeling on the image using the mixture-of-Gaussians function and extracting the moving targets further comprises processing illumination and shadow, comprising the steps of:
when the amplitude of illumination variation per unit time in the shooting environment exceeds a predetermined threshold, reducing the value range of pixels judged as background points to 0.4-0.6 times the original;
binarizing the image with a threshold greater than the pixel values of the shadow region, thereby removing the shadow.
Wherein, in Step B, tracking the detected targets comprises the steps of:
traversing all targets detected in the current frame and comparing each with the targets detected in the previous frame image; if the following condition is satisfied:
S_cross > min(S_pre, S_temp) × R
S_cross = Width_cross × Height_cross
Width_cross = min(right_pre, right_temp) - max(left_pre, left_temp)
Height_cross = min(Bottom_pre, Bottom_temp) - max(Top_pre, Top_temp)
wherein S_cross is the intersection area of the contours in the two frames; Width_cross is the length of the intersection projected onto the horizontal direction; Height_cross is the length of the intersection projected onto the vertical direction; right_pre is the value of the right boundary of the previous-frame contour; right_temp is the value of the right boundary of the current-frame contour; left_pre is the value of the left boundary of the previous-frame contour; left_temp is the value of the left boundary of the current-frame contour; Bottom_pre is the value of the lower boundary of the previous-frame contour; Bottom_temp is the value of the lower boundary of the current-frame contour; Top_pre is the value of the upper boundary of the previous-frame contour; Top_temp is the value of the upper boundary of the current-frame contour; and R is the intersection ratio;
the target of the current frame is judged to be associated with the previous frame and the trajectory is updated; if this condition is not satisfied, no association is judged and a new trajectory is created; if the previous frame image contains a trajectory that is not associated with any target detected in the current frame, tracking of that trajectory is terminated and the trajectory is stored.
Wherein, Step B further comprises the steps of:
updating the background;
adjusting the frequency of foreground detection and the frequency of background update according to the number of extracted targets, following the principle that the more targets there are, the higher the foreground-detection frequency and the lower the background-update frequency.
Wherein, the step of adjusting the frequency of foreground detection and the frequency of background update comprises the steps of:
when the number of extracted targets is zero, performing foreground detection once every 3-6 frames and updating the background every frame;
when the number of extracted targets is 1-3, performing foreground detection once every 2 frames and updating the background every two frames;
when the number of extracted targets is more than 3, performing foreground detection every frame and updating the background every three frames.
The present invention also provides a video summary generating apparatus, comprising a user setting module, a trajectory extraction module, a trajectory discrimination module and a generation module. The user setting module is used for obtaining the direction information, set by the user, of the moving object to be detected; the trajectory extraction module is used for performing foreground detection on each frame of the video image, tracking the detected targets, and extracting the motion trajectories of the targets; the trajectory discrimination module is used for evaluating the similarity between the direction of the extracted motion trajectory and the direction information set by the user, and storing those motion trajectories whose direction differs from the user-set direction information by less than a predetermined threshold; the generation module is used for superimposing each frame image of the stored motion trajectories onto the corresponding background to generate the video summary.
As can be seen, the present invention has at least the following beneficial effects:
With the video summary generation method and apparatus of the present invention, the direction information, set by the user, of the moving object to be detected is obtained, the similarity between the direction of each extracted motion trajectory and the user-set direction information is evaluated, and only those motion trajectories whose direction differs from the user-set direction information by less than the predetermined threshold are stored; in this way, motion trajectories whose direction similarity to the user setting is low are not stored, thereby saving that part of the storage space;
in addition, only the images of the stored motion trajectories are superimposed onto the corresponding background to generate the video summary; compared with superimposing all extracted target trajectories onto the background, this also reduces the amount of computation, and since both computation and storage consume time, the reduced computation and storage operations also save video-summary generation time, thereby accelerating the generation of the video summary;
further, the mixture-of-Gaussians function is adopted for foreground detection, which guarantees the precision of foreground detection; at the same time, illumination and shadow are processed separately, preventing large illumination variations from adversely affecting foreground extraction from the video, while the shadow processing also makes the video clearer and easier to observe;
further, the frequencies of foreground detection and background update are also adjusted, so that different situations are treated differently and the amount of computation is reduced as much as possible on the premise of guaranteeing accuracy, thereby further accelerating the generation of the video summary.
Embodiment
To make the purpose, technical scheme and advantages of the embodiments of the present invention clearer, the technical schemes in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative work fall within the scope of protection of the present invention.
Embodiment one
Embodiment One of the present invention provides a video summary generation method, shown in Figure 1, comprising the steps of:
Step S110: obtaining the direction information, set by the user, of the moving object to be detected.
The direction information refers to the direction of motion, set by the user, of the object to be detected; it comprises at least one direction vector and may preferably comprise multiple direction vectors.
Step S111: performing foreground detection on each frame of the video image, tracking the detected targets, and extracting the motion trajectories of the targets.
Foreground detection may adopt any of several related algorithms, such as the mixture-of-Gaussians background model or SACON (SAmple CONsensus), which the present embodiment does not enumerate one by one.
The tracking process may likewise adopt many algorithms, for example the relatively simple nearest-neighbor method, multi-target tracking algorithms, or contour tracking algorithms.
Step S112: evaluating the similarity between the direction of the extracted motion trajectory and the direction information set by the user, and storing those motion trajectories whose similarity to the user-set direction information satisfies a predetermined threshold.
The higher the similarity between the direction of an extracted motion trajectory and the direction set by the user, the more likely that trajectory is the motion trajectory of a target the user wants to know about.
Those skilled in the art may, according to the technical concept of the present invention, implement various discrimination methods and evaluate the similarity between the direction of the extracted motion trajectory and the user-set direction information according to different criteria.
For example, the following discrimination method may be adopted:
β = (1/n)·Σ_{i=1..n} ∠(a, b_i)
wherein β is the average angle between the extracted motion trajectory and the direction vector set by the user, the sum running over the n segment vectors of the trajectory; T is the predetermined threshold; a is the direction vector, set by the user, of the moving object to be detected; b_i = p_{i+1} - p_i; and p_i is the i-th center point of the extracted motion trajectory;
if the computed value of β is less than the predetermined threshold T, the similarity between this motion trajectory and the user-set direction is high enough to satisfy the predetermined threshold, and the motion trajectory is stored.
The predetermined threshold T may take an empirical value of 0.1-0.4, preferably 0.2. Regarding the value of the predetermined threshold, those skilled in the art may determine various value ranges or specific values according to different criteria, which Embodiment One does not enumerate one by one.
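The discrimination above can be sketched in code. This is a minimal sketch, assuming the criterion is the average angle, in radians, between the user-set vector a and the segment vectors b_i formed from the trajectory's center points; the function names and the sample trajectory are illustrative, not part of the patent:

```python
import math

def average_angle(a, centers):
    """Average angle (radians) between user vector a and the segment
    vectors b_i = p_{i+1} - p_i of the trajectory's center points."""
    angles = []
    for p, q in zip(centers, centers[1:]):
        b = (q[0] - p[0], q[1] - p[1])
        norm = math.hypot(a[0], a[1]) * math.hypot(b[0], b[1])
        if norm == 0:
            continue  # skip zero-length segments
        cos_t = max(-1.0, min(1.0, (a[0] * b[0] + a[1] * b[1]) / norm))
        angles.append(math.acos(cos_t))
    return sum(angles) / len(angles) if angles else math.pi

def keep_trajectory(a, centers, T=0.2):
    """Store the trajectory only when the average angle beta is below T."""
    return average_angle(a, centers) < T

# A nearly eastward track matches an eastward user vector but not a northward one.
east, north = (1.0, 0.0), (0.0, 1.0)
track = [(0, 0), (10, 1), (20, 1), (30, 2)]
```

With these sample points, keep_trajectory(east, track) holds while keep_trajectory(north, track) does not, mirroring the β < T test with the preferred T = 0.2.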
Step S113: superimposing each frame image of the stored motion trajectories onto the corresponding background to generate the video summary.
According to the extracted trajectories of the moving targets and the stored background images, the trajectories are arranged according to the temporal and spatial relationships in which they occur, and the moving-target trajectories are then superimposed onto the stored background images to generate the summary.
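As a rough illustration of this superimposing step (a sketch, not the patent's exact procedure), the code below replays each stored trajectory snippet over a shared background; frames, masks, and positions are represented with plain nested lists, and all names are illustrative:

```python
def paste_object(background, patch, mask, top, left):
    """Overlay the foreground pixels of one trajectory frame onto a copy
    of the background; mask[r][c] is True where the patch holds the target."""
    frame = [row[:] for row in background]
    for r, patch_row in enumerate(patch):
        for c, value in enumerate(patch_row):
            if mask[r][c]:
                frame[top + r][left + c] = value
    return frame

def compose_summary(background, snippets):
    """snippets: per trajectory, a list of (patch, mask, top, left) frames,
    assumed already arranged by their temporal and spatial relationships."""
    summary = []
    for snippet in snippets:
        for patch, mask, top, left in snippet:
            summary.append(paste_object(background, patch, mask, top, left))
    return summary

bg = [[0] * 4 for _ in range(4)]
snips = [[([[9]], [[True]], 1, 2)]]  # one one-pixel target at row 1, col 2
frames = compose_summary(bg, snips)
```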
Embodiment two
Embodiment Two of the present invention provides a video summary generation method, shown in Figure 2, comprising the steps of:
Step S210: obtaining the direction information, set by the user, of the moving object to be detected.
In the present Embodiment Two, the direction information comprises two direction vectors; that is, the user may specify that the motion trajectory of the moving object to be detected first advances along one determined direction and afterwards advances along another direction, for example, first along the southeast direction and then turning east.
Step S211: performing background modeling using the mixture-of-Gaussians function and extracting the moving targets.
Background modeling is performed on the image with a mixture of Gaussians and the moving foreground is extracted; the number of Gaussian functions used may be chosen according to the video scene, and a separate Gauss model may be trained for shadow.
The single-Gaussian background modeling function is:
η(x, μ, σ) = (1/(√(2π)·σ))·exp(-(x-μ)²/(2σ²))
The mixture-of-Gaussians background modeling is built on single-Gaussian background modeling and comprises the steps of:
1) First initialize the mixture-model parameters, including the weight occupied by each Gauss model and the mean and standard deviation of each Gauss model.
Initializing the weights amounts to estimating the prior probability of the background distributions; at initialization, the weight of the first Gauss model is generally set relatively large and the others correspondingly small, that is:
w_1(x,y,1) = W, w_k(x,y,1) = (1-W)/(K-1), k = 2,...,K
The mean of the first Gauss model equals the corresponding pixel value (or processing-unit mean value) of the first frame of the input video, that is:
μ_1(x,y,l,1) = I(x,y,l,1)
The initial variances of all K Gauss models are equal, that is:
σ_k²(x,y,1) = var, k = 1,2,...,K
The value of var is directly related to the dynamic characteristics of the video.
2) Update the Gauss model parameters.
Each Gauss model is traversed and the following inequality is evaluated:
(I(x,y,l,f) - μ_k(x,y,l,f-1))² < c·σ_k²(x,y,f-1)
If it holds for all color components, the pixel is attributed to the B-th Gauss model; otherwise the pixel belongs to no Gauss model, which is equivalent to the appearance of a wild point (outlier). Both cases require a corresponding update.
For the case where the inequality holds for all color components, the corresponding update step is as follows.
This case means the value of the current pixel satisfies the B-th Gaussian distribution; the pixel does not necessarily belong to the background, so it must be judged whether the B-th Gaussian distribution satisfies the background condition: if it does, the pixel belongs to a background point; otherwise it belongs to a foreground point.
If the pixel belongs to a background point, the B-th background distribution has output a sample value, and at this moment all distributions need parameter updates.
The parameters of the B-th Gauss model are updated as follows:
w_B(x,y,f) = (1-α)·w_B(x,y,f-1) + α
μ_B(x,y,l,f) = (1-β)·μ_B(x,y,l,f-1) + β·I(x,y,l,f)
σ_B²(x,y,f) = (1-β)·σ_B²(x,y,f-1) + β·(I(:)-μ_B(:))ᵀ·(I(:)-μ_B(:))
The remaining Gauss models change only their weights; their means and variances remain unchanged, that is:
w_k(x,y,f) = (1-α)·w_k(x,y,f-1), k ≠ B
wherein β = α·η(I(x,y,:,f) | μ_B, σ_B)
A wild point means the pixel value fits none of the Gaussian distributions; the pixel is regarded as a new situation appearing in the video, and the K-th Gaussian distribution is replaced with this new situation. Its weight, mean and variance are all determined following the initialization idea, namely it is assigned a smaller weight and a larger variance, that is:
w_K(x,y,f) = (1-W)/(K-1)
μ_K(x,y,l,f) = I(x,y,l,f)
σ_K²(x,y,l,f) = var
At the same time, the point is determined to be a foreground point. The foreground points are exactly the pixels of the targets.
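The update rules above can be sketched per pixel for a single grayscale channel. The constants (K, W, var, α, c) and the background test on the matched model's weight are illustrative assumptions, since the exact background condition is not spelled out here:

```python
import math

K, W, VAR, ALPHA, C = 3, 0.7, 36.0, 0.05, 2.5  # assumed constants

def eta(x, mu, var):
    """Single-Gaussian density, used for the adaptive rate beta = alpha*eta."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def init_models(first_value):
    # First model: large weight W, mean = first-frame value; others small.
    models = [{"w": W, "mu": float(first_value), "var": VAR}]
    models += [{"w": (1 - W) / (K - 1), "mu": 0.0, "var": VAR} for _ in range(K - 1)]
    return models

def update(models, x):
    """One update step for one pixel; returns True if x is a foreground point."""
    for m in models:
        if (x - m["mu"]) ** 2 < C * m["var"]:       # matched model B
            beta = ALPHA * eta(x, m["mu"], m["var"])
            m["w"] = (1 - ALPHA) * m["w"] + ALPHA
            m["mu"] = (1 - beta) * m["mu"] + beta * x
            m["var"] = (1 - beta) * m["var"] + beta * (x - m["mu"]) ** 2
            for other in models:                     # others: weight decay only
                if other is not m:
                    other["w"] = (1 - ALPHA) * other["w"]
            return m["w"] < 0.5                      # assumed background condition
    # Wild point: no model matched; replace the K-th model and mark foreground.
    models[-1] = {"w": (1 - W) / (K - 1), "mu": float(x), "var": VAR}
    return True

models = init_models(100)
stable = update(models, 100)   # a repeated value is classified as background
wild = update(models, 200)     # a far-off value is a wild (foreground) point
```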
Preferably, the method also comprises processing illumination and shadow, comprising the steps of:
when the amplitude of illumination variation per unit time in the shooting environment exceeds a predetermined threshold, that is, when the illumination variation is very large, reducing the value range of pixels judged as background points to 0.4-0.6 times the original, preferably 0.5 times;
wherein the predetermined threshold of the illumination-variation amplitude per unit time may be determined by those skilled in the art according to actual needs; for example, this predetermined threshold may be 10-15 lx/s (lux/second);
for shadow, binarizing the image with a threshold greater than the pixel values of the shadow region, thereby removing the shadow.
Wherein, the frequency of foreground detection and the frequency of background update may be adjusted according to the number of targets.
According to the number of extracted targets, the frequency of foreground detection and the frequency of background update are adjusted following the principle that the more targets there are, the higher the foreground-detection frequency and the lower the background-update frequency.
For example, when the number of extracted targets is zero, foreground detection is performed once every 3-6 frames and the background is updated every frame; when the number of extracted targets is 1-3, foreground detection is performed once every 2 frames and the background is updated every two frames; when the number of extracted targets is more than 3, foreground detection is performed every frame and the background is updated every three frames.
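The example schedule above can be written as a small lookup; the choice of 4 frames within the allowed 3-6 range is arbitrary:

```python
def detection_and_update_intervals(num_targets):
    """Return (foreground-detection interval, background-update interval) in
    frames for a given extracted-target count, per the rules above."""
    if num_targets == 0:
        return 4, 1   # detect every 3-6 frames (4 here), update background each frame
    if num_targets <= 3:
        return 2, 2   # detect every 2 frames, update every two frames
    return 1, 3       # detect every frame, update every three frames
```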
Step S212: tracking the detected targets and extracting the motion trajectories of the targets.
Trajectory association, trajectory generation, and trajectory-disappearance discrimination are performed on the targets detected in the two successive frames. All foregrounds detected in the current frame are traversed and compared with the previous-frame results of all trajectories; if the following condition is satisfied:
S_cross > min(S_pre, S_temp) × R
wherein S_cross = Width_cross × Height_cross is the intersection area of the contours in the two frames, and R is the intersection ratio; in the present embodiment, R may take the empirical threshold value 0.4.
Width_cross = min(right_pre, right_temp) - max(left_pre, left_temp)
Height_cross = min(Bottom_pre, Bottom_temp) - max(Top_pre, Top_temp)
Width_cross is the length of the intersection projected onto the horizontal direction; Height_cross is the length of the intersection projected onto the vertical direction; right_pre is the value of the right boundary of the previous-frame contour; right_temp is the value of the right boundary of the current-frame contour; left_pre is the value of the left boundary of the previous-frame contour; left_temp is the value of the left boundary of the current-frame contour; Bottom_pre is the value of the lower boundary of the previous-frame contour; Bottom_temp is the value of the lower boundary of the current-frame contour; Top_pre is the value of the upper boundary of the previous-frame contour; Top_temp is the value of the upper boundary of the current-frame contour.
If the above condition is satisfied, the foreground of the current frame is judged to be associated with a trajectory stored for the previous frame, and that trajectory is updated; if no association is found, a new trajectory is created; if there is a trajectory not associated with any foreground detected in the current frame, that trajectory is stopped from further operation and is stored for subsequent generation of the video summary.
For example, if the intersection area of two human-body contours satisfies S_cross > min(S_pre, S_temp) × R, they are considered to be the same human-body contour.
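The association test above reduces to an axis-aligned box-overlap check. A sketch, with contour boxes given as (left, top, right, bottom) tuples and names chosen for illustration:

```python
def boxes_associated(prev_box, cur_box, R=0.4):
    """True when the contours' intersection area S_cross exceeds R times the
    smaller of the two contour areas, i.e. S_cross > min(S_pre, S_temp) * R."""
    left_p, top_p, right_p, bottom_p = prev_box
    left_c, top_c, right_c, bottom_c = cur_box
    width_cross = min(right_p, right_c) - max(left_p, left_c)
    height_cross = min(bottom_p, bottom_c) - max(top_p, top_c)
    if width_cross <= 0 or height_cross <= 0:
        return False  # the boxes do not intersect at all
    s_cross = width_cross * height_cross
    s_pre = (right_p - left_p) * (bottom_p - top_p)
    s_temp = (right_c - left_c) * (bottom_c - top_c)
    return s_cross > min(s_pre, s_temp) * R

same = boxes_associated((0, 0, 10, 10), (2, 2, 12, 12))  # large overlap
new = boxes_associated((0, 0, 10, 10), (9, 9, 20, 20))   # tiny overlap
```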
Step S213: evaluating the similarity between the direction of the extracted motion trajectory and the direction information set by the user, and storing those motion trajectories whose similarity to the user-set direction information satisfies the predetermined threshold.
After the trajectory of the object's motion is obtained, it is determined whether the direction of motion of the object is consistent with the direction of motion, set by the user, of the object to be detected.
In Embodiment Two of the present invention, the method of evaluating the trajectory is the same as in Embodiment One, namely by the discriminant β < T.
Here T may take an empirical value of 0.2; that is, when β is less than 0.2, the trajectory is judged to be a matching trajectory and is stored, and if β is not less than 0.2, the trajectory is discarded.
As in Embodiment One, β is the average angle between the extracted motion trajectory and the direction vector set by the user, T is the predetermined threshold, a is the direction vector, set by the user, of the moving object to be detected, b_i = p_{i+1} - p_i, and p_i is the i-th center point of the extracted motion trajectory.
Step S214: superimposing each frame image of the stored motion trajectories onto the corresponding background to generate the video summary.
Embodiment three
Embodiment Three of the present invention provides a video summary generating apparatus, comprising a user setting module, a trajectory extraction module, a trajectory discrimination module and a generation module.
The user setting module is used for obtaining the direction information, set by the user, of the moving object to be detected; the trajectory extraction module is used for performing foreground detection on each frame of the video image, tracking the detected targets, and extracting the motion trajectories of the targets; the trajectory discrimination module is used for evaluating the similarity between the direction of the extracted motion trajectory and the direction information set by the user, and storing those motion trajectories whose direction differs from the user-set direction information by less than a predetermined threshold; the generation module is used for superimposing each frame image of the stored motion trajectories onto the corresponding background to generate the video summary.
Through the above description of the embodiments, those skilled in the art can clearly understand that the present invention may be implemented by software plus a necessary general-purpose hardware platform, or certainly by hardware, but in many cases the former is the better implementation. Based on such an understanding, the part of the technical scheme of the present invention that in essence contributes to the prior art may be embodied in the form of a software product; the computer software product may be stored in a storage medium, such as a ROM/RAM, magnetic disk or optical disc, and comprises instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention or in parts thereof.
With the video summary generation method and apparatus of the embodiments of the present invention, the direction of the moving object to be detected is first set; background modeling is then performed, moving objects are detected and tracked, and their trajectories are obtained; the trajectories are then evaluated to obtain the moving objects whose direction is close to the set direction to be detected; these trajectories and the background images are stored; and finally the video summary of moving objects in the specific desired direction is generated. The storage space occupied by irrelevant moving objects is saved and the related computation is reduced, thereby improving the speed of video generation.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical scheme of the present invention, not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical schemes described in the foregoing embodiments or make equivalent replacements of some of the technical features therein, and such modifications or replacements do not cause the essence of the corresponding technical schemes to depart from the spirit and scope of the technical schemes of the embodiments of the present invention.