CN109862313A

CN109862313A - Video concentration method and device

Info

Publication number: CN109862313A
Application number: CN201811518639.3A
Authority: CN
Inventors: 疏坤; 吴小燕; 殷兵; 何山; 柳林; 刘聪; 杨世清
Original assignee: iFlytek Co Ltd
Current assignee: Iflytek Information Technology Co Ltd
Priority date: 2018-12-12
Filing date: 2018-12-12
Publication date: 2019-06-07
Anticipated expiration: 2038-12-12
Also published as: CN109862313B

Abstract

This application discloses a video concentration method and device. The method comprises: obtaining a video to be concentrated that includes multiple moving targets; selecting a moving target combination for each condensed image, where the condensed images are the frame images obtained after the video to be concentrated is concentrated; and performing video concentration on the video to be concentrated according to the moving target combination of each condensed image. The application can thus select a reasonable moving target combination for each condensed frame, so that the moving targets in each condensed frame occupy as much of the image space as possible while the overlap between different moving targets in that frame is kept as small as possible, which improves the concentration precision of the concentrated video.

Description

Video concentration method and device

Technical field

The present application relates to the technical field of video processing, and in particular to a video concentration method and device.

Background

With rising public awareness of security, the safety of homes, campuses, traffic and public facilities receives more and more attention, and massive amounts of surveillance video are produced as a result. Limited by storage capacity and scene complexity, surveillance video is extremely difficult to store, retrieve and screen. How to obtain useful information quickly and effectively from massive video is an important problem that remains to be solved, and video concentration technology emerged in response.

Video concentration is a condensed summary of the original video content. It retains all moving targets of the original video (such as people and vehicles), rearranges the moving target sequences in time and space, and removes unnecessary background information to obtain a concentrated video, so that an original video of long duration (for example, 2 hours) can be played through in a much shorter time (for example, within 10 minutes). This greatly improves the efficiency of analyzing massive surveillance video.

Existing video concentration methods mainly obtain moving target sequences through target detection, where a moving target sequence refers to all frame images in which the same moving target appears in the original video, and then fuse the moving target sequences into a background picture using a trajectory-based method to obtain the concentrated video. However, the concentration density of the existing methods (that is, the number of moving targets in each frame image of the concentrated video) is fixed, so the image content containing moving targets in the original video cannot be concentrated to the greatest and most reasonable extent, and the concentration precision of the concentrated video is insufficient.

Summary of the invention

The main purpose of the embodiments of the present application is to provide a video concentration method and device that can improve the concentration precision of the concentrated video.

An embodiment of the present application provides a video concentration method, comprising:

obtaining a video to be concentrated, the video to be concentrated including multiple moving targets;

selecting a moving target combination for each condensed image, the condensed images being the frame images obtained after the video to be concentrated is concentrated;

performing video concentration on the video to be concentrated according to the moving target combination of each condensed image.

Optionally, selecting a moving target combination for each condensed image comprises:

determining the occupied area and occupied position of each moving target in each frame image of the video to be concentrated;

selecting a moving target combination for each condensed image according to the occupied area and occupied position of each moving target.

Optionally, selecting a moving target combination for each condensed image according to the occupied area and occupied position of each moving target comprises:

dividing the moving targets that overlap in the video to be concentrated into one group to obtain initial groups, each initial group including at least one moving target;

adjusting the initial groups by performing event detection on the video to be concentrated, to obtain first target groups;

selecting a moving target combination for each condensed image according to the occupied area and occupied position of each moving target and the first target groups.

Optionally, dividing the moving targets that overlap in the video to be concentrated into one group to obtain the initial groups comprises:

extracting the cluster feature of each moving target from the video to be concentrated;

dividing the moving targets that overlap into one group according to the cluster feature of each moving target, to obtain the initial groups.

Optionally, the cluster feature includes at least one of the following for the corresponding moving target in the video to be concentrated: moving direction information, moving speed information, position information at first appearance, and time information at first appearance.

Optionally, dividing the moving targets that overlap into one group according to the cluster feature of each moving target, to obtain the initial groups, comprises:

clustering the moving targets according to their cluster features once for each of M different preset cluster numbers, to obtain M cluster results;

for each of the M cluster results, calculating the error between the cluster feature of each moving target in the cluster result and the corresponding cluster center feature of that cluster result, and summing the squares of the errors to obtain the error sum of squares of that cluster result;

selecting the cluster result with the smallest error sum of squares among the M cluster results, and taking each cluster in the selected cluster result as one initial group.
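The selection among M cluster results can be sketched as follows. This is a minimal illustration only: the text does not name a clustering algorithm, so plain k-means seeded with the first k feature vectors is assumed here, and the function names `kmeans`, `sse` and `initial_groups` are hypothetical. Each cluster feature is assumed to be a tuple of numbers (for example direction, speed, first position, first time).

```python
from math import dist

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means (assumed algorithm), seeded with the first k points."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # Final assignment so that labels match the final centers.
    labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
    return labels, centers

def sse(points, labels, centers):
    """Error sum of squares between each feature and its cluster center."""
    return sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))

def initial_groups(features, candidate_ks):
    """Cluster once per preset cluster number and keep the result with the smallest SSE."""
    best = None
    for k in candidate_ks:
        labels, centers = kmeans(features, k)
        err = sse(features, labels, centers)
        if best is None or err < best[0]:
            best = (err, k, labels)
    _, k, labels = best
    groups = [[i for i, lab in enumerate(labels) if lab == c] for c in range(k)]
    return [g for g in groups if g]  # lists of moving target indices

features = [(0, 0), (10, 10), (20, 0), (0, 1), (10, 11), (20, 1)]
groups = initial_groups(features, candidate_ks=[2, 3])
```

Because the error sum of squares tends to fall as the cluster number grows, the choice of the M preset cluster numbers largely determines how fine the initial groups are.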

Optionally, adjusting the initial groups by performing event detection on the video to be concentrated, to obtain the first target groups, comprises:

dividing the video to be concentrated into multiple first video segments;

counting the number of moving targets in each first video segment;

performing event detection in each first video segment whose counted number reaches a preset number threshold;

if it is detected that moving targets belonging to different initial groups participate in the same event in a first video segment, merging the moving targets of those initial groups into the same group, and taking each merged group and each unmerged initial group as a first target group.

Optionally, selecting a moving target combination for each condensed image according to the occupied area and occupied position of each moving target and the first target groups comprises:

segmenting the video to be concentrated to obtain second video segments;

determining the first target groups that appear in a second video segment, and combining the first target groups that appear in different ways to obtain second target groups;

for each second target group, determining the occupied area and occupied position of each moving target of the second target group in each frame image of the second video segment;

selecting one of the second target groups according to the occupied area and occupied position of the moving targets of each second target group, and determining, according to the selected group, the moving target combination of each condensed image obtained by concentrating the second video segment.

Optionally, segmenting the video to be concentrated comprises:

segmenting the video to be concentrated according to the total number of frames of the video to be concentrated and the number of first target groups.

Optionally, selecting one of the second target groups according to the occupied area and occupied position of the moving targets of each second target group comprises:

generating an objective function value for each second target group, and selecting the second target group with the smallest objective function value;

wherein the objective function value is generated from a scene utilization rate and an overlap loss rate. The scene utilization rate reflects the degree to which the occupied areas of the moving targets in the corresponding second target group cover the background image, and is inversely proportional to the objective function value. The overlap loss rate reflects the degree of overlap between the moving targets of the corresponding second target group that results from their occupied areas and occupied positions in the background image.

An embodiment of the present application also provides a video concentration device, comprising:

a to-be-concentrated video acquisition unit, configured to obtain a video to be concentrated, the video to be concentrated including multiple moving targets;

a target combination selection unit, configured to select a moving target combination for each condensed image, the condensed images being the frame images obtained after the video to be concentrated is concentrated;

a video concentration unit, configured to perform video concentration on the video to be concentrated according to the moving target combination of each condensed image.

Optionally, the target combination selection unit includes:

a target data acquisition subunit, configured to determine the occupied area and occupied position of each moving target in each frame image of the video to be concentrated;

a target combination selection subunit, configured to select a moving target combination for each condensed image according to the occupied area and occupied position of each moving target.

Optionally, the target combination selection subunit includes:

an initial grouping subunit, configured to divide the moving targets that overlap in the video to be concentrated into one group to obtain initial groups, each initial group including at least one moving target;

an event detection subunit, configured to adjust the initial groups by performing event detection on the video to be concentrated, to obtain first target groups;

a combination selection subunit, configured to select a moving target combination for each condensed image according to the occupied area and occupied position of each moving target and the first target groups.

Optionally, the initial grouping subunit includes:

a cluster feature extraction subunit, configured to extract the cluster feature of each moving target from the video to be concentrated;

an initial group forming subunit, configured to divide the moving targets that overlap into one group according to the cluster feature of each moving target, to obtain the initial groups.

Optionally, the initial group forming subunit includes:

a target clustering subunit, configured to cluster the moving targets according to their cluster features once for each of M different preset cluster numbers, to obtain M cluster results;

a sum-of-squares calculation subunit, configured to calculate, for each of the M cluster results, the error between the cluster feature of each moving target in the cluster result and the corresponding cluster center feature of that cluster result, and to sum the squares of the errors to obtain the error sum of squares of that cluster result;

a cluster result selection subunit, configured to select the cluster result with the smallest error sum of squares among the M cluster results, and to take each cluster in the selected cluster result as one initial group.

Optionally, the event detection subunit includes:

a first division subunit, configured to divide the video to be concentrated into multiple first video segments;

a number counting subunit, configured to count the number of moving targets in each first video segment;

an event detection subunit, configured to perform event detection in each first video segment whose counted number reaches a preset number threshold;

a target grouping subunit, configured to, if it is detected that moving targets belonging to different initial groups participate in the same event in a first video segment, merge the moving targets of those initial groups into the same group, and take each merged group and each unmerged initial group as a first target group.

Optionally, the combination selection subunit includes:

a second division subunit, configured to segment the video to be concentrated to obtain second video segments;

a group combination subunit, configured to determine the first target groups that appear in a second video segment, and to combine the first target groups that appear in different ways to obtain second target groups;

an occupancy determination subunit, configured to determine, for each second target group, the occupied area and occupied position of each moving target of the second target group in each frame image of the second video segment;

a mode determination subunit, configured to select one of the second target groups according to the occupied area and occupied position of the moving targets of each second target group, and to determine, according to the selected group, the moving target combination of each condensed image obtained by concentrating the second video segment.

Optionally, the second division subunit is specifically configured to segment the video to be concentrated according to the total number of frames of the video to be concentrated and the number of first target groups.

Optionally, the mode determination subunit is specifically configured to generate an objective function value for each second target group and to select the second target group with the smallest objective function value.

An embodiment of the present application also provides a video concentration device, comprising: a processor, a memory, and a system bus;

the processor and the memory are connected by the system bus;

the memory is configured to store one or more programs, the one or more programs including instructions that, when executed by the processor, cause the processor to perform any implementation of the above video concentration method.

An embodiment of the present application also provides a computer-readable storage medium storing instructions that, when run on a terminal device, cause the terminal device to perform any implementation of the above video concentration method.

An embodiment of the present application also provides a computer program product that, when run on a terminal device, causes the terminal device to perform any implementation of the above video concentration method.

The video concentration method and device provided by the embodiments of the present application obtain a video to be concentrated that includes multiple moving targets, and then select a moving target combination for each condensed image, where the condensed images are the frame images obtained after the video to be concentrated is concentrated, so that video concentration can be performed on the video to be concentrated according to the moving target combination of each condensed image. The embodiments can thus select a reasonable moving target combination for each condensed frame, so that the moving targets in each condensed frame occupy as much of the image space as possible while the overlap between different moving targets in that frame is kept as small as possible, which improves the concentration precision of the concentrated video.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow diagram of the video concentration method provided by an embodiment of the present application;

Fig. 2 is a flow diagram of determining the moving target combination provided by an embodiment of the present application;

Fig. 3 is a flow diagram of cluster-based grouping provided by an embodiment of the present application;

Fig. 4 is a schematic diagram of group adjustment based on event detection provided by an embodiment of the present application;

Fig. 5 is a schematic diagram of moving target density statistics provided by an embodiment of the present application;

Fig. 6 is a schematic diagram of the composition of the video concentration device provided by an embodiment of the present application.

Detailed description of the embodiments

To make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.

First embodiment

Referring to Fig. 1, which is a flow diagram of a video concentration method provided in this embodiment, the method includes the following steps:

S101: obtaining a video to be concentrated, where the video to be concentrated includes multiple moving targets.

In this embodiment, the original video on which video concentration needs to be performed is defined as the video to be concentrated, for example a surveillance video of some scene. The video to be concentrated may be any video that contains moving objects such as people or vehicles; here, each moving object in the video to be concentrated is defined as a moving target.

In this embodiment, the background image of the video to be concentrated can be extracted by background modeling. The background image is a clean background image obtained after the moving targets in the video to be concentrated have been removed, and the background modeling algorithm keeps updating it. In general, the background of the video to be concentrated is essentially unchanged, so only the cleanest background image needs to be extracted for use when video concentration is performed in the subsequent step S103.
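As an illustration of extracting a clean background, the sketch below takes a per-pixel temporal median over a set of frames. The text does not specify which background modeling algorithm is used, so the median is an assumption chosen for simplicity, and `median_background` is a hypothetical helper operating on grayscale frames stored as nested lists.

```python
from statistics import median

def median_background(frames):
    """Per-pixel temporal median over equally sized grayscale frames.

    Assumed technique: a moving target covers any given pixel in only a
    minority of the frames, so the median recovers the background value.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(frame[y][x] for frame in frames) for x in range(width)]
            for y in range(height)]

# Three tiny 1x2 frames; a bright moving target covers the right pixel once.
frames = [[[10, 20]], [[10, 200]], [[12, 20]]]
background = median_background(frames)
```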

S102: selecting a moving target combination for each condensed image, where the condensed images are the frame images obtained after the video to be concentrated is concentrated.

In this embodiment, a reasonable moving target combination can be selected for each condensed frame, so that the moving targets in each condensed frame occupy as much of the image space as possible while the overlap between different moving targets in that frame is kept as small as possible, which improves the concentration precision of the concentrated video.

In one implementation of this embodiment, step S102 may include steps S1021-S1022:

S1021: determining the occupied area and occupied position of each moving target in each frame image of the video to be concentrated.

In this embodiment, the moving target sequences in the video to be concentrated can be extracted by a target detection and tracking algorithm. A moving target is an object that can move, such as a person or a vehicle, and a moving target sequence consists of all frame images in which the same moving target appears in the video to be concentrated; that is, each moving target corresponds to one moving target sequence. Because the video to be concentrated may contain multiple moving targets, multiple moving target sequences are obtained from it, and the length of each moving target sequence is the total number of frames in which the corresponding moving target appears in the video to be concentrated.

It should be noted that, because a target tracking algorithm is used to track the moving targets, a moving target that leaves the video to be concentrated and later reappears in it is still regarded as the same moving target.

To perform video concentration on the video to be concentrated, for each moving target in the video to be concentrated, the occupied area and occupied position of the moving target in the background image can be determined for each frame image of its moving target sequence, that is, where the moving target is located in the background image and how much space it occupies at that position.
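One way to represent the per-frame occupancy of a moving target is sketched below. The representation is an assumption: the text does not say how occupied position and occupied area are encoded, so an axis-aligned bounding box is used, with the position taken as the top-left corner and the area as width times height. `Box` and `occupancy` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a target in one frame (assumed representation)."""
    x: int
    y: int
    w: int
    h: int

def occupancy(track):
    """Map frame index -> (occupied position, occupied area) for one moving target sequence.

    `track` maps a frame index to the Box detected in that frame.
    """
    return {frame: ((box.x, box.y), box.w * box.h)
            for frame, box in sorted(track.items())}

# A target seen in frames 5 and 7 of the video to be concentrated.
track = {7: Box(4, 2, 10, 20), 5: Box(0, 0, 3, 3)}
per_frame = occupancy(track)
```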

S1022: selecting a moving target combination for each condensed image according to the occupied area and occupied position of each moving target, where the condensed images are the frame images obtained after the video to be concentrated is concentrated.

In this embodiment, for each frame image of the video to be concentrated, the position of each moving target in that frame and the amount of space it occupies there are determined, that is, the occupied area and occupied position of each moving target in each frame image of the video to be concentrated. The purpose is that, during video concentration, when different moving targets from different frame images are fused into the same condensed image, the moving targets can occupy as much of the background space as possible while the overlap between them in the condensed image is reduced as far as possible. Here, "overlap" means that two or more moving targets overlap in the same image, for example, the bodies of two people overlapping in the picture.
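The overlap between two moving targets placed in the same condensed image can be measured, for example, as the intersection area of their bounding boxes. This is a sketch under the assumption that each target is approximated by a box `(x, y, w, h)`; the text itself does not fix how overlap is measured, and `overlap_area` is a hypothetical helper.

```python
def overlap_area(a, b):
    """Overlapping area of two boxes given as (x, y, w, h); 0 if they do not intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    width = min(ax + aw, bx + bw) - max(ax, bx)
    height = min(ay + ah, by + bh) - max(ay, by)
    return max(0, width) * max(0, height)

partly_overlapping = overlap_area((0, 0, 10, 10), (5, 5, 10, 10))
disjoint = overlap_area((0, 0, 2, 2), (5, 5, 2, 2))
```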

In this embodiment, based on the occupied area and occupied position of each moving target in each frame image, and taking into account the period in which each moving target appears in the video to be concentrated, a reasonable moving target combination can be selected for each condensed frame so that the image space offered by the background image is used as fully and reasonably as possible. The moving targets in each condensed frame then occupy as much image space as possible while the overlap between different moving targets in that frame is as small as possible; in other words, the scene utilization rate of each condensed frame is as high as possible and the overlap loss rate is as low as possible.

To this end, in one implementation of this embodiment, referring to the flow diagram for determining the moving target combination shown in Fig. 2, step S1022 may include the following steps S201-S203:

S201: dividing the moving targets that overlap in the video to be concentrated into one group to obtain initial groups, where each initial group includes at least one moving target.

Here, "overlap" means that two or more moving targets overlap in the same frame image of the video to be concentrated. For example, when two people walk together hand in hand or shoulder to shoulder, their bodies overlap in the video to be concentrated. Placing overlapping moving targets in one group speeds up subsequent grouping. Moreover, concentrating overlapping moving targets together during video concentration helps to improve the scene utilization rate, whereas concentrating them separately may reduce the concentration precision and may cause the relative positions of the two overlapping targets in the concentrated video to differ from their actual relative positions in the video to be concentrated.

In this embodiment, the cluster feature of each moving target can be extracted from the video to be concentrated, specifically from the moving target sequence of each moving target. A clustering algorithm then divides the overlapping moving targets in the video to be concentrated into groups. Each group obtained in this way is defined as an initial group, and each initial group may include one or more moving targets.

It should be noted that, for the specific implementation of step S201, refer to the related description in the second embodiment.

S202: adjusting the initial groups by performing event detection on the video to be concentrated, to obtain first target groups.

In this embodiment, the initial groups obtained by the clustering algorithm alone may be inaccurate, because the clustering algorithm may fail to assign all moving targets involved in an event such as a large, long-lasting crowd gathering (for example, many people taking part in an activity) to the same initial group. To obtain a more accurate grouping, the moving targets within and between the initial groups therefore need to be screened once more. To this end, the video to be concentrated can be divided by time into multiple video segments, each defined here as a first video segment, and the number of moving targets in each first video segment is counted. When the counted number exceeds a preset threshold, event detection is performed on that first video segment, and the initial groups are adjusted according to the detection result, for example by merging two initial groups into one. Here, each merged group and each unmerged initial group is defined as a first target group.
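The group adjustment described above can be sketched as a merge over initial groups. The event detector itself is outside the scope of this passage, so it is modeled as a hypothetical callback `detect_events(segment_index)` returning, for each detected event, the set of participating target ids. The function name `adjust_groups`, the use of a union-find structure, and the `>=` threshold comparison are all assumptions made for illustration.

```python
def adjust_groups(initial_groups, segments, count_threshold, detect_events):
    """Merge initial groups whose targets take part in the same event.

    initial_groups: list of sets of target ids.
    segments: list of sets of target ids present in each first video segment.
    detect_events: hypothetical callback, segment index -> list of sets of target ids.
    """
    group_of = {t: i for i, group in enumerate(initial_groups) for t in group}
    parent = list(range(len(initial_groups)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for index, targets in enumerate(segments):
        if len(targets) < count_threshold:
            continue  # too few moving targets: no event detection in this segment
        for participants in detect_events(index):
            roots = {find(group_of[t]) for t in participants if t in group_of}
            if roots:
                keep = min(roots)
                for root in roots:
                    parent[root] = keep

    merged = {}
    for i, group in enumerate(initial_groups):
        merged.setdefault(find(i), set()).update(group)
    return [merged[root] for root in sorted(merged)]

initial = [{"A", "B"}, {"C"}, {"D"}]
segments = [{"A", "B", "C"}, {"D"}]
fake_detector = lambda i: [{"A", "C"}] if i == 0 else [{"D", "A"}]
first_target_groups = adjust_groups(initial, segments, 3, fake_detector)
```

In this usage example the second segment holds only one moving target, so its event is never examined and group `{"D"}` stays unmerged.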

It should be noted that, for the specific implementation of step S202, refer to the related description in the second embodiment.

Of course, the adjustment of the initial groups in step S202 may also be skipped, in which case the initial groups are used directly as the first target groups.

S203: selecting a moving target combination for each condensed image according to the occupied area and occupied position of each moving target and the first target groups.

In this implementation, to use the image space offered by the background image as fully and reasonably as possible, a reasonable moving target combination needs to be selected for each condensed frame, so that the moving targets in each condensed frame occupy as much image space as possible while the overlap between different moving targets in that frame is as small as possible; in other words, the scene utilization rate of each condensed frame is as high as possible and the overlap loss rate is as low as possible.

In one specific implementation, step S203 may include the following steps A1-A5:

Step A1: segmenting the video to be concentrated to obtain second video segments.

It can be understood that the longer the video to be concentrated, the larger the search space of moving targets. For the sake of efficiency, the video to be concentrated can be concentrated segment by segment, and the concentration results of the segments are finally stitched together to obtain the final concentrated video. In one implementation, the whole video to be concentrated can be segmented based on the number of first target groups obtained in step S202; here, each video segment obtained after segmentation is defined as a second video segment.

Specifically, step A1 may segment the video to be concentrated according to the total number of frames of the video to be concentrated and the number of first target groups, to obtain the second video segments.

The specific segmentation formula is formula (1); the formula itself is not reproduced in this text.

In formula (1), T denotes the number of segments of the video to be concentrated; α is an optimization parameter that needs to be determined according to the actual scene; F_n denotes the total number of frames of the video to be concentrated; N_c denotes the number of first target groups, that is, the total count of first target groups.

If the value of the expression in formula (1) is an integer, that integer can be used as the number of segments T; if it is not an integer, either its integer part or its integer part plus 1 can be used as T.

The video to be concentrated is then segmented according to the number of segments T, to obtain T second video segments.
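The rounding rule and the split into T second video segments can be sketched as follows. Formula (1) is not available here, so the sketch takes the raw (possibly fractional) value as an input rather than computing it. Splitting the frames into segments of near-equal length is an assumption, since the text does not say how frames are distributed, and `split_into_segments` is a hypothetical helper.

```python
import math

def split_into_segments(total_frames, raw_segment_count, round_up=False):
    """Return T half-open frame ranges (start, end) covering the whole video.

    raw_segment_count: the value given by the segmentation formula (not computed here).
    round_up: if the value is not an integer, use its integer part plus 1 instead of
    the integer part; both choices are allowed by the text.
    """
    if float(raw_segment_count).is_integer():
        t = int(raw_segment_count)
    else:
        t = math.floor(raw_segment_count) + (1 if round_up else 0)
    t = max(1, min(t, total_frames))  # guard against degenerate values
    base, extra = divmod(total_frames, t)
    segments, start = [], 0
    for i in range(t):
        length = base + (1 if i < extra else 0)  # spread the remainder over early segments
        segments.append((start, start + length))
        start += length
    return segments

floor_split = split_into_segments(10, 3.4)
ceil_split = split_into_segments(10, 3.4, round_up=True)
```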

Step A2: for every one second video-frequency band, each first object grouping occurred in the second video-frequency band is determined, and will The each first object grouping occurred is differently combined, and obtains each second targeted packets.

After it will be divided into each second video-frequency band to concentration video, can find out appear in it is each in the second video-frequency band A first object grouping, it should be noted that if the different motion target in the grouping of some first object appears in two differences The second video-frequency band in, for example the grouping of some first object includes moving target A and moving target B, but moving target A and movement Target B is respectively appeared in the second different video-frequency band of adjacent two, then all movements being grouped such first object Target belongs to one of them second video-frequency band.

After each first object grouping in the second video-frequency band has been determined, next, these first objects are grouped Be combined according to preset different modes, for example, by any two first object packet assembling be one group, based on these first The number of targeted packets may can choose all or part of combination, here, by each there are many combination Combined result under combination is defined as the second targeted packets.

Specifically, an initial concentration density d can be set for the second video-frequency band, where d is the number of first object groupings contained in one second targeted packet. Grouping is first performed with concentration density d, then with d+1, then with d+2, and so on, until the concentration density reaches the total number of first object groupings in the second video-frequency band.

For example, suppose a second video-frequency band includes three first object groupings AB, CD, and EFG, where A, B, C, D, E, F, and G each represent an independent moving target. These three first object groupings can be regarded as three moving targets X, Y, and Z, so that all moving targets within one first object grouping are treated as a single moving target during video concentration; X represents AB, Y represents CD, and Z represents EFG. If the initial concentration density is set to 2, X, Y, and Z are combined in pairs, giving XY, XZ, and YZ. The concentration density is then increased by 1 to 3, and X, Y, and Z are combined into one group, XYZ. Since the current concentration density 3 equals the total number of first object groupings (3), the combination ends, yielding the four second targeted packets XY, XZ, YZ, and XYZ.
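The enumeration by increasing concentration density can be sketched in a few lines of Python. The function name `enumerate_second_packets` and the use of `itertools.combinations` are illustrative assumptions; the example reproduces the X, Y, Z case above.

```python
from itertools import combinations

def enumerate_second_packets(first_groupings, initial_density=2):
    """List every combination of first object groupings, from the initial
    concentration density up to the total number of first object groupings."""
    packets = []
    for density in range(initial_density, len(first_groupings) + 1):
        packets.extend(combinations(first_groupings, density))
    return packets

packets = enumerate_second_packets(["X", "Y", "Z"])
# packets == [("X", "Y"), ("X", "Z"), ("Y", "Z"), ("X", "Y", "Z")]
```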

Step A3: for each second targeted packet, determine the occupied area and occupied position of each moving target of the second targeted packet in each frame image of the second video-frequency band.

Step A4: according to the occupied area and occupied position of each moving target of each second targeted packet, select one packet mode from among the second targeted packets.

According to the occupied areas and occupied positions corresponding to each second targeted packet, a given objective function can be used to calculate the target function value of that second targeted packet. The smaller the target function value, the higher the scene utilization rate and the lower the duplication loss rate of the moving targets in that second targeted packet.

The scene utilization rate corresponding to a second targeted packet reflects the degree to which the occupied areas of its moving targets fill the background image. Specifically, the scene utilization rate can be the proportion of the area union of all moving target images, after the different moving target images of the second targeted packet have been pasted into the corresponding condensed images of the second video-frequency band, relative to the scene area union of those condensed images. The larger this proportion, the higher the scene utilization rate.

The duplication loss rate (overlap loss rate) corresponding to a second targeted packet reflects the degree of overlap between its moving targets that results from their occupied areas and occupied positions in the background image. Specifically, the duplication loss rate can be the proportion of the overlapping area of the different moving target images of the second targeted packet in the second video-frequency band relative to the total area of these moving target images. The larger this value, the higher the duplication loss rate.

Therefore, when a packet mode is selected in step A4, i.e., when one second targeted packet is selected, the target function value of each second targeted packet can be generated, and the second targeted packet with the minimum target function value is selected. The target function value of a second targeted packet is generated from its scene utilization rate and duplication loss rate: the scene utilization rate is inversely related to the target function value, and the duplication loss rate is positively related to it.

Specifically, the target function value of a second targeted packet can be calculated according to the following formula, referred to as formula (2):

Here, U/S denotes the scene utilization rate and IoU denotes the duplication loss rate.

In formula (2), the term U denotes the area union of the moving target images of the t-th second targeted packet in the frame images to which they belong (frame images of the second video-frequency band). For example, suppose the second targeted packet includes first object groupings AB and CD, where A, B, C, and D represent different moving targets, and suppose AB appears in frames 1-5 of the second video-frequency band and CD appears in frames 6-15. The areas of the A image and the B image in the 1st frame image are merged, the areas of the C image and the D image in the 6th frame image are merged, and the union of these two union areas gives U1. Likewise, the A and B images in the 2nd frame image and the C and D images in the 7th frame image give U2, and so on, until the A and B images in the 5th frame image and the C and D images in the 10th frame image give U5. Each moving target can be enclosed by a rectangle frame that frames the edge of its moving target image, and the area of that rectangle frame is used as the image area of the moving target. Taking the union of the five unions U1, U2, ..., U5 then gives U in formula (2).

In formula (2), the term S denotes the area union of the frame images (frame images of the second video-frequency band) to which the moving targets of the t-th second targeted packet belong. Continuing the previous example, since every frame image of the second video-frequency band has the same size, with area s, S in formula (2) is 5s.

In formula (2), the term IoU denotes the ratio of the area intersection to the area union of the moving targets in the t-th second targeted packet. Continuing the previous example, calculate the area union U_AB of the A and B images in frames 1-5, and the area union U_CD of the C and D images in frames 6-10. Taking the union of U_AB and U_CD gives Ub, and taking their intersection gives Uj; the ratio Uj/Ub of the intersection area Uj to the union area Ub is then used as IoU in formula (2).
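The two ingredients U/S and IoU can be illustrated with a small Python sketch. It is only one possible reading of the description: rectangle frames are rasterized to integer pixels, the union over several condensed images is taken as the sum of the per-image unions (the images are separate scenes), and overlap means pixels covered by more than one first object grouping. The function names and the data layout are assumptions, and formula (2) itself is not reproduced.

```python
from collections import Counter

def region(boxes):
    """Pixels covered by rectangle frames (x1, y1, x2, y2); x2 and y2 are exclusive."""
    return {(x, y) for x1, y1, x2, y2 in boxes
            for x in range(x1, x2) for y in range(y1, y2)}

def utilization_and_overlap(condensed_frames, frame_area):
    """condensed_frames: a non-empty list with one dict per condensed image,
    mapping a first object grouping name to the rectangle frames pasted there."""
    union_area = 0    # U: covered pixels, summed over the condensed images
    overlap_area = 0  # Uj: pixels covered by more than one grouping
    for groupings in condensed_frames:
        cover = Counter()
        for boxes in groupings.values():
            cover.update(region(boxes))  # a grouping counts each pixel once
        union_area += len(cover)
        overlap_area += sum(1 for n in cover.values() if n > 1)
    scene_area = frame_area * len(condensed_frames)  # S, e.g. 5s for 5 images
    overlap = overlap_area / union_area if union_area else 0.0
    return union_area / scene_area, overlap

frames = [{"AB": [(0, 0, 4, 4)], "CD": [(2, 2, 6, 6)]}]
utilization, overlap = utilization_and_overlap(frames, frame_area=100)
```

In this toy case the two 4x4 frames share a 2x2 corner, so 28 of the 100 scene pixels are covered and 4 of those 28 are overlapped.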

Therefore, the target function value of each second targeted packet of the second video-frequency band can be calculated according to formula (2), and the second targeted packet with the minimum value can be assigned to the second video-frequency band.

For example, continuing the example in step A2: first, with the initial concentration density set to 2, the three moving targets X, Y, and Z are combined in pairs and the value of the objective function F is calculated for each combination. Suppose F is 0.7 for combination XY, 0.5 for YZ, and 0.6 for XZ. The minimum of these three F values and its combination are recorded, i.e., the minimum F value is 0.5 and the corresponding combination is YZ. Then the concentration density is increased by 1 to 3, the three moving targets X, Y, and Z are combined into one group, and the value of F for this combination is calculated, say 0.35 for XYZ. This F value and its combination are recorded as well. Since the total number of first object groupings is 3 and the current concentration density is also 3, the calculation of F values ends.

Finally, the minimum values of the objective function F recorded for concentration densities 2 and 3 are compared, i.e., the F values of combinations YZ and XYZ, here 0.5 and 0.35. The comparison shows that the F value of combination XYZ is the smallest, so this combination can be assigned to the second video-frequency band, and a queue to be concentrated is formed from it so that the second video-frequency band can be concentrated. The length of the queue to be concentrated is X+Y+Z = AB+CD+EFG, i.e., 7 independent moving targets. The moving target sequences of these 7 moving targets can be put into the queue simultaneously, and each moving target sequence is aligned starting from its first image and concentrated frame by frame.
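The two-stage comparison in this example (minimum per concentration density, then minimum across densities) can be sketched in Python. The F values are the illustrative numbers from the text; the function name and the dictionary layout are assumptions.

```python
def select_second_packet(f_values_by_density):
    """f_values_by_density maps a concentration density to {combination: F value}.
    Returns the overall best (combination, F) and the recorded per-density minima."""
    recorded = {
        density: min(values.items(), key=lambda item: item[1])
        for density, values in f_values_by_density.items()
    }
    best = min(recorded.values(), key=lambda item: item[1])
    return best, recorded

f_values = {2: {"XY": 0.7, "YZ": 0.5, "XZ": 0.6}, 3: {"XYZ": 0.35}}
best, recorded = select_second_packet(f_values)
# best == ("XYZ", 0.35); recorded == {2: ("YZ", 0.5), 3: ("XYZ", 0.35)}
```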

Step A5: according to the selected packet mode, determine the moving target combination corresponding to each condensed image obtained by concentrating the second video-frequency band.

After a packet mode has been selected from the second targeted packets of the second video-frequency band, the first object groupings contained in the selected second targeted packet are known. The moving target sequence of each of these first object groupings can then be obtained (every frame image in such a moving target sequence belongs to the second video-frequency band), and the frames of each moving target sequence are numbered in the same way, e.g., frame 1, frame 2, and so on. The moving targets in the picture frames that have the same frame number across the moving target sequences are then used as the moving targets of the same condensed image. It should be noted that the moving target sequences differ in length. If one moving target sequence ends before the others and the second video-frequency band contains other first object groupings outside the selected combination, such a first object grouping continues to be concentrated together with the remaining moving target sequences, until the moving targets in all frame images of the second video-frequency band have been concentrated.

For example, suppose the selected second targeted packet includes all moving targets in the second video-frequency band, as in the example in step A4, i.e., 3 first object groupings. Then the moving targets in the 1st frame of the corresponding 3 moving target sequences all serve as moving targets of the 1st condensed image, the moving targets in the 2nd frame of the 3 moving target sequences all serve as moving targets of the 2nd condensed image, and so on. It can be understood that, since the 3 moving target sequences may have different numbers of frames, the number of moving targets in later condensed images may decrease.

As another example, suppose the selected second targeted packet includes only part of the moving targets in the second video-frequency band, for instance first object groupings AB and CD from the example in step A4. Then the moving targets in the 1st frame of the two corresponding moving target sequences all serve as moving targets of the 1st condensed image, the moving targets in the 2nd frame of the two sequences all serve as moving targets of the 2nd condensed image, and so on. It should be noted that these 2 moving target sequences may have different numbers of frames; for example, first object grouping AB may include 5 frames and first object grouping CD 10 frames. In that case, after the moving targets in the 5th frame of the moving target sequences of AB and CD have all been used as moving targets of the 5th condensed image, the moving targets in the 1st frame image of first object grouping EFG and the moving targets in the 6th frame of the moving target sequence of CD are all used as moving targets of the 6th condensed image. This continues until every moving target image of the second video-frequency band has been assigned to a condensed image.
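The frame alignment in this example can be sketched as a small Python scheduler. It is a simplified illustration: each first object grouping is reduced to a name and a sequence length of at least one frame, and a waiting grouping takes over as soon as a selected sequence ends. The function name, the data layout, and the length of 3 frames for EFG are assumptions.

```python
from collections import deque

def schedule_condensed_images(selected, waiting):
    """selected / waiting: lists of (grouping name, number of frames).
    Returns, per condensed image, the (grouping, frame number) pairs pasted into it."""
    queue = deque(waiting)
    active = [(name, length, 1) for name, length in selected]
    condensed = []
    while active:
        condensed.append([(name, frame) for name, _, frame in active])
        next_active = []
        for name, length, frame in active:
            if frame < length:
                next_active.append((name, length, frame + 1))
            elif queue:  # sequence ended: start the next waiting grouping
                new_name, new_length = queue.popleft()
                next_active.append((new_name, new_length, 1))
        active = next_active
    return condensed

images = schedule_condensed_images([("AB", 5), ("CD", 10)], [("EFG", 3)])
# images[4] == [("AB", 5), ("CD", 5)]; images[5] == [("EFG", 1), ("CD", 6)]
```

With these lengths the band is concentrated into 10 condensed images, and the last two contain only CD.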

S103: perform video concentration on the video to be concentrated according to the moving target combination in each condensed image.

In the present embodiment, the second video-frequency bands of the video to be concentrated can be concentrated in parallel. Moreover, when each second video-frequency band is concentrated, the moving target combination of each condensed image can be determined in turn, and the current condensed image can be formed as soon as its moving target combination has been determined. This can effectively improve the concentration speed.

When each frame of condensed image is generated, each moving target image under the corresponding moving target combination can be pasted into the corresponding position in the background image (the same position as in the image of the second video-frequency band). This yields a series of condensed images, which are spliced in the order in which they were formed to obtain the concentrated video segment of the second video-frequency band. To merge the moving target images into one condensed image, a blending algorithm can be used, such as the Poisson blending algorithm.

Since each second video-frequency band of the video to be concentrated corresponds to one concentrated video segment, the concentrated video segments are spliced according to the playing order of their second video-frequency bands to obtain the final concentrated video.

In summary, the video concentration method provided in this embodiment obtains a video to be concentrated that includes multiple moving targets and selects a moving target combination for each condensed image, where the condensed images are the frame images obtained after the video to be concentrated is concentrated; video concentration is then performed on the video to be concentrated according to the moving target combination in each condensed image. The present embodiment can thus select a reasonable moving target combination for each frame of condensed image based on the occupied area and occupied position of each moving target in each frame image of the video to be concentrated, so that the moving targets in each condensed image occupy the image space as fully as possible while the overlap between different moving targets in each condensed image is as small as possible, which improves the concentration precision of the concentrated video.

Second embodiment

It should be noted that the present embodiment describes specific implementations of step S201 and step S202 in the first embodiment.

In the present embodiment, referring to the flow diagram of cluster-based grouping shown in Fig. 3, step S201 in the first embodiment, "obtaining each initial packet by dividing moving targets that overlap in the video to be concentrated into one group", may include the following steps S301-S302:

S301: extract the cluster feature corresponding to each moving target from the video to be concentrated.

In the present embodiment, a clustering algorithm can be used to divide all overlapping moving targets in the video to be concentrated into one group, so as to obtain the initial packets, each of which includes one or more moving targets. To obtain the initial packets by clustering, the cluster feature of each moving target needs to be extracted from the video to be concentrated, i.e., each moving target corresponds to one group of cluster features. The cluster feature may include one or more of the following for the corresponding moving target in the video to be concentrated: moving direction information, translational speed information, location information when first appearing, and temporal information when first appearing.

Specifically, after the moving target sequence of each moving target in the video to be concentrated has been obtained (that is, the frame images of the video to be concentrated that contain the corresponding moving target), a group of cluster features is extracted for each moving target from its moving target sequence.

Take one moving target as an example, say moving target 1 with its moving target sequence 1. The moving direction information and translational speed information of moving target sequence 1 are extracted, together with the location information and temporal information of moving target 1 when it first appears in the video to be concentrated. These four items, "moving direction information", "translational speed information", "location information", and "temporal information", are used as the cluster feature of moving target 1. The ways of obtaining these four items are described below:

(1) In each frame image of the video to be concentrated, each moving target can be enclosed in a rectangle frame that frames the edge of the moving target image. Then, for moving target sequence 1 of moving target 1, the offset of the center point coordinates of the rectangle frame (the one containing moving target 1) between every two adjacent frame images of the sequence can be calculated. For example, if moving target sequence 1 has 5 frame images, 4 offsets are calculated. The offsets are averaged to obtain the mean offset (Δx, Δy), and a direction value derived from this mean offset is used as the "moving direction information" of moving target 1 in the video to be concentrated.

(2) After the mean offset (Δx, Δy) has been calculated in (1), a speed value derived from (Δx, Δy) is used as the "translational speed information" of moving target 1 in the video to be concentrated.

(3) For moving target sequence 1 of moving target 1, the center point coordinates (x, y) of the rectangle frame (the one containing moving target 1) in the first frame image of the sequence can be determined, and the coordinates (x, y) are used as the "location information" of moving target 1 when it first appears in the video to be concentrated.

(4) The frame images of the video to be concentrated can be numbered consecutively in advance according to the playing order, so that each frame image corresponds to one frame number. For moving target sequence 1 of moving target 1, the frame number f of the first frame image in the sequence can then be determined and used as the "temporal information" of moving target 1 when it first appears in the video to be concentrated.

This gives the cluster feature of moving target 1, consisting of the moving direction information, the translational speed information, the location (x, y), and the frame number f. The parameters in the cluster feature can additionally be normalized.
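The four items (1) to (4) can be sketched in Python for a single moving target sequence. The expressions for direction and speed are not given in the text above, so the sketch assumes the angle `atan2(Δy, Δx)` and the Euclidean length of the mean offset; these two choices, the function name, and the track layout are illustrative assumptions, and normalization is left out.

```python
import math

def cluster_feature(track):
    """track: list of (frame number, (x1, y1, x2, y2)) in playing order,
    where each tuple of four values is the rectangle frame of the moving target."""
    centers = [((x1 + x2) / 2, (y1 + y2) / 2) for _, (x1, y1, x2, y2) in track]
    steps = list(zip(centers, centers[1:]))  # 5 frame images give 4 offsets
    if steps:
        dx = sum(b[0] - a[0] for a, b in steps) / len(steps)
        dy = sum(b[1] - a[1] for a, b in steps) / len(steps)
    else:
        dx = dy = 0.0
    direction = math.atan2(dy, dx)  # assumed form of the moving direction information
    speed = math.hypot(dx, dy)      # assumed form of the translational speed information
    x, y = centers[0]               # location when first appearing
    f = track[0][0]                 # frame number when first appearing
    return direction, speed, x, y, f

track = [(10 + i, (3 * i, 4 * i, 3 * i + 2, 4 * i + 2)) for i in range(5)]
feature = cluster_feature(track)
```

For this track the mean offset is (3, 4), so the assumed speed value is 5.0 and the target first appears at (1.0, 1.0) in frame 10.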

By obtaining cluster features in the same way as for moving target 1, the cluster feature of every moving target in the video to be concentrated can be obtained.

S302: according to the cluster feature of each moving target, divide the moving targets that overlap into one group to obtain the initial packets.

In the present embodiment, a clustering algorithm can divide all overlapping moving targets into one group based on the cluster feature of each moving target, thereby obtaining the initial packets. For example, the K-means clustering algorithm can be used for grouping.

In one implementation of the present embodiment, step S302 may specifically include the following steps B1-B3:

Step B1: for each of M different preset cluster numbers, cluster the moving targets according to their cluster features, obtaining M cluster results.

Before clustering, the number of clusters needs to be preset, and clustering is then performed with that number. In the present embodiment, several different cluster numbers can be chosen, say M of them, so that M cluster results are obtained, each with a different number of clusters. It should be noted that the present embodiment does not limit the size of M, as long as M is less than the total number of moving targets.

For example, suppose the total number of moving targets in the video to be concentrated is N (N ≥ 2) and the clustering algorithm is K-means. An initial cluster number of N-1 can be set first, i.e., the moving targets in the video to be concentrated are preliminarily divided into N-1 groups. To do so, N-1 cluster features are arbitrarily selected from the N cluster features as cluster centres. Each remaining cluster feature is assigned, according to its similarity (distance) to these cluster centres, to the most similar cluster (the cluster represented by that cluster centre). The cluster centre of each new cluster (the mean of each dimension of all cluster features in the cluster) is then recalculated, and this process is repeated until clustering ends, yielding N-1 clusters. The cluster number is then reset to N-2 and clustering is performed in the same way to obtain N-2 clusters, and so on. Finally, the cluster number is set to N/2 and clustering is performed in the same way to obtain N/2 clusters. With these M cluster numbers N-1, N-2, ..., N/2, M cluster results are obtained.

Step B2: for each of the M cluster results, calculate the error between the cluster feature of each moving target in the cluster result and the corresponding cluster centre feature, and calculate the sum of the squares of these errors to obtain the error sum of squares of the cluster result.

After the M cluster results have been obtained in step B1, each cluster result contains multiple clusters, and each cluster has one cluster centre. Within a cluster result, the cluster centre feature of each cluster can be determined, and the error between the cluster feature of each moving target in a cluster and the cluster centre feature of that cluster can be calculated. The sum of the squares of these errors over all clusters is used as the error sum of squares of the cluster result; the smaller the error sum of squares, the better the clustering effect of the cluster result.

Step B3: select the cluster result with the minimum error sum of squares among the M cluster results, and use each cluster in the selected cluster result as an initial packet.

With M cluster results there are M corresponding error sums of squares. Since a smaller error sum of squares indicates a better clustering effect, the smallest error sum of squares can be selected, i.e., the cluster result corresponding to it, and the moving targets in each cluster of that cluster result form one initial packet.
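Steps B1 to B3 can be sketched in pure Python. This is a minimal illustration rather than a production clusterer: the initial cluster centres are simply the first k cluster features (the text allows an arbitrary choice), squared Euclidean distance serves as the similarity, and the function names and the iteration cap are assumptions.

```python
def dist2(p, q):
    """Squared Euclidean distance between two cluster features."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=50):
    """Step B1 for one cluster number k; also returns the error sum of squares (B2)."""
    centers = [list(p) for p in points[:k]]  # assumption: first k features as centres
    assign = None
    for _ in range(iterations):
        new_assign = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        if new_assign == assign:
            break  # assignments are stable, clustering has ended
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # new centre: mean of each dimension
                centers[c] = [sum(v) / len(members) for v in zip(*members)]
    sse = sum(dist2(p, centers[a]) for p, a in zip(points, assign))
    return assign, sse

def select_initial_packets(points, cluster_numbers):
    """Step B3: keep the cluster result with the minimum error sum of squares."""
    results = {k: kmeans(points, k) for k in cluster_numbers}
    best_k = min(results, key=lambda k: results[k][1])
    assign, sse = results[best_k]
    packets = {}
    for index, cluster in enumerate(assign):
        packets.setdefault(cluster, []).append(index)
    return best_k, sse, list(packets.values())

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
best_k, sse, packets = select_initial_packets(points, [3, 2])
```

Here N = 4, so the candidate cluster numbers are N-1 = 3 and N/2 = 2; `packets` lists the indices of the moving targets in each initial packet.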

In the present embodiment, referring to the schematic diagram of grouping adjustment based on event detection shown in Fig. 4, step S202 in the first embodiment, "obtaining each first object grouping by performing event detection on the video to be concentrated and adjusting the initial packets", may include the following steps S401-S403:

S401: divide the video to be concentrated into multiple first video-frequency bands, and count the number of moving targets in each first video-frequency band.

In the present embodiment, the video to be concentrated can be divided into multiple video segments by time, for example one video segment every 5 seconds; the duration of a video segment can be set based on experience or experimental results. Here, each video segment obtained by this division is defined as a first video-frequency band. For each first video-frequency band, the number of different moving targets appearing in it is counted.

It can be understood that counting the number of moving targets in each first video-frequency band amounts to measuring the density distribution of moving targets over the first video-frequency bands, which can be represented by a histogram or in other forms. For example, suppose the total duration of the video to be concentrated is 30 minutes and the duration of a first video-frequency band is set to 5 s, giving 360 first video-frequency bands. The number of different moving targets appearing in each first video-frequency band is then counted; a moving target that appears in several frame images of one first video-frequency band is counted only once. The statistical result is given here as a histogram, the moving target density statistics diagram shown in Fig. 5. The abscissa of Fig. 5 is the number of each first video-frequency band, and the ordinate is the number of moving targets appearing in the first video-frequency band with that number. For instance, "1" on the abscissa denotes the first of the first video-frequency bands, i.e., the video of the period 00:00:00 to 00:00:05 in the video to be concentrated, and a corresponding ordinate of "1" means that 1 moving target appears in it. Proceeding in this way, the histogram of the moving target numbers of all first video-frequency bands can be drawn.

S402: perform event detection in each first video-frequency band whose counted number reaches a preset number threshold.

The number of moving targets appearing in each first video-frequency band can be compared with a preset number threshold (which can be set based on experience or experimental results). If a moving target number is greater than or equal to the number threshold, event detection can be performed in the corresponding first video-frequency band. The events to be detected can be preset event types such as a crowd gathering, running, leaving a package behind, and fighting.
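The counting in S401 and the threshold test in S402 can be sketched together in Python. The frame rate of 25 frames per second, the function names, and the track layout (target identifier mapped to the frame indices where it appears) are assumptions made for illustration.

```python
def count_targets_per_band(tracks, total_frames, fps=25, band_seconds=5):
    """Count the distinct moving targets in each first video-frequency band.
    A target seen in several frames of one band is counted only once."""
    band_len = fps * band_seconds
    n_bands = -(-total_frames // band_len)  # ceiling division
    seen = [set() for _ in range(n_bands)]
    for target_id, frames in tracks.items():
        for frame in frames:
            seen[frame // band_len].add(target_id)
    return [len(s) for s in seen]

def bands_to_inspect(counts, threshold):
    """Indices of the bands whose target count reaches the number threshold."""
    return [i for i, n in enumerate(counts) if n >= threshold]

tracks = {"t1": range(0, 200), "t2": range(130, 140)}
counts = count_targets_per_band(tracks, total_frames=30 * 60 * 25)
selected = bands_to_inspect(counts, threshold=2)
```

A 30-minute video at the assumed frame rate yields 360 bands; `counts` is the data behind a histogram such as Fig. 5, and only the second band reaches the threshold of 2 here.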

In the present embodiment, the purpose of event detection is to assign all moving targets that participate in the same event (such as a crowd gathering) to the same group, so as to preserve the integrity of the moving targets in the same event. In addition, because event detection can further enlarge the number of moving targets in each group, more time can be saved in the subsequent video concentration.

Specifically, when event detection is performed on a first video-frequency band whose moving target number reaches the preset number threshold, optical flow information can be extracted from several consecutive frame images of the first video-frequency band (the number of frame images can be set based on experience or experimental results). That is, a feature value is computed for each pixel of each of these frame images, pixels whose feature value exceeds a preset threshold are taken as feature points, and the displacement information of these feature points is used as the optical flow information. The optical flow information is then input into a pre-built convolutional neural network (CNN), and the CNN model analyzes the motion of the moving targets in the first video-frequency band based on the optical flow information to predict whether the first video-frequency band contains a preset event such as a crowd gathering.

S403: if moving targets belonging to different initial packets are detected to participate in the same event in a first video-frequency band, merge the moving targets of these different initial packets into the same group, and use each merged group and each initial packet that was not merged as a first object grouping.

The purpose of the event detection in step S402 is to merge the moving targets that participate in the same event into one group. That is, if event detection finds that multiple moving targets that are not in the same initial packet take part in the same event, the moving targets of these different initial packets need to be merged into one group. Each group obtained by merging initial packets and each initial packet that was not merged is then used as a first object grouping. In this way, the initial packets obtained by clustering are adjusted, forming one or more first object groupings.

For example, as shown in Fig. 5, suppose the preset number threshold is 10. An event of interest (which can be any preset type of event, such as a crowd gathering) is then considered possible in every first video-frequency band whose moving target number reaches 10. If an event of interest, such as fighting, is detected, the moving targets of the different initial packets involved in that event are merged into the same group.

It should be noted that, if event detection on a first video-frequency band determines that moving targets in two or more initial packets participate in the same event, but some moving targets of these initial packets do not appear in that first video-frequency band, these initial packets are still assigned to the same group. For example, suppose some moving targets of initial packet 1 and initial packet 2 appear in the first video-frequency band and take part in an event, while other moving targets of initial packet 1 and/or initial packet 2 do not appear in that first video-frequency band. All moving targets of initial packet 1 and initial packet 2 are nevertheless merged into one group, which becomes one first object grouping.
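The merging rule of S403, including the case where only part of an initial packet takes part in the event, can be sketched with a small union-find structure in Python. The event detector itself is not modelled: each detected event is simply given as the set of participating moving targets. The function name and the data layout are assumptions.

```python
def merge_packets_by_events(initial_packets, events):
    """initial_packets: dict mapping a packet name to its set of moving targets.
    events: list of sets, each holding the targets detected in one event.
    Returns the first object groupings as a list of sets."""
    parent = {name: name for name in initial_packets}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    owner = {t: name for name, targets in initial_packets.items() for t in targets}
    for participants in events:
        roots = sorted({find(owner[t]) for t in participants if t in owner})
        for root in roots[1:]:
            parent[root] = roots[0]  # merge whole packets, not single targets

    merged = {}
    for name, targets in initial_packets.items():
        merged.setdefault(find(name), set()).update(targets)
    return list(merged.values())

packets = {"packet1": {"A", "B"}, "packet2": {"C", "D"}, "packet3": {"E"}}
groupings = merge_packets_by_events(packets, events=[{"A", "C"}])
```

Only A and C were seen in the event, yet B and D are carried along, so the result is one grouping with A, B, C, D and one with E.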

In summary, the present embodiment can group the moving targets preliminarily with a clustering algorithm to obtain the initial packets, and then use event detection to merge the initial packets that participate in the same event into the same group, thereby obtaining the first object groupings. In the subsequent video concentration, the moving targets of one first object grouping can then be concentrated together, i.e., the user can see all moving targets of the same first object grouping simultaneously in the same condensed image of the concentrated video, which improves the concentration effect.

Third embodiment

This embodiment introduces a video concentration device; for related content, refer to the method embodiments above.

Referring to Fig. 6, which is a schematic diagram of the composition of a video concentration device provided in this embodiment, the device includes:

a to-be-concentrated video acquisition unit 601, configured to obtain a video to be concentrated, the video to be concentrated including multiple moving targets;

a target combination selection unit 602, configured to select a moving target combination mode for each condensed image, each condensed image being a frame image obtained after the video to be concentrated is concentrated;

a video concentration unit 603, configured to perform video concentration on the video to be concentrated according to the moving target combination mode in each condensed image.

In one implementation of this embodiment, the target combination selection unit 602 includes:

In one implementation of this embodiment, the target combination selection subunit includes:

In one implementation of this embodiment, the initial grouping subunit includes:

In one implementation of this embodiment, the cluster feature includes at least one of the following for the corresponding moving target in the video to be concentrated: moving direction information, moving speed information, position information at first appearance, and time information at first appearance.
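As an illustration of how such a cluster feature might be assembled, the sketch below derives the four kinds of information from a target trajectory. This application does not state how direction and speed are computed; using the net displacement between first and last appearance, and representing a trajectory as a list of `(frame_index, x, y)` tuples, are assumptions made only for this example, as is the function name `cluster_feature`.

```python
import math


def cluster_feature(track):
    """Build a feature tuple for one moving target.

    track: non-empty list of (frame_index, x, y) tuples in order of
    appearance (a hypothetical trajectory format).
    Returns (direction, speed, first_x, first_y, first_frame).
    """
    first_frame, x0, y0 = track[0]
    last_frame, x1, y1 = track[-1]
    dx, dy = x1 - x0, y1 - y0
    # Guard against a single-frame track to avoid division by zero.
    frames = max(last_frame - first_frame, 1)
    direction = math.atan2(dy, dx)        # moving direction, in radians
    speed = math.hypot(dx, dy) / frames   # average displacement per frame
    return (direction, speed, x0, y0, first_frame)
```

Feature tuples of this kind could then be passed to a clustering algorithm; in practice the components would likely need to be scaled to comparable ranges first, which this sketch omits.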

In one implementation of this embodiment, the initial group formation subunit includes:

In one implementation of this embodiment, the event detection subunit includes:

In one implementation of this embodiment, the combination selection subunit includes:

In one implementation of this embodiment, the second division subunit is specifically configured to segment the video to be concentrated according to the total number of frames of the video to be concentrated and the number of first target groups.

In one implementation of this embodiment, the mode determination subunit is specifically configured to generate the objective function value corresponding to each second target group and to select the second target group corresponding to the minimum objective function value.
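The claims describe the objective function value as being generated from a scene utilization rate and an overlap loss rate, with a higher scene utilization rate giving a lower objective function value. The exact formula is not reproduced in this section, so the sketch below uses an assumed form, `overlap_weight * overlap_loss + (1 - utilization)`, evaluated on a single frame with axis-aligned bounding boxes `(x, y, w, h)`. The function names, the box representation, and the weighting are all hypothetical.

```python
def box_cells(box):
    """Return the set of pixel cells covered by an (x, y, w, h) box."""
    x, y, w, h = box
    return {(i, j) for i in range(x, x + w) for j in range(y, y + h)}


def objective_value(boxes, width, height, overlap_weight=1.0):
    """Assumed objective: lower is better.

    utilization : fraction of the background covered by at least one target.
    overlap_loss: fraction of total target area lost to overlap.
    """
    cells = [box_cells(b) for b in boxes]
    covered = set().union(*cells)
    total = sum(len(c) for c in cells)
    utilization = len(covered) / (width * height)
    overlap_loss = (total - len(covered)) / total if total else 0.0
    return overlap_weight * overlap_loss + (1.0 - utilization)


def select_second_target_group(candidates, width, height):
    """Pick the candidate whose boxes give the minimum objective value.

    candidates: dict mapping a candidate name to its list of boxes.
    """
    return min(candidates, key=lambda name: objective_value(candidates[name], width, height))
```

For example, on a 10 x 10 background, a candidate with boxes `(0, 0, 4, 4)` and `(2, 2, 4, 4)` overlaps in four cells, while a candidate with boxes `(0, 0, 4, 4)` and `(5, 5, 4, 4)` has no overlap and covers more of the scene, so the second candidate is selected. A fuller version would aggregate the value over every frame of the second video segment rather than a single frame.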

Further, an embodiment of this application also provides a video concentration device, including: a processor, a memory, and a system bus.

The processor and the memory are connected through the system bus.

Further, an embodiment of this application also provides a computer-readable storage medium. Instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, they cause the terminal device to execute any implementation of the video concentration method described above.

Further, an embodiment of this application also provides a computer program product which, when run on a terminal device, causes the terminal device to execute any implementation of the video concentration method described above.

From the description of the above implementations, those skilled in the art can clearly understand that all or part of the steps in the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of this application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway) to execute the methods described in the embodiments, or in certain parts of the embodiments, of this application.

It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.

It should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The above description of the disclosed embodiments enables those skilled in the art to implement or use this application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A video concentration method, characterized by comprising:

selecting a moving target combination mode for each condensed image, each condensed image being a frame image obtained after the video to be concentrated is concentrated;

performing video concentration on the video to be concentrated according to the moving target combination mode in each condensed image.

2. The method according to claim 1, characterized in that selecting a moving target combination mode for each condensed image comprises:

selecting a moving target combination mode for each condensed image according to the occupied area and occupied position of each moving target.

3. The method according to claim 2, characterized in that selecting a moving target combination mode for each condensed image according to the occupied area and occupied position of each moving target comprises:

obtaining initial groups by placing the moving targets that overlap in the video to be concentrated into one group, each initial group including at least one moving target;

adjusting the grouping of the initial groups by performing event detection on the video to be concentrated, to obtain first target groups;

selecting a moving target combination mode for each condensed image according to the occupied area and occupied position of each moving target and according to the first target groups.

4. The method according to claim 3, characterized in that obtaining initial groups by placing the moving targets that overlap in the video to be concentrated into one group comprises:

placing the moving targets that overlap into one group according to the cluster feature corresponding to each moving target, to obtain the initial groups.

5. The method according to claim 4, characterized in that the cluster feature includes at least one of the following for the corresponding moving target in the video to be concentrated: moving direction information, moving speed information, position information at first appearance, and time information at first appearance.

6. The method according to claim 4, characterized in that placing the moving targets that overlap into one group according to the cluster feature corresponding to each moving target, to obtain the initial groups, comprises:

for each of M different preset cluster numbers, clustering the moving targets according to the cluster feature corresponding to each moving target, to obtain M clustering results;

for each clustering result among the M clustering results, calculating the error between the cluster feature of each moving target in the clustering result and the cluster center feature of the clustering result, and calculating the sum of the squares of the errors, to obtain the error sum of squares corresponding to the clustering result;

selecting, from the error sums of squares corresponding to the M clustering results, the clustering result corresponding to the minimum value, and taking each cluster in the selected clustering result as one initial group.

7. The method according to claim 3, characterized in that adjusting the grouping of the initial groups by performing event detection on the video to be concentrated, to obtain first target groups, comprises:

if it is detected that moving targets belonging to different initial groups participate in the same event in a first video segment, merging the moving targets of the different initial groups into the same group, and taking each group obtained by merging and each initial group that was not merged as a first target group.

8. The method according to any one of claims 3 to 7, characterized in that selecting a moving target combination mode for each condensed image according to the occupied area and occupied position of each moving target and according to the first target groups comprises:

determining the first target groups that appear in the second video segment, and combining the first target groups that appear in different ways, to obtain second target groups;

for each second target group, determining the occupied area and occupied position of each moving target of the second target group in each frame image of the second video segment;

selecting one grouping mode from among the second target groups according to the occupied area and occupied position of each moving target of each second target group, and determining, according to the selected grouping mode, the moving target combination mode corresponding to each condensed image obtained by concentrating the second video segment.

9. The method according to claim 8, characterized in that segmenting the video to be concentrated comprises:

segmenting the video to be concentrated according to the total number of frames of the video to be concentrated and the number of first target groups.

10. The method according to claim 8, characterized in that selecting one grouping mode from among the second target groups according to the occupied area and occupied position of each moving target of each second target group comprises:

generating the objective function value corresponding to each second target group, and selecting the second target group corresponding to the minimum objective function value;

wherein the objective function value is generated according to a scene utilization rate and an overlap loss rate; the scene utilization rate reflects the degree to which the occupied areas of the moving targets in the corresponding second target group occupy the background image, and the scene utilization rate is inversely proportional to the objective function value; the overlap loss rate reflects the degree of overlap between the moving targets in the corresponding second target group that results from their occupied areas and occupied positions in the background image.

11. A video concentration device, characterized by comprising:

a to-be-concentrated video acquisition unit, configured to obtain a video to be concentrated, the video to be concentrated including multiple moving targets;

a target combination selection unit, configured to select a moving target combination mode for each condensed image, each condensed image being a frame image obtained after the video to be concentrated is concentrated;

a video concentration unit, configured to perform video concentration on the video to be concentrated according to the moving target combination mode in each condensed image.

12. The device according to claim 11, characterized in that the target combination selection unit includes:

a target data acquisition subunit, configured to determine the occupied area and occupied position of each moving target in each frame image of the video to be concentrated;

a target combination selection subunit, configured to select a moving target combination mode for each condensed image according to the occupied area and occupied position of each moving target.

13. The device according to claim 12, characterized in that the target combination selection subunit includes:

an initial grouping subunit, configured to obtain initial groups by placing the moving targets that overlap in the video to be concentrated into one group, each initial group including at least one moving target;

an event detection subunit, configured to adjust the grouping of the initial groups by performing event detection on the video to be concentrated, to obtain first target groups;

a combination selection subunit, configured to select a moving target combination mode for each condensed image according to the occupied area and occupied position of each moving target and according to the first target groups.

14. The device according to claim 13, characterized in that the initial grouping subunit includes:

a cluster feature extraction subunit, configured to extract the cluster feature corresponding to each moving target from the video to be concentrated;

an initial group formation subunit, configured to place the moving targets that overlap into one group according to the cluster feature corresponding to each moving target, to obtain the initial groups.

15. The device according to claim 14, characterized in that the initial group formation subunit includes:

a target clustering subunit, configured to cluster the moving targets, for each of M different preset cluster numbers, according to the cluster feature corresponding to each moving target, to obtain M clustering results;

a sum-of-squares calculation subunit, configured to, for each clustering result among the M clustering results, calculate the error between the cluster feature of each moving target in the clustering result and the cluster center feature of the clustering result, and calculate the sum of the squares of the errors, to obtain the error sum of squares corresponding to the clustering result;

a clustering result selection subunit, configured to select, from the error sums of squares corresponding to the M clustering results, the clustering result corresponding to the minimum value, and to take each cluster in the selected clustering result as one initial group.

16. The device according to claim 13, characterized in that the event detection subunit includes:

an event detection subunit, configured to perform event detection in each first video segment in which the counted number reaches the preset number threshold;

a target grouping subunit, configured to, if it is detected that moving targets belonging to different initial groups participate in the same event in a first video segment, merge the moving targets of the different initial groups into the same group, and take each group obtained by merging and each initial group that was not merged as a first target group.

17. The device according to any one of claims 13 to 16, characterized in that the combination selection subunit includes:

a group combination subunit, configured to determine the first target groups that appear in the second video segment, and to combine the first target groups that appear in different ways, to obtain second target groups;

an occupancy determination subunit, configured to determine, for each second target group, the occupied area and occupied position of each moving target of the second target group in each frame image of the second video segment;

a mode determination subunit, configured to select one grouping mode from among the second target groups according to the occupied area and occupied position of each moving target of each second target group, and to determine, according to the selected grouping mode, the moving target combination mode corresponding to each condensed image obtained by concentrating the second video segment.

18. The device according to claim 17, characterized in that the mode determination subunit is specifically configured to generate the objective function value corresponding to each second target group, and to select the second target group corresponding to the minimum objective function value.

19. A video concentration device, characterized by comprising: a processor, a memory, and a system bus;

the processor and the memory are connected through the system bus;

the memory is configured to store one or more programs, the one or more programs including instructions that, when executed by the processor, cause the processor to perform the method according to any one of claims 1 to 10.

20. A computer-readable storage medium, characterized in that instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, they cause the terminal device to perform the method according to any one of claims 1 to 10.

21. A computer program product, characterized in that, when the computer program product is run on a terminal device, it causes the terminal device to perform the method according to any one of claims 1 to 10.