CN109165565A - Video object discovery and segmentation method based on coupled dynamic Markov networks - Google Patents

Video object discovery and segmentation method based on coupled dynamic Markov networks

Info

Publication number
CN109165565A
Authority
CN
China
Prior art keywords
target
frame
video
indicate
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810865881.1A
Other languages
Chinese (zh)
Inventor
王乐 (Le Wang)
刘子熠 (Ziyi Liu)
郑南宁 (Nanning Zheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN201810865881.1A
Publication of CN109165565A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Abstract

The invention discloses a video object discovery and segmentation method based on coupled dynamic Markov networks, belonging to the fields of computer vision and pattern recognition. The steps include: first, model the two problems of video object discovery and video object segmentation using dynamic Markov networks; then, using appearance and video temporal information, model the likelihood functions, compatibility functions, and motion functions in the model, and initialize the object discovery model; finally, solve the overall model by belief propagation, obtaining the video object discovery and segmentation results with maximum a posteriori probability. The invention can solve the video object discovery and segmentation problems simultaneously and can realize the joint optimization of the two problems, so that the accuracy of both object discovery and object segmentation is substantially improved.

Description

Video object discovery and segmentation method based on coupled dynamic Markov networks
Technical field
The invention belongs to the technical field of computer vision and pattern recognition, and in particular relates to a video object discovery and segmentation method based on coupled dynamic Markov networks.
Background art
With the rapid growth of video data, the importance of automatic video processing technology grows day by day, and the problems of video object discovery and object segmentation have increasing theoretical research value and practical application value. How to automatically discover the target contained in a video and localize it in both time and space is still a difficult point in the current video processing field. This practical problem comprises two subproblems, namely video object discovery and video object segmentation. The input is a video sequence containing both the target and the background; the output is divided into two parts: one is the temporal localization of the target, i.e., the object discovery result; the other is the spatial localization of the target, i.e., the object segmentation result. Traditional methods model the two subproblems independently, without considering the association between them, so the two subproblems cannot be jointly optimized and solved.
Summary of the invention
The purpose of the present invention is to provide a video object discovery and segmentation method based on coupled dynamic Markov networks to solve the above technical problem. By modeling with a unified coupled dynamic Markov network model, the invention unifies the two subproblems of video object discovery and video object segmentation under one framework, so that the two subproblems can be jointly optimized and solved; the solutions of the two subproblems reinforce each other, and finally both subproblems can reach higher accuracy.
To achieve the above objective, the invention adopts the following technical scheme:
A video object discovery and segmentation method based on coupled dynamic Markov networks, comprising the following steps:

Step 1: given a video sequence $V = \{f_t\}_{t=1}^{T}$ containing a target, where $f_t$ denotes the $t$-th frame of the video and the video sequence $V$ contains $T$ frames in total, model the video sequence $V$ using dynamic Markov networks to obtain a coupled dynamic Markov network framework model.

The obtained coupled dynamic Markov network framework model comprises a dynamic Markov network for object discovery and a dynamic Markov network for object segmentation; the observation level of the dynamic Markov network for object discovery is the image's object candidate regions; the observation level of the dynamic Markov network for object segmentation is the image's superpixels.

Step 2: using the appearance contained in the video sequence $V$ of step 1 and the temporal information between adjacent video frames, model the observation likelihood functions, the compatibility functions, and the inter-frame motion functions in the coupled dynamic Markov network framework model obtained in step 1, obtaining the coupled dynamic Markov network model; and initialize the object discovery model in the obtained coupled dynamic Markov network model, obtaining an initial object discovery result.

Step 3: solve the coupled dynamic Markov network model obtained in step 2 using the belief propagation algorithm, obtaining the video object discovery and video object segmentation results.
Further, in step 1, the specific steps of modeling the video sequence $V$ using dynamic Markov networks include:

Step 1.1: denote the object discovery labels of the video sequence $V$ as $L = \{L_t\}_{t=1}^{T}$ and the corresponding object discovery observations as $O = \{O_t\}_{t=1}^{T}$ with $O_t = \{o_{t,i}\}_{i=1}^{K}$, where $o_{t,i}$ denotes the $i$-th candidate region of $f_t$;

Step 1.2: denote the object segmentation labels of the video sequence $V$ as $B = \{B_t\}_{t=1}^{T}$ and the corresponding object segmentation observations as $S = \{S_t\}_{t=1}^{T}$ with $S_t = \{s_{t,j}\}_{j=1}^{J}$, where $s_{t,j}$ denotes the $j$-th superpixel of $f_t$;

Step 1.3: a compatibility function $\Psi(L,B)$ exists between $L$ and $B$.
Further, in step 2, modeling the observation likelihood functions, the compatibility functions, and the inter-frame motion functions in the obtained coupled dynamic Markov network framework model specifically includes the following steps:

Step 2.1: score each candidate region, then establish the object discovery likelihood function $p(O_t|L_t)$. The score of each candidate region is

$r(o_{t,i}) = r_s(o_{t,i}) \cdot r_a(o_{t,i}) \cdot r_m(o_{t,i})$

where $r_s(o_{t,i})$ is the saliency score, expressing how salient the corresponding region is; $r_a(o_{t,i})$ is the objectness score, expressing the confidence that the corresponding region contains an object; and $r_m(o_{t,i})$ is the motion score, expressing the confidence that the corresponding region contains a persistently moving object. After the score $r(o_{t,i})$ of each candidate target is obtained, the object discovery likelihood function is established as:

[formula omitted in the source text]

where $\bar r(o_{t,i})$ denotes the score $r(o_{t,i})$ after normalization.

Step 2.2: establish the object segmentation likelihood function $p(S_t|B_t)$. Specifically, learn a foreground Gaussian mixture model and a background Gaussian mixture model for the video, the foreground model being denoted $h_1$ and the background model $h_0$; the segmentation likelihood function is established as:

[formula omitted in the source text]
Step 2.3: establish the compatibility functions. The compatibility function from $L_t$ to $B_t$ is defined as

$\Psi_{LB}(L_t,B_t) = \mathrm{IoU}(o_{t,i}, B_t(1)), \quad i \in \{1,\dots,K\}$

where $B_t(1)$ denotes the currently computed segmentation result and $o_{t,i}$ denotes the $i$-th candidate region of $f_t$. The compatibility function from $B_t$ to $L_t$ is defined as:

[formula omitted in the source text]

where $O_t(1)$ denotes the currently computed object discovery result and $s_{t,j}$ denotes the $j$-th superpixel of $f_t$.
Step 2.4: define the object discovery dynamic model $p(L_t|L_{t-1})$ as:

[formula omitted in the source text]

in which the transition probability of candidate region $o_{t,i}$ with respect to its temporal neighbor $o_{t-1,m}$ appears, where $i$ is the index of the candidate region in the current frame, i.e., frame $t$, and $m$ is the index of the region in the previous frame, i.e., frame $t-1$, chosen as the temporal neighbor of $o_{t,i}$.

The object segmentation dynamic model is defined as:

[formula omitted in the source text]

in which the transition probability of superpixel $s_{t,j}$ with respect to its temporal neighbor $s_{t-1,n}$ appears, where $j$ is the index of the superpixel in the current frame, i.e., frame $t$, and $n$ is the index of the region in the previous frame, i.e., frame $t-1$, chosen as the temporal neighbor of $s_{t,j}$.
Further, in step 2, the initialization of the object discovery model specifically includes the following steps:

(1) divide the video frames, represented as feature vectors, into two classes by a classifier, where the positive samples are frames uniformly sampled from the video and the negative samples are images from an unrelated data set;

(2) train the classifier on the data of step (1), and treat samples whose confidence of belonging to the positive class exceeds 80% as positive samples and samples whose confidence of belonging to the positive class is below 30% as negative samples;

(3) repeat step (2) until the classifier converges;

(4) use the classifier obtained in step (3) to classify all frames of the video sequence $V$ into two classes, obtaining a preliminary classification into frames containing the target and frames not containing the target, thereby realizing the initialization of the object discovery model.
Further, the process of solving the established model using the belief propagation algorithm in step 3 specifically includes the following steps:

(1) compute the messages that take adjacent frames into account, where the message from $B$ to $L$ and the message from $L$ to $B$ are respectively:

[formulas omitted in the source text]

where $O_{0:t}$ denotes the object observations from the start of the video up to time $t$; $O_{T:t}$ denotes the object observations from the end of the video back to time $t$; $S_{0:t}$ denotes the segmentation observations from the start of the video up to time $t$; and $S_{T:t}$ denotes the segmentation observations from the end of the video back to time $t$;

(2) after the messages are obtained in step (1), obtain according to the belief propagation algorithm the object discovery result with maximum a posteriori probability, $p(L_t|O,S)$:

[formula omitted in the source text]

and the object segmentation result with maximum a posteriori probability, $p(B_t|O,S)$:

[formula omitted in the source text]
Compared with the prior art, the invention has the following advantages:

The video object discovery and segmentation method based on coupled dynamic Markov networks of the invention models the two problems of video object discovery and video object segmentation using dynamic Markov networks, couples the two dynamic Markov networks together, and solves the overall model by belief propagation, obtaining the video object discovery and segmentation results with maximum a posteriori probability. The invention can solve the video object discovery and segmentation problems simultaneously and can realize the joint optimization of the two problems; mathematically, it obtains the object discovery result and the object segmentation result with maximum a posteriori probability under both kinds of observations (i.e., $p(L_t|O,S)$ and $p(B_t|O,S)$), so that the accuracy of both object discovery and object segmentation is substantially improved.

Further, when initializing the object discovery model, the invention uses an unsupervised iterative training method that does not depend on manually labeled data. If a method depends on training with hand-labeled data, then matching hand-labeled data must be provided whenever the method is applied to different data, and labeling data is work that consumes large amounts of manpower and time. Because this method does not depend on manually labeled data, it is conducive to the popularization and application of the invention.
Brief description of the drawings

Fig. 1 is a schematic flow diagram of the video object discovery and segmentation method based on coupled dynamic Markov networks of the invention;

Fig. 2 is a schematic diagram of the coupled dynamic Markov network model established by the method of the invention;

Fig. 3 is a schematic flow diagram of the initialization of the object discovery model in the method of the invention;

Fig. 4 is a schematic comparison of the object segmentation results of different methods evaluated on the SegTrack data set;

Fig. 5 is a schematic comparison of the object discovery results of different methods evaluated on the Noisy-ViDiSeg data set;

Fig. 6 is a schematic comparison of the object segmentation results of different methods evaluated on the Noisy-ViDiSeg data set.
Specific embodiments

To illustrate the present invention more clearly, the invention is further described below with reference to the accompanying drawings.
Referring to Figs. 1 to 3, the video object discovery and segmentation method based on coupled dynamic Markov networks of the invention specifically includes the following steps:

Step 1: establish the coupled dynamic Markov network. Given a video sequence $V = \{f_t\}_{t=1}^{T}$ containing a single target, where $f_t$ denotes the $t$-th frame of the video, model $V$ using dynamic Markov networks to obtain the coupled dynamic Markov network framework model, comprising a dynamic Markov network for video object discovery, whose observation level is the image's object candidate regions, and a dynamic Markov network for video object segmentation, whose observation level is the image's superpixels.
The specific modeling procedure in step 1 includes:

(1) for the video sequence $V$ containing a single target, the object discovery labels are $L = \{L_t\}_{t=1}^{T}$, with corresponding observations $O = \{O_t\}_{t=1}^{T}$, $O_t = \{o_{t,i}\}_{i=1}^{K}$, where $o_{t,i}$ denotes the $i$-th candidate region of $f_t$;

(2) for $V$, the object segmentation labels are $B = \{B_t\}_{t=1}^{T}$, with corresponding observations $S = \{S_t\}_{t=1}^{T}$, $S_t = \{s_{t,j}\}_{j=1}^{J}$, where $s_{t,j}$ denotes the $j$-th superpixel of $f_t$;

(3) a compatibility function $\Psi(L,B)$ exists between $L$ and $B$;

(4) temporal information is added to the established coupled dynamic Markov network: we define the image object observations from the start of the video up to time $t$ as $O_{0:t}$ and those from the end of the video back to time $t$ as $O_{T:t}$; similarly, the image segmentation observations from the start of the video up to time $t$ are $S_{0:t}$ and those from the end of the video back to time $t$ are $S_{T:t}$.
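To make the notation concrete, here is a minimal container sketch; the Python layout (lists per frame, slicing for the forward and backward observation sets) is purely an illustrative assumption, since the patent fixes only the symbols.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Illustrative containers for the notation above; the data layout is an
# assumption -- the patent fixes only the symbols.

@dataclass
class FrameObs:
    O_t: List[object]  # candidate regions {o_{t,i}}, i = 1..K (discovery observations)
    S_t: List[object]  # superpixels {s_{t,j}}, j = 1..J (segmentation observations)

def forward_obs(video: Sequence[FrameObs], t: int) -> Sequence[FrameObs]:
    """Observations from the start of the video up to time t (O_{0:t}, S_{0:t})."""
    return video[: t + 1]

def backward_obs(video: Sequence[FrameObs], t: int) -> Sequence[FrameObs]:
    """Observations from the end of the video back to time t (O_{T:t}, S_{T:t})."""
    return video[t:]
```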
Step 2: using the video appearance and the temporal information between adjacent video frames, model the likelihood functions, compatibility functions, and motion functions in the coupled dynamic Markov network, and initialize the object discovery model. The established model is shown in Fig. 2; it contains nodes and edges, where step 1 explains the physical meaning of the nodes and step 2 explains the meaning of the edges and the specific way they are established.

This specifically includes the following steps:
(1) For the object discovery likelihood function $p(O_t|L_t)$, each candidate target must be scored. The scoring is

$r(o_{t,i}) = r_s(o_{t,i}) \cdot r_a(o_{t,i}) \cdot r_m(o_{t,i})$

where $r_s(o_{t,i})$ is the saliency score, expressing how salient the corresponding region is; it is computed by obtaining a saliency map with an existing saliency method and averaging the saliency over the corresponding region. $r_a(o_{t,i})$ is the objectness score, expressing the confidence that the corresponding region contains an object; it is computed as the ratio of the complete edges of the region to all of its edges. $r_m(o_{t,i})$ is the motion score, expressing the confidence that the corresponding region contains a persistently moving object; it is computed like $r_a(o_{t,i})$, except that image edges are replaced by optical flow edges. After $r(o_{t,i})$ is obtained, the object discovery likelihood function is established as:

[formula omitted in the source text]

where $\bar r(o_{t,i})$ denotes the score after all scores in all video frames are normalized.
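As a minimal sketch of this scoring step, the snippet below combines the three per-cue scores and normalizes globally; the random arrays are stand-ins for the saliency, complete-edge, and optical-flow-edge procedures just described, and normalizing by the sum over all candidates in all frames is one plausible reading of the normalization above.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 5, 10                     # frames, candidate regions per frame
r_s = rng.random((T, K))         # saliency score per candidate (stand-in)
r_a = rng.random((T, K))         # objectness score: complete-edge ratio (stand-in)
r_m = rng.random((T, K))         # motion score: optical-flow-edge ratio (stand-in)

r = r_s * r_a * r_m              # combined score r(o_{t,i})
r_bar = r / r.sum()              # normalize over all candidates in all frames

print(r_bar.shape, r_bar.sum())  # (5, 10) 1.0
```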
(2) For the object segmentation likelihood function $p(S_t|B_t)$, two Gaussian mixture models, one for the foreground and one for the background, are learned for the video; the foreground model is $h_1$ and the background model is $h_0$. The object segmentation likelihood function is established as:

[formula omitted in the source text]
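A minimal sketch of this step with scikit-learn follows. Scoring each superpixel by its mean RGB color, using 5 mixture components, and turning the two log-likelihoods into a foreground probability are all assumptions, since the patent's own likelihood formula is not reproduced in the text.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
fg_pixels = rng.normal(0.7, 0.1, size=(2000, 3))  # RGB samples from an initial foreground estimate
bg_pixels = rng.normal(0.3, 0.1, size=(2000, 3))  # RGB samples from the background

h1 = GaussianMixture(n_components=5, random_state=0).fit(fg_pixels)  # foreground GMM h_1
h0 = GaussianMixture(n_components=5, random_state=0).fit(bg_pixels)  # background GMM h_0

sp_colors = rng.random((50, 3))          # mean color of each superpixel s_{t,j}
log_p_fg = h1.score_samples(sp_colors)   # log h_1(s_{t,j})
log_p_bg = h0.score_samples(sp_colors)   # log h_0(s_{t,j})

# Foreground probability per superpixel under equal priors.
fg_prob = 1.0 / (1.0 + np.exp(log_p_bg - log_p_fg))
```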
(3) For the compatibility functions, we define the compatibility function from $L_t$ to $B_t$ as

$\Psi_{LB}(L_t,B_t) = \mathrm{IoU}(o_{t,i}, B_t(1)), \quad i \in \{1,\dots,K\}$

where $B_t(1)$ denotes the currently computed segmentation result. Similarly, we define the compatibility function from $B_t$ to $L_t$ as:

[formula omitted in the source text]

where $O_t(1)$ denotes the currently computed object discovery result.
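A minimal sketch of the IoU computation behind these compatibility functions follows; rasterizing the candidate region to a binary mask so it can be intersected with the segmentation result $B_t(1)$ is an assumption about how a region and a segmentation are compared.

```python
import numpy as np

def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter / union) if union > 0 else 0.0

H, W = 100, 100
seg = np.zeros((H, W), dtype=bool)   # B_t(1): currently computed segmentation
seg[20:60, 30:70] = True

cand = np.zeros((H, W), dtype=bool)  # candidate region o_{t,i} as a mask
cand[25:65, 35:75] = True

print(iou(cand, seg))                # compatibility value Psi_LB for this candidate
```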
(4) We define the object discovery dynamic model $p(L_t|L_{t-1})$ as:

[formula omitted in the source text]

in which the transition probability of candidate region $o_{t,i}$ with respect to its temporal neighbor $o_{t-1,m}$ appears; $i$ is the index of the candidate region in the current frame (i.e., frame $t$), and $m$ is the index of the region in the previous frame (i.e., frame $t-1$) chosen as the temporal neighbor of $o_{t,i}$, specifically calculated as

$m = \arg\max_{i' \in \{1,\dots,K\}} \mathrm{IoU}(o_{t,i}, \mathrm{Warp}(o_{t-1,i'}))$

where $\mathrm{Warp}(o_{t-1,i'})$ is the region $o_{t-1,i'}$ of $f_{t-1}$ warped according to the optical flow into the corresponding region of $f_t$. Further, $\delta_i = \delta(l_{t-1,m}, l_{t,i})$ is an indicator variable that equals 1 when $l_{t-1,m} \neq l_{t,i}$ and 0 otherwise; $\alpha_i = \mathrm{EMD}(h_c(o_{t-1,m}), h_c(o_{t,i}))$ denotes the earth mover's distance between the color histograms of $o_{t-1,m}$ and $o_{t,i}$; and the chi-square distance between the gradient histograms of $o_{t-1,m}$ and $o_{t,i}$ is used as a further term.
Similarly, we define the object segmentation dynamic model as:

[formula omitted in the source text]

in which the transition probability of superpixel $s_{t,j}$ with respect to its temporal neighbor $s_{t-1,n}$ appears; $j$ is the index of the superpixel in the current frame (i.e., frame $t$), and $n$ is the index of the region in the previous frame (i.e., frame $t-1$) chosen as the temporal neighbor of $s_{t,j}$, specifically calculated as

$n = \arg\max_{j' \in \{1,\dots,J\}} \mathrm{IoU}(s_{t,j}, \mathrm{Warp}(s_{t-1,j'}))$

where $\mathrm{Warp}(s_{t-1,j'})$ is the region $s_{t-1,j'}$ of $f_{t-1}$ warped according to the optical flow into the corresponding region of $f_t$. Further, $\delta_j = \delta(b_{t-1,n}, b_{t,j})$ is an indicator variable that equals 1 when $b_{t-1,n} \neq b_{t,j}$ and 0 otherwise; $\omega_j = \| h_m(s_{t-1,n}) - h_m(s_{t,j}) \|_2$ denotes the Euclidean distance between the optical flow gradient histograms of $s_{t-1,n}$ and $s_{t,j}$; $\sigma_j$ is another indicator variable that equals 1 when $s_{t,j}$ and $s_{t-1,n}$ both belong to the foreground obtained by object discovery and 0 otherwise; and $\mu_j = \mathrm{IoU}(s_{t,j}, \mathrm{Warp}(s_{t-1,n}))$ is the IoU score of $s_{t,j}$ and $\mathrm{Warp}(s_{t-1,n})$.
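A minimal sketch of the temporal-neighbor selection shared by both dynamic models follows: each previous-frame region mask is pushed forward along the optical flow and the warped region with the largest IoU against the current region is chosen, as in the two argmax formulas above. The naive forward-splatting warp and the constant toy flow field are illustrative stand-ins for whatever optical flow method is actually used.

```python
import numpy as np

def warp_mask(mask: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Forward-warp a boolean mask by a per-pixel flow field of shape (H, W, 2)."""
    H, W = mask.shape
    ys, xs = np.nonzero(mask)
    new_x = np.clip(np.round(xs + flow[ys, xs, 0]).astype(int), 0, W - 1)
    new_y = np.clip(np.round(ys + flow[ys, xs, 1]).astype(int), 0, H - 1)
    warped = np.zeros_like(mask)
    warped[new_y, new_x] = True
    return warped

def iou(a: np.ndarray, b: np.ndarray) -> float:
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0

H, W = 80, 80
flow = np.full((H, W, 2), 3.0)          # toy flow: everything moves 3 px right/down

prev_regions = []                       # regions o_{t-1,i'} of the previous frame as masks
for start in (10, 40):
    region = np.zeros((H, W), dtype=bool)
    region[start:start + 20, start:start + 20] = True
    prev_regions.append(region)

cur = np.zeros((H, W), dtype=bool)      # current candidate region o_{t,i}
cur[13:33, 13:33] = True

m = max(range(len(prev_regions)),
        key=lambda k: iou(cur, warp_mask(prev_regions[k], flow)))
print(m)                                # -> 0: the region that flows onto o_{t,i}
```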
The initialization process of the object discovery model in step 2 is shown in Fig. 3 and specifically includes the following steps:

(1) divide the video frames, represented as feature vectors, into two classes using a classifier; the positive samples are frames uniformly sampled from the video, which may contain noise, and the negative samples are images from an unrelated data set;

(2) train the classifier on these data, then treat samples whose confidence of belonging to the positive class exceeds 80% as positive samples and samples whose confidence is below 30% as negative samples, and continue training;

(3) repeat (2) until the classifier converges; with the finally obtained classifier, classify all frames of the video into two classes, obtaining a preliminary classification into frames containing the target and frames not containing the target, which is the initialization result of object discovery.
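A minimal sketch of this unsupervised initialization loop follows. Logistic regression on synthetic frame features is a stand-in for the unspecified classifier and frame descriptors, and applying the 80%/30% relabeling thresholds to all samples in every round is one plausible reading of the procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
video_feats = rng.normal(0.5, 1.0, size=(60, 16))        # video frames: noisy positives
unrelated_feats = rng.normal(-0.5, 1.0, size=(60, 16))   # unrelated images: negatives

X = np.vstack([video_feats, unrelated_feats])
y = np.array([1] * len(video_feats) + [0] * len(unrelated_feats))

for _ in range(20):                                  # iterate until the labels stop changing
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p = clf.predict_proba(X)[:, 1]                   # confidence of being positive
    new_y = np.where(p > 0.8, 1, np.where(p < 0.3, 0, y))
    if np.array_equal(new_y, y):                     # converged
        break
    y = new_y

# Initial discovery result: which video frames the final classifier accepts.
contains_target = clf.predict_proba(video_feats)[:, 1] > 0.5
```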
Step 3: solve the established model using the belief propagation algorithm. This specifically includes the following steps:

(1) generalize the belief propagation algorithm to the time series, where the message from $B$ to $L$, $m_{BL}(L_t)$, and the message from $L$ to $B$, $m_{LB}(B_t)$, are respectively:

[formulas omitted in the source text]

(2) After the messages are obtained, the object discovery result with maximum a posteriori probability, $p(L_t|O,S)$, is obtained according to the belief propagation algorithm as:

[formula omitted in the source text]

Similarly, the object segmentation result with maximum a posteriori probability, $p(B_t|O,S)$, is:

[formula omitted in the source text]

What is finally obtained is the pair of posterior probabilities under the two kinds of observations (i.e., $p(L_t|O,S)$ and $p(B_t|O,S)$), which shows mathematically that our final result unifies and jointly optimizes the two subproblems of video object discovery and video object segmentation.

In the above embodiment: $V = \{f_t\}_{t=1}^{T}$ denotes the video sequence containing a single target, where $f_t$ denotes the $t$-th frame of the video and the video contains $T$ frames in total; $L = \{L_t\}_{t=1}^{T}$ denotes the object discovery labels of the video; $O = \{O_t\}_{t=1}^{T}$ denotes the object discovery observations of the video, where $O_t = \{o_{t,i}\}_{i=1}^{K}$ and $o_{t,i}$ denotes the $i$-th candidate region of $f_t$; $B = \{B_t\}_{t=1}^{T}$ denotes the object segmentation labels of the video; $S = \{S_t\}_{t=1}^{T}$ denotes the object segmentation observations of the video, where $S_t = \{s_{t,j}\}_{j=1}^{J}$ and $s_{t,j}$ denotes the $j$-th superpixel of $f_t$; $\Psi(L,B)$ denotes the compatibility function between $L$ and $B$; in particular, the compatibility function from $L_t$ to $B_t$ is $\Psi_{LB}(L_t,B_t)$ and that from $B_t$ to $L_t$ is $\Psi_{BL}(B_t,L_t)$; $p(O_t|L_t)$ denotes the observation likelihood function of video object discovery; $p(S_t|B_t)$ denotes the observation likelihood function of video object segmentation; $p(L_t|L_{t-1})$ denotes the dynamic model of video object discovery; and $p(B_t|B_{t-1})$ denotes the dynamic model of video object segmentation.
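Because the message equations are given only as formulas in the original publication, the snippet below is an illustrative sum-product sketch of the solve on the discovery chain: forward and backward messages run along the time series, the coupling message $m_{BL}$ from the segmentation network is multiplied in, and the argmax of the resulting belief gives the MAP candidate per frame. The exact factor combination is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 6, 4
lik = rng.random((T, K))           # p(O_t | L_t): per-frame candidate likelihoods
trans = rng.random((T - 1, K, K))  # p(L_t | L_{t-1}): temporal transition factors
m_BL = rng.random((T, K))          # coupling messages from the segmentation chain

fwd = np.ones((T, K))              # messages from the start of the video to t
for t in range(1, T):
    fwd[t] = trans[t - 1].T @ (fwd[t - 1] * lik[t - 1] * m_BL[t - 1])
    fwd[t] /= fwd[t].sum()

bwd = np.ones((T, K))              # messages from the end of the video back to t
for t in range(T - 2, -1, -1):
    bwd[t] = trans[t] @ (bwd[t + 1] * lik[t + 1] * m_BL[t + 1])
    bwd[t] /= bwd[t].sum()

belief = fwd * bwd * lik * m_BL    # proportional to p(L_t | O, S)
L_map = belief.argmax(axis=1)      # MAP candidate index per frame
print(L_map)
```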
The present invention is suitable for automatic video processing: it discovers the main target in a video and segments it out from the background. The video object discovery and segmentation method based on coupled dynamic Markov networks of the invention first models the two problems of video object discovery and video object segmentation using dynamic Markov networks; then, using appearance, video temporal information, and similar cues, it models the likelihood functions, compatibility functions, and motion functions in the model and initializes the object discovery model; finally, it solves the overall model by belief propagation, obtaining the video object discovery and segmentation results with maximum a posteriori probability. The invention can solve the video object discovery and segmentation problems simultaneously and can realize the joint optimization of the two problems, so that the accuracy of both object discovery and object segmentation is substantially improved.
Data analysis

Fig. 4 shows the experimental results of different methods on SegTrack, a public data set for evaluating video object segmentation. As seen from the figure, in object segmentation our method obtains the highest results on all videos except "monkey" and "soldier", and considered comprehensively, i.e., comparing average results, it achieves the highest accuracy.

Fig. 5 shows the object discovery evaluation results on the Noisy-ViDiSeg data set. As seen from the figure, in object discovery our method obtains the highest accuracy on all videos.

Fig. 6 shows the object segmentation evaluation results on the Noisy-ViDiSeg data set. As seen from the figure, in object segmentation our method obtains the highest accuracy in the more comprehensive comparison, namely the average result over all data.

In summary, compared with other methods, our method obtains a considerable improvement in both problems, object segmentation and object discovery.

Claims (5)

1. A video object discovery and segmentation method based on coupled dynamic Markov networks, characterized by comprising the following steps:

Step 1: given a video sequence $V = \{f_t\}_{t=1}^{T}$ containing a target, where $f_t$ denotes the $t$-th frame of the video and the video sequence $V$ contains $T$ frames in total, model the video sequence $V$ using dynamic Markov networks to obtain a coupled dynamic Markov network framework model;

the obtained coupled dynamic Markov network framework model comprises a dynamic Markov network for object discovery and a dynamic Markov network for object segmentation; the observation level of the dynamic Markov network for object discovery is the image's object candidate regions; the observation level of the dynamic Markov network for object segmentation is the image's superpixels;

Step 2: using the appearance contained in the video sequence $V$ of step 1 and the temporal information between adjacent video frames, model the observation likelihood functions, the compatibility functions, and the inter-frame motion functions in the coupled dynamic Markov network framework model obtained in step 1, obtaining the coupled dynamic Markov network model; and initialize the object discovery model in the obtained coupled dynamic Markov network model, obtaining an initial object discovery result;

Step 3: solve the coupled dynamic Markov network model obtained in step 2 using the belief propagation algorithm, obtaining the video object discovery and video object segmentation results.
2. The video object discovery and segmentation method based on coupled dynamic Markov networks according to claim 1, characterized in that, in step 1, the specific steps of modeling the video sequence $V$ using dynamic Markov networks include:

Step 1.1: denote the object discovery labels of the video sequence $V$ as $L = \{L_t\}_{t=1}^{T}$ and the corresponding object discovery observations as $O = \{O_t\}_{t=1}^{T}$ with $O_t = \{o_{t,i}\}_{i=1}^{K}$, where $o_{t,i}$ denotes the $i$-th candidate region of $f_t$;

Step 1.2: denote the object segmentation labels of the video sequence $V$ as $B = \{B_t\}_{t=1}^{T}$ and the corresponding object segmentation observations as $S = \{S_t\}_{t=1}^{T}$ with $S_t = \{s_{t,j}\}_{j=1}^{J}$, where $s_{t,j}$ denotes the $j$-th superpixel of $f_t$;

Step 1.3: a compatibility function $\Psi(L,B)$ exists between $L$ and $B$.
3. The video object discovery and segmentation method based on coupled dynamic Markov networks according to claim 2, characterized in that, in step 2, modeling the observation likelihood functions, the compatibility functions, and the inter-frame motion functions in the obtained coupled dynamic Markov network framework model specifically includes the following steps:

Step 2.1: score each candidate region, then establish the object discovery likelihood function $p(O_t|L_t)$; the score of each candidate region is

$r(o_{t,i}) = r_s(o_{t,i}) \cdot r_a(o_{t,i}) \cdot r_m(o_{t,i})$

where $r_s(o_{t,i})$ is the saliency score, expressing how salient the corresponding region is; $r_a(o_{t,i})$ is the objectness score, expressing the confidence that the corresponding region contains an object; and $r_m(o_{t,i})$ is the motion score, expressing the confidence that the corresponding region contains a persistently moving object; after the score $r(o_{t,i})$ of each candidate target is obtained, the object discovery likelihood function is established as:

[formula omitted in the source text]

where $\bar r(o_{t,i})$ denotes the score $r(o_{t,i})$ after normalization;

Step 2.2: establish the object segmentation likelihood function $p(S_t|B_t)$, specifically: learn a foreground Gaussian mixture model and a background Gaussian mixture model for the video, the foreground model being denoted $h_1$ and the background model $h_0$; the segmentation likelihood function is established as:

[formula omitted in the source text]

Step 2.3: establish the compatibility functions; the compatibility function from $L_t$ to $B_t$ is defined as

$\Psi_{LB}(L_t,B_t) = \mathrm{IoU}(o_{t,i}, B_t(1)), \quad i \in \{1,\dots,K\}$

where $B_t(1)$ denotes the currently computed segmentation result and $o_{t,i}$ denotes the $i$-th candidate region of $f_t$; the compatibility function from $B_t$ to $L_t$ is defined as:

[formula omitted in the source text]

where $O_t(1)$ denotes the currently computed object discovery result and $s_{t,j}$ denotes the $j$-th superpixel of $f_t$;

Step 2.4: define the object discovery dynamic model $p(L_t|L_{t-1})$ as:

[formula omitted in the source text]

in which the transition probability of candidate region $o_{t,i}$ with respect to its temporal neighbor $o_{t-1,m}$ appears, where $i$ is the index of the candidate region in the current frame, i.e., frame $t$, and $m$ is the index of the region in the previous frame, i.e., frame $t-1$, chosen as the temporal neighbor of $o_{t,i}$;

define the object segmentation dynamic model as:

[formula omitted in the source text]

in which the transition probability of superpixel $s_{t,j}$ with respect to its temporal neighbor $s_{t-1,n}$ appears, where $j$ is the index of the superpixel in the current frame, i.e., frame $t$, and $n$ is the index of the region in the previous frame, i.e., frame $t-1$, chosen as the temporal neighbor of $s_{t,j}$.
4. The video object discovery and segmentation method based on coupled dynamic Markov networks according to claim 1, characterized in that, in step 2, the initialization of the object discovery model specifically includes the following steps:

(1) divide the video frames, represented as feature vectors, into two classes by a classifier, where the positive samples are frames uniformly sampled from the video and the negative samples are images from an unrelated data set;

(2) train the classifier on the data of step (1), and treat samples whose confidence of belonging to the positive class exceeds 80% as positive samples and samples whose confidence of belonging to the positive class is below 30% as negative samples;

(3) repeat step (2) until the classifier converges;

(4) use the classifier obtained in step (3) to classify all frames of the video sequence $V$ into two classes, obtaining a preliminary classification into frames containing the target and frames not containing the target, thereby realizing the initialization of the object discovery model.
5. The video object discovery and segmentation method based on coupled dynamic Markov networks according to claim 4, characterized in that the process of solving the established model using the belief propagation algorithm in step 3 specifically includes the following steps:

(1) compute the messages that take adjacent frames into account, where the message from $B$ to $L$ and the message from $L$ to $B$ are respectively:

[formulas omitted in the source text]

where $O_{0:t}$ denotes the object observations from the start of the video up to time $t$; $O_{T:t}$ denotes the object observations from the end of the video back to time $t$; $S_{0:t}$ denotes the segmentation observations from the start of the video up to time $t$; and $S_{T:t}$ denotes the segmentation observations from the end of the video back to time $t$;

(2) after the messages are obtained in step (1), obtain according to the belief propagation algorithm the object discovery result with maximum a posteriori probability, $p(L_t|O,S)$:

[formula omitted in the source text]

and the object segmentation result with maximum a posteriori probability, $p(B_t|O,S)$:

[formula omitted in the source text]
CN201810865881.1A 2018-08-01 2018-08-01 Video object discovery and segmentation method based on coupled dynamic Markov networks Pending CN109165565A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810865881.1A CN109165565A (en) 2018-08-01 2018-08-01 Video object discovery and segmentation method based on coupled dynamic Markov networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810865881.1A CN109165565A (en) 2018-08-01 2018-08-01 Video object discovery and segmentation method based on coupled dynamic Markov networks

Publications (1)

Publication Number Publication Date
CN109165565A (en) 2019-01-08

Family

ID=64898648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810865881.1A Pending CN109165565A (en) 2018-08-01 2018-08-01 A kind of video object discovery and dividing method based on Coupled Dynamic Markov Network

Country Status (1)

Country Link
CN (1) CN109165565A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100231802A1 (en) * 2009-03-12 2010-09-16 Sony Corporation Method and system for carrying out reliability classification for motion vectors in a video
CN101840579A (en) * 2010-01-28 2010-09-22 浙江大学 Method for realizing multi-target tracking by using video segmentation and particle filter
CN102156995A (en) * 2011-04-21 2011-08-17 北京理工大学 Video movement foreground dividing method in moving camera
CN105869178A (en) * 2016-04-26 2016-08-17 昆明理工大学 Method for unsupervised segmentation of complex targets from dynamic scene based on multi-scale combination feature convex optimization
CN106296728A (en) * 2016-07-27 2017-01-04 昆明理工大学 A kind of Segmentation of Moving Object method in unrestricted scene based on full convolutional network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZIYI LIU et al.: "Joint Video Object Discovery and Segmentation by Coupled Dynamic Markov Networks", IEEE Transactions on Image Processing *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222649A (en) * 2019-06-10 2019-09-10 北京达佳互联信息技术有限公司 Video classification methods, device, electronic equipment and storage medium
CN110222649B (en) * 2019-06-10 2020-12-18 北京达佳互联信息技术有限公司 Video classification method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN106997597B (en) It is a kind of based on have supervision conspicuousness detection method for tracking target
CN109829443B (en) Video behavior identification method based on image enhancement and 3D convolution neural network
CN109543606B (en) Human face recognition method with attention mechanism
CN104599275B (en) The RGB-D scene understanding methods of imparametrization based on probability graph model
CN104050471B (en) Natural scene character detection method and system
CN110276264B (en) Crowd density estimation method based on foreground segmentation graph
CN106778796B (en) Human body action recognition method and system based on hybrid cooperative training
CN103942751B (en) A kind of video key frame extracting method
CN110111340A (en) The Weakly supervised example dividing method cut based on multichannel
CN109949316A (en) A kind of Weakly supervised example dividing method of grid equipment image based on RGB-T fusion
CN106295584A (en) Depth migration study is in the recognition methods of crowd's attribute
CN109657612B (en) Quality sorting system based on facial image features and application method thereof
CN105205475A (en) Dynamic gesture recognition method
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN105931253A (en) Image segmentation method combined with semi-supervised learning
CN107527054B (en) Automatic foreground extraction method based on multi-view fusion
CN111027377B (en) Double-flow neural network time sequence action positioning method
CN106504255A (en) A kind of multi-Target Image joint dividing method based on multi-tag multi-instance learning
CN108021869A (en) A kind of convolutional neural networks tracking of combination gaussian kernel function
CN106228109A (en) A kind of action identification method based on skeleton motion track
CN112364791B (en) Pedestrian re-identification method and system based on generation of confrontation network
CN112052772A (en) Face shielding detection algorithm
CN110956158A (en) Pedestrian shielding re-identification method based on teacher and student learning frame
CN106570885A (en) Background modeling method based on brightness and texture fusion threshold value
CN104680193A (en) Online target classification method and system based on fast similarity network fusion algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190108