CN101610412B - Visual tracking method based on multi-cue fusion - Google Patents

Visual tracking method based on multi-cue fusion Download PDF

Info

Publication number
CN101610412B
CN101610412B CN2009100888784A CN200910088878A
Authority
CN
China
Prior art keywords
probability distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009100888784A
Other languages
Chinese (zh)
Other versions
CN101610412A (en
Inventor
杨戈 (Yang Ge)
刘宏 (Liu Hong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN2009100888784A priority Critical patent/CN101610412B/en
Publication of CN101610412A publication Critical patent/CN101610412A/en
Application granted granted Critical
Publication of CN101610412B publication Critical patent/CN101610412B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a visual tracking method based on multi-cue fusion, which belongs to the technical field of information. The method comprises the following steps: a) determining, in the first frame of a video sequence, a tracking window comprising a target region and a background region; b) from the second frame onward, obtaining the color feature probability distribution map, the position feature probability distribution map, and the motion continuity feature probability distribution map of the previous frame; c) weighting and summing the three probability distribution maps to obtain a total probability distribution map; and d) obtaining the coordinates of the center point of the tracking window of the current frame in the total probability distribution map with the CAMSHIFT algorithm. The method can be used in human-computer interaction, intelligent visual surveillance, intelligent robots, virtual reality, model-based image coding, content retrieval of streaming media, and other fields.

Description

Visual tracking method based on multi-cue fusion
Technical Field
The invention relates to visual tracking, and in particular to a visual tracking method that fuses multiple cues; it belongs to the technical field of information.
Background
With the rapid development of information technology and intelligent science, computer vision, which uses computers to realize human visual functions, has become one of the most active research directions in the computer field. Visual tracking, one of the core problems of computer vision, aims to find the position of a moving object of interest in each frame of an image sequence, and its study is both necessary and urgent.
Hong Liu et al. published a paper at the IEEE 14th International Conference on Image Processing (2007) that combines color, position, and prediction cues, dynamically updates the weight of each cue according to the background, and implements a collaborative Mean Shift visual tracking method that uses auxiliary objects. However, that method assumes the background obeys a single-Gaussian model and must be trained in advance on a video sequence containing no moving objects to obtain the initial background model, which limits its applicability. In its cue reliability evaluation function it uses a rectangle larger than the target to represent the region of interest, and the region between this rectangle and the tracking window is defined as the background region; the size of the background region therefore directly affects the reliability value of a given cue, i.e., the larger the tracking window, the smaller the reliability evaluation value, so the method lacks generality.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art and provides a visual tracking method that fuses multiple cues. It is particularly applicable to visual tracking of human motion, so that when a computer automatically tracks a target (such as a human body), both accuracy and real-time requirements are met.
The invention combines multiple cues of a video image (a color feature, a position feature, and a motion continuity feature) to realize visual tracking by means of the CAMSHIFT (Continuously Adaptive Mean Shift) method, as shown in FIG. 1. The color features preferably adopt the hue and saturation features, the red channel feature, the green channel feature, and the blue channel feature, which gives better robustness to occlusion and pose changes; the position feature is realized with a frame-difference technique; the motion continuity feature is obtained from inter-frame continuity.
The invention adopts a fixed tracking window. Although this limits the handling of appearance changes and occlusion, it does not risk regarding areas similar to the background as part of the target, and the tracking effect can still be achieved.
The invention is realized by the following technical scheme, which comprises the following steps:
a) determining a tracking window in the first frame of the video sequence, wherein the tracking window comprises a target region and a background region, and the target region contains the tracked object; preferably, the tracking window is a rectangle equally divided into three parts, the middle part being the target region and the two side parts being the background region, as shown in FIG. 2.
b) For each frame from the second frame, obtaining a color feature probability distribution map, a position feature probability distribution map and a motion continuity feature probability distribution map of the previous frame;
c) weighting and adding the three probability distribution maps to obtain a total probability distribution map;
d) obtaining the coordinates of the center point of the tracking window of the current frame in the total probability distribution map through the CAMSHIFT algorithm.
The cues used by the invention and their fusion are described in detail below.
Color features
The color features preferably include the Hue and Saturation features, the R (Red) channel feature, the G (Green) channel feature, and the B (Blue) channel feature of the image, achieving better robustness to occlusion and pose changes.
Assume the invention uses a histogram of m bins and the image has n pixel points; their positions are \{x_i\}_{i=1,\dots,n} and the corresponding histogram values are \{q_u\}_{u=1,\dots,m} (R channel, G channel, and B channel features) or \{q_{u(v)}\}_{u,v=1,\dots,m} (hue and saturation features). Define a function b: R^2 \to \{1, \dots, m\} that maps each pixel's color information to its discrete bin index. The histogram value corresponding to a given color bin can then be expressed by Eqs. (1) and (2) or Eqs. (1') and (2'):
q_{u(v)} = \sum_{i=1}^{n} \delta[b(x_i) - u(v)]    (1)        p_{u(v)} = \min\!\left( \frac{255}{\max(q_{u(v)})} \, q_{u(v)},\ 255 \right)    (2)
or    q_u = \sum_{i=1}^{n} \delta[b(x_i) - u]    (1')        p_u = \min\!\left( \frac{255}{\max(q_u)} \, q_u,\ 255 \right)    (2')
The color feature probability distribution map can be established by the following method:
firstly, extracting the R (Red), G (Green), and B (Blue) channels from the RGB (Red, Green, Blue) image, then converting the RGB image into an HSV (Hue, Saturation, Value) image and extracting the Hue and Saturation channels, and calculating the hue and saturation probability distribution, the red probability distribution, the green probability distribution, and the blue probability distribution of the pixels in the tracking window by histogram back-projection, as in Eq. (1) or Eq. (1').
Secondly, the value ranges of the hue and saturation probability distribution, the red probability distribution, the green probability distribution, and the blue probability distribution are rescaled by Eq. (2) or Eq. (2'), so that they are projected from [0, max(q_{u(v)})] or [0, max(q_u)] onto [0, 255].
Thirdly, suitable features are selected from the hue and saturation features, the red feature, the green feature, and the blue feature according to the rule described below and used as the color features of the visual tracking algorithm, forming the final color probability distribution map p(x, y).
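As an illustration of the back-projection and rescaling in the first two steps (Eqs. (1') and (2')), the sketch below builds a one-channel histogram over the tracking window and back-projects it onto the whole frame, then rescales the result to [0, 255]. It is a minimal NumPy sketch under the assumption of an 8-bit channel and m bins; the function and argument names are illustrative, not the patent's implementation.

```python
import numpy as np

def channel_backprojection(channel, window_mask, m=16):
    """Back-project one 8-bit channel (e.g. R, G, B, hue or saturation).

    channel     : 2-D uint8 array, one color channel of the frame
    window_mask : boolean array, True inside the tracking window
    m           : number of histogram bins
    Returns a probability map rescaled to [0, 255] as in Eq. (2').
    """
    # b(x_i): map each pixel's value to its bin index
    bins = (channel.astype(np.int32) * m) // 256
    # Eq. (1'): histogram q_u accumulated over the tracking window only
    q = np.bincount(bins[window_mask].ravel(), minlength=m).astype(np.float64)
    # Eq. (2'): project the range [0, max(q_u)] onto [0, 255]
    p_u = np.minimum(255.0 / max(q.max(), 1.0) * q, 255.0)
    # Back-projection: every pixel receives the value of its bin
    return p_u[bins]
```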
Among the above four features, the method of the present invention preferably selects, dynamically, the one or more features that best distinguish the target region from the background region, as follows:
for feature k, let i be the value of feature k, H1 k(i) Histogram showing feature values in target area A, H2 k(i) Histograms, p, representing the characteristic values in the background areas B and Ck(i) Is the discrete probability distribution of the target area A, qk(i) Is the discrete probability distribution, L, of background regions B and Ci kIs the log likelihood of feature k, as in equation (10), taking a very small number with δ > 0, which is mainly to prevent equation (10) from appearing where the denominator is 0 or log 0. var (L)i k;pk(i) Is a relative target class distribution pk(i) L ofi kVariance of (2), formula (11), var (L)i k;qk(i) Is a relative background class distribution qk(i) L ofi kVariance of (2), formula (12), var (L)i k;Rk(i) L is a distribution of relative object and background classesi kThe variance of (A) is as shown in formula (13), V (L; p)k(i),qk(i) Is Li kThe variance of (A) is as shown in formula (14), V (L; p)k(i),qk(i) Represents the ability of feature k to separate target and background, V (L; p is a radical ofk(i),qk(i) Larger the feature k indicates that the object is easier to separate from the background, the more likely the feature isThe more suitable the feature as a tracking target.
L_i^k = \log \frac{\max\{p_k(i),\ \delta\}}{\max\{q_k(i),\ \delta\}}    (10)
\mathrm{var}(L_i^k;\, p_k(i)) = E[(L_i^k)^2] - (E[L_i^k])^2 = \sum_i p_k(i)\,(L_i^k)^2 - \left[\sum_i p_k(i)\,L_i^k\right]^2    (11)
\mathrm{var}(L_i^k;\, q_k(i)) = \sum_i q_k(i)\,(L_i^k)^2 - \left[\sum_i q_k(i)\,L_i^k\right]^2    (12)
\mathrm{var}(L_i^k;\, R_k(i)) = \sum_i R_k(i)\,(L_i^k)^2 - \left[\sum_i R_k(i)\,L_i^k\right]^2    (13)
where R_k(i) = [p_k(i) + q_k(i)]/2;
V(L_i^k;\, p_k(i), q_k(i)) = \frac{\mathrm{var}(L_i^k;\, R_k(i))}{\mathrm{var}(L_i^k;\, p_k(i)) + \mathrm{var}(L_i^k;\, q_k(i))}    (14)
During video tracking, the reliability of the hue and saturation features, the R (Red) channel feature, the G (Green) channel feature, and the B (Blue) channel feature is continuously monitored. When the reliability changes, it is recomputed according to Eq. (14), the features are re-ranked by reliability, and the W features with the largest V(L_i^k; p_k(i), q_k(i)) are taken as the color features of the tracking target. The value of W is preferably 1, 2, or 3.
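A minimal sketch of the reliability measure in Eqs. (10)-(14): given bin histograms of one feature over target region A and over background regions B and C, it returns the variance ratio V used to rank the color features. The input histograms, the δ value, and the small stabilizer added to the denominator are illustrative assumptions.

```python
import numpy as np

def variance_ratio(hist_target, hist_background, delta=1e-6):
    """Eqs. (10)-(14): variance ratio of the log-likelihood of one feature.

    hist_target     : H1^k(i), histogram of the feature over target region A
    hist_background : H2^k(i), histogram over background regions B and C
    """
    p = hist_target / max(hist_target.sum(), 1)           # p_k(i)
    q = hist_background / max(hist_background.sum(), 1)   # q_k(i)
    L = np.log(np.maximum(p, delta) / np.maximum(q, delta))  # Eq. (10)
    R = (p + q) / 2.0                                      # R_k(i)

    def var(dist):                                         # Eqs. (11)-(13)
        return np.sum(dist * L * L) - np.sum(dist * L) ** 2

    # Eq. (14): larger V means the feature separates target from background better
    return var(R) / (var(p) + var(q) + 1e-12)
```

The W features with the largest returned value would then be kept as the color cues for the next frame.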
Position feature
For the position feature, the invention uses the frame difference to compute the grayscale difference of each pixel between two consecutive frames and then determines which pixels are motion points by thresholding: a pixel whose difference exceeds the threshold is a motion point. Setting the frame-difference threshold purely by experience is somewhat blind and suits only certain specific scenes, so the invention preferably uses the Otsu method to determine this frame-difference threshold F dynamically. The basic idea of the Otsu algorithm is to find a threshold F that minimizes the intra-class scatter, which is equivalent to finding a threshold F that maximizes the inter-class scatter; i.e., the frame-difference image is divided into two classes by the threshold F such that the variance between the two classes is maximized. The intra-class scatter measures the dispersion of sample points around their class mean, and the inter-class scatter measures the dispersion between classes: a smaller intra-class scatter means the samples within each class are more compact, and a larger inter-class scatter means the classes are better separated.
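The frame-difference threshold F can be determined with Otsu's criterion as described above. The following self-contained sketch maximizes the between-class variance over an 8-bit frame-difference image; it is a plain NumPy illustration of the idea, with no claim to match the patent's exact procedure.

```python
import numpy as np

def otsu_frame_difference(prev_gray, curr_gray):
    """Position feature: threshold |I_t - I_{t-1}| with a dynamically chosen F."""
    diff = np.abs(curr_gray.astype(np.int32) - prev_gray.astype(np.int32)).astype(np.uint8)
    hist = np.bincount(diff.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    best_F, best_sigma = 0, -1.0
    for F in range(1, 256):
        w0, w1 = prob[:F].sum(), prob[F:].sum()        # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = np.dot(np.arange(F), prob[:F]) / w0      # class means
        mu1 = np.dot(np.arange(F, 256), prob[F:]) / w1
        sigma_b = w0 * w1 * (mu0 - mu1) ** 2           # between-class variance
        if sigma_b > best_sigma:
            best_sigma, best_F = sigma_b, F
    motion_mask = diff >= best_F                       # pixels above F are motion points
    return best_F, motion_mask
```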
Motion continuity feature
For the motion continuity feature, the invention estimates the velocity of the tracked target from the images of the previous frames and, from the target positions obtained by tracking in those frames, estimates the target center position at the current time. Over a short interval (between video frames) the motion of the target is strongly continuous and its velocity can be treated as constant, which is what makes this prediction possible.
X(t,row)=X(t-1,row)±(X(t-1,row)-X(t-2,row)) (3)
X(t,col)=X(t-1,col)±(X(t-1,col)-X(t-2,col)) (4)
Let X (t, row) represent the line coordinate of the current target center position at time t, as formula (3), X (t, col) represent the ordinate of the current target center position at time t, as formula (4), row is the maximum line number of the image, col is the maximum vertical number of the image, and the current position is predicted by using a linear predictor in consideration of the continuity of the target motion. Therefore, X (t, row), X (t-1, row) and X (t-2, row) are related to the formula (5), and X (t, col), X (t-1, col) and X (t-2, col) are related to the formula (6).
X(t,row)∈[max(X(t-1,row)-(X(t-1,row)-X(t-2,row)),1),min(X(t-1,row)+(X(t-1,row)-X(t-2,row)),rows)] (5)
X(t,col)∈[max(X(t-1,col)-(X(t-1,col)-X(t-2,col)),1),min(X(t-1,col)+(X(t-1,col)-X(t-2,col)),cols)] (6)
Let the tracking window have width width and length length; then the row coordinate of the target at the current time satisfies Eq. (7) and the column coordinate satisfies Eq. (8), i.e., the target lies within this rectangular range.
Y(t,row)∈[max(X(t,row)-width,1),min(X(t,row)+width,rows)] (7)
Y(t,col)∈[max(X(t,col)-length,1),min(X(t,col)+length,cols)] (8)
Let B' (x, y, t) denote the probability distribution of motion continuity features, as in equation (9), where (x, y, t) denotes the pixel of the coordinate (x, y) at time t, 1 denotes the tracked object, and 0 denotes the background pixel.
B'(x, y, t) = \begin{cases} 1, & (x, y, t) \in Y(t, \mathrm{row}) \cap Y(t, \mathrm{col}) \\ 0, & (x, y, t) \notin Y(t, \mathrm{row}) \cap Y(t, \mathrm{col}) \end{cases}    (9)
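A sketch of the linear prediction in Eqs. (3)-(9): from the window centers at times t-1 and t-2 it derives the admissible row/column range and sets B'(x, y, t) = 1 inside the corresponding rectangle. Variable names and the exact handling of the window extents are illustrative assumptions.

```python
import numpy as np

def motion_continuity_map(center_t1, center_t2, shape, width, length):
    """Build B'(x, y, t) of Eq. (9) from the centres at t-1 and t-2.

    center_t1, center_t2 : (row, col) window centres at times t-1 and t-2
    shape                : (rows, cols) of the frame
    width, length        : tracking-window extents used in Eqs. (7)-(8)
    """
    rows, cols = shape
    d_row = center_t1[0] - center_t2[0]                 # constant-velocity step
    d_col = center_t1[1] - center_t2[1]
    # Eqs. (5)-(6): interval containing the predicted centre
    row_lo = max(center_t1[0] - abs(d_row), 1)
    row_hi = min(center_t1[0] + abs(d_row), rows)
    col_lo = max(center_t1[1] - abs(d_col), 1)
    col_hi = min(center_t1[1] + abs(d_col), cols)
    # Eqs. (7)-(8): enlarge by the window size to get the admissible rectangle
    r0, r1 = max(row_lo - width, 1), min(row_hi + width, rows)
    c0, c1 = max(col_lo - length, 1), min(col_hi + length, cols)
    # Eq. (9): 1 inside the rectangle (possible target), 0 elsewhere (background)
    B = np.zeros((rows, cols), dtype=np.uint8)
    B[r0 - 1:r1, c0 - 1:c1] = 1
    return B
```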
Cue fusion
Let P_k(row, colu, t) be the probability distribution of pixel (row, colu) at time t under feature k; it represents the probability that pixel (row, colu) belongs to the target region under feature k. P(row, colu, t) is the final probability distribution after fusing the W + 2 features (W color features, one predicted target position feature, and one motion continuity feature) at time t, and characterizes the probability that each pixel (row, colu) belongs to the target region, as in Eq. (15). The reliability computed in the previous frame serves as the basis for competition among the W color features: if a feature's reliability is high, it dominates the visual tracking system and provides more information to the tracker; when its reliability is low, its information is down-weighted or ignored.
P(\mathrm{row}, \mathrm{colu}, t) = \sum_{k=1}^{W+2} r_k \, P_k(\mathrm{row}, \mathrm{colu}, t)    (15)
where r_k is the weight of feature k: r_1, r_2, \dots, r_W are the weights of the selected color features, r_{W+1} is the weight of the predicted target position feature, and r_{W+2} is the weight of the motion continuity feature, with \sum_{k=1}^{W+2} r_k = 1. To project the value ranges of P_{W+1}(row, colu, t) and P_{W+2}(row, colu, t) onto [0, 255], take P_{W+1}(row, colu, t) = B(x, y, t) \cdot 255 and P_{W+2}(row, colu, t) = B'(x, y, t) \cdot 255.
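A minimal sketch of the fusion in Eq. (15): the per-cue maps, already scaled to [0, 255], are combined with weights that sum to 1. The argument names (W color maps plus one position map and one motion-continuity map) are illustrative.

```python
import numpy as np

def fuse_cues(color_maps, position_map, motion_map, weights):
    """Eq. (15): P = sum_k r_k * P_k over W color cues + position + motion cues.

    color_maps   : list of W arrays in [0, 255] (selected color features)
    position_map : array in [0, 255] (B  * 255, frame-difference cue)
    motion_map   : array in [0, 255] (B' * 255, motion-continuity cue)
    weights      : length W + 2 sequence of r_k, summing to 1
    """
    maps = list(color_maps) + [position_map, motion_map]
    assert len(weights) == len(maps) and abs(sum(weights) - 1.0) < 1e-6
    total = np.zeros_like(maps[0], dtype=np.float64)
    for r_k, P_k in zip(weights, maps):
        total += r_k * P_k
    return total   # total probability distribution map, values remain in [0, 255]
```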
Compared with the prior art, the method is simple and effective: it requires neither a background model assumption nor advance training on a video sequence without moving targets. Its key point is the fusion of multiple cues, so it adapts to different scenes and achieves good tracking results; it is particularly suitable for video sequences in which the color saturation of the target's environment is low or the target is partially occluded.
The output of the visual tracking can be used directly as a tracking result or as an intermediate result for subsequent visual understanding. The invention has broad application prospects in the information field and can be applied to Human Robot Interaction (HRI), intelligent visual surveillance, intelligent robots, virtual reality, model-based image coding, content retrieval of streaming media, and other fields. Video surveillance is used for community safety monitoring, fire monitoring, traffic violation and flow control, and security in public places such as military facilities, banks, shopping malls, airports, and subways. Existing video surveillance systems usually only record video as after-the-fact evidence and do not fully exploit real-time active monitoring. Upgrading them to intelligent video surveillance systems would greatly enhance monitoring capability, reduce potential safety hazards, and save manpower, material resources, and investment. Intelligent video analysis solves two problems: it frees security operators from the tedious and boring task of staring at screens by letting machines do that work, and it quickly finds the desired images in massive video data, i.e., tracks a target; for example, on Line 13 of the Beijing subway a thief was caught through video analysis. Pudong Airport, the Capital Airport, and many railway projects already under construction all expect to use video analysis technology, and the visual tracking method of the invention is one of its core and key technologies.
Drawings
FIG. 1 is a schematic diagram of the cue fusion of the method of the present invention;
FIG. 2 is a schematic view of the tracking window of the present invention, where A is the target area and B and C are the background areas;
fig. 3-5 are schematic diagrams of visual tracking for 50 frames, 100 frames and 120 frames of a video sequence with a resolution of 640 × 512, respectively, where a represents a motion continuity feature probability distribution diagram, b represents a position feature probability distribution diagram, c represents a total probability distribution diagram, and d represents a current frame tracking result diagram.
Fig. 6a-d are graphs of visual tracking results for 50 frames, 90 frames, 120 frames and 164 frames, respectively, of a video sequence with a resolution of 640 x 480.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The scope of protection of the invention is not limited to the examples described below.
The visual tracking of the present embodiment is performed according to the following steps:
Firstly, a tracking window is set in the 1st frame of the video sequence; its length and width are chosen by the operator according to the size of the tracked target and are kept unchanged during tracking. The tracking window is divided into three parts, the middle part (A) being the target region and the left and right parts (B and C) being the background regions, as shown in FIG. 2.
Secondly, starting from the 2nd frame, the W = 2 most reliable color features (e.g., the R channel and the B channel) are selected based on the previous frame, and the color feature probability distribution map M1 is calculated.
Thirdly, the position feature probability distribution map M2 is calculated.
Fourthly, the motion continuity feature probability distribution map M3 is calculated.
Fifthly, the three probability distribution maps (M1, M2, M3) obtained above are weighted by their corresponding r_k and summed to obtain the final probability distribution map M. In this embodiment the weights of M1, M2, and M3 are 3/7 (with the R and B channels weighted 2/7 and 1/7, respectively), 2/7, and 2/7.
Sixthly, in the probability distribution map M, the center point coordinates of the tracking window of the current frame are obtained by the CAMSHIFT algorithm. The core of the CAMSHIFT algorithm is: compute the zeroth-order moment (Eq. (16) below) and the first-order moments (Eqs. (17) and (18) below) of the tracking window, and iteratively update the (x, y) coordinates via Eqs. (19) and (20) until the coordinates no longer move significantly (the change in x and y is less than 2) or the maximum of 15 iterations is reached; the resulting coordinates are the tracking window center of the current frame.
M_{00} = \sum_x \sum_y p(x, y)    (16)
M_{10} = \sum_x \sum_y x \, p(x, y)    (17)
M_{01} = \sum_x \sum_y y \, p(x, y)    (18)
x = \frac{M_{10}}{M_{00}}    (19)
y = \frac{M_{01}}{M_{00}}    (20)
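A sketch of the iteration in Eqs. (16)-(20): within the fixed-size window, the zeroth- and first-order moments give the centroid, and the window center is moved there until it shifts by less than 2 pixels or 15 iterations are reached, as in the embodiment above. This is a simplified illustration of that core step, not a full CAMSHIFT implementation; function and parameter names are assumptions.

```python
import numpy as np

def camshift_center(prob_map, center, half_w, half_h, max_iter=15, eps=2):
    """Iterate Eqs. (16)-(20) on the total probability map with a fixed window."""
    rows, cols = prob_map.shape
    x, y = center                                    # (col, row) of the window centre
    for _ in range(max_iter):
        x0, x1 = max(int(x) - half_w, 0), min(int(x) + half_w + 1, cols)
        y0, y1 = max(int(y) - half_h, 0), min(int(y) + half_h + 1, rows)
        win = prob_map[y0:y1, x0:x1]
        M00 = win.sum()                              # Eq. (16)
        if M00 == 0:
            break
        xs = np.arange(x0, x1)
        ys = np.arange(y0, y1)
        M10 = (win * xs[np.newaxis, :]).sum()        # Eq. (17)
        M01 = (win * ys[:, np.newaxis]).sum()        # Eq. (18)
        new_x, new_y = M10 / M00, M01 / M00          # Eqs. (19)-(20)
        if abs(new_x - x) < eps and abs(new_y - y) < eps:
            x, y = new_x, new_y
            break
        x, y = new_x, new_y
    return x, y
```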
Fig. 3-5 are schematic views of visual tracking for 50 frames, 100 frames and 120 frames, respectively, of a video sequence with a resolution of 640 x 512.
Fig. 6 shows the visual tracking results for frames 50, 90, 120, and 164 of a video sequence with a resolution of 640 x 480. Although the saturation of this second video sequence is low, the target is still tracked by comprehensively considering the reliability of the color features and fusing multiple cues.

Claims (6)

1. A visual tracking method based on multi-cue fusion comprises the following steps:
a) determining a tracking window in a first frame of a video sequence, wherein the tracking window comprises a target area and a background area, and the target area contains a tracked object;
b) for each frame from the second frame, obtaining a color feature probability distribution map, a position feature probability distribution map and a motion continuity feature probability distribution map of the previous frame; the color features in the color feature probability distribution map include one or more of hue and saturation features, R channel features, G channel features, and B channel features;
c) weighting and adding the three probability distribution maps to obtain a total probability distribution map;
d) obtaining the coordinates of the center point of the tracking window of the current frame in the total probability distribution map through the CAMSHIFT algorithm.
2. The visual tracking method of claim 1, wherein the tracking window is a rectangle equally divided into three parts, the middle part being the target region and the two parts being the background region.
3. The visual tracking method of claim 1, wherein the value V(L_i^k; p_k(i), q_k(i)) of each of the hue and saturation features, the R channel feature, the G channel feature, and the B channel feature is calculated by the following formula, and the color features in the color feature probability distribution map comprise the one, two, or three features with the largest V value:
V(L_i^k;\, p_k(i), q_k(i)) = \frac{\mathrm{var}(L_i^k;\, R_k(i))}{\mathrm{var}(L_i^k;\, p_k(i)) + \mathrm{var}(L_i^k;\, q_k(i))}, wherein,
p_k(i) represents the discrete probability distribution of the target region;
q_k(i) represents the discrete probability distribution of the background region;
L_i^k = \log \frac{\max\{p_k(i),\ \delta\}}{\max\{q_k(i),\ \delta\}},
where δ is used to ensure that no cases occur where the denominator is 0 or log 0;
R_k(i) = [p_k(i) + q_k(i)]/2;
\mathrm{var}(L_i^k;\, p_k(i)) = E[(L_i^k)^2] - (E[L_i^k])^2 = \sum_i p_k(i)\,(L_i^k)^2 - \left[\sum_i p_k(i)\,L_i^k\right]^2,
wherein E represents the mean and var represents the variance;
\mathrm{var}(L_i^k;\, q_k(i)) = E[(L_i^k)^2] - (E[L_i^k])^2 = \sum_i q_k(i)\,(L_i^k)^2 - \left[\sum_i q_k(i)\,L_i^k\right]^2;
\mathrm{var}(L_i^k;\, R_k(i)) = E[(L_i^k)^2] - (E[L_i^k])^2 = \sum_i R_k(i)\,(L_i^k)^2 - \left[\sum_i R_k(i)\,L_i^k\right]^2.
4. the visual tracking method of claim 1, wherein the location feature probability distribution map is obtained by: and calculating the gray difference value of each pixel point of the tracking window in the current frame and the previous frame, wherein if the difference value is greater than a set threshold value, the pixel point is a motion point, and the position characteristic probability distribution map comprises all the motion points.
5. The visual tracking method of claim 4, wherein the threshold is dynamically determined by an Otsu method.
6. The visual tracking method of claim 1, wherein when the three probability distribution maps are weighted and added to obtain the total probability distribution map, the sum of the weights of the respective probability distribution maps is 1.
CN2009100888784A 2009-07-21 2009-07-21 Visual tracking method based on multi-cue fusion Expired - Fee Related CN101610412B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100888784A CN101610412B (en) 2009-07-21 2009-07-21 Visual tracking method based on multi-cue fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009100888784A CN101610412B (en) 2009-07-21 2009-07-21 Visual tracking method based on multi-cue fusion

Publications (2)

Publication Number Publication Date
CN101610412A CN101610412A (en) 2009-12-23
CN101610412B true CN101610412B (en) 2011-01-19

Family

ID=41483954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100888784A Expired - Fee Related CN101610412B (en) 2009-07-21 2009-07-21 Visual tracking method based on multi-cue fusion

Country Status (1)

Country Link
CN (1) CN101610412B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI497450B (en) * 2013-10-28 2015-08-21 Univ Ming Chuan Visual object tracking method
JP2016033759A (en) * 2014-07-31 2016-03-10 セイコーエプソン株式会社 Display device, method for controlling display device, and program
CN105547635B (en) * 2015-12-11 2018-08-24 浙江大学 A kind of contactless structural dynamic response measurement method for wind tunnel test
CN107403439B (en) * 2017-06-06 2020-07-24 沈阳工业大学 Cam-shift-based prediction tracking method
CN107833240B (en) * 2017-11-09 2020-04-17 华南农业大学 Target motion trajectory extraction and analysis method guided by multiple tracking clues
CN113378616A (en) * 2020-03-09 2021-09-10 华为技术有限公司 Video analysis method, video analysis management method and related equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1619593A (en) * 2004-12-09 2005-05-25 上海交通大学 Video frequency motion target adaptive tracking method based on multicharacteristic information fusion
CN1932846A (en) * 2006-10-12 2007-03-21 上海交通大学 Visual frequency humary face tracking identification method based on appearance model
CN1992911A (en) * 2005-12-31 2007-07-04 中国科学院计算技术研究所 Target tracking method of sports video

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1619593A (en) * 2004-12-09 2005-05-25 上海交通大学 Video frequency motion target adaptive tracking method based on multicharacteristic information fusion
CN1992911A (en) * 2005-12-31 2007-07-04 中国科学院计算技术研究所 Target tracking method of sports video
CN1932846A (en) * 2006-10-12 2007-03-21 上海交通大学 Visual frequency humary face tracking identification method based on appearance model

Also Published As

Publication number Publication date
CN101610412A (en) 2009-12-23

Similar Documents

Publication Publication Date Title
WO2020173226A1 (en) Spatial-temporal behavior detection method
Park et al. Continuous localization of construction workers via integration of detection and tracking
CN103098076B (en) Gesture recognition system for TV control
Chen et al. Survey of pedestrian action recognition techniques for autonomous driving
US9652863B2 (en) Multi-mode video event indexing
Senior et al. Appearance models for occlusion handling
Brown et al. Performance evaluation of surveillance systems under varying conditions
CN101610412B (en) Visual tracking method based on multi-cue fusion
CN101470809B (en) Moving object detection method based on expansion mixed gauss model
JP2007128513A (en) Scene analysis
CN103530640B (en) Unlicensed vehicle checking method based on AdaBoost Yu SVM
CN104378582A (en) Intelligent video analysis system and method based on PTZ video camera cruising
CN107038411A (en) A kind of Roadside Parking behavior precise recognition method based on vehicle movement track in video
CN115082855A (en) Pedestrian occlusion detection method based on improved YOLOX algorithm
Song et al. Depth driven people counting using deep region proposal network
Jeyabharathi et al. Vehicle Tracking and Speed Measurement system (VTSM) based on novel feature descriptor: Diagonal Hexadecimal Pattern (DHP)
Fradi et al. Spatio-temporal crowd density model in a human detection and tracking framework
CN111476089A (en) Pedestrian detection method, system and terminal based on multi-mode information fusion in image
CN104239854B (en) A kind of pedestrian&#39;s feature extraction and method for expressing based on region sparse integral passage
Wang et al. Two-branch fusion network with attention map for crowd counting
Peihua A clustering-based color model and integral images for fast object tracking
CN116935304A (en) Self-adaptive detection and tracking method based on crowd concentration
Muniruzzaman et al. Deterministic algorithm for traffic detection in free-flow and congestion using video sensor
CN112906456B (en) Crowd abnormal behavior detection method and system based on inter-frame characteristics
Zhang et al. An accurate algorithm for head detection based on XYZ and HSV hair and skin color models

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110119

Termination date: 20140721

EXPY Termination of patent right or utility model