CN104050685A - Moving target detection method based on particle filtering visual attention model - Google Patents
- Legal status: Granted
Abstract
The invention discloses a moving target detection method based on a particle filtering visual attention model. First, a particle filtering bidirectional fusion attention model is constructed according to the Bayesian estimation principle. Then, on the basis of this model, motion attention and target color attention serve as the B-U and T-D attention inputs respectively; the particle distribution state is changed by calculating particle weights, an attention saliency map is formed, and finally the position of the moving target is determined. The method fuses temporal and spatial attention, so motion attention is calculated more accurately; it fuses bottom-up and top-down attention, so the forming process of human visual attention is simulated simply and effectively; and for complex global motion scenes, the effectiveness and accuracy of moving target detection are improved.
Description
Technical Field
The invention belongs to the technical field of video image detection, and relates to a moving target detection method based on a particle filter visual attention model.
Background
Moving object detection is one of the important problems in machine vision and a precondition for object tracking and recognition; however, in complex motion scenes, existing moving object detection methods still have significant limitations. In recent years, visual perception research has gradually incorporated results from human physiology and psychology. The main idea is to use a computer to simulate local functions of human physiology to solve problems in the vision field; visual attention is typical of this research, and its results have substantially advanced visual problems such as target detection and segmentation.
Traditional moving target detection methods include the inter-frame difference method, the background difference method, and the global motion compensation method. The background difference and inter-frame difference methods are limited to locally moving scenes. The global motion compensation method has a wider application range, but its accuracy is affected by target size and motion intensity: when the target is large or the motion is strong, the accuracy of global motion estimation drops, effective global compensation cannot be performed, and moving target detection suffers large errors.
Human visual attention is generated by the combined action of Bottom-Up (B-U) and Top-Down (T-D) processes. In 2002, Itti and Koch established a B-U visual attention model based mainly on brightness, color, and orientation features, and subsequent research has derived many visual attention computation methods and applications. Current visual attention models can be roughly classified into bottom-up and bidirectional types: the bottom-up type describes B-U attention computation, while the bidirectional type describes attention computed from the combined action of B-U and T-D.
Attention is the initial response of the human visual system to external observations, and some scholars have conducted preliminary studies on target detection from the attention perspective, several of them using a bidirectional attention approach for static target detection. Sang-Woo Ban et al. performed self-organizing neural network learning on the color features of a specific static target to generate a weight matrix, which serves as a T-D influence factor adjusting the B-U attention calculation to form a target attention saliency map. Yuming Fang et al. extracted the orientation features of the target as T-D attention, fused them with B-U attention by proportional weighting, and finally determined the target position. Yuanlong Yu et al. established a target-feature Long Term Memory (LTM) unit, calculated a position probability distribution bias by comparison with low-level features, and performed bidirectional weighted fusion to determine the target position.
In addition, motion attention models have been established in the literature for detecting moving targets; the main idea is to define a motion attention model from motion contrast so that the motion-salient region approaches the target region. Yu-Fei Ma defined a motion attention model from the motion vector energy and the spatial and temporal correlation of the motion vector field obtained by decompressing the MPEG stream, from which a motion-salient region can be obtained. Junwei Han divided attention into static and dynamic attention: static attention is mainly attracted by image information such as brightness and color, while dynamic attention is defined by calculating the proportion of changed pixels in a region on the basis of global motion compensation; the attention model, obtained by fusing the two, is mainly applied to moving target detection.
In summary, studying target detection from the visual attention perspective has positive significance, but most current research targets static target detection, and research on moving target detection is lacking. Moreover, existing motion attention models are limited to bottom-up, data-driven modeling, and no bidirectional attention model fuses multiple features such as color and motion.
Disclosure of Invention
The invention aims to provide a moving target detection method based on a particle filter visual attention model, solving the problems that prior-art motion attention models are limited to bottom-up, data-driven modeling, do not fuse multiple features such as color and motion, cannot adapt to complex motion scenes, and make it difficult to detect moving targets effectively and accurately.
The invention adopts the technical scheme that a moving target detection method based on a particle filter visual attention model is characterized in that firstly, a particle filter bidirectional fusion attention model is constructed according to the Bayesian estimation principle; and then on the basis of a particle filter bidirectional fusion attention model framework, taking the motion attention and the target color attention as B-U and T-D attention inputs respectively, calculating and changing the distribution state of particles through particle weight values to form an attention saliency map, and finally determining the position of the motion target.
The invention has the beneficial effects that:
1) the method has the advantages that the video image is subjected to Gaussian multi-scale decomposition, the visual characteristics of human beings are better met, the motion vector field is subjected to superposition and filtering preprocessing, and the influence of estimation errors and noise is reduced.
2) Temporal and spatial attention are fused, so that the motion attention calculation is more accurate.
3) A particle filtering mechanism is introduced according to a Bayesian estimation principle, bottom-up and top-down attention is fused, a bidirectional fusion attention model is constructed, and the human visual attention forming process is simulated simply and effectively.
4) The moving target detection is carried out by using the bidirectional fusion attention model, and the effectiveness and the accuracy of the moving target detection are improved aiming at a complex global moving scene.
Drawings
FIG. 1 is a block flow diagram of the method of the present invention;
FIG. 2 is a three-level Gaussian pyramid of a video image according to an embodiment of the present invention, where (a) is the original scale, (b) is the next coarser scale, and (c) is the coarsest scale;
fig. 3 shows the corresponding motion vector field and its pre-processing result in the present invention, wherein (a) is the motion vector field, (b) the motion vector field after the filtering is superimposed, and (c) the motion vector field after the median filtering is performed;
FIG. 4 is a B-U attention saliency map of an implementation of the present invention, wherein (a) is an attention saliency map and (B) is an attention saliency heat map;
FIG. 5 is a particle filter fusion two-way attention result of the present invention, wherein (a) is the importance sampling result and (b) is the re-sampled particle distribution map;
fig. 6 is a schematic diagram of particle attention saliency map and target localization implemented in accordance with the present invention, wherein (a) a saliency value is generated for the spatial distribution of particles, (b) a saliency map is obtained from resampled particles, and (c) a target localization result.
Fig. 7 shows experimental results of the "aircraft" video sequence in example 1 of the present invention, where the first to fourth rows are the 5th, 28th, 40th and 60th frames respectively, and where (a) is the original frame, (b) the Yu-Fei Ma motion attention model result, (c) the global motion compensation-based visual attention (GMC-VA) model result, (d) the Yuming Fang bidirectional weighted fusion visual attention result, (e) the bidirectional attention model result of the present invention, and (f) the moving object detection and positioning result of the method of the present invention;
fig. 8 shows the experimental results of the "horse" video sequence of embodiment 2 of the present invention, wherein the first to fourth rows are the 4 th, 15 th, 38 th and 100 th frames respectively.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention relates to a moving target detection method based on a particle filter visual attention model, which comprises the steps of firstly, constructing a particle filter bidirectional fusion attention model according to a Bayesian estimation principle; then based on a particle filter bidirectional fusion attention model frame, the motion attention and the target color attention are respectively used as B-U and T-D attention inputs, the particle distribution state is changed through particle weight calculation, an attention saliency map is formed (the attention saliency is calculated through the particle distribution state after filtering), and finally the position of the motion target is determined.
The method comprises the following specific implementation steps:
step 1, calculating the attention of the movement at the current t moment as B-U attention, and recording the significance asBy passingControlling the sampling of the initial importance of the particles;
carrying out Gaussian multi-scale decomposition on the video image; estimating motion vector fields of the motion vectors respectively, and preprocessing the motion vector fields; defining time and space attention factors, constructing motion attention, and obtaining by fusing multi-scale motion attentionBy passingControlling the sampling of particle importance. The method specifically comprises the following steps:
1.1) Gaussian multiscale decomposition of images
The multi-scale analysis adopts the Gaussian image pyramid method: a three-level scale image is obtained through Gaussian smoothing and down-sampling, as shown in fig. 2; this embodiment adopts a three-level Gaussian pyramid.
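The pyramid construction of step 1.1) can be sketched as follows. This is a minimal NumPy illustration; the 5-tap binomial blur kernel and factor-of-two down-sampling are common choices assumed here, not values taken from the patent text.

```python
import numpy as np

def gaussian_blur(img, kernel=(1, 4, 6, 4, 1)):
    """Separable 5-tap binomial (approximately Gaussian) blur with edge padding."""
    k = np.array(kernel, dtype=float)
    k /= k.sum()
    pad = len(k) // 2
    # blur rows, then columns
    tmp = np.apply_along_axis(
        lambda r: np.convolve(np.pad(r, pad, mode='edge'), k, mode='valid'),
        axis=1, arr=img.astype(float))
    out = np.apply_along_axis(
        lambda c: np.convolve(np.pad(c, pad, mode='edge'), k, mode='valid'),
        axis=0, arr=tmp)
    return out

def gaussian_pyramid(img, levels=3):
    """Three-level pyramid: blur, then drop every other row and column."""
    pyr = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        pyr.append(gaussian_blur(pyr[-1])[::2, ::2])
    return pyr
```

In practice a library routine (e.g. an image pyramid from an image-processing package) would replace the hand-rolled blur, but the structure — smooth then subsample, once per level — is the same.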
1.2) estimating a motion vector field by adopting an optical flow method, and performing two preprocessing of superposition and filtering on the motion vector field
As shown in fig. 3(a), the motion vector field estimated by the optical flow method generally exhibits sparse, local, and chaotic motion features, because the motion between adjacent frames is not strong enough and the video signal contains a certain amount of noise, which hinders accurate calculation of motion attention. Applying the two preprocessing steps of superposition and filtering to the motion vector field effectively reduces the influence of estimation errors and noise.
The motion vector superposition process is as follows: let the current-frame motion vector field be MVF_t, let the macroblock center coordinate be (k, l), and let the corresponding motion vector be (v_x^{kl}, v_y^{kl}). It is superimposed with the motion vectors of the preceding and succeeding frames according to

$$(v_x^{kl},\, v_y^{kl}) = \sum_{i=n-c}^{i=n+c} \left( v_x^{kl}(i),\ v_y^{kl}(i) \right),$$

and the superposition result is shown in fig. 3(b).
After superposition the motion vectors are median-filtered: for each non-zero motion vector, the median of the neighboring motion vectors replaces its value. The median filtering result is shown in fig. 3(c).
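The two preprocessing operations of step 1.2) can be sketched as below. The (H, W, 2) array layout for per-block vectors and the 3×3 neighbourhood radius are assumptions made for illustration.

```python
import numpy as np

def superimpose_vectors(mvf_seq, n, c):
    """Accumulate the per-block motion vectors of frames n-c .. n+c
    (the superposition formula of step 1.2); mvf_seq is a list of
    (H, W, 2) arrays of (v_x, v_y) block vectors."""
    acc = np.zeros_like(mvf_seq[n], dtype=float)
    for i in range(n - c, n + c + 1):
        acc += mvf_seq[i]
    return acc

def median_filter_nonzero(mvf, radius=1):
    """Replace each non-zero vector by the component-wise median of its
    neighbourhood, leaving zero vectors untouched."""
    h, w, _ = mvf.shape
    out = mvf.copy()
    for y in range(h):
        for x in range(w):
            if not np.any(mvf[y, x]):
                continue
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            patch = mvf[y0:y1, x0:x1].reshape(-1, 2)
            out[y, x] = np.median(patch, axis=0)  # per-component median
    return out
```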
1.3) calculating the attention of the movement as the attention of the B-U
The motion attention is calculated by defining temporal and spatial attention factors, with SM_t^T and SM_t^S denoting temporal and spatial attention respectively, where v_t^{ij} and v_{t-1}^{ij} denote the motion vector at coordinate position (i, j) in the motion vector fields at times t and t-1, and v̄_A denotes the mean of the motion vectors within the neighborhood A.
the motor attention is obtained by linear fusion of temporal and spatial attention, i.e. <math>
<mrow>
<msubsup>
<mi>SM</mi>
<mi>t</mi>
<mrow>
<mi>B</mi>
<mo>-</mo>
<mi>U</mi>
</mrow>
</msubsup>
<mo>=</mo>
<mi>α</mi>
<mo>·</mo>
<msubsup>
<mi>SM</mi>
<mi>t</mi>
<mi>T</mi>
</msubsup>
<mo>+</mo>
<mi>β</mi>
<mo>·</mo>
<msubsup>
<mi>SM</mi>
<mi>t</mi>
<mi>S</mi>
</msubsup>
<mo>,</mo>
</mrow>
</math>
In the formula, α and β are coefficients of positive values, and B-U attention is shown in FIG. 4.
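The fusion above can be sketched as follows. The explicit SM_t^T and SM_t^S formulas are not reproduced in this text, so frame-difference magnitude and neighbourhood-mean contrast are used here as assumed stand-ins consistent with the symbols v_t^{ij}, v_{t-1}^{ij} and v̄_A; treat them as placeholders, not the patent's exact definitions.

```python
import numpy as np

def motion_attention(v_t, v_tm1, alpha=0.5, beta=0.5, nb=1):
    """B-U motion attention as SM = alpha*SM_T + beta*SM_S.
    v_t, v_tm1: (H, W, 2) motion vector fields at times t and t-1.
    SM_T (assumed): magnitude of the frame-to-frame vector change.
    SM_S (assumed): contrast of each vector against its neighbourhood mean."""
    sm_t = np.linalg.norm(v_t - v_tm1, axis=-1)  # temporal contrast
    h, w, _ = v_t.shape
    sm_s = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - nb), min(h, y + nb + 1)
            x0, x1 = max(0, x - nb), min(w, x + nb + 1)
            mean_a = v_t[y0:y1, x0:x1].reshape(-1, 2).mean(axis=0)
            sm_s[y, x] = np.linalg.norm(v_t[y, x] - mean_a)  # spatial contrast
    return alpha * sm_t + beta * sm_s
```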
1.4) Controlling particle importance sampling by SM_t^{B-U}
The motion saliency feature adjusts the density of Gaussian random particle sampling, yielding a random sampling result that varies with the motion saliency. Gaussian random sampling gives the initial distribution state of the particles; the particle samples, indexed i = 1, 2, …, N, are independent and identically distributed, such that:
where μ_x, μ_y and the corresponding variances are respectively the mean and variance of the pseudo-random sequence; the above formula generates random Gaussian sampling results in a region centered at the coordinate (μ_x, μ_y), and the density of the sampled particles in that region is regulated by the motion saliency. Assuming SM_t(x, y) is the saliency value of the saliency map at time t at coordinate (x, y), the sampling density function is defined with mean u and variance

$$\delta_A^2 = \frac{\sum_{\{i,j \mid (i,j) \in SM_t\}} \left[ SM(i,j) - u \right]^2}{N},$$

where i and j represent the horizontal and vertical coordinates in the saliency map. The importance sampling result is shown in fig. 5(a).
At the initial time, particles are sampled according to this method to form the initial particle distribution state; otherwise, a portion of particles is sampled at positions where the current motion attention differs from the previous one, replacing an equal number of low-weight particles from the previous moment, to serve as the initial particle state at this time.
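Saliency-controlled importance sampling of step 1.4) can be sketched as follows: the saliency map, normalised into a probability mass, selects a cell, and a small Gaussian jitter spreads the particles within it. The jitter scale is an assumption; the patent's exact density function is given above in symbolic form.

```python
import numpy as np

def sample_particles(saliency, n, rng=None):
    """Draw n particle coordinates so that local particle density follows
    the motion-saliency map (a sketch of step 1.4)).  Returns (xs, ys)."""
    rng = np.random.default_rng(rng)
    h, w = saliency.shape
    p = saliency.ravel().astype(float)
    p = p / p.sum()                       # saliency as a probability mass
    idx = rng.choice(h * w, size=n, p=p)  # salient cells drawn more often
    ys, xs = np.divmod(idx, w)
    jitter = rng.normal(scale=0.5, size=(2, n))  # Gaussian spread (assumed)
    return (np.clip(xs + jitter[0], 0, w - 1),
            np.clip(ys + jitter[1], 0, h - 1))
```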
Step 2, calculating T-D color attention according to target characteristics
The T-D color attention saliency is recorded as SM_t^{T-D}; the magnitude of T-D attention is measured by the degree of similarity between the target feature and the image feature.
2.1) The color histogram is adopted as the quantized expression of the target feature, with m the number of components. The color distribution of the particle target region is defined as

$$\hat{p}^{(u)}(X_t) = C \cdot \sum_{i=1}^{M} K\!\left( \left\| \frac{x - x_i}{h} \right\| \right) \cdot \delta\!\left( b(x_i) - u \right),$$

where δ(·) is the Delta function, C is a normalization factor ensuring the components sum to one, and K(·) is an Epanechnikov kernel function defined as

$$K_E(\|r\|) = \begin{cases} 1 - \|r\|^2, & \|r\| < 1 \\ 0, & \|r\| \geq 1. \end{cases}$$
2.2) The T-D attention saliency calculation is defined in terms of ρ, the Bhattacharyya coefficient measuring the similarity between the particle and target color distributions.
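The kernel-weighted histogram and Bhattacharyya similarity of steps 2.1)–2.2) can be sketched as follows; pixel values are assumed already quantized to bin indices b(x_i) in 0..m-1.

```python
import numpy as np

def epanechnikov(r):
    """K_E(||r||) = 1 - ||r||^2 for ||r|| < 1, else 0."""
    r2 = np.asarray(r, dtype=float) ** 2
    return np.where(r2 < 1.0, 1.0 - r2, 0.0)

def color_distribution(bins, coords, center, h, m=16):
    """Kernel-weighted colour histogram p_hat of a particle region:
    bins are indices b(x_i), coords their pixel positions, center/h the
    region centre and bandwidth (the equation of step 2.1))."""
    r = np.linalg.norm(coords - center, axis=1) / h
    w = epanechnikov(r)                      # pixels near the centre count more
    hist = np.bincount(bins, weights=w, minlength=m)
    s = hist.sum()
    return hist / s if s > 0 else hist       # normalisation factor C

def bhattacharyya(p, q):
    """rho(p, q) = sum_u sqrt(p_u * q_u), the T-D similarity measure."""
    return float(np.sum(np.sqrt(p * q)))
```

For identical, normalised distributions ρ equals 1; it falls toward 0 as the particle region's colours diverge from the target model.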
Step 3, fusing bidirectional attention by adopting particle filtering, calculating the weight of the particles, and forming new particle distribution after resampling
Particle filtering is regarded as a Bayesian estimation process: given an observation, the conditional expectation of the current state is estimated. The process comprises three steps: importance sampling, weight calculation, and resampling.
3.1) importance sampling
The observation likelihood probabilities of B-U and T-D attention are defined separately, and the particle weights are updated by Bayesian fusion of the two observation likelihoods.
the importance sampling method adopts the motion significance characteristics to adjust the density of Gaussian random particle sampling as described in the step 1.4);
3.2) weight calculation
Resampling is used to eliminate particles with lower weight so that particles gather around those with higher weight. Let the particle states be X_{(i)}^T and the observation be Z^T; the posterior probability density function is then approximated as

$$p(X^T \mid Z^T) \approx \sum_{i=1}^{N} \lambda^{(i)} \cdot \delta\!\left( X^T - X_{(i)}^T \right),$$
where λ^{(i)}, i = 1, 2, …, N are the particle weights. Assuming the bidirectional fusion saliency map state is x_{0:t}, with B-U and T-D attention observations, the posterior probability P(X^T | Z^T) is expressed as follows:
Two assumptions are made: (i) the temporal dynamic process conforms to a Markov process; (ii) the observations at different times are mutually independent, and each observation depends only on the current state. The posterior probability of the bidirectional attention is then derived as follows:
according to the sampling theorem of importance, the weight of a particle is λ(i)Proportional ratioIs represented as follows:
then there are:
where p(Z_t^{B-U} | X_t^{(i)}) represents the conditional probability of the B-U attention observation under the current particle attention state, and p(Z_t^{T-D} | Z_t^{B-U}, X_t^{(i)}) is the conditional probability of the T-D attention observation under the B-U observation and the current particle attention state; these two directly determine the updated particle weight, defined as follows:
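The weight update of step 3.2) can be sketched as follows: each particle's new weight is proportional to the product of the two observation likelihoods, then normalised. The optional carry-over of the previous weight is an assumption for the sequential case.

```python
import numpy as np

def update_weights(lik_bu, lik_td, prev_w=None):
    """Bayesian fusion of the two observation likelihoods:
    lambda_i  proportional to  p(Z_BU | X_i) * p(Z_TD | Z_BU, X_i),
    normalised so the weights sum to one."""
    lik_bu = np.asarray(lik_bu, dtype=float)
    lik_td = np.asarray(lik_td, dtype=float)
    w = lik_bu * lik_td
    if prev_w is not None:                       # sequential update (assumed)
        w = w * np.asarray(prev_w, dtype=float)
    total = w.sum()
    if total > 0:
        return w / total
    return np.full_like(w, 1.0 / w.size)         # degenerate case: uniform
```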
3.3) resampling
After the particle weights are recalculated, a resampling step eliminates particles with lower weight so that the particles gather around those with high weight.
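The resampling of step 3.3) can be sketched with systematic resampling, one standard scheme (the patent does not name a specific resampling variant): low-weight particles are dropped and high-weight ones duplicated.

```python
import numpy as np

def systematic_resample(weights, rng=None):
    """Systematic resampling: returns the indices of surviving particles.
    Particles with large weight are duplicated; small-weight particles
    tend to be eliminated, concentrating the cloud on strong hypotheses."""
    rng = np.random.default_rng(rng)
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n  # one jittered comb of n points
    cumsum = np.cumsum(weights)
    cumsum[-1] = 1.0                               # guard against rounding error
    return np.searchsorted(cumsum, positions)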
Step 4, calculating an attention saliency map SM'_t according to the current particle distribution state, and determining the target position
4.1) Particle saliency map SM'_t
After resampling, the density of the particle distribution reflects the attention intensity: regions of dense particle distribution have strong attention saliency, and sparsely distributed regions weak saliency.
according to the distribution state of the particles, the attention saliency is defined as follows on a two-dimensional space:
where (x, y) is the spatial position of the particle distribution and n is the number of particles; the window function is a two-dimensional Gaussian window function of the given window width, and the above equation is then transformed:
A schematic diagram of the attention saliency calculation result is shown in fig. 6(a); the current attention saliency map, obtained by superimposing the calculated attention saliency with the motion attention and normalizing, is shown in fig. 6(b).
4.2) target location
After the above processing, the distribution state of all particles in each frame is obtained. The larger the particle swarm density, the higher the probability that a moving target exists; therefore, the target position information is estimated by calculating the mean of the particle swarm positions.
The final target location result is shown in fig. 6 (c).
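Step 4 can be sketched as follows: a Parzen-window (Gaussian-kernel) density over particle positions gives the saliency map, and the particle-population mean gives the target position. The window width value is an assumption.

```python
import numpy as np

def particle_saliency(xs, ys, shape, h=3.0):
    """Parzen-window attention saliency on a 2-D grid: each particle adds
    a Gaussian window of width h, so dense particle regions get high
    saliency (step 4.1))."""
    H, W = shape
    yy, xx = np.mgrid[0:H, 0:W]
    sal = np.zeros(shape)
    for x, y in zip(xs, ys):
        sal += np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2.0 * h * h))
    return sal / len(xs)

def locate_target(xs, ys):
    """Target position as the particle-population mean (step 4.2))."""
    return float(np.mean(xs)), float(np.mean(ys))
```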
Figs. 7 and 8 show the test results for the "aircraft" and "horse" sequences respectively. In the "aircraft" sequence, the target to be detected is a red-clad pilot gliding at high speed in the air; the background consists of distant ground objects with complex texture, the aircraft moves toward the top of the image, and the camera follows it. In the "horse" sequence, the target is a horse in vigorous motion; the background is trees and grassland with gently varying texture, the horse runs toward the left of the image, and the camera pans left rapidly. The test results show that the overall effect of the method is superior to the MA, GMC-VA, and Yuming Fang algorithms. The MA algorithm is affected by motion estimation accuracy, making its motion attention calculation inaccurate: when the motion is relatively simple it performs well, as in the "aircraft" sequence, but when the motion is strong the motion attention becomes cluttered and cannot effectively reflect the saliency of the moving target, as in the "horse" sequence. The GMC-VA algorithm integrates global motion compensation and static attention, but errors in global motion estimation generally lead to poor target saliency. The experimental results show that the attention map of the method of the invention, compared with those of the MA, GMC-VA, and Yuming Fang algorithms, greatly weakens background noise interference; owing to the particle filtering process, the particle distribution state changes with the motion state and target position, and the particles finally converge and gather in the moving target region, greatly improving attention accuracy.
Claims (6)
1. A moving target detection method based on a particle filter visual attention model is characterized in that firstly, a particle filter bidirectional fusion attention model is constructed according to a Bayesian estimation principle; and then on the basis of a particle filter bidirectional fusion attention model framework, taking the motion attention and the target color attention as B-U and T-D attention inputs respectively, calculating and changing the distribution state of particles through particle weight values to form an attention saliency map, and finally determining the position of the motion target.
2. The method of claim 1, comprising the steps of:
step 1, calculating the attention of the movement at the current t moment as B-U attention, and recording the significance asBy passingControlling the sampling of the initial importance of the particles;
step 2, calculating the T-D color attention according to the target characteristics;
step 3, adopting particle filtering to fuse bidirectional attention, calculating a particle weight, and forming new particle distribution after resampling;
step 4, calculating the attention saliency map according to the particle distribution state at the momentAnd determines the target location.
3. The method for detecting the moving object based on the particle filter visual attention model according to claim 2, wherein the step 1 is performed according to the following specific steps,
1.1) Gaussian multiscale decomposition of images
The multi-scale analysis adopts a Gaussian image pyramid method;
1.2) estimating a motion vector field by adopting an optical flow method, and performing two preprocessing of superposition and filtering on the motion vector field
The motion vector superposition process is as follows: let the current-frame motion vector field be MVF_t, let the macroblock center coordinate be (k, l), and let the corresponding motion vector be (v_x^{kl}, v_y^{kl}); it is superimposed with the motion vectors of the preceding and succeeding frames according to

$$(v_x^{kl},\, v_y^{kl}) = \sum_{i=n-c}^{i=n+c} \left( v_x^{kl}(i),\ v_y^{kl}(i) \right);$$
the motion vectors are processed by median filtering after superposition, namely, for each non-zero motion vector, the median of the adjacent motion vectors is used for replacing the value of the motion vector;
1.3) calculating the attention of the movement as the attention of the B-U
The motion attention is calculated by defining temporal and spatial attention factors, with SM_t^T and SM_t^S denoting temporal and spatial attention respectively, where v_t^{ij} and v_{t-1}^{ij} denote the motion vector at coordinate position (i, j) in the motion vector fields at times t and t-1, and v̄_A denotes the mean of the motion vectors within the neighborhood A.
the motor attention is obtained by linear fusion of temporal and spatial attention, i.e. <math>
<mrow>
<msubsup>
<mi>SM</mi>
<mi>t</mi>
<mrow>
<mi>B</mi>
<mo>-</mo>
<mi>U</mi>
</mrow>
</msubsup>
<mo>=</mo>
<mi>α</mi>
<mo>·</mo>
<msubsup>
<mi>SM</mi>
<mi>t</mi>
<mi>T</mi>
</msubsup>
<mo>+</mo>
<mi>β</mi>
<mo>·</mo>
<mi>S</mi>
<mo>,</mo>
</mrow>
</math>
In the formula, alpha and beta are coefficients with positive values;
1.4) Controlling particle importance sampling by SM_t^{B-U}
The motion saliency feature adjusts the density of Gaussian random particle sampling, yielding a random sampling result that varies with the motion saliency. Gaussian random sampling gives the initial distribution state of the particles; the particle samples, indexed i = 1, 2, …, N, are independent and identically distributed, such that:
wherein, mux、μy、Andare respectivelyMean and variance of the pseudo-random sequence, expressed in (μ) by the above formulax,μy) Generating random Gaussian sampling results in a region of the coordinate center, wherein the density of the sampled particles in the region is regulated and controlled by the motion significance, assumingIs the saliency value of the saliency map at time t in (x, y) coordinates, the sampling density function is defined as follows:
wherein i and j represent the horizontal and vertical coordinates in the saliency map, respectively; the mean is $u$ and the variance is
$\delta_A^2 = \dfrac{\sum_{\{i,j \mid (i,j) \in SM^t\}} \left[SM(i,j) - u\right]^2}{N}$,
At the initial time, particles are sampled by the above method to form the initial particle distribution state; otherwise, a portion of the particles are sampled at the positions where the current motion attention differs from the previous motion attention, replacing an equal number of low-weight particles from the previous instant, and this serves as the initial particle state at the current instant.
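One way to realize saliency-regulated sampling density is to draw particle positions directly from the normalized saliency map; this is a simplified stand-in for the claimed Gaussian sampling whose density is modulated by saliency, and the function and parameter names are hypothetical:

```python
import numpy as np

def saliency_guided_sampling(saliency, n_particles, rng=None):
    """Draw particle (x, y) positions with probability proportional to
    the motion saliency map, so denser regions receive more particles."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = saliency.shape
    p = saliency.ravel().astype(float)
    # fall back to uniform sampling when the map is all zero
    p = p / p.sum() if p.sum() > 0 else np.full(p.size, 1.0 / p.size)
    idx = rng.choice(p.size, size=n_particles, p=p)
    ys, xs = np.unravel_index(idx, (h, w))
    return np.stack([xs, ys], axis=1)
```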
4. The method for detecting the moving object based on the particle filter visual attention model according to claim 3, wherein the step 2 comprises the following specific steps:
The T-D color attention saliency is denoted accordingly; the magnitude of the T-D attention is measured by the degree of similarity between the target feature and the image feature,
2.1) The color histogram is taken as the quantized representation of the target feature, with m components; the color distribution of the particle target region is defined as
$\hat{p}^{(u)}(X_t) = C \cdot \sum_{i=1}^{M} K\!\left(\left\| \frac{x - x_i}{h} \right\|\right) \cdot \delta\bigl(b(x_i) - u\bigr)$,
where $\delta(\cdot)$ is the delta function, C is the normalization factor ensuring that the histogram components sum to 1, and $K(\cdot)$ is the Epanechnikov kernel function defined as
$K_E(\|r\|) = \begin{cases} 1 - \|r\|^2, & \|r\| < 1 \\ 0, & \|r\| \geq 1 \end{cases}$;
2.2) The T-D attention saliency is calculated as: wherein ρ is the Bhattacharyya coefficient.
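Steps 2.1) and 2.2) can be sketched as follows for a single-channel patch; a real implementation would quantize a full color space into m bins, and the helper names here are illustrative:

```python
import numpy as np

def epanechnikov(r):
    """K_E(||r||) = 1 - ||r||^2 for ||r|| < 1, else 0."""
    return np.where(np.abs(r) < 1, 1 - r ** 2, 0.0)

def color_distribution(patch, m=8):
    """Kernel-weighted histogram of a grey patch (m bins), normalised so
    the components sum to 1: pixels near the patch centre weigh more."""
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # pixel distance from the patch centre, scaled to roughly [0, 1]
    r = np.hypot(yy - (h - 1) / 2, xx - (w - 1) / 2) / (max(h, w) / 2)
    k = epanechnikov(r)
    bins = (patch.astype(float) / 256 * m).astype(int).clip(0, m - 1)
    hist = np.bincount(bins.ravel(), weights=k.ravel(), minlength=m)
    return hist / hist.sum()

def bhattacharyya(p, q):
    """rho = sum_u sqrt(p_u * q_u); equals 1 for identical distributions."""
    return float(np.sum(np.sqrt(p * q)))
```

A patch compared with itself yields ρ = 1, while two patches with disjoint histograms yield ρ = 0, so ρ directly serves as the T-D similarity measure.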
5. The method according to claim 4, wherein the step 3 specifically comprises the following steps:
3.1) importance sampling
The observation likelihood probabilities of the B-U and T-D attention are defined separately, and the particle weights are updated by Bayesian fusion of the two likelihoods;
the importance sampling method adopts the motion significance characteristics to adjust the density of Gaussian random particle sampling as described in the step 1.4);
3.2) weight calculation
Resampling eliminates the low-weight particles so that the particles aggregate around the high-weight ones. Let the particle states be given and the observation be $Z_k$; the posterior probability density function at time k is then approximated as
$p(X^T \mid Z^T) \approx \sum_{i=1}^{N} \lambda^{(i)} \cdot \delta\bigl(X^T - X_{(i)}^T\bigr)$,
wherein $\lambda^{(i)}$, i = 1, 2, …, N, are the particle weights. Assuming the bidirectional fusion saliency map state is $x_{0:t}$ and the B-U and T-D attention observations are given, the posterior probability $P(X^T \mid Z^T)$ is expressed accordingly.
Two assumptions are made: (i) the temporal dynamic process is a Markov process; (ii) the observations at different times are mutually independent, each depending only on the current state. The posterior probability of the bidirectional attention is then derived as follows:
according to the importance sampling theorem, the particle weight $\lambda^{(i)}$ is proportional to the ratio of the posterior density to the importance density, represented as follows:
then:
wherein the first factor is the conditional probability of the B-U attention observation given the current particle attention state, and the second is the conditional probability of the T-D attention observation given the B-U observation and the current particle attention state; these two factors directly determine the updated particle weight, defined as follows:
3.3) resampling
After the particle weights are recalculated, resampling is applied to eliminate the low-weight particles, so that the particles gather around the high-weight ones.
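The weight update by Bayesian fusion and the subsequent resampling of steps 3.2) and 3.3) can be sketched as follows; `systematic_resample` is one common resampling scheme, used here as an assumed stand-in since the claim does not specify which variant is employed:

```python
import numpy as np

def update_weights(lik_bu, lik_td):
    """Bayesian fusion: each particle weight is proportional to the
    product of its B-U and T-D observation likelihoods, normalised."""
    w = np.asarray(lik_bu, dtype=float) * np.asarray(lik_td, dtype=float)
    return w / w.sum()

def systematic_resample(weights, rng=None):
    """Return particle indices after resampling: low-weight particles
    are dropped and high-weight particles are duplicated."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(weights)
    # evenly spaced positions with a single random offset
    positions = (rng.random() + np.arange(n)) / n
    return np.searchsorted(np.cumsum(weights), positions)
```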
6. The method according to claim 5, wherein the step 4 specifically comprises the following steps:
4.1) particle saliency map SM't
According to the distribution state of the particles, the attention saliency over the two-dimensional space is defined as follows:
wherein (x, y) is the spatial position of the particle distribution, n is the number of particles, and the window width parameter sets the width of a two-dimensional Gaussian window function; the above equation is then transformed into:
the computed attention saliency values are superimposed with the motion attention and normalized, giving the current attention saliency value;
4.2) target location
After the above processing, the distribution state of all particles in each frame of the image is obtained, and the target position is estimated as the mean position of the particle swarm, computed as follows:
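The particle saliency map and position estimate of steps 4.1) and 4.2) can be sketched as follows, assuming particles are given as (x, y) pairs; accumulating a Gaussian window per particle is a kernel-density reading of the claimed window function, and `sigma` is an illustrative width:

```python
import numpy as np

def particle_saliency_map(particles, shape, sigma=3.0):
    """Accumulate a 2-D Gaussian window at each particle position to
    form the particle saliency map SM'_t, normalised to [0, 1]."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    sm = np.zeros(shape)
    for x, y in particles:
        sm += np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma ** 2))
    return sm / sm.max()

def estimate_target(particles):
    """Target position as the mean (x, y) of the particle swarm."""
    return np.asarray(particles, dtype=float).mean(axis=0)
```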
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410255267.5A CN104050685B (en) | 2014-06-10 | 2014-06-10 | Moving target detecting method based on particle filter visual attention model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104050685A true CN104050685A (en) | 2014-09-17 |
CN104050685B CN104050685B (en) | 2017-05-31 |
Family
ID=51503468
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410255267.5A Expired - Fee Related CN104050685B (en) | 2014-06-10 | 2014-06-10 | Moving target detecting method based on particle filter visual attention model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104050685B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104732559A (en) * | 2015-02-02 | 2015-06-24 | 大连民族学院 | Multi-target detecting and tracking method based on RGB-D data |
CN104778713A (en) * | 2015-04-27 | 2015-07-15 | 清华大学深圳研究生院 | Image processing method |
CN106951870A (en) * | 2017-02-15 | 2017-07-14 | 重庆警察学院 | The notable event intelligent detecting prewarning method of monitor video that active vision notes |
CN108921051A (en) * | 2018-06-15 | 2018-11-30 | 清华大学 | Pedestrian's Attribute Recognition network and technology based on Recognition with Recurrent Neural Network attention model |
CN109478248A (en) * | 2016-05-20 | 2019-03-15 | 渊慧科技有限公司 | Classified using collection is compared to input sample |
CN109816100A (en) * | 2019-01-30 | 2019-05-28 | 中科人工智能创新技术研究院(青岛)有限公司 | A kind of conspicuousness object detecting method and device based on two-way fusion network |
CN109902763A (en) * | 2019-03-19 | 2019-06-18 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating characteristic pattern |
CN110623680A (en) * | 2019-08-30 | 2019-12-31 | 中国民用航空飞行学院 | Method for testing psychological health of civil aviation flight trainees |
CN111723829A (en) * | 2019-03-18 | 2020-09-29 | 四川大学 | Full-convolution target detection method based on attention mask fusion |
CN115346180A (en) * | 2022-10-18 | 2022-11-15 | 湖北车安达信息科技有限公司 | Road congestion condition detection method and device based on low-frequency data |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102521844A (en) * | 2011-11-30 | 2012-06-27 | 湖南大学 | Particle filter target tracking improvement method based on vision attention mechanism |
Also Published As
Publication number | Publication date |
---|---|
CN104050685B (en) | 2017-05-31 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20170531; Termination date: 20200610 |