CN112991395B - Vision tracking method based on foreground condition probability optimization scale and angle - Google Patents

Vision tracking method based on foreground condition probability optimization scale and angle

Info

Publication number
CN112991395B
CN112991395B (application CN202110462758.7A)
Authority
CN
China
Prior art keywords
frame
foreground
angle
tracking
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110462758.7A
Other languages
Chinese (zh)
Other versions
CN112991395A (en)
Inventor
安志勇
刘晓庆
原达
赵峰
王彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Technology and Business University
Original Assignee
Shandong Technology and Business University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Technology and Business University filed Critical Shandong Technology and Business University
Priority to CN202110462758.7A
Publication of CN112991395A
Application granted
Publication of CN112991395B
Legal status: Active (current)
Anticipated expiration

Classifications

    All classifications fall under G06T (Physics; Computing; Image data processing or generation, in general):
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 5/70 Denoising; Smoothing
    • G06T 7/136 Segmentation; Edge detection involving thresholding
    • G06T 7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G06T 7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/30241 Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a visual tracking method based on foreground conditional probability optimization of scale and angle, and belongs to the field of computer vision. The method comprises the following implementation steps: 1) reading a video frame sequence and calculating the regression frame, segmentation mask (foreground), and minimum bounding box of the frame by the SiamMask method; 2) calculating the proportion of the foreground area within the minimum bounding box; 3) when the proportion is smaller than a set threshold, calculating the reliability of the frame's minimum bounding box; 4) selecting different strategies according to the reliability to optimize the minimum bounding box scale; 5) setting angle offsets for the scale-optimized tracking frame; 6) calculating the IoU value of each offset-angle rotated frame with the foreground; 7) the tracker adaptively outputs the rotated frame with the maximum foreground IoU value. The visual tracking method effectively improves the overall performance of target tracking under complex conditions such as target motion, rotation, and scale change.

Description

Vision tracking method based on foreground condition probability optimization scale and angle
Technical Field
The invention relates to the field of computer vision, and in particular to a visual tracking method based on foreground conditional probability optimization of scale and angle.
Background
Target tracking is a hot topic in computer vision, a prerequisite and basis for higher-level image understanding, and is widely applied in intelligent video surveillance, human-computer interaction, visual navigation, medical diagnosis, and other fields. During tracking, background interference (occlusion, illumination change, etc.) and target changes (rotation, scale change, deformation, etc.) are frequently encountered and cause tracking offset in subsequent frames, which makes target tracking a very challenging problem in the field of computer vision.
In recent years, tracking methods have gradually shifted from generative to discriminative, the latter represented by correlation filtering and deep learning. The Siamese-network series based on deep learning combines the advantages of correlation filtering and deep learning, and has achieved excellent performance in recent years. A Siamese network is a special neural network structure in which the template image and the search image share the weights of one neural network; the target position is determined after cross-correlating the feature maps of the two branches.
The SiamFC network proposed by Bertinetto et al., regarded as the pioneering work of the Siamese-network series, tracks very fast but requires multi-scale detection, so its tracking accuracy is not ideal. SiamRPN, proposed by Li et al., adds a region proposal network; compared with SiamFC, it replaces multi-scale detection with bounding-box regression, so the obtained tracking frame is more accurate. However, the tracking frames obtained by the above networks are all horizontal, which incurs a large loss when the target rotates. SiamMask, proposed by Wang et al., adds a segmentation branch on the basis of SiamRPN and combines image segmentation with target tracking. The network first segments the target and then computes the minimum bounding box of the segmentation mask, which serves as the tracking frame. However, this calculation has a serious defect: when the target moves, the tracking frame contains more background, and directly computing the angle of the minimum bounding box produces a large deviation, which reduces accuracy and causes tracking drift.
To address these problems, the invention proposes a visual tracking method based on foreground conditional probability optimization of scale and angle, which optimizes the tracking frame in both scale and angle, raises the foreground proportion within the tracking frame, and suppresses background interference.
Disclosure of Invention
Aiming at the inaccurate scale and angle of the tracking frame, the invention provides a visual tracking method based on foreground conditional probability optimization of scale and angle, to resolve the instability and inaccuracy of the tracking frame's scale and angle caused by complex scenes such as target motion, rotation, and scale change.
The technical scheme of the invention is as follows:
a visual tracking method based on foreground condition probability optimization scale and angle comprises the following steps:
(1) reading a video frame sequence, and calculating the regression frame, segmentation mask (foreground), and minimum bounding box of the frame by the SiamMask method;
(2) calculating the proportion of the foreground area within the minimum bounding box;
(3) when the proportion is smaller than a set threshold, calculating the reliability of the frame's minimum bounding box;
(4) selecting different strategies according to the reliability to optimize the minimum bounding box scale;
(5) setting angle offsets for the scale-optimized tracking frame;
(6) calculating the IoU (Intersection over Union) value of each offset-angle rotated frame with the foreground;
(7) the tracker adaptively outputs the rotated frame with the maximum foreground IoU value.
In step 1, first, the vertex coordinates of the minimum bounding box M are obtained and recorded as:
M = [(x_A, y_A), (x_B, y_B), (x_C, y_C), (x_D, y_D)] (1)
In step 2, within the minimum bounding box M, the proportion of the foreground F is:
M_F = F_area / M_area (2)
where F_area is the area of the foreground F, M_area is the area of the minimum bounding box M, and M_F is the proportion of the minimum bounding box occupied by the foreground area.
A threshold ρ is set to evaluate the foreground proportion. When M_F is higher than the threshold ρ, the proportion of foreground in the minimum bounding box M is already high, and its original scale is output directly without scale optimization; otherwise, the next step of scale optimization is needed.
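As a concrete illustration of step 2, the following minimal Python sketch computes M_F and applies the threshold test; the function and variable names (needs_scale_optimization, foreground, min_box, rho) are illustrative assumptions rather than part of the patent, and the rotated box is assumed to be in OpenCV's cv2.minAreaRect format ((cx, cy), (w, h), angle).

```python
import numpy as np

def needs_scale_optimization(foreground: np.ndarray,
                             min_box: tuple,
                             rho: float) -> bool:
    """True when the foreground proportion M_F of eq. (2) falls below rho."""
    f_area = float(np.count_nonzero(foreground))   # F_area: foreground pixels
    (_, _), (w, h), _ = min_box                    # minimum bounding box M
    m_area = float(w * h)                          # M_area: area of M
    m_f = f_area / m_area if m_area > 0 else 0.0   # M_F = F_area / M_area
    return m_f < rho                               # below rho: optimize scale
```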
In step 3, calculating the reliability of the frame's minimum bounding box M is divided into the following steps:
S3.1 The regression frame R is rotated to the angle of the minimum bounding box M and denoted R′. The vertex coordinates of R′ are recorded as:
R′ = [(x_a, y_a), (x_b, y_b), (x_c, y_c), (x_d, y_d)] (3)
S3.2 The sample points of the current frame's target search area, {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, are denoted the sample space S. When a sample point falls inside the rectangle M, tracking frame A occurs; when it falls inside the rectangle R′, tracking frame B occurs.
First, under the condition that tracking frame A occurs, the conditional probability that tracking frame B occurs is denoted P1, specifically:
P1 = P(B|A) = P(A∩B) / P(A) (4)
Meanwhile, under the condition that tracking frame B occurs, the conditional probability that tracking frame A does not occur is denoted P0, specifically:
P0 = P(Ā|B) = P(Ā∩B) / P(B) (5)
where P(A), P(B), P(A∩B), and P(Ā∩B) denote the probability that tracking frame A occurs, that tracking frame B occurs, that A and B occur simultaneously, and that B occurs while A does not, respectively. The probability calculation formula is:
P(A) = n_A / n, P(B) = n_B / n, P(A∩B) = n_(A∩B) / n, P(Ā∩B) = n_(Ā∩B) / n (6)
where n_A, n_B, n_(A∩B), and n_(Ā∩B) count the sample points of S falling in the corresponding regions.
The conditional probability P1 reflects the similarity of the two frames, while P0 reflects their difference. The larger P1 and the smaller P0, the more similar the M and R′ frames and the higher the reliability of M; conversely, the smaller P1 or the larger P0, the greater their difference and the lower the reliability of M. Thresholds β and α are set for comparison with the conditional probabilities P1 and P0, respectively, to evaluate the reliability of the minimum bounding box M.
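A minimal sketch of S3.1-S3.2 and formulas (4)-(6), under the assumption that the sample space S is the pixel grid of the search area and that both rectangles are given in cv2.minAreaRect form; the helper names are illustrative, and the patent does not prescribe this particular sampling.

```python
import cv2
import numpy as np

def rect_mask(rect, shape):
    """Rasterize a rotated rectangle ((cx, cy), (w, h), angle) to a boolean mask."""
    mask = np.zeros(shape, dtype=np.uint8)
    pts = cv2.boxPoints(rect).astype(np.int32)     # the 4 vertices of the box
    cv2.fillPoly(mask, [pts], 1)
    return mask.astype(bool)

def reliability_probs(m_rect, r_rect, shape):
    """P1 = P(B|A) and P0 = P(not-A|B), with A: point in M, B: point in R'."""
    in_m = rect_mask(m_rect, shape)                # event A for each sample point
    in_r = rect_mask(r_rect, shape)                # event B for each sample point
    n = in_m.size                                  # |S|: all sample points
    p_a = in_m.sum() / n                           # P(A)
    p_b = in_r.sum() / n                           # P(B)
    p_ab = (in_m & in_r).sum() / n                 # P(A ∩ B)
    p_nab = (~in_m & in_r).sum() / n               # P(Ā ∩ B)
    p1 = p_ab / p_a if p_a > 0 else 0.0            # eq. (4)
    p0 = p_nab / p_b if p_b > 0 else 0.0           # eq. (5)
    return p1, p0
```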
In step 4, two strategies are used to optimize the scale of the minimum bounding box M:
Case 1: when P1 > β and P0 < α hold simultaneously, the reliability of the minimum bounding box M is high, and the intersection of the two frames is taken as the new tracking frame T:
T = M ∩ R′ (7)
Case 2: when P1 < β or P0 > α, the reliability of the minimum bounding box M is low, and the scale is redefined as:
T_w = min(M_w, R′_w), T_h = (M_h + R′_h) / 2 (8)
where R′_w and R′_h are the width and height of R′, and M_w and M_h are the width and height of M (in any rectangle, width < height). The minimum of the widths of M and R′ is set as the width of the new tracking frame T, denoted T_w; the mean of their heights is set as its height, denoted T_h; the center coordinates and angle are unchanged. Finally, the scale-optimized tracking frame T is output:
T = [(x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4)] (9)
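The two strategies might be realized as below, reusing rect_mask from the previous sketch. In case 1 the intersection T = M ∩ R′ is obtained here by rasterizing both boxes and taking the minimum-area rectangle of their overlap, which is one plausible reading of eq. (7) rather than the patent's prescribed construction; case 2 follows eq. (8) directly.

```python
import cv2
import numpy as np

def optimize_scale(m_rect, r_rect, p1, p0, beta, alpha, shape):
    """Scale optimization of step 4; rects are ((cx, cy), (w, h), angle)."""
    if p1 > beta and p0 < alpha:
        # Case 1: M is reliable -- keep the overlap of M and R' (eq. 7).
        overlap = rect_mask(m_rect, shape) & rect_mask(r_rect, shape)
        ys, xs = np.nonzero(overlap)
        if xs.size == 0:                           # degenerate: no overlap
            return m_rect
        pts = np.stack([xs, ys], axis=1).astype(np.float32)
        return cv2.minAreaRect(pts)
    # Case 2: M is unreliable -- redefine the scale (eq. 8).
    (cx, cy), (m_w, m_h), angle = m_rect           # width < height assumed
    (_, _), (r_w, r_h), _ = r_rect
    t_w = min(m_w, r_w)                            # T_w = min(M_w, R'_w)
    t_h = (m_h + r_h) / 2.0                        # T_h = (M_h + R'_h) / 2
    return ((cx, cy), (t_w, t_h), angle)           # center and angle unchanged
```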
Further angle optimization is performed on T in steps 5-7.
In step 5, n offset thresholds with interval μ are set clockwise and counterclockwise for the angle θ of tracking frame T, generating a set of tracking frames at different angles, {T_1(θ−nμ), ..., T_n(θ−μ), T_(n+1)(θ), T_(n+2)(θ+μ), ..., T_(2n+1)(θ+nμ)}. T_i (i = 1, 2, ..., 2n+1) denotes the rotated frame at any angle in the set, on which the next calculation is performed.
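A one-function sketch of this candidate set; candidate_angles is an illustrative name, not from the patent.

```python
def candidate_angles(theta: float, n: int, mu: float) -> list:
    """Return the 2n+1 angles [θ-nμ, ..., θ-μ, θ, θ+μ, ..., θ+nμ]."""
    return [theta + k * mu for k in range(-n, n + 1)]

# The embodiment below uses n = 1 and μ = 20°, giving [θ-20°, θ, θ+20°].
```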
In step 6, calculating the IoU value of each angle's rotated frame T_i with the foreground includes the following steps:
S6.1 A mean-shift algorithm is applied to denoise the frame picture and smooth-filter it at the image color level.
S6.2 The edges of the rotated frame T_i are drawn as a rectangle and filled with color (255, 255, 255), so that its pixels are distinguished from all other pixels;
S6.3 The RGB image of the frame is converted into a gray-scale image, giving different weights to the three channels:
Gray = R*0.299 + G*0.587 + B*0.114 (10)
S6.4 The gray-scale image is binarized and converted into a binary matrix of 0s and 1s, denoted M_(T_i PI), where 1 marks the pixels inside the rotated frame T_i and 0 marks all other pixels.
S6.5 A dot-multiplication (element-wise) operation on the binary matrices of the rotated frame T_i and of the foreground yields the intersection matrix:
M_IPI = M_(T_i PI) ⊙ M_FPI (11)
where M_FPI is the binary matrix of the foreground pixel map, ⊙ denotes the product of corresponding elements of the matrices, and M_IPI is the intersection matrix of the two.
S6.6 The area of a matrix is:
Area = Σ_(x=1..X) Σ_(y=1..Y) h(x, y) (12)
where h(x, y) is the value at coordinate (x, y), and X and Y are the numbers of rows and columns of the matrix.
S6.7, after the intersection is known, the IoU values of the two are:
Figure BDA0003042929110000043
arbitrary TiThe area is equal to T, denoted as T _ area,f _ area is the foreground area.
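S6.2-S6.7 amount to comparing two binary masks. The sketch below, with illustrative names, bypasses the drawing, gray-scale, and thresholding steps (S6.2-S6.4) by rasterizing the rectangle directly, which produces the same binary matrix M_(T_i PI); the mean-shift denoising of S6.1 is likewise omitted.

```python
import cv2
import numpy as np

def rotated_iou(rect, foreground: np.ndarray) -> float:
    """IoU of a rotated rectangle and the foreground mask, as in eqs. (11)-(13)."""
    t_mask = np.zeros(foreground.shape[:2], dtype=np.uint8)
    pts = cv2.boxPoints(rect).astype(np.int32)
    cv2.fillPoly(t_mask, [pts], 1)                 # M_(T_i PI): 1 inside T_i
    f_mask = (foreground > 0).astype(np.uint8)     # M_FPI: 1 on the foreground
    inter = t_mask * f_mask                        # eq. (11): element-wise product
    i_area = float(inter.sum())                    # eq. (12): sum of matrix entries
    t_area = float(t_mask.sum())                   # T_area
    f_area = float(f_mask.sum())                   # F_area
    union = t_area + f_area - i_area
    return i_area / union if union > 0 else 0.0    # eq. (13)
```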
In step 7, the specific steps are as follows:
The rotated frame with the maximum IoU value with the target foreground, which the tracker outputs, is denoted T*. The formula is:
T* = argmax_(T_i) IoU_i (14)
T* is the final tracking frame obtained after the tracker jointly optimizes scale and angle.
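Step 7 is then a plain arg-max over the candidate set, building on rotated_iou above:

```python
def best_rotation(rects, foreground):
    """Pick the rotated frame T* with the largest foreground IoU (eq. 14)."""
    return max(rects, key=lambda r: rotated_iou(r, foreground))
```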
Compared with the prior art, the method provided by the invention has the following advantages:
(1) The defects in existing combinations of image segmentation and target tracking are analyzed and identified.
(2) In terms of scale optimization, the accuracy of the tracking frame and the proportion of foreground are improved.
(3) For complex scenes such as severe target motion, the angle of the tracking frame remains accurate.
Drawings
FIG. 1 is a schematic flow chart of the algorithm of the present invention;
FIG. 2 compares the tracking effect on three videos with moving targets selected from the VOT2018 dataset, against the ground-truth frame and the SiamMask network.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely a few embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. Referring to fig. 1, the example provided by the present invention uses the SiamMask tracking algorithm as a baseline tracker, and the specific implementation manner is as follows.
(1) A video frame sequence is read, and the regression frame, segmentation mask (foreground), and minimum bounding box of the frame are calculated by the SiamMask method. First, the vertex coordinates of the minimum bounding box M are obtained and recorded as:
M = [(x_A, y_A), (x_B, y_B), (x_C, y_C), (x_D, y_D)] (1)
(2) Within the minimum bounding box M, the proportion of the foreground F is:
M_F = F_area / M_area (2)
where F_area is the area of the foreground F, M_area is the area of the minimum bounding box M, and M_F is the proportion of the minimum bounding box occupied by the foreground area.
The threshold ρ = 0.9 is set to evaluate the foreground proportion. When M_F is higher than the threshold ρ, the proportion of foreground in the minimum bounding box M is already high, and its original scale is output directly without scale optimization; otherwise, the next step of scale optimization is needed.
(3) When the proportion is smaller than the set threshold, the reliability of the frame's minimum bounding box is calculated. The specific steps are as follows:
3.1 The regression frame R is rotated to the angle of the minimum bounding box M and denoted R′. The vertex coordinates of R′ are recorded as:
R′ = [(x_a, y_a), (x_b, y_b), (x_c, y_c), (x_d, y_d)] (3)
3.2 The sample points of the current frame's target search area, {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, are denoted the sample space S. When a sample point falls inside the rectangle M, tracking frame A occurs; when it falls inside the rectangle R′, tracking frame B occurs.
First, under the condition that tracking frame A occurs, the conditional probability that tracking frame B occurs is denoted P1, specifically:
P1 = P(B|A) = P(A∩B) / P(A) (4)
Meanwhile, under the condition that tracking frame B occurs, the conditional probability that tracking frame A does not occur is denoted P0, specifically:
P0 = P(Ā|B) = P(Ā∩B) / P(B) (5)
where P(A), P(B), P(A∩B), and P(Ā∩B) denote the probability that tracking frame A occurs, that tracking frame B occurs, that A and B occur simultaneously, and that B occurs while A does not, respectively. The probability calculation formula is:
P(A) = n_A / n, P(B) = n_B / n, P(A∩B) = n_(A∩B) / n, P(Ā∩B) = n_(Ā∩B) / n (6)
where n_A, n_B, n_(A∩B), and n_(Ā∩B) count the sample points of S falling in the corresponding regions.
The conditional probability P1 reflects the similarity of the two frames, while P0 reflects their difference. The larger P1 and the smaller P0, the more similar the M and R′ frames and the higher the reliability of M; conversely, the smaller P1 or the larger P0, the greater their difference and the lower the reliability of M. The thresholds β = 0.8 and α = 0.2 are set for comparison with the conditional probabilities P1 and P0, respectively, to evaluate the reliability of the minimum bounding box M.
(4) Different strategies are selected according to the reliability to optimize the minimum bounding box scale, divided into two strategies in total:
Case 1: when P1 > β and P0 < α hold simultaneously, the reliability of the minimum bounding box M is high, and the intersection of the two frames is taken as the new tracking frame T:
T = M ∩ R′ (7)
Case 2: when P1 < β or P0 > α, the reliability of the minimum bounding box M is low, and the scale is redefined as:
T_w = min(M_w, R′_w), T_h = (M_h + R′_h) / 2 (8)
where R′_w and R′_h are the width and height of R′, and M_w and M_h are the width and height of M (in any rectangle, width < height). The minimum of the widths of M and R′ is set as the width of the new tracking frame T, denoted T_w; the mean of their heights is set as its height, denoted T_h; the center coordinates and angle are unchanged. Finally, the scale-optimized tracking frame T is output:
T = [(x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4)] (9)
Further angle optimization is performed on T in (5)-(7).
(5) Offset settings are made for the angle of the scale-optimized tracking frame. The angle θ of tracking frame T is shifted by 20° clockwise and counterclockwise, generating the tracking frame set {T_1(θ−20°), T_2(θ), T_3(θ+20°)}. T_i (i = 1, 2, 3) denotes the rotated frame at any angle in the set, on which the next calculation is performed.
(6) Calculating the IoU value of each offset-angle rotated frame with the foreground comprises the following steps:
6.1 A mean-shift algorithm is applied to denoise the frame picture and smooth-filter it at the image color level.
6.2 The edges of the rotated frame T_i are drawn as a rectangle and filled with color (255, 255, 255), so that its pixels are distinguished from all other pixels;
6.3 The RGB image of the frame is converted into a gray-scale map, giving different weights to the three channels:
Gray = R*0.299 + G*0.587 + B*0.114 (10)
6.4 The gray-scale image is binarized and converted into a binary matrix of 0s and 1s, denoted M_(T_i PI), where 1 marks the pixels inside the rotated frame T_i and 0 marks all other pixels.
6.5 A dot-multiplication (element-wise) operation on the binary matrices of the rotated frame T_i and of the foreground yields the intersection matrix:
M_IPI = M_(T_i PI) ⊙ M_FPI (11)
where M_FPI is the binary matrix of the foreground pixel map, ⊙ denotes the product of corresponding elements of the matrices, and M_IPI is the intersection matrix of the two.
6.6 The area of a matrix is:
Area = Σ_(x=1..X) Σ_(y=1..Y) h(x, y) (12)
where h(x, y) is the value at coordinate (x, y), and X and Y are the numbers of rows and columns of the matrix.
6.7 With the intersection known, the IoU value of the two is:
IoU_i = I_area / (T_area + F_area − I_area) (13)
where any T_i has the same area as T, denoted T_area; F_area is the foreground area; and I_area is the area of the intersection matrix M_IPI.
(7) The rotated frame with the maximum foreground IoU value, which the tracker adaptively outputs, is denoted T*. The formula is:
T* = argmax_(T_i) IoU_i (14)
T* is the final tracking frame obtained after the tracker jointly optimizes scale and angle.
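Chaining the sketches above with the concrete values of this embodiment (ρ = 0.9, β = 0.8, α = 0.2, offsets of ±20°) gives a per-frame pipeline of roughly the following shape; segment_frame is a hypothetical stand-in for the SiamMask outputs (regression frame R, foreground mask, minimum bounding box M), not an actual SiamMask API.

```python
def track_frame(frame, segment_frame):
    """One tracking step: scale optimization (steps 2-4) then angle search (5-7)."""
    r_rect, foreground, m_rect = segment_frame(frame)   # SiamMask outputs (assumed)
    shape = foreground.shape[:2]
    if not needs_scale_optimization(foreground, m_rect, rho=0.9):
        t_rect = m_rect                                 # foreground ratio high: keep M
    else:
        p1, p0 = reliability_probs(m_rect, r_rect, shape)
        t_rect = optimize_scale(m_rect, r_rect, p1, p0,
                                beta=0.8, alpha=0.2, shape=shape)
    center, (w, h), theta = t_rect
    candidates = [(center, (w, h), a)
                  for a in candidate_angles(theta, n=1, mu=20.0)]
    return best_rotation(candidates, foreground)        # final tracking frame T*
```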
To verify the overall tracking performance of the invention, it is evaluated on the VOT dataset of the mainstream single-target tracking platform, using the 60 videos of the VOT2018 dataset, with overall evaluation on three metrics: Robustness, Accuracy, and Expected Average Overlap (EAO).
Table 1 compares the invention with the SiamMask network on the three metrics on the VOT2018 dataset. As shown in the table, the invention improves Accuracy by about 3.6% and EAO by about 1.9% over the baseline SiamMask network.
Table 1 comparison of VOT2018 dataset results
[Table 1 is reproduced as an image in the original: comparison of the invention and SiamMask on the VOT2018 dataset under the three metrics; the numeric values are not recoverable from this text.]
The above-mentioned embodiments express only several embodiments of the present invention, and their description is specific and detailed, but they are not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (5)

1. A visual tracking method based on foreground conditional probability optimization of scale and angle, characterized in that the original tracking frame is optimized in the two aspects of scale and angle respectively and the foreground proportion within the tracking frame is raised, wherein steps 2-4 are scale optimization and steps 5-7 are angle optimization; the visual tracking method comprises the following specific steps:
Step 1: reading a video frame sequence, and calculating the regression frame, segmentation mask foreground, and minimum bounding box of the frame by the SiamMask method;
Step 2: calculating the proportion of the foreground area within the minimum bounding box;
Step 3: when the proportion is smaller than a set threshold, calculating the reliability of the frame's minimum bounding box;
Step 4: selecting different strategies according to the reliability to optimize the minimum bounding box scale;
Step 5: setting angle offsets for the scale-optimized tracking frame;
Step 6: calculating the IoU value of each offset-angle rotated frame with the foreground;
Step 7: the tracker adaptively outputting the rotated frame with the maximum foreground IoU value;
the specific steps of step 3 are as follows: calculating the reliability of the frame's minimum bounding box M is divided into the following steps:
S3.1 rotating the regression frame R to the angle of the minimum bounding box M, denoted R′, whose vertex coordinates are recorded as:
R′ = [(x_a, y_a), (x_b, y_b), (x_c, y_c), (x_d, y_d)] (3)
S3.2 denoting the sample points of the current frame's target search area, {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, as the sample space S; when a sample point falls inside the minimum bounding box M, tracking frame A occurs, and when it falls inside the rectangle R′, tracking frame B occurs;
first, under the condition that tracking frame A occurs, the conditional probability that tracking frame B occurs is denoted P1, specifically:
P1 = P(B|A) = P(A∩B) / P(A) (4)
meanwhile, under the condition that tracking frame B occurs, the conditional probability that tracking frame A does not occur is denoted P0, specifically:
P0 = P(Ā|B) = P(Ā∩B) / P(B) (5)
setting thresholds β and α to be compared with the conditional probabilities P1 and P0, respectively, and evaluating the reliability of the minimum bounding box M;
the specific steps of step 4 are as follows: the scale of the minimum bounding box M is optimized by two strategies:
Case 1: when P1 > β and P0 < α hold simultaneously, the reliability of the minimum bounding box M is high, and the intersection of the two frames is taken as the new tracking frame T:
T = M ∩ R′ (6)
Case 2: when P1 < β or P0 > α, the reliability of the minimum bounding box M is low, and the scale is redefined as:
T_w = min(M_w, R′_w), T_h = (M_h + R′_h) / 2 (7)
where R′_w and R′_h are the width and height of R′, and M_w and M_h are the width and height of M; the minimum of the widths of M and R′ is set as the width of the new tracking frame T, denoted T_w; the mean of their heights is set as its height, denoted T_h; the center coordinates and angle are unchanged; finally, the scale-optimized tracking frame T is output:
T = [(x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4)] (8)
further angle optimization is performed on T in steps 5-7.
2. The visual tracking method based on foreground conditional probability optimization of scale and angle according to claim 1, characterized in that the specific steps of step 2 are as follows: first, the vertex coordinates of the minimum bounding box M are obtained and recorded as:
M = [(x_A, y_A), (x_B, y_B), (x_C, y_C), (x_D, y_D)] (1)
within the minimum bounding box M, the proportion of the foreground F is:
M_F = F_area / M_area (2)
where F_area is the area of the foreground F, M_area is the area of the minimum bounding box M, and M_F is the proportion of the minimum bounding box occupied by the foreground area;
a threshold ρ is set to evaluate the foreground proportion; when M_F is higher than the threshold ρ, the proportion of foreground in the minimum bounding box M is already high, and its original scale is output directly without scale optimization; otherwise, the next step of scale optimization is needed.
3. The visual tracking method based on foreground conditional probability optimization of scale and angle according to claim 1, characterized in that the specific steps of step 5 are as follows: n offset thresholds with interval μ are set clockwise and counterclockwise for the angle θ of tracking frame T, generating a set of tracking frames at different angles, {T_1(θ−nμ), ..., T_n(θ−μ), T_(n+1)(θ), T_(n+2)(θ+μ), ..., T_(2n+1)(θ+nμ)}; T_i (i = 1, 2, ..., 2n+1) denotes the rotated frame at any angle in the set, on which the next calculation is performed.
4. The visual tracking method based on foreground conditional probability optimization of scale and angle according to claim 1, characterized in that calculating the IoU value of each angle's rotated frame T_i with the foreground in step 6 comprises the following steps:
S6.1 applying a mean-shift algorithm to denoise the frame picture and smooth-filter it at the image color level;
S6.2 drawing the edges of the rotated frame T_i as a rectangle and filling it with color (255, 255, 255), so that its pixels are distinguished from all other pixels;
S6.3 converting the RGB image of the frame into a gray-scale image, giving different weights to the three channels:
Gray = R*0.299 + G*0.587 + B*0.114 (9)
S6.4 binarizing the gray-scale image into a binary matrix of 0s and 1s, denoted M_(T_i PI), where 1 marks the pixels inside the rotated frame T_i and 0 marks all other pixels;
S6.5 performing a dot-multiplication (element-wise) operation on the binary matrices of the rotated frame T_i and of the foreground to obtain the intersection matrix:
M_IPI = M_(T_i PI) ⊙ M_FPI (10)
where M_FPI is the binary matrix of the foreground pixel map, ⊙ denotes the product of corresponding elements of the matrices, and M_IPI is the intersection matrix of the two;
S6.6 the area of a matrix is:
Area = Σ_(x=1..X) Σ_(y=1..Y) h(x, y) (11)
where h(x, y) is the value at coordinate (x, y), and X and Y are the numbers of rows and columns of the matrix;
S6.7 with the intersection known, the IoU value of the two is:
IoU_i = I_area / (T_area + F_area − I_area) (12)
where any T_i has the same area as T, denoted T_area; F_area is the foreground area; and I_area is the area of the intersection matrix M_IPI.
5. The visual tracking method based on foreground conditional probability optimization of scale and angle according to claim 3, characterized in that the specific steps of step 7 are as follows:
the rotated frame with the maximum IoU value with the target foreground, which the tracker outputs, is denoted T*; the formula is:
T* = argmax_(T_i) IoU_i (13)
T* is the final tracking frame obtained after the tracker jointly optimizes scale and angle.
CN202110462758.7A 2021-04-28 2021-04-28 Vision tracking method based on foreground condition probability optimization scale and angle Active CN112991395B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110462758.7A CN112991395B (en) 2021-04-28 2021-04-28 Vision tracking method based on foreground condition probability optimization scale and angle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110462758.7A CN112991395B (en) 2021-04-28 2021-04-28 Vision tracking method based on foreground condition probability optimization scale and angle

Publications (2)

Publication Number Publication Date
CN112991395A CN112991395A (en) 2021-06-18
CN112991395B true CN112991395B (en) 2022-04-15

Family

ID=76340400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110462758.7A Active CN112991395B (en) 2021-04-28 2021-04-28 Vision tracking method based on foreground condition probability optimization scale and angle

Country Status (1)

Country Link
CN (1) CN112991395B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993775A (en) * 2019-04-01 2019-07-09 云南大学 Monotrack method based on feature compensation
CN111383244A (en) * 2020-02-28 2020-07-07 浙江大华技术股份有限公司 Target detection tracking method
CN112132855A (en) * 2020-09-22 2020-12-25 山东工商学院 Self-adaptive Gaussian function target tracking method based on foreground segmentation guidance
CN112258558A (en) * 2020-10-23 2021-01-22 复旦大学 Target tracking method based on multi-scale twin network, electronic device and medium
CN112509008A (en) * 2020-12-15 2021-03-16 重庆邮电大学 Target tracking method based on intersection-to-parallel ratio guided twin network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111445497B (en) * 2020-02-25 2021-03-30 华中科技大学 Target tracking and following method based on scale context regression
CN111968155B (en) * 2020-07-23 2022-05-17 天津大学 Target tracking method based on segmented target mask updating template

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993775A (en) * 2019-04-01 2019-07-09 云南大学 Monotrack method based on feature compensation
CN111383244A (en) * 2020-02-28 2020-07-07 浙江大华技术股份有限公司 Target detection tracking method
CN112132855A (en) * 2020-09-22 2020-12-25 山东工商学院 Self-adaptive Gaussian function target tracking method based on foreground segmentation guidance
CN112258558A (en) * 2020-10-23 2021-01-22 复旦大学 Target tracking method based on multi-scale twin network, electronic device and medium
CN112509008A (en) * 2020-12-15 2021-03-16 重庆邮电大学 Target tracking method based on intersection-to-parallel ratio guided twin network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Novel Incremental Multi-Template Update Strategy for Robust Object Tracking; A. Zhiyong et al.; IEEE Access; 2020-09-04; vol. 8; full text *

Also Published As

Publication number Publication date
CN112991395A (en) 2021-06-18

Similar Documents

Publication Publication Date Title
CN109241913B (en) Ship detection method and system combining significance detection and deep learning
CN107194408B (en) Target tracking method of mixed block sparse cooperation model
CN107424171B (en) Block-based anti-occlusion target tracking method
CN108062525B (en) Deep learning hand detection method based on hand region prediction
CN110473231B (en) Target tracking method of twin full convolution network with prejudging type learning updating strategy
CN110766724B (en) Target tracking network training and tracking method and device, electronic equipment and medium
CN110866924A (en) Line structured light center line extraction method and storage medium
CN108320306B (en) Video target tracking method fusing TLD and KCF
CN106991686B (en) A kind of level set contour tracing method based on super-pixel optical flow field
CN109993052B (en) Scale-adaptive target tracking method and system under complex scene
CN111160212B (en) Improved tracking learning detection system and method based on YOLOv3-Tiny
CN111612817A (en) Target tracking method based on depth feature adaptive fusion and context information
CN109035300B (en) Target tracking method based on depth feature and average peak correlation energy
CN112364865B (en) Method for detecting small moving target in complex scene
CN110555868A (en) method for detecting small moving target under complex ground background
CN112270697B (en) Satellite sequence image moving target detection method combined with super-resolution reconstruction
CN111951297A (en) Target tracking method based on structured pixel-by-pixel target attention mechanism
CN109949344B (en) Nuclear correlation filtering tracking method based on color probability target suggestion window
CN109978916B (en) Vibe moving target detection method based on gray level image feature matching
CN108846845B (en) SAR image segmentation method based on thumbnail and hierarchical fuzzy clustering
CN108921872B (en) Robust visual target tracking method suitable for long-range tracking
Sun et al. Adaptive image dehazing and object tracking in UAV videos based on the template updating Siamese network
CN112991395B (en) Vision tracking method based on foreground condition probability optimization scale and angle
JP3716455B2 (en) Region extraction method and region extraction device
CN110322479B (en) Dual-core KCF target tracking method based on space-time significance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant