CN111462181A - Video single-target tracking method based on rectangular asymmetric inverse layout model - Google Patents

Video single-target tracking method based on rectangular asymmetric inverse layout model

Info

Publication number
CN111462181A
CN111462181A
Authority
CN
China
Prior art keywords
target
frame
rectangular
positive
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010235520.6A
Other languages
Chinese (zh)
Other versions
CN111462181B (en)
Inventor
郑运平 (Zheng Yunping)
鲁梦如 (Lu Mengru)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202010235520.6A priority Critical patent/CN111462181B/en
Publication of CN111462181A publication Critical patent/CN111462181A/en
Application granted granted Critical
Publication of CN111462181B publication Critical patent/CN111462181B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155Bayesian classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/446Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering using Haar-like filters, e.g. using integral image techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video single-target tracking method based on a rectangular asymmetric inverse layout model, which comprises the following steps: 1) initialize the first frame, and divide the target image into homogeneous blocks using the rectangular asymmetric inverse layout model (RNAM) and the extended Gouraud shading method to establish a target segmentation template; 2) generate a positive sample set and a negative sample set; 3) extract Haar-like features of the positive and negative sample sets according to the target segmentation template and update the parameters of a naive Bayes classifier; 4) in the current frame, select a group of candidate sample bounding boxes around the target position predicted in the previous frame, and extract Haar-like features according to the target segmentation template; 5) compute the classifier responses of all candidate sample bounding boxes with the naive Bayes classifier, and select the bounding box with the largest response as the predicted target position of the current frame. The method can effectively describe the target image with a small number of blocks, can resist interference such as illumination change and occlusion, and saves computation time.

Description

Video single-target tracking method based on rectangular asymmetric inverse layout model
Technical Field
The invention relates to the technical field of computer vision, in particular to a video single-target tracking method based on a rectangular asymmetric inverse layout model.
Background
Computer vision is a comprehensive discipline whose main concern is how to make a machine acquire information the way a human does: a camera takes the place of the human visual organ to capture images or video, and a computer takes the place of the human brain to process that information.
Target tracking is one of the main research fields of computer vision, and is also the basic core technology of computer vision, and many subsequent processes in the field of computer vision need to be based on target tracking, such as target recognition, target motion analysis and other high-level processes. Target tracking technology has been applied in numerous fields, such as intelligent monitoring systems, human-computer interaction, visual navigation, robotic services, and the like.
A video is a continuous image sequence; each frame is an image, generally a natural image with a complex background. Video target tracking must follow the target in every frame: by analyzing and computing over the video sequence, the position, scale and other information of the target are obtained, so that the target is tracked through the continuous sequence.
The invention considers the general case of video target tracking, namely video single-target tracking: tracking a single object in a video sequence given the target position and scale information of the first frame. Generally, a target tracking algorithm consists of three parts: a target description model, prediction of the target position, and model updating, among which an effective target description model is crucial to the success of the tracking algorithm. With the improvement in the performance of computers and cameras and the reduction of their cost, the demand for automatic video analysis keeps increasing, and video target tracking algorithms have become one of today's research hotspots.
According to the basic idea of their design, target tracking algorithms fall into two classes: generative and discriminative. Generative algorithms build a target model by online learning and search for the image region with the minimum reconstruction error to locate the target; that is, the target region is modeled in the current frame, and in the next frame the region most similar to the model is searched for as the predicted target position. Well-known examples include the Kalman Filter, the Particle Filter and the Mean-Shift algorithm: the Kalman Filter models the features of the target and estimates the position of the target in the next frame; the Particle Filter, based on particle-distribution statistics, computes the probability density function by searching over a group of random samples spread through the state space, replacing the integral operation with a sample mean to obtain a minimum-variance estimate of the system state; Mean-Shift is based on the probability density distribution and converges on a local peak of that distribution. Discriminative algorithms, by contrast, treat tracking as a classification problem: features are extracted from the target and from the background, and a classifier is trained online to distinguish the target from the background, a classical example being the compressive tracking (CT) algorithm.
However, owing to challenges such as illumination variation, scale variation, occlusion interference and motion blur of the target image, designing an effective target tracking model remains very difficult. Although many target tracking algorithms have shown their advantages, some problems remain to be solved. For example, discriminative algorithms mostly use complex classifier models, which increases computational complexity and costs real-time performance; and some algorithms, because they use global and fixed update strategies, may learn erroneous samples and therefore suffer from tracking drift.
Disclosure of Invention
The invention aims to overcome the defects of the existing video single-target tracking technology and provides a video single-target tracking method based on a rectangular asymmetric inverse layout model. The method can effectively describe the target image with fewer blocks, can resist interference such as illumination change and occlusion, saves computation time, and greatly improves the real-time performance of the video tracking algorithm. In addition, the invention uses a naive Bayes classifier to learn the positive and negative samples, with high classification accuracy and high classification speed.
In order to achieve the purpose, the technical scheme provided by the invention is as follows: a video single-target tracking method based on a rectangular asymmetric inverse layout model comprises the following steps:
1) initializing a first frame, and dividing the target image into homogeneous blocks by using the rectangular asymmetric inverse layout model (RNAM) and the extended Gouraud shading method to establish a target segmentation template;
2) generation of positive and negative sample sets
Randomly selecting a group of bounding boxes as a positive sample set in the area of the current frame close to the target position, and randomly selecting a group of bounding boxes as a negative sample set in the area of the current frame far from the target position;
3) updating classifier parameters
Extracting Haar-like features of the positive and negative sample sets according to the target segmentation template and updating the parameters of a naive Bayes classifier;
4) candidate sample bounding box selection
Selecting a group of candidate sample bounding boxes in the current frame around the target position predicted in the previous frame, and extracting Haar-like features according to the target segmentation template;
5) prediction of target location
The classifier responses of all candidate sample bounding boxes are calculated by using a naive Bayes classifier, and the sample bounding box with the largest classifier response is selected as the predicted target position of the current frame.
In step 1), the rectangular asymmetric inverse layout model (RNAM) is specifically as follows:
the non-symmetric inverse layout model (NAM) is a general pattern representation model, and the rectangular NAM (RNAM) is a NAM representation model based on rectangular sub-patterns; the following is an abstract description of the NAM:
Assuming the original pattern is Γ, the recovered distortion-free pattern is Γ′, and the distortion pattern is Γ″, the NAM is a lossless transform from Γ to Γ′, or a lossy transform from Γ to Γ″:
Γ′ = T(Γ), Γ″ ≈ T(Γ)
where T(·) is a forward transform function, also called the encoding function, and the forward encoding process is:
Γ′ = T(Γ) = (∪_{j=1..n} ∪_{i=1..m_j} p_j(a_i, v)) ∪ ε(d)
where Γ′ is the synthesized pattern after encoding; P = {p_1, p_2, ..., p_n} is a predefined set of sub-patterns, n is the number of sub-pattern types, p_j ∈ P is the jth sub-pattern in P, 1 ≤ j ≤ n, v is the value of p_j, and A is the parameter set of sub-pattern p_j, or a sub-pattern set selected through intelligent analysis; a_i is the ith parameter set of sub-pattern p_j, 1 ≤ i ≤ m_j, where m_j is the number of sub-patterns of type p_j; ε(d) is the debris pattern, i.e. the residue left in the container, and d is the threshold of the debris spatial scale. The distortion pattern of the NAM is:
Γ″ ≈ T(Γ) = ∪_{j=1..n} ∪_{i=1..m_j} p_j(a_i, v)
Obviously, Γ′ = Γ″ + ε(d).
The basic idea of the RNAM is: given a pattern to be laid out and predefined rectangular sub-patterns of different shapes, extract the sub-patterns from the given pattern and represent the given pattern by the combination of these sub-patterns.
The extended Gouraud shading method is specifically as follows:
Given an error tolerance ε, if all pixel values g(x, y) within a rectangular block B satisfy |g(x, y) − g_est(x, y)| ≤ ε, x_1 ≤ x ≤ x_2, y_1 ≤ y ≤ y_2, then the rectangular block is called a homogeneous block, where x_1 and y_1 are respectively the abscissa and ordinate of the upper-left vertex of the rectangular block B, x_2 and y_2 are respectively the abscissa and ordinate of the lower-right vertex of the rectangular block B, and g_est(x, y) is the approximate gray value at a coordinate (x, y) in B, calculated as follows:
g_est(x, y) = g_5 + (g_6 − g_5) × i_1
where g_1, g_2, g_3 and g_4 are the gray values at the four corners of the rectangular block B, namely the upper-left, upper-right, lower-left and lower-right corners, g_5 = g_1 + (g_2 − g_1) × i_2, g_6 = g_3 + (g_4 − g_3) × i_2, i_1 = (y − y_1)/(y_2 − y_1), and i_2 = (x − x_1)/(x_2 − x_1).
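The homogeneity test above is a bilinear (Gouraud) interpolation check over the block's four corners. A minimal sketch in Python follows, assuming a NumPy grayscale block; the function names g_est and is_homogeneous are illustrative assumptions, not the patent's reference implementation.

```python
import numpy as np

def g_est(block: np.ndarray, x: int, y: int) -> float:
    """Bilinear estimate of the gray value at block-local (x, y) from the
    four corner values g1..g4 (upper-left, upper-right, lower-left, lower-right)."""
    h, w = block.shape
    g1, g2 = float(block[0, 0]), float(block[0, w - 1])
    g3, g4 = float(block[h - 1, 0]), float(block[h - 1, w - 1])
    i1 = y / (h - 1) if h > 1 else 0.0   # vertical interpolation weight
    i2 = x / (w - 1) if w > 1 else 0.0   # horizontal interpolation weight
    g5 = g1 + (g2 - g1) * i2             # interpolated value on the top edge
    g6 = g3 + (g4 - g3) * i2             # interpolated value on the bottom edge
    return g5 + (g6 - g5) * i1

def is_homogeneous(block: np.ndarray, eps: float) -> bool:
    """A rectangular block is homogeneous if every pixel deviates from its
    bilinear estimate by at most the error tolerance eps."""
    h, w = block.shape
    for y in range(h):
        for x in range(w):
            if abs(float(block[y, x]) - g_est(block, x, y)) > eps:
                return False
    return True
```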
Suppose {I_t | t = 1, 2, ..., N} is a given video sequence, where I_t is the video frame corresponding to time t and N is the number of frames of the video. For the first frame I_1, the target position is known as L_1 = (x_1, y_1, w_1, h_1), where x_1 and y_1 respectively represent the abscissa and ordinate of the upper-left vertex of the target image, and w_1 and h_1 respectively represent the width and height of the target image. The target image is then converted into a grayscale image. Assuming W and H are respectively the width and height of the target image and an error tolerance ε is given, the target image is divided into homogeneous blocks by using the RNAM and the extended Gouraud shading method to establish the target segmentation template, specifically:
a. defining a marking matrix R of size W × H and initializing all its elements to zero;
b. starting from the first element of the target image, first determining the starting point (x_0, y_0) of an unmarked rectangular sub-pattern in raster-scan order, and tracking the corresponding rectangular sub-pattern according to the matching algorithm of the rectangular sub-pattern, i.e. the inverse layout algorithm;
c. according to the efficiency measure of the rectangular sub-pattern, i.e. its area, together with the extended Gouraud shading method under the error tolerance ε, determining the rectangular sub-pattern with the largest area, i.e. a homogeneous block, and marking this largest sub-pattern in the matrix R so as to search for the next starting point;
d. storing the position information of the homogeneous block found in step c, including the coordinates of its upper-left vertex and its width and height, into the homogeneous-block container F;
e. repeating steps b to d until the target image contains no unmarked blocks.
The homogeneous-block container F is denoted F = {b_1, b_2, ..., b_n} and represents the target segmentation template, where b_i = (x_i, y_i, w_i, h_i) denotes the position information of the ith homogeneous block and n denotes the number of homogeneous blocks.
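A condensed sketch of steps a to e follows, reusing the is_homogeneous test from the previous sketch. The greedy grow-right-then-down rule used here to find the largest homogeneous rectangle is one plausible reading of the inverse-layout matching step, not the patent's reference implementation.

```python
import numpy as np

def segment_rnam(gray: np.ndarray, eps: float):
    """Split a grayscale target image into homogeneous blocks.
    Returns the homogeneous-block container F as a list of (x, y, w, h)."""
    H, W = gray.shape
    marked = np.zeros((H, W), dtype=bool)   # marking matrix R
    F = []
    for y0 in range(H):
        for x0 in range(W):
            if marked[y0, x0]:
                continue
            # b. unmarked starting point found in raster-scan order
            w = 1
            while (x0 + w < W and not marked[y0, x0 + w]
                   and is_homogeneous(gray[y0:y0 + 1, x0:x0 + w + 1], eps)):
                w += 1
            h = 1
            while (y0 + h < H and not marked[y0 + h, x0:x0 + w].any()
                   and is_homogeneous(gray[y0:y0 + h + 1, x0:x0 + w], eps)):
                h += 1
            # c./d. mark the block in R and store its position information in F
            marked[y0:y0 + h, x0:x0 + w] = True
            F.append((x0, y0, w, h))
    return F
```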
In step 2), the positive and negative sample sets are generated specifically as follows:
Suppose {I_t | t = 1, 2, ..., N} is a given video sequence, where I_t is the video frame corresponding to time t and N is the number of frames of the video, and let l_t denote the center position of the target box of frame t. In frame t, a group of bounding boxes is randomly selected as the positive sample set in the region close to the target position, D_α = {z | ||l(z) − l_t|| < α}, and a group of bounding boxes is randomly selected as the negative sample set in the region farther from the target position, D_{ζ,β} = {z | ζ < ||l(z) − l_t|| < β}, where z denotes a selected sample box, l(z) denotes the center position of the sample box, and α, ζ and β are radii centered at l_t.
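A hedged sketch of this ring sampling follows; sample_boxes and its polar-coordinate drawing scheme are illustrative assumptions (any sampler whose box centers satisfy the radius constraints would do, and production code would clip boxes to the frame).

```python
import numpy as np

def sample_boxes(center, box_wh, r_min, r_max, count, rng=None):
    """Randomly draw `count` bounding boxes (x, y, w, h) whose centers l(z)
    satisfy r_min <= ||l(z) - center|| < r_max."""
    rng = rng or np.random.default_rng()
    cx, cy = center
    w, h = box_wh
    boxes = []
    while len(boxes) < count:
        r = rng.uniform(r_min, r_max)            # radial distance from center
        theta = rng.uniform(0.0, 2.0 * np.pi)    # direction
        zx, zy = cx + r * np.cos(theta), cy + r * np.sin(theta)
        boxes.append((int(zx - w / 2), int(zy - h / 2), w, h))
    return boxes

# Positives: centers within alpha of the target; negatives: in the ring (zeta, beta).
# pos = sample_boxes(l_t, (w, h), 0, alpha, 45)
# neg = sample_boxes(l_t, (w, h), zeta, beta, 50)
```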
In step 3), the Haar-like features of the positive and negative sample sets are extracted according to the target segmentation template specifically as follows:
Suppose the ith homogeneous block in the target segmentation template is denoted b_i = (x_i, y_i, w_i, h_i), where x_i and y_i respectively represent the abscissa and ordinate of the upper-left vertex of the ith homogeneous block, and w_i and h_i respectively represent its width and height. Let Int(x, y) represent the element value at position (x, y) in the integral image of the video frame at a certain time; the Haar-like feature of the ith homogeneous block is extracted according to the target segmentation template as:
Haar(i) = Int(x_i + w_i, y_i + h_i) + Int(x_i, y_i) − Int(x_i + w_i, y_i) − Int(x_i, y_i + h_i)
The method for extracting the Haar-like features of the bounding boxes in the positive and negative sample sets is the same. Assuming the number of bounding boxes in the positive sample set is M and the number of homogeneous blocks in the target segmentation template is N, the specific steps are as follows:
a. establishing a sample-set feature matrix S of size M × N;
b. calculating the integral image of the current frame;
c. calculating the element values in the sample-set feature matrix S according to the Haar-like feature extraction method above.
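A sketch of steps a to c follows, using OpenCV's integral image. Placing each homogeneous block relative to the sample box's upper-left corner is an assumption about how the template is applied to each sample box; boxes are assumed to lie inside the frame.

```python
import cv2
import numpy as np

def haar_features(gray: np.ndarray, boxes, template):
    """Return the M x N feature matrix S: one row per sample box,
    one column per homogeneous block in the segmentation template."""
    ii = cv2.integral(gray)                  # (H+1) x (W+1) integral image
    S = np.zeros((len(boxes), len(template)))
    for m, (bx, by, _, _) in enumerate(boxes):
        for n, (x, y, w, h) in enumerate(template):
            xi, yi = bx + x, by + y          # block placed inside the sample box
            # Haar(i) = Int(x+w, y+h) + Int(x, y) - Int(x+w, y) - Int(x, y+h)
            S[m, n] = (ii[yi + h, xi + w] + ii[yi, xi]
                       - ii[yi, xi + w] - ii[yi + h, xi])
    return S
```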
The parameters of the naive Bayes classifier are updated as follows:
According to the feature matrices of the positive and negative sample sets, the mean and standard deviation of the Haar-like feature corresponding to each homogeneous block are calculated over the positive and negative sample sets. The naive Bayes classifier has two parameters: the first is the feature mean vector, denoted P_μ = {μ_1, μ_2, ..., μ_n}, and the second is the feature standard-deviation vector, denoted P_σ = {σ_1, σ_2, ..., σ_n}, where μ_i represents the mean of the ith Haar-like feature over the positive or negative sample set, σ_i represents the standard deviation of the ith Haar-like feature over the positive or negative sample set, and n represents the number of Haar-like features, i.e. the number of homogeneous blocks of the target segmentation template. The strategy for incrementally updating each scalar parameter in the two feature vectors is:
μ_i ← (1 − λ)μ + λμ_i
σ_i ← sqrt((1 − λ)σ² + λσ_i² + λ(1 − λ)(μ − μ_i)²)
where λ represents the learning rate, 0 < λ < 1, μ represents the mean of the ith feature over the positive or negative sample set before the classifier parameter update, and σ represents the standard deviation of the ith feature over the positive or negative sample set before the update.
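A sketch of this incremental update follows; the mixture-variance form of the σ update mirrors the rule given above, and λ = 0.85 follows the embodiment below. The function name and the column-per-feature matrix layout are assumptions.

```python
import numpy as np

def update_params(mu_old, sigma_old, S, lam=0.85):
    """Blend stored per-feature mean/std (mu_old, sigma_old) with the
    statistics of a new sample-set feature matrix S (one column per feature)."""
    mu_new = S.mean(axis=0)
    sigma_new = S.std(axis=0)
    mu = (1.0 - lam) * mu_old + lam * mu_new
    sigma = np.sqrt((1.0 - lam) * sigma_old**2 + lam * sigma_new**2
                    + lam * (1.0 - lam) * (mu_old - mu_new)**2)
    return mu, sigma
```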
In step 4), a group of candidate sample bounding boxes is selected in the current frame around the target position predicted in the previous frame, specifically as follows:
Suppose {I_t | t = 1, 2, ..., N} is a given video sequence, where I_t is the video frame corresponding to time t and N is the number of frames of the video, and let l_{t−1} denote the center position of the target box of frame t−1. In frame t, a group of candidate sample bounding boxes is randomly selected within the radius-γ region around the previous target position, D_γ = {z | ||l(z) − l_{t−1}|| < γ}, where z denotes a selected sample bounding box, l(z) denotes the center position of the sample box, and γ is a radius centered at l_{t−1}.
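The candidate set is the same kind of ring sample, centered on the previous prediction with inner radius 0 and outer radius γ; reusing the illustrative sample_boxes helper from the step-2 sketch (an assumption, not the patent's code):

```python
# Candidate boxes: centers within radius gamma of the previous prediction
# (the embodiment below uses gamma = 20 and 100 candidates).
candidates = sample_boxes(l_prev, (w, h), 0, gamma, 100)
```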
In step 5), the naive Bayes classifier is used to calculate the responses of all candidate sample bounding boxes, and the candidate sample bounding box with the maximum response value is selected as the predicted target position of the current frame, specifically as follows:
Denote the Haar-like features corresponding to the homogeneous blocks in the target segmentation template by V = (v_1, v_2, ..., v_n)^T, where v_i (i = 1, 2, ..., n) represents the Haar-like feature of the ith homogeneous block and n represents the number of homogeneous blocks in the target segmentation template. Assuming all elements in V are independently distributed, a naive Bayes classifier is constructed as follows:
H(V) = Σ_{i=1..n} log( (p(v_i | y = 1) p(y = 1)) / (p(v_i | y = 0) p(y = 0)) )
where p(y = 1) and p(y = 0) denote the probabilities that the sample is positive or negative respectively, y ∈ {0, 1} denotes the sample label, a uniform prior p(y = 1) = p(y = 0) is assumed, and p(v_i | y = 1) and p(v_i | y = 0) denote the conditional distribution probabilities of feature v_i given a positive or a negative sample label. Since the homogeneous-block feature vectors based on the RNAM partition always follow a Gaussian distribution, the conditional distributions p(v_i | y = 1) and p(v_i | y = 0) in the classifier H(V) can be assumed to be Gaussian:
p(v_i | y = 1) ~ N(μ_i^1, σ_i^1), p(v_i | y = 0) ~ N(μ_i^0, σ_i^0)
where μ_i^1 and σ_i^1 respectively represent the mean and standard deviation of the ith feature over the positive sample set, and μ_i^0 and σ_i^0 respectively represent the mean and standard deviation over the negative sample set.
Finally, the classifier responses corresponding to all candidate sample bounding boxes are calculated through the constructed classifier H(V), and the sample bounding box with the maximum classifier response value is selected as the predicted target position of the current frame.
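A sketch of the response computation under the uniform prior follows; the vectorized scoring over a candidate feature matrix and the small floor on σ are implementation assumptions for numerical stability.

```python
import numpy as np

def log_gaussian(v, mu, sigma):
    """Log-density of N(mu, sigma) evaluated elementwise."""
    sigma = np.maximum(sigma, 1e-6)          # avoid division by zero
    return -0.5 * ((v - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

def classifier_response(V, mu_pos, sig_pos, mu_neg, sig_neg):
    """H(v) = sum_i [log p(v_i|y=1) - log p(v_i|y=0)] for each candidate row
    of V, assuming the uniform prior p(y=1) = p(y=0)."""
    return (log_gaussian(V, mu_pos, sig_pos)
            - log_gaussian(V, mu_neg, sig_neg)).sum(axis=1)

# best_box = candidates[np.argmax(classifier_response(V, mu1, s1, mu0, s0))]
```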
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention uses the rectangular asymmetric inverse layout model to block the target image and establish the target segmentation template, which can effectively highlight the structural characteristics of the target image with a small number of blocks, can resist interference such as illumination change and target occlusion, reduces computational complexity, and meets real-time requirements.
2. The invention uses Haar-like features to extract image features from the target image, which describe the tracked target well and can be computed quickly, greatly enhancing the robustness and real-time performance of the tracker.
3. The invention uses a simple naive Bayes classifier to learn the positive and negative samples and selects the candidate sample bounding box with the maximum classifier response as the predicted target position. The classifier has strong classification performance, behaves well in the face of background clutter, fast motion, motion blur and other complex application environments, involves less computation than common classifiers, and better meets real-time requirements.
Drawings
FIG. 1 is a logic flow diagram of the present invention.
FIG. 2 is a schematic diagram of classifier updating and predicting target locations.
FIG. 3 is a flow chart of target image segmentation.
Detailed Description
The present invention will be further described with reference to the following specific examples.
As shown in FIG. 1 and FIG. 2, the method for tracking a single video target based on a rectangular asymmetric inverse layout model provided in this embodiment includes the following steps:
1) Initialize the first frame, and divide the target image into homogeneous blocks by using the rectangular asymmetric inverse layout model (RNAM) and the extended Gouraud shading method to establish a target segmentation template. Specifically, the program first reads in the first frame of the video sequence and the position and scale information of the target image (represented using the Rect data structure in OpenCV). The target image is then converted into a grayscale image. Let W and H be respectively the width and height of the target image, and set the given error tolerance ε to 20; the target image is then divided into homogeneous blocks by using the RNAM and the extended Gouraud shading method to establish the target segmentation template. The flow chart is shown in FIG. 3, and the specific steps are as follows:
a. A marking matrix R of size W × H is defined and all its elements are initialized to zero.
b. Starting from the first element of the target image, the starting point (x_0, y_0) of an unmarked rectangular sub-pattern is first determined in raster-scan order, and the corresponding rectangular sub-pattern is tracked according to the matching (inverse layout) algorithm of the rectangular sub-pattern.
c. According to the efficiency measure of the rectangular sub-pattern (i.e. its area) and the extended Gouraud shading method (with the error tolerance ε equal to 20), the rectangular sub-pattern with the largest area (i.e. the homogeneous block) is determined, and this largest sub-pattern is marked in the matrix R so that the next starting point can be searched.
d. The position information of the homogeneous block found in step c, including the coordinates of its upper-left vertex and its width and height, is stored into the homogeneous-block container F.
e. Steps b to d are repeated until the target image contains no unmarked blocks.
The homogeneous-block container F obtained by the above procedure is denoted F = {b_1, b_2, ..., b_n} and represents the target segmentation template, where b_i = (x_i, y_i, w_i, h_i) is the position information of the ith homogeneous block (which may be represented with the Rect data structure in OpenCV) and n is the number of homogeneous blocks.
2) The positive and negative sample sets are generated: a group of bounding boxes is randomly selected as the positive sample set in the region of the current frame close to the target position, and a group of bounding boxes is randomly selected as the negative sample set in the region farther from the target position. The selection of the positive and negative sample sets is shown in FIG. 2. Let l_t denote the center position of the target box of frame t. In frame t, 45 bounding boxes are randomly selected as the positive sample set in the region close to the target position, D_α = {z | ||l(z) − l_t|| < α} (where α takes 4), and 50 bounding boxes are randomly selected as the negative sample set in the region farther from the target position, D_{ζ,β} = {z | ζ < ||l(z) − l_t|| < β} (where ζ takes 8 and β takes 30), where z denotes a selected sample box, l(z) denotes the center position of the sample box, and α, ζ and β are radii centered at l_t.
3) The classifier parameters are updated: the Haar-like features of the positive and negative sample sets are extracted according to the target segmentation template, and the parameters of the naive Bayes classifier are updated. Specifically, the Haar-like features of the positive and negative sample sets are first extracted according to the positive and negative sample sets selected in the second step and the target segmentation template (i.e. the homogeneous-block container F) obtained in the first step. The feature extraction method is described as follows:
Suppose the ith homogeneous block in the target segmentation template is denoted b_i = (x_i, y_i, w_i, h_i), where x_i and y_i respectively represent the abscissa and ordinate of the upper-left vertex of the ith homogeneous block, and w_i and h_i respectively represent its width and height. Let Int(x, y) denote the element value at position (x, y) in the integral image of the video frame at a certain time; the Haar-like feature of the ith homogeneous block is extracted according to the target segmentation template as:
Haar(i) = Int(x_i + w_i, y_i + h_i) + Int(x_i, y_i) − Int(x_i + w_i, y_i) − Int(x_i, y_i + h_i)
When the Haar-like features of a sample box in a sample set are extracted, each homogeneous block corresponds to one feature, and the extraction method is the same for the bounding boxes in the positive and negative sample sets. Taking the feature extraction of the positive sample set as an example, assume the number of bounding boxes in the positive sample set is M (i.e. M = 45) and the number of homogeneous blocks in the target segmentation template is N. The specific steps are as follows:
a. a sample set feature matrix S of size M × N is established.
b. An integral image of the current frame is calculated.
c. The element values in the sample set feature matrix S are calculated according to the Haar-like feature extraction method above.
After the features of the positive and negative samples are extracted, the results obtained are the feature matrix S⁺ of the positive sample set and the feature matrix S⁻ of the negative sample set. According to the feature matrices of the positive and negative sample sets, the mean and standard deviation of the Haar-like feature corresponding to each homogeneous block are calculated over the positive and negative sample sets. The naive Bayes classifier has two parameters: the first is the feature mean vector, denoted P_μ = {μ_1, μ_2, ..., μ_n}, and the second is the feature standard-deviation vector, denoted P_σ = {σ_1, σ_2, ..., σ_n}, where μ_i represents the mean of the ith Haar-like feature over the positive or negative sample set, σ_i represents the standard deviation of the ith Haar-like feature over the positive or negative sample set, and n represents the number of Haar-like features, i.e. the number of homogeneous blocks of the target segmentation template. The strategy for incrementally updating each scalar parameter in the two feature vectors is:
μ_i ← (1 − λ)μ + λμ_i
σ_i ← sqrt((1 − λ)σ² + λσ_i² + λ(1 − λ)(μ − μ_i)²)
where λ represents the learning rate (0 < λ < 1); λ is 0.85 in this embodiment. μ represents the mean of the ith feature over the positive or negative sample set before the classifier parameter update, and σ represents the standard deviation of the ith feature over the positive or negative sample set before the update.
4) A group of candidate sample bounding boxes is selected in the current frame around the target position predicted in the previous frame, and their Haar-like features are extracted according to the target segmentation template. Specifically, suppose {I_t | t = 1, 2, ..., N} is a given video sequence, where I_t is the video frame corresponding to time t and N is the number of frames of the video, and let l_{t−1} denote the center position of the target box of frame t−1. In frame t, 100 candidate sample bounding boxes are randomly selected within the radius-γ region around the previous target position, D_γ = {z | ||l(z) − l_{t−1}|| < γ}, where γ takes 20, z denotes a selected sample bounding box, l(z) denotes the center position of the sample box, and γ is a radius centered at l_{t−1}.
The Haar-like features of the candidate sample bounding boxes are extracted in the same way as those of the positive and negative sample sets in step 3), finally yielding the feature matrix of the candidate sample bounding boxes.
5) The target position is predicted: the classifier responses of all candidate sample bounding boxes are calculated with the naive Bayes classifier, and the sample bounding box with the maximum classifier response is selected as the predicted target position of the current frame. Denote the Haar-like features corresponding to the homogeneous blocks in the target segmentation template by V = (v_1, v_2, ..., v_n)^T, where v_i (i = 1, 2, ..., n) represents the Haar-like feature of the ith homogeneous block and n represents the number of homogeneous blocks in the target segmentation template. Assuming all elements in V are independently distributed, a naive Bayes classifier can be constructed as follows:
H(V) = Σ_{i=1..n} log( (p(v_i | y = 1) p(y = 1)) / (p(v_i | y = 0) p(y = 0)) )
where p(y = 1) and p(y = 0) denote the probabilities that the sample is positive or negative respectively, y ∈ {0, 1} denotes the sample label, a uniform prior p(y = 1) = p(y = 0) is assumed, and p(v_i | y = 1) and p(v_i | y = 0) denote the conditional distribution probabilities of feature v_i given a positive or a negative sample label. Since the homogeneous-block feature vectors based on the RNAM partition always follow a Gaussian distribution, the conditional distributions p(v_i | y = 1) and p(v_i | y = 0) in the classifier H(V) can be assumed to be Gaussian:
p(v_i | y = 1) ~ N(μ_i^1, σ_i^1), p(v_i | y = 0) ~ N(μ_i^0, σ_i^0)
where μ_i^1 and σ_i^1 respectively represent the mean and standard deviation of the ith feature over the positive sample set, and μ_i^0 and σ_i^0 respectively represent the mean and standard deviation over the negative sample set.
Finally, the classifier responses corresponding to all candidate sample bounding boxes are calculated through the constructed classifier H(V), and the sample bounding box with the maximum classifier response value is selected as the predicted target position of the current frame.
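For orientation, a compact sketch of the whole per-frame loop follows, chaining the illustrative helpers from the earlier sketches (segment_rnam, sample_boxes, haar_features, update_params, classifier_response) with this embodiment's parameters; it is one reading of the method under those assumptions, not the patent's reference code.

```python
import numpy as np

def track(frames, init_box, eps=20.0):
    """Run the tracker over grayscale frames; init_box = (x, y, w, h)."""
    x, y, w, h = init_box
    template = segment_rnam(frames[0][y:y + h, x:x + w], eps)   # step 1
    mu1 = s1 = mu0 = s0 = None
    box = init_box
    for t, frame in enumerate(frames):
        l_t = (box[0] + w / 2.0, box[1] + h / 2.0)
        if t > 0:
            # steps 4-5: score 100 candidates within radius gamma = 20
            cand = sample_boxes(l_t, (w, h), 0, 20, 100)
            V = haar_features(frame, cand, template)
            box = cand[int(np.argmax(
                classifier_response(V, mu1, s1, mu0, s0)))]
            l_t = (box[0] + w / 2.0, box[1] + h / 2.0)
        # step 2: 45 positives within alpha = 4, 50 negatives in ring (8, 30)
        Sp = haar_features(frame, sample_boxes(l_t, (w, h), 0, 4, 45), template)
        Sn = haar_features(frame, sample_boxes(l_t, (w, h), 8, 30, 50), template)
        if mu1 is None:                                  # first-frame initialization
            mu1, s1 = Sp.mean(axis=0), Sp.std(axis=0)
            mu0, s0 = Sn.mean(axis=0), Sn.std(axis=0)
        else:                                            # step 3: lambda = 0.85
            mu1, s1 = update_params(mu1, s1, Sp, lam=0.85)
            mu0, s0 = update_params(mu0, s0, Sn, lam=0.85)
    return box
```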
The above-described embodiments are merely preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto; changes made according to the shape and principle of the present invention shall be covered within the protection scope of the present invention.

Claims (6)

1. A video single-target tracking method based on a rectangular asymmetric inverse layout model is characterized by comprising the following steps:
1) initializing a first frame, and dividing the target image into homogeneous blocks by using the rectangular asymmetric inverse layout model (RNAM) and the extended Gouraud shading method to establish a target segmentation template;
2) generation of positive and negative sample sets
Randomly selecting a group of bounding boxes as a positive sample set in the area of the current frame close to the target position, and randomly selecting a group of bounding boxes as a negative sample set in the area of the current frame far from the target position;
3) updating classifier parameters
Extracting Haar-like features of the positive and negative sample sets according to the target segmentation template and updating the parameters of a naive Bayes classifier;
4) candidate sample bounding box selection
Selecting a group of candidate sample bounding boxes in the current frame around the target position predicted in the previous frame, and extracting Haar-like features according to the target segmentation template;
5) prediction of target location
The classifier responses of all candidate sample bounding boxes are calculated by using a naive Bayes classifier, and the sample bounding box with the largest classifier response is selected as the predicted target position of the current frame.
2. The method for tracking a single video target based on the rectangular asymmetric inverse layout model according to claim 1, wherein in step 1), the rectangular asymmetric inverse layout model (RNAM) is specifically as follows:
the non-symmetric inverse layout model (NAM) is a general pattern representation model, and the rectangular NAM (RNAM) is a NAM representation model based on rectangular sub-patterns; the following is an abstract description of the NAM:
assuming the original pattern is Γ, the recovered distortion-free pattern is Γ′, and the distortion pattern is Γ″, the NAM is a lossless transform from Γ to Γ′, or a lossy transform from Γ to Γ″:
Γ′ = T(Γ), Γ″ ≈ T(Γ)
where T(·) is a forward transform function, also called the encoding function, and the forward encoding process is:
Γ′ = T(Γ) = (∪_{j=1..n} ∪_{i=1..m_j} p_j(a_i, v)) ∪ ε(d)
where Γ′ is the synthesized pattern after encoding; P = {p_1, p_2, ..., p_n} is a predefined set of sub-patterns, n is the number of sub-pattern types, p_j ∈ P is the jth sub-pattern in P, 1 ≤ j ≤ n, v is the value of p_j, and A is the parameter set of sub-pattern p_j, or a sub-pattern set selected through intelligent analysis; a_i is the ith parameter set of sub-pattern p_j, 1 ≤ i ≤ m_j, where m_j is the number of sub-patterns of type p_j; ε(d) is the debris pattern, i.e. the residue left in the container, and d is the threshold of the debris spatial scale; the distortion pattern of the NAM is:
Γ″ ≈ T(Γ) = ∪_{j=1..n} ∪_{i=1..m_j} p_j(a_i, v)
and obviously Γ′ = Γ″ + ε(d);
the basic idea of the RNAM is: given a pattern to be laid out and predefined rectangular sub-patterns of different shapes, extract the sub-patterns from the given pattern and represent the given pattern by the combination of these sub-patterns;
the extended Gouraud shading method is specifically as follows:
given an error tolerance ε, if all pixel values g(x, y) within a rectangular block B satisfy |g(x, y) − g_est(x, y)| ≤ ε, x_1 ≤ x ≤ x_2, y_1 ≤ y ≤ y_2, then the rectangular block is called a homogeneous block, where x_1 and y_1 are respectively the abscissa and ordinate of the upper-left vertex of the rectangular block B, x_2 and y_2 are respectively the abscissa and ordinate of the lower-right vertex of the rectangular block B, and g_est(x, y) is the approximate gray value at a coordinate (x, y) in B, calculated as follows:
g_est(x, y) = g_5 + (g_6 − g_5) × i_1
where g_1, g_2, g_3 and g_4 are the gray values at the four corners of the rectangular block B, namely the upper-left, upper-right, lower-left and lower-right corners, g_5 = g_1 + (g_2 − g_1) × i_2, g_6 = g_3 + (g_4 − g_3) × i_2, i_1 = (y − y_1)/(y_2 − y_1), and i_2 = (x − x_1)/(x_2 − x_1);
suppose {I_t | t = 1, 2, ..., N} is a given video sequence, where I_t is the video frame corresponding to time t and N is the number of frames of the video; for the first frame I_1, the target position is known as L_1 = (x_1, y_1, w_1, h_1), where x_1 and y_1 respectively represent the abscissa and ordinate of the upper-left vertex of the target image, and w_1 and h_1 respectively represent the width and height of the target image; the target image is then converted into a grayscale image; assuming W and H are respectively the width and height of the target image and an error tolerance ε is given, the target image is divided into homogeneous blocks by using the RNAM and the extended Gouraud shading method to establish the target segmentation template, specifically:
a. defining a marking matrix R of size W × H and initializing all its elements to zero;
b. starting from the first element of the target image, first determining the starting point (x_0, y_0) of an unmarked rectangular sub-pattern in raster-scan order, and tracking the corresponding rectangular sub-pattern according to the matching algorithm of the rectangular sub-pattern, i.e. the inverse layout algorithm;
c. according to the efficiency measure of the rectangular sub-pattern, i.e. its area, together with the extended Gouraud shading method under the error tolerance ε, determining the rectangular sub-pattern with the largest area, i.e. a homogeneous block, and marking this largest sub-pattern in the matrix R so as to search for the next starting point;
d. storing the position information of the homogeneous block found in step c, including the coordinates of its upper-left vertex and its width and height, into the homogeneous-block container F;
e. repeating steps b to d until the target image contains no unmarked blocks;
the homogeneous-block container F is denoted F = {b_1, b_2, ..., b_n} and represents the target segmentation template, where b_i = (x_i, y_i, w_i, h_i) denotes the position information of the ith homogeneous block and n denotes the number of homogeneous blocks.
3. The method for tracking the single video target based on the rectangular asymmetric inverse layout model according to claim 1, wherein: in step 2), the generation of the positive and negative sample sets specifically includes:
suppose {I_t | t = 1, 2, ..., N} is a given video sequence, where I_t is the video frame corresponding to time t and N is the number of frames of the video, and let l_t denote the center position of the target box of frame t; in frame t, a group of bounding boxes is randomly selected as the positive sample set in the region close to the target position, D_α = {z | ||l(z) − l_t|| < α}, and a group of bounding boxes is randomly selected as the negative sample set in the region farther from the target position, D_{ζ,β} = {z | ζ < ||l(z) − l_t|| < β}, where z denotes a selected sample box, l(z) denotes the center position of the sample box, and α, ζ and β are radii centered at l_t.
4. The method for tracking the single video target based on the rectangular asymmetric inverse layout model according to claim 1, wherein: in step 3), extracting Haar-like features of the positive and negative sample sets according to the target segmentation template specifically includes:
suppose the ith homogeneous block in the target segmentation template is denoted b_i = (x_i, y_i, w_i, h_i), where x_i and y_i respectively represent the abscissa and ordinate of the upper-left vertex of the ith homogeneous block, and w_i and h_i respectively represent its width and height; let Int(x, y) represent the element value at position (x, y) in the integral image of the video frame at a certain time; the Haar-like feature of the ith homogeneous block is extracted according to the target segmentation template as:
Haar(i) = Int(x_i + w_i, y_i + h_i) + Int(x_i, y_i) − Int(x_i + w_i, y_i) − Int(x_i, y_i + h_i)
the method for extracting the Haar-like features of the bounding boxes in the positive and negative sample sets is the same; assuming the number of bounding boxes in the positive sample set is M and the number of homogeneous blocks in the target segmentation template is N, the specific steps are as follows:
a. establishing a sample set characteristic matrix S with the size of M × N;
b. calculating an integral image of the current frame;
c. calculating element values in a sample set characteristic matrix S according to a Haar-like characteristic extraction method;
the parameters of the naive Bayes classifier are updated as follows:
according to the feature matrices of the positive and negative sample sets, the mean and standard deviation of the Haar-like feature corresponding to each homogeneous block are calculated over the positive and negative sample sets; the naive Bayes classifier has two parameters: the first is the feature mean vector, denoted P_μ = {μ_1, μ_2, ..., μ_n}, and the second is the feature standard-deviation vector, denoted P_σ = {σ_1, σ_2, ..., σ_n}, where μ_i represents the mean of the ith Haar-like feature over the positive or negative sample set, σ_i represents the standard deviation of the ith Haar-like feature over the positive or negative sample set, and n represents the number of Haar-like features, i.e. the number of homogeneous blocks of the target segmentation template; the strategy for incrementally updating each scalar parameter in the two feature vectors is:
μ_i ← (1 − λ)μ + λμ_i
σ_i ← sqrt((1 − λ)σ² + λσ_i² + λ(1 − λ)(μ − μ_i)²)
where λ represents the learning rate, 0 < λ < 1, μ represents the mean of the ith feature over the positive or negative sample set before the classifier parameter update, and σ represents the standard deviation of the ith feature over the positive or negative sample set before the update.
5. The method for tracking the single video target based on the rectangular asymmetric inverse layout model according to claim 1, wherein: in step 4), a group of candidate sample bounding boxes is selected around the same position of the current frame according to the target position predicted by the previous frame, which is as follows:
suppose {I_t | t = 1, 2, ..., N} is a given video sequence, where I_t is the video frame corresponding to time t and N is the number of frames of the video, and let l_{t−1} denote the center position of the target box of frame t−1; in frame t, a group of candidate sample bounding boxes is randomly selected within the radius-γ region around the previous target position, D_γ = {z | ||l(z) − l_{t−1}|| < γ}, where z denotes a selected sample bounding box, l(z) denotes the center position of the sample box, and γ is a radius centered at l_{t−1}.
6. The method for tracking the single video target based on the rectangular asymmetric inverse layout model according to claim 1, wherein: in step 5), the naive bayes classifier is used to calculate the responses of all candidate sample bounding boxes, and the candidate sample bounding box with the maximum response value is selected as the predicted target position of the current frame, which is specifically as follows:
denote the Haar-like features corresponding to the homogeneous blocks in the target segmentation template by V = (v_1, v_2, ..., v_n)^T, where v_i represents the Haar-like feature of the ith homogeneous block, i = 1, 2, ..., n, and n represents the number of homogeneous blocks in the target segmentation template; assuming all elements in V are independently distributed, a naive Bayes classifier is constructed as follows:
H(V) = Σ_{i=1..n} log( (p(v_i | y = 1) p(y = 1)) / (p(v_i | y = 0) p(y = 0)) )
where p(y = 1) and p(y = 0) denote the probabilities that the sample is positive or negative respectively, y ∈ {0, 1} denotes the sample label, a uniform prior p(y = 1) = p(y = 0) is assumed, and p(v_i | y = 1) and p(v_i | y = 0) denote the conditional distribution probabilities of feature v_i given a positive or a negative sample label; since the homogeneous-block feature vectors based on the RNAM partition always follow a Gaussian distribution, the conditional distributions p(v_i | y = 1) and p(v_i | y = 0) in the classifier H(V) can be assumed to be Gaussian:
p(v_i | y = 1) ~ N(μ_i^1, σ_i^1), p(v_i | y = 0) ~ N(μ_i^0, σ_i^0)
where μ_i^1 and σ_i^1 respectively represent the mean and standard deviation of the ith feature over the positive sample set, and μ_i^0 and σ_i^0 respectively represent the mean and standard deviation over the negative sample set;
finally, the classifier responses corresponding to all candidate sample bounding boxes are calculated through the constructed classifier H(V), and the sample bounding box with the maximum classifier response value is selected as the predicted target position of the current frame.
CN202010235520.6A 2020-03-30 2020-03-30 Video single-target tracking method based on rectangular asymmetric inverse layout model Active CN111462181B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010235520.6A CN111462181B (en) 2020-03-30 2020-03-30 Video single-target tracking method based on rectangular asymmetric inverse layout model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010235520.6A CN111462181B (en) 2020-03-30 2020-03-30 Video single-target tracking method based on rectangular asymmetric inverse layout model

Publications (2)

Publication Number Publication Date
CN111462181A (en) 2020-07-28
CN111462181B CN111462181B (en) 2023-06-20

Family

ID=71683336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010235520.6A Active CN111462181B (en) 2020-03-30 2020-03-30 Video single-target tracking method based on rectangular asymmetric inverse layout model

Country Status (1)

Country Link
CN (1) CN111462181B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106355570A (en) * 2016-10-21 2017-01-25 昆明理工大学 Binocular stereoscopic vision matching method combining depth characteristics
US20170286750A1 (en) * 2016-03-29 2017-10-05 Seiko Epson Corporation Information processing device and computer program


Also Published As

Publication number Publication date
CN111462181B (en) 2023-06-20

Similar Documents

Publication Publication Date Title
Novotny et al. Semi-convolutional operators for instance segmentation
Enzweiler et al. A mixed generative-discriminative framework for pedestrian classification
CN110111338B (en) Visual tracking method based on superpixel space-time saliency segmentation
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
US6400828B2 (en) Canonical correlation analysis of image/control-point location coupling for the automatic location of control points
CN109829449B (en) RGB-D indoor scene labeling method based on super-pixel space-time context
CN106815842B (en) improved super-pixel-based image saliency detection method
CN110334762B (en) Feature matching method based on quad tree combined with ORB and SIFT
CN108053420B (en) Partition method based on finite space-time resolution class-independent attribute dynamic scene
JP2006209755A (en) Method for tracing moving object inside frame sequence acquired from scene
JP2008310796A (en) Computer implemented method for constructing classifier from training data detecting moving object in test data using classifier
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
CN113362341B (en) Air-ground infrared target tracking data set labeling method based on super-pixel structure constraint
CN112465021B (en) Pose track estimation method based on image frame interpolation method
WO2008125854A1 (en) Method for tracking multiple objects with occlusions
Sadeghi-Tehran et al. A real-time approach for autonomous detection and tracking of moving objects from UAV
Alsanad et al. Real-time fuel truck detection algorithm based on deep convolutional neural network
CN115063526A (en) Three-dimensional reconstruction method and system of two-dimensional image, terminal device and storage medium
CN113436251B (en) Pose estimation system and method based on improved YOLO6D algorithm
Lin et al. Temporally coherent 3D point cloud video segmentation in generic scenes
CN114049531A (en) Pedestrian re-identification method based on weak supervision human body collaborative segmentation
Lin et al. COB method with online learning for object tracking
CN111462181B (en) Video single-target tracking method based on rectangular asymmetric inverse layout model
Liu et al. Fast tracking via spatio-temporal context learning based on multi-color attributes and pca
Li et al. Multitarget tracking of pedestrians in video sequences based on particle filters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant