CN111462181A - Video single-target tracking method based on rectangular asymmetric inverse layout model - Google Patents

Video single-target tracking method based on rectangular asymmetric inverse layout model

Info

Publication number
CN111462181A
CN111462181A
Authority
CN
China
Prior art keywords
target
frame
rectangular
positive
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010235520.6A
Other languages
Chinese (zh)
Other versions
CN111462181B (en)
Inventor
郑运平 (Zheng Yunping)
鲁梦如 (Lu Mengru)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202010235520.6A priority Critical patent/CN111462181B/en
Publication of CN111462181A publication Critical patent/CN111462181A/en
Application granted granted Critical
Publication of CN111462181B publication Critical patent/CN111462181B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155Bayesian classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/446Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering using Haar-like filters, e.g. using integral image techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video single-target tracking method based on a rectangular asymmetric inverse layout model, which comprises the following steps: 1) initialize the first frame, and divide the target image into homogeneous blocks using the rectangular asymmetric inverse layout model (RNAM) and the extended Gouraud shading method to establish a target segmentation template; 2) generate a positive sample set and a negative sample set; 3) extract Haar-like features of the positive and negative sample sets according to the target segmentation template and update the parameters of a naive Bayes classifier; 4) in the current frame, select a group of candidate sample bounding boxes around the target position predicted in the previous frame, and extract Haar-like features according to the target segmentation template; 5) compute the classifier responses of all candidate sample bounding boxes with the naive Bayes classifier, and select the bounding box with the largest response as the predicted target position of the current frame. The method can effectively describe the target image with a small number of blocks, can resist interference such as illumination change and occlusion, and saves computation time.

Description

Video single-target tracking method based on rectangular asymmetric inverse layout model
Technical Field
The invention relates to the technical field of computer vision, in particular to a video single-target tracking method based on a rectangular asymmetric inverse layout model.
Background
Computer vision is a comprehensive discipline whose main concern is how to make a machine acquire information the way a human does: a camera takes the place of the human visual organ to capture images or video, and a computer takes the place of the human brain to process that information.
Target tracking is one of the main research fields of computer vision, and is also the basic core technology of computer vision, and many subsequent processes in the field of computer vision need to be based on target tracking, such as target recognition, target motion analysis and other high-level processes. Target tracking technology has been applied in numerous fields, such as intelligent monitoring systems, human-computer interaction, visual navigation, robotic services, and the like.
A video is a continuous image sequence; each frame is an image, generally a natural image with a complex background. Video target tracking must follow the target in every frame: by analyzing and computing over the video sequence, the position, scale and other information of the target are obtained, so that the target is tracked through the continuous sequence.
The invention considers the general case of video target tracking, namely video single-target tracking: tracking a single object in a video sequence given the target position and scale information of the first frame. Generally, a target tracking algorithm consists of three parts: a target description model, prediction of the target position, and model updating, among which an effective target description model is crucial to the success of the tracking algorithm. With the improvement in the performance of computers and cameras and the reduction of their cost, the demand for automatic video analysis keeps increasing, and video target tracking algorithms have become one of today's research hotspots.
According to the basic idea of their design, target tracking algorithms fall into two classes: generative and discriminative. Generative algorithms build a target model by online learning and search for the image region with the minimum reconstruction error to locate the target; that is, the target region is modeled in the current frame, and in the next frame the region most similar to the model is searched for as the predicted target position. Well-known examples include the Kalman Filter, the Particle Filter and the Mean-Shift algorithm: the Kalman Filter models the features of the target and estimates the position of the target in the next frame; the Particle Filter, based on particle-distribution statistics, computes the probability density function by searching over a group of random samples spread through the state space, replacing the integral operation with a sample mean to obtain a minimum-variance estimate of the system state; Mean-Shift is based on the probability density distribution and converges on a local peak of that distribution. Discriminative algorithms, by contrast, treat tracking as a classification problem: features are extracted from the target and from the background, and a classifier is trained online to distinguish the target from the background, a classical example being the compressive tracking (CT) algorithm.
However, owing to challenges such as illumination variation, scale variation, occlusion interference and motion blur of the target image, designing an effective target tracking model remains very difficult. Although many target tracking algorithms have shown their advantages, some problems remain to be solved. For example, discriminative algorithms mostly use complex classifier models, which increases computational complexity and costs real-time performance; and some algorithms, because they use global and fixed update strategies, may learn erroneous samples and therefore suffer from tracking drift.
Disclosure of Invention
The invention aims to overcome the defects of the existing video single-target tracking technology and provides a video single-target tracking method based on a rectangular asymmetric inverse layout model. The method can effectively describe the target image with fewer blocks, can resist interference such as illumination change and occlusion, saves computation time, and greatly improves the real-time performance of the video tracking algorithm. In addition, the invention uses a naive Bayes classifier to learn the positive and negative samples, with high classification accuracy and high classification speed.
In order to achieve the purpose, the technical scheme provided by the invention is as follows: a video single-target tracking method based on a rectangular asymmetric inverse layout model comprises the following steps:
1) initializing a first frame, and dividing the target image into homogeneous blocks by using the rectangular asymmetric inverse layout model (RNAM) and the extended Gouraud shading method to establish a target segmentation template;
2) generation of positive and negative sample sets
Randomly selecting a group of bounding boxes as a positive sample set in the area of the current frame close to the target position, and randomly selecting a group of bounding boxes as a negative sample set in the area of the current frame far from the target position;
3) updating classifier parameters
Extracting Haar-like features of the positive and negative sample sets according to the target segmentation template and updating the parameters of a naive Bayes classifier;
4) candidate sample bounding box selection
Selecting a group of candidate sample bounding boxes in the current frame around the target position predicted in the previous frame, and extracting Haar-like features according to the target segmentation template;
5) prediction of target location
The classifier responses of all candidate sample bounding boxes are calculated by using a naive Bayes classifier, and the sample bounding box with the largest classifier response is selected as the predicted target position of the current frame.
In step 1), the rectangular asymmetric inverse layout model (RNAM) is specifically as follows:
the non-symmetric inverse layout model (NAM) is a general pattern representation model, and the rectangular NAM (RNAM) is a NAM representation model based on rectangular sub-patterns; the following is an abstract description of the NAM:
Assuming the original pattern is Γ, the recovered distortion-free pattern is Γ′, and the distortion pattern is Γ″, the NAM is a lossless transform from Γ to Γ′, or a lossy transform from Γ to Γ″:
Γ′ = T(Γ), Γ″ ≈ T(Γ)
where T(·) is a forward transform function, also called the encoding function, and the forward encoding process is:
Γ′ = T(Γ) = (∪_{j=1..n} ∪_{i=1..m_j} p_j(a_i, v)) ∪ ε(d)
where Γ′ is the synthesized pattern after encoding; P = {p_1, p_2, ..., p_n} is a predefined set of sub-patterns, n is the number of sub-pattern types, p_j ∈ P is the jth sub-pattern in P, 1 ≤ j ≤ n, v is the value of p_j, and A is the parameter set of sub-pattern p_j, or a sub-pattern set selected through intelligent analysis; a_i is the ith parameter set of sub-pattern p_j, 1 ≤ i ≤ m_j, where m_j is the number of sub-patterns of type p_j; ε(d) is the debris pattern, i.e. the residue left in the container, and d is the threshold of the debris spatial scale. The distortion pattern of the NAM is:
Γ″ ≈ T(Γ) = ∪_{j=1..n} ∪_{i=1..m_j} p_j(a_i, v)
Obviously, Γ′ = Γ″ + ε(d).
The basic idea of the RNAM is: given a pattern to be laid out and predefined rectangular sub-patterns of different shapes, extract the sub-patterns from the given pattern and represent the given pattern by the combination of these sub-patterns.
The extended Gouraud shading method is specifically as follows:
Given an error tolerance ε, if all pixel values g(x, y) within a rectangular block B satisfy |g(x, y) − g_est(x, y)| ≤ ε, x_1 ≤ x ≤ x_2, y_1 ≤ y ≤ y_2, then the rectangular block is called a homogeneous block, where x_1 and y_1 are respectively the abscissa and ordinate of the upper-left vertex of the rectangular block B, x_2 and y_2 are respectively the abscissa and ordinate of the lower-right vertex of the rectangular block B, and g_est(x, y) is the approximate gray value at a coordinate (x, y) in B, calculated as follows:
g_est(x, y) = g_5 + (g_6 − g_5) × i_1
where g_1, g_2, g_3 and g_4 are the gray values at the four corners of the rectangular block B, namely the upper-left, upper-right, lower-left and lower-right corners, g_5 = g_1 + (g_2 − g_1) × i_2, g_6 = g_3 + (g_4 − g_3) × i_2, i_1 = (y − y_1)/(y_2 − y_1), and i_2 = (x − x_1)/(x_2 − x_1).
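The homogeneity test above is a bilinear (Gouraud) interpolation check over the block's four corners. A minimal sketch in Python follows, assuming a NumPy grayscale block; the function names g_est and is_homogeneous are illustrative assumptions, not the patent's reference implementation.

```python
import numpy as np

def g_est(block: np.ndarray, x: int, y: int) -> float:
    """Bilinear estimate of the gray value at block-local (x, y) from the
    four corner values g1..g4 (upper-left, upper-right, lower-left, lower-right)."""
    h, w = block.shape
    g1, g2 = float(block[0, 0]), float(block[0, w - 1])
    g3, g4 = float(block[h - 1, 0]), float(block[h - 1, w - 1])
    i1 = y / (h - 1) if h > 1 else 0.0   # vertical interpolation weight
    i2 = x / (w - 1) if w > 1 else 0.0   # horizontal interpolation weight
    g5 = g1 + (g2 - g1) * i2             # interpolated value on the top edge
    g6 = g3 + (g4 - g3) * i2             # interpolated value on the bottom edge
    return g5 + (g6 - g5) * i1

def is_homogeneous(block: np.ndarray, eps: float) -> bool:
    """A rectangular block is homogeneous if every pixel deviates from its
    bilinear estimate by at most the error tolerance eps."""
    h, w = block.shape
    for y in range(h):
        for x in range(w):
            if abs(float(block[y, x]) - g_est(block, x, y)) > eps:
                return False
    return True
```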
Suppose {I_t | t = 1, 2, ..., N} is a given video sequence, where I_t is the video frame corresponding to time t and N is the number of frames of the video. For the first frame I_1, the target position is known as L_1 = (x_1, y_1, w_1, h_1), where x_1 and y_1 respectively represent the abscissa and ordinate of the upper-left vertex of the target image, and w_1 and h_1 respectively represent the width and height of the target image. The target image is then converted into a grayscale image. Assuming W and H are respectively the width and height of the target image and an error tolerance ε is given, the target image is divided into homogeneous blocks by using the RNAM and the extended Gouraud shading method to establish the target segmentation template, specifically:
a. defining a marking matrix R of size W × H and initializing all its elements to zero;
b. starting from the first element of the target image, first determining the starting point (x_0, y_0) of an unmarked rectangular sub-pattern in raster-scan order, and tracking the corresponding rectangular sub-pattern according to the matching algorithm of the rectangular sub-pattern, i.e. the inverse layout algorithm;
c. according to the efficiency measure of the rectangular sub-pattern, i.e. its area, together with the extended Gouraud shading method under the error tolerance ε, determining the rectangular sub-pattern with the largest area, i.e. a homogeneous block, and marking this largest sub-pattern in the matrix R so as to search for the next starting point;
d. storing the position information of the homogeneous block found in step c, including the coordinates of its upper-left vertex and its width and height, into the homogeneous-block container F;
e. repeating steps b to d until the target image contains no unmarked blocks.
The homogeneous-block container F is denoted F = {b_1, b_2, ..., b_n} and represents the target segmentation template, where b_i = (x_i, y_i, w_i, h_i) denotes the position information of the ith homogeneous block and n denotes the number of homogeneous blocks.
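A condensed sketch of steps a to e follows, reusing the is_homogeneous test from the previous sketch. The greedy grow-right-then-down rule used here to find the largest homogeneous rectangle is one plausible reading of the inverse-layout matching step, not the patent's reference implementation.

```python
import numpy as np

def segment_rnam(gray: np.ndarray, eps: float):
    """Split a grayscale target image into homogeneous blocks.
    Returns the homogeneous-block container F as a list of (x, y, w, h)."""
    H, W = gray.shape
    marked = np.zeros((H, W), dtype=bool)   # marking matrix R
    F = []
    for y0 in range(H):
        for x0 in range(W):
            if marked[y0, x0]:
                continue
            # b. unmarked starting point found in raster-scan order
            w = 1
            while (x0 + w < W and not marked[y0, x0 + w]
                   and is_homogeneous(gray[y0:y0 + 1, x0:x0 + w + 1], eps)):
                w += 1
            h = 1
            while (y0 + h < H and not marked[y0 + h, x0:x0 + w].any()
                   and is_homogeneous(gray[y0:y0 + h + 1, x0:x0 + w], eps)):
                h += 1
            # c./d. mark the block in R and store its position information in F
            marked[y0:y0 + h, x0:x0 + w] = True
            F.append((x0, y0, w, h))
    return F
```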
In step 2), the positive and negative sample sets are generated specifically as follows:
Suppose {I_t | t = 1, 2, ..., N} is a given video sequence, where I_t is the video frame corresponding to time t and N is the number of frames of the video, and let l_t denote the center position of the target box of frame t. In frame t, a group of bounding boxes is randomly selected as the positive sample set in the region close to the target position, D_α = {z | ||l(z) − l_t|| < α}, and a group of bounding boxes is randomly selected as the negative sample set in the region farther from the target position, D_{ζ,β} = {z | ζ < ||l(z) − l_t|| < β}, where z denotes a selected sample box, l(z) denotes the center position of the sample box, and α, ζ and β are radii centered at l_t.
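A hedged sketch of this ring sampling follows; sample_boxes and its polar-coordinate drawing scheme are illustrative assumptions (any sampler whose box centers satisfy the radius constraints would do, and production code would clip boxes to the frame).

```python
import numpy as np

def sample_boxes(center, box_wh, r_min, r_max, count, rng=None):
    """Randomly draw `count` bounding boxes (x, y, w, h) whose centers l(z)
    satisfy r_min <= ||l(z) - center|| < r_max."""
    rng = rng or np.random.default_rng()
    cx, cy = center
    w, h = box_wh
    boxes = []
    while len(boxes) < count:
        r = rng.uniform(r_min, r_max)            # radial distance from center
        theta = rng.uniform(0.0, 2.0 * np.pi)    # direction
        zx, zy = cx + r * np.cos(theta), cy + r * np.sin(theta)
        boxes.append((int(zx - w / 2), int(zy - h / 2), w, h))
    return boxes

# Positives: centers within alpha of the target; negatives: in the ring (zeta, beta).
# pos = sample_boxes(l_t, (w, h), 0, alpha, 45)
# neg = sample_boxes(l_t, (w, h), zeta, beta, 50)
```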
In step 3), the Haar-like features of the positive and negative sample sets are extracted according to the target segmentation template specifically as follows:
Suppose the ith homogeneous block in the target segmentation template is denoted b_i = (x_i, y_i, w_i, h_i), where x_i and y_i respectively represent the abscissa and ordinate of the upper-left vertex of the ith homogeneous block, and w_i and h_i respectively represent its width and height. Let Int(x, y) represent the element value at position (x, y) in the integral image of the video frame at a certain time; the Haar-like feature of the ith homogeneous block is extracted according to the target segmentation template as:
Haar(i) = Int(x_i + w_i, y_i + h_i) + Int(x_i, y_i) − Int(x_i + w_i, y_i) − Int(x_i, y_i + h_i)
The method for extracting the Haar-like features of the bounding boxes in the positive and negative sample sets is the same. Assuming the number of bounding boxes in the positive sample set is M and the number of homogeneous blocks in the target segmentation template is N, the specific steps are as follows:
a. establishing a sample-set feature matrix S of size M × N;
b. calculating the integral image of the current frame;
c. calculating the element values in the sample-set feature matrix S according to the Haar-like feature extraction method above.
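A sketch of steps a to c follows, using OpenCV's integral image. Placing each homogeneous block relative to the sample box's upper-left corner is an assumption about how the template is applied to each sample box; boxes are assumed to lie inside the frame.

```python
import cv2
import numpy as np

def haar_features(gray: np.ndarray, boxes, template):
    """Return the M x N feature matrix S: one row per sample box,
    one column per homogeneous block in the segmentation template."""
    ii = cv2.integral(gray)                  # (H+1) x (W+1) integral image
    S = np.zeros((len(boxes), len(template)))
    for m, (bx, by, _, _) in enumerate(boxes):
        for n, (x, y, w, h) in enumerate(template):
            xi, yi = bx + x, by + y          # block placed inside the sample box
            # Haar(i) = Int(x+w, y+h) + Int(x, y) - Int(x+w, y) - Int(x, y+h)
            S[m, n] = (ii[yi + h, xi + w] + ii[yi, xi]
                       - ii[yi, xi + w] - ii[yi + h, xi])
    return S
```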
The parameters of the naive Bayes classifier are updated as follows:
According to the feature matrices of the positive and negative sample sets, the mean and standard deviation of the Haar-like feature corresponding to each homogeneous block are calculated over the positive and negative sample sets. The naive Bayes classifier has two parameters: the first is the feature mean vector, denoted P_μ = {μ_1, μ_2, ..., μ_n}, and the second is the feature standard-deviation vector, denoted P_σ = {σ_1, σ_2, ..., σ_n}, where μ_i represents the mean of the ith Haar-like feature over the positive or negative sample set, σ_i represents the standard deviation of the ith Haar-like feature over the positive or negative sample set, and n represents the number of Haar-like features, i.e. the number of homogeneous blocks of the target segmentation template. The strategy for incrementally updating each scalar parameter in the two feature vectors is:
μ_i ← (1 − λ)μ + λμ_i
σ_i ← sqrt((1 − λ)σ² + λσ_i² + λ(1 − λ)(μ − μ_i)²)
where λ represents the learning rate, 0 < λ < 1, μ represents the mean of the ith feature over the positive or negative sample set before the classifier parameter update, and σ represents the standard deviation of the ith feature over the positive or negative sample set before the update.
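A sketch of this incremental update follows; the mixture-variance form of the σ update mirrors the rule given above, and λ = 0.85 follows the embodiment below. The function name and the column-per-feature matrix layout are assumptions.

```python
import numpy as np

def update_params(mu_old, sigma_old, S, lam=0.85):
    """Blend stored per-feature mean/std (mu_old, sigma_old) with the
    statistics of a new sample-set feature matrix S (one column per feature)."""
    mu_new = S.mean(axis=0)
    sigma_new = S.std(axis=0)
    mu = (1.0 - lam) * mu_old + lam * mu_new
    sigma = np.sqrt((1.0 - lam) * sigma_old**2 + lam * sigma_new**2
                    + lam * (1.0 - lam) * (mu_old - mu_new)**2)
    return mu, sigma
```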
In step 4), a group of candidate sample bounding boxes is selected in the current frame around the target position predicted in the previous frame, specifically as follows:
Suppose {I_t | t = 1, 2, ..., N} is a given video sequence, where I_t is the video frame corresponding to time t and N is the number of frames of the video, and let l_{t−1} denote the center position of the target box of frame t−1. In frame t, a group of candidate sample bounding boxes is randomly selected within the radius-γ region around the previous target position, D_γ = {z | ||l(z) − l_{t−1}|| < γ}, where z denotes a selected sample bounding box, l(z) denotes the center position of the sample box, and γ is a radius centered at l_{t−1}.
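The candidate set is the same kind of ring sample, centered on the previous prediction with inner radius 0 and outer radius γ; reusing the illustrative sample_boxes helper from the step-2 sketch (an assumption, not the patent's code):

```python
# Candidate boxes: centers within radius gamma of the previous prediction
# (the embodiment below uses gamma = 20 and 100 candidates).
candidates = sample_boxes(l_prev, (w, h), 0, gamma, 100)
```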
In step 5), the naive Bayes classifier is used to calculate the responses of all candidate sample bounding boxes, and the candidate sample bounding box with the maximum response value is selected as the predicted target position of the current frame, specifically as follows:
Denote the Haar-like features corresponding to the homogeneous blocks in the target segmentation template by V = (v_1, v_2, ..., v_n)^T, where v_i (i = 1, 2, ..., n) represents the Haar-like feature of the ith homogeneous block and n represents the number of homogeneous blocks in the target segmentation template. Assuming all elements in V are independently distributed, a naive Bayes classifier is constructed as follows:
H(V) = Σ_{i=1..n} log( (p(v_i | y = 1) p(y = 1)) / (p(v_i | y = 0) p(y = 0)) )
where p(y = 1) and p(y = 0) denote the probabilities that the sample is positive or negative respectively, y ∈ {0, 1} denotes the sample label, a uniform prior p(y = 1) = p(y = 0) is assumed, and p(v_i | y = 1) and p(v_i | y = 0) denote the conditional distribution probabilities of feature v_i given a positive or a negative sample label. Since the homogeneous-block feature vectors based on the RNAM partition always follow a Gaussian distribution, the conditional distributions p(v_i | y = 1) and p(v_i | y = 0) in the classifier H(V) can be assumed to be Gaussian:
p(v_i | y = 1) ~ N(μ_i^1, σ_i^1), p(v_i | y = 0) ~ N(μ_i^0, σ_i^0)
where μ_i^1 and σ_i^1 respectively represent the mean and standard deviation of the ith feature over the positive sample set, and μ_i^0 and σ_i^0 respectively represent the mean and standard deviation over the negative sample set.
Finally, the classifier responses corresponding to all candidate sample bounding boxes are calculated through the constructed classifier H(V), and the sample bounding box with the maximum classifier response value is selected as the predicted target position of the current frame.
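A sketch of the response computation under the uniform prior follows; the vectorized scoring over a candidate feature matrix and the small floor on σ are implementation assumptions for numerical stability.

```python
import numpy as np

def log_gaussian(v, mu, sigma):
    """Log-density of N(mu, sigma) evaluated elementwise."""
    sigma = np.maximum(sigma, 1e-6)          # avoid division by zero
    return -0.5 * ((v - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

def classifier_response(V, mu_pos, sig_pos, mu_neg, sig_neg):
    """H(v) = sum_i [log p(v_i|y=1) - log p(v_i|y=0)] for each candidate row
    of V, assuming the uniform prior p(y=1) = p(y=0)."""
    return (log_gaussian(V, mu_pos, sig_pos)
            - log_gaussian(V, mu_neg, sig_neg)).sum(axis=1)

# best_box = candidates[np.argmax(classifier_response(V, mu1, s1, mu0, s0))]
```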
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention uses the rectangular asymmetric inverse layout model to block the target image and establish the target segmentation template, which can effectively highlight the structural characteristics of the target image with a small number of blocks, can resist interference such as illumination change and target occlusion, reduces computational complexity, and meets real-time requirements.
2. The invention uses Haar-like features to extract image features from the target image, which describe the tracked target well and can be computed quickly, greatly enhancing the robustness and real-time performance of the tracker.
3. The invention uses a simple naive Bayes classifier to learn the positive and negative samples and selects the candidate sample bounding box with the maximum classifier response as the predicted target position. The classifier has strong classification performance, behaves well in the face of background clutter, fast motion, motion blur and other complex application environments, involves less computation than common classifiers, and better meets real-time requirements.
Drawings
FIG. 1 is a logic flow diagram of the present invention.
FIG. 2 is a schematic diagram of classifier updating and predicting target locations.
FIG. 3 is a flow chart of target image segmentation.
Detailed Description
The present invention will be further described with reference to the following specific examples.
As shown in FIG. 1 and FIG. 2, the method for tracking a single video target based on a rectangular asymmetric inverse layout model provided in this embodiment includes the following steps:
1) Initialize the first frame, and divide the target image into homogeneous blocks by using the rectangular asymmetric inverse layout model (RNAM) and the extended Gouraud shading method to establish a target segmentation template. Specifically, the program first reads in the first frame of the video sequence and the position and scale information of the target image (represented using the Rect data structure in OpenCV). The target image is then converted into a grayscale image. Let W and H be respectively the width and height of the target image, and set the given error tolerance ε to 20; the target image is then divided into homogeneous blocks by using the RNAM and the extended Gouraud shading method to establish the target segmentation template. The flow chart is shown in FIG. 3, and the specific steps are as follows:
a. A marking matrix R of size W × H is defined and all its elements are initialized to zero.
b. Starting from the first element of the target image, the starting point (x_0, y_0) of an unmarked rectangular sub-pattern is first determined in raster-scan order, and the corresponding rectangular sub-pattern is tracked according to the matching (inverse layout) algorithm of the rectangular sub-pattern.
c. According to the efficiency measure of the rectangular sub-pattern (i.e. its area) and the extended Gouraud shading method (with the error tolerance ε equal to 20), the rectangular sub-pattern with the largest area (i.e. the homogeneous block) is determined, and this largest sub-pattern is marked in the matrix R so that the next starting point can be searched.
d. The position information of the homogeneous block found in step c, including the coordinates of its upper-left vertex and its width and height, is stored into the homogeneous-block container F.
e. Steps b to d are repeated until the target image contains no unmarked blocks.
The homogeneous-block container F obtained by the above procedure is denoted F = {b_1, b_2, ..., b_n} and represents the target segmentation template, where b_i = (x_i, y_i, w_i, h_i) is the position information of the ith homogeneous block (which may be represented with the Rect data structure in OpenCV) and n is the number of homogeneous blocks.
2) The positive and negative sample sets are generated: a group of bounding boxes is randomly selected as the positive sample set in the region of the current frame close to the target position, and a group of bounding boxes is randomly selected as the negative sample set in the region farther from the target position. The selection of the positive and negative sample sets is shown in FIG. 2. Let l_t denote the center position of the target box of frame t. In frame t, 45 bounding boxes are randomly selected as the positive sample set in the region close to the target position, D_α = {z | ||l(z) − l_t|| < α} (where α takes 4), and 50 bounding boxes are randomly selected as the negative sample set in the region farther from the target position, D_{ζ,β} = {z | ζ < ||l(z) − l_t|| < β} (where ζ takes 8 and β takes 30), where z denotes a selected sample box, l(z) denotes the center position of the sample box, and α, ζ and β are radii centered at l_t.
3) The classifier parameters are updated: the Haar-like features of the positive and negative sample sets are extracted according to the target segmentation template, and the parameters of the naive Bayes classifier are updated. Specifically, the Haar-like features of the positive and negative sample sets are first extracted according to the positive and negative sample sets selected in the second step and the target segmentation template (i.e. the homogeneous-block container F) obtained in the first step. The feature extraction method is described as follows:
Suppose the ith homogeneous block in the target segmentation template is denoted b_i = (x_i, y_i, w_i, h_i), where x_i and y_i respectively represent the abscissa and ordinate of the upper-left vertex of the ith homogeneous block, and w_i and h_i respectively represent its width and height. Let Int(x, y) denote the element value at position (x, y) in the integral image of the video frame at a certain time; the Haar-like feature of the ith homogeneous block is extracted according to the target segmentation template as:
Haar(i) = Int(x_i + w_i, y_i + h_i) + Int(x_i, y_i) − Int(x_i + w_i, y_i) − Int(x_i, y_i + h_i)
When the Haar-like features of a sample box in a sample set are extracted, each homogeneous block corresponds to one feature, and the extraction method is the same for the bounding boxes in the positive and negative sample sets. Taking the feature extraction of the positive sample set as an example, assume the number of bounding boxes in the positive sample set is M (i.e. M = 45) and the number of homogeneous blocks in the target segmentation template is N. The specific steps are as follows:
a. a sample set feature matrix S of size M × N is established.
b. An integral image of the current frame is calculated.
c. The element values in the sample set feature matrix S are calculated according to the Haar-like feature extraction method above.
After the features of the positive and negative samples are extracted, the results obtained are the feature matrix S⁺ of the positive sample set and the feature matrix S⁻ of the negative sample set. According to the feature matrices of the positive and negative sample sets, the mean and standard deviation of the Haar-like feature corresponding to each homogeneous block are calculated over the positive and negative sample sets. The naive Bayes classifier has two parameters: the first is the feature mean vector, denoted P_μ = {μ_1, μ_2, ..., μ_n}, and the second is the feature standard-deviation vector, denoted P_σ = {σ_1, σ_2, ..., σ_n}, where μ_i represents the mean of the ith Haar-like feature over the positive or negative sample set, σ_i represents the standard deviation of the ith Haar-like feature over the positive or negative sample set, and n represents the number of Haar-like features, i.e. the number of homogeneous blocks of the target segmentation template. The strategy for incrementally updating each scalar parameter in the two feature vectors is:
μ_i ← (1 − λ)μ + λμ_i
σ_i ← sqrt((1 − λ)σ² + λσ_i² + λ(1 − λ)(μ − μ_i)²)
where λ represents the learning rate (0 < λ < 1); λ is 0.85 in this embodiment. μ represents the mean of the ith feature over the positive or negative sample set before the classifier parameter update, and σ represents the standard deviation of the ith feature over the positive or negative sample set before the update.
4) A group of candidate sample bounding boxes is selected in the current frame around the target position predicted in the previous frame, and their Haar-like features are extracted according to the target segmentation template. Specifically, suppose {I_t | t = 1, 2, ..., N} is a given video sequence, where I_t is the video frame corresponding to time t and N is the number of frames of the video, and let l_{t−1} denote the center position of the target box of frame t−1. In frame t, 100 candidate sample bounding boxes are randomly selected within the radius-γ region around the previous target position, D_γ = {z | ||l(z) − l_{t−1}|| < γ}, where γ takes 20, z denotes a selected sample bounding box, l(z) denotes the center position of the sample box, and γ is a radius centered at l_{t−1}.
The Haar-like features of the candidate sample bounding boxes are extracted in the same way as those of the positive and negative sample sets in step 3), finally yielding the feature matrix of the candidate sample bounding boxes.
5) The target position is predicted: the classifier responses of all candidate sample bounding boxes are calculated with the naive Bayes classifier, and the sample bounding box with the maximum classifier response is selected as the predicted target position of the current frame. Denote the Haar-like features corresponding to the homogeneous blocks in the target segmentation template by V = (v_1, v_2, ..., v_n)^T, where v_i (i = 1, 2, ..., n) represents the Haar-like feature of the ith homogeneous block and n represents the number of homogeneous blocks in the target segmentation template. Assuming all elements in V are independently distributed, a naive Bayes classifier can be constructed as follows:
H(V) = Σ_{i=1..n} log( (p(v_i | y = 1) p(y = 1)) / (p(v_i | y = 0) p(y = 0)) )
where p(y = 1) and p(y = 0) denote the probabilities that the sample is positive or negative respectively, y ∈ {0, 1} denotes the sample label, a uniform prior p(y = 1) = p(y = 0) is assumed, and p(v_i | y = 1) and p(v_i | y = 0) denote the conditional distribution probabilities of feature v_i given a positive or a negative sample label. Since the homogeneous-block feature vectors based on the RNAM partition always follow a Gaussian distribution, the conditional distributions p(v_i | y = 1) and p(v_i | y = 0) in the classifier H(V) can be assumed to be Gaussian:
p(v_i | y = 1) ~ N(μ_i^1, σ_i^1), p(v_i | y = 0) ~ N(μ_i^0, σ_i^0)
where μ_i^1 and σ_i^1 respectively represent the mean and standard deviation of the ith feature over the positive sample set, and μ_i^0 and σ_i^0 respectively represent the mean and standard deviation over the negative sample set.
Finally, the classifier responses corresponding to all candidate sample bounding boxes are calculated through the constructed classifier H(V), and the sample bounding box with the maximum classifier response value is selected as the predicted target position of the current frame.
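For orientation, a compact sketch of the whole per-frame loop follows, chaining the illustrative helpers from the earlier sketches (segment_rnam, sample_boxes, haar_features, update_params, classifier_response) with this embodiment's parameters; it is one reading of the method under those assumptions, not the patent's reference code.

```python
import numpy as np

def track(frames, init_box, eps=20.0):
    """Run the tracker over grayscale frames; init_box = (x, y, w, h)."""
    x, y, w, h = init_box
    template = segment_rnam(frames[0][y:y + h, x:x + w], eps)   # step 1
    mu1 = s1 = mu0 = s0 = None
    box = init_box
    for t, frame in enumerate(frames):
        l_t = (box[0] + w / 2.0, box[1] + h / 2.0)
        if t > 0:
            # steps 4-5: score 100 candidates within radius gamma = 20
            cand = sample_boxes(l_t, (w, h), 0, 20, 100)
            V = haar_features(frame, cand, template)
            box = cand[int(np.argmax(
                classifier_response(V, mu1, s1, mu0, s0)))]
            l_t = (box[0] + w / 2.0, box[1] + h / 2.0)
        # step 2: 45 positives within alpha = 4, 50 negatives in ring (8, 30)
        Sp = haar_features(frame, sample_boxes(l_t, (w, h), 0, 4, 45), template)
        Sn = haar_features(frame, sample_boxes(l_t, (w, h), 8, 30, 50), template)
        if mu1 is None:                                  # first-frame initialization
            mu1, s1 = Sp.mean(axis=0), Sp.std(axis=0)
            mu0, s0 = Sn.mean(axis=0), Sn.std(axis=0)
        else:                                            # step 3: lambda = 0.85
            mu1, s1 = update_params(mu1, s1, Sp, lam=0.85)
            mu0, s0 = update_params(mu0, s0, Sn, lam=0.85)
    return box
```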
The above-described embodiments are merely preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto; changes made according to the shape and principle of the present invention shall be covered within the protection scope of the present invention.

Claims (6)

1. A video single-target tracking method based on a rectangular asymmetric inverse layout model is characterized by comprising the following steps:
1) initializing a first frame, and dividing the target image into homogeneous blocks by using the rectangular asymmetric inverse layout model (RNAM) and the extended Gouraud shading method to establish a target segmentation template;
2) generation of positive and negative sample sets
Randomly selecting a group of bounding boxes as a positive sample set in the area of the current frame close to the target position, and randomly selecting a group of bounding boxes as a negative sample set in the area of the current frame far from the target position;
3) updating classifier parameters
Extracting Haar-like features of the positive and negative sample sets according to the target segmentation template and updating the parameters of a naive Bayes classifier;
4) candidate sample bounding box selection
Selecting a group of candidate sample bounding boxes in the current frame around the target position predicted in the previous frame, and extracting Haar-like features according to the target segmentation template;
5) prediction of target location
The classifier responses of all candidate sample bounding boxes are calculated by using a naive Bayes classifier, and the sample bounding box with the largest classifier response is selected as the predicted target position of the current frame.
2. The method for tracking a single video target based on the rectangular asymmetric inverse layout model according to claim 1, wherein in step 1), the rectangular asymmetric inverse layout model (RNAM) is specifically as follows:
the non-symmetric inverse layout model (NAM) is a general pattern representation model, and the rectangular NAM (RNAM) is a NAM representation model based on rectangular sub-patterns; the following is an abstract description of the NAM:
assuming the original pattern is Γ, the recovered distortion-free pattern is Γ′, and the distortion pattern is Γ″, the NAM is a lossless transform from Γ to Γ′, or a lossy transform from Γ to Γ″:
Γ′ = T(Γ), Γ″ ≈ T(Γ)
where T(·) is a forward transform function, also called the encoding function, and the forward encoding process is:
Γ′ = T(Γ) = (∪_{j=1..n} ∪_{i=1..m_j} p_j(a_i, v)) ∪ ε(d)
where Γ′ is the synthesized pattern after encoding; P = {p_1, p_2, ..., p_n} is a predefined set of sub-patterns, n is the number of sub-pattern types, p_j ∈ P is the jth sub-pattern in P, 1 ≤ j ≤ n, v is the value of p_j, and A is the parameter set of sub-pattern p_j, or a sub-pattern set selected through intelligent analysis; a_i is the ith parameter set of sub-pattern p_j, 1 ≤ i ≤ m_j, where m_j is the number of sub-patterns of type p_j; ε(d) is the debris pattern, i.e. the residue left in the container, and d is the threshold of the debris spatial scale; the distortion pattern of the NAM is:
Γ″ ≈ T(Γ) = ∪_{j=1..n} ∪_{i=1..m_j} p_j(a_i, v)
and obviously Γ′ = Γ″ + ε(d);
the basic idea of the RNAM is: given a pattern to be laid out and predefined rectangular sub-patterns of different shapes, extract the sub-patterns from the given pattern and represent the given pattern by the combination of these sub-patterns;
the extended Gouraud shading method is specifically as follows:
given an error tolerance ε, if all pixel values g(x, y) within a rectangular block B satisfy |g(x, y) − g_est(x, y)| ≤ ε, x_1 ≤ x ≤ x_2, y_1 ≤ y ≤ y_2, then the rectangular block is called a homogeneous block, where x_1 and y_1 are respectively the abscissa and ordinate of the upper-left vertex of the rectangular block B, x_2 and y_2 are respectively the abscissa and ordinate of the lower-right vertex of the rectangular block B, and g_est(x, y) is the approximate gray value at a coordinate (x, y) in B, calculated as follows:
g_est(x, y) = g_5 + (g_6 − g_5) × i_1
where g_1, g_2, g_3 and g_4 are the gray values at the four corners of the rectangular block B, namely the upper-left, upper-right, lower-left and lower-right corners, g_5 = g_1 + (g_2 − g_1) × i_2, g_6 = g_3 + (g_4 − g_3) × i_2, i_1 = (y − y_1)/(y_2 − y_1), and i_2 = (x − x_1)/(x_2 − x_1);
suppose {I_t | t = 1, 2, ..., N} is a given video sequence, where I_t is the video frame corresponding to time t and N is the number of frames of the video; for the first frame I_1, the target position is known as L_1 = (x_1, y_1, w_1, h_1), where x_1 and y_1 respectively represent the abscissa and ordinate of the upper-left vertex of the target image, and w_1 and h_1 respectively represent the width and height of the target image; the target image is then converted into a grayscale image; assuming W and H are respectively the width and height of the target image and an error tolerance ε is given, the target image is divided into homogeneous blocks by using the RNAM and the extended Gouraud shading method to establish the target segmentation template, specifically:
a. defining a marking matrix R of size W × H and initializing all its elements to zero;
b. starting from the first element of the target image, first determining the starting point (x_0, y_0) of an unmarked rectangular sub-pattern in raster-scan order, and tracking the corresponding rectangular sub-pattern according to the matching algorithm of the rectangular sub-pattern, i.e. the inverse layout algorithm;
c. according to the efficiency measure of the rectangular sub-pattern, i.e. its area, together with the extended Gouraud shading method under the error tolerance ε, determining the rectangular sub-pattern with the largest area, i.e. a homogeneous block, and marking this largest sub-pattern in the matrix R so as to search for the next starting point;
d. storing the position information of the homogeneous block found in step c, including the coordinates of its upper-left vertex and its width and height, into the homogeneous-block container F;
e. repeating steps b to d until the target image contains no unmarked blocks;
the homogeneous-block container F is denoted F = {b_1, b_2, ..., b_n} and represents the target segmentation template, where b_i = (x_i, y_i, w_i, h_i) denotes the position information of the ith homogeneous block and n denotes the number of homogeneous blocks.
3. The method for tracking the single video target based on the rectangular asymmetric inverse layout model according to claim 1, wherein: in step 2), the generation of the positive and negative sample sets specifically includes:
suppose {I_t | t = 1, 2, ..., N} is a given video sequence, where I_t is the video frame corresponding to time t and N is the number of frames of the video, and let l_t denote the center position of the target box of frame t; in frame t, a group of bounding boxes is randomly selected as the positive sample set in the region close to the target position, D_α = {z | ||l(z) − l_t|| < α}, and a group of bounding boxes is randomly selected as the negative sample set in the region farther from the target position, D_{ζ,β} = {z | ζ < ||l(z) − l_t|| < β}, where z denotes a selected sample box, l(z) denotes the center position of the sample box, and α, ζ and β are radii centered at l_t.
4. The method for tracking the single video target based on the rectangular asymmetric inverse layout model according to claim 1, wherein: in step 3), extracting Haar-like features of the positive and negative sample sets according to the target segmentation template specifically includes:
suppose the ith homogeneous block in the target segmentation template is denoted b_i = (x_i, y_i, w_i, h_i), where x_i and y_i respectively represent the abscissa and ordinate of the upper-left vertex of the ith homogeneous block, and w_i and h_i respectively represent its width and height; let Int(x, y) represent the element value at position (x, y) in the integral image of the video frame at a certain time; the Haar-like feature of the ith homogeneous block is extracted according to the target segmentation template as:
Haar(i) = Int(x_i + w_i, y_i + h_i) + Int(x_i, y_i) − Int(x_i + w_i, y_i) − Int(x_i, y_i + h_i)
the method for extracting the Haar-like features of the bounding boxes in the positive and negative sample sets is the same; assuming the number of bounding boxes in the positive sample set is M and the number of homogeneous blocks in the target segmentation template is N, the specific steps are as follows:
a. establishing a sample set characteristic matrix S with the size of M × N;
b. calculating an integral image of the current frame;
c. calculating element values in a sample set characteristic matrix S according to a Haar-like characteristic extraction method;
the parameters of the naive Bayes classifier are updated as follows:
according to the feature matrices of the positive and negative sample sets, the mean and standard deviation of the Haar-like feature corresponding to each homogeneous block are calculated over the positive and negative sample sets; the naive Bayes classifier has two parameters: the first is the feature mean vector, denoted P_μ = {μ_1, μ_2, ..., μ_n}, and the second is the feature standard-deviation vector, denoted P_σ = {σ_1, σ_2, ..., σ_n}, where μ_i represents the mean of the ith Haar-like feature over the positive or negative sample set, σ_i represents the standard deviation of the ith Haar-like feature over the positive or negative sample set, and n represents the number of Haar-like features, i.e. the number of homogeneous blocks of the target segmentation template; the strategy for incrementally updating each scalar parameter in the two feature vectors is:
μ_i ← (1 − λ)μ + λμ_i
σ_i ← sqrt((1 − λ)σ² + λσ_i² + λ(1 − λ)(μ − μ_i)²)
where λ represents the learning rate, 0 < λ < 1, μ represents the mean of the ith feature over the positive or negative sample set before the classifier parameter update, and σ represents the standard deviation of the ith feature over the positive or negative sample set before the update.
5. The method for tracking the single video target based on the rectangular asymmetric inverse layout model according to claim 1, wherein: in step 4), a group of candidate sample bounding boxes is selected around the same position of the current frame according to the target position predicted by the previous frame, which is as follows:
suppose {I_t | t = 1, 2, ..., N} is a given video sequence, where I_t is the video frame corresponding to time t and N is the number of frames of the video, and let l_{t−1} denote the center position of the target box of frame t−1; in frame t, a group of candidate sample bounding boxes is randomly selected within the radius-γ region around the previous target position, D_γ = {z | ||l(z) − l_{t−1}|| < γ}, where z denotes a selected sample bounding box, l(z) denotes the center position of the sample box, and γ is a radius centered at l_{t−1}.
6. The method for tracking the single video target based on the rectangular asymmetric inverse layout model according to claim 1, wherein: in step 5), the naive bayes classifier is used to calculate the responses of all candidate sample bounding boxes, and the candidate sample bounding box with the maximum response value is selected as the predicted target position of the current frame, which is specifically as follows:
denote the Haar-like features corresponding to the homogeneous blocks in the target segmentation template by V = (v_1, v_2, ..., v_n)^T, where v_i represents the Haar-like feature of the ith homogeneous block, i = 1, 2, ..., n, and n represents the number of homogeneous blocks in the target segmentation template; assuming all elements in V are independently distributed, a naive Bayes classifier is constructed as follows:
H(V) = Σ_{i=1..n} log( (p(v_i | y = 1) p(y = 1)) / (p(v_i | y = 0) p(y = 0)) )
where p(y = 1) and p(y = 0) denote the probabilities that the sample is positive or negative respectively, y ∈ {0, 1} denotes the sample label, a uniform prior p(y = 1) = p(y = 0) is assumed, and p(v_i | y = 1) and p(v_i | y = 0) denote the conditional distribution probabilities of feature v_i given a positive or a negative sample label; since the homogeneous-block feature vectors based on the RNAM partition always follow a Gaussian distribution, the conditional distributions p(v_i | y = 1) and p(v_i | y = 0) in the classifier H(V) can be assumed to be Gaussian:
p(v_i | y = 1) ~ N(μ_i^1, σ_i^1), p(v_i | y = 0) ~ N(μ_i^0, σ_i^0)
where μ_i^1 and σ_i^1 respectively represent the mean and standard deviation of the ith feature over the positive sample set, and μ_i^0 and σ_i^0 respectively represent the mean and standard deviation over the negative sample set;
finally, the classifier responses corresponding to all candidate sample bounding boxes are calculated through the constructed classifier H(V), and the sample bounding box with the maximum classifier response value is selected as the predicted target position of the current frame.
CN202010235520.6A 2020-03-30 2020-03-30 Video single-target tracking method based on rectangular asymmetric inverse layout model Active CN111462181B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010235520.6A CN111462181B (en) 2020-03-30 2020-03-30 Video single-target tracking method based on rectangular asymmetric inverse layout model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010235520.6A CN111462181B (en) 2020-03-30 2020-03-30 Video single-target tracking method based on rectangular asymmetric inverse layout model

Publications (2)

Publication Number Publication Date
CN111462181A (en) 2020-07-28
CN111462181B CN111462181B (en) 2023-06-20

Family

ID=71683336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010235520.6A Active CN111462181B (en) 2020-03-30 2020-03-30 Video single-target tracking method based on rectangular asymmetric inverse layout model

Country Status (1)

Country Link
CN (1) CN111462181B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106355570A (en) * 2016-10-21 2017-01-25 昆明理工大学 Binocular stereoscopic vision matching method combining depth characteristics
US20170286750A1 (en) * 2016-03-29 2017-10-05 Seiko Epson Corporation Information processing device and computer program


Also Published As

Publication number Publication date
CN111462181B (en) 2023-06-20

Similar Documents

Publication Publication Date Title
Novotny et al. Semi-convolutional operators for instance segmentation
Enzweiler et al. A mixed generative-discriminative framework for pedestrian classification
CN110111338B (en) Visual tracking method based on superpixel space-time saliency segmentation
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
US6400828B2 (en) Canonical correlation analysis of image/control-point location coupling for the automatic location of control points
CN109829449B (en) RGB-D indoor scene labeling method based on super-pixel space-time context
CN106815842B (en) improved super-pixel-based image saliency detection method
CN110334762B (en) Feature matching method based on quad tree combined with ORB and SIFT
CN108053420B (en) Partition method based on finite space-time resolution class-independent attribute dynamic scene
JP2006209755A (en) Method for tracing moving object inside frame sequence acquired from scene
JP2008310796A (en) Computer implemented method for constructing classifier from training data detecting moving object in test data using classifier
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
CN113362341B (en) Air-ground infrared target tracking data set labeling method based on super-pixel structure constraint
CN112465021B (en) Pose track estimation method based on image frame interpolation method
WO2008125854A1 (en) Method for tracking multiple objects with occlusions
Sadeghi-Tehran et al. A real-time approach for autonomous detection and tracking of moving objects from UAV
Alsanad et al. Real-time fuel truck detection algorithm based on deep convolutional neural network
CN115063526A (en) Three-dimensional reconstruction method and system of two-dimensional image, terminal device and storage medium
CN113436251B (en) Pose estimation system and method based on improved YOLO6D algorithm
Lin et al. Temporally coherent 3D point cloud video segmentation in generic scenes
CN114049531A (en) Pedestrian re-identification method based on weak supervision human body collaborative segmentation
Lin et al. COB method with online learning for object tracking
CN111462181B (en) Video single-target tracking method based on rectangular asymmetric inverse layout model
Liu et al. Fast tracking via spatio-temporal context learning based on multi-color attributes and pca
Li et al. Multitarget tracking of pedestrians in video sequences based on particle filters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant