CN108470354B - Video target tracking method and device and implementation device - Google Patents

Video target tracking method and device and implementation device

Info

Publication number
CN108470354B
Authority
CN
China
Prior art keywords
feature
target object
target
tracking
feature point
Prior art date
Legal status
Expired - Fee Related
Application number
CN201810249416.5A
Other languages
Chinese (zh)
Other versions
CN108470354A
Inventor
周浩
高赟
张晋
袁国武
普园媛
杜欣悦
Current Assignee
Yunnan University YNU
Original Assignee
Yunnan University YNU
Priority date
Filing date
Publication date
Application filed by Yunnan University YNU
Priority to CN201810249416.5A
Publication of CN108470354A
Application granted
Publication of CN108470354B
Expired - Fee Related
Anticipated expiration

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 — Image analysis
    • G06T7/20 — Analysis of motion
    • G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T2207/10 — Image acquisition modality
    • G06T2207/10016 — Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a video target tracking method, a video target tracking device and an implementation device. The method comprises the following steps: detecting a feature point set of the current frame within a set image range, and screening the feature point set according to preset screening conditions; carrying out feature point matching, motion estimation and tracking-condition analysis for the target object according to the screened feature point set; and updating the feature point sets of the target object and the neighborhood background, their apparent features, and their inter-frame motion parameters according to the matching result, the motion estimation result and the tracking-condition analysis result, so as to update the tracking strategy of the target object. The tracking result of the invention not only reflects the position of the target object in time, but also accurately reflects the range and the rotation angle of the target object, so that tracking of the target object across video frames has better robustness and stability; at the same time the computational complexity is low, balancing tracking robustness against computation speed.

Description

Video target tracking method and device and implementation device
Technical Field
The invention relates to the technical field of video target tracking, in particular to a video target tracking method, a video target tracking device and a video target tracking implementation device.
Background
Motion tracking means detecting a target of interest in a continuous image sequence to obtain information such as its position, range and shape, so as to establish the correspondence of the target across the continuous video sequence and provide reliable data for subsequent video understanding and analysis. A traditional tracking method builds a model of the target and, when a new frame arrives, tracks the target by searching for the best likelihood of the target model. Because of computational complexity, usually only the position of the tracked target is returned, without information such as its imaging range or rotation in the video, and influences such as cluttered background, occlusion and sudden motion changes easily cause tracking drift or even tracking failure. Therefore, existing traditional tracking algorithms may perform well in terms of computational complexity but sacrifice robustness to some extent, or emphasize robustness but sacrifice computation speed; it is usually difficult to balance both.
Disclosure of Invention
In view of this, the present invention provides a video target tracking method, apparatus and implementation apparatus, so that tracking of the target object across video frames has better robustness and stability while the computational complexity remains low, balancing tracking robustness against computation speed.
In a first aspect, an embodiment of the present invention provides a video target tracking method, including: initializing tracking parameters, the tracking parameters at least comprising the position and range of the target object, the inter-frame motion parameters of the target object and the neighborhood background, the feature point sets of the target object and the neighborhood background, and a plurality of apparent features of the target object and the neighborhood background; detecting a feature point set of the current frame within a set image range and screening the feature point set according to preset screening conditions, the feature point set comprising feature points and the feature vectors corresponding to the feature points; matching the screened feature point set respectively with the feature point sets of the target object and of the neighborhood background corresponding to the previous frame; carrying out motion estimation for the target object according to the screened feature points; analyzing the tracking condition of the target object in the current frame according to the distances between the screened feature points and the center position of the target object and according to the apparent features of the target object; and updating the feature point sets of the target object and the neighborhood background, their apparent features and their inter-frame motion parameters according to the matching result, the motion estimation result and the tracking-condition analysis result, so as to update the tracking strategy of the target object.
In a second aspect, an embodiment of the present invention provides a video target tracking apparatus, including: an initialization module for initializing tracking parameters, the tracking parameters at least comprising the position and range of the target object, the inter-frame motion parameters of the target object and the neighborhood background, the feature point sets of the target object and the neighborhood background, and a plurality of apparent features of the target object and the neighborhood background; a screening module for detecting a feature point set of the current frame within a set image range and screening the feature point set according to preset screening conditions, the feature point set comprising feature points and the feature vectors corresponding to the feature points; a feature point matching module for matching the screened feature point set respectively with the feature point sets of the target object and of the neighborhood background corresponding to the previous frame; a motion estimation module for carrying out motion estimation for the target object according to the screened feature points; a tracking-condition analysis module for analyzing the tracking condition of the target object in the current frame according to the distances between the screened feature points and the center position of the target object and according to the apparent features of the target object; and an updating module for updating the feature point sets of the target object and the neighborhood background, their apparent features and their inter-frame motion parameters according to the matching result, the motion estimation result and the tracking-condition analysis result, so as to update the tracking strategy of the target object.
In a third aspect, an embodiment of the present invention provides a video target tracking implementation apparatus, including a processor and a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions capable of being executed by the processor, and the processor executes the machine-executable instructions to implement the video target tracking method.
The embodiment of the invention has the following beneficial effects:
according to the video target tracking method, apparatus and implementation apparatus of the embodiments of the invention, after the tracking parameters are initialized, the feature point set of the current frame is detected within a set image range and screened according to preset screening conditions; the screened feature point set is matched respectively with the feature point sets of the target object and of the neighborhood background corresponding to the previous frame; motion estimation is then carried out for the target object according to the screened feature points, and the tracking condition of the target object in the current frame is analyzed according to the distances between the screened feature points and the center position of the target object and according to the apparent features of the target object; finally, the feature point sets of the target object and the neighborhood background, their apparent features and their inter-frame motion parameters are updated according to the matching result, the motion estimation result and the tracking-condition analysis result, so as to update the tracking strategy of the target object. In this way the tracking result not only reflects the position of the target object in time but also accurately reflects its range and rotation angle, so that tracking of the target object across video frames has better robustness and stability; at the same time the computational complexity is low, balancing tracking robustness against computation speed.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention as set forth above.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of an algorithm for video target tracking according to an embodiment of the present invention;
fig. 2 is a flowchart of a video target tracking method according to an embodiment of the present invention;
FIG. 3 is a flowchart of initializing tracking parameters according to an embodiment of the present invention;
fig. 4 is a flowchart for matching a feature point set of a target object and a feature point set of a neighborhood background respectively corresponding to a previous frame according to a filtered feature point set according to the embodiment of the present invention;
fig. 5 is a flowchart of analyzing a tracking status of a target object in a current frame according to an embodiment of the present invention;
fig. 6 is a schematic diagram of analyzing the tracking status according to the feature point matching situation according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating a process of tracking and locating an object according to an embodiment of the present invention;
fig. 8 is a schematic diagram of updating a feature point set of a target object and a neighborhood background, an apparent feature of the target object and the neighborhood background, and an inter-frame motion parameter of the target object and the neighborhood background according to an embodiment of the present invention;
fig. 9 is a flowchart of updating feature point sets of a target object and a neighborhood background according to an embodiment of the present invention;
FIG. 10 is a flowchart of another video target tracking method according to an embodiment of the present invention;
fig. 11 is a schematic structural diagram of a video target tracking apparatus according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of an apparatus for tracking a video target according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a flow chart of an algorithm for video target tracking is shown. After target initialization, the initial target state X_0 is obtained, the target appearance model A_0 is initialized, and the tracking stage is entered. After video frame I_t arrives, the target is located in the current frame according to the previous target state and the target model to obtain the state X_t of the target in the current frame, and the appearance model A_t is updated according to the apparent features of the target in the current frame. Occlusion and tracking drift are generally unavoidable during tracking, so to achieve robust tracking the current tracking state is analyzed and the tracking strategy is adjusted accordingly. In addition, to realize robust and stable tracking in a complex scene, a feature model is often established by fusing multiple features, so multi-feature fusion is also a problem that a robust tracking algorithm has to consider.
A typical target tracking system mainly includes the following three steps:
(1) Establishing a target model: whatever the tracking strategy, the tracking algorithm needs to establish an apparent model describing the target and search for the position of the target in the current frame according to this target model.
(2) Searching for and locating the target: video target tracking solutions can generally be divided into stochastic algorithms and deterministic algorithms according to the underlying tracking idea. A stochastic method treats tracking as estimating the optimal state of the target in the current frame given the observation data and the known target state, while a deterministic method reduces tracking to solving an optimal cost function.
(3) Updating the target model: the tracking algorithm compares and analyzes the observation of the feature information in the current frame against the prior knowledge of that feature information (i.e. the target model) to obtain the tracking result of the current frame. In the actual tracking process, however, the apparent features of the tracked target are not constant, and the apparent changes of the target fall into two cases. In one case the appearance of the target in the image frame actually changes, due to factors such as illumination change, deformation and non-planar rotation; here the appearance model should adapt in time to follow the change. In the other case the apparent change is caused by occlusion, noise and the like; here the appearance model should not follow the change in the current frame. The requirements for updating the appearance model are thus quite different in the two cases, so how to deal with changes of the target's apparent features is an important challenge for robust target tracking.
Methods for searching and positioning targets can be divided into stochastic algorithms and deterministic algorithms. The stochastic algorithm converts the target tracking problem into an optimal state estimation problem under a Bayesian framework, wherein the state is a target tracking result and comprises parameters such as the position range of a target in the current frame. The stochastic tracking algorithm is divided into two steps of prediction and observation vector updating, under the condition that prior knowledge of targets such as target representation, initial state and the like is known, the current state of the targets is predicted according to a target motion model, then the maximum posterior probability of the target state is solved through observation data to obtain the optimal estimation of the targets, and the classical stochastic tracking algorithm comprises Kalman filtering (Kalman filter), Particle filtering (Particle filter) and an improved algorithm thereof.
Deterministic algorithms enable tracking by measuring the similarity of a current frame candidate target region to a known target model, often by matching algorithms such as: the Mean-shift algorithm uses the gradient of the non-parameter probability density, and searches an image area which is most similar to the density estimation of the target color kernel in the neighborhood of a previous frame target as a reference in a current frame as the position of the current frame target. The Mean-shift and Cam-shift algorithms are based on the idea to track the target. To improve the robustness of tracking, it is usually necessary to preprocess the image frame sequence, improve the image quality, and build and update the target model.
Whatever target positioning strategy needs to establish a target model and search the optimal matching of the target in the current frame according to the target model. Therefore, establishing a model for describing the appearance of the target is an important factor for determining the robustness of the tracking algorithm, and the primary problem of establishing the apparent modeling of the target is to select apparent features capable of effectively describing the target, and the methods for establishing the apparent model can be divided into the following methods according to the image features used for establishing the apparent model of the target:
(1) apparent features described based on pixel values: directly using pixel values to create target features can be divided into vector-based methods, which directly convert image regions into a high-dimensional vector, and matrix-based methods, which generally create target features directly using a two-dimensional matrix. After the apparent characteristics of the target are established by the method, the target is tracked by calculating the correlation between the current frame image area and the target template, and the target characteristics are updated by using the tracking result in the current frame image.
(2) Apparent features described based on the optical flow method: the optical flow method takes a space-time displacement density field of each pixel in a target image area as a target feature, and generally comprises two types of optical flow calculation methods based on a brightness constant constraint and a non-brightness constant constraint. The non-luminance-invariant constraint method is to geometrically constrain the optical flow field by introducing the spatial context of the pixels. In general, the optical flow method has a high computational complexity.
(3) Apparent features described based on probability density: an image histogram is the most common gray level probability distribution description method, such as Mean-shift and Camshift tracking algorithms, and establishing target features by using the histogram is the most common method in the target tracking algorithm at present.
(4) Apparent features based on covariance description: the target model established based on the covariance can describe the interrelation of all parts in the target.
(5) Apparent features based on profile description: describing the tracked target by using a closed contour curve of the target object boundary, and establishing the apparent characteristic of the target; and the contour features can be continuously updated in a self-adaptive manner along with the scaling, rotation and deformation of the target, so that the method is suitable for occasions of tracking non-rigid targets.
(6) Apparent features based on local feature description: the target is described using only some of its local characteristics, such as distinctive points, lines or local areas; a target model is established from these local features and matched against the local features detected in the current frame, so that even if the target is partially occluded, effective tracking is still possible as long as some local feature points can be detected. Local features commonly used in target tracking include corner features (such as Harris corners), Gabor features, SIFT (Scale-Invariant Feature Transform) features and SURF (Speeded-Up Robust Features) features; a brief detection sketch is given after this list.
(7) Apparent features based on compressed sensing: target tracking can be seen as a problem of finding a sparse representation of a tracked target based on a dynamically constructed and updated sample set. And performing sparse representation on the tracking target by utilizing a norm minimization method according to a target sample set, and evaluating the tracking target based on the sparse representation of the sample under the framework of a Kalman filter. The target tracking can also be regarded as a sparse approximation problem under a particle filter framework, the target is sparsely represented by a regularization least square method, and a candidate target with the minimum error with the target sparsely represented in a new image frame is a tracking target.
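As a concrete illustration of the local-feature-based description in item (6), the sketch below detects keypoints and descriptors in a grayscale frame with OpenCV. ORB is used purely as a freely available stand-in for SURF (which sits in OpenCV's non-free xfeatures2d contrib module); the synthetic frame and parameter values are illustrative assumptions.

```python
import cv2
import numpy as np

# Detect local feature points and their descriptors in a grayscale frame.
# ORB stands in for SURF here; with opencv-contrib and the non-free modules
# enabled, cv2.xfeatures2d.SURF_create() could be used instead.
frame = np.zeros((240, 320), dtype=np.uint8)
cv2.rectangle(frame, (100, 80), (180, 160), 255, -1)   # a synthetic bright "target"

detector = cv2.ORB_create(nfeatures=200)
keypoints, descriptors = detector.detectAndCompute(frame, None)

for kp in keypoints[:5]:
    # each keypoint carries a position, a scale and an orientation; the descriptor
    # row is the local feature vector used for inter-frame matching
    print(kp.pt, kp.size, kp.angle)
print(None if descriptors is None else descriptors.shape)
```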
In the actual tracking process, the apparent features of the tracked target are not invariant, being influenced by factors such as occlusion, noise, illumination change and the changing distance between the target and the detector. Current online adaptive appearance-updating algorithms can be divided into two types: generative methods and discriminative methods. Generative algorithms model only the target appearance, without considering how well the target model discriminates against the background or other targets; such a method first establishes a target appearance model and then searches for and tracks the target by maximizing the likelihood or the posterior probability. Discriminative algorithms treat target tracking as a detection problem: the target is separated from the local area of its neighborhood background by a classifier that is trained and updated online. In the initial frame the user first specifies the target, which yields a feature set describing the target and a feature set describing its neighborhood background; in subsequent frames the target is separated from the background by a binary classifier, and the classifier has to be updated in time to cope with apparent changes.
Existing tracking algorithms always make a trade-off among robustness, tracking accuracy, stability and computational complexity, with the following specific disadvantages:
(1) The tracking result generally only comprises the position of the target, not its range. A traditional tracking algorithm obtains the current target position by establishing a target model and searching for the best match in the current frame. Considering the computational-complexity requirements of tracking applications, the tracking result usually does not include the range of the target, let alone its rotation angle, because locating the tracked target is then only an optimal search over the two-dimensional image; extending the optimal-match search space to three or even four dimensions in order to obtain the range or rotation angle of the target greatly increases the computational complexity. In many applications, however, accurate knowledge of the range and rotation angle of the target is important for further processing.
(2) The tracking robustness under conditions such as occlusion, tracking drift and complex background needs further improvement. Traditional tracking methods are very sensitive to occlusion of the target, even partial occlusion; in addition, tracking drift and tracking loss lack accurate analysis and judgment, so background information is easily introduced into the tracking model of the target, which makes it difficult to handle abnormal situations in the tracking process in time and leads to tracking failure.
(3) Computational complexity is always a key factor of a tracking algorithm, and it is difficult to balance all aspects of performance. An excellent tracker or target tracking method should balance robustness, stability and computational complexity, and is a complete system requiring the cooperation of multiple links; a traditional tracking method may perform well in terms of computational complexity but sacrifices robustness to some extent, or emphasizes robustness but sacrifices computation speed, and it is generally difficult to balance both.
Based on this, the embodiment of the invention provides a video target tracking method, a video target tracking device and a video target tracking implementation device; the technology can be applied to the target tracking process among continuous video frames; the techniques may be implemented in associated software or hardware, as described by way of example below.
Referring to fig. 2, a flow chart of a video target tracking method is shown; the method comprises the following steps:
step S202, initializing tracking parameters; the tracking parameters at least comprise the position and the range of the target object, the interframe motion parameters of the target object and the neighborhood background, and the feature point set of the target object and the neighborhood background; a plurality of apparent features of the target object and the neighborhood background; the target object may also be referred to as a target.
This step S202 may be specifically implemented by: (1) extracting the apparent characteristics of a target object and a neighborhood background in a current frame; the apparent features at least comprise a plurality of feature descriptor vectors, scale factor feature information, color features, texture features and edge features; (2) determining the center position of the target object and the length and width of the target rectangular frame; (3) initializing the inter-frame motion parameters of the target object and the neighborhood background into the difference of corresponding transformation parameters between the current frame and the previous frame; (4) initializing a feature point set of a target object into a rectangular frame of the target object, and detecting the feature point set; initializing a feature point set of a neighborhood background into a feature point set of the neighborhood background detected in a neighborhood region in a preset range outside a target object; (5) and initializing the apparent features of the target object and the neighborhood background into feature vectors of the extracted apparent features.
Considering that video target tracking must find the target within its neighborhood background region in each frame in order to locate it accurately, and that distinguishing the target from the neighborhood background is usually based on the difference between their apparent features, establishing a model only for the target as the basis for tracking makes it difficult to obtain a robust and stable tracking result. Therefore, the model established here includes a model of the target and a model of its neighborhood background.
In a video frame, the inter-frame motion of the background is caused by the motion of the detector, while the change of the position and range of an object between image frames is caused by both the motion of the object and the motion of the detector; the motion law of the object therefore differs from that of the neighborhood background. Correspondingly, the change of the positions of the feature points of the object or of the neighborhood background region between frames actually reflects their inter-frame motion. In practice, neither the object nor the background moves abruptly between frames; especially for applications with high frame-rate sampling, the change of position between frames is always continuous. The inter-frame change of the corresponding feature point positions therefore also has corresponding continuity and does not change abruptly. Observing the inter-frame displacement of a feature point (x, y) in successive video frames, the course of its motion over time can be expressed as
{U_i(1), …, U_i(t)} = { u_i(x, y, t_0) : 1 ≤ t_0 ≤ t }   (1)
where u_i(x, y, t_0) is the inter-frame displacement observed at feature point i at position (x, y) at time t_0. Within a certain time range this displacement can be modeled as uniform inter-frame motion with a certain amount of Gaussian noise superimposed. Therefore the "feature point inter-frame displacement process" is also described with a single-Gaussian model, i.e. a Gaussian distribution N(μ_u, σ_u) is used to simulate {U_i(1), …, U_i(t)}.
On the other hand, the object and its neighborhood background usually have different apparent features such as color, texture and edges, so their corresponding appearance models should not be the same. Also, because of noise, illumination variations, motion of the object and the detector, background changes and so on, different image frames acquired by the same detector at different times will not be identical even if the entire scene is stationary. Therefore, even if a feature point is stable in the video, the image information in the local region at its position (x, y), including the gray values, changes with time. At any time t the position of feature point i is (x, y), and feature_i(x, y, t_0) is the feature value observed at the feature point (x, y) at time t_0; the "feature information process" (the change of the observed feature information over time) in the neighborhood of the feature point can be expressed as
{Feat_i(1), …, Feat_i(t)} = { feature_i(x, y, t_0) : 1 ≤ t_0 ≤ t }   (2)
See the flowchart of initializing tracking parameters shown in fig. 3. The video between successive frames is always relatively stable and does not change abruptly. Even when occlusion occurs, the image area of the occluding object that covers the occluded object is itself relatively stable; on the other hand, the occluded object usually reappears after a period of time, and the previously observed prior knowledge still plays an important role in recognizing it. Correspondingly, the various feature vectors extracted based on SURF feature point detection, including the SURF feature descriptor, the scale factor and other feature information, change relatively slowly over time in the video. This "feature information process" can therefore also be described with a single-Gaussian model, i.e. a Gaussian distribution N(μ_feat, σ_feat) is used to simulate {Feat_i(1), …, Feat_i(t)}. The appearance model of the target is established from the Gaussian distribution models of the feature vectors of the feature points on the target; similarly, the appearance model of the target neighborhood background is formed by the Gaussian distribution models of the feature vectors of the feature points in the target neighborhood background area.
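Both the "feature point inter-frame displacement process" and the "feature information process" above are modelled as single Gaussians maintained online. The following minimal sketch keeps a running mean and variance per observed quantity; the exponential-forgetting update, the learning rate and the large initial variance value are illustrative assumptions rather than the patent's exact update rule.

```python
import numpy as np

class RunningGaussian:
    """Single-Gaussian model N(mu, sigma^2) of an observation process,
    updated online as new per-frame observations arrive."""

    def __init__(self, first_observation, init_var=1e3, alpha=0.05):
        x = np.asarray(first_observation, dtype=float)
        self.mu = x.copy()                     # mean initialised to the first observation
        self.var = np.full_like(x, init_var)   # "comparatively large" initial variance
        self.alpha = alpha                     # forgetting factor (illustrative)

    def update(self, observation):
        x = np.asarray(observation, dtype=float)
        diff = x - self.mu
        self.mu += self.alpha * diff
        self.var = (1 - self.alpha) * self.var + self.alpha * diff * diff

    def mahalanobis(self, observation):
        x = np.asarray(observation, dtype=float)
        return np.sqrt(np.sum((x - self.mu) ** 2 / np.maximum(self.var, 1e-12)))

# Example: model the inter-frame displacement process {U_i(1), ..., U_i(t)}
disp_model = RunningGaussian(first_observation=[1.2, -0.4])
for u in ([1.1, -0.5], [1.3, -0.3], [1.0, -0.4]):
    disp_model.update(u)
print(disp_model.mu, np.sqrt(disp_model.var))
```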
In fig. 3, model initialization is performed at the first frame and includes initialization of the following parameters:
(1) The target is represented by a rectangular frame. Its position and range are initialized as X_0^obj = (cx_0, cy_0, h_0, w_0), where (cx_0, cy_0) are the coordinates of the center of the target rectangular box and (h_0, w_0) are the height and width of the target rectangular box. The target neighborhood background region is initialized as a proportionally larger rectangular box with the same center (cx_0, cy_0), from which the area of the target itself is removed.
(2) The inter-frame motion parameters of the target are initialized as Φ_0^obj, i.e. no translation, no rotation and no scaling; the inter-frame motion parameters of the target neighborhood background are initialized as Φ_0^bg, likewise no translation, no rotation and no scaling. The subscript t denotes the frame ordinal, and t = 0 at the first frame. The mean of the Gaussian model of the target's inter-frame motion is initialized as μ_u^obj; after the second frame arrives, the variance σ_u^obj of this Gaussian model is initialized to the difference between the transformation parameters detected in the first and second frames. The mean of the Gaussian model of the background's inter-frame motion is initialized as μ_u^bg; after the second frame arrives, the variance σ_u^bg of the background motion Gaussian model is likewise initialized to the difference between the transformation parameters detected in the first and second frames.
(3) The SURF feature point sets of the target and the neighborhood background are initialized. SURF feature points are detected in a rectangular area centered at (cx_0, cy_0) and covering the target and its neighborhood background, yielding the SURF feature point set Pg_0. The feature points located inside the target rectangular frame are initialized as the target feature point set Pg_0^obj, and the feature points located in the target neighborhood background region as the background feature point set Pg_0^bg, where Pg_0 = Pg_0^obj ∪ Pg_0^bg.
(4) The appearance models of the target and of the neighborhood background area are initialized. For each feature point, the feature descriptor vector d_i^t of the i-th feature point at coordinates (x, y) in frame t is extracted according to the SURF feature point detection algorithm, and the corresponding scale factor feature information s_i^t is obtained at the same time; depending on the tracked object and the application, further vectors o_i^t such as texture, gradient and mean gray value in the neighborhood of the feature point may also be extracted. Considering that every feature vector selected for each SURF feature point follows a Gaussian distribution as the video frames change, at the first frame (t = 0) the mean μ_feat,i of the corresponding Gaussian component of each feature point is initialized to the observed value feature_i(x, y, 0) of that feature point, and the corresponding Gaussian model variance σ_feat,i of each feature vector is initialized to a comparatively large initial value.
Tracking then starts. After initialization, the established model includes: (1) the position and bounding rectangle of the target, described by the parameters X_0^obj; (2) the motion model, i.e. the Gaussian model of the target motion described by the parameters (μ_u^obj, σ_u^obj) and the Gaussian model of the background motion described by the parameters (μ_u^bg, σ_u^bg); (3) the detected feature point set Pg_0, divided between the target and the neighborhood background into the target feature point set Pg_0^obj and the background feature point set Pg_0^bg; and (4) the feature vector (d_i, s_i, o_i) corresponding to each feature point, together with its Gaussian model parameters (μ_feat,i, σ_feat,i).
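For illustration, the initialized model of the first frame can be held in a few small containers: the target box, the motion parameters of the target and background, and the two feature point sets. The sketch below is one possible layout under the notation used above; the field names, the neighborhood scale factor and the ORB detector (standing in for SURF) are assumptions.

```python
from dataclasses import dataclass, field
import numpy as np
import cv2

@dataclass
class FeaturePoint:
    xy: np.ndarray          # (x, y) position in the frame
    trace_sign: int         # sign of the Hessian trace (+1 used for the ORB stand-in)
    descriptor: np.ndarray  # feature descriptor vector d_i
    scale: float            # scale factor information s_i

@dataclass
class TrackingModel:
    box: tuple                                 # (cx, cy, h, w) of the target rectangle
    obj_motion: tuple = (0.0, 0.0, 0.0, 1.0)   # (ux, uy, theta, rho): no translation/rotation/scaling
    bg_motion: tuple = (0.0, 0.0, 0.0, 1.0)
    obj_points: list = field(default_factory=list)  # Pg_0^obj
    bg_points: list = field(default_factory=list)   # Pg_0^bg

def initialize(frame_gray, box, neighborhood_scale=2.0):
    cx, cy, h, w = box
    model = TrackingModel(box=box)
    detector = cv2.ORB_create(500)             # stand-in for SURF feature detection
    kps, descs = detector.detectAndCompute(frame_gray, None)
    if descs is None:
        return model
    for kp, d in zip(kps, descs):
        x, y = kp.pt
        p = FeaturePoint(np.array([x, y]), +1, d.astype(float), kp.size)
        inside_target = abs(x - cx) <= w / 2 and abs(y - cy) <= h / 2
        inside_region = (abs(x - cx) <= neighborhood_scale * w / 2 and
                         abs(y - cy) <= neighborhood_scale * h / 2)
        if inside_target:
            model.obj_points.append(p)         # target feature point set
        elif inside_region:
            model.bg_points.append(p)          # neighborhood-background feature point set
    return model

demo = np.zeros((240, 320), np.uint8)
cv2.rectangle(demo, (140, 100), (180, 140), 255, -1)
m = initialize(demo, box=(160.0, 120.0, 40.0, 40.0))
print(len(m.obj_points), len(m.bg_points))
```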
Step S204, detecting a feature point set in the current frame in a set image range, and screening the feature point set according to a preset screening condition; the feature point set comprises feature points and feature vectors corresponding to the feature points;
The step S204 may be implemented in the following manner: (1) determining the coordinates of the upper-left and lower-right corners of the image rectangle of the image range to be detected; (2) detecting the feature points inside the image rectangle to obtain the feature point coordinates; (3) calculating the trace of the Hessian matrix of each feature point and the feature vector corresponding to each feature point, the feature vector comprising a feature descriptor vector, scale factor feature information, and color, texture and edge vectors; (4) screening the feature points in the feature point set according to the following conditions: the trace of the Hessian matrix of the feature point has the same sign as the Hessian matrix trace of the corresponding feature point in the previous video frame; the distance between the feature point and the feature point in the previous video frame is smaller than a preset distance threshold; the Euclidean distance between the corresponding feature vectors of the feature point and of the feature point in the previous video frame satisfies a preset feature vector threshold; the displacement length, displacement direction and relative position relation between the feature point and the feature point in the previous video frame satisfy a preset displacement consistency threshold; and, when several feature points are in a many-to-one matching relationship with a feature point of the previous video frame, the feature point with the smallest Euclidean distance is selected from among them.
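The per-pair screening conditions (same Hessian-trace sign, bounded displacement, bounded feature-vector distance) amount to a simple gate applied to each candidate pairing before the finer matching stages. A minimal sketch, with illustrative threshold values:

```python
import numpy as np

def passes_screening(prev_pt, cur_pt, prev_desc, cur_desc,
                     prev_sign, cur_sign,
                     max_displacement=30.0, max_desc_dist=0.5):
    """Gate a candidate pairing (previous-frame point, current-frame point)."""
    # (1) the Hessian traces must have the same sign (both bright or both dark blobs)
    if prev_sign * cur_sign <= 0:
        return False
    # (2) the spatial displacement between frames must stay below a distance threshold
    if np.linalg.norm(np.asarray(cur_pt) - np.asarray(prev_pt)) > max_displacement:
        return False
    # (3) the Euclidean distance between the feature descriptors must stay below a threshold
    if np.linalg.norm(np.asarray(cur_desc) - np.asarray(prev_desc)) > max_desc_dist:
        return False
    return True

# Example gate on a single candidate pair
print(passes_screening((10, 12), (13, 14), np.zeros(64), np.zeros(64) + 0.01, +1, +1))
```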
Step S206, respectively matching the screened feature point set with the feature point set of the target object and the neighborhood background corresponding to the previous frame;
In consideration of computational complexity, feature point detection and matching are usually not performed over the full image; the position and range of the target are determined by inter-frame matching of feature points in each frame. The target localization and tracking condition of the current frame is evaluated in combination with the previously established target motion model, and the evaluation result determines the local image area in which feature points are detected and matched in the new frame. On the basis of evaluating the tracking accuracy, the image range for feature point detection in the next frame is determined from the tracking result of the current frame: the detection rectangle is centered on the center of the target in the current frame, with its height and width obtained by scaling the current target height and width by the threshold constant thrdU. Here (cx_t, cy_t, h_t, w_t) denote the center coordinates and the height and width of the target in the current frame, thrdU is a threshold constant that usually takes a value of 2.4–3, and (LTx, LTy) and (RBx, RBy) respectively denote the coordinates of the top-left and bottom-right corners of the image rectangle used for feature point detection in the next frame.
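A sketch of this search-region rule, under the assumption that the detection rectangle is symmetric about the current target center with height and width scaled by thrdU:

```python
def detection_rectangle(cx, cy, h, w, thrdU=2.7):
    """Image rectangle in which feature points are detected in the next frame.

    (cx, cy, h, w): center coordinates, height and width of the target in the
    current frame; thrdU: threshold constant, typically 2.4-3.
    """
    LTx, LTy = cx - thrdU * w / 2.0, cy - thrdU * h / 2.0   # top-left corner
    RBx, RBy = cx + thrdU * w / 2.0, cy + thrdU * h / 2.0   # bottom-right corner
    return (LTx, LTy), (RBx, RBy)

print(detection_rectangle(320.0, 240.0, 80.0, 40.0))
```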
SURF feature point detection is performed within the rectangular image block defined by the coordinates (LTx, LTy) and (RBx, RBy); the coordinates (x_i, y_i) of each detected feature point in the image are obtained, the trace of its Hessian matrix is calculated, and the feature vector feat_i^t corresponding to each feature point is computed.
The feature points belonging to the target differ from those belonging to the neighborhood background region in the motion law they obey and in apparent characteristics such as color and shape, so the feature point set Pg_{t-1} is divided into two categories: the feature point set Pg_{t-1}^obj located in the target area and the feature point set Pg_{t-1}^bg located in the background area. After the feature point set Pg_t of the current frame is detected, it is matched respectively with the target feature point set Pg_{t-1}^obj and the background feature point set Pg_{t-1}^bg, where TN(t-1) is the number of target feature points at time t-1 and BN(t-1) is the number of background feature points at time t-1. The matching result between feature point sets can be represented by a binary vector over the pairing space, Matched = {0, 1}^M; each entry matched_ij of the vector Matched represents a pairing response, matched_ij = 1 meaning that feature point i of the previous frame is matched with feature point j of the current frame, and otherwise the pairing fails. M denotes the matching space formed by the feature point sets of the two frames and can be described by a two-dimensional matrix of size N(t-1) × N(t), where N(t-1) and N(t) are the numbers of feature points of the previous and current frames participating in matching. A feature point of the previous frame is either matched successfully with exactly one feature point of the current frame or has no matching feature point; that is, a matching is valid when it satisfies the constraint Rstr:
Σ_j matched_ij ≤ 1  for every feature point i of the previous frame   (7)
referring to fig. 4, a flowchart for matching the feature point set after screening with the feature point set of the target object and the neighborhood background corresponding to the previous frame respectively includes the following steps:
(1) Matching based on the Hessian matrix trace. SURF feature points are local extreme points in the image and can be divided into two types according to the kind of extremum: the central gray value of the feature point is either a local minimum or a local maximum within its neighborhood, and obviously no matching should occur between these two types of feature points. Whether the central gray value of a SURF feature point is a local maximum or minimum can be judged from the trace of its Hessian matrix (i.e. the sum of the diagonal elements of the Hessian matrix), denoted Trace: if the trace is positive, the center of the feature point is brighter than the neighborhood pixels; if the trace is negative, the center of the feature point is darker than the neighborhood pixels. The Hessian matrix traces of two feature points i and j to be matched in the pairing space M are compared, and the pair is considered a possible match, i.e. matched_ij = 1, only if the traces of feature point i and feature point j have the same sign; this yields the candidate matching feature point set candidate_matchpair0.
(2) Matching based on the feature point displacement magnitude constraint. Since the inter-frame motion of a feature point does not change abruptly, the feature point j of the current frame that can match the feature point i of the previous frame must lie within a certain range centered on feature point i, and feature points beyond that range cannot match i. That is, the pairs (i, j) whose inter-frame distance Dist_m_ij is greater than the prescribed threshold thre·σ_m are removed from the candidate matching feature point set candidate_matchpair0, giving the new candidate matching feature point set candidate_matchpair1:
candidate_matchpair1 = { (i, j) ∈ candidate_matchpair0 : Dist_m_ij ≤ thre·σ_m }   (8)
(3) Matching based on the feature vector constraint. For the target feature point set Pg_{t-1}^obj and the background feature point set Pg_{t-1}^bg of the previous frame, the distances Dist_d_ij, Dist_s_ij and Dist_o_ij between their feature vectors (d, s, o) and those of the feature point set Pg_t detected in the current frame are calculated respectively. According to the established feature point appearance model, each distance is compared with the variance of the corresponding feature model: if Dist_d_ij, Dist_s_ij and Dist_o_ij are all less than the corresponding thresholds, the pair is considered a match and the pairing response matched_ij is set to 1; otherwise it is considered a mismatch and matched_ij is set to 0:
match_d = 1 if Dist_d_ij < thre·σ_d, else 0   (9)
match_s = 1 if Dist_s_ij < thre·σ_s, else 0   (10)
match_o = 1 if Dist_o_ij < thre·σ_o, else 0   (11)
matched_ij = match_d & match_s & match_o   (12)
In this way a new candidate matching feature point set candidate_matchpair2 is further selected from candidate_matchpair1, where thre is a threshold value, typically set to 2.4–3.
(4) Matching based on the feature point displacement consistency constraint. The inter-frame movement of the feature points in the target area is caused by the change of the target's position between frames, and likewise the inter-frame displacement of the feature points in the background area is caused by the movement of the detector. Therefore the inter-frame position changes of the feature points belonging to the target, Pg_{t-1}^obj, should satisfy one common motion constraint, and similarly the feature points belonging to the background, Pg_{t-1}^bg, should also satisfy one common motion constraint. We generalize such motion constraints into three conditions: the inter-frame displacements of feature points of the same kind should have similar magnitudes, i.e. the lengths of the inter-frame displacement vectors of correctly paired feature points should be consistent; the inter-frame displacements of feature points of the same kind should have similar directions, i.e. the directions of the inter-frame displacement vectors of correctly paired feature points should also be consistent; and, in most cases, the relative position relationship among correctly matched feature points should remain basically unchanged before and after the inter-frame displacement.
Using the idea of the RANSAC algorithm, the matching feature points satisfying the above three conditions are selected from the set candidate_matchpair2. The process can be divided into three steps:
(1) Two pairs of inter-frame paired feature points (i_1, j_1) and (i_2, j_2) satisfying the conditions are chosen and the model parameters are estimated. Feature points i_1 and i_2 are feature points of the previous frame, and j_1 and j_2 are feature points of the current frame. The inter-frame vector a from i_1 to j_1 with length |a| = |i_1, j_1| and the inter-frame vector b from i_2 to j_2 with length |b| = |i_2, j_2| are computed, together with the angle θ_ab between vector a and vector b; the intra-frame vector c from i_1 to i_2 with length |c| = |i_1, i_2| and the intra-frame vector d from j_1 to j_2 with length |d| = |j_1, j_2| are also computed. The mean and variance of the inter-frame vector lengths |a| and |b| are calculated, as are the mean and variance of the intra-frame vector lengths |c| and |d|. The ratios of variance to mean, Par1 for the inter-frame lengths and Par2 for the intra-frame lengths, represent how much the vector lengths of different candidate matching feature points vary between frames and within a frame. Because the motion of a feature point cannot change abruptly and is subject to the motion of the whole target or background area to which it belongs, these two ratios should not be too large, and the angle θ_ab should not be too large either. If the inter-frame displacement variance-to-mean ratio Par1 is less than 0.24, the intra-frame variance-to-mean ratio Par2 is less than 0.2, and the angle θ_ab between the two pairs of feature points (i_1, j_1) and (i_2, j_2) is less than 0.15 rad, then the mean of the inter-frame vector lengths |a| and |b| and the mean of the phase angles of vectors a and b are taken as the model parameters estimated from this pair of feature point pairs and the next step is carried out; otherwise, feature point pairs are reselected.
(2) Thresholds are set using the estimated model parameters, and for each candidate matching feature point pair (i_n, j_n) of the set candidate_matchpair2 the inter-frame displacement length |i_n, j_n| and direction are calculated, together with the mean lengths of the vectors between feature points in the previous frame and in the current frame and the variance of these intra-frame vector lengths. If the deviation of the pair's inter-frame displacement from the estimated model parameters and the intra-frame length variation stay below the corresponding thresholds (0.1 and 0.3 respectively in this embodiment), the feature point pair (i_n, j_n) is regarded as an inlier; otherwise it is an outlier. The inliers in the set candidate_matchpair2 are found and the corresponding number of inliers is recorded.
(3) The estimate with the most inliers is found; if the ratio of that largest inlier count to the total number of pairings in the set is greater than a threshold, or the inlier count itself is greater than a specified threshold, the inliers judged under that estimate are taken as the new candidate pairing feature point set candidate_matchpair3; otherwise the above steps are repeated.
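The displacement-consistency stage is a RANSAC-style loop: hypothesize a common inter-frame displacement from two candidate pairs, count how many other pairs agree, and keep the largest consensus set. The sketch below follows that structure but simplifies the agreement test to displacement length and direction (the intra-frame relative-position checks are omitted), and its tolerances are illustrative:

```python
import random
import numpy as np

def _angle_between(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na < 1e-9 or nb < 1e-9:
        return 0.0
    return float(np.arccos(np.clip(np.dot(a, b) / (na * nb), -1.0, 1.0)))

def consensus_pairs(pairs, n_iter=200, len_tol=0.3, ang_tol=0.15, min_inlier_ratio=0.5):
    """pairs: list of (p_prev, p_cur) 2D points matched between consecutive frames.
    Keeps the largest subset whose inter-frame displacements agree in length and direction."""
    disps = [np.asarray(pc, float) - np.asarray(pp, float) for pp, pc in pairs]
    best = []
    for _ in range(n_iter):
        i1, i2 = random.sample(range(len(pairs)), 2)
        a, b = disps[i1], disps[i2]
        la, lb = np.linalg.norm(a), np.linalg.norm(b)
        # the two hypothesis displacements must already agree with each other
        if abs(la - lb) > len_tol * max((la + lb) / 2.0, 1e-6):
            continue
        if _angle_between(a, b) > ang_tol:
            continue
        mean_vec, mean_len = (a + b) / 2.0, (la + lb) / 2.0
        inliers = [pairs[k] for k, d in enumerate(disps)
                   if abs(np.linalg.norm(d) - mean_len) <= len_tol * max(mean_len, 1e-6)
                   and _angle_between(d, mean_vec) <= ang_tol]
        if len(inliers) > len(best):
            best = inliers
        if len(best) >= min_inlier_ratio * len(pairs):
            break
    return best

demo = [((0, 0), (5, 1)), ((10, 0), (15, 1)), ((0, 10), (5, 11)), ((3, 3), (-4, 9))]
print(len(consensus_pairs(demo)))  # the inconsistent last pair should be rejected
```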
(5) Matching based on the feature point pairing uniqueness constraint. In the new candidate paired feature point set candidate_matchpair3 there may be several feature points matched to the same feature point, which is obviously incorrect. All feature point pairings in candidate_matchpair3 that do not satisfy the one-to-one correspondence constraint are detected, and among them only the pairing with the minimum fused distance Dist_integral_ij between the apparent feature vectors is retained as the matching result, the others being deleted; as shown in fig. 4, this further yields the new pairing relation set candidate_matchpair4. The fused distance Dist_integral_ij is obtained by weighted fusion of the distances between the individual feature vectors:
Dist_integral_ij = Σ_n weight_n · Dist_n_ij,  n ∈ {d, s, o}   (13)
where weight_n is the normalized fusion weight of the n-th kind of feature information, n ∈ {d, s, o} being one of the features described above, and Dist_n_ij denotes the distance between the feature vectors selected according to the actual situation of the video. The variance σ_n of each feature distance over time is computed by online learning, and the fusion weight weight_n is defined from these variances in normalized form (equation (14)).
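A sketch of the fused distance and the uniqueness resolution, under the assumption that equation (14) normalizes inverse-variance weights so that larger online-learned variances give smaller weights; the symmetric case (one previous-frame point claimed by several current-frame points) is handled analogously:

```python
import numpy as np

def fusion_weights(variances):
    """Normalized fusion weights from the online-learned variances of each
    feature distance (assumed inverse-variance weighting)."""
    inv = 1.0 / np.maximum(np.asarray(variances, float), 1e-12)
    return inv / inv.sum()

def fused_distance(dists, weights):
    # Dist_integral_ij = sum_n weight_n * Dist_n_ij  (equation (13))
    return float(np.dot(weights, dists))

def enforce_uniqueness(candidates):
    """candidates: list of (i_prev, j_cur, fused_dist).
    Keep, for every current-frame point j, only the candidate with minimal fused distance."""
    best = {}
    for i, j, d in candidates:
        if j not in best or d < best[j][2]:
            best[j] = (i, j, d)
    return list(best.values())

w = fusion_weights([0.04, 0.25, 1.0])          # descriptor / scale / other-feature variances
cands = [(0, 5, fused_distance([0.2, 0.1, 0.3], w)),
         (1, 5, fused_distance([0.5, 0.2, 0.1], w)),
         (2, 7, fused_distance([0.1, 0.1, 0.1], w))]
print(enforce_uniqueness(cands))
```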
step S208, estimating the motion of the target object according to the screened feature points;
Usually a rectangular frame is used to represent the tracking area of the target; the center of the target rectangle in the previous frame is xc_{t-1} = (center_x_{t-1}, center_y_{t-1}), and h_{t-1} and w_{t-1} denote its height and width. The change of the inter-frame position of the target and of its neighborhood background region can be regarded as the superposition of a translation along the horizontal and vertical directions, a scaling about the geometric center and a rotation, described by the transformation parameters Φ_t^obj = (u_t, θ_t, ρ_t) for the target, or Φ_t^bg for the neighborhood background, where u_t = (ux_t, uy_t) is the translation parameter, ρ_t is the scaling parameter and θ_t is the rotation parameter. The transformation between target region frames, equation (15), composes these three components (translation u_t, rotation θ_t and scaling ρ_t).
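One plausible concrete form of the inter-frame transform (15) — scale by ρ_t and rotate by θ_t about the previous target center, then translate by u_t — is sketched below; this composition order is an assumption consistent with the parameter definitions above:

```python
import numpy as np

def transform_point(p, center, ux, uy, theta, rho):
    """Map a previous-frame point to the current frame under the motion
    parameters (u, theta, rho), rotating/scaling about the previous target center."""
    p = np.asarray(p, float)
    c = np.asarray(center, float)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return rho * (R @ (p - c)) + c + np.array([ux, uy])

# A feature point 10 px right of the center, rotated 90 degrees and shifted by (5, 0)
print(transform_point((110, 100), (100, 100), 5, 0, np.pi / 2, 1.0))  # ~[105, 110]
```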
Ideally, the feature points on the target should follow the target and move consistently with it. Let a feature point be at position p_i^{t-1} at time t-1 and at position p_i^t at time t; applying equation (15) to p_i^{t-1} gives the position estimate p̂_i^t of the feature point at time t, which should coincide with the observed position p_i^t of that feature point. In practice, because of noise and changes of the observation angle, the estimate p̂_i^t and the observation p_i^t are not completely identical; the observed value p_i^t can be regarded as the estimate p̂_i^t with Gaussian noise superimposed. After the pairing relation set candidate_matchpair4 of the feature points of the previous and current frames has been obtained, the observation error is defined from the estimates p̂_i^t of the previous-frame feature points in the current frame image and the observations p_i^t of the matched current-frame feature points as a weighted sum of the deviations between them (equation (16)).
solving motion equation parameter meeting minimum observation error by using nonlinear least square curve fitting method
Figure BDA0001607039920000115
And
Figure BDA0001607039920000116
here the weights
Figure BDA0001607039920000117
The robustness of the feature points is used for determining, and the feature points with good robustness are endowed with larger weights.
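For illustration, a minimal Python sketch of this weighted nonlinear least-squares fit is given below (not from the patent; it assumes the similarity-transform model written out above and uses scipy's general-purpose solver):

```python
import numpy as np
from scipy.optimize import least_squares

def fit_interframe_motion(prev_pts, cur_pts, weights, center):
    """Estimate (ux, uy, rho, theta) minimizing the weighted observation error
    between predicted and observed feature positions (cf. formulas (15)-(16))."""
    prev_pts = np.asarray(prev_pts, float)
    cur_pts = np.asarray(cur_pts, float)
    w = np.sqrt(np.asarray(weights, float))
    center = np.asarray(center, float)

    def residuals(par):
        ux, uy, rho, theta = par
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        # assumed model: p_t = rho * R(theta) @ (p_{t-1} - center) + center + (ux, uy)
        pred = rho * (prev_pts - center) @ R.T + center + np.array([ux, uy])
        return (w[:, None] * (pred - cur_pts)).ravel()

    sol = least_squares(residuals, x0=[0.0, 0.0, 1.0, 0.0])
    return sol.x  # ux, uy, rho, theta
```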
Step S210, analyzing the tracking condition of the target object in the current frame according to the distance between the screened feature point and the center position of the target object and the apparent feature of the target object;
Step S210 may be specifically implemented as follows: (1) detecting the wrongly classified feature points according to the distance between each feature point and the center position of the target object, removing them, and generating a first feature point set; (2) analyzing whether tracking drift of the target object has occurred in the current video frame according to the apparent features of each feature point in the first feature point set.
In the process of tracking a target object, tracking drift, occlusion (including partial and complete occlusion) and tracking loss are inevitable. To achieve robust tracking, the current tracking result must be analyzed to judge whether tracking is accurate or whether drift, occlusion, loss or other situations have occurred, and the tracking strategy must be adjusted in time accordingly.
Referring to fig. 5, which shows a flowchart for analyzing the tracking status of the target object in the current frame: the pairing relation set candidate_matchpair4 of the feature points of the current and previous frames is obtained, and the inter-frame motion parameters Par_t^O of the target and Par_t^B of its neighborhood background region are estimated with the least-squares method; then whether tracking drift has occurred is analyzed, and whether tracking is normal or occlusion and tracking loss have occurred is analyzed from the feature point matching situation.
Tracking loss usually starts from tracking drift, so accurately judging whether tracking drift has occurred is of great significance for improving the performance of the tracker. In the embodiment of the invention, the feature points PgD_t detected in the current frame are matched against the target feature point set Pg_{t-1}^O and the background-region feature point set Pg_{t-1}^B respectively, the pairing relation set candidate_matchpair4 is found through the multistage cascade of multi-condition constraints, and the inter-frame motion parameters Par_t^O of the target and Par_t^B of the neighborhood background are estimated from it. The rectangular frame of the target in the current frame is then calculated from the target's inter-frame motion parameters; the feature points detected in the current frame that lie inside this target rectangle are classified as target feature points, and those outside it as background feature points.
However, in practical applications, feature points appearing around the target and in the adjacent background area are easily misclassified. If feature points belonging to the background are misclassified as target points, they will be matched again when the feature points of subsequent frames are matched; such misclassified feature points may even be matched successfully between frames and participate in the calculation of the motion model parameters, which can cause tracking drift and even tracking loss in subsequent frames. In addition, owing to noise, similar local image features are also prone to cause tracking drift.
In practical applications, the tracked target is a rigid body, or at least the shape of the target does not change abruptly between frames. Therefore, the relative position of a background feature point within the background does not change abruptly between frames, nor does the relative position of a target feature point on the target; for a rigid target in particular, the relative position of a target feature point on the target changes very little.
On this basis, assuming that the relative position of a target feature point with respect to the target's geometric center does not change abruptly between frames, the misclassified feature points are detected. First, the distance from a target feature point to the geometric center, normalized by the width and height of the target rectangle, is taken as the relative position of that feature point. The relative position rp_t^i of feature point i with coordinates pos_t^i in frame t is then calculated and compared with its relative position rp_{t-1}^i in the previous frame. If the change exceeds 0.25, the feature point is considered misclassified and a cause of tracking drift; it is eliminated from the target feature point set, the pairing relation set candidate_matchpair4 is updated to candidate_matchpair5, and the target inter-frame motion parameters Par_t^O are re-estimated.
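The relative-position check described above can be sketched as follows (illustrative Python only; the data structures and names are assumptions):

```python
import numpy as np

def remove_misclassified(points_prev, points_cur, center_prev, center_cur,
                         size_prev, size_cur, thresh=0.25):
    """Drop target feature points whose center-relative position (normalized by the
    rectangle width/height) jumps by more than `thresh` between frames.

    points_* are dicts {feature_id: (x, y)}, size_* are (w, h)."""
    kept = {}
    for fid, p_cur in points_cur.items():
        if fid not in points_prev:
            continue
        rel_prev = (np.asarray(points_prev[fid], float) - center_prev) / np.asarray(size_prev, float)
        rel_cur = (np.asarray(p_cur, float) - center_cur) / np.asarray(size_cur, float)
        if np.linalg.norm(rel_cur - rel_prev) <= thresh:
            kept[fid] = p_cur        # consistent: keep in candidate_matchpair5
    return kept
```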
Owing to noise and to similar local apparent features in the image space, feature point matching errors may still occur and lead to tracking drift. For this kind of tracking drift, it is assumed that the apparent information of the target does not change abruptly between frames. If tracking drift occurs, part of the detected target range is actually the neighborhood background, so the apparent information extracted in that range is mixed with background information; compared with the prior knowledge of the target's apparent features, the apparent features extracted in that range differ considerably from those extracted under accurate tracking, i.e. a sudden change of the apparent information occurs.
From the previously estimated target inter-frame motion parameters Par_t^O and the positions of the four vertices of the target rectangle in the previous frame, the rectangular region representing the target in the current frame is calculated; the apparent feature vector is extracted in this rectangular region and compared with the historical experience of the apparent feature vector to judge whether a sudden change, and hence drift, has occurred. Whether the tracking of the current frame has drifted is thus converted into the problem of solving a likelihood probability. Since the currently estimated target motion parameters Par_t^O are themselves obtained by comparing apparent feature vectors between frames, judging whether tracking is accurate by analyzing those same apparent feature vectors would not be reliable.
However, a tracking algorithm must on the one hand be robust and on the other hand remain computationally efficient. Compressed sensing theory holds that a signal can be projected onto a suitable transform domain to obtain sparse transform coefficients, and that an efficient observation matrix can then be designed to obtain, from a small number of observations, the useful information hidden in the sparse signal; those few observations can be associated with the original signal. For the video tracking problem, what matters is the effectiveness of the feature vector for the tracking decision, so the target features are converted into a limited number of observations, i.e. a compressed vector, and the dimension-reduced compressed vector is used directly to describe the target and obtain its apparent features. Compressed sensing theory guarantees that the information of the original signal is preserved almost losslessly by the small number of compressed measurements, and the computational complexity of the algorithm is greatly reduced. According to the sparsity assumption, a high-dimensional Haar-like feature vector x ∈ R^N is extracted from the candidate target region; the signal x yields K sparse transform coefficients under an orthogonal transform, so a Gaussian random measurement matrix R ∈ R^{m×N} satisfying the restricted isometry property can be adopted directly to measure and compress it, giving the compressed measurement vector y ∈ R^m. N may be set to 10^6, K to 10 and the compressed measurement vector dimension m to 50. The i-th element of the compressed measurement vector y is the inner product of the i-th row vector of the measurement matrix and the Haar-like feature vector, namely:

y_i = R_i · x,   i.e.  y = R x
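A minimal sketch of the random-projection step is shown below, assuming a dense Gaussian measurement matrix that is generated once and reused for every frame (in practice a sparse measurement matrix is often preferred for speed; nothing here is prescribed by the patent):

```python
import numpy as np

def compress_haar_features(x, m=50, seed=0):
    """Project a high-dimensional Haar-like feature vector x (length N) onto an
    m-dimensional compressed measurement vector y = R @ x with a Gaussian random
    measurement matrix. The fixed seed keeps R identical across frames."""
    x = np.asarray(x, float)
    rng = np.random.default_rng(seed)
    R = rng.normal(0.0, 1.0, size=(m, x.size)) / np.sqrt(m)
    return R @ x
```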
After the target position and range have been determined by SURF feature point matching in the current frame, image blocks of the same size as the target rectangle, centered in the neighborhood of radius smaller than α around that position, are sampled as positive samples; α may be set to 3. In the neighborhood of radius smaller than β and larger than ξ around the current target position (ξ < β; β may be set to the length of the rectangular frame and ξ to 6), 60 image blocks of the same size as the target rectangle are randomly sampled, centered in that neighborhood, as negative samples. The compressed measurement vector y is extracted from the image blocks of the positive and negative samples, and, under the condition that tracking is accurate, the parameters (μ^1, σ^1) and (μ^0, σ^0) of the compressed measurement vectors of the positive and negative samples are calculated and updated with an EM algorithm, where μ^1, σ^1 and μ^0, σ^0 are the mean and standard deviation of the real target samples and of the candidate background samples respectively.
Whether or not a candidate region is the target can be regarded as a two-class problem with result v ∈ {0, 1}, where p(v=1) and p(v=0) denote the probabilities that the candidate region is the target and a non-target respectively, and both priors are taken as 0.5. The conditional distribution p(y_i | v=1) is assumed to obey the Gaussian distribution N(μ_i^1, σ_i^1), and the conditional distribution p(y_i | v=0) the Gaussian distribution N(μ_i^0, σ_i^0). After the m-dimensional compressed measurements of the positive and negative samples have been obtained, the score value of a sample can be calculated:

H(y) = Σ_{i=1}^m log( ( p(y_i | v=1) p(v=1) ) / ( p(y_i | v=0) p(v=0) ) )

which, with equal priors, reduces to the sum of log likelihood ratios.
Because the target's apparent features do not change abruptly between frames, the corresponding score value does not change abruptly between frames either, so the change of the score value also obeys a Gaussian distribution N(μ_{H_T}, σ_{H_T}); the mean and variance of the target score are updated with an EM algorithm after the tracking of each frame is finished. Taking the current tracking result obtained by SURF feature point matching as the sample to be evaluated, the evaluation value H_T(y) of the currently tracked image rectangle is calculated and the target tracking state is judged:

Drift = 1  if | H_T(y) − μ_{H_T} | > thred · σ_{H_T},   otherwise Drift = 0

Drift ∈ {0, 1}, where 1 and 0 denote the presence and absence of tracking drift respectively, and thred is a predefined threshold constant that may be set to 2.4–3.
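For illustration, a small Python sketch of the score computation and the drift decision under the Gaussian assumptions above (the function names, the equal-prior cancellation and the numeric guards are assumptions, not the patent's exact formulas):

```python
import numpy as np

def gaussian_pdf(y, mu, sigma):
    sigma = np.maximum(sigma, 1e-6)
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def sample_score(y, mu1, sig1, mu0, sig0):
    """Naive-Bayes style score of a compressed measurement vector y:
    sum of log likelihood ratios between the target and background Gaussians
    (the equal priors p(v=1) = p(v=0) = 0.5 cancel out)."""
    return float(np.sum(np.log(gaussian_pdf(y, mu1, sig1) + 1e-12)
                        - np.log(gaussian_pdf(y, mu0, sig0) + 1e-12)))

def drift_detected(score, score_mean, score_std, thred=2.5):
    """Flag tracking drift when the current score deviates from its Gaussian
    history by more than thred standard deviations (thred ~ 2.4-3)."""
    return abs(score - score_mean) > thred * max(score_std, 1e-6)
```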
Before the current frame arrives, the known feature point set Pg_{t-1} comprises the target feature point set Pg_{t-1}^O and the target neighborhood background feature point set Pg_{t-1}^B, which are matched respectively against the feature point set PgD_t detected in the current frame. Part of the feature points can be matched: the matched target feature points and the matched background feature points form two sets, while the remaining feature points that cannot be matched form the unmatched target feature point set and the unmatched background feature point set respectively.
Referring to fig. 6, which shows a schematic diagram of analyzing the tracking status from the feature point matching situation, the current tracking situation can be analyzed preliminarily from the spatial distribution of the matched target and background feature point sets. In fig. 6 (a), both the matched target feature point set and the matched background feature point set are non-empty and lie in their respective regions: tracking is normal. In (b), both sets are non-empty, but some matched background feature points lie inside the target region of the current frame: the target may be partially occluded. In (c), the matched target feature point set is empty while the matched background feature point set is not, i.e. no feature point belonging to the target has been matched successfully, which often corresponds to tracking loss or complete occlusion of the target. In (d), both sets are empty, i.e. no feature point of the previous frame has been matched, which corresponds to tracking loss.
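The four cases of fig. 6 can be summarized in a simple decision helper such as the following Python sketch (illustrative only; the data structures are assumed):

```python
def analyse_matching(matched_target, matched_background, target_rect):
    """Preliminary tracking-state analysis from the spatial distribution of the
    matched feature point sets (cf. fig. 6 (a)-(d)).

    matched_target / matched_background: lists of (x, y); target_rect: (x0, y0, x1, y1)."""
    def inside(p):
        x, y = p
        x0, y0, x1, y1 = target_rect
        return x0 <= x <= x1 and y0 <= y <= y1

    if not matched_target:
        return "tracking lost" if not matched_background else "lost or fully occluded"
    if any(inside(p) for p in matched_background):
        return "possible partial occlusion"
    return "normal tracking"
```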
The above process may also be referred to as the target tracking and positioning process. As shown in fig. 7, inter-frame matching of SURF feature points is used to calculate the inter-frame displacement parameters of the target and of its neighboring background. After the t-th frame arrives, the region where the target may appear in the new frame is determined from the historical knowledge of the target's motion, SURF feature points are detected in that region, and the detected SURF feature points are matched respectively against the target feature point set and the background feature point set of the previous frame. To find as many correctly matched feature point pairs between frames as possible while avoiding wrong matches as far as possible, a cascade of multiple constraint conditions can be adopted, gradually eliminating wrong matches from the candidate matched feature point set until the correct matches are obtained (see the sketch after this paragraph). Specifically, the correct matches between the feature point set of the current frame and that of the previous frame can be found under constraint conditions such as: the inter-frame displacement of a feature point does not change abruptly, the apparent features of a feature point do not change abruptly, and the inter-frame displacement of feature points belonging to the target stays consistent with the overall target motion. The inter-frame motion parameters of the target are then estimated from the matched feature points, thereby achieving target tracking.
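The cascaded elimination of wrong matches mentioned above can be sketched as a staged filter, for example (illustrative Python; the thresholds, field names and the prediction callback are assumptions):

```python
import math

def cascade_match(candidates, max_disp, max_appearance_change, motion_tol, predict):
    """Multi-stage, multi-constraint elimination of wrong matches: each stage removes
    candidate pairs violating one constraint, so the candidate set shrinks toward the
    final matches.

    candidates: list of dicts with keys 'prev_pos', 'cur_pos', 'appearance_dist'.
    predict(prev_pos) -> expected current position under the global motion estimate."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    # stage 1: inter-frame displacement must not change abruptly
    stage1 = [c for c in candidates if dist(c['prev_pos'], c['cur_pos']) <= max_disp]
    # stage 2: the apparent feature of the point must not change abruptly
    stage2 = [c for c in stage1 if c['appearance_dist'] <= max_appearance_change]
    # stage 3: displacement must stay consistent with the overall target motion
    stage3 = [c for c in stage2 if dist(predict(c['prev_pos']), c['cur_pos']) <= motion_tol]
    return stage3
```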
Step S212, updating the feature point set of the target object and the neighborhood background, the apparent features of the target object and the neighborhood background, and the interframe motion parameters of the target object and the neighborhood background according to the matching result, the motion estimation result and the tracking condition analysis result, so as to update the tracking strategy of the target object.
Referring to fig. 8, which shows a schematic diagram of updating the feature point sets, the apparent features and the inter-frame motion parameters of the target object and the neighborhood background: the embodiment of the invention divides the detected feature points into the target feature point set and the background feature point set, and performs target tracking through matching of the SURF feature points detected in the current frame. In the tracking process, owing to factors such as noise, illumination change and background change, the tracking model and the tracking strategy need to be adjusted in time according to the changes in the video in order to achieve stable tracking. In practical applications, not all feature points in a feature point set can be matched: some either disappear or go unmatched for a long time, new feature points keep appearing, and the number of feature points and their matching situation change, so the feature point set needs to be updated. The appearance information corresponding to the feature points changes over time, and the corresponding appearance model should reflect this change in time; the inter-frame motion laws of the target and the neighborhood background also change, so the corresponding motion model should likewise be updated in time.
Specifically, the step of updating the feature point sets of the target object and the neighborhood background includes: (1) classifying the feature points in the feature point set according to the matching result to obtain several subsets of feature points, comprising the successfully matched subset and the unmatched subset, each of which in turn contains feature points on the target object and feature points on the neighborhood background; (2) deleting, from the feature point set corresponding to the previous frame, the unmatched feature points that have not been matched within the recent frames and whose matching-failure measure exceeds a set threshold, where the recent frames are a set number of consecutive video frames before the previous frame; (3) adding the feature points in the feature point set of the current frame to the feature point set corresponding to the previous frame according to the tracking state of the current frame; (4) updating the position coordinates of the feature points in the feature point set corresponding to the previous frame to the position coordinates of the corresponding feature points in the current frame.
Referring to fig. 9, which shows a flowchart of updating the feature point sets of the target object and the neighborhood background: before frame t arrives, the tracker has established the feature point set Pg_{t-1}, comprising the target feature point set Pg_{t-1}^O and the background feature point set Pg_{t-1}^B, which are matched respectively against the feature point set PgD_t detected in frame t. After matching, the feature points in PgD_t should be classified into target feature points and background feature points and, together with Pg_{t-1}, form the new feature point set Pg_t; some feature points in Pg_{t-1} should be eliminated, and the retained ones are merged into Pg_t after their coordinate positions have been updated.

The feature point set Pg_{t-1} established at the end of frame t-1 comprises the feature point set located on the target, Pg_{t-1}^O, and the feature point set located on the target's neighborhood background, Pg_{t-1}^B. The two sets are matched separately, and the classification attribute of a feature point is not changed regardless of whether it is matched. After the feature point set PgD_t detected in frame t has been matched against Pg_{t-1}^O and Pg_{t-1}^B, the successfully matched feature points are classified as target feature points and background feature points respectively; the feature points that are not matched successfully are denoted Pg_new_t, namely Pg_new_t = PgD_t − (matched target feature points ∪ matched background feature points).
Therefore, the feature points whose type still needs to be determined are the feature points Pg_new_t that are detected in the current frame but not matched successfully. According to the position pos_t^i of each feature point i in the set Pg_new_t, the position and range of the currently tracked target, and the tracking state, the unmatched feature points Pg_new_t are classified into the two classes target and background, giving the newly added target and background feature point sets. These two sets are merged respectively with the feature point sets Pg_{t-1}^O and Pg_{t-1}^B of the previous frame to obtain the feature point set Pg_t of frame t.
The newly appearing feature points in Pg_new_t, detected in the current frame but not matched, are usually added to the corresponding feature point set. However, it is not feasible to let the feature point set grow without bound as the video frames accumulate, so the inter-frame matching situation of the feature points must generally be analyzed and the number of feature points kept relatively stable.
The number of times each feature point has been matched in the recent period reflects the robustness of the local image region corresponding to that feature point in the recent video: the more matches recently, the more stable the image information of that local region; conversely, if it has gone unmatched for a long time, the image information of that local region is considered easily affected by noise and other factors, and therefore relatively fragile. As mentioned above, robust feature points should be given larger weights w_i in the least-squares estimation of the motion model parameters in formula (16), since their reliability is higher; conversely, fragile feature points are given smaller weights. A parameter M_t^i is set to describe the reliability of feature point i at time t. After the inter-frame matching of the feature points is finished, the parameter M_t^i of each feature point is updated. For a matched feature point i, the update is:

M_t^i = M_{t-1}^i + Inc

For an unmatched feature point i, the coefficient M_t^i is updated as follows (formula (23)):

M_t^i = M_{t-1}^i − Dec

where Inc and Dec are constants; M_t^i is an important basis for deleting feature points, and Inc may be set to 1 and Dec to 0.5.
For an unmatched feature point i, if the corresponding M_t^i becomes too small, the feature point has not appeared in the video for a long time; the local image information it represents may no longer appear in the video image owing to factors such as occlusion and out-of-plane rotation, so there is almost no "evidence" that this local image information will appear again. When M_t^i falls below 0, the feature point is deleted from the feature point set.
When the feature point set PgD_t detected in the current frame is matched against the feature point set Pg_{t-1}, some feature points Pg_new_t are not matched successfully; these are newly added feature points. According to whether their positions lie in the target or background region and whether the current tracking state is normal tracking, suspected partial occlusion, or tracking loss (complete occlusion), they are added to the target feature point set or the background feature point set respectively.
(a) Classification of newly added feature points under normal tracking. Let the target position and range obtained by the current tracking be given by the target rectangle. If a new feature point lies within the target range, it is added to the target feature point set; otherwise it is classified into the background feature point set.
(b) Classification of newly added feature points under partial occlusion. As shown in fig. 6, under partial occlusion some of the matched background feature points appear inside the target range of the current frame. The feature points inside the current target range that can be matched with the previous frame therefore include both target feature points and background feature points, so a new feature point cannot simply be added to the target feature point set just because it lies within the target range. In this case a nearest-neighbor rule may be used to classify the feature points newly appearing inside the target range: an unmatched feature point i newly appearing in the target range is assigned to the class whose feature points are spatially closest to it, i.e. it is classified as a target point if G_dis(i, Pg^O) ≤ G_dis(i, Pg^B) and as a background point otherwise (a small sketch of this rule is shown below). Here the function G_dis(i, Pg) denotes the closest spatial distance in the image from feature point i to the feature points in the feature point set Pg. Newly added feature points appearing in the background region are all classified into the background feature point set.
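A minimal Python sketch of the nearest-neighbor rule for newly appearing points inside the target range (illustrative only; G_dis is implemented here as a plain Euclidean nearest distance):

```python
import math

def g_dis(point, point_set):
    """Closest spatial distance from `point` to any feature point in `point_set`."""
    return min(math.hypot(point[0] - q[0], point[1] - q[1]) for q in point_set)

def classify_new_point(point, target_points, background_points):
    """Assign a new, unmatched feature point inside the target rectangle to the
    class (target or background) whose matched points are spatially closest."""
    if g_dis(point, target_points) <= g_dis(point, background_points):
        return "target"
    return "background"
```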
(c) Classification of newly added feature points under tracking loss (complete occlusion). In this case the set of feature points detected in the current frame that can be matched with the target feature points of the previous frame is empty, and all feature points that can be matched with the previous frame belong to the background; all newly appearing feature points are likewise classified into the background feature point set. For each new feature point appearing in the current frame, the corresponding reliability parameter M_t^i is given an initial value Initial_M, which may be set to 1.
The coordinate positions of the feature points in the previous frame's set Pg_{t-1} are updated to their coordinates in the current frame. As described above, Pg_{t-1} can be divided into the target and background feature points that can be matched and the feature points that cannot be matched with the current frame. For the matched feature point sets, the positions of the feature points are those of the matched feature points in the current frame. The unmatched feature points have their reliability M_t^i decreased according to formula (23); some of them are eliminated once M_t^i falls below the specified threshold, but other unmatched feature points are retained, and their coordinate positions in the new frame are updated according to the motion equation estimated by formula (15). The set Pg_{t-1}, after part of its feature points have been eliminated and the coordinate positions of the unmatched ones have been updated, is combined with the matched and newly classified feature points to obtain the new feature point set Pg_t.
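The bookkeeping of the reliability parameter M described above (increment on a match, decrement on a miss, deletion below 0, initialization of new points) can be sketched as follows; the Python structure is illustrative and not taken from the patent:

```python
def update_reliability(feature_points, matched_ids, new_ids=(),
                       inc=1.0, dec=0.5, initial_m=1.0):
    """feature_points maps feature_id -> M. Increment M for matched points,
    decrement for unmatched ones, drop points whose M falls below 0, and give
    newly appeared points the initial value Initial_M."""
    updated = {}
    for fid, m in feature_points.items():
        m = m + inc if fid in matched_ids else m - dec
        if m >= 0:                      # points with M < 0 are eliminated
            updated[fid] = m
    for fid in new_ids:                  # newly detected, unmatched points
        updated[fid] = initial_m
    return updated
```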
The step of updating the apparent features of the target object and the neighborhood background includes: updating the mean and variance of the Gaussian components of the feature points according to the feature descriptor vectors, the scale factor feature information, and the color, texture and edge vectors of the feature points in the successfully matched feature point subset.
As described above, the embodiment of the present invention uses a Gaussian distribution model, described by the mean μ and the variance σ, to describe the temporal course of the feature point appearance. Assigning initial values to the mean and variance corresponding to each feature vector initializes the model, and updating the mean and variance of the appearance model according to the feature point matching situation updates the model. In practical applications, experimental analysis within a small target image range shows that the variation of the additive noise at different positions can be considered consistent over a period of time, i.e. the noise variance at different image positions can be regarded as the same or approximately the same. Therefore, the changes of the feature vectors of the feature points detected on the target and in its neighborhood in the video frames are approximately regarded as obeying Gaussian distributions with the same variance. The initialization and update strategies of the Gaussian model are based on this assumption: the feature vectors of the feature points within the target range share the same variance value, and likewise the feature vectors of the feature points within the neighborhood background range share the same variance value.
When the first frame arrives or a new feature point is detected, as shown in equation (4), the mean of the newly detected feature point's model is initialized to the feature vector of that detected feature point. At the first frame, the variance of each feature vector of the appearance model may be initialized to a larger initial value, such as 0.9. During tracking, a newly detected feature point is assumed to share the variance of the feature-vector processes of the existing feature points: the mean of its appearance model is initialized to the detected feature vector value of that feature point, and its variance is initialized to the variance value of the corresponding feature vector of the current target or background feature points.
After the initialization of the appearance model is completed, the SURF feature points are matched between frames in each new image frame, the tracking state is analyzed, and the Gaussian models of the feature vectors are updated on that basis. The model may be trained with an online EM approximation method based on autoregressive filtering. For feature vector j at time t, the mean μ_{j,t} and variance σ_{j,t} of the Gaussian components of the unmatched feature points are kept constant, while the mean and variance of the matched Gaussian components are updated from the new observations f_{j,t}^i (formulas (26) and (27)):

μ_{j,t} = (1 − η_μ) μ_{j,t-1} + η_μ f_{j,t}^i

σ_{j,t}² = (1 − η_σ) σ_{j,t-1}² + η_σ · (1/N) Σ_{i=1}^N ( f_{j,t}^i − μ_{j,t} )²
Here the parameter i denotes the index of a matched feature point and N the total number of matched feature points, which indicates that the variance calculated here is the average variance of the corresponding feature vectors over all matched feature points. The parameters η_μ and η_σ are the learning factors for the mean and variance updates, typically distributed between 0 and 1; they determine the rate at which the mean and variance of the Gaussian change over time, so the update of the Gaussian mean and variance can be regarded as causal low-pass filtering of the past parameters. Generally, when the model is first being built it should be established and converge as soon as possible, so a large learning factor is chosen; afterwards the model should be stable, so that earlier image data retain a certain influence on it and the established model reflects the history of the feature vector's changes within a certain time, and a smaller learning factor should therefore be chosen to improve the model's robustness to noise.
Thus the learning parameter η_μ of the model mean is set as follows:

η_μ = max( 1 / Ck_μ , thrd_μ )

Similarly, the update parameter η_σ of the model variance is:

η_σ = max( 1 / Ck_σ , thrd_σ )

where Ck_μ counts the number of times each feature point has been matched, and Ck_σ counts the number of image frames in which matched feature points exist. In the model initialization phase Ck_μ and Ck_σ are small and the convergence rate is high: after the first match the parameter η_μ is such that the model mean is set to the current observation, and after the second match the parameter η_σ is such that the model variance is set from the difference of the feature vectors at the first and second matches. As time passes, Ck_μ and Ck_σ increase and the contribution of the current observation to the model update gradually decreases; but if the learning factor approached zero the model would become abnormally stable and could not reflect normal changes of the image information in time, so minimum values thrd_μ and thrd_σ of the weight update coefficients are set, and thrd_μ and thrd_σ may be set to 0.2.
In addition, if the variance σ² of a Gaussian component becomes too small, feature points that should be matched may easily fail to match correctly during inter-frame matching because the model is too sensitive to noise. A lower limit, e.g. T_σ = 0.05, is therefore imposed on the variances of all Gaussian components to enhance the robustness of the system.
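For illustration, a compact Python sketch of one appearance-model update step under the assumptions above (the max(1/count, threshold) learning-factor schedule is an interpretation of the description, not a verbatim formula from the patent):

```python
def update_gaussian(mean, var, observation, match_count, thrd=0.2, var_floor=0.05):
    """Online EM-style autoregressive update of one appearance-model Gaussian:
    the learning factor decays with the match count but is lower-bounded by thrd,
    and the variance is floored at var_floor (cf. formulas (26)-(27))."""
    eta = max(1.0 / max(match_count, 1), thrd)
    new_mean = (1.0 - eta) * mean + eta * observation
    new_var = (1.0 - eta) * var + eta * (observation - new_mean) ** 2
    return new_mean, max(new_var, var_floor)
```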
The step of updating the inter-frame motion parameters of the target object and the neighborhood background includes: updating the mean and variance of the motion parameters according to the estimated values of the motion transformation parameters between the current frame and the previous frame.
To describe the motion of the target object over the recent period, the current inter-frame motion transformation parameters Par_t = (ux_t, uy_t, ρ_t, θ_t), estimated in the minimum-mean-square-error sense from the feature point matching alone, are not sufficient; a corresponding motion model needs to be established for the motion of the target and of its neighborhood. The inter-frame motion process can also be described with a Gaussian distribution. Since the inter-frame deformation of the target is assumed to be small, the motion of the feature points is highly consistent with the motion of the target, and the inter-frame motion of every feature point can be approximately regarded as obeying the same motion parameters. To reduce the computational complexity, the motion models of the individual feature points in the target and background feature point sets are simplified into Gaussian models of the motion of the target and of the neighborhood background respectively, i.e. motion transformation models are established separately for the target and for the neighborhood background region.
The model is updated with the online EM approximation method. At time t, based on the estimate Par_t = {m_t}, m ∈ {ux, uy, ρ, θ}, of the current inter-frame motion transformation parameters, the mean and variance of each motion parameter m are updated:

μ_{m,t} = (1 − η_1) μ_{m,t-1} + η_1 m_t

σ_{m,t}² = (1 − η_1) σ_{m,t-1}² + η_1 ( m_t − μ_{m,t} )²
The learning factor η_1 for the model update is set similarly:

η_1 = max( 1 / Ck_m , thrd_m )
Likewise, Ck_m is a count of the image frames in which matched feature points exist. The mean parameters of the model are initialized to (0, 0, 1, 0), i.e. the target and the neighborhood background are initially considered stationary, with no spatial change. After the first frame arrives, the mean of the model is initialized to the motion parameters Par_t = (ux_t, uy_t, ρ_t, θ_t) detected in the current frame; after the second frame arrives, the variance σ_m² of the model is initialized to the difference of the transformation parameters detected in the first and second frames. In the initial phase Ck_m is small so that the model converges as quickly as possible; thereafter η_1 is kept constant, so thrd_m may be set to 0.1, allowing the model to be updated at a steady rate. Similarly, if the inter-frame motion is very uniform over a period of time, the variance update formula drives the variance of the Gaussian components towards zero; in that case, once the inter-frame motion changes even slightly, feature points that should be matched can no longer be matched correctly during inter-frame matching. A lower limit T_σ1 = (1, 1, 0.01, 0.01) is therefore also imposed on the variance σ_m² to enhance the robustness of the system.
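A similar sketch for the motion-model update, with the per-parameter variance floor T_σ1 (again an illustrative interpretation, not the patent's exact formulas):

```python
import numpy as np

def update_motion_model(mean, var, par_t, frame_count, thrd_m=0.1,
                        var_floor=(1.0, 1.0, 0.01, 0.01)):
    """Gaussian motion-model update for the parameters (ux, uy, rho, theta):
    the same online EM-style rule as the appearance model, with a per-parameter
    lower bound on the variance."""
    mean, var, par_t = (np.asarray(a, float) for a in (mean, var, par_t))
    eta = max(1.0 / max(frame_count, 1), thrd_m)
    new_mean = (1.0 - eta) * mean + eta * par_t
    new_var = (1.0 - eta) * var + eta * (par_t - new_mean) ** 2
    return new_mean, np.maximum(new_var, np.asarray(var_floor, float))
```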
In the tracking process, besides normal tracking, different tracking states such as drift, loss and occlusion are inevitable; for different tracking states, correspondingly different tracking strategies are adopted to ensure the robustness and stability of the algorithm. The target model, comprising the appearance model and the motion model, and the estimation of the range in which the target may appear in the next frame are important tracking strategies and key factors affecting the robustness of the tracking algorithm. Under normal tracking, the target appearance model and the motion model do not change abruptly, so the models are updated, and the next-frame target range is estimated, according to the model update methods described above in the embodiment of the invention. Under tracking drift, loss or occlusion, however, the position and range of the target cannot be determined accurately, or the target cannot be observed accurately at all; in such abnormal tracking states, the tracking strategies such as the appearance model and the motion model of the target should be adjusted in time.
Therefore, the step of updating the tracking policy of the target object may be specifically implemented by:
(1) Target occlusion handling. When the target is partially or completely occluded, observation of the target's apparent feature information is affected. Within the feature point set, the parameters of the feature models corresponding to the feature points that can be matched are updated from the observations matched in the current frame according to formulas (26) and (27), while the model parameters (mean and variance) corresponding to the feature points that cannot be matched are kept unchanged, and their corresponding importance parameters M_t^i also remain unchanged. Under complete occlusion, the apparent feature model parameters of the target are unchanged; in particular, the importance parameters M_t^i corresponding to the target's feature points remain unchanged. Under partial occlusion, it is still possible to locate the position and range of the target by matching local feature points. Under complete occlusion or tracking loss, no feature point on the target can be matched, the target cannot be observed, and consequently the motion model transformation parameters Par_t = (ux_t, uy_t, ρ_t, θ_t) cannot be observed either, so the target cannot be accurately positioned. In this case the tracker estimates the position and range of the target in the current frame from the prior knowledge of the target's motion in the video frames: the model is considered to keep moving at a constant velocity, and the mean parameters of the motion model remain unchanged. Under partial occlusion, the position and range of the target can still be located through the matched local feature points, and the range of target detection in the next frame is determined according to formulas (5) and (6). Under complete occlusion or tracking loss, a larger thrdU value is used in formulas (5) and (6), so that SURF feature points are detected in a larger range to track the target.
(2) Target drift processing. When tracking drift occurs, the determined position and range of the target are not very accurate, so if the appearance model and the motion model were fully updated according to the current tracking result, a large error might be introduced into the models; this would affect subsequent tracking results, errors would accumulate gradually and the drift would grow, and indeed most tracking failures develop gradually out of tracking drift. Therefore, when it is judged that tracking drift has occurred, updating of the parameters of the appearance and motion models is usually stopped, and the state of the target in the current frame is calculated from the historical experience represented by the motion model. The feature point detection range for the next frame can still be determined according to formulas (5) and (6), but the parameter thrdU should take a larger value: when the target is judged to be tracked correctly, feature point detection can be carried out in a relatively small range in the next frame, otherwise it is carried out in a large range.
According to the video target tracking method provided by the embodiment of the invention, after the tracking parameters are initialized, the feature point set in the current frame is detected within the set image range and screened according to the preset screening conditions; the screened feature point set is matched respectively against the feature point sets of the target object and of the neighborhood background corresponding to the previous frame; then motion estimation is performed on the target object according to the screened feature points, and the tracking situation of the target object in the current frame is analyzed according to the distance between the screened feature points and the center position of the target object and the apparent features of the target object; finally, the feature point sets of the target object and the neighborhood background, their apparent features, and their inter-frame motion parameters are updated according to the matching result, the motion estimation result and the tracking situation analysis result, so that the tracking strategy of the target object is updated. In this way, the tracking result not only reflects the position of the target object in time but also accurately reflects its range and rotation angle, so that the tracking of the target object in the video frames has better robustness and stability; at the same time the computational complexity is low, and both tracking robustness and computation speed are taken into account.
Target tracking is a key core technology for intelligent video devices in applications such as video behavior analysis and human-computer interaction. Local features, as one kind of image feature, are naturally robust to partial occlusion of the target, and stable local features can serve as the basis for robust target tracking. SURF feature points are a fast, optimized development of SIFT feature points: the computation speed is greatly improved while the advantages of SIFT are retained, namely accurate localization, insensitivity to illumination change and rotation invariance. Stable local extreme points in the image are obtained through SURF feature point detection and serve as the basis for accurately locating the target, thereby achieving efficient video target tracking.
Based on this, another video target tracking method is provided in the embodiments of the present invention. As shown in fig. 10, this method may also be referred to as a video target tracking method based on local feature point matching, and it includes the following steps: 1. an initialization stage, establishing a model of the target and its neighborhood background; 2. positioning the target in a new frame, obtaining the state of the target in the current frame (target position, range and rotation angle) through inter-frame feature point matching to obtain the tracking result; 3. updating the model according to the tracking result. The method thus comprises an initialization stage and a target tracking and model updating stage.
In the initialization stage, firstly, the state of the target, namely the position, the range and the angle of the target in the current frame are initialized, the position and the range of the target are represented by a rectangular frame, and the range of a neighborhood background area is further initialized; then, detecting SURF characteristic points of the target and the neighborhood thereof on the basis, respectively initializing and establishing a model of the target and the neighborhood background thereof according to the detected characteristic points, and establishing an initial model of the target and the neighborhood background region thereof; we consider that the inter-frame motion of the object can be described by translation, rotation around the geometric center of the object, and scaling, and initialize the inter-frame motion parameters of the object and its neighborhood background.
In the target tracking and positioning stage, after a new frame comes, SURF feature points are detected in a certain area of a new frame of image according to historical knowledge of target motion, SURF feature points are matched according to an established target model and a neighborhood background model thereof, feature point pairs capable of being correctly matched are searched, interframe motion parameters of the target and the neighborhood background thereof are calculated according to the feature point pairs, so that the position, the range and the rotation angle of the target in the new frame are determined, the current obtained target state is analyzed on the basis, and whether tracking loss, drifting and other conditions occur or not is judged to obtain a final tracking result. In the updating stage of the model, different strategies are adopted to update the model of the target and the neighborhood background thereof according to the tracking result and the analysis of the tracking state (whether the tracking is accurate, drifting, losing or being shielded).
The video target tracking method improves the robustness and stability of tracking and has stronger resistance to occlusion, noise and cluttered backgrounds; the tracking result not only reflects the position of the target in time but also reflects the imaging range and rotation change of the target; and since tracking is performed by feature point matching, the search for the best likelihood of a target model is avoided and the computational complexity is reduced.
Corresponding to the above method embodiment, refer to a schematic structural diagram of a video target tracking apparatus shown in fig. 11; the device includes: an initialization module 110, configured to initialize tracking parameters; the tracking parameters at least comprise the position and the range of the target object, the interframe motion parameters of the target object and the neighborhood background, and the feature point set of the target object and the neighborhood background; a plurality of apparent features of the target object and the neighborhood background; the screening module 111 is configured to detect a feature point set in a current frame within a set image range, and screen the feature point set according to a preset screening condition; the feature point set comprises feature points and feature vectors corresponding to the feature points; a feature point matching module 112, configured to match, according to the filtered feature point set, the feature point set of the target object and the feature point set of the neighborhood background corresponding to the previous frame, respectively; a motion estimation module 113, configured to perform motion estimation on the target object according to the filtered feature points; a tracking condition analysis module 114, configured to analyze a tracking condition of the target object in the current frame according to a distance between the screened feature point and the center position of the target object and an apparent feature of the target object; and the updating module 115 is configured to update the feature point set of the target object and the neighborhood background, the apparent features of the target object and the neighborhood background, and the inter-frame motion parameters of the target object and the neighborhood background according to the matching result, the motion estimation result, and the tracking condition analysis result, so as to update the tracking policy of the target object.
The initialization module is further configured to: extracting the apparent characteristics of a target object and a neighborhood background in a current frame; the apparent features at least comprise a plurality of feature descriptor vectors, scale factor feature information, color features, texture features and edge features; determining the center position of the target object and the length and width of the target rectangular frame; initializing the inter-frame motion parameters of the target object and the neighborhood background into the difference of corresponding transformation parameters between the current frame and the previous frame; initializing a feature point set of a target object into a feature point set detected in a rectangular frame of the target object; initializing a feature point set of a neighborhood background into a feature point set of the neighborhood background detected in a neighborhood region in a preset range outside a target object; and initializing the apparent features of the target object and the neighborhood background into feature vectors of the extracted apparent features.
The embodiment also provides a video target tracking implementation device corresponding to the method embodiment. FIG. 12 is a schematic structural diagram of the video object tracking device; the apparatus comprises a memory 100 and a processor 101; the memory 100 is used to store one or more computer instructions that are executed by the processor to implement the above-described video target tracking method, which may include one or more of the above methods.
Further, the apparatus for implementing video object tracking shown in fig. 12 further includes a bus 102 and a communication interface 103, and the processor 101, the communication interface 103 and the memory 100 are connected via the bus 102. The Memory 100 may include a high-speed Random Access Memory (RAM) and may further include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 103 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like can be used. The bus 102 may be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 12, but that does not indicate only one bus or one type of bus.
The processor 101 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 101. The Processor 101 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present disclosure may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present disclosure may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in the memory 100, and the processor 101 reads the information in the memory 100, and completes the steps of the method of the foregoing embodiment in combination with the hardware thereof.
The embodiment of the present invention further provides a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions, and when the machine-executable instructions are called and executed by a processor, the machine-executable instructions cause the processor to implement the video target tracking method.
The embodiment of the invention provides a video target tracking method, a video target tracking device and an implementation device, and provides a target tracking and positioning system based on SURF inter-frame matching, including a multi-feature information extraction and adaptive fusion technique and a feature information update technique; the feature information update technique comprises feature point set updating, appearance model updating, motion model updating and tracking strategy adjustment. The scheme has the following advantages: (1) Under the SURF feature point detection and inter-frame matching framework, the method considers in depth and organically combines several key links, such as multi-feature fusion, target and neighborhood background modeling, target tracking and positioning, model updating and tracking state detection, so that it forms a complete tracking system and achieves robust continuous tracking of a specified target in the video. (2) The tracker designed by the invention can accurately estimate the motion parameters of the target in the current frame from the inter-frame matching of the SURF feature points, accurately estimating the displacement, range and rotation angle of the target, avoiding the complicated search process of traditional tracking algorithms and reducing the computational complexity. (3) The robustness of the tracker is improved through the combination of links such as multi-feature fusion, feature point classification, the design of the hierarchical cascaded feature point matching method, tracking state analysis and model updating, so that the tracker achieves robust and stable tracking in complex scenes such as occlusion, cluttered backgrounds and low signal-to-noise ratios.
The computer program product of the video target tracking method, apparatus, and implementation device provided in the embodiments of the present invention includes a computer-readable storage medium storing program code; the instructions included in the program code may be used to execute the method described in the foregoing method embodiments. For specific implementations, reference may be made to the method embodiments, which are not repeated here.
The functions, if implemented in the form of software functional units and sold or used as a standalone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product; the software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate the technical solutions of the present invention rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions for some technical features within the technical scope disclosed by the present invention; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. A video target tracking method is characterized by comprising the following steps:
initializing tracking parameters; wherein the tracking parameters at least comprise the position and range of a target object, inter-frame motion parameters of the target object and a neighborhood background, feature point sets of the target object and the neighborhood background, and a plurality of apparent features of the target object and the neighborhood background;
detecting a feature point set in a current frame in a set image range, and screening the feature point set according to a preset screening condition; the feature point set comprises feature points and feature vectors corresponding to the feature points;
matching the feature point set after screening with the feature point set of the target object and the neighborhood background corresponding to the previous frame respectively;
according to the screened feature points, carrying out motion estimation on the target object;
analyzing the tracking condition of the target object in the current frame according to the distance between the screened feature point and the center position of the target object and the apparent feature of the target object;
updating the feature point sets of the target object and the neighborhood background, the apparent features of the target object and the neighborhood background, and the interframe motion parameters of the target object and the neighborhood background according to the matching result, the motion estimation result and the tracking condition analysis result, so as to update the tracking strategy of the target object;
wherein the step of detecting a feature point set in a current frame in a set image range and screening the feature point set according to a preset screening condition comprises:
determining the coordinates of the upper left corner and the lower right corner of an image rectangular frame of the image range to be detected;
detecting feature points in the image rectangular frame to obtain coordinates of the feature points;
calculating the trace of the Hessian matrix of each feature point and the feature vector corresponding to the feature point; the feature vector comprises a feature descriptor vector, scale factor feature information, and color, texture and edge vectors;
and screening the feature points in the feature point set according to the following screening conditions:
the trace of the Hessian matrix of the feature point has the same sign as the trace of the Hessian matrix of the corresponding feature point in the previous video frame;
the distance between the feature point and the corresponding feature point in the previous video frame is smaller than a preset distance threshold;
the Euclidean distance between the feature vector of the feature point and that of the corresponding feature point in the previous video frame meets a preset feature vector threshold;
the displacement length, the displacement direction and the relative position relationship between the feature point and the corresponding feature point in the previous video frame meet a preset displacement consistency threshold; the relative position is determined by the distance between the feature point of the target object and the center position of the target object, normalized by the length and the width of the target rectangular frame of the target object;
and when a plurality of feature points of the current frame match the same feature point of the previous video frame, retaining the feature point with the minimum Euclidean distance among the plurality of feature points.
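For illustration only (not part of the claims), the screening conditions above might be checked as in the following sketch; the record layout (position, Hessian-trace sign, descriptor as a numpy array) and the threshold values are assumptions, and the displacement-consistency condition is omitted for brevity.

```python
# Illustrative sketch of the screening conditions (assumed record layout;
# example thresholds; displacement-consistency check omitted for brevity).
import numpy as np

def screen_candidates(candidates, prev_points, dist_thresh=30.0, desc_thresh=0.5):
    """Keep candidates that agree with a previous-frame point in Hessian-trace
    sign, spatial distance and descriptor distance; when several candidates
    match the same previous point, keep the one with minimum descriptor distance."""
    best = {}  # previous-point index -> (descriptor distance, candidate)
    for cand in candidates:
        for j, prev in enumerate(prev_points):
            if np.sign(cand["hessian_trace"]) != np.sign(prev["hessian_trace"]):
                continue  # condition: same sign of the Hessian-matrix trace
            if np.linalg.norm(np.subtract(cand["pt"], prev["pt"])) > dist_thresh:
                continue  # condition: spatial distance below threshold
            d = float(np.linalg.norm(cand["desc"] - prev["desc"]))
            if d > desc_thresh:
                continue  # condition: descriptor Euclidean distance below threshold
            if j not in best or d < best[j][0]:
                best[j] = (d, cand)  # many-to-one matches: keep minimum distance
    return [(j, cand) for j, (d, cand) in best.items()]
```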
2. The method of claim 1, wherein the step of initializing tracking parameters comprises:
extracting the apparent features of the target object and the neighborhood background in the current frame; the apparent features at least comprise a plurality of feature descriptor vectors, scale factor feature information, color features, texture features and edge features;
determining the center position of the target object and the length and width of a target rectangular frame;
initializing the inter-frame motion parameters of the target object and the neighborhood background to be the difference of corresponding transformation parameters between the current frame and the previous frame;
initializing the feature point set of the target object into the detected feature point set in the rectangular frame of the target object; initializing the feature point set of the neighborhood background into a feature point set of the neighborhood background detected in a neighborhood region in a preset range outside the target object;
initializing the apparent features of the target object and the neighborhood background to the extracted feature vector of the apparent features.
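As a non-authoritative illustration of the state such an initialization produces, the sketch below collects the tracking parameters into a single structure; all field names, the four-parameter motion vector (dx, dy, scale, angle) and the zero/one initial values are assumptions, not taken from the patent.

```python
# Illustrative sketch of an initialized tracker state (field names, the
# 4-parameter motion model and initial values are assumptions).
from dataclasses import dataclass, field
import numpy as np

@dataclass
class TrackerState:
    center: tuple                 # (cx, cy) center position of the target object
    box: tuple                    # (width, height) of the target rectangular frame
    motion_mean: np.ndarray       # inter-frame motion parameters (dx, dy, scale, angle)
    motion_var: np.ndarray
    target_points: list = field(default_factory=list)      # feature points on the target
    background_points: list = field(default_factory=list)  # feature points in the neighborhood background
    target_appearance: dict = field(default_factory=dict)  # per-point Gaussian mean/variance
    background_appearance: dict = field(default_factory=dict)

def init_state(center, box, target_points, background_points, appearance):
    """Initialize with the detected point sets and a neutral motion model."""
    return TrackerState(center=center, box=box,
                        motion_mean=np.zeros(4), motion_var=np.ones(4),
                        target_points=list(target_points),
                        background_points=list(background_points),
                        target_appearance=dict(appearance.get("target", {})),
                        background_appearance=dict(appearance.get("background", {})))
```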
3. The method according to claim 1, wherein the step of analyzing the tracking condition of the target object in the current frame according to the distance between the screened feature point and the center position of the target object and the apparent feature of the target object comprises:
detecting wrongly classified feature points according to the distance between the feature points and the center position of the target object, and eliminating the wrongly classified feature points to generate a first feature point set;
and analyzing whether the target object in the current video frame has tracking drift or not according to the apparent characteristics of each characteristic point in the first characteristic point set.
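A minimal sketch of this two-stage check follows, assuming each point carries its position, an identifier and a descriptor, and assuming the appearance model stores a Gaussian (mean, variance) per point; the gating radius, the 3-sigma test and the drift ratio are illustrative values only, not part of the claim.

```python
# Illustrative sketch of the tracking-condition analysis: distance gating of
# mis-classified points, then a drift decision from appearance consistency.
import numpy as np

def analyze_tracking(points, center, box, appearance_model,
                     radius_factor=0.75, drift_ratio=0.5):
    """Return the gated point list and a drift flag."""
    w, h = box
    radius = radius_factor * np.hypot(w, h) / 2.0   # assumed gating radius
    inliers = [p for p in points
               if np.hypot(p["pt"][0] - center[0], p["pt"][1] - center[1]) <= radius]

    def fits_appearance(p):
        mean, var = appearance_model[p["id"]]
        z = (p["desc"] - mean) ** 2 / np.maximum(var, 1e-6)
        return float(np.mean(z)) < 9.0              # roughly a 3-sigma gate (assumed)

    scored = [p for p in inliers if p["id"] in appearance_model]
    consistent = [p for p in scored if fits_appearance(p)]
    drift = bool(scored) and len(consistent) / len(scored) < drift_ratio
    return inliers, drift
```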
4. The method of claim 1, wherein the step of updating the set of feature points of the target object and the neighborhood background comprises:
classifying the feature points in the feature point set according to the matching result to obtain a plurality of feature point subsets; wherein the subsets comprise a successfully matched feature point subset and an unsuccessfully matched feature point subset; the successfully matched feature point subset is further divided into feature points on the target object and feature points on the neighborhood background, and the unsuccessfully matched feature point subset is likewise divided into feature points on the target object and feature points on the neighborhood background;
deleting, from the feature point set corresponding to the previous frame, feature points in the unsuccessfully matched feature point subset whose number of matching failures within the recent frames is higher than a set threshold; wherein the recent frames are a set number of consecutive video frames preceding the previous frame;
adding feature points in the feature point set of the current frame to the feature point set corresponding to the previous frame according to the tracking state of the current frame;
and updating the position coordinates of the feature points in the feature point set corresponding to the previous frame to the position coordinates of the corresponding feature points in the current frame.
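The point set update could be organized as in the sketch below; the per-point failure counter, the failure budget and the "add only while tracking is stable" policy are assumptions used for illustration.

```python
# Illustrative sketch of the feature point set update (failure counters,
# failure budget and add-when-stable policy are assumptions).
def update_point_set(point_set, matched_ids, current_positions, new_points,
                     max_failures=5, tracking_ok=True):
    """Refresh matched points, drop points that keep failing, add new points."""
    kept = []
    for p in point_set:
        if p["id"] in matched_ids:
            p["failures"] = 0
            p["pt"] = current_positions[p["id"]]    # update to current-frame coordinates
            kept.append(p)
        else:
            p["failures"] = p.get("failures", 0) + 1
            if p["failures"] <= max_failures:        # delete once the budget is exceeded
                kept.append(p)
    if tracking_ok:                                   # grow the model only when tracking is stable
        kept.extend(new_points)
    return kept
```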
5. The method of claim 4, wherein the step of updating the apparent features of the target object and the neighborhood background comprises:
and updating the mean and variance of the Gaussian components of the feature points according to the feature descriptor vectors, the scale factor feature information, and the color, texture and edge vectors of the feature points in the successfully matched feature point subset.
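One common way to realize such an update is a running (exponentially forgetting) estimate of each Gaussian component, sketched below; the learning rate and the per-point dictionary layout are assumptions, not taken from the patent.

```python
# Illustrative sketch of a running Gaussian update of the appearance model
# (learning rate and data layout are assumptions).
import numpy as np

def update_gaussian(mean, var, observation, rate=0.1):
    """Blend a new feature vector into a Gaussian component's mean and variance."""
    new_mean = (1.0 - rate) * mean + rate * observation
    new_var = (1.0 - rate) * var + rate * (observation - new_mean) ** 2
    return new_mean, new_var

def update_appearance(appearance_model, matched_points, rate=0.1):
    """Update only the components of points that matched successfully."""
    for p in matched_points:
        if p["id"] in appearance_model:
            mean, var = appearance_model[p["id"]]
            appearance_model[p["id"]] = update_gaussian(mean, var, p["desc"], rate)
    return appearance_model
```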
6. The method of claim 1, wherein the step of updating the inter-frame motion parameters of the target object and the neighborhood background comprises:
and updating the mean value and the variance of the motion parameters according to the estimated values of the motion transformation parameters between the current frame and the previous frame.
7. A video object tracking apparatus, comprising:
the initialization module is used for initializing tracking parameters; wherein the tracking parameters at least comprise the position and range of a target object, inter-frame motion parameters of the target object and a neighborhood background, feature point sets of the target object and the neighborhood background, and a plurality of apparent features of the target object and the neighborhood background;
the screening module is used for detecting a feature point set in a current frame in a set image range and screening the feature point set according to a preset screening condition; the feature point set comprises feature points and feature vectors corresponding to the feature points;
the feature point matching module is used for respectively matching the feature point set after screening with the feature point set of the target object and the neighborhood background corresponding to the previous frame;
the motion estimation module is used for carrying out motion estimation on the target object according to the screened feature points;
the tracking condition analysis module is used for analyzing the tracking condition of the target object in the current frame according to the distance between the screened feature point and the center position of the target object and the apparent feature of the target object;
the updating module is used for updating the feature point set of the target object and the neighborhood background, the apparent features of the target object and the neighborhood background, and the interframe motion parameters of the target object and the neighborhood background according to the matching result, the motion estimation result and the tracking condition analysis result, so as to update the tracking strategy of the target object;
the screening module is further configured to: determine the coordinates of the upper left corner and the lower right corner of an image rectangular frame of the image range to be detected; detect feature points in the image rectangular frame to obtain coordinates of the feature points; calculate the trace of the Hessian matrix of each feature point and the feature vector corresponding to the feature point, wherein the feature vector comprises a feature descriptor vector, scale factor feature information, and color, texture and edge vectors; and screen the feature points in the feature point set according to the following screening conditions: the trace of the Hessian matrix of the feature point has the same sign as the trace of the Hessian matrix of the corresponding feature point in the previous video frame; the distance between the feature point and the corresponding feature point in the previous video frame is smaller than a preset distance threshold; the Euclidean distance between the feature vector of the feature point and that of the corresponding feature point in the previous video frame meets a preset feature vector threshold; the displacement length, the displacement direction and the relative position relationship between the feature point and the corresponding feature point in the previous video frame meet a preset displacement consistency threshold, wherein the relative position is determined by the distance between the feature point of the target object and the center position of the target object, normalized by the length and the width of the target rectangular frame of the target object; and when a plurality of feature points of the current frame match the same feature point of the previous video frame, the feature point with the minimum Euclidean distance is retained.
8. The apparatus of claim 7, wherein the initialization module is further configured to:
extracting the apparent features of the target object and the neighborhood background in the current frame; the apparent features at least comprise a plurality of feature descriptor vectors, scale factor feature information, color features, texture features and edge features;
determining the center position of the target object and the length and width of a target rectangular frame;
initializing the inter-frame motion parameters of the target object and the neighborhood background to be the difference of corresponding transformation parameters between the current frame and the previous frame;
initializing the feature point set of the target object to the feature point set detected in the rectangular frame of the target object; initializing the feature point set of the neighborhood background into a feature point set of the neighborhood background detected in a neighborhood region in a preset range outside the target object;
initializing the apparent features of the target object and the neighborhood background to the extracted feature vector of the apparent features.
9. A video object tracking implementation apparatus comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor to implement the method of any one of claims 1 to 6.
CN201810249416.5A 2018-03-23 2018-03-23 Video target tracking method and device and implementation device Expired - Fee Related CN108470354B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810249416.5A CN108470354B (en) 2018-03-23 2018-03-23 Video target tracking method and device and implementation device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810249416.5A CN108470354B (en) 2018-03-23 2018-03-23 Video target tracking method and device and implementation device

Publications (2)

Publication Number Publication Date
CN108470354A CN108470354A (en) 2018-08-31
CN108470354B true CN108470354B (en) 2021-04-27

Family

ID=63264696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810249416.5A Expired - Fee Related CN108470354B (en) 2018-03-23 2018-03-23 Video target tracking method and device and implementation device

Country Status (1)

Country Link
CN (1) CN108470354B (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10460453B2 (en) * 2015-12-30 2019-10-29 Texas Instruments Incorporated Feature point identification in sparse optical flow based tracking in a computer vision system
CN108898615B (en) * 2018-06-15 2021-09-24 阿依瓦(北京)技术有限公司 Block matching method for high frequency information image
CN109255337B (en) * 2018-09-29 2020-04-28 北京字节跳动网络技术有限公司 Face key point detection method and device
CN109323697B (en) * 2018-11-13 2022-02-15 大连理工大学 Method for rapidly converging particles during starting of indoor robot at any point
CN111385490B (en) * 2018-12-28 2021-07-13 清华大学 Video splicing method and device
CN109827578B (en) * 2019-02-25 2019-11-22 中国人民解放军军事科学院国防科技创新研究院 Satellite relative attitude estimation method based on profile similitude
CN110111361B (en) * 2019-04-22 2021-05-18 湖北工业大学 Moving object detection method based on multi-threshold self-optimization background modeling
CN110415275B (en) * 2019-04-29 2022-05-13 北京佳讯飞鸿电气股份有限公司 Point-to-point-based moving target detection and tracking method
CN112085025B (en) * 2019-06-14 2024-01-16 阿里巴巴集团控股有限公司 Object segmentation method, device and equipment
CN110660090B (en) * 2019-09-29 2022-10-25 Oppo广东移动通信有限公司 Subject detection method and apparatus, electronic device, and computer-readable storage medium
CN113012216B (en) * 2019-12-20 2023-07-07 舜宇光学(浙江)研究院有限公司 Feature classification optimization method, SLAM positioning method, system and electronic equipment
CN111144483B (en) * 2019-12-26 2023-10-17 歌尔股份有限公司 Image feature point filtering method and terminal
CN111160266B (en) * 2019-12-30 2023-04-18 三一重工股份有限公司 Object tracking method and device
CN111382309B (en) * 2020-03-10 2023-04-18 深圳大学 Short video recommendation method based on graph model, intelligent terminal and storage medium
CN111652263B (en) * 2020-03-30 2021-12-28 西北工业大学 Self-adaptive target tracking method based on multi-filter information fusion
CN112053381A (en) * 2020-07-13 2020-12-08 北京迈格威科技有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN111882583B (en) * 2020-07-29 2023-11-14 成都英飞睿技术有限公司 Moving object detection method, device, equipment and medium
CN112184766B (en) * 2020-09-21 2023-11-17 广州视源电子科技股份有限公司 Object tracking method and device, computer equipment and storage medium
CN112184769B (en) * 2020-09-27 2023-05-02 上海高德威智能交通系统有限公司 Method, device and equipment for identifying tracking abnormality
CN112200126A (en) * 2020-10-26 2021-01-08 上海盛奕数字科技有限公司 Method for identifying limb shielding gesture based on artificial intelligence running
CN112215205B (en) * 2020-11-06 2022-10-18 腾讯科技(深圳)有限公司 Target identification method and device, computer equipment and storage medium
CN113450578B (en) * 2021-06-25 2022-08-12 北京市商汤科技开发有限公司 Traffic violation event evidence obtaining method, device, equipment and system
CN113822279B (en) * 2021-11-22 2022-02-11 中国空气动力研究与发展中心计算空气动力研究所 Infrared target detection method, device, equipment and medium based on multi-feature fusion
CN118015501B (en) * 2024-04-08 2024-06-11 中国人民解放军陆军步兵学院 Medium-low altitude low-speed target identification method based on computer vision

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7480079B2 (en) * 2003-09-09 2009-01-20 Siemens Corporate Research, Inc. System and method for sequential kernel density approximation through mode propagation
US8897528B2 (en) * 2006-06-26 2014-11-25 General Electric Company System and method for iterative image reconstruction
CN103400395A (en) * 2013-07-24 2013-11-20 佳都新太科技股份有限公司 Light stream tracking method based on HAAR feature detection
CN103985136A (en) * 2014-03-21 2014-08-13 南京大学 Target tracking method based on local feature point feature flow pattern
CN105046717B (en) * 2015-05-25 2019-03-19 浙江师范大学 A kind of video object method for tracing object of robustness

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102999920A (en) * 2012-10-25 2013-03-27 西安电子科技大学 Target tracking method based on nearest neighbor classifier and mean shift
CN103870839A (en) * 2014-03-06 2014-06-18 江南大学 Online video target multi-feature tracking method
CN103886611A (en) * 2014-04-08 2014-06-25 西安煤航信息产业有限公司 Image matching method suitable for automatically detecting flight quality of aerial photography

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Multiple Feature Fusion for Tracking of Moving Objects in Video Surveillance; Huibin Wang et al.; 2008 International Conference on Computational Intelligence and Security; 20081231; pp. 553-559 *

Also Published As

Publication number Publication date
CN108470354A (en) 2018-08-31

Similar Documents

Publication Publication Date Title
CN108470354B (en) Video target tracking method and device and implementation device
Kristan et al. The visual object tracking vot2015 challenge results
CN107633226B (en) Human body motion tracking feature processing method
CN108399627B (en) Video inter-frame target motion estimation method and device and implementation device
KR20160096460A (en) Recognition system based on deep learning including a plurality of classfier and control method thereof
JP2006209755A (en) Method for tracing moving object inside frame sequence acquired from scene
CN110349188B (en) Multi-target tracking method, device and storage medium based on TSK fuzzy model
Roy et al. Foreground segmentation using adaptive 3 phase background model
Gündoğdu et al. The visual object tracking VOT2016 challenge results
Zhang et al. Visual saliency based object tracking
Li et al. Robust object tracking via multi-feature adaptive fusion based on stability: contrast analysis
Afonso et al. Automatic estimation of multiple motion fields from video sequences using a region matching based approach
Sahoo et al. Adaptive feature fusion and spatio-temporal background modeling in KDE framework for object detection and shadow removal
Zhang et al. Weighted smallest deformation similarity for NN-based template matching
Kim et al. Simultaneous foreground detection and classification with hybrid features
CN107665495B (en) Object tracking method and object tracking device
CN108492328B (en) Video inter-frame target matching method and device and implementation device
Dai et al. Robust and accurate moving shadow detection based on multiple features fusion
CN113129332A (en) Method and apparatus for performing target object tracking
CN117011346A (en) Blower image registration algorithm
Li et al. Research on hybrid information recognition algorithm and quality of golf swing
Ding et al. Robust tracking with adaptive appearance learning and occlusion detection
Maia et al. Visual object tracking by an evolutionary self-organizing neural network
Nithin et al. Multi-camera tracklet association and fusion using ensemble of visual and geometric cues
Tsin et al. Learn to track edges

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210427