CN109993775A - Single target tracking method based on feature compensation - Google Patents

Single target tracking method based on feature compensation Download PDF

Info

Publication number
CN109993775A
Authority
CN
China
Prior art keywords
target
pixel
image
histogram
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910258571.8A
Other languages
Chinese (zh)
Other versions
CN109993775B (en)
Inventor
杨云
白杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan University YNU
Original Assignee
Yunnan University YNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan University YNU filed Critical Yunnan University YNU
Priority to CN201910258571.8A priority Critical patent/CN109993775B/en
Publication of CN109993775A publication Critical patent/CN109993775A/en
Application granted granted Critical
Publication of CN109993775B publication Critical patent/CN109993775B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/207 Analysis of motion for motion estimation over a hierarchy of resolutions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20021 Dividing image into blocks, subimages or windows
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20076 Probabilistic image processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20112 Image segmentation details
    • G06T2207/20132 Image cropping

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video target tracking method based on feature compensation among a posterior pixel color histogram, a histogram of oriented gradients and a convolutional neural network: simple features are used in simple scenes to guarantee real-time performance, and complex features are used in complex scenes to guarantee accuracy. By combining the two features of the posterior pixel histogram and the histogram of oriented gradients, the resulting response map adapts well to fairly simple video scenes; a classifier is trained to judge when the response obtained by fusing the two features is not credible. According to the judgement of the classifier, the method then chooses whether to switch in a slower but more robust convolutional neural network tracker, which corrects a target that has begun to drift or recovers a target that has been lost. The invention improves the precision with which the size and position of the target in the video are judged, and adapts well to long-duration target tracking tasks, so as to reach practical application scenarios.

Description

Single target tracking method based on feature compensation
Technical Field
The invention belongs to the technical field of single-target tracking in computer vision, and particularly relates to a single-target tracking method based on feature compensation.
Background
In the field of computer vision, tracking has always been a core problem and is widely applied in video surveillance, human-computer interaction, robot visual perception, military guidance and many other areas. In single-target tracking, the position and size of the tracked target are marked manually with a rectangular frame in the first frame of a video, and the tracking method then only has to keep that rectangular frame closely on the manually marked object in the subsequent frames. By contrast, target detection scans and searches for targets over the whole frame of a still image or a dynamic video; in short, detection focuses on localization and classification, whereas target tracking focuses on locking onto a person or object in real time without knowing by itself what it is tracking. Because of the real-time requirement of tracking, searching the whole frame is computationally far too expensive and clearly unsuitable; since the tracked object is continuous in time and space, the search range of the tracking method can be greatly reduced. However, precisely because of this continuity, complex scenes in the tracking process contain interference factors such as illumination change, appearance deformation, fast motion, occlusion and background similarity, and most tracking models must continuously update themselves during the tracking task; once a model learns background information, errors are easily produced and keep accumulating, and the target is finally lost.
Currently, most of the mainstream tracking algorithms are short-term tracking (short-term tracking), and mainly have the following defects:
(1) poor robustness
The tracking model cannot recover the target once it has been lost; these algorithms focus mainly on the precision of the position and size of the tracked target, their robustness is not high, they cannot adapt to long-duration tracking tasks, and the models cannot be applied well to real scenes.
(2) Low speed
Both end-to-end neural-network tracking models and tracking models that combine deep convolutional feature maps with correlation filtering can obtain high accuracy, but they spend a great deal of computation time, so they are rarely applied in real scenes. Other traditional tracking models based on correlation filtering can achieve high speed but do not perform well enough in accuracy and robustness.
(3) Error accumulation
Because of the various interference factors in a video scene, it is difficult for the model to track the target correctly in every frame, so background or other interference information is learned when the template is updated; the error accumulates, and this is an irreversible process.
To address these defects and make the method applicable to real scenes, the entry point is placed on long-term tracking: robustness is improved as much as possible, so that the method suits long-duration tracking tasks while keeping the speed real-time.
Disclosure of Invention
The invention aims to provide a feature-compensation video target tracking method based on a posterior pixel color histogram, a histogram of oriented gradients and a convolutional neural network, so that the method is robust and fast: while accuracy and robustness are improved, the model is still guaranteed a high frame rate. The method improves the accuracy with which the size and position of the target in the video are judged, and adapts better to long-duration target tracking tasks, so as to reach practical application scenarios.
The technical scheme adopted by the invention provides a feature-compensation video target tracking method based on feature fusion, comprising the following steps:
S1, establishing a target tracking model branch of the color histogram feature:
S11, before a target tracking task starts, calling an OpenCV toolkit, and cutting out a target sub-image E with background information on the basis of a target image which is manually marked;
S12, separating the foreground area from the background area of the target sub-image with the background information according to the size of the target and a certain proportion; meanwhile, the pixels are subjected to scale compression within an integer range of 0-32 pixel values, and pixel proportions relative to each pixel value in a corresponding foreground region and a corresponding background region, namely a foreground pixel proportion ρ(O) and a background pixel proportion ρ(B), are calculated by respectively depending on a foreground mask and a background mask which are the same in size, wherein the expression of the pixel proportions ρ is as follows:
ρ(O)=N(O)/|O|; (1-1)
ρ(B)=N(B)/|B|; (1-2)
wherein O denotes the image area of the foreground O, B denotes the image area of the background B, N(O) represents the number of non-zero pixel values in the image area of the foreground O, N(B) represents the number of non-zero pixel values in the image area of the background B, |O| represents the total number of pixel values in the image area of the foreground O, and |B| represents the total number of pixel values in the image area of the background B; the weight β_t of the posterior pixel color histogram template of the current frame is then calculated based on formulas (1-1) and (1-2):
Wherein t represents the current frame, and lambda is a hyper-parameter;
S13, in the next frame of the video, within the image range centered on the target center of the previous frame as the search area, a sub-image e is cut out as in S12 and its pixels are scale-compressed to obtain ψ; the weight β_{t-1} of the posterior pixel color histogram template of the previous frame is obtained according to formulas (1-1), (1-2) and (2), and the color histogram response f_hist is finally obtained using the integral image formula;
wherein ψ is the sub-image after M-channel pixel compression, defined on the current-frame cropped picture e; ψ_t is the sub-image after M-channel pixel compression of the current frame; H represents each pixel point of the picture over the corresponding integer range; u represents each one of the H grids, ψ[u] is the corresponding pixel point on ψ, and the superscript T denotes the matrix transpose;
S14, each time the tracking task of one frame is completed, the weight β_t of the posterior pixel histogram template is updated at the position predicted in the current frame, that is, the foreground pixel proportion ρ(O) and the background pixel proportion ρ(B) are updated respectively, giving the updated pixel proportion ρ_t(O) of the foreground O of the current frame and the updated pixel proportion ρ_t(B) of the background B of the current frame:
ρ_t(O) = (1-η_hist)ρ_{t-1}(O) + η_hist ρ'_t(O)
ρ_t(B) = (1-η_hist)ρ_{t-1}(B) + η_hist ρ'_t(B); (4)
wherein ρ'_t(O) is the pixel proportion in the image area of the foreground O of the current frame, ρ'_t(B) is the pixel proportion in the image area of the background B of the current frame, ρ_{t-1}(O) is the pixel proportion in the image area of the foreground O of the previous frame, ρ_{t-1}(B) is the pixel proportion in the image area of the background B of the previous frame, and η_hist is the weight for the pixel-proportion update;
S2, establishing a target tracking model branch of the histogram of oriented gradients feature:
S21, on the basis of the target image to be tracked selected with the rectangular frame in S11, another target region sub-image E' of a different size but likewise carrying background information is cut out, the three-dimensional histogram of oriented gradients features Φ_k of K channels are extracted and multiplied by the cosine window function in the OpenCV package, and the template of the histogram of oriented gradients features is calculated:
wherein all variables are defined in the frequency domain and are obtained through the discrete Fourier transform; u represents each grid of Γ, where Γ is the integer range of grids corresponding to Φ_k; the superscript i denotes each of the K channels; one factor is the conjugate of the Gaussian signal after the Fourier transform, the conjugate being denoted by *; ⊙ denotes element-wise multiplication; the other factor is each channel element of the histogram of oriented gradients feature Φ_k obtained by the Fourier transform; and K is the number of channels;
S22, the histogram of oriented gradients template obtained in S21 is inverse-Fourier-transformed to obtain h[u]; in the next frame of the video, within the image range centered on the target center of the previous frame as the search area, a sub-image e' is cut out, the histogram of oriented gradients feature φ of the current sub-image is extracted, and the histogram of oriented gradients score f_hog of the current frame is obtained by calculating the linear function
f_hog(φ, h) = Σ_{u∈Γ} h[u]^T φ[u]; (7)
S23, after the tracking task of each frame is completed, the template of the histogram of oriented gradients feature is updated at the position predicted in the current frame, that is, the two updated final signals of formula (8) are obtained respectively;
wherein the two signals of the current frame are each calculated from formula (6) and are interpolated with the corresponding signals of the previous frame to give the updated final signals, and η_hog is the weight for the update of the histogram of oriented gradients template;
S3, feature fusion and classifier establishment:
S31, the color histogram response f_hist obtained in S13 and the histogram of oriented gradients score f_hog obtained in S22 are fused by defining a linear function f(x), giving:
f(x) = γ_hog f_hog(x) + γ_hist f_hist(x); (9)
wherein γ_hog is the weight of the histogram of oriented gradients response and γ_hist is the weight of the color histogram response; the coordinate of the point at which f(x) attains its maximum is taken as the center coordinate of the target;
S32, a classifier is trained from f and f_hog: a batch of video sequences is selected, the fused output f from S31 and f_hog are recorded, the input of the data set is set to X = [max(f_hog); max(f)], and the output label h'_θ denotes the true value of the data set, an integer of 0 or 1, where 0 indicates that the tracking box of the model has deviated from the target and 1 indicates no deviation from the target; the logistic regression function h_θ represents the output of the classifier:
the data are divided into a training set and a validation set in the proportion 7:3, and the parameter θ of the logistic regression model in formula (10) is obtained after the training set converges under a cross-entropy loss function and a gradient descent algorithm over multiple iterations; the hyper-parameters are then fine-tuned with the validation data by evaluating the classification accuracy under different parameter values and selecting the value with the highest accuracy as the final parameter value, so that the classifier achieves a better classification result on the validation set;
S4, judging whether the convolutional neural network structure tracker needs to be accessed:
S41, f and f_hog are input into the classifier obtained in S32 to obtain the output; 0.5 is selected as a threshold on the continuous values output by the classifier of S32 (i.e. formula (10)); when the output is greater than 0.5, the result of the fusion model is trustworthy and there is no need to switch in the convolutional neural network tracker; when the output is less than 0.5, the result of the fusion model is not accepted, and the convolutional neural network tracker needs to be switched in;
S42, when the target response score predicted by the convolutional neural network tracker for the current frame is high, S14 and S23 are applied again, that is, the posterior pixel histogram template and the histogram of oriented gradients template are updated respectively; the tracking task of the next frame is then entered, until all video frames are finished.
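For orientation, the per-frame control flow of steps S1-S4 can be sketched in Python as below. Every callable passed in (hist_response, hog_response, classifier_prob, cnn_track, update_templates) is a hypothetical placeholder for the corresponding step, and the 0.7/0.3 fusion weights and 0.9 response-score threshold are assumed example values, not values stated in the text.

```python
import numpy as np

def track_one_frame(frame, hist_response, hog_response, classifier_prob,
                    cnn_track, update_templates, gamma_hog=0.7, gamma_hist=0.3):
    """Sketch of the per-frame control flow of S1-S4; all callables are placeholders."""
    f_hist = hist_response(frame)                       # S13: color histogram response map
    f_hog = hog_response(frame)                         # S22: HOG correlation response map
    f = gamma_hog * f_hog + gamma_hist * f_hist         # S31: fusion, formula (9)

    if classifier_prob(f_hog.max(), f.max()) > 0.5:     # S41: fusion result is trusted
        cy, cx = np.unravel_index(np.argmax(f), f.shape)
        center, reliable = (cx, cy), True
    else:                                               # S41: switch in the CNN tracker
        center, score = cnn_track(frame)
        reliable = score > 0.9                          # S42: assumed score threshold

    if reliable:                                        # S42: update only when confident
        update_templates(frame, center)                 # formula (4) and formula (8) updates
    return center
```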
The invention has the beneficial effects that:
(1) The method of the invention fuses multiple features and, by combining their characteristics, can cope well with video scenes involving illumination change, motion blur, object deformation, occlusion and the like: in simple scenes the tracking task is completed quickly with simple features, and in complex scenes the influence of interference information is reduced by switching to more robust features.
(2) The invention adds a self-detection classifier, so that the model behaves more intelligently when features are switched and is discouraged from learning invalid information when the feature templates are updated, thereby reducing the accumulation of errors; at the same time the classifier is simple and does not require much computational overhead.
(3) The tracker with the neural network structure selected by the invention does not need to update its template, so it does not learn interference information and performs well when the target is occluded.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of foreground and background masks.
FIG. 2 is a graph of a posterior pixel histogram and response.
Fig. 3 is a histogram of directional gradients and a response plot.
FIG. 4 is a schematic diagram of a single target tracking algorithm based on feature compensation.
Fig. 5 is a distribution diagram of accuracy and robustness of each algorithm under the reset mechanism.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the field of target tracking, the main problems are deformation, illumination change, fast motion, similar background, in-plane rotation, scale change, occlusion, out-of-view targets and the like.
The specific process is as follows:
S1, establishing a target tracking model branch of the color histogram feature:
S11, the method is established for the single-target tracking task: before the target tracking task starts, the OpenCV toolkit is called, the target to be tracked is selected with a rectangular frame by manual annotation, and a target sub-image E carrying background information is cut out; the model can then distinguish the selected target from its background according to their characteristics, so as to complete the subsequent tracking task. In such a scenario, whatever features are used, the model generates an initial feature template in its own way from the image inside the target frame selected in the first frame of the video, and matches it against candidate regions of the images of subsequent frames so as to predict the position and size of the target.
S12, on the basis of the first frame in which the tracked target has been selected, the color histogram model separates the target sub-image E carrying background information into a foreground region and a background region according to the size of the target and a certain proportion. Because the pixel value range is 0-255, calculating with the original pixel values would consume a great deal of time, so the pixels are scale-compressed; the selected scale is 8, i.e. the calculation is performed over the integer range 0-32, which greatly improves the model speed. Relying on a foreground mask (as shown in figure 1-a, a single-channel image in which the white target region has the value 1 and the black background region the value 0) and a background mask (as shown in figure 1-b, a single-channel image in which the black target region has the value 0 and the white background region the value 1) of the same size, the pixel proportions of the two regions relative to each pixel value, namely the foreground pixel proportion ρ(O) and the background pixel proportion ρ(B), are calculated respectively:
ρ(O)=N(O)/|O|; (1-1)
ρ(B)=N(B)/|B|; (1-2)
wherein O denotes the image area of the foreground O, B denotes the image area of the background B, N(O) represents the number of non-zero pixel values in the image area of the foreground O, N(B) represents the number of non-zero pixel values in the image area of the background B, |O| represents the total number of pixel values in the image area of the foreground O, and |B| represents the total number of pixel values in the image area of the background B; after the pixel proportions of the foreground and the background are obtained respectively, the weight β_t of the posterior pixel color histogram template of the current frame can be calculated:
t denotes the current frame and λ is the hyper-parameter.
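As an illustration of S12, the pixel-proportion and template-weight computation can be sketched with NumPy as below. Formula (2) appears only as an image in the original, so the per-bin weight ρ(O)/(ρ(O)+ρ(B)+λ) used here is an assumed, Staple-style posterior form consistent with the surrounding definitions; the function name and the single-channel simplification are illustrative only.

```python
import numpy as np

def posterior_pixel_weights(patch, fg_mask, bg_mask, n_bins=32, lam=1e-3):
    """Per-bin posterior weights beta_t for one channel of the target sub-image E (S12)."""
    bins = (patch.astype(np.int32) * n_bins) // 256        # scale compression to 0..31
    fg_hist = np.bincount(bins[fg_mask > 0], minlength=n_bins)
    bg_hist = np.bincount(bins[bg_mask > 0], minlength=n_bins)
    rho_o = fg_hist / max(int((fg_mask > 0).sum()), 1)     # formula (1-1): N(O)/|O|
    rho_b = bg_hist / max(int((bg_mask > 0).sum()), 1)     # formula (1-2): N(B)/|B|
    beta = rho_o / (rho_o + rho_b + lam)                   # assumed form of formula (2)
    return beta, rho_o, rho_b
```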
S13, after the posterior pixel histogram template has been established, in the next frame of the video, within the image range centered on the target center of the previous frame as the search area, a sub-image e is likewise cut out and its pixels are scale-compressed to obtain ψ; the weight β_{t-1} of the posterior pixel color histogram template of the previous frame is obtained according to formulas (1-1) and (1-2), and the resulting color histogram response f_hist is obtained using the integral image formula, as shown in figs. 2-a, 2-b and 2-c:
wherein ψ is the sub-image after M-channel pixel compression, defined on the current-frame cropped picture e; ψ_t is the sub-image after M-channel pixel compression of the current frame; H represents each pixel point of the picture over the corresponding integer range; u represents each one of the H grids, ψ[u] is the corresponding pixel point on ψ, and the superscript T denotes the matrix transpose;
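A sketch of the response computation of S13: each pixel of the search sub-image is scored by looking up its compressed bin in β_{t-1}, and the per-box score is then obtained with a box filter, which OpenCV implements via an integral image. The function name and box-size arguments are illustrative placeholders.

```python
import cv2
import numpy as np

def hist_response(search_patch, beta, box_w, box_h, n_bins=32):
    """Color histogram response f_hist over a single-channel search sub-image e (S13)."""
    bins = (search_patch.astype(np.int32) * n_bins) // 256    # same scale compression as S12
    per_pixel = beta[bins].astype(np.float32)                 # psi[u] scored by a beta lookup
    # Mean of per-pixel scores over every box of size (box_w, box_h);
    # cv2.boxFilter computes this with an integral image internally.
    return cv2.boxFilter(per_pixel, -1, (box_w, box_h))
```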
S14, during online tracking the scene in the video may change slightly or drastically at any time, and interference factors such as illumination change and motion blur affect the color histogram feature particularly severely; therefore, in order to better adapt to the various changes in the video scene, each time the tracking task of one frame is completed the weight β_t of the posterior pixel histogram template must be updated at the position predicted in the current frame, i.e. the pixel proportions ρ(O) and ρ(B) of the foreground and background are updated respectively:
ρ_t(O) = (1-η_hist)ρ_{t-1}(O) + η_hist ρ'_t(O)
ρ_t(B) = (1-η_hist)ρ_{t-1}(B) + η_hist ρ'_t(B); (4)
wherein ρ'_t(O) is the pixel proportion in the image area of the foreground O of the current frame, ρ'_t(B) is the pixel proportion in the image area of the background B of the current frame, ρ_{t-1}(O) is the pixel proportion in the image area of the foreground O of the previous frame, ρ_{t-1}(B) is the pixel proportion in the image area of the background B of the previous frame, and η_hist is the weight for the pixel-proportion update;
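A minimal sketch of the update of formula (4); η_hist = 0.04 is an assumed example learning rate, not a value stated in the text.

```python
def update_pixel_ratios(rho_prev, rho_curr, eta_hist=0.04):
    """Linear interpolation update of a pixel-proportion vector, formula (4)."""
    return (1.0 - eta_hist) * rho_prev + eta_hist * rho_curr
```

The same call serves both ρ(O) and ρ(B); β_t is then recomputed from the updated proportions as in S12.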
S2, establishing a target tracking model branch of the histogram of oriented gradients feature:
S21, on the basis of the manually annotated tracking target selected in the first frame, a target region sub-image E' carrying background information is cut out, and the three-dimensional histogram of oriented gradients features Φ_k of K channels are extracted, as shown in figs. 3-a and 3-b; the influence of the peripheral part of the sub-image is suppressed by multiplying by the cosine window function in the OpenCV package, and the template of the histogram of oriented gradients features is calculated:
wherein all variables are defined in the frequency domain and are obtained through the discrete Fourier transform; because the cross-correlation operation in correlation-filtering models consumes a great deal of computation time, Fourier-transforming the variables converts the convolution in the time domain into an element-wise product in the frequency domain, which greatly reduces the computation time. u represents each grid of Γ, where Γ is the integer range of grids corresponding to Φ_k; the superscript i denotes each of the K channels; one factor is the conjugate of the Gaussian signal after the Fourier transform, the conjugate being denoted by *; ⊙ denotes element-wise multiplication; the other factor is each channel element of the histogram of oriented gradients feature Φ_k obtained by the Fourier transform; and K is the number of channels.
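Since the template formula itself appears only as an image above, the sketch below assumes the standard multi-channel ridge-regression (MOSSE/DCF-style) closed form, which matches the description of a conjugated Gaussian numerator and a summed-channel denominator; the function and variable names are illustrative.

```python
import numpy as np

def train_hog_template(features, gaussian_label, lam=1e-4):
    """Frequency-domain HOG template of S21.

    features: (H, W, K) cosine-windowed HOG features Phi_k.
    gaussian_label: (H, W) desired Gaussian response g.
    """
    F = np.fft.fft2(features, axes=(0, 1))                  # per-channel DFT
    G = np.fft.fft2(gaussian_label)
    numerator = np.conj(G)[..., None] * F                   # conj(g_hat) ⊙ F^i, per channel
    denominator = (np.conj(F) * F).sum(axis=2).real + lam   # sum_k |F^k|^2 + assumed regularizer
    return numerator / denominator[..., None]               # template h_hat^i
```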
S22, after the histogram of oriented gradients template has been established, an inverse Fourier transform is carried out to obtain h[u]; in the next frame of the video, within the image range centered on the target center of the previous frame as the search area, a search sub-image e' is cut out, the histogram of oriented gradients feature φ of the current sub-image is extracted, and the histogram of oriented gradients score f_hog of the current frame can be obtained by calculating the linear function below; the effect is shown in fig. 3-c:
f_hog(φ, h) = Σ_{u∈Γ} h[u]^T φ[u]; (7)
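Evaluating formula (7) at every cyclic shift of the search sub-image is equivalent to an element-wise product in the Fourier domain summed over the K channels; a sketch under that standard equivalence follows (the function name is illustrative).

```python
import numpy as np

def hog_response(template_hat, features):
    """HOG response map f_hog of formula (7), computed in the frequency domain (S22)."""
    F = np.fft.fft2(features, axes=(0, 1))                   # DFT of phi, per channel
    resp_hat = (np.conj(template_hat) * F).sum(axis=2)       # sum over the K channels
    return np.real(np.fft.ifft2(resp_hat))                   # back to the spatial domain
```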
S23, in the online tracking stage, the histogram of oriented gradients is likewise disturbed by changes of the target in the scene, and the influence of object deformation in particular is large; therefore, after the tracking task of each frame is completed, the template of the histogram of oriented gradients feature is updated at the position predicted in the current frame:
wherein the two signals of the current frame are each calculated from formula (6) and are interpolated with the corresponding signals of the previous frame to give the updated final signals, and η_hog is the weight for the update of the histogram of oriented gradients template.
S3, feature fusion and classifier establishment:
S31, the color histogram feature is strongly affected when interference factors such as illumination change and image blur are present in the scene, while the histogram of oriented gradients feature is strongly affected when interference factors such as object deformation and fast motion are present. Fusing the two features therefore reduces the interference from these factors to a certain degree and improves the accuracy and robustness of the tracking model, so that a more accurate position and size of the target can be predicted in the tracking task and the target is not easily lost. Here the color histogram response f_hist obtained in S13 and the histogram of oriented gradients score f_hog obtained in S22 are fused by defining a linear function f(x), giving:
f(x) = γ_hog f_hog(x) + γ_hist f_hist(x); (9)
wherein γ_hog is the weight of the histogram of oriented gradients response and γ_hist is the weight of the color histogram response; the coordinate of the point at which f(x) attains its maximum is taken as the center coordinate of the target.
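The fusion and localization of S31 in code form; the 0.7/0.3 split of γ_hog and γ_hist is an assumed example, since the text does not state the values.

```python
import numpy as np

def fuse_and_locate(f_hog, f_hist, gamma_hog=0.7, gamma_hist=0.3):
    """Formula (9): fuse the two response maps and take the peak as the target center."""
    f = gamma_hog * f_hog + gamma_hist * f_hist
    cy, cx = np.unravel_index(np.argmax(f), f.shape)   # row/column of the maximum of f(x)
    return f, (cx, cy)
```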
S32, although the fusion of the two features performs well in most scenes, there is still considerable room for improvement in complicated video scenes with similar background, occlusion or out-of-view targets. Therefore another, more robust and more effective tracker with a neural network structure is added to improve the performance of the model. Considering that the running speed of a neural network is slow and that commonly available hardware cannot meet the real-time requirement, the neural network tracker is used only when the model fusing the former two features cannot complete the tracking task of the current frame well; only in this way can the performance of the model be exploited to the greatest extent. The key requirement is to let the feature-fusion model know when the neural network tracker needs to be switched in. Analysing how the three scores f, f_hist and f_hog (the argument (x) is written only where a mapping appears in a formula, and is omitted otherwise) change in different scenes shows that f and f_hog fluctuate strongly when the target deforms greatly or is occluded, so a classifier can be trained with these two values to provide the switching flag for the tracker.
A batch of video sequences is selected, and the fused output f from S31 and f_hog are recorded; the input of the data set is set to X = [max(f_hog); max(f)], and the output label h'_θ denotes the true value of the data set, an integer of 0 or 1, where 0 indicates that the tracking box of the model has deviated from the target and 1 indicates no deviation from the target. Target detection has the concept of Intersection-over-Union (IoU), which represents the overlap rate between the predicted bounding box and the ground-truth box, and IoU is used here as the basis for measuring whether the target has deviated; after experimental trials of several values, 0.35 proved a suitable boundary value, i.e. when IoU > 0.35, h'_θ = 1, and when IoU < 0.35, h'_θ = 0. The logistic regression function h_θ represents the output of the classifier:
The data are divided into a training set and a validation set in the proportion 7:3, and the weight θ in formula (10) is obtained after the training set converges under a cross-entropy loss function and a gradient descent algorithm over multiple iterations. The hyper-parameters are then fine-tuned with the validation data by evaluating the classification accuracy under different parameter values and selecting the value with the highest accuracy as the final parameter value, so that the classifier achieves a better classification result on the validation set.
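A sketch of the data construction and training of S32: labels are produced from IoU against the ground-truth box with the 0.35 boundary, and a two-input logistic regression is fitted by gradient descent on the cross-entropy loss. The helper names and the learning-rate/iteration values are illustrative assumptions.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-Union of two (x, y, w, h) boxes, used to label h'_theta (0.35 boundary)."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    yb = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def train_switch_classifier(X, y, lr=0.1, iters=2000):
    """Logistic regression h_theta of formula (10) on inputs X = [max(f_hog); max(f)],
    trained with cross-entropy and gradient descent on a 7:3 train/validation split."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(X)
    idx = np.random.permutation(n)
    tr, va = idx[: int(0.7 * n)], idx[int(0.7 * n):]
    Xb = np.hstack([X, np.ones((n, 1))])                  # add a bias column
    theta = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb[tr] @ theta))         # sigmoid
        theta -= lr * Xb[tr].T @ (p - y[tr]) / len(tr)    # gradient of the cross-entropy loss
    p_va = 1.0 / (1.0 + np.exp(-Xb[va] @ theta))
    val_acc = float(((p_va > 0.5) == (y[va] > 0.5)).mean())   # used for hyper-parameter tuning
    return theta, val_acc
```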
S4, judging whether the convolutional neural network structure tracker needs to be accessed:
S41, after the classifier model has been trained and fine-tuned in the previous step, it can judge, during the tracking stage, whether the model fusing the color histogram feature and the histogram of oriented gradients feature can adapt to the current video scene, and therefore whether the neural network tracker needs to be switched in. During tracking, f_hog obtained from formula (7) and f obtained from formula (9) are used as the input of the classifier, i.e. formula (10), and the obtained output is the switching flag. When the validation data in the previous step were used for parameter tuning, 0.5 was selected as a suitable threshold; when the output is greater than this threshold, the result of the fusion model is trustworthy and no switching is needed; when the output is less than this threshold, the result of the fusion model is not accepted, and the neural network tracker is switched in. The neural network tracker selected here is DaSiamRPN, which combines the idea and structure of the Region Proposal Network (RPN) from target detection; it copes better with some complex scenes, fits the size of the deformed target more accurately, and does not need to update the target template online, so there is no template pollution caused by accumulated errors.
S42, in the online tracking stage, the posterior pixel histogram template and the histogram of oriented gradients template must be updated by formulas (4) and (8) respectively, in order to adapt to changes of the scene in the video. Likewise, after the switched-in DaSiamRPN tracker completes the tracking task of the current frame, the updates with these two formulas are still required. However, because DaSiamRPN can also fail to track, the templates are updated only when the target response score predicted by the tracker for the current frame is high; the tracking task of the next frame then begins, until all video frames are finished. The whole tracking process is shown in fig. 4.
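The switching and gated-update logic of S41-S42 can be condensed as below; the bias convention follows the training sketch above, and the 0.9 response-score threshold for gating the updates after DaSiamRPN runs is an assumed example, not a value given in the text.

```python
import numpy as np

def should_switch(theta, f_hog_max, f_max, thresh=0.5):
    """S41: return True when the CNN tracker (DaSiamRPN) should be switched in, i.e. h_theta < 0.5."""
    x = np.array([f_hog_max, f_max, 1.0])                 # same [max(f_hog), max(f), bias] layout as training
    h = 1.0 / (1.0 + np.exp(-x @ theta))
    return h < thresh

def maybe_update_templates(cnn_response_score, do_update, score_thresh=0.9):
    """S42: apply the formula (4) and formula (8) updates only when the CNN tracker's
    response score for the current frame is high (assumed threshold)."""
    if cnn_response_score > score_thresh:
        do_update()
```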
Examples
To evaluate the performance of the present invention, experiments were performed on a test set of video sequences. The evaluation method, data set and evaluation system of the VOT (Visual Object Tracking) challenge were selected for the experiments. The data set comprises 60 video sequences involving scenes such as occlusion, illumination change, target motion, scale change, camera motion and out-of-view targets; several attributes can appear in one video sequence, and the visual attributes of different frames differ, so the model can be evaluated more accurately. Before VOT was proposed, the popular evaluation system initialized the tracker at the first frame of a sequence and let it run to the last frame. However, because a tracker may fail early due to one or two factors, such an evaluation ends up using only a small part of the sequence, which is wasteful. VOT proposes that the evaluation system should detect a failure when the tracker loses the target and reinitialize the tracker 5 frames after the failure, so as to make full use of the data set.
The experimental scores under the reset mechanism are examined first, as shown in Table 1:
Table 1 Scores of different algorithms under the reset mechanism
In Table 1, A-R rank represents the Accuracy and Robustness ranking index; Overlap is equivalent to accuracy and represents the overlap rate between the target predicted by the tracking method and the manually annotated ground-truth target, and the larger the Overlap the more accurate the prediction; Failure is used to evaluate the stability of the tracking method, and the smaller the value the better the stability. Compared with the 7 other tracking methods, the present method ranks first in accuracy and third in stability. The scoring trend of all algorithms in the table can also be seen more intuitively in Fig. 5. However, in a real scene it is impossible to reset after a tracking failure, so the earlier evaluation system without resets clearly has more reference value for real scenes; its experimental scores are shown in Table 2:
Table 2 Scores of different algorithms without the reset mechanism
In Table 2, AUC (Area Under the Curve) is an index for evaluating the performance of the algorithm; the larger the value, the better the performance. The speed indicator FPS (Frames Per Second) is likewise better when larger. It can be seen that, without the reset mechanism, i.e. when the scoring system does not re-locate the target after a tracking failure, the accuracy of the present method is the highest compared with the other 7 methods, and among the three methods with the highest accuracy it is the fastest. In addition, on a hardware configuration of CPU: Intel Core i7-6700 and GPU: GeForce GT730, experiments show that SiamFC, the fastest of the other methods in the table, reaches only 3 FPS, whereas the present method reaches 30 FPS. Compared with the other methods, the present method therefore has higher accuracy while keeping a respectable speed, and is more suitable for real scenes.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (1)

1. A single target tracking method based on feature compensation is characterized by comprising the following steps:
S1, establishing a target tracking model branch of the color histogram feature:
S11, before a target tracking task starts, calling an OpenCV toolkit, and cutting out a target sub-image E with background information on the basis of a target image which is manually marked;
S12, separating the foreground area from the background area of the target sub-image with the background information according to the size of the target and a certain proportion; meanwhile, the pixels are subjected to scale compression within an integer range of 0-32 pixel values, and pixel proportions relative to each pixel value in a corresponding foreground region and a corresponding background region, namely a foreground pixel proportion ρ(O) and a background pixel proportion ρ(B), are calculated by respectively depending on a foreground mask and a background mask which are the same in size, wherein the expression of the pixel proportions ρ is as follows:
ρ(O)=N(O)/|O|; (1-1)
ρ(B)=N(B)/|B|; (1-2)
wherein O denotes the image area of the foreground O, B denotes the image area of the background B, N(O) represents the number of non-zero pixel values in the image area of the foreground O, N(B) represents the number of non-zero pixel values in the image area of the background B, |O| represents the total number of pixel values in the image area of the foreground O, and |B| represents the total number of pixel values in the image area of the background B; the weight β_t of the posterior pixel color histogram template of the current frame is then calculated based on formulas (1-1) and (1-2):
Wherein t represents the current frame, and lambda is a hyper-parameter;
S13, in the next frame of the video, within the image range centered on the target center of the previous frame as the search area, a sub-image e is cut out as in S12 and its pixels are scale-compressed to obtain ψ; the weight β_{t-1} of the posterior pixel color histogram template of the previous frame is obtained according to formulas (1-1), (1-2) and (2), and the color histogram response f_hist is finally obtained using the integral image formula;
wherein ψ is the sub-image after M-channel pixel compression, defined on the current-frame cropped picture e; ψ_t is the sub-image after M-channel pixel compression of the current frame; H represents each pixel point of the picture over the corresponding integer range; u represents each one of the H grids, ψ[u] is the corresponding pixel point on ψ, and the superscript T denotes the matrix transpose;
S14, each time the tracking task of one frame is completed, the weight β_t of the posterior pixel histogram template is updated at the position predicted in the current frame, that is, the foreground pixel proportion ρ(O) and the background pixel proportion ρ(B) are updated respectively, giving the updated pixel proportion ρ_t(O) of the foreground O of the current frame and the updated pixel proportion ρ_t(B) of the background B of the current frame:
ρ_t(O) = (1-η_hist)ρ_{t-1}(O) + η_hist ρ'_t(O)
ρ_t(B) = (1-η_hist)ρ_{t-1}(B) + η_hist ρ'_t(B); (4)
wherein ρ'_t(O) is the pixel proportion in the image area of the foreground O of the current frame, ρ'_t(B) is the pixel proportion in the image area of the background B of the current frame, ρ_{t-1}(O) is the pixel proportion in the image area of the foreground O of the previous frame, ρ_{t-1}(B) is the pixel proportion in the image area of the background B of the previous frame, and η_hist is the weight for the pixel-proportion update;
S2, establishing a target tracking model branch of the histogram of oriented gradients feature:
S21, on the basis of the target image to be tracked selected with the rectangular frame in S11, another target region sub-image E' of a different size but likewise carrying background information is cut out, the three-dimensional histogram of oriented gradients features Φ_k of K channels are extracted and multiplied by the cosine window function in the OpenCV package, and the template of the histogram of oriented gradients features is calculated:
wherein all variables are defined in the frequency domain and are obtained through the discrete Fourier transform; u represents each grid of Γ, where Γ is the integer range of grids corresponding to Φ_k; the superscript i denotes each of the K channels; one factor is the conjugate of the Gaussian signal after the Fourier transform, the conjugate being denoted by *; ⊙ denotes element-wise multiplication; the other factor is each channel element of the histogram of oriented gradients feature Φ_k obtained by the Fourier transform; and K is the number of channels;
S22, the histogram of oriented gradients template obtained in S21 is inverse-Fourier-transformed to obtain h[u]; in the next frame of the video, within the image range centered on the target center of the previous frame as the search area, a sub-image e' is cut out, the histogram of oriented gradients feature φ of the current sub-image is extracted, and the histogram of oriented gradients score f_hog of the current frame is obtained by calculating the linear function
f_hog(φ, h) = Σ_{u∈Γ} h[u]^T φ[u]; (7)
S23, after the tracking task of each frame is completed, the template of the histogram of oriented gradients feature is updated at the position predicted in the current frame, that is, the two updated final signals of formula (8) are obtained respectively;
wherein the two signals of the current frame are each calculated from formula (6) and are interpolated with the corresponding signals of the previous frame to give the updated final signals, and η_hog is the weight for the update of the histogram of oriented gradients template;
S3, feature fusion and classifier establishment:
S31, the color histogram response f_hist obtained in S13 and the histogram of oriented gradients score f_hog obtained in S22 are fused by defining a linear function f(x), giving:
f(x) = γ_hog f_hog(x) + γ_hist f_hist(x); (9)
wherein γ_hog is the weight of the histogram of oriented gradients response and γ_hist is the weight of the color histogram response; the coordinate of the point at which f(x) attains its maximum is taken as the center coordinate of the target;
S32, a classifier is trained from f and f_hog: a batch of video sequences is selected, the fused output f from S31 and f_hog are recorded, the input of the data set is set to X = [max(f_hog); max(f)], and the output label h'_θ denotes the true value of the data set, an integer of 0 or 1, where 0 indicates that the tracking box of the model has deviated from the target and 1 indicates no deviation from the target; the logistic regression function h_θ represents the output of the classifier:
the data are divided into a training set and a validation set in the proportion 7:3, and the parameter θ of the logistic regression model in formula (10) is obtained after the training set converges under a cross-entropy loss function and a gradient descent algorithm over multiple iterations; the hyper-parameters are then fine-tuned with the validation data by evaluating the classification accuracy under different parameter values and selecting the value with the highest accuracy as the final parameter value, so that the classifier achieves a better classification result on the validation set;
S4, judging whether the convolutional neural network structure tracker needs to be accessed:
S41, f and f_hog are input into the classifier obtained in S32 to obtain the output; 0.5 is selected as a threshold on the continuous values output by the classifier of S32; when the output is greater than 0.5, the result of the fusion model is trustworthy and there is no need to switch in the convolutional neural network tracker; when the output is less than 0.5, the result of the fusion model is not accepted, and the convolutional neural network tracker needs to be switched in;
S42, when the target response score predicted by the convolutional neural network tracker for the current frame is high, S14 and S23 are applied again, that is, the posterior pixel histogram template and the histogram of oriented gradients template are updated respectively; the tracking task of the next frame is then entered, until all video frames are finished.
CN201910258571.8A 2019-04-01 2019-04-01 Single target tracking method based on characteristic compensation Active CN109993775B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910258571.8A CN109993775B (en) 2019-04-01 2019-04-01 Single target tracking method based on characteristic compensation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910258571.8A CN109993775B (en) 2019-04-01 2019-04-01 Single target tracking method based on characteristic compensation

Publications (2)

Publication Number Publication Date
CN109993775A true CN109993775A (en) 2019-07-09
CN109993775B CN109993775B (en) 2023-03-21

Family

ID=67132176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910258571.8A Active CN109993775B (en) 2019-04-01 2019-04-01 Single target tracking method based on characteristic compensation

Country Status (1)

Country Link
CN (1) CN109993775B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490148A (en) * 2019-08-22 2019-11-22 四川自由健信息科技有限公司 A kind of recognition methods for behavior of fighting
CN110647836A (en) * 2019-09-18 2020-01-03 中国科学院光电技术研究所 Robust single-target tracking method based on deep learning
CN110675423A (en) * 2019-08-29 2020-01-10 电子科技大学 Unmanned aerial vehicle tracking method based on twin neural network and attention model
CN110738149A (en) * 2019-09-29 2020-01-31 深圳市优必选科技股份有限公司 Target tracking method, terminal and storage medium
CN111046796A (en) * 2019-12-12 2020-04-21 哈尔滨拓博科技有限公司 Low-cost space gesture control method and system based on double-camera depth information
CN111260686A (en) * 2020-01-09 2020-06-09 滨州学院 Target tracking method and system for anti-shielding multi-feature fusion of self-adaptive cosine window
CN112991395A (en) * 2021-04-28 2021-06-18 山东工商学院 Vision tracking method based on foreground condition probability optimization scale and angle
CN115063449A (en) * 2022-07-06 2022-09-16 西北工业大学 Hyperspectral video-oriented three-channel video output method for target tracking


Patent Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0795385A (en) * 1993-09-21 1995-04-07 Dainippon Printing Co Ltd Method and device for clipping picture
EP0951182A1 (en) * 1998-04-14 1999-10-20 THOMSON multimedia S.A. Method for detecting static areas in a sequence of video pictures
EP1126414A2 (en) * 2000-02-08 2001-08-22 The University Of Washington Video object tracking using a hierarchy of deformable templates
WO2010001364A2 (en) * 2008-07-04 2010-01-07 Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi Complex wavelet tracker
DE102009038364A1 (en) * 2009-08-23 2011-02-24 Friedrich-Alexander-Universität Erlangen-Nürnberg Method and system for automatic object recognition and subsequent object tracking according to the object shape
US20130083192A1 (en) * 2011-09-30 2013-04-04 Siemens Industry, Inc. Methods and System for Stabilizing Live Video in the Presence of Long-Term Image Drift
CN102750708A (en) * 2012-05-11 2012-10-24 天津大学 Affine motion target tracing algorithm based on fast robust feature matching
CN103426178A (en) * 2012-05-17 2013-12-04 深圳中兴力维技术有限公司 Target tracking method and system based on mean shift in complex scene
US20150146022A1 (en) * 2013-11-25 2015-05-28 Canon Kabushiki Kaisha Rapid shake detection using a cascade of quad-tree motion detectors
CN103793926A (en) * 2014-02-27 2014-05-14 西安电子科技大学 Target tracking method based on sample reselecting
CN104299247A (en) * 2014-10-15 2015-01-21 云南大学 Video object tracking method based on self-adaptive measurement matrix
CN104361611A (en) * 2014-11-18 2015-02-18 南京信息工程大学 Group sparsity robust PCA-based moving object detecting method
WO2017088050A1 (en) * 2015-11-26 2017-06-01 Sportlogiq Inc. Systems and methods for object tracking and localization in videos with adaptive image representation
WO2017132830A1 (en) * 2016-02-02 2017-08-10 Xiaogang Wang Methods and systems for cnn network adaption and object online tracking
WO2017143589A1 (en) * 2016-02-26 2017-08-31 SZ DJI Technology Co., Ltd. Systems and methods for visual target tracking
US20180372499A1 (en) * 2017-06-25 2018-12-27 Invensense, Inc. Method and apparatus for characterizing platform motion
CN108346159A (en) * 2018-01-28 2018-07-31 北京工业大学 A kind of visual target tracking method based on tracking-study-detection
CN108447078A (en) * 2018-02-28 2018-08-24 长沙师范学院 The interference of view-based access control model conspicuousness perceives track algorithm
CN109360223A (en) * 2018-09-14 2019-02-19 天津大学 A kind of method for tracking target of quick spatial regularization

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Dai Fengzhi et al.: "Survey of research progress on video tracking based on deep learning", Computer Engineering and Applications *
Li Jie et al.: "Template matching tracking algorithm based on particle swarm optimization", Journal of Computer Applications *
Wu Xing et al.: "Robust feature recognition and accurate path tracking for vision-guided AGVs", Transactions of the Chinese Society for Agricultural Machinery *
Lu Weijian et al.: "Robust moving target tracking method based on multiple templates", Transducer and Microsystem Technologies *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490148A (en) * 2019-08-22 2019-11-22 四川自由健信息科技有限公司 A kind of recognition methods for behavior of fighting
CN110675423A (en) * 2019-08-29 2020-01-10 电子科技大学 Unmanned aerial vehicle tracking method based on twin neural network and attention model
CN110647836A (en) * 2019-09-18 2020-01-03 中国科学院光电技术研究所 Robust single-target tracking method based on deep learning
CN110738149A (en) * 2019-09-29 2020-01-31 深圳市优必选科技股份有限公司 Target tracking method, terminal and storage medium
CN111046796A (en) * 2019-12-12 2020-04-21 哈尔滨拓博科技有限公司 Low-cost space gesture control method and system based on double-camera depth information
CN111260686A (en) * 2020-01-09 2020-06-09 滨州学院 Target tracking method and system for anti-shielding multi-feature fusion of self-adaptive cosine window
CN111260686B (en) * 2020-01-09 2023-11-10 滨州学院 Target tracking method and system for anti-shielding multi-feature fusion of self-adaptive cosine window
CN112991395A (en) * 2021-04-28 2021-06-18 山东工商学院 Vision tracking method based on foreground condition probability optimization scale and angle
CN112991395B (en) * 2021-04-28 2022-04-15 山东工商学院 Vision tracking method based on foreground condition probability optimization scale and angle
CN115063449A (en) * 2022-07-06 2022-09-16 西北工业大学 Hyperspectral video-oriented three-channel video output method for target tracking

Also Published As

Publication number Publication date
CN109993775B (en) 2023-03-21

Similar Documents

Publication Publication Date Title
CN109993775B (en) Single target tracking method based on characteristic compensation
CN111797716B (en) Single target tracking method based on Siamese network
CN110335290B (en) Twin candidate region generation network target tracking method based on attention mechanism
CN108986140B (en) Target scale self-adaptive tracking method based on correlation filtering and color detection
CN107424177B (en) Positioning correction long-range tracking method based on continuous correlation filter
CN109800689B (en) Target tracking method based on space-time feature fusion learning
CN110135500B (en) Target tracking method under multiple scenes based on self-adaptive depth characteristic filter
CN108062531A (en) A kind of video object detection method that convolutional neural networks are returned based on cascade
CN111260688A (en) Twin double-path target tracking method
CN109859241B (en) Adaptive feature selection and time consistency robust correlation filtering visual tracking method
CN113706581B (en) Target tracking method based on residual channel attention and multi-level classification regression
CN108364305B (en) Vehicle-mounted camera video target tracking method based on improved DSST
CN111091583B (en) Long-term target tracking method
CN110569706A (en) Deep integration target tracking algorithm based on time and space network
CN110009663B (en) Target tracking method, device, equipment and computer readable storage medium
CN112329784A (en) Correlation filtering tracking method based on space-time perception and multimodal response
CN111429485A (en) Cross-modal filtering tracking method based on self-adaptive regularization and high-reliability updating
CN109544584B (en) Method and system for realizing inspection image stabilization precision measurement
CN113129332A (en) Method and apparatus for performing target object tracking
CN110544267A (en) correlation filtering tracking method for self-adaptive selection characteristics
CN117274314A (en) Feature fusion video target tracking method and system
CN117058192A (en) Long-time tracking method integrating space-time constraint and adjacent area re-detection
CN110991565A (en) Target tracking optimization algorithm based on KCF
CN112614158B (en) Sampling frame self-adaptive multi-feature fusion online target tracking method
CN110827324B (en) Video target tracking method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant