CN111612817A - Target tracking method based on depth feature adaptive fusion and context information - Google Patents

Target tracking method based on depth feature adaptive fusion and context information

Info

Publication number
CN111612817A
Authority
CN
China
Prior art keywords
feature
shallow
deep
target
response
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010375319.8A
Other languages
Chinese (zh)
Inventor
纪元法
何传骥
孙希延
付文涛
严素清
符强
王守华
黄建华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology filed Critical Guilin University of Electronic Technology
Priority to CN202010375319.8A
Publication of CN111612817A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/248 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20024 Filtering details
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Abstract

The invention discloses a target tracking method based on depth feature adaptive fusion and context information, which comprises the steps of: firstly, obtaining a first frame image of a video image sequence, and establishing a deep layer feature model and a shallow layer feature model based on a context-aware framework; then, obtaining a plurality of second frame images of the video image sequence, and calculating deep layer feature responses and shallow layer feature responses of the corresponding tracking target by utilizing the deep layer feature model and the shallow layer feature model; obtaining the position of the tracking target in the corresponding second frame image according to the response sum after the deep layer feature response and the shallow layer feature response are adaptively fused; and judging the average peak correlation energy based on a threshold value, and updating the deep layer feature model and the shallow layer feature model until the video image sequence is finished, so that the target can be effectively tracked with high accuracy.

Description

Target tracking method based on depth feature adaptive fusion and context information
Technical Field
The invention relates to the technical field of image processing, in particular to a target tracking method based on depth feature adaptive fusion and context information.
Background
Visual tracking has long been a major concern in the field of computer vision. In recent years, visual tracking algorithms based on correlation filtering have developed rapidly and have certain advantages in tracking speed and precision. However, target tracking research still faces certain difficulties: external interference factors such as target occlusion, rapid motion and illumination change directly affect the performance of a tracking algorithm.
Methods based on correlation filtering and methods based on deep learning are the current mainstream target tracking algorithms; correlation filtering, thanks to the speed advantage of its fast computation in the frequency domain, is one of the research hotspots in the target tracking field. Bolme et al. first introduced correlation filtering into the field of target tracking and proposed the MOSSE algorithm, but its gray-scale features cannot accurately describe the appearance of a target. Henriques et al. proposed the CSK algorithm by improving the kernel function, completing dense sampling through cyclic shifts and solving the problem of insufficient training samples; however, CSK adopts single-channel gray-scale features with limited descriptive power. Henriques et al. then proposed the kernelized correlation filter (KCF) algorithm, introducing a kernel function into ridge regression and extending the single-channel gray-scale features to multi-channel HOG features. Possegger et al. used color histogram features to describe the target appearance, with good results. Martin Danelljan et al. extended the CSK algorithm with color attribute (Color Names, CN) features and reduced the computation through Principal Component Analysis (PCA) dimensionality reduction, improving tracking precision.
The above methods adopt only a single feature to describe the target appearance, so target discrimination is poor and easily disturbed in complex scenes. Traditional correlation filtering tracking algorithms suffer from a serious boundary effect, hand-crafted features cannot describe the appearance of the tracked target well, and a suitable feature fusion strategy is lacking; these factors seriously affect the accuracy of the algorithms, which cannot track the target effectively.
Disclosure of Invention
The invention aims to provide a target tracking method based on depth feature adaptive fusion and context information, which improves the accuracy of an algorithm and can effectively track a target.
In order to achieve the above object, the present invention provides a target tracking method based on depth feature adaptive fusion and context information, comprising:
acquiring a first frame of image of a video image sequence, and establishing a deep layer feature model and a shallow layer feature model based on a context-aware framework;
acquiring a plurality of second frame images of the video image sequence, and calculating a deep layer feature response and a shallow layer feature response of a corresponding tracking target by using the deep layer feature model and the shallow layer feature model;
obtaining the position of the tracking target in the corresponding second frame image according to the response sum after the deep layer feature response and the shallow layer feature response are adaptively fused;
and judging the average peak value correlation energy based on a threshold value, and updating the deep layer feature model and the shallow layer feature model until the video image sequence is ended.
The acquiring a first frame of image of a video image sequence and establishing a deep layer feature model and a shallow layer feature model based on a context-aware framework includes:
and introducing a background sample of the target in the first frame of image of the acquired video image sequence as context information into template learning, simultaneously acquiring three-layer convolution features of the target region and of the four image blocks above, below, to the left and to the right of the target, and calculating a deep layer feature model by using a detection formula.
The acquiring a first frame of image of a video image sequence, and establishing a deep layer feature model and a shallow layer feature model based on a context-aware framework, further includes:
and establishing a shallow feature model according to the color histogram feature and the HOG feature in the background sample.
Acquiring a plurality of second frame images of the video image sequence, and calculating a deep layer feature response and a shallow layer feature response of a corresponding tracking target by using the deep layer feature model and the shallow layer feature model, wherein the method comprises the following steps:
and sequentially acquiring a plurality of second frame images, extracting deep features and shallow features of the tracking target corresponding to the second frame images based on the target position, and calculating corresponding deep feature responses and shallow feature responses by using the deep feature model and the shallow feature model.
Wherein, obtaining a plurality of second frame images of the video image sequence, and calculating a deep layer feature response and a shallow layer feature response of a corresponding tracking target by using the deep layer feature model and the shallow layer feature model, further comprises:
and performing depth feature extraction by using a VGG-NET-19 depth network, and adjusting the size of the corresponding deep feature image by using a bilinear interpolation method.
Obtaining the position of the tracking target in the corresponding second frame image according to the response sum after the deep layer feature response and the shallow layer feature response are adaptively fused, wherein the obtaining of the position of the tracking target in the corresponding second frame image comprises:
and sorting the local maximum values of the deep layer feature response and the shallow layer feature response in ascending order to serve as candidate states, and obtaining the candidate state with the minimum overall loss and the corresponding weight coefficients based on a minimized loss function.
Wherein, obtaining the position of the tracking target in the corresponding second frame image according to the response sum after the self-adaptive fusion of the deep layer feature response and the shallow layer feature response, further comprises:
and combining the response graphs of the deep layer features and the shallow layer features by using a self-adaptive feature fusion strategy, fusing the loss weight coefficients, introducing a relaxation variable and a distance function to perform prediction quality evaluation, and determining a main peak and an interference peak to obtain the position of the tracking target in the corresponding second frame image.
Wherein the determining an average peak correlation energy based on a threshold and updating the deep layer feature model and the shallow layer feature model until the video image sequence ends comprises:
and calculating corresponding average peak correlation energy based on the corresponding second frame image, and updating the deep layer feature model and the shallow layer feature model when the average peak correlation energy is larger than a historical average peak correlation energy average value until the video image sequence is ended.
The invention relates to a target tracking method based on depth feature adaptive fusion and context information, which comprises the steps of: firstly, obtaining a first frame image of a video image sequence, and establishing a deep layer feature model and a shallow layer feature model based on a context-aware framework; then, obtaining a plurality of second frame images of the video image sequence, and calculating deep layer feature responses and shallow layer feature responses of the corresponding tracking target by utilizing the deep layer feature model and the shallow layer feature model; obtaining the position of the tracking target in the corresponding second frame image according to the response sum after the deep layer feature response and the shallow layer feature response are adaptively fused; and judging the average peak correlation energy based on a threshold value, and updating the deep layer feature model and the shallow layer feature model until the video image sequence is finished, so that the target can be effectively tracked with high accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic step diagram of a target tracking method based on depth feature adaptive fusion and context information according to the present invention.
Fig. 2 is a graph comparing accuracy curves of 8 algorithms provided by the present invention.
Fig. 3 is a graph comparing success rate curves of 8 algorithms provided by the present invention.
FIG. 4 is a flowchart illustrating a target tracking method based on depth feature adaptive fusion and context information according to the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
In the description of the present invention, it is to be understood that the terms "upper", "lower", "left", "right", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience in describing the present invention and simplifying the description, but do not indicate or imply that the referred devices or elements must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present invention. Further, in the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Referring to fig. 1, the present invention provides a target tracking method based on depth feature adaptive fusion and context information, including:
s101, obtaining a first frame of image of a video image sequence, and establishing a deep layer feature model and a shallow layer feature model based on a context-aware framework.
In particular, in correlation filtering tracking, the background information around the target has a very significant influence on the performance of the tracker. Correlation filtering trackers are prone to the boundary effect because of the cyclic nature of their samples; a cosine window can effectively limit the boundary effect but also reduces background information, so tracking may fail when the target deforms or the background is cluttered. To address this problem, a background sample of the target in the first frame image of the acquired video image sequence is introduced into template learning as context information, and three-layer convolution features of the target region and of the four image blocks above, below, to the left and to the right of the target are collected as the training template of the depth feature model. These four image blocks are the context image blocks. A shallow feature model is established from the color histogram feature and the HOG feature in the background sample. For the tracked object $n_0 \in \mathbb{R}^n$, $k$ context image blocks $n_i \in \mathbb{R}^n$ are extracted around it, with corresponding circulant matrices $N_0 \in \mathbb{R}^{n \times n}$ and $N_i \in \mathbb{R}^{n \times n}$. The deep feature model is calculated with the detection formula:
$$r_d = \mathcal{F}^{-1}\!\left(\hat{z}^{*} \odot \frac{\hat{d}_0 \odot \hat{y}}{\hat{d}_0^{*} \odot \hat{d}_0 + \lambda_1 + \lambda_2 \sum_{i=1}^{k} \hat{d}_i^{*} \odot \hat{d}_i}\right)$$

where $\lambda_1$ and $\lambda_2$ are the regularization parameters, $\hat{\cdot}$ denotes the Fourier form, $^{*}$ the complex conjugate, $z$ is the circulant matrix of the search region, $r_d$ indicates the location response of the target, $\odot$ is the dot product operation, $\hat{y}$ is the Fourier transform of the desired response, $\hat{d}_0$ is the Fourier transform of the depth features of the target region, $\hat{d}_i$ is the Fourier transform of the depth features of the $i$-th area around the target, $\hat{d}_0^{*} \odot \hat{d}_0$ is the dot product of the depth features of the target image, and $\hat{d}_i^{*} \odot \hat{d}_i$ is the dot product of the depth features of the image blocks around the target.
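A minimal single-channel NumPy sketch of a closed-form context-aware filter of the form above may make the computation concrete. The function names, the Gaussian label y and the regularization values lam1 and lam2 are illustrative assumptions, not the patented implementation:

```python
import numpy as np

def train_context_aware_filter(target_feat, context_feats, y, lam1=1e-4, lam2=0.5):
    """Closed-form context-aware filter in the Fourier domain (single channel).

    target_feat   : (H, W) array, features of the target region (d_0)
    context_feats : list of (H, W) arrays, features of the surrounding patches (d_i)
    y             : (H, W) array, desired Gaussian response centred on the target
    """
    d0_hat = np.fft.fft2(target_feat)
    y_hat = np.fft.fft2(y)
    # energy of the target patch plus lam2-weighted energy of the context patches
    denom = np.conj(d0_hat) * d0_hat + lam1
    for ctx in context_feats:
        di_hat = np.fft.fft2(ctx)
        denom = denom + lam2 * np.conj(di_hat) * di_hat
    return np.conj(d0_hat) * y_hat / denom          # filter in the Fourier domain

def detect(w_hat, search_feat):
    """Response map r_d over a new search-region feature patch z."""
    z_hat = np.fft.fft2(search_feat)
    return np.real(np.fft.ifft2(w_hat * z_hat))
```

The filter is trained to produce a high response on the target patch while the λ2 term suppresses its response on the four context patches.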
The method has the advantage that multi-layer convolution features are collected as the deep features while the color histogram feature and the HOG feature serve as the shallow features, so the appearance of the target can be described accurately and effectively; a context-aware framework is introduced and the four image blocks above, below, to the left and to the right of the target are collected around it, which reduces the boundary effect.
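The four context patches can be cropped directly around the target rectangle; a minimal sketch, assuming an (x, y, w, h) box whose neighbours lie fully inside the frame (real code would clamp to the image borders):

```python
def context_patches(frame, box):
    """Crop the four context patches above, below, left and right of the target.

    frame : (H, W, 3) image array; box : (x, y, w, h) target rectangle.
    """
    x, y, w, h = box
    offsets = [(0, -h), (0, h), (-w, 0), (w, 0)]   # up, down, left, right
    return [frame[y + dy: y + dy + h, x + dx: x + dx + w] for dx, dy in offsets]
```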
S102, obtaining a plurality of second frame images of the video image sequence, and calculating a deep layer feature response and a shallow layer feature response of the corresponding tracking target by using the deep layer feature model and the shallow layer feature model.
Specifically, the second frame images other than the first frame are acquired in turn; the deep features and shallow features of the tracking target in each second frame image are extracted at the target position, and the corresponding deep and shallow feature responses are calculated with the deep feature model and the shallow feature model. For the deep feature response, depth features are extracted with the VGG-NET-19 network. Taking the MotorRolling sequence of the OTB100 dataset as an example, conv5-4, which carries more semantic information, and conv3-4 and conv4-4, which carry more detail information, are selected from the feature maps to describe the target appearance. The bilinear interpolation weight $\beta_{ik}$ depends on the positions of the adjacent feature map locations $i$ and $k$, and the feature vector at position $i$ is expressed as:

$$x_i = \sum_k \beta_{ik} m_k$$
after the feature maps of the conv3-4, conv4-4 and conv5-4 convolutional layers are subjected to bilinear interpolation and visualization processing, shallow feature maps such as conv3-4 and conv4-4 have higher resolution, and the contour of the target can be described more accurately. As the depth increases, the conv5-4 deep features describe the range of regions where the target is located, and the brightness is higher. Through sequence comparison, when the shape and the background of the tracked target change simultaneously, the extracted depth features can still distinguish the target.
The shallow feature response is calculated as follows. Shallow features are mainly hand-crafted features, including RGB pixels, HOG, CN and the like; they contain detailed information such as texture and color, have high spatial resolution, and are suitable for high-precision positioning. The method extracts the color histogram feature and the HOG feature as shallow features. The color histogram response is computed from an $M$-channel feature image $\psi_x: \mathcal{H} \to \mathbb{R}^M$ defined on a finite grid $\mathcal{H}$:

$$response_{hist}(x) = g(\psi_x)$$
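As an illustration of the histogram term, the sketch below scores each pixel with a foreground/background ratio histogram over M colour bins. The histogram construction (fg_hist, bg_hist) and the smoothing constant are assumptions, since the description does not spell them out:

```python
import numpy as np

def histogram_response(bin_map, fg_hist, bg_hist, eps=1e-3):
    """Per-pixel foreground probability from M-bin colour histograms.

    bin_map : (H, W) int array of colour-bin indices (the feature image psi_x)
    fg_hist : (M,) colour-bin counts inside the target box
    bg_hist : (M,) colour-bin counts in the surrounding background
    """
    prob = fg_hist / (fg_hist + bg_hist + eps)   # per-bin foreground likelihood
    return prob[bin_map]                         # look up each pixel's bin
```

Averaging this score over a sliding box (e.g. with an integral image) gives the histogram response at every candidate location.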
S103, obtaining the position of the tracking target in the corresponding second frame image according to the response sum after the deep layer feature response and the shallow layer feature response are adaptively fused.
Specifically, depth features encode high-level semantic information, are insensitive to deformation, and are suitable for coarse positioning, while shallow features have higher detail resolution and are suitable for accurate positioning. The two kinds of features are therefore treated separately: the deep features provide robustness and the shallow features emphasize accuracy, and the two are fused adaptively to achieve complementarity. Three-layer convolution features are collected as the deep features and the color histogram and HOG features as the shallow features; each trains its own correlation filter, forming two independent appearance models whose response maps are combined with an adaptive feature fusion strategy:

$$y_\beta(t) = \beta_d y_d(t) + \beta_s y_s(t)$$

where $y_d$ denotes the deep feature score, $y_s$ the shallow feature score, $y_\beta$ the total score obtained by weighting the two, and $\beta = (\beta_d, \beta_s)$ the weights of the deep and shallow scores.
The response graph can reflect the accuracy and robustness of target positioning, the accuracy is related to the response sharpness degree around the predicted target, and the sharper the main peak is, the stronger the accuracy is; robustness is related to the interval from the main peak to the interference peak, and the larger the distance from the main peak to the secondary peak is, the stronger the robustness is. In order to evaluate the reliability of the prediction target, a prediction quality evaluation method is adopted:
$$\xi_{t^*}\{y\} = \min_{t \in \mathbb{R}^2} \frac{y(t^*) - y(t)}{\Delta(t - t^*)}$$

where $y$ represents the detection score function of the image search region, $y(t) \in \mathbb{R}$ is the target prediction score at position $t \in \mathbb{R}^2$, and $t^*$ represents the candidate prediction target. The $\Delta$ distance function is defined as:

$$\Delta(\tau) = 1 - e^{-\frac{\|\tau\|^2}{2\sigma^2}}$$

The slack variable $\xi = \xi_{t^*}\{y_\beta\}$ is introduced, and the score weights $\beta$ and the target state $t^*$ are estimated jointly from the formula combining the response maps of the two features; maximizing the quality assessment by minimizing the loss function yields:

$$\min_{\beta,\,\xi}\ \tfrac{1}{2}\left\|\beta - \beta_0\right\|^2 - \mu\,\xi \qquad \text{s.t.}\quad y_\beta(t^*) - y_\beta(t) \ge \Delta(t - t^*)\,\xi \quad \forall\, t$$
in actual operation, local maximum values are respectively searched from deep layer scores and shallow layer scores, the local maximum values are sorted and screened according to response values in ascending order and then serve as limited candidate states omega, and each state t is optimized through a minimization loss function*∈ Ω, and then selecting the candidate state t with the set, i.e., minimum, overall loss*As the final prediction result, the corresponding weight coefficient β is obtained (β)ds). The method adopts a deep and shallow feature adaptive fusion strategy to combine the response graphs of the two features, adaptively changes the feature fusion weight according to different tracking backgrounds, effectively adapts to various tracking scenes, and improves the tracking performance.
And S104, judging the average peak value correlation energy based on a threshold value, and updating the deep layer feature model and the shallow layer feature model until the video image sequence is finished.
Specifically, the choice of model update strategy has a significant impact on the performance of the correlation filter. Interference during tracking is inevitable, for example target loss, background occlusion and blurring, and updating the model with wrong information may cause tracking drift or even failure. The corresponding average peak correlation energy is calculated for each second frame image; the average peak correlation energy (APCE) evaluates the confidence of the response and judges the reliability of the tracking result. This index reflects the reliability of the tracking result and the degree of fluctuation of the response map, and is calculated as:
$$APCE = \frac{\left|F_{max} - F_{min}\right|^2}{\operatorname{mean}\left(\sum_{w,h}\left(F_{w,h} - F_{min}\right)^2\right)}$$
where $F_{max}$ denotes the highest response, $F_{min}$ the lowest response, $F_{w,h}$ the response at location $(w, h)$, and $\operatorname{mean}(\cdot)$ the average of the values in parentheses. When the target deforms or the background interferes, the response map fluctuates severely, multi-peak interference occurs, and the APCE falls to a lower state. When the target is undisturbed, the response map has a single sharp peak and the APCE stays in a higher state. Therefore the model is not updated when the APCE value decreases significantly; it is updated only when both the APCE value and $F_{max}$ exceed their historical mean values by a certain ratio, which reduces the number of model updates and also reduces model drift.
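A direct NumPy transcription of the APCE formula, with the simple historical-mean gate used in the next paragraph (any extra ratio factor on the threshold would be a tuning choice):

```python
import numpy as np

def apce(resp):
    """Average peak correlation energy of a 2-D response map."""
    f_max, f_min = resp.max(), resp.min()
    return abs(f_max - f_min) ** 2 / np.mean((resp - f_min) ** 2)

def should_update(apce_value, apce_history):
    """Update only when the current APCE exceeds the historical APCE mean."""
    return len(apce_history) == 0 or apce_value > np.mean(apce_history)
```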
When the average peak correlation energy is larger than the historical mean of the average peak correlation energy, i.e. the threshold, the deep layer feature model and the shallow layer feature model are updated, until the video image sequence ends. The updated models are:
$$\alpha_{deep\_t} = (1 - \eta_{deep})\,\alpha_{deep\_t-1} + \eta_{deep}\,\tilde{\alpha}_{deep}$$
$$\alpha_{shallow\_t} = (1 - \eta_{shallow})\,\alpha_{shallow\_t-1} + \eta_{shallow}\,\tilde{\alpha}_{shallow}$$

where $\alpha_{deep\_t-1}$ is the depth feature model of the previous frame, $\alpha_{deep\_t}$ the depth feature model of the current frame, $\eta_{deep}$ the learning rate of the depth features, $\alpha_{shallow\_t-1}$ the shallow feature model of the previous frame, $\alpha_{shallow\_t}$ the shallow feature model of the current frame, $\eta_{shallow}$ the learning rate of the shallow features, and $\tilde{\alpha}_{deep}$, $\tilde{\alpha}_{shallow}$ the models newly computed on the current frame.
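The update itself is a running linear interpolation between the old model and the model computed on the current frame; a one-line sketch per model (the η values shown are illustrative, not taken from the description):

```python
def update_model(alpha_prev, alpha_new, eta):
    """alpha_t = (1 - eta) * alpha_{t-1} + eta * alpha_new, element-wise."""
    return (1.0 - eta) * alpha_prev + eta * alpha_new

# deep and shallow models keep separate learning rates, e.g.:
# alpha_deep    = update_model(alpha_deep,    new_deep_model,    eta=0.01)
# alpha_shallow = update_model(alpha_shallow, new_shallow_model, eta=0.025)
```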
For example, the OTB100 dataset is used to evaluate the tracking performance of the method. The evaluation results are compared in terms of the Precision plot and the Success plot. The precision plot uses the CLE (Center Location Error), defined as the Euclidean distance between the target coordinates detected by the tracking method and the ground-truth target coordinates. The success rate refers to the overlap ratio of bounding boxes: given the target bounding box $r_t$ detected by the method and the ground-truth bounding box $r_a$, the overlap ratio is defined as:
$$S = \frac{\left|r_t \cap r_a\right|}{\left|r_t \cup r_a\right|}$$
where $\cup$ and $\cap$ represent the union and intersection of the two regions, respectively, and $|\cdot|$ represents the number of pixels in a region.
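The overlap ratio is the standard intersection-over-union; a short sketch for axis-aligned (x, y, w, h) boxes:

```python
def overlap_ratio(box_t, box_a):
    """IoU of two boxes given as (x, y, w, h): |r_t ∩ r_a| / |r_t ∪ r_a| in pixels."""
    x1 = max(box_t[0], box_a[0])
    y1 = max(box_t[1], box_a[1])
    x2 = min(box_t[0] + box_t[2], box_a[0] + box_a[2])
    y2 = min(box_t[1] + box_t[3], box_a[1] + box_a[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = box_t[2] * box_t[3] + box_a[2] * box_a[3] - inter
    return inter / union if union > 0 else 0.0
```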
The experimental hardware environment is a Win10 operating system, an Intel Core i7-8750H (2.20 GHz) processor, 8 GB of memory and Matlab R2018a. The algorithm is compared with the SRDCF, STAPLE, LCT, RPT, SAMF, KCF and CSK tracking algorithms on the OTB100 dataset; the precision and success-rate curves of the 8 algorithms are shown in Figs. 2 and 3, where OUR is the method provided by the invention. The method ranks first in both precision and success rate: in precision it improves on the second-place LCT algorithm by 0.2%, on SRDCF by 1.5% and on KCF by 10.8%; in success rate it improves on SRDCF by 0.2% and on KCF by 20.9%. It can be seen that the invention has better tracking performance.
As shown in fig. 4, the process of the target tracking method based on depth feature adaptive fusion and context information specifically includes: first, read the first frame image of the video sequence, determine the rectangular region of the tracked target, extract the three-layer convolution features of the target region and of the four image blocks above, below, to the left and to the right of the target as the deep features, extract the color histogram feature and the HOG feature as the shallow features, and calculate the deep feature model and the shallow feature model respectively. Second, read the next frame image, extract the deep and shallow features of the target at the position predicted in the previous frame, and calculate the deep and shallow feature responses of the current frame from the feature models calculated in the previous frame. Then, with the deep-shallow adaptive fusion strategy, find the candidate state with the minimum overall loss among the deep and shallow response scores to compute the weights and obtain the optimal fusion ratio of the two features, and locate the maximum of the fused response sum, which is the position (x, y) of the tracking target in the image. Finally, calculate the APCE value of the current frame from the fused response; if it is larger than the historical APCE mean, i.e. APCE > APCE_mean, the tracking result of this frame is judged to be reliable and the deep and shallow feature models are updated, otherwise they are not, until the video sequence ends. The target is thus tracked effectively and with high accuracy.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A target tracking method based on depth feature adaptive fusion and context information is characterized by comprising the following steps:
acquiring a first frame of image of a video image sequence, and establishing a deep layer feature model and a shallow layer feature model based on a context-aware framework;
acquiring a plurality of second frame images of the video image sequence, and calculating a deep layer feature response and a shallow layer feature response of a corresponding tracking target by using the deep layer feature model and the shallow layer feature model;
obtaining the position of the tracking target in the corresponding second frame image according to the response sum after the deep layer feature response and the shallow layer feature response are adaptively fused;
and judging the average peak value correlation energy based on a threshold value, and updating the deep layer feature model and the shallow layer feature model until the video image sequence is ended.
2. The target tracking method based on depth feature adaptive fusion and context information according to claim 1, wherein the acquiring a first frame image of a video image sequence and establishing a deep layer feature model and a shallow layer feature model based on a context-aware framework comprises:
and introducing a background sample of the target in the first frame image of the acquired video image sequence as context information into template learning, simultaneously acquiring three-layer convolution features of the target region and of the four image blocks above, below, to the left and to the right of the target, and calculating a deep layer feature model by using a detection formula.
3. The target tracking method based on depth feature adaptive fusion and context information according to claim 2, wherein the acquiring a first frame image of a video image sequence and establishing a deep layer feature model and a shallow layer feature model based on a context-aware framework further comprises:
and establishing a shallow feature model according to the color histogram feature and the HOG feature in the background sample.
4. The target tracking method based on depth feature adaptive fusion and context information according to claim 3, wherein the acquiring a plurality of second frame images of the video image sequence and calculating a deep layer feature response and a shallow layer feature response of a corresponding tracking target by using the deep layer feature model and the shallow layer feature model comprises:
and sequentially acquiring a plurality of second frame images, extracting deep features and shallow features of the tracking target corresponding to the second frame images based on the target position, and calculating corresponding deep feature responses and shallow feature responses by using the deep feature model and the shallow feature model.
5. The target tracking method based on depth feature adaptive fusion and context information according to claim 4, wherein the acquiring a plurality of second frame images of the video image sequence and calculating a deep layer feature response and a shallow layer feature response of a corresponding tracking target by using the deep layer feature model and the shallow layer feature model further comprises:
and performing depth feature extraction by using a VGG-NET-19 depth network, and adjusting the size of the corresponding deep feature image by using a bilinear interpolation method.
6. The target tracking method based on depth feature adaptive fusion and context information according to claim 5, wherein the obtaining the position of the tracking target in the corresponding second frame image according to the response sum after the deep layer feature response and the shallow layer feature response are adaptively fused comprises:
and sorting the local maximum values of the deep layer feature response and the shallow layer feature response in ascending order as candidate states, and obtaining the candidate state with the minimum overall loss and the corresponding weight coefficients based on a minimized loss function.
7. The target tracking method based on depth feature adaptive fusion and context information according to claim 6, wherein the obtaining the position of the tracking target in the corresponding second frame image according to the response sum after the deep layer feature response and the shallow layer feature response are adaptively fused further comprises:
and combining the response graphs of the deep layer features and the shallow layer features by using a self-adaptive feature fusion strategy, fusing the loss weight coefficients, introducing a relaxation variable and a distance function to perform prediction quality evaluation, and determining a main peak and an interference peak to obtain the position of the tracking target in the corresponding second frame image.
8. The target tracking method based on depth feature adaptive fusion and context information according to claim 7, wherein the judging an average peak correlation energy based on a threshold and updating the deep layer feature model and the shallow layer feature model until the video image sequence ends comprises:
and calculating corresponding average peak correlation energy based on the corresponding second frame image, and updating the deep layer feature model and the shallow layer feature model when the average peak correlation energy is larger than a historical average peak correlation energy average value until the video image sequence is ended.
CN202010375319.8A 2020-05-07 2020-05-07 Target tracking method based on depth feature adaptive fusion and context information Pending CN111612817A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010375319.8A CN111612817A (en) 2020-05-07 2020-05-07 Target tracking method based on depth feature adaptive fusion and context information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010375319.8A CN111612817A (en) 2020-05-07 2020-05-07 Target tracking method based on depth feature adaptive fusion and context information

Publications (1)

Publication Number Publication Date
CN111612817A true CN111612817A (en) 2020-09-01

Family

ID=72199466

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010375319.8A Pending CN111612817A (en) 2020-05-07 2020-05-07 Target tracking method based on depth feature adaptive fusion and context information

Country Status (1)

Country Link
CN (1) CN111612817A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329784A (en) * 2020-11-23 2021-02-05 桂林电子科技大学 Correlation filtering tracking method based on space-time perception and multimodal response
CN112651999A (en) * 2021-01-19 2021-04-13 滨州学院 Unmanned aerial vehicle ground target real-time tracking method based on space-time context perception
CN112652299A (en) * 2020-11-20 2021-04-13 北京航空航天大学 Quantification method and device of time series speech recognition deep learning model
CN112767440A (en) * 2021-01-07 2021-05-07 江苏大学 Target tracking method based on SIAM-FC network
CN113284155A (en) * 2021-06-08 2021-08-20 京东数科海益信息科技有限公司 Video object segmentation method and device, storage medium and electronic equipment
CN113379802A (en) * 2021-07-01 2021-09-10 昆明理工大学 Multi-feature adaptive fusion related filtering target tracking method
CN113538509A (en) * 2021-06-02 2021-10-22 天津大学 Visual tracking method and device based on adaptive correlation filtering feature fusion learning
CN113610891A (en) * 2021-07-14 2021-11-05 桂林电子科技大学 Target tracking method, device, storage medium and computer equipment
CN113705325A (en) * 2021-06-30 2021-11-26 天津大学 Deformable single-target tracking method and device based on dynamic compact memory embedding
CN113855065A (en) * 2021-09-28 2021-12-31 平安科技(深圳)有限公司 Heart sound identification method based on fusion of shallow learning and deep learning and related device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109035300A (en) * 2018-07-05 2018-12-18 桂林电子科技大学 A kind of method for tracking target based on depth characteristic Yu average peak correlation energy

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109035300A (en) * 2018-07-05 2018-12-18 桂林电子科技大学 A kind of method for tracking target based on depth characteristic Yu average peak correlation energy

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
GOUTAM BHAT et al.: "Unveiling the Power of Deep Tracking", Computer Vision - ECCV 2018, pages 1-16 *
MING TANG et al.: "High-speed Tracking with Multi-kernel Correlation Filters", IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1-10 *
张凯帝: "Research and Implementation of a UAV Detection System Based on Convolutional Neural Networks" (in Chinese), China Master's Theses Full-text Database, Engineering Science and Technology II, pages 031-59 *
徐佳晙: "Research on Target Tracking Algorithms Based on the Fusion of Hand-crafted and Deep Features" (in Chinese), China Master's Theses Full-text Database, Information Science and Technology, no. 2020, pages 138-885 *
纪元法 et al.: "Target Tracking Based on Adaptive Feature Fusion and Context Awareness" (in Chinese), Laser & Optoelectronics Progress, vol. 58, no. 16, pages 1-11 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112652299B (en) * 2020-11-20 2022-06-17 北京航空航天大学 Quantification method and device of time series speech recognition deep learning model
CN112652299A (en) * 2020-11-20 2021-04-13 北京航空航天大学 Quantification method and device of time series speech recognition deep learning model
CN112329784A (en) * 2020-11-23 2021-02-05 桂林电子科技大学 Correlation filtering tracking method based on space-time perception and multimodal response
CN112767440A (en) * 2021-01-07 2021-05-07 江苏大学 Target tracking method based on SIAM-FC network
CN112767440B (en) * 2021-01-07 2023-08-22 江苏大学 Target tracking method based on SIAM-FC network
CN112651999A (en) * 2021-01-19 2021-04-13 滨州学院 Unmanned aerial vehicle ground target real-time tracking method based on space-time context perception
CN113538509A (en) * 2021-06-02 2021-10-22 天津大学 Visual tracking method and device based on adaptive correlation filtering feature fusion learning
CN113284155A (en) * 2021-06-08 2021-08-20 京东数科海益信息科技有限公司 Video object segmentation method and device, storage medium and electronic equipment
CN113284155B (en) * 2021-06-08 2023-11-07 京东科技信息技术有限公司 Video object segmentation method and device, storage medium and electronic equipment
CN113705325A (en) * 2021-06-30 2021-11-26 天津大学 Deformable single-target tracking method and device based on dynamic compact memory embedding
CN113379802A (en) * 2021-07-01 2021-09-10 昆明理工大学 Multi-feature adaptive fusion related filtering target tracking method
CN113379802B (en) * 2021-07-01 2024-04-16 昆明理工大学 Multi-feature adaptive fusion related filtering target tracking method
CN113610891A (en) * 2021-07-14 2021-11-05 桂林电子科技大学 Target tracking method, device, storage medium and computer equipment
CN113610891B (en) * 2021-07-14 2023-05-23 桂林电子科技大学 Target tracking method, device, storage medium and computer equipment
CN113855065A (en) * 2021-09-28 2021-12-31 平安科技(深圳)有限公司 Heart sound identification method based on fusion of shallow learning and deep learning and related device
CN113855065B (en) * 2021-09-28 2023-09-22 平安科技(深圳)有限公司 Heart sound identification method and related device based on shallow learning and deep learning fusion

Similar Documents

Publication Publication Date Title
CN111612817A (en) Target tracking method based on depth feature adaptive fusion and context information
CN111797716B (en) Single target tracking method based on Siamese network
CN111476302B (en) fast-RCNN target object detection method based on deep reinforcement learning
CN109035300B (en) Target tracking method based on depth feature and average peak correlation energy
CN111626993A (en) Image automatic detection counting method and system based on embedded FEFnet network
CN113592911B (en) Apparent enhanced depth target tracking method
CN110175649A (en) It is a kind of about the quick multiscale estimatiL method for tracking target detected again
CN113706581B (en) Target tracking method based on residual channel attention and multi-level classification regression
CN113569724B (en) Road extraction method and system based on attention mechanism and dilation convolution
CN111340842B (en) Correlation filtering target tracking method based on joint model
CN112329784A (en) Correlation filtering tracking method based on space-time perception and multimodal response
CN109255799B (en) Target tracking method and system based on spatial adaptive correlation filter
CN111640138A (en) Target tracking method, device, equipment and storage medium
CN110458019B (en) Water surface target detection method for eliminating reflection interference under scarce cognitive sample condition
CN112164093A (en) Automatic person tracking method based on edge features and related filtering
CN111429485A (en) Cross-modal filtering tracking method based on self-adaptive regularization and high-reliability updating
CN110766657A (en) Laser interference image quality evaluation method
CN116665095B (en) Method and system for detecting motion ship, storage medium and electronic equipment
CN116381672A (en) X-band multi-expansion target self-adaptive tracking method based on twin network radar
CN116363064A (en) Defect identification method and device integrating target detection model and image segmentation model
CN116129417A (en) Digital instrument reading detection method based on low-quality image
CN115311327A (en) Target tracking method and system integrating co-occurrence statistics and fhog gradient features
CN114066935A (en) Long-term target tracking method based on correlation filtering
CN114926826A (en) Scene text detection system
CN111695552B (en) Multi-feature fusion underwater target modeling and optimizing method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination