CN110796680B - Target tracking method and device based on similar template updating


Publication number
CN110796680B
CN110796680B
Authority
CN
China
Prior art keywords
model
video frame
time
moment
target
Prior art date
Legal status
Active
Application number
CN201910734740.0A
Other languages
Chinese (zh)
Other versions
CN110796680A (en)
Inventor
明悦
张润清
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201910734740.0A
Publication of CN110796680A
Application granted
Publication of CN110796680B

Classifications

    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251 Analysis of motion using feature-based methods involving models
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20081 Training; Learning


Abstract

The invention provides a target tracking method and device based on similar template updating. The method comprises the following steps: image feature extraction is carried out on the video frame picture at the initial moment through a target tracking module to obtain an initial model T_init and an initial incremental update model T_1^incre; image feature extraction is carried out on the video frame picture at the current moment t through the target tracking module to obtain a new model T_t^new at time t; an incremental update model T_t^incre at time t is calculated according to the new model T_{t-1}^new at time t-1 and the incremental update model T_{t-1}^incre at time t-1; the similarity δ_init between T_t^new and T_init and the similarity δ_incre between T_t^new and T_t^incre are calculated; and according to the similarities δ_init and δ_incre, a model update strategy selects T_t^new or T_t^incre as the final model at time t. The invention uses convolution responses to calculate the similarities between T_t^new and T_init and between T_t^new and T_t^incre, and selects the final model at time t according to them, so that the reliability of the new model can be quickly detected.

Description

Target tracking method and device based on similar template updating
Technical Field
The invention relates to the technical field of picture processing, in particular to a target tracking method and device based on similar template updating.
Background
Artificial intelligence is an important driving force of the new round of technological and industrial revolution, and target tracking is one of the important research directions of artificial intelligence in computer vision. Its main task is to detect the accurate position of one or more known targets in a video. As computer vision tasks focus more and more on video analysis, target tracking algorithms are receiving increasing attention.
A target tracking system can be roughly divided into three modules: video frame input, target tracking and result display. The video frame input module reads video data and sends it frame by frame to the target tracking module. The target tracking module is the core functional module of the system; it searches the input picture frame for the target determined by the initial frame and acquires the target's specific position and size. The result display module combines the position and size obtained by the target tracking module with the picture frame, synthesizes a video frame picture with a marker box, and outputs it to the user.
The evaluation of a target tracking system mainly covers two aspects: accuracy and real-time performance. The main indicators for evaluating accuracy include average overlap expectation, accuracy and robustness. Accuracy mainly evaluates the pixel difference between the tracking result and the actual position of the target, and the area difference between the tracking result and the actual size of the target. Robustness mainly evaluates the ability of the tracker to recover correct tracking after tracking fails. The accuracy of a target tracking system is affected by many factors. Given that the target information consists only of the appearance and position in the first frame, deformation, rotation and scaling of the target itself affect the performance of the target tracking module. In addition, factors in the target's environment, such as illumination changes and obstructions, also affect performance. Blurring and shooting-angle variation during video capture can likewise cause inaccurate target tracking. Besides accuracy, real-time performance is also a very important indicator: to keep up with real-time video playback, the tracker must run at no less than 24 FPS. In practical applications, however, target tracking algorithms often cannot achieve real-time performance because of complex modeling, heavy image-processing computation and similar problems.
The target tracking module is essentially an image object detector that must detect the specific position and size of a specified target in an input image area. It mainly comprises three submodules: feature extraction, target positioning and target model updating. For a target tracking algorithm, the feature extraction submodule models the target; a raw target picture cannot be used directly for tracking, so the picture must be processed into a feature vector, from which the target model is constructed. Image feature extraction methods mainly comprise traditional feature extraction and feature extraction based on deep learning. Traditional feature extraction is fast, but its accuracy is much lower than that of deep-learning features. Deep-learning feature extraction often cannot meet real-time requirements because of the large number of images required, complex models, large parameter counts and similar problems. The target positioning submodule processes the extracted image features and identifies which pixel regions belong to the target and which do not, thereby determining the target's specific position and size. Currently common target positioning models include convolutional layers and correlation filters. Convolutional layers involve a large amount of calculation and are time-consuming; correlation filters have a speed advantage but suffer from model degradation in practical applications. The target model updating submodule updates the specific model of the target: the appearance of the target changes as tracking proceeds, and the initial target model can no longer guarantee tracking accuracy, so the model must be updated. In general, a target tracking system updates the target model every frame according to each frame's prediction result, and such updating consumes a great deal of computation time. Furthermore, the updated template itself is unreliable, and the update process may introduce background information so that the model is built incorrectly, which can move the model further and further from the correct one as tracking proceeds and lead to tracking drift. During tracking, current target tracking systems do not check the new model, so many updates performed during model updating are invalid. In fact, the target model is stable in most frames, where updating is redundant; updating the target model is effective only when the target's appearance changes. Meanwhile, detecting whether the appearance of the target model has changed consumes a large amount of computing resources and time, which increases the time the whole system needs to process the target tracking task.
A processing flow of a target tracking system scheme based on a frame-by-frame incremental update model in the prior art is shown in fig. 1, and the specific steps are as follows:
1. Video data is read frame by frame and simple data preprocessing is performed. For each frame, a positioning algorithm determines the position of the target in the current frame, using model n to predict the position in the current frame.
2. A feature extraction algorithm makes a new model n from the target of the current n-th frame, and the new model n is fused with the historical model n to obtain an updated model n.
3. The tracked video frame is displayed.
4. The updated model is taken as the new model for determining the target position of the next frame.
5. Steps 2) to 4) are repeated until the video input ends (a code sketch of this loop follows this list).
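For illustration, the whole prior-art loop can be condensed into a few lines. The sketch below is a minimal reading of fig. 1, assuming numpy-array models and hypothetical read_frames, locate and extract_model helpers standing in for the input, positioning and feature-extraction stages described above; it is not the patent's implementation.

```python
# Minimal sketch of the frame-by-frame incremental update scheme (fig. 1).
# `read_frames`, `locate` and `extract_model` are hypothetical placeholders.
ALPHA = 0.01  # learning rate: weight of each frame's new model in the fusion

def track_frame_by_frame(read_frames, locate, extract_model, init_box):
    model, box = None, init_box
    for frame in read_frames():
        if model is None:                      # first frame: build model from the given target
            model = extract_model(frame, box)
            continue
        box = locate(frame, model)             # step 1: predict the target position with model n
        new_model = extract_model(frame, box)  # step 2: new model n from the current frame
        model = (1 - ALPHA) * model + ALPHA * new_model  # fuse new and historical model
        yield box                              # step 3: display the tracked frame
```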
The target tracking system scheme based on the frame-by-frame incremental update model in the prior art has the following disadvantages:
It is impossible to judge whether the new model is reliable. In the incremental updating method, the new model generated in every frame participates in updating the model, but it is never checked, so whether it is valid cannot be judged. Once tracking becomes problematic, the model inevitably introduces invalid background information, making the existing model increasingly unreliable. When too much background information is introduced, the target tracking result drifts, which makes the new models generated by tracking even less reliable, producing a vicious circle.
The model update efficiency is low. In the target tracking algorithm, if the model is updated every frame, the new model must be fused with the historical model, which requires a large amount of calculation when a complex modeling method is adopted. This causes a large amount of time to be consumed in updating the model, thereby reducing the tracking speed.
A processing flow chart of a design scheme of a conventional correlation filter target tracking system in the prior art is shown in fig. 2, and the specific steps include:
1. Take the first frame as the template and extract traditional image features.
2. For the video image frame newly entering the system, extract traditional image features.
3. Use the image features of the template together with the image features of the new image frame to calculate a correlation response.
4. Select the position with the maximum response as the new coordinate of the target.
5. Repeat 2) to 4) until the video input ends (a sketch of the correlation step follows this list).
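As an aside, steps 3 and 4 can be computed efficiently in the Fourier domain. The sketch below is a minimal, assumption-laden example: it takes already-extracted single-channel feature maps of equal size, computes a circular cross-correlation response, and returns the peak as the new target coordinate; it is not tied to any particular correlation-filter formulation.

```python
# Minimal sketch of steps 3-4 of fig. 2: correlation response + peak picking.
import numpy as np

def correlation_peak(template_feat: np.ndarray, frame_feat: np.ndarray):
    """Both inputs are HxW feature maps of the same size."""
    # circular cross-correlation: F^-1( conj(F(template)) * F(frame) )
    response = np.real(np.fft.ifft2(np.conj(np.fft.fft2(template_feat))
                                    * np.fft.fft2(frame_feat)))
    # the shift with the maximum response is taken as the new target coordinate
    return np.unravel_index(np.argmax(response), response.shape)
```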
The above-mentioned conventional correlation filter target tracking system design scheme in the prior art has the following disadvantages:
1. Traditional features are not accurate enough in target tracking systems. In current target tracking systems, traditional features can fully meet the real-time requirement under current hardware conditions. However, compared with deep-learning features, their accuracy falls short: because their descriptive power is limited, they cannot express the semantic information of the target image, so the target is easily lost when traditional features are used to determine its position.
2. The correlation filter itself has a model degradation problem. In this type of design, a correlation filter is used as the positioning method. The performance of the correlation filter as such is not the issue; as a single-frame target detector, its accuracy can even exceed that of a neural network convolution layer. However, the correlation filter becomes increasingly prone to model degradation as target tracking progresses; degradation gradually misaligns the model, eventually causing the target to be lost.
Disclosure of Invention
The embodiments of the present invention provide a target tracking method and device based on similar template updating, in order to overcome the problems in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme.
According to one aspect of the invention, a target tracking method based on similar template updating is provided, which comprises the following steps:
transcoding and framing the video data to obtain video frame pictures corresponding to all moments including the target;
image feature extraction processing is carried out on the video frame picture at the initial moment through a target tracking module to obtain an initial model T_init and an initial incremental update model T_1^incre;
image feature extraction processing is carried out on the video frame picture at the current moment t through the target tracking module to obtain a new model T_t^new at time t, and an incremental update model T_t^incre at time t is calculated according to the new model T_{t-1}^new at time t-1 and the incremental update model T_{t-1}^incre at time t-1;

the similarity value δ_init between the new model T_t^new and the initial model T_init and the similarity value δ_incre between the new model T_t^new and the current incremental update model T_t^incre are respectively calculated by a convolution response method, and according to the similarity values δ_init and δ_incre, a model update strategy selects T_t^new or T_t^incre as the final model at time t;
the step of performing image feature extraction processing on the video frame picture at the current moment t through the target tracking module to obtain the new model T_t^new at time t, and calculating the incremental update model T_t^incre at time t according to the new model T_{t-1}^new at time t-1 and the incremental update model T_{t-1}^incre at time t-1, comprises:

performing image feature extraction on the input video frame picture at time t by using convolution layers through the target tracking module to obtain the specific position of the target in the video frame picture at time t, and converting the specific position of the target in the video frame picture at time t into the new model T_t^new at time t; obtaining the incremental update model at time t through the target tracking algorithm according to the new model T_{t-1}^new at time t-1 and the incremental update model T_{t-1}^incre at time t-1:

T_t^incre = (1 − α)·T_{t-1}^incre + α·T_{t-1}^new

wherein α is a set learning rate;
the step of respectively calculating, by the convolution response method, the similarity value δ_init between the new model T_t^new and the initial model T_init and the similarity value δ_incre between the new model T_t^new and the current incremental update model T_t^incre, and selecting T_t^new or T_t^incre as the final model at time t through a model update strategy according to the similarity values δ_init and δ_incre, comprises:

calculating the convolution response between the new model T_t^new and the initial model T_init, and converting the convolution response into the similarity value δ_init between T_t^new and T_init; calculating the convolution response between the new model T_t^new and the current incremental update model T_t^incre, and converting the convolution response into the similarity value δ_incre between T_t^new and T_t^incre;

setting two similarity thresholds θ_init and θ_incre; judging whether δ_init ≥ θ_init and δ_incre ≤ θ_incre; if yes, taking the new model T_t^new as the final model T_t^final at time t; otherwise, selecting T_t^incre as the final model T_t^final at time t.
Preferably, the transcoding and framing the video data to obtain the video frame picture corresponding to each time point including the target includes:
the data reading thread finishes reading in the video data, transcoding and framing the video data to obtain a video frame picture sequence containing a target, wherein the video frame picture sequence comprises video frame pictures corresponding to all moments;
And carrying out preprocessing operation on each video frame picture in the video frame picture sequence, and transmitting the preprocessed video frame picture to a target tracking module, wherein the preprocessing comprises histogram equalization and picture size adjustment.
Preferably, the step of performing image feature extraction on the video frame picture at the initial moment through the target tracking module to obtain the initial model T_init and the initial incremental update model T_1^incre comprises:

performing image feature extraction on the video frame picture at the initial moment t_1 through the target tracking module, the target position information in the video frame picture at the initial moment t_1 being set according to the target tracking task; passing the image block of the frame at moment t_1 through two convolution layers with 3×3 kernels to obtain a 125×125×32 convolution feature matrix, and taking the convolution feature matrix as the initial model T_init; the incremental update model T_1^incre at moment t_1 is numerically equal to T_init.
According to another aspect of the present invention, there is provided an object tracking apparatus updated based on similar templates, including:
the video data preprocessing module is used for transcoding and framing the video data to obtain video frame pictures corresponding to all moments including targets;
a video frame primary filtering processing module, configured to perform image feature extraction processing on the video frame picture at the initial time through the target tracking module to obtain an initial model T_init and an initial incremental update model T_1^incre;
a current video frame filtering processing module, configured to perform image feature extraction processing on the video frame picture at the current moment t through the target tracking module to obtain a new model T_t^new at time t, and to calculate an incremental update model T_t^incre at time t according to the new model T_{t-1}^new at time t-1 and the incremental update model T_{t-1}^incre at time t-1;
a current video frame model determining module, configured to respectively calculate, by a convolution response method, the similarity value δ_init between the new model T_t^new and the initial model T_init and the similarity value δ_incre between the new model T_t^new and the current incremental update model T_t^incre, and to select T_t^new or T_t^incre as the final model at time t through a model update strategy according to the similarity values δ_init and δ_incre;
the current moment video frame filtering processing module is specifically configured to perform image feature extraction on the input video frame picture at time t by using convolution layers through the target tracking module, obtain the specific position of the target in the video frame picture at time t, and convert the specific position of the target in the video frame picture at time t into the new model T_t^new at time t; and to obtain the incremental update model at time t through the target tracking algorithm according to the new model T_{t-1}^new at time t-1 and the incremental update model T_{t-1}^incre at time t-1:

T_t^incre = (1 − α)·T_{t-1}^incre + α·T_{t-1}^new

wherein α is a set learning rate;
the module for determining the video frame model at the current moment is specifically configured to calculate the convolution response between the new model T_t^new and the initial model T_init and convert it into the similarity value δ_init between T_t^new and T_init, and to calculate the convolution response between the new model T_t^new and the current incremental update model T_t^incre and convert it into the similarity value δ_incre between T_t^new and T_t^incre;
and to set two similarity thresholds θ_init and θ_incre, judge whether δ_init ≥ θ_init and δ_incre ≤ θ_incre, and if yes, take the new model T_t^new as the final model T_t^final at time t, otherwise select T_t^incre as the final model T_t^final at time t.
Preferably, the video data preprocessing module is specifically configured to complete video data reading in by a data reading thread, perform transcoding and framing processing on the video data to obtain a video frame picture sequence including a target, where the video frame picture sequence includes video frame pictures corresponding to respective moments;
and preprocessing each video frame picture in the video frame picture sequence, and transmitting the preprocessed video frame pictures to a target tracking module, wherein the preprocessing comprises histogram equalization and picture size adjustment.
Preferably, the video frame primary filtering processing module is specifically configured to perform image feature extraction on the video frame picture at the initial moment t_1 through the target tracking module, the target position information in the video frame picture at the initial moment t_1 being set according to the target tracking task; to pass the image block of the frame at moment t_1 through two convolution layers with 3×3 kernels to obtain a 125×125×32 convolution feature matrix, and to take the convolution feature matrix as the initial model T_init; the incremental update model T_1^incre at moment t_1 is numerically equal to T_init.
It can be seen from the technical solutions provided by the embodiments of the present invention that the embodiments use a convolutional neural network to obtain the similarity between the new model T_t^new at the current time and the initial model T_init and the similarity between T_t^new and the current incremental update model T_t^incre, and select T_t^new or T_t^incre as the final model at time t through a model update strategy according to these similarities. No additional algorithm is needed for the similarity calculation, so the reliability of the new model is rapidly detected.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart illustrating a prior art approach to a target tracking system based on a frame-by-frame incremental update model;
FIG. 2 is a process flow diagram of a conventional correlation filter target tracking system design in the prior art;
FIG. 3 is a flowchart of a process for implementing target tracking based on similar template updating according to an embodiment of the present invention;
FIG. 4 is a graph of a convolution response provided by an embodiment of the present invention;
fig. 5 is a schematic diagram of a processing procedure of a model update policy according to an embodiment of the present invention;
fig. 6 is a block diagram of a target tracking apparatus based on similar template update according to an embodiment of the present invention, in which a video data preprocessing module 61, a video frame primary filtering processing module 62, a current video frame filtering processing module 63, and a current video frame model determining module 64 are included.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are exemplary only for explaining the present invention and are not construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the convenience of understanding the embodiments of the present invention, the following description will be further explained by taking several specific embodiments as examples in conjunction with the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.
Example one
The current target tracking system can be implemented on a computer device containing a GPU (Graphics Processing Unit), which offers high image-computation speed and strong processing capability. The embodiment of the invention is designed for a GPU-equipped computer: video data is input into the system frame by frame, and picture features are extracted with a Siamese convolutional neural network structure with 2 convolution layers. The method uses the GPU to accelerate image feature extraction and locates the target with the convolution layers, improving the overall tracking speed of the system; it updates the model with a similarity-based template updating strategy, improving the reliability of template updating.
The method can be applied to real-time tracking tasks of specific targets under natural conditions, such as target positioning for autonomous vehicles, human body gesture tracking in virtual reality, intelligent traffic monitoring and video behavior recognition. The system is easy to build, simple to install and low in cost.
The processing flow for realizing target tracking based on similar template updating provided by the embodiment of the invention is shown in fig. 3, and comprises the following processing steps:
step S31, the data reading thread finishes reading video data first, transcodes and frames the video data to obtain a video frame picture sequence including a target, where the video frame picture sequence includes video frame pictures corresponding to each time. And then, preprocessing each video frame picture in the sequence, and transmitting the preprocessed video frame picture to a target tracking module, wherein the preprocessing comprises histogram equalization, picture size adjustment and the like.
Step S32: image features are extracted from the video frame picture at the initial moment (namely the first video frame picture) through the target tracking module to obtain the target position information in the video frame picture at the initial moment t_1. The target position information in the video frame picture at the initial moment is then used to build the initial model T_init.

At the initial moment, i.e. when t = 1, the target information is given by the known first-frame image of the target tracking task. According to the target position information in the video frame picture at the initial moment t_1 given by the target tracking task, the image block of the frame at moment t_1 is passed through two convolution layers with 3×3 kernels to obtain a 125×125×32 convolution feature matrix, which is used as the initial model T_init. At this moment the target's incremental update model has not yet been updated (it is first updated when the second frame is input), so the incremental update model T_1^incre is numerically equal to T_init.
At the initial moment, i.e. when t = 1, the target tracking video gives the initial coordinate information Z_0(x_0, y_0, h_0, w_0) of the target in the first-frame image, denoting the upper-left-corner coordinates (x_0, y_0) of the target image block in the first frame and the height and width (h_0, w_0) of the target image block.
According to the initial coordinate information, the system extracts the initial target image block I_0 of the target in the first frame and resizes I_0 to 125×125. If I_0 is a color image, it contains 3 color channels and is essentially a 125×125×3 data matrix. If I_0 is a black-and-white image, it contains 1 color channel and is essentially a 125×125×1 data matrix, which the system converts into a 125×125×3 data matrix by channel replication.
The initial target image block I_0 taken from the first frame is thus essentially a 125×125×3 data matrix, which the system converts into the initial model T_init through a four-layer network. The four layers are, in order, convolution layer 1, a ReLU layer, convolution layer 2 and a local response normalization layer. Convolution layer 1 has a kernel size of 3×3×3×32, a padding of 1 pixel and a stride of [1,1]. The ReLU layer nonlinearizes the output of convolution layer 1, mitigating network overfitting. Convolution layer 2 has a kernel size of 3×3×32×32, a padding of 1 pixel and a stride of [1,1].

After passing through convolution layer 1, the ReLU layer and convolution layer 2, the initial target image block I_0 yields a data matrix of size 125×125×32; the local response normalization layer then normalizes the distribution of this matrix to obtain the initial model T_init, which is itself a 125×125×32 data matrix.
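A minimal PyTorch sketch of this four-layer network follows. The kernel sizes, the 1-pixel padding, the [1,1] stride and the 125×125×32 output follow the text above; the use of PyTorch and the LRN hyperparameters are assumptions, since the patent does not specify them:

```python
# Minimal sketch of the four-layer template network: conv1 -> ReLU -> conv2 -> LRN.
import torch
import torch.nn as nn

class TemplateNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)   # 3x3x3x32 kernel
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1)  # 3x3x32x32 kernel
        self.lrn = nn.LocalResponseNorm(size=5)  # normalizes the feature distribution

    def forward(self, x):               # x: (N, 3, 125, 125) image block
        x = torch.relu(self.conv1(x))   # ReLU nonlinearizes conv1's output
        return self.lrn(self.conv2(x))  # (N, 32, 125, 125), i.e. a 125x125x32 model

net = TemplateNet()
T_init = net(torch.rand(1, 3, 125, 125))  # initial model from image block I_0
print(T_init.shape)                       # torch.Size([1, 32, 125, 125])
```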
Step S33: the target tracking module extracts image features from the video frame picture at time t to obtain the new model T_t^new at time t, and obtains the incremental update model T_t^incre at time t according to the new model T_{t-1}^new at time t-1 and the incremental update model T_{t-1}^incre at time t-1.
After the video frame picture at the current moment t is input into the target tracking module, the target tracking module extracts image features from it, obtains the specific position of the target in the video frame picture at time t according to the convolution response, and converts that position into the new model T_t^new at time t.

At each time t the video feeds a new image frame into the system, and the target tracking module obtains the positioning result Z_t(x_t, y_t, h_t, w_t) from the new frame, denoting the upper-left-corner coordinates (x_t, y_t) of the target image block in the t-th frame and the height and width (h_t, w_t) of the target image block. According to the tracking result Z_t(x_t, y_t, h_t, w_t) of each frame, the system extracts the target image block I_t of the target in the t-th frame and resizes I_t to 125×125. If I_t is a color image, it contains 3 color channels and is essentially a 125×125×3 data matrix. If I_t is a black-and-white image, it contains 1 color channel and is essentially a 125×125×1 data matrix, which the system converts into a 125×125×3 data matrix by channel replication.

The target image block I_t taken from the t-th frame is thus essentially a 125×125×3 data matrix, which the system converts through the same four-layer network into the new model T_t^new of the t-th frame, itself essentially a 125×125×32 data matrix.
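A minimal sketch of extracting I_t from the tracking result Z_t follows, assuming numpy/OpenCV arrays and that x_t, y_t index the column and row of the upper-left corner:

```python
# Minimal sketch: crop I_t from frame t using Z_t(x, y, h, w), resize to 125x125,
# and replicate channels for black-and-white frames.
import numpy as np
import cv2

def extract_block(frame: np.ndarray, z) -> np.ndarray:
    x, y, h, w = z
    block = cv2.resize(frame[y:y + h, x:x + w], (125, 125))
    if block.ndim == 2:  # 125x125x1 black-and-white image block
        block = np.repeat(block[:, :, None], 3, axis=2)  # channel replication
    return block         # 125x125x3 data matrix
```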
The target tracking algorithm also obtains the incremental update model at time t according to the new model T_{t-1}^new at time t-1 and the incremental update model T_{t-1}^incre at time t-1:

T_t^incre = (1 − α)·T_{t-1}^incre + α·T_{t-1}^new

where α is the learning rate, i.e. the weight in the updated target model of the model generated from the new video frame. To ensure the stability of the target model, the learning rate α is generally about 0.01. Although such an update is very stable, the new model's information accounts for only a tiny proportion of the overall model, so the updated model struggles to describe the change of the target appearance at time t.
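As a sketch, the update is a single convex combination of the two stored models (numpy arrays assumed):

```python
# Minimal sketch of the incremental update T_t^incre = (1 - a)*T_{t-1}^incre + a*T_{t-1}^new.
import numpy as np

def incremental_update(T_incre_prev: np.ndarray, T_new_prev: np.ndarray,
                       alpha: float = 0.01) -> np.ndarray:
    return (1.0 - alpha) * T_incre_prev + alpha * T_new_prev
```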
Step S34: the convolution response between the new model T_t^new and the initial model T_init is calculated and converted into the similarity value δ_init between T_t^new and T_init; the convolution response between the new model T_t^new and the current incremental update model T_t^incre is calculated and converted into the similarity value δ_incre between T_t^new and T_t^incre; and, according to the similarity values δ_init and δ_incre, a model update strategy determines T_t^new or T_t^incre as the final model at time t.
Fig. 5 shows a schematic diagram of the processing procedure of the model update strategy according to an embodiment of the present invention, which is as follows. At time t, whether the updated model is the new model T_t^new or the incremental update model T_t^incre depends on the reliability and the full descriptiveness of the new model. Reliability refers to the similarity δ_init between the new model T_t^new and the initial model T_init; under accurate tracking this similarity should be as large as possible, to ensure that the new model and the initial model describe the same target. Full descriptiveness refers to the similarity δ_incre between the new model T_t^new and the current incremental update model T_t^incre; it should be small enough that the new model adequately describes the change of the model's appearance at time t.

According to this principle, the invention sets two similarity thresholds θ_init and θ_incre; suitable threshold values are obtained by trial and error. When δ_init ≥ θ_init, the reliability of the new model is ensured; when δ_incre ≤ θ_incre, the full descriptiveness of the new model also meets the tracker's requirements. When both similarities meet the threshold requirements, it is judged that the change of the target's appearance is sufficient to replace T_t^incre with the new model T_t^new, and at the same time that the new model T_t^new can describe the specified target object O sufficiently reliably. The new model T_t^new is then taken as the final model T_t^final at time t; otherwise T_t^incre is selected as the final model T_t^final at time t.
In the invention, the similarity between two models is calculated by a convolution response method. Taking the similarity between the new model T_t^new and the incremental update model T_t^incre as an example, the convolutional neural network convolves the new model T_t^new at time t with the incremental update model T_t^incre in two dimensions to calculate the convolution response M_t, i.e.

M_t = T_t^new ⊛ T_t^incre

and a convolution response map is generated, as shown in FIG. 4. The light areas in FIG. 4 are where the response values M_t are very large, meaning that in these areas the correlation between the new model T_t^new and the incremental update model T_t^incre is high, so they are likely to be the center position of the target. The darker the color, the smaller the response value M_t, i.e. the lower the correlation and the less likely the area belongs to the target. The maximum point of the convolution response map is the new position of the target.
In principle, the response values reflect the degree of correlation between the new model T_t^new and the incremental update model T_t^incre. The magnitudes of the response values are mapped to [0,1] by a normalization method, and the maximum δ of the resulting response values is then taken out, so that δ reflects, as a percentage similarity, the degree of correlation between T_t^new and T_t^incre.
Each model T is itself a 125×125×32 matrix, where N = 32 is the number of channels, so it can be regarded as 32 two-dimensional image matrices, one per channel. For two models T_1, T_2, their convolution response matrices Δ are calculated by convolving the two models channel by channel. Let T_1^i, T_2^i be the models of T_1 and T_2 at channel i; the convolution response Δ_i of the corresponding channel can then be expressed as

Δ_i(s, t) = Σ_x Σ_y T_1^i(x, y) · T_2^i(x + s, y + t)

where Δ(s, t) denotes the value in the s-th row and t-th column of the matrix Δ, and T(x, y) denotes the value in the x-th row and y-th column of the matrix T. After normalization of the response values, the similarity δ can then be expressed as

δ = max_i max(Δ_i(s, t))

where max(Δ_i(s, t)) denotes the largest value in the matrix Δ_i. Each Δ_i is a 125×125 matrix; the similarity matrices Δ_i obtained from the channels are combined into a 125×125×32 similarity matrix between the models T_1, T_2, and the maximum value of this similarity matrix is the similarity value δ between T_1 and T_2.
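A minimal numpy/scipy sketch of this channel-wise similarity follows. The per-channel response and the final maximum follow the description above; the normalization to [0,1] (here via the Cauchy–Schwarz bound on the correlation peak) is an assumption, since the patent does not spell it out:

```python
# Minimal sketch of the convolution-response similarity between two 125x125x32 models.
import numpy as np
from scipy.signal import fftconvolve

def convolution_similarity(T1: np.ndarray, T2: np.ndarray) -> float:
    peaks = []
    for i in range(T1.shape[2]):
        a, b = T1[:, :, i], T2[:, :, i]
        # cross-correlate channel i (convolution with a flipped kernel) -> Delta_i
        delta_i = fftconvolve(a, b[::-1, ::-1], mode="same")  # 125x125 response
        denom = np.linalg.norm(a) * np.linalg.norm(b)  # Cauchy-Schwarz bound on the peak
        if denom > 0:
            peaks.append(delta_i.max() / denom)  # normalized response in [-1, 1]
    return float(max(peaks)) if peaks else 0.0   # maximum response value = delta
```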
Thus, in our tracker, at each time t there are three models: the initial model T_init, the new model T_t^new generated from the video frame at time t, and the target model T_{t-1}^incre generated by incremental updating at the previous time t-1. Three pipelines store the three models. After the final model at time t is obtained, the current position of the target in the model is converted into a coordinate frame on the video frame, and the coordinate frame is displayed on the user interface.
Example two
This embodiment provides a target tracking device based on similar template updating, the structure of which is shown in fig. 6; the device comprises the following modules:
the video data preprocessing module 61 is configured to transcode and frame-divide the video data to obtain video frame pictures corresponding to each time point including the target;
a video frame primary filtering processing module 62, configured to perform image feature extraction processing on the video frame picture at the initial time through the target tracking module to obtain an initial model T_init and an initial incremental update model T_1^incre;
a current video frame filtering processing module 63, configured to perform image feature extraction processing on the video frame picture at the current time t through the target tracking module to obtain a new model T_t^new at time t, and to calculate an incremental update model T_t^incre at time t according to the new model T_{t-1}^new at time t-1 and the incremental update model T_{t-1}^incre at time t-1;
a current video frame model determining module 64, configured to respectively calculate, by a convolution response method, the similarity value δ_init between the new model T_t^new and the initial model T_init and the similarity value δ_incre between the new model T_t^new and the current incremental update model T_t^incre, and to select T_t^new or T_t^incre as the final model at time t through a model update strategy according to the similarity values δ_init and δ_incre.
Preferably, the video data preprocessing module 61 is specifically configured to complete video data reading in by a data reading thread, perform transcoding and framing processing on the video data to obtain a video frame picture sequence including a target, where the video frame picture sequence includes video frame pictures corresponding to each time;
and carrying out preprocessing operation on each video frame picture in the video frame picture sequence, and transmitting the preprocessed video frame picture to a target tracking module, wherein the preprocessing comprises histogram equalization and picture size adjustment.
Preferably, the video frame primary filtering processing module 62 is specifically configured to perform image feature extraction on the video frame picture at the initial moment t_1 through the target tracking module, the target position information in the video frame picture at the initial moment t_1 being set according to the target tracking task; to pass the image block of the frame at moment t_1 through two convolution layers with 3×3 kernels to obtain a 125×125×32 convolution feature matrix, and to take the convolution feature matrix as the initial model T_init; the incremental update model T_1^incre at moment t_1 is numerically equal to T_init.
Preferably, the current-time video frame filtering processing module 63 is specifically configured to perform image feature extraction on the input video frame picture at time t by using convolution layers through the target tracking module, obtain the specific position of the target in the video frame picture at time t, and convert the specific position of the target in the video frame picture at time t into the new model T_t^new at time t; and to obtain the incremental update model at time t through the target tracking algorithm according to the new model T_{t-1}^new at time t-1 and the incremental update model T_{t-1}^incre at time t-1:

T_t^incre = (1 − α)·T_{t-1}^incre + α·T_{t-1}^new

where α is a set learning rate.
Preferably, the current video frame model determining module 64 is specifically configured to calculate the convolution response between the new model T_t^new and the initial model T_init and convert it into the similarity value δ_init between T_t^new and T_init, and to calculate the convolution response between the new model T_t^new and the current incremental update model T_t^incre and convert it into the similarity value δ_incre between T_t^new and T_t^incre;

and to set two similarity thresholds θ_init and θ_incre, judge whether δ_init ≥ θ_init and δ_incre ≤ θ_incre, and if yes, take the new model T_t^new as the final model T_t^final at time t, otherwise select T_t^incre as the final model T_t^final at time t.
The specific process of performing similar template update-based target tracking by using the apparatus of the embodiment of the present invention is similar to that of the foregoing method embodiment, and is not described herein again.
In summary, the embodiments of the present invention use a convolutional neural network to obtain the similarity between the new model T_t^new at the current time and the initial model T_init and the similarity between T_t^new and the current incremental update model T_t^incre, and select T_t^new or T_t^incre as the final model at time t through a model update strategy according to these similarities. No additional algorithm is needed for the similarity calculation, so the reliability of the new model is rapidly detected.
The template updating strategy designed by the invention involves three pipeline models: a conventional model, a new model and an initial model. This three-pipeline mechanism ensures that when the reliability of the new model is not high, the original highly reliable, normal-scale target tracking can still be maintained; when the reliability of the new model is high, the model generated by combining the new model with the conventional model is used for target tracking.
The invention sets two similarity thresholds θ_init and θ_incre, and the final target model is decided based on the comparison of the actual similarities with the two thresholds. The threshold θ_init ensures the relevance between the updated model and the original target, thereby ensuring the stability of the target model. The threshold θ_incre judges whether the updated model contains information about the change of the target's appearance in the video, which describes the variability of the target. The tracking stability of the target tracking system can thus be ensured.
The convolution response matrix M_t generated in the present invention can also be applied to related video analysis systems. In a target occlusion judging system, for a target whose appearance changes consistently as a whole, the response matrix M_t has a peak where the target exists; when the values in a local area are nevertheless small, it can be judged that the local area is occluded. Similarly, the method can be applied to a system that analyzes the degree of change of the target's appearance: the video frame input to the response matrix M_t changes at each time t, and the change in the matrix values reflects the degree of change of the target's appearance.
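As an illustration of the occlusion-judging use, the sketch below (an assumption, since the patent describes the idea but not a procedure) scans the response matrix M_t in local cells and flags cells whose response is far below the global peak:

```python
# Minimal sketch: flag local areas of M_t with responses far below the peak.
import numpy as np

def occluded_regions(M_t: np.ndarray, cell: int = 25, ratio: float = 0.2):
    peak = M_t.max()
    occluded = []
    for r in range(0, M_t.shape[0], cell):
        for c in range(0, M_t.shape[1], cell):
            if M_t[r:r + cell, c:c + cell].max() < ratio * peak:
                occluded.append((r, c))  # cell responds far below the global peak
    return occluded
```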
Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The embodiments in this specification are described in a progressive manner; the same or similar parts of the embodiments can be referred to each other, and each embodiment focuses on its differences from the others. In particular, since the apparatus and system embodiments are substantially similar to the method embodiments, their description is relatively brief, and reference may be made to the corresponding parts of the method embodiments. The apparatus and system embodiments described above are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement this without inventive effort.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. A target tracking method based on similar template updating is characterized by comprising the following steps:
transcoding and framing the video data to obtain video frame pictures corresponding to all moments including the target;
image feature extraction processing is carried out on the video frame picture at the initial moment through a target tracking module to obtain an initial model
Figure FDA0003627448930000011
And initial incremental update model
Figure FDA0003627448930000012
Carrying out image feature extraction processing on the video frame picture at the current moment t through a target tracking module to obtain a new model at the moment t
Figure FDA0003627448930000013
New model according to t-1 time
Figure FDA0003627448930000014
And incremental update model at time t-1
Figure FDA0003627448930000015
Calculating to obtain an incremental updating model at the t moment
Figure FDA0003627448930000016
Respectively calculating new models by a convolution response method
Figure FDA0003627448930000017
And the initial model
Figure FDA0003627448930000018
A value of similarity between delta init Novel model
Figure FDA0003627448930000019
Updating the model with the current increment
Figure FDA00036274489300000110
A similarity value of incre According to said similarity value δ init And a similarity value delta incre Selecting a new model by a model update strategy
Figure FDA00036274489300000111
Or
Figure FDA00036274489300000112
As the final model at time t;
wherein performing image feature extraction on the video frame picture at the current moment t through the target tracking module to obtain the new model F_new^t at moment t, and calculating the incremental update model F_incre^t at moment t from the new model F_new^{t-1} and the incremental update model F_incre^{t-1} at moment t-1, comprises:

performing, through the target tracking module, image feature extraction on the input video frame picture at moment t using the convolution layers to obtain the specific position of the target in the video frame picture at moment t, and converting that position into the new model F_new^t at moment t;

obtaining, through the target tracking algorithm, the incremental update model at moment t from the new model and the incremental update model at moment t-1 as

F_incre^t = (1 - α) · F_incre^{t-1} + α · F_new^{t-1},

wherein α is a set learning rate;
wherein calculating, by the convolution response method, the similarity value δ_init between the new model F_new^t and the initial model F_init and the similarity value δ_incre between the new model F_new^t and the current incremental update model F_incre^t, and selecting F_new^t or F_incre^t as the final model at moment t according to the similarity values δ_init and δ_incre and the model update strategy, comprises:

calculating the convolution response between the new model F_new^t and the initial model F_init, and converting that convolution response into the similarity value δ_init between F_new^t and F_init; calculating the convolution response between the new model F_new^t and the current incremental update model F_incre^t, and converting that convolution response into the similarity value δ_incre between F_new^t and F_incre^t;

setting two similarity thresholds θ_init and θ_incre, and judging whether δ_init ≥ θ_init and δ_incre ≥ θ_incre both hold; if so, taking the new model F_new^t as the final model F^t at moment t; otherwise, taking the incremental update model F_incre^t as the final model F^t at moment t.
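A minimal sketch of the two computational steps recited in claim 1, assuming the models are dense feature arrays. The linear-interpolation update and the zero-shift correlation below are plausible readings of the formulas that appear only as images in the original document; the function names, the default learning rate α = 0.01, and the threshold values θ_init = θ_incre = 0.5 are illustrative assumptions, not values fixed by the claim.

    import numpy as np

    def incremental_update(f_incre_prev, f_new_prev, alpha=0.01):
        # Blend the incremental model at t-1 with the new model at t-1:
        # F_incre^t = (1 - alpha) * F_incre^{t-1} + alpha * F_new^{t-1}
        return (1.0 - alpha) * np.asarray(f_incre_prev) + alpha * np.asarray(f_new_prev)

    def conv_response_similarity(model_a, model_b):
        # Normalize both feature models and take their zero-shift
        # correlation as a scalar similarity score (a stand-in for the
        # peak of the convolution response between the two templates).
        a = (model_a - model_a.mean()) / (model_a.std() + 1e-8)
        b = (model_b - model_b.mean()) / (model_b.std() + 1e-8)
        return float((a * b).mean())

    def select_final_model(f_new, f_init, f_incre, theta_init=0.5, theta_incre=0.5):
        # Keep the new model only if it is similar enough to BOTH the
        # initial model and the current incremental update model.
        delta_init = conv_response_similarity(f_new, f_init)
        delta_incre = conv_response_similarity(f_new, f_incre)
        if delta_init >= theta_init and delta_incre >= theta_incre:
            return f_new      # final model F^t = F_new^t
        return f_incre        # final model F^t = F_incre^t

Under this reading, the strategy falls back to the smoothed incremental model whenever the new model drifts too far from either reference, which is presumably what keeps occlusions and abrupt appearance changes from polluting the template.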
2. The method of claim 1, wherein transcoding and framing the video data to obtain a video frame picture, containing the target, for each moment comprises:
reading in the video data by a data reading thread, and transcoding and framing the video data to obtain a video frame picture sequence containing the target, the sequence comprising the video frame pictures corresponding to the respective moments;
preprocessing each video frame picture in the sequence, and transmitting the preprocessed pictures to the target tracking module, wherein the preprocessing comprises histogram equalization and picture size adjustment.
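A possible OpenCV reading of this preprocessing step; the output resolution is a placeholder, since the claim names histogram equalization and size adjustment but fixes no target size.

    import cv2

    def preprocess_frames(video_path, target_size=(255, 255)):
        # Read a video, split it into frames, and apply the preprocessing
        # named in the claim: histogram equalization and size adjustment.
        cap = cv2.VideoCapture(video_path)
        frames = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # Equalize the luminance channel only, keeping color intact.
            ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
            ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
            frame = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
            frames.append(cv2.resize(frame, target_size))
        cap.release()
        return frames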
3. The method according to claim 1, wherein performing image feature extraction on the video frame picture at the initial moment through the target tracking module to obtain the initial model F_init and the initial incremental update model F_incre^{t_1} comprises:
performing, through the target tracking module, image feature extraction on the video frame picture at the initial moment t_1, setting the target position information in the video frame picture at the initial moment t_1 according to the target tracking task, and passing the image block of the frame at moment t_1 through two convolution layers with 3 × 3 kernels to obtain a 125 × 32 convolution feature matrix, which is taken as the initial model F_init;
setting the incremental update model F_incre^{t_1} at moment t_1 equal in value to F_init.
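The two-convolution-layer feature extractor of claim 3 could look as follows in PyTorch. The 129 × 129 input patch and the intermediate channel width are assumptions, chosen so that two unpadded 3 × 3 convolutions with 32 output channels yield a 125-wide feature map, consistent with the 125 and 32 figures in the claim.

    import torch
    import torch.nn as nn

    class InitialModelExtractor(nn.Module):
        # Two 3x3 convolution layers producing the convolution feature
        # matrix that serves as the initial model F_init.
        def __init__(self, in_channels=3, mid_channels=32, out_channels=32):
            super().__init__()
            # No padding, so each layer shrinks the spatial size by 2.
            self.conv1 = nn.Conv2d(in_channels, mid_channels, kernel_size=3)
            self.conv2 = nn.Conv2d(mid_channels, out_channels, kernel_size=3)

        def forward(self, patch):
            return self.conv2(torch.relu(self.conv1(patch)))

    extractor = InitialModelExtractor()
    patch = torch.randn(1, 3, 129, 129)   # assumed input patch size
    f_init = extractor(patch)             # -> 1 x 32 x 125 x 125
    f_incre = f_init.clone()              # F_incre^{t_1} equals F_init in value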
4. A target tracking device based on similar template updating, characterized by comprising:
a video data preprocessing module, configured to transcode and frame video data to obtain a video frame picture, containing the target, for each moment;
a video frame primary filtering processing module, configured to perform image feature extraction on the video frame picture at the initial moment through a target tracking module to obtain an initial model F_init and an initial incremental update model F_incre^{t_1};
a current-moment video frame filtering processing module, configured to perform image feature extraction on the video frame picture at the current moment t through the target tracking module to obtain a new model F_new^t at moment t, and to calculate an incremental update model F_incre^t at moment t from the new model F_new^{t-1} at moment t-1 and the incremental update model F_incre^{t-1} at moment t-1;
a current-moment video frame model determining module, configured to calculate, by a convolution response method, a similarity value δ_init between the new model F_new^t and the initial model F_init and a similarity value δ_incre between the new model F_new^t and the current incremental update model F_incre^t, and to select, according to the similarity values δ_init and δ_incre and a model update strategy, either F_new^t or F_incre^t as the final model at moment t;
wherein the current-moment video frame filtering processing module is specifically configured to: perform, through the target tracking module, image feature extraction on the input video frame picture at moment t using the convolution layers to obtain the specific position of the target in the video frame picture at moment t, and convert that position into the new model F_new^t at moment t; and obtain, through the target tracking algorithm, the incremental update model at moment t from the new model and the incremental update model at moment t-1 as

F_incre^t = (1 - α) · F_incre^{t-1} + α · F_new^{t-1},

wherein α is a set learning rate;
wherein the current-moment video frame model determining module is specifically configured to: calculate the convolution response between the new model F_new^t and the initial model F_init, and convert that convolution response into the similarity value δ_init between F_new^t and F_init; calculate the convolution response between the new model F_new^t and the current incremental update model F_incre^t, and convert that convolution response into the similarity value δ_incre between F_new^t and F_incre^t; and

set two similarity thresholds θ_init and θ_incre, judge whether δ_init ≥ θ_init and δ_incre ≥ θ_incre both hold, and, if so, take the new model F_new^t as the final model F^t at moment t; otherwise, take the incremental update model F_incre^t as the final model F^t at moment t.
5. The apparatus of claim 4, wherein the video data preprocessing module is specifically configured to: read in the video data through a data reading thread, and transcode and frame the video data to obtain a video frame picture sequence containing the target, the sequence comprising the video frame pictures corresponding to the respective moments; and preprocess each video frame picture in the sequence and transmit the preprocessed pictures to the target tracking module, wherein the preprocessing comprises histogram equalization and picture size adjustment.
6. The apparatus of claim 5, wherein the video frame primary filtering processing module is specifically configured to: perform, through the target tracking module, image feature extraction on the video frame picture at the initial moment t_1, set the target position information in the video frame picture at the initial moment t_1 according to the target tracking task, and pass the image block of the frame at moment t_1 through two convolution layers with 3 × 3 kernels to obtain a 125 × 32 convolution feature matrix, which is taken as the initial model F_init; and set the incremental update model F_incre^{t_1} at moment t_1 equal in value to F_init.
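Tying the claims together, a sketch of the full tracking loop, reusing the hypothetical helpers sketched above (preprocess_frames, incremental_update, select_final_model) plus a wrapper extract_features introduced here; all names are illustrative, not from the patent.

    import numpy as np
    import torch

    def extract_features(extractor, frame):
        # Hypothetical wrapper: run the convolutional extractor on one
        # preprocessed BGR frame and return its feature map as a NumPy array.
        x = torch.from_numpy(frame).permute(2, 0, 1).float().unsqueeze(0)
        with torch.no_grad():
            return extractor(x).squeeze(0).numpy()

    def track(video_path, extractor, alpha=0.01):
        frames = preprocess_frames(video_path)           # preprocessing step
        f_init = extract_features(extractor, frames[0])  # initial model F_init
        f_incre = f_init.copy()                          # F_incre^{t_1} = F_init
        f_new_prev, final_models = f_init, [f_init]
        for frame in frames[1:]:
            f_new = extract_features(extractor, frame)                # F_new^t
            f_incre = incremental_update(f_incre, f_new_prev, alpha)  # F_incre^t
            final_models.append(select_final_model(f_new, f_init, f_incre))
            f_new_prev = f_new
        return final_models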
CN201910734740.0A 2019-08-09 2019-08-09 Target tracking method and device based on similar template updating Active CN110796680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910734740.0A CN110796680B (en) 2019-08-09 2019-08-09 Target tracking method and device based on similar template updating

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910734740.0A CN110796680B (en) 2019-08-09 2019-08-09 Target tracking method and device based on similar template updating

Publications (2)

Publication Number Publication Date
CN110796680A CN110796680A (en) 2020-02-14
CN110796680B true CN110796680B (en) 2022-07-29

Family

ID=69427419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910734740.0A Active CN110796680B (en) 2019-08-09 2019-08-09 Target tracking method and device based on similar template updating

Country Status (1)

Country Link
CN (1) CN110796680B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111899284B (en) * 2020-08-14 2024-04-09 北京交通大学 Planar target tracking method based on parameterized ESM network

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109410247A (en) * 2018-10-16 2019-03-01 中国石油大学(华东) A kind of video tracking algorithm of multi-template and adaptive features select

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10657625B2 (en) * 2016-03-29 2020-05-19 Nec Corporation Image processing device, an image processing method, and computer-readable recording medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109410247A (en) * 2018-10-16 2019-03-01 中国石油大学(华东) A kind of video tracking algorithm of multi-template and adaptive features select

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
UCT: Learning Unified Convolutional Networks for Real-time Visual Tracking; Zheng Zhu et al.; 2017 IEEE International Conference on Computer Vision Workshops (ICCVW); 2017-10-29; pp. 1973-1982 *
Research on Target Tracking Algorithms Based on Siamese Networks and Their Application to Ship Scenarios; Wang Yong; China Master's Theses Full-text Database, Engineering Science and Technology II; 2019-07-15 (No. 07); p. C032-9 *
Research on Moving Target Detection and Tracking Algorithms in Video Sequences; Jiao Anxia; China Master's Theses Full-text Database, Information Science and Technology; 2010-06-15 (No. 06); p. I138-443 *

Also Published As

Publication number Publication date
CN110796680A (en) 2020-02-14

Similar Documents

Publication Publication Date Title
US12020474B2 (en) Image processing apparatus, image processing method, and non-transitory computer-readable storage medium
Islam et al. Revisiting salient object detection: Simultaneous detection, ranking, and subitizing of multiple salient objects
WO2020259249A1 (en) Focusing method and device, electronic device and computer-readable storage medium
CN111612008B (en) Image segmentation method based on convolution network
WO2021169334A1 (en) Rapid wide-angle stitching method for high-resolution images
CN112800964B (en) Remote sensing image target detection method and system based on multi-module fusion
CN113286194A (en) Video processing method and device, electronic equipment and readable storage medium
Kumar et al. Recent trends in multicue based visual tracking: A review
WO2022156626A1 (en) Image sight correction method and apparatus, electronic device, computer-readable storage medium, and computer program product
CN112927279A (en) Image depth information generation method, device and storage medium
CN110827320B (en) Target tracking method and device based on time sequence prediction
CN107563299B (en) Pedestrian detection method using RecNN to fuse context information
CN112232134B (en) Human body posture estimation method based on hourglass network and attention mechanism
CN111079539A (en) Video abnormal behavior detection method based on abnormal tracking
KR20210029692A (en) Method and storage medium for applying bokeh effect to video images
CN114140623A (en) Image feature point extraction method and system
CN113505634A (en) Double-flow decoding cross-task interaction network optical remote sensing image salient target detection method
CN114708615A (en) Human body detection method based on image enhancement in low-illumination environment, electronic equipment and storage medium
CN110796680B (en) Target tracking method and device based on similar template updating
WO2022120996A1 (en) Visual position recognition method and apparatus, and computer device and readable storage medium
CN113743300A (en) Semantic segmentation based high-resolution remote sensing image cloud detection method and device
CN117541574A (en) Tongue diagnosis detection method based on AI semantic segmentation and image recognition
CN117456330A (en) MSFAF-Net-based low-illumination target detection method
CN116778187A (en) Salient target detection method based on light field refocusing data enhancement
CN116309050A (en) Image super-resolution method, program product, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant