CN109785385B - Visual target tracking method and system


Info

Publication number
CN109785385B
Authority
CN
China
Prior art keywords
target
position information
tracked
network
historical
Prior art date
Legal status
Active
Application number
CN201910058977.1A
Other languages
Chinese (zh)
Other versions
CN109785385A (en)
Inventor
王金桥 (Wang Jinqiao)
赵飞 (Zhao Fei)
唐明 (Tang Ming)
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201910058977.1A priority Critical patent/CN109785385B/en
Publication of CN109785385A publication Critical patent/CN109785385A/en
Application granted granted Critical
Publication of CN109785385B publication Critical patent/CN109785385B/en

Abstract

The invention relates to a visual target tracking method and a system, wherein the visual target tracking method comprises the following steps: acquiring a plurality of historical target templates and historical position information of a current video sequence of a target to be tracked; determining a plurality of groups of target template images and search areas from the current video sequence of the target to be tracked according to each historical target template; predicting the predicted position information of the target template image in the search area according to the target positioning model, each group of target template images and the search area; based on an action network model, determining a target position prediction profit value of the target template image according to the predicted position information of the target template image and the historical position information; and comparing the target position prediction profit values of all the target template images, and determining the prediction position information of the target template image with the maximum target position prediction profit value, so that the prediction position information of the current frame image of the target to be tracked can be accurately determined.

Description

Visual target tracking method and system
Technical Field
The invention relates to the technical field of image processing, in particular to a visual target tracking method and system.
Background
Visual target tracking is an important research direction in computer vision. In each frame of a video sequence, the tracking algorithm needs to determine the position and scale information of the object to be tracked. Because the appearance of the object to be tracked is given only by the first frame and is affected during tracking by factors such as illumination change, scale change, occlusion and motion blur, the tracking algorithm must not only be highly robust to environmental changes, but must also build a strongly discriminative model of the appearance of the target to be tracked. Under such conditions, conventional target tracking algorithms based on hand-crafted features perform poorly.
In recent years, deep learning has enjoyed great success in many areas of computer vision. Deep convolutional neural networks automatically learn strongly discriminative features from large amounts of training data using error back-propagation. Meanwhile, reinforcement learning algorithms based on deep neural networks have shown great potential in complex environments. In other words, deep neural networks can both extract image features and fit complex decision functions.
The existing visual target tracking algorithms based on deep learning or reinforcement learning have the following defects. First, target tracking algorithms based on reinforcement learning can only adjust the tracking result through a set of discrete actions, which is not only inefficient but also limits tracking precision. Second, existing deep-learning-based tracking algorithms are trained only on pairs of discrete frame images rather than on continuous video sequences, so their accumulated error is large and long-term tracking is not possible. Finally, these tracking algorithms lack an effective target template updating strategy; as tracking time grows, the accumulated error gradually increases and eventually leads to tracking failure.
Disclosure of Invention
In order to solve the above problems in the prior art, i.e. to improve the target tracking accuracy, the invention provides a visual target tracking method and system.
In order to achieve the purpose, the invention provides the following scheme:
a visual target tracking method, the visual target tracking method comprising:
acquiring a plurality of historical target templates and historical position information of a current video sequence of a target to be tracked;
determining a plurality of groups of target template images and search areas from the current video sequence of the target to be tracked according to each historical target template;
predicting the predicted position information of the target template image in the search area according to the target positioning model, each group of target template images and the search area;
based on an action network model, determining a target position prediction profit value of the target template image according to the predicted position information of the target template image and the historical position information;
and comparing the target position prediction profit values of all the target template images, and determining the prediction position information of the target template image with the maximum target position prediction profit value for tracking the current frame image of the target to be tracked.
Optionally, the visual target tracking method further includes:
extracting a prediction target template from the current frame image according to the prediction position information;
and replacing the target template corresponding to the target template image with the minimum target position prediction profit value by the prediction target template so as to update the historical target template.
Optionally, the method for constructing the target location model includes:
carrying out target position labeling on a historical video sequence of a target to be tracked to obtain a target tracking data set of the target to be tracked;
determining training data for training the Siamese network according to the target tracking data set;
and performing Siamese network training on the training data to obtain a target positioning model.
Optionally, the target position labeling is performed on the historical video sequence of the target to be tracked, and a target tracking data set of the target to be tracked is obtained, which specifically includes:
extracting a plurality of frames of images from the historical video sequence;
determining a corresponding target rectangular frame from each frame of image, wherein each target rectangular frame is a minimum rectangular frame containing a target to be tracked;
obtaining a target tracking data set according to each target rectangular frame; the target tracking data set includes a plurality of pairs of coordinate pairs formed by upper left corner coordinate information and lower right corner coordinate information of a set rectangular box.
Optionally, the determining training data for training the Siamese network according to the target tracking data set specifically includes:
determining a target area and a search area of each target rectangular frame according to the target tracking data set;
generating a target response graph in a Gaussian shape in the target area by taking the search area as a reference for each target rectangular frame;
determining a truth sample according to the target response graph; the training data includes a plurality of true value samples.
Optionally, the performing Siamese network training on the training data to obtain a target positioning model specifically includes:
for each target rectangular frame, carrying out pixel scaling on a target image of a target area and a search image of a search area to obtain a pair of scaled target images and scaled search images;
converting each pair of the scaled target image and the scaled search image into RGB three-channel images to obtain a pair of color images;
performing mean subtraction on the color images respectively to obtain a pair of mean-subtracted color images;
feeding each pair of mean-subtracted color images into a deep network for training to obtain a deep network output value;
calculating the error between the deep network output value and the true value sample through a loss function;
according to the error, obtaining a target positioning model by adopting a back propagation algorithm; and the target positioning model determines the predicted position information of the target template image in the search area according to the input target template image and the search area.
Optionally, the target positioning model includes two first sub-networks with identical structure and shared parameters, and one deconvolution sub-network;
the output ends of the two first sub-networks are connected in parallel and then connected to the input end of the deconvolution sub-network;
the outputs of the two first sub-networks are combined to serve as the input of the deconvolution sub-network; the output of the deconvolution sub-network is predicted position information that characterizes the center position and scale information of the target.
Optionally, the method for constructing the action network model includes:
determining a plurality of short video sequences according to a target tracking data set of a target to be tracked; each short video sequence comprises a plurality of frames of position images, and each frame of position image comprises position information of an object to be tracked;
and performing network training according to each short video sequence and the target positioning model to determine an action network model.
Optionally, the action network model includes a plurality of weight-sharing second sub-networks;
the input of each second sub-network is the predicted position information and a plurality of pieces of historical position information, and the output is a target position prediction profit value.
In order to solve the technical problems, the invention also provides the following scheme:
a visual target tracking system, the visual target tracking system comprising:
an acquisition unit, configured to acquire a plurality of historical target templates and historical position information of a current video sequence of a target to be tracked;
the first determining unit is used for determining a plurality of groups of target template images and search areas from the current video sequence of the target to be tracked according to each historical target template;
the prediction unit is used for predicting the predicted position information of the target template image in the search area according to the target positioning model, each group of target template images and the search area;
a second determining unit, configured to determine, based on an action network model, a target position prediction profit value of the target template image according to the predicted position information of the target template image and the historical position information;
and the tracking unit is used for comparing the target position prediction profit values of all the target template images, determining the prediction position information of the target template image with the maximum target position prediction profit value, and tracking the current frame image of the target to be tracked.
According to the embodiment of the invention, the invention discloses the following technical effects:
the target positioning method and the target tracking system are based on the target positioning model and the action network model, and can obtain a plurality of pieces of predicted position information according to the historical target template and the historical position information, further determine corresponding target position predicted income values, and accurately determine the predicted position information of the current frame image of the target to be tracked by comparing the sizes of the target position predicted income values.
Drawings
FIG. 1 is a flow chart of a visual target tracking method of the present invention;
FIG. 2 is a schematic diagram of a visual target tracking method according to an embodiment of the present invention;
FIG. 3a is a schematic diagram of a convolution module structure;
FIG. 3b is a schematic diagram of a deconvolution module structure;
FIG. 4 is a schematic diagram of a structure of an object localization model;
FIG. 5 is a block diagram of a visual target tracking system according to the present invention.
Description of the symbols:
an acquisition unit-1, a first determination unit-2, a prediction unit-3, a second determination unit-4, and a tracking unit-5.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
The invention provides a visual target tracking method which, based on a target positioning model and an action network model, obtains a plurality of pieces of predicted position information from the historical target templates and historical position information, determines the corresponding target position prediction profit values, and accurately determines the predicted position information of the current frame image of the target to be tracked by comparing these profit values.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
As shown in fig. 1 and 2, the visual target tracking method of the present invention includes:
step 100: and acquiring a plurality of historical target templates and historical position information of the current video sequence of the target to be tracked.
Step 200: and determining a plurality of groups of target template images and search areas from the current video sequence of the target to be tracked according to each historical target template.
Step 300: and predicting the predicted position information of the target template image in the search area according to the target positioning model, each group of target template images and the search area.
Step 400: and determining a target position prediction profit value of the target template image according to the predicted position information of the target template image and the historical position information based on an action network model.
Step 500: and comparing the target position prediction profit values of all the target template images, and determining the prediction position information of the target template image with the maximum target position prediction profit value for tracking the current frame image of the target to be tracked.
Preferably, the visual target tracking method of the present invention further comprises:
step 600: extracting a prediction target template from the current frame image according to the prediction position information;
step 700: and replacing the target template corresponding to the target template image with the minimum target position prediction profit value by the prediction target template so as to update the historical target template.
Further, in step 300, the method for constructing the object location model includes:
step 301: and marking the target position of the historical video sequence of the target to be tracked to obtain a target tracking data set of the target to be tracked.
Step 302: and determining training data for training the Siamese network according to the target tracking data set.
Step 303: and performing Siamese network training on the training data to obtain a target positioning model.
In step 301, the target position labeling is performed on the historical video sequence of the target to be tracked, and a target tracking data set of the target to be tracked is obtained, which specifically includes:
step 3011: extracting a plurality of frames of images from the historical video sequence.
Step 3012: and determining a corresponding target rectangular frame from each frame of image, wherein each target rectangular frame is the minimum rectangular frame containing the target to be tracked.
Step 3013: and obtaining a target tracking data set according to each target rectangular frame.
The target tracking data set includes a plurality of pairs of coordinate pairs formed by upper left corner coordinate information and lower right corner coordinate information of a set rectangular box.
The video sequences containing the target to be tracked can be obtained in many ways, for example collected from the Internet or captured directly; in this embodiment, the position of each target to be tracked is annotated manually.
In step 302, the determining training data for training the Siamese network according to the target tracking data set specifically includes:
step 3021: and determining a target area and a search area of each target rectangular frame according to the target tracking data set.
Step 3022: and generating a target response graph in a Gaussian shape in the target area by taking the search area as a reference for each target rectangular frame.
Step 3023: determining a truth sample according to the target response graph; the training data includes a plurality of true value samples.
In the present embodiment, the size of the target area is set to four times the target size, and the target object is fixed at the center of the target area. The length and width of the search area are set to 1.4 to 3.3 times those of the target object, which makes the target positioning model more robust to scale changes of the target object during tracking and also to changes in its aspect ratio. A Gaussian-shaped target response is generated in the target area with the search image of the search area as reference; the remaining area is background and its response is zero.
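For illustration, a Gaussian-shaped target response map of this kind could be generated as follows; the spread parameter `sigma_ratio` is an assumption, since the exact variance is not specified above.

```python
import numpy as np

def gaussian_response_map(map_size, center, target_wh, sigma_ratio=0.1):
    """Gaussian-shaped response centred on the target; all other positions stay near zero."""
    h, w = map_size
    cx, cy = center
    tw, th = target_wh
    ys, xs = np.mgrid[0:h, 0:w]
    sigma_x, sigma_y = sigma_ratio * tw, sigma_ratio * th        # spread follows the target size (assumed ratio)
    resp = np.exp(-(((xs - cx) ** 2) / (2 * sigma_x ** 2) +
                    ((ys - cy) ** 2) / (2 * sigma_y ** 2)))
    return resp.astype(np.float32)
```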
The target positioning model based on the Siamese network is trained with the generated training data, and the trained target positioning model is output. During training, the initial learning rate is set to 1e-4, the batch size is set to 50, and the maximum number of training iterations is set to 1M.
In step 303, the performing Siamese network training on the training data to obtain a target positioning model specifically includes:
step 3031: and for each target rectangular frame, carrying out pixel scaling on the target image of the target area and the search image of the search area to obtain a pair of scaled target image and scaled search image.
Step 3032: and converting each pair of the scaled target image and the scaled search image into RGB three-channel images to obtain a pair of color images.
Step 3033: and performing mean subtraction on the color images respectively to obtain a pair of mean-subtracted color images.
Step 3034: and feeding each pair of mean-subtracted color images into the deep network for training to obtain a deep network output value.
Step 3035: the error is calculated by the loss function on the deep network output value and the true value sample.
Wherein the loss function is a mean square error loss.
Step 3036: and obtaining a target positioning model by adopting a back propagation algorithm according to the error.
The target location model may determine predicted location information of the target template image in a search area according to an input target template image and the search area.
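A rough Python/PyTorch sketch of the preprocessing and training steps 3031-3036 is given below. The crop sizes (128/256), the per-channel mean values and the two-input model signature are illustrative assumptions only.

```python
import numpy as np
import torch
import torch.nn.functional as F

def preprocess(image, out_size, mean_rgb=(122.7, 115.9, 102.1)):
    """Steps 3031-3033: scale, ensure RGB three channels, subtract the (assumed) channel means."""
    img = np.asarray(image, dtype=np.float32)
    if img.ndim == 2:                                            # grayscale -> RGB three-channel
        img = np.stack([img] * 3, axis=-1)
    rows = np.linspace(0, img.shape[0] - 1, out_size).astype(int)
    cols = np.linspace(0, img.shape[1] - 1, out_size).astype(int)
    img = img[rows][:, cols]                                     # nearest-neighbour pixel scaling
    img = img - np.array(mean_rgb, dtype=np.float32)             # mean subtraction
    return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)   # 1 x C x H x W

def train_step(model, optimizer, target_img, search_img, gt_response):
    """Steps 3034-3036: forward pass, mean-squared-error loss, back-propagation."""
    optimizer.zero_grad()
    pred_response = model(preprocess(target_img, 128), preprocess(search_img, 256))
    loss = F.mse_loss(pred_response, gt_response)                # the loss function is MSE
    loss.backward()                                              # back propagation of the error
    optimizer.step()
    return loss.item()
```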
Through training, the target positioning model can accurately position the target in the search area according to the input target template image and the search area. Namely, the position of the target to be tracked in the search area is determined according to the position of the maximum response value in the response diagram, and meanwhile, the length and the width of the target are determined according to the shape and the size of the response diagram.
The target center position is the position of the maximum response point C in the response map. The target width is determined as the distance between the two farthest points along the x direction through element C whose response values are greater than the threshold 0.1. Similarly, the target height is the distance between the two farthest points along the y direction through element C whose response values are greater than the threshold 0.1.
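This decoding rule translates directly into code; the 0.1 threshold comes from the text above, the rest is bookkeeping (a sketch, not the patent's implementation):

```python
import numpy as np

def box_from_response(resp, thresh=0.1):
    """Recover the target centre, width and height from a response map."""
    cy, cx = np.unravel_index(np.argmax(resp), resp.shape)       # point C: maximum response
    row, col = resp[cy, :], resp[:, cx]
    xs = np.flatnonzero(row > thresh)                            # x direction through C, responses above 0.1
    ys = np.flatnonzero(col > thresh)                            # y direction through C, responses above 0.1
    width = float(xs[-1] - xs[0]) if xs.size else 0.0            # distance between the two farthest points
    height = float(ys[-1] - ys[0]) if ys.size else 0.0
    return (float(cx), float(cy)), width, height
```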
Specifically, the target positioning model comprises two first sub-networks with identical structure and shared parameters, and one deconvolution sub-network.
The outputs of the two first sub-networks are connected in parallel and then connected to the input of the deconvolution sub-network.
The outputs of the two first sub-networks are combined to serve as the input of the deconvolution sub-network; the output of the deconvolution sub-network is predicted position information that characterizes the center position and scale information of the target.
The two sub-networks with the same structure and shared parameters each comprise 8 convolution modules (as shown in fig. 4), containing 32, 64, 128, 256, 512 and 512 feature maps. The deconvolution sub-network comprises 8 deconvolution modules, containing 512, 256, 128, 64, 32, 16, 8 and 1 feature maps respectively.
As shown in fig. 3a, each convolution module contains 1 input layer, 3 convolution layers, 2 batch normalization layers, 2 nonlinear functions, 1 element-wise addition layer, and 1 output layer. As shown in fig. 3b, each deconvolution module contains 1 input layer, 1 deconvolution layer, 2 convolution layers, 2 batch normalization layers, 2 nonlinear functions, 1 element-wise addition layer, and 1 output layer.
The Batch Normalization (BN) layer consists of a batch norm step and a scale step. The batch norm step normalizes its input to zero mean and unit variance, while the scale step scales and translates the input. The mean and variance of the batch norm step come from the input, whereas the scale and translation parameters of the scale step are learned from the training data. By normalizing the network input, the batch normalization layer effectively reduces internal covariate shift, accelerates convergence, and acts as a regularization mechanism that helps prevent overfitting. The nonlinear function is the ReLU (Rectified Linear Unit) activation function, a commonly used and effective nonlinear activation function that is not described further here.
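One possible PyTorch realization of such a convolution module (3 convolution layers, 2 batch normalization layers, 2 ReLUs and an element-wise addition, per fig. 3a) is sketched below; the exact wiring of the third convolution as a shortcut branch is an assumption, since the figure is not reproduced here.

```python
import torch.nn as nn

class ConvModule(nn.Module):
    """Convolution module sketch: residual-style block matching the layer counts of fig. 3a."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.shortcut = nn.Conv2d(in_ch, out_ch, 1, stride=stride)   # third conv used as projection shortcut (assumed)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return self.relu(y + self.shortcut(x))                       # element-wise addition layer
```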
In step 400, the method for constructing the action network model includes:
step 401: and determining a plurality of short video sequences according to a target tracking data set of the target to be tracked.
Each short video sequence comprises a plurality of frames of position images, and each frame of position image contains position information of an object to be tracked. In this embodiment, each short video sequence contains at least 50 frames and at most 100 frames of images.
Step 402: and performing network training according to each short video sequence and the target positioning model to determine an action network model.
Wherein the action network model comprises a plurality of weight-sharing second sub-networks.
The input of each second sub-network is the predicted position information and a plurality of pieces of historical position information, and the output is a target position prediction profit value.
Each second sub-network contains 2 fully connected layers, with 64 and 32 neurons respectively. The outputs of all the sub-networks are concatenated into a vector and fed into a fully connected layer with 64 neurons, and the output has n nodes. Here n is 8 and m is 12.
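A possible PyTorch sketch of this action network is given below; the per-template input dimension (derived from m = 12 stored boxes of 4 coordinates) and the final two-layer head are assumptions where the description is ambiguous.

```python
import torch
import torch.nn as nn

class ActionNetwork(nn.Module):
    """Weight-shared per-template MLP (64, 32 units), concatenated and mapped to n profit values."""
    def __init__(self, n_templates=8, in_dim=4 * 12):                 # n = 8; m = 12 boxes of (x, y, w, h) assumed
        super().__init__()
        self.sub = nn.Sequential(                                     # second sub-network, shared across templates
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(32 * n_templates, 64), nn.ReLU(),
            nn.Linear(64, n_templates),                               # one prediction profit value per template
        )

    def forward(self, per_template_inputs):                           # list of n tensors, each (batch, in_dim)
        feats = torch.cat([self.sub(x) for x in per_template_inputs], dim=1)
        return self.head(feats)
```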
In the network training phase, the elements involved in reinforcement learning are first defined as follows: the agent is the target positioning model; the environment (env) is the current video frame together with all stored target templates; the state (s) is the stored historical coordinates and the predicted target coordinates corresponding to all target templates; the action (a) is selecting the best template from the target templates; and the reward (r) is the intersection-over-union (IoU) between the predicted target position (bbx_pre) and the ground-truth box (bbx_gt), i.e.
r = IoU(bbx_pre, bbx_gt) = area(bbx_pre ∩ bbx_gt) / area(bbx_pre ∪ bbx_gt) (1)
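The reward of equation (1) is the standard intersection-over-union; for boxes given as (x1, y1, x2, y2) corners (a coordinate convention assumed here) it can be computed as:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2); used as the reward r."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-8)
```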
The action value Qπ(s, a) represents the expected return of selecting action a in state s and thereafter following policy π, and Vπ(s) represents the expected return of following policy π from state s. During training, actions are selected according to the probability distribution output by the action network.
During training, the state s_t is first obtained from env, and the reward r is computed using equation (1). The env is then updated using the action network, the next state s_{t+1} is obtained from env, and the TD error is computed, i.e.
td = r + γ·Vπ(s_{t+1}; θ_c) − Vπ(s_t; θ_c) (2);
Where γ is equal to 0.9. The gradient of the action network is then calculated, i.e.
dθ_a = ∇_{θ_a} log π(a_t | s_t; θ_a) · td (3);
The gradient of the critic network is then calculated, i.e.
dθ_c = ∇_{θ_c} (td)² (4);
Finally, the action network and the critic network are updated using their respective gradients.
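A minimal sketch of one such TD-error update is shown below, under the assumptions that `critic(state)` returns V(s; θ_c) and that the caller supplies log π(a|s; θ_a) for the selected template; it mirrors equations (2)-(4) but is not the patent's actual training code.

```python
import torch

def actor_critic_step(critic, actor_opt, critic_opt,
                      state, next_state, action_logprob, reward, gamma=0.9):
    """One temporal-difference update of the action network and the critic network."""
    with torch.no_grad():
        v_next = critic(next_state)
    v_curr = critic(state)
    td = reward + gamma * v_next - v_curr                    # equation (2): TD error

    actor_loss = -(action_logprob * td.detach()).mean()      # equation (3): policy gradient weighted by td
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    critic_loss = td.pow(2).mean()                           # equation (4): squared TD error
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
    return td.detach().mean().item()
```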
For the single-target tracking problem, the invention designs a deep neural network based on a Siamese network and an actor-critic network structure based on a multilayer perceptron. Through offline supervised learning, the Siamese-network-based deep neural network can accurately locate the target within the search area and accurately predict the length and width of the target; by updating the parameters of the deconvolution part of the Siamese network online, the prediction accuracy of the network can be greatly improved. Through reinforcement learning based on the temporal-difference error, the action network learns to accurately predict the value of different target templates during tracking and thereby learns a robust target template updating strategy; meanwhile, the critic network evaluates the decisions made by the action network during training, so that the action network learns an even more robust template updating strategy. The method achieves high tracking performance in single-target visual tracking tasks.
In addition, the invention also provides a visual target tracking system to improve the target tracking precision.
As shown in fig. 5, the visual target tracking system of the present invention includes an acquisition unit 1, a first determination unit 2, a prediction unit 3, a second determination unit 4, and a tracking unit 5.
The acquiring unit 1 is configured to acquire a plurality of historical target templates and historical position information of a current video sequence of a target to be tracked.
The first determining unit 2 is configured to determine, according to each of the historical target templates, a plurality of sets of target template images and search regions from the current video sequence of the target to be tracked.
The prediction unit 3 is configured to predict the predicted position information of the target template image in the search area according to the target positioning model, and each set of target template image and search area.
The second determining unit 4 is configured to determine a target position prediction profit value of the target template image according to the predicted position information of the target template image and the historical position information based on an action network model.
The tracking unit 5 is configured to compare the target position prediction benefit values of the target template images, determine the prediction position information of the target template image with the maximum target position prediction benefit value, and track the current frame image of the target to be tracked.
Preferably, the visual target tracking system of the present invention further comprises an extraction unit and an updating unit. The extraction unit is used for extracting a prediction target template from the current frame image according to the prediction position information; the updating unit is used for replacing the target template corresponding to the target template image with the minimum target position prediction profit value with the prediction target template, so as to update the historical target template.
Compared with the prior art, the visual target tracking system has the same beneficial effects as the visual target tracking method, and is not repeated herein.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A visual target tracking method, comprising:
acquiring a plurality of historical target templates and historical position information of a current video sequence of a target to be tracked;
determining a plurality of groups of target template images and search areas from the current video sequence of the target to be tracked according to each historical target template;
predicting the predicted position information of the target template image in the search area according to the target positioning model, each group of target template images and the search area;
based on an action network model, determining a target position prediction profit value of the target template image according to the predicted position information of the target template image and the historical position information;
and comparing the target position prediction profit values of all the target template images, and determining the prediction position information of the target template image with the maximum target position prediction profit value for tracking the current frame image of the target to be tracked.
2. The visual target tracking method of claim 1, further comprising:
extracting a prediction target template from the current frame image according to the prediction position information of the target template image with the maximum target position prediction profit value;
and replacing the target template corresponding to the target template image with the minimum target position prediction profit value by the prediction target template so as to update the historical target template.
3. The visual target tracking method of claim 1, wherein the target location model is constructed by a method comprising:
carrying out target position labeling on a historical video sequence of a target to be tracked to obtain a target tracking data set of the target to be tracked;
determining training data for training the Siamese network according to the target tracking data set;
and performing Siamese network training on the training data to obtain a target positioning model.
4. The visual target tracking method according to claim 3, wherein the target position labeling is performed on the historical video sequence of the target to be tracked, and a target tracking data set of the target to be tracked is obtained, specifically comprising:
extracting a plurality of frames of images from the historical video sequence;
determining a corresponding target rectangular frame from each frame of image, wherein each target rectangular frame is a minimum rectangular frame containing a target to be tracked;
obtaining a target tracking data set according to each target rectangular frame; the target tracking data set includes a plurality of pairs of coordinate pairs formed by upper left corner coordinate information and lower right corner coordinate information of a set rectangular box.
5. The visual target tracking method according to claim 4, wherein the determining training data for training the Siamese network according to the target tracking data set specifically comprises:
determining a target area and a search area of each target rectangular frame according to the target tracking data set;
generating a target response graph in a Gaussian shape in the target area by taking the search area as a reference for each target rectangular frame;
determining a truth sample according to the target response graph; the training data includes a plurality of true value samples.
6. The visual target tracking method according to claim 5, wherein the performing Siamese network training on the training data to obtain a target positioning model specifically comprises:
for each target rectangular frame, carrying out pixel scaling on a target image of a target area and a search image of a search area to obtain a pair of scaled target images and scaled search images;
converting each pair of the scaled target image and the scaled search image into RGB three-channel images to obtain a pair of color images;
performing mean subtraction on the color images respectively to obtain a pair of mean-subtracted color images;
feeding each pair of mean-subtracted color images into a deep network for training to obtain a deep network output value;
calculating the error between the deep network output value and the true value sample through a loss function;
according to the error, obtaining a target positioning model by adopting a back propagation algorithm; and the target positioning model determines the predicted position information of the target template image in the search area according to the input target template image and the search area.
7. A visual target tracking method according to any one of claims 1-6, wherein the target positioning model comprises two first sub-networks with identical structure and shared parameters, and one deconvolution sub-network;
the output ends of the two first sub-networks are connected in parallel and then connected to the input end of the deconvolution sub-network;
the outputs of the two first sub-networks are combined to serve as the input of the deconvolution sub-network; the output of the deconvolution sub-network is predicted position information that characterizes the center position and scale information of the target.
8. The visual target tracking method of claim 3, wherein the method of constructing the action network model comprises:
determining a plurality of short video sequences according to a target tracking data set of a target to be tracked; each short video sequence comprises a plurality of frames of position images, and each frame of position image comprises position information of an object to be tracked;
and performing network training according to each short video sequence and the target positioning model to determine an action network model.
9. The visual target tracking method of claim 1, wherein the action network model comprises a plurality of weight-sharing second sub-networks;
the input of each second sub-network is the predicted position information and a plurality of pieces of historical position information, and the output is a target position prediction profit value.
10. A visual target tracking system, the visual target tracking system comprising:
an acquisition unit, configured to acquire a plurality of historical target templates and historical position information of a current video sequence of a target to be tracked;
the first determining unit is used for determining a plurality of groups of target template images and search areas from the current video sequence of the target to be tracked according to each historical target template;
the prediction unit is used for predicting the predicted position information of the target template image in the search area according to the target positioning model, each group of target template images and the search area;
a second determining unit, configured to determine, based on an action network model, a target position prediction profit value of the target template image according to the predicted position information of the target template image and the historical position information;
and the tracking unit is used for comparing the target position prediction profit values of all the target template images, determining the prediction position information of the target template image with the maximum target position prediction profit value, and tracking the current frame image of the target to be tracked.
CN201910058977.1A 2019-01-22 2019-01-22 Visual target tracking method and system Active CN109785385B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910058977.1A CN109785385B (en) 2019-01-22 2019-01-22 Visual target tracking method and system

Publications (2)

Publication Number Publication Date
CN109785385A CN109785385A (en) 2019-05-21
CN109785385B (en) 2021-01-29

Family

ID=66502068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910058977.1A Active CN109785385B (en) 2019-01-22 2019-01-22 Visual target tracking method and system

Country Status (1)

Country Link
CN (1) CN109785385B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738687A (en) * 2019-10-18 2020-01-31 上海眼控科技股份有限公司 Object tracking method, device, equipment and storage medium
CN110796686B (en) * 2019-10-29 2022-08-09 浙江大华技术股份有限公司 Target tracking method and device and storage device
CN111161311A (en) * 2019-12-09 2020-05-15 中车工业研究院有限公司 Visual multi-target tracking method and device based on deep learning
CN111161314B (en) * 2019-12-17 2024-03-12 中国科学院上海微系统与信息技术研究所 Target object position area determination method and device, electronic equipment and storage medium
CN111392619B (en) * 2020-03-25 2021-11-23 广东博智林机器人有限公司 Tower crane early warning method, device and system and storage medium
CN111563913B (en) * 2020-04-15 2021-12-10 上海摩象网络科技有限公司 Searching method and device based on tracking target and handheld camera thereof
CN112037254A (en) * 2020-08-11 2020-12-04 浙江大华技术股份有限公司 Target tracking method and related device
CN112037255A (en) * 2020-08-12 2020-12-04 深圳市道通智能航空技术有限公司 Target tracking method and device
CN112614111B (en) * 2020-12-24 2023-09-05 南开大学 Video tampering operation detection method and device based on reinforcement learning
CN113052874B (en) * 2021-03-18 2022-01-25 上海商汤智能科技有限公司 Target tracking method and device, electronic equipment and storage medium
CN113421287A (en) * 2021-07-16 2021-09-21 上海微电机研究所(中国电子科技集团公司第二十一研究所) Robot based on vision active target tracking and control method and system thereof
CN113947616B (en) * 2021-09-23 2022-08-30 北京航空航天大学 Intelligent target tracking and loss rechecking method based on hierarchical perceptron

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104424634A (en) * 2013-08-23 2015-03-18 株式会社理光 Object tracking method and device
CN105931269A (en) * 2016-04-22 2016-09-07 海信集团有限公司 Tracking method for target in video and tracking device thereof
US9552648B1 (en) * 2012-01-23 2017-01-24 Hrl Laboratories, Llc Object tracking with integrated motion-based object detection (MogS) and enhanced kalman-type filtering
CN106408592A (en) * 2016-09-09 2017-02-15 南京航空航天大学 Target tracking method based on target template updating
CN107038448A (en) * 2017-03-01 2017-08-11 中国科学院自动化研究所 Target detection model building method
CN108021856A (en) * 2016-10-31 2018-05-11 比亚迪股份有限公司 Light for vehicle recognition methods, device and vehicle
CN108460787A (en) * 2018-03-06 2018-08-28 北京市商汤科技开发有限公司 Method for tracking target and device, electronic equipment, program, storage medium
CN109145781A (en) * 2018-08-03 2019-01-04 北京字节跳动网络技术有限公司 Method and apparatus for handling image

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521840B (en) * 2011-11-18 2014-06-18 深圳市宝捷信科技有限公司 Moving target tracking method, system and terminal
CN102982559B (en) * 2012-11-28 2015-04-29 大唐移动通信设备有限公司 Vehicle tracking method and system
CN108010058A (en) * 2017-11-29 2018-05-08 广东技术师范学院 A kind of method and system that vision tracking is carried out to destination object in video flowing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"视频目标跟踪与定位方法研究";杨静;《中国优秀硕士学位论文全文数据库 信息科技辑》;20180315(第3期);正文第2-12页 *

Also Published As

Publication number Publication date
CN109785385A (en) 2019-05-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant