CN113221676B - Target tracking method and device based on multidimensional features - Google Patents

Info

Publication number
CN113221676B
CN113221676B (application CN202110450929.4A)
Authority
CN
China
Prior art keywords
image
target
dimensional
feature data
target object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110450929.4A
Other languages
Chinese (zh)
Other versions
CN113221676A (en)
Inventor
程力
赵明心
窦润江
刘力源
刘剑
吴南健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Semiconductors of CAS
Original Assignee
Institute of Semiconductors of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Semiconductors of CAS filed Critical Institute of Semiconductors of CAS
Priority to CN202110450929.4A priority Critical patent/CN113221676B/en
Publication of CN113221676A publication Critical patent/CN113221676A/en
Application granted granted Critical
Publication of CN113221676B publication Critical patent/CN113221676B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target tracking method and device based on multidimensional features. The method comprises the following steps: determining a candidate image from an image to be tracked based on the position of a target object in a target tracking image, wherein the target tracking image is an image associated with the image to be tracked within a preset time period; extracting features of a target image, which contains the target object, from different dimensions to generate first multi-dimensional feature data; extracting features of the candidate image from different dimensions to generate second multi-dimensional feature data; performing feature shaping processing on the first multi-dimensional feature data and the second multi-dimensional feature data to obtain a shaping feature map; and inputting the shaping feature map into a target tracking model, which outputs a target object confidence representing the confidence that the candidate image contains the target object. Extracting multi-dimensional features from the target image and the candidate image separately improves the robustness of the target tracking model in different scenes.

Description

Target tracking method and device based on multidimensional features
Technical Field
The invention relates to the technical field of computer vision, in particular to a target tracking method and device based on multidimensional features.
Background
Visual target tracking is an important research topic in computer vision. Common target tracking algorithms can be divided into discriminative models and generative models. The core of a discriminative model is training a classifier that separates the target from the background, whereas the core of a generative model is modelling the tracked target and searching the next frame for the image block most similar to it.
A generative tracking framework can generally be divided into two parts: feature extraction and feature classification. Existing generative models typically describe the target with a single feature, which cannot describe it adequately. Some researchers have proposed describing an image with multiple features simultaneously; introducing multiple features, however, also requires multiple classifiers, one per feature, whose results are finally combined linearly to produce the final output. The problem with this approach is that, as the number of features grows, the design complexity and computational redundancy of the classifiers increase rapidly, while the effectiveness and the optimal combination of the classifier ensemble remain open research questions.
Disclosure of Invention
Accordingly, it is a primary object of the present invention to provide a method and apparatus for object tracking based on multi-dimensional features, so as to at least partially solve at least one of the above-mentioned problems.
In order to achieve the above object, the present invention provides a technical solution comprising:
one aspect of the present invention provides a target tracking method based on a multi-dimensional feature, including:
determining a candidate image from an image to be tracked based on the position of a target object in a target tracking image, wherein the target tracking image is an associated image in a preset time period of the image to be tracked;
extracting features of a target image from different dimensions to generate first multi-dimensional feature data, wherein the target image comprises the target object;
extracting features of the candidate images from different dimensions to generate second multi-dimensional feature data;
performing feature shaping processing on the first multi-dimensional feature data and the second multi-dimensional feature data to obtain a shaping feature map; and
inputting the shaping feature map into a target tracking model, and outputting a target object confidence, wherein the target object confidence represents the confidence that the candidate image contains the target object.
Another aspect of the present invention provides a multi-dimensional feature-based object tracking device, comprising:
the determining module is used for determining a candidate image from images to be tracked based on the position of a target object in a target tracking image, wherein the target tracking image is an associated image within a preset time period of the images to be tracked;
the first extraction module is used for extracting features of the target image from different dimensions and generating first multi-dimensional feature data;
the second extraction module is used for extracting features of the candidate images from different dimensions and generating second multi-dimensional feature data;
the feature shaping module is used for performing feature shaping processing on the first multi-dimensional feature data and the second multi-dimensional feature data to obtain a shaping feature map; and
the target tracking module is used for inputting the shaping feature map into a target tracking model and outputting a target object confidence, wherein the target object confidence represents the confidence that the candidate image contains the target object.
Based on the above technical solution, compared with the prior art, the present invention has at least one or a part of the following beneficial effects:
1. extracting multi-dimensional features from the target image and the candidate image separately improves the robustness of the target tracking model in different scenes;
2. using the target tracking model to track the target in the candidate images reduces the design difficulty of prior-art classifiers as well as their computational complexity and redundancy, while making full use of the multi-dimensional features of the target image and the candidate images;
3. the target tracking method based on multidimensional features makes the target tracking process interpretable, so that it can be adjusted according to the actual application;
4. the target tracking model is a neural network classifier, so the feature extraction process and the target tracking process are relatively independent, which facilitates parallel operation and thus accelerates target tracking.
Drawings
FIG. 1 schematically illustrates a flow chart of a multi-dimensional feature-based object tracking method provided in accordance with an embodiment of the present invention;
FIG. 2 schematically illustrates a flow chart provided in accordance with an embodiment of the present invention for determining candidate images;
FIG. 3 schematically illustrates a flow chart for updating a target image in a template pool provided in accordance with an embodiment of the present invention;
FIG. 4 schematically illustrates a flow chart for updating a target image in a template pool provided in accordance with another embodiment of the invention;
FIG. 5 schematically illustrates a target tracking method based on multi-dimensional features according to an embodiment of the present invention;
FIG. 6 schematically illustrates a flow chart for obtaining the shaping feature map provided by an embodiment of the present invention; and
fig. 7 schematically illustrates a block diagram of a target tracking device based on multi-dimensional features according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to specific embodiments and the accompanying drawings.
Fig. 1 schematically shows a flowchart of a target tracking method based on multi-dimensional features according to an embodiment of the invention.
As shown in fig. 1, the method includes operations S101 to S105.
In operation S101, a candidate image is determined from images to be tracked based on a position of a target object in a target tracking image, wherein the target tracking image is an associated image within a preset period of time of the images to be tracked.
According to an embodiment of the present invention, the target tracking image may include an image of a frame preceding the frame in which the image to be tracked is located.
According to the embodiment of the invention, in a typical video image sequence captured at 30 frames per second, the interval between two adjacent frames is about 0.03 seconds.
According to an alternative embodiment of the present invention, since the shooting interval between two adjacent frame images is short, the displacement of the target object on the two adjacent frame images is generally small, and thus, the target tracking image may include an image of a frame preceding the frame where the image to be tracked is located.
According to the embodiment of the invention, the probability of including the target object in the candidate image can be improved by determining the candidate image from the image to be tracked based on the position of the target object in the target tracking image.
In operation S102, features of a target image are extracted from different dimensions, and first multi-dimensional feature data is generated, wherein the target image includes a target object therein.
In operation S103, features of the candidate image are extracted from different dimensions, and second multi-dimensional feature data is generated.
In operation S104, feature shaping processing is performed on the first multi-dimensional feature data and the second multi-dimensional feature data to obtain a shaping feature map; and
in operation S105, the shaping feature map is input into the target tracking model, and a target object confidence is output, where the target object confidence characterizes the confidence that the candidate image contains the target object.
The embodiment of the invention thus provides a target tracking method based on multi-dimensional features. Extracting multi-dimensional features from the target image and the candidate image separately improves the robustness of the target tracking model in different scenes, and using the target tracking model to track the target in the candidate image reduces the design difficulty, computational complexity and redundancy of prior-art classifiers while making full use of the multi-dimensional features of both images.
According to an embodiment of the invention, the first multi-dimensional feature data comprises at least one of: the first scale invariant feature data, the first rotation invariant feature data and the first illumination invariant feature data.
The second multi-dimensional feature data includes at least one of: the second scale invariant feature data, the second rotation invariant feature data and the second illumination invariant feature data.
According to the embodiment of the invention, Harris corner detection, scale-invariant feature transform (SIFT) and local binary pattern processing are performed on the target image to obtain the first scale invariant feature data, the first rotation invariant feature data and the first illumination invariant feature data, respectively.
Harris corner detection, scale-invariant feature transform and local binary pattern processing are likewise performed on the candidate image to obtain the second scale invariant feature data, the second rotation invariant feature data and the second illumination invariant feature data, respectively.
According to the embodiment of the invention, extracting the scale-invariant, rotation-invariant and illumination-invariant features of both the target image and the candidate image fully represents their multi-dimensional characteristics, which improves the robustness of the target tracking model in different scenes.
According to the embodiment of the invention, in order to ensure the parallelism of the feature extraction process, the extraction process of the scale-invariant feature, the rotation-invariant feature and the illumination-invariant feature can be performed simultaneously.
According to the embodiment of the invention, on the premise of enough hardware resources, the feature extraction operation can be simultaneously carried out on a plurality of candidate images.
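To make the extraction step concrete, the following is a minimal Python sketch of the three extractors named above, using OpenCV and scikit-image as stand-in implementations. The patent names no library, so the specific calls and parameters (Harris block size, LBP radius, and so on) are illustrative assumptions rather than the patent's reference code.

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def extract_multidim_features(patch_gray):
    """patch_gray: a normalized 32x32 uint8 grayscale target or candidate patch."""
    # Harris corner response map (stands in for the scale invariant feature data).
    harris = cv2.cornerHarris(np.float32(patch_gray), 2, 3, 0.04)

    # SIFT descriptors (stand in for the rotation invariant feature data);
    # may be None if no keypoints are found on so small a patch.
    _, sift_desc = cv2.SIFT_create().detectAndCompute(patch_gray, None)

    # Uniform local binary patterns (stand in for the illumination
    # invariant feature data).
    lbp = local_binary_pattern(patch_gray, P=8, R=1, method="uniform")

    # The three extractors are independent of one another, so they can
    # run in parallel when hardware resources allow, as noted above.
    return harris, sift_desc, lbp
```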
Fig. 2 schematically illustrates a flowchart for determining a candidate image from an image to be tracked based on the position of a target object in a target tracking image according to an embodiment of the present invention.
According to an embodiment of the present invention, referring to fig. 2, determining a candidate image from an image to be tracked based on a position of a target object in a target tracking image includes operations S201 to S203.
In operation S201, an image to be tracked is acquired.
In operation S202, a search area of the image to be tracked is determined based on the position of the target object in the target tracking image, wherein the range of the search area is greater than the range of the target object in the target tracking image.
According to an embodiment of the present invention, the center of the search area may coincide with the center of the target object in the target tracking image.
According to the embodiment of the invention, the length of the search area may be 1.5 to 2.5 times the length of the target object's range in the target tracking image, and the width of the search area may likewise be 1.5 to 2.5 times the width of the target object's range.
According to an alternative embodiment of the present invention, the length of the search area may be 2 times the length of the target object's range in the target tracking image, and the width of the search area may be 2 times the width of the target object's range.
In operation S203, a candidate image is determined from the search area in the image to be tracked using a sliding window method.
According to the embodiment of the present invention, when a candidate image is determined from a search area in an image to be tracked using a sliding window method, the range of the sliding window may be the same as the range of a target object in a target tracking image.
According to an embodiment of the present invention, determining a candidate image from a search area in an image to be tracked using a sliding window method includes: a plurality of candidate images are determined from a search area in the image to be tracked using a sliding window method.
According to the embodiment of the invention, the sliding window can be utilized to slide in the search area according to the preset step length, one candidate image is determined every time the sliding window slides, and a plurality of candidate images can be determined through multiple sliding of the sliding window.
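As a rough illustration of operations S201 to S203, the sketch below scans a search area centred on the previous target position, here twice the target's size (one of the ratios the description allows), with a sliding window of the target's own size. The step size and the tuple-based box format are assumptions.

```python
def generate_candidates(frame, prev_box, step=4, scale=2.0):
    """frame: H x W (x C) array; prev_box: (x, y, w, h) of the target
    in the target tracking image (the previous frame)."""
    x, y, w, h = prev_box
    cx, cy = x + w // 2, y + h // 2            # search area shares the target's centre
    sw, sh = int(scale * w), int(scale * h)    # search range is 2x the target range
    x0, y0 = max(0, cx - sw // 2), max(0, cy - sh // 2)
    x_max = min(frame.shape[1] - w, x0 + sw - w)
    y_max = min(frame.shape[0] - h, y0 + sh - h)

    candidates = []
    for yy in range(y0, y_max + 1, step):      # each placement of the window
        for xx in range(x0, x_max + 1, step):  # yields one candidate image
            candidates.append((frame[yy:yy + h, xx:xx + w], (xx, yy, w, h)))
    return candidates
```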
Fig. 3 schematically shows a flow chart for updating a target image in a template pool with a candidate image corresponding to the highest target object confidence provided in accordance with an embodiment of the present invention.
According to an embodiment of the present invention, referring to fig. 3, the target tracking method further includes operations S301 to S304.
In operation S301, a template pool is constructed, wherein a plurality of target images are included in the template pool.
In operation S302, a plurality of target object confidences are obtained, where each target object confidence is a confidence obtained by performing feature shaping processing on the second multi-dimensional feature data of one candidate image and the first multi-dimensional feature data of one target image, and inputting the obtained shaped feature map to the target tracking model.
According to other embodiments of the present invention, the plurality of target object confidences may be averaged to obtain an average confidence, which is used as the final confidence of the image to be tracked.
In operation S303, a highest target object confidence is determined from the plurality of target object confidences.
In operation S304, in case it is determined that the highest target object confidence is greater than the first preset threshold, the target image in the template pool is updated with the candidate image corresponding to the highest target object confidence.
According to an embodiment of the present invention, the first preset threshold may represent the degree of similarity between the target object in the candidate image and the real target object. For example, a first preset threshold of 80 represents a similarity of 80%; the threshold is not limited thereto, however, and may also be 83, 85, or 87.
According to the embodiment of the invention, the first preset threshold is not particularly limited, and the first preset threshold can be flexibly adjusted according to practical application by a person skilled in the art.
According to the embodiment of the invention, the candidate image may contain no target object, or the target object in the candidate image may be of poor quality. Because the embodiment of the invention first determines whether the highest target confidence is greater than the first preset threshold before updating the template pool, and updates the pool only when it is, the validity of the target images in the template pool is guaranteed.
According to an embodiment of the invention, the plurality of target images in the template pool are ordered according to the update time.
According to an embodiment of the present invention, for example, four target images are included in the template pool, i.e., Target = {Target1, Target2, Target3, Target4}, where Target1 may be the first target image updated into the template pool and Target2 may be the second.
According to an embodiment of the invention, target1 may be a target image of a manually annotated target object.
It should be noted that, the foregoing description of the template pool is merely exemplary, and does not limit the template pool provided by the embodiments of the present invention.
Fig. 4 schematically shows a flow chart for updating a target image in a template pool with a candidate image corresponding to the highest target object confidence, provided in accordance with another embodiment of the invention.
According to an embodiment of the present invention, referring to fig. 4, the target tracking method includes S301 to S304 and S401 to S403. Operations S301 to S304 are the same as or similar to the method described above with reference to fig. 3, and will not be described here again.
In operation S401, it is determined whether the number of target images in the template pool is greater than a second preset threshold.
In case that the number of target images in the template pool is greater than the second preset threshold, operation S402 is performed.
In operation S402, the target image in the template pool is updated with the candidate image corresponding to the highest target object confidence based on the update time.
According to the embodiment of the invention, since Target1 is the manually annotated target image of the target object, the target image that was second to be updated into the template pool can be deleted: Target2 is removed, Target3 becomes Target2, Target4 becomes Target3, and finally the candidate image corresponding to the highest target object confidence is added as the new Target4, thereby updating the target images in the template pool.
Operation S403 is performed in case the number of target images in the template pool is less than a second preset threshold.
In operation S403, the candidate image corresponding to the highest target object confidence is added to the template pool.
According to the embodiment of the invention, the target image in the template pool is continuously updated in the application process, so that the target object in the target image in the template pool can be ensured to keep higher similarity with the real target object, and the effectiveness of the target image is ensured.
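A minimal sketch of the update policy of operations S301 to S304 and S401 to S403 follows, under the assumptions that the pool holds at most four templates, that pool[0] is the manually annotated Target1, and that confidences are normalized to [0, 1]; only the eviction policy itself comes from the description.

```python
def update_template_pool(pool, candidate, confidence,
                         conf_threshold=0.8, max_size=4):
    """pool: list of target images ordered by update time, pool[0] being
    the manually annotated template that is never evicted."""
    if confidence <= conf_threshold:      # S304 guard: only high-confidence
        return pool                       # candidates may enter the pool
    if len(pool) >= max_size:             # S402: evict the second-oldest
        pool = [pool[0]] + pool[2:]       # entry, keeping Target1
    pool.append(candidate)                # S403: newest template at the end
    return pool
```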
Fig. 5 schematically illustrates a target tracking method based on multi-dimensional features according to an embodiment of the present invention.
According to the embodiment of the invention, the target tracking model may be a model based on a neural network classifier. Because a single neural network classifier performs the target tracking on the candidate image, no separate classifier needs to be designed for the features of each dimension after the multi-dimensional features of the candidate image and the target image are extracted, which reduces classifier design complexity and computational redundancy.
According to an embodiment of the present invention, and referring to fig. 5, the target tracking model based on a neural network classifier includes a first convolution layer, a first pooling layer, a nonlinear activation layer, a second convolution layer, a second pooling layer, a third convolution layer, and a fully connected layer, cascaded in sequence.
According to an embodiment of the invention, the nonlinear activation layer may employ a ReLU activation function.
According to an embodiment of the present invention, the first convolution layer may receive a 32×32×1 image, which is equivalent to a feature vector of length 1024.
According to the embodiment of the invention, the input layer of the target tracking model is a convolution structure, the output layer is a fully connected structure, and the intermediate hidden layers mix convolution, nonlinear activation and pooling structures. Different network structures have different practical trade-offs: a convolution layer has a small number of kernel parameters but a relatively large computation load, while the fully connected layer stores the weight information of all its neurons, so its parameter count is clearly larger than that of a convolution layer although its computation load is lower. Mixing different network structures in the target tracking model balances its computation load against its parameter count, reducing hardware cost.
According to an embodiment of the present invention, the target tracking model processes the feature information contained in its input image using convolution layers; the hidden layers use a mixture of nonlinear activation, pooling and convolution layers to extract abstract features from the feature map; and the fully connected layer outputs the classification result through two output channels, corresponding to the confidences of classifying the candidate image as target and as background, respectively.
According to the embodiment of the invention, since the feature lengths of the first multi-dimensional feature data and the second multi-dimensional feature data do not match the size of the input layer of the target tracking model, feature shaping processing can be performed on them so that the resulting feature length matches the input layer size.
According to an embodiment of the present invention, parameter information of the first convolution layer, the first pooling layer, the nonlinear activation layer, the second convolution layer, the second pooling layer, the third convolution layer, and the full connection layer is shown in table 1.
TABLE 1

Type                         Step size   Filter size   Output channels
First convolution layer      1           5×5           4
First pooling layer          2           2×2           4
Nonlinear activation layer   1           --            4
Second convolution layer     1           5×5           16
Second pooling layer         1           --            16
Third convolution layer      1           5×5           64
Fully connected layer        --          --            2
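The following PyTorch sketch instantiates a classifier with the strides, filter sizes and channel counts of Table 1. The absence of padding and the 2×2 kernel of the second pooling layer (left blank in the table) are assumptions chosen so that the shapes work out for a 32×32×1 input; this is an illustration, not the patent's reference implementation.

```python
import torch
import torch.nn as nn

class TrackerNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=5, stride=1),    # 32x32x1 -> 28x28x4
            nn.MaxPool2d(kernel_size=2, stride=2),       # -> 14x14x4
            nn.ReLU(),                                   # nonlinear activation layer
            nn.Conv2d(4, 16, kernel_size=5, stride=1),   # -> 10x10x16
            nn.MaxPool2d(kernel_size=2, stride=1),       # -> 9x9x16
            nn.Conv2d(16, 64, kernel_size=5, stride=1),  # -> 5x5x64
        )
        self.classifier = nn.Linear(5 * 5 * 64, 2)       # target / background channels

    def forward(self, x):                                # x: (N, 1, 32, 32)
        x = self.features(x)
        return self.classifier(x.flatten(1))             # two confidences per input

model = TrackerNet()
print(model(torch.randn(1, 1, 32, 32)).shape)            # torch.Size([1, 2])
```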
Fig. 6 schematically illustrates a flowchart of performing feature shaping processing on the first multi-dimensional feature data and the second multi-dimensional feature data to obtain a shaped feature map according to an embodiment of the present invention.
According to an embodiment of the present invention, referring to fig. 6 and 5, performing feature shaping processing on the first multi-dimensional feature data and the second multi-dimensional feature data to obtain a shaped feature map includes operations S601 to S604.
In operation S601, the first multi-dimensional feature data and the second multi-dimensional feature data are combined to obtain combined feature data.
In operation S602, the combined feature data is up-sampled to obtain up-sampled feature data.
According to an embodiment of the invention, the feature length of the upsampled feature data may be greater than 1024.
In operation S603, the up-sampled feature data is subjected to adaptive pooling processing, so as to obtain pooled feature data with a preset dimension.
According to an embodiment of the invention, the feature length of the pooled feature data may be 1024.
In operation S604, the pooled feature data is rearranged to obtain a shaping feature map.
According to the embodiment of the invention, the rearrangement processing truncates and rearranges the pooled feature data, yielding a shaping feature map with a resolution of 32×32.
According to the embodiment of the invention, since the input layer of the target tracking model receives a 32×32×1 image, the dimensions of the first multi-dimensional feature data and the second multi-dimensional feature data after feature shaping match the input layer of the target tracking model.
According to the embodiment of the invention, after the first multi-dimensional feature data and the second multi-dimensional feature data are obtained, they are concatenated into combined feature data that contains both the target image features and the candidate image features; the combined feature data is scaled to a fixed length by upsampling and adaptive pooling; and finally the scaled feature vector is truncated and rearranged to obtain the shaping feature map, which serves as the input of the target tracking model.
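A minimal sketch of operations S601 to S604 under the same PyTorch assumption: the two feature vectors are concatenated, upsampled (the factor of 2 is an arbitrary choice; the description fixes no factor), adaptively pooled to the preset length of 1024, and rearranged into the 32×32 shaping feature map. The resulting map can be fed directly to the TrackerNet sketch above.

```python
import torch
import torch.nn.functional as F

def shape_features(target_feat, candidate_feat):
    """target_feat, candidate_feat: 1-D feature vectors of arbitrary length."""
    combined = torch.cat([target_feat, candidate_feat], dim=-1)  # S601: combine
    x = combined.view(1, 1, -1)
    x = F.interpolate(x, scale_factor=2, mode="linear")          # S602: upsample
    x = F.adaptive_avg_pool1d(x, output_size=1024)               # S603: fixed length
    return x.view(1, 1, 32, 32)                                  # S604: rearrange

fmap = shape_features(torch.randn(700), torch.randn(700))
print(fmap.shape)  # torch.Size([1, 1, 32, 32])
```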
According to the embodiment of the invention, the target image is normalized to a preset resolution to obtain a first feature map to be extracted, so that features of the target image can be extracted from different dimensions to generate the first multi-dimensional feature data.
According to the embodiment of the invention, the resolution of the target image may be normalized to 32×32, so that the feature length of the first multi-dimensional feature data extracted from the target image is fixed, which facilitates the feature shaping processing.
Likewise, the candidate image is normalized to the preset resolution to obtain a second feature map to be extracted, so that features of the candidate image can be extracted from different dimensions to generate the second multi-dimensional feature data.
According to the embodiment of the invention, the resolution of the candidate image may be normalized to 32×32, so that the feature length of the second multi-dimensional feature data extracted from the candidate image is fixed, which facilitates the feature shaping processing.
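A one-line sketch of this normalization step, assuming OpenCV; the interpolation mode is an arbitrary choice.

```python
import cv2

def normalize_patch(patch, size=(32, 32)):
    # Resize target or candidate patches to the preset 32x32 resolution
    # so that downstream feature lengths stay fixed.
    return cv2.resize(patch, size, interpolation=cv2.INTER_LINEAR)
```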
The embodiment of the invention also provides a target tracking device based on the multi-dimensional characteristics.
Fig. 7 schematically illustrates a block diagram of a target tracking device based on multi-dimensional features according to an embodiment of the present invention.
Referring to fig. 7, a multi-dimensional feature-based object tracking device 700 provided by an embodiment of the present invention includes a determining module 701, a first extracting module 702, a second extracting module 703, a feature shaping module 704, and an object tracking module 705.
The determining module 701 is configured to determine a candidate image from images to be tracked based on a position of a target object in a target tracking image, where the target tracking image is an associated image within a preset period of time of the images to be tracked.
A first extraction module 702 is configured to extract features of the target image from different dimensions, and generate first multi-dimensional feature data.
A second extraction module 703 is configured to extract features of the candidate image from different dimensions, and generate second multi-dimensional feature data.
The feature shaping module 704 is configured to perform feature shaping processing on the first multi-dimensional feature data and the second multi-dimensional feature data to obtain a shaping feature map.
The target tracking module 705 is configured to input the shaping feature map into a target tracking model and output a target object confidence, where the target object confidence characterizes the confidence that the candidate image contains the target object.
It should be noted that the multi-dimensional-feature-based target tracking device in the embodiment of the present invention corresponds to the multi-dimensional-feature-based target tracking method described above; for details of the device, refer to the description of the method, which is not repeated here.
The foregoing embodiments are intended to describe the objects, technical solutions and advantages of the invention in further detail; they are illustrative only and do not limit the invention. Any modifications, equivalent substitutions or improvements made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (8)

1. A target tracking method based on multi-dimensional features, comprising:
determining a candidate image from an image to be tracked based on the position of a target object in a target tracking image, wherein the target tracking image is an associated image in a preset time period of the image to be tracked;
extracting features of a target image from different dimensions, and generating first multi-dimensional feature data, wherein the target image comprises the target object, and the first multi-dimensional feature data comprises at least one of the following: the first scale invariant feature data, the first rotation invariant feature data and the first illumination invariant feature data;
extracting features of the candidate image from different dimensions, generating second multi-dimensional feature data, wherein the second multi-dimensional feature data comprises at least one of: the second scale invariant feature data, the second rotation invariant feature data and the second illumination invariant feature data;
performing feature shaping processing on the first multi-dimensional feature data and the second multi-dimensional feature data to obtain a shaping feature map;
wherein performing the feature shaping processing on the first multi-dimensional feature data and the second multi-dimensional feature data to obtain the shaping feature map comprises the following steps:
combining the first multi-dimensional feature data and the second multi-dimensional feature data to obtain combined feature data;
performing up-sampling processing on the combined feature data to obtain up-sampled feature data;
performing adaptive pooling processing on the up-sampled feature data to obtain pooled feature data with a preset dimension;
rearranging the pooled feature data to obtain the shaping feature map; and
inputting the shaping feature map into a target tracking model, and outputting a target object confidence, wherein the target object confidence represents the confidence that the candidate image contains the target object;
constructing a template pool, wherein the template pool comprises a plurality of target images;
obtaining a plurality of target object confidence degrees, wherein each target object confidence degree is a confidence degree obtained by performing feature shaping processing on second multi-dimensional feature data of one candidate image and first multi-dimensional feature data of one target image, and inputting the obtained shaping feature image into the target tracking model;
determining a highest target object confidence level from the plurality of target object confidence levels;
and under the condition that the highest target object confidence is determined to be greater than a first preset threshold, updating the target image in the template pool by using the candidate image corresponding to the highest target object confidence.
2. The method of claim 1, wherein,
performing Harris corner detection, scale-invariant feature transform and local binary pattern processing on the target image to obtain the first scale invariant feature data, the first rotation invariant feature data and the first illumination invariant feature data, respectively;
and performing Harris corner detection, scale-invariant feature transform and local binary pattern processing on the candidate image to obtain the second scale invariant feature data, the second rotation invariant feature data and the second illumination invariant feature data, respectively.
3. The method of claim 1, wherein the determining a candidate image from the image to be tracked based on the location of the target object in the target tracking image comprises:
acquiring the image to be tracked;
determining a search area of the image to be tracked based on the position of the target object in the target tracking image, wherein the range of the search area is larger than the range of the target object in the target tracking image;
and determining the candidate image from the search area in the image to be tracked by utilizing a sliding window method.
4. A method according to claim 3, wherein said determining said candidate image from a search area in said image to be tracked using a sliding window method comprises: and determining a plurality of candidate images from a search area in the image to be tracked by utilizing a sliding window method.
5. The method of claim 4, further comprising:
sorting the plurality of target images in the template pool according to the update time;
the method further comprises the steps of:
judging whether the number of target images in the template pool is larger than a second preset threshold value or not;
if the number of target images in the template pool is greater than the second preset threshold,
updating the target image in the template pool with the candidate image corresponding to the highest target object confidence based on the update time; and
if the number of target images in the template pool is smaller than the second preset threshold,
adding the candidate image corresponding to the highest target object confidence to the template pool.
6. The method of claim 1, wherein the object tracking model comprises a first convolution layer, a first pooling layer, a nonlinear activation layer, a second convolution layer, a second pooling layer, a third convolution layer, and a fully connected layer that are cascaded in sequence.
7. The method of claim 1, further comprising:
normalizing the target image to a preset resolution to obtain a first feature image to be extracted so as to extract the features of the target image from different dimensions and generate the first multi-dimensional feature data;
and normalizing the candidate images to a preset resolution to obtain a second feature map to be extracted so as to extract the features of the candidate images from different dimensions and generate the second multi-dimensional feature data.
8. A multi-dimensional feature-based object tracking device, comprising:
the determining module is used for determining candidate images from images to be tracked based on the position of a target object in target tracking images, wherein the target tracking images are associated images in a preset time period of the images to be tracked;
the first extraction module is used for extracting features of the target image from different dimensions and generating first multi-dimensional feature data, wherein the first multi-dimensional feature data comprises at least one of the following: the first scale invariant feature data, the first rotation invariant feature data and the first illumination invariant feature data;
a second extraction module for extracting features of the candidate image from different dimensions, generating second multi-dimensional feature data, wherein the second multi-dimensional feature data comprises at least one of: the second scale invariant feature data, the second rotation invariant feature data and the second illumination invariant feature data;
the feature shaping module is used for performing feature shaping processing on the first multi-dimensional feature data and the second multi-dimensional feature data to obtain a shaping feature map;
wherein performing the feature shaping processing on the first multi-dimensional feature data and the second multi-dimensional feature data to obtain the shaping feature map comprises the following steps:
combining the first multi-dimensional feature data and the second multi-dimensional feature data to obtain combined feature data;
performing up-sampling processing on the combined feature data to obtain up-sampled feature data;
performing adaptive pooling processing on the up-sampled feature data to obtain pooled feature data with a preset dimension;
rearranging the pooled feature data to obtain the shaping feature map; and
the target tracking module is used for inputting the shaping feature map into a target tracking model and outputting a target object confidence, wherein the target object confidence represents the confidence that the candidate image contains the target object;
the target tracking apparatus is further configured to perform the steps of:
constructing a template pool, wherein the template pool comprises a plurality of target images;
obtaining a plurality of target object confidence degrees, wherein each target object confidence degree is a confidence degree obtained by performing feature shaping processing on second multi-dimensional feature data of one candidate image and first multi-dimensional feature data of one target image, and inputting the obtained shaping feature image into the target tracking model;
determining a highest target object confidence level from the plurality of target object confidence levels;
and under the condition that the highest target object confidence is determined to be greater than a first preset threshold, updating the target image in the template pool by using the candidate image corresponding to the highest target object confidence.
CN202110450929.4A 2021-04-25 2021-04-25 Target tracking method and device based on multidimensional features Active CN113221676B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110450929.4A CN113221676B (en) 2021-04-25 2021-04-25 Target tracking method and device based on multidimensional features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110450929.4A CN113221676B (en) 2021-04-25 2021-04-25 Target tracking method and device based on multidimensional features

Publications (2)

Publication Number Publication Date
CN113221676A CN113221676A (en) 2021-08-06
CN113221676B true CN113221676B (en) 2023-10-13

Family

ID=77089046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110450929.4A Active CN113221676B (en) 2021-04-25 2021-04-25 Target tracking method and device based on multidimensional features

Country Status (1)

Country Link
CN (1) CN113221676B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9965719B2 (en) * 2015-11-04 2018-05-08 Nec Corporation Subcategory-aware convolutional neural networks for object detection
US11068741B2 (en) * 2017-12-28 2021-07-20 Qualcomm Incorporated Multi-resolution feature description for object recognition

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108122011A (en) * 2017-12-26 2018-06-05 中国科学院半导体研究所 Method for tracking target and system based on the mixing of a variety of consistency
CN108520529A (en) * 2018-03-30 2018-09-11 上海交通大学 Visible light based on convolutional neural networks and infrared video method for tracking target
CN111428539A (en) * 2019-01-09 2020-07-17 成都通甲优博科技有限责任公司 Target tracking method and device
CN110689044A (en) * 2019-08-22 2020-01-14 湖南四灵电子科技有限公司 Target detection method and system combining relationship between targets
CN110555481A (en) * 2019-09-06 2019-12-10 腾讯科技(深圳)有限公司 Portrait style identification method and device and computer readable storage medium
CN110660082A (en) * 2019-09-25 2020-01-07 西南交通大学 Target tracking method based on graph convolution and trajectory convolution network learning
CN112560695A (en) * 2020-12-17 2021-03-26 中国海洋大学 Underwater target tracking method, system, storage medium, equipment, terminal and application

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Kernel correlation target tracking algorithm based on candidate region detection; Hao Shaohua; Xie Zhengguang; Wang Xiaojing; Video Engineering (Issue 07); full text *
Infrared dim and small target detection and tracking algorithm based on complex fused features and gray-texture histogram descriptors; Wen Kai; Science Technology and Engineering (Issue 34); full text *

Also Published As

Publication number Publication date
CN113221676A (en) 2021-08-06

Similar Documents

Publication Publication Date Title
WO2021077984A1 (en) Object recognition method and apparatus, electronic device, and readable storage medium
WO2021022521A1 (en) Method for processing data, and method and device for training neural network model
Ji et al. 3D convolutional neural networks for human action recognition
CN107358262B (en) High-resolution image classification method and classification device
Pavani et al. Haar-like features with optimally weighted rectangles for rapid object detection
CN103605972B (en) Non-restricted environment face verification method based on block depth neural network
CN111738143B (en) Pedestrian re-identification method based on expectation maximization
CN111652236A (en) Lightweight fine-grained image identification method for cross-layer feature interaction in weak supervision scene
CN110598638A (en) Model training method, face gender prediction method, device and storage medium
CN110826462A (en) Human body behavior identification method of non-local double-current convolutional neural network model
CN112164002A (en) Training method and device for face correction model, electronic equipment and storage medium
CN111104924B (en) Processing algorithm for identifying low-resolution commodity image
Lu et al. Automatic lip reading using convolution neural network and bidirectional long short-term memory
CN112183602A (en) Multi-layer feature fusion fine-grained image classification method with parallel rolling blocks
CN113743443B (en) Image evidence classification and recognition method and device
Gao et al. Adaptive random down-sampling data augmentation and area attention pooling for low resolution face recognition
Guo et al. Content-aware convolutional neural networks
Wang et al. Fusion network for face-based age estimation
CN113221676B (en) Target tracking method and device based on multidimensional features
CN113344110B (en) Fuzzy image classification method based on super-resolution reconstruction
Li et al. A spectral clustering based filter-level pruning method for convolutional neural networks
CN113962329A (en) Novel image recognition algorithm based on deep ensemble learning
Fan et al. Facial expression recognition based on multiple feature fusion in video
Ebrahimpour et al. Low resolution face recognition using combination of diverse classifiers
CN113688715A (en) Facial expression recognition method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant