CN114494972A

CN114494972A - Target tracking method and system combining channel selection and position optimization

Info

Publication number: CN114494972A
Application number: CN202210126865.7A
Authority: CN
Inventors: 吴捷; 王宗元; 马小虎
Original assignee: Taizhou Polytechnic College
Current assignee: Taizhou Polytechnic College
Priority date: 2022-02-11
Filing date: 2022-02-11
Publication date: 2022-05-13

Abstract

The invention discloses a target tracking method and system combining channel selection and position optimization, relating to the field of target tracking. The method comprises the following steps: acquiring a video sequence to be tracked; performing feature extraction on the video sequence to be tracked by using a CNN network model to obtain convolution features, the convolution features comprising a third convolution feature, a fourth convolution feature and a fifth convolution feature; performing gradient calculation, channel selection and cross-correlation in sequence according to the convolution features to obtain a first target response map; determining a target primary positioning position and a peak side lobe ratio according to the first target response map; and performing position optimization according to the target primary positioning position and the peak side lobe ratio to obtain a target final positioning position. The invention balances tracking accuracy and speed, and addresses tracking drift in scenes such as large target deformation or low resolution.

Description

Target tracking method and system combining channel selection and position optimization

Technical Field

The invention relates to the field of target tracking, in particular to a target tracking method and a target tracking system combining channel selection and position optimization.

Background

Target tracking is a research hotspot in computer vision and is widely applied in human-computer interaction, medical imaging, traffic monitoring and other fields. The basic flow of target tracking is to mark a target bounding box in the first frame of the video sequence to be tracked and then accurately locate the target in subsequent frames. Because the target may deform and rotate while moving, and is affected by factors such as illumination changes in natural environments, target tracking still has a number of problems that urgently need to be solved.

In recent years, rapidly emerging deep learning methods have been widely applied to target tracking. Trackers such as DeepSRDCF, CF2, ECO, STRCF and MCCT combine a Convolutional Neural Network (CNN) with the DCF framework to achieve higher tracking accuracy, but using deep features markedly reduces the running speed of the algorithm and limits its range of application. The fully convolutional Siamese network (SiamNet), whose accuracy exceeds that of the DCF methods and which supports end-to-end training, has become the most active research direction in target tracking.

The existing SiamFC algorithm was the first to introduce a Siamese network into video target tracking. It converts the video target tracking problem into an image matching problem and tracks the target by selecting the candidate image most similar to the template image. SiamFC achieves end-to-end training and faster-than-real-time speed, but is somewhat lacking in tracking accuracy, so a number of improved algorithms based on SiamNet have appeared. Although this series of SiamNet-based tracking algorithms achieves good tracking results, the following problems remain. First, the target in visual tracking can be arbitrary, while a CNN model pre-trained on generic images is agnostic to the target object of interest, which makes the pre-trained features less effective. Second, in terms of tracking speed, the depth features of the pre-trained model are high-dimensional, so a tracker that uses a large number of depth features carries a heavy computational load. To improve tracking speed, it is therefore very important for visual tracking to extract depth features that are highly relevant to the target.

The existing TADT algorithm, built on the SiamNet framework, embeds a target perception module behind the pre-trained network and selects the feature channels most sensitive to target positioning and scale change by calculating a regression loss and a ranking loss. It achieves very good results, and its accuracy is the best among a series of real-time tracking algorithms. However, during feature extraction the TADT algorithm uses only the Conv4-3 depth features of the VGG16 convolutional neural network to locate the target, so tracking drift is likely to occur in scenes such as large target deformation or low resolution.

Disclosure of Invention

The invention aims to provide a target tracking method and a target tracking system combining channel selection and position optimization, which have the advantages of considering both tracking precision and speed and solving the problem of tracking drift in scenes of large target deformation or low resolution and the like.

In order to achieve the purpose, the invention provides the following scheme:

a target tracking method combining channel selection and position optimization comprises the following steps:

acquiring a video sequence to be tracked;

performing feature extraction by using a CNN network model according to the video sequence to be tracked to obtain convolution features; the convolution features comprise a third convolution feature, a fourth convolution feature and a fifth convolution feature;

carrying out gradient calculation, channel selection and cross-correlation operation in sequence according to the convolution characteristics to obtain a first target response graph;

determining a target primary positioning position and a peak value side lobe ratio according to the first target response image;

and performing position optimization according to the target primary positioning position and the peak side lobe ratio to obtain a target final positioning position.

Optionally, the determining a target primary positioning position and a peak-to-side lobe ratio according to the first target response map specifically includes:

determining a target primary positioning position according to the position corresponding to the maximum confidence degree value in the first target response image;

determining a peak-to-side lobe ratio according to a maximum value of the first target response map, a mean value of the first target response map, and a variance of the first target response map.

Optionally, the performing position optimization according to the target primary positioning position and the peak-to-side lobe ratio to obtain a target final positioning position specifically includes:

determining a ratio according to the peak side lobe ratio and the peak side lobe ratio mean value;

determining the reliability of the tracking result according to the ratio, the first set ratio and the second set ratio; the reliability of the tracking result comprises high reliability, to-be-optimized reliability and low reliability;

when the reliability of the tracking result is high, determining the target primary positioning position as a target final positioning position;

when the reliability of the tracking result is to be optimized, tracking the target according to the third convolution characteristic to determine the final positioning position of the target;

and when the reliability of the tracking result is low, performing target tracking according to the fifth convolution feature to determine the target final positioning position.

Optionally, when the reliability of the tracking result is to be optimized, performing target tracking according to the third convolution feature to determine a final target location position, specifically including:

performing gradient calculation, channel selection and cross-correlation operation in sequence according to the third convolution characteristic to obtain a second target response graph;

judging whether the maximum value of the second target response map is smaller than a set multiple of the maximum value of the first target response map; if not, taking the second target response map as the first target response map and returning to the steps of "determining a target primary positioning position and a peak side lobe ratio according to the first target response map" and "performing position optimization according to the target primary positioning position and the peak side lobe ratio to obtain a target final positioning position".

Optionally, when the reliability of the tracking result is low, performing target tracking according to the fifth convolution feature to determine the target final positioning position specifically includes:

performing gradient calculation, channel selection and cross-correlation operation in sequence according to the fifth convolution characteristic to obtain a third target response graph;

judging whether the maximum value of the third target response map is smaller than a set multiple of the maximum value of the first target response map; if not, taking the third target response map as the first target response map and returning to the steps of "determining a target primary positioning position and a peak side lobe ratio according to the first target response map" and "performing position optimization according to the target primary positioning position and the peak side lobe ratio to obtain a target final positioning position".

A target tracking system incorporating channel selection and location optimization, comprising:

the acquisition module is used for acquiring a video sequence to be tracked;

the characteristic extraction module is used for extracting characteristics by utilizing a CNN network model according to the video sequence to be tracked to obtain convolution characteristics; the convolution features comprise a third convolution feature, a fourth convolution feature and a fifth convolution feature;

the first target response graph determining module is used for sequentially carrying out gradient calculation, channel selection and cross-correlation operation according to the convolution characteristics to obtain a first target response graph;

a target primary positioning position and peak sidelobe ratio determining module, configured to determine a target primary positioning position and a peak sidelobe ratio according to the first target response map;

and the position optimization module is used for carrying out position optimization according to the target primary positioning position and the peak side lobe ratio to obtain a target final positioning position.

Optionally, the module for determining the target primary positioning position and the peak-to-side lobe ratio specifically includes:

the target primary positioning position determining unit is used for determining a target primary positioning position according to the position corresponding to the maximum confidence degree value in the first target response image;

and the peak-to-side lobe ratio determining unit is used for determining the peak-to-side lobe ratio according to the maximum value of the first target response diagram, the mean value of the first target response diagram and the variance of the first target response diagram.

Optionally, the location optimization module specifically includes:

a ratio determining unit, configured to determine a ratio according to the peak side lobe ratio and the peak side lobe ratio mean;

the tracking result reliability determining unit is used for determining the reliability of the tracking result according to the ratio, the first set ratio and the second set ratio; the reliability of the tracking result comprises high reliability, to-be-optimized reliability and low reliability;

the first target final positioning position determining unit is used for determining the target primary positioning position as a target final positioning position when the reliability of the tracking result is high;

a second target final positioning position determining unit, configured to perform target tracking according to the third convolution feature to determine a target final positioning position when the reliability of the tracking result is to be optimized;

and the third target final positioning position determining unit is used for performing target tracking according to the fifth convolution feature to determine the target final positioning position when the reliability of the tracking result is low.

Optionally, the determining unit for the final location position of the second target specifically includes:

the second target response graph determining subunit is used for sequentially performing gradient calculation, channel selection and cross-correlation operation according to the third convolution characteristics to obtain a second target response graph;

a first judging subunit, configured to judge whether the maximum value of the second target response map is smaller than a set multiple of the maximum value of the first target response map; if not, to take the second target response map as the first target response map and return to the steps of "determining a target primary positioning position and a peak side lobe ratio according to the first target response map" and "performing position optimization according to the target primary positioning position and the peak side lobe ratio to obtain a target final positioning position".

Optionally, the third target final positioning location determining unit specifically includes:

a third target response graph determining subunit, configured to perform gradient calculation, channel selection, and cross-correlation operation in sequence according to the fifth convolution feature, to obtain a third target response graph;

a second judging subunit, configured to judge whether the maximum value of the third target response map is smaller than a set multiple of the maximum value of the first target response map; if not, to take the third target response map as the first target response map and return to the steps of "determining a target primary positioning position and a peak side lobe ratio according to the first target response map" and "performing position optimization according to the target primary positioning position and the peak side lobe ratio to obtain a target final positioning position".

According to the specific embodiment provided by the invention, the invention discloses the following technical effects:

The target tracking method and system combining channel selection and position optimization provided by the invention acquire a video sequence to be tracked; perform feature extraction on the video sequence to be tracked by using a CNN network model to obtain convolution features; perform gradient calculation, channel selection and cross-correlation in sequence according to the convolution features to obtain a first target response map; determine a target primary positioning position and a peak side lobe ratio according to the first target response map; and perform position optimization according to the target primary positioning position and the peak side lobe ratio to obtain a target final positioning position. The method uses convolution features from several levels and uses the peak side lobe ratio to decide how the target primary positioning position is processed, which balances tracking accuracy and speed and addresses the tendency of algorithms such as TADT to drift in complex scenes such as target deformation and low resolution.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a flow chart of a target tracking method incorporating channel selection and location optimization according to the present invention;

FIG. 2 is a schematic view of the VGG-16 structure provided by the present invention;

FIG. 3 is a Siamese framework diagram incorporating target perception;

FIG. 4 is a schematic flow chart of a target tracking method combining channel selection and position optimization according to the present invention;

FIG. 5 shows the success rate curves and distance precision curves on OTB-100 for the present invention and 9 advanced target tracking methods.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.

As shown in fig. 1, the target tracking method combining channel selection and position optimization provided by the present invention includes:

step 101: a video sequence to be tracked is acquired.

Step 102: performing feature extraction by using a CNN network model according to the video sequence to be tracked to obtain convolution features; the convolution features include a third convolution feature, a fourth convolution feature, and a fifth convolution feature, which are extracted by the third, fourth and fifth convolution modules respectively. The CNN network model used by the invention is a VGG-16 network in which only one convolution layer is retained in the fifth convolution module. That is, the network comprises a first convolution module, a second convolution module, a third convolution module, a fourth convolution module, a fifth convolution module and three fully connected layers, connected in sequence. The first and second convolution modules each comprise two convolution layers and one pooling layer, the third and fourth convolution modules each comprise three convolution layers and one pooling layer, and the fifth convolution module comprises one convolution layer and one pooling layer.

Step 103: as shown in fig. 3, gradient calculation, channel selection and cross-correlation are performed in sequence according to the convolution features to obtain a first target response map. The gradient calculation module receives the extracted target features and calculates gradient information for all feature channels. The channel selection module selects the feature channels with positive gradient information as the feature representation of the current target, ranks the feature channels by importance, and uses the resulting importance scores as the per-channel weight coefficients of the subsequent cross-correlation. In practical applications, the first target response map is obtained from the fourth convolution feature.
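The channel selection described in step 103 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the gradient of the loss with respect to the feature map is already available as an array `grads` of shape (C, H, W), and it assumes that a channel's importance is the spatial average of its gradient (global average pooling), which the text does not specify. The function name `select_channels` and the `top_k` parameter are hypothetical.

```python
import numpy as np

def select_channels(grads, top_k=None):
    """Rank feature channels by gradient-based importance.

    grads: array of shape (C, H, W), gradient of a loss w.r.t. a feature map.
    Returns (indices, weights): channels with positive importance, sorted from
    most to least important, and their importance scores.
    """
    # Assumption: importance = spatial mean of the gradient for each channel.
    importance = grads.mean(axis=(1, 2))
    # Keep only channels whose importance is positive.
    positive = np.flatnonzero(importance > 0)
    # Sort the surviving channels by descending importance.
    order = positive[np.argsort(-importance[positive], kind="stable")]
    if top_k is not None:
        order = order[:top_k]
    # The scores double as per-channel cross-correlation weights.
    return order, importance[order]

# Toy gradients for four channels with constant values per channel.
grads = np.zeros((4, 2, 2))
grads[0] += 0.5
grads[1] -= 1.0   # negative importance, discarded
grads[2] += 2.0
grads[3] += 0.1
indices, weights = select_channels(grads, top_k=2)
```

Here channel 1 is dropped because its importance is negative, and `top_k=2` keeps the two strongest remaining channels.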

Step 104: and determining a target primary positioning position and a peak value side lobe ratio according to the first target response graph. Step 104, specifically comprising:

and determining the primary positioning position of the target according to the position corresponding to the maximum confidence degree value in the first target response image.

Step 105: and performing position optimization according to the target primary positioning position and the peak side lobe ratio to obtain a target final positioning position.

Step 105 specifically comprises:

and determining a ratio according to the peak side lobe ratio and the peak side lobe ratio mean value.

Determining the reliability of the tracking result according to the ratio, the first set ratio and the second set ratio; the reliability of the tracking result is one of high reliability, to-be-optimized reliability and low reliability. The first set ratio is greater than the second set ratio. When the ratio is greater than the first set ratio, the reliability of the tracking result is high; when the ratio is smaller than the second set ratio, the reliability is low; and when the ratio lies between the first set ratio and the second set ratio, the tracking result is to be optimized.

And when the reliability of the tracking result is high, determining the target primary positioning position as a target final positioning position.

And when the reliability of the tracking result is to be optimized, tracking the target according to the third convolution characteristic to determine the final positioning position of the target. In practical application, when the reliability of the tracking result is to be optimized, performing target tracking according to the third convolution characteristic to determine a final target positioning position specifically includes:

performing gradient calculation, channel selection and cross-correlation in sequence according to the third convolution feature to obtain a second target response map; judging whether the maximum value of the second target response map is smaller than a set multiple of the maximum value of the first target response map; if so, the second target response map is considered not credible and no position optimization is performed; if not, taking the second target response map as the first target response map and returning to the steps of "determining a target primary positioning position and a peak side lobe ratio according to the first target response map" and "performing position optimization according to the target primary positioning position and the peak side lobe ratio to obtain a target final positioning position".

When the reliability of the tracking result is low, target tracking is performed according to the fifth convolution feature to determine the target final positioning position. In practical application, this specifically includes:

performing gradient calculation, channel selection and cross-correlation in sequence according to the fifth convolution feature to obtain a third target response map; judging whether the maximum value of the third target response map is smaller than a set multiple of the maximum value of the first target response map; if so, the third target response map is considered not credible and no position optimization is performed; if not, taking the third target response map as the first target response map and returning to the steps of "determining a target primary positioning position and a peak side lobe ratio according to the first target response map" and "performing position optimization according to the target primary positioning position and the peak side lobe ratio to obtain a target final positioning position".

The invention also provides a target tracking system combining channel selection and position optimization, which comprises:

and the acquisition module is used for acquiring the video sequence to be tracked.

The characteristic extraction module is used for extracting characteristics by utilizing a CNN network model according to the video sequence to be tracked to obtain convolution characteristics; the convolution features include a third convolution feature, a fourth convolution feature, and a fifth convolution feature.

And the first target response graph determining module is used for sequentially carrying out gradient calculation, channel selection and cross-correlation operation according to the convolution characteristics to obtain a first target response graph.

And the target primary positioning position and peak sidelobe ratio determining module is used for determining the target primary positioning position and the peak sidelobe ratio according to the first target response map.

In practical application, the module for determining the target primary positioning position and the peak-to-side lobe ratio specifically comprises:

and the target primary positioning position determining unit is used for determining a target primary positioning position according to the position corresponding to the maximum confidence degree value in the first target response image.

In practical application, the position optimization module specifically includes:

and the ratio determining unit is used for determining the ratio according to the peak side lobe ratio and the peak side lobe ratio mean value.

The tracking result reliability determining unit is used for determining the reliability of the tracking result according to the ratio, the first set ratio and the second set ratio; the reliability of the tracking result comprises high reliability, to-be-optimized reliability and low reliability.

And the first target final positioning position determining unit is used for determining the target primary positioning position as the target final positioning position when the reliability of the tracking result is high.

And the second target final positioning position determining unit is used for tracking the target according to the third convolution characteristics to determine the target final positioning position when the reliability of the tracking result is to be optimized.

In practical applications, the determining unit for the final positioning position of the second target specifically includes:

and the second target response graph determining subunit is used for sequentially performing gradient calculation, channel selection and cross-correlation operation according to the third convolution characteristic to obtain a second target response graph.

In practical applications, the third target final positioning location determining unit specifically includes:

and the third target response graph determining subunit is used for sequentially performing gradient calculation, channel selection and cross-correlation operation according to the fifth convolution characteristic to obtain a third target response graph.

As shown in fig. 4, the present invention further provides a specific working method of the target tracking method combining channel selection and position optimization in practical application, and the steps are as follows:

step 1, initializing a CNN network model: in the aspect of CNN network model selection, a VGG16 deep network model developed by oxford university as shown in fig. 2 was used. The network layers containing parameters in the VGG16 network model have 16 layers in total, namely 13 convolutional layers and 3 fully-connected layers, wherein the convolutional layers are divided into 5 parts, namely Conv 1-Conv 5. The feature extraction of the input image can be realized by using the convolution layer. The process of feature extraction using the VGG16 network model is as follows:

Compared with the TADT algorithm, the method of the present invention retains the Conv5-1 convolutional layer to obtain deeper features for subsequent target relocation. The deeper Conv5-2 and Conv5-3 convolutional layers are not retained, mainly because using deeper features would significantly increase the processing time of the algorithm.
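The truncation described above can be written down as plain data. The sketch below lists the 13 convolutional layers of the standard VGG16 layout and cuts the list after Conv5-1; the layer names with underscores and the helper `truncated_layers` are hypothetical naming choices for illustration, not identifiers from the patent or from any particular deep learning library.

```python
# Convolutional layers of the standard VGG16 layout, grouped by block.
VGG16_CONV_LAYERS = {
    "conv1": ["conv1_1", "conv1_2"],
    "conv2": ["conv2_1", "conv2_2"],
    "conv3": ["conv3_1", "conv3_2", "conv3_3"],
    "conv4": ["conv4_1", "conv4_2", "conv4_3"],
    "conv5": ["conv5_1", "conv5_2", "conv5_3"],
}

# Layers whose outputs the method uses for localization and relocation.
TAPPED_LAYERS = ("conv3_3", "conv4_3", "conv5_1")

def truncated_layers(last_layer="conv5_1"):
    """Return the convolutional layers kept when the network ends at last_layer."""
    kept = []
    for block in VGG16_CONV_LAYERS.values():
        for name in block:
            kept.append(name)
            if name == last_layer:
                return kept
    raise ValueError(f"unknown layer: {last_layer}")

kept = truncated_layers()
```

With the default cut-off, `kept` contains 11 of the 13 convolutional layers, and Conv5-2 and Conv5-3 are never evaluated.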

Step 2, primary positioning of the target: the target perception module shown in fig. 3, proposed in combination with the TADT algorithm under the Siamese framework, is used for the primary positioning of the target. The main process is as follows: (1) extracting the Conv4-3 layer features of the target to be tracked (the TADT algorithm has shown experimentally that Conv4-3 layer features give the best target positioning); (2) calculating gradient information of all feature channels by using the regression loss and ranking loss functions of the TADT algorithm, and selecting the feature channels that are sensitive to target activity and scale change to take part in the subsequent cross-correlation; after channel selection, the dimensionality of the convolutional features is reduced and their effectiveness is improved; (3) obtaining a target response map through cross-correlation (cross-correlation is an important component of Siamese tracking algorithms; it is similar to the convolution operation in a network and computes the similarity between the template image and the candidate search-region image). Each value in the response map represents the confidence that the corresponding position is the actual target, and the position with the maximum confidence in the response map gives the actual position of the target object in the search area, namely the target primary positioning position.
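The cross-correlation and peak search of step 2 can be sketched in NumPy as below. This is an illustrative sketch under stated assumptions: the template and search features are already restricted to the selected channels, the per-channel weights come from the channel selection step, and the peak is reported in response-map coordinates without mapping back to image pixels. The function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def weighted_cross_correlation(template, search, weights):
    """Channel-weighted cross-correlation of a template over a search region.

    template: (C, h, w) features of the target template.
    search:   (C, H, W) features of the search region, with H >= h and W >= w.
    weights:  (C,) per-channel weights.
    Returns a response map of shape (H - h + 1, W - w + 1).
    """
    # windows has shape (C, H - h + 1, W - w + 1, h, w).
    windows = sliding_window_view(search, template.shape[1:], axis=(1, 2))
    # Inner product of each window with the template, weighted and summed over channels.
    return np.einsum("cijhw,chw,c->ij", windows, template, weights)

def locate_peak(response):
    """Return the (row, col) index of the maximum confidence in the response map."""
    row, col = np.unravel_index(np.argmax(response), response.shape)
    return int(row), int(col)

# Toy example: embed the template in an otherwise empty search region.
rng = np.random.default_rng(0)
template = rng.random((3, 2, 2))
search = np.zeros((3, 6, 6))
search[:, 3:5, 1:3] = template
weights = np.array([2.0, 0.5, 1.0])

response = weighted_cross_correlation(template, search, weights)
primary_position = locate_peak(response)
```

In the toy example the template is pasted at row 3, column 1 of the search region, so the response peaks at that offset.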

Step 3, judging the reliability of the primary positioning: the peak side lobe ratio is introduced to judge the reliability of the initial tracking and positioning result. It is defined as follows: let $f_t$ be the feature response map of the $t$-th frame of the video sequence; its peak side lobe ratio $P_t$ is

$$P_t = \frac{\max(f_t) - \mu_t}{\sigma_t}$$

In the formula, $\max$ takes the maximum value, and $\mu_t$ and $\sigma_t$ are the mean and variance of the target response map. When the tracking algorithm positions the target accurately, the peak side lobe ratio is large; otherwise it is small.

The reliability is judged as follows: the peak side lobe ratio $P_t$ of the $t$-th frame is obtained from the formula above, and its ratio to the mean of the peak side lobe ratios of the preceding $t-2$ frames is calculated:

$$\text{ratio} = \frac{P_t}{\operatorname{avg}(P_2, \ldots, P_{t-1})}$$
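The peak side lobe ratio and its ratio to earlier frames can be sketched in NumPy as below. Two assumptions are made here and should be checked against the original formula: the statistics are taken over the whole response map, and the denominator is the standard deviation of the map, as in the common definition of the peak-to-sidelobe ratio, although the text calls it the variance. The function names and the small constant `eps` are hypothetical.

```python
import numpy as np

def peak_sidelobe_ratio(response, eps=1e-12):
    """(max - mean) / std of a response map.

    Assumption: sigma is the standard deviation of the whole map.
    eps avoids division by zero on a constant map.
    """
    return (response.max() - response.mean()) / (response.std() + eps)

def psr_ratio(previous_psrs, current_psr):
    """Ratio of the current PSR to the mean PSR of frames 2 .. t-1."""
    return current_psr / np.mean(previous_psrs)

# A single sharp peak scores higher than a map with two competing peaks.
single_peak = np.zeros((5, 5))
single_peak[2, 2] = 1.0
two_peaks = np.zeros((5, 5))
two_peaks[2, 2] = 1.0
two_peaks[0, 0] = 1.0

psr_single = peak_sidelobe_ratio(single_peak)
psr_double = peak_sidelobe_ratio(two_peaks)
ratio = psr_ratio([4.0, 6.0], 4.0)
```

A second peak of equal height lowers the score, which matches the statement that an ambiguous localization yields a smaller peak side lobe ratio.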

If ratio ≥ 0.8, the tracking result is considered highly reliable; if 0.7 ≤ ratio < 0.8, the tracking result is considered to be optimized; and if 0.6 ≤ ratio < 0.7, the tracking result is considered to have low reliability.
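The three-way decision can be expressed as a small function. The thresholds 0.8, 0.7 and 0.6 are taken from the text; the label strings and the function name are hypothetical. The stated thresholds do not cover ratios below 0.6, so the sketch returns a separate label for that case as an explicit assumption rather than guessing the intended behavior.

```python
def classify_reliability(ratio, high=0.8, mid=0.7, low=0.6):
    """Map the PSR ratio to a reliability label using the stated thresholds."""
    if ratio >= high:
        return "high"
    if ratio >= mid:
        return "to_be_optimized"
    if ratio >= low:
        return "low"
    # Assumption: the text gives no rule for ratios below the lowest threshold.
    return "unspecified"

labels = [classify_reliability(r) for r in (0.85, 0.75, 0.65, 0.5)]
```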

Step 4, position optimization: if the tracking result is highly reliable, the target primary positioning position obtained in step 2 is used directly as the final result. If the tracking result is to be optimized, Conv3-3 layer depth features are extracted from the search area obtained by primary positioning (low-level features have higher resolution and can position the target accurately), channel selection is performed again with the target perception module of step 2, and cross-correlation is then performed to obtain a new target response map. A threshold α is set (1.15, determined through a large number of experiments): the primary positioning position is replaced by the newly located target position only when the maximum value of the new target response map reaches or exceeds α times the maximum value of the original target response map, which achieves accurate relocation, and the search area is then updated with the new position as reference. If the tracking result has low reliability, Conv5-1 layer depth features are extracted (high-level features contain more semantic information, can effectively localize the range of the target, handle larger target changes and prevent the tracker from drifting), the target is relocated in a way similar to the processing above, and the search area is updated with the new position as reference.
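The decision logic of step 4 can be sketched as follows. This is a simplified control-flow illustration, not the patented implementation: feature extraction, channel selection and cross-correlation are hidden behind a hypothetical callback `compute_response(layer_name)`, response maps from different layers are assumed to share one coordinate grid, and the search-area update is omitted. Only the value α = 1.15 and the layer choice per reliability level come from the text.

```python
import numpy as np

ALPHA = 1.15  # threshold stated in the text

# Which layer is used for relocation at each reliability level.
LAYER_FOR_RELIABILITY = {"to_be_optimized": "conv3_3", "low": "conv5_1"}

def peak_position(response):
    row, col = np.unravel_index(np.argmax(response), response.shape)
    return int(row), int(col)

def optimize_position(primary_response, reliability, compute_response, alpha=ALPHA):
    """Return (final_position, relocated) for one frame.

    compute_response: hypothetical callback that takes a layer name and
    returns the response map obtained from that layer's features.
    """
    primary_position = peak_position(primary_response)
    layer = LAYER_FOR_RELIABILITY.get(reliability)
    if layer is None:
        # Highly reliable result: keep the primary position.
        return primary_position, False
    new_response = compute_response(layer)
    # Accept the new position only if its peak reaches alpha times the original peak.
    if new_response.max() >= alpha * primary_response.max():
        return peak_position(new_response), True
    return primary_position, False

# Toy maps: the primary peak is 1.0; a strong candidate peaks at 1.2, a weak one at 1.1.
primary = np.zeros((4, 4))
primary[1, 1] = 1.0
strong = np.zeros((4, 4))
strong[2, 3] = 1.2
weak = np.zeros((4, 4))
weak[2, 3] = 1.1

accepted = optimize_position(primary, "to_be_optimized", lambda layer: strong)
rejected = optimize_position(primary, "low", lambda layer: weak)
unchanged = optimize_position(primary, "high", lambda layer: strong)
```

Passing the response computation as a callback keeps the acceptance test independent of how the features are produced, so the same function serves both the Conv3-3 and the Conv5-1 branch.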

Step 5, outputting the target position: the optimized target position information is output, completing the tracking of the current frame.

The invention provides a novel target tracking method that combines channel selection and position optimization under the TADT algorithm framework. The algorithm uses depth features from three levels, Conv3-3, Conv4-3 and Conv5-1, and uses the peak-to-sidelobe ratio to classify and process the primary tracking results, so that both tracking precision and speed are taken into account, and the problem that TADT and other algorithms are prone to tracking drift in complex scenes such as target deformation and low resolution is solved. As shown in fig. 5, fig. 5(a) is the success rate plot of 10 advanced target tracking methods on OTB-100, and fig. 5(b) is the distance precision plot of the 10 advanced target tracking methods on OTB-100. Extensive experiments on the OTB tracking data set and comparison with current advanced algorithms show that the method has excellent tracking performance: its main tracking indexes exceed those of current mainstream tracking algorithms, and its tracking success rate is improved across the board compared with the TADT algorithm. On the OTB-100 data set, the tracking success rate reaches 0.662, the tracking accuracy is 0.870, and the tracking speed reaches 24.8 frames/second.

The invention has the following advantages:

1. Depth feature extraction is improved. When the VGG16 network is initialized, the method preserves the Conv5-1 convolutional layer, whereas the TADT algorithm removes all Conv5 layer features. Because the TADT algorithm locates the target position using only the Conv4-3 layer depth features of the convolutional neural network VGG16, tracking drift is likely to occur in scenes such as large target deformation or low resolution. Features from different levels of the convolutional network play a very important role in target tracking: low-level features have higher resolution and can position the target accurately, while high-level features contain more semantic information, can effectively localize the range of the target, handle larger target changes, and prevent the tracker from drifting.

2. The peak-to-sidelobe ratio is introduced to judge the reliability of each frame's tracking result, and optimization is applied accordingly. When the primary positioning result has high credibility, the target primary positioning position obtained earlier is used directly as the final result; when the result is to be optimized, the target is accurately repositioned using Conv3-3 layer depth features on the basis of the search area obtained by the primary positioning; when the result has low credibility, the target is relocated using Conv5-1 layer depth features. Through reliability judgment and classified processing, both tracking precision and tracking speed are taken into account.

In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and the same or similar parts among the embodiments may be referred to one another. Because the system disclosed by the embodiment corresponds to the method disclosed by the embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.

The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims

1. A target tracking method combining channel selection and position optimization is characterized by comprising the following steps:

acquiring a video sequence to be tracked;

2. The method for tracking a target in combination with channel selection and location optimization according to claim 1, wherein the determining a target primary localization location and a peak to side lobe ratio according to the first target response map specifically comprises:

3. The target tracking method combining channel selection and position optimization according to claim 1, wherein the performing position optimization according to the primary target positioning position and the peak-to-side lobe ratio to obtain a final target positioning position specifically comprises:

4. The method for tracking the target in combination with the channel selection and the position optimization according to claim 3, wherein when the reliability of the tracking result is to be optimized, performing target tracking according to the third convolution feature to determine a final target positioning position specifically includes:

5. The method for tracking a target in combination with channel selection and location optimization according to claim 3, wherein when the reliability of the tracking result is low, performing target tracking according to the fifth convolution feature to determine a final target positioning position specifically includes:

6. A target tracking system incorporating channel selection and location optimization, comprising:

the acquisition module is used for acquiring a video sequence to be tracked;

7. The system for tracking an object in combination with channel selection and location optimization according to claim 6, wherein the module for determining the initial target location and the peak-to-side lobe ratio specifically comprises:

8. The system for tracking an object in combination with channel selection and position optimization according to claim 6, wherein the position optimization module specifically comprises:

9. The system for tracking an object in combination with channel selection and position optimization according to claim 8, wherein the second object final positioning position determining unit specifically comprises:

10. The system for tracking an object by combining channel selection and position optimization according to claim 8, wherein the third object final positioning position determining unit specifically comprises:

a second judging subunit, configured to judge whether a maximum value of the third target response map is smaller than a set multiple of a maximum value of the first target response map, and if not, to use the third target response map as the first target response map and return to the step "determining a target primary positioning position and a peak-to-side lobe ratio according to the first target response map; and performing position optimization according to the target primary positioning position and the peak-to-side lobe ratio to obtain a target final positioning position".