CN111242976B - Aircraft detection tracking method using attention mechanism


Info

Publication number
CN111242976B
Authority
CN
China
Prior art keywords
target
proposal
tracker
detection
small
Prior art date
Legal status
Active
Application number
CN202010017027.7A
Other languages
Chinese (zh)
Other versions
CN111242976A (en)
Inventor
李剑思
林姝含
郑文涛
Current Assignee
Beijing Tianrui Kongjian Technology Co ltd
Original Assignee
Beijing Tianrui Kongjian Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Tianrui Kongjian Technology Co., Ltd.
Priority to CN202010017027.7A
Publication of CN111242976A
Application granted
Publication of CN111242976B
Legal status: Active (current)
Anticipated expiration legal status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Radar Systems Or Details Thereof (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an aircraft detection and tracking method using an attention mechanism. Target detection is performed with a target detection algorithm based on a convolutional neural network into which a local attention enhancement mechanism is introduced: a large target convolution filter and a small target convolution filter respectively extract large target feature information relating to the target as a whole and small target feature information relating to specific local parts of the target, the small target feature information is used to enhance the corresponding local information in the large target feature information, and the locally enhanced large target feature information is used to perform target detection on the current frame image and form large target detection proposals. The method reduces background interference, improves aircraft detection performance, and avoids tracker failure caused by long-term partial occlusion. It is mainly intended for aircraft detection and tracking in the flight area.

Description

Aircraft detection tracking method using attention mechanism
Technical Field
The invention relates to an aircraft detection and tracking method using an attention mechanism, and belongs to the technical field of computer vision.
Background
In the operational management and control of aircraft in the flight area, aircraft need to be detected and tracked in real time. Traditional aircraft detection and tracking is based on surface surveillance radar; in recent years, air traffic control operation systems based on panoramic video of the flight area, which are more intuitive, have been actively promoted.
The detection and tracking of moving objects in video are two important topics in the field of computer vision, and in many application scenarios the two are tightly coupled. For example, the Deep Sort algorithm [1] is a typical tracking-by-detection method.
For target detection, detection algorithms based on convolutional neural networks are currently the mainstream approach; such models can be roughly divided into two-stage models (such as Faster R-CNN [2] and Mask R-CNN [3]) and one-stage models (such as YOLO [4] and SSD [5]). Target tracking is more complex and difficult than target detection, and researchers have proposed various solutions from different perspectives, such as correlation filtering based on single-target features [6], generative tracking methods based on deep learning [7], and re-identification (Re-ID) based on feature similarity matching.
However, it is difficult to achieve high accuracy by directly applying general moving object detection and tracking methods to aircraft in the flight area (or airport). There are several reasons. First, target detection algorithms based on convolutional neural networks are not well suited to the aircraft detection task: the area occupied by the aircraft within its learned bounding rectangle is usually not high, so the features extracted by the convolutional feature extractor carry strong background characteristics. Second, in the airport ground environment, partial occlusion of an airplane is the norm, and the loss of specific local features caused by partial occlusion can make a target detection algorithm miss the airplane target. Finally, airplanes of the same model are almost indistinguishable in appearance (unless an airline applies large-area livery, and even then liveries within one airline are generally similar), and airplanes of different models differ only slightly in appearance, which makes it difficult to distinguish individual airplanes using appearance-based image features, so the performance of re-identification in the tracking task is limited.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an aircraft detection and tracking method using an attention mechanism, so as to be better suited to detecting and tracking aircraft in the flight area.
The technical solution of the invention is as follows: an aircraft detection and tracking method using an attention mechanism performs target detection with a target detection algorithm based on a convolutional neural network and introduces a local attention enhancement mechanism to enhance large target feature information with small target feature information. A large target convolution filter and a small target convolution filter respectively extract large target feature information relating to the target as a whole (i.e. the large target) and small target feature information relating to specific local parts of the target (i.e. small targets); the small target feature information is used to perform the corresponding local information enhancement on the large target feature information; and the locally enhanced large target feature information is used to perform target detection on the current frame image to form large target detection proposals.
The target may be an aircraft or airplane and the scene involved may be an airport or flight area.
A specific local part of the target can be a part with distinctive visual characteristics, such as a wing, the nose or the tail; one or more such parts can be used depending on the actual situation.
The receptive field and down-sampling rate of the small target convolution filter should be smaller than those of the large target convolution filter.
The manner of performing the corresponding local information enhancement on the large target feature information with the small target feature information can be expressed by the following formula:
f_u = a_s(f_s) * k(f_b) + b_s(f_s)
where f_u is the feature map used for large target identification, k is an up-sampling function for up-sampling the large target convolution features, a_s is a scaling parameter network that computes scaling parameters from the small target features, b_s is an offset parameter network that computes offset parameters from the small target features, f_b is the convolution feature map obtained by the large target convolution filter, and f_s is the convolution feature map obtained by the small target convolution filter.
Preferably, a basic convolution filter is provided as a common front part shared by the large target filter and the small target filter; it extracts the primary feature information of the image, which serves as the input to the subsequent large target convolution filter and small target convolution filter.
Preferably, when target detection is performed on one frame of image, a large target detection proposal and the small target detection proposals for specific local parts of the large target are obtained at the same time, so that during target tracking a small target detection proposal can substitute for a large target whose detection has failed.
For example, a small target detector may be provided to detect the specific local parts of the target. The small target detector performs small target detection on the small target feature information extracted by the small target convolution filter to form small target detection proposals, and when detection or identification of the large target (the target as a whole) fails, the position of the large target is estimated from a small target detection proposal and the positional relationship between that specific local part and the whole target.
Preferably, the small target detection module is trained in a dual-task learning mode: both small target labels and large target labels are set, and hitting the large target label is taken as the second (auxiliary) task. The small target detection module mainly consists of the small target convolution filter and the small target detector.
The tracker keeps large target position information and a large target motion equation relating to the target as a whole, and keeps small target position information and small target motion models relating to the specific local parts of the target. Large target detection (for the whole target) and small target detection (for the specific local parts) are performed on the current frame image to obtain the detection target proposals of the current image, which include large target detection proposals and small target detection proposals, and the tracker is updated as follows:
1) Match the large target detection proposals with the small target detection proposals.
Compute the degree of overlap between the large target detection proposals and the small target detection proposals of the current frame, and decide whether a match succeeds according to the overlap and the corresponding threshold (an illustrative overlap computation is sketched after step 4).
Assign successfully matched small target detection proposals to their large target detection proposals, and record the small target detection proposals whose matching failed.
2) Match the large target detection proposals with the trackers.
Perform an overlap test between the large target detection proposals and the large target tracking proposals, and match detection proposals and trackers according to the overlap results and the corresponding threshold, where a large target tracking proposal is the current-frame large target position predicted by a tracker from its large target motion equation.
Record the successfully matched tracker and large target detection proposal pairs, and record the large target proposals and trackers whose matching failed.
3) Match the small target detection proposals with the trackers.
Perform an overlap test between the small target detection proposals and the small target tracking proposals, and match detection proposals and trackers according to the overlap results and the corresponding threshold, where a small target tracking proposal is the current-frame small target position predicted by a tracker from its small target motion model.
Record the successfully matched tracker and small target detection proposal pairs, and record the small target proposals and trackers whose matching failed.
4) Update the trackers.
For each successfully matched tracker, update its information record according to the corresponding large target detection proposal and/or small target detection proposal; start a new tracker for each large target detection proposal whose matching failed; for each tracker whose matching failed, reduce its confidence and close it if the confidence falls below the corresponding threshold; and discard the small target detection proposals whose matching failed.
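As an illustration of the overlap tests used in steps 1) to 3), the following minimal Python sketch computes the intersection-over-union of two axis-aligned boxes and performs greedy threshold-based matching. The (x, y, w, h) box format, the function names and the default threshold are assumptions for illustration; the patent does not prescribe a particular overlap measure or matching strategy.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, w, h).

    (x, y) is the top-left corner; w and h are the width and height in pixels.
    """
    ax1, ay1, ax2, ay2 = box_a[0], box_a[1], box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx1, by1, bx2, by2 = box_b[0], box_b[1], box_b[0] + box_b[2], box_b[1] + box_b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def match_by_overlap(detections, references, threshold=0.5):
    """Greedy one-to-one matching by IoU; returns matched pairs and both leftover index lists."""
    matched, unmatched_det = [], []
    used = set()
    for i, det in enumerate(detections):
        best_j, best_iou = -1, threshold
        for j, ref in enumerate(references):
            if j in used:
                continue
            overlap = iou(det, ref)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.append((i, best_j))
            used.add(best_j)
        else:
            unmatched_det.append(i)
    unmatched_ref = [j for j in range(len(references)) if j not in used]
    return matched, unmatched_det, unmatched_ref
```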
The beneficial effects of the invention are as follows: because a local attention mechanism is introduced, the overall features of the aircraft are enhanced with its local features, which reduces background interference and improves aircraft detection performance; and because both the whole aircraft and its local parts are tracked, the overall position can be estimated from local position information, avoiding tracker failure caused by long-term partial occlusion.
The invention can be mainly used for detecting and tracking the aircraft in the flight area.
Drawings
FIG. 1 is an overall flow diagram;
FIG. 2 is a target detection inference flow with local attention enhancement mechanism;
FIG. 3 is a target detection model training flow with a local attention-enhancing mechanism;
FIG. 4 is a schematic diagram of the architecture and workflow of the small target attention assist module;
FIG. 5 is a target tracking flow diagram with a small target attention mechanism.
Detailed Description
1. Overall process
The overall process of the invention is shown in FIG. 1 and mainly comprises a target detection module with a local attention enhancement mechanism and a target tracking module with a local attention enhancement mechanism. First, the video images are fed frame by frame into the target detection model with the local attention module to obtain large targets and small targets; then the large targets and small targets are used to update and confirm the trackers; finally, the trackers predict the next frame image from the updated parameters.
2. Target detection with a local attention enhancement mechanism
The attention mechanism is an application of ideas from neuroscience in computer science: the performance of a target detection model is improved by paying attention to local regions with strong target characteristics. The attention mechanism adopted in this method is a local attention mechanism from the field of computer vision [8,9]. The objects of detection and tracking are not only the whole aircraft but also its representative parts (such as the fuselage and wings). Local "attention" to these representative parts increases the effective information, improves the accuracy and sensitivity of target detection, and reduces the influence of partial occlusion on target tracking.
2.1 Target detection model with a local attention enhancement mechanism
The design idea of target detection with a local attention enhancement mechanism is to extract features with convolution filters built on a common base, and to attach at the back end two convolutional neural network detection branches with different structures, one for detecting large targets (the whole aircraft) and one for detecting small targets (parts of the aircraft, such as the fuselage and wings).
The detection models are divided into a training model used for training and an inference model used for inference. During inference, the feature information of the small target detection module is passed to the large target detection module through the small target attention assist module, so that the large target features are enhanced and the performance of the large target detector is improved. During model training, the large target labels are used as learning targets in addition to the annotated small target labels, which relieves the learning difficulty caused by the fact that small targets are hard to hit during small target detection training.
Both the training model and the inference model contain the basic (shared) convolution filter, the small target convolution filter, the large target convolution filter, the small target detector, the large target detector and the small target attention assist module. Unlike the inference model, the training model additionally has a large target positioning assist module.
The basic convolution filter is a functional module shared by the large target detection module and the small target detection module; it extracts the primary feature information of the image. Sharing the same basic convolution filter benefits the generalization of the model and prevents overfitting.
The large target detection module and the small target detection module are both obtained by modifying the same existing convolutional neural network target detection algorithm (such as Faster R-CNN, YOLO or SSD). Each can be divided into a convolution feature extractor (convolution filter) and a target detector (the detailed structure depends on the chosen base model). The receptive field and down-sampling rate of the convolution feature extractor of the small target detection module are smaller than those of the large target detection module.
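As a rough sketch of this layout, the following PyTorch-style module shares a basic convolution filter between a large target branch and a small target branch. The layer counts, channel widths, kernel sizes and strides are illustrative assumptions only, since the patent leaves the choice of base detector (Faster R-CNN, YOLO, SSD, etc.) open.

```python
import torch.nn as nn


class DualBranchExtractor(nn.Module):
    """Shared basic convolution filter followed by separate large/small target branches.

    The small target branch uses a smaller receptive field and a lower down-sampling
    rate than the large target branch, so its feature map keeps more local detail
    (the strides and channel counts here are illustrative, not taken from the patent).
    """

    def __init__(self):
        super().__init__()
        # Basic convolution filter shared by both branches (primary feature map).
        self.base = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Large target branch: larger receptive field, higher down-sampling rate.
        self.large_branch = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Small target branch: smaller receptive field, lower down-sampling rate.
        self.small_branch = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, image):
        primary = self.base(image)        # shared primary feature map
        f_b = self.large_branch(primary)  # large target convolution features
        f_s = self.small_branch(primary)  # small target convolution features
        return f_b, f_s
```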
The small target attention assist module finds the small target convolution feature vector corresponding to each large target feature point and computes scaling and offset coefficients from the small target features. The large target feature vectors are scaled and shifted according to the output of this module, yielding locally attention-enhanced convolution features for large target detection.
The large target positioning assist module is a functional module used only during model training. Its main function is to use the large target labels as one of the training targets of the small target branch as well. One of the difficulties in training small targets is that small target detections and labels rarely hit each other; introducing the large target labels as a positioning aid facilitates small target learning.
2.2 Target detection inference flow with a local attention enhancement mechanism
The target detection inference flow is shown in FIG. 2. The basic convolution filter operates on the whole image to obtain the primary feature map. Convolution filters with different down-sampling rates and different receptive fields then perform further feature extraction on this primary feature map for large target detection and small target detection respectively. The small target convolution filter is a dedicated feature extractor for recognizing local structures such as the wings and fuselage; compared with the large target convolution filter, it has a lower down-sampling rate and retains more local information. This local information can be used to enhance the recognition of large targets.
The small target attention assist module consists of a scaling parameter network and an offset parameter network; its architecture and workflow are shown in FIG. 4. The scaling parameter network and the offset parameter network are two different 1 x 1 convolution layers. Both have the same number of input channels, equal to the number of channels of the small target feature map, and the same number of output channels, equal to the number of channels of the large target feature map. Passing the small target feature map through the scaling parameter network and the offset parameter network yields a scaling parameter template and an offset parameter template used to improve the local attention of the large target feature map. Because the small target convolution feature extractor has a lower down-sampling rate than the large target convolution feature extractor, the small target feature map computed from the same primary feature map has a higher resolution than the large target feature map; therefore the large target feature map is up-sampled once so that it has the same resolution as the small target feature map. Finally, an affine transformation is applied to each feature vector of the large target feature map according to the scaling parameter template and the offset parameter template, i.e. each feature vector is multiplied by the corresponding scaling parameters and added to the corresponding offset parameters, giving the locally enhanced large target feature map.
The locally attention-enhanced feature used for large target detection, obtained from the small target features, can be expressed as formula (1):
f_u = a_s(f_s) * k(f_b) + b_s(f_s)        (1)
where f_u is the feature map used for large target identification, k is an up-sampling function for up-sampling the large target convolution features, a_s is the scaling parameter network that computes scaling parameters from the small target features, b_s is the offset parameter network that computes offset parameters from the small target features, f_b is the convolution feature map obtained by the large target convolution filter, and f_s is the convolution feature map obtained by the small target convolution filter.
The large target detector then performs large target inference on the locally enhanced large target feature map.
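A minimal sketch of the small target attention assist module, implementing formula (1), is given below; the class name, the channel arguments and the choice of bilinear interpolation for the up-sampling function k are assumptions for illustration.

```python
import torch.nn as nn
import torch.nn.functional as F


class SmallTargetAttentionAssist(nn.Module):
    """Compute f_u = a_s(f_s) * k(f_b) + b_s(f_s) as in formula (1)."""

    def __init__(self, small_channels, large_channels):
        super().__init__()
        # Scaling parameter network and offset parameter network: two different 1 x 1
        # convolution layers mapping small-target channels to large-target channels.
        self.scale_net = nn.Conv2d(small_channels, large_channels, kernel_size=1)
        self.offset_net = nn.Conv2d(small_channels, large_channels, kernel_size=1)

    def forward(self, f_b, f_s):
        # k: up-sample the large target feature map to the small target resolution.
        f_b_up = F.interpolate(f_b, size=f_s.shape[-2:], mode="bilinear", align_corners=False)
        scale = self.scale_net(f_s)      # a_s(f_s): scaling parameter template
        offset = self.offset_net(f_s)    # b_s(f_s): offset parameter template
        return scale * f_b_up + offset   # locally enhanced large target feature map f_u
```

Combined with the dual-branch extractor sketched earlier, one would compute f_b, f_s = extractor(image) and then f_u = assist(f_b, f_s) before applying the large target detector head.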
2.3 Target detection training flow with a local attention enhancement mechanism
Before the model is trained, the training data need to be annotated. Unlike conventional data for training target recognition, both the whole detection target (the aircraft) and its distinctive constituent structures (such as the wings and fuselage) must be labeled, but it is not necessary to record which aircraft each labeled part belongs to.
The target detection model faces two problems: small target detections are hard to hit, which makes learning difficult, and partial occlusion makes target detection difficult. The partial occlusion problem is handled with the local attention mechanism described above; however, if small target learning fails, the local attention mechanism becomes ineffective and the tracking performance of the method also degrades. To address the learning difficulty caused by small targets being hard to hit, the invention designs a dual-task learning mode for training the small target detection module, as shown in FIG. 3. Compared with a single small target detection task, the large target positioning assist module provides a second task for the small target detector, namely hitting the large target labels. Compared with requiring both the inference result and the ground-truth label to be small targets, it is much easier for a small target inference result to hit a large target label. In the single-task mode, a small target inference result that does not hit any small target label is classified as a negative sample; in the dual-task mode, if that inference result hits a large target label, it is kept as a positive sample for learning.
In the learning of the small target module, the second task (hitting the large target label) is an auxiliary task, and the model is not expected to take it as its main learning objective, so a dedicated optimization function is designed for it. The optimization objective of the second task is the ratio of the intersection area of the small target and the large target to the area of the small target, which should be close to 1 when learning goes well; because this optimization function has an upper bound, the first task is not overwhelmed. A weight variable is also introduced for the second task to balance the two tasks.
The invention adopts an optimization function based on a variant of Focal Loss [10], see formula (2):
L = FL(p) - a * A(p) * log(1 - p_-t)        (2)
where L is the cost function to be optimized, FL(p) is the multi-class Focal Loss cost function of the first task, A(p) is the ratio of the intersection area of the small target and the large target to the area of the small target, a is the weight set for the second task, and p_-t is the score of being evaluated as background.
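The following Python sketch spells out formula (2) for a single small target proposal. The concrete form of FL (here the standard focal loss with a focusing parameter gamma), the value of the weight a, the (x, y, w, h) box format and the name background_prob for p_-t are assumptions for illustration.

```python
import math


def focal_loss(class_probs, true_class, gamma=2.0):
    """FL(p): multi-class focal loss for one proposal (standard form, assumed here)."""
    p_true = max(class_probs[true_class], 1e-7)
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


def intersection_ratio(small_box, large_box):
    """A(p): area of the small box's intersection with the matched large target label,
    divided by the area of the small box; boxes are (x, y, w, h)."""
    sx, sy, sw, sh = small_box
    lx, ly, lw, lh = large_box
    iw = max(0.0, min(sx + sw, lx + lw) - max(sx, lx))
    ih = max(0.0, min(sy + sh, ly + lh) - max(sy, ly))
    return (iw * ih) / (sw * sh) if sw > 0 and sh > 0 else 0.0


def dual_task_loss(class_probs, true_class, background_prob, small_box, large_box, a=0.25):
    """L = FL(p) - a * A(p) * log(1 - p_-t), as in formula (2).

    The auxiliary term penalises scoring a proposal as background when it overlaps a
    large target label; A(p) <= 1 and the weight a keep it from dominating the first task.
    """
    aux = intersection_ratio(small_box, large_box) * math.log(max(1.0 - background_prob, 1e-7))
    return focal_loss(class_probs, true_class) - a * aux
```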
3. Target tracking based on attention enhancement mechanism
The tracking part of the method is derived from the Deep Sort algorithm [1]; the processing flow is shown in FIG. 5.
In order to further reduce the influence of long-term partial occlusion on multi-target tracking, the method tracks not only the whole aircraft but also its local parts. When long-term partial occlusion occurs, or too much of the aircraft is occluded, the tracker can still be updated through small target detection and tracking even if the whole aircraft cannot be detected, so target loss caused by partial occlusion is avoided.
The tracker in the method records not only the position information and motion equation of the whole aircraft but also the position information and motion models (or motion equations) of its local structures (such as the wings and fuselage). The position of each local part of the aircraft is stored as a five-dimensional vector (x, y, w, h, c), where x is the ratio of the horizontal distance from the upper-left corner of the small target frame to the upper-left corner of the large target frame to the width of the large target frame, y is the ratio of the vertical distance from the upper-left corner of the small target frame to the upper-left corner of the large target frame to the height of the large target frame, w is the ratio of the width of the small target frame to the width of the large target frame, h is the ratio of the height of the small target frame to the height of the large target frame, and c is the type of the local part (wing, fuselage, etc.).
When large target recognition fails but a small target is successfully detected, the large target position is estimated from the small target position using equation (3):
w_b = w_s / w
h_b = h_s / h
u_b = u_s - x * w_b        (3)
v_b = v_s - y * h_b
where u_b, v_b are the horizontal and vertical image coordinates of the upper-left corner of the large target frame, w_b, h_b are the width and height of the large target frame, u_s, v_s are the horizontal and vertical image coordinates of the upper-left corner of the small target frame, and w_s, h_s are the width and height of the small target frame.
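A minimal sketch of equation (3) under the (x, y, w, h, c) convention defined above is given below; the function name and the tuple formats are assumptions for illustration.

```python
def estimate_large_box(small_box, relative):
    """Estimate the large target frame from a detected small target frame and the
    relative position vector (x, y, w, h, c) stored in the tracker (equation (3)).

    small_box: (u_s, v_s, w_s, h_s), top-left corner and size of the small target frame.
    relative:  (x, y, w, h, c), offsets and size ratios of the small target relative to
               the large target frame, plus the local part type c.
    Returns (u_b, v_b, w_b, h_b) for the estimated large target frame.
    """
    u_s, v_s, w_s, h_s = small_box
    x, y, w, h, _c = relative
    w_b = w_s / w        # w = w_s / w_b, so the large frame width is w_s / w
    h_b = h_s / h        # h = h_s / h_b, so the large frame height is h_s / h
    u_b = u_s - x * w_b  # x = (u_s - u_b) / w_b, so u_b = u_s - x * w_b
    v_b = v_s - y * h_b  # y = (v_s - v_b) / h_b, so v_b = v_s - y * h_b
    return u_b, v_b, w_b, h_b
```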
When the large target detection and small target detection of the current frame are complete, the detection target proposals of the current image are obtained, including large target detection proposals (large target proposals) and small target detection proposals (small target proposals). These proposals are used to update the information in the trackers; the update proceeds as follows:
1) Match large target proposals with small target proposals. Compute the degree of overlap between the large target detection proposals and the small target detection proposals of the current frame, assign small target detection proposals whose overlap exceeds the threshold to the corresponding large target detection proposals, and record the small target detection proposals whose matching failed;
2) Match large target proposals with tracking proposals. Perform an overlap test between the large target detection proposals of the current frame and the large target tracking proposals that the current trackers predict for this frame from their motion equations, match detection proposals and trackers according to the overlap results and the set threshold, and record the successfully matched tracker and large target detection proposal pairs as well as the large target proposals and trackers whose matching failed;
3) Match small target proposals with tracking proposals. For each tracker whose matching failed, predict its small target positions from the small target coordinate information and motion models it retains, forming small target tracking proposals. Compute relative distances and perform matching between small target detection proposals and small target tracking proposals of the same part type. For each successfully matched small target detection proposal, estimate the corresponding large target detection proposal from the relative position of that small target within the large target recorded in the matched small target tracking proposal. Perform an overlap test between the large target tracking proposal that the unmatched tracker predicts for the current frame from its motion equation and the large target proposal estimated from the small target detection proposal, match detection proposals and trackers according to the overlap results and the set threshold, and record the successfully matched tracker and estimated large target proposal pairs as well as the small target proposals and trackers whose matching failed;
4) Update the trackers. Update the successfully matched trackers with the corresponding detection proposal information (from steps 2) and 3)), start a new tracker for each large target detection proposal whose matching failed, reduce the confidence of each tracker whose matching failed and close trackers whose confidence falls below the threshold, and discard the small target detection proposals whose matching failed.
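Putting steps 1) to 4) together, a simplified per-frame update might look like the following sketch. It reuses the iou, match_by_overlap and estimate_large_box helpers sketched earlier; the TrackerState record, the use of the last known box in place of a real motion-equation prediction, the proposal dictionaries and the decay and confidence constants are all assumptions for illustration, and the sketch simplifies the patent's procedure (for example, relative-distance matching of small proposals by part type is reduced to an overlap test on the estimated large box).

```python
from dataclasses import dataclass, field


@dataclass
class TrackerState:
    """Minimal per-target tracker record (illustrative, not the patent's data structure)."""
    large_box: tuple                               # last known large target frame (u, v, w, h)
    relatives: dict = field(default_factory=dict)  # part type c -> (x, y, w, h, c)
    confidence: float = 1.0


def predicted_large_box(tracker):
    """Stand-in for the motion-equation prediction: simply reuse the last known box."""
    return tracker.large_box


def update_trackers(trackers, large_proposals, small_proposals,
                    overlap_threshold=0.5, min_confidence=0.2, decay=0.9):
    """One frame of the update described in steps 1) to 4) (simplified sketch).

    Proposals are dicts with a "box" entry; small proposals also carry "part_class".
    """
    # 1) Small proposals that overlap some large proposal are treated as attached to it;
    #    only the leftovers are kept for step 3.
    free_small = [s for s in small_proposals
                  if all(iou(s["box"], b["box"]) < overlap_threshold for b in large_proposals)]

    # 2) Match large proposals against the trackers' predicted large boxes.
    matched, free_large, free_trk = match_by_overlap(
        [b["box"] for b in large_proposals],
        [predicted_large_box(t) for t in trackers],
        overlap_threshold)
    for det_i, trk_j in matched:
        trackers[trk_j].large_box = large_proposals[det_i]["box"]
        trackers[trk_j].confidence = 1.0

    # 3) For still-unmatched trackers, estimate a large box from each leftover small
    #    proposal (equation (3)) and accept it if it overlaps the tracker's prediction.
    for trk_j in list(free_trk):
        tracker = trackers[trk_j]
        for s in list(free_small):
            rel = tracker.relatives.get(s["part_class"])
            if rel is None:
                continue
            guess = estimate_large_box(s["box"], rel)
            if iou(guess, predicted_large_box(tracker)) >= overlap_threshold:
                tracker.large_box, tracker.confidence = guess, 1.0
                free_small.remove(s)
                free_trk.remove(trk_j)
                break

    # 4) Start new trackers from unmatched large proposals, decay the confidence of
    #    unmatched trackers, prune low-confidence trackers, drop leftover small proposals.
    new_trackers = [TrackerState(large_proposals[i]["box"]) for i in free_large]
    for trk_j in free_trk:
        trackers[trk_j].confidence *= decay
    return [t for t in trackers if t.confidence >= min_confidence] + new_trackers
```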
Unless otherwise specified, the technical means disclosed in the invention may be combined arbitrarily to form various different technical solutions.
References
[1] Wojke N, Bewley A, Paulus D. Simple online and realtime tracking with a deep association metric[C]// 2017 IEEE International Conference on Image Processing (ICIP): 3645-3649.
[2] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2015, 39(6): 1137-1149.
[3] He K, Gkioxari G, Dollar P, et al. Mask R-CNN[C]// 2017 IEEE International Conference on Computer Vision (ICCV). IEEE Computer Society, 2017.
[4] Redmon J, Farhadi A. YOLOv3: An incremental improvement[J]. arXiv preprint arXiv:1804.02767, 2018.
[5] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]// European Conference on Computer Vision. Springer International Publishing, 2016.
[6] Henriques J F, Caseiro R, Martins P, et al. Exploiting the circulant structure of tracking-by-detection with kernels[M]// Computer Vision - ECCV 2012. Springer Berlin Heidelberg, 2012.
[7] Bertinetto L, Valmadre J, Henriques J F, et al. Fully-convolutional Siamese networks for object tracking[C]// European Conference on Computer Vision. Springer, Cham, 2016.
[8] Mnih V, Heess N, Graves A, et al. Recurrent models of visual attention[J]. Advances in Neural Information Processing Systems, 2014.
[9] Chorowski J, Bahdanau D, Serdyuk D, et al. Attention-based models for speech recognition[J]. Computer Science, 2015, 10(4): 429-439.
[10] Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017.

Claims (7)

1. An aircraft detection and tracking method using an attention mechanism, in which target detection is performed with a target detection algorithm based on a convolutional neural network, characterized in that a local attention enhancement mechanism is introduced to enhance large target feature information with small target feature information: a large target convolution filter and a small target convolution filter respectively extract large target feature information relating to the target as a whole and small target feature information relating to specific local parts of the target; the small target feature information is used to perform the corresponding local information enhancement on the large target feature information; the locally enhanced large target feature information is used to perform large target detection on the current frame image to form large target detection proposals; and the manner of performing the corresponding local information enhancement on the large target feature information with the small target feature information is expressed by the following formula:
f_u = a_s(f_s) * k(f_b) + b_s(f_s)
where f_u is the feature map used for large target identification, k is an up-sampling function for up-sampling the large target convolution features, a_s is a scaling parameter network that computes scaling parameters from the small target features, b_s is an offset parameter network that computes offset parameters from the small target features, f_b is the convolution feature map obtained by the large target convolution filter, and f_s is the convolution feature map obtained by the small target convolution filter.
2. The method of claim 1, wherein a basic convolution filter is provided as a pre-stage shared by the large target filter and the small target filter for extracting preliminary feature information of the image, the preliminary feature information being input to the subsequent large target convolution filter and the small target convolution filter.
3. The method according to claim 1 or 2, characterized in that, when target detection is performed on one frame of image, a large target detection proposal and the small target detection proposals for specific local parts of the large target are obtained at the same time, so that during target tracking a small target detection proposal can substitute for a large target whose detection has failed.
4. The method of claim 3, characterized in that the small target detection module is trained in a dual-task learning mode, in which small target labels and large target labels are set and hitting the large target label is taken as a second task, the small target detection module mainly consisting of the small target convolution filter and a small target detector.
5. The method of claim 1 or 2, characterized in that the tracker keeps large target position information and a large target motion equation relating to the target as a whole and keeps small target position information and small target motion models relating to the specific local parts of the target; large target detection relating to the whole target and small target detection relating to the specific local parts are performed on the current frame image to obtain detection target proposals of the current image, the detection target proposals comprising large target detection proposals and small target detection proposals; and the tracker is updated as follows:
1) matching the large target detection proposals with the small target detection proposals, assigning successfully matched small target detection proposals to their large target detection proposals, and recording the small target detection proposals whose matching failed;
2) matching the large target detection proposals with the trackers, recording the successfully matched tracker and large target detection proposal pairs, and recording the large target proposals and trackers whose matching failed;
3) matching the small target detection proposals with the trackers, recording the successfully matched tracker and small target detection proposal pairs, and recording the small target proposals and trackers whose matching failed;
4) updating the trackers: for each successfully matched tracker, updating its information record according to the corresponding large target detection proposal and/or small target detection proposal; starting a new tracker with each large target detection proposal that failed to match a tracker; and discarding each small target detection proposal that failed to match a tracker.
6. The method of claim 3, characterized in that the tracker keeps large target position information and a large target motion equation relating to the target as a whole and keeps small target position information and small target motion models relating to the specific local parts of the target; large target detection relating to the whole target and small target detection relating to the specific local parts are performed on the current frame image to obtain detection target proposals of the current image, the detection target proposals comprising large target detection proposals and small target detection proposals; and the tracker is updated as follows:
1) matching the large target detection proposals with the small target detection proposals, assigning successfully matched small target detection proposals to their large target detection proposals, and recording the small target detection proposals whose matching failed;
2) matching the large target detection proposals with the trackers, recording the successfully matched tracker and large target detection proposal pairs, and recording the large target proposals and trackers whose matching failed;
3) matching the small target detection proposals with the trackers, recording the successfully matched tracker and small target detection proposal pairs, and recording the small target proposals and trackers whose matching failed;
4) updating the trackers: for each successfully matched tracker, updating its information record according to the corresponding large target detection proposal and/or small target detection proposal; starting a new tracker with each large target detection proposal that failed to match a tracker; and discarding each small target detection proposal that failed to match a tracker.
7. The method of claim 4, characterized in that the tracker keeps large target position information and a large target motion equation relating to the target as a whole and keeps small target position information and small target motion models relating to the specific local parts of the target; large target detection relating to the whole target and small target detection relating to the specific local parts are performed on the current frame image to obtain detection target proposals of the current image, the detection target proposals comprising large target detection proposals and small target detection proposals; and the tracker is updated as follows:
1) matching the large target detection proposals with the small target detection proposals, assigning successfully matched small target detection proposals to their large target detection proposals, and recording the small target detection proposals whose matching failed;
2) matching the large target detection proposals with the trackers, recording the successfully matched tracker and large target detection proposal pairs, and recording the large target proposals and trackers whose matching failed;
3) matching the small target detection proposals with the trackers, recording the successfully matched tracker and small target detection proposal pairs, and recording the small target proposals and trackers whose matching failed;
4) updating the trackers: for each successfully matched tracker, updating its information record according to the corresponding large target detection proposal and/or small target detection proposal; starting a new tracker with each large target detection proposal that failed to match a tracker; and discarding each small target detection proposal that failed to match a tracker.
CN202010017027.7A 2020-01-08 2020-01-08 Aircraft detection tracking method using attention mechanism Active CN111242976B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010017027.7A CN111242976B (en) 2020-01-08 2020-01-08 Aircraft detection tracking method using attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010017027.7A CN111242976B (en) 2020-01-08 2020-01-08 Aircraft detection tracking method using attention mechanism

Publications (2)

Publication Number Publication Date
CN111242976A CN111242976A (en) 2020-06-05
CN111242976B (en) 2021-05-11

Family

ID=70870531

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010017027.7A Active CN111242976B (en) 2020-01-08 2020-01-08 Aircraft detection tracking method using attention mechanism

Country Status (1)

Country Link
CN (1) CN111242976B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113362370B (en) * 2021-08-09 2022-01-11 深圳市速腾聚创科技有限公司 Method, device, medium and terminal for determining motion information of target object

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ATE528664T1 (en) * 2006-12-21 2011-10-15 Selex Galileo Spa MULTI-TARGET RADAR DETECTION METHOD AND APPARATUS
US9589210B1 (en) * 2015-08-26 2017-03-07 Digitalglobe, Inc. Broad area geospatial object detection using autogenerated deep learning models
WO2018184204A1 (en) * 2017-04-07 2018-10-11 Intel Corporation Methods and systems for budgeted and simplified training of deep neural networks
CN108986834B (en) * 2018-08-22 2023-04-07 中国人民解放军陆军工程大学 Bone conduction voice blind enhancement method based on codec framework and recurrent neural network
CN110135502B (en) * 2019-05-17 2023-04-18 东南大学 Image fine-grained identification method based on reinforcement learning strategy
CN110458003B (en) * 2019-06-29 2023-09-19 天津大学 Facial expression action unit countermeasure synthesis method based on local attention model

Also Published As

Publication number Publication date
CN111242976A (en) 2020-06-05

Similar Documents

Publication Publication Date Title
CN111460926B (en) Video pedestrian detection method fusing multi-target tracking clues
CN114972418B (en) Maneuvering multi-target tracking method based on combination of kernel adaptive filtering and YOLOX detection
US20220197281A1 (en) Intelligent decision-making method and system for unmanned surface vehicle
CN108053419B (en) Multi-scale target tracking method based on background suppression and foreground anti-interference
CN113807187B (en) Unmanned aerial vehicle video multi-target tracking method based on attention feature fusion
CN112101221B (en) Method for real-time detection and identification of traffic signal lamp
CN109919974A (en) Online multi-object tracking method based on the more candidate associations of R-FCN frame
CN112052802A (en) Front vehicle behavior identification method based on machine vision
CN112818905B (en) Finite pixel vehicle target detection method based on attention and spatio-temporal information
CN110728694A (en) Long-term visual target tracking method based on continuous learning
CN112489089B (en) Airborne ground moving target identification and tracking method for micro fixed wing unmanned aerial vehicle
CN113052872B (en) Underwater moving object tracking method based on sonar image
CN111666871A (en) Improved YOLO and SIFT combined multi-small-target detection and tracking method for unmanned aerial vehicle
CN113608663A (en) Fingertip tracking method based on deep learning and K-curvature method
Han et al. Research on remote sensing image target recognition based on deep convolution neural network
CN114926859A (en) Pedestrian multi-target tracking method in dense scene combined with head tracking
CN111242976B (en) Aircraft detection tracking method using attention mechanism
CN114627339B (en) Intelligent recognition tracking method and storage medium for cross border personnel in dense jungle area
CN112613565A (en) Anti-occlusion tracking method based on multi-feature fusion and adaptive learning rate updating
Duan [Retracted] Deep Learning‐Based Multitarget Motion Shadow Rejection and Accurate Tracking for Sports Video
CN116665097A (en) Self-adaptive target tracking method combining context awareness
CN115100565B (en) Multi-target tracking method based on spatial correlation and optical flow registration
You et al. Research on bus passenger flow statistics based on video images
CN106558065A (en) The real-time vision tracking to target is realized based on color of image and texture analysiss
Duan et al. Online recommendation-based convolutional features for scale-aware visual tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant