CN116912632B - Target tracking method and device based on occlusion - Google Patents

Target tracking method and device based on occlusion

Info

Publication number
CN116912632B
CN116912632B (application CN202311168561.8A)
Authority
CN
China
Prior art keywords
network
occlusion
attention
target tracking
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311168561.8A
Other languages
Chinese (zh)
Other versions
CN116912632A (en)
Inventor
蒋召
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xumi Yuntu Space Technology Co Ltd
Original Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority to CN202311168561.8A
Publication of CN116912632A
Application granted
Publication of CN116912632B

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/764 — Recognition using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/82 — Recognition using pattern recognition or machine learning, using neural networks
    • G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V2201/07 — Target detection
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N3/048 — Activation functions
    • G06N3/08 — Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an occlusion-based target tracking method and device. The method comprises the following steps: constructing an attention network from a channel global average pooling layer, a convolution layer, an activation layer and an attention layer; constructing a target classification network and an occlusion classification network from a spatial global average pooling layer and a fully connected layer; constructing an occlusion enhancement network, and assembling a target tracking model from the occlusion enhancement network, a feature extraction network, the attention network, the target classification network and the occlusion classification network; and training the target tracking model on the target tracking task, then executing the task with the trained model. These technical means solve the prior-art problem of low target tracking model accuracy caused by occlusion.

Description

Target tracking method and device based on occlusion
Technical Field
The present disclosure relates to the field of target detection, and in particular to an occlusion-based target tracking method and device.
Background
Target tracking as used here refers to person re-identification (Person Re-ID), also known as pedestrian re-identification: a technique that uses computer vision to determine whether a particular pedestrian appears in an image or video sequence. In real scenes, tracking data contains a large amount of occlusion, whether by other pedestrians or by objects, and this occlusion substantially degrades the accuracy of target tracking models.
Disclosure of Invention
In view of this, the embodiments of the present application provide an occlusion-based target tracking method, device, electronic device and computer-readable storage medium, so as to solve the prior-art problem of low target tracking model accuracy caused by occlusion.
In a first aspect of the embodiments of the present application, an occlusion-based target tracking method is provided, comprising: constructing an attention network from a channel global average pooling layer, a convolution layer, an activation layer and an attention layer; constructing a target classification network and an occlusion classification network from a spatial global average pooling layer and a fully connected layer; constructing an occlusion enhancement network, and building a target tracking model from the occlusion enhancement network, a feature extraction network, the attention network, the target classification network and the occlusion classification network; and training the target tracking model on the target tracking task, then executing the task with the trained model.
In a second aspect of the embodiments of the present application, an occlusion-based target tracking device is provided, comprising: a first building module configured to build an attention network from the channel global average pooling layer, the convolution layer, the activation layer and the attention layer; a second building module configured to build a target classification network and an occlusion classification network from the spatial global average pooling layer and the fully connected layer; a third building module configured to construct an occlusion enhancement network and to build the target tracking model from the occlusion enhancement network, the feature extraction network, the attention network, the target classification network and the occlusion classification network; and a training module configured to train the target tracking model on the target tracking task and execute the task with the trained model.
In a third aspect of the embodiments of the present application, there is provided an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
In a fourth aspect of the embodiments of the present application, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method.
Compared with the prior art, the embodiments of the present application are beneficial because: the attention network is built from a channel global average pooling layer, a convolution layer, an activation layer and an attention layer; the target classification network and the occlusion classification network are built from a spatial global average pooling layer and a fully connected layer; an occlusion enhancement network is constructed and a target tracking model is assembled from the occlusion enhancement network, the feature extraction network, the attention network, the target classification network and the occlusion classification network; and the model is trained on the target tracking task and then used to execute it. These technical means solve the prior-art problem of low model accuracy under occlusion and thereby improve the accuracy of the target tracking model.
Drawings
To illustrate the technical solutions of the embodiments more clearly, the drawings needed for the embodiments or the description of the prior art are briefly introduced below. The drawings described here show only some embodiments of the present application; other drawings can be derived from them by a person skilled in the art without inventive effort.
Fig. 1 is a schematic flow chart of a target tracking method based on occlusion according to an embodiment of the present application;
FIG. 2 is a flowchart of a training method of a target tracking model according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an occlusion-based object tracking device according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
Fig. 1 is a schematic flow chart of a target tracking method based on occlusion according to an embodiment of the present application. The occlusion-based object tracking method of fig. 1 may be performed by a computer or server, or software on a computer or server. As shown in fig. 1, the occlusion-based target tracking method includes:
s101, constructing an attention network by using a channel global average pooling layer, a convolution layer, an activation layer and an attention layer;
s102, constructing a target classification network and an occlusion classification network by using a space global average pooling layer and a full connection layer;
s103, constructing an occlusion enhancement network, and constructing a target tracking model by using the occlusion enhancement network, the feature extraction network, the attention network, the target classification network and the occlusion classification network;
s104, training the target tracking model according to the target tracking task, and executing the target tracking task by using the trained target tracking model.
Specifically: the channel global average pooling layer, the convolution layer, the activation layer and the attention layer are connected in sequence to obtain the attention network; the target classification network and the occlusion classification network are each obtained by connecting a spatial global average pooling layer and a fully connected layer in sequence; and the occlusion enhancement network, the feature extraction network and the attention network are connected in series, with the attention layer in the attention network additionally connected to the feature extraction network (beyond its connection to the activation layer), and the target classification network and the occlusion classification network attached in parallel after the attention network, yielding the target tracking model.
The attention layer in the attention network is configured to assign attention to each element of the feature extraction network's output based on the output of the activation layer. For example, the activation layer's output may be multiplied element-wise with the feature extraction network's output; this multiplication is what assigns the attention weights.
The activation function used by the activation layer may be Sigmoid. The feature extraction network is a backbone network, for which a residual network (e.g. ResNet) may be chosen. The channel global average pooling layer averages over the channels, while the spatial global average pooling layer averages over the spatial dimensions. For example, for a 3x3x100 input feature map, the channel global average pooling layer outputs a 3x3 map and the spatial global average pooling layer outputs a 1x1x100 vector. In 3x3x100, the first 3 is the width, the second 3 is the height (3x3 is the spatial size), and 100 is the number of channels.
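The two pooling operations above can be sketched in NumPy; the array layout and variable names are illustrative, not taken from the patent:

```python
import numpy as np

# Hypothetical feature map matching the 3x3x100 example in the text:
# width 3, height 3, 100 channels.
feat = np.random.rand(3, 3, 100)

# Channel global average pooling: average across channels -> a 3x3 spatial map.
channel_gap = feat.mean(axis=2)

# Spatial global average pooling: average across width and height -> 1x1x100.
spatial_gap = feat.mean(axis=(0, 1))

print(channel_gap.shape)  # (3, 3)
print(spatial_gap.shape)  # (100,)
```

The channel pooling feeds the attention branch (a per-position map), while the spatial pooling feeds the two classification heads (a per-channel vector).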
According to the technical scheme provided by the embodiments of the present application, the attention network is built from a channel global average pooling layer, a convolution layer, an activation layer and an attention layer; the target classification network and the occlusion classification network are built from a spatial global average pooling layer and a fully connected layer; an occlusion enhancement network is constructed and a target tracking model is assembled from the occlusion enhancement network, the feature extraction network, the attention network, the target classification network and the occlusion classification network; and the model is trained on the target tracking task and then used to execute it. These technical means solve the prior-art problem of low model accuracy under occlusion and thereby improve the accuracy of the target tracking model.
Further, training the target tracking model according to the target tracking task includes: acquiring a training data set corresponding to the target tracking task and randomly selecting from it the first training sample fed to the target tracking model at the current moment; inputting the first training sample into the target tracking model, where the occlusion enhancement network processes the first training sample to obtain an occlusion sample, the feature extraction network processes the occlusion sample to obtain a sample feature map, the attention network processes the sample feature map to obtain an attention feature map, the target classification network processes the attention feature map to obtain a target classification result, and the occlusion classification network processes the attention feature map to obtain an occlusion classification result; computing, with a cross-entropy loss function, the classification loss between the target classification result and the target label of the first training sample; computing, with a cross-entropy loss function, the occlusion loss between the occlusion classification result and the occlusion label of the occlusion sample corresponding to the first training sample; and optimizing the model parameters of the target tracking model with the classification loss and the occlusion loss of the first training sample, training being complete once all training samples in the data set have been used to optimize the model parameters.
The training data set contains a large number of training samples; the sample fed to the target tracking model at the current moment is randomly selected and recorded as the first training sample. The target classification result is the classification of the target, i.e. the pedestrian, in the first training sample, and the occlusion classification result is the classification of the occlusion type in the occlusion sample corresponding to the first training sample. Occlusion types include pedestrian occlusion, object occlusion, animal occlusion and the like: pedestrian occlusion means the target is occluded by other pedestrians; object occlusion means the target is occluded by objects, such as buildings or a pedestrian's backpack; animal occlusion means the target is occluded by animals.
Optimizing the model parameters with the classification loss and the occlusion loss may be done by taking a weighted sum of the two losses and optimizing the parameters according to that sum. Once all training samples in the training data set have been used to optimize the model parameters, training of the target tracking model is complete.
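The weighted summation of the two cross-entropy losses can be sketched as follows; the class probabilities and the loss weights are illustrative assumptions, since the patent does not specify them:

```python
import numpy as np

def cross_entropy(probs, label):
    # Cross-entropy for a single sample: negative log-probability of the true class.
    return -np.log(probs[label])

# Illustrative softmax outputs of the two heads (not real model outputs).
target_probs = np.array([0.7, 0.2, 0.1])     # target classification head
occlusion_probs = np.array([0.1, 0.8, 0.1])  # occlusion classification head

cls_loss = cross_entropy(target_probs, 0)    # target label: class 0
occ_loss = cross_entropy(occlusion_probs, 1) # occlusion label: class 1

# Weighted summation; the weights 1.0 and 0.5 are assumptions for illustration.
total_loss = 1.0 * cls_loss + 0.5 * occ_loss
print(round(float(total_loss), 4))  # 0.4682
```

In a real training step, `total_loss` would be backpropagated through both heads and the shared backbone.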
Still further, the occlusion enhancement network processes the first training sample to obtain an occlusion sample as follows: determine a second training sample corresponding to the first training sample, where the second training sample is any training sample in the data set other than the first; crop a picture of randomly determined size at an arbitrary position of the second training sample; paste the cropped picture at an arbitrary position of the first training sample to obtain the occlusion sample corresponding to the first training sample, the cropped picture serving as the occlusion; and determine the label corresponding to the occlusion in the occlusion sample.
A size is randomly determined as the target size, a picture of that size is cropped at an arbitrary position of the second training sample, and the crop is pasted at an arbitrary position of the first training sample to obtain the occlusion sample. The target size is smaller than the size of the first training sample. After producing the occlusion sample, the occlusion enhancement network also determines the label corresponding to the occlusion, i.e. whether the cropped picture shows a pedestrian, an object or an animal.
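The cut-and-paste augmentation can be sketched as below; the function name, the seeded generator and the array sizes are illustrative, not from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_occlusion_sample(first, second):
    """Cut a randomly sized patch from `second` at a random position and
    paste it at a random position of `first`, as the occlusion enhancement
    network is described to do. The patch is strictly smaller than `first`."""
    H, W = first.shape[:2]
    ph = rng.integers(1, H)                              # random patch height
    pw = rng.integers(1, W)                              # random patch width
    sy = rng.integers(0, second.shape[0] - ph + 1)       # cut position (row)
    sx = rng.integers(0, second.shape[1] - pw + 1)       # cut position (col)
    patch = second[sy:sy + ph, sx:sx + pw].copy()
    ty = rng.integers(0, H - ph + 1)                     # paste position (row)
    tx = rng.integers(0, W - pw + 1)                     # paste position (col)
    occluded = first.copy()
    occluded[ty:ty + ph, tx:tx + pw] = patch             # patch acts as occlusion
    return occluded

first = np.zeros((8, 8, 3))   # stands in for the first training sample
second = np.ones((8, 8, 3))   # stands in for the second training sample
occ = make_occlusion_sample(first, second)
```

Because `second` is all ones and `first` all zeros here, the nonzero region of `occ` marks exactly where the synthetic occlusion was pasted.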
Further, the attention network processes the sample feature map to obtain the attention feature map as follows: the sample feature map passes through the channel global average pooling layer, the convolution layer and the activation layer in sequence to obtain an attention map; the attention map and the sample feature map are then input into the attention layer, which outputs the attention feature map, assigning attention to each element of the sample feature map based on the attention map.
Concretely, the attention map and the sample feature map are input into the attention layer, which multiplies them element-wise so that each element of the sample feature map is weighted by the attention map, and outputs the attention feature map.
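The multiplication in the attention layer is a broadcast of the spatial attention map over all channels; a minimal sketch, in which the convolution layer is replaced by a simple stand-in:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sample feature map (height, width, channels).
sample_feat = np.random.rand(3, 3, 100)

# Stand-in for channel global average pooling followed by the convolution
# layer: some 3x3 map derived from the features.
conv_out = sample_feat.mean(axis=2)

# Activation layer: squashes the map into (0, 1) attention weights.
attention_map = sigmoid(conv_out)

# Attention layer: element-wise multiplication, broadcasting the 3x3
# attention map over every channel of the sample feature map.
attention_feat = sample_feat * attention_map[:, :, None]
print(attention_feat.shape)  # (3, 3, 100)
```

Each spatial position is thus scaled by a single weight shared across channels, which is how the attention map re-weights the backbone features.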
In an alternative embodiment: after training of the target tracking model is completed, the occlusion classification network is deleted from the target tracking model to update its structure; the target tracking task is then executed with the updated model.
The occlusion classification network serves, during training, to compensate for the occlusion loss introduced through the occlusion enhancement network, the feature extraction network and the attention network. After training is finished, if the target tracking model is not required to output occlusion categories, the occlusion classification network can be deleted to simplify the model structure.
Fig. 2 is a flowchart of a training method of a target tracking model according to an embodiment of the present application. As shown in fig. 2, the method includes:
s201, acquiring a training data set corresponding to a target tracking task, and executing the following loop algorithm:
s202, judging whether i is larger than N, wherein i is the sequence number of training samples in the training data set, and N is the number of training samples in the training data set;
s203, when i is larger than N, determining that training of the target tracking model is completed;
s204, when i is not greater than N, inputting the ith training sample into the target tracking model:
s205, the shielding enhancement network processes the ith training sample to obtain an ith shielding sample;
s206, the feature extraction network processes the ith shielding sample to obtain an ith sample feature map;
s207, the attention network processes the ith sample feature map to obtain an ith attention feature map;
s208, the target classification network processes the ith attention feature map to obtain an ith target classification result;
s209, the shielding classification network processes the ith attention feature map to obtain an ith shielding classification result;
s210, calculating classification loss between an ith target classification result and a target corresponding label in an ith training sample by using a cross entropy loss function;
s211, calculating the shielding loss between the ith shielding classification result and the shielding corresponding label in the ith shielding sample by using a cross entropy loss function;
s212, optimizing model parameters of the target tracking model by using the classification loss and the shielding loss corresponding to the ith training sample, i+1.
Here "i+1" means updating i with the value of i+1. By traversing all the training samples in the training data set in this way, the model parameters of the target tracking model are optimized on every training sample.
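The control flow of steps S201 to S212 can be sketched as a single pass over the training set; `step_fn` stands in for the forward pass, the two cross-entropy losses and the parameter update, which are not reproduced here:

```python
def train_loop(samples, step_fn):
    """One pass over the training set, mirroring steps S201-S212."""
    i, n = 1, len(samples)
    while not i > n:             # S202/S203: training ends once i > N
        step_fn(samples[i - 1])  # S204-S212 on the i-th sample
        i = i + 1                # S212: update i with the value of i + 1

processed = []
train_loop(["s1", "s2", "s3"], processed.append)
print(processed)  # ['s1', 's2', 's3']
```

A real implementation would typically run several such epochs; the patent's flowchart describes a single traversal.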
In an alternative embodiment: after training of the target tracking model is completed, the occlusion enhancement network and the occlusion classification network are deleted from the target tracking model to update its structure; the target tracking task is then executed with the updated model.
In an alternative embodiment, performing the target tracking task using the target tracking model after deleting the occlusion enhancement network and the occlusion classification network includes: inputting a target image or a target video into a target tracking model: the feature extraction network processes the target image or the target video to obtain a target feature map; inputting the target feature map into an attention network and outputting the target attention feature map; and inputting the target attention characteristic diagram into a target classification network, and outputting a recognition result.
There may be one or more target images or target videos.
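After pruning, inference reduces to three stages: feature extraction, attention and target classification. A minimal sketch with stand-in functions follows; every function here is an illustrative placeholder, not the patent's actual implementation:

```python
import numpy as np

def backbone(img):
    # Stand-in feature extraction network (identity; a real backbone
    # would be e.g. a residual network).
    return img

def attention(feat):
    # Stand-in attention network: channel GAP -> sigmoid -> element-wise weight.
    weights = 1.0 / (1.0 + np.exp(-feat.mean(axis=2)))
    return feat * weights[:, :, None]

def classifier(feat):
    # Stand-in target classification network: spatial GAP in place of
    # the spatial GAP + fully connected layer, then an argmax over channels.
    return int(np.argmax(feat.mean(axis=(0, 1))))

image = np.random.rand(3, 3, 4)  # illustrative "target image" feature input
prediction = classifier(attention(backbone(image)))
```

The prediction is a class index, i.e. the recognition result output by the target classification network.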
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein in detail.
The following are device embodiments of the present application, which may be used to perform method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.
Fig. 3 is a schematic diagram of an occlusion-based object tracking device according to an embodiment of the present application. As shown in fig. 3, the occlusion-based object tracking device includes:
a first building block 301 configured to build an attention network using a channel global average pooling layer, a convolution layer, an activation layer, and an attention layer;
a second construction module 302 configured to construct a target classification network and an occlusion classification network using the spatial global averaging pooling layer and the full connectivity layer;
a third construction module 303 configured to construct an occlusion enhancement network, constructing a target tracking model using the occlusion enhancement network, the feature extraction network, the attention network, the target classification network, and the occlusion classification network;
the training module 304 is configured to train the target tracking model according to the target tracking task, and execute the target tracking task by using the trained target tracking model.
Specifically: the channel global average pooling layer, the convolution layer, the activation layer and the attention layer are connected in sequence to obtain the attention network; the target classification network and the occlusion classification network are each obtained by connecting a spatial global average pooling layer and a fully connected layer in sequence; and the occlusion enhancement network, the feature extraction network and the attention network are connected in series, with the attention layer in the attention network additionally connected to the feature extraction network (beyond its connection to the activation layer), and the target classification network and the occlusion classification network attached in parallel after the attention network, yielding the target tracking model.
The attention layer in the attention network is configured to assign attention to each element of the feature extraction network's output based on the output of the activation layer. For example, the activation layer's output may be multiplied element-wise with the feature extraction network's output; this multiplication is what assigns the attention weights.
The activation function used by the activation layer may be Sigmoid. The feature extraction network is a backbone network, for which a residual network (e.g. ResNet) may be chosen. The channel global average pooling layer averages over the channels, while the spatial global average pooling layer averages over the spatial dimensions. For example, for a 3x3x100 input feature map, the channel global average pooling layer outputs a 3x3 map and the spatial global average pooling layer outputs a 1x1x100 vector. In 3x3x100, the first 3 is the width, the second 3 is the height (3x3 is the spatial size), and 100 is the number of channels.
According to the technical scheme provided by the embodiments of the present application, the attention network is built from a channel global average pooling layer, a convolution layer, an activation layer and an attention layer; the target classification network and the occlusion classification network are built from a spatial global average pooling layer and a fully connected layer; an occlusion enhancement network is constructed and a target tracking model is assembled from the occlusion enhancement network, the feature extraction network, the attention network, the target classification network and the occlusion classification network; and the model is trained on the target tracking task and then used to execute it. These technical means solve the prior-art problem of low model accuracy under occlusion and thereby improve the accuracy of the target tracking model.
In some embodiments, the training module 304 is further configured to: acquire a training data set corresponding to the target tracking task and randomly select from it the first training sample fed to the target tracking model at the current moment; input the first training sample into the target tracking model, where the occlusion enhancement network processes the first training sample to obtain an occlusion sample, the feature extraction network processes the occlusion sample to obtain a sample feature map, the attention network processes the sample feature map to obtain an attention feature map, the target classification network processes the attention feature map to obtain a target classification result, and the occlusion classification network processes the attention feature map to obtain an occlusion classification result; compute, with a cross-entropy loss function, the classification loss between the target classification result and the target label of the first training sample; compute, with a cross-entropy loss function, the occlusion loss between the occlusion classification result and the occlusion label of the occlusion sample corresponding to the first training sample; and optimize the model parameters of the target tracking model with the classification loss and the occlusion loss of the first training sample, training being complete once all training samples in the data set have been used to optimize the model parameters.
The training data set contains a large number of training samples; the training sample input into the target tracking model at the current time is denoted as the first training sample and is determined randomly. The target classification result is the classification of the target, i.e., the pedestrian, in the first training sample, and the occlusion classification result is the classification of the occlusion type in the occlusion sample corresponding to the first training sample. Occlusion types include pedestrian occlusion, object occlusion, animal occlusion and the like: pedestrian occlusion means the target is occluded by other pedestrians; object occlusion means the target is occluded by an object, where the object may be a building, a pedestrian's backpack, or the like; animal occlusion means the target is occluded by an animal.
Optimizing the model parameters of the target tracking model using the classification loss and the occlusion loss may be done by computing a weighted sum of the classification loss and the occlusion loss and optimizing the model parameters according to the result of the weighted summation. After all training samples in the training data set have been used to optimize the model parameters of the target tracking model, training of the target tracking model is determined to be complete.
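As a minimal sketch of the weighted summation described above: the two cross entropy terms are combined with a weighting coefficient before the optimization step. The function names and the weighting hyperparameter `alpha` are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def cross_entropy(probs, label):
    """Cross entropy loss for one sample; probs is a normalized probability vector."""
    return -np.log(probs[label])

def total_loss(cls_probs, cls_label, occ_probs, occ_label, alpha=0.5):
    """Weighted sum of the classification loss and the occlusion loss.

    `alpha` is an assumed weighting hyperparameter; the patent only states
    that the two losses are combined by weighted summation.
    """
    return alpha * cross_entropy(cls_probs, cls_label) \
        + (1 - alpha) * cross_entropy(occ_probs, occ_label)
```

The resulting scalar would then be back-propagated to update the model parameters.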
In some embodiments, the training module 304 is further configured to determine a second training sample corresponding to the first training sample, wherein the second training sample is any training sample in the training data set other than the first training sample; to cut a crop of a randomly determined size from any position of the second training sample, obtaining a cropped picture; to place the cropped picture at any position of the first training sample, obtaining the occlusion sample corresponding to the first training sample, wherein the cropped picture serves as the occlusion in the occlusion sample; and to determine the label corresponding to the occlusion in the occlusion sample.
A size is randomly determined as the target size, a crop of the target size is cut from any position of the second training sample, and the cropped picture is placed at any position of the first training sample to obtain the occlusion sample. The target size is smaller than the size of the first training sample. After producing the occlusion sample, the occlusion enhancement network also determines the label corresponding to the occlusion in the occlusion sample, that is, it determines whether the cropped picture depicts a pedestrian, an object, or an animal.
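The crop-and-paste augmentation above can be sketched as follows. This is an illustrative implementation under the stated constraints (random crop size smaller than the first sample, random cut and paste positions); the function name and signature are assumptions, not from the patent.

```python
import numpy as np

def make_occlusion_sample(first, second, rng=None):
    """Cut a randomly sized crop from `second` and paste it at a random
    position of `first`, producing the occlusion sample.

    Both inputs are H x W x C uint8 arrays; a copy of `first` is returned,
    so the original training sample is left untouched.
    """
    rng = rng or np.random.default_rng()
    h1, w1, _ = first.shape
    h2, w2, _ = second.shape
    # Randomly determined crop size, strictly smaller than the first sample.
    ch = int(rng.integers(1, min(h1, h2)))
    cw = int(rng.integers(1, min(w1, w2)))
    # Cut at any position of the second training sample ...
    y2 = int(rng.integers(0, h2 - ch + 1))
    x2 = int(rng.integers(0, w2 - cw + 1))
    crop = second[y2:y2 + ch, x2:x2 + cw]
    # ... and paste at any position of the first training sample.
    y1 = int(rng.integers(0, h1 - ch + 1))
    x1 = int(rng.integers(0, w1 - cw + 1))
    occluded = first.copy()
    occluded[y1:y1 + ch, x1:x1 + cw] = crop
    return occluded
```

The occlusion label (pedestrian, object, or animal) would accompany the returned sample; how that label is obtained is not specified here.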
In some embodiments, the training module 304 is further configured to obtain an attention map by passing the sample feature map sequentially through the channel global average pooling layer, the convolution layer, and the activation layer; the attention map and the sample feature map are input into the attention layer, and the attention feature map is output, wherein the attention layer assigns attention to each element in the sample feature map based on the attention map.
The attention map and the sample feature map are input into the attention layer; the attention layer multiplies the attention map with the sample feature map, thereby assigning attention to each element of the sample feature map based on the attention map, and outputs the attention feature map.
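A minimal NumPy sketch of this branch follows, under the assumption that "channel global average pooling" means averaging across the channel axis to obtain a spatial map, which then passes through a convolution and a sigmoid activation before weighting the feature map elementwise. The function name and the single-kernel convolution are illustrative simplifications, not the patent's exact layers.

```python
import numpy as np

def spatial_attention(feat, kernel):
    """Attention branch sketch: channel global average pooling, a single
    'same'-padded 2D convolution, a sigmoid activation, then elementwise
    multiplication with the input feature map. feat: C x H x W, kernel: k x k.
    """
    pooled = feat.mean(axis=0)              # channel global average pooling -> H x W
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(pooled, pad)
    conv = np.zeros_like(pooled)            # convolution layer (naive loop)
    H, W = pooled.shape
    for i in range(H):
        for j in range(W):
            conv[i, j] = (padded[i:i + k, j:j + k] * kernel).sum()
    att = 1.0 / (1.0 + np.exp(-conv))       # activation layer -> attention map
    # Attention layer: weight every element of the feature map by the map.
    return feat * att[None, :, :]
```

With a sigmoid activation, every spatial position is scaled by a weight in (0, 1), matching the described elementwise multiplication of the attention map with the sample feature map.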
In some embodiments, the training module 304 is further configured to delete the occlusion classification network from the target tracking model after training of the target tracking model is complete, so as to update the model structure of the target tracking model; and to execute the target tracking task using the target tracking model with the updated model structure.
The occlusion classification network is used during training to compensate for the occlusion loss introduced through the occlusion enhancement network, the feature extraction network and the attention network. After training is complete, if the target tracking model is not required to output occlusion categories, the occlusion classification network may be deleted from the target tracking model to simplify its model structure.
In some embodiments, the training module 304 is further configured to obtain a training data set corresponding to the target tracking task and to execute the following loop: judging whether i is greater than N, where i is the sequence number of a training sample in the training data set and N is the number of training samples in the training data set; when i is greater than N, determining that training of the target tracking model is complete; when i is not greater than N, inputting the ith training sample into the target tracking model: the occlusion enhancement network processes the ith training sample to obtain the ith occlusion sample; the feature extraction network processes the ith occlusion sample to obtain the ith sample feature map; the attention network processes the ith sample feature map to obtain the ith attention feature map; the target classification network processes the ith attention feature map to obtain the ith target classification result; the occlusion classification network processes the ith attention feature map to obtain the ith occlusion classification result; the cross entropy loss function is used to calculate the classification loss between the ith target classification result and the target label in the ith training sample; the cross entropy loss function is used to calculate the occlusion loss between the ith occlusion classification result and the occlusion label in the ith occlusion sample; and the model parameters of the target tracking model are optimized using the classification loss and the occlusion loss corresponding to the ith training sample, after which i is updated to i+1.
Here, i = i+1 means that i is updated with the value of i+1. By traversing the training samples in the training data set in this way, the embodiment of the present application optimizes the model parameters of the target tracking model for every training sample.
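The loop above can be sketched in a few lines. `model` and `update` are stand-ins for the patent's forward pass (which yields both losses) and parameter optimization step; they are assumptions for illustration only.

```python
def train(model, dataset, update):
    """Iterate i = 1..N over the training samples: for each sample, compute
    the classification loss and the occlusion loss, optimize the parameters,
    then increment i; when i exceeds N, training is complete.
    """
    i, N = 1, len(dataset)
    while True:
        if i > N:                            # i > N: training is complete
            return model
        sample = dataset[i - 1]              # the ith training sample
        cls_loss, occ_loss = model(sample)   # forward pass yields both losses
        update(model, cls_loss + occ_loss)   # optimize model parameters
        i += 1                               # i = i + 1
```

A single epoch over the data set is shown; the patent's loop terminates once every sample has been used.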
In some embodiments, the training module 304 is further configured to delete the occlusion enhancement network and the occlusion classification network from the target tracking model after training of the target tracking model is complete, so as to update the model structure of the target tracking model; and to execute the target tracking task using the target tracking model with the updated model structure.
In some embodiments, the training module 304 is further configured to input a target image or target video into the target tracking model: the feature extraction network processes the target image or target video to obtain a target feature map; the target feature map is input into the attention network, which outputs a target attention feature map; and the target attention feature map is input into the target classification network, which outputs a recognition result.
There may be one or more target images or target videos.
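The deployment-time flow just described, with the occlusion branches removed, reduces to a simple composition of the three remaining sub-networks. The callables below are stand-ins for the trained networks; the function name is illustrative.

```python
def track(image, feature_extractor, attention, classifier):
    """Inference flow after the model structure is updated: feature
    extraction, then the attention network, then target classification.
    """
    feat = feature_extractor(image)   # target feature map
    att_feat = attention(feat)        # target attention feature map
    return classifier(att_feat)       # recognition result
```

For multiple frames or videos, this function would simply be applied to each input in turn.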
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present application in any way.
Fig. 4 is a schematic diagram of an electronic device 4 provided in an embodiment of the present application. As shown in fig. 4, the electronic apparatus 4 of this embodiment includes: a processor 401, a memory 402 and a computer program 403 stored in the memory 402 and executable on the processor 401. The steps of the various method embodiments described above are implemented by processor 401 when executing computer program 403. Alternatively, the processor 401, when executing the computer program 403, performs the functions of the modules/units in the above-described apparatus embodiments.
The electronic device 4 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or the like. The electronic device 4 may include, but is not limited to, the processor 401 and the memory 402. Those skilled in the art will appreciate that Fig. 4 is merely an example of the electronic device 4 and does not constitute a limitation on it; the device may include more or fewer components than shown, or different components.
The processor 401 may be a central processing unit (Central Processing Unit, CPU) or another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like.
The memory 402 may be an internal storage unit of the electronic device 4, for example, a hard disk or an internal memory of the electronic device 4. The memory 402 may also be an external storage device of the electronic device 4, for example, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the electronic device 4. The memory 402 may also include both an internal storage unit and an external storage device of the electronic device 4. The memory 402 is used to store the computer program and other programs and data required by the electronic device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division into functional units and modules is illustrated. In practical applications, the above functions may be allocated to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit, and the integrated units may be implemented in the form of hardware or in the form of software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, the present application implements all or part of the flow of the methods in the above embodiments, which may be completed by a computer program instructing related hardware; the computer program may be stored in a computer readable storage medium, and when executed by a processor, the computer program may implement the steps of the respective method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, the computer readable medium does not include electrical carrier signals and telecommunication signals.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (9)

1. An occlusion-based target tracking method, comprising:
constructing an attention network by using a channel global average pooling layer, a convolution layer, an activation layer and an attention layer;
constructing a target classification network and an occlusion classification network using a spatial global average pooling layer and a fully connected layer;
constructing an occlusion enhancement network, and constructing a target tracking model using the occlusion enhancement network, a feature extraction network, the attention network, the target classification network and the occlusion classification network;
training the target tracking model according to a target tracking task, and executing the target tracking task by using the trained target tracking model;
the training of the target tracking model according to the target tracking task comprises: acquiring a training data set corresponding to the target tracking task, and randomly determining, from the training data set, a first training sample input into the target tracking model at the current time; inputting the first training sample into the target tracking model: the occlusion enhancement network processes the first training sample to obtain an occlusion sample; the feature extraction network processes the occlusion sample to obtain a sample feature map; the attention network processes the sample feature map to obtain an attention feature map; the target classification network processes the attention feature map to obtain a target classification result; the occlusion classification network processes the attention feature map to obtain an occlusion classification result; calculating, using a cross entropy loss function, the classification loss between the target classification result and the target label in the first training sample; calculating, using the cross entropy loss function, the occlusion loss between the occlusion classification result and the occlusion label in the occlusion sample corresponding to the first training sample; and optimizing model parameters of the target tracking model using the classification loss and the occlusion loss corresponding to the first training sample, until the model parameters of the target tracking model have been optimized using all training samples in the training data set, whereupon training of the target tracking model is determined to be complete.
2. The method of claim 1, wherein constructing the target tracking model using the occlusion enhancement network, the feature extraction network, the attention network, the target classification network, and the occlusion classification network comprises:
in the target tracking model, the occlusion enhancement network, the feature extraction network and the attention network are connected in series in sequence, the attention layer in the attention network is connected with the activation layer and the feature extraction network respectively, and the target classification network and the occlusion classification network are connected in parallel after the attention network;
wherein the attention layer in the attention network is to assign attention to each element in the output of the feature extraction network based on the output of the activation layer.
3. The method of claim 1, wherein the occlusion enhancement network processing the first training sample to obtain the occlusion sample comprises:
determining a second training sample corresponding to the first training sample, wherein the second training sample is any one training sample in the training data set except the first training sample;
cutting a crop of a randomly determined size from any position of the second training sample to obtain a cropped picture;
placing the cropped picture at any position of the first training sample to obtain the occlusion sample corresponding to the first training sample, wherein the cropped picture serves as the occlusion in the occlusion sample;
and determining a label corresponding to the occlusion in the occlusion sample.
4. The method of claim 1, wherein the attention network processing the sample feature map to obtain the attention feature map comprises:
passing the sample feature map sequentially through the channel global average pooling layer, the convolution layer and the activation layer to obtain an attention map;
inputting the attention map and the sample feature map into the attention layer and outputting the attention feature map, wherein the attention layer assigns attention to each element in the sample feature map based on the attention map.
5. The method according to claim 1, wherein the method further comprises:
deleting the occlusion classification network from the target tracking model after training of the target tracking model is completed, so as to update a model structure of the target tracking model;
and executing the target tracking task by using the target tracking model with the updated model structure.
6. The method of claim 1, wherein training the target tracking model according to the target tracking task comprises:
acquiring a training data set corresponding to the target tracking task, and executing the following loop:
judging whether i is larger than N, wherein i is the sequence number of training samples in the training data set, and N is the number of training samples in the training data set;
when i is greater than N, determining that training of the target tracking model is completed;
when i is not greater than N, inputting an ith training sample into the target tracking model:
the occlusion enhancement network processes the ith training sample to obtain an ith occlusion sample;
the feature extraction network processes the ith occlusion sample to obtain an ith sample feature map;
the attention network processes the ith sample feature map to obtain an ith attention feature map;
the target classification network processes the ith attention feature map to obtain an ith target classification result;
the occlusion classification network processes the ith attention feature map to obtain an ith occlusion classification result;
calculating, using the cross entropy loss function, the classification loss between the ith target classification result and the target label in the ith training sample;
calculating, using the cross entropy loss function, the occlusion loss between the ith occlusion classification result and the occlusion label in the ith occlusion sample;
and optimizing model parameters of the target tracking model using the classification loss and the occlusion loss corresponding to the ith training sample, and updating i to i+1.
7. An occlusion-based target tracking device, comprising:
a first construction module configured to construct an attention network using a channel global average pooling layer, a convolution layer, an activation layer, and an attention layer;
a second construction module configured to construct a target classification network and an occlusion classification network using a spatial global average pooling layer and a fully connected layer;
a third construction module configured to construct an occlusion enhancement network, a target tracking model being constructed using the occlusion enhancement network, a feature extraction network, the attention network, the target classification network, and the occlusion classification network;
the training module is configured to train the target tracking model according to a target tracking task, and execute the target tracking task by using the trained target tracking model;
the training module is further configured to acquire a training data set corresponding to the target tracking task, and to randomly determine, from the training data set, a first training sample input into the target tracking model at the current time; to input the first training sample into the target tracking model: the occlusion enhancement network processes the first training sample to obtain an occlusion sample; the feature extraction network processes the occlusion sample to obtain a sample feature map; the attention network processes the sample feature map to obtain an attention feature map; the target classification network processes the attention feature map to obtain a target classification result; the occlusion classification network processes the attention feature map to obtain an occlusion classification result; to calculate, using a cross entropy loss function, the classification loss between the target classification result and the target label in the first training sample; to calculate, using the cross entropy loss function, the occlusion loss between the occlusion classification result and the occlusion label in the occlusion sample corresponding to the first training sample; and to optimize model parameters of the target tracking model using the classification loss and the occlusion loss corresponding to the first training sample, until the model parameters of the target tracking model have been optimized using all training samples in the training data set, whereupon training of the target tracking model is determined to be complete.
8. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 6.
CN202311168561.8A 2023-09-12 2023-09-12 Target tracking method and device based on shielding Active CN116912632B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311168561.8A CN116912632B (en) 2023-09-12 2023-09-12 Target tracking method and device based on shielding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311168561.8A CN116912632B (en) 2023-09-12 2023-09-12 Target tracking method and device based on shielding

Publications (2)

Publication Number Publication Date
CN116912632A CN116912632A (en) 2023-10-20
CN116912632B true CN116912632B (en) 2024-04-12

Family

ID=88358753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311168561.8A Active CN116912632B (en) 2023-09-12 2023-09-12 Target tracking method and device based on shielding

Country Status (1)

Country Link
CN (1) CN116912632B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117372818B (en) * 2023-12-06 2024-04-12 深圳须弥云图空间科技有限公司 Target re-identification method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000010032A1 (en) * 1998-08-11 2000-02-24 Northrop Grumman Corporation Method for tracking a target having substantially constrained movement
CN109359559A (en) * 2018-09-27 2019-02-19 天津师范大学 A kind of recognition methods again of the pedestrian based on dynamic barriers sample
CN112801018A (en) * 2021-02-07 2021-05-14 广州大学 Cross-scene target automatic identification and tracking method and application
CN113361334A (en) * 2021-05-18 2021-09-07 山东师范大学 Convolutional pedestrian re-identification method and system based on key point optimization and multi-hop attention intention
CN115862097A (en) * 2022-11-25 2023-03-28 西安交通大学 Method and device for identifying shielding face based on multi-attention and multi-scale feature learning
CN116403237A (en) * 2023-03-06 2023-07-07 长沙理工大学 Method for re-identifying blocked pedestrians based on associated information and attention mechanism
CN116524206A (en) * 2023-06-30 2023-08-01 深圳须弥云图空间科技有限公司 Target image identification method and device


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A Target Tracking Method Based on Improved Camshift Algorithm;Jisheng Nie等;《2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC)》;第254-258页 *
Occluded Person Re-Identification;Jiaxuan Zhuo等;《2018 IEEE International Conference on Multimedia and Expo (ICME)》;第1-6页 *
HSV-based color texture image classification and target tracking; Li Junwei; China Master's Theses Full-text Database, Information Science and Technology (No. 03); pages I138-1329 *

Also Published As

Publication number Publication date
CN116912632A (en) 2023-10-20

Similar Documents

Publication Publication Date Title
US11392792B2 (en) Method and apparatus for generating vehicle damage information
CN116912632B (en) Target tracking method and device based on shielding
CN110910422A (en) Target tracking method and device, electronic equipment and readable storage medium
CN109377508B (en) Image processing method and device
CN115546705B (en) Target identification method, terminal device and storage medium
CN115147598A (en) Target detection segmentation method and device, intelligent terminal and storage medium
CN111767750A (en) Image processing method and device
CN116912635B (en) Target tracking method and device
CN116612500B (en) Pedestrian re-recognition model training method and device
CN116912636B (en) Target identification method and device
CN110852250B (en) Vehicle weight removing method and device based on maximum area method and storage medium
CN116109907A (en) Target detection method, target detection device, electronic equipment and storage medium
CN117372818B (en) Target re-identification method and device
CN113723431A (en) Image recognition method, image recognition device and computer-readable storage medium
CN116912633B (en) Training method and device for target tracking model
CN116935167B (en) Training method and device for target tracking model
CN116912634B (en) Training method and device for target tracking model
CN116912889B (en) Pedestrian re-identification method and device
CN116580063B (en) Target tracking method, target tracking device, electronic equipment and storage medium
CN116912518B (en) Image multi-scale feature processing method and device
CN110516603B (en) Information processing method and device
CN117475215A (en) Training method and device for target recognition model
CN116912920B (en) Expression recognition method and device
CN113239943B (en) Three-dimensional component extraction and combination method and device based on component semantic graph
CN117496555A (en) Pedestrian re-recognition model training method and device based on scale transformation scene learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant