CN115661688A - Unmanned aerial vehicle target re-identification method, system and equipment with rotation invariance - Google Patents


Info

Publication number
CN115661688A
Authority
CN
China
Prior art keywords
rotation
target
unmanned aerial vehicle
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211225141.4A
Other languages
Chinese (zh)
Other versions
CN115661688B (en)
Inventor
Mang Ye (叶茫)
Shuoyi Chen (陈朔怡)
Bo Du (杜博)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202211225141.4A priority Critical patent/CN115661688B/en
Publication of CN115661688A publication Critical patent/CN115661688A/en
Application granted granted Critical
Publication of CN115661688B publication Critical patent/CN115661688B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an unmanned aerial vehicle target re-identification method, system and equipment with rotation invariance. First, a block generation module divides the original picture into a plurality of partially overlapping small blocks. The small blocks are then flattened into a sequence, a randomly initialized block is added as the subsequent global classification feature, and all blocks are input into Transformer layers of depth h. The features learned by the Transformer layers enter two branches: one branch undergoes feature-level rotation to obtain a plurality of rotated features, and the other branch passes through a Transformer layer to obtain one original feature. The average of the rotated features and the original feature are then optimized with a rotation invariance constraint loss; the rotated features and the original feature are optimized with a triplet loss; finally, classification and identification are performed on the pictures through a fully connected layer and a batch normalization layer. The method enhances the generalization capability for target angle changes in drone scenes and improves retrieval accuracy.

Description

Unmanned aerial vehicle target re-identification method, system and equipment with rotation invariance
Technical Field
The invention belongs to the technical field of computer vision target retrieval, and relates to a method, a system and equipment for unmanned aerial vehicle target re-identification, in particular to a method, a system and equipment for unmanned aerial vehicle target re-identification with rotation invariance.
Background
Object re-identification (Re-ID) is the task of retrieving a specific object (e.g., a pedestrian or vehicle) across non-overlapping cameras [documents 1 to 3]. Current research in this field focuses mainly on urban cameras. However, conventional urban cameras are limited in capturing images, especially in large open areas: their positions are fixed, so the shooting range is limited and blind areas exist [document 4]. With the rapid development of unmanned aerial vehicles in the field of video surveillance, drones can easily cover large and hard-to-reach areas and provide more diverse and irreplaceable viewing angles [documents 5 to 6]. The technology can be applied to many scenarios, such as urban public security and the management of large public places. We define a new task that is more challenging than general target re-identification: re-identifying targets in drone scenes, i.e., identifying a specific target in multiple aerial images captured from a high-altitude, downward-looking viewpoint.
Compared with fixed city cameras, the fast movement and constantly changing height of the drone lead to great differences in viewing angle [document 6]. To identify identities correctly, the image needs to contain the whole body of the object. However, this raises two major difficulties: 1) the shape of the generated bounding box varies greatly, which means the bounding box encompasses more background area than under normal viewing angles and makes the model more susceptible to interference from meaningless content; 2) the body of the same person has different rotation directions within the bounding box, which gives drone target re-identification a greater intra-class distance than traditional target re-identification. Identifying targets under large rotational variation is challenging for the widely used convolutional neural network models.
A large number of target re-identification methods based on convolutional neural networks [documents 2, 7 to 10] have achieved great success in urban camera scenes, but they can hardly solve the rotation problem of drone scenes. Drone pedestrian images inevitably contain a large portion of background, and convolution is a typical operation between locally adjacent pixels [document 11]. Therefore, convolutional approaches spend too much capacity on the background to accurately model the target area that provides useful information, limiting their applicability in drone scenarios. The Transformer, in contrast, is a structure based entirely on the attention mechanism, and the Vision Transformer [document 12] exhibits a powerful ability to model the global and long-range relationships between input image blocks. This property prompted us to study a rotation-invariant solution under the Transformer framework.
Regarding the rotation problem, some studies realize rotation invariance based on convolutional neural networks applied to image classification, object detection and other visual tasks [documents 13 to 15]. For example, adaptability to image transformations is improved by inserting a learnable module into a convolutional neural network [document 15]. Other methods achieve rotation invariance by forcing training samples before and after rotation to share similar feature behavior [document 14]. However, these methods based on convolution and two-dimensional image-level operations are difficult to apply to a Transformer because of its block-based operations.
In conclusion, it is important to design a rotation-invariant feature learning model for drone Re-ID to solve the above problems.
[Document 1] Ying-Cong Chen, Xiatian Zhu, Wei-Shi Zheng, and Jian-Huang Lai. 2017. Person re-identification by camera correlation aware feature augmentation. IEEE TPAMI 40, 2 (2017), 392–408.
[Document 2] Mang Ye, Jianbing Shen, Gaojie Lin, Tao Xiang, Ling Shao, and Steven C. H. Hoi. 2021. Deep learning for person re-identification: A survey and outlook. IEEE TPAMI (2021), 1–1.
[Document 3] Liang Zheng, Yi Yang, and Alexander G Hauptmann. 2016. Person re-identification: Past, present and future. arXiv preprint arXiv:1610.02984 (2016).
[Document 4] Shizhou Zhang, Qi Zhang, Yifei Yang, Xing Wei, Peng Wang, Bingliang Jiao, and Yanning Zhang. 2020. Person re-identification in aerial imagery. IEEE TMM 23 (2020), 281–291.
[Document 5] S V Aruna Kumar, Ehsan Yaghoubi, Abhijit Das, B S Harish, and Hugo Proença. 2020. The P-DESTRE: a fully annotated dataset for pedestrian detection, tracking, re-identification and search from aerial devices. arXiv preprint arXiv:2004.02782 (2020).
[Document 6] Tianjiao Li, Jun Liu, Wei Zhang, Yun Ni, Wenqian Wang, and Zhiheng Li. 2021. UAV-Human: A large benchmark for human behavior understanding with unmanned aerial vehicles. In CVPR. 16266–16275.
[Document 7] Yifan Sun, Liang Zheng, Yi Yang, Qi Tian, and Shengjin Wang. 2018. Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In ECCV. 480–496.
[Document 8] Guanshuo Wang, Yufeng Yuan, Xiong Chen, Jiwei Li, and Xi Zhou. 2018. Learning discriminative features with multiple granularities for person re-identification. In ACM MM. 274–282.
[Document 9] Hao Luo, Wei Jiang, Youzhi Gu, Fuxu Liu, Xingyu Liao, Shenqi Lai, and Jianyang Gu. 2019. A strong baseline and batch normalization neck for deep person re-identification. IEEE TMM 22, 10 (2019), 2597–2609.
[Document 10] Kaiyang Zhou, Yongxin Yang, Andrea Cavallaro, and Tao Xiang. 2019. Omni-scale feature learning for person re-identification. In ICCV. 3702–3712.
[Document 11] Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. 2018. Non-local neural networks. In CVPR. 7794–7803.
[Document 12] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020).
[Document 13] Aharon Azulay and Yair Weiss. 2018. Why do deep convolutional networks generalize so poorly to small image transformations? arXiv preprint arXiv:1805.12177 (2018).
[Document 14] Gong Cheng, Peicheng Zhou, and Junwei Han. 2016. RIFD-CNN: Rotation-invariant and Fisher discriminative convolutional neural networks for object detection. In CVPR. 2884–2893.
[Document 15] Max Jaderberg, Karen Simonyan, Andrew Zisserman, et al. 2015. Spatial transformer networks. Advances in Neural Information Processing Systems 28 (2015), 2017–2025.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a Vision Transformer (ViT) based unmanned aerial vehicle target re-identification method, system and equipment with rotation invariance, which improves the accuracy of target re-identification in drone scenes.
The method of the invention adopts the following technical scheme: an unmanned aerial vehicle target re-identification method with rotation invariance, in which a rotation invariant target recognition network is adopted to perform drone target re-identification; the rotation invariant target recognition network comprises a block generation module and a plurality of Transformer layers;
the method specifically comprises the following steps:
Step 1: dividing an original picture into a plurality of partially overlapping small blocks through the block generation module;
Step 2: after flattening the small blocks into a sequence, adding a randomly initialized block as the subsequent global classification feature, and then inputting all blocks into Transformer layers of depth h;
Step 3: feeding the features learned by the Transformer layers in step 2 into two branches, where one branch undergoes feature-level rotation to obtain a plurality of rotated features, and the other branch passes through a Transformer layer to obtain one original feature;
Step 4: optimizing the average of the plurality of rotated features and the one original feature with a rotation invariance constraint loss;
Step 5: optimizing the plurality of rotated features and the one original feature processed in step 4 with a triplet loss;
Step 6: classifying and identifying the pictures processed in step 5 through a fully connected layer and a batch normalization layer.
The system of the invention adopts the following technical scheme: an unmanned aerial vehicle target re-identification system with rotation invariance, in which a rotation invariant target recognition network is adopted to perform drone target re-identification; the rotation invariant target recognition network comprises a block generation module and a plurality of Transformer layers;
the system specifically comprises the following modules:
Module 1, used for dividing an original picture into a plurality of partially overlapping small blocks through the block generation module;
Module 2, used for flattening the small blocks into a sequence, then adding a randomly initialized block as the subsequent global classification feature, and then inputting all blocks into Transformer layers of depth h;
Module 3, used for feeding the features learned by the Transformer layers in module 2 into two branches, where one branch undergoes feature-level rotation to obtain a plurality of rotated features, and the other branch passes through a Transformer layer to obtain one original feature;
Module 4, used for optimizing the average of the plurality of rotated features and the one original feature with a rotation invariance constraint loss;
Module 5, used for optimizing the plurality of rotated features and the one original feature processed by module 4 with a triplet loss;
Module 6, used for classifying and identifying the pictures processed by module 5 through a fully connected layer and a batch normalization layer.
The equipment of the invention adopts the following technical scheme: an unmanned aerial vehicle target re-identification device with rotation invariance, comprising:
one or more processors;
a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the unmanned aerial vehicle target re-identification method with rotation invariance.
The invention has the following advantages:
(1) The invention designs a novel feature-level rotation strategy to enhance the generalization capability for coping with rotation changes in drone scenes.
(2) The invention integrates the rotation invariance constraint into the feature learning process, enhancing robustness to spatial changes and reducing the misclassification caused by rotation changes.
(3) The method proposed by the invention is evaluated on drone and urban camera datasets and achieves better performance than the current state of the art: on the challenging PRAI-1581 dataset, Rank-1/mAP is raised from 63.3%/55.1% to 70.8%/63.7%.
Drawings
FIG. 1 is a flow chart of a method of an embodiment of the present invention;
FIG. 2 is a diagram of a rotation invariant target recognition network architecture according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of feature level rotation according to an embodiment of the present invention.
Detailed Description
In order to facilitate understanding and implementation of the present invention by those of ordinary skill in the art, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It is to be understood that the embodiments described herein are only for illustration and explanation and are not to be construed as limiting the present invention.
ViT has strong modeling and generalization capability and performs excellently on common target recognition tasks. The core idea of the invention is to design a new feature-level rotation strategy based on ViT to enhance generalization to rotation changes, and to integrate a rotation invariance constraint into the feature learning process to enhance robustness to spatial changes and thereby reduce the misclassification caused by rotation changes. In particular, the invention proposes to simulate block feature rotation at the feature level to generate rotated features. Finally, the invention establishes strong constraints between the plurality of rotated features and the original feature and optimizes them together with the original target, thereby improving the retrieval rate.
Referring to fig. 1, the unmanned aerial vehicle target re-identification method with rotation invariance performs drone target re-identification with a rotation invariant target recognition network; the rotation invariant target recognition network comprises a block generation module and a plurality of Transformer layers.
Referring to fig. 2, the block generation module of this embodiment comprises a convolution layer and divides the source image into 16 × 16 blocks in an overlapping manner, using a convolution kernel of size 16 × 16 with a stride of 12. The Transformer layer consists of MSA (multi-head self-attention) and MLP (a two-layer fully connected network using the GELU activation function); both the MSA and the MLP are preceded by LayerNorm and wrapped with residual connections.
The method of this embodiment specifically comprises the following steps:
Step 1: dividing the original picture into a plurality of partially overlapping small blocks through the block generation module;
Step 2: after flattening the small blocks into a sequence, adding a randomly initialized block as the subsequent global classification feature, and then inputting all blocks into Transformer layers of depth h;
Step 3: feeding the features learned by the Transformer layers in step 2 into two branches, where one branch undergoes feature-level rotation to obtain a plurality of rotated features, and the other branch passes through a Transformer layer to obtain one original feature;
The global feature representation learned by the backbone network is $f_g \in \mathbb{R}^{(N+1) \times D}$, where the N+1 tokens consist of a block sequence of length N (denoted $f_p$) and one global classification feature (denoted $c_O$), of which n copies are later attached to the n rotated features. To simulate the rotation operation in two dimensions, this embodiment reshapes the block sequence $f_p \in \mathbb{R}^{N \times D}$ into $f_{res} \in \mathbb{R}^{X \times Y \times D}$, where X and Y represent the spatial sizes produced by overlapped block embedding with step S. The formulas for X and Y are:

$X = \lfloor (H - P)/S \rfloor + 1, \quad Y = \lfloor (W - P)/S \rfloor + 1$

where W and H are the width and length of the image, P is the size of a block, and D is the dimension. For example, with H = W = 256, P = 16 and S = 12, this gives X = Y = 21 and N = X · Y = 441.
Referring to fig. 3, each block is regarded as a pixel, so $f_{res}$ can intuitively be viewed as a two-dimensional matrix. In this way, this embodiment can apply an operation similar to a rotation matrix at the block feature level. Due to the continuous movement of the drone, the captured angle varies randomly, so this embodiment randomly generates a series of angles $A = \{\theta_i \mid i = 1, 2, \ldots, n\}$. With the coordinates of each block vector in the two-dimensional matrix denoted (x, y), the rotation formulas are:

$x' = x\cos\theta - y\sin\theta$
$y' = x\sin\theta + y\cos\theta$

Unlike pixel-based picture rotation, feature-level rotation is performed over larger blocks, so a rotation value that appears small actually simulates a relatively large rotation. This embodiment therefore defines a parameter α to limit the magnitude of the generated angles, $\theta \in [-\alpha, \alpha]$. Applying the rotation operation yields a series of multi-angle rotated features $F_r = \{f_{r1}, f_{r2}, \ldots, f_{rn}\}$. In this step, the multi-angle characteristics of the drone scene are introduced into the model in advance to simulate diverse rotations, while the global classification feature learns all information from the original picture.
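The rotation itself is defined only by the coordinate formulas above. One plausible realization on the block grid, sketched below, uses an affine sampling grid; the choice of affine_grid/grid_sample with bilinear interpolation is an implementation assumption, since the patent does not specify how rotated block coordinates that fall off the grid are handled.

import math
import torch
import torch.nn.functional as F

def rotate_block_features(f_res, theta_deg):
    """Rotate a 2D grid of block features (B, D, X, Y) by theta degrees,
    treating each block feature as a 'pixel' and applying a rotation matrix."""
    b = f_res.size(0)
    t = math.radians(theta_deg)
    # per-sample 2x3 affine matrix [[cos, -sin, 0], [sin, cos, 0]]
    mat = f_res.new_tensor([[math.cos(t), -math.sin(t), 0.0],
                            [math.sin(t),  math.cos(t), 0.0]])
    mat = mat.unsqueeze(0).expand(b, -1, -1)
    grid = F.affine_grid(mat, list(f_res.shape), align_corners=False)
    return F.grid_sample(f_res, grid, align_corners=False)

# n random angles limited to [-alpha, alpha], e.g. alpha = 15 and n = 4
alpha, n = 15.0, 4
f_res = torch.randn(2, 768, 21, 21)   # reshaped block sequence in (B, D, X, Y) form
angles = (torch.rand(n) * 2 - 1) * alpha
rotated = [rotate_block_features(f_res, a.item()) for a in angles]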
Step 4: optimizing the average of the plurality of rotated features and the one original feature with a rotation invariance constraint loss;
The feature-level rotation in step 3 improves the network's generalization to angle changes from the perspective of diversity. Furthermore, the rotated features and the original feature represent the same object, so this embodiment adds an invariance constraint between them to the loss function to establish their relationship. By this means the intra-class distance is shortened (Mang Ye, Jianbing Shen, Xu Zhang, Pong C. Yuen, and Shih-Fu Chang. 2020. Augmentation invariant and instance spreading feature for softmax embedding. IEEE TPAMI (2020)), which favors correct classification. Establishing a constraint between every pair formed by the set of rotated global classification features $c_r$ and the original global classification feature $c_O$ would incur a significant computational cost. To avoid redundant calculation, the average of the rotated features is used to establish invariance, expressed as:

$c_{avg} = \frac{1}{n}(c_{r1} + c_{r2} + \cdots + c_{rn})$

where $c_{r1}, c_{r2}, \ldots, c_{rn}$ respectively denote the global classification features added for each rotated feature.
The goal of this embodiment is to limit the difference between the average rotated feature and the original feature while ensuring that the class discrimination represented by the rotated features is not impaired. Mean squared error (MSE), the most commonly used loss function, is the sum of squared differences between predicted and target values; this embodiment instead selects the smooth L1 loss to compute the difference, which effectively prevents the gradient explosion problem. The rotation invariance constraint for this part is expressed as:

$\mathcal{L}_{RIC} = \mathrm{smooth}_{L1}(c_{avg} - c_O)$

In the training phase, the overall loss function consists of three parts. While the rotated features are updated, the original feature is also input to a Transformer layer to further update the global classification feature representing the whole; this embodiment denotes the original feature obtained after learning through multiple Transformer layers as $c_O$. After batch normalization, triplet loss and cross-entropy loss are also employed:

$\mathcal{L}_O = \mathcal{L}_{tri}(c_O) + \mathcal{L}_{CE}(c_O)$

Furthermore, the average rotated feature is an auxiliary feature representation that accommodates angular diversity, and the invariance constraint controls the difference between the original and rotated features. The overall learning objective is:

$\mathcal{L} = \lambda \mathcal{L}_O + (1 - \lambda)\mathcal{L}_R + \mathcal{L}_{RIC}$

where λ and 1 - λ represent the weights of the original and rotated features, respectively, and $\mathcal{L}_R$ is the rotated-feature loss defined below.
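A minimal sketch of the invariance constraint, assuming the rotated global classification features are gathered in a Python list:

import torch
import torch.nn.functional as F

def rotation_invariance_loss(rot_cls_feats, orig_cls_feat):
    """Smooth L1 between the averaged rotated classification feature (c_avg)
    and the original classification feature (c_O)."""
    c_avg = torch.stack(rot_cls_feats, dim=0).mean(dim=0)
    return F.smooth_l1_loss(c_avg, orig_cls_feat)

Putting the three parts of the objective together, one training-step loss computation could then be sketched as below. The batch-hard triplet loss with a softplus (soft) margin is one common no-margin variant and an assumption here, as is the equal averaging over the n rotated branches:

def batch_hard_triplet(feats, labels):
    """Batch-hard triplet loss with a softplus soft margin (assumed variant)."""
    dist = torch.cdist(feats, feats)                       # pairwise L2, (B, B)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)      # positive-pair mask
    hardest_pos = (dist * same.float()).max(dim=1).values  # farthest positive
    hardest_neg = dist.masked_fill(same, float('inf')).min(dim=1).values
    return F.softplus(hardest_pos - hardest_neg).mean()

def total_loss(c_o, logits_o, c_rs, logits_rs, labels, lam=0.5):
    """L = lam * L_O + (1 - lam) * L_R + L_RIC, as in the objective above."""
    l_o = batch_hard_triplet(c_o, labels) + F.cross_entropy(logits_o, labels)
    l_r = sum(batch_hard_triplet(c, labels) + F.cross_entropy(lg, labels)
              for c, lg in zip(c_rs, logits_rs)) / len(c_rs)
    return lam * l_o + (1.0 - lam) * l_r + rotation_invariance_loss(c_rs, c_o)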
Step 5: optimizing the plurality of rotated features and the one original feature processed in step 4 with a triplet loss;
Step 6: classifying and identifying the pictures processed in step 5 through a fully connected layer and a batch normalization (BN) layer.
The rotation invariant target recognition network of this embodiment is a trained rotation invariant target recognition network. Owing to the randomness of the rotation, each rotated feature contains different information and can be regarded as a new feature. To learn these multiple features, the two-dimensional $f_{res} \in \mathbb{R}^{X \times Y \times D}$ must be flattened back into $f_r \in \mathbb{R}^{N \times D}$ so that the Transformer can receive it as a block sequence. Each rotated feature has the same N blocks as the original feature, and it is difficult to cover all block information when classifying; the global classification feature, after learning through several Transformer layers, integrates the sequence into a global feature representation. This embodiment therefore attaches to each of the n rotated features a copy of the original global classification feature $c_O$, obtaining

$F_r' = \{[c_O; f_{r1}], [c_O; f_{r2}], \ldots, [c_O; f_{rn}]\}$

The purpose of this operation is that each rotated feature can be classified through a learnable global classification feature, yielding the samples $c_{r1}, c_{r2}, \ldots, c_{rn}$ for the n rotated block sequences; this embodiment then establishes a Transformer layer for each sample to ensure that diversity is learned. During training, the $c_r$ representing a rotated feature is updated on the basis of the original global classification feature $c_O$, which already contains rich feature information; this effectively avoids information loss in the rotated features. This embodiment establishes n classifiers for the global classification features of the rotated features updated by the Transformer layers. The most commonly used cross-entropy loss function is applied after batch normalization (Hao Luo, Wei Jiang, Youzhi Gu, Fuxu Liu, Xingyu Liao, Shenqi Lai, and Jianyang Gu. 2019. A strong baseline and batch normalization neck for deep person re-identification. IEEE TMM 22, 10 (2019), 2597–2609). Furthermore, for fine-grained identification, a triplet loss function is used on each global classification feature (Alexander Hermans, Lucas Beyer, and Bastian Leibe. 2017. In Defense of the Triplet Loss for Person Re-Identification. arXiv preprint arXiv:1703.07737 (2017)). The final loss function for the rotated features is:

$\mathcal{L}_R = \frac{1}{n}\sum_{i=1}^{n}\left(\mathcal{L}_{tri}(c_{ri}) + \mathcal{L}_{CE}(c_{ri})\right)$

so that each global classification feature of a rotated feature plays an equivalent role in updating the entire model.
The principle of the present embodiment is further illustrated below with reference to specific experiments.
The deep learning framework employed in this embodiment is PyTorch. The experimental hardware environment is eight NVIDIA GeForce RTX 3090 graphics cards, and the processor is an Intel(R) Xeon(R) Gold 6240. The experimental procedure is as follows:
the first step is as follows: rotational feature generation network build
In the experiment, a Vision Transformer (ViT) network is used as a feature extractor, block feature rotation is simulated on a feature level to generate rotation features, and finally constraint between the original features and the rotated features is established. And identity classification loss, triple loss, smooth L1 loss, cross entropy loss combined end-to-end feature extractor, rotation feature generation network and rotation invariant constraint are adopted.
Second step: network training
The target object pictures and the drone-captured pictures are divided into a training set and a test set. The target object pictures are fed into the feature rotation network for training, and the network parameters are optimized and updated through forward and backward propagation.
Third step: network testing
The images of the target objects in the test set serve as the query set, and the set of drone-captured samples serves as the gallery set. The model with the best performance during training is used for inference to obtain the final retrieval results on the test set. The evaluation indices are Rank-1, Rank-5, mAP and mINP matching precision, which reflect the probability of retrieving correct re-identification images.
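For reference, Rank-1 and mAP can be computed from a query-gallery distance matrix roughly as in the sketch below; this simplified version ignores the same-camera junk filtering used in standard Re-ID evaluation.

import numpy as np

def rank1_and_map(dist, q_ids, g_ids):
    """Rank-1 accuracy and mAP from a (num_query, num_gallery) distance matrix."""
    ranks, aps = [], []
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i])              # gallery sorted by distance
        matches = g_ids[order] == q_ids[i]
        ranks.append(float(matches[0]))          # is the top-1 result correct?
        hits = np.where(matches)[0]
        if hits.size:                            # average precision for query i
            aps.append(np.mean((np.arange(hits.size) + 1) / (hits + 1)))
        else:
            aps.append(0.0)
    return float(np.mean(ranks)), float(np.mean(aps))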
Three datasets captured by unmanned aerial vehicles, PRAI-1581, UAVHuman and VRAI, and two commonly used pedestrian re-identification datasets collected with ground surveillance cameras, Market-1501 and MSMT17, are used. PRAI-1581 is a dataset proposed for the drone task; it consists of 39461 images of 1581 pedestrians shot by two drones flying at heights of 20 to 60 meters. UAVHuman is mainly used for drone pedestrian behavior research and can also serve various tasks such as pedestrian re-identification, action recognition and height estimation; this dataset contains 1444 pedestrians and 41290 images. VRAI is a vehicle re-identification dataset consisting of 137613 photographs of 13033 vehicles; the pictures were collected by drones flying at different places at heights ranging from 15 to 80 meters, and the dataset carries rich annotations including color, vehicle type, attributes and discriminative parts.
The images are uniformly resized to 256 × 256. In addition, padding of 10 pixels, random cropping, and random erasing with a probability of 0.5 are employed on the training data. The network parameters are initialized with ImageNet-1K pre-trained weights. In the overlapped block embedding stage, the patch size is set to 16 and the stride to 12. In the feature-level rotation, the number of rotated features n is 4, and the random rotation angles range from -15 to 15 degrees; because the rotation is block-based, the angle is not set too large. For the original and rotated features extracted from the backbone, a triplet loss without margin is used, and cross-entropy loss is used after the features pass through the batch normalization layer. The weight λ of the original features is 0.5 and the weight 1 - λ of the rotated features is 0.5. A smooth L1 loss is applied between the average rotated feature and the original feature. During training, a stochastic gradient descent (SGD) optimizer is used with an initial learning rate of 0.008 and cosine learning rate decay. The number of training epochs is 200. The batch size is set to 64 and includes 16 identities with 4 images each. In the testing phase, only the original features are used to compute the distance matrix. The whole experiment is carried out based on PyTorch.
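The optimizer setup in this paragraph might be reproduced as below; the model is a placeholder, and the momentum and weight-decay values are assumptions not stated in the patent.

import torch
import torch.nn as nn

model = nn.Linear(768, 1581)  # placeholder for the rotation invariant network

# initial learning rate 0.008 with cosine decay over the 200 epochs, as stated above
optimizer = torch.optim.SGD(model.parameters(), lr=0.008,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(200):
    # each batch holds 64 images: 16 identities x 4 images each (PK sampling)
    ...                           # forward pass, loss, backward, optimizer.step()
    scheduler.step()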
In order to verify the effectiveness of the invention, this section compares the retrieval results of the invention with those of existing drone re-identification methods. The existing target re-identification methods mainly include the following:
(1)PCB:Yifan Sun,Liang Zheng,Yi Yang,Qi Tian,and Shengjin Wang.2018.Beyond part models:Person retrieval with refined part pooling(and a strong convolutional baseline).In ECCV.480–496.
(2)SP:Shizhou Zhang,Qi Zhang,Yifei Yang,Xing Wei,Peng Wang,Bingliang Jiao,and Yanning Zhang.2020.Person re-identification in aerial imagery.IEEE TMM 23(2020),281–291.
(3)AGW:Mang Ye,Jianbing Shen,Gaojie Lin,Tao Xiang,Ling Shao,and Steven C.H.Hoi.2021.Deep learning for person re-identification:A survey and outlook.IEEE TPAMI(2021),1–1.
(4)Multi-task:Peng Wang,Bingliang Jiao,Lu Yang,Yifei Yang,Shizhou Zhang,Wei Wei,and Yanning Zhang.2019.Vehicle re-identification in aerial imagery:Dataset and approach.In ICCV.460–469.
(5)Baseline(ViT):Shuting He,Hao Luo,Pichao Wang,Fan Wang,Hao Li,and Wei Jiang.2021.Transreid:Transformer-based object re-identification.In ICCV.15013–15022.
(6)TransReID:Shuting He,Hao Luo,Pichao Wang,Fan Wang,Hao Li,and Wei Jiang.2021.Transreid:Transformer-based object re-identification.In ICCV.15013–15022.
the test was performed on the PRAI-1581, UAVHuman, VRAI data set and the results are shown in Table 1
TABLE 1
[Table 1 appears as an image in the original publication; its numerical values are not recoverable here.]
Tests were performed on the Market-1501 and MSMT17 datasets, with the results shown in Table 2.
TABLE 2
[Table 2 appears as an image in the original publication; its numerical values are not recoverable here.]
As can be seen from Tables 1 and 2, compared with recent Re-ID methods, the method proposed by the invention improves the retrieval rate on both drone target re-identification and urban-camera target re-identification. On the PRAI-1581 dataset, the performance of the method is clearly superior to that of all methods in the table, exceeding the current best method TransReID by 4.8% and 5.9% on Rank-1 and mAP respectively. On the UAVHuman dataset, the mAP is 2% better than that of the current best method TransReID. On the VRAI dataset, the proposed method achieves a Rank-1 accuracy of 83.5% and an mAP of 84.8% without using any side information, exceeding all other methods. On the Market-1501 and MSMT17 datasets, experiments in the common city-camera scenario also demonstrate strong generalization capability, with mAP and Rank-1 improved by 5.4% and 3.2% respectively over the current best. The experimental results on three drone-collected datasets and two ground-camera datasets prove the effectiveness and generalization of the method.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. An unmanned aerial vehicle target re-identification method with rotation invariance, characterized in that: a rotation invariant target recognition network is adopted to perform drone target re-identification; the rotation invariant target recognition network comprises a block generation module and a plurality of Transformer layers;
the method specifically comprises the following steps:
Step 1: dividing an original picture into a plurality of partially overlapping small blocks through the block generation module;
Step 2: after flattening the small blocks into a sequence, adding a randomly initialized block as the subsequent global classification feature, and then inputting all blocks into Transformer layers of depth h;
Step 3: feeding the features learned by the Transformer layers in step 2 into two branches, wherein one branch undergoes feature-level rotation to obtain a plurality of rotated features, and the other branch passes through a Transformer layer to obtain one original feature;
Step 4: optimizing the average of the plurality of rotated features and the one original feature with a rotation invariance constraint loss;
Step 5: optimizing the plurality of rotated features and the one original feature processed in step 4 with a triplet loss;
Step 6: classifying and identifying the pictures processed in step 5 through a fully connected layer and a batch normalization layer.
2. The unmanned aerial vehicle target re-identification method with rotation invariance of claim 1, wherein: the block generation module comprises a convolution layer with a convolution kernel of size 16 × 16 and a stride of 12; the original picture is divided in units of 16 × 16 in an overlapping manner.
3. The unmanned aerial vehicle target re-identification method with rotation invariance of claim 1, wherein: the Transformer layer consists of multi-head self-attention (MSA) and a two-layer fully connected network (MLP) using the GELU activation function, wherein both the MSA and the MLP are preceded by LayerNorm and residual connections.
4. The unmanned aerial vehicle target re-identification method with rotation invariance of claim 1, wherein: in step 3, the plurality of rotated features are obtained after feature-level rotation, namely the block sequence $f_p \in \mathbb{R}^{N \times D}$ is reconstructed into $f_{res} \in \mathbb{R}^{X \times Y \times D}$, where X and Y represent the spatial sizes produced by overlapped block embedding with step S; the formulas for X and Y are:

$X = \lfloor (H - P)/S \rfloor + 1, \quad Y = \lfloor (W - P)/S \rfloor + 1$

where W, H are the width and length of the image, P is the size of a block, and D is the dimension; $f_p$ represents the block sequence of length N, and the global feature is represented as $f_g \in \mathbb{R}^{(N+1) \times D}$, whose N+1 tokens are composed of the block sequence $f_p$ of length N and the global classification feature $c_O$;
each block is treated as a pixel, so $f_{res}$ can intuitively be viewed as a two-dimensional matrix; a series of angles $A = \{\theta_i \mid i = 1, 2, \ldots, n\}$ is randomly generated, the coordinates of each block vector in the two-dimensional matrix are denoted (x, y), and the rotation formulas are:

$x' = x\cos\theta - y\sin\theta$
$y' = x\sin\theta + y\cos\theta$

the rotation operation obtains a plurality of rotated features $F_r = \{f_{r1}, f_{r2}, \ldots, f_{rn}\}$.
5. The unmanned aerial vehicle target re-identification method with rotation invariance of claim 1, wherein the rotation invariance constraint in step 4 is:

$c_{avg} = \frac{1}{n}(c_{r1} + c_{r2} + \cdots + c_{rn})$

$\mathcal{L}_{RIC} = \mathrm{smooth}_{L1}(c_{avg} - c_O)$

where $c_{r1}, c_{r2}, \ldots, c_{rn}$ respectively denote the global classification features for classification added for each rotated feature, and $c_O$ is the global classification feature of the original feature.
6. The unmanned aerial vehicle target re-identification method with rotation invariance of any one of claims 1 to 5, wherein: the rotation invariant target recognition network is a trained rotation invariant target recognition network; the loss function used for the rotated features in the training process is:

$\mathcal{L}_R = \frac{1}{n}\sum_{i=1}^{n}\left(\mathcal{L}_{tri}(c_{ri}) + \mathcal{L}_{CE}(c_{ri})\right)$

where $\mathcal{L}_{tri}(c_{ri})$ represents the triplet loss of a rotated feature, $\mathcal{L}_{CE}(c_{ri})$ represents the classification loss for the rotated feature, and 1 ≤ i ≤ n.
7. An unmanned aerial vehicle target re-identification system with rotation invariance, characterized in that: a rotation invariant target recognition network is adopted to perform drone target re-identification; the rotation invariant target recognition network comprises a block generation module and a plurality of Transformer layers;
the system specifically comprises the following modules:
Module 1, used for dividing an original picture into a plurality of partially overlapping small blocks through the block generation module;
Module 2, used for flattening the small blocks into a sequence, then adding a randomly initialized block as the subsequent global classification feature, and then inputting all blocks into Transformer layers of depth h;
Module 3, used for feeding the features learned by the Transformer layers in module 2 into two branches, wherein one branch undergoes feature-level rotation to obtain a plurality of rotated features, and the other branch passes through a Transformer layer to obtain one original feature;
Module 4, used for optimizing the average of the plurality of rotated features and the one original feature with a rotation invariance constraint loss;
Module 5, used for optimizing the plurality of rotated features and the one original feature processed by module 4 with a triplet loss;
Module 6, used for classifying and identifying the pictures processed by module 5 through a fully connected layer and a batch normalization layer.
8. An unmanned aerial vehicle target re-identification device with rotation invariance, characterized by comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the unmanned aerial vehicle target re-identification method with rotation invariance of any one of claims 1 to 6.
CN202211225141.4A 2022-10-09 2022-10-09 Unmanned aerial vehicle target re-identification method, system and equipment with rotation invariance Active CN115661688B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211225141.4A CN115661688B (en) 2022-10-09 2022-10-09 Unmanned aerial vehicle target re-identification method, system and equipment with rotation invariance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211225141.4A CN115661688B (en) 2022-10-09 2022-10-09 Unmanned aerial vehicle target re-identification method, system and equipment with rotation invariance

Publications (2)

Publication Number Publication Date
CN115661688A true CN115661688A (en) 2023-01-31
CN115661688B CN115661688B (en) 2024-04-26

Family

ID=84986413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211225141.4A Active CN115661688B (en) 2022-10-09 2022-10-09 Unmanned aerial vehicle target re-identification method, system and equipment with rotation invariance

Country Status (1)

Country Link
CN (1) CN115661688B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101150A (en) * 2020-09-01 2020-12-18 北京航空航天大学 Multi-feature fusion pedestrian re-identification method based on orientation constraint
CN113469119A (en) * 2021-07-20 2021-10-01 合肥工业大学 Cervical cell image classification method based on visual converter and graph convolution network
CN114037899A (en) * 2021-12-01 2022-02-11 福州大学 VIT-based hyperspectral remote sensing image-oriented classification radial accumulation position coding system
CN114708620A (en) * 2022-05-10 2022-07-05 山东交通学院 Pedestrian re-identification method and system applied to unmanned aerial vehicle at aerial view angle

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ALEXEY DOSOVITSKIY et al.: "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", arXiv:2010.11929v2 [cs.CV], 3 June 2021 (2021-06-03)
HONGWEI DONG et al.: "Exploring Vision Transformers for Polarimetric SAR Image Classification", IEEE Transactions on Geoscience and Remote Sensing, 22 December 2021 (2021-12-22)

Also Published As

Publication number Publication date
CN115661688B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
Luo et al. Contextdesc: Local descriptor augmentation with cross-modality context
Spencer et al. Defeat-net: General monocular depth via simultaneous unsupervised representation learning
CN110209859B (en) Method and device for recognizing places and training models of places and electronic equipment
Wen et al. Visdrone-sot2018: The vision meets drone single-object tracking challenge results
YuanQiang et al. Guided attention network for object detection and counting on drones
CN112597941A (en) Face recognition method and device and electronic equipment
CN110751018A (en) Group pedestrian re-identification method based on mixed attention mechanism
CN110765841A (en) Group pedestrian re-identification system and terminal based on mixed attention mechanism
Haggui et al. Human detection in moving fisheye camera using an improved YOLOv3 framework
Ahmad et al. FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition
Malav et al. DHSGAN: An end to end dehazing network for fog and smoke
Yang et al. Robust visual tracking using adaptive local appearance model for smart transportation
Haggui et al. Centroid human tracking via oriented detection in overhead fisheye sequences
Wiedemer et al. Few-shot supervised prototype alignment for pedestrian detection on fisheye images
Li et al. Detection of partially occluded pedestrians by an enhanced cascade detector
Chen et al. Rotation invariant transformer for recognizing object in uavs
CN115661688B (en) Unmanned aerial vehicle target re-identification method, system and equipment with rotation invariance
CN113962846A (en) Image alignment method and device, computer readable storage medium and electronic device
Feng et al. Simfir: A simple framework for fisheye image rectification with self-supervised representation learning
Li et al. Evaluation of Global Descriptor Methods for Appearance‐Based Visual Place Recognition
Zhang et al. Lightweight mobile network for real-time violence recognition
Diao et al. [Retracted] Visual Object Tracking Based on Deep Neural Network
Tian et al. High confidence detection for moving target in aerial video
Li et al. Robust visual tracking based on an effective appearance model
Li et al. Intelligent terminal face spoofing detection algorithm based on deep belief network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant